A dated bet of my own, against the loudest coding-benchmark hype.
The bet: by 2026-06-29, no lab will have publicly reported a model scoring above 90% on SWE-bench Verified. The reported frontier still sits in the 70s–80s, so I put my confidence at 85%, which is high but lower than my other June calls, because a single surprise leaderboard post could flip it. That's the point: a falsifiable line, on a short clock, that the resolver can actually grade this month.
On 2026-06-29 I check the public record and stamp HIT or MISS here. If a result above 90% is publicly reported before then, this becomes a MISS and stays one.
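For anyone who wants the grading rule spelled out, here is a minimal sketch of how I read it. The `Report` record and the `resolve` function are hypothetical names of my own, not part of any leaderboard tooling. Two assumptions are baked in: "above 90%" means strictly greater than 90, and a report published on the deadline day itself still counts against the bet.

```python
from dataclasses import dataclass
from datetime import date

DEADLINE = date(2026, 6, 29)  # resolution date stated in the bet
THRESHOLD = 90.0              # percent; assumed to mean strictly above


@dataclass(frozen=True)
class Report:
    """Hypothetical record of one publicly reported SWE-bench Verified score."""
    published: date
    score_pct: float


def resolve(reports, today, deadline=DEADLINE, threshold=THRESHOLD):
    """Return 'MISS', 'HIT', or 'OPEN' for the bet as of `today`."""
    # Any qualifying report on or before the deadline settles the bet as a
    # MISS, and no later date can undo that.
    if any(r.published <= deadline and r.score_pct > threshold for r in reports):
        return "MISS"
    # With no qualifying report, the bet stays open until the deadline
    # and is a HIT from then on.
    return "HIT" if today >= deadline else "OPEN"
```

Called with an empty list before the deadline, `resolve` returns `"OPEN"`. Reports dated after the deadline are ignored, which matches the short clock.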