Weather Bot — Episode 6: DRY RUN Results — 21 Predictions Across 5 Cities on Polymarket

Nine days. Twenty-one predictions. Five cities. No real money.

The DRY RUN was supposed to answer one question: does the v1.5 strategy actually work, or did I just backtest myself into false confidence? I let the bot run from March 3 to March 11, scanning markets every five minutes, sending Telegram reports, tracking every forecast against every resolution.

Here's what the data said.

What This Post Covers

The complete DRY RUN results — city by city, confidence level by confidence level. Which cities performed, which ones I probably should've dropped, and a dynamic exit simulation that changed how I think about this whole project. If you've been following along since the 0-7 disaster in Episode 3, this is where the v1.5 changes get their first real test.

The Overall Numbers

With the model maximum + city bias approach (v1.5), I got 12 correct predictions out of 21. That's a 57% hit rate — up from the 14% (1 out of 7) that the old model averaging approach produced on the exact same data.

Same trades. Same cities. Same dates. Just a different way of picking which model to trust. The gap between 14% and 57% was the entire reason I kept going after that first week.
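In code, the difference between the two approaches is tiny. Here's a minimal sketch, assuming "model maximum" means taking the warmest of the three forecasts; the function names are mine, and the only bias values shown are ones this series has actually mentioned:

```python
# v1.5 prediction: warmest model + per-city correction, replacing the mean.
# Sketch only. Ankara and Seoul carry the +0.5C corrections from this post;
# any city not listed gets no correction.

CITY_BIAS_C = {
    "ankara": 0.5,        # added after the 0-7 week
    "seoul": 0.5,         # applied on MEDIUM confidence trades
    "buenos_aires": 0.0,  # no correction needed
}

def predict_avg(gfs: float, ecmwf: float, icon: float) -> float:
    """The old model-averaging approach (the 14% method)."""
    return (gfs + ecmwf + icon) / 3

def predict_v1_5(city: str, gfs: float, ecmwf: float, icon: float) -> float:
    """Model maximum + city bias (the 57% method)."""
    return max(gfs, ecmwf, icon) + CITY_BIAS_C.get(city, 0.0)
```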

City by City

Ankara was the clear winner. 80% accuracy on HIGH confidence trades (4 out of 5). The +0.5°C bias correction I'd added after the 0-7 week was doing exactly what I hoped — pushing the forecast up to match reality. Ankara also had the advantage of being in a timezone where most US-based bots were asleep during the best buying window.

Buenos Aires looked perfect — 2 for 2, 100% — but two trades isn't enough to call anything. No bias correction needed, which was a good sign. I wanted more data before getting excited.

Paris landed at 67% overall (2 out of 3). The one miss was a MEDIUM confidence trade that I shouldn't have taken. Restricted to HIGH only, Paris was 2 for 2. That restriction stayed.

Wellington was a coin flip — 50%, 2 out of 4. The two misses were both volatile weather days where the actual temperature swung further than any model predicted. Wellington has some of the most unpredictable weather of the five cities, and the data confirmed it.

Seoul was the worst performer at 33% on HIGH confidence (1 out of 3). But here's what was interesting: with the +0.5°C bias correction applied to MEDIUM confidence trades, Seoul went 2 for 3 on MEDIUM, lifting its overall accuracy from 33% to 50%. Still not great, but the correction was clearly helping. Seoul stayed in the portfolio, on watch.

HIGH vs MEDIUM: The Gap That Matters

Across all cities, the confidence filter did its job:

HIGH confidence (all three models within 1.5°C): 62% win rate, 8 out of 13. These were the trades where GFS, ECMWF, and ICON basically agreed on the temperature. When they agree, they're usually right.

MEDIUM confidence (model spread of 1.5-3.0°C): 38% win rate, 3 out of 8. The bias correction helped, but the underlying uncertainty was still there. These trades were riskier, and the numbers showed it.

The gap between 62% and 38% was big enough to build a rule around: Ankara, Paris, and Wellington only trade on HIGH. Seoul and Buenos Aires are allowed MEDIUM with bias correction. Anything above 3.0°C spread gets skipped entirely.
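Condensed into code, the whole rule set fits in a dozen lines. A sketch with my own helper names; the thresholds are the ones above:

```python
# Confidence tiers from the spread of the three model forecasts,
# plus the per-city rules settled on above.

MEDIUM_ALLOWED = {"seoul", "buenos_aires"}  # Ankara, Paris, Wellington: HIGH only

def confidence(gfs: float, ecmwf: float, icon: float) -> str:
    spread = max(gfs, ecmwf, icon) - min(gfs, ecmwf, icon)
    if spread <= 1.5:
        return "HIGH"    # all three models within 1.5C: 62% in the DRY RUN
    if spread <= 3.0:
        return "MEDIUM"  # 1.5-3.0C spread: 38% in the DRY RUN
    return "SKIP"        # wider than 3.0C gets skipped entirely

def should_trade(city: str, level: str) -> bool:
    if level == "HIGH":
        return True                    # every city trades on HIGH
    if level == "MEDIUM":
        return city in MEDIUM_ALLOWED  # only with bias correction
    return False                       # SKIP never trades
```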

The Dynamic Exit That Sold Me

Here's where the numbers got interesting beyond just forecast accuracy.

Ankara, March 10. I'd flagged this trade during the DRY RUN — it wasn't a real position, but I was tracking what would've happened.

The bot would've bought at $0.045 around 3 AM (Ankara time). By 7 AM, the price had climbed to $0.107. That's a 2.4× move in four hours. Under the dynamic exit rules from Episode 4, the 4-hour tier triggers a sell at 2.0× the buy price. So the bot would've sold at $0.09 or higher.

Profit: at least +$0.045 per share. On a $1 position, that's roughly +100% in four hours.
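For the curious, here's roughly what that tiered exit check looks like. Only the 4-hour tier at 2.0× is from Episode 4 as described here; the other tiers and the helper names are placeholders I've made up for the sketch:

```python
# Tiered dynamic exit: the longer a position is open, the smaller the
# price multiple needed to trigger a sell. Only the 4-hour / 2.0x tier
# comes from Episode 4; the other tiers are placeholders.

EXIT_TIERS = [
    (2.0, 3.0),  # held up to 2 hours: sell at 3.0x (placeholder)
    (4.0, 2.0),  # held up to 4 hours: sell at 2.0x the buy price
    (8.0, 1.5),  # held up to 8 hours: sell at 1.5x (placeholder)
]

def exit_multiple(hours_held: float) -> float | None:
    for max_hours, multiple in EXIT_TIERS:
        if hours_held <= max_hours:
            return multiple
    return None  # past every tier: hold to resolution

def should_sell(buy_price: float, price_now: float, hours_held: float) -> bool:
    multiple = exit_multiple(hours_held)
    return multiple is not None and price_now >= buy_price * multiple

# The Ankara simulation: bought at $0.045, price at $0.107 four hours later.
# 0.107 >= 0.045 * 2.0, so the 4-hour tier fires and locks in ~+100%.
assert should_sell(0.045, 0.107, 4.0)
```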

And here's the thing — my forecast for that trade was technically wrong. The predicted bucket didn't match the resolution bucket. But it didn't matter. The price moved in my direction fast enough that the dynamic exit would've locked in profit regardless of the final answer.

That one simulation did more to convince me than any win rate calculation. The strategy doesn't need perfect forecasts. It needs prices to move in the right direction before resolution — and the data showed they often do.

What Worked

Studying the successful traders first. If I'd kept trying to invent my own approach instead of copying what gopfan2 demonstrated on-chain, I'd probably still be iterating on EV formulas.

Running DRY RUN for a full week before the strategy pivot, and then another full week after. Two weeks of paper trading caught every assumption error without costing me a cent.

Focusing on non-US cities. Less competition, more predictable models, better time zone coverage. Ankara alone justified the entire city selection strategy.

What Didn't

US cities were a waste from the start. Chicago and Seattle had 4-5°F errors that no amount of bias correction could fix. I should've dropped them after the first three data points instead of waiting for seven.

The confidence filter was initially too strict. I spent one stretch of about 10 hours seeing zero trading opportunities because every city had a model spread above 1.5°C. I loosened it for Seoul and Buenos Aires with the bias correction, which brought opportunities back without killing accuracy.

MEDIUM confidence trades on cities without bias correction. Paris at 0% on MEDIUM told me everything I needed to know. Some cities just don't work when the models disagree.

Twenty-one data points isn't a lot. I knew the sample was small. But the direction was clear enough to make a decision: switch to real money and keep collecting data.

Key Takeaways

  • Model max + city bias (v1.5): 57% overall, up from 14% with model averaging. Same data, different method.
  • Ankara at 80% on HIGH confidence was the standout. Seoul at 33% was the weakest but improved to 50% with bias correction on MEDIUM trades.
  • HIGH confidence trades win at 62%. MEDIUM at 38%. The filter works.
  • Dynamic exit simulation showed profitable trades even when the forecast was wrong. Price movement matters more than prediction accuracy.

What's Next

The DRY RUN data was in. The confidence filters were tuned. The city list was set. I'd been staring at paper trades for two weeks, and the numbers were saying what I wanted to hear: this might actually work with real money.

Episode 7 covers what happened when I switched to live trading — and the exit strategy change that turned out to be more important than anything I'd done with the forecasting.



More updates on the way. If you're working on something similar or found a smarter way to do it, drop it in the comments — the more we share, the faster we all move.

Disclaimer: This blog documents my personal learning journey. Nothing here is financial advice.
