Weather Bot — Episode 11: When Data Rewrote My Entire Polymarket Strategy — MEDIAN, Model Swaps, and the LA Disaster

I spent two weeks picking forecast models based on gut feeling and small samples. Then I got enough data to actually check, and the data told me I was wrong about almost everything.

Six out of nine cities had the wrong model selected. Not slightly off — systematically inferior models chosen with 10-15 data points when I needed 50+. And one city showed me exactly how bad it could get.

What This Post Covers

The episode where I stopped guessing and started measuring. A 17°F forecast disaster in Los Angeles that exposed a fundamental flaw in how the bot picks temperatures, the fix that came from it, city-by-city model accuracy data, and a confidence-level analysis that made one decision painfully obvious.

The LA Disaster

Los Angeles, March 28. My bot predicted a high of 86°F. Actual high: 69°F. Off by seventeen degrees Fahrenheit.

I pulled the model data:

Models for LA, March 28:
  GFS:   69.0°F
  ICON:  70.2°F
  AIFS:  78.2°F
  HRRR:  69.0°F
  ECMWF: 86.4°F  ← the outlier

MAX mode picked: 86.4°F
Actual:          69.4°F
Error:           -17.0°F

ECMWF — supposedly the world's most accurate model — predicted 86.4°F while every other model said 69-70°F. My bot was set to MAX mode: always pick the highest prediction. So it grabbed the one outlier and ran with it.

ECMWF's grid cell for the LA airport coordinates probably sits partially over inland desert. The model isn't globally wrong — it's wrong for this specific location. And MAX mode amplified that by always choosing the highest value, no matter how far it deviated from everything else.

MEDIAN: The Fix

The solution hit me when I sorted the five model values:

Sorted: [69.0, 69.0, 70.2, 78.2, 86.4]
Middle value: 70.2°F
Error vs actual: -0.8°F  ← nearly perfect

Pick the middle, not the maximum. MEDIAN ignores one or two outliers and goes with what the majority agrees on. For LA, it would've been off by 0.8°F instead of 17°F.

I added MEDIAN as a selection mode alongside MAX and single-model options. Any new city starts with MEDIAN until I have enough data to know which specific model works best there.
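
Here's a minimal sketch of what that selection logic looks like. The function name and the shape of the model dictionary are made up for illustration, but the MAX / MEDIAN / single-model behavior matches what I described above:

  from statistics import median

  def select_forecast(model_temps, mode="MEDIAN"):
      """Pick one number from the per-model predictions for a city."""
      values = list(model_temps.values())
      if mode == "MAX":
          return max(values)            # old behavior: trust the highest model
      if mode == "MEDIAN":
          return median(values)         # new default: ignore outliers on both ends
      if mode in model_temps:
          return model_temps[mode]      # single-model mode, e.g. mode="GFS"
      raise ValueError(f"unknown selection mode: {mode}")

  la = {"GFS": 69.0, "ICON": 70.2, "AIFS": 78.2, "HRRR": 69.0, "ECMWF": 86.4}
  select_forecast(la, "MAX")     # 86.4 -> the 17°F miss
  select_forecast(la, "MEDIAN")  # 70.2 -> off by 0.8°F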

The Model Accuracy Report

With the forecast tracker running for two weeks on the VPS, I finally had real numbers on which models perform best at the 36-48 hour window — the time when my bot actually places orders.

The results were humbling. Ankara — my best-performing city — should've been using GFS the whole time, not ECMWF. GFS hit 83% accuracy at 0.57° average error. Tokyo was the same story: GFS at 100% accuracy (5/5), but I'd been using MAX mode. Atlanta turned out to want AIFS, not the GFS→ICON combo I'd been running.

The worst part was Shanghai. I had the bias direction completely backwards. My setting was +0.4° (assuming underprediction) when the actual error was -0.70° (overprediction). The model was running hot, not cold. I'd been pushing it further in the wrong direction.

NYC stood out for a different reason: 0% accuracy across 8 trades. 4.55° average error. No model could predict NYC weather reliably at the 36-48h window. I kept it active because it occasionally produced MEDIUM confidence wins (+$2.00 on one trade), but the data was saying this city might not be worth it.
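
For the curious, the report itself is nothing fancy. Something like the snippet below, run over the tracker's log, gets you per-city, per-model accuracy and average error at the 36-48h window. The file name and column names are placeholders here, not the tracker's actual schema:

  import pandas as pd

  # Placeholder schema: one row per forecast with the lead time in hours,
  # the predicted high, the actual high, and whether it landed in the
  # correct Polymarket bucket.
  df = pd.read_csv("forecast_log.csv")

  window = df[df["lead_hours"].between(36, 48)].copy()
  window["abs_error"] = (window["predicted"] - window["actual"]).abs()

  report = (
      window.groupby(["city", "model"], as_index=False)
      .agg(n=("abs_error", "size"),
           avg_error=("abs_error", "mean"),
           accuracy=("correct_bucket", "mean"))
      .sort_values(["city", "accuracy"], ascending=[True, False])
  )
  print(report)  # the best model for each city is the top row of its group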

HIGH vs MEDIUM: The Money Split

This was the number that made me sit up straight.

  Confidence   ROI       Profit
  HIGH         +21.1%    +$3.47
  MEDIUM       -34.0%    -$7.36

MEDIUM confidence trades were destroying my returns. If I'd only taken HIGH trades, my portfolio would've been +$3.47 instead of -$3.89. The MEDIUM trades weren't just underperforming — they were actively eating my profits.

But it wasn't the same everywhere. Miami made +$2.79 on MEDIUM trades (1 out of 2 winning). NYC made +$2.00 (1 out of 3). Those two cities had something about their weather patterns that made MEDIUM confidence still tradeable.

So the fix wasn't "kill all MEDIUM" — it was "kill MEDIUM where it loses." I removed MEDIUM from Ankara (0/2, -$1.92), Atlanta (0/2, -$1.86), Toronto (0/2, -$1.76), and several others. Kept it for Miami and NYC only.
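
In the config this became a per-city whitelist of confidence levels, roughly like the sketch below. The names and structure are illustrative; the idea is HIGH everywhere, MEDIUM only where it has actually paid:

  # Illustrative per-city trade filter: HIGH allowed everywhere,
  # MEDIUM only where the data says it still makes money.
  ALLOWED_CONFIDENCE = {
      "default": {"HIGH"},
      "Miami":   {"HIGH", "MEDIUM"},
      "NYC":     {"HIGH", "MEDIUM"},
  }

  def should_trade(city, confidence):
      return confidence in ALLOWED_CONFIDENCE.get(city, ALLOWED_CONFIDENCE["default"])

  should_trade("Ankara", "MEDIUM")  # False: cut after 0/2 and -$1.92
  should_trade("Miami", "MEDIUM")   # True: 1/2 and +$2.79 says keep it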

The Bias Recalibration

Every city has a systematic bias — models consistently run hot or cold for that location. My original values were set with small samples. Now I had 50+.

Most corrections were small adjustments. Buenos Aires went from +1.6° to +0.95° (I'd been overcompensating). Seattle from +2.3° to +1.35°. Tokyo from -1.5° to -1.0°.

But Shanghai's was different. The direction itself was wrong — +0.4° became -0.13°. I'd been pushing the forecast higher when it was already too high. Every Shanghai trade had a built-in headwind from my own bias correction making it worse. Fixing that one city's direction felt like more progress than all the other adjustments combined.
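
The recalculation itself is one line once the log is loaded. The sketch below assumes the correction is simply the mean signed error (actual minus predicted) per city over the 36-48h window; the values I actually settled on aren't a straight mean, so treat this as the idea rather than the exact recipe:

  import pandas as pd

  # Same placeholder log as above. Signed error = actual - predicted:
  # a negative mean means the models run hot for that city, so the bias
  # correction should pull forecasts down, not push them up.
  df = pd.read_csv("forecast_log.csv")
  window = df[df["lead_hours"].between(36, 48)]

  signed_error = window["actual"] - window["predicted"]
  new_bias = signed_error.groupby(window["city"]).mean().round(2)
  print(new_bias)  # a sign flip versus the old setting (hello, Shanghai) is the red flag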

The Comeback: London and Paris

Both cities had been deactivated back in Episode 9 for poor performance. I'd assumed they were just hard to predict. The new data told a different story.

London with AIFS: 0.12°C average error. 100% accuracy. Five out of five predictions landed in the correct bucket. Every single one.

Paris with ICON: 0.22°C average error. Also 100%. Five out of five.

These were better numbers than any other city in the portfolio. The previous failures weren't because London and Paris are unpredictable; they happened because I'd picked the wrong models and bought too early (54h+). With the right model, HIGH confidence only, and the 48-hour buy limit from Episode 10, both cities became the most accurate in the system.

I reactivated both. Seoul stayed off — ECMWF (the best available model for Seoul) still showed +3.0° average error at 36-48h. Spring in Seoul creates temperature swings that none of the models handle well.
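
Put together, the per-city setup after this episode looks roughly like the snippet below. The dict layout is invented for the post; the model picks and active flags are the real ones from the report:

  # Rough shape of the per-city config after the Episode 11 swaps.
  CITY_CONFIG = {
      "Ankara":  {"active": True,  "mode": "GFS"},    # was ECMWF
      "Tokyo":   {"active": True,  "mode": "GFS"},    # was MAX
      "Atlanta": {"active": True,  "mode": "AIFS"},   # was the GFS/ICON combo
      "London":  {"active": True,  "mode": "AIFS"},   # reactivated: 5/5, 0.12°C error
      "Paris":   {"active": True,  "mode": "ICON"},   # reactivated: 5/5, 0.22°C error
      "Seoul":   {"active": False, "mode": "ECMWF"},  # still +3.0° error, stays off
      # any new city starts on MEDIAN until it has 50+ data points
  }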

Key Takeaways

  • MAX mode is dangerous when one model is an outlier. MEDIAN picks the middle value and ignores extremes. LA's 17°F error would've been 0.8°F with MEDIAN.
  • 6 out of 9 cities had the wrong model. You need 50+ data points per city to pick reliably — 10-15 isn't enough.
  • HIGH confidence: +21.1% ROI. MEDIUM: -34.0%. Remove MEDIUM from cities where it loses, keep it where it works (Miami, NYC).
  • London and Paris came back at 100% accuracy once I used the right models. Bad performance isn't always a city problem — sometimes it's a model selection problem.

What's Next

The model swaps were done. The bias corrections were updated. MEDIUM was restricted to cities that earned it. In Episode 12, I'll cover what happened when Polymarket added 11 new cities to the weather markets — how I evaluated each one without guessing, and which ones turned out to be almost perfectly predictable.

← Previous: Episode 10: Moving to Oracle Cloud and the Stop-Loss Surgery       Next: Episode 12: From 15 Cities to 26 →


More updates on the way. If you're working on something similar or found a smarter way to do it, drop it in the comments — the more we share, the faster we all move.

Disclaimer: This blog documents my personal learning journey. Nothing here is financial advice.
