Weather Bot — Episode 12: From 15 Cities to 26 — How I Evaluated New Polymarket Weather Markets
Polymarket kept adding weather markets. By late March, I was spotting cities that didn't exist when I started this project two months earlier. Hong Kong, Singapore, Sao Paulo, Madrid — each one a potential new revenue source, or a potential new way to lose money.
After the LA disaster in Episode 11, I wasn't about to add cities by guessing which model would work. So I built a pipeline: two weeks of data collection before any real money goes in. No shortcuts.
What This Post Covers
The expansion from 15 to 26 cities. How I found the new markets, the two-week evaluation process, which cities passed (10 out of 11), which one failed spectacularly, and the quirks I discovered along the way — like a US city that resolves weather from a Space Force base.
The Discovery
I wrote a script that checked Polymarket's gamma API for temperature markets in every major city I could think of. Eleven new ones came back positive:
- Asia-Pacific: Hong Kong, Singapore, Taipei, Beijing
- Americas: Houston, LA, San Francisco, Denver, Austin, Sao Paulo
- Europe: Madrid
Each one needed research before anything else: which airport weather station does it use, what timezone, when does peak temperature hit, does it resolve through Weather Underground or something else?
Some of the answers caught me off guard. Houston uses Hobby Airport, not the bigger Bush Intercontinental — different location, different microclimate. Denver resolves from Buckley Space Force Base, not Denver International. I spent an embarrassing amount of time wondering why my Denver forecasts were off before finding that one. Taipei is the only city that resolves via NOAA instead of Weather Underground. And Singapore is labeled in Fahrenheit on Polymarket but the market question uses Celsius — the kind of thing that would've cost me money if I hadn't read the fine print.
The Two-Week Pipeline
Every new city went through the same process. No exceptions.
Weeks 1-2: snapshot mode only. The bot collects forecast data every 6 hours but doesn't trade. Every model gets recorded — GFS, ECMWF, ICON, AIFS, plus regional models like HRRR for US cities. Market history gets collected daily: winning bucket, actual temperature, price movements.
After two weeks: I run the accuracy analysis. Each city needs at least 8 data points at the 36-48h window. Pick the best model, set an entry price based on historical winning-bucket prices, decide whether to trade 1 bucket or 2, and activate with HIGH confidence only.
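The selection step can be sketched roughly like this. The field names and tie-breaking rule are my illustration, not the bot's actual schema — the post only specifies the 8-point minimum and "pick the best model":

```python
from collections import defaultdict

MIN_POINTS = 8  # minimum data points at the 36-48h window

def pick_best_model(snapshots):
    """Each snapshot: (model, predicted_temp, actual_temp, bucket_hit).

    Returns (score, model, mae, accuracy) for the model with the
    highest correct-bucket rate, breaking ties by lower mean error.
    Returns None if no model has enough data -> city stays off.
    """
    by_model = defaultdict(list)
    for model, pred, actual, hit in snapshots:
        by_model[model].append((abs(pred - actual), hit))

    best = None
    for model, rows in by_model.items():
        if len(rows) < MIN_POINTS:
            continue  # not enough data to judge this model
        mae = sum(err for err, _ in rows) / len(rows)
        accuracy = sum(1 for _, hit in rows if hit) / len(rows)
        score = (accuracy, -mae)  # accuracy first, then lower MAE
        if best is None or score > best[0]:
            best = (score, model, mae, accuracy)
    return best
```

Hong Kong's result in these terms: AIFS at 0.20°C MAE, 8 of 8 buckets correct.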
It felt slow. Two weeks of watching data accumulate while those markets were live and tradeable. But after the LA disaster — 17°F error because I guessed instead of measuring — slow was the right speed.
API Growing Pains
Adding 11 cities bumped API calls by 73%. I ran the math:
Before: 15 cities × 3 calls × 96 scans/day = 4,320/day
After: 26 cities × 3 calls × 96 scans/day = 7,488/day
Open-Meteo free limit: 10,000/day
Still under, thanks to the 15-minute forecast cache. But the first day after expansion, I got 69 rate-limit errors (HTTP 429) — all in a single hour. The bot's retry logic handled them automatically, and they haven't come back since. Probably a burst during the initial data collection when the cache was cold.
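The post doesn't show the bot's actual retry code, but the idea is simple: back off and try again on HTTP 429. A minimal sketch — the backoff parameters and the `RateLimitError` stand-in are illustrative, not the bot's real values:

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the forecast API."""

def with_retry(fetch, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(); on a rate limit, back off exponentially and
    retry, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, give up
            sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```

Injecting `sleep` as a parameter keeps the logic testable without actually waiting.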
The Results: 10 Out of 11
Hong Kong stood out immediately. AIFS predicted its temperature within 0.20°C on average at the 36-48h window. Eight data points, every single one in the correct bucket. For a market with 1°C buckets, that's essentially perfect. I set the entry price at $0.25 and activated it with high expectations.
Sao Paulo was similarly impressive — ICON at 0.42°C error, 100% accuracy across 8 data points. Singapore came in at 88% with ICON. Beijing at 88% with ECMWF. Houston at 100% with AIFS, though with only 5 data points I wanted more before getting too confident.
San Francisco was the only failure. ECMWF was the "best" model at 3.40°F average error, but 33% accuracy means two out of three predictions missed the bucket. Every model struggled — even GFS showed 4.94°F error. The marine layer and microclimates around SFO airport create conditions that no global model handles well. SF stayed off.
Entry Price Logic
I didn't guess entry prices. Each city's threshold came from the D-1 (day before) winning bucket price distribution across resolved markets.
The formula: entry ≈ D-1 median price + $0.04-0.06, targeting a 70-75% catch rate. Catch rate means: what percentage of winning buckets were available at or below this price, 24-48 hours before peak temperature?
For context, the existing cities I'd been trading showed catch rates from 73% (NYC) to 95% (Tokyo). Singapore was the trickiest — D-1 median price of $0.365 meant the winning bucket is usually expensive a day out. I set entry at $0.30, accepting lower catch rate for better risk/reward.
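Given resolved-market history, both numbers are a few lines of code. A sketch, assuming each record is the winning bucket's D-1 price (the field names and the $0.05 default premium are mine, chosen from the $0.04-0.06 range above):

```python
def catch_rate(d1_winner_prices, entry_price):
    """Fraction of resolved markets whose winning bucket could be
    bought at or below entry_price the day before resolution."""
    hits = sum(1 for p in d1_winner_prices if p <= entry_price)
    return hits / len(d1_winner_prices)

def suggest_entry(d1_winner_prices, premium=0.05):
    """Entry ~= D-1 median winning-bucket price + ~$0.04-0.06."""
    prices = sorted(d1_winner_prices)
    n = len(prices)
    median = (prices[n // 2] if n % 2 else
              (prices[n // 2 - 1] + prices[n // 2]) / 2)
    return round(median + premium, 2)
```

Singapore is the case where this broke down: with a D-1 median of $0.365, adding a premium would put entry above $0.40, so I overrode it down to $0.30 and accepted the lower catch rate.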
Dynamic Bias: No More Guessing
All 10 new cities launched with fixed bias at zero. Instead of guessing a correction value, they use a rolling calculation:
adjusted_bias = fixed_bias × 0.7 + yesterday_error × 0.3
With fixed bias at 0, it simplifies to: today's bias = 30% of yesterday's error. If the model underpredicted Hong Kong by 1.8°C yesterday, today's bias becomes +0.54°. If it overpredicted by 1.0°, bias becomes -0.30°. Self-correcting, no manual input needed.
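In code, the update is one line. A sketch with the sign convention I'm assuming from the examples above (error = actual − predicted, so underprediction gives a positive error and a positive bias):

```python
def update_bias(fixed_bias, yesterday_error):
    """adjusted_bias = fixed_bias * 0.7 + yesterday_error * 0.3"""
    return fixed_bias * 0.7 + yesterday_error * 0.3

# New cities launch with fixed_bias = 0, so the bias is just
# 30% of yesterday's error:
#   update_bias(0.0, 1.8)   # underpredicted by 1.8°C -> +0.54
#   update_bias(0.0, -1.0)  # overpredicted by 1.0°C  -> -0.30
```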
After 20+ data points per city, I'll set permanent fixed bias values. Until then, the dynamic system keeps things honest without me touching anything.
The Final Count
From 5 cities in Episode 1 to 26. Twenty-four active, two disabled (Seoul and San Francisco). The bot scans every market, every 3 minutes, 24 hours a day, from a free cloud server in Tokyo. Each trade is still $1. The expansion added more opportunities without requiring more capital — just more data and more patience.
Key Takeaways
- Two-week data pipeline before activating any city. No guessing. Hong Kong at 0.20° error and 100% accuracy justified the wait.
- Research the airport station first. Houston uses Hobby, Denver uses a Space Force base, Taipei resolves via NOAA. Get these wrong and your forecasts are off before you start.
- 10 out of 11 new cities passed evaluation. San Francisco was the only failure — marine layer defeats all models.
- Dynamic bias at 0.3× yesterday's error lets new cities self-correct without manual calibration.
What's Next
Twenty-six cities running, data accumulating, the strategy refined from dozens of iterations. And then Polymarket changed the rules. In Episode 13, I'll cover the fee structure change that hit weather markets, a ghost trade that almost cost me $12, and where the numbers actually stand now.
← Previous: Episode 11: When Data Rewrote My Entire Polymarket Strategy Next: Episode 13: The Fee Change →
More updates on the way. If you're working on something similar or found a smarter way to do it, drop it in the comments — the more we share, the faster we all move.
Disclaimer: This blog documents my personal learning journey. Nothing here is financial advice.