Database Strategy for AI Applications: Scaling from Local to Oracle Cloud

When I started building the x402 Protocol, everything went into JSON files. Scores, logs, user data, API responses — all of it dumped into flat files that I could open in a text editor and read with my eyes. It worked. For about two weeks.

Then the SpeedTap Telegram bot started getting actual users, and the problems arrived. Slow reads when the file got large. Corrupted data when two processes tried to write at the same time. No way to query "show me the top 10 scores from yesterday" without loading the entire file into memory and filtering it in Python.

JSON files aren't a database strategy. They're a prototype that overstayed its welcome. Here's what replaced them and why it's been enough for everything I've built since.

What This Post Covers

The database decisions I made across each project, why SQLite runs almost everything in production, where JSONL logging fits as a lightweight alternative, what I haven't needed yet and why I'm not reaching for bigger tools prematurely, and the backup pattern that gives me peace of mind without adding complexity.

The JSON File Trap

For prototypes and first-day builds, JSON files are irresistible. No setup. No driver installation. No schema definition. Just open a file, write a Python dictionary, close the file. I can see the data by opening the file. I can edit the data by editing the file. Everything is visible and simple.

The problems show up the moment a project moves past "just me testing locally."

Concurrent writes corrupt data. Two processes writing to the same JSON file at the same time can produce half-written content. I lost a morning of SpeedTap leaderboard data this way — the file had a truncated array because the bot process and the web process both tried to write simultaneously.

Queries require loading everything. "What's the average score for Hard mode?" means loading all scores into memory, filtering by mode, and calculating. With 50 users this is instant. With 5,000 it's slow. With 50,000 it's unusable.

No schema enforcement. JSON doesn't care what shape your data is. That sounds like freedom until you realize your bot has been writing user_name in some entries and username in others for the past three days, and now half your queries return nothing because the field name doesn't match.

Every project I've built has started with JSON files and moved away from them within the first week. The pattern is so consistent that I now force myself to start with SQLite from day one, even for tiny projects. The ten minutes of setup saves hours of data migration later.
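To make the gap concrete, here is what the "average score for Hard mode" question looks like both ways. This is a sketch with illustrative file, table, and field names, not code from any of my projects:

# Sketch: the same question answered the JSON way and the SQLite way
# (file names, table names, and fields are illustrative)
import json
import sqlite3

# JSON approach: load every record into memory, filter in Python
with open("scores.json", encoding="utf-8") as f:
    scores = json.load(f)
hard = [s["score"] for s in scores if s["mode"] == "hard"]
average = sum(hard) / len(hard) if hard else 0

# SQLite approach: let the database scan, filter, and aggregate
conn = sqlite3.connect("scores.db")
(average,) = conn.execute(
    "SELECT AVG(score) FROM scores WHERE mode = ?", ("hard",)
).fetchone()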

SQLite Runs Almost Everything

SQLite is a weird database. It's "serverless" in the original sense — there's no server process to install, configure, or manage. The entire database is a single file on disk. You interact with it through SQL queries. It's included by default in Python, which means there's literally nothing to install.

For solo builders running projects on a single VPS, SQLite is surprisingly powerful. It handles thousands of reads per second without breaking a sweat. It supports proper SQL queries, indexing, transactions, and schema enforcement. And the entire thing fits in one file that I can copy with scp if I need to move it.

Here's what's running on SQLite right now across my Oracle Cloud instance:

# SpeedTap — 12 tables
users, scores, daily_challenges, duels, purchases, referrals,
achievements, settings, notifications, play_counts,
challenge_codes, admin_stats

# x402 Protocol — stats + cache
kr_sentiment_cache.json → SQLite migration (pending)
stats.jsonl → read by admin dashboard

# ACP Agent — task history
task_log, verification_results

SpeedTap alone has 12 tables. Users, scores, daily challenges, duels, purchases, referral tracking, notification history. All of it in a single .db file that's currently about 2 MB. Claude Code wrote every table definition and every query. I described what I needed ("store the user's best time per difficulty mode, with a timestamp"), and Claude Code translated that into a CREATE TABLE statement and the corresponding Python functions.
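For a sense of what that translation looks like, here is roughly what one such description turns into. The table and column names below are hypothetical stand-ins, not the actual SpeedTap schema:

# Sketch: "store the user's best time per difficulty mode, with a timestamp"
# (hypothetical table and column names, not the real SpeedTap schema)
import sqlite3
import time

conn = sqlite3.connect("speedtap.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS best_times (
        user_id    INTEGER NOT NULL,
        mode       TEXT    NOT NULL,   -- e.g. 'easy', 'normal', 'hard'
        best_ms    INTEGER NOT NULL,   -- best time in milliseconds
        updated_at INTEGER NOT NULL,   -- unix timestamp
        PRIMARY KEY (user_id, mode)
    )
""")

def record_time(user_id: int, mode: str, elapsed_ms: int) -> None:
    # Insert the first result, or update only when the new time is better
    with conn:
        conn.execute("""
            INSERT INTO best_times (user_id, mode, best_ms, updated_at)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(user_id, mode) DO UPDATE SET
                best_ms = excluded.best_ms,
                updated_at = excluded.updated_at
            WHERE excluded.best_ms < best_times.best_ms
        """, (user_id, mode, elapsed_ms, int(time.time())))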

Why SQLite and not PostgreSQL or MySQL: Those are better databases for large-scale, multi-server architectures. They're also dramatically more complex to set up, maintain, and back up. For a solo builder running everything on one VPS, that complexity has no payoff. SQLite gives me 95% of the capability at 5% of the operational burden. If any project outgrows SQLite, I'll migrate then. None have.

JSONL for Logs and Events

There's one place where flat files still make sense: event logging. Not the structured data that runs the application, but the stream of events I use for monitoring and debugging.

For the x402 Protocol, every API call, every Claude invocation, every cache hit, and every payment writes a line to a JSONL file:

# stats.jsonl — one event per line
{"ts": 1744785600, "type": "api_call", "endpoint": "kr-sentiment", "paid": true, "price_usd": 0.05}
{"ts": 1744785601, "type": "claude_call", "cost_usd": 0.003, "tokens_in": 2073}
{"ts": 1744785602, "type": "cache_hit", "endpoint": "kr-sentiment"}

JSONL (JSON Lines) is one JSON object per line. No arrays, no nesting. Appending a new event means opening the file in append mode and writing one line. It's fast, it's safe against concurrent writes (append is atomic on most filesystems), and I can grep through it with standard command-line tools.
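The write side is a few lines of Python. This is a sketch of the pattern rather than the exact x402 logging code:

# Sketch: append one JSON object per line (pattern only, not the exact x402 code)
import json
import time

def log_event(event: dict, path: str = "stats.jsonl") -> None:
    event.setdefault("ts", int(time.time()))
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event, ensure_ascii=False) + "\n")  # single append, no read-modify-write

log_event({"type": "cache_hit", "endpoint": "kr-sentiment"})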

The admin dashboard reads this file and generates daily summaries. Telegram bot commands like /stats and /cost parse it for real-time metrics. It's not a database — it's a log stream. I don't query it for random access; I process it sequentially.
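Reading it back is just as plain: stream the file line by line and aggregate. A sketch of the kind of summary this produces, not the actual dashboard code:

# Sketch: one sequential pass over the event log (illustrative, not the real dashboard)
import json
from collections import Counter

def summarize(path: str = "stats.jsonl") -> dict:
    counts, claude_cost = Counter(), 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            counts[event["type"]] += 1
            claude_cost += event.get("cost_usd", 0.0)
    return {"events": dict(counts), "claude_cost_usd": round(claude_cost, 4)}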

The distinction matters: structured data that the application depends on goes in SQLite. Event streams that I use for monitoring and analysis go in JSONL. Mixing them — putting application state in log files or logging events into database tables — creates confusion about which is the source of truth.

Backups Without Complexity

The fear with any single-file database is losing it. If the .db file gets corrupted or the disk fails, everything's gone. For a solo builder without a dedicated backup infrastructure, this needs a simple solution.

My backup approach is embarrassingly simple: a cron job that copies the SQLite file to a different location every 6 hours.

# crontab — runs every 6 hours
0 */6 * * * cp /home/ubuntu/speedtap/speedtap.db \
  /home/ubuntu/backups/speedtap_$(date +\%Y\%m\%d_\%H).db

Backups older than 7 days get cleaned up by another cron job. The result is about 28 point-in-time snapshots that let me recover to any 6-hour window in the past week.
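The cleanup can be as small as a script that cron runs once a day. A sketch of one way to do it, with the backup path and filename pattern assumed from the job above:

# Sketch: prune snapshots older than 7 days (path and pattern assumed from the backup job)
import time
from pathlib import Path

BACKUP_DIR = Path("/home/ubuntu/backups")
MAX_AGE_SECONDS = 7 * 24 * 3600

now = time.time()
for snapshot in BACKUP_DIR.glob("speedtap_*.db"):
    if now - snapshot.stat().st_mtime > MAX_AGE_SECONDS:
        snapshot.unlink()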

For the x402 Protocol's sentiment cache, I use atomic writes — write to a temp file first, then rename it over the existing file. This prevents the half-written-file corruption that plagued my JSON file era. If the process crashes mid-write, the old file survives intact.
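In Python the pattern looks roughly like this. The cache filename matches the one mentioned earlier; the rest is illustrative:

# Sketch: write to a temp file in the same directory, then rename over the original
import json
import os
import tempfile

def atomic_write_json(data: dict, path: str = "kr_sentiment_cache.json") -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())       # make sure the bytes are on disk before the swap
    os.replace(tmp_path, path)     # atomic rename: a crash before this line leaves the old file intact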

Is this enterprise-grade? No. Would a PostgreSQL replication setup be more robust? Yes. Is the extra robustness worth the extra complexity for a solo project? Not yet. The right backup strategy is the one you actually run, not the one you plan to set up someday.

What I Haven't Needed Yet

Three database technologies keep coming up in "AI developer stack" articles, and I haven't adopted any of them:

PostgreSQL. The standard recommendation for any "real" project. Powerful, well-supported, battle-tested. But it requires running a database server process, managing connections, configuring authentication, handling upgrades. For a single-server setup where SQLite handles everything, the added complexity has no payoff. If I ever run multiple application servers that need to share a database, PostgreSQL becomes necessary. Until then, it's overhead.

Redis. In-memory cache and message broker. I recently signed up and started experimenting with it for a new project. For the x402 Protocol's caching needs, a Python dictionary with TTL checks has been sufficient. Redis would make sense if I needed shared cache across multiple processes or persistence guarantees for cached data. I'm watching for a use case that justifies the setup, but I'm not forcing it.

Edge databases (Cloudflare D1 and similar). SQL databases that run at the edge, close to users globally. The pitch is compelling for latency-sensitive applications. My backends all run on a single Oracle VPS in Tokyo, so there's no multi-region data problem to solve. If I built a project that served personalized data to users worldwide and the latency mattered, an edge database would be worth exploring. For now, it's a solution looking for a problem in my stack.

The pattern across all three: each is a good tool that solves a real problem I don't currently have. Adopting them prematurely would add operational complexity without improving anything users experience. The discipline is to reach for tools when the problem arrives, not before.
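For reference, the "Python dictionary with TTL checks" that stands in for Redis is only a few lines. A sketch of the idea, not the exact x402 cache code:

# Sketch: in-process cache with TTL checks (the idea, not the exact x402 cache)
import time

_cache: dict[str, tuple[float, object]] = {}   # key -> (expires_at, value)

def cache_get(key: str):
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]
    _cache.pop(key, None)          # expired or missing
    return None

def cache_set(key: str, value, ttl_seconds: int = 300) -> None:
    _cache[key] = (time.time() + ttl_seconds, value)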

Where to Start

If you're still using JSON files for application data, the migration path to SQLite is shorter than it looks.

Pick one project. Define the tables you need (Claude Code can generate the schema from a description of your data). Write a migration script that reads your JSON files and inserts the data into SQLite. Update your application code to read from and write to the database instead of the file. Test. Deploy.

The whole migration takes maybe an hour for a small project. The queries Claude Code writes will be correct almost every time because SQL is well-defined and well-documented. You don't need to become a database expert. You need to describe what data you have and what questions you want to ask of it.
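For a simple project, the migration script itself can be a dozen lines. This sketch assumes a scores.json holding a flat list of objects with user_id, mode, score, and ts fields, which is illustrative rather than any real project's format:

# Sketch: one-off JSON-to-SQLite migration (field names are illustrative)
import json
import sqlite3

with open("scores.json", encoding="utf-8") as f:
    records = json.load(f)             # assumes a flat list of score objects

conn = sqlite3.connect("scores.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS scores (
        user_id INTEGER NOT NULL,
        mode    TEXT    NOT NULL,
        score   INTEGER NOT NULL,
        ts      INTEGER NOT NULL
    )
""")
with conn:                             # one transaction for the whole insert
    conn.executemany(
        "INSERT INTO scores (user_id, mode, score, ts) VALUES (?, ?, ?, ?)",
        [(r["user_id"], r["mode"], r["score"], r["ts"]) for r in records],
    )
print(f"Migrated {len(records)} rows")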

Start simple. One database file. A few tables. Basic queries. Add indexes when something feels slow. Add backups the day you have data you'd miss if it disappeared. Each step is small and each one compounds into a foundation that doesn't fall apart when real users show up.

What's Next

Data stored correctly. Deployments automated. Secrets secured. The next question is visibility — how do you know if your running services are actually healthy? The next post covers monitoring and logging in production, the Telegram bot commands that give me real-time metrics, and the simple alerting setup that tells me when something breaks before users notice.



More posts in this series will cover the actual stack — monitoring, scaling patterns, the philosophy of AI-assisted coding, and the path to sustainable revenue. If you're working on shipping something with AI tools and have questions, drop them in the comments — the more we share, the faster we all move.

Disclaimer: This blog documents practical development workflows based on personal experience. Nothing here is financial, legal, or professional advice.
