There's a massive gap between building an API that works and building one that scales. After three years maintaining APIs that handle 50M+ requests per day, I've collected hard-won lessons that I wish someone had told me on day one.
The Database Will Be Your Bottleneck
Not "might be." Will be. Most performance issues I've debugged ultimately traced back to database queries. The good news? Most are preventable with proper design upfront.
N+1 Queries: The Silent Killer
You've probably seen this pattern:
```js
// Get all users
const users = await db.query('SELECT * FROM users LIMIT 10');

// For each user, get their posts (N queries!)
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
```
That's 1 query + 10 queries = 11 database round trips. At 5ms per query, you've blown 55ms before you've even started processing. Under load, this becomes seconds.
The fix: eager loading with a JOIN, or batching. Both are shown below.
```js
// Single query with JOIN
// (in real code, alias overlapping columns - users and posts both have an id)
const results = await db.query(`
  SELECT users.*, posts.*
  FROM users
  LEFT JOIN posts ON posts.user_id = users.id
  WHERE users.id IN (...)
`);
```
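A JOIN can also bloat the payload, since every post row repeats its user's columns. Batching gets the same result in two round trips instead of N+1. A minimal sketch, assuming the same `db` client (expanding an array for `IN (?)` is mysql2 behavior; other drivers want one placeholder per id):

```js
// Two queries total, no matter how many users
const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map((u) => u.id);

// Fetch every user's posts in one round trip
const posts = await db.query(
  'SELECT * FROM posts WHERE user_id IN (?)',
  [userIds]
);

// Group posts by user in application code
const postsByUser = new Map();
for (const post of posts) {
  if (!postsByUser.has(post.user_id)) postsByUser.set(post.user_id, []);
  postsByUser.get(post.user_id).push(post);
}
for (const user of users) {
  user.posts = postsByUser.get(user.id) ?? [];
}
```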
Index Everything You Query On
This seems obvious, but I've seen production APIs doing full table scans on millions of rows because someone forgot to add an index. Monitor your slow query logs religiously.
| Query Pattern | Index Required | Why |
|---|---|---|
| `WHERE user_id = ?` | Index on `user_id` | Direct lookup |
| `WHERE status = ? ORDER BY created_at` | Composite: `(status, created_at)` | Filter + sort in one pass |
| `WHERE email LIKE 'john%'` | Index on `email` | Prefix search (not `%john%`!) |
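As a concrete sketch, here's what those indexes look like in SQL. The syntax works in MySQL and Postgres; the table names are borrowed from the earlier examples and may differ in your schema:

```sql
-- Direct lookup: posts by owner
CREATE INDEX idx_posts_user_id ON posts (user_id);

-- Filter + sort in one pass: equality column first, ORDER BY column second
CREATE INDEX idx_users_status_created ON users (status, created_at);

-- Prefix search: usable for LIKE 'john%' (Postgres may need text_pattern_ops),
-- never for LIKE '%john%'
CREATE INDEX idx_users_email ON users (email);
```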
Caching: Your Best Friend and Worst Enemy
Caching is how you go from surviving to thriving. But poorly implemented caching causes more bugs than slow queries ever did.
Cache Invalidation: The Hard Parts
My caching strategy hierarchy:
- Short TTLs for everything (30-60 seconds) — your safety net
- Tagged invalidation — clear related keys when data changes (sketched after the cache-aside example below)
- Cache-aside pattern — app controls both read and write
- Monitoring cache hit rates — know when it's helping
```js
// Cache-aside pattern
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Cache miss - hit the database (the driver returns an array of rows)
  const rows = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  const user = rows[0] ?? null;

  // Store in cache (60s TTL); don't cache misses
  if (user) await redis.setex(cacheKey, 60, JSON.stringify(user));
  return user;
}
```
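Tagged invalidation, item 2 in the hierarchy above, can be layered on top: keep a Redis set per tag listing the cache keys it covers, and clear them together when the underlying data changes. A minimal sketch using ioredis-style commands (the `tag:` prefix is just a naming convention):

```js
// Write path: cache a value and register its key under each tag
async function setWithTags(key, value, ttlSeconds, tags) {
  await redis.setex(key, ttlSeconds, JSON.stringify(value));
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key);
  }
}

// When tagged data changes, delete every key recorded under the tag
async function invalidateTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) await redis.del(...keys);
  await redis.del(`tag:${tag}`);
}
```

So an update to user 42 calls `invalidateTag('user:42')` and clears both the cached user record and any cached list that registered itself under that tag.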
Rate Limiting: Protect Yourself from Yourself
Rate limiting isn't just about protecting against bad actors. It's about preventing one client (or one buggy deploy) from taking down your entire service.
Multi-Tier Limits
- Per-user limits: 100 requests/minute prevents abuse
- Per-endpoint limits: Expensive operations get tighter quotas
- Global limits: Never exceed your infrastructure capacity
Return proper headers so clients can self-regulate:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1707542400
```
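Here's a minimal per-user limiter that emits those headers, sketched as Express middleware over Redis. The `INCR` + `EXPIRE` fixed window is the simplest pattern that works (a sliding window is smoother at window boundaries); `req.userId` is assumed to be set by your auth layer:

```js
const LIMIT = 100;         // requests per window
const WINDOW_SECONDS = 60; // one-minute fixed window

async function rateLimit(req, res, next) {
  const windowId = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
  const key = `ratelimit:${req.userId}:${windowId}`;

  // Count this request; the first INCR creates the key
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, WINDOW_SECONDS);

  res.set('X-RateLimit-Limit', String(LIMIT));
  res.set('X-RateLimit-Remaining', String(Math.max(0, LIMIT - count)));
  res.set('X-RateLimit-Reset', String((windowId + 1) * WINDOW_SECONDS));

  if (count > LIMIT) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
}
```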
Monitoring That Actually Matters
Don't just measure uptime. Measure experience.
The Four Golden Signals
- Latency: How long does it take? (P50, P95, P99)
- Traffic: How many requests per second?
- Errors: What's your error rate?
- Saturation: How full are your resources?
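For the latency signal, here's a sketch using prom-client (one common Prometheus client for Node; any metrics library with histograms works the same way). P50/P95/P99 are computed from the buckets at query time:

```js
const client = require('prom-client');

// Buckets (in seconds) chosen to bracket typical API latencies
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

// Express middleware: time every request from start to finish
function measureLatency(req, res, next) {
  const stop = httpDuration.startTimer();
  res.on('finish', () => {
    stop({
      method: req.method,
      route: req.route ? req.route.path : req.path,
      status: res.statusCode,
    });
  });
  next();
}
```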
API Versioning: Plan for Change
You will need to make breaking changes. The only question is whether you planned for them.
My preferred approach: URL versioning with a deprecation timeline.
- /v1/users — original
- /v2/users — new version
- v1 supported for 12 months after v2 launch
- Announce deprecation early and loudly
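In Express, URL versioning is just two routers mounted side by side, with the old one announcing its retirement date via the `Sunset` header (RFC 8594). A sketch; `listUsersV1`/`listUsersV2` are hypothetical handlers and the date is illustrative:

```js
const express = require('express');
const app = express();

const v1 = express.Router();
const v2 = express.Router();

// v1 keeps working, but every response carries its end-of-life date
v1.use((req, res, next) => {
  res.set('Sunset', 'Wed, 01 Jul 2026 00:00:00 GMT');
  next();
});

v1.get('/users', listUsersV1); // original response shape
v2.get('/users', listUsersV2); // new response shape

app.use('/v1', v1);
app.use('/v2', v2);
```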
Testing at Scale
Load testing isn't optional. You need to know your breaking point before your users find it.
What to Test
- Baseline load: Can you handle 2x your peak traffic?
- Spike behavior: What happens when traffic jumps 10x suddenly?
- Degradation: How does your service fail? Gracefully or catastrophically?
Tools like k6, Gatling, or Apache JMeter can simulate realistic load patterns. Run them regularly, not just before launch.
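For spike behavior specifically, a minimal k6 script looks like this (the URL is a placeholder, and the stage targets should come from your own peak-traffic numbers):

```js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up to baseline
    { duration: '1m', target: 1000 }, // sudden 10x spike
    { duration: '2m', target: 100 },  // recover
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // fail the run if P95 latency exceeds 500ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/v1/users'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```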
The Checklist
Before you ship your next API endpoint:
- ✅ All queries have appropriate indexes
- ✅ N+1 queries eliminated
- ✅ Caching implemented with proper invalidation
- ✅ Rate limiting configured
- ✅ Error responses documented
- ✅ Monitoring and alerting set up
- ✅ Load tested at 2x expected traffic
- ✅ Versioning strategy defined
Final Thoughts
Building scalable APIs is as much about discipline as it is about technology. The patterns above won't solve every problem, but they'll prevent most of the common ones.
The best advice I can give: measure everything, assume nothing, and learn from production. Your monitoring dashboard will teach you more than any blog post ever could.