There's a massive gap between building an API that works and building one that scales. After three years maintaining APIs that handle 50M+ requests per day, I've collected hard-won lessons that I wish someone had told me on day one.
The Database Will Be Your Bottleneck
Not "might be." Will be. Most performance issues I've debugged ultimately traced back to database queries. The good news? Most are preventable with proper design upfront.
N+1 Queries: The Silent Killer
You've probably seen this pattern:
```js
// Get all users
const users = await db.query('SELECT * FROM users LIMIT 10');

// For each user, get their posts (N queries!)
for (const user of users) {
  user.posts = await db.query('SELECT * FROM posts WHERE user_id = ?', [user.id]);
}
```
That's 1 query + 10 queries = 11 database round trips. At 5ms per query, you've blown 55ms before you've even started processing. Under load, this becomes seconds.
The fix: eager loading with a JOIN, or batching. Both are shown below.
```js
// Single query with JOIN
// (in real code, alias overlapping columns - users and posts both have an id)
const results = await db.query(`
  SELECT users.*, posts.*
  FROM users
  LEFT JOIN posts ON posts.user_id = users.id
  WHERE users.id IN (...)
`);
```
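A JOIN can also bloat the payload, since every post row repeats its user's columns. Batching gets the same result in two round trips instead of N+1. A minimal sketch, assuming the same `db` client (expanding an array for `IN (?)` is mysql2 behavior; other drivers want one placeholder per id):

```js
// Two queries total, no matter how many users
const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map((u) => u.id);

// Fetch every user's posts in one round trip
const posts = await db.query(
  'SELECT * FROM posts WHERE user_id IN (?)',
  [userIds]
);

// Group posts by user in application code
const postsByUser = new Map();
for (const post of posts) {
  if (!postsByUser.has(post.user_id)) postsByUser.set(post.user_id, []);
  postsByUser.get(post.user_id).push(post);
}
for (const user of users) {
  user.posts = postsByUser.get(user.id) ?? [];
}
```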
Index Everything You Query On
This seems obvious, but I've seen production APIs doing full table scans on millions of rows because someone forgot to add an index. Monitor your slow query logs religiously.
| Query Pattern | Index Required | Why |
|---|---|---|
| `WHERE user_id = ?` | Index on `user_id` | Direct lookup |
| `WHERE status = ? ORDER BY created_at` | Composite: `(status, created_at)` | Filter + sort in one pass |
| `WHERE email LIKE 'john%'` | Index on `email` | Prefix search (not `%john%`!) |
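As a concrete sketch, here's what those indexes look like in SQL. The syntax works in MySQL and Postgres; the table names are borrowed from the earlier examples and may differ in your schema:

```sql
-- Direct lookup: posts by owner
CREATE INDEX idx_posts_user_id ON posts (user_id);

-- Filter + sort in one pass: equality column first, ORDER BY column second
CREATE INDEX idx_users_status_created ON users (status, created_at);

-- Prefix search: usable for LIKE 'john%' (Postgres may need text_pattern_ops),
-- never for LIKE '%john%'
CREATE INDEX idx_users_email ON users (email);
```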
Caching: Your Best Friend and Worst Enemy
Caching is how you go from surviving to thriving. But poorly implemented caching causes more bugs than slow queries ever did.
Cache Invalidation: The Hard Parts
My caching strategy hierarchy:
- Short TTLs for everything (30-60 seconds) — your safety net
- Tagged invalidation — clear related keys when data changes (sketched after the cache-aside example below)
- Cache-aside pattern — app controls both read and write
- Monitoring cache hit rates — know when it's helping
```js
// Cache-aside pattern
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Cache miss - hit the database (the driver returns an array of rows)
  const rows = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  const user = rows[0] ?? null;

  // Store in cache (60s TTL); don't cache misses
  if (user) await redis.setex(cacheKey, 60, JSON.stringify(user));
  return user;
}
```
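Tagged invalidation, item 2 in the hierarchy above, can be layered on top: keep a Redis set per tag listing the cache keys it covers, and clear them together when the underlying data changes. A minimal sketch using ioredis-style commands (the `tag:` prefix is just a naming convention):

```js
// Write path: cache a value and register its key under each tag
async function setWithTags(key, value, ttlSeconds, tags) {
  await redis.setex(key, ttlSeconds, JSON.stringify(value));
  for (const tag of tags) {
    await redis.sadd(`tag:${tag}`, key);
  }
}

// When tagged data changes, delete every key recorded under the tag
async function invalidateTag(tag) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length > 0) await redis.del(...keys);
  await redis.del(`tag:${tag}`);
}
```

So an update to user 42 calls `invalidateTag('user:42')` and clears both the cached user record and any cached list that registered itself under that tag.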
Rate Limiting: Protect Yourself from Yourself
Rate limiting isn't just about protecting against bad actors. It's about preventing one client (or one buggy deploy) from taking down your entire service.
Multi-Tier Limits
- Per-user limits: 100 requests/minute prevents abuse
- Per-endpoint limits: Expensive operations get tighter quotas
- Global limits: Never exceed your infrastructure capacity
Return proper headers so clients can self-regulate:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1707542400
```
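Here's a minimal per-user limiter that emits those headers, sketched as Express middleware over Redis. The `INCR` + `EXPIRE` fixed window is the simplest pattern that works (a sliding window is smoother at window boundaries); `req.userId` is assumed to be set by your auth layer:

```js
const LIMIT = 100;         // requests per window
const WINDOW_SECONDS = 60; // one-minute fixed window

async function rateLimit(req, res, next) {
  const windowId = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
  const key = `ratelimit:${req.userId}:${windowId}`;

  // Count this request; the first INCR creates the key
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, WINDOW_SECONDS);

  res.set('X-RateLimit-Limit', String(LIMIT));
  res.set('X-RateLimit-Remaining', String(Math.max(0, LIMIT - count)));
  res.set('X-RateLimit-Reset', String((windowId + 1) * WINDOW_SECONDS));

  if (count > LIMIT) {
    return res.status(429).json({ error: 'Rate limit exceeded' });
  }
  next();
}
```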
Monitoring That Actually Matters
Don't just measure uptime. Measure experience.
The Four Golden Signals
- Latency: How long does it take? (P50, P95, P99)
- Traffic: How many requests per second?
- Errors: What's your error rate?
- Saturation: How full are your resources?
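For the latency signal, here's a sketch using prom-client (one common Prometheus client for Node; any metrics library with histograms works the same way). P50/P95/P99 are computed from the buckets at query time:

```js
const client = require('prom-client');

// Buckets (in seconds) chosen to bracket typical API latencies
const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
});

// Express middleware: time every request from start to finish
function measureLatency(req, res, next) {
  const stop = httpDuration.startTimer();
  res.on('finish', () => {
    stop({
      method: req.method,
      route: req.route ? req.route.path : req.path,
      status: res.statusCode,
    });
  });
  next();
}
```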
API Versioning: Plan for Change
You will need to make breaking changes. The only question is whether you planned for them.
My preferred approach: URL versioning with a deprecation timeline.
- /v1/users — original
- /v2/users — new version
- v1 supported for 12 months after v2 launch
- Announce deprecation early and loudly
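In Express, URL versioning is just two routers mounted side by side, with the old one announcing its retirement date via the `Sunset` header (RFC 8594). A sketch; `listUsersV1`/`listUsersV2` are hypothetical handlers and the date is illustrative:

```js
const express = require('express');
const app = express();

const v1 = express.Router();
const v2 = express.Router();

// v1 keeps working, but every response carries its end-of-life date
v1.use((req, res, next) => {
  res.set('Sunset', 'Wed, 01 Jul 2026 00:00:00 GMT');
  next();
});

v1.get('/users', listUsersV1); // original response shape
v2.get('/users', listUsersV2); // new response shape

app.use('/v1', v1);
app.use('/v2', v2);
```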
Testing at Scale
Load testing isn't optional. You need to know your breaking point before your users find it.
What to Test
- Baseline load: Can you handle 2x your peak traffic?
- Spike behavior: What happens when traffic jumps 10x suddenly?
- Degradation: How does your service fail? Gracefully or catastrophically?
Tools like k6, Gatling, or Apache JMeter can simulate realistic load patterns. Run them regularly, not just before launch.
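For spike behavior specifically, a minimal k6 script looks like this (the URL is a placeholder, and the stage targets should come from your own peak-traffic numbers):

```js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up to baseline
    { duration: '1m', target: 1000 }, // sudden 10x spike
    { duration: '2m', target: 100 },  // recover
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // fail the run if P95 latency exceeds 500ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/v1/users'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```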
The Checklist
Before you ship your next API endpoint:
- ✅ All queries have appropriate indexes
- ✅ N+1 queries eliminated
- ✅ Caching implemented with proper invalidation
- ✅ Rate limiting configured
- ✅ Error responses documented
- ✅ Monitoring and alerting set up
- ✅ Load tested at 2x expected traffic
- ✅ Versioning strategy defined
Final Thoughts
Building scalable APIs is as much about discipline as it is about technology. The patterns above won't solve every problem, but they'll prevent most of the common ones.
The best advice I can give: measure everything, assume nothing, and learn from production. Your monitoring dashboard will teach you more than any blog post ever could.