API Rate Limiter – System Design
1️⃣ Problem Statement
We need an API Rate Limiter to:
- Protect APIs from abuse
- Ensure fair usage per tenant/user/route
- Allow bursty traffic with a steady average
- Work in Kubernetes + multi-replica + multi-region
- Make decisions with very low latency
- Degrade gracefully if Redis or the control plane fails
2️⃣ High-Level Idea (One-Line)
Rate limiting is an edge decision problem — keep it fast, local, and predictable.
3️⃣ Where Rate Limiting Happens
Best practice
- Enforce at the edge first (Gateway / Ingress)
- Optional: Sidecar / mesh for fine-grained internal APIs
4️⃣ Algorithm Choice (Keep It Simple)
✅ Token Bucket (Recommended)
- Allows bursts
- Maintains steady average rate
- Easy to reason about

❌ Sliding Window

- More accurate
- Heavier on storage and compute
- Usually overkill
👉 Interview tip
“I use token bucket for RPS and fixed window for daily/monthly quotas.”
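To make the token bucket concrete, here is a minimal in-memory sketch (Python; all names are illustrative). It also doubles as the local limiter recommended in Fix 2 below:

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket: bursts up to `capacity`,
    steady average of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                 # refill rate (tokens/sec)
        self.capacity = capacity         # max burst size
        self.tokens = capacity           # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost          # deduct the request cost
            return True
        return False
```

For example, `TokenBucket(rate=100, capacity=200)` allows bursts of 200 while enforcing a steady average of 100 requests per second.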
5️⃣ Storage & Atomicity
Redis + Lua (Authoritative Check)
The Lua script atomically:

- Refills tokens based on elapsed time
- Deducts the request cost
- Returns allow/deny + remaining tokens
⚠️ Important Fix
If you use INCR + EXPIRE, that is a fixed window, not a token bucket.
➡️ Fix: Store tokens + a refill timestamp and compute the refill in Lua, as sketched below.
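A sketch of the authoritative check, assuming redis-py; the key layout (a Redis hash with tokens and ts fields) and helper names are illustrative. Passing `now` from the caller keeps the script deterministic, at the cost of assuming roughly synchronized gateway clocks:

```python
import time
import redis

TOKEN_BUCKET_LUA = """
local rate     = tonumber(ARGV[1])  -- refill rate (tokens/sec)
local capacity = tonumber(ARGV[2])  -- max burst
local now      = tonumber(ARGV[3])  -- caller-supplied clock (seconds)
local cost     = tonumber(ARGV[4])  -- cost of this request

local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Refill based on elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + math.max(0, now - ts) * rate)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate) * 2)
-- Return remaining tokens as a string: Redis truncates Lua floats.
return {allowed, tostring(tokens)}
"""

r = redis.Redis()
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow(key: str, rate: float, capacity: float, cost: float = 1.0):
    allowed, remaining = token_bucket(
        keys=[key], args=[rate, capacity, time.time(), cost]
    )
    return bool(allowed), float(remaining)
```

`allow("tenant:42", rate=100, capacity=200)` returns the decision plus the remaining tokens, which can feed the response headers in section 🔟.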
6️⃣ Request Data Flow (Hot Path)
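On the hot path, each request is checked cheaply first, then authoritatively:

1. Request hits the gateway / ingress.
2. The local in-memory bucket is checked first (no network hop; reduces Redis load and hot keys).
3. Redis is consulted via the Lua script for the atomic, authoritative refill + deduct.
4. Allow or deny, attaching rate-limit headers either way.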
7️⃣ Multi-Region Strategy (Choose One)
Option A – Home Region (Recommended)
- Strong fairness
- Simple reasoning

Option B – Eventual Consistency

- Best latency
- Small temporary overshoot allowed
👉 Interview answer
“If fairness is critical, route tenants to a home region.
If latency matters more, accept small overshoot with eventual consistency.”
8️⃣ Failure Handling (Very Important)
Redis Down
- Default: Fail-open + local limiter
- Critical APIs: Fail-closed (payments/admin)

Control Plane Down

- Use last known policy
- Alert if policy is stale
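A sketch of this decision logic, reusing the illustrative `TokenBucket` and `allow()` helpers from the earlier sketches; the critical-route prefixes and rates are hypothetical:

```python
import redis

CRITICAL_PREFIXES = ("/payments", "/admin")          # fail-closed routes (hypothetical)
local_bucket = TokenBucket(rate=100, capacity=200)   # coarse per-replica fallback

def check_request(path: str, key: str) -> bool:
    try:
        allowed, _ = allow(key, rate=100, capacity=200)  # authoritative Redis check
        return allowed
    except redis.RedisError:
        if path.startswith(CRITICAL_PREFIXES):
            return False                 # fail-closed: protect critical APIs
        return local_bucket.allow()      # fail-open, but still locally bounded
```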
9️⃣ Kubernetes Integration (Summary)
| Option | Use Case |
|---|---|
| NGINX Ingress | Simple IP/path limits |
| Kong + Redis | Per-tenant / header-based limits |
| Envoy / Istio | Local + global rate limiting |
| Custom CRD | Enterprise policy management |
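For the simplest row, a sketch of an ingress-nginx per-client-IP limit using its `limit-rps` and `limit-burst-multiplier` annotations; the host and service names are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"              # steady rate per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"  # burst = 5 x rps
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
```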
🔟 Rate Limit Response
Always return standard headers:
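For example, using the widely adopted X-RateLimit-* convention (the IETF httpapi draft standardizes unprefixed RateLimit-* equivalents); values are illustrative:

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1700000060
```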
Denied response:
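A typical shape, with Retry-After so well-behaved clients can back off (the JSON body is illustrative):

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060
Content-Type: application/json

{ "error": "rate_limited", "retry_after_seconds": 12 }
```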
1️⃣1️⃣ What to Fix / Improve (Key Section)
✅ Fix 1: Use a real token bucket
Replace window counters with a token count + refill timestamp.
✅ Fix 2: Add a local limiter
Use an in-memory bucket to reduce Redis load and hot keys.
✅ Fix 3: Decide the multi-region policy clearly
Don't mix strong consistency and CRDT-style counters casually.
✅ Fix 4: Define fail-open vs fail-closed per endpoint
The availability-vs-protection trade-off must be explicit.
1️⃣2️⃣ Text Diagram – Complete Flow
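One way to draw the flow described above, including the control plane and failure paths:

```text
Client
  |
  v
Gateway / Ingress  <---- policy push ---- Control Plane (cache last known)
  |
  | 1) local token bucket (fast pre-check, no network hop)
  | 2) Redis + Lua (atomic refill + deduct) -- authoritative
  |
  +-- allow --> Upstream API   (+ rate-limit headers)
  +-- deny  --> 429 + Retry-After
  |
  +-- Redis down --> fail-open via local bucket
                     (fail-closed for critical APIs)
```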
1️⃣3️⃣ One-Minute Interview Explanation
“I enforce rate limiting at the gateway using a token bucket algorithm.
Each request checks a local bucket first, then Redis via a Lua script for atomic refill and decrement.
For multi-region, I either route tenants to a home region for strict fairness or allow small overshoot with eventual consistency.
On Redis failure, I fail-open with local limits by default and fail-closed only for critical APIs.”
✅ Final Outcome
- Predictable fairness
- Low-latency decisions
- Redis protected from overload
- Clear failure semantics
- Easy Kubernetes integration