API Rate Limiter – System Design
1️⃣ Problem Statement
We need an API Rate Limiter to:
- Protect APIs from abuse
- Ensure fair usage per tenant/user/route
- Allow bursty traffic with a steady average
- Work in Kubernetes + multi-replica + multi-region
- Make decisions with very low latency
- Degrade gracefully if Redis or the control plane fails
2️⃣ High-Level Idea (One-Line)
Rate limiting is an edge decision problem — keep it fast, local, and predictable.
3️⃣ Where Rate Limiting Happens
Best practice
- Enforce at the edge first (Gateway / Ingress)
- Optional: Sidecar / mesh for fine-grained internal APIs
4️⃣ Algorithm Choice (Keep It Simple)
✅ Token Bucket (Recommended)
- Allows bursts
- Maintains steady average rate
- Easy to reason about

❌ Sliding Window

- More accurate
- Heavier on storage and compute
- Usually overkill
👉 Interview tip
“I use token bucket for RPS and fixed window for daily/monthly quotas.”
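To make the token bucket concrete, here is a minimal in-memory sketch (Python; all names are illustrative). It also doubles as the local limiter recommended in Fix 2 below:

```python
import time

class TokenBucket:
    """Minimal in-memory token bucket: bursts up to `capacity`,
    steady average of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                 # refill rate (tokens/sec)
        self.capacity = capacity         # max burst size
        self.tokens = capacity           # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost          # deduct the request cost
            return True
        return False
```

For example, `TokenBucket(rate=100, capacity=200)` allows bursts of 200 while enforcing a steady average of 100 requests per second.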
5️⃣ Storage & Atomicity
Redis + Lua (Authoritative Check)
The Lua script atomically:

- Refills tokens based on elapsed time
- Deducts the request cost
- Returns allow/deny + remaining tokens
⚠️ Important Fix
If you use INCR + EXPIRE, that is a fixed window, not a token bucket.
➡️ Fix: Store tokens + a refill timestamp and compute the refill in Lua, as sketched below.
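A sketch of the authoritative check, assuming redis-py; the key layout (a Redis hash with tokens and ts fields) and helper names are illustrative. Passing `now` from the caller keeps the script deterministic, at the cost of assuming roughly synchronized gateway clocks:

```python
import time
import redis

TOKEN_BUCKET_LUA = """
local rate     = tonumber(ARGV[1])  -- refill rate (tokens/sec)
local capacity = tonumber(ARGV[2])  -- max burst
local now      = tonumber(ARGV[3])  -- caller-supplied clock (seconds)
local cost     = tonumber(ARGV[4])  -- cost of this request

local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

-- Refill based on elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + math.max(0, now - ts) * rate)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / rate) * 2)
-- Return remaining tokens as a string: Redis truncates Lua floats.
return {allowed, tostring(tokens)}
"""

r = redis.Redis()
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow(key: str, rate: float, capacity: float, cost: float = 1.0):
    allowed, remaining = token_bucket(
        keys=[key], args=[rate, capacity, time.time(), cost]
    )
    return bool(allowed), float(remaining)
```

`allow("tenant:42", rate=100, capacity=200)` returns the decision plus the remaining tokens, which can feed the response headers in section 🔟.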
6️⃣ Request Data Flow (Hot Path)
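On the hot path, each request is checked cheaply first, then authoritatively:

1. Request hits the gateway / ingress.
2. The local in-memory bucket is checked first (no network hop; reduces Redis load and hot keys).
3. Redis is consulted via the Lua script for the atomic, authoritative refill + deduct.
4. Allow or deny, attaching rate-limit headers either way.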
7️⃣ Multi-Region Strategy (Choose One)
Option A – Home Region (Recommended)
- Strong fairness
- Simple reasoning

Option B – Eventual Consistency

- Best latency
- Small temporary overshoot allowed
👉 Interview answer
“If fairness is critical, route tenants to a home region.
If latency matters more, accept small overshoot with eventual consistency.”
8️⃣ Failure Handling (Very Important)
Redis Down
- Default: Fail-open + local limiter
- Critical APIs: Fail-closed (payments/admin)

Control Plane Down

- Use last known policy
- Alert if policy is stale
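A sketch of this decision logic, reusing the illustrative `TokenBucket` and `allow()` helpers from the earlier sketches; the critical-route prefixes and rates are hypothetical:

```python
import redis

CRITICAL_PREFIXES = ("/payments", "/admin")          # fail-closed routes (hypothetical)
local_bucket = TokenBucket(rate=100, capacity=200)   # coarse per-replica fallback

def check_request(path: str, key: str) -> bool:
    try:
        allowed, _ = allow(key, rate=100, capacity=200)  # authoritative Redis check
        return allowed
    except redis.RedisError:
        if path.startswith(CRITICAL_PREFIXES):
            return False                 # fail-closed: protect critical APIs
        return local_bucket.allow()      # fail-open, but still locally bounded
```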
9️⃣ Kubernetes Integration (Summary)
| Option | Use Case |
|---|---|
| NGINX Ingress | Simple IP/path limits |
| Kong + Redis | Per-tenant / header-based limits |
| Envoy / Istio | Local + global rate limiting |
| Custom CRD | Enterprise policy management |
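For the simplest row, a sketch of an ingress-nginx per-client-IP limit using its `limit-rps` and `limit-burst-multiplier` annotations; the host and service names are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"              # steady rate per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"  # burst = 5 x rps
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
```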
🔟 Rate Limit Response
Always return standard headers:
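For example, using the widely adopted X-RateLimit-* convention (the IETF httpapi draft standardizes unprefixed RateLimit-* equivalents); values are illustrative:

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1700000060
```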
Denied response:
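A typical shape, with Retry-After so well-behaved clients can back off (the JSON body is illustrative):

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000060
Content-Type: application/json

{ "error": "rate_limited", "retry_after_seconds": 12 }
```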
1️⃣1️⃣ What to Fix / Improve (Key Section)
✅ Fix 1: Use a real token bucket
Replace window counters with a token count + refill timestamp.
✅ Fix 2: Add a local limiter
Use an in-memory bucket to reduce Redis load and hot keys.
✅ Fix 3: Decide the multi-region policy clearly
Don't mix strong consistency and CRDT-style counters casually.
✅ Fix 4: Define fail-open vs fail-closed per endpoint
The availability-vs-protection trade-off must be explicit.
1️⃣2️⃣ Text Diagram – Complete Flow
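One way to draw the flow described above, including the control plane and failure paths:

```text
Client
  |
  v
Gateway / Ingress  <---- policy push ---- Control Plane (cache last known)
  |
  | 1) local token bucket (fast pre-check, no network hop)
  | 2) Redis + Lua (atomic refill + deduct) -- authoritative
  |
  +-- allow --> Upstream API   (+ rate-limit headers)
  +-- deny  --> 429 + Retry-After
  |
  +-- Redis down --> fail-open via local bucket
                     (fail-closed for critical APIs)
```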
1️⃣3️⃣ One-Minute Interview Explanation
“I enforce rate limiting at the gateway using a token bucket algorithm.
Each request checks a local bucket first, then Redis via a Lua script for atomic refill and decrement.
For multi-region, I either route tenants to a home region for strict fairness or allow small overshoot with eventual consistency.
On Redis failure, I fail-open with local limits by default and fail-closed only for critical APIs.”
✅ Final Outcome
- Predictable fairness
- Low-latency decisions
- Redis protected from overload
- Clear failure semantics
- Easy Kubernetes integration