System Design: URL Shortener

(High-level, interview-oriented)

1️⃣ Use Case & Problem Context

What are we building?

A service that:

Converts long URLs into short, human-shareable links
Redirects users to the original URL with very low latency
Tracks click analytics
Optionally supports expiration (TTL)

Key challenges

Very high read traffic (redirects)
Viral spikes (sudden traffic bursts)
No collisions in short codes
Fast redirects (<10 ms if cached)
Analytics should not slow redirects

2️⃣ Core Requirements (Interview Language)

Functional

Create a short URL
Redirect short URL → long URL
Track clicks (count, time, referrer, device)
Optional expiry (TTL)

Non-Functional

Low latency
High availability
Horizontally scalable
Eventual consistency for analytics

3️⃣ High-Level Architecture


            ┌────────────┐
            │   Client   │
            └─────┬──────┘
                  │
        ┌─────────▼─────────┐
        │    CDN / Edge     │
        └─────────┬─────────┘
                  │
        ┌─────────▼─────────┐
        │ Redirect Service  │
        └──┬────────────┬───┘
           │            │ async click event
     ┌─────▼─────┐  ┌───▼───┐
     │   Redis   │  │ Queue │
     │   Cache   │  └───┬───┘
     └─────┬─────┘      │
           │ miss       │
     ┌─────▼─────┐ ┌────▼────┐
     │  URLs DB  │ │Analytics│
     └───────────┘ │ (OLAP)  │
                   └─────────┘

4️⃣ ID / Short Code Generation

Goal

Generate globally unique, short, URL-safe codes

Preferred approaches

Snowflake / KSUID → Base62
- Distributed
- Time-sortable
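- No coordination bottleneck (Snowflake needs only a unique worker ID per generator; KSUID needs none)
- Code length: a 64-bit Snowflake ID is at most 11 Base62 characters, while a 160-bit KSUID is 27, so Snowflake wins when shortness matters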
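Below is a minimal Python sketch of the preferred approach. It assumes the commonly described Snowflake layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. The custom epoch, the alphabet order, and the class and function names are illustrative assumptions. Treat it as a sketch, not a reference implementation.

```python
import threading
import time

# Base62 alphabet: digits, then lowercase, then uppercase (the order is a free choice).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CUSTOM_EPOCH_MS = 1_700_000_000_000  # assumption: any fixed instant in the past


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """64-bit IDs laid out as: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000) - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.worker_id << 12) | self.sequence


gen = SnowflakeGenerator(worker_id=7)
short_code = base62_encode(gen.next_id())  # at most 11 characters for any 64-bit ID
```

The lock makes a single generator safe across threads. To run several service instances, give each one a distinct worker_id.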

Alternatives

Auto-increment counter + Base62
- Fastest
- Needs sharding to avoid single DB hotspot
Hash(long_url), truncated to the code length
- Requires collision handling (e.g., detect the clash on insert, rehash with a salt, retry)
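For the Hash(long_url) alternative, here is a minimal sketch of one collision strategy. It truncates a SHA-256 digest to a 7-character Base62 code. If a different URL already owns that code, it rehashes with an attempt counter and tries again. The 7-character length, the "#attempt" salt format, and the dict standing in for the database are assumptions.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE_LEN = 7  # 62**7 ≈ 3.5 trillion possible codes


def hash_code(long_url: str, attempt: int, length: int = CODE_LEN) -> str:
    """SHA-256 of the URL plus an attempt counter, keeping `length` Base62 digits."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode("utf-8")).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)


def shorten_by_hash(long_url: str, store: dict, max_attempts: int = 5) -> str:
    """Insert long_url under a hashed code, retrying with a new salt on collision."""
    for attempt in range(max_attempts):
        code = hash_code(long_url, attempt)
        existing = store.get(code)
        if existing is None:
            # In a real DB this check-and-insert must be one atomic INSERT
            # guarded by the primary key, not a separate read then write.
            store[code] = long_url
            return code
        if existing == long_url:
            return code  # same URL shortened before: reuse its code
        # A different URL already owns this code: collision, try the next salt.
    raise RuntimeError("too many collisions; consider a longer code")
```

Hashing also gives you deduplication as a side effect: the same long URL always resolves to the same code. The ID-based approach does not provide that for free.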

👉 Interview answer

“I prefer Snowflake IDs encoded in Base62 to avoid collisions and support distributed generation.”

5️⃣ Write Path (Create Short URL)

Step-by-step data flow

Client → POST /shorten
- Validate URL (scheme, length)
Policy checks
- Rate limit
- Malware/domain reputation (async if needed)
Generate short code
- Using Snowflake/KSUID → Base62
Persist mapping
```
(short_code → long_url, metadata)
```
Optional
- Warm Redis cache
- Emit “URL_CREATED” event
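The write path above can be sketched as a single-process Python function. Plain dicts stand in for the URLs table and Redis, and a list stands in for the event stream. The next_code helper is a hypothetical placeholder for the Snowflake → Base62 generator. The 2,048-character limit and the allow rate-limit callback are also assumptions.

```python
import itertools
from urllib.parse import urlsplit

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
MAX_URL_LEN = 2048  # assumption: choose a limit that suits your clients
_ids = itertools.count(10**12)  # hypothetical stand-in for a Snowflake generator


def next_code() -> str:
    """Base62-encode the next ID (placeholder for Snowflake/KSUID -> Base62)."""
    n = next(_ids)
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def validate_url(url: str) -> None:
    """Reject over-long URLs and anything that is not an absolute http(s) URL."""
    if len(url) > MAX_URL_LEN:
        raise ValueError("URL too long")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("only absolute http(s) URLs are accepted")


def shorten(long_url, owner_id, db, cache, events, allow) -> str:
    """Write path: validate -> policy check -> generate code -> persist -> optional extras."""
    validate_url(long_url)
    if not allow(owner_id):  # rate limit; reputation checks can run async on URL_CREATED
        raise PermissionError("rate limit exceeded")
    code = next_code()
    if code in db:  # stands in for the DB's primary-key constraint
        raise RuntimeError("duplicate short code")
    db[code] = {"long_url": long_url, "owner_id": owner_id}
    cache[code] = long_url                 # optional: warm the cache
    events.append(("URL_CREATED", code))   # optional: emit an event
    return code
```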

Why this works

Write traffic is much smaller than read traffic
Uniqueness guaranteed by ID generator + DB constraint

6️⃣ Read Path (Redirect – Hot Path)

Step-by-step data flow

Client hits short URL
```
GET /abc123
```
CDN / Edge receives request
Check Redis cache
- ✅ Cache hit → get long URL
- ❌ Cache miss → DB lookup → update cache
Return redirect
- 302 Redirect (preferred)
Fire-and-forget click event
- Send event to queue (async)
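Here is a sketch of the redirect hot path using the cache-aside pattern. Dicts stand in for Redis and for the URLs database, which here maps short_code → long_url only. A bounded queue.Queue stands in for Kafka. The function name and its (status, headers) return shape are hypothetical.

```python
import queue
import time


def redirect(code: str, cache: dict, db: dict, click_events: queue.Queue,
             referrer: str = "") -> tuple:
    """Return (status, headers) for GET /<code>."""
    long_url = cache.get(code)                 # 1. check the cache
    if long_url is None:                       # 2. miss: read the DB, then fill the cache
        long_url = db.get(code)
        if long_url is None:
            return 404, {}
        cache[code] = long_url
    try:                                       # 3. fire-and-forget click event
        click_events.put_nowait((code, time.time(), referrer))
    except queue.Full:
        pass  # analytics backlog: drop the event rather than slow the redirect
    return 302, {"Location": long_url}
```

put_nowait never blocks. When the analytics queue is full, the click is dropped, which is the trade-off this design accepts to keep redirects fast.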

👉 Why 302 (not 301)?

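302 keeps every click flowing through the service, which allows:
- URL edits
- TTL enforcement
- Analytics
301 is a permanent redirect that browsers cache aggressively (possibly indefinitely), so repeat visits skip the service and break all three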

7️⃣ Analytics (Cold Path – Never Block Redirects)

Data flow

Redirect service publishes click event:


(short_code, timestamp, ip_hash, user_agent, referrer)

Events go to a queue (Kafka / Pub/Sub)
Batch processing → OLAP store
Precompute aggregates:
- Clicks per day
- Referrer
- Device/browser
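The following sketches the batch step that turns raw click events into the precomputed aggregates listed above. The event tuple follows the shape shown earlier, and the timestamp is assumed to be Unix seconds. device_class is a deliberately crude, hypothetical heuristic; a real pipeline would use a user-agent parser. In production these counts would be written to the OLAP store rather than returned.

```python
from collections import Counter
from datetime import datetime, timezone


def device_class(user_agent: str) -> str:
    """Crude placeholder; a real pipeline would use a proper user-agent parser."""
    return "mobile" if "Mobile" in user_agent else "desktop"


def aggregate(events):
    """Fold raw click events into per-day, per-referrer, and per-device counts.

    Each event is (short_code, unix_ts, ip_hash, user_agent, referrer).
    """
    per_day, per_referrer, per_device = Counter(), Counter(), Counter()
    for code, ts, _ip_hash, user_agent, referrer in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        per_day[(code, day)] += 1
        per_referrer[(code, referrer or "direct")] += 1
        per_device[(code, device_class(user_agent))] += 1
    return per_day, per_referrer, per_device
```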

Key principle

Redirect path must stay fast even if analytics is slow or down

8️⃣ Availability & Scaling

Read scalability

CDN + Edge workers
Redis for hot keys
Cache-aside pattern

Write scalability

ID generation is distributed
DB sharded by short_code hash
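Sharding by short_code hash requires a hash that every service instance computes identically. The minimal sketch below assumes SHA-256 with simple modulo placement; the function name is hypothetical. Python's built-in hash() is unsuitable here because it is randomized per process for strings.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical usage: one dict per shard standing in for separate databases.
shards = [dict() for _ in range(4)]
shards[shard_for("abc123", len(shards))]["abc123"] = "https://example.com/long"
```

Plain modulo remaps most keys whenever the shard count changes. The usual way to avoid that is consistent hashing, or many fixed logical shards mapped onto physical nodes.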

Failure handling

Cache miss → DB fallback
Analytics overload → sample or drop events
Multi-AZ database replication
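One way to implement "sample or drop events" is to shed load based on queue depth. This sketch assumes a bounded queue.Queue and a hypothetical class name. The 80% high-watermark and 10% sample rate are illustrative defaults. Sampled events carry a weight of 1/sample_rate so the analytics side can scale counts back up.

```python
import queue
import random


class SheddingPublisher:
    """Publishes click events; samples when the queue is nearly full, drops when full."""

    def __init__(self, q: queue.Queue, high_watermark: float = 0.8,
                 sample_rate: float = 0.1, rng=None):
        self.q = q
        self.high_watermark = high_watermark
        self.sample_rate = sample_rate
        self.rng = rng if rng is not None else random.Random()
        self.dropped = 0

    def publish(self, event) -> bool:
        """Enqueue (event, weight) without blocking; return False if the event was shed."""
        weight = 1.0
        limit = self.q.maxsize
        if limit > 0 and self.q.qsize() >= self.high_watermark * limit:
            # Overloaded: keep only a random sample, weighted so counts can be scaled up.
            if self.rng.random() >= self.sample_rate:
                self.dropped += 1
                return False
            weight = 1.0 / self.sample_rate
        try:
            self.q.put_nowait((event, weight))
            return True
        except queue.Full:
            self.dropped += 1  # hard limit reached: drop rather than block the redirect
            return False
```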

9️⃣ Abuse & Governance

Abuse prevention

Rate limiting per user / IP
Domain reputation checks
Manual takedown / disable link
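Per-user or per-IP rate limiting is often done with a token bucket. That algorithm choice, and the class names below, are assumptions, since the design does not name an algorithm. This in-process sketch keeps one bucket per key; a horizontally scaled service would keep the buckets in a shared store such as Redis instead. The clock is injectable so the behaviour can be tested deterministically.

```python
import time


class TokenBucket:
    """Holds up to `burst` tokens and refills at `rate_per_sec`; each request spends one."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """One token bucket per key (user ID or IP address)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # unbounded here; a real service would evict idle keys

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.burst, self.clock)
        return bucket.allow()
```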

Privacy

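Hash IPs with a secret key (PII minimization); a plain unkeyed hash of an IPv4 address can be reversed by brute force over the ~4.3 billion possible addresses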
Store minimal user data
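Here is a sketch of IP hashing with a keyed hash (HMAC-SHA256) from Python's standard library. The key handling is an assumption: keep the key in a secrets store and consider rotating it periodically. Rotation limits how long clicks stay linkable, at the cost of no longer matching the same IP across rotations. The function name is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def hash_ip(ip: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized IP address, as a hex string."""
    # Normalize so equivalent spellings (e.g. of an IPv6 address) hash identically.
    normalized = ipaddress.ip_address(ip).compressed
    return hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256).hexdigest()
```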

🔟 Data Model (Simple & Interview-Friendly)

URLs table


urls(
  short_code PK,
  long_url,
  owner_id,
  created_at,
  ttl_at,
  is_active,
  click_count,
  last_accessed_at
)

Click events (append-only)


click_events(
  short_code,
  timestamp,
  ip_hash,
  user_agent,
  referrer
)
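The data model can be written as runnable DDL. SQLite is used here only so the example is self-contained. The column types, the Unix-seconds timestamps, the index, and the insert_url helper are assumptions. The key point is that the primary key on short_code is what turns a duplicate code into a rejected insert.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    short_code       TEXT PRIMARY KEY,
    long_url         TEXT NOT NULL,
    owner_id         TEXT,
    created_at       INTEGER NOT NULL,   -- Unix seconds
    ttl_at           INTEGER,            -- NULL means no expiry
    is_active        INTEGER NOT NULL DEFAULT 1,
    click_count      INTEGER NOT NULL DEFAULT 0,
    last_accessed_at INTEGER
);
CREATE TABLE click_events (              -- append-only
    short_code TEXT NOT NULL,
    timestamp  INTEGER NOT NULL,
    ip_hash    TEXT,
    user_agent TEXT,
    referrer   TEXT
);
CREATE INDEX idx_click_events_code_ts ON click_events (short_code, timestamp);
"""


def insert_url(conn: sqlite3.Connection, code: str, long_url: str, created_at: int) -> bool:
    """Return True if stored, False if the short code already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls (short_code, long_url, created_at) VALUES (?, ?, ?)",
                (code, long_url, created_at),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```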

1️⃣1️⃣ APIs

Create short URL


POST /shorten
{
  "long_url": "...",
  "ttl": "7d"
}
→ { "short_code": "abc123", "url": "https://bit.ly/abc123" }
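This sketch turns the "ttl": "7d" request field into the ttl_at column and checks it at redirect time. Only the "d" suffix appears in the example above; the m/h/w units, the regex, and the function names are assumptions. An expired link could then be answered with 404 or 410 Gone instead of a redirect.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed suffixes; only "d" appears in the API example.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_ttl(ttl: str) -> timedelta:
    """Parse strings like "7d" or "12h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhdw])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def compute_ttl_at(created_at: datetime, ttl: Optional[str]) -> Optional[datetime]:
    """Value for the ttl_at column; None means the link never expires."""
    return None if ttl is None else created_at + parse_ttl(ttl)


def is_expired(ttl_at: Optional[datetime], now: datetime) -> bool:
    """Checked on the redirect path before issuing the 302."""
    return ttl_at is not None and now >= ttl_at
```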

Redirect


GET /abc123 → 302

Stats


GET /stats/abc123?range=7d

1️⃣2️⃣ Benefits (Say This in Interview)

Sub-10ms redirects when cached
Handles viral traffic via CDN + Redis
Analytics isolated from hot path
Simple, scalable, fault-tolerant design

1️⃣3️⃣ Interview Talking Points (Very Important)

302 vs 301
Code space & collision handling
Hot key mitigation
Cache-aside pattern
Eventual consistency for analytics
PII minimization (hash IPs)

1️⃣4️⃣ Mermaid Diagram (Optional – Blog / Notes)


flowchart LR
    Client --> API[Shorten API]
    API --> IDGen[ID Generator]
    API --> DB[(URLs DB)]

    Client --> CDN[CDN / Edge]
    CDN --> EDGE[Redirect Service]
    EDGE --> CACHE[(Redis Cache)]
    EDGE -->|Cache Miss| DB
    EDGE -->|Click Event| EVENTS[[Event Queue]]

    EVENTS --> ETL[Batch Processor]
    ETL --> OLAP[(Analytics Store)]
