System Design: URL Shortener

 

(High-level, interview-oriented)


1️⃣ Use Case & Problem Context

What are we building?

A service that:

  • Converts long URLs into short, human-shareable links

  • Redirects users to the original URL with very low latency

  • Tracks click analytics

  • Optionally supports expiration (TTL)

Key challenges

  • Very high read traffic (redirects)

  • Viral spikes (sudden traffic bursts)

  • No collisions in short codes

  • Fast redirects (<10 ms if cached)

  • Analytics should not slow redirects


2️⃣ Core Requirements (Interview Language)

Functional

  • Create a short URL

  • Redirect short URL → long URL

  • Track clicks (count, time, referrer, device)

  • Optional expiry (TTL)

Non-Functional

  • Low latency

  • High availability

  • Horizontally scalable

  • Eventual consistency for analytics


3️⃣ High-Level Architecture

    ┌────────────┐
    │   Client   │
    └─────┬──────┘
          │
┌─────────▼─────────┐
│    CDN / Edge     │
└─────────┬─────────┘
          │
┌─────────▼─────────┐
│ Redirect Service  │
└───────┬───────┬───┘
        │       │
   ┌────▼───┐   │
   │ Redis  │   │
   │ Cache  │   │
   └────┬───┘   │
        │       │
   ┌────▼───────▼───┐
   │ URLs Database  │
   └────────────────┘
           │
  (Async Click Event)
           │
      ┌────▼────┐
      │  Queue  │
      └────┬────┘
           │
      ┌────▼─────┐
      │ Analytics│
      │  (OLAP)  │
      └──────────┘

4️⃣ ID / Short Code Generation

Goal

Generate globally unique, short, URL-safe codes

Preferred approaches

  • Snowflake / KSUID → Base62

    • Distributed

    • Time-sortable

    • No coordination bottleneck

Alternatives

  • Auto-increment counter + Base62

    • Simplest and fastest to generate, but sequential codes are guessable

    • Needs sharding to avoid single DB hotspot

  • Hash(long_url)

    • Same long URL always maps to the same code, but a truncated hash can collide, so it requires collision handling (bucket or retry)

👉 Interview answer

“I prefer Snowflake IDs encoded in Base62 to avoid collisions and support distributed generation.”
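A minimal sketch of the Snowflake → Base62 step. The bit layout (41-bit timestamp, 10-bit worker, 12-bit sequence) is the commonly described Snowflake layout, and the alphabet order is an assumption; neither is specified in these notes.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (62 URL-safe characters).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def make_snowflake(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one integer (hypothetical 41/10/12-bit layout)."""
    return (timestamp_ms << 22) | ((worker_id & 0x3FF) << 12) | (sequence & 0xFFF)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

A 63-bit ID encodes to at most 11 Base62 characters, which bounds the short code length for this scheme.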


5️⃣ Write Path (Create Short URL)

Step-by-step data flow

  1. Client → POST /shorten

    • Validate URL (scheme, length)

  2. Policy checks

    • Rate limit

    • Malware/domain reputation (async if needed)

  3. Generate short code

    • Using Snowflake/KSUID → Base62

  4. Persist mapping

    (short_code → long_url, metadata)
  5. Optional

    • Warm Redis cache

    • Emit “URL_CREATED” event

Why this works

  • Write traffic is much smaller than read traffic

  • Uniqueness guaranteed by ID generator + DB constraint
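The write path above can be sketched as follows. This is an illustration only: `InMemoryUrlStore`, `shorten`, the 2048-character limit, and the injected `next_code` callable are all hypothetical stand-ins for the real database and ID generator.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048  # assumed limit, not specified in the notes


def validate_url(long_url: str) -> bool:
    """Step 1: accept only http(s) URLs with a host, within the length limit."""
    if len(long_url) > MAX_URL_LENGTH:
        return False
    parts = urlparse(long_url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class InMemoryUrlStore:
    """Stand-in for the URLs table; rejects duplicate short codes like a PK would."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def insert(self, short_code: str, long_url: str) -> None:
        if short_code in self._rows:
            raise KeyError(f"duplicate short_code: {short_code}")
        self._rows[short_code] = long_url

    def get(self, short_code: str) -> str:
        return self._rows[short_code]


def shorten(long_url: str, store: InMemoryUrlStore, next_code: Callable[[], str]) -> str:
    """Steps 1, 3 and 4: validate, generate a code, persist the mapping."""
    if not validate_url(long_url):
        raise ValueError("invalid URL")
    code = next_code()
    store.insert(code, long_url)
    return code
```

Rate limiting, reputation checks, cache warming, and the "URL_CREATED" event are left out to keep the sketch short.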


6️⃣ Read Path (Redirect – Hot Path)

Step-by-step data flow

  1. Client hits short URL

    GET /abc123
  2. CDN / Edge receives request

  3. Check Redis cache

    • ✅ Cache hit → get long URL

    • ❌ Cache miss → DB lookup → update cache

  4. Return redirect

    • 302 Redirect (preferred)

  5. Fire-and-forget click event

    • Send event to queue (async)

👉 Why 302 (not 301)?

  • Allows:

    • URL edits

    • TTL enforcement

    • Analytics

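  • 301 is permanent and cached aggressively by browsers, so repeat visits can skip the service entirely and bypass edits, expiry, and click tracking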
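A minimal sketch of the cache-aside lookup and 302 response described in the steps above. Plain dicts stand in for Redis and the URLs database; `RedirectService` and its method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class RedirectService:
    """Cache-aside read path: try the cache, fall back to the DB, then fill the cache."""

    def __init__(self, cache: Dict[str, str], db: Dict[str, str]) -> None:
        self.cache = cache  # stand-in for Redis
        self.db = db        # stand-in for the URLs database

    def resolve(self, short_code: str) -> Optional[str]:
        long_url = self.cache.get(short_code)
        if long_url is not None:
            return long_url                      # cache hit
        long_url = self.db.get(short_code)       # cache miss: DB lookup
        if long_url is not None:
            self.cache[short_code] = long_url    # update cache for later requests
        return long_url

    def handle_get(self, short_code: str) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers) for GET /<short_code>."""
        long_url = self.resolve(short_code)
        if long_url is None:
            return 404, {}
        return 302, {"Location": long_url}
```

A real deployment would also set a cache TTL and check link expiry before redirecting; both are omitted here.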


7️⃣ Analytics (Cold Path – Never Block Redirects)

Data flow

  1. Redirect service publishes click event:

    (short_code, timestamp, ip_hash, user_agent, referrer)
  2. Events go to queue (Kafka / PubSub)

  3. Batch processing → OLAP store

  4. Precompute aggregates:

    • Clicks per day

    • Clicks by referrer

    • Clicks by device/browser

Key principle

Redirect path must stay fast even if analytics is slow or down
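One way to keep that principle is a bounded in-process buffer that drops events rather than blocking. The sketch below uses Python's standard `queue.Queue` as a stand-in for the producer side of Kafka / PubSub; `ClickPublisher` and its fields are hypothetical names.

```python
import queue
from typing import Dict, List


class ClickPublisher:
    """Fire-and-forget publisher: never blocks the redirect, drops when full."""

    def __init__(self, maxsize: int = 1000) -> None:
        self._buffer: "queue.Queue[Dict]" = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # counter for monitoring lost events

    def publish(self, event: Dict) -> bool:
        """Enqueue without waiting; return False if the event was dropped."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_events: int) -> List[Dict]:
        """Called by a background worker to batch events toward the real queue."""
        batch: List[Dict] = []
        while len(batch) < max_events:
            try:
                batch.append(self._buffer.get_nowait())
            except queue.Empty:
                break
        return batch
```

Dropping under overload trades analytics completeness for redirect latency, which matches the eventual-consistency requirement for analytics.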


8️⃣ Availability & Scaling

Read scalability

  • CDN + Edge workers

  • Redis for hot keys

  • Cache-aside pattern

Write scalability

  • ID generation is distributed

  • DB sharded by short_code hash

Failure handling

  • Cache miss or cache outage → DB fallback

  • Analytics overload → sample or drop events

  • Multi-AZ database replication
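Sharding by short_code hash can be sketched as below. The function name `shard_for` and the choice of SHA-256 are assumptions; any stable hash works, but Python's built-in `hash()` is randomized per process for strings and should not be used for routing.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo routing remaps most keys when the shard count changes; consistent hashing is the usual alternative if resharding is expected.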


9️⃣ Abuse & Governance

Abuse prevention

  • Rate limiting per user / IP

  • Domain reputation checks

  • Manual takedown / disable link
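Rate limiting per user / IP is often done with a token bucket. The sketch below is one such limiter with an injected clock so it can be tested deterministically; the class name and parameters are hypothetical, and a production version would keep one bucket per key in shared storage.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True and consume a token if the request is within the limit."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```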

Privacy

  • Hash IP (PII minimization)

  • Store minimal user data
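A minimal sketch of IP hashing, assuming a keyed hash (HMAC-SHA256) with a server-side secret. The keyed variant is an assumption added here: the IPv4 space is small enough that an unkeyed hash can be reversed by brute force. `hash_ip` is a hypothetical helper name.

```python
import hashlib
import hmac


def hash_ip(ip: str, secret: bytes) -> str:
    """Return a hex digest that identifies an IP without storing it in the clear."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

Rotating the secret periodically limits how long clicks from one address can be linked together.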


🔟 Data Model (Simple & Interview-Friendly)

URLs table

urls(
  short_code PK,
  long_url,
  owner_id,
  created_at,
  ttl_at,
  is_active,
  click_count,
  last_accessed_at
)

Click events (append-only)

click_events(
  short_code,
  timestamp,
  ip_hash,
  user_agent,
  referrer
)
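The two tables can be tried out with Python's built-in `sqlite3` module. Column types, the epoch-seconds timestamps, and the helper names `open_db` and `lookup_active` are assumptions; the notes only list column names.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE urls (
    short_code       TEXT PRIMARY KEY,
    long_url         TEXT NOT NULL,
    owner_id         TEXT,
    created_at       INTEGER NOT NULL,
    ttl_at           INTEGER,             -- NULL means the link never expires
    is_active        INTEGER NOT NULL DEFAULT 1,
    click_count      INTEGER NOT NULL DEFAULT 0,
    last_accessed_at INTEGER
);
CREATE TABLE click_events (
    short_code TEXT NOT NULL,
    timestamp  INTEGER NOT NULL,
    ip_hash    TEXT,
    user_agent TEXT,
    referrer   TEXT
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def lookup_active(conn: sqlite3.Connection, short_code: str, now: int) -> Optional[str]:
    """Return the long URL if the link is active and not past ttl_at."""
    row = conn.execute(
        "SELECT long_url FROM urls "
        "WHERE short_code = ? AND is_active = 1 "
        "AND (ttl_at IS NULL OR ttl_at > ?)",
        (short_code, now),
    ).fetchone()
    return row[0] if row else None
```

The primary key on short_code is the DB constraint that backs up the ID generator: a duplicate insert raises `sqlite3.IntegrityError`.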

1️⃣1️⃣ APIs

Create short URL

POST /shorten
{ "long_url": "...", "ttl": "7d" }
→ { "short_code": "abc123", "url": "https://bit.ly/abc123" }

Redirect

GET /abc123 → 302

Stats

GET /stats/abc123?range=7d
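A rough sketch of what the stats endpoint might compute for `range=7d`. The duration format (a number followed by `m`, `h`, or `d`), the epoch-second timestamps, and the function names are all assumptions; in the design above these aggregates would be precomputed in the OLAP store rather than scanned per request.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable

UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert strings like '7d' or '12h' into seconds (assumed format)."""
    value, unit = text[:-1], text[-1:]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(value) * UNIT_SECONDS[unit]


def clicks_per_day(events: Iterable[Dict], now: int, range_text: str) -> Dict[str, int]:
    """Count clicks per UTC day for events inside [now - range, now]."""
    cutoff = now - parse_duration(range_text)
    counts: Counter = Counter()
    for event in events:
        ts = event["timestamp"]
        if cutoff <= ts <= now:
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
            counts[day] += 1
    return dict(counts)
```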

1️⃣2️⃣ Benefits (Say This in Interview)

  • Sub-10ms redirects when cached

  • Handles viral traffic via CDN + Redis

  • Analytics isolated from hot path

  • Simple, scalable, fault-tolerant design


1️⃣3️⃣ Interview Talking Points (Very Important)

  • 302 vs 301

  • Code space & collision handling

  • Hot key mitigation

  • Cache-aside pattern

  • Eventual consistency for analytics

  • PII minimization (hash IPs)


1️⃣4️⃣ Mermaid Diagram (Optional – Blog / Notes)

flowchart LR
    Client --> API[Shorten API]
    API --> IDGen[ID Generator]
    API --> DB[(URLs DB)]
    Client --> CDN[CDN / Edge]
    CDN --> EDGE[Redirect Service]
    EDGE --> CACHE[(Redis Cache)]
    EDGE -->|Cache Miss| DB
    EDGE -->|Click Event| EVENTS[[Event Queue]]
    EVENTS --> ETL[Batch Processor]
    ETL --> OLAP[(Analytics Store)]

