Social Media Feed System Design (Timeline) — High Level

 


1) Use Case & Problem Context

We need to serve a fresh, ranked timeline for each user that blends:

  • Posts from followed users

  • Ads

  • Recommended / suggested content

Key challenges:

  • Low-latency feed reads (most traffic is reads)

  • Celebrity/high-fanout accounts (millions of followers)

  • Ranking + personalization

  • Spam/abuse (bot posts, engagement manipulation)


2) Core Requirements

Functional

  • Users create posts (POST /post)

  • Users fetch their feed (GET /feed?cursor=...)

  • Support pagination, freshness, and ranking

  • Mix organic + ads + recommendations

Non-Functional

  • Very low read latency (P95/P99)

  • Scalable fan-out strategy

  • Eventual consistency is acceptable (a new post may take a few seconds to appear in followers' feeds)

  • Strong abuse controls and observability


3) High-Level Architecture (Text Diagram)

(Write Path)
Client -> Post API -> Post Store -> Event Bus -> Fanout Workers
                                        |
                                        +--> Safety/Spam Checks
                                        |
                                        +--> Timeline Store/Cache (precomputed)

(Read Path)
Client -> Feed API -> Timeline Cache/Store -> Ranker -> Media CDN -> Response
              |
              +-> For celebrities: On-demand pull of posts
              |
              +-> Feature Store (affinity/engagement signals)
              |
              +-> Ads + Recommendations mixer

4) Data Flow (Step-by-Step)

A) Write Path: Creating a Post (Ingest)

Goal: Store the post once, distribute it efficiently, and keep the system safe.

Flow

  1. User calls POST /post

  2. Post is written to Posts DB (source of truth)

  3. Emit event to Event Bus: POST_CREATED

  4. Run spam/safety checks

    • immediate checks (rate limits, known bad domains)

    • async ML checks (spam, nudity, policy)

  5. Update indexes:

    • author -> posts index (fast author feed)

    • engagement log (starts empty for the new post)
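
The ingest steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: PostStore, EventBus, create_post, and the BAD_DOMAINS blocklist are hypothetical names, and only the cheap "immediate" check is shown. The sketch runs that check before the write, which is an assumption; the async ML checks would consume POST_CREATED from the bus.

```python
import itertools
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

BAD_DOMAINS = {"spam.example"}  # assumed blocklist, for illustration only


@dataclass
class Post:
    id: int
    author: str
    body: str
    ts: float


@dataclass
class EventBus:
    """Stand-in for a real message bus: it only records emitted events."""
    events: list = field(default_factory=list)

    def emit(self, name: str, payload: dict) -> None:
        self.events.append((name, payload))


class PostStore:
    """Source of truth for posts, plus the author -> posts index."""

    def __init__(self) -> None:
        self.posts = {}
        self.by_author = defaultdict(list)
        self._ids = itertools.count(1)

    def save(self, author: str, body: str) -> Post:
        post = Post(next(self._ids), author, body, time.time())
        self.posts[post.id] = post
        self.by_author[author].append(post.id)  # update the author index
        return post


def passes_immediate_checks(body: str) -> bool:
    # Cheap synchronous check against known bad domains.
    return not any(domain in body for domain in BAD_DOMAINS)


def create_post(store: PostStore, bus: EventBus, author: str, body: str) -> Optional[Post]:
    if not passes_immediate_checks(body):
        return None  # rejected: nothing is stored or fanned out
    post = store.save(author, body)
    bus.emit("POST_CREATED", {"post_id": post.id, "author": author})
    return post
```

Fanout and ranking never appear in create_post; they would subscribe to POST_CREATED, which is the decoupling the event bus is meant to provide.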

Why event bus?

  • Decouples ingest from fanout and ranking

  • Makes the system resilient and scalable


B) Fan-out Strategy (Push vs Pull)

This is the most important design decision.

Option 1: Push model (Fan-out-on-write)

For “normal” accounts:

  • When an author posts, we push that post into followers’ timelines.

✅ Fast reads
✅ Feed can be mostly precomputed
❌ Can explode for celebrities

Option 2: Pull model (Fan-out-on-read)

For “celebrity / high-fanout” accounts:

  • Do NOT push to all followers.

  • Instead, followers’ feed pulls celebrity posts at read time.

✅ Avoid massive writes
✅ Scales for high-fanout
❌ Heavier reads (celebrity posts are fetched and merged on every feed request)

Recommended approach: Hybrid (widely used in practice)

  • Push for normal accounts

  • Pull for high-fanout accounts

  • Threshold based on follower count or write amplification cost
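
A minimal sketch of the hybrid decision, assuming a simple follower-count threshold. FanoutWorker, FANOUT_THRESHOLD, and the in-memory dictionaries are hypothetical stand-ins for the fanout workers and timeline store; the cutoff value is arbitrary.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 10_000  # assumed cutoff; tune against write amplification cost


class FanoutWorker:
    def __init__(self, followers: dict) -> None:
        self.followers = followers            # author -> set of follower ids
        self.timelines = defaultdict(list)    # user -> pushed post ids
        self.pull_authors = set()             # authors served at read time

    def is_high_fanout(self, author: str) -> bool:
        return len(self.followers.get(author, ())) >= FANOUT_THRESHOLD

    def on_post_created(self, author: str, post_id: int) -> int:
        """Handle a POST_CREATED event; return the number of timeline writes."""
        if self.is_high_fanout(author):
            self.pull_authors.add(author)  # skip the push; readers pull instead
            return 0
        targets = self.followers.get(author, ())
        for follower in targets:
            self.timelines[follower].append(post_id)
        return len(targets)
```

The return value makes the write amplification visible: a normal account costs one write per follower, while a high-fanout account costs zero writes and shifts the work to the read path.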

Interview line: “Hybrid fanout is the practical choice—push for most users, pull for celebrities.”


5) Read Path: Fetching the Timeline

Endpoint:

  • GET /feed?cursor=...

Flow

  1. Feed API reads from Timeline Cache/Store

    • precomputed items (push model)

  2. Also fetches “pull sources”

    • recent posts from celebrity accounts the user follows

  3. Combine candidates into a working set

  4. Rank the candidates using an online ranker

  5. Blend:

    • organic posts

    • ads

    • recommended content

  6. Hydrate media refs into Media CDN URLs

  7. Return results + next cursor

Read path diagram

GET /feed?cursor
      |
      v
[Timeline Cache] + [Celebrity Pull]
      |
      v
Candidates
      |
      v
Ranker <--- Feature Store
      |
      v
Mixer (Organic + Ads + Recs + Constraints)
      |
      v
Hydrate Media -> Response (items + cursor)
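
The candidate-gathering part of the read path (precomputed items plus celebrity pull) could look something like the sketch below. Candidate and gather_candidates are hypothetical names, and merging by timestamp with de-duplication is one reasonable choice rather than the only one; ranking and blending happen afterwards.

```python
import heapq
from typing import NamedTuple


class Candidate(NamedTuple):
    ts: float
    post_id: int
    source: str  # "pushed" or "pulled"


def _newest_first(c: Candidate):
    return (c.ts, c.post_id)


def gather_candidates(pushed, celebrity_posts, followed_celebrities, limit=50):
    """Merge precomputed and pulled items, newest first, de-duplicated by post id.

    pushed: list of Candidate from the timeline cache
    celebrity_posts: dict of celebrity id -> list of recent Candidate
    followed_celebrities: celebrity ids this user follows
    """
    pools = [pushed] + [celebrity_posts.get(c, []) for c in followed_celebrities]
    # heapq.merge needs every input already sorted in the merge order.
    sorted_pools = [sorted(p, key=_newest_first, reverse=True) for p in pools]
    seen = set()
    merged = []
    for cand in heapq.merge(*sorted_pools, key=_newest_first, reverse=True):
        if cand.post_id in seen:
            continue  # same post reached us via both push and pull
        seen.add(cand.post_id)
        merged.append(cand)
        if len(merged) == limit:
            break
    return merged
```

Capping the working set with limit keeps the ranker's input bounded no matter how many celebrity accounts the user follows.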

6) Ranking (High-Level)

Ranking decides “what appears first”.

Inputs (features)

  • Recency (newer posts higher)

  • Affinity (how close user is to author)

  • Engagement probability (likes/comments history)

  • Content type (photo/video/text)

  • Negative signals (spam, low-quality, repetitive)

Feature Store

To keep ranking fast:

  • Precompute stable signals (affinity, historical engagement)

  • Cache them in a Feature Store (Redis/online store)

  • Online ranker just “looks up” features
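
To make the "look up features, then score" idea concrete, here is a small sketch using a hand-weighted linear score. The weights, the half-life, and the names PostFeatures, score, and rank are all assumptions for illustration; a real ranker would more likely use a learned model, and the plain dict stands in for the Feature Store.

```python
import math
from dataclasses import dataclass

# Assumed weights and half-life, chosen only to make the example readable.
W_RECENCY, W_AFFINITY, W_ENGAGEMENT, W_SPAM = 1.0, 2.0, 1.5, 3.0
HALF_LIFE_SECONDS = 6 * 3600


@dataclass
class PostFeatures:
    author: str
    age_seconds: float
    p_engage: float    # predicted engagement probability, 0..1
    spam_score: float  # negative signal, 0..1


def score(viewer: str, post: PostFeatures, affinity: dict) -> float:
    """Linear score over looked-up features; higher ranks first."""
    # Recency halves every HALF_LIFE_SECONDS.
    recency = math.exp(-math.log(2) * post.age_seconds / HALF_LIFE_SECONDS)
    aff = affinity.get((viewer, post.author), 0.0)  # feature-store lookup
    return (W_RECENCY * recency
            + W_AFFINITY * aff
            + W_ENGAGEMENT * post.p_engage
            - W_SPAM * post.spam_score)


def rank(viewer: str, posts: list, affinity: dict) -> list:
    return sorted(posts, key=lambda p: score(viewer, p, affinity), reverse=True)
```

Because affinity is precomputed, the online path does a dictionary lookup and a few multiplications per candidate.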

Blending constraints (important in interviews)

  • Freshness guarantees (don’t fill a page with only old posts)

  • Diversity (avoid 10 posts from same author)

  • Content mix (video/photo/text)

  • Ads spacing rules
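
Two of these constraints, author diversity and ads spacing, are easy to show in a few lines. The sketch below is one possible mixer under assumed rules (at most MAX_PER_AUTHOR posts per author per page, one ad after every AD_EVERY organic items); blend and both constants are hypothetical.

```python
from collections import Counter

MAX_PER_AUTHOR = 2  # assumed diversity cap per page
AD_EVERY = 4        # assumed spacing: one ad after every 4 organic items


def blend(ranked, ads, page_size=10):
    """ranked: list of (post_id, author) in rank order; ads: list of ad ids."""
    page = []
    per_author = Counter()
    organic_run = 0
    ads = list(ads)  # copy so the caller's list is not mutated
    for post_id, author in ranked:
        if len(page) >= page_size:
            break
        if per_author[author] >= MAX_PER_AUTHOR:
            continue  # diversity: skip further posts from this author
        page.append(("post", post_id))
        per_author[author] += 1
        organic_run += 1
        if organic_run == AD_EVERY and ads and len(page) < page_size:
            page.append(("ad", ads.pop(0)))
            organic_run = 0
    return page
```

Applying constraints after ranking keeps the ranker simple: it only orders candidates, and the mixer decides what the page looks like.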


7) Spam & Safety Controls

Spam and abuse must be handled early and continuously.

On ingest (post-time)

  • Rate limit posting

  • Reputation scoring (new account, suspicious domains)

  • ML classifiers (spam, policy violations)

  • Shadow banning / quarantine queue
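
As an example of the first control, post rate limiting can be sketched with a per-user sliding window. PostRateLimiter and its default limits are assumptions; a production limiter would typically keep this state in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class PostRateLimiter:
    """Sliding-window limiter: at most max_posts per window_seconds per user."""

    def __init__(self, max_posts: int = 5, window_seconds: float = 60.0) -> None:
        self.max_posts = max_posts
        self.window = window_seconds
        self.history = defaultdict(deque)  # user -> timestamps of accepted posts

    def allow(self, user: str, now: float) -> bool:
        q = self.history[user]
        while q and now - q[0] >= self.window:
            q.popleft()  # drop timestamps that have left the window
        if len(q) >= self.max_posts:
            return False  # over the limit; caller may reject or quarantine
        q.append(now)
        return True
```

Passing now explicitly keeps the limiter deterministic and easy to test.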

On engagement (like/comment anomalies)

  • Detect bot-like behavior and engagement spikes

  • Downrank suspicious posts

  • Block/limit repeat offenders
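
One simple way to detect an engagement spike is a z-score against a post's (or account's) recent history. The sketch below is deliberately naive: is_engagement_spike, the threshold of 3, and the minimum standard deviation of 1.0 are all assumed values, and real systems usually combine many more signals.

```python
import statistics

MIN_STDEV = 1.0  # assumed floor so a perfectly flat history does not divide by zero


def is_engagement_spike(history, current, z_threshold=3.0):
    """Flag `current` if it is more than z_threshold std devs above the history mean.

    history: past per-interval engagement counts (e.g. likes per hour)
    current: the count for the interval being checked
    """
    if len(history) < 2:
        return False  # not enough data to estimate variance
    mean = statistics.mean(history)
    stdev = max(statistics.stdev(history), MIN_STDEV)
    return (current - mean) / stdev > z_threshold
```

A flagged post would then be downranked or sent for review rather than removed outright, in line with the controls listed above.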

Interview line: “Safety is part of the pipeline—both at ingest and via engagement anomaly detection.”


8) Data Model (Simple)

Tables (conceptual):

  • posts(id, author, ts, body, media_refs, visibility)

  • follows(u, v, ts) // u follows v

  • timelines(user, post_id, ts, source)

    • source: pushed, pulled candidate, ad, or recommendation

  • engagements(user, post_id, type, ts)

    • type: like/comment/share/view
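
For a concrete feel, the conceptual tables can be expressed as an in-memory SQLite schema. The column types, primary keys, and the index are assumptions added for the example, and open_db and followers_of are hypothetical helpers; a real deployment would likely use sharded or wide-column storage instead.

```python
import sqlite3

SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY, author TEXT, ts REAL, body TEXT,
    media_refs TEXT, visibility TEXT
);
-- u follows v
CREATE TABLE follows (u TEXT, v TEXT, ts REAL, PRIMARY KEY (u, v));
CREATE TABLE timelines (
    "user" TEXT, post_id INTEGER, ts REAL, source TEXT,
    PRIMARY KEY ("user", post_id)
);
CREATE TABLE engagements ("user" TEXT, post_id INTEGER, type TEXT, ts REAL);
-- Assumed index: newest-first reads of one user's timeline.
CREATE INDEX timelines_by_user_ts ON timelines ("user", ts DESC);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def followers_of(conn: sqlite3.Connection, author: str) -> list:
    """Everyone who follows `author` (the fanout target list)."""
    rows = conn.execute("SELECT u FROM follows WHERE v = ? ORDER BY u", (author,))
    return [row[0] for row in rows]
```

The follows table is read in both directions: by v during fanout (who should receive this post) and by u on the read path (which celebrities does this user follow).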


9) APIs

  • POST /post → create a post

  • GET /feed?cursor=... → main timeline

  • GET /u/{id}/feed?cursor=... → user profile feed (author’s posts)


10) Pagination (Cursor-based)

Use cursor pagination (not offset) to handle:

  • Changing ranking

  • New posts arriving

  • Large timelines

Cursor typically encodes:

  • last seen timestamp

  • last seen rank score

  • last seen post id
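
A minimal sketch of such a cursor, assuming it is an opaque base64-encoded JSON blob and that items are ordered by (score, id) descending. encode_cursor, decode_cursor, and next_page are hypothetical names. Note that the sketch assumes scores stay stable between requests; if ranking changes mid-session, a real system would need to pin a ranked snapshot or fall back to timestamp ordering.

```python
import base64
import json


def encode_cursor(ts: float, score: float, post_id: int) -> str:
    raw = json.dumps({"ts": ts, "score": score, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


def next_page(items, cursor, size):
    """Return (page, next_cursor).

    items: list of dicts with ts, score, id, sorted by (score, id) descending
    cursor: value returned by a previous call, or None for the first page
    """
    if cursor is not None:
        last = decode_cursor(cursor)
        # Keyset filter: keep only items strictly after the last one seen.
        items = [i for i in items
                 if (i["score"], i["id"]) < (last["score"], last["id"])]
    page = items[:size]
    has_more = len(items) > size
    if page and has_more:
        tail = page[-1]
        return page, encode_cursor(tail["ts"], tail["score"], tail["id"])
    return page, None
```

Including the post id as a tie-breaker matters: two posts with the same score would otherwise be skipped or repeated at a page boundary. Unlike an offset, the cursor still points at the right place when new posts are inserted ahead of it.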


11) Benefits

  • Low latency feed reads via precomputed timelines

  • Scales to celebrity accounts using pull strategy

  • Supports personalization + ads

  • Built-in safety and abuse resistance

  • Clean separation: ingest, fanout, ranking, delivery


12) Interview Talking Points (What to emphasize)

  • Push vs Pull fanout (+ hybrid approach)

  • Celebrity problem (write amplification)

  • Ranking inputs + feature store

  • Cursor-based pagination

  • Ads blending and constraints

  • Spam/abuse: shadow bans, rate limits, anomaly detection

  • Tradeoffs: freshness vs latency vs consistency


Whiteboard-Style Summary (30 seconds)

POST: Post API -> Posts DB -> Event Bus -> Fanout (push normal users)
                                  |
                                  +-> Safety checks

GET: Feed API -> Timeline cache + Celebrity pull -> Ranker -> Blend -> Media -> Response

