Social Media Feed System Design (Timeline) — High Level

1) Use Case & Problem Context

We need to serve a fresh, ranked timeline for each user that blends:

Posts from followed users
Ads
Recommended / suggested content

Key challenges:

Low-latency feed reads (most traffic is reads)
Celebrity/high-fanout accounts (millions of followers)
Ranking + personalization
Spam/abuse (bot posts, engagement manipulation)

2) Core Requirements

Functional

Users create posts (POST /post)
Users fetch their feed (GET /feed?cursor=...)
Support pagination, freshness, and ranking
Mix organic + ads + recommendations

Non-Functional

Very low read latency, measured at the tail (P95/P99) rather than the average
Scalable fan-out strategy
Eventual consistency is acceptable (a new post may take seconds to reach followers' feeds)
Strong abuse controls and observability

3) High-Level Architecture (Text Diagram)


                (Write Path)
Client -> Post API -> Post Store -> Event Bus -> Fanout Workers
                                       |
                                       +--> Safety/Spam Checks
                                       |
                                       +--> Timeline Store/Cache (precomputed)

                (Read Path)
Client -> Feed API -> Timeline Cache/Store -> Ranker -> Media CDN -> Response
                              |
                              +-> For celebrities: On-demand pull of posts
                              |
                              +-> Feature Store (affinity/engagement signals)
                              |
                              +-> Ads + Recommendations mixer

4) Data Flow (Step-by-Step)

A) Write Path: Creating a Post (Ingest)

Goal: Store the post once, distribute it efficiently, and keep the system safe.

Flow

User calls POST /post
Post is written to Posts DB (source of truth)
Emit event to Event Bus: POST_CREATED
Run spam/safety checks
- immediate checks (rate limits, known bad domains)
- async ML checks (spam, nudity, policy)
Update indexes:
- author -> posts index (fast author feed)
- engagement log for the post starts empty and fills as likes/comments/views arrive
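
The flow above can be sketched roughly as follows. This is a minimal illustration, not a production design: `InMemoryBackend`, `create_post`, and `BAD_DOMAINS` are hypothetical names, the Posts DB, author index, and Event Bus are replaced by in-memory structures, and running the immediate check before the write is an assumed ordering.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Post:
    id: str
    author: str
    ts: float
    body: str


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for the Posts DB, author index, and Event Bus."""
    posts: dict = field(default_factory=dict)
    author_index: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


BAD_DOMAINS = {"spam.example"}  # assumed blocklist for the immediate check


def create_post(backend: InMemoryBackend, author: str, body: str) -> Post:
    # Immediate (synchronous) safety check: reject known bad domains.
    if any(domain in body for domain in BAD_DOMAINS):
        raise ValueError("post rejected by immediate safety check")

    post = Post(id=uuid.uuid4().hex, author=author, ts=time.time(), body=body)

    # 1. Source of truth: the post is written exactly once.
    backend.posts[post.id] = post
    # 2. Author -> posts index, used for the profile feed.
    backend.author_index.setdefault(author, []).append(post.id)
    # 3. Emit POST_CREATED; fanout and async ML checks would consume it later.
    backend.events.append(
        {"type": "POST_CREATED", "post_id": post.id, "author": author}
    )
    return post
```

The handler returns as soon as the event is recorded, so fanout cost never sits on the request path.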

Why event bus?

Decouples ingest from fanout and ranking
Makes the system resilient and scalable

B) Fan-out Strategy (Push vs Pull)

This is the most important design decision.

Option 1: Push model (Fan-out-on-write)

For “normal” accounts:

When an author posts, we push that post into followers’ timelines.

✅ Fast reads
✅ Feed can be mostly precomputed
❌ Write volume explodes for celebrities (one post becomes millions of timeline writes)

Option 2: Pull model (Fan-out-on-read)

For “celebrity / high-fanout” accounts:

Do NOT push to all followers.
Instead, followers’ feed pulls celebrity posts at read time.

✅ Avoid massive writes
✅ Scales for high-fanout
❌ Heavier reads (an extra fetch and merge on each feed request)

Recommended approach: Hybrid (Industry standard)

Push for normal accounts
Pull for high-fanout accounts
Threshold based on follower count or write amplification cost
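
A minimal sketch of the hybrid decision, assuming a plain follower-count threshold. `PUSH_FOLLOWER_LIMIT`, `uses_pull`, and `fan_out` are hypothetical names, the value 10,000 is only illustrative, and in-memory dictionaries stand in for the follow graph and the timeline store.

```python
from collections import defaultdict

PUSH_FOLLOWER_LIMIT = 10_000  # assumed threshold; tune from write-amplification cost

followers = defaultdict(set)   # author -> set of follower ids
timelines = defaultdict(list)  # follower -> precomputed post ids (push model)


def uses_pull(author: str) -> bool:
    """High-fanout authors are served by pull at read time."""
    return len(followers[author]) > PUSH_FOLLOWER_LIMIT


def fan_out(author: str, post_id: str) -> int:
    """Push post_id into followers' timelines; return the number of writes."""
    if uses_pull(author):
        return 0  # skip the push entirely; followers pull on read
    for follower in followers[author]:
        timelines[follower].append(post_id)
    return len(followers[author])
```

The return value is the write amplification for that post, which is the number worth monitoring when choosing the threshold.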

Interview line: “Hybrid fanout is the practical choice—push for most users, pull for celebrities.”

5) Read Path: Fetching the Timeline

Endpoint:

GET /feed?cursor=...

Flow

Feed API reads from Timeline Cache/Store
- precomputed items (push model)
Also fetches “pull sources”
- recent posts from celebrity accounts the user follows
Combine candidates into a working set
Rank the candidates using an online ranker
Blend:
- organic posts
- ads
- recommended content
Hydrate media refs into Media CDN URLs
Return results + next cursor
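
The candidate-gathering steps of this flow might look like the sketch below. It assumes pushed items and celebrity posts are already loaded into memory; `Candidate`, `gather_candidates`, and the `limit` of 200 are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    post_id: str
    author: str
    ts: float
    source: str  # "pushed" or "pulled"


def gather_candidates(pushed, celebrity_posts, followed_celebrities, limit=200):
    """Merge precomputed items with read-time pulls into one working set."""
    pulled = [
        cand
        for author in followed_celebrities
        for cand in celebrity_posts.get(author, [])
    ]
    seen = set()
    merged = []
    for cand in list(pushed) + pulled:
        if cand.post_id not in seen:  # guard against double delivery
            seen.add(cand.post_id)
            merged.append(cand)
    # Newest first; the ranker reorders this working set afterwards.
    merged.sort(key=lambda c: c.ts, reverse=True)
    return merged[:limit]
```

Capping the working set before ranking keeps the per-request ranking cost bounded regardless of how many accounts a user follows.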

Read path diagram


GET /feed?cursor
   |
   v
[Timeline Cache] + [Celebrity Pull]
         |
         v
     Candidates
         |
         v
     Ranker  <--- Feature Store
         |
         v
Mixer (Organic + Ads + Recs + Constraints)
         |
         v
Hydrate Media -> Response (items + cursor)

6) Ranking (High-Level)

Ranking decides “what appears first”.

Inputs (features)

Recency (newer posts higher)
Affinity (how close user is to author)
Engagement probability (likes/comments history)
Content type (photo/video/text)
Negative signals (spam, low-quality, repetitive)

Feature Store

To keep ranking fast:

Precompute stable signals (affinity, historical engagement)
Cache them in a Feature Store (Redis/online store)
Online ranker just “looks up” features
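
As an illustration only, a hand-weighted scoring function over these inputs could look like this. The weights, the six-hour half-life, and the `AFFINITY` dictionary (a stand-in for the Feature Store) are all assumptions; a real ranker would typically be a learned model.

```python
import math

# Hypothetical online feature store: (viewer, author) -> affinity in [0, 1].
AFFINITY = {("u1", "alice"): 0.9, ("u1", "bob"): 0.1}

HALF_LIFE_SECONDS = 6 * 3600  # assumed recency half-life
WEIGHTS = {"recency": 0.5, "affinity": 0.3, "engagement": 0.2}  # illustrative


def score(viewer, author, age_seconds, p_engage, spam_score=0.0):
    """Combine recency, affinity, and engagement probability into one score."""
    # Exponential decay: 1.0 for a brand-new post, halved every half-life.
    recency = math.exp(-math.log(2) * age_seconds / HALF_LIFE_SECONDS)
    # Precomputed signal: a lookup at request time, not a recomputation.
    affinity = AFFINITY.get((viewer, author), 0.0)
    base = (
        WEIGHTS["recency"] * recency
        + WEIGHTS["affinity"] * affinity
        + WEIGHTS["engagement"] * p_engage
    )
    # Negative signal: a higher spam score scales the result toward zero.
    return base * (1.0 - spam_score)
```

Because affinity is only a dictionary lookup here, the per-candidate cost stays constant, which is the point of precomputing stable signals.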

Blending constraints (important in interviews)

Freshness guarantees (don’t show all old posts)
Diversity (avoid 10 posts from same author)
Content mix (video/photo/text)
Ads spacing rules
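
Two of these constraints, diversity and ads spacing, can be sketched as a single pass over the ranked list. The function name `blend` and the defaults (two posts per author, one ad after every four organic posts) are assumptions chosen for illustration.

```python
from collections import Counter


def blend(ranked, ads, max_per_author=2, ad_every=4, page_size=10):
    """ranked: (post_id, author) pairs in rank order; ads: list of ad ids."""
    page = []
    per_author = Counter()
    ads_iter = iter(ads)
    organic_since_ad = 0
    for post_id, author in ranked:
        if len(page) >= page_size:
            break
        if per_author[author] >= max_per_author:
            continue  # diversity: skip authors already at their cap
        page.append(post_id)
        per_author[author] += 1
        organic_since_ad += 1
        # Ads spacing: at most one ad after every `ad_every` organic posts.
        if organic_since_ad == ad_every and len(page) < page_size:
            ad = next(ads_iter, None)
            if ad is not None:
                page.append(ad)
            organic_since_ad = 0
    return page
```

Applying constraints after ranking keeps the ranker simple: it only orders candidates, and the mixer decides what the final page is allowed to look like.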

7) Spam & Safety Controls

Spam and abuse must be handled early and continuously.

On ingest (post-time)

Rate limit posting
Reputation scoring (new account, suspicious domains)
ML classifiers (spam, policy violations)
Shadow banning / quarantine queue
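
Rate limiting is the simplest of these controls to sketch. The example below assumes a sliding window of five posts per minute per user; `allow_post`, `MAX_POSTS`, and `WINDOW_SECONDS` are hypothetical names and values, and a real deployment would keep this state in a shared store rather than process memory.

```python
from collections import defaultdict, deque

MAX_POSTS = 5          # assumed limit per window
WINDOW_SECONDS = 60.0  # assumed window length

_recent = defaultdict(deque)  # user -> timestamps of accepted posts


def allow_post(user: str, now: float) -> bool:
    """Sliding-window limiter: accept at most MAX_POSTS per WINDOW_SECONDS."""
    window = _recent[user]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_POSTS:
        return False  # caller may reject or route to a quarantine queue
    window.append(now)
    return True
```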

On engagement (like/comment anomalies)

Detect bot-like behavior and engagement spikes
Downrank suspicious posts
Block/limit repeat offenders
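
One simple way to flag an engagement spike is a z-score against the post's recent history. This is a sketch under stated assumptions, not a recommended detector: `is_spike`, the threshold of 3.0, and the floor of 1.0 on the standard deviation are all illustrative choices.

```python
from statistics import mean, pstdev


def is_spike(history, current, z_threshold=3.0):
    """Flag `current` if it sits far above the recent per-interval counts."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    # Floor sigma so a perfectly flat history cannot divide by zero.
    sigma = max(pstdev(history), 1.0)
    return (current - mu) / sigma > z_threshold
```

A flagged post would then feed the controls listed above, for example by raising the spam score used to downrank it.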

Interview line: “Safety is part of the pipeline—both at ingest and via engagement anomaly detection.”

8) Data Model (Simple)

Tables (conceptual):

posts(id, author, ts, body, media_refs, visibility)
follows(u, v, ts) // u follows v
timelines(user, post_id, ts, source)
- source: pushed vs pulled candidate vs ad vs recommendation
engagements(user, post_id, type, ts)
- type: like/comment/share/view
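
For concreteness, the conceptual tables can be written out as an in-memory SQLite schema. Column types, the primary keys, and the `posts_by_author` index are assumptions added for the sketch; `author_feed` is a hypothetical helper showing how the author -> posts index would be used.

```python
import sqlite3

SCHEMA = """
CREATE TABLE posts (
    id TEXT PRIMARY KEY, author TEXT NOT NULL, ts REAL NOT NULL,
    body TEXT, media_refs TEXT, visibility TEXT DEFAULT 'public'
);
CREATE INDEX posts_by_author ON posts (author, ts DESC);
CREATE TABLE follows (u TEXT, v TEXT, ts REAL, PRIMARY KEY (u, v));  -- u follows v
CREATE TABLE timelines (
    user TEXT, post_id TEXT, ts REAL, source TEXT,
    PRIMARY KEY (user, post_id)
);
CREATE TABLE engagements (user TEXT, post_id TEXT, type TEXT, ts REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO posts (id, author, ts, body) VALUES (?, ?, ?, ?)",
    [("p1", "alice", 100.0, "first"),
     ("p2", "alice", 200.0, "second"),
     ("p3", "bob", 150.0, "hi")],
)


def author_feed(author, limit=20):
    """Profile feed: an author's posts, newest first."""
    rows = conn.execute(
        "SELECT id FROM posts WHERE author = ? ORDER BY ts DESC LIMIT ?",
        (author, limit),
    )
    return [row[0] for row in rows]
```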

9) APIs

POST /post → create a post
GET /feed?cursor=... → main timeline
GET /u/{id}/feed?cursor=... → user profile feed (author’s posts)

10) Pagination (Cursor-based)

Use cursor pagination (not offset) to handle:

Changing ranking
New posts arriving
Large timelines

Cursor typically encodes:

last seen timestamp
last seen rank score
last seen post id
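
A minimal sketch of such a cursor, assuming it is an opaque base64-encoded JSON object holding the three fields above. `encode_cursor`, `decode_cursor`, and `next_page` are hypothetical names, and the in-memory list stands in for the ranked candidate set.

```python
import base64
import json


def encode_cursor(ts, score, post_id):
    """Pack the last-seen position into an opaque, URL-safe token."""
    raw = json.dumps({"ts": ts, "score": score, "id": post_id})
    return base64.urlsafe_b64encode(raw.encode()).decode()


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["score"], data["id"]


def next_page(items, cursor=None, page_size=2):
    """items: (ts, score, post_id) tuples sorted by (score, ts, post_id) descending."""
    if cursor is not None:
        ts, score, post_id = decode_cursor(cursor)
        # Keep only items strictly after the last-seen position.
        items = [i for i in items if (i[1], i[0], i[2]) < (score, ts, post_id)]
    page = items[:page_size]
    next_cursor = encode_cursor(*page[-1]) if len(page) == page_size else None
    return page, next_cursor
```

Including the post id as a tie-breaker keeps the position unambiguous when two items share a score and timestamp. If scores change between requests, a score-based position is only approximate, which is usually tolerable for a feed.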

11) Benefits

Low latency feed reads via precomputed timelines
Scales to celebrity accounts using pull strategy
Supports personalization + ads
Built-in safety and abuse resistance
Clean separation: ingest, fanout, ranking, delivery

12) Interview Talking Points (What to emphasize)

Push vs Pull fanout (+ hybrid approach)
Celebrity problem (write amplification)
Ranking inputs + feature store
Cursor-based pagination
Ads blending and constraints
Spam/abuse: shadow bans, rate limits, anomaly detection
Tradeoffs: freshness vs latency vs consistency

Whiteboard-Style Summary (30 seconds)


POST: Post API -> Posts DB -> Event Bus -> Fanout (push normal users)
                                 |
                                 +-> Safety checks

GET: Feed API -> Timeline cache + Celebrity pull -> Ranker -> Blend -> Media -> Response

The Backend Engineer’s Journal