Notification Service at Scale (Email/SMS/Push/In-App)
1) Use Case & Problem Context
We need to send notifications to millions of users across multiple channels:
- Email, SMS, Push, In-app
The system must support:
- Templates + personalization (e.g., {name}, {orderId})
- Localization fallback
- User preferences (opt-in/out)
- Quiet hours + frequency caps
- Retries with exponential backoff
- Provider failover (if one provider is down)
- Delivery status via provider webhooks
- Observability (success rate, bounces, latency)
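To make the first two requirements concrete, here is a minimal sketch of template rendering with localization fallback. The template store, the `order_shipped` template id, the fallback order (exact locale, then base language, then a default of `en`), and the function names are all assumptions for illustration, not a prescribed design.

```python
from string import Formatter

# Hypothetical template store: template id -> locale -> text.
TEMPLATES = {
    "order_shipped": {
        "en": "Hi {name}, your order {orderId} has shipped.",
        "fr": "Bonjour {name}, votre commande {orderId} a été expédiée.",
    },
}

DEFAULT_LOCALE = "en"  # assumed last-resort locale


def locale_chain(locale: str) -> list:
    """Return the lookup order, e.g. 'fr-CA' -> ['fr-CA', 'fr', 'en']."""
    chain = [locale]
    if "-" in locale:
        chain.append(locale.split("-")[0])
    if DEFAULT_LOCALE not in chain:
        chain.append(DEFAULT_LOCALE)
    return chain


def render(template_id: str, locale: str, params: dict) -> str:
    """Render the first template variant found along the fallback chain."""
    variants = TEMPLATES[template_id]
    for candidate in locale_chain(locale):
        if candidate in variants:
            text = variants[candidate]
            break
    else:
        raise KeyError(f"no template variant for {template_id!r}")

    # Fail loudly on missing placeholders rather than sending "{name}" to a user.
    needed = {field for _, field, _, _ in Formatter().parse(text) if field}
    missing = needed - set(params)
    if missing:
        raise ValueError(f"missing template params: {sorted(missing)}")
    return text.format(**params)
```

Calling `render("order_shipped", "fr-CA", {"name": "Ana", "orderId": "A1"})` falls back from `fr-CA` to the `fr` variant, and an unknown locale such as `de` falls back to `en`.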
2) High-Level Architecture (Text Diagram)
Key principle:
✅ For notifications: the API should be fast and async; heavy work happens in workers.
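A rough sketch of that principle: the handler only validates and enqueues, then returns right away, while a background worker drains the queue. The in-memory `queue.Queue` stands in for a durable message broker, and `post_notify`, the payload fields, and the response shape are hypothetical names chosen for this example.

```python
import queue
import threading
import uuid

# In-memory stand-in for a durable queue (assumption; a real system would use a broker).
job_queue = queue.Queue()
processed = []


def post_notify(payload: dict) -> dict:
    """Hypothetical handler for POST /notify: validate, enqueue, return at once."""
    if not payload.get("recipients") or not payload.get("template_id"):
        return {"status": 400, "error": "recipients and template_id are required"}
    job_id = str(uuid.uuid4())
    job_queue.put({"job_id": job_id, **payload})
    # 202 Accepted: the job is queued, not yet delivered.
    return {"status": 202, "job_id": job_id}


def worker() -> None:
    """Drain the queue in the background; the heavy work would live here."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel used to stop the worker
            job_queue.task_done()
            break
        processed.append(job["job_id"])  # placeholder for fan-out, render, send
        job_queue.task_done()


if __name__ == "__main__":
    threading.Thread(target=worker, daemon=True).start()
    print(post_notify({"recipients": ["u1"], "template_id": "order_shipped"}))
    job_queue.put(None)
    job_queue.join()
```

The caller gets a job id back immediately and can poll or wait for status events; nothing in the request path touches a provider.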
3) Core Data Flow
A) Send flow
- Client calls POST /notify
- Service creates a Send Job
- Enqueue tasks per recipient per channel
- Fanout worker checks:
  - preferences
  - quiet hours
  - dedupe/idempotency
- Render template
- Route to provider (failover if needed)
- Persist delivery status
- Retry with backoff if transient error
- Move to DLQ if permanently failing
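The worker side of the send flow could be sketched roughly as follows. This is one possible shape under stated assumptions: the `Task` fields, the two exception classes, the outcome strings, the jittered backoff formula, and `max_attempts=5` are all illustrative choices, and providers are modeled as plain callables.

```python
import random
from dataclasses import dataclass
from datetime import time
from typing import Callable


class TransientError(Exception):
    """Timeouts, rate limits, 5xx responses: worth retrying."""


class PermanentError(Exception):
    """Invalid address, hard bounce: retrying will not help."""


@dataclass
class Task:
    user_id: str
    channel: str
    body: str
    attempt: int = 0


Provider = Callable[[Task], None]  # hypothetical provider interface


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if now is in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Exponential backoff with jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_failover(task: Task, providers: list) -> str:
    """Try providers in order; move to the next one on a transient failure."""
    last_error = None
    for provider in providers:
        try:
            provider(task)
            return "sent"
        except TransientError as exc:
            last_error = exc
    raise TransientError("all providers failed") from last_error


def handle(task: Task, providers: list, opted_out: set, now: time,
           quiet: tuple, max_attempts: int = 5) -> str:
    """Return 'skipped_opt_out', 'skipped_quiet_hours', 'sent', 'retry', or 'dlq'."""
    # Cheap checks first, so skipped tasks never reach rendering or a provider.
    if (task.user_id, task.channel) in opted_out:
        return "skipped_opt_out"
    if in_quiet_hours(now, *quiet):
        return "skipped_quiet_hours"
    try:
        return send_with_failover(task, providers)
    except PermanentError:
        return "dlq"
    except TransientError:
        task.attempt += 1
        if task.attempt >= max_attempts:
            return "dlq"
        # Caller re-enqueues the task after backoff_seconds(task.attempt).
        return "retry"
```

Whether a task skipped for quiet hours is dropped or rescheduled for the end of the window is a product decision this sketch leaves to the caller.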
B) Status flow
- Provider calls webhook: delivered/bounced/failed
- Store delivery event and update metrics
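The status flow can be sketched as a small webhook handler that stores each event and bumps counters. The event field names (`event_id`, `message_id`, `status`) are hypothetical since every provider has its own payload format, the dicts stand in for database tables, and dropping repeated events by id is an added assumption (providers may redeliver a webhook).

```python
from collections import Counter

VALID_STATUSES = {"delivered", "bounced", "failed"}

delivery_events = {}   # event_id -> event (stand-in for an events table)
latest_status = {}     # message_id -> most recent status
metrics = Counter()    # status -> count


def handle_webhook(event: dict) -> bool:
    """Record a provider callback; return False if it was already seen."""
    status = event["status"]
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    if event["event_id"] in delivery_events:
        return False  # redelivered webhook: do not double count
    delivery_events[event["event_id"]] = event
    latest_status[event["message_id"]] = status
    metrics[status] += 1
    return True


def success_rate() -> float:
    """Delivered events as a fraction of all recorded events."""
    total = sum(metrics.values())
    return metrics["delivered"] / total if total else 0.0
```

The same counters feed the observability requirement: success rate and bounce counts fall out of `metrics` directly.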
Notes (What to say in interviews)
- Separate hot path vs cold path: API returns quickly; workers do heavy work.
- Fan-out: convert one job into many tasks (per user/channel).
- Preferences first: opt-out + quiet hours should skip early.
- Provider failover: try next provider on transient failures.
- Retries + DLQ: exponential backoff + poison message handling.
- Idempotency: dedupe key prevents duplicates.
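The fan-out and idempotency points can be illustrated together. In this sketch the dedupe key is a hash of job id, user id, and channel, which is one reasonable choice rather than the only one, and `Deduper` is an in-memory stand-in for a shared store with set-if-absent semantics; all names are hypothetical.

```python
import hashlib
from itertools import product


def dedupe_key(job_id: str, user_id: str, channel: str) -> str:
    """Stable key: the same (job, user, channel) always hashes to the same value."""
    raw = f"{job_id}:{user_id}:{channel}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def fan_out(job_id: str, user_ids: list, channels: list) -> list:
    """Expand one job into one task per (user, channel) pair."""
    return [
        {
            "job_id": job_id,
            "user_id": user_id,
            "channel": channel,
            "dedupe_key": dedupe_key(job_id, user_id, channel),
        }
        for user_id, channel in product(user_ids, channels)
    ]


class Deduper:
    """In-memory stand-in for a shared set-if-absent store (assumption)."""

    def __init__(self) -> None:
        self._seen = set()

    def first_time(self, key: str) -> bool:
        """Return True only the first time a key is offered."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

If the queue redelivers a task, `first_time` returns False for its key and the worker can drop it before sending, so two users on three channels still produce exactly six sends.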