Confusion Matrix + Precision/Recall (Super Simple, With Examples)
1) Binary Classification Setup
Binary classification means the model predicts one of two classes:
- Positive (1) → e.g., Fraud, Spam, Disease present
- Negative (0) → e.g., Not Fraud, Not Spam, Healthy
Important: “Positive” does not mean “good”. It just means the class you care about detecting.
2) Confusion Matrix (The 2×2 Table)
A confusion matrix compares Actual vs Predicted:
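                      Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)   TP (True Positive)       FN (False Negative)
Actual Negative (0)   FP (False Positive)      TN (True Negative)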
✅ The 4 outcomes (all combinations)
1) True Positive (TP)
- Actual = Positive (1)
- Predicted = Positive (1)
Example (Fraud):
- Transaction is fraud ✅
- Model says fraud ✅
2) True Negative (TN)
- Actual = Negative (0)
- Predicted = Negative (0)
Example:
- Transaction is not fraud ✅
- Model says not fraud ✅
3) False Positive (FP) — “False Alarm”
- Actual = Negative (0)
- Predicted = Positive (1)
Example:
- Transaction is not fraud
- Model says fraud ❌ (wrong)
Impact: blocks good users, annoys customers
4) False Negative (FN) — “Miss”
- Actual = Positive (1)
- Predicted = Negative (0)
Example:
- Transaction is fraud
- Model says not fraud ❌ (wrong)
Impact: fraud slips through (often expensive)
3) Precision, Recall, Accuracy (Simple Meaning)
✅ Accuracy
“Out of all predictions, how many were correct?”
Good when:
- classes are roughly balanced (similar numbers of positives and negatives)
✅ Precision
“When the model says Positive, how often is it correct?”
High precision means:
- few false positives
- good when “false alarms” are costly (e.g., blocking legitimate bank transactions)
✅ Recall (Sensitivity)
“Out of actual Positives, how many did we catch?”
High recall means:
- few false negatives
- good when missing positives is costly (e.g., cancer detection, fraud detection)
✅ F1 Score
“Balance between Precision and Recall”
Use when:
- you need a single score that balances FP and FN (see the formulas below)
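For reference, the formulas behind these four metrics, written with the confusion-matrix counts:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)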
4) Real Example With Numbers (Very Clear)
Assume we have:
- TP = 40
- FP = 10
- FN = 20
- TN = 30
Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.80
Meaning:
- When we say “Positive”, we’re correct 80% of the time.
Recall = TP / (TP + FN) = 40 / (40 + 20) ≈ 0.67
Meaning:
- We catch 67% of all real positives.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (40 + 30) / 100 = 0.70
Meaning:
- Overall, 70% of all predictions are correct.
5) Small Python Code Example (Confusion Matrix + Precision/Recall)
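Here is a minimal sketch using scikit-learn (assumed here because the cm.ravel() order TN, FP, FN, TP mentioned below matches sklearn's confusion_matrix); the labels are made up purely for illustration:

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score

# Made-up labels: 1 = fraud (positive), 0 = not fraud (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # sklearn order: TN, FP, FN, TP

print("Confusion matrix:\n", cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))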
Output interpretation
- cm.ravel() gives: TN, FP, FN, TP (in that order)
- Use this to clearly see FP vs FN
6) How to Remember FP vs FN (Super Easy Trick)
False Positive (FP) = “False Alarm”
- Model says Positive
- But it’s actually Negative
Example: Spam filter puts real email into spam ❌
False Negative (FN) = “Miss”
- Model says Negative
- But it’s actually Positive
Example: Fraud transaction not detected ❌
7) When to Focus on Precision vs Recall (Interview Ready)
Focus on Precision when FP is costly
- Spam filter (don’t block important emails)
- Payment fraud block (don’t block genuine customers)
- Legal/compliance flags
Focus on Recall when FN is costly
- Cancer detection (don’t miss disease)
- Fraud detection (don’t miss fraud)
- Security intrusion detection
8) Final Summary (One Paragraph)
A confusion matrix shows TP, TN, FP, FN. False positives are “false alarms” (predict positive when actually negative). False negatives are “misses” (predict negative when actually positive). Precision measures how reliable positive predictions are (reduces FP). Recall measures how many real positives are detected (reduces FN). F1 balances precision and recall, and accuracy is overall correctness but can be misleading when classes are imbalanced.