Confusion Matrix + Precision/Recall (Super Simple, With Examples)
1) Binary Classification Setup
Binary classification means the model predicts one of two classes:
- Positive (1) → e.g., Fraud, Spam, Disease present
- Negative (0) → e.g., Not Fraud, Not Spam, Healthy
Important: “Positive” does not mean “good”. It just means the class you care about detecting.
2) Confusion Matrix (The 2×2 Table)
A confusion matrix compares Actual vs Predicted:
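                      Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)   TP (True Positive)       FN (False Negative)
Actual Negative (0)   FP (False Positive)      TN (True Negative)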
✅ The 4 outcomes (all combinations)
1) True Positive (TP)
- Actual = Positive (1)
- Predicted = Positive (1)
Example (Fraud):
- Transaction is fraud ✅
- Model says fraud ✅
2) True Negative (TN)
- Actual = Negative (0)
- Predicted = Negative (0)
Example:
- Transaction is not fraud ✅
- Model says not fraud ✅
3) False Positive (FP) — “False Alarm”
- Actual = Negative (0)
- Predicted = Positive (1)
Example:
- Transaction is not fraud
- Model says fraud ❌ (wrong)
Impact: blocks good users, annoys customers
4) False Negative (FN) — “Miss”
- Actual = Positive (1)
- Predicted = Negative (0)
Example:
- Transaction is fraud
- Model says not fraud ❌ (wrong)
Impact: fraud slips through (often expensive)
3) Precision, Recall, Accuracy (Simple Meaning)
✅ Accuracy
“Out of all predictions, how many were correct?”
Good when:
- classes are roughly balanced (similar numbers of positives and negatives)
✅ Precision
“When the model says Positive, how often is it correct?”
High precision means:
- few false positives
- good when “false alarms” are costly (e.g., blocking legitimate bank transactions)
✅ Recall (Sensitivity)
“Out of actual Positives, how many did we catch?”
High recall means:
- few false negatives
- good when missing positives is costly (e.g., cancer detection, fraud detection)
✅ F1 Score
“Balance between Precision and Recall”
Use when:
- you need a single score that balances FP and FN (see the formulas below)
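For reference, the formulas behind these four metrics, written with the confusion-matrix counts:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)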
4) Real Example With Numbers (Very Clear)
Assume we have:
- TP = 40
- FP = 10
- FN = 20
- TN = 30
Precision = TP / (TP + FP) = 40 / (40 + 10) = 0.80
Meaning:
- When we say “Positive”, we’re correct 80% of the time.
Recall = TP / (TP + FN) = 40 / (40 + 20) ≈ 0.67
Meaning:
- We catch 67% of all real positives.
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (40 + 30) / 100 = 0.70
Meaning:
- Overall, 70% of all predictions are correct.
5) Small Python Code Example (Confusion Matrix + Precision/Recall)
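Here is a minimal sketch using scikit-learn (assumed here because the cm.ravel() order TN, FP, FN, TP mentioned below matches sklearn's confusion_matrix); the labels are made up purely for illustration:

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score

# Made-up labels: 1 = fraud (positive), 0 = not fraud (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # sklearn order: TN, FP, FN, TP

print("Confusion matrix:\n", cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))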
Output interpretation
- cm.ravel() gives: TN, FP, FN, TP (in that order)
- Use this to clearly see FP vs FN
6) How to Remember FP vs FN (Super Easy Trick)
False Positive (FP) = “False Alarm”
- Model says Positive
- But it’s actually Negative
Example: Spam filter puts real email into spam ❌
False Negative (FN) = “Miss”
- Model says Negative
- But it’s actually Positive
Example: Fraud transaction not detected ❌
7) When to Focus on Precision vs Recall (Interview Ready)
Focus on Precision when FP is costly
- Spam filter (don’t block important emails)
- Payment fraud block (don’t block genuine customers)
- Legal/compliance flags
Focus on Recall when FN is costly
- Cancer detection (don’t miss disease)
- Fraud detection (don’t miss fraud)
- Security intrusion detection
8) Final Summary (One Paragraph)
A confusion matrix shows TP, TN, FP, FN. False positives are “false alarms” (predict positive when actually negative). False negatives are “misses” (predict negative when actually positive). Precision measures how reliable positive predictions are (reduces FP). Recall measures how many real positives are detected (reduces FN). F1 balances precision and recall, and accuracy is overall correctness but can be misleading when classes are imbalanced.