AI/ML Basics — Supervised vs Unsupervised Learning (Simple Guide + Code)
1) What is Machine Learning?
Machine Learning (ML) helps computers learn patterns from data so they can:
- predict outcomes (e.g., house price)
- classify things (e.g., spam vs not spam)
- group similar items (e.g., customer segments)
2) Supervised vs Unsupervised Learning
✅ Supervised Learning (Labeled Data)
What
You train a model using:
- input features X
- known output labels/targets y
Example:
- X = [size, bedrooms]
- y = house_price
Goal
Learn a mapping:
X → y
Common problems
- Regression: predict a number (price, demand, temperature)
- Classification: predict a category (spam/ham, fraud/not fraud)
✅ Unsupervised Learning (Unlabeled Data)
What
You only have X, but no labels y.
Example:
- customer data: spending, visits, age (no “segment label” provided)
Goal
Discover structure:
- clusters (groups)
- similarity
- hidden patterns
Common problems
- Clustering (K-Means, Hierarchical)
- Dimensionality reduction (PCA)
3) Supervised Learning Algorithms (with Simple Code)
3.1 Linear Regression (Regression)
Use case
Predict a continuous value:
- house price
- sales forecast
Code (Simple)
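A minimal sketch with scikit-learn. The feature values and prices below are made up purely for illustration:

```python
# Minimal Linear Regression sketch (scikit-learn).
# The tiny dataset below is invented for illustration only.
from sklearn.linear_model import LinearRegression

# X = [size_sqft, bedrooms], y = house_price
X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]
y = [200000, 280000, 340000, 410000]

model = LinearRegression()
model.fit(X, y)

# Predict the price of a new 1800 sqft, 3-bedroom house
print(model.predict([[1800, 3]]))
```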
3.2 Logistic Regression (Classification)
Use case
Predict a category:
- spam vs not spam
- pass/fail
- fraud/not fraud
Code (Iris dataset)
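A minimal sketch on scikit-learn's built-in Iris dataset; the train/test split and max_iter value are just reasonable defaults, not tuned choices:

```python
# Logistic Regression on the Iris dataset (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)  # higher max_iter so the solver converges
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```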
3.3 Random Forest (Classification + Regression)
What
Random Forest is an ensemble of many decision trees.
It reduces overfitting compared to a single tree and works well on tabular data in practice.
A) Random Forest Classifier
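A minimal sketch, reusing the Iris dataset so it runs out of the box (the dataset choice here is just for illustration):

```python
# Random Forest Classifier on the Iris dataset (scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```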
B) Random Forest Regressor
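A minimal regression sketch; it uses scikit-learn's built-in diabetes dataset only because it ships with the library (any numeric target works the same way):

```python
# Random Forest Regressor on the built-in diabetes dataset (scikit-learn).
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = RandomForestRegressor(n_estimators=100, random_state=42)
reg.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, reg.predict(X_test)))
```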
4) Unsupervised Learning Algorithms (with Simple Code)
4.1 K-Means Clustering
What
K-Means groups points into K clusters by minimizing distance to cluster centers.
Use cases
- customer segmentation
- grouping similar products
- anomaly detection (rough)
Code
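A minimal sketch on made-up 2-D customer data (columns are spending and monthly visits; the numbers are invented for illustration):

```python
# K-Means clustering on made-up customer data (scikit-learn).
# Columns: [annual_spending, visits_per_month] -- values are invented.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    [500, 2], [520, 3], [480, 2],       # low spenders
    [1500, 8], [1600, 7], [1450, 9],    # mid spenders
    [5000, 20], [5200, 22], [4800, 19], # high spenders
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)  # one cluster label per row

print("Cluster labels:", labels)
print("Cluster centers:\n", kmeans.cluster_centers_)
```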
Interpretation
- Each row gets a cluster label (0/1/2)
- Points with the same label belong to the same group
4.2 Hierarchical Clustering (Agglomerative)
What
Builds clusters by progressively merging closest groups:
- start with each point as its own cluster
- merge the closest clusters until the desired cluster count is reached
Use cases
- when you want a “cluster tree” (dendrogram concept)
- small/medium datasets
Code
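A minimal sketch using the same kind of made-up customer data as the K-Means example above:

```python
# Agglomerative (hierarchical) clustering on made-up customer data (scikit-learn).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([
    [500, 2], [520, 3], [480, 2],
    [1500, 8], [1600, 7], [1450, 9],
    [5000, 20], [5200, 22], [4800, 19],
])

# "ward" linkage merges the pair of clusters that increases variance the least
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)

print("Cluster labels:", labels)
```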
Note
- "ward" works best with Euclidean distance
- linkage options: ward, complete, average, single
5) When to Use Which Algorithm? (Simple Decision)
Supervised
✅ Linear Regression → numeric prediction, linear relationship
✅ Logistic Regression → simple classification, interpretable
✅ Random Forest → strong baseline for most tabular problems
Unsupervised
✅ K-Means → fast clustering when you know K
✅ Hierarchical → good when you want the cluster structure (dendrogram) and don't need to scale to huge datasets
6) Interview-Friendly Summary (One Paragraph)
Supervised learning uses labeled data (X, y) to learn a mapping and is used for regression and classification (e.g., Linear Regression, Logistic Regression, Random Forest). Unsupervised learning uses only features X to find hidden patterns, mainly clustering (e.g., K-Means, Hierarchical). Linear regression predicts numbers, logistic regression predicts classes, random forests provide robust performance by combining many trees, and clustering algorithms group similar points without labels.