🧩 Understanding Java HashMap — Internal Working Explained Step by Step
The HashMap is one of the most powerful and widely used data structures in Java.
It stores key–value pairs, allowing fast lookups, insertions, and deletions — on average, in O(1) time.
⚙️ Key Features of HashMap
- ✅ Stores key–value pairs
- 🚫 Duplicate keys are not allowed (the latest value replaces the old one)
- ⚡ Fast `put()` and `get()` operations
- 🔀 Order is not maintained
- ⚠️ Not synchronized (use `ConcurrentHashMap` for thread safety)
- 💾 Allows one null key and multiple null values
💡 Internal Data Structure
Internally, a HashMap is an array of buckets,
and each bucket holds a linked list (or red-black tree, since Java 8) of entries.
Each entry (also called a Node) stores:
- `key`
- `value`
- `hash` (calculated from the key)
- `next` (a pointer to the next node, used when multiple entries fall into the same bucket)
This chaining structure is the key to how HashMap handles collisions — when two different keys map to the same bucket index.
🧮 Step-by-Step: How `put()` Works
Let’s take an example:
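A minimal sketch of the call we will trace, using the "Apple" → "Red" pair referenced throughout this article:

```java
HashMap<String, String> map = new HashMap<>();
map.put("Apple", "Red"); // insert the key "Apple" with value "Red"
```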
Step 1️⃣ Compute Hash Code
HashMap calls the hashCode() method of the key:
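```java
int h = key.hashCode(); // e.g. "Apple".hashCode()
```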
For "Apple", this produces an integer (e.g., 63043).
Then HashMap spreads the hash bits using an internal formula to reduce collisions:
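```java
// From OpenJDK's java.util.HashMap: XOR the upper 16 bits of the hash
// into the lower 16 bits so that high bits also influence the bucket index
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```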
Step 2️⃣ Find the Bucket Index
The index is computed using modulo on the array size:
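```java
// n is the table length; since n is always a power of two,
// (n - 1) & hash is equivalent to hash % n (this is what HashMap actually does)
int index = (n - 1) & hash;
```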
If the array size is 16, and hash = 63043 →
index = 63043 % 16 = 3
So this entry goes into bucket 3.
Step 3️⃣ Check for Collision
If bucket 3 is empty → insert a new node directly.
If not empty → traverse the linked list using the next pointer and compare keys using .equals():
- If the key already exists → replace its value.
- Otherwise → attach the new entry at the end of the list.
This is why each node has a next pointer — it allows HashMap to connect multiple entries inside the same bucket.
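A simplified sketch of that traversal, loosely modeled on OpenJDK's implementation (field and variable names are illustrative):

```java
// Simplified sketch, not the exact OpenJDK code
Node<K,V> e = table[index];
while (e != null) {
    if (e.hash == hash && key.equals(e.key)) {
        e.value = value;                             // key already exists: replace its value
        return;
    }
    if (e.next == null) {
        e.next = new Node<>(hash, key, value, null); // append at the end of the chain
        return;
    }
    e = e.next;                                      // follow the chain
}
```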
Step 4️⃣ Store the Entry
Each entry looks like this internally:
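```java
// Simplified from OpenJDK's java.util.HashMap.Node
// (the real class also implements Map.Entry<K,V>)
static class Node<K,V> {
    final int hash;   // spread hash of the key
    final K key;
    V value;
    Node<K,V> next;   // next node in the same bucket, or null
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash; this.key = key; this.value = value; this.next = next;
    }
}
```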
When "Apple" is inserted:
If "Mango" also hashes to index 3:
Step 5️⃣ Resize if Needed
When the number of stored entries exceeds the load factor threshold (default = 0.75 × capacity),
the HashMap doubles its capacity and rehashes all entries to new buckets.
This process ensures performance stays close to O(1).
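With the defaults, for example:

```
threshold = capacity × loadFactor = 16 × 0.75 = 12
inserting the 13th entry triggers a resize to capacity 32, followed by rehashing
```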
🧩 Step-by-Step: How `get()` Works
When you call:
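```java
String color = map.get("Apple"); // returns "Red"
```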
Here’s what happens:
1️⃣ Compute hash for "Apple" again.
2️⃣ Find bucket index using hash % array_length.
3️⃣ Go to that bucket and check the first node:
- If the key matches → return the value.
- If not → follow the `next` pointer to check the next node.

4️⃣ Continue until the key is found or `next == null`.
✅ Result: Returns "Red".
🔗 Why the Linked List (and `next` Pointer) Is Needed
Let’s imagine two different keys produce the same index:
| Key | hash(key) | index |
|---|---|---|
| "Apple" | 63043 | 3 |
| "Mango" | 75395 | 3 |
Both fall into bucket 3 — this is a hash collision.
To prevent overwriting existing entries,
HashMap connects them using a linked list.
When get("Mango") is called, HashMap checks the first node (Apple), finds no match, then follows the next pointer to Mango and retrieves "Yellow".
👉 Without the linked list, only one entry could exist per bucket — collisions would cause data loss.
🌳 Tree Optimization (Since Java 8)
If too many nodes end up in the same bucket (default threshold = 8) and the table has at least 64 buckets,
HashMap converts that bucket's linked list into a Red-Black Tree (with fewer buckets, it simply resizes instead).
Why?
- Linked list lookup = O(n)
- Tree lookup = O(log n)
This significantly improves performance in worst-case scenarios.
🧠 Summary of Internal Workflow
| Operation | Description | Complexity |
|---|---|---|
| `put()` | Hash → Index → Insert/Replace → Rehash if needed | O(1) avg, O(n) worst |
| `get()` | Hash → Index → Traverse linked list/tree | O(1) avg, O(log n) tree |
| `remove()` | Hash → Index → Remove node | O(1) avg |
| Collision Handling | Linked list (or tree) per bucket | O(n) or O(log n) |
| Resizing | Doubles capacity when threshold exceeded | O(n) (rehash) |
🧾 Quick Example
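A minimal runnable sketch using the same keys as the rest of this article (the class name HashMapDemo is illustrative):

```java
import java.util.HashMap;

public class HashMapDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("Apple", "Red");
        map.put("Mango", "Yellow");
        map.put("Apple", "Green"); // duplicate key: replaces "Red"

        System.out.println(map.get("Apple"));
        System.out.println(map.get("Mango"));
    }
}
```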
Output
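```
Green
Yellow
```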
🧩 Text-Based Visualization
If keys 3 and 19 hash to the same bucket,
they are connected using the next pointer to form a chain.
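For Integer keys 3 and 19 in a default 16-bucket table (3 % 16 = 3 and 19 % 16 = 3), the layout looks roughly like this:

```
table[0]  → null
table[1]  → null
table[2]  → null
table[3]  → (key=3) → (key=19) → null   (chained via the next pointer)
table[4]  → null
...
table[15] → null
```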
⚠️ Important Notes
- HashMap is not thread-safe; concurrent writes can corrupt its state.
- Load factor (default = 0.75) controls how full the table can get before resizing.
- You can specify both initial capacity and load factor when creating one, as shown below:
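```java
// initial capacity 32, load factor 0.80 (resize when size exceeds 32 × 0.80)
HashMap<String, String> map = new HashMap<>(32, 0.80f);
```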
✅ Final Takeaways
| Concept | Purpose |
|---|---|
| Bucket Array | Primary storage of entries |
| Linked List / Tree | Handles collisions |
| `next` Pointer | Connects entries within the same bucket |
| Rehashing | Redistributes keys when capacity grows |
| O(1) Average Time | Thanks to hash distribution & chaining |
| O(n) Worst Case | If many keys land in same bucket |