INTRODUCTION TO HASHING
What is Hashing? Hashing is a method of storing and retrieving data in O(1) (Constant Time) on average.
The Problem: In an unsorted Array or Linked List, to find a specific number (e.g., "500"), you might have to check every
single slot. This is O(n), which is slow for large collections.
The Solution: Hashing uses a formula to calculate the exact address. If you want "500", the formula tells you:
"Go immediately to Index 4".
Representation:
Hash Table: An array of fixed size m where data is stored.
Hash Function (h(x)): A function that takes an input (Key) and produces an output (Index) between 0 and m−1.
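The two pieces above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which formula sends "500" to Index 4, so the table size 8 and the remainder-based h below are assumptions chosen to reproduce that index. Collisions are ignored here.

```python
M = 8  # assumed table size, picked only so that 500 lands at index 4

def h(key: int) -> int:
    """Toy hash function: map a key to an index in 0..M-1."""
    return key % M

table = [None] * M        # the hash table: a fixed-size array

table[h(500)] = 500       # store: jump straight to the computed slot
found = table[h(500)]     # retrieve: one calculation, one array access

print(h(500), found)      # 4 500
```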
PROPERTIES & APPLICATIONS
Properties of a Good Hash Function:
Low Cost: It must be very fast to calculate.
Uniform Distribution: It should spread keys evenly across the table. It should not clump data into one area (Clustering).
Deterministic: The same key must always produce the same index. Randomness is not allowed.
Real-World Applications:
Databases: Indexing for fast lookups (e.g., finding a user by ID).
Cryptography: Storing passwords with cryptographic hash functions (e.g., SHA-256; the older MD5 is now considered broken). We never store plain-text passwords; we store the hash.
Compilers: "Symbol Tables" used to store variable names and function names during compilation.
Caches: Browser caches use hashing to quickly locate saved web pages.
HASH FUNCTION 1: DIVISION METHOD
Map a key into a table slot by taking the remainder of the key divided by the table size.
Formula:
h(k) = k mod m
(where k is the key and m is the table size)
Detailed Example:
Table Size (m): 13 (Using a Prime number is best to avoid patterns).
Key (k): 25
Calculation: 25÷13=1 with a remainder of 12.
Result: Store 25 at Index 12.
Pros & Cons:
Pro: Extremely fast (just one division operation).
Con: Poor distribution if m is not a prime number (e.g., if m = 2^p, the hash depends only on the lowest p bits of the key).
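A minimal Python sketch of the division method, reproducing the worked example and illustrating the power-of-two problem. The function name division_hash and the sample keys are assumptions made for illustration.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m

print(division_hash(25, 13))  # 12, as in the worked example

# Keys that share their lowest 4 bits all collide when m = 16 = 2^4,
# but spread out when m is the prime 13.
keys = [8, 24, 40, 56, 72]
power_of_two = [division_hash(k, 16) for k in keys]
prime = [division_hash(k, 13) for k in keys]

print(power_of_two)  # [8, 8, 8, 8, 8]
print(prime)         # [8, 11, 1, 4, 7]
```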
HASH FUNCTION 2: MULTIPLICATION METHOD
Multiply the key by a constant A (0 < A < 1), take the fractional part, and scale it to the table size.
Steps:
Choose constant A (Knuth suggests (√5 − 1)/2 ≈ 0.618).
Multiply k×A.
Take the fractional part (remove the whole number).
Multiply by table size m.
Take the floor (integer part).
Detailed Example:
Size (m): 100
Key (k): 123
Constant (A): 0.618
123×0.618=76.014
Fraction is 0.014
0.014×100=1.4
Floor(1.4) = 1
Result: Store at Index 1.
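The steps can be written compactly as h(k) = floor(m × frac(k × A)). The Python sketch below follows that formula. The function name multiplication_hash is an assumption, and floating-point rounding could shift the result for very large keys.

```python
import math

# Knuth's suggested constant, (sqrt(5) - 1) / 2, roughly 0.618
A = (math.sqrt(5) - 1) / 2

def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication method: floor(m * fractional_part(k * a))."""
    fractional = (k * a) % 1.0      # keep only the part after the decimal point
    return math.floor(m * fractional)

# The slide's example uses the rounded constant 0.618.
print(multiplication_hash(123, 100, 0.618))  # 1
# The full-precision constant gives the same slot for this key.
print(multiplication_hash(123, 100))         # 1
```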
HASH FUNCTION 3: MID-SQUARE METHOD
Good for randomization. Square the key to get a larger number, then extract the middle digits.
Steps:
Compute k².
Extract the middle r digits (where r depends on the table size; if the table size is 100, we need 2 digits).
Detailed Example:
Table Size: 100 (Indices 00-99)
Key: 45
Square: 45×45=2025
Extract Middle: The middle digits of 2025 are 02.
Result: Store at Index 2.
Note: Why the middle? The middle digits depend on all digits of the original key, mixing the data thoroughly.
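A small Python sketch of the mid-square steps. The slides do not say how to pick "the middle" when the leftover digits cannot be split evenly, so the rule used here (drop the extra digit from the right) and the zero-padding of short squares are assumptions.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square the key and return its middle r decimal digits as an index."""
    squared = str(k * k).zfill(r)       # pad so there are at least r digits
    start = (len(squared) - r) // 2     # assumed rule for locating the middle
    return int(squared[start:start + r])

print(mid_square_hash(45, 2))  # 2, because 45 * 45 = 2025 and the middle is "02"
```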
HASH FUNCTION 4: FOLDING METHOD
Used for large keys (like Social Security Numbers or IP addresses). Break the key into chunks and add them up.
Two Types:
Fold Shift: Add parts as they are.
Fold Boundary: Reverse the boundary parts before adding (more complex mixing).
Detailed Example (Fold Shift):
Key: 123456789
Table Size: 1000 (We need 3-digit indices).
Split: 123 | 456 | 789
Add: 123+456+789=1368
Wrap: Ignore the leading '1' to fit in size 1000 (equivalent to 1368 mod 1000).
Result: Store at Index 368.
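Both folding variants can be sketched in Python as follows. The helper names are assumptions, and fold_boundary implements one reading of "reverse the boundary parts": only the first and last chunks are reversed. Other texts define it differently.

```python
def split_chunks(key: int, size: int):
    """Split the decimal digits of key into chunks of `size` digits."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]

def fold_shift(key: int, size: int, m: int) -> int:
    """Add the chunks as they are, then wrap into the table with mod m."""
    return sum(int(c) for c in split_chunks(key, size)) % m

def fold_boundary(key: int, size: int, m: int) -> int:
    """Reverse the first and last chunks before adding (assumed interpretation)."""
    chunks = split_chunks(key, size)
    chunks[0] = chunks[0][::-1]
    if len(chunks) > 1:
        chunks[-1] = chunks[-1][::-1]
    return sum(int(c) for c in chunks) % m

print(fold_shift(123456789, 3, 1000))     # 368  (123 + 456 + 789 = 1368)
print(fold_boundary(123456789, 3, 1000))  # 764  (321 + 456 + 987 = 1764)
```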
COLLISION RESOLUTION TECHNIQUES
What is a Collision? When two different keys generate the same index. For example, with h(k) = k mod 10:
Hash(15)→5
Hash(25)→5
We cannot store two items in one array slot.
We need a strategy to fix this:
Open Hashing (Separate Chaining): Store collisions outside the table (in a list).
Closed Hashing (Open Addressing): Find another empty slot inside the table.
TECHNIQUE 1: SEPARATE CHAINING
Each slot in the hash table points to a Linked List. If a collision happens, add the new item to the end of the list at
that index.
Example:
Function: k mod 10
Insert 12: Index 2. [12]
Insert 22: Index 2. Collision! [12] -> [22]
Insert 32: Index 2. Collision! [12] -> [22] -> [32]
Analysis:
Pro: The table never gets "full".
Con: Uses extra memory for pointers. Search speed degrades to O(n) if the chain gets too long.
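A minimal sketch of separate chaining in Python, reproducing the example above. The class name ChainedHashTable is an assumption, and Python lists stand in for the linked lists described in the text.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of keys (h(k) = k mod m)."""

    def __init__(self, m: int = 10):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one empty chain per slot

    def insert(self, key: int) -> None:
        chain = self.slots[key % self.m]
        if key not in chain:                 # skip duplicates
            chain.append(key)                # a collision just extends the chain

    def contains(self, key: int) -> bool:
        # Only the one chain at the key's index is scanned.
        return key in self.slots[key % self.m]

table = ChainedHashTable()
for key in (12, 22, 32):
    table.insert(key)

print(table.slots[2])  # [12, 22, 32]
```

Searching walks a single chain, so the cost depends on the chain length rather than on the total number of stored keys.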
TECHNIQUE 2: LINEAR PROBING (OPEN ADDRESSING)
If the calculated slot is full, check the following slots one by one (wrapping around at the end of the table) until an empty slot is found.
Formula: Index = (Hash(k) + i) mod m (i = 1, 2, 3, ...)
Example:
Function: k mod 10
Insert 55: Index 5. (Placed)
Insert 65: Index 5 (Full).
Check Index 6. (Empty? Yes. Place 65 here).
Insert 75: Index 5 (Full).
Check Index 6 (Full).
Check Index 7 (Empty? Yes. Place 75 here).
The Problem: Primary Clustering. Long blocks of occupied cells form, increasing search time for future items.
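The probing loop can be sketched in Python as below. The function names are assumptions, empty slots are marked with None, and deletion is left out because it needs extra bookkeeping that the slides do not cover.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using (key mod m + i) mod m for i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:      # first empty slot wins
            table[index] = key
            return index
    raise OverflowError("hash table is full")

def linear_probe_search(table: list, key: int):
    """Follow the same probe sequence; stop at the key or at an empty slot."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i) % m
        if table[index] is None:
            return None               # an empty slot means the key is absent
        if table[index] == key:
            return index
    return None

table = [None] * 10
placed = [linear_probe_insert(table, k) for k in (55, 65, 75)]
print(placed)  # [5, 6, 7]
```

Indices 5, 6 and 7 now form one contiguous block, which is the primary clustering described above: any later key hashing to 5, 6 or 7 has to walk past the whole block.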
TECHNIQUE 3: QUADRATIC PROBING (OPEN ADDRESSING)
To fix the clustering problem of Linear Probing, we don't jump by 1. We jump by squares (1², 2², 3², 4², ...).
Formula: Index = (Hash(k) + i²) mod m
Example:
Insert at Index 5 (Full).
Attempt 1: 5 + 1² = 6. (Full?)
Attempt 2: 5 + 2² = 9. (Full?)
Attempt 3: 5 + 3² = 14; 14 mod 10 = 4. (Empty? Place here).
Analysis:
Pro: Reduces Primary Clustering.
Con: Can still suffer from "Secondary Clustering" (keys hashing to the same start point follow the same path).
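A Python sketch of the quadratic probe sequence, set up to match the example (slots 5, 6 and 9 already taken, table size 10). The function name and the keys used to fill the table are assumptions. Note that with some table sizes the sequence visits only part of the table, so an insert can fail even when free slots remain.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key using (key mod m + i*i) mod m for i = 0, 1, 2, ..."""
    m = len(table)
    home = key % m
    for i in range(m):
        index = (home + i * i) % m
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot found along the probe sequence")

table = [None] * 10
table[5], table[6], table[9] = 15, 16, 19   # pre-filled to match the example

index = quadratic_probe_insert(table, 25)   # probes 5, 6, 9, then 4
print(index)  # 4
```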
SUMMARY
Goal: Search and Insert in O(1) time on average.
Hash Functions: Division (k%m) is simplest. Multiplication & Mid-Square provide better randomness. Folding is
for large keys.
Separate Chaining: Uses Linked Lists. The table never fills up, but long chains slow down search.
Linear Probing: Jumps +1. Simple but causes primary clustering.
Quadratic Probing: Jumps +1, +4, +9. Reduces primary clustering, but secondary clustering remains.