HASHING
1. Hashing is finding an address where the data is
to be stored as well as located using a key with
the help of the algorithmic function.
2. Hashing is a method of directly computing the
address of the record with the help of a key by
using a suitable mathematical function called
the hash function.
3. A hash table is an array-based structure used to
store <key, information> pairs
INTRODUCTION
INTRODUCTION
4. Hash Table: A hash table is a data structure that
stores records in an array, called a hash table. A
Hash table can be used for quick insertion and
searching.
key
Hash(key) Address
• For array to store a record in a hash table, hash
function is applied to the key of the record
being stored, returning an index within the
range of the hash table.
• The item is then stored in the table of that index
position.
INTRODUCTION
• Hash table uses a special function known as a hash
function that maps a given value with a key to access the
elements faster.
• A Hash table is a data structure that stores some
information, and the information has basically two main
components, i.e., key and value. The hash table can be
implemented with the help of an associative array. The
efficiency of mapping depends upon the efficiency of the
hash function used for mapping.
HASH TABLE
Here, are pros/benefits of using hash tables:
1. Hash tables have high performance when looking up data,
inserting, and deleting existing values.
2. The time complexity for hash tables is constant regardless
of the number of items in the table.
3. They perform very well even when working with
large datasets.
ADVANTAGES OF HASH TABLE
Here, are cons of using hash tables:
1. You cannot use a null value as a key.
2. Collisions cannot be avoided when generating keys using
hash functions. Collisions occur when a key that is already
in use is generated.
3. If the hashing function has many collisions, this can lead
to performance decrease.
DISAVANTAGES OF HASH TABLE
Here, are the Operations supported by Hash tables:
1. Insertion – this Operation is used to add an element to the
hash table
2. Searching – this Operation is used to search for elements
in the hash table using the key
3. Deleting – this Operation is used to delete elements from
the hash table
OPERATIONS ON HASH TABLE
Real-world Applications
In the real-world, hash tables are used to store data
for:
1. Databases
2. Associative array
3. Sets
4. Memory cache
APPLICATIONS OF HASH TABLE
HASH TABLE
1) Hash function should be simple to computer.
2) Number of collision should be less
3) The hash function uses all the input data
4) The hash function "uniformly" distributes the data across the entire set of
possible hash values.
5) The hash function generates very different hash values for similar strings.
PROPERTIES OF HASH FUNCTION
• A function that maps a key into the range [0 to Max − 1], the
result of which is used as an index (or address) to hash table for
storing and retrieving record
HASH FUNCTION
• Bucket is an index position in hash table that can store more
than one record
• When the same index is mapped with two keys, then both
the records are stored in the same bucket
BUCKET
• The result of two keys hashing into the same address is
called collision
COLLISION
• Each calculation of an address and test for success is
known as
Probe.
PROBE
• The result of more keys hashing to the same address and if
there is no room in the bucket, then it is said that overflow has
occurred
OVERFLOW
• A perfect hash function 'h' for a set S is a hash function that
maps distinct elements in S to a set of 'm' integers, with no
collisions.
• A perfect hash function with values in a limited range can be used
for efficient lookup operations, by placing keys from 'S' (or other
associated values) in a table indexed by the output of the
function.
Perfect Hash
Function
Load Factor: The load factor of a hash table is a measure of
how full the table is.
It is defined as the ratio of the number of elements (n) in the hash
table to the number of slots (buckets) (m) available in the hash
table.
Load Factor= n/m
LOAD FACTOR
Low Load Factor: This means most buckets are empty, leading to efficient
insertions and lookups because there are fewer collisions.
High Load Factor: As the table becomes more populated, the likelihood of
collisions increases, which can degrade the performance of insertions,
deletions, and lookups.
Resizing: When the load factor exceeds a certain threshold (often 0.7 or
70%), the hash table is typically resized (often doubled) to maintain
efficient operations. Resizing involves rehashing all existing elements into
a new, larger table.
LOAD FACTOR
LOAD DENSITY
It typically refers to the distribution of elements within the hash table
and can imply how evenly the elements are spread across the buckets.
•Even Load Density: This indicates that elements are uniformly
distributed across the buckets, minimizing collisions and maintaining
efficient performance.
•Uneven Load Density: When elements are clumped together in a few
buckets, it leads to higher collision rates and can degrade performance.
Consider a hash table with 10 buckets (slots) and 7 elements.
Load Factor=n/m=7/10=0.7
This means that 70% of the hash table is occupied. If the load
factor exceeds a certain threshold (e.g., 0.7), the hash table may
need to be resized to maintain efficient operations.
EXAMPLE OF LOAD FACTOR
EXAMPLE OF LOAD DENSITY
To illustrate load density, let's consider how elements are distributed across the buckets in
the hash table. Suppose we have the following distribution of elements:
•Bucket 0: 2 elements
•Bucket 1: 1 element
•Bucket 2: 0 elements
•Bucket 3: 1 element
•Bucket 4: 0 elements
•Bucket 5: 2 elements
•Bucket 6: 0 elements
•Bucket 7: 1 element
•Bucket 8: 0 elements
•Bucket 9: 0 elements
Load Density Observation:
•Even Distribution: If elements were evenly distributed, each bucket would have close to
n/m=7/10=0.7
•Actual Distribution: In this case, the distribution is uneven. Some buckets have 2
elements, some have 1, and others have none.
The uneven distribution (load density) indicates that certain buckets are more "loaded"
than others, which could lead to higher collision rates in those buckets.
• Hashing is one of the searching techniques that uses a constant time.
The time complexity in hashing is O(1).
• In Hashing technique, the hash table and hash function are used.
Using the hash function, we can calculate the address at which the
value can be stored.
HASHING
• The main idea behind the hashing is to create the
(key/value) pairs. If the key is given, then the algorithm
computes the index at which the value would be stored. It can be
written as:
• Index = hash(key)
HASHING/HASH FUNCTION
There are three ways of calculating the hash function:
1. Division method
2. Folding method
3. Mid square method
In the division method, the hash function can be defined as:
h(ki) = ki % m;
where m is the size of the hash table.
For example, if the key value is 6 and the size of the hash table is
10. When we apply the hash function to key 6 then the index would be:
h(6) = 6%10 = 6
The index is 6 at which the value is stored.
TYPES OF HASH FUNCTION
1. Division Method:
This is the most simple and easiest method to generate a hash value. The
hash function divides the value 'k' by 'M' and then uses the remainder
obtained.
Formula:
h(K) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
TYPES OF HASH FUNCTION
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
2. The mid square method is a very good hashing method.
It involves two steps to compute the hash value-
Square the value of the key 'k' i.e. k2
Extract the middle 'r' digits as the hash value.
Formula:
h(K) = h(k x k)
Here,
k is the key value.
TYPES OF HASH FUNCTION
Example:
Suppose the hash table has 100 memory locations.
So, r = 2 because two digits are required to map
the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60
3. Digit Folding Method : This method involves two steps:
Divide the key-value 'k' into a number of parts i.e. k1, k2, k3,….,kn, where each
part has the same number of digits except for the last part that can have lesser
digits than the other parts.
Add the individual parts. The hash value is obtained by ignoring the last carry if
any.
Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+
kn h(K)= s
Here,
s is obtained by adding the
TYPES OF HASH FUNCTION
3. Digit Folding Method :
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
TYPES OF HASH FUNCTION
4. Multiplication method :
This method involves the following steps:
• Choose a constant value 'A' such that 0 < A < 1.
• Multiply the key value with A.
• Extract the fractional part of kA.
• Multiply the result of the above step by the size of the hash
table
i.e. M.
• The resulting hash value is obtained by taking the floor of
the result obtained in step 4.
TYPES OF HASH FUNCTION
4. Multiplication method :
Formula:
h(K) = floor (M (kA mod
1)) Here,
M is the size of the hash
table. k is the key value.
A is a constant value.
Where "k A mod 1" means the fractional part of k A, that is, k A -
⌊k A⌋.
TYPES OF HASH FUNCTION
4. Multiplication method :
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[ 100 (12345*0.357840 mod 1)]
= floor[ 100 (4417.5348 mod 1) ]
= floor[ 100 (0.5348) ]
= floor[ 53.48 ]
= 53
TYPES OF HASH FUNCTION
5. Digit Extraction:
Consider a key value of 246813579 and a hash table size of 100.
Select Specific Digits (e.g., 2nd, 4th, and 6th digits):
2nd Digit: 4
4th Digit: 8
6th Digit: 3
Combine the Digits (e.g., by concatenation or summation)
Concatenation: 483
Summation: 4 + 8 + 3 = 15
Hash Value (if using summation):
Hash Value=15
TYPES OF HASH FUNCTION
6. Radix Transformation
Radix transformation in hashing involves converting a key from one numeric base (radix)
to another. This method can be particularly useful when dealing with non-integer keys,
such as strings, where characters are mapped to numeric values based on their position in
a character set. The transformed key can then be used in hashing algorithms to generate
hash values.
Steps for Radix Transformation
1. Convert the Key to a Numeric Base: Convert the key (which could be a string, for
example) into a numeric representation based on a chosen radix (base).
2. Combine the Numeric Values: Combine these numeric values into a single number,
often using polynomial accumulation.
3. Apply the Hash Function: Use the numeric value as input to a hash function to
generate the hash value.
TYPES OF HASH FUNCTION
Example
Consider the string key "abc" and a radix (base) of 26 (for the 26 letters of the
alphabet).
1.Assign Numeric Values to Characters:
1. 'a' -> 0
2. 'b' -> 1
3. 'c' -> 2
2.Combine the Numeric Values:
1. Treat the string as a number in base 26.
2. "abc" in base 26 can be represented as 0×262
+1×261
+2×260.
3.Calculate the Combined Value: 0×262
+1×261
+2×260
=0+26+2=28
4. Hash Value:
1. Use the combined numeric value (28) as input to a hash function.
2. If the hash table size is 10, for example:
Hash Value=28%10=8
TYPES OF HASH FUNCTION
6. Universal Hash Functions
Universal hash functions are a class of hash functions designed to minimize the chances of collision for any
given set of keys. They provide a probabilistic guarantee that the hash function chosen from a family of hash
functions will distribute keys uniformly across the hash table.
A family of hash functions H is said to be universal if, for any two distinct keys x and y, the probability that
they collide (i.e., h(x)=h(y)) is at most 1/m, where 'm’ is the number of possible hash values.
Construction of Universal Hash Functions
One common method to construct a universal family of hash functions is to use modular arithmetic with
random coefficients. Here’s an example construction:
Example: Hash Functions of the Form ha,b(x)=((a x+b)mod p)mod m
⋅
Parameters:
• p: A prime number larger than the maximum possible key value.
• m: The size of the hash table.
• a and b: Random integers chosen from {0,1,2,…,p−1} with a≠0a.
Hash Function: ha,b(x)=((a x+b)mod p)mod m
⋅
This construction ensures that the hash function ha,b is chosen uniformly at random from a family of
hash functions and minimizes collisions.
TYPES OF HASH FUNCTION
To resolve these collisions, we have some techniques known as collision
techniques.
The following are the collision techniques:
Open Hashing: It is also known as closed addressing.
Closed Hashing: It is also known as open addressing.
COLLISION
In Hashing, collision resolution techniques are classified as-
Collision Resolution Techniques
Open hashing, also known as separate chaining, is a technique used to handle collisions in hash
tables. In open hashing, each bucket of the hash table contains a linked list (or another data
structure) that stores all elements that hash to the same bucket. This approach allows multiple
elements to occupy the same bucket without overwriting each other, making it easier to handle
collisions.
OPEN HASHING
Let's first understand the chaining to resolve the collision.
Suppose we have a list of key values
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
In this case, we cannot directly use h(k) = ki/m as h(k) =
2k+3
The index of key value 3 is:
index = h(3) = (2(3)+3)%10 = 9
The value 3 would be stored at the index 9.
The index of key value 2 is:
index = h(2) = (2(2)+3)%10 = 7
The value 2 would be stored at the index 7.
OPEN HASHING
The index of key value 9 is:
index = h(9) = (2(9)+3)%10 = 1
The value 9 would be stored at the index 1.
The index of key value 6 is:
index = h(6) = (2(6)+3)%10 = 5
The value 6 would be stored at the index 5.
The index of key value 11 is:
index = h(11) = (2(11)+3)%10 = 5
The value 11 would be stored at the index
OPEN HASHING
The index of key value 13 is:
index = h(13) = (2(13)+3)%10 = 9
The value 13 would be stored at index
9.
The index of key value 7 is:
index = h(7) = (2(7)+3)%10 = 7
The value 7 would be stored at index 7.
The index of key value 12 is:
index = h(12) = (2(12)+3)%10 = 7
OPEN HASHING
OPEN HASHING
OPEN HASHING
Advantages of Open Hashing
1.Dynamic Size: Linked lists can grow dynamically, allowing for efficient use of space even
when the number of elements exceeds the number of buckets.
2.Simple Collision Resolution: Collisions are easily managed by inserting elements into the
linked list of the corresponding bucket.
3.Deletion: Deletion is straightforward and does not require rehashing.
Disadvantages of Open Hashing
4.Memory Overhead: Each node in the linked list requires additional memory for the pointer,
which can lead to higher memory usage.
5.Performance Degradation: If many elements hash to the same bucket, the performance of
search, insert, and delete operations can degrade to O(n) in the worst case.
6.Cache Inefficiency: Linked lists can lead to cache inefficiency due to their non-contiguous
memory allocation.
Closed hashing, also known as open addressing, is a technique used to
handle collisions in hash tables. In closed hashing, all elements are
stored directly in the hash table array itself, and collisions are resolved
by probing—finding the next available slot in the array according to a
specified probing sequence. There are several common probing methods,
including linear probing, quadratic probing, and double hashing.
CLOSED HASHING
PROBING
Method Description
This method searches for empty slots
linearly starting from
Linear probing
the position where the collision occurred
and moving forward. If the end of the list
is reached and no empty slot is found.
The probing starts at the beginning of
the list.
Quadratic
probing
This method uses quadratic polynomial
expressions to find the next available free
slot.
Double
Hashing
This technique uses a secondary
hash function algorithm to find the
next free available slot.
Linear Probing
Each cell in the hash table contains a key-value pair, so when the
collision occurs by mapping a new key to the cell already
occupied by another key, then linear probing technique searches
for the closest free locations and adds a new key to that empty
cell. In this case, searching is performed sequentially, starting from
the position where the collision occurs till the empty cell is not
found.
CLOSED HASHING
Consider the above example for the linear probing:
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively.
The calculated index value of 11 is 5 which is already occupied by another
key value, i.e., 6. When linear probing is applied, the nearest empty cell to
the index 5 is 6; therefore, the value 11 will be added at the index 6.
The next key value is 13. The index value associated with this key value is
9 when hash function is applied. The cell is already filled at index 9. When
linear probing is applied, the nearest empty cell to the index 9 is 0;
therefore, the value 13 will be added at the index 0
CLOSED HASHING
Let us consider a simple hash function as “key mod 7” and
a sequence of keys as 50, 700, 76, 85, 92, 73, 101.
CLOSED
HASHING
Advantages of Linear Probing
1.Simplicity: Linear probing is simple to implement and understand.
2.Cache Performance: Linear probing tends to have good cache performance due to its sequential
access pattern.
3.No Additional Memory: All elements are stored within the hash table array itself, leading to better
memory utilization.
Disadvantages of Linear Probing
4.Primary Clustering: Linear probing can lead to primary clustering, where long sequences of
occupied slots form, increasing the average search time.
Quadratic Probing
In quadratic probing, if a collision occurs at index i, the next slot to
check is determined by a quadratic function. The general formula for
quadratic probing is:
h(k,i)=(h(k)+c1 i+c
⋅ 2 i
⋅ 2
)mod m
where:
•h(k) is the primary hash function.
•‘i’ is the probe number (starting from 0).
•c1 and c2 are constants.
•m is the size of the hash table.
A common choice is c1=c2=1 which simplifies the formula to:
h(k,i)=(h(k)+i+i2
)mod m
Quadratic probing helps reduce clustering, a common problem with
CLOSED HASHING
CLOSED HASHING
Double hashing : It is a collision resolving technique in Open
Addressed Hash tables. Double hashing uses the idea of applying a
second hash function to key when a collision occurs.
Advantages of Double hashing
1. The advantage of Double hashing is that it is one of the best
form of
records
of probing, producing a uniform
distribution throughout a hash table.
2. This technique does not yield any clusters.
3. It is one of effective method for resolving collisions
CLOSED HASHING
Double hashing :
Double hashing can be done using :
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is size of hash table.
(We repeat by increasing i when collision occurs)
First hash function is typically hash1(key) = key %
TABLE_SIZE
A popular second hash function is : hash2(key) = PRIME – (key
% PRIME) where PRIME is a prime smaller than the TABLE_SIZE.
CLOSED HASHING
Double hashing : A good second Hash function
is:
1. It must never evaluate to zero
2. Must make sure that all cells can be probed
CLOSED HASHING
Rehashing is a collision resolution technique.
Rehashing is a technique in which the table is resized, i.e., the size of
table is doubled by creating a new table. It is preferable is the total size
of table is a prime number. There are situations in which the rehashing is
required.
• When table is completely full
• With quadratic probing when the table is filled half.
• When insertions fail due to overflow.
In such situations, we have to transfer entries from old table to the new
table by re computing their positions using hash functions
Rehashin
g
Rehashin
g
As the name suggests, rehashing means hashing again.
Basically, when the load factor increases to more than its pre-
defined value (default value of load factor is 0.75), the
complexity increases. So to overcome this, the size of the
array is increased (doubled) and all the values are hashed
again and stored in the new double sized array to maintain a
low load factor and low complexity.
Rehashin
g
Why rehashing?
Rehashing is done because whenever key value pairs are
inserted into the map, the load factor increases, which implies
that the time complexity also increases as explained above.
This might not give the required time complexity of O(1).
Hence, rehash must be done, increasing the size of the
bucketArray so as to reduce the load factor and the time
complexity
Rehashin
g
How Rehashing is done?
Rehashing can be done as follows:
• For each addition of a new entry to the map, check the load
factor.
• If it’s greater than its pre-defined value (or default value of 0.75
if not given), then Rehash.
• For Rehash, make a new array of double the previous size and
make it the new bucketarray.
• Then traverse to each element in the old bucketArray and call
the insert() for each so as to insert it into the new larger bucket
array.
Rehashin
g
Deletion
• With a chaining method, deleting an element leads to the
deletion of a node from a linked list holding the element.
Deletion
• Consider the table in which the keys are stored using linear probing.
• The keys have been entered in the following order: A1,A4,A2,B4,B1.
• After A4 is deleted and position 4 is freed
• we try to find B4 by first checking position 4.
• But this position is now empty, so we may conclude that B4 is not in the table. The same result occurs
after deleting A2 and marking cell 2 as empty. Then, the search for B1 is unsuccessful, because if we are
using linear probing, the search terminates at position 2. The situation is the same for the other open
addressing methods.
Deletion
• If we leave deleted keys in the table with markers indicating that they are not valid elements of the table, any subsequent
search for an element does not terminate prematurely.
• When a new key is inserted, it overwrites a key that is only a space filler.
• However, for a large number of deletions and a small number of additional insertions, the table becomes overloaded with
deleted records, which increases the search time because the open addressing methods require testing the deleted elements.
• Therefore, the table should be purged after a certain number of deletions by moving undeleted elements to the cells occupied
by deleted elements. Cells with deleted elements that are not overwritten by this procedure are marked as free.

Hashing and Collision Advanced data structure and algorithm

  • 1.
  • 2.
    1. Hashing isfinding an address where the data is to be stored as well as located using a key with the help of the algorithmic function. 2. Hashing is a method of directly computing the address of the record with the help of a key by using a suitable mathematical function called the hash function. 3. A hash table is an array-based structure used to store <key, information> pairs INTRODUCTION
  • 3.
    INTRODUCTION 4. Hash Table:A hash table is a data structure that stores records in an array, called a hash table. A Hash table can be used for quick insertion and searching. key Hash(key) Address
  • 4.
    • For arrayto store a record in a hash table, hash function is applied to the key of the record being stored, returning an index within the range of the hash table. • The item is then stored in the table of that index position. INTRODUCTION
  • 5.
    • Hash tableuses a special function known as a hash function that maps a given value with a key to access the elements faster. • A Hash table is a data structure that stores some information, and the information has basically two main components, i.e., key and value. The hash table can be implemented with the help of an associative array. The efficiency of mapping depends upon the efficiency of the hash function used for mapping. HASH TABLE
  • 6.
    Here, are pros/benefitsof using hash tables: 1. Hash tables have high performance when looking up data, inserting, and deleting existing values. 2. The time complexity for hash tables is constant regardless of the number of items in the table. 3. They perform very well even when working with large datasets. ADVANTAGES OF HASH TABLE
  • 7.
    Here, are consof using hash tables: 1. You cannot use a null value as a key. 2. Collisions cannot be avoided when generating keys using hash functions. Collisions occur when a key that is already in use is generated. 3. If the hashing function has many collisions, this can lead to performance decrease. DISAVANTAGES OF HASH TABLE
  • 8.
    Here, are theOperations supported by Hash tables: 1. Insertion – this Operation is used to add an element to the hash table 2. Searching – this Operation is used to search for elements in the hash table using the key 3. Deleting – this Operation is used to delete elements from the hash table OPERATIONS ON HASH TABLE
  • 9.
    Real-world Applications In thereal-world, hash tables are used to store data for: 1. Databases 2. Associative array 3. Sets 4. Memory cache APPLICATIONS OF HASH TABLE
  • 10.
  • 11.
    1) Hash functionshould be simple to computer. 2) Number of collision should be less 3) The hash function uses all the input data 4) The hash function "uniformly" distributes the data across the entire set of possible hash values. 5) The hash function generates very different hash values for similar strings. PROPERTIES OF HASH FUNCTION
  • 12.
    • A functionthat maps a key into the range [0 to Max − 1], the result of which is used as an index (or address) to hash table for storing and retrieving record HASH FUNCTION
  • 13.
    • Bucket isan index position in hash table that can store more than one record • When the same index is mapped with two keys, then both the records are stored in the same bucket BUCKET
  • 14.
    • The resultof two keys hashing into the same address is called collision COLLISION
  • 15.
    • Each calculationof an address and test for success is known as Probe. PROBE
  • 16.
    • The resultof more keys hashing to the same address and if there is no room in the bucket, then it is said that overflow has occurred OVERFLOW
  • 17.
    • A perfecthash function 'h' for a set S is a hash function that maps distinct elements in S to a set of 'm' integers, with no collisions. • A perfect hash function with values in a limited range can be used for efficient lookup operations, by placing keys from 'S' (or other associated values) in a table indexed by the output of the function. Perfect Hash Function
  • 18.
    Load Factor: Theload factor of a hash table is a measure of how full the table is. It is defined as the ratio of the number of elements (n) in the hash table to the number of slots (buckets) (m) available in the hash table. Load Factor= n/m LOAD FACTOR
  • 19.
    Low Load Factor:This means most buckets are empty, leading to efficient insertions and lookups because there are fewer collisions. High Load Factor: As the table becomes more populated, the likelihood of collisions increases, which can degrade the performance of insertions, deletions, and lookups. Resizing: When the load factor exceeds a certain threshold (often 0.7 or 70%), the hash table is typically resized (often doubled) to maintain efficient operations. Resizing involves rehashing all existing elements into a new, larger table. LOAD FACTOR
  • 20.
    LOAD DENSITY It typicallyrefers to the distribution of elements within the hash table and can imply how evenly the elements are spread across the buckets. •Even Load Density: This indicates that elements are uniformly distributed across the buckets, minimizing collisions and maintaining efficient performance. •Uneven Load Density: When elements are clumped together in a few buckets, it leads to higher collision rates and can degrade performance.
  • 21.
    Consider a hashtable with 10 buckets (slots) and 7 elements. Load Factor=n/m=7/10=0.7 This means that 70% of the hash table is occupied. If the load factor exceeds a certain threshold (e.g., 0.7), the hash table may need to be resized to maintain efficient operations. EXAMPLE OF LOAD FACTOR
  • 22.
    EXAMPLE OF LOADDENSITY To illustrate load density, let's consider how elements are distributed across the buckets in the hash table. Suppose we have the following distribution of elements: •Bucket 0: 2 elements •Bucket 1: 1 element •Bucket 2: 0 elements •Bucket 3: 1 element •Bucket 4: 0 elements •Bucket 5: 2 elements •Bucket 6: 0 elements •Bucket 7: 1 element •Bucket 8: 0 elements •Bucket 9: 0 elements Load Density Observation: •Even Distribution: If elements were evenly distributed, each bucket would have close to n/m=7/10=0.7 •Actual Distribution: In this case, the distribution is uneven. Some buckets have 2 elements, some have 1, and others have none. The uneven distribution (load density) indicates that certain buckets are more "loaded" than others, which could lead to higher collision rates in those buckets.
  • 23.
    • Hashing isone of the searching techniques that uses a constant time. The time complexity in hashing is O(1). • In Hashing technique, the hash table and hash function are used. Using the hash function, we can calculate the address at which the value can be stored. HASHING
  • 24.
    • The mainidea behind the hashing is to create the (key/value) pairs. If the key is given, then the algorithm computes the index at which the value would be stored. It can be written as: • Index = hash(key) HASHING/HASH FUNCTION
  • 25.
    There are threeways of calculating the hash function: 1. Division method 2. Folding method 3. Mid square method In the division method, the hash function can be defined as: h(ki) = ki % m; where m is the size of the hash table. For example, if the key value is 6 and the size of the hash table is 10. When we apply the hash function to key 6 then the index would be: h(6) = 6%10 = 6 The index is 6 at which the value is stored. TYPES OF HASH FUNCTION
  • 26.
    1. Division Method: Thisis the most simple and easiest method to generate a hash value. The hash function divides the value 'k' by 'M' and then uses the remainder obtained. Formula: h(K) = k mod M Here, k is the key value, and M is the size of the hash table. TYPES OF HASH FUNCTION Example: k = 12345 M = 95 h(12345) = 12345 mod 95 = 90 k = 1276 M = 11 h(1276) = 1276 mod 11 = 0
  • 27.
    2. The midsquare method is a very good hashing method. It involves two steps to compute the hash value- Square the value of the key 'k' i.e. k2 Extract the middle 'r' digits as the hash value. Formula: h(K) = h(k x k) Here, k is the key value. TYPES OF HASH FUNCTION Example: Suppose the hash table has 100 memory locations. So, r = 2 because two digits are required to map the key to the memory location. k = 60 k x k = 60 x 60 = 3600 h(60) = 60 The hash value obtained is 60
  • 28.
    3. Digit FoldingMethod : This method involves two steps: Divide the key-value 'k' into a number of parts i.e. k1, k2, k3,….,kn, where each part has the same number of digits except for the last part that can have lesser digits than the other parts. Add the individual parts. The hash value is obtained by ignoring the last carry if any. Formula: k = k1, k2, k3, k4, ….., kn s = k1+ k2 + k3 + k4 +….+ kn h(K)= s Here, s is obtained by adding the TYPES OF HASH FUNCTION
  • 29.
    3. Digit FoldingMethod : Example: k = 12345 k1 = 12, k2 = 34, k3 = 5 s = k1 + k2 + k3 = 12 + 34 + 5 = 51 h(K) = 51 TYPES OF HASH FUNCTION
  • 30.
    4. Multiplication method: This method involves the following steps: • Choose a constant value 'A' such that 0 < A < 1. • Multiply the key value with A. • Extract the fractional part of kA. • Multiply the result of the above step by the size of the hash table i.e. M. • The resulting hash value is obtained by taking the floor of the result obtained in step 4. TYPES OF HASH FUNCTION
  • 31.
    4. Multiplication method: Formula: h(K) = floor (M (kA mod 1)) Here, M is the size of the hash table. k is the key value. A is a constant value. Where "k A mod 1" means the fractional part of k A, that is, k A - ⌊k A⌋. TYPES OF HASH FUNCTION
  • 32.
    4. Multiplication method: Example: k = 12345 A = 0.357840 M = 100 h(12345) = floor[ 100 (12345*0.357840 mod 1)] = floor[ 100 (4417.5348 mod 1) ] = floor[ 100 (0.5348) ] = floor[ 53.48 ] = 53 TYPES OF HASH FUNCTION
  • 33.
    5. Digit Extraction: Considera key value of 246813579 and a hash table size of 100. Select Specific Digits (e.g., 2nd, 4th, and 6th digits): 2nd Digit: 4 4th Digit: 8 6th Digit: 3 Combine the Digits (e.g., by concatenation or summation) Concatenation: 483 Summation: 4 + 8 + 3 = 15 Hash Value (if using summation): Hash Value=15 TYPES OF HASH FUNCTION
  • 34.
    6. Radix Transformation Radixtransformation in hashing involves converting a key from one numeric base (radix) to another. This method can be particularly useful when dealing with non-integer keys, such as strings, where characters are mapped to numeric values based on their position in a character set. The transformed key can then be used in hashing algorithms to generate hash values. Steps for Radix Transformation 1. Convert the Key to a Numeric Base: Convert the key (which could be a string, for example) into a numeric representation based on a chosen radix (base). 2. Combine the Numeric Values: Combine these numeric values into a single number, often using polynomial accumulation. 3. Apply the Hash Function: Use the numeric value as input to a hash function to generate the hash value. TYPES OF HASH FUNCTION
  • 35.
    Example Consider the stringkey "abc" and a radix (base) of 26 (for the 26 letters of the alphabet). 1.Assign Numeric Values to Characters: 1. 'a' -> 0 2. 'b' -> 1 3. 'c' -> 2 2.Combine the Numeric Values: 1. Treat the string as a number in base 26. 2. "abc" in base 26 can be represented as 0×262 +1×261 +2×260. 3.Calculate the Combined Value: 0×262 +1×261 +2×260 =0+26+2=28 4. Hash Value: 1. Use the combined numeric value (28) as input to a hash function. 2. If the hash table size is 10, for example: Hash Value=28%10=8 TYPES OF HASH FUNCTION
  • 36.
    6. Universal HashFunctions Universal hash functions are a class of hash functions designed to minimize the chances of collision for any given set of keys. They provide a probabilistic guarantee that the hash function chosen from a family of hash functions will distribute keys uniformly across the hash table. A family of hash functions H is said to be universal if, for any two distinct keys x and y, the probability that they collide (i.e., h(x)=h(y)) is at most 1/m, where 'm’ is the number of possible hash values. Construction of Universal Hash Functions One common method to construct a universal family of hash functions is to use modular arithmetic with random coefficients. Here’s an example construction: Example: Hash Functions of the Form ha,b(x)=((a x+b)mod p)mod m ⋅ Parameters: • p: A prime number larger than the maximum possible key value. • m: The size of the hash table. • a and b: Random integers chosen from {0,1,2,…,p−1} with a≠0a. Hash Function: ha,b(x)=((a x+b)mod p)mod m ⋅ This construction ensures that the hash function ha,b is chosen uniformly at random from a family of hash functions and minimizes collisions. TYPES OF HASH FUNCTION
  • 37.
    To resolve thesecollisions, we have some techniques known as collision techniques. The following are the collision techniques: Open Hashing: It is also known as closed addressing. Closed Hashing: It is also known as open addressing. COLLISION
  • 38.
    In Hashing, collisionresolution techniques are classified as- Collision Resolution Techniques
  • 39.
    Open hashing, alsoknown as separate chaining, is a technique used to handle collisions in hash tables. In open hashing, each bucket of the hash table contains a linked list (or another data structure) that stores all elements that hash to the same bucket. This approach allows multiple elements to occupy the same bucket without overwriting each other, making it easier to handle collisions. OPEN HASHING
  • 40.
    Let's first understandthe chaining to resolve the collision. Suppose we have a list of key values A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3 In this case, we cannot directly use h(k) = ki/m as h(k) = 2k+3 The index of key value 3 is: index = h(3) = (2(3)+3)%10 = 9 The value 3 would be stored at the index 9. The index of key value 2 is: index = h(2) = (2(2)+3)%10 = 7 The value 2 would be stored at the index 7. OPEN HASHING
  • 41.
    The index ofkey value 9 is: index = h(9) = (2(9)+3)%10 = 1 The value 9 would be stored at the index 1. The index of key value 6 is: index = h(6) = (2(6)+3)%10 = 5 The value 6 would be stored at the index 5. The index of key value 11 is: index = h(11) = (2(11)+3)%10 = 5 The value 11 would be stored at the index OPEN HASHING
  • 42.
    The index ofkey value 13 is: index = h(13) = (2(13)+3)%10 = 9 The value 13 would be stored at index 9. The index of key value 7 is: index = h(7) = (2(7)+3)%10 = 7 The value 7 would be stored at index 7. The index of key value 12 is: index = h(12) = (2(12)+3)%10 = 7 OPEN HASHING
  • 43.
  • 44.
  • 45.
    Advantages of OpenHashing 1.Dynamic Size: Linked lists can grow dynamically, allowing for efficient use of space even when the number of elements exceeds the number of buckets. 2.Simple Collision Resolution: Collisions are easily managed by inserting elements into the linked list of the corresponding bucket. 3.Deletion: Deletion is straightforward and does not require rehashing. Disadvantages of Open Hashing 4.Memory Overhead: Each node in the linked list requires additional memory for the pointer, which can lead to higher memory usage. 5.Performance Degradation: If many elements hash to the same bucket, the performance of search, insert, and delete operations can degrade to O(n) in the worst case. 6.Cache Inefficiency: Linked lists can lead to cache inefficiency due to their non-contiguous memory allocation.
  • 46.
    Closed hashing, alsoknown as open addressing, is a technique used to handle collisions in hash tables. In closed hashing, all elements are stored directly in the hash table array itself, and collisions are resolved by probing—finding the next available slot in the array according to a specified probing sequence. There are several common probing methods, including linear probing, quadratic probing, and double hashing. CLOSED HASHING
  • 47.
    PROBING Method Description This methodsearches for empty slots linearly starting from Linear probing the position where the collision occurred and moving forward. If the end of the list is reached and no empty slot is found. The probing starts at the beginning of the list. Quadratic probing This method uses quadratic polynomial expressions to find the next available free slot. Double Hashing This technique uses a secondary hash function algorithm to find the next free available slot.
  • 48.
    Linear Probing Each cellin the hash table contains a key-value pair, so when the collision occurs by mapping a new key to the cell already occupied by another key, then linear probing technique searches for the closest free locations and adds a new key to that empty cell. In this case, searching is performed sequentially, starting from the position where the collision occurs till the empty cell is not found. CLOSED HASHING
  • 49.
    Consider the aboveexample for the linear probing: A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3 The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The calculated index value of 11 is 5 which is already occupied by another key value, i.e., 6. When linear probing is applied, the nearest empty cell to the index 5 is 6; therefore, the value 11 will be added at the index 6. The next key value is 13. The index value associated with this key value is 9 when hash function is applied. The cell is already filled at index 9. When linear probing is applied, the nearest empty cell to the index 9 is 0; therefore, the value 13 will be added at the index 0 CLOSED HASHING
  • 50.
    Let us considera simple hash function as “key mod 7” and a sequence of keys as 50, 700, 76, 85, 92, 73, 101. CLOSED HASHING
  • 51.
    Advantages of LinearProbing 1.Simplicity: Linear probing is simple to implement and understand. 2.Cache Performance: Linear probing tends to have good cache performance due to its sequential access pattern. 3.No Additional Memory: All elements are stored within the hash table array itself, leading to better memory utilization. Disadvantages of Linear Probing 4.Primary Clustering: Linear probing can lead to primary clustering, where long sequences of occupied slots form, increasing the average search time.
  • 52.
    Quadratic Probing In quadraticprobing, if a collision occurs at index i, the next slot to check is determined by a quadratic function. The general formula for quadratic probing is: h(k,i)=(h(k)+c1 i+c ⋅ 2 i ⋅ 2 )mod m where: •h(k) is the primary hash function. •‘i’ is the probe number (starting from 0). •c1 and c2 are constants. •m is the size of the hash table. A common choice is c1=c2=1 which simplifies the formula to: h(k,i)=(h(k)+i+i2 )mod m Quadratic probing helps reduce clustering, a common problem with CLOSED HASHING
  • 53.
  • 54.
    Double hashing :It is a collision resolving technique in Open Addressed Hash tables. Double hashing uses the idea of applying a second hash function to key when a collision occurs. Advantages of Double hashing 1. The advantage of Double hashing is that it is one of the best form of records of probing, producing a uniform distribution throughout a hash table. 2. This technique does not yield any clusters. 3. It is one of effective method for resolving collisions CLOSED HASHING
  • 55.
    Double hashing : Doublehashing can be done using : (hash1(key) + i * hash2(key)) % TABLE_SIZE Here hash1() and hash2() are hash functions and TABLE_SIZE is size of hash table. (We repeat by increasing i when collision occurs) First hash function is typically hash1(key) = key % TABLE_SIZE A popular second hash function is : hash2(key) = PRIME – (key % PRIME) where PRIME is a prime smaller than the TABLE_SIZE. CLOSED HASHING
  • 56.
    Double hashing :A good second Hash function is: 1. It must never evaluate to zero 2. Must make sure that all cells can be probed CLOSED HASHING
  • 57.
    Rehashing is acollision resolution technique. Rehashing is a technique in which the table is resized, i.e., the size of table is doubled by creating a new table. It is preferable is the total size of table is a prime number. There are situations in which the rehashing is required. • When table is completely full • With quadratic probing when the table is filled half. • When insertions fail due to overflow. In such situations, we have to transfer entries from old table to the new table by re computing their positions using hash functions Rehashin g
  • 58.
  • 59.
    As the namesuggests, rehashing means hashing again. Basically, when the load factor increases to more than its pre- defined value (default value of load factor is 0.75), the complexity increases. So to overcome this, the size of the array is increased (doubled) and all the values are hashed again and stored in the new double sized array to maintain a low load factor and low complexity. Rehashin g
  • 60.
    Why rehashing? Rehashing isdone because whenever key value pairs are inserted into the map, the load factor increases, which implies that the time complexity also increases as explained above. This might not give the required time complexity of O(1). Hence, rehash must be done, increasing the size of the bucketArray so as to reduce the load factor and the time complexity Rehashin g
  • 61.
    How Rehashing isdone? Rehashing can be done as follows: • For each addition of a new entry to the map, check the load factor. • If it’s greater than its pre-defined value (or default value of 0.75 if not given), then Rehash. • For Rehash, make a new array of double the previous size and make it the new bucketarray. • Then traverse to each element in the old bucketArray and call the insert() for each so as to insert it into the new larger bucket array. Rehashin g
  • 62.
    Deletion • With achaining method, deleting an element leads to the deletion of a node from a linked list holding the element.
  • 63.
    Deletion • Consider thetable in which the keys are stored using linear probing. • The keys have been entered in the following order: A1,A4,A2,B4,B1. • After A4 is deleted and position 4 is freed • we try to find B4 by first checking position 4. • But this position is now empty, so we may conclude that B4 is not in the table. The same result occurs after deleting A2 and marking cell 2 as empty. Then, the search for B1 is unsuccessful, because if we are using linear probing, the search terminates at position 2. The situation is the same for the other open addressing methods.
  • 64.
    Deletion • If weleave deleted keys in the table with markers indicating that they are not valid elements of the table, any subsequent search for an element does not terminate prematurely. • When a new key is inserted, it overwrites a key that is only a space filler. • However, for a large number of deletions and a small number of additional insertions, the table becomes overloaded with deleted records, which increases the search time because the open addressing methods require testing the deleted elements. • Therefore, the table should be purged after a certain number of deletions by moving undeleted elements to the cells occupied by deleted elements. Cells with deleted elements that are not overwritten by this procedure are marked as free.