2. What is Hashing?
Hashing uses a hash function to map keys, which may be large and of
variable length, to smaller values of a fixed length.
A hash table (or hash map) is a data structure that uses a hash
function to map keys to values, for efficient search and
retrieval.
Hash tables are widely used in many kinds of computer software, particularly for
associative arrays, database indexing, caches, and sets.
3. Hash Table Data Structure : Purpose
To support insertion, deletion and search in average-case constant
time
Assumption: the order of elements is irrelevant
==> the data structure is *not* useful if you need to maintain and retrieve
the elements in some order
Hash function
Hash[ “string key”] ==> integer value
4. Hash Function
• A hash function is a mathematical formula which, when applied
to a key, produces a value which can be used as an index for
the key in the hash table.
• The main aim of a hash function is to distribute the keys
uniformly over the table. It maps keys to integers
within a suitable range (the table indices) so as to reduce the number of
collisions.
5. Different Hash Functions
Division Method
This is the simplest method of hashing. An integer key k is divided by a number M,
and the remainder obtained is used as the hash.
Generally, M is chosen to be a prime number because a prime number increases the likelihood
that the keys are mapped uniformly over the output range of values.
This function could be represented as:
h(k) = k mod M
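The division method can be sketched in a few lines of Python. The function name `division_hash`, the table size M = 13 and the sample keys are assumptions chosen only for illustration; they do not come from the slides.

```python
def division_hash(k: int, m: int) -> int:
    """Division-method hash: the remainder of k divided by m."""
    if m <= 0:
        raise ValueError("m must be a positive integer")
    return k % m


# M = 13 is an arbitrary prime picked for this sketch.
M = 13
keys = [15, 47, 23, 34, 85]
slots = [division_hash(k, M) for k in keys]
print(slots)  # [2, 8, 10, 8, 7]
```

Note that 47 and 34 both land in slot 8: a prime M spreads keys more evenly but does not eliminate collisions.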
Multiplication Method
The Multiplication method has the following steps:
1. A constant A is chosen such that 0 < A < 1.
2. The key k is multiplied by A.
3. The fractional part of kA is extracted.
4. The result of Step 3 is multiplied by the size of the hash table (m), and the floor of the product is taken.
This can be represented as:
h(k) = floor( m (kA mod 1) )
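A minimal sketch of the multiplication method, assuming the usual form h(k) = floor(m (kA mod 1)). The function name and the default constant A = (sqrt(5) - 1) / 2 (about 0.618, a commonly suggested choice) are assumptions for illustration and are not given in the slides.

```python
import math

# Assumed default constant; any value with 0 < A < 1 can be used.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, m: int, a: float = A) -> int:
    """Multiplication-method hash for a non-negative integer key k."""
    frac = (k * a) % 1.0         # fractional part of k*A, in [0, 1)
    return math.floor(m * frac)  # scale to the table size and take the floor


# With A = 0.5 the arithmetic is easy to check by hand:
# 3 * 0.5 = 1.5, fractional part 0.5, 8 * 0.5 = 4.
print(multiplication_hash(3, 8, a=0.5))  # 4
```

Unlike the division method, the table size m here does not need to be prime, since the spreading comes from the multiplication by A.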
6. Mid-Square Method
• The Mid-Square method is as follows:
1. The value of the key is squared. That is, k^2 is found.
2. The middle r digits of the result are extracted.
3. The number formed by these r digits is the hash obtained.
• The algorithm works well because most or all digits of the key-value
contribute to the resulting hash.
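The mid-square steps might be sketched as follows. The function name is hypothetical, and when the "middle" is ambiguous (the square has an odd number of leftover digits) this sketch leans toward the left; the slides do not say which side to prefer.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square k and return the number formed by the middle r decimal digits."""
    squared = str(k * k)
    if r >= len(squared):
        # Fewer than r digits available: use the whole square.
        return int(squared)
    start = (len(squared) - r) // 2  # leans left when the split is uneven
    return int(squared[start:start + r])


# 1234^2 = 1522756; the middle three digits are 227.
print(mid_square_hash(1234, 3))  # 227
```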
7. • A hash table generalizes the idea of an array: the key
does not have to be an integer.
• We can have a name as a key, or for that matter any object as the key.
• The trick is to find a hash function to compute an index so that an object
can be stored at a specific location in a table such that it can easily be
found.
8. • Suppose we have a set of strings {“abc”, “def”, “ghi”} that we’d like to
store in a table.
• Our objective here is to find or update them quickly in a table, ideally
in O(1) time.
• We are not concerned about ordering them or maintaining any order at
all.
• Let us think of a simple scheme to do this. Suppose we assign “a” = 1,
“b” = 2, … etc. to all alphabetical characters.
• We can then simply compute a number for each of the strings by using
the sum of the characters as follows.
• “abc” = 1 + 2 + 3 = 6, “def” = 4 + 5 + 6 = 15, “ghi” = 7 + 8 + 9 = 24
• If we assume that we have a table of size 5 to store these strings, we can
compute the location of the string by taking the sum mod 5.
9. • “abc” goes in location 6 mod 5 = 1, “def” in 15 mod 5 = 0, and “ghi” in 24 mod 5 = 4,
so the strings occupy locations 1, 0 and 4 respectively.
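The letter-sum scheme can be reproduced with a short Python sketch. The helper names `letter_sum` and `slot` are made up for this illustration, and the code assumes lowercase letters “a” to “z” only.

```python
def letter_sum(s: str) -> int:
    """Sum of letter positions, with a = 1, b = 2, ..., z = 26."""
    return sum(ord(c) - ord("a") + 1 for c in s)


def slot(s: str, table_size: int = 5) -> int:
    """Table location: the letter sum mod the table size."""
    return letter_sum(s) % table_size


table = [None] * 5
for word in ["abc", "def", "ghi"]:
    table[slot(word)] = word

print(table)  # ['def', 'abc', None, None, 'ghi']
```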
10. Problem with Hashing
• First of all, the hash function we used, that is the sum of the letters, is a
bad one.
• In case we have permutations of the same letters, “abc”, “bac” etc in the
set, we will end up with the same value for the sum and hence the same location.
• In this case, the strings would hash into the same location, creating what
we call a “collision”.
• Secondly, we need to find a good table size, preferably a prime number,
so that keys with different sums are less likely to collide
when we take the sum mod the table size to find the location.
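The collision between permutations is easy to demonstrate. The sketch below also shows one possible order-sensitive alternative, a position-weighted hash with base 31; that alternative is an assumption added for contrast and is not taken from the slides.

```python
def letter_sum(s: str) -> int:
    """Order-insensitive hash: sum of letter positions (a = 1, ..., z = 26)."""
    return sum(ord(c) - ord("a") + 1 for c in s)


def positional_hash(s: str, base: int = 31) -> int:
    """Order-sensitive hash: treats the letters as digits in the given base."""
    h = 0
    for c in s:
        h = h * base + (ord(c) - ord("a") + 1)
    return h


# Every permutation has the same letter sum, so all three collide.
print([letter_sum(w) for w in ["abc", "bac", "cab"]])  # [6, 6, 6]

# Weighting by position tells "abc" and "bac" apart.
print(positional_hash("abc"), positional_hash("bac"))  # 1026 1956
```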
11. Hashing with Chaining
• Chaining is one collision resolution technique. We cannot avoid
collisions entirely, but we can try to reduce them, and store
multiple elements that share the same hash value.
• Suppose our hash function h(x) ranges from 0 to 6.
Then with more than 7 elements, some elements must
be placed in the same slot. For each slot we create a list to
store them. Each new element is added at the beginning of
the list so that insertion takes O(1) time.
12. Let us see the following example to get a better idea. Suppose we have the elements {15, 47, 23, 34, 85, 97, 65,
89, 70} and our hash function is h(x) = x mod 7.
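This example can be worked through with a small sketch of chaining. The function names are hypothetical, and `collections.deque` is used here only because its `appendleft` gives O(1) insertion at the head of each chain.

```python
from collections import deque


def build_chained_table(keys, size=7):
    """Hash each key with h(x) = x mod size and add it at the head of its chain."""
    table = [deque() for _ in range(size)]
    for k in keys:
        table[k % size].appendleft(k)  # O(1) insertion at the front
    return table


def contains(table, key):
    """Search only the chain that the key hashes to."""
    return key in table[key % len(table)]


table = build_chained_table([15, 47, 23, 34, 85, 97, 65, 89, 70])
for index, chain in enumerate(table):
    print(index, list(chain))
# 0 [70]
# 1 [85, 15]
# 2 [65, 23]
# 3 []
# 4 []
# 5 [89, 47]
# 6 [97, 34]
```

Because each new element goes to the front, the most recently inserted key appears first in its chain.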
13. Open Addressing
• Once a collision takes place, open addressing (also known as closed
hashing) computes new positions using a probe sequence, and the
record is stored in the first free position found. There are some well-known probe
sequences:
1. Linear Probing: The interval between the probes is fixed to 1. This
means that the very next position in the table is tried, until a free one is found.
14. 2. Quadratic Probing: The interval between the probes increases
quadratically. This means that the i-th probe tries the position
i^2 slots past the original hash position.
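The two probe sequences can be compared side by side. This is a minimal sketch: the generator names are made up, and wrapping around the end of the table with a modulo is an assumption that the slides do not spell out.

```python
def linear_probes(home: int, size: int):
    """Yield home, home + 1, home + 2, ... wrapped to the table size."""
    for i in range(size):
        yield (home + i) % size


def quadratic_probes(home: int, size: int):
    """Yield home + i^2 for i = 0, 1, 2, ... wrapped to the table size."""
    for i in range(size):
        yield (home + i * i) % size


# First four positions tried for a key whose home slot is 1 in a table of size 7.
print(list(linear_probes(1, 7))[:4])     # [1, 2, 3, 4]
print(list(quadratic_probes(1, 7))[:4])  # [1, 2, 5, 3]
```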
15. Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision resolution strategy to be f(i) = i^2.
Insert 22, 30, and 50.
17. •Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for slot 1 + 1^2, i.e. 1 + 1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1 + 2^2, i.e. 1 + 4 = 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.
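The whole quadratic probing example (table size 7, Hash(x) = x % 7, f(i) = i^2, keys 22, 30 and 50) can be replayed with the sketch below. The function name and the error raised when no free slot is found are assumptions for illustration.

```python
def insert_quadratic(table, key):
    """Insert key using quadratic probing; return the slot that was used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
placed = {k: insert_quadratic(table, k) for k in (22, 30, 50)}

print(placed)  # {22: 1, 30: 2, 50: 5}
print(table)   # [None, 22, 30, None, None, 50, None]
```

Keys 22 and 30 go straight to their home slots 1 and 2, which is why 50 finds both occupied before settling in slot 5.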
18. • Double Hashing: The interval between probes is fixed for each record
but is computed by a second hash function, so it differs from key to key.
• Insert the keys 27, 43, 92, 72 into a hash table of size 7, where the first
hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1
+ (k mod 5).
• Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into slot 6.
19. •Step 2: Insert 43
•43 % 7 = 1, location 1 is empty so insert 43 into slot 1.
20. •Step 3: Insert 92
•92 % 7 = 6, but location 6 is already occupied, so this is a collision.
•So we need to resolve this collision using double hashing, with probe number i = 1:
hnew = [h1(92) + i * h2(92)] % 7
= [6 + 1 * (1 + 92 % 5)] % 7
= 9 % 7
= 2
Now, as 2 is an empty slot,
we can insert 92 into slot 2.
21. •Step 4: Insert 72
•72 % 7 = 2, but location 2 is already occupied, so this is a collision.
•So we need to resolve this collision using double hashing, with probe number i = 1:
hnew = [h1(72) + i * h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
= 5 % 7
= 5
Now, as 5 is an empty slot,
we can insert 72 into slot 5.
Insert key 72 in the hash table