NOTES ON HASHING
Author: Jayakanth Srinivasan jksrini@mit.edu
Introduction
Any large information source (database) can be thought of as a table (with multiple
fields) containing information.
For example:
A telephone book has fields name, address and phone number. When you want to find
somebody’s phone number, you search the book based on the name field.
A user account on AA-Design has the fields user_id, password and home folder. You log
on using your user_id and password, and it takes you to your home folder.
To find an entry (field of information) in the table, you only have to use the contents of
one of the fields (say name in the case of the telephone book). You don’t have to know
the contents of all the fields. The field you use to find the contents of the other fields is
called the key.
Ideally, the key should uniquely identify the entry, i.e. if the key is the name then no two
entries in the telephone book have the same name.
We can treat the table formulation as an abstract data type (ADT). As with all ADTs, we can
define a set of operations on a table:
Operation    Description
Initialize   Initialize internal structure; create an empty table
IsEmpty      True iff the table has no elements
Insert       Given a key and an entry, insert it into the table
Find         Given a key, find the entry associated with the key
Remove       Given a key, find the entry associated with the key and remove it from the table
Table 1. Table ADT Operations
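The operations in Table 1 can be written down as an interface. The following is a minimal sketch in Python, not part of the original notes: the class names `Table` and `DictTable` and the method names are assumptions chosen for illustration, and `DictTable` simply borrows Python's built-in `dict` to show the interface in use.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class Table(ABC):
    """Abstract table mapping a key to an entry (operations follow Table 1)."""

    @abstractmethod
    def is_empty(self) -> bool:
        """True iff the table has no elements."""

    @abstractmethod
    def insert(self, key: Any, entry: Any) -> None:
        """Store the entry under the given key."""

    @abstractmethod
    def find(self, key: Any) -> Optional[Any]:
        """Return the entry stored under key, or None if it is absent."""

    @abstractmethod
    def remove(self, key: Any) -> Optional[Any]:
        """Remove and return the entry stored under key, or None if absent."""


class DictTable(Table):
    """Hypothetical stand-in backed by a dict, only to exercise the interface."""

    def __init__(self) -> None:
        self._items = {}                 # Initialize: create an empty table

    def is_empty(self) -> bool:
        return not self._items

    def insert(self, key, entry) -> None:
        self._items[key] = entry

    def find(self, key):
        return self._items.get(key)

    def remove(self, key):
        return self._items.pop(key, None)
```

The implementations discussed in the rest of these notes differ only in how they store the nodes behind such an interface, and therefore in what each operation costs.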
Implementation
Given an ADT, the choice of implementation depends on the answers to the following questions:
What is the frequency of insertion and deletion into the table?
How many key values will be used?
What is the pattern for searching for the keys? i.e. will most of the accesses use
only one or two key values?
Is the table small enough to fit into memory?
How long should the table exist in memory?
We use the word node to represent an entry in the table. For searching, the key is
typically stored separately from the table entry (even if the key is present in the entry as
well). Can you think of why?
Unsorted Sequential Array
index   key   entry
0       14    <data>
1       45    <data>
2       22    <data>
3       67    <data>
4       17    <data>
…       and so on
Figure 1. Unsorted Sequential Array Implementation
An array implementation stores the nodes consecutively in any order (not necessarily in
ascending or descending order).
Operation    Cost
Initialize   O(1)
IsEmpty      O(1) as you will only check if the first element is empty
Insert       O(1) as you will add to the end of the array
Find         O(n) as you have to sequentially search the array, in the worst case through the entire array
Remove       O(n) as you have to sequentially search the array, delete the element and copy all elements one place up
Table 2. Unsorted Sequential Array Table Operations
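As an illustration of where these costs come from, here is a minimal sketch of an unsorted array table in Python. The class name `UnsortedArrayTable` is hypothetical, a Python list stands in for the array, and keys are assumed to be unique.

```python
class UnsortedArrayTable:
    """Table kept as a list of (key, entry) pairs in arrival order."""

    def __init__(self):
        self.nodes = []                      # Initialize: O(1)

    def is_empty(self):
        return len(self.nodes) == 0          # O(1)

    def insert(self, key, entry):
        self.nodes.append((key, entry))      # add at the end: O(1) amortized

    def find(self, key):
        for k, entry in self.nodes:          # sequential search: O(n)
            if k == key:
                return entry
        return None

    def remove(self, key):
        for i, (k, entry) in enumerate(self.nodes):
            if k == key:
                del self.nodes[i]            # later elements move up one place
                return entry
        return None


table = UnsortedArrayTable()
for k in (14, 45, 22, 67, 17):
    table.insert(k, f"data-{k}")
```

Note that `find` and `remove` both walk the list from the front, which is where the O(n) cost arises.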
Sorted Sequential Array
index   key   entry
0       15    <data>
1       17    <data>
2       22    <data>
3       45    <data>
4       67    <data>
…       and so on
Figure 2. Sorted Sequential Array Implementation
A sorted array implementation stores the nodes consecutively in either ascending or
descending order.
Operation    Cost
Initialize   O(1)
IsEmpty      O(1) as you will only check if the first element is empty
Insert       O(n) as you have to find the correct position and shuffle the later elements one place down to keep the array sorted
Find         O(log(n)) as you can perform a binary search operation. Can you think of why?
Remove       O(n) as you have to perform a binary search and shuffle elements one place up
Table 3. Sorted Sequential Array Table Operations
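A minimal sketch of a sorted array table follows, using Python's standard `bisect` module for the binary search. The class name `SortedArrayTable` and the two parallel lists are assumptions made for illustration. Because the array must stay in order, this sketch locates the insertion point in O(log(n)) and then shifts the later elements, which makes insertion O(n) overall.

```python
import bisect


class SortedArrayTable:
    """Keys kept in ascending order; entries[i] belongs to keys[i]."""

    def __init__(self):
        self.keys = []
        self.entries = []

    def insert(self, key, entry):
        i = bisect.bisect_left(self.keys, key)   # binary search for the position
        self.keys.insert(i, key)                 # shifts later elements: O(n)
        self.entries.insert(i, entry)

    def find(self, key):
        i = bisect.bisect_left(self.keys, key)   # binary search: O(log n)
        if i < len(self.keys) and self.keys[i] == key:
            return self.entries[i]
        return None

    def remove(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]                     # shifts later elements: O(n)
            return self.entries.pop(i)
        return None


table = SortedArrayTable()
for k in (45, 15, 67, 17, 22):
    table.insert(k, f"data-{k}")
print(table.keys)   # [15, 17, 22, 45, 67]
```

Binary search works here only because the keys are ordered: each comparison discards half of the remaining range.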
Linked List (Sorted or Unsorted)
[14 | <data>] -> [45 | <data>] -> [22 | <data>] -> [67 | <data>] -> [17 | <data>]
(each node holds a key and an entry)
Figure 3. Linked List Implementation
A linked list implementation chains the nodes one after another using pointers (the list can be sorted or unsorted).
Operation    Cost
IsEmpty      O(1) as you will only check if the head pointer is null
Insert       O(n) for a sorted list; O(1) for an unsorted list (insert at the beginning)
Find         O(n) as you have to traverse the entire list in the worst case
Remove       O(n) as you have to traverse the list to find the node; the removal itself is carried out using pointer operations
Table 4. Linked List Table Operations
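The pointer operations mentioned in Table 4 can be sketched as follows for the unsorted case. This is an illustrative Python sketch; the names `Node` and `LinkedListTable` are hypothetical.

```python
class Node:
    def __init__(self, key, entry, next_node=None):
        self.key = key
        self.entry = entry
        self.next = next_node


class LinkedListTable:
    """Unsorted singly linked list of nodes."""

    def __init__(self):
        self.head = None

    def is_empty(self):
        return self.head is None                    # O(1): check the head pointer

    def insert(self, key, entry):
        self.head = Node(key, entry, self.head)     # insert at the beginning: O(1)

    def find(self, key):
        node = self.head
        while node is not None:                     # traverse: O(n) in the worst case
            if node.key == key:
                return node.entry
            node = node.next
        return None

    def remove(self, key):
        prev, node = None, self.head
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.head = node.next           # unlink the first node
                else:
                    prev.next = node.next           # bypass the found node
                return node.entry
            prev, node = node, node.next
        return None
```

Unlike the array versions, removal needs no shuffling of elements: once the node is found, a single pointer is redirected.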
Ordered Binary Tree
[Figure: an ordered binary tree whose nodes hold the keys 14, 17, 22, 45 and 67, each with its <data>]
Figure 4. Ordered Binary Tree Implementation
An ordered binary tree is a rooted tree in which every key in the left sub-tree is less than
the root's key, every key in the right sub-tree is greater than the root's key, and the left
and right sub-trees are themselves ordered binary trees.
Operation    Cost
IsEmpty      O(1) as you will only check if the root is null
Insert       O(log(n)) as the tree is an ordered binary tree
Find         O(log(n)) as the tree is an ordered binary tree
Remove       O(log(n)) as finding takes O(log(n)) and the removal itself is carried out using pointer operations
These O(log(n)) costs assume the tree stays reasonably balanced; a degenerate tree (for example, one built by inserting keys in sorted order) degrades to O(n).
Table 5. Ordered Binary Tree Table Operations
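A minimal sketch of insertion and search in an ordered binary tree is given below. The function names are hypothetical, and no rebalancing is attempted, so the logarithmic cost holds only while the tree happens to stay balanced.

```python
class TreeNode:
    def __init__(self, key, entry):
        self.key = key
        self.entry = entry
        self.left = None
        self.right = None


def bst_insert(root, key, entry):
    """Insert a node and return the (possibly new) root; no rebalancing."""
    if root is None:
        return TreeNode(key, entry)
    if key < root.key:
        root.left = bst_insert(root.left, key, entry)
    elif key > root.key:
        root.right = bst_insert(root.right, key, entry)
    else:
        root.entry = entry              # key already present: overwrite the entry
    return root


def bst_find(root, key):
    """Walk down from the root, going left or right at each node."""
    while root is not None:
        if key == root.key:
            return root.entry
        root = root.left if key < root.key else root.right
    return None


def in_order_keys(root):
    """Left sub-tree, root, right sub-tree: yields the keys in ascending order."""
    if root is None:
        return []
    return in_order_keys(root.left) + [root.key] + in_order_keys(root.right)


root = None
for k in (22, 14, 45, 17, 67):
    root = bst_insert(root, k, f"data-{k}")
print(in_order_keys(root))   # [14, 17, 22, 45, 67]
```

Each step of `bst_find` discards one whole sub-tree, which is the tree analogue of binary search on a sorted array.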
Hashing
Insertion, find and removal costs of O(log(n)) are good, but as the size of the table
becomes larger, even this cost becomes significant. We would like to be able to use an
algorithm with a find cost of O(1). This is where hashing comes into play!
Hashing using Arrays
When implementing a hash table using arrays, the nodes are not stored consecutively.
Instead, the location of storage is computed using the key and a hash function. The
computation of the array index can be visualized as shown below:
Key -> hash function -> array index
Figure 5. Array Index Computation
The value computed by applying the hash function to the key is often referred to as the
hashed key. The entries in the array are scattered (not necessarily sequential), as can be
seen in the figure below.
index   key     entry
4       <key>   <data>
10      <key>   <data>
123     <key>   <data>
Figure 6. Hashed Array
The cost of the insert, find and delete operations is now only O(1) on average, provided
that few keys hash to the same index. Can you think of why?
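One way to see the constant cost is to write the operations out. The sketch below is illustrative only: the table size of 13 and the function names are assumptions, the hash function is a plain modulus, and collisions are deliberately ignored (the five example keys happen to land on distinct indices).

```python
TABLE_SIZE = 13                      # hypothetical size chosen for this example


def hash_index(key):
    return key % TABLE_SIZE


table = [None] * TABLE_SIZE          # every position starts empty


def insert(key, entry):
    table[hash_index(key)] = (key, entry)    # one computation, one array write


def find(key):
    slot = table[hash_index(key)]            # one computation, one array read
    if slot is not None and slot[0] == key:
        return slot[1]
    return None


def remove(key):
    i = hash_index(key)
    if table[i] is not None and table[i][0] == key:
        table[i] = None


for k in (14, 45, 22, 67, 17):
    insert(k, f"data-{k}")
```

None of the three operations contains a loop over the stored nodes: each does a fixed amount of work regardless of how many entries the table holds.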
Hash tables are very good if you need to perform a lot of search operations on a relatively
stable table (i.e. there are a lot fewer insertion and deletion operations than search
operations).
On the other hand, if traversals (covering the entire table), insertions and deletions are a lot
more frequent than simple search operations, then ordered binary trees (for example,
self-balancing variants such as AVL trees) are the preferred implementation choice.
Hashing Performance
There are three factors that influence the performance of hashing:
Hash function
o should distribute the keys and entries evenly throughout the entire table
o should minimize collisions
Collision resolution strategy
o Open Addressing: store the key/entry in a different position
o Separate Chaining: chain several keys/entries in the same position
Table size
o Too large a table will waste memory
o Too small a table will cause increased collisions and eventually force
rehashing (creating a new hash table of larger size and copying the
contents of the current hash table into it)
o The size should be appropriate to the hash function used and should
typically be a prime number. Why? (We discussed this in class).
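One commonly given reason for a prime table size can be checked with a small experiment; it may or may not be the argument made in class. The sketch below assumes a plain modulus hash and 20 hypothetical keys that are all multiples of 4, then counts how many keys land on each index for a table of size 8 and a table of size 7.

```python
from collections import Counter


def slot_usage(keys, table_size):
    """Count how many keys land on each index under key MOD table_size."""
    return Counter(key % table_size for key in keys)


keys = range(0, 80, 4)               # 0, 4, 8, ..., 76: twenty keys

composite = slot_usage(keys, 8)      # 8 shares the factor 4 with every key
prime = slot_usage(keys, 7)          # 7 shares no factor with the step of 4

print(sorted(composite.items()))     # [(0, 10), (4, 10)]
print(sorted(prime.items()))
```

With size 8, only indices 0 and 4 are ever used and each receives 10 keys. With size 7, all seven indices are used and none receives more than 3 keys. When keys follow a regular pattern, a table size that shares a factor with that pattern wastes most of the table.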
Selecting Hash Functions
The hash function converts the key into the table position. It can be carried out using:
Modular Arithmetic: Compute the index by dividing the key by some value and
using the remainder as the index. This forms the basis of the next two techniques.
For Example: index := key MOD table_size
Truncation: Ignoring part of the key and using the rest as the array index. The
problem with this approach is that there may not always be an even distribution
throughout the table.
For Example: If student IDs such as 928324312 are the keys, then select just the last three
digits as the index, i.e. 312. => the table size has to be at least 1000.
Why?
Folding: Partition the key into several pieces and then combine it in some
convenient way.
For Example:
o For an 8-digit integer, compute the index as follows (using integer division):
Index := (Key/10000 + Key MOD 10000) MOD Table_Size.
o For character strings, compute the index as follows:
Index := 0
For I in 1 .. length(String) loop
   Index := Index + ascii_value(String(I))
end loop
Index := Index MOD Table_Size
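The three techniques above can be sketched in Python as follows. The function names and the table size of 101 used in the examples are assumptions made for illustration; the final reduction by `table_size` in `fold_string` keeps the sum inside the table range.

```python
def modular_hash(key, table_size):
    """Modular arithmetic: the remainder is the index."""
    return key % table_size


def truncation_hash(key):
    """Truncation: keep only the last three decimal digits of the key."""
    return key % 1000


def fold_integer(key, table_size):
    """Folding: split the integer at the 10000 boundary and add the pieces."""
    return (key // 10000 + key % 10000) % table_size


def fold_string(text, table_size):
    """Folding: add up the character codes, then reduce into the table range."""
    return sum(ord(ch) for ch in text) % table_size


print(truncation_hash(928324312))     # 312
print(fold_integer(12345678, 101))    # (1234 + 5678) MOD 101 = 44
print(fold_string("abc", 101))        # (97 + 98 + 99) MOD 101 = 92
```

Summing character codes ignores the order of the characters, so `fold_string("abc", 101)` and `fold_string("cba", 101)` give the same index. This is one example of the uneven distribution that a simple hash function can produce.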
Collision
Let us consider the case when we have a single array with four records, each with two
fields, one for the key and one to hold data (we call this a single slot bucket). Let the
hashing function be a simple modulus operator i.e. array index is computed by finding the
remainder of dividing the key by 4.
Array Index := key MOD 4
Then key values 9, 13 and 17 will all hash to the same index (index 1). When two (or more)
keys hash to the same value, a collision is said to occur.
[Figure: the keys k = 9, k = 13 and k = 17 pass through the hash function and all receive the hashed value 1, so they collide at index 1 of a hash_table with indices 0 to 3]
Figure 7. Collision Using a Modulus Hash Function
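The collision can be reproduced in a few lines. This sketch simply groups the three example keys by their index under key MOD 4; the variable names are hypothetical.

```python
def hash_index(key):
    return key % 4                    # array index := key MOD 4


keys_by_index = {}
for key in (9, 13, 17):
    keys_by_index.setdefault(hash_index(key), []).append(key)

print(keys_by_index)   # {1: [9, 13, 17]}
```

All three keys leave a remainder of 1 when divided by 4, so a table with a single slot per index can hold only one of them without some collision resolution strategy.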
Collision Resolution
The hash table can be implemented using either buckets or chaining:
Buckets: An array is used for implementing the hash table. The array has size
m*p, where m is the number of hash values and p (≥ 1) is the number of slots (a
slot can hold one entry), as shown in the figure below. The bucket is said to have p
slots.
[Figure: a hash table with hash values (indices) 0 to 3, where each bucket has a 1st, 2nd and 3rd slot, each holding a key]
Figure 8. Hash Table with Buckets
Chaining: An array is used to hold the key and a pointer to a linked list (either
singly or doubly linked) or a tree. Here the number of nodes is not restricted
(unlike with buckets). Each node in the chain is large enough to hold one entry, as
shown in the figure below.
[Figure: a hash table with hash values 0, 1, 3, ..., n, where each index points to a chain of nodes (labelled A, B, C) and each chain is terminated by NULL]
Figure 9. Chaining using Linked Lists / Trees
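The bucket layout described above (one flat array of size m*p) can be sketched as follows. The class name `BucketTable` and the policy of refusing an insert when a bucket is full are assumptions made for illustration; the hash function is a plain modulus.

```python
class BucketTable:
    """m buckets of p slots each, stored in one flat array of size m * p."""

    def __init__(self, m, p):
        self.m = m
        self.p = p
        self.slots = [None] * (m * p)

    def insert(self, key, entry):
        start = (key % self.m) * self.p          # first slot of the key's bucket
        for i in range(start, start + self.p):
            if self.slots[i] is None:            # first free slot in the bucket
                self.slots[i] = (key, entry)
                return True
        return False                             # all p slots taken: overflow

    def find(self, key):
        start = (key % self.m) * self.p
        for i in range(start, start + self.p):   # at most p slots to inspect
            if self.slots[i] is not None and self.slots[i][0] == key:
                return self.slots[i][1]
        return None


table = BucketTable(m=4, p=2)
print(table.insert(9, "a"))     # True: first slot of bucket 1
print(table.insert(13, "b"))    # True: second slot of bucket 1
print(table.insert(17, "c"))    # False: bucket 1 is full
```

With p slots per bucket, up to p keys can share a hash value before an overflow occurs. Chaining removes that limit at the price of a pointer per node.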
Open Addressing (Probing)
Open addressing (probing) is used for insertion into fixed-size hash tables (hash
tables with one or more buckets). If the index given by the hash function is occupied, the
table position is incremented by some amount until a free position is found.
There are three schemes commonly used for probing:
Linear Probing: The linear probing algorithm is detailed below:
Index := hash(key)
Start := Index
While Table(Index) Is Full do
    Index := (Index + 1) MOD Table_Size
    if (Index = Start)
        return table_full
Table(Index) := Entry
Quadratic Probing: step away from the position computed by the hash function in
quadratic fashion, i.e. try offsets of 1, 4, 9, 16, … from that position.
Double Hash: compute the index as a function of two different hash functions.
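The three probing schemes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the names are hypothetical, the primary hash is a plain modulus, and the second hash function used for double hashing, `1 + key MOD (size - 1)`, is one common textbook choice rather than anything specified in these notes.

```python
class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def insert(self, key, entry):
        start = key % self.size
        index = start
        # Step forward one position at a time until a free slot (or the key) is found.
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % self.size
            if index == start:
                return False                     # wrapped around: table is full
        self.slots[index] = (key, entry)
        return True

    def find(self, key):
        start = key % self.size
        index = start
        while self.slots[index] is not None:     # an empty slot ends the search
            if self.slots[index][0] == key:
                return self.slots[index][1]
            index = (index + 1) % self.size
            if index == start:
                break
        return None


def quadratic_probe(key, size, attempts):
    """Indices tried by quadratic probing: home, home + 1, home + 4, home + 9, ..."""
    home = key % size
    return [(home + i * i) % size for i in range(attempts)]


def double_hash_probe(key, size, attempts):
    """Indices tried by double hashing; the step comes from an assumed second hash."""
    home = key % size
    step = 1 + key % (size - 1)
    return [(home + i * step) % size for i in range(attempts)]


table = LinearProbingTable(4)
for k in (9, 13, 17):
    table.insert(k, f"data-{k}")
print([s[0] if s else None for s in table.slots])   # [None, 9, 13, 17]
print(quadratic_probe(9, 7, 4))                     # [2, 3, 6, 4]
print(double_hash_probe(9, 7, 4))                   # [2, 6, 3, 0]
```

With linear probing the three colliding keys 9, 13 and 17 end up in neighbouring positions 1, 2 and 3. Quadratic probing and double hashing spread the attempts further apart, which reduces this clustering.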
Chaining
In chaining, the entries are inserted as nodes in a linked list. The hash table itself is an
array of head pointers.
The advantages of using chaining are:
o Insertion can be carried out at the head of the list at the index
o The array size is not a limiting factor on the size of the table
The prime disadvantage is the memory overhead incurred if the table size is small.
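A minimal sketch of chaining follows, with the hash table held as an array of head pointers and new nodes inserted at the head of the chain. The names `ChainNode`, `ChainedHashTable` and `chain_keys` are hypothetical, and the hash function is a plain modulus.

```python
class ChainNode:
    def __init__(self, key, entry, next_node):
        self.key = key
        self.entry = entry
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size):
        self.heads = [None] * size               # array of head pointers

    def insert(self, key, entry):
        i = key % len(self.heads)
        # The new node becomes the head of the chain at index i: O(1).
        self.heads[i] = ChainNode(key, entry, self.heads[i])

    def find(self, key):
        node = self.heads[key % len(self.heads)]
        while node is not None:                  # walk only this index's chain
            if node.key == key:
                return node.entry
            node = node.next
        return None

    def chain_keys(self, index):
        """Keys stored in the chain at the given index, head first."""
        keys = []
        node = self.heads[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(4)
for k in (9, 13, 17):
    table.insert(k, f"data-{k}")
print(table.chain_keys(1))   # [17, 13, 9]
```

The three keys that collided under key MOD 4 now share one chain, and the table can keep growing past four entries. The cost of a find grows with the length of the chain, so a good hash function and a sensible table size still matter.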
