SlideShare a Scribd company logo
1 of 289
Download to read offline
Data Structures
Dictionaries
Andres Mendez-Vazquez
May 6, 2015
1 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
2 / 127
Images/cinvestav-
Dictionaries
Definition
The ADT dictionary—also called a map, table, or associative
array—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or a
person’s name
A value—such as a definition, an address, or a telephone
number—associated with that key
3 / 127
Images/cinvestav-
Dictionaries
Definition
The ADT dictionary—also called a map, table, or associative
array—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or a
person’s name
A value—such as a definition, an address, or a telephone
number—associated with that key
3 / 127
Images/cinvestav-
Dictionaries
Definition
The ADT dictionary—also called a map, table, or associative
array—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or a
person’s name
A value—such as a definition, an address, or a telephone
number—associated with that key
3 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Example
Dictionary with Duplicates
Pairs are of the form (word, meaning).
More than a single meaning
(bolt, a threaded pin)
(bolt, a crash of thunder)
(bolt, to shoot forth suddenly)
(bolt, a gulp)
(bolt, a standard roll of cloth)
etc.
4 / 127
Images/cinvestav-
Thus
We have possibly in a dictionary
Sorted keys
Duplicate keys
5 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
Operation in the ADT dictionary
Common operations with most databases
insert adds a new entry to the dictionary, given a search key and
associated value.
delete removes an entry, given its associated search key
retrieve retrieves the value associated with a given search key
search sees whether the dictionary contains a given search key
traverse
It traverse all the search keys in the dictionary
It traverse all the values in the dictionary
6 / 127
Images/cinvestav-
In addition
We have the following extra operations
Detect whether a dictionary is empty
Get the number of entries in the dictionary
Remove all entries from the dictionary
7 / 127
Images/cinvestav-
In addition
We have the following extra operations
Detect whether a dictionary is empty
Get the number of entries in the dictionary
Remove all entries from the dictionary
7 / 127
Images/cinvestav-
In addition
We have the following extra operations
Detect whether a dictionary is empty
Get the number of entries in the dictionary
Remove all entries from the dictionary
7 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
8 / 127
Images/cinvestav-
Specifications: Add
Pseudocode
add(key, value)
Task
It adds the pair (key , value) to the dictionary.
Input and Output
Input: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-
Specifications: Add
Pseudocode
add(key, value)
Task
It adds the pair (key , value) to the dictionary.
Input and Output
Input: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-
Specifications: Add
Pseudocode
add(key, value)
Task
It adds the pair (key , value) to the dictionary.
Input and Output
Input: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-
Example
Adding something
h(k)
Hash Table
(k,item)
10 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
11 / 127
Images/cinvestav-
Specifications: Remove
Pseudocode
remove(key)
Task
It removes from the dictionary the entry that corresponds to a given search
key.
Input and Output
Input: key is an object search key.
Output: Returns either the value that was associated with the search
key or null if no such object exists.
12 / 127
Images/cinvestav-
Specifications: Remove
Pseudocode
remove(key)
Task
It removes from the dictionary the entry that corresponds to a given search
key.
Input and Output
Input: key is an object search key.
Output: Returns either the value that was associated with the search
key or null if no such object exists.
12 / 127
Images/cinvestav-
Specifications: Remove
Pseudocode
remove(key)
Task
It removes from the dictionary the entry that corresponds to a given search
key.
Input and Output
Input: key is an object search key.
Output: Returns either the value that was associated with the search
key or null if no such object exists.
12 / 127
Images/cinvestav-
Example
Removing something
h(k)
Hash Table
(k,item)
13 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
14 / 127
Images/cinvestav-
Specifications: GetValue
Pseudocode
getValue(key)
Task
It retrieves from the dictionary the value that corresponds to a given
search key.
Input and Output
Input: key is an object search key.
Output: Returns either the value associated with the search key or
null if no such object exists.
15 / 127
Images/cinvestav-
Specifications: GetValue
Pseudocode
getValue(key)
Task
It retrieves from the dictionary the value that corresponds to a given
search key.
Input and Output
Input: key is an object search key.
Output: Returns either the value associated with the search key or
null if no such object exists.
15 / 127
Images/cinvestav-
Specifications: GetValue
Pseudocode
getValue(key)
Task
It retrieves from the dictionary the value that corresponds to a given
search key.
Input and Output
Input: key is an object search key.
Output: Returns either the value associated with the search key or
null if no such object exists.
15 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
16 / 127
Images/cinvestav-
Specifications: Contains
Pseudocode
contains(key)
Task
It sees whether any entry in the dictionary has a given search key.
Input and Output
Input: key is an object search key.
Output: Returns true if an entry in the dictionary has key as its
search key.
17 / 127
Images/cinvestav-
Specifications: Contains
Pseudocode
contains(key)
Task
It sees whether any entry in the dictionary has a given search key.
Input and Output
Input: key is an object search key.
Output: Returns true if an entry in the dictionary has key as its
search key.
17 / 127
Images/cinvestav-
Specifications: Contains
Pseudocode
contains(key)
Task
It sees whether any entry in the dictionary has a given search key.
Input and Output
Input: key is an object search key.
Output: Returns true if an entry in the dictionary has key as its
search key.
17 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
18 / 127
Images/cinvestav-
Specifications: GetKeyIterator
Pseudocode
getKeyIterator()
Task
It creates an iterator that traverses all search keys in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
search keys in the dictionary.
19 / 127
Images/cinvestav-
Specifications: GetKeyIterator
Pseudocode
getKeyIterator()
Task
It creates an iterator that traverses all search keys in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
search keys in the dictionary.
19 / 127
Images/cinvestav-
Specifications: GetKeyIterator
Pseudocode
getKeyIterator()
Task
It creates an iterator that traverses all search keys in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
search keys in the dictionary.
19 / 127
Images/cinvestav-
Specifications: GetValueIterator
Pseudocode
getValueIterator()
Task
It creates an iterator that traverses all values in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
values in the dictionary.
20 / 127
Images/cinvestav-
Specifications: GetValueIterator
Pseudocode
getValueIterator()
Task
It creates an iterator that traverses all values in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
values in the dictionary.
20 / 127
Images/cinvestav-
Specifications: GetValueIterator
Pseudocode
getValueIterator()
Task
It creates an iterator that traverses all values in the dictionary.
Input and Output
Input: None.
Output: Returns an iterator that provides sequential access to the
values in the dictionary.
20 / 127
Images/cinvestav-
Other Operations
isEmpty()
It sees whether the dictionary is empty.
getSize()
It gets the size of the dictionary.
clear()
It removes all entries from the dictionary.
21 / 127
Images/cinvestav-
Other Operations
isEmpty()
It sees whether the dictionary is empty.
getSize()
It gets the size of the dictionary.
clear()
It removes all entries from the dictionary.
21 / 127
Images/cinvestav-
Other Operations
isEmpty()
It sees whether the dictionary is empty.
getSize()
It gets the size of the dictionary.
clear()
It removes all entries from the dictionary.
21 / 127
Images/cinvestav-
However, we have two scenarios
Distinct search keys
Case 1 You can refuse to add another key-value.
Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keys
if the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entries
that have the same search key.
What do you remove or return!!!
22 / 127
Images/cinvestav-
However, we have two scenarios
Distinct search keys
Case 1 You can refuse to add another key-value.
Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keys
if the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entries
that have the same search key.
What do you remove or return!!!
22 / 127
Images/cinvestav-
However, we have two scenarios
Distinct search keys
Case 1 You can refuse to add another key-value.
Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keys
if the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entries
that have the same search key.
What do you remove or return!!!
22 / 127
Images/cinvestav-
However, we have two scenarios
Distinct search keys
Case 1 You can refuse to add another key-value.
Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keys
if the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entries
that have the same search key.
What do you remove or return!!!
22 / 127
Images/cinvestav-
However, we have two scenarios
Distinct search keys
Case 1 You can refuse to add another key-value.
Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keys
if the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entries
that have the same search key.
What do you remove or return!!!
22 / 127
Images/cinvestav-
Interface
We have the following interface
import java . u t i l . I t e r a t o r ;
p u b l i c i n t e r f a c e D i c t i o n a r y I n t e r f a c e <Key , Value>
{
p u b l i c Value add ( Key k , Value Item ) ;
p u b l i c Value remove ( Key k ) ;
p u b l i c Value getValue ( Key k ) ;
p u b l i c boolean c o n t a i n s ( Key k ) ;
p u b l i c I t e r a t o r <Key> g e t K e y I t e r a t o r ( ) ;
p u b l i c I t e r a t o r <Value> g e t V a l u e I t e r a t o r ( ) ;
p u b l i c boolean isEmpty ( ) ;
p u b l i c i n t g e t S i z e ( ) ;
p u b l i c void c l e a r ( ) ;
}
23 / 127
Images/cinvestav-
Where we can use this ADT?
In the phone directory problem
It is a directory that uses a name as the key and adds and returns a phone
number
For example
Name Number
Suzanne Nouveaux 401-555-1234
Andres Mendez-Vazquez 301-123-2345
... ...
24 / 127
Images/cinvestav-
Where we can use this ADT?
In the phone directory problem
It is a directory that uses a name as the key and adds and returns a phone
number
For example
Name Number
Suzanne Nouveaux 401-555-1234
Andres Mendez-Vazquez 301-123-2345
... ...
24 / 127
Images/cinvestav-
Thus, we have the following diagram
A class Diagram
TelephoneDirectory
readFile(data)
getPhoneNumber(name)
PhoneDirectory
Dictionary
Name
String
1 1
*
* *
*
25 / 127
Images/cinvestav-
Basic Code
We have something like this
p u b l i c c l a s s TelephoneDirectory
{
p r i v a t e D i c t i o n a r y I n t e r f a c e <Name , String > phoneBook ;
p u b l i c TelephoneDirectory ()
{
phoneBook = new SortedDictionary <Name , String >();
} // end d e f a u l t c o n s t r u c t o r
/∗∗ Reads a t e x t f i l e of names and telephone numbers∗∗/
p u b l i c void r e a d F i l e ( Scanner data )
{ . . .}
/∗∗ Gets the phone number of a given person . ∗/
p u b l i c S t r i n g getPhoneNumber (Name personName )
{ . . . } // end getPhoneNumber
} // end TelephoneDirectory
26 / 127
Images/cinvestav-
Now, the Big Question
It is a big one
How do we implement this data structure?
Possible ways
Linear List
Skip List
Hash Tables
...
27 / 127
Images/cinvestav-
Now, the Big Question
It is a big one
How do we implement this data structure?
Possible ways
Linear List
Skip List
Hash Tables
...
27 / 127
Images/cinvestav-
Now, the Big Question
It is a big one
How do we implement this data structure?
Possible ways
Linear List
Skip List
Hash Tables
...
27 / 127
Images/cinvestav-
Now, the Big Question
It is a big one
How do we implement this data structure?
Possible ways
Linear List
Skip List
Hash Tables
...
27 / 127
Images/cinvestav-
Now, the Big Question
It is a big one
How do we implement this data structure?
Possible ways
Linear List
Skip List
Hash Tables
...
27 / 127
Images/cinvestav-
First: Represent As A Linear List
You have
L = (e0, e1, ..., en−1)
Where
Each ei is a pair (key, element).
We can use the following representations
Array or linked representation.
28 / 127
Images/cinvestav-
First: Represent As A Linear List
You have
L = (e0, e1, ..., en−1)
Where
Each ei is a pair (key, element).
We can use the following representations
Array or linked representation.
28 / 127
Images/cinvestav-
First: Represent As A Linear List
You have
L = (e0, e1, ..., en−1)
Where
Each ei is a pair (key, element).
We can use the following representations
Array or linked representation.
28 / 127
Images/cinvestav-
Array Representation
Our classic array
b c d e a
We have then
Operation in Array Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add at right end
remove(theKey) O(size)
29 / 127
Images/cinvestav-
Array Representation
Our classic array
b c d e a
We have then
Operation in Array Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add at right end
remove(theKey) O(size)
29 / 127
Images/cinvestav-
What if we sort the array?
Sorted Keys
b c d ea
We have then
Operation in Array Representation Complexity
getValue(theKey) O(logsize)
add(theKey, theItem) O(logsize) to find duplicate
O(size) to add
remove(theKey) O(size)
30 / 127
Images/cinvestav-
What if we sort the array?
Sorted Keys
b c d ea
We have then
Operation in Array Representation Complexity
getValue(theKey) O(logsize)
add(theKey, theItem) O(logsize) to find duplicate
O(size) to add
remove(theKey) O(size)
30 / 127
Images/cinvestav-
Unsorted Chain
Our Structure
b c d e a
Complexity
Operation in Chain Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add
remove(theKey) O(size)
31 / 127
Images/cinvestav-
Unsorted Chain
Our Structure
b c d e a
Complexity
Operation in Chain Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add
remove(theKey) O(size)
31 / 127
Images/cinvestav-
Sorted Chain
Our Structure
b c d ea
Complexity
Operation in Chain Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add
remove(theKey) O(size)
32 / 127
Images/cinvestav-
Sorted Chain
Our Structure
b c d ea
Complexity
Operation in Chain Representation Complexity
getValue(theKey) O(size)
add(theKey, theItem) O(size) to find duplicate
O(1) to add
remove(theKey) O(size)
32 / 127
Images/cinvestav-
Skip Lists: we will skip it - It is for an advance class of
analysis of algorithms
We have something like this
Complexity
Operation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O(log size)
add(theKey, theItem) O(size) O(logsize)
remove(theKey) O(size) O(logsize)
33 / 127
Images/cinvestav-
Skip Lists: we will skip it - It is for an advance class of
analysis of algorithms
We have something like this
Complexity
Operation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O(log size)
add(theKey, theItem) O(size) O(logsize)
remove(theKey) O(size) O(logsize)
33 / 127
Images/cinvestav-
We will concentrate our efforts in the Hash Tables
Definition
A hash table or hash map T is a data structure, most commonly an
array, that uses a hash function to efficiently map certain identifiers of
keys (e.g. person names) to associated values.
Why?
Operation in Array Representation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O (1 + C)
add(theKey, theItem) O(size) O (1 + C)
remove(theKey) O(size) O (1 + C)
34 / 127
Images/cinvestav-
We will concentrate our efforts in the Hash Tables
Definition
A hash table or hash map T is a data structure, most commonly an
array, that uses a hash function to efficiently map certain identifiers of
keys (e.g. person names) to associated values.
Why?
Operation in Array Representation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O (1 + C)
add(theKey, theItem) O(size) O (1 + C)
remove(theKey) O(size) O (1 + C)
34 / 127
Images/cinvestav-
Then
Advantages
They have the advantage of having a expected complexity of
operations of O(1 + C)
Still, be aware of C because this will change depending on which
overflow policy you use...
35 / 127
Images/cinvestav-
Then
Advantages
They have the advantage of having a expected complexity of
operations of O(1 + C)
Still, be aware of C because this will change depending on which
overflow policy you use...
35 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
36 / 127
Images/cinvestav-
You have two cases for this data structure
First
Small universe of keys.
Second
Large number of keys
37 / 127
Images/cinvestav-
You have two cases for this data structure
First
Small universe of keys.
Second
Large number of keys
37 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a small universe of keys, U
We can do the following
Key values are direct addresses in the array.
Direct implementation or Direct-address tables.
Operations
1 Direct-Address-Search(Table, key)
return Table[key]
2 Direct-Address-Search(Table,key,value)
Table[key]=value
3 Direct-Address-Delete(T, x)
Table[key]=null
38 / 127
Images/cinvestav-
When you have a large universe of keys, U
Then
Then, it is impractical to store a table of the size of |U|.
You can use a especial function for mapping
h : U→{0, 1, ..., m − 1} (1)
39 / 127
Images/cinvestav-
When you have a large universe of keys, U
Then
Then, it is impractical to store a table of the size of |U|.
You can use a especial function for mapping
h : U→{0, 1, ..., m − 1} (1)
39 / 127
Images/cinvestav-
Example
Imagine that you have
A 1D array (or table) table[0 : m − 1].
Thus
h(k)is the home bucket for key k.
Then
Every dictionary pair (key, Item) is stored in its home bucket table[h[key]].
40 / 127
Images/cinvestav-
Example
Imagine that you have
A 1D array (or table) table[0 : m − 1].
Thus
h(k)is the home bucket for key k.
Then
Every dictionary pair (key, Item) is stored in its home bucket table[h[key]].
40 / 127
Images/cinvestav-
Example
Imagine that you have
A 1D array (or table) table[0 : m − 1].
Thus
h(k)is the home bucket for key k.
Then
Every dictionary pair (key, Item) is stored in its home bucket table[h[key]].
40 / 127
Images/cinvestav-
Example
Push the following pairs in a hash table of size m = 8
(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function is
key/11
Then, we have that
41 / 127
Images/cinvestav-
Example
Push the following pairs in a hash table of size m = 8
(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function is
key/11
Then, we have that
41 / 127
Images/cinvestav-
Example
Push the following pairs in a hash table of size m = 8
(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function is
key/11
Then, we have that
41 / 127
Images/cinvestav-
What Can Go Wrong?
What if we add?
Where does (26,g) go?
Then
PROBLEM!!!
Keys that have the same home bucket are synonyms.
22 and 26 are synonyms with respect to the hash function that is in
use.
This is known as collision or overflow.
42 / 127
Images/cinvestav-
What Can Go Wrong?
What if we add?
Where does (26,g) go?
Then
PROBLEM!!!
Keys that have the same home bucket are synonyms.
22 and 26 are synonyms with respect to the hash function that is in
use.
This is known as collision or overflow.
42 / 127
Images/cinvestav-
What Can Go Wrong?
What if we add?
Where does (26,g) go?
Then
PROBLEM!!!
Keys that have the same home bucket are synonyms.
22 and 26 are synonyms with respect to the hash function that is in
use.
This is known as collision or overflow.
42 / 127
Images/cinvestav-
What Can Go Wrong?
What if we add?
Where does (26,g) go?
Then
PROBLEM!!!
Keys that have the same home bucket are synonyms.
22 and 26 are synonyms with respect to the hash function that is in
use.
This is known as collision or overflow.
42 / 127
Images/cinvestav-
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make appear to be “random” enough to avoid collisions altogether
(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
43 / 127
Images/cinvestav-
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make appear to be “random” enough to avoid collisions altogether
(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
43 / 127
Images/cinvestav-
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make appear to be “random” enough to avoid collisions altogether
(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
43 / 127
Images/cinvestav-
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make appear to be “random” enough to avoid collisions altogether
(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
43 / 127
Images/cinvestav-
Collisions
This is a problem
We might try to avoid this by using a suitable hash function h.
Idea
Make appear to be “random” enough to avoid collisions altogether
(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisions
Possible Solutions to the problem:
1 Chaining
2 Open Addressing
43 / 127
Images/cinvestav-
Other Issues
First Issue
The choice of the possible hash function.
Second
The collision handling method
Third
The size (number of buckets) at the hash table
44 / 127
Images/cinvestav-
Other Issues
First Issue
The choice of the possible hash function.
Second
The collision handling method
Third
The size (number of buckets) at the hash table
44 / 127
Images/cinvestav-
Other Issues
First Issue
The choice of the possible hash function.
Second
The collision handling method
Third
The size (number of buckets) at the hash table
44 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
45 / 127
Images/cinvestav-
Hash Functions
They have two parts
1 The conversion of the key into an integer in the case the key is not an
integer.
2 The mapping to the home bucket.
46 / 127
Images/cinvestav-
Hash Functions
They have two parts
1 The conversion of the key into an integer in the case the key is not an
integer.
2 The mapping to the home bucket.
46 / 127
Images/cinvestav-
Hash Functions
They have two parts
1 The conversion of the key into an integer in the case the key is not an
integer.
2 The mapping to the home bucket.
46 / 127
Images/cinvestav-
String To Integer
Something Notable
Each Java character is 2 bytes long.
An int is 4 bytes long.
We could have the following string s = “pt”
We then do the following:
1 int answer = s.charAt(0);
2 answer = (answer<<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-
String To Integer
Something Notable
Each Java character is 2 bytes long.
An int is 4 bytes long.
We could have the following string s = “pt”
We then do the following:
1 int answer = s.charAt(0);
2 answer = (answer<<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-
String To Integer
Something Notable
Each Java character is 2 bytes long.
An int is 4 bytes long.
We could have the following string s = “pt”
We then do the following:
1 int answer = s.charAt(0);
2 answer = (answer<<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-
String To Integer
Something Notable
Each Java character is 2 bytes long.
An int is 4 bytes long.
We could have the following string s = “pt”
We then do the following:
1 int answer = s.charAt(0);
2 answer = (answer<<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-
String To Integer
Something Notable
Each Java character is 2 bytes long.
An int is 4 bytes long.
We could have the following string s = “pt”
We then do the following:
1 int answer = s.charAt(0);
2 answer = (answer<<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-
What about longer Strings?
We can do the following process
p u b l i c s t a t i c i n t i n t e g e r ( S t r i n g s )
{
i n t lengt h = s . l ength ( ) ;
// number of c h a r a c t e r s i n s
i n t answer = 0;
i f ( len gth % 2 == 1)
{// lengt h i s odd
answer = s . charAt ( le ngth − 1 ) ;
length −−;
}
48 / 127
Images/cinvestav-
Cont...
The rest of the code
// lengt h i s now even
f o r ( i n t i = 0; i < lengt h ; i += 2)
{// do two c h a r a c t e r s at a time
answer += s . charAt ( i ) ;
answer += (( i n t ) s . charAt ( i + 1)) << 16;
}
r e t u r n ( answer < 0) ? −answer : answer ; }
49 / 127
Images/cinvestav-
Analysis of hashing: Which hash function?
Consider that:
Good hash functions should maintain the property of simple uniform
hashing!
The keys have the same probability 1/m to be hashed to any bucket!!!
A uniform hash function minimizes the likelihood of an overflow when
keys are selected at random.
50 / 127
Images/cinvestav-
Analysis of hashing: Which hash function?
Consider that:
Good hash functions should maintain the property of simple uniform
hashing!
The keys have the same probability 1/m to be hashed to any bucket!!!
A uniform hash function minimizes the likelihood of an overflow when
keys are selected at random.
50 / 127
Images/cinvestav-
Analysis of hashing: Which hash function?
Consider that:
Good hash functions should maintain the property of simple uniform
hashing!
The keys have the same probability 1/m to be hashed to any bucket!!!
A uniform hash function minimizes the likelihood of an overflow when
keys are selected at random.
50 / 127
Images/cinvestav-
Possible hash function when the keys are natural numbers
The division method
h(k) = k mod m.
Good choices for m are primes not too close to a power of 2.
51 / 127
Images/cinvestav-
Possible hash function when the keys are natural numbers
The division method
h(k) = k mod m.
Good choices for m are primes not too close to a power of 2.
51 / 127
Images/cinvestav-
What if...
Question:
What about something with keys in a normal distribution?
52 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
53 / 127
Images/cinvestav-
Hashing By Division
Universe of keys
keySpace = all integers.
Thus, we have that
For every m, the number of integers that get mapped (hashed) into bucket
i is approximately 232
/m.
Properties
The division method results in a uniform hash function when keySpace =
all integers.
54 / 127
Images/cinvestav-
Hashing By Division
Universe of keys
keySpace = all integers.
Thus, we have that
For every m, the number of integers that get mapped (hashed) into bucket
i is approximately 232
/m.
Properties
The division method results in a uniform hash function when keySpace =
all integers.
54 / 127
Images/cinvestav-
Hashing By Division
Universe of keys
keySpace = all integers.
Thus, we have that
For every m, the number of integers that get mapped (hashed) into bucket
i is approximately 232
/m.
Properties
The division method results in a uniform hash function when keySpace =
all integers.
54 / 127
Images/cinvestav-
However
Problem
In practice, keys tend to be correlated.
Thus
The choice of the divisor b affects the distribution of home buckets.
Then
Because of this correlation, applications tend to have a bias towards keys
that map into odd integers (or into even ones).
55 / 127
Images/cinvestav-
However
Problem
In practice, keys tend to be correlated.
Thus
The choice of the divisor b affects the distribution of home buckets.
Then
Because of this correlation, applications tend to have a bias towards keys
that map into odd integers (or into even ones).
55 / 127
Images/cinvestav-
However
Problem
In practice, keys tend to be correlated.
Thus
The choice of the divisor b affects the distribution of home buckets.
Then
Because of this correlation, applications tend to have a bias towards keys
that map into odd integers (or into even ones).
55 / 127
Images/cinvestav-
Examples
Odd number and m an even number
Odd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even number
Even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
Properties
The bias in the keys results in a bias toward either the odd or even home
buckets.
56 / 127
Images/cinvestav-
Examples
Odd number and m an even number
Odd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even number
Even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
Properties
The bias in the keys results in a bias toward either the odd or even home
buckets.
56 / 127
Images/cinvestav-
Examples
Odd number and m an even number
Odd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even number
Even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
Properties
The bias in the keys results in a bias toward either the odd or even home
buckets.
56 / 127
Images/cinvestav-
What if we use an Odd number?
Odd number and m an odd number
odd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd number
Even integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
Thus
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
57 / 127
Images/cinvestav-
What if we use an Odd number?
Odd number and m an odd number
odd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd number
Even integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
Thus
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
57 / 127
Images/cinvestav-
What if we use an Odd number?
Odd number and m an odd number
odd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd number
Even integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
Thus
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
57 / 127
Images/cinvestav-
Bias
Something Notable
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
Then, we have
We have a better chance of uniformly distributed home buckets.
Thus
So do not use an even divisor.
58 / 127
Images/cinvestav-
Bias
Something Notable
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
Then, we have
We have a better chance of uniformly distributed home buckets.
Thus
So do not use an even divisor.
58 / 127
Images/cinvestav-
Bias
Something Notable
The bias in the keys does not result in a bias toward either the odd or even
home buckets.
Then, we have
We have a better chance of uniformly distributed home buckets.
Thus
So do not use an even divisor.
58 / 127
Images/cinvestav-
Selecting The Divisor
Another Problem
Similar biased distribution of home buckets is seen, in practice, when the
divisor is a multiple of prime numbers such as 3, 5, 7, . . .
However
The effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing m
Ideally, choose m so that it is a prime number.
Not to close to a power of 2.
59 / 127
Images/cinvestav-
Selecting The Divisor
Another Problem
Similar biased distribution of home buckets is seen, in practice, when the
divisor is a multiple of prime numbers such as 3, 5, 7, . . .
However
The effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing m
Ideally, choose m so that it is a prime number.
Not to close to a power of 2.
59 / 127
Images/cinvestav-
Selecting The Divisor
Another Problem
Similar biased distribution of home buckets is seen, in practice, when the
divisor is a multiple of prime numbers such as 3, 5, 7, . . .
However
The effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing m
Ideally, choose m so that it is a prime number.
Not to close to a power of 2.
59 / 127
Images/cinvestav-
However
Something Notable
Even with this hash function, we can have problems
Remember
The Gaussian Keys...
60 / 127
Images/cinvestav-
However
Something Notable
Even with this hash function, we can have problems
Remember
The Gaussian Keys...
60 / 127
Images/cinvestav-
Java.util.HashTable
Quite simple
Simply uses a divisor that is an odd number.
Simplify the implementation
This simplifies implementation because we must be able to resize the hash
table as more pairs are put into the dictionary.
Why?
Array doubling, for example, requires you to go from a 1D array table
whose length is m (which is odd) to an array whose length is 2m + 1
(which is also odd).
61 / 127
Images/cinvestav-
Java.util.HashTable
Quite simple
Simply uses a divisor that is an odd number.
Simplify the implementation
This simplifies implementation because we must be able to resize the hash
table as more pairs are put into the dictionary.
Why?
Array doubling, for example, requires you to go from a 1D array table
whose length is m (which is odd) to an array whose length is 2m + 1
(which is also odd).
61 / 127
Images/cinvestav-
Java.util.HashTable
Quite simple
Simply uses a divisor that is an odd number.
Simplify the implementation
This simplifies implementation because we must be able to resize the hash
table as more pairs are put into the dictionary.
Why?
Array doubling, for example, requires you to go from a 1D array table
whose length is m (which is odd) to an array whose length is 2m + 1
(which is also odd).
61 / 127
Images/cinvestav-
A Better Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield retrieval O(n) time.
Goal
To find hash functions that produce uniform random table indexes
irrespective of the keys.
Idea
To select a hash function at random from a designed class of functions at
the beginning of the execution.
62 / 127
Images/cinvestav-
A Better Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield retrieval O(n) time.
Goal
To find hash functions that produce uniform random table indexes
irrespective of the keys.
Idea
To select a hash function at random from a designed class of functions at
the beginning of the execution.
62 / 127
Images/cinvestav-
A Better Solution: Universal Hashing
Issues
In practice, keys are not randomly distributed.
Any fixed hash function might yield retrieval O(n) time.
Goal
To find hash functions that produce uniform random table indexes
irrespective of the keys.
Idea
To select a hash function at random from a designed class of functions at
the beginning of the execution.
62 / 127
Images/cinvestav-
Hashing methods: Universal hashing
Example
Set of hash functions
Choose a hash function randomly
(At the beginning of the execution)
HASH TABLE
63 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example of Universal Hash
Proceed as follows:
Choose a primer number p large enough so that every possible key k is in
the range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗
p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗
p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗
p and b ∈ Zp}
Important
a and b are chosen randomly at the beginning of execution.
The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-
Example: Universal hash functions
Example
p = 977, m = 50, a and b random numbers
ha,b(k) = ((ak + b) mod p) mod m
65 / 127
Images/cinvestav-
Example of key distribution
Example, mean = 488.5 and dispersion = 5
66 / 127
Images/cinvestav-
Example with 10 keys
Universal Hashing Vs Division Method
67 / 127
Images/cinvestav-
Example with 50 keys
Universal Hashing Vs Division Method
68 / 127
Images/cinvestav-
Example with 100 keys
Universal Hashing Vs Division Method
69 / 127
Images/cinvestav-
Example with 200 keys
Universal Hashing Vs Division Method
70 / 127
Images/cinvestav-
Another Example: Matrix Method
Then
Let us say keys are u-bits long.
Say the table size M is power of 2.
an index is b-bits long with M = 2b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx
where after the inner product we apply mod 2
Example
h
b



1 0 0 0
0 1 1 1
1 1 1 0



u
x




1
0
1
0





=
h (x)


1
1
0



71 / 127
Images/cinvestav-
Another Example: Matrix Method
Then
Let us say keys are u-bits long.
Say the table size M is power of 2.
an index is b-bits long with M = 2b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx
where after the inner product we apply mod 2
Example
h
b



1 0 0 0
0 1 1 1
1 1 1 0



u
x




1
0
1
0





=
h (x)


1
1
0



71 / 127
Images/cinvestav-
Another Example: Matrix Method
Then
Let us say keys are u-bits long.
Say the table size M is power of 2.
an index is b-bits long with M = 2b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx
where after the inner product we apply mod 2
Example
h
b



1 0 0 0
0 1 1 1
1 1 1 0



u
x




1
0
1
0





=
h (x)


1
1
0



71 / 127
Images/cinvestav-
Another Example: Matrix Method
Then
Let us say keys are u-bits long.
Say the table size M is power of 2.
an index is b-bits long with M = 2b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx
where after the inner product we apply mod 2
Example
h
b



1 0 0 0
0 1 1 1
1 1 1 0



u
x




1
0
1
0





=
h (x)


1
1
0



71 / 127
Images/cinvestav-
Another Example: Matrix Method
Then
Let us say keys are u-bits long.
Say the table size M is power of 2.
an index is b-bits long with M = 2b.
The h function
Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx
where after the inner product we apply mod 2
Example
h
b



1 0 0 0
0 1 1 1
1 1 1 0



u
x




1
0
1
0





=
h (x)


1
1
0



71 / 127
Images/cinvestav-
Implementation of the column*vector mod 2
Code - SWAR-Popcount - Divide and Conquer
// This works only i n 32 b i t s
i n t product ( i n t row , i n t v e c t o r ){
i n t i = row & v e c t o r ;
i = i − (( i >> 1) & 0x55555555 ) ;
i = ( i & 0x33333333 ) + (( i >> 2) & 0x33333333 ) ;
i = ( ( ( i + ( i >> 4)) & 0x0F0F0F0F ) ∗ 0x01010101 ) >> 24;
r e t u r n i & 0x00000001 ;
}
72 / 127
Images/cinvestav-
Explanation
Counting Duo-Bits
A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit
1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by
(2b + a) − (2b + a) ÷ 2 or (2b + a) − b
73 / 127
Images/cinvestav-
Explanation
Counting Duo-Bits
A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit
1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by
(2b + a) − (2b + a) ÷ 2 or (2b + a) − b
73 / 127
Images/cinvestav-
Explanation
Counting Duo-Bits
A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit
1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by
(2b + a) − (2b + a) ÷ 2 or (2b + a) − b
73 / 127
Images/cinvestav-
We have then
Something Notable
i i div 2 popcnt(i)
00 00 00
01 00 01
10 01 01
11 01 10
We have then
Only the lower bit is needed from i div 2 - and one do not has to worry
about borrows from neighboring duos.
We need to clear the upper bits
Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to
remove the upper bits
74 / 127
Images/cinvestav-
We have then
Something Notable
i i div 2 popcnt(i)
00 00 00
01 00 01
10 01 01
11 01 10
We have then
Only the lower bit is needed from i div 2 - and one do not has to worry
about borrows from neighboring duos.
We need to clear the upper bits
Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to
remove the upper bits
74 / 127
Images/cinvestav-
We have then
Something Notable
i i div 2 popcnt(i)
00 00 00
01 00 01
10 01 01
11 01 10
We have then
Only the lower bit is needed from i div 2 - and one do not has to worry
about borrows from neighboring duos.
We need to clear the upper bits
Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to
remove the upper bits
74 / 127
Images/cinvestav-
Thus, we have that
Clearing Bits
SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend
to perform a 32-bit subtraction of all 16 duos:
x = x - ((x >> 1) & 0x55555555);
Note
The popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-
Thus, we have that
Clearing Bits
SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend
to perform a 32-bit subtraction of all 16 duos:
x = x - ((x >> 1) & 0x55555555);
Note
The popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-
Thus, we have that
Clearing Bits
SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend
to perform a 32-bit subtraction of all 16 duos:
x = x - ((x >> 1) & 0x55555555);
Note
The popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-
Counting Nibble-Bits
We have
The next step is to add the duo-counts to populations of four neighboring
bits, the 8 nibble-counts, which may range from zero to four
We can do this using 3 in binary
0x3=0011
76 / 127
Images/cinvestav-
Counting Nibble-Bits
We have
The next step is to add the duo-counts to populations of four neighboring
bits, the 8 nibble-counts, which may range from zero to four
We can do this using 3 in binary
0x3=0011
76 / 127
Images/cinvestav-
How?
How many ones you can have in 4 bits
0 to 22
Thus, we only need 4 bits to store the result
We create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have in
two bits?
From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instruction
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
77 / 127
Images/cinvestav-
How?
How many ones you can have in 4 bits
0 to 22
Thus, we only need 4 bits to store the result
We create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have in
two bits?
From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instruction
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
77 / 127
Images/cinvestav-
How?
How many ones you can have in 4 bits
0 to 22
Thus, we only need 4 bits to store the result
We create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have in
two bits?
From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instruction
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
77 / 127
Images/cinvestav-
How?
How many ones you can have in 4 bits
0 to 22
Thus, we only need 4 bits to store the result
We create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have in
two bits?
From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instruction
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
77 / 127
Images/cinvestav-
How?
How many ones you can have in 4 bits
0 to 22
Thus, we only need 4 bits to store the result
We create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have in
two bits?
From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instruction
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
77 / 127
Images/cinvestav-
Example
We have
Old i new i 3 new i&3
0000 0000 0011 0000
0001 0001 0011 0001
0010 0001 0011 0001
0011 0010 0011 0010
0100 0100 0011 0000
0101 0101 0011 0001
1111 1010 0011 0010
new i>>2 3 new i>>2&3
0000 0011 0000
0000 0011 0000
0000 0011 0000
0000 0011 0000
0001 0011 0001
0001 0011 0001
0010 0011 0010
Final i
0000
0001
0001
0010
0001
0010
0100
78 / 127
Images/cinvestav-
We have the following
Now what?
You already got the idea? Now it is about to get the byte-populations
from two nibble-populations (8 bits)
Do you remember your hexadecimal notation
0x0f = 00001111
How many ones do you have in 8 positions
From 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-
We have the following
Now what?
You already got the idea? Now it is about to get the byte-populations
from two nibble-populations (8 bits)
Do you remember your hexadecimal notation
0x0f = 00001111
How many ones do you have in 8 positions
From 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-
We have the following
Now what?
You already got the idea? Now it is about to get the byte-populations
from two nibble-populations (8 bits)
Do you remember your hexadecimal notation
0x0f = 00001111
How many ones do you have in 8 positions
From 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-
Thus the next step
It will get the counter for the number of ones in 8 bits
(i + (i >> 4)) & 0x0f0f0f0f0f0f0f0f;
80 / 127
Images/cinvestav-
Example
We have
Old x 2 bits 4 bits
11110000 10100000 01000000
11111111 10101010 01000100
Now we get
1 01000000 −→ (i >> 4) → 00000100 −→ i+(i >> 4)
→01000100&00001111−→ 00000100
2 01000100 −→ (i >> 4) → 00000100 −→ i+(i >> 4)
→01001000&00001111−→ 00001000
81 / 127
Images/cinvestav-
Example
We have
Old x 2 bits 4 bits
11110000 10100000 01000000
11111111 10101010 01000100
Now we get
1 01000000 −→ (i >> 4) → 00000100 −→ i+(i >> 4)
→01000100&00001111−→ 00000100
2 01000100 −→ (i >> 4) → 00000100 −→ i+(i >> 4)
→01001000&00001111−→ 00001000
81 / 127
Images/cinvestav-
We can keep doing this
It is possible to keep going
i = (i + (i >> 8)) & 0x00ff00ff
Then
i = (i + (i >> 16)) & 0x000000ff
82 / 127
Images/cinvestav-
We can keep doing this
It is possible to keep going
i = (i + (i >> 8)) & 0x00ff00ff
Then
i = (i + (i >> 16)) & 0x000000ff
82 / 127
Images/cinvestav-
With today fast 64 bit multiplications
Something Notable
one can multiply the vector of 8-byte-counts with 0x0101010101010101 to
get the final result in the most significant byte,
For this the code
(((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101)>>24
83 / 127
Images/cinvestav-
With today fast 64 bit multiplications
Something Notable
one can multiply the vector of 8-byte-counts with 0x0101010101010101 to
get the final result in the most significant byte,
For this the code
(((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101)>>24
83 / 127
Images/cinvestav-
Divide and Conquer
This is the natural way we do many things
We always attack smaller versions first of the large one!!!
84 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Overflow Handling
Overflow
An overflow occurs when the home bucket for a new pair (key, element) is
full.
One strategy to handle overflow, small universe of keys
Search the hash table in some systematic fashion for a bucket that is not
full.
Linear probing (linear open addressing).
Quadratic probing.
Random probing.
The other strategy, a large universe of keys
Eliminate overflows by permitting each bucket to keep a list of all pairs for
which it is the home bucket.
Array linear list.
Chain.
85 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
86 / 127
Images/cinvestav-
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-
Open addressing
Definition
All the elements occupy the hash table itself.
What is it?
We systematically examine table slots until either we find the desired
element or we have ascertained that the element is not in the table.
Advantages
The advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-
Insert in Open addressing
Extended hash function to probe
Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search
time
Extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1)
A permutation of 0, 1, 2, ..., m − 1
88 / 127
Images/cinvestav-
Insert in Open addressing
Extended hash function to probe
Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search
time
Extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1)
A permutation of 0, 1, 2, ..., m − 1
88 / 127
Images/cinvestav-
Insert in Open addressing
Extended hash function to probe
Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search
time
Extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1)
A permutation of 0, 1, 2, ..., m − 1
88 / 127
Images/cinvestav-
Insert in Open addressing
Extended hash function to probe
Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search
time
Extend the hash function to
h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1}
This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1)
A permutation of 0, 1, 2, ..., m − 1
88 / 127
Images/cinvestav-
Hashing methods in Open Addressing
HASH-INSERT(T, k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == NIL
5 T [j] = k
6 return j
7 else i = i + 1
8 until i == m
9 error “Hash Table Overflow”
89 / 127
Images/cinvestav-
Hashing methods in Open Addressing
HASH-SEARCH(T,k)
1 i = 0
2 repeat
3 j = h (k, i)
4 if T [j] == k
5 return j
6 i = i + 1
7 until T [j] == NIL or i == m
8 return NIL
90 / 127
Images/cinvestav-
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : 0, 1, ..., m − 1 → U for
i = 0, 1, ..., m − 1, we get the extended hash function
h(k, i) = h (k) + i mod m, (2)
Sequence of probes
Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until
T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are
m distinct probe sequences.
91 / 127
Images/cinvestav-
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : 0, 1, ..., m − 1 → U for
i = 0, 1, ..., m − 1, we get the extended hash function
h(k, i) = h (k) + i mod m, (2)
Sequence of probes
Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until
T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are
m distinct probe sequences.
91 / 127
Images/cinvestav-
Linear probing: Definition and properties
Hash function
Given an ordinary hash function h : 0, 1, ..., m − 1 → U for
i = 0, 1, ..., m − 1, we get the extended hash function
h(k, i) = h (k) + i mod m, (2)
Sequence of probes
Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until
T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1].
Distinct probes
Because the initial probe determines the entire probe sequence, there are
m distinct probe sequences.
91 / 127
Images/cinvestav-
Linear probing: Definition and properties
Disadvantages
Linear probing suffers of primary clustering.
Long runs of occupied slots build up increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability i+1
m .
Long runs of occupied slots tend to get longer, and the average
search time increases.
92 / 127
Images/cinvestav-
Linear probing: Definition and properties
Disadvantages
Linear probing suffers of primary clustering.
Long runs of occupied slots build up increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability i+1
m .
Long runs of occupied slots tend to get longer, and the average
search time increases.
92 / 127
Images/cinvestav-
Linear probing: Definition and properties
Disadvantages
Linear probing suffers of primary clustering.
Long runs of occupied slots build up increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability i+1
m .
Long runs of occupied slots tend to get longer, and the average
search time increases.
92 / 127
Images/cinvestav-
Linear probing: Definition and properties
Disadvantages
Linear probing suffers of primary clustering.
Long runs of occupied slots build up increasing the average search
time.
Clusters arise because an empty slot preceded by i full slots gets filled
next with probability i+1
m .
Long runs of occupied slots tend to get longer, and the average
search time increases.
92 / 127
Images/cinvestav-
Example: Linear Probing – Get And Put
Constraints
divisor = m (number of buckets) = 17.
Home bucket = key % 17.
Then
Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-
Example: Linear Probing – Get And Put
Constraints
divisor = m (number of buckets) = 17.
Home bucket = key % 17.
Then
Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-
Example: Linear Probing – Get And Put
Constraints
divisor = m (number of buckets) = 17.
Home bucket = key % 17.
Then
Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-
Linear Probing – Remove
Example
remove(0)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-
Linear Probing – Remove
Example
remove(0)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-
Linear Probing – Remove
Example
remove(0)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-
Linear Probing – remove(34)
Example
remove(34)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-
Linear Probing – remove(34)
Example
remove(34)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-
Linear Probing – remove(34)
Example
remove(34)
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-
Linear Probing – remove(29)
Example
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
96 / 127
Images/cinvestav-
Linear Probing – remove(29)
Example
Compact Cluster
Search cluster for pair (if any) to fill vacated bucket.
96 / 127
Images/cinvestav-
Code for Removing
We have the following
p u b l i c void remove ( key ){
i n t positionsChecked = 1;
i n t i = FindSlot ( Key ) ;
i f ( Table [ i ] == n u l l )
r e t u r n ; // key i s not i n the t a b l e
j = i ;
wh il e ( positionsChecked <= Table . len gth ){
j = ( j +1) % Table . leng th ;
i f ( Table [ j ] == n u l l ) break ;
k = Hashing ( Table [ j ] . key ) ;
i f ( i < j && ( k <= i | | k > j )) | |
( j < i && ( k <= i && k > j )) {
Table [ i ] = Table [ j ] ;
i = j ;
}
positionChecked++;
}
Table [ i ] = n u l l ;
}
97 / 127
Images/cinvestav-
Explanation
First
For all records in a cluster, there must be no vacant slots between their
natural hash position and their current position (else lookups will
terminate before finding the record).
Second
k is the raw hash where the record at j would naturally land in the
hash table if there were no collisions.
Thus
This test is asking if the record at j is invalidly positioned with respect to
the required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-
Explanation
First
For all records in a cluster, there must be no vacant slots between their
natural hash position and their current position (else lookups will
terminate before finding the record).
Second
k is the raw hash where the record at j would naturally land in the
hash table if there were no collisions.
Thus
This test is asking if the record at j is invalidly positioned with respect to
the required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-
Explanation
First
For all records in a cluster, there must be no vacant slots between their
natural hash position and their current position (else lookups will
terminate before finding the record).
Second
k is the raw hash where the record at j would naturally land in the
hash table if there were no collisions.
Thus
This test is asking if the record at j is invalidly positioned with respect to
the required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-
Case 1
We have the following
Case 1
i j
k k
we have i < j
If i < k ≤ j then moving j to the i position will be incorrect... Why?
99 / 127
Images/cinvestav-
Case 2
We have the following
Case 2
ij
k
We have j < i
If k ≤ j < or i < k then moving j to the i position will be incorrect...
Why?
100 / 127
Images/cinvestav-
Complexity
We have then
Worst-case get/put/remove time is Θ(n), where n is the number of pairs
in the table.
Something Notable
This happens when all pairs are in the same cluster.
101 / 127
Images/cinvestav-
Complexity
We have then
Worst-case get/put/remove time is Θ(n), where n is the number of pairs
in the table.
Something Notable
This happens when all pairs are in the same cluster.
101 / 127
Images/cinvestav-
Explaining α
We have that
α= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open Addressing
Sn = expected number of buckets examined in a successful search
when n is large.
Un= expected number of buckets examined in a unsuccessful search
when n is large
102 / 127
Images/cinvestav-
Explaining α
We have that
α= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open Addressing
Sn = expected number of buckets examined in a successful search
when n is large.
Un= expected number of buckets examined in a unsuccessful search
when n is large
102 / 127
Images/cinvestav-
Explaining α
We have that
α= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open Addressing
Sn = expected number of buckets examined in a successful search
when n is large.
Un= expected number of buckets examined in a unsuccessful search
when n is large
102 / 127
Images/cinvestav-
Explaining α
We have that
α= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open Addressing
Sn = expected number of buckets examined in a successful search
when n is large.
Un= expected number of buckets examined in a unsuccessful search
when n is large
102 / 127
Images/cinvestav-
Expected Performance in Open Addressing, 0 ≤ α ≤ 1
We have for unsuccessful search
Un = O 1 +
1
1 − α
(3)
We have for successful search
Sn = O 1 +
1
α
ln
1
1 − α
(4)
103 / 127
Images/cinvestav-
Expected Performance in Open Addressing, 0 ≤ α ≤ 1
We have for unsuccessful search
Un = O 1 +
1
1 − α
(3)
We have for successful search
Sn = O 1 +
1
α
ln
1
1 − α
(4)
103 / 127
Images/cinvestav-
Thus, for different α s
We have using uniform keys!!!
α Sn Un
0.50 1.5 2.5
0.75 2.5 8.5
0.90 5.5 50.5
Recommendation
α ≤ 0.75 is recommended.
104 / 127
Images/cinvestav-
Thus, for different α s
We have using uniform keys!!!
α Sn Un
0.50 1.5 2.5
0.75 2.5 8.5
0.90 5.5 50.5
Recommendation
α ≤ 0.75 is recommended.
104 / 127
Images/cinvestav-
Hash Table Design
For example
If performance requirements are given, determine maximum permissible
loading density.
We want a successful search to make no more than 5 compares
(expected)
4 = 1 + 1
1−α‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares
(expected).
6 = 1 + 1
α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-
Hash Table Design
For example
If performance requirements are given, determine maximum permissible
loading density.
We want a successful search to make no more than 5 compares
(expected)
4 = 1 + 1
1−α‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares
(expected).
6 = 1 + 1
α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-
Hash Table Design
For example
If performance requirements are given, determine maximum permissible
loading density.
We want a successful search to make no more than 5 compares
(expected)
4 = 1 + 1
1−α‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares
(expected).
6 = 1 + 1
α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-
Use the minimum α for your trigger of doubling the array!!!
Thus, we have
αf ≤ min {3/4, 0.964}
106 / 127
Images/cinvestav-
Hash Table Design
Dynamic resizing of table.
Whenever loading density exceeds threshold, rehash into a table of
approximately twice the current size.
107 / 127
Images/cinvestav-
But , even with a nice division method!!!
Example using keys uniformly distributed
It was generated using the division method
Then
108 / 127
Images/cinvestav-
But , even with a nice division method!!!
Example using keys uniformly distributed
It was generated using the division method
Then
108 / 127
Images/cinvestav-
Example
Example using Gaussian keys
It was generated using the division method
Then
109 / 127
Images/cinvestav-
Example
Example using Gaussian keys
It was generated using the division method
Then
109 / 127
Images/cinvestav-
Outline
1 Introduction
2 Operation in a ADT dictionary
Add
Remove
GetValue
Contains
Iterators
3 Example
4 Implementation
5 Hash Tables
Number of Keys
Hash Functions
Hashing By Division
6 Overflow Handling
Open Addressing
Chaining
110 / 127
Images/cinvestav-
Linear List Of Synonyms
Thus
Each bucket keeps a linear list of all pairs for which it is the home
bucket.
The linear list may or may not be sorted by key.
The linear list may be an array linear list or a chain.
111 / 127
Images/cinvestav-
Collision Handling: Chaining
A Possible Solution
Insert the elements that hash to the same slot into a linked list.
U
(Universe of Keys)
112 / 127
Images/cinvestav-
Example Sorted Chains
Add to a hash table with m = 11
Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8
So, we have
Home bucket = key % 11.
113 / 127
Images/cinvestav-
Example Sorted Chains
Add to a hash table with m = 11
Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8
So, we have
Home bucket = key % 11.
113 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16, 3, 8
114 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16, 3
8
115 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16
8
3
116 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5
8
3
16
117 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5
8
3
16
118 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28
8
3
5 16
119 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23
8
3
5 16
28
120 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12
8
3
5 16
28
23
121 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17
8
3
5 16
28
12 23
122 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6
8
3
5 16
17
12 23
28
123 / 127
Images/cinvestav-
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
8
3
5 16
17
12 23
286
124 / 127
Images/cinvestav-
Do You Remember This?
Universal Hashing Vs Division Method
125 / 127
Images/cinvestav-
Expected Complexity of Hash Table under Chaining
We have for unsuccessful search
Un = O (1 + α) (5)
We have for successful search
Sn = O (1 + α) (6)
126 / 127
Images/cinvestav-
Expected Complexity of Hash Table under Chaining
We have for unsuccessful search
Un = O (1 + α) (5)
We have for successful search
Sn = O (1 + α) (6)
126 / 127
Images/cinvestav-
Now, Back to the real world
java.util.Hashtable
It uses unsorted chains.
It uses a default initial m = divisor = 101
It uses a default α ≤ 0.75
When loading density exceeds a max permissible threshold, It rehash
with new m = 2m+1.
127 / 127
Images/cinvestav-
Now, Back to the real world
java.util.Hashtable
It uses unsorted chains.
It uses a default initial m = divisor = 101
It uses a default α ≤ 0.75
When loading density exceeds a max permissible threshold, It rehash
with new m = 2m+1.
127 / 127
Images/cinvestav-
Now, Back to the real world
java.util.Hashtable
It uses unsorted chains.
It uses a default initial m = divisor = 101
It uses a default α ≤ 0.75
When loading density exceeds a max permissible threshold, It rehash
with new m = 2m+1.
127 / 127
Images/cinvestav-
Now, Back to the real world
java.util.Hashtable
It uses unsorted chains.
It uses a default initial m = divisor = 101
It uses a default α ≤ 0.75
When loading density exceeds a max permissible threshold, It rehash
with new m = 2m+1.
127 / 127

More Related Content

What's hot

Java Generics
Java GenericsJava Generics
Java Genericsjeslie
 
Python Traning presentation
Python Traning presentationPython Traning presentation
Python Traning presentationNimrita Koul
 
Matlab and Python: Basic Operations
Matlab and Python: Basic OperationsMatlab and Python: Basic Operations
Matlab and Python: Basic OperationsWai Nwe Tun
 
Effective Java - Generics
Effective Java - GenericsEffective Java - Generics
Effective Java - GenericsRoshan Deniyage
 
Introduction to Python Language and Data Types
Introduction to Python Language and Data TypesIntroduction to Python Language and Data Types
Introduction to Python Language and Data TypesRavi Shankar
 
Introduction to Python - Part Three
Introduction to Python - Part ThreeIntroduction to Python - Part Three
Introduction to Python - Part Threeamiable_indian
 
Python quickstart for programmers: Python Kung Fu
Python quickstart for programmers: Python Kung FuPython quickstart for programmers: Python Kung Fu
Python quickstart for programmers: Python Kung Fuclimatewarrior
 
Introduction to Python - Training for Kids
Introduction to Python - Training for KidsIntroduction to Python - Training for Kids
Introduction to Python - Training for KidsAimee Maree Forsstrom
 
Python language data types
Python language data typesPython language data types
Python language data typesHoang Nguyen
 
Advanced Python, Part 1
Advanced Python, Part 1Advanced Python, Part 1
Advanced Python, Part 1Zaar Hai
 
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...La Cuisine du Web
 
Dotnet programming concepts difference faqs- 2
Dotnet programming concepts difference faqs- 2Dotnet programming concepts difference faqs- 2
Dotnet programming concepts difference faqs- 2Umar Ali
 
Java chapter 6 - Arrays -syntax and use
Java chapter 6 - Arrays -syntax and useJava chapter 6 - Arrays -syntax and use
Java chapter 6 - Arrays -syntax and useMukesh Tekwani
 
Programming with Python - Week 2
Programming with Python - Week 2Programming with Python - Week 2
Programming with Python - Week 2Ahmet Bulut
 

What's hot (19)

Java Generics
Java GenericsJava Generics
Java Generics
 
Python Traning presentation
Python Traning presentationPython Traning presentation
Python Traning presentation
 
Java generics
Java genericsJava generics
Java generics
 
Matlab and Python: Basic Operations
Matlab and Python: Basic OperationsMatlab and Python: Basic Operations
Matlab and Python: Basic Operations
 
L04 Software Design 2
L04 Software Design 2L04 Software Design 2
L04 Software Design 2
 
Effective Java - Generics
Effective Java - GenericsEffective Java - Generics
Effective Java - Generics
 
7 algorithm
7 algorithm7 algorithm
7 algorithm
 
Introduction to Python Language and Data Types
Introduction to Python Language and Data TypesIntroduction to Python Language and Data Types
Introduction to Python Language and Data Types
 
Introduction to Python - Part Three
Introduction to Python - Part ThreeIntroduction to Python - Part Three
Introduction to Python - Part Three
 
Python quickstart for programmers: Python Kung Fu
Python quickstart for programmers: Python Kung FuPython quickstart for programmers: Python Kung Fu
Python quickstart for programmers: Python Kung Fu
 
Introduction to Python - Training for Kids
Introduction to Python - Training for KidsIntroduction to Python - Training for Kids
Introduction to Python - Training for Kids
 
Advanced Python : Decorators
Advanced Python : DecoratorsAdvanced Python : Decorators
Advanced Python : Decorators
 
Python language data types
Python language data typesPython language data types
Python language data types
 
Advanced Python, Part 1
Advanced Python, Part 1Advanced Python, Part 1
Advanced Python, Part 1
 
Java Generics - by Example
Java Generics - by ExampleJava Generics - by Example
Java Generics - by Example
 
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...
Apprentissage statistique et analyse prédictive en Python avec scikit-learn p...
 
Dotnet programming concepts difference faqs- 2
Dotnet programming concepts difference faqs- 2Dotnet programming concepts difference faqs- 2
Dotnet programming concepts difference faqs- 2
 
Java chapter 6 - Arrays -syntax and use
Java chapter 6 - Arrays -syntax and useJava chapter 6 - Arrays -syntax and use
Java chapter 6 - Arrays -syntax and use
 
Programming with Python - Week 2
Programming with Python - Week 2Programming with Python - Week 2
Programming with Python - Week 2
 

Viewers also liked

12 Machine Learning Supervised Hidden Markov Chains
12 Machine Learning  Supervised Hidden Markov Chains12 Machine Learning  Supervised Hidden Markov Chains
12 Machine Learning Supervised Hidden Markov ChainsAndres Mendez-Vazquez
 
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...Sri Ambati
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsSri Ambati
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Beat Signer
 
Hidden Markov Models
Hidden Markov ModelsHidden Markov Models
Hidden Markov ModelsVu Pham
 

Viewers also liked (6)

12 Machine Learning Supervised Hidden Markov Chains
12 Machine Learning  Supervised Hidden Markov Chains12 Machine Learning  Supervised Hidden Markov Chains
12 Machine Learning Supervised Hidden Markov Chains
 
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...
Smart Applications with Machine Learning by H2O - Alex Tellez of RHalf speaks...
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning Applications
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
 
Hidden Markov Models
Hidden Markov ModelsHidden Markov Models
Hidden Markov Models
 
SlideShare 101
SlideShare 101SlideShare 101
SlideShare 101
 

Similar to Preparation Data Structures 09 hash tables

Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In ElasticsearchKnoldus Inc.
 
Python Dictionaries and Sets
Python Dictionaries and SetsPython Dictionaries and Sets
Python Dictionaries and SetsNicole Ryan
 
Introduction to Modern Perl
Introduction to Modern PerlIntroduction to Modern Perl
Introduction to Modern PerlDave Cross
 
Rspec Tips
Rspec TipsRspec Tips
Rspec Tipslionpeal
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...semanticsconference
 
Tdd is Dead, Long Live TDD
Tdd is Dead, Long Live TDDTdd is Dead, Long Live TDD
Tdd is Dead, Long Live TDDJonathan Acker
 
Solve the following problem in Python and fill the two funct.pdf
Solve the following problem in Python and fill the two funct.pdfSolve the following problem in Python and fill the two funct.pdf
Solve the following problem in Python and fill the two funct.pdfkshitiz77
 
07 darwino rest services
07   darwino rest services07   darwino rest services
07 darwino rest servicesdarwinodb
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_uploadProf. Wim Van Criekinge
 
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...Dirk Cludts
 
Web-IT Support and Consulting - dBase exports
Web-IT Support and Consulting - dBase exportsWeb-IT Support and Consulting - dBase exports
Web-IT Support and Consulting - dBase exportsDirk Cludts
 
Generic Programming
Generic ProgrammingGeneric Programming
Generic ProgrammingPingLun Liao
 

Similar to Preparation Data Structures 09 hash tables (20)

14 Skip Lists
14 Skip Lists14 Skip Lists
14 Skip Lists
 
Introduction to Perl and BioPerl
Introduction to Perl and BioPerlIntroduction to Perl and BioPerl
Introduction to Perl and BioPerl
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In Elasticsearch
 
Python Dictionaries and Sets
Python Dictionaries and SetsPython Dictionaries and Sets
Python Dictionaries and Sets
 
Introduction to Modern Perl
Introduction to Modern PerlIntroduction to Modern Perl
Introduction to Modern Perl
 
Rspec Tips
Rspec TipsRspec Tips
Rspec Tips
 
Bioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperlBioinformatica 10-11-2011-p6-bioperl
Bioinformatica 10-11-2011-p6-bioperl
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...
 
Tdd is Dead, Long Live TDD
Tdd is Dead, Long Live TDDTdd is Dead, Long Live TDD
Tdd is Dead, Long Live TDD
 
Python Lecture 10
Python Lecture 10Python Lecture 10
Python Lecture 10
 
Bioinformatics p5-bioperlv2014
Bioinformatics p5-bioperlv2014Bioinformatics p5-bioperlv2014
Bioinformatics p5-bioperlv2014
 
.net F# mutable dictionay
.net F# mutable dictionay.net F# mutable dictionay
.net F# mutable dictionay
 
Solve the following problem in Python and fill the two funct.pdf
Solve the following problem in Python and fill the two funct.pdfSolve the following problem in Python and fill the two funct.pdf
Solve the following problem in Python and fill the two funct.pdf
 
07 darwino rest services
07   darwino rest services07   darwino rest services
07 darwino rest services
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
Twig tips and tricks
Twig tips and tricksTwig tips and tricks
Twig tips and tricks
 
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...
Web-IT Support and Consulting - bulk dBase (DBF) exports via Microsoft Excel ...
 
Web-IT Support and Consulting - dBase exports
Web-IT Support and Consulting - dBase exportsWeb-IT Support and Consulting - dBase exports
Web-IT Support and Consulting - dBase exports
 
groovy & grails - lecture 2
groovy & grails - lecture 2groovy & grails - lecture 2
groovy & grails - lecture 2
 
Generic Programming
Generic ProgrammingGeneric Programming
Generic Programming
 

More from Andres Mendez-Vazquez

01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectorsAndres Mendez-Vazquez
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issuesAndres Mendez-Vazquez
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiationAndres Mendez-Vazquez
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep LearningAndres Mendez-Vazquez
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusAndres Mendez-Vazquez
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusAndres Mendez-Vazquez
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesAndres Mendez-Vazquez
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variationsAndres Mendez-Vazquez
 

More from Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning01 Introduction to Neural Networks and Deep Learning
01 Introduction to Neural Networks and Deep Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 

Recently uploaded

Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

Preparation Data Structures 09 hash tables

  • 2. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 2 / 127
  • 3. Images/cinvestav- Dictionaries Definition The ADT dictionary—also called a map, table, or associative array—contains entries that each have two parts: A keyword—usually called a search key—such as an English word or a person’s name A value—such as a definition, an address, or a telephone number—associated with that key 3 / 127
  • 4. Images/cinvestav- Dictionaries Definition The ADT dictionary—also called a map, table, or associative array—contains entries that each have two parts: A keyword—usually called a search key—such as an English word or a person’s name A value—such as a definition, an address, or a telephone number—associated with that key 3 / 127
  • 5. Images/cinvestav- Dictionaries Definition The ADT dictionary—also called a map, table, or associative array—contains entries that each have two parts: A keyword—usually called a search key—such as an English word or a person’s name A value—such as a definition, an address, or a telephone number—associated with that key 3 / 127
  • 6. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 7. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 8. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 9. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 10. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 11. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 12. Images/cinvestav- Example Dictionary with Duplicates Pairs are of the form (word, meaning). More than a single meaning (bolt, a threaded pin) (bolt, a crash of thunder) (bolt, to shoot forth suddenly) (bolt, a gulp) (bolt, a standard roll of cloth) etc. 4 / 127
  • 13. Images/cinvestav- Thus We have possibly in a dictionary Sorted keys Duplicate keys 5 / 127
  • 14. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 15. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 16. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 17. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 18. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 19. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 20. Images/cinvestav- Operation in the ADT dictionary Common operations with most databases insert adds a new entry to the dictionary, given a search key and associated value. delete removes an entry, given its associated search key retrieve retrieves the value associated with a given search key search sees whether the dictionary contains a given search key traverse It traverse all the search keys in the dictionary It traverse all the values in the dictionary 6 / 127
  • 21. Images/cinvestav- In addition We have the following extra operations Detect whether a dictionary is empty Get the number of entries in the dictionary Remove all entries from the dictionary 7 / 127
  • 22. Images/cinvestav- In addition We have the following extra operations Detect whether a dictionary is empty Get the number of entries in the dictionary Remove all entries from the dictionary 7 / 127
  • 23. Images/cinvestav- In addition We have the following extra operations Detect whether a dictionary is empty Get the number of entries in the dictionary Remove all entries from the dictionary 7 / 127
  • 24. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 8 / 127
  • 25. Images/cinvestav- Specifications: Add Pseudocode add(key, value) Task It adds the pair (key , value) to the dictionary. Input and Output Input: key is an object search key, value is an associated object. Output: None. 9 / 127
  • 26. Images/cinvestav- Specifications: Add Pseudocode add(key, value) Task It adds the pair (key , value) to the dictionary. Input and Output Input: key is an object search key, value is an associated object. Output: None. 9 / 127
  • 27. Images/cinvestav- Specifications: Add Pseudocode add(key, value) Task It adds the pair (key , value) to the dictionary. Input and Output Input: key is an object search key, value is an associated object. Output: None. 9 / 127
  • 29. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 11 / 127
  • 30. Images/cinvestav- Specifications: Remove Pseudocode remove(key) Task It removes from the dictionary the entry that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value that was associated with the search key or null if no such object exists. 12 / 127
  • 31. Images/cinvestav- Specifications: Remove Pseudocode remove(key) Task It removes from the dictionary the entry that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value that was associated with the search key or null if no such object exists. 12 / 127
  • 32. Images/cinvestav- Specifications: Remove Pseudocode remove(key) Task It removes from the dictionary the entry that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value that was associated with the search key or null if no such object exists. 12 / 127
  • 34. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 14 / 127
  • 35. Images/cinvestav- Specifications: GetValue Pseudocode getValue(key) Task It retrieves from the dictionary the value that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value associated with the search key or null if no such object exists. 15 / 127
  • 36. Images/cinvestav- Specifications: GetValue Pseudocode getValue(key) Task It retrieves from the dictionary the value that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value associated with the search key or null if no such object exists. 15 / 127
  • 37. Images/cinvestav- Specifications: GetValue Pseudocode getValue(key) Task It retrieves from the dictionary the value that corresponds to a given search key. Input and Output Input: key is an object search key. Output: Returns either the value associated with the search key or null if no such object exists. 15 / 127
  • 38. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 16 / 127
  • 39. Images/cinvestav- Specifications: Contains Pseudocode contains(key) Task It sees whether any entry in the dictionary has a given search key. Input and Output Input: key is an object search key. Output: Returns true if an entry in the dictionary has key as its search key. 17 / 127
  • 40. Images/cinvestav- Specifications: Contains Pseudocode contains(key) Task It sees whether any entry in the dictionary has a given search key. Input and Output Input: key is an object search key. Output: Returns true if an entry in the dictionary has key as its search key. 17 / 127
  • 41. Images/cinvestav- Specifications: Contains Pseudocode contains(key) Task It sees whether any entry in the dictionary has a given search key. Input and Output Input: key is an object search key. Output: Returns true if an entry in the dictionary has key as its search key. 17 / 127
  • 42. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 18 / 127
  • 43. Images/cinvestav- Specifications: GetKeyIterator Pseudocode getKeyIterator() Task It creates an iterator that traverses all search keys in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the search keys in the dictionary. 19 / 127
  • 44. Images/cinvestav- Specifications: GetKeyIterator Pseudocode getKeyIterator() Task It creates an iterator that traverses all search keys in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the search keys in the dictionary. 19 / 127
  • 45. Images/cinvestav- Specifications: GetKeyIterator Pseudocode getKeyIterator() Task It creates an iterator that traverses all search keys in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the search keys in the dictionary. 19 / 127
  • 46. Images/cinvestav- Specifications: GetValueIterator Pseudocode getValueIterator() Task It creates an iterator that traverses all values in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the values in the dictionary. 20 / 127
  • 47. Images/cinvestav- Specifications: GetValueIterator Pseudocode getValueIterator() Task It creates an iterator that traverses all values in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the values in the dictionary. 20 / 127
  • 48. Images/cinvestav- Specifications: GetValueIterator Pseudocode getValueIterator() Task It creates an iterator that traverses all values in the dictionary. Input and Output Input: None. Output: Returns an iterator that provides sequential access to the values in the dictionary. 20 / 127
  • 49. Images/cinvestav- Other Operations isEmpty() It sees whether the dictionary is empty. getSize() It gets the size of the dictionary. clear() It removes all entries from the dictionary. 21 / 127
  • 50. Images/cinvestav- Other Operations isEmpty() It sees whether the dictionary is empty. getSize() It gets the size of the dictionary. clear() It removes all entries from the dictionary. 21 / 127
  • 51. Images/cinvestav- Other Operations isEmpty() It sees whether the dictionary is empty. getSize() It gets the size of the dictionary. clear() It removes all entries from the dictionary. 21 / 127
  • 52. Images/cinvestav- However, we have two scenarios Distinct search keys Case 1 You can refuse to add another key-value. Case 2 You can change the existing value associated with key to the new value. Then, you return the old value Duplicate search keys if the method add adds every given key-value entry to a dictionary The methods remove and getValue must deal with multiple entries that have the same search key. What do you remove or return!!! 22 / 127
  • 53. Images/cinvestav- However, we have two scenarios Distinct search keys Case 1 You can refuse to add another key-value. Case 2 You can change the existing value associated with key to the new value. Then, you return the old value Duplicate search keys if the method add adds every given key-value entry to a dictionary The methods remove and getValue must deal with multiple entries that have the same search key. What do you remove or return!!! 22 / 127
  • 54. Images/cinvestav- However, we have two scenarios Distinct search keys Case 1 You can refuse to add another key-value. Case 2 You can change the existing value associated with key to the new value. Then, you return the old value Duplicate search keys if the method add adds every given key-value entry to a dictionary The methods remove and getValue must deal with multiple entries that have the same search key. What do you remove or return!!! 22 / 127
  • 55. Images/cinvestav- However, we have two scenarios Distinct search keys Case 1 You can refuse to add another key-value. Case 2 You can change the existing value associated with key to the new value. Then, you return the old value Duplicate search keys if the method add adds every given key-value entry to a dictionary The methods remove and getValue must deal with multiple entries that have the same search key. What do you remove or return!!! 22 / 127
  • 56. Images/cinvestav- However, we have two scenarios Distinct search keys Case 1 You can refuse to add another key-value. Case 2 You can change the existing value associated with key to the new value. Then, you return the old value Duplicate search keys if the method add adds every given key-value entry to a dictionary The methods remove and getValue must deal with multiple entries that have the same search key. What do you remove or return!!! 22 / 127
  • 57. Images/cinvestav- Interface We have the following interface import java . u t i l . I t e r a t o r ; p u b l i c i n t e r f a c e D i c t i o n a r y I n t e r f a c e <Key , Value> { p u b l i c Value add ( Key k , Value Item ) ; p u b l i c Value remove ( Key k ) ; p u b l i c Value getValue ( Key k ) ; p u b l i c boolean c o n t a i n s ( Key k ) ; p u b l i c I t e r a t o r <Key> g e t K e y I t e r a t o r ( ) ; p u b l i c I t e r a t o r <Value> g e t V a l u e I t e r a t o r ( ) ; p u b l i c boolean isEmpty ( ) ; p u b l i c i n t g e t S i z e ( ) ; p u b l i c void c l e a r ( ) ; } 23 / 127
  • 58. Images/cinvestav- Where we can use this ADT? In the phone directory problem It is a directory that uses a name as the key and adds and returns a phone number For example Name Number Suzanne Nouveaux 401-555-1234 Andres Mendez-Vazquez 301-123-2345 ... ... 24 / 127
  • 59. Images/cinvestav- Where we can use this ADT? In the phone directory problem It is a directory that uses a name as the key and adds and returns a phone number For example Name Number Suzanne Nouveaux 401-555-1234 Andres Mendez-Vazquez 301-123-2345 ... ... 24 / 127
  • 60. Images/cinvestav- Thus, we have the following diagram A class Diagram TelephoneDirectory readFile(data) getPhoneNumber(name) PhoneDirectory Dictionary Name String 1 1 * * * * 25 / 127
  • 61. Images/cinvestav- Basic Code We have something like this p u b l i c c l a s s TelephoneDirectory { p r i v a t e D i c t i o n a r y I n t e r f a c e <Name , String > phoneBook ; p u b l i c TelephoneDirectory () { phoneBook = new SortedDictionary <Name , String >(); } // end d e f a u l t c o n s t r u c t o r /∗∗ Reads a t e x t f i l e of names and telephone numbers∗∗/ p u b l i c void r e a d F i l e ( Scanner data ) { . . .} /∗∗ Gets the phone number of a given person . ∗/ p u b l i c S t r i n g getPhoneNumber (Name personName ) { . . . } // end getPhoneNumber } // end TelephoneDirectory 26 / 127
  • 62. Images/cinvestav- Now, the Big Question It is a big one How do we implement this data structure? Possible ways Linear List Skip List Hash Tables ... 27 / 127
  • 63. Images/cinvestav- Now, the Big Question It is a big one How do we implement this data structure? Possible ways Linear List Skip List Hash Tables ... 27 / 127
  • 64. Images/cinvestav- Now, the Big Question It is a big one How do we implement this data structure? Possible ways Linear List Skip List Hash Tables ... 27 / 127
  • 65. Images/cinvestav- Now, the Big Question It is a big one How do we implement this data structure? Possible ways Linear List Skip List Hash Tables ... 27 / 127
  • 66. Images/cinvestav- Now, the Big Question It is a big one How do we implement this data structure? Possible ways Linear List Skip List Hash Tables ... 27 / 127
  • 67. Images/cinvestav- First: Represent As A Linear List You have L = (e0, e1, ..., en−1) Where Each ei is a pair (key, element). We can use the following representations Array or linked representation. 28 / 127
  • 68. Images/cinvestav- First: Represent As A Linear List You have L = (e0, e1, ..., en−1) Where Each ei is a pair (key, element). We can use the following representations Array or linked representation. 28 / 127
  • 69. Images/cinvestav- First: Represent As A Linear List You have L = (e0, e1, ..., en−1) Where Each ei is a pair (key, element). We can use the following representations Array or linked representation. 28 / 127
  • 70. Images/cinvestav- Array Representation Our classic array b c d e a We have then Operation in Array Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add at right end remove(theKey) O(size) 29 / 127
  • 71. Images/cinvestav- Array Representation Our classic array b c d e a We have then Operation in Array Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add at right end remove(theKey) O(size) 29 / 127
  • 72. Images/cinvestav- What if we sort the array? Sorted Keys b c d ea We have then Operation in Array Representation Complexity getValue(theKey) O(logsize) add(theKey, theItem) O(logsize) to find duplicate O(size) to add remove(theKey) O(size) 30 / 127
  • 73. Images/cinvestav- What if we sort the array? Sorted Keys b c d ea We have then Operation in Array Representation Complexity getValue(theKey) O(logsize) add(theKey, theItem) O(logsize) to find duplicate O(size) to add remove(theKey) O(size) 30 / 127
  • 74. Images/cinvestav- Unsorted Chain Our Structure b c d e a Complexity Operation in Chain Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add remove(theKey) O(size) 31 / 127
  • 75. Images/cinvestav- Unsorted Chain Our Structure b c d e a Complexity Operation in Chain Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add remove(theKey) O(size) 31 / 127
  • 76. Images/cinvestav- Sorted Chain Our Structure b c d ea Complexity Operation in Chain Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add remove(theKey) O(size) 32 / 127
  • 77. Images/cinvestav- Sorted Chain Our Structure b c d ea Complexity Operation in Chain Representation Complexity getValue(theKey) O(size) add(theKey, theItem) O(size) to find duplicate O(1) to add remove(theKey) O(size) 32 / 127
  • 78. Images/cinvestav- Skip Lists: we will skip it - It is for an advance class of analysis of algorithms We have something like this Complexity Operation Complexity - Worst Case Complexity - Expected getValue(theKey) O(size) O(log size) add(theKey, theItem) O(size) O(logsize) remove(theKey) O(size) O(logsize) 33 / 127
  • 79. Images/cinvestav- Skip Lists: we will skip it - It is for an advance class of analysis of algorithms We have something like this Complexity Operation Complexity - Worst Case Complexity - Expected getValue(theKey) O(size) O(log size) add(theKey, theItem) O(size) O(logsize) remove(theKey) O(size) O(logsize) 33 / 127
  • 80. Images/cinvestav- We will concentrate our efforts in the Hash Tables Definition A hash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Why? Operation in Array Representation Complexity - Worst Case Complexity - Expected getValue(theKey) O(size) O (1 + C) add(theKey, theItem) O(size) O (1 + C) remove(theKey) O(size) O (1 + C) 34 / 127
  • 81. Images/cinvestav- We will concentrate our efforts in the Hash Tables Definition A hash table or hash map T is a data structure, most commonly an array, that uses a hash function to efficiently map certain identifiers of keys (e.g. person names) to associated values. Why? Operation in Array Representation Complexity - Worst Case Complexity - Expected getValue(theKey) O(size) O (1 + C) add(theKey, theItem) O(size) O (1 + C) remove(theKey) O(size) O (1 + C) 34 / 127
  • 82. Images/cinvestav- Then Advantages They have the advantage of having a expected complexity of operations of O(1 + C) Still, be aware of C because this will change depending on which overflow policy you use... 35 / 127
  • 83. Images/cinvestav- Then Advantages They have the advantage of having a expected complexity of operations of O(1 + C) Still, be aware of C because this will change depending on which overflow policy you use... 35 / 127
  • 84. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 36 / 127
  • 85. Images/cinvestav- You have two cases for this data structure First Small universe of keys. Second Large number of keys 37 / 127
  • 86. Images/cinvestav- You have two cases for this data structure First Small universe of keys. Second Large number of keys 37 / 127
  • 87. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 88. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 89. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 90. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 91. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 92. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 93. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 94. Images/cinvestav- When you have a small universe of keys, U We can do the following Key values are direct addresses in the array. Direct implementation or Direct-address tables. Operations 1 Direct-Address-Search(Table, key) return Table[key] 2 Direct-Address-Search(Table,key,value) Table[key]=value 3 Direct-Address-Delete(T, x) Table[key]=null 38 / 127
  • 95. Images/cinvestav- When you have a large universe of keys, U Then Then, it is impractical to store a table of the size of |U|. You can use a especial function for mapping h : U→{0, 1, ..., m − 1} (1) 39 / 127
  • 96. Images/cinvestav- When you have a large universe of keys, U Then Then, it is impractical to store a table of the size of |U|. You can use a especial function for mapping h : U→{0, 1, ..., m − 1} (1) 39 / 127
  • 97. Images/cinvestav- Example Imagine that you have A 1D array (or table) table[0 : m − 1]. Thus h(k)is the home bucket for key k. Then Every dictionary pair (key, Item) is stored in its home bucket table[h[key]]. 40 / 127
  • 98. Images/cinvestav- Example Imagine that you have A 1D array (or table) table[0 : m − 1]. Thus h(k)is the home bucket for key k. Then Every dictionary pair (key, Item) is stored in its home bucket table[h[key]]. 40 / 127
  • 99. Images/cinvestav- Example Imagine that you have A 1D array (or table) table[0 : m − 1]. Thus h(k)is the home bucket for key k. Then Every dictionary pair (key, Item) is stored in its home bucket table[h[key]]. 40 / 127
  • 100. Images/cinvestav- Example Push the following pairs in a hash table of size m = 8 (22,a), (33,c), (3,d), (73,e), (85,f). Hash function is key/11 Then, we have that 41 / 127
  • 101. Images/cinvestav- Example Push the following pairs in a hash table of size m = 8 (22,a), (33,c), (3,d), (73,e), (85,f). Hash function is key/11 Then, we have that 41 / 127
  • 102. Images/cinvestav- Example Push the following pairs in a hash table of size m = 8 (22,a), (33,c), (3,d), (73,e), (85,f). Hash function is key/11 Then, we have that 41 / 127
  • 103. Images/cinvestav- What Can Go Wrong? What if we add? Where does (26,g) go? Then PROBLEM!!! Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use. This is known as collision or overflow. 42 / 127
  • 104. Images/cinvestav- What Can Go Wrong? What if we add? Where does (26,g) go? Then PROBLEM!!! Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use. This is known as collision or overflow. 42 / 127
  • 105. Images/cinvestav- What Can Go Wrong? What if we add? Where does (26,g) go? Then PROBLEM!!! Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use. This is known as collision or overflow. 42 / 127
  • 106. Images/cinvestav- What Can Go Wrong? What if we add? Where does (26,g) go? Then PROBLEM!!! Keys that have the same home bucket are synonyms. 22 and 26 are synonyms with respect to the hash function that is in use. This is known as collision or overflow. 42 / 127
  • 107. Images/cinvestav- Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 43 / 127
  • 108. Images/cinvestav- Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 43 / 127
  • 109. Images/cinvestav- Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 43 / 127
  • 110. Images/cinvestav- Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 43 / 127
  • 111. Images/cinvestav- Collisions This is a problem We might try to avoid this by using a suitable hash function h. Idea Make appear to be “random” enough to avoid collisions altogether (Highly Improbable) or to minimize the probability of them. You still have the problem of collisions Possible Solutions to the problem: 1 Chaining 2 Open Addressing 43 / 127
  • 112. Images/cinvestav- Other Issues First Issue The choice of the possible hash function. Second The collision handling method Third The size (number of buckets) at the hash table 44 / 127
  • 113. Images/cinvestav- Other Issues First Issue The choice of the possible hash function. Second The collision handling method Third The size (number of buckets) at the hash table 44 / 127
  • 114. Images/cinvestav- Other Issues First Issue The choice of the possible hash function. Second The collision handling method Third The size (number of buckets) at the hash table 44 / 127
  • 115. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 45 / 127
  • 116. Images/cinvestav- Hash Functions They have two parts 1 The conversion of the key into an integer in the case the key is not an integer. 2 The mapping to the home bucket. 46 / 127
  • 117. Images/cinvestav- Hash Functions They have two parts 1 The conversion of the key into an integer in the case the key is not an integer. 2 The mapping to the home bucket. 46 / 127
  • 118. Images/cinvestav- Hash Functions They have two parts 1 The conversion of the key into an integer in the case the key is not an integer. 2 The mapping to the home bucket. 46 / 127
  • 119. Images/cinvestav- String To Integer Something Notable Each Java character is 2 bytes long. An int is 4 bytes long. We could have the following string s = “pt” We then do the following: 1 int answer = s.charAt(0); 2 answer = (answer<<16)+s.charAt(1) ; 47 / 127
  • 120. Images/cinvestav- String To Integer Something Notable Each Java character is 2 bytes long. An int is 4 bytes long. We could have the following string s = “pt” We then do the following: 1 int answer = s.charAt(0); 2 answer = (answer<<16)+s.charAt(1) ; 47 / 127
  • 121. Images/cinvestav- String To Integer Something Notable Each Java character is 2 bytes long. An int is 4 bytes long. We could have the following string s = “pt” We then do the following: 1 int answer = s.charAt(0); 2 answer = (answer<<16)+s.charAt(1) ; 47 / 127
  • 122. Images/cinvestav- String To Integer Something Notable Each Java character is 2 bytes long. An int is 4 bytes long. We could have the following string s = “pt” We then do the following: 1 int answer = s.charAt(0); 2 answer = (answer<<16)+s.charAt(1) ; 47 / 127
  • 123. Images/cinvestav- String To Integer Something Notable Each Java character is 2 bytes long. An int is 4 bytes long. We could have the following string s = “pt” We then do the following: 1 int answer = s.charAt(0); 2 answer = (answer<<16)+s.charAt(1) ; 47 / 127
  • 124. Images/cinvestav- What about longer Strings? We can do the following process p u b l i c s t a t i c i n t i n t e g e r ( S t r i n g s ) { i n t lengt h = s . l ength ( ) ; // number of c h a r a c t e r s i n s i n t answer = 0; i f ( len gth % 2 == 1) {// lengt h i s odd answer = s . charAt ( le ngth − 1 ) ; length −−; } 48 / 127
  • 125. Images/cinvestav- Cont... The rest of the code // lengt h i s now even f o r ( i n t i = 0; i < lengt h ; i += 2) {// do two c h a r a c t e r s at a time answer += s . charAt ( i ) ; answer += (( i n t ) s . charAt ( i + 1)) << 16; } r e t u r n ( answer < 0) ? −answer : answer ; } 49 / 127
  • 126. Images/cinvestav- Analysis of hashing: Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. 50 / 127
  • 127. Images/cinvestav- Analysis of hashing: Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. 50 / 127
  • 128. Images/cinvestav- Analysis of hashing: Which hash function? Consider that: Good hash functions should maintain the property of simple uniform hashing! The keys have the same probability 1/m to be hashed to any bucket!!! A uniform hash function minimizes the likelihood of an overflow when keys are selected at random. 50 / 127
  • 129. Images/cinvestav- Possible hash function when the keys are natural numbers The division method h(k) = k mod m. Good choices for m are primes not too close to a power of 2. 51 / 127
  • 130. Images/cinvestav- Possible hash function when the keys are natural numbers The division method h(k) = k mod m. Good choices for m are primes not too close to a power of 2. 51 / 127
  • 131. Images/cinvestav- What if... Question: What about something with keys in a normal distribution? 52 / 127
  • 132. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 53 / 127
  • 133. Images/cinvestav- Hashing By Division Universe of keys keySpace = all integers. Thus, we have that For every m, the number of integers that get mapped (hashed) into bucket i is approximately 232 /m. Properties The division method results in a uniform hash function when keySpace = all integers. 54 / 127
  • 134. Images/cinvestav- Hashing By Division Universe of keys keySpace = all integers. Thus, we have that For every m, the number of integers that get mapped (hashed) into bucket i is approximately 232 /m. Properties The division method results in a uniform hash function when keySpace = all integers. 54 / 127
  • 135. Images/cinvestav- Hashing By Division Universe of keys keySpace = all integers. Thus, we have that For every m, the number of integers that get mapped (hashed) into bucket i is approximately 232 /m. Properties The division method results in a uniform hash function when keySpace = all integers. 54 / 127
  • 136. Images/cinvestav- However Problem In practice, keys tend to be correlated. Thus The choice of the divisor b affects the distribution of home buckets. Then Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones). 55 / 127
  • 137. Images/cinvestav- However Problem In practice, keys tend to be correlated. Thus The choice of the divisor b affects the distribution of home buckets. Then Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones). 55 / 127
  • 138. Images/cinvestav- However Problem In practice, keys tend to be correlated. Thus The choice of the divisor b affects the distribution of home buckets. Then Because of this correlation, applications tend to have a bias towards keys that map into odd integers (or into even ones). 55 / 127
  • 139. Images/cinvestav- Examples Odd number and m an even number Odd integers hash into odd home buckets 15%14 = 1, 3%14 = 3, 23%14 = 9 Even number and m an even number Even integers into even home buckets. 20%14 = 6, 30%14 = 2, 8%14 = 8 Properties The bias in the keys results in a bias toward either the odd or even home buckets. 56 / 127
  • 140. Images/cinvestav- Examples Odd number and m an even number Odd integers hash into odd home buckets 15%14 = 1, 3%14 = 3, 23%14 = 9 Even number and m an even number Even integers into even home buckets. 20%14 = 6, 30%14 = 2, 8%14 = 8 Properties The bias in the keys results in a bias toward either the odd or even home buckets. 56 / 127
  • 141. Images/cinvestav- Examples Odd number and m an even number Odd integers hash into odd home buckets 15%14 = 1, 3%14 = 3, 23%14 = 9 Even number and m an even number Even integers into even home buckets. 20%14 = 6, 30%14 = 2, 8%14 = 8 Properties The bias in the keys results in a bias toward either the odd or even home buckets. 56 / 127
  • 142. Images/cinvestav- What if we use an Odd number? Odd number and m an odd number odd integers may hash into any home. 15%15 = 0, 3%15 = 3, 23%15 = 8 Even number and m an odd number Even integers may hash into any home. 20%15 = 5, 30%15 = 0, 8%15 = 8 Thus The bias in the keys does not result in a bias toward either the odd or even home buckets. 57 / 127
  • 143. Images/cinvestav- What if we use an Odd number? Odd number and m an odd number odd integers may hash into any home. 15%15 = 0, 3%15 = 3, 23%15 = 8 Even number and m an odd number Even integers may hash into any home. 20%15 = 5, 30%15 = 0, 8%15 = 8 Thus The bias in the keys does not result in a bias toward either the odd or even home buckets. 57 / 127
  • 144. Images/cinvestav- What if we use an Odd number? Odd number and m an odd number odd integers may hash into any home. 15%15 = 0, 3%15 = 3, 23%15 = 8 Even number and m an odd number Even integers may hash into any home. 20%15 = 5, 30%15 = 0, 8%15 = 8 Thus The bias in the keys does not result in a bias toward either the odd or even home buckets. 57 / 127
  • 145. Images/cinvestav- Bias Something Notable The bias in the keys does not result in a bias toward either the odd or even home buckets. Then, we have We have a better chance of uniformly distributed home buckets. Thus So do not use an even divisor. 58 / 127
  • 146. Images/cinvestav- Bias Something Notable The bias in the keys does not result in a bias toward either the odd or even home buckets. Then, we have We have a better chance of uniformly distributed home buckets. Thus So do not use an even divisor. 58 / 127
  • 147. Images/cinvestav- Bias Something Notable The bias in the keys does not result in a bias toward either the odd or even home buckets. Then, we have We have a better chance of uniformly distributed home buckets. Thus So do not use an even divisor. 58 / 127
  • 148. Images/cinvestav- Selecting The Divisor Another Problem Similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of prime numbers such as 3, 5, 7, . . . However The effect of each prime divisor p of m decreases as p gets larger. Rules of Choosing m Ideally, choose m so that it is a prime number. Not to close to a power of 2. 59 / 127
  • 149. Images/cinvestav- Selecting The Divisor Another Problem Similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of prime numbers such as 3, 5, 7, . . . However The effect of each prime divisor p of m decreases as p gets larger. Rules of Choosing m Ideally, choose m so that it is a prime number. Not to close to a power of 2. 59 / 127
  • 150. Images/cinvestav- Selecting The Divisor Another Problem Similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of prime numbers such as 3, 5, 7, . . . However The effect of each prime divisor p of m decreases as p gets larger. Rules of Choosing m Ideally, choose m so that it is a prime number. Not to close to a power of 2. 59 / 127
  • 151. Images/cinvestav- However Something Notable Even with this hash function, we can have problems Remember The Gaussian Keys... 60 / 127
  • 152. Images/cinvestav- However Something Notable Even with this hash function, we can have problems Remember The Gaussian Keys... 60 / 127
  • 153. Images/cinvestav- Java.util.HashTable Quite simple Simply uses a divisor that is an odd number. Simplify the implementation This simplifies implementation because we must be able to resize the hash table as more pairs are put into the dictionary. Why? Array doubling, for example, requires you to go from a 1D array table whose length is m (which is odd) to an array whose length is 2m + 1 (which is also odd). 61 / 127
  • 154. Images/cinvestav- Java.util.HashTable Quite simple Simply uses a divisor that is an odd number. Simplify the implementation This simplifies implementation because we must be able to resize the hash table as more pairs are put into the dictionary. Why? Array doubling, for example, requires you to go from a 1D array table whose length is m (which is odd) to an array whose length is 2m + 1 (which is also odd). 61 / 127
  • 155. Images/cinvestav- Java.util.HashTable Quite simple Simply uses a divisor that is an odd number. Simplify the implementation This simplifies implementation because we must be able to resize the hash table as more pairs are put into the dictionary. Why? Array doubling, for example, requires you to go from a 1D array table whose length is m (which is odd) to an array whose length is 2m + 1 (which is also odd). 61 / 127
  • 156. Images/cinvestav- A Better Solution: Universal Hashing Issues In practice, keys are not randomly distributed. Any fixed hash function might yield retrieval O(n) time. Goal To find hash functions that produce uniform random table indexes irrespective of the keys. Idea To select a hash function at random from a designed class of functions at the beginning of the execution. 62 / 127
  • 157. Images/cinvestav- A Better Solution: Universal Hashing Issues In practice, keys are not randomly distributed. Any fixed hash function might yield retrieval O(n) time. Goal To find hash functions that produce uniform random table indexes irrespective of the keys. Idea To select a hash function at random from a designed class of functions at the beginning of the execution. 62 / 127
  • 158. Images/cinvestav- A Better Solution: Universal Hashing Issues In practice, keys are not randomly distributed. Any fixed hash function might yield retrieval O(n) time. Goal To find hash functions that produce uniform random table indexes irrespective of the keys. Idea To select a hash function at random from a designed class of functions at the beginning of the execution. 62 / 127
  • 159. Images/cinvestav- Hashing methods: Universal hashing Example Set of hash functions Choose a hash function randomly (At the beginning of the execution) HASH TABLE 63 / 127
  • 160. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 161. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 162. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 163. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 164. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 165. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 166. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 167. Images/cinvestav- Example of Universal Hash Proceed as follows: Choose a primer number p large enough so that every possible key k is in the range [0, ..., p − 1] Zp = {0, 1, ..., p − 1}and Z∗ p = {1, ..., p − 1} Define the following hash function: ha,b(k) = ((ak + b) mod p) mod m, ∀a ∈ Z∗ p and b ∈ Zp The family of all such hash functions is: Hp,m = {ha,b : a ∈ Z∗ p and b ∈ Zp} Important a and b are chosen randomly at the beginning of execution. The class Hp,m of hash functions is universal. 64 / 127
  • 168. Images/cinvestav- Example: Universal hash functions Example p = 977, m = 50, a and b random numbers ha,b(k) = ((ak + b) mod p) mod m 65 / 127
  • 169. Images/cinvestav- Example of key distribution Example, mean = 488.5 and dispersion = 5 66 / 127
  • 170. Images/cinvestav- Example with 10 keys Universal Hashing Vs Division Method 67 / 127
  • 171. Images/cinvestav- Example with 50 keys Universal Hashing Vs Division Method 68 / 127
  • 172. Images/cinvestav- Example with 100 keys Universal Hashing Vs Division Method 69 / 127
  • 173. Images/cinvestav- Example with 200 keys Universal Hashing Vs Division Method 70 / 127
  • 174. Images/cinvestav- Another Example: Matrix Method Then Let us say keys are u-bits long. Say the table size M is power of 2. an index is b-bits long with M = 2b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx where after the inner product we apply mod 2 Example h b    1 0 0 0 0 1 1 1 1 1 1 0    u x     1 0 1 0      = h (x)   1 1 0    71 / 127
  • 175. Images/cinvestav- Another Example: Matrix Method Then Let us say keys are u-bits long. Say the table size M is power of 2. an index is b-bits long with M = 2b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx where after the inner product we apply mod 2 Example h b    1 0 0 0 0 1 1 1 1 1 1 0    u x     1 0 1 0      = h (x)   1 1 0    71 / 127
  • 176. Images/cinvestav- Another Example: Matrix Method Then Let us say keys are u-bits long. Say the table size M is power of 2. an index is b-bits long with M = 2b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx where after the inner product we apply mod 2 Example h b    1 0 0 0 0 1 1 1 1 1 1 0    u x     1 0 1 0      = h (x)   1 1 0    71 / 127
  • 177. Images/cinvestav- Another Example: Matrix Method Then Let us say keys are u-bits long. Say the table size M is power of 2. an index is b-bits long with M = 2b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx where after the inner product we apply mod 2 Example h b    1 0 0 0 0 1 1 1 1 1 1 0    u x     1 0 1 0      = h (x)   1 1 0    71 / 127
  • 178. Images/cinvestav- Another Example: Matrix Method Then Let us say keys are u-bits long. Say the table size M is power of 2. an index is b-bits long with M = 2b. The h function Pick h to be a random b-by-u 0/1 matrix, and define h(x) = hx where after the inner product we apply mod 2 Example h b    1 0 0 0 0 1 1 1 1 1 1 0    u x     1 0 1 0      = h (x)   1 1 0    71 / 127
  • 179. Images/cinvestav- Implementation of the column*vector mod 2 Code - SWAR-Popcount - Divide and Conquer // This works only i n 32 b i t s i n t product ( i n t row , i n t v e c t o r ){ i n t i = row & v e c t o r ; i = i − (( i >> 1) & 0x55555555 ) ; i = ( i & 0x33333333 ) + (( i >> 2) & 0x33333333 ) ; i = ( ( ( i + ( i >> 4)) & 0x0F0F0F0F ) ∗ 0x01010101 ) >> 24; r e t u r n i & 0x00000001 ; } 72 / 127
  • 180. Images/cinvestav- Explanation Counting Duo-Bits A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit 1 = b as duo := 2b + a The duo population is popcnt(duo) := b + a This can be achieved by (2b + a) − (2b + a) ÷ 2 or (2b + a) − b 73 / 127
  • 181. Images/cinvestav- Explanation Counting Duo-Bits A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit 1 = b as duo := 2b + a The duo population is popcnt(duo) := b + a This can be achieved by (2b + a) − (2b + a) ÷ 2 or (2b + a) − b 73 / 127
  • 182. Images/cinvestav- Explanation Counting Duo-Bits A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit 1 = b as duo := 2b + a The duo population is popcnt(duo) := b + a This can be achieved by (2b + a) − (2b + a) ÷ 2 or (2b + a) − b 73 / 127
  • 183. Images/cinvestav- We have then Something Notable i i div 2 popcnt(i) 00 00 00 01 00 01 10 01 01 11 01 10 We have then Only the lower bit is needed from i div 2 - and one do not has to worry about borrows from neighboring duos. We need to clear the upper bits Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to remove the upper bits 74 / 127
  • 184. Images/cinvestav- We have then Something Notable i i div 2 popcnt(i) 00 00 00 01 00 01 10 01 01 11 01 10 We have then Only the lower bit is needed from i div 2 - and one do not has to worry about borrows from neighboring duos. We need to clear the upper bits Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to remove the upper bits 74 / 127
  • 185. Images/cinvestav- We have then Something Notable i i div 2 popcnt(i) 00 00 00 01 00 01 10 01 01 11 01 10 We have then Only the lower bit is needed from i div 2 - and one do not has to worry about borrows from neighboring duos. We need to clear the upper bits Remember 5 in binary 0101 (Granularity Problem)... we use the 0 to remove the upper bits 74 / 127
  • 186. Images/cinvestav- Thus, we have that Clearing Bits SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend to perform a 32-bit subtraction of all 16 duos: x = x - ((x >> 1) & 0x55555555); Note The popcount-result of the bit-duos still takes two bits. Now What? 75 / 127
  • 187. Images/cinvestav- Thus, we have that Clearing Bits SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend to perform a 32-bit subtraction of all 16 duos: x = x - ((x >> 1) & 0x55555555); Note The popcount-result of the bit-duos still takes two bits. Now What? 75 / 127
  • 188. Images/cinvestav- Thus, we have that Clearing Bits SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend to perform a 32-bit subtraction of all 16 duos: x = x - ((x >> 1) & 0x55555555); Note The popcount-result of the bit-duos still takes two bits. Now What? 75 / 127
  • 189. Images/cinvestav- Counting Nibble-Bits We have The next step is to add the duo-counts to populations of four neighboring bits, the 8 nibble-counts, which may range from zero to four We can do this using 3 in binary 0x3=0011 76 / 127
  • 190. Images/cinvestav- Counting Nibble-Bits We have The next step is to add the duo-counts to populations of four neighboring bits, the 8 nibble-counts, which may range from zero to four We can do this using 3 in binary 0x3=0011 76 / 127
  • 191. Images/cinvestav- How? How many ones you can have in 4 bits 0 to 22 Thus, we only need 4 bits to store the result We create a counter for each of the two bits in four bits This is done using the mask - after How many ones can you have in two bits? From 0 to 2 you can use the lower 2 bits to represent this!!! This is the instruction i = (i & 0x33333333) + ((i >> 2) & 0x33333333); 77 / 127
  • 192. Images/cinvestav- How? How many ones you can have in 4 bits 0 to 22 Thus, we only need 4 bits to store the result We create a counter for each of the two bits in four bits This is done using the mask - after How many ones can you have in two bits? From 0 to 2 you can use the lower 2 bits to represent this!!! This is the instruction i = (i & 0x33333333) + ((i >> 2) & 0x33333333); 77 / 127
  • 193. Images/cinvestav- How? How many ones you can have in 4 bits 0 to 22 Thus, we only need 4 bits to store the result We create a counter for each of the two bits in four bits This is done using the mask - after How many ones can you have in two bits? From 0 to 2 you can use the lower 2 bits to represent this!!! This is the instruction i = (i & 0x33333333) + ((i >> 2) & 0x33333333); 77 / 127
  • 194. Images/cinvestav- How? How many ones you can have in 4 bits 0 to 22 Thus, we only need 4 bits to store the result We create a counter for each of the two bits in four bits This is done using the mask - after How many ones can you have in two bits? From 0 to 2 you can use the lower 2 bits to represent this!!! This is the instruction i = (i & 0x33333333) + ((i >> 2) & 0x33333333); 77 / 127
  • 195. Images/cinvestav- How? How many ones you can have in 4 bits 0 to 22 Thus, we only need 4 bits to store the result We create a counter for each of the two bits in four bits This is done using the mask - after How many ones can you have in two bits? From 0 to 2 you can use the lower 2 bits to represent this!!! This is the instruction i = (i & 0x33333333) + ((i >> 2) & 0x33333333); 77 / 127
  • 196. Images/cinvestav- Example We have Old i new i 3 new i&3 0000 0000 0011 0000 0001 0001 0011 0001 0010 0001 0011 0001 0011 0010 0011 0010 0100 0100 0011 0000 0101 0101 0011 0001 1111 1010 0011 0010 new i>>2 3 new i>>2&3 0000 0011 0000 0000 0011 0000 0000 0011 0000 0000 0011 0000 0001 0011 0001 0001 0011 0001 0010 0011 0010 Final i 0000 0001 0001 0010 0001 0010 0100 78 / 127
  • 197. Images/cinvestav- We have the following Now what? You already got the idea? Now it is about to get the byte-populations from two nibble-populations (8 bits) Do you remember your hexadecimal notation 0x0f = 00001111 How many ones do you have in 8 positions From 0 to 23, you only require 4 bits for this!!! 79 / 127
  • 198. Images/cinvestav- We have the following Now what? You already got the idea? Now it is about to get the byte-populations from two nibble-populations (8 bits) Do you remember your hexadecimal notation 0x0f = 00001111 How many ones do you have in 8 positions From 0 to 23, you only require 4 bits for this!!! 79 / 127
  • 199. Images/cinvestav- We have the following Now what? You already got the idea? Now it is about to get the byte-populations from two nibble-populations (8 bits) Do you remember your hexadecimal notation 0x0f = 00001111 How many ones do you have in 8 positions From 0 to 23, you only require 4 bits for this!!! 79 / 127
  • 200. Images/cinvestav- Thus the next step It will get the counter for the number of ones in 8 bits (i + (i >> 4)) & 0x0f0f0f0f0f0f0f0f; 80 / 127
  • 201. Images/cinvestav- Example We have Old x 2 bits 4 bits 11110000 10100000 01000000 11111111 10101010 01000100 Now we get 1 01000000 −→ (i >> 4) → 00000100 −→ i+(i >> 4) →01000100&00001111−→ 00000100 2 01000100 −→ (i >> 4) → 00000100 −→ i+(i >> 4) →01001000&00001111−→ 00001000 81 / 127
  • 202. Images/cinvestav- Example We have Old x 2 bits 4 bits 11110000 10100000 01000000 11111111 10101010 01000100 Now we get 1 01000000 −→ (i >> 4) → 00000100 −→ i+(i >> 4) →01000100&00001111−→ 00000100 2 01000100 −→ (i >> 4) → 00000100 −→ i+(i >> 4) →01001000&00001111−→ 00001000 81 / 127
  • 203. Images/cinvestav- We can keep doing this It is possible to keep going i = (i + (i >> 8)) & 0x00ff00ff Then i = (i + (i >> 16)) & 0x000000ff 82 / 127
  • 204. Images/cinvestav- We can keep doing this It is possible to keep going i = (i + (i >> 8)) & 0x00ff00ff Then i = (i + (i >> 16)) & 0x000000ff 82 / 127
  • 205. Images/cinvestav- With today fast 64 bit multiplications Something Notable one can multiply the vector of 8-byte-counts with 0x0101010101010101 to get the final result in the most significant byte, For this the code (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101)>>24 83 / 127
  • 206. Images/cinvestav- With today fast 64 bit multiplications Something Notable one can multiply the vector of 8-byte-counts with 0x0101010101010101 to get the final result in the most significant byte, For this the code (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101)>>24 83 / 127
  • 207. Images/cinvestav- Divide and Conquer This is the natural way we do many things We always attack smaller versions first of the large one!!! 84 / 127
  • 208. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 209. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 210. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 211. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 212. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 213. Images/cinvestav- Overflow Handling Overflow An overflow occurs when the home bucket for a new pair (key, element) is full. One strategy to handle overflow, small universe of keys Search the hash table in some systematic fashion for a bucket that is not full. Linear probing (linear open addressing). Quadratic probing. Random probing. The other strategy, a large universe of keys Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. Array linear list. Chain. 85 / 127
  • 214. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 86 / 127
  • 215. Images/cinvestav- Open addressing Definition All the elements occupy the hash table itself. What is it? We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. Advantages The advantage of open addressing is that it avoids pointers altogether. 87 / 127
  • 216. Images/cinvestav- Open addressing Definition All the elements occupy the hash table itself. What is it? We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. Advantages The advantage of open addressing is that it avoids pointers altogether. 87 / 127
  • 217. Images/cinvestav- Open addressing Definition All the elements occupy the hash table itself. What is it? We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. Advantages The advantage of open addressing is that it avoids pointers altogether. 87 / 127
  • 218. Images/cinvestav- Insert in Open addressing Extended hash function to probe Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search time Extend the hash function to h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1} This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) A permutation of 0, 1, 2, ..., m − 1 88 / 127
  • 219. Images/cinvestav- Insert in Open addressing Extended hash function to probe Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search time Extend the hash function to h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1} This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) A permutation of 0, 1, 2, ..., m − 1 88 / 127
  • 220. Images/cinvestav- Insert in Open addressing Extended hash function to probe Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search time Extend the hash function to h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1} This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) A permutation of 0, 1, 2, ..., m − 1 88 / 127
  • 221. Images/cinvestav- Insert in Open addressing Extended hash function to probe Instead of being fixed in the order 0, 1, 2, ..., m − 1 with Θ (n) search time Extend the hash function to h : U × {0, 1, ..., m − 1} → {0, 1, ..., m − 1} This gives the probe sequence h(k, 0), h(k, 1), ..., h(k, m − 1) A permutation of 0, 1, 2, ..., m − 1 88 / 127
  • 222. Images/cinvestav- Hashing methods in Open Addressing HASH-INSERT(T, k) 1 i = 0 2 repeat 3 j = h (k, i) 4 if T [j] == NIL 5 T [j] = k 6 return j 7 else i = i + 1 8 until i == m 9 error “Hash Table Overflow” 89 / 127
  • 223. Images/cinvestav- Hashing methods in Open Addressing HASH-SEARCH(T,k) 1 i = 0 2 repeat 3 j = h (k, i) 4 if T [j] == k 5 return j 6 i = i + 1 7 until T [j] == NIL or i == m 8 return NIL 90 / 127
  • 224. Images/cinvestav- Linear probing: Definition and properties Hash function Given an ordinary hash function h : 0, 1, ..., m − 1 → U for i = 0, 1, ..., m − 1, we get the extended hash function h(k, i) = h (k) + i mod m, (2) Sequence of probes Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1]. Distinct probes Because the initial probe determines the entire probe sequence, there are m distinct probe sequences. 91 / 127
  • 225. Images/cinvestav- Linear probing: Definition and properties Hash function Given an ordinary hash function h : 0, 1, ..., m − 1 → U for i = 0, 1, ..., m − 1, we get the extended hash function h(k, i) = h (k) + i mod m, (2) Sequence of probes Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1]. Distinct probes Because the initial probe determines the entire probe sequence, there are m distinct probe sequences. 91 / 127
  • 226. Images/cinvestav- Linear probing: Definition and properties Hash function Given an ordinary hash function h : 0, 1, ..., m − 1 → U for i = 0, 1, ..., m − 1, we get the extended hash function h(k, i) = h (k) + i mod m, (2) Sequence of probes Given key k, we first probe T[h (k)], then T[h (k) + 1] and so on until T[m − 1]. Then, we wrap around T[0] to T[h (k) − 1]. Distinct probes Because the initial probe determines the entire probe sequence, there are m distinct probe sequences. 91 / 127
  • 227. Images/cinvestav- Linear probing: Definition and properties Disadvantages Linear probing suffers of primary clustering. Long runs of occupied slots build up increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability i+1 m . Long runs of occupied slots tend to get longer, and the average search time increases. 92 / 127
  • 228. Images/cinvestav- Linear probing: Definition and properties Disadvantages Linear probing suffers of primary clustering. Long runs of occupied slots build up increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability i+1 m . Long runs of occupied slots tend to get longer, and the average search time increases. 92 / 127
  • 229. Images/cinvestav- Linear probing: Definition and properties Disadvantages Linear probing suffers of primary clustering. Long runs of occupied slots build up increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability i+1 m . Long runs of occupied slots tend to get longer, and the average search time increases. 92 / 127
  • 230. Images/cinvestav- Linear probing: Definition and properties Disadvantages Linear probing suffers of primary clustering. Long runs of occupied slots build up increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability i+1 m . Long runs of occupied slots tend to get longer, and the average search time increases. 92 / 127
  • 231. Images/cinvestav- Example: Linear Probing – Get And Put Constraints divisor = m (number of buckets) = 17. Home bucket = key % 17. Then Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 We have 93 / 127
  • 232. Images/cinvestav- Example: Linear Probing – Get And Put Constraints divisor = m (number of buckets) = 17. Home bucket = key % 17. Then Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 We have 93 / 127
  • 233. Images/cinvestav- Example: Linear Probing – Get And Put Constraints divisor = m (number of buckets) = 17. Home bucket = key % 17. Then Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 We have 93 / 127
  • 234. Images/cinvestav- Linear Probing – Remove Example remove(0) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 94 / 127
  • 235. Images/cinvestav- Linear Probing – Remove Example remove(0) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 94 / 127
  • 236. Images/cinvestav- Linear Probing – Remove Example remove(0) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 94 / 127
  • 237. Images/cinvestav- Linear Probing – remove(34) Example remove(34) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 95 / 127
  • 238. Images/cinvestav- Linear Probing – remove(34) Example remove(34) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 95 / 127
  • 239. Images/cinvestav- Linear Probing – remove(34) Example remove(34) Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 95 / 127
  • 240. Images/cinvestav- Linear Probing – remove(29) Example Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 96 / 127
  • 241. Images/cinvestav- Linear Probing – remove(29) Example Compact Cluster Search cluster for pair (if any) to fill vacated bucket. 96 / 127
  • 242. Images/cinvestav- Code for Removing We have the following p u b l i c void remove ( key ){ i n t positionsChecked = 1; i n t i = FindSlot ( Key ) ; i f ( Table [ i ] == n u l l ) r e t u r n ; // key i s not i n the t a b l e j = i ; wh il e ( positionsChecked <= Table . len gth ){ j = ( j +1) % Table . leng th ; i f ( Table [ j ] == n u l l ) break ; k = Hashing ( Table [ j ] . key ) ; i f ( i < j && ( k <= i | | k > j )) | | ( j < i && ( k <= i && k > j )) { Table [ i ] = Table [ j ] ; i = j ; } positionChecked++; } Table [ i ] = n u l l ; } 97 / 127
  • 243. Images/cinvestav- Explanation First For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). Second k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. Thus This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant. 98 / 127
  • 244. Images/cinvestav- Explanation First For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). Second k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. Thus This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant. 98 / 127
  • 245. Images/cinvestav- Explanation First For all records in a cluster, there must be no vacant slots between their natural hash position and their current position (else lookups will terminate before finding the record). Second k is the raw hash where the record at j would naturally land in the hash table if there were no collisions. Thus This test is asking if the record at j is invalidly positioned with respect to the required properties of a cluster now that i is vacant. 98 / 127
  • 246. Images/cinvestav- Case 1 We have the following Case 1 i j k k we have i < j If i < k ≤ j then moving j to the i position will be incorrect... Why? 99 / 127
  • 247. Images/cinvestav- Case 2 We have the following Case 2 ij k We have j < i If k ≤ j < or i < k then moving j to the i position will be incorrect... Why? 100 / 127
  • 248. Images/cinvestav- Complexity We have then Worst-case get/put/remove time is Θ(n), where n is the number of pairs in the table. Something Notable This happens when all pairs are in the same cluster. 101 / 127
  • 249. Images/cinvestav- Complexity We have then Worst-case get/put/remove time is Θ(n), where n is the number of pairs in the table. Something Notable This happens when all pairs are in the same cluster. 101 / 127
  • 250. Images/cinvestav- Explaining α We have that α= loading density = (number of pairs)/m. In the example α = 12/17 We have the following terms to explain complexity in Open Addressing Sn = expected number of buckets examined in a successful search when n is large. Un= expected number of buckets examined in a unsuccessful search when n is large 102 / 127
  • 251. Images/cinvestav- Explaining α We have that α= loading density = (number of pairs)/m. In the example α = 12/17 We have the following terms to explain complexity in Open Addressing Sn = expected number of buckets examined in a successful search when n is large. Un= expected number of buckets examined in a unsuccessful search when n is large 102 / 127
  • 252. Images/cinvestav- Explaining α We have that α= loading density = (number of pairs)/m. In the example α = 12/17 We have the following terms to explain complexity in Open Addressing Sn = expected number of buckets examined in a successful search when n is large. Un= expected number of buckets examined in a unsuccessful search when n is large 102 / 127
  • 253. Images/cinvestav- Explaining α We have that α= loading density = (number of pairs)/m. In the example α = 12/17 We have the following terms to explain complexity in Open Addressing Sn = expected number of buckets examined in a successful search when n is large. Un= expected number of buckets examined in a unsuccessful search when n is large 102 / 127
  • 254. Images/cinvestav- Expected Performance in Open Addressing, 0 ≤ α ≤ 1 We have for unsuccessful search Un = O 1 + 1 1 − α (3) We have for successful search Sn = O 1 + 1 α ln 1 1 − α (4) 103 / 127
  • 255. Images/cinvestav- Expected Performance in Open Addressing, 0 ≤ α ≤ 1 We have for unsuccessful search Un = O 1 + 1 1 − α (3) We have for successful search Sn = O 1 + 1 α ln 1 1 − α (4) 103 / 127
  • 256. Images/cinvestav- Thus, for different α s We have using uniform keys!!! α Sn Un 0.50 1.5 2.5 0.75 2.5 8.5 0.90 5.5 50.5 Recommendation α ≤ 0.75 is recommended. 104 / 127
  • 257. Images/cinvestav- Thus, for different α s We have using uniform keys!!! α Sn Un 0.50 1.5 2.5 0.75 2.5 8.5 0.90 5.5 50.5 Recommendation α ≤ 0.75 is recommended. 104 / 127
  • 258. Images/cinvestav- Hash Table Design For example If performance requirements are given, determine maximum permissible loading density. We want a successful search to make no more than 5 compares (expected) 4 = 1 + 1 1−α‘ α = 3/4 We want an unsuccessful search to make no more than 6 compares (expected). 6 = 1 + 1 α log 1 1−α α ≈ 0.964 105 / 127
  • 259. Images/cinvestav- Hash Table Design For example If performance requirements are given, determine maximum permissible loading density. We want a successful search to make no more than 5 compares (expected) 4 = 1 + 1 1−α‘ α = 3/4 We want an unsuccessful search to make no more than 6 compares (expected). 6 = 1 + 1 α log 1 1−α α ≈ 0.964 105 / 127
  • 260. Images/cinvestav- Hash Table Design For example If performance requirements are given, determine maximum permissible loading density. We want a successful search to make no more than 5 compares (expected) 4 = 1 + 1 1−α‘ α = 3/4 We want an unsuccessful search to make no more than 6 compares (expected). 6 = 1 + 1 α log 1 1−α α ≈ 0.964 105 / 127
  • 261. Images/cinvestav- Use the minimum α for your trigger of doubling the array!!! Thus, we have αf ≤ min {3/4, 0.964} 106 / 127
  • 262. Images/cinvestav- Hash Table Design Dynamic resizing of table. Whenever loading density exceeds threshold, rehash into a table of approximately twice the current size. 107 / 127
  • 263. Images/cinvestav- But , even with a nice division method!!! Example using keys uniformly distributed It was generated using the division method Then 108 / 127
  • 264. Images/cinvestav- But , even with a nice division method!!! Example using keys uniformly distributed It was generated using the division method Then 108 / 127
  • 265. Images/cinvestav- Example Example using Gaussian keys It was generated using the division method Then 109 / 127
  • 266. Images/cinvestav- Example Example using Gaussian keys It was generated using the division method Then 109 / 127
  • 267. Images/cinvestav- Outline 1 Introduction 2 Operation in a ADT dictionary Add Remove GetValue Contains Iterators 3 Example 4 Implementation 5 Hash Tables Number of Keys Hash Functions Hashing By Division 6 Overflow Handling Open Addressing Chaining 110 / 127
  • 268. Images/cinvestav- Linear List Of Synonyms Thus Each bucket keeps a linear list of all pairs for which it is the home bucket. The linear list may or may not be sorted by key. The linear list may be an array linear list or a chain. 111 / 127
  • 269. Images/cinvestav- Collision Handling: Chaining A Possible Solution Insert the elements that hash to the same slot into a linked list. U (Universe of Keys) 112 / 127
  • 270. Images/cinvestav- Example Sorted Chains Add to a hash table with m = 11 Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8 So, we have Home bucket = key % 11. 113 / 127
  • 271. Images/cinvestav- Example Sorted Chains Add to a hash table with m = 11 Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8 So, we have Home bucket = key % 11. 113 / 127
  • 283. Images/cinvestav- Do You Remember This? Universal Hashing Vs Division Method 125 / 127
  • 284. Images/cinvestav- Expected Complexity of Hash Table under Chaining We have for unsuccessful search Un = O (1 + α) (5) We have for successful search Sn = O (1 + α) (6) 126 / 127
  • 285. Images/cinvestav- Expected Complexity of Hash Table under Chaining We have for unsuccessful search Un = O (1 + α) (5) We have for successful search Sn = O (1 + α) (6) 126 / 127
  • 286. Images/cinvestav- Now, Back to the real world java.util.Hashtable It uses unsorted chains. It uses a default initial m = divisor = 101 It uses a default α ≤ 0.75 When loading density exceeds a max permissible threshold, It rehash with new m = 2m+1. 127 / 127
  • 287. Images/cinvestav- Now, Back to the real world java.util.Hashtable It uses unsorted chains. It uses a default initial m = divisor = 101 It uses a default α ≤ 0.75 When loading density exceeds a max permissible threshold, It rehash with new m = 2m+1. 127 / 127
  • 288. Images/cinvestav- Now, Back to the real world java.util.Hashtable It uses unsorted chains. It uses a default initial m = divisor = 101 It uses a default α ≤ 0.75 When loading density exceeds a max permissible threshold, It rehash with new m = 2m+1. 127 / 127
  • 289. Images/cinvestav- Now, Back to the real world java.util.Hashtable It uses unsorted chains. It uses a default initial m = divisor = 101 It uses a default α ≤ 0.75 When loading density exceeds a max permissible threshold, It rehash with new m = 2m+1. 127 / 127