HASH TABLE
VU QUANG TRAN
EXAMPLE
• Design a system to store employees' information using their phone number as
key
• Operations: Insert, Search, Delete
• Some possible data structures:
• Array (unsorted): O(n) Search, Delete
• Linked List: O(n) Search, Delete
• Balanced Binary Search Tree: O(log n) All
• Direct Access Table: O(1) All, but wastes space (one slot per possible phone number)
=> Hash Table: O(1) All on average
BASICS
• Data structure that implements an associative array
• Maps each key to its corresponding value
• Uses a hash function to compute, for each key-value pair, an index into an array of buckets
• Insert, Search, Delete: O(1) on average, O(n) in the worst case
HASHING
• Distribute the entries (key-value pairs) across an array of buckets
• Hash function: Map data of arbitrary size to data of fixed size
• Two steps:
1. hash = hash_func(key)
2. index = hash % table_size
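The two steps above can be sketched in Python. This is only an illustration: the slides do not name a specific hash_func, so 32-bit FNV-1a is assumed here as a stand-in, and bucket_index is a hypothetical helper name.

```python
def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a, used here only as an example hash_func (an assumption)."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def bucket_index(key: str, table_size: int) -> int:
    """Step 1: hash the key. Step 2: reduce the hash to a bucket index."""
    hash_value = fnv1a_32(key)
    return hash_value % table_size


# A phone number used as key always lands in the same bucket of a 16-bucket table.
print(bucket_index("0901234567", 16))
```

The hash has a fixed size (32 bits) whatever the length of the key, and the modulo step maps it into the range 0 .. table_size - 1.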
CHOOSING A HASH FUNCTION
• Fast to compute
• Distributes keys uniformly across the buckets
TYPES OF HASH FUNCTION
• Two types:
• Cryptographic hash
• Non-cryptographic hash
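• Non-cryptographic hash provides weaker guarantees than cryptographic hash in exchange for better performance
• Example:
• Crypto: BLAKE2b, SHA-512, MD5 (now considered broken: practical collisions exist), …
• Non-crypto: MurmurHash, xxHash, ...
• Cryptographic hash aims to provide certain security guarantees
• Main properties of cryptographic hash:
• Deterministic: the same input always gives the same hash
• Quick to compute
• One-way function: infeasible to recover the input from the hash
• Avalanche effect: a small change in the input changes the hash extensively
• Collision resistant: infeasible to find two inputs with the same hash
• Pre-image attack resistant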
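Two of the properties above, determinism and the avalanche effect, can be observed with Python's standard hashlib module. This is a small demonstration, not a proof of any security guarantee. The inputs are made-up phone numbers and differing_bits is a hypothetical helper.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = hashlib.sha512(b"0901234567").digest()
d2 = hashlib.sha512(b"0901234568").digest()  # input differs in one character

# Deterministic: hashing the same input again yields the same digest.
assert d1 == hashlib.sha512(b"0901234567").digest()

# Avalanche effect: roughly half of the 512 output bits are expected to differ.
print(differing_bits(d1, d2), "of", len(d1) * 8, "bits differ")
```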
COLLISION RESOLUTION
• Collision: two or more keys result in the same hash value (or the same bucket index)
• Practically unavoidable: there are usually far more possible keys than buckets
• Handling techniques:
• Separate chaining
• Open addressing
COLLISION RESOLUTION
SEPARATE CHAINING
• Make each cell of the hash table point to a linked list of records that have the same hash function value
• Advantages:
• Simple to implement
• Hash table never fills up
• Disadvantages:
• Poor cache performance (list nodes are scattered in memory)
• Space wastage (unused buckets, extra space for links)
• Search time can become O(n) if a chain gets long
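A minimal sketch of separate chaining in Python follows. As a simplification, each chain is a Python list of (key, value) pairs rather than a hand-written linked list. The class name ChainedHashTable and the default of 7 buckets are assumptions made for illustration.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket holds a chain of (key, value) pairs."""

    def __init__(self, num_buckets: int = 7):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Built-in hash() plays the role of the hash function here.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # new key: extend the chain

    def search(self, key):
        for k, v in self._bucket(key):    # walk the chain: O(chain length)
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(7)
for phone, name in [(15, "An"), (8, "Binh"), (1, "Chi")]:
    table.insert(phone, name)  # 15, 8 and 1 all satisfy key % 7 == 1
print(table.search(8))
```

All three keys collide in the same bucket, so the chain there grows to length 3 while the table itself never "fills up". This is also where the O(n) worst case comes from.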
COLLISION RESOLUTION
OPEN ADDRESSING
• All elements are stored in the hash table itself
• Operations:
• Insert: Keep probing until an empty slot is found
• Search: Keep probing until key is found or an empty slot is reached
• Delete: Simply emptying a slot would make Search stop early and miss keys stored further along the probe sequence, so slots of deleted keys are marked specially as DELETED
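The three operations above can be sketched as follows, assuming linear probing as the probe sequence. The EMPTY and DELETED sentinels and the class name LinearProbingTable are illustrative choices, not taken from the slides.

```python
EMPTY = object()    # slot has never held a key
DELETED = object()  # marker left behind when a key is removed


class LinearProbingTable:
    def __init__(self, size: int = 7):
        self.keys = [EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        """Yield slot indices in probing order: (hash + i) % S for i = 0 .. S-1."""
        size = len(self.keys)
        start = hash(key) % size
        for i in range(size):
            yield (start + i) % size

    def insert(self, key, value) -> bool:
        first_free = None
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = idx       # reusable, but keep looking for the key
            elif slot == key:
                self.values[idx] = value   # key already present: update
                return True
        if first_free is None:
            return False                   # table is full
        self.keys[first_free] = key
        self.values[first_free] = value
        return True

    def search(self, key):
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                return None                # a never-used slot ends the search
            if slot is not DELETED and slot == key:
                return self.values[idx]
        return None

    def delete(self, key) -> bool:
        for idx in self._probe(key):
            slot = self.keys[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.keys[idx] = DELETED   # mark, do not empty
                self.values[idx] = None
                return True
        return False
```

With 7 slots, keys 8, 15 and 22 all start probing at slot 1 and end up in slots 1, 2 and 3. After deleting 15, a search for 22 still succeeds because it passes over the DELETED marker in slot 2 instead of stopping there.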
COLLISION RESOLUTION
OPEN ADDRESSING
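Types (i = 0, 1, 2, ... is the probe number, S is the table size):
• Linear probing: Linearly probe for the next slot => Clustering
index = [hash(x) + i] % S
• Quadratic probing: Probe the slot i^2 positions away from hash(x) in the i-th iteration
index = [hash(x) + i^2] % S
• Double hashing: Use a second hash function hash2(x) and probe the slot i*hash2(x) positions away from hash(x) in the i-th iteration; hash2(x) must never be 0
index = [hash(x) + i*hash2(x)] % S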
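To compare the three probe formulas above, the sketch below prints the first four probe positions for one key. The table size S = 7, the key 15, and the secondary hash 5 - (key % 5) are assumptions chosen for the example. The only requirement used is that the secondary hash is never 0.

```python
def linear_probe(h: int, i: int, size: int) -> int:
    return (h + i) % size


def quadratic_probe(h: int, i: int, size: int) -> int:
    return (h + i * i) % size


def double_hash_probe(h: int, h2: int, i: int, size: int) -> int:
    return (h + i * h2) % size


S = 7
key = 15
h = key % S         # primary hash: 1
h2 = 5 - (key % 5)  # hypothetical secondary hash, always in 1..5: here 5

print([linear_probe(h, i, S) for i in range(4)])           # [1, 2, 3, 4]
print([quadratic_probe(h, i, S) for i in range(4)])        # [1, 2, 5, 3]
print([double_hash_probe(h, h2, i, S) for i in range(4)])  # [1, 6, 4, 2]
```

Linear probing visits neighbouring slots, which is cache friendly but produces clusters. The other two spread the probes out, and with double hashing the step size depends on the key itself.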
COLLISION RESOLUTION
OPEN ADDRESSING
Comparison:
• Linear probing:
• Easy to compute
• Best cache performance
• Suffers from clustering
• Quadratic probing:
• Lies between the other two in both cache performance and clustering
• Double hashing:
• Poor cache performance
• Little to no clustering
• More computation time
COLLISION RESOLUTION
OPEN ADDRESSING
• Advantages (compared with separate chaining):
• Better cache performance
• Better space usage
• Disadvantages:
• Harder to implement
• Hash table may become full
• Clustering
DYNAMIC RESIZING
• Load factor = number of entries / number of buckets
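• When load factor is too low or too high => Dynamic resizing
• Approaches:
• Complete resizing: allocate a new table and rehash every entry at once
• Incremental resizing: move entries to the new table gradually, a few per operation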
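The load factor and complete resizing can be sketched as below, on top of a chained table. The 0.75 threshold, the doubling policy and the class name ResizingTable are assumptions; the slides do not prescribe particular values. Shrinking when the load factor gets too low is omitted to keep the sketch short.

```python
class ResizingTable:
    """Chained hash table that doubles its bucket array when the load factor gets too high."""

    MAX_LOAD = 0.75  # assumed threshold

    def __init__(self, num_buckets: int = 4):
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def load_factor(self) -> float:
        # load factor = number of entries / number of buckets
        return self.count / len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.load_factor() > self.MAX_LOAD:
            self._resize(2 * len(self.buckets))

    def search(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None

    def _resize(self, new_size: int):
        # Complete resizing: every entry is rehashed into the new array in one go.
        entries = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(new_size)]
        for k, v in entries:
            self.buckets[hash(k) % new_size].append((k, v))
```

Starting with 4 buckets, the fourth insert pushes the load factor to 1.0, which triggers a resize to 8 buckets and brings the load factor back to 0.5. That single insert costs O(n), which is the pause that incremental resizing is meant to avoid.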
USAGE
• Associative Array
• Database Indexing
• Cache
• Set
• …


Editor's Notes

  • #5 Associative Array is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears at most once in the collection.
  • #9 Collision occurs when a newly inserted key maps to an already occupied slot in hash table
  • #10 Example: Hash function = key % 7
  • #11 Space wastage: some buckets may never be used, extra space to store links
  • #12 At any point, the size of the table must be greater than or equal to the total number of keys. Insert can reuse a deleted slot, but search does not stop at a deleted slot.
  • #13 Example: Hash function = key % 7
  • #14 Clustering: Many consecutive elements form groups, so it takes longer to find a free slot or to search for an element
  • #18 Better space usage: a bucket can store a key belonging to another bucket if there is a collision, so no extra space is needed for links
  • #19 During the resize, allocate the new hash table, but keep the old table unchanged. In each lookup or delete operation, check both tables. Perform insertion operations only in the new table. At each insertion also move r elements from the old table to the new table. When all elements are removed from the old table, deallocate it.
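
The incremental resizing procedure described in note #19 could be sketched as follows. Assumptions: chained buckets, a 0.75 load threshold, doubling, and r = 2 entries moved per insertion. The class name IncrementalResizingTable and the cursor bookkeeping are illustrative, not taken from the slides.

```python
class IncrementalResizingTable:
    def __init__(self, num_buckets: int = 4, r: int = 2, max_load: float = 0.75):
        self.new = [[] for _ in range(num_buckets)]
        self.old = None   # old table, present only while a resize is in progress
        self.cursor = 0   # next old bucket to drain
        self.count = 0
        self.r = r
        self.max_load = max_load

    @staticmethod
    def _find(table, key):
        """Return (bucket, position of key in that bucket or None)."""
        bucket = table[hash(key) % len(table)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                return bucket, i
        return bucket, None

    def _migrate(self, n: int):
        """Move up to n entries from the old table into the new one."""
        while self.old is not None and n > 0:
            while self.cursor < len(self.old) and not self.old[self.cursor]:
                self.cursor += 1             # skip buckets that are already empty
            if self.cursor == len(self.old):
                self.old = None              # old table drained: deallocate it
                return
            k, v = self.old[self.cursor].pop()
            self.new[hash(k) % len(self.new)].append((k, v))
            n -= 1

    def insert(self, key, value):
        if self.old is not None:
            bucket, i = self._find(self.old, key)
            if i is not None:                # stale copy in the old table: drop it
                del bucket[i]
                self.count -= 1
        bucket, i = self._find(self.new, key)
        if i is not None:
            bucket[i] = (key, value)
        else:
            bucket.append((key, value))      # insertions go to the new table only
            self.count += 1
        if self.old is not None:
            self._migrate(self.r)
        elif self.count / len(self.new) > self.max_load:
            self.old, self.cursor = self.new, 0          # keep the old table as it is
            self.new = [[] for _ in range(2 * len(self.old))]
            self._migrate(self.r)

    def search(self, key):
        for table in (self.new, self.old):   # check both tables
            if table is not None:
                bucket, i = self._find(table, key)
                if i is not None:
                    return bucket[i][1]
        return None

    def delete(self, key) -> bool:
        for table in (self.new, self.old):   # check both tables
            if table is not None:
                bucket, i = self._find(table, key)
                if i is not None:
                    del bucket[i]
                    self.count -= 1
                    return True
        return False
```

Each insertion does a bounded amount of extra work (at most r moves), so no single operation pays for rehashing the whole table. A new resize is only considered once the old table has been released.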