2. Concurrent hash table
•Indexing key-value objects
• Lookup(key)
• Insert(key, value)
• Delete(key)
•Fundamental building block for modern systems
• System applications (e.g., kernel caches)
• Concurrent user-level applications
3. Goal: memory-efficient and high-throughput
• Memory efficient (e.g., > 90% space utilized)
• Fast concurrent reads (scale with # of cores/threads)
• Fast concurrent writes (scale with # of cores/threads)
4. separate chaining hash table
Chaining items hashed in same bucket
[Figure: an array of buckets, each pointing to a chain of (K, V) nodes; a lookup walks one chain]
5. separate chaining hash table
- Acquiring a linked-list node location
- Only one thread at a time should insert a new node into a chain.
- Find the end of the linked list, then lock out other threads.
- Make sure we are still at the end of the linked list.
- Then allocate a new node from the unfilled region, and finally link it in.
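The insert steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' implementation: the names `Node` and `ChainedTable` are hypothetical, there is one lock per bucket, the table is insert-only (no delete), and ordinary object allocation stands in for "allocate a new node from the unfilled region".

```python
import threading

class Node:
    def __init__(self, key, value):
        self.key, self.value, self.next = key, value, None

class ChainedTable:
    """Insert-only separate chaining table with one lock per bucket (assumption)."""
    def __init__(self, num_buckets=8):
        self.heads = [None] * num_buckets
        self.locks = [threading.Lock() for _ in range(num_buckets)]

    def insert(self, key, value):
        i = hash(key) % len(self.heads)
        # Step 1: find the end of the list without holding the lock.
        tail = self.heads[i]
        while tail is not None and tail.next is not None:
            tail = tail.next
        # Step 2: lock out other threads.
        with self.locks[i]:
            # Step 3: make sure we are still at the end; another thread
            # may have appended while we were walking the list.
            if tail is None:
                tail = self.heads[i]
            while tail is not None and tail.next is not None:
                tail = tail.next
            # Step 4: allocate the node and link it in.
            node = Node(key, value)
            if tail is None:
                self.heads[i] = node
            else:
                tail.next = node

    def lookup(self, key):
        node = self.heads[hash(key) % len(self.heads)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None
```

The re-check under the lock is what keeps two threads from linking their nodes after the same stale tail.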
6. separate chaining hash table
Good: simple
Bad: poor cache locality
Bad: pointers cost space
7. open addressing hash table
Probing alternate locations for vacancy
e.g., linear/quadratic probing, double hashing
- Inserting a key:
- 1) The bucket is empty.
- Must acquire the location.
- The TwoStepAcquireAttempt algorithm is used.
- Insert the key.
- 2) The bucket is already filled.
- Find the next possible bucket using any probing method.
8. Procedure: TwoStepAcquireAttempt
Parameters: (location, key)
// first check without locking
if location is empty then
    lock(location)
    // then check with the lock
    if location is still empty then
        Reserve location for key
        unlock(location)
        return true
    end if
    // Another thread modified the structure before we could acquire location
    unlock(location)
end if
return false // Caller needs to continue searching
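A runnable Python sketch of the procedure above, combined with linear probing for the "bucket already filled" case. The class `ProbingTable`, the `EMPTY` sentinel, and the per-slot locks are assumptions made for illustration. The sketch has no delete, and a reader may briefly see a reserved key before its value is written.

```python
import threading

EMPTY = object()  # sentinel marking an unused slot (assumption)

class ProbingTable:
    def __init__(self, capacity=16):
        self.keys = [EMPTY] * capacity
        self.values = [None] * capacity
        self.locks = [threading.Lock() for _ in range(capacity)]

    def two_step_acquire_attempt(self, loc, key):
        if self.keys[loc] is EMPTY:              # first check without locking
            with self.locks[loc]:
                if self.keys[loc] is EMPTY:      # then check with the lock
                    self.keys[loc] = key         # reserve location for key
                    return True
        return False                             # caller continues searching

    def insert(self, key, value):
        n = len(self.keys)
        start = hash(key) % n
        for step in range(n):                    # linear probing
            loc = (start + step) % n
            if self.two_step_acquire_attempt(loc, key) or self.keys[loc] == key:
                self.values[loc] = value
                return True
        return False                             # table is full

    def lookup(self, key):
        n = len(self.keys)
        start = hash(key) % n
        for step in range(n):
            loc = (start + step) % n
            k = self.keys[loc]
            if k is EMPTY:
                return None
            if k == key:
                return self.values[loc]
        return None
```

The unlocked first check keeps threads from queueing on slots that are already taken; the locked second check is the one that actually decides ownership.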
9. open addressing hash table
Probing alternate locations for vacancy
Good: cache friendly
Bad: poor memory efficiency
- performance degrades dramatically once usage grows beyond roughly 70% of capacity
- e.g., Google dense_hash_map wastes 50% memory by default.
10. Linear Hashing with linear Probing
Parallel Insert :
Case 1 : Two keys are inserted at different locations.
[Figure: slots H1 to H10; insert(x) hashes to H1 and insert(y) hashes to H5; both complete in pass 1]
11. Linear Hashing with linear Probing
Parallel Insert :
Case 2 : Two keys are inserted at different locations, but collisions occur.
[Figure: slots H1 to H10, some already holding s, u, and t; insert(x) hashes to H1 and insert(y) hashes to H5; probing takes passes 1 to 3]
12. Linear Hashing with linear Probing
Parallel Insert :
Case 3 : Two keys try to insert at the same location and contention occurs.
[Figure: slots H1 to H10; insert(x) and insert(y) both hash to H1; x and y are placed over passes 1 and 2]
13. Analysis : linear hashing
           Serial Approach                      Parallel Approach
Lookup     Best case: O(n), Worst case: O(n^2)  Best case: O(1), Worst case: O(n)
Insert     Best case: O(n), Worst case: O(n^2)  Best case: O(1), Worst case: O(n)
Removal    Best case: O(n), Worst case: O(n^2)  Best case: O(1), Worst case: O(n)
14. Cuckoo hashing
• Use two hash tables and two hash functions
• Each element will have exactly one “nest” (hash location) in each table
• Guarantee that any element will only ever exist in one of its “nests”.
• Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time.
15. Cuckoo hashing
Insertion :
1. Insert an element by finding one of its “nests” and putting it there
• This may evict another element!
2. Insert the evicted element into its *other* “nest”
• This may evict another element!
3. Repeat until an element lands in an empty “nest”
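The eviction loop can be sketched sequentially in Python. This is an illustration under stated assumptions: the class `CuckooTable`, the two hash functions derived from Python's `hash()`, and the `max_kicks` bound are all hypothetical choices, and `insert` assumes the key is not already present.

```python
class CuckooTable:
    """Two tables, two hash functions, one entry per nest (sequential sketch)."""
    def __init__(self, size=11, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Hypothetical hash functions, one per table.
        h = hash(key)
        return h % self.size if which == 0 else (h // self.size) % self.size

    def lookup(self, key):
        # At most two probes: one nest in each table.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        """Returns None on success, or the entry left without a nest."""
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Put the entry in its nest and pick up whatever was there.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return None
            which = 1 - which   # the evicted entry goes to its other nest
        return entry
```

With `size=11`, the keys 0, 11, and 22 all share nest 0 in the first table, so inserting them in that order evicts twice and still leaves every key reachable in two probes.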
17. Cuckoo hashing
Asymmetric cuckoo hashing :
• Choose one (the first) table to be larger than the other
– Improves the probability that we get a hit on the first lookup
– Only a minor slowdown on insert
18. Cuckoo hashing
Same Table cuckoo Hashing :
• We didn’t actually need two separate tables.
– It made the analysis much easier.
– But… In practice, we just need two hash functions.
19. Cuckoo hashing
Each bucket has b slots for items (b-way set-associative)
Each key is mapped to two random buckets
• stored in one of them
[Figure: buckets 0 to 8; key x maps to the two buckets hash1(x) and hash2(x)]
20. Predictable and fast lookup
• Lookup: read 2 buckets in parallel
• constant time in the worst case
[Figure: buckets 0 to 8; Lookup x reads the two candidate buckets of x]
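A small Python sketch of a b-way set-associative table shows why lookup is constant time: it inspects at most 2 × b slots. The names `BucketizedCuckoo`, `candidates`, and `insert_if_room`, the choice b = 4, and the two hash functions are assumptions. The sketch reads the two buckets one after the other rather than in parallel.

```python
B = 4  # slots per bucket (assumed value of b)

class BucketizedCuckoo:
    def __init__(self, num_buckets=9):
        self.buckets = [[] for _ in range(num_buckets)]

    def candidates(self, key):
        # Two hypothetical hash functions derived from Python's hash().
        n = len(self.buckets)
        h = hash(key)
        return h % n, (h // n) % n

    def lookup(self, key):
        # Worst case: 2 buckets x B slots, independent of the table size.
        for i in self.candidates(key):
            for k, v in self.buckets[i]:
                if k == key:
                    return v
        return None

    def insert_if_room(self, key, value):
        for i in self.candidates(key):
            if len(self.buckets[i]) < B:
                self.buckets[i].append((key, value))
                return True
        return False  # both buckets are full: a cuckoo move is needed
```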
21. Insert may need “cuckoo move”
• Insert: write to an empty slot in one of the possible locations
• Both are full? Move keys to alternate buckets
[Figure: buckets 0 to 8 holding keys a, b, c, e, f, k, n, r, s, x; Insert y with its possible locations marked]
22. Insert may need “cuckoo move”
• Insert: move keys to alternate buckets
• find a “cuckoo path” to an empty slot
• move hole backwards
[Figure: buckets 0 to 8; Insert y: keys a, b, and x move to alternate buckets along the cuckoo path, and y takes the freed slot]
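One way to find a cuckoo path and move the hole backwards is a breadth-first search over buckets, sketched below in Python. The helper names (`make_table`, `candidates`, `find_cuckoo_path`, `insert`) and the hash functions are hypothetical; the sketch is sequential and assumes the key is not already stored.

```python
from collections import deque

def make_table(num_buckets, slots_per_bucket):
    return [[None] * slots_per_bucket for _ in range(num_buckets)]

def candidates(key, n):
    # Two hypothetical hash functions derived from Python's hash().
    h = hash(key)
    return h % n, (h // n) % n

def find_cuckoo_path(buckets, key):
    """BFS for a list of (bucket, slot) positions that ends at an empty slot."""
    n = len(buckets)
    queue, seen = deque(), set()
    for b in candidates(key, n):
        if b not in seen:
            seen.add(b)
            queue.append((b, []))
    while queue:
        b, path = queue.popleft()
        for s, entry in enumerate(buckets[b]):
            if entry is None:
                return path + [(b, s)]
        # Bucket is full: each occupant could move to its alternate bucket.
        for s, (k, _) in enumerate(buckets[b]):
            c1, c2 = candidates(k, n)
            alt = c2 if b == c1 else c1
            if alt not in seen:
                seen.add(alt)
                queue.append((alt, path + [(b, s)]))
    return None

def insert(buckets, key, value):
    path = find_cuckoo_path(buckets, key)
    if path is None:
        return False
    # Move the hole backwards: start at the empty slot and shift each key
    # one hop, so no key is ever absent from the table.
    for i in range(len(path) - 1, 0, -1):
        (db, ds), (sb, ss) = path[i], path[i - 1]
        buckets[db][ds] = buckets[sb][ss]
        buckets[sb][ss] = None
    b, s = path[0]
    buckets[b][s] = (key, value)
    return True
```

Moving from the empty end of the path means each displaced key is written to its new slot before its old slot is reused.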
23. Case 1 : Inserting ‘a’ and ‘b’ in parallel: ‘b’ finds its place using the first hash function, and ‘a’ has to evict a value.
24. Case 2 : Inserting ‘a’ and ‘b’ in parallel: both ‘a’ and ‘b’ have to evict a value.
25. Algorithmic optimizations
• Lock after discovering a cuckoo path
• minimize critical sections
• But…
• Decreases parallelization
26. Previous approach: writer locks the table during the whole insert process
lock();
Search for a cuckoo path;
Cuckoo move and insert;
unlock();
All Insert operations of other threads are blocked
27. Case 3 : Inserting ‘a’ and ‘b’ in parallel: both ‘a’ and ‘b’ have to evict a value,
but their cuckoo paths overlap, so contention occurs on the shared path.
while(1) {
  Search for a cuckoo path;
  lock();
  Cuckoo move and insert;
  unlock();
  if(success)
    break;
}
This algorithm resolves contention on the path: the thread that acquires the lock first completes its insert, and the other thread searches again and retries.
28. Lock after discovering a cuckoo path
while(1) {
  Search for a cuckoo path; // no locking required
  lock();
  Cuckoo move and insert; // fails if a collision with another writer changed the path
  unlock();
  if(success)
    break;
}
Multiple Insert threads can look for cuckoo paths concurrently
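The retry loop above can be made concrete in Python as a sketch under simplifying assumptions: one slot per bucket, a single writer lock, and a greedy path search that follows each occupant to its alternate bucket. The class `OptimisticCuckoo` and its methods are hypothetical names. The sketch assumes keys are inserted only once, and `lookup` is not synchronized with concurrent moves.

```python
import threading

class OptimisticCuckoo:
    def __init__(self, n=16, max_path=8):
        self.slots = [None] * n            # one (key, value) per bucket
        self.lock = threading.Lock()       # single writer lock
        self.max_path = max_path

    def _cands(self, key):
        n, h = len(self.slots), hash(key)
        return h % n, (h // n) % n

    def _alt(self, key, b):
        c1, c2 = self._cands(key)
        return c2 if b == c1 else c1

    def _search_path(self, key):
        """Runs without the lock; records (bucket, occupant key or None)."""
        for start in self._cands(key):
            path, b = [], start
            for _ in range(self.max_path):
                entry = self.slots[b]
                if entry is None:
                    return path + [(b, None)]
                path.append((b, entry[0]))
                b = self._alt(entry[0], b)
        return None

    def _still_valid(self, path):
        for b, k in path:
            entry = self.slots[b]
            if (entry[0] if entry is not None else None) != k:
                return False
        return True

    def insert(self, key, value):
        while True:
            path = self._search_path(key)            # no locking required
            if path is None:
                return False                         # caller would resize or rehash
            with self.lock:
                if self._still_valid(path):          # no collision with another writer
                    for i in range(len(path) - 1, 0, -1):
                        self.slots[path[i][0]] = self.slots[path[i - 1][0]]
                        self.slots[path[i - 1][0]] = None
                    self.slots[path[0][0]] = (key, value)
                    return True
            # The path changed between search and lock: search again.

    def lookup(self, key):
        for b in self._cands(key):
            entry = self.slots[b]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None
```

Only the validation and the moves run inside the critical section; the path search, usually the longer part of an insert, runs concurrently across threads.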
29. Analysis : parallel cuckoo hashing
           Serial Approach                      Parallel Approach
Lookup     Best case: O(n), Worst case: O(n^2)  Best case: O(1), Worst case: O(n)
Insert     Best case: O(n), Worst case: O(n)    Best case: O(1), Worst case: O(n)
Removal    Best case: O(n), Worst case: O(n^2)  Best case: O(1), Worst case: O(n)
Cuckoo hashing with 2 hash functions performs well up to about 50% load.
Cuckoo hashing with 3 hash functions performs well up to about 90% load.
30. Lock-free Cuckoo Hashing
• The algorithm allows mutating operations to operate concurrently with
query ones and requires only single word compare-and-swap primitives.
• When an insertion triggers a sequence of key displacements, instead of
locking the whole cuckoo path, this algorithm breaks down the chain of
relocations into several single relocations which can be executed
independently and concurrently with other operations.
31. Lock-free Cuckoo Hashing
• Here, concurrent cuckoo hashing contains two hash tables (sub-tables),
which correspond to two independent hash functions.
• Each key can be stored at one of its two possible positions, one in each
sub-table called primary and secondary respectively.
32. Lock-free Cuckoo Hashing
• Problem :
• The original cuckoo approach inserts the new key to the primary sub-table
by evicting a collided key and re-inserting it to the other sub-table.
• This approach, however, causes the evicted key to be “nestless”, i.e. absent
from both sub-tables, until the re-insertion is completed.
• This might be an issue in concurrent environments: the “nestless” key is
unreachable by other concurrent operations which want to operate on it.
33. Lock-free Cuckoo Hashing
• Problem :
• Moving key problem :
• A key present in table[1] is relocated to table[0] while a search reads table[0] and then table[1], so the search can miss the key in both.
• To avoid such misses, search performs a second query round.
• Another issue with insertion operations is that a key can be inserted into both sub-tables simultaneously.
• Since concurrent insertions can operate independently on the two sub-tables, they can both succeed. This results in two existing instances of one key, possibly mapped to different values.
• To prevent this, one can use a lock during insertion to lock both slots, so that only one instance of a key is added to the table at a time.
34. Lock-free Cuckoo Hashing
• Searching :
• Searching for a key in the lock-free cuckoo hash table means querying for the existence of the key in both sub-tables.
• A key is available if it is found in one of them. A search operation starts with the first query round, reading from the possible slot in the primary sub-table and then in the secondary sub-table.
• If the key is found in one of them, the value mapped to the key is returned.
35. Search(x) : Case 1 : Value found in the primary sub-table.
[Figure: primary and secondary sub-tables; x is found at slot h1 = hash1(x) of the primary sub-table]
36. Search(x) : Case 2 : Value found in the secondary sub-table.
[Figure: primary and secondary sub-tables; x is found at slot h2 = hash2(x) of the secondary sub-table]
37. Lock-free Cuckoo Hashing
• Insert :
• Find :
• It examines both sub-tables to discover if two instances of the key exist in two
sub-tables. When the same key is found on both sub-tables, the one in the
secondary sub-table is deleted.
38. Find(x) :
Case 1 : If the key already exists, then update and return.
[Figure: x found at slot h1 = hash1(x) of the primary sub-table]
39. Find(x) :
Case 2 : If the same key is found in both sub-tables, the one in the secondary sub-table is deleted.
[Figure: x at slot h1 = hash1(x) of the primary sub-table and at slot h2 = hash2(x) of the secondary sub-table]
40. Find(x) :
Case 3 : Otherwise, return the items currently stored at the slots the key hashes to.
[Figure: y at slot h1 = hash1(x) of the primary sub-table and z at slot h2 = hash2(x) of the secondary sub-table; Find returns y and z]
41. Lock-free Cuckoo Hashing
• Insert :
• Insert a new value using CAS (Compare-And-Swap) primitives.
• CAS is a synchronization primitive available in most modern processors. It compares the content of a memory word to a given value and, only if they are the same, modifies the content of that word to a given new value.
42. Insert (x) :
Case 1 : If one of the two slots is empty, the new entry is inserted with a CAS.
[Figure: primary and secondary sub-tables with slots h1 = hash1(x) and h2 = hash2(x); one slot holds y and x is written to the empty one]
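Python has no hardware CAS, so the sketch below emulates one to illustrate the CAS semantics and Case 1 of Insert. The class `AtomicCell` and the function `try_insert` are hypothetical; the internal lock only stands in for the atomic instruction and is not part of the lock-free algorithm.

```python
import threading

class AtomicCell:
    """Emulated single-word CAS (assumption: a lock replaces the hardware primitive)."""
    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Compare the cell content to `expected`; only if they are the same
        # object, replace it with `new`. Reports whether the swap happened.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False

def try_insert(primary, secondary, h1, h2, entry):
    """Case 1: claim whichever of the two slots is empty with a CAS."""
    for table, i in ((primary, h1), (secondary, h2)):
        if table[i].compare_and_swap(None, entry):
            return True
    return False  # both slots occupied: relocation (Case 2) is needed
```

If two threads race for the same empty slot, exactly one CAS succeeds; the other thread sees a failure and moves on to its second slot.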
43. Insert (x) :
Case 2 : If both slots are occupied, a relocation process is initiated using the cuckoo path.
Cuckoo Path : s y k
[Figure: primary and secondary sub-tables holding keys such as a, b, c, e, f, n, r; s, y, and k are relocated along the cuckoo path so that x can be stored at H1 = hash1(x)]
44. Cuckoo hash : the cycle problem
Solution : Once a threshold is exceeded, use another hash function or extend the size of the hash table.
[Figure: inserting X triggers relocations that cycle through the keys S, T, U, V, Y, Z]
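A sequential Python sketch of the "extend the table" remedy: after a fixed number of relocations the table grows and every entry is re-inserted. Measuring the threshold in relocations (rather than time), the growth rule, and the names `GrowingCuckoo` and `_place` are assumptions. The sketch expects distinct keys.

```python
class GrowingCuckoo:
    def __init__(self, size=5, threshold=16):
        self.size, self.threshold = size, threshold
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Hypothetical hash functions; they change when `size` changes.
        h = hash(key)
        return h % self.size if which == 0 else (h // self.size) % self.size

    def _place(self, entry):
        """Returns None on success, else the entry left without a nest."""
        which = 0
        for _ in range(self.threshold):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return None
            which = 1 - which
        return entry  # threshold reached: probably a cycle

    def insert(self, key, value):
        pending = [(key, value)]
        while pending:
            homeless = self._place(pending.pop())
            if homeless is not None:
                # Extend the table and re-insert everything, including
                # the entry that could not be placed.
                old = [e for t in self.tables for e in t if e is not None]
                self.size = self.size * 2 + 1
                self.tables = [[None] * self.size, [None] * self.size]
                pending.extend(old + [homeless])

    def lookup(self, key):
        for which in (0, 1):
            e = self.tables[which][self._slot(which, key)]
            if e is not None and e[0] == key:
                return e[1]
        return None
```

With `size=5`, the keys 0, 25, and 50 compete for the same two nests, so the third insert hits the threshold and the table grows to 11 slots per sub-table.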
45. Analysis : lock-free cuckoo
           Serial Approach                         Parallel Approach
Lookup     Best case: O(n), Worst case: O(d*n)     Best case: O(1), Worst case: O(d)
Insert     Best case: O(n), Worst case: > O(d*n)   Best case: O(1), Worst case: > O(d)
Removal    Best case: O(n), Worst case: O(d*n)     Best case: O(1), Worst case: O(d)
where d > 2 is the number of hash functions and sub-tables
Concurrent cuckoo hash table
- high memory efficiency
- fast concurrent writes and reads
46. References
1. Eric L. Goodman, David J. Haglin, Chad Scherrer, Daniel Chavarría-Miranda, Jace Mogill, John Feo, “Hashing Strategies for the Cray XMT”.
2. Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012), lectured by Margaret Reid-Miller, 26 April 2012.
3. Xiaozhou Li, David G. Andersen, Michael Kaminsky, Michael J. Freedman, “Algorithmic Improvements for Fast Concurrent Cuckoo Hashing”, Princeton University, Carnegie Mellon University, Intel Labs.
4. Nhan Nguyen, Philippas Tsigas, “Lock-free Cuckoo Hashing”, Chalmers University of Technology, Gothenburg, Sweden.