Parallel Hashing Algorithms

Prepared By :
Akash Panchal (14MCEC17) & Nisarg Patel (14MCEC20)
@Nirma University, Ahmedabad

Concurrent hash table
•Indexing key-value objects
• Lookup(key)
• Insert(key, value)
• Delete(key)
•Fundamental building block for modern systems
• System applications (e.g., kernel caches)
• Concurrent user-level applications

Goal: memory-efficient and high-throughput
• Memory efficient (e.g., > 90% space utilized)
• Fast concurrent reads (scale with # of cores/threads)
• Fast concurrent writes (scale with # of cores/threads)

Separate chaining hash table
• Items hashed to the same bucket are chained in a linked list
[Figure: a lookup walks the chain of key-value (K, V) nodes in a bucket]

Acquiring a linked list node location
- Only one thread at a time should insert a new node.
- Find the end of the linked list, then lock out other threads.
- Make sure we are still at the end of the linked list.
- Then allocate a new node from the unfilled region, and finally link it in.
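
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: `ChainedBucket`, `Node`, `append`, and `items` are hypothetical names, a single lock per bucket stands in for whatever locking the original platform provides, and an ordinary object allocation stands in for taking a node from the unfilled region.

```python
import threading

class Node:
    def __init__(self, key, value):
        self.key, self.value, self.next = key, value, None

class ChainedBucket:
    """One bucket of a separate-chaining table (hypothetical helper)."""

    def __init__(self):
        self.head = Node(None, None)      # sentinel, so the list always has a tail
        self.lock = threading.Lock()      # assumption: one lock per bucket

    def append(self, key, value):
        while True:
            # Find the end of the linked list without holding the lock.
            tail = self.head
            while tail.next is not None:
                tail = tail.next
            # Lock out other threads, then make sure we are still at the end.
            with self.lock:
                if tail.next is None:
                    tail.next = Node(key, value)   # allocate and link in
                    return
            # Another thread appended first: walk to the new end and retry.

    def items(self):
        out, cur = [], self.head.next
        while cur is not None:
            out.append((cur.key, cur.value))
            cur = cur.next
        return out
```

The re-check under the lock is what keeps two threads from linking a node onto the same tail.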

Good: simple
Bad: poor cache locality
Bad: pointers cost space

open addressing hash table
Probing alternate locations for vacancy
e.g., linear/quadratic probing, double hashing
- Inserting a key:
  1) The bucket is empty.
     - Must acquire a location.
     - The TwoStepAcquireAttempt algorithm is used.
     - Insert the key.
  2) The bucket is already filled.
     - Find the next possible bucket using any probing method.

Procedure: TwoStepAcquireAttempt
Parameters: (location, key)
// First check without locking
if location is empty then
    lock(location)
    // Then check with the lock
    if location is still empty then
        Reserve location for key
        unlock(location)
        return true
    end if
    // Another thread modified the structure before we could acquire location
    unlock(location)
end if
return false // Caller needs to continue searching
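
A runnable Python sketch of the procedure, combined with linear probing, might look like this. The class name `ProbingTable`, the per-slot locks, and the use of Python's built-in `hash` are assumptions made for illustration; deletion is left out.

```python
import threading

EMPTY = None

class ProbingTable:
    def __init__(self, capacity):
        self.keys = [EMPTY] * capacity
        self.locks = [threading.Lock() for _ in range(capacity)]  # one lock per slot

    def two_step_acquire_attempt(self, location, key):
        # First check without locking.
        if self.keys[location] is EMPTY:
            with self.locks[location]:
                # Then check again while holding the lock.
                if self.keys[location] is EMPTY:
                    self.keys[location] = key      # reserve the location
                    return True
        return False                               # caller continues searching

    def insert(self, key):
        capacity = len(self.keys)
        start = hash(key) % capacity
        for step in range(capacity):               # linear probing
            location = (start + step) % capacity
            acquired = self.two_step_acquire_attempt(location, key)
            # The second test covers a key that is already stored here,
            # including one written by a racing thread a moment ago.
            if acquired or self.keys[location] == key:
                return location
        raise RuntimeError("table is full")
```

The unlocked first check keeps threads from queuing on slots that are obviously taken; the locked second check is what makes the reservation safe.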

Open addressing hash table
Good: cache friendly
Bad: poor memory efficiency
- performance degrades dramatically once usage grows beyond roughly 70% of capacity
- e.g., Google dense_hash_map wastes 50% memory by default.

Linear Hashing with Linear Probing
Parallel insert into buckets H1 to H10:
Case 1: Two keys are inserted at different locations: H1 = insert(x), H5 = insert(y).
[Figure: both inserts complete in pass 1]

Case 2: Two keys are inserted at different locations, but a collision occurs.
[Figure: keys s, u, and t; passes 1 to 3]

Case 3: Two keys try to insert at the same location, and contention occurs.
[Figure: keys x and y; passes 1 and 2]

Analysis: linear hashing
Operation   Serial approach (best / worst)   Parallel approach (best / worst)
Lookup      O(n) / O(n^2)                    O(1) / O(n)
Insert      O(n) / O(n^2)                    O(1) / O(n)
Removal     O(n) / O(n^2)                    O(1) / O(n)

Cuckoo hashing
• Use two hash tables and two hash functions
• Each element will have exactly one “nest” (hash location) in each table
• Guarantee that any element will only ever exist in one of its “nests”.
• Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time.

Cuckoo hashing
Insertion :
1. Insert an element by finding one of its “nests” and putting it there
• This may evict another element!
2. Insert the evicted element into its *other* “nest”
• This may evict another element, in which case step 2 repeats for it

[Figure: sequential cuckoo hashing; ‘X’ is inserted by moving ‘Y’ and ‘Z’]
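
A minimal sequential sketch of the eviction chain, assuming two tables of equal size. The class name `CuckooTable`, the two illustrative hash functions in `_slot`, and the `max_kicks` limit are assumptions, not part of the slides.

```python
class CuckooTable:
    def __init__(self, size=11, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks                    # assumed eviction limit
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Two illustrative hash functions; any independent pair would do.
        h = hash(key)
        return h % self.size if which == 0 else (h // self.size) % self.size

    def lookup(self, key):
        # At most two probes: one "nest" per table.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        """Return None on success, or the entry left without a nest."""
        for which in (0, 1):                          # update in place if present
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return None
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(which, entry[0])
            # Put the entry in its nest and pick up whatever was there.
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return None
            which = 1 - which                         # evicted entry tries its other nest
        return entry                                  # caller should rehash or grow
```

Returning the displaced entry rather than a plain failure flag keeps the element from being lost when the eviction limit is reached.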

Cuckoo hashing
Asymmetric cuckoo hashing :
• Choose one (the first) table to be larger than
the other
– Improves the probability that we get a hit on the
first lookup
– Only a minor slowdown on insert

Cuckoo hashing
Same Table cuckoo Hashing :
• We didn’t actually need two separate tables.
– It made the analysis much easier.
– But… In practice, we just need two hash functions.

Cuckoo hashing
Each bucket has b slots for items (b-way set-associative)
Each key is mapped to two random buckets
• stored in one of them
[Figure: buckets 0 to 8; key x maps to buckets hash1(x) and hash2(x)]

Predictable and fast lookup
• Lookup: read 2 buckets in parallel
• constant time in the worst case
[Figure: Lookup x reads its two candidate buckets among buckets 0 to 8]
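
As a rough sketch, a lookup over a b-way set-associative table touches at most two buckets of b slots each. The function name and the representation of a bucket as a list of `(key, value)` pairs or `None` are assumptions; the two buckets are read one after the other here rather than in parallel.

```python
def lookup(buckets, key, hash1, hash2):
    """Check the key's two candidate buckets; return its value or None."""
    n = len(buckets)
    for index in (hash1(key) % n, hash2(key) % n):
        for slot in buckets[index]:          # b slots per bucket
            if slot is not None and slot[0] == key:
                return slot[1]
    return None
```

The cost is bounded by 2 * b slot comparisons no matter how full the table is, which is the constant worst case the slide refers to.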

Insert may need “cuckoo move”
• Insert: write to an empty slot in one of the two buckets
• Both are full? Move keys to alternate buckets
[Figure: Insert y into buckets 0 to 8; both candidate buckets are full, and the resident keys x, a, and b each have other possible locations]

Insert may need “cuckoo move”
• Insert: move keys to alternate buckets
• find a “cuckoo path” to an empty slot
• move hole backwards
[Figure: Insert y into buckets 0 to 8; keys a, b, and x are moved along the cuckoo path to make room for y]
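
One way to sketch the path search and the backward moves is shown below. Breadth-first search is an assumption here (the slide only says to find a cuckoo path), and the helper names and the `max_depth` limit are hypothetical.

```python
from collections import deque

def alt_bucket(key, bucket, n, hash1, hash2):
    a, b = hash1(key) % n, hash2(key) % n
    return b if bucket == a else a

def find_cuckoo_path(buckets, key, hash1, hash2, max_depth=5):
    """Return [(bucket, slot), ...] ending at an empty slot, or None."""
    n = len(buckets)
    starts = dict.fromkeys([hash1(key) % n, hash2(key) % n])  # ordered, no duplicates
    queue = deque((start, []) for start in starts)
    while queue:
        bucket, path = queue.popleft()
        for slot, entry in enumerate(buckets[bucket]):
            if entry is None:
                return path + [(bucket, slot)]
        if len(path) < max_depth:
            # The bucket is full: each resident key could move to its other bucket.
            for slot, entry in enumerate(buckets[bucket]):
                nxt = alt_bucket(entry[0], bucket, n, hash1, hash2)
                queue.append((nxt, path + [(bucket, slot)]))
    return None

def insert(buckets, key, value, hash1, hash2):
    path = find_cuckoo_path(buckets, key, hash1, hash2)
    if path is None:
        return False
    # Move the hole backwards: start at the empty slot at the end of the path.
    for src, dst in zip(reversed(path[:-1]), reversed(path[1:])):
        buckets[dst[0]][dst[1]] = buckets[src[0]][src[1]]
    first_bucket, first_slot = path[0]
    buckets[first_bucket][first_slot] = (key, value)
    return True
```

Because each key is copied to its destination before its old slot is overwritten, no key is ever absent from the table during the moves.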

Case 1: Inserting ‘a’ and ‘b’ in parallel; ‘b’ finds a free slot with its first hash function, and ‘a’ has to evict a value.

Case 2: Inserting ‘a’ and ‘b’ in parallel; both ‘a’ and ‘b’ have to evict a value.

Algorithmic optimizations
• Lock after discovering a cuckoo path
• minimize critical sections
• But…
• Decreases parallelization

Previous approach: writer locks the table
during the whole insert process
lock();
Search for a cuckoo path;
Cuckoo move and insert;
unlock();
All Insert operations of other threads are blocked

Case 3: Inserting ‘a’ and ‘b’ in parallel; both ‘a’ and ‘b’ have to evict a value,
but they share the same cuckoo path, so contention occurs on the path.
while (1) {
    lock();
    if (success) {
        unlock();
        break;
    }
    unlock();   // release the lock before retrying
}
This loop avoids contention on the path: whichever thread arrives first
completes the process, and the other thread retries after some time.

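Lock after discovering a cuckoo path
while (1) {
    Search for a cuckoo path;   // no locking required
    lock();
    Cuckoo move and insert while the path is still valid;
    if (success) {
        unlock();
        break;
    }
    unlock();   // path was invalidated by another thread: retry
}
Multiple Insert threads can look for cuckoo paths concurrently.
[Figure: two concurrent cuckoo paths collide]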
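
The idea can be sketched in Python as follows, under several simplifying assumptions: one slot per bucket, a single table-wide lock, a greedy path walk instead of a full search, no check for duplicate keys, and lookups that also take the lock. All names (`OptimisticCuckoo`, `_search_path`, `_path_valid`, `max_len`) are hypothetical.

```python
import threading

class OptimisticCuckoo:
    def __init__(self, n=64, max_len=16):
        self.n, self.max_len = n, max_len
        self.slots = [None] * n
        self.lock = threading.Lock()

    def _buckets(self, key):
        h = hash(key)
        return h % self.n, (h // self.n) % self.n

    def _search_path(self, key):
        """No locking: return [(index, key_seen), ...] ending at an empty slot."""
        for start in self._buckets(key):
            path, index = [], start
            for _ in range(self.max_len):
                entry = self.slots[index]
                if entry is None:
                    return path + [(index, None)]
                path.append((index, entry[0]))
                a, b = self._buckets(entry[0])
                index = b if index == a else a      # resident key's other bucket
        return None

    def _path_valid(self, path):
        # Every slot must still hold the key (or the hole) seen during the search.
        for index, seen_key in path:
            entry = self.slots[index]
            if (None if entry is None else entry[0]) != seen_key:
                return False
        return True

    def insert(self, key, value):
        while True:
            path = self._search_path(key)           # no locking required
            if path is None:
                return False                        # no path found: table too full
            with self.lock:                         # short critical section
                if self._path_valid(path):
                    for src, dst in zip(reversed(path[:-1]), reversed(path[1:])):
                        self.slots[dst[0]] = self.slots[src[0]]
                    self.slots[path[0][0]] = (key, value)
                    return True
            # Another thread changed the path in the meantime: search again.

    def lookup(self, key):
        with self.lock:
            for index in self._buckets(key):
                entry = self.slots[index]
                if entry is not None and entry[0] == key:
                    return entry[1]
        return None
```

The path search, which is the long part of an insert, runs outside the lock; only validation and the moves are serialized.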

Analysis: parallel cuckoo hashing
Operation   Serial approach (best / worst)   Parallel approach (best / worst)
Lookup      O(n) / O(n^2)                    O(1) / O(n)
Insert      O(n) / O(n)                      O(1) / O(n)
Removal     O(n) / O(n^2)                    O(1) / O(n)
Cuckoo with 2 hash functions performs well until 50% load.
Cuckoo with 3 hash functions performs well until 90% load.

Lock-free Cuckoo Hashing
• The algorithm allows mutating operations to operate concurrently with
query ones and requires only single word compare-and-swap primitives.
• When an insertion triggers a sequence of key displacements, instead of
locking the whole cuckoo path, this algorithm breaks down the chain of
relocations into several single relocations which can be executed
independently and concurrently with other operations.

• Here, concurrent cuckoo hashing contains two hash tables (sub-tables),
which correspond to two independent hash functions.
• Each key can be stored at one of its two possible positions, one in each
sub-table called primary and secondary respectively.

• Problem :
• The original cuckoo approach inserts the new key to the primary sub-table
by evicting a collided key and re-inserting it to the other sub-table.
• This approach, however, causes the evicted key to be “nestless”, i.e. absent
from both sub-tables, until the re-insertion is completed.
• This might be an issue in concurrent environments: the “nestless” key is
unreachable by other concurrent operations which want to operate on it.

• Problem :
• Moving key problem :
• A key present in table[1] is relocated to table[0] while a search reads
from table[0] and then table[1], so the search can miss the key in both.
• To avoid such misses, search performs a second query round.
• Another issue with insertion operations is that a key can be inserted into both
sub-tables simultaneously.
• Since concurrent insertions can operate independently on the two sub-tables, they can
both succeed. This results in two existing instances of one key, possibly mapped to
different values.
• To prevent this, one can use a lock during insertion to lock both
slots so that only one instance of a key is successfully added to the table at a time.

• Searching :
• Searching for a key in our lock-free cuckoo hash table includes querying for
the existence of the key in two sub-tables.
• A key is available if it is found in one of them. A search operation starts
with the first query round by reading from the possible slots in the primary
sub-table and then in the secondary sub-table.
• If the key is found in one of them, the value mapped to the key is returned.

Search(x): Case 1: Value found in the primary sub-table.
[Figure: x is stored in the primary sub-table at slot h1 = hash1(x)]

Search(x): Case 2: Value found in the secondary sub-table.
[Figure: x is stored in the secondary sub-table at slot h2 = hash2(x)]

• Insert :
• Find :
• It examines both sub-tables to discover if two instances of the key exist in two
sub-tables. When the same key is found on both sub-tables, the one in the
secondary sub-table is deleted.

Find(x):
Case 1: If the key already exists, update it and return.
[Figure: x found at h1 = hash1(x)]

Find(x):
Case 2: If the same key is found in both sub-tables, the one in the
secondary sub-table is deleted.
[Figure: x found at both h1 = hash1(x) and h2 = hash2(x)]

Find(x):
Case 3: Otherwise, return the items currently stored at the possible
slots the key hashes to.
[Figure: y is stored at hash1(x) and z at hash2(x); Find returns y and z]

• Insert :
• Insert a new value using CAS (Compare-And-Swap) primitives.
• CAS is a synchronization primitive available in most modern processors. It
compares the content of a memory word to a given value and, only if they
are the same, modifies the content of that word to a given new value.
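
Python has no hardware CAS, so the sketch below only simulates its semantics: the small lock inside `AtomicSlot` stands in for the atomic instruction. `AtomicSlot` and `try_insert` are hypothetical names, and only the simple case of claiming an empty slot is shown.

```python
import threading

class AtomicSlot:
    """Simulated single-word CAS; the guard lock stands in for the hardware primitive."""

    def __init__(self):
        self._value = None
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:      # compare with the given value
                self._value = new            # swap only if they are the same
                return True
            return False

def try_insert(primary, secondary, h1, h2, entry):
    """Claim an empty slot with CAS; False means both slots are occupied."""
    for table, index in ((primary, h1), (secondary, h2)):
        slot = table[index]
        if slot.load() is None and slot.compare_and_swap(None, entry):
            return True
    return False
```

If two threads race for the same empty slot, exactly one CAS succeeds and the loser moves on to the other slot.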

Insert(x):
Case 1: If one of the two slots is empty, the new entry is
inserted with a CAS.
[Figure: x is placed at h1 = hash1(x) or at h2 = hash2(x)]

Insert(x):
Case 2: If both slots are occupied, the relocation
process is initiated along the cuckoo path.
[Figure: x hashes to H1 = hash1(x); keys s, y, and k are relocated along the cuckoo path]

Cuckoo hashing: the cycle problem
[Figure: inserting X triggers displacements among S, T, U, V, Y, and Z that form a cycle]
Solution: once a threshold is exceeded, use another hash function or extend the
size of the hash table.

Analysis: lock-free cuckoo
Operation   Serial approach (best / worst)   Parallel approach (best / worst)
Lookup      O(n) / O(d*n)                    O(1) / O(d)
Insert      O(n) / > O(d*n)                  O(1) / > O(d)
Removal     O(n) / O(d*n)                    O(1) / O(d)
where d > 2 is the number of hash functions and sub-tables.

Concurrent cuckoo hash table
- high memory efficiency
- fast concurrent writes and reads

