2007 JavaOne Conference

A Lock-Free Hash Table

A Lock-Free Hash Table

Dr. Cliff Click
Chief JVM Architect & Distinguish...
Hash Tables
www.azulsystems.com

•
•
•
•

Constant-Time Key-Value Mapping
Fast arbitrary function
Extendable, defined at r...
Popular Java Implementations
www.azulsystems.com

• Java's HashTable
─ Single threaded; scaling bottleneck

• HashMap

─ F...
A Lock-Free Hash Table
www.azulsystems.com

• No locks, even during table resize
No spin-locks
No blocking while holding l...
A Faster Hash Table
www.azulsystems.com

• Slightly faster than j.u.c for 99% reads < 32 cpus
• Faster with more cpus (2x ...
Agenda
www.azulsystems.com

•
•
•
•
•
•

|

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Per...
Some “Uninteresting” Details
www.azulsystems.com

• Hashtable: A collection of Key/Value Pairs
• Works with any collection...
“Uninteresting” Details
www.azulsystems.com

• Closed Power-of-2 Hash Table
─ Reprobe on collision
─ Stride-1 reprobe: bet...
Example get() code
www.azulsystems.com

idx = hash = key.hashCode();
while( true ) {

// reprobing loop

idx &= (size-1);
...
“Uninteresting” Details
www.azulsystems.com

• Could use prime table + MOD
─ Better hash spread, fewer reprobes
─ But MOD ...
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Perfor...
Ordering and Correctness
www.azulsystems.com

• How to show table mods correct?
─ put, putIfAbsent, change, delete, etc.

...
State-Based Reasoning
www.azulsystems.com

• Define all {Key,Value} states and transitions
• Don't Care about memory order...
Valid States
www.azulsystems.com

• A Key slot is:

─ null – empty
─ K – some Key; can never change again

• A Value slot ...
State Machine
www.azulsystems.com

{null,null}

{K,T}

Empty

deleted key
insert

delete

insert
{K,null}
Partially insert...
Example put(key,newval) code:
www.azulsystems.com

idx = hash = key.hashCode();
while( true ) {

// Key-Claim stanza

idx ...
Example put(key,newval) code
www.azulsystems.com

// State: {key,?}
oldval = get_val(idx);

// State: {key,oldval}

// Tra...
Some Things to Notice
www.azulsystems.com

• Once a Key is set, it never changes

─ No chance of returning Value for wrong...
Some Things to Notice
www.azulsystems.com

• There is no machine-wide coherent State!
• Nobody guaranteed to read the same...
A Slightly Stronger Guarantee
www.azulsystems.com

• Probably want “happens-before” on Values
─ java.util.concurrent provi...
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Perfor...
Resizing The Table
www.azulsystems.com

• Need to resize if table gets full
• Or just re-probing too often
• Resize copies...
Resizing
www.azulsystems.com

• Expand State Machine
• Side-effect: mid-resize is a valid State
• Means resize is:

─ Conc...
get/put during Resize
www.azulsystems.com

• get() works on the old table
─ Unless see a sentinel

• put() or other mod mu...
New State: 'use new table' Sentinel
www.azulsystems.com

• X: sentinel used during table-copy

─ Means: not in old table, ...
State Machine – old table
www.azulsystems.com

{X,null}

kill

{null,null}

{K,T}

Empty
insert

check newer table

Delete...
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

| 27
©2003 Azul Systems, Inc.

{X,null}
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

{X,null}

dead or partially inserted
{K,T/null}

| 28...
State Machine: Copy One Pair
www.azulsystems.com

empty
{null,null}

{X,null}

dead or partially inserted
{K,T/null}

{K,X...
Copying Old To New
www.azulsystems.com

• New States V', T' – primed versions of V,T

─ Prime'd values in new table copied...
New States: Prime'd
www.azulsystems.com

• A Key slot is:
─ null, K, X

• A Value slot is:

─ null, T, V, X
─ T',V' – prim...
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K...
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K...
State Machine – new table
www.azulsystems.com

kill

{null,null}
insert

{K,T}
{K,null}

Deleted key

Partially inserted
K...
State Machine: Copy One Pair
www.azulsystems.com

{K,V1}
read V1

K,V' in new table
X in old table

{K,X}

old
new
{K,V'x}...
Some Things to Notice
www.azulsystems.com

• Old value could be V or T

─ or V' or T' (if nested resize in progress)

• Sk...
Agenda
www.azulsystems.com

•
•
•
•
•
•

Motivation
“Uninteresting” Hash Table Details
State-Based Reasoning
Resize
Perfor...
Microbenchmark
www.azulsystems.com

• Measure insert/lookup/remove of Strings
• Tight loop: no work beyond HashTable itsel...
AMD 2.4GHz – 2 (ht) cpus
www.azulsystems.com

1K Table

1M Table
30

30

NB-99

25

CHM-99

20

M-ops/sec

20

M-ops/sec

...
Niagara – 8x4 cpus
www.azulsystems.com

1K Table

1M Table
80

70

70

60

60

CHM-99, NB-99

50

M-ops/sec

M-ops/sec

80...
Azul Vega1 – 384 cpus
www.azulsystems.com

1K Table

1M Table

500

500

NB-99
400

300

300

M-ops/sec

M-ops/sec

400

C...
Azul Vega2 – 768 cpus
www.azulsystems.com

1K Table

1M Table

1200

1200

NB-99

1000

1000

800

CHM-99

600

M-ops/sec
...
Summary
www.azulsystems.com

•
•
•
•
●

●
●

A faster lock-free HashTable
Faster for more CPUs
Much faster for higher tabl...
Summary
www.azulsystems.com

●

●

●

Obvious future work:
● Tools to check states
● Tools to write code
Seems applicable ...
#1 Platform for
Business Critical Java™

WWW.AZULSYSTEMS.COM
Thank You
Upcoming SlideShare
Loading in...5
×

Lock-Free, Wait-Free Hash Table

443

Published on

In this presentation, Dr. Cliff Click describes a totally lock-free hashtable with extremely low-cost and near perfect scaling. Readers pay no more than HashMap readers: just the cost of computing the hash, loading and comparing the key, and returning the value. Writers must use AtomicUpdate instead of a simple assignment but otherwise pay the same as readers. In particular, there is no required order between loads and stores; correctness is assured, no matter how the hardware orders memory operations. A state-based technique demonstrates the correctness of the algorithm. This novel approach is very straightforward and much easier to understand than the usual "happens-before" memory-order-based reasoning.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
443
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
13
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Lock-Free, Wait-Free Hash Table

  1. 1. 2007 JavaOne Conference A Lock-Free Hash Table A Lock-Free Hash Table Dr. Cliff Click Chief JVM Architect & Distinguished Engineer blogs.azulsystems.com/cliff Azul Systems May 8, 2007
  2. 2. Hash Tables www.azulsystems.com • • • • Constant-Time Key-Value Mapping Fast arbitrary function Extendable, defined at runtime Used for symbol tables, DB caching, network access, url caching, web content, etc • Crucial for Large Business Applications ─ > 1MLOC • Used in Very heavily multi-threaded apps ─ > 1000 threads | 2 ©2003 Azul Systems, Inc.
  3. 3. Popular Java Implementations www.azulsystems.com • Java's HashTable ─ Single threaded; scaling bottleneck • HashMap ─ Faster but NOT multi-thread safe • java.util.concurrent.HashMap ─ Striped internal locks; 16-way the default • Azul, IBM, Sun sell machines >100cpus • Azul has customers using all cpus in same app • Becomes a scaling bottleneck! | 3 ©2003 Azul Systems, Inc.
  4. 4. A Lock-Free Hash Table www.azulsystems.com • No locks, even during table resize No spin-locks No blocking while holding locks All CAS spin-loops bounded Make progress even if other threads die.... • Requires atomic update instruction: CAS (Compare-And-Swap), LL/SC (Load-Linked/Store-Conditional, PPC only) or similar • Uses sun.misc.Unsafe for CAS ─ ─ ─ ─ | 4 ©2003 Azul Systems, Inc.
  5. 5. A Faster Hash Table www.azulsystems.com • Slightly faster than j.u.c for 99% reads < 32 cpus • Faster with more cpus (2x faster) ─ Even with 4096-way striping ─ 10x faster with default striping • 3x Faster for 95% reads (30x vs default) • 8x Faster for 75% reads (100x vs default) • Scales well up to 768 cpus, 75% reads ─ Approaches hardware bandwidth limits | 5 ©2003 Azul Systems, Inc.
  6. 6. Agenda www.azulsystems.com • • • • • • | Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A 6 ©2003 Azul Systems, Inc.
  7. 7. Some “Uninteresting” Details www.azulsystems.com • Hashtable: A collection of Key/Value Pairs • Works with any collection • Scaling, locking, bottlenecks of the collection management responsibility of that collection • Must be fast or O(1) effects kill you • Must be cache-aware • I'll present a sample Java solution ─ But other solutions can work, make sense | 7 ©2003 Azul Systems, Inc.
  8. 8. “Uninteresting” Details www.azulsystems.com • Closed Power-of-2 Hash Table ─ Reprobe on collision ─ Stride-1 reprobe: better cache behavior • Key & Value on same cache line • Hash memoized ─ Should be same cache line as K + V ─ But hard to do in pure Java • No allocation on get() or put() • Auto-Resize | 8 ©2003 Azul Systems, Inc.
  9. 9. Example get() code www.azulsystems.com idx = hash = key.hashCode(); while( true ) { // reprobing loop idx &= (size-1); // limit idx to table size k = get_key(idx); // start cache miss early h = get_hash(idx); // memoized hash if( k == key || (h == hash && key.equals(k)) ) return get_val(idx);// return matching value if( k == null ) return null; idx++; } | 9 ©2003 Azul Systems, Inc. // reprobe
  10. 10. “Uninteresting” Details www.azulsystems.com • Could use prime table + MOD ─ Better hash spread, fewer reprobes ─ But MOD is 30x slower than AND • Could use open table put() requires allocation Follow 'next' pointer instead of reprobe Each 'next' is a cache miss Lousy hash -> linked-list traversal • Could put Key/Value/Hash on same cache line • Other variants possible, interesting ─ ─ ─ ─ | 10 ©2003 Azul Systems, Inc.
  11. 11. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 11 ©2003 Azul Systems, Inc.
  12. 12. Ordering and Correctness www.azulsystems.com • How to show table mods correct? ─ put, putIfAbsent, change, delete, etc. • Prove via: fencing, memory model, load/store • • • • ordering, “happens-before”? Instead prove* via state machine Define all possible {Key,Value} states Define Transitions, State Machine Show all states “legal” *Warning: hand-wavy proof follows | 12 ©2003 Azul Systems, Inc.
  13. 13. State-Based Reasoning www.azulsystems.com • Define all {Key,Value} states and transitions • Don't Care about memory ordering: ─ get() can read Key, Value in any order ─ put() can change Key, Value in any order ─ put() must use CAS to change Key or Value ─ But not double-CAS • No fencing required for correctness! ─ (sometimes stronger guarantees are wanted and will need fencing) • Proof is simple! | 13 ©2003 Azul Systems, Inc.
  14. 14. Valid States www.azulsystems.com • A Key slot is: ─ null – empty ─ K – some Key; can never change again • A Value slot is: ─ null – empty ─ T – tombstone ─ V – some Values • A state is a {Key,Value} pair • A transition is a successful CAS | 14 ©2003 Azul Systems, Inc.
  15. 15. State Machine www.azulsystems.com {null,null} {K,T} Empty deleted key insert delete insert {K,null} Partially inserted K/V pair {null,T/V} Partially inserted K/V pair Reader-only state | 15 ©2003 Azul Systems, Inc. change {K,V} Standard K/V pair
  16. 16. Example put(key,newval) code: www.azulsystems.com idx = hash = key.hashCode(); while( true ) { // Key-Claim stanza idx &= (size-1); k = get_key(idx); // State: {k,?} if( k == null && // {null,?} -> {key,?} CAS_key(idx,null,key) ) break; h = get_hash(idx); // State: {key,?} // get memoized hash if( k == key || (h == hash && key.equals(k)) ) break; idx++; } | 16 ©2003 Azul Systems, Inc. // State: {key,?} // reprobe
  17. 17. Example put(key,newval) code www.azulsystems.com // State: {key,?} oldval = get_val(idx); // State: {key,oldval} // Transition: {key,oldval} -> {key,newval} if( CAS_val(idx,oldval,newval) ) { // Transition worked ... // Adjust size } else { // Transition failed; oldval has changed // We can act “as if” our put() worked but // was immediately stomped over } return oldval; | 17 ©2003 Azul Systems, Inc.
  18. 18. Some Things to Notice www.azulsystems.com • Once a Key is set, it never changes ─ No chance of returning Value for wrong Key ─ Means Keys leak; table fills up with dead Keys ─ Fix in a few slides... • No ordering guarantees provided! ─ Bring Your Own Ordering/Synchronization • Weird {null,V} state meaningful but uninteresting ─ Means reader got an empty key and so missed ─ But possibly prefetched wrong Value | 18 ©2003 Azul Systems, Inc.
  19. 19. Some Things to Notice www.azulsystems.com • There is no machine-wide coherent State! • Nobody guaranteed to read the same State ─ Except on the same CPU with no other writers • No need for it either • Consider degenerate case of a single Key • Same guarantees as: ─ single shared global variable ─ many readers & writers, no synchronization ─ i.e., darned little | 19 ©2003 Azul Systems, Inc.
  20. 20. A Slightly Stronger Guarantee www.azulsystems.com • Probably want “happens-before” on Values ─ java.util.concurrent provides this • Similar to declaring that shared global 'volatile' • Things written into a Value before put() ─ Are guaranteed to be seen after a get() • Requires st/st fence before CAS'ing Value ─ “free” on Sparc, X86 • Requires ld/ld fence after loading Value ─ “free” on Azul | 20 ©2003 Azul Systems, Inc.
  21. 21. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 21 ©2003 Azul Systems, Inc.
  22. 22. Resizing The Table www.azulsystems.com • Need to resize if table gets full • Or just re-probing too often • Resize copies live K/V pairs ─ Doubles as cleanup of dead Keys ─ Resize (“cleanse”) after any delete ─ Throttled, once per GC cycle is plenty often • Alas, need fencing, 'happens before' • Hard bit for concurrent resize & put(): ─ Must not drop the last update to old table | 22 ©2003 Azul Systems, Inc.
  23. 23. Resizing www.azulsystems.com • Expand State Machine • Side-effect: mid-resize is a valid State • Means resize is: ─ Concurrent – readers can help, or just read&go ─ Parallel – all can help ─ Incremental – partial copy is OK • Pay an extra indirection while resize in progress ─ So want to finish the job eventually • Stacked partial resizes OK, expected | 23 ©2003 Azul Systems, Inc.
  24. 24. get/put during Resize www.azulsystems.com • get() works on the old table ─ Unless see a sentinel • put() or other mod must use new table • Must check for new table every time ─ Late writes to old table 'happens before' resize • Copying K/V pairs is independent of get/put • Copy has many heuristics to choose from: ─ All touching threads, only writers, unrelated background thread(s), etc | 24 ©2003 Azul Systems, Inc.
  25. 25. New State: 'use new table' Sentinel www.azulsystems.com • X: sentinel used during table-copy ─ Means: not in old table, check new • A Key slot is: ─ null, K ─ X – 'use new table', not any valid Key ─ null → K OR null → X • A Value slot is: ─ null, T, V ─ X – 'use new table', not any valid Value ─ null → {T,V}* → X | 25 ©2003 Azul Systems, Inc.
  26. 26. State Machine – old table www.azulsystems.com {X,null} kill {null,null} {K,T} Empty insert check newer table Deleted key kill delete insert {K,null} {K,X} copy {K,V} into newer table {null,T/V/X} Partially inserted K/V pair | 26 ©2003 Azul Systems, Inc. change {K,V} Standard K/V pair States {X,T/V/X} not possible
  27. 27. State Machine: Copy One Pair www.azulsystems.com empty {null,null} | 27 ©2003 Azul Systems, Inc. {X,null}
  28. 28. State Machine: Copy One Pair www.azulsystems.com empty {null,null} {X,null} dead or partially inserted {K,T/null} | 28 ©2003 Azul Systems, Inc. {K,X}
  29. 29. State Machine: Copy One Pair www.azulsystems.com empty {null,null} {X,null} dead or partially inserted {K,T/null} {K,X} alive, but old {K,V1} {K,X} old table new table {K,V2} | 29 ©2003 Azul Systems, Inc. {K,V2}
  30. 30. Copying Old To New www.azulsystems.com • New States V', T' – primed versions of V,T ─ Prime'd values in new table copied from old ─ Non-prime in new table is recent put() ─ “happens after” any prime'd value ─ Prime allows 2-phase commit ─ Engineering: wrapper class (Java), steal bit (C) • Must be sure to copy late-arriving old-table write • Attempt to copy atomically ─ May fail & copy does not make progress ─ But old, new tables not damaged | 30 ©2003 Azul Systems, Inc.
  31. 31. New States: Prime'd www.azulsystems.com • A Key slot is: ─ null, K, X • A Value slot is: ─ null, T, V, X ─ T',V' – primed versions of T & V ─ Old things copied into the new table ─ “2-phase commit” ─ null → {T',V'}* → {T,V}* → X • State Machine again... | 31 ©2003 Azul Systems, Inc.
  32. 32. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 32 ©2003 Azul Systems, Inc.
  33. 33. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 33 ©2003 Azul Systems, Inc.
  34. 34. State Machine – new table www.azulsystems.com kill {null,null} insert {K,T} {K,null} Deleted key Partially inserted K/V pair {K,X} copy {K,V} into newer table {K,T'/V'} {null,T/T'/V/V'/X} kill insert copy in from older table check newer table delete Empty {X,null} {K,V} Standard K/V pair States {X,T/T'/V/V'/X} not possible | 34 ©2003 Azul Systems, Inc.
  35. 35. State Machine: Copy One Pair www.azulsystems.com {K,V1} read V1 K,V' in new table X in old table {K,X} old new {K,V'x} read V'x {K,V'1} partial copy Fence Fence | 35 ©2003 Azul Systems, Inc. {K,V'1} {K,V1} copy complete
  36. 36. Some Things to Notice www.azulsystems.com • Old value could be V or T ─ or V' or T' (if nested resize in progress) • Skip copy if new Value is not prime'd ─ Means recent put() overwrote any old Value • If CAS into new fails ─ Means either put() or other copy in progress ─ So this copy can quit • Any thread can see any state at any time ─ And CAS to the next state | 36 ©2003 Azul Systems, Inc.
  37. 37. Agenda www.azulsystems.com • • • • • • Motivation “Uninteresting” Hash Table Details State-Based Reasoning Resize Performance Q&A | 37 ©2003 Azul Systems, Inc.
  38. 38. Microbenchmark www.azulsystems.com • Measure insert/lookup/remove of Strings • Tight loop: no work beyond HashTable itself and test harness (mostly RNG) • “Guaranteed not to exceed” numbers • All fences; full ConcurrentHashMap semantics • Variables: ─ 99% get, 1% put (typical cache) vs 75 / 25 ─ Dual Athalon, Niagara, Azul Vega1, Vega2 ─ Threads from 1 to 800 ─ NonBlocking vs 4096-way ConcurrentHashMap ─ 1K entry table vs 1M entry table | 38 ©2003 Azul Systems, Inc.
  39. 39. AMD 2.4GHz – 2 (ht) cpus www.azulsystems.com 1K Table 1M Table 30 30 NB-99 25 CHM-99 20 M-ops/sec 20 M-ops/sec 25 NB-75 15 CHM-75 10 15 10 5 0 0 NB 5 1 2 3 4 5 Threads | 39 ©2003 Azul Systems, Inc. 6 7 8 CHM 1 2 3 4 5 Threads 6 7 8
  40. 40. Niagara – 8x4 cpus www.azulsystems.com 1K Table 1M Table 80 70 70 60 60 CHM-99, NB-99 50 M-ops/sec M-ops/sec 80 40 30 CHM-75, NB-75 50 40 NB 30 20 20 10 10 0 CHM 0 0 8 16 24 32 40 Threads | 40 ©2003 Azul Systems, Inc. 48 56 64 0 8 16 24 32 Threads 40 48 56 64
  41. 41. Azul Vega1 – 384 cpus www.azulsystems.com 1K Table 1M Table 500 500 NB-99 400 300 300 M-ops/sec M-ops/sec 400 CHM-99 200 200 NB NB-75 100 100 CHM-75 0 0 100 200 Threads | 41 ©2003 Azul Systems, Inc. 300 400 CHM 0 0 100 200 Threads 300 400
  42. 42. Azul Vega2 – 768 cpus www.azulsystems.com 1K Table 1M Table 1200 1200 NB-99 1000 1000 800 CHM-99 600 M-ops/sec M-ops/sec 800 600 NB 400 400 NB-75 200 200 CHM-75 0 CHM 0 0 100 200 300 400 500 600 700 800 Threads | 42 ©2003 Azul Systems, Inc. 0 100 200 300 400 500 Threads 600 700 800
  43. 43. Summary www.azulsystems.com • • • • ● ● ● A faster lock-free HashTable Faster for more CPUs Much faster for higher table modification rate State-Based Reasoning: ─ No ordering, no JMM, no fencing Any thread can see any state at any time ● Must assume values change at each step State graphs really helped coding & debugging Resulting code is small & fast | 43 ©2003 Azul Systems, Inc.
  44. 44. Summary www.azulsystems.com ● ● ● Obvious future work: ● Tools to check states ● Tools to write code Seems applicable to other data structures as well ● Concurrent append j.u.Vector ● Scalable near-FIFO work queues Code & Video available at: http://blogs.azulsystems.com/cliff/ | 44 ©2003 Azul Systems, Inc.
  45. 45. #1 Platform for Business Critical Java™ WWW.AZULSYSTEMS.COM Thank You
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×