Faster persistent data structures through hashing

This talk was given at HIW-2011

  1. Faster persistent data structures through hashing. Johan Tibell <johan.tibell@gmail.com>, 2011-09-23.
  2. Motivating problem: Twitter data analysis. I'm computing a communication graph from Twitter data and then scanning it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes. We need a data structure that is fast with string keys and doesn't use too much memory.
  3. Persistent maps in Haskell. Data.Map is the most commonly used map type. It's implemented using size-balanced trees and is representative of the performance of other balanced binary tree implementations. Keys can be of any type, as long as values of that type can be ordered.
  4. Real-world performance of Data.Map. Good in theory: no more than O(log n) comparisons. Not great in practice: up to O(log n) comparisons! Many common key types are expensive to compare, e.g. String, ByteString, and Text. Given a string of length k, we need O(k ∗ log n) character comparisons to look up an entry.
  5. Hash tables. Hash tables perform well with string keys: O(k) amortized lookup time for strings of length k. However, we want persistent maps, not mutable hash tables.
  6. Milan Straka's idea: IntMaps as arrays. We can use hashing without using hash tables! Data.IntMap implements a persistent array and is much faster than Data.Map. Use hashing to derive an Int from an arbitrary key:

      class Hashable a where
          hash :: a -> Int
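For concreteness, here is a minimal sketch of what an instance might look like for strict ByteStrings, using the 64-bit FNV-1a hash. The instance and its constants are illustrative assumptions, not the hashable package's actual implementation.

      import qualified Data.ByteString as B
      import Data.Bits (xor)
      import Data.Word (Word8, Word64)

      -- FNV-1a over the bytes of the string. Word64 arithmetic wraps
      -- on overflow, which is exactly what FNV relies on.
      instance Hashable B.ByteString where
          hash = fromIntegral . B.foldl' step 14695981039346656037
            where
              step :: Word64 -> Word8 -> Word64
              step h w = (h `xor` fromIntegral w) * 1099511628211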
  7. Collisions are easy to deal with. IntMap implements a sparse, persistent array of size 2^32 (or 2^64). Hashing into this many buckets makes collisions rare: for 2^24 entries we expect about 32,000 single collisions. Implication: we can use any old collision-handling strategy (e.g. chaining using linked lists).
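The 32,000 figure follows from the standard birthday approximation: hashing n keys uniformly into m buckets produces about n^2 / (2m) colliding pairs in expectation, and (2^24)^2 / (2 ∗ 2^32) = 2^48 / 2^33 = 2^15 ≈ 32,768.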
  8. HashMap implemented using an IntMap. Naive implementation:

      newtype HashMap k v = HashMap (IntMap [(k, v)])

By inlining ("unpacking") the list and pair constructors we can save 2 words of memory per key/value pair.
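A sketch of lookup and insert for this naive representation, assuming the Hashable class above (the primed names are mine): hash the key to pick the bucket, then search or rebuild the bucket's association list.

      import qualified Data.IntMap as IM
      import qualified Data.List as L

      lookup' :: (Eq k, Hashable k) => k -> HashMap k v -> Maybe v
      lookup' k (HashMap m) = IM.lookup (hash k) m >>= L.lookup k

      insert' :: (Eq k, Hashable k) => k -> v -> HashMap k v -> HashMap k v
      insert' k v (HashMap m) = HashMap (IM.insertWith combine (hash k) [(k, v)] m)
        where
          -- Prepend the new pair, dropping any old binding for the key.
          combine new old = new ++ filter ((/= k) . fst) old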
  9. Benchmark: Map vs HashMap. Keys: 2^12 random 8-byte ByteStrings. Runtime (μs):

      Operation   Map    HashMap   % change
      lookup      1956    916      -53%
      insert      3543   1855      -48%
      delete      3791   1838      -52%
  10. Can we do better? Imperative hash tables still perform better, so perhaps there's room for improvement. We still need to perform O(min(W, log n)) Int comparisons, where W is the number of bits in a word. The memory overhead is still high, about 9 words per key/value pair.
  11. Borrowing from our neighbours. Clojure uses a hash array mapped trie (HAMT) data structure to implement persistent maps, described in the paper "Ideal Hash Trees" by Bagwell (2001). Originally a mutable data structure implemented in C++; Clojure's persistent version was created by Rich Hickey.
  12. Hash array mapped tries. A shallow tree with a high branching factor. Each node, except the leaf nodes, contains an array of up to 32 elements. 5 bits of the hash are used to index the array at each level. A clever trick, using bit population count, is used to represent sparse arrays; see the sketch below.
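The trick, sketched here with names of my own choosing: each interior node keeps a bitmap with one bit per possible child, and a packed array holding only the children that actually exist. A child's index in the packed array is the number of set bitmap bits below its bit position.

      import Data.Bits ((.&.), shiftR, bit, popCount, testBit)

      bitsPerLevel :: Int
      bitsPerLevel = 5  -- 2^5 = 32-way fanout

      -- The 5 hash bits that select a child at depth d.
      subkey :: Int -> Int -> Int
      subkey h d = (h `shiftR` (d * bitsPerLevel)) .&. 0x1f

      -- Does the node have a child at bit position i?
      childPresent :: Word -> Int -> Bool
      childPresent bm i = testBit bm i

      -- Index into the packed array: count the set bits below position i.
      sparseIndex :: Word -> Int -> Int
      sparseIndex bm i = popCount (bm .&. (bit i - 1))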
  13. HAMT. [Diagram: a HAMT, with hash bits guiding the descent through BitmapIndexed interior nodes to Leaf nodes.]
  14. The Haskell definition of a HAMT:

      data HashMap k v
          = Empty
          | BitmapIndexed !Bitmap !(Array (HashMap k v))
          | Leaf !Hash !k v
          | Full !(Array (HashMap k v))
          | Collision !Hash !(Array (Leaf k v))

      type Bitmap = Word
      type Hash   = Int

      data Array a = Array (Array# a)
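To make the traversal concrete, here is a sketch of lookup over these constructors. It reuses subkey, sparseIndex, and childPresent from the earlier sketch, assumes an index function for reading the Array type, and elides Collision handling (a linear scan of the colliding leaves); the real code differs in detail.

      lookupHM :: Eq k => Hash -> k -> HashMap k v -> Maybe v
      lookupHM h k = go 0
        where
          go _ Empty = Nothing
          go _ (Leaf h' k' v)
              | h == h' && k == k' = Just v
              | otherwise          = Nothing
          go d (BitmapIndexed bm arr)
              | childPresent bm i  = go (d + 1) (index arr (sparseIndex bm i))
              | otherwise          = Nothing
            where
              i = subkey h d
          go d (Full arr) = go (d + 1) (index arr (subkey h d))
          go _ (Collision _ _) = Nothing  -- elided: scan the leaf array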
  15. High-performance Haskell programming. Optimized implementation using standard techniques: constructor unpacking, GHC's new INLINABLE pragma, and paying careful attention to strictness. insert performance was still bad (e.g. compared to hash tables).
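As a small illustration of the first two techniques (the Entry type and member function are illustrative, not the library's):

      -- Constructor unpacking: the hash is stored inline in the
      -- constructor rather than behind a pointer to a boxed Int.
      data Entry k v = Entry {-# UNPACK #-} !Int !k v

      -- INLINABLE keeps the definition available in the interface file,
      -- so GHC can specialize it to the concrete key type at call sites.
      {-# INLINABLE member #-}
      member :: (Eq k, Hashable k) => k -> HashMap k v -> Bool
      member k m = case lookupHM (hash k) k m of
          Just _  -> True
          Nothing -> False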
  16. Optimizing insertion. Most time in insert is spent copying small arrays. Array copying is implemented in Haskell, and GHC doesn't apply enough loop optimizations to make it run fast. When allocating arrays GHC fills the array with dummy elements, which are immediately overwritten.
  17. Optimizing insertion: copy less. Bagwell's original formulation used a fanout of 32. A fanout of 16 seems to provide a better trade-off between lookup and insert performance in our setting. Improved insert performance by 14%.
  18. Optimizing insertion: copy faster. Daniel Peebles and I have implemented a set of new primops for copying arrays in GHC. The implementation generates straight-line code for copies of statically known small size, and uses a fast memcpy otherwise. Improved insert performance by 20%.
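This is the copy-and-insert step those primops speed up, sketched against the primitive package's wrappers rather than the raw copyArray# primop (insertAt is my name): allocate a new array one slot larger, block-copy the prefix and suffix, and write the new element in between.

      import Control.Monad.ST (runST)
      import Data.Primitive.Array
          (Array, copyArray, newArray, sizeofArray, unsafeFreezeArray, writeArray)

      insertAt :: Array a -> Int -> a -> Array a
      insertAt src i x = runST $ do
          let n = sizeofArray src
          dst <- newArray (n + 1) x
          copyArray dst 0 src 0 i              -- elements before position i
          copyArray dst (i + 1) src i (n - i)  -- elements at and after i
          writeArray dst i x
          unsafeFreezeArray dst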
  19. Optimizing insertion: common patterns. In many cases maps are created in one go from a sequence of key/value pairs. We can optimize for this case by repeatedly mutating the HAMT and freezing it when we're done; the idiom is sketched below. Keys: 2^12 random 8-byte ByteStrings.

      Implementation      Runtime (%)
      fromList/pure       100
      fromList/mutating    50
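The idiom, shown on a plain array for brevity (buildArray is my name; the real fromList mutates HAMT nodes in ST): perform all writes against a mutable structure that never escapes, then freeze it once into an immutable value.

      import Control.Monad (forM_)
      import Control.Monad.ST (runST)
      import Data.Primitive.Array (Array, newArray, unsafeFreezeArray, writeArray)

      buildArray :: Int -> a -> [(Int, a)] -> Array a
      buildArray n def updates = runST $ do
          marr <- newArray n def      -- one allocation up front
          forM_ updates $ \(i, x) ->
              writeArray marr i x     -- in-place writes, no copying
          unsafeFreezeArray marr      -- safe: marr is never used again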
  20. Optimizing lookup: faster population count. Tried several bit population count implementations. The best speed/memory-use trade-off is a lookup-table-based approach. Using the POPCNT SSE 4.2 instruction improves the performance of lookup by 12%.
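A sketch of the table-based approach (table size and layout are my assumptions): precompute the bit count of every byte once, then answer a 32-bit population count with four table reads.

      import Data.Array.Unboxed (UArray, listArray, (!))
      import Data.Bits ((.&.), shiftR, testBit)
      import Data.Word (Word32)

      -- Bit counts for all 256 byte values.
      byteCounts :: UArray Word32 Int
      byteCounts = listArray (0, 255) (map count [0 .. 255])
        where
          count n = length [b | b <- [0 .. 7], testBit (n :: Word32) b]

      popCount32 :: Word32 -> Int
      popCount32 w = byteCounts ! (w .&. 0xff)
                   + byteCounts ! ((w `shiftR` 8)  .&. 0xff)
                   + byteCounts ! ((w `shiftR` 16) .&. 0xff)
                   + byteCounts ! ((w `shiftR` 24) .&. 0xff)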
  21. Benchmark: IntMap-based vs HAMT. Keys: 2^12 random 8-byte ByteStrings. Runtime (μs):

      Operation   IntMap   HAMT   % change
      lookup       916      477   -48%
      insert      1855     1998   +8%
      delete      1838     2303   +25%

The benchmarks don't include the POPCNT optimization, as it is not available on many architectures.
  22. Memory usage: IntMap-based. Total: 96 MB, tree: 66 MB (2^20 Int entries). [Heap profile from Thu Sep 22 2011, 43,678,549 bytes × seconds; residency is dominated by Data.HashMap.Common.Tip, GHC.Types.I#, and Data.HashMap.Common.Bin closures.]
  23. Memory usage: HAMT. Total: 71 MB, tree: 41 MB (2^20 Int entries). [Heap profile from Thu Sep 22 2011, 25,805,095 bytes × seconds; residency is dominated by GHC.Types.I#, Data.HashMap.Base.Leaf, MUT_ARR_PTRS_FROZEN, and Data.HashMap.Base.BitmapIndexed closures.]
  24. Summary. Keys: 2^12 random 8-byte ByteStrings. Runtime (μs):

      Operation   Map    HAMT   % change
      lookup      1956    477   -76%
      insert      3543   1998   -44%
      delete      3791   2303   -39%