The first part of a series of talks about modern algorithms and data structures used by NoSQL databases like HBase and Cassandra. An explanation of Bloom Filters and several derivatives, and Merkle Trees.

- 1. Lorenzo Alberton (@lorenzoalberton). “Modern” Algorithms and Data Structures, Part 1: Bloom Filters, Merkle Trees. Cassandra-London, Monday 18th April 2011.
- 2. Bloom Filters. Burton Howard Bloom, 1970. http://portal.acm.org/citation.cfm?doid=362686.362692
- 3. Bloom Filter. Space-efficient probabilistic data structure used to test set membership. http://en.wikipedia.org/wiki/Bloom_filter
- 6. Bloom Filter. Space-efficient probabilistic data structure that is used to test whether an element is a member of a set. Hash table ⇒ chance of collision (hash(x) and hash(y) can coincide). False positives are possible, false negatives are not. It might be beneficial to build an exception list of known false positives.
- 10. Bloom Filter. Not a key-value store: an array of bits indicating the presence of a key in the filter. Removing an element from a plain Bloom filter is not possible.
- 20. Bloom Filter: Add & Query. S is an array of m bits (initially set to 0), used with k hash functions (here f, g and h). Add: if f(x) = A, set S[A] = 1, and likewise for g(x) and h(x); adding x sets the bits at f(x), g(x), h(x), and adding y sets the bits at f(y), g(y), h(y). Query: to test z, look at the bits at f(z), g(z), h(z). One bit set to 0 ⇒ z ∉ S. If all of them are 1, z is probably in S (it may be a false positive).
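
The add and query steps above can be sketched in a few lines of Python. This is only an illustration, not the Cassandra implementation: the class name, the `might_contain` method and the use of salted SHA-256 as the k hash functions are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits and k salted hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # the array S, initially all 0

    def _positions(self, item: str):
        # Assumption: the k hash functions are SHA-256 with k different salts.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set the bit chosen by each hash function.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # A single 0 bit proves the item was never added;
        # all 1s only means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add("x")
bf.add("y")
print(bf.might_contain("x"))  # always True: no false negatives
print(bf.might_contain("z"))  # almost certainly False for a filter this empty
```

A list of integers keeps the sketch readable; a real filter would pack the bits into a byte array or bitset.
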
- 22. Bloom Filter: Hash Functions. The k hash functions need a uniform random distribution in [1...m). Options: k different hash functions; the same hash function with different salts; double or triple hashing [1], $g_i(x) = h_1(x) + i\,h_2(x) \bmod m$, so 2 hash functions can mimic k hashing functions. Candidates: cryptographic hash functions (MD5, SHA-1, SHA-256, Tiger, Whirlpool ...); Murmur hashes, http://code.google.com/p/smhasher/. See also http://www.strchr.com/hash_functions. [1] Dillinger, Peter C.; Manolios, Panagiotis (2004b), "Bloom Filters in Probabilistic Verification", http://www.ccs.neu.edu/home/pete/pub/bloom-filters-verification.pdf
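
As a rough sketch of the double hashing idea, the function below derives k bit positions from two base hashes. Taking h1 and h2 from two slices of one SHA-256 digest, and forcing h2 to be odd, are assumptions of this example rather than anything prescribed by the slide or by the cited paper; the function name is hypothetical.

```python
import hashlib


def double_hash_positions(item: str, k: int, m: int) -> list:
    """Return k positions g_i(x) = (h1(x) + i * h2(x)) mod m for i = 0..k-1."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Assumption: h1 and h2 come from two slices of a single SHA-256 digest.
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so never 0
    return [(h1 + i * h2) % m for i in range(k)]


positions = double_hash_positions("cassandra", k=7, m=1 << 16)
print(positions)
```

With m a power of two and h2 odd, the k positions are guaranteed to be distinct, which is one reason to pick those two constraints together.
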
- 23. Bloom Filter: Usage. Guard against expensive operations (like disk access). First line of defence in high performance (distributed) caches. Peer to peer communication. Routing and resource location. Used by: Squid Proxy Cache, Google BigTable, various RDBMSs, Google Chrome, Cisco routers, Cassandra, HBase ...
- 25. Bloom Filter: Usage in Cassandra. Used to save I/O during key look-ups (check for non-existent keys). One Bloom filter per SSTable. See org.apache.cassandra.utils.BloomFilter.
- 27. Bloom Filter: False Positive Rate. m = number of bits in the filter; n = number of elements; k = number of hashing functions. http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
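
In terms of m, n and k as defined above, the usual approximation for the false positive probability p is given below. It assumes independent, uniformly distributed hash functions; the symbols p and k_opt are introduced here for convenience.

$$
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\,\ln 2,
\qquad
\left.\frac{m}{n}\right|_{k = k_{\text{opt}}} = -\frac{\ln p}{(\ln 2)^{2}}
$$

The first expression is the chance that all k probed bits are already 1 after n insertions; the second is the k that minimises it; the third is the number of bits per element needed to reach a target p when k is optimal.
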
- 28. Bloom Filter: False Positive Rate. A Bloom filter with an optimal value for k and 1% error rate only needs 9.6 bits per key. Add 4.8 bits/key and the error rate decreases by 10 times. Examples: 10,000 words at 1% error rate need 7 hash functions and ~12 KB of memory; 10,000 words at 0.1% error rate need 11 hash functions and ~18 KB of memory. http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/
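
The figures on this slide can be checked against the standard sizing formulas, m/n = -ln p / (ln 2)^2 and k = (m/n) ln 2. The helper below is a hypothetical sketch (its name and the choice to round k to the nearest integer are assumptions). It reproduces the 9.6 bits per key and the extra 4.8 bits per key; for 0.1% it gives about 10 hash functions, slightly fewer than the 11 quoted from the linked article.

```python
import math


def bloom_parameters(n: int, p: float):
    """Return (bits per key, total bits m, hash count k) for n keys and error rate p."""
    bits_per_key = -math.log(p) / (math.log(2) ** 2)
    m = math.ceil(n * bits_per_key)
    k = round(bits_per_key * math.log(2))  # optimal k = (m/n) ln 2, rounded
    return bits_per_key, m, k


for p in (0.01, 0.001):
    bits_per_key, m, k = bloom_parameters(10_000, p)
    print(f"p={p}: {bits_per_key:.1f} bits/key, k={k}, {m / 8 / 1024:.1f} KiB")
```
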
- 29. Bloom Filter: False Positive Rate. [Figure: false positive probability plotted against Bloom filter size (n).] http://en.wikipedia.org/wiki/Bloom_filter
- 30. Counting Bloom Filter. Can handle deletions. Use counters instead of 0/1s. When adding an element, increment the counters. When deleting an element, decrement the counters. Counters must be large enough to avoid overflow (4 bits). Example after adding x and y: S = 1 0 0 0 1 0 0 0 2 0 0 0 1 0 1, where the cell hit by both x and y holds 2.
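
A counting filter along these lines might look like the sketch below. The class and method names, the salted SHA-256 hashing and the choice to saturate at 15 rather than overflow are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with small counters so that elements can be removed."""

    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> set:
        # Assumption: salted SHA-256 stands in for the k hash functions.
        # A set drops repeated positions, so each counter moves once per item.
        positions = set()
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Saturate instead of overflowing the 4-bit counter.
            self.counters[pos] = min(self.counters[pos] + 1, self.MAX_COUNT)

    def remove(self, item: str) -> None:
        # Only call this for items that were actually added.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("x")
cbf.add("y")
cbf.remove("x")
print(cbf.might_contain("y"))  # True: y's counters are still above zero
```

Removing an item that was never added would decrement counters belonging to other items and could introduce false negatives, which is why the caller has to guarantee membership before calling `remove`.
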
- 33. Stable (Time-Based) Bloom Filter. An input stream passes through a duplicate filter (cells such as 1 0 0 0 1 0 0 0 1 0) to produce the output stream. Before each insertion, P random cells are decremented by one. The k cells for the new value x_i are set to Max (usually < 7). http://webdocs.cs.ualberta.ca/~drafiei/papers/DupDet06Sigmod.pdf Alternatively, set an expiry time for each cell, with a TTL dependent on the volume of data. http://www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/
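
The decrement-then-set rule described on this slide can be sketched as follows. This follows the slide's wording (P cells picked at random) and is not a faithful reproduction of the cited paper; the class name, the `seen_before` method, the default Max of 3 and the seeded random generator are assumptions for the example.

```python
import hashlib
import random


class StableBloomFilter:
    """Duplicate detector for streams: old entries decay as new ones arrive."""

    def __init__(self, m: int, k: int, p: int, max_value: int = 3, seed: int = 0) -> None:
        self.m, self.k, self.p, self.max_value = m, k, p, max_value
        self.cells = [0] * m
        self.rng = random.Random(seed)  # seeded only to make runs repeatable

    def _positions(self, item: str):
        # Assumption: salted SHA-256 stands in for the k hash functions.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def seen_before(self, item: str) -> bool:
        """Report whether item looks like a duplicate, then record it."""
        positions = list(self._positions(item))
        duplicate = all(self.cells[pos] > 0 for pos in positions)
        # Before the insertion, decrement P randomly chosen cells by one.
        for _ in range(self.p):
            pos = self.rng.randrange(self.m)
            if self.cells[pos] > 0:
                self.cells[pos] -= 1
        # Then set the k cells of the new value to Max.
        for pos in positions:
            self.cells[pos] = self.max_value
        return duplicate


sbf = StableBloomFilter(m=1000, k=3, p=10)
first = sbf.seen_before("event-1")
second = sbf.seen_before("event-1")
print(first, second)  # False True
```

Because cells keep decaying, an item that has not been seen for a long time eventually stops being reported as a duplicate, so this variant trades a bounded memory footprint for occasional false negatives.
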
- 34. Bloom Filters: Further reading. Compressed Bloom Filters: improve performance when the Bloom filter is passed as a message, and its transmission size is a limiting factor. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.86.3346 Retouched Bloom Filters: allow networked applications to trade off selected false positives against false negatives. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.8453 Bloomier Filters: extended to handle approximate functions (each element of the set has an associated function value). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.86.4154 http://arxiv.org/abs/0807.0928 Also: Attenuated B.F., Spectral B.F., Distance-Sensitive B.F. ...
- 35. Merkle Trees. Ralph C. Merkle, 1979. http://www.springerlink.com/content/q865hwxq73ex1am9/
- 36. Merkle Trees (Hash Trees). Data structure containing a tree of summary information about a larger piece of data, used to verify its contents. http://en.wikipedia.org/wiki/Hash_Tree
- 37. Merkle Trees (Hash Trees). Leaves: hashes of data blocks. Nodes: hashes of their children. Used to detect inconsistencies between replicas (anti-entropy) and to minimise the amount of transferred data. Example tree: ROOT = hash(A, B); A = hash(C, D); B = hash(E, F); C = hash(001), D = hash(002), E = hash(003), F = hash(004), where 001 to 004 are the data blocks.
- 40. Merkle Trees. Gossip exchange between Node A and Node B. Minimal data transfer. Differences are easy to locate. SHA-1, Whirlpool or Tiger (TTH) hash functions.
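
To make "differences are easy to locate" concrete, here is a small sketch that builds a hash tree over four data blocks and compares two replicas from the root down. It is not Cassandra's MerkleTree class: the function names are hypothetical, SHA-1 is just one of the hash functions named above, and both replicas are assumed to have the same power-of-two number of blocks.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels, from leaf hashes up to the root."""
    levels = [[sha1(block) for block in blocks]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        # Each parent is the hash of its two children
        # (block count assumed to be a power of two).
        levels.append([sha1(below[i] + below[i + 1]) for i in range(0, len(below), 2)])
    return levels


def differing_blocks(tree_a, tree_b):
    """Walk both trees from the root, descending only where hashes differ."""
    suspects = [0]  # node indexes at the current level, starting at the root
    for depth in range(len(tree_a) - 1, 0, -1):
        suspects = [
            child
            for node in suspects
            if tree_a[depth][node] != tree_b[depth][node]
            for child in (2 * node, 2 * node + 1)
        ]
    return [leaf for leaf in suspects if tree_a[0][leaf] != tree_b[0][leaf]]


replica_a = [b"001", b"002", b"003", b"004"]
replica_b = [b"001", b"002", b"XXX", b"004"]
print(differing_blocks(build_tree(replica_a), build_tree(replica_b)))  # [2]
```

When the two roots match, nothing below them is compared at all, which is where the minimal data transfer comes from: only the hashes along the differing path, and then the differing blocks themselves, need to be exchanged.
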
- 43. Merkle Trees: Usage. Peer to peer communication (DC++ ...). Also used by: Amazon Dynamo, Google BigTable, Google Wave, Cassandra, HBase, ZFS.
- 46. Merkle Trees: Usage in Cassandra. Ensure the P2P network of nodes receives data blocks unaltered and unharmed. Anti-entropy during major compactions (via Scuttlebutt reconciliation). One Merkle Tree per Column Family (in Dynamo, one per node / key range). See org.apache.cassandra.utils.MerkleTree. http://wiki.apache.org/cassandra/ArchitectureAntiEntropy
- 47. References. Bloom Filters: http://bit.ly/bundles/quipo/1 Merkle Trees: http://bit.ly/bundles/quipo/2
- 48. We’re Hiring! http://mediasift.com/careers
- 49. Lorenzo Alberton (@lorenzoalberton). Thank you! lorenzo@alberton.info http://www.alberton.info/talks
