Monoids and Sketches and CRDTs, oh my!

A (hopefully) accessible introduction to some of the key mathematical concepts that make distributed and streaming computation possible.

Published in: Technology

  1. 1. Monoids and Sketches and CRDTs, oh my! Kevin Scaldeferri OSB 2016
  2. 2. How Do I Math with Big Data?
  3. 3. This document and the information herein (including any information that may be incorporated by reference) is provided for informational purposes only and should not be construed as an offer, commitment, promise or obligation on behalf of New Relic, Inc. (“New Relic”) to sell securities or deliver any product, material, code, functionality, or other feature. Any information provided hereby is proprietary to New Relic and may not be replicated or disclosed without New Relic’s express written permission. Such information may contain forward-looking statements within the meaning of federal securities laws. Any statement that is not a historical fact or refers to expectations, projections, future plans, objectives, estimates, goals, or other characterizations of future events is a forward-looking statement. These forward-looking statements can often be identified as such because the context of the statement will include words such as “believes,” “anticipates,” “expects” or words of similar import. Actual results may differ materially from those expressed in these forward-looking statements, which speak only as of the date hereof, and are subject to change at any time without notice. Existing and prospective investors, customers and other third parties transacting business with New Relic are cautioned not to place undue reliance on this forward-looking information. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause the actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect such forward-looking statements is included in the filings we make with the SEC from time to time. 
Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at ir.newrelic.com or the SEC’s website at www.sec.gov. New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law. New Relic makes no warranties, expressed or implied, in this document or otherwise, with respect to the information provided.
  4. 4. How?
  5. 5. Monoids and Sketches and CRDTs, oh my!
  6. 6. Monoids 超音波システム研究所 / http://bit.ly/26bBTQ1 / CC BY 3.0
  7. 7. Wikipedia: "A monoid is an algebraic structure with a single associative binary operation and an identity element." http://bit.ly/1Wlrigv / CC0
  8. 8. It’s just a thing you can “add”
  9. 9. interface Monoid[T] {
          // (x + y) + z = x + (y + z)
          T add(T x, T y);
          // 0 + x = x = x + 0
          T unit();
        }
  14. 14. One data type can have multiple monoids!
  15. 15. Operation (Unit): Sum (0), Product (1), Max (-∞), Min (+∞)
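The four monoids in the table can be sketched in Python roughly as follows. The `Monoid` class, its `fold` helper, and the constant names are hypothetical, chosen here only to mirror the `add`/`unit` interface from the earlier slide.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical helper: an associative `add` plus its identity `unit`."""
    add: Callable[[T, T], T]
    unit: T

    def fold(self, xs: Iterable[T]) -> T:
        # Starting from the unit means an empty input is well defined.
        return reduce(self.add, xs, self.unit)


# Four different monoids over the same data type (numbers).
SUM = Monoid(lambda x, y: x + y, 0)
PRODUCT = Monoid(lambda x, y: x * y, 1)
MAX = Monoid(max, -math.inf)
MIN = Monoid(min, math.inf)

data = [3, 1, 4, 1, 5]
left, right = data[:2], data[2:]

# Associativity is what lets two machines fold their halves independently
# and combine the partial results afterwards.
combined = SUM.add(SUM.fold(left), SUM.fold(right))
```

Because each unit is an identity for its operation, folding an empty partition is harmless: it contributes the unit and leaves the merged result unchanged.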
  16. 16. Live Demo!
  17. 17. More Monoids: Count, Boolean And, List & String Concatenation, Boolean Or, Set Union, Function Composition
  18. 18. Tuple Monoids: Monoid[U] & Monoid[V] ➜ Monoid[(U,V)]
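One way to read the tuple rule is as a combinator that takes two monoids and returns a monoid over pairs. The sketch below represents a monoid as a plain `(add, unit)` pair, which is an assumption made for brevity; `tuple_monoid` is a hypothetical name.

```python
from functools import reduce


def tuple_monoid(m1, m2):
    """Combine two monoids, each given as (add, unit), into one over pairs."""
    add1, unit1 = m1
    add2, unit2 = m2

    def add(p, q):
        # Add component-wise: each side uses its own operation.
        return (add1(p[0], q[0]), add2(p[1], q[1]))

    return add, (unit1, unit2)


SUM = (lambda x, y: x + y, 0)
MAX = (max, float("-inf"))

add, unit = tuple_monoid(SUM, MAX)

readings = [3, 9, 4]
# Lift each reading to a pair, then a single fold yields sum and max together.
total, largest = reduce(add, [(r, r) for r in readings], unit)
```

The pair operation is associative because each component is, so the result can itself be fed back into `tuple_monoid` to build wider tuples.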
  19. 19. Derived Monoids: Count & Sum ➜ Average; Count & Sum & SumOfSquares ➜ StdDev
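As an illustration of the derived monoids, the sketch below carries count, sum, and sum of squares together and only computes the average and standard deviation at read time. The `Stats` class is hypothetical, and the population (not sample) standard deviation is assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Hypothetical (count, sum, sum of squares) triple; Stats() is the unit."""
    count: int = 0
    total: float = 0.0
    sum_sq: float = 0.0

    @staticmethod
    def of(x):
        return Stats(1, x, x * x)

    def __add__(self, other):
        # Component-wise addition is associative, so partial results merge.
        return Stats(self.count + other.count,
                     self.total + other.total,
                     self.sum_sq + other.sum_sq)

    def mean(self):
        return self.total / self.count

    def stddev(self):
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        m = self.mean()
        return math.sqrt(max(self.sum_sq / self.count - m * m, 0.0))


data = [2, 4, 4, 4, 5, 5, 7, 9]
# Two "machines" each summarize half of the data, then the summaries merge.
part1 = sum(map(Stats.of, data[:4]), Stats())
part2 = sum(map(Stats.of, data[4:]), Stats())
merged = part1 + part2
```

Averages themselves do not merge (the average of two averages is generally wrong), which is why the mergeable state is the triple and the average is derived from it. The sum-of-squares formula can lose precision on large, tightly clustered values, so this is a sketch rather than a numerically robust implementation.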
  20. 20. Sets don’t scale Dan Morgan / http://bit.ly/1UiFhGs / CC BY 2.0
  21. 21. Sketches = Monoids + Physics
  22. 22. Counting by Flipping Coins HHT T T HHHHHT HT T HHT HT T T T T T HT T T T T T HT
  23. 23. Unique Count by Hashing 0111101001 1110101100 0010010010 0100100011 1000111000 0100011011 1100100110 1111011011 0011100001 1001011100 1110100101 1001110101 1010111001 1011110111 0000101001 0100101001 0100110000 0011110100 1011011010 0010011011
  24. 24. Set Cardinality (uniqueCount) ≈ HyperLogLog Aldo Schumann / http://bit.ly/1Yqzvme / public domain
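The hashing idea behind unique counts can be sketched as a simplified HyperLogLog. This is a minimal illustration, not a production implementation: the class name, the use of SHA-1 as the hash, and the default precision `p=12` are all assumptions, and the bias corrections of tuned implementations are omitted apart from the standard small-range fallback.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the first 1-bit in `rest`
        # (like counting tails before the first head).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # The monoid operation: element-wise max of the registers.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


monday, tuesday = HyperLogLog(), HyperLogLog()
for i in range(1000):
    monday.add(f"user-{i}")
for i in range(500, 1500):
    tuesday.add(f"user-{i}")
week = monday.merge(tuesday)   # approximates the size of the union
```

Merging two sketches gives exactly the registers that a single sketch would have after seeing both inputs, which is what makes the structure a monoid.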
  25. 25. Set Membership
  26. 26. interface ExtensionalSet[T] { Iterator[T] iterator(); }
  27. 27. interface IntensionalSet[T] { boolean isMember(T t); }
  28. 28. Intensional Sets ≈ Bloom Filters
  29. 29. HashSet
  30. 30. HashSet: A
  31. 31. HashSet: A
  32. 32. HashSet: A
  33. 33. HashSet: A B
  34. 34. HashSet: A B
  35. 35. HashSet: A B
  36. 36. HashSet: A B C
  37. 37. HashSet: A B C
  38. 38. HashSet: A B C (Ohnoes!)
  39. 39. HashSet: A B C
  40. 40. HashSet: A B C; D?
  41. 41. HashSet: A B C; D?
  42. 42. HashSet: A B C; D? Nopes!
  43. 43. HashSet: A B C; E?
  44. 44. HashSet: A B C; E?
  45. 45. HashSet: A B C; E? Hmmm
  46. 46. HashSet: A B C; E? ==
  47. 47. HashSet: A B C; E? == Nope!
  48. 48. BloomFilter: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  49. 49. BloomFilter: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (A)
  50. 50. BloomFilter: 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 (A)
  51. 51. BloomFilter: 0 0 1 0 1 0 0 0 1 1 0 0 1 0 1 0 (A B)
  52. 52. BloomFilter: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 (A B C)
  53. 53. BloomFilter: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 (A B C; D?)
  54. 54. BloomFilter: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 (A B C; D? Nope!)
  55. 55. BloomFilter: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 (A B C; A?)
  56. 56. BloomFilter: 0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 (A B C; A? Yes*)
  57. 57. BloomFilter Monoid:
          0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0
        + 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 1
        = 0 1 1 0 1 0 1 1 1 1 0 0 1 0 1 1
  58. 58. Circling Back: BloomFilters are a scalable approximation to Sets
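The Bloom filter walkthrough above can be sketched as follows. The class and method names are hypothetical, the bit array is stored in a single Python integer for brevity, and deriving the hash functions from salted SHA-256 is an assumption made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; the bit array is one Python int."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits

    def _positions(self, item):
        # Derive several hash functions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means "yes*" (possibly a
        # false positive if other items happened to set the same bits).
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # The monoid operation: bitwise OR; the all-zero filter is the unit.
        return BloomFilter(self.num_bits, self.num_hashes, self.bits | other.bits)


first = BloomFilter()
first.add("A")
first.add("B")
second = BloomFilter()
second.add("C")
union = first.merge(second)
```

The filter never stores the items themselves, only the bits they set, so its size is fixed no matter how many items are added; the cost is a false-positive rate that grows as the bits fill up.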
  59. 59. CountMinSketch
  60. 60. CountMinSketch [figure: counter grid, all zeros]
  61. 61. CountMinSketch [figure: counter grid, all zeros; A]
  62. 62. CountMinSketch [figure: counter grid; A]
  63. 63. CountMinSketch [figure: counter grid; A B]
  64. 64. CountMinSketch [figure: counter grid; A B C]
  65. 65. CountMinSketch [figure: counter grid; A B C]
  66. 66. CountMinSketch [figure: counter grid; A B C; D?]
  67. 67. CountMinSketch [figure: counter grid; A B C; D? Min(2,1,0) = 0]
  68. 68. CountMinSketch [figure: counter grid; A B C; A?]
  69. 69. CountMinSketch [figure: counter grid; A B C; A? Min(2,2,3) = 2]
  70. 70. CountMinSketch Frequency of Occurrence
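A count-min sketch can be illustrated in a few lines. This is a sketch under assumptions: the class name is hypothetical, the default of 3 rows by 16 columns is chosen only to keep the example small, and the per-row hash functions are derived from salted SHA-256.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, depth=3, width=16):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash function per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true frequency.
        return min(self.table[row][col] for row, col in self._cells(item))

    def merge(self, other):
        # The monoid operation: cell-wise addition of the two tables.
        out = CountMinSketch(self.depth, self.width)
        out.table = [[a + b for a, b in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out


sketch = CountMinSketch()
for item in ["A", "B", "C", "A"]:
    sketch.add(item)
```

The estimate never undercounts an item, but it may overcount when other items collide with it in every row; widening the table makes that less likely.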
  71. 71. Funnels: % of users who do A, then B. Size(A ∪ B) ≈ HyperLogLog; Size(A ∩ B) / Size(A ∪ B) ≈ MinHash. pedrik / http://bit.ly/25WzP1H / CC BY 2.0
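The MinHash half of the funnel calculation can be sketched as below. The function names, the signature length `k`, and the salted SHA-1 hash family are assumptions for illustration. Multiplying the estimated ratio by a union-size estimate (for example from HyperLogLog) then gives an approximate intersection size.

```python
import hashlib


def _hash(seed, item):
    digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items, k=256):
    """MinHash signature of a non-empty collection: min hash per seed."""
    items = list(items)
    return [min(_hash(seed, x) for x in items) for seed in range(k)]


def merge(sig_a, sig_b):
    # The monoid operation: element-wise min gives the signature of A ∪ B.
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def jaccard(sig_a, sig_b):
    # Each position agrees with probability |A ∩ B| / |A ∪ B|.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


did_a = signature(range(0, 300))      # users 0..299 did A
did_b = signature(range(150, 450))    # users 150..449 did B
overlap = jaccard(did_a, did_b)       # true ratio here is 150 / 450
```

The estimate is statistical: with `k` positions its standard error is roughly `sqrt(J * (1 - J) / k)`, so longer signatures trade memory for accuracy.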
  72. 72. What About Streaming Data?
  73. 73. Streaming is Distributed-in-Time Computation
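One way to picture "distributed in time" is to summarize each window of the stream separately and merge the summaries as they arrive, exactly as separate machines would. The `windows` helper, the window size of 3, and the sample events are hypothetical.

```python
from itertools import accumulate


def windows(stream, size):
    """Split a stream into consecutive batches of at most `size` events."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def add(p, q):
    # (count, sum) pairs form a monoid under component-wise addition.
    return (p[0] + q[0], p[1] + q[1])


events = [3, 1, 4, 1, 5, 9, 2, 6]

# Each window is summarized on its own, as if by a separate machine.
partials = [(len(w), sum(w)) for w in windows(events, 3)]

# Merging the per-window summaries in arrival order gives running totals.
running = list(accumulate(partials, add))
```

Only the small per-window summaries need to be retained, not the raw events, and the final running total equals what a single batch job over the whole stream would compute.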
  74. 74. What About Mutable Data?
  75. 75. CRDTs
  76. 76. Conflict-Free Replicated Data Types
  77. 77. Available, Eventually Consistent Data Structures
  78. 78. How Can Two People Count?
  79. 79. Shared Counter: 0 0
  80. 80. Shared Counter: 0 0 (+5) 5 5
  81. 81. Shared Counter: 0 0 (+5) 5 5 (-4) (-3) 1 -2 2 -2
  82. 82. Op-based Counter: 0 0 (+5) 5 5 (-4) (-3) 1 -2 2 -2
  83. 83. Op-based Counter: 0 0 (+5) 5 5 10 Oops!
  84. 84. Naive Sets: {} {}
  85. 85. Naive Sets: {} {} (+X) {X} (+X) {X} {X} {X}
  86. 86. Naive Sets: {} {} (+X) {X} (+X) {X} {X} {X} (-X) {} {}
  87. 87. Naive Sets: {} {} (+X) {X} (+X) {X} {X} {X} (-X) {} {} Oops!
  88. 88. Observed-Remove Sets: {} {} (+Xa) {Xa} (+Xb) {Xb} {Xb} {XaXb} (-Xa) {} {Xb}
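The observed-remove idea can be sketched as a state-based set in which every add carries a unique tag and a remove tombstones only the tags the removing replica has seen. The `ORSet` class and its use of random UUIDs as tags are assumptions for illustration, and tombstones are never garbage-collected here.

```python
import uuid


class ORSet:
    """Illustrative observed-remove set (state-based)."""

    def __init__(self):
        self.added = set()     # (element, unique tag) pairs
        self.removed = set()   # tombstoned (element, tag) pairs

    def add(self, element):
        # Every add gets a fresh tag, like Xa and Xb in the slide.
        self.added.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has observed so far.
        self.removed |= {pair for pair in self.added if pair[0] == element}

    def merge(self, other):
        # Set union is commutative, associative, and idempotent.
        out = ORSet()
        out.added = self.added | other.added
        out.removed = self.removed | other.removed
        return out

    def elements(self):
        return {element for element, _tag in self.added - self.removed}


replica1, replica2 = ORSet(), ORSet()
replica1.add("X")        # tagged add on replica 1 (Xa)
replica2.add("X")        # concurrent tagged add on replica 2 (Xb)
replica1.remove("X")     # replica 1 has only observed its own tag
merged = replica1.merge(replica2)
```

After the merge, "X" is still present because replica 2's add was never observed by the remove, which is the outcome the slide shows as `{Xb}`.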
  89. 89. State-based Counter: 0 0
  90. 90. State-based Counter: 0 0 (+5) {a=5}=5 {a=5}=5
  91. 91. State-based Counter: 0 0 {a=9}=9 (+5) (+4) (+3) {a=5}=5 {a=5}=5 {a=5,b=3}=8 {a=9,b=3}=12 {a=9,b=3}=12
  92. 92. State-based Counter: 0 0 {a=9}=9 (+5) (+4) {a=5}=5 ??? {a=9}=9
  93. 93. Increment-only Counter: 0 0 (+5) (+4) {a=5}=5 {a=9}=9 {a=9}=9 {a=9}=9
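An increment-only counter of this kind can be sketched as a map from replica id to that replica's running total, merged with a per-replica max. The `GCounter` name and method names are hypothetical.

```python
class GCounter:
    """Illustrative increment-only (grow-only) counter."""

    def __init__(self, counts=None):
        # replica id -> total of all increments made at that replica
        self.counts = dict(counts or {})

    def increment(self, replica, amount=1):
        if amount < 0:
            raise ValueError("increment-only counter cannot decrease")
        self.counts[replica] = self.counts.get(replica, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica totals only grow, so the larger one is always newer.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})


stale = GCounter()
stale.increment("a", 5)            # {a=5}, a copy that missed later updates

latest = GCounter(stale.counts)
latest.increment("a", 4)           # {a=9}

other = GCounter()
other.increment("b", 3)            # {b=3}

resolved = stale.merge(latest)     # the stale copy cannot pull the value back
everything = resolved.merge(other)
```

Because merge takes a max rather than adding, receiving the same state twice or out of order leaves the result unchanged.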
  94. 94. PN Counter: 0 0 {a=+5,-4}=1 {a=+5,-4}=1 (+5) (-4) {a=+5}=5 {a=+8,-4}=4 {a=+5,-4}=1 (+3) {a=+8,-4}=4
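A PN counter can be sketched as two increment-only maps, one for increments and one for decrements, each merged with a per-replica max. The `PNCounter` class below is hypothetical and replays the `+5`, `-4`, `+3` sequence from the slide.

```python
class PNCounter:
    """Illustrative positive-negative counter built from two grow-only maps."""

    def __init__(self):
        self.p = {}   # replica id -> total increments
        self.n = {}   # replica id -> total decrements (stored as positives)

    def add(self, replica, amount):
        side = self.p if amount >= 0 else self.n
        side[replica] = side.get(replica, 0) + abs(amount)

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        out = PNCounter()
        for mine, theirs, target in ((self.p, other.p, out.p),
                                     (self.n, other.n, out.n)):
            for key in mine.keys() | theirs.keys():
                # Both maps only grow, so max picks the newer total.
                target[key] = max(mine.get(key, 0), theirs.get(key, 0))
        return out


stale = PNCounter()
stale.add("a", 5)                  # {a=+5} = 5

latest = PNCounter()
for delta in (5, -4, 3):
    latest.add("a", delta)         # {a=+8,-4} = 4

resolved = stale.merge(latest)
```

Splitting the counter this way keeps each half monotonic, which is what allows max to serve as the merge even though the visible value goes both up and down.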
  95. 95. Versioned State: 0 0 {a:2:1}=1 {a:2:1}=1 (+5) (-4) {a:1:5}=5 {a:3:4}=4 {a:2:1}=1 (+3) {a:3:4}=4
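The `{replica:version:value}` notation can be read as a map from replica id to a (version, value) pair, where merge keeps the entry with the higher version. The sketch below assumes that each replica is the only writer of its own entry, so versions for a given key are totally ordered; the function names are hypothetical.

```python
def update(state, replica, delta):
    """Return a new state where `replica` applied `delta` and bumped its version."""
    version, value = state.get(replica, (0, 0))
    return {**state, replica: (version + 1, value + delta)}


def merge(s1, s2):
    # For each replica keep the entry with the highest version number.
    out = dict(s1)
    for replica, entry in s2.items():
        if replica not in out or entry[0] > out[replica][0]:
            out[replica] = entry
    return out


def value(state):
    return sum(v for _version, v in state.values())


v1 = update({}, "a", 5)     # {a:1:5} = 5
v2 = update(v1, "a", -4)    # {a:2:1} = 1
v3 = update(v2, "a", 3)     # {a:3:4} = 4

# States may arrive late, duplicated, or out of order; merge is unaffected.
result = merge(merge(v3, v1), v2)
```

Unlike the PN counter, the value is allowed to move in either direction here because the version number, not the value, is the monotonic part.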
  96. 96. Replace exactly-once, in-order delivery with an idempotent merge strategy
  97. 97. Summing Up: Monoids allow computations to be done across many machines and merged. Sketches allow approximate results when the exact answers are computationally infeasible. CRDTs give an approach for mutable distributed data.
  98. 98. Thank You kevin@scaldeferri.com @kscaldef
