Tetra Data Blitz
10/1/2015
Monoids, Monoids Everywhere
in ~5 minutes
Kevin Faro
http://s2.quickmeme.com/img/44/44b0bd758f8ee5c81362923f0d5c8e017c9ddf623925e60c29a4c015b89fbb45.jpg
Oh, that wasn’t clear enough?
A set with a binary operation ● is a monoid if:
1. the operation is associative
a. (a ● b) ● c = a ● (b ● c)
2. it has an identity element e
a. e ● a = a ● e = a
Examples
● Addition
○ associative: (1+2)+3=1+(2+3)=6
○ identity: 0+1=1+0=1
● Multiplication
○ associative: (1*2)*3=1*(2*3)=6
○ identity: 1*2=2*1=2
● Min
○ you get the idea ...
● Max
● Set Union
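The examples above can be sketched in plain Scala. This is only an illustration: the `Monoid` trait, the instance names, and the helpers `combineAll` and `lawsHold` are hypothetical names made up for this sketch (Algebird ships its own, richer `Monoid`). Min and Max are shown over `Int`, assuming `Int.MaxValue` and `Int.MinValue` as their identities.

```scala
object MonoidExamples {
  // Hypothetical minimal trait for illustration; not Algebird's definition.
  trait Monoid[A] {
    def zero: A              // identity element e
    def plus(x: A, y: A): A  // the associative operation ●
  }

  val intAddition: Monoid[Int] = new Monoid[Int] {
    def zero: Int = 0
    def plus(x: Int, y: Int): Int = x + y
  }

  val intMultiplication: Monoid[Int] = new Monoid[Int] {
    def zero: Int = 1
    def plus(x: Int, y: Int): Int = x * y
  }

  // Min over Int: the largest Int acts as the identity.
  val intMin: Monoid[Int] = new Monoid[Int] {
    def zero: Int = Int.MaxValue
    def plus(x: Int, y: Int): Int = math.min(x, y)
  }

  // Max over Int: the smallest Int acts as the identity.
  val intMax: Monoid[Int] = new Monoid[Int] {
    def zero: Int = Int.MinValue
    def plus(x: Int, y: Int): Int = math.max(x, y)
  }

  // Set union: the empty set is the identity.
  def setUnion[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
    def zero: Set[A] = Set.empty[A]
    def plus(x: Set[A], y: Set[A]): Set[A] = x.union(y)
  }

  // Fold any sequence with a monoid, starting from its identity.
  def combineAll[A](xs: Seq[A])(m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.plus)

  // Spot-check associativity and identity for three sample values.
  def lawsHold[A](a: A, b: A, c: A)(m: Monoid[A]): Boolean =
    m.plus(m.plus(a, b), c) == m.plus(a, m.plus(b, c)) &&
      m.plus(m.zero, a) == a &&
      m.plus(a, m.zero) == a
}
```

Because every instance exposes the same two members, `combineAll` works unchanged for sums, products, minima, maxima, and unions.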
Let’s take a look at Algebird
http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
https://izbicki.me/img/uploads/2013/05/fry-300x225.jpg
Why is this so awesome?!?!
● Divide and Conquer
● Parallelization
● Incrementalism
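A rough sketch of why associativity buys these three properties, using plain Scala collections. The names `combineAll`, `chunkedSum`, and `addBatch` are hypothetical helpers for this illustration. The chunks are processed sequentially here, but since each partial result is independent they could just as well be computed on separate workers.

```scala
object ChunkedCombine {
  // Fold a collection with an associative operation, starting from its identity.
  def combineAll[A](xs: Iterable[A], zero: A)(op: (A, A) => A): A =
    xs.foldLeft(zero)(op)

  // Divide and conquer: split the data into chunks, reduce each chunk on its
  // own, then combine the partial results. Associativity guarantees the
  // answer matches a single pass over all the data.
  def chunkedSum(data: Seq[Long], chunkSize: Int): Long = {
    val partials = data
      .grouped(chunkSize)
      .map(chunk => combineAll(chunk, 0L)(_ + _))
      .toSeq
    combineAll(partials, 0L)(_ + _)
  }

  // Incrementalism: fold a new batch into an existing total without
  // revisiting the data that produced that total.
  def addBatch(runningTotal: Long, batch: Seq[Long]): Long =
    runningTotal + combineAll(batch, 0L)(_ + _)
}
```

The chunk size does not affect the result, which is what lets a framework pick whatever partitioning suits the cluster.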
Sound Familiar?
● map/REDUCE
○ perfect for the reduce phase
○ see Scalding: expenses.groupBy('shoppingLocation) { _.sum[Double]('cost -> 'totalCost) }
● Streaming
○ perfect for maintaining running calculations on streams of data (Storm, …)
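Both uses can be imitated with the standard Scala collections, with no Scalding or Storm involved. The `Expense` case class and the two helper functions are assumptions made for this sketch; the field names simply mirror the Scalding one-liner above.

```scala
object RunningTotals {
  // Hypothetical record type mirroring the fields in the Scalding example.
  final case class Expense(shoppingLocation: String, cost: Double)

  // Reduce-phase analogue: group by key, then sum the values for each key.
  // Note: floating-point addition is only approximately associative, so
  // different groupings can differ in the last bits for inexact values.
  def totalCostByLocation(expenses: Seq[Expense]): Map[String, Double] =
    expenses
      .groupBy(_.shoppingLocation)
      .map { case (location, es) => location -> es.map(_.cost).sum }

  // Streaming analogue: emit the running total after every element.
  // scanLeft starts from the identity 0.0; drop(1) discards that seed value.
  def runningTotals(costs: Iterator[Double]): Iterator[Double] =
    costs.scanLeft(0.0)(_ + _).drop(1)
}
```

In the streaming case only the current total is kept in memory, however long the stream runs.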
Approximate Data Structures
● HyperLogLog
○ an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset.
● Count-min Sketch
○ a probabilistic data structure that provides an approximate frequency table.
● MinHash
○ estimates how similar two sets are (approximate Jaccard Similarity)
● Bloom filter
○ a probabilistic data structure that is used to test whether an element is a member of a set
○ can answer "definitely no" or "maybe yes"
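These structures fit the monoid theme because each has an empty value and an associative merge. As one illustration, here is a toy Bloom filter whose merge is a bitwise OR. It is a sketch under assumptions, not Algebird's implementation: the `Bloom` type, the sizes `numBits` and `numHashes`, and the use of seeded `MurmurHash3` as the hash family are all choices made for this example.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

object BloomSketch {
  // Illustrative sizes only; real filters are sized from the expected
  // item count and the target false-positive rate.
  val numBits = 1024
  val numHashes = 3

  // One bit index per seeded hash function.
  private def indices(item: String): Seq[Int] =
    (0 until numHashes).map { seed =>
      Math.floorMod(MurmurHash3.stringHash(item, seed), numBits)
    }

  final case class Bloom(bits: BitSet) {
    // Set every bit the item hashes to.
    def add(item: String): Bloom =
      Bloom(indices(item).foldLeft(bits)(_ + _))

    // false means "definitely no"; true means only "maybe yes".
    def mightContain(item: String): Boolean =
      indices(item).forall(bits.contains)

    // Monoid operation: bitwise OR of the two bit sets.
    def merge(other: Bloom): Bloom = Bloom(bits | other.bits)
  }

  // Monoid identity: a filter with no bits set.
  val empty: Bloom = Bloom(BitSet.empty)

  def of(items: String*): Bloom = items.foldLeft(empty)(_.add(_))
}
```

Merging two filters gives the same bits as building one filter over all the items, so filters built on separate machines can be combined afterwards.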
Examples
● HyperLogLog
○ How many unique Twitter handles tweeted @justinbieber in the past month?
● Count-min Sketch
○ What are the frequencies of the hashtags in those tweets?
● MinHash
○ How similar are the followers of @justinbieber (~70M) to the followers of @katyperry (~76M)?
● Bloom filter
○ Did Kevin tweet to @justinbieber in the past month? Maybe yes. Must be a false positive; can you really trust a Bloom filter?!?!?
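To make the hashtag-frequency example concrete, here is a toy Count-min sketch in plain Scala. It is a sketch under assumptions rather than Algebird's version: the `Cms` type, the `depth` and `width` values, and the seeded `MurmurHash3` row hashes are illustrative choices. Merging is element-wise addition of the counter tables, which is associative and has the all-zero table as its identity.

```scala
import scala.util.hashing.MurmurHash3

object CountMinSketchExample {
  // Illustrative dimensions; real sketches derive them from error bounds.
  val depth = 4   // number of hash rows
  val width = 256 // counters per row

  // Column that an item maps to in a given row (row index doubles as seed).
  private def column(item: String, row: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(item, row), width)

  final case class Cms(table: Vector[Vector[Long]]) {
    // Increment one counter in every row.
    def add(item: String): Cms =
      Cms(table.zipWithIndex.map { case (row, r) =>
        val col = column(item, r)
        row.updated(col, row(col) + 1L)
      })

    // Collisions can only inflate counters, so the minimum across rows is
    // the best estimate and never undercounts.
    def estimate(item: String): Long =
      table.zipWithIndex.map { case (row, r) => row(column(item, r)) }.min

    // Monoid operation: element-wise sum of the two counter tables.
    def merge(other: Cms): Cms =
      Cms(table.zip(other.table).map { case (a, b) =>
        a.zip(b).map { case (x, y) => x + y }
      })
  }

  // Monoid identity: all counters at zero.
  val empty: Cms = Cms(Vector.fill(depth, width)(0L))

  def of(items: Seq[String]): Cms = items.foldLeft(empty)(_.add(_))
}
```

A sketch per hour of tweets, for instance, could be merged into a sketch for the month without re-reading any tweets.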
How did that get in there?
https://highlyscalable.files.wordpress.com/2012/04/probabilistic-sizes.png
This is better than Spanx™!
Thanks Twitter
https://github.com/twitter/algebird*
* Sorry, Algebird doesn’t have a cool logo. Don’t blame me, blame Twitter!
Kevin Faro
kevin@tetraconcepts.com
https://github.com/kevin-faro
http://cdn.meme.am/instances/500x/63234695.jpg
Need more?
● http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
● https://github.com/twitter/algebird/wiki/Learning-Algebird-Monoids-with-REPL
● https://github.com/twitter/algebird
● https://github.com/twitter/scalding
● https://github.com/twitter/summingbird
● https://github.com/twitter/algebird/wiki/Abstract-algebra-definitions
