Algebird 
Abstract Algebra 
for 
Analytics 
Sam BESSALAH 
@samklr 
Room 4 #Devoxx #algebird #scalding #monoid #hadoop @samklr
Abstract Algebra 
From Wikipedia :
Algebraic Structure
“A set of values, coupled with one or more
finitary operations, and a set of laws those
operations must obey.”
e.g. Sum, Magma, Semigroup, Group, Monoid,
Abelian Group, Semilattice, Ring, Monad,
etc.
Semigroup 
Semigroup Law : 
(x <> y) <> z = x <> (y <> z) 
(associativity) 
trait Semigroup[T] { 
def aggregate(x : T, y : T) : T 
} 
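A minimal sketch of how the trait above might be used. The `maxInt` instance and the `combineAll` helper are illustrative names chosen here, not part of Algebird's API; the point is only that an associative operation gives the same answer however the calls are grouped.

```scala
object SemigroupSketch {
  trait Semigroup[T] {
    def aggregate(x: T, y: T): T
  }

  // Hypothetical instance: max over Int is associative.
  val maxInt: Semigroup[Int] = new Semigroup[Int] {
    def aggregate(x: Int, y: Int): Int = math.max(x, y)
  }

  // Reduce a non-empty list with the semigroup operation.
  def combineAll[T](xs: List[T], sg: Semigroup[T]): T =
    xs.reduceLeft[T]((a, b) => sg.aggregate(a, b))

  // Two different groupings of the same three values.
  val left: Int  = maxInt.aggregate(maxInt.aggregate(3, 9), 4)
  val right: Int = maxInt.aggregate(3, maxInt.aggregate(9, 4))
}
```

Note that `combineAll` needs a non-empty list: a bare semigroup has no value to return for empty input.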
Monoids 
Monoid Laws : 
(x <> y) <> z = x <> (y <> z) 
(associativity) 
identity <> x = x 
x <> identity = x 
(identity / zero)
trait Monoid[T] {
  def identity : T
  def aggregate(x : T, y : T) : T
}
A Monoid is a Semigroup with an identity, so :
trait Monoid[T] extends Semigroup[T]{ 
def identity : T 
} 
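As an illustration, here is a small sketch of a monoid instance built on these traits. `setUnion` and `combineAll` are hypothetical names for this example only. The identity element is what lets the fold return a sensible value for an empty collection.

```scala
object MonoidSketch {
  trait Semigroup[T] { def aggregate(x: T, y: T): T }
  trait Monoid[T] extends Semigroup[T] { def identity: T }

  // Hypothetical instance: set union, with the empty set as identity.
  def setUnion[A]: Monoid[Set[A]] = new Monoid[Set[A]] {
    def identity: Set[A] = Set.empty[A]
    def aggregate(x: Set[A], y: Set[A]): Set[A] = x union y
  }

  // Unlike a bare semigroup, a monoid can fold an empty collection.
  def combineAll[T](xs: Seq[T], m: Monoid[T]): T =
    xs.foldLeft(m.identity)((acc, x) => m.aggregate(acc, x))
}
```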
Groups 
Group Laws: 
(x <> y) <> z = x <> (y <> z) 
(associativity) 
identity <> x = x 
x <> identity = x 
(identity) 
x <> inverse x = identity 
inverse x <> x = identity 
(invertibility) 
trait Group[T] extends Monoid[T]{ 
def inverse (v : T) :T 
} 
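A short sketch of why the inverse can be useful in analytics: it lets a value that was already aggregated be taken back out, for example when an observation leaves a window. The `intAddition` instance and the `remove` helper are assumptions made for this illustration.

```scala
object GroupSketch {
  trait Semigroup[T] { def aggregate(x: T, y: T): T }
  trait Monoid[T] extends Semigroup[T] { def identity: T }
  trait Group[T] extends Monoid[T] { def inverse(v: T): T }

  // Int addition forms a group: identity 0, inverse is negation.
  val intAddition: Group[Int] = new Group[Int] {
    def identity: Int = 0
    def aggregate(x: Int, y: Int): Int = x + y
    def inverse(v: Int): Int = -v
  }

  // Remove a previously aggregated value by adding its inverse.
  def remove[T](total: T, v: T, g: Group[T]): T =
    g.aggregate(total, g.inverse(v))
}
```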
Many More
- Abelian groups (commutative groups)
- Rings
- Semilattices
- Ordered Semigroups
- Fields ..
Many of those are in Algebird …
Examples
- (a min b) min c = a min (b min c) with Int.
- a max (b max c) = (a max b) max c
- a or (b or c) = (a or b) or c
- a and (b and c) = (a and b) and c
- int addition
- set union
- harmonic sum
- Integer mean
- Priority queue
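The mean is the least obvious entry in this list, since averaging two averages is not associative. One common way to make it work, sketched here with hypothetical names (`Avg`, `plus`, `of`), is to carry the sum and the count together and divide only at the end.

```scala
object MeanSketch {
  // Partial mean: running sum plus number of observations.
  final case class Avg(sum: Long, count: Long) {
    def value: Double = if (count == 0L) 0.0 else sum.toDouble / count
  }

  // Identity element: no observations yet.
  val zero: Avg = Avg(0L, 0L)

  // Associative and commutative combination of two partial means.
  def plus(a: Avg, b: Avg): Avg = Avg(a.sum + b.sum, a.count + b.count)

  def of(x: Int): Avg = Avg(x.toLong, 1L)

  def mean(xs: Seq[Int]): Avg =
    xs.map(x => of(x)).foldLeft(zero)((a, b) => plus(a, b))
}
```

Two partial means computed on different machines can then be merged with `plus` and give the same result as a single pass over all the data.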
Why do we need those algebraic 
structures ? 
We want to : 
- Build scalable analytics systems 
- Leverage distributed computing to perform aggregation 
on really large data sets. 
- A lot of operations in analytics are just sorting and 
counting at the end of the day 
Distributed Computing → Parallelism
Associativity enables parallelism
Identity means we can ignore some data
Commutativity helps us ignore order
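These three properties can be seen in a small simulation. The sketch below uses plain collections rather than a real cluster, and the partition size of 7 is arbitrary: each partition is reduced on its own, an empty partition contributes the identity, and the partial results are merged in either order.

```scala
object ParallelSumSketch {
  val data: Vector[Int] = (1 to 100).toVector

  // Sequential baseline.
  val sequential: Int = data.foldLeft(0)(_ + _)

  // Split into partitions (one of them empty) and reduce each independently,
  // as separate workers would.
  val partitions: Vector[Vector[Int]] =
    data.grouped(7).toVector :+ Vector.empty[Int]
  val partials: Vector[Int] = partitions.map(p => p.foldLeft(0)(_ + _))

  // Associativity: merging partial results matches the sequential fold.
  val merged: Int = partials.foldLeft(0)(_ + _)

  // Commutativity: the order in which partial results arrive does not matter.
  val mergedReversed: Int = partials.reverse.foldLeft(0)(_ + _)
}
```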
Typical Map Reduce ... 
Finding Top-K Elements in Scalding ...
class TopKJob(args : Args) extends Job(args) {
  Tsv(args("input"), visitScheme)
    .filter(...)
    .leftJoinWithTiny(...)
    .filter(...)
    .groupBy('fieldOne) { _.sortWithTake(visitScheme -> top)(biggerSale) }
    .write(Tsv(...))
}
.sortWithTake( … )
Looking into .sortWithTake in Scalding, there’s one
nice thing :
class PriorityQueueMonoid[T](max : Int)
  (implicit order : Ordering[T])
  extends Monoid[PriorityQueue[T]]
Let’s take a look :
PQ1 : 55, 45, 21, 3
PQ2 : 100, 80, 40, 3
top-4 (PQ1 U PQ2) : 100, 80, 55, 45
Priority Queue :
Can be empty
Two Priority Queues can be “added” in any order
Associative + Commutative
Makes Scalding go fast, by doing sorting,
filtering and extracting in one single “map” step.
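The PQ1 / PQ2 example can be reproduced with a small stand-in. This sketch does not use Algebird's `PriorityQueueMonoid`; it models a bounded queue as a descending list of at most `k` items (the `TopK` type and its helpers are hypothetical) so the monoid behaviour is easy to check.

```scala
object TopKSketch {
  // A bounded "priority queue" modelled as a descending list of at most k items.
  final case class TopK(k: Int, items: List[Int])

  // Identity element: the empty queue.
  def empty(k: Int): TopK = TopK(k, Nil)

  // Merge two queues and keep only the k largest values.
  def plus(a: TopK, b: TopK): TopK =
    TopK(a.k, (a.items ++ b.items).sorted(Ordering[Int].reverse).take(a.k))

  val pq1: TopK = TopK(4, List(55, 45, 21, 3))
  val pq2: TopK = TopK(4, List(100, 80, 40, 3))

  // Same values as on the slide: 100, 80, 55, 45.
  val top4: TopK = plus(pq1, pq2)
}
```

Because each mapper only ever holds `k` items, the merge stays cheap no matter how much data flows through.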
Stream Mining Challenges 
- Update predictions after each observation 
- Single pass : can’t read old data or replay
the stream
- Full size of the stream often unknown 
- Limited time for computation per 
observation 
- O(1) memory size 
http://radar.oreilly.com/2013/10/stream-mining-essentials.html 
Tradeoff : Space and speed over
accuracy.
Use sketches.
Sketches
Probabilistic data structures that store a summary
(mostly hashed) of a data set that would be costly to
store in its entirety, thus providing, most of the
time, sublinear algorithmic properties.
E.g. Bloom Filters, Counter Sketch, KMV counters,
Count Min Sketch, HyperLogLog, Min Hashes
Bloom filters 
Approximate data structure for set membership 
Behaves like an approximate set 
BloomFilter.contains(x) => NO | Maybe 
P(False Positive) > 0 
P(False Negative) = 0 
Internally :
Bit Array of fixed size
add(x) : for each hash function i, b[h(x,i)] = 1
(Boolean OR => associative)
contains(x) : TRUE if b[h(x,i)] == 1 for all i.
(Boolean AND => associative)
Both are associative => BF can be designed as a Monoid
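To make the internals concrete, here is a hand-rolled sketch, not Algebird's implementation. It assumes the Scala standard library's `MurmurHash3.stringHash` with the hash index as seed, and the `Bloom` type and its methods are hypothetical names. Merging two filters is a bitwise OR of their bit sets, which is what makes the structure a monoid.

```scala
import scala.collection.immutable.BitSet
import scala.util.hashing.MurmurHash3

object BloomSketch {
  final case class Bloom(numHashes: Int, width: Int, bits: BitSet) {
    // One bit position per hash function, using the function index as seed.
    private def positions(x: String): Seq[Int] =
      (0 until numHashes).map(i => Math.floorMod(MurmurHash3.stringHash(x, i), width))

    // add(x): set every bit position derived from x.
    def add(x: String): Bloom =
      copy(bits = positions(x).foldLeft(bits)((b, p) => b + p))

    // contains(x): true only if every bit position derived from x is set.
    def contains(x: String): Boolean =
      positions(x).forall(p => bits.contains(p))

    // Monoid "plus": bitwise OR of the two bit arrays.
    def merge(other: Bloom): Bloom = copy(bits = bits | other.bits)
  }

  // Identity element: a filter with no bits set.
  def empty(numHashes: Int, width: Int): Bloom =
    Bloom(numHashes, width, BitSet.empty)
}
```

A filter built from two partitions and merged never reports a false negative for an element added to either side; false positives remain possible.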
Bloom filters
import com.twitter.algebird._
import com.twitter.algebird.Operators._
// generate a list
val A = (1 to 300).toList
// generate a Bloom filter
val NUM_HASHES = 6
val WIDTH = 6000 // bits
val SEED = 1
implicit val bfm = new BloomFilterMonoid(NUM_HASHES, WIDTH, SEED)
// approximate set with Bloom filter
val A_bf = A.map { i => bfm.create(i.toString) }.reduce(_ + _)
val approxBool = A_bf.contains("150") // ---> ApproximateBoolean(true, 0.9995…)
Count Min Sketch
Gives an approximation of the number of occurrences of an
element in a data set.
Adding an element is a numerical addition
Querying uses a MIN function.
Both are associative.
Useful for detecting heavy hitters, topK, LSH
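The add / query / merge behaviour described here can be sketched by hand as follows. This is an illustration only and not Algebird's Count-Min Sketch: the `CMS` type, its fields and the use of `MurmurHash3.stringHash` with the row index as seed are assumptions made for the example.

```scala
import scala.util.hashing.MurmurHash3

object CmsSketch {
  final case class CMS(depth: Int, width: Int, table: Vector[Vector[Long]]) {
    // Column for x in a given row, using the row index as hash seed.
    private def col(x: String, row: Int): Int =
      Math.floorMod(MurmurHash3.stringHash(x, row), width)

    // Adding an element increments one counter per row.
    def add(x: String): CMS =
      copy(table = table.zipWithIndex.map { case (r, i) =>
        val c = col(x, i)
        r.updated(c, r(c) + 1L)
      })

    // Querying takes the minimum over the rows: never below the true count.
    def estimate(x: String): Long =
      (0 until depth).map(i => table(i)(col(x, i))).min

    // Monoid "plus": cell-wise addition of two sketches of equal shape.
    def merge(o: CMS): CMS =
      copy(table = table.zip(o.table).map { case (a, b) =>
        a.zip(b).map { case (u, v) => u + v }
      })
  }

  // Identity element: all counters at zero.
  def empty(depth: Int, width: Int): CMS =
    CMS(depth, width, Vector.fill(depth, width)(0L))
}
```

Merging the sketches of two partitions gives exactly the sketch that a single pass over both partitions would have produced.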
We have in Algebird : 
HyperLogLog
Popular sketch for cardinality estimation.
Gives, within a probabilistic error bound,
the number of distinct values in a data set.
HLL.size = Approx[Number]
Intuition
Long runs of trailing 0s in a random bit
chain are rare
But the more bit chains you look at, the more
likely you are to find a long one
The longest run of trailing 0-bits seen can be
an estimator of the number of unique bit chains
observed.
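The intuition can be sketched with a single register. This is deliberately not full HyperLogLog (no buckets, no harmonic-mean correction), only the trailing-zeros idea; the `Register` type is a hypothetical name and the hash is the standard library's `MurmurHash3.stringHash`.

```scala
import scala.util.hashing.MurmurHash3

object TrailingZerosSketch {
  // Single register: the longest run of trailing 0-bits seen so far.
  final case class Register(maxZeros: Int) {
    def add(x: String): Register = {
      val h = MurmurHash3.stringHash(x)
      // numberOfTrailingZeros(0) is 32, which is acceptable for this illustration.
      Register(math.max(maxZeros, Integer.numberOfTrailingZeros(h)))
    }

    // Merging two registers is just Max: associative and commutative.
    def merge(o: Register): Register =
      Register(math.max(maxZeros, o.maxZeros))

    // Very crude estimate: about 2^maxZeros distinct values.
    def estimate: Double = math.pow(2.0, maxZeros.toDouble)
  }

  // Identity element: no trailing zeros observed yet.
  val empty: Register = Register(0)
}
```

A single register gives a very noisy estimate; the useful property shown here is that registers built on separate partitions merge into the same state as one pass over all the data.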
Adding an element uses a Max and Sum function.
Both are associative and Monoids. (Max is really an
ordered semigroup in Algebird)
Querying the estimate uses a harmonic mean,
which is a Monoid.
In Algebird :
Many More juicy sketches ... 
- MinHashes to compute Jaccard similarity 
- QTree for quantile estimation. Neat for anomaly
detection.
- SpaceSaverMonoid, awesome for finding the approximate
most frequent and top K elements.
- TopKMonoid 
- SGD, PriorityQueues, Histograms, etc. 
SummingBird : Lambda in a box
Heard of Lambda Architecture ? 
SummingBird 
Same code, for both batch and real time processing. 
But works only on Monoids. 
Uses Storehaus, as a mergeable store layer. 
http://github.com/twitter/algebird 
These slides : 
http://bit.ly/1szncAZ 
http://slidesha.re/1zhhXKU
Links
- Algebra for analytics by Oscar Boykin (Creator of Algebird)
http://speakerdeck.com/johnynek/algebra-for-analytics
- Take a look into HLearn https://github.com/mikeizbicki/HLearn
- Great intro into Algebird by Michael Noll
http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/
- Aggregate Knowledge
http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure
- Probabilistic data structures for web analytics.
http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
- http://debasishg.blogspot.fr/2014/01/count-min-sketch-data-structure-for.html
- http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
#Devoxx #algebird #scalding #monoid #hadoop #spark 
@samklr

More Related Content

What's hot

Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Spark Summit
ย 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
Holden Karau
ย 
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka StreamsFresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
Konrad Malawski
ย 

What's hot (20)

Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
ย 
Developing distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka ClusterDeveloping distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka Cluster
ย 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka Streams
ย 
mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
ย 
Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014
ย 
Reactive programming on Android
Reactive programming on AndroidReactive programming on Android
Reactive programming on Android
ย 
XML-Motor
XML-MotorXML-Motor
XML-Motor
ย 
Writing Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using ScaldingWriting Hadoop Jobs in Scala using Scalding
Writing Hadoop Jobs in Scala using Scalding
ย 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
ย 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
ย 
Akka Actor presentation
Akka Actor presentationAkka Actor presentation
Akka Actor presentation
ย 
whats new in java 8
whats new in java 8 whats new in java 8
whats new in java 8
ย 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
ย 
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
Introduction to Structured Streaming | Big Data Hadoop Spark Tutorial | Cloud...
ย 
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka StreamsFresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
Fresh from the Oven (04.2015): Experimental Akka Typed and Akka Streams
ย 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
ย 
Async - react, don't wait - PingConf
Async - react, don't wait - PingConfAsync - react, don't wait - PingConf
Async - react, don't wait - PingConf
ย 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
ย 
Norikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In RubyNorikra: SQL Stream Processing In Ruby
Norikra: SQL Stream Processing In Ruby
ย 
HadoopCon 2016 - ็”จ Jupyter Notebook Hold ไฝไธ€ๅ€‹ไธŠ็ทš Spark Machine Learning ๅฐˆๆกˆๅฏฆๆˆฐ
HadoopCon 2016  - ็”จ Jupyter Notebook Hold ไฝไธ€ๅ€‹ไธŠ็ทš Spark  Machine Learning ๅฐˆๆกˆๅฏฆๆˆฐHadoopCon 2016  - ็”จ Jupyter Notebook Hold ไฝไธ€ๅ€‹ไธŠ็ทš Spark  Machine Learning ๅฐˆๆกˆๅฏฆๆˆฐ
HadoopCon 2016 - ็”จ Jupyter Notebook Hold ไฝไธ€ๅ€‹ไธŠ็ทš Spark Machine Learning ๅฐˆๆกˆๅฏฆๆˆฐ
ย 

Viewers also liked

An application of abstract algebra to music theory
An application of abstract algebra to music theoryAn application of abstract algebra to music theory
An application of abstract algebra to music theory
morkir
ย 
Snapdragon processors
Snapdragon processorsSnapdragon processors
Snapdragon processors
Deepak Mathew
ย 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
ย 

Viewers also liked (19)

Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
ย 
algebraic-geometry
algebraic-geometryalgebraic-geometry
algebraic-geometry
ย 
A Study to Design and Implement a Manual for the Learning Process of Technica...
A Study to Design and Implement a Manual for the Learning Process of Technica...A Study to Design and Implement a Manual for the Learning Process of Technica...
A Study to Design and Implement a Manual for the Learning Process of Technica...
ย 
An application of abstract algebra to music theory
An application of abstract algebra to music theoryAn application of abstract algebra to music theory
An application of abstract algebra to music theory
ย 
Deep learning for mere mortals - Devoxx Belgium 2015
Deep learning for mere mortals - Devoxx Belgium 2015Deep learning for mere mortals - Devoxx Belgium 2015
Deep learning for mere mortals - Devoxx Belgium 2015
ย 
Information Security Seminar #2
Information Security Seminar #2Information Security Seminar #2
Information Security Seminar #2
ย 
Definition ofvectorspace
Definition ofvectorspaceDefinition ofvectorspace
Definition ofvectorspace
ย 
Snapdragon Processor
Snapdragon ProcessorSnapdragon Processor
Snapdragon Processor
ย 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
ย 
Snapdragon processors
Snapdragon processorsSnapdragon processors
Snapdragon processors
ย 
Kill the mutants - A better way to test your tests
Kill the mutants - A better way to test your testsKill the mutants - A better way to test your tests
Kill the mutants - A better way to test your tests
ย 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
ย 
Machine Learning on Big Data
Machine Learning on Big DataMachine Learning on Big Data
Machine Learning on Big Data
ย 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
ย 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
ย 
Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
ย 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
ย 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
ย 
Build Features, Not Apps
Build Features, Not AppsBuild Features, Not Apps
Build Features, Not Apps
ย 

Similar to Algebird : Abstract Algebra for big data analytics. Devoxx 2014

Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scala
shinolajla
ย 
Pharo, an innovative and open-source Smalltalk
Pharo, an innovative and open-source SmalltalkPharo, an innovative and open-source Smalltalk
Pharo, an innovative and open-source Smalltalk
Serge Stinckwich
ย 
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
Wen-Tien Chang
ย 

Similar to Algebird : Abstract Algebra for big data analytics. Devoxx 2014 (20)

Everything is Permitted: Extending Built-ins
Everything is Permitted: Extending Built-insEverything is Permitted: Extending Built-ins
Everything is Permitted: Extending Built-ins
ย 
Concurrent programming with Celluloid (MWRC 2012)
Concurrent programming with Celluloid (MWRC 2012)Concurrent programming with Celluloid (MWRC 2012)
Concurrent programming with Celluloid (MWRC 2012)
ย 
The things we don't see โ€“ stories of Software, Scala and Akka
The things we don't see โ€“ stories of Software, Scala and AkkaThe things we don't see โ€“ stories of Software, Scala and Akka
The things we don't see โ€“ stories of Software, Scala and Akka
ย 
Ruby โ€” An introduction
Ruby โ€” An introductionRuby โ€” An introduction
Ruby โ€” An introduction
ย 
Taxonomy of Scala
Taxonomy of ScalaTaxonomy of Scala
Taxonomy of Scala
ย 
Blocks by Lachs Cox
Blocks by Lachs CoxBlocks by Lachs Cox
Blocks by Lachs Cox
ย 
Ruby 2: some new things
Ruby 2: some new thingsRuby 2: some new things
Ruby 2: some new things
ย 
A tour on ruby and friends
A tour on ruby and friendsA tour on ruby and friends
A tour on ruby and friends
ย 
Pharo, an innovative and open-source Smalltalk
Pharo, an innovative and open-source SmalltalkPharo, an innovative and open-source Smalltalk
Pharo, an innovative and open-source Smalltalk
ย 
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
Ruby ๅ…ฅ้–€ ็ฌฌไธ€ๆฌกๅฐฑไธŠๆ‰‹
ย 
Spock: Test Well and Prosper
Spock: Test Well and ProsperSpock: Test Well and Prosper
Spock: Test Well and Prosper
ย 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
ย 
Ruby is an Acceptable Lisp
Ruby is an Acceptable LispRuby is an Acceptable Lisp
Ruby is an Acceptable Lisp
ย 
Ruby Topic Maps Tutorial (2007-10-10)
Ruby Topic Maps Tutorial (2007-10-10)Ruby Topic Maps Tutorial (2007-10-10)
Ruby Topic Maps Tutorial (2007-10-10)
ย 
Rails by example
Rails by exampleRails by example
Rails by example
ย 
Introductionto fp with groovy
Introductionto fp with groovyIntroductionto fp with groovy
Introductionto fp with groovy
ย 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
ย 
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
ย 
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
ย 
Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2Code for Startup MVP (Ruby on Rails) Session 2
Code for Startup MVP (Ruby on Rails) Session 2
ย 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female serviceCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
ย 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธ
anilsa9823
ย 

Recently uploaded (20)

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
ย 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
ย 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
ย 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
ย 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
ย 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female serviceCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Badshah Nagar Lucknow best Female service
ย 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
ย 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
ย 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
ย 
Shapes for Sharing between Graph Data Spacesย - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spacesย - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spacesย - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spacesย - and Epistemic Querying of RDF-...
ย 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
ย 
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS LiveVip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida โžก๏ธ Delhi โžก๏ธ 9999965857 No Advance 24HRS Live
ย 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlanโ€™s ...
ย 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
ย 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
ย 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
ย 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
ย 
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธCALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online  โ˜‚๏ธ
CALL ON โžฅ8923113531 ๐Ÿ”Call Girls Kakori Lucknow best sexual service Online โ˜‚๏ธ
ย 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
ย 

Algebird : Abstract Algebra for big data analytics. Devoxx 2014

  • 1. Algebird Abstract Algebra for Analytics Sam BESSALAH @samklr Room 4 #Devoxx #algebird #scalding #monoid #hadoop @samklr
  • 2. Room 4 #Devoxx #algebird #scalding #monoid #hadoop #spark @samklr
  • 3. Room 4 #Devoxx #algebird #scalding #monoid #hadoop #spark @samklr
  • 4. Room 4 #Devoxx #algebird #scalding #monoid #hadoop #spark @samklr
  • 5. Abstract Algebra
  • 6. From Wikipedia
  • 7. Algebraic Structure “Set of values, coupled with one or more finite operations, and a set of laws those operations must obey.”
  • 8. Algebraic Structure “Set of values, coupled with one or more finite operations, and a set of laws those operations must obey.” e.g. Sum, Magma, Semigroup, Groups, Monoid, Abelian Group, Semi Lattices, Rings, Monads, etc.
  • 9. Semigroup Semigroup Law: (x <> y) <> z = x <> (y <> z) (associativity)
  • 10. Semigroup Semigroup Law: (x <> y) <> z = x <> (y <> z) (associativity) trait Semigroup[T] { def aggregate(x: T, y: T): T }
  • 11. Monoids Monoid Laws: (x <> y) <> z = x <> (y <> z) (associativity) identity <> x = x, x <> identity = x (identity)
  • 12. Monoids Monoid Laws: (x <> y) <> z = x <> (y <> z) (associativity) identity <> x = x, x <> identity = x (identity / zero) trait Monoid[T] { def identity: T; def aggregate(x: T, y: T): T }
  • 13. Monoids Monoid Laws: (x <> y) <> z = x <> (y <> z) (associativity) identity <> x = x, x <> identity = x trait Monoid[T] extends Semigroup[T] { def identity: T }
  • 14. Groups Group Laws: (x <> y) <> z = x <> (y <> z) (associativity) identity <> x = x, x <> identity = x (identity) x <> inverse x = identity, inverse x <> x = identity (invertibility)
  • 15. Groups Group Laws: (x <> y) <> z = x <> (y <> z) identity <> x = x, x <> identity = x, x <> inverse x = identity, inverse x <> x = identity trait Group[T] extends Monoid[T] { def inverse(v: T): T }
  • 16. Many More - Abelian groups (commutative groups) - Rings - Semi Lattices - Ordered Semigroups - Fields ... Many of those are in Algebird …
  • 17. Examples - (a min b) min c = a min (b min c) with Int - a max (b max c) = (a max b) max c ** - a or (b or c) = (a or b) or c - a and (b and c) = (a and b) and c - int addition - set union - harmonic sum - Integer mean - Priority queue
  • 19. Why do we need those algebraic structures?
  • 20. We want to: - Build scalable analytics systems - Leverage distributed computing to perform aggregation on really large data sets. - A lot of operations in analytics are just sorting and counting at the end of the day
  • 21. Distributed Computing → Parallelism
  • 22. Distributed Computing → Parallelism Associativity → enables parallelism
  • 24. Distributed Computing → Parallelism Associativity enables parallelism. Identity means we can ignore some data. Commutativity helps us ignore order.
  • 25. Typical Map Reduce ...
  • 26. Finding Top-K Elements in Scalding ... class TopKJob(args: Args) extends Job(args) { Tsv(args("input"), visitScheme) .filter(…) .leftJoinWithTiny(…) .filter(…) .groupBy('fieldOne) { _.sortWithTake(visitScheme -> 'top, …)(biggerSale) } .write(Tsv(…)) }
  • 27. .sortWithTake(…) Looking into .sortWithTake in Scalding, there’s one nice thing: class PriorityQueueMonoid[T](max: Int)(implicit order: Ordering[T]) extends Monoid[PriorityQueue[T]]
  • 28. class PriorityQueueMonoid[T](max: Int)(implicit order: Ordering[T]) extends Monoid[PriorityQueue[T]] Let’s take a look: PQ1: 55, 45, 21, 3 PQ2: 100, 80, 40, 3 top-4 (PQ1 U PQ2): 100, 80, 55, 45 Priority Queue: Can be empty. Two Priority Queues can be “added” in any order. Associative + Commutative
  • 29. class PriorityQueueMonoid[T](max: Int)(implicit order: Ordering[T]) extends Monoid[PriorityQueue[T]] Let’s take a look: PQ1: 55, 45, 21, 3 PQ2: 100, 80, 40, 3 top-4 (PQ1 U PQ2): 100, 80, 55, 45 Priority Queue: Makes Scalding go fast, by doing sorting, filtering and extracting in one single “map” step. Can be empty. Two Priority Queues can be “added” in any order. Associative + Commutative
  • 30. Stream Mining Challenges - Update predictions after each observation - Single pass: can’t read old data or replay the stream - Full size of the stream often unknown - Limited time for computation per observation - O(1) memory size
  • 31. Stream Mining Challenges http://radar.oreilly.com/2013/10/stream-mining-essentials.html
  • 32. Tradeoff: Space and speed over accuracy.
  • 33. Tradeoff: Space and speed over accuracy. Use sketches.
  • 34. Sketches Probabilistic data structures that store a summary (mostly hashed) of a data set that would be costly to store in its entirety, thus providing, most of the time, sublinear algorithmic properties. E.g. Bloom Filters, Counter Sketch, KMV counters, Count Min Sketch, HyperLogLog, Min Hashes
  • 35. Bloom filters Approximate data structure for set membership. Behaves like an approximate set. BloomFilter.contains(x) => NO | Maybe P(False Positive) > 0 P(False Negative) = 0
  • 36. Internally: Bit Array of fixed size add(x): for each hash function i, b[h(x,i)] = 1 contains(x): TRUE if b[h(x,i)] == 1 for all i. (Boolean AND => associative) Both are associative => BF can be designed as a Monoid
  • 37. Bloom filters import com.twitter.algebird._ import com.twitter.algebird.Operators._ // generate a list val A = (1 to 300).toList // generate a Bloom filter val NUM_HASHES = 6 val WIDTH = 6000 // bits val SEED = 1 implicit val bfm = new BloomFilterMonoid(NUM_HASHES, WIDTH, SEED) // approximate set with Bloom filter val A_bf = A.map{i => bfm.create(i.toString)}.reduce(_ + _) val approxBool = A_bf.contains("150") ---> ApproximateBoolean(true, 0.9995…)
  • 38. Count Min Sketch Gives an approximation of the number of occurrences of an element in a set.
  • 39. Count Min Sketch Adding an element is a numerical addition. Querying uses a MIN function. Both are associative. Useful for detecting heavy hitters, topK, LSH. We have in Algebird:
  • 40. HyperLogLog Popular sketch for cardinality estimation. Gives the number of distinct values in a data set, within a probabilistic error bound. HLL.size = Approx[Number] Intuition: Long runs of trailing 0s in a random bit chain are rare. But the more bit chains you look at, the more likely you are to find a long one. The longest run of trailing 0-bits seen can be an estimator of the number of unique bit chains observed.
  • 41. Adding an element uses a Max and Sum function. Both are associative and Monoids. (Max is an ordered semigroup in Algebird really) Querying uses a harmonic mean, which is a Monoid. In Algebird:
  • 42. Many More juicy sketches ... - MinHashes to compute Jaccard similarity - QTree for quantiles estimation. Neat for anomaly detection. - SpaceSaverMonoid, awesome to find the approximate most frequent and top K elements. - TopKMonoid - SGD, PriorityQueues, Histograms, etc.
  • 43. SummingBird: Lambda in a box
  • 44. Heard of Lambda Architecture?
  • 45. SummingBird Same code for both batch and real time processing.
  • 46. SummingBird Same code for both batch and real time processing. But works only on Monoids. Uses Storehaus as a mergeable store layer.
  • 47. http://github.com/twitter/algebird
  • 48. http://github.com/twitter/algebird
  • 49. These slides: http://bit.ly/1szncAZ http://slidesha.re/1zhhXKU
  • 51. Links - Algebra for analytics by Oscar Boykin (Creator of Algebird) http://speakerdeck.com/johnynek/algebra-for-analytics - Take a look into HLearn https://github.com/mikeizbicki/HLearn - Great intro into Algebird by Michael Noll http://www.michael-noll.com/blog/2013/12/02/twitter-algebird-monoid-monad-for-large-scala-data-analytics/ - Aggregate Knowledge http://research.neustar.biz/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure - Probabilistic data structures for web analytics http://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/ - http://debasishg.blogspot.fr/2014/01/count-min-sketch-data-structure-for.html - http://infolab.stanford.edu/~ullman/mmds/ch3.pdf
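
The Semigroup and Monoid traits from slides 10 to 13, the "associativity enables parallelism" point from slide 22, and the top-4 priority queue example from slide 28 can be sketched together in plain Scala. This is only an illustration under stated assumptions: the trait members `identity` and `aggregate` follow the slides rather than Algebird's own API, and `MonoidSketch`, `intSum`, `topK` and `chunkedFold` are hypothetical names introduced here. A sorted list stands in for the priority queue.

```scala
// Traits as written on the slides (not Algebird's real signatures).
trait Semigroup[T] {
  def aggregate(x: T, y: T): T
}

trait Monoid[T] extends Semigroup[T] {
  def identity: T
}

// Hypothetical container object for the examples below.
object MonoidSketch {
  // Integer addition: associative, with identity 0.
  val intSum: Monoid[Int] = new Monoid[Int] {
    def identity: Int = 0
    def aggregate(x: Int, y: Int): Int = x + y
  }

  // Top-k over descending sorted lists of at most k elements:
  // merge both sides, then keep the k largest values.
  // The empty list is the identity for such lists.
  def topK(k: Int): Monoid[List[Int]] = new Monoid[List[Int]] {
    def identity: List[Int] = Nil
    def aggregate(x: List[Int], y: List[Int]): List[Int] =
      (x ++ y).sorted(Ordering[Int].reverse).take(k)
  }

  // Fold each chunk on its own (as independent workers could),
  // then combine the partial results. Associativity is what makes
  // the result independent of how the data is chunked.
  def chunkedFold[T](data: Seq[T], chunkSize: Int)(m: Monoid[T]): T =
    data
      .grouped(chunkSize)
      .map(chunk => chunk.foldLeft(m.identity)(m.aggregate))
      .foldLeft(m.identity)(m.aggregate)
}
```

For instance, `MonoidSketch.topK(4).aggregate(List(55, 45, 21, 3), List(100, 80, 40, 3))` gives `List(100, 80, 55, 45)`, matching the PQ1 and PQ2 example on slide 28, and `chunkedFold` returns the same sum for any chunk size.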