SlideShare a Scribd company logo
BugSense
Big Data for mobile
with Erlang, C, LISP
   Dionisis "Dio" Kakoliris
      Head of Engineering
      dgk@bugsense.com
BugSense Trivia
●   Third biggest mobile SDK in the world
●   Analyzing data from ~400M devices
●   More than ~120k writes/updates per sec
●   Custom Big Data database (LDB)
●   Eleven engineers
●   Cash positive
Data map landscape
Data processing landscape
Enter LDB
Distributed concurrent updates
Overview
●   Complex Event Processing, In-Memory DB
●   Time-series Stream Processor
●   Super easy to setup/use - one package
●   No big locks (fine-grained locking)
●   Describe-your-data mentality
●   C is fast
●   C is ideal for destructive updates (imperative)
●   "Let it crash!"
Overview (cont'd)
●   LISP-like DSL for custom processing / views
●   Ideal for parallelism (functional)
●   Lazy loading of files (asynchronous)
●   Saves data to disk (asynchronous)
●   Modular / Extendable architecture
●   Custom reducers / off-line processing
...And now, let's go BIG
●   We love Erlang!
●   Node isolation, replication, supervision
●   Request processing and forwarding
●   Distributed
●   Ideal for building real-time systems
●   Fast, robust, reliable
●   YOU TRUST ERLANG
Why Erlang?
●   Scaling
●   Handles lots of connections efficiently (HTTP)
●   Sending/receiving messages from/to nodes is trivial
●   Building a replication/take over engine is easy
●   Mnesia for storage and shared info
●   C Linkedin Drivers
●   Being able to connect to remote nodes
●   All of the above - integrated in one language
Architecture
Snippet
 RDLOCK_HASH_STR(kvm1->hashtab, id, bucket, 1, &hashv1, &hashv2,
              &bloom);
 if (!bloom) {
      RDUNLOCK_HASH_STR(kvm1->hashtab, bucket);


      return 0;
 }


 FIND_HASH_STR(kvm1->hashtab, bucket, id, val, struct kvm1_val_t *);
 if (val) {
      result = val->cntr;
      RDUNLOCK_HASH_STR(kvm1->hashtab, bucket);


      return result;
 }


 RDUNLOCK_HASH_STR(kvm1->hashtab, bucket);
Erlang <--> LDB
●    Erlang is the right tool for the job
●    Small language that handles a very crucial sector
●    Actors are magnificent!
●    Mnesia is your "faithful" companion
●    Constantly evolving / getting better
●    Erlang processes are your (million) friends!

More at: http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the-
tsunami-of-mobi.html
Snippets...
spawn_ldb_app(NumOfKeyCells) ->
       Pid = spawn(ldb_appnode, create, [self()]),
       receive
                 {ok, Port} ->
                         io:format("New ldb_app started ~n", []),
                         Pid ! {?START, self(), integer_to_list(1), integer_to_list(NumOfKeyCells)},
                         receive
                                   {ok, []} ->
                                          ok
                         end
       end,
       {Pid, Port}.
Snippets...
create(ParentPid) ->
       Port = open_port({spawn_driver, lethe_drv}, [binary]),
       ParentPid ! {ok, Port},
       loop(Port).


loop(Port) ->
       receive
                 {?VIEW, Caller, Key, Value} ->
                         port_command(Port, term_to_binary({?VIEW, Key, Value})),
                         receive
                                   {ok, []} ->
                                          Caller ! {ok, []};   % THIS SHOULDNT BE FOO!
                                   {view, Key, Value, Result} ->
                                          Caller ! {view, Key, Value, Result};
                                   _ ->
                                          ignored
                         end,
                         loop(Port);
Bonus Stage: Why LISP?
●   Very easy lexer, parser, analyzer, interpreter impl.
●   SQL-ish queries
●   Expressive power, abstraction
●   S-expressions, heterogeneous multi-dim. lists
●   Immutability => Ideal for parallel computing
●   Tons of Data => Prefix notation
●   Tons of Data => Data transformation FTW!
Yet another snippet...
(define (in-all? arrays uid)
 (reduce (lambda (x acc) (and x acc))
         (map (lambda (x) (in? x uid))
              arrays)))


(define (in-all-timespan arrays uid from to)
 (map (lambda (x)
        (timebubble (timewarp (current-timespace) x)
                    (in-all? arrays uid)))
      (range from to)))


(define (at-least-once-in-all? arrays uid from to)
 (reduce (lambda (x acc) (or x acc))
         (in-all-timespan arrays uid from to)))


(define (always-in-all? arrays uid from to)
 (reduce (lambda (x acc) (and x acc))
         (in-all-timespan arrays uid from to)))


(define (never-in-all? arrays uid from to)
 (not (at-least-once-in-all? arrays uid from to)))
Moving Forward
●   Not easy hot code swapping
●   Auxiliary tools for monitoring, support, configuration
●   GUIs for the above
●   Cross-platform
●   Interoperability with other systems
●   Support role for big and complex systems
●   Tweaking - LDB has the potential to run everywhere!
●   ...From your cell-phone to an HPC cluster!
And?
●   Use Erlang for your projects!
●   It's small, easy and above-all it DELIVERS!
●   Don't be scared by the complexity of modern systems
●   Distributed systems are the present and future
●   Harness this massive potential with Erlang
●   START A PROJECT NOW!
Thank you!
Stay tuned for fresh stuff!
  www.bugsense.com
   blog.bugsense.com

More Related Content

What's hot

Openstack 簡介
Openstack 簡介Openstack 簡介
Openstack 簡介
kao kuo-tung
 
LINQ Inside
LINQ InsideLINQ Inside
LINQ Insidejeffz
 
Csp scala wixmeetup2016
Csp scala wixmeetup2016Csp scala wixmeetup2016
Csp scala wixmeetup2016
Ruslan Shevchenko
 
JVM Memory Model - Yoav Abrahami, Wix
JVM Memory Model - Yoav Abrahami, WixJVM Memory Model - Yoav Abrahami, Wix
JVM Memory Model - Yoav Abrahami, Wix
Codemotion Tel Aviv
 
Klee and angr
Klee and angrKlee and angr
Klee and angr
Wei-Bo Chen
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
Mithun Hunsur
 
Skydive, real-time network analyzer, container integration
Skydive, real-time network analyzer, container integrationSkydive, real-time network analyzer, container integration
Skydive, real-time network analyzer, container integration
Sylvain Afchain
 
Skydive 5/07/2016
Skydive 5/07/2016Skydive 5/07/2016
Skydive 5/07/2016
Sylvain Afchain
 
Optimizing with persistent data structures (LLVM Cauldron 2016)
Optimizing with persistent data structures (LLVM Cauldron 2016)Optimizing with persistent data structures (LLVM Cauldron 2016)
Optimizing with persistent data structures (LLVM Cauldron 2016)
Igalia
 
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
Igalia
 
There is garbage in our code - Talks by Softbinator
There is garbage in our code - Talks by SoftbinatorThere is garbage in our code - Talks by Softbinator
There is garbage in our code - Talks by Softbinator
Adrian Oprea
 
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
ScyllaDB
 
DaNode - A home made web server in D
DaNode - A home made web server in DDaNode - A home made web server in D
DaNode - A home made web server in D
Andrei Alexandrescu
 
Introduction to FPGAs
Introduction to FPGAsIntroduction to FPGAs
Introduction to FPGAs
Scott Thibault
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
Stefan Marr
 
SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.
Ruslan Shevchenko
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
Daniel Lemire
 
maXbox Starter 39 GEO Maps Tutorial
maXbox Starter 39 GEO Maps TutorialmaXbox Starter 39 GEO Maps Tutorial
maXbox Starter 39 GEO Maps Tutorial
Max Kleiner
 
scala-gopher: async implementation of CSP for scala
scala-gopher:  async implementation of CSP  for  scalascala-gopher:  async implementation of CSP  for  scala
scala-gopher: async implementation of CSP for scala
Ruslan Shevchenko
 

What's hot (20)

Openstack 簡介
Openstack 簡介Openstack 簡介
Openstack 簡介
 
LINQ Inside
LINQ InsideLINQ Inside
LINQ Inside
 
Advanced locking
Advanced lockingAdvanced locking
Advanced locking
 
Csp scala wixmeetup2016
Csp scala wixmeetup2016Csp scala wixmeetup2016
Csp scala wixmeetup2016
 
JVM Memory Model - Yoav Abrahami, Wix
JVM Memory Model - Yoav Abrahami, WixJVM Memory Model - Yoav Abrahami, Wix
JVM Memory Model - Yoav Abrahami, Wix
 
Klee and angr
Klee and angrKlee and angr
Klee and angr
 
Skiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in DSkiron - Experiments in CPU Design in D
Skiron - Experiments in CPU Design in D
 
Skydive, real-time network analyzer, container integration
Skydive, real-time network analyzer, container integrationSkydive, real-time network analyzer, container integration
Skydive, real-time network analyzer, container integration
 
Skydive 5/07/2016
Skydive 5/07/2016Skydive 5/07/2016
Skydive 5/07/2016
 
Optimizing with persistent data structures (LLVM Cauldron 2016)
Optimizing with persistent data structures (LLVM Cauldron 2016)Optimizing with persistent data structures (LLVM Cauldron 2016)
Optimizing with persistent data structures (LLVM Cauldron 2016)
 
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
Knit, Chisel, Hack: Building Programs in Guile Scheme (Strange Loop 2016)
 
There is garbage in our code - Talks by Softbinator
There is garbage in our code - Talks by SoftbinatorThere is garbage in our code - Talks by Softbinator
There is garbage in our code - Talks by Softbinator
 
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
ceph::errorator<> throw/catch-free, compile time-checked exceptions for seast...
 
DaNode - A home made web server in D
DaNode - A home made web server in DDaNode - A home made web server in D
DaNode - A home made web server in D
 
Introduction to FPGAs
Introduction to FPGAsIntroduction to FPGAs
Introduction to FPGAs
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
 
SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.SE 20016 - programming languages landscape.
SE 20016 - programming languages landscape.
 
Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)Engineering fast indexes (Deepdive)
Engineering fast indexes (Deepdive)
 
maXbox Starter 39 GEO Maps Tutorial
maXbox Starter 39 GEO Maps TutorialmaXbox Starter 39 GEO Maps Tutorial
maXbox Starter 39 GEO Maps Tutorial
 
scala-gopher: async implementation of CSP for scala
scala-gopher:  async implementation of CSP  for  scalascala-gopher:  async implementation of CSP  for  scala
scala-gopher: async implementation of CSP for scala
 

Similar to Big Data for Mobile

Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
Wisely chen
 
Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overview
jessesanford
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
rhatr
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Ran Silberman
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
Ran Silberman
 
Toying with spark
Toying with sparkToying with spark
Toying with spark
Raymond Tay
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
Gert Drapers
 
The Evolution of Async-Programming (SD 2.0, JavaScript)
The Evolution of Async-Programming (SD 2.0, JavaScript)The Evolution of Async-Programming (SD 2.0, JavaScript)
The Evolution of Async-Programming (SD 2.0, JavaScript)jeffz
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
Dongmin Yu
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Cloud jpl
Cloud jplCloud jpl
Cloud jpl
Marc de Palol
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
Jim Mlodgenski
 
Perl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReducePerl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReduce
Pedro Figueiredo
 
Ruby on Big Data (Cassandra + Hadoop)
Ruby on Big Data (Cassandra + Hadoop)Ruby on Big Data (Cassandra + Hadoop)
Ruby on Big Data (Cassandra + Hadoop)
Brian O'Neill
 
L3.fa14.ppt
L3.fa14.pptL3.fa14.ppt
L3.fa14.ppt
Tushar557668
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
David Walker
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
Vasia Kalavri
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
gethue
 

Similar to Big Data for Mobile (20)

Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Dragoncraft Architectural Overview
Dragoncraft Architectural OverviewDragoncraft Architectural Overview
Dragoncraft Architectural Overview
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Toying with spark
Toying with sparkToying with spark
Toying with spark
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
The Evolution of Async-Programming (SD 2.0, JavaScript)
The Evolution of Async-Programming (SD 2.0, JavaScript)The Evolution of Async-Programming (SD 2.0, JavaScript)
The Evolution of Async-Programming (SD 2.0, JavaScript)
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Cloud jpl
Cloud jplCloud jpl
Cloud jpl
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
Perl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReducePerl on Amazon Elastic MapReduce
Perl on Amazon Elastic MapReduce
 
Arvindsujeeth scaladays12
Arvindsujeeth scaladays12Arvindsujeeth scaladays12
Arvindsujeeth scaladays12
 
Ruby on Big Data (Cassandra + Hadoop)
Ruby on Big Data (Cassandra + Hadoop)Ruby on Big Data (Cassandra + Hadoop)
Ruby on Big Data (Cassandra + Hadoop)
 
L3.fa14.ppt
L3.fa14.pptL3.fa14.ppt
L3.fa14.ppt
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
 

Big Data for Mobile

  • 1. BugSense Big Data for mobile with Erlang, C, LISP Dionisis "Dio" Kakoliris Head of Engineering dgk@bugsense.com
  • 2. BugSense Trivia ● Third biggest mobile SDK in the world ● Analyzing data from ~400M devices ● More than ~120k writes/updates per sec ● Custom Big Data database (LDB) ● Eleven engineers ● Cash positive
  • 7. Overview ● Complex Event Processing, In-Memory DB ● Time-series Stream Processor ● Super easy to setup/use - one package ● No big locks (fine-grained locking) ● Describe-your-data mentality ● C is fast ● C is ideal for destructive updates (imperative) ● "Let it crash!"
  • 8. Overview (cont'd) ● LISP-like DSL for custom processing / views ● Ideal for parallelism (functional) ● Lazy loading of files (asynchronous) ● Saves data to disk (asynchronous) ● Modular / Extendable architecture ● Custom reducers / off-line processing
  • 9. ...And now, let's go BIG ● We love Erlang! ● Node isolation, replication, supervision ● Request processing and forwarding ● Distributed ● Ideal for building real-time systems ● Fast, robust, reliable ● YOU TRUST ERLANG
  • 10. Why Erlang? ● Scaling ● Handles lots of connections efficiently (HTTP) ● Sending/receiving messages from/to nodes is trivial ● Building a replication/take over engine is easy ● Mnesia for storage and shared info ● C Linkedin Drivers ● Being able to connect to remote nodes ● All of the above - integrated in one language
  • 12. Snippet RDLOCK_HASH_STR(kvm1->hashtab, id, bucket, 1, &hashv1, &hashv2, &bloom); if (!bloom) { RDUNLOCK_HASH_STR(kvm1->hashtab, bucket); return 0; } FIND_HASH_STR(kvm1->hashtab, bucket, id, val, struct kvm1_val_t *); if (val) { result = val->cntr; RDUNLOCK_HASH_STR(kvm1->hashtab, bucket); return result; } RDUNLOCK_HASH_STR(kvm1->hashtab, bucket);
  • 13. Erlang <--> LDB ● Erlang is the right tool for the job ● Small language that handles a very crucial sector ● Actors are magnificent! ● Mnesia is your "faithful" companion ● Constantly evolving / getting better ● Erlang processes are your (million) friends! More at: http://highscalability.com/blog/2012/11/26/bigdata-using-erlang-c-and-lisp-to-fight-the- tsunami-of-mobi.html
  • 14. Snippets... spawn_ldb_app(NumOfKeyCells) -> Pid = spawn(ldb_appnode, create, [self()]), receive {ok, Port} -> io:format("New ldb_app started ~n", []), Pid ! {?START, self(), integer_to_list(1), integer_to_list(NumOfKeyCells)}, receive {ok, []} -> ok end end, {Pid, Port}.
  • 15. Snippets... create(ParentPid) -> Port = open_port({spawn_driver, lethe_drv}, [binary]), ParentPid ! {ok, Port}, loop(Port). loop(Port) -> receive {?VIEW, Caller, Key, Value} -> port_command(Port, term_to_binary({?VIEW, Key, Value})), receive {ok, []} -> Caller ! {ok, []}; % THIS SHOULDNT BE FOO! {view, Key, Value, Result} -> Caller ! {view, Key, Value, Result}; _ -> ignored end, loop(Port);
  • 16. Bonus Stage: Why LISP? ● Very easy lexer, parser, analyzer, interpreter impl. ● SQL-ish queries ● Expressive power, abstraction ● S-expressions, heterogeneous multi-dim. lists ● Immutability => Ideal for parallel computing ● Tons of Data => Prefix notation ● Tons of Data => Data transformation FTW!
  • 17. Yet another snippet... (define (in-all? arrays uid) (reduce (lambda (x acc) (and x acc)) (map (lambda (x) (in? x uid)) arrays))) (define (in-all-timespan arrays uid from to) (map (lambda (x) (timebubble (timewarp (current-timespace) x) (in-all? arrays uid))) (range from to))) (define (at-least-once-in-all? arrays uid from to) (reduce (lambda (x acc) (or x acc)) (in-all-timespan arrays uid from to))) (define (always-in-all? arrays uid from to) (reduce (lambda (x acc) (and x acc)) (in-all-timespan arrays uid from to))) (define (never-in-all? arrays uid from to) (not (at-least-once-in-all? arrays uid from to)))
  • 18. Moving Forward ● Not easy hot code swapping ● Auxiliary tools for monitoring, support, configuration ● GUIs for the above ● Cross-platform ● Interoperability with other systems ● Support role for big and complex systems ● Tweaking - LDB has the potential to run everywhere! ● ...From your cell-phone to an HPC cluster!
  • 19. And? ● Use Erlang for your projects! ● It's small, easy and above-all it DELIVERS! ● Don't be scared by the complexity of modern systems ● Distributed systems are the present and future ● Harness this massive potential with Erlang ● START A PROJECT NOW!
  • 20. Thank you! Stay tuned for fresh stuff! www.bugsense.com blog.bugsense.com