“Parsing binaries and protocols 
    with erlang ?!”

      Bhasker V Kode
      co-founder & CTO at hover.in

      at ...
Parsing binaries and protocols with erlang


Delivered by Bhasker V Kode at foss.in/2009

Official talk page at http://foss.in/2009/schedules/talkdetailspub.php?talkid=17

Erlang's support for binaries and pattern matching makes it a great choice for parsing everything from IPv4 packets to payloads from the Memcached protocol, SWF files, or databases like Tokyo Cabinet. From a functional programming perspective, there are various ways of building these parsers that take advantage of the concurrent and recursive nature inherent to the language. The talk also covers challenges encountered while validating the storage and retrieval options for our distributed crawler and while submitting patches to projects like Medici and Tora (Erlang-based Tokyo Cabinet clients). It will also touch upon Tokyo Cabinet's support for map-reduce with Lua, and share notes from building your own custom formats and from the internal MapReduce-style and caching frameworks used to build a multi-million-impression platform that uses under a gig of RAM per node.
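
As a rough illustration of the IPv4 case mentioned above, here is a minimal sketch of a header parser written with binary pattern matching. The field layout follows the standard IPv4 header (RFC 791). The module name `ipv4_sketch`, the function `parse/1` and the shape of the returned tuple are assumptions made for this example and are not taken from the talk. The sketch does not verify the checksum or trim the payload to the total-length field.

```erlang
-module(ipv4_sketch).
-export([parse/1]).

%% Match the fixed 20-byte IPv4 header field by field.
%% Sizes are in bits; names starting with "_" are fields we ignore here.
parse(<<4:4, IHL:4, _TOS:8, _TotalLen:16,
        _Id:16, _Flags:3, _FragOffset:13,
        TTL:8, Proto:8, _Checksum:16,
        SrcA:8, SrcB:8, SrcC:8, SrcD:8,
        DstA:8, DstB:8, DstC:8, DstD:8,
        Rest/binary>>) when IHL >= 5 ->
    %% IHL counts 32-bit words; anything beyond 5 words is options.
    OptLen = (IHL - 5) * 4,
    case Rest of
        <<_Options:OptLen/binary, Payload/binary>> ->
            {ok, {ipv4,
                  {SrcA, SrcB, SrcC, SrcD},
                  {DstA, DstB, DstC, DstD},
                  Proto, TTL, Payload}};
        _ ->
            {error, truncated}
    end;
%% Anything that is not a well-formed IPv4 header ends up here.
parse(_Other) ->
    {error, bad_header}.
```

Calling `ipv4_sketch:parse/1` with a captured packet binary returns either `{ok, {ipv4, Src, Dst, Protocol, TTL, Payload}}` or an error tuple. The whole header is described in one pattern, so there is no manual offset arithmetic.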

Notes on:
- trends in disk/memory/bandwidth
- why erlang, RAM, binaries
- garbage collection in the erlang VM
- message passing
- use-cases
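
To make the message-passing item above concrete, here is a minimal sketch of a process that is spawned, replies to whoever asks, keeps listening in a tail-recursive loop, and stops when told to. The module name `echo_sketch` and the functions `start/0`, `ask/2` and `stop/1` are hypothetical names chosen for this example. Only `spawn`, `!`, `self()` and `receive ... after` are standard Erlang.

```erlang
-module(echo_sketch).
-export([start/0, ask/2, stop/1]).

%% Spawn a new process running loop/0 and return its pid.
start() ->
    spawn(fun loop/0).

%% Send a request tagged with our own pid so the process can reply,
%% then wait up to one second for the answer.
ask(Pid, Msg) ->
    Pid ! {self(), Msg},
    receive
        {Pid, Reply} -> {ok, Reply}
    after 1000 ->
        {error, timeout}
    end.

%% Ask the process to leave its receive loop.
stop(Pid) ->
    Pid ! stop,
    ok.

%% Tail-recursive loop: keeps listening until it receives 'stop'.
loop() ->
    receive
        {From, Msg} when is_pid(From) ->
            From ! {self(), {echo, Msg}},
            loop();
        stop ->
            ok
    end.
```

The caller includes `self()` in the request, which is how the spawned process knows where to send its reply. After `stop/1`, further calls to `ask/2` simply time out because nothing is listening any more.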

Published in: Technology, News & Politics

Parsing binaries and protocols with erlang

  1. “Parsing binaries and protocols with erlang ?!” Bhasker V Kode, co-founder & CTO at hover.in, at foss.in, December 4th, 2009. http://developers.hover.in
  2. “WHY ... ?!”
  3. “BUT I'm BUILDING webapps !?!”
  4. “Everything's quick enough :D”
  5. “doh!”
  6. “ha! of course i knew that... err.... but people scale... that's what they do ..... that's our way out !!! scaling out ... scaling up ... auto scaling even...!!! :O”
  7. “scale UP ...! more RAM seems to stall those silly CPU-unit warnings my hosting provider gives... bring on those infinite loops & polling crons. RealTimeWeb FTW!”
  8. “scaling OUT, maybe with a distributed filesystem and figure out a way for nodes to talk, and... Replication... and location transparency during weekends... and commodity hardware which i can't pay for”
  9. More data becoming archival NOT by choice, but forced to. Not pushed to handling streams of data well (even hadoop!) #bigdata If you're not compromising, you're not pushing enough. Disk's loss must be someone else's gain. Fixed-length examples at fb, twitter, google.
  10. Erlang for RAM on the web is the new Embedded C
  11. “THE NEWS TODAY. Once popular retro format 'binary' continues to go unnoticed after brief sightings on wallpapers during the matrix trilogy ....” pssst!
      ● in files of any mime/content type
      ● in DBs that accept binary
      ● in RAM, via caching engines
      ● compact for n/w transfer & storage
      ● the answer to unicode
  12. “fine! Binaries are everywhere, disks are not keeping up, and i've got more cores on my nodes every year.”
  13. “But i'm still not going near a strict, dynamically typed functional programming language with support for concurrency, communication, and distribution, automatic memory management & support for multiple platforms !!!”
  14. Erlang!!! over-rated? OR under-appreciated? “[87, 84, 70] :O !”
  15. What happens when you start an erlang shell. SMP didn't exist before erlang build R11 ('06).
  16. “ahh... so processes are pseudo threads in the erlang VM that are light weight & the base of erlang programs, having their own heap and message inbox, & are meant for message passing erlang primitives. Also the developer can configure how many cores are used based on # of schedulers, which run processes.”
  17. Max of 1024 schedulers can be set => your erlang src today should utilize boxes with up to 1024 cores.
  18. Let M = msgs to random users. Let N = 100,000 users. Route M msgs to the right N users!
      typical one-node approach:
          for i to M
              for j to N
                  if match, add_update
      actor approach:
          N concurrent processes listening to all msgs
          As new msg arrives, msg pass to all N pids
          in each concurrent process: if match, add_update
  19-21. 3 papers to rule them all & 1 garbage collection method to free them!
  22. EUREKA!!! we have a winner
  23. “ahh... so this is what the no shared memory in erlang is about: light weight processes being garbage collected easily since they don't have references to data in each other's process heap, & messages copied or shared based on their size, likelihood of reuse and also optimized for binary. tellmemore!!”
  24. “How do you spawn a process?”
  25. “Where can you spawn a process?”
  26. “Can a spawned process talk back to the caller?”
  27. “Can a spawned process listen as long as i want it to?” “Can a spawned process stop listening when I want it to?” “Can a spawned process spawn more processes?”
  28. “So though erlang gives a library called OTP & a db called mnesia for making life easier - you can parse or create binaries easily, make client-server programs, distributed rpc calls, tail-recursive servers, message/priority queues for flow control, talk to ports and other langs, or create any data structure explicitly (a) in-memory (b) on-disk on any connected node!”
  29. “show me the demos”
      ● Process related
          – Message queues, Client-server
          – RPC, Timeouts
      ● Binary
          – Binary pattern matching, Parse swf/mp3 for metadata
          – Networking, comm. with C, Tokyocabinet client eg.
      ● Process + Binary!
          – Building a production ready in-memory CDN consistently faster than Am4z0n cl0udfr0nt, in stages:
            open & gzip < concat js's < inmemory < streaming?
  30. “Binary pattern matching ?”
      <<Value:Size/Type-Signedness-Endianness-unit:Unit>>
      <<1:32>> = <<0,0,0,1>>.
      <<1:32/unsigned-little>> = <<1,0,0,0>>.
      <<_:8, "mnesia">> = <<"Amnesia">>.
      So <<Bin>> could be unicode characters (English, hindi, tamil) or JPGs or http headers or basically segments of binaries:
      NewBinary = <<Segment1, Segment2>>.
  31. summary of tech at hover.in
      ● LYME stack since ~Dec 07, 4 (-1) nodes (64bit 4GB)
      ● python crawler + associated NLP parsers, indexes now in tokyo cabinet, inverted indexes in erlang's mnesia db with binaries of 5 diff indian languages + multiple content-types, cpu time-slicing algos, priority queues for heat-seeking algo, flow control, caching engines, cyclic queues, map-reduces with non-blocking gathers, headless-firefox for thumbnails, patches to tokyocabinet client 'medici'
      ● Beta in Jan 09, 1 million hovers/month in May '09
      ● 2-4 developers + several interns across ~2 years
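
The actor approach to the routing problem on slide 18 could be sketched roughly as follows: one process per user, every incoming message handed to all of them, and each process keeping only what is addressed to it. This is an illustration under assumptions, not the hover.in implementation. The module `route_sketch`, its function names and the `{UserId, Body}` message shape are all invented for the example. The "add_update" step is reduced to appending to a list.

```erlang
-module(route_sketch).
-export([start_users/1, broadcast/2, inbox/1]).

%% Spawn one process per user id; returns a list of {Id, Pid} pairs.
start_users(UserIds) ->
    [{Id, spawn(fun() -> user_loop(Id, []) end)} || Id <- UserIds].

%% Hand one message to every user process; each decides for itself.
broadcast(Users, {_To, _Body} = Msg) ->
    [Pid ! {msg, Msg} || {_Id, Pid} <- Users],
    ok.

%% Ask a user process for the messages it has kept so far.
inbox(Pid) ->
    Pid ! {inbox, self()},
    receive
        {Pid, Msgs} -> Msgs
    after 1000 ->
        timeout
    end.

%% Id is already bound, so the first clause only matches
%% messages addressed to this particular user.
user_loop(Id, Acc) ->
    receive
        {msg, {Id, Body}} ->
            user_loop(Id, [Body | Acc]);      % "if match, add_update"
        {msg, _SomeoneElse} ->
            user_loop(Id, Acc);               % not ours: drop it
        {inbox, From} ->
            From ! {self(), lists:reverse(Acc)},
            user_loop(Id, Acc)
    end.
```

Broadcasting still costs N sends per message, but the matching itself runs inside N independent processes that the schedulers can spread across cores, instead of in one nested loop. The sketch never stops its user processes; a real system would add a shutdown message and supervision.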