Parsing binaries and protocols with erlang

Delivered by Bhasker V Kode at foss.in/2009

Official talk page at http://foss.in/2009/schedules/talkdetailspub.php?talkid=17

Erlang's support for handling binaries and pattern matching makes it a great choice for parsing everything from IPv4 packets to payloads from the Memcached protocol, SWF files, or databases like Tokyo Cabinet. From a functional programming perspective, there are various ways of building these parsers that take advantage of the concurrent and recursive nature inherent to the language. The talk covers these, along with challenges gathered while validating the storage and retrieval options for our distributed crawler and submitting patches to projects like Medici and Tora (Erlang-based Tokyo Cabinet clients). It will also touch upon Tokyo Cabinet's support for mapreduce with Lua, and notes from building your own custom formats and our internal mapreduce-esque and caching frameworks, used in building a multi-million impression platform utilizing under a gig of RAM per node.

Notes on:
- trends in disk/memory/bandwidth
- why erlang, RAM, binaries
- garbage collection in the erlang VM
- message passing
- use-cases

Published in: Technology, News & Politics

Transcript

  • 1. “Parsing binaries and protocols with erlang ?!” Bhasker V Kode, co-founder & CTO at hover.in, at foss.in, December 4th, 2009. http://developers.hover.in
  • 2. “WHY ... ?!”
  • 3. “BUT I'm BUILDING webapps !?!”
  • 4. “Everything's quick enough :D”
  • 5. “doh!”
  • 6. “ha! of course i knew that... err.... but people scale... that's what they do ..... that's our way out !!! scaling out ... scaling up ... auto scaling even...!!! :O”
  • 7. “scale UP ...! more RAM seems to stall those silly CPU-unit warnings my hosting provider gives... bring on those infinite loops & polling crons. RealTimeWeb FTW!”
  • 8. “scaling OUT, maybe with a distributed filesystem, and figure out a way for nodes to talk, and... replication... and location transparency during weekends... and commodity hardware which i can't pay for”
  • 9. More data becoming archival NOT by choice, but forced to. Not pushed to handling streams of data well (even hadoop!) #bigdata. If you're not compromising, you're not pushing enough. Disk's loss must be someone else's gain. fixed-length eg's at fb, twitter, google
  • 10. Erlang for RAM on the web is the new Embedded C
  • 11. “THE NEWS TODAY. Once popular retro format 'binary' continues to go unnoticed after brief sightings on wallpapers during the matrix trilogy ....” pssst! in files of any mime/content type; in db's that accept binary; in RAM, via caching engines; compact for n/w transfer & storage; the answer to unicode
  • 12. “fine! Binaries are everywhere, disks are not keeping up, and i've got more cores on my nodes every year.”
  • 13. “But i'm still not going near a strict, dynamically typed functional programming language with support for concurrency, communication, and distribution, automatic memory management & support for multiple platforms !!!”
  • 14. Erlang!!! over-rated ? OR under-appreciated ? “[87, 84, 70] :O !”
  • 15. What happens when you start an erlang shell. SMP didn't exist before erlang build R11 ('06)
  • 16. “ahh... so processes are pseudo threads in the erlang VM that are lightweight & the base of erlang programs, having their own heap & message inbox, & are meant for message passing with erlang primitives. Also the developer can configure how many cores are used based on # of schedulers, which run processes.”
  • 17. Max of 1024 schedulers can be set => your erlang src today should utilize boxes with up to 1024 cores
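The scheduler count mentioned on the two slides above can be inspected from inside a running VM. The following is a minimal sketch, not taken from the talk: the module name `sched_info` and the function `summary/0` are made up for illustration, while `erlang:system_info/1` with these keys is part of the standard runtime.

```erlang
-module(sched_info).
-export([summary/0]).

%% Return a property list describing the scheduler setup of this VM.
%% schedulers                   : scheduler threads the VM was started with
%% schedulers_online            : scheduler threads currently running processes
%% logical_processors_available : an integer, or the atom 'unknown' when the
%%                                runtime cannot determine it
summary() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     {logical_processors, erlang:system_info(logical_processors_available)}].
```

Starting the emulator with `erl +S 4` asks for four schedulers; without the flag the runtime picks a default based on the hardware it detects.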
  • 18. Let M = msgs to random users. Let N = 100,000 users. Route M msgs to the right N users!
    typical one-node approach:
      for i to M
        for j to N
          if match, add_update
    actor approach:
      N concurrent processes listening to all msgs
      as a new msg arrives, msg pass to all N pids
      in each concurrent process: if match, add_update
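The actor approach on the slide above could look roughly like this. It is a sketch under assumptions rather than the speaker's code: the module name `router`, the message shapes `{msg, UserId, Payload}` and `{count, From, Ref}`, and the idea that "add_update" means incrementing a per-user counter are all invented for illustration.

```erlang
-module(router).
-export([start_users/1, broadcast/2, count/1, demo/0]).

%% Spawn one process per user id; returns a list of {UserId, Pid}.
start_users(UserIds) ->
    [{Id, spawn(fun() -> user_loop(Id, 0) end)} || Id <- UserIds].

%% Each user process sees every message and keeps only its own.
user_loop(Id, Count) ->
    receive
        {msg, Id, _Payload} ->        % Id is already bound: matches our messages only
            user_loop(Id, Count + 1); % the "add_update" step, here a counter
        {msg, _Other, _Payload} ->    % addressed to someone else: drop it
            user_loop(Id, Count);
        {count, From, Ref} ->
            From ! {Ref, Count},
            user_loop(Id, Count);
        stop ->
            ok
    end.

%% Message-pass Msg to all N pids.
broadcast(Users, Msg) ->
    lists:foreach(fun({_Id, Pid}) -> Pid ! Msg end, Users).

%% Ask one user process how many messages it has kept so far.
count(Pid) ->
    Ref = make_ref(),
    Pid ! {count, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 1000 ->
        timeout
    end.

%% 1000 users, three messages: two for user 42, one for user 7.
demo() ->
    Users = start_users(lists:seq(1, 1000)),
    broadcast(Users, {msg, 42, <<"hello">>}),
    broadcast(Users, {msg, 42, <<"again">>}),
    broadcast(Users, {msg, 7, <<"hi">>}),
    {42, P42} = lists:keyfind(42, 1, Users),
    {7, P7} = lists:keyfind(7, 1, Users),
    {1, P1} = lists:keyfind(1, 1, Users),
    Result = {count(P42), count(P7), count(P1)},
    broadcast(Users, stop),
    Result.
```

Each process does its own matching, so the inner loop over N from the one-node approach becomes N small pattern matches that the schedulers can spread across cores. Messages from one sender to one receiver arrive in order, which is why `count/1` sees the earlier broadcasts.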
  • 19. 3 papers to rule them all & 1 garbage collection method to free them!
  • 20. 3 papers to rule them all & 1 garbage collection method to free them!
  • 21. 3 papers to rule them all & 1 garbage collection method to free them!
  • 22. EUREKA!!! we have a winner
  • 23. “ahh... so this is what no shared memory in erlang means: lightweight processes being garbage collected easily since they don't have references to data in each other's process heap, & messages copied or shared based on their size, likelihood of reuse and also optimized for binary. tellmemore!!”
  • 24. “How do you spawn a process?”
  • 25. “Where can you spawn a process?”
  • 26. “Can a spawned process talk back to the caller?”
  • 27. “Can a spawned process listen as long as i want it to?” “Can a spawned process stop listening when I want it to?” “Can a spawned process spawn more processes?”
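The questions on the last four slides can each be answered with a few lines of Erlang. The module below is a minimal sketch, not the demo shown in the talk: the name `echo` and the request shapes `{echo, Term}` and `spawn_child` are assumptions made for illustration.

```erlang
-module(echo).
-export([start/0, call/2, stop/1, loop/0]).

%% How: spawn/3 starts a new process running echo:loop().
%% Where: spawn/4 takes a node name as first argument and would start
%% the same loop on another connected node.
start() ->
    spawn(?MODULE, loop, []).

%% Listen as long as wanted: the tail-recursive call keeps the process
%% alive. Stop when wanted: the 'stop' clause simply does not recurse.
loop() ->
    receive
        {From, Ref, {echo, Term}} ->
            From ! {Ref, Term},               % talk back to the caller
            loop();
        {From, Ref, spawn_child} ->
            Child = spawn(?MODULE, loop, []), % a spawned process can spawn more
            From ! {Ref, Child},
            loop();
        stop ->
            ok
    end.

%% Send a request and wait up to one second for the tagged reply.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.
```

The caller includes its own pid and a fresh reference in every request, which is how the spawned process knows where to reply and how the caller tells replies apart.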
  • 28. “So though erlang gives a library called OTP & a db called mnesia for making life easier - you can parse or create binaries easily, make client-server programs, distributed rpc calls, tail-recursive servers, message/priority queues for flow control, talk to ports and other langs, or create any data structure explicitly (a) in-memory (b) on-disk of any connected node!”
  • 29. “show me the demo's”
    ● Process related
      – Message queues, Client-server
      – RPC, Timeouts
    ● Binary
      – Binary pattern matching, Parse swf/mp3 for metadata
      – Networking, comm. with C, Tokyocabinet client eg.
    ● Process + Binary!
      – Building a production ready in-memory CDN consistently faster than Am4z0n cl0udfr0nt, in stages
        open & gzip < concat js's < inmemory < streaming?
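As an illustration of the "parse swf for metadata" item in the demo list above, here is a minimal sketch that reads only the first eight bytes of an SWF file: a three-byte signature, a one-byte version, and a 32-bit little-endian file length. The module name `swf_header` and the shape of the return value are assumptions, and the demo given in the talk may have extracted more fields.

```erlang
-module(swf_header).
-export([parse/1]).

%% "FWS" marks an uncompressed file, "CWS" a zlib-compressed one.
%% FileLength is the total length of the uncompressed file in bytes.
parse(<<Sig:3/binary, Version:8, FileLength:32/little-unsigned, _Rest/binary>>)
  when Sig =:= <<"FWS">>; Sig =:= <<"CWS">> ->
    Compression = case Sig of
                      <<"FWS">> -> none;
                      <<"CWS">> -> zlib
                  end,
    {ok, [{compression, Compression},
          {version, Version},
          {file_length, FileLength}]};
parse(_Other) ->
    {error, not_swf}.
```

In practice the binary would come from `file:read_file/1` or from reading just the first few bytes of the file.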
  • 30. “Binary pattern matching ?”
    <<Value:Size/Type-Signedness-Endianness-unit:Unit>>
    <<1:32>> = <<0,0,0,1>>.
    <<1:32/unsigned-little>> = <<1,0,0,0>>.
    <<_:8,"mnesia"/binary>> = <<"Amnesia">>.
    So <<Bin>> could be unicode characters (English, hindi, tamil) or JPG's or http headers or basically segments of binaries
    NewBinary = <<Segment1/binary,Segment2/binary>>.
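The segment syntax above also works at bit granularity, which is what makes formats like the IPv4 header mentioned in the talk abstract easy to take apart. The following is a sketch under assumptions, not code from the slides: the module name `ipv4` and the returned property list are invented, and the field layout follows the standard IPv4 header.

```erlang
-module(ipv4).
-export([parse/1]).

%% Match the fixed 20-byte header field by field. Sizes are in bits;
%% integers default to unsigned big-endian, which is network byte order.
parse(<<4:4, IHL:4, _ToS:8, TotalLen:16,
        _Id:16, _Flags:3, _FragOffset:13,
        TTL:8, Protocol:8, _Checksum:16,
        SrcA:8, SrcB:8, SrcC:8, SrcD:8,
        DstA:8, DstB:8, DstC:8, DstD:8,
        Rest/binary>>) when IHL >= 5 ->
    %% IHL counts 32-bit words; anything beyond 5 words is options.
    OptionsLen = (IHL - 5) * 4,
    case Rest of
        <<_Options:OptionsLen/binary, Payload/binary>> ->
            {ok, [{total_length, TotalLen},
                  {ttl, TTL},
                  {protocol, Protocol},
                  {src, {SrcA, SrcB, SrcC, SrcD}},
                  {dst, {DstA, DstB, DstC, DstD}},
                  {payload, Payload}]};
        _ ->
            {error, truncated}
    end;
parse(_Other) ->
    {error, not_ipv4}.
```

The literal `4:4` in the first segment doubles as a check: a packet whose version nibble is not 4 falls through to the second clause.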
  • 31. summary of tech at hover.in
    ● LYME stack since ~dec 07, 4 (-1) nodes (64bit 4GB)
    ● python crawler + associated NLP parsers, indexes now in tokyo cabinet, inverted indexes in erlang's mnesia db with binaries of 5 diff indian languages + multiple content-types, cpu time-slicing algos, priority queues for heat-seeking algo, flow control, caching engines, cyclic queues, map-reduces with non-blocking gathers, headless-firefox for thumbnails, patches to tokyocabinet client 'medici'
    ● Beta in Jan 09, 1 million hovers/month in May '09
    ● 2-4 developers + several interns across ~2 years