Parsing binaries and protocols with Erlang
Delivered by Bhasker V Kode at foss.in/2009

Official talk page at http://foss.in/2009/schedules/talkdetailspub.php?talkid=17

Erlang's support for binaries and pattern matching makes it a great choice for parsing everything from IPv4 packets to Memcached protocol payloads, SWF files, or databases like Tokyo Cabinet. From a functional programming perspective, there are various ways of building these parsers that take advantage of the concurrent and recursive nature inherent to the language. The talk covers these, along with challenges gathered while validating the storage & retrieval options for our distributed crawler and submitting patches to projects like Medici & Tora (Erlang-based Tokyo Cabinet clients). It will also touch upon Tokyo Cabinet's support for mapreduce with Lua, and notes from building your own custom formats & our internal mapreduce-esque and caching frameworks, used in building a multi-million impression platform utilizing under a gig of RAM per node.

Notes on:
- trends in disk/memory/bandwidth
- why Erlang, RAM, binaries
- garbage collection in the Erlang VM
- message passing
- use-cases

    Presentation Transcript

    • “Parsing binaries and protocols with Erlang ?!” Bhasker V Kode, co-founder & CTO at hover.in, at foss.in, December 4th, 2009. http://developers.hover.in
    • “WHY ... ?!”     foss.in/2009     http://developers.hover.in
    • “BUT I'm BUILDING webapps !?!”
    • “Everything's quick enough :D”
    • “doh!”
    • “ha! of course I knew that... err.... but people scale... that's what they do ..... that's our way out !!! scaling out ... scaling up ... auto scaling even...!!! :O”
    • “scale UP ...! more RAM seems to stall those silly CPU-unit warnings my hosting provider gives... bring on those infinite loops & polling crons. RealTimeWeb FTW!”
    • “scaling OUT, maybe with a distributed filesystem, and figure out a way for nodes to talk, and... replication... and location transparency during weekends... and commodity hardware which I can't pay for”
    • More data is becoming archival, NOT by choice but because it is forced to. We are not pushed to handle streams of data well (even Hadoop!) #bigdata. If you're not compromising, you're not pushing enough. Disk's loss must be someone else's gain. Fixed-length e.g.'s at fb, twitter, google
    • Erlang for RAM on the web is the new Embedded C
    • “THE NEWS TODAY. Once popular retro format 'binary' continues to go unnoticed after brief sightings on wallpapers during the matrix trilogy ....” pssst! in files of any mime/content type; in db's that accept binary; in RAM, via caching engines; compact for n/w transfer & storage; the answer to unicode
    • “fine! Binaries are everywhere, disks are not keeping up, and I've got more cores on my nodes every year.”
    • “But I'm still not going near a strict, dynamically typed functional programming language with support for concurrency, communication, and distribution, automatic memory management & support for multiple platforms !!!”
    • Erlang!!! over-rated ? OR under-appreciated ? “ [ 87, 84, 70] :O !”
    • What happens when you start an Erlang shell. SMP didn't exist before Erlang release R11 ('06)
    • “ahh... so processes are pseudo threads in the Erlang VM that are lightweight & the base of Erlang programs, each having its own heap and message inbox, & are meant for message passing with Erlang primitives. Also the developer can configure how many cores are used based on the # of schedulers, which run processes.”
    • A max of 1024 schedulers can be set => your Erlang source today should utilize boxes with up to 1024 cores
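      The scheduler settings mentioned above can be inspected from a running VM. A minimal sketch, assuming nothing beyond the standard `erlang:system_info/1` call; the module name `sched_info` is made up for illustration:

```erlang
-module(sched_info).
-export([summary/0]).

%% Report how many scheduler threads the VM started, how many are
%% currently online, and how many logical processors it detected
%% (the last value can be the atom 'unknown' on some systems).
summary() ->
    [{schedulers, erlang:system_info(schedulers)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     {logical_processors, erlang:system_info(logical_processors)}].
```

      The number of schedulers is normally chosen at startup, for example `erl +S 4` to start the VM with four scheduler threads.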
    • Let M = msgs to random users. Let N = 100,000 users. Route M msgs to the right N users!
      typical one-node approach:
        for i to M
          for j to N
            if match, add_update
      actor approach:
        N concurrent processes listening to all msgs
        as a new msg arrives, msg pass to all N pids
        in each concurrent process: if match, add_update
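      The actor approach on this slide could look roughly like the sketch below. It is an illustration under assumptions, not the code shown in the talk: the module name `router`, the `{UserId, Body}` message shape, and the `dump`/`stop` messages are all invented here.

```erlang
-module(router).
-export([start_users/1, broadcast/2, user_loop/2, demo/0]).

%% Spawn one process per user id; each keeps its own list of matches.
start_users(UserIds) ->
    [spawn(?MODULE, user_loop, [Id, []]) || Id <- UserIds].

%% Pass every incoming message to all user processes.
broadcast(Pids, Msg) ->
    [Pid ! {msg, Msg} || Pid <- Pids],
    ok.

user_loop(Id, Inbox) ->
    receive
        {msg, {To, _Body} = Msg} when To =:= Id ->
            user_loop(Id, [Msg | Inbox]);   % match: add_update
        {msg, _Other} ->
            user_loop(Id, Inbox);           % not addressed to this user
        {dump, From} ->
            From ! {inbox, Id, lists:reverse(Inbox)},
            user_loop(Id, Inbox);
        stop ->
            ok
    end.

%% Route two messages across three user processes and collect the result.
demo() ->
    Pids = start_users([1, 2, 3]),
    broadcast(Pids, {2, <<"hello">>}),
    broadcast(Pids, {3, <<"world">>}),
    [Pid ! {dump, self()} || Pid <- Pids],
    Result = lists:sort([receive {inbox, Id, Msgs} -> {Id, Msgs} end
                         || _ <- Pids]),
    [Pid ! stop || Pid <- Pids],
    Result.
```

      The matching work is spread over N processes that the schedulers can run on different cores, instead of one nested loop on one core.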
    • 3 papers to rule them all & 1 garbage collection method to free them!
    • EUREKA!!! we have a winner
    • “ahh... so this is what no shared memory in Erlang means: lightweight processes are garbage collected easily since they don't have references to data in each other's process heap, & messages are copied or shared based on their size and likelihood of reuse, and this is also optimized for binaries. tellmemore!!”
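      The per-process heap described above can be observed directly. A minimal sketch using the standard `erlang:process_info/2` call; the module name `heap_peek` is an assumption for illustration. As background from the Erlang documentation rather than from the slides: binaries larger than 64 bytes live outside the process heap and are reference counted, which is why they can be shared between processes instead of copied.

```erlang
-module(heap_peek).
-export([info/0]).

%% Spawn an idle process and read its private heap size (in words)
%% and the length of its message queue, then let it exit.
info() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Info = erlang:process_info(Pid, [heap_size, total_heap_size,
                                     message_queue_len]),
    Pid ! stop,
    Info.
```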
    • “How do you spawn a process?”
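      One possible answer, as a minimal sketch: `spawn/1` takes a fun and `spawn/3` takes a module, function and argument list. The module name `spawn_demo` is made up for illustration.

```erlang
-module(spawn_demo).
-export([start/0]).

start() ->
    %% spawn/1: run an anonymous function in a new process.
    Pid1 = spawn(fun() -> io:format("hello from ~p~n", [self()]) end),
    %% spawn/3: run Module:Function(Args...) in a new process.
    Pid2 = spawn(io, format, ["hello via spawn/3~n"]),
    {Pid1, Pid2}.
```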
    • “Where can you spawn a process?”
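      A process can be spawned on the local node or on any connected node with `spawn/2`. The sketch below is an assumption about what such a demo might look like; the module name `spawn_where` is invented, and running a fun on a remote node requires the same version of this module to be loaded there.

```erlang
-module(spawn_where).
-export([on_node/1]).

%% Spawn a process on Node and wait for it to report where it ran.
on_node(Node) ->
    Parent = self(),
    spawn(Node, fun() -> Parent ! {ran_on, node()} end),
    receive
        {ran_on, Where} -> Where
    after 1000 ->
        timeout
    end.
```

      Calling `spawn_where:on_node(node())` exercises the same code path on the local node.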
    • “Can a spawned process talk back to the caller?”
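      It can, as long as it knows the pid to reply to. A minimal sketch, with the module name `talk_back` and the squaring task chosen only for illustration; the unique reference from `make_ref/0` ties the reply to this particular request.

```erlang
-module(talk_back).
-export([square/1]).

%% Hand the work to a new process and wait for its reply.
square(N) ->
    Caller = self(),
    Ref = make_ref(),
    spawn(fun() -> Caller ! {Ref, N * N} end),
    receive
        {Ref, Result} -> Result
    after 1000 ->
        timeout
    end.
```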
    • “Can a spawned process listen as long as I want it to?” “Can a spawned process stop listening when I want it to?” “Can a spawned process spawn more processes?”
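      All three questions can be answered with one tail-recursive receive loop. This is a sketch under assumptions: the module name `listener` and the `ping`, `spawn_child` and `stop` messages are made up for illustration.

```erlang
-module(listener).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        {ping, From} ->
            From ! {pong, Count + 1},
            loop(Count + 1);            % tail call: keep listening
        {spawn_child, From} ->
            From ! {child, start()},    % a process can spawn more processes
            loop(Count);
        stop ->
            ok                          % returning ends the process
    after 60000 ->
        idle_timeout                    % give up after 60 s without messages
    end.
```

      The process listens for as long as the loop keeps calling itself, and stops as soon as a clause returns without recursing.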
    • “So though Erlang gives a library called OTP & a db called mnesia for making life easier, you can parse or create binaries easily, make client-server programs, distributed rpc calls, tail-recursive servers, message/priority queues for flow control, talk to ports and other langs, or create any data structure explicitly (a) in-memory (b) on-disk on any connected node!”
    • “show me the demo's”
      ● Process related
        – Message queues, Client-server, RPC, Timeouts
      ● Binary
        – Binary pattern matching, Parse swf/mp3 for metadata
        – Networking, comm. with C, Tokyocabinet client eg.
      ● Process + Binary!
        – Building a production ready in-memory CDN consistently faster than Am4z0n cl0udfr0nt, in stages
          open & gzip < concat js's < inmemory < streaming?
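      The demos themselves are not part of the transcript. As a stand-in for the "parse mp3 for metadata" item, here is a minimal sketch that reads an ID3v1 tag, which occupies the last 128 bytes of a file and starts with "TAG". The module name `id3v1` and the returned field list are assumptions for illustration.

```erlang
-module(id3v1).
-export([parse/1, from_file/1]).

%% ID3v1 layout: "TAG", title (30), artist (30), album (30),
%% year (4), comment (30), genre (1), for 128 bytes in total.
parse(Bin) when byte_size(Bin) >= 128 ->
    Skip = byte_size(Bin) - 128,
    case Bin of
        <<_:Skip/binary, "TAG", Title:30/binary, Artist:30/binary,
          Album:30/binary, Year:4/binary, _Comment:30/binary, _Genre:8>> ->
            {ok, [{title, trim(Title)}, {artist, trim(Artist)},
                  {album, trim(Album)}, {year, Year}]};
        _ ->
            no_tag
    end;
parse(_) ->
    no_tag.

from_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    parse(Bin).

%% Fields are padded with zero bytes; keep what precedes the first one.
trim(Field) ->
    hd(binary:split(Field, <<0>>)).
```

      A single pattern both locates the tag and slices out every fixed-width field, which is the main appeal of the bit syntax for this kind of format.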
    • “Binary pattern matching ?”
      <<Value:Size/Type-Signedness-Endianness-unit:Unit>>
      <<1:32>> = <<0,0,0,1>>.
      <<1:32/unsigned-little>> = <<1,0,0,0>>.
      <<_:8, "mnesia">> = <<"Amnesia">>.
      So a binary could hold unicode characters (English, Hindi, Tamil) or JPGs or HTTP headers, or basically segments of binaries:
      NewBinary = <<Segment1/binary, Segment2/binary>>.
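      The matches on this slide, plus the IPv4 case mentioned in the talk abstract, can be put together as a minimal sketch. The module name `bits_demo` and the returned field list are assumptions for illustration; the header layout is the standard 20-byte IPv4 header followed by optional options.

```erlang
-module(bits_demo).
-export([examples/0, ipv4/1]).

examples() ->
    %% The default segment type is a big-endian unsigned integer.
    <<0,0,0,1>> = <<1:32>>,
    <<1,0,0,0>> = <<1:32/unsigned-little>>,
    %% Skip one byte, then require the literal bytes "mnesia".
    <<_:8, "mnesia">> = <<"Amnesia">>,
    %% Build a new binary out of two binary segments.
    Seg1 = <<"GET ">>,
    Seg2 = <<"/index.html">>,
    <<Seg1/binary, Seg2/binary>>.

%% Parse the fixed part of an IPv4 header; IHL counts 32-bit words.
ipv4(<<4:4, IHL:4, _ToS:8, TotalLen:16, _Id:16, _Flags:3, _FragOff:13,
       TTL:8, Proto:8, _Checksum:16, Src:4/binary, Dst:4/binary,
       Rest/binary>>) when IHL >= 5 ->
    OptLen = (IHL - 5) * 4,
    case Rest of
        <<_Opts:OptLen/binary, Payload/binary>> ->
            {ok, [{total_length, TotalLen}, {ttl, TTL}, {protocol, Proto},
                  {src, list_to_tuple(binary_to_list(Src))},
                  {dst, list_to_tuple(binary_to_list(Dst))},
                  {payload, Payload}]};
        _ ->
            {error, truncated}
    end;
ipv4(_) ->
    {error, not_ipv4}.
```

      Sizes are in bits for integer segments and in bytes for `binary` segments, so `Src:4/binary` reads four bytes while `IHL:4` reads four bits.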
    • summary of tech at hover.in
      ● LYME stack since ~Dec '07, 4 (-1) nodes (64-bit, 4GB)
      ● python crawler + associated NLP parsers, indexes now in tokyo cabinet, inverted indexes in Erlang's mnesia db with binaries of 5 different Indian languages + multiple content-types, cpu time-slicing algos, priority queues for heat-seeking algo, flow control, caching engines, cyclic queues, map-reduces with non-blocking gathers, headless-firefox for thumbnails, patches to tokyocabinet client 'medici'
      ● Beta in Jan '09, 1 million hovers/month in May '09
      ● 2-4 developers + several interns across ~2 years