erlang at , Devcamp Blr 09


Published on

when erlang makes sense. this talk tries to draw simple metaphors from what we can learn from bacteria ,the brain ,memory & concurrency .

the talk was presented at devcamp bangalore, by bhasker v kode, co-founder & CTO at

also check out the devblog at

Published in: Technology, News & Politics
No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

erlang at , Devcamp Blr 09

  1. 1. when erlang makes sense ( erlang at ) Bhasker V Kode co-founder & CTO at at devCamp, Bangalore Apr 11, 2009
  2. 2. brief introduction to choose words from your blog, & decide what content / ad you want when you hover* over it * or other events like click,right click,etc
  3. 3. brief introduction to or... the worlds first publisher driven in-text content & ad delivery platform...
  4. 4. brief introduction to or lets web publishers push client-side event handling to the cloud , to run various rich applications called hoverlets
  5. 5. <ul><li> founded late 2007 </li></ul>
  6. 6. <ul><li> founded late 2007 </li></ul><ul><li>the web ~ 10- 20 years old </li></ul>
  7. 7. <ul><li> founded late 2007 </li></ul><ul><li>the web ~ 10- 20 years old </li></ul><ul><li>humans 100's of thousands of years </li></ul>
  8. 8. <ul><li> founded late 2007 </li></ul><ul><li>the web ~ 10- 20 years old </li></ul><ul><li>humans 100's of thousands of years </li></ul><ul><li>but bacteria .... around for millions of years ... so this talk is going to be about what we can learn from bacteria, the brain , and memory in a concurrent world </li></ul>
  9. 9. where are we heading <ul><li>the C10k problem & beyond </li></ul><ul><li>disk , hosting getting comparitively cheaper </li></ul><ul><li>horizontal scaling, moving beyond a single node </li></ul><ul><li>era of multi-core computing, how do we utilize more cpu's packed into each chip </li></ul><ul><li>concurrent programming </li></ul>
  10. 10. erlang <ul><li>functional programming language , ideal for concurrent,distributed, fault-tolerant applications </li></ul><ul><li>relies on message passing </li></ul><ul><li>immutable state, .erl files compiled to .beam </li></ul><ul><li>can explicitly decide if tasks need to done: </li></ul><ul><ul><li>syncronously or asychronously </li></ul></ul><ul><ul><li>in parallel (concurrent) , or distributed (multi-node) </li></ul></ul>
  11. 11. demo <ul><li>erl -name 'node1@' -setcookie secret </li></ul><ul><li>> </li></ul><ul><li>erl -name 'node2@' -setcookie secret </li></ul><ul><li>> net_adm:ping( 'node1@ ') . </li></ul><ul><li>pong. </li></ul><ul><li>now that they recognize each other , try rpc:call (TargetNode,Module,Fn,Args). spawn ( Node , Fn ) spawn ( Node, Module, Fn, Args ) . </li></ul>
  12. 12. this is how erlang looks like... <ul><li>lists: foldl ( fun( X , Accumulator ) -> Rem = X rem 2, case Rem of 0 -> Accumulator ++ [X]; _ -> Accumulator </li></ul><ul><li>end, [ ], lists: seq( 1, 10 ) ). [2,4,6,8,10] </li></ul>
  13. 13. bacteria that exhibit group dynamics <ul><li>bacteria peforms group operation like a lists fold operation </li></ul><ul><li>each bacteria spawns its own set of proteins, when only when the Accumulator is > some threshhold, will group dynamics of making light (bioluminiscence) kick ( eg: deep sea animals) </li></ul><ul><li>All bacteria have some sort of some presence & replies associated which are queried </li></ul>
  14. 14. <ul><li>in erlang, you can create a new concurrent process that evaluates a Fun. </li></ul><ul><ul><ul><li>Pid = spawn ( fun() -> %% do something end ) </li></ul></ul></ul><ul><li>Pid ! Msg , message sending is asynchronous </li></ul><ul><li>erlang can be set to utilize smp support </li></ul>
  15. 15. what does smp mean <ul><li>SMP (Structured Message Passing) supports the dynamic construction of process families that communicate through asynchronous messages </li></ul><ul><li>Process families can be connected together </li></ul><ul><li>Each process can communicate with its parent, its children, and a subset of its siblngs, as specified by the family topology </li></ul>
  16. 16. pattern matching <ul><li>each molecule connects to its specific receptor protein to complete the missing piece,to ingite the group behaviour that are only succesful when all of the cells participate in unison. </li></ul><ul><li>Type = case UserType of user -> true; admin -> true; _Else -> false end </li></ul>
  17. 17. binary pattern matching <ul><li>Words = [ <<“hello”>>, <<“happy”>>, <<”hover”>>], </li></ul><ul><li>Suggest = fun(UserTyped)-> lists:foldl ( fun(X, Acc) -> Size = Get_size(UserTyped), %% 8bits per char case UserTyped of <<Usertyped:Size , _Rest:binary >> -> Acc ++ X; _Else > Acc end end , [] , Words ) end ) , Suggest(<<”he”>>) gives [<<”hello”>>,<<”help”>>] </li></ul>
  18. 18. fault-tolerance <ul><li>father of DNA, says that all humans have 10 places in our genome, where we have lost one or gained an another one </li></ul><ul><li>erlang catches timeouts for receiving a reply once you spawn a process, you can monitor it & link errors, or use the erlang/OTP architecture to supervise process's and set restart strategies </li></ul><ul><li>enabled ericcson to run with 9 9's uptime (99.999999999 %) </li></ul>
  19. 19. fault tolerance in the real world <ul><li>for a single google search result , the same requests are sent to multiple machines( ~1000 as of 09), which ever replies the quickest wins. </li></ul><ul><li>in amazon's dynamo architecture that powers S3, use a (3,2,2) rule . ie Maintain 3 copies of the same data, reads/writes are succesful only when 2 concurrent requests succeed. This ratio varies based on SLA, internal vs public service. ( more on conflict resolution... ) </li></ul>
  20. 20. inter-species communication <ul><li>if you look at your skin – consists of very many different species, but all bacteria found to communicate using one common chemical language. </li></ul>
  21. 21. inter-species communication <ul><li>if you look at your skin – consists of very many different species, but all bacteria found to communicate using one common chemical language. hmmmmmmmmmmmmmmmmmmm.............. ....serialization ?! ....a common protein interpret ?! ....or perhaps just in time protein compilation?! </li></ul>
  22. 22. interspecies comm. in the real world <ul><li>attempts at serialization , cross language communication include: </li></ul><ul><ul><li>thrift ( by facebook) </li></ul></ul><ul><ul><li>protocol buffers ( by google) </li></ul></ul><ul><ul><li>base64 en/decoding , port based communication ( erlang<->python at ) </li></ul></ul><ul><li>as for interspecies communication look no further: </li></ul>
  23. 23. interspecies comm. in the real world <ul><li>as for interspecies communication in the real world... look no further: JRuby !!!! IronPython!!! GWT !!! ...& coming to a theatre near you this..... FortScala ??? FlashTheApple++ ??? VisualJavaTranScriptual ???? </li></ul>
  24. 24. talking about scaling <ul><li>The brain of the worker honeybee weighs about 1mg, the total number of neurons in its brain is estimated to be 950,000 </li></ul><ul><li>Flies acrobatically , ecognizes patterns, navigates , norages , communicates </li></ul><ul><li>Energy consumption: 10−15 J/op, at least 106 more efficient than digital silicon </li></ul>
  25. 25. the human brain <ul><li>100 billion neurons, stores ~100 TB </li></ul><ul><li>Differential analysis e.g., we compute color </li></ul><ul><li>Multiple inputs: sight, sound, taste, smell, touch </li></ul><ul><li>Facial recognition subcircuits, peripheral vision </li></ul><ul><li>in essence - the left & right brain vary in: left -> persistent disk , handles past/future right -> temporal caches! , handles present </li></ul>
  26. 26. what we can learn <ul><li>why sharding data is critical in concurrent programming ( DB tx'ion locks ) </li></ul><ul><li>implementing flowcontrol ( eg: memory game) </li></ul><ul><li>event based alarms, timeouts, supervised workers </li></ul><ul><li>wrt performance/scaling , you can't improve what you cant measure </li></ul>
  27. 27. working with data concurrently? <ul><li>typical web backends – all user data in one table – then clustering justsplits that by artibary basis. Query content table where user=user1, </li></ul><ul><li>what if you have N concurrent process's accessing N diff user tables – no locks, you can parallize since sufficiently un-related. </li></ul><ul><li>same with mapreduce algo's, if the data is sufficiently unrelated, then parallelized easy, results can come back asynchronously </li></ul>
  28. 28. retrieving data concurrently? <ul><li>replication vs location transparency, are they fragmented, are some nodes read-only ? (rpc...) </li></ul><ul><li>need metadata for which node to acess for user1, (or use hashing fn like memcache) </li></ul><ul><li>are tables in-memory (right brain ), cached from disk , or on disk alone ( left brain ) </li></ul><ul><li>mnesia , erlang's inbuilt ~database lets you make highly granular choices </li></ul>
  29. 29. measurement <ul><li>you can't improve what you can't measure. </li></ul><ul><li>introducing the heat-seeking algo at </li></ul><ul><li>using tsung (written in erlang again ) – load performance testing tool, for simulating 100's of concurrent users/requests , and great for analysing bottlenecks of your system, benchmarking (content delivery networks (CDN's ) , etc </li></ul>
  30. 30. built-in datatypes <ul><li>atoms, integers, floats, tuples, lists </li></ul><ul><li>binary </li></ul><ul><li>dict </li></ul><ul><li>queues, sets , ord_sets </li></ul><ul><li>gb_trees </li></ul><ul><li>and unlike mysql, you can store complex datatypes into mnesia , and pattern match them </li></ul>
  31. 31. temporal data <ul><li>erlang/OTP comes with building blocks for making granular choices in data structures: </li></ul><ul><ul><li>gen_servers -> client/server architecture for process's </li></ul></ul><ul><ul><li>gen_event -> event based </li></ul></ul><ul><ul><li>get_fsm -> finite state machine , etc </li></ul></ul><ul><li>move over db, files. A single erlang process spawned can hold state. we use it to build own own set/get based cache workers. </li></ul>
  32. 32. temporal data in the real world <ul><li>you listen to a phone number in batche of 3 or 4 digits. the part that absorbs just before writing (temporal), until you write into your contact book or memorize it ( persistent) </li></ul><ul><li>a smart way of building counters that move ultra-fast in erlang, would be a gen_server that resides in-memory, accepting requests via a flowcontrol , getting a new state , and then writing to disc/db when conveneient. </li></ul>
  33. 33. more on sychcronous,asynchronous <ul><li>gen_sever's let you make synchonous call's that are blocking ( wait till it returns with the result ), and can catch timeouts, restarts,etc. </li></ul><ul><li>or can be non-blocking – asynchronous casts ( send the instruction of sending a mail,and return to “thank you page” immediately, dont wait for the mail to be sent. </li></ul><ul><li>ofcourse or you use the spawn,pid to write your own implementation of gen_*'s </li></ul>
  34. 34. flowcontrol <ul><li>queues to handle intents of reads/writes, to determine bottlenecks.(eg ur own,rabbitmq,etc ) </li></ul><ul><li>eg1: when we addjobs to the queue, if it takes greater than X consistently we move it to high traffic bracket, do things differently, possibly add workers or ignore based on the task. </li></ul><ul><li>eg2: amazon shopping carts, are known to be extra resilient to write failures, (dont mind multiple versions of them over time) </li></ul>
  35. 35. supervisors, workers <ul><li>as bacteria grow, they split into two. when muscle tears, it knows exactly what to replace. </li></ul><ul><li>erlang supervisors can decide restart policies: if one worker fails, restart all .... or if one worker fails, restart just that worker, more tweaks. </li></ul><ul><li>can spawn multiple workers on the fly, much like the need for launching a new ec2 instant </li></ul>
  36. 36. how do you know where to start/stop <ul><li>build your own bacteria antidote , stress tests to see, on your typical production server : </li></ul><ul><ul><li>how many process's can u create, how many open file sockets can you have, ( system limitations, tweak ) </li></ul></ul><ul><ul><li>how mant tables can you store, </li></ul></ul><ul><ul><li>how many rows will it take before it gets slow ( time to fragment ) </li></ul></ul><ul><ul><li>learn from the brain, gr8 videos on </li></ul></ul>
  37. 37. summary of tech at <ul><li>3 node cluster (64-bit 4gb ) on the LYME stack </li></ul><ul><li>python crawler, associated NLP parsers, mini metadata interpreter in client-side js , maybe moving to spidermonkey </li></ul><ul><li>remote node debugger/handler, flowcontrol, heat-seeking, cpu time-splicing algo's, headless-firefox for thumbnails queue </li></ul><ul><li>touching 500k hovers/month in Apr'09 , upwards of 25million S3 GET requests/month </li></ul>
  38. 38. summary of our erlang modules <ul><li>rewrites.erl error.erl frag_mnesia.erl hi_api_response.erl hi_appmods_api_user.erl hi_cache_app.erl , hi_cache_sup.erl hoverlingo.erl hi_cache_worker.erl hi_classes.erl hi_community.erl hi_cron_hoverletupdater_app.erl hi_cron_hoverletupdater.erl hi_cron_hoverletupdater_sup.erl hi_cron_kwebucket.erl hi_crypto.erl hi_flowcontrol_hoverletupdater.erl hi_htmlutils_site.erl hi_hybridq_app.erl hi_hybridq_sup.erl hi_hybridq_worker.erl hi_login.erl hi_mailer.erl hi_messaging_app.erl hi_messaging_sup.erl hi_messaging_worker.erl hi_mgr_crawler.erl hi_mgr_db_console.erl hi_mgr_db.erl hi_mgr_db_mnesia.erl hi_mgr_hoverlet.erl hi_mgr_kw.erl hi_mgr_node.erl hi_mgr_thumbs.erl hi_mgr_traffic.erl hi_nlp.erl hi_normalizer.erl hi_pagination_app.erl hi_pagination_sup.erl, hi_pagination_worker.erl hi_pmap.erl hi_register_app.erl hi_register.erl, hi_register_sup.erl, hi_register_worker.erl hi_render_hoverlet_worker.erl hi_rrd.erl , hi_rrd_worker.erl hi_settings.erl hi_sid.erl hi_site.erl hi_stat.erl hi_stats_distribution.erl hi_stats_overview.erl hi_str.erl hi_trees.erl hi_utf8.erl hi_yaws.erl </li></ul>
  39. 39. thank you
  40. 40. references <ul><li> </li></ul><ul><li> </li></ul><ul><li> </li></ul><ul><li> </li></ul><ul><li> </li></ul><ul><li>all the amazing brain-related talks at , </li></ul><ul><li>shoutout to everyone at #erlang ! </li></ul>