SPLAY: Distributed Systems Made Simple


  1. SPLAY: Distributed Systems Made Simple
     Pascal Felber, Raluca Halalai, Lorenzo Leonini, Etienne Rivière, Valerio Schiavoni, José Valerio
     Université de Neuchâtel, Switzerland
     www.splay-project.org
  2. Motivations
     • Developing, testing, and tuning distributed applications is hard
     • In Computer Science research, bridging the gap between a pseudocode description and its implementation is hard
     • Using worldwide testbeds is hard
  3. Testbeds
     • Set of machines for testing a distributed application/protocol
     • Several different testbeds: networks of idle workstations, a cluster @UniNE, your own machine, ...
  4. What is PlanetLab?
  5. What is PlanetLab?
     • Machines contributed by universities, companies, etc.
     • 1098 nodes at 531 sites (02/09/2011)
     • Shared resources, no privileged access
     • University-quality Internet links
     • High resource contention
     • Faults, churn, and packet loss are the norm
     • Challenging conditions
  6. Daily Job With Distributed Systems (step 1: code)
     • Write (testbed-specific) code
     • Local tests, in-house cluster, PlanetLab...
  7. Daily Job With Distributed Systems (step 2: debug)
     • Debug (in this context, a nightmare)
  8. Daily Job With Distributed Systems (step 3: deploy)
     • Deploy, with testbed-specific scripts
  9. Daily Job With Distributed Systems (step 4: get logs)
     • Get logs, with testbed-specific scripts
  10. Daily Job With Distributed Systems (step 5: plots)
      • Produce plots, hopefully
  11. SPLAY at a Glance
      • Covers the whole workflow: code, debug, deploy, get logs, plots (gnuplot, ...)
      • Supports the development, evaluation, testing, and tuning of distributed applications on any testbed:
        • In-house cluster, shared testbeds, emulated environments...
      • Provides an easy-to-use, pseudocode-like language based on Lua
      • Open-source: http://www.splay-project.org
  12. The Big Picture
      (diagram: an application running on splayd daemons across the testbed, coordinated by the Splay Controller, which is backed by an SQL DB)
  15. SPLAY architecture
      • Portable daemons (ANSI C) on the testbed(s)
      • Testbed-agnostic for the user
      • Lightweight (memory & CPU)
      • Retrieve results by remote logging
  16. Lua, the SPLAY language
      • Based on Lua, a highly efficient scripting language
      • High-level scripting: simple interaction with C, bytecode-based, garbage collection, ...
      • Close to pseudocode (focus on the algorithm)
      • Favors natural ways of expressing algorithms (e.g., RPCs)
      • SPLAY applications can be run locally, without the deployment infrastructure (for debugging & testing)
      • Comprehensive set of libraries (extensible)
      (Figure 6: overview of the main SPLAY libraries: luasocket*, io (fs)*, stdlib*, crypto*, json*, llenc, misc, log, rpc, events, and the sandboxed wrappers sb_socket, sb_fs, sb_stdlib; * = third-party and Lua libraries. The slide also shows an excerpt of the SPLAY paper describing sandboxing and the coroutine-based event model.)
  17. Why Lua?
      • Light & fast: (very) close to equivalent code in C
      • Concise
        • Allows developers to focus on ideas more than implementation details
        • Key for researchers and students
      • Sandboxing thanks to the possibility of easily redefining (even built-in) functions
  18. Concise
      (side-by-side: pseudocode as published in the original paper vs. executable code using the SPLAY libraries)
  19. http://www.splay-project.org
  20. Sandbox: Motivations
      • Experiments should access only their own resources
      • Required for non-dedicated testbeds
        • Idle workstations in universities or companies
      • Must protect the regular users from ill-behaved code (e.g., full disk, full memory, etc.)
      • Memory allocation, filesystem, and network resources must be restricted
  21. Sandboxing with Lua
      • In Lua, all functions can be re-defined transparently
      • Resource control is done in the re-defined functions before calling the original function
      • No code modification required for the user
      (layered diagram: SPLAY application on top of the SPLAY Lua libraries, which wrap all stdlib and system calls, on top of the operating system)
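      As an illustration of this redefinition technique, here is a generic Lua sketch (not the actual SPLAY sandbox code, and with a hypothetical allowed_prefix policy chosen for the example): the built-in io.open is replaced by a wrapper that enforces the policy before delegating to the original function.

      -- Keep a private reference to the original function, then replace it.
      local real_open = io.open
      local allowed_prefix = "/tmp/sandbox/"   -- hypothetical restriction for this example

      io.open = function(path, mode)
        -- Resource control happens here, before calling the original function.
        if path:sub(1, #allowed_prefix) ~= allowed_prefix then
          return nil, "access denied by sandbox: "..path
        end
        return real_open(path, mode)
      end

      -- Application code is unchanged; the call transparently goes through the wrapper.
      local f, err = io.open("/etc/passwd", "r")
      print(f, err)   --> nil   access denied by sandbox: /etc/passwd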
  22. for _, module in pairs(splay)
      • Available modules: luasocket, events, io, crypto, llenc/benc, json, misc, log, rpc
      • The next slides present the novelties introduced for distributed systems
      • Modules are sandboxed to prevent access to sensitive resources
  23. splay.events
      • Distributed protocols can use the message-passing paradigm to communicate
      • Nodes react to events
        • Local, incoming, or outgoing messages
      • The core of the SPLAY runtime
      • The splay.socket & splay.io libraries provide a non-blocking operational mode
      • Based on Lua's coroutines
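      A minimal sketch of this event-driven style, using only calls that also appear in the demo code later in this deck (events.loop, events.thread, events.sleep, log.print); it is an illustrative fragment, not part of the original slides.

      require"splay.base"   -- provides events, log, misc, ... (as in the demo)

      -- Two cooperative threads scheduled by the coroutine-based event loop.
      events.loop(function()
        events.thread(function()
          for i = 1, 3 do
            events.sleep(5)               -- yield to the scheduler for 5 seconds
            log.print("periodic tick", i)
          end
        end)
        events.thread(function()
          events.sleep(1)
          log.print("one-shot task done")
        end)
      end)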
  24. splay.rpc
      • Default mechanism to communicate between nodes
      • Support for UDP and TCP
      • Efficient BitTorrent-like encoding
      • Experimental binary encoding
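      For illustration, a hedged sketch of the RPC style used in the demo later in this deck (rpc.server, and rpc.call with a {function-name, arguments} list plus a timeout). It assumes the script runs as a deployed SPLAY job, so the job table (job.me, job.nodes) is provided by the runtime; it uses splay.urpc as the demo does, and merely assumes the TCP-based splay.rpc exposes the same interface.

      require"splay.base"
      rpc = require"splay.urpc"      -- UDP RPC, as in the demo code

      rpc_timeout = 15

      -- A plain global Lua function; remote nodes invoke it by name through the RPC layer.
      function ping(from)
        log.print("ping from "..from.ip..":"..from.port)
        return "pong"
      end

      events.loop(function()
        rpc.server(job.me)               -- start serving RPCs on this node
        events.sleep(5)                  -- give the other nodes time to start their servers
        local target = misc.random_pick_one(job.nodes)
        -- The result is nil if the call times out (as noted in the demo slides).
        local r = rpc.call(target, {"ping", job.me}, rpc_timeout)
        log.print("reply:", r)
      end)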
  25. Life Before SPLAY
      • Time spent on developing testbed-specific protocols
        • Or falling back to simulations...
      • The focus should be on ideas
      • Researchers usually have no time to produce industrial-quality code
      (background: a wall of verbose, heavily commented Java simulator code... and there is more :-( )
  26. Life With SPLAY
      Implementations written with SPLAY, such as epidemic diffusion on Erdős-Rényi random graphs [16] and various types of distribution trees [10] (n-ary trees, parallel trees), are extremely concise in terms of lines of code (LOC):
        Chord (base): 59
        Chord (base + finger table + leafset): 101
        Pastry: 265
        Scribe (on Pastry): 79
        SplitStream (on Pastry and Scribe): 58
        BitTorrent: 420
        Cyclon: 93
        Epidemic: 35
        Trees: 47
      • Lines of pseudocode ~== lines of executable code
      (the number of lines is clearly just a rough indicator of the expressiveness of a system)
  27. Churn management
      • "Natural churn" on PlanetLab is not reproducible
        • Hard to compare algorithms under the same conditions
      • SPLAY allows churn to be finely controlled
        • Traces (e.g., from file-sharing systems)
        • Synthetic descriptions: dedicated language
      Example of a synthetic churn description (Figure 5):
        at 30s join 10
        from 5m to 10m inc 10
        from 10m to 15m const churn 50%
        at 15m leave 50%
        from 15m to 20m inc 10 churn 150%
        at 20m stop
      (accompanying plot: binned number of joins and leaves per minute, and number of nodes, over the 20-minute scenario)
  28. Synthetic churn
      • Using churn management to reproduce massive-failure conditions for the SPLAY Pastry implementation
      • Massive failure: 50% of the network fails at once
      (plots: routing performance of the Pastry DHT, delay percentiles and route-failure rate over time, showing self-repair after the failure)
  29. Trace-driven churn
      • Traces from the OverNet file-sharing network [IPTPS03]
      • Speed-up of 2, 5, 10 (1 minute = 30, 12, 6 seconds)
      • Evolution of hit ratio & delays of Pastry on PlanetLab
      (plots for churn x2, x5, x10: route failures, delays, and nodes joining/leaving over time; churn is derived from the OverNet trace and sped up for increasing volatility)
  30. Live Demo
      www.splay-project.org
  31. Live demo: Gossip-based dissemination
      • Also called epidemic dissemination
        • A message spreads like the flu in a population
      • Robust way to spread a message in a random network
      • Two actions: push and pull
        • Push: a node chooses f other random nodes to infect
          • A newly infected node infects f others in turn
          • Done for a limited number of steps (TTL)
        • Pull: a node tries to fetch messages from a random node
  32. Every node knows a random set of other nodes. A node (the SOURCE) wants to send a message to all nodes.
  33. An initial push phase: nodes that receive the message for the first time forward it to f random neighbors, up to TTL times.
  34. Periodically, nodes pull for new messages from one random neighbor. The message eventually reaches all peers with high probability.
  35. Algorithm 1: A simple push-pull dissemination protocol, on node P.
      Constants:
        f: fanout (push: P forwards a new message to f random peers)
        TTL: time to live (push: a message is forwarded to f peers at most TTL times)
        Δ: pull period (pull: P asks for new messages every Δ seconds)
      Variables:
        R: set of received messages (indexed by ID)
        N: set of node identifiers
      /* Invoked when a message is pushed to node P by node Q */
      function Push(Message m, hops)
        if m received for the first time then
          /* Store the message locally */
          R ← R ∪ {m}
          /* Propagate the message if required */
          if hops > 0 then
            invoke Push(m, hops-1) on f random peers from N
      /* Periodic pull */
      thread PeriodicPull()
        every Δ seconds do
          invoke Pull(R, P) on a random node from N
      /* Invoked when a node Q requests messages from node P */
      function Pull(R_Q, Q)
        foreach m | m ∈ R ∧ m ∉ R_Q do
          invoke Push(m, 0) on Q
  36. Demo code: settings and the push function
      -- Load the base and UDP RPC libraries.
      require"splay.base"
      rpc = require"splay.urpc"

      --[[ SETTINGS ]]--
      fanout = 3
      ttl = 3
      pull_period = 3
      -- Timeout for RPC operations (i.e., an RPC will return nil after 15 seconds).
      rpc_timeout = 15

      --[[ VARS ]]--
      msgs = {}

      function push(q, m)
        if not msgs[m.id] then
          msgs[m.id] = m
          -- Logging facilities: all logs are available at the controllers,
          -- where they are timestamped and stored in the DB.
          log.print("NEW: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
          m.t = m.t - 1
          if m.t > 0 then
            for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
              log.print("FORWARD: "..m.id.." (ttl:"..m.t..") to "..n.ip..":"..n.port)
              -- The RPC call is embedded in an anonymous function run in a new
              -- anonymous thread: we do not care about its success (epidemics are
              -- naturally robust to message loss). Testing for success would be
              -- easy: the second return value is nil in case of failure.
              events.thread(function()
                rpc.call(n, {"push", job.me, m}, rpc_timeout)
              end)
            end
          end
        else
          log.print("DUP: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
        end
      end
  37. Demo code: periodic_pull and pull
      -- SPLAY provides a library for cooperative multithreading, events, scheduling
      -- and synchronization, plus various commodity libraries.
      function periodic_pull()
        while events.sleep(pull_period) do
          local q = misc.random_pick_one(job.nodes)
          log.print("PULLING: "..q.ip..":"..q.port)
          local ids = {}
          -- In Lua, arrays and maps are the same type.
          for id, _ in pairs(msgs) do
            ids[id] = true
          end
          -- Variable r is nil if the RPC times out.
          local r = rpc.call(q, {"pull", ids}, rpc_timeout)
          if r then
            for id, m in pairs(r) do
              if not msgs[id] then
                msgs[id] = m
                log.print("NEW REPLY: "..m.id.." from "..q.ip..":"..q.port)
              else
                log.print("DUP REPLY: "..m.id.." from "..q.ip..":"..q.port)
              end
            end
          end
        end
      end

      -- There is no difference between a local function and one called by a distant
      -- node; even variables can be accessed by a distant node via RPC. SPLAY does
      -- all the work of serialization, network calls, etc. while preserving a low
      -- footprint and high performance.
      function pull(ids)
        local r = {}
        for id, m in pairs(msgs) do
          if not ids[id] then
            r[id] = m
          end
        end
        return r
      end
  38. Demo code: source and the main event loop
      function source()
        local m = {id = 1, t = ttl, data = "data"}
        for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
          log.print("SOURCE: "..m.id.." to "..n.ip..":"..n.port)
          events.thread(function()
            -- Calling a remote function.
            rpc.call(n, {"push", job.me, m}, rpc_timeout)
          end)
        end
      end

      events.loop(function()
        -- Start the RPC server: it is up and running from this point on.
        rpc.server(job.me)
        log.print("ME", job.me.ip, job.me.port, job.position)
        events.sleep(15)
        -- Embed a function in a separate thread.
        events.thread(periodic_pull)
        if job.position == 1 then
          source()
        end
      end)
  39. Take-away slide
      • Distributed systems raise a number of issues for their evaluation
        • Hard to implement, debug, deploy, tune
      • SPLAY leverages Lua and a centralized controller to produce an easy-to-use yet powerful working environment
