Your SlideShare is downloading. ×
0
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Distributed Erlang Systems In Operation
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Distributed Erlang Systems In Operation

3,102

Published on

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,102
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
67
Comments
0
Likes
6
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Distributed Erlang Systems In Operation Andy Gross <andy@basho.com>, @argv0 VP, Engineering Basho Technologies Erlang Factory SF 2010
  • 2. Architectural Goals • Decentralized (no masters). • Distributed (asynchronous, nodes use only local data). • Homogeneous (all nodes can do anything). • Fault tolerant (emergent goal). • Observable
  • 3. Anti-Goals • Global state: • pg2/hot data in mnesia • globally registered names • Distributed transactions • Reliance on physical time
  • 4. Compromise your Goals • Decentralized (no masters). • Distributed (nodes use only local data). • Homogeneous (all nodes can do anything). • No distributed transactions/global state. • No reliance on physical time.
  • 5. Systems Design • Cluster Membership • Load balancing/naming/resource allocation • Liveness checking • Soft Global State
  • 6. Cluster Membership • Option 1: Use a configuration file: • Requires out-of-band sync of configuration file across machines. • Not “elastic” enough for some use-cases. • Option II: Contact a seed node to join and use gossip protocol to propagate state.
  • 7. Load Balancing and Resource Allocation • Static assignment • Round-robin/Random • Static hashing: Nodes[hash(Item) mod length(Nodes)] • Dynamo/Riak/Cassandra/Voldemort: Consistent Hashing
  • 8. Liveness Checking • nodes() and net_adm:ping() operations can be too low-level. • Sometimes you’d like to divert traffic from a node at the application level while keeping distributed Erlang up. • Use net_kernel:monitor_nodes() and an app-level mechanism for liveness.
  • 9. Soft State/Gossip Protocols • An eventually-consistent alternative to global state. • Nodes make changes, gossip to another node. • Nodes receive changes, merge with local state, gossip to another node. • Requires up-front thought about data structures, dealing with slightly-stale data.
  • 10. Running Your System • Shipping code • Upgrading code • Debugging your own systems • Living with other people’s systems
  • 11. Shipping Code • Don’t rely on working Erlang on end-user machines (many Linux distros are broken or out of date). • Ship code with an embedded runtime and libraries. • Put version/build info in code.
  • 12. Upgrading Code • Hot code loading for small, emergency fixes. • For new releases, reboot the node. • Why not .appups? • Systems I’ve worked on have changed/ evolved too fast. • A reboot is a good test of resiliency.
  • 13. Debugging Running Systems • Remote Erlang shells are awesome, except when distributed Erlang dies (it happens). • run_erl (or even screen(1)) give you a backdoor for when -remsh fails. • rebar (http://hg.basho.com/rebar) makes this easy. • What if you don’t have access to the box?
  • 14. OPS - Other People’s Systems • Your Erlang, Enterprise firewalls. • Erlang shell is powerful, but scary. • Provide a debugging module. • Get data out via HTTP/SMTP/SNMP • Use disk_log/report_browser.
  • 15. Questions? “You know you have [a distributed system] when the crash of a computer you’ve never heard of stops you from getting any work done” -Leslie Lamport
  • 16. Resources • unsplit: http://github.com/uwiger/unsplit • gen_leader: http://github.com/KirinDave/gen_leader_revival • Dynamo: http://www.allthingsdistributed.com/2007/10/ amazons_dynamo.html • Hans Svensson: Distributed Erlang Application Pitfalls and Recipes: http://www.erlang.org/workshop/2007/proceedings/ 06svenss.ppt • Consistent Hashing and Random Trees: Distributed Caching Protocols for relieving Hot Spots on the World Wide Web: http://bit.ly/LewinConsistentHashing

×