Distributed Erlang Systems In Operation
Andy Gross <andy@basho.com>, @argv0
VP, Engineering
Basho Technologies
Erlang Factory SF 2010
Architectural Goals
• Decentralized (no masters).
• Distributed (asynchronous, nodes use only
  local data).
• Homogeneous (all nodes can do anything).
• Fault tolerant (emergent goal).
• Observable
Anti-Goals
• Global state:
 • pg2/hot data in mnesia
 • globally registered names
• Distributed transactions
• Reliance on physical time
Compromise your Goals
• Decentralized (no masters).
• Distributed (nodes use only local data).
• Homogeneous (all nodes can do anything).
• No distributed transactions/global state.
• No reliance on physical time.
Systems Design

• Cluster Membership
• Load balancing/naming/resource allocation
• Liveness checking
• Soft Global State
Cluster Membership
• Option 1: Use a configuration file:
  • Requires out-of-band sync of the
    configuration file across machines.
  • Not “elastic” enough for some use cases.
• Option 2: Contact a seed node to join and
  use a gossip protocol to propagate state.
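
A minimal sketch of the first half of Option 2, assuming plain distributed Erlang is used to reach the seed. The module name `cluster_join` and the error tuple are illustrative, not from the talk. A real system would go on to fetch the seed's membership state and start gossiping.

```erlang
-module(cluster_join).
-export([join/1]).

%% Try to join the cluster through a single seed node.
%% Returns {ok, Peers} with the nodes currently visible from this node,
%% or {error, {seed_unreachable, Seed}} if the seed does not answer.
-spec join(node()) -> {ok, [node()]} | {error, term()}.
join(SeedNode) when SeedNode =:= node() ->
    %% We are the seed ourselves: nothing to contact.
    {ok, nodes()};
join(SeedNode) ->
    case net_adm:ping(SeedNode) of
        pong ->
            %% Connected. Membership state would be exchanged from here on.
            {ok, nodes()};
        pang ->
            {error, {seed_unreachable, SeedNode}}
    end.
```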
Load Balancing and Resource Allocation
• Static assignment
• Round-robin/Random
• Static hashing:
  Nodes[hash(Item) mod length(Nodes)]
• Dynamo/Riak/Cassandra/Voldemort:
  Consistent Hashing
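
The last two options can be compared in a short sketch. It assumes `erlang:phash2` as the hash function and a simple sorted list as the ring. The module name `placement` and the virtual-node count are illustrative choices, and this is not how Riak or any of the listed systems implements its ring.

```erlang
-module(placement).
-export([static_owner/2, ring/2, ring_owner/2]).

%% Static hashing: Nodes[hash(Item) mod length(Nodes)].
%% erlang:phash2(Term, Range) returns a value in 0..Range-1,
%% so 1 is added for the 1-based lists:nth/2.
static_owner(Item, Nodes) when Nodes =/= [] ->
    lists:nth(erlang:phash2(Item, length(Nodes)) + 1, Nodes).

%% Consistent hashing: every node owns VNodes points on a ring.
%% Returns a list of {Point, Node} sorted by Point.
ring(Nodes, VNodes) ->
    lists:sort([{erlang:phash2({Node, I}), Node}
                || Node <- Nodes, I <- lists:seq(1, VNodes)]).

%% The owner of Item is the node at the first ring point at or after
%% hash(Item), wrapping around to the first point of the ring.
ring_owner(Item, [{_, First} | _] = Ring) ->
    H = erlang:phash2(Item),
    case lists:dropwhile(fun({Point, _}) -> Point < H end, Ring) of
        [{_, Node} | _] -> Node;
        [] -> First
    end.
```

With static hashing, changing `length(Nodes)` can move almost every item. With the ring, removing a node only reassigns the items that node owned.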
Liveness Checking
• nodes() and net_adm:ping/1 can be too
  low-level.
• Sometimes you’d like to divert traffic from
  a node at the application level while
  keeping distributed Erlang up.
• Use net_kernel:monitor_nodes/1 and an
  app-level mechanism for liveness.
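
One way to combine the two, sketched as a small gen_server. The module name `liveness` and its API (`set_serving/2`, `serving_nodes/0`) are hypothetical. Only `net_kernel:monitor_nodes/1` and the `{nodeup, Node}` / `{nodedown, Node}` messages come from OTP.

```erlang
-module(liveness).
-behaviour(gen_server).
-export([start_link/0, set_serving/2, serving_nodes/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Mark Node as serving (true) or drained (false) at the application
%% level. Distributed Erlang stays connected either way.
set_serving(Node, Bool) when is_boolean(Bool) ->
    gen_server:call(?MODULE, {set_serving, Node, Bool}).

%% Nodes that are connected and not drained.
serving_nodes() ->
    gen_server:call(?MODULE, serving_nodes).

init([]) ->
    %% Subscribe to nodeup/nodedown messages. The return value is ignored
    %% so the sketch also starts on a node without distribution enabled.
    _ = net_kernel:monitor_nodes(true),
    {ok, #{connected => ordsets:from_list([node() | nodes()]),
           drained => ordsets:new()}}.

handle_call({set_serving, Node, true}, _From, S = #{drained := D}) ->
    {reply, ok, S#{drained := ordsets:del_element(Node, D)}};
handle_call({set_serving, Node, false}, _From, S = #{drained := D}) ->
    {reply, ok, S#{drained := ordsets:add_element(Node, D)}};
handle_call(serving_nodes, _From, S = #{connected := C, drained := D}) ->
    {reply, ordsets:subtract(C, D), S}.

handle_cast(_Msg, S) ->
    {noreply, S}.

handle_info({nodeup, Node}, S = #{connected := C}) ->
    {noreply, S#{connected := ordsets:add_element(Node, C)}};
handle_info({nodedown, Node}, S = #{connected := C}) ->
    {noreply, S#{connected := ordsets:del_element(Node, C)}};
handle_info(_Other, S) ->
    {noreply, S}.
```

Request routing would consult `liveness:serving_nodes()` instead of `nodes()`, so an operator can drain a node without disconnecting it.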
Soft State/Gossip Protocols
• An eventually-consistent alternative to
  global state.
• Nodes make changes, gossip to another
  node.
• Nodes receive changes, merge with local
  state, gossip to another node.
• Requires up-front thought about data
  structures and about dealing with slightly
  stale data.
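
As an illustration of the data-structure point, here is a sketch of one mergeable structure: a grow-only counter kept as a map from node name to count. It is an example chosen for this note, not the structure used in the talk, and the module name `soft_state` is hypothetical. Merging takes the per-node maximum, so merges can arrive in any order, any number of times, and no clocks are involved.

```erlang
-module(soft_state).
-export([new/0, increment/2, value/1, merge/2, pick_peer/1]).

new() ->
    #{}.

%% A node records a local change by bumping only its own entry.
increment(Node, State) ->
    maps:update_with(Node, fun(N) -> N + 1 end, 1, State).

%% The counter's value is the sum over all nodes.
value(State) ->
    lists:sum(maps:values(State)).

%% Merge received state into local state: per-node maximum.
%% Commutative, associative and idempotent, so replicas converge.
merge(Local, Remote) ->
    maps:fold(fun(Node, Count, Acc) ->
                  maps:update_with(Node,
                                   fun(Old) -> max(Old, Count) end,
                                   Count, Acc)
              end, Local, Remote).

%% Choose a random peer to gossip to next, or none if there are no peers.
pick_peer([]) ->
    none;
pick_peer(Peers) ->
    lists:nth(rand:uniform(length(Peers)), Peers).
```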
Running Your System

• Shipping code
• Upgrading code
• Debugging your own systems
• Living with other people’s systems
Shipping Code

• Don’t rely on working Erlang on end-user
  machines (many Linux distros are broken
  or out of date).
• Ship code with an embedded runtime and
  libraries.
• Put version/build info in code.
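
A minimal sketch of the last bullet, assuming the build injects a `BUILD_VERSION` macro (for example through the compiler's `{d, Macro, Value}` option). The module and macro names are hypothetical.

```erlang
-module(build_info).
-export([info/0]).

%% Fall back to a placeholder when the build did not define the macro.
-ifndef(BUILD_VERSION).
-define(BUILD_VERSION, "unknown").
-endif.

%% Version of this build plus the runtime it is actually running on.
info() ->
    #{version => ?BUILD_VERSION,
      otp_release => erlang:system_info(otp_release),
      erts_version => erlang:system_info(version)}.
```

Calling `build_info:info()` from a shell or a status endpoint shows which build and which runtime a node is running.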
Upgrading Code
• Hot code loading for small, emergency
  fixes.
• For new releases, reboot the node.
• Why not .appups?
 • Systems I’ve worked on have
    changed/evolved too fast.
 • A reboot is a good test of resiliency.
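
For the emergency-fix case, hot loading a single module could look like the sketch below. The helper module `hotfix` is hypothetical. `code:soft_purge/1` and `code:load_file/1` are the standard OTP calls.

```erlang
-module(hotfix).
-export([reload/1]).

%% Load the newest compiled version of Module from the code path.
%% Returns {module, Module} on success or {error, Reason}.
reload(Module) ->
    %% soft_purge/1 removes old code only if no process is still running it,
    %% and returns false instead of killing those processes.
    case code:soft_purge(Module) of
        true ->
            code:load_file(Module);
        false ->
            {error, old_code_in_use}
    end.
```

Running processes only switch to the new version when they make a fully qualified call (`Module:Function(...)`) into the module.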
Debugging Running Systems
• Remote Erlang shells are awesome, except
  when distributed Erlang dies (it happens).
• run_erl (or even screen(1)) gives you a
  backdoor for when -remsh fails.
• rebar (http://hg.basho.com/rebar) makes
  this easy.
• What if you don’t have access to the box?
OPS - Other People’s Systems
• Your Erlang, Enterprise firewalls.
• Erlang shell is powerful, but scary.
• Provide a debugging module.
• Get data out via HTTP/SMTP/SNMP
• Use disk_log/report_browser.
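
A debugging module in this spirit might expose a few read-only calls that are safe to hand to an operator. This is a sketch only: the module name `node_debug` and the fields it reports are illustrative. The resulting terms could then be shipped out over HTTP, SMTP or SNMP, or written with disk_log.

```erlang
-module(node_debug).
-export([snapshot/0, top_mailboxes/1]).

%% A read-only summary of the node's current state.
snapshot() ->
    #{node => node(),
      connected => nodes(),
      process_count => erlang:system_info(process_count),
      memory => erlang:memory(),
      run_queue => erlang:statistics(run_queue)}.

%% The N processes with the longest message queues, as {Length, Pid},
%% longest first. Processes that exit during the scan are skipped.
top_mailboxes(N) ->
    Lens = [{Len, Pid}
            || Pid <- erlang:processes(),
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```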
Questions?
“You know you have [a distributed system] when the crash of a computer you’ve never heard of stops you from getting any work done.”
- Leslie Lamport
Resources
• unsplit: http://github.com/uwiger/unsplit
• gen_leader: http://github.com/KirinDave/gen_leader_revival
• Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
• Hans Svensson, Distributed Erlang Application Pitfalls and Recipes: http://www.erlang.org/workshop/2007/proceedings/06svenss.ppt
• Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web: http://bit.ly/LewinConsistentHashing
