Apache Helix presentation at ApacheCon 2013

Building distributed systems using Helix


  • Moving from a single node to a scalable, fault-tolerant distributed mode is non-trivial and slow, even though the core functionality remains the same.
  • Limit the number of partitions on a single node.
  • You must define the correct behavior of your system. How do you partition? What is the replication factor? Are replicas identical, or are there different roles such as master/slave? How should the system behave when nodes fail, when new nodes are added, and so on? This differs from system to system. Once you have defined how the system must behave, you have to implement that behavior in code, perhaps on top of ZooKeeper or otherwise. That implementation is non-trivial, hard to debug, and hard to test. Worse, if requirements change the system's behavior even slightly, the entire process has to repeat. The same applies when moving from one shard per node to multiple shards per node. Instead, wouldn't it be nice if all you had to do was step 1, defining the correct behavior of your distributed system, and step 2 were somehow taken care of?
  • Core Helix concepts: what makes it generic.
  • In this slide, we look at the problem from a different perspective and possibly re-define the cluster management problem. To recap: to build a distributed data system we need to define the number of partitions and replicas, and for each replica we may need different roles such as master/slave. One well-proven way to express such behavior is a state machine.
  • Dynamically change the number of replicas; add new resources and nodes; change behavior easily; change what runs where: elasticity.
  • Design choices: communication, ZooKeeper optimizations, multiple deployment options, no SPOF.
  • Mention configuration etc.
  • Auto rebalance is applicable to message consumer groups and search; auto is applicable to distributed data stores.
  • Allows one to come up with common tools; think of Maven plugins.
  • Used in production to manage the core infrastructure components in the company. Operation is easy, and it is easy for dev-ops to operate multiple systems.
  • S4 apps and processing tasks each have a different state model; multitenancy with multiple resources.
  • Define the states and how many replicas you want in each state: the state model. Helix provides MasterSlave, OnlineOffline, and LeaderStandby models, as well as two-master systems and automatic replica creation.
  • Provides the right combination of abstraction and flexibility. The code is stable and deployed in production. Enables integration between multiple co-located systems. It also helps you think more about your design, putting in the right level of abstraction and modularity.
  • Contribution.

Apache Helix presentation at ApacheCon 2013: Presentation Transcript

  • Building distributed systems using Helix. http://helix.incubator.apache.org. In Apache Incubation since Oct 2012. @apachehelix. Kishore Gopalakrishna, @kishoreg1980, http://www.linkedin.com/in/kgopalak
  • Outline: Introduction, Architecture, How to use Helix, Tools, Helix usage
  • Examples of distributed data systems
  • Lifecycle: Single Node (partitioning, discovery, co-location) -> Multi node / fault tolerance (replication, fault detection, recovery) -> Cluster expansion (throttle data movement, re-distribution)
  • Typical architecture: applications on each node talk over the network to a cluster manager that coordinates the nodes.
  • Distributed search service: index shards P.1 to P.6, with replicas spread across Node 1, Node 2, and Node 3.
    Partition management: multiple replicas, even distribution, rack-aware placement.
    Fault tolerance: fault detection, auto-create replicas, controlled creation of replicas.
    Elasticity: re-distribute partitions, minimize movement, throttle data movement.
  • Distributed data store: partitions P.1 to P.12 with master and slave replicas across Node 1, Node 2, and Node 3.
    Partition management: multiple replicas, one designated master, even distribution.
    Fault tolerance: fault detection, promote slave to master, even distribution, no SPOF.
    Elasticity: minimize downtime, minimize data movement, throttle data movement.
  • Message consumer group: similar to message groups in ActiveMQ: guaranteed ordering of the processing of related messages across a single queue; load balancing of the processing of messages across multiple consumers; high availability / auto-failover to other consumers if a JVM goes down. Applicable to many messaging pub/sub systems like Kafka, RabbitMQ, etc.
  • Message consumer group: assignment, scaling, fault tolerance
  • Zookeeper provides low-level primitives; we need high-level primitives. ZooKeeper: file system, lock, ephemeral nodes. Application: node, partition, replica, state, transition. Layering: Application -> Framework -> Consensus System (ZooKeeper).
  • Outline: Introduction, Architecture, How to use Helix, Tools, Helix usage
  • Terminology:
    Node: a single machine.
    Cluster: a set of nodes.
    Resource: a logical entity, e.g. a database, index, or task.
    Partition: a subset of the resource.
    Replica: a copy of a partition.
    State: the status of a partition replica, e.g. Master, Slave.
    Transition: an action that lets replicas change state, e.g. Slave -> Master.
  • Core concept: a state machine with constraints and objectives.
    States: Offline, Slave, Master. Transitions: O -> S, S -> M, M -> S, S -> O.
    Constraints on states (e.g. M = 1, S = 2) and on transitions (e.g. concurrent(O -> S) < 5, t1 <= 5).
    Objectives: partition placement and failure semantics, e.g. minimize the maximum number of slave replicas on any node, and minimize the maximum number of master replicas on any node.
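The state-machine-plus-constraints idea above can be sketched in a few lines of plain Java. This is a minimal illustration with made-up names, not the Helix API (the real StateModelDefinition builder appears later in the deck):

```java
import java.util.*;

// Sketch of the MasterSlave state model: legal transitions plus
// the per-partition state count constraint (M = 1, S = 2).
public class MasterSlaveModel {
    public enum State { OFFLINE, SLAVE, MASTER }

    // Legal transitions: O->S, S->M, M->S, S->O.
    private static final Map<State, Set<State>> LEGAL = Map.of(
        State.OFFLINE, Set.of(State.SLAVE),
        State.SLAVE,   Set.of(State.MASTER, State.OFFLINE),
        State.MASTER,  Set.of(State.SLAVE));

    public static boolean isLegal(State from, State to) {
        return LEGAL.get(from).contains(to);
    }

    // State constraint: at most 1 master and 2 slaves per partition.
    public static boolean satisfiesCounts(Collection<State> replicaStates) {
        long masters = replicaStates.stream().filter(s -> s == State.MASTER).count();
        long slaves  = replicaStates.stream().filter(s -> s == State.SLAVE).count();
        return masters <= 1 && slaves <= 2;
    }
}
```

Because the model is declarative, changing behavior (say, allowing two masters) means editing the transition map and counts, not rewriting coordination code.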
  • Helix solution. Message consumer group: Offline -> (start consumption) -> Online, Online -> (stop consumption) -> Offline, with MAX = 1. Distributed search: MAX per node = 5, MAX = 3 (the number of replicas).
  • IDEALSTATE. Configuration: 3 nodes, 3 partitions, 2 replicas, a state machine. Constraints: 1 master, 1 slave, even distribution. Resulting replica placement and state: P1 -> N1:M, N2:S; P2 -> N2:M, N3:S; P3 -> N3:M, N1:S.
  • CURRENT STATE. N1: P1:OFFLINE, P3:OFFLINE. N2: P2:MASTER, P1:MASTER. N3: P3:MASTER, P2:SLAVE.
  • EXTERNAL VIEW. P1: N1:O, N2:M. P2: N2:M, N3:S. P3: N3:M, N1:O.
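The external view is just the participants' current states re-grouped by partition instead of by node, which is what spectators consume for routing. A small sketch (illustrative names, not Helix classes):

```java
import java.util.*;

// Sketch: aggregate per-node current states into a per-partition view.
public class ExternalViewSketch {
    // currentStates: node -> (partition -> state)
    // returns:       partition -> (node -> state)
    public static Map<String, Map<String, String>> externalView(
            Map<String, Map<String, String>> currentStates) {
        Map<String, Map<String, String>> view = new TreeMap<>();
        currentStates.forEach((node, partitions) ->
            partitions.forEach((partition, state) ->
                view.computeIfAbsent(partition, p -> new TreeMap<>())
                    .put(node, state)));
        return view;
    }
}
```

Feeding it the current state from the previous slide yields exactly the external view shown above, e.g. P1 -> {N1: OFFLINE, N2: MASTER}.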
  • Helix-based system roles. The Controller drives PARTICIPANTs toward the IDEAL STATE: it reads their CURRENT STATE, sends COMMANDs, and receives RESPONSEs. SPECTATORs hold the partition routing logic. Partitions P.1 to P.12 are spread across Node 1, Node 2, and Node 3.
  • Logical deployment
  • Outline: Introduction, Architecture, How to use Helix, Tools, Helix usage
  • Helix-based solution: 1. Define, 2. Configure, 3. Run
  • Define: state model definition. States (e.g. MasterSlave): all possible states and their priority. Transitions: legal transitions and their priority. The model applies to each partition of a resource.
  • Define: state model

    StateModelDefinition.Builder builder = new StateModelDefinition.Builder("MASTERSLAVE");
    // Add states and their rank to indicate priority.
    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(OFFLINE);
    // Set the initial state when the node starts.
    builder.initialState(OFFLINE);
    // Add transitions between the states.
    builder.addTransition(OFFLINE, SLAVE);
    builder.addTransition(SLAVE, OFFLINE);
    builder.addTransition(SLAVE, MASTER);
    builder.addTransition(MASTER, SLAVE);
  • Define: constraints. Constraints can be set per scope, on states and on transitions:
    Partition: state yes, transition yes.
    Resource: state no, transition yes.
    Node: state yes, transition yes.
    Cluster: state no, transition yes.
    Example at partition scope: state constraint M = 1, S = 2.
  • Define: constraints

    // Static constraint: at most one MASTER replica.
    builder.upperBound(MASTER, 1);
    // Dynamic constraint: SLAVE count bounded by the replication factor R.
    builder.dynamicUpperBound(SLAVE, "R");
    // Unconstrained.
    builder.upperBound(OFFLINE, -1);
  • Define: participant plug-in code
  • Step 2: configure, using helix-admin --zkSvr <zkAddress>:
    CREATE CLUSTER: --addCluster <clusterName>
    ADD NODE: --addNode <clusterName instanceId(host:port)>
    CONFIGURE RESOURCE: --addResource <clusterName resourceName partitions statemodel>
    REBALANCE (sets the IDEALSTATE): --rebalance <clusterName resourceName replicas>
  • Zookeeper view: IDEALSTATE
  • Step 3: run.
    START CONTROLLER: run-helix-controller -zkSvr localhost:2181 --cluster MyCluster
    START PARTICIPANT
  • Zookeeper view
  • Znode content: CURRENT STATE and EXTERNAL VIEW
  • Spectator plug-in code
  • Helix execution modes
  • IDEALSTATE. Configuration: 3 nodes, 3 partitions, 2 replicas, a state machine. Constraints: 1 master, 1 slave, even distribution. Resulting replica placement and state: P1 -> N1:M, N2:S; P2 -> N2:M, N3:S; P3 -> N3:M, N1:S.
  • Execution modes: who controls what.
    AUTO REBALANCE: replica placement by Helix, replica state by Helix.
    AUTO: replica placement by the app, replica state by Helix.
    CUSTOM: replica placement by the app, replica state by the app.
  • Auto rebalance vs. Auto
  • In action (MasterSlave, p=3, r=2, N=3). Both modes start with the same placement: Node 1 {P1:M, P2:S}, Node 2 {P2:M, P3:S}, Node 3 {P3:M, P1:S}. On failure, auto rebalance auto-creates replacement replicas on the surviving nodes and assigns them states; auto only changes the states of existing replicas to satisfy the constraints (e.g. promoting a slave to master).
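The even placement that auto rebalance computes for the p=3, r=2, N=3 example can be sketched as a simple round-robin assignment. This is an illustration of the idea, not Helix's actual rebalancing algorithm:

```java
import java.util.*;

// Sketch: round-robin each partition's replicas across live nodes,
// so every node ends up with the same number of replicas.
public class RebalanceSketch {
    // Returns partition -> ordered list of nodes holding its replicas
    // (by convention here, the first node hosts the master).
    public static Map<Integer, List<String>> assign(int partitions, int replicas,
                                                    List<String> nodes) {
        Map<Integer, List<String>> placement = new TreeMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> holders = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                holders.add(nodes.get((p + r) % nodes.size()));
            }
            placement.put(p, holders);
        }
        return placement;
    }
}
```

With 3 partitions, 2 replicas, and nodes [N1, N2, N3], partition 0 lands on [N1, N2], partition 1 on [N2, N3], and partition 2 on [N3, N1]: the placement from the slide. Re-running the same function on the surviving node list after a failure mirrors how auto rebalance re-creates replicas.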
  • Custom mode: example
  • Custom mode: handling failure. A custom code invoker: code that lives on all nodes but is active in only one place; it is invoked when a node joins or leaves the cluster and computes a new idealstate. The Helix controller then fires the transitions without violating constraints. Example: transition 1 (N1: M -> S) and transition 2 (N2: S -> M) executed in parallel would violate the single-master constraint, so Helix sends 2 after 1 is finished.
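The ordering rule in the failure example above can be sketched as: schedule demotions before promotions, so the "at most one master" constraint holds at every intermediate step. Again, illustrative names only, not the controller's actual scheduler:

```java
import java.util.*;

// Sketch: order pending transitions so a master is demoted (M -> S)
// before another replica is promoted (S -> M).
public class TransitionOrderSketch {
    public record Transition(String node, String from, String to) {}

    public static List<Transition> order(List<Transition> pending) {
        List<Transition> ordered = new ArrayList<>(pending);
        // Demotions out of MASTER first; everything else after.
        ordered.sort(Comparator.comparingInt(
            t -> t.from().equals("MASTER") ? 0 : 1));
        return ordered;
    }
}
```

Applied to the slide's example, the N1 M -> S demotion is emitted first and the N2 S -> M promotion only afterwards, so the cluster never has two masters for the same partition.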
  • Outline: Introduction, Architecture, How to use Helix, Tools, Helix usage
  • Tools:
    Chaos monkey
    Data-driven testing and debugging
    Rolling upgrade
    On-demand task scheduling and intra-cluster messaging
    Health monitoring and alerts
  • Data-driven testing. Instrument: ZooKeeper, controller, and participant logs. Simulate: chaos monkey. Analyze: check invariants such as state transition constraints and state count constraints. Debugging made easy: reproduce the exact sequence of events.
  • Structured log file, sample (instanceName express1-md_16918, sessionId ef172fe9-09ca-4d77-b05e-15a414478ccc):

    timestamp      partition   state
    1323312236368  TestDB_123  OFFLINE
    1323312236426  TestDB_123  OFFLINE
    1323312236530  TestDB_123  OFFLINE
    1323312236530  TestDB_91   OFFLINE
    1323312236561  TestDB_123  SLAVE
    1323312236561  TestDB_91   OFFLINE
    1323312236685  TestDB_123  SLAVE
    1323312236685  TestDB_91   OFFLINE
    1323312236685  TestDB_60   OFFLINE
    1323312236719  TestDB_123  SLAVE
    1323312236719  TestDB_91   SLAVE
    1323312236719  TestDB_60   OFFLINE
    1323312236814  TestDB_123  SLAVE
  • No more than R=2 slaves: slave counts derived from the log for one partition.

    Time   State    Number of slaves
    42632  OFFLINE  0
           SLAVE    1
           OFFLINE  1
           OFFLINE  1
           SLAVE    2
           SLAVE    3
           MASTER   2
  • How long was it out of whack?

    Number of slaves   Time       Percentage
    0                  1082319    0.5
    1                  35578388   16.46
    2                  179417802  82.99
    3                  118863     0.05

    83% of the time, there were 2 slaves to a partition.

    Number of masters  Time       Percentage
    0                  15490456   7.16
    1                  200706916  92.84

    93% of the time, there was 1 master to a partition.
  • Outline: Introduction, Architecture, How to use Helix, Tools, Helix usage
  • Helix usage at LinkedIn: Espresso
  • In flight:
    Apache S4: partitioning, co-location; dynamic cluster expansion.
    Archiva: partitioned replicated file store; rsync-based replication.
    Others in evaluation: Bigtop.
  • Auto-scaling software deployment tool. States: Offline -> Download -> Configure -> Start -> Active / Standby. Constraint for each state: Download < 100, Active 1000, Standby 100.
  • Summary. Helix: a generic framework for building distributed systems. Modifying or enhancing system behavior is easy; abstraction and modularity are key. Simple programming model: a declarative state machine.
  • Roadmap features: span multiple data centers; automatic load balancing; distributed health monitoring; YARN generic application master for real-time apps; stand-alone Helix agent.
  • Website: http://helix.incubator.apache.org. User list: user@helix.incubator.apache.org. Dev list: dev@helix.incubator.apache.org. Twitter: @apachehelix, @kishoreg1980