Improving HBase Availability and Repair

Apache HBase is a rapidly-evolving random-access distributed data store built on top of Apache Hadoop's HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase's availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time such as distributed log splitting as well as supportability improvements. We will also describe utilities including new failure recovery tools that we have developed and contributed that can be used to diagnose and repair rare corruption problems on live HBase systems.

Transcript of "Improving HBase Availability and Repair"

  1. Improving HBase Availability and Repair. Jeff Bean, Jonathan Hsieh, {jwfbean,jon}@cloudera.com. 6/13/12
  2. Who Are We?
     • Jeff Bean
        • Designated Support Engineer, Cloudera
        • Education Program Lead, Cloudera
     • Jonathan Hsieh
        • Software Engineer, Cloudera
        • Apache HBase Committer and PMC member
     Hadoop Summit 2012. 6/13/12. Copyright 2012 Cloudera Inc, All Rights Reserved.
  3. What is Apache HBase?
     Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.
  4. Fault Tolerance vs Highly Available
     • Fault tolerant: ability to recover service if a component fails, without losing data.
     • Highly available: ability to quickly recover service if a component fails, without losing data.
     • Goal: minimize downtime!
  5. HBase Architecture
     • HBase is designed to be fault tolerant and highly available.
        • It depends on other systems to be as well.
     • Replication for fault tolerance:
        • Serve regions from any region server
        • Failover HMasters
        • ZK quorums
        • HDFS block replication on DataNodes
     • But replication doesn't guarantee high availability.
        • There can still be software or human faults.
     (Diagram labels: App, MR, ZK, HDFS.)
  6. Causes of HBase Downtime
     • Unplanned maintenance
        • Hardware failures
        • Software errors
        • Human error
     • Planned maintenance
        • Upgrades
        • Migrations
     (Chart: HBase Downtime Distribution, planned vs. unplanned.)
  7. Causes of Unexpected Maintenance Incidents
     • Misconfiguration
     • Metadata corruptions
     • Network / HW problems
     • SW problems
     • Long recovery time
        • Automated and manual
     Chart, "Unplanned Maintenance: Root Cause from Cloudera Support": Misconfig (HBase, ZK, MR, HDFS) 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%.
     Source: Cloudera's production HBase support tickets; CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x.
  8. Outline
     • Where we were
        • HBase 0.90.x + Hadoop 0.20.x/1.0.x
        • Case studies
     • Where we are today
        • HBase 0.92.x/0.94.x + Hadoop 2.0.x
        • Feature summary
     • Where we are going
        • HBase 0.96.x + Hadoop 2.x
        • Feature preview
  9. WHERE WE WERE: CASE STUDIES
     "[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns; there are things we do not know we don't know." (United States Secretary of Defense Donald Rumsfeld)
  10. Best Practices to avoid hazards
      • Can prevent HBase misconfigurations.
      Chart, "Unplanned Maintenance: Root Cause from Cloudera Support": Misconfig (HBase, ZK, MR, HDFS) 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%.
      Source: Cloudera's production HBase support tickets; CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x.
  11. Case #1: Memory Over-subscription Hazard
      Diagram stages: Misconfig; Node A under load; Node A swaps; Node B can't connect to node A; Masters take action; Bad outcome.
      • Too many MR slots
      • MR slots too large
      • "Arbitrary" processes pause or become unresponsive
      • MapReduce tasks fail
      • HDFS datanode operations time out
      • HBase client operations fail
      • JobTracker blacklists TT on node B
      • NameNode re-replicates blocks from node A
      • Jobs fail or run slow
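The over-subscription hazard in Case #1 comes down to arithmetic: the heaps of every daemon and every task slot on a node have to fit in physical RAM, or the node starts to swap. A minimal sketch of that check follows. The function name, the OS reserve, and all the sizes are illustrative assumptions, not values from the talk.

```python
def memory_headroom_mb(ram_mb, daemon_heaps_mb, map_slots, reduce_slots,
                       task_heap_mb, os_reserve_mb=2048):
    """Return spare RAM in MB; a negative value means the node may swap."""
    committed = sum(daemon_heaps_mb) + (map_slots + reduce_slots) * task_heap_mb
    return ram_mb - os_reserve_mb - committed


# Hypothetical 24 GB node running a DataNode, a TaskTracker and a RegionServer.
daemons = [1024, 1024, 8192]
oversubscribed = memory_headroom_mb(24576, daemons, 12, 8, 1024)
safe = memory_headroom_mb(24576, daemons, 6, 4, 1024)
print(oversubscribed, safe)
```

A negative result corresponds to the "too many MR slots" or "MR slots too large" misconfiguration: the fix is fewer slots or smaller task heaps.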
  12. Case #2, #3: Hazards of Abusing HDFS and ZK
      • Millions of HDFS files: 500,000 blocks per datanode; heartbeat thread blocks IO; RS cannot access HDFS; HBase goes down.
      • Millions of ZK nodes: millions of ZK znodes; 400MB snapshot; ZK fails to create new snapshots, fails; HBase goes down; HBase fails to restart.
      Diagram labels: Bad practice; Misconfiguration; SW bug; Bad outcome; Worse outcome.
  13. Case #4: Splitting Corruption from HW failure
      • Network failure (takes out NN) [HW failure]
      • Region attempts to split
      • Split recovery incomplete [SW bug]
      • HBase has region inconsistencies (overlaps / holes)
      • Multiple 6 hour manual repair sessions [manual, slow, and requires expert]
  14. Case #5: Slow recovery from HW failure
      • Network HW failure
      • RS loses HDFS, WALs
      • 9 hour hlog splitting recovery
      • On restart, root and .META. assign fails
      • Manual repairs
      Diagram labels: Correct but slow!; Human error; SW error.
  15. Initial Lessons
      • Use best practices to avoid problems
         • Conservative first
         • Avoid unstable features
      • What can we do?
         • Fix the bugs
         • Recover from problems faster
         • Make people smarter to avoid hazards and misconfigurations
         • Make software smarter to prevent hazards and misconfigurations
  16. WHERE WE ARE TODAY: HBASE 0.92.X + HADOOP 2.0.X
      "In war, then, let your great object be victory, not lengthy campaigns." (Sun Tzu)
  17. Goal: Reduce unexpected downtime by recovering faster
      • Removing the SPOFs
         • HA HDFS
      • Faster recovery
         • Improved hbck
         • Distributed log splitting
  18. Problem: HDFS NN goes down under HBase
      • HBase depends on HDFS.
         • If HDFS is down, HBase goes down.
      • Ramifications:
         • Forces recovery mechanism
         • Caused some data corruptions
      • Ideally we avoid having to do recovery at all.
      (Diagram labels: App, MR, ZK, HDFS.)
  19. HBase-HDFS HA Nodes (diagram): NameNode (active, metadata server); NameNode (standby, active-standby hot failover); HMaster (region metadata); HMaster (hot standby); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers.
  20. HBase-HDFS HA Nodes: Transparent to HBase (diagram): NameNode (active); HMaster (region metadata); HMaster (hot standby); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers.
  21. HBase-HDFS HA Nodes: No more SPOF (diagram): NameNode (active); HMaster (active); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers.
  22. Recovery operations
      • If a network switch fails or if there is a power outage,
         • HBase, ZK, and HA HDFS will fail.
         • We will always still rely on recovery mechanisms.
      • Need to be able to recover quickly:
         • Metadata invariants to fix metadata corruptions
         • Data consistency to restore ACID guarantees
  23. HBase Metadata Corruptions
      • Internal HBase metadata corruptions
         • Prevent HBase from starting
         • Cause some regions to be unavailable
      • Repairs are intricate and can cause extended periods of downtime.
      Chart, "Unplanned Maintenance: Root Cause from Cloudera Support": Misconfig (HBase, ZK, MR, HDFS) 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%.
  24. HBase Metadata Invariants
      • Table integrity: every key shall get assigned to a single region.
         • Example region chain: [' ', A), [A, B), [B, C), [C, D), [D, E), [E, F), [F, G), [G, ' ')
      • Region consistency: metadata about regions should agree in HDFS, META, and region server assignment.
         • A good region has its regioninfo in META, its .regioninfo in HDFS, and the region assigned to an RS.
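The table-integrity invariant (every key belongs to exactly one region) can be checked mechanically over a table's region key ranges. The sketch below is an illustration of the invariant only, not how hbck is implemented. It assumes each region is a `(start, end)` pair of strings covering `[start, end)`, with the empty string standing for the open-ended first start key and last end key; the function name is hypothetical.

```python
def check_table_integrity(regions):
    """Report holes and overlaps in a list of [start, end) region key ranges."""
    ordered = sorted(regions)
    if not ordered:
        return [("hole", "", "")]
    problems = []
    if ordered[0][0] != "":
        problems.append(("hole", "", ordered[0][0]))
    covered_to = ordered[0][1]  # furthest end key covered so far
    for start, end in ordered[1:]:
        if covered_to == "" or start < covered_to:
            # This region starts inside key space that is already covered.
            problems.append(("overlap", start, covered_to))
        elif start > covered_to:
            # No region owns the keys between covered_to and start.
            problems.append(("hole", covered_to, start))
        # "" as an end key means "to the end of the table", so it always wins.
        if covered_to != "" and (end == "" or end > covered_to):
            covered_to = end
    if covered_to != "":
        problems.append(("hole", covered_to, ""))
    return problems


good = [("", "A"), ("A", "B"), ("B", "")]
with_hole = [("", "A"), ("B", "")]
with_overlap = [("", "B"), ("A", "C"), ("C", "")]
print(check_table_integrity(good))
print(check_table_integrity(with_hole))
print(check_table_integrity(with_overlap))
```

Holes and overlaps are exactly the inconsistencies named in Case #4; a clean region chain returns an empty list.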
  25. Detecting and Repairing corruption with hbck
      • HBase 0.90 hbck
         • Checks an HBase instance's internal invariants.
      • HBase hbck today
         • Checks and can fix problems in an HBase instance's internal invariants.
         • 0.90.7, 0.92.2, 0.94.0
         • CDH3u4, CDH4
  26. Case #4 redux: Splitting Corruption
      • Network failure (takes out NN) [HW failure]
      • Region attempts to split
      • Split recovery incomplete [SW bug]
      • HBase has region inconsistencies (overlaps / holes)
      • Multiple 6 hour manual repair sessions [manual, slow, and requires expert]
  27. Case #4 redux: Splitting Corruption
      • Network failure (takes out NN) [HW failure]
      • Region attempts to split
      • Split recovery incomplete [SW bug]
      • HBase has region inconsistencies (overlaps / holes)
      • Automated repair tool (minutes) [fixes are quicker, operator can use]
  28. Case #4 redux: Splitting Corruption
      • Network failure (takes out NN) [HW failure]
      • Region attempts to split
      • Split recovery incomplete [fixed SW bug]
      • Minor HBase inconsistencies (bad assignments)
      • Automated repair tool (seconds)
  29. Data Consistency
      • When a region server goes down, it tries to flush data in memory to HDFS.
      • If it cannot write to HDFS, it relies on the WAL/HLog.
      • Recovery via the HLog is vital to prevent data loss.
         • Understand the write path.
         • Recovery: HLog splitting.
         • Faster recovery: distributed HLog splitting.
  30. Write Path (Put / Delete / Increment) (diagram): an HBase client sends a Put to a Region Server. Labels inside the Region Server: HLog, Server, two HRegions, each with a MemStore and HStores.
  31. Write Path (Put / Delete / Increment) (diagram): the same components, with Puts flowing to the HLog and to both HRegions. Note: both regions write to the same HLog.
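A toy model of the write path makes the consequence of the shared log concrete: edits for different regions end up interleaved in one log. This is a minimal sketch under stated assumptions, not HBase code. The class and method names are hypothetical, and the model simply appends to the log before touching the memstore, which is the usual write-ahead discipline; real HBase's ordering and syncing are more involved.

```python
class ToyRegionServer:
    """Toy model of the write path: one shared log, one memstore per region."""

    def __init__(self, region_names):
        self.hlog = []  # shared write-ahead log for all regions on this server
        self.memstores = {name: {} for name in region_names}

    def put(self, region, row, value):
        # 1. Record the edit in the shared log so it can be replayed after a crash.
        self.hlog.append((region, row, value))
        # 2. Apply the edit to the region's in-memory store.
        self.memstores[region][row] = value


rs = ToyRegionServer(["region1", "region2"])
rs.put("region1", "row-a", "1")
rs.put("region2", "row-x", "2")
rs.put("region1", "row-b", "3")
print(rs.hlog)
```

Because the log is shared, recovering a single region after a crash means first separating that region's edits from everyone else's, which is the job of log splitting.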
  32-41. Log Splitting (animated diagram)
      • Start: an HMaster and RegionServers, each with its own HLog (HLog1, HLog2, HLog3, …) and HRegions holding in-memory data ("mem").
      • Next, the RegionServers are gone; only the HMaster, the HLogs, and the HRegions remain.
      • The HMaster splits the logs one at a time: "Splitting log 1", "Splitting log 2", "Splitting log 3", … "Splitting log 100".
      • HMaster: "Whew. I did a lot of splitting work. That took 9 hours!"
      • HMaster: "RegionServers, here are your region assignments." (RegionServer4, RegionServer5, RegionServer6, …)
      • "Victory!" The HRegions are back on RegionServers with in-memory data.
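The serial splitting walked through in these slides can be sketched as follows. This is a toy model rather than HBase's implementation: a log is assumed to be a list of `(region, row, value)` edits, and the function names are hypothetical. The point it illustrates is that one process handles every log in turn.

```python
from collections import defaultdict


def split_log(hlog):
    """Group one log's edits by region, preserving their original order."""
    per_region = defaultdict(list)
    for region, row, value in hlog:
        per_region[region].append((row, value))
    return dict(per_region)


def split_serially(hlogs):
    """Split every log one after another, as a single master would."""
    recovered = defaultdict(list)
    for hlog in hlogs:
        for region, edits in split_log(hlog).items():
            recovered[region].extend(edits)
    return dict(recovered)


hlog1 = [("r1", "a", "1"), ("r2", "x", "2"), ("r1", "b", "3")]
hlog2 = [("r3", "m", "4"), ("r3", "n", "5")]
recovered_edits = split_serially([hlog1, hlog2])
print(recovered_edits)
```

With one splitter, total recovery time grows with the number of logs, which is how 100 logs turned into a 9 hour recovery in the case study.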
  42. Can we recover more quickly?
      • In the case study, this is all done serially by the master.
         • The master took 9 hours to recover.
         • The 100 region server nodes were idle.
      • Let's use the idle machines to do splitting in parallel!
      • Distributed log splitting (HBASE-1364)
         • Introduced in 0.92.0 by Prakash Khemani (Facebook)
         • Included in CDH4 (0.92.1)
         • Backported to CDH3u3 (off by default)
  43-49. Distributed Log Splitting (animated diagram)
      • HMaster: "I'm the boss." (RegionServers with HLog1, HLog2, HLog3, … and HRegions holding in-memory data.)
      • HMaster: "There is a lot of splitting work here, let's split it up."
      • HMaster: "You guys do the work for me." (RegionServer4, RegionServer5, and RegionServer6 are shown with HLog1, HLog2, and HLog3.)
      • HMaster: "Great, that took 5.4 minutes."
      • HMaster: "Good job, here are your region assignments."
      • "Like a Boss." The HRegions are back on RegionServers with in-memory data.
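The idea of handing each log to an idle worker can be sketched with a thread pool. This is an illustration of the parallel pattern only, not HBase's mechanism for coordinating the work; the log format and function names are the same hypothetical ones a toy serial splitter would use. The closing arithmetic is an observation, not a claim from the talk: 9 hours spread perfectly over the 100 idle region servers would be 5.4 minutes, which matches the figure quoted on the slide.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_log(hlog):
    """Group one log's edits by region, preserving their original order."""
    per_region = defaultdict(list)
    for region, row, value in hlog:
        per_region[region].append((row, value))
    return dict(per_region)


def split_distributed(hlogs, workers):
    """Hand each log to a worker pool, then merge the per-region results."""
    recovered = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so edits are merged in the
        # same order a serial split would produce.
        for per_region in pool.map(split_log, hlogs):
            for region, edits in per_region.items():
                recovered[region].extend(edits)
    return dict(recovered)


hlogs = [[("r1", "a", "1"), ("r2", "x", "2")], [("r1", "b", "3")]]
parallel_result = split_distributed(hlogs, workers=2)

serial_minutes = 9 * 60                        # the 9 hour serial recovery
ideal_parallel_minutes = serial_minutes / 100  # spread over 100 region servers
print(parallel_result, ideal_parallel_minutes)
```

The master's role shrinks to handing out logs and assigning regions once the workers finish.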
  50. Case #5 redux: Network failure and slow recovery
      • Network HW failure
      • RS loses HDFS, WALs
      • 9 hour hlog splitting recovery
      • On restart, root and .META. assign fails
      • Manual repair
      Diagram labels: Correct but slow!; Human error.
  51. Case #5 redux: Network failure and slow recovery
      • Network HW failure
      • RS loses HDFS, WALs
      • 5.4 minute hlog splitting recovery
      • On restart, root and .META. assign fails
      • Automatic repairs
      Diagram labels: Correct and faster!; Human error; Fixed!
  52. WHERE WE ARE GOING: HBASE 0.96 + HADOOP 2.X
  53. Themes
      • Minimizing planned downtime
         • Changing configurations
         • Online schema change (experimental in 0.92, 0.94)
         • Rolling restarts
         • Wire compatibility
      (Chart: HBase Downtime Distribution, planned vs. unplanned.)
  54. Table unavailable when changing schema
      • Changing table schema requires disabling the table.
         • Disable table, alter table schema, enable table.
         • Schema includes compression, CFs, caching, TTL, versions.
      • Goal: quickly change table and column configuration settings without having to disable HBase tables.
         • Feature: Online Schema Change (HBASE-1730)
         • Included in HBase 0.92/0.94, but considered experimental.
         • Contributed by Facebook
  55. Changing Server Configs and Software updates
      • Rolling restart is an operation for upgrading an HBase cluster to a compatible version while keeping HBase available and serving data.
         • Handles server config changes.
         • Handles code changes like hotfixes or compatible upgrades.
  56-69. Rolling Restart (animated diagram, 14 frames)
      Components shown: ZK, Client, Shell, masters HM1 and HM2, region servers RS1 to RS4.
      Operation labels: admin operations, user operations, internal operations.
      Frame sequence (one node absent at a time, all nodes present in between):
          56: all nodes present
          57: RS1 absent;  58: all nodes present
          59: RS2 absent;  60: all nodes present
          61: RS3 absent;  62: all nodes present
          63: RS4 absent;  64: all nodes present
          65: HM1 absent;  66: all nodes present
          67: HM2 absent;  68, 69: all nodes present
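The frame sequence of the Rolling Restart diagram can be sketched as a loop that takes one node out at a time. This is a minimal sketch under stated assumptions: the function name `rolling_restart` is hypothetical, the node names are the labels from the diagram, and the order (region servers first, then masters) simply follows the frames rather than any particular HBase script.

```python
def rolling_restart(region_servers, masters):
    """Yield (stopped_node, live_region_servers, live_masters) per step.

    Exactly one node is down at any time: region servers first, then
    masters, following the order of the diagram frames.
    """
    for node in list(region_servers) + list(masters):
        live_rs = [rs for rs in region_servers if rs != node]
        live_hm = [hm for hm in masters if hm != node]
        yield node, live_rs, live_hm
        # The node is assumed to rejoin before the next one is stopped.


steps = list(rolling_restart(["RS1", "RS2", "RS3", "RS4"], ["HM1", "HM2"]))

for stopped, live_rs, live_hm in steps:
    # The cluster keeps serving only while a master and a region server remain.
    assert live_hm and live_rs
    print(f"restart {stopped}: masters={live_hm} region_servers={live_rs}")
```

With two masters and four region servers, every step leaves at least one master and three region servers up, which is what keeps the cluster available during the restart.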
  70. Rolling restart limitations
      •  There are limitations on rolling restarts:
          •  All servers and clients must be wire compatible.
          •  All must be able to read old data in FS and ZK.
      •  Ramifications:
          •  Only minor version upgrades possible.
          •  New features that change RPCs require custom compatibility shims.
          •  Data format changes not possible across minor versions.
      Chart: Unplanned Maintenance: Root Cause from Cloudera Support
          Misconfig (HBase, ZK, MR, HDFS): 44%
          Repair Needed: 28%
          Fix HW/NW: 16%
          Patch Required: 12%
      Source: Cloudera's production HBase Support Tickets, CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x
  71. HBase Compatibility and Extensibility
      •  Coming in HBase 0.96
          •  HBASE-5305 and friends
      •  Goals:
          •  Allow API changes and persistent data structure changes while guaranteeing compatibility between different minor versions (0.96.0 -> 0.96.1).
          •  HBase client-server compatibility between major versions (0.96.x -> 0.98.x).
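The minor/major distinction used on this slide, combined with the earlier limitation that only minor version upgrades are possible with a rolling restart, can be written down as a small check. This is an illustrative sketch only: the function names are hypothetical, and the rule encoded is the one stated on the slides, not a statement about what any specific HBase release supports.

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version such as '0.96.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def upgrade_kind(old: str, new: str) -> str:
    """Classify an upgrade using the slide's terms.

    'minor': same first two components (0.96.0 -> 0.96.1)
    'major': first two components differ (0.96.x -> 0.98.x)
    """
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    return "minor" if old_v[:2] == new_v[:2] else "major"


def rolling_restart_possible(old: str, new: str) -> bool:
    """Rule from the limitations slide: minor version upgrades only."""
    return upgrade_kind(old, new) == "minor"


print(upgrade_kind("0.96.0", "0.96.1"))              # minor
print(upgrade_kind("0.96.1", "0.98.0"))              # major
print(rolling_restart_possible("0.90.4", "0.90.6"))  # True
print(rolling_restart_possible("0.90.6", "0.92.0"))  # False
```

The stated goal of client-server compatibility between major versions would relax the last case; the sketch deliberately models only the limitation as it stood.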
  72. HDFS Wire Compatibility
      •  Here in HDFS 2.0.x
          •  HADOOP-7347 and friends
      •  Goals:
          •  Allow API changes while guaranteeing wire compatibility between different minor versions.
          •  HDFS client-server compatibility between major versions.
      Diagram labels: App, MR, ZK, HDFS
  74. CONCLUSIONS
  75. Improving how we handle causes of downtime
      Chart: HBase Downtime Distribution (Planned vs. Unplanned)
      Chart: Unplanned Maintenance: Root Cause from Cloudera Support
          Misconfig (HBase, ZK, MR, HDFS): 44%
          Repair Needed: 28%
          Fix HW/NW: 16%
          Patch Required: 12%
      Remedies annotated on the charts: wire compat, best practices, hbck, hbck and distributed log splitting
  76. QUESTIONS?
      jon@cloudera.com
      Twitter: @jmhsieh
      We're hiring!
