Improving HBase Availability and Repair

Jeff Bean, Jonathan Hsieh
{jw2ean,jon}@cloudera.com
6/13/12

Who Are We?

• Jeff Bean
    • Designated Support Engineer, Cloudera
    • Education Program Lead, Cloudera
• Jonathan Hsieh
    • Software Engineer, Cloudera
    • Apache HBase Committer and PMC member

Hadoop Summit 2012. 6/13/12 Copyright 2012 Cloudera Inc, All Rights Reserved

What is Apache HBase?

Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.

Fault Tolerance vs. High Availability

• Fault tolerant:
    • Ability to recover service if a component fails, without losing data.
• Highly available:
    • Ability to quickly recover service if a component fails, without losing data.

[Diagram: Fault Tolerant, Highly Available]

• Goal: Minimize downtime!
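Under the usual steady-state model, availability is MTTF / (MTTF + MTTR): mean time to failure divided by mean time to failure plus mean time to repair. Fault tolerance keeps data safe across a failure; high availability also shrinks MTTR. The sketch below uses made-up MTTF and MTTR figures (assumptions, not Cloudera data) to show how recovery time alone drives yearly downtime when the failure rate stays the same.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTTF and MTTR."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures: one failure every ~1000 hours, recovered either by a
# slow manual repair (6 hours) or by a fast automated one (6 minutes).
slow = downtime_hours_per_year(mttf_hours=1000, mttr_hours=6)
fast = downtime_hours_per_year(mttf_hours=1000, mttr_hours=0.1)
print(f"slow recovery: {slow:.1f} h/yr, fast recovery: {fast:.1f} h/yr")
```

Both cases fail equally often; only the repair time differs, which is exactly the gap between "fault tolerant" and "highly available" on the slide.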
  
HBase Architecture

• HBase is designed to be fault tolerant and highly available
    • It depends on other systems to be as well.
• Replication for fault tolerance
    • Serve regions from any RegionServer
    • Failover HMasters
    • ZK Quorums
    • HDFS block replication on DataNodes
• But replication doesn't guarantee high availability
    • There can still be software or human faults

[Diagram: App and MR on top; ZK and HDFS underneath]

Causes of HBase Downtime

• Unplanned Maintenance
    • Hardware failures
    • Software errors
    • Human error
• Planned Maintenance
    • Upgrades
    • Migrations

[Chart: HBase Downtime Distribution, Planned vs. Unplanned]

Causes of Unexpected Maintenance Incidents

• Misconfiguration
• Metadata corruptions
• Network / HW problems
• SW problems
• Long recovery time
    • Automated and manual

Unplanned Maintenance: Root Cause from Cloudera Support
    Misconfig (HBase, ZK, MR, HDFS): 44%
    Repair Needed: 28%
    Fix HW/NW: 16%
    Patch Required: 12%
Source: Cloudera's production HBase Support Tickets (CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)

Outline

• Where we were
    • HBase 0.90.x + Hadoop 0.20.x/1.0.x
    • Case Studies
• Where we are today
    • HBase 0.92.x/0.94.x + Hadoop 2.0.x
    • Feature Summary
• Where we are going
    • HBase 0.96.x + Hadoop 2.x
    • Feature Preview

[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don't know.
    -- United States Secretary of Defense Donald Rumsfeld

WHERE WE WERE:
CASE STUDIES

Best Practices to avoid hazards

Unplanned Maintenance: Root Cause from Cloudera Support
    Misconfig (HBase, ZK, MR, HDFS): 44%
    Repair Needed: 28%
    Fix HW/NW: 16%
    Patch Required: 12%
Source: Cloudera's production HBase Support Tickets (CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)

Best practices can prevent HBase misconfigurations.

Case #1: Memory Over-subscription Hazard

Misconfig:
• Too many MR slots
• MR slots too large
→ Node A under load
→ Node A swaps:
    • MapReduce tasks fail
    • HDFS datanode operations time out
    • "Arbitrary" processes pause or become unresponsive
→ Node B can't connect to node A
→ Masters take action:
    • JobTracker blacklists TT on node B
    • NameNode re-replicates blocks from node A
→ Bad outcome:
    • HBase client operations fail
    • Jobs fail or run slow
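The hazard comes down to arithmetic: every daemon heap plus every task slot's heap must fit in physical RAM, or a fully loaded node swaps. A minimal sketch with hypothetical node sizes; the field names are illustrative and are not Hadoop property names (in Hadoop 1.x the slot counts and task heap come from settings such as `mapred.tasktracker.map.tasks.maximum`, `mapred.tasktracker.reduce.tasks.maximum`, and `mapred.child.java.opts`). It counts only configured heaps, so real JVM overhead makes the true headroom smaller.

```python
from dataclasses import dataclass


@dataclass
class NodeMemoryPlan:
    """Hypothetical memory budget for one worker node (all sizes in MB)."""
    physical_mb: int
    os_reserve_mb: int
    datanode_heap_mb: int
    tasktracker_heap_mb: int
    regionserver_heap_mb: int
    map_slots: int
    reduce_slots: int
    child_heap_mb: int  # heap of each MapReduce task JVM

    def committed_mb(self) -> int:
        """Memory promised to the OS, the daemons, and all task slots at once."""
        slots = self.map_slots + self.reduce_slots
        daemons = (self.datanode_heap_mb + self.tasktracker_heap_mb
                   + self.regionserver_heap_mb)
        return self.os_reserve_mb + daemons + slots * self.child_heap_mb

    def headroom_mb(self) -> int:
        return self.physical_mb - self.committed_mb()

    def oversubscribed(self) -> bool:
        # Negative headroom means the node must swap when every slot is busy.
        return self.headroom_mb() < 0


ok = NodeMemoryPlan(physical_mb=49152, os_reserve_mb=2048, datanode_heap_mb=1024,
                    tasktracker_heap_mb=1024, regionserver_heap_mb=12288,
                    map_slots=8, reduce_slots=4, child_heap_mb=1024)
# "Too many MR slots" and "MR slots too large" at the same time:
risky = NodeMemoryPlan(physical_mb=49152, os_reserve_mb=2048, datanode_heap_mb=1024,
                       tasktracker_heap_mb=1024, regionserver_heap_mb=12288,
                       map_slots=16, reduce_slots=8, child_heap_mb=2048)
print(ok.headroom_mb(), risky.headroom_mb())
```

Running a check like this whenever slot counts or task heaps change is one way to catch the misconfiguration before node A starts swapping.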
  




Case #2, #3: Hazards of Abusing HDFS and ZK

Millions of HDFS files (bad practice)
• 500,000 blocks per datanode
→ SW bug: heartbeat thread blocks IO
→ RS cannot access HDFS
→ Bad outcome: HBase goes down

Millions of ZK nodes (misconfiguration)
• Millions of ZK znodes, 400MB snapshot
→ ZK fails to create new snapshots, fails
→ Bad outcome: HBase goes down
→ SW bug, worse outcome: HBase fails to restart

Case #4: Splitting Corruption from HW failure

HW failure:
• Region attempts to split
• Network failure (takes out NN)
→ Split incomplete
→ Recovery hits a SW bug
→ HBase has region inconsistencies (overlaps / holes)
→ Multiple 6-hour manual repair sessions: manual, slow, and requires an expert

Case #5: Slow recovery from HW failure

• Network / HW failure
→ RS loses access to HDFS and its WALs
→ On restart, -ROOT- and .META. assignment fails
→ Manual repairs
→ 9-hour HLog splitting recovery: correct, but slow!

Contributing factors: human error, SW error

Initial Lessons

• Use best practices to avoid problems
    • Conservative first
    • Avoid unstable features
• What can we do?
    • Fix the bugs
    • Recover from problems faster
    • Make people smarter to avoid hazards and misconfigurations
    • Make software smarter to prevent hazards and misconfigurations

In war, then, let your great object be victory, not lengthy campaigns.
    -- Sun Tzu

WHERE WE ARE TODAY
HBASE 0.92.X + HADOOP 2.0.X

Goal: Reduce unexpected downtime by recovering faster

• Removing the SPOFs
    • HA HDFS
• Faster recovery
    • Improved hbck
    • Distributed log splitting

Problem: HDFS NN goes down under HBase

• HBase depends on HDFS.
    • If HDFS is down, HBase goes down.
• Ramifications:
    • Forces recovery mechanism
    • Caused some data corruptions
• Ideally we avoid having to do recovery at all.

[Diagram: App and MR on top; ZK and HDFS underneath]

HBase-HDFS HA Nodes

[Diagram]
    NameNode (active): metadata server
    NameNode (standby): active-standby hot failover
    HMaster: region metadata
    HMaster: hot standby
    ZooKeeper Quorum
    HDFS DataNodes
    HBase RegionServers

HBase-HDFS HA Nodes: Transparent to HBase

[Diagram]
    NameNode (active)
    HMaster: region metadata
    HMaster: hot standby
    ZooKeeper Quorum
    HDFS DataNodes
    HBase RegionServers
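With HDFS HA, a NameNode failover is transparent to HBase because the HDFS client itself retries against the other NameNode. The sketch below models only that client-side idea with hypothetical stub classes (`StubNameNode`, `FailoverClient`, and the path `/hbase/t1` are illustrative). The real HDFS client delegates this to a configurable failover proxy provider, and real HA also needs fencing and a shared edit log, none of which is modeled here.

```python
class NameNodeUnavailable(Exception):
    """Raised by a stub NameNode that is down or in standby."""


class StubNameNode:
    """Hypothetical stand-in for a NameNode RPC endpoint."""

    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.active = active

    def get_block_locations(self, path: str) -> str:
        if not self.active:
            raise NameNodeUnavailable(self.name)
        return f"{self.name}:{path}"


class FailoverClient:
    """Sends each call to the last NameNode that worked, failing over on error."""

    def __init__(self, namenodes) -> None:
        self.namenodes = list(namenodes)
        if not self.namenodes:
            raise ValueError("need at least one NameNode")
        self.current = 0

    def call(self, method: str, *args):
        last_error = None
        for _ in range(len(self.namenodes)):
            target = self.namenodes[self.current]
            try:
                return getattr(target, method)(*args)
            except NameNodeUnavailable as err:
                last_error = err
                # Move on to the next NameNode; the caller never sees the switch.
                self.current = (self.current + 1) % len(self.namenodes)
        raise last_error


nn1 = StubNameNode("nn1", active=True)
nn2 = StubNameNode("nn2", active=False)
client = FailoverClient([nn1, nn2])
before = client.call("get_block_locations", "/hbase/t1")
nn1.active, nn2.active = False, True  # simulate failover to the standby
after = client.call("get_block_locations", "/hbase/t1")
```

From the caller's point of view (here standing in for a RegionServer), both calls simply succeed; only the client object knows a different NameNode answered the second one.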
  


HBase-HDFS HA Nodes: No more SPOF

[Diagram]
    NameNode (active)
    HMaster (active)
    ZooKeeper Quorum
    HDFS DataNodes
    HBase RegionServers

Recovery operations

• If a network switch fails or there is a power outage,
    • HBase, ZK, and HA HDFS will fail
    • We will always still rely on recovery mechanisms.
• Need to be able to recover quickly
    • Metadata invariants to fix metadata corruptions
    • Data consistency to restore ACID guarantees

HBase Metadata Corruptions

• Internal HBase metadata corruptions
    • Prevent HBase from starting
    • Cause some regions to be unavailable.
• Repairs are intricate and can cause extended periods of downtime.

Unplanned Maintenance: Root Cause from Cloudera Support
    Misconfig (HBase, ZK, MR, HDFS): 44%
    Repair Needed: 28%
    Fix HW/NW: 16%
    Patch Required: 12%

HBase Metadata Invariants

Table Integrity
• Every key shall get assigned to a single region.
    ['', A)
    [A, B)
    [B, C)
    [C, D)
    [D, E)
    [E, F)
    [F, G)
    [G, '')
  ('' is the empty key that starts and ends the table.)

Region Consistency
• Metadata about regions should agree in HDFS, META, and region server assignment.
    [Diagram: a good region has its regioninfo in META, its .regioninfo in HDFS, and is assigned to an RS]
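The Table Integrity invariant can be checked by sorting regions by start key and walking the chain: a gap between the coverage so far and the next region's start key is a hole, and a start key that falls before the coverage so far is an overlap. This is a simplified sketch, not hbck's implementation; it assumes regions are given as `(start, end)` string pairs (HBase uses byte arrays), with `''` as the empty first start key and last end key.

```python
from typing import List, Tuple

Region = Tuple[str, str]  # half-open key range [start, end)


def _end_point(end_key: str) -> Tuple[int, str]:
    # An empty end key means "past the last row", so it sorts after every real key.
    return (1, "") if end_key == "" else (0, end_key)


def check_table_integrity(regions: List[Region]) -> List[str]:
    """Return the holes and overlaps found in one table's region chain."""
    problems: List[str] = []
    covered = (0, "")  # keys before this point are already covered
    for start, end in sorted(regions, key=lambda r: (r[0], _end_point(r[1]))):
        start_point = (0, start)
        if start_point > covered:
            problems.append(f"hole: [{covered[1]!r}, {start!r})")
        elif start_point < covered:
            problems.append(f"overlap at [{start!r}, {end!r})")
        covered = max(covered, _end_point(end))
    if covered != (1, ""):
        problems.append(f"hole: [{covered[1]!r}, '')")
    return problems


# The good chain from the slide: ['', A), [A, B), ..., [G, '').
chain = [("", "A"), ("A", "B"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "")]
print(check_table_integrity(chain))
print(check_table_integrity([r for r in chain if r != ("C", "D")]))
```

The first call finds nothing; dropping `[C, D)` produces a single hole, which is the kind of "overlaps / holes" damage described in Case #4.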
  

Detecting and Repairing corruption with hbck

• HBase 0.90 hbck
    • Checks an HBase instance's internal invariants.
• HBase hbck today
    • Checks and can fix problems in an HBase instance's internal invariants
    • 0.90.7, 0.92.2, 0.94.0
    • CDH3u4, CDH4
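The Region Consistency invariant compares three views of the same region: its row in META, its .regioninfo in HDFS, and the region server actually serving it. A simplified sketch, assuming those views have already been collected into plain Python collections (the `ClusterView` type and the region and server names are hypothetical); hbck itself reads them from the live cluster, distinguishes many more cases, and can also repair what it finds.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class ClusterView:
    """Hypothetical snapshot of where each region is recorded."""
    meta: Dict[str, str]      # region -> server that META says hosts it
    hdfs: Set[str]            # regions with a .regioninfo file in HDFS
    deployed: Dict[str, str]  # region -> server actually serving it


def check_region_consistency(view: ClusterView) -> Dict[str, str]:
    """Map each inconsistent region to a short description of the problem."""
    problems: Dict[str, str] = {}
    for region in sorted(set(view.meta) | view.hdfs | set(view.deployed)):
        server = view.deployed.get(region)
        if region not in view.hdfs:
            problems[region] = "no .regioninfo in HDFS"
        elif region not in view.meta:
            problems[region] = "not in META"
        elif server is None:
            problems[region] = "not deployed"
        elif server != view.meta[region]:
            problems[region] = f"META says {view.meta[region]}, deployed on {server}"
    return problems


view = ClusterView(
    meta={"r1": "rs1", "r2": "rs1", "r3": "rs2"},
    hdfs={"r1", "r2", "r3", "r4"},
    deployed={"r1": "rs1", "r2": "rs2", "r5": "rs2"},
)
print(check_region_consistency(view))
```

Only `r1` agrees in all three places; each of the other regions breaks the invariant in a different way.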
  
Case #4 redux: Splitting Corruption

HW failure:
• Region attempts to split
• Network failure (takes out NN)
→ Split incomplete
→ Recovery hits a SW bug
→ HBase has region inconsistencies (overlaps / holes)
→ Multiple 6-hour manual repair sessions: manual, slow, and requires an expert

Case #4 redux: Splitting Corruption

HW failure:
• Region attempts to split
• Network failure (takes out NN)
→ Split incomplete
→ Recovery hits a SW bug
→ HBase has region inconsistencies (overlaps / holes)
→ Automated repair tool (minutes): fixes are quicker, and an operator can use the tool

Case #4 redux: Splitting Corruption

HW failure:
• Region attempts to split
• Network failure (takes out NN)
→ Split incomplete
→ Recovery (SW bug fixed)
→ Minor HBase inconsistencies (bad assignments)
→ Automated repair tool (seconds)

Data Consistency

• When a region server goes down, it tries to flush data in memory to HDFS.
• If it cannot write to HDFS, it relies on the WAL/HLog.
• Recovery via the HLog is vital to prevent data loss
    • Understand the write path.
    • Recovery: HLog splitting.
    • Faster recovery: Distributed HLog splitting.
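A region server keeps one HLog shared by all of its regions, so recovering a dead server means splitting its log into per-region edit lists that the regions' new hosts can replay. A simplified sketch under stated assumptions: log entries are in-memory `(sequence id, region, edit)` tuples, and a thread pool stands in for the cluster. In HBase, distributed log splitting instead hands each log file to a region server as a task coordinated through ZooKeeper, and the output is written as per-region recovered-edits files in HDFS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# One HLog entry: (sequence id, region, edit). All regions on a server share
# one log, so entries for different regions are interleaved.
Entry = Tuple[int, str, str]


def split_log(log: List[Entry]) -> Dict[str, List[Entry]]:
    """Group one HLog file's entries by region, keeping their original order."""
    per_region: Dict[str, List[Entry]] = defaultdict(list)
    for entry in log:
        per_region[entry[1]].append(entry)
    return dict(per_region)


def distributed_split(logs: List[List[Entry]], workers: int = 4) -> Dict[str, List[Entry]]:
    """Split several HLog files in parallel, then merge the pieces per region."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(split_log, logs))
    merged: Dict[str, List[Entry]] = defaultdict(list)
    for piece in pieces:
        for region, entries in piece.items():
            merged[region].extend(entries)
    # Replay must apply each region's edits in sequence-id order.
    return {region: sorted(entries) for region, entries in merged.items()}


log_a = [(1, "r1", "put a"), (2, "r2", "put b"), (3, "r1", "delete c")]
log_b = [(4, "r2", "put d"), (5, "r3", "put e")]
recovered_edits = distributed_split([log_a, log_b], workers=2)
```

Splitting a single file serially is the Case #5 situation; handing many files to many workers is what makes recovery faster when a dead server left a lot of log behind.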
  



Write Path (Put / Delete / Increment)

[Diagram: an HBase client sends a Put to a Region Server. The Region Server has one HLog and several HRegions; each HRegion has a MemStore and HStores.]
  
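Write Path (Put / Delete / Increment)

Note: both regions write to the same HLog.

[Diagram: an HBase client sends Puts for two different HRegions hosted on the same Region Server. Both Puts are appended to the server's single HLog and applied to their regions' MemStores; each HRegion is backed by HStores.]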
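The write path above can be sketched as a toy, single-process Python model. Everything in it is hypothetical (`ToyRegionServer` and `replay` are illustrative names, not HBase APIs), and it ignores HDFS, MemStore flushes, and Delete/Increment semantics. It shows the two properties the slides rely on: each edit is logged before it is applied in memory, and all regions on a server share one HLog, so the log that survives a crash interleaves edits for many regions.

```python
from collections import defaultdict


class ToyRegionServer:
    """Hypothetical model of one region server's write path (not the HBase API)."""

    def __init__(self):
        self.seq = 0
        # One HLog shared by every region on this server: (seq, region, row, value).
        self.hlog = []
        # Per-region MemStores: region -> {row: value}. Lost if the server crashes.
        self.memstores = defaultdict(dict)

    def put(self, region, row, value):
        self.seq += 1
        # 1. Append the edit to the write-ahead log first (durable in real HBase).
        self.hlog.append((self.seq, region, row, value))
        # 2. Only then apply it to the in-memory store of the target region.
        self.memstores[region][row] = value

    def crash(self):
        """Simulate a crash: in-memory state is lost, only the HLog survives."""
        self.memstores.clear()
        return list(self.hlog)


def replay(hlog):
    """Rebuild each region's MemStore by replaying the HLog in sequence order."""
    recovered = defaultdict(dict)
    for _seq, region, row, value in sorted(hlog):
        recovered[region][row] = value
    return dict(recovered)


rs = ToyRegionServer()
rs.put("regionA", "row1", "v1")
rs.put("regionB", "row9", "x")
rs.put("regionA", "row1", "v2")
surviving_log = rs.crash()
print(replay(surviving_log))  # {'regionA': {'row1': 'v2'}, 'regionB': {'row9': 'x'}}
```

Replaying the shared log in one place works here only because the model keeps every region on one server. Once the regions are reassigned to different servers, each new host needs just its own regions' edits, which is what log splitting produces.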
  
Log Splitting

[Diagram: the HMaster above three RegionServers. Each RegionServer has its own HLog (HLog1, HLog2, HLog3, …) and hosts HRegions whose recent edits are held in memory (mem).]
  
                                         Hadoop	
  Summit	
  2012.	
  6/13/12	
  	
  Copyright	
  2012	
                                             32	
  
                                               Cloudera	
  Inc,	
  All	
  Rights	
  Reserved	
  
Log Splitting

[Diagram: the RegionServers and their in-memory data (mem) are gone. Only the HMaster, the HLogs (HLog1, HLog2, HLog3, …), and the HRegions remain.]
  
Log Splitting

HMaster: "Splitting log 1."

[Diagram: HMaster, HLog1, HLog2, HLog3, …, and the HRegions.]
  
Log Splitting

HMaster: "Splitting log 2."

[Diagram: HMaster with HLog1 (now split), HLog2, HLog3, …, and the HRegions.]
  
Log Splitting

HMaster: "Splitting log 3."

[Diagram: HMaster with HLog1 and HLog2 (now split), HLog3, …, and the HRegions.]
  
Log Splitting

HMaster: "Splitting log 100."

[Diagram: HMaster with all of the HLogs split, and the HRegions.]
  
Log Splitting

HMaster: "Whew. I did a lot of splitting work. That took 9 hours!"

[Diagram: HMaster with the split HLogs and the HRegions.]
  
Log Splitting

HMaster: "RegionServers, here are your region assignments."

[Diagram: the HMaster assigns the HRegions to new RegionServers (RegionServer4, RegionServer5, RegionServer6, …).]
  
Log Splitting

HMaster: "Victory!"

[Diagram: RegionServer4, RegionServer5, RegionServer6, … serve the HRegions again, with their data back in memory (mem).]
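The serial recovery in this sequence can be sketched as below, under a hypothetical model: each HLog is a list of `(seq, region, row, value)` tuples, and `split_log` and `serial_split` are illustrative names, not HBase APIs. The real master reads the logs from HDFS and writes each region's edits back to HDFS; this sketch just returns them in a dict. The point is the shape of the loop: one process walks every log in turn, so total time grows with the number of logs.

```python
from collections import defaultdict


def split_log(hlog):
    """Split one HLog into per-region edit lists, keeping log order within each region."""
    per_region = defaultdict(list)
    for seq, region, row, value in hlog:
        per_region[region].append((seq, row, value))
    return per_region


def serial_split(hlogs):
    """Master-only recovery: split every dead server's HLog strictly one after another."""
    recovered = defaultdict(list)
    for hlog in hlogs:  # e.g. 100 logs, handled one at a time by a single process
        for region, edits in split_log(hlog).items():
            recovered[region].extend(edits)
    # Order each region's edits by sequence id so they can be replayed correctly.
    return {region: sorted(edits) for region, edits in recovered.items()}


hlogs = [
    [(1, "regionA", "row1", "v1"), (2, "regionB", "row2", "w1")],  # first log
    [(3, "regionA", "row1", "v2"), (4, "regionC", "row3", "z1")],  # second log
]
print(serial_split(hlogs))
```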
  




Can we recover more quickly?

•  In the case study, all of this is done serially by the master.
   •  The master took 9 hours to recover.
   •  The 100 region server nodes were idle.
•  Let's use the idle machines to do the splitting in parallel!
•  Distributed log splitting (HBASE-1364)
   •  Introduced in 0.92.0 by Prakash Khemani (Facebook)
   •  Included in CDH4 (0.92.1)
   •  Backported to CDH3u3 (off by default)
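A back-of-the-envelope estimate, assuming the serial splitting work divides evenly across the 100 idle region servers and ignoring coordination and HDFS contention:

```latex
T_{\text{parallel}} \approx \frac{T_{\text{serial}}}{N}
  = \frac{9\,\text{h}}{100}
  = \frac{540\,\text{min}}{100}
  = 5.4\,\text{min}
```

This ideal linear-speedup figure matches the 5.4 minutes reported for distributed log splitting in this case study.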
  
Distributed Log Splitting

HMaster: "I'm the boss."

[Diagram: the HMaster above three RegionServers, each with its own HLog (HLog1, HLog2, HLog3, …) and HRegions holding data in memory (mem).]
  
Distributed Log Splitting

HMaster: "There is a lot of splitting work here; let's split it up."

[Diagram: the RegionServers and their in-memory data are gone; the HMaster, HLog1, HLog2, HLog3, …, and the HRegions remain.]
  
Distributed Log Splitting

HMaster: "You guys do the work for me."

[Diagram: RegionServer4, RegionServer5, RegionServer6, … split HLog1, HLog2, HLog3, … in parallel for the HMaster.]
  
Distributed Log Splitting

HMaster: "Great, that took 5.4 minutes."

[Diagram: the HLogs have been split; RegionServer4, RegionServer5, RegionServer6, … and the HRegions.]
  
Distributed Log Splitting

HMaster: "Good job, here are your region assignments."

[Diagram: the HMaster assigns the HRegions to RegionServer4, RegionServer5, RegionServer6, ….]
  
Distributed Log Splitting

HMaster: "Like a Boss."

[Diagram: RegionServer4, RegionServer5, RegionServer6, … serve the HRegions again, with their data back in memory (mem).]
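Distributed log splitting changes who does the work, not what the work is. A minimal sketch under the same hypothetical tuple-based model: the master turns each HLog into a task, and a pool of workers splits the logs concurrently. In HBase 0.92 the split tasks are coordinated through ZooKeeper and claimed by region servers; here a local `ThreadPoolExecutor` merely stands in for them, and `distributed_split` is an illustrative name, not an HBase API. Because of Python's GIL, the threads illustrate task distribution rather than a real CPU speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_log(hlog):
    """Worker task: split one HLog into per-region lists of (seq, row, value)."""
    per_region = defaultdict(list)
    for seq, region, row, value in hlog:
        per_region[region].append((seq, row, value))
    return dict(per_region)


def distributed_split(hlogs, workers=3):
    """Master hands out one split task per HLog, then merges the workers' results."""
    recovered = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the tasks concurrently but yields results in input order.
        for per_region in pool.map(split_log, hlogs):
            for region, edits in per_region.items():
                recovered[region].extend(edits)
    # Sort by sequence id so each region's edits replay in the order they were written.
    return {region: sorted(edits) for region, edits in recovered.items()}


hlogs = [
    [(1, "regionA", "row1", "v1")],
    [(2, "regionB", "row2", "w1")],
    [(3, "regionA", "row1", "v2")],
]
print(distributed_split(hlogs, workers=3))
```

In this sketch the master's merge step is trivial next to the per-log work, which mirrors the slides' argument: spread the expensive part over otherwise idle servers.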
  




Case #5 redux: Network failure and slow recovery

Correct but slow!

[Diagram: Human error, network, or HW failure → RS loses HDFS, WALs → On restart, Root and .META. assignment fails → Manual repair → 9 hour HLog splitting recovery.]
  




Case #5 redux: Network failure and slow recovery

Correct and faster!

[Diagram: Human error, network, or HW failure → RS loses HDFS, WALs → On restart, Root and .META. assignment fails → Automatic repairs → 5.4 minute HLog splitting recovery. Fixed!]
  


WHERE WE ARE GOING
HBASE 0.96 + HADOOP 2.X
  

Themes

•  Minimizing planned downtime
   •  Changing configurations
   •  Online Schema Change (experimental in 0.92, 0.94)
   •  Rolling Restarts
   •  Wire compatibility

[Chart: HBase Downtime Distribution, split into Planned and Unplanned downtime.]
  




Table unavailable when changing schema

•  Changing a table's schema requires disabling the table
    •  disable table, alter table schema, enable table
    •  Schema includes compression, column families (cf's), caching, TTL, versions.
•  Goal: Quickly change table and column configuration settings without having to disable HBase tables.
    •  Feature: Online Schema Change (HBASE-1730)
    •  Included in HBase 0.92/0.94 but considered experimental.
    •  Contributed by Facebook
  


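The offline procedure above (disable, alter, enable) can be sketched as a small planning function. This is a hypothetical Python model, not the HBase API: `alter_plan` and its flat dictionary of attributes are assumptions for illustration, with keys that mirror HBase shell column family attribute names such as `VERSIONS`, `TTL` and `COMPRESSION`. It shows that without Online Schema Change, any attribute change is bracketed by a disable/enable pair, during which the table cannot serve requests.

```python
from typing import Dict, List


def alter_plan(old: Dict[str, object], new: Dict[str, object],
               online_schema_change: bool = False) -> List[str]:
    """Return the ordered steps to move a column family from `old` to `new`.

    Hypothetical model: a schema is a flat dict of attribute -> value.
    """
    # Only attributes whose value actually changes need an alter step.
    changed = sorted(k for k in new if old.get(k) != new[k])
    if not changed:
        return []  # nothing to change, so the table never goes offline
    steps = [f"alter {k}={new[k]}" for k in changed]
    if online_schema_change:
        # Online Schema Change: the table stays enabled throughout.
        return steps
    # Offline path: the table is unavailable between "disable" and "enable".
    return ["disable"] + steps + ["enable"]


if __name__ == "__main__":
    before = {"VERSIONS": 3, "TTL": 86400, "COMPRESSION": "NONE"}
    after = {"VERSIONS": 5, "TTL": 86400, "COMPRESSION": "GZ"}
    print(alter_plan(before, after))
    print(alter_plan(before, after, online_schema_change=True))
```

In this sketch, batching several attribute changes into one plan means the table goes offline once rather than once per attribute.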
  
Changing Server Configs and Software Updates

•  Rolling restart is an operation for upgrading an HBase cluster to a compatible version while keeping HBase available and serving data.
    •  Handles server config changes.
    •  Handles code changes such as hotfixes or compatible upgrades.
  
Rolling Restart

[Figure: Rolling Restart animation. Components: Client (user operations) and Shell (admin operations); ZK; masters HM1 and HM2; region servers RS1, RS2, RS3 and RS4; internal operations between them. Successive frames take down and bring back one server at a time, RS1, RS2, RS3, RS4, then HM1, then HM2, while the rest of the cluster stays up.]
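The rolling restart animation can be sketched as a driver that restarts one server at a time, in the order the frames show (region servers first, then masters). This is a hypothetical Python model: `rolling_restart` and the caller-supplied `restart` callback are assumptions, not HBase's own tooling, and requiring a second region server and a backup master is this sketch's simplified availability rule.

```python
from typing import Callable, List, Sequence


def rolling_restart(region_servers: Sequence[str],
                    masters: Sequence[str],
                    restart: Callable[[str], None]) -> List[str]:
    """Restart every server exactly once, one at a time; return the order used.

    `restart` is a hypothetical callback that stops a server, applies the new
    config or hotfix, and starts it again before returning.
    """
    # Simplified availability rule (an assumption of this sketch): while any one
    # server is down, another region server must keep serving data and a backup
    # master must be able to take over admin operations.
    if len(region_servers) < 2 or len(masters) < 2:
        raise ValueError("need at least 2 region servers and 2 masters")

    order = list(region_servers) + list(masters)
    for server in order:
        # restart() returns before the next iteration starts, so at most
        # one server is out of service at any moment.
        restart(server)
    return order


if __name__ == "__main__":
    log: List[str] = []
    rolling_restart(["RS1", "RS2", "RS3", "RS4"], ["HM1", "HM2"], log.append)
    print(log)
```

A production driver would likely also wait for a restarted region server to come back and take regions again before moving to the next one; that wait is omitted here.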
  
Rolling restart limitations

•  There are limitations on rolling restarts:
    •  All servers and clients must be wire compatible.
    •  All must be able to read old data in FS and ZK.

•  Ramifications:
    •  Only minor version upgrades are possible.
    •  New features that change RPCs require custom compatibility shims.
    •  Data format changes are not possible across minor versions.

[Figure: Unplanned Maintenance: Root Cause from Cloudera Support. HBase, ZK, MR, HDFS misconfig: 44%; repair needed: 28%; fix HW/NW: 16%; patch required: 12%.]
Source: Cloudera's production HBase support tickets (CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x)
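The "only minor version upgrades" ramification can be expressed as a check over every version that coexists during the restart, since old and new servers run side by side. A minimal Python sketch, assuming plain dotted version strings with an optional suffix after a hyphen (`0.90.6-vendor1` below is a hypothetical string); `release_line`, `same_release_line` and `rolling_upgrade_possible` are illustrative names, not part of HBase. Following the slides, 0.90.x to 0.92.x counts as a major change and 0.90.x to 0.90.y as minor.

```python
from typing import Iterable, Tuple


def release_line(version: str) -> Tuple[int, int]:
    """Return the first two components, e.g. '0.90.6' -> (0, 90)."""
    core = version.split("-", 1)[0]  # drop any suffix after a hyphen
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def same_release_line(versions: Iterable[str]) -> bool:
    """True if every server and client version is on one release line (e.g. 0.90.x).

    Under the pre-0.96 limitation, only such mixes are assumed wire compatible.
    """
    return len({release_line(v) for v in versions}) <= 1


def rolling_upgrade_possible(current: str, target: str) -> bool:
    # During a rolling restart both versions serve at once,
    # so they must be mutually compatible.
    return same_release_line([current, target])


if __name__ == "__main__":
    print(rolling_upgrade_possible("0.90.4", "0.90.6"))  # True: minor upgrade
    print(rolling_upgrade_possible("0.90.6", "0.92.1"))  # False: crosses release lines
```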
  
HBase Compatibility and Extensibility

•  Coming in HBase 0.96
    •  HBASE-5305 and friends

•  Goals:
    •  Allow API changes and persistent data structure changes while guaranteeing compatibility between different minor versions (0.96.0 -> 0.96.1)
    •  HBase client/server compatibility between major versions (0.96.x -> 0.98.x)
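One common way to get the cross-version compatibility described above is an extensible, versioned message encoding; HBase 0.96 and Hadoop 2 adopted Protocol Buffers for their RPCs. The Python sketch below models only the core rule such encodings rely on, not protobuf itself or HBase's actual messages: a reader keeps the fields it knows, fills in defaults for fields an older peer did not send, and skips fields a newer peer added. The `GET_V1`/`GET_V2` schemas and the `decode` function are hypothetical.

```python
from typing import Any, Dict

# Hypothetical request schemas: field name -> default used when the field is absent.
# Version 2 only adds an optional field; nothing is removed or repurposed.
GET_V1: Dict[str, Any] = {"row": b"", "family": b""}
GET_V2: Dict[str, Any] = {"row": b"", "family": b"", "max_versions": 1}


def decode(wire: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Tolerant reader: keep known fields, default missing ones, ignore unknown ones."""
    return {name: wire.get(name, default) for name, default in schema.items()}


if __name__ == "__main__":
    new_client_msg = {"row": b"r1", "family": b"cf", "max_versions": 3}
    old_client_msg = {"row": b"r1", "family": b"cf"}

    # Old server, new client: the unknown "max_versions" field is skipped.
    print(decode(new_client_msg, GET_V1))
    # New server, old client: the missing field falls back to its default.
    print(decode(old_client_msg, GET_V2))
```

The constraint this illustrates is that fields may be added with safe defaults but never removed or given a new meaning; that discipline is what lets servers and clients on different versions keep talking during a rolling upgrade.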
  
HDFS Wire Compatibility

•  Here in HDFS 2.0.x
    •  HADOOP-7347 and friends

•  Goals:
    •  Allow API changes while guaranteeing wire compatibility between different minor versions
    •  HDFS client/server compatibility between major versions.

[Figure: stack diagram of App, MR, ZK and HDFS]
CONCLUSIONS	
  


Improving how we handle causes of downtime

[Figure: HBase Downtime Distribution (Planned vs. Unplanned) alongside Unplanned Maintenance: Root Cause from Cloudera Support (HBase, ZK, MR, HDFS misconfig: 44%; repair needed: 28%; fix HW/NW: 16%; patch required: 12%), annotated with the remedies discussed: wire compat, best practices, hbck, and hbck and distributed log splitting.]
jon@cloudera.com
Twitter: @jmhsieh
We're hiring!

QUESTIONS?
           Hadoop	
  Summit	
  2012.	
  6/13/12	
  	
  Copyright	
  2012	
                             76	
  
                 Cloudera	
  Inc,	
  All	
  Rights	
  Reserved	
  

More Related Content

What's hot

Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaCloudera, Inc.
 
Scalability
ScalabilityScalability
Scalabilityfelho
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQSybase Türkiye
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12Mat Keep
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseCloudera, Inc.
 
Nevmug Martins Point Health Care J Anuary 2009
Nevmug   Martins Point Health Care   J Anuary 2009Nevmug   Martins Point Health Care   J Anuary 2009
Nevmug Martins Point Health Care J Anuary 2009csharney
 
Monitoring VMware vFabric with Hyperic and Spring Insight
Monitoring VMware vFabric with Hyperic and Spring InsightMonitoring VMware vFabric with Hyperic and Spring Insight
Monitoring VMware vFabric with Hyperic and Spring InsightC2B2 Consulting
 
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012SOASTA
 
Managed Services Seminar Presentation
Managed Services Seminar PresentationManaged Services Seminar Presentation
Managed Services Seminar Presentationgerrymark
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANASAP Technology
 
Riverbed Granite
Riverbed GraniteRiverbed Granite
Riverbed GraniteCTI Group
 
Finding Virtual Coins in the Couch
Finding Virtual Coins in the CouchFinding Virtual Coins in the Couch
Finding Virtual Coins in the CouchNovell
 
VMware & Riverbed
VMware & RiverbedVMware & Riverbed
VMware & Riverbedvmug
 
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse AcceleratorIBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
IBM Systems solution for SAP NetWeaver Business Warehouse AcceleratorIBM India Smarter Computing
 
Apache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondApache Hadoop Now Next and Beyond
Apache Hadoop Now Next and BeyondDataWorks Summit
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
BranchReduce Distributed Branch-and-Bound on YARN
BranchReduce Distributed Branch-and-Bound on YARNBranchReduce Distributed Branch-and-Bound on YARN
BranchReduce Distributed Branch-and-Bound on YARNDataWorks Summit
 
Securing Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security ManagementSecuring Your Endpoints Using Novell ZENworks Endpoint Security Management
Securing Your Endpoints Using Novell ZENworks Endpoint Security ManagementNovell
 
A Foundation for Success in the Information Economy
A Foundation for Success in the Information EconomyA Foundation for Success in the Information Economy
A Foundation for Success in the Information EconomyInside Analysis
 

What's hot (20)

Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
 
Scalability
ScalabilityScalability
Scalability
 
Real-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQReal-Time Loading to Sybase IQ
Real-Time Loading to Sybase IQ
 
My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12My sql 5.6_replwebinar_may12
My sql 5.6_replwebinar_may12
 
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the EnterpriseHadoop Summit 2012 | Integrating Hadoop Into the Enterprise
Hadoop Summit 2012 | Integrating Hadoop Into the Enterprise
 
Nevmug Martins Point Health Care J Anuary 2009
Nevmug   Martins Point Health Care   J Anuary 2009Nevmug   Martins Point Health Care   J Anuary 2009
Nevmug Martins Point Health Care J Anuary 2009
 
Monitoring VMware vFabric with Hyperic and Spring Insight
App Dynamics & SOASTA Testing & Monitoring Converge, March 2012
Managed Services Seminar Presentation
Liquidity Risk Management powered by SAP HANA
Riverbed Granite
Finding Virtual Coins in the Couch
VMware & Riverbed
IBM Systems solution for SAP NetWeaver Business Warehouse Accelerator
Apache Hadoop Now Next and Beyond
[Hi c2011]building mission critical messaging system(guoqiang jerry)
BranchReduce Distributed Branch-and-Bound on YARN
Ronald van Luttikhuizen - Effective fault handling in SOA Suite and OSB 11g
Securing Your Endpoints Using Novell ZENworks Endpoint Security Management
A Foundation for Success in the Information Economy

Viewers also liked

Evaluacion inicial (Gaby Andino)
How to make virtual network services a winner rather than an integration disa... (Allot Communications)
2010.06.03call webquest (whitecat101)
Cars2
Future of HCatalog
Manage Hadoop Cluster with Ambari
Untitled2
Marketing management.ppt (Maneesha Patel)
WeDo Technologies' Conference in Washington D.C. 2015 (WeDo Technologies)
shareSEPHORA

Similar to Improving HBase Availability and Repair

Apache hbase for the enterprise (Strata+Hadoop World 2012) (jmhsieh)
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise (Cloudera, Inc.)
Impala: Real-time Queries in Hadoop (Cloudera, Inc.)
Data Science Day New York: The Platform for Big Data (Cloudera, Inc.)
The power of hadoop in cloud computing (Joey Echeverria)
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera (Mark Kerzner)
Cloudera Impala: A modern SQL Query Engine for Hadoop (Cloudera, Inc.)
Hadoop 101 (EMC)
Hadoop: today and tomorrow (Steve Loughran)
Trends in Supporting Production Apache HBase Clusters (DataWorks Summit)
Hadoop Backup and Disaster Recovery (Cloudera, Inc.)
Integrating Big Data Technologies (DATAVERSITY)
4 supporting h base jeff, jon, kathleen - cloudera - final 2 (Cloudera, Inc.)
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro (Cloudera, Inc.)
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011 (Cloudera, Inc.)
Storage infrastructure using HBase behind LINE messages (LINE Corporation (Tech Unit))
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera (Cloudera, Inc.)
What's New and Upcoming in HDFS - the Hadoop Distributed File System (Cloudera, Inc.)
Firebird meets NoSQL
Hadoop and Hive in Enterprises (markgrover)

More from DataWorks Summit

Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Improving HBase Availability and Repair

  • 1. Improving HBase Availability and Repair. Jeff Bean, Jonathan Hsieh {jw2ean,jon}@cloudera.com. 6/13/12
  • 2. Who Are We? • Jeff Bean • Designated Support Engineer, Cloudera • Education Program Lead, Cloudera • Jonathan Hsieh • Software Engineer, Cloudera • Apache HBase Committer and PMC member. Hadoop Summit 2012. 6/13/12 Copyright 2012 Cloudera Inc, All Rights Reserved
  • 3. What is Apache HBase? Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access.
  • 4. Fault Tolerance vs. High Availability • Fault tolerant: • Ability to recover service if a component fails, without losing data. • Highly available: • Ability to quickly recover service if a component fails, without losing data. • Goal: Minimize downtime! [Figure: Fault Tolerant, Highly Available]
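To make the fault-tolerant vs. highly-available distinction concrete, here is a small Python sketch using the standard steady-state availability model, A = MTBF / (MTBF + MTTR). The model and the MTBF/MTTR numbers are assumptions for illustration, not figures from the talk: two systems that fail equally often and lose no data differ only in how quickly they recover, and the faster one accumulates far less downtime.

```python
# Sketch of the standard steady-state availability model, A = MTBF / (MTBF + MTTR).
# The model and the numbers in the demo are illustrative assumptions.

HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected yearly downtime implied by the availability figure."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    mtbf = 1000.0  # hypothetical: one component failure every ~1000 hours
    # Both systems recover without losing data; only recovery speed differs.
    for label, mttr in [("fault tolerant, manual recovery", 4.0),
                        ("highly available, automatic failover", 0.05)]:
        print(f"{label}: A={availability(mtbf, mttr):.5f}, "
              f"downtime={downtime_hours_per_year(mtbf, mttr):.1f} h/year")
```

With these made-up numbers, cutting recovery from four hours to three minutes reduces expected downtime from roughly 35 hours to under half an hour per year, which is the "minimize downtime" goal in the slide's terms.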
  • 5. HBase Architecture • HBase is designed to be fault tolerant and highly available. • It depends on other systems being fault tolerant and highly available as well. • Replication for fault tolerance: • Regions can be served from any RegionServer • Failover HMasters • ZooKeeper (ZK) quorums • HDFS block replication on DataNodes • But replication doesn't guarantee high availability • There can still be software or human faults. [Figure: component stack: App, MR, ZK, HDFS]
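The point that replication gives fault tolerance but not high availability can be illustrated by counting how many simultaneous failures each replicated layer absorbs. This Python sketch assumes ZooKeeper's majority-quorum rule and treats an HDFS block as readable while any one replica survives; the ensemble sizes and replication factor in the demo are hypothetical. Surviving failures keeps data safe, but it says nothing about how long regions stay offline while they are reassigned, which is where availability is lost.

```python
# Sketch: how many simultaneous failures each replicated layer can absorb.
# Assumptions: ZooKeeper needs a strict majority of the ensemble to stay up;
# an HDFS block stays readable while at least one replica survives.


def zk_tolerated_failures(ensemble_size: int) -> int:
    """A majority quorum of n servers stays available with up to (n - 1) // 2 down."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return (ensemble_size - 1) // 2


def hdfs_tolerated_replica_losses(replication: int) -> int:
    """Number of replicas of one block that can be lost before the block is unreadable."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return replication - 1


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"ZK ensemble of {n}: tolerates {zk_tolerated_failures(n)} failure(s)")
    print(f"HDFS replication 3: tolerates losing "
          f"{hdfs_tolerated_replica_losses(3)} replica(s) of a block")
```

Note that a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are the usual choice.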
  • 6. Causes of HBase Downtime • Unplanned maintenance: hardware failures; software errors; human error • Planned maintenance: upgrades; migrations [Chart: HBase Downtime Distribution, planned vs. unplanned]
  • 7. Causes of Unexpected Maintenance Incidents • Misconfiguration • Metadata corruptions • Network / HW problems • SW problems • Long recovery time, both automated and manual [Chart: Unplanned Maintenance: Root Cause from Cloudera Support. HBase, ZK, MR, HDFS misconfig 44%; repair needed 28%; fix HW/NW 16%; patch required 12%. Source: Cloudera's production HBase support tickets, CDH3's HBase 0.90.x, Hadoop 0.20.x/1.0.x]
  • 8. Outline • Where we were: HBase 0.90.x + Hadoop 0.20.x/1.0.x; case studies • Where we are today: HBase 0.92.x/0.94.x + Hadoop 2.0.x; feature summary • Where we are going: HBase 0.96.x + Hadoop 2.x; feature preview
  • 9. "[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don't know." -- United States Secretary of Defense Donald Rumsfeld. WHERE WE WERE: CASE STUDIES
  • 10. Best Practices to Avoid Hazards. Best practices can prevent HBase misconfigurations. [Chart: Unplanned Maintenance: Root Cause from Cloudera Support (same as slide 7), where misconfiguration accounts for 44%]
  • 11. Case #1: Memory Over-subscription Hazard. Misconfig: too many MR slots, or MR slots too large → Node A swaps: "arbitrary" processes pause or become unresponsive → Node A under load; node B can't connect to node A → Masters take action: JobTracker blacklists the TaskTracker on node B; NameNode re-replicates blocks from node A → Bad outcome: MapReduce tasks fail; HDFS DataNode operations time out; HBase client operations fail; jobs fail or run slow.
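Case #1 comes down to simple arithmetic: if the heaps configured on a node add up to more than its physical RAM, a fully loaded node starts swapping. A minimal sketch of that check follows, assuming each MR slot runs one child JVM with a fixed heap; `NodeConfig`, its field names, the 2 GB OS reserve, and the example figures are all hypothetical, not settings from the talk.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical per-node memory settings, all in GB."""
    physical_ram_gb: float
    map_slots: int
    reduce_slots: int
    task_heap_gb: float          # heap of each MR child JVM (one per slot)
    regionserver_heap_gb: float
    datanode_heap_gb: float
    tasktracker_heap_gb: float
    os_reserve_gb: float = 2.0   # assumed headroom for the OS itself


def committed_memory_gb(cfg: NodeConfig) -> float:
    """Memory the daemons and task slots can claim when every slot is busy."""
    slots = cfg.map_slots + cfg.reduce_slots
    return (slots * cfg.task_heap_gb
            + cfg.regionserver_heap_gb
            + cfg.datanode_heap_gb
            + cfg.tasktracker_heap_gb
            + cfg.os_reserve_gb)


def is_oversubscribed(cfg: NodeConfig) -> bool:
    """True if a fully loaded node could exceed physical RAM and swap."""
    return committed_memory_gb(cfg) > cfg.physical_ram_gb
```

With a 1 GB heap per slot, a hypothetical 24 GB node with 8 map and 4 reduce slots commits exactly 24 GB; raising the slot heap to 2 GB commits 36 GB, which is the "MR slots too large" case.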
  • 12. Cases #2, #3: Hazards of Abusing HDFS and ZK. Millions of HDFS files (bad practice): 500,000 blocks per DataNode → SW bug: heartbeat thread blocks IO → RS cannot access HDFS → bad outcome: HBase goes down. Millions of ZK nodes (misconfiguration): millions of ZK znodes, 400MB snapshot → ZK fails to create new snapshots, fails → bad outcome: HBase goes down → SW bug, worse outcome: HBase fails to restart.
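A back-of-the-envelope estimate shows how "millions of HDFS files" becomes hundreds of thousands of blocks per DataNode: every non-empty file occupies at least one block, and every block is stored once per replica. The sketch assumes blocks are spread evenly across DataNodes; the function name and the example cluster (10 million 1 MB files, 64 MB blocks, replication 3, 60 DataNodes) are hypothetical, chosen only to land on the slide's 500,000 figure.

```python
import math


def blocks_per_datanode(num_files: int, avg_file_bytes: int, block_bytes: int,
                        replication: int, num_datanodes: int) -> float:
    """Estimate block replicas held by each DataNode, assuming even placement."""
    # A small file still needs one block of its own, however tiny it is.
    blocks_per_file = max(1, math.ceil(avg_file_bytes / block_bytes))
    total_replicas = num_files * blocks_per_file * replication
    return total_replicas / num_datanodes


# Hypothetical cluster: 10 million 1 MB files, 64 MB blocks, 3 replicas, 60 nodes.
estimate = blocks_per_datanode(10_000_000, 1 << 20, 64 << 20, 3, 60)
```

Because each small file counts as at least one block, the estimate is driven by the file count rather than by the data volume.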
  • 13. Case #4: Splitting Corruption from HW Failure. HW failure: network failure (takes out NN) → region split incomplete → recovery attempts to split (SW bug) → HBase has region inconsistencies (overlaps / holes) → multiple 6-hour manual repair sessions. Manual, slow, and requires an expert.
  • 14. Case #5: Slow Recovery from HW Failure. Network HW failure → RS loses HDFS, WALs → on restart, ROOT and .META. assign fails → manual repairs → 9-hour HLog splitting recovery. Contributing factors: human error, SW error. Correct but slow!
  • 15. Initial Lessons • Use best practices to avoid problems: conservative first; avoid unstable features • What can we do? Fix the bugs; recover from problems faster; make people smarter to avoid hazards and misconfigurations; make software smarter to prevent hazards and misconfigurations
  • 16. "In war, then, let your great object be victory, not lengthy campaigns." -- Sun Tzu. WHERE WE ARE TODAY: HBASE 0.92.X + HADOOP 2.0.X
  • 17. Goal: Reduce Unexpected Downtime by Recovering Faster • Removing the SPOFs: HA HDFS • Faster recovery: improved hbck; distributed log splitting
  • 18. Problem: HDFS NN Goes Down Under HBase • HBase depends on HDFS. • If HDFS is down, HBase goes down. • Ramifications: forces the recovery mechanism; caused some data corruptions • Ideally we avoid having to do recovery at all. [Diagram: HBase with App, MR, ZK, and HDFS]
  • 19. HBase-HDFS HA Nodes [Diagram: NameNode (active) and NameNode (standby) as the HDFS metadata server, with active-standby hot failover; HMaster (region metadata) and HMaster (hot standby); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers]
  • 20. HBase-HDFS HA Nodes: Transparent to HBase [Diagram: NameNode (active); HMaster (region metadata) and HMaster (hot standby); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers]
  • 21. HBase-HDFS HA Nodes: No More SPOF [Diagram: NameNode (active); HMaster (active); ZooKeeper quorum; HDFS DataNodes; HBase RegionServers]
  • 22. Recovery Operations • If a network switch fails or there is a power outage, HBase, ZK, and HA HDFS will fail. • We will always still rely on recovery mechanisms. • We need to be able to recover quickly: metadata invariants to fix metadata corruptions; data consistency to restore ACID guarantees.
  • 23. HBase Metadata Corruptions • Internal HBase metadata corruptions: prevent HBase from starting; cause some regions to be unavailable. • Repairs are intricate and can cause extended periods of downtime. [Chart: Unplanned Maintenance: Root Cause from Cloudera Support (same as slide 7), where repair needed accounts for 28%]
  • 24. HBase Metadata Invariants • Table integrity: every key shall get assigned to a single region. [Diagram: a good table split into regions [' ', A), [A, B), [B, C), [C, D), [D, E), [E, F), [F, G), [G, ' ')] • Region consistency: metadata about regions should agree in HDFS, META, and region server assignment. [Diagram: a good region has its regioninfo in META, its .regioninfo in HDFS, and is assigned to an RS]
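To make the two invariants concrete, here is a minimal sketch of checks in the spirit of hbck; it is not hbck's implementation. Regions are modeled as hypothetical `(start_key, end_key)` pairs with `""` standing for the open first/last boundary (the slide's `' '`), and the consistency check simply compares three hypothetical sets of region names.

```python
from typing import List, Set, Tuple

# (start_key, end_key); "" stands for the open first/last boundary.
Region = Tuple[str, str]


def integrity_problems(regions: List[Region]) -> List[str]:
    """Report holes and overlaps, so every key maps to exactly one region.

    Keys are compared as Python strings here; HBase compares raw bytes.
    """
    if not regions:
        return ["table has no regions"]
    problems: List[str] = []
    ordered = sorted(regions)  # by start key; "" (table start) sorts first
    if ordered[0][0] != "":
        problems.append(f"hole: ['', {ordered[0][0]!r})")
    covered_to = ordered[0][1]  # "" means covered through the end of the table
    for start, end in ordered[1:]:
        if covered_to == "" or start < covered_to:
            problems.append(f"overlap: region [{start!r}, {end!r})")
        elif start > covered_to:
            problems.append(f"hole: [{covered_to!r}, {start!r})")
        # Extend coverage; an unbounded end ("") covers every later key.
        if covered_to != "" and (end == "" or end > covered_to):
            covered_to = end
    if covered_to != "":
        problems.append(f"hole: [{covered_to!r}, '')")
    return problems


def consistency_problems(in_meta: Set[str], in_hdfs: Set[str],
                         assigned: Set[str]) -> List[str]:
    """A region is consistent only if META, HDFS, and an RS assignment agree."""
    problems: List[str] = []
    sources = (("META", in_meta), ("HDFS", in_hdfs), ("RS assignment", assigned))
    for name in sorted(in_meta | in_hdfs | assigned):
        missing = [label for label, names in sources if name not in names]
        if missing:
            problems.append(f"{name}: missing from {', '.join(missing)}")
    return problems
```

For the slide's good table, `integrity_problems` returns an empty list; dropping the `("C", "D")` region reports the hole `['C', 'D')`.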
  • 25. Detecting and Repairing Corruption with hbck • HBase 0.90 hbck: checks an HBase instance's internal invariants. • HBase hbck today: checks and can fix problems in an HBase instance's internal invariants; available in 0.90.7, 0.92.2, 0.94.0, CDH3u4, and CDH4.
  • 26. Case #4 Redux: Splitting Corruption. HW failure: network failure (takes out NN) → region split incomplete → recovery attempts to split (SW bug) → HBase has region inconsistencies (overlaps / holes) → multiple 6-hour manual repair sessions. Manual, slow, and requires an expert.
  • 27. Case #4 Redux: Splitting Corruption. HW failure: network failure (takes out NN) → region split incomplete → recovery attempts to split (SW bug) → HBase has region inconsistencies (overlaps / holes) → automated repair tool (minutes). Fixes are quicker, and an operator can use the tool.
  • 28. Case #4 Redux: Splitting Corruption. HW failure: network failure (takes out NN) → region split incomplete → recovery attempts to split (fixed SW bug) → minor HBase inconsistencies (bad assignments) → automated repair tool (seconds).
  • 29. Data Consistency • When a region server goes down, it tries to flush in-memory data to HDFS. • If it cannot write to HDFS, it relies on the WAL/HLog. • Recovery via the HLog is vital to prevent data loss: understand the write path; recovery: HLog splitting; faster recovery: distributed HLog splitting.
  • 30. Write Path (Put / Delete / Increment) [Diagram: an HBase client sends a Put to a Region Server; the Put goes to the server's HLog and to the MemStore of the target HRegion; each HRegion has a MemStore and HStores]
  • 31. Write Path (Put / Delete / Increment). Note: both regions write to the same HLog. [Diagram: Puts to two HRegions on one Region Server each go to the shared HLog and to that region's MemStore]
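The ordering on slides 30 and 31, log first and MemStore second, with one HLog shared by every region on the server, is what makes recovery possible after a crash. Below is a toy in-memory model of that ordering, not HBase code: `ToyRegionServer`, `replay`, and the `(region, row, value)` edit tuple are hypothetical, and a real HLog is an append-only file in HDFS that is normally synced before the client's write is acknowledged.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edit = Tuple[str, str, str]  # hypothetical edit: (region, row, value)


class ToyRegionServer:
    """Toy model: one shared write-ahead log, one MemStore per region."""

    def __init__(self) -> None:
        self.hlog: List[Edit] = []  # shared by every region on this server
        self.memstores: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, region: str, row: str, value: str) -> None:
        # 1. Append to the shared log first, so the edit survives a crash.
        self.hlog.append((region, row, value))
        # 2. Only then apply the edit to the region's in-memory store.
        self.memstores[region][row] = value

    def crash(self) -> List[Edit]:
        """Lose every MemStore; only the log survives."""
        self.memstores.clear()
        return list(self.hlog)


def replay(log: List[Edit]) -> Dict[str, Dict[str, str]]:
    """Rebuild per-region MemStores by replaying the log in order."""
    rebuilt: Dict[str, Dict[str, str]] = defaultdict(dict)
    for region, row, value in log:
        rebuilt[region][row] = value
    return dict(rebuilt)
```

Because both regions' edits sit in the same log, recovering either region requires reading the whole shared HLog, which is why log splitting comes next.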
  • 32. Log Splitting [Diagram: HMaster above RegionServers, each with its own HLog (HLog1, HLog2, HLog3, …) and HRegions holding MemStores]
  • 33. Log Splitting [Diagram: same as slide 32]
  • 34. Log Splitting [Diagram: the RegionServers are gone; the HMaster is left with HLog1, HLog2, HLog3, … and unassigned HRegions without MemStores]
  • 35. Log Splitting. HMaster: splitting log 1
  • 36. Log Splitting. HMaster: splitting log 2
  • 37. Log Splitting. HMaster: splitting log 3
  • 38. Log Splitting. HMaster: splitting log 100
  • 39. Log Splitting. HMaster: "Whew. I did a lot of splitting work. That took 9 hours!"
  • 40. Log Splitting. HMaster: "RegionServers, here are your region assignments." [Diagram: HRegions assigned to RegionServer4, RegionServer5, RegionServer6, …]
  • 41. Log Splitting. Victory! [Diagram: RegionServer4, RegionServer5, RegionServer6, … serving HRegions with MemStores]
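Because each HLog interleaves edits for many regions, recovery first has to regroup those edits by region so that the new hosts can replay them. The sketch below does that regrouping serially, one log after another, as the master did in the case study; the edit tuple and function names are hypothetical, and real HBase writes the per-region edits out as files in HDFS rather than returning them in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edit = Tuple[str, str, str]  # hypothetical edit: (region, row, value)


def split_log(hlog: Iterable[Edit]) -> Dict[str, List[Edit]]:
    """Split one HLog into per-region edit lists, preserving edit order."""
    per_region: Dict[str, List[Edit]] = defaultdict(list)
    for edit in hlog:
        per_region[edit[0]].append(edit)
    return dict(per_region)


def split_all_serially(hlogs: List[List[Edit]]) -> Dict[str, List[Edit]]:
    """A single process (the master) splits every log in turn."""
    merged: Dict[str, List[Edit]] = defaultdict(list)
    for hlog in hlogs:
        for region, edits in split_log(hlog).items():
            merged[region].extend(edits)
    return dict(merged)
```

Total time grows with the number of logs, since only one machine does the work; with 100 logs, that is the 9-hour wait on slide 39.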
  • 42. Can We Recover More Quickly? • In the case study, this is all done serially by the master: the master took 9 hours to recover while the 100 region server nodes were idle. • Let's use the idle machines to do splitting in parallel! • Distributed log splitting (HBASE-1364): introduced in 0.92.0 by Prakash Khemani (Facebook); included in CDH4 (0.92.1); backported to CDH3u3 (off by default)
  • 43. Distributed Log Splitting. HMaster: "I'm the boss." [Diagram: HMaster above RegionServers with HLog1, HLog2, HLog3, … and HRegions holding MemStores]
  • 44. Distributed Log Splitting. HMaster: "There is a lot of splitting work here; let's split it up." [Diagram: HLog1, HLog2, HLog3, … and unassigned HRegions]
  • 45. Distributed Log Splitting. HMaster: "You guys do the work for me." [Diagram: RegionServer4, RegionServer5, and RegionServer6 alongside HLog1, HLog2, HLog3, …]
  • 46. Distributed Log Splitting [Diagram: same as slide 45]
  • 47. Distributed Log Splitting. HMaster: "Great, that took 5.4 minutes."
  • 48. Distributed Log Splitting. HMaster: "Good job, here are your region assignments."
  • 49. Distributed Log Splitting. Like a boss. [Diagram: RegionServer4, RegionServer5, RegionServer6, … serving HRegions with MemStores]
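The slides' numbers are consistent with near-linear speedup: 9 hours is 540 minutes, and 540 / 100 = 5.4 minutes, matching the 100 idle region servers noted on slide 42. The sketch below shows only the fan-out idea, using a thread pool as a stand-in for worker RegionServers; it is not HBase's implementation, which hands out split tasks to RegionServers through ZooKeeper. The edit tuple and function names are hypothetical, and `split_log` is repeated so the block stands alone.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Edit = Tuple[str, str, str]  # hypothetical edit: (region, row, value)


def split_log(hlog: List[Edit]) -> Dict[str, List[Edit]]:
    """One worker's task: split one HLog into per-region edit lists."""
    per_region: Dict[str, List[Edit]] = defaultdict(list)
    for edit in hlog:
        per_region[edit[0]].append(edit)
    return dict(per_region)


def split_all_distributed(hlogs: List[List[Edit]],
                          workers: int) -> Dict[str, List[Edit]]:
    """The master hands one log per task to a worker pool, then merges."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so per-region edit order
        # matches what a serial split would produce.
        results = list(pool.map(split_log, hlogs))
    merged: Dict[str, List[Edit]] = defaultdict(list)
    for per_region in results:
        for region, edits in per_region.items():
            merged[region].extend(edits)
    return dict(merged)


def ideal_parallel_minutes(serial_minutes: float, workers: int) -> float:
    """Best case: the serial work divides evenly across the workers."""
    return serial_minutes / workers
```

Real clusters rarely divide work perfectly evenly, so `ideal_parallel_minutes` is a lower bound rather than a prediction.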
  • 50. Case #5 Redux: Network Failure and Slow Recovery. Network HW failure → RS loses HDFS, WALs → on restart, ROOT and .META. assign fails → manual repair → 9-hour HLog splitting recovery. Contributing factor: human error. Correct but slow!
  • 51. Case #5 Redux: Network Failure and Slow Recovery. Network HW failure → RS loses HDFS, WALs → on restart, ROOT and .META. assign fails → automatic repairs → 5.4-minute HLog splitting recovery. Contributing factor: human error. Correct and faster! Fixed!
  • 52. WHERE WE ARE GOING: HBASE 0.96 + HADOOP 2.X
  • 53. Themes • Minimizing planned downtime: changing configurations; online schema change (experimental in 0.92, 0.94); rolling restarts; wire compatibility [Chart: HBase Downtime Distribution, planned vs. unplanned]
  • 54. Table Unavailable When Changing Schema • Changing a table's schema requires disabling the table: disable table, alter table schema, enable table. • Schema includes compression, column families, caching, TTL, and versions. • Goal: quickly change table and column configuration settings without having to disable HBase tables. • Feature: Online Schema Change (HBASE-1730), included in HBase 0.92/0.94 but considered experimental; contributed by Facebook
  • 55. Changing Server Configs and Software Updates • A rolling restart is an operation for upgrading an HBase cluster to a compatible version while keeping HBase available and serving data. • It handles server config changes. • It handles code changes like hotfixes or compatible upgrades.
  • 56. Rolling Restart [Diagram: Shell (admin operations), Client (user operations), ZK, HM1, HM2, RS1, RS2, RS3, RS4, and internal operations among the servers]
  • 57. Rolling Restart [Diagram: as slide 56, with RS1 offline for restart]
  • 58. Rolling Restart [Diagram: as slide 56, all servers up]
  • 59. Rolling Restart [Diagram: as slide 56, with RS2 offline for restart]
  • 60. Rolling Restart [Diagram: as slide 56, all servers up]
  • 61. Rolling Restart [Diagram: as slide 56, with RS3 offline for restart]
  • 62. Rolling Restart [Diagram: as slide 56, all servers up]
  • 63. Rolling Restart [Diagram: as slide 56, with RS4 offline for restart]
  • 64. Rolling Restart [Diagram: as slide 56, all servers up]
  • 65. Rolling Restart [Diagram: as slide 56, with HM1 offline for restart]
  • 66. Rolling Restart [Diagram: as slide 56, all servers up]
  • 67. Rolling Restart [Diagram: as slide 56, with HM2 offline for restart]
  • 68. Rolling Restart [Diagram: as slide 56, all servers up]
  • 69. Rolling Restart [Diagram: as slide 56, all servers up]
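The sequence on slides 56 to 69 restarts RS1 through RS4 one at a time, then HM1 and HM2. A minimal sketch of that plan, plus a check that no single step takes out every server of a role, follows; the function names are hypothetical, and the model ignores the region moves a real rolling restart performs before stopping each RegionServer.

```python
from typing import List


def rolling_restart_plan(region_servers: List[str],
                         masters: List[str]) -> List[str]:
    """Restart order shown on the slides: each RegionServer, then each master."""
    return [*region_servers, *masters]


def service_stays_up(region_servers: List[str], masters: List[str],
                     plan: List[str]) -> bool:
    """True if, while each planned server is down, every role still has a live member."""
    for down in plan:
        live_rs = [s for s in region_servers if s != down]
        live_hm = [m for m in masters if m != down]
        if not live_rs or not live_hm:
            return False  # restarting `down` would leave a role with no server
    return True


servers = ["RS1", "RS2", "RS3", "RS4"]
plan = rolling_restart_plan(servers, ["HM1", "HM2"])
```

With only one HMaster the check fails, because restarting it leaves no hot standby to fail over to.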
  • 70. Rolling Restart Limitations • There are limitations on rolling restarts: all servers and clients must be wire compatible; all must be able to read old data in FS and ZK. • Ramifications: only minor version upgrades are possible; new features that change RPCs require custom compatibility shims; data format changes are not possible across minor versions. [Chart: Unplanned Maintenance: Root Cause from Cloudera Support (same as slide 7)]
  • 71. HBase Compatibility and Extensibility • Coming in HBase 0.96 (HBASE-5305 and friends) • Goals: allow API changes and persistent data structure changes while guaranteeing compatibility between different minor versions (0.96.0 -> 0.96.1); HBase client-server compatibility between major versions (0.96.x -> 0.98.x)
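The rule on slides 70 and 71, that rolling restarts only work across minor versions of the same release line, can be phrased as a small predicate. This is a hypothetical helper, not an HBase API; it treats the first two version components (e.g. `0.96` in `0.96.1`) as the major version, as the slides do.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.96.1' into (0, 96, 1)."""
    return tuple(int(part) for part in version.split("."))


def rolling_upgrade_ok(current: str, target: str) -> bool:
    """Allow a rolling restart only for a minor upgrade within one major line.

    'Major' means the first two components here (0.96 in 0.96.1), matching
    the slides: 0.96.0 -> 0.96.1 is allowed, 0.94.x -> 0.96.x is not.
    """
    cur, tgt = parse_version(current), parse_version(target)
    return cur[:2] == tgt[:2] and tgt >= cur
```

A move between major lines, such as 0.94 to 0.96, would still need a full-cluster restart under this rule.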
  • 72. HDFS Wire Compatibility • Here in HDFS 2.0.x (HADOOP-7347 and friends) • Goals: allow API changes while guaranteeing wire compatibility between different minor versions; HDFS client-server compatibility between major versions [Diagram: HBase with App, MR, ZK, and HDFS]
  • 73. HDFS Wire Compatibility (repeat of slide 72)
  • 74. CONCLUSIONS
  • 75. Improving How We Handle Causes of Downtime [Charts: HBase Downtime Distribution (planned vs. unplanned) and Unplanned Maintenance: Root Cause from Cloudera Support (same as slide 7), annotated with the remedies covered in this talk: best practices, hbck, distributed log splitting, and wire compatibility]
  • 76. QUESTIONS? jon@cloudera.com. Twitter: @jmhsieh. We're hiring!