Hadoop Summit 2012 | Improving HBase Availability and Repair


Apache HBase is a rapidly-evolving random-access distributed data store built on top of Apache Hadoop's HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase's availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time such as distributed log splitting as well as supportability improvements. We will also describe utilities including new failure recovery tools that we have developed and contributed that can be used to diagnose and repair rare corruption problems on live HBase systems.

Speaker notes:
  • This pie chart comes from analyzing critical production HBase tickets over the past 6 months: misconfig 44%, patch 12%, hw/nw 16%, repair 28%. "Misconfig" means that correcting a misconfiguration was all it took to bring HBase back up again. As you can see, misconfigurations and bugs break the most HBase clusters. Fixing bugs is up to the community; fixing misconfigurations is up to you, and is the focus of the next segment. If your cluster is broken, it's probably a misconfiguration. This is a hard problem, and not where you want to spend your time, because the error messages are not tightly tied to the root cause.
  • CDH3 goes GA 4/12/12
  • HDFS-2379
  • CDH4 GA 6/5/12
  • Tested under HBase
  • Transparent to clients
  • Coupled with HBase Master failover means no SPOF
  • Cause: disconnect a region server for a while, or kill -9 a region server. Why is this a problem? All writes at a region server go to a single HLog, which can contain edits from multiple regions, and those regions may get reassigned to multiple other region servers. The HLog therefore needs to be split up.
  • region_mover.rb: move regions off, recording which regions were present; then restore regions based on the recorded list. Still mostly a manual process.

    1. Improving HBase Availability and Repair
       Jeff Bean, Jonathan Hsieh
       {jwfbean,jon}@cloudera.com
       6/13/12
    2. Who Are We?
       • Jeff Bean
         • Designated Support Engineer, Cloudera
         • Education Program Lead, Cloudera
       • Jonathan Hsieh
         • Software Engineer, Cloudera
         • Apache HBase Committer and PMC member
       Hadoop Summit 2012. 6/13/12. Copyright 2012 Cloudera Inc, All Rights Reserved.
    3. What is Apache HBase?
       Apache HBase is a reliable, column-oriented data store that provides
       consistent, low-latency, random read/write access.
    4. Fault Tolerance vs. High Availability
       • Fault tolerant: the ability to recover service if a component
         fails, without losing data.
       • Highly available: the ability to quickly recover service if a
         component fails, without losing data.
       • Goal: minimize downtime!
    5. HBase Architecture
       • HBase is designed to be fault tolerant and highly available
         • It depends on other systems (ZooKeeper, HDFS) to be as well.
       • Replication for fault tolerance
         • Serve regions from any RegionServer
         • Failover HMasters
         • ZK quorums
         • HDFS block replication on DataNodes
       • But replication doesn’t guarantee high availability
         • There can still be software or human faults
    6. Causes of HBase Downtime
       • Unplanned maintenance
         • Hardware failures
         • Software errors
         • Human error
       • Planned maintenance
         • Upgrades
         • Migrations
    7. Causes of Unexpected Maintenance Incidents
       • Misconfiguration
       • Metadata corruptions
       • Network / HW problems
       • SW problems
       • Long recovery time (automated and manual)
       Root-cause distribution from Cloudera Support: Misconfig (HBase, ZK,
       MR, HDFS) 44%, Repair Needed 28%, HW/NW 16%, Patch Required 12%.
       Source: Cloudera’s production HBase support tickets; CDH3’s HBase
       0.90.x, Hadoop 0.20.x/1.0.x.
    8. Outline
       • Where we were
         • HBase 0.90.x + Hadoop 0.20.x/1.0.x
         • Case studies
       • Where we are today
         • HBase 0.92.x/0.94.x + Hadoop 2.0.x
         • Feature summary
       • Where we are going
         • HBase 0.96.x + Hadoop 2.x
         • Feature preview
    9. “[T]here are known knowns; there are things we know we know. We also
       know there are known unknowns; that is to say we know there are some
       things we do not know. But there are also unknown unknowns – there
       are things we do not know we don’t know.”
       —United States Secretary of Defense Donald Rumsfeld
       WHERE WE WERE: CASE STUDIES
    10. Best Practices to Avoid Hazards
        Best practices can prevent HBase misconfigurations, the largest
        slice of the root-cause chart (Misconfig 44%, Repair Needed 28%,
        HW/NW 16%, Patch Required 12%).
    11. Case #1: Memory Over-subscription Hazard
        • Misconfig: too many MR slots; MR slots too large; “arbitrary”
          processes on the node.
        • Node A under load: node A swaps; tasks pause or become
          unresponsive; HDFS datanode operations time out; HBase client
          operations fail; node B can’t connect to node A.
        • Bad outcome, masters take action: MapReduce tasks fail; the
          JobTracker blacklists the TaskTracker; jobs fail or run slow; the
          NameNode re-replicates blocks from node A.
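        One hedge against this hazard is to cap MapReduce slots and task
        heap so the worst-case memory footprint fits in physical RAM. A
        sketch for a Hadoop 0.20.x/1.0.x mapred-site.xml; the values below
        are illustrative only and must be sized for your own nodes:

        ```xml
        <!-- mapred-site.xml sketch: cap slots and heap per TaskTracker -->
        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>4</value>   <!-- illustrative: fewer slots per node -->
        </property>
        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>2</value>
        </property>
        <property>
          <name>mapred.child.java.opts</name>
          <!-- (slots x heap) + DataNode + RegionServer + OS < physical RAM -->
          <value>-Xmx1g</value>
        </property>
        ```

        The arithmetic is the point: if the sum of all task heaps plus the
        DataNode, RegionServer, and OS working set exceeds physical memory,
        the node swaps and the failure chain above begins.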
    12. Cases #2, #3: Hazards of Abusing HDFS and ZK
        • Millions of HDFS files (bad practice): 500,000 blocks per
          datanode; the heartbeat thread does I/O and HDFS fails to create
          new blocks (SW bug); RegionServers cannot access HDFS; HBase goes
          down.
        • Millions of ZK znodes (misconfiguration): a 400MB snapshot; ZK
          snapshots fail (SW bug); HBase goes down, and then fails to
          restart (SW bug, worse outcome).
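        A monitoring sketch for spotting both hazards before they take the
        cluster down. The ZK host and dataDir path below are placeholders
        for your own deployment; the `mntr` four-letter word requires
        ZooKeeper 3.4+:

        ```shell
        # HDFS: watch file/block counts before they crush the NN and DNs
        hadoop dfsadmin -report | head -n 20
        hadoop fsck / | tail -n 20

        # ZooKeeper: znode count and approximate data size
        echo mntr | nc zk-host 2181 | egrep 'znode_count|approximate_data_size'

        # ZK snapshots live under dataDir; multi-hundred-MB snapshots
        # are a red flag (the case study hit 400MB)
        ls -lh /var/lib/zookeeper/version-2/snapshot.*
        ```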
    13. Case #4: Splitting Corruption from HW Failure
        HW failure: a network failure takes out the NN; a region split
        recovery attempt leaves an incomplete split (SW bug); HBase ends up
        with region inconsistencies (overlaps / holes); repair is manual,
        slow, and requires an expert: multiple 6-hour manual repair
        sessions.
    14. Case #5: Slow Recovery from HW Failure
        HW failure: a network failure causes RegionServers to lose HDFS and
        WAL writes fail (compounded by human error and SW error); recovery
        requires manual repairs, -ROOT- and .META. reassignment on restart,
        and 9 hours of hlog splitting. Correct, but slow!
    15. Initial Lessons
        • Use best practices to avoid problems
          • Be conservative first
          • Avoid unstable features
        • What can we do?
          • Fix the bugs
          • Recover from problems faster
          • Make people smarter, to avoid hazards and misconfigurations
          • Make software smarter, to prevent hazards and misconfigurations
    16. “In war, then, let your great object be victory, not lengthy
        campaigns.” -- Sun Tzu
        WHERE WE ARE TODAY: HBASE 0.92.X + HADOOP 2.0.X
    17. Goal: Reduce unexpected downtime by recovering faster
        • Removing the SPOFs
          • HA HDFS
        • Faster recovery
          • Improved hbck
          • Distributed log splitting
    18. Problem: the HDFS NN goes down under HBase
        • HBase depends on HDFS.
          • If HDFS is down, HBase goes down.
        • Ramifications:
          • Forces recovery mechanisms to run
          • Caused some data corruptions
        • Ideally we avoid having to do recovery at all.
    19. HBase-HDFS HA Nodes
        • NameNode (active) + NameNode (standby), hot failover
        • HMaster (region metadata server), active-standby hot failover
        • ZooKeeper quorum
        • HDFS DataNodes and HBase RegionServers
    20. HBase-HDFS HA Nodes: Transparent to HBase
        The active NameNode with its hot standby sits beneath the HMaster
        pair, the ZooKeeper quorum, the HDFS DataNodes, and the HBase
        RegionServers; NameNode failover is transparent to HBase.
    21. HBase-HDFS HA Nodes: No More SPOF
        With an active NameNode and an active HMaster each backed by a hot
        standby, plus the ZooKeeper quorum, DataNodes, and RegionServers,
        there is no single point of failure.
    22. Recovery Operations
        • If a network switch fails or there is a power outage:
          • HBase, ZK, and HA HDFS will fail
          • We will always still rely on recovery mechanisms.
        • Need to be able to quickly recover
          • Metadata invariants, to fix metadata corruptions
          • Data consistency, to restore ACID guarantees
    23. HBase Metadata Corruptions
        • Internal HBase metadata corruptions
          • Prevent HBase from starting
          • Cause some regions to be unavailable
        • Repairs are intricate and can cause extended periods of downtime.
    24. HBase Metadata Invariants
        • Table integrity: every key shall get assigned to a single region;
          the region key ranges [‘ ’,A) [A,B) [B,C) [C,D) [D,E) [E,F) [F,G)
          [G,‘ ’) tile the keyspace.
        • Region consistency: metadata about a region should agree in HDFS
          (.regioninfo), in META (regioninfo), and in the region’s
          RegionServer assignment.
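        The table-integrity invariant is mechanical enough to sketch. This
        is illustrative Python, not HBase code: it checks that a set of
        region key ranges tiles the keyspace with no holes and no overlaps,
        which is the kind of check hbck performs:

        ```python
        # Sketch (not HBase code): verify that region [start, end) ranges
        # tile the keyspace. '' stands for the unbounded start/end key.
        def check_table_integrity(regions):
            """regions: list of (start_key, end_key) pairs.
            Returns a list of problems; an empty list means consistent."""
            problems = []
            if not regions:
                return ["no regions"]
            rs = sorted(regions, key=lambda r: r[0])
            if rs[0][0] != "":
                problems.append("first region does not start at ''")
            if rs[-1][1] != "":
                problems.append("last region does not end at ''")
            for (s1, e1), (s2, e2) in zip(rs, rs[1:]):
                if e1 == s2:
                    continue  # ranges meet exactly: good
                if e1 != "" and e1 < s2:
                    problems.append("hole between %r and %r" % (e1, s2))
                else:
                    problems.append("overlap between [%r,%r) and [%r,%r)"
                                    % (s1, e1, s2, e2))
            return problems
        ```

        With ranges that meet exactly the checker returns an empty list; a
        gap between two end/start keys reports a hole, and an end key past
        the next start key reports an overlap.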
    25. Detecting and Repairing Corruption with hbck
        • HBase 0.90 hbck
          • Checks an HBase instance’s internal invariants.
        • HBase hbck today
          • Checks and can fix problems in an HBase instance’s internal
            invariants
          • 0.90.7, 0.92.2, 0.94.0
          • CDH3u4, CDH4
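        A sketch of a typical hbck workflow for the versions named on the
        slide; always run the check-only form first and read its report
        before applying any -fix option:

        ```shell
        # Check only: report table integrity and region consistency problems
        hbase hbck

        # More verbose reporting
        hbase hbck -details

        # Lower-risk repairs: region assignment and META inconsistencies
        hbase hbck -fixAssignments -fixMeta

        # Riskier repairs: holes and overlaps in the HDFS region layout
        hbase hbck -fixHdfsHoles -fixHdfsOverlaps
        ```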
    26. Case #4 redux: Splitting Corruption
        The original chain: a network failure takes out the NN; a region
        split recovery attempt leaves an incomplete split (SW bug); HBase
        ends up with region inconsistencies (overlaps / holes); repair is
        manual, slow, and requires an expert: multiple 6-hour manual repair
        sessions.
    27. Case #4 redux: Splitting Corruption
        Same failure chain, but now an automated repair tool fixes the
        region inconsistencies in minutes; fixes are quicker, and an
        operator can run the tool.
    28. Case #4 redux: Splitting Corruption
        For minor HBase region inconsistencies (bad assignments), the
        automated repair tool fixes the problem in seconds. Fixed!
    29. Data Consistency
        • When a region server goes down, it tries to flush data in memory
          to HDFS.
        • If it cannot write to HDFS, it relies on the WAL/HLog.
        • Recovery via the HLog is vital to prevent data loss
          • Understand the write path.
          • Recovery: HLog splitting.
          • Faster recovery: distributed HLog splitting.
    30. Write Path (Put / Delete / Increment)
        An HBase client sends a Put to the RegionServer; the server appends
        it to the HLog, then applies it to the target HRegion’s MemStore
        (backed by HStores).
    31. Write Path (Put / Delete / Increment)
        Note: both regions on a RegionServer write to the same HLog; each
        Put is logged to the shared HLog and then applied to its own
        region’s MemStore.
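        The write path above can be sketched in a few lines. This is
        illustrative Python, not HBase code; the class and method names are
        invented for the sketch:

        ```python
        # Sketch of the write path: every mutation is appended to the one
        # WAL shared by all regions first, then applied to the per-region
        # MemStore. Replaying the WAL restores the MemStores after a crash.
        class RegionServer:
            def __init__(self, region_names):
                self.hlog = []                                  # shared WAL
                self.memstores = {r: {} for r in region_names}  # per region

            def put(self, region, key, value):
                self.hlog.append((region, key, value))  # durable intent first
                self.memstores[region][key] = value     # then in-memory state

            def recover(self, hlog):
                """Replay a WAL: regroup edits by region (log 'splitting')."""
                for region, key, value in hlog:
                    self.memstores.setdefault(region, {})[key] = value

        rs = RegionServer(["regionA", "regionB"])
        rs.put("regionA", "row1", "v1")   # both regions share rs.hlog
        rs.put("regionB", "row9", "v2")
        ```

        Because the WAL is shared, recovery after a crash must first split
        its edits back out by region before they can be replayed, which is
        exactly the log-splitting problem the next slides walk through.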
    32. Log Splitting: an HMaster and three RegionServers, each
        RegionServer writing to its own HLog (HLog1, HLog2, HLog3, …) for
        its regions’ in-memory state.
    33. Log Splitting: the RegionServers fail.
    34. Log Splitting: only the HMaster, the orphaned HLogs, and the
        unassigned regions remain.
    35. Log Splitting: the HMaster splits log 1.
    36. Log Splitting: … splitting log 2 …
    37. Log Splitting: … splitting log 3 …
    38. Log Splitting: … splitting log 100.
    39. Log Splitting: “Whew. I did a lot of splitting work. That took 9
        hours!”
    40. Log Splitting: “RegionServers, here are your region assignments.”
        (RegionServer4, RegionServer5, RegionServer6, …)
    41. Log Splitting: “Victory!” The regions are reassigned and serving
        again.
    42. Can we recover more quickly?
        • In the case study, this is all done serially by the master
          • The master took 9 hours to recover.
          • The 100 region server nodes were idle.
        • Let’s use the idle machines to do splitting in parallel!
        • Distributed log splitting (HBASE-1364)
          • Introduced in 0.92.0 by Prakash Khemani (Facebook)
          • Included in CDH4 (0.92.1)
          • Backported to CDH3u3 (off by default)
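        The structural idea behind HBASE-1364 can be sketched without HBase:
        instead of one master splitting every HLog serially, split tasks are
        farmed out to the idle workers. Real HBase coordinates the task
        queue through ZooKeeper; in this illustrative sketch a thread pool
        stands in for that coordination:

        ```python
        # Sketch of distributed log splitting: each worker splits one HLog's
        # edits into per-region buckets; the master only merges the results.
        from collections import defaultdict
        from concurrent.futures import ThreadPoolExecutor

        def split_one_log(hlog):
            """Split one HLog (a list of (region, edit) pairs) by region."""
            per_region = defaultdict(list)
            for region, edit in hlog:
                per_region[region].append(edit)
            return per_region

        def distributed_split(hlogs, workers=8):
            merged = defaultdict(list)
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # pool.map preserves input order, so replay order is stable
                for per_region in pool.map(split_one_log, hlogs):
                    for region, edits in per_region.items():
                        merged[region].extend(edits)
            return merged

        hlogs = [[("r1", "a"), ("r2", "b")], [("r1", "c")]]
        recovered = distributed_split(hlogs)
        ```

        The win is purely parallelism: the per-log work is unchanged, but
        100 logs split by 100 workers finish in roughly the time of the
        slowest single log, which is how 9 hours becomes minutes.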
    43. Distributed Log Splitting: “I’m the boss.” The HMaster, with
        RegionServers, HLogs, and regions.
    44. Distributed Log Splitting: “There is a lot of splitting work here,
        let’s split it up.”
    45. Distributed Log Splitting: “You guys do the work for me.”
        RegionServer4, RegionServer5, and RegionServer6 each take an HLog.
    46. Distributed Log Splitting: the RegionServers split the logs in
        parallel.
    47. Distributed Log Splitting: “Great, that took 5.4 minutes.”
    48. Distributed Log Splitting: “Good job, here are your region
        assignments.”
    49. Distributed Log Splitting: “Like a boss.” The regions are serving
        again.
    50. Case #5 redux: Network Failure and Slow Recovery
        The original chain: a network failure causes RegionServers to lose
        HDFS and WAL writes fail (compounded by human error and SW error);
        recovery requires manual repair, -ROOT- and .META. reassignment on
        restart, and 9 hours of hlog splitting. Correct, but slow!
    51. Case #5 redux: Network Failure and Slow Recovery
        Same failure chain, but now with automatic repairs and 5.4-minute
        hlog splitting on restart. Correct, and faster. Fixed!
    52. WHERE WE ARE GOING: HBASE 0.96 + HADOOP 2.X
    53. Themes
        • Minimizing planned downtime
          • Changing configurations
          • Online schema change (experimental in 0.92, 0.94)
          • Rolling restarts
          • Wire compatibility
    54. Table Unavailable When Changing Schema
        • Changing table schema requires disabling the table
          • disable table, alter table schema, enable table
          • Schema includes compression, column families, caching, TTL,
            versions.
        • Goal: quickly change table and column configuration settings
          without having to disable HBase tables.
          • Feature: Online Schema Change (HBASE-1730)
          • Included in, but considered experimental in, HBase 0.92/0.94.
          • Contributed by Facebook
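        A sketch of the two patterns in the HBase shell; 'mytable' and
        'cf1' are placeholder names:

        ```
        # Old pattern: the table is offline for the duration of the change
        hbase> disable 'mytable'
        hbase> alter 'mytable', {NAME => 'cf1', COMPRESSION => 'GZ'}
        hbase> enable 'mytable'

        # With online schema change enabled (experimental in 0.92/0.94;
        # set hbase.online.schema.update.enable=true in hbase-site.xml),
        # the alter can run without the disable/enable bracket:
        hbase> alter 'mytable', {NAME => 'cf1', TTL => 604800}
        ```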
    55. Changing Server Configs and Software Updates
        • A rolling restart is an operation for upgrading an HBase cluster
          to a compatible version while keeping HBase available and serving
          data.
          • Handles server config changes.
          • Handles code changes such as hotfixes or compatible upgrades.
    56. Rolling Restart: the full cluster (ZK, client shell, HM1, HM2,
        RS1–RS4) serving admin, user, and internal operations.
    57–64. One RegionServer at a time is stopped and restarted (RS1, then
        RS2, then RS3, then RS4); the remaining servers keep serving.
    65–68. The masters are restarted in turn (HM1, then HM2); the other
        master covers while one is down.
    69. The whole cluster is back up on the new configuration, with no loss
        of service.
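        The sequence above maps onto scripts that ship in HBase’s bin/
        directory; a sketch, with a placeholder hostname:

        ```shell
        # Restart one RegionServer: move its regions off, restart the
        # daemon, then move the regions back
        ./bin/graceful_stop.sh --restart --reload rs1.example.com

        # Repeat for each RegionServer in turn, then restart the masters
        # one at a time; bin/rolling-restart.sh automates the whole-cluster
        # sequence
        ./bin/rolling-restart.sh
        ```

        graceful_stop.sh is what keeps regions served during the restart:
        it drains the server before stopping it rather than letting the
        master discover a dead server and run recovery.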
    70. Rolling Restart Limitations
        • There are limitations on rolling restarts
          • All servers and clients must be wire compatible
          • All must be able to read the old data in the FS and ZK.
        • Ramifications:
          • Only minor version upgrades are possible.
          • New features that change RPCs require custom compatibility
            shims.
          • Data format changes are not possible across minor versions.
    71. HBase Compatibility and Extensibility
        • Coming in HBase 0.96
          • HBASE-5305 and friends
        • Goals:
          • Allow API changes and persistent data structure changes while
            guaranteeing compatibility between different minor versions
            (0.96.0 -> 0.96.1)
          • HBase client-server compatibility between major versions
            (0.96.x -> 0.98.x)
    72. HDFS Wire Compatibility
        • Here in HDFS 2.0.x
          • HADOOP-7347 and friends
        • Goals:
          • Allow API changes while guaranteeing wire compatibility between
            different minor versions
          • HDFS client-server compatibility between major versions.
    74. CONCLUSIONS
    75. Improving How We Handle the Causes of Downtime
        The downtime charts from earlier, annotated with the fixes
        discussed: best practices for misconfigurations (44%), hbck and
        distributed log splitting for repairs (28%) and HW/NW failures
        (16%), and wire compatibility for patches (12%) and for planned
        downtime.
    76. QUESTIONS?
        jon@cloudera.com
        Twitter: @jmhsieh
        We’re hiring!
