Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprises consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.

Notes
  • HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google’s BigTable. Open-source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted Map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
  • Tested under HBase
  • 2 generals about detection time.
  • Most metrics reside in the region server. Metrics for various categories (e.g. stores, compactions, flushes). Metrics per operation type (e.g. get, put, delete). Remember HDFS, JVM, and host metrics as well.
  • Tested under HBase
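
The metrics notes above, together with the slide on manual problem detection, recommend taking a steady-state baseline and treating spikes or dips as possible problems. A minimal Python sketch of that idea follows. The function names, the sample latencies, and the three-standard-deviation threshold are illustrative assumptions and are not part of HBase or the talk.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize steady-state samples as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def find_anomalies(baseline, observations, num_sigmas=3.0):
    """Return (index, value) pairs that deviate from the baseline mean
    by more than num_sigmas standard deviations (spikes or dips)."""
    center, spread = baseline
    limit = num_sigmas * spread
    return [(i, v) for i, v in enumerate(observations) if abs(v - center) > limit]


# Hypothetical get-latency samples in milliseconds, not real HBase output.
steady_state = [2.0, 2.2, 1.9, 2.1, 2.0, 1.8, 2.3, 2.1]
baseline = build_baseline(steady_state)

# Recent observations containing one spike and one dip.
recent = [2.1, 2.0, 9.5, 1.9, 0.1]
anomalies = find_anomalies(baseline, recent)
print(anomalies)
```

In practice the samples would come from whatever already collects the metrics (the slides mention JMX, Ganglia, and Cloudera Manager), and the threshold would be tuned per metric.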

    1. Apache HBase Features for the Enterprise
       Jonathan Hsieh | @jmhsieh
       Software Engineer at Cloudera / HBase PMC Member
       October 2012
    2. Who Am I?
       • Cloudera:
         • Software Engineer
         • Apache HBase committer / PMC
         • Apache Flume founder / PMC
         • Apache Sqoop committer / PMC
       • U of Washington:
         • Research in Distributed Systems
       10/25/12 Strata Hadoop World 2012
    3. What is Apache HBase?
       Apache HBase is an open source, distributed, scalable, consistent, low latency, random access non-relational database built on Apache Hadoop.
       [Diagram labels: App, MR, ZK, HDFS]
    4. HBase provides Low-latency Random Access
       • Writes:
         • 1-3ms, 1k-10k writes/sec per node
       • Reads:
         • 0-3ms cached, 10-30ms disk
         • 10-40k reads / second / node from cache
       • Cell size:
         • 0-3MB preferred
       • Read, write and insert data anywhere in the table
         • No sequential write limitations
       [Figure: table rows accessed in non-sequential order]
    5. HBase On a Cluster
       [Diagram: HDFS NameNodes, HBase Masters, ZooKeeper Quorum, and Slave Boxes (DN + RS), with a name node in each of Rack 1 and Rack 2]
    6. Production Apache HBase Applications
       • Inbox
       • Storage
       • Web
       • Search
       • Analytics
       • Monitoring
       More Case Studies at http://www.hbasecon.com/agenda/
    7. Production Systems Need to Avoid Risk
       • Unfortunately, all things can fail.
       • Enterprises need to minimize risk.
         • Understand potential data loss scenarios
         • Understand potential unavailability scenarios
         • Must have a disaster recovery story
       • Downtime, data loss == risk
       • Let’s talk about how HBase deals with:
         • Risks from within the cluster
         • Risks from outside the cluster
         • Risks posed by Users
       • Goal: Remove or reduce negative impact of potential risks
    8. Risks from within the cluster: Hosts and Services
    9. Causes of HBase Downtime within the cluster
       • Unplanned Maintenance
         • Hardware failures
         • Software errors
         • Human error
       • Planned Maintenance
         • Upgrades
         • Migrations
       Goal: Reduce downtime from hours to minutes to seconds.
    10. Unplanned Downtime
        [Timeline: Failure Event; Detection Time, during which the service still thinks we are ok; service realizes there is a problem and starts fixing; Recovery Time; service is restored]
        • Two sources of unavailability
          • Detection time
          • Recovery time
    11. Reduce downtime by speeding up recovery time
        [Timeline: Failure Event; Detection Time, during which the service still thinks we are ok; service realizes there is a problem and starts fixing; service is restored]
        • Distributed log splitting (0.92)
        • Automated metadata repairs with hbck (0.92)
        • Enable writes while recovering from failure (0.96)
    12. Reduce downtime by speeding up detection
        [Timeline: Failure Event; service still thinks we are ok; service realizes there is a problem and starts fixing; service is restored]
        • Proactively notify to recover from process failures quickly
        • 0.92/0.94
          • All Master failure detection: 180s
          • Some Region Server failure detection: 180s
        • 0.96
          • Master process failure detection: 0-1s
          • Region Server failure detection: 0-1s
    13. Manual Problem detection: Metrics
        • Goal: Pinpoint root causes of problems faster
        • Take a baseline of your system in steady-state
        • Anomalies like spikes or dips from baseline can indicate problems
        • Ex: Slow Query Logging
        • Integrates with existing infrastructure via JMX or use with Ganglia, Cloudera Manager
    14. Metrics from all levels of the system
        • HBase Region Servers
          • Operations / sec
          • Get / put latencies (0.92)
          • Per CF metrics (0.94)
          • Per Region metrics (0.94)
        • HBase Master
          • RIT metrics
          • Replication Metrics
        • System/JVM
          • GC, RPC metrics
        [Diagram: a Master host (JVM, CPUs, Memory, Disks, Network) and a Region server host (Region server JVM with Regions and CFs, HDFS DataNode JVM, CPUs, Memory, Network, Disks)]
    15. [Slide with no extracted text]
    16. Eliminate HBase downtime
        [Timeline: Maintenance begins; Maintenance Time, during which the service remains online, slightly degraded; full service is restored]
        • Highly Available stack: HDFS (2.0) / ZK / HBase
        • Client cross-version wire compatibility (0.96)
        • Rolling Restarts
        • Online Schema Change (experimental)
    17. High Availability: HBase + HDFS + ZK
        [Diagram: HDFS NameNodes, HBase Masters, ZooKeeper Quorum, and Slave Boxes (DN + RS), with a name node in each of Rack 1 and Rack 2]
    18. Wire Compatibility
        • Reduces downtime due to planned maintenance
        • Wire compatibility + Extensible Data formats
          • Allow for forwards and backwards compatibility
          • Older clients can talk to newer servers and vice versa
        • Rolling upgrade
          • Upgrade a single node at a time while system runs
          • Allows API changes while guaranteeing wire compatibility between different minor versions
          • HDFS client-server compatibility between Major Versions
        [Diagram labels: App, MR, ZK, HDFS]
    19. Risks from outside the cluster: Datacenters and Disaster Recovery
    20. External Risks
    21. Geographically separated copies of data
    22. Strategy: HBase-Supported Batch Backups
        • Export / Dist CP / Import
          • 3 batch MR jobs
          • Several extra copies of data
          • High latency (hours)
        • Copy Table
          • 1 MR Job
          • Single copy of data
          • High latency (hours)
          • Incremental table copies
        [Diagram: Export MR Job, Dist CP MR Job, Import MR Job; Copy Table MR Job]
    23. Strategy: Custom Application-managed Replication
        • Application writes to two instances of HBase
          • Low Latency
          • Adds complexity
          • Inefficient
    24. Strategy: HBase replication (0.92+)
        • HBase asynchronously copies edit logs to other clusters.
          • Replication lag measured in seconds
          • Automatically catches up from failures.
          • Eventually consistent
          • Efficient batching
        • Master-slave† (0.90)
        • Master-master (0.92)
        [Diagram: logs replicated between clusters]
    25. Master-Master Replication
        [Diagram: logs replicated between clusters]
        Replicating data reduces chances of data loss.
    26. Risks from Users: “Problem exists between keyboard and chair.”
    27. Oops… User Error
        [Timeline: User Error: drop ‘table’; service is down during Recovery Time; service is restored, major data loss]
        • How do we prevent user error?
        • How do we recover from user error?
    28. Prevent user mistakes: User-level Security
        [Timeline: User Error: drop ‘table’; operation rejected, insufficient permissions]
        • Authentication:
          • Ensure the identity of the services or users that are communicating
        • Access Control:
          • Ensure user has permission to execute table data operations
    29. HBase User-level Security
        • Based on Kerberos for HBase, HDFS and ZooKeeper
          • Grant privileges to users
          • Revoke privileges from users.
          • Column Family and Table granularity
        • Confidentiality:
          • Ensure information is only seen by intended users.
        • Audit Trails:
          • Track which users performed particular operations
    30. Recovering from User Mistakes: Table Snapshots
        [Timeline: periodic snapshot; User Error: drop ‘table’; service is down during restore; service is restored, minor data loss]
        • Snapshot the state of a table at a certain moment in time
        • Restore it or Clone it later, creating a new read write table
        • Export it to another cluster with minimal impact on HBase
    31. Table Snapshots (0.96+)
        • Under development, slated for HBase 0.96
        • Multiple snapshot flavors planned
          • Offline snapshots
          • Online snapshots
        • Snapshot uses
          • Recover from application or user error.
          • Application experimentation (no need to spin up another cluster for replication)
          • Use MR directly on snapshot files
    32. Conclusions: HBase for the enterprise.
    33. Feature Summary by Category
        • Avoid Down Time
          • Rolling Restart
          • Online backups
          • Table Replication
          • Security Access Controls (0.92)
          • Wire Compatibility (0.96)
          • Snapshots (0.96)
        • Reduce Detection Time
          • Improved metrics
          • Proactive notification of HMaster failure (0.96)
          • Proactive notification of RegionServer failure (0.96)
        • Reduce Recovery Time
          • Distributed Log Splitting (0.92)
          • Improvements to HBCK (0.94)
          • Allow writes during recovery (0.96)
          • Snapshots (0.96)
    34. Feature Summary by Version
        • 0.90
          • Metrics
          • Distributed log splitting*
          • HBCK improvements*
          • Copy Table / Import / Export
          • Master-Slave Replication†
          • Recovery in Hours
        • 0.92
          • Metrics
          • Distributed log splitting
          • HBCK improvements
          • Copy Table / Import / Export
          • Master-Master Replication
          • Authentication and Authorization
          • Recovery in Minutes
        • 0.94
          • CF + Region Granularity Metrics
          • Distributed log splitting
          • HBCK improvements
          • Copy Table / Import / Export
          • Master-Master Replication
          • Authentication and Authorization
          • Recovery in Minutes
        • 0.96 (Upcoming Release)
          • CF + Region Granularity Metrics
          • Improved failure detection time
          • Distributed log splitting
          • HBCK improvements
          • Copy Table / Import / Export
          • Master-Master Replication
          • Client Wire compatibility
          • Authentication and Authorization
          • (Snapshots)
          • (Recovery in Seconds)
        Legend: † experimental; items in parentheses are in progress; * backported in CDH
    35. Thank You!
        Jonathan Hsieh | @jmhsieh
        Software Engineer, Cloudera
        Apache HBase committer / PMC member
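
The slides on unplanned downtime split each outage into detection time plus recovery time, and quote failure detection of 180s in 0.92/0.94 versus 0-1s in 0.96. A rough worked example of that arithmetic in Python is sketched here. The 60s recovery time is a hypothetical placeholder, since the slides give no per-failure recovery figure, and the function name is illustrative.

```python
def unplanned_downtime(detection_s, recovery_s):
    """Total unavailability for one failure: detection time plus recovery time."""
    return detection_s + recovery_s


# Detection figures come from the slides; the recovery time is an assumed
# placeholder used only to make the comparison concrete.
ASSUMED_RECOVERY_S = 60.0

old = unplanned_downtime(180.0, ASSUMED_RECOVERY_S)  # 0.92/0.94 detection
new = unplanned_downtime(1.0, ASSUMED_RECOVERY_S)    # 0.96 worst-case detection
saved = old - new

print(f"0.92/0.94: {old:.0f}s, 0.96: {new:.0f}s, saved per failure: {saved:.0f}s")
```

Under these assumptions detection dominates the older releases' outage, which is why the talk treats faster detection and faster recovery as separate levers.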