Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 

Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

on

  • 1,998 views

Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random-accesses. As Apache HBase matures, the community has ...

Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random-accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprise consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.

Statistics

Views

Total Views
1,998
Views on SlideShare
1,777
Embed Views
221

Actions

Likes
5
Downloads
71
Comments
0

4 Embeds 221

http://www.cloudera.com 216
http://author01.mtv.cloudera.com 2
http://author01.core.cloudera.com 2
http://www.bigdatacloud.com 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Hbase is a project that solves this problem. In a sentence, Hbase is an open source, distributed, sorted map modeled after Google’s BigTable.Open-source: Apache HBase is an open source project with an Apache 2.0 license.Distributed: HBase is designed to use multiple machines to store and serve data.Sorted Map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk.HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
  • Tested under HBase
  • 2 generals about detection time.
  • Most metrics reside in the region serverMetrics for various categories (e.g. Stores, compactions, flushes)Metrics per operation type (e.g. get, put, delete)Remember HDFS,JVM, and host metrics as well
  • Tested under HBase

Strata + Hadoop World 2012: Apache HBase Features for the Enterprise Strata + Hadoop World 2012: Apache HBase Features for the Enterprise Presentation Transcript

  • Apache HBase Features for the USE PUBLICLY DO NOTEnterprise PRIOR TO 10/23/12Headline Goes HereJonathan Hsieh | @jmhsiehSpeaker Name or Subhead Goes HereSoftware Engineer at Cloudera / HBase PMC MemberOctober 2012
  • Who Am I? • Cloudera: • Software Engineer • Apache HBase committer / PMC • Apache Flume founder / PMC • Apache Sqoop committer / PMC • U of Washington: • Research in Distributed Systems2 10/25/12 Strata Hadoop World 2012
  • What is Apache HBase? Apache HBase is an open App MR source, distributed, scala ble, consistent, low latency, random access ZK HDFS non-relational database built on Apache Hadoop3 10/25/12 Strata Hadoop World 2012 View slide
  • HBase provides Low-latency Random Access • Writes: • 1-3ms, 1k-10k writes/sec per node 0000000000 • Reads: 4 1111111111 • 0-3ms cached, 10-30ms disk 1 2222222222 • 10-40k reads / second / node from 3333333333 cache 5 4444444444 • Cell size: 5555555555 6666666666 • 0-3MB preferred 2 7777777777 • Read, write and insert data anywhere in 3 the table • No sequential write limitations4 9/23/12 Strangeloop 2012 View slide
  • HBase On a Cluster HDFS NameNodes ZooKeeper Slave Boxes (DN + RS) HBase Masters Quorum Rack 1 Name node Rack 2 Name node5 10/25/12 Strata Hadoop World 2012
  • Production Apache HBase Applications • Inbox • Storage • Web • Search • Analytics • Monitoring More Case Studies at http://www.hbasecon.com/agenda/6 10/25/12 Strata Hadoop World 2012
  • Production Systems Need to Avoid Risk • Unfortunately, all things can fail. • Enterprises need to minimize risk. • Understand potential data loss scenarios • Understand potential unavailability scenarios • Must have a disaster recovery story • Downtime, data loss == risk • Let’s talk about how HBase deals with: • Risks from within the cluster • Risks from outside the cluster • Risks posed by Users • Goal: Remove or reduce negative impact of potential risks7 10/25/12 Strata Hadoop World 2012
  • Risks from within the clusterHosts and Services
  • Causes of HBase Downtime within the cluster • Unplanned Maintenance • Planned Maintenance • Hardware failures • Upgrades • Software errors • Migrations • Human error Goal: Reduce downtime from hours to minutes to seconds.10 10/25/12 Strata Hadoop World 2012
  • Unplanned Downtime Service realizes there is a Failure Event problem starts fixing. Detection Time Recovery Time time Service still Service is thinks we are ok restored. • Two sources of unavailability • Detection time • Recovery time11 10/25/12 Strata Hadoop World 2012
  • Reduce downtime by speeding up recovery time Service realizes there is a Failure Event problem starts fixing. Detection Time time Service still Service is thinks we are ok restored. • Distributed log splitting (0.92) • Automated metadata repairs with hbck (0.92) • Enable of writes while recovering from failure (0.96)12 10/25/12 Strata Hadoop World 2012
  • Reduce downtime by speeding up detection Service realizes there is a Failure Event problem starts fixing. time Service still Service is thinks we are ok restored. • Proactively notify to recover from process failures quickly 0.92/0.94 All Master Failure Detection 180s Some Region Server Failure detection 180s 0.96 Master process failure detection 0- Region Server Failure detection 1s 0-1s13 10/25/12 Strata Hadoop World 2012
  • Manual Problem detection: Metrics • Goal: Pinpoint root causes of problems faster • Take a baseline of your system in steady- state • Anomalies like spikes or dips from baseline can indicate problems • Ex: Slow Query Logging • Integrates with existing infrastructure via JMX or use with Ganglia, Cloudera Manager14 10/25/12 Strata Hadoop World 2012
  • Metrics from all levels of the system • HBase Region Servers Host JVM • Operations / sec Master • Get / put latencies (0.92) CPUs Memory Disks Network • Per CF metrics (0.94) • Per Region metrics (0.94) Host • HBase Master JVM JVM Region server • RIT metrics Region HDFS DataNode • Replication Metrics CF CF • System/JVM CPUs Memory Network Disks • GC, RPC metrics15 10/25/12 Strata Hadoop World 2012
  • 16 10/25/12 Strata Hadoop World 2012
  • Eliminate HBase downtime Maintenance Service remains online, Full service Begins slightly degraded is restored. Maintenance Time time • Highly Available stack: HDFS (2.0) / ZK / HBASE • Client Cross-version wire compatibility (0.96) • Rolling Restarts • Online Schema Change (experimental)17 10/25/12 Strata Hadoop World 2012
  • High Availability: HBase + HDFS + ZK HDFS NameNodes ZooKeeper Slave Boxes (DN + RS) HBase Masters Quorum Rack 1 Name node Rack 2 Name node18 10/25/12 Strata Hadoop World 2012
  • Wire Compatibility • Reduces downtime due to planned maintenance • Wire compatibility + Extensible Data formats • Allow for forwards and backwards compatibility • Older clients can talk to newer servers and visa App MR versa • Rolling upgrade • Upgrade a single node at a time while system runs • Allows API and changes while guaranteeing wire ZK HDFS compatibility between different minor versions • HDFS client-server compatibility between Major Versions19 10/25/12 Strata Hadoop World 2012
  • Risks from outside the clusterDatacenters and Disaster Recovery
  • External Risks21 10/25/12 Strata Hadoop World 2012
  • Geographically separated copies of data22 10/25/12 Strata Hadoop World 2012
  • Strategy: HBase-Supported Batch Backups • Export / Dist CP / Import Import Export Dist CP • 3 batch MR jobs MR Job MR Job MR Job • Several extra copies of data • High latency (hours) • Copy Table Copy Table • 1 MR Job MR Job • Single copy of data • High Latency (hours) • Incremental table copies23 10/25/12 Strata Hadoop World 2012
  • Strategy: Custom Application-managed Replication • Application writes to two instances of HBase • Low Latency • Adds complexity • Inefficient App App24 10/25/12 Strata Hadoop World 2012
  • Strategy: HBase replication (0.92+) • HBase Asynchronously copy edit logs to other clusters. • Replication lag measured in seconds • Automatically catch up from failures. • Eventually consistent • Efficient batching • Master-slave† (0.90) logs logs logs logs • Master-master (0.92) Replication25 10/25/12 Strata Hadoop World 2012
  • Master-Master Replication logs logs logs Replicating data reduces chances of data loss.26 10/25/12 Strata Hadoop World 2012
  • Risks from Users“Problem exists between keyboard and chair.”
  • Oops… User Error User Error: Service is restored, drop ‘table’ major data loss Recovery Time time Service is down! • How do we prevent user error? • How do we recover from user error?28 10/25/12 Strata Hadoop World 2012
  • Prevent user mistakes: User-level Security User Error: Operation drop ‘table’ rejected, insufficient permissions. time • Authentication: • Ensure the identity of the services or users that are communicating • Access Control: • Ensure user has permission to execute table data operations29 10/25/12 Strata Hadoop World 2012
  • HBase User-level Security • Based on Kerberos for HBase, HDFS and Zookeeper • Grant privileges to users • Revoke privileges from users. • Column Family and Table granularity • Confidentiality: • Ensure information is only seen by intended users. • Audit Trails: • Track which users performed particular operations30 10/25/12 Strata Hadoop World 2012
  • Recovering from User Mistakes: Table Snapshots Service is User Error: Service is down! restored, minor data drop ‘table’ loss restore time Periodic snapshot • Snapshot the state of a table at a certain moment in time • Restore it or Clone it later, creating a new read write table • Export it to another cluster with minimal impact on HBase31 10/25/12 Strata Hadoop World 2012
  • Table Snapshots (0.96+) • Under development, slated for HBase 0.96 • Multiple snapshot flavors planned • Offline snapshots • Online Snapshots • Snapshot uses • Recover from application or user error. • Application experimentation (no need to spin up another cluster for replication) • Use MR directly on snapshot files32 10/25/12 Strata Hadoop World 2012
  • ConclusionsHBase for the enterprise.
  • Feature Summary by Category Avoid Down Time • Rolling Restart • Online backups • Table Replication • Security Access Controls (0.92) • Wire Compatibility (0.96) • Snapshots (0.96) Detection Time Recovery Time time Reduce Detection Time Reduce Recovery Time • Improved metrics • Distributed Log Splitting (0.92) • Proactive notification of HMaster failure (0.96) • Improvements to HBCK (0.94) • Proactive notification of RegionServer failure • Allow writes during recovery (0.96) (0.96) •Snapshots (0.96)34 10/25/12 Strata Hadoop World 2012
  • Feature Summary by Version 0.90 0.92 0.94 0.96 (Upcoming Release) •Metrics •Metrics •CF+Region Granularity •CF +Region Granularity Metrics Metrics •Improved failure detection time •Distributed log splitting* •Distributed log splitting •Distributed log splitting •Distributed log splitting •HBCK improvements* •HBCK improvements •HBCK improvements •HBCK improvements •Copy Table / Import / Export •Copy Table / Import / Export •Copy Table / Import / Export •Copy Table / Import / Export •Master-Slave Replication† •Master-Master Replication •Master-Master Replication •Master-Master Replication •Client Wire compatibility •Authentication and •Authentication and •Authentication and Authorization Authorization Authorization •(Snapshots) Recovery in Hours Recovery in Minutes Recovery in Minutes (Recovery in Seconds) † experimental (in progress) *backported in CDH35 10/25/12 Strata Hadoop World 2012
  • Thank You!Jonathan Hsieh | @jmhsiehSoftware Engineer, ClouderaApache HBase committer / PMC member