Trends in Supporting Production Apache HBase Clusters

Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random accesses. By supporting a wide range of production Apache HBase clusters with diverse use cases and sizes over the past year, we've noticed several new trends, learned lessons, and taken action to improve the HBase experience. We'll present aggregated root-cause statistics on resolved support tickets from the past year. The comparison with the previous year's statistics shows an interesting shift away from problems internal to HBase (splitting, repairs, recovery time) toward user-inflicted problems, such as poor application-level architecture, that can be mitigated by tuning (bulk load, r/w latencies, and compaction policies). The talk will discuss several tuning tips used for a variety of production workloads running on top of HBase 0.92.x/0.94.x clusters with 10s to 100s of nodes. This will include settings and their justification for sizing clusters, tuning bulk loads, region counts, and memory settings. We'll also discuss recently added HBase features that alleviate these problems, including an improved mean time to recovery, improved predictability, and improved performance.

  • HBase is a project that solves this problem. In a sentence, HBase is an open-source, distributed, sorted map modeled after Google's BigTable. Open source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
  • This pie chart is the product of analyzing critical production HBase tickets over the past 6 months: misconfiguration 44%, patch 12%, HW/NW 16%, repair 28%. "Misconfiguration" means that correcting a misconfiguration was all it took to bring HBase back up again. As you can see, misconfigurations and bugs break the most HBase clusters. Fixing bugs is up to the community. Fixing misconfigurations is up to you, and that is the focus of the next segment. Because they are hard to diagnose, misconfigurations are not what you want to spend your time on. If your cluster is broken, it's probably a misconfiguration. This is a hard problem because the error messages are not tightly tied to the root cause.
  • Old .edits – HBASE-6440; 0-length – HBASE-6443; Bad refs – HBASE-7199 (hbck); Invalid HFile – cosmic
  • Hannibal helped a lot with identifying balance issues.

Trends in Supporting Production Apache HBase Clusters: Presentation Transcript

  • Trends in Supporting Production Apache HBase Clusters Jonathan Hsieh | @jmhsieh | Software Engineer at Cloudera / HBase PMC Member Kevin O'Dell | kevin.odell@cloudera | Systems Engineer at Cloudera June 26, 2013
  • Who are we? Jonathan Hsieh • Cloudera: • Software Engineer • Apache HBase committer / PMC • Apache Flume founder Kevin O’Dell • Cloudera: • Systems Engineer • Apache HBase contributor • Cloudera HBase Support Lead 2 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • What is Apache HBase? Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access. ZK HDFS App MR 3 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • HBase Architecture ZK HDFS App MR 4 6/26/13 Hadoop Summit / O'Dell, Hsieh • HBase is designed to be fault tolerant and highly available • It depends on other systems to be as well. • Replication for fault tolerance • Serve regions from any Region server • Failover HMasters • ZK Quorums • HDFS Block replication on Data Nodes
  • From the trenches at Cloudera Customer Operations Trends Supporting HBase
  • Customers in 2011-12 vs in 2012-13 0.90.x / CDH3 era • Red Hat 5.x • Java jvm 1.6.13 • 4-8 disk machines • 24-48 GB RAM • Dual 4-core HT • CDH3 • Apache HBase 0.90 • Apache Hadoop 0.20.x 0.92.x/0.94.x / CDH4 era • Red Hat 6.x • Java jvm 1.6.31 • 12-15 disk machines • 48-96 GB RAM • Dual 6-core HT • CDH4 • Apache HBase 0.92/0.94 • Apache Hadoop 2.0 6 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Support Incidents 6/2011-6/2012 • Patched Bug • Patched delivered, or • Fixed in next version • Operational Workaround • Misconfiguration • Schema design / tuning • hbck used to fix • Network/HW/OS • Problems with underlying systems. 7 Patched 12% Workaround (hbck) 28% Workaround (config) 44% Net/HW/OS 16% 6/11-6/12 - CDH3 / 0.90.x HBase Support Tickets 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Comparing 6/11-6/12 to 6/12-6/13 8 Patched 12% Workaround (hbck) 28% Workaround (config) 44% Net/HW/OS 16% 6/11-6/12 - CDH3 / 0.90.x HBase Support Tickets Patched 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets Much smaller! Merged config/hbck New category This is bigger! 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Comparing 2011 to 2012 • Majority customers upgraded to CDH4. • More customers, but similar volume of support incidents • Shrunk the CDH3’s largest trouble spots significantly. • Larger number of issues due to underlying systems. • This is actually a good thing! 9 Patched 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • HBase Operations Challenges
  • Operation’s pain points from 6/12 – 6/13 • Hardware (Net/OS/HW) • Upgrade (0.90 -> 0.92) • HBase configuration 11 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Hardware / Network / Operating System • Leap second • Transparent Huge pages • Bad 10GB Ethernet Firmware 12 Bug 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Cloudera Manager (CM) system host checker 13 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Upgrade Issues • Old .edits (HBASE-6440) • 0-length HLogs (HBASE-6443) • Bad region refs (HBASE-7199) • Invalid HFile (Heisenbug) 14 Bug 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Upgrade Assistance • Parcels • simplified distribution • flexibility of install location • side by side installs for rolling upgrades • Rolling upgrades via CM • hot fixes • minor version upgrades • Automated tests for upgrades and compatibility 15 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Configuration / Feature • Continuous Bulk Load • Avoid and Use Puts • Region tuning • Updated defaults + CM • GC tuning • Updated defaults + CM • Balancer • Manual / custom tools • Bad Schema • Trial and Error 16 Bug 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • CM helps • Sanity checks on configurations • Wizard based installation and setup • Wizard based rolling upgrades (minor versions) • Wizard based backup and disaster recovery strategies 17 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Configuration Management 18 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Support improvement wishlist • Improved “Ergonomics” • Better default configuration and guard rails • “I’m sorry Dave, I can’t let you do that” • Improved error messaging • Suggest likely root causes in logs • Improve log signal-to-noise ratio • More improved ops tooling and frameworks for app development 6/26/13 Hadoop Summit / O'Dell, Hsieh19
  • Good news • All bug fixes go into the Apache versions before CDH • HBase is maturing • Higher percentage of incidents by underlying OS/HW/NW • More performance and tuning oriented questions • Similar percentage of incidents caused by bugs • We’re getting better • Lower percentage of incidents managed with workarounds • More tools in place to help operational support • Hbck, CM, defaults • We can still do better! 20 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Getting rid of workarounds Trends Developing HBase 21 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Developer Community • Vibrant, Highly Active community! • We’re Growing! 6/26/13 Hadoop Summit / O'Dell, Hsieh22
  • Upstream Development Improvements for 0.95+ • Improving Usability • Improving Reliability • Improving Predictability 23 6/26/13 Hadoop Summit / O'Dell, Hsieh Patched 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets
  • Improving Usability Metrics and Frameworks
  • Usability Concerns • Administering HBase has been too hard. • Difficult to see what is happening in HBase • Easy to make bad design decisions early without realizing • New Developments • Metrics Revamp • HTrace • Frameworks for Schema design 6/26/13 Hadoop Summit / O'Dell, Hsieh25
  • Metrics Options Cloudera Manager OpenTSDB 26 Ganglia Ganglia Image From:http://www.flickr.com/photos/hongiiv/ 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • HTrace • Problem: • Where is time being spent inside HBase? • Solution: HTrace Framework • Inspired by Google Dapper • Threaded through HBase and HDFS • Tracks time spent in calls in a distributed system by tracking spans* on different machines. *Some assembly still required. 6/26/13 Hadoop Summit / O'Dell, Hsieh27
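The span idea behind HTrace can be illustrated with a toy tracer. This is not the HTrace API, just a minimal sketch of the concept it describes: record where time is spent, and link child spans to parents so a call tree spanning machines can be reassembled later. All names here are illustrative.

```python
import time
from contextlib import contextmanager

spans = []  # collected (description, parent, start, stop) tuples

@contextmanager
def trace(description, parent=None):
    """Toy span tracer in the spirit of Dapper/HTrace: time a unit of
    work and link it to its parent span."""
    start = time.time()
    try:
        yield description
    finally:
        spans.append((description, parent, start, time.time()))

# A client call that fans out to a region server read; the nested
# spans record how the total time breaks down.
with trace("client.get") as root:
    with trace("regionserver.read", parent=root):
        pass  # the traced work would happen here
```

In a real distributed trace, span and parent identifiers travel with the RPC so spans recorded on different machines can be stitched together, which is the "some assembly still required" part.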
  • HBase Schemas • HBase Application developers must iterate to find a suitable HBase schema • Schema critical for Performance at Scale • How can we make this easier? • How can we reduce the expertise required to do this? • Today: • Lots of tuning knobs • Developers need to understand Column Families, Rowkey design, Data encoding, … • Some are expensive to change after the fact 6/26/13 Hadoop Summit / O'Dell, Hsieh28
  • Row key design techniques • Numeric keys and lexicographic sort • Store numbers big-endian. • Pad ASCII numbers with 0’s. • Use reversal to put the most significant traits first. • Reverse URL. • Reverse timestamp to get most recent first. • (MAX_LONG - ts) so “time” gets monotonically smaller. • Use composite keys to make keys distribute nicely and work well with sub-scans • Ex: User-ReverseTimeStamp • Do not use the current timestamp as the first part of a row key! 29 Row100 Row3 Row31 Row003 Row031 Row100 vs. blog.cloudera.com hbase.apache.org strataconf.com vs. com.cloudera.blog com.strataconf org.apache.hbase 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Row key design techniques • Numeric keys and lexicographic sort • Store numbers big-endian. • Pad ASCII numbers with 0’s. • Use reversal to put the most significant traits first. • Reverse URL. • Reverse timestamp to get most recent first. • (MAX_LONG - ts) so “time” gets monotonically smaller. • Use composite keys to make keys distribute nicely and work well with sub-scans • Ex: User-ReverseTimeStamp • Do not use the current timestamp as the first part of a row key! 30 Row100 Row3 Row31 Row003 Row031 Row100 vs. blog.cloudera.com hbase.apache.org strataconf.com vs. com.cloudera.blog com.strataconf org.apache.hbase 6/26/13 Hadoop Summit / O'Dell, Hsieh
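The key-design rules on this slide can be sketched in a few lines. Python is used only for illustration; the helper names are made up, and real HBase row keys would be byte arrays built with HBase's byte utilities.

```python
import struct

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, as used on the slide

def padded_ascii_key(n, width=3):
    """Zero-pad ASCII numbers so lexicographic order matches numeric order."""
    return f"Row{n:0{width}d}"

def big_endian_key(n):
    """Store numbers big-endian so byte-wise sort matches numeric sort."""
    return struct.pack(">q", n)

def reverse_domain(host):
    """Reverse a hostname so related domains sort next to each other."""
    return ".".join(reversed(host.split(".")))

def composite_key(user, ts_millis):
    """User + reverse timestamp: per-user sub-scans return newest rows first."""
    return f"{user}-{MAX_LONG - ts_millis:019d}"

# Padded keys sort in numeric order; unpadded ones would not.
print(sorted([padded_ascii_key(100), padded_ascii_key(3), padded_ascii_key(31)]))
# prints ['Row003', 'Row031', 'Row100']
print(reverse_domain("blog.cloudera.com"))  # prints com.cloudera.blog
```

Because `MAX_LONG - ts` shrinks as `ts` grows, a scan over one user's rows encounters the most recent timestamps first, which is the reversal trick the slide recommends.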
  • MTTR Improving Reliability
  • Reliable Reliable / Highly Available • Reliable: • Ability to recover service if a component fails, without losing data. • Highly Available: • Ability to quickly recover service if a component fails, without losing data. • Goal: Minimize downtime! Highly Available 32 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Mean Time To Recovery (MTTR) • Average time taken to automatically recover from a failure. • Detection time • Repair Time • Notification Time • Measure: HTrace (Dapper) Infrastructure (0.96+) 6/26/13 Hadoop Summit / O'Dell, Hsieh33 Detect Repair Notify time
  • Reduce Detection Time • Proactive notification of HMaster failure (0.95) • Proactive notification of RS failure (0.95) • Fast server failover (Hardware) 6/26/13 Hadoop Summit / O'Dell, Hsieh34 Detect Notify time Repair
  • Reduce Detection Time • Proactive notification of HMaster failure (0.95) • Proactive notification of RS failure (0.95) • Fast server failover (Hardware) 6/26/13 Hadoop Summit / O'Dell, Hsieh35 Repair Notify time Detect
  • Reduce Recovery Time • Distributed Log Splitting (0.92) • Distributed Log Replay (0.95) • Fast Write recovery (0.95) • Pristine Read recovery (0.96+) 6/26/13 Hadoop Summit / O'Dell, Hsieh36 Notify time Detect Repair
  • Reduce Recovery Time • Distributed Log Splitting (0.92) • Distributed Log Replay (0.95) • Fast Write recovery (0.95) • Pristine Read recovery (0.96+) 6/26/13 Hadoop Summit / O'Dell, Hsieh37 Repair Notify time Detect
  • Reduce Notification Time • Notify client on recovery • Async Client rewrite (0.96+) 6/26/13 Hadoop Summit / O'Dell, Hsieh38 Notify time Detect Repair
  • Reduce Notification Time • Notify client on recovery • Async Client rewrite (0.96+) 6/26/13 Hadoop Summit / O'Dell, Hsieh39 Repair Notify time Detect
  • Compactions Improving Predictability
  • Reliable Reliable / Highly Available • Reliable: • Ability to recover service if a component fails, without losing data. • Highly Available: • Ability to quickly recover service if a component fails, without losing data. • Goal: Minimize downtime! Highly Available 41 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Reliable Reliable / Highly Available / Latency Tolerant • Reliable: • Ability to recover service if a component fails, without losing data. • Highly Available: • Ability to quickly recover service if a component fails, without losing data. • Latency Tolerant • Ability to perform and recover in a predictable amount of time, without losing data • New Goal: Predictable performance Highly Available 42 Latency Tolerant 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Common causes of performance variability • Compaction • Garbage Collection • Locality Loss 6/26/13 Hadoop Summit / O'Dell, Hsieh43
  • Compaction • Compactions optimizing read layout by rewriting files • Reduce the seeks required to read a row • Improve random read performance • Age off expired or deleted data • Assumes uniformly distributed write workload • But we have new workloads: • Continuous Bulk load write pattern • Time-series write pattern 6/26/13 Hadoop Summit / O'Dell, Hsieh44
  • Compactions: Put workload • Minor compactions • Optimize a subset of adjacent files • Major compactions • Optimize all files • Choosing: • Assume: older files should be larger than newer files. • If “new” files are “larger” than “older” files: major compaction • Else, look at newer files and select files for a minor compaction 6/26/13 Hadoop Summit / O'Dell, Hsieh 45 Newly flushed HFiles Minor … … Minor Major Minor
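The selection rule above (assume older files are larger; otherwise compact the newer ones) can be sketched as a ratio-based heuristic. This is a simplification of HBase's actual compaction policy, and the `ratio` and `min_files` values are illustrative, not HBase defaults.

```python
def select_minor_compaction(file_sizes, ratio=1.2, min_files=3):
    """Sketch of ratio-based minor-compaction selection.

    file_sizes is ordered oldest-to-newest. An older file joins the
    compaction only if it is not much larger than the files newer than
    it (size <= ratio * sum of newer files), mirroring the assumption
    that older files should be larger than newer ones.
    """
    for start in range(len(file_sizes)):
        candidate = file_sizes[start:]
        if len(candidate) < min_files:
            break  # too few files left to be worth compacting
        if candidate[0] <= ratio * sum(candidate[1:]):
            return start, candidate  # compact these adjacent files
    return None

# One big old file and several small new ones: the big file is skipped
# and only the newer files are minor-compacted.
print(select_minor_compaction([100, 10, 8, 6, 4]))  # prints (1, [10, 8, 6, 4])
```

When a Put workload keeps the age/size assumption intact, this rule periodically folds the small new files together and occasionally rolls everything into one file, which is the minor/major behavior the slide describes.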
  • Compactions: Bulkload workload • Functionality for loading data en masse • Intended for Bootstrapping HBase tables • New write workload: frequently ingest data only via bulk load • Problem: • Breaks age/size assumption! • Major Compaction Storms! • Compactions unnecessarily rewrite large files. 46 6/26/13 Hadoop Summit / O'Dell, Hsieh Newly bulk loaded HFiles Major Newly flushed HFiles MajorMajor
  • Bulkload: Exploring Compactor • Explore all compaction possibilities • Choose minor compactions that reduce # of files while incurring the least IO • “the best bang for the buck” • Compaction workload is more manageable 47 6/26/13 Hadoop Summit / O'Dell, Hsieh Newly bulk loaded HFiles Explore Newly flushed HFiles Minor Minor
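The exploring compactor's idea, enumerating candidate file sets and picking the one that removes the most files for the least IO, can be sketched as below. This is a simplified illustration of the concept, not the real `ExploringCompactionPolicy` code, and the parameter values are made up.

```python
def explore_compactions(file_sizes, min_files=3, max_files=5, ratio=1.2):
    """Enumerate every contiguous window of adjacent files, discard
    windows dominated by one huge file (which would be rewritten for
    little gain), and pick the window with the most files, breaking
    ties by the lowest total IO (bytes rewritten)."""
    best = None
    for i in range(len(file_sizes)):
        for j in range(i + min_files, min(i + max_files, len(file_sizes)) + 1):
            window = file_sizes[i:j]
            if max(window) > ratio * (sum(window) - max(window)):
                continue  # one file dominates: skip this window
            key = (len(window), -sum(window))  # more files, then less IO
            if best is None or key > best[0]:
                best = (key, window)
    return best[1] if best else None

# A huge bulk-loaded file followed by small flushed files: the huge
# file is left alone, avoiding the major-compaction storm.
print(explore_compactions([1000, 50, 40, 30, 20, 10]))
# prints [50, 40, 30, 20, 10]
```

Contrast this with the older age/size heuristic, which repeated bulk loads could trick into rewriting the large files over and over; exploring all windows makes the compaction workload proportional to the files actually worth merging.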
  • Conclusions
  • Comparing 6/11-6/12 to 6/12-6/13 49 Patched 12% Workaround (hbck) 28% Workaround (config) 44% Net/HW/OS 16% 6/11-6/12 - CDH3 / 0.90.x HBase Support Tickets Patched 14% Workaround (config/hbck) 36% Net/HW/OS 42% Documentation 8% 6/12-6/13 - CDH3+CDH4 HBase Support Tickets Development and tooling efforts continue to reduce HBase is becoming more robust 6/26/13 Hadoop Summit / O'Dell, Hsieh Improved testing
  • Summary by Version 0.90 0.92 /0.94 0.95-dev / 0.96 0.98 /trunk •HBase Developer Expertise • HBase Operational Experience • Distributed Systems Admin Experience •  •True Durability • Consistency • Performance • MTTR • Protobufs • Snapshots • Table locks • (Predictability) • (File Block Affinity†) •Distributed log splitting* •Distributed log splitting • Distributed log splitting • Distributed log replay† • Fast Write Recovery† •Distributed log splitting •Distributed log replay† •Fast Write Recovery† •(Pristine Region Read Recovery) •Metrics • CF+Region Granularity Metrics • CF+Region Granularity Metrics • Improved failure detection time •CF +Region Granularity Metrics •Improved failure detection time •(Htrace) Recovery in Hours Recovery in Minutes Recovery in Seconds (for writes) Recovery in Seconds † experimental (in progress) *backported in CDH 50 6/26/13 Hadoop Summit / O'Dell, Hsieh
  • Questions? 6/26/13 Hadoop Summit / O'Dell, Hsieh51 @kevinrodell @jmhsieh