Tales From the Cloudera Field
Kevin O’Dell, Kate Ting, Aleks Shulman
{kevin, kate, aleks}@cloudera.com
©2014 Cloudera, Inc. All rights reserved.

Speakers: Kevin O'Dell, Aleksandr Shulman & Kathleen Ting (Cloudera)

From supporting the 0.90.x, 0.92, 0.94, and 0.96 HBase installations on clusters ranging from tens to hundreds of nodes, Cloudera has seen it all. Having automated the upgrade paths between the different Apache releases, we have developed a smooth path that can help the community with upcoming upgrades. In addition to automation best practices, in this talk you'll also learn proactive configuration tweaks and operational best practices to keep your HBase cluster up and running. We'll also walk through how to contain an application bug let loose in production, how to minimize the impact of faulty hardware on HBase, and how inefficient schema design directly affects HBase performance.

Published in: Software, Technology

Transcript of "Tales from the Cloudera Field"

  1. Tales From the Cloudera Field
     Kevin O’Dell, Kate Ting, Aleks Shulman
     {kevin, kate, aleks}@cloudera.com
  2. Who Are We?
     Kevin O’Dell
     - Previously HBase Support Team Lead
     - Currently Systems Engineer with a focus on HBase deployments
     Kate Ting
     - Technical Account Manager of Cloudera’s largest HBase deployments
     - Co-author of O’Reilly’s Apache Sqoop Cookbook
     Aleks Shulman
     - HBase Test Engineer focused on ensuring HBase is enterprise ready
     - Primary focus on building compatibility frameworks for rolling upgrades
  3. Cloudera Internal HBase Metrics
     • Cloudera uses HBase internally for the Support Team
     • We ingest Tickets, Cluster Stats, and Apache Mailing Lists
     • Cloudera has ~20K HBase nodes under management
     • Over 60% of my accounts use HBase
  4. Agenda
     ● Tales Getting Production Started
     ● Tales Fixing Production Bugs
     ● Tales Upgrading Production Clusters
  5. Agenda
     ● Tales Getting Production Started
     ● Tales Fixing Production Bugs
     ● Tales Upgrading Production Clusters
  6. HBase Deployment Mistakes
     • Cluster Sizing
     • Managing Your Regions
     • General Recommendations
  7. Why Cluster Sizing Matters
     • Jobs Failing
     • Writes Blocking
     • Performance Issues
  8. Heavy Write Sizing
     Calculating Total Available Memstore
       java_max_heap   16GB
       memstore_upper  .50
       java_max_heap * memstore_upper = memstore_total_size
     Calculating Max Regions
       desired_flush_size  128MB
       repl_factor         3 (default)
       max_file_size       20GB
       memstore_total_size / desired_flush_size = total_regions_per_rs
       max_file_size * (total_regions_per_rs * repl_factor) = raw_storage_per_node
     Chart axes: X-axis = Flush_Size, Y-axis = Region_Count
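The two formulas on this slide can be checked by plugging in the slide's own numbers. The sketch below is only a worked calculation; the variable names follow the slide, and the unit conversion (1 GB = 1024 MB) is an assumption.

```python
# Worked example of the Heavy Write Sizing formulas, using the slide's numbers.
MB_PER_GB = 1024  # assumption: binary units

java_max_heap_mb = 16 * MB_PER_GB   # java_max_heap 16GB
memstore_upper = 0.50               # fraction of heap usable by memstores
desired_flush_size_mb = 128         # desired_flush_size 128MB
repl_factor = 3                     # repl_factor 3 (default)
max_file_size_gb = 20               # max_file_size 20GB

# java_max_heap * memstore_upper = memstore_total_size
memstore_total_size_mb = java_max_heap_mb * memstore_upper

# memstore_total_size / desired_flush_size = total_regions_per_rs
total_regions_per_rs = memstore_total_size_mb / desired_flush_size_mb

# max_file_size * (total_regions_per_rs * repl_factor) = raw_storage_per_node
raw_storage_per_node_gb = max_file_size_gb * (total_regions_per_rs * repl_factor)

print(f"memstore_total_size  = {memstore_total_size_mb:.0f} MB")
print(f"total_regions_per_rs = {total_regions_per_rs:.0f}")
print(f"raw_storage_per_node = {raw_storage_per_node_gb:.0f} GB")
```

With these inputs the memstore budget is 8192 MB, which supports 64 regions per RegionServer and implies 3840 GB of raw storage per node.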
  9. Update for Known Writes Sizing
     write_throughput  20 MB/s
     total_data_size   350TB
     Calculating force flushes
       hlog_size        128MB
       number_of_hlogs  64
       hlog_size * number_of_hlogs = amount_of_data_before_flush
       (write_throughput * 60 * 60) / amount_of_data_before_flush = number_nodes_before_flush
     Calculating Max Regions
       total_data_size  350TB
       maxfile_size     20GB
       ((total_data_size * 1024) / maxfile_size) / desired_RS_count = total_regions_per_rs
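As a rough check of the formulas on this slide, the sketch below plugs in the slide's numbers. It assumes write_throughput is in MB per second (the formula multiplies by 60 * 60 to get an hourly volume), and desired_rs_count = 100 is a hypothetical value, since the slide does not give one.

```python
# Worked example of the Known Writes Sizing formulas, using the slide's numbers.
write_throughput_mb_s = 20   # assumption: 20 MB per second
hlog_size_mb = 128           # hlog_size 128MB
number_of_hlogs = 64         # number_of_hlogs 64
total_data_size_tb = 350     # total_data_size 350TB
maxfile_size_gb = 20         # maxfile_size 20GB
desired_rs_count = 100       # hypothetical: not given on the slide

# hlog_size * number_of_hlogs = amount_of_data_before_flush
amount_of_data_before_flush_mb = hlog_size_mb * number_of_hlogs

# (write_throughput * 60 * 60) / amount_of_data_before_flush = number_nodes_before_flush
hourly_write_volume_mb = write_throughput_mb_s * 60 * 60
number_nodes_before_flush = hourly_write_volume_mb / amount_of_data_before_flush_mb

# ((total_data_size * 1024) / maxfile_size) / desired_RS_count = total_regions_per_rs
total_regions = (total_data_size_tb * 1024) / maxfile_size_gb
total_regions_per_rs = total_regions / desired_rs_count

print(f"amount_of_data_before_flush = {amount_of_data_before_flush_mb} MB")
print(f"number_nodes_before_flush   = {number_nodes_before_flush:.2f}")
print(f"total_regions               = {total_regions:.0f}")
print(f"total_regions_per_rs        = {total_regions_per_rs:.1f}")
```

Changing desired_rs_count shows how quickly the per-server region count grows on a smaller cluster.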
  10. Why is Region Management Important
      • Initial loads are failing
      • Region Servers are crashing from overload
  11. Region Management Best Practices
      Region Split Policy
        ConstantSize           | Split on Max Filesize     | Use when pre-splitting all tables
        UpperBoundSplitPolicy  | Split on smarter intervals | Use when not able to pre-split all tables
      Balancer Policy
        SimpleLoadBalancer     | Aimlessly balance regions | Use with lots of tables with low region count
        ByTable                | Balance by table          | Use with few tables with high region count
  12. General Recommendations
      Feature | Benefit | When to Enable
      Short Circuit Reads (SCR) | Speed up read times by bypassing datanode layer | Always
      Snappy Compression | Speed up read times and lower data consumption | On heavily accessed tables
      Bloom Filters | Speed up read times when numerous HFiles are present | Row should always be used, Row+Column is more accurate but higher in memory usage
      HLog Compression | Speed up writes and recovery times | Always
      Data Block Encoding | Compress long keys to store more in cache | Best for short/tall tables with long like keys. Scans may be slower
  13. Agenda
      ● Tales Getting Production Started
      ● Tales Fixing Production Bugs
      ● Tales Upgrading Production Clusters
  14. Tales Fixing Production Bugs
      ● RegionServer Hotspotting
      ● Faulty Hardware
      ● Application Bug
  15. Tales Fixing Production Bugs
      ● RegionServer Hotspotting
      ● Faulty Hardware
      ● Application Bug
  16. Fixing #1: RegionServer Hotspotting - Solution
      ● Spread rows over all RS by salting the row key
      ● 100’s of regions avail but increments only done to 10’s of regions
      ● While locks wait to time out, blocked clients hold onto handlers
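Salting the row key can be sketched as follows. This is a minimal illustration, not the speakers' implementation: the hash-derived salt, the bucket count of 16, the `NN-` prefix format, and the function name are all assumptions made for the example.

```python
import hashlib

NUM_SALT_BUCKETS = 16  # hypothetical bucket count, e.g. one or more per RegionServer


def salted_row_key(row_key: str, buckets: int = NUM_SALT_BUCKETS) -> str:
    """Prefix the row key with a salt derived deterministically from the key.

    Deriving the salt from a hash of the key means a reader can recompute
    the same prefix for a point lookup.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    salt = int.from_bytes(digest[:4], "big") % buckets
    return f"{salt:02d}-{row_key}"


# Sequential keys, which would otherwise land in the same region,
# get different prefixes and so sort into different key ranges.
for key in ["event-0001", "event-0002", "event-0003", "event-0004"]:
    print(salted_row_key(key))
```

The trade-off is on the read side: a range scan over the original key order has to be issued once per salt bucket and the results merged.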
  17. Fixing #1: RegionServer Hotspotting - Solution
      ● Option 1: Change row key to something that scales
        ○ Reduce contention by reducing connections: each client picks one salt and writes only to one RS
      ● Option 2: Implement new coalescing feature in native HBaseSink, compressing entire batch of Flume events into single HBase RPC call
        [row1, colA+=1] [row1, colB+=1] [row1, colB+=1] => [row1 colA+=1 colB+=2]
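The coalescing idea in Option 2 can be illustrated in a few lines. This is a sketch of the merging step only, using plain Python data structures; it is not the HBaseSink code, and the function name and tuple layout are assumptions for the example.

```python
from collections import defaultdict


def coalesce_increments(events):
    """Merge (row, column, amount) increments so each row/column appears once.

    Returns a dict of row -> {column: total amount}, i.e. one combined
    increment per row instead of one per event.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for row, column, amount in events:
        merged[row][column] += amount
    return {row: dict(columns) for row, columns in merged.items()}


# The batch from the slide: three events touching the same row.
batch = [("row1", "colA", 1), ("row1", "colB", 1), ("row1", "colB", 1)]
coalesced = coalesce_increments(batch)
print(coalesced)  # {'row1': {'colA': 1, 'colB': 2}}
```

Three increments against row1 collapse into one, so the row lock is taken once per batch rather than once per event.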
  18. Tales Fixing Production Bugs
      ● RegionServer Hotspotting
      ● Faulty Hardware
      ● Application Bug
  19. Fixing #2: Faulty Hardware
      ● Diagnostics run on bad hardware caused HBase failures
      ● HBase recoverability = RS back online + locality (compaction)
      ● Stress test with prod load before needed (i.e. holiday season)
      ● Imagine financial impact of 7 hours of downtime?
  20. Fixing #2: Faulty Hardware - Solution
      ● Recover faster by failing fast
        ○ Too many retries cause HBase task to exit before it can print exception identifying stuck RS
      ● Decrease time needed to finish HBase major compaction
        ○ Run multiple threads during compaction
      ● Replay in parallel
        ○ Decrease HLog size to limit # of edits to be replayed, increase # of HLogs, constrain WAL file size to minimize time corresponding region is not available
  21. Fixing #2: Faulty Hardware - Solution
      ● Shorten column family names
        ○ Reduce scan time, skip bulk loads, reduce memory usage
      ● Turn off write cache
        ○ Node crash erases writes in memory, rebuilds block with outdated data, causing corrupt replica
      ● Turn on checksum
        ○ Enables RS to use other replicas from the cluster instead of failing the operation if there’s a corrupted HFile
  22. Tales Fixing Production Bugs
      ● RegionServer Hotspotting
      ● Faulty Hardware
      ● Application Bug
  23. Fixing #3: Application Bug
      ● HBase timestamps were hardcoded to be too far out - new data written went unused
      ● Bug put backup system out of commission for one month
        ○ More vulnerable to HBase outages
  24. Fixing #3: Application Bug
  25. Fixing #3: Application Bug - Solution
      ● Detailed knowledge of internals required to undo damage
        ○ Modified the timestamp to some time in the past for all records via custom MR jobs over one month:
          ■ back up data, generate new HFile with correct timestamp, bulkload data, run MD5
      ● Don’t muck around with setting the timestamp yourself
      ● Do use always-increasing timestamps for new puts to a row
      ● Do use a separate timestamp attribute of the row
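Why a hardcoded future timestamp makes new data go unused can be shown with a toy model. The sketch below is not the HBase API: `ToyCell` and both timestamp constants are hypothetical. It only mimics the rule that a default read returns the version with the highest timestamp.

```python
class ToyCell:
    """Toy model of a versioned cell: a read returns the highest-timestamp value."""

    def __init__(self):
        self.versions = {}  # timestamp (ms) -> value

    def put(self, value, timestamp):
        self.versions[timestamp] = value

    def get(self):
        return self.versions[max(self.versions)]


FAR_FUTURE_MS = 4_102_444_800_000  # hypothetical hardcoded timestamp, far in the future
NOW_MS = 1_400_000_000_000         # hypothetical current time

# The bug: old data carries a far-future timestamp, so a later put is masked.
buggy = ToyCell()
buggy.put("old value", FAR_FUTURE_MS)
buggy.put("new value", NOW_MS)
masked = buggy.get()
print(masked)    # old value

# The repair described on the slide: rewrite old records with a past timestamp.
fixed = ToyCell()
fixed.put("old value", NOW_MS - 1_000)
fixed.put("new value", NOW_MS)
repaired = fixed.get()
print(repaired)  # new value
```

The same rule is why the slide recommends always-increasing timestamps for puts and a separate column for any application-level time value.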
  26. Agenda
      ● Tales Getting Production Started
      ● Tales Fixing Production Bugs
      ● Tales Upgrading Production Clusters
  27. Internal Case Study
      CDH4->C5 (0.94->0.96) Upgrade Automation Failed
      What Happened?
      Root Cause
      • HBase Snapshots vs. HDFS Snapshots
      • Snapshot directory rename
      Outcome
      • All issues resolved before C5b1 was shipped
      Log excerpt:
        2013-07-12 17:11:42,656 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation MkdirOp [length=0, inodeId=0, path=/hbase/.snapshot, timestamp=1373674083434, permissions=hbase:supergroup:rwxr-xr-x, opCode=OP_MKDIR, txid=614]
        org.apache.hadoop.HadoopIllegalArgumentException: ".snapshot" is a reserved name. Please rename it before upgrade.
  28. Automating Upgrades
      Testing the Upgrade lifecycle
  29. What is Important?
      The Administrator Experience Matters
      ● Major version upgrades
      ● Rolling upgrades
      The Developer Experience Matters
      ● API Compatibility Testing
  30. And Here Is Why It Is Important
      Customer Continuity
      • Smooth upgrades
      • Curated process
      • Understanding of customer cluster lifecycle
      Developer Continuity
      • Forward and backward compatibility
      • Binary Compatibility
      • Wire Compatibility
      Automation
      • You can only really make a guarantee about things that are automated
      • Product is easier to support
      • Confidence is only possible with testing
  31. Upgrades
  32. Cold vs. Rolling Upgrades
      C3u5  CDH4.0.x  CDH4.1.x  CDH4.2.x  CDH4.3.x  CDH4.4.x  CDH4.5.x  CDH4.6.x  C5.0  C5.1
      -- Rolling Upgrade -->  -- Rolling Upgrade -->  -- Cold Upgrade -->  -- Cold Upgrade -->
  33. Upgrades from HBase 0.90 -> 0.98
      CDH Version | HBase Version
      CDH3u5      | HBase 0.90.6
      CDH4.1.0    | HBase 0.92.1
      CDH4.2.0    | HBase 0.94.2
      CDH4.4.0    | HBase 0.94.6
      CDH4.6.0    | HBase 0.94.15
      CDH5.0.0    | HBase 0.96.1.1
      CDH5.1.0    | HBase 0.98.1
      A B C: Upgrade from version A -> Version B -> Version C
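The version table on this slide can be turned into a small lookup to see which HBase versions an A -> B -> C upgrade passes through. The mapping below is copied from the slide. The helper name is hypothetical, and the choice of CDH4.6.0 as the intermediate step is an assumption made only for illustration.

```python
# CDH release -> bundled HBase version, as listed on the slide.
CDH_TO_HBASE = {
    "CDH3u5": "0.90.6",
    "CDH4.1.0": "0.92.1",
    "CDH4.2.0": "0.94.2",
    "CDH4.4.0": "0.94.6",
    "CDH4.6.0": "0.94.15",
    "CDH5.0.0": "0.96.1.1",
    "CDH5.1.0": "0.98.1",
}


def hbase_versions_along(path):
    """Return (CDH release, HBase version) pairs for each step of an upgrade path."""
    return [(cdh, CDH_TO_HBASE[cdh]) for cdh in path]


# Hypothetical A -> B -> C path; the intermediate CDH4 release is an assumption.
upgrade_path = ["CDH3u5", "CDH4.6.0", "CDH5.0.0"]
for cdh, hbase in hbase_versions_along(upgrade_path):
    print(f"{cdh}: HBase {hbase}")
```

A release missing from the table raises a KeyError, which is a reasonable failure mode for a path that was never tested.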
  34. Cold Upgrade Results
      ● Upgrades work!
      ● Steps:
        ○ Start at CDH3u5
        ○ Upgrade to a version of CDH4
        ○ Upgrade to CDH5.0.0
      ● Data Integrity
        ○ Different bloom filters
        ○ Different compression formats
      ● Next Steps
        ○ CDH 5.1.0 expected to be based on 0.98.1
  35. Rolling Upgrade Results
      ● What is tested?
        ○ Ingest via Java API
        ○ MapReduce over HBase
          ■ Bulk load
          ■ RowCount/Export
      ● Status
        ○ Rolling upgrade broken (red) in CDH <=4.1.2 due to region_mover issue
        ○ Soft failure (yellow) for starting version <CDH4.1.0 - due to MapReduce JT/TT version mismatch issue
        ○ All else green!
      How to Read This: Pick a column and read down to see for which versions rolling upgrades are advised
  36. Improved Supportability Through Testing
      Case Study: Customer Rolling Upgrade Simulation
      Large Customer
      ● Upgrading from CDH4.1.4+patches
      ● Considered several CDH versions to upgrade
        ○ Custom patches
      Automation
      ● Automated testing added to simulate rolling upgrade
        ○ CM
        ○ HA+QJM
        ○ Parcels
      ● Scales
        ○ 4 nodes, 20 nodes, 80 nodes
      ● Subsequently used for other customers with similar upgrade paths
  37. Here’s to Fewer Tales Next Year..
      Automated Testing
      Better Cluster Mgmt
      Fewer Tales From the Field
  38. Kevin O’Dell @kevinrodell
      Kate Ting @kate_ting
      Aleks Shulman @a_shulman
      @clouderaTest Questions?