
HBaseCon 2015: HBase at Scale in an Online and High-Demand Environment


Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.



  1. 1. HBase Online, at Low Latency. Jeremy Carroll & Tian-Ying Chang
  2. 2. 30+ Billion Pins categorized by people into more than 750 Million Boards
  3. 3. Use Cases: products running on HBase / Zen • SmartFeed: renders the Pinterest landing page for feeds • Personal suggestions based on a prefix for a user • Messages / Notifications • Interests
  4. 4. Online Performance
  5. 5. Problem: validating the new i2 platform [Chart: p99.9 Zen get-nodes latency in ms]
  6. 6. Garbage Collection: stop-the-world events [Chart: pause time in milliseconds]
  7. 7. Problem #1: Promotion Failures. Issues seen in production • Heap fragmentation causing promotion failures • BlockCache at high QPS was causing fragmentation • BloomFilters covering hundreds of billions of rows were being evicted, leading to latency issues. Solutions • Tuning CombinedCache / off-heap BucketCache • Reserving heap space for Memstore plus blooms & indexes • Monitoring the % of LRU heap used by blooms & indexes
  8. 8. Problem #2: Pause Time. Calculating deltas • Started noticing 'user + sys' vs 'real' was very different • Random spikes of delta time, concentrated on hourly boundaries • Found resources online, but none of the fixes seemed to work. Symptom: low user, low sys, high real
  9. 9. Cloud Problems: noisy neighbors [Chart: /dev/sda average I/O in milliseconds]
  10. 10. Formatting with EXT4, logging to instance-store volume. Pause time in ms: 100%: 1251.60 • 99.99%: 1223.52 • 99.9%: 241.33 • 90%: 151.45
  11. 11. Formatting with XFS, PerfDisableSharedMem & logging to ephemeral storage. Pause time in ms: 100%: 115.64 • 99.99%: 107.88 • 99.9%: 94.01 • 90%: 66.38
  12. 12. Online Performance: changes for success on EC2. JVM options • -XX:+PerfDisableSharedMem. Instance configuration • Treat instance-store as read only • irqbalance >= 1.0.6 • kernel 3.13+ for disk performance
  13. 13. Monitoring
  14. 14. Monitoring Zen: SLA driven from proxies • Success criteria is a composite of two clusters due to backup requests • A wounded cluster shows up as too many backup requests • Additional query metadata from proxy logs aids in hot-key analysis • 99.99% uptime [Diagram: Thrift proxies in front of primary and backup Zen clusters, with backup requests and replication]
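The backup-request pattern on this slide can be sketched as follows. This is a minimal illustration, not Pinterest's proxy code: `primary` and `backup` stand in for calls to the two Zen clusters, and the delay/timeout values are assumptions.

```python
import concurrent.futures
import time

def fetch_with_backup(primary, backup, backup_delay=0.05, timeout=1.0):
    """Send the request to the primary cluster; if it has not answered
    within backup_delay seconds, also send it to the backup cluster and
    return whichever responds first. A surge of backup requests is the
    signal the deck uses to detect a wounded primary cluster."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(primary)]
        done, _ = concurrent.futures.wait(futures, timeout=backup_delay)
        if not done:
            # Primary is slow: issue the hedged request to the backup cluster.
            futures.append(pool.submit(backup))
        done, _ = concurrent.futures.wait(
            futures, timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED)
        if not done:
            raise TimeoutError("no cluster responded in time")
        return next(iter(done)).result()
```

The cost of this pattern is the extra load the backups add, which is why the slide treats "too many backup requests" as an alertable condition.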
  15. 15. Dashboards: metrics, metrics, metrics • Usually low value, except when you need them • Dashboards for all H-stack daemons (DataNode, NameNode, etc.) • Capture enormous amounts of information thanks to per-region stats • Per-table / per-region stats are very useful for deep dives
  16. 16. HBase Alerts: analytics for HBase on-call. Maintenance is the primary driver • A few clusters cause the majority of the alarms • Mainly driven by lack of capacity planning. Hygiene / capacity notifications • Replacing failed / terminated nodes • Measuring disk / CPU / network • Canary analysis on code deployments. Common alerts • Hot CPU / disk space
  17. 17. HotSpots: debugging imbalanced requests. Spammers • A few users request the same row over and over • Rate limiting / caching. Real-time analysis • tcpdump is very helpful: tcpdump -i eth0 -w - -s 0 tcp port 60020 | strings • Looking at per-region request stats. Code issues • Hard-coded key in product (e.g. the Messages launch)
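Once row keys have been extracted from the traffic (for example from the `tcpdump ... | strings` output above, filtered down to key-shaped lines), finding the hot keys is a frequency count. A minimal sketch, with the key-extraction step assumed to have already happened:

```python
from collections import Counter

def top_keys(lines, n=5):
    """Return the n most frequently requested row keys, given one
    candidate key per line (e.g. pre-filtered tcpdump/strings output).
    A single key dominating this list is the hot-spot signature the
    slide describes."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)
```

Comparing this against per-region request stats distinguishes a single hot key (spammer or hard-coded key) from a generally hot region.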
  18. 18. Capacity Planning
  19. 19. Capacity Planning: start small, split for growth. Managed splitting • UniformSplitAlgo to pre-split regions • Salted keys for uniform distribution • Namespaced table on a shared cluster. Feature cost • Some table attributes are more expensive: BlockSize, Bloom filters, compression (Prefix, FastDiff, Snappy)
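The salting plus pre-splitting idea can be sketched like this. The bucket count, two-hex-digit salt format, and MD5 choice are illustrative assumptions; the point is that the salt is a stable function of the key, and the split points line up with the salt prefixes so load spreads uniformly from day one.

```python
import hashlib

SALT_BUCKETS = 64  # assumed; choose to match the pre-split region count

def salted_key(row_key: bytes) -> bytes:
    """Prefix the key with a hash-derived salt so sequential user keys
    scatter uniformly across pre-split regions. The salt is stable, so
    reads can recompute it from the original key."""
    bucket = int(hashlib.md5(row_key).hexdigest(), 16) % SALT_BUCKETS
    return b"%02x-" % bucket + row_key

def uniform_split_points(buckets: int = SALT_BUCKETS):
    """Region split points matching the salt prefixes, analogous to
    pre-splitting a table with a uniform split algorithm."""
    return [b"%02x" % b for b in range(1, buckets)]
```

The trade-off: salting kills meaningful range scans across users, which is acceptable for point-read workloads like the ones described here.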
  20. 20. Capacity Planning: balancing regions to servers. Distributing load • Balance is important to eliminate hot spots • Per-table load balancing • Disable auto-splitting and monitor region size. Scaling metrics • CPU utilization • HDFS space • Regions per server (Memstore / flushing) • All others (memory, bandwidth, etc.)
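A simple way to monitor the "balance" the slide calls for is a per-table skew ratio over region counts per server. A sketch, assuming region counts per server are already collected from cluster metrics:

```python
def region_skew(regions_per_server):
    """Ratio of the most-loaded server's region count to the mean.
    1.0 means perfectly balanced; values well above 1.0 suggest the
    per-table balancer has left hot spots."""
    counts = list(regions_per_server.values())
    mean = sum(counts) / len(counts)
    return max(counts) / mean
```

Region count is only a proxy; the same check can be run over request rates or Memstore sizes per server, which maps to the scaling metrics listed above.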
  21. 21. Launching Messages: from development to production • Started on a namespaced table on a shared cluster • Ramped out to production with a decider (x%) • Split the table to get additional region servers serving traffic • Migrated to a dedicated cluster • As the experiment ramped up, added / removed capacity as the feature was adopted
  22. 22. Operations with AWS
  23. 23. Availability: strategies for mitigating failure. Conditions for failure • Termination notices from the underlying host • Default RF of 3 in one zone is dangerous • Placement groups may make this worse. Stability patterns • Highest-numbered instance type in a family • Multi-availability-zone + block placement • "Air-gapped" change management [Diagram: master in US-East-1A replicating to slave in US-East-1E]
  24. 24. Disaster Recovery: recovering data & bootstrapping new clusters • System copies WALs and snapshots to local HDFS • HDFS keeps <X> hours locally, then rotates to S3 • Data can be restored from S3, HDFS, or another cluster with rate-limited copying • Used frequently to test new configurations and to upgrade clusters side-by-side. Example: hbaserecover -t unique_index -ts 2015-04-08-16-01-15 --source-cluster zennotifications [Diagram: master (US-East-1A) and slave (US-East-1E) replication, with a DR slave in US-East-1D and HDFS rotating to S3]
  25. 25. Flow Monitoring: reliable cloud backups. Snapshot backup routine • ZooKeeper-based configuration for each cluster • Backup metadata is sent to ElasticSearch for integration with dashboards • Monitoring and alerting around WAL copy & snapshot status
  26. 26. Flow Monitoring: alert when the backup pipeline fails. DistCp all HLogs hourly • Copy from the slave cluster to avoid latency impact • Using DistCp v2, which supports throttling • A spike in the graph means DistCp failed: a huge amount of data is copied once the issue is fixed • Using ElasticSearch to store all backup metadata
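The alerting condition for an hourly copy pipeline reduces to a staleness check on the last success timestamp. A minimal sketch; the timestamp would come from the backup metadata store the deck mentions (the function and parameter names here are assumptions):

```python
import time

def backup_alert(last_success_epoch, now=None, max_gap_hours=2):
    """Fire when the hourly WAL DistCp has not succeeded within the
    allowed gap. max_gap_hours is set above the copy interval so a
    single slow run does not page anyone."""
    now = now if now is not None else time.time()
    return (now - last_success_epoch) > max_gap_hours * 3600
```

Alerting on staleness rather than on individual job failures also catches silent problems, such as the scheduler not firing at all.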
  27. 27. Maintenance: rolling restart, change management while retaining availability. [Flowchart] Check health → get lock → get server locality → check server status → region movement w/threads → stop RegionServer + cooldown → start RegionServer + cooldown → verify locality → check server status → region movement w/threads → update status + cooldown → release lock
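The per-server loop of a rolling restart like the one on this slide can be sketched as below. This is not Pinterest's tool: the `ops` hooks (`healthy`, `drain`, `stop`, `start`, `restore`) are hypothetical stand-ins for the locality checks, threaded region movement, and cooldowns the flowchart shows.

```python
def rolling_restart(servers, ops):
    """Restart region servers one at a time, draining regions off each
    server before stopping it and moving them back afterwards, so the
    cluster keeps serving throughout. Aborts if the cluster looks
    unhealthy before touching the next server."""
    log = []
    for rs in servers:
        if not ops["healthy"]():
            raise RuntimeError("cluster unhealthy; aborting rolling restart")
        ops["drain"](rs);   log.append(("drain", rs))    # move regions off
        ops["stop"](rs);    log.append(("stop", rs))     # stop + cooldown
        ops["start"](rs);   log.append(("start", rs))    # start + cooldown
        ops["restore"](rs); log.append(("restore", rs))  # move regions back
    return log
```

The health gate between servers is what makes the process safe to leave unattended: one bad restart stops the rollout instead of cascading.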
  28. 28. Rolling Compaction: important for online-facing clusters. Only one region per server is selected • Avoids blocked regions in the queue. Controlled concurrency • Controls the space spike • Reduces the added network and disk traffic. Controlled stop time • Stop before daytime traffic ramps up • Stop if compaction is causing perf issues. Resume the next night • Filter out the regions that have already been compacted
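The target-selection rule on this slide (at most one not-yet-compacted region per server, so concurrency is bounded by server count) can be sketched as:

```python
def pick_compaction_targets(regions_by_server, already_compacted):
    """Select at most one region per region server that has not yet been
    major-compacted in this cycle. Regions picked here would then be
    compacted concurrently, one per server, until the nightly time
    window closes; the remainder resume the next night."""
    targets = []
    for server, regions in regions_by_server.items():
        for region in regions:
            if region not in already_compacted:
                targets.append((server, region))
                break  # one region per server bounds disk/network impact
    return targets
```

Tracking `already_compacted` across nights is what lets the process resume where it left off instead of re-compacting the same regions.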
  29. 29. Next Challenges: looking forward • Upgrade to latest stable (1.x) from 0.94.x with no downtime • Increasing performance: lower latency, better compaction throughput • Regional failover: cross-datacenter is in production now; need cross-region failover
  30. 30. © Copyright, All Rights Reserved Pinterest Inc. 2015