HBaseCon 2013: Apache HBase Operations at Pinterest

Presented by: Jeremy Carroll, Pinterest

Published in: Technology

Transcript of "HBaseCon 2013: Apache HBase Operations at Pinterest"

  1. Operations
     Jeremy Carroll, Operations Engineer
     HBaseCon 2013
  2. We help people discover things they love and inspire them to do those things…
  3. HBase in Production
     Overview
     • All running on Amazon Web Services
     • 5 production clusters and growing
     • Mix of SSD and SATA clusters
     • Billions of page views per month
  4. With lots of patches
     Designing for EC2
     • CDH 4.2.x
     • HDFS-3912
     • HBase 0.94.7
     • HBASE-8284
     • One zone per cluster / no rack locality
     • RegionServers: ephemeral disk only
     • Redundant clusters for availability
     • HDFS-4721
     • HDFS-3703
     • HDFS-9503
     • HBASE-8389
     • HBASE-8434
     • HBASE-7878
  5. Configuration
     Cluster Setup
     • Managed splitting with pre-split tables
     • Bloom filters for pretty much everything
     • Manual / rolling major compactions
     • Reverse DNS on EC2
     • 3 ZooKeepers in quorum
     • 1 NameNode / Secondary NameNode / Master
     • 1 EBS volume for fsImage / 1 Elastic IP
     • 10-50 nodes per cluster
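
As an illustration of the pre-split bullet above, here is a minimal sketch of how split points might be computed. It assumes row keys begin with a fixed-width hexadecimal hash prefix, which the slides do not state; the function name `hex_split_points` and the 4-character prefix width are hypothetical.

```python
def hex_split_points(num_regions, prefix_width=4):
    """Return num_regions - 1 evenly spaced hex split keys.

    Assumes row keys start with a fixed-width hex prefix (hypothetical schema,
    not taken from the slides).
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keyspace = 16 ** prefix_width  # number of distinct prefixes
    if num_regions > keyspace:
        raise ValueError("more regions than distinct prefixes")
    # Boundary i sits at i/num_regions of the way through the keyspace.
    return [
        format(i * keyspace // num_regions, "0{}x".format(prefix_width))
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a 4-hex-digit prefix space.
    print(hex_split_points(4))
```

Keys produced this way could be supplied as the `SPLITS` list when the table is created in the HBase shell, so that every region exists before traffic arrives.
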
  6. Fact-driven “Fry” method using Puppet
     Provisioning
     • User-data passed in to drive config management
     • Repackaged modifications to HDFS / HBase
     • Ubuntu .deb packages created with FPM
     • Synced to S3, nodes configured with s3-apt plugin
     • Mount + format ephemerals on boot
     • Ext4 / nodiratime / nodelalloc / lazy_itable_init
  7. Puppet Module
     ---- HBASE MODULE ----
     class { 'hbase':
       cluster          => 'feeds_e',
       namenode         => 'ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com',
       zookeeper_quorum => 'zk1,zk2,zk3',
       hbase_site_opts  => {
         'hbase.replication'                      => true,
         'hbase.snapshot.enabled'                 => true,
         'hbase.snapshot.region.timeout'          => '35000',
         'replication.sink.client.ops.timeout'    => '20000',
         'replication.sink.client.retries.number' => '3',
         'replication.source.size.capacity'       => '4194304',
         'replication.source.nb.capacity'         => '100',
         ...
       }
     }
     ---- FACT BASED VARIABLES ----
     $hbase_heap_size = $ec2_instance_type ? {
       'hi1.4xlarge' => '24000',
       'm2.2xlarge'  => '24000',
       'm2.xlarge'   => '11480',
       'm1.xlarge'   => '11480',
       'm1.large'    => '6500',
       ...
     }
  8. Designed for EC2
     Service Monitoring
     • Wounded (dying) vs Operational
     • High value metrics first
     • Overall health
     • Alive / dead nodes
     • Service up/down
     • Fsck / Blocks / % Space
     • Replication status
     • Regions needing splits
     • fsImage checkpoint
     • Zookeeper quorum
     • Synthetic transactions (get / put)
     • Queues (flush / compaction / rpc)
     • Latency (client / filesystem)
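
The "synthetic transactions (get / put)" check above could look roughly like the sketch below: write a unique value, read it back, and record whether the round trip succeeded and how long it took. The `InMemoryTable` class is a stand-in so the example runs on its own; its `put`/`get` interface, the `canary-row` key, and the returned dictionary shape are all hypothetical rather than taken from the talk.

```python
import time


class InMemoryTable:
    """Stand-in for a real HBase client table (hypothetical interface)."""

    def __init__(self):
        self._rows = {}

    def put(self, row, value):
        self._rows[row] = value

    def get(self, row):
        return self._rows.get(row)


def synthetic_check(table, row="canary-row"):
    """Write a unique value, read it back, and report success and latency."""
    payload = "probe-{}".format(time.monotonic())
    start = time.monotonic()
    try:
        table.put(row, payload)
        ok = table.get(row) == payload
    except Exception:
        # Any client error counts as a failed probe.
        ok = False
    latency_ms = (time.monotonic() - start) * 1000.0
    return {"ok": ok, "latency_ms": latency_ms}


if __name__ == "__main__":
    print(synthetic_check(InMemoryTable()))
```

In a real deployment the stand-in would be replaced by an actual client, and the result would feed the alerting system so a cluster that is up but not serving requests is still detected.
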
  9. Designed for EC2
     Service Monitoring
  10. Instrumentation
      Metrics
      • OpenTSDB for high cardinality metrics
      • Per region stats collection
      • tCollector
      • RegionServer HTTP JMX
      • HBase REST
      • GangliaContext for hadoop-metrics
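
To make the tCollector and RegionServer HTTP JMX bullets concrete, the sketch below turns a JMX JSON document into the `metric timestamp value tag=value` lines that a tCollector collector prints to stdout. The sample payload, the attribute names in it, and the `hbase.regionserver.` metric prefix are illustrative assumptions; real bean and attribute names depend on the HBase version. Only the `feeds_e` cluster name comes from the slides.

```python
import json
import time


def jmx_to_tsdb_lines(jmx_json, wanted, tags, now=None):
    """Convert numeric JMX attributes into 'metric timestamp value tags' lines."""
    ts = int(time.time() if now is None else now)
    tag_parts = ["{}={}".format(k, v) for k, v in sorted(tags.items())]
    lines = []
    for bean in json.loads(jmx_json).get("beans", []):
        for attr in wanted:
            value = bean.get(attr)
            # Skip missing or non-numeric attributes (bool is excluded on purpose).
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metric = "hbase.regionserver.{}".format(attr)  # hypothetical prefix
                lines.append(" ".join([metric, str(ts), str(value)] + tag_parts))
    return lines


# Illustrative payload only; not captured from a real RegionServer.
SAMPLE = '{"beans": [{"name": "example", "compactionQueueSize": 3, "flushQueueSize": 0}]}'

if __name__ == "__main__":
    for line in jmx_to_tsdb_lines(
        SAMPLE,
        ["compactionQueueSize", "flushQueueSize"],
        {"cluster": "feeds_e"},
        now=1370000000,
    ):
        print(line)
```

Adding per-region tags (table, region) in the same way is what makes the metrics high cardinality, which is the case the slide cites for choosing OpenTSDB.
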
  11. OpenTSDB
      Table / RegionServer / Region
      Slicing and Dicing
  12. Using R
      Tables / Regions / StoreFiles
  13. Tuning Performance
      Compaction + Logs
  14. Operational Intelligence
      Dashboards
  15. S3 + HBase Snapshots
      Backups
      • Full NameNode backup every 60 mins
      • EBS volume as a name.dir for crash recovery
      • HBase snapshots + ExportSnapshot
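
For the snapshot export step, a backup job might assemble the ExportSnapshot command line as sketched below. The `-snapshot`, `-copy-to`, and `-mappers` options belong to HBase's ExportSnapshot tool; the snapshot name, the bucket URL, and the mapper count are hypothetical, and the S3 URL scheme to use depends on the Hadoop version in place.

```python
def export_snapshot_command(snapshot, dest_url, mappers=4):
    """Build the argv for HBase's ExportSnapshot tool without running it."""
    if not snapshot or not dest_url:
        raise ValueError("snapshot and dest_url are required")
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_url,
        "-mappers", str(mappers),
    ]


if __name__ == "__main__":
    # Hypothetical snapshot name and destination bucket.
    cmd = export_snapshot_command("feeds_e-snap", "s3n://example-bucket/hbase-backups")
    print(" ".join(cmd))
```

Returning the argument list rather than executing it keeps the sketch side-effect free; a scheduler could pass the list to `subprocess.run` once the destination is confirmed.
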
  16. Additional Tuning
      Solid State Clusters
      • Lower block size from 32k
      • Something a lot smaller: 8-16k
      • Placement groups for 10Gb networking
      • Increase DFSBandwidthPerSec
      • Kernel tuning for TCP
      • Compaction threads
      • Disk elevator to noop
  17. Process
      Planning for Launch
      • Pyres queue for asynchronous reads / writes
      • Allows for tuning a system before it goes live
      • Tuning
      • Schema
      • Hot spots
      • Compaction
      • Canary rollout to new users
      • 10% -> 30% -> 80% -> 100%
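
The staged canary rollout above can be sketched with deterministic bucketing: each user is hashed into one of 100 buckets, and a user is included once the rollout percentage exceeds their bucket. This is one common way to implement such a rollout, not necessarily the one used in the talk; the function name, the salt, and the choice of SHA-256 are assumptions. Only the stage percentages come from the slide.

```python
import hashlib

ROLLOUT_STAGES = [10, 30, 80, 100]  # percentages from the slide


def in_rollout(user_id, percent, salt="hbase-launch"):
    """Deterministically decide whether a user falls inside the rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    key = "{}:{}".format(salt, user_id).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100  # stable bucket in [0, 100)
    return bucket < percent


if __name__ == "__main__":
    for stage in ROLLOUT_STAGES:
        enabled = sum(in_rollout(uid, stage) for uid in range(10000))
        print("{}% stage: {} of 10000 users enabled".format(stage, enabled))
```

Because a user's bucket never changes, anyone enabled at the 10% stage stays enabled at 30%, 80%, and 100%, so nobody flips between the old and new backends as the rollout widens.
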
  18. Pssstt. We’re hiring
