Operations
Jeremy Carroll
Operations Engineer
HBaseCon 2013
We help people discover things they love
and inspire them to do those things…
HBase in Production
Overview
• All running on Amazon Web Services
• 5 production clusters and growing
• Mix SSD and SATA clusters
• Billions of page views per month
With lots of patches
Designing for EC2
• CDH 4.2.x
• HDFS-3912
• HBase 0.94.7
• HBASE-8284
• One zone per cluster / no rack locality
• RegionServers - Ephemeral disk only
• Redundant clusters for availability
• HDFS-4721
• HDFS-3703
• HDFS-9503
• HBASE-8389
• HBASE-8434
• HBASE-7878
Configuration
Cluster Setup
• Managed splitting w/pre split tables
• Bloom filters for pretty much everything
• Manual / Rolling major compactions
• Reverse DNS on EC2
• 3 ZooKeepers in quorum
• 1 NameNode / Sec-NameNode / Master
• 1 EBS volume for fsImage / 1 Elastic IP
• 10-50 nodes per cluster
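The "managed splitting w/pre split tables" bullet above means split points are chosen up front instead of letting HBase split regions under load. A minimal sketch of how even split keys can be computed, assuming row keys carry a fixed-width hex hash prefix (the function name and key width are illustrative, not from the deck):

```python
# Sketch: compute even split points for pre-splitting a table whose row
# keys start with a fixed-width hex hash prefix. Hypothetical helper,
# not Pinterest's actual tooling.

def hex_split_keys(num_regions, width=4):
    """Return num_regions - 1 boundary keys, evenly spaced across the
    16**width hex keyspace (e.g. '0000'..'ffff' for width=4)."""
    space = 16 ** width
    step = space // num_regions
    return [format(i * step, "0{}x".format(width))
            for i in range(1, num_regions)]

# These boundaries would be handed to the hbase shell, e.g.
#   create 't1', 'f1', SPLITS => ['4000', '8000', 'c000']
print(hex_split_keys(4))  # ['4000', '8000', 'c000']
```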
Fact-driven “Fry” method using Puppet
Provisioning
• User-data passed in to drive config management
• Repackaged modifications to HDFS / HBase
• Ubuntu .deb packages created with FPM
• Synced to S3, nodes configured with s3-apt plugin
• Mount + format ephemerals on boot
• Ext4 / nodiratime / nodelalloc / lazy_itable_init
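The boot-time step above (format and mount the ephemeral disks with those ext4 options) can be sketched as command construction; the device path and mount point here are assumptions, not from the deck:

```python
# Sketch of the boot-time ephemeral setup: build the mkfs/mount command
# lines with the ext4 options listed on the slide. lazy_itable_init is a
# mkfs-time extended option; noatime/nodiratime/nodelalloc are mount
# options. Device and mount point are illustrative.

def ephemeral_setup_cmds(device="/dev/xvdb", mountpoint="/mnt/hbase"):
    mkfs = ["mkfs.ext4", "-E", "lazy_itable_init=1", device]
    mount = ["mount", "-o", "noatime,nodiratime,nodelalloc",
             device, mountpoint]
    return mkfs, mount

mkfs_cmd, mount_cmd = ephemeral_setup_cmds()
print(" ".join(mkfs_cmd))
print(" ".join(mount_cmd))
```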
---- HBASE MODULE ----
class { 'hbase':
  cluster          => 'feeds_e',
  namenode         => 'ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com',
  zookeeper_quorum => 'zk1,zk2,zk3',
  hbase_site_opts  => {
    'hbase.replication'                      => true,
    'hbase.snapshot.enabled'                 => true,
    'hbase.snapshot.region.timeout'          => '35000',
    'replication.sink.client.ops.timeout'    => '20000',
    'replication.sink.client.retries.number' => '3',
    'replication.source.size.capacity'       => '4194304',
    'replication.source.nb.capacity'         => '100',
    ...
  }
}

---- FACT BASED VARIABLES ----
$hbase_heap_size = $ec2_instance_type ? {
  'hi1.4xlarge' => '24000',
  'm2.2xlarge'  => '24000',
  'm2.xlarge'   => '11480',
  'm1.xlarge'   => '11480',
  'm1.large'    => '6500',
  ...
}
Puppet Module
Designed for EC2
Service Monitoring
• Wounded (dying) vs Operational
• High value metrics first
• Overall health
• Alive / dead nodes
• Service up/down
• Fsck / Blocks / % Space
• Replication status
• Regions needing splits
• fsImage checkpoint
• Zookeeper quorum
• Synthetic transactions (get / put)
• Queues (flush / compaction / rpc)
• Latency (client / filesystem)
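The "synthetic transactions (get / put)" bullet above describes a canary check: write a known row, read it back, and record latency. A hedged sketch, assuming any thin client object with `put`/`get` methods (the table, row, and column names are made up for illustration):

```python
import time

# Sketch of a synthetic get/put health check. `client` is any object
# exposing put(table, row, columns) and get(table, row) -> dict, e.g. a
# thin wrapper over the HBase REST or Thrift gateway. Names here are
# illustrative, not the deck's actual monitoring code.

def synthetic_check(client, table="canary", row=b"_canary", value=b"ok"):
    start = time.time()
    client.put(table, row, {b"f:heartbeat": value})
    ok = client.get(table, row).get(b"f:heartbeat") == value
    latency_ms = (time.time() - start) * 1000.0
    return ok, latency_ms
```

A monitoring agent would alert on `ok == False` or on latency above a threshold, which catches a RegionServer that is up but wounded.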
Instrumentation
Metrics
• OpenTSDB for high cardinality metrics
• Per region stats collection
• tcollector
• RegionServer HTTP JMX
• HBase REST
• GangliaContext for hadoop-metrics
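tcollector ships datapoints to OpenTSDB one line at a time in the telnet-style `put` protocol. A minimal sketch of the formatting a collector like the ones above would emit; the metric and tag names are illustrative:

```python
import time

# Sketch of what a tcollector-style collector emits: one datapoint per
# line in OpenTSDB's telnet protocol:
#   put <metric> <unix_ts> <value> <tag=value> ...
# Metric and tag names below are examples, not the deck's actual schema.

def tsdb_line(metric, value, ts=None, **tags):
    ts = int(ts if ts is not None else time.time())
    tag_str = " ".join("{}={}".format(k, v) for k, v in sorted(tags.items()))
    return "put {} {} {} {}".format(metric, ts, value, tag_str).rstrip()

print(tsdb_line("hbase.regionserver.blockCacheHitRatio", 0.93,
                ts=1368000000, host="rs1", cluster="feeds_e"))
```

Tagging every point with host, cluster, and region is what makes the per-region, high-cardinality slicing possible downstream.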
OpenTSDB
Table / RegionServer / Region
Slicing and Dicing
Using R
Tables
Regions
StoreFiles
Tuning Performance
Compaction
+Logs
Operational Intelligence
Dashboards
S3 + HBase Snapshots
Backups
• Full NameNode backup every 60 mins
• EBS volume as a name.dir for crash recovery
• HBase snapshots + ExportSnapShot
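The snapshot + ExportSnapshot flow above is two CLI steps: take a snapshot in the hbase shell, then ship it to S3 with the ExportSnapshot tool. A sketch that builds those invocations; the bucket name is an assumption, while `org.apache.hadoop.hbase.snapshot.ExportSnapshot` is the real tool class referenced on the slide:

```python
# Sketch of the backup flow: (1) take a snapshot via the hbase shell,
# (2) copy it off-cluster with ExportSnapshot. The S3 bucket is a
# placeholder; everything else follows the standard HBase CLI.

def snapshot_cmds(table, snapshot, bucket="s3://my-hbase-backups"):
    take = ["hbase", "shell", "-e",
            "snapshot '{}', '{}'".format(table, snapshot)]
    export = ["hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
              "-snapshot", snapshot,
              "-copy-to", "{}/{}".format(bucket, snapshot)]
    return take, export

take_cmd, export_cmd = snapshot_cmds("users", "users_snap_20130501")
print(" ".join(take_cmd))
print(" ".join(export_cmd))
```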
Additional Tuning
Solid State Clusters
• Lower the block size from 32k to something a lot smaller: 8-16k
• Placement groups for 10Gb networking
• Increase DFSBandwidthPerSec
• Kernel tuning for TCP
• Compaction threads
• Disk elevator to noop
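The last bullet (disk elevator to noop) is a one-line sysfs change per device, which makes sense on SSDs where request reordering buys nothing. A sketch that builds the command; the device name is an assumption:

```python
# Sketch of the "disk elevator to noop" step: the shell one-liner that
# switches the I/O scheduler for a block device via sysfs. Device name
# is illustrative; on EC2 ephemerals it would be applied per disk.

def noop_elevator_cmd(dev="xvdb"):
    return "echo noop > /sys/block/{}/queue/scheduler".format(dev)

print(noop_elevator_cmd())
```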
Process
Planning for Launch
• Pyres queue for asynchronous reads / writes
• Allows for tuning a system before it goes live
• Tuning
• Schema
• Hot spots
• Compaction
• Canary roll out to new users
• 10% -> 30% -> 80% -> 100%
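The canary ramp above works if each user's enrollment is stable across stages: a user who lands in the 10% cohort must stay enrolled at 30%, 80%, and 100%. A minimal sketch using a stable hash bucket (function and hashing choice are illustrative, not the deck's actual rollout code):

```python
import hashlib

# Sketch of a stable percentage rollout: hash the user id into one of
# 100 buckets and enable the feature when the bucket falls under the
# current percentage. Because the bucket never changes, widening the
# ramp (10 -> 30 -> 80 -> 100) only ever adds users, never drops them.

def in_rollout(user_id, percent):
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    return bucket < percent

# A user enabled at 10% stays enabled at every later stage:
for uid in range(1000):
    if in_rollout(uid, 10):
        assert all(in_rollout(uid, p) for p in (30, 80, 100))
```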
Pssstt. We’re hiring

HBaseCon 2013: Apache HBase Operations at Pinterest
