HBaseCon 2013: Apache HBase Operations at Pinterest

Presented by: Jeremy Carroll, Pinterest

Presentation Transcript

  • Operations: Jeremy Carroll, Operations Engineer (HBaseCon 2013)
  • We help people discover things they love and inspire them to do those things…
  • HBase in Production: Overview
    - All running on Amazon Web Services
    - 5 production clusters and growing
    - Mix of SSD and SATA clusters
    - Billions of page views per month
  • Designing for EC2 (with lots of patches)
    - CDH 4.2.x / HBase 0.94.7
    - One zone per cluster / no rack locality
    - RegionServers: ephemeral disk only
    - Redundant clusters for availability
    - Patches: HDFS-3912, HDFS-4721, HDFS-3703, HDFS-9503, HBASE-8284, HBASE-8389, HBASE-8434, HBASE-7878
  • Cluster Setup: Configuration
    - Managed splitting with pre-split tables (see the sketch after this list)
    - Bloom filters on pretty much everything
    - Manual / rolling major compactions
    - Reverse DNS on EC2
    - 3 ZooKeepers in quorum
    - 1 NameNode / Secondary NameNode / HBase Master
    - 1 EBS volume for the fsImage / 1 Elastic IP
    - 10-50 nodes per cluster
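Pre-splitting is easy to script. A minimal sketch of creating a pre-split table with ROW bloom filters through the HBase shell; the table name, column family, and split points are illustrative, not Pinterest's actual schema:

    #!/usr/bin/env python3
    # Sketch: create a pre-split table with ROW bloom filters. Assumes a
    # hex-prefixed keyspace; 15 split points yield 16 initial regions.
    import subprocess

    splits = ", ".join("'%x'" % i for i in range(1, 16))
    ddl = "create 'feeds', {NAME => 'f', BLOOMFILTER => 'ROW'}, {SPLITS => [%s]}" % splits

    # The 0.94-era shell reads DDL from stdin; 'exit' ends the session.
    subprocess.run(["hbase", "shell"], input=ddl + "\nexit\n", text=True, check=True)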
  • Provisioning: Fact-driven "Fry" method using Puppet
    - User-data passed in to drive config management
    - Repackaged modifications to HDFS / HBase
    - Ubuntu .deb packages created with FPM
    - Synced to S3; nodes configured with the s3-apt plugin
    - Mount + format ephemerals on boot (sketched below)
    - ext4 with nodiratime / nodelalloc / lazy_itable_init
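The boot-time mount/format step might look roughly like this; the device names and mount points are assumptions (they vary by instance type), and only the ext4 options come from the slide:

    #!/usr/bin/env python3
    # Sketch: format and mount EC2 ephemeral disks at boot (run as root).
    import subprocess

    EPHEMERALS = {"/dev/xvdb": "/mnt/d0", "/dev/xvdc": "/mnt/d1"}  # assumed
    MOUNT_OPTS = "nodiratime,nodelalloc"

    for dev, mnt in EPHEMERALS.items():
        # lazy_itable_init defers inode-table zeroing so formatting stays fast.
        subprocess.run(["mkfs.ext4", "-E", "lazy_itable_init=1", dev], check=True)
        subprocess.run(["mkdir", "-p", mnt], check=True)
        subprocess.run(["mount", "-o", MOUNT_OPTS, dev, mnt], check=True)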
  • Puppet Module

    ---- HBASE MODULE ----
    class { 'hbase':
      cluster          => 'feeds_e',
      namenode         => 'ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com',
      zookeeper_quorum => 'zk1,zk2,zk3',
      hbase_site_opts  => {
        'hbase.replication'                      => true,
        'hbase.snapshot.enabled'                 => true,
        'hbase.snapshot.region.timeout'          => '35000',
        'replication.sink.client.ops.timeout'    => '20000',
        'replication.sink.client.retries.number' => '3',
        'replication.source.size.capacity'       => '4194304',
        'replication.source.nb.capacity'         => '100',
        ...
      },
    }

    ---- FACT BASED VARIABLES ----
    $hbase_heap_size = $ec2_instance_type ? {
      'hi1.4xlarge' => '24000',
      'm2.2xlarge'  => '24000',
      'm2.xlarge'   => '11480',
      'm1.xlarge'   => '11480',
      'm1.large'    => '6500',
      ...
    }
  • Service Monitoring: Designed for EC2
    - Wounded (dying) vs. operational
    - High-value metrics first
    - Overall health
    - Alive / dead nodes
    - Service up / down
    - Fsck / blocks / % space
    - Replication status
    - Regions needing splits
    - fsImage checkpoint
    - ZooKeeper quorum
    - Synthetic transactions (get / put); see the sketch below
    - Queues (flush / compaction / RPC)
    - Latency (client / filesystem)
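A possible shape for the synthetic get/put check, using the happybase Thrift client; the 'canary' table, row key, and latency threshold are invented for illustration, and the deck does not say how Pinterest implemented theirs:

    #!/usr/bin/env python3
    # Sketch: synthetic put-then-get canary; alert on failure or slow round trip.
    # Assumes an HBase Thrift gateway and a pre-created 'canary' table.
    import time
    import happybase

    def check_cluster(thrift_host, threshold_ms=500):
        table = happybase.Connection(thrift_host).table("canary")
        start = time.time()
        table.put(b"canary-row", {b"f:ts": str(time.time()).encode()})  # synthetic put
        row = table.row(b"canary-row")                                  # synthetic get
        latency_ms = (time.time() - start) * 1000
        if not row or latency_ms > threshold_ms:
            raise RuntimeError("canary failed: %.1f ms, row=%r" % (latency_ms, row))
        return latency_ms  # feed this into the alerting/metrics pipeline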
  • Metrics: Instrumentation
    - OpenTSDB for high-cardinality metrics
    - Per-region stats collection
    - tcollector (see the collector sketch below)
    - RegionServer HTTP JMX
    - HBase REST
    - GangliaContext for hadoop-metrics
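tcollector works by running small collector scripts that print OpenTSDB's "metric timestamp value tags" lines on stdout. A sketch of one scraping the RegionServer's JSON /jmx servlet (60030 was the 0.94-era default info port; the metric naming here is an assumption):

    #!/usr/bin/env python3
    # Sketch: tcollector-style collector emitting OpenTSDB line protocol
    # from the RegionServer's /jmx endpoint.
    import json
    import time
    import urllib.request

    def main():
        now = int(time.time())
        with urllib.request.urlopen("http://localhost:60030/jmx") as resp:
            beans = json.load(resp)["beans"]
        for bean in beans:
            for key, value in bean.items():
                if isinstance(value, (int, float)):  # skip names and nested values
                    print("hbase.regionserver.%s %d %s host=localhost" % (key, now, value))

    if __name__ == "__main__":
        main()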
  • Slicing and Dicing: OpenTSDB metrics charted per table, RegionServer, and region
  • Using R: plots of tables, regions, and StoreFiles
  • Tuning Performance: compaction + logs
  • Dashboards: Operational Intelligence
  • Backups: S3 + HBase Snapshots
    - Full NameNode backup every 60 minutes
    - EBS volume as a name.dir for crash recovery
    - HBase snapshots + ExportSnapshot; see the sketch below
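A sketch of the snapshot-then-export flow; the table name and S3 bucket are placeholders, though ExportSnapshot itself is the stock HBase MapReduce job the slide names:

    #!/usr/bin/env python3
    # Sketch: snapshot a table, then ship the snapshot to S3 with ExportSnapshot.
    import subprocess
    import time

    table = "feeds"                                    # placeholder table name
    snapshot = "%s-%d" % (table, int(time.time()))

    # Take the snapshot via the shell (requires hbase.snapshot.enabled = true).
    subprocess.run(["hbase", "shell"],
                   input="snapshot '%s', '%s'\nexit\n" % (table, snapshot),
                   text=True, check=True)

    # Copy it out-of-band to S3 as a MapReduce job.
    subprocess.run(["hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
                    "-snapshot", snapshot,
                    "-copy-to", "s3n://backup-bucket/hbase"], check=True)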
  • Solid State Clusters: Additional Tuning
    - Lower the block size from 32k to something much smaller (8-16k)
    - Placement groups for 10Gb networking
    - Increase DFSBandwidthPerSec
    - Kernel tuning for TCP (sketched below)
    - Compaction threads
    - Disk elevator set to noop
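Most of these host knobs reduce to writes under /sys plus sysctl. A sketch with illustrative values, since the slide names the knobs but not the numbers (device names are also assumed):

    #!/usr/bin/env python3
    # Sketch: switch SSDs to the noop elevator and bump TCP buffer sysctls
    # (run as root; values are illustrative, not from the deck).
    import subprocess

    for dev in ("xvdb", "xvdc"):  # assumed ephemeral SSD devices
        with open("/sys/block/%s/queue/scheduler" % dev, "w") as f:
            f.write("noop")  # SSDs gain little from request reordering

    for key, value in [("net.core.rmem_max", "16777216"),
                       ("net.core.wmem_max", "16777216")]:
        subprocess.run(["sysctl", "-w", "%s=%s" % (key, value)], check=True)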
  • Planning for Launch: Process
    - Pyres queue for asynchronous reads / writes
    - Allows tuning the system before it goes live
    - Tuning: schema, hot spots, compaction
    - Canary rollout to new users: 10% -> 30% -> 80% -> 100% (sketched below)
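One common way to implement that ramp is deterministic user bucketing; this sketch assumes that approach, since the deck does not describe the actual gating mechanism:

    #!/usr/bin/env python3
    # Sketch: percentage rollout by hashing the user id into a stable
    # [0, 100) bucket and comparing against the current dial.
    import hashlib

    ROLLOUT_PERCENT = 10  # raise to 30, 80, then 100 as the cluster proves out

    def serve_from_hbase(user_id):
        bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
        return bucket < ROLLOUT_PERCENT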
  • Pssstt. We’re hiring