HBaseCon 2015: HBase Operations in a Flurry

With multiple clusters of 1,000+ nodes replicated across multiple data centers, Flurry has learned many operational lessons over the years. This talk explores the challenges of maintaining and scaling Flurry's clusters, how we monitor them, and how we diagnose and address potential problems.

  1. HBase Operations in a Flurry - Rahul Gidwani and Ian Friedman, Yahoo!
  2. History
     ● 2008 - Flurry Analytics for Mobile Apps
       ○ Sharded MySQL, or
       ○ HBase
     ● Launched on 0.18.1 with a 3-node cluster
     ● Great community
     ● Now running 0.98.12 (+patches)
     ● 2 data centers with 3 clusters each
     ● Bidirectional replication between all
  3. How we use HBase
     (Diagram: the Flurry SDK feeding the processing pipeline and mobile advertising)
  4. Our Clusters
     In each datacenter we have 3 HBase clusters. Each machine: 128GB RAM, 4 drives (4TB each),
     10GigE, 2 CPU x 6 cores x 2 HT = 24 procs, running a RegionServer, NodeManager, and DataNode.
     ● 1400 nodes: 1 table, 60k regions, 1.2PB (LZO compressed) - ingestion pipeline, MapReduce jobs, random reads
     ● 800 nodes: 37 tables, 115k regions, 400TB (LZO compressed) - ingestion pipeline, MapReduce jobs, random reads
     ● 60 nodes: 1 table, 10k regions, 2TB (LZO compressed) - ingestion pipeline, low-latency random reads (99% <= 1ms, max throughput 1MM rps)
  5. Data Migration - Attempt #1
     1. Start replication to the new datacenter (see the sketch below)
     2. Backfill using CopyTable (see the sketch after the next slide)
     (Diagrams: replication from Old DC to New DC; then a CopyTable MR job running alongside replication)
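As a rough illustration of step 1, here is a minimal sketch of registering the new datacenter as a replication peer, assuming the 0.98-era ReplicationAdmin API; the peer id and ZooKeeper quorum are hypothetical placeholders, and the replicated column families still need REPLICATION_SCOPE => 1.

```java
// Minimal sketch (not Flurry's actual tooling): register the new DC as a replication peer.
// Assumes 0.98's ReplicationAdmin; peer id and quorum below are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class AddNewDcPeer {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ReplicationAdmin replication = new ReplicationAdmin(conf);
    try {
      // clusterKey format: <zookeeper quorum>:<client port>:<parent znode>
      replication.addPeer("new_dc", "zk1.newdc,zk2.newdc,zk3.newdc:2181:/hbase");
    } finally {
      replication.close();
    }
  }
}
```

Replication only ships edits that arrive after the peer is added, which is why the backfill in step 2 is needed.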
  6. Data Migration - Attempt #1
     Pros
     ● Easy from an operational standpoint
     ● One MapReduce job per table does the job
     ● No extra copies of the data to keep around
     Cons
     ● Job failure == starting over
     ● Shipping uncompressed data over the wire
     ● Slow
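The CopyTable backfill (step 2 two slides up) can be driven as in the sketch below, which simply invokes the stock CopyTable tool; the peer address and table name are placeholders.

```java
// Minimal sketch: backfill the new DC by driving the stock CopyTable MR job.
// Equivalent to:
//   hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
//     --peer.adr=zk1.newdc,zk2.newdc,zk3.newdc:2181:/hbase my_table
import org.apache.hadoop.hbase.mapreduce.CopyTable;

public class BackfillNewDc {
  public static void main(String[] args) throws Exception {
    CopyTable.main(new String[] {
        "--peer.adr=zk1.newdc,zk2.newdc,zk3.newdc:2181:/hbase",
        "my_table"   // placeholder table name
    });
  }
}
```

The cons above follow directly from this being a scan-and-put job: the data crosses the wire uncompressed, and a failure means rerunning the whole scan.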
  7. Data Migration - Attempt #2
     1. Start replication to the new datacenter
     2. Snapshot the table (see the sketch below)
     (Diagrams: replication from Old DC to New DC; then a snapshot taken on the Old DC while replication continues)
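A minimal sketch of step 2, taking the snapshot through the 0.98-style admin API; the snapshot and table names are placeholders.

```java
// Minimal sketch: snapshot the table that is being migrated.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotForMigration {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Default (flush) snapshot: records references to the table's current HFiles
      // without copying the data.
      admin.snapshot("migration_snapshot", TableName.valueOf("my_table"));
    } finally {
      admin.close();
    }
  }
}
```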
  8. Data Migration - Attempt #2
     3. Export the snapshot to the destination cluster's HDFS (see the sketch below)
     4. Bulk load into the destination cluster
     (Diagrams: an ExportSnapshot MR job shipping the snapshot to the New DC's HDFS; then a bulk load into the New DC while replication continues)
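Step 3 might be driven like the sketch below, which runs the stock ExportSnapshot tool; the snapshot name, destination HDFS URI, and mapper count are placeholders. Step 4 then loads the shipped HFiles on the destination side, for example with LoadIncrementalHFiles (the completebulkload tool), once they are laid out the way that tool expects.

```java
// Minimal sketch: ship the snapshot's (already compressed) HFiles to the new DC's HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
import org.apache.hadoop.util.ToolRunner;

public class ExportMigrationSnapshot {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    int rc = ToolRunner.run(conf, new ExportSnapshot(), new String[] {
        "-snapshot", "migration_snapshot",
        "-copy-to", "hdfs://newdc-namenode:8020/hbase",   // placeholder destination
        "-mappers", "50"
    });
    System.exit(rc);
  }
}
```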
  9. Data Migration - Attempt #2
     Pros
     ● Ships compressed data over the wire
     ● Can easily modify the code so that if the export job fails, you can resume where you left off
     ● Much faster than CopyTable if your compression rates are good
     Cons
     ● When you compact while snapshots exist you keep the old HFiles, so you are potentially storing 2x the original table on disk
     ● More operational steps than CopyTable
     ● Possibility of resurrecting deleted data: snapshot the data, delete data, major compact, then import the snapshot
  10. Data Migration - Attempt #3
      1. Start replication to the new datacenter
      2. Take partial snapshots of the table (HBASE-13031 - ability to snapshot based on a key range)
      (Diagram: replication from Old DC to New DC, with multiple key-range snapshots taken on the Old DC)
  11. Data Migration - Attempt #3
      3. Export the snapshots (multiple ExportSnapshot MR jobs)
      4. Bulk load into the destination cluster (multiple bulk load runs)
      (Diagram: exported snapshots landing in the New DC's HDFS, then loaded with multiple bulk load runs while replication continues)
  12. Data Migration - Attempt #3
      Pros
      ● Same as the previous attempt
      ● If you have large tables and limited space in DFS, you can snapshot a key range, thus limiting the amount of duplicate storage at any time
      Cons
      ● Adding even more operational overhead
      ● Still a possibility of resurrecting deleted data
  13. Upgrading our cluster
      No downtime: [Hadoop-1.x, HBase-0.94.x] => [Hadoop-2.x, HBase-0.98.x]
      Issues we had to iron out:
      ● How to migrate data?
        ○ HBase-0.94.x (Writables) vs. HBase-0.98.x (protobufs)
        ○ Hadoop-1.x <-> Hadoop-2.x (can't push data, must pull via HFtp/WebHDFS; see the sketch below)
        ○ Snapshots are not compatible between HBase-0.94.x and HBase-0.98.x
      ● Client code compatibility
        ○ Must be compatible with both Hadoop versions and both HBase versions for some time
      ● Migrating our HBase jobs from MapReduce to YARN
        ○ We had a few patches to Hadoop which protected HBase that no longer apply
          (max_concurrent_map_tasks, max_tasks_input_split)
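To illustrate the "can't push, must pull via HFtp/WebHDFS" point, here is a rough sketch of reading from the old Hadoop 1.x cluster over webhdfs:// from the 2.x side; hostnames and paths are placeholders, and a real migration would more likely run DistCp against the same webhdfs:// URI.

```java
// Minimal sketch: pull a directory from the Hadoop 1.x cluster over WebHDFS,
// since the 1.x and 2.x RPC protocols are not wire compatible.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class PullFromOldCluster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem oldFs = FileSystem.get(URI.create("webhdfs://old-nn.example.com:50070/"), conf);
    FileSystem newFs = FileSystem.get(URI.create("hdfs://new-nn.example.com:8020/"), conf);
    FileUtil.copy(oldFs, new Path("/exports/my_table"),
                  newFs, new Path("/staging/my_table"),
                  false /* deleteSource */, conf);
  }
}
```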
  14. Migrating Client Code from 0.94 to 0.98
      The requirement: deploy a single code base to either 0.94 or 0.98 clusters.
      Good news: most of the API calls are identical, so everything can resolve properly at runtime!
      Bad news: … most, not all.
  15. Migrating Client Code from 0.94 to 0.98
      What we did:
      ● Separated out our HBase client code from the rest of the project
      ● Forked that library to have separate 0.94 and 0.98 versions
      ● Changed our build process to include either version depending on which cluster we're building for
  16. 0.94 to 0.98 - Filters
      ● Serialization changed significantly (Hadoop Writables -> protobufs)
      ● Input value types changed (KeyValue -> Cell)
      Solution: added an abstract base class to handle these differences
      (Diagram: 0.94 Filter and 0.98 Filter implementations sharing the abstract base class)
  17. 0.94 to 0.98 - Filters
      Instantiation changed too: each Filter now needs its own static factory method, which the RegionServer finds via reflection when deserializing (see the sketch below).
      Adding this method causes no backwards compatibility issues!
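As an illustration (not the speakers' actual filter), a bare-bones 0.98-style custom filter might look like the sketch below: it overrides filterKeyValue(Cell), supplies toByteArray(), and exposes the static parseFrom() factory that the RegionServer locates via reflection. The built-in 0.98 filters serialize through protobuf; this sketch hand-rolls raw bytes to stay short.

```java
// Minimal sketch: keep only cells whose value starts with a fixed prefix.
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.exceptions.DeserializationException;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class ValuePrefixFilter extends FilterBase {
  private final byte[] prefix;

  public ValuePrefixFilter(byte[] prefix) {
    this.prefix = prefix;
  }

  @Override
  public ReturnCode filterKeyValue(Cell cell) {
    // 0.98 hands us a Cell (0.94 used KeyValue here).
    byte[] value = CellUtil.cloneValue(cell);
    if (value.length >= prefix.length
        && Bytes.equals(value, 0, prefix.length, prefix, 0, prefix.length)) {
      return ReturnCode.INCLUDE;
    }
    return ReturnCode.SKIP;
  }

  // 0.98 serialization hook (raw bytes here for brevity instead of protobuf).
  @Override
  public byte[] toByteArray() {
    return prefix.clone();
  }

  // Static factory the RegionServer looks up via reflection when deserializing.
  public static ValuePrefixFilter parseFrom(byte[] bytes) throws DeserializationException {
    return new ValuePrefixFilter(bytes);
  }
}
```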
  18. 0.94 to 0.98 - Filters
      The new "reversed" field on the base Filter class broke our serialization unit tests, which expect transient fields to be marked transient even if they aren't used in Java serialization ("Why aren't you transient??"). See HBASE-12996.
  19. 0.94 to 0.98 - HTable Caching
      ● In 0.94, HTables were heavy, so we cached them
      ● We maintained a singleton cache of HTables so we wouldn't have to reinstantiate them repeatedly
      (Diagram: HBase DAOs request HTables from the TableInstanceCache, a lazily initialized Map<TableName, HTable> that applies setAutoFlush())
  20. 0.94 to 0.98 - HTable Caching
      ● But in 0.98, HTables are light and you're expected to create/destroy them as needed
      ● We still use that cache because we make heavy use of batched writes via setAutoFlush(false) (see the sketch below)
      ● HTableMultiplexer is the 0.98 way to do this, but a few issues prevented us from moving towards it:
        1. Backwards API compatibility with 0.94
        2. It drops requests (i.e. put() returns false) if the queue is full
        3. No flush() sync point to ensure all buffered data is written
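Below is a minimal sketch of the kind of cache described here, modeled on the TableInstanceCache named on the previous slide but assuming the 0.98 HTable API; the method names and structure are illustrative, not Flurry's actual implementation.

```java
// Minimal sketch: lazily create one write-buffered HTable per table and reuse it.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;

public class TableInstanceCache {
  private final Configuration conf;
  private final Map<TableName, HTable> tables = new HashMap<TableName, HTable>();

  public TableInstanceCache(Configuration conf) {
    this.conf = conf;
  }

  public synchronized HTable get(TableName name) throws IOException {
    HTable table = tables.get(name);
    if (table == null) {
      table = new HTable(conf, name);
      table.setAutoFlush(false);   // buffer Puts client-side so they can be batched
      tables.put(name, table);
    }
    return table;
  }

  // The explicit sync point HTableMultiplexer lacks: push all buffered mutations.
  public synchronized void flushAll() throws IOException {
    for (HTable table : tables.values()) {
      table.flushCommits();
    }
  }
}
```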
  21. Some Operational Issues and Solutions
      ● Problem: adding or removing racks causes the balancer to move far too many regions
        Solution: patch the balancer to limit the number of regions moved per run
      ● Problem: regions that had not been written to for a while and had only one store file were not candidates for major compaction, so they could keep non-local blocks
        Solution: HBASE-11195 - Potentially improve block locality during major compaction for old regions
      ● Problem: the balancer does not respect draining nodes
        Solution: HBASE-10528 - DefaultBalancer selects plans to move regions onto draining nodes
  22. Questions?
      Ian Friedman - ianfriedman@yahoo-inc.com
      Rahul Gidwani - rahulgidwani@yahoo-inc.com
