HBase at Flurry

Slides from HBase Meetup at Flurry 2013-08-20

  1. 2013-08-20
     Dave Latham
  2. • History
     • Stats
     • How We Store Data
     • Challenges
     • Mistakes We Made
     • Tips / Patterns
     • Future
     • Moral of the Story
  3. • 2008 – Flurry Analytics for Mobile Apps
     • Sharded MySQL, or
     • HBase!
     • Launched on 0.18.1 with a 3 node cluster
     • Great community
     • Now running 0.94.5 (+ patches)
     • 2 data centers with 2 clusters each
     • Bidirectional replication
  4. • 1000 slave nodes per cluster
     • 32 GB RAM, 4 drives (1 or 2 TB), 1 GigE, dual quad-core * 2 HT = 16 procs
     • DataNode, TaskTracker, RegionServer (11 GB), 5 Mappers, 2 Reducers
     • ~30 tables, 250k regions, 430 TB (after LZO)
     • 2 big tables are about 90% of that
       ▪ 1 wide table: 3 CF, 4 billion rows, up to 1MM cells per row
       ▪ 1 tall table: 1 CF, 1 trillion rows, most 1 cell per row
  5. • 12 physical nodes
     • 5 region servers with 20 GB heaps on each
     • 1 table - 8 billion small rows - 500 GB (LZO)
     • All in block cache (after 20 minute warmup)
     • 100k-1MM QPS - 99.9% reads
     • 2 ms mean, 99% < 10 ms
     • 25 ms GC pause every 40 seconds
     • Slow after compaction
  6. • DAO for Java apps
     • Requires:
       ▪ writeRowIndex / readRowIndex
       ▪ readKeyValue / writeRowContents
     • Provides:
       ▪ save / delete
       ▪ streamEntities / pagination
       ▪ MR input formats on entities (rather than Result)
     • Uses HTable or asynchbase
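
(A minimal sketch of what such a DAO contract could look like, assuming a per-entity abstract base class over the 0.94-era client API. Only writeRowIndex / readRowIndex, writeRowContents / readKeyValue, and save / delete come from the slide; the class name, generics, and method bodies are illustrative assumptions, not Flurry's actual code.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;

    // Hypothetical shape of the DAO layer described on this slide.
    public abstract class EntityDao<T> {

        private final HTableInterface table;  // could equally wrap an asynchbase client

        protected EntityDao(HTableInterface table) {
            this.table = table;
        }

        // Subclasses define how an entity is encoded to / decoded from a row key...
        protected abstract byte[] writeRowIndex(T entity);
        protected abstract T readRowIndex(byte[] rowKey);

        // ...and how its fields map to / from HBase cells.
        protected abstract void writeRowContents(T entity, Put put);
        protected abstract T readKeyValue(Result result);

        // In exchange, the base class provides generic persistence helpers.
        public void save(T entity) throws IOException {
            Put put = new Put(writeRowIndex(entity));
            writeRowContents(entity, put);
            table.put(put);
        }

        public void delete(T entity) throws IOException {
            table.delete(new Delete(writeRowIndex(entity)));
        }
    }
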
  7. • Change row key format
     • DAO supports both formats
     1. Create new table
     2. Writes to both
     3. Migrate existing
     4. Validate
     5. Reads to new table
     6. Write to (only) new table
     7. Drop old table
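
(A rough sketch of how the dual-write phase, steps 2-4 above, might be wired up. The class name, method signature, and the two row key formats are assumptions for illustration only.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;

    // While the migration and validation are in flight, every write goes to
    // both tables, under both row key formats.
    public class DualWriteDao {
        private final HTableInterface oldTable;
        private final HTableInterface newTable;

        public DualWriteDao(HTableInterface oldTable, HTableInterface newTable) {
            this.oldTable = oldTable;
            this.newTable = newTable;
        }

        public void save(byte[] oldRowKey, byte[] newRowKey,
                         byte[] family, byte[] qualifier, byte[] value) throws IOException {
            oldTable.put(new Put(oldRowKey).add(family, qualifier, value));
            newTable.put(new Put(newRowKey).add(family, qualifier, value));
            // Once reads have moved to the new table (step 5), the oldTable.put
            // above is removed (step 6) and the old table can be dropped (step 7).
        }
    }
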
  8. • Bottlenecks (not horizontally scalable)
     • HMaster (e.g. HLog cleaning falls behind creation [HBASE-9208])
     • NameNode
       ▪ Disable table / shutdown => many HDFS files at once
       ▪ Scan table directory => slow region assignments
     • ZooKeeper (HBase replication)
     • JobTracker (heap)
     • META region
  9. • Too many regions (250k)
     • Max size 256 MB -> 1 GB -> 5 GB
     • Slow reassignments on failure
     • Slow hbck recovery
     • Lots of META queries / big client cache
       ▪ Soft refs can exacerbate
     • Slow rolling restarts
     • More failures (common and otherwise)
     • Zombie RS
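
(The max region size bump, 256 MB -> 1 GB -> 5 GB, is governed by hbase.hregion.max.filesize. A hedged example of setting it programmatically; in practice this normally lives in hbase-site.xml, and everything except the 5 GB figure from the slide is generic boilerplate.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Raise the region split threshold so regions grow to ~5 GB before splitting,
    // matching the last step on the slide - fewer, larger regions.
    public class RegionSizeConfig {
        public static Configuration withLargeRegions() {
            Configuration conf = HBaseConfiguration.create();
            conf.setLong("hbase.hregion.max.filesize", 5L * 1024 * 1024 * 1024); // 5 GB
            return conf;
        }
    }
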
  10. • Latency long tail
      • HTable flush write buffer
      • GC pauses
      • RegionServer failure
      • (See The Tail at Scale – Jeff Dean, Luiz André Barroso)
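
(The "HTable flush write buffer" item refers to client-side write buffering in the 0.94-era API: with autoFlush off, Puts queue up locally and are sent in a burst, which shows up in the latency tail. A minimal sketch; the table name and buffer size are example values.)

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedWrites {
        public static void main(String[] args) throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(), "example_table");
            table.setAutoFlush(false);                 // buffer Puts client-side
            table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB buffer (example value)

            table.put(new Put(Bytes.toBytes("row1"))
                    .add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v")));

            // The buffered Puts go out here (or when the buffer fills) in one batch,
            // so an unlucky request pays for the whole flush.
            table.flushCommits();
            table.close();
        }
    }
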
  11. • Shared cluster for MapReduce and live queries
      • IO-bound requests hog handler threads
      • Even cached reads get slow
      • RegionServer falls behind, stays behind
      • If the cluster goes down, it takes a while to come back
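
(The handler threads in question are the RegionServer's fixed RPC handler pool, sized by hbase.regionserver.handler.count; when IO-bound MapReduce requests occupy all of them, even cheap cached reads queue behind disk-bound ones. The snippet below only shows where that knob lives; the value is an example, not a recommendation.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HandlerPoolConfig {
        public static Configuration example() {
            Configuration conf = HBaseConfiguration.create();
            // Size of the RegionServer RPC handler pool (example value).
            conf.setInt("hbase.regionserver.handler.count", 30);
            return conf;
        }
    }
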
  12. • HDFS-5042 Completed files lost after power failure
      • ZOOKEEPER-1277 Servers stop serving when lower 32 bits of zxid roll over
      • ZOOKEEPER-1731 Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock
  13. • Small region size -> many regions
      • Nagle’s
      • Trying to solve a crisis you don’t understand (hbck fixSplitParents)
      • Setting up replication
      • Custom backup / restore
      • CopyTable OOM
      • Verification
  14. • Compact data matters (even with compression)
      • Block cache, network not compressed
      • Avoid random reads on non-cached tables (duh!)
      • Write cell fragments, combine at read time to avoid doing random reads
      • Compact later - coprocessor?
      • Can lead to large rows
        ▪ probabilistic counter
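
(A hedged sketch of the "write cell fragments, combine at read time" pattern: each update is a blind write of a new cell under a unique qualifier, and the reader folds all fragments together in a single Get instead of doing a read-modify-write. The family/qualifier scheme and the summing combine rule are assumptions for illustration; a later compaction job or coprocessor could merge fragments back into one cell, as the slide suggests.)

    import java.io.IOException;
    import java.util.UUID;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FragmentedCounter {
        private static final byte[] CF = Bytes.toBytes("f");

        // Each update appends one fragment cell - no read before the write.
        public static void addFragment(HTableInterface table, byte[] row, long delta)
                throws IOException {
            byte[] qualifier = Bytes.toBytes(UUID.randomUUID().toString()); // unique per fragment
            table.put(new Put(row).add(CF, qualifier, Bytes.toBytes(delta)));
        }

        // One Get returns every fragment in the row; combining happens client-side.
        public static long readTotal(HTableInterface table, byte[] row) throws IOException {
            Result result = table.get(new Get(row).addFamily(CF));
            long total = 0;
            for (KeyValue kv : result.raw()) {
                total += Bytes.toLong(kv.getValue());
            }
            return total;
        }
    }
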
  15. • HDFS HA
      • Snapshots (see how it works with 100k regions on 1000 servers)
      • 2000 node clusters
      • Test those bottlenecks
      • Larger regions, larger HDFS blocks, larger HLogs
      • More (independent) clusters
      • Load-aware balancing?
      • Separate RPC priorities for workloads
      • 0.96
  16. • Scaled 1000x and more on the same DB
      • If you’re on the edge you need to understand your system
      • Monitor
      • Open Source
      • Load test
      • Know your load
      • Disk or Cache (or SSDs?)
  17. • And maybe some answers
