HBase at Flurry

Slides from HBase Meetup at Flurry 2013-08-20

Transcript

  • 1. HBase at Flurry (Dave Latham, 2013-08-20)
  • 2.
      ◦ History
      ◦ Stats
      ◦ How We Store Data
      ◦ Challenges
      ◦ Mistakes We Made
      ◦ Tips / Patterns
      ◦ Future
      ◦ Moral of the Story
  • 3.
      ◦ 2008: Flurry Analytics for Mobile Apps
      ◦ Sharded MySQL, or HBase!
      ◦ Launched on 0.18.1 with a 3-node cluster
      ◦ Great community
      ◦ Now running 0.94.5 (+ patches)
      ◦ 2 data centers with 2 clusters each
      ◦ Bidirectional replication
  • 4.
      ◦ 1000 slave nodes per cluster
      ◦ 32 GB RAM, 4 drives (1 or 2 TB), 1 GigE, dual quad-core x 2 HT = 16 procs
      ◦ DataNode, TaskTracker, RegionServer (11 GB), 5 mappers, 2 reducers
      ◦ ~30 tables, 250k regions, 430 TB (after LZO)
      ◦ 2 big tables are about 90% of that
        ▪ 1 wide table: 3 CF, 4 billion rows, up to 1MM cells per row
        ▪ 1 tall table: 1 CF, 1 trillion rows, most with 1 cell per row
  • 5.
      ◦ 12 physical nodes
      ◦ 5 region servers with 20 GB heaps on each
      ◦ 1 table: 8 billion small rows, 500 GB (LZO)
      ◦ All in block cache (after a 20-minute warmup)
      ◦ 100k-1MM QPS, 99.9% reads
      ◦ 2 ms mean, 99% < 10 ms
      ◦ 25 ms GC pause every 40 seconds
      ◦ Slow after compaction
  • 6.
      ◦ DAO for Java apps
      ◦ Requires:
        ▪ writeRowIndex / readRowIndex
        ▪ readKeyValue / writeRowContents
      ◦ Provides:
        ▪ save / delete
        ▪ streamEntities / pagination
        ▪ MR input formats on entities (rather than Result)
      ◦ Uses HTable or asynchbase
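The DAO contract above can be sketched as follows. This is a hedged illustration, not Flurry's actual code: `RowSerde`, `Metric`, `MetricSerde`, and `EntityDao` are hypothetical names, and a `TreeMap` keyed by row bytes stands in for an HTable so the example is self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical per-entity contract: the entity supplies row-key serialization,
// mirroring the slide's writeRowIndex / readRowIndex requirement.
interface RowSerde<T> {
    byte[] writeRowIndex(T entity);   // entity -> row key bytes
    T readRowIndex(byte[] rowKey);    // row key bytes -> entity
}

class Metric {
    final String appId;
    final long day;
    Metric(String appId, long day) { this.appId = appId; this.day = day; }
}

class MetricSerde implements RowSerde<Metric> {
    public byte[] writeRowIndex(Metric m) {
        byte[] app = m.appId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(app.length + 8).put(app).putLong(m.day).array();
    }
    public Metric readRowIndex(byte[] rowKey) {
        ByteBuffer buf = ByteBuffer.wrap(rowKey);
        byte[] app = new byte[rowKey.length - 8];
        buf.get(app);
        return new Metric(new String(app, StandardCharsets.UTF_8), buf.getLong());
    }
}

// The DAO layers generic operations (save, streamEntities) on top of the serde;
// a sorted in-memory map plays the role of the HBase table here.
class EntityDao<T> {
    private final RowSerde<T> serde;
    private final NavigableMap<byte[], T> table = new TreeMap<>(Arrays::compare);
    EntityDao(RowSerde<T> serde) { this.serde = serde; }
    void save(T entity) { table.put(serde.writeRowIndex(entity), entity); }
    Iterable<T> streamEntities() { return table.values(); }  // row-key order, like a scan
}
```

In the real system the map would be replaced by HTable or asynchbase calls, with the same serde interface shared by the MapReduce input formats.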
  • 7.
      ◦ Change row key format
      ◦ DAO supports both formats
      1. Create new table
      2. Write to both
      3. Migrate existing data
      4. Validate
      5. Move reads to the new table
      6. Write to (only) the new table
      7. Drop the old table
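The dual-write and read-switch phases of the migration above can be sketched like this. Everything here is illustrative: `MigratingStore`, `encodeOld`, and `encodeNew` are hypothetical, and plain `HashMap`s stand in for the old- and new-format tables.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of steps 2 and 5 of the migration: write both key formats,
// then flip a flag to move reads to the new table.
class MigratingStore {
    final Map<String, String> oldTable = new HashMap<>();
    final Map<String, String> newTable = new HashMap<>();
    volatile boolean readFromNew = false;        // flipped in step 5

    String encodeOld(String id) { return "old:" + id; }   // hypothetical old key format
    String encodeNew(String id) { return "new:" + id; }   // hypothetical new key format

    void put(String id, String value) {          // step 2: write to both tables
        oldTable.put(encodeOld(id), value);
        newTable.put(encodeNew(id), value);
    }
    String get(String id) {                      // step 5: reads follow the flag
        return readFromNew ? newTable.get(encodeNew(id))
                           : oldTable.get(encodeOld(id));
    }
}
```

Because writes land in both formats before reads move, the read switch (and a rollback, if validation fails) is just a flag flip with no data loss window.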
  • 8.
      ◦ Bottlenecks (not horizontally scalable):
      ◦ HMaster (e.g. HLog cleaning falls behind creation [HBASE-9208])
      ◦ NameNode
        ▪ Disable table / shutdown => many HDFS files at once
        ▪ Scan table directory => slow region assignments
      ◦ ZooKeeper (HBase replication)
      ◦ JobTracker (heap)
      ◦ META region
  • 9.
      ◦ Too many regions (250k)
      ◦ Max region size 256 MB -> 1 GB -> 5 GB
      ◦ Slow reassignments on failure
      ◦ Slow hbck recovery
      ◦ Lots of META queries / big client cache
        ▪ Soft refs can exacerbate
      ◦ Slow rolling restarts
      ◦ More failures (common and otherwise)
      ◦ Zombie RS
  • 10.
      ◦ Latency long tail
      ◦ HTable flush write buffer
      ◦ GC pauses
      ◦ RegionServer failure
      ◦ (See "The Tail at Scale", Jeff Dean and Luiz André Barroso)
  • 11.
      ◦ Shared cluster for MapReduce and live queries
      ◦ IO-bound requests hog handler threads
      ◦ Even cached reads get slow
      ◦ RegionServer falls behind, stays behind
      ◦ If the cluster goes down, it takes a while to come back
  • 12.
      ◦ HDFS-5042: Completed files lost after power failure
      ◦ ZOOKEEPER-1277: Servers stop serving when the lower 32 bits of zxid roll over
      ◦ ZOOKEEPER-1731: Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock
  • 13.
      ◦ Small region size -> many regions
      ◦ Nagle's algorithm
      ◦ Trying to solve a crisis you don't understand (hbck fixSplitParents)
      ◦ Setting up replication
      ◦ Custom backup / restore
      ◦ CopyTable OOM
      ◦ Verification
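On the Nagle's algorithm item: Nagle's algorithm batches small TCP writes, which adds tens of milliseconds of delay to small RPCs; the usual fix is enabling TCP_NODELAY (HBase exposes this through its IPC tcpnodelay settings; check the docs for your version). At the raw-socket level, with a hypothetical helper name, it looks like this:

```java
import java.io.IOException;
import java.net.Socket;

class NoDelay {
    // Returns an unconnected socket with Nagle's algorithm disabled,
    // so small writes are sent immediately instead of being coalesced.
    static Socket openLowLatencySocket() throws IOException {
        Socket socket = new Socket();
        socket.setTcpNoDelay(true);   // TCP_NODELAY: disable Nagle's algorithm
        return socket;
    }
}
```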
  • 14.
      ◦ Compact data matters (even with compression)
      ◦ Block cache and network traffic are not compressed
      ◦ Avoid random reads on non-cached tables (duh!)
      ◦ Write cell fragments, combine at read time, to avoid random reads
        ▪ Compact later (coprocessor?)
        ▪ Can lead to large rows
        ▪ Probabilistic counter
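The fragment pattern above can be sketched in miniature: each update blindly appends a new fragment instead of doing a read-modify-write, and readers merge the fragments. `FragmentedCounter` is a hypothetical illustration; a map of lists stands in for the cells of one row in a column family.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of "write cell fragments, combine at read time":
// increments are blind appends (no random read on the write path),
// reads merge the fragments, and compaction collapses them later.
class FragmentedCounter {
    private final Map<String, List<Long>> cells = new HashMap<>();

    void increment(String row, long delta) {   // blind write: append a fragment
        cells.computeIfAbsent(row, k -> new ArrayList<>()).add(delta);
    }
    long read(String row) {                    // merge fragments at read time
        long total = 0;
        for (long d : cells.getOrDefault(row, List.of())) total += d;
        return total;
    }
    void compact(String row) {                 // the slide's "compact later" step
        long total = read(row);
        cells.put(row, new ArrayList<>(List.of(total)));
    }
}
```

The trade-off the slide notes: until compaction runs, rows can grow large, since every increment adds a cell rather than replacing one.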
  • 15.
      ◦ HDFS HA
      ◦ Snapshots (see how it works with 100k regions on 1000 servers)
      ◦ 2000-node clusters
        ▪ Test those bottlenecks
        ▪ Larger regions, larger HDFS blocks, larger HLogs
      ◦ More (independent) clusters
      ◦ Load-aware balancing?
      ◦ Separate RPC priorities for workloads
      ◦ 0.96
  • 16.
      ◦ Scaled 1000x and more on the same DB
      ◦ If you're on the edge, you need to understand your system
      ◦ Monitor
      ◦ Open source
      ◦ Load test
      ◦ Know your load
      ◦ Disk or cache (or SSDs?)
  • 17.  And maybe some answers