HBaseCon 2013: Apache HBase at Pinterest - Scaling Our Feed Storage

  1. Apache HBase @Pinterest - Scaling our “Feed” storage. Varun Sharma, Software Engineer. June 13, 2013
  2. What is Pinterest? An online pinboard where you “curate” and “discover” things you love and go do them in real life.
  3. Discovery - “Follow” Model: a “Follower” follows a “Followee”, forming an interest graph.
  • Follower indicates interest in the Followee’s content
  • Following feed - content collated from followees
  4. “Following” Feed @Pinterest: each new pin is fanned out to the feed of every follower (Follower 1, Follower 2, Follower 3, ...). Challenges @scale:
  • 100s of millions of pins/repins per month
  • High fanout - billions of writes per day (high throughput)
  • Billions of requests per month (low latency and high availability)
  5. “Following” Feed on HBase. Example row: key UserId=678 with columns CreationTs=100,PinId=8 and CreationTs=99,PinId=6 (values empty).
  • Pins in the following feed are reverse-chronologically sorted
  • HBase schema - choose wide over tall (see the sketch below)
  - Exploit lexicographic sorting within columns for ordering
  - Atomic transactions per user (inconsistencies get noticed at scale)
  - Row-level bloom filters to eliminate unnecessary seeks
  - Prefix compression (FAST_DIFF)
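As a rough illustration, here is what a feed-entry write against this wide-row schema might look like with the 0.94-era client API. The family name and the reversed-timestamp qualifier encoding are assumptions for the sketch, not code from the talk; what is grounded in the slide is the wide row per user and lexicographic column ordering.

```java
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FeedSchemaSketch {
    // Hypothetical family name; the real schema names are not in the talk.
    static final byte[] FAMILY = Bytes.toBytes("f");

    // Wide row: one row per user, one column per feed entry. Encoding the
    // qualifier as (Long.MAX_VALUE - creationTs) makes the lexicographic
    // column order reverse-chronological, matching the slide's sort order.
    static Put feedEntry(long userId, long creationTs, long pinId) {
        byte[] qualifier = Bytes.add(
                Bytes.toBytes(Long.MAX_VALUE - creationTs),
                Bytes.toBytes(pinId));
        Put put = new Put(Bytes.toBytes(userId));
        put.add(FAMILY, qualifier, HConstants.EMPTY_BYTE_ARRAY); // value is empty
        return put;
    }
}
```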
  6. Write Path: the frontend enqueues async tasks onto a message bus; workers dequeue follow/unfollow/new-pin tasks and write to the Follow Store and Pin Store in HBase through a Thrift + Finagle layer (sketched below).
  • Follow => put
  • Unfollow => delete
  • New Pin => multi put
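A minimal worker-side sketch of those three mutations, assuming the 0.94 HTable API. The table handles, family name, and row layout are placeholders; the put/delete/multi-put mapping is straight from the slide.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical message-bus worker applying the three mutation types.
class FeedWorker {
    private static final byte[] FAMILY = Bytes.toBytes("f");
    private final HTable followTable; // "Follow Store"
    private final HTable feedTable;   // following-feed table in the "Pin Store"

    FeedWorker(HTable followTable, HTable feedTable) {
        this.followTable = followTable;
        this.feedTable = feedTable;
    }

    // Follow => put into the follow store.
    void follow(long follower, long followee) throws IOException {
        Put put = new Put(Bytes.toBytes(follower));
        put.add(FAMILY, Bytes.toBytes(followee), HConstants.EMPTY_BYTE_ARRAY);
        followTable.put(put);
    }

    // Unfollow => delete from the follow store.
    void unfollow(long follower, long followee) throws IOException {
        Delete del = new Delete(Bytes.toBytes(follower));
        del.deleteColumns(FAMILY, Bytes.toBytes(followee));
        followTable.delete(del);
    }

    // New pin => multi put: fan the feed entry out to every follower's row.
    void fanout(List<Long> followers, byte[] feedQualifier) throws IOException {
        List<Put> puts = new ArrayList<Put>();
        for (long follower : followers) {
            Put put = new Put(Bytes.toBytes(follower));
            put.add(FAMILY, feedQualifier, HConstants.EMPTY_BYTE_ARRAY);
            puts.add(put);
        }
        feedTable.put(puts); // batched client-side into one RPC per region server
    }
}
```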
  7. Optimizing Writes
  • Increase per-region memstore size - a 512M memstore flushes to a ~40M HFile, so flushes produce fewer HFiles and compactions run less often
  • GC tuning - more frequent but smaller pauses
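In configuration terms this maps to one stock HBase knob; a sketch, assuming the standard property name (only the 512M value is from the slide):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WriteTuningSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Flush each region's memstore at 512M (the 0.94 default is 128M):
        // each flush then produces one larger HFile (~40M on disk per the
        // slide), so there are fewer HFiles and fewer compactions.
        conf.setLong("hbase.hregion.memstore.flush.size", 512L * 1024 * 1024);
        // The GC side ("more frequent but smaller pauses") would live in
        // hbase-env.sh; a smaller young generation with CMS is one common
        // way to get that trade-off (assumption - flags not in the talk):
        //   -Xmn256m -XX:+UseConcMarkSweepGC
    }
}
```

In practice this setting belongs in hbase-site.xml on the region servers; the programmatic form above is only to show the knob.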
  8. Read Path: the frontend calls the Thrift + Finagle layer, which first retrieves the PinId(s) for a user's feed from HBase and then retrieves pin metadata from the Pin Metadata Store.
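A hypothetical two-step read mirroring that flow, against the schema sketched earlier. The pagination filter, the metadata row key (the pin id), and the 8-byte qualifier suffix are all assumptions of this sketch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
import org.apache.hadoop.hbase.util.Bytes;

class FeedReader {
    private static final byte[] FAMILY = Bytes.toBytes("f");
    private final HTable feedTable;
    private final HTable pinMetadataTable;

    FeedReader(HTable feedTable, HTable pinMetadataTable) {
        this.feedTable = feedTable;
        this.pinMetadataTable = pinMetadataTable;
    }

    Result[] readFeed(long userId, int pageSize) throws IOException {
        // Step 1: fetch the newest pageSize entries from the user's feed
        // row. Qualifiers sort reverse-chronologically, so the first
        // columns are the most recent pins.
        Get feedGet = new Get(Bytes.toBytes(userId));
        feedGet.setFilter(new ColumnPaginationFilter(pageSize, 0));
        Result feedRow = feedTable.get(feedGet);
        if (feedRow.isEmpty()) {
            return new Result[0];
        }

        // Step 2: batch-get pin metadata for the returned PinId(s); the pin
        // id is assumed to be the 8-byte suffix of each qualifier.
        List<Get> metadataGets = new ArrayList<Get>();
        for (byte[] qualifier : feedRow.getFamilyMap(FAMILY).keySet()) {
            metadataGets.add(new Get(Bytes.tail(qualifier, 8)));
        }
        return pinMetadataTable.get(metadataGets);
    }
}
```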
  9. Optimizing Reads
  Schema
  • Prefix compression - FAST_DIFF - 4X size reduction
  • Reduced block size - 16K
  Cache
  • More block cache (hot set / temporal locality)
  • High cache hit rates
  Other standard optimizations
  • Short-circuit local reads
  • HBase checksums
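The schema-level settings from this slide (plus the row-level blooms from slide 5) map onto the 0.94 column-family descriptor; a sketch with assumed table/family names:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.regionserver.StoreFile;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadTuningSketch {
    static HTableDescriptor feedTable() {
        HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("f"));
        family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // ~4X size reduction
        family.setBlocksize(16 * 1024); // 16K blocks: finer-grained block cache
        family.setBloomFilterType(StoreFile.BloomType.ROW); // skip HFiles without the row
        HTableDescriptor table = new HTableDescriptor("following_feed"); // hypothetical name
        table.addFamily(family);
        // Short-circuit local reads ("dfs.client.read.shortcircuit") and
        // HBase-level checksums ("hbase.regionserver.checksum.verify") are
        // cluster-side settings, mentioned here only by property name.
        return table;
    }
}
```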
  10. “Scale” Challenges
  “Follow-Unfollow” race
  • A user follows at t1 and unfollows at t2 (t1 < t2), but the queued messages M(follow) and M(unfollow) arrive at t1' and t2' with no total ordering guarantee inside the message queue
  • Resolution - client-side timestamps
  • Example - use t1 and t2 as the HBase cell timestamps (sketched below)
  Unbounded dataset growth
  • A user's feed row (user1,pin1 ... user1,pin6 ...) grows without bound; MapReduce/realtime trimming is unfeasible
  • Coprocessors - trim old feed entries during compactions
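A sketch of the client-timestamp resolution, assuming the same hypothetical family/row layout as above. The key point is grounded in the slide: stamping cells with the user-side times t1 and t2 lets HBase's timestamp semantics resolve the race however the queue reorders the messages.

```java
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

class FollowRaceSketch {
    private static final byte[] FAMILY = Bytes.toBytes("f");

    // Follow happened at client time t1; stamp the cell with t1.
    static Put follow(long follower, long followee, long t1) {
        Put put = new Put(Bytes.toBytes(follower));
        put.add(FAMILY, Bytes.toBytes(followee), t1, HConstants.EMPTY_BYTE_ARRAY);
        return put;
    }

    // Unfollow happened at t2 > t1; the delete marker at t2 masks the put
    // stamped t1 no matter which message the queue delivers first.
    static Delete unfollow(long follower, long followee, long t2) {
        Delete del = new Delete(Bytes.toBytes(follower));
        del.deleteColumns(FAMILY, Bytes.toBytes(followee), t2);
        return del;
    }
}
```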
  11. MTTR
  • MTTR < 2 minutes consistently
  HBase
  • ZK session timeout - 30 sec
  HDFS
  • Tight timeouts - socket timeout < 5 sec; connect timeout - 1 sec X 3
  • Stale node timeout - 20 sec (datanodes with failed status checks are marked stale by the NN)
  • Stale nodes avoided during - WAL read - lease recovery - writing splits
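These timeouts correspond to stock Hadoop/HBase properties; the values below are from the slide, but whether Pinterest set exactly these keys is an assumption (they would normally live in hbase-site.xml and hdfs-site.xml, not client code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MttrTuningSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // HBase: a dead region server's ZK session expires after 30 sec.
        conf.setInt("zookeeper.session.timeout", 30000);
        // HDFS client: tight timeouts so a dead datanode is abandoned fast.
        conf.setInt("dfs.socket.timeout", 5000);                       // socket timeout < 5 sec
        conf.setInt("ipc.client.connect.timeout", 1000);               // 1 sec per attempt...
        conf.setInt("ipc.client.connect.max.retries.on.timeouts", 3); // ...X 3
        // HDFS NN: mark a datanode stale after 20 sec of missed heartbeats
        // and avoid it for reads/writes (WAL read, lease recovery, splits).
        conf.setLong("dfs.namenode.stale.datanode.interval", 20000);
        conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
        conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    }
}
```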
  12. Single Points of Failure: run two clusters (Cluster 1 and Cluster 2, each with its own ZK quorum). Writes are dual-written from the message queue to both clusters, with cross-cluster replication keeping them in sync; the frontend reads from one of the clusters.
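A minimal sketch of the dual-write idea, assuming a worker holds a table handle per cluster; the class and error-handling strategy are illustrative, not Pinterest's implementation.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

// Hypothetical sketch: apply each mutation to the same table on both
// clusters, so neither cluster is a single point of failure for reads.
class DualClusterWriter {
    private final HTable cluster1Table; // built from cluster 1's ZK quorum
    private final HTable cluster2Table; // built from cluster 2's ZK quorum

    DualClusterWriter(HTable cluster1Table, HTable cluster2Table) {
        this.cluster1Table = cluster1Table;
        this.cluster2Table = cluster2Table;
    }

    void write(Put put) throws IOException {
        cluster1Table.put(put);
        // If the second write fails, cross-cluster replication eventually
        // brings the two clusters back in sync.
        cluster2Table.put(put);
    }
}
```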
  13. Single Points of Failure (contd) - the HDFS NN (ephemeral vs EBS storage)
  • No concept of HA shared storage on EC2
  • Keep it simple
  - HA namenode + QJM - hell, no!
  - Operate two clusters, each in its own AZ
  14. Am I Better Off? Redis vs HBase
  • Sharding, load balancing and fault tolerance
  • Longer feeds
  • Resolve data inconsistencies
  • Savings in $$
  Cluster configuration
  • hi1.4xlarge - SSD-backed for performance parity
  • HBase - 0.94.3 and 0.94.7
  • HDFS - CDH 4.2.0
  15. And many more...
  • Rich pins
  • Duplicate pin notifications
  • Pinterest analytics
  • Recommendations - “People who pinned this also pinned”
  More to come...