
HBaseConAsia2018 Track1-5: Improving HBase reliability at Pinterest with geo-replication and efficient backup


Chenji Pan and Lianghong Xu of Pinterest



  1. Improving HBase reliability at Pinterest with Geo-replication and Efficient Backup. August 17, 2018. Chenji Pan, Lianghong Xu. Storage & Caching, Pinterest.
  2. Contents: 01 HBase in Pinterest; 02 Multicell; 03 Backup.
  3. 01: HBase in Pinterest.
  4. HBase in Pinterest: used for online services since 2013; backs data abstraction layers like Zen and UMS; ~50 HBase 1.2 clusters; internal repo with ZSTD, CCSMAP, bucket cache, timestamp support, etc.
  5. 02: Multicell.
  6. Why Multicell? (growth timeline, 2011-2016).
  7. Architecture: a Global Load Balancer routes traffic between the US-East and US-West cells; each cell has a Local Load Balancer in front of a Data Service, Cache, and DB, with cross-cell DB replication.
  8. Master-Master: a write request (key: val) can land in either cell; each cell updates its own DB and invalidates its cache, and DB replication propagates the write to the other cell.
  9. Master-Slave write path: a write request (key: val) arriving in the slave cell (US-East) sets a remote marker in the Remote Marker Pool and forwards the write to the master cell (US-West), which updates its DB; replication carries the update back, and the marker is cleaned once it lands.
  10. Master-Slave read path: a read request in the slave cell first checks the Remote Marker Pool; if a marker exists for the key, the data service reads from the remote master, otherwise it reads locally.
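The remote-marker write and read paths on slides 9-10 can be sketched as follows. All names here (MarkerStore, slave_write, and so on) are hypothetical; the slides describe the protocol, not an API, and the real logic lives inside Pinterest's data service.

```python
class MarkerStore:
    """Remote marker pool: tracks keys whose latest write has not yet
    replicated back to the local (slave) cell."""
    def __init__(self):
        self.markers = set()

    def set(self, key):
        self.markers.add(key)

    def clear(self, key):
        self.markers.discard(key)

    def has(self, key):
        return key in self.markers


def slave_write(key, val, master_db, markers):
    # Slide 9: set a remote marker, then forward the write to the master cell.
    markers.set(key)
    master_db[key] = val


def replicate(master_db, slave_db, markers):
    # Replication copies the master's data back to the slave cell; once a
    # key has landed locally, its marker is cleaned.
    for key, val in master_db.items():
        slave_db[key] = val
        markers.clear(key)


def slave_read(key, slave_db, master_db, markers):
    # Slide 10: a marker means the local copy may be stale, so read from
    # the remote master; otherwise read locally.
    if markers.has(key):
        return master_db.get(key)
    return slave_db.get(key)
```

The marker makes reads in the slave cell see their own recent writes even before replication catches up, at the cost of a cross-cell read while the marker is set.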
  11. Cache Invalidation Service: a write in US-West updates the DB and invalidates the local cache; replication events are published to Kafka, and a Cache Invalidation Service in US-East consumes them to invalidate the remote cache.
  12. The DB tier includes both MySQL and HBase.
  13. MySQL path: Maxwell captures MySQL changes (e.g., Comment data).
  14. Cache Invalidation Service (same flow as slide 11, revisited for the MySQL/HBase discussion).
  15. MySQL and HBase: Comment data in MySQL reaches Kafka via Maxwell; Comment data in HBase reaches Kafka via the HBase Replication Proxy, using HBase annotations.
  16. HBase Replication Proxy (HRP): exposes the HBase replication API; publishes to a customized Kafka topic, one event per mutation; multiple HBase clusters share one HRP. (Diagram: HBase clusters A and B replicate to the HRP, which publishes to Kafka.)
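A minimal sketch of the per-mutation publishing the slide describes. The producer stub, event shape, and per-table topic naming are all assumptions, since HRP's internals are not shown; the real HRP implements HBase's replication API.

```python
import json


class RecordingProducer:
    """Stand-in for a Kafka producer; records (topic, payload) pairs."""
    def __init__(self):
        self.sent = []

    def send(self, topic, payload):
        self.sent.append((topic, payload))


def topic_for(table):
    # Assumed naming convention for the customized per-table Kafka topic.
    return "hbase.replication." + table


def replicate_batch(entries, producer):
    """Publish one event per mutation ('event corresponding to mutation',
    slide 16). Each entry is a (table, row, mutation) tuple."""
    for table, row, mutation in entries:
        event = json.dumps({"table": table, "row": row, "mutation": mutation})
        producer.send(topic_for(table), event)
```

Because HRP sits behind the standard replication interface, any HBase cluster can add it as a replication peer, which is what lets multiple clusters share one proxy.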
  17. HBase Annotations: sent as part of a Mutate request; written to the WAL log, but not to the MemStore.
  18. HBase Timestamp: used to avoid race conditions; the HBase cluster responds to the data service with the write's timestamp, and the cache compares timestamps before updating.
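The timestamp comparison on slide 18 amounts to a last-writer-wins guard: the cache stores the HBase timestamp next to each value and applies an update only if its timestamp is newer, so a stale response can never overwrite a fresher one. This dict-based cache is illustrative, not Pinterest's implementation.

```python
def cache_update(cache, key, val, ts):
    """Apply (val, ts) only when ts is newer than the cached timestamp."""
    current = cache.get(key)
    if current is None or ts > current[1]:
        cache[key] = (val, ts)
        return True
    return False  # stale update dropped
```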
  19. Replication Topology Issue (slides 19-21): HBase M_E and S_E in US-East, HBase M_W and S_W in US-West; the build slides step through the cross-cell replication topology between these clusters.
  22. Resolution: get the remote master's region server list from ZooKeeper and update the local ZK with the master's region server list.
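A sketch of that resolution step: fetch the current master's region server list from the remote ZooKeeper and mirror it into the local ZK node used for replication. ZooKeeper is modeled as a plain dict here, and the znode paths are assumptions.

```python
RS_PATH = "/hbase/rs"  # assumed znode listing live region servers


def sync_master_rs_list(remote_zk, local_zk, peer_path):
    """Copy the remote master's region server list into the local peer node,
    so cross-cell replication follows the current master."""
    rs_list = remote_zk.get(RS_PATH, [])
    local_zk[peer_path] = list(rs_list)
    return rs_list
```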
  23. Improving backup efficiency: 01 HBase backup at Pinterest; 02 Simplifying the backup pipeline; 03 Offline deduplication.
  24. HBase Backup at Pinterest. HBase serves highly critical data: it requires very high availability, spans 10s of clusters with 10s of PB of data, and all of it needs backup to S3. Daily backups go to S3 for disaster recovery: snapshot + WAL enable point-in-time recovery, weekly/monthly backups are kept according to the retention policy, and backups are also used for offline data analysis.
  25. Legacy Backup Problem. Two-step backup pipeline: HBase -> HDFS backup cluster, then HDFS -> S3, because HBase 0.94 does not support S3 export. Problems with the HDFS backup cluster: infra cost as data volume increases, and operational pain on failure. (Diagram: HBase clusters ship snapshots and WALs to the HDFS backup cluster, which exports to AWS S3.)
  26. Upgraded Backup Pipeline. Before (HBase 0.94): HBase clusters -> snapshots and WALs -> HDFS backup cluster -> AWS S3. After (HBase 1.2): HBase clusters ship snapshots and WALs directly to AWS S3, with PinDedup performing offline deduplication.
  27. Challenge and Approach. Directly export HBase backups to S3: table export is done using a variant of distcp, with the S3A client and the fast-upload option. Direct S3 upload is very CPU-intensive: large HFiles are broken down into smaller chunks, and each chunk must be hashed and signed before upload. To minimize impact on prod HBase clusters, the max number of threads and YARN containers per host is constrained, keeping max CPU overhead during backup under 30%.
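To see where the CPU cost comes from, here is the per-chunk hashing work in miniature: each large HFile is split into multipart chunks and every chunk is hashed before upload. The 8 MB part size and the MD5 digest are illustrative assumptions; the real path is Hadoop's S3A client with fast upload enabled.

```python
import hashlib

PART_SIZE = 8 * 1024 * 1024  # assumed multipart chunk size


def chunk_digests(data, part_size=PART_SIZE):
    """Split a byte buffer into parts and hash each one, mimicking the
    per-chunk hash/sign work of a multipart upload."""
    return [hashlib.md5(data[off:off + part_size]).hexdigest()
            for off in range(0, len(data), part_size)]
```

A 10 GB HFile at this part size means well over a thousand hash computations per file, which is why thread and container limits are needed to cap the overhead.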
  28. Offline HFile Deduplication. HBase backups contain many duplicates. Observation: large HFiles rarely change; they account for most storage usage, are only merged during major compaction, and for read-heavy clusters there is much redundancy across backup cycles. PinDedup is an offline S3 deduplication tool: it asynchronously checks for duplicate S3 files and replaces old files with references to new ones. (Example: the Day 1 backup holds rs1/F1 at 10GB, rs1/F2 at 500MB, rs1/F3 at 30MB; Day 2 holds rs1/F1 at 10GB, rs1/F4 at 400MB, rs1/F5 at 80MB; the largest file is usually unchanged.)
  29. PinDedup Approach. Source: s3://bucket/dir/dt=dt1; target: s3://bucket/dir/dt=dt2. Only HFiles in the same region on adjacent dates are considered dedup candidates; files are declared duplicates when both the filename and the md5sum match. This needs no large on-disk dedup index, so lookup is very fast. (Example: rs1/F1 at 10GB appears in both days with the same file name and checksum.)
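The candidate check above reduces to a small set comparison. In this sketch the two backups' listings are modeled as {name: md5} dicts; the real tool lists S3 prefixes, and the function name is hypothetical.

```python
def find_duplicates(day1, day2):
    """Names present in both adjacent backups whose checksums also match
    (slide 29: duplicate = same filename AND same md5sum)."""
    return sorted(name for name, md5 in day2.items()
                  if day1.get(name) == md5)
```

Because only same-named files in the same region on adjacent dates are compared, the lookup needs no large on-disk dedup index.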
  30. Design choices: file encoding; file- vs. chunk-level deduplication; online vs. offline deduplication.
  31. File- vs. Chunk-level Dedup. Would more fine-grained duplicate detection, i.e. chunk-level dedup, help? Only marginally: with Rabin-fingerprint chunking at a 4K average chunk size, implementation complexity increases, and during compaction the merged changes of small HFiles end up spread across the entire output file. Lessons: file-level dedup is good enough, and less aggressive major compaction keeps the largest files unchanged.
  32. Online vs. Offline Dedup. Online: the HBase cluster sends file checksums to PinDedup and uploads only non-duplicate files, reducing data transfer to S3. Offline: all files are uploaded to S3 and PinDedup runs afterwards, giving more control over when dedup occurs and isolating backup failures from dedup failures.
  33. File encoding: dedup the old or the new file? Intuition says keep the old file and dedup the new one. Pros: one-step decoding. Cons: dangling file pointers when old files are deleted; e.g., when F1 is garbage-collected, F2' and F3' become inaccessible. Design choice: keep the new file and dedup the old one. There is no overhead when accessing the latest copy (most use cases), and it avoids the dangling-pointer problem.
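The chosen encoding can be sketched like this: keep the new file intact and replace the older duplicate with a small reference to it, so reading the latest backup touches only real files and only older backups pay a one-hop reference resolution. The dict-based store and reference format are assumptions for illustration.

```python
def dedup_old(store, old_key, new_key):
    """Replace the older duplicate with a reference to the newer copy
    (slide 33: keep the new file, dedup the old one)."""
    store[old_key] = {"ref": new_key}


def read_file(store, key):
    """Resolve at most one level of reference; since new files are never
    turned into references, chains of dangling pointers cannot form."""
    obj = store[key]
    if isinstance(obj, dict) and "ref" in obj:
        obj = store[obj["ref"]]
    return obj
```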
  34. Results: significantly reduced infra cost; backup end-to-end time reduced by 50%; 3-137x compression of S3 storage usage; lower operational overhead.
  35. Thanks.
