Large-scale Web Apps @ Pinterest

Speaker: Varun Sharma (Pinterest)

Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.


  1. Large Scale Web Apps @Pinterest (Powered by Apache HBase), May 5, 2014
  2. What is Pinterest? Pinterest is a visual discovery tool for collecting the things you love and discovering related content along the way.
  3. Scale Challenges (@scale)
      • 100s of millions of pins/repins per month
      • Billions of requests per week
      • Millions of daily active users
      • Billions of pins
      • One of the largest discovery tools on the internet
  4. Storage stack @Pinterest
      • MySQL (manually sharded; sharding logic lives in the app tier)
      • Redis (persistence and cache)
      • Memcache (consistent hashing)
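      The "consistent hashing" note above refers to how cache keys are mapped to Memcache servers so that adding or removing a server only remaps a small fraction of keys. Below is a toy sketch of the idea only; the class name, virtual-node count, and hash choice are illustrative and not Pinterest's actual client code.

        import java.nio.charset.StandardCharsets;
        import java.security.MessageDigest;
        import java.util.SortedMap;
        import java.util.TreeMap;

        // Toy consistent-hash ring: each server owns several points on a ring,
        // and a key is routed to the first server point at or after its hash.
        public class ConsistentHashRing {
            private final TreeMap<Long, String> ring = new TreeMap<>();

            public ConsistentHashRing(Iterable<String> servers, int virtualNodes) throws Exception {
                for (String server : servers) {
                    for (int i = 0; i < virtualNodes; i++) {
                        ring.put(hash(server + "#" + i), server);
                    }
                }
            }

            // Route a cache key to a server; removing a server only remaps its own points.
            public String serverFor(String key) throws Exception {
                SortedMap<Long, String> tail = ring.tailMap(hash(key));
                long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
                return ring.get(point);
            }

            private static long hash(String s) throws Exception {
                byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
                return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                        | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
            }
        }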
  5. Why HBase?
      • High write throughput: unlike MySQL's B-tree, writes never seek on disk
      • Seamless integration with Hadoop
      • Distributed operation: fault tolerance, load balancing, easy to add/remove nodes
      Non-technical reasons:
      • Large, active community
      • Large-scale online use cases
  6. Outline
      • Features powered by HBase
      • SaaS (Storage as a Service): MetaStore, HFile Service (Terrapin)
      • Our HBase setup: optimizing for high availability & low latency
  7. Applications/Features
      • Offline: analytics, search indexing, ETL/Hadoop workflows
      • Online: personalized feeds, rich pins, recommendations
  8. Personalized Feeds
      Why HBase? Write-heavy load due to pin fanout: a user's feed combines recommended pins with pins from the users they follow.
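      To make the fanout concrete, here is a minimal sketch using the standard HBase Java client; the "user_feeds" table, the "f" column family, and the follower-id + reversed-timestamp row key are assumptions for illustration, not Pinterest's actual schema.

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.util.Bytes;

        public class FeedFanout {
            // Fan a new pin out to every follower's feed. HBase absorbs the write
            // burst because writes hit the WAL and memstore rather than seeking
            // into random B-tree pages on disk.
            public static void fanOut(Connection conn, String pinId, long ts,
                                      List<String> followerIds) throws Exception {
                try (Table feeds = conn.getTable(TableName.valueOf("user_feeds"))) {
                    List<Put> puts = new ArrayList<>();
                    for (String follower : followerIds) {
                        // Row key: follower id + reversed timestamp so the newest pins sort first.
                        byte[] row = Bytes.toBytes(follower + ":" + (Long.MAX_VALUE - ts));
                        Put put = new Put(row);
                        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("pin"), Bytes.toBytes(pinId));
                        puts.add(put);
                    }
                    feeds.put(puts); // batched into per-RegionServer RPCs
                }
            }
        }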
  9. Rich Pins
      Why HBase? Bloom filters make negative hits (lookups for keys with no data) cheap.
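      A short sketch of how row-level bloom filters are enabled per column family with the HBase admin API, so a GET for a key with no rich pin data can skip most HFiles; the "rich_pins" table and "d" family names are hypothetical.

        import org.apache.hadoop.hbase.HColumnDescriptor;
        import org.apache.hadoop.hbase.HTableDescriptor;
        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Admin;
        import org.apache.hadoop.hbase.regionserver.BloomType;

        public class RichPinTable {
            // Create a table whose column family keeps row-level bloom filters,
            // making negative lookups (key not present) cheap.
            public static void create(Admin admin) throws Exception {
                HTableDescriptor table = new HTableDescriptor(TableName.valueOf("rich_pins"));
                HColumnDescriptor cf = new HColumnDescriptor("d");
                cf.setBloomFilterType(BloomType.ROW); // filter on the row key
                table.addFamily(cf);
                admin.createTable(table);
            }
        }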
  10. Recommendations
      Why HBase? Seamless data transfer from Hadoop: recommendations are generated on Hadoop clusters and shipped via DistCP jobs to an HBase (Hadoop 2.0) serving cluster.
  11. SaaS (Storage as a Service)
      • Large number of feature requests
      • 1 cluster per feature
      • Scaling with organizational growth
      • Need for “defensive” multi-tenant storage
      • Previous solutions reaching their limits
  12. MetaStore I
      • Key-value store on top of HBase
      • 1 HBase table per feature, with salted keys
      • Pre-split tables
      • Table-level rate limiting (online/offline reads/writes)
      • No scan support
      • Simple client API (a sketch of this API over HBase follows after the next slide):

        string getValue(string feature, string key, boolean online);
        void setValue(string feature, string key, string value, boolean online);
  13. MetaStore II (architecture)
      • Clients issue gets/sets over Thrift to a MetaStore Thrift server, which applies salting and rate limiting
      • Primary and secondary HBase clusters with master/master replication
      • ZooKeeper holds the MetaStore config (rate limits, which cluster is primary) and sends change notifications
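      A minimal sketch of how the MetaStore API from the previous slide could sit on top of HBase with salted keys and one table per feature. The salt-bucket count and column names are assumptions, and rate limiting plus the online/offline routing flag are omitted; this is not the actual MetaStore implementation.

        import org.apache.hadoop.hbase.TableName;
        import org.apache.hadoop.hbase.client.Connection;
        import org.apache.hadoop.hbase.client.Get;
        import org.apache.hadoop.hbase.client.Put;
        import org.apache.hadoop.hbase.client.Result;
        import org.apache.hadoop.hbase.client.Table;
        import org.apache.hadoop.hbase.util.Bytes;

        public class MetaStoreSketch {
            private static final byte[] CF = Bytes.toBytes("m");
            private static final byte[] COL = Bytes.toBytes("v");
            private static final int NUM_SALT_BUCKETS = 64; // assume one bucket per pre-split region

            private final Connection conn;

            public MetaStoreSketch(Connection conn) { this.conn = conn; }

            // Prefix the key with a salt bucket so writes spread evenly across the pre-split regions.
            private static byte[] saltedKey(String key) {
                int bucket = (key.hashCode() & Integer.MAX_VALUE) % NUM_SALT_BUCKETS;
                return Bytes.toBytes(String.format("%02d:%s", bucket, key));
            }

            public String getValue(String feature, String key) throws Exception {
                try (Table table = conn.getTable(TableName.valueOf(feature))) { // one table per feature
                    Result r = table.get(new Get(saltedKey(key)));
                    byte[] v = r.getValue(CF, COL);
                    return v == null ? null : Bytes.toString(v);
                }
            }

            public void setValue(String feature, String key, String value) throws Exception {
                try (Table table = conn.getTable(TableName.valueOf(feature))) {
                    Put put = new Put(saltedKey(key));
                    put.addColumn(CF, COL, Bytes.toBytes(value));
                    table.put(put);
                }
            }
        }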
  14. HFile Service (Terrapin)
      • Solves the bulk-upload problem
      • HBase-backed solution: bulk upload + major compact, with major compactions needed to delete old data
      • Design a solution from scratch using a mashup of HBase components:
        - HFile
        - HBase BlockCache
        - Avoids compactions
        - Low-latency key-value lookups
  15. High-Level Architecture I
      ETL/batch jobs load/reload HFiles via Amazon S3; HFile servers hold multiple HFiles each; a client library/service issues key-value lookups against the HFile servers.
  16. High-Level Architecture II
      • Each HFile server runs 2 processes:
        - Copier: pulls HFiles from S3 to local disk
        - Supershard: serves multiple HFile shards to clients
      • ZooKeeper:
        - Detects live servers
        - Coordinates loading/swapping of new data
        - Enables clients to detect availability of new data
      • Loader module (replaces DistCP):
        - Triggers the new data copy
        - Triggers the swap through ZooKeeper
        - Updates ZooKeeper and notifies clients
      • Client library understands sharding
      • Old data deleted by a background process
  17. Salient Features
      • Multi-tenancy through namespacing
      • Pluggable sharding functions: modulus, range & more (sketched below)
      • HBase BlockCache
      • Multiple clusters for redundancy
      • Speculative execution across clusters for low latency
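      The "pluggable sharding functions" bullet is sketched below for the simplest case, modulus sharding: the client hashes the key, takes it modulo the shard count, and looks up which server owns that shard. In the real system the shard-to-server mapping would come from ZooKeeper and the hash must match whatever partitioner wrote the HFiles; the class below is illustrative only.

        import java.util.List;

        public class ModulusSharding {
            private final List<String> shardOwners; // index = shard id, value = serving host (from ZooKeeper in practice)

            public ModulusSharding(List<String> shardOwners) {
                this.shardOwners = shardOwners;
            }

            // Deterministic hash of the key, reduced modulo the number of shards.
            public int shardFor(byte[] key) {
                int h = 0;
                for (byte b : key) {
                    h = 31 * h + (b & 0xFF);
                }
                return (h & Integer.MAX_VALUE) % shardOwners.size();
            }

            public String serverFor(byte[] key) {
                return shardOwners.get(shardFor(key));
            }
        }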
  18. Setting up for Success
      • Many online use cases/applications
      • Optimize for:
        - Low MTTR (high availability)
        - Low latency (performance)
  19. MTTR - I (DataNode states)
      LIVE → STALE after 20 sec without a heartbeat; STALE → DEAD after a further 9 min 40 sec
      • Stale nodes are avoided:
        - As candidates for reads
        - As candidate replicas for writes
        - During lease recovery
      • Copying of under-replicated blocks starts when a node is marked “dead”
  20. MTTR - II
      Recovery pipeline: failure detection → lease recovery → log split → recover regions, in under 2 minutes total
      • Failure detection: 30 sec ZooKeeper session timeout
      • Lease recovery and log split speedups: HDFS-4721, HDFS-3703 + HDFS-3912
      • Avoid stale nodes at each point of the recovery process
      • Multi-minute timeouts ==> multi-second timeouts
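      The multi-second timeouts on these two slides map onto a handful of HDFS and HBase settings. The values below are taken from the numbers on the slides and are illustrative, not a copy of any production configuration.

        import org.apache.hadoop.conf.Configuration;

        public class RecoveryTimeouts {
            public static Configuration tune(Configuration conf) {
                // Mark a DataNode "stale" after 20 s without a heartbeat, and keep stale
                // nodes out of read pipelines, write pipelines and lease recovery.
                conf.setLong("dfs.namenode.stale.datanode.interval", 20000L);
                conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
                conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);

                // Detect RegionServer failure in ~30 s instead of the multi-minute default.
                conf.setInt("zookeeper.session.timeout", 30000);
                return conf;
            }
        }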
  21. Simulate, Simulate, Simulate
      Simulate “pull the plug” failures and “tail -f” the logs:
      • kill -9 both the DataNode and the RegionServer: causes “connection refused” errors
      • kill -STOP both the DataNode and the RegionServer: causes socket timeouts
      • Blackhole hosts using iptables: connect timeouts + “no route to host”; most representative of AWS failures
  22. Performance
      Configuration tweaks:
      • Small block size, 4K-16K
      • Prefix compression to cache more: when the data is in the key, close to a 4X reduction for some data sets
      • Separate RPC handler threads for reads vs. writes
      • Short-circuit local reads
      • HBase-level checksums (HBASE-5074)
      Hardware:
      • SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
      • Choose based on the limiting factor:
        - Disk space: pick SATA for max GB/$
        - IOPS: pick SSD for max IOPS/$, for clusters with heavy reads or heavy compaction activity
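      A sketch of how some of these tweaks are expressed through the HBase column-family descriptor and the Hadoop/HBase configuration. The 16K block size is one point in the 4K-16K range above, the read/write RPC handler split is omitted, and the values are illustrative.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HColumnDescriptor;
        import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

        public class PerfTweaks {
            // Column-family settings: small blocks for point lookups, plus prefix
            // encoding so more (encoded) blocks fit in the block cache.
            public static HColumnDescriptor tuneFamily(HColumnDescriptor cf) {
                cf.setBlocksize(16 * 1024);                        // 16K instead of the 64K default
                cf.setDataBlockEncoding(DataBlockEncoding.PREFIX); // big win when the data lives in the key
                return cf;
            }

            // Cluster settings: short-circuit local reads and HBase-level checksums (HBASE-5074).
            public static Configuration tuneCluster(Configuration conf) {
                conf.setBoolean("dfs.client.read.shortcircuit", true);
                conf.setBoolean("hbase.regionserver.checksum.verify", true);
                return conf;
            }
        }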
  23. Performance (SSDs)
      HFile read performance:
      • Turn off the block cache for data blocks to reduce GC and heap fragmentation
      • Keep the block cache on for index blocks
      • Increase “dfs.client.read.shortcircuit.streams.cache.size” from 100 to 10,000 (with short-circuit reads)
      • Approx. 3X improvement in read throughput
      Write performance:
      • WAL contention when the client sets AutoFlush=true
      • HBASE-8755
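      A sketch of the SSD-oriented settings above. Disabling the block cache on a column family only stops data blocks from being cached; index (and bloom) blocks remain cached, which is what the slide relies on. The numbers come from the slide itself.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.hbase.HColumnDescriptor;

        public class SsdTuning {
            // With SSDs, re-reading data blocks from disk is cheap, so caching them
            // mostly adds GC pressure and heap fragmentation.
            public static HColumnDescriptor dataBlocksUncached(HColumnDescriptor cf) {
                cf.setBlockCacheEnabled(false);
                return cf;
            }

            // Keep a larger pool of reusable short-circuit read streams.
            public static Configuration tuneShortCircuit(Configuration conf) {
                conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 10000);
                return conf;
            }
        }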
  24. In the Pipeline...
      • Building a graph database on HBase
      • Disaster recovery: snapshot + incremental backup + restore
      • Off-heap cache: reduce GC overhead and make better use of hardware
      • Read-path optimizations
  25. And we are hiring!
