Zen: Pinterest's Graph Storage Service


This talk goes over the design motivation for Zen and describes its internals, including the API, type system and HBase backend. Filmed at

Xun Liu is a software engineer on the infrastructure team at Pinterest. He has worked in many areas and currently focuses on storage and caching solutions. Raghavendra Prabhu is the engineering manager for the infrastructure team at Pinterest, which is responsible for core backend infrastructure including storage systems, caching, the service framework and core business logic.

  2. 2. Presented at QCon San Francisco
  3. 3. Raghavendra Prabhu (RVP) Zen: Pinterest’s Graph Storage Service
  4. 4. “Given how robust the messenger is on day one, it’s surprising to learn that Pinterest built the entire product in three months.” - The Verge
  5. 5. What does it take to do this consistently? Many different things •Hire the best •Culture •Focus Infrastructure that doesn’t get in the way
  6. 6. Challenge for Infrastructure
  7. 7. Our approach •Make it part of the team mission statement •Design systems with ‘move fast’ in mind •Separation of concerns: feature vs reliability
  8. 8. Persistent Storage Even with a distributed database, app needs to deal with: •Schema design •Fault tolerance •Capacity management •Performance tuning
  9. 9. Solution 1: UserMetaStore Storage-as-a-Service: Key-value Thrift API on top of HBase Features: •Key partitioning to balance load •Master-slave clusters, semi-automatic failover •Speculative execution •Multi-tenancy with traffic isolation
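One way to picture the "speculative execution" bullet is a hedged read: ask the primary cluster first and, if it has not answered within a short delay, also ask a replica and take whichever reply arrives first. The slides do not say how UserMetaStore implements this, so the sketch below is only an assumption. The function name `speculative_get`, the `hedge_after_s` parameter and the callables standing in for cluster clients are all hypothetical.

```python
import concurrent.futures as cf
from typing import Callable


def speculative_get(primary: Callable[[str], str],
                    backup: Callable[[str], str],
                    key: str,
                    hedge_after_s: float = 0.05) -> str:
    """Read from `primary`; if it is slow, race it against `backup`."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(primary, key)
        try:
            # Fast path: the primary answers before the hedge delay expires.
            return first.result(timeout=hedge_after_s)
        except cf.TimeoutError:
            pass
        # Slow path: issue the same read to the backup and take the first
        # request that completes (simplification: an error from the first
        # finisher is raised instead of waiting for the other request).
        second = pool.submit(backup, key)
        done, _pending = cf.wait([first, second],
                                 return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        # Do not block the caller on the slower request.
        pool.shutdown(wait=False)
```

The hedge delay trades extra load on the replica for lower tail latency; a real service would also cap how many requests may be hedged at once.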
  10. 10. Storage-as-a-service is a great step forward, but can we do better?
  11. 11. Example: Messages Data Model Conversation Message 1 Message 2 Message N User User Participates Contains
  12. 12. Realization •These object models closely resemble a graph •Objects are nodes, edges represent relationships •Typical needs: • retrieve data for a node or edge • get all outgoing edges from a node • get all incoming edges from a node • count incoming or outgoing edges for a node
  13. 13. Enter Zen! •Provides a graph data model instead of key-value •Automatically creates necessary indexes •Materializes counts for efficient querying •Implemented on top of HBase, but can plug in other backends
  14. 14. Why the name Zen? •Data model inspired by Facebook’s TAO •But internally a very different system •Zen: • “evolution of Buddhism under Taoist conditions” • “simplified version of Taoism” • basically Pinterest’s take on the TAO idea :)
  15. 15. What Zen is NOT •NOT a full-fledged graph database •NO advanced graph operations •Basically an object-relationship data model on top of existing databases to simplify app development
  16. 16. Zen API Nodes: • addNode, removeNode, getNode • Node id: globally unique 64-bit integer • Example node: ID 123; Prop 1 = Val 1; Prop 2 = Val 2
  17. 17. Zen API Edges: • addEdge, removeEdge, getEdge • Edge Ref: (edgeType, fromId, toId) • Score for ordering • Example edge: Edge Ref (120, 123, 4567); Prop 1 = Val 1; Prop 2 = Val 2
  18. 18. Zen API Edge Queries: • getEdges, countEdges, removeEdges struct EdgeQuery { 1: required NodeId nodeId; 2: required EdgeDirection direction; 3: optional TypeId edgeType; }
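To make the node, edge and edge-query calls above concrete, here is a minimal in-memory sketch in Python. It is not Zen's implementation. Only the operation names (addNode, addEdge, getEdges, countEdges), the edge ref (edgeType, fromId, toId), the score and the three EdgeQuery fields come from the slides; the class `InMemoryGraph`, the snake_case method names and the dict-based storage are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class EdgeDirection(Enum):
    OUT = "out"
    IN = "in"


@dataclass(frozen=True)
class EdgeQuery:
    node_id: int                     # required on the slide
    direction: EdgeDirection         # required on the slide
    edge_type: Optional[int] = None  # optional; None matches any type


EdgeRef = Tuple[int, int, int]  # (edge_type, from_id, to_id)


class InMemoryGraph:
    """Hypothetical stand-in for the Zen service, backed by plain dicts."""

    def __init__(self) -> None:
        self.nodes: Dict[int, dict] = {}
        self.edges: Dict[EdgeRef, Tuple[int, dict]] = {}

    def add_node(self, node_id: int, props: dict) -> None:
        self.nodes[node_id] = dict(props)

    def get_node(self, node_id: int) -> Optional[dict]:
        return self.nodes.get(node_id)

    def add_edge(self, ref: EdgeRef, score: int,
                 props: Optional[dict] = None) -> None:
        self.edges[ref] = (score, dict(props or {}))

    def get_edges(self, query: EdgeQuery) -> List[EdgeRef]:
        """Return matching edge refs ordered by ascending score."""
        matches = []
        for (etype, from_id, to_id), (score, _props) in self.edges.items():
            anchor = from_id if query.direction is EdgeDirection.OUT else to_id
            if anchor == query.node_id and query.edge_type in (None, etype):
                matches.append((score, (etype, from_id, to_id)))
        return [ref for _score, ref in sorted(matches)]

    def count_edges(self, query: EdgeQuery) -> int:
        return len(self.get_edges(query))
```

For example, after `g.add_edge((120, 123, 4567), score=2)`, the query `EdgeQuery(123, EdgeDirection.OUT)` returns that edge and `EdgeQuery(4567, EdgeDirection.IN, 120)` counts it as one incoming edge. This toy scans every edge per query; the later slides show that Zen instead keeps indexes and materialized counts.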
  19. 19. Zen API Property Indexes •Unique index •Ensures a property value is unique across all nodes of a type •Non-unique index •Allows retrieval by property value •Works for both nodes and edges
  20. 20. Zen API Type System •Declare node and edge types •Specify type schema, e.g. unique and non-unique index properties •Fully online: no deploy, no config •Internally implemented on top of Zen itself!
  21. 21. Illustration: Messages on Zen Id:1234 Id:2345 Id:3456 Type: Participates Type: Contains Type: Conversation Started: 12 Aug 2014 08:00 Header: “Great pin!” Pin Id: 10001 [non-unique] Type: User Name: “Ben Smith” [unique] Status: Active Type: Message Sent: 12 Aug 2014 08:00 Text: “Great pin!”
  22. 22. Zen: Current Usage Products: • smart feed, messages, network news, interest graph and other upcoming features Numbers: • ~10 clusters • 100,000+ requests per second at peak • Over 5 million HBase operations per second
  23. 23. Xun Liu Internals and Production Learnings
  24. 24. Zen Backends •HBase backend implemented in fall 2013 •Currently working on MySQL backend •Other potential backends in future
  25. 25. HBase Data Model Overview Data
  26. 26. HBase Data Model Overview Data col1 col2 row-key-1 val1 val2
  27. 27. HBase Data Model Overview Data col1 col2 col3 row-key-1 val1 val2 row-key-2 val3 val4
  28. 28. HBase Data Model Overview Data col1 col2 col3 col4 row-key-1 val1 val2 row-key-2 val3 val4 row-key-3 val5
  29. 29. Zen - Property Data Columns: type, name, score, distance • Row 12345 (node): type = 10, name = Ben Smith • Row 12345-20-67890 (edge): score = 1000, distance = 1 mile
  30. 30. Zen - Property Index Data ID <hash>-unique-10-name=ben smith 12345 <hash>-nonuniq-10-lastname=smith-12345 <hash>-nonuniq-10-lastname=smith-67890
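The row keys on this slide can be sketched as plain string builders. The slide does not say what `<hash>` is computed from, so the sketch assumes a short MD5 prefix of the key body without the trailing node id, which keeps every entry for the same property value under one common prefix. The helper names are hypothetical.

```python
import hashlib


def _salt(body: str) -> str:
    # Assumption: a short hash prefix spreads index rows across regions.
    return hashlib.md5(body.encode("utf-8")).hexdigest()[:4]


def unique_index_key(type_id: int, prop: str, value: str) -> str:
    """Row key whose cell value would hold the single owning node id."""
    body = f"unique-{type_id}-{prop}={value}"
    return f"{_salt(body)}-{body}"


def non_unique_index_prefix(type_id: int, prop: str, value: str) -> str:
    """Common prefix of all rows that share one property value."""
    body = f"nonuniq-{type_id}-{prop}={value}"
    return f"{_salt(body)}-{body}-"


def non_unique_index_key(type_id: int, prop: str, value: str,
                         node_id: int) -> str:
    """One row per (value, node id); the id lives in the key itself."""
    return non_unique_index_prefix(type_id, prop, value) + str(node_id)
```

Under this layout, a lookup by a unique property is a single-row get, while a lookup by a non-unique property becomes a prefix scan that reads the node ids off the ends of the matching row keys.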
  31. 31. Zen - Edge Score Index Data 12345-out-20-1000-67890 12345-out-20-1001-67891 12345-in-30-990-67892 12345-in-30-991-67893
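Because HBase keeps rows sorted by key, a layout like the one above lets "edges of node N, in direction D, of type T, ordered by score" be answered with one prefix scan. The sketch below emulates that with a sorted Python list. One assumption goes beyond the slide: the numeric fields are zero-padded to a fixed width so that string order matches numeric order (the slide shows unpadded decimals, and the real byte encoding is not given). The function names are hypothetical.

```python
import bisect
from typing import List

WIDTH = 20  # assumption: enough decimal digits for an unsigned 64-bit id


def _field(n: int) -> str:
    return str(n).zfill(WIDTH)


def score_index_key(node_id: int, direction: str, edge_type: int,
                    score: int, other_id: int) -> str:
    return "-".join([_field(node_id), direction, _field(edge_type),
                     _field(score), _field(other_id)])


def scan_prefix(sorted_keys: List[str], prefix: str) -> List[str]:
    """Emulate a prefix scan over a table sorted by row key."""
    start = bisect.bisect_left(sorted_keys, prefix)
    rows = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        rows.append(key)
    return rows


def edges_by_score(sorted_keys: List[str], node_id: int, direction: str,
                   edge_type: int) -> List[int]:
    """Ids at the far end of each edge, in ascending score order."""
    prefix = "-".join([_field(node_id), direction, _field(edge_type)]) + "-"
    return [int(key.rsplit("-", 1)[1])
            for key in scan_prefix(sorted_keys, prefix)]
```

With the four sample rows from the slide, the outgoing type-20 scan for node 12345 yields 67890 then 67891, and the incoming type-30 scan yields 67892 then 67893.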
  32. 32. Zen - Edge Count Data Column: Count • Row 12345-out-20: count = 2 • Row 12345-in-30: count = 4
  33. 33. Status - Soft Delete New Features
  34. 34. Built-in Cache New Features Zen Cache HBaseClient Zen HBaseClient Cache Before After
  35. 35. Namespace New Features Node Namespace 1 Edge Index Node Namespace 2 Edge Index
  36. 36. New Features •Online type schema change •Optional reverse edge •Optional edge count •Retrieval of subset of properties •Descending edge score
  37. 37. Performance Work Demanding workloads need special tuning • Inserting 1 million edges per second • Excessive HLog (WAL) flushes
  38. 38. Performance Work Batching • Client-side batching: bulk edge insertion • Zen server-side batching: buffer edits across clients and flush them together • Reduced HLog (WAL) flushes by orders of magnitude
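The server-side batching idea can be sketched as a small buffer that collects edits from many callers and hands them to the storage layer in one call, so the write-ahead log is flushed once per batch instead of once per edit. This is an illustration under assumptions, not Zen's code: `BatchingWriter`, `flush_fn` and `max_batch` are hypothetical names, and a real server would also flush on a timer.

```python
import threading
from typing import Callable, List, Tuple

Edit = Tuple[str, bytes]  # (row key, value)


class BatchingWriter:
    """Buffers edits from many callers and writes them in one batch."""

    def __init__(self, flush_fn: Callable[[List[Edit]], None],
                 max_batch: int = 100) -> None:
        self._flush_fn = flush_fn    # e.g. a bulk put against the backend
        self._max_batch = max_batch
        self._buffer: List[Edit] = []
        self._lock = threading.RLock()  # re-entrant: put() calls flush()

    def put(self, row_key: str, value: bytes) -> None:
        with self._lock:
            self._buffer.append((row_key, value))
            if len(self._buffer) >= self._max_batch:
                self.flush()

    def flush(self) -> None:
        with self._lock:
            if self._buffer:
                batch, self._buffer = self._buffer, []
                self._flush_fn(batch)
```

With `max_batch=3`, seven `put` calls trigger two backend writes of three edits each, and a final `flush()` sends the remaining edit.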
  39. 39. Performance Work Memory vs. Performance • Bloom filter • reduces disk seeks • memory cost: 1 byte per row • Block size • the smaller the block size, the better the random-access performance • memory cost: a larger block index
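The "1 byte per row" figure is in line with the standard Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2, where p is the target false-positive rate. The sketch below evaluates it; the 1% rate is an assumption chosen for illustration, since the slides do not state the rate used in production.

```python
import math


def bloom_bits_per_key(false_positive_rate: float) -> float:
    """Optimal Bloom filter size, in bits per key, for a target error rate."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def bloom_memory_bytes(num_rows: int,
                       false_positive_rate: float = 0.01) -> float:
    """Approximate filter size in bytes for `num_rows` row keys."""
    return num_rows * bloom_bits_per_key(false_positive_rate) / 8
```

At a 1% false-positive rate the formula gives a little under 10 bits per key, so a billion rows need roughly 1.2 GB of filter memory, which is the trade-off the slide describes against saved disk seeks.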
  40. 40. Performance Work CPU vs. Data Size • Encoding • FAST_DIFF: effective at reducing data size, CPU-intensive • PREFIX: less effective at reducing size, less CPU-intensive • Compression • SNAPPY, LZO, GZ, etc.
  41. 41. Performance Work Ability to tune the storage engine for a specific workload Zen production setup • Dedicated Zen cluster • Namespace in a shared Zen cluster
  42. 42. Data Consistency Add an edge 1. CAS create the edge row and properties 2. CAS create the unique index if any 3. Create non-unique index if any 4. Create edge score index for outgoing direction 5. Create edge score index for incoming direction 6. Increment edge count for outgoing direction 7. Increment edge count for incoming direction
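The seven steps above can be sketched against a toy key-value store to show why a failure partway through leaves partial state behind. Everything here is an assumption for illustration: `ToyStore` stands in for the HBase client, row keys omit the `<hash>` prefix for readability, and the rollback on a unique-index conflict is only one possible reading of "manual rollback" on a later slide.

```python
from typing import Dict, Iterable


class ToyStore:
    """In-memory stand-in for a store with CAS and atomic counters."""

    def __init__(self) -> None:
        self.rows: Dict[str, object] = {}
        self.counters: Dict[str, int] = {}

    def check_and_put(self, key: str, value: object) -> bool:
        # Compare-and-set: create the row only if it does not exist yet.
        if key in self.rows:
            return False
        self.rows[key] = value
        return True

    def put(self, key: str, value: object) -> None:
        self.rows[key] = value

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)

    def increment(self, key: str, delta: int = 1) -> None:
        self.counters[key] = self.counters.get(key, 0) + delta


def add_edge(store: ToyStore, edge_type: int, from_id: int, to_id: int,
             score: int, props: dict,
             unique_props: Iterable[str] = (),
             non_unique_props: Iterable[str] = ()) -> bool:
    edge_key = f"{from_id}-{edge_type}-{to_id}"
    written = []  # rows created so far, undone if a later CAS fails

    # Step 1: CAS-create the edge row and its properties.
    if not store.check_and_put(edge_key, dict(props, score=score)):
        return False
    written.append(edge_key)

    # Step 2: CAS-create unique index rows; roll back on conflict.
    for prop in unique_props:
        key = f"unique-{edge_type}-{prop}={props[prop]}"
        if not store.check_and_put(key, edge_key):
            for done in reversed(written):
                store.delete(done)
            return False
        written.append(key)

    # Step 3: non-unique index rows carry the edge key in the row key.
    for prop in non_unique_props:
        store.put(f"nonuniq-{edge_type}-{prop}={props[prop]}-{edge_key}", b"")

    # Steps 4-5: score index rows for both directions.
    store.put(f"{from_id}-out-{edge_type}-{score}-{to_id}", b"")
    store.put(f"{to_id}-in-{edge_type}-{score}-{from_id}", b"")

    # Steps 6-7: materialized counts for both directions.
    store.increment(f"{from_id}-out-{edge_type}")
    store.increment(f"{to_id}-in-{edge_type}")
    return True
```

Each step is a separate write, so a crash between step 4 and step 7, for instance, leaves index rows without matching counts. That gap is what the next slides address.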
  43. 43. Distributed transaction or not?
  44. 44. Data Consistency Stay on top of data inconsistencies • Manual rollback in Zen server • Offline jobs (Dr Zen) to scan and fix inconsistencies • Tools to debug and fix one-off inconsistency
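An offline repair pass of the kind described above could, for example, recompute edge counts from the edge score index and overwrite any stored count that disagrees. The slides do not describe how Dr Zen works internally, so this is a guess at one such check. The function names are hypothetical and the keys use the unsalted `node-direction-type-score-other` layout from the earlier slide.

```python
from collections import Counter
from typing import Dict, Iterable


def recount_edges(score_index_keys: Iterable[str]) -> Counter:
    """Derive the expected edge counts from the score index rows."""
    expected: Counter = Counter()
    for key in score_index_keys:
        node_id, direction, edge_type, _score, _other = key.split("-")
        expected[f"{node_id}-{direction}-{edge_type}"] += 1
    return expected


def repair_counts(score_index_keys: Iterable[str],
                  stored_counts: Dict[str, int]) -> Dict[str, int]:
    """Fix stored counts in place; return the corrections that were made."""
    expected = recount_edges(score_index_keys)
    fixes = {}
    for key in set(expected) | set(stored_counts):
        want = expected.get(key, 0)
        if stored_counts.get(key, 0) != want:
            fixes[key] = want
    stored_counts.update(fixes)
    return fixes
```

Given the four sample score index rows and a deliberately stale stored count of 4 for `12345-in-30`, the pass reports and applies a single correction that sets that count to 2. Running it a second time finds nothing to fix.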
  45. 45. Future Work •Dr Zen (make it more efficient) •Other backends: MySQL, etc •Distributed transactions •Open source!