Pnuts

  1. PNUTS: Yahoo!’s Hosted Data Serving Platform
     Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
     Yahoo! Research
  2. Motivation
     • Web applications need:
       o Scalability: architectural scalability, scale linearly
       o Geographic scope: data replicas on multiple continents
       o High availability: despite failures, apps will still be able to read data
       o Relaxed consistency needs: tolerate stale or reordered data
  3. Relaxed Consistency
     • Not strict consistency: very expensive.
     • Not eventual consistency either: too weak for some applications.
     • Ex: a photo sharing application
       o U1: Remove someone from the list of people who can view his photos
       o U2: Post spring-break photos
       o If a replica applies U2 before U1, the removed person can still see the new photos.
  4. What is PNUTS?
     • PNUTS is a massively parallel and geographically distributed database system for Yahoo!’s web applications.
     • Its architecture is based on record-level, asynchronous geographic replication, and on a guaranteed message-delivery service rather than a persistent log.
  5. System architecture
     [Figure: system architecture diagram.]
  6. System architecture
     • Storage units
       o Each stores several hundred tablets; a tablet is usually several hundred megabytes.
     • Routers
       o The router stores an interval mapping, which defines the boundaries of each tablet and also maps each tablet to a storage unit.
     • Tablet controller
       o Routers contain only a cached copy of the interval mapping. The mapping is owned by the tablet controller.
     • YMB (Yahoo Message Broker)
       o Topic-based pub/sub system
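The router's interval mapping can be pictured as a sorted list of tablet boundaries searched by binary search. The following is a minimal sketch under that assumption; the `Router` class, the boundary keys, and the storage unit names are hypothetical and are not taken from the slides.

```python
import bisect


class Router:
    """Hypothetical cached interval mapping: tablet boundaries -> storage unit."""

    def __init__(self, boundaries, storage_units):
        # boundaries[i] is the inclusive lower key bound of tablet i;
        # tablet i covers keys in [boundaries[i], boundaries[i + 1]).
        if len(boundaries) != len(storage_units):
            raise ValueError("need exactly one storage unit per tablet")
        if boundaries != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = boundaries
        self.storage_units = storage_units

    def locate(self, key):
        # Binary search for the last boundary that is <= key.
        i = bisect.bisect_right(self.boundaries, key) - 1
        if i < 0:
            raise KeyError(f"key {key!r} is below the first tablet boundary")
        return i, self.storage_units[i]


# Four tablets spread over three storage units (illustrative values only).
router = Router(["", "g", "n", "t"], ["su1", "su3", "su2", "su1"])
tablet, unit = router.locate("grape")
```

Because the router holds only a cached copy, a real system also needs a way to refresh the mapping from the tablet controller when a lookup turns out to be stale; that path is omitted here.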
  7. Yahoo Message Broker
     • Distributed publish-subscribe service.
     • Guarantees delivery once a message is published.
     • Messages are asynchronously propagated to different regions and applied to their replicas.
  8. Types of Table
     [Figure: types of table.]
  9. Tablet splitting and balancing
     • Each storage unit has many tablets (horizontal partitions of the table).
     • Tablets may grow over time.
     • Overfull tablets split.
     • A storage unit may become a hotspot.
     • Shed load by moving tablets to other servers.
     [Figure: storage units and the tablets they hold.]
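As a rough illustration of the two operations on this slide, the sketch below splits an overfull tablet at its median key and moves one tablet from the most loaded storage unit to the least loaded one. The function names, the record-count threshold, and the tablet-count load measure are assumptions made for the example; the slides do not say how PNUTS chooses split points or measures load.

```python
def split_tablet(records, max_records):
    """Split a tablet (dict of key -> value) at its median key if overfull."""
    if len(records) <= max_records:
        return [records]
    keys = sorted(records)
    mid = keys[len(keys) // 2]
    low = {k: v for k, v in records.items() if k < mid}
    high = {k: v for k, v in records.items() if k >= mid}
    return [low, high]


def shed_load(assignment):
    """Move one tablet from the busiest storage unit to the least busy one.

    `assignment` maps a storage unit name to the list of tablet ids it holds.
    Load is approximated by tablet count (an assumption for this sketch).
    """
    busiest = max(assignment, key=lambda su: len(assignment[su]))
    idlest = min(assignment, key=lambda su: len(assignment[su]))
    if len(assignment[busiest]) - len(assignment[idlest]) > 1:
        assignment[idlest].append(assignment[busiest].pop())
    return assignment


halves = split_tablet({"a": 1, "b": 2, "c": 3, "d": 4}, max_records=3)
placement = shed_load({"su1": ["t1", "t2", "t3"], "su2": []})
```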
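  10. Query processing
  11. Accessing data
      [Figure: numbered steps 1-4. "Get key k" is forwarded (steps 1 and 2) to one of three storage units (SU), and "Record for key k" is returned (steps 3 and 4).]
  12. Bulk read
      [Figure: the client sends the key set {k1, k2, … kn} (step 1) to a scatter/gather engine, which issues "Get k1", "Get k2", "Get k3" (step 2) to the storage units (SU).]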
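The bulk read path can be sketched as a scatter/gather step: group the requested keys by storage unit, query the units in parallel, and merge the partial results. This is only an illustration; the in-memory `STORAGE_UNITS` table and the `locate`, `fetch`, and `bulk_get` helpers are hypothetical stand-ins for the router and storage units.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory storage units, keyed by name.
STORAGE_UNITS = {
    "su1": {"k1": "a", "k4": "d"},
    "su2": {"k2": "b"},
    "su3": {"k3": "c"},
}


def locate(key):
    """Stand-in for the router lookup: find the storage unit holding a key."""
    for name, data in STORAGE_UNITS.items():
        if key in data:
            return name
    raise KeyError(key)


def fetch(su_name, keys):
    """One request to one storage unit for all requested keys it holds."""
    data = STORAGE_UNITS[su_name]
    return {k: data[k] for k in keys}


def bulk_get(keys):
    # Scatter: group the keys by the storage unit that owns them.
    by_su = defaultdict(list)
    for k in keys:
        by_su[locate(k)].append(k)
    # Query each storage unit in parallel, then gather the partial results.
    results = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda item: fetch(*item), by_su.items()):
            results.update(partial)
    return results


records = bulk_get(["k1", "k2", "k3", "k4"])
```

Grouping by storage unit first means each unit receives a single request regardless of how many of the keys it holds.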
  13. Per-record timeline consistency
      • All replicas of a given record apply all updates to the record in the same order.
  14. Per-record timeline consistency
      • An example sequence of updates to a record
      • 3 kinds of events: insert, update and delete.
      • One replica is designated as the master.
      • Generation: each new insert starts a new generation. Version: each update creates a new version.
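One way to picture per-record timeline consistency is a master that stamps every write with a (generation, version) pair and replicas that apply the resulting log strictly in order, possibly lagging behind. The sketch below assumes that versions restart at 1 on each insert and that a delete also consumes a version number; both are assumptions for illustration, and the `Master` and `Replica` classes are hypothetical.

```python
class Master:
    """Hypothetical record master that orders all writes to one record."""

    def __init__(self):
        self.generation = 0
        self.version = 0
        self.log = []  # ordered updates, delivered to every replica

    def write(self, op, value=None):
        if op == "insert":
            self.generation += 1  # each new insert starts a new generation
            self.version = 1      # assumption: versions restart at 1
        else:
            self.version += 1     # each update (or delete) is a new version
        seqno = (self.generation, self.version)
        self.log.append((seqno, op, value))
        return seqno


class Replica:
    """Applies the master's log in order; may lag but never reorders."""

    def __init__(self):
        self.applied = 0  # number of log entries applied so far
        self.seqno = None
        self.value = None

    def catch_up(self, log, upto=None):
        upto = len(log) if upto is None else upto
        for seqno, op, value in log[self.applied:upto]:
            self.seqno = seqno
            self.value = None if op == "delete" else value
        self.applied = max(self.applied, upto)


master = Master()
master.write("insert", "a")
master.write("update", "b")
master.write("delete")
master.write("insert", "c")

lagging, current = Replica(), Replica()
lagging.catch_up(master.log, upto=2)  # stale, but on the same timeline
current.catch_up(master.log)
```

The lagging replica holds an older version, yet every state it exposes is one the master actually produced, in the same order.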
  15. Consistency model
      • Goal: make it easier for applications to reason about updates and cope with asynchrony.
      • Web applications typically manipulate one record at a time.
      [Figure: timeline of one record in Generation 1, from "Record inserted" through a series of updates and a delete, with versions v. 1 to v. 8 along the time axis.]
  16. Consistency model: Read-any
      • Read-any: returns a possibly stale version of the record.
      [Figure: the same v. 1 to v. 8 timeline, with two stale versions and the current version marked; the read may return any of them.]
  17. Consistency model: Read-latest
      • Read-latest: returns the latest copy of the record that reflects all writes that have succeeded.
      [Figure: the same timeline; the read returns the current version.]
  18. Consistency model: Read-critical(required version)
      • Read-critical: returns a version of the record that is strictly newer than, or the same as, the required version.
      [Figure: the same timeline with "Read ≥ v. 6"; versions older than v. 6 are not acceptable.]
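The three read calls can be contrasted with a small model that keeps a master copy and a possibly stale local replica of one record. This is a sketch only: the `ReplicatedRecord` class and its method names are hypothetical, replication is modeled as an explicit `replicate()` step, and falling back to the master in `read_critical` is an assumption about how a too-stale replica might be handled.

```python
class ReplicatedRecord:
    """One record with a master copy and a possibly stale local replica."""

    def __init__(self):
        self.master = (0, None)  # (version, value)
        self.local = (0, None)

    def write(self, value):
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def replicate(self):
        # Asynchronous propagation, modeled here as an explicit step.
        self.local = self.master

    def read_any(self):
        # Cheapest read: whatever the local replica has, possibly stale.
        return self.local

    def read_latest(self):
        # Must reflect all successful writes, so consult the master copy.
        return self.master

    def read_critical(self, required_version):
        # The local copy is acceptable only if it is new enough.
        if self.local[0] >= required_version:
            return self.local
        return self.master


record = ReplicatedRecord()
v1 = record.write("photo list A")
record.replicate()
v2 = record.write("photo list B")  # not yet replicated locally
```

A typical use of read-critical is for a user to read their own write: the version returned by `write` is passed back as the required version.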
  19. Consistency model: Test-and-set-write(required version)
      • This call performs the requested write to the record if and only if the present version of the record is the same as the required version.
      [Figure: the v. 1 to v. 8 timeline with "Write if = v. 7"; the current version is no longer v. 7, so the call returns ERROR.]
  20. Consistency model
      • Mechanism: per-record mastership
      [Figure: the same "Write if = v. 7" timeline, ending in ERROR.]
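Test-and-set write behaves like an optimistic concurrency check performed at the record's master. The sketch below is illustrative: `MasterRecord`, `VersionMismatch`, and `append_item` are hypothetical names, and the retry loop shows one way a client might build a read-modify-write on top of the call.

```python
class VersionMismatch(Exception):
    """Raised when the record's current version is not the required one."""


class MasterRecord:
    """Hypothetical master copy of one record."""

    def __init__(self, value):
        self.version = 1
        self.value = value

    def read_latest(self):
        return self.version, self.value

    def test_and_set_write(self, required_version, new_value):
        # Write if and only if nobody has written since required_version.
        if self.version != required_version:
            raise VersionMismatch(
                f"required v.{required_version}, current v.{self.version}"
            )
        self.value = new_value
        self.version += 1
        return self.version


def append_item(record, item, retries=3):
    """Read-modify-write built on test-and-set, retrying on conflicts."""
    for _ in range(retries):
        version, items = record.read_latest()
        try:
            return record.test_and_set_write(version, items + [item])
        except VersionMismatch:
            continue  # someone else wrote first; re-read and try again
    raise VersionMismatch("gave up after repeated conflicts")


rec = MasterRecord(["a"])
new_version = append_item(rec, "b")
```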
  21. Consistency levels
      • Eventual consistency
        o Transactions:
          • Alice changes status from “Sleeping” to “Awake”
          • Alice changes location from “Home” to “Work”
        o Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
        o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Sleeping) → (Alice, Work, Awake)
        o Region 2 applies “Work” before “Awake”, so an “invalid” state is visible there; the final state is consistent.
  22. Consistency levels
      • Timeline consistency
        o Transactions:
          • Alice changes status from “Sleeping” to “Awake”
          • Alice changes location from “Home” to “Work”
        o Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
        o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake)
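The difference between the two levels can be reproduced by replaying Alice's two updates in different orders. In this sketch, `states_seen` is a hypothetical helper that lists every state a region exposes while applying updates; the timeline-consistent region is modeled as applying them in the master's order.

```python
def states_seen(initial, updates):
    """Return every state a region exposes while applying updates in order."""
    state = dict(initial)
    seen = [tuple(state.values())]
    for field, value in updates:
        state[field] = value
        seen.append(tuple(state.values()))
    return seen


initial = {"name": "Alice", "location": "Home", "status": "Sleeping"}
awake = ("status", "Awake")
work = ("location", "Work")

# Region 1 is where the updates are made, in this order.
region1 = states_seen(initial, [awake, work])

# Eventual consistency: Region 2 may apply the updates in a different order.
region2_eventual = states_seen(initial, [work, awake])

# Timeline consistency: Region 2 applies them in the same order as Region 1.
region2_timeline = states_seen(initial, [awake, work])

# States Region 2 exposes that never existed on Region 1's timeline.
invalid = [s for s in region2_eventual if s not in region1]
```

Both orders converge on the same final state, but only the reordered replay exposes (Alice, Work, Sleeping), a state that never existed at the region where the updates were made.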
  23. Experiments
  24. Experimental setup
      • Production PNUTS code
        o Enhanced with ordered table type
      • Three PNUTS regions
        o 2 west coast, 1 east coast
        o 5 storage units, 2 message brokers, 1 router
      • Workload parameters
        o Request rate: 1200-3600 requests/second
        o Read/write mix: 0-50% writes
        o Locality: 80%
  25. Inserts
      • Inserts required:
        o 75.6 ms per insert in West 1 (tablet master),
        o 131.5 ms per insert into the non-master West 2, and
        o 315.5 ms per insert into the non-master East.
      • These results show the expected effect: the cost of inserting is significantly higher if the insert is initiated in a non-master region that is far away from the tablet master.
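To put the three latencies on a common scale, the short calculation below expresses each region's insert latency relative to the tablet master's region. The numbers are the ones reported on the slide; the variable names are illustrative.

```python
# Insert latencies in milliseconds, as reported on the slide.
insert_latency_ms = {
    "West 1 (tablet master)": 75.6,
    "West 2 (non-master)": 131.5,
    "East (non-master)": 315.5,
}

baseline = insert_latency_ms["West 1 (tablet master)"]

# Slowdown of each region relative to inserting at the tablet master.
slowdown = {
    region: latency / baseline
    for region, latency in insert_latency_ms.items()
}

for region, factor in slowdown.items():
    print(f"{region}: {factor:.1f}x the master-region latency")
```

Inserting from West 2 costs roughly 1.7 times the master-region latency, and inserting from East roughly 4.2 times.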
  26. 10% writes by default
  27. Lessons learned (1)
      • Simpler is better than clever
        o Clever approaches are hard to implement, test, debug and maintain
      • Incremental is better than big-bang
  28. Lessons learned (2)
      • Non-algorithmic challenges can be hard
        o Dealing with network config, legacy software and requirements, the “corporate way,” multiple stakeholders…
      • Researchers should get their hands dirty
        o Being a part of shipping a real system can radically readjust your worldview
        o Write some test cases to understand system complexity
