PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein,
Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research
Motivation
• Web applications need:
  o Scalability
    -architectural scalability, scale linearly
  o Geographic scope
    -data replicas on multiple continents
  o High availability
    -despite failures, apps can still read data
  o Relaxed consistency needs
    -tolerate stale or reordered data
Relaxed Consistency
• Not strict consistency
  o Very expensive
• Not eventual consistency
  o Ex: a photo sharing application
  o U1: Remove someone from the list of people who
    can view his photos
  o U2: Post spring-break photos
  o If a replica applies U2 before U1, the removed person can
    still see the new photos
What is PNUTS?
• PNUTS is a massively parallel and geographically
  distributed database system for Yahoo!’s web
  applications.
• Its architecture is based on record-level, asynchronous
  geographic replication and the use of a guaranteed
  message-delivery service rather than a persistent log.
System architecture
• Storage Units
  o Each stores several hundred tablets; a tablet is usually
    several hundred megabytes.
• Routers
  o A router stores an interval mapping, which defines the
    boundaries of each tablet and maps each tablet
    to a storage unit.
• Tablet Controller
  o Owns the interval mapping; routers contain only a
    cached copy of it.
• YMB: Yahoo Message Broker
  o Topic-based pub/sub system
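The router's interval mapping can be pictured as a sorted list of tablet boundaries that is searched by key. The following is a minimal sketch, assuming string keys; the names `Router`, `boundaries` and `storage_units` are hypothetical and do not come from PNUTS itself.

```python
from bisect import bisect_right


class Router:
    """Hypothetical router holding a cached interval mapping (illustrative only)."""

    def __init__(self, boundaries, storage_units):
        # boundaries must be sorted; boundaries[i] is the first key of tablet i + 1,
        # so n tablets need n - 1 boundaries.
        if len(storage_units) != len(boundaries) + 1:
            raise ValueError("need exactly one storage unit per tablet")
        self.boundaries = list(boundaries)
        self.storage_units = list(storage_units)

    def tablet_for(self, key):
        # Binary search: the tablet index equals the number of boundaries <= key.
        return bisect_right(self.boundaries, key)

    def storage_unit_for(self, key):
        return self.storage_units[self.tablet_for(key)]


# Four tablets spread over three storage units (one unit holds two tablets).
router = Router(boundaries=["g", "n", "t"],
                storage_units=["SU1", "SU2", "SU3", "SU1"])
print(router.storage_unit_for("apple"))
print(router.storage_unit_for("grape"))
```

Because the mapping is only a cache, a real router would also need a way to refresh it from the tablet controller; that step is left out of this sketch.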
Yahoo Message Broker
• Distributed publish-subscribe service.

• Guarantees delivery once a message is
  published.

• Updates are asynchronously propagated to different regions
  and applied to their replicas.
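The idea of publishing an update once and letting each region apply it later can be sketched with a toy in-memory broker. This is an illustration under stated assumptions, not the YMB API: `MessageBroker`, `publish`, `poll` and `catch_up` are hypothetical names, and the topic name is made up.

```python
from collections import defaultdict


class MessageBroker:
    """Toy topic-based pub/sub broker (hypothetical; not the YMB interface)."""

    def __init__(self):
        self.log = defaultdict(list)    # topic -> messages in publish order
        self.cursor = defaultdict(int)  # (topic, subscriber) -> next index to deliver

    def publish(self, topic, message):
        # The toy broker never discards messages, so a late subscriber still gets them.
        self.log[topic].append(message)

    def poll(self, topic, subscriber):
        # Return every message this subscriber has not seen yet, in publish order.
        start = self.cursor[(topic, subscriber)]
        pending = self.log[topic][start:]
        self.cursor[(topic, subscriber)] = len(self.log[topic])
        return pending


def catch_up(broker, topic, region, replica):
    # Apply pending updates to one region's replica.
    for key, value in broker.poll(topic, region):
        replica[key] = value


broker = MessageBroker()
west, east = {}, {}

broker.publish("tablet-7", ("alice", "Awake"))
catch_up(broker, "tablet-7", "west", west)   # west applies the first update now
broker.publish("tablet-7", ("alice", "Work"))
catch_up(broker, "tablet-7", "east", east)   # east lags, then receives both in order
catch_up(broker, "tablet-7", "west", west)   # west picks up the second update
```

Each region keeps its own cursor, so a slow region delays only itself.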
Types of Table
[Figure: table types (image not preserved)]
Tablet splitting and balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• A storage unit may become a hotspot
• Shed load by moving tablets to other servers
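Splitting and load shedding can be sketched in a few lines. The slides do not say how PNUTS picks a split point or a target server, so the median split and the "least-loaded unit" rule below are assumptions, and both function names are hypothetical.

```python
def split_tablet(keys):
    """Split a sorted list of keys in half; returns (left, right, boundary)."""
    if len(keys) < 2:
        raise ValueError("a tablet needs at least two keys to be split")
    mid = len(keys) // 2
    # keys[mid] becomes the first key of the new right-hand tablet.
    return keys[:mid], keys[mid:], keys[mid]


def move_to_least_loaded(load, hot_unit, tablet_size):
    """Move one tablet of tablet_size from hot_unit to the least-loaded unit."""
    target = min(load, key=load.get)
    if target != hot_unit:
        load[hot_unit] -= tablet_size
        load[target] += tablet_size
    return target


left, right, boundary = split_tablet(["a", "b", "c", "d", "e"])

# Load per storage unit in arbitrary units (for example, megabytes stored).
load = {"SU1": 900, "SU2": 300, "SU3": 500}
target = move_to_least_loaded(load, hot_unit="SU1", tablet_size=200)
```

After a split or a move, the interval mapping held by the tablet controller would have to change as well; that update is outside this sketch.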
Query processing



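Accessing data
1. Client sends “Get key k” to a router
2. Router forwards “Get key k” to the storage unit (SU) that holds the record
3. SU returns the record for key k to the router
4. Router returns the record for key k to the client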
Bulk read
1. Client sends the key set {k1, k2, … kn} to the scatter/gather engine
2. Scatter/gather engine issues Get k1, Get k2, Get k3, … to the storage units (SU)
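A scatter/gather bulk read can be sketched with a thread pool: one Get per key is issued concurrently and the results are collected into a single answer. This is only an illustration; the in-memory `STORAGE_UNITS` table, `locate`, `get` and `bulk_read` are hypothetical stand-ins, and issuing the Gets concurrently is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical storage units: each one is just a dict of the records it holds.
STORAGE_UNITS = {
    "SU1": {"k1": "record-1"},
    "SU2": {"k2": "record-2"},
    "SU3": {"k3": "record-3"},
}


def locate(key):
    # Stand-in for the router lookup: find the unit that holds the key.
    for name, records in STORAGE_UNITS.items():
        if key in records:
            return name
    raise KeyError(key)


def get(key):
    # A single-record read against the owning storage unit.
    return key, STORAGE_UNITS[locate(key)][key]


def bulk_read(keys):
    # Scatter: one Get per key, run concurrently.
    # Gather: collect the (key, record) pairs into one dict.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(get, keys))


result = bulk_read(["k1", "k2", "k3"])
```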
Per-record timeline consistency
• All replicas of a given record apply all updates to
  the record in the same order.
Per-record timeline consistency
• An example sequence of updates to a record
• 3 kinds of events: insert, update and delete
• One replica is assigned as the master
• Generation: incremented by each new insert; version: incremented by each update
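The generation and version bookkeeping can be sketched as a pair of counters kept by the master replica. The `Record` class and its method names are hypothetical, and restarting the version at 1 on each new insert is an assumption made here to match the "v. 1" label on the insert in the timeline figures.

```python
class Record:
    """Illustrative (generation, version) bookkeeping for a single record."""

    def __init__(self):
        self.generation = 0
        self.version = 0
        self.value = None
        self.live = False

    def insert(self, value):
        if self.live:
            raise ValueError("record already exists")
        self.generation += 1   # each new insert starts a new generation
        self.version = 1       # assumption: versions restart within a generation
        self.value, self.live = value, True

    def update(self, value):
        if not self.live:
            raise KeyError("record does not exist")
        self.version += 1      # each update advances the version
        self.value = value

    def delete(self):
        if not self.live:
            raise KeyError("record does not exist")
        self.value, self.live = None, False

    def stamp(self):
        return (self.generation, self.version)


record = Record()
record.insert("v1")
for i in range(7):
    record.update(f"update-{i}")
first_stamp = record.stamp()    # (1, 8): one insert plus seven updates

record.delete()
record.insert("again")
second_stamp = record.stamp()   # (2, 1): a new generation
```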
Consistency model
 • Goal: make it easier for applications to reason about
   updates and cope with asynchrony

 • Web applications typically manipulate one record at a
   time

 [Figure: timeline of one record in Generation 1: record inserted (v. 1), seven updates (v. 2 to v. 8), then delete]
Consistency model
Read-any: Returns a possibly stale version of the record.

[Figure: timeline v. 1 to v. 8 (Generation 1); read-any may return a stale version or the current version, v. 8]
Consistency model
Read-latest: Returns the latest copy of the record that
reflects all writes that have succeeded.

[Figure: timeline v. 1 to v. 8 (Generation 1); read-latest returns the current version, v. 8]
Consistency model
Read-critical(required version): Returns a version of the record that is
strictly newer than, or the same as, the required version.

[Figure: timeline v. 1 to v. 8 (Generation 1); “Read ≥ v.6” may return v. 6, v. 7 or v. 8]
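The three read calls differ only in how fresh a copy they insist on. The sketch below models each regional replica as a (version, value) pair; `ReplicatedRecord` and its method signatures are hypothetical, and falling back to the master replica when the local copy is too old is an assumption rather than a documented PNUTS behavior.

```python
class ReplicatedRecord:
    """Illustrative read calls over per-region replicas of one record."""

    def __init__(self, master, replicas):
        self.master = master      # region whose replica holds the current version
        self.replicas = replicas  # region -> (version, value)

    def read_any(self, region):
        # Cheapest call: whatever the local replica has, possibly stale.
        return self.replicas[region]

    def read_latest(self):
        # Always served from the replica that holds the current version.
        return self.replicas[self.master]

    def read_critical(self, region, required_version):
        version, value = self.replicas[region]
        if version >= required_version:
            return version, value
        # Assumption: fall back to the master when the local copy is too old.
        version, value = self.read_latest()
        if version < required_version:
            raise ValueError("required version does not exist yet")
        return version, value


record = ReplicatedRecord(
    master="west",
    replicas={"west": (8, "current"), "east": (5, "stale")},
)
```

With these replicas, `record.read_critical("east", 6)` cannot be answered by the east copy at v. 5, so the sketch returns the west copy at v. 8.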
Consistency model
Test-and-set-write(required version): Performs the requested write to the record if
and only if the present version of the record is the same
as the required version.

[Figure: timeline v. 1 to v. 8 (Generation 1); “Write if = v.7” returns ERROR because the current version is v. 8]
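Test-and-set-write is a compare-and-swap on the record's version. The sketch below reproduces the “Write if = v.7” case from the slide and then shows a common retry loop built on top of it. `MasterRecord`, `VersionMismatch` and `increment` are hypothetical names, and bumping the version by one per successful write is an assumption.

```python
class VersionMismatch(Exception):
    """Raised when the required version is not the record's present version."""


class MasterRecord:
    def __init__(self, value, version):
        self.value = value
        self.version = version

    def test_and_set_write(self, required_version, new_value):
        # Write only if nobody else has written since required_version was read.
        if self.version != required_version:
            raise VersionMismatch(
                f"required v. {required_version}, current v. {self.version}"
            )
        self.value = new_value
        self.version += 1
        return self.version


def increment(record):
    # Optimistic read-modify-write: re-read and retry whenever the write is rejected.
    while True:
        seen_version, seen_value = record.version, record.value
        try:
            return record.test_and_set_write(seen_version, seen_value + 1)
        except VersionMismatch:
            continue


counter = MasterRecord(value=41, version=8)

try:
    counter.test_and_set_write(7, 0)   # stale: the record is already at v. 8
    outcome = "written"
except VersionMismatch:
    outcome = "ERROR"

new_version = increment(counter)
```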
Consistency model
 • Mechanism: per-record mastership

 [Figure: same timeline, v. 1 to v. 8 (Generation 1); “Write if = v.7” returns ERROR because the current version is v. 8]
Consistency levels
   • Eventual consistency
       o Transactions:
           • Alice changes status from “Sleeping” to “Awake”
           • Alice changes location from “Home” to “Work”
       o Region 1 applies Awake, then Work:
         (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
       o Region 2 applies Work, then Awake:
         (Alice, Home, Sleeping) → (Alice, Work, Sleeping) → (Alice, Work, Awake)
       o Final state consistent, but the “invalid” state (Alice, Work, Sleeping) is visible in Region 2
Consistency levels
   • Timeline consistency
       o Transactions:
           • Alice changes status from “Sleeping” to “Awake”
           • Alice changes location from “Home” to “Work”
       o Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
       o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake)
       o Both regions apply the updates in the same order (Awake, then Work), so no “invalid” state is visible
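The difference between the two consistency levels in the Alice example can be replayed in a few lines. The helper names (`apply_update`, `history`) are hypothetical. In the timeline case the sketch simply feeds both regions the same ordered list, standing in for an order fixed by the record's master.

```python
def apply_update(state, update):
    # Return a new state with one field changed.
    field, value = update
    new_state = dict(state)
    new_state[field] = value
    return new_state


def history(initial, updates):
    # Every state a reader could observe while the updates are applied in order.
    states = [initial]
    for update in updates:
        states.append(apply_update(states[-1], update))
    return states


initial = {"location": "Home", "status": "Sleeping"}
awake = ("status", "Awake")
work = ("location", "Work")
invalid = {"location": "Work", "status": "Sleeping"}

# Eventual consistency: each region applies updates in its own arrival order.
eventual_region1 = history(initial, [awake, work])
eventual_region2 = history(initial, [work, awake])

# Timeline consistency: one order for the record, replayed by every region.
timeline_order = [awake, work]
timeline_region1 = history(initial, timeline_order)
timeline_region2 = history(initial, timeline_order)

print(invalid in eventual_region2)   # True
print(invalid in timeline_region2)   # False
```

A lagging replica under timeline consistency may stop partway along the list, but it never shows a state that is off the list.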
Experiments



Experimental setup
• Production PNUTS code
  o Enhanced with ordered table type

• Three PNUTS regions
  o 2 west coast, 1 east coast
  o 5 storage units, 2 message brokers, 1 router

• Workload parameters
  o Request rate: 1200-3600 requests/second
  o Read:write mix: 0-50% writes
  o Locality: 80%
Inserts
• Time required per insert:
   o 75.6 ms in West 1 (tablet master)
   o 131.5 ms in the non-master West 2
   o 315.5 ms in the non-master East
• As expected, inserting costs significantly more when the insert
  is initiated in a non-master region that is far away
  from the tablet master.
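To put the three measurements on a common scale, the short calculation below expresses each one relative to the insert time at the tablet master. It uses only the numbers reported above; the variable names are illustrative.

```python
# Insert times in milliseconds, as reported on the slide.
latency_ms = {"West 1 (master)": 75.6, "West 2": 131.5, "East": 315.5}

baseline = latency_ms["West 1 (master)"]

# Slowdown factor and extra milliseconds relative to the master region.
slowdown = {region: ms / baseline for region, ms in latency_ms.items()}
extra_ms = {region: ms - baseline for region, ms in latency_ms.items()}

for region in latency_ms:
    print(f"{region}: {slowdown[region]:.2f}x, +{extra_ms[region]:.1f} ms")
```

An insert from West 2 takes roughly 1.7 times as long as one at the master, and an insert from East roughly 4.2 times as long.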
10% writes by default




Lessons learned (1)
• Simpler is better than clever
   o Clever approaches are hard to
     implement, test, debug and maintain

• Incremental is better than big-bang
Lessons learned (2)
• Non-algorithmic challenges can be hard
   o Dealing with network config, legacy software
     and requirements, the “corporate way,” multiple
     stakeholders…

• Researchers should get their hands dirty
   o Being a part of shipping a real system can
     radically readjust your worldview
   o Write some test cases to understand system
     complexity
