PNUTS: Yahoo!’s Hosted Data Serving Platform
Outline:
• What does Yahoo! need?
• Consistency Levels
• What is PNUTS?
• Features
• Contributions
• FUNCTIONALITY
• SYSTEM ARCHITECTURE
• Experiments
• Future Work
What does Yahoo! need?
• Scalability (e.g., Flickr and del.icio.us)
• Response time and geographic scope
• High availability and fault tolerance
• Relaxed consistency guarantees
Characteristics of Web traffic
• Simple query needs
• Manipulation of one record at a time
• Relaxed consistency
Consistency Levels
• Eventual consistency
o Transactions:
• Alice changes status from “Sleeping” to “Awake”
• Alice changes location from “Home” to “Work”
o Region 1: (Alice, Home, Sleeping) → Awake → (Alice, Home, Awake) → Work → (Alice, Work, Awake)
o Region 2: (Alice, Home, Sleeping) → Work → (Alice, Work, Sleeping) → Awake → (Alice, Work, Awake)
o The final state is consistent, but an “invalid” state, (Alice, Work, Sleeping), is visible in Region 2.
Consistency Levels
• Timeline consistency
o Transactions:
• Alice changes status from “Sleeping” to “Awake”
• Alice changes location from “Home” to “Work”
o Region 1: (Alice, Home, Sleeping) → Awake → (Alice, Home, Awake) → Work → (Alice, Work, Awake)
o Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake), applying the same updates in the same order (Awake, then Work)
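The two Alice scenarios can be replayed in a few lines. This is a toy model, not PNUTS code: `wake_up`, `go_to_work` and `replay` are hypothetical names, and each replica is reduced to the list of states a reader there could observe.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a record is a (name, location, status) tuple and each
# update rewrites one field. Illustrative only, not PNUTS code.
Record = Tuple[str, str, str]
Update = Callable[[Record], Record]


def wake_up(r: Record) -> Record:
    return (r[0], r[1], "Awake")


def go_to_work(r: Record) -> Record:
    return (r[0], "Work", r[2])


def replay(start: Record, updates: List[Update]) -> List[Record]:
    """Return every state a reader at this replica could observe."""
    states = [start]
    for update in updates:
        states.append(update(states[-1]))
    return states


start = ("Alice", "Home", "Sleeping")
issued = [wake_up, go_to_work]  # order in which Alice issued the updates

region1 = replay(start, issued)
# Eventual consistency: Region 2 may receive the updates in a different order.
region2_eventual = replay(start, list(reversed(issued)))
# Timeline consistency: every replica applies the updates in the same order.
region2_timeline = replay(start, issued)

print(region1[-1] == region2_eventual[-1])                 # True
print(("Alice", "Work", "Sleeping") in region2_eventual)  # True
print(("Alice", "Work", "Sleeping") in region2_timeline)  # False
```

Both orders converge on (Alice, Work, Awake); only the shared order rules out the intermediate (Alice, Work, Sleeping).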
What is PNUTS?
• PNUTS is a massively parallel and geographically distributed database system for Yahoo!’s web applications.
Features
1. Data model and features (scatter-gather, asynchronous notification, bulk loading)
2. Fault tolerance
3. Pub-sub message system protocol (for geographically distant replicas)
4. Asynchronous writes to multiple copies around the world
5. Record-level mastering
6. Flexible access: hashed or ordered tables, indexes, views; flexible schemas
7. Centrally managed
8. Delivery of data management as a hosted service
Contributions
1. An architecture based on record-level, asynchronous geographic replication
2. A consistency model that lies between general serializability and eventual consistency
3. A careful choice of features to include or exclude
4. Delivery of data management as a hosted service
FUNCTIONALITY
• Data and Query Model
• Consistency Model: Hiding the Complexity of Replication
• Notification
• Bulk Load
Data and Query Model
Data representation
- Table of records with attributes
- Additional data types: Blob
- Flexible schemas
- Point access vs. range access
- Hash tables vs. ordered tables
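The hash-versus-ordered distinction can be illustrated with a minimal in-memory sketch. The class and method names below are hypothetical, not the PNUTS API: both tables support point access by primary key, but only the ordered table keeps keys sorted, so only it can answer a range query without scanning every record.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class HashTable:
    """Point access only: records are located by key, with no useful key order."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def set(self, key: str, value: dict) -> None:
        self.rows[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self.rows.get(key)


class OrderedTable(HashTable):
    """Point access plus range access: primary keys are also kept in sorted order."""

    def __init__(self) -> None:
        super().__init__()
        self.keys: List[str] = []

    def set(self, key: str, value: dict) -> None:
        if key not in self.rows:
            bisect.insort(self.keys, key)  # keep the key list sorted on insert
        super().set(key, value)

    def scan(self, start: str, end: str) -> List[Tuple[str, dict]]:
        """Return records with start <= key < end, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


listings = OrderedTable()
for sku, price in [("lamp-02", 30), ("chair-07", 45), ("lamp-01", 25), ("desk-03", 120)]:
    listings.set(sku, {"price": price})

print(listings.get("desk-03"))                        # {'price': 120}
print([k for k, _ in listings.scan("lamp", "lamq")])  # ['lamp-01', 'lamp-02']
```

On a `HashTable`, the same "all lamps" query would have to filter every entry in `rows`.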
Consistency Model
PNUTS provides a consistency model that lies between the two extremes of general serializability and eventual consistency.
Web applications typically manipulate one record at a time, while different records may have activity with different geographic locality.
- PNUTS provides per-record timeline consistency: all replicas of a given record apply all updates to the record in the same order.
Consistency Model
Per-record Timeline Consistency
Consistency Model
API calls
Read-any: Returns a possibly stale version of the record.
[Figure: timeline of one record's versions, v.1 through v.8, in generation 1; v.8 is the current version and the earlier versions are stale. Read-any may return a stale version.]
Consistency Model
Read-latest: Returns the latest copy of the record that reflects all writes that have succeeded.
[Figure: the same version timeline; read-latest returns the current version, v.8.]
Consistency Model
Read-critical(required version): Returns a version of the record that is strictly newer than, or the same as, the required version.
[Figure: the same version timeline; a "read ≥ v.6" request may be served v.6, v.7, or v.8.]
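The three read calls differ only in how fresh the returned version must be. The sketch below is a toy model under stated assumptions: one master copy plus one asynchronously updated local copy, versions as plain integers (generations ignored), and hypothetical method names. Falling back to the master in `read_critical` is one plausible strategy, not necessarily the one PNUTS uses.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (version number, value)


class ReplicatedRecord:
    """Toy record with a master copy and one asynchronously updated local copy."""

    def __init__(self) -> None:
        self.master: List[Version] = []
        self.local: List[Version] = []

    def write(self, value: str) -> int:
        # All writes go through the master, which assigns the next version.
        version = len(self.master) + 1
        self.master.append((version, value))
        return version

    def replicate(self, upto: int) -> None:
        # Asynchronous replication: the local copy holds versions 1..upto, in order.
        self.local = self.master[:upto]

    def read_any(self) -> Version:
        # Cheapest read: whatever the local copy has, possibly stale.
        return self.local[-1]

    def read_latest(self) -> Version:
        # Must reflect every successful write, so consult the master copy.
        return self.master[-1]

    def read_critical(self, required: int) -> Version:
        # Serve locally if the local copy is new enough; otherwise use the master.
        if self.local and self.local[-1][0] >= required:
            return self.local[-1]
        return self.master[-1]


rec = ReplicatedRecord()
for i in range(1, 9):
    rec.write(f"value-{i}")
rec.replicate(upto=6)  # the local copy has applied v.1 through v.6 so far

print(rec.read_any())        # (6, 'value-6')  stale
print(rec.read_latest())     # (8, 'value-8')
print(rec.read_critical(6))  # (6, 'value-6')  local copy is new enough
print(rec.read_critical(7))  # (8, 'value-8')  falls back to the master
```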
Consistency Model
Test-and-set-write(required version): Performs the requested write to the record if and only if the present version of the record is the same as the required version.
[Figure: the same version timeline; "write if = v.7" fails with an ERROR because the current version is v.8.]
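Test-and-set-write is the building block for optimistic read-modify-write cycles. The sketch below assumes a single in-process record guarded by a lock; `VersionedRecord`, `VersionMismatch` and `read_modify_write` are hypothetical names, and the retry loop is a common usage pattern rather than something the slides prescribe.

```python
import threading
from typing import Callable, Tuple


class VersionMismatch(Exception):
    """The record's current version differs from the required version."""


class VersionedRecord:
    def __init__(self, value: int) -> None:
        self._lock = threading.Lock()
        self.version = 1
        self.value = value

    def read_latest(self) -> Tuple[int, int]:
        with self._lock:
            return self.version, self.value

    def test_and_set_write(self, required_version: int, value: int) -> int:
        # Write only if the record is still at exactly `required_version`.
        with self._lock:
            if self.version != required_version:
                raise VersionMismatch(
                    f"current v.{self.version}, required v.{required_version}")
            self.version += 1
            self.value = value
            return self.version


def read_modify_write(rec: VersionedRecord, fn: Callable[[int], int],
                      retries: int = 1000) -> int:
    """Optimistic concurrency control: re-read and retry on a version conflict."""
    for _ in range(retries):
        version, value = rec.read_latest()
        try:
            return rec.test_and_set_write(version, fn(value))
        except VersionMismatch:
            continue  # another writer got in first; try again
    raise RuntimeError("gave up after too many conflicts")


counter = VersionedRecord(0)


def worker() -> None:
    for _ in range(100):
        read_modify_write(counter, lambda n: n + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read_latest())  # (401, 400): no increment was lost

try:
    counter.test_and_set_write(1, 99)  # stale required version
except VersionMismatch as err:
    print("ERROR:", err)
```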
Notification
- Trigger-like notifications are important for applications
such as ad serving, which must invalidate cached copies of
ads when the advertising contract expires.
- Allow the user to subscribe to the stream of updates on a
table.
Bulk Load
Necessary for applications such as comparison shopping,
which upload large blocks of new sale listings into the
database every day. Bulk inserts can be done in parallel to
multiple storage units for fast loading.
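Parallel bulk insertion amounts to partitioning the input by destination storage unit and loading each partition concurrently. In this sketch the tablet boundaries, unit names and `bulk_load` helper are all hypothetical, and an in-memory list stands in for each storage unit; PNUTS' actual bulk-load path may differ.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str]

# Hypothetical layout: tablet boundaries and the storage unit holding each tablet.
BOUNDARIES = ["g", "n", "t"]  # tablets: keys < "g", ["g", "n"), ["n", "t"), >= "t"
TABLET_TO_UNIT = ["su-1", "su-2", "su-3", "su-1"]
storage: Dict[str, List[Row]] = {unit: [] for unit in set(TABLET_TO_UNIT)}


def unit_for(key: str) -> str:
    return TABLET_TO_UNIT[bisect.bisect_right(BOUNDARIES, key)]


def load_batch(unit: str, rows: List[Row]) -> int:
    # Stand-in for one batched insert on a single storage unit.
    storage[unit].extend(sorted(rows))
    return len(rows)


def bulk_load(rows: List[Row]) -> int:
    # Partition the input by destination storage unit ...
    batches: Dict[str, List[Row]] = {}
    for row in rows:
        batches.setdefault(unit_for(row[0]), []).append(row)
    # ... then load every unit's batch in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        return sum(pool.map(load_batch, batches.keys(), batches.values()))


loaded = bulk_load([("apple", "1"), ("kiwi", "2"), ("pear", "3"), ("zebra", "4"), ("banana", "5")])
print(loaded)           # 5
print(storage["su-1"])  # [('apple', '1'), ('banana', '5'), ('zebra', '4')]
```

Each worker writes to a different storage unit, so the batches do not contend with one another.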
SYSTEM ARCHITECTURE
Data Storage and Retrieval
- Storage unit
Stores tablets
Responds to get(), scan() and set() requests.
- Tablet controller
Owns the mapping of tablets to storage units
Routers poll it periodically to get mapping updates.
Performs load balancing and recovery.
- Router
Determines which tablet contains the record
Determines which storage unit has the tablet
Interval mapping: binary search over the tablet boundaries (similar to searching the root node of a B+ tree).
Because routers use a cached copy of the mapping, the tablet controller does not become a bottleneck.
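The router lookup can be sketched with Python's `bisect` module. The `Router` class, boundary values and unit names are hypothetical, and the periodic polling of the tablet controller is omitted. For hash tables, the key is first hashed to an integer and the same interval mapping is applied to the hash space; MD5 truncated to 16 bits is an arbitrary choice made only for this sketch.

```python
import bisect
import hashlib
from typing import Any, List


class Router:
    """Hypothetical router holding a cached copy of the interval mapping."""

    def __init__(self, boundaries: List[Any], units: List[str]) -> None:
        # boundaries[i] is the lowest key of tablet i + 1; tablet i lives on units[i].
        assert len(units) == len(boundaries) + 1
        self.boundaries = boundaries
        self.units = units

    def tablet_for(self, key: Any) -> int:
        # Binary search for the interval (tablet) that encloses the key.
        return bisect.bisect_right(self.boundaries, key)

    def unit_for(self, key: Any) -> str:
        return self.units[self.tablet_for(key)]


def hash16(key: str) -> int:
    """Hash tables: map the key to a 16-bit integer, then reuse the same mapping."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")


ordered = Router(["g", "n", "t"], ["su-1", "su-2", "su-3", "su-1"])
print(ordered.tablet_for("kiwi"), ordered.unit_for("kiwi"))  # 1 su-2

hashed = Router([0x4000, 0x8000, 0xC000], ["su-1", "su-2", "su-3", "su-4"])
print(hashed.unit_for(hash16("alice")))  # depends on the hash value
```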
Replication and Consistency
PNUTS uses asynchronous replication to keep update latency low. Replication is carried by the Yahoo! Message Broker (YMB), a publish/subscribe system developed at Yahoo!
Yahoo! Message Broker
- Topic-based publish/subscribe system
- Used for logging and replication
- PNUTS + YMB = Sherpa data services platform
- Data updates are considered committed once they are published to YMB.
- Updates are propagated asynchronously to the other regions after publishing.
- Messages are purged after they have been applied to all replicas.
- Per-record mastership mechanism.
Consistency via YMB and mastership
- Mastership is assigned on a record-by-record basis.
- All updates to a record are directed to its master.
- Different records in the same table can be mastered in different clusters.
- Mastership is based on write-request locality.
- Each record stores the identity of its master as metadata.
- A tablet master enforces primary-key constraints, preventing multiple records with the same primary key from being inserted in different regions.
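Per-record mastership can be modelled as "look up the master in the record's metadata, forward the write there, publish it, and apply it everywhere later". This toy sketch makes strong simplifying assumptions: one global FIFO log stands in for YMB's per-topic delivery, mastership never migrates, tablet masters and failures are ignored, and `Cluster`, `write` and `propagate` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Entry = Tuple[str, int, str]  # (value, version, master region)


class Cluster:
    """Toy model of per-record mastership on top of an ordered update log."""

    def __init__(self, regions: List[str], masters: Dict[str, str]) -> None:
        # Every region holds a full replica; each record starts at version 1 and
        # carries its master region as metadata.
        self.replicas: Dict[str, Dict[str, Entry]] = {
            r: {key: ("", 1, m) for key, m in masters.items()} for r in regions
        }
        self.log: Deque[Tuple[str, Entry]] = deque()  # stand-in for the message broker

    def write(self, from_region: str, key: str, value: str) -> str:
        # Look up the record's master in its metadata and forward the write there.
        master = self.replicas[from_region][key][2]
        _, version, _ = self.replicas[master][key]
        entry = (value, version + 1, master)  # the master assigns the next version
        self.replicas[master][key] = entry
        self.log.append((key, entry))  # the write counts as committed once in the log
        return master

    def propagate(self) -> None:
        # Asynchronously apply published updates to all replicas, then purge them.
        while self.log:
            key, entry = self.log.popleft()
            for replica in self.replicas.values():
                if replica[key][1] < entry[1]:
                    replica[key] = entry


c = Cluster(["west1", "west2", "east"], {"alice": "west1", "bob": "east"})
print(c.write("east", "alice", "Awake"))  # west1: forwarded to alice's master
print(c.replicas["east"]["alice"])        # ('', 1, 'west1'): not yet replicated
c.propagate()
print(c.replicas["east"]["alice"])        # ('Awake', 2, 'west1')
```

Because only the master assigns version numbers and the log is delivered in order, every replica applies a record's updates in the same sequence.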
Recovery
- Any committed update is recoverable from a remote replica.
Three-step recovery:
1- The tablet controller requests a copy from a remote (source) replica.
2- A “checkpoint message” is published to YMB so that any in-flight updates are applied to the source tablet.
3- The source tablet is copied to the destination region.
Support for recovery
- Synchronized tablet boundaries
- Tablets split at the same time in all regions (two-phase commit)
- Backup regions within the same geographic area
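The role of the checkpoint message can be shown with a queue of in-flight updates. This is a deliberately small sketch: `recover_tablet` and the `CHECKPOINT` marker are hypothetical, step 1 is only a comment, and updates published after the checkpoint (which the destination would receive through normal replication) are out of scope.

```python
from collections import deque
from typing import Deque, Dict, Tuple, Union

CHECKPOINT = "CHECKPOINT"
Message = Union[Tuple[str, str], str]  # a (key, value) update or the checkpoint marker


def recover_tablet(source: Dict[str, str], inflight: Deque[Message]) -> Dict[str, str]:
    """Rebuild a lost tablet from a remote source replica (toy version)."""
    # Step 1 (not modelled): the tablet controller chooses `source` and requests a copy.
    # Step 2: publish a checkpoint after any in-flight updates, and have the source
    # apply everything that was published before the checkpoint.
    inflight.append(CHECKPOINT)
    while (msg := inflight.popleft()) != CHECKPOINT:
        key, value = msg
        source[key] = value
    # Step 3: copy the now up-to-date source tablet to the destination region.
    return dict(source)


source = {"alice": "Sleeping"}
inflight: Deque[Message] = deque([("bob", "Awake"), ("alice", "Awake")])
print(recover_tablet(source, inflight))  # {'alice': 'Awake', 'bob': 'Awake'}
```

Without the checkpoint, the copy could miss updates that were committed but not yet applied at the source.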
Query Processing
Scatter-gather engine
- Receives a multi-record request
- Splits it into multiple individual requests for single records or tablet scans
- Initiates the requests in parallel
- Gathers the results and passes them to the client
Why a server-side design?
- Avoids having the client issue many parallel requests
- Enables server-side optimization (grouping requests to the same storage unit)
Range scan
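The scatter-gather steps, including the server-side grouping of requests per storage unit, can be sketched with a thread pool. The storage contents, key placement and the `fetch` callback are hypothetical stand-ins; only the scatter, parallel request, and gather structure mirrors the description.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def scatter_gather(keys: List[str],
                   unit_for: Callable[[str], str],
                   fetch: Callable[[str, List[str]], Dict[str, str]]) -> Dict[str, str]:
    # Scatter: group the requested keys so each storage unit gets one request.
    groups: Dict[str, List[str]] = {}
    for key in keys:
        groups.setdefault(unit_for(key), []).append(key)
    # Issue the per-unit requests in parallel, then gather the partial results.
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for partial in pool.map(fetch, groups.keys(), groups.values()):
            results.update(partial)
    return results


# Hypothetical storage units and key placement, for illustration only.
UNITS = {"su-1": {"apple": "1", "zebra": "4"}, "su-2": {"kiwi": "2"}}
PLACEMENT = {"apple": "su-1", "zebra": "su-1", "kiwi": "su-2"}
calls: List[str] = []
calls_lock = threading.Lock()


def fetch(unit: str, keys: List[str]) -> Dict[str, str]:
    with calls_lock:
        calls.append(unit)  # record one request per storage unit
    return {k: UNITS[unit][k] for k in keys}


print(scatter_gather(["apple", "kiwi", "zebra"], PLACEMENT.__getitem__, fetch))
# {'apple': '1', 'zebra': '4', 'kiwi': '2'}
print(sorted(calls))  # ['su-1', 'su-2']: two requests, not three
```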
Notifications
- Service to notify external systems of updates to data.
Example: keeping a popular keyword search engine index up to date.
- Clients subscribe to all topics (one per tablet) of a table.
- Clients need no knowledge of the tablet organization.
- When a tablet split creates a new topic, clients are subscribed to it automatically.
- Subscriptions of slow notification clients are broken.
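The notification rules above can be captured in a small in-memory model. Everything here is an assumption made for illustration: `NotificationService` and `Subscription` are hypothetical names, in-memory queues stand in for message-broker topics, and the backlog limit that marks a client as "too slow" is an arbitrary threshold.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class Subscription:
    def __init__(self, topics: Set[str]) -> None:
        self.topics = set(topics)  # one topic per tablet of the subscribed table
        self.queue: Deque[Tuple[str, str]] = deque()
        self.active = True


class NotificationService:
    """Hypothetical sketch: per-table subscriptions over per-tablet topics."""

    def __init__(self, max_backlog: int = 3) -> None:
        self.tablets: Dict[str, Set[str]] = {}         # table -> tablet topics
        self.subs: Dict[str, List[Subscription]] = {}  # table -> subscriptions
        self.max_backlog = max_backlog

    def subscribe(self, table: str) -> Subscription:
        # The client names only the table; it needs no knowledge of its tablets.
        sub = Subscription(self.tablets[table])
        self.subs.setdefault(table, []).append(sub)
        return sub

    def split(self, table: str, new_topic: str) -> None:
        # A tablet split creates a new topic; current subscribers get it automatically.
        self.tablets[table].add(new_topic)
        for sub in self.subs.get(table, []):
            sub.topics.add(new_topic)

    def publish(self, table: str, topic: str, update: str) -> None:
        for sub in self.subs.get(table, []):
            if sub.active and topic in sub.topics:
                sub.queue.append((topic, update))
                if len(sub.queue) > self.max_backlog:
                    # The client is too slow to keep up: break its subscription.
                    sub.active = False
                    sub.queue.clear()


svc = NotificationService(max_backlog=3)
svc.tablets["ads"] = {"ads-t1", "ads-t2"}
sub = svc.subscribe("ads")
svc.split("ads", "ads-t3")
svc.publish("ads", "ads-t3", "ad 42 expired")
print(sub.queue.popleft())  # ('ads-t3', 'ad 42 expired')
```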
Experiments
Experimental setup
Metric: latency
Compared: hash and ordered tables
Clusters: three-region PNUTS cluster, 2 regions in the west and 1 in the east
Parameters
Experiments
Inserting Data
■ One region (West 1) is the tablet master
■ Hash table: 99 clients (33 per region); ordered table (MySQL): 60 clients
■ 1 million records, 1/3 per region
■ Result:
– Hash: West 1: 75.6 ms; West 2: 131.5 ms; East: 315.5 ms
– Ordered: West 1: 33 ms; West 2: 105.8 ms; East: 324.5 ms
■ Lesson: the MySQL-based ordered table is faster than the hash table, although it is more vulnerable to contention
■ More observations
Experiments
Varying Load
■ Request rate varies between 1,200 and 3,600 requests/second, with 10% writes
■ Result:
Experiments
Varying Read/Write Ratio
■ Write ratio varies between 0 and 50%
■ Fixed 1,200 requests/second
Experiments
Varying Number of Storage Units
■ Storage units per region vary from 2 to 5
■ 10% writes, 1,200 requests/second
Experiments
Varying Size of Range Scans
■ Range scan size between 0.01% and 0.1% of the table
■ Ordered table only
■ 30 clients vs. 300 clients
Bottlenecks
• Disk seek capacity on storage units
• Message Brokers
Different PNUTS customers are assigned different clusters of storage units and message broker machines, but they can share routers and tablet controllers.
Future Work
• Consistency
– Referential integrity
– Bundled update
– Relaxed consistency
• Data Storage and Retrieval
– Fair sharing of storage units and message brokers
• Query Processing
– Query optimization: Maintain statistics
– Expansion of query language: join/aggregation
– Batch-query processing
• Indexes and Materialized views
Thank you
Any questions?

More Related Content

What's hot

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelKernel TLV
 
리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDBManjong Han
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloudinside-BigData.com
 
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹InfraEngineer
 
NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현noerror
 
서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드KwangSeob Jeong
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅NAVER D2
 
[160402_데브루키_박민근] UniRx 소개
[160402_데브루키_박민근] UniRx 소개[160402_데브루키_박민근] UniRx 소개
[160402_데브루키_박민근] UniRx 소개MinGeun Park
 
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...HostedbyConfluent
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
GCGC- CGCII 서버 엔진에 적용된 기술 (1)
GCGC- CGCII 서버 엔진에 적용된 기술 (1)GCGC- CGCII 서버 엔진에 적용된 기술 (1)
GCGC- CGCII 서버 엔진에 적용된 기술 (1)상현 조
 
게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP Adv게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP AdvSeungmo Koo
 
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장SukYun Yoon
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupTyler Wishnoff
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXzznate
 
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019devCAT Studio, NEXON
 
[모두의연구소] 쫄지말자딥러닝
[모두의연구소] 쫄지말자딥러닝[모두의연구소] 쫄지말자딥러닝
[모두의연구소] 쫄지말자딥러닝Modulabs
 
Resilient Design Using Queue Theory
Resilient Design Using Queue TheoryResilient Design Using Queue Theory
Resilient Design Using Queue TheoryScyllaDB
 
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...HostedbyConfluent
 

What's hot (20)

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB
 
Inside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable CloudInside Microsoft's FPGA-Based Configurable Cloud
Inside Microsoft's FPGA-Based Configurable Cloud
 
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
[MeetUp][1st] 오리뎅이의_쿠버네티스_네트워킹
 
HBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and CompactionHBase Accelerated: In-Memory Flush and Compaction
HBase Accelerated: In-Memory Flush and Compaction
 
NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현NDC12_Lockless게임서버설계와구현
NDC12_Lockless게임서버설계와구현
 
서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드서버 아키텍처 이해를 위한 프로세스와 쓰레드
서버 아키텍처 이해를 위한 프로세스와 쓰레드
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
[160402_데브루키_박민근] UniRx 소개
[160402_데브루키_박민근] UniRx 소개[160402_데브루키_박민근] UniRx 소개
[160402_데브루키_박민근] UniRx 소개
 
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
Monitoring Kafka without instrumentation using eBPF with Antón Rodríguez | Ka...
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
GCGC- CGCII 서버 엔진에 적용된 기술 (1)
GCGC- CGCII 서버 엔진에 적용된 기술 (1)GCGC- CGCII 서버 엔진에 적용된 기술 (1)
GCGC- CGCII 서버 엔진에 적용된 기술 (1)
 
게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP Adv게임서버프로그래밍 #2 - IOCP Adv
게임서버프로그래밍 #2 - IOCP Adv
 
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장
프로그래머가 몰랐던 멀티코어 CPU 이야기 13, 14장
 
Apache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX GroupApache Kylin Meetup: Berlin - With OLX Group
Apache Kylin Meetup: Berlin - With OLX Group
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
홍성우, 게임 서버의 목차 - 시작부터 출시까지, NDC2019
 
[모두의연구소] 쫄지말자딥러닝
[모두의연구소] 쫄지말자딥러닝[모두의연구소] 쫄지말자딥러닝
[모두의연구소] 쫄지말자딥러닝
 
Resilient Design Using Queue Theory
Resilient Design Using Queue TheoryResilient Design Using Queue Theory
Resilient Design Using Queue Theory
 
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...
Extending the Apache Kafka® Replication Protocol Across Clusters, Sanjana Kau...
 

Similar to Pnuts yahoo!’s hosted data serving platform

Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...DataWorks Summit
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Bob Pusateri
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applicationsDing Li
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99ashish61_scs
 
Understand oracle real application cluster
Understand oracle real application clusterUnderstand oracle real application cluster
Understand oracle real application clusterSatishbabu Gunukula
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Shi Shao Feng
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesEvan Chan
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Database replication
Database replicationDatabase replication
Database replicationArslan111
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingApache Apex
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Bob Pusateri
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rulesOleg Tsal-Tsalko
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingApache Apex
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Dharma Shukla
 

Similar to Pnuts yahoo!’s hosted data serving platform (20)

Pnuts Review
Pnuts ReviewPnuts Review
Pnuts Review
 
PNUTS
PNUTSPNUTS
PNUTS
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
 
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
Select Stars: A SQL DBA's Introduction to Azure Cosmos DB (SQL Saturday Orego...
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
 
Understand oracle real application cluster
Understand oracle real application clusterUnderstand oracle real application cluster
Understand oracle real application cluster
 
Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0Realtime olap architecture in apache kylin 3.0
Realtime olap architecture in apache kylin 3.0
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and Kubernetes
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Database replication
Database replicationDatabase replication
Database replication
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019Cosmos DB at VLDB 2019
Cosmos DB at VLDB 2019
 

Recently uploaded

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...Elena Simperl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform EngineeringJemma Hussein Allen
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»QADay
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsVlad Stirbu
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
 

Recently uploaded (20)

GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Ransomware Mallox [EN].pdf
Ransomware         Mallox       [EN].pdfRansomware         Mallox       [EN].pdf
Ransomware Mallox [EN].pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
  • 1. PNUTS: Yahoo!’s Hosted Data Serving Platform
  • 2. Outline: • What does Yahoo! need? • Consistency Levels • What is PNUTS? • Features • Contributions • Functionality • System Architecture • Experiments • Future Work
  • 3. What does Yahoo! need? • Scalability (e.g., Flickr and del.icio.us) • Response time and geographic scope • High availability and fault tolerance • Relaxed consistency guarantees. Characteristics of web traffic: • Simple query needs • Manipulation of one record at a time • Relaxed consistency
  • 4. Consistency Levels: Eventual consistency • Transactions: • Alice changes her status from “Sleeping” to “Awake” • Alice changes her location from “Home” to “Work” • Region 1 applies “Awake” first: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake) • Region 2 applies “Work” first: (Alice, Home, Sleeping) → (Alice, Work, Sleeping) → (Alice, Work, Awake) • The final state is consistent, but an “invalid” state, (Alice, Work, Sleeping), is visible in Region 2.
  • 5. Consistency Levels: Timeline consistency • Transactions: • Alice changes her status from “Sleeping” to “Awake” • Alice changes her location from “Home” to “Work” • Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake) • Region 2: (Alice, Home, Sleeping) → (Alice, Work, Awake) • Both regions apply the updates in the same order, so no “invalid” state is visible.
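The contrast between the two slides can be reproduced with a small simulation. This is a sketch, not PNUTS code: each update replaces one field of Alice's record, and the helper names `wake_up`, `go_to_work`, and `apply_in_order` are hypothetical. Applying the updates in different orders per region mimics eventual consistency; applying them in the issued order everywhere mimics timeline consistency.

```python
# Simulation of Alice's record as (name, location, status); not PNUTS code.
def wake_up(record):
    name, location, _ = record
    return (name, location, "Awake")

def go_to_work(record):
    name, _, status = record
    return (name, "Work", status)

def apply_in_order(start, updates):
    """Apply updates one by one; return every state a reader could observe."""
    states = [start]
    for update in updates:
        states.append(update(states[-1]))
    return states

start = ("Alice", "Home", "Sleeping")
issued = [wake_up, go_to_work]  # the order in which Alice issued the updates

# Eventual consistency: regions may apply the updates in different orders.
region1 = apply_in_order(start, [wake_up, go_to_work])
region2 = apply_in_order(start, [go_to_work, wake_up])

# Timeline consistency: every region applies the updates in the issued order.
timeline_states = set(apply_in_order(start, issued))

print(region1[-1] == region2[-1])  # True: the final states agree
print([s for s in region2 if s not in timeline_states])  # [('Alice', 'Work', 'Sleeping')]
```

Under timeline consistency a lagging replica may still show an older state, but only one from `timeline_states`.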
  • 6. What is PNUTS? • PNUTS is a massively parallel, geographically distributed database system for Yahoo!’s web applications.
  • 7. Features: 1 Data model and features (scatter-gather, asynchronous notification, bulk loading) 2 Fault tolerance 3 Pub/sub message system protocol (for geographically distant replicas) 4 Asynchronous writes to multiple copies around the world
  • 8. Features (continued): 5 Record-level mastering 6 Flexible access: hashed or ordered tables, indexes, views; flexible schemas 7 Centrally managed 8 Delivery of data management as a hosted service
  • 9. Contributions: 1 An architecture based on record-level, asynchronous geographic replication 2 A consistency model 3 A careful choice of features to include or exclude 4 Delivery of data management as a hosted service
  • 10. Functionality: • Data and Query Model • Consistency Model: Hiding the Complexity of Replication • Notification • Bulk Load
  • 11. Data and Query Model • Data representation: a table of records with attributes, plus an additional data type, Blob • Flexible schemas • Point access vs. range access • Hash tables vs. ordered tables
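The point-access vs. range-access distinction can be illustrated with a minimal in-memory sketch. It assumes records are simple key/value pairs; `OrderedTable` and `hash_table_scan` are hypothetical names, and a sorted list searched with `bisect` stands in for a real ordered storage engine. The sketch shows why an ordered table serves range scans directly, while a hash table must examine every record to answer one.

```python
import bisect

class OrderedTable:
    """Hypothetical in-memory ordered table: records kept sorted by primary key."""

    def __init__(self):
        self.keys = []
        self.values = []

    def _find(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i, i < len(self.keys) and self.keys[i] == key

    def set(self, key, value):
        i, found = self._find(key)
        if found:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        """Point access: one record by primary key."""
        i, found = self._find(key)
        return self.values[i] if found else None

    def scan(self, lo, hi):
        """Range access: records with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[i:j], self.values[i:j]))

def hash_table_scan(table, lo, hi):
    """A hash table (a dict here) keeps no key order, so a range query visits every record."""
    return sorted((k, v) for k, v in table.items() if lo <= k < hi)
```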
  • 12. Consistency Model • PNUTS provides a consistency model between the two extremes of general serializability and eventual consistency. • Web applications typically manipulate one record at a time, while different records may have activity with different geographic locality. • We provide per-record timeline consistency: all replicas of a given record apply all updates to the record in the same order.
  • 14. Consistency Model: API calls • Read-any: Returns a possibly stale version of the record. [Figure: versions v.1 to v.8 of a record, all in generation 1, along a time axis; v.8 is the current version and earlier versions are stale. Read-any may return a stale version.]
  • 15. Consistency Model • Read-latest: Returns the latest copy of the record that reflects all writes that have succeeded. [Figure: the same timeline of v.1 to v.8 in generation 1; read-latest returns the current version, v.8.]
  • 16. Consistency Model • Read-critical(required version): Returns a version of the record that is strictly newer than, or the same as, the required version. [Figure: the same timeline; a request “Read ≥ v.6” is satisfied by v.6 or any newer version.]
  • 17. Consistency Model • Test-and-set-write(required version): Performs the requested write to the record if and only if the present version of the record is the same as the required version. [Figure: the same timeline; “Write if = v.7” returns ERROR because the current version is v.8.]
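The four calls on slides 14 to 17 can be modelled for a single record. This sketch assumes one master copy plus lagging replicas, each stored as a list of values, with version k being the k-th value. `Record`, `propagate`, and `VersionError` are hypothetical names, and read-critical falling back to the master when the local copy is too old is an assumption of this sketch, not something the slides specify.

```python
class VersionError(Exception):
    """Raised by test_and_set_write when the current version differs."""

class Record:
    """Hypothetical model of one record: a master copy and lagging replicas.

    Each copy is a list of values; version k is the k-th value (v.1 first).
    """

    def __init__(self, regions):
        self.master = []
        self.replicas = {region: [] for region in regions}

    def write(self, value):
        self.master.append(value)
        return len(self.master)  # the new version number

    def propagate(self, region, upto):
        """Apply master versions to a replica, in order, up to version `upto`."""
        self.replicas[region] = self.master[:upto]

    @staticmethod
    def _latest(copy):
        return len(copy), (copy[-1] if copy else None)

    def read_any(self, region):
        # The local replica's newest version, possibly stale.
        return self._latest(self.replicas[region])

    def read_latest(self):
        # Reflects every successful write (served by the master here).
        return self._latest(self.master)

    def read_critical(self, region, required):
        local = self.replicas[region]
        if len(local) >= required:
            return self._latest(local)
        return self.read_latest()  # assumption: fall back to the master

    def test_and_set_write(self, required, value):
        if len(self.master) != required:
            raise VersionError(f"current v.{len(self.master)}, required v.{required}")
        return self.write(value)

rec = Record(["west"])
for i in range(1, 9):
    rec.write(f"value {i}")
rec.propagate("west", 5)          # the west replica has only reached v.5
print(rec.read_any("west"))       # (5, 'value 5')
print(rec.read_critical("west", 6))  # (8, 'value 8'), via the master
```

Calling `rec.test_and_set_write(7, ...)` here raises `VersionError`, matching the "Write if = v.7 → ERROR" case on slide 17.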
  • 18. Notification • Trigger-like notifications are important for applications such as ad serving, which must invalidate cached copies of ads when the advertising contract expires. • Users can subscribe to the stream of updates on a table.
  • 19. Bulk Load • Necessary for applications such as comparison shopping, which upload large blocks of new sale listings into the database every day. • Bulk inserts can be done in parallel to multiple storage units for fast loading.
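The parallel bulk insert can be sketched as "group by storage unit, then load each group concurrently". In this sketch, `unit_for_key` and `load_batch` are hypothetical callbacks supplied by the caller, and Python threads stand in for parallel uploads to separate storage units.

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_load(records, unit_for_key, load_batch):
    """Group (key, value) records by storage unit and load each group in parallel.

    Returns a dict mapping each storage unit to whatever load_batch returned.
    """
    batches = {}
    for key, value in records:
        batches.setdefault(unit_for_key(key), []).append((key, value))
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        futures = {unit: pool.submit(load_batch, unit, batch)
                   for unit, batch in batches.items()}
    # Leaving the with-block waits for every batch to finish.
    return {unit: future.result() for unit, future in futures.items()}
```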
  • 23. Data Storage and Retrieval • Storage unit: stores tablets; responds to get(), scan(), and set() requests. • Tablet controller: owns the mapping of tablets to storage units; routers poll it periodically to get mapping updates; performs load balancing and recovery. • Router: determines which tablet contains the record and which storage unit holds that tablet. Interval mapping: binary search of a B+ tree. Because routers work from a cached copy of the mapping, the tablet controller does not become a bottleneck.
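The router's lookup can be sketched with an interval mapping held as a sorted list of tablet boundaries. This is an assumption-laden illustration: `Router`, `refresh`, and the storage-unit names are hypothetical, and `bisect` over a sorted Python list stands in for the B+ tree search named on the slide. For a hash table, the same lookup would be applied to the hash of the key rather than the key itself.

```python
import bisect

class Router:
    """Hypothetical router holding a cached interval mapping.

    boundaries[i] is the smallest key of tablet i + 1; tablet 0 starts at the
    beginning of the key space. tablet_to_unit[i] names tablet i's storage unit.
    """

    def __init__(self, boundaries, tablet_to_unit):
        self.refresh(boundaries, tablet_to_unit)

    def refresh(self, boundaries, tablet_to_unit):
        """Install a newer mapping, e.g. after polling the tablet controller."""
        assert len(tablet_to_unit) == len(boundaries) + 1
        self.boundaries = list(boundaries)
        self.tablet_to_unit = list(tablet_to_unit)

    def tablet_for(self, key):
        # Binary search: the number of boundaries <= key is the tablet index.
        return bisect.bisect_right(self.boundaries, key)

    def storage_unit_for(self, key):
        return self.tablet_to_unit[self.tablet_for(key)]

router = Router(["banana", "grape", "mango"], ["su1", "su2", "su1", "su3"])
print(router.storage_unit_for("cherry"))  # su2
```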
  • 24. Replication and Consistency • Asynchronous replication ensures low-latency updates. • We use the Yahoo! Message Broker (YMB), a publish/subscribe system developed at Yahoo!.
  • 25. Yahoo! Message Broker • Topic-based publish/subscribe system • Used for logging and replication • PNUTS + YMB = Sherpa data services platform • Data updates are considered committed when they are published to YMB. • Updates are propagated asynchronously to different regions after publishing. • A message is purged after it has been applied to all replicas. • Per-record mastership mechanism
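The three broker rules on this slide (commit on publish, asynchronous in-order delivery to each region, purge once every replica has applied a message) can be sketched in a single process. `MessageBroker`, `publish`, and `deliver` are hypothetical names; the real YMB is a distributed service, and this sketch makes no claim about its API.

```python
from collections import deque

class MessageBroker:
    """Hypothetical single-process stand-in for a topic-based pub/sub broker."""

    def __init__(self, regions):
        self.log = deque()                      # (seq, update) pairs not yet purged
        self.next_seq = 0
        self.applied = {r: 0 for r in regions}  # next seq each region still needs

    def publish(self, update):
        """The update counts as committed as soon as it is in the log."""
        seq = self.next_seq
        self.log.append((seq, update))
        self.next_seq += 1
        return seq

    def deliver(self, region, apply):
        """Later, push this region's pending updates to it in publication order."""
        for seq, update in self.log:
            if seq >= self.applied[region]:
                apply(update)
                self.applied[region] = seq + 1
        self._purge()

    def _purge(self):
        """Drop messages that every region has already applied."""
        low = min(self.applied.values())
        while self.log and self.log[0][0] < low:
            self.log.popleft()
```

Delivery is a separate call from publishing, which models the gap between commit and remote application.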
  • 26. Consistency via YMB and Mastership • Mastership is assigned on a record-by-record basis. • All update requests for a record are directed to its master. • Different records in the same table can be mastered in different clusters. • Mastership is chosen based on the locality of write requests. • Each record stores its master as metadata. • A tablet master enforces primary-key constraints, so that multiple values are not inserted for the same primary key.
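Per-record mastership can be sketched as "check the master stored with the record, then apply locally or forward". `Region` and the record layout (a dict with `value`, `version`, and `master`) are hypothetical. The sketch covers only routing: it does not model locality-based reassignment of mastership, and it leaves replication to other regions to the message broker.

```python
class Region:
    """Hypothetical region holding replicas; each record names its master region."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory  # shared dict: region name -> Region
        directory[name] = self
        self.records = {}  # key -> {"value": ..., "version": int, "master": str}

    def write(self, key, value):
        """Apply a write if this region masters the record; otherwise forward it."""
        record = self.records[key]
        if record["master"] != self.name:
            return self.directory[record["master"]].write(key, value)
        record["value"] = value
        record["version"] += 1
        # A real system would now publish the update to the message broker.
        return self.name, record["version"]
```

Two records in the same table can name different masters, so a write to one is handled in the west while a write to the other goes east.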
  • 27. Recovery • Any committed update is recoverable from a remote replica. • Three-step recovery: 1. The tablet controller requests a copy from a remote (source) replica. 2. A “checkpoint message” is published to YMB so that in-flight updates are applied to the source tablet. 3. The source tablet is copied to the destination region. • Support for recovery: synchronized tablet boundaries; tablet splits happen at the same time in every region (using two-phase commit); backup regions within the same region.
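The ordering of the three recovery steps can be sketched with tablets as dicts and the broker as a plain queue of `(key, value)` updates. `recover_tablet` and the `CHECKPOINT` marker are hypothetical. The point the sketch shows is that the copy happens only after the source has applied every update queued before the checkpoint.

```python
CHECKPOINT = object()  # hypothetical marker message

def recover_tablet(source, broker_queue, destination):
    """Rebuild `destination` from a remote `source` replica (dicts of key -> value).

    Step 1 (choosing `source`) is done by the caller, standing in for the
    tablet controller. `broker_queue` holds (key, value) updates that are
    still in flight to the source.
    """
    # Step 2: publish a checkpoint; the source applies everything queued before it.
    broker_queue.append(CHECKPOINT)
    while True:
        message = broker_queue.pop(0)
        if message is CHECKPOINT:
            break
        key, value = message
        source[key] = value
    # Step 3: copy the now up-to-date source tablet to the destination region.
    destination.clear()
    destination.update(source)
    return destination
```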
  • 28. Query Processing • Scatter-gather engine: receives a multi-record request; splits it into multiple individual requests for single records or tablet scans; initiates the requests in parallel; gathers the results and passes them to the client. • Why a server-side design? It avoids many parallel requests from the client, and the server can optimize by grouping requests to the same storage unit. • Range scans
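A minimal scatter-gather sketch: keys are grouped by tablet (the "group requests to the same storage" optimization), one request per tablet is issued in parallel, and the partial results are merged. `tablet_for` and `fetch_from_tablet` are hypothetical callbacks, and threads stand in for parallel requests to storage units.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(keys, tablet_for, fetch_from_tablet):
    """Split a multi-record read by tablet, fetch in parallel, and merge results.

    fetch_from_tablet(tablet, keys) returns a dict of the keys it found.
    Missing keys map to None in the combined answer.
    """
    by_tablet = {}
    for key in keys:
        by_tablet.setdefault(tablet_for(key), []).append(key)  # scatter
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch_from_tablet, tablet, tablet_keys)
                   for tablet, tablet_keys in by_tablet.items()]
        found = {}
        for future in futures:
            found.update(future.result())  # gather
    return {key: found.get(key) for key in keys}
```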
  • 29. Notifications • A service that notifies external systems of updates to data (example: a popular keyword search engine index). • Clients subscribe to all topics (one per tablet) for a table. • Clients need no knowledge of tablet organization. • When a tablet splits, a new topic is created and clients are subscribed to it automatically. • Subscriptions of slow notification clients are broken.
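The subscription rules on this slide can be sketched as follows, assuming one topic per tablet and a fixed backlog limit to decide when a client is "slow". `NotificationService`, `Subscription`, and `max_backlog` are hypothetical names and parameters, not the PNUTS interface.

```python
class Subscription:
    """Queue of updates waiting for one client (hypothetical)."""

    def __init__(self):
        self.pending = []
        self.active = True

class NotificationService:
    """Hypothetical sketch: one topic per tablet; table subscribers get them all."""

    def __init__(self, tablets, max_backlog):
        self.topics = {tablet: [] for tablet in tablets}
        self.max_backlog = max_backlog

    def subscribe_table(self):
        # The client joins every tablet's topic without knowing the tablets.
        sub = Subscription()
        for subscribers in self.topics.values():
            subscribers.append(sub)
        return sub

    def split(self, old_tablet, new_tablet):
        # A split creates a new topic; existing subscribers join it automatically.
        self.topics[new_tablet] = list(self.topics[old_tablet])

    def publish(self, tablet, update):
        for sub in self.topics[tablet]:
            if not sub.active:
                continue
            if len(sub.pending) >= self.max_backlog:
                sub.active = False  # break the subscription of a slow client
            else:
                sub.pending.append(update)
```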
  • 31. Experiments: Experimental setup • Metric: latency • Compared: hash and ordered tables • Clusters: a three-region PNUTS cluster, with two regions in the west and one in the east • Parameters
  • 32. Experiments: Inserting Data ■ One region (West 1) is the tablet master ■ Hash: 99 clients (33 per region); ordered (MySQL): 60 clients ■ 1 million records, 1/3 per region ■ Result: – Hash: West 1: 75.6 ms; West 2: 131.5 ms; East: 315.5 ms – Ordered: West 1: 33 ms; West 2: 105.8 ms; East: 324.5 ms ■ Lesson: the ordered (MySQL) table is faster than the hash table, although more vulnerable to contention ■ More observations
  • 33. Experiments: Varying Load ■ Requests vary between 1,200 and 3,600 requests/second, with 10% writes ■ Result:
  • 34. Experiments: Varying Read/Write Ratio ■ Write ratios vary between 0 and 50% ■ Fixed load of 1,200 requests/second
  • 35. Experiments: Varying Number of Storage Units ■ Storage units per region vary from 2 to 5 ■ 10% writes, 1,200 requests/second
  • 36. Experiments: Varying Size of Range Scans ■ Range scans cover 0.01% to 0.1% of the table ■ Ordered table only ■ 30 clients vs. 300 clients
  • 37. Bottlenecks • Disk seek capacity on storage units • Message brokers • Different PNUTS customers are assigned different clusters of storage units and message broker machines, but they can share routers and tablet controllers.
  • 38. Future Work • Consistency – Referential integrity – Bundled updates – Relaxed consistency • Data Storage and Retrieval – Fair sharing of storage units and message brokers • Query Processing – Query optimization: maintain statistics – Expansion of the query language: join/aggregation – Batch-query processing • Indexes and Materialized Views