PNUTS: Yahoo!’s Hosted Data
Serving Platform
VLDB ’08
Auckland, New Zealand
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam
Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel
Weaver, and Ramana Yerneni
Presented by
Tarik Reza Toha
#1017052013
Outline
• Background and motivation
• Related work
• Proposed methodology
– Data storage and retrieval
– Asynchronous replication and consistency
• Experimental evaluation
• Conclusion and future work
2
3
Modern Web Applications
Brian
Sonja Jimi Brandon Kurt
What are my friends up to?
Sonja:
Brandon:
4
Modern Web Applications (contd.)
16 Mike <ph..
6 Jimi <ph..
8 Mary <re..
12 Sonja <ph..
15 Brandon <po..
17 Bob <re..
<photo>
<title>Flower</title>
<url>www.flickr.com</url>
</photo>
• Scalability
– Architectural scalability: scale during periods of rapid
growth with minimal operational effort
• Response time and geographic scope
– Fast response time to geographically distributed users
• High availability and fault tolerance
– Read and even write data during failures
• Relaxed consistency guarantees
– Eventual consistency: update one replica first and
then update others
5
Requirements of Modern Web Applications
• Traditional DBMS features are:
– Complicated queries
– Strong transactions
• Modern web applications need:
– Simplified queries
• No joins or aggregations
– Relaxed consistency needs
• Applications can tolerate stale or reordered data
6
DBMS for Modern Web Applications
• Bigtable: A Distributed Storage System for
Structured Data [Google, Inc.]
– Chang et al., OSDI, 2006
– Provides record-oriented access to very large tables
– Lacks geographic replication
– Lacks rich database functionalities
• Secondary indexes
• Materialized views
• Create multiple tables
• Hash-organized tables
Existing Database Management Systems
7
• Dynamo: Amazon’s Highly Available Key-value
Store
– DeCandia et al., SOSP, 2007
– A highly-available system
– Provides geographic replication via a gossip
mechanism
– Uses eventual consistency model
• Creates temporary inconsistency
– Uses hash tables
• Some storage nodes become hot spots
8
Existing Database Management Systems (contd.)
• Distributed filesystems
– Ceph, Boxwood, Sinfonia
– Store objects
– Inappropriate for databases
– Unscalable
• Distributed hash tables (peer-to-peer)
– Chord, Pastry
– Provides object routing and database system
– Lacks ordered table abstraction
– Focuses on reliable routing and object replication in the
face of massive node turnover
9
Existing Database Management Systems (contd.)
PNUTS is a massively parallel and geographically
distributed database system for Yahoo!’s web
applications, which provides data storage organized
as hashed or ordered tables, low latency for large
numbers of concurrent requests including updates
and queries, and novel per-record consistency
guarantees.
10
Platform for Nimble Universal Table Storage
11
Proposed Architecture of PNUTS
E 75656 C
A 42342 E
B 42521 W
C 66354 W
D 12352 E
F 15677 E
E 75656 C
A 42342 E
B 42521 W
C 66354 W
D 12352 E
F 15677 E
CREATE TABLE Parts (
ID VARCHAR,
StockNumber INT,
Status VARCHAR
…
)
Parallel database Geographic replication
Indexes and views
Structured, flexible schema
Hosted, managed infrastructure
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
12
Detailed Architecture of PNUTS
Data-path components
Storage units
Tablet
controller
REST API
Clients
Message
Broker
Routers
13
Detailed Architecture of PNUTS (contd.)
Storage units
Routers
Tablet controller
REST API
Clients
Local region Remote regions
YMB
14
Tablets in Hash Table
Apple
Lemon
Grape
Orange
Lime
Strawberry
Kiwi
Avocado
Tomato
Banana
Grapes are good to eat
Limes are green
Apple is wisdom
Strawberry shortcake
Arrgh! Don’t get scurvy!
But at what price?
How much did you pay for this lemon?
Is this a vegetable?
New Zealand
The perfect fruit
Name Description Price
$12
$9
$1
$900
$2
$3
$1
$14
$2
$8
0x0000
0xFFFF
0x911F
0x2AF3
Tablet 1
Tablet 2
Tablet 3
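The hash-table layout above can be sketched in a few lines. This is a minimal illustration, assuming the boundaries 0x2AF3 and 0x911F from the figure are the lower bounds of Tablet 2 and Tablet 3; the MD5-based `hash16` function and all function names are assumptions for this sketch, not the hash PNUTS actually uses.

```python
import bisect
import hashlib

# Tablet boundaries in the 16-bit hash space shown in the figure:
# Tablet 1 = [0x0000, 0x2AF3), Tablet 2 = [0x2AF3, 0x911F), Tablet 3 = [0x911F, 0xFFFF].
BOUNDARIES = [0x2AF3, 0x911F]


def hash16(key: str) -> int:
    """Map a key into 0x0000..0xFFFF (MD5 is an assumption for this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Return the 1-based number of the tablet whose interval contains h."""
    return bisect.bisect_right(BOUNDARIES, h) + 1


def tablet_for_key(key: str) -> int:
    """Hash the primary key, then locate the tablet by binary search."""
    return tablet_for_hash(hash16(key))
```

Because neighbouring keys hash to unrelated positions, this layout spreads load evenly but cannot serve range scans in key order.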
15
Tablets in Ordered Table
Apple
Banana
Grape
Orange
Lime
Strawberry
Kiwi
Avocado
Tomato
Lemon
Grapes are good to eat
Limes are green
Apple is wisdom
Strawberry shortcake
Arrgh! Don’t get scurvy!
But at what price?
The perfect fruit
Is this a vegetable?
How much did you pay for this lemon?
New Zealand
$1
$3
$2
$12
$8
$1
$9
$2
$900
$14
Name Description Price
A
Z
Q
H
Tablet 1
Tablet 2
Tablet 3
16
Single Query in PNUTS
1 Get key k (get( ))
2 Get key k
3 Record for key k
4 Record for key k
Routers
Storage unit 1 Storage unit 2 Storage unit 3
17
Range Queries in PNUTS
MIN-Canteloupe SU1
Canteloupe-Lime SU3
Lime-Strawberry SU2
Strawberry-MAX SU1
Storage unit 1 Storage unit 2 Storage unit 3
Router (Scatter-gather Engine)
Apple
Avocado
Banana
Blueberry
Canteloupe
Grape
Kiwi
Lemon
Lime
Mango
Orange
Pear
Strawberry
Tomato
Watermelon
Grapefruit…Pear? (scan( ))
Grapefruit…Lime?
Lime…Pear?
SU1 Strawberry-MAX
SU2 Lime-Strawberry
SU3 Canteloupe-Lime
SU4 MIN-Canteloupe
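The scatter-gather step in the figure, where the router splits Grapefruit…Pear into Grapefruit…Lime and Lime…Pear, can be sketched as follows. This is an illustration only: the interval map is the one listed for the router above, the empty string stands in for MIN, and `split_scan` is a hypothetical name.

```python
import bisect

# Router interval map from the figure: each tablet covers [low, next low)
# and is served by one storage unit. "" stands in for MIN in this sketch.
LOWS = ["", "Canteloupe", "Lime", "Strawberry"]
UNITS = ["SU1", "SU3", "SU2", "SU1"]


def split_scan(start: str, end: str):
    """Split the scan [start, end) into one sub-scan per overlapping tablet."""
    pieces = []
    i = bisect.bisect_right(LOWS, start) - 1  # tablet that contains `start`
    while i < len(LOWS) and LOWS[i] < end:
        # Upper bound of this tablet, or None for the last tablet (MAX).
        high = LOWS[i + 1] if i + 1 < len(LOWS) else None
        lo = max(start, LOWS[i])
        hi = end if high is None or end < high else high
        pieces.append((UNITS[i], lo, hi))
        i += 1
    return pieces
```

Each returned piece names the storage unit to contact and the sub-range it should scan; the router would then merge the partial results in key order.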
18
Update Operation in PNUTS
1 Write key k (set(v))
2 Write key k
3 Write key k
4
5 SUCCESS
6 Write key k
7 Sequence # for key k
8 Sequence # for key k
SU SU SU
Routers
Message brokers
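The numbered write path in the figure can be approximated with a toy model. The step numbers in the comments follow the figure labels, but which component sends each message is an assumption inferred from the component names, and every class and method name here is hypothetical.

```python
class MessageBroker:
    """Toy stand-in for the message broker: logs writes, delivers them later."""

    def __init__(self):
        self.log = []          # published updates not yet delivered
        self.subscribers = []  # storage units holding the other replicas

    def publish(self, update):
        # Steps 3-4 (assumed): the storage unit publishes, the broker logs the write.
        self.log.append(update)
        return "SUCCESS"       # step 5

    def deliver_all(self):
        """Step 6: push the logged writes to the other replicas, later."""
        for update in self.log:
            for storage_unit in self.subscribers:
                storage_unit.apply(update)
        self.log.clear()


class StorageUnit:
    def __init__(self, broker=None):
        self.records = {}      # key -> (sequence number, value)
        self.broker = broker

    def apply(self, update):
        key, seq, value = update
        self.records[key] = (seq, value)

    def write(self, key, value):
        """Master-side write: assign the next sequence number, then publish."""
        seq = self.records.get(key, (0, None))[0] + 1
        update = (key, seq, value)
        if self.broker.publish(update) != "SUCCESS":
            raise RuntimeError("publish failed")
        self.apply(update)
        return seq             # steps 7-8: sequence number flows back to the client


class Router:
    def __init__(self, master):
        self.master = master

    def set(self, key, value):
        return self.master.write(key, value)  # steps 1-2: client -> router -> storage unit
```

The point of the sketch is the ordering: the client gets its sequence number back before the remote replicas have seen the write.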
19
Load Balancing via Tablet Splitting
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
Storage unit may become a hotspot
Shed load by moving tablets to other servers
Storage unit
Tablet
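A rough sketch of the two load-balancing moves described above, splitting an overfull tablet and shedding load from a hot storage unit. The halving rule and the "move the lightest tablet to the least-loaded unit" policy are assumptions chosen for brevity, not the policy PNUTS uses.

```python
def split_tablet(keys, max_size):
    """Split a sorted list of keys in half until no piece exceeds max_size."""
    if len(keys) <= max_size:
        return [keys]
    mid = len(keys) // 2
    return split_tablet(keys[:mid], max_size) + split_tablet(keys[mid:], max_size)


def shed_load(assignment, load):
    """Move one tablet from the hottest storage unit to the coolest one.

    assignment: storage unit -> list of tablet ids
    load:       tablet id -> request rate
    Returns (tablet, from_unit, to_unit), or None if nothing can be moved.
    """
    def unit_load(unit):
        return sum(load[t] for t in assignment[unit])

    hot = max(assignment, key=unit_load)
    cold = min(assignment, key=unit_load)
    if hot == cold or len(assignment[hot]) < 2:
        return None
    tablet = min(assignment[hot], key=lambda t: load[t])  # cheapest tablet to move
    assignment[hot].remove(tablet)
    assignment[cold].append(tablet)
    return tablet, hot, cold
```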
20
Asynchronous Replication
• Eventual consistency
– Transactions:
• Alice changes status from “Sleeping” to “Awake”
• Alice changes location from “Home” to “Work”
21
Consistency Levels
(Alice, Home, Sleeping) (Alice, Home, Awake)
Region 1
(Alice, Home, Sleeping) (Alice, Work, Sleeping)
Region 2
(Alice, Work, Awake)
(Alice, Work, Awake)
Work
Awake
Final state consistent
“Invalid” state visible
Awake Work
• Timeline consistency
– Transactions:
• Alice changes status from “Sleeping” to “Awake”
• Alice changes location from “Home” to “Work”
22
Consistency Levels (contd.)
(Alice, Home, Sleeping) (Alice, Home, Awake)
Region 1
(Alice, Home, Sleeping) (Alice, Work, Awake)
Region 2
(Alice, Work, Awake)
Work
(Alice, Work, Awake)
Awake Work
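The difference between the two consistency levels can be shown with a small replica model. In this sketch, which is an assumption about mechanism rather than a description of PNUTS internals, each update carries a sequence number assigned by the record's master, and a replica buffers any update that arrives early so that updates are always applied in the same order everywhere.

```python
class Replica:
    """Applies a record's updates strictly in sequence-number order."""

    def __init__(self, record):
        self.record = dict(record)
        self.next_seq = 1
        self.pending = {}                # early updates waiting for their turn
        self.history = [dict(record)]    # every state a reader could have seen

    def receive(self, seq, field, value):
        """Buffer the update, then apply everything that is now in order."""
        self.pending[seq] = (field, value)
        while self.next_seq in self.pending:
            name, new_value = self.pending.pop(self.next_seq)
            self.record[name] = new_value
            self.history.append(dict(self.record))
            self.next_seq += 1
```

With Alice's two updates numbered 1 (status to Awake) and 2 (location to Work), a replica that receives them in the wrong order still never exposes the "invalid" state (Alice, Work, Sleeping).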
23
Consistency via Mastership
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
A 42342 E
B 42521 E
C 66354 W
D 12352 E
E 75656 C
F 15677 E
C 66354 W
B 42521 E
A 42342 E
D 12352 E
E 75656 C
F 15677 E
24
Failover in PNUTS
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
X
X
OVERRIDE W → E
• PNUTS supports both eventual and timeline
consistency models
– Applications can choose which kind of table to create
• What happens to a record with primary key
“Brian”?
25
Consistency Models in PNUTS
Record
inserted
Update Update Update Update Update Delete
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Update Update
26
Some APIs of Timeline Model in PNUTS
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Current
version
Stale version   Stale version
Read-any
• Read-any returns a possibly stale version of the record
‒ Served using a local copy
• It can be used for displaying a user’s friend’s status in a social
networking application, as it is not absolutely essential to get
the most up-to-date value
27
Some APIs of Timeline Model in PNUTS (contd.)
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Read-latest
Current
version
Stale version   Stale version
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Write
Current
version
Stale version   Stale version
28
Some APIs of Timeline Model in PNUTS (contd.)
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Read ≥ v.6
Current
version
Stale version   Stale version
Read-critical(required version):
• Read-critical returns a version of the record that is strictly
newer than, or the same as the required version
• It can be used when a user writes a record, and then wants to
read a version of the record that definitely reflects his changes
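The three read calls can be contrasted with a toy record that has a master copy and one possibly stale local copy. This is a simplified sketch: the class, its fields, and the explicit `replicate` step are assumptions made for illustration.

```python
class RecordReplicas:
    """One record with a master copy and a possibly stale local copy."""

    def __init__(self):
        self.master = (0, None)   # (version, value) at the record's master
        self.local = (0, None)    # (version, value) at the local replica

    def write(self, value):
        """Apply a write at the master and return the new version number."""
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def replicate(self):
        """Asynchronous propagation catches the local copy up."""
        self.local = self.master

    def read_any(self):
        return self.local                 # may be stale, served locally

    def read_latest(self):
        return self.master                # always the current version

    def read_critical(self, required_version):
        if self.local[0] >= required_version:
            return self.local             # local copy is new enough
        return self.master                # otherwise go to the master
```

A client that has just written version v can pass v to `read_critical` and be sure the result reflects its own write, while still being served locally whenever the local copy has caught up.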
29
Some APIs of Timeline Model in PNUTS (contd.)
Time
v. 1 v. 2 v. 3 v. 4 v. 5 v. 7
Generation 1
v. 6 v. 8
Write if = v.7
ERROR
Current
version
Stale version   Stale version
Test-and-set-write(required version)
• Test-and-set-write performs the requested write to the record if
and only if the present version of the record is the same as
required version
‒ Acts as row-level optimistic concurrency control
• It can be used to implement writing a record based on a previous
read, e.g., incrementing the value of a counter
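The counter use case can be sketched as a read-modify-write loop around test-and-set-write. The `Record` class, the `VersionMismatch` exception, and the retry count are hypothetical names and choices for this sketch; only the "write if and only if the version matches" rule comes from the slide.

```python
class VersionMismatch(Exception):
    """Raised when the record is no longer at the required version."""


class Record:
    def __init__(self, value):
        self.version = 1
        self.value = value

    def read_latest(self):
        return self.version, self.value

    def test_and_set_write(self, required_version, value):
        """Write only if the record is still at required_version."""
        if self.version != required_version:
            raise VersionMismatch(
                f"record is at v.{self.version}, required v.{required_version}"
            )
        self.value = value
        self.version += 1
        return self.version


def increment(record, retries=10):
    """Read-modify-write a counter, retrying if another writer got in first."""
    for _ in range(retries):
        version, value = record.read_latest()
        try:
            return record.test_and_set_write(version, value + 1)
        except VersionMismatch:
            continue  # someone else wrote in between; re-read and try again
    raise RuntimeError("too much contention")
```

A write based on a stale read fails with an error, as in the "Write if = v.7, ERROR" case in the figure, instead of silently overwriting a newer value.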
• Yahoo! Message Broker (YMB) [redo log]
– Topic-based publish/subscribe system
– Data is considered “committed” once it has been published to YMB
– At some point after being committed, the update will be asynchronously
propagated to different regions and applied to their replicas
• Recovery via YMB
– The tablet controller requests a copy from a particular remote replica (the
“source tablet”)
– A “checkpoint message” is published to YMB to ensure that any in-flight
updates at the time the copy is initiated are applied to the source tablet
– The source tablet is copied to the destination region
– Backup is used in practice
30
Recovery via YMB
Other Features
31
• Notifications
– One pub-sub topic per tablet
– Client knows about tables instead of tablets
– Automatically subscribed to all tablets, even as tablets are
added or removed
– Undelivered notifications are handled in the usual way
• Hosted Database Service
– Centrally-managed database service shared by
multiple applications
Experimental Setup
32
• Three PNUTS regions
• Workload
– 1200-3600 requests/second
– 0-50% writes
– 80% locality
Region           Machine                       Servers/region
West 1, West 2   2.8 GHz Xeon, 4GB RAM         5 SU, 2 YMB, 1 Router, 1 Tablet controller
East             Quad 2.13 GHz Xeon, 4GB RAM   5 SU, 2 YMB, 1 Router, 1 Tablet controller
• Insert Operation
Region                Latency (hash table)   Latency (ordered table)
West 1 (master)       75.6 ms                33 ms
West 2 (non-master)   131.5 ms               105.8 ms
East (non-master)     315.5 ms               324.5 ms
33
Experimental Evaluation
• Existing DBMSs fail to provide rich database functionality and low
latency at massive scale
• PNUTS uses asynchronous geographic replication to ensure low write
latency
– Per-record timeline consistency that provides useful guarantees to
applications without sacrificing scalability
– Message broker that serves both as the replication mechanism and redo log
of the database
– Flexible mapping of tablets to storage units to support automated failover
and load balancing
• Future work
– Indexes and materialized views
– Bundled updates
– Batch query processing (MapReduce)
34
Conclusion and Future Work
• Asynchronous View Maintenance for VLSD Databases
– Agarwal et al., SIGMOD, 2009
– Indexes and views
• A Batch of PNUTS: Experiences Connecting Cloud Batch
and Serving Systems
– Silberstein et al., SIGMOD, 2011
– PNUTS-Hadoop
• Where in the World is My Data?
– Kadambi et al., VLDB, 2011
– Selective replication
35
Subsequent Advancements
• Remote view table
– A regular table but updated by the view maintainer
instead of a client
36
Indexes and Views
Update
YMB YMB SU
VM
37
PNUTS-Hadoop
Reading from PNUTS
Hadoop Tasks
scan(0x2-0x4)
scan(0xa-0xc)
scan(0x8-0xa)
scan(0x0-0x2)
scan(0xc-0xe)
Map
PNUTS
1. Split PNUTS table into ranges
2. Each Hadoop task assigned a range
3. Task uses PNUTS scan API to retrieve
records in range
4. Task feeds the scanned records to the map
function
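The four reading steps can be sketched without Hadoop itself. In this illustration a plain dictionary stands in for the PNUTS table, a filtered loop stands in for the scan API, and each range plays the role of one Hadoop task; `split_ranges` and `run_map` are hypothetical names.

```python
def split_ranges(low, high, parts):
    """Cut [low, high) into `parts` contiguous ranges of near-equal width."""
    step, extra = divmod(high - low, parts)
    ranges, start = [], low
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_map(table, ranges, map_fn):
    """Scan each range in turn and feed its records to the map function."""
    output = []
    for lo, hi in ranges:
        # Stand-in for the scan call one task would issue for its range.
        for key in sorted(k for k in table if lo <= k < hi):
            output.extend(map_fn(key, table[key]))
    return output
```

Splitting 0x0 to 0xe into seven parts yields the two-wide ranges shown in the figure, such as 0x0-0x2 and 0xc-0xe.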
Record
Reader
Writing to PNUTS
Map or Reduce
Hadoop Tasks
PNUTS
Router
set
set
set
set
set
set
1. Call PNUTS set to write output
set
• If a European user’s record is never accessed in Asia, it does
not make sense to pay the bandwidth and disk costs to maintain
an Asian replica
• Static placement
– Per-record constraints
– Client sets mandatory, disallowed regions
• Dynamic placement
– Create replicas in regions where the record is read
– Evict replicas from regions where the record is not read
– Lease-based
• When a replica is read, it is guaranteed to survive for a time period
• Eviction is lazy; when the lease expires, the replica is deleted on the next write
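The lease-based scheme above can be sketched as follows. This is a simplified model under stated assumptions: time is passed in explicitly as a number of seconds, mandatory regions never expire, and the class and parameter names are hypothetical.

```python
class SelectiveRecord:
    """Tracks which regions hold a replica of one record, using read leases."""

    def __init__(self, mandatory, disallowed=(), lease_seconds=60):
        self.mandatory = set(mandatory)      # regions that always keep a copy
        self.disallowed = set(disallowed)    # regions that may never hold one
        self.lease_seconds = lease_seconds
        self.lease_expiry = {region: float("inf") for region in self.mandatory}

    def read(self, region, now):
        """A read creates the replica if needed and renews its lease."""
        if region in self.disallowed:
            return False                     # must be served from another region
        if region not in self.mandatory:
            self.lease_expiry[region] = now + self.lease_seconds
        return True

    def write(self, now):
        """Lazy eviction: replicas with expired leases are dropped on write."""
        expired = [r for r, t in self.lease_expiry.items() if t <= now]
        for region in expired:
            del self.lease_expiry[region]
        return sorted(self.lease_expiry)     # regions that receive the update
```

In the European-user example, an Asian replica exists only while Asian reads keep renewing its lease; once they stop, the next write drops it and no further bandwidth is spent on it.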
38
Selective Replication
Thank you
Questions are welcome!
Email: 1017052013@grad.cse.buet.ac.bd
39
