Pileus

Consistency-Based
Service Level Agreements
for Cloud Storage
Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K.
Aguilera, Hussam Abu-Libdeh†
Microsoft Research Silicon Valley †Cornell University
ACM SOSP 2013
Presented by Yongrae Jo, System software lab. at POSTECH

Motivation
• Trade-off: consistency vs. availability / performance
• Single ideal consistency (somewhere between weak and strong)
• Guaranteed by system design (DB)
• Single, fixed consistency offered to the developer
• Multiple alternative consistencies (depending on application demand)
• Guaranteed by SLA from the cloud service provider
• Multiple alternative consistencies offered to the developer

Pileus
• Storage system for cloud
• Replicated key/value store with consistency-based SLAs
• Provides a broad set of consistency choices
• that lie between strong and eventual consistency
• avoiding single-ideal consistency
• Satisfies application-specific consistency / latency demands
• Latency-favoring applications (e.g., shopping cart)
• Consistency-favoring applications (e.g., bank)
• Exports system APIs to applications

Consistency Levels
Spectrum: strong consistency to eventual consistency
Latency-favoring applications
• Shopping app
• Real-time multiplayer games
• Computer-supported collaborative work
• Data analytics
Consistency-favoring applications
• Bank
• Calendar
• Web-based e-mail
Applications with trade-offs
• Web browser
• Displays the local cache first, then loads accurate data as it arrives

System API
• Interface with traditional key-value cloud storage (e.g., table operations, Get, Put)
• Consistency guarantees
• Consistency choices
• Service level agreements
[Figure annotations: default SLA; condition code (consistency + SLA met?)]

Consistency guarantees on Get(s, key, sla)
Type: return value of Get(key)
• Strong consistency: the value of the last preceding Put(key) performed by any client
• Causal consistency: the value of the latest causally preceding Put(key)
• Bounded staleness(t): a value that is stale by at most t seconds
• Read my writes: the value of the latest Put(key) in the client session, or a later value
• Monotonic reads: the same or a later value than an earlier Get(key) in the client session
• Eventual consistency: a value written by any Put(key) (but expected to return the latest value eventually)

Example evaluation of different consistency guarantees
[Figure: example with one client in each country; node labels shown: (Secondary), (Secondary), (N/A)]

Service Level Agreement
• An ordered list of subSLAs
• subSLA
• A consistency / latency target pair with an application-specific utility
• subSLAs are listed from most preferable to less preferable; the utility expresses their relative importance
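The ordered-list structure described on this slide can be sketched as a small data type. This is a minimal illustration, not the Pileus API: the names `Consistency`, `SubSLA`, `example_sla`, and `is_well_ordered` are hypothetical, and the latency and utility numbers are made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    CAUSAL = "causal"
    BOUNDED = "bounded"
    READ_MY_WRITES = "read_my_writes"
    MONOTONIC = "monotonic"
    EVENTUAL = "eventual"


@dataclass(frozen=True)
class SubSLA:
    consistency: Consistency
    latency_ms: float  # latency target for a Get
    utility: float     # application-specific value of meeting this subSLA


# An SLA is an ordered list with the most preferable subSLA first.
# The numbers below are illustrative only (an assumption of this sketch).
example_sla = [
    SubSLA(Consistency.READ_MY_WRITES, 300.0, 1.0),
    SubSLA(Consistency.EVENTUAL, 300.0, 0.5),
]


def is_well_ordered(sla):
    """True if utilities never increase down the list (most preferable first)."""
    return all(a.utility >= b.utility for a, b in zip(sla, sla[1:]))
```

Checking the ordering up front is one possible design choice; it catches an SLA whose lower-ranked entries would be preferred over higher-ranked ones.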

SLA Failure & Checking
• Satisfying SLA can fail due to
• configuration of replicas and network conditions
• poor decision based on inaccurate information
• SLA failure is checked through the return value (i.e., condition code) of Get
• The application can take different actions based on the consistency of the returned data
Def. Unavailability of Pileus
The inability to retrieve the desired data with acceptable consistency and latency as defined by the SLA
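A rough sketch of how an application might interpret the outcome of a Get against its SLA. Everything here is an assumption for illustration: the function `first_met_subsla`, the tuple layout of a subSLA, and the example SLA values are hypothetical and do not come from the slides.

```python
def first_met_subsla(sla, met_consistencies, latency_ms):
    """Return (rank, utility) of the first subSLA satisfied by a Get, or None.

    sla: list of (consistency, latency_target_ms, utility), most preferable first.
    met_consistencies: set of consistency names the returned data satisfies.
    latency_ms: measured latency of the Get.
    """
    for rank, (consistency, target_ms, utility) in enumerate(sla, start=1):
        if consistency in met_consistencies and latency_ms <= target_ms:
            return rank, utility
    # No subSLA was met: the data is "unavailable" in the sense defined above.
    return None


# Illustrative SLA (hypothetical values).
sla = [("strong", 150.0, 1.0), ("eventual", 150.0, 0.5), ("strong", 1000.0, 0.25)]

outcome = first_met_subsla(sla, {"eventual"}, 100.0)
if outcome is None:
    action = "treat as unavailable"
elif outcome[0] == 1:
    action = "use the value as is"
else:
    action = "use the value but warn that it may be stale"
```

The application-side branching at the end shows the kind of "different actions" the slide mentions; the concrete actions are up to the application.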

Design and Implementation
• Architecture
• Client-side
• Consistency-specific node selection
• Monitoring storage nodes
• Client-side SLA enforcement
• Choosing a target subSLA
• Determining which subSLA was met

Pileus system API
• BeginSession (SLA)
• BeginTx (SLA)
• Put (key, value)
• Get (key, SLA) returns value, consistency
• EndTx ()
• EndSession ()

Components
• Storage node (or secondary node)
• Periodically fetches updates from the primary node
• Primary node
• One or more of the storage nodes
• Holds the master data (i.e., up-to-date data)
• Runs the replication protocol (e.g., consensus)
• (Client-side) monitor
• Tracks how far storage nodes lag behind the primary node
• Measures round-trip latencies between clients and storage nodes
• Client library
• Exports Pileus APIs to applications

Consistency-specific node selection
• How to select the node to which a Get operation should be sent?
• For the desired consistency guarantee (is the node sufficiently up-to-date?)
• Minimum acceptable read timestamp
• Serves as the decision point between consistency guarantees
• Indicates how much lag a node may have
[Figure: the previous object versions (in the current session), the key being read, and the consistency guarantee determine the minimum acceptable read timestamp; combined with the per-node high timestamp, it drives node selection]

Minimum acceptable read timestamp by type
• Strong consistency: at least as large as the update timestamp of the latest Put to the key being read
• Causal consistency: the maximum timestamp of any object previously read or written in this session (already causally ordered by the primary)
• Bounded staleness(t): current time - bound t
• Read my writes: the maximum timestamp of any previous Put to the key being accessed in the current session
• Monotonic reads: the recorded timestamp for the key being accessed from earlier Gets in the current session
• Eventual consistency: 0
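The rules above can be sketched as a small function, followed by a filter that keeps the nodes whose high timestamp is large enough. This is a simplified illustration under stated assumptions: the session is modeled as two dictionaries of timestamps, the names `min_acceptable_read_ts` and `eligible_nodes` are hypothetical, and for strong consistency the caller is assumed to supply the timestamp of the latest Put.

```python
def min_acceptable_read_ts(guarantee, key, session, now=0.0, bound=0.0,
                           latest_put_ts=None):
    """session: {'puts': {key: ts}, 'gets': {key: ts}} for the current session."""
    puts, gets = session["puts"], session["gets"]
    if guarantee == "strong":
        # Needs the timestamp of the latest Put to the key by any client;
        # in this sketch the caller must supply it (an assumption).
        if latest_put_ts is None:
            raise ValueError("strong consistency needs latest_put_ts")
        return latest_put_ts
    if guarantee == "causal":
        # Maximum timestamp of anything read or written in this session.
        return max([*puts.values(), *gets.values()], default=0.0)
    if guarantee == "bounded":
        return now - bound
    if guarantee == "read_my_writes":
        return puts.get(key, 0.0)
    if guarantee == "monotonic":
        return gets.get(key, 0.0)
    if guarantee == "eventual":
        return 0.0
    raise ValueError(f"unknown guarantee: {guarantee}")


def eligible_nodes(high_ts_by_node, min_ts):
    """Nodes whose high timestamp is at least the minimum acceptable read timestamp."""
    return sorted(n for n, ts in high_ts_by_node.items() if ts >= min_ts)
```

For example, with a session that wrote key "k" at timestamp 30, a node with high timestamp 40 can serve a read-my-writes Get for "k", while a node at timestamp 10 cannot.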

Example
[Figure: example node] This node can provide every consistency guarantee except strong and causal consistency.

Read-my-writes vs. Monotonic Reads
[Figure: two timelines for one client session, each showing Put(k) at t1, Get(k) at t2, Put(k) at t3, Get(k) at t4, and Get(k) at t5, with the minimum acceptable read timestamp marked for read-my-writes and for monotonic reads]

Monitoring Storage Nodes
• (Client-side) monitor probes the latency and timestamp of each node
• Monitor
• Collects measurements in a sliding window (last few minutes)
• Returns three probability estimates based on the recorded information
• PNodeCons (node, consistency, key): probability that the node holds a sufficiently up-to-date value
• PNodeLat (node, latency): probability that the node responds within the given time
• PNodeSla (node, consistency, latency, key): probability that the node satisfies the SLA (= PNodeCons * PNodeLat)
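A toy version of such a monitor, to make the three estimates concrete. It is a sketch under simplifying assumptions, not the Pileus implementation: the window is a fixed number of probes rather than a time span, the consistency estimate is a 0/1 check against the last known high timestamp, and the class and method names (`Monitor`, `record`, `p_node_lat`, `p_node_cons`, `p_node_sla`) are hypothetical. The consistency check takes a minimum acceptable read timestamp, which the caller is assumed to derive from the consistency guarantee and key.

```python
from collections import deque


class Monitor:
    """Toy client-side monitor keeping the last `window` probes per node."""

    def __init__(self, window=100):
        self.window = window
        self.latencies = {}  # node -> deque of round-trip times (ms)
        self.high_ts = {}    # node -> last known high timestamp

    def record(self, node, rtt_ms, high_timestamp):
        samples = self.latencies.setdefault(node, deque(maxlen=self.window))
        samples.append(rtt_ms)  # oldest sample is dropped automatically
        self.high_ts[node] = high_timestamp

    def p_node_lat(self, node, latency_ms):
        """Fraction of recorded probes that finished within latency_ms."""
        samples = self.latencies.get(node)
        if not samples:
            return 0.0
        return sum(s <= latency_ms for s in samples) / len(samples)

    def p_node_cons(self, node, min_read_ts):
        """Simplification: 1.0 if the last known high timestamp suffices, else 0.0."""
        return 1.0 if self.high_ts.get(node, -1.0) >= min_read_ts else 0.0

    def p_node_sla(self, node, min_read_ts, latency_ms):
        """Product of the two estimates, as on the slide."""
        return self.p_node_cons(node, min_read_ts) * self.p_node_lat(node, latency_ms)
```

Treating the two estimates as independent and multiplying them follows the formula on the slide; a real monitor would likely estimate the consistency probability less coarsely.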

Client-side SLA Enforcement
• How can we satisfy SLA effectively?
• Simple, but flawed method for SLA enforcement
• Broadcast each Get operation to all replicas
• Incurs high cost (e.g., network resources, per-byte charges)
• SLA Enforcement by Client Library
• Chooses a node group that can meet SLA
• Responsible for maximizing the expected utility
• Methods
• Choosing a target subSLA
• Determining which subSLA was met

Choosing a target subSLA and nodes
[Figure: algorithm listing] Find the target subSLA and the node group that best satisfy the SLA with maximum utility: the subSLA that maximizes the expected utility, and the node group that the client will contact. Then find a node with minimum latency.
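The selection step can be sketched as follows. This is an illustrative reading of the slide, not the algorithm listing itself: expected utility is taken to be the probability of meeting a subSLA at a node times that subSLA's utility, and the names `select_target`, `p_node_sla`, and `avg_latency_ms` are hypothetical.

```python
def select_target(sla, nodes, p_node_sla, avg_latency_ms, tolerance=1e-9):
    """Pick the target subSLA and node with maximum expected utility.

    sla: list of (consistency, latency_ms, utility), most preferable first.
    p_node_sla(node, consistency, latency_ms) -> probability in [0, 1].
    avg_latency_ms: dict node -> average latency, used to break ties.
    Returns (target_rank, chosen_node, expected_utility).
    """
    if not nodes:
        raise ValueError("no nodes to choose from")
    best_utility, best_rank, best_nodes = -1.0, None, []
    for rank, (consistency, latency_ms, utility) in enumerate(sla, start=1):
        for node in nodes:
            expected = p_node_sla(node, consistency, latency_ms) * utility
            if expected > best_utility + tolerance:
                best_utility, best_rank, best_nodes = expected, rank, [node]
            elif abs(expected - best_utility) <= tolerance and rank == best_rank:
                best_nodes.append(node)  # same subSLA, equally good node
    # Among equally good nodes, contact the one with minimum latency.
    chosen = min(best_nodes, key=lambda n: avg_latency_ms[n])
    return best_rank, chosen, best_utility
```

When two subSLAs yield the same expected utility, this sketch keeps the higher-ranked one, which matches the ordering of the SLA from most to less preferable.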

Evaluation:
Experimental Setup
• Goal
• Evaluate Pileus in a globally distributed datacenter
environment
• Verify that adapting consistency is better than a fixed
consistency
• Measure how well the client’s Get operations meet a given consistency-
based SLA
• Evaluation
• Shopping cart SLA (weak consistency)
• Password checking SLA (strong consistency)
• Adaptability to network delays
• Sensitivity to utility values

Evaluation:
Experimental Setup
• YCSB benchmark with one client in each country
• Total 10,000 Put / Get, 400 Put / Get per session (20) by a client
• Nodes: U.S. West (secondary), England (primary), India (secondary), China (N/A)
• Baselines compared with Pileus (different node selection methods)
• Primary
• Always performs Gets at the primary node
• Random
• Performs each Get at a randomly selected node
• Closest
• Always performs Gets at the node with the lowest average latency

Evaluation:
Experimental Setup
[Figure: average round-trip times (avg. RTT)]

Evaluation:
Shopping Cart SLA
• Weak consistency preferred

Evaluation:
Password Checking SLA
• Strong consistency preferred

Evaluation:
Adaptability to network delays
• Injecting artificial delays into Get operations in the Password checking SLA experiment
• Client in the U.S., primary in England
• Injecting a 300 ms delay to the primary

Evaluation:
• Initially: Client -> (Primary, Rank1-SLA)
• After the delay is injected to the primary, the client learns that the primary is far away and switches to the second subSLA with the U.S. node: Client -> (U.S. node, Rank2-SLA)
• Client realizes that the first subSLA cannot be met and switches to the third one: Client -> (Primary, Rank3-SLA)

Evaluation:
• Adding additional latency to the U.S. node (local node): the higher-ranked subSLAs can no longer be met
• Client realizes that only the third subSLA can be met and switches to the primary: Client -> (Primary, Rank3-SLA)

Evaluation:
• Reducing the delay (a millisecond) at the U.S. node (local node)
• Client discovers, through periodic probes, that it can regularly access its local site with low delay, and switches back to the local node: Client -> (U.S. node, Rank2-SLA)

Evaluation:
• Restoring the average latency to the primary to its usual value (149 ms)
• Client determines that the primary is back to normal and switches back to it: Client -> (Primary, Rank1-SLA)

Extensions and future work
• Enhanced monitoring
• Sharing monitoring information between clients for more accurate decisions
• (i.e., client-centric distributed monitoring service)
• SLA-driven reconfiguration
• Reconfiguring replicas according to SLA
• (e.g., moving primary replica nearby client)
• Parallel Gets
• Multi-site Puts

Conclusion
• Pileus is a storage system with consistency-based SLAs
• Consistency-based SLAs allow applications that were written to tolerate eventual consistency to benefit from increased consistency
• Adaptive to varying system conditions
• (e.g., node failures, overload, performance variation)
• Avoiding a single ideal consistency
• Pileus can improve application-specific consistency levels of service
• The application's SLA indicates how best to adapt

Research Implications
• Structural similarities to Hyperledger/Fabric
• SLA ?= Endorsement policy
• Primary node ?= Ordering Service
• Storage node ?= Peer
• Simple monitoring & decision technique
• Defining a sliding window and probability functions
• Collect metrics -> Calculate probabilities -> Decide on an action
• Quorum-based SLA
• 2f+1 <= Q <= (2/3)n
• From low importance to high importance

Many forms of consistency
Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications (2019)

Causal precedence relationship
op1 < op2 if any of the following holds:
• (a) op1 occurs before op2 in the same session
• (b) op1 is a Put(key) and op2 is a Get(key) that returns the version put in op1
• (c) for some op3, op1 < op3 and op3 < op2
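The three rules can be turned into a small computation over a recorded history. This is only an illustrative sketch: the function name `causal_precedence` and the way sessions and reads are represented are assumptions made for the example.

```python
def causal_precedence(sessions, reads_from):
    """Compute all pairs (op1, op2) with op1 < op2.

    sessions: list of lists of operation ids, each in session order.
    reads_from: dict mapping a Get op id to the Put op id whose version it returned.
    """
    before = set()
    # Rule (a): earlier operations in a session precede later ones.
    for ops in sessions:
        for i, op1 in enumerate(ops):
            for op2 in ops[i + 1:]:
                before.add((op1, op2))
    # Rule (b): a Put precedes any Get that returns its version.
    for get_op, put_op in reads_from.items():
        before.add((put_op, get_op))
    # Rule (c): transitive closure, repeated until nothing new is added.
    changed = True
    while changed:
        changed = False
        for a, b in list(before):
            for c, d in list(before):
                if b == c and (a, d) not in before:
                    before.add((a, d))
                    changed = True
    return before
```

For instance, if one session performs Puts p1 then p2, and a second session's Get g1 returns the version written by p2 and is followed by Put p3, then p1 causally precedes p3 through p2 and g1.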

Evaluation:
Sensitivity to utility values
