Chubby Lock Service
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 1 / 38
What is the Problem?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 2 / 38
What is the Problem?
Large distributed system
How to ensure that only one process across a fleet of processes acts
on a resource?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
What is the Problem?
Large distributed system
How to ensure that only one process across a fleet of processes acts
on a resource?
• Ensure only one server can write to a database.
• Ensure that only one server can perform a particular action.
• Ensure that there is a single master that processes all writes.
• ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
We need agreement (consensus)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
What is the Problem?
All these scenarios need a simple way to coordinate execution of
process to ensure that only one entity is performing an action.
We need agreement (consensus)
Paxos ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
How Can We Use Paxos
to Solve the Problem
of Coordination?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 5 / 38
Possible Solutions
Building a consensus library
Building a centralized lock service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 6 / 38
Consensus Library vs. Centralized Service
Consensus library
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
• A single client can obtain a lock and make progress (non-quorum
based decisions, the lock service takes care of it)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Consensus Library vs. Centralized Service
Consensus library
• Library code has to be included in every program.
• Application developers to be experts in distributed system.
• Need to wait on majority (quorums).
Centralized lock service
• Clean interface.
• Service can be modified independently.
• A single client can obtain a lock and make progress (non-quorum
based decisions, the lock service takes care of it)
• Application developers do not have to worry about dealing with
operations, various failure modes debugging etc.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
Centralized Lock Service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 8 / 38
Chubby is a Lock Service
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 9 / 38
Chubby
Primary goals: reliability and availability (not high performance)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
Chubby
Primary goals: reliability and availability (not high performance)
Used in Google: GFS, Bigtable, etc.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
Chubby Structure
Two main components:
• Server (chubby cell)
• Client library
Communicate via RPC
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 11 / 38
Chubby Cell
A small set of servers (replicas)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Maintain copies of simple database (replicated state machines)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
Chubby Cell
A small set of servers (replicas)
One is the master (elected from the replicas via Paxos)
Maintain copies of simple database (replicated state machines)
• Only the master initiates reads and writes of this database.
• All other replicas simply copy updates from the master (using the
Paxos protocol).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
Chubby Client
Find the master: sending master location requests to replicas.
All requests are sent directly to the master.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 13 / 38
Chubby Operations
Write requests
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Chubby Operations
Write requests
• The master receives the request.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Read requests
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Chubby Operations
Write requests
• The master receives the request.
• Write requests are propagated via the consensus protocol to all
replicas.
• Acknowledges when write request reaches the majority of the
replicas
Read requests
• Satisfied by the master alone.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
Let’s Go into More Details
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 15 / 38
Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
• 2nd component (cell1): the chubby cell name
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
Chubby Interface (1/2)
Chubby exports a unix-like filesystem interface.
Tree of files with names separated by slashes (namesapace):
/ls/cell1/aut/cc14:
• 1st component (ls): lock service (common to all names)
• 2nd component (cell1): the chubby cell name
• The rest: the name of the directory and the file (name inside the
cell)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Support most normal operations:
• create, delete, open, write, ...
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
Chubby Interface (2/2)
Chubby maintains a namespace that contains files and directories,
called nodes.
Clients can create nodes and add contents to nodes.
Support most normal operations:
• create, delete, open, write, ...
Clients open nodes to obtain handles that are analogous to unix file
descriptors.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
A client handle may hold the lock in two modes: writer (exclusive)
mode or reader (shared) mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
Locks (1/2)
Each Chubby file and directory (node) can act as a reader/writer
lock.
A client handle may hold the lock in two modes: writer (exclusive)
mode or reader (shared) mode.
• Only one client can hold lock in writer mode.
• Many clients can hold lock in reader mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
Locks (2/2)
Locks are advisory in Chubby.
• Holding a lock is not necessary to access file.
• Holding a lock does not prevent other clients accessing file.
• Conflicts only with other attempts to acquire the same lock.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 19 / 38
Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
• Other clients fail (become replicas) and discover the name of the
primary by reading the file.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
Chubby Example (1/2)
Electing a primary (leader) process.
Steps in primary election:
• Clients open a lock file and attempt to acquire the lock in write
mode.
• One client succeeds (becomes the primary) and writes its
name/identity into the file.
• Other clients fail (become replicas) and discover the name of the
primary by reading the file.
• Primary obtains sequencer and passes it to servers (servers can
insure that the elected primary is still valid).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
Chubby Example (2/2)
Leader election
Open("/ls/foo/OurServicePrimary", "write mode")
if (successful) { // primary
SetContents(primary_identity)
} else { // replica
Open("/ls/foo/OurServicePrimary", "read mode",
"file-modification event")
when notified of file modification:
primary = GetContentsAndStat()
}
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 21 / 38
APIs
Open(), Close(), Delete()
GetContentsAndStat(), GetStat(), ReadDir()
SetContents()
SetACL()
Locks: Acquire(), TryAcquire(), Release()
Sequencers: GetSequencer(), SetSequencer(),
CheckSequencer()
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 22 / 38
Events
Chubby clients may subscribe to many events when they create a
handle.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
Events
Chubby clients may subscribe to many events when they create a
handle.
Events include:
• File contents modified
• Child node added, removed, or modified
• Chubby master failed over
• Handle has become invalid
• Lock acquired
• Conflicting lock requested from another client
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Another client may acquire the lock and issue its own request (R2).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
Locking - Problem
After acquiring a lock and issuing a request (R1) a client may fail.
Another client may acquire the lock and issue its own request (R2).
R1 arrive later at the server and be acted upon (possible inconsis-
tency).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
Locking - Solutions (1/3)
Solution 1: virtual time
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
Locking - Solutions (1/3)
Solution 1: virtual time
• But, it is costly to introduce sequence numbers into all the
interactions.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
Locking - Solutions (2/3)
Solution 2: sequencers
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
• It attaches the sequencer to any requests that it sends to other
servers (e.g., Bigtable)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
Locking - Solutions (2/3)
Solution 2: sequencers
• Sequence numbers are introduced into only those interactions that
make use of locks.
• A lock holder can obtain a sequencer from Chubby.
• It attaches the sequencer to any requests that it sends to other
servers (e.g., Bigtable)
• The other servers can verify the sequencer information
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
Locking - Solutions (3/3)
Solution 3: lock-delay
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
Locking - Solutions (3/3)
Solution 3: lock-delay
• Correctly released locks are immediately available.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
Locking - Solutions (3/3)
Solution 3: lock-delay
• Correctly released locks are immediately available.
• If a lock becomes free because holder failed or becomes inaccessible,
lock cannot be claimed by another client until lock-delay expires.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Master will invalidate cached copies upon a write request.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
Caching
To reduce read traffic, Chubby clients cache file data and node
meta-data.
Write-through: write is done synchronously both to the cache and
to the primary copy.
Master will invalidate cached copies upon a write request.
The client also can allow its cache lease to expire.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Session lease: interval during which client’s cached items (handles,
locks, files) are valid.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
Session Lease (1/3)
Lease: a promise by one party to abide by an agreement for a given
interval of time unless the promise is explicitly revoked.
Session: a relationship between a Chubby cell and a Chubby client.
Session lease: interval during which client’s cached items (handles,
locks, files) are valid.
Session maintained through KeepAlives messages.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
• Sends jeopardy event to application.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
Session Lease (2/3)
If client’s local lease expires (happens when a master fails over)
• Client disables cache.
• Session in jeopardy, client waits in grace period (45s).
• Sends jeopardy event to application.
• Application can suspend activity.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
• Succeed: re-enables cache; send safe event to application
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
Session Lease (3/3)
Client attempts to exchange KeepAlive messages with master during
grace period.
• Succeed: re-enables cache; send safe event to application
• Failed: discards cache; sends expired event to application
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
Scaling
Reducing communication with the master.
Two techniques:
• Proxies
• Partitioning
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 32 / 38
Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Cannot reduce write loads, but they are << 1% of workload.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
Scaling - Proxies
Proxy is a trusted process that pass requests from other clients to
a Chubby cell.
Can handle KeepAlives and reads → reduce the master load.
Cannot reduce write loads, but they are << 1% of workload.
Introduces another point of failure.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
Scaling - Partitioning
Namespace partitioned between servers.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 34 / 38
Scaling - Partitioning
Namespace partitioned between servers.
N partitions, each with master and replicas
Node D/C stored on P(D/C) = hash(D) mod N
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 34 / 38
Summary
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 35 / 38
Summary
Library vs. Service
Chubby: coarse-grained lock service
Chubby cell and clients
Unix-like interface
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 36 / 38
References:
M. Burrows, The Chubby lock service for loosely-coupled distributed
systems, OSDI, 2006.
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 37 / 38
Questions?
Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 38 / 38

Chubby

  • 1.
    Chubby Lock Service AmirH. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 1 / 38
  • 2.
    What is theProblem? Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 2 / 38
  • 3.
    What is theProblem? Large distributed system How to ensure that only one process across a fleet of processes acts on a resource? Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
  • 4.
    What is theProblem? Large distributed system How to ensure that only one process across a fleet of processes acts on a resource? • Ensure only one server can write to a database. • Ensure that only one server can perform a particular action. • Ensure that there is a single master that processes all writes. • ... Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 3 / 38
  • 5.
    What is theProblem? All these scenarios need a simple way to coordinate execution of process to ensure that only one entity is performing an action. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
  • 6.
    What is theProblem? All these scenarios need a simple way to coordinate execution of process to ensure that only one entity is performing an action. We need agreement (consensus) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
  • 7.
    What is theProblem? All these scenarios need a simple way to coordinate execution of process to ensure that only one entity is performing an action. We need agreement (consensus) Paxos ... Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 4 / 38
  • 8.
    How Can WeUse Paxos to Solve the Problem of Coordination? Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 5 / 38
  • 9.
    Possible Solutions Building aconsensus library Building a centralized lock service Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 6 / 38
  • 10.
    Consensus Library vs.Centralized Service Consensus library Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 11.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 12.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 13.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 14.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Centralized lock service Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 15.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Centralized lock service • Clean interface. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 16.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Centralized lock service • Clean interface. • Service can be modified independently. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 17.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Centralized lock service • Clean interface. • Service can be modified independently. • A single client can obtain a lock and make progress (non-quorum based decisions, the lock service takes care of it) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 18.
    Consensus Library vs.Centralized Service Consensus library • Library code has to be included in every program. • Application developers to be experts in distributed system. • Need to wait on majority (quorums). Centralized lock service • Clean interface. • Service can be modified independently. • A single client can obtain a lock and make progress (non-quorum based decisions, the lock service takes care of it) • Application developers do not have to worry about dealing with operations, various failure modes debugging etc. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 7 / 38
  • 19.
    Centralized Lock Service AmirH. Payberah (Tehran Polytechnic) Chubby 1393/7/5 8 / 38
  • 20.
    Chubby is aLock Service Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 9 / 38
  • 21.
    Chubby Primary goals: reliabilityand availability (not high performance) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
  • 22.
    Chubby Primary goals: reliabilityand availability (not high performance) Used in Google: GFS, Bigtable, etc. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 10 / 38
  • 23.
    Chubby Structure Two maincomponents: • Server (chubby cell) • Client library Communicate via RPC Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 11 / 38
  • 24.
    Chubby Cell A smallset of servers (replicas) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
  • 25.
    Chubby Cell A smallset of servers (replicas) One is the master (elected from the replicas via Paxos) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
  • 26.
    Chubby Cell A smallset of servers (replicas) One is the master (elected from the replicas via Paxos) Maintain copies of simple database (replicated state machines) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
  • 27.
    Chubby Cell A smallset of servers (replicas) One is the master (elected from the replicas via Paxos) Maintain copies of simple database (replicated state machines) • Only the master initiates reads and writes of this database. • All other replicas simply copy updates from the master (using the Paxos protocol). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 12 / 38
  • 28.
    Chubby Client Find themaster: sending master location requests to replicas. All requests are sent directly to the master. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 13 / 38
  • 29.
    Chubby Operations Write requests AmirH. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 30.
    Chubby Operations Write requests •The master receives the request. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 31.
    Chubby Operations Write requests •The master receives the request. • Write requests are propagated via the consensus protocol to all replicas. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 32.
    Chubby Operations Write requests •The master receives the request. • Write requests are propagated via the consensus protocol to all replicas. • Acknowledges when write request reaches the majority of the replicas Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 33.
    Chubby Operations Write requests •The master receives the request. • Write requests are propagated via the consensus protocol to all replicas. • Acknowledges when write request reaches the majority of the replicas Read requests Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 34.
    Chubby Operations Write requests •The master receives the request. • Write requests are propagated via the consensus protocol to all replicas. • Acknowledges when write request reaches the majority of the replicas Read requests • Satisfied by the master alone. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 14 / 38
  • 35.
    Let’s Go intoMore Details Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 15 / 38
  • 36.
    Chubby Interface (1/2) Chubbyexports a unix-like filesystem interface. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
  • 37.
    Chubby Interface (1/2) Chubbyexports a unix-like filesystem interface. Tree of files with names separated by slashes (namesapace): /ls/cell1/aut/cc14: Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
  • 38.
    Chubby Interface (1/2) Chubbyexports a unix-like filesystem interface. Tree of files with names separated by slashes (namesapace): /ls/cell1/aut/cc14: • 1st component (ls): lock service (common to all names) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
  • 39.
    Chubby Interface (1/2) Chubbyexports a unix-like filesystem interface. Tree of files with names separated by slashes (namesapace): /ls/cell1/aut/cc14: • 1st component (ls): lock service (common to all names) • 2nd component (cell1): the chubby cell name Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
  • 40.
    Chubby Interface (1/2) Chubbyexports a unix-like filesystem interface. Tree of files with names separated by slashes (namesapace): /ls/cell1/aut/cc14: • 1st component (ls): lock service (common to all names) • 2nd component (cell1): the chubby cell name • The rest: the name of the directory and the file (name inside the cell) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 16 / 38
  • 41.
    Chubby Interface (2/2) Chubbymaintains a namespace that contains files and directories, called nodes. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
  • 42.
    Chubby Interface (2/2) Chubbymaintains a namespace that contains files and directories, called nodes. Clients can create nodes and add contents to nodes. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
  • 43.
    Chubby Interface (2/2) Chubbymaintains a namespace that contains files and directories, called nodes. Clients can create nodes and add contents to nodes. Support most normal operations: • create, delete, open, write, ... Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
  • 44.
    Chubby Interface (2/2) Chubbymaintains a namespace that contains files and directories, called nodes. Clients can create nodes and add contents to nodes. Support most normal operations: • create, delete, open, write, ... Clients open nodes to obtain handles that are analogous to unix file descriptors. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 17 / 38
  • 45.
    Locks (1/2) Each Chubbyfile and directory (node) can act as a reader/writer lock. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
  • 46.
    Locks (1/2) Each Chubbyfile and directory (node) can act as a reader/writer lock. A client handle may hold the lock in two modes: writer (exclusive) mode or reader (shared) mode. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
  • 47.
    Locks (1/2) Each Chubbyfile and directory (node) can act as a reader/writer lock. A client handle may hold the lock in two modes: writer (exclusive) mode or reader (shared) mode. • Only one client can hold lock in writer mode. • Many clients can hold lock in reader mode. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 18 / 38
  • 48.
    Locks (2/2) Locks areadvisory in Chubby. • Holding a lock is not necessary to access file. • Holding a lock does not prevent other clients accessing file. • Conflicts only with other attempts to acquire the same lock. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 19 / 38
  • 49.
    Chubby Example (1/2) Electinga primary (leader) process. Steps in primary election: Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
  • 50.
    Chubby Example (1/2) Electinga primary (leader) process. Steps in primary election: • Clients open a lock file and attempt to acquire the lock in write mode. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
  • 51.
    Chubby Example (1/2) Electinga primary (leader) process. Steps in primary election: • Clients open a lock file and attempt to acquire the lock in write mode. • One client succeeds (becomes the primary) and writes its name/identity into the file. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
  • 52.
    Chubby Example (1/2) Electinga primary (leader) process. Steps in primary election: • Clients open a lock file and attempt to acquire the lock in write mode. • One client succeeds (becomes the primary) and writes its name/identity into the file. • Other clients fail (become replicas) and discover the name of the primary by reading the file. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
  • 53.
    Chubby Example (1/2) Electinga primary (leader) process. Steps in primary election: • Clients open a lock file and attempt to acquire the lock in write mode. • One client succeeds (becomes the primary) and writes its name/identity into the file. • Other clients fail (become replicas) and discover the name of the primary by reading the file. • Primary obtains sequencer and passes it to servers (servers can insure that the elected primary is still valid). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 20 / 38
  • 54.
    Chubby Example (2/2) Leaderelection Open("/ls/foo/OurServicePrimary", "write mode") if (successful) { // primary SetContents(primary_identity) } else { // replica Open("/ls/foo/OurServicePrimary", "read mode", "file-modification event") when notified of file modification: primary = GetContentsAndStat() } Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 21 / 38
  • 55.
    APIs Open(), Close(), Delete() GetContentsAndStat(),GetStat(), ReadDir() SetContents() SetACL() Locks: Acquire(), TryAcquire(), Release() Sequencers: GetSequencer(), SetSequencer(), CheckSequencer() Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 22 / 38
  • 56.
    Events Chubby clients maysubscribe to many events when they create a handle. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
  • 57.
    Events Chubby clients maysubscribe to many events when they create a handle. Events include: • File contents modified • Child node added, removed, or modified • Chubby master failed over • Handle has become invalid • Lock acquired • Conflicting lock requested from another client Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 23 / 38
  • 58.
    Locking - Problem Afteracquiring a lock and issuing a request (R1) a client may fail. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
  • 59.
    Locking - Problem Afteracquiring a lock and issuing a request (R1) a client may fail. Another client may acquire the lock and issue its own request (R2). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
  • 60.
    Locking - Problem Afteracquiring a lock and issuing a request (R1) a client may fail. Another client may acquire the lock and issue its own request (R2). R1 arrive later at the server and be acted upon (possible inconsis- tency). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 24 / 38
  • 61.
    Locking - Solutions(1/3) Solution 1: virtual time Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
  • 62.
    Locking - Solutions(1/3) Solution 1: virtual time • But, it is costly to introduce sequence numbers into all the interactions. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 25 / 38
  • 63.
    Locking - Solutions(2/3) Solution 2: sequencers Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
  • 64.
    Locking - Solutions(2/3) Solution 2: sequencers • Sequence numbers are introduced into only those interactions that make use of locks. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
  • 65.
    Locking - Solutions(2/3) Solution 2: sequencers • Sequence numbers are introduced into only those interactions that make use of locks. • A lock holder can obtain a sequencer from Chubby. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
  • 66.
    Locking - Solutions(2/3) Solution 2: sequencers • Sequence numbers are introduced into only those interactions that make use of locks. • A lock holder can obtain a sequencer from Chubby. • It attaches the sequencer to any requests that it sends to other servers (e.g., Bigtable) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
  • 67.
    Locking - Solutions(2/3) Solution 2: sequencers • Sequence numbers are introduced into only those interactions that make use of locks. • A lock holder can obtain a sequencer from Chubby. • It attaches the sequencer to any requests that it sends to other servers (e.g., Bigtable) • The other servers can verify the sequencer information Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 26 / 38
  • 68.
    Locking - Solutions(3/3) Solution 3: lock-delay Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
  • 69.
    Locking - Solutions(3/3) Solution 3: lock-delay • Correctly released locks are immediately available. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
  • 70.
    Locking - Solutions(3/3) Solution 3: lock-delay • Correctly released locks are immediately available. • If a lock becomes free because holder failed or becomes inaccessible, lock cannot be claimed by another client until lock-delay expires. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 27 / 38
  • 71.
    Caching To reduce readtraffic, Chubby clients cache file data and node meta-data. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
  • 72.
    Caching To reduce readtraffic, Chubby clients cache file data and node meta-data. Write-through: write is done synchronously both to the cache and to the primary copy. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
  • 73.
    Caching To reduce readtraffic, Chubby clients cache file data and node meta-data. Write-through: write is done synchronously both to the cache and to the primary copy. Master will invalidate cached copies upon a write request. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
  • 74.
    Caching To reduce readtraffic, Chubby clients cache file data and node meta-data. Write-through: write is done synchronously both to the cache and to the primary copy. Master will invalidate cached copies upon a write request. The client also can allow its cache lease to expire. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 28 / 38
  • 75.
    Session Lease (1/3) Lease:a promise by one party to abide by an agreement for a given interval of time unless the promise is explicitly revoked. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
  • 76.
    Session Lease (1/3) Lease:a promise by one party to abide by an agreement for a given interval of time unless the promise is explicitly revoked. Session: a relationship between a Chubby cell and a Chubby client. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
  • 77.
    Session Lease (1/3) Lease:a promise by one party to abide by an agreement for a given interval of time unless the promise is explicitly revoked. Session: a relationship between a Chubby cell and a Chubby client. Session lease: interval during which client’s cached items (handles, locks, files) are valid. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
  • 78.
    Session Lease (1/3) Lease:a promise by one party to abide by an agreement for a given interval of time unless the promise is explicitly revoked. Session: a relationship between a Chubby cell and a Chubby client. Session lease: interval during which client’s cached items (handles, locks, files) are valid. Session maintained through KeepAlives messages. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 29 / 38
  • 79.
    Session Lease (2/3) Ifclient’s local lease expires (happens when a master fails over) Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
  • 80.
    Session Lease (2/3) Ifclient’s local lease expires (happens when a master fails over) • Client disables cache. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
  • 81.
    Session Lease (2/3) Ifclient’s local lease expires (happens when a master fails over) • Client disables cache. • Session in jeopardy, client waits in grace period (45s). Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
  • 82.
    Session Lease (2/3) Ifclient’s local lease expires (happens when a master fails over) • Client disables cache. • Session in jeopardy, client waits in grace period (45s). • Sends jeopardy event to application. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
  • 83.
    Session Lease (2/3) Ifclient’s local lease expires (happens when a master fails over) • Client disables cache. • Session in jeopardy, client waits in grace period (45s). • Sends jeopardy event to application. • Application can suspend activity. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 30 / 38
  • 84.
    Session Lease (3/3) Clientattempts to exchange KeepAlive messages with master during grace period. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
  • 85.
    Session Lease (3/3) Clientattempts to exchange KeepAlive messages with master during grace period. • Succeed: re-enables cache; send safe event to application Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
  • 86.
    Session Lease (3/3) Clientattempts to exchange KeepAlive messages with master during grace period. • Succeed: re-enables cache; send safe event to application • Failed: discards cache; sends expired event to application Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 31 / 38
  • 87.
    Scaling Reducing communication withthe master. Two techniques: • Proxies • Partitioning Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 32 / 38
  • 88.
    Scaling - Proxies Proxyis a trusted process that pass requests from other clients to a Chubby cell. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
  • 89.
    Scaling - Proxies Proxyis a trusted process that pass requests from other clients to a Chubby cell. Can handle KeepAlives and reads → reduce the master load. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
  • 90.
    Scaling - Proxies Proxyis a trusted process that pass requests from other clients to a Chubby cell. Can handle KeepAlives and reads → reduce the master load. Cannot reduce write loads, but they are << 1% of workload. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
  • 91.
    Scaling - Proxies Proxyis a trusted process that pass requests from other clients to a Chubby cell. Can handle KeepAlives and reads → reduce the master load. Cannot reduce write loads, but they are << 1% of workload. Introduces another point of failure. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 33 / 38
  • 92.
    Scaling - Partitioning Namespacepartitioned between servers. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 34 / 38
  • 93.
    Scaling - Partitioning Namespacepartitioned between servers. N partitions, each with master and replicas Node D/C stored on P(D/C) = hash(D) mod N Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 34 / 38
  • 94.
    Summary Amir H. Payberah(Tehran Polytechnic) Chubby 1393/7/5 35 / 38
  • 95.
    Summary Library vs. Service Chubby:coarse-grained lock service Chubby cell and clients Unix-like interface Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 36 / 38
  • 96.
    References: M. Burrows, TheChubby lock service for loosely-coupled distributed systems, OSDI, 2006. Amir H. Payberah (Tehran Polytechnic) Chubby 1393/7/5 37 / 38
  • 97.
    Questions? Amir H. Payberah(Tehran Polytechnic) Chubby 1393/7/5 38 / 38