Clock-RSM: Low-Latency Inter-Datacenter
State Machine Replication Using Loosely
Synchronized Physical Clocks
Jiaqing Du, Daniele Sciascia, Sameh Elnikety
Willy Zwaenepoel, Fernando Pedone
EPFL, University of Lugano, Microsoft Research
Replicated State Machines (RSM)
• Strong consistency
– Execute same commands in same order
– Reach same state from same initial state
• Fault tolerance
– Store data at multiple replicas
– Failure masking / fast failover
Geo-Replication
[Figure: five geographically distributed data centers]
• High latency among replicas
• Messaging dominates replication latency
Leader-Based Protocols
• Order commands by a leader replica
• Require extra ordering messages at follower
[Figure: message timeline between a Leader and a Follower, from client request to client reply, with separate Ordering and Replication steps]
High latency for geo replication
Clock-RSM
• Orders commands using physical clocks
• Overlaps ordering and replication
[Figure: message timeline from client request to client reply, with a single combined Ordering + Replication step]
Low latency for geo replication
Outline
• Clock-RSM
• Comparison with Paxos
• Evaluation
• Conclusion
Properties and Assumptions
• Provides linearizability
• Tolerates failure of a minority of replicas
• Assumptions
– Asynchronous FIFO channels
– Non-Byzantine faults
– Loosely synchronized physical clocks
Protocol Overview
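[Figure: two clients send requests to Clock-RSM replicas; each command is timestamped with the local clock (cmd1.ts = Clock(), cmd2.ts = Clock()), replicas exchange PrepOK messages, and all five replicas hold the commands in the same order (cmd1, cmd2) before the client replies]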
Major Message Steps
• Prep: Ask everyone to log a command
• PrepOK: Tell everyone after logging a command
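[Figure: replicas R0 to R4; a client request is assigned cmd1.ts = 24 and a Prep is sent; replicas answer with PrepOK; a second client request at another replica is assigned cmd2.ts = 23. Question posed: is cmd1 committed?]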
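The timestamps on this slide can be turned into an execution order with a short sketch. It is not the authors' implementation: the `Command` class, the `stamp` helper, the replica-id tie-breaker, and the placement of cmd1 at R0 and cmd2 at R4 are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class Command:
    ts: int                              # physical clock reading at the originating replica
    replica_id: int                      # assumed tie-breaker when two timestamps are equal
    payload: str = field(compare=False)  # not part of the ordering


def clock() -> int:
    """Local physical clock in microseconds; only loosely synchronized across replicas."""
    return time.time_ns() // 1_000


def stamp(payload: str, replica_id: int, now=clock) -> Command:
    """Timestamp a client request when it arrives at a replica."""
    return Command(ts=now(), replica_id=replica_id, payload=payload)


# Timestamps from the slide, injected through a fake clock so the run is repeatable.
cmd1 = stamp("cmd1", replica_id=0, now=lambda: 24)
cmd2 = stamp("cmd2", replica_id=4, now=lambda: 23)

# Every replica sorts by (ts, replica_id), so all replicas derive the same order.
execution_order = [c.payload for c in sorted([cmd1, cmd2])]
print(execution_order)
```

Because cmd2 carries the smaller timestamp, it is ordered before cmd1 even though the slide introduces cmd1 first.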
Commit Conditions
• A command is committed if
– It is replicated by a majority
– All commands ordered before it are committed
• Wait until three conditions hold
C1: Majority replication
C2: Stable order
C3: Prefix replication
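The three conditions can be read as one predicate evaluated at a replica. The sketch below is illustrative only: `LogEntry`, `can_commit`, and the `latest_ts` map (the greatest timestamp received so far from each other replica) are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    ts: int                                        # timestamp assigned by the originating replica
    logged_by: set = field(default_factory=set)    # replica ids known to have logged the command


def can_commit(cmd, log, latest_ts, n_replicas):
    """Return True when C1, C2 and C3 all hold for cmd at this replica."""
    quorum = n_replicas // 2 + 1
    # C1: majority replication of cmd itself.
    c1 = len(cmd.logged_by) >= quorum
    # C2: stable order, i.e. a greater timestamp was received from every other replica.
    c2 = all(ts > cmd.ts for ts in latest_ts.values())
    # C3: prefix replication, i.e. every earlier command is also logged by a majority.
    c3 = all(len(e.logged_by) >= quorum for e in log if e.ts < cmd.ts)
    return c1 and c2 and c3


# State at R0, using the numbers from the slides (five replicas R0 to R4).
cmd1 = LogEntry(ts=24, logged_by={0, 1, 2})
cmd2 = LogEntry(ts=23, logged_by={1, 2, 3})
log = [cmd2, cmd1]
latest = {1: 25, 2: 25, 3: 25, 4: 25}

committed = can_commit(cmd1, log, latest, n_replicas=5)
# If R4 had only been heard from at time 23, cmd1 would not be stable yet.
not_yet = can_commit(cmd1, log, {**latest, 4: 23}, n_replicas=5)
print(committed, not_yet)
```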
C1: Majority Replication
• More than half of the replicas log cmd1
[Figure: replicas R0 to R4; R0 receives a client request, assigns cmd1.ts = 24, and sends Prep; PrepOK messages show cmd1 replicated by R0, R1, R2]
Latency: 1 RTT between R0 and a majority
C2: Stable Order
• Replica knows all commands ordered before cmd1
– Receives a greater timestamp from every other replica
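[Figure: replicas R0 to R4; R0 receives a client request and assigns cmd1.ts = 24; messages from the other replicas (Prep / PrepOK / ClockTime) carry timestamps 23 and 25; cmd1 is stable at R0 once every other replica has sent 25]
Latency: 0.5 RTT between R0 and the farthest peer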
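One way to track stable order is to remember the greatest timestamp heard from each peer. The sketch assumes that Prep, PrepOK, and ClockTime messages all carry the sender's clock reading and that each replica sends in timestamp order over FIFO channels, so nothing older can arrive later. `StableOrderTracker` is a hypothetical name.

```python
class StableOrderTracker:
    """Tracks the latest clock value received from every other replica."""

    def __init__(self, peers):
        # 0 means nothing has been received from that peer yet.
        self.latest = {p: 0 for p in peers}

    def on_message(self, sender, ts):
        # Called for any message type that carries a timestamp (assumed:
        # Prep, PrepOK, ClockTime). FIFO delivery keeps values non-decreasing,
        # max() just makes the update robust.
        self.latest[sender] = max(self.latest[sender], ts)

    def is_stable(self, cmd_ts):
        # Stable once every peer has sent something greater than cmd_ts:
        # no command with a smaller timestamp can still be in flight.
        return min(self.latest.values()) > cmd_ts


# View from R0 for cmd1.ts = 24, following the slide.
tracker = StableOrderTracker(peers=[1, 2, 3, 4])
tracker.on_message(4, 23)            # e.g. the Prep for cmd2
for peer in (1, 2, 3):
    tracker.on_message(peer, 25)
before = tracker.is_stable(24)       # R4 has only been heard from at 23
tracker.on_message(4, 25)            # e.g. a PrepOK or ClockTime from R4
after = tracker.is_stable(24)
print(before, after)
```

An idle replica has no Prep or PrepOK to send, which is presumably why a periodic ClockTime message is listed alongside them.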
C3: Prefix Replication
• All commands ordered before cmd1 are replicated by a majority
[Figure: replicas R0 to R4 with two client requests; cmd1.ts = 24 and cmd2.ts = 23; Prep and PrepOK messages for cmd2 show that cmd2 is replicated by R1, R2, R3]
Latency: 1 RTT (R4 to majority + majority to R0)
Overlapping Steps
[Figure: timeline at replicas R0 to R4 for cmd1 (cmd1.ts = 24), from client request to client reply; the Prep, Log(cmd1), and PrepOK exchanges for majority replication, stable order, and prefix replication overlap in time]
Latency of cmd1: about 1 RTT to a majority
Commit Latency
Step                    Latency
Majority replication    1 RTT (majority1)
Stable order            0.5 RTT (farthest)
Prefix replication      1 RTT (majority2)
Overall latency = MAX{ 1 RTT (majority1), 0.5 RTT (farthest), 1 RTT (majority2) }
If 0.5 RTT (farthest) < 1 RTT (majority), then overall latency ≈ 1 RTT (majority).
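The formula can be checked with a few lines of arithmetic. The RTT values below are made up for illustration and are not measurements from the paper.

```python
def commit_latency(rtt_majority1, rtt_farthest, rtt_majority2):
    """Overall commit latency: the three steps overlap, so take the maximum."""
    return max(rtt_majority1, 0.5 * rtt_farthest, rtt_majority2)


# Hypothetical round-trip times in milliseconds.
latency = commit_latency(rtt_majority1=90, rtt_farthest=160, rtt_majority2=85)
print(latency)  # half the farthest RTT is 80 ms, below the 90 ms majority RTT
```

With these numbers the stable-order step is not the bottleneck, so the result is the majority round trip.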
Topology Examples
[Figure: two example topologies of replicas R0 to R4, each with a client request and with the Majority1 and Farthest replicas marked]
Paxos 1: Multi-Paxos
• Single leader orders commands
– Logical clock: 0, 1, 2, 3, ...
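[Figure: replicas R0 to R4 with R2 as leader; a client request at a follower is sent to the leader (Forward), the leader sends Prep, replicas answer with PrepOK, and the leader sends Commit before the client reply]
Latency at followers: 2 RTTs (leader & majority)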
Paxos 2: Paxos-bcast
• Every replica broadcasts PrepOK
– Trades off message complexity for latency
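[Figure: replicas R0 to R4 with R2 as leader; a client request at a follower is sent to the leader (Forward), the leader sends Prep, and every replica broadcasts PrepOK before the client reply]
Latency at followers: 1.5 RTTs (leader & majority)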
Clock-RSM vs. Paxos
• With realistic topologies, Clock-RSM has
– Lower latency at Paxos follower replicas
– Similar / slightly higher latency at Paxos leader
Protocol       Latency
Clock-RSM      All replicas: 1 RTT (majority), if 0.5 RTT (farthest) < 1 RTT (majority)
Paxos-bcast    Leader: 1 RTT (majority)
               Follower: 1.5 RTTs (leader & majority)
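The table can be explored with a simplified latency model over one-way delays. Everything here is an assumption for illustration: the delay matrix is invented, clocks are treated as perfectly synchronized, processing time is ignored, and the prefix-replication term is left out of the Clock-RSM estimate. It is not the paper's analysis.

```python
def kth_smallest(values, k):
    return sorted(values)[k - 1]


def clock_rsm_latency(d, i):
    """max(1 RTT to a majority, 0.5 RTT to the farthest replica) at replica i."""
    quorum = len(d) // 2 + 1
    # Replica i counts itself (delay 0) towards the majority.
    majority_rtt = kth_smallest([2 * d[i][r] for r in range(len(d))], quorum)
    farthest_one_way = max(d[i])
    return max(majority_rtt, farthest_one_way)


def paxos_bcast_latency(d, i, leader):
    """Forward to the leader, then Prep to r and PrepOK from r back to i, for a majority of r."""
    quorum = len(d) // 2 + 1
    via_leader = [d[leader][r] + d[r][i] for r in range(len(d))]
    return d[i][leader] + kth_smallest(via_leader, quorum)


# Hypothetical symmetric one-way delays in milliseconds between five sites.
d = [
    [0, 40, 80, 90, 60],
    [40, 0, 45, 120, 90],
    [80, 45, 0, 130, 140],
    [90, 120, 130, 0, 40],
    [60, 90, 140, 40, 0],
]
LEADER = 1

clock_rsm = [clock_rsm_latency(d, i) for i in range(5)]
paxos_bcast = [paxos_bcast_latency(d, i, LEADER) for i in range(5)]
print(clock_rsm)
print(paxos_bcast)
```

For this made-up matrix the model gives Clock-RSM a lower latency at every Paxos follower and a somewhat higher one at the leader, which matches the qualitative claim on the slide.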
Experiment Setup
• Replicated key-value store
• Deployed on Amazon EC2
• Regions: California (CA), Virginia (VA), Ireland (IR), Singapore (SG), Japan (JP)
Latency (1/2)
• All replicas serve client requests
Overlapping vs. Separate Steps
[Figure: two copies of the five-region topology (CA, VA, IR, SG, JP), each with a client request; in the second copy VA is marked as the leader (L)]
Clock-RSM latency: max of three
Paxos-bcast latency: sum of three
Latency (2/2)
• Paxos leader is changed to CA
Throughput
• Five replicas on a local cluster
• Message batching is key
Also in the Paper
• A reconfiguration protocol
• Comparison with Mencius
• Latency analysis of protocols
Conclusion
• Clock-RSM: low-latency geo-replication
– Uses loosely synchronized physical clocks
– Overlaps ordering and replication
• Leader-based protocols can incur high latency