Spanner osdi2012

J
Jose Maria FusterSantander Group
Spanner: Google’s 
Globally-Distributed Database 
Wilson Hsieh 
representing a host of authors 
OSDI 2012
What is Spanner? 
• Distributed multiversion database 
• General-purpose transactions (ACID) 
• SQL query language 
• Schematized tables 
• Semi-relational data model 
• Running in production 
• Storage for Google’s ad data 
• Replaced a sharded MySQL database 
OSDI 2012 2
Example: Social Network 
OSDI 2012 
User posts 
Friend lists 
US 
Brazil 
Russia 
San Francisco 
Seattle 
Arizona 
Spain 
Sao Paulo 
Santiago 
Buenos Aires 
Moscow 
Berlin 
Krakow 
London 
Paris 
Berlin 
Madrid 
Lisbon 
3 
x1000 
x1000 
x1000 
x1000
Overview 
• Feature: Lock-free distributed read transactions 
• Property: External consistency of distributed 
transactions 
– First system at global scale 
• Implementation: Integration of concurrency 
control, replication, and 2PC 
– Correctness and performance 
• Enabling technology: TrueTime 
– Interval-based global time 
OSDI 2012 4
Read Transactions 
• Generate a page of friends’ recent posts 
– Consistent view of friend list and their posts 
OSDI 2012 
Why consistency matters 
1. Remove untrustworthy person X as friend 
2. Post P: “My government is repressive…” 
5
Single Machine 
User posts 
Friend lists 
Friend2 post 
Generate my page 
Friend1 post 
Friend999 post 
Friend1000 post 
Block writes 
OSDI 2012 
… 
6
Multiple Machines 
User posts 
Friend lists 
User posts 
Friend lists 
Generate my page 
Block writes 
Friend1 post 
Friend2 post 
… 
Friend999 post 
Friend1000 post 
OSDI 2012 
7
Multiple Datacenters 
User posts 
Friend lists 
User posts 
x1000 
Friend lists 
User posts 
Friend lists 
User posts 
Friend lists 
Generate my page 
Friend1 post 
US 
Friend2 post 
Spain 
Friend999 post 
Brazil 
Friend1000 post 
OSDI 2012 
… 
Russia 
8 
x1000 
x1000 
x1000
Version Management 
• Transactions that write use strict 2PL 
– Each transaction T is assigned a timestamp s 
– Data written by T is timestamped with s 
Time <8 8 
[X] 
[me] 
15 
[P] 
My friends 
My posts 
X’s friends 
[] 
[] 
OSDI 2012 9
Synchronizing Snapshots 
Global wall-clock time 
== 
External Consistency: 
Commit order respects global wall-time order 
== 
Timestamp order respects global wall-time order 
given 
timestamp order == commit order 
OSDI 2012 10
Timestamps, Global Clock 
• Strict two-phase locking for write transactions 
• Assign timestamp while locks are held 
T 
Acquired locks Release locks 
Pick s = now() 
OSDI 2012 11
Timestamp Invariants 
• Timestamp order == commit order 
T2 
T1 
• Timestamp order respects global wall-time order 
T3 
T4 
OSDI 2012 12
TrueTime 
• “Global wall-clock time” with bounded 
uncertainty 
time 
TT.now() 
earliest latest 
2*ε 
OSDI 2012 13
Timestamps and TrueTime 
T 
Acquired locks Release locks 
Pick s = TT.now().latest 
s Wait until TT.now().earliest > s 
OSDI 2012 
Commit wait 
average ε 
average ε 
14
Commit Wait and Replication 
OSDI 2012 
T 
Start consensus Notify slaves 
Acquired locks Release locks 
Pick s Commit wait done 
15 
Achieve consensus
Commit Wait and 2-Phase Commit 
TC 
OSDI 2012 
Acquired locks Release locks 
TP1 
Notify participants of s 
Acquired locks Release locks 
TP2 
Acquired locks Release locks 
Compute s for each Commit wait done 
16 
Start logging Done logging 
Prepared 
Compute overall s 
Committed 
Send s
Example 
TC T2 
TP 
Remove X from 
my friend list 
Risky post P 
s=8 s=15 
Remove myself 
from X’s friend list 
sC=6 
sP=8 
s=8 
Time <8 
[X] 
[me] 
15 
[P] 
My friends 
My posts 
X’s friends 
8 
[] 
[] 
OSDI 2012 17
What Have We Covered? 
• Lock-free read transactions across datacenters 
• External consistency 
• Timestamp assignment 
• TrueTime 
– Uncertainty in time can be waited out 
OSDI 2012 18
What Haven’t We Covered? 
• How to read at the present time 
• Atomic schema changes 
– Mostly non-blocking 
– Commit in the future 
• Non-blocking reads in the past 
– At any sufficiently up-to-date replica 
OSDI 2012 19
TrueTime Architecture 
GPS 
timemaster 
GPS 
timemaster 
GPS 
timemaster 
Atomic-clock 
timemaster 
GPS 
timemaster 
GPS 
timemaster 
Client 
Datacenter 1 Datacenter 2 … Datacenter n 
Compute reference [earliest, latest] = now ± ε 
OSDI 2012 20
TrueTime implementation 
now = reference now + local-clock offset 
ε = reference ε + worst-case local-clock drift 
200 μs/sec 
time 
ε 
0sec 30sec 60sec 90sec 
+6ms 
reference 
uncertainty 
OSDI 2012 21
What If a Clock Goes Rogue? 
• Timestamp assignment would violate external 
consistency 
• Empirically unlikely based on 1 year of data 
– Bad CPUs 6 times more likely than bad clocks 
OSDI 2012 22
Network-Induced Uncertainty 
10 
8 
6 
4 
Mar 29 Mar 30 Mar 31 Apr 1 
OSDI 2012 
Date 
2 
Epsilon (ms) 
99.9 
99 
90 
6 
5 
4 
3 
2 
6AM 8AM 10AM 12PM 
Date (April 13) 
1 
23
What’s in the Literature 
• External consistency/linearizability 
• Distributed databases 
• Concurrency control 
• Replication 
• Time (NTP, Marzullo) 
OSDI 2012 24
Future Work 
• Improving TrueTime 
– Lower ε < 1 ms 
• Building out database features 
– Finish implementing basic features 
– Efficiently support rich query patterns 
OSDI 2012 25
Conclusions 
• Reify clock uncertainty in time APIs 
– Known unknowns are better than unknown 
unknowns 
– Rethink algorithms to make use of uncertainty 
• Stronger semantics are achievable 
– Greater scale != weaker semantics 
OSDI 2012 26
Thanks 
• To the Spanner team and customers 
• To our shepherd and reviewers 
• To lots of Googlers for feedback 
• To you for listening! 
• Questions? 
OSDI 2012 27
1 of 27

Recommended

Google Spanner by
Google SpannerGoogle Spanner
Google SpannerVaidas Brundza
2.1K views39 slides
Spanner by
SpannerSpanner
SpannerNafeez Abrar
1.6K views62 slides
Spanner (may 19) by
Spanner (may 19)Spanner (may 19)
Spanner (may 19)Sultan Ahmed
410 views91 slides
Corbett osdi12 slides (1) by
Corbett osdi12 slides (1)Corbett osdi12 slides (1)
Corbett osdi12 slides (1)Aksh54
31 views27 slides
Google Spanner : our understanding of concepts and implications by
Google Spanner : our understanding of concepts and implicationsGoogle Spanner : our understanding of concepts and implications
Google Spanner : our understanding of concepts and implicationsHarisankar H
6.1K views18 slides
An Overview of Spanner: Google's Globally Distributed Database by
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseBenjamin Bengfort
8.6K views23 slides

More Related Content

What's hot

Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP by
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPTathagata Das
1.3K views36 slides
MapReduce basics by
MapReduce basicsMapReduce basics
MapReduce basicsHarisankar H
1.3K views18 slides
Scheduling in Linux and Web Servers by
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web ServersDavid Evans
6.3K views63 slides
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ... by
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Tathagata Das
1.3K views41 slides
Replication, Durability, and Disaster Recovery by
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoverySteven Francia
9.9K views86 slides
SignalFx: Making Cassandra Perform as a Time Series Database by
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseDataStax Academy
1.6K views23 slides

What's hot(20)

Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP by Tathagata Das
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSPDiscretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Discretized Stream - Fault-Tolerant Streaming Computation at Scale - SOSP
Tathagata Das1.3K views
Scheduling in Linux and Web Servers by David Evans
Scheduling in Linux and Web ServersScheduling in Linux and Web Servers
Scheduling in Linux and Web Servers
David Evans6.3K views
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ... by Tathagata Das
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Guest Lecture on Spark Streaming in Stanford CME 323: Distributed Algorithms ...
Tathagata Das1.3K views
Replication, Durability, and Disaster Recovery by Steven Francia
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
Steven Francia9.9K views
SignalFx: Making Cassandra Perform as a Time Series Database by DataStax Academy
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series Database
DataStax Academy1.6K views
Dynamo cassandra by Wu Liang
Dynamo cassandraDynamo cassandra
Dynamo cassandra
Wu Liang927 views
Introduction to Cassandra: Replication and Consistency by Benjamin Black
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
Benjamin Black51.9K views
Pgxc scalability pg_open2012 by Ashutosh Bapat
Pgxc scalability pg_open2012Pgxc scalability pg_open2012
Pgxc scalability pg_open2012
Ashutosh Bapat783 views
Managing terabytes: When Postgres gets big by Selena Deckelmann
Managing terabytes: When Postgres gets bigManaging terabytes: When Postgres gets big
Managing terabytes: When Postgres gets big
Selena Deckelmann753 views
Real-Time Big Data with Storm, Kafka and GigaSpaces by Oleksii Diagiliev
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
Oleksii Diagiliev1.1K views
Paxos building-reliable-system by Yanpo Zhang
Paxos building-reliable-systemPaxos building-reliable-system
Paxos building-reliable-system
Yanpo Zhang2.1K views
Cassandra NYC 2011 Data Modeling by Matthew Dennis
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
Matthew Dennis7.9K views
Introduction to Cassandra by Gokhan Atil
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
Gokhan Atil7.7K views
Understanding AntiEntropy in Cassandra by Jason Brown
Understanding AntiEntropy in CassandraUnderstanding AntiEntropy in Cassandra
Understanding AntiEntropy in Cassandra
Jason Brown17.2K views
Distributed Postgres by Stas Kelvich
Distributed PostgresDistributed Postgres
Distributed Postgres
Stas Kelvich370 views
Large volume data analysis on the Typesafe Reactive Platform by Martin Zapletal
Large volume data analysis on the Typesafe Reactive PlatformLarge volume data analysis on the Typesafe Reactive Platform
Large volume data analysis on the Typesafe Reactive Platform
Martin Zapletal4.7K views

Similar to Spanner osdi2012

A Brief History of Stream Processing by
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream ProcessingAleksandr Kuboskin, CFA
234 views66 slides
Lifting the hood on spark streaming - StampedeCon 2015 by
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015StampedeCon
1.4K views53 slides
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing by
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
900 views57 slides
Cassandra introduction mars jug by
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
1K views110 slides
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa... by
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...DataStax
5.4K views25 slides
TenMax Data Pipeline Experience Sharing by
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience SharingChen-en Lu
2K views43 slides

Similar to Spanner osdi2012(20)

Lifting the hood on spark streaming - StampedeCon 2015 by StampedeCon
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015
StampedeCon1.4K views
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing by DoiT International
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
DoiT International900 views
Cassandra introduction mars jug by Duyhai Doan
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
Duyhai Doan1K views
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa... by DataStax
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
Maintaining Consistency Across Data Centers (Randy Fradin, BlackRock) | Cassa...
DataStax5.4K views
TenMax Data Pipeline Experience Sharing by Chen-en Lu
TenMax Data Pipeline Experience SharingTenMax Data Pipeline Experience Sharing
TenMax Data Pipeline Experience Sharing
Chen-en Lu2K views
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr... by Oscar Corcho
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
Oscar Corcho25.1K views
Finding an unusual cause of max_user_connections in MySQL by Olivier Doucet
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQL
Olivier Doucet572 views
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012 by TEST Huddle
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
TEST Huddle388 views
Dataflow - A Unified Model for Batch and Streaming Data Processing by DoiT International
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
DoiT International1.3K views
Open Source North - MongoDB Advanced Schema Design Patterns by Matthew Kalan
Open Source North - MongoDB Advanced Schema Design PatternsOpen Source North - MongoDB Advanced Schema Design Patterns
Open Source North - MongoDB Advanced Schema Design Patterns
Matthew Kalan411 views
Data Stream Processing - Concepts and Frameworks by Matthias Niehoff
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and Frameworks
Matthias Niehoff1.3K views
Sizing MongoDB Clusters by MongoDB
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
MongoDB828 views
A Day in the Life of a Druid Implementor and Druid's Roadmap by Itai Yaffe
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
Itai Yaffe297 views
Tsinghua University: Two Exemplary Applications in China by DataStax Academy
Tsinghua University: Two Exemplary Applications in ChinaTsinghua University: Two Exemplary Applications in China
Tsinghua University: Two Exemplary Applications in China
DataStax Academy775 views
Crash course on data streaming (with examples using Apache Flink) by Vincenzo Gulisano
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
Vincenzo Gulisano255 views
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste... by DataStax Academy
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
DataStax Academy1.2K views

Recently uploaded

Cracking the Code Decoding Leased Line Quotes for Connectivity Excellence.pptx by
Cracking the Code Decoding Leased Line Quotes for Connectivity Excellence.pptxCracking the Code Decoding Leased Line Quotes for Connectivity Excellence.pptx
Cracking the Code Decoding Leased Line Quotes for Connectivity Excellence.pptxLeasedLinesQuote
5 views8 slides
WITS Deck by
WITS DeckWITS Deck
WITS DeckW.I.T.S.
27 views22 slides
How to think like a threat actor for Kubernetes.pptx by
How to think like a threat actor for Kubernetes.pptxHow to think like a threat actor for Kubernetes.pptx
How to think like a threat actor for Kubernetes.pptxLibbySchulze1
7 views33 slides
Affiliate Marketing by
Affiliate MarketingAffiliate Marketing
Affiliate MarketingNavin Dhanuka
20 views30 slides
hamro digital logics.pptx by
hamro digital logics.pptxhamro digital logics.pptx
hamro digital logics.pptxtupeshghimire
11 views36 slides
ARNAB12.pdf by
ARNAB12.pdfARNAB12.pdf
ARNAB12.pdfArnabChakraborty499766
5 views83 slides

Recently uploaded(10)

Spanner osdi2012

  • 1. Spanner: Google’s Globally-Distributed Database Wilson Hsieh representing a host of authors OSDI 2012
  • 2. What is Spanner? • Distributed multiversion database • General-purpose transactions (ACID) • SQL query language • Schematized tables • Semi-relational data model • Running in production • Storage for Google’s ad data • Replaced a sharded MySQL database OSDI 2012 2
  • 3. Example: Social Network OSDI 2012 User posts Friend lists US Brazil Russia San Francisco Seattle Arizona Spain Sao Paulo Santiago Buenos Aires Moscow Berlin Krakow London Paris Berlin Madrid Lisbon 3 x1000 x1000 x1000 x1000
  • 4. Overview • Feature: Lock-free distributed read transactions • Property: External consistency of distributed transactions – First system at global scale • Implementation: Integration of concurrency control, replication, and 2PC – Correctness and performance • Enabling technology: TrueTime – Interval-based global time OSDI 2012 4
  • 5. Read Transactions • Generate a page of friends’ recent posts – Consistent view of friend list and their posts OSDI 2012 Why consistency matters 1. Remove untrustworthy person X as friend 2. Post P: “My government is repressive…” 5
  • 6. Single Machine User posts Friend lists Friend2 post Generate my page Friend1 post Friend999 post Friend1000 post Block writes OSDI 2012 … 6
  • 7. Multiple Machines User posts Friend lists User posts Friend lists Generate my page Block writes Friend1 post Friend2 post … Friend999 post Friend1000 post OSDI 2012 7
  • 8. Multiple Datacenters User posts Friend lists User posts x1000 Friend lists User posts Friend lists User posts Friend lists Generate my page Friend1 post US Friend2 post Spain Friend999 post Brazil Friend1000 post OSDI 2012 … Russia 8 x1000 x1000 x1000
  • 9. Version Management • Transactions that write use strict 2PL – Each transaction T is assigned a timestamp s – Data written by T is timestamped with s Time <8 8 [X] [me] 15 [P] My friends My posts X’s friends [] [] OSDI 2012 9
  • 10. Synchronizing Snapshots Global wall-clock time == External Consistency: Commit order respects global wall-time order == Timestamp order respects global wall-time order given timestamp order == commit order OSDI 2012 10
  • 11. Timestamps, Global Clock • Strict two-phase locking for write transactions • Assign timestamp while locks are held T Acquired locks Release locks Pick s = now() OSDI 2012 11
  • 12. Timestamp Invariants • Timestamp order == commit order T2 T1 • Timestamp order respects global wall-time order T3 T4 OSDI 2012 12
  • 13. TrueTime • “Global wall-clock time” with bounded uncertainty time TT.now() earliest latest 2*ε OSDI 2012 13
  • 14. Timestamps and TrueTime T Acquired locks Release locks Pick s = TT.now().latest s Wait until TT.now().earliest > s OSDI 2012 Commit wait average ε average ε 14
  • 15. Commit Wait and Replication OSDI 2012 T Start consensus Notify slaves Acquired locks Release locks Pick s Commit wait done 15 Achieve consensus
  • 16. Commit Wait and 2-Phase Commit TC OSDI 2012 Acquired locks Release locks TP1 Notify participants of s Acquired locks Release locks TP2 Acquired locks Release locks Compute s for each Commit wait done 16 Start logging Done logging Prepared Compute overall s Committed Send s
  • 17. Example TC T2 TP Remove X from my friend list Risky post P s=8 s=15 Remove myself from X’s friend list sC=6 sP=8 s=8 Time <8 [X] [me] 15 [P] My friends My posts X’s friends 8 [] [] OSDI 2012 17
  • 18. What Have We Covered? • Lock-free read transactions across datacenters • External consistency • Timestamp assignment • TrueTime – Uncertainty in time can be waited out OSDI 2012 18
  • 19. What Haven’t We Covered? • How to read at the present time • Atomic schema changes – Mostly non-blocking – Commit in the future • Non-blocking reads in the past – At any sufficiently up-to-date replica OSDI 2012 19
  • 20. TrueTime Architecture GPS timemaster GPS timemaster GPS timemaster Atomic-clock timemaster GPS timemaster GPS timemaster Client Datacenter 1 Datacenter 2 … Datacenter n Compute reference [earliest, latest] = now ± ε OSDI 2012 20
  • 21. TrueTime implementation now = reference now + local-clock offset ε = reference ε + worst-case local-clock drift 200 μs/sec time ε 0sec 30sec 60sec 90sec +6ms reference uncertainty OSDI 2012 21
  • 22. What If a Clock Goes Rogue? • Timestamp assignment would violate external consistency • Empirically unlikely based on 1 year of data – Bad CPUs 6 times more likely than bad clocks OSDI 2012 22
  • 23. Network-Induced Uncertainty 10 8 6 4 Mar 29 Mar 30 Mar 31 Apr 1 OSDI 2012 Date 2 Epsilon (ms) 99.9 99 90 6 5 4 3 2 6AM 8AM 10AM 12PM Date (April 13) 1 23
  • 24. What’s in the Literature • External consistency/linearizability • Distributed databases • Concurrency control • Replication • Time (NTP, Marzullo) OSDI 2012 24
  • 25. Future Work • Improving TrueTime – Lower ε < 1 ms • Building out database features – Finish implementing basic features – Efficiently support rich query patterns OSDI 2012 25
  • 26. Conclusions • Reify clock uncertainty in time APIs – Known unknowns are better than unknown unknowns – Rethink algorithms to make use of uncertainty • Stronger semantics are achievable – Greater scale != weaker semantics OSDI 2012 26
  • 27. Thanks • To the Spanner team and customers • To our shepherd and reviewers • To lots of Googlers for feedback • To you for listening! • Questions? OSDI 2012 27