5. Example: Social Network
Spanner 5
[Figure: user posts and friend lists sharded across four datacenters (US, Brazil, Russia, Spain), each labeled x1000. Cities shown: San Francisco, Seattle, Arizona, Sao Paulo, Santiago, Buenos Aires, Moscow, Berlin, Krakow, London, Paris, Berlin, Madrid, Lisbon.]
7. Single Machine
[Figure: user posts and friend lists stored on a single machine. "Generate my page" reads Friend1 post, Friend2 post, …, Friend999 post, Friend1000 post. Label: Block writes.]
8. Multiple Machines
[Figure: user posts and friend lists spread across multiple machines. "Generate my page" reads Friend1 post, Friend2 post, …, Friend999 post, Friend1000 post. Label: Block writes.]
9. Multiple Datacenters
[Figure: user posts and friend lists spread across datacenters in the US, Spain, Russia, and Brazil, each labeled x1000. "Generate my page" reads Friend1 post, Friend2 post, …, Friend999 post, Friend1000 post.]
31. Leader Lease
Default lease length: 10 seconds
A prospective leader sends requests for timed lease votes
A quorum of lease votes confirms leadership
The leader may request a lease extension
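The lease steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Spanner's implementation: the `Replica` class, `request_lease` function, and the rule that a replica re-grants its vote to the current holder are hypothetical, and time is passed in as a plain number.

```python
from dataclasses import dataclass
from typing import List, Optional

LEASE_SECONDS = 10.0  # default lease length stated on the slide


@dataclass
class Replica:
    """Hypothetical replica that holds at most one unexpired lease vote."""
    voted_for: Optional[str] = None
    vote_expires: float = 0.0

    def grant_vote(self, candidate: str, now: float) -> bool:
        # Grant if the previous vote has expired, or if it already belongs
        # to this candidate (which models a lease extension).
        if now >= self.vote_expires or self.voted_for == candidate:
            self.voted_for = candidate
            self.vote_expires = now + LEASE_SECONDS
            return True
        return False


def request_lease(candidate: str, replicas: List[Replica], now: float) -> Optional[float]:
    """Return the lease expiry if a quorum votes for the candidate, else None."""
    votes = sum(r.grant_vote(candidate, now) for r in replicas)
    quorum = len(replicas) // 2 + 1
    return now + LEASE_SECONDS if votes >= quorum else None
```

In this sketch a second candidate cannot obtain a quorum while the first leader's votes are unexpired, and the current leader extends its lease by asking again before expiry.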
35. Version Management
Transactions that write use strict 2PL
– Each transaction T is assigned a timestamp s
– Data written by T is timestamped with s
Time          <8      8       15
My friends    [X]     []
My posts                      [P]
X's friends   [me]    []
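A small sketch of timestamped versions may help here. It assumes a hypothetical in-memory `VersionedStore` (not a Spanner API) in which every write carries its transaction's timestamp s and a read at time t returns the newest version with timestamp at most t. The sample data follows the friend-list values on this slide; the initial timestamp 1 is an arbitrary stand-in for "before 8".

```python
import bisect


class VersionedStore:
    """Hypothetical multi-version store: each key keeps timestamped versions."""

    def __init__(self):
        self._versions = {}  # key -> (sorted timestamps, values in the same order)

    def write(self, key, value, s):
        """Record a value written by a transaction whose timestamp is s."""
        stamps, values = self._versions.setdefault(key, ([], []))
        i = bisect.bisect_right(stamps, s)
        stamps.insert(i, s)
        values.insert(i, value)

    def read(self, key, t):
        """Return the latest value with timestamp <= t, or None if there is none."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, t)
        return values[i - 1] if i else None


store = VersionedStore()
store.write("my_friends", ["X"], 1)   # state before time 8 (timestamp 1 is arbitrary)
store.write("x_friends", ["me"], 1)
store.write("my_friends", [], 8)      # both friend lists are emptied at s = 8
store.write("x_friends", [], 8)
store.write("my_posts", ["P"], 15)    # post P is written at s = 15
```

Reading at any timestamp between 8 and 14 sees both friend lists empty and no post, so no snapshot shows X as a friend alongside post P.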
40. What If a Clock Goes Rogue?
Timestamp assignment would violate external consistency
Empirically unlikely based on 1 year of data
– Bad CPUs 6 times more likely than bad clocks
46. Timestamps, Global Clock
Strict two-phase locking for write transactions
Assign timestamp while locks are held
[Timeline of T: acquired locks, pick s = now(), release locks.]
47. Timestamps and TrueTime
[Timeline of T: acquired locks; pick s = TT.now().latest; commit wait until TT.now().earliest > s; release locks. The figure marks two spans, each of average ε.]
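The commit-wait rule can be sketched as below. This is an illustration under assumptions: `tt_now` is a hypothetical stand-in for TT.now() built on a local clock with a fixed, made-up uncertainty `EPSILON`, and lock handling is left to the caller.

```python
import time
from dataclasses import dataclass

EPSILON = 0.005  # assumed clock uncertainty in seconds; illustrative only


@dataclass
class TTInterval:
    earliest: float
    latest: float


def tt_now(clock=time.monotonic):
    """Hypothetical stand-in for TT.now(): an interval assumed to contain true time."""
    t = clock()
    return TTInterval(t - EPSILON, t + EPSILON)


def commit(apply_writes, clock=time.monotonic, sleep=time.sleep):
    """Pick s while locks are held, then commit-wait before the caller releases them."""
    s = tt_now(clock).latest             # pick s = TT.now().latest
    apply_writes(s)                      # data written by T is timestamped with s
    while tt_now(clock).earliest <= s:   # wait until TT.now().earliest > s
        sleep(EPSILON / 10)
    return s                             # safe to release locks now
```

With a constant uncertainty, the wait in this sketch lasts roughly 2 × `EPSILON` from the moment s is picked; the `clock` and `sleep` parameters exist only so the loop can be driven by a fake clock in tests.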
48. Commit Wait and 2-Phase Commit
[Figure: timelines for coordinator TC and participants TP1 and TP2, each running from "Acquired locks" to "Release locks". Participants: compute s for each, start logging, done logging, prepared, send s. Coordinator: compute overall s, commit wait done, committed, notify participants of s.]
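The timestamp flow in the figure can be sketched roughly as follows. The `Participant` class and `two_phase_commit` function are hypothetical, timestamps are plain integers, and taking the maximum is an assumption consistent with the example values sC=6, sP=8, s=8 on the next slide; logging and the commit wait itself are reduced to comments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """Hypothetical participant that prepares with its own timestamp."""
    name: str
    prepare_ts: int
    log: list = field(default_factory=list)
    committed_at: Optional[int] = None

    def prepare(self) -> int:
        # Log the prepare record, then send the prepare timestamp to the coordinator.
        self.log.append(("prepared", self.prepare_ts))
        return self.prepare_ts

    def commit(self, s: int) -> None:
        # Apply the overall timestamp chosen by the coordinator.
        self.log.append(("committed", s))
        self.committed_at = s


def two_phase_commit(coordinator_ts: int, participants: List[Participant]) -> int:
    """Collect prepare timestamps, compute the overall s, then notify participants."""
    prepare_timestamps = [p.prepare() for p in participants]
    s = max([coordinator_ts, *prepare_timestamps])  # overall s is no smaller than any input
    # The coordinator's commit wait on s would happen here, before notifying anyone.
    for p in participants:
        p.commit(s)
    return s
```

For a coordinator timestamp of 6 and a single participant prepared at 8, the sketch commits both at 8.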
49. Example
TC: Remove X from my friend list (sC=6)
TP: Remove myself from X's friend list (sP=8)
TC and TP commit at s=8
T2: Risky post P (s=15)

Time          <8      8       15
My friends    [X]     []
My posts                      [P]
X's friends   [me]    []
55. Reduce wait time for read
Fine-grained mapping from key ranges to t_safe^TM
Fine-grained mapping from key ranges to LastTS()
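One way to picture a fine-grained mapping is a per-range table of last commit timestamps. The slide gives no design details, so everything here is an assumption: the `RangeLastTS` class, its split-point representation, and its method names are hypothetical.

```python
import bisect


class RangeLastTS:
    """Hypothetical map from key ranges to the timestamp of the last commit in each."""

    def __init__(self, split_points):
        # Split points ["g", "p"] define three ranges: below "g", "g" up to "p", "p" and above.
        self._splits = sorted(split_points)
        self._last_ts = [0] * (len(self._splits) + 1)

    def _index(self, key):
        return bisect.bisect_right(self._splits, key)

    def record_commit(self, key, s):
        """Remember that a write to this key committed at timestamp s."""
        i = self._index(key)
        self._last_ts[i] = max(self._last_ts[i], s)

    def last_ts(self, key=None):
        """Per-range value for a key, or the coarse whole-map value when key is None."""
        if key is None:
            return max(self._last_ts)
        return self._last_ts[self._index(key)]
```

In this sketch a read that touches only a quiet key range gets a lower timestamp than the coarse, whole-map value, which is the kind of reduction in read wait time the slide title refers to.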
56. Future Work
Improving TrueTime
– Lower ε < 1 ms
Building out database features
– Finish implementing basic features
– Efficiently support rich query patterns
58. Q/A
Zone master seems to be a single point of failure
Difference between a BigTable tablet and a Spanner tablet
59. Q/A
Why does the timeslave daemon poll both nearby and distant GPS masters for time synchronization?
What if a server fails in the midst of processing a read-only request?
60. What’s in the Literature
External consistency/linearizability
Distributed databases
Concurrency control
Replication
Time (NTP, Marzullo)
61. Conclusions
Reify clock uncertainty in time APIs
– Known unknowns are better than unknown unknowns
– Rethink algorithms to make use of uncertainty
Stronger semantics are achievable
– Greater scale != weaker semantics
62. Thanks
To Spanner Developer Team
To Sebastian Kanthak, Wilson Hsieh & others
To you for listening!
Editor's Notes
Bad hosts are evicted
Timemasters check themselves against other timemasters
Clients check themselves against timemasters