Internet-scale Distributed Systems
Google Spanner: a Synchronously-Replicated, Globally-Distributed, Multi-Version Database
Presented by: Maciej Jozwiak
22.01.2013
Agenda
• Problem description
• Overview of available solutions
• Globally-distributed database
• Architecture
• How is data replicated?
• Data model
• TrueTime API
• Transactions
• Summary
Problem – Need for Scalable MySQL
• Google’s advertising backend
– Based on MySQL
• Relations
• Query language
– Manually sharded
• Resharding is very costly
– Global distribution
SHARDING: Sharding is another name for "horizontal partitioning" of a database. Rows of a database table are held separately and form partitions, each of which can be located on a separate database server or at a separate physical location.
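As a minimal sketch of what manual sharding involves, the snippet below routes rows to shards by hashing a key. The `shard_for` and `moved_fraction` helpers, the shard counts, and the sample keys are illustrative assumptions, not details of Google's MySQL setup.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)

# Made-up keys, placed on 4 shards and then re-placed on 5 shards.
keys = [f"customer-{i}" for i in range(1000)]
placement = {k: shard_for(k, 4) for k in keys}
fraction = moved_fraction(keys, 4, 5)
```

With plain modulo placement, most keys land on a different shard once the shard count changes, which is one way to see why resharding by hand is costly.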
Overview of Available Solutions
Google Megastore:
• Replicated ACID transactions
• Schematized semi-relational tables
• Synchronous replication support across data-centers
• Drawbacks: performance, lack of a query language
Google Bigtable:
• Scalability
• Throughput
• Performance
• Drawback: only eventually-consistent replication support across data-centers
Bridging the Gap between Megastore and Bigtable
Solution: Google Spanner
• Removes the need to manually partition data
• Synchronous replication and automatic failover
• Strong transactional semantics
• SQL-based query language
• Semi-relational, schematized tables
Globally-Distributed Database
Cross-datacenter replicated data management:
• high availability
• minimize latency of data reads and writes
• replication configuration dynamically controlled at a fine grain by applications
Future scale:
• one million to 10 million servers
• 100s to 1000s of locations around the world
• 10^13 directories
• 10^18 bytes of storage
Spanner Deployment - Universe
• Universe master: status and interactive debugging
• Placement driver: moves data across zones automatically
How Is Data Replicated?
Paxos: a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
[Figure: Spanserver software stack]
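To make the consensus idea concrete, here is a minimal single-decree Paxos sketch. It is a teaching toy under stated assumptions: the `Acceptor` class, the `propose` function, and the in-process "network" are hypothetical, and it is not Spanner's implementation, which replicates a whole log and uses long-lived leaders.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass
class Acceptor:
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, Any]] = None   # (ballot, value) last accepted
    alive: bool = True                           # False simulates a crashed replica

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return ("promise", self.accepted)

    def accept(self, ballot: int, value: Any) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # Adopt the value with the highest ballot already accepted, if any.
    prior = [acc for _, acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda bv: bv[0])[1]
    votes = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if votes >= majority else None

replicas = [Acceptor() for _ in range(5)]
replicas[4].alive = False                          # one replica is down
first = propose(replicas, ballot=1, value="x=1")
second = propose(replicas, ballot=2, value="x=2")  # a later proposer must adopt "x=1"
```

The second proposer ends up with the first value because every majority it can reach overlaps the majority that already accepted it. That overlap is what keeps the replicas in agreement despite the failed node.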
Replication Configuration
• Replication configurations for data can be dynamically
controlled at a fine grain by applications
• Applications can specify constraints to control:
– which datacenters contain which data
– how far data is from user (to control read latency)
– how far replicas are from each other (to control write
latency)
– how many replicas are maintained (to control durability,
availability, and read performance)
• North America: 5 replicas, Europe 2 replicas
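The trade-offs in the list above can be sketched with a small policy object. `PlacementPolicy` and its methods are hypothetical and do not reflect Spanner's real configuration interface. Treating the slide's example as a single group of seven replicas with majority writes is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class PlacementPolicy:
    """Hypothetical per-application replication policy (not Spanner's real API)."""
    replicas_per_region: Dict[str, int]   # region name -> number of replicas

    def total_replicas(self) -> int:
        return sum(self.replicas_per_region.values())

    def write_quorum(self) -> int:
        """Majority of replicas, as needed by a Paxos-style write."""
        return self.total_replicas() // 2 + 1

    def tolerated_failures(self) -> int:
        """Replicas that may be unavailable while writes still reach a majority."""
        return self.total_replicas() - self.write_quorum()

# The example from the slide: 5 replicas in North America, 2 in Europe.
policy = PlacementPolicy({"north-america": 5, "europe": 2})
```

Under these assumptions a write needs 4 of the 7 replicas, so it can commit using North American replicas alone, which keeps write latency low for users there while the European replicas serve nearby reads.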
Hierarchical Data Model
• Universe (Spanner deployment)
– Database
• Tables
– Rows and columns
– Must have an ordered set of one or more primary key columns
– Primary key uniquely identifies each row
• Hierarchies of tables
– Tables must be partitioned by the client into one or more
hierarchies of tables (INTERLEAVE IN)
– The table at the top of a hierarchy is the directory table
Storing Photo Metadata
[Figures: schema and row layout for photo metadata; labels: "directory table", "directory"]
Albums(2,1) – row from the Albums table for
user_id 2, album_id 1
Interleaving is important because it allows
clients to describe the locality relationship
which is necessary for good performance in a
sharded, distributed database.
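The locality that interleaving gives can be illustrated by sorting rows on their primary keys. This is only a sketch: the `Users` parent table follows the Spanner paper's photo-metadata example, and the row values and the `directories` grouping are made up for illustration.

```python
from collections import defaultdict

# Each row is (table name, primary key tuple); the values are made up.
rows = [
    ("Albums", (2, 1)),
    ("Users", (1,)),
    ("Albums", (1, 2)),
    ("Users", (2,)),
    ("Albums", (1, 1)),
    ("Albums", (2, 3)),
]

# Sorting by primary key places every album right after its parent user,
# because a parent key (user_id,) is a prefix of its children (user_id, album_id).
interleaved = sorted(rows, key=lambda row: row[1])

# Group rows by the first key component: one group per directory.
directories = defaultdict(list)
for table, key in interleaved:
    label = f"{table}({','.join(str(part) for part in key)})"
    directories[key[0]].append(label)
```

Each group holds one user row followed by that user's albums, so a query for one user's albums touches a single contiguous key range instead of rows scattered across shards.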
Key Innovation
Spanner knows what time it is
Is Synchronizing Time at the Global
Scale Possible?
Distributed systems dogma:
• synchronizing time within and
between datacenters is extremely
hard and uncertain
• serialization of requests is
impossible at global scale
Idea: accept uncertainty, keep it small, and quantify it (using GPS
and atomic clocks)
TrueTime API
Novel API distributing a globally synchronized "proper time":
Method        Returns
TT.now()      TTinterval: [earliest, latest]
TT.after(t)   True if t has definitely passed
TT.before(t)  True if t has definitely not arrived
The TTinterval is guaranteed to contain the absolute time during which TT.now() was invoked.
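The three methods can be mimicked with a toy class that widens a local clock reading by a fixed error bound. This is a sketch under assumptions: the `TrueTime` and `TTInterval` classes, the constant `epsilon`, and the injectable `clock` are hypothetical, whereas the real service derives its bound from GPS and atomic-clock masters.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float

class TrueTime:
    """Toy TrueTime: local clock plus/minus a fixed uncertainty bound (an assumption)."""

    def __init__(self, epsilon: float = 0.007, clock=time.time):
        self.epsilon = epsilon      # assumed worst-case clock error, in seconds
        self.clock = clock          # injectable so the demo is deterministic

    def now(self) -> TTInterval:
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        """True if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        """True if t has definitely not arrived."""
        return self.now().latest < t

tt = TrueTime(epsilon=0.005, clock=lambda: 100.0)   # frozen clock for the demo
interval = tt.now()
```

For a timestamp that falls inside the current interval, both `after` and `before` return False: the API refuses to guess, which is exactly the quantified uncertainty the slide describes.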
How Is TrueTime Implemented?
• A set of time master machines per datacenter
• The majority of masters have GPS receivers with dedicated antennas
• The remaining masters (which we refer to as Armageddon masters) are equipped with atomic clocks
• A timeslave daemon per machine
Time Reference Vulnerabilities
Two forms of time reference, two failure modes (uncorrelated with each other):
• GPS:
– antenna and receiver failures
– local radio interference
– correlated failures (e.g. spoofing)
– GPS system outages
• Atomic clock:
– can drift significantly due to frequency error
How Does the Daemon Work?
The daemon polls a variety of masters and reaches a consensus about the correct timestamp:
• masters chosen from nearby datacenters
• masters from farther datacenters
• Armageddon masters
The daemon's poll interval is 30 seconds.
Between synchronizations, the daemon advertises a slowly increasing time uncertainty (ε).
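The growth of the uncertainty between polls can be sketched as a sawtooth. The 30-second poll interval comes from the slide. The drift rate of 200 microseconds per second and the roughly 1 ms base delay are assumptions taken from the Spanner paper, and the `epsilon` function is a hypothetical helper.

```python
POLL_INTERVAL_S = 30.0        # from the slide
DRIFT_RATE = 200e-6           # assumed: 200 microseconds of drift per second
BASE_UNCERTAINTY_S = 0.001    # assumed: about 1 ms communication delay to the masters

def epsilon(seconds_since_start: float) -> float:
    """Advertised uncertainty: grows linearly, then resets at each poll."""
    elapsed = seconds_since_start % POLL_INTERVAL_S
    return BASE_UNCERTAINTY_S + DRIFT_RATE * elapsed

# Uncertainty in milliseconds just after a poll, mid-interval, just before
# the next poll, and just after it.
samples_ms = [epsilon(t) * 1000 for t in (0.0, 15.0, 29.999, 30.0)]
```

Under these assumptions the uncertainty rises from about 1 ms to about 7 ms over each interval and drops back after every successful synchronization.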
Transactions In Spanner
• Globally meaningful commit timestamps are assigned to distributed transactions
– If A happens-before B, then timestamp(A) < timestamp(B)
– A happens-before B if its effects become visible before B begins, in real time
• Visible means acked to the client or updates applied to some replica
• Begins means the first request arrived at a Spanner server
• Two-phase commit
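One way to obtain the ordering guarantee above is the "commit wait" rule described in the Spanner paper: pick a timestamp no earlier than the latest possible current time, then delay visibility until that timestamp has definitely passed. The sketch below simulates this on a single clock. `SimClock`, `commit`, and the 4 ms uncertainty are hypothetical, and the real system applies the rule across many machines and combines it with two-phase commit.

```python
class SimClock:
    """Simulated clock with a fixed uncertainty bound; all names are hypothetical."""

    def __init__(self, epsilon: float):
        self.t = 0.0
        self.epsilon = epsilon

    def now(self):
        return (self.t - self.epsilon, self.t + self.epsilon)   # (earliest, latest)

    def after(self, ts: float) -> bool:
        return self.now()[0] > ts

    def sleep(self, dt: float) -> None:
        self.t += dt

def commit(clock: SimClock, log: list, name: str) -> float:
    """Assign a commit timestamp, then wait until it has definitely passed."""
    ts = clock.now()[1]              # the latest possible current time
    while not clock.after(ts):       # commit wait: roughly 2 * epsilon
        clock.sleep(0.001)
    log.append((name, ts))           # effects become visible only after the wait
    return ts

clock = SimClock(epsilon=0.004)
log = []
ts_a = commit(clock, log, "A")
ts_b = commit(clock, log, "B")       # B begins after A has become visible
```

Because A becomes visible only once its timestamp is in the past for every correct clock, any transaction that begins afterwards picks a larger timestamp, which matches the happens-before rule on the slide.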
What About Performance?
Two-phase commit can raise availability and performance issues.
"We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions."
Summary
• Externally consistent global write-transactions with
synchronous replication.
• Schematized, semi-relational data model.
• SQL-like query interface.
• Auto-sharding, auto-rebalancing, automatic failure
response.
• Exposes control of data replication and placement to
user/application.
22.01.2013 Maciej JozwiakPage 28

More Related Content

What's hot

CockroachDB
CockroachDBCockroachDB
CockroachDBandrei moga
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networksFrancisco Restivo
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use casesGDG Cloud Bengaluru
 
Disk and File System Management in Linux
Disk and File System Management in LinuxDisk and File System Management in Linux
Disk and File System Management in LinuxHenry Osborne
 
Using ZFS file system with MySQL
Using ZFS file system with MySQLUsing ZFS file system with MySQL
Using ZFS file system with MySQLMydbops
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark InternalsKnoldus Inc.
 
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured DataBigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Dataelliando dias
 
Microservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaMicroservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaAraf Karsh Hamid
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark ArchitectureAlexey Grishchenko
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoCLOUDIAN KK
 
Hierarchical Storage Management
Hierarchical Storage ManagementHierarchical Storage Management
Hierarchical Storage ManagementJaydeep Patel
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep diveKashif Khan
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaDatabricks
 

What's hot (20)

CockroachDB
CockroachDBCockroachDB
CockroachDB
 
Big table
Big tableBig table
Big table
 
Distributed DBMS - Unit 3 - Distributed DBMS Architecture
Distributed DBMS - Unit 3 - Distributed DBMS ArchitectureDistributed DBMS - Unit 3 - Distributed DBMS Architecture
Distributed DBMS - Unit 3 - Distributed DBMS Architecture
 
Community detection in social networks
Community detection in social networksCommunity detection in social networks
Community detection in social networks
 
GOOGLE BIGTABLE
GOOGLE BIGTABLEGOOGLE BIGTABLE
GOOGLE BIGTABLE
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use cases
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Distributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data ControlDistributed DBMS - Unit 5 - Semantic Data Control
Distributed DBMS - Unit 5 - Semantic Data Control
 
Disk and File System Management in Linux
Disk and File System Management in LinuxDisk and File System Management in Linux
Disk and File System Management in Linux
 
Using ZFS file system with MySQL
Using ZFS file system with MySQLUsing ZFS file system with MySQL
Using ZFS file system with MySQL
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
 
Bigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured DataBigtable: A Distributed Storage System for Structured Data
Bigtable: A Distributed Storage System for Structured Data
 
Microservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and SagaMicroservices Architecture Part 2 Event Sourcing and Saga
Microservices Architecture Part 2 Event Sourcing and Saga
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Distributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query ProcessingDistributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query Processing
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in Tokyo
 
Hierarchical Storage Management
Hierarchical Storage ManagementHierarchical Storage Management
Hierarchical Storage Management
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Simplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks DeltaSimplifying Change Data Capture using Databricks Delta
Simplifying Change Data Capture using Databricks Delta
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

Similar to Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Version Database

chap-0 .ppt
chap-0 .pptchap-0 .ppt
chap-0 .pptLookly Sam
 
Designing Scalable Applications
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable ApplicationsFabricio Epaminondas
 
Cloud computing
Cloud computingCloud computing
Cloud computingAaron Tushabe
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - GeektalkMalisa Ncube
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Chapter 5.pptx
Chapter 5.pptxChapter 5.pptx
Chapter 5.pptxJoeBaker69
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Apos week 1 4
Apos week 1   4Apos week 1   4
Apos week 1 4alixafar
 
distributed system original.pdf
distributed system original.pdfdistributed system original.pdf
distributed system original.pdfKirimanyiJovanntanda
 
Distributed database management system
Distributed database management systemDistributed database management system
Distributed database management systemVinay D. Patel
 
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...sameh samir
 
Architectural Tactics for Large Scale Systems
Architectural Tactics for Large Scale SystemsArchitectural Tactics for Large Scale Systems
Architectural Tactics for Large Scale SystemsLen Bass
 
Scaling Systems: Architectures that Grow
Scaling Systems: Architectures that GrowScaling Systems: Architectures that Grow
Scaling Systems: Architectures that GrowGibraltar Software
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfKishaKiddo
 
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB MongoDB
 
Distributed Systems Real Life Applications
Distributed Systems Real Life ApplicationsDistributed Systems Real Life Applications
Distributed Systems Real Life ApplicationsAman Srivastava
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsDataStax
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systemsvampugani
 
Inroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermaInroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermagargishankar1981
 

Similar to Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Version Database (20)

chap-0 .ppt
chap-0 .pptchap-0 .ppt
chap-0 .ppt
 
Designing Scalable Applications
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable Applications
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - Geektalk
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Chapter 5.pptx
Chapter 5.pptxChapter 5.pptx
Chapter 5.pptx
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Apos week 1 4
Apos week 1   4Apos week 1   4
Apos week 1 4
 
distributed system original.pdf
distributed system original.pdfdistributed system original.pdf
distributed system original.pdf
 
Distributed database management system
Distributed database management systemDistributed database management system
Distributed database management system
 
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...
MSF: Sync your Data On-Premises And To The Cloud - dotNetwork Gathering, Oct ...
 
Architectural Tactics for Large Scale Systems
Architectural Tactics for Large Scale SystemsArchitectural Tactics for Large Scale Systems
Architectural Tactics for Large Scale Systems
 
Scaling Systems: Architectures that Grow
Scaling Systems: Architectures that GrowScaling Systems: Architectures that Grow
Scaling Systems: Architectures that Grow
 
Introduction
IntroductionIntroduction
Introduction
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
 
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB
Webinar: Adobe Experience Manager Clustering Made Easy on MongoDB
 
Distributed Systems Real Life Applications
Distributed Systems Real Life ApplicationsDistributed Systems Real Life Applications
Distributed Systems Real Life Applications
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
 
Inroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar vermaInroduction to grid computing by gargi shankar verma
Inroduction to grid computing by gargi shankar verma
 

Recently uploaded

React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 

Recently uploaded (20)

React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 

Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Version Database

  • 1. Internet-scale Distributed Systems Google Spanner a Synchronously-Replicated Globally-Distributed Multi-Version Database 22.01.2013 Maciej JozwiakPage 1 Presented by: Maciej Jozwiak
  • 2. Internet-scale Distributed Systems Agenda • Problem description • Overview of available solutions • Globally-distributed database • Architecture • How is data replicated? • Data model • TrueTime API • Transactions • Summary 22.01.2013 Maciej JozwiakPage 2
  • 3. Internet-scale Distributed Systems Problem – Need for Scalable MySQL • Google’s advertising backend – Based on MySQL • Relations • Query language – Manually sharded • Resharding is very costly – Global distribution 22.01.2013 Maciej JozwiakPage 3 SHARDING: Sharding is another name for "horizontal partitioning" of a database. Rows of a database table are held separately, form a partition which can be located on a separate database server or physical location.
  • 4. Internet-scale Distributed Systems22.01.2013 Maciej JozwiakPage 4 • Replicated ACID transactions • Schematized semi-relational tables • Synchronous replication support across data-centers • Performance • Lack of query language • Scalability • Throughput • Performance • Eventually-consistent replication support across data-centers Overview of Available Solutions Google Megastore
  • 5. Internet-scale Distributed Systems22.01.2013 Maciej JozwiakPage 5 • Replicated ACID transactions • Schematized semi-relational tables • Synchronous replication support across data-centers • Performance • Lack of query language • Scalability • Throughput • Performance • Eventually-consistent replication support across data-centers Overview of Available Solutions Google Megastore
  • 6. Internet-scale Distributed Systems22.01.2013 Maciej JozwiakPage 6 • Replicated ACID transactions • Schematized semi-relational tables • Synchronous replication support across data-centers • Performance • Lack of query language • Scalability • Throughput • Performance • Eventually-consistent replication support across data-centers Overview of Available Solutions Google Megastore
  • 7. Internet-scale Distributed Systems Bridging the gap between Megastore and Bigtable 22.01.2013 Maciej JozwiakPage 7 Google Megastore • Removes the need to manually partition data • Synchronous replication and automatic failover • Strong transactional semantics • SQL based query language • Semi-relational, schematized tables Solution: Google Spanner
  • 8. Globally-Distributed Database. Future scale: • one million to 10 million servers • 100s to 1000s of locations around the world • 10^13 directories • 10^18 bytes of storage. Cross-datacenter replicated data management: • high availability • minimize latency of data reads and writes • replication configuration dynamically controlled at a fine grain by applications
  • 9. Spanner Deployment – Universe • Universe master (status + interactive debugging) • Placement driver (moves data across zones automatically)
  • 10. How Is Data Replicated? Spanserver software stack. Paxos: a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
  • 11. Replication Configuration • Replication configurations for data can be dynamically controlled at a fine grain by applications • Applications can specify constraints to control: – which datacenters contain which data – how far data is from its users (to control read latency) – how far replicas are from each other (to control write latency) – how many replicas are maintained (to control durability, availability, and read performance) • Example: North America: 5 replicas, Europe: 2 replicas
  • 12. Hierarchical Data Model • Universe (Spanner deployment) – Database • Tables – Rows and columns – Must have an ordered set of one or more primary-key columns – Primary key uniquely identifies each row • Hierarchies of tables – Tables must be partitioned by the client into one or more hierarchies of tables (INTERLEAVE IN) – The table at the top of a hierarchy is the directory table
  • 13–16. Storing Photo Metadata [figures: example schema for photo metadata, with "directory table" and "directory" labels marked on the diagrams]
  • 17. Storing Photo Metadata. Albums(2,1) – the row from the Albums table for user_id 2, album_id 1. Interleaving is important because it allows clients to describe the locality relationships that are necessary for good performance in a sharded, distributed database.
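The locality effect of interleaving can be sketched by sorting rows on their primary-key tuples, so that each Users row is immediately followed by its Albums rows. This is a simplified model, not Spanner's storage format; `interleave` and `directories` are hypothetical helper names, and the row labels follow the Albums(2,1) notation used on the slide.

```python
from itertools import groupby


def interleave(user_ids, album_keys):
    """Return row labels in key order: each user row, then that user's album rows."""
    keyed = [((uid,), f"Users({uid})") for uid in user_ids]
    keyed += [((uid, aid), f"Albums({uid},{aid})") for uid, aid in album_keys]
    # Tuple ordering places the parent key (uid,) before its children (uid, aid).
    return [label for _, label in sorted(keyed)]


def directories(user_ids, album_keys):
    """Group the interleaved rows by top-level key; each group models one directory."""
    keyed = sorted(
        [((uid,), f"Users({uid})") for uid in user_ids]
        + [((uid, aid), f"Albums({uid},{aid})") for uid, aid in album_keys]
    )
    return {
        uid: [label for _, label in group]
        for uid, group in groupby(keyed, key=lambda kv: kv[0][0])
    }


layout = interleave([2, 1], [(2, 1), (1, 2), (1, 1)])
dirs = directories([2, 1], [(2, 1), (1, 2), (1, 1)])
print(layout)
print(dirs)
```

Because rows that share a user_id prefix are adjacent, a whole group can be placed and moved together, which is the locality relationship the slide refers to.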
  • 18. Key Innovation: Spanner knows what time it is
  • 19. Is Synchronizing Time at the Global Scale Possible? Distributed systems dogma: • synchronizing time within and between datacenters is extremely hard and uncertain • serialization of requests is impossible at global scale
  • 21. Is Synchronizing Time at the Global Scale Possible? Idea: accept uncertainty, keep it small, and quantify it (using GPS and atomic clocks)
  • 22. TrueTime API. Idea: accept uncertainty, keep it small, and quantify it (using GPS and atomic clocks). A novel API that distributes a globally synchronized "proper time". Method → Returns: TT.now() → TTinterval: [earliest, latest]; TT.after(t) → true if t has definitely passed; TT.before(t) → true if t has definitely not arrived. The TTinterval is guaranteed to contain the absolute time during which TT.now() was invoked.
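The three methods can be sketched in a few lines of Python. This is a toy model, not Google's implementation: the class name `TrueTime`, the fixed `epsilon` parameter, and the injectable `clock` are assumptions made for illustration, whereas the real uncertainty varies over time.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Toy TrueTime: a local clock reading widened by a fixed uncertainty epsilon."""

    def __init__(self, epsilon=0.004, clock=time.time):
        self.epsilon = epsilon  # assumed constant bound on clock error, in seconds
        self._clock = clock

    def now(self) -> TTInterval:
        t = self._clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t is below even the earliest possible current time.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t is above even the latest possible current time.
        return self.now().latest < t


tt = TrueTime(epsilon=0.5, clock=lambda: 100.0)  # frozen clock for a repeatable demo
print(tt.now(), tt.after(99.0), tt.before(101.0))
```

Note that for a timestamp inside the current interval both `after` and `before` return false: the API reports uncertainty instead of guessing.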
  • 23. How Is TrueTime Implemented? • A set of time master machines per datacenter • The majority of masters have GPS receivers with dedicated antennas • The remaining masters (which we refer to as Armageddon masters) are equipped with atomic clocks • A timeslave daemon per machine
  • 24. Time Reference Vulnerabilities. 2 forms of time reference – 2 failure modes (uncorrelated with each other): • GPS: – antenna and receiver failures – local radio interference – correlated failures (e.g. spoofing) – GPS system outages • Atomic clock: – can drift significantly due to frequency error
  • 25. How Does the Daemon Work? The daemon polls a variety of masters and reaches a consensus about the correct timestamp: • masters chosen from nearby datacenters • masters from farther datacenters • Armageddon masters. The daemon’s poll interval is 30 seconds. Between synchronizations the daemon advertises a slowly increasing time uncertainty (ε)
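The growth of ε between polls can be sketched as a sawtooth. The 30-second poll interval comes from the slide. The 1 ms base uncertainty and the 200 µs/s applied drift rate are figures from the Spanner paper, not from these slides, and the function names are hypothetical.

```python
def uncertainty_ms(seconds_since_sync, base_ms=1.0, drift_us_per_s=200.0):
    """Advertised uncertainty: a base error plus worst-case drift since the last poll."""
    if seconds_since_sync < 0:
        raise ValueError("seconds_since_sync must be non-negative")
    return base_ms + drift_us_per_s * seconds_since_sync / 1000.0


def epsilon_at(t_seconds, poll_interval_s=30.0):
    """Sawtooth: the uncertainty resets every time the daemon re-synchronizes."""
    return uncertainty_ms(t_seconds % poll_interval_s)


for t in (0, 15, 30, 45):
    print(f"t={t:>2}s  epsilon={epsilon_at(t):.1f} ms")
```

Under these assumed numbers, ε climbs from about 1 ms just after a poll to about 7 ms just before the next one.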
  • 26. Transactions In Spanner • Globally meaningful commit timestamps for distributed transactions – If A happens-before B, then timestamp(A) < timestamp(B) – A happens-before B if its effects become visible before B begins, in real time • Visible means acked to client or updates applied to some replica • Begins means first request arrived at Spanner server • Two-phase commit
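One way to obtain the timestamp ordering above is the commit-wait rule described in the Spanner paper: pick the commit timestamp s = TT.now().latest, then hold the commit until TT.after(s) is true. The slide does not spell this out, so the sketch below is an assumption-laden illustration. `SimulatedTrueTime` and `commit` are hypothetical names, and the clock is advanced manually rather than by sleeping.

```python
class SimulatedTrueTime:
    """Simulated clock with a fixed uncertainty, so the example runs instantly."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.true_time = 0.0  # the simulation's notion of absolute time, in seconds

    def now(self):
        return (self.true_time - self.epsilon, self.true_time + self.epsilon)

    def after(self, t):
        return self.now()[0] > t

    def advance(self, dt):
        self.true_time += dt


def commit(tt, tick=0.001):
    """Choose a commit timestamp, then wait until it has definitely passed."""
    s = tt.now()[1]  # latest: no earlier than the absolute time at this moment
    waited = 0.0
    while not tt.after(s):  # commit wait; a real system would let real time pass
        tt.advance(tick)
        waited += tick
    return s, waited


tt = SimulatedTrueTime(epsilon=0.004)
s1, w1 = commit(tt)  # transaction A becomes visible only after its wait
s2, w2 = commit(tt)  # transaction B begins after A is visible
print(s1, s2, w1)
```

The wait is roughly 2ε per commit, which is why keeping the uncertainty small matters for write latency.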
  • 27. What About Performance? Two-phase commit can raise availability and performance issues. “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”
  • 28. Summary • Externally consistent global write-transactions with synchronous replication. • Schematized, semi-relational data model. • SQL-like query interface. • Auto-sharding, auto-rebalancing, automatic failure response. • Exposes control of data replication and placement to user/application.