SlideShare a Scribd company logo
1 of 111
Architecture of a Geo-Distributed SQL Database
CockroachDB
Peter Mattis (@petermattis), Co-founder & CTO
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
cockroachdb-distributed-sql/
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
CockroachDB: Geo-distributed SQL Database
Make Data Easy
• Distributed
○ Horizontally scalable to grow with your application
• Geo-distributed
○ Handle datacenter failures
○ Place data near usage
○ Push computation near data
• SQL
○ Lingua-franca for rich data storage
○ Schemas, indexes, and transactions make app development easier
AGENDA
● Introduction
● Ranges and Replicas
● Transactions
● SQL Data in a KV World
● SQL Execution
● SQL Optimization
Distributed, Replicated, Transactional KV*
• Keys and values are strings
○ Lexicographically ordered by key
• Multi-version concurrency control (MVCC)
○ Values are never updated “in place”, newer versions shadow older versions
○ Tombstones are used to delete values
○ Provides snapshot to each transaction
• Monolithic key-space
* Not exposed for external usage
Monolithic Key Space
DOGS
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
Monolithic logical key space
● Ordered lexicographically by key
Ranges
DOGS
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
sooshi
stella
zee
Key space divided into contiguous ~64MB ranges
Ranges are small enough to
be moved/split quickly
Ranges are large enough to
amortize indexing overhead
Range Indexing
DOGS
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
Index structure used to
locate ranges
(very much like a B-tree)
1
2
3
carl - jack
lady - peetey
pinetop - zee
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
sooshi
stella
zee
Ordered Range Scans
DOGS
carl
dagne
figment
jack
pinetop
sooshi
zee
peetey
lula
lady
Ordered keys enable
efficient range scans
dogs >= “muddy” AND <= “stella”
1
2
3
carl - jack
lady - peetey
pinetop - zee
carl
dagne
figment
jack peetey
lula
lady pinetop
sooshi
zee
muddy stella
stella
muddy
Transactional Updates
DOGS
carl
dagne
figment
jack
pinetop
sooshi
zee
peetey
lula
lady
Transactions used to insert
records into ranges
1
2
3
carl - jack
lady - peetey
pinetop - zee
stella
muddy
INSERT[sunny]
INSERT[sunny]
Space available in range? - YES
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
✓?
Transactional Updates
DOGS
carl
dagne
figment
jack
pinetop
sooshi
zee
peetey
lula
lady
1
2
3
carl - jack
lady - peetey
pinetop - zee
stella
muddy
INSERT[sunny]
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
✓
Transactions used to insert
records into ranges
INSERT[sunny]
Range Splits
DOGS
carl
dagne
figment
jack
pinetop
sooshi
zee
peetey
lula
lady
1
2
3
carl - jack
lady - peetey
pinetop - zee
stella
muddy
INSERT[rudy]
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
BUT… what happens when
a range is full?
✓? INSERT[rudy]
Space available in range? - NO
Range Splits
DOGS
carl
dagne
figment
jack
pinetop
sooshi
zee
peetey
lula
lady
stella
muddy
INSERT[rudy]
carl
dagne
figment
jack
muddy
peetey
lula
lady pinetop
rudy
sooshi
pinetop
sooshi
stella
zee
muddy
peetey
lula
lady
Ranges are automatically
split, a new range index is
created & order maintained
✓ INSERT[rudy]
split range and insert
stella
sunny
zee
1
2
3
carl - jack
lady - peetey
pinetop - sooshi
4 stella - zee
Raft and Replication
Ranges (~64MB) are the unit of replication
Each range is a Raft group
(Raft is a consensus replication protocol)
Default to 3 replicas, though this is configurable
• Important system ranges default to 5 replicas
• Note: 2 replicas doesn’t make sense in consensus replication
Raft
group
Raft and Replication
Raft provides “atomic replication” of commands
Commands are proposed by the leaseholder replica
and distributed to the follower replicas, but only
accepted when a quorum of replicas have
acknowledged receipt
* Leaseholder == Raft leader
Raft
group
LEASEHOLDER
node1
node2
node4
node3
Range Leases
muddy
peetey
lula
lady
pinetop
sooshi
stella
zee
carl
dagne
figment
jack
Reads with consensus
Reads must talk to a quorum of replicas
READ[carl]
node1
node2
node4
node3
Range Leases
muddy
peetey
lula
lady
pinetop
sooshi
stella
zee
carl
dagne
figment
jack
Reads without consensus
One replica is chosen as the leaseholder
READ[carl]
leaseholder
node1
node2
node4
node3
Range Leases
muddy
peetey
lula
lady
pinetop
sooshi
stella
zee
carl
dagne
figment
jack
Reads without consensus
One replica is chosen as the leaseholder
● Coordinates writes (proposal, key locking)
● Performs reads
READ[carl]
leaseholder
node1
node2
node4
node3
Replica Placement
muddy
peetey
lula
lady
pinetop
sooshi
stella
zee
● Space
● Diversity
● Load
● Latency
carl
dagne
figment
jack
Each Range is a Raft state machine
A Range has 1 or more Replicas
node1
node2
node4
node3
Replica Placement: Diversity
muddy
peetey
lula
lady
carl
dagne
figment
jack
Diversity
optimizes placement of
replicas across “failure
domains”
● Disk
● Single machine
● Rack
● Datacenter
● Region
pinetop
sooshi
stella
zee
node1
node2
node6
node4
node5
Replica Placement: Load
muddy
peetey
lula
lady
pinetop
sooshi
stella
zee
carl
dagne
figment
jack
Load
Balances placement using
heuristics that considers
real-time usage metrics of
the data itself
This range is high load as it is
accessed more than others
While we show this for ranges within a
single table, this is also applicable across
all ranges across ALL tables, which is the
more typical situation
node1node3
Replica Placement: Latency & Geo-partitioning
muddy
peetey
lula
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
USE/muddy
USE/stella
USE/figment
USE/dagne
USW/jack
USW/lady
USW/peetey
USW/pinetop
EU/carl
EU/lula
EU/sooshi
EU/zee
We apply a constraint that indicates regional
placement so we can ensure low latency
access or jurisdictional control of data
Rebalancing Replicas
node1
node5
node4
node2
node3
NEW
Scale: Add a node
If we add a node to the cluster,
CockroachDB automatically
redistributed replicas to even load
across the cluster
Uses the replica placement
heuristics from previous slides to
decide which node to add to and
which to remove from
Rebalancing Replicas
node1
node5
node4
node2
node3
NEW
Scale: Add a node
If we add a node to the cluster,
CockroachDB automatically
redistributed replicas to even load
across the cluster
Uses the replica placement
heuristics from previous slides
Movement is decomposed into
adding a replica followed by
removing a replica
Rebalancing Replicas
node1
node5
node4
node2
node3
NEW
Scale: Add a node
If we add a node to the cluster,
CockroachDB automatically
redistributed replicas to even load
across the cluster
Uses the replica placement
heuristics from previous slides
Movement is decomposed into
adding a replica followed by
removing a replica
Rebalancing Replicas
node1
node5
node4
node2
node3
Loss of a node
Permanent Failure
If a node goes down, the Raft
group realizes a replica is missing
and replaces it with a new replica
on an active node
Uses the replica placement
heuristics from previous slides
Rebalancing Replicas
node1
node5
node4
node2
node3
Loss of a node
Permanent Failure
If a node goes down, the Raft
group realizes a replica is missing
and replaces it with a new replica
on an active node
Uses the replica placement
heuristics from previous slides
The failed replica is removed from the Raft group
and a new replica created. The leaseholder sends a
snapshot of the Range’s state to bring the new
replica up to date.
Rebalancing Replicas
node1
node5
node4
node2
Loss of a node
Temporary Failure
If a node goes down for a moment,
the leaseholder can “catch up” any
replica that is behind
The leaseholder can send commands to be replayed
OR it can send a snapshot of the current Range data.
We apply heuristics to decide which is most efficient
for a given failure.
node3
AGENDA
● Introduction
● Ranges and Replicas
● Transactions
● SQL Data in a KV World
● SQL Execution
● SQL Optimization
Transactions
Atomicity, Consistency, Isolation, Durability
Serializable Isolation
• As if the transactions are run in a serial order
• Gold standard isolation level
• Make Data Easy - weaker isolation levels are too great a burden
Transactions can span arbitrary ranges
Conversational
• The full set of operations is not required up front
Transactions
Raft provides atomic writes to individual ranges
Bootstrap transaction atomicity using Raft atomic writes
Transaction record atomically flipped from PENDING to COMMIT
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
INSERT INTO dogs
VALUES (sunny, ozzie)
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
BEGIN TXN1
WRITE[sunny]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
pinetop
sooshi
stella
zee
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
pinetop
sooshi
stella
sunny
zee
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
pinetop
sooshi
stella
sunny
zee
ACK
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
pinetop
sooshi
stella
sunny
zee
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
ACK
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
Distributed Transactions
node1
carl
dagne
figment
jack
node2
node3
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
ACK
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
Distributed Transactions
node1
carl
dagne
figment
jack
node2
node3
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: PENDING
ACK
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
COMMIT
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
transactions
TXN1: COMMIT
pinetop
sooshi
stella
sunny
zee
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
Distributed Transactions
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node2
node3
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
node4
carl
dagne
figment
jack
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN TXN1
WRITE[sunny]
WRITE[ozzie]
COMMIT
GATEWAY
INSERT INTO dogs
VALUES (sunny, ozzie)
pinetop
sooshi
stella
sunny
zee
carl
peetey
dagne
lady
lula
muddy
ozzie
peetey
lady
ACK
Transactions: Pipelining
Serial Pipelined
Transactions: Pipelining
Serial Pipelined
sunny
sunny
BEGIN
WRITE[sunny]
txn:sunny (pending)
Transactions: Pipelining
Serial Pipelined
txn:sunny (pending)
sunny
ozzie
sunny
ozzie
BEGIN
WRITE[sunny]
WRITE[ozzie]
Transactions: Pipelining
Serial Pipelined
txn:sunny (pending)
sunny
ozzie
txn:sunny (commit)[keys: sunny, ozzie]
txn:sunny (staged)[keys: sunny, ozzie]
sunny
ozzie
BEGIN
WRITE[sunny]
WRITE[ozzie]
COMMIT
Transactions: Pipelining
Serial Pipelined
txn:sunny (pending)
sunny
ozzie
txn:sunny (commit)[keys: sunny, ozzie]
BEGIN
WRITE[sunny]
WRITE[ozzie]
COMMIT
Committed once all
operations complete
We replaced the
centralized commit marker
with a distributed one
t
sunny
ozzie
txn:sunny (staged)[keys: sunny, ozzie]
* “Proved” with TLA+
AGENDA
● Introduction
● Ranges and Replicas
● Transactions
● SQL Data in a KV World
● SQL Execution
● SQL Optimization
SQL
Structured Query Language
Declarative, not imperative
• These are the results I want vs perform these operations in this sequence
Relational data model
• Typed: INT, FLOAT, STRING, ...
• Schemas: tables, rows, columns, foreign keys
SQL: Tabular Data in a KV World
SQL data has columns and types?!?
How do we store typed and columnar data in a distributed, replicated,
transactional key-value store?
• The SQL data model needs to be mapped to KV data
• Reminder: keys and values are lexicographically sorted
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
Key Value
/1 “Bat”,1.11
/2 “Ball”,2.22
/3 “Glove”,3.33
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
Key Value
/<Table>/<Index>/1 “Bat”,1.11
/<Table>/<Index>/2 “Ball”,2.22
/<Table>/<Index>/3 “Glove”,3.33
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
Key Value
/inventory/primary/1 “Bat”,1.11
/inventory/primary/2 “Ball”,2.22
/inventory/primary/3 “Glove”,3.33
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT,
INDEX name_idx (name)
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
Key Value
/inventory/name_idx/”Bat”/1 ∅
/inventory/name_idx/”Ball”/2 ∅
/inventory/name_idx/”Glove”/3 ∅
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT,
INDEX name_idx (name)
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
4 Bat 4.44
Key Value
/inventory/name_idx/”Bat”/1 ∅
/inventory/name_idx/”Ball”/2 ∅
/inventory/name_idx/”Glove”/3 ∅
CREATE TABLE inventory (
id INT PRIMARY KEY,
name STRING,
price FLOAT,
INDEX name_idx (name)
)
SQL Data Mapping: Inventory Table
ID Name Price
1 Bat 1.11
2 Ball 2.22
3 Glove 3.33
4 Bat 4.44
Key Value
/inventory/name_idx/”Bat”/1 ∅
/inventory/name_idx/”Ball”/2 ∅
/inventory/name_idx/”Glove”/3 ∅
/inventory/name_idx/”Bat”/4 ∅
AGENDA
● Introduction
● Ranges and Replicas
● Transactions
● SQL Data in a KV World
● SQL Execution
● SQL Optimization
SQL Execution
Relational operators
• Projection (SELECT <columns>)
• Selection (WHERE <filter>)
• Aggregation (GROUP BY <columns>)
• Join (JOIN), union (UNION), intersect (INTERSECT)
• Scan (FROM <table>)
• Sort (ORDER BY)
○ Technically, not a relational operator
SQL Execution
• Relational expressions have input expressions and scalar expressions
○ For example, a “filter” expression has 1 input expression and a scalar expression that
filters the rows from the child
○ The scan expression has zero inputs
• Query plan is a tree of relational expressions
• SQL execution takes a query plan and runs the operations to completion
SQL Execution: Example
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
SQL Execution: Scan
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory
SQL Execution: Filter
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory
Filter
name >= “b” AND name < “c”
SQL Execution: Project
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory
Filter
name >= “b” AND name < “c”
Project
name
SQL Execution: Project
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory
Filter
name >= “b” AND name < “c”
Project
name
Results
SQL Execution: Index Scans
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory@name [“b” - “c”)
The filter gets pushed into the scan
SQL Execution: Index Scans
SELECT name
FROM inventory
WHERE name >= “b” AND name < “c”
Scan
inventory@name [“b” - “c”)
Project
name
Results
SQL Execution: Correctness
Correct SQL execution involves lots of bookkeeping
• User defined tables, and indexes
• Queries refer to table and column names
• Execution uses table and column IDs
• NULL handling
SQL Execution: Performance
Performant SQL execution
• Tight, well written code
• Operator specialization
○ hash group by, stream group by
○ hash join, merge join, lookup join, zig-zag join
• Distributed execution
SQL Execution: Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
United States 1
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
United States 1
Germany 1
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
United States 1
Germany 1
France 1
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
United States 1
Germany 1
France 2
SQL Execution: Hash Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
United States 2
Germany 1
France 2
SQL Execution: Group By Revisited
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Bob United States
Hans Germany
Jacques France
Marie France
Susan United States
SQL Execution: Sort on Grouping Column(s)
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
France 1
SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
France 2
SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
France 2
Germany 1
SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
France 2
Germany 1
United States 1
SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country Name Country
Jacques France
Marie France
Hans Germany
Bob United States
Susan United States
France 2
Germany 1
United States 2
Distributed SQL Execution
Network latencies and
throughput are important
considerations in
geo-distributed setups
Push fragments of computation
as close to the data as possible
Distributed SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country
Scan
customers
Scan
customers
Scan
customers
scan
scan
scan
Distributed SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country
Scan
customers
Scan
customers
Scan
customers
Group-By
“country”
Group-By
“country”
Group-By
“country”
group-by
group-by
group-by
Distributed SQL Execution: Streaming Group By
SELECT COUNT(*), country
FROM customers
GROUP BY country
Scan
customers
Scan
customers
Scan
customers
Group-By
“country”
Group-By
“country”
Group-By
“country”
Group-By
“country”
group-by
AGENDA
● Introduction
● Ranges and Replicas
● Transactions
● SQL Data in a KV World
● SQL Execution
● SQL Optimization
SQL Optimization
An optimizer explores many plans that are logically equivalent to a given
query and chooses the best one
Parse ExecuteSearch
Memo
Prep
AST Plan
Fold Constants
Check Types
Resolve Names
Report Semantic Errors
Compute properties
Retrieve and attach stats
Cost-independent transformations
Cost-based transformationsParse SQL
SQL Optimization: Cost-Independent Transformations
• Some transformations always make sense
○ Constant folding
○ Filter push-down
○ Decorrelating subqueries*
○ ...
• These transformations are cost-independent
○ If the transformation can be applied to the query, it is applied
• Domain Specific Language for transformations
○ Compiled down to code which efficiently matches query fragments in the memo
○ ~200 transformations currently defined
* Actually cost-based, but we’re treating it as cost-independent right now
SQL Optimization: Filter Push-Down
SELECT * FROM a JOIN b WHERE x > 10
Scan
a@primary
Filter
x > 10
Results
Scan
b@primary
Join
Initial plan
SQL Optimization: Filter Push-Down
SELECT * FROM a JOIN b WHERE x > 10
Scan
a@primary
Filter
x > 10
Results
Scan
b@primary
Join
Filter
x > 10
After filter push-down
SQL Optimization: Cost-Based Transformations
• Some transformations are not universally good
○ Index selection
○ Join reordering
○ ...
• These transformations are cost-based
○ When should the transformation be applied?
○ Need to try both paths and maintain both the original and transformed query
○ State explosion: thousands of possible query plans
■ Memo data structure maintains a forest of query plans
○ Estimate cost of each query, select query with lowest cost
• Costing
○ Based on table statistics and estimating cardinality of inputs to relational expressions
SQL Optimization: Cost-based Index Selection
The index to use for a query is affected by multiple factors
• Filters and join conditions
• Required ordering (ORDER BY)
• Implicit ordering (GROUP BY)
• Covering vs non-covering (i.e. is an index-join required)
• Locality
SQL Optimization: Cost-based Index Selection
SELECT *
FROM a
WHERE x > 10
ORDER BY y
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
SQL Optimization: Cost-based Index Selection
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
Scan
a@primary
Filter
x > 10
Sort
y
SELECT *
FROM a
WHERE x > 10
ORDER BY y
SQL Optimization: Cost-based Index Selection
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
Scan
a@primary
Scan
a@x [10 - )
Filter
x > 10
Sort
y
Sort
y
SELECT *
FROM a
WHERE x > 10
ORDER BY y
SQL Optimization: Cost-based Index Selection
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
Scan
a@primary
Scan
a@x [10 - )
Filter
x > 10
Scan
a@y
Sort
y
Sort
y
Filter
x > 10
SELECT *
FROM a
WHERE x > 10
ORDER BY y
SQL Optimization: Cost-based Index Selection
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
Scan
a@primary
Scan
a@x [10 - )
Filter
x > 10
Scan
a@y
Sort
y
Sort
y
Filter
x > 10
SELECT *
FROM a
WHERE x > 10
ORDER BY y
10
100,000
10
10
Lowest
Cost
SQL Optimization: Cost-based Index Selection
Required orderings affect index selection
Sorting is expensive if there are a lot of rows
Sorting can be the better option if there are few rows
Scan
a@primary
Scan
a@x [10 - )
Filter
x > 10
Scan
a@y
Sort
y
Sort
y
Filter
x > 10
SELECT *
FROM a
WHERE x > 10
ORDER BY y
50,000
100,000
50,000
50,000
Lowest
Cost
Locality-Aware SQL Optimization
Network latencies and
throughput are important
considerations in
geo-distributed setups
Duplicate read-mostly data in
each locality
Plan queries to use data from
the same locality
Locality-Aware SQL Optimization
Three copies of the
postal_codes table data
Use replication constraints to
pin the copies to different
geographic regions (US-East,
US-West, EU)
CREATE TABLE postal_codes (
id INT PRIMARY KEY,
code STRING,
INDEX idx_eu (id) STORING (code),
INDEX idx_usw (id) STORING (code)
)
Locality-Aware SQL Optimization
Optimizer includes locality in
cost model
Automatically selects index
from same locality: primary,
idx_eu, or idx_usw
CREATE TABLE postal_codes (
id INT PRIMARY KEY,
code STRING,
INDEX idx_eu (id) STORING (code),
INDEX idx_usw (id) STORING (code)
)
SELECT * FROM postal_codes
Conclusion
● Distributed, replicated, transactional key-value store
● Monolithic key space
● Raft replication of ranges (~64MB)
● Replica placement signals: space, diversity, load, latency
● Pipelined transaction operations
● Mapping SQL data to KV storage
● Distributed SQL execution
● Distributed SQL optimization
www.cockroachlabs.com
github.com/cockroachdb/cockroach
Thank You
A Simple Transaction
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
node1
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
INSERT INTO DOGS (sunny);
A Simple Transaction: One Range
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
node1
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
zee
pinetop
sooshi
stella
sunny
zee
BEGIN
WRITE[sunny]
COMMIT
GATEWAY
INSERT INTO DOGS (sunny);
NOTE: a gateway can be ANY CockroachDB instance. It can
find the leaseholder for any range and execute a transaction
A Simple Transaction: One Range
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
node1
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN
WRITE[sunny]
COMMIT
GATEWAY
INSERT INTO DOGS (sunny);
ACK
A Simple Transaction: One Range
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
node1
carl
peetey
dagne
lady
lula
muddy
peetey
lady
node1
carl
dagne
figment
jack
carl
peetey
dagne
lady
lula
muddy
peetey
lady
carl
dagne
figment
jack
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
pinetop
sooshi
stella
sunny
zee
BEGIN
WRITE[sunny]
COMMIT
GATEWAY
INSERT INTO DOGS (sunny);
ACK
Ranges
CockroachDB implements order-preserving data distribution
• Automates sharding of key/value data into “ranges”
• Supports efficient range scans
• Requires an indexing structure
Foundational capability that enables efficient distribution
of data across nodes within a CockroachDB cluster
* This approach is also used by Bigtable (tablets), HBase (regions) & Spanner (ranges)
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
cockroachdb-distributed-sql/

More Related Content

What's hot

The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterKenny Gryp
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagJean-François Gagné
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkDatabricks
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdfMySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdfAlkin Tezuysal
 
MySQL GTID 시작하기
MySQL GTID 시작하기MySQL GTID 시작하기
MySQL GTID 시작하기I Goo Lee
 
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantVitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantMatt Lord
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationKenny Gryp
 
Rocks db state store in structured streaming
Rocks db state store in structured streamingRocks db state store in structured streaming
Rocks db state store in structured streamingBalaji Mohanam
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Ed Kohlwey
 

What's hot (20)

Query logging with proxysql
Query logging with proxysqlQuery logging with proxysql
Query logging with proxysql
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & ClusterMySQL Database Architectures - InnoDB ReplicaSet & Cluster
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
How Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lagHow Booking.com avoids and deals with replication lag
How Booking.com avoids and deals with replication lag
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Koalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache SparkKoalas: Making an Easy Transition from Pandas to Apache Spark
Koalas: Making an Easy Transition from Pandas to Apache Spark
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdfMySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
MySQL Ecosystem in 2023 - FOSSASIA'23 - Alkin.pptx.pdf
 
MySQL GTID 시작하기
MySQL GTID 시작하기MySQL GTID 시작하기
MySQL GTID 시작하기
 
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL GiantVitess VReplication: Standing on the Shoulders of a MySQL Giant
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
 
Rocks db state store in structured streaming
Rocks db state store in structured streamingRocks db state store in structured streaming
Rocks db state store in structured streaming
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
 
Druid
DruidDruid
Druid
 

Similar to CockroachDB: Architecture of a Geo-Distributed SQL Database

HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Using OpenStack Swift for Extreme Data Durability
 Using OpenStack Swift for Extreme Data Durability Using OpenStack Swift for Extreme Data Durability
Using OpenStack Swift for Extreme Data DurabilityChristian Schwede
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introductionkanedafromparis
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScyllaDB
 
5 levels of high availability from multi instance to hybrid cloud
5 levels of high availability  from multi instance to hybrid cloud5 levels of high availability  from multi instance to hybrid cloud
5 levels of high availability from multi instance to hybrid cloudRafał Leszko
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...Lucidworks
 
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid CloudRafał Leszko
 
MSR 2009
MSR 2009MSR 2009
MSR 2009swy351
 
HPTS talk on micro sharding with Katta
HPTS talk on micro sharding with Katta HPTS talk on micro sharding with Katta
HPTS talk on micro sharding with Katta MapR Technologies
 
Percon XtraDB Cluster in a nutshell
Percon XtraDB Cluster in a nutshellPercon XtraDB Cluster in a nutshell
Percon XtraDB Cluster in a nutshellFrederic Descamps
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionSplunk
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersRan Ziv
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkWill Du
 
Using Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceUsing Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceDoKC
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksRuslan Meshenberg
 
Scalable Persistent Storage for Erlang: Theory and Practice
Scalable Persistent Storage for Erlang: Theory and PracticeScalable Persistent Storage for Erlang: Theory and Practice
Scalable Persistent Storage for Erlang: Theory and PracticeAmir Ghaffari
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on DockerDataWorks Summit
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogC4Media
 

Similar to CockroachDB: Architecture of a Geo-Distributed SQL Database (20)

HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Using OpenStack Swift for Extreme Data Durability
 Using OpenStack Swift for Extreme Data Durability Using OpenStack Swift for Extreme Data Durability
Using OpenStack Swift for Extreme Data Durability
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
5 levels of high availability from multi instance to hybrid cloud
5 levels of high availability  from multi instance to hybrid cloud5 levels of high availability  from multi instance to hybrid cloud
5 levels of high availability from multi instance to hybrid cloud
 
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
SolrCloud in Public Cloud: Scaling Compute Independently from Storage - Ilan ...
 
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
 
MSR 2009
MSR 2009MSR 2009
MSR 2009
 
HPTS talk on micro sharding with Katta
HPTS talk on micro sharding with Katta HPTS talk on micro sharding with Katta
HPTS talk on micro sharding with Katta
 
Percon XtraDB Cluster in a nutshell
Percon XtraDB Cluster in a nutshellPercon XtraDB Cluster in a nutshell
Percon XtraDB Cluster in a nutshell
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Leveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive ClustersLeveraging Endpoint Flexibility in Data-Intensive Clusters
Leveraging Endpoint Flexibility in Data-Intensive Clusters
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache Spark
 
Using Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceUsing Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” service
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talks
 
DEVIEW 2013
DEVIEW 2013DEVIEW 2013
DEVIEW 2013
 
Scalable Persistent Storage for Erlang: Theory and Practice
Scalable Persistent Storage for Erlang: Theory and PracticeScalable Persistent Storage for Erlang: Theory and Practice
Scalable Persistent Storage for Erlang: Theory and Practice
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 

More from C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

CockroachDB: Architecture of a Geo-Distributed SQL Database