NewSQL System Introduction
JiaLin Dai
NewSQL
• Scale like NoSQL
• Provide SQL features
– Complete transaction support
– Full SQL query support
Spanner
• Globally distributed database at Google
• Survives datacenter disasters
• Consistency
– Externally consistent reads / writes
– Globally consistent reads at a timestamp
Spanner cluster
• zonemaster: assigns data to spanservers
• spanserver: serves data to clients
• location proxy: locates the spanservers that hold a client's data
• universe master: console that displays status information
• placement driver: handles automated movement of data across zones
Spanner server
• Colossus: successor to GFS (Google's distributed file system)
• Tablet: stores data for a range of keys
• Paxos: consensus protocol that keeps replicas in sync
Data model
• SQL-like schema
– Every row must have a primary key
• Each row is multi-versioned (sketched below)
– Versions are identified by timestamp
– Old versions are garbage collected
• Protocol Buffers support
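A minimal sketch of the multi-versioned row model, assuming plain integer timestamps; the types and method names are illustrative, not Spanner's API:

  package main

  import "fmt"

  // version is one timestamped value of a row.
  type version struct {
      ts    int64
      value string
  }

  type row struct{ versions []version } // newest version last

  // write appends a new version at timestamp ts.
  func (r *row) write(ts int64, v string) {
      r.versions = append(r.versions, version{ts, v})
  }

  // readAt returns the newest version at or before ts.
  func (r *row) readAt(ts int64) (string, bool) {
      for i := len(r.versions) - 1; i >= 0; i-- {
          if r.versions[i].ts <= ts {
              return r.versions[i].value, true
          }
      }
      return "", false
  }

  // gc drops versions older than the retention horizon.
  func (r *row) gc(horizon int64) {
      kept := r.versions[:0]
      for _, v := range r.versions {
          if v.ts >= horizon {
              kept = append(kept, v)
          }
      }
      r.versions = kept
  }

  func main() {
      r := &row{}
      r.write(10, "a")
      r.write(20, "b")
      v, _ := r.readAt(15)
      fmt.Println(v) // "a": the newest version at or before timestamp 15
      r.gc(20)       // the old version at ts 10 is garbage collected
  }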
Interleaved tables
• Rows that share a key prefix are grouped into one directory (sketched below)
• Data in the same directory is co-located
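A sketch of why interleaving co-locates data: child rows encode the parent key as a prefix, so an ordinary key sort keeps a customer next to its orders (the key encoding is invented):

  package main

  import (
      "fmt"
      "sort"
  )

  func main() {
      // Interleaved encoding: the parent key comes first, so a
      // customer row and its order rows sort next to each other
      // and land in the same directory / shard.
      keys := []string{
          "customer/2",
          "customer/1/order/7",
          "customer/1",
          "customer/2/order/9",
      }
      sort.Strings(keys)
      for _, k := range keys {
          fmt.Println(k)
      }
      // customer/1, customer/1/order/7, customer/2, customer/2/order/9
  }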
Tablet
Log-structured merge (LSM)
• Minor compaction
– Converts the memtable into one SSTable file
• Merge compaction
– Merges all SSTable files into one (both sketched below)
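A toy sketch of the two compaction kinds, treating the memtable and SSTable files as in-memory maps rather than sorted on-disk files:

  package main

  import "fmt"

  type table map[string]string // stand-in for a sorted SSTable file

  // minorCompact freezes the memtable into a new SSTable.
  func minorCompact(memtable table, sstables []table) []table {
      frozen := table{}
      for k, v := range memtable {
          frozen[k] = v
      }
      return append(sstables, frozen)
  }

  // mergeCompact folds all SSTables into one; newer tables win.
  func mergeCompact(sstables []table) table {
      merged := table{}
      for _, t := range sstables { // oldest first
          for k, v := range t {
              merged[k] = v
          }
      }
      return merged
  }

  func main() {
      mem := table{"k1": "v2"}
      ssts := []table{{"k1": "v1", "k2": "x"}}
      ssts = minorCompact(mem, ssts)
      fmt.Println(mergeCompact(ssts)) // map[k1:v2 k2:x]
  }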
Bloom filter
• A read may need to access multiple SSTable files
• An in-memory Bloom filter avoids unnecessary reads (sketched below)
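A minimal Bloom filter sketch with two derived hash functions; real filters size the bit array and hash count to the SSTable:

  package main

  import (
      "fmt"
      "hash/fnv"
  )

  type bloom struct{ bits [1024]bool }

  func hashes(key string) (uint32, uint32) {
      h := fnv.New32a()
      h.Write([]byte(key))
      h1 := h.Sum32()
      h2 := h1>>16 | h1<<16 // cheap second hash derived from the first
      return h1 % 1024, h2 % 1024
  }

  func (b *bloom) add(key string) {
      i, j := hashes(key)
      b.bits[i], b.bits[j] = true, true
  }

  // mightContain never returns false negatives; false positives are
  // possible, so a "true" still requires reading the SSTable.
  func (b *bloom) mightContain(key string) bool {
      i, j := hashes(key)
      return b.bits[i] && b.bits[j]
  }

  func main() {
      var b bloom
      b.add("row42")
      fmt.Println(b.mightContain("row42"))  // true: read the file
      fmt.Println(b.mightContain("row999")) // almost surely false: skip it
  }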
Partition Attributes Across (PAX)
• Keeps a whole row inside one page
– I/O friendly
• Keeps values of the same column close to each other
– CPU cache friendly (see the sketch below)
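A sketch of the PAX idea: rows share one page, but inside the page each attribute lives in its own mini-page, so a single-column scan walks contiguous memory:

  package main

  import "fmt"

  // page keeps whole rows together (I/O friendly) while grouping
  // each attribute into its own mini-page (cache-friendly scans).
  type page struct {
      ids   []int64  // mini-page for column "id"
      names []string // mini-page for column "name"
  }

  func (p *page) insert(id int64, name string) {
      p.ids = append(p.ids, id)
      p.names = append(p.names, name)
  }

  func main() {
      p := &page{}
      p.insert(1, "ann")
      p.insert(2, "bob")
      var sum int64
      for _, id := range p.ids { // scans one contiguous slice
          sum += id
      }
      fmt.Println(sum) // 3
  }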
Consensus protocol
Replicated state machines
• Replicate change logs
• A change is committed once a majority of replicas hold it (sketched below)
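A sketch of the commit rule, with replication, leader election, and retries omitted:

  package main

  import "fmt"

  // committed reports whether acks form a majority of the replica set.
  func committed(acks, replicas int) bool {
      return acks > replicas/2
  }

  func main() {
      fmt.Println(committed(2, 5)) // false: 2 of 5 is not a majority
      fmt.Println(committed(3, 5)) // true: the log entry is durable
  }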
Transactions
Types of transactions
• Read-write transactions
– Read locks are held at the replica leader
– The client buffers all writes
– At the end of the transaction, commit under two-phase locking (2PL); see the sketch after this list
• Snapshot transactions
– Read data at a timestamp in the past
– Any sufficiently up-to-date replica can serve the read
– Lock-free
• Read-only transactions
– Spanner chooses the read timestamp
– Otherwise the same as a snapshot transaction
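A sketch of the read-write flow under the rules above; the client API, lock handling, and commit are stubs, not Spanner's interface:

  package main

  import "fmt"

  type txn struct {
      readLocks []string          // held at the replica leader
      writes    map[string]string // buffered at the client until commit
  }

  func begin() *txn { return &txn{writes: map[string]string{}} }

  func (t *txn) read(key string) string {
      t.readLocks = append(t.readLocks, key) // 2PL growing phase
      return "value-of-" + key               // stub: fetch from the leader
  }

  func (t *txn) write(key, val string) { t.writes[key] = val } // buffered

  func (t *txn) commit() {
      // At commit time the buffered writes are shipped and the
      // transaction commits under 2PL (stubbed here).
      fmt.Printf("commit %d buffered writes, release %d read locks\n",
          len(t.writes), len(t.readLocks))
  }

  func main() {
      t := begin()
      v := t.read("balance")
      t.write("balance", v+"+100")
      t.commit()
  }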
Two-phase locking (2PL)
• Growing phase: acquire locks, never release
• Shrinking phase: release locks, never acquire
Timestamp
TrueTime API
• Explicitly exposes time uncertainty (sketched below)
• Time masters are implemented with GPS receivers and atomic clocks
• A time daemon runs on every machine
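A sketch of the interface, where epsilon is a made-up uncertainty bound standing in for what the time masters and daemons actually derive:

  package main

  import (
      "fmt"
      "time"
  )

  const epsilon = 5 * time.Millisecond // assumed clock uncertainty bound

  type ttInterval struct{ earliest, latest time.Time }

  // now returns an interval guaranteed to contain the true time.
  func now() ttInterval {
      t := time.Now()
      return ttInterval{t.Add(-epsilon), t.Add(epsilon)}
  }

  // after reports whether t has definitely passed on every clock.
  func after(t time.Time) bool { return t.Before(now().earliest) }

  func main() {
      iv := now()
      fmt.Println(iv.latest.Sub(iv.earliest)) // 10ms of explicit uncertainty
      fmt.Println(after(iv.latest))           // false until uncertainty elapses
  }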
Timestamp for RW transactions
• Paxos writes
– Monotonically increasing timestamps are assigned to writes
• Participant: prepare timestamp
– Greater than those of all previous transactions
• Coordinator: commit message time
– TT.now().latest
• Coordinator: commit timestamp
– No less than any prepare timestamp
– Greater than those of all previous transactions
– No less than the commit message time
• Coordinator: commit wait (sketched below)
– Wait until TT.after(commit timestamp) before applying
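A sketch of choosing the commit timestamp and commit-waiting, reusing the same invented epsilon bound as the TrueTime sketch above:

  package main

  import (
      "fmt"
      "time"
  )

  const epsilon = 5 * time.Millisecond // assumed uncertainty bound

  func ttNowLatest() time.Time   { return time.Now().Add(epsilon) }
  func ttAfter(t time.Time) bool { return t.Before(time.Now().Add(-epsilon)) }

  // chooseCommitTS picks s >= every prepare timestamp, >= the commit
  // message arrival time (TT.now().latest), and > prior transactions.
  func chooseCommitTS(prepares []time.Time, prevTxn time.Time) time.Time {
      s := ttNowLatest() // commit message time
      for _, p := range prepares {
          if p.After(s) {
              s = p
          }
      }
      if !s.After(prevTxn) {
          s = prevTxn.Add(time.Nanosecond)
      }
      return s
  }

  func main() {
      s := chooseCommitTS(nil, time.Now())
      for !ttAfter(s) { // commit wait: nobody sees the data before s has passed
          time.Sleep(time.Millisecond)
      }
      fmt.Println("commit applied at", s)
  }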
Up to date?
• Safe time of a replica: the min of
– the Paxos safe time
– the transaction manager safe time
• Paxos safe time
– Timestamp of the highest applied Paxos write
• Transaction manager safe time
– Minimum prepare timestamp among prepared but not yet committed transactions (sketched below)
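A sketch of the safe-time computation with integer timestamps; the replica may serve a snapshot read at t only if t is at or below this value:

  package main

  import "fmt"

  // safeTime is the max timestamp at which this replica is up to date:
  // the min of the Paxos safe time (highest applied Paxos write) and
  // the transaction manager safe time (bounded by the smallest prepare
  // timestamp among prepared-but-uncommitted transactions).
  func safeTime(lastPaxosWrite int64, preparedTS []int64) int64 {
      safe := lastPaxosWrite
      for _, p := range preparedTS {
          if p-1 < safe { // a prepared txn may still commit at p, so stay below it
              safe = p - 1
          }
      }
      return safe
  }

  func main() {
      fmt.Println(safeTime(100, []int64{90})) // 89: cannot serve reads past 89
      fmt.Println(safeTime(100, nil))         // 100: no pending prepares
  }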
Schema change
• Plan the schema change at a future timestamp
• All shards perform the schema change at that time
• Read / write transactions coordinate with the schema change based on timestamps (sketched below)
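A sketch of the timestamp-based coordination, assuming integer timestamps and a single pending change; the versioning scheme is illustrative:

  package main

  import "fmt"

  type schemaChange struct {
      at      int64  // future timestamp chosen in advance
      version string // schema that takes effect at `at`
  }

  // schemaFor returns the schema a transaction at ts must use; no
  // global synchronization is needed, only comparable timestamps.
  func schemaFor(ts int64, old string, change schemaChange) string {
      if ts >= change.at {
          return change.version
      }
      return old
  }

  func main() {
      change := schemaChange{at: 1000, version: "v2"}
      fmt.Println(schemaFor(999, "v1", change))  // v1: before the change
      fmt.Println(schemaFor(1000, "v1", change)) // v2: at or after the change
  }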
Distributed query
Query compile
• Build a relational operator tree
• Optimize the tree using equivalence rewrites
– Push operators down into the shards (see the example below)
Example
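A sketch of the pushdown rewrite above: a filter over a cross-shard union becomes a per-shard filter, so each shard evaluates the predicate locally (the operator tree is heavily simplified):

  package main

  import "fmt"

  // A tiny relational operator tree.
  type op interface{ describe() string }

  type scan struct{ shard string }
  type filter struct {
      pred string
      in   op
  }
  type union struct{ inputs []op }

  func (s scan) describe() string   { return "scan(" + s.shard + ")" }
  func (f filter) describe() string { return "filter[" + f.pred + "](" + f.in.describe() + ")" }
  func (u union) describe() string {
      out := "union("
      for i, in := range u.inputs {
          if i > 0 {
              out += ", "
          }
          out += in.describe()
      }
      return out + ")"
  }

  // pushDown rewrites filter(union(...)) into union(filter(...) per shard).
  func pushDown(o op) op {
      if f, ok := o.(filter); ok {
          if u, ok := f.in.(union); ok {
              var pushed []op
              for _, in := range u.inputs {
                  pushed = append(pushed, filter{f.pred, in})
              }
              return union{pushed}
          }
      }
      return o
  }

  func main() {
      plan := filter{"id > 10", union{[]op{scan{"shard1"}, scan{"shard2"}}}}
      fmt.Println(plan.describe())
      fmt.Println(pushDown(plan).describe())
      // filter[id > 10](union(scan(shard1), scan(shard2)))
      // union(filter[id > 10](scan(shard1)), filter[id > 10](scan(shard2)))
  }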
Distributed join
• Use the sharding-key filter to extract sharding-key ranges from the input
• Merge the sharding-key ranges
• Compute the affected shards
• Construct minimal batches for these shards (sketched below)
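A sketch of the range-merge and shard-mapping steps, assuming integer sharding keys and fixed-width shards of 100 keys; extracting ranges from a real predicate is elided:

  package main

  import (
      "fmt"
      "sort"
  )

  type keyRange struct{ lo, hi int } // inclusive sharding-key range

  // mergeRanges sorts and coalesces overlapping or adjacent ranges.
  func mergeRanges(rs []keyRange) []keyRange {
      sort.Slice(rs, func(i, j int) bool { return rs[i].lo < rs[j].lo })
      var out []keyRange
      for _, r := range rs {
          if n := len(out); n > 0 && r.lo <= out[n-1].hi+1 {
              if r.hi > out[n-1].hi {
                  out[n-1].hi = r.hi
              }
              continue
          }
          out = append(out, r)
      }
      return out
  }

  // shardsFor maps a merged range onto shards of width 100 (assumed).
  func shardsFor(r keyRange) []int {
      var ids []int
      for s := r.lo / 100; s <= r.hi/100; s++ {
          ids = append(ids, s)
      }
      return ids
  }

  func main() {
      ranges := mergeRanges([]keyRange{{5, 20}, {15, 30}, {250, 260}})
      fmt.Println(ranges) // [{5 30} {250 260}]
      for _, r := range ranges {
          fmt.Println(r, "->", shardsFor(r)) // one batch per affected shard
      }
  }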
Run query
• Single-consumer API
• Parallel-consumer API
• Query auto-restart (sketched below)
– Any machine can fail
– A restart token accompanies every query result
– The token captures the distributed state of the query plan
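A sketch of restart tokens, assuming results stream in key order so the token only needs the last key delivered; real tokens capture the full distributed plan state:

  package main

  import "fmt"

  type token struct{ lastKey int } // opaque to the client in practice

  // fetch returns the next page after the token plus an updated token,
  // so the client can resume on any server if this one fails.
  func fetch(tok token, pageSize int) ([]int, token) {
      var rows []int
      for k := tok.lastKey + 1; len(rows) < pageSize; k++ {
          rows = append(rows, k) // stub: read the next row by key
      }
      return rows, token{rows[len(rows)-1]}
  }

  func main() {
      tok := token{}
      rows, tok := fetch(tok, 3)
      fmt.Println(rows, tok) // [1 2 3] {3}
      // Machine fails here; retry anywhere with the same token.
      rows, tok = fetch(tok, 3)
      fmt.Println(rows, tok) // [4 5 6] {6}
  }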
Other systems
Other NewSQL systems
• TiDB
• CockroachDB
Thank You!
