Introducing TiDB @ SF DevOps Meetup

Introducing TiDB
Kevin Xu (@pingcap; @kevinsxu)
Liu Tang (@siddontang)
August 20, 2018

Agenda
● History and Community
● Technical Walkthrough
● Use Case with Mobike
● Live Demo: TiDB on GCP with Kubernetes
● Q&A

A Little About PingCAP...
● Founded in April 2015 by 3 infrastructure engineers
● TiDB platform: (Ti = Titanium)
○ TiDB (stateless SQL layer compatible with MySQL)
○ TiKV (distributed transactional key-value store)
○ TiSpark (Apache Spark plug-in on top of TiKV)
○ Placement Driver (metadata cluster)
● Open source from Day 1
○ Inspired by Google Spanner / F1
○ GA 1.0: October 2017
○ GA 2.0: April 2018

TiDB Core Features
● Hybrid OLTP & OLAP (Minimize ETL)
● Horizontal Scalability (Designed for infinity...)
● MySQL Compatible
● Distributed Transaction (ACID Compliant)
● High Availability
● Cloud-Native
○ *Just open-sourced TiDB-Operator leveraging Kubernetes*
○ On InfoWorld: https://www.infoworld.com/article/3297700/kubernetes/introducing-the-kubernetes-operator-for-tidb.html

2018 PingCAP
Community
Stars
● TiDB: 14,500+
● TiKV: 3,500+
Contributors
● TiDB: 195+
● TiKV: 80+

TiDB: OLTP + Ad Hoc OLAP
[Diagram: clients (ODBC/JDBC, MySQL client, any ORM that supports MySQL) connect to TiDB, which sits on top of TiKV; nodes Node1 to Node4]
● TiDB layers: MySQL Network Protocol, SQL Parser, Cost-based Optimizer, Distributed Executor (Coprocessor)

TiDB: Relational -> KV
“User” Table
ID  Name    Email
1   Edward  h@pingcap.com
2   Tom     tom@pingcap.com
...
In TiKV: a sorted map over the key range (-∞, +∞)
user/1  Edward,h@pingcap.com
user/2  Tom,tom@pingcap.com
...
Some region...

Index Structure
Row:
Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64)
Value: [col1, col2, col3, col4]
Index:
Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
Value: [null]
Keys are ordered as byte arrays in TiKV, which makes SCAN possible
A timestamp, issued by the Placement Driver, is appended to every key
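
The key layout above can be sketched in a few lines of Python. This is a toy illustration, not TiDB's actual codec: the prefix bytes, the helper names (`enc_int`, `row_key`, `index_prefix`, `index_key`, `scan`), the 0x00 separator after the column value, and the offset big-endian int64 encoding are all assumptions, chosen so that byte order matches numeric order. Timestamps are left out.

```python
import struct
from bisect import bisect_left

# Hypothetical prefix bytes, chosen only for this sketch.
TABLE_PREFIX, ROW_PREFIX, IDX_PREFIX = b"t", b"r", b"i"
INT64_MIN = -(1 << 63)

def enc_int(n):
    """Encode a signed int64 so that byte order matches numeric order."""
    return struct.pack(">Q", n + (1 << 63))

def dec_int(b):
    """Inverse of enc_int."""
    return struct.unpack(">Q", b)[0] - (1 << 63)

def row_key(table_id, row_id):
    return TABLE_PREFIX + ROW_PREFIX + enc_int(table_id) + enc_int(row_id)

def index_prefix(table_id, index_id, column_value):
    # column_value is raw bytes followed by a 0x00 separator; this assumes the
    # value itself contains no zero byte (a simplification).
    return (TABLE_PREFIX + IDX_PREFIX + enc_int(table_id) + enc_int(index_id)
            + column_value + b"\x00")

def index_key(table_id, index_id, column_value, row_id):
    return index_prefix(table_id, index_id, column_value) + enc_int(row_id)

def scan(store, start, end):
    """Return the (key, value) pairs with start <= key < end from a sorted list."""
    keys = [k for k, _ in store]
    return store[bisect_left(keys, start):bisect_left(keys, end)]

# The "User" table (table ID 1) plus an index (index ID 1) on Name.
store = sorted([
    (row_key(1, 2), ["Tom", "tom@pingcap.com"]),
    (row_key(1, 1), ["Edward", "h@pingcap.com"]),
    (row_key(2, 1), ["row of another table"]),
    (index_key(1, 1, b"Tom", 2), None),
    (index_key(1, 1, b"Edward", 1), None),
], key=lambda kv: kv[0])

# Table scan: every row of table 1, in row ID order.
table_rows = [v for _, v in scan(store, row_key(1, INT64_MIN), row_key(2, INT64_MIN))]

# Index lookup: row IDs whose Name equals "Tom" (row ID is the last 8 bytes).
p = index_prefix(1, 1, b"Tom")
tom_ids = [dec_int(k[-8:]) for k, _ in scan(store, p, p + b"\xff" * 9)]
```

Because the store is one sorted map, both a full table scan and an index lookup reduce to a range scan between two byte strings.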

TiSpark: Complex OLAP
[Diagram: a Spark Driver and Spark Executors, each with TiSpark, on top of the Distributed Storage Layer (TiKV nodes) and the Placement Driver (PD)]
● gRPC to PD: retrieve data location
● gRPC to TiKV: retrieve real data from TiKV

TiSpark: Features
● Complex calculation pushdown
● Key-range pruning
● Index support:
○ Clustered index / non-clustered index
○ Index-only query optimization
● Cost-based optimization:
○ Statistics gathered from TiDB as histograms

Join Support
● Hash Join (fastest; if table <= 50 million rows)
● Sort Merge Join (join on indexed column or ordered data source)
● Index Lookup Join (join on indexed column; ideally after filter, result < 10,000 rows)
Chosen by the cost-based optimizer, which weighs:
● Network cost
● Memory cost
● CPU cost
TiKV and Placement Driver (Storage)

TiKV: The Foundation
[Diagram: a client talks over gRPC to three TiKV instances and to the PD cluster; the TiKV instances form a Raft group]
● Layers in each TiKV instance, from the storage engine up: RocksDB, Raft, Transaction, Txn KV API and Coprocessor API

PD: Dynamic Split and Merge
[Diagram: Regions A and B placed on TiKV_1 and TiKV_2, shown before and after a Split and a Merge]
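
A toy model can make split and merge concrete. The sketch below is not PD's scheduling logic: the `Region` class, the `split` and `merge` helpers, and the use of a key-count threshold (`max_keys`) in place of a real size policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    start: str                                # inclusive lower bound; "" stands for -inf
    end: str                                  # exclusive upper bound; "" stands for +inf
    keys: list = field(default_factory=list)  # sorted keys stored in the Region

def split(region, max_keys):
    """Halve a Region recursively until no piece holds more than max_keys (>= 1)."""
    if len(region.keys) <= max_keys:
        return [region]
    mid = len(region.keys) // 2
    split_key = region.keys[mid]
    left = Region(region.start, split_key, region.keys[:mid])
    right = Region(split_key, region.end, region.keys[mid:])
    return split(left, max_keys) + split(right, max_keys)

def merge(regions, max_keys):
    """Merge adjacent Regions whenever the combined Region still fits max_keys."""
    merged = []
    for r in regions:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.end == r.start
                and len(prev.keys) + len(r.keys) <= max_keys):
            merged[-1] = Region(prev.start, r.end, prev.keys + r.keys)
        else:
            merged.append(r)
    return merged

# One Region covering the whole key space grows past the threshold and splits.
whole = Region("", "", ["a", "b", "c", "d", "e"])
regions = split(whole, max_keys=2)

# Later the last Region is emptied by deletes and is merged into its neighbour.
regions[2].keys.clear()
merged = merge(regions, max_keys=2)
```

In both directions the Regions keep covering the whole key space with no gaps: each Region's end key is the next Region's start key.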

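PD: Hotspot Removal
[Diagram: Regions A and B replicated on TiKV_1 and TiKV_2; asterisks mark the Raft leader of each Region]
● Before: both leaders (*Region A*, *Region B*) sit on one node, which takes the whole workload
● Hotspot schedule (Raft leader transfer)
● After: each node leads one Region and the workload is spread across both nodes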
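
The leader transfer on this slide can be sketched as a small balancing loop. This is an illustrative toy, not PD's scheduler: the function name `balance_leaders`, its dictionary inputs, and the use of leader count as a stand-in for workload are assumptions.

```python
from collections import Counter

def balance_leaders(leaders, replicas):
    """Move Raft leaders off the busiest store until no move improves balance.

    leaders:  Region name -> store that currently holds the leader
    replicas: Region name -> list of stores that hold a copy of the Region
    """
    leaders = dict(leaders)
    stores = sorted({s for copies in replicas.values() for s in copies})
    while True:
        load = Counter({s: 0 for s in stores})
        load.update(leaders.values())  # leader count per store
        hot = max(stores, key=lambda s: load[s])
        moved = False
        for region in sorted(leaders):
            if leaders[region] != hot:
                continue
            # Only stores that already hold a replica can take over the leader,
            # and only if the move strictly narrows the gap.
            candidates = [s for s in replicas[region] if load[s] + 1 < load[hot]]
            if candidates:
                leaders[region] = min(candidates, key=lambda s: load[s])
                moved = True
                break
        if not moved:
            return leaders

# Both leaders start on TiKV_1, as in the "before" picture.
before = {"Region A": "TiKV_1", "Region B": "TiKV_1"}
copies = {"Region A": ["TiKV_1", "TiKV_2"], "Region B": ["TiKV_1", "TiKV_2"]}
after = balance_leaders(before, copies)
leaders_per_store = Counter(after.values())
```

No data moves in this model. Only the leader role changes hands, which is why a leader transfer is a cheap way to relieve a hot node.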

Geo-Replication
[Diagram: Regions A and B replicated across Seattle_1, Seattle_2, and New York_1; asterisks mark the Raft leader of each Region. In one panel the leader of Region B is on New York_1; in the other, both leaders are on the Seattle nodes]

Transaction Model
● Timestamp Oracle service (from Google’s Percolator paper)
● 2-Phase commit protocol (2PC)
● Problem: the timestamp oracle is a single point of failure
● Solution: Placement Driver HA cluster
○ Replicated using Raft
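
How a timestamp oracle and two-phase commit fit together can be shown with a single-process sketch. It is heavily simplified and is not TiKV's implementation: the class and function names are hypothetical, there is no primary lock or crash recovery, reads ignore locks, and the oracle is a plain counter rather than a Raft-replicated PD cluster.

```python
import itertools

class TimestampOracle:
    """Hands out strictly increasing timestamps (a stand-in for PD)."""
    def __init__(self):
        self._counter = itertools.count(1)

    def get_ts(self):
        return next(self._counter)

class Store:
    def __init__(self):
        self.locks = {}     # key -> start_ts of the transaction holding the lock
        self.pending = {}   # (key, start_ts) -> uncommitted value
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def prewrite(self, key, value, start_ts):
        """Phase 1: lock the key unless another transaction got there first."""
        if key in self.locks:
            return False
        history = self.versions.get(key, [])
        if history and history[-1][0] > start_ts:
            return False  # someone committed after this transaction started
        self.locks[key] = start_ts
        self.pending[(key, start_ts)] = value
        return True

    def rollback(self, key, start_ts):
        if self.locks.get(key) == start_ts:
            del self.locks[key]
            self.pending.pop((key, start_ts), None)

    def commit(self, key, start_ts, commit_ts):
        """Phase 2: publish the value at commit_ts and release the lock."""
        value = self.pending.pop((key, start_ts))
        self.versions.setdefault(key, []).append((commit_ts, value))
        del self.locks[key]

    def get(self, key, read_ts):
        """Snapshot read: newest version committed at or before read_ts."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= read_ts:
                return value
        return None

def run_transaction(tso, store, writes):
    """Write all keys atomically; return the commit timestamp, or None on conflict."""
    start_ts = tso.get_ts()
    locked = []
    for key, value in writes.items():
        if not store.prewrite(key, value, start_ts):
            for k in locked:
                store.rollback(k, start_ts)
            return None
        locked.append(key)
    commit_ts = tso.get_ts()
    for key in locked:
        store.commit(key, start_ts, commit_ts)
    return commit_ts

tso, store = TimestampOracle(), Store()
first_commit = run_transaction(tso, store, {"a": 1, "b": 2})
snapshot_ts = tso.get_ts()
second_commit = run_transaction(tso, store, {"a": 10})
old_value = store.get("a", snapshot_ts)          # still sees the first commit
stale_write_ok = store.prewrite("a", 99, snapshot_ts)  # rejected: a newer commit exists
```

Every transaction depends on the oracle for both its start and its commit timestamp, which is why the slide treats the oracle's availability as the problem to solve.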

Guaranteeing Correctness
● Formal proof using TLA+
○ A formal specification and verification language to reason about and prove aspects of complex systems
○ Raft
○ TSO/Percolator
○ 2PC
● See details: https://github.com/pingcap/tla-plus

TiKV -> CNCF (To Be Announced…)

Who’s using TiDB?
200+ companies

Two Major Use Cases
1. MySQL Scalability
2. Hybrid OLTP/OLAP Architecture

Mobike + TiDB
● 200 million users
● 200 cities
● 9 million smart bikes
● ~30 TB / day

Scenario #1: Locking/Unlocking
● Locking and unlocking of smart bikes generates massive amounts of data
● A smooth experience is key to user retention
● TiDB supports this system by alerting administrators within minutes when the success rate of locking/unlocking drops
● Quickly find malfunctioning bikes

Scenario #2: Real-Time Analysis
● Synchronize TiDB with MySQL instances using Syncer (proprietary tool)
● TiDB + TiSpark empower real-time analysis with horizontal scalability
● No need for Hadoop + Hive

Scenario #3: Mobike Store
● An innovative loyalty program that must be on 24 x 7 x 365
● TiDB handles:
○ High concurrency for peak or promotional season
○ Permanent storage
○ Horizontal scalability
● No interruption as business evolves

Test, Use, Contribute!
Thank You!
Twitter: @PingCAP; @kevinsxu; @siddontang
Kevin Xu (kevin@pingcap.com); Liu Tang (tl@pingcap.com)
