SlideShare a Scribd company logo
Introducing TiDB
(For those coming from MySQL..)
Make Data Creative
Morgan Tocker (@pingcap; @morgo)
October, 2018
● History and Community
● Technical Walkthrough
● Use Case with Mobike
● Live Demo: TiDB on GKE
● MySQL Compatibility
● Q&A
Agenda
● Sr Product / Community Manager
● ~15+ years MySQL Experience
○ MySQL AB, Sun Microsystems, Percona, Oracle
● Previously Product Manager for MySQL Server
A Little About Me...
A Little About PingCAP...
● Founded in April 2015 by 3 infrastructure engineers
● Offices in China and North America
● Remote Friendly!
○ I work from here ➡
Recent News
Recent News
Our Product is The TiDB Platform
● TiDB platform: (Ti = Titanium)
○ TiDB (stateless SQL layer compatible with MySQL)
○ TiKV (distributed transactional key-value store)
○ TiSpark (Apache Spark plug-in on top of TiKV)
● Open source from Day 1
○ GA 1.0: October 2017
○ GA 2.0: April 2018
TiDB is a NewSQL Database
● 1960s: First Gen Databases
● 1970s: Relational Model + SQL
● 2000s: Sharding + Memcached
● 2010s: NoSQL
● 2010s+: NewSQL
Inspired by Google Spanner / F1
MySQL Compatible
● Hybrid OLTP & OLAP (Minimize ETL)
● Horizontal Scalability
● MySQL Compatible
● Distributed Transaction (ACID Compliant)
● High Availability
● Cloud-Native
TiDB Core Features
2018 PingCAP
1. MySQL Scalability
2. Hybrid OLTP/OLAP Architecture
3. Unifying Data Storage/Management
Common Use Cases
Architecture
SparkSQL
TiDB
TiDB
Worker
Spark
Driver
TiKV Cluster (Storage)
Metadata
TiKV TiKV
TiKV
Data location
Job
TiSpark
DistSQL API
TiKV
TiDB
TSO/Data location
Worker
Worker
Spark Cluster
TiDB Cluster
TiDB
DistSQL API
PD
PD Cluster
TiKV TiKV
TiDB
KV API
MySQL
MySQL
PD
PD
2018 PingCAP
Stars
● TiDB: 15,000+
● TiKV: 3700+
Contributors
● TiDB: 200+
● TiKV: 100+
Community
Early Sign-up: https://www.pingcap.com/tidb-academy/
Sneak Peek!
TiDB Platform Architecture
Platform Architecture
TiDB
TiDB
Worker
Spark
Driver
TiKV Cluster (Storage)
Metadata
TiKV TiKV
TiKV
Data location
Job
TiSpark
DistSQL API
TiKV
TiDB
TSO/Data location
Worker
Worker
Spark Cluster
TiDB Cluster
TiDB
DistSQL API
PD
PD Cluster
TiKV TiKV
TiDB
KV API
MySQL
MySQL
SparkSQL
PD
PD
SparkSQL
TiKV: The Foundation [in CNCF]
RocksDB
Raft
Transaction
Txn KV API
Coprocessor
API
RocksDB
Raft
Transaction
Txn KV API
Coprocessor
API
RocksDB
Raft
Transaction
Txn KV API
Coprocessor
API
Raft
Group
Client
gRPC
TiKV Instance TiKV Instance TiKV Instance
gRPC gRPC
PD Cluster
TiDB: OLTP + Ad Hoc OLAP
Node1 Node2 Node3 Node4
MySQL Network Protocol
SQL Parser
Cost-based Optimizer
Distributed Executor (Coprocessor)
ODBC/JDBC MySQL Client
Any ORM which
supports MySQL
TiDB
TiKV
ID Name Email
1 Edward h@pingcap.com
2 Tom tom@pingcap.com
...
user/1 Edward,h@pingcap.com
user/2 Tom,tom@pingcap.com
...
In TiKV -∞
+∞
(-∞, +∞)
Sorted map
“User” Table
TiDB: Relational -> KV
Some region...
SQL -> Parser -> Coprocessor
TiSpark: Complex OLAP
Spark ExecSpark Exec
Spark Driver
Spark Exec
TiKV TiKV TiKV TiKV
TiSpark
TiSpark TiSpark TiSpark
TiKV
Placement
Driver (PD)
gRPC
Distributed Storage Layer
gRPC
retrieve data location
retrieve real data from TiKV
Who’s Using TiDB?
2018 PingCAP
Who’s using TiDB?
300+
Companies
2018 PingCAP
1. MySQL Scalability
2. Hybrid OLTP/OLAP Architecture
3. Unifying Data Storage/Management
Common Use Cases
Mobike + TiDB
● 200 million users
● 200 cities
● 9 milllion smart bikes
● ~30 TB / day
● Locking and unlocking of smart bikes generate massive data
● Smooth experience is key to user retention
● TiDB supports this system by alerting administrators when
success rate of locking/unlocking drops, within minutes
● Quickly find malfunctioning bikes
Scenario #1: Locking/Unlocking
● Synchronize TiDB with MySQL
instances using Syncer (proprietary
tool)
● TiDB + TiSpark empower real-time
analysis with horizontal scalability
● No need for Hadoop + Hive
Scenario #2: Real-Time Analysis
● An innovative loyalty program that must
be on 24 x 7 x 365
● TiDB handles:
○ High-concurrency for peak or promotional season
○ Permanent storage
○ Horizontal scalability
● No interruption as business evolves
Scenario #3: Mobike Store
TiDB on GKE Demo
MySQL Compatibility
● Compatible with MySQL 5.7
○ Joins, Subqueries, DML, DDL etc.
● On the roadmap:
○ Views, Window Functions, GIS
● Missing:
○ Stored Procedures, Triggers, Events
Summary
pingcap.com
/docs/sql/mysql-compatibility/
● Some features work differently
○ Auto Increment
○ Optimistic Locking
● TiDB works better with smaller
transactions
○ Recommended to batch updates, deletes,
inserts to 5000 rows
Nuanced
Thank You!
Twitter: @PingCAP; @morgo
https://github.com/pingcap
(Give us a Watch/Star!)
Morgan Tocker
(morgan@pingcap.com)
Early Sign-up:
www.pingcap.com/tidb-academy/
● Hash Join (fastest; if table <= 50 million rows)
● Sort Merge Join (join on indexed column or ordered data
source)
● Index Lookup Join (join on indexed column; ideally after filter,
result < 10,000 rows)
Chosen based on Cost-based Optimizer:
Join Support
Network cost Memory cost CPU cost
Index Structure
Row:
Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64)
Value: [col1, col2, col3, col4]
Index:
Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
Value: [null]
Keys are ordered by byte array in TiKV, so can support SCAN
Every key is appended a timestamp, issued by Placement Driver
● Complex calculation pushdown
● Key-range pruning
● Index support:
○ Clustered index / non-clustered index
○ Index-only query optimization
● Cost-based optimization:
○ Stats gathered from TiDB in histogram
TiSpark: Features
PD: Dynamic Split and Merge
Region A
Region A
Region B
Region A
Region A
Region B
Split
Region A
Region A
Region B
Merge
TiKV_1 TiKV_2 TiKV_2TiKV_1
PD: Hotspot Removal
*Region A*
*Region B*
Region A
Region B
Workload
*Region A*
Region B
Region A
*Region B*
Workload
Workload
Hotspot Schedule
(Raft leader transfer)
TiKV_1 TiKV_2
TiKV_2TiKV_1
Geo-Replication + Data Location
*Region A*
Region B
Region A
Region B
Seattle_1 Seattle_2
Region A
*Region B*
New York_1
*Region A*
Region B
Region A
*Region B*
Seattle_2Seattle_1
Region A
Region B
New York_1
● Timestamp Oracle service (from Google’s Percolator paper)
● 2-Phase commit protocol (2PC)
● Problem: Single point of failure
● Solution: Placement Driver HA cluster
○ Replicated using Raft
Transaction Model
● Formal proof using TLA+
○ a formal specification and verification language to reason about and prove
aspects of complex systems
● Raft
● TSO/Percolator
● 2PC
● See details: https://github.com/pingcap/tla-plus
Guaranteeing Correctness

More Related Content

What's hot

Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Severalnines
 
Webinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High AvailabilityWebinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High Availability
Severalnines
 

What's hot (20)

Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
 
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph DatabaseFireEye & Scylla: Intel Threat Analysis Using a Graph Database
FireEye & Scylla: Intel Threat Analysis Using a Graph Database
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
Presto Fast SQL on Anything
Presto Fast SQL on AnythingPresto Fast SQL on Anything
Presto Fast SQL on Anything
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDB
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it faster
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
 
How Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBHow Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDB
 
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi Kivity
 
Webinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High AvailabilityWebinar slides: Designing Open Source Databases for High Availability
Webinar slides: Designing Open Source Databases for High Availability
 
Demystifying the Distributed Database Landscape
Demystifying the Distributed Database LandscapeDemystifying the Distributed Database Landscape
Demystifying the Distributed Database Landscape
 
Getting started in the cloud for developers
Getting started in the cloud for developersGetting started in the cloud for developers
Getting started in the cloud for developers
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedIn
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQ
 

Similar to TiDB Introduction - Boston MySQL Meetup Group

Similar to TiDB Introduction - Boston MySQL Meetup Group (20)

TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
 
Introducing TiDB @ SF DevOps Meetup
Introducing TiDB @ SF DevOps MeetupIntroducing TiDB @ SF DevOps Meetup
Introducing TiDB @ SF DevOps Meetup
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQL
 
A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)
 
When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu Ma
 
TiDB as an HTAP Database
TiDB as an HTAP DatabaseTiDB as an HTAP Database
TiDB as an HTAP Database
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
 
Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]
 
"Smooth Operator" [Bay Area NewSQL meetup]
"Smooth Operator" [Bay Area NewSQL meetup]"Smooth Operator" [Bay Area NewSQL meetup]
"Smooth Operator" [Bay Area NewSQL meetup]
 
TiDB + Mobike by Kevin Xu (@kevinsxu)
TiDB + Mobike by Kevin Xu (@kevinsxu)TiDB + Mobike by Kevin Xu (@kevinsxu)
TiDB + Mobike by Kevin Xu (@kevinsxu)
 
Keynote -- Percona Live Europe 2018
Keynote -- Percona Live Europe 2018Keynote -- Percona Live Europe 2018
Keynote -- Percona Live Europe 2018
 
Data-at-scale-with-TIDB Mydbops Co-Founder Kabilesh PR at LSPE Event
Data-at-scale-with-TIDB Mydbops Co-Founder Kabilesh PR at LSPE EventData-at-scale-with-TIDB Mydbops Co-Founder Kabilesh PR at LSPE Event
Data-at-scale-with-TIDB Mydbops Co-Founder Kabilesh PR at LSPE Event
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 

More from Morgan Tocker

Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated Testing
Morgan Tocker
 
MySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big DataMySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big Data
Morgan Tocker
 
MySQL 5.7: Core Server Changes
MySQL 5.7: Core Server ChangesMySQL 5.7: Core Server Changes
MySQL 5.7: Core Server Changes
Morgan Tocker
 
MySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics ImprovementsMySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics Improvements
Morgan Tocker
 
Locking and Concurrency Control
Locking and Concurrency ControlLocking and Concurrency Control
Locking and Concurrency Control
Morgan Tocker
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
Morgan Tocker
 
My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2
Morgan Tocker
 
Mysql 57-upcoming-changes
Mysql 57-upcoming-changesMysql 57-upcoming-changes
Mysql 57-upcoming-changes
Morgan Tocker
 

More from Morgan Tocker (20)

Introducing Spirit - Online Schema Change
Introducing Spirit - Online Schema ChangeIntroducing Spirit - Online Schema Change
Introducing Spirit - Online Schema Change
 
MySQL Usability Guidelines
MySQL Usability GuidelinesMySQL Usability Guidelines
MySQL Usability Guidelines
 
MySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer GuideMySQL 8.0 Optimizer Guide
MySQL 8.0 Optimizer Guide
 
MySQL Server Defaults
MySQL Server DefaultsMySQL Server Defaults
MySQL Server Defaults
 
MySQL Cloud Service Deep Dive
MySQL Cloud Service Deep DiveMySQL Cloud Service Deep Dive
MySQL Cloud Service Deep Dive
 
MySQL 5.7 + JSON
MySQL 5.7 + JSONMySQL 5.7 + JSON
MySQL 5.7 + JSON
 
Using MySQL in Automated Testing
Using MySQL in Automated TestingUsing MySQL in Automated Testing
Using MySQL in Automated Testing
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
 
MySQL Query Optimization
MySQL Query OptimizationMySQL Query Optimization
MySQL Query Optimization
 
MySQL Performance Metrics that Matter
MySQL Performance Metrics that MatterMySQL Performance Metrics that Matter
MySQL Performance Metrics that Matter
 
MySQL For Linux Sysadmins
MySQL For Linux SysadminsMySQL For Linux Sysadmins
MySQL For Linux Sysadmins
 
MySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big DataMySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big Data
 
MySQL NoSQL APIs
MySQL NoSQL APIsMySQL NoSQL APIs
MySQL NoSQL APIs
 
MySQL 5.7: Core Server Changes
MySQL 5.7: Core Server ChangesMySQL 5.7: Core Server Changes
MySQL 5.7: Core Server Changes
 
MySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics ImprovementsMySQL 5.6 - Operations and Diagnostics Improvements
MySQL 5.6 - Operations and Diagnostics Improvements
 
Locking and Concurrency Control
Locking and Concurrency ControlLocking and Concurrency Control
Locking and Concurrency Control
 
The InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQLThe InnoDB Storage Engine for MySQL
The InnoDB Storage Engine for MySQL
 
My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2My sql 5.7-upcoming-changes-v2
My sql 5.7-upcoming-changes-v2
 
Mysql 57-upcoming-changes
Mysql 57-upcoming-changesMysql 57-upcoming-changes
Mysql 57-upcoming-changes
 
Optimizing MySQL
Optimizing MySQLOptimizing MySQL
Optimizing MySQL
 

Recently uploaded

AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 

Recently uploaded (20)

Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 

TiDB Introduction - Boston MySQL Meetup Group

  • 1. Introducing TiDB (For those coming from MySQL..) Make Data Creative Morgan Tocker (@pingcap; @morgo) October, 2018
  • 2. ● History and Community ● Technical Walkthrough ● Use Case with Mobike ● Live Demo: TiDB on GKE ● MySQL Compatibility ● Q&A Agenda
  • 3. ● Sr Product / Community Manager ● ~15+ years MySQL Experience ○ MySQL AB, Sun Microsystems, Percona, Oracle ● Previously Product Manager for MySQL Server A Little About Me...
  • 4. A Little About PingCAP... ● Founded in April 2015 by 3 infrastructure engineers ● Offices in China and North America ● Remote Friendly! ○ I work from here ➡
  • 7. Our Product is The TiDB Platform ● TiDB platform: (Ti = Titanium) ○ TiDB (stateless SQL layer compatible with MySQL) ○ TiKV (distributed transactional key-value store) ○ TiSpark (Apache Spark plug-in on top of TiKV) ● Open source from Day 1 ○ GA 1.0: October 2017 ○ GA 2.0: April 2018
  • 8. TiDB is a NewSQL Database ● 1960s: First Gen Databases ● 1970s: Relational Model + SQL ● 2000s: Sharding + Memcached ● 2010s: NoSQL ● 2010s+: NewSQL Inspired by Google Spanner / F1 MySQL Compatible
  • 9. ● Hybrid OLTP & OLAP (Minimize ETL) ● Horizontal Scalability ● MySQL Compatible ● Distributed Transaction (ACID Compliant) ● High Availability ● Cloud-Native TiDB Core Features
  • 10. 2018 PingCAP 1. MySQL Scalability 2. Hybrid OLTP/OLAP Architecture 3. Unifying Data Storage/Management Common Use Cases
  • 11. Architecture SparkSQL TiDB TiDB Worker Spark Driver TiKV Cluster (Storage) Metadata TiKV TiKV TiKV Data location Job TiSpark DistSQL API TiKV TiDB TSO/Data location Worker Worker Spark Cluster TiDB Cluster TiDB DistSQL API PD PD Cluster TiKV TiKV TiDB KV API MySQL MySQL PD PD
  • 12. 2018 PingCAP Stars ● TiDB: 15,000+ ● TiKV: 3700+ Contributors ● TiDB: 200+ ● TiKV: 100+ Community
  • 15. Platform Architecture TiDB TiDB Worker Spark Driver TiKV Cluster (Storage) Metadata TiKV TiKV TiKV Data location Job TiSpark DistSQL API TiKV TiDB TSO/Data location Worker Worker Spark Cluster TiDB Cluster TiDB DistSQL API PD PD Cluster TiKV TiKV TiDB KV API MySQL MySQL SparkSQL PD PD SparkSQL
  • 16. TiKV: The Foundation [in CNCF] RocksDB Raft Transaction Txn KV API Coprocessor API RocksDB Raft Transaction Txn KV API Coprocessor API RocksDB Raft Transaction Txn KV API Coprocessor API Raft Group Client gRPC TiKV Instance TiKV Instance TiKV Instance gRPC gRPC PD Cluster
  • 17. TiDB: OLTP + Ad Hoc OLAP Node1 Node2 Node3 Node4 MySQL Network Protocol SQL Parser Cost-based Optimizer Distributed Executor (Coprocessor) ODBC/JDBC MySQL Client Any ORM which supports MySQL TiDB TiKV
  • 18. ID Name Email 1 Edward h@pingcap.com 2 Tom tom@pingcap.com ... user/1 Edward,h@pingcap.com user/2 Tom,tom@pingcap.com ... In TiKV -∞ +∞ (-∞, +∞) Sorted map “User” Table TiDB: Relational -> KV Some region...
  • 19. SQL -> Parser -> Coprocessor
  • 20. TiSpark: Complex OLAP Spark ExecSpark Exec Spark Driver Spark Exec TiKV TiKV TiKV TiKV TiSpark TiSpark TiSpark TiSpark TiKV Placement Driver (PD) gRPC Distributed Storage Layer gRPC retrieve data location retrieve real data from TiKV
  • 22. 2018 PingCAP Who’s using TiDB? 300+ Companies
  • 23. 2018 PingCAP 1. MySQL Scalability 2. Hybrid OLTP/OLAP Architecture 3. Unifying Data Storage/Management Common Use Cases
  • 24. Mobike + TiDB ● 200 million users ● 200 cities ● 9 milllion smart bikes ● ~30 TB / day
  • 25. ● Locking and unlocking of smart bikes generate massive data ● Smooth experience is key to user retention ● TiDB supports this system by alerting administrators when success rate of locking/unlocking drops, within minutes ● Quickly find malfunctioning bikes Scenario #1: Locking/Unlocking
  • 26. ● Synchronize TiDB with MySQL instances using Syncer (proprietary tool) ● TiDB + TiSpark empower real-time analysis with horizontal scalability ● No need for Hadoop + Hive Scenario #2: Real-Time Analysis
  • 27. ● An innovative loyalty program that must be on 24 x 7 x 365 ● TiDB handles: ○ High-concurrency for peak or promotional season ○ Permanent storage ○ Horizontal scalability ● No interruption as business evolves Scenario #3: Mobike Store
  • 28. TiDB on GKE Demo
  • 30. ● Compatible with MySQL 5.7 ○ Joins, Subqueries, DML, DDL etc. ● On the roadmap: ○ Views, Window Functions, GIS ● Missing: ○ Stored Procedures, Triggers, Events Summary pingcap.com /docs/sql/mysql-compatibility/
  • 31. ● Some features work differently ○ Auto Increment ○ Optimistic Locking ● TiDB works better with smaller transactions ○ Recommended to batch updates, deletes, inserts to 5000 rows Nuanced
  • 32. Thank You! Twitter: @PingCAP; @morgo https://github.com/pingcap (Give us a Watch/Star!) Morgan Tocker (morgan@pingcap.com) Early Sign-up: www.pingcap.com/tidb-academy/
  • 33. ● Hash Join (fastest; if table <= 50 million rows) ● Sort Merge Join (join on indexed column or ordered data source) ● Index Lookup Join (join on indexed column; ideally after filter, result < 10,000 rows) Chosen based on Cost-based Optimizer: Join Support Network cost Memory cost CPU cost
  • 34. Index Structure Row: Key: tablePrefix_rowPrefix_tableID_rowID (IDs are assigned by TiDB, all int64) Value: [col1, col2, col3, col4] Index: Key: tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID Value: [null] Keys are ordered by byte array in TiKV, so can support SCAN Every key is appended a timestamp, issued by Placement Driver
  • 35. ● Complex calculation pushdown ● Key-range pruning ● Index support: ○ Clustered index / non-clustered index ○ Index-only query optimization ● Cost-based optimization: ○ Stats gathered from TiDB in histogram TiSpark: Features
  • 36. PD: Dynamic Split and Merge Region A Region A Region B Region A Region A Region B Split Region A Region A Region B Merge TiKV_1 TiKV_2 TiKV_2TiKV_1
  • 37. PD: Hotspot Removal *Region A* *Region B* Region A Region B Workload *Region A* Region B Region A *Region B* Workload Workload Hotspot Schedule (Raft leader transfer) TiKV_1 TiKV_2 TiKV_2TiKV_1
  • 38. Geo-Replication + Data Location *Region A* Region B Region A Region B Seattle_1 Seattle_2 Region A *Region B* New York_1 *Region A* Region B Region A *Region B* Seattle_2Seattle_1 Region A Region B New York_1
  • 39. ● Timestamp Oracle service (from Google’s Percolator paper) ● 2-Phase commit protocol (2PC) ● Problem: Single point of failure ● Solution: Placement Driver HA cluster ○ Replicated using Raft Transaction Model
  • 40. ● Formal proof using TLA+ ○ a formal specification and verification language to reason about and prove aspects of complex systems ● Raft ● TSO/Percolator ● 2PC ● See details: https://github.com/pingcap/tla-plus Guaranteeing Correctness