Demystifying Real Time Analytics With TiDB
Unlocking the Power of TiFlash for Real-Time Data Insights
Kabilesh PR
Co-Founder, Mydbops LLP
33rd MyWebinar - Mydbops
About Me
Kabilesh PR
❏ Interested in Open Source DB technologies
❏ Keen interest in MySQL, TiDB & Distributed SQL
❏ Active Tech Speaker/Blogger
❏ PingCAP Certified TiDB Professional
❏ AWS Database Specialty
❏ Founding Partner, Mydbops
Mydbops Services
Focus on MySQL, MongoDB, PostgreSQL, TiDB, Cassandra
Consulting Services
Managed Services
24*7 DBA Team
Targeted Engagement
Agenda
❏ Introduction
❏ TiDB Architecture
❏ Understanding Real-time Analytics
❏ Analytical Engine - TiFlash
❏ Enabling TiFlash
❏ Queries with TiFlash
Introduction
TiDB is an open-source, distributed HTAP (Hybrid Transactional/Analytical Processing) database compatible with the MySQL protocol.
Understanding TiDB Architecture
TiDB SQL layer: MySQL compatible; it separates compute from storage to make scaling simpler.
Placement Driver (PD): functions as the orchestrator, responsible for the TSO (timestamp oracle), scheduling, shard maintenance, metadata, and more.
TiKV: row-based transactional storage that offers high availability and strong consistency, and can auto-scale to hundreds of nodes at petabyte data scale.
Advantages of TiDB
Open Source
No Vendor lock-in with a
database that’s 100% open source.
Horizontal Scaling
Grants total transparency into data workloads with automatic sharding;
no manual sharding is required.
High Availability
Guarantees auto-failover and self-healing for
continuous data access.
MySQL Compatibility
Enjoy the most MySQL-compatible
distributed SQL database on the planet.
Multi-Cloud
Deploy database clusters
anywhere in the world.
Mixed Workloads
Streamlined tech stack makes it
easier to produce real-time analytics.
Robust Security
Protect data with enterprise-grade
encryption both in-flight and at-rest.
TiDB's Global Client Base
Understanding Real-time Analytics
❏ Real-time analytics: Process of analyzing data as it is created, collected, and processed to provide
immediate insights to enable prompt decision-making.
❏ Use cases:
Real-time fraud detection, market analysis, personalized recommendations, demand forecasting
❏ Challenges:
Data volume and velocity
Integration
Data quality
Cost
● With TiDB, real-time insights arrive as and when the business happens.
● Easy integration and maintenance.
Real-Time Analytical Engine = TiFlash
What is TiFlash?
❏ An integrated columnar storage engine built exclusively for analytical workloads.
❏ It is tightly integrated with TiKV and uses a ClickHouse-based coprocessor to run MPP (Massively
Parallel Processing) analytical queries.
Data Sync with TiFlash
Data is synced to TiFlash through an extended Raft algorithm, with TiFlash replicas joining each Raft group as Learners.
Enabling TiFlash
❏ Adding a TiFlash node online does not impact the OLTP workload.
scale-out-topology.yaml:
tiflash_servers:
  - host: 10.0.1.10
#tiup cluster scale-out <cluster-name> scale-out-topology.yaml
❏ After a TiFlash node is added, replication does not start by default.
❏ Replication to TiFlash can be at the table level or schema level.
ALTER TABLE table_name SET TIFLASH REPLICA count;
ALTER DATABASE db_name SET TIFLASH REPLICA count;
❏ Monitoring of TiFlash replication:
SELECT * FROM information_schema.tiflash_replica;
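As an illustration of the enable-then-monitor steps above, here is a minimal sketch for a single table. The schema `shop` and table `orders` are hypothetical names, and the replica count of 1 is an assumption that fits a cluster with one TiFlash node.

```sql
-- Hypothetical schema and table names; replace with your own.
-- Ask TiDB to keep one TiFlash (columnar) replica of the table.
ALTER TABLE shop.orders SET TIFLASH REPLICA 1;

-- Check the sync status of that table only.
-- AVAILABLE = 1 means the replica is ready to serve reads;
-- PROGRESS moves from 0 to 1 while data is being synced.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'shop'
  AND TABLE_NAME = 'orders';
```

Until AVAILABLE becomes 1, queries on the table keep reading from TiKV.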
Scaling TiFlash
❏ Scaling TiFlash nodes out and in is done online and does not impact the OLTP workload.
Node addition:
tiflash_servers:
  - host: 10.0.1.10
  - host: 10.0.1.12
#tiup cluster scale-out <cluster-name> scale-out-topology.yaml
Adjust the table / schema replica count:
ALTER TABLE table_name SET TIFLASH REPLICA count;
ALTER DATABASE db_name SET TIFLASH REPLICA count;
Node removal:
Lower each table's replica count so that it does not exceed the number of TiFlash nodes that remain (0 if none remain):
ALTER TABLE table_name SET TIFLASH REPLICA 0;
#tiup cluster scale-in <cluster-name> --node <tiflash_node_id>
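Before running the scale-in, it can help to list the tables whose replica count would no longer fit. The sketch below assumes that exactly one TiFlash node remains after the removal, and the `shop.orders` table is a hypothetical name.

```sql
-- Assumption: 1 TiFlash node will remain after the scale-in.
-- List tables that currently ask for more replicas than that.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT
FROM information_schema.tiflash_replica
WHERE REPLICA_COUNT > 1;

-- For each table returned, lower the count to fit the remaining nodes.
-- Hypothetical table name; repeat per table.
ALTER TABLE shop.orders SET TIFLASH REPLICA 1;
```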
Queries With TiFlash
Smart Selection
❏ The TiDB optimizer automatically decides whether to use TiFlash replicas, based on cost estimation.
❏ This works even with mixed workloads.
Engine Isolation
❏ You can restrict read queries to the replicas of specific engines, either per TiDB instance or per session:
TiDB config file (instance level):
[isolation-read]
engines = ["tikv", "tidb", "tiflash"]
Session level:
SET SESSION tidb_isolation_read_engines = "engine list separated by commas";
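A session-level sketch of engine isolation, assuming a hypothetical table `shop.orders` that already has an available TiFlash replica. Keeping `tidb` in the list is a choice made here so that TiDB's in-memory system tables stay readable.

```sql
-- Route this session's reads away from TiKV.
SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';

-- Confirm the setting.
SELECT @@tidb_isolation_read_engines;

-- Hypothetical table; with TiKV excluded, this read must use TiFlash.
SELECT COUNT(*) FROM shop.orders;

-- Restore the default engine list when done.
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
```

With TiKV excluded, a query on a table that has no TiFlash replica returns an error instead of falling back to TiKV, so this setting suits sessions that are dedicated to analytics.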
Manual Hint
❏ You can force TiDB to read from a TiFlash replica by adding a hint to the query:
SELECT /*+ READ_FROM_STORAGE(TIFLASH[table_name]) */ ... FROM table_name;
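One way to check that the hint took effect is to run the query under EXPLAIN. In this sketch the table `shop.orders` and its `status` column are hypothetical; note that when a table has an alias, the hint has to name the alias.

```sql
-- Hypothetical table and column names.
-- The hint references the alias "o", not the table name.
EXPLAIN
SELECT /*+ READ_FROM_STORAGE(TIFLASH[o]) */ o.status, COUNT(*)
FROM shop.orders AS o
GROUP BY o.status;
```

In the plan, tasks marked `cop[tiflash]`, `batchCop[tiflash]`, or `mpp[tiflash]` indicate that the read is served by TiFlash; `cop[tikv]` means the hint was not applied.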
TiFlash Modes
MPP Mode
❏ MPP mode executes a query in parallel across multiple TiFlash nodes.
❏ TiDB automatically decides when to use MPP, based on the optimizer's cost estimation.
❏ Control variables: tidb_allow_mpp, tidb_enforce_mpp
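The two control variables can be tried out per session, as in the sketch below. The table `shop.orders` and its `status` column are hypothetical, and forcing MPP here is only for inspection, not a recommended default.

```sql
-- Allow the optimizer to consider MPP plans.
SET SESSION tidb_allow_mpp = ON;

-- Ask the optimizer to ignore cost estimation and pick MPP where possible.
SET SESSION tidb_enforce_mpp = ON;

-- Hypothetical table and column; inspect the chosen plan.
EXPLAIN
SELECT status, COUNT(*)
FROM shop.orders
GROUP BY status;

-- Return to cost-based selection.
SET SESSION tidb_enforce_mpp = OFF;
```

An MPP plan shows `ExchangeSender` and `ExchangeReceiver` operators with `mpp[tiflash]` tasks. If MPP was not chosen despite the setting, `SHOW WARNINGS` may give the reason.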
FastScan Mode
❏ With FastScan, TiFlash scans tables more efficiently but gives up strong data consistency.
❏ This mode is disabled by default.
❏ Query results might include old data of a table.
❏ Enable or disable it with the tiflash_fastscan system variable.
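A session-level sketch of toggling FastScan, assuming a TiDB version that supports the `tiflash_fastscan` variable and a hypothetical table `shop.orders` with a TiFlash replica.

```sql
-- Enable FastScan for this session only.
SET SESSION tiflash_fastscan = ON;

-- Hypothetical table; the result may reflect old data,
-- so use this only where approximate answers are acceptable.
SELECT COUNT(*) FROM shop.orders;

-- Switch back to consistent reads.
SET SESSION tiflash_fastscan = OFF;
```

Scoping the change to a session keeps other workloads on consistent reads.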
Any Questions?
Thank You
