MariaDB Maxscale
Streaming Changes to Kafka
in Real Time
Markus Mäkelä
Massimiliano Pinto
What Is Real-Time Analytics?
How Real-Time Analytics Differs From Batch Analytics
Batch
● Data-oriented process
● Scope is static
● Data is complete
● Output reflects input
Real-Time
● Time-oriented process
● Scope is dynamic
● Data is incremental
● Output reflects changes in input
Change Data Capture
The MariaDB MaxScale CDC System
What Is Change Data Capture in MaxScale?
● Captures changes in committed data
○ MariaDB replication protocol awareness
● Stored as Apache Avro
○ Compact and efficient serialization format
● Simple data streaming service
○ Provides continuous data streams
What Does the CDC System Consist Of?
● Binlog replication relay (a.k.a. Binlog Server)
● Data conversion service
● CDC protocol
● Kafka producer
Replication Proxy Layer
The Binlogrouter Module
Binlog Events
● The master database sends events from its binlog files
● Events sent are a binary representation of the binlog file
contents, with a header prepended
● Once all events have been sent, the master pauses until new
events are ready to be sent
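Each event in the stream starts with a common fixed-size header (19 bytes in the v4 binlog format): timestamp, event type, server id, event size, next event position, and flags. A minimal Python sketch decoding that header:

```python
import struct

# Fixed 19-byte header shared by all v4 binlog events:
# timestamp (4), event type (1), server_id (4),
# event_size (4), next event position (4), flags (2), little-endian.
HEADER_FORMAT = "<IBIIIH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 19

def parse_event_header(data: bytes) -> dict:
    """Decode the fixed header of a single binlog event."""
    timestamp, event_type, server_id, event_size, next_pos, flags = \
        struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    return {
        "timestamp": timestamp,
        "event_type": event_type,
        "server_id": server_id,
        "event_size": event_size,
        "next_pos": next_pos,
        "flags": flags,
    }
```

This only decodes the common header; the event body that follows is type-specific.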
Binlog Event details
Pos | Event_type    | Server_id | End_log_pos | Info
378 | Gtid          | 10122     | 420         | BEGIN GTID 0-11-10045
420 | Table_map     | 10122     | 465         | table_id: 18 (test.t4)
465 | Write_rows_v1 | 10122     | 503         | table_id: 18 flags: STMT_END_F
503 | Xid           | 10122     | 534         | COMMIT /* xid=823 */
Transaction -- TRX1
BEGIN;
INSERT INTO test.t4 VALUES (101);
COMMIT;
Binlog Events Receiving
[Diagram: MariaDB Master Server → replication protocol → MaxScale Binlog Server → mysql-bin.01045]
● MariaDB replication slave registration allows
MaxScale to receive binlog events from the master
● Binlog events are stored in binlog files, the
same way the master server stores them
Row-based replication with a full row image is required on the master:
set global binlog_format='ROW';
set global binlog_row_image='full';
Binlog to Avro Conversion
The Avrorouter Module
Apache Avro™
● A data serialization format
○ Consists of a file header and one or more data blocks
● Specifies an Object Container file format
● Efficient storage of high volume data
○ Schema always stored with data
○ Compact integer representation
○ Supports compression
● Easy to process in parallel due to how the data blocks are stored
● Tooling for Avro is readily available
○ Easy to extract and load into other systems
Source: http://avro.apache.org/
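The "compact integer representation" above is Avro's zig-zag variable-length integer encoding, in which small magnitudes (positive or negative) take few bytes. A stdlib-only sketch:

```python
def encode_long(n: int) -> bytes:
    """Encode an integer with Avro's zig-zag + variable-length scheme."""
    # Zig-zag maps signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    zz = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = zz & 0x7F      # low 7 bits
        zz >>= 7
        if zz:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)
            return bytes(out)
```

For example, `encode_long(1)` yields a single byte `0x02`, so the integer columns in a change record cost one byte each until values grow large.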
Avro file conversion
[Diagram: mysql-bin.01045 → AVRO converter → AVRO_file_001, AVRO_file_002 → data warehouse platforms]
● Binlog files are converted to Avro file containers
○ one per database table
● On schema changes a new file sequence is created
● Tunable flow of events
Avro Schema
{
"type": "record",
"namespace": "MaxScaleChangeDataSchema.avro",
"name": "ChangeRecord",
"fields": ...
}
• Defines how the data is stored
• Contains some static fields
• MaxScale records are always named ChangeRecord in the
MaxScaleChangeDataSchema.avro namespace
Avro Schema - Fields
"fields": [
{ "name": "domain", "type": "int" }, { "name": "server_id", "type": "int" },
{ "name": "sequence", "type": "int" }, { "name": "event_number", "type": "int" },
{ "name": "timestamp", "type": "int" },
{ "name": "event_type", "type": { "type": "enum", "name": "EVENT_TYPES",
"symbols": [ "insert", "update_before", "update_after", "delete" ] } },
… More fields …
]
• MaxScale adds six default fields
○ Three GTID components (domain, server_id, sequence)
○ Event index inside the transaction
○ Event timestamp
○ Type of the captured event
• "fields" is a list of field definitions
constructed from standard Avro data types
Avro Schema - Fields
"fields": [
{ "name": "domain", "type": "int" }, { "name": "server_id", "type": "int" },
{ "name": "sequence", "type": "int" }, { "name": "event_number", "type": "int" },
{ "name": "timestamp", "type": "int" },
{ "name": "event_type", "type": {
"type": "enum",
"name": "EVENT_TYPES",
"symbols": [ "insert", "update_before", "update_after", "delete" ]
}
},
{ "name": "id", "type": "int", "real_type": "int", "length": -1 },
{ "name": "data", "type": "string", "real_type": "varchar", "length": 255 }
]
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
Avro schema file db1.tbl1.000001.avsc
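The layout above — six fixed metadata fields followed by one field per table column — can be sketched in Python. `make_change_record_schema` is a hypothetical helper for illustration, not MaxScale's API:

```python
import json

# The six metadata fields MaxScale prepends to every ChangeRecord schema.
DEFAULT_FIELDS = [
    {"name": "domain", "type": "int"},
    {"name": "server_id", "type": "int"},
    {"name": "sequence", "type": "int"},
    {"name": "event_number", "type": "int"},
    {"name": "timestamp", "type": "int"},
    {"name": "event_type", "type": {
        "type": "enum", "name": "EVENT_TYPES",
        "symbols": ["insert", "update_before", "update_after", "delete"]}},
]

def make_change_record_schema(columns):
    """Build a ChangeRecord schema: metadata fields, then table columns.

    `columns` is a list of (name, avro_type, real_type, length) tuples
    (a hypothetical input shape chosen for this sketch).
    """
    fields = list(DEFAULT_FIELDS)
    for name, avro_type, real_type, length in columns:
        fields.append({"name": name, "type": avro_type,
                       "real_type": real_type, "length": length})
    return {
        "type": "record",
        "namespace": "MaxScaleChangeDataSchema.avro",
        "name": "ChangeRecord",
        "fields": fields,
    }

# The t1 example from the slide: INT primary key plus VARCHAR(255).
schema = make_change_record_schema([
    ("id", "int", "int", -1),
    ("data", "string", "varchar", 255),
])
```

Serializing `schema` with `json.dumps` gives the kind of `.avsc` content shown above.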
Data Streaming
The CDC Protocol
Data Streaming in MaxScale
• Provides real-time transactional data to a data lake for analytics
• Captures changed data from binary log events
• Streams changes from MariaDB to CDC clients in real time
CDC Protocol
● Register as change data client
● Receive change data records
● Query last GTID
● Query change data record statistics
● One client receives an event stream for one table
CDC Client
● Simple Python 3 command line client for the CDC protocol
● Continuous stream consumer
○ A building block for more complex systems
○ Outputs newline delimited JSON or raw Avro data
● Shipped as part of MaxScale 2.0
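Because the client emits newline-delimited JSON — a schema record first, then change events, and a fresh schema record whenever the schema changes — a consumer can split the stream with the stdlib alone. A minimal sketch, assuming schema records are distinguished by their "fields" key:

```python
import json

def consume_stream(lines):
    """Split a CDC client stream into the active schema and data events.

    Schema records are full Avro record definitions (they carry a
    "fields" key); everything else is treated as a change event.
    """
    schema = None
    events = []
    for line in lines:
        record = json.loads(line)
        if "fields" in record:   # initial schema, or an updated one
            schema = record
        else:                    # a change event
            events.append(record)
    return schema, events
```

In practice `lines` would be the stdout of cdc.py; here any iterable of JSON strings works.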
CDC Client - Example Output
[alex@localhost ~]$ cdc.py --user maxuser --password maxpwd db1.tbl1
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord",
"fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"},
{"name": "sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name":
"timestamp", "type": "int"}, {"name": "event_type", "type": {"type": "enum", "name":
"EVENT_TYPES", "symbols": ["insert", "update_before", "update_after", "delete"]}},
{"name": "id", "type": "int", "real_type": "int", "length": -1},
{"name": "data", "type": "string", "real_type": "varchar", "length": 255}]}
• Schema is sent first
• Events come after the schema
• New schema sent if the schema changes
CDC Client - Example Output
{"sequence": 2, "server_id": 3000, "data": "Hello", "event_type": "insert", "id": 1, "domain": 0, "timestamp": 1490878875,
"event_number": 1}
{"sequence": 3, "server_id": 3000, "data": "world!", "event_type": "insert", "id": 2, "domain": 0, "timestamp": 1490878880,
"event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Hello", "event_type": "update_before", "id": 1, "domain": 0, "timestamp": 1490878914,
"event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Greetings", "event_type": "update_after", "id": 1, "domain": 0, "timestamp":
1490878914, "event_number": 2}
{"sequence": 5, "server_id": 3000, "data": "world!", "event_type": "delete", "id": 2, "domain": 0, "timestamp": 1490878929,
"event_number": 1}
INSERT INTO t1 (data) VALUES ("Hello"); -- TRX1
INSERT INTO t1 (data) VALUES ("world!"); -- TRX2
UPDATE t1 SET data = "Greetings" WHERE id = 1; -- TRX3
DELETE FROM t1 WHERE id = 2; -- TRX4
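The event stream above can be replayed to materialize the current table state: inserts and update_after images upsert a row, deletes remove it, and update_before images carry only the old row. A minimal sketch, assuming `id` is the primary key:

```python
# The six metadata fields MaxScale adds to every change event.
METADATA = {"domain", "server_id", "sequence",
            "event_number", "timestamp", "event_type"}

def apply_events(events, key="id"):
    """Replay CDC change events into a dict of rows keyed by primary key."""
    rows = {}
    for ev in events:
        etype = ev["event_type"]
        if etype in ("insert", "update_after"):
            # Keep only the table columns, dropping the metadata fields.
            rows[ev[key]] = {k: v for k, v in ev.items()
                             if k not in METADATA}
        elif etype == "delete":
            rows.pop(ev[key], None)
        # "update_before" is the old row image; nothing to apply
    return rows
```

Replaying the five events shown above leaves exactly one row: id 1 with data "Greetings".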
CDC Client - Example Output
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord",
"fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"}, {"name":
"sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name": "timestamp",
"type": "int"}, {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES",
"symbols": ["insert", "update_before", "update_after", "delete"]}}, {"name": "id", "type": "int",
"real_type": "int", "length": -1}, {"name": "data", "type": "string", "real_type": "varchar", "length":
255}, {"name": "account_balance", "type": "float", "real_type": "float", "length": -1}]}
{"domain": 0, "server_id": 3000, "sequence": 7, "event_number": 1, "timestamp": 1496682140,
"event_type": "insert", "id": 3, "data": "New Schema", "account_balance": 25.0}
ALTER TABLE t1 ADD COLUMN account_balance FLOAT;
INSERT INTO t1 (data, account_balance) VALUES ("New Schema", 25.0);
Kafka Producer
The CDC Kafka Producer
Why Kafka?
[vagrant@maxscale ~]$ ./bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic MyTopic
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "fields": [{"type": "int", "name": "domain"}, {"type": "int", "name":
"server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"},
{"type": "int", "name": "timestamp"},
{"type": {"symbols": ["insert", "update_before", "update_after", "delete"], "type": "enum", "name": "EVENT_TYPES"}, "name": "event_type"},
{"type": "int", "name": "id", "real_type": "int", "length": -1}], "name": "ChangeRecord"}
{"domain": 0, "event_number": 1, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}
{"domain": 0, "event_number": 2, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}
● Isolation of producers and consumers
○ Data can be produced and consumed at any
time
● Good for intermediate storage of streams
○ Data is stored until it is processed
○ Distributed storage makes data persistent
● Widely supported for real-time analytics
○ Druid
○ Apache Storm
● Tooling for Kafka already exists
CDC Kafka Producer
● A Proof-of-Concept Kafka Producer
● Reads JSON generated by the MaxScale CDC Client
● Publishes JSON records to a Kafka cluster
● Simple usage
cdc.py -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1 |
cdc_kafka_producer.py --kafka-broker=127.0.0.1:9092 --kafka-topic=MyTopic
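The pipe stage can be sketched library-agnostically: read newline-delimited JSON records and hand each one to a producer. The `send` callable stands in for a real Kafka client call (e.g. kafka-python's `KafkaProducer.send` — an assumption; the slides do not name the library):

```python
def relay(lines, send, topic):
    """Forward newline-delimited JSON records from a CDC stream to Kafka.

    `send(topic, payload)` is a stand-in for a real producer call;
    blank lines are skipped. Returns the number of records forwarded.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        send(topic, line.encode("utf-8"))
        count += 1
    return count

# In the real pipeline `lines` would be sys.stdin, fed by cdc.py.
```

Keeping the producer behind a callable makes the relay testable without a running Kafka cluster.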
From MaxScale to Kafka
[Diagram: Binlog Server → CDC Client (Change Data Listener Protocol) → CDC Consumer/Kafka Producer → Kafka]
Everything Together
[Diagram: MariaDB Master → mysql-bin.01045 → AVRO converter → AVRO_file_001, AVRO_file_002 → Change Data Capture Listener (AVRO streaming) → CDC clients]
MaxScale for Streaming Changes
The MaxScale solution provides:
● Easy replication setup from a MariaDB database
● Integrated and configurable Avro file conversion
● Easy data streaming to compatible solutions
● Ready-to-use Python scripts
Thank you

 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Recently uploaded (20)

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Streaming Operational Data with MariaDB MaxScale

Apache Avro™
● Specifies an Object Container file format
● Efficient storage of high volume data
○ Schema always stored with data
○ Compact integer representation
○ Supports compression
● Easy to process in parallel due to how the data blocks are stored
● Tooling for Avro is readily available
○ Easy to extract and load into other systems
Source: http://avro.apache.org/
Avro File Conversion
(Diagram: mysql-bin.01045 → Avro converter → AVRO_file_001, AVRO_file_002 → data warehouse platforms)
● Binlog files are converted to Avro file containers
○ One per database table
● On schema changes a new file sequence is created
● Tunable flow of events
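The per-table file sequence can be sketched as a small naming helper. The pattern below is inferred from the db1.tbl1.000001.avsc schema-file name shown later in the deck, so treat it as an assumption rather than the converter's exact convention:

```python
def avro_file_name(database: str, table: str, schema_version: int,
                   suffix: str = "avro") -> str:
    """Build a per-table Avro file name.

    The zero-padded sequence number is bumped whenever the table's schema
    changes, which starts a new file sequence (pattern inferred from the
    db1.tbl1.000001.avsc example; verify against your MaxScale setup).
    """
    return f"{database}.{table}.{schema_version:06d}.{suffix}"

print(avro_file_name("db1", "tbl1", 1))          # data file, schema version 1
print(avro_file_name("db1", "tbl1", 2, "avsc"))  # schema file after an ALTER
```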
Avro Schema
{
  "type": "record",
  "namespace": "MaxScaleChangeDataSchema.avro",
  "name": "ChangeRecord",
  "fields": ...
}
● Defines how the data is stored
● Contains some static fields
● MaxScale records are always named ChangeRecord in the MaxScaleChangeDataSchema.avro namespace
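The schema skeleton above can be assembled programmatically. The helper below is an illustrative sketch, not MaxScale's actual code: it appends per-column fields after the six static metadata fields the deck describes next:

```python
import json

def make_change_record_schema(table_fields):
    """Build a ChangeRecord schema: static metadata fields plus table columns."""
    static_fields = [
        {"name": "domain", "type": "int"},
        {"name": "server_id", "type": "int"},
        {"name": "sequence", "type": "int"},
        {"name": "event_number", "type": "int"},
        {"name": "timestamp", "type": "int"},
        {"name": "event_type", "type": {
            "type": "enum", "name": "EVENT_TYPES",
            "symbols": ["insert", "update_before", "update_after", "delete"]}},
    ]
    return {
        "type": "record",
        "namespace": "MaxScaleChangeDataSchema.avro",
        "name": "ChangeRecord",
        "fields": static_fields + list(table_fields),
    }

schema = make_change_record_schema(
    [{"name": "id", "type": "int", "real_type": "int", "length": -1}])
print(json.dumps(schema, indent=1))
```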
Avro Schema - Fields
"fields": [
  { "name": "domain", "type": "int" },
  { "name": "server_id", "type": "int" },
  { "name": "sequence", "type": "int" },
  { "name": "event_number", "type": "int" },
  { "name": "timestamp", "type": "int" },
  { "name": "event_type", "type": {
      "type": "enum", "name": "EVENT_TYPES",
      "symbols": [ "insert", "update_before", "update_after", "delete" ] } },
  … More fields …
]
● A list of field information
● Constructed from standard Avro data types
● MaxScale adds six default fields
○ Three GTID components
○ Event index inside transaction
○ Event timestamp
○ Type of captured event
Avro Schema - Fields
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
Avro schema file db1.tbl1.000001.avsc:
"fields": [
  { "name": "domain", "type": "int" },
  { "name": "server_id", "type": "int" },
  { "name": "sequence", "type": "int" },
  { "name": "event_number", "type": "int" },
  { "name": "timestamp", "type": "int" },
  { "name": "event_type", "type": {
      "type": "enum", "name": "EVENT_TYPES",
      "symbols": [ "insert", "update_before", "update_after", "delete" ] } },
  { "name": "id", "type": "int", "real_type": "int", "length": -1 },
  { "name": "data", "type": "string", "real_type": "varchar", "length": 255 }
]
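The example above implies a mapping from SQL column types to Avro types, with "real_type" and "length" preserving the original SQL definition. A rough sketch of that mapping; SQL_TO_AVRO and column_to_avro_field are hypothetical names, and the real converter supports far more types:

```python
# Illustrative subset of a SQL-type to Avro-type mapping (assumption,
# not the converter's actual table).
SQL_TO_AVRO = {
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "varchar": "string",
    "text": "string",
}

def column_to_avro_field(name, sql_type, length=-1):
    """Build one extended Avro field entry for a table column."""
    return {
        "name": name,
        "type": SQL_TO_AVRO[sql_type],
        "real_type": sql_type,   # original SQL type, kept alongside
        "length": length,        # -1 when the type has no length
    }

# CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255));
fields = [column_to_avro_field("id", "int"),
          column_to_avro_field("data", "varchar", 255)]
print(fields)
```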
Data Streaming in MaxScale
● Provide real-time transactional data to a data lake for analytics
● Capture changed data from the binary log events
● From MariaDB to CDC clients in real time
CDC Protocol
● Register as a change data client
● Receive change data records
● Query last GTID
● Query change data record statistics
● One client receives an event stream for one table
CDC Client
● Simple Python 3 command line client for the CDC protocol
● Continuous stream consumer
○ A building block for more complex systems
○ Outputs newline-delimited JSON or raw Avro data
● Shipped as a part of MaxScale 2.0
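A minimal sketch of the client side of the handshake, based on the CDC protocol operations listed above (register, then request a table's stream). The literal message formats below follow the MaxScale CDC protocol documentation as I understand it; treat them as assumptions and verify against your MaxScale version:

```python
import hashlib

def auth_message(user: str, password: str) -> bytes:
    """Hex-encoded '<user>:<SHA1(password)>' sent right after connecting
    (format assumed from the CDC protocol docs)."""
    digest = hashlib.sha1(password.encode()).digest()
    return (user.encode() + b":" + digest).hex().encode()

def register_message(uuid: str, fmt: str = "JSON") -> bytes:
    """Register as a change data client, choosing JSON or AVRO output."""
    return f"REGISTER UUID={uuid}, TYPE={fmt}".encode()

def request_data_message(database: str, table: str) -> bytes:
    """Ask for the continuous event stream of one table."""
    return f"REQUEST-DATA {database}.{table}".encode()

print(register_message("XXX-YYY-ZZZ"))
print(request_data_message("db1", "tbl1"))
```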
CDC Client - Example Output
[alex@localhost ~]$ cdc.py --user maxuser --password maxpwd db1.tbl1
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord", "fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"}, {"name": "sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name": "timestamp", "type": "int"}, {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES", "symbols": ["insert", "update_before", "update_after", "delete"]}}, {"name": "id", "type": "int", "real_type": "int", "length": -1}, {"name": "data", "type": "string", "real_type": "varchar", "length": 255}]}
● Schema is sent first
● Events come after the schema
● A new schema is sent if the schema changes
CDC Client - Example Output
INSERT INTO t1 (data) VALUES ("Hello"); -- TRX1
INSERT INTO t1 (data) VALUES ("world!"); -- TRX2
UPDATE t1 SET data = "Greetings" WHERE id = 1; -- TRX3
DELETE FROM t1 WHERE id = 2; -- TRX4
{"sequence": 2, "server_id": 3000, "data": "Hello", "event_type": "insert", "id": 1, "domain": 0, "timestamp": 1490878875, "event_number": 1}
{"sequence": 3, "server_id": 3000, "data": "world!", "event_type": "insert", "id": 2, "domain": 0, "timestamp": 1490878880, "event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Hello", "event_type": "update_before", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Greetings", "event_type": "update_after", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 2}
{"sequence": 5, "server_id": 3000, "data": "world!", "event_type": "delete", "id": 2, "domain": 0, "timestamp": 1490878929, "event_number": 1}
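Because each record carries its GTID components (domain, server_id, sequence) and an event_number that orders events inside the transaction, a consumer can rebuild transaction boundaries from the flat stream. A small sketch using the events above:

```python
import json
from itertools import groupby

# A slice of the newline-delimited JSON stream from the example output.
stream = """\
{"sequence": 4, "server_id": 3000, "data": "Hello", "event_type": "update_before", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 1}
{"sequence": 4, "server_id": 3000, "data": "Greetings", "event_type": "update_after", "id": 1, "domain": 0, "timestamp": 1490878914, "event_number": 2}
{"sequence": 5, "server_id": 3000, "data": "world!", "event_type": "delete", "id": 2, "domain": 0, "timestamp": 1490878929, "event_number": 1}
"""

def gtid(event):
    """The GTID triple identifying the originating transaction."""
    return (event["domain"], event["server_id"], event["sequence"])

events = [json.loads(line) for line in stream.splitlines()]
# Consecutive events with the same GTID belong to the same transaction.
transactions = {f"{d}-{s}-{q}": [e["event_type"] for e in grp]
                for (d, s, q), grp in groupby(events, key=gtid)}
print(transactions)
# The UPDATE arrives as an update_before/update_after pair in one transaction.
```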
CDC Client - Example Output
ALTER TABLE t1 ADD COLUMN account_balance FLOAT;
INSERT INTO t1 (data, account_balance) VALUES ("New Schema", 25.0);
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord", "fields": [{"name": "domain", "type": "int"}, {"name": "server_id", "type": "int"}, {"name": "sequence", "type": "int"}, {"name": "event_number", "type": "int"}, {"name": "timestamp", "type": "int"}, {"name": "event_type", "type": {"type": "enum", "name": "EVENT_TYPES", "symbols": ["insert", "update_before", "update_after", "delete"]}}, {"name": "id", "type": "int", "real_type": "int", "length": -1}, {"name": "data", "type": "string", "real_type": "varchar", "length": 255}, {"name": "account_balance", "type": "float", "real_type": "float", "length": -1}]}
{"domain": 0, "server_id": 3000, "sequence": 7, "event_number": 1, "timestamp": 1496682140, "event_type": "insert", "id": 3, "data": "New Schema", "account_balance": 25.0}
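A consumer has to cope with a new schema appearing mid-stream after an ALTER TABLE. One simple sketch, assuming the JSON output format shown above: treat any message that looks like an Avro record definition as the new active schema, and everything else as a data event:

```python
import json

def split_stream(lines):
    """Separate schema definitions from data events in a JSON CDC stream."""
    schema, records = None, []
    for line in lines:
        msg = json.loads(line)
        if msg.get("type") == "record" and "fields" in msg:
            schema = msg          # new active schema (e.g. after ALTER TABLE)
        else:
            records.append(msg)   # plain change event
    return schema, records

lines = [
    '{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "name": "ChangeRecord", "fields": [{"name": "domain", "type": "int"}]}',
    '{"domain": 0, "server_id": 3000, "sequence": 7, "event_number": 1, "timestamp": 1496682140, "event_type": "insert", "id": 3}',
]
schema, records = split_stream(lines)
print(schema["name"], len(records))
```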
Kafka Producer
The CDC Kafka Producer
Why Kafka?
● Isolation of producers and consumers
○ Data can be produced and consumed at any time
● Good for intermediate storage of streams
○ Data is stored until it is processed
○ Distributed storage makes data persistent
● Widely supported for real time analytics
○ Druid
○ Apache Storm
● Tooling for Kafka already exists
[vagrant@maxscale ~]$ ./bin/kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic MyTopic
{"namespace": "MaxScaleChangeDataSchema.avro", "type": "record", "fields": [{"type": "int", "name": "domain"}, {"type": "int", "name": "server_id"}, {"type": "int", "name": "sequence"}, {"type": "int", "name": "event_number"}, {"type": "int", "name": "timestamp"}, {"type": {"symbols": ["insert", "update_before", "update_after", "delete"], "type": "enum", "name": "EVENT_TYPES"}, "name": "event_type"}, {"type": "int", "name": "id", "real_type": "int", "length": -1}], "name": "ChangeRecord"}
{"domain": 0, "event_number": 1, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 1}
{"domain": 0, "event_number": 2, "event_type": "insert", "server_id": 1, "sequence": 58, "timestamp": 1470670824, "id": 2}
CDC Kafka Producer
● A proof-of-concept Kafka producer
● Reads JSON generated by the MaxScale CDC Client
● Publishes JSON records to a Kafka cluster
● Simple usage:
cdc.py -u maxuser -pmaxpwd -h 127.0.0.1 -P 4001 test.t1 | cdc_kafka_producer.py --kafka-broker=127.0.0.1:9092 --kafka-topic=MyTopic
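The producer's core loop can be sketched as follows. The publish callback stands in for a real Kafka client call (for instance kafka-python's KafkaProducer.send); that choice of library, and the loop itself, are assumptions about how such a proof of concept could be built, not the script's actual code:

```python
import json

def stream_to_kafka(lines, publish):
    """Forward newline-delimited JSON records to a publish callback.

    `lines` would normally be sys.stdin (the output of cdc.py);
    `publish` would wrap e.g. KafkaProducer.send(topic, value).
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank keep-alive lines
        record = json.loads(line)         # validate the JSON before sending
        publish(json.dumps(record).encode())
        count += 1
    return count

# Stand-in for a Kafka send: collect published messages into a list.
sent = []
n = stream_to_kafka(['{"id": 1, "event_type": "insert"}', ""], sent.append)
print(n, sent)
```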
From MaxScale to Kafka
(Diagram: MaxScale Change Data Listener Protocol → CDC Client → CDC Consumer/Kafka Producer → Kafka)
Everything Together
(Diagram: MariaDB Master → Binlog Server (mysql-bin.01045) → Avro converter (AVRO_file_001, AVRO_file_002) → Change Data Capture Listener → Avro streaming → CDC clients)
MaxScale for Streaming Changes
The MaxScale solution provides:
● Easy replication setup from a MariaDB database
● Integrated and configurable Avro file conversion
● Easy data streaming to compatible solutions
● Ready-to-use Python scripts