Real-time Analytics with the New Streaming Data Adapters
Dipti Joshi, Director of Product Management
Markus Mäkelä, Senior Software Engineer
Streamline and simplify the process of data ingestion
Motivation
Organizations need to make data available for analysis as soon as it arrives
Machine learning results need to be stored where other business/data analysts work with them
Time to insight and time to action are now competitive differentiators for businesses
Bulk data adapters
Applications can use the bulk data adapter SDK to collect and write data - on-demand data loading
No need to copy CSV files to the UM or PM - simpler
Bypass the SQL interface, parser and optimizer - faster writes
Supported SDK languages: C++, Python, Java
[Diagram: an application uses the Bulk Data Adapter to write through the Write API directly to the ColumnStore PMs, bypassing the MariaDB Server / ColumnStore UM]
1. For each row:
   a. For each column: bulkInsert->setColumn
   b. bulkInsert->writeRow
2. bulkInsert->commit
* Rows are buffered before being flushed - 100,000 rows by default
(a minimal C++ sketch of this loop follows)
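The sketch below uses the ColumnStore Bulk Write SDK (mcsapi); the test.t1 table (one INT column, matching the later CDC example) and the inserted values are assumptions for illustration:

#include <libmcsapi/mcsapi.h>
#include <iostream>

int main()
{
    try {
        // The driver locates the ColumnStore PMs from the cluster
        // configuration (Columnstore.xml).
        mcsapi::ColumnStoreDriver driver;

        // Open a bulk insert into the hypothetical table test.t1;
        // rows are buffered and flushed in batches.
        mcsapi::ColumnStoreBulkInsert* bulkInsert =
            driver.createBulkInsert("test", "t1", 0, 0);

        for (int i = 0; i < 1000; i++) {  // 1. for each row
            bulkInsert->setColumn(0, i);  //    a. for each column
            bulkInsert->writeRow();       //    b. queue the row
        }
        bulkInsert->commit();             // 2. make the batch visible
        delete bulkInsert;
    } catch (mcsapi::ColumnStoreError& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}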
Streaming data adapters – MaxScale CDC
Stream all writes from MariaDB TX to MariaDB AX automatically and continuously - ensure analytical data is up to date and not stale, with no need for batch jobs, manual processes or human intervention
[Diagram: writes to MariaDB Server (InnoDB) flow through MariaDB MaxScale, whose Binlog-Avro CDC Router feeds the Streaming Data Adapter (MaxScale CDC Client); the adapter writes through the Write API to the ColumnStore PMs behind the MariaDB Server / ColumnStore UM]
Inside MaxScale CDC Adapter
● Connects to MaxScale via the MaxScale CDC Connector
● Connects to ColumnStore via the ColumnStore API
● Set of CDC records → CS API mini-batch (see the sketch after this list)
● CDC Record
○ Timestamp
○ GTID
○ Type (insert, update, delete)
○ Changed data
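A minimal sketch of the MaxScale side of the adapter, using the C++ MaxScale CDC Connector; the address, port and credentials are assumptions, and test.t1 matches the example on the next slide. The ColumnStore mini-batch write is elided:

#include <maxscale/cdc_connector.h>
#include <iostream>

int main()
{
    // Connect to the MaxScale CDC service; host, port and
    // credentials are illustrative.
    CDC::Connection conn("127.0.0.1", 4001, "cdcuser", "cdcpassword");

    // Request the change stream for one table.
    if (conn.connect("test.t1")) {
        CDC::SRow row;
        while ((row = conn.read())) {
            // Each row is one CDC record (insert / update_before /
            // update_after / delete); the adapter converts a set of
            // these into a ColumnStore API mini-batch.
            for (size_t i = 0; i < row->length(); i++) {
                std::cout << row->value(i) << " ";
            }
            std::cout << std::endl;
        }
    } else {
        std::cerr << "Connection failed: " << conn.error() << std::endl;
        return 1;
    }
    return 0;
}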
Inside MaxScale CDC Adapter
CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1);
UPDATE test.t1 SET id = 2 WHERE id = 1;
DELETE FROM test.t1 WHERE id = 2;

These statements produce the following CDC records:

{"domain": 0, "server_id": 3000, "sequence": 19, "event_number": 1, "timestamp": 1519225339, "event_type": "insert", "id": 1}
{"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 1, "timestamp": 1519225349, "event_type": "update_before", "id": 1}
{"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 2, "timestamp": 1519225349, "event_type": "update_after", "id": 2}
{"domain": 0, "server_id": 3000, "sequence": 21, "event_number": 1, "timestamp": 1519225356, "event_type": "delete", "id": 2}

Streamed into ColumnStore, the full change history can then be queried:

MariaDB [test]> select * from t1 order by sequence;
+--------+--------------+---------------+------+----------+-----------+------------+
| domain | event_number | event_type | id | sequence | server_id | timestamp |
+--------+--------------+---------------+------+----------+-----------+------------+
| 0 | 1 | insert | 1 | 19 | 3000 | 1519225339 |
| 0 | 1 | update_before | 1 | 20 | 3000 | 1519225349 |
| 0 | 2 | update_after | 2 | 20 | 3000 | 1519225349 |
| 0 | 1 | delete | 2 | 21 | 3000 | 1519225356 |
+--------+--------------+---------------+------+----------+-----------+------------+
Streaming data adapters – Apache Kafka
Stream all messages published to Apache Kafka topics to MariaDB AX automatically and continuously - enable data from many sources to be streamed and collected for analysis without complex code
[Diagram: messages published to Apache Kafka topics are consumed by the Streaming Data Adapter (Kafka Client), which writes through the Write API to the ColumnStore PMs behind the MariaDB Server / ColumnStore UM]
Inside Apache Kafka Adapter
● Connects to Kafka
● Reads Avro-formatted data
○ Confluent KafkaAvroSerializer: https://docs.confluent.io/current/streams/developer-guide/write-streams.html
● Each topic is a stream
● Streams map to tables (see the sketch after this list)
○ One stream to multiple tables
○ Multiple streams to a single table
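A minimal sketch of the Kafka side, using the librdkafka C++ consumer API; the broker address, consumer group and topic name are assumptions, and the Avro decoding plus the ColumnStore mini-batch write are elided:

#include <librdkafka/rdkafkacpp.h>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string errstr;

    // Consumer configuration; broker address and group id are illustrative.
    RdKafka::Conf* conf = RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL);
    conf->set("bootstrap.servers", "localhost:9092", errstr);
    conf->set("group.id", "columnstore-adapter", errstr);

    RdKafka::KafkaConsumer* consumer =
        RdKafka::KafkaConsumer::create(conf, errstr);
    delete conf;
    if (!consumer) {
        std::cerr << "Failed to create consumer: " << errstr << std::endl;
        return 1;
    }

    // Each topic is a stream; here one hypothetical topic maps to one table.
    consumer->subscribe({"sensor-readings"});

    for (int polls = 0; polls < 100; polls++) {  // bounded for the sketch
        RdKafka::Message* msg = consumer->consume(1000 /* ms timeout */);
        if (msg->err() == RdKafka::ERR_NO_ERROR) {
            // msg->payload() holds the Avro-encoded record; the adapter
            // decodes it and appends the row to a ColumnStore mini-batch.
            std::cout << "Received " << msg->len() << " bytes" << std::endl;
        }
        delete msg;
    }

    consumer->close();
    delete consumer;
    return 0;
}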
Demo: Kafka Data Adapter
The big picture – putting it all together
[Diagram: Ingestion - Apache Kafka feeds the Streaming Data Adapters, while Data Services and Spark / Python / ML use the Bulk Data Adapters; Operations - Web/Mobile Services reach MariaDB Server (InnoDB) through MariaDB MaxScale for transactions (OLTP); Analytics - MariaDB MaxScale fronts MariaDB ColumnStore for analytics (OLAP)]
Resources

Documentation: https://mariadb.com/kb/en/library/mariadb-columnstore/
Blogs: https://mariadb.com/blog-tags/columnstore and https://mariadb.com/blog-tags/big-data
Reach me: dipti.joshi@mariadb.com

Downloads:
MariaDB AX: https://mariadb.com/mariadb-ax-download
MariaDB ColumnStore 1.1: https://mariadb.com/downloads/mariadb-ax
MariaDB MaxScale: https://mariadb.com/downloads/mariadb-ax/maxscale
Bulk Data Adapters and Streaming Data Adapters: https://mariadb.com/downloads/mariadb-ax/data-adapters
MariaDB ColumnStore Backup/Restore Tool: https://mariadb.com/downloads/mariadb-ax/tools-ax
Thank you!
