
M|18 Real-time Analytics with the New Streaming Data Adapters


  1. Real-time Analytics with the New Streaming Data Adapters. Dipti Joshi, Director of Product Management; Markus Mäkelä, Senior Software Engineer
  2. Streamline and simplify the process of data ingestion
  3. Motivation: Organizations need to make data available for analysis as soon as it arrives. Machine learning results need to be stored where business and data analysts work with them. Time to insight and time to action are now competitive differentiators for businesses.
  4. Bulk data adapters: Applications can use the bulk data adapter SDKs (C++, Python, Java) to collect and write data on demand. There is no need to copy CSV files to the UM or PM (simpler), and writes bypass the SQL interface, parser and optimizer (faster). [Diagram: an application uses a bulk data adapter to write through the Write API directly to the ColumnStore PMs, bypassing the MariaDB Server / ColumnStore UM.] Basic call pattern:
     1. For each row:
        a. For each column: bulkInsert->setColumn
        b. bulkInsert->writeRow
     2. bulkInsert->commit
     (Rows are buffered before flushing; 100,000 rows by default.)
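The call pattern above can be sketched as a minimal Python mock. The real SDKs are MariaDB's C++/Python/Java bulk write libraries; the BulkInsert class below is a hypothetical stand-in written only to show the setColumn/writeRow/commit flow and the row buffering, not the actual SDK API.

```python
# Illustrative mock of the bulk data adapter call pattern (NOT the real SDK).
class BulkInsert:
    def __init__(self, flush_size=100_000):
        self.flush_size = flush_size  # rows buffered before a flush (100,000 by default)
        self.row = []                 # columns of the row being assembled
        self.buffer = []              # completed rows awaiting a flush
        self.flushed = 0              # rows written out so far

    def set_column(self, value):
        self.row.append(value)

    def write_row(self):
        self.buffer.append(tuple(self.row))
        self.row = []
        if len(self.buffer) >= self.flush_size:
            self._flush()

    def _flush(self):
        # In the real adapter this sends the batch through the Write API
        # directly to the ColumnStore PMs.
        self.flushed += len(self.buffer)
        self.buffer = []

    def commit(self):
        self._flush()

# 1. For each row: set each column, then write the row. 2. Commit.
bulk = BulkInsert(flush_size=3)  # tiny buffer so the flush is visible
for i in range(5):
    bulk.set_column(i)
    bulk.set_column(i * 10)
    bulk.write_row()
bulk.commit()
print(bulk.flushed)  # 5
```

With a buffer of 3, the fourth write_row triggers an automatic flush of the first three rows and commit flushes the remaining two, which is the same pattern the adapter uses at its 100,000-row default.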
  5. Streaming data adapters – MaxScale CDC: Stream all writes from MariaDB TX to MariaDB AX automatically and continuously, ensuring analytical data is up to date rather than stale, with no need for batch jobs, manual processes or human intervention. [Diagram: MariaDB Server (InnoDB) feeds binlog events to the MariaDB MaxScale binlog-to-Avro CDC router; the streaming data adapter (MaxScale CDC client) writes through the Write API to the ColumnStore PMs.]
  6. Inside the MaxScale CDC Adapter
     ● Connects to MaxScale via the MaxScale CDC Connector
     ● Connects to ColumnStore via the ColumnStore API
     ● A set of CDC records becomes a ColumnStore API mini-batch
     ● Each CDC record contains:
       ○ Timestamp
       ○ GTID
       ○ Type (write, delete, update)
       ○ The changed data
  7. Inside the MaxScale CDC Adapter. Statements executed on the source:

     CREATE TABLE test.t1 (id INT);
     INSERT INTO test.t1 VALUES (1);
     UPDATE test.t1 SET id = 2 WHERE id = 1;
     DELETE FROM test.t1 WHERE id = 2;

     The resulting CDC records:

     {"domain": 0, "server_id": 3000, "sequence": 19, "event_number": 1, "timestamp": 1519225339, "event_type": "insert", "id": 1}
     {"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 1, "timestamp": 1519225349, "event_type": "update_before", "id": 1}
     {"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 2, "timestamp": 1519225349, "event_type": "update_after", "id": 2}
     {"domain": 0, "server_id": 3000, "sequence": 21, "event_number": 1, "timestamp": 1519225356, "event_type": "delete", "id": 2}

     The same events queried from the replicated table:

     MariaDB [test]> select * from t1 order by sequence;
     +--------+--------------+---------------+------+----------+-----------+------------+
     | domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
     +--------+--------------+---------------+------+----------+-----------+------------+
     |      0 |            1 | insert        |    1 |       19 |      3000 | 1519225339 |
     |      0 |            1 | update_before |    1 |       20 |      3000 | 1519225349 |
     |      0 |            2 | update_after  |    2 |       20 |      3000 | 1519225349 |
     |      0 |            1 | delete        |    2 |       21 |      3000 | 1519225356 |
     +--------+--------------+---------------+------+----------+-----------+------------+
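To see how these records relate to row state, a consumer can fold the JSON events back into the current contents of test.t1. This is a stdlib-only sketch for illustration (the real adapter batches the records into ColumnStore rather than materializing state in memory):

```python
import json

# The four CDC records emitted for the INSERT / UPDATE / DELETE above.
events = [
    '{"domain": 0, "server_id": 3000, "sequence": 19, "event_number": 1, "timestamp": 1519225339, "event_type": "insert", "id": 1}',
    '{"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 1, "timestamp": 1519225349, "event_type": "update_before", "id": 1}',
    '{"domain": 0, "server_id": 3000, "sequence": 20, "event_number": 2, "timestamp": 1519225349, "event_type": "update_after", "id": 2}',
    '{"domain": 0, "server_id": 3000, "sequence": 21, "event_number": 1, "timestamp": 1519225356, "event_type": "delete", "id": 2}',
]

rows = set()  # current row ids in test.t1
for line in events:
    rec = json.loads(line)
    kind = rec["event_type"]
    if kind == "insert":
        rows.add(rec["id"])
    elif kind == "update_before":
        rows.discard(rec["id"])   # old image of the updated row goes away...
    elif kind == "update_after":
        rows.add(rec["id"])       # ...and the new image replaces it
    elif kind == "delete":
        rows.discard(rec["id"])

print(sorted(rows))  # [] — after the insert, update and delete, t1 is empty
```

Note how a single UPDATE produces two records (update_before and update_after) sharing sequence 20, distinguished by event_number.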
  8. Streaming data adapters – Apache Kafka: Stream all messages published to Apache Kafka topics to MariaDB AX automatically and continuously, enabling data from many sources to be streamed and collected for analysis without complex code. [Diagram: Apache Kafka topics feed the streaming data adapter (Kafka client), which writes through the Write API to the ColumnStore PMs.]
  9. Inside the Apache Kafka Adapter
     ● Connects to Kafka
     ● Reads Avro-formatted data
       ○ Confluent KafkaAvroSerializer: https://docs.confluent.io/current/streams/developer-guide/write-streams.html
     ● Each topic is a stream
     ● Streams map to tables
       ○ One stream to multiple tables
       ○ Multiple streams to a single table
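The stream-to-table mapping above can be sketched in a few lines of Python. The mapping dictionary, table names and record shapes below are illustrative assumptions, not the adapter's actual configuration format:

```python
# Hypothetical mapping: topics (streams) to destination tables.
topic_to_tables = {
    "clicks":  ["analytics.clicks"],
    "orders":  ["analytics.orders", "analytics.order_audit"],  # one stream -> two tables
    "returns": ["analytics.orders"],                           # two streams -> one table
}

def route(messages):
    """Group (topic, record) pairs into per-table mini-batches."""
    batches = {}
    for topic, record in messages:
        for table in topic_to_tables.get(topic, []):
            batches.setdefault(table, []).append(record)
    return batches

batches = route([
    ("clicks",  {"url": "/home"}),
    ("orders",  {"order_id": 1}),
    ("returns", {"order_id": 1, "reason": "damaged"}),
])
print(sorted(batches))  # ['analytics.clicks', 'analytics.order_audit', 'analytics.orders']
print(len(batches["analytics.orders"]))  # 2 (fed by both "orders" and "returns")
```

Each per-table batch would then be written through the ColumnStore Write API, mirroring the mini-batch pattern the CDC adapter uses.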
  10. Demo: Kafka Data Adapter
  11. The big picture – putting it all together
  12. [Diagram: the full pipeline. Operations (OLTP): web/mobile services go through MariaDB MaxScale to MariaDB Server with InnoDB. Ingestion: Apache Kafka and the streaming data adapters carry changes onward, while data services (Spark / Python / ML) load results via the bulk data adapters. Analytics (OLAP): everything lands in MariaDB ColumnStore, fronted by MariaDB MaxScale.]
  13. Resources
      Download:
      ● MariaDB AX: https://mariadb.com/mariadb-ax-download
      ● MariaDB ColumnStore 1.1: https://mariadb.com/downloads/mariadb-ax
      ● MariaDB MaxScale: https://mariadb.com/downloads/mariadb-ax/maxscale
      ● Bulk Data Adapters and Streaming Data Adapters: https://mariadb.com/downloads/mariadb-ax/data-adapters
      ● MariaDB ColumnStore Backup/Restore Tool: https://mariadb.com/downloads/mariadb-ax/tools-ax
      Documentation: https://mariadb.com/kb/en/library/mariadb-columnstore/
      Blogs: https://mariadb.com/blog-tags/columnstore and https://mariadb.com/blog-tags/big-data
      Reach me: dipti.joshi@mariadb.com
  14. Thank you!
