
Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services


Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol, Head of Systems Engineering, CEMEA, Confluent
https://www.meetup.com/Frankfurt-Apache-Kafka-Meetup-by-Confluent/events/269751169/


  1. Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services. Perry Krol, Head of Systems Engineering, CEMEA
  2. Welcome to the HUG meets Kafka UG Frankfurt Meetup! Zoom opens at 18:15. Agenda: 18:20 - 18:30 Virtual Cheers and Networking; 18:30 - 19:15 Setup Kafka with Terraform, Dominik Fries & Qais Babie; 19:15 - 20:00 Build a Bridge to Cloud with Apache Kafka® for Data Analytics Cloud Services, Perry Krol. Confluent Community Slack Channel: over 10,000 Kafkateers are collaborating every single day! cnfl.io/community-slack. Subscribe to the Confluent blog for frequent updates from key names in Apache Kafka® on best practices, product updates & more: cnfl.io/read
  3. Apache Kafka® Fundamentals
  4. The Truth is in the Log: a Kafka topic is a log of events, each event a key-value (K, V) record appended in order.
  5. Partitions
  6. Partitions (Partition 0 to Partition 3) provide scalable writes, storage and consumption. Ordering is guaranteed within a partition only.
  7. Replicas: each partition of a topic is replicated across brokers (Broker 1 to Broker 4), with one replica acting as leader and the others as followers.
  8. Replicas (continued)
  9. Producers write records to the partitions of a partitioned topic.
  10. Record Keys & Ordering: record keys determine the partition with the default Kafka partitioner. Keys are used in the default partitioning algorithm: partition = hash(key) % numPartitions
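The partitioning rule on the slide above can be sketched in a few lines. This is a toy illustration, not Kafka's actual implementation: the Java client hashes serialized key bytes with murmur2, while here zlib.crc32 stands in purely for demonstration.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Toy version of partition = hash(key) % numPartitions."""
    return zlib.crc32(key) % num_partitions

# Records with the same key always map to the same partition,
# which is exactly what preserves per-key ordering in Kafka.
p1 = partition_for(b"customer-42", 4)
p2 = partition_for(b"customer-42", 4)
assert p1 == p2 and 0 <= p1 < 4
```

Because the modulus depends on the partition count, adding partitions to an existing topic changes where new records with a given key land, which is why key-based ordering guarantees only hold for a fixed partition count.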
  11. Consumers read from the partitions of a partitioned topic (Consumer A).
  12. Two independent consumers (Consumer A and Consumer B) can each read the full topic.
  13. A consumer group (Consumers A1 to A4) divides the partitions of a topic among its members, while Consumer B continues to read independently.
  14. A consumer is a client application that reads messages from topics. It is horizontally, elastically scalable (if stateless).
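The consumer-group behaviour in the slides above can be sketched as follows. In real Kafka the assignment is negotiated through the group coordinator with pluggable strategies; this round-robin sketch only illustrates the outcome, and the function name is illustrative.

```python
def assign(partitions, consumers):
    """Spread partitions across the members of one consumer group, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Four partitions, two consumers in group A: each member gets two partitions.
groups = assign([0, 1, 2, 3], ["A1", "A2"])

# With more consumers than partitions, the extra members sit idle,
# which is why a topic's partition count caps a group's parallelism.
idle = assign([0, 1], ["C1", "C2", "C3"])
```

Each partition is owned by exactly one member of a group at a time, so scaling a consumer group beyond the partition count adds no throughput.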
  15. Apache Kafka® Connect
  16. Streaming Integration with Kafka Connect: Kafka Connect moves data from external sources into the Kafka brokers, and from the brokers out to external sinks.
  17. Kafka Connect Data Pipeline: Source → Connector → Transform(s) → Converter → Kafka
  18. Confluent Hub: online library of pre-packaged and ready-to-install extensions or add-ons for Confluent Platform and Apache Kafka®: connectors, transforms, and converters. Easily install the components that suit your needs into your local environment with the Confluent Hub client command line tool. https://hub.confluent.io
  19. Apache Kafka® Kafka Streams
  20. A stream processor sits between producers and consumers, transforming and aggregating (∑) events as they flow through.
  21. Stream Processing: create and store materialized views; filter and join; act and analyze in-flight.
  22. Things Kafka Streams Does: runs everywhere; clustering done for you; exactly-once processing; event-time processing; integrated database; joins, windowing, aggregation; S/M/L/XL/XXL/XXXL sizes.
  23. Confluent KSQL
  24. KSQL is the Streaming SQL Engine for Apache Kafka.
  25. Stream Processing with KSQL: simple SQL syntax for expressing reasoning along and across data streams. User-defined functions can be written in Java.
      CREATE STREAM vip_actions AS
        SELECT userid, page, action
        FROM clickstream c
        LEFT JOIN users u ON c.userid = u.user_id
        WHERE u.level = 'Platinum'
        EMIT CHANGES;
  26. Stream Processing Analogy: $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt. The Connect API plays the part of cat and the output redirect at either end, the stream processing sits in the middle like grep and tr, and the Kafka cluster is the pipe itself.
  27. "The truth is the log. The database is a cache of a subset of the log." (Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf)
  28. KSQL for Real-Time Monitoring: log data monitoring, tracking and alerting; syslog data; sensor / IoT data.
      CREATE TABLE error_counts AS
        SELECT error_code, count(*)
        FROM monitoring_stream
        WINDOW TUMBLING (SIZE 1 MINUTE)
        WHERE type = 'ERROR'
        GROUP BY error_code;
  29. KSQL for Streaming ETL: joining, filtering, and aggregating streams of event data.
      CREATE STREAM engine_oil_pressure_readings AS
        SELECT r.deviceid, r.reading, r.timestamp,
               d.sensor_type, d.uom, d.component
        FROM sensor_readings r
        LEFT JOIN device_master d ON r.deviceid = d.id
        WHERE d.component = 'Engine'
          AND d.sensor_type = 'Oil Pressure'
        EMIT CHANGES;
  30. Streaming ETL with Apache Kafka and KSQL: data flows from PostgreSQL via CDC (Debezium) and Kafka Connect (or the Producer API) into Kafka, and out via Kafka Connect to Elasticsearch.
  31. KSQL is a stream processing technology. As such it is not yet a great fit for: ad-hoc queries (no indexes yet in KSQL, and Kafka is often configured to retain data for only a limited span of time) or BI reports with Tableau etc. (no indexes yet in KSQL, no JDBC, and most BI tools don't understand continuous, streaming results).
  32. Building blocks for Stream Processing. Core Kafka: producer, topic, consumer, durable pub/sub. Kafka Streams: state stores, change logs, processors, operators, transformers, streams and tables. ksqlDB: declarative API, push queries, pull queries, serverless. Persistence and compute are provided at each layer.
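What the tumbling-window query above computes can be sketched in pure Python: count ERROR events per error_code within fixed, non-overlapping 1-minute windows. The event shape and function name are illustrative assumptions, not KSQL internals.

```python
from collections import Counter

WINDOW_MS = 60_000  # tumbling window size: 1 minute

def error_counts(events):
    """Count ERROR events per (window_start, error_code) bucket."""
    counts = Counter()
    for e in events:
        if e["type"] != "ERROR":
            continue  # the WHERE type = 'ERROR' filter
        # Tumbling windows: each timestamp falls into exactly one fixed bucket.
        window_start = (e["ts"] // WINDOW_MS) * WINDOW_MS
        counts[(window_start, e["error_code"])] += 1
    return counts

events = [
    {"ts": 5_000,  "type": "ERROR", "error_code": 500},
    {"ts": 30_000, "type": "ERROR", "error_code": 500},
    {"ts": 61_000, "type": "ERROR", "error_code": 404},
    {"ts": 62_000, "type": "INFO",  "error_code": None},
]
# Two 500s fall in the first window, one 404 in the second; INFO is filtered out.
counts = error_counts(events)
```

Unlike this batch sketch, KSQL maintains these counts incrementally as events arrive and emits updates to the error_counts table continuously.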
  33. Streaming ETL: OLTP RDBMS to Big Data in Cloud Environments
  34. Sample Use Case: sales data. Dataset from Kaggle: https://www.kaggle.com/kyanyoga/sample-sales-data
  35. RDBMS: the current de-facto data integration technology; third normal form; minimises data duplication.
  36. Big Data: data storage is cheap; tabular data; flat schema.
  37. Hybrid Cloud: the OLTP database runs in the on-premises DC, the Big Data service in the cloud environment, over a heterogeneous network across security zones. What is the bridge to cloud?
  38. Connector over WAN? The stability of the connector and the reliability of data delivery depend on a stable WAN connection and QoS. It also tightly couples the Kafka cluster in the on-premises DC to individual application endpoints in the cloud.
  39. Bridge to Cloud: decouple with Kafka clusters in both the DC and the cloud environment. Data delivery is reliable, with a buffer in each environment, independent of network security or QoS SLA. A persistent bridge to cloud ensures data is only sent once to the cloud (or to the DC) and can be reused by many stream processors, connectors and apps.
  40. Demo Scenario
  41. There's a huge gap!
  42. Streaming KSQL: pairwise joins.
      CREATE STREAM ORDER_ORDERLINE_EVENTS AS
        SELECT ol.ORDERNUMBER, o.ORDERDATE, o.STATUS, o.QTR_ID, o.MONTH_ID,
               o.YEAR_ID, o.DEALSIZE, o.CUSTOMERNAME, ol.ORDERLINENUMBER,
               ol.QUANTITYORDERED, ol.PRICEEACH, ol.PRODUCTCODE
        FROM ORDER_LINES_EVENTS ol
        LEFT JOIN ORDER_HEADERS o ON ol.ORDERNUMBER = o.ORDERNUMBER
        PARTITION BY ol.CUSTOMERNAME;
  43. Streaming KSQL: pairwise joins.
      CREATE STREAM CUSTOMER_ORDER_ORDERLINE_EVENTS AS
        SELECT ool.ORDERNUMBER, ool.ORDERDATE, ool.STATUS, ool.QTR_ID,
               ool.MONTH_ID, ool.YEAR_ID, ool.DEALSIZE, ool.CUSTOMERNAME,
               c.PHONE, c.ADDRESSLINE1, c.ADDRESSLINE2,
               c.CITY, c.STATE, c.POSTALCODE, c.COUNTRY,
               c.CONTACTLASTNAME, c.CONTACTFIRSTNAME,
               ool.ORDERLINENUMBER, ool.QUANTITYORDERED,
               ool.PRICEEACH, ool.PRODUCTCODE
        FROM ORDER_ORDERLINE_EVENTS ool
        LEFT JOIN CUSTOMERS c ON ool.CUSTOMERNAME = c.CUSTOMERNAME
        PARTITION BY ool.PRODUCTCODE;
  44. Streaming KSQL: pairwise joins.
      CREATE STREAM PRODUCT_CUSTOMER_ORDERLINE_EVENTS AS
        SELECT col.ORDERNUMBER, col.ORDERDATE, col.STATUS, col.QTR_ID,
               col.MONTH_ID, col.YEAR_ID, col.DEALSIZE,
               col.CUSTOMERNAME, col.PHONE,
               col.ADDRESSLINE1, col.ADDRESSLINE2,
               col.CITY, col.STATE, col.POSTALCODE,
               col.COUNTRY, col.CONTACTLASTNAME, col.CONTACTFIRSTNAME,
               col.ORDERLINENUMBER, col.QUANTITYORDERED, col.PRICEEACH,
               col.PRODUCTCODE, p.PRODUCTLINE, p.MSRP
        FROM CUSTOMER_ORDER_ORDERLINE_EVENTS col
        LEFT JOIN PRODUCTS p ON col.PRODUCTCODE = p.PRODUCTCODE
        PARTITION BY col.COUNTRY;
  45. Streaming KSQL: pairwise joins.
      CREATE STREAM TABULAR_ORDER_EVENTS WITH (KAFKA_TOPIC='orders_enriched') AS
        SELECT pcol.ORDERNUMBER, pcol.ORDERDATE, pcol.STATUS, pcol.QTR_ID,
               pcol.MONTH_ID, pcol.YEAR_ID, pcol.DEALSIZE,
               pcol.CUSTOMERNAME, pcol.PHONE,
               pcol.ADDRESSLINE1, pcol.ADDRESSLINE2,
               pcol.CITY, pcol.STATE, pcol.POSTALCODE,
               pcol.COUNTRY COUNTRY, c.TERRITORY,
               pcol.CONTACTLASTNAME, pcol.CONTACTFIRSTNAME,
               pcol.ORDERLINENUMBER, pcol.QUANTITYORDERED, pcol.PRICEEACH,
               pcol.PRODUCTCODE, pcol.PRODUCTLINE, pcol.MSRP
        FROM PRODUCT_CUSTOMER_ORDERLINE_EVENTS pcol
        LEFT JOIN COUNTRIES c ON pcol.COUNTRY = c.COUNTRY
        PARTITION BY pcol.ORDERNUMBER;
  46. What does KSQL look like? First load a topic into a stream, then flatten to a table, then join stream to table for enrichment.
      CREATE STREAM ORDER_EVENTS WITH (KAFKA_TOPIC='orders_cdc', VALUE_FORMAT='AVRO');
      CREATE TABLE ORDERS WITH (KAFKA_TOPIC='ORDER_EVENTS', VALUE_FORMAT='AVRO', KEY='ORDERNUMBER');
      CREATE STREAM orderlines1 AS
        SELECT ol.*, o.ORDERDATE, o.STATUS, o.QTR_ID, o.MONTH_ID,
               o.YEAR_ID, o.DEALSIZE, o.CUSTOMERNAME
        FROM ORDERLINES_3NF ol
        LEFT JOIN T_ORDERS_3NF o ON ol.ORDERNUMBER = o.ORDERNUMBER;
  47. Or use the Kafka Streams API: Java or Scala; can do multiple joins in one operation; provides an interactive query API which makes it possible to query the state store.
  48. Learn more about Apache Kafka® and Confluent Platform
  49. Learn Kafka. Start building with Apache Kafka at Confluent Developer: developer.confluent.io
  50. https://www.confluent.io/apache-kafka-stream-processing-book-bundle/
  51. Try Confluent Cloud for free: Confluent are giving new users $50 of free usage per month for their first 3 months. Sign up for a Confluent Cloud account at bit.ly/TryConfluentCloud. You will be required to enter credit card information, but you will not be charged unless you go over the $50 usage in any of the first 3 months or fail to cancel your subscription before the end of the promotion. You won't be charged if you don't go over the limit, so keep an eye on your account and make sure you have enough remaining free credits for the rest of your subscription month. If you don't want to continue past the promotion, cancel before the 3 months end; otherwise you will start being charged full price. To cancel, immediately stop all streaming and storing of data in Confluent Cloud and email cloud-support@confluent.io.
  52. A Confluent community catalyst is a person who invests relentlessly in the Apache Kafka® and/or Confluent communities. Benefits: massive bragging rights, access to the private MVP Slack channel, special swag, the recognition of your peers, direct interaction with Apache Kafka contributors as well as the Confluent founders at special events, and a free pass for Kafka Summit SF. Nominate yourself or a peer at CONFLUENT.IO/NOMINATE
  53. Want to host or speak at one of our meetups? Please contact community@confluent.io and we will make it happen!
  54. Learn more about Apache Kafka® and Confluent Platform
  55. Confluent Blog: https://confluent.io/blog. Get the latest info and read about interesting use cases and implementation details on our blog! Categories include: Analytics, Big Ideas, Clients, Company, Confluent Cloud, Connecting to Apache Kafka, Frameworks, Kafka Summit, Use Cases, and Stream Processing.
  56. Podcast - Streaming Audio: Real-Time Banking with Clojure and Apache Kafka ft. Bobby Calderwood; Announcing ksqlDB ft. Jay Kreps (20.11.2019); Installing Apache Kafka with Ansible ft. Viktor Gamov and Justin Manchester (18.11.2019); Securing the Cloud with VPC Peering ft. Daniel LaMotte (13.11.2019); ETL and Event Streaming Explained ft. Stewart Bryson (06.11.2019); The Pro's Guide to Fully Managed Apache Kafka Services ft. Ricardo Ferreira (04.11.2019); and many more!
  57. https://www.confluent.io/apache-kafka-stream-processing-book-bundle/
  58. Confluent Community Slack Channel: over 10,000 Kafkateers are collaborating every single day on the Confluent Community Slack channel! cnfl.io/community-slack. Confluent Community Catalyst Program: nominate yourself or a peer at confluent.io/nominate. Subscribe to the Confluent blog for frequent updates from key names in Apache Kafka® on best practices, product updates & more: cnfl.io/read
  59. https://www.confluent.io/download/compare/
  60. Sign up for a Confluent Cloud account: bit.ly/TryConfluentCloud
  61. Demos: a curated list of demos that showcase Apache Kafka® event stream processing on the Confluent Platform. This list is curated by our DevX team and includes demos for: Confluent Cloud, stream processing, data pipelines, and Confluent Platform.
  62. https://github.com/confluentinc/examples
  63. https://github.com/confluentinc/demo-scene
  64. @perkrol
