Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
1. Confluent Kafka and KSQL:
Streaming Data Pipelines Made Easy
Kairo Tavares
kairo.tavares@hpe.com
2. Goals
How the Confluent Kafka Platform can solve problems in your company
What streaming data pipelines are and what their challenges are
How KSQL can simplify your streaming data pipelines
8. Kafka 101
Apache Kafka
Kafka® is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
https://kafka.apache.org/
13. Confluent Platform
Founded by the team that built Apache Kafka®, Confluent builds an event streaming platform that enables companies to easily access data as real-time streams.
Development and
Connectivity
Stream Processing
Management
and Operations
https://www.confluent.io/
15. Confluent Platform
Avro Format
– Open-source data format
– Chosen for a number of reasons:
– Direct mapping to and from JSON
– Compact format
– Very fast to serialize and deserialize
– Support for several programming languages
– A rich, extensible schema language defined in pure JSON
– The best notion of compatibility for evolving your data over time
– Built-in documentation
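The points above can be made concrete with a small example. Avro schemas are plain JSON; the record name, fields, and doc strings below are illustrative, not taken from any real system:

```json
{
  "type": "record",
  "name": "Pageview",
  "namespace": "com.example",
  "doc": "A single page-view event (illustrative schema).",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "page", "type": "string"},
    {"name": "duration_ms", "type": "long"},
    {"name": "referrer", "type": ["null", "string"], "default": null,
     "doc": "Optional field added in a later version."}
  ]
}
```

The optional `referrer` field, with a `null` default, is the kind of change Avro's compatibility rules allow when evolving a schema: old readers ignore it and new readers fall back to the default. The `doc` attributes are the built-in documentation mentioned above.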
16. Confluent Platform
Management and Operations
– Control Center
– Manage key operations and monitor the health and performance of Kafka clusters and data streams with curated
dashboards directly from a GUI.
– Replicator
– Replicate Kafka topics across data centers and public clouds to ensure disaster recovery and build distributed data
pipelines.
– Auto Data Balancer
– Optimize your resource utilization by invoking a rack-aware algorithm that automatically rebalances partitions across
a Kafka cluster.
– Security Controls
– Enable pass-through client credentials from REST Proxy / Schema Registry to Kafka broker. Map AD/LDAP groups to
Kafka ACLs.
– Operator
– Automate deployment of the complete Confluent Platform, including Kafka, as a cloud-native application on
Kubernetes.
17. Confluent Platform
Stream Processing
– Kafka Streams (Library)
– Build mission-critical real-time applications with a simple Java library and a Kafka cluster - no additional framework or
cluster needed.
– KSQL (Service)
– Build real-time stream processing applications against Apache Kafka using simple SQL-like semantics.
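As a sketch of those SQL-like semantics (topic and column names here are hypothetical), KSQL lets you register an existing Kafka topic as a stream and derive new streams from it with continuous queries:

```sql
-- Register an existing Kafka topic as a KSQL stream
CREATE STREAM pageviews (
    user_id VARCHAR,
    page VARCHAR,
    duration_ms BIGINT
) WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='AVRO');

-- A continuous query: each matching record is written to the
-- topic backing LONG_VIEWS as it arrives
CREATE STREAM long_views AS
  SELECT user_id, page
  FROM pageviews
  WHERE duration_ms > 10000;
```

The second statement runs indefinitely inside the KSQL server; no Java code and no separate processing cluster are involved.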
20. Streaming Data Pipeline
But what is a stream?
– A stream is an unbounded, continuous flow of records
– Data is real-time
– Immutable events
– Records are key-value pairs (Kafka)
21. Streaming Data Pipeline
Stream Processing
– Per-record millisecond latency
– Data filtering
– Data transformation and conversions
– Data enrichment with joins
– Data manipulation with scalar functions
– Stateful processing
– Data aggregation
– Windowed processing
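These capabilities map directly onto KSQL constructs. A minimal sketch, assuming a hypothetical pageviews stream and a users table already registered in KSQL:

```sql
-- Data enrichment with a stream-table join
CREATE STREAM enriched_views AS
  SELECT p.user_id, u.region, p.page
  FROM pageviews p
  LEFT JOIN users u ON p.user_id = u.user_id;

-- Stateful aggregation over a one-minute tumbling window
CREATE TABLE views_per_minute AS
  SELECT page, COUNT(*) AS view_count
  FROM pageviews
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY page;
```

The join enriches each incoming record with the latest matching row from the table, and the windowed `COUNT(*)` is the kind of stateful, windowed aggregation listed above.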