Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL
SPI-NL 2018
11 Oct 2018 / Mic Hussey
@hussey_mic mic@confluent.io
@hussey_mic / Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL - SPI-NL 2018 2
$ whoami
• Systems Engineer @ Confluent
• Working in messaging/event processing since 1998
• GitHub: https://github.com/MichaelHussey
• Twitter: @hussey_mic

A bit of a mess…
[Diagram labels: App (×4), search, Hadoop, DWH, monitoring, security, MQ (×2), cache (×2)]

The Streaming Platform
[Diagram labels: KAFKA, DWH, Hadoop, App (×8), request-response, messaging OR stream processing, streaming data pipelines, changelogs]

Database offload → Analytics
[Diagram labels: RDBMS, CDC, HDFS / S3 / BigQuery etc]

Streaming ETL with Apache Kafka and KSQL
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders]

Real-time Event Stream Enrichment
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>]

Transform Once, Use Many
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, New App, <x>]

Transform Once, Use Many
[Diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, HDFS / S3 / etc, New App, <x>]

KSQL
Streaming ETL with Apache Kafka

Streaming Integration with Kafka Connect
[Diagram labels: Kafka Brokers, Kafka Connect, Tasks, Workers, Sources, Sinks, Amazon S3, syslog, flat file, CSV, JSON]

The Connect API of Apache Kafka®
Reliable and scalable integration of Kafka with other systems, no coding required.
✓ Fault tolerant and automatically load balanced
✓ Extensible API
✓ Single Message Transforms
✓ Part of Apache Kafka, included in Confluent Open Source
✓ Centralized management and configuration
✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3
✓ Supports CDC ingest of events from RDBMS
✓ Preserves data schema
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
  "table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
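As a rough sketch of how a configuration like the one above might be submitted: Kafka Connect workers expose a REST interface, and a connector is created by POSTing a JSON document with `name` and `config` fields to `/connectors`. The Python below only builds that request and does not send it. The worker address `http://localhost:8083` and the connector name `jdbc-source-demo` are assumptions made for illustration, and a working JDBC source would likely need further settings that the slide leaves out.

```python
import json
import urllib.request

# Connector settings copied from the slide.
slide_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}


def build_create_request(name, config, worker_url="http://localhost:8083"):
    """Build (but do not send) the POST request that creates a connector.

    The connector name and worker_url are assumptions for this sketch.
    """
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=worker_url + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request("jdbc-source-demo", slide_config)
print(request.get_method(), request.full_url)
# Sending it to a running worker would be: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent to a worker.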

Integrating Postgres with Kafka
[Diagram labels: Kafka Connect (×2)]

Confluent Hub
hub.confluent.io
• Launched June 2018
• One-stop place to discover and download:
  • Connectors
  • Transformations
  • Converters

KSQL
Streaming ETL with Apache Kafka

KSQL is a Declarative Stream Processing Language

KSQL is the Streaming SQL Engine for Apache Kafka

KSQL for Streaming ETL
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.user_id
WHERE u.level = 'Platinum';
Joining, filtering, and aggregating streams of event data

KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
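To make the windowing in the query above concrete, here is a minimal plain-Python sketch of what a 5-second tumbling window with `GROUP BY card_number` and `HAVING count(*) > 3` computes. It is not how KSQL is implemented: KSQL updates its result continuously as events arrive, whereas this runs over a finished list. The timestamps and card names are made-up sample data.

```python
from collections import Counter

WINDOW_SIZE_MS = 5_000  # corresponds to WINDOW TUMBLING (SIZE 5 SECONDS)


def window_start(timestamp_ms, size_ms=WINDOW_SIZE_MS):
    """Return the start of the fixed, non-overlapping window holding the timestamp."""
    return timestamp_ms - (timestamp_ms % size_ms)


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    attempts: iterable of (timestamp_ms, card_number) pairs (hypothetical data).
    """
    counts = Counter(
        (card_number, window_start(ts)) for ts, card_number in attempts
    )
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (1_000, "card-A"), (2_000, "card-A"), (3_000, "card-A"), (4_000, "card-A"),
    (4_500, "card-B"),
    (6_000, "card-A"),  # falls into the next window, so it is counted separately
]
flagged = possible_fraud(attempts)
print(flagged)
```

Only card-A in the first window is flagged: its four attempts exceed the threshold, while the fifth attempt at 6 seconds starts a fresh count in the following window.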

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data
CREATE STREAM SYSLOG_INVALID_USERS AS
SELECT HOST, MESSAGE
FROM SYSLOG
WHERE MESSAGE LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting

KSQL
Streaming ETL with Apache Kafka

Demo Time!
[Diagram labels: Producer API, Kafka Connect (×3), Postgres (×2), Elasticsearch]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}
{
  "id": 3,
  "first_name": "Merilyn",
  "last_name": "Doughartie",
  "email": "mdoughartie1@dedecms.com",
  "gender": "Female",
  "club_status": "platinum",
  "comments": "none"
}

POOR_RATINGS
Filter all ratings where STARS < 3
[Diagram labels: Producer API, with the same sample rating event as on the Demo Time! slide]
CREATE STREAM POOR_RATINGS AS
SELECT * FROM ratings WHERE STARS < 3;

RATINGS_WITH_CUSTOMER_DATA
Join each rating to customer data
UNHAPPY_PLATINUM_CUSTOMERS
Filter for just PLATINUM customers
[Diagram labels: Kafka Connect, Producer API, with the same sample rating and customer records as on the Demo Time! slide]
CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS
SELECT * FROM RATINGS_WITH_CUSTOMER_DATA
WHERE STARS < 3
AND CLUB_STATUS = 'platinum';
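The two steps named on this slide (join each rating to customer data, then filter for unhappy platinum customers) can be traced by hand on the sample records. The plain-Python sketch below is only an illustration of that logic, not KSQL's implementation; the helper names are hypothetical, and dropping ratings without a matching customer is a simplifying assumption of the sketch.

```python
# Sample records copied from the slide.
rating = {
    "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975,
    "rating_time": 1519304105213, "channel": "web",
    "message": "worst. flight. ever. #neveragain",
}
customer = {
    "id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
    "email": "mdoughartie1@dedecms.com", "gender": "Female",
    "club_status": "platinum", "comments": "none",
}

# The customer "table": the latest record per key (customer id).
customers_by_id = {customer["id"]: customer}


def with_customer_data(rating_event, customers):
    """Join step: enrich a rating with the customer whose id equals user_id."""
    match = customers.get(rating_event["user_id"])
    if match is None:
        return None  # assumption: this sketch drops ratings with no known customer
    return {**rating_event, **match}


def is_unhappy_platinum(row):
    """Filter step: fewer than 3 stars and platinum club status."""
    return row["stars"] < 3 and row["club_status"] == "platinum"


enriched = with_customer_data(rating, customers_by_id)
print(is_unhappy_platinum(enriched))  # the sample rating has 4 stars

one_star = with_customer_data({**rating, "stars": 1}, customers_by_id)
print(is_unhappy_platinum(one_star))
```

The sample rating as shown carries 4 stars, so despite its message it does not pass the filter; the same rating with 1 star does.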

RATINGS_WITH_CUSTOMER_DATA
Join each rating to customer data
RATINGS_BY_CLUB_STATUS_1
Aggregate per-minute by CLUB_STATUS
[Diagram labels: Kafka Connect, Producer API, with the same sample rating and customer records as on the Demo Time! slide]
CREATE TABLE RATINGS_BY_CLUB_STATUS AS
SELECT CLUB_STATUS, COUNT(*)
FROM RATINGS_WITH_CUSTOMER_DATA
WINDOW TUMBLING (SIZE 1 MINUTES)
GROUP BY CLUB_STATUS;

Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
Inputs: Database Changes, Log Events, IoT Data, Web Events, …
Data Integration: CRM, Data Warehouse, Database, Hadoop, …
Real-time Applications: Monitoring, Analytics, Custom Apps, Transformations, …
Confluent Platform
• Apache Kafka®: Core | Connect API | Streams API
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• SQL Stream Processing: KSQL
• Data Compatibility: Schema Registry
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
Legend: Apache Open Source | Confluent Open Source | Confluent Enterprise

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle
@hussey_mic mic@confluent.io
http://cnfl.io/slack
https://www.confluent.io/download/

Useful links
• Postgres integration into Kafka
  • http://debezium.io/docs/connectors/postgresql/
  • https://www.simple.com/engineering/a-change-data-capture-pipeline-from-postgresql-to-kafka
  • https://www.slideshare.net/JeffKlukas/postgresql-kafka-the-delight-of-change-data-capture
  • https://blog.insightdatascience.com/from-postgresql-to-redshift-with-kafka-connect-111c44954a6a
• Streaming ETL
  • Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures (Recording & Slides)
  • Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka and KSQL
  • Steps to Building a Streaming ETL Pipeline with Apache Kafka and KSQL (Recording & Slides)
  • https://www.confluent.io/blog/ksql-in-action-real-time-streaming-etl-from-oracle-transactional-data
  • https://github.com/confluentinc/ksql/

Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales op
#EOF

Vineet
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Xiao Xu
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 

Recently uploaded (20)

一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
一比一原版卡尔加里大学毕业证(uc毕业证)如何办理
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
一比一原版马来西亚博特拉大学毕业证(upm毕业证)如何办理
 
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
 
CAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdfCAP Excel Formulas & Functions July - Copy (4).pdf
CAP Excel Formulas & Functions July - Copy (4).pdf
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 

Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL (Mic Hussey)

  • 1. Streaming ETL in Practice with PostgreSQL, Apache Kafka, and KSQL SPI-NL 2018 11 Oct 2018 / Mic Hussey @hussey_mic mic@confluent.io
  • 2. $ whoami • Systems Engineer @ Confluent • Working in messaging/event processing since 1998 • GitHub: https://github.com/MichaelHussey • Twitter: @hussey_mic
  • 3. A bit of a mess… (diagram labels: App, search, Hadoop, DWH, monitoring, security, MQ, cache)
  • 4. The Streaming Platform (diagram labels: Kafka, DWH, Hadoop, App; request-response, messaging or stream processing, streaming data pipelines, changelogs)
  • 5. Database offload → Analytics (diagram labels: RDBMS, CDC, HDFS / S3 / BigQuery etc.)
  • 6. Streaming ETL with Apache Kafka and KSQL (diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders)
  • 7. Real-time Event Stream Enrichment (diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>)
  • 8. Transform Once, Use Many (diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, New App, <x>)
  • 9. Transform Once, Use Many (diagram labels: RDBMS, CDC, customer, order events, Stream Processing, customer orders, <y>, HDFS / S3 / etc., New App, <x>)
  • 10. KSQL Streaming ETL with Apache Kafka
  • 11. Streaming Integration with Kafka Connect (diagram labels: Kafka Brokers, Kafka Connect, Tasks, Workers, Sources, Sinks, Amazon S3, syslog, flat file, CSV, JSON)
  • 12. The Connect API of Apache Kafka®: reliable and scalable integration of Kafka with other systems, no coding required. ✓ Fault tolerant and automatically load balanced ✓ Extensible API ✓ Single Message Transforms ✓ Part of Apache Kafka, included in Confluent Open Source ✓ Centralized management and configuration ✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3 ✓ Supports CDC ingest of events from RDBMS ✓ Preserves data schema
 Example configuration: { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo", "table.whitelist": "sales,orders,customers" }
 https://docs.confluent.io/current/connect/
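The connector settings on slide 12 are usually submitted to a Kafka Connect worker as JSON. The sketch below only builds that request body; it assumes the worker's REST endpoint `POST /connectors`, which takes a `name` plus a `config` map. The connector name `jdbc-source-demo` and the helper `build_connector_request` are hypothetical and do not come from the talk, and nothing is sent over the network.

```python
import json

# Connector settings copied from the slide.
slide_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}


def build_connector_request(name, config):
    """Wrap connector settings in the {"name": ..., "config": {...}} body
    that the Kafka Connect REST API accepts on POST /connectors."""
    return {"name": name, "config": dict(config)}


# Hypothetical connector name, chosen only for this illustration.
request_body = build_connector_request("jdbc-source-demo", slide_config)

# Print the JSON that would be posted to the Connect worker.
print(json.dumps(request_body, indent=2))
```

A real deployment would likely need more settings than the slide shows, and would keep credentials out of the connection URL.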
  • 13. Integrating Postgres with Kafka (diagram labels: Kafka Connect)
  • 14. Confluent Hub (hub.confluent.io) • Launched June 2018 • One-stop place to discover and download: • Connectors • Transformations • Converters
  • 15. KSQL Streaming ETL with Apache Kafka
  • 16. KSQL is a Declarative Stream Processing Language
  • 17. KSQL is the Streaming SQL Engine for Apache Kafka
  • 18. KSQL for Streaming ETL: joining, filtering, and aggregating streams of event data
 CREATE STREAM vip_actions AS
 SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id
 WHERE u.level = 'Platinum';
  • 19. KSQL for Anomaly Detection: identifying patterns or anomalies in real-time data, surfaced in milliseconds
 CREATE TABLE possible_fraud AS
 SELECT card_number, count(*)
 FROM authorization_attempts
 WINDOW TUMBLING (SIZE 5 SECONDS)
 GROUP BY card_number
 HAVING count(*) > 3;
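To make the tumbling-window semantics of the `possible_fraud` query concrete, here is a minimal batch sketch in plain Python. It is not how KSQL executes the query (KSQL updates the table continuously as events arrive); it only shows how events fall into fixed, non-overlapping 5-second buckets and how the `HAVING` threshold is applied. The sample events and card numbers are hypothetical.

```python
from collections import Counter

WINDOW_MS = 5_000  # 5-second tumbling windows, as in the slide's query
THRESHOLD = 3      # mirrors HAVING count(*) > 3


def possible_fraud(attempts):
    """attempts: iterable of (timestamp_ms, card_number) pairs.

    Returns {(window_start_ms, card_number): count} for every
    window/card group whose count exceeds THRESHOLD.
    """
    counts = Counter()
    for ts_ms, card in attempts:
        # Each event belongs to exactly one window: round its timestamp
        # down to the nearest multiple of the window size.
        window_start = ts_ms - (ts_ms % WINDOW_MS)
        counts[(window_start, card)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


# Hypothetical authorization attempts (timestamps in milliseconds).
attempts = [
    (1_000, "4111"), (1_500, "4111"), (2_000, "4111"), (4_900, "4111"),
    (5_100, "4111"),  # falls into the next window
    (1_200, "5500"), (6_000, "5500"),
]

flagged = possible_fraud(attempts)
print(flagged)  # {(0, '4111'): 4}
```

Card `4111` is flagged only for the window starting at 0 ms, because its fifth attempt lands in the following window and is counted separately.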
  • 20. KSQL for Real-Time Monitoring • Log data monitoring, tracking and alerting • syslog data • Sensor / IoT data
 CREATE STREAM SYSLOG_INVALID_USERS AS SELECT HOST, MESSAGE FROM SYSLOG WHERE MESSAGE LIKE '%Invalid user%';
 http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
  • 21. KSQL Streaming ETL with Apache Kafka
  • 22. Demo Time! (diagram labels: Producer API, Kafka Connect, Postgres, Elasticsearch)
 Sample rating event: { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }
 Sample customer record: { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie", "email": "mdoughartie1@dedecms.com", "gender": "Female", "club_status": "platinum", "comments": "none" }
  • 23. POOR_RATINGS: filter all ratings where STARS < 3 (diagram labels: Producer API)
 Sample rating event: { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }
 CREATE STREAM POOR_RATINGS AS SELECT * FROM ratings WHERE STARS < 3;
  • 24. RATINGS_WITH_CUSTOMER_DATA: join each rating to customer data. UNHAPPY_PLATINUM_CUSTOMERS: filter for just PLATINUM customers (diagram labels: Kafka Connect, Producer API)
 Sample rating event: { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }
 Sample customer record: { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie", "email": "mdoughartie1@dedecms.com", "gender": "Female", "club_status": "platinum", "comments": "none" }
 CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS SELECT * FROM RATINGS_WITH_CUSTOMER_DATA WHERE STARS < 3 AND CLUB_STATUS = 'platinum';
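Slide 24 describes two steps: enrich each rating with the matching customer record, then keep only low ratings from platinum members. The plain-Python sketch below imitates those steps on in-memory data, assuming the join key is the rating's `user_id` against the customer's `id` (as the sample records suggest). Customer 7, ratings 5314 to 5316, and the helper `enrich` are hypothetical additions for illustration.

```python
# Customer lookup table keyed by id. Customer 3 is from the slide;
# customer 7 is a hypothetical addition.
customers = {
    3: {"id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
        "club_status": "platinum"},
    7: {"id": 7, "first_name": "Sam", "last_name": "Example",
        "club_status": "bronze"},
}

# Rating 5313 is from the slide; the others are hypothetical.
ratings = [
    {"rating_id": 5313, "user_id": 3, "stars": 4,
     "message": "worst. flight. ever. #neveragain"},
    {"rating_id": 5314, "user_id": 3, "stars": 1, "message": "lost my bag"},
    {"rating_id": 5315, "user_id": 7, "stars": 2, "message": "delayed"},
    {"rating_id": 5316, "user_id": 99, "stars": 1, "message": "no seat"},
]


def enrich(rating, customer_table):
    """Left-join one rating to the customer table on user_id = id.

    Unmatched ratings are kept, with customer fields set to None.
    """
    customer = customer_table.get(rating["user_id"]) or {}
    return {
        **rating,
        "first_name": customer.get("first_name"),
        "last_name": customer.get("last_name"),
        "club_status": customer.get("club_status"),
    }


# Step 1: the joined stream (RATINGS_WITH_CUSTOMER_DATA on the slide).
ratings_with_customer_data = [enrich(r, customers) for r in ratings]

# Step 2: low ratings from platinum customers only.
unhappy_platinum_customers = [
    r for r in ratings_with_customer_data
    if r["stars"] < 3 and r["club_status"] == "platinum"
]

for row in unhappy_platinum_customers:
    print(row["rating_id"], row["first_name"], row["stars"], row["message"])
```

Only rating 5314 survives both conditions: 5313 has four stars, 5315 comes from a bronze member, and 5316 has no matching customer.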
  • 25. RATINGS_WITH_CUSTOMER_DATA: join each rating to customer data. RATINGS_BY_CLUB_STATUS_1: aggregate per-minute by CLUB_STATUS (diagram labels: Kafka Connect, Producer API)
 Sample rating event: { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975, "rating_time": 1519304105213, "channel": "web", "message": "worst. flight. ever. #neveragain" }
 Sample customer record: { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie", "email": "mdoughartie1@dedecms.com", "gender": "Female", "club_status": "platinum", "comments": "none" }
 CREATE TABLE RATINGS_BY_CLUB_STATUS AS SELECT CLUB_STATUS, COUNT(*) FROM RATINGS_WITH_CUSTOMER_DATA WINDOW TUMBLING (SIZE 1 MINUTES) GROUP BY CLUB_STATUS;
  • 26. Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
 Diagram labels: Database Changes, Log Events, IoT Data, Web Events, …; CRM, Data Warehouse, Database, Hadoop, Data Integration, …; Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
 Confluent Platform components: Apache Kafka® (Core | Connect API | Streams API); Data Compatibility (Schema Registry); Monitoring & Administration (Confluent Control Center | Security); Operations (Replicator | Auto Data Balancing); Development and Connectivity (Clients | Connectors | REST Proxy | CLI); SQL Stream Processing (KSQL)
 Tiers shown: Apache Open Source, Confluent Open Source, Confluent Enterprise
  • 27. Free Books! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
  • 29. Useful links
 • Postgres integration into Kafka
 • http://debezium.io/docs/connectors/postgresql/
 • https://www.simple.com/engineering/a-change-data-capture-pipeline-from-postgresql-to-kafka
 • https://www.slideshare.net/JeffKlukas/postgresql-kafka-the-delight-of-change-data-capture
 • https://blog.insightdatascience.com/from-postgresql-to-redshift-with-kafka-connect-111c44954a6a
 • Streaming ETL
 • Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures (Recording & Slides)
 • Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka and KSQL
 • Steps to Building a Streaming ETL Pipeline with Apache Kafka and KSQL (Recording & Slides)
 • https://www.confluent.io/blog/ksql-in-action-real-time-streaming-etl-from-oracle-transactional-data
 • https://github.com/confluentinc/ksql/
  • 30. Resources
 • CDC Spreadsheet
 • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
 • #partner-engineering on Slack for questions
 • BD team (#partners / partners@confluent.io) can help with introductions on a given sales op
 #EOF