1. CONFIDENTIAL
Join the Confluent Community Slack Channel: cnfl.io/community-slack
Subscribe to the Confluent blog: cnfl.io/read
Welcome to the Amsterdam Apache Kafka Meetup!
6:00 pm-6:45 pm: Doors Open/Networking/pizza & drinks
6:45 pm-7:20 pm: Using Kafka to integrate DWH and Cloud Based big data systems (Mic Hussey, Systems Engineer, Confluent)
7:20 pm-7:55 pm: Topic Management at Scale (Filip Yonov, Constantin Mota and Josephine Dik, ING)
7:55 pm-8:30 pm: Real Time Investment Alerts using Apache Kafka & Spring Kafka at ING Bank (Tim van Baarsen and Marcos Maia)
8:30 pm-9:00 pm: Drinks and Networking
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no
affiliation with and does not endorse the materials provided at this event.
2.
NOMINATE YOURSELF OR A PEER AT
CONFLUENT.IO/NOMINATE
3.
CONFLUENT COMMUNITY DISCOUNT CODE: KS19Meetup
25% OFF*
*Standard Priced Conference pass
4.
Using Kafka to integrate DWH and Cloud Based big data systems.
Mic Hussey, Confluent Nordics, mic@confluent.io
5.
Event Streaming as a Foundational Technology
Very few Foundational Technologies
Hundreds... Supporting Technologies
Innovation
6.
New Generation of Applications
Email
Web Browsing
Ubiquitous Internet Access
Fast
Always On
Mobile
7.
ETL/Data Integration
Drawbacks: Batch, Expensive, Time Consuming
Strengths: High Throughput, Durable, Persistent, Maintains Order
Messaging
Drawbacks: Difficult to Scale, No Persistence, Data Loss, No Replay
Strengths: Fast (Low Latency)
10.
ETL/Data Integration: Stored records
Messaging: Transient Messages
Both of these are a complete mismatch to how a business works.
11.
ETL/Data Integration: Stored records
Messaging: Transient Messages
Event Streaming Paradigm: High Throughput, Durable, Persistent, Maintains Order, Fast (Low Latency)
12.
Event Streaming Paradigm
To rethink data not as stored records or transient messages, but instead as a continually updating stream of events.
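One way to picture this shift is to treat state as something derived by replaying events rather than as the primary record. The sketch below is plain Python with no Kafka dependency; the event fields and order numbers are made-up illustration values, not taken from the talk.

```python
# Hypothetical order events, listed in the order they occurred.
events = [
    {"order": 10107, "type": "created", "status": "In Process"},
    {"order": 10121, "type": "created", "status": "In Process"},
    {"order": 10107, "type": "status_changed", "status": "Shipped"},
]

def replay(log):
    """Fold an ordered event log into the latest status per order."""
    state = {}
    for event in log:
        state[event["order"]] = event["status"]
    return state

current = replay(events)      # state after every event seen so far
earlier = replay(events[:2])  # the same log replayed up to an earlier point
print(current)
print(earlier)
```

Because the log is kept, the same events can be replayed to rebuild state or to answer what the state was at an earlier point, which neither a stored record nor a transient message allows on its own.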
14.
Apache Kafka, the de-facto OSS standard for event streaming
Real-time | Uses disk structure for constant performance at petabyte scale
Scalable | Distributed, scales quickly and easily without downtime
Persistent | Persists messages on disks, enables intra-cluster replication
Reliable | Replicates data, auto balances consumers upon failure
In production at more than a third of the Fortune 500
2 trillion messages a day at LinkedIn
500 billion events a day (1.3 PB) at Netflix
16.
Data Warehouses to Big Data
17.
Kafka Integration Architecture
Diagram labels: Apps, Search, NoSQL, DWH, Hadoop, STREAMING PLATFORM, PRODUCER, CONSUMER
18.
Sample Use Case: Sales data
● Dataset from Kaggle https://www.kaggle.com/kyanyoga/sample-sales-data
19.
DWH
● Current de-facto data integration technology
● Third Normal Form
● Minimises data duplication
● Star schema
20.
Big Data
● Data storage is cheap
● Tabular data
● Flat schema
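The contrast between the normalised DWH layout and the flat big data layout can be sketched in a few lines of plain Python. The rows below are made-up illustration values; ORDERLINENUMBER and QUANTITYORDERED are assumed column names, and `flatten` is a hypothetical helper, not part of any library.

```python
# Hypothetical 3NF layout: order-level fields are stored once per order.
orders = {
    10107: {"ORDERDATE": "2003-02-24", "STATUS": "Shipped",
            "CUSTOMERNAME": "Example Customer"},
}
order_lines = [
    {"ORDERNUMBER": 10107, "ORDERLINENUMBER": 1, "QUANTITYORDERED": 30},
    {"ORDERNUMBER": 10107, "ORDERLINENUMBER": 2, "QUANTITYORDERED": 39},
]

def flatten(lines, orders_by_number):
    """Copy the order-level fields onto every order line (flat schema)."""
    flat = []
    for line in lines:
        order = orders_by_number.get(line["ORDERNUMBER"], {})
        flat.append({**line, **order})
    return flat

flat_rows = flatten(order_lines, orders)
for row in flat_rows:
    print(row)
```

The flat rows repeat the customer name and status on every line, which is the duplication that Third Normal Form avoids and that cheap storage makes acceptable.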
23.
Kafka Cluster
Connect API | Stream Processing | Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
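The shell pipeline analogy can be mirrored with Python generators: a source stage, a processing stage and a sink stage, each handing records to the next. This is only a sketch of the shape of the pipeline; `source`, `process` and `sink` are hypothetical stand-ins and do not use the Connect or Streams APIs.

```python
def source(lines):
    """Stand-in for a source connector: emit records one at a time."""
    for line in lines:
        yield line

def process(records):
    """Stand-in for stream processing: keep records containing 'ksql', upper-case them."""
    for record in records:
        if "ksql" in record:
            yield record.upper()

def sink(records):
    """Stand-in for a sink connector: collect the results."""
    return list(records)

in_lines = ["intro to ksql", "unrelated line", "ksql joins"]
out_lines = sink(process(source(in_lines)))
print(out_lines)
```

As with the shell pipe, each stage only sees one record at a time, so the input does not need to be finite or fully available before processing starts.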
24.
KSQL is the Streaming SQL Engine for Apache Kafka
25.
CREATE STREAM oob_readings AS
SELECT s.*, c.std_value, c.sigma
FROM sensor_reading s
LEFT JOIN sensor_characteristics c
ON s.id = c.id
WHERE abs(s.value - c.std_value) > 3*c.sigma;
Simple SQL syntax for expressing reasoning along and across data streams.
You can write user-defined functions in Java.
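The logic of the out-of-bounds query can be sketched in plain Python: look up each reading's sensor characteristics and keep the reading only if it is more than three standard deviations from the expected value. The sensor ids and numbers are made-up illustration values, and `out_of_bounds` is a hypothetical helper, not KSQL itself.

```python
# Hypothetical reference data: expected value and standard deviation per sensor id.
sensor_characteristics = {
    "s1": {"std_value": 20.0, "sigma": 0.5},
    "s2": {"std_value": 100.0, "sigma": 2.0},
}

def out_of_bounds(readings, characteristics):
    """Yield readings more than 3 sigma away from their sensor's expected value."""
    for reading in readings:
        c = characteristics.get(reading["id"])
        if c is None:
            continue  # no characteristics known, so the condition cannot hold
        if abs(reading["value"] - c["std_value"]) > 3 * c["sigma"]:
            yield {**reading, **c}

readings = [
    {"id": "s1", "value": 20.4},  # within 3 sigma of 20.0
    {"id": "s1", "value": 22.0},  # more than 1.5 away from 20.0
    {"id": "s2", "value": 93.0},  # more than 6.0 away from 100.0
    {"id": "s3", "value": 1.0},   # unknown sensor, dropped
]
oob = list(out_of_bounds(readings, sensor_characteristics))
print(oob)
```

Readings from an unknown sensor are dropped here because a comparison against missing characteristics cannot be true, which is how the WHERE clause behaves after a LEFT JOIN that finds no match.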
26.
Streaming KSQL: pairwise joins
30.
What does KSQL look like?
● First load a topic into a stream
● Then flatten to a table
● Join stream to table for enrichment
-- 1. Load the topic into a stream, then re-key it (ORDERS_SRC is an assumed name)
CREATE STREAM ORDERS_SRC
WITH (KAFKA_TOPIC='orders_cdc', VALUE_FORMAT='AVRO');
CREATE STREAM ORDERS_3NF AS
SELECT * FROM ORDERS_SRC
PARTITION BY ORDERNUMBER;
-- 2. Flatten to a table keyed by ORDERNUMBER
CREATE TABLE T_ORDERS_3NF
WITH (KAFKA_TOPIC='ORDERS_3NF', VALUE_FORMAT='AVRO', KEY='ORDERNUMBER');
-- 3. Join stream to table (ORDERLINES_3NF is assumed to be loaded the same way)
CREATE STREAM orderlines1 AS
SELECT ol.*, o.ORDERDATE, o.STATUS, o.QTR_ID, o.MONTH_ID, o.YEAR_ID,
o.DEALSIZE, o.CUSTOMERNAME
FROM ORDERLINES_3NF ol
LEFT JOIN T_ORDERS_3NF o ON ol.ORDERNUMBER = o.ORDERNUMBER;
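The three steps can be mimicked in plain Python to show what the stream-to-table join does with the data. This is a simplified sketch: `build_table` and `left_join` are hypothetical helpers, the rows are made-up values, and the table is built completely before the join runs, whereas a real stream-table join uses whatever table state exists when each stream record arrives.

```python
def build_table(changelog, key):
    """Flatten a stream of change events into a table: latest row per key."""
    table = {}
    for row in changelog:
        table[row[key]] = row
    return table

def left_join(stream, table, key, fields):
    """Enrich each stream record with selected table fields (None if no match)."""
    for record in stream:
        match = table.get(record[key], {})
        yield {**record, **{f: match.get(f) for f in fields}}

orders_changelog = [
    {"ORDERNUMBER": 10107, "STATUS": "In Process", "CUSTOMERNAME": "Example Customer"},
    {"ORDERNUMBER": 10107, "STATUS": "Shipped", "CUSTOMERNAME": "Example Customer"},
]
order_lines = [
    {"ORDERNUMBER": 10107, "ORDERLINENUMBER": 1},
    {"ORDERNUMBER": 10999, "ORDERLINENUMBER": 1},  # no matching order
]

t_orders = build_table(orders_changelog, "ORDERNUMBER")
enriched = list(left_join(order_lines, t_orders, "ORDERNUMBER",
                          ["STATUS", "CUSTOMERNAME"]))
for row in enriched:
    print(row)
```

The table keeps only the latest row per ORDERNUMBER, and the left join keeps order lines without a matching order, filling the order fields with None.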
31.
Or use the Kafka Streams API
● Java or Scala
● Can do multiple joins in one operation
● Provides an interactive query API which makes it possible to query the state store.
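The idea behind querying a state store can be illustrated without Kafka: the processor keeps running state as it consumes records, and callers look values up directly instead of reading an output topic. `OrderCountStore` below is a hypothetical toy class in plain Python, not the Kafka Streams API, and the records are made-up values.

```python
from collections import Counter

class OrderCountStore:
    """Toy stand-in for a state store: running count of order lines per order."""

    def __init__(self):
        self._counts = Counter()

    def process(self, record):
        """Update state for each incoming record."""
        self._counts[record["ORDERNUMBER"]] += 1

    def get(self, order_number):
        """Point lookup against the current state."""
        return self._counts.get(order_number, 0)

store = OrderCountStore()
for record in [{"ORDERNUMBER": 10107}, {"ORDERNUMBER": 10107}, {"ORDERNUMBER": 10121}]:
    store.process(record)

print(store.get(10107))
```

In this sketch the lookup is an in-process call; how a real application exposes such lookups to other services is outside what the slide covers.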
32.
Confluent Community - What next?
About 10,000 Kafkateers are collaborating every single day on the Confluent Community Slack channel!
Join the Confluent Community Slack Channel: cnfl.io/community-slack
There are more than 35,000 Kafkateers in around 145 meetup groups across all five continents!
Join your local Apache Kafka® Meetup: cnfl.io/meetups
Get frequent updates from key names in Apache Kafka® on best practices, product updates & more!
Subscribe to the Confluent blog: cnfl.io/read