Kafka can be used to build real-time streaming applications that process large amounts of data. It provides a simple publish-subscribe messaging model based on streams of records. Kafka Connect integrates Kafka with other data systems and formats through reusable connectors. Kafka Streams is a client library for building streaming applications that process data in Kafka through operators such as map, filter, and windowing.
1. #DevoxxFR
Kafka … de haut en bas !
University
Florent Ramière @framiere
Jean-Louis Boudart @jlboudart
Nicolas Romanetti @nromanetti
2. 2
What: massive volumes of new data generated every day
Mobile | Cloud | Microservices | Internet of Things | Machine Learning
Distributed across apps, devices, datacenters, clouds
Structured, unstructured, polymorphic
4. 4
Silos explained by the Data Gravity concept
As data accumulates (builds mass), there is a greater likelihood that additional services and applications will be attracted to this data.
This is the same effect gravity has on objects around a planet: as the mass or density increases, so does the strength of the gravitational pull.
35. 35
Maturity model (Value vs. Investment & Time)
01 Learn Kafka: understand streaming
02 Solve A Critical Need: set up secure Kafka & build your first app
03 Go To Production: monitor & manage a mission-critical solution
04 Break Silos: infrastructure & applications across LOBs
05 Stream Everything: self-service on shared Kafka
Each stage builds on the previous ones, starting from a pre-streaming baseline.
40. 40
… spawned a full platform
CONFLUENT PLATFORM
- Apache Kafka®: Core | Connect API | Streams API
- Stream Processing & Compatibility: KSQL | Schema Registry
- Connectivity: Clients | Connectors | REST Proxy
- Administration & Monitoring: Control Center | Security
- Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Open-source features alongside commercial features; customer self-managed (datacenter, public cloud) or Confluent fully managed (Confluent Cloud).
Data integration: database changes, log events, IoT data, web events, and other events flow in; Hadoop, databases, data warehouses, CRM, and other systems sit downstream.
Real-time applications: transformations, custom apps, analytics, monitoring, other.
61. 61
Apache Kafka Connect API: Import and Export Data In & Out of Kafka
Sources (e.g. JDBC, Mongo, MySQL) → Connector → Kafka Pipeline → Connector → Sinks (e.g. Elastic, Cassandra, HDFS)
- Fault tolerant
- Manages hundreds of data sources and sinks
- Preserves data schema
- Integrated within Confluent Control Center
62. 62
Connectors: Connect Kafka Easily with Data Sources and Sinks
Databases | Datastore/File Store | Analytics | Applications / Other
63. 63
Kafka Connect API, Part of the Apache Kafka™ Project
Connect any source to any target system.
Integrated
• 100% compatible with Kafka v0.9 and higher
• Integrated with Confluent’s Schema Registry
• Easy to manage with Confluent Control Center
Flexible
• 40+ open source connectors available
• Easy to develop additional connectors
• Flexible support for data types and formats
Compatible
• Maintains critical metadata
• Preserves schema information
• Supports schema evolution
Reliable
• Automated failover
• Exactly-once guarantees
• Balances workload between nodes
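As a concrete illustration, a connector is nothing more than a small configuration submitted to the Connect cluster. A minimal sketch, assuming the Confluent JDBC source connector is installed; the connector name, connection URL, column name, and topic prefix are illustrative:

```json
{
  "name": "jdbc-source-demo",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "mysql-",
    "tasks.max": "1"
  }
}
```

POSTed to the Connect REST API, this streams new rows from the demo database into topics prefixed with `mysql-`, with no code written.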
66. 66
Clients: Communicate with Kafka in a Broad Variety of Languages
Client categories: Apache Kafka built-in clients; Confluent Platform clients developed and fully supported by Confluent; community-supported clients; proxy access (http/REST, stdin/stdout).
67. 67
REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
Non-Java applications reach the cluster over REST/HTTP through the REST Proxy (backed by the Schema Registry), alongside native Kafka Java applications.
- Provides a RESTful interface to a Kafka cluster
- Simplifies message creation and consumption
- Simplifies administrative actions
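For example, producing a JSON record through the REST Proxy is a single HTTP call. A sketch, assuming a proxy listening on localhost:8082 and a topic named `test` (both illustrative):

```shell
curl -X POST \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --data '{"records": [{"value": {"user": "alice", "action": "click"}}]}' \
  http://localhost:8082/topics/test
```

The `v2` content type selects the embedded data format (json, avro, or binary), which is how the proxy ties into the Schema Registry for Avro payloads.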
86. 86
Event Time Processing
Event-time
“The point in time when an event or data record occurred, i.e. was originally created ‘by the source’. Achieving event-time semantics typically requires embedding timestamps in the data records at the time a data record is being produced.”
Processing-time
“The point in time when the event or data record happens to be processed by the stream processing application, i.e. when the record is being consumed. The processing-time may be milliseconds, hours, or days etc. later than the original event-time.”
Ingestion-time
“The point in time when an event or data record is stored in a topic partition by a Kafka broker.”
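Kafka exposes these semantics directly: the topic-level `message.timestamp.type` configuration chooses whether record timestamps carry event-time (`CreateTime`, set by the producer) or ingestion-time (`LogAppendTime`, set by the broker). A sketch with an illustrative topic name:

```shell
kafka-configs --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name clickstream \
  --add-config message.timestamp.type=LogAppendTime
```

With `CreateTime` (the default), event-time still depends on producers embedding an honest timestamp.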
88. 88
Delivery Guarantee
At most once
“Messages may be lost but are never redelivered.”
At least once
“Messages are never lost but may be redelivered.”
Exactly once
“Each message is delivered once and only once.”
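On the producer side these guarantees map to a handful of configuration properties. A minimal sketch; the transactional id is illustrative:

```properties
# At-least-once: wait for all in-sync replicas and retry on failure
acks=all
retries=2147483647
# Exactly-once within Kafka: idempotent producer plus transactions
enable.idempotence=true
transactional.id=my-app-tx-1
```

Conversely, a fire-and-forget producer (`acks=0`, no retries) gives at-most-once behaviour.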
100. 100
Interactive Queries
App (Streams API):
store = kafkaStreams
  .store(name, types)
value = store.get(key)
From our App, how do we query the state store?
- Get the store by name & types
- Then get the value by key
READ ONLY (Streams DSL)
(diagram: App with its Streams API state store, backed by the Kafka cluster)
102. 102
Interactive Queries
store = kafkaStreams
  .store(name, types)
value = store.get(key)
We add App nodes to make it scale.
Which App should the front end call to get the value for a key?
(diagram: Front End holding a key, facing several App instances with the Streams API, all backed by the Kafka cluster)
103. 103
Interactive Queries
store = kafkaStreams
  .store(name, types)
value = store.get(key)
We add App nodes to make it scale.
Which App should we call to get the value?
→ Any node
→ We shift the problem to the App
104. 104
Interactive Queries
metadata = kafkaStreams
  .metadataForKey(name, key)
host = metadata.host()
port = metadata.port()
How does the App locate the value?
- Thanks to the metadata exchanged with the coordinator
- Some simple configuration is required
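The "simple configuration" in question is the `application.server` Kafka Streams property: each instance advertises its own host:port, which is propagated through the consumer-group metadata so that `metadataForKey` can resolve which instance owns a key. A sketch; host and port are illustrative:

```properties
application.id=interactive-queries-app
# advertised to the other instances via the group metadata
application.server=host1:4460
```

Note that Streams only advertises this endpoint; serving the HTTP/RPC traffic on that port is the application's job.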
105. 105
Interactive Queries
Once the data is located, the App forwards the call to the target node.
106. 106
Interactive Queries
Beware! The state store can be queried only in the RUNNING state:
→ Not during a rebalance
→ May impact your SLAs if you expose the data to your customers
107. 107
Interactive Queries
Solution? A second App cluster, but:
- More resources...
- One more hop
(diagram: Front End calls App cluster (a), which forwards queries to App cluster (b) holding the state stores, backed by the Kafka cluster)
109. 109
KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
110. 110
KSQL for Streaming ETL
(diagram: streams of business facts keyed by id, progressively joined and enriched into derived fact streams)
113. 113
KSQL for Streaming ETL
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
115. 115
User Defined Functions (UDF)
SELECT eventid, anomaly(sensorinput)
FROM sensor;

@Udf(description = "apply analytic model to sensor input")
public String anomaly(String sensorinput) { return your_logic; }
116. 116
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
118. 118
Plenty of KSQL Recipes
https://www.confluent.io/stream-processing-cookbook/
121. 121
KSQL: Enable Stream Processing using SQL-like Semantics
Example use cases:
• Streaming ETL
• Anomaly detection
• Event monitoring
Leverage the Kafka Streams API without any coding required; use any programming language; connect via the CLI or the Confluent Control Center user interface.
Architecture: clients (CLI, Confluent Control Center GUI) → REST API → KSQL server engine (runs queries) → Kafka cluster.
124. 124
Lowering the Bar to Enter the World of Streaming
Kafka user population, by coding sophistication:
- Core Java developers (Kafka Streams)
- Core developers who don’t use Java/Scala
- Data engineers, architects, DevOps/SRE
- BI analysts
126. 126
The Challenge of Data Compatibility at Scale: implicit → explicit!
Many sources without a policy cause mayhem in a centralized data pipeline.
Ensuring downstream systems can use the data is key to an operational stream pipeline.
Example: date formats. Even within a single application, different formats can be presented.
(diagram: App 1, App 2, App 3 producing, one of them emitting an incompatibly formatted message)
127. 127
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards-incompatible changes
● Support multi-datacenter environments
(diagram: App 1 and App 2 serializers check schemas against the Schema Registry before writing to the Kafka topic; example consumers: Elastic, Cassandra, HDFS)
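Concretely, a registered schema is typically an Avro record definition. The sketch below (record and field names illustrative) shows a backward-compatible evolution: the new optional `referrer` field has a default, so consumers can still read records written before the field existed:

```json
{
  "type": "record",
  "name": "PageView",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": null}
  ]
}
```

Removing a field without a default, by contrast, is the kind of change the registry's compatibility check rejects.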
133. 133
System Health
- Are all brokers and topics available?
- How much data is being processed?
- What can be tuned to improve performance?
End-to-End SLA Monitoring
- Does Kafka process all events in <15 seconds?
- Is the 8am report missing data?
- Are there duplicate events?
135. 135
Confluent Control Center: Cluster Health & Administration
Cluster health dashboard
• Monitor the health of your Kafka clusters and get alerts if any problems occur
• Measure system load, performance, and operations
• View aggregate statistics or drill down by broker or topic
Cluster administration
• Monitor topic configurations
136. 136
Consumer Lag Monitoring
- View consumer-partition lag across topics for a consumer group
- Alert on max consumer group lag across all topics
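Outside Control Center, the same lag figures are available from the stock Kafka CLI (group name illustrative); the output lists current offset, log-end offset, and lag per partition for each member:

```shell
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group
```

This is a handy cross-check when alerting thresholds fire and you want the raw per-partition numbers.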
142. 142
Resources – Community Slack and Mailing List
https://slackpass.io/confluentcommunity
https://groups.google.com/forum/#!forum/confluent-platform
149. 149
Kafka Provides a Central Nervous System for the Modern Digital Enterprise
Enabling companies to respond accurately and in real time to business events
150. 150
Thursday: Neil Avery
KAFKA - THE ASYNCHRONOUS MICROSERVICES RUNTIME FOR STATE, SCALE AND PERFORMANCE
Friday 14:30 - 15:15 - Florent & Loulou
APACHE KAFKA: PATTERNS / ANTI-PATTERNS
Friday 15:30 - 17:30 - Florent, Nicolas & Loulou
APACHE KAFKA - LES MAINS DEDANS (hands-on)