RETHINKING
Stream Processing
With Apache Kafka
2
Kafka the Streaming Data Platform
Timeline: 2013 2014 2015 2016 2017 2018
0.8 Intra-cluster replication
0.9 Data Integration (Connect API)
0.10 Data Processing (Streams API)
0.11 Exactly-once Semantics
1.0 Enterprise Ready
3
Your System, a Hotel
APP
4
Your System, a Hotel
Microservices
APP
APP
APP
Security Guard
Receptionist
Room Service
5
Your System, a Hotel
Distributed Apps
APP
Security Guard
Security Team
6
As developers, we want
to build APPS not INFRASTRUCTURE
7
We want our apps to be:
Scalable Elastic Fault-tolerant
Stateful Distributed
8
Where do I put my compute?
9
Where do I put my state?
10
The actual question is
Where is my code?
11
the KAFKA STREAMS API is a
JAVA API to
BUILD REAL-TIME APPLICATIONS to
POWER THE BUSINESS
12
App
Streams
API
Not running
inside brokers!
13
Brokers?
Nope!
App
Streams
API
App
Streams
API
App
Streams
API
Same app, many instances
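A minimal sketch of what "same app, many instances" tends to look like in configuration: every instance starts with the same `application.id`, and instances sharing that id divide the input partitions among themselves. The class name, the app name `hotel-app`, and the broker address are hypothetical placeholders; the keys are written as plain strings so the sketch needs nothing beyond the JDK.

```java
import java.util.Properties;

public class StreamsInstanceConfig {
    // Builds the settings one instance of the app would start with.
    // applicationId and bootstrapServers are illustrative placeholders.
    static Properties configFor(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Instances that share this id act as one logical application.
        props.put("application.id", applicationId);
        // Where the instance finds the Kafka cluster.
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        // Two instances of the same app: identical settings, started on any two machines.
        Properties instanceA = configFor("hotel-app", "localhost:9092");
        Properties instanceB = configFor("hotel-app", "localhost:9092");
        System.out.println(instanceA.equals(instanceB));
    }
}
```

Scaling out then means starting one more process with the same settings; no change to the brokers is involved.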
14
Before
Dashboard
Processing Cluster
Your Job
Shared Database
15
After
Dashboard
APP
Streams
API
16
this means you can
DEPLOY your app ANYWHERE using
WHATEVER TECHNOLOGY YOU WANT
17
Things Kafka Streams Does
Runs
everywhere
Clustering
done for you
Exactly-once
processing
Event-time
processing
Integrated
database
Joins, windowing,
aggregation
S/M/L/XL/XXL/XXXL
sizes
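To make "event-time processing" and "windowing" concrete, here is a plain-Java sketch of the idea, not Kafka Streams code. It assumes fixed-size, non-overlapping windows and non-negative timestamps in milliseconds; the class and method names are hypothetical. Each event is counted in the window its own timestamp falls into, regardless of when it arrives.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeWindows {
    // Start of the fixed-size window that contains the given event time.
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    // Counts events per window, keyed by window start, using event time only.
    static Map<Long, Long> countPerWindow(List<Long> eventTimesMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long eventTimeMs : eventTimesMs) {
            counts.merge(windowStart(eventTimeMs, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Arrival order differs from event order: the 59s event shows up last.
        List<Long> arrivals = Arrays.asList(1_000L, 61_000L, 59_000L);
        System.out.println(countPerWindow(arrivals, 60_000L)); // {0=2, 60000=1}
    }
}
```

The late event at 59 seconds still lands in the first one-minute window, which is the behavior that processing by arrival time would get wrong.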
18
First, some
API CONCEPTS
19
STREAMS are
EVERYWHERE
20
TABLES are EVERYWHERE
21
STREAMS <-> TABLES
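The stream/table relationship can be sketched in plain Java, without any Kafka dependency. Assumptions: a stream is modeled as an ordered list of key/value records, a table as a map holding the latest value per key; the class name and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {
    // Stream -> table: replay the records in order; the latest value per key wins.
    static Map<String, String> toTable(List<String[]> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] record : stream) {
            table.put(record[0], record[1]);
        }
        return table;
    }

    // Table -> stream: emit one record per current row.
    static List<String[]> toStream(Map<String, String> table) {
        List<String[]> stream = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            stream.add(new String[] {row.getKey(), row.getValue()});
        }
        return stream;
    }

    public static void main(String[] args) {
        List<String[]> changelog = Arrays.asList(
            new String[] {"alice", "Paris"},
            new String[] {"bob", "Nice"},
            new String[] {"alice", "Rome"}); // update for an existing key
        Map<String, String> table = toTable(changelog);
        System.out.println(table); // {alice=Rome, bob=Nice}
        // Replaying the table's own stream rebuilds the same table.
        System.out.println(toTable(toStream(table)).equals(table)); // true
    }
}
```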
22
// Example: reading data from Kafka
KStream<byte[], String> textLines = builder.stream("textlines-topic",
    Consumed.with(Serdes.ByteArray(), Serdes.String()));
// Example: transforming data
KStream<byte[], String> upperCasedLines = textLines.mapValues(value -> value.toUpperCase());
KStream
23
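// Example: aggregating data
KTable<String, Long> wordCounts = textLines
    // Split each line into lowercase words on runs of non-word characters
    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
    // Re-key each record by the word itself
    .groupBy((key, word) -> word)
    // Count occurrences per word
    .count();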
KTable
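To see what the aggregation computes, here is a plain-Java stand-in that mimics the split, group, and count steps with an in-memory map. This is a sketch of the result only, not the Streams API; the class name and the sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Returns the latest count per word after all lines have been processed.
    static Map<String, Long> count(List<String> textLines) {
        Map<String, Long> wordCounts = new TreeMap<>();
        for (String line : textLines) {
            // Same split as the slide: lowercase, then break on non-word characters.
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    // Keyed by word; each occurrence increments the running count.
                    wordCounts.merge(word, 1L, Long::sum);
                }
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("Hello Kafka", "hello streams")));
        // {hello=2, kafka=1, streams=1}
    }
}
```

The map plays the role of the table here: it always holds the current count per word, and every incoming line updates it.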
24
DEMO
25
Remember, we want to build
APPS not
INFRASTRUCTURE
26
the KAFKA STREAMS API is a
JAVA API to
BUILD REAL-TIME APPLICATIONS to
POWER THE BUSINESS
27
Things Kafka Streams Does
Runs
everywhere
Clustering
done for you
Exactly-once
processing
Event-time
processing
Integrated
database
Joins, windowing,
aggregation
S/M/L/XL/XXL/XXXL
sizes
THANK YOU!

Riviera Jug - 20/03/2018 - Kafka streams