Kafka Streams
Stream Processing Made Simple with Kafka
1
Guozhang Wang
Hadoop Summit, June 28, 2016
2
What is NOT Stream Processing?
3
Stream Processing isn’t (necessarily)
• Transient, approximate, lossy…
• .. so that you must have batch processing as a safety net
4
5
6
7
8
Stream Processing
• A different programming paradigm
• .. that brings computation to unbounded data
• .. with tradeoffs between latency / cost / correctness
9
Why Kafka in Stream Processing?
10
• Persistent Buffering
• Logical Ordering
• Scalable “source-of-truth”
Kafka: Real-time Platforms
11
Stream Processing with Kafka
12
• Option I: Do It Yourself !
Stream Processing with Kafka
while (isRunning) {
    // read some messages from Kafka
    inputMessages = consumer.poll();
    // do some processing…
    // send output messages back to Kafka
    producer.send(outputMessages);
}
14
15
DIY Stream Processing is Hard
• Ordering
• Partitioning & Scalability
• Fault tolerance
• State Management
• Time, Window & Out-of-order Data
• Re-processing
16
• Option I: Do It Yourself !
• Option II: full-fledged stream processing system
• Storm, Spark, Flink, Samza, ..
Stream Processing with Kafka
17
MapReduce Heritage?
• Config Management
• Resource Management
• Deployment
• etc..
Can I just use my own?!
20
• Option I: Do It Yourself !
• Option II: full-fledged stream processing system
• Option III: lightweight stream processing library
Stream Processing with Kafka
Kafka Streams
• In Apache Kafka since v0.10, May 2016
• Powerful yet easy-to-use stream processing library
• Event-at-a-time, Stateful
• Windowing with out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
21
22
Anywhere, anytime
Ok. Ok. Ok. Ok.
23
Anywhere, anytime
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>0.10.0.0</version>
</dependency>
24
Anywhere, anytime
War File
Rsync
Puppet/Chef
YARN
Mesos
Docker
Kubernetes
Very Uncool Very Cool
25
Simple is Beautiful
Kafka Streams DSL
26
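// imports assumed for the 0.10.0 API
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

public static void main(String[] args) {
    // DSL builder; config is the StreamsConfig shown on the "Native Kafka Integration" slide
    KStreamBuilder builder = new KStreamBuilder();
    // specify the processing topology by first reading in a stream from a topic
    KStream<String, String> words = builder.stream("topic1");
    // count the words in this stream as an aggregated table
    KTable<String, Long> counts = words.countByKey("Counts");
    // write the result table to a new topic
    counts.to("topic2");
    // create a stream processing instance and start running it
    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();
}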
32
Native Kafka Integration
Properties cfg = new Properties();
cfg.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
cfg.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
cfg.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
cfg.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
cfg.put(KafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "registry:8081");
StreamsConfig config = new StreamsConfig(cfg);
…
KafkaStreams streams = new KafkaStreams(builder, config);
34
API, coding
“Full stack” evaluation
Operations, debugging, …
Simple is Beautiful
36
Key Idea:
Outsource hard problems to Kafka!
Kafka Concepts: the Log
[Figure: messages in the log at increasing offsets; the producer writes at the tail, Consumer1 reads at offset 7, Consumer2 reads at offset 10]
[Figure: producers write to the partitions of Topic 1 and Topic 2, which are hosted on brokers; consumers read from them]
Kafka Concepts: the Log
39
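The log on the slide above can be sketched in plain Java without the Kafka client. The `CommitLog` class below is hypothetical (it is not a Kafka API): it appends messages at increasing offsets, and each reader supplies its own position, so a consumer at offset 7 and another at offset 10 do not interfere with each other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one Kafka partition; not a Kafka API.
final class CommitLog {
    private final List<String> messages = new ArrayList<>();

    // Append at the tail and return the offset assigned to the message.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Read up to maxMessages starting at the given offset; reading never modifies the log.
    List<String> read(long fromOffset, int maxMessages) {
        int start = (int) Math.min(fromOffset, messages.size());
        int end = Math.min(start + maxMessages, messages.size());
        return new ArrayList<>(messages.subList(start, end));
    }

    public static void main(String[] args) {
        CommitLog log = new CommitLog();
        for (int i = 0; i <= 12; i++) {
            log.append("m" + i);
        }
        // Each consumer tracks its own offset, so the two reads are independent.
        System.out.println("Consumer1 @7:  " + log.read(7, 3));
        System.out.println("Consumer2 @10: " + log.read(10, 3));
    }
}
```

Because reads leave the log untouched, a consumer can also rewind to an earlier offset and re-read, which is what makes re-processing possible.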
Kafka Streams: Key Concepts
Stream and Records
40
[Figure: a Stream drawn as a sequence of Records, each Record a Key and a Value]
Processor Topology
41
Stream
Processor Topology
42
Stream
Processor
Processor Topology
43
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");
Processor Topology
48
Source Processor: the builder.stream(...) calls
Sink Processor: the aggregated.to(...) call
Processor Topology
49
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");

builder.addSource("Source1", "topic1")
       .addSource("Source2", "topic2")
       .addProcessor("Join", MyJoin::new, "Source1", "Source2")
       .addProcessor("Aggregate", MyAggregate::new, "Join")
       .addStateStore(Stores.persistent().build(), "Aggregate")
       .addSink("Sink", "topic3", "Aggregate");
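To make the idea of a processor topology concrete, here is a minimal sketch in plain Java. `MiniTopology` and its methods are hypothetical names that only mirror the shape of the calls on the slide; they are not the Kafka Streams API, and the "Upper" processor is an invented stand-in for the join and aggregate steps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical miniature of a processor topology; not the Kafka Streams API.
final class MiniTopology {
    private final Map<String, Function<String, String>> processors = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, List<String>> sinkOutput = new HashMap<>();

    // A source simply forwards each record to its children.
    MiniTopology addSource(String name) {
        return addProcessor(name, Function.identity());
    }

    // A processor transforms a record and forwards the result; parents must already exist.
    MiniTopology addProcessor(String name, Function<String, String> fn, String... parents) {
        processors.put(name, fn);
        children.put(name, new ArrayList<>());
        for (String parent : parents) {
            children.get(parent).add(name);
        }
        return this;
    }

    // A sink records what reaches it, standing in for a write to an output topic.
    MiniTopology addSink(String name, String... parents) {
        sinkOutput.put(name, new ArrayList<>());
        return addProcessor(name, Function.identity(), parents);
    }

    // Push one record into a node and let it flow depth-first through the graph.
    void process(String node, String record) {
        String result = processors.get(node).apply(record);
        if (sinkOutput.containsKey(node)) {
            sinkOutput.get(node).add(result);
        }
        for (String child : children.get(node)) {
            process(child, result);
        }
    }

    List<String> output(String sink) {
        return sinkOutput.get(sink);
    }

    public static void main(String[] args) {
        MiniTopology topology = new MiniTopology()
            .addSource("Source1")
            .addSource("Source2")
            .addProcessor("Upper", String::toUpperCase, "Source1", "Source2")
            .addSink("Sink", "Upper");
        topology.process("Source1", "hello");
        topology.process("Source2", "world");
        System.out.println(topology.output("Sink"));
    }
}
```

Records are handled one at a time, which matches the event-at-a-time model described earlier in the deck.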
Processor Topology
53
[Figure: topology diagram spanning Kafka Streams and Kafka]
Processor Topology
54
…
sink1.to("topic1");
source1 = builder.table("topic1");
source2 = sink1.through("topic2");
…
Sub-Topology
Processor Topology
59
[Figure sequence: topology diagram spanning Kafka Streams and Kafka]
Stream Partitions and Tasks
63
[Figure sequence: Kafka Topic A and Kafka Topic B, each with partitions P1 and P2; the processor topology is shown over these partitions and then as Task1 and Task2]
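A minimal sketch of how partitions could map to tasks, assuming the simple rule that task i takes partition i of every input topic. `TaskPlanner` is a hypothetical helper written for illustration; it is not part of Kafka Streams, and the real assignment logic handles more cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only mimics how input partitions could be grouped into tasks.
final class TaskPlanner {
    // topicPartitions maps a topic name to its partition count.
    // Task i receives partition i of every topic that has such a partition.
    static List<List<String>> plan(Map<String, Integer> topicPartitions) {
        int taskCount = 0;
        for (int count : topicPartitions.values()) {
            taskCount = Math.max(taskCount, count);
        }
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            List<String> assigned = new ArrayList<>();
            for (Map.Entry<String, Integer> topic : topicPartitions.entrySet()) {
                if (i < topic.getValue()) {
                    assigned.add(topic.getKey() + "-P" + (i + 1));
                }
            }
            tasks.add(assigned);
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("TopicA", 2);
        topics.put("TopicB", 2);
        List<List<String>> tasks = plan(topics);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("Task" + (i + 1) + " -> " + tasks.get(i));
        }
    }
}
```

Since each task owns a fixed set of partitions, tasks can be moved between application instances or threads without changing what any single task processes.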
Stream Threads
68
[Figure sequence: a single instance, MyApp.1, runs Task1 and Task2 over Kafka Topic A and Kafka Topic B; instances MyApp.2 and MyApp.3 and a Task3 are added in later frames; the final frames show Thread1 and Thread2 with Task1 to Task4]
78
Stream Processing Hard Parts
• Ordering
• Partitioning & Scalability
• Fault tolerance
• State Management
• Time, Window & Out-of-order Data
• Re-processing
States in Stream Processing
79
• Stateless: filter, map
• Stateful: join, aggregate
80
States in Stream Processing
81
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");
State
82
builder.addSource("Source1", "topic1")
       .addSource("Source2", "topic2")
       .addProcessor("Join", MyJoin::new, "Source1", "Source2")
       .addProcessor("Aggregate", MyAggregate::new, "Join")
       .addStateStore(Stores.persistent().build(), "Aggregate")
       .addSink("Sink", "topic3", "Aggregate");
State
States in Stream Processing
83
[Figure: Task1 and Task2 over Kafka Topic A and Kafka Topic B, with a State shown for each task]
It’s all about Time
• Event-time (when an event is created)
• Processing-time (when an event is processed)
84
85
Out-of-Order
Film                       Event-time (episode)   Processing-time (release year)
The Phantom Menace         1                      1999
Attack of the Clones       2                      2002
Revenge of the Sith        3                      2005
A New Hope                 4                      1977
The Empire Strikes Back    5                      1980
Return of the Jedi         6                      1983
The Force Awakens          7                      2015
Timestamp Extractor
86
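// processing-time: the wall-clock time at which the record is handled
public long extract(ConsumerRecord<Object, Object> record) {
    return System.currentTimeMillis();
}
// event-time: the timestamp carried by the record itself
public long extract(ConsumerRecord<Object, Object> record) {
    return record.timestamp();
}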
Timestamp Extractor
89
// processing-time
public long extract(ConsumerRecord<Object, Object> record) {
    return System.currentTimeMillis();
}
// event-time, read from a "timestamp" field in the record's JSON value
public long extract(ConsumerRecord<Object, Object> record) {
    return ((JsonNode) record.value()).get("timestamp").longValue();
}
Windowing
90
[Figure sequence: records grouped into windows along the time axis t]
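A minimal sketch of event-time windowing with out-of-order data, in plain Java. `TumblingCounter` is a hypothetical class, not the Kafka Streams windowing API; it assumes fixed-size, non-overlapping windows and non-negative timestamps in milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of event-time tumbling windows; not the Kafka Streams windowing API.
final class TumblingCounter {
    private final long windowSizeMs;
    // Window start time -> number of records whose event-time falls in that window.
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    TumblingCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // The window is chosen from the record's own timestamp, not from arrival order,
    // so a late record still updates the window it belongs to.
    void add(long eventTimeMs) {
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        counts.merge(windowStart, 1L, Long::sum);
    }

    Map<Long, Long> counts() {
        return counts;
    }

    public static void main(String[] args) {
        TumblingCounter counter = new TumblingCounter(10);
        long[] arrivals = {1, 12, 25, 3, 14};   // 3 and 14 arrive out of order
        for (long t : arrivals) {
            counter.add(t);
        }
        System.out.println(counter.counts());
    }
}
```

The sketch keeps every window open forever; a real system also has to decide how long to retain old windows, which is one of the latency, cost and correctness tradeoffs named at the start of the deck.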
97
Stream vs. Table?
98
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");
State
99
Tables ≈ Streams
100
101
102
The Stream-Table Duality
• A stream is a changelog of a table
• A table is a materialized view of a stream at a point in time
• Example: change data capture (CDC) of databases
103
KStream = interprets data as a record stream
~ think: "append-only"
KTable = interprets data as a changelog stream
~ continuously updated materialized view
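The duality can be sketched with plain collections. The `Duality` class and its method names are hypothetical, chosen for illustration: replaying a changelog yields a table, and listing a table's rows yields a changelog that rebuilds the same table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the stream-table duality using plain collections.
final class Duality {
    // Table from stream: replay the changelog, letting later records overwrite earlier ones per key.
    static Map<String, String> materialize(List<String[]> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] record : changelog) {
            table.put(record[0], record[1]);
        }
        return table;
    }

    // Stream from table: emit one change record per current row.
    static List<String[]> toChangelog(Map<String, String> table) {
        List<String[]> changelog = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            changelog.add(new String[] {row.getKey(), row.getValue()});
        }
        return changelog;
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();
        changelog.add(new String[] {"alice", "lnkd"});
        changelog.add(new String[] {"bob", "googl"});
        changelog.add(new String[] {"alice", "msft"});
        Map<String, String> table = materialize(changelog);
        System.out.println(table);
        // Replaying the table's own changelog rebuilds the same table.
        System.out.println(materialize(toChangelog(table)).equals(table));
    }
}
```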
104
105
KStream: User purchase history
  records over time: (alice, eggs), (bob, lettuce), (alice, milk)
  after the first record: "Alice bought eggs."
  after the third record: "Alice bought eggs and milk."
KTable: User employment profile
  records over time: (alice, lnkd), (bob, googl), (alice, msft)
  after the first record: "Alice is now at LinkedIn."
  after the third record: "Alice is now at Microsoft." (LinkedIn is replaced)
108
records over time: (alice, 2), (bob, 10), (alice, 3)
after the first record:
  KStream.aggregate(): (key: Alice, value: 2)
  KTable.aggregate():  (key: Alice, value: 2)
after the third record:
  KStream.aggregate(): (key: Alice, value: 2+3)
  KTable.aggregate():  (key: Alice, value: 3, replacing 2)
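The difference between the two readings can be sketched in plain Java, assuming the aggregate is a per-key sum. `AggregateSemantics` and its methods are hypothetical names, not Kafka Streams calls.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting two readings of the same records; a per-key sum is assumed.
final class AggregateSemantics {
    // Record-stream reading: each record is a new fact, so values for a key accumulate.
    static Map<String, Integer> sumAsStream(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return sums;
    }

    // Changelog reading: each record replaces the previous value for its key,
    // so only the latest value per key remains.
    static Map<String, Integer> sumAsTable(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            latest.put(r.getKey(), r.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("alice", 2));
        records.add(new SimpleEntry<>("bob", 10));
        records.add(new SimpleEntry<>("alice", 3));
        System.out.println("as stream: " + sumAsStream(records));
        System.out.println("as table:  " + sumAsTable(records));
    }
}
```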
110
KStream → KTable: reduce(), aggregate(), …
KTable → KStream: toStream()
KStream → KStream: map(), filter(), join(), …
KTable → KTable: map(), filter(), join(), …
111
Updates Propagation in KTable
[Figure sequence: KStream stream1 and KStream stream2 feed KStream joined, which feeds KTable aggregated and its State]
115
116
Remember?
117
Fault Tolerance
[Figure sequence: Kafka Streams sits between input and output Kafka topics; each instance pairs Process with State, the State is backed by a Kafka Changelog, and a Protocol label appears between instances in the later frames]
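The role of the changelog can be sketched in plain Java. `ChangelogStore` is a hypothetical class and the shared list stands in for a Kafka changelog topic; the sketch assumes every state update is appended to the changelog and that a replacement instance rebuilds its state by replaying it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of changelog-backed state; the list stands in for a Kafka changelog topic.
final class ChangelogStore {
    private final Map<String, Long> state = new HashMap<>();
    private final List<String[]> changelog;

    // On start-up, rebuild local state by replaying whatever the changelog already holds.
    ChangelogStore(List<String[]> changelog) {
        this.changelog = changelog;
        for (String[] entry : changelog) {
            state.put(entry[0], Long.parseLong(entry[1]));
        }
    }

    // Every local update is also appended to the changelog so it outlives this instance.
    void put(String key, long value) {
        state.put(key, value);
        changelog.add(new String[] {key, Long.toString(value)});
    }

    Long get(String key) {
        return state.get(key);
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();   // outlives any single instance
        ChangelogStore first = new ChangelogStore(changelog);
        first.put("alice", 2L);
        first.put("alice", 5L);
        first.put("bob", 10L);
        // The first instance "fails"; a replacement restores the same state from the changelog.
        ChangelogStore replacement = new ChangelogStore(changelog);
        System.out.println(replacement.get("alice") + " " + replacement.get("bob"));
    }
}
```

This is the stream-table duality applied to recovery: the state is the table, and the changelog is the stream that can rebuild it.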
120
121
122
123
124
125
Stream Processing Hard Parts
• Ordering
• Partitioning & Scalability
• Fault tolerance
• State Management
• Time, Window & Out-of-order Data
• Re-processing
Simple is Beautiful
Ongoing Work (0.10+)
• Beyond Java APIs
• SQL support, Python client, etc
• End-to-End Semantics (exactly-once)
• Queryable States
• … and more
126
Queryable States
127
State
Real-time Analytics
select Count(*), Sum(*)
from "MyAgg"
where windowId > now() - 10;
128
But how to get data in / out Kafka?
129
130
131
132
Take-aways
• Stream Processing: a new programming paradigm
• Kafka Streams: stream processing made easy
135
THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
Visit Confluent at the Syncsort Booth (#1303), live demos @ 29th
Download Kafka Streams: www.confluent.io/product
136
We are Hiring!

 
Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptx
 
Attraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxAttraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptx
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdfDR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
DR PROF ING GURUDUTT SAHNI WIKIPEDIA.pdf
 

Introduction to Kafka Streams

Editor's Notes

  1. Thank you.
  2. Stream processing has become widely popular. Batch systems in the style of Hadoop or Spark take a bounded data set, usually the output of an ETL process, and start processing only once that data set is complete, which can be much later than when the data was originally generated. Stream processing, by contrast, is a real-time, continuous process over unbounded data series, usually handling a small set of records, or even one record, at a time. Today, a common place to store these data streams is Kafka.
  3. Stream processing is a fundamental complement to capturing streams of data.
  4. This kind of run-as-a-service operational pattern comes from the Hadoop community.
  5. We think there should be an even better solution.
  6. No extra dependency, no enforced operational cost. In addition, it should support
  7. Again, in implementation such changelog streams should be compactable.
  8. Take all the organization's data and put it into a central place for real-time subscription. Data integration, replication, real-time stream processing.
  9. WAL (write-ahead log)
  10. Streaming on Message Pipes
  11. Batching: wait for all the data to be available. Reasoning about time is essential for dealing with unbounded, unordered data of varying event-time skew. Not all use cases care about event times (and if yours doesn’t, hooray, your life is easier), but many do: billing, monitoring, anomaly detection.
  12. Talk about stream synchronization
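
The remark in note 7 that changelog streams should be compactable can be illustrated with a small sketch. This is not Kafka's implementation. It only shows the idea that replaying a changelog and keeping the latest value per key yields the same table as replaying the full history. The `Change` class, the `compact` method, and the convention that a `null` value marks a deleted key are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogCompaction {

    // Hypothetical changelog record: a key plus its new value.
    // Assumption for this sketch: a null value means the key was deleted.
    static final class Change {
        final String key;
        final Integer value;

        Change(String key, Integer value) {
            this.key = key;
            this.value = value;
        }
    }

    // Replays the changelog in order and keeps only the latest value per key.
    static Map<String, Integer> compact(List<Change> changelog) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Change change : changelog) {
            if (change.value == null) {
                latest.remove(change.key);   // delete marker: drop the key
            } else {
                latest.put(change.key, change.value);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Change> changelog = new ArrayList<>();
        changelog.add(new Change("alice", 1));
        changelog.add(new Change("bob", 5));
        changelog.add(new Change("alice", 3));   // supersedes alice=1
        changelog.add(new Change("bob", null));  // deletes bob

        // Four changelog records collapse to a single live entry.
        System.out.println(compact(changelog));  // prints {alice=3}
    }
}
```

Because only the last update per key matters for rebuilding state, older records for the same key can be discarded without changing the result, which is what keeps a changelog bounded in size.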