1
What’s new in Kafka 0.10.0
Introducing Kafka Streams
Eno Thereska
eno@confluent.io
Kafka Meetup, July 21, 2016
Slide contributions: Michael Noll and Ismael Juma
enothereska
2
What’s new in Kafka 0.10.0
1. Lots of new KIPs in
1. KIP-4 metadata
2. KIP-31 Relative offsets in compressed message sets
3. KIP-32 Add timestamps to Kafka message
4. KIP-35 Retrieving protocol version
5. KIP-36 Rack aware replica assignment
6. KIP-41 KafkaConsumer Max Records
7. KIP-42: Add Producer and Consumer Interceptors
8. KIP-45 Standardize all client sequence interaction
9. KIP-43 Kafka SASL enhancements
10. KIP-57 - Interoperable LZ4 Framing
11. KIP-51 - List Connectors REST API
12. KIP-52: Connector Control APIs
13. KIP-56: Allow cross origin HTTP requests on all HTTP methods
2. Kafka Streams
3
Kafka Streams
• Powerful yet easy-to-use Java library
• Part of open source Apache Kafka, introduced in v0.10, May 2016
• Source code: https://github.com/apache/kafka/tree/trunk/streams
• Build your own stream processing applications that are
• highly scalable
• fault-tolerant
• distributed
• stateful
• able to handle late-arriving, out-of-order data
4
Kafka Streams
5
When to use Kafka Streams (as of Kafka 0.10)
Recommended use cases
• Application Development
• “Fast Data” apps (small or big data)
• Reactive and stateful applications
• Linear streams
• Event-driven systems
• Continuous transformations
• Continuous queries
• Microservices
Questionable use cases
• Data Science / Data Engineering
• “Heavy lifting”
• Data mining
• Non-linear, branching streams (graphs)
• Machine learning, number crunching
• What you’d do in a data warehouse
6
Alright, can you show me some code now? 
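// Assumes the 0.10.0 DSL entry point, org.apache.kafka.streams.kstream.KStreamBuilder
KStreamBuilder builder = new KStreamBuilder();
KStream<Integer, Integer> input = builder.stream("numbers-topic");
// Stateless computation: double every value
KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);
// Stateful computation: keep the odd values, move them under a single key, and sum them
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .reduceByKey((v1, v2) -> v1 + v2, "sum-of-odds");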
• API option 1: Kafka Streams DSL (declarative)
7
Alright, can you show me some code now? 
Startup
Process a record
Periodic action
Shutdown
• API option 2: low-level Processor API (imperative)
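The four stages named on this slide (startup, process a record, periodic action, shutdown) can be sketched in plain Java. This is a minimal illustration that runs without Kafka on the classpath: the `MiniProcessor` interface, its method names, and `CountingProcessor` are hypothetical stand-ins defined here, not the Kafka Streams Processor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ProcessorLifecycleDemo {
    // Hypothetical lifecycle interface for this sketch; not the Kafka Streams API.
    interface MiniProcessor<K, V> {
        void init();                      // Startup
        void process(K key, V value);     // Process a record
        void punctuate(long timestampMs); // Periodic action
        void close();                     // Shutdown
    }

    // Counts records per key and emits a snapshot of the counts on each periodic action.
    static class CountingProcessor implements MiniProcessor<String, String> {
        private Map<String, Integer> counts;
        final List<String> emitted = new ArrayList<>();

        @Override public void init() { counts = new TreeMap<>(); }

        @Override public void process(String key, String value) {
            counts.merge(key, 1, Integer::sum);
        }

        @Override public void punctuate(long timestampMs) {
            emitted.add(timestampMs + ":" + counts);
        }

        @Override public void close() { counts.clear(); }
    }

    // Drives one processor through the four stages with hand-written input.
    static List<String> run() {
        CountingProcessor processor = new CountingProcessor();
        processor.init();
        processor.process("alice", "click");
        processor.process("bob", "click");
        processor.process("alice", "click");
        processor.punctuate(1000L);
        processor.close();
        return processor.emitted;
    }

    public static void main(String[] args) {
        System.out.println(run()); // one snapshot, taken at the periodic action
    }
}
```

The imperative style trades brevity for control: state, timing, and cleanup are all explicit, where the DSL hides them behind operators.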
8
How do I install Kafka Streams?
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.0.0</version>
</dependency>
• There is and there should be no “install”.
• It’s a library. Add it to your app like any other library.
9
Do I need to install a CLUSTER to run my apps?
• No, you don’t. Kafka Streams allows you to stay lean and lightweight.
• Unlearn bad habits: “do cool stuff with data != must have cluster”
Ok. Ok. Ok. Ok.
10
How do I package and deploy my apps? How do I …?
11
How do I package and deploy my apps? How do I …?
• Whatever works for you. Stick to what you or your company thinks is the best way.
• Why? Because an app that uses Kafka Streams is…a normal Java app.
• Your Ops/SRE/InfoSec teams may finally start to love you, not hate you.
12
Kafka
concepts recap
13
Kafka concepts
14
Kafka concepts
15
Kafka Streams
concepts
16
Stream: ordered, re-playable, fault-tolerant sequence of immutable data records
17
Processor topology: computational logic of an app’s data processing
18
Stream partitions and stream tasks: units of parallelism
19
Streams meet Tables
A stream is a changelog of a table
A table is a materialized view of a stream at a point in time
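The duality can be checked with ordinary collections. The sketch below is plain Java, not the Kafka Streams API; the class name, helper names, and sample data are assumptions made for illustration. Replaying a changelog builds the table, and emitting the table's rows gives a changelog that rebuilds the same table.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StreamTableDuality {
    // Stream -> table: replay the changelog; a later update for a key overwrites the earlier one.
    static Map<String, Integer> toTable(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> update : changelog) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    // Table -> stream: emit each current row as one update record.
    static List<Map.Entry<String, Integer>> toChangelog(Map<String, Integer> table) {
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            changelog.add(new SimpleImmutableEntry<>(row.getKey(), row.getValue()));
        }
        return changelog;
    }

    // Illustrative changelog (made-up data): alice is updated twice.
    static List<Map.Entry<String, Integer>> sampleChangelog() {
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        changelog.add(new SimpleImmutableEntry<>("alice", 1));
        changelog.add(new SimpleImmutableEntry<>("bob", 1));
        changelog.add(new SimpleImmutableEntry<>("alice", 2));
        changelog.add(new SimpleImmutableEntry<>("charlie", 3));
        return changelog;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = toTable(sampleChangelog());
        System.out.println(table);
        // Round trip: the table's own changelog rebuilds the same table.
        System.out.println(toTable(toChangelog(table)).equals(table));
    }
}
```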
20
Streams meet Tables – in the Kafka Streams DSL
Input records over time: (alice, 2), (bob, 10), (alice, 3)
KStream
= interprets data as record stream
“Alice clicked 2 times.” → “Alice clicked 2+3 = 5 times.”
KTable
= interprets data as changelog stream
~ is a continuously updated materialized view
“Alice clicked 2 times.” → “Alice clicked 3 times.” (the 3 replaces the 2)
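The two readings of the same three records from this slide can be compared side by side. This is a plain-Java sketch under stated assumptions: the class and method names are hypothetical, and summing is used as the record-stream aggregation because that is what the slide's "2+3 = 5" shows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RecordVsChangelog {
    // The records from the slide, in arrival order: alice 2, bob 10, alice 3.
    static final String[] KEYS = {"alice", "bob", "alice"};
    static final int[] VALUES = {2, 10, 3};

    // Record-stream reading: every record is an independent fact, so values add up.
    static Map<String, Integer> asRecordStream() {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            totals.merge(KEYS[i], VALUES[i], Integer::sum);
        }
        return totals;
    }

    // Changelog reading: every record is an update, so the latest value per key wins.
    static Map<String, Integer> asChangelog() {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            latest.put(KEYS[i], VALUES[i]);
        }
        return latest;
    }

    public static void main(String[] args) {
        System.out.println("record stream: " + asRecordStream()); // alice totals 5
        System.out.println("changelog:     " + asChangelog());    // alice ends at 3
    }
}
```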
21
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
22
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
23
Streams meet Tables – in the Kafka Streams DSL
• JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
Input KStream: (alice, 13), (bob, 5)
leftJoin() w/ KTable → KStream: (alice, (europe, 13)), (bob, (europe, 5))
map() → KStream: (europe, 13), (europe, 5)
reduceByKey(_ + _) → KTable: (europe, 13), then (europe, 18)
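The data flow of this join can be traced with plain collections. The sketch below is not the Kafka Streams DSL; it is a hand-rolled simulation of the three steps (leftJoin, map, reduceByKey) on the slide's sample data. The class name and the "UNKNOWN" placeholder for users without a region are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ClicksByRegion {
    static Map<String, Integer> compute() {
        // Table side of the join: user -> region.
        Map<String, String> regionByUser = new LinkedHashMap<>();
        regionByUser.put("alice", "europe");
        regionByUser.put("bob", "europe");

        // Stream side: (user, clicks) records in arrival order.
        String[] users = {"alice", "bob"};
        int[] clicks = {13, 5};

        Map<String, Integer> clicksByRegion = new LinkedHashMap<>();
        for (int i = 0; i < users.length; i++) {
            // leftJoin: look up the user's region; a user missing from the table yields null.
            String region = regionByUser.get(users[i]);
            // map: re-key the record by region ("UNKNOWN" is a hypothetical placeholder).
            String key = (region == null) ? "UNKNOWN" : region;
            // reduceByKey(_ + _): add this record's clicks to the running total for the region.
            clicksByRegion.merge(key, clicks[i], Integer::sum);
        }
        return clicksByRegion;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // europe accumulates 13, then 13 + 5
    }
}
```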
24
Streams meet Tables – in the Kafka Streams DSL
25
Kafka Streams
key features
26
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
27
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations (e.g. joins, aggregations)
28
Fault tolerance
State stores
29
Fault tolerance
State stores
charlie 3
bob 1
alice 1
alice 2
30
Fault tolerance
State stores
31
Fault tolerance
State stores alice 1
alice 2
32
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations
• Time model
33
Time
34
Time
35
Time
• You configure the desired time semantics through timestamp extractors
• Default extractor yields event-time semantics
• Extracts embedded timestamps of Kafka messages (introduced in v0.10)
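The idea of choosing time semantics by swapping an extractor can be sketched without Kafka. Everything named below (`Message`, `Extractor`, the two constants, the sample timestamp) is hypothetical and defined only for this illustration; it is not the library's timestamp extractor interface.

```java
class TimeSemanticsDemo {
    // Hypothetical message type: a payload plus the timestamp embedded in the message.
    static final class Message {
        final long embeddedTimestampMs;
        final String value;

        Message(long embeddedTimestampMs, String value) {
            this.embeddedTimestampMs = embeddedTimestampMs;
            this.value = value;
        }
    }

    // Hypothetical extractor contract: map a message to the timestamp used for processing.
    interface Extractor {
        long extract(Message message);
    }

    // Event-time: use the timestamp carried by the message itself.
    static final Extractor EVENT_TIME = message -> message.embeddedTimestampMs;

    // Processing-time: ignore the message and read the wall clock when it is handled.
    static final Extractor PROCESSING_TIME = message -> System.currentTimeMillis();

    public static void main(String[] args) {
        Message click = new Message(1469059200000L, "click"); // arbitrary sample timestamp
        System.out.println("event time:      " + EVENT_TIME.extract(click));
        System.out.println("processing time: " + PROCESSING_TIME.extract(click));
    }
}
```

With event-time, replaying old data gives the same timestamps every time; with processing-time, the result depends on when the application happens to run.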
36
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations
• Time model
• Windowing
37
Key features in 0.10
• Native, 100%-compatible Kafka integration
• Also inherits Kafka’s security model, e.g. to encrypt data-in-transit
• Uses Kafka as its internal messaging layer, too
• Highly scalable
• Fault-tolerant
• Elastic
• Stateful and stateless computations
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works)
38
Where to go from here?
• Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0
• http://kafka.apache.org/
• http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm)
• Kafka Streams demos at https://github.com/confluentinc/examples
• Java 7, Java 8+ with lambdas, and Scala
• WordCount, Joins, Avro integration, Top-N computation, Windowing, …
• Apache Kafka documentation: http://kafka.apache.org/documentation.html
• Confluent documentation: http://docs.confluent.io/3.0.0/streams/
• Quickstart, Concepts, Architecture, Developer Guide, FAQ
• Join our bi-weekly Ask Me Anything sessions on Kafka Streams
• Contact me at eno@confluent.io for details
39
Some of the things to come
• Exactly-once semantics
• Queryable state – tap into the state of your applications (KIP-67: adopted)
• SQL interface
• Listen to and collaborate with the developer community
• Your feedback counts a lot! Share it via users@kafka.apache.org
40
Want to contribute to Kafka and open source?
Join the Kafka community
http://kafka.apache.org/
Questions, comments? Tweet with #bbuzz and /cc to @ConfluentInc
…in a great team with the creators of Kafka?
Confluent is hiring 
http://confluent.io/
41
Backup
42
Details on other KIPs
(Slides contributed by Ismael Juma)
43
KIP-4 Metadata
- Update MetadataRequest and MetadataResponse
- Expose new fields for KIP-4 - not used yet
- Make it possible to ask for cluster information with no topics
- Fix nasty bug where request would be repeatedly sent if producer was started and unused for more than 5 minutes
- KAFKA-3602
44
KIP-31 Relative offsets in compressed message sets
- Message format change (affects FetchRequest, ProduceRequest and on-disk format)
- Avoids recompression to assign offsets
- Improves broker latency
- Should also improve throughput, but it can change producer batch sizes and so reduce throughput in some cases; tune linger.ms and batch.size if that happens
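A minimal sketch of setting the two producer properties named above. `linger.ms` and `batch.size` are the producer configuration keys; the helper name and the values 5 and 32768 are illustrative assumptions, not recommendations, and the properties would still need the usual connection and serializer settings before being passed to a producer.

```java
import java.util.Properties;

class ProducerBatchingConfig {
    // Builds only the batching-related part of a producer configuration.
    static Properties batchingProps(int lingerMs, int batchSizeBytes) {
        Properties props = new Properties();
        // How long the producer waits for more records before sending a batch.
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        // Upper bound, in bytes, on the size of one batch per partition.
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchingProps(5, 32768); // illustrative values only
        System.out.println("linger.ms=" + props.getProperty("linger.ms"));
        System.out.println("batch.size=" + props.getProperty("batch.size"));
    }
}
```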
45
KIP-32 Add timestamps to Kafka message
- CreateTime or LogAppendTime
- Increases message size by 8 bytes
- Small throughput degradation, particularly for small messages
- Careful not to go over network limit due to this increase
46
Migration from V0 to V1 message format
- Read the upgrade notes
- 0.10 Producer produces in new format
- 0.10 broker can store in old or new format depending on config
- 0.10 consumers can use either format
- 0.9 consumers only support old format
- Broker can do conversion on the fly (with performance impact)
47
KIP-35 Retrieving protocol version
- Request type that returns all the requests and versions supported by the broker
- Aim is for clients to use this to help them support multiple broker versions
- Not used by Java client yet
- Used by librdkafka and kafka-python
48
KIP-36 Rack aware replica assignment
- Kafka can now run with a rack awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing availability
- Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0
- broker.rack in server.properties
- Can be disabled when launching reassignment tool
49
New consumer enhancements
- KIP-41 KafkaConsumer Max Records
- KIP-42: Add Producer and Consumer Interceptors
- KIP-45 Standardize all client sequence interaction on j.u.Collection.
50
KIP-43 Kafka SASL enhancements
- Multiple SASL mechanisms: PLAIN and Kerberos included
- Pluggable
- Added support for protocol evolution
51
KIP-57 - Interoperable LZ4 Framing
It was broken, fixed in 0.10, took advantage of message format bump
52
Connect KIPs
KIP-51 - List Connectors REST API
KIP-52: Connector Control APIs
KIP-56: Allow cross origin HTTP requests on all HTTP methods
53
Lots of bugs fixed
Producer ordering, SocketServer leaks, New Consumer, Offset handling in the broker
http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html

  • 1. 1 What’s new in Kafka 0.10.0 Introducing Kafka Streams Eno Thereska eno@confluent.io Kafka Meetup, July 21, 2016 Slide contributions: Michael Noll and Ismael enotheres ka
  • 2. 2 What’s new in Kafka 0.10.0 1. Lots of new KIPs in 1. KIP-4 metadata 2. KIP-31 Relative offsets in compressed message sets 3. KIP-32 Add timestamps to Kafka message 4. KIP-35 Retrieving protocol version 5. KIP-36 Rack aware replica assignment 6. KIP-41 KafkaConsumer Max Records 7. KIP-42: Add Producer and Consumer Interceptors 8. KIP-45 Standardize all client sequence interaction 9. KIP-43 Kafka SASL enhancements 10. KIP-57 - Interoperable LZ4 Framing 11. KIP-51 - List Connectors REST API 12. KIP-52: Connector Control APIs 13. KIP-56: Allow cross origin HTTP requests on all HTTP methods 2. Kafka Streams
  • 3. 3 Kafka Streams • Powerful yet easy-to use Java library • Part of open source Apache Kafka, introduced in v0.10, May 2016 • Source code: https://github.com/apache/kafka/tree/trunk/streams • Build your own stream processing applications that are • highly scalable • fault-tolerant • distributed • stateful • able to handle late-arriving, out-of-order data
  • 5. 5 When to use Kafka Streams (as of Kafka 0.10) Recommended use cases • Application Development • “Fast Data” apps (small or big data) • Reactive and stateful applications • Linear streams • Event-driven systems • Continuous transformations • Continuous queries • Microservices Questionable use cases • Data Science / Data Engineering • “Heavy lifting” • Data mining • Non-linear, branching streams (graphs) • Machine learning, number crunching • What you’d do in a data warehouse
  • 6. 6 Alright, can you show me some code now?  KStream<Integer, Integer> input = builder.stream(“numbers-topic”); // Stateless computation KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2); // Stateful computation KTable<Integer, Integer> sumOfOdds = input .filter((k,v) -> v % 2 != 0) .selectKey((k, v) -> 1) .reduceByKey((v1, v2) -> v1 + v2, ”sum-of-odds"); • API option 1: Kafka Streams DSL (declarative)
  • 7. 7 Alright, can you show me some code now?  Startup Process a record Periodic action Shutdown • API option 2: low-level Processor API (imperative)
  • 8. 8 How do I install Kafka Streams? <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka-streams</artifactId> <version>0.10.0.0</version> </dependency> • There is and there should be no “install”. • It’s a library. Add it to your app like any other library.
  • 9. 9 Do I need to install a CLUSTER to run my apps? • No, you don’t. Kafka Streams allows you to stay lean and lightweight. • Unlearn bad habits: “do cool stuff with data != must have cluster” Ok. Ok. Ok. Ok.
  • 10. 10 How do I package and deploy my apps? How do I …?
  • 11. 11 How do I package and deploy my apps? How do I …? • Whatever works for you. Stick to what you/your company think is the best way. • Why? Because an app that uses Kafka Streams is…a normal Java app. • Your Ops/SRE/InfoSec teams may finally start to love not hate you.
  • 16. 16 Stream: ordered, re-playable, fault-tolerant sequence of immutable data records
  • 17. 17 Processor topology: computational logic of an app’s data processing
  • 18. 18 Stream partitions and stream tasks: units of parallelism
  • 19. 19 Streams meet Tables A stream is a changelog of a table A table is a materialized view at time of a stream
  • 20. 20 Streams meet Tables – in the Kafka Streams DSL alice 2 bob 10 alice 3 time “Alice clicked 2 times.” “Alice clicked 2 times.” time “Alice clicked 2+3 = 5 times.” “Alice clicked 2 3 times.” KTable = interprets data as changelog stream ~ is a continuously updated materialized view KStream = interprets data as record stream
  • 21. 21 Streams meet Tables – in the Kafka Streams DSL • JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
  • 22. 22 Streams meet Tables – in the Kafka Streams DSL • JOIN example: compute user clicks by region via KStream.leftJoin(KTable)
  • 23. 23 Streams meet Tables – in the Kafka Streams DSL • JOIN example: compute user clicks by region via KStream.leftJoin(KTable) alice 13 bob 5 Input KStream alice (europe, 13) bob (europe, 5) leftJoin() w/ KTable KStream 13 europe 5 europe map() KStream KTable reduceByKey(_ + _) 13 europe … … … … 18 europe … … … …
  • 24. 24 Streams meet Tables – in the Kafka Streams DSL
  • 26. 26 Key features in 0.10 • Native, 100%-compatible Kafka integration • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit • Uses Kafka as its internal messaging layer, too • Highly scalable • Fault-tolerant • Elastic
  • 27. 27 Key features in 0.10 • Native, 100%-compatible Kafka integration • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit • Uses Kafka as its internal messaging layer, too • Highly scalable • Fault-tolerant • Elastic • Stateful and stateless computations (e.g. joins, aggregations)
  • 29. 29 Fault tolerance • State stores (example records: charlie 3, bob 1, alice 1, alice 2)
  • 32. 32 Key features in 0.10 • Native, 100%-compatible Kafka integration • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit • Uses Kafka as its internal messaging layer, too • Highly scalable • Fault-tolerant • Elastic • Stateful and stateless computations • Time model
  • 35. 35 Time • You configure the desired time semantics through timestamp extractors • Default extractor yields event-time semantics • Extracts embedded timestamps of Kafka messages (introduced in v0.10)
  • 36. 36 Key features in 0.10 • Native, 100%-compatible Kafka integration • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit • Uses Kafka as its internal messaging layer, too • Highly scalable • Fault-tolerant • Elastic • Stateful and stateless computations • Time model • Windowing
  • 37. 37 Key features in 0.10 • Native, 100%-compatible Kafka integration • Also inherits Kafka’s security model, e.g. to encrypt data-in-transit • Uses Kafka as its internal messaging layer, too • Highly scalable • Fault-tolerant • Elastic • Stateful and stateless computations • Time model • Windowing • Supports late-arriving and out-of-order data • Millisecond processing latency, no micro-batching • At-least-once processing guarantees (exactly-once is in the works)
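How event-time windowing copes with late-arriving, out-of-order data can be shown with a few lines of plain Java. This is a conceptual sketch of tumbling windows keyed by event timestamp, not the Kafka Streams implementation; `TumblingWindowSketch` and the 60-second window size are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: assigns records to tumbling windows by their event timestamp.
class TumblingWindowSketch {

    // Start of the window that contains the given event timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Count records per window, regardless of the order in which they arrive.
    static Map<Long, Integer> countPerWindow(long[] eventTimestampsMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestampsMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Arrival order: the record stamped 59000 arrives after the one stamped 61000.
        long[] arrivals = {1000L, 61000L, 59000L};
        System.out.println(countPerWindow(arrivals, 60000L)); // {0=2, 60000=1}
    }
}
```

The late record is still counted in the first window because the window is chosen from the record's own timestamp rather than from its arrival time.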
  • 38. 38 Where to go from here? • Kafka Streams is available in Apache Kafka 0.10 and Confluent Platform 3.0 • http://kafka.apache.org/ • http://www.confluent.io/download (free + enterprise versions, tar/zip/deb/rpm) • Kafka Streams demos at https://github.com/confluentinc/examples • Java 7, Java 8+ with lambdas, and Scala • WordCount, Joins, Avro integration, Top-N computation, Windowing, … • Apache Kafka documentation: http://kafka.apache.org/documentation.html • Confluent documentation: http://docs.confluent.io/3.0.0/streams/ • Quickstart, Concepts, Architecture, Developer Guide, FAQ • Join our bi-weekly Ask Me Anything sessions on Kafka Streams • Contact me at eno@confluent.io for details
  • 39. 39 Some of the things to come • Exactly-once semantics • Queryable state – tap into the state of your applications (KIP-67: adopted) • SQL interface • Listen to and collaborate with the developer community • Your feedback counts a lot! Share it via users@kafka.apache.org
  • 40. 40 Want to contribute to Kafka and open source? Join the Kafka community: http://kafka.apache.org/ …in a great team with the creators of Kafka? Confluent is hiring: http://confluent.io/ Questions, comments? Tweet with #bbuzz and /cc to @ConfluentInc
  • 42. 42 Details on other KIPs (Slides contributed by Ismael Juma)
  • 43. 43 KIP-4 Metadata - Update MetadataRequest and MetadataResponse - Expose new fields for KIP-4 - not used yet - Make it possible to ask for cluster information with no topics - Fix nasty bug where request would be repeatedly sent if producer was started and unused for more than 5 minutes - KAFKA-3602
  • 44. 44 KIP-31 Relative offsets in compressed message sets - Message format change (affects FetchRequest, ProduceRequest and on-disk format) - Avoids recompression to assign offsets - Improves broker latency - Should also improve throughput, but can change producer batch sizes and so reduce throughput in some cases; tune linger.ms and batch.size if that happens
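The two producer settings named on slide 44 are ordinary producer configuration keys. The sketch below only builds the overrides with `java.util.Properties`; the class `ProducerTuningSketch` is hypothetical, and the values 5 ms and 32768 bytes are arbitrary illustrations, not recommendations from the talk.

```java
import java.util.Properties;

// Illustration only: collects batching-related producer overrides; values are arbitrary examples.
class ProducerTuningSketch {

    static Properties batchingOverrides(int lingerMs, int batchSizeBytes) {
        Properties props = new Properties();
        // How long the producer waits for more records before sending a batch.
        props.put("linger.ms", Integer.toString(lingerMs));
        // Upper bound, in bytes, on the size of a single batch per partition.
        props.put("batch.size", Integer.toString(batchSizeBytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchingOverrides(5, 32768);
        System.out.println("linger.ms=" + props.getProperty("linger.ms"));
        System.out.println("batch.size=" + props.getProperty("batch.size"));
    }
}
```

These properties would be merged with the usual bootstrap and serializer settings before constructing a producer; measuring throughput before and after a change is the safer way to pick values.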
  • 45. 45 KIP-32 Add timestamps to Kafka message - CreateTime or LogAppendTime - Increases message size by 8 bytes - Small throughput degradation, particularly for small messages - Careful not to go over network limit due to this increase
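Why the 8-byte timestamp matters mostly for small messages is a matter of simple arithmetic. The sketch below works it through; `TimestampOverheadSketch`, the example message sizes and the size limit passed to `fitsAfterUpgrade` are all hypothetical values chosen for illustration.

```java
// Illustration only: relative cost of the 8-byte timestamp for different message sizes.
class TimestampOverheadSketch {

    static final int TIMESTAMP_BYTES = 8;

    // Size increase as a percentage of the original message size.
    static double overheadPercent(int messageBytes) {
        return 100.0 * TIMESTAMP_BYTES / messageBytes;
    }

    // Whether a message still fits under a given size limit once the timestamp is added.
    static boolean fitsAfterUpgrade(int messageBytes, int limitBytes) {
        return messageBytes + TIMESTAMP_BYTES <= limitBytes;
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(100));    // 8% larger for a 100-byte message
        System.out.println(overheadPercent(10000));  // under 0.1% for a 10,000-byte message
        System.out.println(fitsAfterUpgrade(1000000, 1000000)); // false: already at the limit
    }
}
```

A message that was exactly at a configured size limit before the upgrade no longer fits afterwards, which is the case the slide warns about.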
  • 46. 46 Migration from V1 to V2 format - Read the upgrade notes - 0.10 Producer produces in new format - 0.10 broker can store in old or new format depending on config - 0.10 consumers can use either format - 0.9 consumers only support old format - Broker can do conversion on the fly (with performance impact)
  • 47. 47 KIP-35 Retrieving protocol version - Request type that returns all the requests and versions supported by the broker - Aim is for clients to use this to help them support multiple broker versions - Not used by Java client yet - Used by librdkafka and kafka-python
  • 48. 48 KIP-36 Rack aware replica assignment - Kafka can now run with a rack awareness feature that isolates replicas so they are guaranteed to span multiple racks or availability zones. This allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing availability - Old clients must be upgraded to 0.9.0.1 before going to 0.10.0.0 - broker.rack in server.properties - Can be disabled when launching reassignment tool
  • 49. 49 New consumer enhancements - KIP-41 KafkaConsumer Max Records - KIP-42: Add Producer and Consumer Interceptors - KIP-45 Standardize all client sequence interaction on j.u.Collection.
  • 50. 50 KIP-43 Kafka SASL enhancements - Multiple SASL mechanisms: PLAIN and Kerberos included - Pluggable - Added support for protocol evolution
  • 51. 51 KIP-57 - Interoperable LZ4 Framing It was broken, fixed in 0.10, took advantage of message format bump
  • 52. 52 Connect KIPs KIP-51 - List Connectors REST API KIP-52: Connector Control APIs KIP-56: Allow cross origin HTTP requests on all HTTP methods
  • 53. 53 Lots of bugs fixed • Producer ordering, SocketServer leaks, New Consumer, Offset handling in the broker • http://mirrors.muzzy.org.uk/apache/kafka/0.10.0.0/RELEASE_NOTES.html

Editor's Notes

  1. Basic workflow is: Get the input data into Kafka, e.g. via Kafka Connect (part of Apache Kafka) or via your own applications that write data to Kafka. Process the data with Kafka Streams, and write the results back to Kafka.
  2. FYI: Some people have begun using the low-level Processor API to port their Apache Storm code to Kafka Streams.
  3. In some sense, Kafka Streams exploits economies of scale. Projects like Mesos or Kubernetes, for example, are totally focused on resource management and scheduling, and they’ll always do a better job here than a tool that’s focused on stream processing. Kafka Streams should allow for “composition” with these deployment tools and resource managers (think: Unix philosophy) rather than being strongly opinionated and dictating any such choices to you.
  4. Stream: ordered, re-playable, fault-tolerant sequence of immutable data records. Data records: key-value pairs (closely related to Kafka’s key-value messages).
  5. Processor topology: computational logic of an app’s data processing. Defined via the Kafka Streams DSL or the low-level Processor API. Stream processor: a node in the topology; represents a processing step. As a user, you will only be exposed to these nuts and bolts if you use the Processor API. The Kafka Streams DSL hides this from you.
  6. Stream partitions and stream tasks are the logical units of parallelism. Stream partition: a totally ordered sequence of data records; maps to a Kafka topic partition. A data record in the stream maps to a Kafka message from that topic. The keys of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e. how data is routed to specific partitions. A processor topology is scaled by breaking it into multiple stream tasks, based on the number of input stream partitions. Stream tasks work in isolation, i.e. independently of each other. Each stream task has its own local, fault-tolerant state.
  7. KStream: records are interpreted as INSERTs (since no record replaces any existing record). KTable: records are interpreted as UPDATEs (since any existing row with the same key is overwritten). Note: We ignore Bob in this diagram. Bob is only shown to highlight that there is generally more data than just Alice’s.
  8. To summarize: some operations such as map() retain the shape (e.g. a stream will stay a stream), some operations change the shape (e.g. a stream will become a table).
  9. Mention that the DSL abstracts state stores away, but the low-level Processor API provides direct access to them.
  10. A proper notion and model of time is crucial for stream processing