Kongo: Building a Scalable
Streaming IoT Application
using Apache Kafka
Paul Brebner
instaclustr.com Technology Evangelist
Overview
What we’ll see on
the journey up river
1. Kafka introduction
2. The Kongo problem
3. Application, Architecture and Design(s)
4. Streams extension
5. Scaling
1 Kafka
Introduction
Instaclustr
Managed
Platform
Multi-cloud
For scale,
performance,
availability,
integration, security
Instaclustr
Managed
Platform
Open Source -
Store, Analyze,
Search, Explore,
and in 2018 we
added Stream –
Kafka!
What is Kafka?
Message flow
Distributed
streams
processing
1 Distributed Producers…
2 Send Messages
3 To Distributed Consumers
4 Via Kafka Cluster
Kafka
Key Benefits
■ Fast – high throughput and low latency
■ Scalable – horizontally scalable, just add nodes and
partitions
■ Reliable – distributed and fault tolerant
■ Zero data loss
■ Open Source
■ Heterogeneous data sources and sinks
■ Available as an Instaclustr Managed service
How does
Kafka work?
Excerpt from…
Producer
Consumer
Consumer
Consumer
Consumer
Kafka is “pub-sub”. It’s loosely coupled:
producers and consumers don’t know about
each other.
Filtering, or which consumers get which messages, is topic based.
- Producers send messages to topics.
- Consumers subscribe to topics of interest, e.g. Parties.
- When they poll they only receive messages sent to those topics.
None of these consumers will receive messages sent to the “Work” topic.
Producer
Consumer
Consumer
Consumer
Consumer
Topic “Parties”
Topic “Work”
Consumers subscribed
to Topic “Parties”
Consumers poll to
receive messages
from “Parties”
Consumers not subscribed to
“Work” messages
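A minimal sketch of topic-based filtering, using a tiny in-memory stand-in rather than a real Kafka client. TinyBroker and TinyConsumer are hypothetical names made up for this illustration. It only shows that a consumer subscribed to “Parties” never sees messages sent to “Work”.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def send(self, topic, message):
        self.logs[topic].append(message)


class TinyConsumer:
    """Remembers a read position per subscribed topic and polls for new messages."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.positions = {topic: 0 for topic in topics}

    def poll(self):
        received = []
        for topic in list(self.positions):
            log = self.broker.logs[topic]
            received.extend(log[self.positions[topic]:])
            self.positions[topic] = len(log)
        return received


broker = TinyBroker()
party_goer = TinyConsumer(broker, ["Parties"])

broker.send("Parties", "Party at 8pm")
broker.send("Work", "Meeting at 9am")

first_poll = party_goer.poll()   # only the "Parties" message
second_poll = party_goer.poll()  # nothing new since the last poll
print(first_poll)
print(second_poll)
```

The producer side never names a consumer, which is the loose coupling described above.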
Kafka works like an Amish Barn raising.
Partitions and a consumer group share
work across multiple consumers, the more
partitions a topic has the more consumers
it supports.
Image: Paul Cyr ©2018 NorthernMainePhotos.com
Kafka also works like the Clone Army
It supports delivery of the same message to
multiple consumers with consumer groups.
Kafka doesn’t throw messages away
immediately after they are delivered, so the
same message can be delivered to
multiple consumer groups.
Image: AKKHARAT JARUSILAWONG / Shutterstock.com
Consumers subscribed to ”Parties” topic are allocated partitions.
When they poll they will only get messages from their allocated
partitions.
Consumer
Consumer
Topic “Parties”
Partition 1
Partition 2
Partition 3
Producer
Consumer Group
Consumer
Consumer Group
Consumer
This enables consumers in the same group to share the work
around. Each consumer gets only a subset of the available
messages.
Consumer
Consumer
Topic “Parties”
Partition 1
Partition 2
Partition 3
Producer
Consumer Group
Consumer
Consumers share
work within groups
Multiple groups enable message broadcasting. Messages
are duplicated across groups, as each consumer group
receives a copy of each message.
Consumer
Consumer
Consumer
Topic “Parties”
Partition 1
Partition 2
Partition 3
Producer
Consumer Group
Consumer
Consumer Group
Messages are
duplicated across
Consumer groups
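The two behaviours above (sharing within a group, duplication across groups) can be sketched in a few lines. This is an illustration only: the round-robin assignment, the group names and the message names are assumptions, and Kafka's real partition assignors are more involved.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


# Topic "Parties" with 3 partitions, each holding the messages sent to it.
parties = {1: ["m1", "m4"], 2: ["m2", "m5"], 3: ["m3", "m6"]}

groups = {
    "group-a": ["a1", "a2"],  # two consumers share the work
    "group-b": ["b1"],        # a second group gets its own copy of everything
}

delivered = {}
for group, consumers in groups.items():
    assignment = assign_partitions(sorted(parties), consumers)
    for consumer, partitions in assignment.items():
        delivered[consumer] = [m for p in partitions for m in parties[p]]

for consumer, messages in delivered.items():
    print(consumer, messages)
```

Within group-a each message goes to exactly one consumer, while b1 in group-b receives all six.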
2 The Kongo
Problem
Amazon was taken
Congo is 2nd biggest
river, 5,000km
And deepest
Congo -> Kongo
(Kingdom of)
Kongo River
Important for trade
A logistics problem
Our
Logistics
Problem
Goods stored in
Warehouses
Goods moved
between
Warehouses in
Trucks
Checking rules in
real-time
Goods
Chickens
(Perishable, Fragile,
Edible)
Toxic Waste
(Hazardous, Bulky)
Vegetables
(Perishable, Edible)
Art (Fragile)
Warehouses
Trucks
Interfaces
between
virtual and
real
RFID tags
RFID readers
Produce Truck
load/unload events
Interfaces
between
virtual and
real
Sensors (e.g. shock
and vibration,
environmental gas
sensor).
About 20 metrics
The Story
Warehouses
The Story
Goods in a
Warehouse
The Story
Warehouse Sensor
events
Check environment
rules for all Goods
in Warehouse
Warehouse sensor events
Warehouse sensor events No Goods
The Story
Along comes a truck
The Story
Load truck with art
The Story
RFID Load event
Art now in Truck
RFID Load Event
The Story
Load truck with
Drums
The Story
RFID Load Event
Drums now in Truck
Check Drums and
Art co-location rules
RFID Load Event
The Story
Truck drives to
another warehouse
The Story
Truck sensor events
Check environment
rules for all Goods
in Truck
Truck sensor events
The Story
Unload Drums and
Art
The Story
RFID Unload events
Goods now in
Warehouse
Repeat from Start
With lots more
warehouses, goods
and trucks!
RFID Unload Events
Rules
Goods
categories
■ Each Goods has 0 or more general Categories:
● Perishable
● Hazardous
● Fragile
● Edible
● Medicinal
● Bulky
● Dry
■ Real world more complex
● 97 categories in Australia
Rules
Goods
categories
■ And 0 or 1 temperature category
● Frozen Temp
● Heat Sensitive Temp
● Cool Temp
● Room Temp
● Ambient Temp
■ Some warehouses/trucks are temperature controlled
Rule
checking
Co-location
Goods have rules to
check if they are
safe in the same
Truck
Sensor rules
Goods have rules
to check if they are
safe in the
environment of a
location (Warehouse
or Truck). About 20
metrics, some in
common
E.g. Keep your
chickens cool
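A rough sketch of what the two kinds of rule check could look like. The category names come from the slides, but the forbidden pairs and the temperature ranges below are invented for illustration and are not the actual Kongo rules.

```python
from dataclasses import dataclass, field


@dataclass
class Goods:
    name: str
    categories: set = field(default_factory=set)
    temperature: str = None  # 0 or 1 temperature category


# Hypothetical rule data, for illustration only.
FORBIDDEN_PAIRS = [("Hazardous", "Edible"), ("Bulky", "Fragile")]
TEMPERATURE_RANGES_C = {"Frozen Temp": (-30.0, -15.0), "Cool Temp": (0.0, 8.0)}


def colocation_violation(a, b):
    """True if any forbidden category pair is split across the two Goods."""
    return any(
        (x in a.categories and y in b.categories)
        or (y in a.categories and x in b.categories)
        for x, y in FORBIDDEN_PAIRS
    )


def sensor_violation(goods, metric, value):
    """True if a temperature reading is outside the Goods' allowed range."""
    if metric != "temperature" or goods.temperature not in TEMPERATURE_RANGES_C:
        return False
    low, high = TEMPERATURE_RANGES_C[goods.temperature]
    return not (low <= value <= high)


chickens = Goods("Chickens", {"Perishable", "Fragile", "Edible"}, "Cool Temp")
toxic_waste = Goods("Toxic Waste", {"Hazardous", "Bulky"})
art = Goods("Art", {"Fragile"})

print(colocation_violation(chickens, toxic_waste))      # chickens next to toxic waste
print(colocation_violation(chickens, art))              # chickens next to art
print(sensor_violation(chickens, "temperature", 12.5))  # chickens too warm
```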
3 Application
Simulation
Logical steps
Create Goods,
Warehouses,
Trucks
Simulate next hour
Unload Goods into
Warehouses
Simulate Sensor
values (Trucks and
Warehouses)
Check Goods +
Sensor violations
(Goods in Trucks
and Warehouses)
Check Goods + co-
location violations
(Goods on trucks)
Load Trucks with
Goods, move
Trucks to random
Warehouses
repeat
Architecture
Monolithic
Same logical steps as above, in one
process, producing Rule violations
Architecture
De-coupled with
event streams
Same logical steps, connected by
event streams:
Sensor events
Unload events
Load events
Rule violations
Distributed
Architecture
Separate Kafka
producers and
consumers
Same logical steps, producing
Rule violations
Simulation has
perfect knowledge,
but violation rules
checking relies on
event stream data
Simulation
Kafka Producers
Violation rules checking
Kafka Consumers
Design Goal
Deliver events
produced from each
location to Goods in
same location
Events delivered to Goods
in same location
Design
Variables
Topics and
Consumers
Topics (1 to Many): All locations in 1 topic, or 1 topic per location
Consumers (1 to Many): Goods de-coupled from Consumers (Consumers < Goods),
or every Goods is a Consumer (Group) (Goods = Consumers)
Design
Variables
Problems?
Topics (1 to Many): All locations in 1 topic, or 1 topic per location
● 100s of locations = topics is ok, more is not ok
Consumers (1 to Many): Goods de-coupled from Consumers (Consumers << Goods),
or every Goods is a Consumer (Group) (Goods = Consumers)
● Too many consumer groups is not ok
Possible
Design 1
Multiple topics
Goods =
Consumers = many
Possible
Design 2
Single Topic
1 Consumer Group,
decoupled Goods
Another component
responsible for
mapping of location
to Goods
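One way to picture the extra component in Design 2 is a location-to-Goods map that load events keep up to date and sensor events look up. The sketch below is an assumption about how that could work, not the Kongo code. The names goods_at, on_load_event and on_sensor_event, and the vibration rule, are hypothetical.

```python
from collections import defaultdict

# Hypothetical location -> Goods mapping, kept up to date by load/unload events.
goods_at = defaultdict(set)


def on_load_event(goods_id, warehouse, truck):
    """Move one Goods from a warehouse onto a truck."""
    goods_at[warehouse].discard(goods_id)
    goods_at[truck].add(goods_id)


def on_sensor_event(location, metric, value, check_rule):
    """Fan one location event out to every Goods currently at that location."""
    return [
        (goods_id, metric, value)
        for goods_id in sorted(goods_at[location])
        if check_rule(goods_id, metric, value)
    ]


goods_at["warehouse-1"].update({"art-1", "drums-1"})
on_load_event("art-1", "warehouse-1", "truck-7")


def too_shaky(goods_id, metric, value):
    """Hypothetical rule: every Goods objects to vibration above 5.0."""
    return metric == "vibration" and value > 5.0


violations = on_sensor_event("truck-7", "vibration", 9.1, too_shaky)
print(violations)
```

With this shape the number of Kafka consumers is independent of the number of Goods, since the fan-out happens inside the consumer.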
Design
check
High Fan-out
How well does
Kafka work for
broadcast delivery
of the same event to
large numbers of
consumers?
Initial
benchmarking
100 Locations
100,000 Goods
Fan-out = 1:1000
Option 2 superior
Single topic, single
consumer group
[Chart: Relative Throughput (%) for Option 1 vs Option 2]
4 Kafka
Streams
1 of 4 Kafka APIs
Fishing on the
Congo
Kafka Streams = a
complex way of
fishing?!
But scalable
Streams
concurrency (tasks)
<= input topic
partitions
Streams
1 of 4 Kafka APIs
■ Kafka has 4 APIs: Producer, Consumer, Connector
and Streams!
■ The Streams API allows an application to act as a
stream processor
● consuming an input stream from one or more topics and
● producing an output stream to one or more topics
● transforming the input streams to output streams
■ A stream is an unbounded, ordered, replayable,
continuously updating data set, consisting of
strongly typed key-value records.
Processor
Topology
DAGs of stream
processors (nodes)
that are connected by
streams (edges)
Processors transform
data by receiving one
input record, applying
an operation to it, and
producing output
records.
Streams
DSL
Streams and Tables
■ The Streams DSL has built-in abstractions for
streams and tables
● KStream, KTable, GlobalKTable, KGroupedStream, and
KGroupedTable.
■ The DSL supports a declarative functional
programming style, with
● stateless transformations (e.g. map and filter) as well as
● stateful transformations such as aggregations (e.g. count and
reduce), joins, and windowing.
How to
compose
operations
Truck
overload!
Trucks have a
maximum load
weight
Built a streams
application to check
for overloading.
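The overload check is a stateful aggregation: keep a running weight per truck and flag it when the limit is exceeded. The sketch below shows that idea in plain Python. It is not the Kafka Streams DSL, and the event format and weight limits are assumptions.

```python
from collections import defaultdict

# Hypothetical events: (truck, goods weight in kg, "load" or "unload").
events = [
    ("truck-1", 600, "load"),
    ("truck-2", 300, "load"),
    ("truck-1", 500, "load"),
    ("truck-1", 600, "unload"),
]
MAX_LOAD_KG = {"truck-1": 1000, "truck-2": 1000}  # assumed limits


def overload_stream(events, max_load):
    """Keep a running weight per truck and yield an alert when it is exceeded."""
    weight = defaultdict(int)
    for truck, kg, kind in events:
        weight[truck] += kg if kind == "load" else -kg
        if weight[truck] > max_load[truck]:
            yield (truck, weight[truck])


alerts = list(overload_stream(events, MAX_LOAD_KG))
print(alerts)
```

In a streams application the running weight would live in a state store keyed by truck rather than in a local dictionary.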
Streaming
Problems?
Topology
Exceptions and
Floating trucks!
Understanding
and debugging
Streams
Topologies
Use Kafka Streams
Topology Visualizer!
https://zz85.github.io/kafka-streams-viz/
Streaming
Problems?
■ Invalid Topology errors
● Some tricky (non-obvious)
Kafka streams topology rules
● And cycles aren’t allowed
Invalid Topologies
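The no-cycles rule can be illustrated with Python's standard graphlib module (Python 3.9+). This is not how Kafka Streams validates a topology, and the node names are made up. It only shows why a processor graph with a feedback edge has no valid processing order.

```python
from graphlib import CycleError, TopologicalSorter


def processing_order(topology):
    """Return a valid processing order, or None if the graph has a cycle.

    `topology` maps each node to the set of nodes that feed into it.
    """
    try:
        return list(TopologicalSorter(topology).static_order())
    except CycleError:
        return None


# Hypothetical node names for a truck-weight topology.
valid = {
    "weight-aggregator": {"load-events", "unload-events"},
    "overload-filter": {"weight-aggregator"},
    "violations-sink": {"overload-filter"},
}
# Feeding the sink back into the aggregator creates a cycle.
cyclic = {**valid, "weight-aggregator": {"load-events", "violations-sink"}}

print(processing_order(valid))
print(processing_order(cyclic) is None)
```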
Streaming
Problems?
■ Anti-gravity?
● Sometimes truck weights went negative!
● Solution: Turned on “exactly-once” transactional setting
● The transactional producer allows an application to send messages
to multiple partitions atomically.
● Weights no longer go negative
Negative truck
weights!
5 Scaling
Congo Inga rapids
Scaling
Congo Inga rapids
Scaling is easy?
100 warehouses
200 trucks
10,000 Goods
Scalability
alternatives
Scale out, up and
multiple clusters
Multiple clusters
enables flexible
scaling (cluster for
violations)
Different instance
sizes have different
network speeds
Larger
instances
reduce end-
to-end
latency
2-core instances cf.
4-core instances
Higher concurrency
and faster network
We also offer 8 core
Kafka instances
(AWS R5’s +SSDs)
Total
resources
Kafka clusters and
application cores
Application used x2
server cores
Kubernetes works
well for application
deployment, scaling,
monitoring
Scaling is
hard (1)
Actually hard to
achieve linear
scalability
Why? Kafka is
scalable, but:
■ Hash Collisions
● Too many open files exceptions
● Due to increasing and eventually too many consumers
● Some consumers were timing out
● Why? Some consumers were not receiving any events
● 300 locations and 300 partitions, but only 200 unique values, so
only 200 consumers receive events, the rest time out
● This is due to hash collisions: some partitions get more than 1 location,
others get 0
Key parking
problem
Well known problem
Knuth 1962
Ensure number of
keys >>> number of
partitions >=
number of
consumers (in a
group)
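The effect is easy to reproduce. The sketch below hashes keys onto partitions and counts how many partitions receive at least one key. MD5 is used only as a stable stand-in hash (Kafka's default partitioner uses a different hash function), and the key names are made up. With as many keys as partitions, roughly 1 - 1/e (about 63%) of the partitions are expected to be occupied, which is in line with the roughly 200 of 300 reported above.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (MD5 here, as a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_used(num_keys, num_partitions):
    """Count the partitions that receive at least one of the keys."""
    keys = (f"location-{i}" for i in range(num_keys))
    return len({partition_for(key, num_partitions) for key in keys})


few_keys = partitions_used(num_keys=300, num_partitions=300)
many_keys = partitions_used(num_keys=30_000, num_partitions=300)
print(f"300 keys fill {few_keys} of 300 partitions")
print(f"30,000 keys fill {many_keys} of 300 partitions")
```

When keys greatly outnumber partitions, every partition gets work, which is the reason for the keys >>> partitions guideline.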
Scaling is
hard (2)
Cloudy with a
chance of
Rebalancing Storms
Rebalancing
storms
■ Rebalancing storms result in some consumers not
receiving events (drop in throughput) and a very
slow start up time for new consumers (> 20s)
■ Need to ensure consumers are started and are
polling before trying to add lots more consumers
■ So try to keep total number of consumers as low as
practical (next…)
Scaling is
hard (3)
Too much
(consumer)
scalability is bad
Consumers
Less is more
■ Even though we used the design with least
consumers…
■ If Kafka consumers take too long to read events and
process them, then you need more consumer threads
(and more partitions), which hurts Kafka cluster
scalability
■ Solution? Minimize consumer response time
● Only use consumers for reading events
● Do event processing asynchronously or in separate thread pool
■ My #1 Kafka rule is
● “Kafka is easy to scale with the smallest number of consumers”
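A minimal sketch of the “consumers only read, processing happens elsewhere” pattern, with a queue standing in for the Kafka consumer. The event names, pool size and sleep time are arbitrary assumptions. A real consumer would also need care around committing offsets for events that are still being processed.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for events waiting to be read from Kafka.
events = queue.Queue()
for i in range(20):
    events.put(f"event-{i}")


def process(event):
    time.sleep(0.01)  # stands in for slow rule checking
    return event.upper()


def poll_loop(pool):
    """Read events as fast as possible; hand the slow work to the pool."""
    futures = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return futures
        futures.append(pool.submit(process, event))


with ThreadPoolExecutor(max_workers=4) as pool:
    results = [future.result() for future in poll_loop(pool)]

print(len(results), results[0])
```

The reading loop never waits on processing, so one reader can keep up with a pool of workers instead of needing one consumer (and one partition) per worker.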
More
information
The End
■ Kongo code:
● https://github.com/instaclustr/kongo2
● https://github.com/instaclustr/kongokafkastreams
■ All blog series, including Kongo, and latest,
● Anomalia Machina
ᐨ Kafka+Cassandra+Kubernetes, and
● Geospatial Anomalia Machina
ᐨ Kafka+Cassandra+Kubernetes+Geospatial queries & indexing
● https://www.instaclustr.com/paul-brebner/
■ Visual Introduction To Kafka
● https://www.instaclustr.com/resource/apache-kafka-a-visual-introduction/
■ The Instaclustr Managed Platform
● https://www.instaclustr.com/platform/
● Free Trial
ᐨ https://console.instaclustr.com/user/signup

ApacheCon Berlin 2019: Kongo: Building a Scalable Streaming IoT Application using Apache Kafka

RabbitMQ vs Apache Kafka - Part 1
RabbitMQ vs Apache Kafka - Part 1RabbitMQ vs Apache Kafka - Part 1
RabbitMQ vs Apache Kafka - Part 1Erlang Solutions
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APICarol McDonald
 
Evolutionary Systems - Kafka Microservices
Evolutionary Systems - Kafka MicroservicesEvolutionary Systems - Kafka Microservices
Evolutionary Systems - Kafka MicroservicesStefano Rocco
 
Data Pipelines with Apache Kafka
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache KafkaBen Stopford
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQShameera Rathnayaka
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...StreamNative
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Apache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorApache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorJoe Olson
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Kostas Tzoumas
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
Updated shared chain pd 2
Updated shared chain pd 2Updated shared chain pd 2
Updated shared chain pd 2Nadeem Ansari
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...confluent
 
Fasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Project
 

Similar to ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application using Apache Kafka (20)

RabbitMQ vs Apache Kafka - Part 1
RabbitMQ vs Apache Kafka - Part 1RabbitMQ vs Apache Kafka - Part 1
RabbitMQ vs Apache Kafka - Part 1
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
 
Evolutionary Systems - Kafka Microservices
Evolutionary Systems - Kafka MicroservicesEvolutionary Systems - Kafka Microservices
Evolutionary Systems - Kafka Microservices
 
Data Pipelines with Apache Kafka
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache Kafka
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQCluster_Performance_Apache_Kafak_vs_RabbitMQ
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Apache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch ProcessorApache Pulsar as a Dual Stream / Batch Processor
Apache Pulsar as a Dual Stream / Batch Processor
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
Updated shared chain pd 2
Updated shared chain pd 2Updated shared chain pd 2
Updated shared chain pd 2
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
 
Fasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy Management
 
Kafka for Scale
Kafka for ScaleKafka for Scale
Kafka for Scale
 

ApacheCon Berlin 2019: Kongo: Building a Scalable Streaming IoT Application using Apache Kafka

  • 1. Kongo: Building a Scalable Streaming IoT Application using Apache Kafka Paul Brebner instaclustr.com Technology Evangelist
  • 2. Overview What we’ll see on the journey up river 1. Kafka introduction 2. The Kongo problem 3. Application, Architecture and Design(s) 4. Streams extension 5. Scaling
  • 4. Instaclustr Managed Platform Open Source - Store, Analyze, Search, Explore, and in 2018 we added Stream – Kafka!
  • 5. What is Kafka? Message flow Distributed streams processing 1 Distributed Producers… 2 Send Messages 3 To Distributed Consumers 4 Via Kafka Cluster
  • 6. Kafka Key Benefits ■ Fast – high throughput and low latency ■ Scalable – horizontally scalable, just add nodes and partitions ■ Reliable – distributed and fault tolerant ■ Zero data loss ■ Open Source ■ Heterogeneous data sources and sinks ■ Available as an Instaclustr Managed service
  • 8. Producer Consumer Consumer Consumer Consumer ? Kafka is “pub-sub”. It’s loosely coupled: producers and consumers don’t know about each other.
  • 9. Filtering, or which consumers get which messages, is topic based. - Producers send messages to topics. - Consumers subscribe to topics of interest, e.g. Parties. - When they poll they only receive messages sent to those topics. None of these consumers will receive messages sent to the “Work” topic. Producer Consumer Consumer Consumer Consumer Topic “Parties” Topic “Work” Consumers subscribed to Topic “Parties” Consumers poll to receive messages from “Parties” Consumers not subscribed to “Work” messages
  • 10. Kafka works like an Amish Barn raising. Partitions and a consumer group share work across multiple consumers; the more partitions a topic has, the more consumers it supports. Image: Paul Cyr ©2018 NorthernMainePhotos.com
  • 11. Kafka also works like the Clone Army. It supports delivery of the same message to multiple consumers with consumer groups. Kafka doesn’t throw messages away immediately after they are delivered, so the same message can be delivered to multiple consumer groups. Image: AKKHARAT JARUSILAWONG / Shutterstock.com
  • 12. Consumers subscribed to the “Parties” topic are allocated partitions. When they poll they will only get messages from their allocated partitions. Consumer Consumer Topic “Parties” Partition 1 Partition 2 Partition 3 Producer Consumer Group Consumer Consumer Group Consumer
  • 13. This enables consumers in the same group to share the work around. Each consumer gets only a subset of the available messages. Consumer Consumer Topic “Parties” Partition 1 Partition 2 Partition 3 Producer Consumer Group Consumer Consumers share work within groups
  • 14. Multiple groups enable message broadcasting. Messages are duplicated across groups, as each consumer group receives a copy of each message. Consumer Consumer Consumer Topic “Parties” Partition 1 Partition 2 Partition 3 Producer Consumer Group Consumer Consumer Group Messages are duplicated across Consumer groups
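The partition and consumer-group behaviour described on slides 12 to 14 can be sketched as a small in-memory simulation. This is not the Kafka client API: `assign_partitions` and `deliver` are hypothetical helpers, and the round-robin assignment is a simplifying assumption (real Kafka assignors differ in detail). It shows only the two properties the slides state: consumers in one group split the partitions between them, and every group receives its own copy of each message.

```python
def assign_partitions(partitions, consumers):
    """Round-robin the partitions over the consumers of ONE group (simplified)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def deliver(messages, groups):
    """messages: list of (partition, payload).
    groups: {group_name: {consumer: [owned partitions]}}.
    Returns {group_name: {consumer: [payloads received]}}."""
    received = {g: {c: [] for c in owners} for g, owners in groups.items()}
    for partition, payload in messages:
        for group, owners in groups.items():
            for consumer, owned in owners.items():
                if partition in owned:
                    received[group][consumer].append(payload)
    return received


partitions = [1, 2, 3]
groups = {
    "group-a": assign_partitions(partitions, ["a1", "a2"]),  # shares the work
    "group-b": assign_partitions(partitions, ["b1"]),        # gets its own copy
}
received = deliver([(1, "m1"), (2, "m2"), (3, "m3")], groups)
```

In this sketch a fourth consumer added to a three-partition group would be assigned nothing, which is the sense in which the partition count caps the useful number of consumers per group.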
  • 15. 2 The Kongo Problem Amazon was taken Congo is 2nd biggest river, 5,000km And deepest Congo -> Kongo (Kingdom of)
  • 16. Kongo River Important for trade A logistics problem
  • 17. Our Logistics Problem Goods stored in Warehouses Goods moved between Warehouses in Trucks Checking rules in real-time
  • 18. Goods Chickens (Perishable, Fragile, Edible) Toxic Waste (Hazardous, Bulky) Vegetables (Perishable, Edible) Art (Fragile)
  • 22. RFID readers Produce Truck load/unload events Interfaces between virtual and real
  • 23. Interfaces between virtual and real Sensors (e.g. shock and vibration, environmental gas sensor). About 20 metrics
  • 25. The Story Goods in a Warehouse
  • 26. The Story Warehouse Sensor events Check environment rules for all Goods in Warehouse Warehouse sensor events ? ? ? ? Warehouse sensor events No Goods
  • 29. The Story RFID Load event Art now in Truck RFID Load Event
  • 30. The Story Load truck with Drums
  • 31. The Story RFID Load Event Drums now in Truck Check Drums and Art co-location rules RFID Load Event ?
  • 32. The Story Truck drives to another warehouse
  • 33. The Story Truck sensor events Check environment rules for all Goods in Truck Truck sensor events ? ?
  • 35. The Story RFID Unload events Goods now in Warehouse Repeat from Start With lots more warehouses, goods and trucks! RFID Unload Events
  • 36. Rules Goods categories ■ Each Goods has 0 or more general Categories: ● Perishable ● Hazardous ● Fragile ● Edible ● Medicinal ● Bulky ● Dry ■ Real world more complex ● 97 categories in Australia
  • 37. Rules Goods categories ■ And 0 or 1 temperature category ● Frozen Temp ● Heat Sensitive Temp ● Cool Temp ● Room Temp ● Ambient Temp ■ Some warehouses/trucks are temperature controlled
  • 38. Rule checking Co-location Goods have rules to check if they are safe in the same Truck
  • 39. Sensor rules Goods have rules to check if they are safe in the environment of a location - Warehouse or Trucks, 20 metrics, some in common E.g. Keep your chickens cool
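As a rough illustration of the two kinds of rule checks (co-location and sensor), here is a minimal sketch. The goods and their categories come from the slides, but the rule tables are invented for illustration: the `INCOMPATIBLE` pairs and the 10 °C limit in `MAX_TEMP_C` are assumptions, not the rules Kongo actually uses.

```python
# Goods and categories as listed on the "Goods" slide.
GOODS = {
    "chickens": {"Perishable", "Fragile", "Edible"},
    "toxic waste": {"Hazardous", "Bulky"},
    "vegetables": {"Perishable", "Edible"},
    "art": {"Fragile"},
}

# Hypothetical rule tables, invented for this example only.
INCOMPATIBLE = {frozenset({"Edible", "Hazardous"}), frozenset({"Fragile", "Bulky"})}
MAX_TEMP_C = {"Perishable": 10.0}


def conflicts(a, b):
    """True if any category of a is incompatible with any category of b."""
    return any(
        frozenset({ca, cb}) in INCOMPATIBLE for ca in GOODS[a] for cb in GOODS[b]
    )


def colocation_violations(truck_goods):
    """Return every pair of Goods that should not share a truck."""
    names = sorted(truck_goods)
    return [
        (a, b) for i, a in enumerate(names) for b in names[i + 1:] if conflicts(a, b)
    ]


def sensor_violations(goods_at_location, temperature_c):
    """Return Goods whose categories do not tolerate the measured temperature."""
    return sorted(
        name
        for name in goods_at_location
        if any(
            temperature_c > MAX_TEMP_C.get(cat, float("inf")) for cat in GOODS[name]
        )
    )
```

For example, `colocation_violations({"art", "toxic waste"})` flags the pair under the assumed Fragile/Bulky rule, and `sensor_violations({"chickens", "art"}, 25.0)` flags only the chickens.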
  • 40. 3 Application Simulation Logical steps Create Goods, Warehouses, Trucks Simulate next hour Unload Goods into Warehouses Simulate Sensor values (Trucks and Warehouses) Check Goods + Sensor violations (Goods in Trucks and Warehouses) Check Goods + co-location violations (Goods on trucks) Load Trucks with Goods, move Trucks to random Warehouses repeat
  • 41. Architecture Create Goods, Warehouses, Trucks Simulate next hour Unload Goods into Warehouses Simulate Sensor values (Trucks and Warehouses) Check Goods + Sensor violations (Goods in Trucks and Warehouses) Check Goods + co-location violations (Goods on trucks) Load Trucks with Goods, move Trucks to random Warehouses repeat Rule violations Monolithic Rule violations
  • 42. Architecture Create Goods, Warehouses, Trucks Simulate next hour Unload Goods into Warehouses Simulate Sensor values (Trucks and Warehouses) Check Goods + Sensor violations (Goods in Trucks and Warehouses) Check Goods + co-location violations (Goods on trucks) Load Trucks with Goods, move Trucks to random Warehouses repeat Event streams Rule violations Sensor events Unload events Load events De-coupled with event streams
  • 43. Distributed Architecture Create Goods, Warehouses, Trucks Simulate next hour Unload Goods into Warehouses Simulate Sensor values (Trucks and Warehouses) Check Goods + Sensor violations (Goods in Trucks and Warehouses) Check Goods + co-location violations (Goods on trucks) Load Trucks with Goods, move Trucks to random Warehouses repeat Rule violations Separate Kafka producers and consumers Simulation has perfect knowledge, but violation rules checking relies on event stream data Simulation Kafka Producers Violation rules checking Kafka Consumers
  • 44. Design Goal Deliver events produced from each location to Goods in same location Events delivered to Goods in same location Events delivered to Goods in same location Events delivered to Goods in same location Events delivered to Goods in same location
  • 45. Design Variables Topics and Consumers=Goods All locations in 1 topic 1 topic per location Goods de-coupled from Consumers (Consumers < Goods) Every Goods is a Consumer (Group) Goods = Consumers Topics Consumers 1 Many
  • 46. Design Variables Problems? 100s of locations = topics ok, more not ok Too many consumer groups not ok All locations in 1 topic 1 topic per location Goods de-coupled from Consumers (Consumers << Goods) Every Goods is a Consumer (Group) Goods == Consumers Topics Consumers 1 Many
  • 47. Possible Design 1 Multiple topics Goods = Consumers = many 0..n 0..n 0..n 0..n
  • 48. Possible Design 2 Single Topic 1 Consumer Group, decoupled Goods Another component responsible for mapping of location to Goods
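The "another component" in Design 2 could look something like the sketch below: consumers read every location's events from the single topic, and a separate registry resolves which Goods are currently at the event's location. `LocationRegistry`, `dispatch` and the event dictionary shape are hypothetical names chosen for this example, not taken from the Kongo code.

```python
from collections import defaultdict


class LocationRegistry:
    """Tracks which Goods are at which location (warehouse or truck)."""

    def __init__(self):
        self._goods_at = defaultdict(set)
        self._location_of = {}

    def move(self, goods_id, location):
        """Record a load/unload: the Goods leaves its old location, if any."""
        old = self._location_of.get(goods_id)
        if old is not None:
            self._goods_at[old].discard(goods_id)
        self._goods_at[location].add(goods_id)
        self._location_of[goods_id] = location

    def goods_at(self, location):
        return set(self._goods_at.get(location, ()))


def dispatch(event, registry, check):
    """Apply check(goods_id, event) to every Goods at the event's location."""
    return [check(g, event) for g in sorted(registry.goods_at(event["location"]))]


registry = LocationRegistry()
registry.move("art-1", "warehouse-1")
registry.move("drums-1", "warehouse-1")
registry.move("art-1", "truck-7")  # RFID load event: art is now in the truck

event = {"location": "warehouse-1", "metric": "temperature", "value": 30}
checked = dispatch(event, registry, lambda goods_id, ev: goods_id)
```

With this split, the number of Kafka consumers no longer has to grow with the number of Goods; the fan-out from one event to many Goods happens inside the application.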
  • 49. Design check High Fan-out How well does Kafka work for broadcast delivery of the same event to large numbers of consumers?
  • 50. Initial benchmarking 100 Locations 100,000 Goods Fan-out = 1:1000 Option 2 superior Single topic, single consumer group 0 20 40 60 80 100 120 Option 1 Option 2 Relative Throughput (%)
  • 51. 4 Kafka Streams 1 of 4 Kafka APIs Fishing on the Congo Kafka Streams = a complex way of fishing?!
  • 53. Streams 1 of 4 Kafka APIs ■ Kafka has 4 APIs, Producer, Consumer, Connector and Streams! ■ The Streams API allows an application to act as a stream processor ● consuming an input stream from one or more topics and ● producing an output stream to one or more topics ● transforming the input streams to output streams ■ A stream is an unbounded, ordered, replayable, continuously updating data set, consisting of strongly typed key-value records.
  • 54. Processor Topology DAGs of stream processors (nodes) that are connected by streams (edges) Processors transform data by receiving one input record, applying an operation to it, and producing output records.
  • 55. Streams DSL Streams and Tables ■ The Streams DSL has built-in abstractions for streams and tables ● KStream, KTable, GlobalKTable, KGroupedStream, and KGroupedTable. ■ The DSL supports a Declarative functional programming style, with ● stateless transformations (e.g. map and filter) as well as ● stateful transformations such as aggregations (e.g. count and reduce), joins, and windowing.
  • 57. Truck overload! Trucks have a maximum load weight Built a streams application to check for overloading.
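The overload check is a stateful aggregation: keep a running weight per truck and compare it with that truck's limit. The sketch below shows the logic in plain Python rather than with the Kafka Streams DSL (where it would roughly correspond to grouping a stream by truck key and aggregating). The event tuple shape `(truck, kind, weight_kg)` and the function name are assumptions made for this example.

```python
from collections import defaultdict


def check_overloads(events, max_weight):
    """events: iterable of (truck, kind, weight_kg), kind is 'load' or 'unload'.
    max_weight: {truck: limit_kg}.
    Returns (final weight per truck, list of (truck, weight) overload alerts)."""
    weight = defaultdict(float)
    overloaded = []
    for truck, kind, kg in events:
        # Loads add to the running total, unloads subtract from it.
        weight[truck] += kg if kind == "load" else -kg
        if weight[truck] > max_weight[truck]:
            overloaded.append((truck, weight[truck]))
    return dict(weight), overloaded


events = [
    ("t1", "load", 600.0),
    ("t1", "load", 500.0),   # t1 now carries 1100 kg
    ("t2", "load", 300.0),
    ("t1", "unload", 600.0),
]
weights, alerts = check_overloads(events, {"t1": 1000.0, "t2": 1000.0})
```

Because the result depends on every load and unload being applied exactly once and in order per truck, a running total like this is sensitive to duplicated or lost updates.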
  • 59. Understanding and debugging Streams Topologies Use Kafka Streams Topology Visualizer! https://zz85.github.io/kafka-streams-viz/
  • 60. Streaming Problems? ■ Invalid Topology errors ● Some tricky (non-obvious) Kafka streams topology rules ● And cycles aren’t allowed Invalid Topologies
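Since a processor topology is a DAG, the "no cycles" rule can be illustrated with a generic graph check. This sketch uses Python's standard `graphlib` module and is not how Kafka Streams itself validates topologies; the node names and the `processing_order` helper are made up for the example.

```python
from graphlib import CycleError, TopologicalSorter


def processing_order(topology):
    """topology maps each node to the set of nodes it reads from.
    Returns a valid processing order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(topology).static_order())
    except CycleError:
        return None


valid = {
    "source": set(),
    "filter": {"source"},
    "aggregate": {"filter"},
    "sink": {"aggregate"},
}
cyclic = {"a": {"b"}, "b": {"a"}}

order = processing_order(valid)
rejected = processing_order(cyclic)
```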
  • 61. Streaming Problems? ■ Anti-gravity? ● Sometimes truck weights went negative! ● Solution: Turned on “exactly-once” transactional setting ● The transactional producer allows an application to send messages to multiple partitions atomically. ● Weights no longer go negative Negative truck weights!
  • 63. Scaling Congo Inga rapids Scaling is easy? 100 warehouses 200 trucks 10,000 Goods
  • 64. Scalability alternatives Scale out, up and multiple clusters Multiple clusters enables flexible scaling (cluster for violations) Different instance sizes have different network speeds
  • 65. Larger instances reduce end-to-end latency 2 core instances c.f. 4 core instances Higher concurrency and faster network We also offer 8 core Kafka instances (AWS R5’s +SSDs)
  • 66. Total resources Kafka clusters and application cores Application used x2 server cores Kubernetes works well for application deployment, scaling, monitoring
  • 67. Scaling is hard (1) Actually hard to achieve linear scalability Why? Kafka is scalable, but: ■ Hash Collisions ● Too many open files exceptions ● Due to increasing and eventually too many consumers ● Some consumers were timing out ● Why? Some consumers were not receiving any events ● 300 locations and 300 partitions, but only 200 unique values, so only 200 consumers receive events, the rest time out ● This is due to hashing collisions, some partitions get > 1 locations, others 0
  • 68. Key parking problem Well known problem Knuth 1962 Ensure number of keys >>> number of partitions >= number of consumers (in a group)
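The "only 200 of 300" observation matches a simple occupancy estimate. If n keys are hashed uniformly at random into p partitions, the expected number of partitions that receive at least one key is

    p × (1 − (1 − 1/p)^n)

which for n = p = 300 is about 190. The sketch below computes this and also runs a small simulation. It uses MD5 purely as a stand-in hash (Kafka's default partitioner uses a murmur2 hash of the key bytes), and the `location-N` key names are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a stand-in hash (not Kafka's murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partitions_used(keys, num_partitions):
    """Count the partitions that receive at least one key."""
    return len({partition_for(k, num_partitions) for k in keys})


def expected_partitions_used(num_keys, num_partitions):
    """Expected number of non-empty partitions under uniform random hashing."""
    return num_partitions * (1 - (1 - 1 / num_partitions) ** num_keys)


keys = [f"location-{i}" for i in range(300)]
used = partitions_used(keys, 300)            # simulated count
expected = expected_partitions_used(300, 300)  # analytic estimate
```

The same formula shows why the slide asks for far more keys than partitions: with 30,000 keys over 300 partitions the expected number of empty partitions is effectively zero.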
  • 69. Scaling is hard (2) Cloudy with a chance of Rebalancing Storms
  • 70. Rebalancing storms ■ Rebalancing storms result in some consumers not receiving events (drop in throughput) and a very slow start up time for new consumers (> 20s) ■ Need to ensure consumers are started and are polling before trying to add lots more consumers ■ So try to keep total number of consumers as low as practical (next…)
  • 71. Scaling is hard (3) Too much (consumer) scalability is bad
  • 72. Consumers Less is more ■ Even though we used the design with the fewest consumers… ■ If Kafka consumers take too long to read events and process them, then you need more consumer threads (and more partitions), impacting Kafka cluster scalability ■ Solution? Minimize consumer response time ● Only use consumers for reading events ● Do event processing asynchronously or in a separate thread pool ■ My #1 Kafka rule is ● “Kafka is easy to scale with the smallest number of consumers”
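One way to keep consumers fast is to have the polling loop do nothing but hand events to a worker pool. The sketch below shows that shape with Python's standard thread pool; the batches stand in for the results of successive polls, and `process` and `poll_loop` are hypothetical names. It deliberately leaves out offset management: with asynchronous processing, committing offsets before the work has finished risks losing events on failure, so a real design has to decide when to commit.

```python
from concurrent.futures import ThreadPoolExecutor


def process(event):
    """Stand-in for slow event processing such as rule checking."""
    return event * 2


def poll_loop(batches, pool):
    """Read each batch and immediately hand its events to the pool,
    so the reading loop never waits on processing."""
    futures = []
    for batch in batches:
        futures.extend(pool.submit(process, event) for event in batch)
    return futures


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = poll_loop([[1, 2], [3], [4, 5, 6]], pool)
    # Collect results in submission order once processing completes.
    results = [f.result() for f in futures]
```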
  • 73. More information The End ■ Kongo code: ● https://github.com/instaclustr/kongo2 ● https://github.com/instaclustr/kongokafkastreams ■ All blog series, including Kongo, and latest, ● Anomalia Machina – Kafka+Cassandra+Kubernetes, and ● Geospatial Anomalia Machina – Kafka+Cassandra+Kubernetes+Geospatial queries & indexing ● https://www.instaclustr.com/paul-brebner/ ■ Visual Introduction To Kafka ● https://www.instaclustr.com/resource/apache-kafka-a-visual-introduction/ ■ The Instaclustr Managed Platform ● https://www.instaclustr.com/platform/ ● Free Trial – https://console.instaclustr.com/user/signup