1
RUNNING STREAMING APPS ON DOCKER
ONE COULD... BUT SHOULD ONE?
2
3
LEARN YOU A DOCKER
GAMEDAY
SURPRISE EXPERIMENT
4
LEARN YOU A DOCKER
New Paradigms
5
HOW (NOT?) TO SET UP A STREAMING PLATFORM
6
KAFKA STREAMS
“Hey, Confluent created this really cool new streaming data framework we should try out.”
HOW (NOT?) TO SET UP A STREAMING PLATFORM
7
“But what if
someone else’s app
eats up all of my
resources?”
8
Image credit: DesertIslandBrooklyn.com
9
10
KAFKA STREAMS
“Hey, Confluent created this really cool new streaming data framework we should try out.”
DOCKER
“You know what we should do? RUN OUR STREAMING APPS ON DOCKER!”
HOW (NOT?) TO SET UP A STREAMING PLATFORM
11
Docker images
made it easy for us
to deploy anywhere
very quickly.
NO MORE CHEF!
(WELL, MOSTLY.)
12
Isolated containers
freed us from
systemwide
requirements.
13
Build-time arguments
made our deploy
setup faster and
more flexible.
14
Docker also helped
with a Kafka Connect
issue: rebalancing.
15
Adding a new
connector to a KC
cluster causes a
rebalance of all
workers.
16
17
18
KAFKA STREAMS
“Hey, Confluent created this really cool new streaming data framework we should try out.”
DOCKER
“You know what we should do? RUN OUR STREAMING APPS ON DOCKER!”
CONFUSION
“How does Kafka Streams work? What’s happening inside my app? How do you even Docker?”
HOW (NOT?) TO SET UP A STREAMING PLATFORM
19
GAMEDAY
Docker and the JVM
20
GAMEDAY (n):
All your coworkers get
together to try and
make your brand new
system fall over.
21
It’s great for testing
the resilience of your
system...
22
It’s great for testing
the resilience of your
system...
But it also becomes a
test of its viability.
23
High latency /
Slow recovery
=
NOT VIABLE
24
25
KAFKA STREAMS
26
KAFKA STREAMS
DOCKER SWARM
27
8 replicas
Multithreaded
> 1 billion records / day
28
29
STEP 1:
Kill all but the Swarm manager.
30
STEP 1:
Kill all but the Swarm manager.
STEP 2:
Wait for Swarm to restart replicas.
31
1 replica
+
Lots of messages
+
Lots of processing
32
“It is better to fail
in originality than
to succeed in
imitation.”
Herman Melville
33
After we restarted
all the replicas, lag
continued to
stagnate.
34
Record lag now
numbered in the
hundreds of
millions.
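To put these numbers in perspective, the following Python sketch works through the arithmetic. Only the daily volume comes from the slides; the lag figure and the processing rate are hypothetical values chosen purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_second(records_per_day):
    """Average arrival rate implied by a daily record count."""
    return records_per_day / SECONDS_PER_DAY


def catch_up_seconds(lag, processing_rate, arrival_rate):
    """Time needed to drain a backlog while new records keep arriving.

    Returns None when processing is not faster than arrival,
    because the lag then never shrinks.
    """
    net_drain = processing_rate - arrival_rate
    if net_drain <= 0:
        return None
    return lag / net_drain


if __name__ == "__main__":
    arrival = records_per_second(1_000_000_000)  # "> 1 billion records / day"
    lag = 300_000_000         # hypothetical value for "hundreds of millions"
    processing = 2 * arrival  # hypothetical: app handles twice the arrival rate
    print(round(arrival))                                     # 11574
    print(catch_up_seconds(lag, processing, arrival) / 3600)  # roughly 7.2 hours
    print(catch_up_seconds(lag, arrival, arrival))            # None: lag stagnates
```

Under these assumptions, an app that merely keeps pace with incoming traffic never recovers, which matches the stagnating lag described above.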
35
36
37
38
In the end,
a simple
`docker stats`
revealed the
problem:
39
MEMORY
CONSTRAINTS.
40
Why do we want to
run JVMs inside
Docker containers?
41
So we can pack
lots of them
onto one server ...
Image credit: datadoghq.com
42
So we can pack
lots of them
onto one server ...
and take advantage
of Docker’s resource
constraints.
Image credit: datadoghq.com
43
The JVM doesn’t
care that you’ve
set resource
constraints.
44
It’s not cgroup aware.
Image credit: mairin.wordpress.com
45
JVM ERGONOMICS
The process by which the JVM tunes itself depending on its environment.
GARBAGE COLLECTION
INITIAL HEAP SIZE
MAXIMUM HEAP SIZE
RUNTIME COMPILER
46
It can be easy to
run a single JVM
on a server-class
machine with
minimal tuning.
47
But how does the
JVM behave when
you combine
ergonomics with
cgroups?
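A rough way to see the mismatch is to model the default heap sizing as simple arithmetic. The Python sketch below assumes the commonly documented ergonomics default of a maximum heap of about one quarter of the memory the JVM can see; the exact rule varies by JVM version and flags, and the host and container sizes are hypothetical. Newer JDKs (10 and later, and 8u191 and later) can read the cgroup limit, which the `cgroup_aware` switch imitates.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_max_heap(visible_memory_bytes, fraction=4):
    """Assumed ergonomics default: max heap = visible memory / 4."""
    return visible_memory_bytes // fraction


def heap_report(host_memory_bytes, container_limit_bytes, cgroup_aware):
    """Compare the heap the JVM would pick with the container's memory limit."""
    if cgroup_aware:
        visible = min(host_memory_bytes, container_limit_bytes)
    else:
        # A JVM that ignores cgroups sizes itself from the whole host.
        visible = host_memory_bytes
    max_heap = default_max_heap(visible)
    return {
        "visible": visible,
        "max_heap": max_heap,
        "exceeds_limit": max_heap > container_limit_bytes,
    }


if __name__ == "__main__":
    host = 64 * GIB   # hypothetical server
    limit = 1 * GIB   # hypothetical container memory limit
    print(heap_report(host, limit, cgroup_aware=False))  # max_heap is 16 GiB
    print(heap_report(host, limit, cgroup_aware=True))   # max_heap is 256 MiB
```

In the first case the JVM believes it may grow far beyond what the container allows. Setting an explicit `-Xmx` removes the dependence on what the JVM happens to detect.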
48
We’ve established
that my apps were
suffering due to
memory
constraints.
49
But they weren’t
being reaped
by the OOM killer…
Image credit: turnoff.us
50
But they weren’t
being reaped
by the OOM killer…
They were just
running
incredibly
slowly.
Image credit: turnoff.us
51
This is
generally not
something
you want to
see.
52
Limiting memory
like this allows you
2x as much swap...
53
Limiting memory
like this allows you
2x as much swap...
... while this
constrains your swap
limit to 300M.
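The commands shown on these slides did not survive extraction, so the following Python sketch only models the documented meaning of the `--memory` and `--memory-swap` options of `docker run`. As documented, `--memory-swap` is the combined total of memory plus swap. Leaving it unset lets the container use as much swap as its `--memory` value when the host has swap configured. Setting it equal to `--memory` leaves no swap at all. The function name and the 300 MiB figures are illustrative.

```python
MIB = 1024 ** 2


def effective_swap_bytes(memory, memory_swap=None):
    """Swap a container may use, given its --memory and --memory-swap values.

    Returns None for unlimited swap (--memory-swap=-1).
    Assumes the host has swap enabled.
    """
    if memory <= 0:
        raise ValueError("memory must be a positive number of bytes")
    if memory_swap is None or memory_swap == 0:
        # Unset (or 0, which is treated as unset): swap equal to memory,
        # so memory plus swap totals twice the memory limit.
        return memory
    if memory_swap == -1:
        return None
    if memory_swap < memory:
        raise ValueError("memory_swap is a total and cannot be below memory")
    # memory_swap is memory + swap combined.
    return memory_swap - memory


if __name__ == "__main__":
    print(effective_swap_bytes(300 * MIB) // MIB)             # 300
    print(effective_swap_bytes(300 * MIB, 300 * MIB) // MIB)  # 0
    print(effective_swap_bytes(300 * MIB, 1024 * MIB) // MIB) # 724
```

A JVM whose heap outgrows the memory limit but still fits in memory plus swap is not killed. It pages instead, which is consistent with apps that ran "incredibly slowly" rather than being reaped.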
54
Tune your JVMs or
you might break
your system by
deploying to a
different server.
55
WHAT HAVE WE LEARNED SO FAR?
LESSON ONE
Your Kafka Streams app is
just a regular Java app.
Tune settings carefully
and intentionally so you
can take full advantage of
containerization.
56
SURPRISE
EXPERIMENT
Adventures in Networking
57
My manager came
to me with an
urgent proposal.
58
TRENDING SEARCHES ON ETSY
59
RECREATE THIS... USING STREAMING DATA.
TRENDING SEARCHES ON ETSY
60
TRENDING
SEARCHES:
Aggregate query
counts, get top K.
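The core of that computation can be sketched in a few lines of Python. This is a batch illustration of "count queries, then take the top K", not the Kafka Streams topology from the talk, where the counts would be updated continuously as records arrive. The function name and the normalization step are assumptions made for this example.

```python
from collections import Counter


def top_k_searches(queries, k):
    """Count normalized search queries and return the k most frequent."""
    counts = Counter(
        q.strip().lower()  # treat "Lamp " and "lamp" as the same query
        for q in queries
        if q.strip()       # ignore empty searches
    )
    return counts.most_common(k)


if __name__ == "__main__":
    sample = ["lamp", "Lamp ", "rug", "lamp", "rug", "mug", ""]
    print(top_k_searches(sample, 2))  # [('lamp', 3), ('rug', 2)]
```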
61
62
The aggregation
worked ... but the
interactive queries
piece didn’t.
63
Sometimes I would
see data…
64
Sometimes I would
see data…
... and sometimes
I would see this.
*Etsy’s 404 image.
65
It didn’t seem to be
following any sort
of pattern.
66
Because reloading
worked ...
Sometimes.
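One plausible mechanism for this kind of intermittent result, offered as an assumption rather than the diagnosis from the talk, is that each Kafka Streams instance holds only part of the state, so a query answered by an instance that does not own the key comes back empty. The Python simulation below spreads reloads across replicas round-robin. The hash function is a stand-in for Kafka's partitioner, and all names are hypothetical.

```python
import itertools
import zlib


def owner_of(key, num_instances):
    """Stand-in for partition assignment: which instance holds this key."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def query(instance_id, key, num_instances, store):
    """Answer locally only; None models the 404 from a non-owning instance."""
    if owner_of(key, num_instances) != instance_id:
        return None
    return store.get(key)


def simulate_reloads(key, num_instances, store, reloads):
    """Send each reload to the next replica, as a round-robin balancer would."""
    balancer = itertools.cycle(range(num_instances))
    return [query(next(balancer), key, num_instances, store)
            for _ in range(reloads)]


if __name__ == "__main__":
    results = simulate_reloads("vintage lamp", 4, {"vintage lamp": 42}, 8)
    print(results.count(42), "hits,", results.count(None), "misses")
```

With four replicas, only one reload in four reaches the owning instance. An RPC layer that forwards each request to the instance hosting the key avoids this, provided every instance advertises an address the others can actually reach.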
67
At this point I was
starting to tear my
hair out...
68
…and then I decided
to RTFM.
69
This was the code for
the RPC layer.
70
This was the Docker
service definition.
71
Can you guess what
we did wrong?
72
73
74
75
Image credit: thenewstack.io
76
Overlay network:
More overhead,
higher latency
77
Host network:
Lower latency,
less reusability
78
79
MACVLAN:
Less overhead,
more configuration
80
WHAT HAVE WE LEARNED SO FAR?
LESSON ONE
Your Kafka Streams app is
just a regular Java app.
Tune settings carefully
and intentionally so you
can take full advantage of
containerization.
LESSON TWO
Networking in containers
is not always
straightforward. It adds
an extra layer of
complexity to your
application deployment.
81
KAFKA STREAMS
“Hey, Confluent created this really cool new streaming data framework we should try out.”
DOCKER
“You know what we should do? RUN OUR STREAMING APPS ON DOCKER!”
CONFUSION
“How does Kafka Streams work? What’s happening inside my app? How do you even Docker?”
HOW (NOT?) TO SET UP A STREAMING PLATFORM
82
KAFKA STREAMS
“Hey, Confluent created this really cool new streaming data framework we should try out.”
DOCKER
“You know what we should do? RUN OUR STREAMING APPS ON DOCKER!”
CONFUSION
“How does Kafka Streams work? What’s happening inside my app? How do you even Docker?”
HOW (NOT?) TO SET UP A STREAMING PLATFORM
LEARN ALL THE THINGS!
83
WHAT HAVE WE LEARNED SO FAR?
LESSON ONE
Your Kafka Streams app is
just a regular Java app.
Tune settings carefully
and intentionally so you
can take full advantage of
containerization.
LESSON TWO
Networking in containers
is not always
straightforward. It adds
an extra layer of
complexity to your
application deployment.
LESSON THREE
Learning two things at
the same time is hard!
Docker will change how
you think about
deployment and failure
recovery.
84
In the end,
running Kafka
Streams on Docker
turned out just fine.

Kafka Summit SF 2017 - Running Streaming Apps on Docker

  • 85. Working out the best way to use Docker took by far the most time.
  • 86. WHAT HAVE WE LEARNED SO FAR? LESSON ONE: Your Kafka Streams app is just a regular Java app. Tune settings carefully and intentionally so you can take full advantage of containerization. LESSON TWO: Networking in containers is not always straightforward. It adds an extra layer of complexity to your application deployment. LESSON THREE: Learning two things at the same time is hard! Docker will change how you think about deployment and failure recovery. LESSON FOUR: Make sure you’re using Docker for the right reasons! Otherwise, it’s not worth the time.
  • 87. If you’re thinking about running your Streams apps in containers...
  • 89. Do you have a good use case for Docker?
  • 90. MULTIPLE DEPLOY SETUPS (FLEXIBLE IMAGES)
  • 91. MULTIPLE DEPLOY SETUPS (FLEXIBLE IMAGES). RUNNING AT LARGE SCALE (RESOURCE CONSTRAINTS).
  • 92. Do you have a good use case for Docker? Do you have an existing Docker setup?
  • 93. MULTIPLE DEPLOY SETUPS (FLEXIBLE IMAGES). RUNNING AT LARGE SCALE (RESOURCE CONSTRAINTS). INTEGRATING WITH EXISTING DOCKER SETUP FOR MICROSERVICES.
  • 94. Do you have a good use case for Docker? Do you have an existing Docker setup? Do you have Docker expertise on your team?
  • 95. Do you have a good use case for Docker? Do you have an existing Docker setup? Do you have Docker expertise on your team? Or were you already running Kafka Streams in production?
  • 96. Even if the answer to all these questions is no...
  • 98. “It is better to fail in originality than to succeed in imitation.” Herman Melville
  • 99. FURTHER QUESTIONS? EMAIL: nikki.thean@gmail.com TWITTER: @NikkiThean LINKEDIN: http://bit.ly/2weFcJz