SlideShare a Scribd company logo
Running Kafka
For Maximum Pain
​Todd Palino
​Senior Staff Engineer, Site Reliability
​LinkedIn
To All The Tech Debt
I’ve Loved Before
​Todd Palino
​Senior Staff Engineer, Site Reliability
​LinkedIn
T E C H N I C A L D E B T
The cost of the rework required by choosing
an easy solution now.
SRE
vs
SWE
SRE
❤s
SWE
• Both roles are critical
• Work together to balance
operability and features
• SRE’s job is to enable SWE to
move as quickly as possible
while meeting SLOs
How Big?
• Produced
• Every day
2Trillion
Messages
• Single cluster
• Unique data
5Gbps
Inbound
• Average 3x
consumption
• Before mirroring
18Gbps
Outbound
• Largest clusters
are 250k
• Up to 10k
partitions per
broker
2.5M
Partitions
Sources of Pain
Exponentially increase
your problems by
sharing them
Multitenancy
Kafka’s great!
Everything else around
it sucks
Infrastructure
What do you mean I
have to do it myself?
Management
Multitenancy
Sharing is Caring
• Reduces the hardware footprint
• Less administrative overhead
• One bad actor makes everyone’s life hard
Types of Data
• Member-related
Activity
• Data schemas are
managed by
DMRC
• Aggregated to
some datacenters
Tracking Metrics Queuing Logging
• Application
metrics, service
calls, logs
• Mostly produced
by application
containers
• Only aggregated
to backend
datacenters
• Internal
application data,
messaging
• Largest users are
Samza and Search
• Limited
aggregation in
production only
• Dedicated cluster
for application
logs going to ELK
• High volume, low
retention
• Not aggregated
Multitenancy Woes
• Auto topic creation means
nobody knows who created it
• Multiple producers further
clouds the issue
• Who makes decisions?
• Who is responsible for problems?
Ownership Capacity Security
• No controls means it’s free!
• Getting one person to project
growth is hard
• Getting 100 people to do it is
impossible
• Storage hardware is not
commodity
• Started with zero security
• Impossible to handle sensitive
data
Improvements
• Added an ownership metadata
service
• One committee with control over
shared data schemas
• Moving to disable automatic
topic creation
Ownership Capacity Security
• Quotas to limit bandwidth
• Retention by both time and
bytes to restrict disk usage
• Also forces customers to talk to
us about data usage
• Move all clients to SSL
• Add ACLs for existing usage (after
review)
• Starting to evaluate encryption
Infrastructure
Mirror Maker
• Every change requires a restart
• Grows n2 with number of sites
• Inefficient since 0.8
• Loses key to partition affinity
Mirror Maker
Performance
• Added identity handler for fixed
partition mapping
• Eliminated compression
• Finally off old consumer
• Coming soon to a KIP near you
Message Auditing
• Required to assure mirroring
works
• Makes infrastructure care about
data schema
• Only tracks producers (mostly)
• Relational database doesn’t cut it
for storing audit data
Streaming
Audit
• Moving audit data to headers
• Utilizing Samza for processing
counts
• Adding “cost to serve”
information
Management
Topic Configuration
• No way to manage configs across
multiple clusters
• Creating a new datacenter is a
manual process
• Changes need to be propagated in
a specific order
• Administrative commands are not
protected
Nuage
• One-stop shop for Data
Infrastructure
• Allows creation of topics with
ownership and ACLs
• Uses our Kafka REST interface
for CRUD
Cluster Membership
• No tool to remove brokers
• New brokers take no traffic
• Partition reassignment is basic
• Automatic leader election kills the
cluster
Round 1:
kafka-tools
• kafka-assigner:
• Remove broker
• Rebalance replicas
• Fix replication factor
• Protocol CLI tool
• Adding an admin client
• github.com/linkedin/kafka-tools
Round 2:
Cruise Control
• Dynamic workload rebalancing
• Self-healing clusters
• Manages multiple goals (network,
disk, CPU, rack)
• Requires no additional code
• Open source now!
What Else?
What Needs Attention?
• Very few metrics
• One bad partition breaks it
Log Compaction Client Config Upgrading
• Client and broker cannot
negotiate
• Configurations are essentially
shared secrets
• No information on the version of
clients connecting
• Message format changes are still
troubling
• Broker upgrades must be
carefully ordered
• Often no clear way to roll back
Make It Easier
• Cruise Control
• https://github.com/linkedin/cruise-control
• Kafka Monitor
• https://github.com/linkedin/kafka-monitor
• Burrow
• https://github.com/linkedin/Burrow
• kafka-tools
• https://github.com/linkedin/kafka-tools
LinkedIn Open Source Get Involved
• Community
• users@kafka.apache.org
• dev@kafka.apache.org
• Bugs and Work:
• https://issues.apache.org/jira/projects/KAFKA
Thank you

More Related Content

What's hot

Cloudstate - Towards Stateful Serverless
Cloudstate - Towards Stateful ServerlessCloudstate - Towards Stateful Serverless
Cloudstate - Towards Stateful Serverless
Lightbend
 
Microservices in Azure
Microservices in AzureMicroservices in Azure
Microservices in Azure
Doug Vanderweide
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change Management
Matt Ray
 
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsThe Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
Lightbend
 
Serverless: The future of application delivery
Serverless: The future of application deliveryServerless: The future of application delivery
Serverless: The future of application delivery
Doug Vanderweide
 
Microservices, Kubernetes, and Application Modernization Done Right
Microservices, Kubernetes, and Application Modernization Done RightMicroservices, Kubernetes, and Application Modernization Done Right
Microservices, Kubernetes, and Application Modernization Done Right
Lightbend
 
DevCon13 System Administration Basics
DevCon13 System Administration BasicsDevCon13 System Administration Basics
DevCon13 System Administration Basicssysnickm
 
20160524 ibm fast data meetup
20160524 ibm fast data meetup20160524 ibm fast data meetup
20160524 ibm fast data meetup
shinolajla
 
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
KateDuggan2
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
Gibraltar Software
 
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYEQCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
Randy Shoup
 
Concurrency at Scale: Evolution to Micro-Services
Concurrency at Scale:  Evolution to Micro-ServicesConcurrency at Scale:  Evolution to Micro-Services
Concurrency at Scale: Evolution to Micro-Services
Randy Shoup
 
How to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustHow to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and Trust
Apcera
 
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
IT Arena
 
Cloud Native Patterns Using AWS - Practical Examples
Cloud Native Patterns Using AWS - Practical ExamplesCloud Native Patterns Using AWS - Practical Examples
Cloud Native Patterns Using AWS - Practical Examples
Anderson Carvalho
 
Scaling Systems: Architectures that Grow
Scaling Systems: Architectures that GrowScaling Systems: Architectures that Grow
Scaling Systems: Architectures that Grow
Gibraltar Software
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
FoundationDB
 
Webinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful ConsistencyWebinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful Consistency
DataStax
 
Sdn not just a buzzword
Sdn not just a buzzwordSdn not just a buzzword
Sdn not just a buzzword
Jorge Bonilla
 
FoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACIDFoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACID
inside-BigData.com
 

What's hot (20)

Cloudstate - Towards Stateful Serverless
Cloudstate - Towards Stateful ServerlessCloudstate - Towards Stateful Serverless
Cloudstate - Towards Stateful Serverless
 
Microservices in Azure
Microservices in AzureMicroservices in Azure
Microservices in Azure
 
Common Challenges in DevOps Change Management
Common Challenges in DevOps Change ManagementCommon Challenges in DevOps Change Management
Common Challenges in DevOps Change Management
 
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsThe Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
 
Serverless: The future of application delivery
Serverless: The future of application deliveryServerless: The future of application delivery
Serverless: The future of application delivery
 
Microservices, Kubernetes, and Application Modernization Done Right
Microservices, Kubernetes, and Application Modernization Done RightMicroservices, Kubernetes, and Application Modernization Done Right
Microservices, Kubernetes, and Application Modernization Done Right
 
DevCon13 System Administration Basics
DevCon13 System Administration BasicsDevCon13 System Administration Basics
DevCon13 System Administration Basics
 
20160524 ibm fast data meetup
20160524 ibm fast data meetup20160524 ibm fast data meetup
20160524 ibm fast data meetup
 
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...Redgate Database Devops Demo webinar  - Visual Studio Team Services - 21st Fe...
Redgate Database Devops Demo webinar - Visual Studio Team Services - 21st Fe...
 
Scaling Systems: Architectures that grow
Scaling Systems: Architectures that growScaling Systems: Architectures that grow
Scaling Systems: Architectures that grow
 
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYEQCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
QCon New York 2014 - Scalable, Reliable Analytics Infrastructure at KIXEYE
 
Concurrency at Scale: Evolution to Micro-Services
Concurrency at Scale:  Evolution to Micro-ServicesConcurrency at Scale:  Evolution to Micro-Services
Concurrency at Scale: Evolution to Micro-Services
 
How to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustHow to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and Trust
 
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
 
Cloud Native Patterns Using AWS - Practical Examples
Cloud Native Patterns Using AWS - Practical ExamplesCloud Native Patterns Using AWS - Practical Examples
Cloud Native Patterns Using AWS - Practical Examples
 
Scaling Systems: Architectures that Grow
Scaling Systems: Architectures that GrowScaling Systems: Architectures that Grow
Scaling Systems: Architectures that Grow
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
Webinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful ConsistencyWebinar: Eventual Consistency != Hopeful Consistency
Webinar: Eventual Consistency != Hopeful Consistency
 
Sdn not just a buzzword
Sdn not just a buzzwordSdn not just a buzzword
Sdn not just a buzzword
 
FoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACIDFoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACID
 

Similar to Kafka Summit SF 2017 - Running Kafka for Maximum Pain

Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum Pain
Todd Palino
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
Comsysto Reply GmbH
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
Particular Software
 
Iot cloud service v2.0
Iot cloud service v2.0Iot cloud service v2.0
Iot cloud service v2.0
Vinod Wilson
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
Krishna Gade
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
Hakka Labs
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
John D Almon
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
Comsysto Reply GmbH
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
Comsysto Reply GmbH
 
Microservices: The Best Practices
Microservices: The Best PracticesMicroservices: The Best Practices
Microservices: The Best Practices
Pavel Mička
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
Markus Eisele
 
Why retail companies can't afford database downtime
Why retail companies can't afford database downtimeWhy retail companies can't afford database downtime
Why retail companies can't afford database downtime
DBmaestro - Database DevOps
 
Microservices - opportunities, dilemmas and problems
Microservices - opportunities, dilemmas and problemsMicroservices - opportunities, dilemmas and problems
Microservices - opportunities, dilemmas and problems
Łukasz Sowa
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
Ovais Tariq
 

Similar to Kafka Summit SF 2017 - Running Kafka for Maximum Pain (20)

Running Kafka for Maximum Pain
Running Kafka for Maximum PainRunning Kafka for Maximum Pain
Running Kafka for Maximum Pain
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
Iot cloud service v2.0
Iot cloud service v2.0Iot cloud service v2.0
Iot cloud service v2.0
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Microservices: The Best Practices
Microservices: The Best PracticesMicroservices: The Best Practices
Microservices: The Best Practices
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Stay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolithStay productive_while_slicing_up_the_monolith
Stay productive_while_slicing_up_the_monolith
 
Why retail companies can't afford database downtime
Why retail companies can't afford database downtimeWhy retail companies can't afford database downtime
Why retail companies can't afford database downtime
 
Microservices - opportunities, dilemmas and problems
Microservices - opportunities, dilemmas and problemsMicroservices - opportunities, dilemmas and problems
Microservices - opportunities, dilemmas and problems
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 

More from confluent

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
confluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
confluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
confluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
confluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
confluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
confluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
confluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
confluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
confluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
confluent
 

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
NaapbooksPrivateLimi
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
Peter Caitens
 

Recently uploaded (20)

Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Visitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.appVisitor Management System in India- Vizman.app
Visitor Management System in India- Vizman.app
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 

Kafka Summit SF 2017 - Running Kafka for Maximum Pain

  • 1. Running Kafka For Maximum Pain ​Todd Palino ​Senior Staff Engineer, Site Reliability ​LinkedIn
  • 2. To All The Tech Debt I’ve Loved Before ​Todd Palino ​Senior Staff Engineer, Site Reliability ​LinkedIn
  • 3. T E C H N I C A L D E B T The cost of the rework required by choosing an easy solution now.
  • 5. SRE ❤s SWE • Both roles are critical • Work together to balance operability and features • SRE’s job is to enable SWE to move as quickly as possible while meeting SLOs
  • 6. How Big? • Produced • Every day 2Trillion Messages • Single cluster • Unique data 5Gbps Inbound • Average 3x consumption • Before mirroring 18Gbps Outbound • Largest clusters are 250k • Up to 10k partitions per broker 2.5M Partitions
  • 7. Sources of Pain Exponentially increase your problems by sharing them Multitenancy Kafka’s great! Everything else around it sucks Infrastructure What do you mean I have to do it myself? Management
  • 9. Sharing is Caring • Reduces the hardware footprint • Less administrative overhead • One bad actor makes everyone’s life hard
  • 10. Types of Data • Member-related Activity • Data schemas are managed by DMRC • Aggregated to some datacenters Tracking Metrics Queuing Logging • Application metrics, service calls, logs • Mostly produced by application containers • Only aggregated to backend datacenters • Internal application data, messaging • Largest users are Samza and Search • Limited aggregation in production only • Dedicated cluster for application logs going to ELK • High volume, low retention • Not aggregated
  • 11. Multitenancy Woes • Auto topic creation means nobody knows who created it • Multiple producers further clouds the issue • Who makes decisions? • Who is responsible for problems? Ownership Capacity Security • No controls means it’s free! • Getting one person to project growth is hard • Getting 100 people to do it is impossible • Storage hardware is not commodity • Started with zero security • Impossible to handle sensitive data
  • 12. Improvements • Added an ownership metadata service • One committee with control over shared data schemas • Moving to disable automatic topic creation Ownership Capacity Security • Quotas to limit bandwidth • Retention by both time and bytes to restrict disk usage • Also forces customers to talk to us about data usage • Move all clients to SSL • Add ACLs for existing usage (after review) • Starting to evaluate encryption
  • 14. Mirror Maker • Every change requires a restart • Grows n2 with number of sites • Inefficient since 0.8 • Loses key to partition affinity
  • 15. Mirror Maker Performance • Added identity handler for fixed partition mapping • Eliminated compression • Finally off old consumer • Coming soon to a KIP near you
  • 16. Message Auditing • Required to assure mirroring works • Makes infrastructure care about data schema • Only tracks producers (mostly) • Relational database doesn’t cut it for storing audit data
  • 17. Streaming Audit • Moving audit data to headers • Utilizing Samza for processing counts • Adding “cost to serve” information
  • 19. Topic Configuration • No way to manage configs across multiple clusters • Creating a new datacenter is a manual process • Changes need to be propagated in a specific order • Administrative commands are not protected
  • 20. Nuage • One-stop shop for Data Infrastructure • Allows creation of topics with ownership and ACLs • Uses our Kafka REST interface for CRUD
  • 21. Cluster Membership • No tool to remove brokers • New brokers take no traffic • Partition reassignment is basic • Automatic leader election kills the cluster
  • 22. Round 1: kafka-tools • kafka-assigner: • Remove broker • Rebalance replicas • Fix replication factor • Protocol CLI tool • Adding an admin client • github.com/linkedin/kafka-tools
  • 23. Round 2: Cruise Control • Dynamic workload rebalancing • Self-healing clusters • Manages multiple goals (network, disk, CPU, rack) • Requires no additional code • Open source now!
  • 25. What Needs Attention? • Very few metrics • One bad partition breaks it Log Compaction Client Config Upgrading • Client and broker cannot negotiate • Configurations are essentially shared secrets • No information on the version of clients connecting • Message format changes are still troubling • Broker upgrades must be carefully ordered • Often no clear way to roll back
  • 26. Make It Easier • Cruise Control • https://github.com/linkedin/cruise-control • Kafka Monitor • https://github.com/linkedin/kafka-monitor • Burrow • https://github.com/linkedin/Burrow • kafka-tools • https://github.com/linkedin/kafka-tools LinkedIn Open Source Get Involved • Community • users@kafka.apache.org • dev@kafka.apache.org • Bugs and Work: • https://issues.apache.org/jira/projects/KAFKA