instaclustr.com | Twitter @instaclustr | info@instaclustr.com
Lessons Learned from Building an Apache Kafka Managed Service
Introduction
● Over 20 million node-hours of experience managing Cassandra, Spark and Elassandra
● Our platform provides automated provisioning, monitoring and management
● Available on AWS, GCP, Azure and IBM Cloud
● Managed Apache Kafka released May 21st
Agenda
● Context - our offering and development process
● Hardware choice and benchmarking
● Topic and user management
● Broker security configuration
● Monitoring
● Backup and Restore
Instaclustr Managed Kafka - Key Features
● Preview Release available:
○ Open source Apache Kafka and Zookeeper provisioned in AWS, GCP and Azure
○ Broker monitoring
○ Instaclustr monitoring and provisioning API support
○ Private network clusters (AWS only)
○ Run in your cloud provider account or ours
○ Topic management via a custom CLI tool
Instaclustr Managed Kafka - Key Features
● For GA (end June):
○ SOC2 compliant
○ User & credential management
○ Providing more cluster config options
○ Topic level and synthetic transaction monitoring
○ Infrastructure config tuning
Instaclustr Managed Kafka - Development Process
● First customer requests 2016
● Internal infrastructure deployment and usage of Kafka mid 2017
● Managed service platform development commenced November 2017
● Early access program with 4 customers commenced December 2017
● Public preview release 21 May 2018
● GA expected 25 June 2018
Hardware Choice and Benchmarking - GP2 vs ST1
● Disk Type
○ AWS benchmark - r4.large with 500GB of disk
■ 1 x 500GB ST1 volume
■ 10 x 50GB GP2 volumes in RAID0 configuration
○ ST1 averaged about 10% higher throughput than GP2 EBS
○ ST1 is 45% of the cost of GP2
○ Non-RAIDed mount simplifies resizing EBS volumes
Type   Writes (msg/s)   Reads (msg/s)   Mixed (msg/s)
ST1    223,851          149,506         W: 171,305 / R: 49,898
GP2    203,409          127,127         W: 162,966 / R: 44,869
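The "about 10%" average can be checked against the table. The sketch below is only a sanity check on the published figures, treating the table's unit as messages per second (an assumption) and using dictionary keys that are made up for illustration.

```python
# Throughput figures copied from the benchmark table above.
# Unit assumed to be messages per second.
RESULTS = {
    "ST1": {"writes": 223_851, "reads": 149_506,
            "mixed_writes": 171_305, "mixed_reads": 49_898},
    "GP2": {"writes": 203_409, "reads": 127_127,
            "mixed_writes": 162_966, "mixed_reads": 44_869},
}


def percent_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Gain of ST1 over GP2 for each workload column.
gains = {
    workload: percent_gain(RESULTS["ST1"][workload], RESULTS["GP2"][workload])
    for workload in RESULTS["ST1"]
}
average_gain = sum(gains.values()) / len(gains)

for workload, gain in gains.items():
    print(f"{workload:>12}: ST1 is {gain:.1f}% faster than GP2")
print(f"{'average':>12}: {average_gain:.1f}%")
```

The per-workload gains range from roughly 5% (mixed writes) to roughly 18% (reads), so the headline average hides a fair amount of spread.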
[Figure: benchmark charts for ST1 and GP2]
Provider Comparison
[Figure: provider comparison chart]
Hardware Choice and Benchmarking - SSL vs non-SSL
● Encryption enabled on broker-to-broker and client-to-broker
○ AWS benchmark - r4.large with a 1500GB ST1 disk
○ 512 byte messages
○ ~30% decrease in throughput with Broker and Client SSL enabled
● Follow-up benchmarks on OpenJDK 8 vs. 9, based on KAFKA-2561
○ 50% increased throughput in writes
○ 80% increased throughput in reads
Hardware Choice and Benchmarking - Number of Topics
● Possible urban myth that increasing topics reduces performance
● However, more topics = more partitions
● Significantly slows recovery time from node failure
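A rough way to see why recovery slows: every partition replica hosted on a failed broker needs attention (leader re-election, then catch-up when the broker returns), and the number of replicas per broker grows linearly with the topic count. The sketch below is back-of-envelope arithmetic only. The partition count, replication factor and broker count are hypothetical, not the benchmark's settings.

```python
def replicas_per_broker(topics: int, partitions_per_topic: int,
                        replication_factor: int, brokers: int) -> float:
    """Average number of partition replicas each broker hosts."""
    return topics * partitions_per_topic * replication_factor / brokers


# Hypothetical cluster shape: 3 partitions per topic, replication factor 3, 6 brokers.
for topics in (10, 100, 1000, 5000):
    load = replicas_per_broker(topics, partitions_per_topic=3,
                               replication_factor=3, brokers=6)
    print(f"{topics:>5} topics -> {load:>6.0f} replicas per broker")
```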
[Figure: benchmark results for 10, 100, 1000 and 5000 topics]
Hardware Choice and Benchmarking - Colocated Zookeeper
● Often recommended to host Zookeeper separately from Kafka
● However, recent changes have significantly reduced load on Zookeeper from Kafka
○ Consumer offsets are no longer stored in Zookeeper
● Our benchmarking showed no measurable difference in performance, at least for smaller clusters
Hardware Choice and Benchmarking - Colocated Zookeeper
[Figure: consumer rate with separate Zookeeper vs. consumer rate with colocated Zookeeper]
● 6 node cluster with broker restart
○ Similar results with dedicated Zookeeper disk vs. shared
Topic and User Configuration Management
● Kafka utilities require direct access to Zookeeper
● Zookeeper does not have a robust external security model
● We felt that giving customers access to Zookeeper was a risk
● Solutions
○ Developed command line tool to use Kafka API for topic configuration
https://github.com/instaclustr/ic-kafka-tools
■ Future: Console UI support?
■ We value topic configuration versioning and management
○ Adding user management to Instaclustr Console
■ Additional authentication required
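One way to picture versioned topic configuration is to keep the desired settings in version control and compute the changes needed to bring the cluster in line. The sketch below is a generic illustration, not how ic-kafka-tools works. `plan_topic_changes` and the sample topics are hypothetical, and applying the plan through the Kafka admin API is left out.

```python
from typing import Dict, List, Tuple

TopicConfig = Dict[str, str]


def plan_topic_changes(desired: Dict[str, TopicConfig],
                       actual: Dict[str, TopicConfig]) -> List[Tuple[str, str, str]]:
    """Return (action, topic, detail) tuples that move `actual` towards `desired`."""
    plan = []
    for topic in sorted(desired):
        if topic not in actual:
            plan.append(("create", topic, str(desired[topic])))
            continue
        for key in sorted(desired[topic]):
            wanted = desired[topic][key]
            current = actual[topic].get(key)
            if current != wanted:
                plan.append(("alter", topic, f"{key}: {current} -> {wanted}"))
    for topic in sorted(set(actual) - set(desired)):
        # Deletion is destructive, so unknown topics are flagged for review only.
        plan.append(("review", topic, "exists on cluster but not in versioned config"))
    return plan


# Hypothetical desired state (from version control) and current cluster state.
desired = {"orders": {"retention.ms": "604800000", "cleanup.policy": "delete"},
           "audit": {"retention.ms": "-1"}}
actual = {"orders": {"retention.ms": "86400000", "cleanup.policy": "delete"},
          "scratch": {}}

plan = plan_topic_changes(desired, actual)
for step in plan:
    print(step)
```

Keeping the plan as plain data means it can be reviewed before anything is applied to the cluster.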
Broker Security Configuration
● Using SCRAM (Salted Challenge Response Authentication Mechanism) authentication
○ Used for client->broker
○ Broker->broker uses SASL plaintext
● Using SASL plaintext authentication
○ Used for broker->broker
○ We were planning to integrate SCRAM authentication here too, but changing it through dynamic configuration still requires a broker restart
○ Instead, we are planning short-lived signed broker keys, as their dynamic configuration does not require a restart
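For readers unfamiliar with SCRAM, the key property is that the server stores derived keys rather than the password. The sketch below follows the derivation in RFC 5802 using only the Python standard library. It assumes the SHA-256 variant (the slides do not say which hash is used), skips the password normalisation the RFC calls for, and uses 4096 iterations only because that is the count in the RFC's example exchange.

```python
import hashlib
import hmac
import os


def scram_credentials(password: str, salt: bytes, iterations: int = 4096):
    """Derive the (StoredKey, ServerKey) pair a SCRAM-SHA-256 server keeps per user."""
    # SaltedPassword := Hi(password, salt, i), which is PBKDF2 with HMAC-SHA-256.
    salted_password = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations)
    # ClientKey := HMAC(SaltedPassword, "Client Key"); StoredKey := H(ClientKey).
    client_key = hmac.new(salted_password, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    # ServerKey := HMAC(SaltedPassword, "Server Key").
    server_key = hmac.new(salted_password, b"Server Key", hashlib.sha256).digest()
    return stored_key, server_key


salt = os.urandom(16)  # random per-user salt, stored alongside the keys
stored_key, server_key = scram_credentials("example-password", salt)
print(stored_key.hex())
print(server_key.hex())
```

During authentication the client proves knowledge of the password with a value derived from these keys and fresh nonces, so the password itself is never sent to the broker.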
Broker Security Configuration
● Access to managed clusters
○ Public IPs and whitelisting in firewall (security group or equivalent)
○ Private IPs with VPC Peering (or equivalent in other cloud providers)
○ Private Network Clusters where nodes are not allocated public IPs and gateway box is used for admin access
○ Don’t expose Zookeeper through firewall due to weak security model
Monitoring
● Metrics exposed via JMX
○ Custom collection agent -> RabbitMQ (planned to migrate to Kafka) -> Riemann -> Cassandra+Spark -> Console, APIs, Grafana
● Exposing broker-level and per-topic metrics
● Alerting
○ Basics: service state, free disk space, server still exists
○ Kafka metrics: offline partitions, active controllers != 1, under-replicated partitions
■ Active controller alert is very sensitive; we are re-assessing alert thresholds
○ Synthetic transactions: publish and consume a message on a controlled topic, measure success and latency
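The Kafka metric checks above can be sketched as a function over per-broker samples. This is a minimal illustration, not the Instaclustr alerting pipeline. The metric names are the standard Kafka JMX gauges, while `evaluate_alerts` and the sample data are hypothetical.

```python
from typing import Dict, List

# Per-broker samples of three standard Kafka JMX gauges:
#   kafka.controller:type=KafkaController,name=ActiveControllerCount
#   kafka.controller:type=KafkaController,name=OfflinePartitionsCount
#   kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
BrokerMetrics = Dict[str, int]


def evaluate_alerts(cluster: Dict[str, BrokerMetrics]) -> List[str]:
    """Return human-readable alerts for one round of metric samples."""
    alerts = []
    # Exactly one broker in the cluster should report itself as controller.
    controllers = sum(m["ActiveControllerCount"] for m in cluster.values())
    if controllers != 1:
        alerts.append(f"active controllers = {controllers}, expected 1")
    offline = sum(m["OfflinePartitionsCount"] for m in cluster.values())
    if offline > 0:
        alerts.append(f"{offline} offline partitions")
    under_replicated = sum(m["UnderReplicatedPartitions"] for m in cluster.values())
    if under_replicated > 0:
        alerts.append(f"{under_replicated} under-replicated partitions")
    return alerts


# Hypothetical sample taken while a controller election is in progress.
sample = {
    "broker-1": {"ActiveControllerCount": 0, "OfflinePartitionsCount": 0,
                 "UnderReplicatedPartitions": 4},
    "broker-2": {"ActiveControllerCount": 0, "OfflinePartitionsCount": 0,
                 "UnderReplicatedPartitions": 0},
}
print(evaluate_alerts(sample))
```

A single sample taken mid-election reports zero controllers, which may be one reason the active controller check is so sensitive. Requiring the condition to hold over several consecutive samples is one possible way to dampen it.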
Monitoring
● Central Logging
○ Fleet logs transferred via Kafka to an Elassandra cluster
○ 1,700 nodes submit via Journalbeat -> Kafka -> Logstash -> Elassandra
○ Kafka experience in this project has been very positive
● Only issue
○ "Auto offset commit failed for group logstash: Commit offsets failed with retriable exception. You should retry committing offsets."
○ We weren’t monitoring consumer lag closely enough
○ Increased consumer session and request timeouts
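Consumer lag is the gap between the newest offset in each partition and the offset the consumer group has committed. The sketch below shows the calculation on plain dictionaries. `consumer_lag` and the sample offsets are hypothetical, and treating a partition with no committed offset as starting from 0 is a simplification. The timeouts mentioned above presumably correspond to the consumer settings `session.timeout.ms` and `request.timeout.ms`.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def consumer_lag(end_offsets: Dict[TopicPartition, int],
                 committed: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Lag per partition: log end offset minus the group's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Simplification: a partition with no commit yet is treated as offset 0.
        lag[partition] = max(end - committed.get(partition, 0), 0)
    return lag


# Hypothetical offsets for a two-partition log topic.
end_offsets = {("fleet-logs", 0): 15_000, ("fleet-logs", 1): 9_200}
committed = {("fleet-logs", 0): 14_950, ("fleet-logs", 1): 4_100}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

Alerting on total lag that keeps growing between samples, rather than on a fixed number, is one way to catch a consumer that is falling behind.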
Backup and Restore
● Internet wisdom = Kafka backups are not a thing
○ Rely on replication within the cluster or MirrorMaker replication to another cluster
● Cassandra experience says backups are valuable
○ Hardware failure is not an issue, but corruption due to app bugs or user error can occur and be spread by replication
● Future
○ Working on regular automated backup and restore of topic and security configuration
○ Consider using Kafka Connect to write important messages to offline backup
Thanks for listening!
● Currently in Preview
● We would love any feedback or suggestions, or just tell us what we missed
● 14-day free trial option (no CC needed) - console.instaclustr.com

Instaclustr Kafka Meetup Sydney Presentation

  • 1. instaclustr.com Twitter @instaclustr info@instaclustr.com Lessons Learned from Building an Apache Kafka Managed Service
  • 2. instaclustr.com Introduction ● Over 20 million node-hours of experience managing Cassandra, Spark and Elassandra ● Our platform provides automated provisioning, monitoring and management ● Available on AWS, GCP, Azure and IBM Cloud ● Managed Apache Kafka released May 21st
  • 3. instaclustr.com Agenda ● Context - our offering and development process ● Hardware choice and benchmarking ● Topic and user management ● Broker security configuration ● Monitoring ● Backup and Restore
  • 4. instaclustr.com Instaclustr Managed Kafka - Key Features ● Preview Release available: ○ Open source Apache Kafka and Zookeeper provisioned in AWS, GCP and Azure ○ Broker monitoring ○ Instaclustr monitoring and provisioning API support ○ Private network clusters (AWS only) ○ Run in your cloud provider account or ours ○ Topic management via a custom CLI tool
  • 5. instaclustr.com Instaclustr Managed Kafka - Key Features ● For GA (end June): ○ SOC2 compliant ○ User & credential management ○ Providing more cluster config options ○ Topic level and synthetic transaction monitoring ○ Infrastructure config tuning
  • 6. instaclustr.com Instaclustr Managed Kafka - Development Process ● First customer requests 2016 ● Internal infrastructure deployment and usage of Kafka mid 2017 ● Managed service platform development commenced November 2017 ● Early access program with 4 customers commenced December 2017 ● Public preview release 21 May 2018 ● GA expected 25 June 2018
  • 7. instaclustr.com Hardware Choice and Benchmarking - GP2 vs ST1 ● Disk Type ○ AWS benchmark - r4.large w 500GB disks ■ 1 x 500GB ST1 volume ■ 10 x 50GB GP2 volumes in RAID0 configuration ○ Avg 10% improved throughput with ST1 vs GP2 EBS ○ ST1 is 45% of the cost of GP2 ○ Non-RAIDed mount simplifies re-sizing EBS volumes
       Type   Writes (m/s)   Reads (m/s)   Mixed (m/s)
       ST1    223,851        149,506       W: 171,305 / R: 49,898
       GP2    203,409        127,127       W: 162,966 / R: 44,869
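
The relative gain behind the ST1 vs GP2 comparison on slide 7 can be recomputed from the table. The following is a minimal sketch that only uses the published figures; how the slide's "Avg 10%" was averaged is not stated, so the unweighted mean here is an assumption.

```python
# Throughput figures (m/s) copied from the slide 7 table.
ST1 = {"writes": 223_851, "reads": 149_506, "mixed_w": 171_305, "mixed_r": 49_898}
GP2 = {"writes": 203_409, "reads": 127_127, "mixed_w": 162_966, "mixed_r": 44_869}


def pct_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Per-workload gain of ST1 over GP2.
gains = {name: pct_gain(ST1[name], GP2[name]) for name in ST1}

# Assumption: a plain unweighted mean across the four measurements.
average_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name:8s} {gain:5.1f}%")
print(f"{'average':8s} {average_gain:5.1f}%")
```

On these numbers the per-workload gain ranges from about 5% (mixed writes) to about 18% (reads), which is in line with the roughly 10% average quoted on the slide.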
  • 10. instaclustr.com Hardware Choice and Benchmarking - SSL vs non-SSL ● Encryption enabled on broker-to-broker and client-to-broker ○ AWS benchmark - r4.large w 1500GB ST1 disk ○ 512 byte messages ○ ~30% decrease in throughput with Broker and Client SSL enabled ● Follow-up benchmarks on OpenJDK 8 vs. 9, based on KAFKA-2561 ○ 50% increased throughput in writes ○ 80% increased throughput in reads
  • 12. instaclustr.com Hardware Choice and Benchmarking - Number of Topics ● Possible urban myth that increasing topics reduces performance ● However, more topics = more partitions ● Significantly slows recovery time from node failure [Figure: benchmark panels for 10 Topics, 100 Topics, 1000 Topics and 5000 Topics]
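
The "more topics = more partitions" point on slide 12 can be made concrete with some simple arithmetic. This is a rough sketch, not the benchmark setup: the partitions per topic, replication factor and broker count below are hypothetical values chosen for illustration, and replicas are assumed to be spread evenly.

```python
import math


def replicas_per_broker(topics: int, partitions_per_topic: int,
                        replication_factor: int, brokers: int) -> int:
    """Average number of partition replicas each broker hosts (even spread assumed)."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return math.ceil(total_replicas / brokers)


# Hypothetical cluster: 3 partitions per topic, replication factor 3, 6 brokers.
for topics in (10, 100, 1000, 5000):
    print(topics, "topics ->", replicas_per_broker(topics, 3, 3, 6), "replicas per broker")
```

Every replica on a failed broker has to be re-elected or caught up, which is one way to see why recovery time grows with the topic count even if steady-state throughput does not suffer.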
  • 13. instaclustr.com Hardware Choice and Benchmarking - Colocated Zookeeper ● Often recommended to host zookeeper separately to Kafka ● However, recent changes have significantly reduced load on Zookeeper from Kafka ○ Consumer offsets are no longer stored in Zookeeper ● Our benchmarking showed no measurable difference in performance, at least for smaller clusters
  • 14. instaclustr.com Hardware Choice and Benchmarking - Colocated Zookeeper ● 6 node cluster with broker restart ○ Similar results with dedicated Zookeeper disk vs. shared [Figure panels: Consumer Rate - Separate; Consumer Rate - Colocated]
  • 15. instaclustr.com Topic and User Configuration Management ● Kafka utilities require direct access to Zookeeper ● Zookeeper does not have a robust external security model ● Felt that providing access to Zookeeper was a risk ● Solutions ○ Developed command line tool to use Kafka API for topic configuration https://github.com/instaclustr/ic-kafka-tools ■ Future: Console UI support? ■ Value topic configuration versioning and management ○ Adding user management to Instaclustr Console ■ Additional authentication required
  • 16. instaclustr.com Broker Security Configuration ● Using SCRAM (Salted Challenge Response Authentication Mechanism) authentication ○ Used for client->broker ○ Broker->broker uses SASL plaintext ● Using SASL plaintext authentication ○ Used for broker->broker ○ We were planning on integrating SCRAM authentication, but dynamic configuration still requires a broker restart ○ Instead, we are planning on short-lived signed broker keys, as dynamic configuration does not require a restart
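
For the client->broker SCRAM setup on slide 16, a client typically needs three standard Kafka properties: `security.protocol`, `sasl.mechanism` and `sasl.jaas.config`. The sketch below renders them as a properties file. The slides do not say which SCRAM variant or transport is used, so the `SCRAM-SHA-256` and `SASL_SSL` defaults are assumptions, and the helper name `scram_client_properties` is hypothetical.

```python
ALLOWED_MECHANISMS = {"SCRAM-SHA-256", "SCRAM-SHA-512"}
ALLOWED_PROTOCOLS = {"SASL_SSL", "SASL_PLAINTEXT"}


def scram_client_properties(username: str, password: str,
                            mechanism: str = "SCRAM-SHA-256",
                            security_protocol: str = "SASL_SSL") -> str:
    """Render Kafka client properties for SCRAM authentication (illustrative helper)."""
    if mechanism not in ALLOWED_MECHANISMS:
        raise ValueError(f"unsupported SCRAM mechanism: {mechanism}")
    if security_protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported security protocol: {security_protocol}")
    # Keep the sketch simple: refuse values that would need escaping in JAAS syntax.
    for value in (username, password):
        if '"' in value or "\\" in value:
            raise ValueError("quotes and backslashes are not handled in this sketch")
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    lines = [
        f"security.protocol={security_protocol}",
        f"sasl.mechanism={mechanism}",
        f"sasl.jaas.config={jaas}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder credentials, for illustration only.
props = scram_client_properties("example-user", "example-password")
print(props)
```

The resulting text can be saved as a `client.properties` file and passed to the standard Kafka command line tools or loaded by a Java client.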
  • 17. instaclustr.com Broker Security Configuration ● Access to managed clusters ○ Public IPs and whitelisting in firewall (security group or equivalent) ○ Private IPs with VPC Peering (or equivalent in other cloud providers) ○ Private Network Clusters where nodes are not allocated public IPs and gateway box is used for admin access ○ Don’t expose Zookeeper through firewall due to weak security model
  • 18. instaclustr.com Monitoring ● Metrics exposed via JMX ○ Custom collection agent -> RabbitMQ (planned to migrate to Kafka) -> Riemann -> Cassandra+Spark -> Console, APIs, Grafana ● Exposing broker-level and per-topic metrics ● Alerting ○ Basics: service state, disk usage / free space, server still exists ○ Kafka metrics: offline partitions, active controllers != 1, under-replicated partitions ■ Active controller alert is very sensitive; we are re-assessing alert thresholds ○ Synthetic transactions: publish and consume a message on a controlled topic, measure success and latency
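
The three Kafka alert conditions on slide 18 can be sketched as a small rule check. This is not Instaclustr's alerting code: it assumes the collection agent has already read the standard broker JMX gauges `ActiveControllerCount`, `OfflinePartitionsCount` and `UnderReplicatedPartitions` into a per-broker dictionary, and the function name and sample values are hypothetical.

```python
from typing import Dict, List


def evaluate_alerts(brokers: Dict[str, Dict[str, int]]) -> List[str]:
    """Return alert messages for one snapshot of per-broker metric values."""
    alerts: List[str] = []

    # Exactly one broker in the cluster should report itself as the controller.
    active = sum(m.get("ActiveControllerCount", 0) for m in brokers.values())
    if active != 1:
        alerts.append(f"cluster: active controllers = {active} (expected 1)")

    for name, metrics in sorted(brokers.items()):
        offline = metrics.get("OfflinePartitionsCount", 0)
        if offline > 0:
            alerts.append(f"{name}: {offline} offline partitions")
        under_replicated = metrics.get("UnderReplicatedPartitions", 0)
        if under_replicated > 0:
            alerts.append(f"{name}: {under_replicated} under-replicated partitions")
    return alerts


# Hypothetical snapshot: broker-2 is lagging and no broker is currently controller.
snapshot = {
    "broker-1": {"ActiveControllerCount": 0, "OfflinePartitionsCount": 0, "UnderReplicatedPartitions": 0},
    "broker-2": {"ActiveControllerCount": 0, "OfflinePartitionsCount": 0, "UnderReplicatedPartitions": 4},
}
for alert in evaluate_alerts(snapshot):
    print(alert)
```

A single snapshot with zero active controllers can occur briefly during a controller election, which may be one reason the slide describes that alert as very sensitive; requiring several consecutive breaches before alerting is one possible mitigation.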
  • 19. instaclustr.com Monitoring ● Central Logging ○ Fleet logs transferred via Kafka to an Elassandra cluster ○ 1,700 nodes submit via Journalbeat -> Kafka -> Logstash -> Elassandra ○ Kafka experience in this project has been very positive ● Only issue ○ Auto offset commit failed for group logstash: Commit offsets failed with retriable exception. You should retry committing offsets. ○ We weren’t monitoring consumer lag closely enough ○ Increased consumer session and request timeouts
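
Consumer lag, which slide 19 says was not monitored closely enough, is the gap between each partition's log end offset and the consumer group's committed offset. The sketch below computes it from two plain dictionaries. Fetching the offsets from a cluster is left out, the function name is hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log end offset minus the group's committed offset."""
    lag: Dict[int, int] = {}
    for partition, end_offset in end_offsets.items():
        # Assumption: no committed offset yet means the group starts from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic read by the logstash group.
end_offsets = {0: 1_500, 1: 1_200, 2: 900}
committed = {0: 1_500, 1: 700}
lag = consumer_lag(end_offsets, committed)
print(lag, "total:", sum(lag.values()))
```

Tracking the total over time, rather than a single reading, shows whether a consumer is falling steadily behind. The timeouts mentioned on the slide correspond to the consumer settings `session.timeout.ms` and `request.timeout.ms`.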
  • 20. instaclustr.com Backup and Restore ● Internet wisdom = Kafka Backups is not a thing ○ Rely on replication within cluster or mirror maker replication to another cluster ● Cassandra experience says backups are valuable ○ Hardware failure is not an issue but corruption due to app bugs or user error can occur and be spread by replication ● Future ○ Working on regular automated backup and restore of topic and security configuration ○ Consider using Kafka Connect to write important messages to offline backup
  • 21. instaclustr.com Thanks for listening! ● Currently in Preview ● Would love any feedback, suggestions or just telling us what we missed ● 14-day free trial option (no CC needed) - console.instaclustr.com