The document discusses how to secure Apache Kafka clusters through authentication. It describes several authentication mechanisms, including TLS, SASL/GSSAPI using Kerberos, and SASL/PLAIN and SASL/SCRAM for username/password authentication. TLS provides both server and client authentication but carries a performance overhead, while SASL mechanisms such as GSSAPI and SCRAM integrate with existing authentication systems at a lower performance cost. The document provides configuration details and security considerations for each mechanism.
A step-by-step deep dive into the world of Kafka security. This presentation covers a few of the most sought-after questions in streaming and Kafka, such as what happens internally when SASL, Kerberos, or SSL security is configured, and how the various Kafka components interact with each other. It should be a valuable resource for administrators, users, and application developers alike: knowing Kafka's internals helps them configure, manage, and use Kafka systems more optimally and with fewer errors and mistakes.
The agenda:
- The Kafka security models available: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL, and when to use each
- Anatomy of each security model: an in-depth examination of what happens internally when each is used, with real-life examples
- Do's and don'ts of Kafka security
- Common errors and troubleshooting
This talk is all about looking under the hood of Kafka security. Suitable for all levels, from beginner to expert.
Speaker
Vipin Rathor, Sr. Product Specialist (security), Hortonworks
With Apache Kafka 0.9, the community has introduced a number of features to make data streams secure. In this talk, we’ll explain the motivation for making these changes, discuss the design of Kafka security, and explain how to secure a Kafka cluster. We will cover common pitfalls in securing Kafka, and talk about ongoing security work.
Producer Performance Tuning for Apache Kafka (Jiangjie Qin)
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
Apache Kafka is becoming the message bus for transferring huge volumes of data from various sources into Hadoop.
It is also enabling many real-time system frameworks and use cases.
Managing Apache Kafka and building clients around it can be challenging. In this talk, we will go through the best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, upgrading to newer versions, and migrating to the new Kafka producer and consumer APIs.
We also talk about the best practices involved in running producers and consumers.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now supports authentication of users and access control over who can read and write to a Kafka topic. Apache Ranger also uses the pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer... (confluent)
Apache Kafka is a scalable streaming platform with built-in dynamic client scaling. The elastic scale-in/scale-out feature leverages Kafka’s “rebalance protocol,” which was designed in the 0.9 release and has been improved ever since. The original design aimed at on-prem deployments of stateless clients; however, it does not always align with modern deployment tools like Kubernetes or with stateful stream processing clients like Kafka Streams. Those shortcomings led to two major recent improvement proposals, namely static group membership and incremental rebalancing (which will hopefully be available in version 2.3). This talk provides a deep dive into the details of the rebalance protocol, starting from its original design in version 0.9 up to the latest improvements and future work. We discuss internal technical details and the pros and cons of the existing approaches, and explain how to configure your client correctly for your use case. Additionally, we discuss configuration tradeoffs for stateless, stateful, on-prem, and containerized deployments.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
Apache Kafka 0.8 basic training - Verisign (Michael Noll)
Apache Kafka 0.8 basic training (120 slides) covering:
1. Introducing Kafka: history, Kafka at LinkedIn, Kafka adoption in the industry, why Kafka
2. Kafka core concepts: topics, partitions, replicas, producers, consumers, brokers
3. Operating Kafka: architecture, hardware specs, deploying, monitoring, P&S tuning
4. Developing Kafka apps: writing to Kafka, reading from Kafka, testing, serialization, compression, example apps
5. Playing with Kafka using Wirbelsturm
Audience: developers, operations, architects
Created by Michael G. Noll, Data Architect, Verisign, https://www.verisigninc.com/
Verisign is a global leader in domain names and internet security.
Tools mentioned:
- Wirbelsturm (https://github.com/miguno/wirbelsturm)
- kafka-storm-starter (https://github.com/miguno/kafka-storm-starter)
Blog post at:
http://www.michael-noll.com/blog/2014/08/18/apache-kafka-training-deck-and-tutorial/
Many thanks to the LinkedIn Engineering team (the creators of Kafka) and the Apache Kafka open source community!
(Stephane Maarek, DataCumulus) Kafka Summit SF 2018
Security in Kafka is a cornerstone of true enterprise production-ready deployment: It enables companies to control access to the cluster and limit risks in data corruption and unwanted operations. Understanding how to use security in Kafka and exploiting its capabilities can be complex, especially as the documentation that is available is aimed at people with substantial existing knowledge on the matter.
This talk will be delivered in a “hero journey” fashion, tracing the experience of an engineer with basic understanding of Kafka who is tasked with securing a Kafka cluster. Along the way, I will illustrate the benefits and implications of various mechanisms and provide some real-world tips on how users can simplify security management.
Attendees of this talk will learn about aspects of security in Kafka, including:
-Encryption: What is SSL, what problems it solves and how Kafka leverages it. We’ll discuss encryption in flight vs. encryption at rest.
-Authentication: Without authentication, anyone would be able to write to any topic in a Kafka cluster, do anything and remain anonymous. We’ll explore the available authentication mechanisms and their suitability for different types of deployment, including mutual SSL authentication, SASL/GSSAPI, SASL/SCRAM and SASL/PLAIN.
-Authorization: How ACLs work in Kafka, ZooKeeper security (risks and mitigations) and how to manage ACLs at scale
Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale: its caveats, guarantees, and the use cases it supports.
How we use it @ZaprMediaLabs.
Getting Started with Apache Spark on Kubernetes (Databricks)
Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements from Spark 3.0 release. Companies choose to run Spark on Kubernetes to use a single cloud-agnostic technology across their entire stack, and to benefit from improved isolation and resource sharing for concurrent workloads. In this talk, the founders of Data Mechanics, a serverless Spark platform powered by Kubernetes, will show how to easily get started with Spark on Kubernetes.
Kafka Tutorial - Introduction to Apache Kafka (Part 1) - Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line, then expands on this with a multi-server example to demonstrate failover of brokers as well as consumers. It then goes through some simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
Increasingly, organizations are relying on Kafka for mission critical use-cases where high availability and fast recovery times are essential. In particular, enterprise operators need the ability to quickly migrate applications between clusters in order to maintain business continuity during outages. In many cases, out-of-order or missing records are entirely unacceptable. MirrorMaker is a popular tool for replicating topics between clusters, but it has proven inadequate for these enterprise multi-cluster environments. Here we present MirrorMaker 2.0, an upcoming all-new replication engine designed specifically to provide disaster recovery and high availability for Kafka. We describe various replication topologies and recovery strategies using MirrorMaker 2.0 and associated tooling.
Using the New Apache Flink Kubernetes Operator in a Production Deployment (Flink Forward)
Flink Forward San Francisco 2022.
Running natively on Kubernetes, the new Apache Flink Kubernetes Operator is a great way to deploy and manage Flink application and session deployments. In this presentation, we provide:
- A brief overview of Kubernetes operators and their benefits
- An introduction to the five levels of the operator maturity model
- An introduction to the newly released Apache Flink Kubernetes Operator and FlinkDeployment CRs
- Dockerfile modifications you can make to swap out UBI images and Java in the underlying Flink operator container
- Enhancements we're making in versioning, upgradeability, stability, and security
- A demo of the Apache Flink operator in action, with a technical preview of an upcoming product using the Flink Kubernetes Operator
- Lessons learned
- Q&A
by
James Busche & Ted Chang
Apache kafka performance(latency)_benchmark_v0.3 (SANG WON PARK)
A performance test of how quickly (with how low a latency) image data can be delivered using Apache Kafka.
The end goal is to use Kafka as the message queue that feeds large volumes of real-time video/image data into AI (ML/DL) models, and to verify how quickly images sent from devices such as drones or manufacturing lines can reach the AI model.
We therefore ran a simple test sending images through Kafka,
and checked how much it reduces latency, compared with the HTTP protocol and raw sockets.
[Conclusions so far]
- Apache Kafka is a solution optimized for throughput when processing large volumes of requests.
- The results so far come from tuning only a few producer options, so they are preliminary;
- improving Kafka's latency will likely take many more attempts.
- In other words, single-request latency is clearly slower,
- but when comparing average latency over bulk processing, the average latency drops considerably.
Test Code: https://github.com/freepsw/kafka-latency-test
Kafka 2018 - Securing Kafka the Right Way (Saylor Twift)
How to evaluate, implement and maintain Kafka Message Broker in a high-throughput production environment.
Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current 2022 (HostedbyConfluent)
When different teams start to use the same Kafka clusters, it opens up opportunities and challenges. During this talk, we will look at different architectures and team structures to explore ways in which to set up authorization in a granular and maintainable way for real-world users, as well as for producing or consuming clients.
What are the options offered by the Kafka built-in Authorizer, how can the Authorizer be customized and how are integrations with external systems built in order to provide group or role-based access control? Confluent Cloud and Confluent Platform provide predefined roles as part of the Role-based Access Control (RBAC) feature. We will look at the permissions included in these role bindings, the scope on which they can be used, and the components for which they are available. Role-based Access Control and Access Control Lists can be used together - let’s explore the options, best practices, and order of precedence.
We will put the capabilities into action by looking at the practices used by an imaginary company where the central Platform Team provisions clusters for its internal customers and provides access for teams to self-manage their domains. What’s the best approach to grant access to team members to their team’s resources and what needs to happen when one team collaborates with another team? What happens when a team member works temporarily on two teams?
We will close the session by looking at the ability to use the authorization mechanisms in conjunction with different authentication options and at the automation options to make the actions predictable and repeatable.
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T... - confluent
In order to maximize Kafka accessibility within an organization, Kafka operators must choose an authentication option that balances security with ease of use. Kafka has historically been limited to a small number of authentication options that are difficult to integrate with a single sign-on (SSO) strategy, such as mutual TLS, basic auth, and Kerberos. The arrival of SASL/OAUTHBEARER in Kafka 2.0.0 affords system operators a flexible framework for integrating Kafka with their existing authentication infrastructure. Ron Dagostino (State Street Corporation) and Mike Kaminski (The New York Times) team up to discuss SASL/OAUTHBEARER and its real-world applications. Ron, who contributed the feature to core Kafka, explains the origins and intricacies of its development, along with additional related security changes, including client re-authentication (merged and scheduled for release in v2.2.0) and the plans for support of SASL/OAUTHBEARER in librdkafka-based clients. Mike Kaminski, a developer on the Publishing Pipeline team at The New York Times, talks about how his team leverages SASL/OAUTHBEARER to break down silos between teams by making it easy for product owners to get connected to the Publishing Pipeline's Kafka cluster.
Introducing new features in Confluent Platform 5.4 and Apache Kafka 2.4...
CP 5.4 (based on AK 2.4)
- Security: Role-Based Access Control (RBAC), Structured Audit Logs
- Resilience: Multi-Region Clusters (MRC)
- Data Compatibility: Server-side Schema Validation
- Management & Monitoring: Control Center enhancements, RBAC management, Replicator monitoring
- Performance & Elasticity: Tiered Storage (preview)
- Stream Processing: New ksqlDB features like Pull Queries and Kafka Connect Integration (preview)
Kafka is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing to allow storage and analysis of both historical and real-time data.
Protecting your data at rest with Apache Kafka by Confluent and Vormetric (confluent)
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
Kubernetes connectivity to Cloud Native Kafka | Evan Shortiss and Hugo Guerre... (HostedbyConfluent)
If you want to build an ecosystem of streaming data to your Kafka platform, you will need a much easier way for your developer to quickly move what’s on the source to your cluster. Better yet, making the connector serverless so it would NOT waste any resources for being idle, and having a trusted partner manage your Kafka infrastructure for you. In this session, we will show you how easy we have made streaming data with great user experience. Flexible resource management with our new secret weapon in the Apache Camel project -- Kamelet. We’ll also demonstrate how Red Hat OpenShift Streams for Apache Kafka simplifies the provisioning of Kafka deployments in a public cloud, managing the cluster,topics, and configuring secure access to the Kafka cluster for your developers.
Peng Kang, Software Engineer, Dropbox + Richi Gupta, Engineering Manager, Dropbox
As a scalable and reliable data streaming solution with a rich ecosystem, Kafka is widely adopted in Dropbox infrastructure in various scenarios. It is part of Dropbox’s analytics data pipeline, stream processing platform and more mission critical systems. Jetstream is the team that provides Kafka as a service in Dropbox infrastructure. We manage the clusters, develop tooling, and enforce policies, so that our users can enjoy a highly available and reliable service. In this talk, we will share our experiences and learnings running Kafka clusters, pipelines that enable high durability (direct writes to kafka) and availability (goscribe), the policies we enforce for high reliability, the tooling we have for maintenance and stress testing, and finally an overview of Dropbox’s next generation queueing service built on top Kafka.
https://www.meetup.com/KafkaBayArea/events/266327152/
Scenic City Summit (2021): Real-Time Streaming in Any and All Clouds, Hybrid and Beyond (Timothy Spann)
24-September-2021. Scenic City Summit. Virtual.
Apache Pulsar, Apache NiFi, Apache Flink
StreamNative
Tim Spann
https://sceniccitysummit.com/
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) - Kai Wähner
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
Apache Kafka - Scalable Message-Processing and More! (Guido Schmutz)
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in a world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafka comes into play: a distributed, highly scalable messaging broker built for exchanging huge numbers of messages between a source and a target. This session starts with an introduction to Apache Kafka and presents its role in a modern data/information architecture and the advantages it brings to the table.
Confluent Operations Training for Apache Kafka (confluent)
Course Objectives
In this three-day hands-on course, you will learn how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka experts. You will learn how Kafka and the Confluent Platform work, their main subsystems, how they interact, and how to set up, manage, monitor, and tune your cluster.
For more information, please visit www.confluent.io/training/
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente... (confluent)
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Unlocking the Power of IoT: A comprehensive approach to real-time insights (confluent)
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid workshop: Stream Processing with Flink (confluent)
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimized resource utilization, and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join, and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark... (confluent)
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To remain competitive, today's companies depend more and more on real-time data analysis, which gives them faster insights and response times. Running a business on real-time data means being aware of the situation and detecting and responding to what is happening in the world right now.
Events and Microservices - Santander TechTalk (confluent)
In this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-based patterns let us decompose monoliths in a scalable, resilient, and decoupled way.
The purpose of the session is to dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluent (confluent)
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Mesh (confluent)
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to you existing applications.
Citi Tech Talk: Event Driven Kafka Microservices (confluent)
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka
Confluent & GSI Webinars series - Session 3 (confluent)
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
Transforming applications built with traditional messaging solutions such as TIBCO, MQ, and Solace to be scalable, reliable, and ready for the move to cloud.
How can applications built with traditional messaging technologies like TIBCO, Solace, and IBM MQ be modernised and made cloud-ready? What are the advantages of event-streaming approaches to pub/sub versus traditional message queues? What are the strengths and weaknesses of both approaches, and which use cases and requirements are actually a better fit for messaging than Kafka?
This session will show why the old paradigm does not work and why a new approach to data strategy is needed. It aims to show how a data streaming platform is integral to the evolution of a company's data strategy, and how Confluent is not just an integration layer but the central nervous system of an organisation.
You will also learn how to:
• Build products and features faster using a complete suite of connectors and stream-management tools, and connect your environments to data pipelines
• Protect your most critical data and workloads with built-in security, governance, and resilience guarantees
• Deploy Kafka at scale in minutes while reducing the associated costs and operational burden
Confluent Partner Tech Talk with Synthesis (confluent)
A discussion on the arduous planning process, and deep dive into the design/architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
How to Lock Down Apache Kafka and Keep Your Streams Safe
1. How to Lock Down Apache Kafka
and Keep Your Streams Safe
Rajini Sivaram
2. About me
• Principal Software Engineer at Pivotal UK
• Apache Kafka Committer
• Project Lead: Reactor Kafka
– https://github.com/reactor/reactor-kafka
• Previously at IBM
– Message Hub developer: Kafka-as-a-Service on Bluemix
3. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
6. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
7. Authentication
• Client authentication
– Server verifies the identity (user principal) of the client
• Server authentication
– Client verifies that connection is to a genuine server
• Authentication mechanisms in Kafka
– TLS
– SASL
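To ground this, here is a hedged sketch of broker listener properties enabling both mechanism families side by side; the host name, ports, and mechanism choices are illustrative, not from the deck:
listeners=SSL://kafka1:9093,SASL_SSL://kafka1:9094
security.inter.broker.protocol=SSL
ssl.client.auth=required
sasl.enabled.mechanisms=GSSAPI,SCRAM-SHA-256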
10. TLS authentication
The client trust store contains the issuer's certificate (issuer DN, issuer public key, issuer digital signature). The server key store contains the server's private key and the server's certificate: distinguished name (DN), server hostname (SAN), validity period (valid from/to), issuer DN, issuer digital signature, and the server public key. During the handshake the client verifies the server's certificate against the issuer's certificate in its trust store.
Client configuration:
ssl.keystore.location=/path/ks.jks
ssl.keystore.password=ks-secret
ssl.key.password=key-secret
ssl.truststore.location=/path/trust.jks
ssl.truststore.password=ts-secret
ssl.endpoint.identification.algorithm=https
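For reference, a rough sketch of how the key store and trust store above are typically created with the JDK keytool and openssl; file names, aliases, and validity periods here are illustrative, not from the deck:
# Server key pair in the key store
keytool -keystore ks.jks -alias server -validity 365 -genkeypair -keyalg RSA
# Create a CA, then sign the server certificate with it
openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
keytool -keystore ks.jks -alias server -certreq -file server-csr
openssl x509 -req -CA ca-cert -CAkey ca-key -in server-csr -out server-cert -days 365 -CAcreateserial
# Import the CA and the signed certificate into the key store
keytool -keystore ks.jks -alias CARoot -importcert -file ca-cert
keytool -keystore ks.jks -alias server -importcert -file server-cert
# The client trust store only needs the CA certificate
keytool -keystore trust.jks -alias CARoot -importcert -file ca-cert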
11. TLS Security Considerations
• Security vulnerability in older protocols: use the latest TLS version (TLSv1.2)
• Cryptographic attacks: use only strong cipher suites (e.g. 256-bit encryption key size) and a minimum 2048-bit RSA key size
• Man-in-the-middle attack: disable anonymous key exchange using Diffie-Hellman ciphers; enable hostname verification
• Private key compromised: certificate revocation using CRLs; use short-lived keys to reduce exposure
• Man-in-the-middle attack during renegotiation: disable insecure renegotiation (note: TLS renegotiation is disabled in Kafka)
• Tampering with data during transit: use ciphers with a secure message digest to guarantee integrity
• DDoS attack: enable quotas and connection rate throttling
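Several of these mitigations map directly onto broker and client properties; a sketch, where the cipher suite shown is just one example of a strong choice:
ssl.enabled.protocols=TLSv1.2
ssl.cipher.suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
ssl.endpoint.identification.algorithm=https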
12. Why TLS?
• Authentication
– Server
– Client
• Confidentiality
– Guarantees privacy of data in motion
• Integrity
– Message digest included with many ciphers
• Horizontally scalable
13. TLS drawbacks
• Performance impact on latency and throughput (20-30% degradation; throughput loss varies with message size)
• High CPU cost of encryption: zero-copy transfer is lost
• TLS renegotiation is disabled, so a connection authenticates only once
• Vulnerable to DDoS attacks
• PKI infrastructure required (CA, RA, VA, CRL)
14. SASL
• Simple Authentication and Security Layer
– Extensible authentication framework for
connection-oriented protocols
• Standard protocol for different mechanisms
– GSSAPI (since 0.9.0)
– PLAIN (since 0.10.0)
– SCRAM (since 0.10.2)
• Can negotiate security layer, but this feature
is not used in Kafka
– SASL_SSL/SASL_PLAINTEXT
15. SASL Handshake
1. Client establishes a connection and completes the transport-layer handshake (e.g. the TLS handshake for SASL_SSL)
2. Client sends a Kafka SaslHandshake request naming its chosen mechanism (e.g. mechanism=GSSAPI)
3. Server replies with a SaslHandshake response listing its enabled mechanisms (e.g. GSSAPI, PLAIN)
4. Client and server exchange the SASL challenges and responses of the selected mechanism
5. Once authenticated, ordinary Kafka requests and responses flow over the connection
18. SASL/GSSAPI Security Considerations
• Dictionary attack: enforce strong password policies
• Keytab file compromised: restrict access to keytab files and directories; if a user is compromised, revoke access using ACLs and restart processes to force reconnection if required
• Eavesdropping or tampering with data after authentication completes: Kafka does not use Kerberos encryption, so SASL_SSL should be used to guarantee confidentiality and integrity if the traffic is not on a secure network
• Hostname resolution issues: secure, correctly configured DNS
• KDC failure: set up multiple slave KDCs alongside a master KDC to avoid a single point of failure
20. SASL/PLAIN customization
• Integrate with an external authentication server
• A custom SASL/PLAIN security provider (e.g. MyPlainProvider with MyPlainLoginModule) runs inside the Kafka broker and delegates credential checks to the authentication server:
KafkaServer {
com.pivotal.MyPlainLoginModule required
authentication.server="https://my.server";
};
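For comparison, Kafka's built-in PlainLoginModule takes its user list straight from the JAAS configuration; a minimal sketch with illustrative credentials, where the broker side lists all users and the client side supplies one:
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="admin" password="admin-secret"
user_admin="admin-secret"
user_alice="alice-secret";
};
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="alice" password="alice-secret";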
21. SASL/PLAIN Security Considerations
• Dictionary attack: enforce strong password policies
• Eavesdropping and replay attacks: PLAIN must only be used with TLS, and the connection between Kafka and the authentication server/database must also be secure
• User compromised: revoke all access using ACLs; restart brokers if required to break existing connections
• Password database compromised: update the authentication server; re-authentication of existing connections is not supported, so restart brokers
23. SASL/SCRAM protocol
Client configuration:
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="alice" password="alice-secret";
Broker configuration:
KafkaServer {
o.a.k.c.s.scram.ScramLoginModule required;
};
• The client proves to the broker that it possesses the password for the user
• The broker proves to the client that it once possessed the password for the user
The broker fetches (and caches) the user's salt, iteration count, and salted keys from ZooKeeper (/config/users/alice). The exchange:
1. Client → broker: user name (alice) and client nonce
2. Broker → client: combined client/server nonce, salt, iteration count
3. Client → broker: combined nonce and client proof
4. Broker → client: combined nonce and server proof
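The per-user credentials under /config/users are written with the config tool; a sketch with an illustrative password and iteration count:
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'SCRAM-SHA-256=[iterations=4096,password=alice-secret]' --entity-type users --entity-name alice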
24. SASL/SCRAM Security Considerations
• Dictionary attack: enforce strong password policies
• Offline brute-force attack: use a high iteration count and a strong hash function
• User compromised: revoke all access for the user; restart brokers to disconnect if required
• ZooKeeper compromised: SCRAM is safe against replay attacks, but use it with TLS to avoid interception of messages that could feed dictionary or brute-force attacks; use a strong hash function like SHA-256 or SHA-512 and a high iteration count
• Insecure ZooKeeper installation: use an alternative secure password store for SCRAM
26. Choosing an authentication protocol
• TLS: on an insecure network that requires encryption; when server authentication and hostname verification are required; when you already have PKI infrastructure for client authentication
• SASL/GSSAPI: when you already have Kerberos infrastructure, or have an insecure ZooKeeper installation and don't want to integrate a custom password database for SCRAM
• SASL/PLAIN: when integrating with an existing password server or database
• SASL/SCRAM: when you require username/password authentication without an external server and have a secure ZooKeeper installation
• Custom SASL mechanism: when integrating with an existing authentication server
27. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
28. Authorization
• User Principal
– ANONYMOUS for unauthenticated clients
– Configurable PrincipalBuilder for TLS
– Mechanism-specific user name for SASL
• Access Control Lists (ACL)
• Pluggable Authorizer
– Default out-of-the-box authorizer: SimpleAclAuthorizer
bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:alice --allow-host 198.51.100.0 --operation Read --operation Write --topic test-topic
29. Access Control
Each ACL entry binds together a user principal (e.g. alice, bob), a permission (Allow or Deny), an operation (Read, Write, Create, Delete, Alter, Describe, ClusterAction), a resource (Topic, Consumer Group, Cluster), and the host the request comes from. Super users bypass ACL checks entirely.
31. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
32. Quotas
• Quota types
– Replication quota
– Bandwidth quota (Produce/Fetch)
– Request quotas (from 0.11.0)
• Per-broker quotas
– If usage exceeds quota, response is delayed
– Throttle time returned to clients, exposed as metrics
• Quota configuration in ZooKeeper
– Can be dynamically updated
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-name alice --entity-type users
33. Quota Configuration
• Multi-level quotas: <client-id>, <user> or <user, client-id> levels
• The most specific quota configuration is applied to any connection
• Quotas live under the config path in ZooKeeper: per-user entries at users/<user> (with a <default> fallback), per-client entries at clients/<client-id> (with a <default> fallback), and combined <user, client-id> entries nested at users/<user>/clients/<client-id>, again with <default> entries at each level
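For example, a <user, client-id> quota is set by naming both entities on the command line; the entity names here are illustrative:
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=1024,consumer_byte_rate=2048' --entity-type users --entity-name alice --entity-type clients --entity-name my-app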
34. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
35. Encryption
• TLS: encrypt data in transit to prevent eavesdropping
• Disk encryption: encrypt data at rest to protect sensitive data
• End-to-end encryption: clients send encrypted data (e.g. encrypting in the serializer, decrypting in the deserializer); different keys can be used to encrypt data for different topics; combine with TLS/SASL for authentication, and with TLS to avoid man-in-the-middle attacks
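To make the end-to-end option concrete, here is a minimal sketch, not from the deck, of a producer-side serializer that encrypts record values with AES-GCM so brokers only ever see ciphertext; the class name is hypothetical, and key distribution/rotation is deliberately out of scope:
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical example: encrypts each record value before it leaves the client.
public class EncryptingSerializer implements Serializer<byte[]> {
    private final SecretKey key;                 // e.g. one AES key per topic
    private final SecureRandom random = new SecureRandom();

    public EncryptingSerializer(SecretKey key) { this.key = key; }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, byte[] plaintext) {
        try {
            byte[] iv = new byte[12];            // fresh IV for every record
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            // Prepend the IV so the matching deserializer can decrypt
            return ByteBuffer.allocate(iv.length + ciphertext.length)
                             .put(iv).put(ciphertext).array();
        } catch (Exception e) {
            throw new RuntimeException("Encryption failed for topic " + topic, e);
        }
    }

    @Override
    public void close() { }
}
The matching deserializer would strip the IV from the front of each value and run the same cipher in DECRYPT_MODE with the same key.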
36. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
38. Securing ZooKeeper
• ZooKeeper stores critical metadata for Kafka
• Lock down updates to ZooKeeper with SASL: GSSAPI (Kerberos) or Digest-MD5
• Set zookeeper.set.acl=true on Kafka brokers
• TLS is currently not supported for ZooKeeper, so use network segmentation to limit access
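As a sketch with illustrative credentials, Digest-MD5 between brokers and ZooKeeper is wired up via JAAS on both sides:
zookeeper_jaas.conf on the ZooKeeper server:
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_kafka="kafka-secret";
};
kafka_server_jaas.conf on each broker (the broker acts as the ZooKeeper client):
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="kafka" password="kafka-secret";
};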
40. Secure Kafka on the Cloud
Kafka brokers and ZooKeeper servers run inside a private network. Kafka clients (producers, consumers, Kafka Connect, Kafka Streams, admin clients) and admin/config tools on the public network reach the cluster through TLS proxies.
41. Outline
• Kafka Cluster Overview
• Securing Kafka Clusters
– Authentication
– Authorization
– Quotas
– Encryption
• Lock Down Kafka and ZooKeeper
• New security features
42. New features in 0.10.2
• Broker: multiple endpoints with the same security protocol
• Client: dynamic JAAS configuration without a file; multiple credentials within a JVM
• SASL mechanisms: SCRAM-SHA-256, SCRAM-SHA-512
43. Future work
• KIP-48: Delegation tokens
• KIP-124: CPU utilization quota for requests
• KIP-117: Add a public AdminClient API for Kafka
• KIP-86: Configurable SASL callbacks
• KIP-111: Improve custom PrincipalBuilder/Authorizer integration