A step-by-step deep dive into Kafka Security world. This presentation covers few most sought-after questions in Streaming / Kafka; like what happens internally when SASL / Kerberos / SSL security is configured, how does various Kafka components interacts with each other. This could be valuable resource for administrators, users & Application developers alike. Having internal Kafka knowledge would help them to configure, manage and use the Kafka systems in a more optimal way with least possible errors / mistakes.
Agenda is to discuss:
- Various Kafka Security model available: PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, PLAINTEXT_SSL and when to use which model
- Anatomy of each Security model: in-depth examination of these models and what happens internally when they are used; with real life examples
- Do's and Don'ts of Kafka Security
- Common Errors & Troubleshooting
This talk will be all about looking under-the-hood with respect to Kafka Security. Suitable for all levels from beginners to expert.
Speaker
Vipin Rathor, Sr. Product Specialist (security), Hortonworks
With Apache Kafka 0.9, the community has introduced a number of features to make data streams secure. In this talk, we’ll explain the motivation for making these changes, discuss the design of Kafka security, and explain how to secure a Kafka cluster. We will cover common pitfalls in securing Kafka, and talk about ongoing security work.
Getting Started with Confluent Schema Registryconfluent
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
Kafka and Confluent are nice, but what about the integration with public clouds like Azure. Or even better, to integrate Kafka and Confluent with a managed API management like Azure API Gateway.
In this talk I will show you how it is possible to integrate an event streaming platform like Confluent into an enterprise API Management and different other services to build up a lambda based data platform architecture.
A step-by-step deep dive into Kafka Security world. This presentation covers few most sought-after questions in Streaming / Kafka; like what happens internally when SASL / Kerberos / SSL security is configured, how does various Kafka components interacts with each other. This could be valuable resource for administrators, users & Application developers alike. Having internal Kafka knowledge would help them to configure, manage and use the Kafka systems in a more optimal way with least possible errors / mistakes.
Agenda is to discuss:
- Various Kafka Security model available: PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, PLAINTEXT_SSL and when to use which model
- Anatomy of each Security model: in-depth examination of these models and what happens internally when they are used; with real life examples
- Do's and Don'ts of Kafka Security
- Common Errors & Troubleshooting
This talk will be all about looking under-the-hood with respect to Kafka Security. Suitable for all levels from beginners to expert.
Speaker
Vipin Rathor, Sr. Product Specialist (security), Hortonworks
With Apache Kafka 0.9, the community has introduced a number of features to make data streams secure. In this talk, we’ll explain the motivation for making these changes, discuss the design of Kafka security, and explain how to secure a Kafka cluster. We will cover common pitfalls in securing Kafka, and talk about ongoing security work.
Getting Started with Confluent Schema Registryconfluent
Getting started with Confluent Schema Registry, Patrick Druley, Senior Solutions Engineer, Confluent
Meetup link: https://www.meetup.com/Cleveland-Kafka/events/272787313/
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
Kafka and Confluent are nice, but what about the integration with public clouds like Azure. Or even better, to integrate Kafka and Confluent with a managed API management like Azure API Gateway.
In this talk I will show you how it is possible to integrate an event streaming platform like Confluent into an enterprise API Management and different other services to build up a lambda based data platform architecture.
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
ksqlDB is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Stream and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL like language and produce results again to a Kafka topic. By that, no single line of Java code has to be written and you can reuse your SQL knowhow. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for most part. This will be done in a live demo on a fictitious IoT sample.
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. This slide deck covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have also expanded on the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advance Kafka Producers.
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real-time? The answer is stream processing, and the technology that has since become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and AirBnB, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. Notably, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
A brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will introduce some of the newer components of Kafka that will help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on Github and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture that is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high vailability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our micro-services architecture. Being a part of the financial service industry we need to ensure high-availability of our platform and quick response time during failures.
In this talk we will explore how we monitor and alert on the health of our Kafka clusters using our heartbeat application and clients using DataDog dashboards. We will see how we consolidate JMX metrics such as error-rates, connection-rates, latencies and consumer lag from all producers and consumers using JMX agent sidecar to provide a live view of the health of our entire infrastructure. We will also discuss our culture of game days where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios to improve the overall availability of our infrastructure.
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
Join Tom Green, Solution Engineer at Confluent for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
-How to query streams, using SQL, without writing code.
-How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
-How KSQL can be used to join streams of data from different sources
-The differences between Streams and Tables in Apache Kafka
Full recorded presentation at https://www.youtube.com/watch?v=2UfAgCSKPZo for Tetrate Tech Talks on 2022/05/13.
Envoy's support for Kafka protocol, in form of broker-filter and mesh-filter.
Contents:
- overview of Kafka (usecases, partitioning, producer/consumer, protocol);
- proxying Kafka (non-Envoy specific);
- proxying Kafka with Envoy;
- handling Kafka protocol in Envoy;
- Kafka-broker-filter for per-connection proxying;
- Kafka-mesh-filter to provide front proxy for multiple Kafka clusters.
References:
- https://adam-kotwasinski.medium.com/deploying-envoy-and-kafka-8aa7513ec0a0
- https://adam-kotwasinski.medium.com/kafka-mesh-filter-in-envoy-a70b3aefcdef
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.
Key takeaways:
In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments.
Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs.
Learn about features and limitations of Kafka for multi cluster deployments
Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability became the normal, not an exception.
Producer Performance Tuning for Apache KafkaJiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current ...HostedbyConfluent
Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current 2022
When different teams start to use the same Kafka clusters, it opens up opportunities and challenges. During this talk, we will look at different architectures and team structures to explore ways in which to set up authorization in a granular and maintainable way for real-world users, as well as for producing or consuming clients.
What are the options offered by the Kafka built-in Authorizer, how can the Authorizer be customized and how are integrations with external systems built in order to provide group or role-based access control? Confluent Cloud and Confluent Platform provide predefined roles as part of the Role-based Access Control (RBAC) feature. We will look at the permissions included in these role bindings, the scope on which they can be used, and the components for which they are available. Role-based Access Control and Access Control Lists can be used together - let’s explore the options, best practices, and order of precedence.
We will put the capabilities into action by looking at the practices used by an imaginary company where the central Platform Team provisions clusters for its internal customers and provides access for teams to self-manage their domains. What’s the best approach to grant access to team members to their team’s resources and what needs to happen when one team collaborates with another team? What happens when a team member works temporarily on two teams?
We will close the session by looking at the ability to use the authorization mechanisms in conjunction with different authentication options and at the automation options to make the actions predictable and repeatable.
Kafka 2018 - Securing Kafka the Right WaySaylor Twift
How to evaluate, implement and maintain Kafka Message Broker in a high-throughput production environment. Taylor Swift's rectum probably smells like a Creamsicle.
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
ksqlDB is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Stream and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL like language and produce results again to a Kafka topic. By that, no single line of Java code has to be written and you can reuse your SQL knowhow. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for most part. This will be done in a live demo on a fictitious IoT sample.
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. This slide deck covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have also expanded on the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advance Kafka Producers.
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real-time? The answer is stream processing, and the technology that has since become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and AirBnB, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. Notably, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
A brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will introduce some of the newer components of Kafka that will help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
ksqlDB: A Stream-Relational Database Systemconfluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on Github and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture that is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high vailability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Monitoring and Resiliency Testing our Apache Kafka Clusters at Goldman Sachs ...HostedbyConfluent
In our payments platform at Goldman Sachs Transaction Banking, Apache Kafka plays a critical role as the messaging bus in our micro-services architecture. Being a part of the financial service industry we need to ensure high-availability of our platform and quick response time during failures.
In this talk we will explore how we monitor and alert on the health of our Kafka clusters using our heartbeat application and clients using DataDog dashboards. We will see how we consolidate JMX metrics such as error-rates, connection-rates, latencies and consumer lag from all producers and consumers using JMX agent sidecar to provide a live view of the health of our entire infrastructure. We will also discuss our culture of game days where we regularly test the resiliency of all the clients in our infrastructure by simulating various failure scenarios to improve the overall availability of our infrastructure.
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
Join Tom Green, Solution Engineer at Confluent for this Lunch and Learn talk covering KSQL. Confluent KSQL is the streaming SQL engine that enables real-time data processing against Apache Kafka®. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. KSQL is scalable, elastic, fault-tolerant, and it supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.
By attending one of these sessions, you will learn:
-How to query streams, using SQL, without writing code.
-How KSQL provides automated scalability and out-of-the-box high availability for streaming queries
-How KSQL can be used to join streams of data from different sources
-The differences between Streams and Tables in Apache Kafka
Full recorded presentation at https://www.youtube.com/watch?v=2UfAgCSKPZo for Tetrate Tech Talks on 2022/05/13.
Envoy's support for Kafka protocol, in form of broker-filter and mesh-filter.
Contents:
- overview of Kafka (usecases, partitioning, producer/consumer, protocol);
- proxying Kafka (non-Envoy specific);
- proxying Kafka with Envoy;
- handling Kafka protocol in Envoy;
- Kafka-broker-filter for per-connection proxying;
- Kafka-mesh-filter to provide front proxy for multiple Kafka clusters.
References:
- https://adam-kotwasinski.medium.com/deploying-envoy-and-kafka-8aa7513ec0a0
- https://adam-kotwasinski.medium.com/kafka-mesh-filter-in-envoy-a70b3aefcdef
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.
Key takeaways:
In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments.
Zero data loss and high availability are two key requirements. Understand how to realize this, including trade-offs.
Learn about features and limitations of Kafka for multi cluster deployments
Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability became the normal, not an exception.
Producer Performance Tuning for Apache KafkaJiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current ...HostedbyConfluent
Team Collaboration in Kafka Clusters With Maria Berinde-Tampanariu | Current 2022
When different teams start to use the same Kafka clusters, it opens up opportunities and challenges. During this talk, we will look at different architectures and team structures to explore ways in which to set up authorization in a granular and maintainable way for real-world users, as well as for producing or consuming clients.
What are the options offered by the Kafka built-in Authorizer, how can the Authorizer be customized and how are integrations with external systems built in order to provide group or role-based access control? Confluent Cloud and Confluent Platform provide predefined roles as part of the Role-based Access Control (RBAC) feature. We will look at the permissions included in these role bindings, the scope on which they can be used, and the components for which they are available. Role-based Access Control and Access Control Lists can be used together - let’s explore the options, best practices, and order of precedence.
We will put the capabilities into action by looking at the practices used by an imaginary company where the central Platform Team provisions clusters for its internal customers and provides access for teams to self-manage their domains. What’s the best approach to grant access to team members to their team’s resources and what needs to happen when one team collaborates with another team? What happens when a team member works temporarily on two teams?
We will close the session by looking at the ability to use the authorization mechanisms in conjunction with different authentication options and at the automation options to make the actions predictable and repeatable.
Kafka 2018 - Securing Kafka the Right WaySaylor Twift
How to evaluate, implement and maintain Kafka Message Broker in a high-throughput production environment. Taylor Swift's rectum probably smells like a Creamsicle.
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
Application teams in JPMC have started shifting towards building event driven architectures and real time steaming pipelines and Kafka has been at core in this journey. As application teams have started adopting Kafka rapidly, need for a centrally managed Kafka as a service has emerged. We have started delivering Kafka as a service in early 2018 and running in production for more than an year now operating 80+ clusters (and growing) in all environments together. One of the key requirements is to provide truly segregated, secured multi-tenant environment with RBAC model while satisfying financial regulations and controls at the same time. Operating clusters at large scale requires scalable self-service capabilities and cluster management orchestration. In this talk we will present - Our experiences in delivering and operating secured, multi-tenant and resilient Kafka clusters at scale. - Internals of our service framework/control plane which enables self-service capabilities for application teams, cluster build/patch orchestration and capacity management capabilities for TSE/admin teams. - Our approach in enabling automated Cross Datacenter failover for application teams using service framework and confluent replicator.
New Security Features in Apache HBase 0.98: An Operator's GuideHBaseCon
Speakers: Andrew Purtell and Ramkrishna Vasudevan (Intel)
HBase 0.98 introduces several new security features: visibility labels, cell ACLs, transparent encryption, and coprocessor framework changes. This talk will cover the new capabilities available in HBase 0.98+, the threat models and use cases they cover, how these features stack up against other data stores in the Apache big data ecosystem, and how operators and security architects can take advantage of them.
(Stephane Maarek, DataCumulus) Kafka Summit SF 2018
Security in Kafka is a cornerstone of true enterprise production-ready deployment: It enables companies to control access to the cluster and limit risks in data corruption and unwanted operations. Understanding how to use security in Kafka and exploiting its capabilities can be complex, especially as the documentation that is available is aimed at people with substantial existing knowledge on the matter.
This talk will be delivered in a “hero journey” fashion, tracing the experience of an engineer with basic understanding of Kafka who is tasked with securing a Kafka cluster. Along the way, I will illustrate the benefits and implications of various mechanisms and provide some real-world tips on how users can simplify security management.
Attendees of this talk will learn about aspects of security in Kafka, including:
-Encryption: What is SSL, what problems it solves and how Kafka leverages it. We’ll discuss encryption in flight vs. encryption at rest.
-Authentication: Without authentication, anyone would be able to write to any topic in a Kafka cluster, do anything and remain anonymous. We’ll explore the available authentication mechanisms and their suitability for different types of deployment, including mutual SSL authentication, SASL/GSSAPI, SASL/SCRAM and SASL/PLAIN.
-Authorization: How ACLs work in Kafka, ZooKeeper security (risks and mitigations) and how to manage ACLs at scale
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
Learn how data in motion is secure within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
In this session, Neil Avery covers the planning and operation of your KSQL deployment, including under-the-hood architectural details. You will learn about the various deployment models, how to track and monitor your KSQL applications, how to scale in and out and how to think about capacity planning. This is part 3 out of 3 in the Empowering Streams through KSQL series.
Flexible Authentication Strategies with SASL/OAUTHBEARER (Michael Kaminski, T...confluent
In order to maximize Kafka accessibility within an organization, Kafka operators must choose an authentication option that balances security with ease of use. Kafka has been historically limited to a small number of authentication options that are difficult to integrate with a Single Signon (SSO) strategy, such as mutual TLS, basic auth, and Kerberos. The arrival of SASL/OAUTHBEARER in Kafka 2.0.0 affords system operators a flexible framework for integrating Kafka with their existing authentication infrastructure. Ron Dagostino (State Street Corporation) and Mike Kaminski (The New York Times) team up to discuss SASL/OAUTHBEARER and it’s real-world applications. Ron, who contributed the feature to core Kafka, explains the origins and intricacies of its development along with additional, related security changes, including client re-authentication (merged and scheduled for release in v2.2.0) and the plans for support of SASL/OAUTHBEARER in librdkafka-based clients. Mike Kaminski, a developer on The Publishing Pipeline team at The New York Times, talks about how his team leverages SASL/OAUTHBEARER to break down silos between teams by making it easy for product owners to get connected to the Publishing Pipeline’s Kafka cluster.
Presentation done at the November meeting of the Sudoers Barcelona group (https://www.meetup.com/sudoersbcn/).
HashiCorp Vault (https://www.vaultproject.io/)
"Vault és una eina per emmagatzemar i gestionar secrets. Veurem què ofereix, com instal·lar-la, utilitzar-la i operar-la, i la nostra experiència."
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
In the latest versions of K8s there has been an evolution regarding the definition of security strategies at the level of access policies to the cluster by users and developers. The security contexts (securityContext) allow you to define the configurations at the level of access control and privileges for a pod or container in a simple way using keywords in the configuration files.
To facilitate the implementation of these security strategies throughout the cluster, new strategies have emerged such as the Pod Security Policy (PSP) where the cluster administrator is in charge of defining these policies at the cluster level with the aim that developers can follow these policies.
Other interesting projects include Open Policy Agent (OPA) as the main cloud-native authorization policy agent for creating policies and managing user permissions for access to applications.
The objective of this talk is to present the evolution that has occurred in security strategies and how we could use them together, as well as analyze their behavior in accessing resources. Among the points to be discussed we can highlight:
-Introduction to security strategies in K8s environments
-Pod Security Admission(PSA) vs Open Policy Agent (OPA)
-Combination of different security strategies together
-Access to resources in privileged and non-privileged mode
The access of the AKS Cluster and Kubelet to other Azure services is a part of AKS Security. This presentation is about this.
Youtube video of these slides: https://youtu.be/-ZRp1OoWz8M
WordPress post: azure-kubernetes-service-aks-with-terraform-deployment
GitHub: https://github.com/ParisaMousavi/enterprise-aks/tree/2022.10.24
You can find the first part of this presentation here: https://www.slideshare.net/secret/pAvK8Qd9f07oa
This presentation takes a deep dive into how the Million Song Library, a microservices-based application, was built using the Netflix Stack, Cassandra and Datastax.
To learn more about Million Song Library and its components visit the project on GitHub: https://github.com/kenzanlabs/million-song-library
Lea
Chickens & Eggs: Managing secrets in AWS with Hashicorp VaultJeff Horwitz
Presented to the Philly DevOps Meetup November 29, 2016.
Managing secrets is hard. It’s even harder in the cloud. At Jornaya (formerly LeadiD), we chose Hashicorp Vault to manage our secrets in AWS, and I’d like to share our experience with everyone.
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.Opcito Technologies
Opcito Technologies is a proud partner with Kubernetes, an open-source system for container orchestration.
We will be talking about:
• Features of Kubernetes 1.6
• RBAC Configurations
• RBAC Use Cases
• Running Prometheus in Kubernetes
• Prometheus Operator - Deployment, Cluster & Service Monitoring
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Workshop híbrido: Stream Processing con Flinkconfluent
El Stream processing es un requisito previo de la pila de data streaming, que impulsa aplicaciones y pipelines en tiempo real.
Permite una mayor portabilidad de datos, una utilización optimizada de recursos y una mejor experiencia del cliente al procesar flujos de datos en tiempo real.
En nuestro taller práctico híbrido, aprenderás cómo filtrar, unir y enriquecer fácilmente datos en tiempo real dentro de Confluent Cloud utilizando nuestro servicio Flink sin servidor.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
La arquitectura impulsada por eventos (EDA) será el corazón del ecosistema de MAPFRE. Para seguir siendo competitivas, las empresas de hoy dependen cada vez más del análisis de datos en tiempo real, lo que les permite obtener información y tiempos de respuesta más rápidos. Los negocios con datos en tiempo real consisten en tomar conciencia de la situación, detectar y responder a lo que está sucediendo en el mundo ahora.
Eventos y Microservicios - Santander TechTalkconfluent
Durante esta sesión examinaremos cómo el mundo de los eventos y los microservicios se complementan y mejoran explorando cómo los patrones basados en eventos nos permiten descomponer monolitos de manera escalable, resiliente y desacoplada.
Purpose of the session is to have a dive into Apache, Kafka, Data Streaming and Kafka in the cloud
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluentconfluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to you existing applications.
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka
Confluent & GSI Webinars series - Session 3confluent
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre Sales, and also the more technically minded business aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
Transforming applications built with traditional messaging solutions such as TIBCO, MQ and Solace to be scalable, reliable and ready for the move to cloud
How can applications built with traditional messaging technologies like TIBCO, Solace and IBM MQ be modernised and be made cloud ready? What are the advantages to Event Streaming approaches to pub/sub vs traditional message queues? What are the strengeths and weaknesses of both approaches, and what use cases and requirements are actually a better fit for messaging than Kafka?
This session will show why the old paradigm does not work and that a new approach to the data strategy needs to be taken. It aims to show how a Data Streaming Platform is integral to the evolution of a company’s data strategy and how Confluent is not just an integration layer but the central nervous system for an organisation
Vous apprendrez également à :
• Créer plus rapidement des produits et fonctionnalités à l’aide d’une suite complète de connecteurs et d’outils de gestion des flux, et à connecter vos environnements à des pipelines de données
• Protéger vos données et charges de travail les plus critiques grâce à des garanties intégrées en matière de sécurité, de gouvernance et de résilience
• Déployer Kafka à grande échelle en quelques minutes tout en réduisant les coûts et la charge opérationnelle associés
Confluent Partner Tech Talk with Synthesisconfluent
A discussion on the arduous planning process, and deep dive into the design/architectural decisions.
Learn more about the networking, RBAC strategies, the automation, and the deployment plan.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
4. Reminder: What is Apache Kafka?
• Distributed, fault-tolerant and immutable log
• Use cases
• Publish-subscribe queue
• ETL (Extract, Transform, Load)
• Event streaming platform
• Rich infrastructure through connectors and language APIs
• Open Source, with enterprise support and enhancements through companies like
Confluent
4
5. 5
Apache Kafka before
security (<0.9)
¡ Kafka is infrastructure hidden
behind the firewall
¡ Servers within the network are
implicitly trusted
¡ No governance required (or possible)
7. Security Features added in Kafka 0.9
Encryption Authentication
Authorization
Connection to
ZooKeeper
8. Encryption
8
Encryption in flight TLS = Transport Layer Encryption
Encryption at rest OS or file system feature
End-to-End encryption Needs to be built into the
producers and consumers
9. Encryption via TLS (Transport Layer Security)
Prevents
eavesdropping
Prevents man-
in-the-middle
attacks
Industry-
standard
encryption
Broker certificate signed by either
• External, well-known Certificate Authority (CA) – GlobalSign, Let’s Encrypt …
• Internal CA, with clients using a truststore to verify its identity
security.protocol=SSL (instead of the default PLAINTEXT)
ssl.endpoint.identification.algorithm=https
10. TLS Mutual Authentication
If enabled,
provides trust into
client connections
Principal is used
for authorization
Certificates signed
by trusted
authority (CA)
Clients additionally use a keystore to provide their own identity
11. SASL (Simple Authentication and Security Layer)
11
Framework for authentication and data security
Decouples authentication mechanisms from the
application protocols
Common interface for different implementations
12. SASL Implementations
12
Implementation Class Comments
PLAIN org.apache.kafka.common.security.plain.PlainLoginModule Username/Password
SCRAM org.apache.kafka.common.security.scram.ScramLoginModule SCRAM_SHA_(256|512)
GSSAPI com.sun.security.auth.module.Krb5LoginModule Kerberos
OAUTHBEARER org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule Token based used for RBAC
security.protocol = SASL_PLAINTEXT | SASL_SSL
sasl.mechanism = PLAIN | SCRAM_SHA_256 | SCRAM_SHA_512 | GSSAPI | OAUTHBEARER
Configured in the brokers and the clients through a JAAS file
13. JAAS
13
• Java Authentication and Authorization Service
• For services (Kafka, ZooKeeper) grouped by section names
• KafkaServer
• KafkaClient
• Server (for ZooKeeper Server)
• Client (for ZooKeeper Client, e.g. Kafka)
• Alternatively, specify as a property
KafkaServer {
org.apache.kafka.common.security.scram.ScramLoginModule required
username="kafka"
password="kafka";
};
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="admin"
password=”admin-secret";
};
sasl.mechanism=SCRAM-SHA-256
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required
username="producer"
password="producer-secret";
14. sasl.mechanism=PLAIN
• Username/Password
• Broker
• By default, holds a list of username/password in the SASL config file
• Requires rolling restart when users are added or deleted
• Can be overwritten, for example
• listener.name.sasl_plaintext.plain.sasl.server.callback.handler.class=
io.confluent.security.auth.provider.ldap.LdapAuthenticateCallbackHandler
• Client
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule
required
username="alice"
password="alice-secret" ;
15. SCRAM
• Salted Challenge Response Authentication Mechanism
• Users and passwords are stored in ZooKeeper
• Dynamic configuration through kafka-configs
• Different encryption levels (SHA-256, SHA-512)
• CAVEAT: Need to add any super user as an entity as well
kafka-configs --zookeeper zookeeper:2182 --alter
--add-config 'SCRAM-SHA-256=[password=xxxxx],SCRAM-SHA-512=[password=xxxxx]'
--entity-type users
--entity-name producer
sasl.mechanism=SCRAM-SHA-256
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
required
username="producer"
password=”xxxxx";
16. Kerberos
• Network authentication protocol created by the
MIT
• sasl.enabled.mechanisms=GSSAPI
• Generic Security Service Application Program
Interface
• External Key Distribution Center (KDC)
• Service (Broker, ZooKeeper …) registers itself
• sasl.kerberos.service.name must match
principal
• Default service names are ‘kafka’, ‘zookeeper’
17. Kerberos client
• Either: create a ticket with kinit (login)
• Or: provide the keytab location and the principal name
• List existing tickets with klist
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/etc/security/keytabs/kafka_client.keytab"
principal="kafkaclient1@EXAMPLE.COM";
19. ACL for users
19
•kafka-acls
•Can check in ZooKeeper directly
ACLs are stored in ZooKeeper
•Topic, Cluster, Group, TransactionId, DelegationToken
•(Principal, PermissionType, Operation, Host)
ACLs defined for
•Confluent extension to use Group:<group-name> set in LDAP
Principal for users is
User:<user-name>
Can use resource name ‘*’ for
catch all
•--resource-pattern-type prefixed (default is literal)
To specify a subset of resources
with the same prefix, use
20. Securing ZooKeeper
• Authentication
• SASL
• Digest
• Kerberos
• Authorization
• Securing your nodes through Kafka
• zookeeper.set.acl=true
• Migrating from a non-secured node
• zookeeper-security-migration
22. Challenge
• Clients directly connected to the brokers use SASL or mutual
auth for authentication
• Tools like KSQL or Connect are connected using a single user
with that user’s permissions
• ASK: Users of these tools need their own authentication and
propagate their credentials
Schema RegistryREST Client
SR UserREST User
REST User
23. The Need for A Unified Approach
• Requirements
• Services connect to Kafka using TLS Mutual Auth
or SASL
• External access via REST needs to impersonate
an internal user
• Want
• Single service to assign roles
• Changes need to be traceable (ZooKeeper does
not keep a log)
24. RBAC – Role Based Access Control
(Pre-defined) roles can be
assigned to users or groups for a
resource
Stored in a Kafka topic accessed
via a new Kafka interface, the
MDS (Meta Data Service)
Requires Confluent Server
instead of Apache Kafka
User authentication backed by
LDAP
Authorized users are given an
OATHBEARER Token to identify
themselves
28. Schema Registry, Connect, KSQL, REST Proxy
• Standard REST username/password security (from LDAP)
• MDS Server provides OAUTHBEARER token
• Credentials are propagated
29. RBAC Key Benefits
29
Key Benefits Over Traditional ZK ACLs:
• Efficiency with Scale
• Confluent Platform-wide security
• Authorization at scale - with predefined Roles and LDAP Groups as Principals
• Manage multiple clusters from 1 central location
• Increased Granular Policies
• Connector level access control.
• Granular access for Control Center users
• On top of the Topic and SR Subjects
• Improved Multi-Tenancy support
• Items without access do not show
• Delegation of Administration Responsibilities
30. Requirements
• Confluent Server >= CP 5.4
• LDAP (Usually Microsoft Active Directory) for
• User authentication (usually Username/Password SIMPLE BIND)
• Group membership for authorization
• All services configured with
• sasl.enabled.mechanism=OAUTHBEARER
• sasl.login.callback.handler.class=
io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
• metadata.bootstrap.server.urls=http://mds-server:8090
• Do not configure standard producers/consumer/streams with OAUTHBEARER
• Unsupported
• Would require access to MDS server
30
31. Summary
• Security is an important feature of a streaming platform
• Encryption and ACLs provide security for clients directly
accessing the Kafka brokers
• RBAC adds a unified authentication and authorization
interface for services around Kafka