This document provides an overview of Apache Kafka. It describes Kafka as a distributed publish-subscribe messaging system built on a distributed commit log that provides high-throughput, low-latency processing of streaming data. The document covers Kafka concepts such as topics, partitions, producers, consumers, replication, and reliability guarantees. It also discusses Kafka's architecture, performance optimizations, configuration parameters for durability and reliability, and use cases for activity tracking, messaging, metrics, and stream processing.
Building Event-Driven (Micro) Services with Apache Kafka (Guido Schmutz)
This talk begins with a short recap of how we created systems over the past 20 years, up to the current idea of building systems using a Microservices architecture. What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to integrate services with each other in a Microservices architecture? Or is it better to use a more loosely-coupled protocol? Answers to these and many other questions are provided. The talk shows how a distributed log (event hub) can help to create a central, persistent history of events and what benefits we achieve from doing so. Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream-processing fashion. The talk shows the difference between request-driven and event-driven communication and answers when to use which. It highlights how a modern stream processing system can be used to hold state both internally as well as in a database, and how this state can be used to further increase the independence of other services, the primary goal of a Microservices architecture.
ksqlDB is a stream processing SQL engine that allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Streams and provides capabilities for consuming messages from Kafka, analyzing these messages in near real-time with a SQL-like language, and producing results back to a Kafka topic. That way, not a single line of Java code has to be written and you can reuse your SQL know-how. This lowers the bar for getting started with stream processing significantly.
ksqlDB offers powerful stream processing capabilities, such as joins, aggregations, time windows, and support for event time. In this talk I will present how ksqlDB integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for the most part. This will be done in a live demo on a fictitious IoT sample.
This talk gives a brief introduction to Apache Kafka and describes its usage as a platform for streaming data. It introduces some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr (Sease)
The University Seminar series aims to provide a basic understanding of Open Source Information Retrieval and its application in the real world through the Apache Lucene/Solr technologies.
Communication in a Microservice Architecture (Per Bernhardt)
There are many different approaches to how you let your microservices communicate with one another. Be it asynchronous or synchronous, choreographed or orchestrated, eventually consistent or distributed-transactional, fault tolerant or just a mess! In this session I will provide an overview of different concepts of microservice communication and their pros & cons. On the way I'll try to throw in some anecdotes, success stories and failures I learned from, so that you can hopefully take something home with you.
Kafka Tutorial - Introduction to Apache Kafka (Part 1) (Jean-Paul Azar)
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line. Then we expand on this with a multi-server example to demonstrate failover of brokers as well as consumers. Then it goes through some simple Java client examples for a Kafka producer and a Kafka consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka producers.
The Top 5 Apache Kafka Use Cases and Architectures in 2022 (Kai Wähner)
I see the following topics coming up more regularly in conversations with customers, prospects, and the broader Kafka community across the globe:
Kappa Architecture: Kappa goes mainstream to replace Lambda and Batch pipelines (that does not mean that there is no batch processing anymore). Examples: Kafka-powered Kappa architectures from Uber, Disney, Shopify, and Twitter.
Hyper-personalized Omnichannel: Retail and customer communication across online and offline channels becomes the new black, including context-specific upselling, recommendations, and location-based services. Examples: Omnichannel Retail and Customer 360 in Real-Time with Apache Kafka.
Multi-Cloud Deployments: Business units and IT infrastructures span across regions, continents, and cloud providers. Linking clusters for bi-directional replication of data in real-time becomes crucial for many business models. Examples: Global Kafka deployments.
Edge Analytics: Low latency requirements, cost efficiency, or security requirements enforce the deployment of (some) event streaming use cases at the far edge (i.e., outside a data center), for instance, for predictive maintenance and quality assurance on the shop floor level in smart factories. Examples: Edge analytics with Kafka.
Real-time Cybersecurity: Situational awareness and threat intelligence need to process massive data in real-time to defend against cyberattacks successfully. The many successful ransomware attacks across the globe in 2021 were a warning for most CIOs. Examples: Cybersecurity for situational awareness and threat intelligence in real-time.
Running Apache Kafka in production is only the first step in the Kafka operations journey. Professional Kafka users are ready to handle all possible disasters - because for most businesses having a disaster recovery plan is not optional.
In this session, we’ll discuss disaster scenarios that can take down entire Kafka clusters and share advice on how to plan, prepare and handle these events. This is a technical session full of best practices - we want to make sure you are ready to handle the worst mayhem that nature and auditors can cause.
Visit www.confluent.io for more information.
Apache Kafka vs RabbitMQ: Fit For Purpose / Decision Tree (Slim Baltagi)
Kafka as a streaming data platform is becoming the successor to traditional messaging systems such as RabbitMQ. Nevertheless, there are still some use cases where the latter could be a good fit. This single slide tries to answer, in a concise and unbiased way, where to use Apache Kafka and where to use RabbitMQ. Your comments and feedback are much appreciated.
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) (Kai Wähner)
High-level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture, and features.
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent (HostedbyConfluent)
Joins in Kafka Streams and ksqlDB are a killer feature for data processing, and basic join semantics are well understood. However, in a streaming world, records are associated with timestamps that impact the semantics of joins: welcome to the fabulous world of _temporal_ join semantics. For joins, timestamps are as important as the actual data, and it is important to understand how they impact the join result.
In this talk we deep-dive into the different types of joins, with a focus on their temporal aspects. Furthermore, we relate the individual join operators to the overall "time engine" of the Kafka Streams query runtime and explain its relationship to operator semantics. To allow developers to apply their knowledge of temporal join semantics, we provide best practices, tips and tricks to "bend" time, and configuration advice to get the desired join results. Finally, we give an overview of recent developments, and an outlook on future ones, that improve joins even further.
Apache Kafka performance (throughput) - without data loss and guaranteeing dat... (SANG WON PARK)
Sharing test results that measure what level of performance Apache Kafka provides in a specific environment (no data loss, and message ordering strictly guaranteed).
To guarantee message ordering, partitions cannot be distributed across the Apache Kafka cluster, so the performance advantages of partitioning cannot be exploited.
This test therefore measures Apache Kafka's unit performance, i.e., the performance of a single partition.
Going forward, when increasing the number of partitions, performance should be predictable based on the single-partition unit performance from this test.
A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.
Kafka Tutorial - introduction to the Kafka streaming platform (Jean-Paul Azar)
Why is Kafka so fast? Why is Kafka so popular? Why Kafka?
Introduction to Kafka streaming platform. Covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand on the Java examples to correlate with the design discussion of Kafka. We have also expanded on the Kafka design section and added references.
Apache BookKeeper: A High Performance and Low Latency Storage Service (Sijie Guo)
Apache BookKeeper is a high-performance and low-latency storage service optimized for storing immutable and append-only data (such as logs, streaming events, and objects). Sijie Guo and JV share their experience with Apache BookKeeper. This talk covers the motivation for and an overview of BookKeeper, dives into implementation details, and describes the use cases built upon it.
Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Fundamentals and Architecture of Apache Kafka (Angelo Cesaro)
Fundamentals and Architecture of Apache Kafka.
This presentation explains Apache Kafka's architecture and internal design, giving an overview of Kafka's internal functions, including brokers, replication, partitions, producers, consumers, and the commit log, plus a comparison with traditional message queues.
Uber has one of the largest Kafka deployments in the industry. To improve scalability and availability, we developed and deployed a novel federated Kafka cluster setup which hides the cluster details from producers and consumers. Users do not need to know which cluster a topic resides in, and clients view a "logical cluster". The federation layer maps the clients to the actual physical clusters and keeps the location of the physical cluster transparent to the user. Cluster federation brings us several benefits to support our business growth and ease our daily operation. In particular:
Client control: Inside Uber there are a large number of applications and clients on Kafka, and it's challenging to migrate a topic with live consumers between clusters. Coordination with the users is usually needed to shift their traffic to the migrated cluster. Cluster federation enables much more control of the clients from the server side by enabling consumer traffic redirection to another physical cluster without restarting the application.
Scalability: With federation, the Kafka service can horizontally scale by adding more clusters when a cluster is full. Topics can freely migrate to a new cluster without notifying the users or restarting the clients. Moreover, no matter how many physical clusters we manage per topic type, from the user perspective they view only one logical cluster.
Availability: With a topic replicated to at least two clusters, we can tolerate a single cluster failure by redirecting the clients to the secondary cluster without performing a region failover. This also provides much freedom and alleviates the risks for us when carrying out important maintenance on a critical cluster. Before the maintenance, we mark the cluster as secondary and migrate off the live traffic and consumers.
We will present the details of the architecture and several interesting technical challenges we overcame.
Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale, its caveats, and the guarantees and use cases offered by it.
How we use it @ZaprMediaLabs.
High performance messaging with Apache Pulsar (Matteo Merli)
Apache Pulsar is being used for an increasingly broad array of data ingestion tasks. When operating at scale, it's very important to ensure that the system can make use of all the available resources. Karthik Ramasamy and Matteo Merli share insights into the design decisions and the implementation techniques that allow Pulsar to achieve high performance with strong durability guarantees.
LinkedIn Stream Processing Meetup - Apache Pulsar (Karthik Ramasamy)
Apache Pulsar is the next generation messaging system that uses a fundamentally different architecture to achieve durability, performance, scalability, efficiency, multi-tenancy and geo replication.
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp (José Román Martín Gil)
Apache Kafka is the data streaming broker most used by companies. It can manage millions of messages easily, and it is the base of many architectures built on events, microservices, orchestration, ... and now cloud environments. OpenShift is the most widely adopted Platform as a Service (PaaS). It is based on Kubernetes, and it helps companies to easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the base of many architectures built on stateless applications for creating new Cloud Native Applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you manage and deploy Apache Kafka brokers in OpenShift environments.
These slides will introduce you to Strimzi as a new component on OpenShift to manage your Apache Kafka clusters.
Slides used at OpenShift Meetup Spain:
- https://www.meetup.com/es-ES/openshift_spain/events/261284764/
Unleashing Real-time Power with Kafka (Knoldus Inc.)
Unlock the potential of real-time data streaming with Kafka in this session. Learn the fundamentals, architecture, and seamless integration with Scala, empowering you to elevate your data processing capabilities. Perfect for developers at all levels, this hands-on experience will equip you to harness the power of real-time data streams effectively.
Haitao Zhang, Uber, Software Engineer + Yang Yang, Uber, Senior Software Engineer
Kafka Consumer Proxy is a forwarding proxy that consumes messages from Kafka and dispatches them to a user registered gRPC service endpoint. With Kafka Consumer Proxy, the experience of consuming messages from Apache Kafka for pub-sub use cases is as seamless and user-friendly as receiving (g)RPC requests. In this talk, we will share (1) the motivation for building this service, (2) the high-level architecture, (3) the mechanisms we designed to achieve high availability, scalability, and reliability, and (4) the current adoption status.
https://www.meetup.com/KafkaBayArea/events/273834934/
2. Introduction
● Apache Kafka is a publish/subscribe messaging system.
● A distributed, partitioned, replicated commit log (record of transactions) service.
● A high-throughput, low-latency platform for handling real-time data feeds.
3. Use Cases
● Activity tracking: generating reports, feeding machine learning systems, updating search results.
● Messaging: sending notifications to users, application events.
● Metrics and logging: aggregating statistics/logs from distributed applications to produce centralized feeds of operational data.
● Commit log: database changes can be published to Kafka, and applications can easily monitor this stream.
● Stream processing: real-time processing of infinite data streams.
6. Kafka Basics
● The unit of data within Kafka is called a message (an array of bytes).
● A message can have a key (also a byte array).
● Keys are used to distribute messages among partitions using a partitioner.
8. Topic Partitions
● Messages in Kafka are categorized into topics.
● Topics are broken into ordered commit logs called partitions.
● Each message in a partition is assigned a sequential id called an offset.
● Partitions are the way that Kafka provides redundancy and scalability.
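As a minimal sketch of creating such a partitioned topic with the Java AdminClient: the broker address (localhost:9092) and the topic name (page-views) are illustrative assumptions, not anything mandated by Kafka.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; adjust for your cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallelism, replication factor 2 for redundancy.
            NewTopic topic = new NewTopic("page-views", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```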
9. How many partitions?
● Consider the expected throughput you want to achieve.
● Consider the maximum throughput of a single consumer thread.
● Consider the number of partitions you will place on each broker and the available disk space.
● Calculate throughput based on your expected future usage, not the current usage.
● When publishing based on keys, it is difficult to change the partition count later.
● Decreasing the partition count is not allowed.
● Each partition uses memory and other resources on the broker and will increase the time for leader elections.
10. Kafka Cluster
● ZooKeeper maintains the list of current brokers.
● Every broker has a unique id (broker.id).
● Kafka components subscribe to the /brokers/ids path in ZooKeeper.
● The first broker that starts in the cluster becomes the controller by creating an ephemeral node called /controller.
– Responsible for electing partition leaders when nodes join and leave the cluster.
– The cluster will (should) only have one controller at a time.
11. Replication
● Two types of replicas: leader and follower.
● All produce and consume requests go through the leader, in order to guarantee consistency.
● Followers send the leader Fetch requests to stay in sync (just like consumers).
● A replica is considered in-sync if it is the leader, or if it is a follower that has an active session with ZooKeeper and is not too far behind the leader (configurable).
● Only in-sync replicas are eligible to be elected as partition leaders in case the existing leader fails.
● Each partition has a preferred leader: the replica that was the leader when the topic was originally created.
12. Partition Allocation
● When creating a topic, Kafka places the partitions and replicas so that the brokers with the least number of existing partitions are used first, and replicas for the same partition are on different brokers.
● When you add a new broker, it is used for new partitions (since it has the lowest number of existing partitions), but there is no automatic balancing of existing partitions to the new broker. You can use the replica-reassignment tool to move partitions and replicas to the new broker.
13. Kafka APIs
● Producer API
● Consumer API
● Streams API
– Transforming streams of data from input topics to output topics.
● Connector API
– Implementing connectors that continually pull from some source system into Kafka or push from Kafka into some sink system.
● AdminClient API
– Managing and inspecting topics, brokers, and other Kafka objects.
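As a hedged illustration of the Streams API, the sketch below builds a tiny topology that reads from a hypothetical input-topic, upper-cases each value, and writes to output-topic; the application id and broker address are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read from an input topic, transform each value, write to an output topic.
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```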
15. Producer
● Default partitioner: round-robin, used if no key is specified.
● A hashing partitioner is used if a key is provided, hashing over all partitions, not just the available ones.
● You can also define a custom partitioner.
● A send returns RecordMetadata: [topic, partition, offset].
● max.in.flight.requests.per.connection: how many messages to send without receiving responses.
● When a producer first connects, it issues a Metadata request with the topics it is publishing to, and the broker replies with which broker has which partition.
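A minimal producer sketch tying these points together: it sends a keyed message (so the hashing partitioner picks the partition) and prints the returned RecordMetadata. The topic name, key, and broker address are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") determines the partition via the hashing partitioner.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("page-views", "user-42", "clicked /home");
            RecordMetadata meta = producer.send(record).get();
            System.out.printf("topic=%s partition=%d offset=%d%n",
                meta.topic(), meta.partition(), meta.offset());
        }
    }
}
```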
16. Consumer
● Kafka consumers are typically part of a consumer group.
● Multiple consumers of the same group will each consume from a different subset of the partitions in the topic.
● Rule of thumb: one consumer per thread.
17. Consumer (cont.)
● Consumers keep track of consumed messages by saving their offset position in Kafka.
● The __consumer_offsets topic maps [partition key: group id, topic, and partition number] to the committed offset.
● When consumers are added or leave, partitions are re-allocated. Moving partition ownership from one consumer to another is called a rebalance.
● Rebalances provide the consumer group with high availability and scalability.
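A minimal consumer-group sketch, assuming the same illustrative topic and a hypothetical group id (page-view-readers); the topic's partitions are shared among all consumers that join this group.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "page-view-readers"); // consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("page-views"));
            while (true) {
                // Partitions of the topic are shared among members of the group.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```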
20. Kafka optimizations
● Kafka achieves high throughput and low latency primarily from two key concepts:
– Batching of individual messages to amortize network overhead and append/consume chunks together.
– Zero copy using sendfile, which skips unnecessary copying to user space.
● Relies heavily on the Linux page cache to batch writes and re-sequence writes to minimize disk-head movement.
● Automatically uses all free memory.
● If consumers are caught up with producers, you are essentially reading from cache.
● Batches are also typically compressed.
21. Cluster Sizing
● The number of brokers is governed by data size and the capacity of the cluster to handle requests.
● Number of brokers >= replication factor.
● Use a separate ZooKeeper ensemble of at least 3 nodes (tolerates 1 failure); at least 5 is ideal to support 2 failures. It requires an (N/2 + 1) majority to function.
● Kafka and ZooKeeper shouldn't be co-located, as both are disk I/O latency sensitive.
● Kafka uses the Linux OS page cache to reduce disk I/O. Other apps "pollute" the page cache with other data, which takes cache away from Kafka.
● A single ZooKeeper ensemble can be used for multiple Kafka clusters.
22. Hardware
● Use SSDs if possible, with multiple data directories.
● Make a good amount of memory available to the system to improve page cache performance.
● Available network throughput, along with disk storage, governs cluster sizing.
● If the network interface becomes saturated, cluster replication can fall behind.
● Ideally, clients should compress messages to optimize network and disk usage.
● Processing power is not as important as disk and memory; CPU is chiefly required for compression/decompression.
24. Persistence
● Kafka always immediately writes all data to the filesystem.
● On Linux, the messages are written to the filesystem cache, and there is no guarantee about when they will be written to disk.
● Supports flush policy configuration (from OS cache to disk).
● Durability in Kafka (or other distributed systems) does not require syncing data to disk, as a failed node will recover from its replicas.
● It is recommended to use the default flush settings, which disable application fsync entirely.
25. Message Retention
● By time and/or size.
● log.retention.ms: 1 week is enabled by default.
● log.retention.bytes: applied per partition.
● Compacted topics: fine-grained, per-record log retention.
– Retains at least the last value for each key in a single topic partition.
– Allows downstream consumers to restore their state after a crash or when reloading a cache.
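As a sketch of enabling compaction, a topic can be created with cleanup.policy=compact via the AdminClient; the topic name user-profiles is a placeholder for a keyed, latest-value-wins data set.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 2);
            // Log compaction keeps at least the latest value per key.
            topic.configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                 TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```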
27. Broker Guarantees
● Kafka provides an order guarantee for messages in a partition.
● Produced messages are considered committed when they have been written to the partition on all of its in-sync replicas (but not necessarily flushed to disk).
● Messages that are committed will not be lost as long as at least one replica remains alive.
● Consumers can only read messages that are committed, and they see records in the order they are stored in the log.
● These are stronger ordering guarantees than a traditional messaging system like RabbitMQ provides.
28. Configurable Reliability
● Guarantees are necessary but not sufficient to make the system fully reliable.
● Trade-offs are involved between reliability/consistency and availability/throughput/latency/hardware cost.
● Kafka's replication mechanism, with multiple replicas per partition, is at the core of its reliability guarantees.
● Reliability can be configured per broker/topic by certain parameters.
29. Reliability parameters
● Replication factor (default: 1): a replication factor of N allows you to lose N-1 brokers while still being able to read and write data to the topic reliably. A higher replication factor therefore leads to higher availability, higher reliability, and fewer disasters.
● On the flip side, for RF=N you need N times the storage space.
● Placement of replicas is also important: Kafka supports rack awareness.
● Unclean leader election (default: true): allow an out-of-sync replica to become leader or not.
● Minimum in-sync replicas (default: 1): the minimum number of in-sync replicas that must be available to accept produce requests.
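A hedged sketch of applying these parameters at the topic level, assuming a hypothetical payments topic: replication factor 3, min.insync.replicas=2, and unclean leader election disabled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateReliableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // RF=3: tolerate the loss of two brokers, at 3x the storage cost.
            NewTopic topic = new NewTopic("payments", 6, (short) 3);
            topic.configs(Map.of(
                TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",  // with acks=all, writes need 2 live ISRs
                TopicConfig.UNCLEAN_LEADER_ELECTION_ENABLE_CONFIG, "false")); // never elect out-of-sync leaders
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```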
30. Producing Reliably
● Use the correct acks configuration:
– acks=0: successfully sent over the network.
– acks=1: the leader will send an acknowledgment or error after writing to the partition.
– acks=all: the leader will wait until all in-sync replicas got the message before sending back an acknowledgment or an error. Here "all" means min.insync.replicas. The slowest option.
● Handle errors correctly. Retry all retriable errors, e.g. "leader not available", to prevent message loss.
● Retries can lead to duplicate publishes in the case of lost acknowledgments.
● Error-handling strategies: ignore/log/report.
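A minimal sketch of a producer configured along these lines, with acks=all and aggressive retries, plus a callback that surfaces non-retriable errors; the topic name and broker address are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas before acknowledging (slowest, safest).
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry retriable errors (e.g. leader not available) automatically.
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-1", "EUR 10.00"),
                (metadata, exception) -> {
                    if (exception != null) {
                        // Non-retriable error: ignore/log/report per your strategy.
                        exception.printStackTrace();
                    }
                });
        }
    }
}
```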
31. Consuming Reliably
● Consumers can lose messages when committing offsets for events they have read but not yet processed.
● auto.offset.reset (earliest/latest/...): what the consumer will do when no committed offsets are found.
● enable.auto.commit: set it to true only if you can finish processing the batch synchronously within the poll loop.
● With auto-commit, you have no control over the number of duplicate records you may need to process.
● auto.commit.interval.ms: a trade-off between commit overhead and duplicate processing.
● Handle rebalances properly: commit offsets before partitions are revoked, and clear old state on assignment.
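A sketch of reliable consumption under these rules, assuming a hypothetical payments topic and group: auto-commit is disabled, offsets are committed only after the batch is processed, and a rebalance listener commits before partitions are revoked.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReliableConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // when no committed offset exists

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singleton("payments"), new ConsumerRebalanceListener() {
            @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                consumer.commitSync(); // commit before losing ownership of partitions
            }
            @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // clear any state belonging to previously owned partitions here
            }
        });
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                process(record); // hypothetical processing step
            }
            consumer.commitSync(); // commit only after the batch is fully processed
        }
    }
    static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```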
32. Consuming Reliably (cont.)
● You may need to retry in case of an error in an upstream system like a database.
● You can't block indefinitely, as you need to poll to prevent consumer session termination.
● Commit the last successfully processed record, store retry records in a buffer, and keep trying to process them.
● This can lead to an unbounded buffer problem.
● Use the consumer pause() method to ensure that additional polls won't return additional data, and resume() if the retry succeeds.
● Dead-letter topic: write failed records to a separate topic and continue.
● A separate or the same consumer group can be used to handle retries from the retry topic, pausing the retry topic between retries.
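One possible shape of the pause()/resume() pattern described above; the buffer handling and the flaky writeUpstream() sink are hypothetical stand-ins for your actual retry logic.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PauseAndRetry {
    // Buffer of records that failed to be written upstream.
    private final List<ConsumerRecord<String, String>> buffer = new ArrayList<>();

    void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // Polling keeps the consumer session alive even while paused.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(buffer::add);
            try {
                writeUpstream(buffer);                  // hypothetical flaky sink (e.g. a database)
                buffer.clear();
                consumer.commitSync();                  // safe: everything buffered was written
                consumer.resume(consumer.paused());     // start fetching new data again
            } catch (Exception retriable) {
                consumer.pause(consumer.assignment());  // stop fetching, but keep polling
            }
        }
    }

    void writeUpstream(List<ConsumerRecord<String, String>> batch) { /* may throw */ }
}
```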
34. Idempotent Producer
● Every new producer is assigned a unique PID during initialization.
● The producer keeps a sequence number for each partition, incremented on every message sent to the broker.
● The broker maintains the persistent sequence numbers it receives for each topic partition from every PID.
● The broker will reject a produce request if its sequence number is not exactly one greater than the last committed message from that PID/TopicPartition pair.
● Configure your producer to set "enable.idempotence=true".
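Enabling this is a one-line producer setting; everything else below is the usual boilerplate with placeholder addresses.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Broker de-duplicates retries using the producer id (PID)
        // and per-partition sequence numbers.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // ... send as usual; duplicates from lost acknowledgments are rejected by the broker.
        producer.close();
    }
}
```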
35. Exactly-Once Consumer
● The easiest way is to make the consumer operation idempotent (idempotent writes).
● Alternatively, write to a system that has transactions:
– The idea is to write the records and their offsets in the same transaction so they stay in sync.
– When starting up, retrieve the offsets of the latest records written to the external store and then use consumer.seek() to start consuming again from those offsets.
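A sketch of the seek() approach, with loadOffsetFromStore() and storeRecordAndOffset() as hypothetical stand-ins for a transactional external store.

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekFromExternalStore {
    void run(KafkaConsumer<String, String> consumer) {
        TopicPartition partition = new TopicPartition("payments", 0);
        consumer.assign(Collections.singleton(partition));
        // Resume from the offset recorded transactionally with the data itself.
        long nextOffset = loadOffsetFromStore(partition) + 1;
        consumer.seek(partition, nextOffset);
        while (true) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                // Hypothetical store: write the record and its offset in ONE transaction.
                storeRecordAndOffset(record);
            }
        }
    }
    long loadOffsetFromStore(TopicPartition tp) { return -1; } // -1 means start at offset 0
    void storeRecordAndOffset(ConsumerRecord<String, String> record) { /* ... */ }
}
```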
36. Transactions
● Kafka now supports atomic writes across multiple partitions through the new transactions API (v0.11).
● Allows a producer to send a batch of messages to multiple partitions such that either all or none of the messages are visible to any consumer.
● Allows you to commit your consumer offsets in the same transaction along with the data, thereby allowing end-to-end exactly-once semantics.
● Idempotency and atomicity make exactly-once stream processing possible.
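A minimal transactional-producer sketch, assuming hypothetical payments and audit-log topics: either both sends become visible to read_committed consumers, or neither does. (sendOffsetsToTransaction() would additionally tie consumer offsets into the same transaction.)

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-tx-1"); // stable per producer instance

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // Writes to multiple partitions become visible atomically on commit.
                producer.send(new ProducerRecord<>("payments", "order-1", "debit"));
                producer.send(new ProducerRecord<>("audit-log", "order-1", "debited"));
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // nothing becomes visible to read_committed consumers
            }
        }
    }
}
```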
37. Validating Reliability
● Leader election: what happens if I kill the leader? How long does it take the producer and consumer to start working as usual again?
● Controller election: how long does it take the system to resume after a restart of the controller?
● Rolling restart: can I restart the brokers one by one without losing any messages?
● Unclean leader election test: what happens when we kill all the replicas for a partition one by one (to make sure each goes out of sync) and then start a broker that was out of sync?
● Rolling restart of consumers.
● Clients lose connectivity to the server.
39. Apache Kafka vs RabbitMQ

Table 1 - General information comparison between Apache Kafka and RabbitMQ

| | Apache Kafka | RabbitMQ |
|---|---|---|
| Creation year | 2011 | 2007 |
| License | Apache (open source) | Mozilla Public License (open source) |
| Enterprise support available | ✘ | ✓ |
| Programming language | Scala | Erlang |
| AMQP compliant | ✘ | ✓ |
| Officially supported clients | Java | Java, .NET/C#, Erlang |
| Other available clients | ~10 languages (incl. Python and Node.js) | ~13 languages (incl. Python, PHP and Node.js) |
| Documentation, examples and community | ∿ Immature and incomplete | ✓ |

Feature comparison:

| | Apache Kafka | RabbitMQ |
|---|---|---|
| Predefined queues on broker | ✓ | ✓ |
| Multiple different consumers of same data set | ✓ | ∿ |
| Load balancing across a set of same consumers | ✓ | ✓ |
| Remote queue definition | ✘ | ✓ |
| Advanced/conditional message routing | ✘ (message to partition only) | ✓ |
| Permission and access control list support | ✘ | ✓ |
| SSL support | ✘ | ✓ |
| Clustering support | ✓ | ✓ |
| Self-sufficient | ✘ (needs ZooKeeper) | ✓ |
| Management and monitoring interface | ✘ (JMX and CLI-based) | ✓ (web and CLI-based) |