Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This introduction to the Kafka streaming platform covers Kafka Architecture with many small examples from the command line. Then we expand on this with a multi-server example. We walk you through Consumer failover, and then through clustering and Kafka Broker failover. It covers consumers, producers, and clustering basics.
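The key-based routing behind Kafka's partitioning can be sketched in a few lines. This is an illustrative stand-in only: real Kafka producers hash keys with murmur2, while this sketch uses CRC32 from the Python standard library.

```python
# Conceptual sketch of Kafka's key-based partitioning (illustrative only:
# real producers use murmur2 hashing; zlib.crc32 stands in here).
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Records with the same key always land on the same partition."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, so per-key ordering is preserved.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
assert p1 == p2
```

Because the partition is a pure function of the key, all records for one key stay ordered on one partition, which is what makes consumer failover safe for per-key processing.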
Kafka Tutorial - Introduction to Apache Kafka (Part 2), by Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka Architecture with some small examples from the command line, then expands on this with a multi-server example to demonstrate failover of brokers as well as consumers. It then goes through some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka Producers.
Kafka Tutorial - Introduction to Apache Kafka (Part 1), by Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka? This slide deck is a tutorial for the Kafka streaming platform. It covers Kafka Architecture with some small examples from the command line, then expands on this with a multi-server example to demonstrate failover of brokers as well as consumers. It then goes through some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have also expanded the Kafka design section and added references. The tutorial covers Avro and the Schema Registry as well as advanced Kafka Producers.
Kafka Intro With Simple Java Producer Consumers, by Jean-Paul Azar
Introduction to the Kafka streaming platform. Covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer.
Avro Tutorial - Records with Schema for Kafka and Hadoop, by Jean-Paul Azar
Covers how to use Avro to save records to disk, which can be used later with the Kafka Schema Registry. This provides background on Avro, which gets used with Hadoop and Kafka.
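The core idea behind schema'd records can be illustrated without the Avro library itself. The sketch below is NOT real Avro (which has its own binary encoding and schema language); it just shows records being validated against a declared schema, here expressed as a plain dict.

```python
# Minimal sketch of the idea behind Avro: records travel with (or are
# validated against) a schema. Illustrative only -- not the avro library.
schema = {
    "name": "PageView",
    "fields": [("user", str), ("url", str), ("duration_ms", int)],
}

def validate(record: dict, schema: dict) -> bool:
    """Every declared field must be present and of the declared type."""
    return all(
        name in record and isinstance(record[name], typ)
        for name, typ in schema["fields"]
    )

assert validate({"user": "jo", "url": "/a", "duration_ms": 120}, schema)
assert not validate({"user": "jo", "url": "/a"}, schema)  # missing field
```

Real Avro goes further: the schema also drives a compact binary encoding and controlled schema evolution, which is what makes it a good fit for Kafka and Hadoop.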
Kafka Tutorial - introduction to the Kafka streaming platform, by Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka?
Introduction to the Kafka streaming platform. Covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand the Java examples to correlate with the design discussion of Kafka. We have also expanded the Kafka design section and added references.
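How a consumer group splits a topic's partitions among its members, and why failover works, can be sketched as a simple assignment function. This is a simplified round-robin model; Kafka's real partition assignors are pluggable and more involved.

```python
# Toy model of consumer-group partition assignment (simplified round-robin;
# Kafka's actual assignors -- range, sticky, cooperative -- are pluggable).
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    out = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        out[consumers[i % len(consumers)]].append(p)
    return out

a = assign([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
assert a == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}

# Consumer failover = rerunning assignment over the surviving members:
b = assign([0, 1, 2, 3, 4, 5], ["c1", "c3"])
assert b == {"c1": [0, 2, 4], "c3": [1, 3, 5]}
```

The takeaway the deck demonstrates at the command line: when a consumer dies, its partitions are rebalanced onto the rest of the group, so no partition goes unread.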
Amazon Cassandra Basics & Guidelines for AWS/EC2/VPC/EBS, by Jean-Paul Azar
A comprehensive guide to deploying and configuring Cassandra on AWS/EC2, accurate and up to date as of 2017. There is a lot of information out there, and some of it is old or just wrong; the examples here come from working code. This guide covers Ec2MultiRegionSnitch and EC2Snitch, the broadcast address, using KMS to encrypt EBS, the SSL config required for Ec2MultiRegionSnitch, Ansible, SSH config, and setting up bastions so you can deploy your cluster on private subnets (NAT gateway, how to set up routes, security groups, etc.). We also cover multi-region, multi-DC Cassandra deployments using VPN, and the advantages and setup of enhanced networking and cluster placement groups in EC2.
We even cover how to set up and use the new EBS elastic volumes and how they benefit Cassandra deployments on AWS, as well as how to set up Cassandra with systemd and how to enable CloudWatch monitoring of Cassandra and the Linux OS for metrics and log aggregation.
From NAT setup to how to configure GC and which EC2 instances to pick, this is the most comprehensive guide to deploying Cassandra on AWS/EC2/VPC.
Kafka Tutorial: Streaming Data Architecture, by Jean-Paul Azar
Kafka tutorial covers Java examples for Producers and Consumers. Also covers why Kafka is important and what Kafka is. Takes a look at the whole ecosystem around Kafka. Discusses low-level details about Kafka needed for successful deploys and performance tuning like batching, compression, partitioning, and replication.
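Of the tuning levers listed above, batching is the easiest to picture: the producer holds records back until a batch fills, trading a little latency for far fewer network requests. A toy model, assuming only a `batch_size` threshold (the real producer also flushes on a `linger.ms` timer):

```python
# Toy model of producer batching: records accumulate until the batch reaches
# batch_size, then flush as one request (the linger.ms timer is omitted).
class BatchingProducer:
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.batch = []
        self.sent = []   # each element models one network request

    def send(self, record):
        self.batch.append(record)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            self.sent.append(list(self.batch))
            self.batch.clear()

p = BatchingProducer(batch_size=3)
for i in range(7):
    p.send(i)
p.flush()  # drain the partial final batch
assert p.sent == [[0, 1, 2], [3, 4, 5], [6]]
```

Seven records became three requests; compression then works on whole batches, which is why the two settings compound.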
Covers using Kafka MirrorMaker for disaster recovery, scaling reads, and isolating mission-critical clusters. Starts out with a description of MirrorMaker and how to use it, then walks through a thorough introduction and example, step by step.
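Conceptually, MirrorMaker is just a consumer and a producer glued together: read every record from a topic on the source cluster, write it to the same topic on the target. A sketch with in-memory dicts standing in for the two clusters:

```python
# MirrorMaker in one loop (conceptual sketch, not the real tool): consume
# from the source cluster's topic, re-produce to the target cluster's topic.
source = {"orders": ["o1", "o2", "o3"]}   # stands in for the source cluster
target = {"orders": []}                   # stands in for the target cluster

def mirror(topic: str):
    for record in source[topic]:      # consumer side
        target[topic].append(record)  # producer side

mirror("orders")
assert target["orders"] == source["orders"]
```

The real MirrorMaker adds offset tracking, parallelism, and failure handling, but this consume-then-produce loop is the whole idea, which is why it serves equally for DR copies and read-scaling replicas.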
Kafka and Avro with Confluent Schema Registry, by Jean-Paul Azar
Covers how to use Kafka/Avro to send records with support for schemas and Avro serialization. Covers how to use Avro with Kafka and the Confluent Schema Registry.
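The registry's serialization trick is worth seeing concretely: Confluent's wire format prefixes each message with a magic byte and a 4-byte schema id, so consumers can fetch the right schema before decoding. The sketch below models the registry as a dict and uses JSON where real serializers use Avro.

```python
import json
import struct

# Sketch of the Confluent wire format: magic byte 0, a 4-byte big-endian
# schema id from the registry, then the payload (JSON here, Avro in reality).
REGISTRY = {}  # schema text -> id; stands in for the Schema Registry service

def register(schema: str) -> int:
    return REGISTRY.setdefault(schema, len(REGISTRY) + 1)

def serialize(schema: str, record: dict) -> bytes:
    sid = register(schema)
    return b"\x00" + struct.pack(">I", sid) + json.dumps(record).encode()

msg = serialize('{"type":"record"}', {"id": 7})
assert msg[0] == 0                              # magic byte
assert struct.unpack(">I", msg[1:5])[0] == 1    # schema id
assert json.loads(msg[5:]) == {"id": 7}         # payload
```

Because the id travels with every message, producers and consumers can evolve schemas independently as long as the registry's compatibility rules are satisfied.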
Kafka Tutorial - basics of the Kafka streaming platform, by Jean-Paul Azar
Introduction to the Kafka streaming platform. Covers Kafka Architecture with some small examples from the command line. Then we expand on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka Producer and a Kafka Consumer. We have started to expand the Java examples to correlate with the design discussion of Kafka. We have also expanded the Kafka design section and added references.
This tutorial covers advanced consumer topics like custom deserializers, ConsumerRebalanceListener to rewind to a certain offset, manual assignment of partitions to implement a "priority queue", “at least once” message delivery semantics Consumer Java example, “at most once” message delivery semantics Consumer Java example, “exactly once” message delivery semantics Consumer Java example, and a lot more.
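The difference between "at most once" and "at least once" comes down to when the consumer commits its offset relative to processing. A simulated crash makes the trade-off visible; this is a conceptual model, not a Kafka client.

```python
# At-most-once commits the offset BEFORE processing; at-least-once commits
# AFTER. A simulated crash mid-stream shows what each semantic risks.
def consume(records, commit_first: bool, crash_at: int):
    committed, processed = 0, []
    for i, r in enumerate(records):
        if commit_first:
            committed = i + 1      # at-most-once: record may be lost
        if i == crash_at:
            break                  # crash before this record is processed
        processed.append(r)
        if not commit_first:
            committed = i + 1      # at-least-once: record may be re-read
    return committed, processed

# At-most-once: offset 2 is committed but "b" was never processed (lost).
assert consume("abc", commit_first=True, crash_at=1) == (2, ["a"])
# At-least-once: offset stays at 1, so "b" is re-read on restart (duplicate).
assert consume("abc", commit_first=False, crash_at=1) == (1, ["a"])
```

"Exactly once" then requires making the processing and the commit atomic, which is why it needs extra machinery (transactions or idempotent writes) rather than just reordering these two steps.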
Amazon AWS basics needed to run a Cassandra Cluster in AWS, by Jean-Paul Azar
There is a lot of advice on how to configure a Cassandra cluster on AWS. Not every configuration meets every use case.
The best way to know how to deploy Cassandra on AWS is to know the basics of AWS itself. Part 1: we start by covering AWS as it applies to Cassandra. Later we go into detail on AWS-specific Cassandra configuration.
Best Practices for Running Kafka on Docker Containers, by BlueData, Inc.
Docker containers provide an ideal foundation for running Kafka-as-a-Service on-premises or in the public cloud. However, using Docker containers in production environments for Big Data workloads using Kafka poses some challenges – including container management, scheduling, network configuration and security, and performance.
In this session at Kafka Summit in August 2017, Nanda Vijyaydev of BlueData shared lessons learned from implementing Kafka-as-a-Service with Docker containers.
https://kafka-summit.org/sessions/kafka-service-docker-containers
Building Event-Driven Systems with Apache Kafka, by Brian Ritchie
Event-driven systems provide simplified integration, easy notifications, inherent scalability, and improved fault tolerance. In this session we'll cover the basics of building event-driven systems and then dive into utilizing Apache Kafka for the infrastructure. Kafka is a fast, scalable, fault-tolerant publish/subscribe messaging system developed at LinkedIn. We will cover the architecture of Kafka and demonstrate code that utilizes this infrastructure, including C#, Spark, ELK, and more.
Sample code: https://github.com/dotnetpowered/StreamProcessingSample
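The publish/subscribe pattern underlying all of this fits in a few lines. A barebones in-memory sketch (Kafka adds durability, partitioning, and replication on top of this same decoupling):

```python
# Barebones publish/subscribe: publishers and subscribers never reference
# each other, only named topics -- the decoupling Kafka scales out.
from collections import defaultdict

class Bus:
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subs[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subs[topic]:
            handler(event)

bus, seen = Bus(), []
bus.subscribe("orders", seen.append)
bus.publish("orders", {"id": 1})
assert seen == [{"id": 1}]
```

New consumers can be added without touching producers, which is the "simplified integration" and "easy notifications" the abstract refers to.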
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ..., by StreamNative
We will introduce HerdDB, a distributed database written in Java.
We will see how a distributed database can be built using Apache BookKeeper as a write-ahead commit log.
Type safety is extremely important in any application built around a stream or queue. Type definition and evolution can either be built into the application or delegated to the data layer, which supports it out of the box and lets the application concentrate on business logic rather than on how data is stored and evolved. It is this property of the good old relational databases (among others) that makes them a favourite over many modern NoSQL databases. Modern software architectures require asynchronous communication (via a stream or queue). While data store and query design change with asynchronous communication, type safety is still equally important.
In this slide deck, used for an ApacheCon 2021 talk, we go over ways in which one can enforce structure (a schema) over streaming data, using Apache Pulsar as the example. Apache Pulsar offers server-side as well as client-side support for structured streaming. We have been using Pulsar for asynchronous communication among microservices in our Nutanix Beam and Flow Security Central apps for over 1.5 years in production. This deck presents the technical details of what a schema is, how to represent one, what is available on the Apache Pulsar server and client side, how we have used Pulsar's schema support to build our use cases, and our learnings from them.
This session walks through Apache Kafka, its components, and how it works, along with best practices for achieving a fault-tolerant system with high availability and consistency by tuning Kafka brokers and producers for the best result.
Apache Con 2021: Apache Bookkeeper Key Value Store and use cases, by Shivji Kumar Jha
In order to leverage the best performance characteristics of your data or stream backend, it is important to understand the nitty-gritty details of how your backend's storage and compute work: how data is stored, how it is indexed, and what the read path looks like. Understanding this empowers you to design your solution so as to make the best use of the resources at hand, as well as to get the optimum consistency, availability, latency, and throughput for a given amount of resources.
With this underlying philosophy, in this slide deck we get to the bottom of the storage tier of Pulsar (Apache BookKeeper): the barebones of the BookKeeper storage semantics, how it is used in different use cases (even beyond Pulsar), the object models of storage in Pulsar, the different kinds of data structures and algorithms Pulsar uses, and how that maps to the semantics of the storage class shipped with Pulsar by default. And yes, you can change the storage backend too, with some additional code!
The focus is on the storage backend, so as not to keep this tailored to Pulsar specifically but to be able to apply it to different data stores or streams.
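The storage semantics being described reduce to an append-only ledger: writes get a monotonically increasing entry id, and reads address entries by id. A minimal in-memory sketch of that contract (real BookKeeper adds replication across bookies, fencing, and durable storage):

```python
# Sketch of a BookKeeper-like ledger: an append-only log where each entry
# gets a monotonically increasing id, and reads address entries by id.
class Ledger:
    def __init__(self):
        self.entries = []

    def append(self, data: bytes) -> int:
        """Append is the only write operation; returns the new entry id."""
        self.entries.append(data)
        return len(self.entries) - 1

    def read(self, entry_id: int) -> bytes:
        return self.entries[entry_id]

lg = Ledger()
first = lg.append(b"state-change-1")
second = lg.append(b"state-change-2")
assert (first, second) == (0, 1)
assert lg.read(first) == b"state-change-1"
```

Because entries are immutable once written, the read path can be optimized independently of the write path, which is the property the deck exploits when mapping use cases onto the storage tier.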
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time..., by Denodo
Watch full webinar here: https://buff.ly/43PDVsz
In today's fast-paced, data-driven world, organizations need real-time data pipelines and streaming applications to make informed decisions. Apache Kafka, a distributed streaming platform, provides a powerful solution for building such applications and, at the same time, gives the ability to scale without downtime and to work with high volumes of data. At the heart of Apache Kafka lies Kafka Topics, which enable communication between clients and brokers in the Kafka cluster.
Join us for this session with Pooja Dusane, Data Engineer at Denodo, where we will explore the critical role that Kafka listeners play in enabling connectivity to Kafka topics. We'll dive deep into the technical details, discussing the key concepts of Kafka listeners, including their role in enabling real-time communication between clients and brokers. We'll also explore the various configuration options available for Kafka listeners and demonstrate how they can be customized to suit specific use cases.
Attend and Learn:
- The critical role that Kafka listeners play in enabling connectivity in Apache Kafka.
- Key concepts of Kafka listeners and how they enable real-time communication between clients and brokers.
- Configuration options available for Kafka listeners and how they can be customized to suit specific use cases.
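As a concrete illustration of listener configuration, here is a hypothetical broker snippet separating internal and external traffic. The property names (`listeners`, `advertised.listeners`, `listener.security.protocol.map`, `inter.broker.listener.name`) are standard Kafka broker settings; the listener names and hostnames are made up for the example.

```properties
# Hypothetical broker config: one listener for in-VPC traffic, one for
# external clients, each advertising a different reachable address.
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://broker1.internal:9092,EXTERNAL://broker1.example.com:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:SSL
inter.broker.listener.name=INTERNAL
```

The key point the session makes: clients bootstrap to a listener, and the broker replies with the matching `advertised.listeners` address, so each class of client gets a hostname it can actually reach.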
Lesfurest.com invited me to talk about the KAPPA Architecture style during a BBL.
Kappa architecture is a style for real-time processing of large volumes of data, combining the stream processing, storage, and serving layers into a single pipeline. It differs from the Lambda architecture, which uses separate batch and stream processing pipelines.
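The single-pipeline idea can be sketched in miniature: one processing function serves both live and historical data, because "batch" in Kappa is just replaying the immutable log from the beginning.

```python
# Kappa in miniature: the same stream-processing function handles live
# events and full reprocessing, since batch = replay the log from offset 0.
log = [3, 1, 4, 1, 5]          # the immutable, replayable event log

def process(events):
    """One pipeline, used for both the live tail and full replays."""
    return [e * 2 for e in events]

live = process(log[3:])        # processing the tail of the stream
replayed = process(log)        # reprocessing = replay from offset 0
assert live == [2, 10]
assert replayed == [6, 2, 8, 2, 10]
```

Contrast with Lambda, where the batch layer and speed layer are separate codebases that must be kept in agreement; in Kappa there is only one `process` to maintain.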
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co..., by HostedbyConfluent
Event-driven application architectures are becoming increasingly common as a large number of users demand more interactive, real-time, and intelligent responses. Yet it can be challenging to decide how to capture and perform real-time data analysis and deliver differentiating experiences. Join experts from Confluent and AWS to learn how to build Apache Kafka®-based streaming applications backed by machine learning models. Adopting the recommendations will help you establish repeatable patterns for high performing event-based apps.
Apache Kafka - Scalable Message Processing and more! by Guido Schmutz
After a quick overview and introduction of Apache Kafka, this session covers two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connect's role is to access data from the outside world and make it available inside Kafka by publishing it into a Kafka topic. In the other direction, Kafka Connect is also responsible for transporting information from inside Kafka to the outside world, which could be a database or a file system. Many connectors for different source and target systems are available out of the box, provided by the community, by Confluent, or by other vendors. You simply configure these connectors and off you go.
Kafka Streams is a lightweight component which extends Kafka with stream-processing functionality. With it, Kafka can not only reliably and scalably transport events and messages through the Kafka broker but also analyse and process these events in real time. Interestingly, Kafka Streams does not provide its own cluster infrastructure, and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense: inside a “normal” Java application, inside a web container, or on a more modern containerized (cloud) infrastructure such as Mesos, Kubernetes, or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state, and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
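The canonical Kafka Streams example is a word count: a stateless flat-map into words followed by a stateful count kept in a local state store. A conceptual sketch in plain Python (the real API is a Java DSL with `flatMapValues`, `groupBy`, and `count` over topics, and the state store is fault-tolerant, not a dict):

```python
# Word count in the Kafka Streams style: stateless flat-map into words,
# then a stateful count in a local "state store" (a Counter stands in).
from collections import Counter

def word_count(stream):
    store = Counter()                      # stands in for the state store
    for line in stream:                    # each record from the input topic
        store.update(line.lower().split())
    return dict(store)

counts = word_count(["hello kafka", "hello streams"])
assert counts == {"hello": 2, "kafka": 1, "streams": 1}
```

The "queryable state" feature mentioned above corresponds to being able to read `store` directly from the running application instead of writing counts back out to a topic first.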
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data..., by DataStax Academy
Typesafe did a survey of Spark usage last year and found that a large percentage of Spark users combine it with Cassandra and Kafka. This talk focuses on streaming data scenarios that demonstrate how these three tools complement each other for building robust, scalable, and flexible data applications. Cassandra provides resilient and scalable storage, with flexible data format and query options. Kafka provides durable, scalable collection of streaming data with message-queue semantics. Spark provides very flexible analytics, everything from classic SQL queries to machine learning and graph algorithms, running in a streaming model based on "mini-batches", offline batch jobs, or interactive queries. We'll consider best practices and areas where improvements are needed.
In this Kafka tutorial, we will discuss Kafka Architecture. In this Kafka Architecture article, we will see the APIs in Kafka. Moreover, we will learn about the Kafka Broker, Kafka Consumer, ZooKeeper, and Kafka Producer. Also, we will see some fundamental concepts of Kafka.
Paolo Castagna is a Senior Sales Engineer at Confluent. His background is in 'big data', and he has seen first-hand the shift happening in the industry from batch to stream processing and from big data to fast data. His talk introduces Kafka Streams and explains why Apache Kafka is a great option and simplification for stream processing.
Apache Kafka has evolved from an enterprise messaging system into a fully distributed streaming data platform (Kafka Core + Kafka Connect + Kafka Streams) for building streaming data pipelines and streaming data applications.
This talk, that I gave at the Chicago Java Users Group (CJUG) on June 8th 2017, is mainly focusing on Kafka Streams, a lightweight open source Java library for building stream processing applications on top of Kafka using Kafka topics as input/output.
You will learn more about the following:
1. Apache Kafka: a Streaming Data Platform
2. Overview of Kafka Streams: Before Kafka Streams? What is Kafka Streams? Why Kafka Streams? What are Kafka Streams key concepts? Kafka Streams APIs and code examples?
3. Writing, deploying and running your first Kafka Streams application
4. Code and Demo of an end-to-end Kafka-based Streaming Data Application
5. Where to go from here?
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features), by Kai Wähner
High level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
Neuro-symbolic is not enough, we need neuro-*semantic*, by Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf, by 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
GraphRAG is All You need? LLM & Knowledge Graph, by Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI support, by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 3, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -..., by DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Â
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
Â
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
Â
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Â
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Â
Kafka Tutorial, Kafka ecosystem with clustering examples
1. Cloudurable: Cassandra and Kafka Support on AWS/EC2. Kafka Training, Kafka Consulting.
Introduction to Kafka
Support around Cassandra and Kafka running in EC2
3. Kafka growing
Why Kafka? Kafka adoption is on the rise, but why? What is Kafka?
4. Kafka growth exploding
❖ Kafka growth is exploding
❖ Used by 1/3 of all Fortune 500 companies
❖ Top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, 9 of the top ten telecom companies
❖ LinkedIn, Microsoft and Netflix process four-comma messages a day with Kafka (1,000,000,000,000)
❖ Real-time streams of data, used to collect big data or to do real-time analysis (or both)
5. Why is Kafka needed?
❖ Real-time streaming data processed for real-time analytics
❖ Service calls: track every call; IoT sensors
❖ Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system
❖ Kafka is often used instead of JMS, RabbitMQ and AMQP
❖ Higher throughput, reliability and replication
6. Why is Kafka needed? 2
❖ Kafka can work in combination with Flume/Flafka, Spark Streaming, Storm, HBase and Spark for real-time analysis and processing of streaming data
❖ Feed your data lakes with data streams
❖ Kafka brokers support massive message streams for follow-up analysis in Hadoop or Spark
❖ Kafka Streams (subproject) can be used for real-time analytics
7. Kafka Use Cases
❖ Stream Processing
❖ Website Activity Tracking
❖ Metrics Collection and Monitoring
❖ Log Aggregation
❖ Real-time analytics
❖ Capture and ingest data into Spark / Hadoop
❖ CQRS, replay, error recovery
❖ Guaranteed distributed commit log for in-memory computing
8. Who uses Kafka?
❖ LinkedIn: activity data and operational metrics
❖ Twitter: uses it as part of Storm, its stream-processing infrastructure
❖ Square: Kafka as a bus to move all system events to various Square data centers (logs, custom events, metrics, and so on); outputs to Splunk, Graphite, and Esper-like alerting systems
❖ Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, DataDog, LucidWorks, MailChimp, Netflix, etc.
9. Why is Kafka Popular?
❖ Great performance
❖ Operational simplicity: easy to set up, use, and reason about
❖ Stable, reliable durability
❖ Flexible publish-subscribe/queue (scales with N consumer groups)
❖ Robust replication
❖ Producer-tunable consistency guarantees
❖ Ordering preserved at the shard level (Topic Partition)
❖ Works well with systems that have data streams to process, aggregate, transform and load into other stores
Most important reason: Kafka's great performance (throughput and latency), obtained through great engineering
10. Why is Kafka so fast?
❖ Zero copy: calls the OS kernel directly, rather than copying, to move data fast
❖ Batches data in chunks, end to end from producer to file system to consumer; enables more efficient data compression and reduces I/O latency
❖ Sequential disk writes: avoids random disk access by writing to an immutable commit log; no slow disk seeking, no random I/O; the disk is accessed sequentially
❖ Horizontal scale: uses hundreds to thousands of partitions for a single topic, spread out over thousands of servers, to handle massive load
11. Streaming Architecture
12. Why Kafka Review
❖ Why is Kafka so fast?
❖ How fast is Kafka usage growing?
❖ How is Kafka getting used?
❖ Where does Kafka fit in the Big Data Architecture?
❖ How does Kafka relate to real-time analytics?
❖ Who uses Kafka?
13. Kafka Overview
What is Kafka? Kafka messaging
14. What is Kafka?
❖ Distributed Streaming Platform
❖ Publish and subscribe to streams of records
❖ Fault-tolerant storage
❖ Replicates Topic Log Partitions to multiple servers
❖ Process records as they occur
❖ Fast, efficient I/O, batching, compression, and more
❖ Used to decouple data streams
15. Kafka Decoupling Data Streams
16. Kafka Polyglot Clients / Wire Protocol
❖ Kafka clients and servers communicate via a wire protocol over TCP
❖ The protocol is versioned and maintains backwards compatibility
❖ Many languages supported
❖ The Kafka REST proxy allows easy integration
❖ The Kafka ecosystem also provides Avro/Schema Registry support
17. Kafka Usage
❖ Build real-time streaming applications that react to streams
❖ Real-time data analytics
❖ Transform, react, aggregate, join real-time data flows
❖ Feed events to CEP for complex event processing
❖ Feed data lakes
❖ Build real-time streaming data pipelines
❖ Enable in-memory microservices (actors, Akka, Vert.x, QBit, RxJava)
18. Kafka Use Cases
❖ Metrics / KPI gathering: aggregate statistics from many sources
❖ Event sourcing: used with microservices (in-memory) and actor systems
❖ Commit log: external commit log for distributed systems; replicate data between nodes, re-sync so restarted nodes can restore state
❖ Real-time data analytics, stream processing, log aggregation, messaging, click-stream tracking, audit trail, etc.
19. Kafka Record Retention
❖ The Kafka cluster retains all published records until they are discarded
❖ Time-based: a configurable retention period (e.g., three days, two weeks, or a month)
❖ Size-based: configurable based on size
❖ Compaction: keeps the latest record per key
❖ A record is available for consumption until discarded by time, size or compaction
❖ Consumption speed is not impacted by size
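The compaction policy can be sketched in a few lines of Python. This is a toy model of the idea (not Kafka's log cleaner): only the latest record per key survives, in the order of each key's final write:

```python
def compact(log):
    """Toy sketch of log compaction: keep only the latest record per
    key, ordered by the offset of each key's final occurrence."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later writes overwrite earlier ones
    return [(key, value)
            for key, (offset, value) in sorted(latest.items(),
                                               key=lambda kv: kv[1][0])]

log = [("user1", "v1"), ("user2", "v1"), ("user1", "v2"), ("user1", "v3")]
compacted = compact(log)
# Only the latest value per key survives: [("user2", "v1"), ("user1", "v3")]
```

This is why a compacted topic works as a changelog: replaying it yields the current value for every key.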
20. Kafka Scalable Message Storage
❖ Kafka acts as a good storage system for records/messages
❖ Records written to Kafka topics are persisted to disk and replicated to other servers for fault tolerance
❖ Kafka producers can wait on acknowledgment: a write is not complete until fully replicated
❖ Kafka's disk structures scale well: writing in large streaming batches is fast
❖ Clients/consumers can control their read position (offset)
❖ Kafka acts like a high-speed file system for commit log storage and replication
21. Kafka Review
❖ How does Kafka decouple streams of data?
❖ What are some use cases for Kafka where you work?
❖ What are some common use cases for Kafka?
❖ How is Kafka like a distributed message storage system?
❖ How does Kafka know when to delete old messages?
❖ Which programming languages does Kafka support?
22. Kafka Architecture
23. Kafka Fundamentals
❖ Records have a key (optional), value and timestamp; records are immutable
❖ Topic: a stream of records ("/orders", "/user-signups"), a feed name
❖ Log: topic storage on disk
❖ Partition / Segments: parts of a Topic Log
❖ Producer API: produce streams of records
❖ Consumer API: consume streams of records
❖ Broker: a Kafka server that runs in a Kafka Cluster; brokers form a cluster, and a cluster consists of many Kafka brokers on many servers
❖ ZooKeeper: coordinates brokers/cluster topology; a consistent file system for configuration information and leader election for Broker Topic Partition Leaders
24. Kafka: Topics, Producers, and Consumers
(Diagram: producers send records to a topic in the Kafka cluster; consumers read records from it)
25. Core Kafka
26. Kafka needs ZooKeeper
❖ ZooKeeper helps with leader election of Kafka Broker and Topic Partition pairs
❖ ZooKeeper manages service discovery for the Kafka Brokers that form the cluster
❖ ZooKeeper sends topology changes to Kafka: a new broker joined, a broker died, a topic was removed or added, etc.
❖ ZooKeeper provides an in-sync view of the Kafka Cluster configuration
27. Kafka Producer/Consumer Details
❖ Producers write to and consumers read from topic(s)
❖ A topic is associated with a log, which is a data structure on disk
❖ Producers append records to the end of the topic log
❖ A topic log consists of partitions, spread across multiple files on multiple nodes
❖ Consumers read from Kafka at their own cadence
❖ Each consumer (consumer group) tracks the offset from where it left off reading
❖ Partitions can be distributed on different machines in a cluster: high performance with horizontal scalability, and failover with replication
28. Kafka Topic Partition, Consumers, Producers
(Diagram: Partition 0 holding offsets 0 through 11)
Consumer groups remember the offset where they left off; each consumer group has its own offset. The producer is writing to offset 12 of Partition 0 while Consumer Group A is reading from offset 6 and Consumer Group B is reading from offset 9.
29. Kafka Scale and Speed
❖ How can Kafka scale if multiple producers and consumers read/write to the same Kafka topic log at the same time?
❖ Writes are fast: sequential writes to the filesystem are fast (700 MB/s or more)
❖ Kafka scales writes and reads by sharding topic logs into partitions (parts of a topic log)
❖ Topic logs can be split into multiple partitions on different machines/different disks
❖ Multiple producers can write to different partitions of the same topic
❖ Multiple consumer groups can read from different partitions efficiently
30. Kafka Brokers
❖ A Kafka Cluster is made up of multiple Kafka Brokers
❖ Each broker has an ID (number)
❖ Brokers contain topic log partitions
❖ Connecting to one broker bootstraps a client to the entire cluster
❖ Start with at least three brokers; a cluster can have 10, 100, or 1,000 brokers if needed
31. Kafka Cluster, Failover, ISRs
❖ Topic partitions can be replicated across multiple nodes for failover
❖ A topic should have a replication factor greater than 1 (2, or 3)
❖ Failover: if one Kafka Broker goes down, a Kafka Broker with an ISR (in-sync replica) can serve data
32. ZooKeeper does coordination for the Kafka Cluster
(Diagram: producers and consumers connect to a topic spread over three Kafka Brokers, with ZooKeeper coordinating the brokers)
33. Failover vs. Disaster Recovery
❖ Replication of Kafka Topic Log partitions allows for failure of a rack or AWS availability zone
❖ You need a replication factor of at least 3
❖ Kafka replication is for failover; MirrorMaker is used for disaster recovery
❖ MirrorMaker replicates a Kafka cluster to another data center or AWS region
❖ It is called mirroring, rather than replication, because it happens between clusters; replication happens within a cluster
34. Kafka Review
❖ How does Kafka decouple streams of data?
❖ What are some use cases for Kafka where you work?
❖ What are some common use cases for Kafka?
❖ What is a Topic?
❖ What is a Broker?
❖ What is a Partition? Offset?
❖ Can Kafka run without ZooKeeper?
❖ How do you implement failover in Kafka?
❖ How do you implement disaster recovery in Kafka?
35. Kafka versus
36. Kafka vs JMS, SQS, RabbitMQ Messaging
❖ Is Kafka a queue or a pub/sub topic? Yes
❖ Kafka is like a queue per consumer group: consumers within a consumer group share the load, like a JMS or RabbitMQ queue
❖ Kafka is like topics in JMS, RabbitMQ and other MOM systems: consumer groups act like subscriptions, and records are broadcast to multiple consumer groups
❖ MOM = JMS, ActiveMQ, RabbitMQ, IBM MQ Series, Tibco, etc.
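The "queue and topic at once" behavior falls out of per-group offsets, which a toy Python model can show. This sketch ignores partitions and invents the names for illustration; it is not a client API. Consumers sharing a group id split the stream like a queue, while a second group re-reads everything, like a topic subscription:

```python
from collections import defaultdict

class Topic:
    """Toy model: one log, one read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # group id -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def poll(self, group):
        """Return the next unread record for this group, or None."""
        offset = self.offsets[group]
        if offset < len(self.log):
            self.offsets[group] += 1
            return self.log[offset]
        return None

topic = Topic()
topic.publish("r0")
topic.publish("r1")

# Two polls in the SAME group split the records (queue semantics)...
a = topic.poll("analytics")   # "r0"
b = topic.poll("analytics")   # "r1"
# ...while a DIFFERENT group re-reads from its own offset (pub/sub).
c = topic.poll("billing")     # "r0"
```

In real Kafka the load split within a group happens by assigning partitions to consumers, but the group-level offset bookkeeping is the same idea.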
37. Kafka vs MOM
❖ By design, Kafka is better suited for scale than traditional MOM systems, due to the partitioned topic log
❖ Load is divided among consumers for reads, by partition
❖ Kafka handles parallel consumers better than traditional MOM
❖ Moving the read position (partition offset) to the client/consumer side of the equation instead of the broker means less tracking required by the broker and more flexible consumers
❖ Kafka was written with mechanical sympathy, modern hardware, and the cloud in mind: disks are faster, servers have tons of system memory, and it is easier to spin up servers for scale-out
38. Kinesis and Kafka are similar
❖ Kinesis Streams is like Kafka Core; Kinesis Analytics is like Kafka Streams; a Kinesis shard is like a Kafka partition
❖ Similar, and used in similar use cases
❖ In Kinesis, data is stored in shards; in Kafka, data is stored in partitions
❖ Kinesis Analytics lets you perform SQL-like queries on data streams; Kafka Streams lets you perform functional aggregations and mutations
❖ Kafka integrates well with Spark and Flink, which allow SQL-like queries on streams
39. Kafka vs. Kinesis
❖ Data is stored in Kinesis for 24 hours by default, and you can increase that up to 7 days
❖ Kafka records are stored for 7 days by default; you can increase retention until you run out of disk space, and decide by data size or by date
❖ You can use compaction with Kafka so the log stores only the latest record per key
❖ With Kinesis, data can be analyzed by Lambda before it gets sent to S3 or Redshift
❖ With Kinesis you pay for use, by buying read and write units
❖ Kafka is more flexible than Kinesis, but you have to manage your own clusters and need some dedicated DevOps resources to keep it going
❖ Kinesis is sold as a service and does not require a DevOps team to keep it going (it can be more expensive and less flexible, but is much easier to set up and run)
41. Topics, Logs, Partitions
❖ A Kafka topic is a stream of records
❖ Topics are stored in logs; a log is broken up into partitions and segments
❖ A topic is a category, stream name, or feed
❖ Topics are pub/sub: they can have zero or many subscribers (consumer groups)
❖ Topics are broken up into and spread by partitions for speed and size
42. Topic Partitions
❖ Topics are broken up into partitions
❖ The partition is usually decided by the key of the record: the key determines which partition the record is sent to
❖ Partitions are used to scale Kafka across many servers
❖ Partitions are used to facilitate parallel consumers: records are consumed in parallel, up to the number of partitions
❖ Order is guaranteed per partition
❖ Partitions can be replicated to multiple brokers
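The key-to-partition mapping above can be sketched in one line of Python. Note the hedge: the real Java client's default partitioner hashes the serialized key with murmur2; `zlib.crc32` here is just an illustrative stand-in for "some deterministic hash":

```python
import zlib

def partition_for(key, num_partitions):
    """Toy sketch of keyed partitioning: hash the key, take it modulo
    the partition count, so a given key always lands on the same
    partition (which is what preserves per-key ordering)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

p = partition_for("employee-42", 6)
# Same key, same partition, every time, so all of employee-42's
# events stay in order relative to each other.
assert p == partition_for("employee-42", 6)
```

Records without a key fall back to a round-robin spread across partitions instead.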
43. Topic Partition Log
❖ Order is maintained only within a single partition
❖ A partition is an ordered, immutable sequence of records that is continually appended to: a structured commit log
❖ Records in partitions are assigned a sequential ID number called the offset, which identifies each record within the partition
❖ Topic partitions allow a Kafka log to scale beyond a size that will fit on a single server: a partition must fit on the servers that host it, but a topic can span many partitions hosted on many servers
❖ Topic partitions are the unit of parallelism: a partition can only be used by one consumer in a group at a time
❖ Consumers can run in their own process or their own thread; if a consumer stops, Kafka spreads its partitions across the remaining consumers in the group
45. Replication: Kafka Partition Distribution
❖ Each partition has a leader server and zero or more follower servers
❖ The leader handles all read and write requests for the partition
❖ Followers replicate the leader, and take over if the leader dies
❖ Partitions are also used for parallel consumer handling within a group
❖ Partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions
❖ Each partition can be replicated across a configurable number of Kafka servers, for fault tolerance
46. Replication: Kafka Partition Leader
❖ One node/partition replica is chosen as the leader
❖ The leader handles all reads and writes of records for the partition
❖ Writes to the partition are replicated to followers (node/partition pairs)
❖ A follower that is in sync is called an ISR (in-sync replica)
❖ If a partition leader fails, one ISR is chosen as the new leader
47. Kafka Replication to Partition 0
(Diagram: a client producer writes a record to Partition 0's leader on Kafka Broker 0, which replicates the record to followers on Broker 1 and Broker 2; leaders shown in red, followers in blue)
A record is considered "committed" when all ISRs for the partition have written it to their log. Only committed records are readable by consumers.
48. Kafka Replication to Partition 1
(Diagram: the same write-and-replicate flow as the previous slide, showing that another partition can be owned by another leader on another Kafka broker; leaders shown in red, followers in blue)
49. Topic Review
❖ What is an ISR?
❖ How does Kafka scale consumers?
❖ What are leaders? Followers?
❖ How does Kafka perform failover for consumers?
❖ How does Kafka perform failover for brokers?
51. Kafka Producers
❖ Producers send records to topics
❖ The producer picks which partition to send each record to, per topic
❖ Can be done round-robin, can be based on priority, is typically based on the key of the record
❖ Kafka's default partitioner for Java uses a hash of the key to choose the partition, or a round-robin strategy if there is no key
❖ Important: the producer picks the partition
52. Kafka Producers and Consumers
(Diagram: Partition 0 holding offsets 0 through 11; producers are writing at offset 12 while Consumer Group A is reading from offset 9)
53. Kafka Producers
❖ Producers write at their own cadence, so the order of records cannot be guaranteed across partitions
❖ The producer configures its consistency level (acks=0, acks=1, acks=all)
❖ Producers pick the partition so that records/messages go to the same partition based on the data; for example, have all the events of a certain employeeId go to the same partition
❖ If order within a partition is not needed, a round-robin partition strategy can be used so records are evenly distributed across partitions
54. Producer Review
❖ Can producers occasionally write faster than consumers?
❖ What is the default partition strategy for producers without a key?
❖ What is the default partition strategy for producers using a key?
❖ What picks which partition a record is sent to?
55. Kafka Consumer Architecture
Load balancing consumers
Failover for consumers
Offset management per consumer group
56. Kafka Consumer Groups
❖ Consumers are grouped into a Consumer Group, which has a unique ID
❖ Each consumer group is a subscriber and maintains its own offset
❖ Multiple subscribers = multiple consumer groups, each with a different function: one might deliver records to microservices while another streams records to Hadoop
❖ A record is delivered to exactly one consumer in a consumer group: only one consumer in the group gets a given record
❖ Consumers in a consumer group load-balance record consumption
57. Kafka Consumer Load Share
❖ Kafka divides partition consumption over the consumers in a Consumer Group
❖ Each consumer is the exclusive consumer of a "fair share" of partitions; this is load balancing
❖ Consumer membership in a Consumer Group is handled dynamically by the Kafka protocol
❖ If a new consumer joins the Consumer Group, it gets a share of the partitions
❖ If a consumer dies, its partitions are split among the remaining live consumers in the Consumer Group
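The "fair share" split and the rebalance after a consumer dies can be sketched in Python. Real Kafka uses pluggable assignors (range, round-robin, sticky); this toy uses a simple round-robin and invents the consumer names:

```python
def assign(partitions, consumers):
    """Toy sketch of partition assignment: spread partitions
    round-robin over the live consumers in a group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Six partitions over two consumers: three each.
two = assign(list(range(6)), ["c0", "c1"])

# If c1 dies, a rebalance hands everything to the survivor.
one = assign(list(range(6)), ["c0"])
```

The same function also shows the cardinality rule from later slides: with more consumers than partitions, the extras end up with empty lists (idle, available for failover).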
58. Kafka Consumer Groups
(Diagram: Partition 0 holding offsets 0 through 11; producers are writing while Consumer Group A and Consumer Group B each read from their own offset)
Consumers remember the offset where they left off; consumer groups each have their own offset per partition.
59. Kafka Consumer Groups Processing
❖ How does Kafka divide up a topic so multiple consumers in a Consumer Group can process it?
❖ You group consumers into a consumer group with a group ID; consumers with the same ID belong to the same Consumer Group
❖ One Kafka broker becomes the group coordinator for the Consumer Group: it assigns partitions when new members arrive (older clients talked directly to ZooKeeper; now the broker does coordination), and reassigns partitions when group members leave or topic metadata changes
❖ When a Consumer Group is created, its offset is set according to the reset policy of the topic
60. Kafka Consumer Failover
❖ A consumer notifies the broker when it has successfully processed a record, which advances the offset
❖ If a consumer fails before sending its commit offset to the Kafka broker, a different consumer can continue from the last committed offset
❖ Some Kafka records could be reprocessed: this is at-least-once behavior, so message processing should be idempotent
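The at-least-once behavior is easy to see in a toy simulation. This is an illustrative sketch, not the consumer client API; the crash point and record names are invented:

```python
def run_consumer(records, committed_offset, crash_before_commit_at=None):
    """Toy sketch of at-least-once consumption: process a record, then
    commit its offset. A crash after processing but before committing
    means the record is replayed on restart."""
    processed = []
    for offset in range(committed_offset, len(records)):
        processed.append(records[offset])        # process the record
        if offset == crash_before_commit_at:
            return processed, committed_offset   # crashed before commit
        committed_offset = offset + 1            # commit after processing
    return processed, committed_offset

records = ["r0", "r1", "r2"]

# First run crashes after processing r1 but before committing its offset.
first, committed = run_consumer(records, 0, crash_before_commit_at=1)

# A replacement consumer resumes from the last committed offset (1),
# so r1 is processed a second time: at-least-once delivery.
second, committed = run_consumer(records, committed)
```

Since r1 is handled twice, the processing step must be idempotent (e.g., an upsert keyed by record ID) for the end result to be correct.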
61. Kafka Consumer Offsets and Recovery
❖ Kafka stores offsets in a topic called "__consumer_offsets", which uses topic log compaction
❖ When a consumer has processed data, it should commit its offsets
❖ If the consumer process dies, it will be able to start up and resume reading where it left off, based on the offset stored in "__consumer_offsets"
62. Kafka Consumer: What can be consumed?
❖ The "log end offset" is the offset of the last record written to a log partition, and where producers will write next
❖ The "high watermark" is the offset of the last record successfully replicated to all of the partition's followers
❖ A consumer only reads up to the "high watermark"; it cannot read un-replicated data
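The high watermark rule reduces to a minimum over the in-sync replicas' log end offsets, which a one-line sketch makes concrete (a toy model of the bookkeeping, not broker code; the offsets are made up):

```python
def high_watermark(leader_end_offset, follower_end_offsets):
    """Toy sketch: the high watermark is the highest offset present on
    every in-sync replica; consumers may read only up to it."""
    return min([leader_end_offset] + list(follower_end_offsets))

# The leader has written 10 records, but the slowest in-sync follower
# only has 7, so consumers can read offsets 0..6 only.
hw = high_watermark(10, [8, 7])
```

This is why a record only becomes visible to consumers once it is committed: un-replicated records sit between the high watermark and the log end offset.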
63. Consumer to Partition Cardinality
❖ Only a single consumer from the same Consumer Group can access a single partition
❖ If the consumer count in a group exceeds the partition count, the extra consumers remain idle and can be used for failover
❖ If there are more partitions than consumers in a group, some consumers will read from more than one partition
64. Two-server Kafka cluster hosting six partitions (P0-P5)
(Diagram: Server 1 holds P2, P3, P4 and Server 2 holds P0, P1, P5; Consumer Group A and Consumer Group B each have three consumers reading from the cluster)
65. Multi-threaded Consumers
❖ You can run more than one consumer in a JVM process
❖ If processing records takes a while, a single consumer can run multiple threads to process records, but it is harder to manage the offset for each thread/task
❖ With one consumer running multiple threads, two messages on the same partition could be processed by two different threads, making it hard to guarantee order without thread coordination
❖ PREFER: run multiple consumers, each processing record batches in its own thread
❖ Easier to manage offsets; each consumer runs in its own thread; easier to manage failover (each process runs X number of consumer threads)
66. Consumer Review
❖ What is a consumer group?
❖ Does each consumer group have its own offset?
❖ When can a consumer see a record?
❖ What happens if there are more consumers than partitions?
❖ What happens if you run multiple consumers in many threads in the same JVM?
67. Tutorial: Using Kafka Single Node
Run ZooKeeper, Kafka
Create a topic
Send messages from the command line
Read messages from the command line
68.
Run Kafka
❖ Run the ZooKeeper startup script
❖ Run the Kafka Server/Broker startup script
❖ Create a Kafka Topic from the command line
❖ Run a producer from the command line
❖ Run a consumer from the command line
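The steps above map to scripts shipped in the bin/ directory of a Kafka distribution. A minimal sketch, assuming a ZooKeeper-era Kafka install with the bundled configs; the topic name and ports are illustrative:

```shell
# 1. Start ZooKeeper (uses the bundled config)
bin/zookeeper-server-start.sh config/zookeeper.properties &

# 2. Start a Kafka Server/Broker
bin/kafka-server-start.sh config/server.properties &

# 3. Create a topic (topic creation goes through ZooKeeper in this era)
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic my-topic

# 4. List topics
bin/kafka-topics.sh --list --zookeeper localhost:2181

# 5. Send messages (type lines, Ctrl-C to quit)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic

# 6. Read messages from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic my-topic --from-beginning
```

These commands assume a running single-node install; on newer Kafka versions kafka-topics.sh takes --bootstrap-server instead of --zookeeper.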
69.
Run ZooKeeper
70.
Run Kafka Server
71.
Create Kafka Topic
72.
List Topics
73.
Run Kafka Producer
74.
Run Kafka Consumer
75.
Running Kafka Producer and Consumer
76.
Kafka Single Node Review
❖ Which server do you run first?
❖ What tool do you use to create a topic?
❖ What tool do you use to see topics?
❖ What tool did we use to send messages on the command line?
❖ What tool did we use to view messages in a topic?
❖ Why were the messages coming out of order?
❖ How could we get the messages to come in order from the consumer?
77.
Lab: Use Kafka to send and receive messages
Use the single-server version of Kafka.
Set up a single node.
Single ZooKeeper.
Create a topic.
Produce and consume messages from the command line.
78.
Tutorial: Using Kafka Cluster and Failover
Demonstrate a Kafka Cluster
Create a topic with replication
Show consumer failover
Show broker failover
79.
Objectives
❖ Run many Kafka Brokers
❖ Create a replicated topic
❖ Demonstrate Pub / Sub
❖ Demonstrate load balancing consumers
❖ Demonstrate consumer failover
❖ Demonstrate broker failover
80.
Running many nodes
❖ If not already running, start up ZooKeeper
❖ Shut down Kafka from the first lab
❖ Copy the server properties file for each of three brokers
❖ Modify the properties files: change the port and the Kafka log location
❖ Start up many Kafka server instances
❖ Create a Replicated Topic
❖ Use the replicated topic
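The steps above can be sketched from the command line; paths assume you are in the Kafka install directory, and the file names mirror the next slide:

```shell
# Copy the base config once per broker
cp config/server.properties config/server-0.properties
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties

# Edit each copy so broker.id, port, and log.dirs are unique (see next slide),
# then start all three brokers, one per properties file
bin/kafka-server-start.sh config/server-0.properties &
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &
```

Each broker joins the cluster by registering with the same ZooKeeper ensemble named in its properties file.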
81.
Create three new server-n.properties files
❖ Copy the existing server.properties to server-0.properties, server-1.properties, server-2.properties
❖ Change server-0.properties to use log.dirs "./logs/kafka-logs-0"
❖ Change server-1.properties to use port 9093, broker id 1, and log.dirs "./logs/kafka-logs-1"
❖ Change server-2.properties to use port 9094, broker id 2, and log.dirs "./logs/kafka-logs-2"
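As a sketch, the per-broker settings for one of the copies look like this (values taken from the slide; the rest of the file stays as copied, and newer Kafka versions use `listeners` rather than the deprecated `port` key):

```properties
# server-1.properties - the three settings that must differ per broker
broker.id=1
port=9093
log.dirs=./logs/kafka-logs-1
```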
82.
Modify server-x.properties
❖ Each has a different broker.id
❖ Each has a different log.dirs
❖ Each has a different port
83.
Create startup scripts for three Kafka servers
❖ Pass the properties files from the last step
84.
Run Servers
85.
Create Kafka replicated topic my-failsafe-topic
❖ Replication Factor is set to 3
❖ Topic name is my-failsafe-topic
❖ Partitions is 13
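The topic described above can be created with kafka-topics.sh; a sketch assuming ZooKeeper runs locally on its default port:

```shell
# Create a topic replicated across all 3 brokers, split into 13 partitions
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 3 --partitions 13 \
  --topic my-failsafe-topic
```

With replication factor 3 every partition has a leader on one broker and followers on the other two, which is what makes the broker failover demo later possible.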
86.
Start Kafka Consumer
❖ Pass a list of Kafka servers to bootstrap-server
❖ We pass two of the three
❖ Only one is needed; the consumer learns about the rest from it
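A sketch of the consumer command, assuming the brokers listen on ports 9092-9094 as configured earlier:

```shell
# Bootstrap with two of the three brokers; the consumer discovers the rest
bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092,localhost:9093 \
  --topic my-failsafe-topic --from-beginning
```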
87.
Start Kafka Producer
❖ Start the producer
❖ Pass a list of Kafka Brokers
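A sketch of the producer command, with the full broker list (ports assumed from the earlier configuration):

```shell
# Console producer: each line typed becomes a record on the topic
bin/kafka-console-producer.sh \
  --broker-list localhost:9092,localhost:9093,localhost:9094 \
  --topic my-failsafe-topic
```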
88.
Kafka: 1 consumer and 1 producer running
89.
Start a second and third consumer
❖ Acts like pub/sub
❖ Each consumer is in its own group
❖ The message goes to each
❖ How do we load share?
90.
Running consumers in the same group
❖ Modify the start consumer script
❖ Add the consumers to a group called mygroup
❖ Now they will share the load
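A sketch of the modified consumer command; setting the same group.id on each console consumer puts them in one Consumer Group so partitions (and therefore load) are divided among them:

```shell
# Run this in three terminals; all three share the group "mygroup"
# (newer console consumers also accept the shorthand --group mygroup)
bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092,localhost:9093 \
  --topic my-failsafe-topic \
  --consumer-property group.id=mygroup
```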
91.
Start up three consumers again
❖ Start up the producer and three consumers
❖ Send 7 messages
❖ Notice how the messages are spread among the 3 consumers
92.
Consumer Failover
❖ Kill one consumer
❖ Send seven more messages
❖ Load is spread to the remaining consumers
❖ Failover WORKS!
93.
Kafka Describe Topic
❖ --describe lists partitions, ISRs, and partition leadership
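A sketch of the describe command and the shape of its output (broker ids and leader assignments below are illustrative, not actual output):

```shell
# Show leader, replicas, and in-sync replicas (ISR) for each partition
bin/kafka-topics.sh --describe --zookeeper localhost:2181 \
  --topic my-failsafe-topic
# Output lines look like:
#   Topic: my-failsafe-topic  Partition: 0  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
```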
94.
Use Describe Topics
❖ Lists which broker owns (is leader of) which partition
❖ Lists Replicas and ISR (replicas that are up to date)
❖ Notice there are 13 partitions
95.
Test Broker Failover: Kill the 1st server
Use Kafka topic describe to see that a new leader was elected!
Kill the first server
96.
Show Broker Failover Worked
❖ Send two more messages from the producer
❖ Notice that the consumer gets the messages
❖ Broker Failover WORKS!
97.
Kafka Cluster Review
❖ Why did the three consumers not load share the messages at first?
❖ How did we demonstrate failover for consumers?
❖ How did we demonstrate failover for producers?
❖ What tool and option did we use to show ownership of partitions and the ISRs?
98.
Lab 2: Use Kafka with multiple nodes
Use Kafka to send and receive messages
Use a Kafka Cluster to replicate a Kafka topic log
100.
Kafka Ecosystem
❖ Kafka Streams
❖ Streams API to transform, aggregate, and process records from a stream and produce derivative streams
❖ Kafka Connect
❖ Connector API for reusable producers and consumers
❖ (e.g., a stream of changes from DynamoDB)
❖ Kafka REST Proxy
❖ Producers and Consumers over REST (HTTP)
❖ Schema Registry - manages schemas using Avro for Kafka Records
❖ Kafka MirrorMaker - replicates cluster data to another cluster
101.
Kafka REST Proxy and Kafka Schema Registry
102.
Kafka Ecosystem
103.
Kafka Stream Processing
❖ Kafka Streams for Stream Processing
❖ Kafka enables real-time processing of streams
❖ Kafka Streams supports the Stream Processor model
❖ processing, transformation, and aggregation; produces one to many output streams
❖ Example: a video player app sends events for videos watched, videos paused
❖ output a new stream of user preferences
❖ can generate new video recommendations based on recent user activity
❖ can aggregate the activity of many users to see what new videos are hot
❖ Solves hard problems: out-of-order records, aggregating/joining across streams, stateful computations, and more
104.
Kafka Connectors and Streams
[Diagram: a Kafka Cluster surrounded by Producers (Apps), Consumers (Apps), Connectors (DBs), and Streams (Apps)]
105.
Kafka Ecosystem review
❖ What is Kafka Streams?
❖ What is Kafka Connect?
❖ What is the Schema Registry?
❖ What is Kafka MirrorMaker?
❖ When might you use the Kafka REST Proxy?
106.
References
❖ Learning Apache Kafka, Second Edition, by Nishant Garg, 2015, ISBN 978-1784393090, Packt Publishing
❖ Apache Kafka Cookbook, 1st Edition, by Saurabh Minni, 2015, ISBN 978-1785882449, Packt Publishing
❖ Kafka Streams for Stream processing: A few words about how Kafka works, Serban Balamaci, 2017, Blog: Plain Ol' Java
❖ Kafka official documentation, 2017
❖ Why do we need Kafka? Quora
❖ Why is Kafka Popular? Quora
❖ Why is Kafka so Fast? Stack Overflow
❖ Kafka growth exploding (Tech Republic)