Apache Kafka is a real-time, fault-tolerant, scalable messaging system.
It is a publish-subscribe system that connects various applications - the producers and consumers of information - with the help of messages.
Apache Kafka - Messaging System Overview
1. Apache Kafka - Messaging System
Dmitry Tolpeko, EPAM Systems – September 2014
2. Kafka Overview
Kafka is a real-time, fault-tolerant, scalable messaging system.
It is a publish-subscribe system that connects various applications with the help of messages: producers and consumers of information.
Producers and consumers are independent, messages are queued, and one producer can serve multiple consumers.
Kafka was originally developed at LinkedIn.
4. Kafka Architecture
[Diagram: Producer(s) (client) → Broker(s) (server) → Consumer(s) (client), with brokers coordinated through ZooKeeper]
• Brokers act as the server part of Kafka. Brokers are peers; there is no master broker.
• Brokers can run on multiple nodes, and you can also run multiple brokers on each node. Each broker has its own IP address and port for client connections.
5. Topics
A topic is a way to handle multiple data streams (e.g., different data feeds). Each producer sends messages to, and each consumer reads messages from, a specified topic.
New topics can be created automatically when a message with a new topic name arrives, or explicitly with the --create command of the kafka-topics tool.
[Diagram: Producers 1-3 publish to Topic 1 and Topic 2 on a broker; Consumers 1 and 2 read from them]
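For illustration, here is a minimal sketch of explicit topic creation with the Java AdminClient (a client API added to Kafka well after this deck was written). The broker address localhost:9092, the topic name page-views, and the partition/replica counts are assumptions, not values from the deck.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            try (AdminClient admin = AdminClient.create(props)) {
                // Hypothetical topic: 3 partitions, replication factor 2.
                NewTopic topic = new NewTopic("page-views", 3, (short) 2);
                admin.createTopics(List.of(topic)).all().get(); // block until the topic exists
            }
        }
    }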
6. Partitions
A topic can contain one or more partitions. Each partition is stored on a single server, and multiple partitions allow the queue to scale beyond the limits of a single system.
Partitions also allow a single consumer to read messages concurrently in multiple threads. You can add new partitions dynamically.
An offset uniquely identifies a message within a partition.
[Diagram: Broker 1 holds Topic 1 / Partition 1 and Topic 2 / Partition 1; Broker 2 holds Topic 2 / Partitions 2 and 3]
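Since partitions can be added dynamically, here is a sketch of that operation with today's Java AdminClient (the deck predates this API); the topic name and the new partition count are hypothetical.

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitions;

    public class AddPartitions {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            try (AdminClient admin = AdminClient.create(props)) {
                // Grow "page-views" to 6 partitions. Existing messages stay where
                // they are; only new messages are spread over the new partitions.
                admin.createPartitions(Map.of("page-views", NewPartitions.increaseTo(6))).all().get();
            }
        }
    }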
7. Replication
Each partition is replicated for fault tolerance. A partition has one server that acts as the Leader; it handles all read-write requests. Zero or more servers act as Followers; they replicate the leader, and if it fails, one of them becomes the new Leader.
The leader uses the ZooKeeper heartbeat mechanism to indicate that it is alive.
A follower acts like a normal consumer: it pulls messages and updates its own log. Only after all followers in the ISR (in-sync replica) set have replicated a message can it be delivered to consumers. When a follower rejoins after downtime, it re-syncs first.
[Diagram: Topic 1 / Partition 1 with the Leader on Broker 1 and a Follower on Broker 2]
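To make the leader/follower layout visible, a hedged sketch that asks the Java AdminClient (again, a later API than this deck) for each partition's current Leader and ISR set; the topic and address are assumed.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class ShowReplication {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc =
                        admin.describeTopics(List.of("page-views")).all().get().get("page-views");
                // Print the leader broker and the in-sync replica set per partition.
                desc.partitions().forEach(p ->
                        System.out.printf("partition %d: leader=%s isr=%s%n",
                                p.partition(), p.leader(), p.isr()));
            }
        }
    }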
8. Consumer Groups
Consumers are organized into consumer groups.
For a single message to be consumed by multiple consumers, those consumers must belong to different consumer groups.
A consumer group is a single-consumer abstraction: consumers from one group read messages as if from a queue, and there is no message broadcast within the group. This helps balance load among consumers of the same type (fault tolerance, scalability).
The state of consumed messages is tracked by the consumers, not the brokers. Consumers store this state in ZooKeeper as an offset within each partition, per consumer group, not per consumer (!).
A consumer group name is unique within the Kafka cluster.
[Diagram: Topic 1 with Partitions 1-3, consumed by Group 1 (two consumers) and Group 2 (one consumer)]
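A minimal sketch of how group membership is declared in the modern Java consumer (which, unlike the ZooKeeper-based consumer described here, stores offsets in Kafka itself); the group names, topic, and address are assumptions.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupSemantics {
        static KafkaConsumer<String, String> consumer(String groupId) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            KafkaConsumer<String, String> c = new KafkaConsumer<>(props);
            c.subscribe(List.of("page-views"));
            return c;
        }

        public static void main(String[] args) {
            // Same group: the two consumers split the partitions (queue semantics).
            KafkaConsumer<String, String> worker1 = consumer("group-1");
            KafkaConsumer<String, String> worker2 = consumer("group-1");
            // Different group: this consumer independently sees every message (broadcast).
            KafkaConsumer<String, String> auditor = consumer("group-2");
        }
    }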
9. Order Guarantees and Delivery Semantics
Each partition can be consumed by only one consumer within a consumer group.
Kafka only provides a total order guarantee within a partition, not between different partitions of a topic. If you need a total order over all messages, you have to use a single partition, and in that case you can use only one consumer process.
Kafka provides at-least-once delivery semantics by default: messages are never lost but may be redelivered (keys can be used to handle duplicates). Kafka offers options to disable retries (so messages can be lost) if the application can tolerate that and needs higher performance.
Kafka retains all published messages - whether or not they have been consumed - for a configured period of time (2 days by default).
10. Producers
A producer can assign a key to a message; the key defines which partition the message is published to:
• Random (the default when no partitioner class or key is specified)
• Round-robin, for load balancing
• A partition function (e.g., a hash of the message key) - if the key identifies a class of messages (a source ID, for example), all messages of that class go to one partition.
A producer can optionally require an acknowledgment from the broker that the message was received (synced to the Leader, or to all followers).
Kafka can group multiple messages together and compress them.
[Diagram: Producers 1 and 2 publishing to Topic 1, Partitions 1-3]
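A minimal producer sketch using the unified Java client (the deck slightly predates it, so this illustrates the concepts rather than the exact API shown at the talk); topic, key, and broker address are assumptions. acks corresponds to the acknowledgment options above, and key-hash partitioning keeps all messages from one source in one partition, and therefore in order.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "0" = no acknowledgment, "1" = synced to the Leader, "all" = synced to all in-sync followers.
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The key is hashed to choose the partition, so every message
                // keyed "source-42" lands in the same partition, in send order.
                producer.send(new ProducerRecord<>("page-views", "source-42", "payload"));
            }
        }
    }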
11. Consumers
Consumers read messages from the brokers that lead the partitions (pull method). A consumer labels itself with a consumer group.
If the number of consumers in a consumer group is greater than the number of partitions, some consumers will never see a message. If there are more partitions than consumers in a group, a consumer can receive messages from multiple partitions (with no order guarantee across them); when you then add consumers, Kafka re-balances the partitions among them.
A compressed group of messages can be delivered to a consumer as a single message.
[Diagram: Partitions 1-3 consumed by Consumers 1-4 of Group 1]
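A hedged sketch of the pull loop with the modern Java consumer; the topic, group, and address are assumptions.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class PullLoop {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "group-1"); // the consumer labels itself with a group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("page-views")); // partitions are assigned by re-balancing
                while (true) {
                    // Pull: fetch from the brokers leading the assigned partitions.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                r.partition(), r.offset(), r.value());
                    }
                }
            }
        }
    }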
12. Consumer Advanced Features
There are High Level and Simple Consumer APIs.
A High Level Consumer sets the auto.commit.interval.ms option, which defines how often the consumed offset is updated in ZooKeeper. If an error occurs between two updates, the consumer will see replayed messages (!)
The Simple Consumer is a low-level API that lets you set any offset, explicitly read messages multiple times, or ensure that a message is processed only once.
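The High Level and Simple Consumer APIs above are the original Scala clients. In today's unified Java consumer the same control is obtained by disabling auto-commit and using assign/seek/commitSync; a sketch under that assumption, with a hypothetical topic, partition, and starting offset.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ManualOffsets {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "replayer");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // we manage offsets ourselves
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            TopicPartition tp = new TopicPartition("page-views", 0);
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.assign(List.of(tp));
                consumer.seek(tp, 100L); // jump to an arbitrary offset, e.g. to re-read messages
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.println(r.value());
                    consumer.commitSync(); // commit only after processing: a crash replays, never skips
                }
            }
        }
    }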
14. Persistence
Kafka relies heavily on the OS disk cache, not the JVM heap, even for caching messages. Data is immediately written (appended) to a file, and consumed messages are not deleted.
Data files (called logs) are stored under log.dirs. A directory exists for each topic partition and contains log segments (files such as 0000000.log, named after the offset of the first message in the segment). log.segment.bytes and log.roll.hours define the rotation policy, and the log.flush.interval.* options define how often fsync is performed on the files.
All options can be specified either globally or per topic.
[Diagram: the broker JVM writes through the OS page cache to /data/kafka-logs/TopicName-0/00000.log]
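As a sketch of a per-topic override with today's AdminClient (the deck predates it): the topic-level counterpart of the broker option log.segment.bytes is segment.bytes. The topic name, address, and segment size are assumptions.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class PerTopicConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "page-views");
                // Roll a new log segment for this topic every 512 MB (hypothetical value),
                // overriding the broker-wide default from log.segment.bytes.
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("segment.bytes", String.valueOf(512 * 1024 * 1024)),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
            }
        }
    }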
15. Network I/O
Messages can be grouped together to minimize the number of network round-trips. Multiple messages can also be compressed together (GZIP, Snappy), which helps achieve a good compression ratio and reduces the amount of data sent over the network.
A producer can specify the compression.codec and compressed.topics options.
[Diagram: Messages 1-3 are compressed into one batch before being sent over the network]
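In the modern Java producer the equivalent settings are compression.type plus the batching knobs batch.size and linger.ms; a sketch under that assumption, with hypothetical values.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // compress whole batches together
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);      // group up to 64 KB per partition
            props.put(ProducerConfig.LINGER_MS_CONFIG, 20);              // wait up to 20 ms to fill a batch

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 1000; i++) {
                    // These sends accumulate into compressed batches, cutting round-trips.
                    producer.send(new ProducerRecord<>("page-views", "key-" + i, "value-" + i));
                }
            }
        }
    }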
16. Memory
There is no in-memory application-level cache; the data lives in the OS page cache.
Kafka uses the sendfile Linux API call, which sends data directly from the page cache to a network socket, so there is no need to copy data through application memory.
Grouped messages are stored compressed in the log and decompressed only by the consumers.
[Diagram: the broker JVM serves data from the OS page cache straight to the network]
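The same zero-copy mechanism is reachable from Java, where FileChannel.transferTo maps to sendfile(2) on Linux. This is an illustration of the idea, not Kafka's actual code; the log path and socket address are hypothetical.

    import java.io.FileInputStream;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class ZeroCopySend {
        public static void main(String[] args) throws Exception {
            try (FileChannel log = new FileInputStream("/data/kafka-logs/TopicName-0/00000.log").getChannel();
                 SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
                long pos = 0, size = log.size();
                while (pos < size) {
                    // transferTo maps to sendfile(2): bytes flow from the OS page
                    // cache to the socket without entering JVM heap buffers.
                    pos += log.transferTo(pos, size - pos, socket);
                }
            }
        }
    }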
17. Log Compaction
Without log compaction (time-series data), the log keeps every update for every key:
  (Key1, A) (Key2, B) (Key3, C) (Key1, AA) (Key2, BB) (Key1, AAA) (Key3, CC)
With log compaction, only the last update is stored for each key:
  (Key2, BB) (Key1, AAA) (Key3, CC)
Log compaction can be defined per topic. This can help increase the performance of roll-forward operations, and reduce storage.
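A sketch of enabling compaction for a topic with today's AdminClient (the deck predates it); the topic-level config cleanup.policy=compact is what switches a topic from time-based retention to compaction. Names and sizing are hypothetical.

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
            try (AdminClient admin = AdminClient.create(props)) {
                // Keep only the latest value per key instead of deleting whole segments by age.
                NewTopic latestState = new NewTopic("user-profiles", 3, (short) 2)
                        .configs(Map.of("cleanup.policy", "compact"));
                admin.createTopics(List.of(latestState)).all().get();
            }
        }
    }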
18. Kafka Use Cases
• Messaging - decouple processing, or buffer messages
• Monitoring and Tracking - collect activity, clickstream, status data, and logs from various systems
• Stream Processing - aggregate, enrich, handle micro-batches, etc.
• Commit Log - facilitate replication between systems
19. Thanks!
Join us at https://www.linkedin.com/groups/Belarus-Hadoop-User-Group-BHUG-8104884
dmitry_tolpeko@epam.com