Apache Kafka is an open-source message broker project developed by the Apache Software Foundation and written in Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Kafka's basic terminology, architecture, and protocol, and how it works.
Kafka at scale: its caveats, guarantees, and use cases.
How we use it @ZaprMediaLabs.
3. Messaging Systems
• Asynchronous communication between systems
• Some use cases
  • Web application: respond quickly to the client and handle heavy processing tasks asynchronously
  • Balance load between workers
  • Decouple processing from data producers
• Models
  • Queuing: a pool of consumers reads from a server and each message goes to one of them
  • Publish-subscribe: each message is broadcast to all consumers
[Diagram: Producer -> Messaging System -> Consumer]
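The two models can be contrasted with a minimal in-memory sketch. It does not use any real messaging library; the function names `deliver_queue` and `deliver_pubsub` and the round-robin choice of consumer are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import cycle


def deliver_queue(messages, consumers):
    """Queuing: each message goes to exactly one consumer (round-robin here)."""
    inbox = defaultdict(list)
    # zip stops when the messages run out; cycle repeats the consumers forever.
    for message, consumer in zip(messages, cycle(consumers)):
        inbox[consumer].append(message)
    return dict(inbox)


def deliver_pubsub(messages, consumers):
    """Publish-subscribe: every consumer receives every message."""
    return {consumer: list(messages) for consumer in consumers}


messages = ["m1", "m2", "m3", "m4"]
queue_result = deliver_queue(messages, ["worker-a", "worker-b"])
pubsub_result = deliver_pubsub(messages, ["worker-a", "worker-b"])

print(queue_result)   # each worker holds a disjoint half of the messages
print(pubsub_result)  # each worker holds all four messages
```

In the queuing case the work is split between the workers; in the publish-subscribe case each worker sees the full stream.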
4. Kafka
• Kafka is an open-source message broker project
• Distributed, replicated, scalable, and durable, with high throughput
• Aim: to be a “central nervous system for data”
• The design is heavily influenced by transaction logs
• Built at LinkedIn with a specific purpose in mind: to serve as a central repository of data streams
7. Kafka
• With Kafka in place, LinkedIn's numbers as of March 2015:
  • 800 billion messages produced per day, almost 175 TB of data
  • 1,100 Kafka brokers organized in 60 clusters
• As of September 2015: around 1.1 trillion messages a day
• Written in Scala, open-sourced in 2011 under the Apache Software Foundation
• Apache top-level project since 2012
8. Kafka Terminology
Kafka broker
• Designed for high availability (HA): there are no master nodes, and all nodes are interchangeable.
• Data is replicated.
• Messages are stored for a configurable period of time.
Topic
• A topic is a category or feed name to which messages are published.
• Topics are partitioned.
Log
• Append-only
• Totally ordered sequence of records, ordered by time
• Records what happened and when
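As a rough illustration of an append-only, time-ordered log with a configurable retention period, here is a small sketch. The class `AppendOnlyLog`, its methods, and the explicit integer timestamps are hypothetical; a real broker manages retention differently, so treat this only as a model of the idea that records expire by age and not by being consumed.

```python
class AppendOnlyLog:
    """Toy log: records are only ever added at the end, in time order."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []  # list of (timestamp, value), oldest first

    def append(self, timestamp, value):
        # Reject out-of-order writes so the log stays ordered by time.
        if self.records and timestamp < self.records[-1][0]:
            raise ValueError("records must be appended in time order")
        self.records.append((timestamp, value))

    def expire(self, now):
        # Drop records older than the retention period,
        # whether or not anyone has read them.
        cutoff = now - self.retention_seconds
        self.records = [(ts, v) for ts, v in self.records if ts >= cutoff]

    def read_all(self):
        return [value for _, value in self.records]


log = AppendOnlyLog(retention_seconds=60)
log.append(0, "user signed up")
log.append(30, "user placed order")
log.append(90, "order shipped")
before_expiry = log.read_all()

log.expire(now=100)  # cutoff is 40, so the first two records age out
after_expiry = log.read_all()
print(before_expiry, after_expiry)
```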
9. Kafka Terminology (cont.)
• Partitions
  • Each partition is an ordered, immutable sequence of messages that is continually appended to: a commit log.
  • Each message in the partition is assigned a unique sequential ID, its offset.
  • More partitions allow greater parallelism for consumption.
  • Partitions allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic can handle an arbitrary amount of data.
  • The number of partitions determines how many consumers in a group can work in parallel.
  • Each partition has one server that acts as the "leader" and zero or more servers that act as "followers".
  • The leader handles all read and write requests for the partition.
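The relationship between keys, partitions, and offsets can be sketched as follows. This is an in-memory model, not Kafka client code: `PartitionedTopic` is a hypothetical class, and the CRC32 hash is an assumption chosen because it is stable across runs (Kafka's own partitioner uses a different hash function).

```python
import zlib


class PartitionedTopic:
    """Toy topic: a fixed number of independent append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same hash -> same partition, every time.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1  # position within the partition
        return partition, offset

    def read(self, partition, offset):
        # Reading never removes anything; a reader just picks a starting offset.
        return self.partitions[partition][offset:]


topic = PartitionedTopic(num_partitions=3)
p1, o1 = topic.append("pune", "order-1")
p2, o2 = topic.append("pune", "order-2")
print(p1 == p2, o1, o2)      # same partition, consecutive offsets
print(topic.read(p1, o1))    # both records, in the order they were appended
```

Ordering is only meaningful inside one partition, which is why offsets are per partition and not per topic.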
10. Kafka Terminology (cont.)
Producers
• Send messages to topics synchronously or asynchronously
• They decide:
  • where each message goes: an explicit partition, a key, neither, or a custom Partitioner class
  • what sort of replication guarantees they want (acks setting)
  • batching and compression
Consumers and Consumer Groups
• Consumers label themselves with a consumer group name and subscribe to one or more topics
• Consumers pull messages
• They control the offset they read from, so they can re-read messages without extra overhead on the broker
• Each consumer in a consumer group reads messages from a unique subset of partitions in each topic it subscribes to, so each message is delivered to one consumer in the group, and all messages with the same key arrive at the same consumer
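How a group divides partitions among its members can be sketched with a simple round-robin assignment. The function `assign_partitions` is hypothetical and is not the assignment strategy of any particular Kafka client; it only illustrates the invariant described above, that every partition has exactly one owner within a group.

```python
def assign_partitions(partitions, consumers):
    """Give every partition exactly one owner within the consumer group."""
    ordered_consumers = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered_consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered_consumers[index % len(ordered_consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

two_consumers = assign_partitions(partitions, ["c1", "c2"])
five_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])

print(two_consumers)   # each consumer owns two partitions
print(five_consumers)  # only four consumers get work; the fifth is idle
```

The second call shows why the partition count caps the useful size of a group: with four partitions, a fifth consumer has nothing to read.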
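11. Kafka Terminology – Consumer Groups
[Diagram: queue model vs. publish-subscribe model. First panel: a topic read by consumers C1 to C4 in ConsGroup1 and ConsGroup2, each consumer receiving either m1 or m2. Second panel: a topic read by C1 and C2 in ConsGroup1 and ConsGroup2, each consumer receiving m1, m2.]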
12. ZooKeeper
• ZooKeeper is a fast, highly available, fault-tolerant, distributed coordination service
  • helps with distributed synchronization
  • maintains configuration information
• Replicated: like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.
• Role in the Kafka architecture
  • Coordinate cluster information
  • Store cluster metadata
  • Store consumer offsets (the default for consumers in releases before 0.9)
13. Differences with RabbitMQ
Feature | Kafka | JMS message broker; RabbitMQ
Dequeuing | The cluster retains all published messages, whether or not they have been consumed, for a configurable period of time. | When the consumer acknowledges
Consumer metadata | The only metadata retained on a per-consumer basis is the offset. | Consumer acknowledgments per message
Ordering | Strong ordering within a partition | Ordering of the messages is lost in the presence of parallel consumption. The “exclusive consumer” workaround sacrifices parallelism.
Batching / streaming | Available for both producer and consumer; supports online and offline consumers | Consumers are mostly online
Scalability | Client-centric | Broker-centric
Complex routing | Needs to be programmed | Lots of options available with less work
Monitoring UI | Needs work | Decent web UI available
14. Common Use Cases
• Messaging
• Website activity tracking
  • The original use case for Kafka; often very high volume
  • Site activity (page views, searches, etc.) is published to central topics, which different consumers subscribe to for various use cases: real-time processing, monitoring, and loading into Hadoop or offline systems for processing and reporting
• Log aggregation
• Stream processing
  • Collect data from various sources
  • Aggregate the data as soon as it arrives
  • Feed it to systems such as Hadoop, a database, or other clients
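The stream processing steps above (collect, aggregate on arrival, feed downstream) can be sketched in a few lines. The event shape with a `page` field and the function name `aggregate_page_views` are assumptions for illustration; the sketch reads from a plain Python list where a real pipeline would read from topics.

```python
from collections import Counter


def aggregate_page_views(events):
    """Yield an updated per-page count after each event arrives."""
    counts = Counter()
    for event in events:
        counts[event["page"]] += 1
        # A snapshot is emitted immediately, without waiting for a batch.
        yield dict(counts)


# Events collected from two hypothetical sources, already merged in arrival order.
events = [
    {"source": "web", "page": "/home"},
    {"source": "mobile", "page": "/search"},
    {"source": "web", "page": "/home"},
]

snapshots = list(aggregate_page_views(events))
for snapshot in snapshots:
    print(snapshot)  # each snapshot could be handed to a downstream system
```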
15. Kafka 0.9 Features
• Security
  • Authenticate users using either Kerberos or TLS client certificates
  • Unix-like permission system to control which user can access which data
  • Encryption
• Kafka Connect
• User-defined quotas
• New consumer
  • New Java client
  • Group management facility
  • Faster rebalancing
  • Fully decouples clients from ZooKeeper
16. Bootstrapping
Bootstrapping for producers
1. Cycle through a list of "bootstrap" Kafka URLs until we find one we can connect to. Fetch cluster metadata.
2. Process fetch or produce requests, directing them to the appropriate broker based on the topic/partitions they send to or fetch from.
3. If we get an appropriate error, refresh the metadata and try again.
Bootstrapping for consumers
1. On startup or on coordinator failover, the consumer sends a ConsumerMetadataRequest to any of the brokers in the bootstrap.brokers list and receives the location of the coordinator for its group.
2. The consumer connects to the coordinator and sends a HeartbeatRequest.
3. If no error is returned in the HeartbeatResponse, the consumer continues fetching data for the list of partitions it last owned, without interruption.
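The producer-side steps above can be sketched against a simulated cluster. Nothing here talks to a real broker: `FakeCluster`, `FakeBroker`, `Client`, and the exception class are hypothetical stand-ins, and the "appropriate error" is modeled as a single not-leader error that triggers a metadata refresh.

```python
class NotLeaderForPartition(Exception):
    """Simulated error for a request sent to a broker that is not the leader."""


class FakeCluster:
    def __init__(self, leaders):
        self.leaders = leaders  # (topic, partition) -> name of the leader broker
        self.brokers = {}       # broker name -> FakeBroker


class FakeBroker:
    def __init__(self, name, cluster, up=True):
        self.name, self.cluster, self.up = name, cluster, up
        self.log = []
        cluster.brokers[name] = self

    def metadata(self):
        if not self.up:
            raise ConnectionError(self.name)
        return dict(self.cluster.leaders)  # copy, so clients can hold stale views

    def produce(self, topic_partition, message):
        if self.cluster.leaders[topic_partition] != self.name:
            raise NotLeaderForPartition(topic_partition)
        self.log.append((topic_partition, message))


class Client:
    def __init__(self, cluster, bootstrap):
        self.cluster, self.bootstrap, self.leaders = cluster, bootstrap, {}

    def refresh_metadata(self):
        # Step 1: cycle through the bootstrap list until one broker answers.
        for name in self.bootstrap:
            try:
                self.leaders = self.cluster.brokers[name].metadata()
                return name
            except ConnectionError:
                continue
        raise ConnectionError("no bootstrap broker reachable")

    def send(self, topic_partition, message, retries=1):
        if not self.leaders:
            self.refresh_metadata()
        for _ in range(retries + 1):
            # Step 2: direct the request to the broker we believe is the leader.
            leader = self.cluster.brokers[self.leaders[topic_partition]]
            try:
                leader.produce(topic_partition, message)
                return leader.name
            except NotLeaderForPartition:
                # Step 3: our view is stale, so refresh and try again.
                self.refresh_metadata()
        raise NotLeaderForPartition(topic_partition)


tp = ("orders", 0)
cluster = FakeCluster({tp: "b2"})
FakeBroker("b1", cluster, up=False)
FakeBroker("b2", cluster)
FakeBroker("b3", cluster)

client = Client(cluster, bootstrap=["b1", "b2"])
contacted = client.refresh_metadata()  # b1 is down, so b2 answers
first = client.send(tp, "order-1")     # routed to the current leader
cluster.leaders[tp] = "b3"             # leadership moves; the client's view is now stale
second = client.send(tp, "order-2")    # error, refresh, retry against the new leader
print(contacted, first, second)
```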
18. Sample Application
• E-shopping system, simplified scenario
  • Supports shipping in two cities
  • Once an order is placed, we need to handle payment and shipping
  • The shipping system is more efficient if requests are grouped by city
  • See the simple architecture diagram on the next slide and check out the code
In the demo application, we will cover:
• ZooKeeper config
• Broker config
• Start two brokers
• Create a topic and describe / list
• Producer config
• Message delivery semantics
• Consumer config
• Consumer rebalancing
• Sample application code: https://github.com/teamclairvoyant/meetup-docs/tree/master/Meetup-Kafka
24. RabbitMQ
• Proven message broker that uses the Advanced Message Queuing Protocol (AMQP) for messaging.
• Message flow and concepts in RabbitMQ
  • The producer publishes a message
  • The exchange receives the message and routes it into the queues
  • Routing can be based on different message attributes, such as the routing key, depending on the exchange type
  • A binding is a link between an exchange and a queue
  • Messages stay in the queue until they are handled by a consumer
  • The consumer handles the message
  • Channel: a virtual connection inside a connection. Publishing messages, consuming messages, and subscribing to a queue are all done over a channel
25. RabbitMQ (cont.)
• Types of exchange
  • Direct: delivers messages to queues based on the message routing key: the queue's binding key == the routing key of the message
  • Fanout: routes messages to all of the queues that are bound to it.
  • Topic: does a wildcard match between the routing key and the routing pattern specified in the binding.
  • Headers: uses the message header attributes for routing.
• CloudAMQP
  • Hosted RabbitMQ solution: just sign up for an account and create an instance. You do not need to set up and install RabbitMQ or care about cluster handling
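The four exchange types can be modeled as pure routing functions. This sketch does not use a RabbitMQ client; `route` and `topic_matches` are hypothetical helpers. The topic matcher follows the AMQP wildcard convention, where `*` stands for exactly one dot-separated word and `#` for zero or more words, and the headers case assumes all bound header values must match.

```python
def topic_matches(pattern, routing_key):
    """Match a dot-separated routing key against a pattern with * and #."""
    p, k = pattern.split("."), routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j < len(k) and p[i] in ("*", k[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def route(exchange_type, bindings, routing_key="", headers=None):
    """bindings: list of (queue, binding key) or, for headers, (queue, header dict)."""
    headers = headers or {}
    if exchange_type == "direct":
        matched = [q for q, key in bindings if key == routing_key]
    elif exchange_type == "fanout":
        matched = [q for q, _ in bindings]
    elif exchange_type == "topic":
        matched = [q for q, pattern in bindings if topic_matches(pattern, routing_key)]
    elif exchange_type == "headers":
        matched = [q for q, wanted in bindings
                   if all(headers.get(name) == value for name, value in wanted.items())]
    else:
        raise ValueError(f"unknown exchange type: {exchange_type}")
    return sorted(set(matched))


simple = [("payments", "order.paid"), ("shipping", "order.shipped")]
direct_result = route("direct", simple, "order.paid")
fanout_result = route("fanout", simple, "ignored")
topic_result = route("topic", [("all_orders", "order.#"), ("pune", "order.*.pune")],
                     "order.shipped.pune")
headers_result = route("headers", [("eu_queue", {"region": "eu"})],
                       headers={"region": "eu", "kind": "invoice"})
print(direct_result, fanout_result, topic_result, headers_result)
```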
26. RabbitMQ (cont.)
• Management and monitoring
  • Nice web UI for management and monitoring of your RabbitMQ server.
  • Lets you create, delete, and list queues, monitor queue length, check message rates, change and add user permissions, etc.
27. Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0
• 0.9.0.0 has potential breaking changes (please review before upgrading) and an inter-broker protocol change from previous versions.
• Java 1.6 and Scala 2.9 are no longer supported.
• http://kafka.apache.org/documentation.html
• Kafka consumers in earlier releases store their offsets in ZooKeeper by default. These consumers can be migrated to commit offsets into Kafka by following a few migration steps.
28. Kafka Terminology (cont.)
• Protocol
  • Requests to publish or fetch data must be sent to the broker that is currently acting as the leader for a given partition. This condition is enforced by the broker, so a request for a particular partition sent to the wrong broker will result in the NotLeaderForPartition error code.
  • All Kafka brokers can answer a metadata request that describes the current state of the cluster:
    • what topics there are
    • which partitions those topics have
    • which broker is the leader for those partitions
    • the host and port information for these brokers
  • Good explanation: https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
29. Kafka Adoption
Apache Kafka has become a popular messaging system in a short period of time. Organizations using it in production systems include, among others:
• LinkedIn
• Tumblr
• PayPal
• Cisco
• Box
• Airbnb
• Netflix
• Square
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Yahoo
• Twitter