Unlock the potential of real-time data streaming with Kafka in this session. Learn the fundamentals, architecture, and seamless integration with Scala, empowering you to elevate your data processing capabilities. Perfect for developers at all levels, this hands-on experience will equip you to harness the power of real-time data streams effectively.
2. Lack of etiquette and manners is a huge turn-off.
KnolX Etiquette
Punctuality
Join the session 5 minutes prior to the session start time. We start on
time and conclude on time!
Feedback
Make sure to submit constructive feedback for all sessions, as it is very
helpful for the presenter.
Silent Mode
Keep your mobile devices in silent mode. Feel free to step out of the session
if you need to attend an urgent call.
Avoid Disturbance
Avoid unwanted chit chat during the session.
3. I. Introduction
II. Features of Kafka
III. Kafka Components
IV. Kafka Architecture
V. Topic Replication
VI. Kafka Advantages and Disadvantages
VII. Kafka Use Cases
VIII. Demo
5. Examples
Simple way
Steps -
1. Take the current location from the driver.
2. Insert it into the DB.
3. Send the current location to the customer.
Kafka way
Steps -
1. Take the current location from the driver.
2. Publish it to Kafka for processing.
3. Consumers consume the data.
4. Bulk insert into the DB.
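The difference between the two approaches above can be sketched with a small simulation. This is a toy model only, not Kafka client code: `FakeDb`, `simple_way`, `kafka_way`, and the batch size of 100 are hypothetical names and values chosen for illustration, and a plain in-memory queue stands in for a Kafka topic.

```python
from collections import deque


class FakeDb:
    """Hypothetical in-memory stand-in for a database; counts write calls."""

    def __init__(self):
        self.rows = []
        self.write_calls = 0

    def insert(self, row):
        self.rows.append(row)
        self.write_calls += 1

    def bulk_insert(self, rows):
        self.rows.extend(rows)
        self.write_calls += 1


def simple_way(locations, db):
    """Simple way: one database write per location update."""
    for loc in locations:
        db.insert(loc)


def kafka_way(locations, db, batch_size=100):
    """Kafka way (simulated): buffer updates in a queue, then let a
    consumer drain the queue and write to the database in batches."""
    topic = deque(locations)              # producer side: publish every update
    while topic:                          # consumer side: drain in batches
        n = min(batch_size, len(topic))
        batch = [topic.popleft() for _ in range(n)]
        db.bulk_insert(batch)


# 1000 made-up location updates from a single driver.
locations = [("driver-1", 12.97 + i * 1e-4, 77.59) for i in range(1000)]

direct_db, batched_db = FakeDb(), FakeDb()
simple_way(locations, direct_db)
kafka_way(locations, batched_db, batch_size=100)

print(direct_db.write_calls, batched_db.write_calls)  # prints "1000 10"
```

Both databases end up with the same rows, but the batched path issues far fewer write calls, which is the motivation for step 4 of the Kafka way.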
6. Introduction
• Apache Kafka is an open-source platform for
real-time data handling that supports low-latency,
high-volume data relaying tasks.
• It was originally developed at LinkedIn.
• It works as a broker between two parties: the
producers that send data and the consumers that
receive it.
• Apache Kafka is a distributed streaming platform.
• It is a publish-subscribe messaging system
that lets applications, servers, and processors
exchange data.
7. Distributed System
• A distributed system is simply any environment
where multiple computers or devices are working
on a variety of tasks and components, all spread
across a network.
Messaging System
• A messaging system is a type of software
architecture that facilitates communication and the
exchange of data between different applications
or components in a distributed environment.
9. Features of Kafka
• Real-time data processing: Kafka enables the ingestion, processing, and consumption of data
in real time, making it suitable for applications that require low-latency and high-throughput data
processing.
• Fault Tolerance: Kafka ensures data durability and fault tolerance through data replication.
• Decoupled systems: Kafka provides a decoupled architecture, allowing producers of data to be
independent of consumers.
11. ZooKeeper
ZooKeeper is an open-source distributed coordination service that acts as a centralized repository
for configuration information, naming, synchronization, and group services in distributed systems.
Kafka Cluster
A Kafka cluster is a group of interconnected Kafka brokers working together to provide a distributed
and fault-tolerant messaging system.
12. Producer
• A producer is a component or application responsible for publishing messages to Kafka topics.
• The primary role of a Kafka producer is to send data or events to the Kafka cluster, where
the messages are then distributed across partitions within the specified topics.
Consumer
• A consumer is a component that subscribes to and processes streams of records/messages from
one or more topics in a Kafka cluster.
• Consumers are typically organized into consumer groups. Each message in a topic is sent to one
consumer within each subscribing consumer group. This allows for parallel processing of messages
and load balancing across consumers.
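The consumer-group rule above (every group sees each message, but only one consumer inside the group receives it) can be illustrated with a toy model. This is a sketch under stated assumptions, not the Kafka client API: `assign_partitions` and `deliver` are hypothetical helpers, the group and consumer names are made up, and the round-robin assignment is a simplification of the assignment strategies Kafka actually offers.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group.
    Simplified on purpose; real Kafka has several assignment strategies."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def deliver(partitions, groups):
    """Return {group: {consumer: [messages]}}.
    Each group receives every message; within a group, a partition
    (and therefore each of its messages) goes to exactly one consumer."""
    result = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(len(partitions), consumers)
        result[group] = {
            consumer: [msg for p in owned for msg in partitions[p]]
            for consumer, owned in assignment.items()
        }
    return result


# A topic with four partitions, each holding a few messages.
partitions = [["a0", "a1"], ["b0"], ["c0", "c1"], ["d0"]]

# Two subscribing groups: one with two consumers, one with a single consumer.
groups = {"billing": ["billing-1", "billing-2"], "audit": ["audit-1"]}

received = deliver(partitions, groups)
for group, per_consumer in received.items():
    print(group, per_consumer)
```

In this model the two "billing" consumers split the partitions between them, while the single "audit" consumer reads all of them, so adding consumers to a group spreads the load without duplicating messages inside that group.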
13. Topics
• A topic is a category or common name used to store and publish a particular stream of data.
• We can create as many topics as we need. Each topic is identified by its name, which is the
user's choice.
• A producer publishes data to a topic, and a consumer reads that data from the topic by
subscribing to it.
Partitions
• A topic is split into several parts, which are known as the partitions of the topic.
• Each message is stored in a partition with an incremental id known as its offset. The
order of offsets is guaranteed only within a partition, not across partitions.
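A minimal sketch of how partitions and offsets relate, assuming a toy in-memory `Topic` class (hypothetical, not part of any Kafka library). The key-to-partition mapping uses `zlib.crc32` purely as a stand-in hash; Kafka's own default partitioner uses a different hash function.

```python
import zlib


class Topic:
    """Toy topic: a list of partitions, each an append-only list of messages."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        """Append the value to the partition chosen by the key.
        Returns (partition index, offset of the new message)."""
        # crc32 is an illustrative stand-in for the real partitioner hash.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1


topic = Topic("driver-locations", num_partitions=3)

# Messages with the same key land in the same partition, so their
# offsets increase one by one and their relative order is preserved.
records = [topic.publish("driver-1", f"loc-{i}") for i in range(3)]
same_partition = {p for p, _ in records}
offsets = [o for _, o in records]

print(same_partition, offsets)
```

Offsets restart at 0 in every partition, which is why ordering is only meaningful within a single partition.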
14. Brokers
• A Kafka cluster is made up of one or more servers, which are known as brokers or Kafka
brokers.
• A broker hosts partitions belonging to several topics.
• Kafka brokers are also known as bootstrap brokers, because a client that connects to any one
broker can discover and reach the entire cluster.
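The bootstrap idea can be sketched as follows: whichever broker a client contacts first, it gets back a description of the whole cluster. This is a toy model with hypothetical `Cluster` and `Broker` classes and made-up partition names, not the real Kafka metadata protocol.

```python
class Cluster:
    """Toy cluster: a shared registry of brokers."""

    def __init__(self):
        self.brokers = {}

    def add_broker(self, broker_id, partitions):
        broker = Broker(broker_id, partitions, self)
        self.brokers[broker_id] = broker
        return broker


class Broker:
    def __init__(self, broker_id, partitions, cluster):
        self.broker_id = broker_id
        self.partitions = partitions      # partitions hosted by this broker
        self._cluster = cluster

    def metadata(self):
        """Any broker can describe every broker in the cluster."""
        return {b.broker_id: list(b.partitions)
                for b in self._cluster.brokers.values()}


cluster = Cluster()
b1 = cluster.add_broker(1, ["orders-0"])
b2 = cluster.add_broker(2, ["orders-1"])
b3 = cluster.add_broker(3, ["orders-2"])

# A client may start from any broker and still learn about all of them.
view_from_b1 = b1.metadata()
view_from_b3 = b3.metadata()
print(view_from_b1 == view_from_b3)  # prints "True"
```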
16. Topic Replication
• Data is partitioned in Kafka.
• Each partition is replicated across multiple brokers.
• Data is not lost if a broker goes down, as replicas can take over.
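The replication points above can be illustrated with a small failover simulation. It is a sketch under simplifying assumptions: `ReplicatedPartition` is a hypothetical class, every live replica is written synchronously (real followers fetch from the leader), and the new leader is simply the lowest-numbered surviving broker.

```python
class ReplicatedPartition:
    """Toy partition whose log is copied to several brokers."""

    def __init__(self, replica_brokers):
        self.logs = {b: [] for b in replica_brokers}
        self.alive = set(replica_brokers)
        self.leader = replica_brokers[0]   # first broker starts as leader

    def append(self, message):
        """Write the message to every live replica (a simplification)."""
        for broker in self.alive:
            self.logs[broker].append(message)

    def fail(self, broker):
        """Mark a broker as down; elect a new leader if it was leading."""
        self.alive.discard(broker)
        if broker == self.leader:
            if not self.alive:
                raise RuntimeError("no replica left to take over")
            self.leader = min(self.alive)  # illustrative election rule

    def read(self):
        """Reads are served from the current leader's copy of the log."""
        return list(self.logs[self.leader])


partition = ReplicatedPartition([1, 2, 3])
partition.append("m1")
partition.append("m2")

partition.fail(1)                 # the leader goes down
after_failover = partition.read() # data is still available from broker 2

partition.append("m3")            # writes continue on the survivors
print(partition.leader, after_failover, partition.read())
```

After broker 1 fails, broker 2 takes over with a full copy of the log, so no previously written message is lost.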
20. Kafka Advantages
• High-throughput: With its efficient storage and retrieval mechanisms, Kafka can handle high-throughput
data streams.
• Fault-Tolerant: Kafka ensures data durability even during hardware failures, thanks to its replication
mechanism.
• Seamless Scalability: Apache Kafka’s architecture allows horizontal scaling, making it suitable for handling
large amounts of data without compromising performance.
• Real-time Data Processing: Kafka’s ability to process and transmit data in real time enables timely
decision-making and analytics.
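• Durability: Messages are persisted to disk and replicated across brokers.
• Distributed: Data and load are spread across the brokers of a cluster.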
21. Kafka Disadvantages
• Complexity: Due to its distributed nature, setting up and configuring Kafka can be complex, especially
for beginners.
• Resource Intensive: Kafka’s replication and storage mechanisms can be resource-intensive, requiring
sufficient hardware resources.
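• ZooKeeper Dependency: Kafka has traditionally relied on Apache ZooKeeper for distributed coordination and
configuration management. Managing ZooKeeper adds an additional layer of complexity to the overall
system. Newer Kafka releases can instead run in KRaft mode, which removes this dependency.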
23. Kafka Use Cases
• Activity tracking: This was the original use case for Kafka. LinkedIn needed to rebuild its user
activity tracking pipeline as a set of real-time publish-subscribe feeds. Activity tracking is often
very high volume, as each user page view generates many activity messages (events).
• Real-time data processing: Many systems require data to be processed as soon as it becomes
available. Kafka transmits data from producers to consumers with very low latency (5
milliseconds, for instance). This is useful for:
o Financial organizations
o Predictive maintenance
o Autonomous mobile devices,