KAFKA
KAFKA In and Out
@Mohammed Fazuluddin
TOPICS
• Kafka overview
• Kafka advantages
• Kafka Real-time use cases.
• Kafka useful links for refence/help
KAFKA OVERVIEW
KAFKA OVERVIEW
• Kafka is apaches another great invention, It is an open source messaging system.
• Kafka works in combination with Apache Storm, Apache HBase and Apache Spark
for real-time analysis.
• Kafka can message geospatial data from a fleet of long-haul trucks or sensor data
from heating and cooling equipment in office buildings.
• Kafka brokers massive message streams for low-latency analysis in Enterprise
Apache Hadoop.
• Kafka is often used in place of traditional message brokers like JMS and AMQP
because of its higher throughput, reliability and replication
KAFKA ADVANTAGES
• Apache™ Kafka is
• Fast
• Scalable
• Durable
• Fault-tolerant publish-subscribe messaging system.
• Apache Kafka supports a wide range of use cases as a general-purpose messaging
system for scenarios where
• high throughput
• Reliable delivery
• Horizontal scalability are important.
KAFKA ADVANTAGES
• Common use cases include where Kafka can integrated into system architecture…
• Stream Processing
• Website Activity Tracking
• Metrics Collection and Monitoring
• Log Aggregation
KAFKA ADVANTAGES
• Kafka supported features and definition:
Feature Description
Scalability Distributed system scales easily with no downtime
Durability Persists messages on disk, and provides intra-cluster replication
Reliability Replicates data, supports multiple subscribers, and automatically balances
consumers in case of failure.
Performance High throughput for both publishing and subscribing, with disk structures that
provide constant performance even with many terabytes of stored messages.
KAFKA REAL-TIME USE CASES
• Kafka will be used to capture real-time events.
• Kafka will consume and publish the events in real-time systems with high
throughput.
• Kafka will support below mentioned features in real-time
• Persisting of the messages
• Integrated in distributed system
• Integrated in multi-client system
KAFKA REAL-TIME USE CASES.
• Kafka Producer-Broker-Consumer communication diagram:
KAFKA REAL-TIME USE CASES.
• Kafka is used in real-time analysis which includes…
• Website monitoring
• Network monitoring
• Fraud detection
• Web clicks
• Advertising
• Internet of Things: sensors
• It is becoming important to process events as they arrive for real-time insights, but
high performance at scale is necessary to do this.
• Kafka can be integrated with apache Spark Streaming, MapR-DB, and MapR Streams
for fast, event-driven applications.
KAFKA REAL-TIME USE CASES.
• Example Use Case:
• The example use case we will look at here is an application that monitors oil wells.
Sensors in oil rigs generate streaming data, which is processed by Spark and stored in
HBase, for use by various analytical and reporting tools.
• We want to store every single event in HBase as it streams in. We also want to filter for,
and store alarms. Daily Spark processing will store aggregated summary statistics.
KAFKA REAL-TIME USE CASES.
• How do we do this with high performance at scale?
• We need to collect the data, process the data, store the data, and finally serve the data
for analysis, machine learning, and dashboards.
• Streaming Data Ingestion, Spark Streaming supports data sources such as HDFS
directories, TCP sockets, Kafka, Flume, Twitter, etc., we will use MapR Streams, a new
distributed messaging system for streaming event data at scale.
• MapR Streams enables producers and consumers to exchange events in real time via the
Apache Kafka 0.9 API. MapR Streams integrates with Spark Streaming via the Kafka
direct approach.
• MapR Streams (or Kafka) topics are logical collections of messages. Topics organize
events into categories. Topics decouple producers, which are the sources of data, from
consumers, which are the applications that process, analyze, and share data.
KAFKA REAL-TIME USE CASES.
• Multiple publishers and consumers communication diagram:
KAFKA REAL-TIME USE CASES.
• With HBASE:A table is automatically partitioned across a cluster by key range, and
each server is the source for a subset of a table. Grouping the data by key range
provides for really fast read and writes by row key.
KAFKA REAL-TIME USES CASES.
• MapR Streams Producer Steps:
• Set producer properties:
• The first step is to set the KafkaProducer configuration properties, which will be used later to
instantiate a KafkaProducer for publishing messages to topics.
• Create a KafkaProducer:
• KafkaProducer by providing the set of key-value pair configuration properties which you set up in the first
step. You need to specify the type parameters as the type of the key-value of the messages that the
producer will send.
KAFKA REAL-TIME USE CASES.
• Build the ProducerRecord message:
• The ProducerRecord is a key-value pair to be sent to Kafka. It consists of a topic name to which the record
is being sent, an optional partition number, and an optional key and a message value.
• The ProducerRecord is also a Java generic class, whose type parameters should match the serialization
properties set before.
• Send the message:
• Call the send method on the KafkaProducer passing the ProducerRecord, which will asynchronously send
a record to the specified topic.
• The asynchronous send() method adds the record to a buffer of pending records to send, and
immediately returns. This allows sending records in parallel without waiting for the responses, and allows
the records to be batched for efficiency.
• Finally, call the close method on the producer to release resources. This method blocks until all requests
are complete.
KAFKA USEFUL LINKS FOR REFENCE/HELP
• https://kafka.apache.org/
• https://www.confluent.io/blog/stream-data-platform-1/
• http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
THANKS
• If you feel it is helpful and worthy to share with other people, please share the same

Kafka presentation

  • 1.
    KAFKA KAFKA In andOut @Mohammed Fazuluddin
  • 2.
    TOPICS • Kafka overview •Kafka advantages • Kafka Real-time use cases. • Kafka useful links for refence/help
  • 3.
  • 4.
    KAFKA OVERVIEW • Kafkais apaches another great invention, It is an open source messaging system. • Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis. • Kafka can message geospatial data from a fleet of long-haul trucks or sensor data from heating and cooling equipment in office buildings. • Kafka brokers massive message streams for low-latency analysis in Enterprise Apache Hadoop. • Kafka is often used in place of traditional message brokers like JMS and AMQP because of its higher throughput, reliability and replication
  • 5.
    KAFKA ADVANTAGES • Apache™Kafka is • Fast • Scalable • Durable • Fault-tolerant publish-subscribe messaging system. • Apache Kafka supports a wide range of use cases as a general-purpose messaging system for scenarios where • high throughput • Reliable delivery • Horizontal scalability are important.
  • 6.
    KAFKA ADVANTAGES • Commonuse cases include where Kafka can integrated into system architecture… • Stream Processing • Website Activity Tracking • Metrics Collection and Monitoring • Log Aggregation
  • 7.
    KAFKA ADVANTAGES • Kafkasupported features and definition: Feature Description Scalability Distributed system scales easily with no downtime Durability Persists messages on disk, and provides intra-cluster replication Reliability Replicates data, supports multiple subscribers, and automatically balances consumers in case of failure. Performance High throughput for both publishing and subscribing, with disk structures that provide constant performance even with many terabytes of stored messages.
  • 8.
    KAFKA REAL-TIME USECASES • Kafka will be used to capture real-time events. • Kafka will consume and publish the events in real-time systems with high throughput. • Kafka will support below mentioned features in real-time • Persisting of the messages • Integrated in distributed system • Integrated in multi-client system
  • 9.
    KAFKA REAL-TIME USECASES. • Kafka Producer-Broker-Consumer communication diagram:
  • 10.
    KAFKA REAL-TIME USECASES. • Kafka is used in real-time analysis which includes… • Website monitoring • Network monitoring • Fraud detection • Web clicks • Advertising • Internet of Things: sensors • It is becoming important to process events as they arrive for real-time insights, but high performance at scale is necessary to do this. • Kafka can be integrated with apache Spark Streaming, MapR-DB, and MapR Streams for fast, event-driven applications.
  • 11.
    KAFKA REAL-TIME USECASES. • Example Use Case: • The example use case we will look at here is an application that monitors oil wells. Sensors in oil rigs generate streaming data, which is processed by Spark and stored in HBase, for use by various analytical and reporting tools. • We want to store every single event in HBase as it streams in. We also want to filter for, and store alarms. Daily Spark processing will store aggregated summary statistics.
  • 12.
    KAFKA REAL-TIME USECASES. • How do we do this with high performance at scale? • We need to collect the data, process the data, store the data, and finally serve the data for analysis, machine learning, and dashboards. • Streaming Data Ingestion, Spark Streaming supports data sources such as HDFS directories, TCP sockets, Kafka, Flume, Twitter, etc., we will use MapR Streams, a new distributed messaging system for streaming event data at scale. • MapR Streams enables producers and consumers to exchange events in real time via the Apache Kafka 0.9 API. MapR Streams integrates with Spark Streaming via the Kafka direct approach. • MapR Streams (or Kafka) topics are logical collections of messages. Topics organize events into categories. Topics decouple producers, which are the sources of data, from consumers, which are the applications that process, analyze, and share data.
  • 13.
    KAFKA REAL-TIME USECASES. • Multiple publishers and consumers communication diagram:
  • 14.
    KAFKA REAL-TIME USECASES. • With HBASE:A table is automatically partitioned across a cluster by key range, and each server is the source for a subset of a table. Grouping the data by key range provides for really fast read and writes by row key.
  • 15.
    KAFKA REAL-TIME USESCASES. • MapR Streams Producer Steps: • Set producer properties: • The first step is to set the KafkaProducer configuration properties, which will be used later to instantiate a KafkaProducer for publishing messages to topics. • Create a KafkaProducer: • KafkaProducer by providing the set of key-value pair configuration properties which you set up in the first step. You need to specify the type parameters as the type of the key-value of the messages that the producer will send.
  • 16.
    KAFKA REAL-TIME USECASES. • Build the ProducerRecord message: • The ProducerRecord is a key-value pair to be sent to Kafka. It consists of a topic name to which the record is being sent, an optional partition number, and an optional key and a message value. • The ProducerRecord is also a Java generic class, whose type parameters should match the serialization properties set before. • Send the message: • Call the send method on the KafkaProducer passing the ProducerRecord, which will asynchronously send a record to the specified topic. • The asynchronous send() method adds the record to a buffer of pending records to send, and immediately returns. This allows sending records in parallel without waiting for the responses, and allows the records to be batched for efficiency. • Finally, call the close method on the producer to release resources. This method blocks until all requests are complete.
  • 17.
    KAFKA USEFUL LINKSFOR REFENCE/HELP • https://kafka.apache.org/ • https://www.confluent.io/blog/stream-data-platform-1/ • http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
  • 18.
    THANKS • If youfeel it is helpful and worthy to share with other people, please share the same