Topics
• What is Kafka?
• Why is Kafka important?
• Kafka architecture and design
What is Kafka?
• Kafka® is used for building real-time data pipelines and streaming apps.
• Kafka is a durable, publish-subscribe-based messaging system for exchanging data between processes, applications, and servers.
• Kafka runs as a cluster on one or more servers that can span multiple datacenters, and it replicates topic log partitions to multiple servers.
Apache Kafka® is a distributed streaming platform.
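The idea of replicating topic log partitions to multiple servers can be sketched in a few lines. This is a minimal, simplified model, assuming a plain round-robin placement; Kafka's own assignment logic is more involved (for example, it can be rack-aware), so the function `assign_replicas` and its layout are hypothetical. The first broker in each list is treated as the partition's leader, mirroring the fact that in Kafka the first replica listed is the preferred leader.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, replication_factor: int,
                    brokers: List[int]) -> Dict[int, List[int]]:
    """Hypothetical round-robin placement of partition replicas on brokers.

    The first broker in each list acts as the partition leader; the others
    hold follower copies of the same partition log.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        # Start each partition on a different broker so leaders are spread out,
        # then place the follower replicas on the next brokers in the ring.
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


if __name__ == "__main__":
    for partition, replicas in assign_replicas(3, 2, [101, 102, 103]).items():
        print(f"partition {partition}: leader={replicas[0]} followers={replicas[1:]}")
```

With three brokers and a replication factor of 2, every partition survives the loss of any single broker, because each one has a copy on two different servers.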
Who uses Kafka?
• LinkedIn: activity data and operational metrics
• Twitter: uses it as part of Storm, its stream-processing infrastructure
• Square: uses Kafka as a bus to move all system events (logs, custom events, metrics, and so on) to various Square data centers; outputs feed Splunk, Graphite, and Esper-like alerting systems
• Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, DataDog, LucidWorks, MailChimp, Netflix, etc.
Who uses Kafka? (continued)
• One third of all Fortune 500 companies
• Top ten travel companies, 7 of the top ten banks, 8 of the top ten insurance companies, and 9 of the top ten telecom companies
• LinkedIn, Microsoft, and Netflix process "four-comma" message volumes a day with Kafka (1,000,000,000,000, i.e. a trillion messages)
• Kafka carries real-time streams of data, used to collect big data, to do real-time analysis, or both
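To put the trillion-messages-a-day figure in perspective, a back-of-the-envelope calculation converts it to a per-second rate. It assumes the load is spread evenly over 24 hours, which real traffic is not, so actual peak rates would be higher:

```latex
\frac{10^{12}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{7}\ \text{messages/s}
```

That is roughly 11.6 million messages every second, around the clock.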
Kafka is generally used for two broad classes of applications:
• Building real-time streaming data pipelines that reliably get data between systems or applications
• Building real-time streaming applications that transform or react to the streams of data
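The difference between the two classes can be sketched without a real cluster. In this minimal model, plain Python lists stand in for topics (an assumption made purely for illustration, not Kafka's API): a pipeline moves records from one system to another unchanged, while a streaming application reacts to each record and emits a new stream. The functions `pipeline` and `streaming_app` and the sensor-style records are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def pipeline(source: Iterable[Record]) -> Iterator[Record]:
    """Data-pipeline style: move records between systems without changing them."""
    for record in source:
        yield record


def streaming_app(source: Iterable[Record],
                  transform: Callable[[Record], Record],
                  keep: Callable[[Record], bool]) -> Iterator[Record]:
    """Streaming-application style: filter and transform each record as it arrives."""
    for record in source:
        if keep(record):
            yield transform(record)


if __name__ == "__main__":
    input_topic: List[Record] = [
        {"sensor": "s1", "celsius": 21.5},
        {"sensor": "s2", "celsius": 48.0},
        {"sensor": "s1", "celsius": 50.5},
    ]
    copied = list(pipeline(input_topic))
    alerts = list(streaming_app(
        input_topic,
        transform=lambda r: {"sensor": r["sensor"], "alert": "overheat"},
        keep=lambda r: r["celsius"] > 45.0,
    ))
    print(len(copied), alerts)
```

Because both functions are generators, they handle one record at a time, which is the shape a real streaming application takes when the input never ends.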
Why is Kafka needed?
• Real-time streaming data can be processed for real-time analytics (for example service calls, tracking every call, IoT sensors)
• Apache Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system
• Kafka is often used instead of JMS, RabbitMQ, AMQP, and MSMQ
• Higher throughput, reliability, and replication
• Clients and servers communicate using Kafka's own binary wire protocol over TCP
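To make "wire protocol over TCP" concrete, the sketch below builds the bytes of a single request frame without opening a socket. It is a minimal sketch, assuming the framing described in the Apache Kafka protocol guide: an INT32 size prefix, then a request header (INT16 API key, INT16 API version, INT32 correlation ID, nullable client ID string), then the request body. It uses the ApiVersions request (API key 18) at version 0, which uses the non-flexible v1 header and has an empty body; the helper `encode_request` and the client ID `"demo"` are hypothetical.

```python
import struct
from typing import Optional

API_VERSIONS_KEY = 18  # ApiVersions request, per the Kafka protocol guide


def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: Optional[str], body: bytes = b"") -> bytes:
    """Hypothetical helper: frame one Kafka request (header v1) for a TCP socket."""
    if client_id is None:
        client_id_bytes = struct.pack(">h", -1)  # NULLABLE_STRING: length -1 means null
    else:
        raw = client_id.encode("utf-8")
        client_id_bytes = struct.pack(">h", len(raw)) + raw  # INT16 length + UTF-8 bytes
    # Header fields are big-endian: INT16 api_key, INT16 api_version, INT32 correlation_id.
    header = struct.pack(">hhi", api_key, api_version, correlation_id) + client_id_bytes
    payload = header + body
    # Every request is preceded by an INT32 size counting the bytes that follow it.
    return struct.pack(">i", len(payload)) + payload


if __name__ == "__main__":
    frame = encode_request(API_VERSIONS_KEY, 0, correlation_id=1, client_id="demo")
    print(frame.hex())
```

The broker echoes the correlation ID back in its response header, which is how a client matches responses to requests on one connection.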
Kafka Fundamentals
• Topics
The Kafka cluster stores streams of records in categories called topics. Each record consists of a key, a value, and a timestamp.
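A record's three parts and its place in a topic can be modeled in memory. This is a sketch, not Kafka's implementation: the `Record` and `Topic` classes are hypothetical, and the key-to-partition mapping uses `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's default partitioner applies to record keys. It does show the shape of the idea: records with the same key land in the same partition, each partition is an append-only log, and each appended record gets the next offset in its partition.

```python
import time
import zlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    key: bytes
    value: bytes
    timestamp_ms: int  # milliseconds since the Unix epoch


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: List[List[Record]] = field(init=False)

    def __post_init__(self) -> None:
        # Each partition is modeled as an append-only list (the "log").
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Stand-in hash; Kafka's default partitioner uses murmur2 instead.
        return zlib.crc32(key) % self.num_partitions

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        record = Record(key, value, int(time.time() * 1000))
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append(record)
        return partition, len(log) - 1  # (partition, offset within that partition)


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=3)
    p1, o1 = orders.append(b"customer-42", b"order #1")
    p2, o2 = orders.append(b"customer-42", b"order #2")
    print(p1 == p2, o1, o2)  # same key, same partition: prints True 0 1
```

Keeping one key in one partition is what preserves the relative order of that key's records.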
Kafka Fundamentals (continued)
• Producer API
Publishes a stream of records to one or more Kafka topics.
• Consumer API
Subscribes to one or more topics and consumes the stream of records published to them.
• Broker
A Kafka server that runs in a Kafka cluster.
Producer, Consumer, Broker
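How the three roles fit together can be sketched with an in-memory stand-in for a broker. The sketch assumes a single broker and one log per topic (no partitions), and every class and method name here (`InMemoryBroker`, `Producer.send`, `Consumer.poll`) is hypothetical rather than the real Kafka client API. The real idea it borrows is that a consumer reads records sequentially and tracks its own position (offset) in the log, so records stay on the broker after they are read and other consumers can read them too.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value); timestamps omitted for brevity


class InMemoryBroker:
    """Hypothetical stand-in for a broker holding one log per topic."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[Record]] = defaultdict(list)

    def append(self, topic: str, record: Record) -> int:
        self.logs[topic].append(record)
        return len(self.logs[topic]) - 1  # offset of the new record

    def read(self, topic: str, offset: int, max_records: int) -> List[Record]:
        # Reading does not remove records; other consumers can still read them.
        return self.logs[topic][offset:offset + max_records]


class Producer:
    def __init__(self, broker: InMemoryBroker) -> None:
        self.broker = broker

    def send(self, topic: str, key: str, value: str) -> int:
        return self.broker.append(topic, (key, value))


class Consumer:
    def __init__(self, broker: InMemoryBroker, topic: str) -> None:
        self.broker = broker
        self.topic = topic
        self.position = 0  # next offset this consumer will read

    def poll(self, max_records: int = 10) -> List[Record]:
        batch = self.broker.read(self.topic, self.position, max_records)
        self.position += len(batch)  # advance past what was just read
        return batch


if __name__ == "__main__":
    broker = InMemoryBroker()
    producer = Producer(broker)
    for i in range(3):
        producer.send("page-views", key=f"user-{i}", value=f"/home?visit={i}")
    first, second = Consumer(broker, "page-views"), Consumer(broker, "page-views")
    print(first.poll(max_records=2), first.poll(), second.poll())
```

Here the second consumer still sees all three records even though the first one has already read them, which is the publish-subscribe behavior described above.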
Thank you

Kafka Basic For Beginners
