Apache Kafka
Copyright @ 2019 Learntek. All Rights Reserved. 3
Apache Kafka
Data analytics is often described as one of the biggest challenges associated with
big data, but before that step can happen, data must be ingested and made
available to enterprise users. That’s where Apache Kafka comes in. Kafka’s adoption
is growing fast: more than one third of all Fortune 500 companies use Kafka. These
companies include the top ten travel companies, seven of the top ten banks, eight of
the top ten insurance companies, nine of the top ten telecom companies, and many
more. LinkedIn, Microsoft, and Netflix each process "four-comma" message volumes
(1,000,000,000,000, i.e. a trillion) a day with Kafka.
Introduction:
Apache Kafka is a streaming platform for collecting, storing, and processing high
volumes of data in real time. It is a highly scalable, fast, and fault-tolerant
messaging system used for streaming applications and data processing, and it is
written in Java and Scala.
Apache Kafka is a distributed data streaming platform that can publish, subscribe
to, store, and process streams of records in real time. It is designed to handle
data streams from multiple sources and deliver them to multiple consumers. In
short, it moves massive amounts of data not just from point A to B, but from
points A to Z and anywhere else you need, all at the same time.
Apache Kafka started out as an internal system developed at LinkedIn, where it grew
to handle 1.4 trillion messages per day; it is now an open-source data streaming
solution with applications for a variety of enterprise needs.
Features:
•Apache Kafka is a distributed publish-subscribe messaging system designed to be
fast, scalable, and durable
•Apache Kafka is designed for distributed, high-throughput systems
•Apache Kafka works very well as a replacement for a more traditional message
broker
•Compared with traditional brokers, Apache Kafka offers higher throughput, built-in
partitioning, replication, and inherent fault tolerance, which make it a good fit
for large-scale message-processing applications
•Apache Kafka maintains feeds of messages in topics
•Producers write data to topics and consumers read from topics
•Since Kafka is a distributed system, topics are partitioned and replicated across
multiple nodes
•Kafka is very fast and, with replication configured appropriately, is designed to
keep serving data without downtime or data loss when individual brokers fail.
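The topic, partition, producer, and consumer model in the list above can be sketched as a toy in-memory model. This is not the Kafka client API: `ToyTopic`, `ToyConsumer`, and their methods are hypothetical names, and CRC-32 stands in for the murmur2 hash that Kafka's default partitioner applies to record keys. The sketch shows two properties: records with the same key land in the same partition and keep their order, and each consumer tracks its own offsets, so several consumers can read the same topic independently.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """In-memory stand-in for a Kafka topic: a list of append-only partition logs."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key order is preserved.
        # CRC-32 is a stand-in here; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> tuple:
        """Producer side: append a record and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Reads a topic and tracks its own offset per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets = [0] * topic.num_partitions

    def poll(self) -> list:
        """Return every record this consumer has not seen yet, then advance its offsets."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records


if __name__ == "__main__":
    topic = ToyTopic("page-views", num_partitions=3)
    for user, page in [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]:
        topic.append(user, page)
    analytics, billing = ToyConsumer(topic), ToyConsumer(topic)
    print(len(analytics.poll()), len(billing.poll()))  # 3 3: both consumers see every record
    print(analytics.poll())  # []: nothing new since the last poll
```

Because records are not removed when read, adding another consumer does not affect the existing ones; each simply keeps its own position in the log.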
Who uses Apache Kafka?
Many large companies that handle large volumes of data use Kafka. LinkedIn, where it
originated, uses it to track activity data and operational metrics. Twitter uses it as
part of Storm to provide a stream processing infrastructure. Square uses Kafka as a
bus to move all system events (logs, custom events, metrics, and so on) to its various
data centers, to feed outputs to Splunk and Graphite (dashboards), and to implement
Esper-like/CEP alerting systems. Other users include Spotify, Uber, Tumblr, Goldman
Sachs, PayPal, Box, Cisco, Cloudflare, Netflix, and many more.
Why is Kafka so Fast?
Kafka relies heavily on the OS kernel to move data around quickly, using the
principle of zero-copy. Kafka lets you batch data records into chunks, and these
batches flow end to end from the producer to the file system (the Kafka topic log)
to the consumer. Batching allows more efficient data compression and reduces I/O
latency. Kafka appends to an immutable commit log on disk sequentially, which avoids
random disk access and slow disk seeks. Kafka also scales horizontally through
sharding: it splits a topic log into hundreds, potentially thousands, of partitions
spread across up to thousands of servers. This sharding allows Kafka to handle
massive load.
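The effect of batching on compression can be illustrated with a small sketch. `ToyBatcher` is a hypothetical, size-only simplification of what a Kafka producer does with its `batch.size` and `linger.ms` settings, and zlib stands in for the codecs Kafka supports (such as gzip, snappy, lz4, and zstd). Compressing a batch of similar records as one unit lets the compressor exploit repetition across records, which per-record compression cannot.

```python
import zlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyBatcher:
    """Groups records into batches of at most max_batch_bytes (loosely like batch.size)."""
    max_batch_bytes: int
    batches: List[List[bytes]] = field(default_factory=list)
    _current: List[bytes] = field(default_factory=list, init=False)
    _current_bytes: int = field(default=0, init=False)

    def add(self, record: bytes) -> None:
        # Seal the open batch first if this record would push it over the limit.
        if self._current and self._current_bytes + len(record) > self.max_batch_bytes:
            self.flush()
        self._current.append(record)
        self._current_bytes += len(record)

    def flush(self) -> None:
        """Seal the open batch (a real producer also sends when linger.ms expires)."""
        if self._current:
            self.batches.append(self._current)
            self._current = []
            self._current_bytes = 0


def compressed_size_per_record(records: List[bytes]) -> int:
    """Total size if every record is compressed on its own."""
    return sum(len(zlib.compress(r)) for r in records)


def compressed_size_per_batch(batches: List[List[bytes]]) -> int:
    """Total size if each batch is compressed as one unit, as Kafka does for record batches."""
    return sum(len(zlib.compress(b"".join(batch))) for batch in batches)


if __name__ == "__main__":
    events = [b'{"user": "u%d", "event": "page_view", "page": "/home"}' % i for i in range(100)]
    batcher = ToyBatcher(max_batch_bytes=1024)
    for e in events:
        batcher.add(e)
    batcher.flush()
    print(len(batcher.batches), "batches")
    print(compressed_size_per_record(events), "bytes per record vs",
          compressed_size_per_batch(batcher.batches), "bytes batched")
```

Larger batches generally compress better and amortize per-request overhead, at the cost of a little extra latency while a batch fills; that is the trade-off `linger.ms` controls in a real producer.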
Key Benefits:
Apache Kafka APIs:
Apache Kafka is a popular tool for developers because it is easy to pick up and
provides a powerful event streaming platform. It has four core APIs:
•Producer API: lets applications publish a stream of records to one or more
topics.
•Consumer API: lets applications subscribe to one or more topics and process the
stream of records produced to them.
•Streams API: consumes input from one or more topics, transforms the input streams
into output streams, and produces the output to one or more topics.
•Connect API: lets you build and run reusable producers and consumers (connectors)
that link Kafka topics to existing applications or data systems.
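A Streams-API-style consume-transform-produce step can be sketched without a broker. The real Kafka Streams API is a Java library built around abstractions such as `KStream` and `KTable`; the Python function below (`word_count_step`, a hypothetical name) only mimics the idea of reading input records, updating local state, and emitting output records, using the classic word-count example.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Record = Tuple[Optional[str], str]  # (key, value) as read from an input topic


def word_count_step(records: Iterable[Record], counts: Counter) -> List[Tuple[str, int]]:
    """Consume text lines, update running per-word counts (the state),
    and return (word, latest count) records destined for an output topic."""
    output: List[Tuple[str, int]] = []
    for _key, line in records:
        for word in line.lower().split():
            counts[word] += 1
            output.append((word, counts[word]))
    return output


if __name__ == "__main__":
    state: Counter = Counter()
    print(word_count_step([(None, "Kafka Streams"), (None, "kafka connect")], state))
    # [('kafka', 1), ('streams', 1), ('kafka', 2), ('connect', 1)]
```

Emitting the latest count per word, rather than a final total, mirrors how a stream processor publishes a continuously updated result as new input arrives.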
Need for Apache Kafka:
•Kafka is a unified platform for handling all real-time data feeds
•Kafka supports low-latency message delivery and provides fault tolerance in the
presence of machine failures
•It can handle a large number of diverse consumers
•Kafka is very fast; published benchmarks report around 2 million writes/sec
•Kafka persists all data to disk; in practice, writes go first to the OS page cache
(RAM), and the kernel flushes them to disk later
•This makes it very efficient to transfer data from the page cache to a network socket
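The page-cache-to-socket path in the last two points relies on the `sendfile` system call: Kafka's documentation describes using Java's `FileChannel.transferTo`, which maps to `sendfile` on Linux. The sketch below shows the same zero-copy idea from Python with `os.sendfile` (Unix-only). The file is a hypothetical stand-in for a log segment; real Kafka segments store records in a binary format.

```python
import os
import socket
import tempfile


def send_segment(path: str, sock: socket.socket) -> int:
    """Copy a whole file to a socket with os.sendfile, so the kernel moves the bytes
    from the page cache to the socket without copying them through user space."""
    size = os.path.getsize(path)
    sent_total = 0
    with open(path, "rb") as segment:
        while sent_total < size:
            sent = os.sendfile(sock.fileno(), segment.fileno(), sent_total, size - sent_total)
            if sent == 0:  # end of file reached early
                break
            sent_total += sent
    return sent_total


if __name__ == "__main__":
    # Hypothetical segment contents, written to a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"record-1\nrecord-2\n")
        segment_path = f.name
    server, client = socket.socketpair()
    print(send_segment(segment_path, server), "bytes sent")
    print(client.recv(1024))
    server.close()
    client.close()
    os.remove(segment_path)
```

A conventional read-then-write loop would copy each chunk from the kernel into a user-space buffer and back again; `sendfile` skips that round trip, which matters when many consumers read the same data.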
Apache Kafka – Use Cases:
Kafka can be used in many use cases, some of which are listed below:
•Metrics: Kafka is often used for operational monitoring data. This involves
aggregating statistics from distributed applications to produce centralized feeds of
operational data.
•Twitter: Registered users can read and post tweets, but unregistered users can
only read them. Twitter uses Storm-Kafka as part of its stream processing
infrastructure.
•Netflix: An American multinational provider of on-demand Internet streaming
media, Netflix uses Kafka for real-time monitoring and event processing.
•Log Aggregation Solution: Kafka can be used across an organization to collect
logs from multiple services and make them available in a standard format to multiple
consumers.
•LinkedIn: Apache Kafka is used at LinkedIn for activity stream data and operational
metrics. Kafka's messaging system supports LinkedIn products such as LinkedIn
Newsfeed and LinkedIn Today for online message consumption, in addition to feeding
offline analytics systems such as Hadoop.
•Stream Processing: Popular frameworks such as Storm and Spark Streaming read
data from a topic, process it, and write the processed data to a new topic where it
becomes available to users and applications. Kafka’s strong durability is also very
useful in the context of stream processing.
•Website Activity Tracking: The web application sends events such as page
views and searches to Kafka, where they become available for real-time processing,
dashboards, and offline analytics in Hadoop.
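The metrics and website-activity-tracking use cases often boil down to aggregating events over time windows. The sketch below counts page views per page in fixed (tumbling) windows over an in-memory list; in a real deployment the events would be read from a Kafka topic by a consumer or a stream processor. The `(timestamp_ms, page)` event shape and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PageView = Tuple[int, str]  # hypothetical event shape: (timestamp_ms, page)


def page_views_per_window(events: Iterable[PageView], window_ms: int) -> Dict[Tuple[int, str], int]:
    """Count page views per (window start, page) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, page in events:
        window_start = ts - ts % window_ms  # align the timestamp to its window
        counts[(window_start, page)] += 1
    return dict(counts)


if __name__ == "__main__":
    views = [(1000, "/home"), (59000, "/home"), (61000, "/home"), (62000, "/cart")]
    print(page_views_per_window(views, window_ms=60000))
    # {(0, '/home'): 2, (60000, '/home'): 1, (60000, '/cart'): 1}
```

Each window's counts could then be published to another topic to drive a dashboard, while the raw events remain available for offline analytics.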

How to Add Chatter in the odoo 17 ERP Module
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
Azure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHatAzure Interview Questions and Answers PDF By ScholarHat
Azure Interview Questions and Answers PDF By ScholarHat
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 

Apache Kafka

  • 2. CHAPTER – 4 THE BASICS OF SEARCH ENGINE FRIENDLY DESIGN & DEVELOPMENT
  • 3. Apache Kafka: Data analytics is often described as one of the biggest challenges associated with big data, but before that step can happen, data must be ingested and made available to enterprise users. That’s where Apache Kafka comes in. Kafka’s growth is exploding: more than one third of all Fortune 500 companies use Kafka. These companies include the top ten travel companies, seven of the top ten banks, eight of the top ten insurance companies, nine of the top ten telecom companies, and many more. LinkedIn, Microsoft, and Netflix each process a four-comma number of messages (1,000,000,000,000, i.e. a trillion) every day with Kafka.
  • 4. Introduction: Apache Kafka is a streaming platform for collecting, storing, and processing high volumes of data in real time. It is a highly scalable, fast, and fault-tolerant messaging system used for streaming applications and data processing, and it is written in Java and Scala. As a distributed data streaming platform, Kafka can publish, subscribe to, store, and process streams of records in real time. It is designed to handle data streams from multiple sources and deliver them to multiple consumers. In short, it moves massive amounts of data not just from point A to point B, but from points A to Z and anywhere else you need, all at the same time. Apache Kafka started out as an internal system developed at LinkedIn, where it grew to handle 1.4 trillion messages per day; it is now an open-source data streaming solution with applications across a variety of enterprise needs.
  • 6. Features: •Apache Kafka is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable •Apache Kafka is designed for distributed, high-throughput systems •Apache Kafka tends to work very well as a replacement for a more traditional message broker •Compared with traditional message brokers, Apache Kafka has better throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications •Apache Kafka maintains feeds of messages in topics •Producers write data to topics and consumers read from topics •Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes •Kafka is very fast and, with replication configured appropriately, is designed to tolerate node failures without downtime or data loss.
  • 7. Who uses Apache Kafka? Many large companies that handle a lot of data use Kafka. LinkedIn, where it originated, uses it to track activity data and operational metrics. Twitter uses it alongside Storm as part of its stream processing infrastructure. Square uses Kafka as a bus to move all system events (logs, custom events, metrics, and so on) to its various data centers, to output to Splunk and Graphite (dashboards), and to implement Esper-like CEP alerting systems. It is used by many other companies too, such as Spotify, Uber, Tumblr, Goldman Sachs, PayPal, Box, Cisco, Cloudflare, Netflix, and more.
  • 8. Why is Kafka so Fast? Kafka relies heavily on the OS kernel to move data around quickly, following the principle of zero copy: data moves from the file system cache to the network without being copied through application memory. Kafka lets you batch data records into chunks, and these batches stay intact end to end, from the producer to the file system (the Kafka topic log) to the consumer. Batching allows for more efficient data compression and reduces I/O latency. Kafka appends to an immutable commit log sequentially on disk, which avoids random disk access and slow disk seeks. Kafka also scales horizontally through sharding: it splits a topic log into hundreds, potentially thousands, of partitions spread across thousands of servers. This sharding allows Kafka to handle massive loads.
  • 9. Key Benefits:
  • 10. Apache Kafka API: Apache Kafka is popular with developers because it is easy to pick up and provides a powerful event streaming platform with four core APIs: •Producer API: lets applications publish a stream of records to one or more topics. •Consumer API: lets applications subscribe to one or more topics and process the stream of records produced to them. •Streams API: takes input from one or more topics and produces output to one or more topics, transforming the input streams into output streams. •Connector API (Kafka Connect): builds and runs reusable producers and consumers that link topics to existing applications and data systems.
  • 11. Need for Apache Kafka: •Kafka is a unified platform for handling all real-time data feeds •Kafka supports low-latency message delivery and guarantees fault tolerance in the presence of machine failures •It can handle a large number of diverse consumers •Kafka is very fast: it can perform 2 million writes/sec (a figure reported in a LinkedIn benchmark on three inexpensive machines) •Kafka persists all data to disk; in practice, writes go first to the OS page cache (RAM), which the OS later flushes to disk •This makes it very efficient to transfer data from the page cache to a network socket
  • 12. Apache Kafka – Use Cases: Kafka can be used in many use cases. Some of them are listed below: •Metrics: Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data. •Twitter: Registered users can read and post tweets, but unregistered users can only read them. Twitter uses Storm-Kafka as part of its stream processing infrastructure. •Netflix: An American multinational provider of on-demand Internet streaming media, Netflix uses Kafka for real-time monitoring and event processing.
  • 13. •Log Aggregation Solution: Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers. •LinkedIn: Apache Kafka is used at LinkedIn for activity stream data and operational metrics. The Kafka messaging system supports various LinkedIn products, such as LinkedIn Newsfeed and LinkedIn Today, for online message consumption, and also feeds offline analytics systems such as Hadoop. •Stream Processing: Popular frameworks such as Storm and Spark Streaming read data from a topic, process it, and write the processed data to a new topic, where it becomes available to users and applications. Kafka’s strong durability is also very useful in the context of stream processing.
  • 14. •Website Activity Tracking: The web application sends events such as page views and searches to Kafka, where they become available for real-time processing, dashboards, and offline analytics in Hadoop.
  • 15. For more training information, contact us. Email: info@learntek.org; USA: +1734 418 2465; INDIA: +40 4018 1306, +7799713624
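Slide 4 describes Kafka as a platform that stores streams of records and delivers them to multiple consumers. The following is a minimal in-memory sketch of that idea in Python, not Kafka's client API: a topic is modeled as an append-only list, each record gets an offset, and every subscriber tracks its own offset, so several readers can consume the same stream independently. The names `TopicLog` and `Subscriber` are hypothetical.

```python
class TopicLog:
    """Toy append-only log: records are only ever appended, never modified."""

    def __init__(self):
        self.records = []

    def append(self, record):
        """Publish a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


class Subscriber:
    """Each subscriber keeps its own offset, so reading does not remove data."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # next offset this subscriber will read

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past the records just consumed
        return batch


if __name__ == "__main__":
    log = TopicLog()
    for event in ["signup", "login", "purchase"]:
        log.append(event)  # producer side

    billing = Subscriber(log)
    analytics = Subscriber(log)
    print(billing.poll(2))   # ['signup', 'login']
    print(analytics.poll())  # ['signup', 'login', 'purchase']
    print(billing.poll())    # ['purchase']
```

Because consuming a record only moves that reader's offset, the log still holds every record for the next subscriber; in real Kafka, retention settings decide when old records are deleted.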
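Slide 6 notes that topics are partitioned across nodes. A common way to spread records is to hash each record's key, so that all records with the same key go to the same partition and stay in order there. The sketch below assumes `zlib.crc32` as the hash; Kafka's default partitioner uses a murmur2 hash instead, so the actual partition numbers will differ. Replication is not modeled, and `PartitionedTopic` is a hypothetical name.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash.

    Assumption: crc32 stands in for Kafka's murmur2-based default
    partitioner; what matters is that equal keys map to equal partitions.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedTopic:
    def __init__(self, num_partitions):
        # One ordered list of (key, value) records per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append the record to the partition chosen by its key."""
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p


if __name__ == "__main__":
    topic = PartitionedTopic(num_partitions=3)
    for i in range(3):
        topic.send("user-42", f"click-{i}")
        topic.send("user-7", f"view-{i}")

    p = choose_partition("user-42", 3)
    # All of user-42's events sit in one partition, in the order they were sent.
    print([v for k, v in topic.partitions[p] if k == "user-42"])
    # ['click-0', 'click-1', 'click-2']
```

Ordering is therefore guaranteed only within a partition, not across the whole topic, which is why records that must stay in order should share a key.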
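Slide 8 says batching enables more efficient compression. The sketch below illustrates why with Python's standard `zlib` module: similar records compressed together share redundancy that is lost when each record is compressed on its own. The newline-joined batch format and the sample records are assumptions for illustration; Kafka's real record-batch format and its codecs (gzip, snappy, lz4, zstd) differ.

```python
import json
import zlib


def make_records(n):
    """Hypothetical page-view records; their fields repeat a lot."""
    return [
        json.dumps({"user": f"user-{i % 10}", "event": "page_view", "page": "/home"}).encode("utf-8")
        for i in range(n)
    ]


def compressed_size_individually(records):
    # One compression call per record: nothing is shared between records.
    return sum(len(zlib.compress(r)) for r in records)


def compressed_size_batched(records, batch_size):
    # Group records into batches and compress each batch as a single unit.
    total = 0
    for start in range(0, len(records), batch_size):
        batch = b"\n".join(records[start:start + batch_size])
        total += len(zlib.compress(batch))
    return total


if __name__ == "__main__":
    records = make_records(1000)
    print("raw bytes:            ", sum(len(r) for r in records))
    print("compressed one by one:", compressed_size_individually(records))
    print("compressed per batch: ", compressed_size_batched(records, batch_size=100))
```

On this repetitive sample the batched total comes out far smaller than the per-record total; the exact numbers depend on the data and the codec.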
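Slide 10 says the Streams API reads from input topics and writes transformed results to output topics. The following toy processor is written in that spirit, using plain Python lists as stand-ins for topics rather than the Kafka Streams library (which is a Java/Scala API): it consumes lines, keeps running word counts as local state, and emits one `(word, count)` update per word. The function name `word_count_stream` is hypothetical.

```python
from collections import Counter


def word_count_stream(input_topic):
    """Consume lines in order and emit (word, running_count) updates."""
    counts = Counter()   # local state, similar in role to a state store
    output_topic = []    # stand-in for the output topic
    for line in input_topic:
        for word in line.lower().split():
            counts[word] += 1
            output_topic.append((word, counts[word]))  # produce an update
    return output_topic


if __name__ == "__main__":
    print(word_count_stream(["hello kafka", "Hello streams"]))
    # [('hello', 1), ('kafka', 1), ('hello', 2), ('streams', 1)]
```

Emitting an update for every input word, rather than one final total, mirrors how a streaming job keeps its output current while the input keeps arriving.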
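Slide 11 quotes 2 million writes per second, and slide 3 mentions a trillion messages a day. A quick calculation puts those figures on the same scale. The 100-byte record size is an assumption chosen only to convert records into bytes.

```python
WRITES_PER_SEC = 2_000_000      # figure quoted on slide 11
RECORD_BYTES = 100              # assumption: small 100-byte records
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

records_per_day = WRITES_PER_SEC * SECONDS_PER_DAY
bytes_per_sec = WRITES_PER_SEC * RECORD_BYTES
trillion_per_day_rate = 10**12 / SECONDS_PER_DAY  # writes/sec needed for 1e12 per day

print(f"{records_per_day:,} records/day")                 # 172,800,000,000 records/day
print(f"{bytes_per_sec / 1_000_000:.0f} MB/s")            # 200 MB/s
print(f"{trillion_per_day_rate:,.0f} writes/sec needed")  # 11,574,074 writes/sec needed
```

Sustaining a trillion messages a day therefore takes roughly six times the rate quoted on slide 11, which is why such volumes are spread across many partitions and servers, as slide 8 describes.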
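The metrics use case on slide 12 involves aggregating statistics from distributed applications into a centralized feed. Below is a minimal sketch of the aggregation step a consumer of such a feed might perform, assuming each event is a hypothetical `(host, metric, value)` tuple already read from a metrics topic; the metric names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def aggregate_metrics(events):
    """Summarize (host, metric, value) events per metric across all hosts."""
    values = defaultdict(list)  # metric name -> every reported value
    hosts = defaultdict(set)    # metric name -> hosts that reported it
    for host, metric, value in events:
        values[metric].append(value)
        hosts[metric].add(host)
    return {
        metric: {
            "count": len(vals),
            "mean": mean(vals),
            "max": max(vals),
            "hosts": len(hosts[metric]),
        }
        for metric, vals in values.items()
    }


if __name__ == "__main__":
    events = [
        ("web-1", "cpu_pct", 40.0),
        ("web-2", "cpu_pct", 60.0),
        ("web-1", "latency_ms", 120.0),
    ]
    summary = aggregate_metrics(events)
    print(summary["cpu_pct"])  # {'count': 2, 'mean': 50.0, 'max': 60.0, 'hosts': 2}
```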
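Slide 13's log-aggregation use case relies on turning logs from many services into one standard format before they are shared. The sketch below assumes two hypothetical services: `web`, which writes Apache-style access-log lines, and `payments`, which writes JSON with `lvl` and `msg` fields. Both are mapped to the same `{service, level, message}` shape; the formats, field names, and the 5xx-means-error rule are illustrative only.

```python
import json
import re

# Apache-style access log: host ... "METHOD /path PROTOCOL" status ...
ACCESS_LOG = re.compile(r'^\S+ .*?"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})')


def normalize(service, raw):
    """Map one raw log line to a standard {service, level, message} record."""
    if service == "web":
        m = ACCESS_LOG.match(raw)
        if m:
            status = int(m.group("status"))
            level = "ERROR" if status >= 500 else "INFO"  # treat 5xx as server errors
            return {"service": service, "level": level,
                    "message": f"{m.group('method')} {m.group('path')} -> {status}"}
    elif service == "payments":
        data = json.loads(raw)  # this service already logs JSON
        return {"service": service, "level": data["lvl"].upper(), "message": data["msg"]}
    # Unknown service or unparseable line: keep the raw text rather than drop it.
    return {"service": service, "level": "UNKNOWN", "message": raw}


if __name__ == "__main__":
    print(normalize("web", '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'))
    # {'service': 'web', 'level': 'INFO', 'message': 'GET /index.html -> 200'}
    print(normalize("payments", '{"lvl": "warn", "msg": "retrying charge"}'))
    # {'service': 'payments', 'level': 'WARN', 'message': 'retrying charge'}
```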
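Slide 14 describes page views and searches flowing into Kafka for real-time dashboards. One typical processing step is counting views per page in fixed, non-overlapping (tumbling) time windows. The sketch below assumes events arrive as hypothetical `(timestamp_seconds, user_id, page)` tuples and uses 60-second windows; it is an in-memory illustration, not the Kafka Streams windowing API.

```python
from collections import Counter, defaultdict


def page_views_per_window(events, window_seconds=60):
    """Count views per page in tumbling windows keyed by window start time."""
    windows = defaultdict(Counter)
    for ts, _user, page in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        windows[window_start][page] += 1
    return dict(windows)


if __name__ == "__main__":
    events = [
        (0, "u1", "/home"),
        (15, "u2", "/home"),
        (59, "u1", "/search"),
        (61, "u3", "/home"),
    ]
    for start, counts in sorted(page_views_per_window(events).items()):
        print(start, dict(counts))
    # 0 {'/home': 2, '/search': 1}
    # 60 {'/home': 1}
```

Each window's counts can be pushed to a dashboard as soon as the window closes, while the raw events remain in the topic for offline analysis.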