Apache Kafka

Outline
• Why do we use Apache Kafka?
• What is it?
• How does it work?
• Demo
• Ecosystem

Big Data
• Data doesn’t fit in one computer
• Welcome to distributed systems

(Near) Real-time Big Data & Analytics
• Events (e.g. clickstreams)
• Sensors
• Internet of Things (IoT)
• Data streams

Distributed Messaging Queues
• Scalable
• Reliable
• High throughput (read & write)

Why Apache Kafka?
• Clean and simple architecture
• Easy to use
• Easy to deploy
• High throughput
• Scalability
• High availability
• Persistence (messages are retained for a configurable period)

Apache Kafka 101
• Distributed, partitioned, replicated commit log service.
• Provides the functionality of a messaging system.

Cluster
• Cluster => group of servers (brokers)
• Clients talk to the brokers over a language-agnostic TCP protocol

Topic
• Category or feed name to which messages are published.
• Partitioned log
• Each partition is
– Ordered
– An immutable sequence of messages
– Continually appended to
• Offset => sequential ID number assigned to each message within a partition
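
The partition-as-log idea can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka's storage format: the `Partition` class, its method names, and the sample messages are all made up for illustration. It only shows that appends go to the end and that the offset is the record's position.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy append-only log (hypothetical); offset = position in the list."""
    records: list = field(default_factory=list)

    def append(self, message: str) -> int:
        # New records always go to the end; existing records never change.
        self.records.append(message)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        # Reading does not remove data; a reader just tracks its own offset.
        return self.records[offset:offset + max_records]


log = Partition()
first = log.append("page_view:/home")
second = log.append("page_view:/cart")
```

Because reading does not delete anything, two readers can sit at different offsets of the same partition without affecting each other.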

Partition Distribution
• Distributed over servers in the cluster
• Replicated for fault tolerance (configurable)
• Each partition has a leader server (handles reads & writes)
• The other replicas act as followers (they replicate the leader)
• If the leader fails, one of the followers becomes the new leader
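
Leader failover for a single partition might be pictured like this. It is a simplified sketch under stated assumptions: the `PartitionReplicas` class and broker ids are hypothetical, and it promotes the first follower in the list, whereas a real cluster chooses the new leader among replicas that are in sync with the old one.

```python
from dataclasses import dataclass


@dataclass
class PartitionReplicas:
    """Hypothetical replica set for one partition."""
    leader: int       # broker id currently serving reads and writes
    followers: list   # broker ids replicating the leader

    def fail_broker(self, broker_id: int) -> None:
        """Drop a failed broker; promote a follower if it was the leader."""
        if broker_id == self.leader:
            if not self.followers:
                raise RuntimeError("no replica left to take over")
            # Simplification: promote the first follower in the list.
            self.leader = self.followers.pop(0)
        elif broker_id in self.followers:
            self.followers.remove(broker_id)


p0 = PartitionReplicas(leader=1, followers=[2, 3])
p0.fail_broker(1)  # broker 1 goes down, broker 2 takes over
```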

Producer
• Decides which partition each message goes to
– Round-robin
– Semantic partitioning
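
The two strategies can be compared with a small sketch. The function names are hypothetical, and CRC32 is used here only because it is a stable hash available in the Python standard library; it is an assumption for illustration, not the hash a Kafka client uses.

```python
import itertools
import zlib


def round_robin_partitioner(num_partitions: int):
    """Ignore the key and cycle through partitions to balance load."""
    counter = itertools.cycle(range(num_partitions))
    return lambda key: next(counter)


def key_partitioner(num_partitions: int):
    """Semantic partitioning: the same key always maps to the same partition."""
    # CRC32 is stable across runs, unlike Python's built-in hash() for str.
    return lambda key: zlib.crc32(key.encode("utf-8")) % num_partitions


rr = round_robin_partitioner(3)
by_key = key_partitioner(3)

rr_choices = [rr(None) for _ in range(4)]
same_user = {by_key("user-42") for _ in range(5)}
```

Round-robin spreads load evenly, while key-based partitioning keeps all messages for one key (for example, one user) in a single partition, so they stay in order relative to each other.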

Consumer
• Queue vs. Publish/Subscribe
• Traditional queue ordering vs. per-partition ordering
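
Kafka covers both models through consumer groups, a term the slide itself does not use. The sketch below is a rough illustration with hypothetical group and consumer names, and a simple modulo assignment that stands in for Kafka's real assignment strategies: inside one group each partition has a single owner (queue-like), while separate groups each receive every partition (publish/subscribe-like).

```python
def assign(partitions: list, consumers: list) -> dict:
    """Spread partitions over one group's consumers, one owner per partition."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Queue-like: one group, so the partitions are split between its members.
queue_style = assign(partitions, ["worker-a", "worker-b"])

# Publish/subscribe-like: every group independently gets all partitions.
pubsub_style = {
    group: assign(partitions, [group + "-c1"])
    for group in ["billing", "search"]
}
```

Since each partition has one owner within a group, ordering is preserved per partition rather than across the whole topic.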

Guarantees
• Messages sent by a producer to a partition are appended in the order they are sent.
• Consumers see messages in the order they are stored in the log.

Demo
• Basic Command Line Tools
– Start a server
– Create a topic
– Send a message
– Start a consumer
– Multi-broker cluster
• Running a tool with no arguments displays usage information

Clients
• Java
• Python
• Ruby
• Go
• C/C++
• .NET
• Clojure
• Node.js
• Scala
• JRuby
• Perl
• Erlang
• PHP
• Rust
• HTTP Rest
https://cwiki.apache.org/confluence/display/KAFKA/Clients

Administrative Tools
• Kafka Manager (from Yahoo)
• Kafkat: Command-line administration for Kafka brokers.
• Kafka Web Console: Displays information about your Kafka cluster, including which nodes are up and what topics they host data for.
• Kafka Offset Monitor: Displays the state of all consumers and how far behind the head of the stream they are.

Ecosystem
• Samza
• Spark Streaming
• Storm
https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem

Use Cases
• Messaging
• Website activity tracking (at LinkedIn)
• Metrics
• Log aggregation
• Stream processing (with Storm or Samza)
• Event sourcing (state changes are logged by time)
• Commit log (like a database transaction log; log compaction)

Who uses it?
• LinkedIn
• Yahoo
• Twitter
• Netflix
• Spotify
• Pinterest
• Uber
• Goldman Sachs
• Tumblr
• PayPal
• Box
• Airbnb
• Mozilla
• Cisco
• Etsy
• Foursquare
• StumbleUpon
• Coursera
• …
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By

Resources
• http://kafka.apache.org/
• https://cwiki.apache.org/confluence/display/KAFKA/Index
• https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
• http://www.confluent.io/blog

About Me
• Twitter : @akisemre
• LinkedIn : https://tr.linkedin.com/in/emreakis
