SlideShare a Scribd company logo
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Cassandra and Kafka Support on AWS/EC2
Cloudurable
Introduction to Kafka
Support around Cassandra
and Kafka running in EC2
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka growing
Why Kafka?
Kafka adoption is on the rise
but why
What is Kafka?
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka growth exploding
âť– Kafka growth exploding
âť– 1/3 of all Fortune 500 companies
âť– Top ten travel companies, 7 of top ten banks, 8 of top
ten insurance companies, 9 of top ten telecom
companies
âť– LinkedIn, Microsoft and Netflix process 4 comma
message a day with Kafka (1,000,000,000,000)
âť– Real-time streams of data, used to collect big data or to
do real time analysis (or both)
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Why Kafka is Needed?
âť– Real time streaming data processed for real time
analytics
âť– Service calls, track every call, IOT sensors
âť– Apache Kafka is a fast, scalable, durable, and fault-
tolerant publish-subscribe messaging system
âť– Kafka is often used instead of JMS, RabbitMQ and
AMQP
âť– higher throughput, reliability and replication
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Why is Kafka needed? 2
âť– Kafka can works in combination with
âť– Flume/Flafka, Spark Streaming, Storm, HBase and Spark
for real-time analysis and processing of streaming data
âť– Feed your data lakes with data streams
âť– Kafka brokers support massive message streams for follow-
up analysis in Hadoop or Spark
âť– Kafka Streaming (subproject) can be used for real-time
analytics
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Use Cases
âť– Stream Processing
âť– Website Activity Tracking
âť– Metrics Collection and Monitoring
âť– Log Aggregation
âť– Real time analytics
âť– Capture and ingest data into Spark / Hadoop
âť– CRQS, replay, error recovery
âť– Guaranteed distributed commit log for in-memory computing
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Who uses Kafka?
âť– LinkedIn: Activity data and operational metrics
❖ Twitter: Uses it as part of Storm – stream processing
infrastructure
âť– Square: Kafka as bus to move all system events to various
Square data centers (logs, custom events, metrics, an so
on). Outputs to Splunk, Graphite, Esper-like alerting
systems
âť– Spotify, Uber, Tumbler, Goldman Sachs, PayPal, Box,
Cisco, CloudFlare, DataDog, LucidWorks, MailChimp,
NetFlix, etc.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Why is Kafka Popular?
âť– Great performance
âť– Operational Simplicity, easy to setup and use, easy to reason
âť– Stable, Reliable Durability,
âť– Flexible Publish-subscribe/queue (scales with N-number of consumer groups),
âť– Robust Replication,
âť– Producer Tunable Consistency Guarantees,
âť– Ordering Preserved at shard level (Topic Partition)
âť– Works well with systems that have data streams to process, aggregate,
transform & load into other stores
Most important reason: Kafka’s great performance: throughput, latency, obtained through great engineering
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Why is Kafka so fast?
âť– Zero Copy - calls the OS kernel direct rather to move data fast
âť– Batch Data in Chunks - Batches data into chunks
âť– end to end from Producer to file system to Consumer
âť– Provides More efficient data compression. Reduces I/O latency
âť– Sequential Disk Writes - Avoids Random Disk Access
âť– writes to immutable commit log. No slow disk seeking. No random I/O
operations. Disk accessed in sequential manner
âť– Horizontal Scale - uses 100s to thousands of partitions for a single topic
âť– spread out to thousands of servers
âť– handle massive load
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Streaming Architecture
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Why Kafka Review
âť– Why is Kafka so fast?
âť– How fast is Kafka usage growing?
âť– How is Kafka getting used?
âť– Where does Kafka fit in the Big Data Architecture?
âť– How does Kafka relate to real-time analytics?
âť– Who uses Kafka?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Cassandra / Kafka Support in EC2/AWS
What is Kafka? Kafka messaging
Kafka Overview
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
What is Kafka?
âť– Distributed Streaming Platform
âť– Publish and Subscribe to streams of records
âť– Fault tolerant storage
âť– Replicates Topic Log Partitions to multiple servers
âť– Process records as they occur
âť– Fast, efficient IO, batching, compression, and more
âť– Used to decouple data streams
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Decoupling Data
Streams
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Polyglot clients / Wire
protocol
âť– Kafka communication from clients and servers wire
protocol over TCP protocol
âť– Protocol versioned
âť– Maintains backwards compatibility
âť– Many languages supported
âť– Kafka REST proxy allows easy integration
âť– Also provides Avro/Schema registry support via Kafka
ecosystem
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Usage
âť– Build real-time streaming applications that react to streams
âť– Real-time data analytics
âť– Transform, react, aggregate, join real-time data flows
âť– Feed events to CEP for complex event processing
âť– Feed data lakes
âť– Build real-time streaming data pipe-lines
âť– Enable in-memory microservices (actors, Akka, Vert.x, Qbit,
RxJava)
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Use Cases
âť– Metrics / KPIs gathering
âť– Aggregate statistics from many sources
âť– Event Sourcing
âť– Used with microservices (in-memory) and actor systems
âť– Commit Log
âť– External commit log for distributed systems. Replicated
data between nodes, re-sync for nodes to restore state
âť– Real-time data analytics, Stream Processing, Log
Aggregation, Messaging, Click-stream tracking, Audit trail,
etc.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Record Retention
âť– Kafka cluster retains all published records
❖ Time based – configurable retention period
âť– Size based - configurable based on size
âť– Compaction - keeps latest record
âť– Retention policy of three days or two weeks or a month
âť– It is available for consumption until discarded by time, size or
compaction
âť– Consumption speed not impacted by size
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka scalable message
storage
âť– Kafka acts as a good storage system for records/messages
âť– Records written to Kafka topics are persisted to disk and replicated to
other servers for fault-tolerance
âť– Kafka Producers can wait on acknowledgment
âť– Write not complete until fully replicated
âť– Kafka disk structures scales well
âť– Writing in large streaming batches is fast
âť– Clients/Consumers can control read position (offset)
âť– Kafka acts like high-speed file system for commit log storage,
replication
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Review
âť– How does Kafka decouple streams of data?
âť– What are some use cases for Kafka where you work?
âť– What are some common use cases for Kafka?
âť– How is Kafka like a distributed message storage
system?
âť– How does Kafka know when to delete old messages?
âť– Which programming languages does Kafka support?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka Architecture
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Fundamentals
âť– Records have a key (optional), value and timestamp; Immutable
❖ Topic a stream of records (“/orders”, “/user-signups”), feed name
âť– Log topic storage on disk
âť– Partition / Segments (parts of Topic Log)
âť– Producer API to produce a streams or records
âť– Consumer API to consume a stream of records
âť– Broker: Kafka server that runs in a Kafka Cluster. Brokers form a cluster.
Cluster consists on many Kafka Brokers on many servers.
âť– ZooKeeper: Does coordination of brokers/cluster topology. Consistent file
system for configuration information and leadership election for Broker Topic
Partition Leaders
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka: Topics, Producers, and
Consumers
Kafka
Cluster
Topic
Producer
Producer
Producer
Consumer
Consumer
Consumer
record
record
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Core Kafka
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka needs Zookeeper
âť– Zookeeper helps with leadership election of Kafka Broker and
Topic Partition pairs
âť– Zookeeper manages service discovery for Kafka Brokers that
form the cluster
âť– Zookeeper sends changes to Kafka
âť– New Broker join, Broker died, etc.
âť– Topic removed, Topic added, etc.
âť– Zookeeper provides in-sync view of Kafka Cluster configuration
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Producer/Consumer
Details
âť– Producers write to and Consumers read from Topic(s)
âť– Topic associated with a log which is data structure on disk
âť– Producer(s) append Records at end of Topic log
âť– Topic Log consist of Partitions -
âť– Spread to multiple files on multiple nodes
âť– Consumers read from Kafka at their own cadence
âť– Each Consumer (Consumer Group) tracks offset from where they left off reading
âť– Partitions can be distributed on different machines in a cluster
âť– high performance with horizontal scalability and failover with replication
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Topic Partition, Consumers,
Producers
0 1 42 3 5 6 7 8 9 10 11
Partition
0
Consumer Group A
Producer
Consumer Group B
Consumer groups remember offset where they left off.
Consumers groups each have their own offset.
Producer writing to offset 12 of Partition 0 while…
Consumer Group A is reading from offset 6.
Consumer Group B is reading from offset 9.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Scale and Speed
âť– How can Kafka scale if multiple producers and consumers read/write to
same Kafka Topic log?
âť– Writes fast: Sequential writes to filesystem are fast (700 MB or more a
second)
âť– Scales writes and reads by sharding:
âť– Topic logs into Partitions (parts of a Topic log)
âť– Topics logs can be split into multiple Partitions different
machines/different disks
âť– Multiple Producers can write to different Partitions of the same Topic
âť– Multiple Consumers Groups can read from different partitions efficiently
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Brokers
âť– Kafka Cluster is made up of multiple Kafka Brokers
âť– Each Broker has an ID (number)
âť– Brokers contain topic log partitions
âť– Connecting to one broker bootstraps client to entire
cluster
âť– Start with at least three brokers, cluster can have, 10,
100, 1000 brokers if needed
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Cluster, Failover, ISRs
âť– Topic Partitions can be replicated
âť– across multiple nodes for failover
âť– Topic should have a replication factor greater than 1
âť– (2, or 3)
âť– Failover
âť– if one Kafka Broker goes down then Kafka Broker with
ISR (in-sync replica) can serve data
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
ZooKeeper does coordination for Kafka
Cluster
Kafka BrokerProducer
Producer
Producer
Consumer
Consumer
Consumer
Kafka Broker
Kafka Broker
Topic
ZooKeeper
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Failover vs. Disaster Recovery
âť– Replication of Kafka Topic Log partitions allows for failure of
a rack or AWS availability zone
âť– You need a replication factor of at least 3
âť– Kafka Replication is for Failover
âť– Mirror Maker is used for Disaster Recovery
âť– Mirror Maker replicates a Kafka cluster to another data-center
or AWS region
âť– Called mirroring since replication happens within a cluster
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Review
âť– How does Kafka decouple streams of data?
âť– What are some use cases for Kafka where you work?
âť– What are some common use cases for Kafka?
âť– What is a Topic?
âť– What is a Broker?
âť– What is a Partition? Offset?
âť– Can Kafka run without Zookeeper?
âť– How do implement failover in Kafka?
âť– How do you implement disaster recovery in Kafka?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka versus
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka vs JMS, SQS, RabbitMQ
Messaging
âť– Is Kafka a Queue or a Pub/Sub/Topic?
âť– Yes
âť– Kafka is like a Queue per consumer group
âť– Kafka is a queue system per consumer in consumer group so load
balancing like JMS, RabbitMQ queue
âť– Kafka is like Topics in JMS, RabbitMQ, MOM
âť– Topic/pub/sub by offering Consumer Groups which act like
subscriptions
âť– Broadcast to multiple consumer groups
âť– MOM = JMS, ActiveMQ, RabbitMQ, IBM MQ Series, Tibco, etc.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka vs MOM
âť– By design, Kafka is better suited for scale than traditional MOM systems due
to partition topic log
âť– Load divided among Consumers for read by partition
âť– Handles parallel consumers better than traditional MOM
âť– Also by moving location (partition offset) in log to client/consumer side of
equation instead of the broker, less tracking required by Broker and more
flexible consumers
âť– Kafka written with mechanical sympathy, modern hardware, cloud in mind
âť– Disks are faster
âť– Servers have tons of system memory
âť– Easier to spin up servers for scale out
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kinesis and Kafka are similar
âť– Kinesis Streams is like Kafka Core
âť– Kinesis Analytics is like Kafka Streams
âť– Kinesis Shard is like Kafka Partition
âť– Similar and get used in similar use cases
âť– In Kinesis, data is stored in shards. In Kafka, data is stored in partitions
âť– Kinesis Analytics allows you to perform SQL like queries on data streams
âť– Kafka Streaming allows you to perform functional aggregations and mutations
âť– Kafka integrates well with Spark and Flink which allows SQL like queries on
streams
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka vs. Kinesis
âť– Data is stored in Kinesis for default 24 hours, and you can increase that up to 7 days.
âť– Kafka records default stored for 7 days
âť– can increase until you run out of disk space.
âť– Decide by the size of data or by date.
âť– Can use compaction with Kafka so it only stores the latest timestamp per key per record
in the log
âť– With Kinesis data can be analyzed by lambda before it gets sent to S3 or RedShift
âť– With Kinesis you pay for use, by buying read and write units.
âť– Kafka is more flexible than Kinesis but you have to manage your own clusters, and requires
some dedicated DevOps resources to keep it going
âť– Kinesis is sold as a service and does not require a DevOps team to keep it going (can be
more expensive and less flexible, but much easier to setup and run)
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka Topics
Kafka Topics
Architecture
Replication
Failover
Parallel processing
Kafka Topic Architecture
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Topics, Logs, Partitions
âť– Kafka Topic is a stream of records
âť– Topics stored in log
âť– Log broken up into partitions and segments
âť– Topic is a category or stream name or feed
âť– Topics are pub/sub
âť– Can have zero or many subscribers - consumer groups
âť– Topics are broken up and spread by partitions for speed and
size
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Topic Partitions
âť– Topics are broken up into partitions
âť– Partitions decided usually by key of record
âť– Key of record determines which partition
âť– Partitions are used to scale Kafka across many servers
âť– Record sent to correct partition by key
âť– Partitions are used to facilitate parallel consumers
âť– Records are consumed in parallel up to the number of partitions
âť– Order guaranteed per partition
âť– Partitions can be replicated to multiple brokers
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Topic Partition Log
âť– Order is maintained only in a single partition
âť– Partition is ordered, immutable sequence of records that is continually appended
to—a structured commit log
âť– Records in partitions are assigned sequential id number called the offset
âť– Offset identifies each record within the partition
âť– Topic Partitions allow Kafka log to scale beyond a size that will fit on a single server
âť– Topic partition must fit on servers that host it
âť– topic can span many partitions hosted on many servers
âť– Topic Partitions are unit of parallelism - a partition can only be used by one
consumer in group at a time
âť– Consumers can run in their own process or their own thread
âť– If a consumer stops, Kafka spreads partitions across remaining consumer in group
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Topic Partitions Layout
0 1 42 3 5 6 7 8 9 10 11
0 1 42 3 5 6 7 8
0 1 42 3 5 6 7 8 9 10
Older Newer
0 1 42 3 5 6 7
Partition
0
Partition
1
Partition
2
Partition
3
Writes
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Replication: Kafka Partition
Distribution
âť– Each partition has leader server and zero or more follower
servers
âť– Leader handles all read and write requests for partition
âť– Followers replicate leader, and take over if leader dies
âť– Used for parallel consumer handling within a group
âť– Partitions of log are distributed over the servers in the Kafka cluster
with each server handling data and requests for a share of
partitions
âť– Each partition can be replicated across a configurable number of
Kafka servers - Used for fault tolerance
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Replication: Kafka Partition
Leader
❖ One node/partition’s replicas is chosen as leader
âť– Leader handles all reads and writes of Records for
partition
âť– Writes to partition are replicated to followers
(node/partition pair)
âť– An follower that is in-sync is called an ISR (in-sync
replica)
âť– If a partition leader fails, one ISR is chosen as new
leader
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Replication to Partition 0
Kafka Broker 0
Partition 0
Partition 1
Partition 2
Partition 3
Partition 4
Kafka Broker 1
Partition 0
Partition 1
Partition 2
Partition 3
Partition 4
Kafka Broker 2
Partition 1
Partition 2
Partition 3
Partition 4
Client Producer
1) Write record
Partition 0
2) Replicate
record
2) Replicate
record
Leader Red
Follower Blue
Record is considered "committed"
when all ISRs for partition
wrote to their log.
Only committed records are
readable from consumer
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Replication to Partitions
1
Kafka Broker 0
Partition 0
Partition 1
Partition 2
Partition 3
Partition 4
Kafka Broker 1
Partition 0
Partition 1
Partition 2
Partition 3
Partition 4
Kafka Broker 2
Partition 1
Partition 2
Partition 3
Partition 4
Client Producer
1) Write record
Partition 0
2) Replicate
record
2) Replicate
record
Another partition can
be owned
by another leader
on another Kafka broker
Leader Red
Follower Blue
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Topic Review
âť– What is an ISR?
âť– How does Kafka scale Consumers?
âť– What are leaders? followers?
âť– How does Kafka perform failover for Consumers?
âť– How does Kafka perform failover for Brokers?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka Producers
Kafka Producers
Partition selection
Durability
Kafka Producer Architecture
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Producers
âť– Producers send records to topics
âť– Producer picks which partition to send record to per topic
âť– Can be done in a round-robin
âť– Can be based on priority
âť– Typically based on key of record
âť– Kafka default partitioner for Java uses hash of keys to
choose partitions, or a round-robin strategy if no key
âť– Important: Producer picks partition
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Producers and
Consumers
0 1 42 3 5 6 7 8 9 10 11
Partition
0
Producers
Consumer Group A
Producers are writing at Offset 12
Consumer Group A is Reading from Offset 9.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Producers
âť– Producers write at their own cadence so order of Records
cannot be guaranteed across partitions
âť– Producer configures consistency level (ack=0, ack=all,
ack=1)
âť– Producers pick the partition such that Record/messages
goes to a given same partition based on the data
âť– Example have all the events of a certain 'employeeId' go to
same partition
âť– If order within a partition is not needed, a 'Round Robin'
partition strategy can be used so Records are evenly
distributed across partitions.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Producer Review
âť– Can Producers occasionally write faster than
consumers?
âť– What is the default partition strategy for Producers
without using a key?
âť– What is the default partition strategy for Producers using
a key?
âť– What picks which partition a record is sent to?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka Consumers
Load balancing consumers
Failover for consumers
Offset management per consumer
group
Kafka Consumer Architecture
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Groups
âť– Consumers are grouped into a Consumer Group
âť– Consumer group has a unique id
âť– Each consumer group is a subscriber
âť– Each consumer group maintains its own offset
âť– Multiple subscribers = multiple consumer groups
âť– Each has different function: one might delivering records to
microservices while another is streaming records to Hadoop
âť– A Record is delivered to one Consumer in a Consumer Group
âť– Each consumer in consumer groups takes records and only one consumer
in group gets same record
âť– Consumers in Consumer Group load balance record consumption
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Load Share
âť– Kafka Consumer consumption divides partitions over consumers in a
Consumer Group
âť– Each Consumer is exclusive consumer of a "fair share" of partitions
âť– This is Load Balancing
âť– Consumer membership in Consumer Group is handled by the Kafka
protocol dynamically
âť– If new Consumers join Consumer group, it gets a share of partitions
âť– If Consumer dies, its partitions are split among remaining live Consumers
in Consumer Group
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Groups
0 1 42 3 5 6 7 8 9 10 11
Partition
0
Consumer Group A
Producers
Consumer Group B
Consumers remember offset where they left off.
Consumers groups each have their own offset per partition.
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Groups
Processing
âť– How does Kafka divide up topic so multiple Consumers in a Consumer
Group can process a topic?
âť– You group consumers into consumers group with a group id
âť– Consumers with same id belong in same Consumer Group
âť– One Kafka broker becomes group coordinator for Consumer Group
âť– assigns partitions when new members arrive (older clients would talk
direct to ZooKeeper now broker does coordination)
âť– or reassign partitions when group members leave or topic changes
(config / meta-data change
âť– When Consumer group is created, offset set according to reset policy of
topic
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Failover
âť– Consumers notify broker when it successfully processed a
record
âť– advances offset
âť– If Consumer fails before sending commit offset to Kafka
broker,
âť– different Consumer can continue from the last committed
offset
âť– some Kafka records could be reprocessed
âť– at least once behavior
âť– messages should be idempotent
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer Offsets and
Recovery
❖ Kafka stores offsets in topic called “__consumer_offset”
âť– Uses Topic Log Compaction
âť– When a consumer has processed data, it should
commit offsets
âť– If consumer process dies, it will be able to start up and
start reading where it left off based on offset stored in
“__consumer_offset”
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Consumer: What can be
consumed?
âť– "Log end offset" is offset of last record written
to log partition and where Producers write to
next
âť– "High watermark" is offset of last record
successfully replicated to all partitions followers
❖ Consumer only reads up to “high watermark”.
Consumer can’t read un-replicated data
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Consumer to Partition
Cardinality
âť– Only a single Consumer from the same Consumer
Group can access a single Partition
âť– If Consumer Group count exceeds Partition count:
âť– Extra Consumers remain idle; can be used for failover
âť– If more Partitions than Consumer Group instances,
âť– Some Consumers will read from more than one
partition
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
2 server Kafka cluster hosting 4 partitions (P0-P5)
Kafka Cluster
Server 2
P0 P1 P5
Server 1
P2 P3 P4
Consumer Group A
C0 C1 C3
Consumer Group B
C0 C1 C3
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Multi-threaded Consumers
âť– You can run more than one Consumer in a JVM process
âť– If processing records takes a while, a single Consumer can run multiple threads to process
records
âť– Harder to manage offset for each Thread/Task
âť– One Consumer runs multiple threads
âť– 2 messages on same partitions being processed by two different threads
âť– Hard to guarantee order without threads coordination
âť– PREFER: Multiple Consumers can run each processing record batches in their own thread
âť– Easier to manage offset
âť– Each Consumer runs in its thread
âť– Easier to mange failover (each process runs X num of Consumer threads)
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Consumer Review
âť– What is a consumer group?
âť– Does each consumer have its own offset?
âť– When can a consumer see a record?
âť– What happens if there are more consumers than
partitions?
âť– What happens if you run multiple consumers in many
thread in the same JVM?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Using Kafka Single Node
Using Kafka Single
Node
Run ZooKeeper, Kafka
Create a topic
Send messages from command
line
Read messages from command
line
Tutorial Using Kafka Single Node
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run Kafka
âť– Run ZooKeeper start up script
âť– Run Kafka Server/Broker start up script
âť– Create Kafka Topic from command line
âť– Run producer from command line
âť– Run consumer from command line
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run ZooKeeper
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run Kafka Server
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Create Kafka Topic
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
List Topics
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run Kafka Producer
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run Kafka Consumer
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Running Kafka Producer and
Consumer
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Single Node Review
âť– What server do you run first?
âť– What tool do you use to create a topic?
âť– What tool do you use to see topics?
âť– What tool did we use to send messages on the command line?
âť– What tool did we use to view messages in a topic?
âť– Why were the messages coming out of order?
âť– How could we get the messages to come in order from the
consumer?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Use Kafka to send and receive messages
Lab Use Kafka
Use single server version of
Kafka.
Setup single node.
Single ZooKeeper.
Create a topic.
Produce and consume messages
from the command line.
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Using Kafka Cluster
and Failover
Demonstrate Kafka Cluster
Create topic with replication
Show consumer failover
Show broker failover
Kafka Tutorial Cluster and
Failover
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Objectives
âť– Run many Kafka Brokers
âť– Create a replicated topic
âť– Demonstrate Pub / Sub
âť– Demonstrate load balancing
consumers
âť– Demonstrate consumer failover
âť– Demonstrate broker failover
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Running many nodes
âť– If not already running, start up ZooKeeper
âť– Shutdown Kafka from first lab
âť– Copy server properties for three brokers
âť– Modify properties files, Change port, Change Kafka log
location
âť– Start up many Kafka server instances
âť– Create Replicated Topic
âť– Use the replicated topic
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Create three new server-
n.properties files
âť– Copy existing server.properties to server-
0.properties, server-1.properties, server-2.properties
âť– Change server-1.properties to use log.dirs
“./logs/kafka-logs-0”
âť– Change server-1.properties to use port 9093, broker
id 1, and log.dirs “./logs/kafka-logs-1”
âť– Change server-2.properties to use port 9094, broker
id 2, and log.dirs “./logs/kafka-logs-2”
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Modify server-x.properties
âť– Each have different
broker.id
âť– Each have different
log.dirs
âť– Each had different
port
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Create Startup scripts for three Kafka
servers
âť– Passing properties files
from last step
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Run Servers
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Create Kafka replicated topic my-
failsafe-topic
âť– Replication Factor is set to 3
âť– Topic name is my-failsafe-topic
âť– Partitions is 13
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Start Kafka Consumer
âť– Pass list of Kafka servers to bootstrap-
server
âť– We pass two of the three
âť– Only one needed, it learns about the rest
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Start Kafka Producer
âť– Start producer
âť– Pass list of Kafka Brokers
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka 1 consumer and 1 producer
running
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Start a second and third
consumer
âť– Acts like pub/sub
âť– Each consumer
in its own group
âť– Message goes to
each
âť– How do we load
share?
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Running consumers in same
group
âť– Modify start consumer script
âť– Add the consumers to a group called
mygroup
âť– Now they will share load
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Start up three consumers
again
âť– Start up producer and three consumers
âť– Send 7 messages
âť– Notice how messages are spread among 3 consumers
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Consumer Failover
âť– Kill one consumer
âť– Send seven more
messages
âť– Load is spread to
remaining consumers
âť– Failover WORK!
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Create Kafka Describe Topic
❖ —describe will show list partitions, ISRs, and
partition leadership
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Use Describe Topics
âť– Lists which broker owns (leader of) which partition
âť– Lists Replicas and ISR (replicas that are up to
date)
âť– Notice there are 13 topics
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Test Broker Failover: Kill 1st
server
se Kafka topic describe to see that a new leader was elected!
Kill the first server
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Show Broker Failover Worked
âť– Send two more
messages from the
producer
âť– Notice that the
consumer gets the
messages
âť– Broker Failover WORKS!
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Cluster Review
âť– Why did the three consumers not load share the
messages at first?
âť– How did we demonstrate failover for consumers?
âť– How did we demonstrate failover for producers?
âť– What tool and option did we use to show ownership of
partitions and the ISRs?
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Use Kafka to send and receive messages
Lab 2 Use Kafka
multiple nodes
Use a Kafka Cluster to
replicate a Kafka topic log
™
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Kafka Ecosystem
Kafka Connect
Kafka Streaming
Kafka Schema Registry
Kafka REST
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Ecosystem
âť– Kafka Streams
âť– Streams API to transform, aggregate, process records from a stream and
produce derivative streams
âť– Kafka Connect
âť– Connector API reusable producers and consumers
âť– (e.g., stream of changes from DynamoDB)
âť– Kafka REST Proxy
âť– Producers and Consumers over REST (HTTP)
âť– Schema Registry - Manages schemas using Avro for Kafka Records
âť– Kafka MirrorMaker - Replicate cluster data to another cluster
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka REST Proxy and
Kafka Schema Registry
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Ecosystem
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Stream Processing
âť– Kafka Streams for Stream Processing
âť– Kafka enable real-time processing of streams.
âť– Kafka Streams supports Stream Processor
âť– processing, transformation, aggregation, and produces 1 to * output streams
âť– Example: video player app sends events videos watched, videos paused
âť– output a new stream of user preferences
âť– can gear new video recommendations based on recent user activity
âť– can aggregate activity of many users to see what new videos are hot
âť– Solves hard problems: out of order records, aggregating/joining across streams, stateful
computations, and more
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Connectors and
Streams
Kafka
Cluster
App
App
App
App
App
App
DB DB
App App
Connectors
Producers
Consumers
Streams
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
Kafka Ecosystem review
âť– What is Kafka Streams?
âť– What is Kafka Connect?
âť– What is the Schema Registry?
âť– What is Kafka Mirror Maker?
âť– When might you use Kafka REST Proxy?
Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
™
References
âť– Learning Apache Kafka, Second Edition 2nd Edition by Nishant Garg (Author),
2015, ISBN 978-1784393090, Packet Press
âť– Apache Kafka Cookbook, 1st Edition, Kindle Edition by Saurabh Minni (Author),
2015, ISBN 978-1785882449, Packet Press
âť– Kafka Streams for Stream processing: A few words about how Kafka works,
Serban Balamaci, 2017, Blog: Plain Ol' Java
âť– Kafka official documentation, 2017
âť– Why we need Kafka? Quora
âť– Why is Kafka Popular? Quora
âť– Why is Kafka so Fast? Stackoverflow
âť– Kafka growth exploding (Tech Republic)

More Related Content

What's hot

Kafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data ArchitectureKafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data Architecture
Jean-Paul Azar
 
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Jean-Paul Azar
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
Jean-Paul Azar
 
Kafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
Jean-Paul Azar
 
Amazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWSAmazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWS
Jean-Paul Azar
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka Security
Jean-Paul Azar
 
Kafka as a message queue
Kafka as a message queueKafka as a message queue
Kafka as a message queue
SoftwareMill
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
BlueData, Inc.
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
Rahul Jain
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
Ayyappadas Ravindran (Appu)
 
Schema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservicesSchema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservices
VinĂ­cius Carvalho
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
Brian Ritchie
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
StreamNative
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data Streaming
Shivji Kumar Jha
 
ES & Kafka
ES & KafkaES & Kafka
ES & Kafka
Diego Pacheco
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
Knoldus Inc.
 
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring bootJavaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
Ivan Vasyliev
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Shivji Kumar Jha
 
Deploying Kafka on DC/OS
Deploying Kafka on DC/OSDeploying Kafka on DC/OS
Deploying Kafka on DC/OS
Kaufman Ng
 

What's hot (20)

Kafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data ArchitectureKafka Tutorial: Streaming Data Architecture
Kafka Tutorial: Streaming Data Architecture
 
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Kafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
 
Amazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWSAmazon AWS basics needed to run a Cassandra Cluster in AWS
Amazon AWS basics needed to run a Cassandra Cluster in AWS
 
Kafka Tutorial: Kafka Security
Kafka Tutorial: Kafka SecurityKafka Tutorial: Kafka Security
Kafka Tutorial: Kafka Security
 
Kafka as a message queue
Kafka as a message queueKafka as a message queue
Kafka as a message queue
 
Best Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker ContainersBest Practices for Running Kafka on Docker Containers
Best Practices for Running Kafka on Docker Containers
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Schema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservicesSchema Evolution for Resilient Data microservices
Schema Evolution for Resilient Data microservices
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data Streaming
 
ES & Kafka
ES & KafkaES & Kafka
ES & Kafka
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring bootJavaeeconf 2016 how to cook apache kafka with camel and spring boot
Javaeeconf 2016 how to cook apache kafka with camel and spring boot
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Deploying Kafka on DC/OS
Deploying Kafka on DC/OSDeploying Kafka on DC/OS
Deploying Kafka on DC/OS
 

Similar to Kafka Tutorial, Kafka ecosystem with clustering examples

kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdf
PriyamTomar1
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Denodo
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Large scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with KafkaLarge scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with Kafka
Rafał Hryniewski
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
HostedbyConfluent
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
Florent Ramiere
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
Florent Ramiere
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
Syed Hadoop
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
Guido Schmutz
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
DataStax Academy
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
Riby Varghese
 
Apache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming PlatformApache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming Platform
Paolo Castagna
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
confluent
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
Inexture Solutions
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Chris Fregly
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
Mohammed Fazuluddin
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Kai Wähner
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
Dustin Vannoy
 

Similar to Kafka Tutorial, Kafka ecosystem with clustering examples (20)

kafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdfkafka-tutorial-cloudruable-v2.pdf
kafka-tutorial-cloudruable-v2.pdf
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Large scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with KafkaLarge scale, distributed and reliable messaging with Kafka
Large scale, distributed and reliable messaging with Kafka
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Kafka syed academy_v1_introduction
Kafka syed academy_v1_introductionKafka syed academy_v1_introduction
Kafka syed academy_v1_introduction
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
 
Apache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming PlatformApache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming Platform
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Python Kafka Integration: Developers Guide
Python Kafka Integration: Developers GuidePython Kafka Integration: Developers Guide
Python Kafka Integration: Developers Guide
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 

Kafka Tutorial, Kafka ecosystem with clustering examples

  • 1. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Cassandra and Kafka Support on AWS/EC2 Cloudurable Introduction to Kafka Support around Cassandra and Kafka running in EC2
  • 2. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting
  • 3. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka growing Why Kafka? Kafka adoption is on the rise but why What is Kafka?
  • 4. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka growth exploding âť– Kafka growth exploding âť– 1/3 of all Fortune 500 companies âť– Top ten travel companies, 7 of top ten banks, 8 of top ten insurance companies, 9 of top ten telecom companies âť– LinkedIn, Microsoft and Netflix process 4 comma message a day with Kafka (1,000,000,000,000) âť– Real-time streams of data, used to collect big data or to do real time analysis (or both)
  • 5. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why Kafka is Needed? âť– Real time streaming data processed for real time analytics âť– Service calls, track every call, IOT sensors âť– Apache Kafka is a fast, scalable, durable, and fault- tolerant publish-subscribe messaging system âť– Kafka is often used instead of JMS, RabbitMQ and AMQP âť– higher throughput, reliability and replication
  • 6. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why is Kafka needed? 2 âť– Kafka can works in combination with âť– Flume/Flafka, Spark Streaming, Storm, HBase and Spark for real-time analysis and processing of streaming data âť– Feed your data lakes with data streams âť– Kafka brokers support massive message streams for follow- up analysis in Hadoop or Spark âť– Kafka Streaming (subproject) can be used for real-time analytics
  • 7. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Use Cases âť– Stream Processing âť– Website Activity Tracking âť– Metrics Collection and Monitoring âť– Log Aggregation âť– Real time analytics âť– Capture and ingest data into Spark / Hadoop âť– CRQS, replay, error recovery âť– Guaranteed distributed commit log for in-memory computing
  • 8. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Who uses Kafka? âť– LinkedIn: Activity data and operational metrics âť– Twitter: Uses it as part of Storm – stream processing infrastructure âť– Square: Kafka as bus to move all system events to various Square data centers (logs, custom events, metrics, an so on). Outputs to Splunk, Graphite, Esper-like alerting systems âť– Spotify, Uber, Tumbler, Goldman Sachs, PayPal, Box, Cisco, CloudFlare, DataDog, LucidWorks, MailChimp, NetFlix, etc.
  • 9. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why is Kafka Popular? âť– Great performance âť– Operational Simplicity, easy to setup and use, easy to reason âť– Stable, Reliable Durability, âť– Flexible Publish-subscribe/queue (scales with N-number of consumer groups), âť– Robust Replication, âť– Producer Tunable Consistency Guarantees, âť– Ordering Preserved at shard level (Topic Partition) âť– Works well with systems that have data streams to process, aggregate, transform & load into other stores Most important reason: Kafka’s great performance: throughput, latency, obtained through great engineering
  • 10. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why is Kafka so fast? âť– Zero Copy - calls the OS kernel direct rather to move data fast âť– Batch Data in Chunks - Batches data into chunks âť– end to end from Producer to file system to Consumer âť– Provides More efficient data compression. Reduces I/O latency âť– Sequential Disk Writes - Avoids Random Disk Access âť– writes to immutable commit log. No slow disk seeking. No random I/O operations. Disk accessed in sequential manner âť– Horizontal Scale - uses 100s to thousands of partitions for a single topic âť– spread out to thousands of servers âť– handle massive load
  • 11. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Streaming Architecture
  • 12. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Why Kafka Review âť– Why is Kafka so fast? âť– How fast is Kafka usage growing? âť– How is Kafka getting used? âť– Where does Kafka fit in the Big Data Architecture? âť– How does Kafka relate to real-time analytics? âť– Who uses Kafka?
  • 13. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Cassandra / Kafka Support in EC2/AWS What is Kafka? Kafka messaging Kafka Overview
  • 14. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ What is Kafka? âť– Distributed Streaming Platform âť– Publish and Subscribe to streams of records âť– Fault tolerant storage âť– Replicates Topic Log Partitions to multiple servers âť– Process records as they occur âť– Fast, efficient IO, batching, compression, and more âť– Used to decouple data streams
  • 15. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Decoupling Data Streams
  • 16. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Polyglot clients / Wire protocol âť– Kafka communication from clients and servers wire protocol over TCP protocol âť– Protocol versioned âť– Maintains backwards compatibility âť– Many languages supported âť– Kafka REST proxy allows easy integration âť– Also provides Avro/Schema registry support via Kafka ecosystem
  • 17. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Usage âť– Build real-time streaming applications that react to streams âť– Real-time data analytics âť– Transform, react, aggregate, join real-time data flows âť– Feed events to CEP for complex event processing âť– Feed data lakes âť– Build real-time streaming data pipe-lines âť– Enable in-memory microservices (actors, Akka, Vert.x, Qbit, RxJava)
  • 18. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Use Cases âť– Metrics / KPIs gathering âť– Aggregate statistics from many sources âť– Event Sourcing âť– Used with microservices (in-memory) and actor systems âť– Commit Log âť– External commit log for distributed systems. Replicated data between nodes, re-sync for nodes to restore state âť– Real-time data analytics, Stream Processing, Log Aggregation, Messaging, Click-stream tracking, Audit trail, etc.
  • 19. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Record Retention âť– Kafka cluster retains all published records âť– Time based – configurable retention period âť– Size based - configurable based on size âť– Compaction - keeps latest record âť– Retention policy of three days or two weeks or a month âť– It is available for consumption until discarded by time, size or compaction âť– Consumption speed not impacted by size
  • 20. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka scalable message storage âť– Kafka acts as a good storage system for records/messages âť– Records written to Kafka topics are persisted to disk and replicated to other servers for fault-tolerance âť– Kafka Producers can wait on acknowledgment âť– Write not complete until fully replicated âť– Kafka disk structures scales well âť– Writing in large streaming batches is fast âť– Clients/Consumers can control read position (offset) âť– Kafka acts like high-speed file system for commit log storage, replication
  • 21. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Review âť– How does Kafka decouple streams of data? âť– What are some use cases for Kafka where you work? âť– What are some common use cases for Kafka? âť– How is Kafka like a distributed message storage system? âť– How does Kafka know when to delete old messages? âť– Which programming languages does Kafka support?
  • 22. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Architecture
  • 23. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Fundamentals âť– Records have a key (optional), value and timestamp; Immutable âť– Topic a stream of records (“/orders”, “/user-signups”), feed name âť– Log topic storage on disk âť– Partition / Segments (parts of Topic Log) âť– Producer API to produce a streams or records âť– Consumer API to consume a stream of records âť– Broker: Kafka server that runs in a Kafka Cluster. Brokers form a cluster. Cluster consists on many Kafka Brokers on many servers. âť– ZooKeeper: Does coordination of brokers/cluster topology. Consistent file system for configuration information and leadership election for Broker Topic Partition Leaders
  • 24. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka: Topics, Producers, and Consumers Kafka Cluster Topic Producer Producer Producer Consumer Consumer Consumer record record
  • 25. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Core Kafka
  • 26. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka needs Zookeeper âť– Zookeeper helps with leadership election of Kafka Broker and Topic Partition pairs âť– Zookeeper manages service discovery for Kafka Brokers that form the cluster âť– Zookeeper sends changes to Kafka âť– New Broker join, Broker died, etc. âť– Topic removed, Topic added, etc. âť– Zookeeper provides in-sync view of Kafka Cluster configuration
  • 27. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Producer/Consumer Details âť– Producers write to and Consumers read from Topic(s) âť– Topic associated with a log which is data structure on disk âť– Producer(s) append Records at end of Topic log âť– Topic Log consist of Partitions - âť– Spread to multiple files on multiple nodes âť– Consumers read from Kafka at their own cadence âť– Each Consumer (Consumer Group) tracks offset from where they left off reading âť– Partitions can be distributed on different machines in a cluster âť– high performance with horizontal scalability and failover with replication
  • 28. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Topic Partition, Consumers, Producers 0 1 42 3 5 6 7 8 9 10 11 Partition 0 Consumer Group A Producer Consumer Group B Consumer groups remember offset where they left off. Consumers groups each have their own offset. Producer writing to offset 12 of Partition 0 while… Consumer Group A is reading from offset 6. Consumer Group B is reading from offset 9.
  • 29. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Scale and Speed âť– How can Kafka scale if multiple producers and consumers read/write to same Kafka Topic log? âť– Writes fast: Sequential writes to filesystem are fast (700 MB or more a second) âť– Scales writes and reads by sharding: âť– Topic logs into Partitions (parts of a Topic log) âť– Topics logs can be split into multiple Partitions different machines/different disks âť– Multiple Producers can write to different Partitions of the same Topic âť– Multiple Consumers Groups can read from different partitions efficiently
  • 30. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Brokers âť– Kafka Cluster is made up of multiple Kafka Brokers âť– Each Broker has an ID (number) âť– Brokers contain topic log partitions âť– Connecting to one broker bootstraps client to entire cluster âť– Start with at least three brokers, cluster can have, 10, 100, 1000 brokers if needed
  • 31. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Cluster, Failover, ISRs âť– Topic Partitions can be replicated âť– across multiple nodes for failover âť– Topic should have a replication factor greater than 1 âť– (2, or 3) âť– Failover âť– if one Kafka Broker goes down then Kafka Broker with ISR (in-sync replica) can serve data
  • 32. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ ZooKeeper does coordination for Kafka Cluster Kafka BrokerProducer Producer Producer Consumer Consumer Consumer Kafka Broker Kafka Broker Topic ZooKeeper
  • 33. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Failover vs. Disaster Recovery âť– Replication of Kafka Topic Log partitions allows for failure of a rack or AWS availability zone âť– You need a replication factor of at least 3 âť– Kafka Replication is for Failover âť– Mirror Maker is used for Disaster Recovery âť– Mirror Maker replicates a Kafka cluster to another data-center or AWS region âť– Called mirroring since replication happens within a cluster
  • 34. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Review âť– How does Kafka decouple streams of data? âť– What are some use cases for Kafka where you work? âť– What are some common use cases for Kafka? âť– What is a Topic? âť– What is a Broker? âť– What is a Partition? Offset? âť– Can Kafka run without Zookeeper? âť– How do implement failover in Kafka? âť– How do you implement disaster recovery in Kafka?
  • 35. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka versus
  • 36. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka vs JMS, SQS, RabbitMQ Messaging âť– Is Kafka a Queue or a Pub/Sub/Topic? âť– Yes âť– Kafka is like a Queue per consumer group âť– Kafka is a queue system per consumer in consumer group so load balancing like JMS, RabbitMQ queue âť– Kafka is like Topics in JMS, RabbitMQ, MOM âť– Topic/pub/sub by offering Consumer Groups which act like subscriptions âť– Broadcast to multiple consumer groups âť– MOM = JMS, ActiveMQ, RabbitMQ, IBM MQ Series, Tibco, etc.
  • 37. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka vs MOM âť– By design, Kafka is better suited for scale than traditional MOM systems due to partition topic log âť– Load divided among Consumers for read by partition âť– Handles parallel consumers better than traditional MOM âť– Also by moving location (partition offset) in log to client/consumer side of equation instead of the broker, less tracking required by Broker and more flexible consumers âť– Kafka written with mechanical sympathy, modern hardware, cloud in mind âť– Disks are faster âť– Servers have tons of system memory âť– Easier to spin up servers for scale out
  • 38. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kinesis and Kafka are similar âť– Kinesis Streams is like Kafka Core âť– Kinesis Analytics is like Kafka Streams âť– Kinesis Shard is like Kafka Partition âť– Similar and get used in similar use cases âť– In Kinesis, data is stored in shards. In Kafka, data is stored in partitions âť– Kinesis Analytics allows you to perform SQL like queries on data streams âť– Kafka Streaming allows you to perform functional aggregations and mutations âť– Kafka integrates well with Spark and Flink which allows SQL like queries on streams
  • 39. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka vs. Kinesis âť– Data is stored in Kinesis for default 24 hours, and you can increase that up to 7 days. âť– Kafka records default stored for 7 days âť– can increase until you run out of disk space. âť– Decide by the size of data or by date. âť– Can use compaction with Kafka so it only stores the latest timestamp per key per record in the log âť– With Kinesis data can be analyzed by lambda before it gets sent to S3 or RedShift âť– With Kinesis you pay for use, by buying read and write units. âť– Kafka is more flexible than Kinesis but you have to manage your own clusters, and requires some dedicated DevOps resources to keep it going âť– Kinesis is sold as a service and does not require a DevOps team to keep it going (can be more expensive and less flexible, but much easier to setup and run)
  • 40. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Topics Kafka Topics Architecture Replication Failover Parallel processing Kafka Topic Architecture
  • 41. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Topics, Logs, Partitions âť– Kafka Topic is a stream of records âť– Topics stored in log âť– Log broken up into partitions and segments âť– Topic is a category or stream name or feed âť– Topics are pub/sub âť– Can have zero or many subscribers - consumer groups âť– Topics are broken up and spread by partitions for speed and size
  • 42. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Topic Partitions âť– Topics are broken up into partitions âť– Partitions decided usually by key of record âť– Key of record determines which partition âť– Partitions are used to scale Kafka across many servers âť– Record sent to correct partition by key âť– Partitions are used to facilitate parallel consumers âť– Records are consumed in parallel up to the number of partitions âť– Order guaranteed per partition âť– Partitions can be replicated to multiple brokers
  • 43. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Topic Partition Log âť– Order is maintained only in a single partition âť– Partition is ordered, immutable sequence of records that is continually appended to—a structured commit log âť– Records in partitions are assigned sequential id number called the offset âť– Offset identifies each record within the partition âť– Topic Partitions allow Kafka log to scale beyond a size that will fit on a single server âť– Topic partition must fit on servers that host it âť– topic can span many partitions hosted on many servers âť– Topic Partitions are unit of parallelism - a partition can only be used by one consumer in group at a time âť– Consumers can run in their own process or their own thread âť– If a consumer stops, Kafka spreads partitions across remaining consumer in group
  • 44. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Topic Partitions Layout 0 1 42 3 5 6 7 8 9 10 11 0 1 42 3 5 6 7 8 0 1 42 3 5 6 7 8 9 10 Older Newer 0 1 42 3 5 6 7 Partition 0 Partition 1 Partition 2 Partition 3 Writes
  • 45. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Replication: Kafka Partition Distribution âť– Each partition has leader server and zero or more follower servers âť– Leader handles all read and write requests for partition âť– Followers replicate leader, and take over if leader dies âť– Used for parallel consumer handling within a group âť– Partitions of log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of partitions âť– Each partition can be replicated across a configurable number of Kafka servers - Used for fault tolerance
  • 46. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Replication: Kafka Partition Leader âť– One node/partition’s replicas is chosen as leader âť– Leader handles all reads and writes of Records for partition âť– Writes to partition are replicated to followers (node/partition pair) âť– An follower that is in-sync is called an ISR (in-sync replica) âť– If a partition leader fails, one ISR is chosen as new leader
  • 47. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Replication to Partition 0 Kafka Broker 0 Partition 0 Partition 1 Partition 2 Partition 3 Partition 4 Kafka Broker 1 Partition 0 Partition 1 Partition 2 Partition 3 Partition 4 Kafka Broker 2 Partition 1 Partition 2 Partition 3 Partition 4 Client Producer 1) Write record Partition 0 2) Replicate record 2) Replicate record Leader Red Follower Blue Record is considered "committed" when all ISRs for partition wrote to their log. Only committed records are readable from consumer
  • 48. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Replication to Partitions 1 Kafka Broker 0 Partition 0 Partition 1 Partition 2 Partition 3 Partition 4 Kafka Broker 1 Partition 0 Partition 1 Partition 2 Partition 3 Partition 4 Kafka Broker 2 Partition 1 Partition 2 Partition 3 Partition 4 Client Producer 1) Write record Partition 0 2) Replicate record 2) Replicate record Another partition can be owned by another leader on another Kafka broker Leader Red Follower Blue
  • 49. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Topic Review âť– What is an ISR? âť– How does Kafka scale Consumers? âť– What are leaders? followers? âť– How does Kafka perform failover for Consumers? âť– How does Kafka perform failover for Brokers?
  • 50. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Producers Kafka Producers Partition selection Durability Kafka Producer Architecture
  • 51. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Producers âť– Producers send records to topics âť– Producer picks which partition to send record to per topic âť– Can be done in a round-robin âť– Can be based on priority âť– Typically based on key of record âť– Kafka default partitioner for Java uses hash of keys to choose partitions, or a round-robin strategy if no key âť– Important: Producer picks partition
  • 52. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Producers and Consumers 0 1 42 3 5 6 7 8 9 10 11 Partition 0 Producers Consumer Group A Producers are writing at Offset 12 Consumer Group A is Reading from Offset 9.
  • 53. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Producers âť– Producers write at their own cadence so order of Records cannot be guaranteed across partitions âť– Producer configures consistency level (ack=0, ack=all, ack=1) âť– Producers pick the partition such that Record/messages goes to a given same partition based on the data âť– Example have all the events of a certain 'employeeId' go to same partition âť– If order within a partition is not needed, a 'Round Robin' partition strategy can be used so Records are evenly distributed across partitions.
  • 54. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Producer Review âť– Can Producers occasionally write faster than consumers? âť– What is the default partition strategy for Producers without using a key? âť– What is the default partition strategy for Producers using a key? âť– What picks which partition a record is sent to?
  • 55. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Consumers Load balancing consumers Failover for consumers Offset management per consumer group Kafka Consumer Architecture
  • 56. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Groups âť– Consumers are grouped into a Consumer Group âť– Consumer group has a unique id âť– Each consumer group is a subscriber âť– Each consumer group maintains its own offset âť– Multiple subscribers = multiple consumer groups âť– Each has different function: one might delivering records to microservices while another is streaming records to Hadoop âť– A Record is delivered to one Consumer in a Consumer Group âť– Each consumer in consumer groups takes records and only one consumer in group gets same record âť– Consumers in Consumer Group load balance record consumption
  • 57. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Load Share âť– Kafka Consumer consumption divides partitions over consumers in a Consumer Group âť– Each Consumer is exclusive consumer of a "fair share" of partitions âť– This is Load Balancing âť– Consumer membership in Consumer Group is handled by the Kafka protocol dynamically âť– If new Consumers join Consumer group, it gets a share of partitions âť– If Consumer dies, its partitions are split among remaining live Consumers in Consumer Group
  • 58. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Groups 0 1 42 3 5 6 7 8 9 10 11 Partition 0 Consumer Group A Producers Consumer Group B Consumers remember offset where they left off. Consumers groups each have their own offset per partition.
  • 59. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Groups Processing âť– How does Kafka divide up topic so multiple Consumers in a Consumer Group can process a topic? âť– You group consumers into consumers group with a group id âť– Consumers with same id belong in same Consumer Group âť– One Kafka broker becomes group coordinator for Consumer Group âť– assigns partitions when new members arrive (older clients would talk direct to ZooKeeper now broker does coordination) âť– or reassign partitions when group members leave or topic changes (config / meta-data change âť– When Consumer group is created, offset set according to reset policy of topic
  • 60. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Failover âť– Consumers notify broker when it successfully processed a record âť– advances offset âť– If Consumer fails before sending commit offset to Kafka broker, âť– different Consumer can continue from the last committed offset âť– some Kafka records could be reprocessed âť– at least once behavior âť– messages should be idempotent
  • 61. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer Offsets and Recovery âť– Kafka stores offsets in topic called “__consumer_offset” âť– Uses Topic Log Compaction âť– When a consumer has processed data, it should commit offsets âť– If consumer process dies, it will be able to start up and start reading where it left off based on offset stored in “__consumer_offset”
  • 62. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Consumer: What can be consumed? âť– "Log end offset" is offset of last record written to log partition and where Producers write to next âť– "High watermark" is offset of last record successfully replicated to all partitions followers âť– Consumer only reads up to “high watermark”. Consumer can’t read un-replicated data
  • 63. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Consumer to Partition Cardinality âť– Only a single Consumer from the same Consumer Group can access a single Partition âť– If Consumer Group count exceeds Partition count: âť– Extra Consumers remain idle; can be used for failover âť– If more Partitions than Consumer Group instances, âť– Some Consumers will read from more than one partition
  • 64. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ 2 server Kafka cluster hosting 4 partitions (P0-P5) Kafka Cluster Server 2 P0 P1 P5 Server 1 P2 P3 P4 Consumer Group A C0 C1 C3 Consumer Group B C0 C1 C3
  • 65. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Multi-threaded Consumers âť– You can run more than one Consumer in a JVM process âť– If processing records takes a while, a single Consumer can run multiple threads to process records âť– Harder to manage offset for each Thread/Task âť– One Consumer runs multiple threads âť– 2 messages on same partitions being processed by two different threads âť– Hard to guarantee order without threads coordination âť– PREFER: Multiple Consumers can run each processing record batches in their own thread âť– Easier to manage offset âť– Each Consumer runs in its thread âť– Easier to mange failover (each process runs X num of Consumer threads)
  • 66. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Consumer Review âť– What is a consumer group? âť– Does each consumer have its own offset? âť– When can a consumer see a record? âť– What happens if there are more consumers than partitions? âť– What happens if you run multiple consumers in many thread in the same JVM?
  • 67. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Using Kafka Single Node Using Kafka Single Node Run ZooKeeper, Kafka Create a topic Send messages from command line Read messages from command line Tutorial Using Kafka Single Node
  • 68. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run Kafka âť– Run ZooKeeper start up script âť– Run Kafka Server/Broker start up script âť– Create Kafka Topic from command line âť– Run producer from command line âť– Run consumer from command line
  • 69. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run ZooKeeper
  • 70. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run Kafka Server
  • 71. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Create Kafka Topic
  • 72. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ List Topics
  • 73. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run Kafka Producer
  • 74. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run Kafka Consumer
  • 75. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Running Kafka Producer and Consumer
  • 76. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Single Node Review âť– What server do you run first? âť– What tool do you use to create a topic? âť– What tool do you use to see topics? âť– What tool did we use to send messages on the command line? âť– What tool did we use to view messages in a topic? âť– Why were the messages coming out of order? âť– How could we get the messages to come in order from the consumer?
  • 77. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Use Kafka to send and receive messages Lab Use Kafka Use single server version of Kafka. Setup single node. Single ZooKeeper. Create a topic. Produce and consume messages from the command line.
  • 78. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Using Kafka Cluster and Failover Demonstrate Kafka Cluster Create topic with replication Show consumer failover Show broker failover Kafka Tutorial Cluster and Failover
  • 79. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Objectives âť– Run many Kafka Brokers âť– Create a replicated topic âť– Demonstrate Pub / Sub âť– Demonstrate load balancing consumers âť– Demonstrate consumer failover âť– Demonstrate broker failover
  • 80. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Running many nodes âť– If not already running, start up ZooKeeper âť– Shutdown Kafka from first lab âť– Copy server properties for three brokers âť– Modify properties files, Change port, Change Kafka log location âť– Start up many Kafka server instances âť– Create Replicated Topic âť– Use the replicated topic
  • 81. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Create three new server- n.properties files âť– Copy existing server.properties to server- 0.properties, server-1.properties, server-2.properties âť– Change server-1.properties to use log.dirs “./logs/kafka-logs-0” âť– Change server-1.properties to use port 9093, broker id 1, and log.dirs “./logs/kafka-logs-1” âť– Change server-2.properties to use port 9094, broker id 2, and log.dirs “./logs/kafka-logs-2”
  • 82. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Modify server-x.properties âť– Each have different broker.id âť– Each have different log.dirs âť– Each had different port
  • 83. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Create Startup scripts for three Kafka servers âť– Passing properties files from last step
  • 84. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Run Servers
  • 85. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Create Kafka replicated topic my- failsafe-topic âť– Replication Factor is set to 3 âť– Topic name is my-failsafe-topic âť– Partitions is 13
  • 86. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Start Kafka Consumer âť– Pass list of Kafka servers to bootstrap- server âť– We pass two of the three âť– Only one needed, it learns about the rest
  • 87. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Start Kafka Producer âť– Start producer âť– Pass list of Kafka Brokers
  • 88. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka 1 consumer and 1 producer running
  • 89. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Start a second and third consumer âť– Acts like pub/sub âť– Each consumer in its own group âť– Message goes to each âť– How do we load share?
  • 90. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Running consumers in same group âť– Modify start consumer script âť– Add the consumers to a group called mygroup âť– Now they will share load
  • 91. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Start up three consumers again âť– Start up producer and three consumers âť– Send 7 messages âť– Notice how messages are spread among 3 consumers
  • 92. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Consumer Failover âť– Kill one consumer âť– Send seven more messages âť– Load is spread to remaining consumers âť– Failover WORK!
  • 93. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Create Kafka Describe Topic âť– —describe will show list partitions, ISRs, and partition leadership
  • 94. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Use Describe Topics âť– Lists which broker owns (leader of) which partition âť– Lists Replicas and ISR (replicas that are up to date) âť– Notice there are 13 topics
  • 95. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Test Broker Failover: Kill 1st server se Kafka topic describe to see that a new leader was elected! Kill the first server
  • 96. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Show Broker Failover Worked âť– Send two more messages from the producer âť– Notice that the consumer gets the messages âť– Broker Failover WORKS!
  • 97. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Cluster Review âť– Why did the three consumers not load share the messages at first? âť– How did we demonstrate failover for consumers? âť– How did we demonstrate failover for producers? âť– What tool and option did we use to show ownership of partitions and the ISRs?
  • 98. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Use Kafka to send and receive messages Lab 2 Use Kafka multiple nodes Use a Kafka Cluster to replicate a Kafka topic log
  • 99. ™ Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting Kafka Ecosystem Kafka Connect Kafka Streaming Kafka Schema Registry Kafka REST
  • 100. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Ecosystem âť– Kafka Streams âť– Streams API to transform, aggregate, process records from a stream and produce derivative streams âť– Kafka Connect âť– Connector API reusable producers and consumers âť– (e.g., stream of changes from DynamoDB) âť– Kafka REST Proxy âť– Producers and Consumers over REST (HTTP) âť– Schema Registry - Manages schemas using Avro for Kafka Records âť– Kafka MirrorMaker - Replicate cluster data to another cluster
  • 101. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka REST Proxy and Kafka Schema Registry
  • 102. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Ecosystem
  • 103. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Stream Processing âť– Kafka Streams for Stream Processing âť– Kafka enable real-time processing of streams. âť– Kafka Streams supports Stream Processor âť– processing, transformation, aggregation, and produces 1 to * output streams âť– Example: video player app sends events videos watched, videos paused âť– output a new stream of user preferences âť– can gear new video recommendations based on recent user activity âť– can aggregate activity of many users to see what new videos are hot âť– Solves hard problems: out of order records, aggregating/joining across streams, stateful computations, and more
  • 104. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Connectors and Streams Kafka Cluster App App App App App App DB DB App App Connectors Producers Consumers Streams
  • 105. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ Kafka Ecosystem review âť– What is Kafka Streams? âť– What is Kafka Connect? âť– What is the Schema Registry? âť– What is Kafka Mirror Maker? âť– When might you use Kafka REST Proxy?
  • 106. Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka Consulting ™ References âť– Learning Apache Kafka, Second Edition 2nd Edition by Nishant Garg (Author), 2015, ISBN 978-1784393090, Packet Press âť– Apache Kafka Cookbook, 1st Edition, Kindle Edition by Saurabh Minni (Author), 2015, ISBN 978-1785882449, Packet Press âť– Kafka Streams for Stream processing: A few words about how Kafka works, Serban Balamaci, 2017, Blog: Plain Ol' Java âť– Kafka official documentation, 2017 âť– Why we need Kafka? Quora âť– Why is Kafka Popular? Quora âť– Why is Kafka so Fast? Stackoverflow âť– Kafka growth exploding (Tech Republic)