Flume & Kafka
A 360-degree overview
Tanuj Mehta
What is Flume?
Apache Flume is a data-ingestion mechanism for transporting large amounts of
streaming data, such as log data and events, from various sources (web servers,
Kafka, etc.) to a centralized data store such as HDFS.
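As an illustration, a minimal Flume agent configuration is a standard Java-properties file wiring a source to a channel to a sink. This is a sketch: the agent name `a1`, the log path, and the HDFS path are hypothetical examples.

```properties
# Hypothetical agent "a1": tail a web-server log into HDFS.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: follow a log file (path is an example)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
a1.sinks.k1.channel = c1
```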
[Diagram: data generators send log/event data to Flume, which forwards it to centralized storage such as HDFS or HBase]
Flume Event
An event is the basic unit of data transported inside Flume. It contains a
byte-array payload that is carried from the source to the destination,
accompanied by optional headers.
[Diagram: an event consists of optional headers plus a byte payload]
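Conceptually, an event is just optional string headers plus a byte payload. The Python sketch below only models that structure; Flume's real event API is Java, and the class and field names here are invented for illustration.

```python
# Illustrative model of a Flume event: optional headers + byte payload.
# (Flume's actual API is Java; this merely mirrors the structure described above.)

class Event:
    def __init__(self, body: bytes, headers=None):
        self.body = body              # the byte-array payload
        self.headers = headers or {}  # optional metadata headers

# A log line wrapped as an event, tagged with its origin host
evt = Event(b"GET /index.html 200", headers={"host": "web01"})
print(evt.headers["host"], len(evt.body))
```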
Flume Agent
An agent is an independent daemon process (a JVM) in Flume. It receives data
(events) from clients or other agents and forwards them to the next destination
(a sink or another agent).
[Diagram: Source → Channel → Sink]
Flume Components
Source → Interceptor → Channel Selector → Channel → Sink
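To make the component roles concrete, here is a Python sketch (not Flume's real Java interfaces; all function and channel names are invented) of how an interceptor decorates events in flight and how a multiplexing-style channel selector routes each event by a header value:

```python
# Illustrative pipeline: source -> interceptor -> selector -> channel(s).

def timestamp_interceptor(event):
    """Interceptor: enrich the event's headers in flight (fixed value for the sketch)."""
    event["headers"]["timestamp"] = "1700000000"
    return event

def multiplexing_selector(event, channels):
    """Selector: route by a header value, falling back to a default channel."""
    key = event["headers"].get("type", "default")
    return channels.get(key, channels["default"])

channels = {"error": ["error_channel"], "default": ["main_channel"]}
evt = {"headers": {"type": "error"}, "body": b"disk full"}
evt = timestamp_interceptor(evt)
print(multiplexing_selector(evt, channels))  # routes to the error channel
```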
Multi-hop Flow
[Diagram: Sources 1–3 fan in through Channels 1–3, then fan out to centralized storage]
Sink Processor
● Default
● Load Balancing (Multi-Threading)
● Failover
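The three sink-processor strategies above can be sketched as follows. This is a Python model under invented names; in practice Flume selects these strategies declaratively in the agent configuration rather than in code.

```python
import itertools

# Illustrative sink-processor strategies (the real ones are Flume config options).

def default_processor(sinks):
    """Default: a single sink, used as-is."""
    return sinks[0]

def load_balancing_processor(sinks):
    """Load balancing: rotate events across sinks round-robin."""
    cycle = itertools.cycle(sinks)
    return lambda: next(cycle)

def failover_processor(sinks, alive):
    """Failover: pick the highest-priority sink that is still healthy."""
    for sink in sinks:  # sinks assumed ordered by priority
        if alive.get(sink, False):
            return sink
    raise RuntimeError("no live sinks")

pick = load_balancing_processor(["s1", "s2"])
print(pick(), pick(), pick())  # s1 s2 s1
print(failover_processor(["primary", "backup"], {"primary": False, "backup": True}))  # backup
```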
Kafka-Flume Integration
[Diagram: a Kafka broker (Topic 1, Topic 2) feeds a Flume agent; Kafka Channel 1 sinks to HBase and SOLR, Kafka Channel 2 sinks to HDFS. This setup guarantees no data loss, but data duplication is possible.]
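A sketch of wiring a Kafka channel into an agent: because the channel's events live in a Kafka topic, they survive agent restarts (no data loss), though at-least-once delivery means duplicates are possible on retry. The agent name, broker address, and topic below are hypothetical; the channel type is Flume's built-in Kafka channel.

```properties
# Hypothetical agent "a1" using Kafka itself as the channel.
a1.channels = kc1
a1.sinks = hdfs1

a1.channels.kc1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kc1.kafka.bootstrap.servers = broker1:9092
a1.channels.kc1.kafka.topic = flume-channel-1

a1.sinks.hdfs1.type = hdfs
a1.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events
a1.sinks.hdfs1.channel = kc1
```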
Benefits
● Can store data in any centralized store (HDFS, HBase)
● Contextual routing
● Effectively handles peak-hour loads
● Allows reads and writes to operate at different rates
● Reliable, fault-tolerant, scalable, manageable, and customizable
Q&A
