1) Apache Flume is a distributed, highly available service that collects and moves large amounts of streaming data from one location to another.
2) Its most common use is delivering log data into HDFS.

1) Events and clients are the logical components of Flume.
2) An event is a single unit of data that Flume NG transports from its source to its destination.
3) An event consists of zero or more headers and a body. The headers are used for contextual routing: based on header values, an event can be routed to the next eligible destination.
4) A client generates events and sends them to one or more agents.
For example, an Apache web server continuously generates a large volume of log data.

1) A Flume agent is a JVM daemon process that hosts the Flume NG components: sources, channels, and sinks.
2) The source puts events into a channel, the channel stores them, and a sink later takes them from the channel.

1) A source is an active component that receives data from external locations and places it on one or more channels.
2) A source is declared in the agent's “.conf” file. In the example below, a1 is the agent name and s1 is the source name:
a1.sources.s1.type=netcat (netcat is one of the built-in source types)
3) Sources are either pollable or event-driven. Flume repeatedly polls a pollable source to generate events (for example, the sequence generator source), while an event-driven source pushes events as they arrive (for example, the netcat source, or the exec source running a command such as “tail -F”).
4) You can also write your own source and set the source's type parameter to the fully qualified name of that custom class.
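
As a sketch, assuming an agent named a1 with a channel c1 whose own settings are omitted, the fragment below declares one built-in exec source and one custom source. The exec type and its command property follow the Flume user guide; the log path and the class com.example.flume.MyCustomSource are hypothetical placeholders.

```properties
# Two sources on agent a1, both feeding channel c1
a1.sources=s1 s2

# Built-in exec source: runs a command and turns each output line into an event
a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /var/log/httpd/access_log
a1.sources.s1.channels=c1

# Custom source: the type is the fully qualified class name (hypothetical class)
a1.sources.s2.type=com.example.flume.MyCustomSource
a1.sources.s2.channels=c1
```

For the agent to load it, the custom class and its dependencies must be on the agent's classpath, for example via Flume's plugins.d directory.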

1) A channel is a buffer that bridges a source and a sink.
2) The channel stores events from the source until a sink takes them.
3) There are three main channel types: the memory channel, which is very fast but offers no protection against data loss (buffered events are lost if the agent process dies); the file channel, which persists events to the local file system until the sink takes them; and the database (JDBC) channel, which stores events in a database.
4) A single channel can be connected to any number of sources and sinks.
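
A minimal sketch of the memory and file channel declarations, assuming an agent named a1. The property names follow the Flume user guide; the capacities and directory paths are illustrative values, not recommendations.

```properties
a1.channels=c1 c2

# Memory channel: fast, but buffered events are lost if the agent dies
a1.channels.c1.type=memory
# Maximum number of events held in the channel
a1.channels.c1.capacity=10000
# Maximum number of events per put/take transaction
a1.channels.c1.transactionCapacity=100

# File channel: events are persisted to local disk until a sink takes them
a1.channels.c2.type=file
a1.channels.c2.checkpointDir=/var/lib/flume/checkpoint
a1.channels.c2.dataDirs=/var/lib/flume/data
```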

1) A sink receives events from one channel only.
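
Because a sink reads from exactly one channel, its property is channel (singular), whereas a source lists channels (plural). The sketch below, assuming an agent a1 with a channel c1, declares an HDFS sink, the most common destination mentioned earlier; the HDFS path is a hypothetical example.

```properties
a1.sinks=k1

# HDFS sink draining events from exactly one channel
a1.sinks.k1.type=hdfs
a1.sinks.k1.channel=c1
# The %Y-%m-%d escapes need a timestamp to be resolved
a1.sinks.k1.hdfs.path=/flume/weblogs/%Y-%m-%d
# Use the agent's local time instead of requiring a "timestamp" header
a1.sinks.k1.hdfs.useLocalTimeStamp=true
# Write plain text instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType=DataStream
```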

• What is Flume?
• Core Flume NG concepts
• Flow reliability in Flume
• Starting an agent

Interceptors
1) An interceptor is a point in your data flow where you can inspect and modify (or drop) Flume events, for example by adding headers that are later used for routing.
2) Interceptors are configured on a source. You can chain zero or more of them; they run on each event after the source creates it and before it is placed on a channel.
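
A sketch of an interceptor chain on a source s1 of agent a1. The timestamp and static interceptor types come from the Flume user guide; the header key datacenter and value NYC are hypothetical. Interceptors run in the order they are listed.

```properties
a1.sources.s1.interceptors=i1 i2

# Adds a "timestamp" header with the time (in ms) the event was processed
a1.sources.s1.interceptors.i1.type=timestamp

# Adds a fixed header to every event (hypothetical key and value)
a1.sources.s1.interceptors.i2.type=static
a1.sources.s1.interceptors.i2.key=datacenter
a1.sources.s1.interceptors.i2.value=NYC
```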

Channel selectors
Channel selectors decide how events move from a source to one or more channels. The two main channel selectors are:
1) A replicating channel selector (the default) puts a copy of each event into every channel configured for the source.
2) A multiplexing channel selector writes each event to a particular channel, or set of channels, chosen by the value of a header (contextual routing).
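
For contextual routing, a multiplexing selector can be configured on source s1 roughly as follows. This sketch assumes channels c1, c2 and c3 exist and that events carry a hypothetical datacenter header (for example one added by a static interceptor); the selector property names follow the Flume user guide.

```properties
a1.sources.s1.channels=c1 c2 c3
a1.sources.s1.selector.type=multiplexing
# Header whose value decides the route (hypothetical header name)
a1.sources.s1.selector.header=datacenter
a1.sources.s1.selector.mapping.NYC=c1
# A value may map to several channels; each one gets a copy
a1.sources.s1.selector.mapping.SFO=c2 c3
# Events whose header value matches no mapping go here
a1.sources.s1.selector.default=c3
```

Leaving selector.type unset (or setting it to replicating) gives the default behavior, where every event is copied to all three channels.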

Sink processors
A sink processor invokes one sink from an assigned group of sinks, and is itself invoked by the sink runner. The built-in sink processors are:
1) Load balancing sink processor, which distributes events across the sinks in the group
2) Failover sink processor, which sends events to the highest-priority sink that is available
3) Default sink processor, which accepts only a single sink
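
A sketch of a failover sink group for agent a1, assuming sinks k1 and k2 are already defined. The property names follow the Flume user guide; the priority and penalty values are illustrative.

```properties
a1.sinkgroups=g1
a1.sinkgroups.g1.sinks=k1 k2
# Failover: use the live sink with the highest priority value
a1.sinkgroups.g1.processor.type=failover
a1.sinkgroups.g1.processor.priority.k1=10
a1.sinkgroups.g1.processor.priority.k2=5
# Maximum back-off period (ms) for a failed sink
a1.sinkgroups.g1.processor.maxpenalty=10000
```

To spread events across the sinks instead, processor.type can be set to load_balance, with processor.selector choosing round_robin (the default) or random.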
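
Flow reliability in Flume
1) Flume uses transactions to make delivery reliable: an event is removed from the channel (a passive component) only when the sink commits its transaction. If the sink fails before committing, the transaction is rolled back and the event stays in the channel to be retried.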
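
Basic Flume agent configuration
# example.conf: configuration file for the agent named a1
# Name the components of this agent
a1.sources=s1
a1.channels=c1
a1.sinks=k1
# Netcat source listening on localhost:44444
a1.sources.s1.type=netcat
a1.sources.s1.channels=c1
a1.sources.s1.bind=localhost
a1.sources.s1.port=44444
# In-memory channel
a1.channels.c1.type=memory
# Logger sink: writes each event to the agent's log
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1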
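
Starting an agent
$ flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Here --conf points to the Flume configuration directory, --conf-file to the agent configuration file, --name selects which agent defined in that file to run, and -Dflume.root.logger=INFO,console sends INFO-level log output to the console.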
