1) Apache Flume is a distributed, highly available service that collects and moves large amounts of streaming data from one location to another.
2) Most frequently it is used to deliver log data into HDFS.
1) Event and Client are the logical components of Flume.
2) An Event is a single unit of data that Flume NG transports from its source to its destination.
3) Typically an Event is composed of zero or more headers and a body. The headers are used for contextual routing: based on the header values, the event can be routed to the next eligible destination.
4) A Client is an Event generator. It generates events and sends them to one or more agents.
E.g.: Apache web servers, which continuously generate a huge amount of log data.
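As a rough sketch of header-based routing, the fragment below uses Flume's multiplexing channel selector to pick a channel from the value of an event header. The agent name a1, source s1, channels c1 and c2, the header name "datacenter" and its value "east" are all assumptions chosen for illustration.

```properties
# Source s1 can write to two channels (names are illustrative).
a1.sources.s1.channels = c1 c2

# Route each event according to the value of one of its headers.
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = datacenter

# Events whose "datacenter" header equals "east" go to c1.
a1.sources.s1.selector.mapping.east = c1

# Events with any other value, or without the header, go to c2.
a1.sources.s1.selector.default = c2
```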
1) A Flume agent is a JVM daemon process that hosts all Flume NG components, such as Sources, Channels and Sinks.
2) The Source sends events to the Channel, which stores them; later the Sink takes the events from the Channel.
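The Source, Channel and Sink wiring inside one agent might look like the skeleton below. It only names the components and connects them; the component names s1, c1 and k1 are assumptions, and each component still needs its own type and settings.

```properties
# Name the components hosted by agent a1.
a1.sources = s1
a1.channels = c1
a1.sinks = k1

# A source may write to several channels, hence the plural "channels".
a1.sources.s1.channels = c1

# A sink reads from exactly one channel, hence the singular "channel".
a1.sinks.k1.channel = c1
```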
1) Source is an active component, which receives data from different locations and places it on one or more Channels.
2) The declaration of a Source component in the “.conf” file of agent “a1” is shown here. In it, s1 is the name of the Source component and a1 is the name of the agent.
a1.sources.s1.type = netcat (netcat is one of the Source types)
3) Different Source types are available, such as pollable (auto-generating, like a “tail -F” command or a sequence generator), event-driven and netcat.
4) We can even write our own Source type and specify the custom class name as the source type parameter.
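A slightly fuller source declaration, as a minimal sketch: the netcat source also needs an address and port to listen on. The bind address, port number and channel name c1 are assumptions, and com.example.MyCustomSource is a hypothetical class name standing in for a user-written Source.

```properties
# Built-in netcat source: listens on a TCP port and turns each line into an event.
a1.sources = s1
a1.sources.s1.type = netcat
a1.sources.s1.bind = localhost
a1.sources.s1.port = 44444
a1.sources.s1.channels = c1

# Custom source: give the fully qualified class name as the type.
# (hypothetical class, shown commented out)
# a1.sources.s1.type = com.example.MyCustomSource
```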
1) A channel is a bridge between Source and Sink.
2) The Channel stores the events from the Source until a Sink takes them.
3) Three commonly used Channel types are: the memory channel, which is very fast but gives no guarantee against data loss; the file channel, which stores the events on the file system before they are passed to the Sink; and the database (JDBC) channel, which stores the events in a database.
4) A single Channel can be connected to any number of Sources and Sinks.
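The three channel types could be declared roughly as follows. The channel names c1, c2 and c3, the capacity figures and the directory paths are assumptions for illustration; an agent would normally use only the type it needs.

```properties
a1.channels = c1 c2 c3

# Memory channel: fast, but events are lost if the agent process dies.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# File channel: events are persisted on local disk (paths are illustrative).
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data

# JDBC channel: events are persisted in a database.
a1.channels.c3.type = jdbc
```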
1) A sink receives events from one channel only.
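Since log data is most often delivered into HDFS, a sink declaration might look like the sketch below. The sink name k1, channel name c1 and the HDFS path are assumptions; note the singular "channel" property, reflecting that a sink reads from one channel only.

```properties
a1.sinks = k1
a1.sinks.k1.type = hdfs

# Target directory in HDFS (illustrative host and path).
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events

# Write events as plain data rather than a SequenceFile.
a1.sinks.k1.hdfs.fileType = DataStream

# The one channel this sink drains.
a1.sinks.k1.channel = c1
```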