Flume basic


First slide
1) Apache Flume is a distributed, reliable, and available service that collects and moves large amounts of streaming data from one location to another.
2) Most often it delivers log data to HDFS.

Second slide
1) Event and Client are logical components of Flume.
2) An Event is a single unit of data that Flume NG transports from its source to its destination.
3) An Event typically consists of zero or more headers and a body. The headers are used for contextual routing: based on header values, an event can be routed to the next eligible destination.
4) A Client is an event generator. It generates events and sends them to one or more agents.
E.g. Apache web servers, which continuously generate huge amounts of log data.
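
As a hedged illustration of the contextual routing mentioned above, the fragment below sketches how a multiplexing channel selector could send events to different channels based on a header value. The header name "datacenter", its values, and the channel names c1 to c3 are assumptions made up for this example; they do not come from the slides.

```properties
# Hypothetical agent a1 with one source fanning out to three channels
a1.sources = s1
a1.channels = c1 c2 c3
a1.sources.s1.channels = c1 c2 c3

# Route each event on the value of its (assumed) "datacenter" header
a1.sources.s1.selector.type = multiplexing
a1.sources.s1.selector.header = datacenter
a1.sources.s1.selector.mapping.east = c1
a1.sources.s1.selector.mapping.west = c2
# Events with any other value, or without the header, go to c3
a1.sources.s1.selector.default = c3
```

This is only a fragment: a complete agent would also need a type for the source and for each channel, plus at least one sink per channel.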

Third slide
1) A Flume agent is a JVM daemon process that hosts all Flume NG components: sources, channels, sinks, etc.
2) The source puts events on a channel, the channel stores them, and a sink later takes the events from the channel.

Fourth slide
1) Source is an active component, which receives data from different locations and places it on one or more Channels.
2) The source declaration in the “.conf” file of agent “a1” is shown here, where a1 is the agent and s1 is the source component.
a1.sources.s1.type=netcat (netcat is one of the source types)
3) Different source types are available: pollable (auto-generating, such as an exec source running “tail -F” or the sequence generator), event-driven, and netcat.
4) We can also write our own source type and specify the fully qualified name of the custom class as the source's type parameter.
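
The source types described above might be declared as in the following sketch. The log path /var/log/app.log and the class name com.example.flume.MyCustomSource are hypothetical placeholders, not names from the slides.

```properties
# Hypothetical agent a1 with two sources feeding one channel
a1.sources = s1 s2
a1.channels = c1

# Exec source: runs a command and turns each line of its output into an event
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /var/log/app.log
a1.sources.s1.channels = c1

# Custom source: the type is the fully qualified class name (assumed here)
a1.sources.s2.type = com.example.flume.MyCustomSource
a1.sources.s2.channels = c1
```

Comments sit on their own lines because a properties file treats trailing text on a line as part of the value. The custom class would also have to be on the agent's classpath.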

Fifth slide
1) A channel is a bridge between Source and Sink.
2) A channel stores the events it receives from a source until a sink takes them.
3) There are three types of channel. A memory channel is very fast but gives no guarantee against data loss. A file channel stores events on the file system before they go to the sink. A database channel stores events in a database.
4) A single channel can be connected to any number of sources and sinks.
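
A minimal sketch of the memory and file channel types, side by side. The capacity values and the directories under /var/flume are illustrative assumptions, not recommendations from the slides.

```properties
a1.channels = c1 c2

# Memory channel: fast, but buffered events are lost if the agent stops
a1.channels.c1.type = memory
# Maximum number of events held in the channel (assumed value)
a1.channels.c1.capacity = 10000
# Maximum number of events per transaction (assumed value)
a1.channels.c1.transactionCapacity = 1000

# File channel: events are persisted to disk until a sink takes them
a1.channels.c2.type = file
# Hypothetical directories for the checkpoint and the event data
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
```

The trade-off is the one the notes describe: the memory channel favors throughput, the file channel favors durability.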

Sixth slide
1) A sink receives events from one channel only.
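
The one-channel rule shows up directly in the configuration syntax, as in this sketch of an HDFS sink. The NameNode address and the target path are hypothetical.

```properties
a1.sinks = k1

# HDFS sink: writes the events it takes from the channel to HDFS
a1.sinks.k1.type = hdfs
# A sink has a single "channel" property, where a source has "channels"
a1.sinks.k1.channel = c1
# Hypothetical NameNode address and target directory
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/events
a1.sinks.k1.hdfs.filePrefix = events
# Write plain event bodies instead of the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
```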

Published in: Software

Flume basic

  1. Welcome
  2. Apache Flume (NG)
  3. Agenda • What is Flume? • Core Flume NG concepts • Flow reliability in Flume • Starting an agent
  4. What is Flume? • Apache Flume is a distributed and reliable service for efficiently collecting, aggregating, and moving large amounts of log data from one place to another. • Its main goal is to deliver streaming data from applications to Apache Hadoop's HDFS (the most common destination).
  5. Core Concepts: Event, Client • Event: a single unit of data that Flume NG can transport from its origin to its final destination. An event is composed of zero or more headers and a body; the headers are used for contextual routing. • Client: an entity that generates events and sends them to one or more agents. Examples: Apache web servers, which generate huge amounts of log data daily; a logging package such as a log4j appender that sends events directly to a Flume NG source.
  6. Core Concepts: Agent • A Flume agent is a JVM daemon process that hosts all Flume NG components: sources, channels, sinks, etc.
  7. Core Components: Source • A source is an active component that receives data from different locations and places it on one or more channels. Different source types: 1) Pollable source (auto-generating): Exec, SEQ. 2) Event-driven source: the Avro source, which accepts Avro RPC calls and converts the RPC payload into a Flume event. 3) Network sources: Netcat (behaves like the ‘nc’ command-line tool running in server mode), Syslog.
       a1.sources=s1
       a1.sources.s1.type=netcat
  8. Core Components: Channel • A channel is the glue between source and sink. A channel may be in memory, which is fast but makes no guarantee against data loss, or it can be file- or database-backed (fully durable), where every event is guaranteed to be delivered to the connected sink even after failures such as power loss. A single channel can be connected to any number of sources and sinks.
       a1.channels=c1
       a1.channels.c1.type=memory
  9. Core Components: Sink • For a flat Flume NG agent, the sink is the destination for data. A sink removes events from the channel and transmits them to the next eligible destination (if one exists). Built-in sinks: 1) hdfs, which writes events to HDFS. 2) logger, which simply logs all events received. 3) null, which discards all events it receives. … etc.
       a1.sinks=k1
       a1.sinks.k1.type=logger
  10. Interceptors • An interceptor is a point in your data flow where you can inspect, modify, or drop Flume events. You can chain zero or more interceptors after a source creates an event and before the event is written to the channel.
  11. Channel Selectors • Channel selectors are responsible for how data moves from a source to one or more channels. There are two built-in channel selectors: 1) The replicating channel selector (the default) puts a copy of each event into every configured channel. 2) The multiplexing channel selector writes to different channels depending on header information (contextual routing).
  12. Sink Processors • A sink processor is responsible for invoking one sink from an assigned group of sinks. The sink processor itself is invoked by the sink runner. Built-in sink processors: 1) Load balancing sink processor 2) Failover sink processor 3) Default sink processor
  13. Flow Reliability in Flume • An event is removed from the channel (a passive component) only after the sink commits the transaction.
  14. Basic Flume Agent Configuration
       # example.conf file for agent ‘a1’
       a1.sources=s1
       a1.channels=c1
       a1.sinks=k1
       a1.sources.s1.type=netcat
       a1.sources.s1.channels=c1
       a1.sources.s1.bind=localhost
       a1.sources.s1.port=44444
       a1.channels.c1.type=memory
       a1.sinks.k1.type=logger
       a1.sinks.k1.channel=c1
  15. Starting an Agent • $ flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
  16. Thank you all
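
The interceptors and sink processors from slides 10 and 12 have no configuration example in the deck. The fragment below is a sketch of how they could be wired into agent a1, assuming a second sink k2 exists; the names i1, i2, g1, and k2 and the priority values are hypothetical.

```properties
# Interceptor chain on source s1: each event passes through i1, then i2
a1.sources.s1.interceptors = i1 i2
# Adds a "timestamp" header to every event
a1.sources.s1.interceptors.i1.type = timestamp
# Adds a "host" header with the agent's host name or IP address
a1.sources.s1.interceptors.i2.type = host

# Sink group with a failover sink processor
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
# The sink with the higher priority value is tried first
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound, in milliseconds, on the back-off for a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

With this setup k2 would only receive events while k1 is failing. Both sinks still need their own type and channel settings, which are omitted here.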