Kowndinya Mannepalli
Apache Flume
Introduction
Architecture
Flume Installation
Example Twitter Streaming data into Hadoop
References
Introduction
Apache Flume is a distributed, reliable, and available
service for efficiently collecting, aggregating, and moving large amounts
of streaming data into the Hadoop Distributed File System (HDFS). It has
a simple and flexible architecture based on streaming data flows, and is
robust and fault tolerant with tunable reliability mechanisms for failover
and recovery.
Apache Flume Architecture
FLUME AGENT
An agent is an independent daemon process (JVM) in Flume. It receives data (events) from clients or other agents and forwards them to the next destination (a sink or another agent). A Flume deployment may have more than one agent.
SOURCE
A source is the component of an agent that receives data from the data generators and transfers it to one or more channels in the form of Flume events.
CHANNEL
A channel acts as a bridge between the sources and the sinks. Channels are fully transactional and can work with any number of sources and sinks.
SINK
A sink consumes data (events) from a channel and delivers it to its destination, which may be another agent or a centralized store such as HBase or HDFS.
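As a rough illustration of how a source, a channel, and a sink are wired together inside one agent, the sketch below writes a minimal agent configuration to a file. The agent name a1, the component names r1, c1, and k1, the file name example.conf, and the port are all assumptions chosen for the example; the netcat source, memory channel, and logger sink are standard Flume component types.

```shell
# Hypothetical output location; adjust to your Flume conf directory.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

# Write a minimal single-agent configuration (all names are illustrative).
cat > "$CONF_DIR/example.conf" <<'EOF'
# Name the components of the agent called "a1".
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for lines of text on a TCP port.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs each event, which is handy for checking the flow.
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels; a sink drains exactly one.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

Note the asymmetry in the last two lines: the source property is the plural channels, while the sink property is the singular channel.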
INSTALLING FLUME
STEP 1
Go to: https://flume.apache.org/download.html
STEP 2
Download apache-flume-1.6.0-bin.tar.gz.
STEP 3
Create a directory named Flume in the same directory as the Hadoop installation directories, then extract the archive:
• $ mkdir Flume
• $ cd Downloads/
• $ tar zxvf apache-flume-1.6.0-bin.tar.gz
STEP 4
Move the contents of the extracted apache-flume-1.6.0-bin directory to the Flume directory:
• $ mv apache-flume-1.6.0-bin/* /home/Hadoop/Flume/
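The extract-and-move steps can be collected into a small helper. This is only a sketch: the function name install_flume is hypothetical, it assumes the archive unpacks into a directory named after the tarball (as apache-flume-1.6.0-bin.tar.gz does), and it leaves the download itself to the browser step above.

```shell
# install_flume TARBALL TARGET_DIR   (hypothetical helper)
# Extracts a Flume binary tarball next to itself and moves the
# contents of the extracted directory into TARGET_DIR.
install_flume() {
  tarball="$1"
  target="$2"
  # Directory name inside the archive, e.g. apache-flume-1.6.0-bin
  extracted="$(basename "$tarball" .tar.gz)"
  workdir="$(dirname "$tarball")"

  mkdir -p "$target" || return 1
  tar -xzf "$tarball" -C "$workdir" || return 1
  mv "$workdir/$extracted"/* "$target"/ || return 1
  # Remove the now-empty extracted directory.
  rmdir "$workdir/$extracted"
}

# Example call, using the paths from the steps above:
#   install_flume "$HOME/Downloads/apache-flume-1.6.0-bin.tar.gz" /home/Hadoop/Flume
```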
Configuring Flume
To configure Flume, we have to modify three files: flume-env.sh, flume-conf.properties, and .bashrc.
STEP 5
Open .bashrc for editing:
• $ sudo gedit ~/.bashrc
STEP 6
Rename flume-conf.properties.template to flume-conf.properties and flume-env.sh.template to flume-env.sh.
• Open flume-env.sh and set JAVA_HOME.
STEP 7
Verify the installation:
• $ ./flume-ng
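The configuration steps might look like the following sketch. The FLUME_HOME value is an assumption taken from the install path used earlier, and the helper name activate_templates is hypothetical. It copies the templates rather than renaming them, so the originals stay available; that is a design choice of this sketch, not something the steps above require.

```shell
# Lines one might add to ~/.bashrc (the path is an assumption).
export FLUME_HOME="${FLUME_HOME:-/home/Hadoop/Flume}"
export PATH="$PATH:$FLUME_HOME/bin"

# activate_templates CONF_DIR   (hypothetical helper)
# Creates flume-conf.properties and flume-env.sh from their
# .template files, without overwriting existing copies.
activate_templates() {
  conf_dir="$1"
  for name in flume-conf.properties flume-env.sh; do
    if [ -f "$conf_dir/$name.template" ] && [ ! -f "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.template" "$conf_dir/$name" || return 1
    fi
  done
}

# Example call:
#   activate_templates "$FLUME_HOME/conf"
# Afterwards, edit flume-env.sh and add a line of the form
#   export JAVA_HOME=<path to your JDK>
```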
Twitter Data Streaming into Hadoop
The goal is to fetch tweets with Apache Flume and save them into HDFS. Twitter exposes an API for retrieving tweets. The
service is free, but requires the user to register for it.
Flume is built around the concept of agents. Sources,
sinks, and the intermediate channels are the
components of an agent. Sources can push or pull
data and send it to one or more channels, which in
turn pass the data on to the sinks.
Flume decouples the source (Twitter) and the sink
(HDFS) in this case. The source and the sink
can operate at different speeds, and it is much
easier to add new sources and sinks. Flume comes
with a set of sources, channels, and sinks, and new
ones can be implemented by extending the Flume
base classes.
Creating a Twitter Application
Save as flume.conf in the conf folder
Verify Hadoop: $ hadoop version
The TwitterAgent.sources.Twitter.keywords value can be modified to get the
tweets for some other topic like football, movies etc.
The consumerKey, consumerSecret, accessToken and accessTokenSecret
have to be replaced with those obtained from https://dev.twitter.com/apps.
TwitterAgent.sinks.HDFS.hdfs.path should point to the NameNode and the location
in HDFS where the tweets will go to.
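The properties named above might fit together as in the sketch below, which writes a flume.conf for an agent called TwitterAgent. Several parts are assumptions: the source class com.cloudera.flume.source.TwitterSource (a Cloudera example source that must be on Flume's classpath), the channel name MemChannel, the keywords, the HDFS URL, and the batch and roll settings. The angle-bracket placeholders must be replaced with your own credentials.

```shell
# Hypothetical output location; adjust to your Flume conf directory.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/flume.conf" <<'EOF'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: assumed Cloudera example class; credentials are placeholders.
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data

# Sink: NameNode host, port, and directory are illustrative.
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Channel: in-memory buffer between the Twitter source and the HDFS sink.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
EOF
```

The sink's batch size is kept no larger than the channel's transactionCapacity, since the sink takes each batch from the channel within a single transaction.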
Start Flume by running the command below from the $FLUME_HOME directory:
$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
Twitter Data Streaming into HDFS using Flume
After a couple of minutes the Tweets should
appear in HDFS.
REFERENCES
• https://en.wikipedia.org/wiki/Apache_Flume
• https://flume.apache.org/
• https://cwiki.apache.org/confluence/display/FLUME/Home
• http://hortonworks.com/apache/flume/
• http://www.tutorialspoint.com/apache_flume
• https://github.com/apache/flume
• http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html
Thank You
Any queries: mkowndinya@gmail.com
