Apache Flume - Twitter Streaming

Kowndinya Mannepalli
Apache Flume

Introduction
Architecture
Flume Installation
Example Twitter Streaming data into Hadoop
References

Introduction

Apache Flume is a distributed, reliable, and available
service for efficiently collecting, aggregating, and moving large amounts
of streaming data into the Hadoop Distributed File System (HDFS). It has
a simple and flexible architecture based on streaming data flows, and it is
robust and fault tolerant, with tunable reliability mechanisms for failover
and recovery.

Apache Flume Architecture

FLUME AGENT
An agent is an independent daemon process (JVM) in Flume. It receives
data (events) from clients or other agents and forwards them to the
next destination (a sink or another agent). A Flume deployment may have
more than one agent.
SOURCE
A source is the component of an agent that receives data from the data
generators and transfers it to one or more channels in the form of
Flume events.
CHANNEL
A channel acts as a bridge between sources and sinks. Channels are fully
transactional and can work with any number of sources and sinks.
SINK
A sink consumes data (events) from a channel and delivers it to its
destination. The destination may be another agent or a centralized
store such as HBase or HDFS.
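
The way a source, a channel, and a sink are wired into one agent can be sketched with a small configuration file. This is a minimal illustration, not part of the original slides: the agent name a1, the component names r1, c1, and k1, the file name example.conf, and the port are all assumptions, and the netcat source and logger sink are stand-ins for a real data generator and a real store.

```shell
# Write a minimal single-agent configuration (file and component names are hypothetical).
cat > example.conf <<'EOF'
# Name the components of the agent called "a1"
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each received line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: logs each event (stand-in for a store such as HDFS or HBase)
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels, a sink reads from one channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
```

Note the asymmetry in the last two properties: the source uses the plural key channels, while the sink uses the singular key channel.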

INSTALLING FLUME
STEP 1
Go to: https://flume.apache.org/download.html
STEP 2
Download: apache-flume-1.6.0-bin.tar.gz
STEP 3
Create a directory named Flume in the same directory as the Hadoop installation directories,
then extract the archive:
• $ mkdir Flume
• $ cd Downloads/
• $ tar zxvf apache-flume-1.6.0-bin.tar.gz
STEP 4
Move the contents of the extracted apache-flume-1.6.0-bin directory to the Flume directory:
• $ mv apache-flume-1.6.0-bin/* /home/Hadoop/Flume/

Configuring Flume
To configure Flume, we have to modify three files: flume-env.sh, flume-conf.properties, and .bashrc.
STEP 5
Open .bashrc for editing:
• $ gedit ~/.bashrc
STEP 6
In the conf directory, rename
 flume-conf.properties.template to flume-conf.properties and
 flume-env.sh.template to flume-env.sh
• Open flume-env.sh and set JAVA_HOME
STEP 7
Verify the installation from the bin directory: $ ./flume-ng
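
The configuration steps can be sketched as a short shell session. This is an illustration under assumptions rather than the exact content of the slides: the install path is taken from the mv command in the installation steps, and the idea that .bashrc should export FLUME_HOME and extend PATH is an assumption, since the slides do not show which lines were added. The JAVA_HOME value is left out because the slides do not give it.

```shell
# Lines one might add to ~/.bashrc (assumed, the slides do not list them)
export FLUME_HOME="${FLUME_HOME:-/home/Hadoop/Flume}"
export PATH="$PATH:$FLUME_HOME/bin"

# Create the working config files from the shipped templates.
# Skipped when a template is missing or the target already exists.
for name in flume-conf.properties flume-env.sh; do
  template="$FLUME_HOME/conf/$name.template"
  if [ -f "$template" ] && [ ! -f "$FLUME_HOME/conf/$name" ]; then
    cp "$template" "$FLUME_HOME/conf/$name"
  fi
done

# In flume-env.sh, set JAVA_HOME to the local JDK location (not given in the slides).

# Verify the installation if flume-ng is reachable through PATH
if command -v flume-ng >/dev/null 2>&1; then
  flume-ng version
fi
```

Copying the templates instead of renaming them keeps the originals available for reference; either approach yields the two files Flume expects.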

Twitter Data Streaming into Hadoop
The goal is to collect Tweets with Apache Flume and save them into HDFS. Twitter exposes an API for retrieving Tweets. The
service is free, but requires the user to register for it.

Flume is built around agents. Sources, sinks,
and the intermediate channels are the
components of an agent. Sources push or pull
data and send it to channels, which in turn
pass the data to sinks.
In this case Flume decouples the source (Twitter)
from the sink (HDFS). The source and the sink
can operate at different speeds, and it is much
easier to add new sources and sinks. Flume comes
with a set of sources, channels, and sinks, and new
ones can be implemented by extending the Flume
base classes.
Creating a Twitter Application

Save as flume.conf in the conf folder
Verify Hadoop: $ hadoop version
The TwitterAgent.sources.Twitter.keywords value can be modified to get
Tweets on another topic, such as football or movies.
The consumerKey, consumerSecret, accessToken and accessTokenSecret
must be replaced with the values obtained from https://dev.twitter.com/apps.
TwitterAgent.sinks.HDFS.hdfs.path should point to the NameNode and the location
in HDFS where the Tweets will be written.
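
A flume.conf consistent with the property names mentioned above might look like the sketch below. It is not the configuration from the slides, which did not survive as text. The source class com.cloudera.flume.source.TwitterSource is an assumption (it is the custom source commonly used when a keywords filter is needed and must be on the Flume classpath), and the channel name MemChannel, the keyword list, the NameNode address, the HDFS path, and all capacity and roll values are hypothetical. The four credentials are placeholders to be filled in by hand.

```shell
# Write a sketch of the Twitter agent configuration into the conf folder.
CONF_DIR="${CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/flume.conf" <<'EOF'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source class is an assumption; replace the four placeholder credentials
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, flume

# HDFS sink: NameNode address and target path are hypothetical
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Memory channel between the Twitter source and the HDFS sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
EOF
```

The agent name at the start of every property (TwitterAgent) has to match the value passed with -n when the agent is started.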
Start Flume from the $FLUME_HOME directory using the command below:
$ bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

Twitter Data Streaming into HDFS using Flume

After a couple of minutes the Tweets should
appear in HDFS.
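
One way to check the result is to list the target directory with the HDFS command-line client. This is a sketch under assumptions: the directory /user/flume/tweets is hypothetical and must match the path set in TwitterAgent.sinks.HDFS.hdfs.path, and the FlumeData prefix is the HDFS sink's default file prefix, which applies only if hdfs.filePrefix was not changed.

```shell
# Directory to inspect; must match the hdfs.path of the sink (hypothetical default)
TWEETS_DIR="${TWEETS_DIR:-/user/flume/tweets}"

if command -v hdfs >/dev/null 2>&1; then
  # List the files written by the HDFS sink
  hdfs dfs -ls "$TWEETS_DIR"
  # Show the first few lines of the collected Tweets
  hdfs dfs -cat "$TWEETS_DIR"/FlumeData.* | head -n 5
else
  echo "hdfs command not found; run this on a machine with the Hadoop client installed"
fi
```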

REFERENCES
 https://en.wikipedia.org/wiki/Apache_Flume
 https://flume.apache.org/
 https://cwiki.apache.org/confluence/display/FLUME/Home
 http://hortonworks.com/apache/flume/
 http://www.tutorialspoint.com/apache_flume
 https://github.com/apache/flume
 http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html

Thank You
Any queries: mkowndinya@gmail.com
