Uploading Twitter Data into HDFS using Flume Agent

1) The main objective is to show how Twitter data is imported into HDFS using Flume as an intermediate service.
2) Here Twitter acts as the client: it continuously generates a large amount of data, which is forwarded to the custom Twitter source. The source sends the data as events to the channel, which stores them until the HDFS sink takes them from the channel and writes them to HDFS.
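The source, channel and sink path described above is declared entirely in the agent's properties file. As a rough illustration, the Bash sketch below reads such a file and prints each path an event can take. The function name flume_flow is hypothetical, and the parser only understands the simple "key = value" lines used in this presentation (no variable substitution or line continuations).

```shell
#!/usr/bin/env bash
# Print "source -> channel -> sink" for every path declared for AGENT in CONF.
# Only simple "key = value" lines are understood; comments and blanks are skipped.
flume_flow() {
  local conf="$1" agent="$2"
  awk -v agent="$agent" '
    /^[ \t]*#/ || /^[ \t]*$/ { next }
    {
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      n = split(key, part, /[.]/)
      if (n != 4 || part[1] != agent) next
      # <agent>.sources.<name>.channels lists the channels a source writes to.
      if (part[2] == "sources" && part[4] == "channels") src_ch[part[3]] = val
      # <agent>.sinks.<name>.channel names the single channel a sink reads from.
      if (part[2] == "sinks" && part[4] == "channel") sink_ch[part[3]] = val
    }
    END {
      for (src in src_ch) {
        m = split(src_ch[src], chans, /[ \t]+/)
        for (i = 1; i <= m; i++)
          for (snk in sink_ch)
            if (sink_ch[snk] == chans[i]) print src " -> " chans[i] " -> " snk
      }
    }' "$conf"
}
```

For the twitter-flume.conf shown in the slides, `flume_flow /usr/lib/flume/conf/twitter-flume.conf TwitterAgent` should print `Twitter -> MemChannel -> HDFS`.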

1) To work with Twitter data, we first need to create a Twitter application and generate its keys: the Consumer key, Consumer secret, Access token and Access token secret. These keys are used to configure the Twitter agent.
2) Instructions for creating a Twitter developer app are easy to find online. Keep the four keys confidential, because they are tied directly to your Twitter account; that is why they are hidden in this presentation.
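Because the four keys are confidential, one option is to keep them in environment variables rather than typing them into files by hand. The Bash sketch below assumes the hypothetical variable names TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN and TWITTER_ACCESS_TOKEN_SECRET, and reports which of them are missing before you go on to configure the agent. The function name check_twitter_keys is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variable names; the presentation does not prescribe any.
TWITTER_KEY_VARS="TWITTER_CONSUMER_KEY TWITTER_CONSUMER_SECRET TWITTER_ACCESS_TOKEN TWITTER_ACCESS_TOKEN_SECRET"

# Report every credential variable that is unset or empty on stderr.
# Returns 0 when all four are present, 1 otherwise.
check_twitter_keys() {
  local var missing=0
  for var in $TWITTER_KEY_VARS; do
    # ${!var} is Bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_twitter_keys && echo "all keys present"` prints the missing names on stderr, or the confirmation once all four are exported.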

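1) Next, download the Flume sources jar (flume-sources-1.0-SNAPSHOT.jar) with the wget command from the download slide. This jar contains our custom source class, com.cloudera.flume.source.TwitterSource, together with the classes it refers to, which our configuration depends on.
2) After this, add the jar file to Flume's classpath by setting FLUME_CLASSPATH. The slide points it at /usr/lib/flume/lib/flume-sources-1.0-SNAPSHOT.jar, so make sure the downloaded jar is actually placed at that path.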
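The two steps above can be scripted. The Bash sketch below assumes Flume lives under /usr/lib/flume, as on the slides, and records FLUME_CLASSPATH in conf/flume-env.sh, which the flume-ng launcher sources from its --conf directory at startup; a plain assignment typed into your shell is not seen by flume-ng unless it is exported. The function names fetch_twitter_source and set_flume_classpath are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch assuming the /usr/lib/flume layout used on the slides.
JAR_URL="http://files.cloudera.com/samples/flume-sources-1.0-SNAPSHOT.jar"

# Download the jar into FLUME_HOME/lib (needs network access and write
# permission on that directory). wget -P sets the target directory.
fetch_twitter_source() {
  wget -P "$1/lib" "$JAR_URL"
}

# Record FLUME_CLASSPATH in FLUME_HOME/conf/flume-env.sh, replacing any
# earlier FLUME_CLASSPATH line so the function can be rerun safely.
set_flume_classpath() {
  local flume_home="$1" jar="$2"
  local env_file="$flume_home/conf/flume-env.sh"
  touch "$env_file" || return 1
  # Keep every other line of the file untouched.
  grep -v '^FLUME_CLASSPATH=' "$env_file" > "$env_file.tmp" || true
  mv "$env_file.tmp" "$env_file"
  echo "FLUME_CLASSPATH=\"$jar\"" >> "$env_file"
}
```

Typical use, usually as root since /usr/lib/flume is system-owned: `fetch_twitter_source /usr/lib/flume`, then `set_flume_classpath /usr/lib/flume /usr/lib/flume/lib/flume-sources-1.0-SNAPSHOT.jar`.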

1) Now it's time to configure the Twitter agent. The configuration specifies the source, the channel and the sink. Here the source type is com.cloudera.flume.source.TwitterSource, and we also need to supply the four keys for the source: Consumer key, Consumer secret, Access token and Access token secret.
2) The keywords specified under TwitterAgent.sources.Twitter.keywords are the topics/trends you want to download from Twitter.
3) We also specify the sink type as hdfs, along with the HDFS path the data is written to, and a memory channel (MemChannel) that buffers events between the source and the sink.
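Putting these pieces together, the configuration can be generated rather than edited by hand. The Bash sketch below writes the properties from the configuration slide, filling the four keys from the hypothetical TWITTER_* environment variables (assumed to be exported already) and taking the HDFS path and keyword list as arguments. The property values follow the configuration slide; note that transactionCapacity is written as 1000 so that it is not smaller than the sink's hdfs.batchSize of 1000, since the HDFS sink takes up to batchSize events from the channel in a single transaction. The function name write_twitter_conf is hypothetical, and the generated file is made readable only by its owner because it contains the keys.

```shell
#!/usr/bin/env bash
# Write a TwitterAgent configuration to OUT, reading the four keys from the
# (hypothetical) TWITTER_* environment variables.
# Usage: write_twitter_conf OUT HDFS_PATH KEYWORDS
write_twitter_conf() {
  local out="$1" hdfs_path="$2" keywords="$3"
  # Unquoted EOF: the ${...} references below are expanded by the shell.
  cat > "$out" <<EOF
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ${TWITTER_CONSUMER_KEY}
TwitterAgent.sources.Twitter.consumerSecret = ${TWITTER_CONSUMER_SECRET}
TwitterAgent.sources.Twitter.accessToken = ${TWITTER_ACCESS_TOKEN}
TwitterAgent.sources.Twitter.accessTokenSecret = ${TWITTER_ACCESS_TOKEN_SECRET}
TwitterAgent.sources.Twitter.keywords = ${keywords}

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = ${hdfs_path}
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# Not smaller than hdfs.batchSize: the sink takes up to batchSize events per transaction.
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
EOF
  # The file now contains the secrets, so restrict it to its owner.
  chmod 600 "$out"
}
```

For example: `write_twitter_conf /usr/lib/flume/conf/twitter-flume.conf "hdfs://<namenode-host>:8020/user/flume/tweets/" "hadoop, big data, bigdata, mapreduce, flume, TDCH, Sqoop"`, where `<namenode-host>` stands in for the NameNode host that is masked on the slide.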


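1) The "Starting flume" slide gives the command that starts Flume with the agent named TwitterAgent, as configured in the twitter-flume.conf file. Since the command uses relative paths (bin/flume-ng, ./conf/ and conf/twitter-flume.conf), run it from the Flume installation directory, /usr/lib/flume in this setup.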
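The start command uses relative paths (bin/flume-ng, ./conf/), so it only works from the Flume installation directory. A small wrapper makes that explicit. In this Bash sketch the function name start_twitter_agent is hypothetical, FLUME_HOME defaults to /usr/lib/flume as on the slides, and DRY_RUN=1 is a hypothetical switch that prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Start TwitterAgent with the same flume-ng arguments as on the slide.
# The command uses relative paths, so it runs inside FLUME_HOME, in a
# subshell so the caller's working directory is left unchanged.
start_twitter_agent() {
  local flume_home="${FLUME_HOME:-/usr/lib/flume}"
  local cmd=(bin/flume-ng agent --conf ./conf/ -f conf/twitter-flume.conf
             -Dflume.root.logger=INFO,console -n TwitterAgent)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Hypothetical dry-run mode: show what would be executed.
    echo "cd $flume_home && ${cmd[*]}"
    return 0
  fi
  (cd "$flume_home" && "${cmd[@]}")
}
```

Running `DRY_RUN=1 start_twitter_agent` first is a cheap way to confirm the paths before launching the agent for real with `start_twitter_agent`.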


  1. WELCOME
  2. Twitter Data Import into HDFS Using Flume
  3. Agenda
     • Objective
     • Create a Twitter Developer App
     • Download the flume-sources-1.0-SNAPSHOT.jar and configure it
     • Configuration of TwitterAgent
     • Starting Flume to import Twitter data into HDFS
  4. Objective
     • Moving data from Twitter to HDFS using a custom Twitter Source class
  5. Create a Twitter Developer App
     • First, create an application at https://dev.twitter.com/apps/ and then generate the corresponding keys.
  6. Download the flume-sources-1.0-SNAPSHOT.jar
     • Download the jar into /usr/lib/flume/lib using the command below:
         wget -P /usr/lib/flume/lib http://files.cloudera.com/samples/flume-sources-1.0-SNAPSHOT.jar
     • Set FLUME_CLASSPATH to the downloaded jar, for example in /usr/lib/flume/conf/flume-env.sh, which flume-ng reads at startup:
         FLUME_CLASSPATH="/usr/lib/flume/lib/flume-sources-1.0-SNAPSHOT.jar"
  7. Configuration of TwitterAgent
     • Create a twitter-flume.conf file under /usr/lib/flume/conf:

         TwitterAgent.sources = Twitter
         TwitterAgent.channels = MemChannel
         TwitterAgent.sinks = HDFS

         TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
         TwitterAgent.sources.Twitter.channels = MemChannel
         TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
         TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
         TwitterAgent.sources.Twitter.accessToken = <accessToken>
         TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
         TwitterAgent.sources.Twitter.keywords = hadoop, big data, bigdata, mapreduce, flume, TDCH, Sqoop

         TwitterAgent.sinks.HDFS.channel = MemChannel
         TwitterAgent.sinks.HDFS.type = hdfs
         TwitterAgent.sinks.HDFS.hdfs.path = hdfs://****:8020/user/flume/tweets/
         TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
         TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
         TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
         TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
         TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

         TwitterAgent.channels.MemChannel.type = memory
         TwitterAgent.channels.MemChannel.capacity = 10000
         # transactionCapacity must be at least hdfs.batchSize, because the sink
         # takes up to batchSize events from the channel in one transaction
         TwitterAgent.channels.MemChannel.transactionCapacity = 1000
  8. Starting Flume to import Twitter data into HDFS
     • Command to start the Flume agent, which uses the agent TwitterAgent configured in the twitter-flume.conf file (run it from /usr/lib/flume):
         bin/flume-ng agent --conf ./conf/ -f conf/twitter-flume.conf -Dflume.root.logger=INFO,console -n TwitterAgent
  9. Thanks to ALL
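The roll settings on the configuration slide decide when the HDFS sink closes one file and starts the next: rollSize = 0 turns off size-based rolling and rollCount = 10000 rolls after 10,000 events. hdfs.rollInterval is not set there, so Flume's default of 30 seconds still applies, and whichever limit is reached first triggers the roll. The Bash arithmetic sketch below estimates the seconds per file for a steady event rate; the function name roll_after_seconds is hypothetical, and the model ignores bursts, idle timeouts and restarts.

```shell
#!/usr/bin/env bash
# Estimate how many seconds the HDFS sink keeps one file open, for a steady
# rate of RATE events per second (integer > 0).
#   ROLL_COUNT    defaults to 10000 (hdfs.rollCount on the slide)
#   ROLL_INTERVAL defaults to 30 (Flume's hdfs.rollInterval default);
#                 0 means "never roll on time", as in Flume.
# hdfs.rollSize = 0 on the slide, so size-based rolling is not modelled.
# Usage: roll_after_seconds RATE [ROLL_COUNT] [ROLL_INTERVAL]
roll_after_seconds() {
  local rate="$1" roll_count="${2:-10000}" roll_interval="${3:-30}"
  # Time needed to reach rollCount events, rounded up to whole seconds.
  local by_count=$(( (roll_count + rate - 1) / rate ))
  if [ "$roll_interval" -gt 0 ] && [ "$roll_interval" -lt "$by_count" ]; then
    echo "$roll_interval"
  else
    echo "$by_count"
  fi
}
```

For example, `roll_after_seconds 20` prints 30 (the time limit wins), while `roll_after_seconds 1000` prints 10 (the 10,000-event limit wins). Setting hdfs.rollInterval = 0 explicitly would make rollCount the only trigger.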
