1) The main objective is to show how Twitter data gets imported into HDFS using Flume as an intermediate service.
2) Here Twitter acts as the client: it continuously generates a huge amount of data, which is forwarded to the custom Twitter source. The source then sends the event data to the channel, and the channel stores it until the event data is delivered to HDFS by the sink.
1) To work with Twitter data, we first need to create a Twitter application to generate the corresponding keys: the Consumer key, Consumer secret, Access token, and Access token secret. These keys are needed to configure the Twitter agent.
2) You can find instructions on the internet for creating a Twitter developer app. Make sure to keep those keys confidential, since they are tied directly to your Twitter account; that is why I have hidden those four keys here.
1) Now we need to download the Flume sources jar by issuing the command shown above. This jar contains our custom source class along with the other referenced classes used by our configuration.
2) After this, we need to add this jar file to the Flume classpath.
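One common way to do this is either to drop the jar into Flume's lib directory or to export it via FLUME_CLASSPATH in flume-env.sh. The jar name and paths below are illustrative and may differ in your setup:

```shell
# Option 1: copy the custom source jar into Flume's lib directory
# (jar name is illustrative; use the one you actually downloaded)
cp flume-sources-1.0-SNAPSHOT.jar $FLUME_HOME/lib/

# Option 2: reference it in $FLUME_HOME/conf/flume-env.sh
export FLUME_CLASSPATH="/path/to/flume-sources-1.0-SNAPSHOT.jar"
```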
1) Now it’s time to configure the Twitter agent. In this configuration we need to specify the information about the source, channel, and sink. Here the source type is com.cloudera.flume.source.TwitterSource. We also need to specify all of the source's key information: the Consumer key, Consumer secret, Access token, and Access token secret.
2) The keywords specified under TwitterAgent.sources.Twitter.keywords correspond to the topics/trends you want to download from Twitter.
3) We also specify the sink type as hdfs, along with the HDFS path to which the data should be written.
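Putting the points above together, a twitter-flume.conf along these lines is typical. The four key values are placeholders for your own credentials, and the keywords, HDFS path, and sizing numbers are illustrative, not the exact values used here:

```properties
# Name the components of TwitterAgent
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Custom Twitter source with its four credential keys (placeholders)
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access-token-secret>
# Topics/trends to download (illustrative)
TwitterAgent.sources.Twitter.keywords = hadoop, bigdata

# HDFS sink; path is illustrative
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Memory channel that buffers events between source and sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
```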
1) Here is the command to start the Flume agent, which runs the agent named TwitterAgent as configured in the twitter-flume.conf file.
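The start command generally takes the following form; the paths are illustrative and depend on where Flume is installed and where the config file lives:

```shell
# Start the agent named TwitterAgent using twitter-flume.conf;
# --name must match the agent name used in the config file
$FLUME_HOME/bin/flume-ng agent \
  --conf $FLUME_HOME/conf \
  --conf-file twitter-flume.conf \
  --name TwitterAgent \
  -Dflume.root.logger=INFO,console
```

The -Dflume.root.logger option is optional; it prints the agent's log output to the console, which is handy for confirming that events are flowing into HDFS.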