Analyze Twitter Data with Hortonworks Hadoop
Intermediate Project Report
Bharat Khanna
UNIVERSITY AT BUFFALO
11/18/2015

Twitter sentiment analysis project report


Project report for a Twitter sentiment analysis in which the tweets are collected with Apache Flume and analysed with Hive.

I intend to address the following questions:

• How can raw tweets be used to find an audience’s perception of, or sentiment about, a person?

• How can Hadoop be used to solve this problem?

• How can Apache Hive be used to organize the final data in a tabular format and query it?

• How can a data visualization tool be used to display the findings?

Published in: Technology

Sentiment Analysis of Mr. Narendra Modi’s Brand Image using Twitter Data

Summary: - I am doing a sentiment analysis of Mr. Narendra Modi’s brand image across different nations using data from Twitter. To fetch the Twitter data I am using Apache Flume, which is open source and comes installed by default in the Hortonworks Sandbox platform 1.3. The fetched data is loaded directly into HDFS (Hadoop Distributed File System), which avoids the extra overhead of transferring it from the local system to HDFS.

The data loaded in HDFS is still unstructured and not suited to ad-hoc analysis, so I will convert the JSON data to a tabular format and store it in Hive. I will also provide a graphical user interface so that end users can run their own ad-hoc analysis.

The next step uses a dictionary file to score the sentiment of each tweet by comparing the number of positive words with the number of negative words, and then assigns a positive, negative or neutral sentiment value to each tweet. I downloaded the dictionary file from the link below.

Click here for Dictionary

The last part of the project is to show the results of the sentiment analysis as visualizations, for which I will use Tableau. I will connect Tableau to Hive using the Hortonworks ODBC Driver, which I downloaded from the Hortonworks website (link in the References section). I will show the results as graphs and maps using Tableau’s built-in VizQL server.

Data sets and Software:

Sentiment Data: - Sentiment data is unstructured data that represents the opinions, emotions and attitudes contained in sources such as social media posts, online blogs and product reviews.

Why use sentiment data: - Organizations use sentiment data to learn what people feel about their product and what they can do to market it effectively.

How did I fetch Twitter data: - I created a Twitter app, configured flume.conf with the app credentials and ran Flume.
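The dictionary-based scoring described in the summary can be sketched roughly as follows. This is only an illustration in Python, not the code used in the project: the two tiny word sets and the function name `score_tweet` are hypothetical stand-ins for the downloaded dictionary file and for whatever Hive logic ends up doing the scoring.

```python
import re

# Hypothetical miniature word lists; the project uses a downloaded
# dictionary file instead of these hard-coded sets.
POSITIVE_WORDS = {"good", "great", "love", "proud"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "fail"}


def score_tweet(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Label a tweet by comparing its positive and negative word counts."""
    # Lower-case the text and split it into simple alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if w in positive)
    neg = sum(1 for w in words if w in negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal counts, including zero matches on both sides
```

A tweet with as many positive as negative words, or with no dictionary words at all, falls into the neutral bucket under this rule.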
All the steps for fetching data from Twitter using Apache Flume are covered in a YouTube video and a slide deck, linked below. I have also uploaded the video to the UBlearns discussion forum of DC.

YouTube: - https://youtu.be/E1w5SkE7Cco
SlideShare: - http://www.slideshare.net/bharat3khanna/extracting-twitter-data-using-apache-flume

Source code for Flume-Snapshot.jar: - I downloaded the source code of Flume-snapshot.jar from GitHub and built the jar with Maven's package goal on the Hadoop cluster.
Click here for Flume Source Code

Size of Data: - Although there is no limit on the amount of data I can get from Twitter, for this project I am going to run my analysis on approximately 100 MB of data.

Algorithms Used: - I am not using a MapReduce algorithm here, since I want to analyse the complete data and do not want to use aggregated measures. If I had used MapReduce, much of my data would have been aggregated by the reducer. My source data is in JSON format, and I am using Hive-serde.jar (SerDe stands for serializer and deserializer), which helps parse the JSON data into Hive tables.

Source code for Hive-serde.jar: - I downloaded the source code of Hive-serde.jar from GitHub and built the jar with Maven's package goal on the Hadoop cluster.

Click here for Hive-serde.jar source code

Analysis to be done on Twitter data: - I am going to do the following analysis using Hive and Tableau:
a) Maximum tweet count per user.
b) Count of retweets.
c) Geographic mapping of people’s sentiments towards Mr. Modi.

References: -
http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop
https://github.com/cloudera/cdh-twitter-example
https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
http://hortonworks.com/products/releases/hdp-1-3/#add_ons
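As a small illustration of the JSON-to-tabular step and of analyses (a) and (b), the sketch below flattens a few tweet records into rows and counts tweets per user and retweets. It is written in Python rather than Hive, purely to show the idea. The three sample records are made up, and the sketch assumes that each record carries `text`, `user.screen_name` and, for retweets, a `retweeted_status` object; the data actually collected may differ.

```python
import json
from collections import Counter

# Made-up sample records, one JSON document per line.
SAMPLE_LINES = [
    '{"text": "Great visit", "user": {"screen_name": "alice"}}',
    '{"text": "RT Great visit", "user": {"screen_name": "bob"}, '
    '"retweeted_status": {"id": 1}}',
    '{"text": "Another tweet", "user": {"screen_name": "alice"}}',
]


def flatten(line):
    """Turn one JSON tweet into a flat row (a dict of column values)."""
    tweet = json.loads(line)
    return {
        "screen_name": tweet["user"]["screen_name"],
        "text": tweet["text"],
        # Assumption: a retweet is marked by the presence of this key.
        "is_retweet": "retweeted_status" in tweet,
    }


rows = [flatten(line) for line in SAMPLE_LINES]

# (a) Maximum tweet count per user.
tweets_per_user = Counter(row["screen_name"] for row in rows)
top_user, top_count = tweets_per_user.most_common(1)[0]

# (b) Count of retweets.
retweet_count = sum(row["is_retweet"] for row in rows)

print(top_user, top_count, retweet_count)
```

In the project itself the same counts would come from queries over the Hive table that the SerDe exposes.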
