Mike davies sentiment_analysis_presentation_backup


My Michaelmas fourth year presentation on a CUED fourth year project: Sentiment Analysis.

Published in: Spiritual, Technology, Business

  1. 1. Sentiment Analysis: (1) Discover a niche network of Twitter users; (2) Model their emotions on topics; (3) Use feelings to more accurately predict a time series, e.g. the stock market, e.g. box office success; (4) Are some [users/networks] more influential than others?
  2. 2. This Talk: The Design Decision; The Core Goals; The 3 parts of the project: (1) Classifying the SENTIMENT of tweets; (2) Building a NETWORK of Twitter users; (3) Finding a TIME SERIES of sentiment for each user
  3. 3. Sentiment Analysis Used Already: Derwent Capital Markets, "The twitter hedgefund": £25m fund; 10% of tweets predict Dow Jones movement direction with 87.6% accuracy; returned 1.85% in its first month of trading. Johan Bollen, Indiana University, used a bag-of-words approach.
  4. 4. Sentiment Analysis Used Already: Product reviews / ratings
  5. 5. Sentiment Analysis Used Already: Social Media Analytics
  6. 6. Design Decision. Many paragraphs of text (Product Reviews): + Better accuracy of prediction; - Less data overall. Huge amount of small quantities of text (Twitter): + Opinions of a greater number of people, at high enough frequency to model as a signal; - Classification of opinion is very poor. => TWITTER
  7. 7. 2 Current Aims (will change later): (1) Project aims to be context independent (i.e. Movies & products); (2) When context is given, use it to better classify tweets
  8. 8. 1: Sentiment Analysis of Tweets. Three-tier classification process: tweet => spam / not spam; not spam => objective / subjective; subjective => positive / negative
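
The three tiers on this slide can be read as a cascade in which each tweet stops at the first decisive stage. The sketch below illustrates that control flow only. The keyword lists, the `classify` function and the two-marker spam threshold are hypothetical stand-ins; a real system would presumably use a trained classifier at each tier.

```python
# Hypothetical keyword lists, for illustration only (not from the slides).
SPAM_MARKERS = {"free", "click", "win"}
POSITIVE_WORDS = {"good", "great", "fantastic", "love"}
NEGATIVE_WORDS = {"bad", "awful", "boring", "hate"}


def tokens(tweet: str) -> list[str]:
    """Lowercase the tweet and strip surrounding punctuation from each word."""
    return [word.strip(".,!?\"'") for word in tweet.lower().split()]


def classify(tweet: str) -> str:
    """Run the three tiers in order, stopping at the first decisive one."""
    words = set(tokens(tweet))
    # Tier 1: spam vs. not spam (assumed rule: two or more spam markers).
    if len(words & SPAM_MARKERS) >= 2:
        return "spam"
    # Tier 2: objective vs. subjective (no opinion words means objective).
    positive = len(words & POSITIVE_WORDS)
    negative = len(words & NEGATIVE_WORDS)
    if positive == 0 and negative == 0:
        return "objective"
    # Tier 3: positive vs. negative; ties fall to "negative", an arbitrary choice.
    return "positive" if positive > negative else "negative"
```

For example, `classify("Fantastic movie, go and watch it!")` passes the spam and objectivity tiers and is labelled in the third.
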
  9. 9. 1: Sentiment Analysis of Tweets. Double-Back Propagation Algorithm • ACL Journal, March 2011, MIT Press • Opinion Word Extraction & Target Extraction • 4 rules • "The phone has a good screen" => add "good" to list of adjectives => add "screen" to list of nouns • Etc. • Great for rating features of a product • Not great for tweets
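
The "good screen" example on this slide can be turned into a small propagation loop: known opinion words reveal new targets, and known targets reveal new opinion words, until nothing changes. This is a simplified sketch and not the published algorithm. It uses a single adjacency rule (an adjective directly followed by a noun) in place of the four rules the slide mentions, and the sentences and tags are hand-written for illustration.

```python
# Hand-tagged toy sentences as (word, tag) pairs: "A" = adjective, "N" = noun,
# "O" = anything else. Tags are written by hand, not produced by a tagger.
SENTENCES = [
    [("the", "O"), ("phone", "N"), ("has", "O"), ("a", "O"),
     ("good", "A"), ("screen", "N")],
    [("a", "O"), ("sharp", "A"), ("screen", "N")],
    [("sharp", "A"), ("camera", "N")],
]


def adjective_noun_pairs(sentence):
    """Yield (adjective, noun) for each adjective directly followed by a noun."""
    for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
        if tag == "A" and next_tag == "N":
            yield word, next_word


def propagate(sentences, seed_opinion_words):
    """Grow the opinion-word and target sets until neither changes."""
    opinion_words = set(seed_opinion_words)
    targets = set()
    changed = True
    while changed:
        changed = False
        for sentence in sentences:
            for adjective, noun in adjective_noun_pairs(sentence):
                # A known opinion word modifies a noun: the noun is a target.
                if adjective in opinion_words and noun not in targets:
                    targets.add(noun)
                    changed = True
                # An unknown adjective modifies a known target: new opinion word.
                if noun in targets and adjective not in opinion_words:
                    opinion_words.add(adjective)
                    changed = True
    return opinion_words, targets


opinion_words, targets = propagate(SENTENCES, {"good"})
```

Starting from the single seed "good", the loop picks up "screen" as a target, then "sharp" as an opinion word, then "camera" as a further target.
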
  10. 10. 1: Sentiment Analysis of Tweets. Twitter Part Of Speech (POS) tagger: written in Java; Max Ent. Example output (token/tag): "/^ Drive/^ "/^ ,/, go/V and/& watch/V it/O !/, Fantastic/A movie/N ./,
  11. 11. Bootstrapped Tweet SA improver [Diagram; labels: Tweet (repeated), IMDB Movie Review Corpora, Sentiment Analysis, Double-Back Prop. Algo, "Gives useful adjectives, nouns"]
  12. 12. 2: Building a Network. Collected my Twitter friends, friends of friends, friends of friends of friends. => 115,896 users
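
Collecting friends, friends of friends, and friends of friends of friends amounts to a breadth-first crawl limited to three hops. The sketch below shows that traversal on a toy graph. The `FRIENDS` dictionary and the `collect_users` function are hypothetical; in the project the friend lists would come from Twitter rather than a hard-coded mapping.

```python
from collections import deque

# Hypothetical friend lists standing in for responses from Twitter.
FRIENDS = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol", "dave"],
    "carol": ["erin"],
    "erin": ["frank"],
}


def collect_users(start, friends, max_depth=3):
    """Breadth-first crawl returning {user: hops from start}, up to max_depth."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand beyond friends of friends of friends
        for friend in friends.get(user, []):
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return depth


users = collect_users("me", FRIENDS)
```

Here "erin" is reached at three hops and kept, while "frank" would be a fourth hop and is never visited.
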
  13. 13. 2: Building a Network
  14. 14. 2: Building a Network. Community detection: • Paper 1: Near linear time algorithm to detect community structures in large-scale networks • Paper 2: An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks, Haizheng Zhang
  15. 15. 2: Building a Network • Like MapReduce • Instead of "map" and "reduce": • Map = Update: modify overlapping sets of data • Reduce = Sync: perform reductions in the background while sync is running • Label Propagation & LDA
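
Label propagation for community detection can be sketched in a few lines: every node starts with its own label and repeatedly adopts the label most common among its neighbours, and nodes that end up sharing a label form a community. Each per-node step reads a node together with its neighbours, which is loosely the kind of overlapping update described above. This is an illustrative, single-threaded sketch on a toy graph, not the project's code. For reproducibility it visits nodes in sorted order and breaks ties by taking the largest label, whereas the algorithm is usually described with random ordering and random tie-breaking.

```python
from collections import Counter

# Toy graph: two triangles joined by the single edge c-d.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def detect_communities(adjacency, max_passes=100):
    """Propagate labels until a full pass changes nothing, then group nodes."""
    labels = {node: node for node in adjacency}  # every node starts alone
    for _ in range(max_passes):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[node])
            best = max(counts.values())
            # Deterministic tie-break (an assumption made for this sketch).
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=min)


communities = detect_communities(build_adjacency(EDGES))
```

On this graph the two triangles settle into separate communities after the first pass, and the second pass confirms that no label changes.
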
  16. 16. 3: Time series prediction. Will get time series from Python to R using the rpy2 module. R has a great package, "quantmod", for importing financial market data. It can also import other time series very easily, and R has many other great libraries.
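
Before any series can be handed to R, the classified tweets have to be turned into one sentiment value per user per time step. The sketch below shows one plausible way to do that on the Python side, assuming daily buckets and a mean of +1/-1 scores. The bucket size, the scoring scheme, the sample data and the `daily_sentiment` function are all assumptions, not details from the slides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical classified tweets: (user, day, sentiment), where sentiment is
# +1 for a positive tweet and -1 for a negative one.
TWEETS = [
    ("alice", date(2011, 11, 1), +1),
    ("alice", date(2011, 11, 1), -1),
    ("alice", date(2011, 11, 1), +1),
    ("alice", date(2011, 11, 2), -1),
    ("bob", date(2011, 11, 1), +1),
]


def daily_sentiment(tweets):
    """Return {user: [(day, mean sentiment), ...]} with days in order."""
    buckets = defaultdict(list)
    for user, day, sentiment in tweets:
        buckets[(user, day)].append(sentiment)
    series = defaultdict(list)
    # Sorting the (user, day) keys keeps each user's days chronological.
    for (user, day), scores in sorted(buckets.items()):
        series[user].append((day, sum(scores) / len(scores)))
    return dict(series)


series = daily_sentiment(TWEETS)
```

Each user's list of `(day, value)` pairs is then in a shape that could be passed on for time series analysis.
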
  17. 17. Built With: Python - for majority of code; packages: numpy, scipy, matplotlib, networkx, graphviz, rpy2, django, twython, nltk. R - for time series analysis. PostgreSQL - SQL database. Java - Twitter POS tagger. C/C++ - GraphLab
  18. 18. End Product [Diagram; labels: IMDB Movie Review Corpora, Tweet (repeated), Sentiment Analysis, Double-Back Prop. Algo]
  19. 19. Thank You • Mike Davies • Documented at
  20. 20. Notes: Vowpal Wabbit LDA. Vowpal Wabbit is an open source library for fast online learning (mostly SGD), mainly developed by a guy at Yahoo. It is optimised for speed: its LDA uses clever tricks like vectorisation and floating point representation to avoid using the pow() and exp() functions.
  21. 21. Notes: Label Propagation. Label Propagation has been proven to be an effective semi-supervised learning approach in many applications. The key idea behind label propagation is to first construct a graph in which each node represents a data point and each edge is assigned a weight, often computed as the similarity between data points, then propagate the class labels of labeled data to neighbors in the constructed graph in order to make predictions.
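
The description above maps onto a short loop: labelled nodes keep their class, and every unlabelled node repeatedly takes the similarity-weighted average of its neighbours' class scores. The following is a minimal sketch under assumed data. The graph, its weights, the seed labels and the `propagate_labels` function are hypothetical, and the fixed iteration count is a simplification of a proper convergence check.

```python
# Hypothetical similarity graph: WEIGHTS[u][v] is the similarity of u and v.
WEIGHTS = {
    "t1": {"t2": 0.9, "t3": 0.1},
    "t2": {"t1": 0.9, "t3": 0.2},
    "t3": {"t1": 0.1, "t2": 0.2, "t4": 0.8},
    "t4": {"t3": 0.8},
}
SEEDS = {"t1": "positive", "t4": "negative"}  # the labelled data points


def propagate_labels(weights, seeds, iterations=50):
    """Spread seed labels over the graph and return {node: predicted class}."""
    classes = sorted(set(seeds.values()))
    # scores[node][cls] starts at 1.0 for a seed's own class and 0.0 elsewhere.
    scores = {
        node: {cls: 1.0 if seeds.get(node) == cls else 0.0 for cls in classes}
        for node in weights
    }
    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            if node in seeds:
                updated[node] = scores[node]  # labelled nodes stay clamped
                continue
            total = sum(neighbours.values())
            # Weighted average of the neighbours' scores from the last round.
            updated[node] = {
                cls: sum(w * scores[m][cls] for m, w in neighbours.items()) / total
                for cls in classes
            }
        scores = updated
    return {node: max(classes, key=lambda cls: scores[node][cls]) for node in weights}


labels = propagate_labels(WEIGHTS, SEEDS)
```

In this toy graph "t2" is most similar to the positive seed and "t3" to the negative one, so each inherits the label of its strongest neighbour.
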