Mike davies sentiment_analysis_presentation_backup



My Michaelmas fourth year presentation on a CUED fourth year project: Sentiment Analysis.

Presentation Transcript

  • Sentiment Analysis
    1. Discover a niche network of Twitter users
    2. Model their emotions on topics
    3. Use feelings to more accurately predict a time series, e.g. the stock market or box office success
    4. Are some [users/networks] more influential than others?
  • This Talk
    The Design Decision
    The Core Goals
    The 3 parts of the project:
    1. Classifying the SENTIMENT of tweets
    2. Building a NETWORK of Twitter users
    3. Finding a TIME SERIES of sentiment for each user
  • Sentiment Analysis Used Already
    Derwent Capital Markets - "The Twitter hedge fund"
    £25m fund
    10% of tweets predicts Dow Jones movement direction with 87.6% accuracy
    Returned 1.85% in its first month of trading
    Johan Bollen, Indiana University, used a bag-of-words approach
  • Sentiment Analysis Used Already Product reviews / ratings
  • Sentiment Analysis Used Already Social Media Analytics
  • Design Decision
    Many paragraphs of text (product reviews)
    + : Better accuracy of prediction
    - : Less data overall
    Huge number of short texts (Twitter)
    + : Opinions of a greater number of people, at high enough frequency to model as a signal
    - : Classification of opinion is very poor
    => TWITTER
  • 2 Current Aims (will change later)
    1. Project aims to be context independent (e.g. movies & products)
    2. When context is given, use it to better classify tweets
  • 1: Sentiment Analysis of Tweets Three-tier classification process: tweet spam not spam objective subjective positive negative
  • 1: Sentiment Analysis of Tweets Double-Back Propagation Algorithm  ACL Journal, March 2011, MIT Press  Opinion Word Extraction & Target Extraction  4 rules  ”The phone has a good screen” => add ”good” to list of adjectives => add ”screen” to list of nouns  Etc.  Great for rating features of a product  Not great for tweets
  • 1: Sentiment Analysis of Tweets Twitter Part Of Speech (POS) tagger: www.ark.cs.cmu.edu/TweetNLP/ Written in java " ^ Drive ^ Max Ent " ^ , , go V and & watch V it O ! , Fantastic A movie N . ,
  • Bootstrapped Tweet SA improver
    [Diagram. Labels: Tweet (repeated), IMDB Movie Review Corpora, Sentiment Analysis, Double-Back Prop. Algo, "Gives useful adjectives, nouns"]
  • 2: Building a Network
    Collected my Twitter friends, friends of friends, friends of friends of friends.
    => 115,896 users
  • 2: Building a Network
  • 2: Building a Network Community detection:  Paper 1: Near linear time algorithm for detecting community structures on large scale networks  Paper 2: An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks Haizheng Zhang
  • 2: Building a Network  Like MapReduce  Instead of ”map” and ”reduce”  Map = Update: modify overlapping sets of data  Reduce = Sync: perform reductions in the background while sync is running  Label Propagation & LDA
  • 3: Time series prediction
    Will get time series from Python to R using the rpy2 module.
    R has a great package, "quantmod", for importing financial market data.
    Can also import other time series very easily, and R has many great libraries.
  • Built With
    Python - For majority of code
      Packages: numpy, scipy, matplotlib, networkx, graphviz, rpy2, django, twython, nltk
    R - For time series analysis
    PostgreSQL - SQL database
    Java - Twitter POS tagger
    C/C++ - GraphLab
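Before any series can be handed to R, the classified tweets have to be turned into a per-user time series. The sketch below shows one plausible way to do that on the Python side, assuming each tweet has already been scored +1 or -1; the daily mean, the function name and the sample data are all assumptions, and the rpy2 hand-off itself is not shown.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_sentiment(tweets):
    """Average the sentiment scores per user and calendar day.

    `tweets` is an iterable of (user, timestamp, score) tuples.
    Returns {user: [(day, mean_score), ...]} with days in ascending order.
    """
    buckets = defaultdict(list)
    for user, timestamp, score in tweets:
        buckets[(user, timestamp.date())].append(score)

    series = defaultdict(list)
    for (user, day), scores in sorted(buckets.items()):
        series[user].append((day, sum(scores) / len(scores)))
    return dict(series)


# Invented sample data for illustration.
tweets = [
    ("alice", datetime(2011, 11, 1, 9, 30), 1),
    ("alice", datetime(2011, 11, 1, 18, 5), -1),
    ("alice", datetime(2011, 11, 2, 12, 0), 1),
    ("bob", datetime(2011, 11, 1, 8, 15), -1),
]
series = daily_sentiment(tweets)
print(series["alice"])
```

A daily mean is only one choice of aggregation; a finer time bucket or a tweet-count weighting would fit the same structure.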
  • End Product
    [Diagram. Labels: IMDB Movie Review Corpora, Tweet (repeated), Sentiment Analysis, Double-Back Prop. Algo]
  • Thank You
    Mike Davies
    Documented at www.m1ked.com
  • Notes: Vowpal Wabbit LDA
    Vowpal Wabbit is an open source library for fast online learning (mostly SGD), mainly developed by a guy at Yahoo.
    Optimised for speed: its LDA uses clever tricks like vectorisation and the floating point representation to avoid using the pow() and exp() functions.
  • Notes: Label Propagation
    Label Propagation has been proven to be an effective semi-supervised learning approach in many applications. The key idea behind label propagation is to first construct a graph in which each node represents a data point and each edge is assigned a weight, often computed as the similarity between data points, then propagate the class labels of labeled data to neighbors in the constructed graph in order to make predictions.
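The semi-supervised idea described in these notes can be sketched for the two-class case. This is a minimal illustration under stated assumptions: labels are encoded as +1.0 and -1.0, labelled points are held fixed, and every unlabelled point repeatedly takes the similarity-weighted average of its neighbours' scores. The graph, weights and function name are invented.

```python
def propagate_scores(weights, seeds, iterations=100):
    """Spread seed labels over a weighted graph.

    weights: {node: {neighbour: similarity}}, symmetric, one key per node.
    seeds:   {node: +1.0 or -1.0} for the labelled data points.
    Returns a score per node; its sign is the predicted class.
    """
    scores = {node: seeds.get(node, 0.0) for node in weights}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            if node in seeds:
                updated[node] = seeds[node]  # labelled points stay clamped
                continue
            total = sum(neighbours.values())
            updated[node] = (
                sum(w * scores[n] for n, w in neighbours.items()) / total
                if total else 0.0
            )
        scores = updated
    return scores


# Chain a - b - c - d: strong similarity at the ends, weak in the middle.
weights = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.1},
    "c": {"b": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
seeds = {"a": 1.0, "d": -1.0}  # a is labelled positive, d negative
scores = propagate_scores(weights, seeds)
print({node: "positive" if s > 0 else "negative" for node, s in scores.items()})
```

The weak b-c edge means b follows its strongly similar neighbour a and c follows d, so the predicted labels split the chain in the middle.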