Sentiment Analysis
Yasen Kiprov
PhD Student, Intelligent Systems
R&D Engineer, NLP
AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● Engineering approach
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
Natural Language Processing
● Enables interaction between computers and
humans through natural languages
or “The branch of information science that deals
with natural language information”
● Natural language understanding - enabling
computers to derive meaning from human input
● Natural language generation
(Not neuro linguistic programming, still some magic applies)
NLP is everywhere
Google translate
Google ads
Google search
Siri / Question Answering
Chat bots
Spam generation / spam filtering
Gene and protein detection
Surveillance / marketing
Text Classification
● Automatically assign a piece of text to one or
more classes.
● History: Guess the author based on text
specifics and author style
1901: “One author prefers 'em' as a short for
'them' - let's use this as a feature!”
1970s: Who wrote “The Federalist Papers”?
Text Classification
● Spam or not spam
● News analysis: politics, sports, business
● Google ads verticals
26 root categories, 2200 subcategories
● Terrorist or not
Yes, they read your Facebook and yes, they know...
Also Text Classification
● Detect truth / lie / sarcasm / joke
● Determine medical condition from hospital
records, patient description
● Guess stock prices
● “How will this press release affect the company's share price?”
● Sentiment analysis
Sentiment Analysis
● Determining writer's attitude
● Overall document: positive / negative / neutral
“We totally enjoyed our stay there!”
● Towards a target:
“Battery sucks, bends really well though”
● Detecting emotions: sad, happy, angry, excited
● Scales:
● Number of stars / -10 to +10 / percentage
● Subjective vs Objective
Classification for engineers
● Why bother with AI, keep it simple:
IF text contains “ em ”
AND NOT text contains “ them ”
author is X
ELSE author is Y
● But what if...
Classification for engineers
● If author X decided to use “them” once?
Let's try a list of words that only author X uses
IF text contains a word from listX
author is X
ELSE try other rules
Find all the features !!!
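The word-list rule above fits in a few lines of Python; the list contents here are hypothetical stand-ins for words only author X uses:

```python
# Hypothetical word list for author X; in practice this is built by hand.
LIST_X = {"em", "whilst", "betwixt"}

def classify_author(text):
    """Return 'X' if the text contains any word from LIST_X, else 'Y'."""
    words = set(text.lower().split())
    return "X" if words & LIST_X else "Y"
```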
Classification for engineers
● Build a super smart system of if-else
statements to classify correctly each document
● Solving the problem algorithmically
● An “expert system”
● Still used in practice for many applications
● Twitter “sentiment analysis” only rule: if text contains :) or :(
When to do engineering
● For very narrow tasks
● Determine if text is a URL or email address
● For a very specific domain
● “If text contains a name of any US president, it's legislation”
● To create a proof-of-concept
● Twitter “sentiment analysis” only rule: if text contains :) or :(
● When it's hard to get enough data (explained later)
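The emoticon-only proof-of-concept mentioned above is a single function:

```python
def emoticon_sentiment(tweet):
    """Proof-of-concept Twitter sentiment: one rule, two emoticons."""
    if ":)" in tweet:
        return "positive"
    if ":(" in tweet:
        return "negative"
    return "unknown"
```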
AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
Supervised learning - Regression
“In statistics, regression
analysis is a statistical process
for estimating the relationships
among variables.”
● Create a hypothesis function based on the blue dots
● When a new X appears, calculate Y
The graph: X values are features, Y values are target values.
Linear Regression Example
● Let X be temperature
● Let Y be chance of rain
Create a function that predicts chance of rain, given temperature
(In reality X is a vector with many feature values)
Linear Regression Maths
● Hypothesis (equation of a line): hθ(x) = θ0 + θ1·x
● Parameters: θ0, θ1
● Cost function: J(θ0, θ1) = 1/(2m) · Σi (hθ(x(i)) − y(i))²
● Goal: minimize J(θ0, θ1)
● Gradient descent step: θj := θj − α · ∂J(θ0, θ1)/∂θj (update θ0 and θ1 simultaneously)
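As a sketch, batch gradient descent for the one-feature case follows directly from the step rule; the learning rate and step count below are arbitrary choices:

```python
def fit_linear(xs, ys, alpha=0.01, steps=5000):
    """Fit y ≈ t0 + t1*x by batch gradient descent on squared error."""
    t0 = t1 = 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters.
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        # Simultaneous update of both parameters.
        t0 -= alpha * sum(errs) / m
        t1 -= alpha * sum(e * x for e, x in zip(errs, xs)) / m
    return t0, t1
```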
Supervised Learning -
Classification
“identifying to which of a set of
categories a new observation
belongs, on the basis of a training
set of data containing observations
(or instances) whose category
membership is known.”
Given a set of training instances, predict a discrete
class label for new ones.
The graph: x1 and x2 are features, dot color is the target class.
Classification Example
● Let X1 be temperature
● Let X2 be humidity
Create a function that predicts rain or no rain.
(In reality X is a vector with many feature values)
2D Example
● Let X be humidity
● Let Y = 0 for no rain
● Let Y = 1 for rain
A linear hypothesis function doesn't fit a 0/1 target well.
The logistic function approximates it much better.
Logistic Regression
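A minimal sketch of the logistic hypothesis; the rain weights below are made up for illustration:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output is in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_rain(humidity, t0=-9.0, t1=0.12):
    """Hypothetical weights: probability of rain given humidity in percent."""
    return sigmoid(t0 + t1 * humidity)
```

Predict rain when the output crosses 0.5 - high humidity pushes the score up, low humidity pulls it toward zero.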
AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
Agenda Explained
● Until now:
● What is text classification
● What is supervised learning (classification)
● Up next:
● How to apply supervised learning to text?
Statistical Sentiment Analysis
● Document: A piece of text
● Corpus: Set of documents
● Target: Y, positive/negative, emotion, percentage
● Training corpus: Set of documents for which we know Y
● What is X?
● How to convert a document to a (real-valued) vector?
● Building a training corpus
● Finding “enough” data
Defining Features
● Each word: one-hot vector
● I = [0, 0, 0, 1, 0, 0, 0, …, 0]
● like = [1, 0, 0, 0, 0, 0, 0, …, 0]
● cookies = [0, 0, 0, 0, 0, 0, 1, …, 0]
● Number of dimensions = size of vocabulary
● Document: bag of words
● Order of words is lost
● Count of words can be added
● Term frequency / inverse document frequency
"I like cookies" = [1, 0, 0, 1, 0, 0, 1, …, 0]
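A minimal bag-of-words vectorizer, using a toy vocabulary ordered to match the example above (real vocabularies have tens of thousands of entries):

```python
def bag_of_words(doc, vocabulary):
    """Map a document to a binary vector over a fixed vocabulary.
    Word order is lost; only presence is recorded."""
    words = set(doc.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

# Toy vocabulary; index 0 is "like", index 3 is "i", index 6 is "cookies".
vocab = ["like", "hate", "the", "i", "bad", "good", "cookies"]
```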
Feature Engineering
● Ngrams (as one-hot)
● I, like, cookies - unigrams
● “I like” = [0, 0, 0, 0, 1, 0, …, 0] - bigrams
● “I like cookies” - trigrams
● Character n-grams:
● li, ik, ke, lik, ike
● Dictionaries:
● Great value for sentiment analysis
● Very good for domain specific text
If document contains any of:
{love, like, good, cool}
add this one: [0, 0, 1, 0, …, 0]
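Word and character n-gram extraction is a short sketch:

```python
def ngrams(tokens, n):
    """All consecutive n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def char_ngrams(word, n):
    """All consecutive n-character substrings of a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]
```

Each resulting n-gram then gets its own one-hot dimension, exactly like single words.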
Feature Engineering
● Simple features
● Document Length
● Emoticons
● elooongated words
● ALL-CAPS
● Stopwords
● Through other classification methods:
● Parts of speech
● Negation contexts “I don't like cookies”
● Named Entities
● Approximate dimensions of X: 100k – 10m
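A sketch of a few of the simple surface features, using regular expressions; the exact patterns are illustrative choices:

```python
import re

def surface_features(text):
    """Simple surface features: length, emoticons, elongation, all-caps."""
    return {
        "length": len(text),
        "has_smiley": ":)" in text,
        # A character repeated 3+ times in a row: 'sooo', 'loool'.
        "elongated": bool(re.search(r"(\w)\1{2,}", text)),
        # Words written entirely in capitals, at least two letters long.
        "all_caps_words": len(re.findall(r"\b[A-Z]{2,}\b", text)),
    }
```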
Work Process
● Assemble training corpus
● Separate test corpus
● Invent new features
● Generate model (supervised learning)
● Test performance
● Repeat
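The first two steps - assembling a corpus and holding out a separate test set - amount to a shuffled split; the ratio and seed below are arbitrary choices:

```python
import random

def train_test_split(corpus, test_ratio=0.25, seed=0):
    """Shuffle labelled documents and split them into train and test sets."""
    data = corpus[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)     # fixed seed keeps the split reproducible
    cut = int(len(data) * (1 - test_ratio))
    return data[:cut], data[cut:]
```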
Tips & Tricks
● Performance is usually measured as
● precision / recall / accuracy / F-measure
● Simple Machine Learning with tons of features
● Even a linear classifier works
● Marketing
● Everyone uses a different corpus (accuracies can't be compared)
● Showing only what you're sure about
● Generalizing: “overall, 70% of your customers like you”
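The four metrics above can be computed from matched gold/predicted label lists; a minimal sketch for one positive class:

```python
def prf(gold, predicted, positive="pos"):
    """Precision, recall and F1 for one class, plus overall accuracy."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return precision, recall, f1, accuracy
```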
AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
A.I. - Why is it not working?
“Algorithmically solvable: A decision problem that can be
solved by an algorithm that halts on all inputs in a finite
number of steps.”
“Unsolvable problem: A problem that cannot be solved for
all cases by any algorithm whatsoever.”
● Artificial Intelligence: Develop intelligent systems, deal with
real world problems. It works... kind of...
- “Siri, will you marry me?”
- “My End User License Agreement does not cover marriage.
My apologies”
Challenges
● Annotation Guidelines
● Inter-annotator agreement
● SemEval
● Sentiment analysis corpus (~14k tweets)
● For 40% of tweets annotators didn't agree
“I don't know half of you half as well as I should like; and I like less
than half of you half as well as you deserve.”
Bilbo Baggins
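Inter-annotator agreement is commonly reported as Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance; a minimal two-annotator sketch:

```python
def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement: product of each annotator's label frequencies.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)
```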
Still not convinced?
● Context issues
● Narrowing the domain helps
● “beer is cool”, “soup is cool”
● “No babies yet!” - condoms / fertility drugs
● “Obama goes full Bush on Syria”
● User generated content SUCKS!
● “Polynesian sauce from chik fila a be so bomb”
● Common sense
“I tried the banana slicer and found it unacceptable. […] the
slicer is curved from left to right. All of my bananas are bent
the other way.”
AGENDA
● Introduction to NLP
● Text Classification & Sentiment Analysis
● How it's done (by engineers)
● Supervised Machine Learning
● Linear & Logistic Regression
● Sentiment analysis for statisticians
● Why is it not working (Discussion)
● Bonus track – word embeddings
Word representations
● One-hot is sparse and meaningless
● N-dimensional vector for each word
● “Ubuntu” close to “Debian”
● “king” to “queen” = “man” to “woman”
● Based solely on word co-occurrence
n = 50 to 1000
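The king/queen analogy works by vector arithmetic plus cosine similarity; a toy sketch with made-up 3-dimensional vectors (real embeddings are learned from co-occurrence and have 50 to 1000 dimensions):

```python
def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Toy hand-made embeddings, chosen only to make the analogy work.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.2, 0.7],
    "apple": [0.1, 0.1, 0.1],
}

def analogy(a, b, c):
    """Word closest to vec(a) - vec(b) + vec(c), e.g. king - man + woman."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return max((w for w in vectors if w not in {a, b, c}),
               key=lambda w: cosine(vectors[w], target))
```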
Deep Learning
● Artificial Neural Networks
● Input - word embeddings
● Output – target class
● Complex layer structure
● No feature engineering
Tools
● NLTK – NLP in Python
● GATE – NLP in Java + GUI
● Stanford CoreNLP – NLP in Java + deep neural networks
● AlchemyAPI – commercial API for NLP (free demo)
● MetaMind – enterprise sentiment analysis and computer vision (deep
neural networks)
● WolframAlpha – Smart question answering (knows maths)
Thank you!
