2. There is no assumption that words are independent.
Take a sentence as input.
e.g.: Today's weather is awesome and beautiful but I am feeling worse.
• Preprocess the sentence for sentiment.
• Output - the set of words which affect polarity.
O/P: awesome, beautiful, worse.
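A minimal sketch of this extraction step, assuming two tiny hypothetical lexicons (a real system would use a full sentiment dictionary):

POSITIVE = {"awesome", "beautiful", "good", "great"}
NEGATIVE = {"worse", "bad", "terrible", "awful"}

def polarity_words(sentence):
    """Return the set of words in the sentence that affect polarity."""
    tokens = sentence.lower().split()
    return {t for t in tokens if t in POSITIVE or t in NEGATIVE}

print(polarity_words("Today's weather is awesome and beautiful but I am feeling worse"))
# {'awesome', 'beautiful', 'worse'} (set ordering may vary)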
3. We let any real-valued function of the sentence and the class be a feature:
fi(s, ci)
• A feature is an individual property of the thing you are observing.
• A property of 'awesome' = a feature of 'awesome'.
• A property of 'worse' = a feature of 'worse'.
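For instance, a common choice of feature (a sketch, not necessarily the exact features used in this project) is a binary indicator that fires when a given word occurs in the sentence and the candidate class matches:

def make_feature(word, cls):
    """Indicator feature fi(s, ci): 1 if `word` occurs in sentence s
    and the candidate class is `cls`, else 0."""
    def f(sentence, candidate_class):
        return 1.0 if candidate_class == cls and word in sentence.lower().split() else 0.0
    return f

f_awesome_pos = make_feature("awesome", "+ve")
print(f_awesome_pos("today's weather is awesome", "+ve"))  # 1.0
print(f_awesome_pos("today's weather is awesome", "-ve"))  # 0.0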
4. We calculate a feature value for each polarity word:

likelihood = frequency of word in (+ve) class / frequency of word in (-ve) class
5. In the positive class:
likelihood = frequency of 'awesome' in (+ve) class / frequency of 'awesome' in (-ve) class = +6 (the feature value)
• In the negative class:
likelihood = frequency of 'awesome' in (-ve) class / frequency of 'awesome' in (+ve) class = -1.2 (the feature value)
6. In the positive class:
likelihood = frequency of 'beautiful' in (+ve) class / frequency of 'beautiful' in (-ve) class = 5.4
• In the negative class:
likelihood = frequency of 'beautiful' in (-ve) class / frequency of 'beautiful' in (+ve) class = -1.2
7. In the positive class:
likelihood = frequency of 'worse' in (+ve) class / frequency of 'worse' in (-ve) class = 1.15
• In the negative class:
likelihood = frequency of 'worse' in (-ve) class / frequency of 'worse' in (+ve) class = -7.5
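A sketch of the ratio computation behind slides 4-7, with hypothetical frequency counts chosen only to illustrate it (the signs on the slides' values presumably mark the direction of the evidence, since a plain frequency ratio is always positive):

def likelihood(word, counts, cls, other):
    """Ratio of a word's frequency in one class to its frequency in the other."""
    return counts[cls][word] / counts[other][word]

# Hypothetical per-class counts, not the project's real data.
counts = {"+ve": {"awesome": 12, "beautiful": 27, "worse": 2},
          "-ve": {"awesome": 2,  "beautiful": 5,  "worse": 15}}

print(likelihood("awesome", counts, "+ve", "-ve"))  # 6.0
print(likelihood("worse",   counts, "-ve", "+ve"))  # 7.5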
8. Initially we assign the same weight to all features, and then by using the Gradient Descent algorithm we find the weights of the selected features.
Feature vector:
X = [1 x1 x2 x3 ...]
W = [w0 w1 w2 ...]
X·W = w0 + w1x1 + w2x2 + ...
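A minimal sketch of this weighted sum, with the bias term x0 = 1 prepended as on the slide:

def weighted_sum(x, w):
    """X.W = w0 + w1*x1 + w2*x2 + ... (x already includes the leading 1)."""
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1, 6.0, 5.4]     # bias, 'awesome' feature, 'beautiful' feature
w = [0.0, 0.1, 0.1]   # all real features start with the same weight
print(weighted_sum(x, w))  # 1.14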
9. We get two feature vectors, one per class; each weighted sum X·W is passed through the logistic function σ(z) = 1/(1 + e^(-z)) to give the class score.
Feature vector for the (+ve) class:
σ((6 × 0.1) + (5.4 × 0.1)) = σ(1.14) ≈ 0.76   ('awesome' and 'beautiful')
Feature vector for the (-ve) class:
σ(-7.5 × 0.1) = σ(-0.75) ≈ 0.32   ('worse')
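The two scores can be checked in a couple of lines; the logistic step is inferred from the slide's numbers, since the raw sums are 1.14 and -0.75 while σ(1.14) ≈ 0.76 and σ(-0.75) ≈ 0.32:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(round(sigmoid(6 * 0.1 + 5.4 * 0.1), 2))  # 0.76 -> (+ve) class
print(round(sigmoid(-7.5 * 0.1), 2))           # 0.32 -> (-ve) class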
11. The task of our algorithm is to determine the parameters of the hypothesis.
[Diagram: input data → hypothesis hw(X), governed by the parameters → observation]
A (+ve) class hypothesis value of 0.76 means there is a 76% chance that the sentiment is positive.
12. [Diagram: reality vs. prediction]
• Measure how far the prediction of the system is from reality.
• The cost depends on the parameters.
• The lower the cost, the closer we are to the ideal parameters for the model.
15. Cost(hw(x), y) = -log(hw(x))       if y = 1
                     -log(1 - hw(x))   if y = 0
where y = 1 is the positive class and y = 0 is the negative class.
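A direct sketch of this cost for a single example:

import math

def cost(h, y):
    """Logistic cost for one example: h = hw(x), y in {0, 1}."""
    return -math.log(h) if y == 1 else -math.log(1 - h)

print(cost(0.76, 1))  # ~0.27: confident and correct -> low cost
print(cost(0.76, 0))  # ~1.43: confident but wrong   -> high cost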
16. The goal of maximum entropy is to maximize uncertainty.
E.g.: 1) When we throw a die, the probability of each number is 1/6, i.e. the outcome is maximally uncertain.
2) When we have a coin whose probability of coming up heads is 0.9, it is nearly certain that any flip will show heads.
So, to build a good model, it should stay as uncertain as the data allows; to do so we maximize its entropy while minimizing its cost.
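The two examples can be quantified with Shannon entropy, H = -Σ p·log2(p):

import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # ~2.585 bits: fair die, maximal uncertainty
print(entropy([0.9, 0.1]))  # ~0.469 bits: biased coin, nearly certain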
17. Finding the values of the weights that minimize the cost:

wi := wi - α · Σ(j=1 to m) (hw(xj) - yj) · xj,i

where xj,i is the i-th feature of the j-th training example.
18. wi(new) := wi(old) - α · ∂J(w)/∂wi
where wi(new) is the new weight, wi(old) is the old weight, α is the learning rate, and ∂J(w)/∂wi is the slope of the cost.
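A sketch of the full update from slides 17-18 on a tiny hypothetical training set, using the logistic hypothesis from slide 9:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent_step(w, X, y, alpha):
    """One update: wi := wi - alpha * sum_j (hw(xj) - yj) * xj,i."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, xj))) for xj in X]
    return [wi - alpha * sum((h[j] - y[j]) * X[j][i] for j in range(len(X)))
            for i, wi in enumerate(w)]

X = [[1, 6.0], [1, -7.5]]   # rows of [bias, feature], hypothetical data
y = [1, 0]
w = [0.1, 0.1]              # all weights start equal (slide 8)
for _ in range(100):
    w = gradient_descent_step(w, X, y, alpha=0.05)
print(w)                    # weights after 100 steps of descent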
19. We get the weights which minimize the cost. We then put the improved weights and features into the probability function given below:

P(c|d) = exp(Σi wi fi(d, c)) / Z(d)

where the normalizer Z(d) = Σc' exp(Σi wi fi(d, c')) sums over all classes.
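A sketch of this normalized exponential (a softmax over the classes), with feature functions in the fi(d, c) form from slide 3:

import math

def maxent_prob(weights, features, d, classes, c):
    """P(c|d) = exp(sum_i wi * fi(d, c)) / Z(d)."""
    score = lambda cls: math.exp(sum(w * f(d, cls) for w, f in zip(weights, features)))
    z = sum(score(cls) for cls in classes)   # Z(d): normalizer over all classes
    return score(c) / z

# Two hypothetical indicator features and weights.
f1 = lambda d, c: 1.0 if "awesome" in d and c == "+ve" else 0.0
f2 = lambda d, c: 1.0 if "worse" in d and c == "-ve" else 0.0
print(maxent_prob([1.5, 0.8], [f1, f2], {"awesome", "worse"}, ["+ve", "-ve"], "+ve"))
# ~0.668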
20. Let us now try to test the algorithm with some reviews given by a review site:
http://www.dandywarhols.com/news/band/courtney/courtneys-one-sentence-movie-reviews/
21. Oz The Great and Powerful (2013)
Oh my god it's amazing.
Batman Begins (2005)
I didn't remember it being this bad.
22. Django Unchained (2012)
So much violent fun-fun I almost needed a nap in the middle.
The Master (2012)
It's like the value pack of artistic credibility: you pay for just over two hours but you feel like you got about four.
23. X-Men First Class (2011)
I really love these movies but damned those writers have a klunky bitch of a time trying to rationalize out all the goofy names and costumes.
The Iron Lady (2011)
Jesus could they possibly have made it any more depressing?
24. Moneyball (2011)
See, all you need is amazing writing, a few of the best actors on earth, god's own director and a few million bucks and anyone can make a great movie.
Sherlock Holmes: A Game of Shadows (2011)
These two guys are particularly fun to watch in these two roles but the movie is such a mess that I found myself with way too much time to think things like B minus.
25. Cowboys & Aliens
(2011)
It was really only about three problems which
took this movie from sweet to sucks and all of
them were as easily fixable as the title.
26. Part Of Speech Tagging
In corpus linguistics, part-of-speech tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc.
27. • In this project we have used a dictionary of words along with their tags; the tags were assigned by running a Hidden Markov Model over a corpus of around 35k sentences.
28. Features applied to improve the results of POST (a code sketch of some of these rules follows the list):
1. Any new word will be marked as a common Noun.
2. Convert a Verb after "The" into a Noun.
3. Convert a Noun into a Number if "." appears in it.
4. Convert a Noun into a Past Participle if it ends with "ed".
5. Anything that ends with "ly" is an Adverb.
6. A common Noun is converted into an Adjective if it ends with "al".
7. A Noun is converted into a Verb if the word before it is "would".
8. Convert a Noun to plural if it ends with "s".
9. Convert a common Noun into a Gerund if it ends with "ing".
10. If we get a Noun followed by a Noun, the second Noun could be a Verb.
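A minimal sketch of a few of these corrections applied over (word, tag) pairs; the tag names (NN, VB, etc.) are assumptions, since the project's actual tagset is not shown:

def apply_rules(tagged):
    """Post-correct tagger output; `tagged` is a list of (word, tag) pairs."""
    out = []
    for i, (word, tag) in enumerate(tagged):
        prev = tagged[i - 1][0].lower() if i > 0 else None
        if tag == "VB" and prev == "the":
            tag = "NN"                    # rule 2: verb after "the" -> noun
        elif tag == "NN" and "." in word:
            tag = "CD"                    # rule 3: noun containing "." -> number
        elif tag == "NN" and word.endswith("ed"):
            tag = "VBN"                   # rule 4: "-ed" -> past participle
        elif word.endswith("ly"):
            tag = "RB"                    # rule 5: "-ly" -> adverb
        elif tag == "NN" and word.endswith("ing"):
            tag = "VBG"                   # rule 9: "-ing" -> gerund
        out.append((word, tag))
    return out

print(apply_rules([("the", "DT"), ("run", "VB"), ("quickly", "RB")]))
# [('the', 'DT'), ('run', 'NN'), ('quickly', 'RB')]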
29. Noun Phrase
As we've seen, a noun phrase has a noun as its Head.
Determiners and adjective phrases usually constitute the pre-Head string:
[NP the children]
[NP happy children]
[NP the happy children]
Pronouns, too, can function as the Head of an NP:
[NP I] like coffee
The waitress gave [NP me] the wrong dessert
[NP This] is my car
If the Head is a pronoun, the NP will generally consist of the Head only. This is
because pronouns do not take determiners or adjectives, so there will be no
pre-Head string. However, with some pronouns, there may be a post-Head
string:
[NP Those who arrive late] cannot be admitted until the interval
Similarly, numerals, as a subclass of nouns, can be the Head of an NP:
[NP Two of my guests] have arrived
[NP The first to arrive] was John
30. Verb Phrase
In a VERB PHRASE (VP), the Head is always a verb. The pre-Head string, if any,
will be a 'negative' word such as not [1] or never [2], or an adverb phrase [3]:
[1] [VP not compose an aria]
[2] [VP never compose an aria]
[3] Paul [VP deliberately broke the window]
Many verb Heads must be followed by a post-Head string:
My son [VP made a cake] -- (compare: *My son made)
We [VP keep pigeons] -- (compare: *We keep)
I [VP recommend the fish] -- (compare: *I recommend)
31. Negation Handling
We have tried to build our own algorithm to handle negation in text classification (a sketch follows below).
We explode the sentence into tokens and put them in an array.
We then compare the tokens with neutral, positive, and negative dictionaries and increment their respective counters.
We then look for NO or NOT in the sentence.
We then store the position of the word appearing after NO or NOT so that we can adjust the counters.
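A minimal sketch of this procedure, assuming small hypothetical dictionaries; the word right after NO or NOT has its polarity contribution flipped when the counters are adjusted:

POSITIVE = {"awesome", "beautiful", "good"}
NEGATIVE = {"worse", "bad", "terrible"}
NEGATORS = {"no", "not"}

def polarity_counts(sentence):
    """Count positive/negative/neutral tokens, flipping the word after NO/NOT."""
    tokens = sentence.lower().split()              # explode into an array of tokens
    counts = {"pos": 0, "neg": 0, "neu": 0}
    flipped = {i + 1 for i, t in enumerate(tokens) if t in NEGATORS}
    for i, t in enumerate(tokens):
        if t in POSITIVE:
            counts["neg" if i in flipped else "pos"] += 1
        elif t in NEGATIVE:
            counts["pos" if i in flipped else "neg"] += 1
        else:
            counts["neu"] += 1
    return counts

print(polarity_counts("the movie was not good but not terrible either"))
# {'pos': 1, 'neg': 1, 'neu': 7}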