5. OBJECTIVES
8/17/2017
5
To evaluate the persons opinion in certain cases
To determine the emotional tone behind the series of the
word
To teach the machine to analyze the various grammatical
nuances
To implement an algorithm for automatic classification of
text into positive or negative.
6. SCOPE AND APPLICATIONS
8/17/2017
6
Social networking sites for determining opinions of
people on products or brands
Review on currently published movies, songs and
products graphically
Business, politics and public actions
7. FEATURES
8/17/2017
7
Hashtag Search
Capable of training the new tweets
Representation of results in Pie-chart, Bar graph and
Scatter Plot
Determine the opinion about the peoples on
products or brands
Print or download results in png, jpeg, svg vector
image or pdf document
Popup images when hovering overs graphs
12. METHODOLOGY
8/17/2017
12
DOC TEXT CLASS
1 I loved the movies +
2 I hated the movies -
3 a great movies. good movies +
4 poor acting -
5 great acting. a good movies +
You have a document and a classification:
Ten Unique words:
<I, loved, the, movies, hated, a, great, poor, acting, good>
13. METHODOLOGY
8/17/2017
13
DOC I loved the movies hated a great poor acting good CLASS
1 1 1 1 1 +
2 1 1 1 1 -
3 2 1 1 1 +
4 1 1 -
5 1 1 1 1 1 +
Convert the document into feature sets, where the attributes are possible
words, and the values are the number of times a word occurs in the given
document.
14. METHODOLOGY
8/17/2017
14
DOC I loved the movies hated a great poor acting good CLASS
1 1 1 1 1 +
3 2 1 1 1 +
5 1 1 1 1 1 +
Documents with positive outcomes.
P (+) = 3/5=0.6
Compute: p (I|+); p (love|+); p (the|+); p (movies|+);
P (a|+); p (great|+); p (acting|+); p (good|+)
Le n be the number of words in the (+) case: 14. nk the number of times word k
occurs in these cases(+)
Let p(Wk |+)=nk+12n+|Vocabulary|
15. METHODOLOGY
8/17/2017
15
DOC I loved the movies hated a great poor acting good CLASS
1 1 1 1 1 +
3 2 1 1 1 +
5 1 1 1 1 1 +
P(+)=3/5=0.6; p(wk /+)=nk+12n+|Vocabulary|
P(I|+) = 1+114+10=0.0833; P(loved|+) = 1+114+10=0.0833;
P(the|+) = 1+114+10=0.0833; P(movies|+) = 5+114+10=0.2083;
P(a|+) = 2+114+10=0.125; P(great|+) = 2+114+10=0.125;
P(acting|+) = 1+114+10=0.0833; P(good|+) = 2+114+10=0.125;
P(hated|+) = 0+114+10=0.0417; P(poor|+) = 0+114+10=0.0417;
Documents with positive outcomes.
16. METHODOLOGY
8/17/2017
16
DOC I loved the movies hated a great poor acting good CLASS
2 1 1 1 1 -
4 1 1 -
P(-)=2/5=0.4
P(I|-) = 1+16+10=0.125; P(loved|-) = 0+16+10=0.0625;
P(the|-) = 1+16+10=0.125; P(movies|-) = 1+16+10=0.125;
P(a|-) = 0+16+10=0.0625; P(great|-) = 0+16+10=0.0625;
P(acting|-) = 1+16+10=0.125; P(good|-) = 0+16+10=0.0625;
P(hated|-) = 1+16+10=0.125; P(poor|-) = 1+16+10=0.125;
Now, let’s look at the negative examples
17. METHODOLOGY
8/17/2017
17
Now that we’ve trained our classifier.
Let’s classify a new sentence according to:
VNB = argmaxP
vj ∈ V
(vj) ∏
w ∈ words
P(W|vj)
where v stands for “value” or “class”
“I hated the poor acting”
If Vj=+; p(+)p(I|+)p(hated|+)p(the|+)p(poor|+)p(acting|+)=6.03*10-7
If Vj= -; p(-)p(I|-)p(hated|-)p(the|-)p(poor|-)p(acting|-)=1.22*10-5
18. LIMITATION
8/17/2017
18
Only 25 tweets are being analyzed
Takes nearly 35 seconds to analyze and display
results
Not able to determine neutrality of tweet
Doesnot determines polarity of emoji/smiley
( ;), :D, :P )
19. FUTURE ENHANCEMENT
8/17/2017
19
Analyzing sentiments on emo/smiley
Determining neutrality
Potential improvement can be made to our data collection
and analysis method
Future research can be done with possible improvement
such as more refined data and more accurate algorithm
Fast processing system
20. CONCLUSION
8/17/2017
20
Aimed to determine opinion of people
About 23,000 dataset (positive and negative) was
collected
Normalization of dataset
Used a suitable algorithm to train data
A simple user interface was designed
Representation of results in graph
21. REFERENCES
8/17/2017
21
Kim S-M, Hovy E (2004) Determining the sentiment of opinions In:
Proceedings of the 20th international conference on Computational
Linguistics, page 1367. Association for Computational Linguistics,
Stroudsburg, PA, USA.
Liu B (2010) Sentiment analysis and subjectivity In: Handbook of Natural
Language Processing, Second Edition. Taylor and Francis Group, Boca. Liu B,
Hu M, Cheng J (2005) Opinion observer: Analyzing and comparing opinions
on the web In: Proceedings of the 14th International Conference on World
Wide Web, WWW ’05, 342–351. ACM, New York, NY, USA.
Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and
opinion mining In: Proceedings of the Seventh conference on International
Language Resources and Evaluation. European Languages Resources
Association, Valletta, Malta.
Pang B, Lee L (2004) A sentimental education: Sentiment analysis using
subjectivity summarization based on minimum cuts In: Proceedings of the
42Nd Annual Meeting on Association for Computational Linguistics, ACL
’04.. Association for Computational Linguistics, Stroudsburg, PA, USA.
22. REFERENCES
8/17/2017
22
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends
Inf Retr2(1-2): 1–135.
Twitter (2014) Twitter apis. https://dev.twitter.com/start.
LiuB(2014)The science of detecting fake
reviews. http://content26.com/blog/bing-liu-the-science-of-detecting-fake-
reviews/
Jahanbakhsh, K., & Moon, Y. (2014). The predictive power of social media:
On the predictability of U.S presidential elections using Twitter
Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in
consumer reviews In: Proceedings of the 21st, International Conference on
World Wide Web, WWW ’12, 191–200.. ACM, New York, NY, USA.
Saif, H., He, Y., & Alani, H. (2012). Semantic sentiment analysis of twitter.
The Semantic Web (pp. 508– 524). ISWC
23. REFERENCES
8/17/2017
23
Tan LK-W, Na J-C, Theng Y-L, Chang K (2011) Sentence-level sentiment
polarity classification using a linguistic approach In: Digital Libraries: For
Cultural Heritage, Knowledge Dissemination, and Future Creation, 77–87..
Springer, Heidelberg, Germany.
Liu B (2012) Sentiment Analysis and Opinion Mining. Synthesis Lectures on
Human Language Technologies. Morgan & Claypool Publishers.
Gann W-JK, Day J, Zhou S (2014) Twitter analytics for insider trading fraud
detection system In: Proceedings of the sencond ASE international conference
on Big Data.. ASE.
Joachims T. Probabilistic analysis of the rocchio algorithm with TFIDF for text
categorization. In: Presented at the ICML conference; 1997.
Yung-Ming Li, Tsung-Ying Li Deriving market intelligence from microblogs