By
Prof. Masun Nabhan Homsi
Assistant Professor
Departamento de Computación y Tecnología de la Información
Universidad Simón Bolívar
Caracas, Venezuela
Multi-Class Sentiment Analysis using a
Hierarchical Logistic Model Tree Approach
Multi-Class Sentiment Analysis using a Hierarchical Logistic Model Tree Approach. Masun Nabhan Homsi, USB, Venezuela
About the author
 Masun Nabhan Homsi
 2013 – Present
     Assistant Professor, Computer Science and Information Technology Department, Simon Bolivar University, Caracas, Venezuela.
 2011
     PhD degree, University of Aleppo, Syria, in Artificial Intelligence (student model, Intelligent Tutoring System).
 2002
     Master's degree, University of Aleppo, Syria, in Artificial Intelligence for designing intelligent software for deaf-mute people.
Research Interests
 Sentiment Analysis, Personality Recognition, Intelligent Tutoring Systems, Recommender Systems and others.
Agenda
 Introduction
 Motivation
 Objectives
 General Inquirer Dictionary (GI)
 Logistic Model Tree (LMT)
 Datasets used in this research
 Methodology
     Preprocessing phase
     Sentiment classification phase
 Results & Discussions
     Feature extraction and bipolar sentiments
     Sentiment intensities
     Comparison and analysis
 Conclusions and future work
Introduction
 What is Sentiment Analysis (SA)?
     Sentiments are feelings, opinions and emotions: likes/dislikes, good/bad.
     SA is the study of human behavior in which user opinions and emotions are extracted from plain text.
     SA is also known as Opinion Mining (OM).
 Example:
     John: It is a great movie (Positive).
     Alice: No, I don't like it (Negative).
     Mark: Movie's stars is … (Neutral)
 Polarity:
     Positive
     Negative
     Neutral
 Sentiment analysis tasks:
     Objective & subjective.
 Sentiment analysis levels:
     Word level, sentence level, document level or aspect level.
 Sentiment analysis approaches:
     Lexicon-based.
     Machine learning-based.
     Hybrid.
Motivation
 Most approaches in the SA field deal with polarity classification, ignoring the degree of users' satisfaction or dissatisfaction with a resource found on social media sites.
 Detecting several sentiment intensities in a text helps users access valuable information, improving their choices and the quality of their decisions.
 In this context, a Multi-Class Sentiment Analysis System (MCSAS) is built to automatically extract sentiments from English text, expressed in six dimensions: High Positive (HP), Positive (P), Low Positive (LP), Low Negative (LN), Negative (N) and High Negative (HN).
Objectives
 The first objective is to describe the steps taken to build MCSAS for document classification using the General Inquirer (GI) dictionary and the Logistic Model Tree (LMT) algorithm.
 The second objective is to measure the effect of feature selection on the overall performance of sentiment classification.
 The third objective is to compare the experimental results of the new system with the results of other research.
General Inquirer Dictionary (GI)
 GI is a computational lexicon compiled from several sources:
     Harvard IV-4 dictionary.
     Lasswell value dictionary.
 182 labels:
     Positive
     Negative
     Pain
     Virtue
     Negate
     Etc.
 24 labels are used:
     Positiv, Negativ, Pstv, Affil, Negtv, Hostile, Strong, Weak, Submit, Active, Passive, Pleasur, Pain, Feel, Arousal, Emot, Virtue, Vice, Ovrst, Undrst, Yes, No, Negate and Intrj.
Logistic Model Tree (LMT)
 An LMT is structured like a regression tree, but with logistic regression models at the leaves.
[Figure: example LMT. The tree splits on f (f=0, f=1) and on g (g<=10, g>10); two leaf plots show P(TaskSuccess=1), from 0 to 1, against % Nonunderstandings (FNON), from 0 to 50%.]
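To make the idea concrete, the following is a minimal sketch of how a logistic model tree produces a prediction: the sample is routed down the tree by threshold tests, and the logistic model stored at the reached leaf returns a probability. The tree structure, the feature names `g` and `x`, and all coefficients are hypothetical values chosen for illustration; they are not taken from the slides.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tree: one split on feature "g", one logistic model per leaf.
TREE = {
    "feature": "g",
    "threshold": 10,
    "left": {"bias": 2.0, "weights": {"x": -6.0}},   # reached when g <= 10
    "right": {"bias": 0.5, "weights": {"x": -1.0}},  # reached when g > 10
}

def predict_proba(node, sample):
    """Walk down to a leaf, then evaluate that leaf's logistic model."""
    while "feature" in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    z = node["bias"] + sum(w * sample[name] for name, w in node["weights"].items())
    return sigmoid(z)

print(predict_proba(TREE, {"g": 5, "x": 0.1}))
print(predict_proba(TREE, {"g": 20, "x": 0.5}))
```

A tree with a single leaf reduces to ordinary logistic regression, while deeper trees let different regions of the feature space use different logistic models.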
Datasets
 Movie Review Dataset:
     It contains movie reviews with their associated binary sentiment polarity labels, split into 1000 positive and 1000 negative documents.
 SenTube Dataset:
     It is a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity (Uryupina et al., 2014). The dataset covers English and Italian videos on the same products (automobiles, tablets). Sentiment is annotated separately towards the product and towards the video. 117 English videos are labeled as positive, 61 as negative and 39 as neutral.
System Architecture
Preprocessing Phase
 The primary objective of this phase is to extract feature vectors (FVs) from the documents in CD = {d1, d2, d3, …, dn}, where n is the total number of labeled documents in the dataset. Each FVi holds the occurrence counts (totals) of the different word senses in di, where i = 1, …, n.
 1) Translate slang words, acronyms, abbreviations and emoticons found in the raw text di into their original meanings. The Internet Slang Dictionary (ISD, http://www.internetslang.com) is used for this task; it contains 5380 terms originating from various sources, including bulletin boards, AIM, Yahoo, Twitter, YouTube, chat rooms and others.
     Example: "This film is extraordinarily horrendous and I'm not going to waste any more words on it [emoticon]." The emoticon is replaced by "sad".
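The translation step can be pictured as a token-by-token dictionary lookup. The sketch below assumes a tiny hand-written table standing in for the Internet Slang Dictionary; the entries, including the `:(` emoticon, are illustrative assumptions and not the real ISD contents.

```python
# Hypothetical stand-in for the Internet Slang Dictionary (ISD).
SLANG = {
    "gr8": "great",
    "imo": "in my opinion",
    ":(": "sad",
}

def translate_slang(text, table=SLANG):
    """Replace each whitespace-separated token that appears in the table."""
    return " ".join(table.get(token.lower(), token) for token in text.split())

raw = "This film is extraordinarily horrendous and I'm not going to waste any more words on it :("
print(translate_slang(raw))
```

Splitting on whitespace keeps the sketch short; a production version would also need to handle emoticons that are attached to words or punctuation.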
 2) Search for contracted words in di and replace them with their full forms.
     Example: "This film is extraordinarily horrendous and I'm not going to waste any more words on it sad." Here "I'm" is replaced by "I am".
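A minimal sketch of contraction expansion, assuming a small hand-written mapping (the slides do not list the contractions that were handled, so the table below is hypothetical):

```python
import re

# Hypothetical contraction table; the real list used by MCSAS is not given.
CONTRACTIONS = {
    "i'm": "I am",
    "don't": "do not",
    "can't": "cannot",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Normalize curly apostrophes, then expand known contractions."""
    text = text.replace("\u2019", "'")
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)

print(expand_contractions("This film is extraordinarily horrendous and I'm not going to waste any more words on it sad."))
```

Normalizing the typographic apostrophe first matters for text copied from web pages, where "don’t" is as common as "don't".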
 Tokenize the text by breaking it up into words: di = {w1, w2, w3, …, wk}, where k is the number of tokens in the document.
 Tag each word using a Part-Of-Speech Tagger (POS-Tagger). Words tagged as IN (preposition), DT (determiner), PRP (personal pronoun), MD (modal), WRB (Wh-adverb), NNP (proper noun), EX (existential there) or WP (Wh-pronoun) are treated as stop words and ignored. A tagged word is represented by (wj, tj), where tj is the lexical category of the word wj and j = 1, …, k.
 Part-Of-Speech Tagger (POS-Tagger) output for the example sentence:
     this ==> DT (Determiner)
     film ==> NN (Noun, singular)
     is ==> VBZ (Verb, 3rd person singular present)
     extraordinarily ==> RB (Adverb)
     horrendous ==> JJ (Adjective)
     and ==> CC (Coordinating conjunction)
     i ==> PRP (Personal pronoun)
     am ==> VB (Verb)
     not ==> RB (Adverb)
     go ==> VB (Verb)
     waste ==> VB (Verb)
     any ==> DT (Determiner)
     more ==> JJR (Adjective, comparative)
     words ==> NNS (Noun, plural)
     on ==> IN (Preposition)
     it ==> PRP (Personal pronoun)
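The tag-based stop-word filter can be sketched as follows. The input is the tagged example sentence from the slide, typed in by hand so that the snippet does not depend on any particular tagger; `WRB` is the Penn Treebank tag for Wh-adverbs. The function name is an assumption made for illustration.

```python
# Penn Treebank tags that the slides treat as stop words.
STOP_TAGS = {"IN", "DT", "PRP", "MD", "WRB", "NNP", "EX", "WP"}

def drop_stop_words(tagged_words):
    """Keep only (word, tag) pairs whose tag is not a stop tag."""
    return [(word, tag) for word, tag in tagged_words if tag not in STOP_TAGS]

# Tagged example sentence, copied from the slide.
tagged = [
    ("this", "DT"), ("film", "NN"), ("is", "VBZ"), ("extraordinarily", "RB"),
    ("horrendous", "JJ"), ("and", "CC"), ("i", "PRP"), ("am", "VB"),
    ("not", "RB"), ("go", "VB"), ("waste", "VB"), ("any", "DT"),
    ("more", "JJR"), ("words", "NNS"), ("on", "IN"), ("it", "PRP"),
]

kept = drop_stop_words(tagged)
print([word for word, _ in kept])
```

Filtering by tag rather than by a fixed word list means the same surface word can be kept in one context and dropped in another, depending on how the tagger labels it.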
 Search for the word wj in the GI dictionary to find its different senses. If wj is not found, stem it with the Porter stemming algorithm to find its root.
     Example: "This film is extraordinarily horrendous and I am not going to waste any more words on it sad." Here "going" is replaced by "go".
 Search for the stemmed word wj in the GI dictionary again, together with its corresponding tag, to avoid ambiguity in word senses.
 If the stemmed word is found in GI, create the word senses vector WjS = {Positivj, Negativj, Pstvj, Affilj, Negtvj, Hostilej, Strongj, Weakj, Submitj, Activej, Passivej, Pleasurj, Painj, Feelj, Arousalj, Emotj, Virtuej, Vicej, Ovrstj, Undrstj, Yesj, Noj, Negatej, Intrjj, NWj}; otherwise, search for its synonym in the Wordnik dictionary.
     Example: "This film is extraordinarily horrendous and I am not go to waste any more words on it sad." Here "horrendous" is replaced by "fearful".
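The lookup order described above (dictionary, then stem, then synonym) can be sketched with small in-memory tables. The GI entries below reuse the labels the slides report for the example words; the synonym table stands in for Wordnik, and `crude_stem` is a deliberately simplified suffix stripper used here in place of the Porter algorithm. Tag-aware sense disambiguation is left out of this sketch.

```python
# Labels reported on the slides for the example sentence's words.
GI = {
    "extraordinary": {"Positiv", "Virtue", "Ovrst"},
    "fearful": {"Negativ", "Weak", "Passive", "Pain", "Emot"},
    "not": {"Negate"},
    "go": {"Active"},
    "waste": {"Negativ", "Weak", "Vice"},
    "more": {"Strong"},
}

# Hypothetical stand-in for a Wordnik synonym lookup.
SYNONYMS = {"horrendous": ["fearful"]}

def crude_stem(word):
    """Simplified suffix stripping; NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word

def lookup_senses(word):
    """Try the word, then its stem, then its synonyms; empty set means noise."""
    word = word.lower()
    if word in GI:
        return GI[word]
    stem = crude_stem(word)
    if stem in GI:
        return GI[stem]
    for synonym in SYNONYMS.get(word, []):
        if synonym in GI:
            return GI[synonym]
    return set()

print(lookup_senses("going"))
print(lookup_senses("horrendous"))
print(lookup_senses("film"))
```

Returning an empty set for unknown words matches the treatment of out-of-dictionary words as noise.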
 Words not found in the GI dictionary are considered noise and are ignored. Non-English words are discarded in the same way, so no language detection procedure is needed.
 Calculate the Positive Polarity (PPi) by summing the number of occurrences of the following senses in di: Positiv, Pstv, Affil and Yes.
 Compute the Negative Polarity (NPi) by summing the number of occurrences of the following senses in di: Negativ, Negtv, Hostile, No and Negate.
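As a rough illustration, the two polarity counts can be computed from the per-word sense sets. The function name and input shape (one set of GI labels per word) are assumptions made for this sketch; the example input uses the labels the slides give for the words of the running example sentence.

```python
POSITIVE_SENSES = {"Positiv", "Pstv", "Affil", "Yes"}
NEGATIVE_SENSES = {"Negativ", "Negtv", "Hostile", "No", "Negate"}

def polarity_counts(word_senses):
    """Return (PP, NP): occurrences of positive and negative senses."""
    pp = sum(len(senses & POSITIVE_SENSES) for senses in word_senses)
    np_ = sum(len(senses & NEGATIVE_SENSES) for senses in word_senses)
    return pp, np_

# One sense set per word found in GI for the example sentence.
example = [
    {"Positiv", "Virtue", "Ovrst"},                   # extraordinary
    {"Negativ", "Weak", "Passive", "Pain", "Emot"},   # fearful
    {"Negate"},                                       # not
    {"Active"},                                       # go
    {"Negativ", "Weak", "Vice"},                      # waste
    {"Strong"},                                       # more
]

print(polarity_counts(example))
```

A word carrying several senses from the same group contributes once per sense, which follows the "number of occurrences of senses" wording rather than counting words.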
 Count the frequency of explicit negation words (NWi).
 The list of negation words used in this research, denoted N, is: "no, not, none, no one, nobody, nothing, neither, nowhere and never".
 Search in a window of 2 words to detect the Positive Polarity Shifter (PPSi) and the Negative Polarity Shifter (NPSi), as follows:
     Example: "I do not hate it": "not" (negation) + "hate" (negative) = positive.
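One possible reading of the window rule is sketched below: after each negation word, the next two tokens are inspected, and the first polar word found there is counted as shifted. The direction of the window (forward only), the token-level polarity table and the function name are assumptions; the slides give only the single example. The two-token negation "no one" is omitted to keep the tokenization trivial.

```python
# Single-token negation words from the list used in the research.
NEGATIONS = {"no", "not", "none", "nobody", "nothing", "neither", "nowhere", "never"}

def count_shifters(tokens, polarity, window=2):
    """Return (PPS, NPS) for a token list and a word -> polarity table."""
    pps = nps = 0
    for i, token in enumerate(tokens):
        if token not in NEGATIONS:
            continue
        for following in tokens[i + 1 : i + 1 + window]:
            label = polarity.get(following)
            if label == "negative":   # negation + negative = positive
                pps += 1
                break
            if label == "positive":   # negation + positive = negative
                nps += 1
                break
    return pps, nps

# Hypothetical polarity table for the demonstration.
POLARITY = {"hate": "negative", "like": "positive"}
print(count_shifters("i do not hate it".split(), POLARITY))
print(count_shifters("i do not like it".split(), POLARITY))
```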
 Calculate the Total Positive (TPi) and the Total Negative (TNi).
 Calculate the potency senses (Strong, Weak and Submit) and the remaining senses by counting their occurrences.
 Feature vector for the example sentence "This film is extraordinarily fearful and I am not go to waste any more words on it sad.":
     0,0,0,0,1,2,0,1,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,
 GI labels of the words in the example sentence:
     Film: no labels.
     Extraordinary: Positiv, Virtue and Ovrst.
     Fearful (horrendous): Negativ, Weak, Passive, Pain, Emot.
     Not: Negate.
     Go (going): Active.
     Waste: Negativ, Weak, Vice.
     More: Strong.
     Word: no labels.
 Calculate the percentage of each feature by dividing it by the number of words found in GI. The final feature vector representing one document di in CD has 19 elements: FVi = {TPi, TNi, Strongi, Weaki, Submiti, Activei, Passivei, Pleasuri, Paini, Feeli, Arousali, Emoti, Virtuei, Vicei, Ovrsti, Undrsti, Intrji, NWi, Classi}.
 CD contains 2178 documents, split into 1117 positive and 1061 negative. Because of this imbalanced class distribution, three re-sampling techniques were applied:
     Re-sampling without replacement.
     SMOTE.
     Under-sampling.
 The new dataset CD1 contains 2214 vectors, 1107 for each sentiment class.
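The normalization step can be sketched as below. Whether the original system stores fractions or values scaled by 100 is not stated, so the factor of 100 is an assumption, as are the function name and the handling of documents with no GI words.

```python
def to_percentages(counts, words_in_gi):
    """Divide each raw feature count by the number of words found in GI."""
    if words_in_gi == 0:
        # Assumption: a document with no GI words gets an all-zero vector.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / words_in_gi for name, value in counts.items()}

# Raw counts for a hypothetical document in which 6 words were found in GI.
raw_counts = {"TP": 1, "TN": 3, "Strong": 1, "Weak": 2}
print(to_percentages(raw_counts, words_in_gi=6))
```

Normalizing by the number of matched words makes feature values comparable between short comments and long reviews.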
Sentiment Classification Phase
 A hierarchical LMT (HLMT) is built to discover multi-class sentiment in English text. It consists of three layers:
     The Bipolar Layer (BL) uses the first LMT (LMT-1) to detect positive and negative sentiments.
     The Grouping Layer (GL) groups positive and negative instances separately with two k-means runs: 3 clusters for positive instances and 3 for negative instances. The instances in each cluster are then labeled manually. GL is used only in the construction phase of MCSAS.
     The Intensity Layer (IL) uses two LMTs (LMT-2 and LMT-3) to determine sentiment strength, expressed in six levels (three for each LMT).
 The three LMTs are trained and tested using 10-fold stratified cross-validation.
 The dataset CD1 is presented to the system.
 LMT-1 is trained and tested to model the positive and negative instances in CD1.
 CD1 is partitioned into two sub-datasets, CD11 for positive instances and CD12 for negative instances.
 The instances in CD11 are grouped into three clusters using the k-means algorithm.
 The 3 generated clusters for the positive dimensions (HP, P and LP) are labeled manually, and the instances in CD11 are then saved in a new sub-dataset, CDP.
 The instances in CD12 are also grouped into three clusters using the k-means algorithm.
 The 3 generated clusters for the negative dimensions (HN, N and LN) are labeled manually, and the instances in CD12 are then saved in a new sub-dataset, CDN.
 LMT-2 is trained and tested to model the positive intensities in CDP.
 LMT-3 is trained and tested to model the negative intensities in CDN.
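Once the three models are trained, prediction only needs the bipolar and intensity layers, since the grouping layer is used during construction alone. The sketch below shows that routing with plain callables; the function name and the stub models are hypothetical placeholders, not the trained LMTs.

```python
def classify_document(fv, lmt1, lmt2, lmt3):
    """Route a feature vector through the bipolar layer, then the intensity layer."""
    polarity = lmt1(fv)  # expected to return "positive" or "negative"
    intensity_model = lmt2 if polarity == "positive" else lmt3
    return intensity_model(fv)  # HP/P/LP from lmt2, HN/N/LN from lmt3

# Hypothetical stub models, used only to exercise the routing logic.
def stub_lmt1(fv):
    return "positive" if fv["TP"] >= fv["TN"] else "negative"

def stub_lmt2(fv):
    return "HP" if fv["TP"] > 50 else "P"

def stub_lmt3(fv):
    return "HN" if fv["TN"] > 50 else "N"

print(classify_document({"TP": 60.0, "TN": 10.0}, stub_lmt1, stub_lmt2, stub_lmt3))
print(classify_document({"TP": 5.0, "TN": 70.0}, stub_lmt1, stub_lmt2, stub_lmt3))
```

Passing the models in as arguments keeps the routing independent of how each classifier is implemented or loaded.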
Results & Discussions
 Two sets of experiments are performed:
     The proposed feature extraction algorithm is evaluated and compared with the TF*IDF algorithm, using six different ML algorithms to detect sentiment polarity in BL.
     The accuracy of LMT-2 and LMT-3 in IL is measured for detecting sentiment intensities across six classes.
Feature Extraction and Bipolar Sentiments
LM_1:
Class 0 :
-2.04 +
[TotalP] * -0.72 +
[TotalN] * -0.6 +
[Strong] * 0.34 +
[Weak] * -0.07 +
[Submit] * 7.34 +
[Active] * 0.61 +
[Passive] * 0.07 +
[Pleasur] * -0.22 +
[Pain] * -0.58 +
[Feel] * 15.58 +
[Arousal] * 1.66 +
[Emot] * -2.67 +
[Virtue] * -0.19 +
[Vice] * 1.25 +
[Ovrst] * 0.33 +
[Undrst] * -0.33 +
[Intrj] * 0.86 +
[Negation] * -2.46
LM_1
Class 1 :
2.04 +
[TotalP] * 0.72 +
[TotalN] * 0.6 +
[Strong] * -0.34 +
[Weak] * 0.07 +
[Submit] * -7.34 +
[Active] * -0.61 +
[Passive] * -0.07 +
[Pleasur] * 0.22 +
[Pain] * 0.58 +
[Feel] * -15.58 +
[Arousal] * -1.66 +
[Emot] * 2.67 +
[Virtue] * 0.19 +
[Vice] * -1.25 +
[Ovrst] * -0.33 +
[Undrst] * 0.33 +
[Intrj] * -0.86 +
[Negation] * 2.46
The generated LMT-1 tree has size 95, with 48 leaves (rules).
 Comparison of ML algorithms' accuracy for sentiment polarity classification in BL:

Feature extraction | NB | SVM | KNN | JRip | J48 | LMT-1
GI | 66.80 % | 72.22 % | 91.86 % | 83.74 % | 88.66 % | 90.88 %
TF*IDF | 76.45 % | 76.75 % | 52.80 % | 65.93 % | 64.10 % | 65.70 %

 Confusion matrix of LMT-1 for the Bipolar Layer:

 | Positive | Negative
Positive | 996 | 111
Negative | 91 | 1016
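The reported LMT-1 accuracy can be checked directly from its confusion matrix: the diagonal holds the correctly classified instances, so accuracy is the diagonal sum divided by the total. A short sketch of that arithmetic (the function name is chosen for illustration):

```python
def accuracy(matrix):
    """Sum of the diagonal divided by the sum of all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# LMT-1 confusion matrix for the Bipolar Layer, as reported on the slide.
lmt1_matrix = [
    [996, 111],
    [91, 1016],
]

print(round(100 * accuracy(lmt1_matrix), 2))  # 90.88
```

Here (996 + 1016) / 2214 = 2012 / 2214, which rounds to the 90.88 % listed for LMT-1 with GI features.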
Sentiment Intensities
 LMT-2 and LMT-3 are each a tree of size 1 with a single leaf. In the prediction phase, therefore, only six linear functions are used to classify multi-class sentiments expressed in English text, three for each sentiment pole.
 Negative intensity functions:
     HN = -30.23 + $Strong * -0.12 + $Submit * 0.52 + $Active * -0.09 + $Passive * 1.2 + $Pleasur * 4.65 + $Pain * 4.49 + $Emot * 7.06 + $Vice * -0.14 + $Ovrst * 0.07 + $Undrst * 0.55 + $Intrj * 1.72 ;
     N = 21.48 + $TotalP * 0.29 + $TotalN * -1.84 + $Strong * -0.37 + $Weak * -0.44 + $Submit * -0.15 + $Pain * -0.53 + $Arousal * 0.44 + $Emot * -0.78 + $Virtue * 0.2 + $Vice * -0.11 + $Ovrst * -0.06 + $Intrj * 2.1 + $Negation * 0.2 ;
     LN = -18.43 + $TotalP * -0.52 + $TotalN * 0.98 + $Strong * 0.99 + $Active * 0.26 + $Passive * -0.1 + $Pleasur * -2.04 + $Feel * -1.28 + $Arousal * -2.31 + $Virtue * -0.45 + $Vice * 0.8 + $Intrj * -0.31 ;
 Positive intensity functions:
     HP = -1.24 + $Submit * 12.71 + $Active * -0.39 + $Passive * 0.15 + $Pain * -0.28 + $Feel * 4.35 + $Vice * -0.21 + $Ovrst * -0.22 + $Undrst * -0.4 + $Intrj * -0.7 ;
     P = 3.09 + $TotalN * -0.41 + $Weak * -1.03 + $Submit * -0.89 + $Passive * -0.04 + $Pain * -0.69 + $Emot * -0.07 + $Vice * -0.68 + $Ovrst * 0.43 + $Undrst * 0.19 + $Negation * 1.05 ;
     LP = -25.16 + $TotalN * 1.13 + $Strong * 0.23 + $Weak * 0.64 + $Active * 0.16 + $Pain * 1.67 + $Vice * 2.84 + $Negation * -0.21 ;
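A minimal sketch of how the positive intensity functions could be applied at prediction time: each function is evaluated on the feature vector and the class with the largest value is chosen. Picking the maximum is an assumption based on how logistic models usually turn per-class linear functions into probabilities; the coefficients are copied from the functions above, and features that a function does not mention are treated as having weight 0. The negative pole would work the same way with HN, N and LN.

```python
# (intercept, weights) per class, copied from the positive intensity functions.
POSITIVE_FUNCTIONS = {
    "HP": (-1.24, {"Submit": 12.71, "Active": -0.39, "Passive": 0.15,
                   "Pain": -0.28, "Feel": 4.35, "Vice": -0.21,
                   "Ovrst": -0.22, "Undrst": -0.4, "Intrj": -0.7}),
    "P": (3.09, {"TotalN": -0.41, "Weak": -1.03, "Submit": -0.89,
                 "Passive": -0.04, "Pain": -0.69, "Emot": -0.07,
                 "Vice": -0.68, "Ovrst": 0.43, "Undrst": 0.19,
                 "Negation": 1.05}),
    "LP": (-25.16, {"TotalN": 1.13, "Strong": 0.23, "Weak": 0.64,
                    "Active": 0.16, "Pain": 1.67, "Vice": 2.84,
                    "Negation": -0.21}),
}

def score(function, features):
    """Evaluate one linear function; missing features count as 0."""
    intercept, weights = function
    return intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())

def predict_intensity(functions, features):
    """Return the label whose linear function gives the largest value."""
    return max(functions, key=lambda label: score(functions[label], features))

print(predict_intensity(POSITIVE_FUNCTIONS, {"Submit": 1.0}))
print(predict_intensity(POSITIVE_FUNCTIONS, {}))
```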
 Distribution of positive and negative instances over clusters in GL:

Dataset | Cluster 0 | Cluster 1 | Cluster 2
CD11 (Positive) | 208 (19%) | 565 (51%) | 334 (30%)
CD12 (Negative) | 228 (21%) | 521 (47%) | 358 (32%)

 Comparison of ML algorithms' accuracy for sentiment intensity classification:

Dataset | NB | SVM | KNN | JRip | J48 | LMT-2/LMT-3
CDP (Positive) | 93.32 % | 97.56 % | 93.50 % | 95.48 % | 95.03 % | 99.28 %
CDN (Negative) | 92.14 % | 98.28 % | 92.77 % | 93.95 % | 94.40 % | 99.37 %
 Confusion matrix of LMT-2 for positive intensities:

 | HP | P | LP
HP | 206 (99.03%) | 2 | 0
P | 3 | 560 (99.11%) | 2
LP | 1 | 0 | 333 (99.70%)

 Confusion matrix of LMT-3 for negative intensities:

 | HN | N | LN
HN | 225 (99.11%) | 1 | 2
N | 1 | 354 (98.88%) | 3
LN | 0 | 0 | 521 (100%)
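The overall accuracies reported for LMT-2 and LMT-3 can be recomputed from the two confusion matrices, together with the row-wise rate (diagonal cell divided by its row total). The sketch below uses the cell values as reported; interpreting rows as the actual classes is an assumption, and the function name is illustrative.

```python
def summarize(matrix):
    """Return (overall accuracy, list of diagonal / row-total rates)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    row_rates = [matrix[i][i] / sum(row) for i, row in enumerate(matrix)]
    return correct / total, row_rates

# Cell values as reported for LMT-2 (positive) and LMT-3 (negative).
lmt2_matrix = [[206, 2, 0], [3, 560, 2], [1, 0, 333]]
lmt3_matrix = [[225, 1, 2], [1, 354, 3], [0, 0, 521]]

for name, matrix in (("LMT-2", lmt2_matrix), ("LMT-3", lmt3_matrix)):
    overall, rates = summarize(matrix)
    print(name, round(100 * overall, 2), [round(100 * r, 2) for r in rates])
```

Both matrices contain 1107 instances, so the overall figures are 1099 / 1107 and 1100 / 1107, which round to the reported 99.28 % and 99.37 %.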
Comparison and Analysis
 Comparison of hybrid sentiment classifiers:

Ref. | Dataset | Feature selection | Technique | Accuracy %
BL | Movie Reviews + SenTube | 24 word senses in GI Dictionary + handling negation | LMT | 90.88
BL | Movie Reviews + SenTube | 24 word senses in GI Dictionary + handling negation | KNN | 91.86
Prabowo et al. (2009) | Movie Reviews + MySpace | RBC + SBC + GIBC | SVM | 90.45
Kennedy and Inkpen (2006) | Movie Reviews | 4 word senses in GI and CTRW dictionaries + unigrams + bigrams + handling negation | SVM | 85.9
Cassinelli and Chen (2009) | Movie Reviews | Frequency of positive and negative word senses from GI | SVM | 73.8
Severyn et al. (2014) | SenTube | STRUCT model: a shallow syntactic tree | SVM | 70.5
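 Comparison of multi-class sentiment classifiers:

Ref. | Dataset | Classes | Technique | Average accuracy %
IL | Movie Reviews + SenTube | 6 | 2 LMT | 99.30
Pandey (2011) | Multi-Domain Sentiment Dataset | 4 | SVM | 71.73
Wilson et al. (2004) | MPQA (Multi-perspective Question Answering) Opinion Corpus | 4 | BoosTexter | 57.52
Wilson et al. (2004) | MPQA (Multi-perspective Question Answering) Opinion Corpus | 4 | Ripper | 55.55
Wilson et al. (2004) | MPQA (Multi-perspective Question Answering) Opinion Corpus | 4 | Support Vector Regression | 36.53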
Conclusions
 The proposed approach aims to create an automatic multi-class sentiment analysis system that combines the GI dictionary with a hierarchical architecture of LMTs.
 Experiments have demonstrated that combining a wide list of features with a two-stage classification method achieves high accuracy: 90.88% for sentiment polarity, 99.28% for positive sentiment intensities and 99.37% for negative sentiment intensities.
Future Work
 There are several directions for future work.
     The first direction is to study the performance of the proposed system on other datasets described in the sentiment analysis literature and on multilingual data from social media.
     The second direction is to investigate the influence of incorporating subjective and objective features on the overall classification accuracy.
     The third direction is to use the new system to build a sentiment-based search engine for YouTube videos.
     Finally, the fourth direction is to find the relationship between users' personality and their sentiments on social media, in order to recommend resources adapted to their needs and interests.
ACKNOWLEDGMENT
 This work has been supported by the Decanato de
Investigación y Desarrollo (DID) from Simon Bolivar
University (USB).
References
 Cassinelli A. and Chen CW., 2009. CS224N Final Project: Boost up! Sentiment categorization with machine learning techniques.
 Inkpen D. Z., Feiguina O. and Hirst G., 2005. Generating more-positive and more-negative text. In Computing Attitude and Affect in Text: Theory and Applications, edited by J. Shanahan, Y. Qu and J. Wiebe. The Information Retrieval Series, 20, Springer, Dordrecht, the Netherlands, 187–196.
 Kennedy A. and Inkpen D., 2006. Sentiment classification of movie reviews using contextual valence shifters, in Computational Intelligence, 22(2), 110–125.
 Landwehr N., Hall M. and Frank E., 2003. Logistic Model Trees, in Proc. 14th European Conf. on Machine Learning, Cavtat-Dubrovnik, Croatia, 241–252.
 Pandey SJ., 2011. Opinion analysis through constraint optimization, Master's Thesis, University of York.
 Pang B. and Lee L., 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, in Proc. of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA, 271–278.
 Pang B. and Lee L., 2008. Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, 2(1-2), 1–135.
 Pang B., Lee L. and Vaithyanathan S., 2002. Thumbs up? Sentiment classification using machine-learning techniques, in Proc. of the ACL-02 Conf. on Empirical Methods in Natural Language Processing, Philadelphia, PA, 79–86.
 Porter MF., 1980. An algorithm for suffix stripping, Program, 14(3), 130–137.
 Prabowo R. and Thelwall M., 2009. Sentiment analysis: A combined approach, Journal of Informetrics, 3(2), 143–157.
 Severyn A., Moschitti A., Uryupina O., Plank B. and Filippova K., 2014. Opinion mining on YouTube, in Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2014).
 Uryupina O., Plank B., Severyn A., Rotondi A. and Moschitti A., 2014. SenTube: A corpus for sentiment analysis on YouTube social media, in Proc. of the 9th Conf. on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, 4244–4249.
 Vinodhini G. and Chandrasekaran RM., 2012. Sentiment Analysis and Opinion Mining: A Survey, International Journal of Advanced Research in Computer Science and Software Engineering, 2(6), 282–292.
 Wilson T., Wiebe J. and Hwa R., 2004. Just how mad are you? Finding strong and weak opinion clauses, in Proc. of 19th National Conf. on Artificial Intelligence, 761–769.
 Witten I.H., Frank E. and Hall M., 2005. Data Mining: Practical machine learning tools and techniques, The Morgan Kaufmann Series in Data Management Systems.
Questions
 Thank you very much

Multi-Class Sentiment Analysis using a Hierarchical Logistic Model Tree Approach

  • 1.
    By Prof. Masun NabhanHomsi Assistant Professor Departamento de computación yTecnología de la Información Universidad Simón Bolívar Caracas -Venezuela Multi-Class Sentiment Analysis using a Hierarchical Logistic Model Tree Approach
  • 2.
    Multi-Class SentimentAnalysis usinga Hierarchical Logistic ModelTree Approach Masun Nabhan Homsi, USB,Venezuela About the author 2  Masun Nabhan Homsi  2013 – Present  Assistant Professor, Computer Science and InformationTechnology Department, Simon Bolivar University, Caracas –Venezuela.  2011  PhD Degree, University of Aleppo, Syria, inArtificial Intelligence, student model, IntelligentTutoring System.  2002  Master Degree, University ofAleppo, Syria, in Artificial Intelligence for designing Intelligent software for deaf-mute people. Research Interest  SentimentAnalysis, Personality Recognition, Intelligent Tutoring Systems, Recommender systems & others.
  • 3.
    Multi-Class SentimentAnalysis usinga Hierarchical Logistic ModelTree Approach Masun Nabhan Homsi, USB,Venezuela Agenda  Introduction  Motivation  Objectives  General Inquirer Dictionary (GI)  Logistic ModelTree (LMT)  Datasets used in this research  Methodology  Preprocessing phase  Sentiment Classification Phase  Results & Discussions  Feature Extraction and Bipolar Sentiments  Sentiments Intensities.  Comparison and analysis.  Conclusions and Future work.3
  • 4.
    Introduction
      • What is Sentiment Analysis (SA)?
          • Sentiments are feelings, opinions, emotions, likes/dislikes, good/bad.
          • SA is the study of human behavior in which user opinion and emotion are extracted from plain text.
          • SA is also known as Opinion Mining (OM).
      • Example:
          • John: It is a great movie (Positive).
          • Alice: No, I don't like it (Negative).
          • Mark: Movie's stars is … (Neutral)
      • Polarity: Positive, Negative, Neutral.
  • 5.
    Introduction
      • Sentiment Analysis tasks: Objective & Subjective.
      • Sentiment Analysis levels: Word level, Sentence level, Document level, Aspect level.
      • Sentiment Analysis approaches: Lexicon-based, Machine Learning-based, Hybrid.
  • 7.
    Motivation
      • Most approaches in the SA field deal with polarity classification and ignore the degree of users' satisfaction or dissatisfaction with a resource found on social media sites.
      • Detecting several sentiment intensities in a text helps users access valuable information, improving their choices and the quality of their decisions.
      • In this context, a Multi-Class Sentiment Analysis System (MCSAS) is built to automatically extract sentiments from English text, expressed in six dimensions: High Positive (HP), Positive (P), Low Positive (LP), Low Negative (LN), Negative (N) and High Negative (HN).
  • 9.
    Objectives
      • First objective: to describe the steps taken to build MCSAS for document classification using the General Inquirer (GI) Dictionary and the Logistic Model Tree (LMT) algorithm.
      • Second objective: to measure the effect of feature selection on the overall performance of sentiment classification.
      • Third objective: to compare the experimental results of the new system with results from other research.
  • 11.
    General Inquirer Dictionary (GI)
      • GI is a computational lexicon compiled from several sources:
          • Harvard IV-4 dictionary
          • Lasswell value dictionary
      • It defines 182 labels, e.g. Positive, Negative, Pain, Virtue, Negate, etc.
      • 24 labels are used: Positiv, Negativ, Pstv, Affil, Negtv, Hostile, Strong, Weak, Submit, Active, Passive, Pleasur, Pain, Feel, Arousal, Emot, Virtue, Vice, Ovrst, Undrst, Yes, No, Negate, and Intrj.
  • 13.
    Logistic Model Tree (LMT)
      • A regression tree, but with logistic models on the leaves.
      • [Figure: example tree that splits on f (f=0 / f=1) and on g (g<=10 / g>10); each leaf plots P(TaskSuccess=1) against % Nonunderstandings (FNON).]
  • 15.
    Datasets
      • Movie Review Dataset: movie reviews with their associated binary sentiment polarity labels, split into 1000 positive and 1000 negative documents.
      • SenTube Dataset: user-generated comments on YouTube videos, annotated for information content and sentiment polarity (Uryupina et al., 2014). The dataset covers English and Italian videos on the same products (automobiles, tablets). Sentiment is divided into sentiment towards the product and sentiment towards the video. 117 English videos are labeled as positive, 61 as negative and 39 as neutral.
  • 17.
    System Architecture
  • 19.
    Preprocessing Phase
      • The primary objective of this phase is to extract feature vectors (FVs) from the documents in the dataset CD = {d1, d2, d3, …, dn}, where n is the total number of labeled documents. Each FVi records the occurrence counts of the different word senses in di (i = 1, …, n).
  • 20.
    Preprocessing Phase
      • 1) Translate slang words, acronyms, abbreviations and emoticons found in the raw text di into their original meanings. The Internet Slang Dictionary (ISD) is used for this task; it contains 5380 terms from various sources, including Bulletin Board, AIM, Yahoo, Twitter, YouTube, Chat Room and others. http://www.internetslang.com
      • Example: "This film is extraordinarily horrendous and I'm not going to waste any more words on it." The trailing emoticon is replaced by "sad".
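
The translation step can be pictured as a token-by-token dictionary lookup. The sketch below is only an illustration: the `SLANG` table is a tiny hypothetical stand-in for the ISD, and splitting on whitespace is an assumption rather than the system's actual matching rule.

```python
# Hypothetical miniature stand-in for the Internet Slang Dictionary (ISD).
SLANG = {"gr8": "great", "imo": "in my opinion", ":(": "sad", ":)": "happy"}

def translate_slang(text, table=SLANG):
    """Replace each whitespace-delimited token that appears in the table."""
    return " ".join(table.get(tok.lower(), tok) for tok in text.split())

print(translate_slang("IMO this film is not gr8 :("))
```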
  • 21.
    Preprocessing Phase
      • 2) Search for contracted words in di and replace them with their original form.
      • Example: "This film is extraordinarily horrendous and I'm not going to waste any more words on it sad." Here "I'm" is replaced by "I am".
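
A minimal sketch of contraction expansion, assuming a small hand-written table (`CONTRACTIONS` is hypothetical and far from complete) and case-insensitive whole-word matching:

```python
import re

# Hypothetical, deliberately small contraction table.
CONTRACTIONS = {"i'm": "I am", "don't": "do not", "it's": "it is", "can't": "cannot"}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text):
    """Replace every contraction listed in the table with its expanded form."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)

print(expand_contractions("This film is horrendous and I'm not going to waste words"))
```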
  • 22.
    Preprocessing Phase
      • Tokenize the text by breaking it up into words: di = {w1, w2, w3, …, wk}, where k is the number of tokens in the document.
      • Tag each word using a Part-Of-Speech Tagger (POS-Tagger). Words tagged as IN (preposition), DT (determiner), PRP (personal pronoun), MD (modal), WRB (wh-adverb), NNP (proper noun), EX (existential there) or WP (wh-pronoun) are treated as stop words and ignored. A tagged word is represented by (wj, tj), where tj is the lexical category of the word wj and j = 1, …, k.
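
The stop-word rule amounts to filtering (word, tag) pairs by tag. The sketch below assumes the tagging has already been done by some POS tagger and uses the Penn Treebank tag WRB for wh-adverbs; the tagged sentence is typed in by hand for illustration.

```python
# Tags treated as stop words in the slide above (Penn Treebank names).
STOP_TAGS = {"IN", "DT", "PRP", "MD", "WRB", "NNP", "EX", "WP"}

def drop_stop_tags(tagged_words):
    """Keep only (word, tag) pairs whose tag is not a stop tag."""
    return [(w, t) for (w, t) in tagged_words if t not in STOP_TAGS]

# Hand-tagged fragment of the running example.
tagged = [("this", "DT"), ("film", "NN"), ("is", "VBZ"),
          ("extraordinarily", "RB"), ("horrendous", "JJ"),
          ("on", "IN"), ("it", "PRP")]

print([w for w, _ in drop_stop_tags(tagged)])
```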
  • 23.
    Preprocessing Phase
      • Part-Of-Speech Tagger (POS-Tagger) output for the example:
          • this ==> DT (Determiner)
          • film ==> NN (Noun, singular)
          • is ==> VBZ (Verb, 3rd person singular present)
          • extraordinarily ==> RB (Adverb)
          • horrendous ==> JJ (Adjective)
          • and ==> CC (Coordinating conjunction)
          • i ==> PRP (Personal pronoun)
          • am ==> VB (Verb)
          • not ==> RB (Adverb)
          • go ==> VB (Verb)
          • waste ==> VB (Verb)
          • any ==> DT (Determiner)
          • more ==> JJR (Adjective, comparative)
          • words ==> NNS (Noun, plural)
          • on ==> IN (Preposition)
          • it ==> PRP (Personal pronoun)
  • 24.
    Preprocessing Phase
      • Search for the word wj in the GI dictionary to find its different senses. If wj is not found, stem it to find its root, using the Porter stemming algorithm.
      • Example: "This film is extraordinarily horrendous and I am not going to waste any more words on it sad." Here "going" is replaced by "go".
  • 25.
    Preprocessing Phase
      • Look up the stemmed word wj in the GI dictionary again, together with its POS tag, to avoid ambiguity between word senses.
      • If the stemmed word is found in GI, create the word senses vector WjS = {Positivj, Negativj, Pstvj, Affilj, Negtvj, Hostilej, Strongj, Weakj, Submitj, Activej, Passivej, Pleasurj, Painj, Feelj, Arousalj, Emotj, Virtuej, Vicej, Ovrstj, Undrstj, Yesj, Noj, Negatej, Intrjj, NWj}.
      • Otherwise, search for its synonym in the Wordnik dictionary.
      • Example: "This film is extraordinarily horrendous and I am not go to waste any more words on it sad." Here "horrendous" is replaced by "fearful".
  • 26.
    Preprocessing Phase
      • Words not found in the GI dictionary are considered noise and are ignored. Non-English words are discarded in the same way, so no language detection procedure is needed.
  • 27.
    Preprocessing Phase
      • Calculate the Positive Polarity (PPi) by summing the number of occurrences in di of the senses Positiv, Pstv, Affil and Yes.
      • Compute the Negative Polarity (NPi) by summing the number of occurrences in di of the senses Negativ, Negtv, Hostile, No and Negate.
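
The two sums can be sketched as counts over per-word sense sets. The sense sets below are the GI labels the deck lists for the running example sentence; counting each matching sense once per word is an assumption about how "occurrences" are tallied.

```python
POSITIVE_SENSES = ("Positiv", "Pstv", "Affil", "Yes")
NEGATIVE_SENSES = ("Negativ", "Negtv", "Hostile", "No", "Negate")

def polarity_counts(doc_senses):
    """Return (PP, NP) for a document given one set of GI labels per word."""
    pp = sum(1 for senses in doc_senses for s in POSITIVE_SENSES if s in senses)
    np_ = sum(1 for senses in doc_senses for s in NEGATIVE_SENSES if s in senses)
    return pp, np_

# GI labels of the words in the running example.
example = [
    {"Positiv", "Virtue", "Ovrst"},                 # extraordinary
    {"Negativ", "Weak", "Passive", "Pain", "Emot"},  # fearful (horrendous)
    {"Negate"},                                      # not
    {"Active"},                                      # go
    {"Negativ", "Weak", "Vice"},                     # waste
    {"Strong"},                                      # more
]

print(polarity_counts(example))
```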
  • 28.
    Preprocessing Phase
      • Count the frequency of explicit negation words (NWi).
      • The list of negation words used in this research, denoted N, is: "no, not, none, no one, nobody, nothing, neither, nowhere and never".
  • 29.
    Preprocessing Phase
      • Search within a window of 2 words to detect the Positive Polarity Shifter (PPSi) and the Negative Polarity Shifter (NPSi).
      • Example: in "I do not hate it", "not" is a negation and "hate" is negative, so (Negation + Negative) = Positive.
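
A rough sketch of the window search. Only the rule (Negation + Negative) = Positive comes from the slide; the mirrored rule (Negation + Positive) = Negative, the choice to look at the two tokens after the negation word, and the toy word sets are all assumptions. The two-word negation "no one" is left out for brevity.

```python
NEGATIONS = {"no", "not", "none", "nobody", "nothing", "neither", "nowhere", "never"}

def polarity_shifters(tokens, positive_words, negative_words, window=2):
    """Count (PPS, NPS): negations followed within `window` tokens by a polar word."""
    pps = nps = 0
    for i, tok in enumerate(tokens):
        if tok not in NEGATIONS:
            continue
        following = tokens[i + 1 : i + 1 + window]
        if any(w in negative_words for w in following):
            pps += 1   # negation + negative word shifts towards positive
        elif any(w in positive_words for w in following):
            nps += 1   # assumed mirror rule: negation + positive word
    return pps, nps

print(polarity_shifters("i do not hate it".split(), {"like"}, {"hate"}))
```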
  • 30.
    Preprocessing Phase
      • Calculate the Total Positive (TPi) and the Total Negative (TNi).
      • Compute the potency senses (Strong, Weak and Submit) and the remaining senses by counting their occurrences.
      • Example: "This film is extraordinarily fearful and I am not go to waste any more words on it sad."
      • Feature vector: 0,0,0,0,1,2,0,1,1,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,
  • 31.
    Preprocessing Phase
      • Film: no labels.
      • Extraordinary: Positiv, Virtue and Ovrst.
      • Fearful (horrendous): Negativ, Weak, Passive, Pain, Emot.
      • Not: Negate.
      • Go (going): Active.
      • Waste: Negativ, Weak, Vice.
      • More: Strong.
      • Word: no labels.
  • 32.
    Preprocessing Phase
      • Calculate the percentage of each feature by dividing it by the number of words found in GI. The final feature vector representing one document di in CD has length 19: FVi = {TPi, TNi, Strongi, Weaki, Submiti, Activei, Passivei, Pleasuri, Paini, Feeli, Arousali, Emoti, Virtuei, Vicei, Ovrsti, Undrsti, Intrji, NWi, Classi}.
      • CD contains 2178 documents, split into 1117 positive and 1061 negative. Because of this imbalanced class distribution, three re-sampling techniques were applied:
          • Re-sampling without replacement
          • SMOTE
          • Under-sampling
      • The new dataset CD1 contains 2214 vectors, 1107 in each sentiment class.
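
The normalization step can be sketched as follows. Multiplying by 100 to obtain a percentage and returning zeros for a document with no GI words are assumptions; the feature names and counts are illustrative only.

```python
def to_percentages(counts, n_gi_words):
    """Divide each raw feature count by the number of words found in GI."""
    if n_gi_words == 0:
        return {name: 0.0 for name in counts}  # assumed handling of empty documents
    return {name: 100.0 * c / n_gi_words for name, c in counts.items()}

# Illustrative counts for a document with 6 words found in GI.
raw = {"TP": 1, "TN": 3, "Strong": 1}
print(to_percentages(raw, 6))
```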
  • 34.
    Sentiment Classification Phase
      • A hierarchical LMT (HLMT) is built to discover multi-class sentiment in English text. It consists of three layers:
          • Bipolar Layer (BL): uses the first LMT (LMT-1) to detect positive and negative sentiments.
          • Grouping Layer (GL): positive and negative instances are grouped separately by two k-means runs, with 3 clusters for the positive instances and 3 for the negative instances. The instances in each cluster are then labeled manually. GL is used only in the construction phase of MCSAS.
          • Intensity Layer (IL): uses two LMTs (LMT-2, LMT-3) to determine sentiment strength, expressed in six levels (three for each LMT).
      • The three LMTs are trained and tested using 10-fold stratified cross-validation.
  • 35.
    Sentiment Classification Phase
      1. Present the dataset CD1.
      2. LMT-1 is trained and tested to model positive and negative instances in CD1.
      3. CD1 is partitioned into two sub-datasets, CD11 and CD12, for positive and negative instances respectively.
      4. Instances in CD11 are grouped into three clusters using the k-means algorithm.
      5. The 3 generated clusters for the positive dimensions (HP, P and LP) are labeled manually, and the instances in CD11 are saved in a new sub-dataset CDP.
      6. Instances in CD12 are also grouped into three clusters using the k-means algorithm.
      7. The 3 generated clusters for the negative dimensions (HN, N and LN) are labeled manually, and the instances in CD12 are saved in a new sub-dataset CDN.
      8. LMT-2 is trained and tested to model positive intensities in CDP.
      9. LMT-3 is trained and tested to model negative intensities in CDN.
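
Once the three models are built, prediction only needs the bipolar and intensity layers, since the grouping layer is used during construction. The routing can be sketched as below; the three classifiers are passed in as plain callables, and the toy lambdas are hypothetical stand-ins, not trained LMTs.

```python
def classify_hierarchical(fv, lmt1, lmt2, lmt3):
    """Route a feature vector through the bipolar layer, then the intensity layer."""
    polarity = lmt1(fv)        # expected to return "positive" or "negative"
    if polarity == "positive":
        return lmt2(fv)        # expected to return "HP", "P" or "LP"
    return lmt3(fv)            # expected to return "HN", "N" or "LN"

# Toy stand-ins used only to exercise the routing logic.
toy_lmt1 = lambda fv: "positive" if fv["TP"] >= fv["TN"] else "negative"
toy_lmt2 = lambda fv: "HP" if fv["TP"] > 50 else "P"
toy_lmt3 = lambda fv: "HN" if fv["TN"] > 50 else "N"

print(classify_hierarchical({"TP": 60, "TN": 10}, toy_lmt1, toy_lmt2, toy_lmt3))
```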
  • 37.
    Results & Discussions
      • Experiments were performed in two sets:
          • The proposed feature extraction algorithm is evaluated and compared with the TF*IDF algorithm, using six different ML algorithms to detect sentiment polarity in BL.
          • The accuracy of LMT-2 and LMT-3 in IL is measured for detecting sentiment intensities, reflected in six classes.
  • 39.
    Feature Extraction and Bipolar Sentiments
  • 40.
    Feature Extraction and Bipolar Sentiments
      LM_1, Class 0:
        -2.04 + [TotalP] * -0.72 + [TotalN] * -0.6 + [Strong] * 0.34 + [Weak] * -0.07 + [Submit] * 7.34 + [Active] * 0.61 + [Passive] * 0.07 + [Pleasur] * -0.22 + [Pain] * -0.58 + [Feel] * 15.58 + [Arousal] * 1.66 + [Emot] * -2.67 + [Virtue] * -0.19 + [Vice] * 1.25 + [Ovrst] * 0.33 + [Undrst] * -0.33 + [Intrj] * 0.86 + [Negation] * -2.46
      LM_1, Class 1:
        2.04 + [TotalP] * 0.72 + [TotalN] * 0.6 + [Strong] * -0.34 + [Weak] * 0.07 + [Submit] * -7.34 + [Active] * -0.61 + [Passive] * -0.07 + [Pleasur] * 0.22 + [Pain] * 0.58 + [Feel] * -15.58 + [Arousal] * -1.66 + [Emot] * 2.67 + [Virtue] * 0.19 + [Vice] * -1.25 + [Ovrst] * -0.33 + [Undrst] * 0.33 + [Intrj] * -0.86 + [Negation] * 2.46
      The generated tree of LMT-1 is of size 95 with 48 leaves (rules).
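
In a logistic model tree, the per-class linear functions at a leaf are usually turned into class probabilities with a softmax. The sketch below applies that to the LM_1 coefficients above; it covers this one leaf only (the full tree has 48), and the all-zero feature vector is a hypothetical input.

```python
import math

# LM_1 "Class 1" coefficients from the slide; "Class 0" is their exact negation.
CLASS1 = {
    "TotalP": 0.72, "TotalN": 0.6, "Strong": -0.34, "Weak": 0.07, "Submit": -7.34,
    "Active": -0.61, "Passive": -0.07, "Pleasur": 0.22, "Pain": 0.58, "Feel": -15.58,
    "Arousal": -1.66, "Emot": 2.67, "Virtue": 0.19, "Vice": -1.25, "Ovrst": -0.33,
    "Undrst": 0.33, "Intrj": -0.86, "Negation": 2.46,
}
INTERCEPT1 = 2.04

def leaf_probabilities(fv):
    """Return (P(class 0), P(class 1)) via a softmax over the two linear functions."""
    f1 = INTERCEPT1 + sum(w * fv.get(name, 0.0) for name, w in CLASS1.items())
    f0 = -f1
    m = max(f0, f1)                       # subtract the max for numerical stability
    e0, e1 = math.exp(f0 - m), math.exp(f1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

p0, p1 = leaf_probabilities({})           # hypothetical all-zero feature vector
print(round(p0, 4), round(p1, 4))
```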
  • 41.
    Feature Extraction and Bipolar Sentiments
      • Comparison of ML algorithms' accuracy for sentiment polarity classification in BL:
        | Feature Extraction | NB      | SVM     | KNN     | JRip    | J48     | LMT-1   |
        | GI                 | 66.80 % | 72.22 % | 91.86 % | 83.74 % | 88.66 % | 90.88 % |
        | TF*IDF             | 76.45 % | 76.75 % | 52.80 % | 65.93 % | 64.10 % | 65.70 % |
      • Confusion matrix of LMT-1 for the Bipolar Layer:
        |          | Positive | Negative |
        | Positive | 996      | 111      |
        | Negative | 91       | 1016     |
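
As a quick consistency check, the reported LMT-1 accuracy can be recomputed from the confusion matrix by dividing the diagonal by the total number of instances:

```python
def accuracy(matrix):
    """Overall accuracy: correctly classified (diagonal) over all instances."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

lmt1_confusion = [[996, 111],
                  [91, 1016]]

print(f"{accuracy(lmt1_confusion):.2%}")  # 90.88%
```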
  • 43.
    Sentiment Intensities
      • LMT-2 and LMT-3 each produced a tree of size 1 with a single leaf. In the prediction phase, therefore, only six linear functions are needed to classify multi-class sentiments expressed in English text, three for each sentiment pole.
      • Negative intensity functions:
        HN = -30.23 + $Strong * -0.12 + $Submit * 0.52 + $Active * -0.09 + $Passive * 1.2 + $Pleasur * 4.65 + $Pain * 4.49 + $Emot * 7.06 + $Vice * -0.14 + $Ovrst * 0.07 + $Undrst * 0.55 + $Intrj * 1.72;
        N = 21.48 + $TotalP * 0.29 + $TotalN * -1.84 + $Strong * -0.37 + $Weak * -0.44 + $Submit * -0.15 + $Pain * -0.53 + $Arousal * 0.44 + $Emot * -0.78 + $Virtue * 0.2 + $Vice * -0.11 + $Ovrst * -0.06 + $Intrj * 2.1 + $Negation * 0.2;
        LN = -18.43 + $TotalP * -0.52 + $TotalN * 0.98 + $Strong * 0.99 + $Active * 0.26 + $Passive * -0.1 + $Pleasur * -2.04 + $Feel * -1.28 + $Arousal * -2.31 + $Virtue * -0.45 + $Vice * 0.8 + $Intrj * -0.31;
  • 44.
    Sentiment Intensities
      • Positive intensity functions:
        HP = -1.24 + $Submit * 12.71 + $Active * -0.39 + $Passive * 0.15 + $Pain * -0.28 + $Feel * 4.35 + $Vice * -0.21 + $Ovrst * -0.22 + $Undrst * -0.4 + $Intrj * -0.7;
        P = 3.09 + $TotalN * -0.41 + $Weak * -1.03 + $Submit * -0.89 + $Passive * -0.04 + $Pain * -0.69 + $Emot * -0.07 + $Vice * -0.68 + $Ovrst * 0.43 + $Undrst * 0.19 + $Negation * 1.05;
        LP = -25.16 + $TotalN * 1.13 + $Strong * 0.23 + $Weak * 0.64 + $Active * 0.16 + $Pain * 1.67 + $Vice * 2.84 + $Negation * -0.21;
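
Because each intensity model reduces to three linear functions, a prediction can be sketched as evaluating them and taking the largest value. The coefficients below are the positive intensity functions from the slide; the input feature vectors are hypothetical.

```python
# (intercept, {feature: weight}) for each positive intensity class.
POSITIVE_FUNCTIONS = {
    "HP": (-1.24, {"Submit": 12.71, "Active": -0.39, "Passive": 0.15, "Pain": -0.28,
                   "Feel": 4.35, "Vice": -0.21, "Ovrst": -0.22, "Undrst": -0.4,
                   "Intrj": -0.7}),
    "P": (3.09, {"TotalN": -0.41, "Weak": -1.03, "Submit": -0.89, "Passive": -0.04,
                 "Pain": -0.69, "Emot": -0.07, "Vice": -0.68, "Ovrst": 0.43,
                 "Undrst": 0.19, "Negation": 1.05}),
    "LP": (-25.16, {"TotalN": 1.13, "Strong": 0.23, "Weak": 0.64, "Active": 0.16,
                    "Pain": 1.67, "Vice": 2.84, "Negation": -0.21}),
}

def predict_intensity(fv, functions=POSITIVE_FUNCTIONS):
    """Return the class whose linear function scores highest for this vector."""
    scores = {
        label: intercept + sum(w * fv.get(name, 0.0) for name, w in weights.items())
        for label, (intercept, weights) in functions.items()
    }
    return max(scores, key=scores.get)

print(predict_intensity({"Submit": 1.0}))  # hypothetical feature vector
```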
  • 45.
    Sentiment Intensities
      • Distribution of positive and negative instances over clusters in GL:
        | Dataset         | Cluster 0 | Cluster 1 | Cluster 2 |
        | CD11 (Positive) | 208 (19%) | 565 (51%) | 334 (30%) |
        | CD12 (Negative) | 228 (21%) | 521 (47%) | 358 (32%) |
      • Comparison of ML algorithms' accuracy for sentiment intensity classification:
        | Dataset        | NB      | SVM     | KNN     | JRip    | J48     | LMT-2 / LMT-3 |
        | CDP (Positive) | 93.32 % | 97.56 % | 93.50 % | 95.48 % | 95.03 % | 99.28 %       |
        | CDN (Negative) | 92.14 % | 98.28 % | 92.77 % | 93.95 % | 94.40 % | 99.37 %       |
  • 46.
    Sentiment Intensities
      • Confusion matrix of LMT-2 for positive intensities:
        |    | HP           | P            | LP           |
        | HP | 206 (99.03%) | 2            | 0            |
        | P  | 3            | 560 (99.11%) | 2            |
        | LP | 1            | 0            | 333 (99.70%) |
      • Confusion matrix of LMT-3 for negative intensities:
        |    | HN           | N            | LN           |
        | HN | 225 (99.11%) | 1            | 2            |
        | N  | 1            | 354 (98.88%) | 3            |
        | LN | 0            | 0            | 521 (100%)   |
  • 48.
    Comparison and Analysis
      • Comparison of hybrid sentiment classifiers:
        | Ref.                       | Dataset                 | Feature Selection                                                                 | Technique | Accuracy % |
        | BL                         | Movie Reviews + SenTube | 24 word senses in GI Dictionary + Handling Negation                               | LMT       | 90.88      |
        |                            |                         |                                                                                   | KNN       | 91.86      |
        | Prabowo et al. (2009)      | Movie Reviews + MySpace | RBC + SBC + GIBC                                                                  | SVM       | 90.45      |
        | Kennedy and Inkpen (2006)  | Movie Reviews           | 4 word senses in GI and CTRW Dictionaries + Unigrams + Bigrams + Handling Negation | SVM       | 85.9       |
        | Cassinelli and Chen (2009) | Movie Reviews           | Frequency of positive and negative word senses from GI                            | SVM       | 73.8       |
        | Severyn et al. (2014)      | SenTube                 | STRUCT model: a shallow syntactic tree                                            | SVM       | 70.5       |
  • 49.
    Comparison and Analysis
      • Comparison of multi-class sentiment classifiers:
        | Ref.                 | Dataset                                                        | Classes | Technique                 | Average Accuracy % |
        | IL                   | Movie Reviews + SenTube                                        | 6       | 2 LMT                     | 99.30              |
        | Pandey (2011)        | Multi-Domain Sentiment Dataset                                 | 4       | SVM                       | 71.73              |
        | Wilson et al. (2004) | MPQA (Multi-Perspective Question Answering) Opinion Corpus     | 4       | BoosTexter                | 57.52              |
        |                      |                                                                |         | Ripper                    | 55.55              |
        |                      |                                                                |         | Support Vector Regression | 36.53              |
  • 51.
    Conclusions
      • The proposed approach aims at creating an automatic multi-class sentiment analysis system that uses the GI dictionary with a hierarchical architecture of LMTs.
      • Experiments have shown that combining a wide list of features with a two-stage classification method achieves high accuracy: 90.88% for sentiment polarity, 99.28% for positive sentiment intensities and 99.37% for negative sentiment intensities.
  • 52.
    Future Work
      • There are several directions for future work:
          • To study the performance of the proposed system on other datasets described in the sentiment analysis literature and on multilingual data from social media.
          • To investigate the influence of incorporating subjective and objective features on the overall classification accuracy.
          • To use the new system to build a sentiment-based search engine for YouTube videos.
          • To find the relationship between users' personality and their sentiments on social media, in order to recommend resources adapted to their needs and interests.
  • 53.
    Acknowledgment
      • This work has been supported by the Decanato de Investigación y Desarrollo (DID) of Simon Bolivar University (USB).
  • 54.
    References
      • Cassinelli A. and Chen C.W., 2009. CS224N Final Project Boost up! Sentiment categorization with machine learning techniques.
      • Inkpen D.Z., Feiguina O., and Hirst G., 2005. Generating more-positive and more-negative text, in Computing Attitude and Affect in Text: Theory and Applications, edited by J. Shanahan, Y. Qu, and J. Wiebe. The Information Retrieval Series, 20, Springer, Dordrecht, the Netherlands, 187–196.
      • Kennedy A. and Inkpen D., 2006. Sentiment classification of movie reviews using contextual valence shifters, Computational Intelligence, 22(2), 110–125.
      • Landwehr N., Hall M., and Frank E., 2003. Logistic Model Trees, in Proc. 14th European Conf. on Machine Learning, Cavtat-Dubrovnik, Croatia, 241-252.
      • Pandey S.J., 2011. Opinion analysis through constraint optimization, Master Thesis, University of York.
      • Pang B. and Lee L., 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, in Proc. of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA, 271-278.
      • Pang B. and Lee L., 2008. Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, 2(1-2), 1–135.
  • 55.
    References
      • Pang B., Lee L., and Vaithyanathan S., 2002. Thumbs up? Sentiment classification using machine-learning techniques, in Proc. of the ACL-02 Conf. on Empirical Methods in Natural Language Processing, Philadelphia, PA, 79–86.
      • Porter M.F., 1980. An algorithm for suffix stripping, Program, 14(3), 130-137.
      • Prabowo R. and Thelwall M., 2009. Sentiment analysis: A combined approach, Journal of Informetrics, 3(2), 143-157.
      • Severyn A., Moschitti A., Uryupina O., Plank B., and Filippova K., 2014. Opinion mining on YouTube, in Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2014).
      • Uryupina O., Plank B., Severyn A., Rotondi A. and Moschitti A., 2014. SenTube: A corpus for sentiment analysis on YouTube social media, in Proc. of the 9th Conf. on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, 4244-4249.
      • Vinodhini G. and Chandrasekaran R.M., 2012. Sentiment Analysis and Opinion Mining: A Survey, International Journal of Advanced Research in Computer Science and Software Engineering, 2(6), 282-292.
      • Wilson T., Wiebe J. and Hwa R., 2004. Just how mad are you? Finding strong and weak opinion clauses, in Proc. of the 19th National Conf. on Artificial Intelligence, 761-769.
      • Witten I.H., Frank E., and Hall M., 2005. Data Mining: Practical machine learning tools and techniques, The Morgan Kaufmann Series in Data Management Systems.
  • 56.
    Questions
      • Thank you very much.