Process the sentiments of NLP with the Naive Bayes rule, Random Forest, Support Vector Machine, and much more.
Thanks for your time! If you enjoyed this short slide deck, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
2. Sentiment analysis
What is sentiment analysis?
Sentiment analysis is contextual mining of text that identifies and
extracts subjective information, helping a business understand
the social sentiment of its brand, product, or service.
In other words, it is the process of determining whether a piece of writing
is positive, negative, or neutral.
Applications of sentiment analysis
Sources of data: Twitter, Facebook, surveys, product reviews, etc.
Applications:
1.) Fashion: accessories, apparel, outlets, design, brands, etc.
2.) Automobile: types of pre-owned cars, features, requirements, etc.
3.) Books, malls and stores, online services, travel, healthcare, etc.
3. Sentiment analysis: 1. Naïve Bayes
Machine Learning Classification Methods
1) Naïve Bayes: this supervised classification method uses Bayes' rule,
applied to the "bag of words" of a document. Example word counts for a document:
Office 1
Traffic 3
Time 2
Early 1
Late 2
* A bag of words (BoW) is a collection of words that discards grammar
and word order but keeps the multiplicity of each word.
It is a way of extracting features from text for use in machine learning
models.
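As a quick illustration, here is a minimal bag-of-words sketch using scikit-learn's CountVectorizer; the sentence is a made-up example echoing the counts above:

# Minimal bag-of-words sketch (hypothetical example sentence).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["Late to the office: traffic, traffic, traffic. Time is short, time is late."]
vectorizer = CountVectorizer()            # discards grammar and word order
counts = vectorizer.fit_transform(docs)   # sparse matrix of word multiplicities
print(dict(zip(vectorizer.get_feature_names_out(), counts.toarray()[0])))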
4. Recap: Naive Bayes Rule
Naive Bayes was widely used in spam filtering. The algorithm takes the
count of a particular word's mentions in the spam list versus in normal
mail, then multiplies the two probabilities using the Bayes equation.
[Figure: a good-word list and a spam list with example word counts
(Great 235, Opportunities 3, Speak 44, Meeting 246, Collaborative 3,
Sales 77, Scope 98, 100% 642, Fast 78, Hurry 40), used to classify the
word "hello" via Bayes' rule:]
P(A|B) = P(B|A) * P(A) / P(B)  =>  "hello" is Not Spam
Later, spammers figured out how to trick spam filters by adding lots of
"good" words at the end of an email; this method is called Bayesian poisoning.
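A toy sketch of this word-counting idea in Python; the corpus sizes and counts below are made up, and the 50/50 spam prior is an assumption:

# Toy Bayes-rule spam check: P(spam|word) = P(word|spam) * P(spam) / P(word)
n_spam, n_ham = 1000, 1000                       # hypothetical corpus sizes
spam_counts = {"hurry": 40, "100%": 642, "hello": 10}
ham_counts = {"hurry": 2, "100%": 5, "hello": 300}

def p_spam_given(word, p_spam=0.5):
    p_w_spam = spam_counts.get(word, 0) / n_spam   # P(word|spam)
    p_w_ham = ham_counts.get(word, 0) / n_ham      # P(word|not spam)
    p_word = p_w_spam * p_spam + p_w_ham * (1 - p_spam)
    return p_w_spam * p_spam / p_word

print(p_spam_given("hello"))   # low probability, so "hello" is Not Spam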
5. Recap: Naive Bayes Rule
It ignores a few things (word order, grammar, and document length) and
just looks at word frequencies to do the classification.
Naïve Bayes strengths & weaknesses
Advantage:
Being a supervised classification algorithm, it is easy to implement.
Weakness:
It breaks in funny ways. Previously, when people did a Google search for
"Chicago Bulls", it returned results about animals rather than the team,
because phrases that comprise multiple words with distinct meanings
don't work with Naïve Bayes. It also requires a categorical
variable as the target.
Assumptions: bag of words, so word position doesn't matter, and
conditional independence, e.g. "great" occurring is not dependent on the
word "fabulous" occurring in the same document.
6. Recap: Naive Bayes Rule
Prior probability of Green = no. of green objects / total no. of objects = 40/60 = 4/6
Prior probability of Red = no. of red objects / total no. of objects = 20/60 = 2/6
The prior probability is computed without any knowledge about the point;
the likelihood is computed after knowing what the data point is.
Likelihood of 'x' being Red = no. of red points / total no. of points in
the neighborhood
Likelihood of 'x' being Green = no. of green points / total no. of points
in the neighborhood
Posterior probability of 'x' being Green = prior probability of Green ×
likelihood of 'x' given Green = 4/6 × 1/40 = 1/60 ≈ 0.016
Posterior probability of 'x' being Red = prior probability of Red × likelihood of
'x' given Red = 2/6 × 3/20 = 1/20 = 0.05
Prior probability × test evidence (likelihood) = posterior probability
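The same arithmetic as a quick Python check:

# Posterior = prior × likelihood, with the numbers from the slide.
prior_green, prior_red = 40 / 60, 20 / 60
lik_green, lik_red = 1 / 40, 3 / 20       # likelihood of x in the neighborhood
post_green = prior_green * lik_green      # 1/60 ≈ 0.016
post_red = prior_red * lik_red            # 1/20 = 0.05
print(post_green, post_red)               # the larger posterior wins: Red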
7. Recap: Naive Bayes Rule
Finally, we classify 'x' as Red, since that class membership achieves the
largest posterior probability.
Formula to remember:
In Naïve Bayes we simply take the maximum and convert it into a Yes/No
classification.
8. Recap: Naive Bayes Rule
Word probabilities for each author:
Marty: Love 0.1, Deal 0.8, Life 0.1
Alica: Love 0.5, Deal 0.2, Life 0.3
Assume prior probabilities P(Alica) = 0.5, P(Marty) = 0.5.
"Love Life": so what is the probability of who wrote this mail?
Marty: 0.1 × 0.1 × 0.5 = 0.005
Alica: 0.5 × 0.3 × 0.5 = 0.075 (it's Alica, easy to see)
"Life Deal":
Marty: 0.1 × 0.8 × 0.5 (prior prob.) = 0.04
Alica: 0.3 × 0.2 × 0.5 (prior prob.) = 0.03. So it's Marty.
We can also normalize these into posteriors:
P(Marty|"Life Deal") = 0.04 / (0.04 + 0.03) = 4/7 ≈ 57%
P(Alica|"Life Deal") = 0.03 / 0.07 = 3/7 ≈ 43%
(dividing by 0.04 + 0.03 = 0.07 scales/normalizes the two scores to sum to 1)
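The same computation as a short Python sketch:

# Posterior ∝ prior × product of per-word probabilities, then normalize.
word_probs = {
    "Marty": {"love": 0.1, "deal": 0.8, "life": 0.1},
    "Alica": {"love": 0.5, "deal": 0.2, "life": 0.3},
}
prior = {"Marty": 0.5, "Alica": 0.5}

def posterior(words):
    scores = {}
    for author, probs in word_probs.items():
        score = prior[author]
        for w in words:
            score *= probs[w]
        scores[author] = score
    total = sum(scores.values())          # e.g. 0.04 + 0.03 = 0.07
    return {a: s / total for a, s in scores.items()}

print(posterior(["life", "deal"]))  # {'Marty': ≈0.57, 'Alica': ≈0.43}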
9. Sentiment analysis: 1. Naive Bayes
Applying Bayes' theorem to a sentiment classifier:
P(word|class) = P(class|word) × P(word) / P(class)
e.g. P(Early|Positive) = P(Positive|Early) × P(Early) / P(Positive)
[Diagram: bag-of-words features "Early" and "Late" feeding the sentiment
classifier, with unconditional probabilities for the Positive/Negative
classes and conditional probabilities for each word given the class.]
10. Sentiment analysis: 1. Naive Bayes
Naïve Bayes assumptions:
1. Bag of words: word position doesn't matter.
2. Conditional independence: the feature probabilities are assumed
independent given the class.
E.g. "great" occurring is not dependent on the word "fabulous" occurring
in the same document.
So phrases that comprise multiple words with distinct meanings
don't work with Naïve Bayes.
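Putting the pieces together, here is a minimal Naive Bayes sentiment classifier sketch with scikit-learn; the reviews and labels are hypothetical examples:

# Bag of words + multinomial Naive Bayes on made-up reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["arrived early, great product", "late delivery, bad service",
           "love it, fabulous deal", "waste of time, terrible"]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
print(model.predict(["early and great"]))   # expected: ['positive']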
11. Sentiment analysis: 2. Decision Trees
Give a loan?
Decision trees can separate a non-linear decision surface into linear
decision boundaries.
Random Forest is a collection of several models, in this case a collection
of decision trees, that are used to increase predictive power; the final
score is obtained by aggregating them.
This is known as an ensemble method in machine learning.
[Diagram: "Give a loan?" decision tree that splits on credit history
(good/bad), debt < 1000, and time > 18, ending in a leaf with P = .3.]
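A minimal decision tree sketch in scikit-learn for a loan-style decision; the feature encoding and data points below are made up:

# Columns: credit_history (1 = good), debt, time (months); made-up data.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 500, 24], [1, 2000, 6], [0, 800, 20], [0, 3000, 12]]
y = [1, 0, 0, 0]   # 1 = give the loan

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[1, 900, 30]]))   # prediction depends on the learned splits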
12. Sentiment analysis: 2. Random Forest
Steps to build and use a random forest model:
1. Select the number of trees to be built, i.e. ntree = N (the default N is 500).
2. Select a bagging (bootstrap) sample from the training dataset.
3. Define mtry, the number of randomly selected predictors/features that
will be used to make each split.
4. Grow each tree until it stops improving, in other words until the error
no longer decreases.
OOB Error (Out of Bag)
For each bootstrap sample drawn from the training dataset, there will be
observations left behind that were not included; as we learned in our
previous chapter, the robustness to outliers and missing values comes at
the cost of throwing away some data.
These left-out observations are called the Out-of-Bag (OOB) samples.
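A sketch of these steps with scikit-learn on made-up synthetic data; ntree maps to n_estimators and mtry to max_features (note scikit-learn's default is 100 trees, while R's randomForest defaults to 500):

# Random forest with bagging and OOB evaluation on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(
    n_estimators=500,      # step 1: number of trees (ntree)
    max_features="sqrt",   # step 3: predictors tried at each split (mtry)
    bootstrap=True,        # step 2: bagging samples
    oob_score=True,        # evaluate on the out-of-bag samples
    random_state=0,
).fit(X, y)
print(rf.oob_score_)       # OOB accuracy, a built-in cross-validation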
13. Sentiment analysis: 2. Random Forest
Advantages:
Can handle noisy or missing data very well.
In RF we don't need to create a separate test dataset for cross-validation:
each tree uses roughly 63% of the observations, and the remaining ~37%
are available for assessing the performance of the model.
The OOB (Out of Bag) sample thus works as a built-in cross-validation of
a random forest model's accuracy.
Helps to identify the important variables.
Disadvantages:
Unlike a single decision tree, the model is not easily interpretable.
Prone to overfitting. Two common ways to avoid overfitting are
pre-pruning and post-pruning; post-pruning is generally preferable because
it prunes based on the fully grown tree rather than stopping early on an estimate.
Overfitting refers to a model that fits the training data too well, to the
extent that it cannot recognize the pattern in new, unseen data, and hence
negatively impacts the performance of the model on new data.
14. Sentiment analysis: 2. Random Forest
Random Forest classification technique:
[Diagram: the data and features are fed to four decision trees, each built
on a different sample (sample 1 to sample 4); the trees predict Positive,
Positive, Negative, Positive; the majority vote gives the final prediction:
hence Positive.]
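The aggregation step is just a majority vote over the trees' predictions, for example:

# Majority vote over the four trees' predictions, as in the diagram.
from collections import Counter

votes = ["positive", "positive", "negative", "positive"]
print(Counter(votes).most_common(1)[0][0])   # -> 'positive'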
15. Sentiment analysis: 3. SVM
One of the most popular classical classification methods.
It tries to draw a line between the data points with the largest margin
between the two classes.
Which is the line that best separates the data? And why is that line the
best one?
The best line maximizes the distance to the nearest points; this distance
is called the MARGIN.
The margin is the distance between the line and the nearest points of the
two classes.
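A minimal linear SVM sketch with scikit-learn; the 2-D points are made up, and the C parameter controls how strongly margin violations are penalized (the outlier tolerance discussed on the next slide):

# Linear SVM on made-up 2-D points; support vectors define the margin.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 4], [5, 5]]
y = [0, 0, 1, 1]
clf = SVC(kernel="linear", C=1.0).fit(X, y)   # smaller C tolerates more errors
print(clf.support_vectors_)   # the nearest points that define the margin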
16. Sentiment analysis: 3. SVM
Which line here is the best?
The first (blue) line maximizes the distance between the data points while
sacrificing a point of one class, which is called a class error. So the
second (green) line is the best: a Support Vector Machine first classifies
the classes correctly and only then maximizes the margin between them.
How do we handle an outlier? SVMs are good at finding decision
boundaries that maximize the distance between classes while at the same
time tolerating individual outliers.
[Diagram: a lone outlier sitting inside the other class's region.]
17. Sentiment analysis: 3. SVM
Non-linear data: will SVM still work? Yes!
The SVM takes features x and y and maps them to a label (either blue or red).
Add a new feature z = x² + y²: now we have a three-dimensional space
where we can separate the classes linearly.
z measures the (squared) distance from the origin, so the blue class near
the origin gets small values of z and the other class gets large values.
So is this linearly separable? Yes! The separating line in the new space
actually represents a circle in the original (x, y) space.
[Diagram: features x and y feed the SVM, which outputs labels; plotting
the new feature z = x² + y² against x shows the two classes separated by
a straight line.]
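A sketch of this feature-map trick on made-up ring-shaped data; in practice SVC's built-in RBF kernel performs this kind of mapping implicitly:

# Make the z = x^2 + y^2 trick explicit on synthetic rings of points.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 100)
radii = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(2, 3, 50)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]  # inner/outer ring
y = np.array([0] * 50 + [1] * 50)

z = (X ** 2).sum(axis=1)              # new feature z = x^2 + y^2
X_new = np.c_[X[:, 0], z]             # (x, z) space is linearly separable
clf = SVC(kernel="linear").fit(X_new, y)
print(clf.score(X_new, y))            # perfect separation on this data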
18. Sentiment analysis: 4. Maximum Entropy
4. Maximum entropy: a technique for learning a probability distribution
from data.
Maximum entropy models offer a clean way to combine diverse pieces
of contextual evidence in order to estimate the probability of a certain
linguistic class occurring in a document.
E.g. classify our documents into 3 classes: Positive, Negative, Neutral.
• Each document must be classified into one of the classes, so
P(positive) + P(negative) + P(neutral) = 1, i.e. 100%.
• Without additional information, choose the model that makes the least
assumptions, i.e. the most uniform distribution.
19. Sentiment analysis: 4. Maximum Entropy
Least assumptions = most uniform.
If the word "Good" appears in the document, then
P(positive|"Good") = 0.8.
Whenever one class's probability is very high, the maximum entropy
model adjusts the other classes accordingly:
P(negative|"Good") = 0.1
P(neutral|"Good") = 0.1
Maximum entropy modeling creates a distribution that satisfies all of
these constraints while staying as uniform as possible: it tries to
distribute probability equally among the classes, but also takes the
constraints into account.
So we may have more observations/constraints, for example:
• P(Positive|"Good") = 0.8
• P(Negative|"Not Okay") = 0.7
• P(Neutral|"SoSo") = 0.3
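In practice, maximum entropy classification corresponds to (multinomial) logistic regression, so a minimal sketch with scikit-learn might look like this; the documents are hypothetical:

# MaxEnt-style classifier: bag of words + logistic regression, 3 classes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["good, really good", "not okay at all", "soso, nothing special"]
labels = ["positive", "negative", "neutral"]

maxent = make_pipeline(CountVectorizer(), LogisticRegression())
maxent.fit(docs, labels)
print(maxent.predict_proba(["good but soso"]))   # probabilities sum to 1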
20. Sentiment analysis: 4. Maximum Entropy
Why a uniform distribution?
• Most uniform = maximum entropy
• Least assumptions = simplest explanation
Maximum entropy is one of the machine learning modeling techniques
in NLP that is highly effective for classification with high accuracy.
Therefore MaxEnt is a useful and easy-to-understand tool that helps
computers make decisions based on the "features" of your data.
21. Next
Let's perform sentiment analysis with the help of an example where
we will have reviews of a product.