NLP
Sentiment Analysis
Sentiment analysis
What is sentiment analysis?
Sentiment analysis is the contextual mining of text that identifies and
extracts subjective information, helping a business understand
the social sentiment around its brand, product or service.
In other words, it is the process of determining whether a piece of writing
is positive, negative or neutral.
Applications of sentiment analysis
Sources of data: Twitter, Facebook, surveys, product reviews, etc.
Applications:
1.) Fashion: accessories, apparel, outlets, design, brands, etc.
2.) Automobile: types of pre-owned cars, features, requirements, etc.
3.) Books, malls and stores, online services, travel, healthcare, etc.
Rupak Roy
Sentiment analysis: 1. Naïve Bayes
Machine Learning Classification Methods
1) Naïve Bayes: this supervised classification method uses Bayes' rule,
so it depends on the "bag of words" of a document.
Office 1
Traffic 3
Time 2
Early 1
Late 2
* A bag of words (BoW) is a collection of words that discards
grammar and word order but keeps the multiplicity (word counts).
It is a way of extracting features from text for use in machine learning
modeling.
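A minimal sketch of building such a bag-of-words representation with scikit-learn's CountVectorizer (the library choice and the sample sentences are assumptions for illustration, not part of the original slides):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical documents about commuting, echoing the word counts above.
docs = [
    "late to office because of traffic traffic traffic, no time",
    "left early, reached office on time",
]

vectorizer = CountVectorizer()          # discards grammar/order, keeps counts
bow = vectorizer.fit_transform(docs)    # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # vocabulary learned from the text
print(bow.toarray())                       # each row = word counts for one document
```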
Rupak Roy
Recap: Naive Bayes Rule
The Naive Bayes algorithm was widely used in spam filtering. The
algorithm takes the count of a particular word's occurrences in the spam list
and in normal mail, then multiplies the probabilities together using Bayes'
equation.
Good word list
Spam list
Later, spammers figured out how to trick spam filters by adding lots of
"good" words at the end of an email; this method is
called Bayesian poisoning.
Rupak Roy
Example word frequencies (from the good-word and spam lists):
Great 235, Opportunities 3, Speak 44, Meeting 246, Collaborative 3,
Sales 77, Scope 98, 100% 642, Fast 78, Hurry 40
For an incoming message (e.g. "hello"), Bayes' rule
P(A|B) = P(B|A) P(A) / P(B)
combines these counts into a posterior, here yielding "Not Spam".
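As a toy illustration of the rule (the counts and the prior are hypothetical, not taken from the slides), the spam posterior for a single word could be computed like this:

```python
# Hypothetical counts: how often the word "meeting" appears in spam vs. ham,
# and how common spam is overall.
p_word_given_spam = 246 / 1000   # P(B|A): word frequency within spam mail
p_word_given_ham  = 300 / 1000   # word frequency within normal (ham) mail
p_spam            = 0.4          # P(A): prior probability of spam

# Total probability of seeing the word at all: P(B)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'meeting') = {p_spam_given_word:.3f}")
```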
Recap: Naive Bayes Rule
It ignores a few things — word order and document length — and just looks
at word frequencies to do the classification.
Naïve Bayes strengths & weaknesses
Advantage:
Being a supervised classification algorithm, it is easy to implement.
Weakness:
It breaks in funny ways. Previously, when people did a Google search for
"Chicago Bulls", it gave animals rather than the city,
because phrases that comprise multiple words with distinctly different
meanings don't work well with Naïve Bayes. It also requires a categorical
variable as the target.
Assumptions: Bag of words, so word position doesn't matter.
Conditional independence, e.g. 'great' occurring is not dependent on the
word 'fabulous' appearing in the same document.
Rupak Roy
Recap: Naive Bayes Rule
Prior probability of Green = no. of green objects / total no. of objects
Prior probability of Red = no. of red objects / total no. of objects
Green: 40/60 = 4/6
Red: 20/60 = 2/6
The prior probability is computed without any knowledge about the point;
the likelihood is computed after knowing what the data point is.
Likelihood of 'x' given Red = no. of red points in the neighborhood of 'x' /
total no. of red points
Likelihood of 'x' given Green = no. of green points in the neighborhood of 'x' /
total no. of green points
Posterior probability of 'x' being Green = prior probability of Green ×
likelihood of 'x' given Green = 4/6 × 1/40 = 1/60 ≈ 0.017
Posterior probability of 'x' being Red = prior probability of Red × likelihood of
'x' given Red = 2/6 × 3/20 = 1/20 = 0.05
Prior probability × test evidence (likelihood) = posterior probability
Recap: Naive Bayes Rule
Finally, we classify 'x' as Red, since that class membership achieves the
larger posterior probability.
Formula to remember:
In Naïve Bayes we simply take the class with the maximum posterior and
convert that into the Yes/No classification.
Rupak Roy
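A small sketch of the same calculation (the numbers are the ones from the slide; the code structure is an illustrative assumption):

```python
# Priors and neighborhood likelihoods from the green/red example above.
prior = {"green": 4 / 6, "red": 2 / 6}
likelihood_of_x = {"green": 1 / 40, "red": 3 / 20}

# Unnormalized posterior: prior * likelihood for each class.
posterior = {c: prior[c] * likelihood_of_x[c] for c in prior}

# Naive Bayes decision: pick the class with the maximum posterior.
predicted = max(posterior, key=posterior.get)
print(posterior)   # {'green': ~0.017, 'red': 0.05}
print(predicted)   # 'red'
```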
Recap: Naive Bayes Rule
Word probabilities for two senders:
Marty: Love 0.1, Deal 0.8, Life 0.1
Alica: Love 0.5, Deal 0.2, Life 0.3
Assume prior probabilities
P(Alica) = 0.5
P(Marty) = 0.5
"Love Life": so what is the probability of who wrote this mail?
Marty: 0.1 × 0.1 × 0.5 = 0.005
Alica: 0.5 × 0.3 × 0.5 = 0.075 (it's Alica, easy to see)
"Life Deal": Marty: 0.1 × 0.8 × 0.5 (prior prob.) = 0.04
Alica: 0.3 × 0.2 × 0.5 (prior prob.) = 0.03. So it's Marty.
We can also normalize these into posteriors:
P(Marty|"Life Deal") = 0.04/(0.04+0.03) = 4/7 ≈ 57%
P(Alica|"Life Deal") = 0.03/0.07 = 3/7 ≈ 43%
(dividing by 0.04+0.03, i.e. 0.07, is the way to scale/normalize to 1)
Rupak Roy
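The same "Life Deal" calculation as a short sketch (the numbers are from the slide; the function and variable names are illustrative assumptions):

```python
# Word-probability tables for the two senders (from the slide).
marty = {"love": 0.1, "deal": 0.8, "life": 0.1}
alica = {"love": 0.5, "deal": 0.2, "life": 0.3}
prior = {"marty": 0.5, "alica": 0.5}

def score(words, table, p_prior):
    """Unnormalized posterior: prior times the product of word probabilities."""
    p = p_prior
    for w in words:
        p *= table[w]
    return p

message = ["life", "deal"]
scores = {
    "marty": score(message, marty, prior["marty"]),   # 0.1 * 0.8 * 0.5 = 0.04
    "alica": score(message, alica, prior["alica"]),   # 0.3 * 0.2 * 0.5 = 0.03
}
total = sum(scores.values())
posteriors = {k: v / total for k, v in scores.items()}
print(posteriors)   # {'marty': ~0.57, 'alica': ~0.43}
```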
Sentiment analysis: 1. Naive Bayes
Applying Bayes' theorem to sentiment analysis
According to Bayes' theorem, the sentiment classifier computes
P(class|word) = P(word|class) × P(class) / P(word)
e.g. P(Positive|Early) = P(Early|Positive) × P(Positive) / P(Early)
Rupak Roy
(Figure: bag-of-words table for the words "Early" and "Late" against the
Positive and Negative classes, showing the unconditional probabilities —
Positive 70%, Negative 30% — alongside the conditional probabilities
80%, 20%, 20%, 30%.)
Sentiment analysis: 1. Naive Bayes
Naïve Bayes assumptions:
1. Bag-of-words assumption: word position doesn't matter.
2. Conditional independence: assume the feature probabilities are
independent given the class,
e.g. 'great' occurring is not dependent on the word 'fabulous' appearing in
the same document.
So phrases that comprise multiple words with distinctly different
meanings don't work well with Naïve Bayes.
Rupak Roy
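A minimal sketch of a Naïve Bayes sentiment classifier over a bag of words, using scikit-learn (the toy reviews and labels are assumptions for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set of product reviews.
reviews = [
    "delivered early, great product",
    "fabulous quality, love it",
    "arrived late and broken",
    "terrible service, very late",
]
labels = ["positive", "positive", "negative", "negative"]

# Bag of words -> multinomial Naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["the parcel came early"]))      # expected: ['positive']
print(model.predict_proba(["very late delivery"]))   # class probabilities
```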
Sentiment analysis: 2. Decision Trees
Give a loan?
Decision trees can separate a non-linear decision surface
into linear decision boundaries.
Random Forest is a collection of several models, in this case a collection
of decision trees, that are used together to increase predictive power;
the final score is obtained by aggregating them.
 This is known as an Ensemble Method in machine learning.
(Figure: example decision tree for "Give a loan?" that splits on Credit
History (Good/Bad), then on Debt < 1000 and Time > 18, ending in a leaf
with P = 0.3.)
Rupak Roy
Sentiment analysis: 2. Random Forest
Steps to build and use a random forest model (see the sketch after the OOB note below):
1. Select the number of trees to be built, i.e. Ntree = N (the default N is 500).
2. Select a bagging (bootstrap) sample from the training dataset.
3. Define mtry, the number of randomly selected
predictors/features used to make each split.
4. Grow each tree until it stops improving, in other words until the error no
longer decreases.
OOB Error (Out Of Bag)
 For each bootstrap sample drawn from the training dataset, there will be
observations left behind that were not included; the robustness to outliers
and missing values comes at the cost of throwing away some data, as we
learned in the previous chapter.
 These left-out observations are called Out of Bag (OOB) samples.
Rupak Roy
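A minimal sketch of those steps with scikit-learn, where n_estimators plays the role of Ntree and max_features the role of mtry (the feature matrix X and labels y are hypothetical placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical bag-of-words feature matrix (documents x terms) and sentiment labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))   # stand-in for word counts
y = rng.integers(0, 2, size=200)         # 0 = negative, 1 = positive

model = RandomForestClassifier(
    n_estimators=500,      # Ntree: number of trees (step 1)
    bootstrap=True,        # bagging samples from the training data (step 2)
    max_features="sqrt",   # mtry: features considered at each split (step 3)
    oob_score=True,        # evaluate on the Out of Bag samples
    random_state=0,
)
model.fit(X, y)

print(model.oob_score_)            # OOB accuracy, a built-in cross-validation
print(model.feature_importances_)  # helps identify the important variables
```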
Sentiment analysis: 2. Random Forest
Advantages:
 Can handle noisy or missing data very well.
 In RF we don't need to separately create a test dataset for cross-
validation, as each model uses approximately 60% of the observations,
with the remaining ~30% used for assessing the performance of the model.
 The OOB (Out Of Bag) sample also works as cross-validation for the
accuracy of a random forest model.
 Helps to identify the important variables.
Disadvantages:
 Unlike a single decision tree, the model is not easily interpretable.
 Prone to overfitting. Two common ways to avoid overfitting are
pre-pruning and post-pruning; post-pruning is generally preferable because
pre-pruning has to rely on predicting an estimate of when to stop.
Overfitting refers to a model that fits the training data so well that
it cannot recognize the pattern in unseen new data,
and hence negatively impacts the performance of the model on new
data.
Rupak Roy
Sentiment analysis: 2. Random Forest
Random Forest classification technique:
(Figure: the data and features are sampled into several decision trees —
Sample 1 through Sample 4 — whose individual predictions are
Positive, Positive, Negative, Positive.)
The majority vote is Positive, hence the document is classified as positive.
Rupak Roy
Sentiment analysis: 3. SVM
One of the most popular classical classification methods.
It tries to draw a separating line between the data points with the largest
margin between the two classes.
Which is the line that best separates the data?
And why is that line the best line that separates the data?
The best line maximizes the distance to the
nearest points; this distance is named the MARGIN.
The margin is the distance between the separating line and the
nearest points of the two classes.
Rupak Roy
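A minimal sketch of a linear SVM on bag-of-words features with scikit-learn (the toy reviews are assumptions; the C parameter controls how strongly margin violations such as outliers are penalized):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "great product, works perfectly",
    "love the fast delivery",
    "awful quality, waste of money",
    "broken on arrival, very disappointed",
]
labels = ["positive", "positive", "negative", "negative"]

# Linear SVM: finds the separating hyperplane with the largest margin.
# A smaller C tolerates more margin violations (individual outliers).
model = make_pipeline(CountVectorizer(), LinearSVC(C=1.0))
model.fit(reviews, labels)

print(model.predict(["fast delivery and great quality"]))  # expected: ['positive']
```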
Sentiment analysis: 3. SVM
Which line here is the best line?
The first (blue) line maximizes the distance between the
data points while sacrificing a class, which is
called a class error. So the second (green) line is the best:
it maximizes the distance between the two classes.
A Support Vector Machine first classifies the classes
correctly, then maximizes the margin.
How can we solve this?
SVMs are good at finding decision boundaries that maximize
the distance between classes while at the same time tolerating
individual outliers (see the point labeled "Outlier" in the figure).
Sentiment analysis: 3. SVM
Non-linear data: will SVM still work? Yes!
The SVM takes the features x and y and maps them to a label (either Blue or Red).
By adding a new feature z = x² + y², which measures the (squared) distance
from the origin, we get a three-dimensional space in which we can separate
the classes linearly: points of the blue class have small z, points of the other
class have large z.
So is this linearly separable now? Yes!
The separating (blue) line in the new space actually represents a circle in the
original x-y space.
(Figure: scatter plots in the x-y plane and in the x-z plane; the SVM maps the
features x, y to the labels.)
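A minimal sketch of this feature transformation (the circular toy data is an assumption; in practice the same effect is usually obtained with a non-linear kernel such as SVC(kernel="rbf")):

```python
import numpy as np
from sklearn.svm import SVC

# Toy non-linear data: one class inside a circle, the other outside.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.array([0] * 100 + [1] * 100)   # 0 = red (inner), 1 = blue (outer)

# Add the new feature z = x^2 + y^2 (squared distance from the origin).
z = (X ** 2).sum(axis=1, keepdims=True)
X3d = np.hstack([X, z])

# In the lifted 3-D space a plain linear SVM separates the classes.
model = SVC(kernel="linear").fit(X3d, y)
print(model.score(X3d, y))   # expected: 1.0 on this toy data
```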
Sentiment analysis: 4. Maximum Entropy
4. Maximum entropy is a technique for learning a probability distribution
from data.
Maximum entropy models offer a clean way to combine diverse pieces
of contextual evidence in order to estimate the probability of a certain
linguistic class occurring in a document.
E.g. classify our documents into 3 classes: Positive, Negative, Neutral.
• Each document must be classified into exactly one of the classes, so
P(positive) + P(negative) + P(neutral) = 1, i.e. 100%.
• Without additional information, choose the model that makes the least
assumptions.
Rupak Roy
Sentiment analysis: 4. Maximum Entropy
Least assumptions = most uniform.
If the word "Good" appears in the document, then a constraint such as
P(positive|"Good") = 0.8
makes the Max Entropy model adjust the other classes whenever one
class's probability is very high:
P(negative|"Good") = 0.1
P(neutral|"Good") = 0.1
Maximum Entropy modeling creates a distribution that satisfies all of these
constraints while being as uniform as possible. It tries to distribute probability
equally among all the classes, but also takes the constraints into account.
So we may have more observations/constraints, such as:
• P(Positive|"Good") = 0.8
• P(Negative|"Not Okay") = 0.7
• P(Neutral|"SoSo") = 0.3
Rupak Roy
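Multinomial logistic regression is the standard realization of a maximum entropy classifier; a minimal sketch with scikit-learn (the toy reviews and labels are assumptions for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "good product, really good",
    "not okay, very disappointing",
    "soso experience, nothing special",
    "good value and good support",
]
labels = ["positive", "negative", "neutral", "positive"]

# Multinomial logistic regression acts as a maximum entropy classifier:
# it fits the most uniform distribution consistent with the feature constraints.
maxent = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
maxent.fit(reviews, labels)

proba = maxent.predict_proba(["good but soso delivery"])[0]
print(dict(zip(maxent.classes_, proba.round(3))))   # probabilities sum to 1
```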
Sentiment analysis: 4. Maximum Entropy
Why a uniform distribution?
• Most uniform = maximum entropy.
Least assumptions = simplest explanation.
Maximum Entropy is one of the machine learning modeling techniques
in NLP that is highly effective for classification with high accuracy.
Therefore MaxEnt is a useful and easy-to-understand tool to help
computers make decisions based on "features" of your data.
Rupak Roy
Next
Let's perform sentiment analysis with the help of an example where
we will have reviews of a product.
Rupak Roy