Amazon Alexa Reviews
Nikhil Shrivastava
Positive or Negative Alexa Reviews
Love my Echo!
Not working
Not good at all!
Amazing product
Focus of the Project: Alexa Reviews: Is this review positive or negative?
Dataset
Sentiment Classification for Alexa Reviews
Amazon Alexa Reviews Classification: A dataset of 3,150 Amazon customer
reviews for Alexa products (Echo, Fire Stick, Echo Dot, etc.). The task is to
classify each review as positive or negative.
Source of Dataset: https://www.kaggle.com/sid321axn/amazon-alexa-reviews/metadata
Alexa Reviews Kaggle Dataset
Rating 5
• I love my Echo. It's easy to
operate, loads of fun. It is
everything as advertised. I use it
mainly to play my favorite tunes
and test Alexa's knowledge.
• Being able to add speakers is a
plus. I take it on my deck when I
am outside. Just love it. I have
my big Alexia in my bedroom
Ratings 4-1
• I didn't like that almost every
time i asked Alexa a question she
would say I don't know that, or I
haven't learned that.
• This device does not interact
with my home filled with Apple
devices. How disappointing!
Alexa Reviews Dataset Deep Dive
Dataset Snapshot:
Total length of the Data : 3150
Length of different ratings:
Ratings 1, 2, 3 and 4 are combined into the negative sentiment class; Rating 5 is the positive sentiment class.
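The labeling rule above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function name `rating_to_sentiment` and the sample ratings are hypothetical.

```python
def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a sentiment label (5 = positive, 1-4 = negative)."""
    return "positive" if rating == 5 else "negative"


# Hypothetical sample ratings, not taken from the dataset.
ratings = [5, 4, 1, 5, 3]
labels = [rating_to_sentiment(r) for r in ratings]
print(labels)
```

Treating a 4-star review as negative is a strict threshold, which is worth keeping in mind when reading the negative-class results.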
Dataset Deep Dive (Word Clouds for Positive and Negative Sentiments)
For positive sentiment (rating 5), we see words
like love, great, good, easy, etc.
For negative sentiment (ratings 1-4), we see words
like disappointed, return, need, etc.
Most common words in entire dataset
The word "love" occurs 545 times, making it one of the most common words in the dataset.
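A word-frequency count like this can be sketched with the standard library. The reviews below are a tiny illustrative sample, so the counts do not match the 545 reported for the full dataset, and the regular expression is an assumed stand-in for whatever tokenizer was actually used.

```python
import re
from collections import Counter

# Tiny illustrative sample; the real dataset has 3,150 reviews.
reviews = ["Love my Echo!", "Just love it.", "I love my Echo.", "Not working"]

word_counts = Counter()
for review in reviews:
    # Lowercase and keep alphabetic runs so "Love" and "love" are counted together.
    word_counts.update(re.findall(r"[a-z']+", review.lower()))

print(word_counts.most_common(3))
```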
Sentiment Analysis Setup
Feature Engineering and Baseline Algorithms
1. Tokenization
2. Vectorize
3. Classification using
   a. Naïve Bayes Classifier
   b. Random Forest Classifier
Tokenization
• First remove stop words to get clean reviews
• Then tokenize the cleaned reviews using word_tokenize()
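The two steps can be sketched as below. The slides name word_tokenize(), presumably NLTK's; this sketch swaps in a simple regular expression so that it runs without downloading tokenizer data, and the stop-word list is a small hypothetical one rather than the list used in the project.

```python
import re

# Small illustrative stop-word list (an assumption; the slides do not show theirs).
# "not" is deliberately kept, since it carries sentiment.
STOP_WORDS = {"i", "my", "it", "is", "to", "a", "the", "and", "of", "at", "all"}


def clean_and_tokenize(review):
    """Lowercase a review, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(clean_and_tokenize("Not good at all!"))
```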
Vectorization: Creating Bag-of-Words model
• Used both Count Vectorizer and TF-IDF Vectorizer to turn the tokens into a
sparse matrix of documents x tokens
• Count Vectorizer: Counts the occurrences of each token to build the matrix.
• TF-IDF Vectorizer: Stands for Term Frequency-Inverse Document Frequency. It is a
statistical measure of how important a word is to a document relative to the
rest of the collection.
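The difference between the two weightings can be sketched in plain Python. This uses the textbook definition tf x log(N / df) on dense lists; library vectorizers typically use smoothed and normalized variants and return sparse matrices, so exact numbers will differ. The documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenized reviews.
docs = [["love", "echo"], ["not", "working"], ["love", "love", "product"]]
vocab = sorted({token for doc in docs for token in doc})

# Count weighting: one row per document, one column per vocabulary term.
count_matrix = [[Counter(doc)[term] for term in vocab] for doc in docs]

# TF-IDF weighting: term frequency scaled by log(N / document frequency),
# so terms that appear in many documents get a lower weight.
n_docs = len(docs)
doc_freq = {term: sum(1 for doc in docs if term in doc) for term in vocab}
idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
tfidf_matrix = [
    [(Counter(doc)[term] / len(doc)) * idf[term] for term in vocab]
    for doc in docs
]

print(vocab)
print(count_matrix)
```

In the first document, "love" and "echo" have the same count, but "echo" gets the higher TF-IDF weight because it appears in fewer documents.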
Count and TF-IDF Vectorizer
Finally proceeded with Count Vectorizer, as it
gave better results with the ML models.
For TF-IDF to work better, I could have used
bi-grams and tri-grams, which might give a
more accurate bag-of-words model.
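As a rough illustration of what bi-grams and tri-grams add, the sketch below builds n-grams from a token list. The helper name `ngrams` is hypothetical; whether n-grams would actually improve the results here was not tested in the slides.

```python
def ngrams(tokens, n):
    """Return the contiguous n-token sequences in a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["not", "good", "at", "all"]
print(ngrams(tokens, 2))
print(ngrams(tokens, 3))
```

A bi-gram such as "not good" keeps the negation attached to the word it modifies, which single-word features cannot do.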
Multinomial Naïve Bayes Classifier
• To choose the label assigned to a document w = (w1, w2, …, wn), the
multinomial NB classifier begins by calculating the prior probability
Pr(c) of each label c, which is determined by the frequency of each label
in the training set. The contribution from each word is then combined with Pr(c)
to arrive at a likelihood estimate for each label. It can be defined formally as:
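One common way to write this decision rule, assuming word counts f_i and Laplace-smoothed word probabilities, is c* = argmax_c [ log Pr(c) + Σ_i f_i · log Pr(w_i | c) ]. The sketch below implements that rule in plain Python. It is an illustration under those assumptions, not the classifier used in the project, and the function names and training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            word: math.log((word_counts[c][word] + alpha) / total) for word in vocab
        }
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the label with the highest log prior plus summed word log likelihoods."""
    scores = {}
    for c in log_prior:
        # Words never seen in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical training examples.
docs = [["love", "echo"], ["amazing", "product"], ["not", "working"], ["not", "good"]]
labels = ["positive", "positive", "negative", "negative"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict(["love", "product"], log_prior, log_likelihood))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.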
Multinomial NB Classifier: Train and Test
Started by training the model on the training
set and then checking the accuracy on the test
set. The test set was 33% of the
entire dataset.
Accuracy is 80%, and the F-score, which is
the harmonic mean of Precision and
Recall, is 87%.
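A 33% hold-out split can be sketched with the standard library as below. The shuffling, the seed, and rounding the test size up are assumptions; rounding up gives 1,040 test reviews, which matches the total of the confusion-matrix counts reported for the test set (701 + 135 + 169 + 35).

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.33, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(3150)
print(len(train_idx), len(test_idx))
```

With 3,150 reviews this prints `2110 1040`.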
Weighted Precision, Recall and Confusion Matrix
• Precision is the fraction of predicted positives that are truly
positive: TP / (TP + FP). High Precision means that the algorithm
returned substantially more relevant results than irrelevant ones,
i.e. few false positives.
• Recall is the fraction of actual positives that were retrieved:
TP / (TP + FN). High Recall means that the algorithm returned most of
the relevant results.
• TP = 701, TN = 135, FP = 169, FN = 35
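These definitions can be checked against the reported counts. The sketch below computes the metrics for the positive class directly from the confusion matrix; the variable names are illustrative.

```python
# Confusion-matrix counts reported for the Naive Bayes test set.
tp, tn, fp, fn = 701, 135, 169, 35

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

This prints `accuracy=0.804 precision=0.806 recall=0.952 f1=0.873`, consistent with the reported 80% accuracy and 87% F-score. The 87% F-score is therefore the positive-class score, not a weighted average.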
Why weighted?
Precision and Recall are reported as weighted averages: the per-label
metrics are averaged with weights equal to each label's support (the
number of true instances for each label). This adjusts the 'macro'
average to account for label imbalance. Note that weighted averaging
can produce an F-score that is not between precision and recall.
Weighted Precision is 0.80
Weighted Recall is 0.80
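The weighted figures can be reproduced from the Naive Bayes confusion matrix (TP = 701, TN = 135, FP = 169, FN = 35). The sketch below computes per-class precision and recall and averages them by support; the variable names are illustrative.

```python
tp, tn, fp, fn = 701, 135, 169, 35

# Support: number of true instances of each class in the test set.
support_pos = tp + fn  # actual positive reviews
support_neg = tn + fp  # actual negative reviews
total = support_pos + support_neg

# Per-class metrics; for the negative class the roles of the counts swap.
precision_pos, recall_pos = tp / (tp + fp), tp / (tp + fn)
precision_neg, recall_neg = tn / (tn + fn), tn / (tn + fp)

weighted_precision = (support_pos * precision_pos + support_neg * precision_neg) / total
weighted_recall = (support_pos * recall_pos + support_neg * recall_neg) / total

print(round(weighted_precision, 2), round(weighted_recall, 2))
```

Both values round to 0.80. The per-class view also shows that recall on the negative class is only about 0.44, a weakness that the weighted average hides.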
Random Forest Classifier
• Random forests are generally considered an accurate and robust method
because they combine the predictions of many decision trees.
• They are less prone to overfitting than a single decision tree. Each
tree is trained on a bootstrap sample and considers a random subset of
features at each split ("feature bagging"), and averaging the trees'
predictions reduces variance.
Grid Search – To get the best estimator
• Used Grid Search to find the
best estimator in terms of
max features, max depth of
the tree, min_samples_split
and min_samples_leaf.
• Predicted on the test set using
the best Random Forest
estimator.
• Accuracy is slightly better at
approx. 81.05%, and the F-score
is 87.5%.
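A grid search over these four hyperparameters might look like the sketch below. It assumes scikit-learn, which the slides do not name explicitly, and uses a toy corpus, a small grid, and an F1 scoring choice that are all hypothetical rather than the values used in the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the review data (hypothetical, not from the dataset).
reviews = [
    "love my echo", "amazing product", "great sound love it",
    "easy to use great", "love it works great", "easy setup amazing",
    "not working", "not good at all", "disappointed had to return",
    "stopped working return", "does not work disappointed", "not good return it",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative

X = CountVectorizer().fit_transform(reviews)

# Hypothetical candidate values; every combination is tried with cross-validation.
param_grid = {
    "max_features": ["sqrt", "log2"],
    "max_depth": [3, None],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, labels)

best_forest = search.best_estimator_  # refit on all the data passed to fit()
predictions = best_forest.predict(X)
print(search.best_params_)
```

In a real run the search would be fit on the training split only, and `best_forest` would then be evaluated on the held-out test set.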
Precision, Recall and Confusion Matrix
• Compared with Naïve Bayes, the FPs have decreased
and the TNs have increased, so the model is
better based on Precision and Recall.
• Precision and Recall are slightly better,
at approximately 81%.
• Both have similar scores, so the results
are evenly balanced here.
Feature Importance
Based on the feature importances, words like love, work,
great and disappointed were among the most important
words in determining the class of a review.
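Word-level importances of this kind can be read from a fitted forest, as in the sketch below. It assumes scikit-learn's `RandomForestClassifier` and `CountVectorizer`; the toy reviews are hypothetical, so the ranking it prints will not match the one on the slide.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the review data (hypothetical, not from the dataset).
reviews = [
    "love my echo", "amazing product", "great sound love it", "love it works great",
    "not working", "not good at all", "disappointed had to return", "does not work disappointed",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

# vocabulary_ maps word -> column index; invert it to label each importance score.
index_to_word = {index: word for word, index in vectorizer.vocabulary_.items()}
ranked = sorted(
    ((index_to_word[i], score) for i, score in enumerate(forest.feature_importances_)),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, score in ranked[:5]:
    print(f"{word}: {score:.3f}")
```

The importances sum to 1, so each score can be read as a word's share of the forest's total impurity reduction.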
Conclusion
• Overall, we can predict whether a review is positive or negative with about 80% accuracy.
• Random Forest results were slightly better than Naïve Bayes (approx. 81% vs. 80% accuracy).
Further Potential Enhancement
• Training on only the most important features, shown on the previous
slide, may further improve model accuracy.
