Sentiment Analysis.pptx

Sentiment Analysis and Opinion Mining
Mohamed Khamis Mohamed
Shrief Salem
Abdelrhman Hisham
Under the supervision of
Prof. Tarek Ghareeb

Agenda
● Introduction
● Problem definition
● Objective
● Dataset
● Exploratory Data Analysis (EDA)
● Text pre-processing (cleaning)
● Methodology and Techniques

Introduction
Sentiment analysis is a technique for analysing a piece
of text to determine the sentiment contained within it.
It accomplishes this by combining machine learning
and natural language processing (NLP).

Problem definition
The problem is to computationally recognise and categorise
opinions contained in a piece of text, especially in order
to discern whether the writer has a positive, negative, or
neutral attitude toward a given topic, product, etc.

Objective
The main goal is to estimate the sentiment of a large number of movie
reviews from the Internet Movie Database (IMDb).

Dataset
The IMDb dataset contains 50K movie reviews for natural language
processing or text analytics. It is a dataset for binary sentiment
classification containing substantially more data than previous
benchmark datasets. It provides a set of 25,000 highly polar
movie reviews for training and 25,000 for testing. The task is to
predict whether each review is positive or negative using either
classical classification or deep learning algorithms. Here we will use
BERT and train it to classify reviews as positive or negative.
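As a rough sketch of how the labels might be prepared for binary classification: the snippet below assumes the reviews arrive as a CSV with `review` and `sentiment` columns (the slides do not state the file layout, so both column names are assumptions) and maps the sentiment strings to 1 and 0. The two inline rows are stand-ins for the real file.

```python
import csv
import io
from collections import Counter

# Stand-in for the real file; the column names `review` and `sentiment`
# are an assumption, not something the slides specify.
SAMPLE_CSV = """review,sentiment
"A wonderful little production.",positive
"As it is now, the show is just awful.",negative
"""

def load_reviews(handle):
    """Read (text, label) pairs, encoding positive as 1 and negative as 0."""
    label_map = {"positive": 1, "negative": 0}
    rows = csv.DictReader(handle)
    return [(row["review"], label_map[row["sentiment"]]) for row in rows]

reviews = load_reviews(io.StringIO(SAMPLE_CSV))
# Count reviews per class to check that the classes are balanced.
class_counts = Counter(label for _, label in reviews)
print(class_counts)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle.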

Removing HTML tags
Cleaned Text:
A wonderful little production. The filming technique is very unassuming- very old-time-BBC
fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire
piece
from bs4 import BeautifulSoup
def strip_html(text):
    # Parse the review as HTML and keep only the visible text
    soup = BeautifulSoup(text, "html.parser")
    return soup.get_text()
Text:
A wonderful little production. The filming technique is very unassuming- very old-time-BBC fashion
and gives a comforting, and sometimes discomforting, sense of realism to the entire piece

Removing special characters and punctuation
Cleaned Text:
A wonderful little production The filming technique is very unassuming very old time BBC
fashion and gives a comforting and sometimes discomforting sense of realism to the entire
piece
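import re
def remove_special_characters(text):
    # Drop every character that is not a letter, a digit, or whitespace.
    # This also removes all punctuation, so no separate translate() step is needed.
    pattern = r'[^a-zA-Z0-9\s]'
    text = re.sub(pattern, '', text)
    return text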
Text:
A wonderful little production. The filming technique is very unassuming- very old-time-BBC fashion and gives a
comforting, and sometimes discomforting, sense of realism to the entire piece

Remove stopwords
Cleaned Text:
wonderful little production filming technique unassuming old time BBC fashion
gives comforting sometimes discomforting sense realism entire piece
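import nltk
from nltk.tokenize.toktok import ToktokTokenizer
# Assumed setup, not shown on the slide: NLTK's Toktok tokenizer and English stopword list
tokenizer = ToktokTokenizer()
stopword_list = set(nltk.corpus.stopwords.words('english'))
def remove_stopwords(text):
    tokens = tokenizer.tokenize(text)
    # Keep only tokens whose lowercase form is not a stopword
    filtered_tokens = [token for token in tokens if token.lower() not in stopword_list]
    return ' '.join(filtered_tokens)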
Text:
A wonderful little production The filming technique is very unassuming very old time BBC fashion and gives a
comforting and sometimes discomforting sense of realism to the entire piece
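The three cleaning steps can be chained into a single function. The sketch below is dependency-free so it runs as-is: it swaps in the standard library's `html.parser` for BeautifulSoup and a tiny hand-picked stopword set for a full stopword list. Both are stand-ins, not the setup used on the slides, and `clean_review` is a hypothetical helper name.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    # Join text nodes with a space so words split by tags do not run together
    return " ".join(parser.parts)

def remove_special_characters(text):
    # Keep letters, digits, and whitespace only
    return re.sub(r"[^a-zA-Z0-9\s]", "", text)

# Illustrative subset only; a real run would use a full English stopword list.
STOPWORDS = {"a", "the", "is", "very", "and", "of", "to"}

def remove_stopwords(text):
    return " ".join(t for t in text.split() if t.lower() not in STOPWORDS)

def clean_review(text):
    """Apply the steps in the order shown on the slides."""
    return remove_stopwords(remove_special_characters(strip_html(text)))

raw = "A wonderful little production.<br /><br />The filming technique is very unassuming."
print(clean_review(raw))
```

The order matters: stripping HTML first keeps tag names such as `br` from surviving as words once the angle brackets are removed.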

Exploratory Data Analysis (EDA)

Count of words in positive and negative reviews
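A minimal sketch of how such a word-count comparison could be computed, assuming the chart showed the number of words per review for each class (the slide does not say exactly what was plotted). The three reviews are toy stand-ins, and `word_counts_by_label` is a hypothetical helper.

```python
from statistics import mean

# Toy (text, label) pairs standing in for the real reviews; 1 = positive, 0 = negative.
reviews = [
    ("A wonderful little production", 1),
    ("The actors are extremely well chosen", 1),
    ("The writing is painfully bad", 0),
]

def word_counts_by_label(pairs):
    """Group the number of whitespace-separated words per review by label."""
    counts = {}
    for text, label in pairs:
        counts.setdefault(label, []).append(len(text.split()))
    return counts

counts = word_counts_by_label(reviews)
for label, lengths in sorted(counts.items()):
    print(label, "reviews:", len(lengths), "mean words:", mean(lengths))
```

The per-class lists in `counts` can be passed to any plotting library to draw one histogram per sentiment.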

Positive review sample
A wonderful little production. The filming technique is very unassuming- very
old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of
realism to the entire piece. The actors are extremely well chosen- Michael
Sheen not only "has got all the polari" but he has all the voices down pat too! You can
truly see the seamless editing guided by the references to Williams' diary entries, not only
is it well worth the watching but it is a terrific written and performed piece. A masterful
production about one of the great master's of comedy and his life. The
realism really comes home with the little things: the fantasy of the guard which, rather
than use the traditional 'dream' techniques remains solid then disappears. It plays on our
knowledge and our senses, particularly with the scenes concerning Orton and Halliwell
and the sets (particularly of their flat with Halliwell's murals decorating every surface) are
terribly well done.

Negative review sample
This show was an amazing, fresh & innovative idea in the 70's when it first aired. The first
7 or 8 years were brilliant, but things dropped off after that. By 1990, the show was not
really funny anymore, and it's continued its decline further to the complete waste of time
it is today. It's truly disgraceful how far this show has fallen. The writing is
painfully bad, the performances are almost as bad - if not for the mildly entertaining
respite of the guest-hosts, this show probably wouldn't still be on the air. I find it so hard
to believe that the same creator that hand-selected the original cast also chose the band
of hacks that followed. How can one recognize such brilliance and then see fit to replace
it with such mediocrity? I felt I must give 2 stars out of respect for the original cast that
made this show such a huge success. As it is now, the show is just awful. I can't believe
it's still on the air.

Feature Extraction
● TF-IDF Vectorizer
● Word2Vec Embedding
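To make the TF-IDF step concrete, here is a minimal pure-Python sketch of the weighting. It uses raw term counts and a smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))) + 1. That variant is an assumption, since the slides do not say which TF-IDF formula or vectorizer settings were used, and `tfidf` is a hypothetical helper.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = term count * smoothed IDF, where
    IDF(t) = ln((1 + N) / (1 + df(t))) + 1 (assumed variant).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]

docs = ["wonderful little production", "awful production"]
weights = tfidf(docs)
print(weights[0])
```

A term that appears in every document (here "production") gets the lowest weight, while rarer terms such as "wonderful" are weighted up.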

Models
● MLPClassifier
● Support Vector Machine (SVM)
● Long Short-Term Memory (LSTM)
● Convolutional Neural Network (CNN)
● CNN-LSTM (hybrid)
● BERT

MLPClassifier Results
              precision  recall  f1-score  support
Positive      0.87       0.87    0.87      4993
Negative      0.87       0.87    0.87      5007
accuracy                         0.87      10000
weighted avg  0.87       0.87    0.87      10000
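The columns in these reports follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is the harmonic mean of the two, and the weighted average weights each class by its support. A small sketch with made-up labels (not the project's predictions); `class_report` is a hypothetical helper:

```python
def class_report(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

# Toy labels for illustration only; 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

pos = class_report(y_true, y_pred, 1)
neg = class_report(y_true, y_pred, 0)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Weighted average: each class's F1 weighted by its support
total = pos["support"] + neg["support"]
weighted_f1 = (pos["f1"] * pos["support"] + neg["f1"] * neg["support"]) / total
print(pos, neg, accuracy, weighted_f1)
```

Because the two classes have nearly equal support in the reports, the weighted averages stay close to the per-class scores.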

SVM Results
              precision  recall  f1-score  support
Positive      0.87       0.86    0.87      4993
Negative      0.87       0.87    0.87      5007
accuracy                         0.87      10000
weighted avg  0.87       0.87    0.87      10000

LSTM Results
              precision  recall  f1-score  support
Positive      0.81       0.83    0.82      4964
Negative      0.83       0.81    0.82      5036
accuracy                         0.82      10000
weighted avg  0.82       0.82    0.82      10000

CNN Results
              precision  recall  f1-score  support
Positive      0.82       0.79    0.81      4964
Negative      0.83       0.81    0.82      5036
accuracy                         0.81      10000
weighted avg  0.81       0.81    0.81      10000

CNN-LSTM Results
              precision  recall  f1-score  support
Positive      0.80       0.85    0.82      4964
Negative      0.84       0.79    0.81      5036
accuracy                         0.82      10000
weighted avg  0.82       0.82    0.82      10000

BERT Results
              precision  recall  f1-score  support
Positive      0.90       0.91    0.90      4964
Negative      0.91       0.90    0.90      5036
accuracy                         0.90      10000
weighted avg  0.90       0.90    0.90      10000

Summary
Model     Feature   Precision  Recall  F1-score  Accuracy
MLP       TF-IDF    0.87       0.87    0.87      0.87
SVM       TF-IDF    0.87       0.87    0.87      0.87
LSTM      Word2Vec  0.82       0.82    0.82      0.82
CNN       Word2Vec  0.81       0.81    0.81      0.81
CNN-LSTM  Word2Vec  0.82       0.82    0.82      0.82
BERT      BERT      0.90       0.90    0.90      0.90

Conclusion
The main motive behind this project was to construct a sentiment analysis model that
helps us better understand the movie reviews we collected.
We compared the results of different classifiers: MLP, Support Vector Machine (SVM),
LSTM, CNN, hybrid CNN-LSTM, and BERT.
For evaluation, we observed the accuracy provided by each model.
BERT gave the highest accuracy, at 90%.
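The comparison can be reproduced directly from the accuracy column of the summary table. The snippet below is only a sketch of that ranking step, with the values copied from the table.

```python
# Accuracy per model, copied from the summary table.
accuracy = {
    "MLP": 0.87,
    "SVM": 0.87,
    "LSTM": 0.82,
    "CNN": 0.81,
    "CNN-LSTM": 0.82,
    "BERT": 0.90,
}

# Rank models from highest to lowest accuracy.
ranking = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
best_model, best_accuracy = ranking[0]
print(best_model, best_accuracy)
```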
