This presentation describes a topic-modelling project that groups product reviews from Flipkart or Amazon. It covers the introduction, the dataset used, the methodology and models used, the results achieved, and the conclusion.
3. Introduction
Point 1
In recent years, the growth of e-commerce has increased the number of reviews that customers give for a particular product.
Point 2
With so many reviews, it becomes difficult to find what we are looking for, so the information needs to be organized, understood, and summarized. Sentiment analysis shows the compound sentiment of a large set of reviews, and topic modelling acts as a tool for finding the hidden topical patterns present in the collection.
Point 3
Topic modelling can be described as a method for finding groups of words in a collection of data that best represent the information in that data.
Point 4
This project takes a dataset of reviews and performs text pre-processing, EDA, sentiment analysis, and topic modelling to reach the desired output.
4. Dataset Used
What is the dataset about?
The dataset contains all the reviews, with their respective dates, for various categories of smartphones on Flipkart.
How was the dataset created?
The whole dataset was created by web scraping Flipkart using Python.
Number of reviews in the dataset
140,000 (one lakh forty thousand) reviews
Tools and modules used for creating the dataset
• Python
• Beautiful Soup, Selenium
• requests
• HTML
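The extraction step can be sketched as follows. This is only an illustration under stated assumptions: the project fetched pages with requests and Selenium and parsed them with Beautiful Soup, whereas this sketch uses Python's standard-library html.parser on an inline, made-up HTML string so that it runs without third-party packages or network access. The class names (review, review-text, review-date) and the sample reviews are hypothetical and do not reflect Flipkart's actual markup.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects review text and date from elements with hypothetical class names."""

    # Hypothetical CSS class -> output field name
    FIELDS = {"review-text": "text", "review-date": "date"}

    def __init__(self):
        super().__init__()
        self.reviews = []     # one dict per review block
        self._field = None    # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if css_class == "review":
            # A new review block starts: open an empty record.
            self.reviews.append({"text": "", "date": ""})
        elif css_class in self.FIELDS and self.reviews:
            self._field = self.FIELDS[css_class]

    def handle_data(self, data):
        # Only keep text that sits inside a field we are capturing.
        if self._field:
            self.reviews[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        self._field = None


# Made-up page fragment standing in for a downloaded review page.
SAMPLE_HTML = """
<div class="review">
  <p class="review-text">Good battery life.</p>
  <span class="review-date">Jan 2023</span>
</div>
<div class="review">
  <p class="review-text">Camera could be better.</p>
  <span class="review-date">Feb 2023</span>
</div>
"""

parser = ReviewParser()
parser.feed(SAMPLE_HTML)
for review in parser.reviews:
    print(review["date"], "|", review["text"])
```

In a real scraper the HTML string would come from the fetched page, and the selectors would have to be matched to the site's current markup.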
5. Methodology / Model Used
Python
The project is implemented entirely in Python, using its various libraries to build the whole model.
EDA and Text Pre-processing
Text pre-processing removes noise such as hashtags and emoji. EDA is then used to analyse the data, for example finding the most frequent words in the dataset and the average word length.
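A minimal sketch of these two steps is given here. It is not the project's actual code: the cleaning rules (lower-casing, dropping hashtags, and treating every character that is not an ASCII letter or whitespace as noise) and the two sample reviews are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")
# Assumption: anything that is not a lowercase ASCII letter or whitespace
# is noise (this covers emoji, punctuation, and digits).
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_review(text):
    """Lower-case a review and strip hashtags, emoji, and other noise."""
    text = text.lower()
    text = HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def most_frequent_words(reviews, n=3):
    """Return the n most common words across all cleaned reviews."""
    counts = Counter(word for review in reviews for word in review.split())
    return counts.most_common(n)


def average_word_length(reviews):
    """Return the mean word length across all cleaned reviews."""
    words = [word for review in reviews for word in review.split()]
    return sum(len(word) for word in words) / len(words) if words else 0.0


# Made-up sample reviews.
sample_reviews = [
    "Great phone 😍 #bestbuy",
    "Battery is great, camera is ok!!",
]
cleaned = [clean_review(review) for review in sample_reviews]
print(cleaned)
print(most_frequent_words(cleaned, n=2))
print(average_word_length(cleaned))
```

A real pipeline would usually also remove stop words such as "is" before counting frequencies, since they otherwise dominate the counts.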
Sentiment Analysis
Sentiment analysis is performed to find the customer's emotion, using the VADER library for Python. VADER is a lexicon- and rule-based sentiment analysis tool.
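To show what "lexicon- and rule-based" means in practice, here is a toy sketch. It is not VADER itself: the lexicon and its valence values are made up, and the single negation rule is a heavy simplification of VADER's rules. The normalisation into the range (-1, 1) and the ±0.05 cut-offs follow the convention commonly used with VADER's compound score, and should be treated as assumptions here.

```python
import math

# Hypothetical toy lexicon with made-up valence scores;
# VADER ships its own, much larger lexicon.
TOY_LEXICON = {
    "good": 2.0, "great": 3.0, "love": 3.0,
    "bad": -2.5, "poor": -2.0, "worst": -3.0,
}
NEGATIONS = {"not", "never", "no"}


def compound_score(text, alpha=15.0):
    """Sum word valences, flip the word right after a negation, squash to (-1, 1)."""
    total = 0.0
    negate_next = False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            negate_next = True
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        # Simplified rule: a negation only affects the next word.
        total += -valence if negate_next else valence
        negate_next = False
    return total / math.sqrt(total * total + alpha)


def sentiment_label(compound, threshold=0.05):
    """Map a compound score to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for review in ["Great phone, love it!", "Not good at all", "Delivered on Monday"]:
    score = compound_score(review)
    print(review, "->", round(score, 3), sentiment_label(score))
```

The same three-way labelling is what produces the positive, negative, and neutral categories reported in the results.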
LDA (Latent Dirichlet Allocation)
LDA is used for topic modelling. It divides the given corpus into a fixed number of topics, and for each document it reports which topics the document contains and with what probability.
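The idea can be sketched with a small collapsed Gibbs sampler for LDA, written in plain Python. This is an illustration only: the slides do not say which LDA implementation the project used, and the function name fit_lda, the hyperparameters alpha and beta, the iteration count, and the toy documents are all assumptions. The project used seven topics; the toy example uses two.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (theta, topic_word): theta[d][k] is the estimated probability
    of topic k in document d; topic_word[k] counts the words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                        # total words assigned to topic k

    # Start from a random topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample the topic in proportion to
                # (topic share in this doc) * (word share in the topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


# Made-up, already tokenised toy documents.
docs = [
    ["battery", "charge", "battery"],
    ["camera", "photo", "camera"],
    ["battery", "charge"],
    ["photo", "camera"],
]
theta, topic_word = fit_lda(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of theta is one document's topic distribution and sums to 1, which corresponds to the "which topics, with what probability" output described above.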
6. Result Achieved- 01
Using VADER, each review was assigned a positive, negative, or neutral sentiment, and using topic modelling we categorized the reviews into seven different topics.
Fig 1: Sample dataframe after computing sentiment analysis
Fig 2: Graph of sentiment analysis using VADER
9. Conclusion and Future Work
Conclusion-01
From the sentiment analysis performed with VADER, we conclude that the larger portion of the customer community has a positive sentiment towards purchasing mobile phones from Flipkart.
Conclusion-02
Using topic modelling with an LDA model, we categorized the dataset into seven different topics according to their similarities.
Future work-01
We will consider different deep learning models, and try other, more complex models, in order to achieve better results.
Future work-02
Additionally, we will verify the model on larger datasets beyond the given one for better results.
10. References
• D. Blei, A. Ng, and M. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3: 993-1022.
• M. Jockers and R. Thalken. 2020. Topic modelling. doi:10.1007/978-3-030-39643-5_17.
• H. M. Wallach. 2006. Topic modeling: beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06).