Major Project
• Overall review of a product based on its ratings and the content
  of its reviews
• Clustering of similar reviews of a product for sentiment analysis
• Checking whether the sentiment of each review corresponds to its
  rating
• Phase 1: In this phase, we
  • selected the problem statement
  • collected the data set required for analysis
  • converted the reviews to CSV format for analysis, since each
    product review in the dataset was stored as a JSON object
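The JSON-to-CSV step could look roughly like the sketch below. It assumes one JSON object per line. The field names product_id, rating and review_text are hypothetical placeholders, since the slides do not name the dataset's actual keys.

```python
import csv
import io
import json

# Hypothetical field names; the real dataset's keys may differ.
FIELDS = ["product_id", "rating", "review_text"]

def json_lines_to_csv(json_lines, out_file, fields=FIELDS):
    """Write one CSV row per JSON object, keeping only the chosen fields."""
    writer = csv.DictWriter(out_file, fieldnames=fields)
    writer.writeheader()
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing keys become empty cells; unknown keys are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})
        count += 1
    return count

# Two made-up reviews, used only to exercise the function.
sample = [
    '{"product_id": "P1", "rating": 5, "review_text": "Works great", "extra": 1}',
    '{"product_id": "P1", "rating": 2, "review_text": "Stopped working, sadly"}',
]
buffer = io.StringIO()
rows_written = json_lines_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the same function can be given an open input file and an output file opened with newline="" in place of the in-memory sample.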
• Phase 2: This phase largely involves data pre-processing.
  • First, we removed the stop words from the dataset to reduce
    the size of the inverted list being formed.
  • Next, we applied data cleaning to further remove any useless
    data.
  • The Porter stemming algorithm was then applied to reduce the
    dataset further, giving the data we operate on from here on.
  • We compared the sentiment of each word to form a rough idea
    of the type of comments the product is receiving.
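A minimal sketch of the stop-word removal, cleaning and word-level sentiment steps is given below. The stop-word list and the sentiment lexicon are tiny made-up examples, not the ones used in the project, and stemming is left out to keep the sketch short.

```python
import re
from collections import Counter

# Tiny illustrative lists (assumptions); a real run would use a full
# stop-word list and a proper sentiment lexicon.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "to", "of", "i"}
WORD_SENTIMENT = {"great": 1, "love": 1, "good": 1, "bad": -1, "broken": -1, "poor": -1}

def clean_tokens(text):
    """Lowercase the text, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def sentiment_counts(tokens):
    """Count positive, negative and neutral (unlisted) words."""
    counts = Counter(positive=0, negative=0, neutral=0)
    for token in tokens:
        score = WORD_SENTIMENT.get(token, 0)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts

tokens = clean_tokens("This is a GREAT phone, I love it! The battery was bad.")
counts = sentiment_counts(tokens)
print(tokens)
print(dict(counts))
```

The per-review counts give the rough picture of the comments mentioned above, and they can later serve as raw features.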
• Phase 3: Implementation of the mining algorithm
  • We extracted features from the dataset produced in the previous
    phase. These features are the basis for the mining algorithms
    applied in the next few steps.
  • The feature set is normalized for use in cluster formation.
  • We check for correlation in the normalized data to look for any
    possibility of reducing the data set.
  • We applied the K-Means algorithm using the extracted features
    (very positive, positive, neutral, negative, very negative) as
    the basis.
  • The resulting data is then analyzed and its accuracy is checked
    to complete the project.
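The normalization and correlation check could be sketched as follows. The slides do not say which normalization or correlation measure was used, so min-max scaling and Pearson correlation are assumptions here, as are the 0.95 threshold and the sample feature rows.

```python
import math

# Column order of each feature row (taken from the feature names above).
FEATURES = ["very_positive", "positive", "neutral", "negative", "very_negative"]

def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0

def correlated_pairs(rows, threshold=0.95):
    """Return index pairs of columns whose |correlation| reaches the threshold."""
    columns = list(zip(*rows))
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                pairs.append((i, j))
    return pairs

# Made-up word counts per review, one row per review.
raw = [
    [4, 8, 3, 0, 0],
    [2, 4, 5, 1, 0],
    [0, 0, 4, 6, 3],
    [1, 2, 6, 4, 2],
]
normalized = min_max_normalize(raw)
redundant = correlated_pairs(normalized)
print([(FEATURES[i], FEATURES[j]) for i, j in redundant])
```

A pair reported here is a candidate for data-set reduction, since one of the two features adds little beyond the other.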
• Porter Stemming Algorithm
  Stemming is part of the data cleaning process. It reduces the size
  of our dataset when building the posting list: different words that
  share the same root are grouped together as a single entry in the
  posting list.
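The effect of stemming on the posting list can be illustrated with the sketch below. Note that toy_stem is a crude suffix stripper standing in for the Porter algorithm, not an implementation of it, and the three sample reviews are made up.

```python
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

def toy_stem(word):
    """Strip one common suffix. A crude stand-in, NOT the Porter algorithm."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_posting_list(docs):
    """Map each stem to the sorted list of review ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[toy_stem(word)].add(doc_id)
    return {stem: sorted(ids) for stem, ids in postings.items()}

docs = {1: "works well", 2: "worked once", 3: "working fine"}
postings = build_posting_list(docs)
print(postings)
```

Without stemming, "works", "worked" and "working" would each get their own entry. With it they share the single entry "work", which is the reduction described above.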
• K-Means Clustering
  K-Means is the data mining algorithm used to form clusters of
  reviews with similar features. We create 2 clusters in our project.
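A minimal pure-Python sketch of K-Means with 2 clusters is shown below. Starting from the first k points as centroids is a simplification chosen to keep the example deterministic. The two-feature points (positive share, negative share) are made up for illustration and stand in for the project's normalized feature vectors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k=2, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    # Simplification: start from the first k points. Real implementations
    # usually pick random or k-means++ starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical reviews as (positive share, negative share) pairs.
points = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.05),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.95),
]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

With k=2 the clusters can be read as mostly positive and mostly negative reviews, which can then be compared against the star ratings.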
Review Mining of Products of Amazon.com