2. GoodReviews: Motivation
Readers often rely on book reviews by other users to choose a book
Currently, reviews on Amazon are rated by users
Potentially helpful reviews can go unnoticed if unrated
Let a machine classify reviews as helpful
goodreviews.co
Excerpt of a helpful unrated review
3. Data
Text file with book reviews from Amazon
Unstructured data
Book reviews as they appear on Amazon
Response used in the modeling: fraction of users who rated the review that found it to be useful
J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
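The response variable can be illustrated with a minimal sketch. The function name and the vote counts below are hypothetical; the only thing taken from the slide is the definition (users who found the review useful divided by users who rated it).

```python
from typing import Optional


def helpfulness_fraction(helpful_votes: int, total_votes: int) -> Optional[float]:
    """Fraction of raters who found a review useful, or None if nobody rated it."""
    if helpful_votes < 0 or helpful_votes > total_votes:
        raise ValueError("helpful_votes must be between 0 and total_votes")
    if total_votes == 0:
        # An unrated review has no defined response.
        return None
    return helpful_votes / total_votes


# Hypothetical vote counts, for illustration only.
print(helpfulness_fraction(18, 24))  # 0.75
print(helpfulness_fraction(0, 0))    # None
```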
4. Three class classification
One vs Rest with Random Forest classifier
Features (NLP)
No. of adverbs, adjectives, … in review and summary
References to genre
No. of words, sentences
Subjectivity, polarity and lexical diversity
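A few of the surface features can be sketched in plain Python. This is an illustrative assumption, not the project's extraction code: the tokenization rules are simplified, lexical diversity is assumed to mean unique words divided by total words, and the part-of-speech counts, subjectivity and polarity are left out because they need a tagger and a sentiment lexicon.

```python
import re


def surface_features(text: str) -> dict:
    """Word count, sentence count and lexical diversity of a review text."""
    # Simplified tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Simplified sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    # Assumed definition: unique words / total words (type-token ratio).
    diversity = len(set(words)) / n_words if n_words else 0.0
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "lexical_diversity": diversity,
    }


# Made-up review text, for illustration only.
example = surface_features("A great book. I loved the book!")
print(example)
```

In this made-up example the word "book" repeats, so 6 of the 7 words are unique and the diversity falls below 1.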
• Precision and Recall used as success metrics
• ~70% average precision and recall in the test and validation sets
• ~75% precision and recall for the most helpful reviews
• AUC ~ 0.88
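Per-class precision and recall, the success metrics listed above, can be computed as in this sketch. The labels are made up for illustration and do not reproduce the reported figures.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Tuple[float, float]:
    """Precision and recall for one class, treating all other classes as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up true and predicted labels, for illustration only.
y_true = ["helpful", "helpful", "middle", "not helpful", "middle", "helpful"]
y_pred = ["helpful", "middle", "middle", "not helpful", "helpful", "helpful"]

precision, recall = precision_recall(y_true, y_pred, "helpful")
print(f"helpful: precision={precision:.2f}, recall={recall:.2f}")
```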
[Figure: the three classes (Not helpful, Middle, Helpful) along the fractional helpfulness axis, with boundaries at 20% and 80%]
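The mapping from fractional helpfulness to the three classes can be sketched as follows. The 20% and 80% boundaries come from the slide; which class the exact boundary values fall into is an assumption made here, and the function name is hypothetical.

```python
def helpfulness_class(fraction: float, low: float = 0.20, high: float = 0.80) -> str:
    """Map a fractional helpfulness in [0, 1] to one of three classes."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction < low:
        return "not helpful"
    if fraction > high:
        return "helpful"
    # Assumption: values exactly at 20% or 80% count as "middle".
    return "middle"


for value in (0.10, 0.50, 0.95):
    print(value, helpfulness_class(value))
```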
5. Helpful reviews have more neutral sentiment
Helpful reviews have no extremes of sentiment
Review polarity
              Mean   Variance
Helpful       0.17   0.12
Not helpful   0.05   0.25
6. Helpful reviews are longer and have fewer unique words
Helpful reviews are longer on average
Helpful reviews are less lexically diverse
Number of words in the review
Lexical diversity
              Mean   Variance
Helpful       0.61   0.1
Not helpful   0.73   0.13
9. Details of data and algorithm
• Features extracted using NLTK and TextBlob
• Trained on ~60,000 reviews evenly split between the 3 classes
• In reality, the dataset is distributed as 3:3:1 (Bad:Middle:Good)
• Only reviews rated at least 20 times were used
• Test and validation sets of 30,000 reviews each
• One vs Rest using Random Forest classifier
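A One vs Rest setup around a Random Forest can be sketched with scikit-learn. The slides do not name the library used for the classifier, so scikit-learn is an assumption here, as are the hyperparameters. The features are synthetic stand-ins with balanced classes, not the review data, so the printed accuracy says nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the NLP features: 3 balanced classes, 200 rows each.
# 0 = not helpful, 1 = middle, 2 = helpful (label encoding is an assumption).
labels = np.repeat([0, 1, 2], 200)
features = rng.normal(loc=labels[:, None] * 3.0, scale=1.0, size=(600, 4))

# Shuffle, then hold out a third of the rows for testing.
order = rng.permutation(600)
train_idx, test_idx = order[:400], order[400:]

# One binary Random Forest is fitted per class (that class vs the rest).
model = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0)
)
model.fit(features[train_idx], labels[train_idx])

predicted = model.predict(features[test_idx])
accuracy = float((predicted == labels[test_idx]).mean())
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

`model.predict_proba` returns per-class scores, which is what a ranking metric such as AUC would be computed from.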