“Amazon Reviews: Which Words are Most Helpful?” used term-frequency inverse document frequency (TF-IDF) modeling to discover which words most contributed to Amazon's highest-ranked (and lowest-ranked) reviews as well as the terms most highly correlated with customer purchases.
1. Amazon Data Riders Page 1 of x
Amazon Reviews: What’s Really Helpful?
Israel Kloss, Diana Trebino, Robert Baker, Shaun Deporter, John (Jack) McDonald
History
• Jeff Bezos founded Amazon in 1994
• Headquarters: Seattle, Washington
• Largest internet-based retailer in the US
• Began as an online bookstore
• First book sold: Douglas Hofstadter’s Fluid Concepts
and Creative Analogies: Computer Models of the Fundamental
Mechanisms of Thought
• First profitable quarter: Q4 2001 ($5 million)
Motivation
• Many of the product reviews are ineffective at helping
potential customers learn more about the product
• Reviewers are often affected by exogenous
circumstances that may inadvertently
reduce the quality of their review
• Or, they aren’t focused on issues/characteristics that
are important to others
Research Question
• Using an Amazon product review dataset, which frequently
used words appear in the reviews that readers rank as most
"helpful"?
• This specific dataset focused on cameras and books
Amazon Product Review Dataset
• ReviewId : id of the review
• ReviewerId : id of the reviewer
• Asin : product ID on Amazon
• Review : review text
• Title : title of the review
• DateWritten : date the review was written
• HelpfulVotes : number of helpful votes
• PrcHelpful : percentage of helpful votes (i.e. helpfulness)
• TotalVotes : total votes
• ProductRating : rating (1 to 5) that the review assigned to the product
• Verified : verified purchase by reviewer
• Badges : any badges the reviewer has
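The field list above can be written down as a small record type. This is only a sketch: the field names come from the slide, but the Python types, the class name `ReviewRecord`, and the idea that PrcHelpful equals HelpfulVotes divided by TotalVotes are assumptions, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewRecord:
    """One row of the review dataset. Types are assumed, not documented."""
    ReviewId: str
    ReviewerId: str
    Asin: str
    Review: str
    Title: str
    DateWritten: str
    HelpfulVotes: int
    PrcHelpful: float
    TotalVotes: int
    ProductRating: int  # 1 to 5
    Verified: bool
    Badges: str

    def helpful_share(self) -> Optional[float]:
        """Recompute helpfulness as HelpfulVotes / TotalVotes (assumed definition)."""
        if self.TotalVotes == 0:
            return None  # no votes, so no share can be computed
        return self.HelpfulVotes / self.TotalVotes


# Hypothetical record for illustration only.
example = ReviewRecord(
    ReviewId="R1", ReviewerId="U1", Asin="B000000000",
    Review="Great battery life, easy to use.", Title="Solid camera",
    DateWritten="2012-01-15", HelpfulVotes=9, PrcHelpful=0.75,
    TotalVotes=12, ProductRating=5, Verified=True, Badges="",
)
```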
Amazon Product Review Dataset
• 60,000 rows
• Summary of prcHelpful:
  Min    1st Qu.  Median  Mean     3rd Qu.  Max
  -1.0   0.8330   1.0     0.88535  1.0      1.0
• Summary of length of reviews:
  Min    1st Qu.  Median  Mean     3rd Qu.  Max
  0.0    155.0    358.0   686.1    821.2    32,630.0
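A summary in this Min / 1st Qu. / Median / Mean / 3rd Qu. / Max layout can be reproduced with a few lines of standard-library Python. The sketch below is not the authors' code; the function name `six_number_summary` and the sample values are hypothetical, and the "inclusive" quantile method is chosen on the assumption that the original numbers came from a tool that interpolates linearly between data points.

```python
import statistics


def six_number_summary(values):
    """Return min, quartiles, mean, and max for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "Min": ordered[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(ordered),
        "3rd Qu.": q3,
        "Max": ordered[-1],
    }


# Hypothetical review lengths, not taken from the dataset.
sample_lengths = [1, 2, 3, 4, 5]
summary = six_number_summary(sample_lengths)
```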
General Approach
• Created “helpful” dependent binary variable by using
thresholds of “prcHelpful”
• Created datasets to find most frequent words in
reviews deemed most helpful and most unhelpful
• Cleaned data to eliminate records with fewer than 5
votes, etc.
• Ran models to get predictive coefficients, significant
variables, accuracy
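The first three steps above can be sketched as follows. The 5-vote minimum is from the slide; the two prcHelpful thresholds, the function names, and the sample reviews are assumptions made for illustration, since the slides do not give the actual cutoffs. Stop-word removal and stemming are left out to keep the sketch short.

```python
import re
from collections import Counter

MIN_VOTES = 5            # from the slides: drop records with fewer than 5 votes
HELPFUL_AT_LEAST = 0.9   # assumed threshold, not stated in the slides
UNHELPFUL_AT_MOST = 0.5  # assumed threshold, not stated in the slides


def label_review(prc_helpful, total_votes):
    """Return 1 (helpful), 0 (unhelpful), or None (record is dropped)."""
    if total_votes < MIN_VOTES:
        return None
    if prc_helpful >= HELPFUL_AT_LEAST:
        return 1
    if prc_helpful <= UNHELPFUL_AT_MOST:
        return 0
    return None  # in-between reviews are left out in this sketch


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(reviews, label, n=3):
    """Count in how many reviews of the given label each word occurs."""
    counts = Counter()
    for review in reviews:
        if label_review(review["PrcHelpful"], review["TotalVotes"]) == label:
            # set(): single-word occurrence per review, not raw frequency
            counts.update(set(tokenize(review["Review"])))
    return counts.most_common(n)


# Hypothetical reviews for illustration only.
reviews = [
    {"Review": "Great battery life, easy to use", "PrcHelpful": 1.0, "TotalVotes": 12},
    {"Review": "Battery is great for the price", "PrcHelpful": 0.95, "TotalVotes": 8},
    {"Review": "Do not buy, you know the problem", "PrcHelpful": 0.2, "TotalVotes": 10},
    {"Review": "Love it", "PrcHelpful": 1.0, "TotalVotes": 2},
]
helpful_top = top_words(reviews, label=1, n=2)
```

The resulting occurrence indicators would then serve as predictors in a model of the binary "helpful" variable, which is the step the last bullet refers to.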
Results: Helpful
• More specific/descriptive words: recommend, price, reader, battery
• Single-word occurrence model had the greatest number of highest-significance (***) helpful words
• 11 significantly helpful words found
Word       Estimate  Std. Error  z value  Pr(>|z|)  Significance
use        0.255836  0.07543     3.392    0.000695  ***
one        0.2821    0.073267    3.85     0.000118  ***
also       0.325108  0.092454    3.516    0.000437  ***
read       0.338809  0.086397    3.922    8.80E-05  ***
price      0.405081  0.113728    3.562    0.000368  ***
love       0.484127  0.101152    4.786    1.70E-06  ***
great      0.491618  0.080858    6.08     1.20E-09  ***
batteri    0.518258  0.117712    4.403    1.07E-05  ***
recommend  0.544638  0.11818     4.609    4.06E-06  ***
reader     0.553475  0.165715    3.34     0.000838  ***
easi       0.558622  0.121889    4.583    4.58E-06  ***
work       0.261715  0.082078    3.189    0.00143   **
qualiti    0.27044   0.099945    2.706    0.006812  **
mani       0.281039  0.099964    2.811    0.004933  **
well       0.282249  0.092191    3.062    0.002202  **
small      0.361829  0.128318    2.82     0.004806  **
help       0.398555  0.132465    3.009    0.002623  **
life       0.40411   0.126059    3.206    0.001347  **
around     0.411475  0.139336    2.953    0.003146  **
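The columns of the coefficient table are related: the z value is the estimate divided by its standard error, and Pr(>|z|) is the two-sided tail probability of that z under a standard normal distribution. The sketch below recomputes those quantities for the "use" row. The function names are hypothetical, and the star cutoffs (0.001, 0.01, 0.05, 0.1) are the conventional ones for this style of output, which the slides do not spell out.

```python
import math


def wald_z(estimate, std_error):
    """z value of a coefficient: estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p):
    """Map a p-value to the conventional significance code (assumed cutoffs)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""


# Row "use" from the table: Estimate 0.255836, Std. Error 0.07543
z_use = wald_z(0.255836, 0.07543)   # about 3.39, as in the z value column
p_use = two_sided_p(z_use)          # close to the tabulated 0.000695
code_use = stars(p_use)
```

Because every estimate in this table is positive, each listed word is associated with higher odds of a review being labeled helpful; an association of this kind does not establish that the word causes the votes.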
Results: Unhelpful
• Generally more generic/non-descriptive words: buy, problem, like
• Single-word occurrence model had the greatest number of significant (***) unhelpful words
• Only 2 highly significant unhelpful words were found
• Causation would require a controlled experiment, but here is a thought:
perhaps there is an association in readers’ minds between the word “buy” and a product
sales pitch. Or perhaps some reviews were actually disguised sales pitches.
Word     Estimate   Std. Error  z value  Pr(>|z|)  Significance
buy      -0.370665  0.089356    -4.148   3.35E-05  ***
know     -0.339205  0.100322    -3.381   0.000722  ***
point    -0.322965  0.112858    -2.862   0.004214  **
world    -0.308137  0.127768    -2.412   0.015879  *
give     -0.215427  0.104678    -2.058   0.03959   *
like     -0.16023   0.076525    -2.094   0.036275  *
actual   -0.228637  0.130229    -1.756   0.079146  .
doesnt   -0.216152  0.118061    -1.831   0.067122  .
author   -0.212546  0.119046    -1.785   0.074193  .
len      -0.20534   0.110326    -1.861   0.062715  .
back     -0.180539  0.101675    -1.776   0.07579   .
say      -0.179834  0.099963    -1.799   0.072017  .
got      -0.177735  0.106905    -1.663   0.096402  .
problem  -0.172533  0.103116    -1.673   0.094291  .
new      -0.172463  0.099863    -1.727   0.084166  .
still    -0.170395  0.099429    -1.714   0.086579  .
Distribution of Length
Helpful Mean = 0.02187246
Unhelpful Mean = -0.1354268
Helpful Median = -0.3252782
Unhelpful Median = -0.3897026
Observations
• Our analysis determined that reviews that focused on
specific characteristics/specs of the products were
most effective in helping customers
How Amazon can employ our analysis
• Amazon should encourage effective reviewers to
review products more often by offering badges or
other perks, similar to sites like TripAdvisor
• Amazon can also educate customers on how to draft
an effective review by encouraging use of the words or
themes identified in our analysis and by recommending an
optimal length
• Training or reminders can be provided via e-mail or
during the review process
Extra Results:
Words from 3 models associated with purchasing and not purchasing.