Customer reviews are an important feature on Amazon’s vast array of products. Many customers rely heavily on the honest reviews of past users when making purchasing decisions. Currently, the only way to regulate the quality of these reviews is for other users to voluntarily vote a review up or down as ‘helpful’ or ‘not helpful’. It is in the best interest of Amazon (and potential customers) to show the most helpful reviews first and de-prioritize (or flag) useless ones. Thus, we wanted to create a model that could successfully predict whether customers would find a product review helpful. With such a model, Amazon would be able to better prioritize the user reviews displayed on product pages from the moment a review is posted.
Predicting Helpfulness of User-Generated Product Reviews Through Analytical Models
1. Predicting Helpfulness of Amazon’s User-Generated Product Reviews
Ankita Kaul & Nicholas Baladis
MIT Sloan – Spring 2015
2. Project Motivation
Amazon prioritizes product reviews that customers deem ‘helpful’, only after customers have voluntarily voted so.
Customers can voluntarily vote here
3.
4. …Amazon could predict which reviews are helpful, the moment they are posted?
[Slide graphic labels: Product Rating, Helpfulness score]
5. Data Galore*
Our data consisted of Amazon user-generated product reviews, spanning all product categories and a period of 18 years. Each ‘observation’ is a customer’s review.
• Reviewer ID
• Helpfulness Rating
• Product ID
• Product Price
• Timestamp of review
• Review Prose
• Score
Data structure – we had to downsize (a filtering sketch follows the data source note below):
~35M reviews, all categories → downsize → ~1.2M reviews, Electronics category → downsize → ~18K reviews, only those with >10 votes
*Data procured from Stanford University: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
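A minimal sketch of this downsizing step, assuming the raw review dump has been loaded into a pandas DataFrame; the file name and column names (category, helpful_yes, helpful_total, reviewText, price, score) are hypothetical stand-ins, not taken from the original analysis.

```python
import pandas as pd

# Hypothetical columns: 'category', 'helpful_yes' (helpful votes),
# 'helpful_total' (total votes), 'reviewText', 'price', 'score', ...
reviews = pd.read_csv("amazon_reviews.csv")                  # ~35M reviews, all categories

electronics = reviews[reviews["category"] == "Electronics"]  # ~1.2M reviews

# Keep only reviews that received more than 10 helpfulness votes (~18K reviews)
voted = electronics[electronics["helpful_total"] > 10].copy()
```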
6. Analysis Approach
The Setup
Dependent variable – is a review helpful or not?
• ‘Yes’ if >75% of voters agree
• Binary variable
Independent variables
Pre-existing from data set:
• Product Price
• Overall product rating
Newly calculated:
• Word count of review prose
• Readability grade-level score
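Continuing the sketch above, a rough illustration of how the dependent variable and the two newly calculated features might be derived. The column names remain hypothetical, and the syllable counter is a crude heuristic, not the project's exact readability implementation; the underlying formula is the Flesch-Kincaid grade level shown at the end of this slide.

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Approximate Flesch-Kincaid grade level; syllables are counted
    as vowel groups, which is only a rough heuristic."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words)
    return 0.39 * n_words / sentences + 11.8 * syllables / n_words - 15.59

# Dependent variable: 'helpful' if >75% of voters agreed the review was helpful
voted["helpful"] = (voted["helpful_yes"] / voted["helpful_total"] > 0.75).astype(int)

# Newly calculated independent variables
voted["wordcount"] = voted["reviewText"].str.split().str.len()
voted["grade_score"] = voted["reviewText"].apply(flesch_kincaid_grade)
```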
The Methodology
On unclustered data set:
• Linear Regression
• Logistic Regression
• CART
• Cross-Validated CART
• Random Forest
• Bag of Words
On clustered data set:
• Logistic Regression
• CART
• Cross-Validated CART
• Random Forest
• Bag of Words
Flesch-Kincaid method:
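The formula itself did not survive extraction; for reference, the standard Flesch-Kincaid grade-level formula is:

```latex
\text{Grade level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
                   + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```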
7. Predictions on Unclustered Data Set
Our predictive models look promising:

Methodology              Accuracy
Baseline                 74.95%
Linear Regression        R² = 0.273
Logistic Regression      81.44%
CART                     80.88%
Cross-V CART             81.84%
Random Forest            81.94%
BoW & Logistic Reg       81.08%
BoW & CART               79.80%
BoW & Cross-V CART       78.16%
BoW & Random Forest      82.08%

[BoW & CART tree diagram: splits on score >= 2.5, price < 210, ‘work’ >= 0.5, score >= 1.5, price < 30]
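Continuing the earlier sketches, a minimal illustration of the best-performing unclustered pipeline (bag of words + random forest), using scikit-learn as a stand-in for whatever tooling the project actually used. The vocabulary cutoff, tree count, and column names are illustrative assumptions, not the settings behind the 82.08% figure.

```python
from scipy.sparse import hstack, csr_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

train, test = train_test_split(voted, test_size=0.3, random_state=1)

# Bag of words on the review prose: keep terms appearing in at least 1% of training reviews
vec = CountVectorizer(min_df=0.01, stop_words="english")
bow_train = vec.fit_transform(train["reviewText"])
bow_test = vec.transform(test["reviewText"])

# Append the structured variables (price, product score, word count, grade score),
# assumed numeric and non-missing for this sketch
num_cols = ["price", "score", "wordcount", "grade_score"]
X_train = hstack([bow_train, csr_matrix(train[num_cols].values)])
X_test = hstack([bow_test, csr_matrix(test[num_cols].values)])

rf = RandomForestClassifier(n_estimators=200, random_state=1)
rf.fit(X_train, train["helpful"])
print("accuracy:", accuracy_score(test["helpful"], rf.predict(X_test)))
```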
8. Clustering the Data Set
Cluster 1 – Eloquent & wordy
• Highest word count
• Highest grade score
Cluster 2 – Cheap products & less wordy
• Lowest price
• Low word count
Cluster 3 – Worse products & shortest reviews
• Lowest word count
• Lowest product score
Cluster 4 – The ‘average’ group
• Average in all variables
Cluster 5 – Expensive products & least articulate reviews
• Highest price
• Low grade score
[Cluster size breakdown: 15%, 35%, 31%, 14%, 5%]
[Figure: cluster dendrogram (height axis)]
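The dendrogram suggests hierarchical clustering; below is a sketch of how five clusters could be produced with Ward's method on the normalized independent variables, continuing the earlier sketches. The algorithm choice, feature set, and normalization are assumptions inferred from the slide, not stated specifications.

```python
from scipy.cluster.hierarchy import linkage, fcluster

# Normalize the independent variables so no single scale dominates the distances
feature_cols = ["price", "score", "wordcount", "grade_score"]   # hypothetical names
features = voted[feature_cols].astype(float)
features = (features - features.mean()) / features.std()

Z = linkage(features.values, method="ward")                  # hierarchical (Ward) clustering
voted["cluster"] = fcluster(Z, t=5, criterion="maxclust")    # cut the tree into 5 clusters
print(voted["cluster"].value_counts(normalize=True))         # cluster size breakdown
```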
9. Clustered Data Set Results
Clustering provided us mixed results on our models:

Cluster     Baseline Accuracy   Best Performing Accuracy   Best Performing Methodology
Cluster 1   90.52%              90.52%                     Baseline (no improvement through modeling)
Cluster 2   85.24%              86.08%                     Random Forest
Cluster 3   65.31%              76.74%                     Bag of Words & Random Forest
Cluster 4   68.63%              82.24%                     Bag of Words & Cross-Validated CART
Cluster 5   70.31%              84.34%                     Logistic Regression (+14% improvement)

Cluster-then-predict total accuracy = 76.81%
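A sketch of how the cluster-then-predict total accuracy can be computed: fit one model per cluster and pool the predictions over the full test set. Logistic regression is used here only as a placeholder; the actual analysis used a different (and varying) best model per cluster, as listed above, and assigned test points to clusters more carefully than this continuation of the earlier sketches does.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

feature_cols = ["price", "score", "wordcount", "grade_score"]   # hypothetical names
train, test = train_test_split(voted, test_size=0.3, random_state=1)

pooled_pred = pd.Series(0, index=test.index)     # predictions pooled across clusters
for c in sorted(train["cluster"].unique()):
    tr, te = train[train["cluster"] == c], test[test["cluster"] == c]
    model = LogisticRegression(max_iter=1000)    # placeholder per-cluster model
    model.fit(tr[feature_cols], tr["helpful"])
    if len(te):
        pooled_pred.loc[te.index] = model.predict(te[feature_cols])

print("cluster-then-predict accuracy:",
      accuracy_score(test["helpful"], pooled_pred))
```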
10. Bag of Words Text Analytics + CART Examples on Clustered Set
[Two CART tree diagrams, for Cluster 4 and Cluster 5: splits on score, word count, readability grade score, and stemmed review words such as ‘epson’, ‘might’, ‘keep’, ‘pretti’, ‘speaker’, ‘fine’, ‘chang’, ‘window’, ‘issu’, ‘real’]
12. Conclusions
Our best performer was Bag of Words + Random Forests on the complete data set: 74.95% (baseline) → 82.08% (BoW + RF).
The cluster-then-predict methodology did not beat modeling the entire set: 74.95% (baseline) → 76.81% (cluster-then-predict).
However, clustering gave us other interesting results:
• Clusters 1, 2, 4, and 5 beat even the best models we developed on the entire data set
• Cluster 1 had such a high baseline (90.52%) that no model is needed
• Cluster 5 had a +14% improvement, higher than any other model
Amazon can predict the helpfulness of reviews the moment they are posted with reasonable accuracy using a two-step model: (1) cluster, (2) predict by cluster. By applying such analytics, Amazon can potentially flag unhelpful reviews at the time of posting and help create a better decision-making experience for customers.