Predicting Helpfulness of Amazon’s User-Generated Product Reviews
Ankita Kaul & Nicholas Baladis
MIT Sloan – Spring 2015
  
Project Motivation

Amazon prioritizes product reviews that customers deem ‘helpful’, only after customers have voluntarily voted so.

Callout: Customers can voluntarily vote here

Ankita Kaul & Nick Baladis | MIT Sloan
  
…Amazon could predict which reviews are helpful, the moment they are posted?

Figure labels: Product Rating; Helpfulness score
  
Data Galore*

Our data consisted of Amazon user-generated product reviews spanning all product categories and a period of 18 years. Each ‘observation’ is a customer’s review.

Data Structure:
•  Reviewer ID
•  Product ID
•  Timestamp of review
•  Review Prose
•  Score
•  Helpfulness Rating
•  Product Price

We had to downsize:
~35M reviews (all categories), downsized to ~1.2M reviews (Electronics categories), downsized to ~18K reviews (only reviews with >10 votes)

*Data procured from Stanford University: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
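
The two downsizing steps above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the field names `category` and `total_votes` are hypothetical, and it assumes each review is a dictionary.

```python
def downsize(reviews, category="Electronics", min_votes=10):
    """Keep reviews in one category that received more than `min_votes` votes.

    `category` and `total_votes` are hypothetical field names.
    """
    return [
        r for r in reviews
        if r["category"] == category and r["total_votes"] > min_votes
    ]


sample = [
    {"category": "Electronics", "total_votes": 11},
    {"category": "Electronics", "total_votes": 10},  # dropped: not more than 10 votes
    {"category": "Books", "total_votes": 50},        # dropped: wrong category
]
kept = downsize(sample)
```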
  
Analysis Approach

The Setup

Dependent variable: Is a review helpful or not?
• ‘Yes’ if >75% of voters agree
• Binary variable

Independent variables
Pre-existing from data set:
• Product Price
• Overall product rating
Newly calculated:
• Word count of review prose
• Readability grade-level score (Flesch-Kincaid method)

The Methodology

On unclustered data set:
• Linear Regression
• Logistic Regression
• CART
• Cross-Validated CART
• Random Forest
• Bag of Words

On clustered data set:
• Logistic Regression
• CART
• Cross-Validated CART
• Random Forest
• Bag of Words
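
The dependent variable and the two newly calculated variables could be computed roughly as follows. This is a sketch under stated assumptions: the function and key names are hypothetical, the syllable counter is a crude heuristic rather than whatever tool the authors used, and the grade formula is the standard Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59.

```python
import re


def is_helpful(helpful_votes, total_votes, threshold=0.75):
    """Binary label: True when more than `threshold` of voters found the review helpful."""
    if total_votes <= 0:
        raise ValueError("a review needs at least one vote to be labeled")
    return helpful_votes / total_votes > threshold


def count_syllables(word):
    """Heuristic: count vowel groups, dropping a trailing 'e' unless the word ends in 'le'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def review_features(text):
    """Return word count and Flesch-Kincaid grade level for a review's prose."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_sentences = max(len(sentences), 1)
    n_syllables = sum(count_syllables(w) for w in words)
    if n_words == 0:
        grade = 0.0
    else:
        grade = 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59
    return {"word_count": n_words, "grade_score": grade}


features = review_features("The cat sat. The dog ran.")
```

A review with 9 helpful votes out of 12 sits exactly at 75% and is therefore labeled not helpful, since the slide's rule is a strict ">75%".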
  
Predictions on Unclustered Data Set

Our predictive models look promising:

Methodology            Accuracy
Baseline               74.95%
Linear Regression      R² = 0.273
Logistic Regression    81.44%
CART                   80.88%
Cross-V CART           81.84%
Random Forest          81.94%
BoW & Logistic Reg     81.08%
BoW & CART             79.80%
BoW & Cross-V CART     78.16%
BoW & Random Forest    82.08%

Figure: BoW & CART Tree (yes/no branches, TRUE/FALSE leaves), with splits on score >= 2.5, price < 210, work >= 0.5, score >= 1.5, and price < 30
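
For reading the accuracy table, it helps to see how a baseline and a model accuracy are computed. The sketch below assumes "Baseline" means always predicting the most common class, which the slides do not state; the toy labels and predictions are made up for illustration.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Accuracy of always predicting the most common label (assumed baseline)."""
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data, not taken from the slides.
labels = [False, False, False, True]
predictions = [False, False, True, True]
base = baseline_accuracy(labels)
model = accuracy(predictions, labels)
```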
  
Clustering The Data Set

Cluster 1: Eloquent & wordy
•  Highest word count
•  Highest grade score

Cluster 2: Cheap products & less wordy
•  Lowest price
•  Low word count

Cluster 3: Worse products & shortest reviews
•  Lowest word count
•  Lowest product score

Cluster 4: The ‘average’ group
•  Average in all variables

Cluster 5: Expensive products & least articulate reviews
•  Highest price
•  Low grade score

Figure: Cluster Dendrogram (vertical axis: Height), with branches labeled 15%, 35%, 31%, 14%, and 5%
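
The dendrogram suggests hierarchical clustering, which can be sketched as below. The slides do not state the distance metric, the linkage, or whether the variables were normalized, so Euclidean distance, complete linkage, and z-score normalization are all assumptions here, and the naive loop is only practical for small samples.

```python
import math


def normalize(rows):
    """Scale each column to zero mean and unit variance so no variable dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def agglomerate(points, k):
    """Repeatedly merge the two closest clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy rows of (price, word count); values are made up.
rows = [[1.0, 1.0], [1.1, 1.0], [10.0, 10.0], [10.0, 10.2]]
groups = agglomerate(normalize(rows), k=2)
```

Calling `agglomerate` with `k=5` on the normalized review variables would yield five groups of row indices, analogous to the five clusters described above.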
  
Clustered Data Set Results

Clustering provided us mixed results on our models:

Cluster      Baseline Accuracy   Best Performing Accuracy   Best Performing Methodology
Cluster 1    90.52%              90.52%                     Baseline
Cluster 2    85.24%              86.08%                     Random Forest
Cluster 3    65.31%              76.74%                     Bag of Words & Random Forest
Cluster 4    68.63%              82.24%                     Bag of Words & Cross-Validated CART
Cluster 5    70.31%              84.34%                     Logistic Regression

•  Cluster 1: no improvement through modeling
•  Cluster 5: +14 percentage points improvement (70.31% to 84.34%)

Cluster-then-predict total accuracy = 76.81%
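
The per-cluster gains over baseline follow directly from the table. The short calculation below uses only the accuracies reported on this slide; the variable names are illustrative.

```python
# (baseline accuracy, best performing accuracy) in percent, from the slide.
results = {
    "Cluster 1": (90.52, 90.52),
    "Cluster 2": (85.24, 86.08),
    "Cluster 3": (65.31, 76.74),
    "Cluster 4": (68.63, 82.24),
    "Cluster 5": (70.31, 84.34),
}

# Improvement in percentage points over each cluster's own baseline.
improvement = {
    name: round(best - base, 2) for name, (base, best) in results.items()
}
largest = max(improvement, key=improvement.get)

for name, gain in improvement.items():
    print(f"{name}: +{gain:.2f} points")
```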
  
Bag of Words Text Analytics + CART
Examples on Clustered Set

Figure: two CART trees (yes/no branches, TRUE/FALSE leaves), captioned Cluster 4 and Cluster 5
First tree splits: score >= 3.5; wordcoun >= 58; grade_sc >= 5.4; wordcoun >= 96; epson >= 2.5; might >= 0.5; keep >= 0.5; pretti >= 0.5; wordcoun < 102
Second tree splits: score >= 3.5; wordcoun >= 50; wordcoun >= 124; score >= 2.5; speaker < 1.5; fine >= 0.5; chang < 0.5; window >= 0.5; issu >= 0.5; real >= 0.5
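
Tree splits such as `might >= 0.5` or `epson >= 2.5` read as thresholds on how often a term occurs in a review. The sketch below builds simple term counts and evaluates one split; it assumes plain lowercase tokenization and skips the stemming that terms like `pretti` and `issu` suggest, and the function names are hypothetical.

```python
import re
from collections import Counter


def term_counts(text, vocabulary):
    """Count how often each vocabulary term occurs in the review text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term] for term in vocabulary}


def split_test(features, term, threshold):
    """Evaluate a CART-style split: True when the term count reaches the threshold."""
    return features[term] >= threshold


feats = term_counts("This might work. It might not.", ["might", "epson"])
appears = split_test(feats, "might", 0.5)       # term occurs at least once
three_times = split_test(feats, "epson", 2.5)   # term occurs at least three times
```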
  
Conclusions

Our best performer was Bag of Words + Random Forests on the complete data set:
74.95% (Baseline) vs. 82.08% (BoW + RF)

The cluster-then-predict methodology did not beat modeling the entire set:
74.95% (Baseline) vs. 76.81% (Cluster-then-Predict)

However, clustering gave us other interesting results:
•  Clusters 1, 2, 4, and 5 beat even the best models we developed on the entire data set
•  Cluster 1 had such a high baseline (90.52%) that no model is needed
•  Cluster 5 had a +14 percentage point improvement, higher than any other model

Amazon can predict the helpfulness of reviews at the moment they are posted with reasonable accuracy using a 2-step model: (1) cluster, (2) predict by cluster. By applying such analytics, they can potentially flag unhelpful reviews at time of posting and help develop a better decision-making experience for customers.
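
A 2-step model of this kind could be wired together as sketched below. The slides do not say how a newly posted review is assigned to a cluster, so nearest-centroid assignment is an assumption, and the centroids and per-cluster predictors here are hypothetical stand-ins rather than the fitted models.

```python
import math


def nearest_cluster(features, centroids):
    """Step 1: pick the cluster whose centroid is closest to the review's features."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def cluster_then_predict(features, centroids, models):
    """Step 2: apply the predictor fitted for the assigned cluster."""
    cluster = nearest_cluster(features, centroids)
    return cluster, models[cluster](features)


# Hypothetical centroids over (word count, grade score).
centroids = {"wordy": (300.0, 11.0), "short": (40.0, 5.0)}
# Hypothetical per-cluster rules standing in for fitted models.
models = {
    "wordy": lambda f: True,          # a high-baseline cluster: always predict helpful
    "short": lambda f: f[0] >= 58,    # a single word-count threshold
}

cluster, helpful = cluster_then_predict((45.0, 6.0), centroids, models)
```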
  
