Predicting Helpfulness of User-Generated Product Reviews Through Analytical Models

Customer reviews are an important feature across Amazon's vast array of products. Many customers rely heavily on the honest reviews of past users when making purchasing decisions. Currently, the only way to regulate the quality of these reviews is for other users to voluntarily vote a review 'helpful' or 'not helpful'. It is in the best interest of Amazon (and potential customers) for the most helpful reviews to be shown first and for useless reviews to be de-prioritized (or flagged). We therefore set out to build a model that could predict whether or not customers would find a product review helpful. With such a model, Amazon could better prioritize the user reviews displayed on product pages from the moment a review is posted.

Published in: Data & Analytics


  1. Predicting Helpfulness of Amazon's User-Generated Product Reviews. Ankita Kaul & Nicholas Baladis, MIT Sloan, Spring 2015.
  2. Project Motivation. Amazon prioritizes product reviews that customers deem 'helpful', but only after customers have voluntarily voted them so. [Screenshot of a product review, annotated: "Customers can voluntarily vote here"]
  3. What if… Amazon could predict which reviews are helpful the moment they are posted? [Mock-up of a review annotated with "Product Rating" and "Helpfulness score"]
  4. Data Galore*. Our data consisted of Amazon user-generated product reviews, spanning all product categories over 18 years. Each 'observation' is one customer's review.
     Data structure: Reviewer ID, Helpfulness Rating, Product ID, Product Price, Timestamp of review, Review Prose, Score.
     We had to downsize: ~35M reviews across all categories → ~1.2M in the Electronics category → ~18K reviews with >10 votes. A parsing sketch follows below.
     *Data procured from Stanford University: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
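A minimal sketch of this downsizing step may help make the funnel concrete. It assumes the plain-text "key: value" record layout of the McAuley/Leskovec SNAP release cited in the footnote; the file name Electronics.txt.gz and the exact field names (e.g. review/helpfulness) are assumptions from that dataset and may differ in other versions.

```python
# Minimal sketch, assuming the SNAP "key: value" record layout of the
# McAuley/Leskovec data (field names and file name are assumptions).
import gzip

def parse_reviews(path):
    """Yield one dict per review; records are separated by blank lines."""
    record = {}
    with gzip.open(path, "rt", encoding="latin-1") as f:
        for line in f:
            line = line.strip()
            if not line:                 # a blank line closes the current record
                if record:
                    yield record
                record = {}
            elif ": " in line:
                key, value = line.split(": ", 1)
                record[key] = value
    if record:
        yield record

# Downsize: keep only reviews with more than 10 helpfulness votes
kept = [r for r in parse_reviews("Electronics.txt.gz")
        if int(r["review/helpfulness"].split("/")[1]) > 10]
print(len(kept))  # the slide reports ~18K reviews survive this filter
```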
  5. Analysis Approach.
     The Setup. Dependent variable: is a review helpful or not? A binary variable; 'Yes' if >75% of voters agree. Independent variables, pre-existing in the data set: Product Price, Overall product rating. Newly calculated: word count of review prose, readability grade-level score via the Flesch-Kincaid method (formula and feature sketch below).
     The Methodology. On the unclustered data set: Linear Regression, Logistic Regression, CART, Cross-Validated CART, Random Forest, Bag of Words. On the clustered data set: Logistic Regression, CART, Cross-Validated CART, Random Forest, Bag of Words.
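The Flesch-Kincaid formula itself was lost with the slide graphic; the standard grade-level formula, which the slide presumably showed, is:

\[ \text{grade level} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

Continuing from the parsing sketch above, here is a hedged sketch of the variable setup. The textstat package is a stand-in for the readability computation (the slides do not say what tooling the authors used), and make_observation is a hypothetical helper:

```python
import textstat  # assumed stand-in for Flesch-Kincaid scoring: pip install textstat

def to_float(s):
    """Coerce to float; the SNAP price field can be missing or non-numeric."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return float("nan")

def make_observation(r):
    helpful, total = map(int, r["review/helpfulness"].split("/"))
    text = r["review/text"]
    return {
        "helpful": int(helpful / total > 0.75),   # 'Yes' if >75% of voters agree
        "price": to_float(r.get("product/price")),
        "score": to_float(r["review/score"]),
        "wordcount": len(text.split()),
        "grade_score": textstat.flesch_kincaid_grade(text),
    }

obs = [make_observation(r) for r in kept]  # 'kept' from the parsing sketch
```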
  6. Predictions on the Unclustered Data Set. Our predictive models look promising:

     Methodology            Accuracy
     Baseline               74.95%
     Linear Regression      R² = 0.273
     Logistic Regression    81.44%
     CART                   80.88%
     Cross-V CART           81.84%
     Random Forest          81.94%
     BoW & Logistic Reg     81.08%
     BoW & CART             79.80%
     BoW & Cross-V CART     78.16%
     BoW & Random Forest    82.08%

     [BoW & CART tree, splitting on: score >= 2.5, price < 210, work >= 0.5, score >= 1.5, price < 30]
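A sketch of how this model sweep could look, using scikit-learn as a stand-in for the authors' tooling (the rpart-style trees on the slides suggest the original work was in R), and continuing from the obs list built above:

```python
# Sketch of the unclustered model sweep; scikit-learn is an assumed stand-in.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.DataFrame(obs).dropna()
X, y = df[["price", "score", "wordcount", "grade_score"]], df["helpful"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Baseline: always predict the majority class
print(f"Baseline: {y_test.value_counts(normalize=True).max():.2%}")

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(min_samples_leaf=25),
    # Cross-validated CART: tune tree complexity by 10-fold CV
    "Cross-V CART": GridSearchCV(DecisionTreeClassifier(),
                                 {"ccp_alpha": [0.0, 0.001, 0.005, 0.01]}, cv=10),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.2%}")
```

The BoW variants would append stemmed term counts (e.g. via sklearn's CountVectorizer with a stemming tokenizer) to these four features before fitting the same models.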
  7. Clustering the Data Set.
     Cluster 1 - Eloquent & wordy: highest word count, highest grade score.
     Cluster 2 - Cheap products & less wordy: lowest price, low word count.
     Cluster 3 - Worse products & shortest reviews: lowest word count, lowest product score.
     Cluster 4 - The 'average' group: average in all variables.
     Cluster 5 - Expensive products & least articulate reviews: highest price, low grade score.
     Cluster sizes: 15%, 35%, 31%, 14%, 5%.
     [Cluster dendrogram, with merge height on the vertical axis]
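The dendrogram points to hierarchical clustering. A sketch continuing from above, assuming Ward linkage on standardized features and a five-cluster cut (the slide names neither the linkage method nor the distance):

```python
# Sketch of the clustering step; Ward linkage on scaled features is an assumption.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler

Z = linkage(StandardScaler().fit_transform(X), method="ward")
cluster_id = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into 5 clusters
```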
  8. Clustered Data Set Results. Clustering gave us mixed results on our models:

     Cluster     Baseline Accuracy   Best Accuracy   Best Performing Methodology
     Cluster 1   90.52%              90.52%          Baseline (no improvement through modeling)
     Cluster 2   85.24%              86.08%          Random Forest
     Cluster 3   65.31%              76.74%          Bag of Words & Random Forest
     Cluster 4   68.63%              82.24%          Bag of Words & Cross-Validated CART
     Cluster 5   70.31%              84.34%          Logistic Regression (+14% improvement)

     Cluster-then-predict total accuracy = 76.81%
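A sketch of the cluster-then-predict scoring, fitting one model per cluster and pooling the held-out predictions into a single overall accuracy. The slide picks a different best method per cluster; a single random forest is used uniformly here for brevity:

```python
# Cluster-then-predict sketch: one model per cluster, pooled test accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

y_true, y_pred = [], []
for c in np.unique(cluster_id):
    mask = cluster_id == c                     # rows belonging to this cluster
    Xtr, Xte, ytr, yte = train_test_split(
        X[mask], y[mask], test_size=0.3, random_state=42)
    clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(Xtr, ytr)
    y_true.extend(yte)
    y_pred.extend(clf.predict(Xte))
print(f"cluster-then-predict accuracy: {accuracy_score(y_true, y_pred):.2%}")
```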
  9. Bag of Words Text Analytics + CART: Examples on the Clustered Set.
     [Cluster 4 tree: splits on score >= 3.5, wordcoun >= 58, grade_sc >= 5.4, wordcoun >= 96, epson >= 2.5, might >= 0.5, keep >= 0.5, pretti >= 0.5, wordcoun < 102]
     [Cluster 5 tree: splits on score >= 3.5, wordcoun >= 50, wordcoun >= 124, score >= 2.5, speaker < 1.5, fine >= 0.5, chang < 0.5, window >= 0.5, issu >= 0.5, real >= 0.5]
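Readouts like these can be reproduced from any fitted tree. In the sklearn stand-in used above, export_text prints the split rules directly (the slide's trees also split on bag-of-words term counts, which are omitted from the feature matrix here):

```python
# Print a fitted CART's split rules, a rough analogue of the slide's tree plots.
from sklearn.tree import DecisionTreeClassifier, export_text

cart = DecisionTreeClassifier(max_depth=4, min_samples_leaf=25).fit(X_train, y_train)
print(export_text(cart, feature_names=list(X_train.columns)))
```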
  10. Conclusions
  11. Conclusions:
     • Our best performer was Bag of Words + Random Forests on the complete data set: 74.95% (Baseline) → 82.08% (BoW + RF).
     • The cluster-then-predict methodology did not beat modeling the entire set: 74.95% (Baseline) → 76.81% (Cluster-then-Predict).
     • However, clustering gave us other interesting results:
       • Clusters 1, 2, 4, and 5 beat even the best models we developed on the entire data set.
       • Cluster 1 had such a high baseline (90.52%) that no model is needed.
       • Cluster 5 had a +14% improvement, higher than any other model.
     Amazon can predict the helpfulness of reviews at the moment they are posted with reasonable accuracy using a two-step model ((1) cluster, (2) predict by cluster). By applying such analytics, it can potentially flag unhelpful reviews at time of posting and help develop a better decision-making experience for customers.
