ML Classifies Book Reviews as Helpful or Not


krk269

Insight Demo


GoodReviews
Provide the most helpful book reviews
Krishna Karthik
Insight Data Science

GoodReviews: Motivation
• Readers often rely on book reviews by other users to choose a book
• Currently, reviews on Amazon are rated by users
• Potentially helpful reviews can go unnoticed if unrated
• Let a machine classify reviews as helpful
goodreviews.co
Figure: excerpt of a helpful unrated review

Data
• Text file with book reviews from Amazon (unstructured data)
• Figure: book reviews as they appear on Amazon
• Response used in the modeling: fraction of users who rated the review that found it useful
Source: J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
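A minimal sketch of turning the raw text file into (review, response) pairs. It assumes the `key: value` record layout of the public SNAP release of this dataset, with a `review/helpfulness: x/y` field and a blank line between records. The field names, the `min_votes` parameter (set to the 20-rating threshold listed under "Details of data and algorithm") and the function names are illustrative assumptions, and the sample records are made up.

```python
from typing import Dict, Iterator, List, Optional

def parse_records(lines: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per review from 'key: value' lines; a blank line ends a record."""
    record: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            if record:
                yield record
                record = {}
            continue
        key, _, value = line.rstrip("\n").partition(": ")
        record[key] = value
    if record:  # the last record may not be followed by a blank line
        yield record

def helpfulness(record: Dict[str, str], min_votes: int = 20) -> Optional[float]:
    """Fraction of raters who found the review helpful, or None when the review
    has fewer than min_votes ratings. Assumes an 'x/y' helpfulness field."""
    helpful, _, total = record.get("review/helpfulness", "0/0").partition("/")
    helpful_votes, total_votes = int(helpful), int(total)
    if total_votes < min_votes:
        return None
    return helpful_votes / total_votes

# Made-up records in the assumed layout.
sample = """product/productId: B000TEST01
review/helpfulness: 18/24
review/text: A thoughtful look at the plot and characters.

product/productId: B000TEST02
review/helpfulness: 3/5
review/text: Loved it!!!
""".splitlines()

rows = []
for rec in parse_records(sample):
    frac = helpfulness(rec)
    if frac is not None:  # keep only reviews rated at least min_votes times
        rows.append((rec["review/text"], frac))
print(rows)  # [('A thoughtful look at the plot and characters.', 0.75)]
```

The second sample review is dropped because it has only 5 ratings.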

Three-class classification
One vs Rest with Random Forest classifier
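One vs Rest turns the three-class problem into three binary problems (each class against the other two) and predicts the class whose binary model gives the highest score. The dependency-free sketch below shows only that decomposition. The deck used a Random Forest as each binary model; here it is swapped for a hypothetical `CentroidScorer` purely to keep the example self-contained, and the toy two-feature data is invented.

```python
import math
from typing import Callable, Dict, List, Sequence

class CentroidScorer:
    """Hypothetical stand-in for a binary Random Forest: scores a point by how much
    closer it is to the positive-class centroid than to the negative-class centroid."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "CentroidScorer":
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def score(self, x: Sequence[float]) -> float:
        return math.dist(x, self.neg) - math.dist(x, self.pos)

class OneVsRest:
    """Train one binary model per class (class k vs. all others); predict the arg-max score."""

    def __init__(self, make_model: Callable[[], CentroidScorer]):
        self.make_model = make_model

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "OneVsRest":
        self.models: Dict[str, CentroidScorer] = {}
        for cls in sorted(set(y)):
            binary = [1 if label == cls else 0 for label in y]
            self.models[cls] = self.make_model().fit(X, binary)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        return [max(self.models, key=lambda c: self.models[c].score(x)) for x in X]

# Invented features per review: [words / 100, some second feature].
X = [[0.2, 0.9], [0.3, 0.8], [1.0, 0.4], [1.1, 0.5], [2.0, 0.2], [2.2, 0.1]]
y = ["not_helpful", "not_helpful", "middle", "middle", "helpful", "helpful"]
model = OneVsRest(CentroidScorer).fit(X, y)
print(model.predict([[0.25, 0.85], [2.1, 0.15]]))  # ['not_helpful', 'helpful']
```

Swapping the stand-in for a real Random Forest only changes what `make_model` returns; the per-class training and arg-max prediction stay the same.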
Features (NLP)
• No. of adverbs, adjectives … in review and summary
• References to genre
• No. of words, sentences
• Subjectivity, polarity and lexical diversity
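The slides extract these features with NLTK and TextBlob (for example, TextBlob's `sentiment` property returns polarity and subjectivity). The standard-library sketch below covers only the count-based features, using a hypothetical regex tokenizer, and assumes lexical diversity means the type-token ratio (distinct words divided by total words), which the slides do not define.

```python
import re
from typing import Dict

WORD_RE = re.compile(r"[A-Za-z']+")       # hypothetical tokenizer: runs of letters/apostrophes
SENTENCE_END_RE = re.compile(r"[.!?]+")  # crude sentence boundary

def surface_features(text: str) -> Dict[str, float]:
    """Word count, sentence count and type-token ratio for one review."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    sentences = [s for s in SENTENCE_END_RE.split(text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
    }

feats = surface_features(
    "The plot is slow. The characters, however, are vivid and the prose is sharp!"
)
print(feats)  # 14 words, 2 sentences, 11 distinct words -> diversity 11/14
```

One caveat on this definition: the type-token ratio tends to fall as a text gets longer, so the lower lexical diversity of helpful reviews may partly reflect their greater length.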
• Precision and recall used as success metrics
• ~70% average precision and recall in the test and validation sets
• ~75% precision and recall for the most helpful reviews
• ROC AUC ~0.88
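Precision and recall can be computed per class by treating each class as the positive one in turn. This sketch assumes "average precision and recall" means the unweighted (macro) mean over the three classes; the slides do not say which averaging was used. The labels and predictions are made up.

```python
from typing import Dict, Sequence, Tuple

def per_class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Precision = TP / (TP + FP) and recall = TP / (TP + FN), one class at a time."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[cls] = (precision, recall)
    return scores

def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean of per-class precision and recall."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n, sum(r for _, r in scores.values()) / n)

# Made-up labels and predictions for six reviews.
y_true = ["helpful", "helpful", "middle", "middle", "not_helpful", "not_helpful"]
y_pred = ["helpful", "middle", "middle", "middle", "not_helpful", "helpful"]
scores = per_class_precision_recall(y_true, y_pred)
macro = macro_average(scores)
print(scores["helpful"], macro)
```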
Figure: fractional helpfulness axis split at 20% and 80% into Not helpful, Middle and Helpful classes
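A sketch of how the fractional helpfulness could be binned into the three classes using the 20% and 80% cut points from the figure. Whether a review at exactly 20% or 80% falls in the middle class is not stated, so the boundary handling and the label strings are assumptions.

```python
def helpfulness_class(fraction: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map the fraction of 'helpful' votes to one of three classes.
    Cut points come from the slide figure; ties at a cut point go to 'middle' (assumption)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if fraction < low:
        return "not_helpful"
    if fraction <= high:
        return "middle"
    return "helpful"

print([helpfulness_class(f) for f in (0.1, 0.2, 0.5, 0.8, 0.95)])
# ['not_helpful', 'middle', 'middle', 'middle', 'helpful']
```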

Helpful reviews have more neutral sentiment
• Helpful reviews have fewer extremes of sentiment (lower polarity variance)

Review polarity    Mean    Variance
Helpful            0.17    0.12
Not helpful        0.05    0.25
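Per-class summaries like the polarity table can be produced by grouping one feature by class label. A sketch with invented polarity values (TextBlob polarity lies in [-1, 1]); the slides do not say whether population or sample variance was reported, so `statistics.pvariance` is an assumption. The same helper would apply to the lexical diversity figures.

```python
import statistics
from typing import Dict, Iterable, List, Tuple

def group_stats(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and population variance of one feature (e.g. polarity) per class label."""
    groups: Dict[str, List[float]] = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return {
        label: (statistics.fmean(vals), statistics.pvariance(vals))
        for label, vals in groups.items()
    }

# Invented polarity scores, not the deck's data.
stats = group_stats([
    ("helpful", 0.1), ("helpful", 0.3),
    ("not_helpful", -0.5), ("not_helpful", 0.7),
])
print(stats)  # helpful: mean ~0.2, variance ~0.01; not_helpful: mean ~0.1, variance ~0.36
```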

Helpful reviews are longer and have fewer unique words
• Helpful reviews are longer on average (figure: number of words in the review)
• Helpful reviews are less lexically diverse (figure: lexical diversity)

Lexical diversity  Mean    Variance
Helpful            0.61    0.1
Not helpful        0.73    0.13

About me
Krishna Karthik
Experimental particle physics
New York University and CERN
Reading, hiking and soccer

Details of data and algorithm
• Features extracted using NLTK and TextBlob
• Trained on ~60,000 reviews evenly split between the 3 classes
• In reality, the dataset is distributed as 3:3:1 (Bad:Middle:Good, i.e. Not helpful:Middle:Helpful)
• Only reviews rated at least 20 times were used
• Test and validation sets of 30,000 reviews each
• One vs Rest using Random Forest classifier
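The training set is balanced even though the full dataset is 3:3:1. One way to get an even split is to undersample the two larger classes down to the size of the smallest, sketched below; the slides do not say how the balancing was done, so the method, the seed and the function name are assumptions. For ~60,000 training reviews this would mean ~20,000 per class.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

def balanced_sample(labels: Sequence[str], per_class: int, seed: int = 0) -> List[int]:
    """Return shuffled indices with the same number of examples from every class
    (undersampling the larger classes)."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    smallest = min(len(idx) for idx in by_class.values())
    if per_class > smallest:
        raise ValueError(f"only {smallest} examples in the smallest class")
    rng = random.Random(seed)
    chosen: List[int] = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], per_class))
    rng.shuffle(chosen)
    return chosen

# Toy labels in the 3:3:1 ratio quoted on the slide.
labels = ["not_helpful"] * 300 + ["middle"] * 300 + ["helpful"] * 100
idx = balanced_sample(labels, per_class=100)
counts = {c: sum(labels[i] == c for i in idx) for c in set(labels)}
print(counts)
```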

Feature importance
Figure: feature importances; labeled features include book rating, number of words, polarity and lexical diversity

Test set plots
Figures: test set ROC curve; test set precision-recall curve
• ~4% of helpful reviews misclassified as not helpful
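With One vs Rest, each class gets its own ROC curve (that class against the rest, scored by its binary model). The AUC quoted on the model slide can be computed without plotting, as the fraction of positive/negative pairs the scores rank correctly. This is a quadratic pairwise sketch on made-up scores; for 30,000 test reviews a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice. The slides do not say how the per-class AUCs were combined into ~0.88.

```python
from typing import Sequence

def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive scores higher than a randomly chosen negative (ties count one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# "helpful vs. rest": 1 = helpful, 0 = any other class; made-up scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```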

Validation set plots
Figures: two validation set panels
• ~4% of helpful reviews misclassified as not helpful
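The "~4% of helpful reviews misclassified as not helpful" figure corresponds to one off-diagonal cell of the confusion matrix divided by the number of truly helpful reviews (assuming row normalization, which the slides do not state). The toy labels below are made up to give 1 in 25.

```python
from collections import Counter
from typing import Sequence

def confusion_rate(y_true: Sequence[str], y_pred: Sequence[str], actual: str, predicted: str) -> float:
    """Fraction of reviews whose true class is `actual` that were predicted as `predicted`."""
    pairs = Counter(zip(y_true, y_pred))
    n_actual = sum(count for (t, _), count in pairs.items() if t == actual)
    return pairs[(actual, predicted)] / n_actual if n_actual else 0.0

y_true = ["helpful"] * 25 + ["not_helpful"] * 25
y_pred = ["helpful"] * 20 + ["middle"] * 4 + ["not_helpful"] * 1 + ["not_helpful"] * 25
print(confusion_rate(y_true, y_pred, "helpful", "not_helpful"))  # 0.04
```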
