GoodReviews
Provide the most helpful book reviews
Krishna Karthik
Insight Data Science
GoodReviews: Motivation
• Readers often rely on book reviews by other users to choose a book
• Currently, reviews on Amazon are rated by users
• Potentially helpful reviews can go unnoticed if unrated
• Let a machine classify reviews as helpful
goodreviews.co
[Figure: excerpt of a helpful unrated review]
Data
• Text file with book reviews from Amazon
• Unstructured data
[Figure: book reviews as they appear on Amazon]
Response used in the modeling: fraction of users who rated the review that found it to be useful
J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
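As a minimal sketch of the response variable, the snippet below turns a vote string into a fractional helpfulness. The "helpful/total" string format (for example "7/9") is an assumption about how the votes are stored in the text file, and the function name and the `min_ratings` parameter are hypothetical.

```python
from typing import Optional


def helpfulness_fraction(votes: str, min_ratings: int = 1) -> Optional[float]:
    """Return helpful_votes / total_votes for a string such as "7/9".

    The "helpful/total" format is an assumption, not taken from the slides.
    Returns None when the review has fewer than `min_ratings` ratings,
    because the fraction is undefined or too noisy in that case.
    """
    helpful_str, total_str = votes.strip().split("/")
    helpful, total = int(helpful_str), int(total_str)
    if total == 0 or total < min_ratings:
        return None
    return helpful / total


# Example: a review that 15 of 20 raters found useful.
example = helpfulness_fraction("15/20", min_ratings=20)
```

Passing `min_ratings=20` would mirror the cut on the number of ratings mentioned in the backup slides; unrated reviews simply yield `None` and are the ones a trained classifier would later score.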
Three-class classification
One-vs-Rest with Random Forest classifier
Features (NLP)
• No. of adverbs, adjectives, … in review and summary
• References to genre
• No. of words, sentences
• Subjectivity, polarity and lexical diversity
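A few of these features can be sketched with the standard library alone. This is only a stand-in: the backup slides state that the features were extracted with NLTK and TextBlob, so real counts would differ, and part-of-speech counts, subjectivity and polarity are not covered here. Defining lexical diversity as unique words divided by total words is an assumption, and `basic_text_features` is a hypothetical name.

```python
import re
from typing import Dict


def basic_text_features(text: str) -> Dict[str, float]:
    """Compute word count, sentence count and lexical diversity.

    Tokenization is a simple regex (an assumption), not the NLTK or
    TextBlob tokenizers named in the slides.
    """
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        # Assumed definition: unique words / total words.
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
    }


features = basic_text_features("Great book. Great plot!")
```

Under this definition a longer review tends to repeat words and so scores lower, which is consistent with helpful reviews being both longer and less lexically diverse.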
• Precision and Recall used as success metrics
• ~70% average precision and recall in the test and validation sets
• ~75% precision and recall for the most helpful reviews
• AUC ~ 0.88
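For reference, per-class precision and recall and their unweighted average can be computed as in the sketch below. Whether the "average" quoted above is an unweighted (macro) average is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} for every label seen."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = correct[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = correct[label] / true_counts[label] if true_counts[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Unweighted mean of precision and of recall over all classes."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Toy labels for illustration only; not data from the project.
y_true = ["helpful", "helpful", "middle", "not helpful"]
y_pred = ["helpful", "middle", "middle", "not helpful"]
scores = per_class_precision_recall(y_true, y_pred)
avg_precision, avg_recall = macro_average(scores)
```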
[Figure: fractional helpfulness axis divided into three classes (Not helpful, Middle, Helpful), with marks at 20% and 80%]
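The three classes can be derived from fractional helpfulness as sketched below, assuming the 20% and 80% marks are the class boundaries. Which class a value exactly on a boundary falls into is not stated in the slides; assigning it to the middle class is an assumption, as is the function name.

```python
def helpfulness_class(fraction: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a fractional helpfulness in [0, 1] to one of three classes.

    Boundary values (exactly `low` or `high`) go to "middle"; this
    tie-breaking choice is an assumption.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction < low:
        return "not helpful"
    if fraction > high:
        return "helpful"
    return "middle"


labels = [helpfulness_class(f) for f in (0.1, 0.5, 0.9)]
```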
Helpful reviews have more neutral sentiment
• Helpful reviews have no extremes of sentiment

Review polarity
              Mean   Variance
Helpful       0.17   0.12
Not helpful   0.05   0.25
Helpful reviews are longer and have fewer unique words
• Helpful reviews are longer on average
• Helpful reviews are less lexically diverse
[Figure: number of words in the review]

Lexical diversity
              Mean   Variance
Helpful       0.61   0.1
Not helpful   0.73   0.13
About me
Krishna Karthik
Experimental particle physics
New York University and CERN
Reading, hiking and soccer
Backup
Details of data and algorithm
• Features extracted using NLTK and TextBlob
• Trained on ~60,000 reviews evenly split between the 3 classes
• In reality, the dataset is distributed as 3:3:1 (Bad:Middle:Good)
• Only reviews rated at least 20 times are used
• Test and validation sets consisting of 30,000 reviews each
• One vs Rest using Random Forest classifier
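The training setup above can be sketched with scikit-learn as follows. Everything concrete here is an assumption made for illustration: the data are synthetic placeholders with four numeric features, the classes are encoded as 0 (Bad), 1 (Middle) and 2 (Good), balancing is done by downsampling to the smallest class, and the forest size is arbitrary. The slides do not state how the even split was obtained or which hyperparameters were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier


def make_synthetic_reviews(n_unit, rng):
    """Placeholder data: 4 numeric features, classes in a 3:3:1 ratio."""
    sizes = {0: 3 * n_unit, 1: 3 * n_unit, 2: n_unit}
    X_parts, y_parts = [], []
    for label, size in sizes.items():
        # Shift each class mean so the classes are partly separable.
        X_parts.append(rng.normal(loc=float(label), scale=1.0, size=(size, 4)))
        y_parts.append(np.full(size, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def balance_classes(X, y, rng):
    """Downsample every class to the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


rng = np.random.default_rng(0)
X, y = make_synthetic_reviews(200, rng)      # 600 : 600 : 200 samples
X_bal, y_bal = balance_classes(X, y, rng)    # 200 samples per class

# One binary random forest per class, combined one-vs-rest.
model = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0)
)
model.fit(X_bal, y_bal)
predictions = model.predict(X)
```

Training on the balanced sample keeps the rarer Good class from being swamped by the other two; evaluation should still be done on data with the natural 3:3:1 distribution.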
Feature importance
[Figure: feature importance plot; labeled features include book rating, number of words, polarity, and lexical diversity]
Test set plots
[Figures: test set ROC curve and test set precision-recall curve]
• ~4% helpful misclassified as not helpful
Validation set plots
[Figures: two validation set plots]
• ~4% helpful misclassified as not helpful
