What factors make a Young Adult author
more likely to be successful?
Yifan Wang
Table of Contents
• Research Question & Available Data: What factors make a Young Adult author more likely to be successful?
• Exploratory Data Analysis: Explore both categorical and numerical variables.
• Building Predictive Models: Build multiple models and stack them together to increase the overall performance.
• Model Explanation: Understand what features contribute to the success of young adult authors.
Key Findings
• Young adult books within an optimal length range (between 290 and 485 pages) are more likely to be successful;
• Books whose titles carry strong sentiment are more likely to be successful;
• The publisher matters: certain publishers contribute to the success of young adult authors.
Research Question and Available Data

Book Title | Author Name | Star Rating | Number of Reviews | Length (pages) | Publisher
Mistrust | Margaret McHeyzer | 4.5 | 64 | 333 | Amazon
Girl in Pieces | Kathleen Glasgow | 4.5 | 139 | 418 | Delacorte
Just Juliet | Charlotte Reagan | 4.5 | 369 | 224 | Inkitt
Dork in Love ~ Tales of My Dorky Love Life: Teen Romance | Ann Writes | 4.5 | 9 | 122 | Amazon
Warrior Cats: Battle (Warrior Cats (Werecat YA Paranormal) Book 4) | Tiyana Angel | 5 | 1 | 52 | Guardian Angel Press
Out of Beat (Boys of Fallout Book 1) | Cassandra Giovanni | 4.5 | 11 | 231 | Show not Tell Publishing
With the Book Title, we can further extract
sentiment information to provide additional
features for this analysis.
From the Author Name, we can extract the first name, infer
gender information from it, and add that to the features.
In this analysis, we define success as a star rating
of 4.5 or above together with at least 100 reviews.
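The success definition above can be written as a small labeling function. This is a minimal sketch, not the author's code: the `Book` class and the names `is_successful`, `MIN_RATING`, and `MIN_REVIEWS` are assumptions introduced for illustration, and the rows are copied from the sample table in this deck.

```python
from dataclasses import dataclass

# Thresholds taken from the success definition in the deck.
MIN_RATING = 4.5
MIN_REVIEWS = 100


@dataclass
class Book:
    """Hypothetical record type holding the fields needed for the label."""
    title: str
    star_rating: float
    num_reviews: int


def is_successful(book: Book) -> bool:
    """Return True only when both success criteria are met."""
    return book.star_rating >= MIN_RATING and book.num_reviews >= MIN_REVIEWS


# Rows copied from the sample table shown in the deck.
sample = [
    Book("Mistrust", 4.5, 64),
    Book("Girl in Pieces", 4.5, 139),
    Book("Just Juliet", 4.5, 369),
    Book("Out of Beat (Boys of Fallout Book 1)", 4.5, 11),
]

labels = {book.title: is_successful(book) for book in sample}
```

Under this definition a book with a high rating but few reviews (for example "Mistrust", with 64 reviews) is not labeled successful, because both conditions must hold.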
Title sentiment features (excerpt):

Anger | Anticipation | Disgust | Fear | Joy
0 | 0 | 1 | 1 | 0
0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 0 | 0
0 | 1 | 0 | 1 | 1
1 | 0 | 0 | 1 | 0
0 | 0 | 0 | 0 | 0
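One way such 0/1 sentiment columns could be produced is by looking up each title word in a word-emotion lexicon. The sketch below only illustrates that idea: the deck does not say which lexicon or tool was used, and `TOY_LEXICON` is a hypothetical, hand-made lexicon, not real data.

```python
import re

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy"]

# Hypothetical toy lexicon for illustration only. A real analysis would
# load a published word-emotion lexicon instead of these made-up entries.
TOY_LEXICON = {
    "mistrust": {"disgust", "fear"},
    "love": {"joy"},
    "battle": {"anger", "fear"},
}


def title_sentiment_flags(title, lexicon=TOY_LEXICON):
    """Return one 0/1 flag per emotion, set when any title word carries it."""
    words = re.findall(r"[a-z']+", title.lower())
    found = set()
    for word in words:
        found |= lexicon.get(word, set())
    return {emotion: int(emotion in found) for emotion in EMOTIONS}


flags = title_sentiment_flags("Mistrust")
```

A title with no lexicon matches yields all zeros, which is the same shape as the all-zero rows in the excerpt.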
First-name gender lookup (excerpt):

Name | Male (proportion) | Female (proportion) | Gender | Year_Min | Year_Max
Abigail | 0.0019 | 0.9981 | female | 1920 | 2010
Addison | 0.1359 | 0.8641 | female | 1920 | 2010
Alexandra | 0.0039 | 0.9961 | female | 1920 | 2010
Amy | 0.0026 | 0.9974 | female | 1920 | 2010
Amy | 0.0026 | 0.9974 | female | 1920 | 2010
Ann | 0.0029 | 0.9971 | female | 1920 | 2010
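The first-name lookup can be sketched as a dictionary keyed by name. This is an illustrative sketch under stated assumptions: `NAME_TABLE` holds only the rows from the excerpt, the helper names are hypothetical, and returning "unknown" for names outside the table is a choice made here, not something the deck specifies.

```python
# Rows copied from the name table excerpt: name -> (male share, female share).
NAME_TABLE = {
    "abigail": (0.0019, 0.9981),
    "addison": (0.1359, 0.8641),
    "alexandra": (0.0039, 0.9961),
    "amy": (0.0026, 0.9974),
    "ann": (0.0029, 0.9971),
}


def first_name(author_name):
    """Take the first whitespace-separated token as the first name."""
    parts = author_name.strip().split()
    return parts[0].lower() if parts else ""


def infer_gender(author_name, table=NAME_TABLE):
    """Return the more likely gender, or 'unknown' if the name is not listed."""
    shares = table.get(first_name(author_name))
    if shares is None:
        return "unknown"
    male, female = shares
    return "female" if female >= male else "male"


gender = infer_gender("Ann Writes")
```

Pen names and initials will not resolve reliably with this approach, so an explicit "unknown" category keeps those cases from being forced into either class.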
Exploratory Data Analysis: Categorical Features
[Figures: Success by Book Title Sentiment Type; Success by Sentiments; Success by Publishers]
Exploratory Data Analysis: Numerical Features
[Figures: Success by Length; Success by Star Rating; Success by Number of Reviews]
Building Predictive Models: Preprocess Data

Imbalanced Data and SMOTE (Synthetic Minority Over-Sampling Technique)

Data Set | No | Yes | Total
All Available Data | 73 (73%) | 27 (27%) | 100
Training Data Set | 54 (72%) | 21 (28%) | 75
Training Data Set (with SMOTE) | 42 (50%) | 42 (50%) | 84
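The core idea of SMOTE is to create synthetic minority cases by interpolating between a minority case and one of its nearest minority neighbours. The sketch below is a simplified variant for illustration, not the implementation used in the deck: the function name, the parameters, and the example points are assumptions. The counts above are consistent with adding 21 synthetic "Yes" cases and keeping 42 of the 54 "No" cases, although the deck does not state the exact settings.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Simplified variant: the base point is drawn at random for each new
    sample. Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points (two numeric features), not from the deck.
minority = [(300.0, 1.0), (350.0, 0.0), (420.0, 2.0), (480.0, 1.0)]
new_points = smote(minority, n_synthetic=4, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new cases stay inside the region the minority class already occupies.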
Building Predictive Models: Training Models

Sub-Models Results

Accuracy | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max.
c5.0 | 0.6470588 | 0.7500000 | 0.7647059 | 0.7552288 | 0.7745098 | 0.8333333
gbm | 0.5882353 | 0.7169118 | 0.7638889 | 0.7504085 | 0.8038194 | 0.8823529
treebag | 0.6470588 | 0.6571691 | 0.7500000 | 0.7428105 | 0.8005515 | 0.8888889
rf | 0.6470588 | 0.7169118 | 0.7712418 | 0.7559232 | 0.8038194 | 0.8235294
knn | 0.5294118 | 0.6571691 | 0.7326389 | 0.7320670 | 0.8207721 | 0.8750000
SVMRadial | 0.5882353 | 0.6305147 | 0.6875000 | 0.6841912 | 0.7181373 | 0.8125000

Kappa | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max.
c5.0 | 0.31081081 | 0.5000000 | 0.5310122 | 0.5129304 | 0.5502283 | 0.6666667
gbm | 0.17931034 | 0.4291958 | 0.5277778 | 0.5004806 | 0.6076389 | 0.7671233
treebag | 0.30136986 | 0.3268581 | 0.5000000 | 0.4895214 | 0.6006944 | 0.7777778
rf | 0.30136986 | 0.4353448 | 0.5416667 | 0.5122408 | 0.6076389 | 0.6433566
knn | 0.04225352 | 0.3125000 | 0.4652778 | 0.4621085 | 0.6424569 | 0.7500000
SVMRadial | 0.17931034 | 0.2604167 | 0.3750000 | 0.3715983 | 0.4407159 | 0.6250000

Sub-Models Correlation

Correlation | c5.0 | gbm | treebag | rf | knn | SVMRadial
c5.0 | 1.0000000 | 0.5614717 | 0.6228355 | 0.62473600 | 0.38103898 | 0.1938362
gbm | 0.5614717 | 1.0000000 | 0.2465510 | 0.32106821 | 0.81746483 | 0.6061850
treebag | 0.6228355 | 0.2465510 | 1.0000000 | 0.38551636 | 0.37732381 | 0.5382060
rf | 0.6247360 | 0.3210682 | 0.3855164 | 1.0000000 | -0.06350643 | -0.1650719
knn | 0.3810390 | 0.8174648 | 0.3773238 | -0.06350643 | 1.0000000 | 0.8849458
SVMRadial | 0.1938362 | 0.6061850 | 0.5382060 | -0.16507190 | 0.88494583 | 1.0000000
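A correlation matrix like the one above can be obtained by correlating the sub-models' scores across the same resamples. The sketch below assumes that Pearson correlation on per-resample accuracy is what was computed, which the deck does not state. The accuracy lists are hypothetical values, not results from the deck.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-resample accuracies for two sub-models (not from the deck).
acc_model_a = [0.70, 0.75, 0.80, 0.72, 0.78]
acc_model_b = [0.68, 0.77, 0.79, 0.70, 0.80]

r = pearson(acc_model_a, acc_model_b)
```

Sub-models with low pairwise correlation tend to make different mistakes, which is generally what makes combining them in a stacked model worthwhile.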
Building Predictive Models: Stacking Sub-Models

Final Model Stacked with Random Forest

Prediction

Case | Factor | Probability
1 | No | 0.042
2 | No | 0.096
3 | No | 0.290
4 | Yes | 0.642
5 | Yes | 0.516
6 | No | 0.154
7 | No | 0.084
8 | No | 0.030
9 | No | 0.128
10 | No | 0.274

Confusion Matrix (counts)

Prediction | Reference: No | Reference: Yes
No | 15 | 2
Yes | 4 | 4

Confusion Matrix (proportions)

Prediction | Reference: No | Reference: Yes
No | 0.60 | 0.08
Yes | 0.16 | 0.16

Accuracy & Statistics

Accuracy | 0.76
95% CI | (0.5487, 0.9064)
Kappa | 0.4094
Sensitivity | 0.6667
Specificity | 0.7895
Pos Pred. Value | 0.5000
Neg Pred. Value | 0.8824
Prevalence | 0.2400
Detection Rate | 0.1600
Detection Prevalence | 0.3200
Balanced Accuracy | 0.7281

ROC Curve (AUC: 0.7368421)

Sub-Model (Random Forest)

Accuracy & Statistics

Accuracy | 0.56
95% CI | (0.3493, 0.756)
Kappa | 0.1379
Sensitivity | 0.6667
Specificity | 0.5263
Pos Pred. Value | 0.3077
Neg Pred. Value | 0.8333
Prevalence | 0.2400
Detection Rate | 0.1600
Detection Prevalence | 0.5200
Balanced Accuracy | 0.5965

ROC Curve (AUC: 0.7105263)
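The reported statistics for the final stacked model can be reproduced from its confusion matrix counts. The sketch below uses the standard definitions and treats "Yes" as the positive class, an assumption that is consistent with the reported sensitivity of 0.6667 (4 of 6). The function name and the dictionary keys are illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard 2x2 confusion-matrix statistics for a binary classifier."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = (
        (tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)
    ) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts from the stacked model's confusion matrix, with "Yes" as positive:
# predicted Yes / actual Yes = 4, predicted Yes / actual No = 4,
# predicted No / actual Yes = 2, predicted No / actual No = 15.
stacked = binary_metrics(tp=4, fp=4, fn=2, tn=15)
```

With only 25 test cases (6 of them positive), a single reclassified case shifts sensitivity by about 0.17, which helps explain the wide 95% confidence interval on accuracy.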
Model Explanation
[Figure: Feature Importance Heatmap (all 25 cases in the test data set)]
[Figure: Feature Importance Visualization (first 6 cases from the test data set)]
