Supplementing Random Forest Predictions with Observed Patterns in Titanic Data
Five logical statements were applied to the test data, overriding the random forest model's
predictions for 9 individual outcomes. This increased the accuracy of predictions by about
2%.
This time-consuming process of detailed exploratory analysis demonstrates that a basic
random forest model may mispredict outcomes that a human could otherwise predict from
tabulation and intuition.
The process, however, also showcases the efficiency of the random forest model. After
several hours of analysis, only 9 outcomes were reversed, whereas the model itself predicts
the outcomes of roughly 75% of the population with relative accuracy using only a few lines
of code.
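The override step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the dictionary keys, and the toy passengers are all assumptions made for the example. Each rule returns 1 (survived), 0 (died), or None when it does not apply, and the first rule that applies replaces the model's prediction.

```python
def apply_rules(passengers, model_preds, rules):
    """Overlay logical statements on model predictions.

    Returns the final predictions and the number of reversed outcomes.
    """
    final = []
    reversed_count = 0
    for passenger, pred in zip(passengers, model_preds):
        outcome = pred
        for rule in rules:
            verdict = rule(passenger)  # 1, 0, or None (rule does not apply)
            if verdict is not None:
                outcome = verdict
                break
        if outcome != pred:
            reversed_count += 1
        final.append(outcome)
    return final, reversed_count


def accuracy(preds, truth):
    """Share of predictions that match the known outcomes."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


# Hypothetical rule and toy data, for illustration only.
def high_fare_female(p):
    return 1 if p["sex"] == "female" and p["fare"] > 50 else None


passengers = [
    {"sex": "female", "fare": 80.0},
    {"sex": "male", "fare": 8.0},
    {"sex": "female", "fare": 7.9},
    {"sex": "male", "fare": 60.0},
]
model_preds = [0, 0, 1, 0]
truth = [1, 0, 1, 0]

final_preds, n_reversed = apply_rules(passengers, model_preds, [high_fare_female])
```

Comparing accuracy(model_preds, truth) with accuracy(final_preds, truth) on data with known outcomes gives the kind of before-and-after figure quoted above.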
Female + Female ticket survival rate = 100% = Survived
Female Survival Rate by Ticket, Training Data
Ticket Number Survival Rate # of Females on Ticket
2666 100% 4
13502 100% 3
24160 100% 3
110152 100% 3
PC 17757 100% 3
2668 100% 2
11767 100% 2
12749 100% 2
16966 100% 2
17421 100% 2
19950 100% 2
26360 100% 2
35273 100% 2
36928 100% 2
Will not be applied to Virgil App as ticket number is not
collected.
Many groups of passengers had the same ticket number. Ticket number appears to be a
reasonable predictor of groups’ or families’ survival rates, particularly among females. Large
groups of females travelling on the same ticket appeared to have experienced the same
outcome.
In total, 38 passengers met these criteria in the
test data. The random forest model predicted
that 36 of the 38 passengers would survive.
Using this logical statement, all 38 female
passengers would survive.
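As a rough sketch of this statement, the per-ticket female survival rate can be tabulated from the training data and then looked up for each test passenger. The field names, the ticket labels "A1" and "B2", and the minimum group size of 2 (chosen because the table lists only tickets with at least two females) are assumptions for illustration.

```python
from collections import defaultdict


def female_ticket_rates(train_rows):
    """Map ticket -> (female survival rate, number of females on ticket)."""
    counts = defaultdict(lambda: [0, 0])  # ticket -> [survivors, females]
    for row in train_rows:
        if row["sex"] == "female":
            counts[row["ticket"]][0] += row["survived"]
            counts[row["ticket"]][1] += 1
    return {t: (s / n, n) for t, (s, n) in counts.items()}


def female_all_survived_rule(row, rates, min_group=2):
    """Return 1 if every female on the same training ticket survived, else None."""
    if row["sex"] != "female":
        return None
    rate, n = rates.get(row["ticket"], (None, 0))
    if n >= min_group and rate == 1.0:
        return 1
    return None


# Toy training rows with made-up ticket labels.
train = [
    {"ticket": "A1", "sex": "female", "survived": 1},
    {"ticket": "A1", "sex": "female", "survived": 1},
    {"ticket": "A1", "sex": "male", "survived": 0},
    {"ticket": "B2", "sex": "female", "survived": 1},
    {"ticket": "B2", "sex": "female", "survived": 0},
]
rates = female_ticket_rates(train)
```

A test passenger whose ticket does not appear in the training data gets None, so the model's prediction stands.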
Female + Female ticket survival rate = 0% + Cabin field is blank = Died
In total, 7 passengers met these criteria in the
test data. The random forest model predicted
that 4 of the 7 passengers would survive. Using
this logical statement, all 7 female passengers
would die.
Cabin appeared to be a confounding factor in determining whether a female on the same
ticket as a group of females died. When the cabin field was blank, groups of females on the
same ticket appeared to die together. There was less confidence in this pattern when the
cabin field was not blank.
Survival rate by Ticket Number, Females with Blank Cabin
Ticket Number Survival Rate Grand Total
347082 0% 5
4133 0% 3
347088 0% 3
349909 0% 3
CA. 2343 0% 3
W./C. 6608 0% 3
2665 0% 2
2678 0% 2
2691 0% 2
345773 0% 2
CA 2144 0% 2
Will not be applied to Virgil App as ticket number is not
collected.
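The blank-cabin condition can be sketched as an extra filter on both the training tabulation and the test passenger. This assumes the rule requires the test passenger's own cabin field to be blank as well, which the slide does not state explicitly. Field names and ticket labels are hypothetical.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as a blank field."""
    return value is None or str(value).strip() == ""


def blank_cabin_female_rates(train_rows):
    """Map ticket -> (survival rate, count) over females with a blank cabin."""
    totals = {}
    for row in train_rows:
        if row["sex"] == "female" and is_blank(row["cabin"]):
            s, n = totals.get(row["ticket"], (0, 0))
            totals[row["ticket"]] = (s + row["survived"], n + 1)
    return {t: (s / n, n) for t, (s, n) in totals.items()}


def female_all_died_rule(row, rates, min_group=2):
    """Return 0 if no blank-cabin female on the same ticket survived, else None."""
    if row["sex"] != "female" or not is_blank(row["cabin"]):
        return None
    rate, n = rates.get(row["ticket"], (None, 0))
    if n >= min_group and rate == 0.0:
        return 0
    return None


# Toy training rows with made-up ticket labels.
train = [
    {"ticket": "C3", "sex": "female", "cabin": "", "survived": 0},
    {"ticket": "C3", "sex": "female", "cabin": None, "survived": 0},
    {"ticket": "D4", "sex": "female", "cabin": "B20", "survived": 0},
    {"ticket": "D4", "sex": "female", "cabin": "B20", "survived": 0},
]
rates = blank_cabin_female_rates(train)
```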
Female + Fare > 50 = Survived
In total, 25 passengers met these criteria in the
test data. The random forest model predicted
that 24 of the 25 passengers would survive.
Using this logical statement, all 25 female
passengers would survive.
The vast majority of females with a fare higher than 50 survived. No characteristics could be
identified for the 6% of women with fares higher than 50 who died, so all women with
fares higher than 50 are predicted to survive.
Survival Rate by Fare Range, Females
Fare Range Survival Rate Grand Total
0 - 10 59% 64
20 - 50 66% 85
10.1 - 20 74% 78
50.1 - 100 94% 53
101+ 94% 34
Grand Total 74% 314
May be applied to Virgil App if fare is collected
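A tabulation like the one above could be produced along the following lines. The bin edges are an assumption read off the range labels (fares above 100 are placed in "101+"), and the rows are toy values, not Titanic data.

```python
def fare_range(fare):
    """Assign a fare to one of the range labels used in the table."""
    if fare <= 10:
        return "0 - 10"
    if fare <= 20:
        return "10.1 - 20"
    if fare <= 50:
        return "20 - 50"
    if fare <= 100:
        return "50.1 - 100"
    return "101+"


def female_survival_by_fare_range(rows):
    """Map fare range -> (survival rate, count) over female passengers."""
    totals = {}
    for row in rows:
        if row["sex"] != "female":
            continue
        label = fare_range(row["fare"])
        s, n = totals.get(label, (0, 0))
        totals[label] = (s + row["survived"], n + 1)
    return {label: (s / n, n) for label, (s, n) in totals.items()}


def high_fare_female_rule(row):
    """Return 1 for females with a fare above 50, else None."""
    if row["sex"] == "female" and row["fare"] > 50:
        return 1
    return None


# Toy rows for illustration only.
rows = [
    {"sex": "female", "fare": 7.25, "survived": 0},
    {"sex": "female", "fare": 8.05, "survived": 1},
    {"sex": "female", "fare": 75.0, "survived": 1},
    {"sex": "female", "fare": 120.0, "survived": 1},
    {"sex": "male", "fare": 90.0, "survived": 0},
]
table = female_survival_by_fare_range(rows)
```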
Class 1 or 2 + Age < 16 = Survived
In total, 11 passengers met these criteria in the
test data. The random forest model predicted
that 10 of the 11 passengers would survive.
Using this logical statement, all 11 passengers
would survive.
The vast majority of first and second class passengers under the age of 16 survived. No
characteristics could be identified for the 4% who died, so all first and second class
passengers under the age of 16 are predicted to survive.
Survival Rate by Age Range, First and Second Class
Age Range Survival Rate Grand Total
Blank 44% 41
<10 95% 20
<15 100% 4
<20 63% 32
<30 53% 87
<50 59% 153
<80 40% 62
80+ 100% 1
Grand Total 56% 400
May be applied to Virgil App if age and class are collected
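Because the age field is blank for some passengers, a sketch of this statement needs to decide what happens when age is missing. The version below assumes the rule simply does not apply in that case, leaving the model's prediction in place. The field names are hypothetical.

```python
def young_upper_class_rule(row):
    """Return 1 for first or second class passengers under 16, else None.

    A missing age (None) means the rule does not apply.
    """
    age = row.get("age")
    if age is None:
        return None
    if row["pclass"] in (1, 2) and age < 16:
        return 1
    return None


# Toy passengers for illustration only.
examples = [
    {"pclass": 2, "age": 8},      # rule applies
    {"pclass": 3, "age": 8},      # third class: no override
    {"pclass": 1, "age": 16},     # not under 16: no override
    {"pclass": 1, "age": None},   # blank age: no override
]
verdicts = [young_upper_class_rule(p) for p in examples]
```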
Male + Male ticket survival rate > 49% = Survived
In total, 8 passengers met these criteria in the
test data. The random forest model predicted
that 6 of the 8 passengers would survive. Using
this logical statement, all 8 male passengers
would survive.
Identifying common characteristics of surviving males proved difficult. The survival rate of
males sharing a common ticket is a reasonable way of predicting the survival of a male
passenger. A relatively low bar is used, predicting a male’s survival if at least half of the
males on the same ticket survive.
Survival Rate by Ticket, Males
Ticket Number Survival Rate Grand Total
1601 71% 7
230080 67% 3
2661 100% 2
29106 100% 2
113760 100% 2
C.A. 37671 100% 2
PC 17572 100% 2
PC 17755 100% 2
2699 50% 2
17421 50% 2
347077 50% 2
363291 50% 2
C.A. 2315 50% 2
C.A. 33112 50% 2
S.C./PARIS 2079 50% 2
Will not be applied to Virgil App as ticket number is not
collected.
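To make the low bar concrete, the threshold check can be written directly on survivor counts. The counts below are inferred from the percentages and group sizes in the table (71% of 7 is 5 survivors, 67% of 3 is 2, 50% of 2 is 1). The function name and the 0-of-2 case are added for illustration.

```python
def male_ticket_rule(survivors, males_on_ticket, threshold=0.49):
    """Return 1 if the male survival rate on the ticket exceeds the threshold."""
    if males_on_ticket == 0:
        return None  # no training evidence for this ticket
    rate = survivors / males_on_ticket
    return 1 if rate > threshold else None


# (survivors, males on ticket) pairs consistent with the table's percentages,
# plus one hypothetical ticket where no male survived.
checks = {
    "5 of 7": male_ticket_rule(5, 7),
    "2 of 3": male_ticket_rule(2, 3),
    "1 of 2": male_ticket_rule(1, 2),
    "0 of 2": male_ticket_rule(0, 2),
}
```

A single survivor on a two-male ticket is enough to trigger the override, which is why the bar is described as relatively low.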
