Supplementing Random Forest Predictions with Observed Patterns in Titanic Data
1. Supplementing Random Forest Predictions with Observed Patterns in Titanic Data
Five logical statements were applied to the test data, overriding the random forest
model's predictions for 9 individual outcomes. This increased the accuracy of
predictions by about 2%.
This time-consuming process of detailed exploratory analysis demonstrates that a basic
random forest model may mispredict outcomes that a human could otherwise predict
from tabulation and intuition.
The process, however, also showcases the efficiency of the random forest model. After
several hours of analysis, only 9 outcomes were reversed, whereas the model itself predicts
with relative accuracy the outcomes of roughly 75% of the population with only a few lines
of code.
2. Female + Female ticket survival rate = 100% = Survived
Female Survival Rate by Ticket, Training Data

Ticket Number    Survival Rate    # of Females on Ticket
2666             100%             4
13502            100%             3
24160            100%             3
110152           100%             3
PC 17757         100%             3
2668             100%             2
11767            100%             2
12749            100%             2
16966            100%             2
17421            100%             2
19950            100%             2
26360            100%             2
35273            100%             2
36928            100%             2

Will not be applied to Virgil App as ticket number is not
collected.
Many groups of passengers had the same ticket number. Ticket number appears to be a
reasonable predictor of groups’ or families’ survival rates, particularly among females. Large
groups of females travelling on the same ticket appeared to have experienced the same
outcome.
In total, 38 passengers met these criteria in the test data. The random forest model
predicted that 36 of the 38 passengers would survive. Under this logical statement, all 38
female passengers are predicted to survive.
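
The rule above can be sketched in pandas as follows. This is a minimal illustration, not the original analysis code: the column names `Sex`, `Ticket`, and `Survived` are assumed, the function names are hypothetical, and the minimum group size of two females per ticket is an assumption inferred from the table, which lists no smaller groups.

```python
import pandas as pd


def female_ticket_rates(train: pd.DataFrame) -> pd.DataFrame:
    """Survival rate and head count of females per ticket (training data)."""
    females = train[train["Sex"] == "female"]
    return (
        females.groupby("Ticket")["Survived"]
        .agg(rate="mean", n_females="size")
        .reset_index()
    )


def apply_female_ticket_rule(
    test: pd.DataFrame,
    predictions: pd.Series,
    rates: pd.DataFrame,
    min_females: int = 2,  # assumed cutoff, not stated in the report
) -> pd.Series:
    """Override predictions to 1 for females on tickets with a 100% female survival rate."""
    qualifying = rates[(rates["rate"] == 1.0) & (rates["n_females"] >= min_females)]
    tickets = set(qualifying["Ticket"])
    mask = (test["Sex"] == "female") & test["Ticket"].isin(tickets)
    overridden = predictions.copy()  # leave the model's output untouched
    overridden.loc[mask] = 1
    return overridden
```

Here `predictions` is assumed to be a 0/1 Series sharing the index of `test`, so the boolean mask lines up row for row with the model output.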
3. Female + Female ticket survival rate = 0% + Cabin field is blank = Died
In total, 7 passengers met these criteria in the test data. The random forest model
predicted that 4 of the 7 passengers would survive. Under this logical statement, all 7
female passengers are predicted to die.
Cabin appeared to be a confounding factor in determining whether a female on the same
ticket as a group of females died. Groups of females on the same ticket appeared to die at
the same rates when the cabin field was blank. There was less confidence when the cabin
field was not blank.
+ Cabin field is blank
Survival Rate by Ticket Number, Females with Blank Cabin

Ticket Number    Survival Rate    # of Females
347082           0%               5
4133             0%               3
347088           0%               3
349909           0%               3
CA. 2343         0%               3
W./C. 6608       0%               3
2665             0%               2
2678             0%               2
2691             0%               2
345773           0%               2
CA 2144          0%               2

Will not be applied to Virgil App as ticket number is not
collected.
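
A hedged sketch of how the blank-cabin condition might be combined with the ticket survival rate. The column names (`Sex`, `Ticket`, `Cabin`, `Survived`) and function names are assumptions, blank cabins are assumed to be loaded as missing values (NaN), and requiring the test passenger's own cabin to be blank is one possible reading of the rule rather than something the report states.

```python
import pandas as pd


def zero_survival_tickets(train: pd.DataFrame, min_females: int = 2) -> set:
    """Tickets where every female with a blank cabin died (training data)."""
    blank = train[(train["Sex"] == "female") & train["Cabin"].isna()]
    stats = blank.groupby("Ticket")["Survived"].agg(rate="mean", n="size")
    # min_females is an assumed cutoff; the table lists groups of 2 or more.
    keep = (stats["rate"] == 0.0) & (stats["n"] >= min_females)
    return set(stats.index[keep])


def apply_female_died_rule(
    test: pd.DataFrame, predictions: pd.Series, tickets: set
) -> pd.Series:
    """Override predictions to 0 for blank-cabin females on the given tickets."""
    mask = (
        (test["Sex"] == "female")
        & test["Cabin"].isna()
        & test["Ticket"].isin(tickets)
    )
    overridden = predictions.copy()
    overridden.loc[mask] = 0
    return overridden
```

If blank cabins were stored as empty strings instead of NaN, the `isna()` checks would need to be replaced with a comparison against `""`.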
4. Female + Fare > 50 = Survived
In total, 25 passengers met these criteria in the test data. The random forest model
predicted that 24 of the 25 passengers would survive. Under this logical statement, all 25
female passengers are predicted to survive.
The vast majority of females with a fare above 50 survived. No distinguishing
characteristics could be identified for the 6% of women with fares above 50 who died, so
all women with fares above 50 are predicted to survive.
Survival Rate by Fare Range, Females

Fare Range     Survival Rate    # of Females
0 - 10         59%              64
10.1 - 20      74%              78
20 - 50        66%              85
50.1 - 100     94%              53
101+           94%              34
Grand Total    74%              314

May be applied to Virgil App if fare is collected
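
The fare tabulation and the resulting rule might be reproduced along these lines. This is a sketch under assumptions: the column names `Sex`, `Fare`, and `Survived`, the function names, and the exact bin edges are not given in the report (the edges below are one plausible reading of the range labels).

```python
import pandas as pd

# Assumed bin edges: (.., 10], (10, 20], (20, 50], (50, 100], (100, ..)
FARE_BINS = [-0.001, 10, 20, 50, 100, float("inf")]
FARE_LABELS = ["0 - 10", "10.1 - 20", "20 - 50", "50.1 - 100", "101+"]


def survival_by_fare_range(train: pd.DataFrame) -> pd.DataFrame:
    """Female survival rate and count per fare range (training data)."""
    females = train[train["Sex"] == "female"]
    fare_range = pd.cut(females["Fare"], bins=FARE_BINS, labels=FARE_LABELS)
    return females.groupby(fare_range, observed=False)["Survived"].agg(
        rate="mean", total="size"
    )


def apply_fare_rule(
    test: pd.DataFrame, predictions: pd.Series, threshold: float = 50.0
) -> pd.Series:
    """Override predictions to 1 for females whose fare exceeds the threshold."""
    # A missing fare compares as False, so such rows keep the model's prediction.
    mask = (test["Sex"] == "female") & (test["Fare"] > threshold)
    overridden = predictions.copy()
    overridden.loc[mask] = 1
    return overridden
```

Keeping the threshold as a parameter makes it easy to check how sensitive the override is to the cutoff before settling on 50.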
5. Class 1 or 2 + Age < 16 = Survived
In total, 11 passengers met these criteria in the test data. The random forest model
predicted that 10 of the 11 passengers would survive. Under this logical statement, all 11
passengers are predicted to survive.
The vast majority of first and second class passengers under the age of 16 survived. No
distinguishing characteristics could be identified for the 4% who died, so all first and
second class passengers under the age of 16 are predicted to survive.
Survival Rate by Age Range, First and Second Class

Age Range      Survival Rate    # of Passengers
Blank          44%              41
<10            95%              20
<15            100%             4
<20            63%              32
<30            53%              87
<50            59%              153
<80            40%              62
80+            100%             1
Grand Total    56%              400

May be applied to Virgil App if age and class are collected.
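
A minimal sketch of this rule, which also reports how many matching passengers the model had already predicted to survive (the "10 of the 11" style of figure quoted above). The column names `Pclass` and `Age` and the function name are assumptions. Passengers with a blank age are left to the model, since a missing age cannot satisfy the condition.

```python
import pandas as pd


def apply_child_rule(
    test: pd.DataFrame, predictions: pd.Series, max_age: float = 16
) -> tuple:
    """Override predictions to 1 for first/second class passengers under max_age.

    Returns (overridden predictions, number of passengers matching the rule,
    number of those the model had already predicted to survive).
    """
    # NaN ages compare as False, so blank ages never match the rule.
    mask = test["Pclass"].isin([1, 2]) & (test["Age"] < max_age)
    matched = int(mask.sum())
    already_survive = int((predictions[mask] == 1).sum())
    overridden = predictions.copy()
    overridden.loc[mask] = 1
    return overridden, matched, already_survive
```

The difference `matched - already_survive` is the number of individual outcomes the rule reverses.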
6. Male + Male ticket survival rate > 49% = Survived
In total, 8 passengers met these criteria in the test data. The random forest model
predicted that 6 of the 8 passengers would survive. Under this logical statement, all 8
male passengers are predicted to survive.
Identifying common characteristics of surviving males proved difficult. The survival rate of
males sharing a common ticket is a reasonable way of predicting the survival of a male
passenger. A relatively low bar is used, predicting a male’s survival if at least half of the
males on the same ticket survive.
Survival Rate by Ticket, Males

Ticket Number       Survival Rate    # of Males on Ticket
1601                71%              7
230080              67%              3
2661                100%             2
29106               100%             2
113760              100%             2
C.A. 37671          100%             2
PC 17572            100%             2
PC 17755            100%             2
2699                50%              2
17421               50%              2
347077              50%              2
363291              50%              2
C.A. 2315           50%              2
C.A. 33112          50%              2
S.C./PARIS 2079     50%              2

Will not be applied to Virgil App as ticket number is not
collected.