Titanic: Machine Learning from Disaster
1. INFSCI – 2725 DATA ANALYTICS
Team Name: Data Warriors
Team Members:
Sushma Anand Akoju
BharathKumar Inbasekaran
Kaggle – Titanic: Machine Learning from Disaster
Source: http://img.src.ca/2012/04/06/635x357/120406_g08if_betcie_titanic_iceberg_sn635.jpg
6. Pclass, Age and Sex
The survival of young passengers does not depend on Pclass and Sex.
Source: http://i.imgur.com/geNnKff.png
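The observation above can be checked with a survival-rate pivot table over young passengers. The sketch below uses a tiny synthetic stand-in frame (the real figures come from Kaggle's train.csv, which is not reproduced here); the column names follow the competition's schema.

```python
import pandas as pd

# Illustrative records for passengers under 16 (synthetic, not the real data).
# Under the slide's claim, survival rates should look similar across cells.
kids = pd.DataFrame({
    "Pclass":   [1, 2, 3, 1, 2, 3, 3, 2],
    "Sex":      ["male", "female", "male", "female", "male", "female", "male", "female"],
    "Survived": [1, 1, 1, 1, 1, 1, 1, 1],
})

# Survival rate broken down by Pclass (rows) and Sex (columns).
rates = kids.pivot_table(values="Survived", index="Pclass", columns="Sex", aggfunc="mean")
print(rates)
```

On the real data the same pivot over the age-filtered frame shows how flat (or not) the rates actually are.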
7. Missing Values
Age is missing for many male passengers who are travelling alone.
Source: https://lh3.googleusercontent.com/6Ow3-1kbm2-lpabohQ4eKrcbg3DnWddCsqwpyjRiz2g=w402-h311-p-no
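The missing-age pattern above can be counted directly. A minimal sketch on a synthetic stand-in frame (the real counts come from train.csv), where "travelling alone" is taken to mean SibSp + Parch == 0:

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the Titanic training frame;
# column names follow the Kaggle competition schema.
df = pd.DataFrame({
    "Sex":   ["male", "male", "female", "male", "female"],
    "SibSp": [0, 1, 0, 0, 1],
    "Parch": [0, 0, 2, 0, 0],
    "Age":   [np.nan, 30.0, 22.0, np.nan, 28.0],
})

# A passenger is "travelling alone" when SibSp + Parch == 0.
df["Alone"] = (df["SibSp"] + df["Parch"]) == 0

# Count missing ages broken down by sex and alone-status.
missing = df[df["Age"].isna()].groupby(["Sex", "Alone"]).size()
print(missing)
```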
8. Survival vs. Fare
Passengers who paid higher fares had a higher survival rate.
Source: http://s24.postimg.org/7kp796lmd/Fare_Price_And_SRate.png
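One way to see the fare effect is to bin fares into quartiles and compare survival rates per bin. The fares and outcomes below are illustrative stand-ins, chosen so the trend is visible; the real rates come from train.csv.

```python
import pandas as pd

# Synthetic fares/outcomes (illustrative only; real values come from train.csv).
df = pd.DataFrame({
    "Fare":     [7.25, 8.05, 26.0, 71.28, 512.33, 13.0, 53.1, 7.9],
    "Survived": [0,    0,    1,    1,     1,      0,    1,   0],
})

# Bin fares into quartiles and compare survival rates across the bins.
df["FareBand"] = pd.qcut(df["Fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rate = df.groupby("FareBand", observed=True)["Survived"].mean()
print(rate)
```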
10. Using a Decision Tree with a Single Attribute
Title, Sex and Pclass are the major deciding factors.
Source: http://s13.postimg.org/55y2aslnr/single_attribute_scores.jpg
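The single-attribute scores behind this slide can be reproduced by fitting a one-split tree on each attribute alone and comparing accuracies. The sketch below uses scikit-learn on hypothetical encoded data (the deck itself does not show code, and its linked tutorials are in R); the scores here are from the toy data, not the actual experiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoded data: Sex (0=male, 1=female) and Pclass, with labels.
X = np.array([[0, 3], [1, 1], [1, 2], [0, 1], [1, 3], [0, 2], [1, 1], [0, 3]])
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])  # Survived

# Fit a depth-1 stump on each attribute alone and compare training accuracy.
scores = {}
for i, name in enumerate(["Sex", "Pclass"]):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[:, [i]], y)
    scores[name] = stump.score(X[:, [i]], y)
print(scores)
```

In practice the per-attribute scores would be cross-validated rather than measured on the training set.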
11. Combining Attributes
Combining Sex and Fare yields good results.
Source: http://s11.postimg.org/jc28ezt1f/sex_plus_attribute_scores.png
12. Missing Data Completion Techniques
Age
• 236 missing values
• Two approaches: fill with the median value, or predict with a decision tree
Embarked
• Two missing values
• Substituted with "S", as many passengers embarked from Southampton
Fare
• One NA value
• Two approaches: replace with the median value, or predict with a decision tree
Source: http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r
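Both completion approaches for Age can be sketched in a few lines. The frame below is a synthetic stand-in for train.csv (the slide's 236 missing values come from the real data), and the decision-tree variant follows the linked tutorial's idea of predicting the missing ages from the remaining attributes.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic frame with missing ages (stand-in for train.csv).
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 1, 3, 2],
    "SibSp":  [0, 1, 0, 0, 2, 1],
    "Age":    [38.0, 30.0, np.nan, 45.0, np.nan, 27.0],
})

# Approach 1: fill with the median of the observed ages.
median_filled = df["Age"].fillna(df["Age"].median())

# Approach 2: train a decision tree on rows where Age is known,
# using the other attributes as predictors, then fill the gaps.
known = df[df["Age"].notna()]
unknown = df[df["Age"].isna()]
tree = DecisionTreeRegressor(random_state=0)
tree.fit(known[["Pclass", "SibSp"]], known["Age"])
tree_filled = df["Age"].copy()
tree_filled.loc[df["Age"].isna()] = tree.predict(unknown[["Pclass", "SibSp"]])
print(median_filled.tolist(), tree_filled.tolist())
```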
13. Attributes Considered
Given attributes: Pclass, Age, Sex, Fare, SibSp, Parch, Cabin
Engineered attributes: Title, Surname, Mother, Father, Extended relation
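The engineered Title and Surname attributes can be extracted from the Name column, which in Kaggle's data follows the pattern "Surname, Title. Given names". A minimal sketch on hypothetical names in that format:

```python
import pandas as pd

# Hypothetical names in the "Surname, Title. Given names" format used by train.csv.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
])

# The title sits between the comma and the first period; the surname precedes the comma.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False)
surnames = names.str.split(",").str[0]
print(titles.tolist(), surnames.tolist())
```

Attributes like Mother or Father would then be derived by combining Title, Parch and Age.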
23. Other Approaches and Observations
• Used all attributes for prediction in a decision tree, but ran into overfitting
• Used Bayesian Search in GeNIe to implement Naïve Bayes
• Compared GBM predictions vs. Random Forest predictions vs. Bayesian dependence
• Found that attribute importance in GBM and conditional dependence in Bayesian Search were similar
• Used brute-force baselines:
  • All survived – .37 accuracy
  • All perished – .63 accuracy
• Used stacking and voting – achieved .85 accuracy
• Frequency tables and conditional probability
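The voting idea above can be sketched with scikit-learn's hard-voting ensemble over the model families the slide mentions (Random Forest, GBM, Naïve Bayes). The features, labels, and resulting score below are from toy data, not the deck's .85 result.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.naive_bayes import GaussianNB

# Toy encoded features (stand-ins for Sex, Pclass, FareBand) with labels
# generated by a simple rule; the real pipeline trains on engineered train.csv.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 3))
y = (X[:, 0] == 1).astype(int)  # toy rule: feature 0 == 1 means survived

# Majority (hard) vote over three different model families.
vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
vote.fit(X, y)
print(vote.score(X, y))
```

Stacking differs in that a meta-model is trained on the base models' out-of-fold predictions instead of taking a simple majority.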
24. Manual Prediction
Each additional correct prediction increases accuracy by roughly 0.2 percentage points.
http://rstudio-pubs-static.s3.amazonaws.com/53109_cd4baa0d9ad54bfb94598a67f79a79a6.html
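The 0.2-point figure follows from the size of Kaggle's Titanic test set, which has 418 passengers, so one flipped prediction moves the score by 1/418:

```python
# Kaggle's Titanic test set has 418 passengers, so turning one wrong
# prediction into a right one changes the leaderboard score by 1/418.
test_rows = 418
increment = 1 / test_rows
print(f"{increment:.4f}")  # about 0.0024, i.e. roughly 0.2 percentage points
```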
25. Experience and Learning
• Each algorithm gave different results
• No algorithm can predict with 100% accuracy
• The accuracy of a model as measured by its confusion matrix differs from the accuracy reported by Kaggle