Titanic: Machine Learning from Disaster
1. INFSCI – 2725 DATA ANALYTICS
Team Name: Data Warriors
Team Members:
Sushma Anand Akoju
BharathKumar Inbasekaran
Kaggle – Titanic: Machine Learning from Disaster
Source: http://img.src.ca/2012/04/06/635x357/120406_g08if_betcie_titanic_iceberg_sn635.jpg
6. Pclass, Age and Sex
The survival of young passengers does not depend on Pclass and Sex.
Source: http://i.imgur.com/geNnKff.png
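The observation above can be checked with a survival-rate pivot table over young passengers. The sketch below uses a tiny synthetic stand-in frame (the real figures come from Kaggle's train.csv, which is not reproduced here); the column names follow the competition's schema.

```python
import pandas as pd

# Illustrative records for passengers under 16 (synthetic, not the real data).
# Under the slide's claim, survival rates should look similar across cells.
kids = pd.DataFrame({
    "Pclass":   [1, 2, 3, 1, 2, 3, 3, 2],
    "Sex":      ["male", "female", "male", "female", "male", "female", "male", "female"],
    "Survived": [1, 1, 1, 1, 1, 1, 1, 1],
})

# Survival rate broken down by Pclass (rows) and Sex (columns).
rates = kids.pivot_table(values="Survived", index="Pclass", columns="Sex", aggfunc="mean")
print(rates)
```

On the real data the same pivot over the age-filtered frame shows how flat (or not) the rates actually are.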
7. Missing Values
Age is missing for many male passengers who are travelling alone.
Source: https://lh3.googleusercontent.com/6Ow3-1kbm2-lpabohQ4eKrcbg3DnWddCsqwpyjRiz2g=w402-h311-p-no
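The missing-age pattern above can be counted directly. A minimal sketch on a synthetic stand-in frame (the real counts come from train.csv), where "travelling alone" is taken to mean SibSp + Parch == 0:

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the Titanic training frame;
# column names follow the Kaggle competition schema.
df = pd.DataFrame({
    "Sex":   ["male", "male", "female", "male", "female"],
    "SibSp": [0, 1, 0, 0, 1],
    "Parch": [0, 0, 2, 0, 0],
    "Age":   [np.nan, 30.0, 22.0, np.nan, 28.0],
})

# A passenger is "travelling alone" when SibSp + Parch == 0.
df["Alone"] = (df["SibSp"] + df["Parch"]) == 0

# Count missing ages broken down by sex and alone-status.
missing = df[df["Age"].isna()].groupby(["Sex", "Alone"]).size()
print(missing)
```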
8. Survival vs. Fare
Passengers who paid higher fares had a higher survival rate.
Source: http://s24.postimg.org/7kp796lmd/Fare_Price_And_SRate.png
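One way to see the fare effect is to bin fares into quartiles and compare survival rates per bin. The fares and outcomes below are illustrative stand-ins, chosen so the trend is visible; the real rates come from train.csv.

```python
import pandas as pd

# Synthetic fares/outcomes (illustrative only; real values come from train.csv).
df = pd.DataFrame({
    "Fare":     [7.25, 8.05, 26.0, 71.28, 512.33, 13.0, 53.1, 7.9],
    "Survived": [0,    0,    1,    1,     1,      0,    1,   0],
})

# Bin fares into quartiles and compare survival rates across the bins.
df["FareBand"] = pd.qcut(df["Fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rate = df.groupby("FareBand", observed=True)["Survived"].mean()
print(rate)
```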
10. Using a Decision Tree with a Single Attribute
Title, Sex and Pclass are the major deciding factors.
Source: http://s13.postimg.org/55y2aslnr/single_attribute_scores.jpg
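The single-attribute scores behind this slide can be reproduced by fitting a one-split tree on each attribute alone and comparing accuracies. The sketch below uses scikit-learn on hypothetical encoded data (the deck itself does not show code, and its linked tutorials are in R); the scores here are from the toy data, not the actual experiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoded data: Sex (0=male, 1=female) and Pclass, with labels.
X = np.array([[0, 3], [1, 1], [1, 2], [0, 1], [1, 3], [0, 2], [1, 1], [0, 3]])
y = np.array([0, 1, 1, 0, 1, 0, 1, 0])  # Survived

# Fit a depth-1 stump on each attribute alone and compare training accuracy.
scores = {}
for i, name in enumerate(["Sex", "Pclass"]):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[:, [i]], y)
    scores[name] = stump.score(X[:, [i]], y)
print(scores)
```

In practice the per-attribute scores would be cross-validated rather than measured on the training set.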
11. Combining Attributes
Combining Sex and Fare yields good results.
Source: http://s11.postimg.org/jc28ezt1f/sex_plus_attribute_scores.png
12. Missing Data Completion Techniques
Age
• 236 missing values
• Two approaches: fill with the median value, or predict with a decision tree
Embarked
• Two missing values
• Substituted with "S", as many passengers embarked from Southampton
Fare
• One NA value
• Two approaches: replace with the median value, or predict with a decision tree
Source: http://trevorstephens.com/post/72916401642/titanic-getting-started-with-r
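Both completion approaches for Age can be sketched in a few lines. The frame below is a synthetic stand-in for train.csv (the slide's 236 missing values come from the real data), and the decision-tree variant follows the linked tutorial's idea of predicting the missing ages from the remaining attributes.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic frame with missing ages (stand-in for train.csv).
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 1, 3, 2],
    "SibSp":  [0, 1, 0, 0, 2, 1],
    "Age":    [38.0, 30.0, np.nan, 45.0, np.nan, 27.0],
})

# Approach 1: fill with the median of the observed ages.
median_filled = df["Age"].fillna(df["Age"].median())

# Approach 2: train a decision tree on rows where Age is known,
# using the other attributes as predictors, then fill the gaps.
known = df[df["Age"].notna()]
unknown = df[df["Age"].isna()]
tree = DecisionTreeRegressor(random_state=0)
tree.fit(known[["Pclass", "SibSp"]], known["Age"])
tree_filled = df["Age"].copy()
tree_filled.loc[df["Age"].isna()] = tree.predict(unknown[["Pclass", "SibSp"]])
print(median_filled.tolist(), tree_filled.tolist())
```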
13. Attributes Considered
Given attributes: Pclass, Age, Sex, Fare, SibSp, Parch, Cabin
Engineered attributes: Title, Surname, Mother, Father, Extended relation
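The engineered Title and Surname attributes can be extracted from the Name column, which in Kaggle's data follows the pattern "Surname, Title. Given names". A minimal sketch on hypothetical names in that format:

```python
import pandas as pd

# Hypothetical names in the "Surname, Title. Given names" format used by train.csv.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
])

# The title sits between the comma and the first period; the surname precedes the comma.
titles = names.str.extract(r",\s*([^.]+)\.", expand=False)
surnames = names.str.split(",").str[0]
print(titles.tolist(), surnames.tolist())
```

Attributes like Mother or Father would then be derived by combining Title, Parch and Age.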
23. Other Approaches and Observations
• Used all attributes for prediction in a decision tree, but ran into overfitting
• Used Bayesian Search in GeNIe to implement Naïve Bayes
• Compared GBM predictions vs. Random Forest predictions vs. Bayesian dependence
• Found that attribute importance in GBM and conditional dependence in Bayesian Search were similar
• Used brute-force baselines:
  • All survived – .37 accuracy
  • All perished – .63 accuracy
• Used stacking and voting – achieved .85 accuracy
• Frequency tables and conditional probability
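The voting idea above can be sketched with scikit-learn's hard-voting ensemble over the model families the slide mentions (Random Forest, GBM, Naïve Bayes). The features, labels, and resulting score below are from toy data, not the deck's .85 result.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.naive_bayes import GaussianNB

# Toy encoded features (stand-ins for Sex, Pclass, FareBand) with labels
# generated by a simple rule; the real pipeline trains on engineered train.csv.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 3))
y = (X[:, 0] == 1).astype(int)  # toy rule: feature 0 == 1 means survived

# Majority (hard) vote over three different model families.
vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
vote.fit(X, y)
print(vote.score(X, y))
```

Stacking differs in that a meta-model is trained on the base models' out-of-fold predictions instead of taking a simple majority.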
24. Manual Prediction
Each additional correct prediction increases accuracy by roughly 0.2 percentage points.
http://rstudio-pubs-static.s3.amazonaws.com/53109_cd4baa0d9ad54bfb94598a67f79a79a6.html
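The 0.2-point figure follows from the size of Kaggle's Titanic test set, which has 418 passengers, so one flipped prediction moves the score by 1/418:

```python
# Kaggle's Titanic test set has 418 passengers, so turning one wrong
# prediction into a right one changes the leaderboard score by 1/418.
test_rows = 418
increment = 1 / test_rows
print(f"{increment:.4f}")  # about 0.0024, i.e. roughly 0.2 percentage points
```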
25. Experience and Learning
• Each algorithm gave different results
• No algorithm can predict with 100% accuracy
• The accuracy of a model as measured by its confusion matrix differs from the accuracy reported by Kaggle