Titanic Survivor Prediction
by Machine Learning
Ding Li 2018.05
online store: costumejewelry1.com
2
April 15, 1912: Titanic Sank

History
People: 2,224
  Crew: 908
    Survivors: 212
    Victims: 696
  Passengers: 1,316
    Survivors: 498
    Victims: 818

Kaggle Machine Learning (ML) Project
Passengers: 1,309
  Train: 891 (modeling)
    Survivors: 342
    Victims: 549
  Test: 418 (prediction)
    Survivors: who?
    Victims: who?
3
Field         Train example                               Test example
PassengerId   195                                         972
Survived      1                                           (need to predict)
Pclass        1                                           3
Name*         Brown, Mrs. James Joseph (Margaret Tobin)   Boulos, Master. Akar
Sex           female                                      male
Age           44                                          6
SibSp         0                                           1
ParCh         0                                           1
Ticket**      PC 17610                                    2678
Fare          27.7208                                     15.2458
Cabin***      B4                                          (blank)
Embarked      C                                           C

Margaret Brown was played by Kathy Bates in the Titanic movie.

Goal: predict Survived (1/0) for every passenger in the test set
PassengerId   Survived
892           1/0
893           1/0
……            ……
1309          1/0

*Title can be extracted from Name.
**Ticket is not informative and is not used.
***Cabin is mostly missing and is not used.
Age, Fare: missing data replaced with the median value; Embarked: missing data replaced with the mode value.
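The preprocessing notes on this slide can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the author's Kaggle code: the helper names extract_title and fill_missing are hypothetical, and the third record is a made-up row added only to show how gaps are filled.

```python
import re
from statistics import median, mode

# The first two rows use values shown on the slide; the third row is invented
# for illustration and has missing Age, Fare and Embarked.
passengers = [
    {"Name": "Brown, Mrs. James Joseph (Margaret Tobin)", "Age": 44, "Fare": 27.7208, "Embarked": "C"},
    {"Name": "Boulos, Master. Akar", "Age": 6, "Fare": 15.2458, "Embarked": "C"},
    {"Name": "Doe, Mr. John", "Age": None, "Fare": None, "Embarked": None},
]


def extract_title(name):
    """Return the honorific between the comma and the first period, e.g. 'Mrs'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else None


def fill_missing(rows, column, statistic):
    """Replace None in `column` with `statistic` computed over the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    fill_value = statistic(known)
    for row in rows:
        if row[column] is None:
            row[column] = fill_value


for row in passengers:
    row["Title"] = extract_title(row["Name"])

fill_missing(passengers, "Age", median)     # numeric columns: median
fill_missing(passengers, "Fare", median)
fill_missing(passengers, "Embarked", mode)  # categorical column: mode

print([row["Title"] for row in passengers])  # ['Mrs', 'Master', 'Mr']
```

On real data the statistics would be computed over the full passenger table rather than three rows.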
4
Embarked from
S: Southampton C: Cherbourg Q: Queenstown
SibSp: # of Siblings or Spouse
ParCh: # of Parents or Children
Family Size = SibSp + ParCh + 1
Is Alone = 1 if Family Size = 1
           0 if Family Size > 1
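As a small sketch of the two derived features defined above, assuming each passenger is a plain dictionary (the function name add_family_features is hypothetical; the SibSp and ParCh values are those of the two example passengers shown earlier):

```python
def add_family_features(passenger):
    """Return a copy of the record with FamilySize and IsAlone added."""
    # The +1 counts the passenger themselves.
    family_size = passenger["SibSp"] + passenger["ParCh"] + 1
    is_alone = 1 if family_size == 1 else 0
    return {**passenger, "FamilySize": family_size, "IsAlone": is_alone}


brown = add_family_features({"PassengerId": 195, "SibSp": 0, "ParCh": 0})
boulos = add_family_features({"PassengerId": 972, "SibSp": 1, "ParCh": 1})

print(brown["FamilySize"], brown["IsAlone"])    # 1 1
print(boulos["FamilySize"], boulos["IsAlone"])  # 3 0
```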
5
6
Correlations between feature pairs
Survived – Sex           -0.54
Survived – P Class       -0.34
Survived – Fare bin       0.30
Survived – Embarked      -0.17
Fare bin – P Class       -0.63
Fare bin – Family Size    0.47
Age bin – P Class        -0.36
Age bin – Title code      0.32
7
Models
Coded in Python with scikit-learn (sklearn) and XGBoost
Mean accuracy under Cross Validation (CV)

Model                    Train Accuracy Mean   Test Accuracy Mean
XGBClassifier            85.6%                 82.9%
SVC                      83.7%                 82.6%
RandomForestClassifier   89.0%                 82.2%
DecisionTreeClassifier   89.5%                 82.0%
KNeighborsClassifier     85.0%                 81.4%
RidgeClassifierCV        79.7%                 79.4%
LogisticRegressionCV     79.7%                 79.1%
GaussianNB               79.5%                 78.1%
SGDClassifier            73.5%                 73.2%
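To make "train accuracy mean" and "test accuracy mean" under cross validation concrete, here is a standard-library-only sketch. It is not the author's pipeline, which used scikit-learn and XGBoost. The toy "model" (majority label per sex), the helper names, and the ten rows of data are all assumptions made up for illustration.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(rows)


def fit_majority_by_sex(rows, labels):
    """Toy model: predict the majority label seen for each sex during training."""
    counts = {}
    for row, y in zip(rows, labels):
        counts.setdefault(row["Sex"], [0, 0])[y] += 1
    majority = {sex: int(c[1] > c[0]) for sex, c in counts.items()}
    return lambda row: majority.get(row["Sex"], 0)


def cross_validate(rows, labels, fit, k=5):
    """Return (mean train accuracy, mean test accuracy) over k folds."""
    train_scores, test_scores = [], []
    for train_idx, test_idx in k_fold_splits(len(rows), k):
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        test_rows = [rows[i] for i in test_idx]
        test_labels = [labels[i] for i in test_idx]
        model = fit(train_rows, train_labels)  # fit on the training folds only
        train_scores.append(accuracy(model, train_rows, train_labels))
        test_scores.append(accuracy(model, test_rows, test_labels))
    return sum(train_scores) / k, sum(test_scores) / k


# Made-up toy data: alternating male/female rows with invented labels.
rows = [{"Sex": "male" if i % 2 == 0 else "female"} for i in range(10)]
labels = [0, 1, 0, 1, 0, 1, 1, 0, 0, 1]

mean_train, mean_test = cross_validate(rows, labels, fit_majority_by_sex, k=5)
print(f"train accuracy mean = {mean_train:.1%}, test accuracy mean = {mean_test:.1%}")
```

A large gap between the two means, as in the DecisionTreeClassifier row above, is the usual sign of overfitting.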
8
Steps are easy to interpret
Very complicated logic
Tree is too deep
Prone to overfitting
9
Hand-made tree: Observation, Model, Accuracy

No split
Observation: Train 891 = Survivors 342 (38%) + Victims 549 (62%)
Model: predict all 891 passengers as victims
Accuracy: 549 / 891 = 62%

Split by sex
Observation: Train 891
  Male 577 (65%): Survivors 109 (19%), Victims 468 (81%)
  Female 314 (35%): Survivors 233 (74%), Victims 81 (26%)
Model: predict all 577 males as victims and all 314 females as survivors
Accuracy: (468 + 233) / 891 = 701 / 891 = 79%

With one more layer, a hand-made tree can reach 82% accuracy.
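The accuracy arithmetic for the two hand-made rules can be checked with a few lines of Python. This is only a sketch of the calculation, with hypothetical variable names. The female counts are derived from the overall totals (342 survivors, 549 victims) and the male counts, which gives 233 female survivors and 81 female victims, so that the four cells sum to 891.

```python
TOTAL = 891
SURVIVORS, VICTIMS = 342, 549
MALE_SURVIVORS, MALE_VICTIMS = 109, 468

# Derive the female cells from the totals so the table is self-consistent.
FEMALE_SURVIVORS = SURVIVORS - MALE_SURVIVORS  # 233
FEMALE_VICTIMS = VICTIMS - MALE_VICTIMS        # 81

# Rule 1: predict that everyone is a victim.
baseline_accuracy = VICTIMS / TOTAL

# Rule 2: predict males as victims and females as survivors.
sex_rule_accuracy = (MALE_VICTIMS + FEMALE_SURVIVORS) / TOTAL

print(f"all victims: {baseline_accuracy:.1%}")  # all victims: 61.6%
print(f"split by sex: {sex_rule_accuracy:.1%}")  # split by sex: 78.7%
```

Rounded to whole percentages, these are the 62% and 79% quoted on the slide.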
10
Before Tuning:
Training Score = 89.5%
Test Score = 82.05%
After Tuning:
(Best max_depth = 4)
Training Score = 89.4%
Test Score = 87.4%
Tuning max_depth alleviates the overfitting
11
• Kaggle is a convenient platform to study and practice machine learning.
• Python code can be executed directly on the host server from the browser.
• Numerous datasets are provided on the site, including training and test data.
• Once the prediction file is submitted, a score is returned to evaluate your model.
• Many developers share runnable code with detailed explanations.
• Applying artificial intelligence blindly, without human intelligence, is dangerous.
• Some ML models can be too complicated, leading to overfitting.
• The performance of some ML models can be worse than that of a simple hand-made model.
• Combining AI and human logic can make the analytical process enjoyable and reliable.
Python code of the project at Kaggle: https://www.kaggle.com/dingli/titanic-survivor-prediction-machine-learning
