Titanic Survivor Prediction by Machine Learning
Ding Li 2018.05
online store: costumejewelry1.com
2
April 15, 1912: Titanic sank
History:
People: 2,224
Crew: 908 (survivors 212, victims 696)
Passengers: 1,316 (survivors 498, victims 818)
Kaggle Machine Learning (ML) project:
Passengers: 1,309
Train: 891 (survivors 342, victims 549), used for modeling
Test: 418 (survivors: who? victims: who?), used for prediction
3
Train example:
PassengerId 195
Survived 1
Pclass 1
Name* Brown, Mrs. James Joseph (Margaret Tobin)
Sex female
Age 44
SibSp 0
ParCh 0
Ticket** PC 17610
Fare 27.7208
Cabin*** B4
Embarked C
(Margaret Brown was played by Kathy Bates in the movie Titanic.)
Test example:
PassengerId 972
Survived (need to predict)
Pclass 3
Name* Boulos, Master. Akar
Sex male
Age 6
SibSp 1
ParCh 1
Ticket** 2678
Fare 15.2458
Cabin*** (blank)
Embarked C
Goal: predict Survived (1/0) for every test passenger
PassengerId Survived
892 1/0
893 1/0
…… ……
1309 1/0
*Title can be extracted from Name.
**Ticket is not informative and is not used.
***Cabin is mostly missing and is not used.
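A minimal sketch of how a title might be pulled out of the Name field. The helper name `extract_title` and the regular expression are assumptions for illustration, not the code used in the project; the pattern assumes names follow the "Surname, Title. Given names" layout seen in the two examples above.

```python
import re

# Assumed layout: "Surname, Title. Given names" (as in the two example records).
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name: str) -> str:
    """Return the text between the first comma and the following period."""
    match = TITLE_PATTERN.search(name)
    # Fall back to a placeholder when the name does not follow the layout.
    return match.group(1).strip() if match else "Unknown"


names = [
    "Brown, Mrs. James Joseph (Margaret Tobin)",
    "Boulos, Master. Akar",
]
titles = [extract_title(n) for n in names]
print(titles)
```

The extracted titles could then be mapped to integer codes before modeling; how the project encodes them is not stated on the slide.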
Age, Fare: missing data replaced with median value; Embarked: missing data replaced with mode value
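The imputation rule above can be sketched with the standard library alone. The sample values below are made up for illustration and are not taken from the dataset; `fill_missing` is a hypothetical helper.

```python
from statistics import median, mode


def fill_missing(values, filler):
    """Replace None entries with the given filler value."""
    return [filler if v is None else v for v in values]


# Invented sample columns; None marks a missing entry.
ages = [44, 6, None, 30, 22]
embarked = ["C", "S", None, "S", "Q"]

# Numeric column: fill with the median of the known values.
age_median = median([v for v in ages if v is not None])
ages_filled = fill_missing(ages, age_median)

# Categorical column: fill with the most frequent known value (the mode).
embarked_mode = mode([v for v in embarked if v is not None])
embarked_filled = fill_missing(embarked, embarked_mode)

print(ages_filled)
print(embarked_filled)
```

The median is less sensitive to extreme fares or ages than the mean, which is one plausible reason for choosing it here.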
4
Embarked from
S: Southampton C: Cherbourg Q: Queenstown
SibSp: # of Siblings or Spouse
ParCh: # of Parents or Children
Family Size = SibSp + ParCh + 1
Is Alone = 1 if Family Size = 1; 0 if Family Size > 1
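As a sketch, the two derived features can be computed per passenger as follows. The function name `add_family_features` and the dictionary layout are assumptions; the field spellings follow the slide.

```python
def add_family_features(passenger: dict) -> dict:
    """Return a copy of the record with FamilySize and IsAlone added."""
    # The passenger counts as one member of their own family.
    family_size = passenger["SibSp"] + passenger["ParCh"] + 1
    return {
        **passenger,
        "FamilySize": family_size,
        "IsAlone": 1 if family_size == 1 else 0,
    }


# The two example passengers shown earlier in the deck.
brown = add_family_features({"PassengerId": 195, "SibSp": 0, "ParCh": 0})
boulos = add_family_features({"PassengerId": 972, "SibSp": 1, "ParCh": 1})
print(brown["FamilySize"], brown["IsAlone"])
print(boulos["FamilySize"], boulos["IsAlone"])
```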
5
6
Survived vs. Sex: -0.54
Survived vs. P Class: -0.34
Survived vs. Fare bin: 0.3
Survived vs. Embarked: -0.17
Fare bin vs. P Class: -0.63
Fare bin vs. Family Size: 0.47
Age bin vs. P Class: -0.36
Age bin vs. Title code: 0.32
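The slide does not say how these values were computed; assuming they are Pearson correlation coefficients between encoded columns, a pair of columns could be compared as in this sketch. The data below is invented toy data, and the coding male = 1, female = 0 is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy columns, not Titanic records (assumed coding: male = 1, female = 0).
sex_code = [1, 1, 1, 0, 0, 0]
survived = [0, 0, 1, 1, 1, 0]
r = pearson(sex_code, survived)
print(round(r, 2))
```

A negative value under this coding means that the group coded 1 survived less often than the group coded 0.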
7
Models (coded in Python with Sklearn and Xgboost), evaluated by Cross Validation (CV)
Model                     Train Accuracy Mean   Test Accuracy Mean
XGBClassifier             85.6%                 82.9%
SVC                       83.7%                 82.6%
RandomForestClassifier    89.0%                 82.2%
DecisionTreeClassifier    89.5%                 82.0%
KNeighborsClassifier      85.0%                 81.4%
RidgeClassifierCV         79.7%                 79.4%
LogisticRegressionCV      79.7%                 79.1%
GaussianNB                79.5%                 78.1%
SGDClassifier             73.5%                 73.2%
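To show what the mean train and test accuracies from cross validation stand for, here is a dependency-free sketch of k-fold cross validation. It is not the project's code: the toy model (majority label per feature value), the helper names, and the ten data rows are all assumptions made for illustration.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_majority_by_group(rows):
    """Toy model: remember the most common label for each feature value."""
    counts = {}
    for feature, label in rows:
        counts.setdefault(feature, [0, 0])[label] += 1
    return {f: (1 if c[1] > c[0] else 0) for f, c in counts.items()}


def accuracy(model, rows):
    """Fraction of rows whose label matches the model's prediction."""
    return mean([1 if model.get(f, 0) == y else 0 for f, y in rows])


def cross_validate(rows, k=5):
    """Return (mean train accuracy, mean test accuracy) over k folds."""
    train_scores, test_scores = [], []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        train = [rows[i] for i in train_idx]
        test = [rows[i] for i in test_idx]
        model = fit_majority_by_group(train)
        train_scores.append(accuracy(model, train))
        test_scores.append(accuracy(model, test))
    return mean(train_scores), mean(test_scores)


# Invented (sex, survived) rows for illustration only.
rows = [("female", 1), ("male", 0), ("female", 1), ("male", 0), ("male", 1),
        ("female", 0), ("male", 0), ("female", 1), ("male", 0), ("female", 1)]
train_mean, test_mean = cross_validate(rows, k=5)
print(train_mean, test_mean)
```

A large gap between the two means, as for DecisionTreeClassifier in the table, is the usual sign of overfitting.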
8
Steps are easy to interpret
Very complicated logic
Tree is too deep
Prone to overfitting
9
Hand-made model 1
Observation: Train 891 = Survivors 342 (38%) + Victims 549 (62%)
Model: predict all 891 as Victims
Accuracy: 549 / 891 = 62%
Hand-made model 2
Observation: Train 891 = Male 577 (65%) + Female 314 (35%)
Male: Survivors 109 (19%), Victims 468 (81%)
Female: Survivors 233 (74%), Victims 81 (26%)
Model: predict all 577 males as Victims and all 314 females as Survivors
Accuracy: (468 + 233) / 891 = 701 / 891 = 79%
With one more layer, the hand-made tree can reach 82% accuracy
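The accuracy arithmetic for the two hand-made rules can be checked with a few lines. This is only a sketch: the female counts are derived here from the training totals (342 survivors, 549 victims) and the male counts, which is an assumption about how the figures fit together.

```python
# Counts from the training set of 891 passengers.
TOTAL = 891
SURVIVORS, VICTIMS = 342, 549
MALE_SURVIVORS, MALE_VICTIMS = 109, 468

# Female counts derived from the totals (assumption: totals and male counts are exact).
FEMALE_SURVIVORS = SURVIVORS - MALE_SURVIVORS
FEMALE_VICTIMS = VICTIMS - MALE_VICTIMS


def accuracy(correct, total=TOTAL):
    """Share of passengers whose outcome the rule predicts correctly."""
    return correct / total


# Rule 1: predict that everyone is a victim.
all_victims_acc = accuracy(VICTIMS)

# Rule 2: predict males as victims and females as survivors.
sex_rule_acc = accuracy(MALE_VICTIMS + FEMALE_SURVIVORS)

print(f"{all_victims_acc:.0%}", f"{sex_rule_acc:.0%}")
```

Both rules need no training at all, which is what makes them a useful baseline for the models in the comparison table.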
10
Before tuning:
Training Score = 89.5%
Test Score = 82.05%
After tuning (best max_depth = 4):
Training Score = 89.4%
Test Score = 87.4%
Tuning alleviates the overfitting
11
• Kaggle is a convenient platform to study and practice machine learning.
• Python code can be executed directly on the host server from the browser.
• Numerous datasets are provided on the site, including training and test data.
• Once the prediction file is submitted, a score is returned to evaluate your model.
• Many developers share runnable code with detailed explanations.
• Applying artificial intelligence blindly, without human intelligence, is dangerous.
• Some ML models can be too complicated, leading to overfitting.
• Some ML models can perform worse than a simple hand-made model.
• Combining AI and human logic can make the analytical process enjoyable and reliable.
Python code for the project on Kaggle: https://www.kaggle.com/dingli/titanic-survivor-prediction-machine-learning