SlideShare a Scribd company logo
1 of 33
By
MD.
RANA
MAHMUD
Predicting Titanic Passenger
Survival Using Machine Learning
1
Table of Context
• Introduction
• Overview of the problem
• Questions to be answered
• Data Description
• Descriptive Analysis
• Methodology
• Data Processing
• Predicting Survival
• Summary Finding
2
Introduction
I figure life’s a gift and I don’t intend on wasting
it.
-Jack; Titanic Movie
3
Data Description
• Training Data
• 891 passenger data
• 12 variables
• Test Data
• 418 passenger data
• 11 variables
4
Data Dictionary
Variable Definition Key
survival Survival 0 = No, 1 = Yes
pclass Ticket class 1 = 1st, 2 = 2nd, 3 =
3rd
sex Sex
Age Age in years
sibsp # of siblings / spouses
aboard the Titanic
parch # of parents / children
aboard the Titanic
ticket Ticket number
fare Passenger fare
cabin Cabin number
embarked Port of Embarkation C = Cherbourg, Q =
Queenstown, S =
Southampton
5
Data Sample
PassengerIdSurvived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 0 3 Braund, Mr. Owen Harrismale 22 1 0 A/5 21171 7.25 S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Lainafemale 26 0 0 STON/O2. 31012827.925 S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)female 35 1 0 113803 53.1 C123 S
5 0 3 Allen, Mr. William Henrymale 35 0 0 373450 8.05 S
6 0 3 Moran, Mr. Jamesmale 0 0 330877 8.4583 Q
7 0 1 McCarthy, Mr. Timothy Jmale 54 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonardmale 2 3 1 349909 21.075 S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female 27 0 2 347742 11.1333 S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem)female 14 1 0 237736 30.0708 C
11 1 3 Sandstrom, Miss. Marguerite Rutfemale 4 1 1 PP 9549 16.7 G6 S
12 1 1 Bonnell, Miss. Elizabethfemale 58 0 0 113783 26.55 C103 S
13 0 3 Saundercock, Mr. William Henrymale 20 0 0 A/5. 2151 8.05 S
14 0 3 Andersson, Mr. Anders Johanmale 39 1 5 347082 31.275 S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfinafemale 14 0 0 350406 7.8542 S
16 1 2 Hewlett, Mrs. (Mary D Kingcome)female 55 0 0 248706 16 S
17 0 3 Rice, Master. Eugenemale 2 4 1 382652 29.125 Q
18 1 2 Williams, Mr. Charles Eugenemale 0 0 244373 13 S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)female 31 1 0 345763 18 S
6
Titanic Cross Section
7
Overview of the problem?
• Analyzing survival pattern
• Predicting passenger survival using existing
information
8
Descriptive Analysis
9
10
11
12
13
14
15
16
17
18
19
Missing Map in Training Data
20
Methodology
• Decision Tree
• Random Forest
• Analysis of Variance
21
Decision Tree
22
Random Forest
23
Data Processing
• Children – Age < 14
• Mother – Sex = Female & Parch 0 & Age > 18
• Title - First Part of Name before ,
• Deck - Frist Letter of Cabin
• Ticket Group
• Fare < 50 = Low
• Fare > 50 & Fare <= 100 = med1
• Fare > 100 & Fare <= 150 = med2
• Fare >= 150 & Fare <= 500 = high
• Fare > 500 = vhigh
• Family member = SibSp + Parch + 1
• Family Type
• Family Size == 1 = Single
• Family Size > 1 & Family Size <5 = Smaller
• Family Size > 4 = Large
24
Missing Value Replacement
• Age = Pclass + Mother + FamilySize + Sex +
SibSp + Parch + Deck + Fare + Embarked +
Title + FamilyID + FamilySizeGroup +
FamilySize
• Embarked == “” by “S”
• Fare by Median Fare
25
26
Prediction
Call:
randomForest(x = rftrain01, y = rflabel, ntree = 1000, importance =
TRUE)
Type of random forest: classification
Number of trees: 1000
No. of variables tried at each split: 3
OOB estimate of error rate: 17.4%
Confusion matrix:
0 1 class.error
0 491 58 0.1056466
1 97 245 0.2836257
27
28
library(party)
set.seed(291)
fit2 <- cforest(as.factor(Survived) ~ Pclass + Sex + Age + SibSp
+ Parch + Fare + Embarked + Title + FamilySize + FamilyID,
data = train, controls=cforest_unbiased(ntree=2000,
mtry=3))
MyPredict <- predict(fit2, test, OOB=TRUE, type = "response")
predict7th <- data.frame(PassengerId = test$PassengerId,
Survived = MyPredict)
29
Model
Survived = Pclass + Sex + Age + SibSp + Parch +
Fare + Embarked + Title + FamilySize + FamilyID
Ntree = 2000
mtry = 3
30
Summary
Accuracy 81.3%
31
Questions
32
Thank You
33

More Related Content

What's hot

R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?Villu Ruusmann
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionTe-Yen Liu
 
Data mining & predictive analytics for US Airlines' performance
Data mining & predictive analytics for US Airlines' performanceData mining & predictive analytics for US Airlines' performance
Data mining & predictive analytics for US Airlines' performanceAkiso Yadav
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language Aayush Kumar
 
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTIONYOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTIONIRJET Journal
 
Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Shubham Gupta
 
Classification and Detection of Vehicles using Deep Learning
Classification and Detection of Vehicles using Deep LearningClassification and Detection of Vehicles using Deep Learning
Classification and Detection of Vehicles using Deep Learningijtsrd
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning ProjectAbhishek Singh
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image IdentificationVenkat Projects
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Syed Atif Naseem
 
Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learningmohdshoaibuddin1
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Image proceesing with matlab
Image proceesing with matlabImage proceesing with matlab
Image proceesing with matlabAshutosh Shahi
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsUmasree Raghunath
 

What's hot (20)

R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?R, Scikit-Learn and Apache Spark ML - What difference does it make?
R, Scikit-Learn and Apache Spark ML - What difference does it make?
 
Airline delay prediction
Airline delay predictionAirline delay prediction
Airline delay prediction
 
Flight Delay Prediction
Flight Delay PredictionFlight Delay Prediction
Flight Delay Prediction
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Data mining & predictive analytics for US Airlines' performance
Data mining & predictive analytics for US Airlines' performanceData mining & predictive analytics for US Airlines' performance
Data mining & predictive analytics for US Airlines' performance
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language
 
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTIONYOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION
YOLOv5 BASED WEB APPLICATION FOR INDIAN CURRENCY NOTE DETECTION
 
Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)Flight Delay Prediction Model (2)
Flight Delay Prediction Model (2)
 
Classification and Detection of Vehicles using Deep Learning
Classification and Detection of Vehicles using Deep LearningClassification and Detection of Vehicles using Deep Learning
Classification and Detection of Vehicles using Deep Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image Identification
 
Statistical Pattern recognition(1)
Statistical Pattern recognition(1)Statistical Pattern recognition(1)
Statistical Pattern recognition(1)
 
Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learning
 
Deep Learning for Lung Cancer Detection
Deep Learning for Lung Cancer DetectionDeep Learning for Lung Cancer Detection
Deep Learning for Lung Cancer Detection
 
Ada boost
Ada boostAda boost
Ada boost
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Image proceesing with matlab
Image proceesing with matlabImage proceesing with matlab
Image proceesing with matlab
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 

Recently uploaded

如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...BabaJohn3
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshareraiaryan448
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjadimosmejiaslendon
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证ju0dztxtn
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchersdarmandersingh4580
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...Amil baba
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证pwgnohujw
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksBoston Institute of Analytics
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeBoston Institute of Analytics
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationmuqadasqasim10
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 

Recently uploaded (20)

如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 

Titanic Survival Prediction Using Machine Learning

  • 2. Table of Context • Introduction • Overview of the problem • Questions to be answered • Data Description • Descriptive Analysis • Methodology • Data Processing • Predicting Survival • Summary Finding 2
  • 3. Introduction I figure life’s a gift and I don’t intend on wasting it. -Jack; Titanic Movie 3
  • 4. Data Description • Training Data • 891 passenger data • 12 variables • Test Data • 418 passenger data • 11 variables 4
  • 5. Data Dictionary Variable Definition Key survival Survival 0 = No, 1 = Yes pclass Ticket class 1 = 1st, 2 = 2nd, 3 = 3rd sex Sex Age Age in years sibsp # of siblings / spouses aboard the Titanic parch # of parents / children aboard the Titanic ticket Ticket number fare Passenger fare cabin Cabin number embarked Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton 5
  • 6. Data Sample PassengerIdSurvived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 1 0 3 Braund, Mr. Owen Harrismale 22 1 0 A/5 21171 7.25 S 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)female 38 1 0 PC 17599 71.2833 C85 C 3 1 3 Heikkinen, Miss. Lainafemale 26 0 0 STON/O2. 31012827.925 S 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)female 35 1 0 113803 53.1 C123 S 5 0 3 Allen, Mr. William Henrymale 35 0 0 373450 8.05 S 6 0 3 Moran, Mr. Jamesmale 0 0 330877 8.4583 Q 7 0 1 McCarthy, Mr. Timothy Jmale 54 0 0 17463 51.8625 E46 S 8 0 3 Palsson, Master. Gosta Leonardmale 2 3 1 349909 21.075 S 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female 27 0 2 347742 11.1333 S 10 1 2 Nasser, Mrs. Nicholas (Adele Achem)female 14 1 0 237736 30.0708 C 11 1 3 Sandstrom, Miss. Marguerite Rutfemale 4 1 1 PP 9549 16.7 G6 S 12 1 1 Bonnell, Miss. Elizabethfemale 58 0 0 113783 26.55 C103 S 13 0 3 Saundercock, Mr. William Henrymale 20 0 0 A/5. 2151 8.05 S 14 0 3 Andersson, Mr. Anders Johanmale 39 1 5 347082 31.275 S 15 0 3 Vestrom, Miss. Hulda Amanda Adolfinafemale 14 0 0 350406 7.8542 S 16 1 2 Hewlett, Mrs. (Mary D Kingcome)female 55 0 0 248706 16 S 17 0 3 Rice, Master. Eugenemale 2 4 1 382652 29.125 Q 18 1 2 Williams, Mr. Charles Eugenemale 0 0 244373 13 S 19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)female 31 1 0 345763 18 S 6
  • 8. Overview of the problem? • Analyzing survival pattern • Predicting passenger survival using existing information 8
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 13
  • 14. 14
  • 15. 15
  • 16. 16
  • 17. 17
  • 18. 18
  • 19. 19
  • 20. Missing Map in Training Data 20
  • 21. Methodology • Decision Tree • Random Forest • Analysis of Variance 21
  • 24. Data Processing • Children – Age < 14 • Mother – Sex = Female & Parch 0 & Age > 18 • Title - First Part of Name before , • Deck - Frist Letter of Cabin • Ticket Group • Fare < 50 = Low • Fare > 50 & Fare <= 100 = med1 • Fare > 100 & Fare <= 150 = med2 • Fare >= 150 & Fare <= 500 = high • Fare > 500 = vhigh • Family member = SibSp + Parch + 1 • Family Type • Family Size == 1 = Single • Family Size > 1 & Family Size <5 = Smaller • Family Size > 4 = Large 24
  • 25. Missing Value Replacement • Age = Pclass + Mother + FamilySize + Sex + SibSp + Parch + Deck + Fare + Embarked + Title + FamilyID + FamilySizeGroup + FamilySize • Embarked == “” by “S” • Fare by Median Fare 25
  • 26. 26
  • 27. Prediction Call: randomForest(x = rftrain01, y = rflabel, ntree = 1000, importance = TRUE) Type of random forest: classification Number of trees: 1000 No. of variables tried at each split: 3 OOB estimate of error rate: 17.4% Confusion matrix: 0 1 class.error 0 491 58 0.1056466 1 97 245 0.2836257 27
  • 28. 28
  • 29. library(party) set.seed(291) fit2 <- cforest(as.factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked + Title + FamilySize + FamilyID, data = train, controls=cforest_unbiased(ntree=2000, mtry=3)) MyPredict <- predict(fit2, test, OOB=TRUE, type = "response") predict7th <- data.frame(PassengerId = test$PassengerId, Survived = MyPredict) 29
  • 30. Model Survived = Pclass + Sex + Age + SibSp + Parch + Fare + Embarked + Title + FamilySize + FamilyID Ntree = 2000 mtry = 3 30