TITANIC
TITANIC PROJECT (R software used)
Cleaning the data
• Handling missing values
• Choosing the right attributes
• Converting the target to a categorical type
• Adding a new column (“Survived”) to the testing data
Handling missing values
• The Cabin column, which has very few values, is removed from the data
• Blank strings are converted to NA
• NA values in the Age column are filled with the column mean (the
distribution of ages is roughly symmetric, so the mean is a good
measure of center)
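The three cleaning steps above could be sketched in base R roughly as follows. The data frame is a small made-up stand-in for the training file, and the name `train` is an assumption, not taken from the slides.

```r
# Toy stand-in for the training data (values are invented for illustration).
train <- data.frame(
  Age      = c(22, NA, 38, 26, NA, 54),
  Cabin    = c("", "C85", "", "", "", "E46"),
  Embarked = c("S", "C", "", "S", "Q", "S"),
  stringsAsFactors = FALSE
)

# Convert blank strings to NA in every character column.
is_chr <- sapply(train, is.character)
train[is_chr] <- lapply(train[is_chr], function(x) {
  x[which(x == "")] <- NA
  x
})

# Remove the sparsely populated Cabin column.
train$Cabin <- NULL

# Fill missing ages with the mean of the observed ages.
train$Age[is.na(train$Age)] <- mean(train$Age, na.rm = TRUE)

print(train)
```

Mean imputation keeps the column average unchanged but shrinks its variance, which is one reason it is usually reserved for roughly symmetric variables.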
Choosing the appropriate attributes
Use logistic regression to examine the
relationship between the target variable
(“Survived”) and each of the other
attributes, one at a time. A small p-value is
evidence that an attribute is associated
with the target.
Example: the p-value for
“Survived” vs. “Fare” is 9.79e-12 (very
small), so “Fare” has an impact on
“Survived”.
Another indicator is a stacked bar
graph (shown on the original slide),
in which a female has a much higher
probability of surviving than a male (the gray
area shows the proportion of survivors).
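A minimal sketch of the one-attribute-at-a-time screening, using simulated data rather than the Titanic file. The helper `single_p` and the columns `Noise` and `toy` are hypothetical names introduced for illustration.

```r
set.seed(1)
n <- 200

# Simulated data (not the Titanic file): survival odds rise with Fare,
# while Noise is unrelated to the outcome by construction.
toy <- data.frame(Fare = runif(n, 5, 100), Noise = rnorm(n))
toy$Survived <- rbinom(n, 1, plogis(-2 + 0.04 * toy$Fare))

# Fit Survived ~ <one attribute> and return the p-value of its coefficient.
single_p <- function(attr, data) {
  fit <- glm(reformulate(attr, response = "Survived"),
             data = data, family = binomial)
  coef(summary(fit))[2, "Pr(>|z|)"]
}

p_values <- sapply(c("Fare", "Noise"), single_p, data = toy)
print(p_values)
```

A small p-value signals an association but not its size, so it helps to look at the coefficient or at a plot such as the stacked bar graph as well.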
Converting attribute types
The columns “Pclass” and “Survived” are converted into
categorical variables using the as.factor function.
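The conversion might look like the following sketch. The data frame is a made-up example, and `cols` is just an illustrative variable name.

```r
# Made-up rows standing in for the training data.
train <- data.frame(Pclass = c(3, 1, 3, 2), Survived = c(0, 1, 1, 0))

# Convert both columns from numeric to factor in one step.
cols <- c("Pclass", "Survived")
train[cols] <- lapply(train[cols], as.factor)

str(train)
```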
Adding a new column
A new column “Survived” is added to the testing dataset so we can make
predictions using the model built from the training set.
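One possible way to add the placeholder column, assuming the target is stored as a factor with levels "0" and "1". The slides do not say how the column was initialized, so the all-NA factor here is an assumption.

```r
# Made-up rows standing in for the testing data.
test <- data.frame(PassengerId = 892:894, Pclass = c(3, 3, 2))

# Add an empty Survived column with the same levels as the training target.
test$Survived <- factor(rep(NA, nrow(test)), levels = c("0", "1"))

str(test)
```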
How to approach this problem?
One would expect age and gender, as well as fare and class, to be very
important criteria in deciding which passengers were saved.
What data analysis shows
Most females survived (74%)
What data analysis shows
Most passengers who embarked
at Cherbourg survived (55%) while
only 34% of passengers who
embarked at Southampton did.
What data analysis shows
Survival rate is highest in class 1 (63%)
and lowest in class 3 (only 24%)
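Group-wise survival rates like those on the last three slides can be computed with `tapply`. The sketch below uses a tiny invented data set, so its numbers do not match the slide figures, and `rate_by` is a hypothetical helper.

```r
# Invented rows for illustration; Survived is kept numeric (0/1) here
# so that its mean is the survival rate.
toy <- data.frame(
  Sex      = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Pclass   = c(1, 3, 1, 2, 3, 3, 1, 2),
  Survived = c(1, 1, 1, 0, 0, 0, 1, 0)
)

# Survival rate within each level of a grouping column.
rate_by <- function(data, group) tapply(data$Survived, data[[group]], mean)

by_sex   <- rate_by(toy, "Sex")
by_class <- rate_by(toy, "Pclass")
print(by_sex)
print(by_class)
```

The same helper would work for the port of embarkation by passing "Embarked" as the grouping column.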
Question 1
The higher survival rate among females and among first-class
passengers may be understandable (special treatment).
But why do Cherbourg passengers have a higher survival rate?
What the data shows
Fifty percent of the people who
embarked at Cherbourg are first-class
passengers. This helps explain
why the survival rate among
Cherbourg passengers is higher
than among passengers who
embarked at Southampton or
Queenstown.
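The class mix per port can be checked with a row-normalized contingency table. This is a sketch on invented rows, arranged so that half of the Cherbourg ("C") passengers are first class, and `class_share` is an illustrative name.

```r
# Invented rows for illustration only.
toy <- data.frame(
  Embarked = c("C", "C", "C", "C", "S", "S", "S", "S", "Q", "Q"),
  Pclass   = c(1, 1, 2, 3, 1, 3, 3, 3, 3, 3)
)

# Share of each passenger class within each port (rows sum to 1).
class_share <- prop.table(table(toy$Embarked, toy$Pclass), margin = 1)
print(round(class_share, 2))
```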
Question 2
Why is the survival rate much higher among females? Is it because of
gender or wealth?
How did we solve this question?
• Find the overall survival rate among females (74%) and compare it with
the rate among males (19%)
How did we solve this question?
• Find the probability that a female survives given that she is a first-class
passenger.
• The probability that a woman survives is 74%, against 19% for a man
(previous slide), but the probability that a woman survives given that
she is in class 1 is almost 97%. This is consistent with the high
survival rate in Pclass 1 (63%) and the higher percentage of females
than males in Pclass 1 (30% vs. 20%).
• Conclusion: the high survival rate among females reflects a combination of
gender and wealth
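The conditional probability in the second bullet is just a survival rate over a narrower subset. A sketch on invented rows (the numbers do not reproduce the 74% and 97% from the slides), with illustrative variable names:

```r
# Invented rows for illustration only.
toy <- data.frame(
  Sex      = c("female", "female", "female", "female",
               "male", "male", "male", "male"),
  Pclass   = c(1, 3, 1, 2, 3, 3, 1, 2),
  Survived = c(1, 1, 1, 0, 0, 0, 1, 0)
)

# P(survived | female)
p_female <- mean(toy$Survived[toy$Sex == "female"])

# P(survived | female and first class)
p_female_c1 <- mean(toy$Survived[toy$Sex == "female" & toy$Pclass == 1])

# Survival rate for every Sex x Pclass combination.
by_group <- with(toy, tapply(Survived, list(Sex, Pclass), mean))

print(c(female = p_female, female_class1 = p_female_c1))
print(by_group)
```

Comparing the rates across both dimensions at once, as in `by_group`, is what allows gender and class effects to be told apart.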
Structure
Decision tree using the classification
tree function (ctree) from the party library in R
Accuracy on training set
Predictions on testing set
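A rough sketch of the modelling step, assuming the party package is installed. The data are simulated, the formula and the 200/100 split are assumptions, and nothing here reproduces the tree or the accuracy shown on the original slides.

```r
set.seed(42)
n <- 300

# Simulated passengers (not the Titanic file).
sim <- data.frame(
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Pclass = factor(sample(1:3, n, replace = TRUE)),
  Age    = round(runif(n, 1, 70))
)
p <- plogis(1.5 * (sim$Sex == "female") - 0.6 * as.integer(sim$Pclass) + 0.3)
sim$Survived <- factor(rbinom(n, 1, p), levels = c(0, 1))

train <- sim[1:200, ]
test  <- sim[201:300, ]

# Only run the model if the party package is available.
if (requireNamespace("party", quietly = TRUE)) {
  library(party)

  # Fit a conditional inference tree on the training rows.
  tree <- ctree(Survived ~ Sex + Pclass + Age, data = train)

  # Accuracy on the training set.
  train_pred <- predict(tree)
  train_acc  <- mean(train_pred == train$Survived)
  print(train_acc)

  # Predictions on the testing set.
  test_pred <- predict(tree, newdata = test)
  print(table(test_pred))
}
```

Training accuracy tends to be optimistic, so holding out part of the training data or cross-validating would give a more honest estimate.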
Any suggestions, please share
THANK YOU
