This project is based on programming in R. It uses a dataset from Kaggle to predict the survival of the Titanic's passengers from incomplete data.
This presentation is aimed at fitting a Simple Linear Regression model in a Python program. The IDE used is Spyder. Screenshots from a working example are used for demonstration.
Aaa ped-14-Ensemble Learning: About Ensemble Learning (AminaRepo)
In this section we start discussing ensemble learning in earnest. We cover the different methods that exist to combine different models, and then implement those methods in Python.
[Notebook](https://colab.research.google.com/drive/1fNkOh7iQ_AnjNWxm3hWyR4DIGRUNwzsS)
Assessing Model Performance - Beginner's Guide (Megan Verbakel)
Introduction on how to assess the performance of a classifier model. Covers theories (bias-variance trade-off, over/under-fitting), data preparation (train/test split, cross-validation), common performance plots (e.g. ROC curve and confusion matrix), and common metrics (e.g. accuracy, precision, recall, f1-score).
24 Evaluation Metrics for Binary Classification.
For every metric, information about:
- The definition and intuition behind it,
- The non-technical explanation that you can communicate to business stakeholders,
- How to calculate or plot it,
- When you should use it.
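The metrics listed above can be made concrete with a small sketch: four of the most common ones computed directly from confusion-matrix counts. The counts below are invented purely for illustration.

```python
# A minimal sketch of four common binary-classification metrics,
# computed from confusion-matrix counts (counts are made up).
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall correctness
    precision = tp / (tp + fp)                   # of predicted positives, how many were right
    recall = tp / (tp + fn)                      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=80, fp=20, fn=10, tn=90)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Accuracy alone can mislead on imbalanced data, which is exactly why lists like the one above cover so many alternatives.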
Data Science - Part IX - Support Vector Machine (Derek Kane)
This lecture provides an overview of Support Vector Machines in a more relatable and accessible manner. We will go through some methods of calibration and diagnostics of SVM and then apply the technique to accurately detect breast cancer within a dataset.
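As a hedged sketch of the workflow this lecture describes, here is an SVM applied to scikit-learn's bundled breast-cancer dataset. This dataset is assumed as a stand-in and is not necessarily the one used in the lecture.

```python
# A minimal SVM sketch on scikit-learn's breast-cancer dataset
# (a stand-in; not necessarily the lecture's dataset).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scaling matters for SVMs: the RBF kernel is distance-based.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The pipeline wraps scaling and the classifier together so the same transformation is applied consistently at train and test time.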
A detailed talk about Random Forest and its statistical techniques for classification and regression analysis, covering terminology such as the out-of-bag (OOB) estimate of performance, the bias-variance trade-off, and model validation metrics.
Classification using L1-Penalized Logistic Regression (Setia Pramana)
L1-penalized logistic regression is commonly used for classification in high-dimensional data such as microarrays. This slide deck presents a brief overview of the algorithm.
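A minimal sketch of the idea, on synthetic "high-dimensional" data (many features, few informative ones) that mimics the microarray setting described above:

```python
# L1-penalized logistic regression on synthetic high-dimensional data
# (200 features, only 5 informative), mimicking the microarray setting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

# The liblinear solver supports the L1 penalty; C controls its strength.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)

# The L1 penalty drives most coefficients exactly to zero,
# performing implicit feature selection.
n_selected = np.count_nonzero(clf.coef_)
print(n_selected, "of", X.shape[1], "features kept")
```

This sparsity is exactly why the L1 penalty is favored when features vastly outnumber samples, as in microarray data.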
Introduction to Data Analytics, starting with OLS.
This is the first of a series of essays. I will share essays on unsupervised learning, dimensionality reduction and anomaly/outlier detection.
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:
- Show the sample mean and compare it to the theoretical mean of the distribution.
- Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
- Show that the distribution is approximately normal.
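The simulation described above can be sketched as follows. The project itself is in R, but the same idea is shown here in Python: 1000 means of 40 exponentials with lambda = 0.2, whose theoretical mean is 1/lambda = 5 and theoretical variance is (1/lambda)^2 / 40 = 0.625.

```python
# Simulate 1000 means of 40 exponentials (lambda = 0.2) and compare
# to the theory: mean 1/lambda = 5, variance (1/lambda)^2 / 40 = 0.625.
import random
import statistics

random.seed(42)
lam = 0.2
means = [statistics.fmean(random.expovariate(lam) for _ in range(40))
         for _ in range(1000)]

sample_mean = statistics.fmean(means)    # should be close to 5
sample_var = statistics.variance(means)  # should be close to 0.625
print(sample_mean, sample_var)
```

By the Central Limit Theorem, a histogram of `means` would look approximately normal even though the underlying exponentials are heavily skewed.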
Convolutional Neural Network for Text Classification (Anaïs Addad)
Work under Pr. Nolan with a team of four to implement a convolutional neural network for text classification in TensorFlow, using a dataset of Amazon reviews.
Explore how data science can be used to predict employee churn using this data science project presentation, allowing organizations to proactively address retention issues. This student presentation from the Boston Institute of Analytics showcases the methodology, insights, and implications of predicting employee turnover. Visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/ for more data science insights.
The process of converting a data set with vast dimensions into a data set with fewer dimensions, while ensuring that it conveys similar information concisely.
Concept
R code
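As an illustration of the concept above (the slides use R; this sketch uses Python and NumPy), principal component analysis is one standard way to perform such a reduction:

```python
# A minimal PCA sketch: project 5-dimensional toy data onto its
# top 2 principal components via the covariance eigendecomposition.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # toy data: 100 obs, 5 features

Xc = X - X.mean(axis=0)                 # center each feature
cov = np.cov(Xc, rowvar=False)          # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# Keep the 2 components with the largest variance.
top2 = eigvecs[:, ::-1][:, :2]
X_reduced = Xc @ top2
print(X_reduced.shape)                  # (100, 2)
```

The reduced data keeps the directions of greatest variance, which is what "conveying similar information concisely" means in practice.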
Predict Backorder on supply chain data for an Organization (Piyush Srivastava)
Performed cleaning, found the important variables, and created the best model using different classification techniques (Random Forest, Naïve Bayes, decision tree, KNN, neural network, support vector machine) to predict back-orders for an organization.
Approaches to online quantile estimation (Data Con LA)
Data Con LA 2020
Description
This talk will explore and compare several compact data structures for estimation of quantiles on streams, including a discussion of how they balance accuracy against computational resource efficiency. A new approach providing more flexibility in specifying how computational resources should be expended across the distribution will also be explained. Quantiles (e.g., median, 99th percentile) are fundamental summary statistics of one-dimensional distributions. They are particularly important for SLA-type calculations and characterizing latency distributions, but unlike their simpler counterparts such as the mean and standard deviation, their computation is somewhat more expensive. The increasing importance of stream processing (in observability and other domains) and the impossibility of exact online quantile calculation together motivate the construction of compact data structures for estimation of quantiles on streams. In this talk we will explore and compare several such data structures (e.g., moment-based, KLL sketch, t-digest) with an eye towards how they balance accuracy against resource efficiency, theoretical guarantees, and desirable properties such as mergeability. We will also discuss a recent variation of the t-digest which provides more flexibility in specifying how computational resources should be expended across the distribution. No prior knowledge of the subject is assumed. Some familiarity with the general problem area would be helpful but is not required.
Speaker
Joe Ross, Splunk, Principal Data Scientist
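For intuition about the problem the talk addresses, here is a sketch of one very simple streaming-quantile idea, the "Frugal-1U" estimator. It is far cruder than the moment-based, KLL, or t-digest sketches discussed above, but it shows how a quantile can be tracked in O(1) memory by nudging a single number toward the target quantile.

```python
# Frugal-1U: estimate a quantile of a stream using O(1) memory.
# Much simpler (and less accurate) than t-digest or KLL sketches.
import random

def frugal_quantile(stream, q):
    """Estimate the q-quantile of a numeric stream with one variable."""
    m = 0.0
    for x in stream:
        r = random.random()
        if x > m and r < q:        # value above estimate: step up w.p. q
            m += 1.0
        elif x < m and r < 1 - q:  # value below: step down w.p. 1 - q
            m -= 1.0
    return m

random.seed(1)
stream = (random.uniform(0, 100) for _ in range(100_000))
est = frugal_quantile(stream, q=0.5)   # true median is 50
print(est)
```

At equilibrium the estimate drifts around the true quantile: upward and downward steps balance exactly when a fraction q of the stream lies below the estimate.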
Machine Learning Tutorial Part - 2 | Machine Learning Tutorial For Beginners ... (Simplilearn)
This presentation on Machine Learning will help you understand what clustering is, K-Means clustering, a flowchart to understand K-Means clustering along with a demo showing clustering of cars into brands, what logistic regression is, the logistic regression curve, the sigmoid function, and a demo on how to classify a tumor as malignant or benign based on its features. Machine Learning algorithms can help computers play chess, perform surgeries, and get smarter and more personal. K-Means and logistic regression are two widely used Machine Learning algorithms which we are going to discuss in this video. Logistic regression is used to estimate discrete values (usually binary values like 0/1) from a set of independent variables. It helps to predict the probability of an event by fitting data to a logit function. It is also called logit regression. K-means clustering is an unsupervised learning algorithm. In this case, you don't have labeled data, unlike in supervised learning. You have a set of data that you want to group into clusters: objects that are similar in nature and similar in characteristics need to be put together. This is what k-means clustering is all about. Now, let us get started and understand K-Means clustering and logistic regression in detail.
Below topics are explained in this Machine Learning tutorial part -2 :
1. Clustering
- What is clustering?
- K-Means clustering
- Flowchart to understand K-Means clustering
- Demo - Clustering of cars based on brands
2. Logistic regression
- What is logistic regression?
- Logistic regression curve & Sigmoid function
- Demo - Classify a tumor as malignant or benign based on features
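The two algorithms outlined above can be sketched compactly with scikit-learn on synthetic data (standing in for the video's car-brand and tumor demos):

```python
# K-Means (unsupervised) and logistic regression (supervised)
# on synthetic data, standing in for the tutorial's demos.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification
from sklearn.linear_model import LogisticRegression

# K-Means: group unlabeled points into k clusters of similar items.
X_blobs, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)
print(len(set(kmeans.labels_)))   # number of distinct clusters found

# Logistic regression: fit labeled data to a logit (sigmoid) function
# to predict a binary outcome.
X, y = make_classification(n_samples=300, random_state=0)
logreg = LogisticRegression().fit(X, y)
print(logreg.score(X, y))          # training accuracy
```

The contrast is the point of the tutorial: K-Means receives no labels and invents groups, while logistic regression learns from known 0/1 outcomes.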
About Simplilearn Machine Learning course:
A form of artificial intelligence, Machine Learning is revolutionizing the world of computing as well as all people’s digital interactions. Machine Learning powers such innovative automated technologies as recommendation engines, facial recognition, fraud protection and even self-driving cars. This Machine Learning course prepares engineers, data scientists and other professionals with knowledge and hands-on skills required for certification and job competency in Machine Learning.
We recommend this Machine Learning training course for the following professionals in particular:
1. Developers aspiring to be a data scientist or Machine Learning engineer
2. Information architects who want to gain expertise in Machine Learning algorithms
3. Analytics professionals who want to work in Machine Learning or artificial intelligence
4. Graduates looking to build a career in data science and Machine Learning
Learn more at: https://www.simplilearn.com/
This tutorial shows the train and test set split with a histogram and a probability density function in scikit-learn on synthetic datasets. The dataset is very simple, serving as a reference for understanding.
MM - KBAC: Using mixed models to adjust for population structure in a rare-va... (Golden Helix Inc)
Confounding from population structure, extended families and inbreeding can be a significant issue for burden and kernel association tests on rare variants from next generation DNA sequencing. An obvious solution is to combine the power of a mixed model regression analysis with the ability to assess the rare variant burden using methods such as KBAC or CMC. Recent approaches have adjusted burden and kernel tests using linear regression models; this method adjusts for the relatedness of samples and includes that directly into a logistic regression model.
This webcast will focus on the details of bringing Mixed Model Regression and KBAC together, including: deriving an optimal logistic mixed model algorithm for calculating the reduced model score, how the kinship or random effects matrix should be specified, and how it all comes together into one algorithm. Results from applying the method to variants from the 1000 Genomes project will also be presented and compared to famSKAT.
Similar to Data Analysis project "TITANIC SURVIVAL" (20)
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
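For reference, here is the plain power-iteration PageRank that all of the optimizations above start from. None of the STICD optimizations are implemented here; this is only the baseline, on a tiny hand-made graph with no dangling nodes.

```python
# Baseline power-iteration PageRank (no STICD optimizations).
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """graph: dict node -> list of out-neighbors (no dangling nodes)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1 - damping) / n for v in nodes}
        for u, outs in graph.items():
            share = damping * rank[u] / len(outs)  # split rank over out-links
            for v in outs:
                new[v] += share
        done = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if done:
            break
    return rank

g = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
r = pagerank(g)
print(round(sum(r.values()), 6))   # ranks sum to 1.0
```

Each optimization in the text targets one of the two loops here: the per-edge work inside an iteration, or the number of outer iterations until convergence.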
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2.
- Import the CSV file and start exploring the dataset
- Check the dimensions of the dataset
- Call the first rows to understand the data
- Check the variables and their types
- Explore the dataset for missing values and identify their location and number
- Subset our data to obtain observations that contain no missing data by replacing the lines with N/A with some specific values
- Create new variables to help train our data
Exploratory Data Analysis
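The exploration steps above can be sketched as follows. The project itself is in R; this pandas version is only an illustration, and the tiny table here is invented, not the real Titanic data.

```python
# A pandas sketch of the exploration steps: dimensions, first rows,
# types, missing values, and replacing N/A with a specific value.
import pandas as pd

df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, None, 26.0, None],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

print(df.shape)              # dimensions of the dataset
print(df.head())             # first rows, to understand the data
print(df.dtypes)             # variables and their types
print(df.isna().sum())       # missing values per column

# Replace missing ages with a specific value (here the median).
df["Age"] = df["Age"].fillna(df["Age"].median())
print(df["Age"].isna().sum())   # 0 missing values remain
```

The same sequence (dim, head, str, is.na, imputation) exists almost one-to-one in R, which is what the slide's checklist describes.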
3.
My predictive model is based on some new variables that I had to create in order to make my predictions more accurate.
CREATE
New variables
4.
Out-of-bag (OOB) error: a method of measuring the prediction error of random forests. The smaller the error, the more accurate my model.
Random Forest: random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.
IMPORTANCE OF VARIABLES
Random Forest has a feature of presenting the important variables.
83.95% Accuracy
rf.label <- as.factor(train$Survived)
0: perish
1: survive
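The random-forest-with-OOB idea on this slide can be sketched with scikit-learn. The project itself uses R's randomForest, and the data below is synthetic, so the score will not match the 83.95% above.

```python
# Random forest with an out-of-bag (OOB) accuracy estimate,
# on synthetic data (the project itself uses R's randomForest).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each tree on the samples left out of its
# bootstrap draw, giving a built-in estimate of prediction error.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
print(rf.oob_score_)                      # OOB accuracy estimate
print(rf.feature_importances_.argmax())   # index of most important feature
```

`feature_importances_` is the scikit-learn counterpart of the variable-importance feature the slide mentions.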
5.
Using 10-fold cross-validation, divide my train.data (= 891 obs.) into 10 folds of almost the same length each: Fold1 (89 obs.), Fold2 (89 obs.), ..., Fold10 (90 obs.)
makeCluster(6, type = "SOCK")
We call these groups 1 to 10. The analysis is performed 10 times. The first time the analysis is performed, groups 1 to 9 are used to train the algorithm and group 10 is used to test the model.
I distribute the computation across 6 sockets; every socket commits a CPU and works at the same time without waiting.
Performing 10-fold cross-validation, find the cp-accuracy of the model.
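The 10-fold scheme above can be sketched with scikit-learn, with `n_jobs=6` standing in for the 6 SOCK cluster sockets created by `makeCluster` in the R version. A decision tree is used here as a simple stand-in model.

```python
# 10-fold cross-validation with 6 parallel workers, mirroring the
# 10 folds and 6-socket cluster described on the slide.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 891 observations, the same size as train.data on the slide.
X, y = make_classification(n_samples=891, random_state=0)

# cv=10 splits the data into 10 folds of nearly equal size; each fold
# takes a turn as the test set while the other 9 train the model.
scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X, y, cv=10, n_jobs=6)
print(len(scores))        # one accuracy per fold
print(scores.mean())      # cross-validated accuracy estimate
```

As on the slide, 891 observations split 10 ways gives nine folds of 89 and one of 90.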
6.
Visualization
Visualizing the data is a powerful way to understand your data well and to find the correlations within it.
This plot helps me find the surviving rate based on Pclass and the new title.
My final rpart model, which gives me the most important variables of my predictive model and, as a result, the best accuracy for my model.