Data Assessment and Analysis
1). Importing necessary packages
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
import numpy as np
2). Importing dataset as a python data frame
df=pd.read_csv(r'C:\Users\saravanan\videos\task\task_data.csv')
df.head()
   sample index  class_label   sensor0   sensor1   sensor2   sensor3   sensor4   sensor5   sensor6   sensor7   sensor8   sensor9
0       sample0          1.0  0.834251  0.726081  0.535904  0.214896  0.873788  0.767605  0.111308  0.557526  0.599650  0.665569
1       sample1          1.0  0.804059  0.253135  0.869867  0.334285  0.604075  0.494045  0.833575  0.194190  0.014966  0.802918
2       sample2          1.0  0.694404  0.595777  0.581294  0.799003  0.762857  0.651393  0.075905  0.007186  0.659633  0.831009
3       sample3          1.0  0.783690  0.038780  0.285043  0.627305  0.800620  0.486340  0.827723  0.339807  0.731343  0.892359
4       sample4          1.0  0.788835  0.174433  0.348770  0.938244  0.692065  0.377620  0.183760  0.616805  0.492899  0.930969
3). Analysis of dataset properties
print("unique class_labels and their counts\n",df.class_label.value_counts())
unique class_labels and their counts
-1.0 200
1.0 200
Name: class_label, dtype: int64
3.1) The Presence of null values in the dataset
df.isnull().sum()
sample index 0
class_label 0
sensor0 0
sensor1 0
sensor2 0
sensor3 0
sensor4 0
sensor5 0
sensor6 0
sensor7 0
sensor8 0
sensor9 0
3.2) Statistical understanding of dataset
df.describe(include="all")
        sample index  class_label     sensor0     sensor1     sensor2     sensor3     sensor4     sensor5     sensor6     sensor7     sensor8     sensor9
count            400   400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000
unique           400          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
top         sample98          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
freq               1          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
mean             NaN     0.000000    0.523661    0.509223    0.481238    0.509752    0.497875    0.501065    0.490480    0.482372    0.482822    0.541933
std              NaN     1.001252    0.268194    0.276878    0.287584    0.297712    0.288208    0.287634    0.289954    0.282714    0.296180    0.272490
min              NaN    -1.000000    0.007775    0.003865    0.004473    0.001466    0.000250    0.000425    0.000173    0.003322    0.003165    0.000452
25%              NaN    -1.000000    0.299792    0.283004    0.235544    0.262697    0.249369    0.269430    0.226687    0.242848    0.213626    0.321264
50%              NaN     0.000000    0.534906    0.507583    0.460241    0.510066    0.497842    0.497108    0.477341    0.463438    0.462251    0.578389
75%              NaN     1.000000    0.751887    0.727843    0.734937    0.768975    0.743401    0.738854    0.735304    0.732483    0.740542    0.768990
max              NaN     1.000000    0.999476    0.998680    0.992963    0.995119    0.999412    0.997367    0.997141    0.998230    0.996098    0.999465
3.3) Define Predictor_variables and Target_variable
predictor_variables=df[['sensor0', 'sensor1', 'sensor2','sensor3', 'sensor4',
'sensor5', 'sensor6', 'sensor7', 'sensor8', 'sensor9']]
target_variable=df[['class_label']]
3.4) Variance of predictor_variables data
# ddof=0 gives the population variance, the same quantity np.var computes
print("variance of all sensors\n\n",predictor_variables.var(ddof=0))
variance of all sensors
sensor0 0.071748
sensor1 0.076470
sensor2 0.082498
sensor3 0.088411
sensor4 0.082856
sensor5 0.082527
sensor6 0.083863
sensor7 0.079727
sensor8 0.087503
sensor9 0.074065
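One way to put a number on how similar these variances are is the ratio of the largest to the smallest variance. The sketch below uses the variances printed above; the helper name `variance_ratio` is an illustrative assumption and not part of the original analysis.

```python
# Population variances copied from the output above.
sensor_variances = {
    "sensor0": 0.071748, "sensor1": 0.076470, "sensor2": 0.082498,
    "sensor3": 0.088411, "sensor4": 0.082856, "sensor5": 0.082527,
    "sensor6": 0.083863, "sensor7": 0.079727, "sensor8": 0.087503,
    "sensor9": 0.074065,
}

def variance_ratio(variances):
    """Return the largest variance divided by the smallest (1.0 means identical)."""
    values = list(variances.values())
    return max(values) / min(values)

ratio = variance_ratio(sensor_variances)
print(f"max/min variance ratio: {ratio:.3f}")
```

The ratio comes out at about 1.23, so the largest variance (sensor3) is roughly 23% above the smallest (sensor0).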
3.5) Probability distributions of the predictor_variables data
sensors=predictor_variables.columns
for k in sensors:
    # density curve only; sb.distplot(..., hist=False) is deprecated in recent seaborn
    sb.kdeplot(predictor_variables[k],label=k)
plt.legend()
plt.show()
3.6) Correlation analysis of Predictor_variables data
sb.heatmap(predictor_variables.corr())
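The heatmap is read by eye. A numeric check can report the most strongly correlated pair of columns instead. This is a minimal sketch in plain Python on made-up toy columns (not values from task_data.csv); `pearson` and `max_abs_correlation` are illustrative helper names.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def max_abs_correlation(columns):
    """Return (|r|, name_a, name_b) for the most strongly correlated pair."""
    return max(
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    )

# Toy columns (illustrative values only).
toy = {
    "sensor0": [0.1, 0.4, 0.6, 0.9],
    "sensor1": [0.2, 0.8, 1.2, 1.8],   # exactly 2 * sensor0
    "sensor2": [0.9, 0.1, 0.8, 0.2],
}
r, a, b = max_abs_correlation(toy)
print(a, b, round(r, 3))
```

On the real data the same idea applies to `predictor_variables.corr()`: if the largest off-diagonal absolute value is small, the "not highly correlated" reading of the heatmap holds.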
4) Properties of dataset:
1) Our dataset has 12 columns (a sample index, a class label and 10 sensors) and 400 observations.
2) It has one class_label column whose value is either 1 or -1, and the classes are balanced with 200 samples each.
3) It has no null values.
4) All predictor_variables values lie between 0 and 1.
5) The predictor_variables look approximately normally distributed (about 95%), and the variables have homogeneity of variance.
6) The predictor_variables are not highly correlated with one another.
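Properties 2 and 4 can be checked directly from the numbers reported earlier. The sketch below copies the per-sensor minimum and maximum from the describe() output and the class counts from value_counts(); the helper names are illustrative assumptions.

```python
# Per-sensor minimum and maximum (sensor0..sensor9), copied from describe().
sensor_min = [0.007775, 0.003865, 0.004473, 0.001466, 0.000250,
              0.000425, 0.000173, 0.003322, 0.003165, 0.000452]
sensor_max = [0.999476, 0.998680, 0.992963, 0.995119, 0.999412,
              0.997367, 0.997141, 0.998230, 0.996098, 0.999465]
# Class counts copied from value_counts().
class_counts = {-1.0: 200, 1.0: 200}

def in_unit_interval(lows, highs):
    """True when every column's range lies strictly inside (0, 1)."""
    return all(0.0 < lo and hi < 1.0 for lo, hi in zip(lows, highs))

def is_balanced(counts):
    """True when every class has the same number of samples."""
    return len(set(counts.values())) == 1

print(in_unit_interval(sensor_min, sensor_max), is_balanced(class_counts))
```

The strict bounds matter for a log-based score, because log(0) is undefined when a reading is exactly 0 or 1.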
5) Assumptions and thought process:
I am going to provide solutions using two approaches.
Approach 1
1) Log loss
The predictor_variables in our dataset take continuous values between 0 and 1,
and our class_label (target_variable) is either 1 or -1. In that format the sensor values look exactly like
the outputs of a sigmoid model. If we want to know the performance or predictive power of each sensor, we can
apply the log loss metric to each sensor.
Log loss is a well-suited metric for evaluating binary classification models / sigmoid function models (logistic
or probit regression) because it gives the loss value of each predicted value. In that sense, log loss tells us
the predictive power of a binary model.
Note:
Xi denotes the already predicted values and Y denotes the actual values. I derived the formula below for our
scenario.
Y = Target_variable (class_label)
Xi …n = Predictor_variables (sensor0, sensor1...n)
n = Number of observations
Log_loss = [(Y * log(Xi)) + ((1-Y) * log(1-Xi))]
Avg_log_loss = [(-1/n) * sum(Log_loss)]
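The two formulas above can be sketched in plain Python as follows. The labels are used exactly as written (1 and -1, not 0 and 1); the clipping constant `eps` and the toy readings are assumptions added for illustration and do not come from the dataset.

```python
import math

def avg_log_loss(labels, values, eps=1e-15):
    """Average log loss using the formula exactly as written above.

    labels: actual values Y (used as-is, so 1 or -1 here)
    values: sensor readings Xi, treated as predicted probabilities
    eps:    assumed clipping constant that keeps log() away from 0
    """
    total = 0.0
    for y, x in zip(labels, values):
        x = min(max(x, eps), 1.0 - eps)  # keep x strictly inside (0, 1)
        total += y * math.log(x) + (1 - y) * math.log(1 - x)
    return -total / len(labels)

# Toy readings (illustrative only).
labels = [1, 1, -1, -1]
readings = [0.9, 0.8, 0.2, 0.1]
print(round(avg_log_loss(labels, readings), 4))
```

With Y = -1 the factor (1-Y) equals 2 and the first term changes sign, so the average can be negative. The standard log loss, which expects labels 0 and 1, is never negative.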
Output (Solution) using Log-loss:
sensors_rank    log_loss_score
sensor8         -0.132343277
sensor4         -0.011416493
sensor0          0.243350704
sensor3          0.28850512
sensor5          0.52194066
sensor7          0.608291426
sensor2          0.829876049
sensor9          0.922617133
sensor6          1.025691108
sensor1          1.5173744
Note: A lower log-loss score indicates a more accurate and more important sensor. On that basis, the sensors rank,
from best to worst: sensor8, sensor4, sensor0, sensor3, sensor5, sensor7, sensor2, sensor9, sensor6 and sensor1.
Approach 2
2) Linear Discriminant Analysis (LDA)
- It is an interesting and scalable model that provides both the predictive power of the predictors and a
model performance score (accuracy).
- LDA approaches the problem by assuming that the conditional probability density functions
P(X|Y==0) and P(X|Y==1) are both normally distributed with mean and covariance parameters. (In
our case the classes are 1 and -1.)
- It relies on statistical properties that are calculated for each class.
LDA Assumptions
- The variance of the predictor variables is the same across the groups (classes).
- LDA is used when the variance/covariance of the predictor variables is equal. (The variances of our
predictor variables are also almost the same, about 95%.)
- LDA is used when the predictor variables are normally distributed.
- LDA is used when the predictor variables are not highly correlated with one another.
(The LDA assumptions closely match the properties and assumptions of our dataset.)
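The source does not say how the per-sensor LDA scores were obtained. One common reading is that they are the weights of Fisher's discriminant direction, w = Sw^-1 (mean_a - mean_b), where Sw is the pooled within-class scatter. The sketch below computes that direction for two features only, in plain Python on toy points; the function name and the data are illustrative assumptions.

```python
def fisher_lda_direction(class_a, class_b):
    """Fisher's discriminant direction w = Sw^-1 (mean_a - mean_b) for 2-D points.

    class_a, class_b: lists of (x0, x1) pairs, one list per class.
    Sw is the pooled within-class scatter matrix (2 x 2).
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, mu):
        s00 = sum((p[0] - mu[0]) ** 2 for p in points)
        s01 = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
        s11 = sum((p[1] - mu[1]) ** 2 for p in points)
        return s00, s01, s11

    mu_a, mu_b = mean(class_a), mean(class_b)
    a00, a01, a11 = scatter(class_a, mu_a)
    b00, b01, b11 = scatter(class_b, mu_b)
    s00, s01, s11 = a00 + b00, a01 + b01, a11 + b11

    det = s00 * s11 - s01 * s01          # determinant of Sw
    d0, d1 = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # Multiply the 2 x 2 inverse of Sw by the difference of the class means.
    w0 = (s11 * d0 - s01 * d1) / det
    w1 = (-s01 * d0 + s00 * d1) / det
    return w0, w1

# Toy data (illustrative only): feature 0 separates the classes, feature 1 does not.
positive = [(0.9, 0.2), (0.8, 0.7), (0.7, 0.4), (0.8, 0.5)]
negative = [(0.2, 0.3), (0.1, 0.6), (0.3, 0.4), (0.2, 0.5)]
w = fisher_lda_direction(positive, negative)
print(w)
```

The feature that separates the classes receives the larger weight in absolute value, which is the sense in which such weights can be read as a ranking of predictors.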
Output (Solution) using LDA:
sensors_rank    LDA_Score
sensor8          8.763614
sensor4          7.333124
sensor0          5.862246
sensor3          3.493563
sensor7          2.323977
sensor9          2.178563
sensor2          0.960444
sensor5          0.782621
sensor6          0.745524
sensor1         -1.9319
Note 1: The LDA score measures predictive power. On that basis, the sensors rank, from best to worst: sensor8,
sensor4, sensor0, sensor3, sensor7, sensor9, sensor2, sensor5, sensor6 and sensor1.
Note 2: The explained variance score is 100%, which means the model fitted 100% well.
Note 3: The LDA model accuracy score is 93%.
6) Strengths:
STRENGTHS
Log Loss
- It is a measure of the performance of a classification model.
- It works better when we have binary prediction probability values.
- It provides loss values measured against the target value.
- It is a good evaluation metric for sigmoid models.
- It becomes a multi-class log loss function when the target variable has multiple classes.
LDA
- LDA not only projects features from a higher-dimensional space onto a lower-dimensional space but also
provides various useful information for evaluating the predictive power of models.
- It is a good model for finding predictive performance when we have a target variable.
- We can evaluate the model efficiently with both binary and multi-class target variables.
7) Weakness:
WEAKNESS
Log Loss
- It does not work when we don't have sigmoid values (values between 0 and 1).
LDA
- LDA does not work well if the data is not balanced.
- Sometimes LDA doesn't work well when we have too many observations.
8) Scalabilities:
SCALABILITY
Log Loss
- It provides an accurate solution for n-dimensional variables and samples.
- The number of features and samples doesn't matter because it works from a mathematical formula.
- The processing time needed to provide a solution is very low.
- We can integrate rules (Min-Max Rule, ...) into it.
LDA
- It works well when we have multiple features.
- The number of features doesn't matter in LDA.
- It takes a few seconds of processing time to provide a solution.
9) Alternative Methods:
1. Threshold Based Scoring Mechanism (TBSM)
Threshold Based Scoring Mechanism is a technique that provides a solution based on a threshold
score.
Strengths:
- We can provide a solution using simple threshold scoring.
- We can optimize the threshold values for the problem.
Weakness:
- It is a little difficult to provide a solution when we have a multi-class target variable.
- It is difficult to assign threshold values when we have a multi-class target variable.
- It takes a few more minutes of processing when we have n features.
- The code has to be modified every time it is run.
Scalability:
- It is not a good method when we have n features.
- Modification is always needed when we process new data for the same problem.
- It is not a good scalable method for evaluating model performance and power.
Output (Solution) using TBSM:
sensors_rank    Threshold_error_score
sensor8          47
sensor4          66
sensor0          80
sensor3         105
sensor9         182
sensor1         189
sensor7         190
sensor2         197
sensor6         201
sensor5         203
Note 1: Based on the threshold error score (lowest first), the sensors rank, from best to worst:
sensor8, sensor4, sensor0, sensor3, sensor9, sensor1, sensor7, sensor2, sensor6 and sensor5.
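The source does not state the threshold or the exact scoring rule behind the error scores above. As a minimal sketch under assumptions, the function below predicts class 1 when a reading is at or above a threshold of 0.5 and class -1 otherwise, then counts the misclassified samples. The threshold, the rule, the function name and the toy data are all illustrative.

```python
def threshold_error_count(labels, values, threshold=0.5):
    """Count misclassifications when predicting 1 if value >= threshold else -1."""
    errors = 0
    for y, x in zip(labels, values):
        predicted = 1 if x >= threshold else -1
        if predicted != y:
            errors += 1
    return errors

# Toy data (illustrative only, not taken from task_data.csv).
labels = [1, 1, 1, -1, -1, -1]
good_sensor = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]   # separates the classes at 0.5
weak_sensor = [0.9, 0.3, 0.4, 0.6, 0.2, 0.8]   # often on the wrong side of 0.5

print(threshold_error_count(labels, good_sensor))
print(threshold_error_count(labels, weak_sensor))
```

A sensor with fewer errors would rank higher, which matches the ascending order of the table above.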
2. Cross-Entropy Error Function:
The cross-entropy error function differs slightly depending on the context, but in machine learning,
when calculating error rates between 0 and 1, it resolves to the same thing as log loss. In our case the
sensor values play the role of the prediction probabilities.
Strengths:
- It is very similar to the log-loss function.
- It is a good loss function for machine learning and optimization.
- It deals with classifying a given set of data points into two possible classes generically labelled 0
and 1 (in our case -1 and 1).
- We can assign a vector of weights W.
- We can use cross entropy to get a measure of dissimilarity between p and q.
Weakness:
- Assigning the vector of weights W depends entirely on another optimization method such as gradient
descent.
- Assigning the vector of weights W is a little difficult when we have a multi-class target variable.
Scalability:
- It performs efficiently when we have n features because it is also a mathematical
formula applied directly to the problem.
10) Suggestions:
- LDA and log loss provide about 90% the same solution: both rank sensor8, sensor4, sensor0 and sensor3
as the top four and sensor6 and sensor1 as the bottom two.
- Log loss is a popular loss function for optimization that provides the exact loss value
of each observation against its target observation. So it is a good approach for this
model evaluation task.
- Linear Discriminant Analysis is an interesting and new approach for evaluating
predictive power, and it tells us which prediction variable (sensor) is most accurate.
- We can build further optimization techniques, models and pipelines using both.
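The agreement between the two rankings can be quantified with Spearman's rank correlation. The sketch below uses the log-loss and LDA rankings reported earlier; the function name `spearman_rho` is an illustrative assumption, and the formula used is the standard one for rankings without ties.

```python
# Rankings copied from the log-loss and LDA outputs (best sensor first).
log_loss_rank = ["sensor8", "sensor4", "sensor0", "sensor3", "sensor5",
                 "sensor7", "sensor2", "sensor9", "sensor6", "sensor1"]
lda_rank = ["sensor8", "sensor4", "sensor0", "sensor3", "sensor7",
            "sensor9", "sensor2", "sensor5", "sensor6", "sensor1"]

def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    n = len(order_a)
    position_b = {name: i for i, name in enumerate(order_b)}
    d_squared = sum((i - position_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))

rho = spearman_rho(log_loss_rank, lda_rank)
print(round(rho, 3))
```

The result is roughly 0.92, which is in line with the two approaches giving about 90% the same solution.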
Cross-entropy notation (Alternative Method 2):
Y = Target_variable (class_label)
Xi …n = Predictor_variables (sensor0, sensor1...n)
p = {Y, 1-Y}
q = {Xi, 1-Xi}
H(p, q) = -Y log(Xi) - (1-Y) log(1-Xi)
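The cross-entropy H(p, q) above can be sketched in plain Python as follows. Cross-entropy is defined for labels 0 and 1, so this sketch maps the class labels -1 and 1 to 0 and 1 first. That mapping, the clipping constant `eps` and the toy values are assumptions added for illustration.

```python
import math

def binary_cross_entropy(y, x, eps=1e-15):
    """H(p, q) = -y*log(x) - (1 - y)*log(1 - x) for a label y in {0, 1}."""
    x = min(max(x, eps), 1.0 - eps)  # assumed clipping to avoid log(0)
    return -y * math.log(x) - (1 - y) * math.log(1 - x)

def to_zero_one(label):
    """Map a class label from {-1, 1} to {0, 1} (an assumption of this sketch)."""
    return 1 if label == 1 else 0

# Toy values (illustrative only).
for label, reading in [(1, 0.9), (-1, 0.9), (-1, 0.1)]:
    h = binary_cross_entropy(to_zero_one(label), reading)
    print(label, reading, round(h, 4))
```

A reading close to 1 costs little when the label is 1 and a lot when the label is -1, which is the dissimilarity between p and q that the formula measures.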

Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 

Data Assessment and Analysis for Model Evaluation

  • 1. Data Assessment and Analysis

    1) Importing necessary packages

        import pandas as pd
        import seaborn as sb
        import matplotlib.pyplot as plt
        import numpy as np

    2) Importing dataset as a Python data frame

        df=pd.read_csv('C:Userssaravananvideostasktask_data.csv')
        df.head()

           sample index  class_label  sensor0   sensor1   sensor2   sensor3   sensor4   sensor5   sensor6   sensor7   sensor8   sensor9
        0  sample0       1.0          0.834251  0.726081  0.535904  0.214896  0.873788  0.767605  0.111308  0.557526  0.599650  0.665569
        1  sample1       1.0          0.804059  0.253135  0.869867  0.334285  0.604075  0.494045  0.833575  0.194190  0.014966  0.802918
        2  sample2       1.0          0.694404  0.595777  0.581294  0.799003  0.762857  0.651393  0.075905  0.007186  0.659633  0.831009
        3  sample3       1.0          0.783690  0.038780  0.285043  0.627305  0.800620  0.486340  0.827723  0.339807  0.731343  0.892359
        4  sample4       1.0          0.788835  0.174433  0.348770  0.938244  0.692065  0.377620  0.183760  0.616805  0.492899  0.930969

    3) Analysis of dataset properties

        print("unique class_labels and that counts \n",df.class_label.value_counts())

        unique class_label and that counts
        -1.0    200
         1.0    200
        Name: class_label, dtype: int64

    3.1) The presence of null values in the dataset

        df.isnull().sum()

        sample index    0
        class_label     0
        sensor0         0
        sensor1         0
        sensor2         0
        sensor3         0
        sensor4         0
        sensor5         0
        sensor6         0
        sensor7         0
        sensor8         0
        sensor9         0
  • 2. 3.2) Statistical understanding of dataset

        df.describe(include="all")

                sample index  class_label  sensor0     sensor1     sensor2     sensor3     sensor4     sensor5     sensor6     sensor7     sensor8     sensor9
        count   400           400.000000   400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000  400.000000
        unique  400           NaN          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
        top     sample98      NaN          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
        freq    1             NaN          NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN
        mean    NaN           0.000000     0.523661    0.509223    0.481238    0.509752    0.497875    0.501065    0.490480    0.482372    0.482822    0.541933
        std     NaN           1.001252     0.268194    0.276878    0.287584    0.297712    0.288208    0.287634    0.289954    0.282714    0.296180    0.272490
        min     NaN           -1.000000    0.007775    0.003865    0.004473    0.001466    0.000250    0.000425    0.000173    0.003322    0.003165    0.000452
        25%     NaN           -1.000000    0.299792    0.283004    0.235544    0.262697    0.249369    0.269430    0.226687    0.242848    0.213626    0.321264
        50%     NaN           0.000000     0.534906    0.507583    0.460241    0.510066    0.497842    0.497108    0.477341    0.463438    0.462251    0.578389
        75%     NaN           1.000000     0.751887    0.727843    0.734937    0.768975    0.743401    0.738854    0.735304    0.732483    0.740542    0.768990
        max     NaN           1.000000     0.999476    0.998680    0.992963    0.995119    0.999412    0.997367    0.997141    0.998230    0.996098    0.999465

    3.3) Define predictor_variables and target_variable

        predictor_variables=df[['sensor0', 'sensor1', 'sensor2','sensor3', 'sensor4', 'sensor5', 'sensor6', 'sensor7', 'sensor8', 'sensor9']]
        target_variable=df[['class_label']]

    3.4) Variance of predictor_variables data

        print("variance of all sensors\n\n",np.var(predictor_variables))

        variance of all sensors

        sensor0    0.071748
        sensor1    0.076470
        sensor2    0.082498
        sensor3    0.088411
        sensor4    0.082856
        sensor5    0.082527
        sensor6    0.083863
        sensor7    0.079727
        sensor8    0.087503
        sensor9    0.074065
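The variances printed in 3.4 are slightly smaller than the squares of the `std` row in 3.2. This is expected: `np.var` divides by n by default (population variance, `ddof=0`), while `describe()` reports the sample standard deviation, which divides by n - 1. The sketch below illustrates the relationship using only the standard library; the `readings` list is made up for illustration, and only the sensor0 figures (std 0.268194, n = 400) come from the slides.

```python
import math
import statistics

# Hypothetical sensor readings, made up for illustration only.
readings = [0.83, 0.80, 0.69, 0.78, 0.79]
n = len(readings)

pop_var = statistics.pvariance(readings)    # divides by n     (np.var default, ddof=0)
sample_var = statistics.variance(readings)  # divides by n - 1 (square of the describe() std)

# The two estimates differ exactly by the factor (n - 1) / n.
assert math.isclose(pop_var, sample_var * (n - 1) / n)

# Same relationship applied to the reported sensor0 figures.
reported_std, reported_n = 0.268194, 400
implied_pop_var = reported_std ** 2 * (reported_n - 1) / reported_n
print(round(implied_pop_var, 6))
```

With 400 observations the difference is small, but it explains why the two tables do not agree to the last digit.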
  • 3. 3.5) Probability distributions of the predictor_variables data

        # distplot is deprecated in recent seaborn releases; kdeplot draws the same density curve
        for k in predictor_variables.columns:
            sb.kdeplot(predictor_variables[k], label=k)
        plt.legend()

    3.6) Correlation analysis of predictor_variables data

        sb.heatmap(predictor_variables.corr())
  • 4. 4) Properties of dataset:

      1) The dataset has 12 columns (sample index, class_label, and ten sensor columns) and 400 observations.
      2) class_label takes the value 1 or -1, and the two classes are balanced with 200 samples each.
      3) There are no null values.
      4) All predictor variables range from 0 to 1.
      5) All predictor variables are almost (about 95%) normally distributed and have homogeneous variance.
      6) The predictor variables are not highly correlated with one another.

    5) Assumptions and thought process:

    I provide solutions using two approaches.

    Approach 1

    1) Log loss

    The predictor variables take continuous values between 0 and 1, and the target variable class_label is either 1 or -1. In this format the sensor values look exactly like the outputs of a sigmoid model. To measure the performance or predictive power of each sensor, we can apply the log-loss metric to each one. Log loss is well suited to binary classification models built on a sigmoid function (logistic or probit regression) because it gives a loss value for every predicted value, so it measures the predictive power of a binary model directly.

    Note: Xi denotes the already-predicted values and Y the actual values. I derived the formula for our case as follows.

        Y = Target_variable (class_label)
        Xi…n = Predictor_variables (sensor0, sensor1...n)
        n = Number of observations
        Log_loss = [(Y * log(Xi)) + ((1-Y) * log(1-Xi))]
        Avg_log_loss = [(-1/n) * sum(Log_loss)]
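The Avg_log_loss formula above can be sketched in a few lines of Python. One point needs care: the standard binary log loss assumes labels in {0, 1}, so the sketch maps -1 to 0 before applying the formula. That mapping is an assumption and is not stated in the slides. With it, every term -log(x) is non-negative for x in (0, 1), so the average cannot be negative; a negative score would suggest the formula was applied to the raw -1/1 labels. The function name `avg_log_loss` and the toy sensor values are hypothetical.

```python
import math

def avg_log_loss(y_true, x_pred, eps=1e-15):
    """Average binary log loss for labels in {-1, 1} and scores in (0, 1)."""
    total = 0.0
    for y, x in zip(y_true, x_pred):
        y01 = 1 if y == 1 else 0        # assumption: map the -1 class to 0
        x = min(max(x, eps), 1 - eps)   # clip to avoid log(0)
        total += y01 * math.log(x) + (1 - y01) * math.log(1 - x)
    return -total / len(y_true)

# Toy data, made up for illustration only.
labels = [1, 1, -1, -1]
good_sensor = [0.9, 0.8, 0.2, 0.1]   # high for class 1, low for class -1
bad_sensor = [0.2, 0.4, 0.7, 0.9]    # the opposite pattern

print(avg_log_loss(labels, good_sensor))
print(avg_log_loss(labels, bad_sensor))
```

A sensor whose values agree with the labels gets the lower score, which matches the ranking rule used in the slides (lower log loss is better).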
  • 5. Output (Solution) using Log-loss:

        sensors_rank    log_loss_score
        sensor8         -0.132343277
        sensor4         -0.011416493
        sensor0         0.243350704
        sensor3         0.28850512
        sensor5         0.52194066
        sensor7         0.608291426
        sensor2         0.829876049
        sensor9         0.922617133
        sensor6         1.025691108
        sensor1         1.5173744

    Note: A lower log-loss score indicates a more accurate and more important sensor. Accordingly, the sensors rank from best to worst as sensor8, sensor4, sensor0, sensor3, sensor5, sensor7, sensor2, sensor9, sensor6 and sensor1.

    Approach 2

    2) Linear Discriminant Analysis (LDA)

      • It is an interesting and scalable model that provides both the predictive power of the model and a model performance score (accuracy).
      • LDA approaches the problem by assuming that the conditional probability density functions P(X|Y=0) and P(X|Y=1) are both normally distributed with mean and covariance parameters. (In our case the classes are 1 and -1.)
      • The model consists of statistical properties of the data, calculated for each class.

    LDA Assumptions

      • The variance among group variables is the same across levels of the predictors.
      • LDA is used when the variances/covariances of the predictor variables are equal. (The variances of our predictor variables are also almost 95% the same.)
      • LDA is used when the predictor variables are homogeneously normally distributed.
      • LDA is used when the predictor variables are not highly correlated with one another.

    (The LDA assumptions and the properties of our dataset are almost the same.)
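To make the LDA idea concrete, here is a minimal NumPy sketch of the two-class discriminant direction, w = S_w^{-1} (mu_pos - mu_neg), where S_w is the pooled within-class covariance. It is not the code behind the slides (which do not show how the LDA scores were obtained); the synthetic data, the function name `lda_direction`, and the choice to rank features by the magnitude of their weight are all assumptions made for illustration.

```python
import numpy as np

def lda_direction(X, y):
    """Two-class LDA weights w = S_w^{-1} (mu_pos - mu_neg) for labels in {-1, 1}."""
    X_pos, X_neg = X[y == 1], X[y == -1]
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    n_pos, n_neg = len(X_pos), len(X_neg)
    # Pooled within-class covariance: the shared covariance that LDA assumes.
    S_w = ((n_pos - 1) * np.cov(X_pos, rowvar=False)
           + (n_neg - 1) * np.cov(X_neg, rowvar=False)) / (n_pos + n_neg - 2)
    return np.linalg.solve(S_w, mu_pos - mu_neg)

# Synthetic data: 400 samples, 3 features in [0, 1]; only feature 0 drives the label.
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 3))
y = np.where(X[:, 0] > 0.5, 1, -1)

w = lda_direction(X, y)
ranking = np.argsort(-np.abs(w))   # features ordered by weight magnitude
print(w, ranking)
```

Because only feature 0 carries class information here, it receives by far the largest weight. On real data, weights are comparable across features only when the features share a scale, which holds for sensors that all range from 0 to 1.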
  • 6. Output (Solution) using LDA:

        sensors_rank    LDA_Score
        sensor8         8.763614
        sensor4         7.333124
        sensor0         5.862246
        sensor3         3.493563
        sensor7         2.323977
        sensor9         2.178563
        sensor2         0.960444
        sensor5         0.782621
        sensor6         0.745524
        sensor1         -1.9319

    Note 1: The LDA score measures predictive power. Accordingly, the sensors rank from best to worst as sensor8, sensor4, sensor0, sensor3, sensor7, sensor9, sensor2, sensor5, sensor6 and sensor1.
    Note 2: The explained variance score is 100%, which means the model fitted well.
    Note 3: The LDA model accuracy score is 93%.

    6) Strengths:

    Log Loss
      • It is a measure of the performance of a classification model.
      • It works better when we have binary prediction probability values.
      • It provides loss values measured against the target value.
      • It is a good evaluation metric for sigmoid models.
      • It becomes the multi-class log-loss function when the target variable has multiple classes.

    LDA
      • LDA not only projects features from a higher-dimensional space onto a lower-dimensional space but also provides useful information for evaluating the predictive power of models.
      • It is a good model for assessing predictive performance when a target variable is available.
      • It can evaluate models efficiently for both binary and multi-class target variables.
  • 7. 7) Weaknesses:

    Log Loss
      • It does not work when the values are not sigmoid-style values (probabilities between 0 and 1).

    LDA
      • LDA does not work well if the data is not balanced.
      • LDA sometimes does not work well when there are too many observations.

    8) Scalability:

    Log Loss
      • It provides an accurate solution for any number of variables and samples.
      • The number of features and samples does not matter because it is a direct mathematical formula.
      • Processing time is very low.
      • Rules (Min-Max Rule, ...) can be integrated into it.

    LDA
      • It works well with a large number of features.
      • The number of features does not matter in LDA.
      • Processing takes only a few seconds.
  • 8. 9) Alternative Methods:

    1. Threshold Based Scoring Mechanism (TBSM)

    The Threshold Based Scoring Mechanism is a technique that provides a solution based on a threshold score.

    Strengths:
      • It provides a solution using simple threshold scoring.
      • The threshold values can be optimized for the problem.

    Weaknesses:
      • It is somewhat difficult to provide a solution when the target variable is multi-class.
      • It is difficult to assign threshold values when the target variable is multi-class.
      • Processing takes a few more minutes when there are many features.
      • The code has to be modified for every processing run.

    Scalability:
      • It is not a good method when there are many features.
      • Modification is always needed when processing new data for the same problem.
      • It is not a scalable method for evaluating model performance and predictive power.

    Output (Solution) using TBSM:

        sensors_rank    Threshold_error_score
        sensor8         47
        sensor4         66
        sensor0         80
        sensor3         105
        sensor9         182
        sensor1         189
        sensor7         190
        sensor2         197
        sensor6         201
        sensor5         203

    Note 1: By threshold-based scoring, the sensors rank from best to worst as sensor8, sensor4, sensor0, sensor3, sensor9, sensor1, sensor7, sensor2, sensor6 and sensor5.
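The slides do not spell out how the threshold error score is computed. One plausible reading, offered purely as an assumption, is a count of misclassified samples when a sensor value above a fixed threshold is read as class 1 and anything else as class -1. The sketch below implements that reading; the function name `threshold_error_count`, the 0.5 default threshold, and the toy data are all hypothetical.

```python
def threshold_error_count(y_true, x, threshold=0.5):
    """Count samples misclassified when x > threshold is read as class 1, else -1."""
    errors = 0
    for y, value in zip(y_true, x):
        predicted = 1 if value > threshold else -1
        if predicted != y:
            errors += 1
    return errors

# Toy data, made up for illustration only.
labels = [1, 1, -1, -1]
sensor_a = [0.9, 0.8, 0.2, 0.1]   # agrees with every label
sensor_b = [0.9, 0.3, 0.2, 0.6]   # wrong on two samples
sensor_c = [0.2, 0.4, 0.7, 0.9]   # wrong on every sample

for name, values in [("sensor_a", sensor_a), ("sensor_b", sensor_b), ("sensor_c", sensor_c)]:
    print(name, threshold_error_count(labels, values))
```

Under this reading a lower count means a better sensor, and a sensor with no class information would score close to half the number of samples.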
  • 9. 2. Cross-Entropy Error Function:

    The cross-entropy error function varies slightly depending on the context, but in machine learning, when calculating error rates between 0 and 1, it resolves to the same thing as log loss. In our case we have prediction probabilities, so:

        Y = Target_variable (class_label)
        Xi…n = Predictor_variables (sensor0, sensor1...n)
        p = {Y, 1-Y}
        q = {Xi, 1-Xi}
        H(p, q) = -Y log Xi - (1-Y) log (1-Xi)

    Strengths:
      • It is very similar to the log-loss function.
      • It is a good loss function for machine learning and optimization.
      • It deals with classifying a given set of data points into two possible classes, generically labelled 0 and 1 (in our case -1 and 1).
      • A vector of weights W can be assigned.
      • Cross entropy gives a measure of dissimilarity between p and q.

    Weaknesses:
      • Assigning the vector of weights W depends entirely on another optimization method such as gradient descent.
      • Assigning the vector of weights W is somewhat difficult when the target variable is multi-class.

    Scalability:
      • It performs efficiently with many features because it is also a mathematical formula applied directly to the problem.

    10) Suggestions:
      • LDA and log loss give about 90% the same solution.
      • Log loss is a popular loss function for optimization that gives an exact loss value for each observation measured against the target observation, so it is a good approach for this model evaluation task.
      • Linear Discriminant Analysis is an interesting and new approach for evaluating model predictive power, and it identifies the most accurate predictor variable (sensor).
      • Further optimization techniques, modelling, and pipelines can be built on both.
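The claim that cross entropy resolves to log loss can be checked numerically. The sketch below defines a general cross entropy H(p, q) = -sum(p_i log q_i) between two discrete distributions and compares it with the per-sample log-loss term for p = {Y, 1-Y} and q = {Xi, 1-Xi}. It assumes Y has already been mapped to {0, 1} so that p is a valid probability distribution; the function name `cross_entropy` and the example values are hypothetical.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum(p_i * log(q_i)), skipping terms where p_i is zero."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# One observation: true label y (already mapped to {0, 1}) and sensor value x.
y, x = 1, 0.8

h = cross_entropy([y, 1 - y], [x, 1 - x])
log_loss_term = -(y * math.log(x) + (1 - y) * math.log(1 - x))

print(h, log_loss_term)
```

Both expressions give the same number for any y in {0, 1} and x in (0, 1), which is why ranking sensors by average cross entropy would reproduce the log-loss ranking.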