In this presentation, two data sets are used to apply the machine learning classification techniques introduced in the Introduction to Data Mining and Machine Learning coursework. Both data sets were selected by analyzing their output and the team members' interests. They are the Electricity Grid Stability Simulated data set and the Olivetti data set for face recognition.
2. Problem formulation
● Electricity data set
● This is a classification problem of electrical grid stability into two classes:
● 1. Stable
● 2. Unstable
● Olivetti Dataset
● Recognizing faces with the highest accuracy using machine learning algorithms.
● Comparing the accuracies of these algorithms to select the best-suited face recognition algorithm.
3. About Electricity Data Set
• The Electrical Grid data set provides different attributes for examining the stability of the system.
• We examine how system stability responds, based on 10,000 observations with 13 attributes and 1 class attribute (stabf). The attributes given in the data set are:
● tau[x] : Reaction time of participant (real from the range [0.5,10]s).
● p[x] : Nominal power consumed (negative) / produced (positive) (real).
● g[x] : Coefficient (gamma) proportional to price elasticity (real from the range [0.05,1]s^-1).
● stab: The maximal real part of the characteristic equation root (if positive, the system is linearly unstable) (real).
● stabf: The stability label of the system (stable/unstable)
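The relationship between stab and stabf described above can be sketched as follows. This is a minimal illustration, not the data set's own code: the function name stability_label and the sample values are hypothetical, and treating a value of exactly zero as stable is an assumption.

```python
def stability_label(stab: float) -> str:
    """Map the maximal real part of the characteristic root to a class label.

    A positive value means the system is linearly unstable.
    Treating exactly 0 as 'stable' is an assumption made for this sketch.
    """
    return "unstable" if stab > 0 else "stable"


# Hypothetical stab values, for illustration only (not taken from the data set).
samples = [-0.031, 0.055, -0.002, 0.012]
labels = [stability_label(s) for s in samples]
print(labels)
```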
10. Classification with a 95:5 train/test ratio
Method          Accuracy   Confusion Matrix
Naive Bayes     0.832      [figure]
KNN             0.92       [figure]
SVM             0.92       [figure]
Decision Tree   0.878      [figure]
Random Forest   0.826      [figure]
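A comparison of this kind could be set up roughly as below with scikit-learn. This is only a sketch under stated assumptions: the data is a synthetic stand-in generated with make_classification (not the grid data set), the classifiers use default hyperparameters, and the scaling step for KNN and SVM is our own choice, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the grid data (assumption): 2,000 rows, 12 features, 2 classes.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=8, random_state=0
)

# 95:5 split: 5% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, stratify=y, random_state=0
)

# Default hyperparameters throughout (assumption); KNN and SVM get feature scaling.
models = {
    "Naive Bayes": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (accuracy_score(y_test, pred), confusion_matrix(y_test, pred))
    print(f"{name}: accuracy = {results[name][0]:.3f}")
    print(results[name][1])
```

With only 5% of the rows in the test set, each accuracy is estimated from a small sample, so small differences between methods should be read with caution.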
11. Selecting different features and results
Naive Bayes: Accuracy - 0.982
KNN: Accuracy - 0.964
Decision Tree: Accuracy - 1.0
Random Forest: Accuracy - 1.0
SVM: Accuracy - 0.992
Test fraction: 0.05
12. Best algorithm fit for classification of the electricity simulated data set
Random Forest (using n_estimators = 60):
● Train/test split: 90:10
● Accuracy = 0.923
● Confusion Matrix:
● Although KNN and SVM gave good accuracy at the 95:5 ratio, that accuracy is measured only on the test data set; further evaluation on the training data set is needed.
● With a different selection of features the observed accuracy is 1, but the model is overfitted; preprocessing and proper feature selection are needed to train the models and improve accuracy.
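One way to carry out the further evaluation mentioned above is to compare training and test accuracy for the same model. The sketch below uses the n_estimators = 60 and 90:10 settings from the slide, but everything else is an assumption: the data is synthetic (make_classification), and the names forest, train_acc, test_acc and gap are ours.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the grid data (assumption), not the real data set.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=8, random_state=1
)

# 90:10 split, as on the slide.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=1
)

forest = RandomForestClassifier(n_estimators=60, random_state=1)
forest.fit(X_train, y_train)

# A large gap between the two scores is one common sign of overfitting.
train_acc = forest.score(X_train, y_train)
test_acc = forest.score(X_test, y_test)
gap = train_acc - test_acc
print(f"train = {train_acc:.3f}, test = {test_acc:.3f}, gap = {gap:.3f}")
```

Cross-validation (for example sklearn.model_selection.cross_val_score) would give a more stable estimate than a single split, at the cost of more training runs.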
The data was collected by taking pictures of 40 different people.
Each person is photographed several times with different facial expressions, and these samples are used to train the models.
Dimensions: 10 images per person, 400 images in total
No missing values (N/A)
Machine learning models can work on vectors. Since the image data is in the matrix form, it must be converted to a vector.
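The matrix-to-vector conversion can be sketched with NumPy as below. The array here is random placeholder data, not the real faces; the 64 x 64 image size is the standard Olivetti format and is not stated on the slide.

```python
import numpy as np

# Placeholder array with the Olivetti layout (assumption): 400 images of 64 x 64 pixels.
rng = np.random.default_rng(0)
images = rng.random((400, 64, 64))

# Flatten each 64 x 64 image matrix into one row vector of 4096 pixel values.
X = images.reshape(len(images), -1)
print(images.shape, "->", X.shape)
```

Each row of X is then one sample that can be passed to a classifier or to PCA.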
Three test-set fractions are used: 0.1, 0.2, and 0.3.
SVM is not affected by PCA
The 100% accuracy is due to the small and clean data set
Increasing the percentage of data used for training improves accuracy
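The experiment summarized above (SVM with and without PCA at test fractions 0.1, 0.2 and 0.3) could be structured roughly as follows. This is a sketch under several assumptions: the data is synthetic (40 classes with 10 noisy samples each, mimicking the data set's layout rather than real faces), and the linear kernel and n_components=50 are illustrative choices, not settings taken from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 40 "people", 10 noisy samples each, 256 features.
rng = np.random.default_rng(0)
n_people, per_person, n_pixels = 40, 10, 256
centers = rng.normal(size=(n_people, n_pixels))
noise = 0.3 * rng.normal(size=(n_people * per_person, n_pixels))
X = np.repeat(centers, per_person, axis=0) + noise
y = np.repeat(np.arange(n_people), per_person)

scores = {}
for test_size in (0.1, 0.2, 0.3):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=0
    )
    # SVM on the raw vectors.
    plain = SVC(kernel="linear").fit(X_train, y_train)
    # SVM on the first 50 principal components (illustrative choice).
    with_pca = make_pipeline(
        PCA(n_components=50, random_state=0), SVC(kernel="linear")
    ).fit(X_train, y_train)
    scores[test_size] = (plain.score(X_test, y_test), with_pca.score(X_test, y_test))
    print(test_size, scores[test_size])
```

Fitting PCA inside the pipeline means the components are learned from the training rows only, which keeps the test rows out of the preprocessing step.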