Summary of a 4-fold cross-validation study performed on classifiers used in OCR. Presented for a class in pattern recognition. OCstar Inc. is a made-up company name used for the purposes of the presentation, per the requirements of the project.
2. Introduction
Purpose
● Determine Best Classifier
● Predict classifier performance on unseen data
4-fold cross validation performed on:
● K Nearest Neighbors
● Bayesian
● Artificial Neural Network
3. N-Fold Cross Validation
● Technique for comparing classification algorithms
● Insight on how classifiers perform on unseen data
Process
● Training data partitioned into N groups
● N−1 groups used to train classifier
● 1 group used to test classifier
● Repeated for all groups
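The process above can be sketched in Python (a minimal sketch; the `train` and `evaluate` callables are hypothetical stand-ins for fitting and scoring any of the classifiers in this study):

```python
# Minimal sketch of N-fold cross validation: partition the data into
# N groups, train on N-1 of them, test on the held-out group, repeat.
def n_fold_cross_validation(samples, labels, n, train, evaluate):
    """Return the accuracy measured on each of the n folds."""
    fold_size = len(samples) // n
    accuracies = []
    for i in range(n):
        lo, hi = i * fold_size, (i + 1) * fold_size
        test_x, test_y = samples[lo:hi], labels[lo:hi]
        train_x = samples[:lo] + samples[hi:]   # the other N-1 groups
        train_y = labels[:lo] + labels[hi:]
        model = train(train_x, train_y)
        accuracies.append(evaluate(model, test_x, test_y))
    return accuracies
```

With N = 4, as in this study, each fold holds out one quarter of the training data for testing.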
4. K = 5 Nearest Neighbors
Algorithm
● The 5 nearest training points to the input are found
● Majority vote of the nearest points classifies the input
● If a tie exists, the number of nearest points is reduced
Distance metric is Euclidean
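The algorithm above can be sketched as follows (a minimal sketch; function and variable names are my own):

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, x, k=5):
    """5-NN rule: Euclidean distance, majority vote, and on a tie
    the number of nearest points considered is reduced by one."""
    # Sort training points by Euclidean distance to the input.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], x))
    while k >= 1:
        votes = Counter(train_labels[i] for i in order[:k])
        top = votes.most_common()
        # Unique winner -> classify; tie -> drop the farthest neighbor.
        if len(top) == 1 or top[0][1] > top[1][1]:
            return top[0][0]
        k -= 1
```

With k reduced on ties, the loop always terminates: at k = 1 a single nearest neighbor decides the class.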
5. Bayesian
● Founded on probability mathematics
● Uses statistical data from the training set
● Mean μᵢ of each class
● Average covariance of all classes
● Uses discriminants
gᵢ(x) = −0.5 (x − μᵢ)ᵀ Σ⁻¹ (x − μᵢ) + ln P(ωᵢ)
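The discriminant above can be sketched in Python (a minimal sketch assuming per-class means, one pooled covariance Σ shared by all classes, and known priors; variable names are mine):

```python
import numpy as np

def bayes_classify(x, means, pooled_cov, priors):
    """Pick the class i that maximizes the discriminant g_i(x)."""
    inv = np.linalg.inv(pooled_cov)
    scores = []
    for mu, p in zip(means, priors):
        d = x - mu
        # g_i(x) = -0.5 (x - mu_i)^T Sigma^-1 (x - mu_i) + ln P(omega_i)
        scores.append(-0.5 * d @ inv @ d + np.log(p))
    return int(np.argmax(scores))
```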
6. Artificial Neural Network
● Interconnected network of nonlinear nodes
● Weight matrices govern performance
● Weights trained by gradient descent
[Diagram: feature input → network → output class]
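The gradient-descent weight training mentioned above can be sketched for a single sigmoid layer (a toy sketch, not the network used in the study; the squared-error loss, sizes, and learning rate are my own assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(X, Y, epochs=2000, lr=0.5, seed=0):
    """Train one weight matrix W by batch gradient descent on
    squared error between sigmoid(X @ W) and the targets Y."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        out = sigmoid(X @ W)                         # forward pass
        grad = X.T @ ((out - Y) * out * (1 - out))   # dLoss/dW
        W -= lr * grad                               # descent step
    return W
```

A multilayer network repeats this update for each weight matrix, propagating the error gradient backward through the layers.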
7. Results: 5 Nearest Neighbors
● Consistent performance of 97% between folds
● Most commonly confused class varies between folds
● Worst class 80% correct in worst case
● Does not provide insight into error classes for the application
8. Results: Bayesian Classifier
● Performance varied slightly between folds
● Accuracy varied between 97% and 100%, with an average of 98.75%
● All observed errors on class 'x'
● Class 'x' 70% correct in worst case, and 87.5% on average
● 'x' is likely to be a problem class in application
9. Results: Artificial Neural Net
● Inconsistent results, varying between 77% and 96% correct
● Possible that the worst case did not converge during training
● Average performance without worst case: 95.33%
● Problem classes varied between folds
10. Study Conclusion
● Bayesian classifier recommended
● Best average precision between folds
● Errors confined to class 'x'
● Class 'x' correct 87.5% on average, 70% in worst case
● Provides insight that a postprocessing technique could take advantage of
11. Results on Final Data Set
actual \ predicted   a    c    e    m    n    o    r    s    x    z  errors
a                  120    0    0    0    0    0    0    0    0    0    0
c                    0  120    0    0    0    0    0    0    0    0    0
e                    0    2  118    0    0    0    0    0    0    0    2
m                    0    0    0  120    0    0    0    0    0    0    0
n                    0    0    0    0  120    0    0    0    0    0    0
o                    0    0    1    0    0  119    0    0    0    0    1
r                    0    0    0    0    0    0  120    0    0    0    0
s                    0    0    0    0    0    0    0  120    0    0    0
x                    0    0    0    0    1    0    0   27   92    0   28
z                    0    0    0    0    0    0    2    0    0  118    2
errors               0    2    1    0    1    0    2   27    0    0   33
97.25% correct, 2.75% error
● Class 'x' 76.7% correct
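The headline numbers above can be recomputed directly from the confusion matrix (rows are the actual class, columns the predicted class; a minimal sketch, with the function name my own):

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Overall accuracy and per-class accuracy (row-wise recall)
    from a square confusion matrix with rows = actual class."""
    total = cm.sum()
    overall = np.trace(cm) / total          # correct / all samples
    per_class = np.diag(cm) / cm.sum(axis=1)
    return overall, per_class
```

Applied to the 10×10 matrix above, this gives 97.25% overall and 92/120 ≈ 76.7% for class 'x'.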