4-Fold Cross Validation Study



    David Glen
    OCstar Inc.



                   
Introduction

    Purpose
    ●   Determine Best Classifier
    ●   Predict classifier performance on unseen data


    4-fold cross validation performed on:
    ●   K Nearest Neighbors
    ●   Bayesian
    ●   Artificial Neural Network
                                 
N-Fold Cross Validation
    ●   Technique for comparing classification algorithms
    ●   Insight on how classifiers perform on unseen data


    Process
    ●   Training data partitioned into N groups
    ●   N-1 groups used to train classifier
    ●   1 group used to test classifier
    ●   Repeated for all groups
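
The process above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the names `n_fold_splits`, `cross_validate`, `train_fn` and `predict_fn` are hypothetical, folds are formed by simple interleaving, and the data is assumed to have been shuffled beforehand.

```python
def n_fold_splits(data, n_folds):
    """Yield (train, test) pairs so that each group is the test set exactly once.

    data is a list of (features, label) pairs, assumed already shuffled.
    """
    # Partition the data into n_folds groups by interleaving (an assumption).
    folds = [data[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        # N-1 groups train the classifier, the remaining group tests it.
        train = [item for i, fold in enumerate(folds) if i != held_out for item in fold]
        yield train, folds[held_out]


def cross_validate(data, n_folds, train_fn, predict_fn):
    """Return the fraction of correctly classified test samples for each fold."""
    accuracies = []
    for train, test in n_fold_splits(data, n_folds):
        model = train_fn(train)
        correct = sum(1 for features, label in test if predict_fn(model, features) == label)
        accuracies.append(correct / len(test))
    return accuracies
```

Any classifier can be plugged in by supplying a `train_fn(train)` that returns a model and a `predict_fn(model, features)` that returns a label. Comparing the per-fold values shows both the average performance and how much it varies between folds.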
                                   
K = 5 Nearest Neighbors

    Algorithm
    ●   Find the 5 points in the training set nearest to the input
    ●   Majority vote of nearest points classifies input
    ●   If a tie exists, the number of nearest points is 
        reduced


    Distance metric is Euclidean
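
A minimal Python sketch of this voting rule is shown below. The slide does not say by how much the neighbor count is reduced on a tie; dropping the single farthest neighbor and voting again is an assumption, and the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=5):
    """Classify query by majority vote of its k nearest training points.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    while neighbors:
        counts = Counter(label for _, label in neighbors).most_common()
        # A clear winner exists if only one label is present or the top count is unique.
        if len(counts) == 1 or counts[0][1] > counts[1][1]:
            return counts[0][0]
        # Tie: drop the farthest neighbor and vote again (assumed reduction step).
        neighbors.pop()
    raise ValueError("training set is empty")
```

Because a single remaining neighbor always produces a clear winner, the loop terminates for any non-empty training set.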


                                    
Bayesian
    ●   Founded on probability mathematics
    ●   Uses statistical data from training set
        ●   Mean $\mu_i$ of each class
        ●   Average covariance $\Sigma$ of all classes


    ●   Uses discriminants

    $$ g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) + \ln P(\omega_i) $$
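
The discriminant can be evaluated with a short, dependency-free Python sketch like the one below. Several details are assumptions rather than facts from the slides: the shared covariance is taken as the unweighted mean of the per-class sample covariances, the priors are estimated from class frequencies in the training set, and all function names are hypothetical.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (n - 1 denominator) of rows around mean mu."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in rows:
        diff = [xi - mi for xi, mi in zip(x, mu)]
        for r in range(d):
            for c in range(d):
                cov[r][c] += diff[r] * diff[c] / (len(rows) - 1)
    return cov


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def train_bayes(data):
    """data maps each class label to a list of feature vectors."""
    total = sum(len(rows) for rows in data.values())
    means = {label: mean_vector(rows) for label, rows in data.items()}
    covs = [covariance(rows, means[label]) for label, rows in data.items()]
    d = len(covs[0])
    # Average covariance of all classes (unweighted mean, an assumption).
    sigma = [[sum(c[r][k] for c in covs) / len(covs) for k in range(d)] for r in range(d)]
    priors = {label: len(rows) / total for label, rows in data.items()}
    return means, sigma, priors


def discriminant(x, mu, sigma, prior):
    """g_i(x) = -0.5 (x - mu_i)^T Sigma^{-1} (x - mu_i) + ln P(omega_i)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(sigma, diff)  # y = Sigma^{-1} (x - mu_i)
    return -0.5 * sum(di * yi for di, yi in zip(diff, y)) + math.log(prior)


def classify(x, model):
    means, sigma, priors = model
    return max(means, key=lambda label: discriminant(x, means[label], sigma, priors[label]))
```

The input is assigned to the class with the largest discriminant value. Solving the linear system instead of inverting the covariance matrix explicitly keeps the sketch short; it assumes the averaged covariance is non-singular.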


                                       
Artificial Neural Network
    ●   Interconnected network of non-linear nodes


    ●   Weight matrices govern performance


    ●   Weights trained by gradient descent

    [Figure: network diagram with labels "Feature Input" and "Output Class"]
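
The slides do not describe the network architecture, so the following Python sketch rests on assumptions throughout: one hidden layer, sigmoid nodes, a squared-error loss, and per-sample updates. The names `forward` and `train_step` are hypothetical. It only illustrates how weight matrices produce an output and how one gradient-descent step adjusts them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Propagate feature list x through the network.

    Each row of a weight matrix holds one node's weights; the last entry is its bias.
    """
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0]))) for row in w_out]
    return hidden, output


def train_step(x, target, w_hidden, w_out, rate):
    """One gradient-descent update on E = 0.5 * sum((output - target)^2), in place."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at each output node: dE/dz = (o - t) * o * (1 - o).
    delta_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
    # Error signal at each hidden node, propagated back through the old output weights.
    delta_hidden = [
        h * (1 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    # Move every weight against its gradient.
    for k, row in enumerate(w_out):
        for j, v in enumerate(hidden + [1.0]):
            row[j] -= rate * delta_out[k] * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= rate * delta_hidden[j] * v
```

In practice the weights are usually initialized to small random values and `train_step` is repeated over the training set many times. Gradient descent is not guaranteed to reach a good solution from every starting point.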
                                   
Results: 5 Nearest Neighbors
    ●   Consistent performance of 97% across folds
    ●   Most commonly confused class varies between folds
    ●   Worst class 80% correct in worst case


    ●   Gives no insight into which classes will cause errors in application


                                  
Results: Bayesian Classifier
    ●   Performance varied slightly between folds
    ●   Accuracy varied between 97% and 100%, with an average of 98.75%
    ●   All observed errors on class 'x'
    ●   Class 'x' 70% correct in worst case, and 87.5% on average


    ●   'x' is likely to be a problem class in application

                                   
Results: Artificial Neural Net
    ●   Inconsistent results, varying between 77% and 96% correct
    ●   Possible that the worst case did not converge during training
    ●   Average performance without the worst case: 95.33%


    ●   Problem classes varied between folds


                                  
Study Conclusion

    ●   Bayesian classifier recommended

    ●   Best average accuracy across folds
    ●   Errors confined to class 'x'
        ●   Class 'x' correct 87.5% on average, 70% in worst case
        ●   Gives insight that a post-processing technique could exploit


                                      
Results on Final Data Set
     Rows: actual class. Columns: assigned class. Err: number of misclassified samples.

            a     c     e     m     n     o     r     s     x     z   Err
     a    120     0     0     0     0     0     0     0     0     0     0
     c      0   120     0     0     0     0     0     0     0     0     0
     e      0     2   118     0     0     0     0     0     0     0     2
     m      0     0     0   120     0     0     0     0     0     0     0
     n      0     0     0     0   120     0     0     0     0     0     0
     o      0     0     1     0     0   119     0     0     0     0     1
     r      0     0     0     0     0     0   120     0     0     0     0
     s      0     0     0     0     0     0     0   120     0     0     0
     x      0     0     0     0     1     0     0    27    92     0    28
     z      0     0     0     0     0     0     2     0     0   118     2

     Err    0     2     1     0     1     0     2    27     0     0    33

     97.25% correct, 2.75% error


     ●   Class 'x'  76.7% correct
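
The summary figures can be rechecked from the confusion matrix with a few lines of Python. This is only an arithmetic check of the table; it assumes rows are actual classes and columns are assigned classes, and the helper names are hypothetical.

```python
CLASSES = "acemnorsxz"

# Counts copied from the table: one row per actual class, one column per assigned class.
MATRIX = [
    [120, 0, 0, 0, 0, 0, 0, 0, 0, 0],    # a
    [0, 120, 0, 0, 0, 0, 0, 0, 0, 0],    # c
    [0, 2, 118, 0, 0, 0, 0, 0, 0, 0],    # e
    [0, 0, 0, 120, 0, 0, 0, 0, 0, 0],    # m
    [0, 0, 0, 0, 120, 0, 0, 0, 0, 0],    # n
    [0, 0, 1, 0, 0, 119, 0, 0, 0, 0],    # o
    [0, 0, 0, 0, 0, 0, 120, 0, 0, 0],    # r
    [0, 0, 0, 0, 0, 0, 0, 120, 0, 0],    # s
    [0, 0, 0, 0, 1, 0, 0, 27, 92, 0],    # x
    [0, 0, 0, 0, 0, 0, 2, 0, 0, 118],    # z
]


def overall_accuracy(matrix):
    """Diagonal (correct) count divided by the total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_accuracy(matrix, index):
    """Fraction of one actual class that was assigned correctly."""
    return matrix[index][index] / sum(matrix[index])


def column_errors(matrix):
    """Samples wrongly assigned to each class (column sum minus diagonal)."""
    return [sum(row[j] for row in matrix) - matrix[j][j] for j in range(len(matrix))]


if __name__ == "__main__":
    print(f"overall: {overall_accuracy(MATRIX):.2%}")
    print(f"class x: {class_accuracy(MATRIX, CLASSES.index('x')):.1%}")
    print(f"wrongly assigned per class: {column_errors(MATRIX)}")
```

Of the 28 misclassified 'x' samples, 27 were assigned to 's', which is the kind of concentrated confusion a post-processing step could target.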
                                         
Thank You




    Questions?


         

Final Presentation for Pattern Recognition
