Principal Component Analysis For Novelty Detection


feature selection for pattern recognition in novelty detection; mechanical vibrations

  1. 1. Principal Component Analysis for Novelty Detection. A journal article submitted to and accepted by Pattern Recognition Letters. Jordan McBain, P.Eng.; Markus Timusk, PhD, P.Eng.
  2. 2. Condition Monitoring Maintenance technique  Maintenance undertaken when some indicator of health is flagged  Advanced technique employed when cost-benefit analysis justifies the expense of monitoring equipment  Alternative to run-to-failure maintenance and statistically determined time-based maintenance Employ pattern recognition to automate diagnosis  Expert system employed to replicate technicians' maintenance insight  Computer and sensors replace the technician and the screwdriver set atop a vibrating machine – the nature of the vibration is used to discern state
  3. 3. Pattern Recognition Equality is an insufficient means of classifying real-world members of a class (noise, variance, etc.) Pattern recognition  Real-world signals presumed to be representative of a class are reduced to representative n-dimensional feature vectors  Plotted in n-dimensional space  Decision boundary generated with pattern recognition techniques  Employed as classification rule  Problems  Choice of features  How representative?  Maximize number of features?  Curse of dimensionality  Imbalance of data
  4. 4. Principal Component Analysis One technique used to find “optimal” set of features  Finds the axes of normally distributed data  Select the largest axes and omit smaller ones to define new basis  Project data onto basis to reduce dimensionality of problem space Each feature presumed to be normally distributed
  5. 5.  N-dimensional scattering of features presumed independent Combined probability: $P(A \cap B) = P(A)\,P(B)$
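  6. 6. $p(x) = \prod_{i=1}^{d} p(x_i) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} e^{-\frac{1}{2}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2} = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} e^{-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2} = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} e^{-\frac{1}{2}(x-\mu)^t \Sigma^{-1}(x-\mu)}$  Find principal components (i.e. axes of hyper-ellipsoidal distribution)  Select maximum variance (largest axes)  Eigenvalue problem  Eigenvectors – principal components  Eigenvalues – size of axis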
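
The eigenvalue formulation on the slide above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `pca_basis` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_basis(X, k):
    """Return the mean and the k largest-variance principal axes of X (n x d)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)      # d x d sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
# Hypothetical data: standard deviations 5, 2 and 0.1 along x, y and z.
X = rng.normal(size=(2000, 3)) * np.array([5.0, 2.0, 0.1])
mean, basis, variances = pca_basis(X, k=2)
Y = (X - mean) @ basis                        # data projected onto the new basis
```

The eigenvectors give the axes of the hyper-ellipsoid and the eigenvalues their sizes, so keeping the two largest here discards the nearly flat z direction.
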
  7. 7. Novelty Detection Deals with imbalance of data between classes Fault detection in machinery  Easy to collect data representative of healthy state  Difficult to collect data representative of faulted states  Costly to break machinery  Operationally unacceptable  Poor database of faults kept  Can never capture them all! Model healthy data with decision boundary  If test patterns fall outside, classify as a fault!
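
A minimal sketch of modelling healthy data with a decision boundary and flagging anything outside it. The slides do not say which boundary was used, so the hypersphere centred on the healthy mean, the 95% distance quantile, and the names `fit_boundary` and `is_fault` are all assumptions chosen for illustration.

```python
import math
import random

def fit_boundary(healthy, quantile=0.95):
    """Fit a hyperspherical boundary: centre at the mean, radius at a distance quantile."""
    d = len(healthy[0])
    centre = [sum(p[i] for p in healthy) / len(healthy) for i in range(d)]
    dists = sorted(math.dist(p, centre) for p in healthy)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centre, radius

def is_fault(pattern, centre, radius):
    """Flag a test pattern as a fault when it falls outside the boundary."""
    return math.dist(pattern, centre) > radius

random.seed(0)
# Hypothetical healthy feature vectors (2-d, unit variance).
healthy = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
centre, radius = fit_boundary(healthy)
```

Only healthy data are needed to fit the boundary, which is the point of the approach: no fault examples have to be collected beforehand.
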
  8. 8. Problem PCA is best for selecting a subspace that best represents the data In pattern recognition, we seek to discriminate between classes The objectives of most feature reduction techniques are not optimized for novelty detection
  9. 9. Feature Reduction Techniques
  10. 10. Feature Reduction Techniques Feature Selection vs. Feature Extraction Selection  Choosing small subsets of features that are adequate to describe classes  E.g. “Search”  Examines all subsets of feature combinations to find the one which maximizes some objective function  May employ classifier error as objective function  Exponential explosion  Heuristics can mitigate this  If computationally feasible, gives the best results Extraction  Computes a small number of new features from the set of old features  E.g. PCA
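
The “Search” style of feature selection can be sketched as below. This is an illustrative sketch only: `exhaustive_search`, the fixed subset size, and the made-up `usefulness` scores are assumptions. With classifier error as the objective, one would pass the negated error so that larger is better.

```python
from itertools import combinations

def exhaustive_search(n_features, subset_size, objective):
    """Score every feature subset of the given size and return the best one."""
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(n_features), subset_size):
        score = objective(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical objective: per-feature usefulness scores summed over the subset.
usefulness = [0.1, 0.9, 0.3, 0.8, 0.2]
best, score = exhaustive_search(5, 2, lambda s: sum(usefulness[i] for i in s))
```

Searching over every subset size instead of one fixed size visits 2^d - 1 subsets for d features, which is the exponential explosion the slide mentions.
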
  11. 11. Principal Component Analysis Seeks a subspace in which the data representation error is minimal Development  For a set of n vectors in d-dimensional space  Seek the equation of a hyperplane onto which the data may be projected with minimal representation error  Hyperplane fixed at the data’s mean, m  Hyperplane’s orientation defined by direction vector, w (normal definition of a plane)  Derive error function
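
For the one-dimensional case, the error function can be written out following the standard treatment in Duda (2000), the source the slides cite. The coefficients $a_k$ and the scatter matrix $S$ do not appear in the transcript and are introduced here as assumptions of this sketch.

```latex
\begin{aligned}
J(w) &= \sum_{k=1}^{n} \bigl\| (m + a_k w) - x_k \bigr\|^2, \qquad \|w\| = 1, \\
a_k &= w^t (x_k - m) \quad \text{(the minimizer of } J \text{ for fixed } w\text{)}, \\
J(w) &= -\,w^t S w + \sum_{k=1}^{n} \| x_k - m \|^2, \qquad
S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^t, \\
\min_w J(w) &\iff \max_{\|w\|=1} w^t S w \;\Longrightarrow\; S w = \lambda w .
\end{aligned}
```

The representation error is therefore smallest when $w$ is the eigenvector of $S$ with the largest eigenvalue.
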
  12. 12.  Optimization problem is a well-known eigenvalue problem Resultant feature space is linear  May not represent non-linear and changing data well  Kernel PCA and Dynamic PCA Techniques only suitable for representing data, not discriminating between classes Source: Duda, 2000
  13. 13. Multiple Discriminant Analysis Seeks to find efficient subspaces for discrimination rather than representation Development  Two-class problem with a set of n d-dimensional vectors grouped into D1 and D2  Projected onto some direction vector w to give $y = w^t x$  Consequently grouped into subsets Y1 and Y2  Find the direction vector w such that the distance between projected sample means m1 and m2 is maximized  Rationalize the distance against the relative sample size
  14. 14.  Reduces to a solution described as “analogous to the well known Rayleigh quotient”: $w = S_W^{-1}(m_1 - m_2)$ Technique extended for problems with n classes  Objective to maximize the spread between all classes in the projected space Source: Duda, 2000
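
A minimal two-class sketch of the discriminant direction $w = S_W^{-1}(m_1 - m_2)$, where $S_W$ is the within-class scatter matrix as in Duda (2000). The function name `fisher_direction` and the synthetic classes are assumptions made for illustration.

```python
import numpy as np

def fisher_direction(D1, D2):
    """Return the unit vector w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices.
    S_w = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(1)
# Hypothetical classes: elongated along y, separated along x.
D1 = rng.normal(size=(500, 2)) * np.array([1.0, 5.0]) + np.array([3.0, 0.0])
D2 = rng.normal(size=(500, 2)) * np.array([1.0, 5.0]) - np.array([3.0, 0.0])
w = fisher_direction(D1, D2)
y1, y2 = D1 @ w, D2 @ w   # projected subsets Y1 and Y2
```

PCA on the pooled data would pick the high-variance y axis here, whereas the discriminant direction follows x, the axis that actually separates the classes.
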
  15. 15. Extraction for Novelty Detection
  16. 16. Development Objective: distinguish between normal and abnormal classes  KFDA inappropriate (assumes data group well into separate classes)  Novelty detection – classes may cluster well but abnormal classes are expected to orbit the normal data  Means could overlap  Eliminating previous objective functions Approach: find the subspace maximizing the difference between the average spread of the normal class and the average spread of the abnormal class, both measured from the mean of the normal class
  17. 17.  Mathematically, for an outlier class containing b elements and a target class containing a elements with mean m_t To simplify, introduce the outlier scatter matrix, O, for outlier data centered at m_t Reducing to
  18. 18.  Maximize this objective function  Find the eigenvectors and eigenvalues of the matrix $S_t - O$ Select the first k largest eigenvalues and use corresponding eigenvectors as new basis Project data onto new basis Proceed with classification Limitations  Still dependent on assumption of normal data distribution  (as are other PCA techniques)  Assumption: normal data scatter somewhat circularly and outlier data orbit nicely without intruding  (as with PCA and MDA)  Machinery vibration data are not normally Gaussian (heuristic)
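
The steps above can be sketched as follows, with loud caveats. The objective's equation did not survive in the transcript, so the normalization (dividing each scatter matrix by its class size) and the sign convention are assumptions: this sketch ranks directions by how much the average outlier scatter about m_t exceeds the average target scatter, that is, by the eigenvalues of O - S_t, which are the negated eigenvalues of the S_t - O matrix named on the slide. The name `novelty_basis` and the data are hypothetical.

```python
import numpy as np

def novelty_basis(target, outliers, k):
    """Return m_t and k directions where outlier scatter about m_t most exceeds target scatter."""
    m_t = target.mean(axis=0)
    S_t = (target - m_t).T @ (target - m_t) / len(target)      # average target scatter
    O = (outliers - m_t).T @ (outliers - m_t) / len(outliers)  # average outlier scatter about m_t
    # ASSUMPTION: rank by eigenvalues of O - S_t (the negated spectrum of S_t - O).
    eigvals, eigvecs = np.linalg.eigh(O - S_t)                 # ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return m_t, eigvecs[:, order]

rng = np.random.default_rng(2)
target = rng.normal(size=(400, 3))                             # hypothetical healthy data
# Hypothetical outliers on a ring of radius 8 in the x-y plane, thin along z.
angles = rng.uniform(0, 2 * np.pi, size=400)
outliers = np.column_stack([8 * np.cos(angles), 8 * np.sin(angles),
                            rng.normal(scale=0.5, size=400)])
m_t, basis = novelty_basis(target, outliers, k=2)
projected_target = (target - m_t) @ basis
projected_outliers = (outliers - m_t) @ basis
```

In this toy case the selected plane is close to x-y, where the outliers orbit the target data, and the discarded direction is z, where the two classes overlap.
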
  19. 19. Validation: Artificial Data Artificial 3-d data set  Normal distribution:  spherical (radius 50) centered at origin  Outlier distribution:  randomly generated spherical distribution (radius 100)  Not permitted to fall within cylinder concentric with the normal data’s sphere and oriented with length parallel to [1,1,1]
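
A data set of this shape could be generated roughly as below. The slide does not give the cylinder radius, the sample counts, or whether points are uniform within each sphere, so `cylinder_radius = 50`, the counts, and uniform sampling are all assumptions of this sketch.

```python
import numpy as np

def sample_ball(rng, n, radius):
    """Draw n points uniformly from a 3-d ball of the given radius."""
    directions = rng.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1 / 3)
    return directions * radii

def distance_from_axis(points, axis):
    """Perpendicular distance of each point from the line through the origin along axis."""
    axis = axis / np.linalg.norm(axis)
    along = points @ axis
    return np.linalg.norm(points - np.outer(along, axis), axis=1)

rng = np.random.default_rng(3)
axis = np.array([1.0, 1.0, 1.0])
cylinder_radius = 50.0   # ASSUMPTION: not stated on the slide
normal_data = sample_ball(rng, 1000, 50.0)
candidates = sample_ball(rng, 5000, 100.0)
# Reject outlier candidates that fall inside the cylinder along [1, 1, 1].
outlier_data = candidates[distance_from_axis(candidates, axis) > cylinder_radius]
```

Viewed along [1,1,1], the outliers form a ring around the normal data, so the ideal two-dimensional subspace has [1,1,1] as its normal vector.
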
  20. 20. Validation: Artificial Data Results (reduced to 2 dimensions)  Subspace’s normal vector only 7 degrees off from expected [1,1,1]
  21. 21. Experimental Methodology
  22. 22. Apparatus Spectraquest gear dynamics simulator  3-hp motor  Magnetic particle brake loading  National Instruments PXI data acquisition and control  Accelerometers (sampling at 4 kHz)
  23. 23. Faults 4 motors employed  Healthy  Combo bearing faults  Broken rotor bars  Rotor unbalance Gearbox faults  Fault-free conditions  Missing tooth gear  Chipped tooth  Bearing with outer race faults  Bearings with inner and outer race faults
  24. 24. Feature Extraction Autoregressive model  A model of a statistical process generated by regressing previous values of that statistical process with itself  Sampling of sampled signal that best represents the original sampling  Order 10 Segmentation  Vibration data segmented into groups based on intervals with constant number of shaft rotations  Gaussian window  70% overlap between segments
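
A rough sketch of this feature pipeline: overlapping Gaussian-windowed segments, each reduced to order-10 autoregressive coefficients. Several details are assumptions: segments here have a fixed sample count rather than a constant number of shaft rotations (which would need tachometer data), the window width (`length / 6`) is arbitrary, the AR fit uses ordinary least squares, and the test signal is synthetic.

```python
import numpy as np

def ar_coefficients(x, order):
    """Least-squares fit of x[n] ~ sum_j a[j] * x[n-1-j] for j = 0..order-1."""
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    coeffs, *_ = np.linalg.lstsq(rows, x[order:], rcond=None)
    return coeffs

def segment(x, length, overlap=0.7):
    """Split x into Gaussian-windowed segments with the given fractional overlap."""
    hop = max(1, int(round(length * (1 - overlap))))
    n = np.arange(length)
    window = np.exp(-0.5 * ((n - (length - 1) / 2) / (length / 6)) ** 2)  # sigma is an assumption
    starts = range(0, len(x) - length + 1, hop)
    return np.array([x[s:s + length] * window for s in starts])

fs = 4000                                   # 4 kHz sampling, as on the apparatus slide
t = np.arange(fs) / fs
# Hypothetical vibration signal: two sinusoids at 120 Hz and 310 Hz.
signal = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 310 * t)
segments = segment(signal, length=400)
# One 10-dimensional feature vector per segment.
features = np.array([ar_coefficients(s, order=10) for s in segments])
```

Each row of `features` would then be one pattern for the feature reduction and classification steps.
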
  25. 25. Results: Proposed Algorithm
  26. 26. Results: Kernel PCA
  27. 27. Results: Kernel FDA N.B. Potential for singular matrices
  28. 28. Results: Exhaustive Feature Search
  29. 29. Feature Extraction in the Absence of Outliers
  30. 30. Motivation and Development The above violates assumption of novelty detection  Limited data from fault classes In the case where we know nothing of the outlier classes  Work with what we have: normal data  Minimize variance of normal data
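
One way to read “minimize variance of normal data” is to keep the directions along which the normal data vary least, i.e. the eigenvectors of the normal-class scatter matrix with the smallest eigenvalues. That reading, the name `min_variance_basis`, and the synthetic data are assumptions; the transcript does not give the formulation.

```python
import numpy as np

def min_variance_basis(normal_data, k):
    """Return the mean and the k directions along which the normal data vary least."""
    mean = normal_data.mean(axis=0)
    scatter = (normal_data - mean).T @ (normal_data - mean) / len(normal_data)
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    return mean, eigvecs[:, :k], eigvals[:k]

rng = np.random.default_rng(4)
# Hypothetical healthy data: standard deviations 10, 1 and 0.1 along x, y and z.
normal_data = rng.normal(size=(1000, 3)) * np.array([10.0, 1.0, 0.1])
mean, basis, variances = min_variance_basis(normal_data, k=2)
reduced = (normal_data - mean) @ basis
```

A tightly clustered normal class in the reduced space leaves more room for unseen fault patterns to fall outside the decision boundary, and no outlier data are needed to compute the basis.
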
  31. 31. Results: Novelty Reduction (Outlier Absence)
  32. 32. Conclusions
  33. 33. Conclusions Reduce a large feature space to a smaller one  Mitigate the curse of dimensionality Objective function tweaked for novelty detection Similar to MDA but modified to accommodate the case where normal and outlier means are close together Results good for artificial and machinery data Future work  Extend technique with kernels  Difficult problem due to need for mean Thanks  CEMI  Dr. Mechefske, Queen's