Object Detection Using the Documented Viola-Jones Technique. Student: Nathan Faggian. Supervisors: Dr. Andrew Paplinski, Prof. Bala Srinivasan. Version 1.1
What is Object Detection? Detecting a specified object class within an image. Object detection has many applications in computer-based vision: object tracking, object recognition, scene surveillance. The focus of this project was to implement object detection, and to detect objects of the class face.
How Is It Done? A standard pattern recognition problem. Feature extraction: something that can be representative of a face. Feature evaluation: does this “something” really represent a face? A bit of a black art… Classification: given a sample and its features, what is it?
Common Techniques A strong focus on statistics. Statistical models of images: Schneiderman-Kanade. A lot of work with neural networks, generally slow systems: Rowley-Baluja. Feature and template methods seem to be the most common.
Features of Good Techniques Quick to compute: classifying a face does not require a lot of online (run-time) processing. Accurate: most good implementations achieve accuracy above 90 percent. Capitalization on invariance: features are invariant to scale, luminance, and rotation.
Paul Viola and Michael Jones Devised a technique that was both robust and very quick: 15 times quicker than any technique at the time of release. A detection algorithm that can operate in real time: 95% accuracy at around 17 fps. Understanding the technique is the primary goal of this project. It is a good technique!
The Viola-Jones Technique Feature extraction and feature evaluation: rectangular features are used; with a new image representation, their calculation is very fast. Classifier training and feature selection use a method called AdaBoost: a long and exhaustive training process. A degenerate decision tree of classifiers is formed, the key to the technique's speed.
Features Four basic types, all easy to calculate: the pixel sum under the white areas is subtracted from the pixel sum under the black areas. A special representation of the sample, called the integral image, makes feature extraction faster.
Integral Images Also known as summed-area tables: a representation with which any rectangle's pixel sum can be calculated from just four lookups into the integral image.
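As a concrete illustration, here is a minimal sketch in Python with NumPy (not the project's Matlab/C++ code; the function names are my own) of the integral image, the four-lookup rectangle sum, and one two-rectangle feature built on top of it:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a leading zero row/column so the
    four-lookup formula needs no bounds checks: ii[y, x] = sum(img[:y, :x])."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the w-by-h rectangle with top-left corner (x, y),
    computed with exactly four lookups into the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """A horizontal two-rectangle feature: left half minus right half."""
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return left - right
```

Because each rectangle costs four lookups regardless of its size, every feature is evaluated in constant time once the integral image has been built.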
Feature Extraction Features are extracted from sub-windows of a sample image. The base size for a sub-window is 24 by 24 pixels. In a 24 by 24 pixel sub-window there are roughly 180,000 possible features to calculate. What is the end result of feature extraction? A lot of data! Using so many features relative to the training data invites overfitting, so the amount of data must be reduced; this can be compensated for to an extent by logical elimination.
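To see where a count of that order comes from, the sketch below enumerates every scale and placement of the rectangular features inside a 24 by 24 sub-window. The exact total depends on which feature types are counted; this basic five-type set (an assumption on my part) gives about 162,000, the same order as the figure quoted above.

```python
def count_features(W=24, H=24):
    # Base shapes of the standard Haar-like feature types: two-rectangle
    # (horizontal and vertical), three-rectangle, and four-rectangle.
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for bw, bh in base_shapes:
        for w in range(bw, W + 1, bw):             # every horizontal scale
            for h in range(bh, H + 1, bh):         # every vertical scale
                total += (W - w + 1) * (H - h + 1) # every placement
    return total

print(count_features())  # 162336 for this particular feature set
```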
Weak Classifiers A feature, a threshold and a parity. The threshold is obtained by computing the mean feature value over each class set and then averaging the two means. The parity defines the direction of the inequality.
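A hedged sketch of such a weak classifier, following the description above (the names are my own): the threshold is the average of the two class means, and the parity orients the inequality so that faces fall on the accepting side.

```python
def train_weak(face_vals, nonface_vals):
    """Threshold: mean feature value of each class, averaged.
    Parity: +1 or -1, chosen so faces satisfy parity*value < parity*theta."""
    mu_face = sum(face_vals) / len(face_vals)
    mu_nonface = sum(nonface_vals) / len(nonface_vals)
    theta = (mu_face + mu_nonface) / 2
    parity = 1 if mu_face < mu_nonface else -1
    return theta, parity

def classify_weak(value, theta, parity):
    """Returns 1 (face) or 0 (non-face) for one feature value."""
    return 1 if parity * value < parity * theta else 0
```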
Feature/Classifier Evaluation Using AdaBoost, the number of features is dramatically reduced. A simple algorithm selects one feature at a time and assigns a weight to it, producing a strong classifier. It is a method for selecting features, but it can also train the classifiers in a tree: features are clustered together to form nodes in a degenerate decision tree.
AdaBoost Given a sample set of images, for t = 1…T (rounds of boosting): a weak classifier is trained for each single feature; the error of each classifier is calculated; the classifier (a single feature) with the lowest error is selected and combined with the previously selected classifiers to make a strong classifier. After T rounds a strong classifier is created: the weighted linear combination of the T weak classifiers selected.
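The sketch below follows that boosting loop, reusing train_weak and classify_weak from the weak-classifier sketch; feature_values[j][i] (a hypothetical layout of my own) holds the value of feature j on sample i.

```python
import math

def adaboost(feature_values, labels, T):
    """labels[i] in {0, 1}. Returns the strong classifier as a list of
    (alpha, feature_index, theta, parity) tuples."""
    n = len(labels)
    weights = [1.0 / n] * n                      # uniform initial sample weights
    strong = []
    for _ in range(T):
        total = sum(weights)
        weights = [w / total for w in weights]   # normalise each round
        best = None
        for j, vals in enumerate(feature_values):
            faces = [v for v, y in zip(vals, labels) if y == 1]
            others = [v for v, y in zip(vals, labels) if y == 0]
            theta, parity = train_weak(faces, others)
            err = sum(w for w, v, y in zip(weights, vals, labels)
                      if classify_weak(v, theta, parity) != y)
            if best is None or err < best[0]:
                best = (err, j, theta, parity)
        err, j, theta, parity = best
        err = max(err, 1e-10)                    # guard against a perfect classifier
        beta = err / (1.0 - err)
        # Down-weight correctly classified samples; hard examples keep their weight.
        weights = [w * (beta if classify_weak(v, theta, parity) == y else 1.0)
                   for w, v, y in zip(weights, feature_values[j], labels)]
        strong.append((math.log(1.0 / beta), j, theta, parity))
    return strong

def classify_strong(strong, sample_features):
    """Weighted vote of the selected weak classifiers against half the total weight."""
    score = sum(alpha for alpha, j, theta, parity in strong
                if classify_weak(sample_features[j], theta, parity) == 1)
    return 1 if score >= 0.5 * sum(alpha for alpha, _, _, _ in strong) else 0
```

The reweighting step is what isolates the hard examples: samples the current classifier gets right lose weight, so the next round concentrates on the mistakes.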
Hard examples are isolated…
Classifier error is driven down.
The Attentional Cascade Referred to here as a degenerate decision tree. The reason the technique is fast: quick rejection of sub-windows.
Motivation for a cascade Speed, and reduction of false positives. Each node is trained with the false positives of the prior node. AdaBoost can be used in conjunction with a simple bootstrapping process to drive detection error down. Viola and Jones presented a method to do this that iteratively builds boosted nodes until a desired false-positive rate is reached.
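A sketch of detection-time evaluation of such a cascade, reusing classify_strong from the AdaBoost sketch: a sub-window must pass every node to be declared a face, and the cheap early nodes reject most non-face windows immediately, which is where the speed comes from.

```python
def cascade_classify(cascade, sample_features):
    """cascade: list of strong classifiers, cheapest (fewest features) first."""
    for strong in cascade:
        if classify_strong(strong, sample_features) == 0:
            return 0   # rejected early: no further nodes are evaluated
    return 1           # accepted by every node
```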
Implementations Two implementations were realized. A Matlab-based system: improved flexibility, able to produce results quickly. A C++ system, more of a framework: far faster than the interpreted Matlab scripts, yet less flexible.
Current Progress Attentional cascade training code is complete. Matlab/C++ framework for future work. Numerous monolithic detectors have been trained.
An existing system: OpenCV (Intel). There is still much work to do!
Questions? How was my talk? Can anything be explained better? Email: nathan.faggian@mail.csse.monash.edu.au
