Binary Features
Steven C. Mitchell, Ph.D.
Componica, LLC
What’s a Binary Feature?
-Let’s take an image and sample a region of interest, a 4x4 patch. Maybe you’re looking for a face, a tumor, or a gun.
-In a typical object detection system, this region of interest is scanned across the image at different scales.
-Typically you scan left-to-right, top-to-bottom in steps of 10% of the patch size. Then you shrink the image (or enlarge the patch) by 20% and start over. Continue until the image becomes too small or you’ve found what you’re looking for.
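The scan described above can be sketched as a generator of windows. This is a minimal sketch and not code from the talk. It assumes a square patch and an image pyramid that shrinks by 20% per level, and the name `scan_windows` and its defaults are hypothetical. Each window position is mapped back to original-image coordinates so that a detection can be drawn on the input.

```python
def scan_windows(img_w, img_h, patch=24, step_frac=0.10, shrink=0.80):
    """Yield (x, y, size) for every window, in original-image pixels.

    Hypothetical helper: scans each pyramid level left-to-right, top-to-bottom
    in steps of step_frac * patch, then shrinks the image by 20% and repeats.
    """
    step = max(1, int(round(patch * step_frac)))  # 10% of the patch size
    scale = 1.0
    while True:
        w, h = int(img_w * scale), int(img_h * scale)  # shrunken image size
        if w < patch or h < patch:
            break  # the image became too small: stop
        for y in range(0, h - patch + 1, step):      # top-to-bottom
            for x in range(0, w - patch + 1, step):  # left-to-right
                # Map the window back to original-image coordinates.
                yield (int(x / scale), int(y / scale), int(round(patch / scale)))
        scale *= shrink  # shrink the image by 20% and start over


windows = list(scan_windows(48, 48, patch=24))
print(len(windows), windows[0])  # 250 (0, 0, 24)
```

Scaling the image rather than the patch keeps the patch at a fixed pixel size. This means the same trained features can be reused unchanged at every level.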
-So let’s start with this patch (we’ll assume only gray values; forget about color for now).
-First, each pixel has a value, typically from 0 to 255.
-We also need a way of addressing the location of these pixels. I’ll use a simple numbering scheme, since the patches will always be 4x4.
-Lastly, I want to compare the brightness of two pixels. I’ll pick locations 5 and 11.
-Why those two locations? I’ll explain how locations are chosen in a later slide.
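Here is that single binary feature written out. The slide’s numbering diagram isn’t reproduced in this text, so this sketch assumes row-major numbering from 0: location k sits at row k // 4, column k % 4. The pixel values are made up.

```python
# Made-up 4x4 gray patch (values 0..255), stored row by row.
patch = [
    [12, 40, 80, 120],
    [30, 90, 150, 200],
    [25, 60, 110, 170],
    [10, 35, 70, 100],
]


def pixel(patch, loc):
    """Gray value at location loc (0..15), assuming row-major numbering."""
    return patch[loc // 4][loc % 4]


def binary_feature(patch, a, b):
    """The yes/no question: is pixel a darker than pixel b?"""
    return pixel(patch, a) < pixel(patch, b)


print(binary_feature(patch, 5, 11))  # 90 < 170 -> True
```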
-Ok, let’s try different patches with the same binary feature, that is, comparing locations 5 and 11.
-Now imagine I try a whole bunch of pairs on a given patch: 2 vs 14, 8 vs 4, 7 vs 2, etc. I’m going to get a bunch of yes/no responses based on the patch I happen to show the system.
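Asking many pairs of the same patch turns it into a string of bits. This is a minimal sketch. It assumes the patch is flattened in row-major order, and it uses the pairs from the examples above. Packing the bits into one integer is an illustrative extra, and the function names are hypothetical.

```python
def binary_descriptor(patch, pairs):
    """Answer each (a, b) question, 'is pixel a darker than pixel b?', as 0/1."""
    flat = [v for row in patch for v in row]  # locations 0..15, row-major
    return [1 if flat[a] < flat[b] else 0 for a, b in pairs]


def pack_bits(bits):
    """Pack bits into one integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


pairs = [(5, 11), (2, 14), (8, 4), (7, 2)]
ramp = [[0, 10, 20, 30], [40, 50, 60, 70], [80, 90, 100, 110], [120, 130, 140, 150]]
bits = binary_descriptor(ramp, pairs)
print(bits, pack_bits(bits))  # [1, 1, 0, 0] 12
```

A patch with the opposite gradient flips every answer. That is why the bit string says something about the patch’s structure.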
Different Types of Binary Features
-Of course there are many different types of binary features, that is, different types of questions I can ask.
-Simple thresholding, which pixel is brighter, which pixel is brighter by at least some threshold, how similar two pixels are.
-With color, it could be comparisons between different channels.
-The main points are: each feature has a fixed set of parameters, discovered during training and held fixed during recognition, and the output is a yes or no.
-BTW, I really like the simple comparison of two pixels. It’s fast, and any brightness / contrast change that preserves the ordering of pixel values (a positive gain plus an offset, without clipping) returns the same result.
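The question types listed above can be sketched as small predicates over a flattened patch. The function names and parameters here are illustrative assumptions. The demo applies a brightness/contrast change (gain 1.5, offset 20, no clipping) and shows that only the plain two-pixel comparison keeps its answer.

```python
# Each feature takes a flattened patch p plus fixed, trained parameters and
# returns True/False. All names here are illustrative.

def f_threshold(p, a, t):
    """Simple thresholding: is pixel a darker than t?"""
    return p[a] < t


def f_brighter(p, a, b):
    """Which pixel is brighter: is pixel a darker than pixel b?"""
    return p[a] < p[b]


def f_brighter_by(p, a, b, t):
    """Is pixel b brighter than pixel a by more than t?"""
    return p[a] + t < p[b]


def f_similar(p, a, b, t):
    """Are two pixels similar (within t of each other)?"""
    return abs(p[a] - p[b]) < t


def f_channels(p_rgb, a, ca, b, cb):
    """Color: compare channel ca of pixel a with channel cb of pixel b."""
    return p_rgb[a][ca] < p_rgb[b][cb]


def adjust(p, gain, offset):
    """Brightness/contrast change (no clipping)."""
    return [gain * v + offset for v in p]


p = [10 * k for k in range(16)]
q = adjust(p, gain=1.5, offset=20)
print(f_brighter(p, 5, 11), f_brighter(q, 5, 11))    # True True
print(f_threshold(p, 5, 60), f_threshold(q, 5, 60))  # True False
```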
Decision Tree Overview
-Now in order to make use of these features, let’s talk about decision trees.
[Figure: decision tree for "Did it rain last night?" with the questions "Is Grass Wet?" and "Did you water the grass?" at the nodes and YES/NO leaves]
-Let’s say you’re trying to determine if it rained last night.
-This is a classification problem.
-Here I constructed a simple decision tree based on a couple of yes/no questions.
-At the leaves of this tree are probability histograms built from my data.
-They sum to one.
-My decision at each leaf is whichever of the two bars is greater.
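A tree like this, with probability histograms at its leaves, can be sketched with nested dictionaries. The layout (which question sits where) and every probability below are made up for illustration. Only the idea of picking the taller bar at the leaf comes from the slide.

```python
# Hypothetical tree: internal nodes ask a yes/no question; leaves hold a
# histogram over {YES: it rained, NO: it didn't} that sums to one.
TREE = {
    "question": "grass_wet",
    True: {
        "question": "watered",
        True: {"leaf": {"YES": 0.3, "NO": 0.7}},
        False: {"leaf": {"YES": 0.9, "NO": 0.1}},
    },
    False: {
        "question": "watered",
        True: {"leaf": {"YES": 0.1, "NO": 0.9}},
        False: {"leaf": {"YES": 0.2, "NO": 0.8}},
    },
}


def classify(node, answers):
    """Follow the yes/no answers down to a leaf; pick the taller bar."""
    while "leaf" not in node:
        node = node[answers[node["question"]]]
    hist = node["leaf"]
    return max(hist, key=hist.get), hist


label, hist = classify(TREE, {"grass_wet": True, "watered": False})
print(label, hist)  # YES {'YES': 0.9, 'NO': 0.1}
```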
[Figure: YES/NO class histograms after splitting on "Is Grass Wet?" versus splitting on "Do you like oranges?"]
Selecting Good Questions
-So how do I pick a good question? Pick a question from my universe of questions, pour my data through it, and measure how well it predicts.
-Three commonly used metrics: entropy, Gini impurity, and classification error.
-For a two-class problem, what they basically measure is how far you are from a 50/50 coin toss.
-Here you can see that an irrelevant question like "Do you like oranges?" yields a flat distribution, and therefore a high entropy, Gini impurity, or classification error.
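The three metrics, and the usual way of scoring a split with them, can be sketched as follows. A split is scored by weighting each child’s impurity by its share of the samples. The counts below are made up to mimic the wet-grass and oranges histograms, and the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits: 1.0 for a 50/50 coin toss, 0.0 when pure."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def gini(p):
    """Gini impurity: 0.5 for 50/50 (two classes), 0.0 when pure."""
    return 1.0 - sum(q * q for q in p)


def classification_error(p):
    """Fraction misclassified by always guessing the most likely class."""
    return 1.0 - max(p)


def split_score(left_counts, right_counts, impurity=gini):
    """Impurity of the two children, weighted by how many samples each gets.
    Lower is better."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right

    def normalize(counts, m):
        return [c / m for c in counts] if m else [0.0] * len(counts)

    return (n_left / n) * impurity(normalize(left_counts, n_left)) + (
        n_right / n
    ) * impurity(normalize(right_counts, n_right))


# Made-up counts of [rained, didn't rain] reaching each branch.
print(split_score([9, 1], [1, 9]))  # "Is grass wet?": about 0.18
print(split_score([5, 5], [5, 5]))  # "Do you like oranges?": 0.5
```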
[Figure: the same tree with the pixel-comparison questions I[5] < I[11] and I[7] < I[3] at the nodes and YES/NO leaves]
-Going back to binary features, the questions we ask are based on pixel comparisons.
-How do we pick the parameters? We randomly sample from the universe of parameters and choose the one that yields a good score on the given dataset.
-In the 4x4 patch, I would pick two random numbers from 0 to 15 (no duplicates) and a random threshold (if I need one). I add that feature to the tree, then test the tree on my dataset and compute a score. I do this 2000 times and keep the binary feature that produced the best score. I then keep growing the tree greedily until it’s big enough (5-9 levels deep) or accurate enough.
-This answers the question of where x, y, and T come from.
-In my experience, a sample of 500-2000 candidates works really well, with diminishing returns beyond that.
-This is the most time-consuming part of building these trees, but it’s extremely parallelizable.
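The random search for a node’s question can be sketched as below. This is a minimal sketch of one greedy step. It scores each candidate split on its own with weighted Gini impurity, rather than re-testing a whole tree, and the function name and defaults are assumptions. Growing a full tree repeats this at each node, using only the samples that reach that node. Each trial is independent, which is what makes the search easy to parallelize.

```python
import random


def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def best_pixel_pair(patches, labels, n_classes, n_trials=2000, n_pixels=16, seed=0):
    """Try n_trials random questions 'I[a] < I[b]' and keep the best one.

    patches: flattened patches (lists of n_pixels gray values).
    labels:  class index for each patch.
    Returns (score, a, b); a lower score means purer children.
    """
    rng = random.Random(seed)
    n = len(labels)
    best = None
    for _ in range(n_trials):
        a, b = rng.sample(range(n_pixels), 2)  # two distinct locations
        yes, no = [0] * n_classes, [0] * n_classes
        for p, y in zip(patches, labels):
            if p[a] < p[b]:
                yes[y] += 1
            else:
                no[y] += 1
        score = sum(yes) / n * gini(yes) + sum(no) / n * gini(no)
        if best is None or score < best[0]:
            best = (score, a, b)
    return best


# Made-up data: only pixels 5 and 11 carry the class; the rest is noise.
data_rng = random.Random(1)
patches, labels = [], []
for i in range(40):
    p = [data_rng.randint(0, 255) for _ in range(16)]
    label = i % 2
    p[5], p[11] = (50, 200) if label else (200, 50)
    patches.append(p)
    labels.append(label)
print(best_pixel_pair(patches, labels, n_classes=2))  # expect score 0.0 for pair 5/11
```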
Selecting Good Questions
-That’s for classification. Decision trees can also be used for regression.
-Instead of classes like yes/no or cat/dog/horse, the output is the average value, over my dataset, of the samples that reach each leaf.
-What makes a good question? One that decreases the variance around those averages.
-Also note that the output can be multi-dimensional rather than a single value. Computing the variance of multi-dimensional things is fairly easy, don’t worry.
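For multi-dimensional outputs, "variance" can simply be the total squared distance from the mean vector. This sketch scores a yes/no question by how much it reduces that quantity, and it returns the mean vector as the leaf value. The targets are made-up 2-D points and the function names are illustrative.

```python
def leaf_value(vectors):
    """A regression leaf outputs the average target vector of its samples."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def total_variance(vectors):
    """Sum of squared distances from the mean vector (multi-dimensional SSE)."""
    if not vectors:
        return 0.0
    mean = leaf_value(vectors)
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(len(mean)))


def variance_reduction(targets, answers):
    """How much a yes/no question lowers the total squared error."""
    yes = [t for t, a in zip(targets, answers) if a]
    no = [t for t, a in zip(targets, answers) if not a]
    return total_variance(targets) - total_variance(yes) - total_variance(no)


# Made-up 2-D targets (e.g. a landmark position) and two candidate questions.
targets = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(variance_reduction(targets, [True, True, False, False]))  # 200.0
print(variance_reduction(targets, [True, False, True, False]))  # 4.0
```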
[Figure: regression tree with the pixel-comparison questions I[5] < I[11] and I[7] < I[3] at the nodes]
-So here is a binary feature tree that returns a value (like the probability that the patch is an object) instead of a class... or it could return a vector, like landmark positions.
-Now we can start constructing interesting solutions using these concepts.
Corner Detector
-First let’s start with corner detection.
Harris Corner Detector
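1. Compute smoothed gradients $I_x$ and $I_y$ in X and Y.
2. For each pixel, compute the matrix $M = \sum_{(u,v) \in W} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$ over a window $W$.
3. Solve for the corner response $R = \det(M) - k\,(\operatorname{trace} M)^2$.
4. Apply non-maximum suppression to gather corners.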
-The Harris corner detector is one of the simplest ways to detect corners. It estimates the second derivative (curvature) of the sum of squared differences between a patch and a slightly shifted copy of itself.
-Other detectors include SURF, SIFT, SUSAN, etc.
-So what’s the point? These points stay stable under rotation and translation (and, for detectors like SIFT and SURF, under scale changes).
-This reduces the data so that you can rapidly compare the image to a template for techniques like augmented reality, image stitching, and motion tracking.
-So you can find corners using these four easy steps... wait... lots of math... slow...
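For comparison, here is the Harris response written out in plain Python. It makes several simplifying assumptions: central-difference gradients with no pre-smoothing, a box window instead of the usual Gaussian weighting, and k = 0.04 (a commonly used value). The names are illustrative. The point is to show how much arithmetic every pixel costs, not to be fast.

```python
def harris_response(img, k=0.04, win=1):
    """R = det(M) - k * trace(M)^2 per pixel; M summed over a (2*win+1)^2 box.
    Border pixels are left at 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):  # step 1: gradients (central differences)
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(win + 1, h - win - 1):
        for x in range(win + 1, w - win - 1):
            sxx = syy = sxy = 0.0  # step 2: entries of M
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2  # step 3
    return r


def local_maxima(r, threshold):
    """Step 4, non-maximum suppression: keep (y, x) above threshold that are
    the largest value in their 3x3 neighbourhood."""
    peaks = []
    for y in range(1, len(r) - 1):
        for x in range(1, len(r[0]) - 1):
            v = r[y][x]
            if v > threshold and all(
                v >= r[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ):
                peaks.append((y, x))
    return peaks


# A bright square whose only interior corner is at row 4, column 4.
img = [[100 if x >= 4 and y >= 4 else 0 for x in range(12)] for y in range(12)]
r = harris_response(img)
print(local_maxima(r, 1e6))  # [(4, 4)]
```

Along the square’s straight edges R is negative, and in flat regions it is zero. This is the usual corner/edge/flat distinction the response is designed to make.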
FAST Corner Detector
Given a pixel, is this location a corner, based on the 16 pixels on a circle around it?
FAST uses a decision tree trained on real images and converted into nested if statements in C.
No math beyond comparisons: it averages about 3 comparisons per pixel... very, very FAST.
http://mi.eng.cam.ac.uk/~er258/work/fast.html
-Ok, enough of that. Let’s use a more machine learning approach...
FAST: Features from Accelerated Segment Test
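The question FAST answers is the segment test. Is there a contiguous arc of at least n of the 16 circle pixels that are all brighter than the center plus a threshold t, or all darker than the center minus t? The learned decision tree is a fast way of answering that question. Below is a plain (not tree-compiled) sketch of the test itself. It assumes the radius-3 circle, n = 9 (FAST-9) and an arbitrary t = 20, and the helper names are hypothetical.

```python
# The 16 pixels on a radius-3 circle, as (dx, dy), walked in order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """Segment test: n contiguous circle pixels all brighter than
    img[y][x] + t, or all darker than img[y][x] - t."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # 1: brighter arc, -1: darker arc
        run = 0
        for v in ring + ring:  # walk twice so an arc may wrap around
            run = run + 1 if sign * (v - p) > t else 0
            if run >= n:
                return True
    return False


def ring_image(arc):
    """7x7 black image whose circle pixels listed in `arc` are set to 100."""
    img = [[0] * 7 for _ in range(7)]
    for i in arc:
        dx, dy = CIRCLE[i]
        img[3 + dy][3 + dx] = 100
    return img


print(is_fast_corner(ring_image(range(10)), 3, 3))  # True: 10 in a row
print(is_fast_corner(ring_image(range(8)), 3, 3))   # False: only 8
```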
FAST Corner Detector
The source code is computer generated,
and free for anyone to use.
It is 6000 lines long and not
comprehensible.
By averaging gradient vectors and taking an arctangent, you can get a rotation vector cheaply.
[Figure: excerpt from "Multiple Target Localisation at over 100 FPS" showing the sample locations around an interest point (grey circle) and an orientation computed from the sum of the gradients between opposite pixels on the circle; FAST-9 is used as the interest point detector]
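A sketch of that orientation trick follows. It assumes the orientation comes from summing, over the 8 diametrically opposite pairs on the FAST circle, each pair’s intensity difference times the pair’s direction, followed by an arctangent. The function name is hypothetical, and the paper’s exact sampling pattern may differ.

```python
import math

# The 16 FAST circle pixels; entry i + 8 is diametrically opposite entry i.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def fast_orientation(img, x, y):
    """Angle (radians, image y axis pointing down) of the summed gradient
    between opposite circle pixels around (x, y)."""
    gx = gy = 0.0
    for i in range(8):
        dx, dy = CIRCLE[i]
        ox, oy = CIRCLE[i + 8]  # equals (-dx, -dy)
        diff = img[y + dy][x + dx] - img[y + oy][x + ox]
        gx += diff * dx  # weight the difference by the pair's direction
        gy += diff * dy
    return math.atan2(gy, gx)


ramp_x = [[10 * x for x in range(7)] for y in range(7)]  # brighter to the right
ramp_y = [[10 * y for x in range(7)] for y in range(7)]  # brighter downwards
print(fast_orientation(ramp_x, 3, 3), fast_orientation(ramp_y, 3, 3))  # 0.0 and pi/2
```

Rotating each patch by this angle before comparing it is one way to get the patch normalization mentioned on the next slide.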
FAST Example
-Here’s a picture of yours truly and a Starbucks logo that I ran for a project.
-The lines indicate a direction derived from the rotation vector in the last slide. It’s useful for normalizing patches, for example if you were building an augmented reality system on a mobile device.
-Here is some random dude’s YouTube video running FAST. I’d show you my own, but I didn’t have enough time.
-Notice it’s running in real time on a slow iPhone 3; Harris corners and SURF would drag on such a device. Just as a note, mobile phones typically run 10x-30x slower than desktops.
Keypoint Recognition
-Once you have corners, the next step is to identify what those corners belong to.
Fast Keypoint Recognition using Random Ferns
Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua
-So in an image stitching problem, an augmented reality solution, or a bag-of-words object recognizer (Amazon’s product ID thingy), you sample a region of interest around each corner and try to match it against a known template.
-Comparisons are often non-trivial because you have to normalize the patches to undo distortions caused by rotation and tilt, normalize the brightness, and then compute some feature vector from each patch.
-Finally, you measure the distances between the feature vectors of every patch in the template and every patch in the image. That’s an O(n^2) deal there.
-Everything about this sounds really slow on an iPhone.
-Ok, let’s use binary feature trees to solve this.
-First, generate patches around each corner in the original template with random orientations, sizes, and tilts. Generate a ton of them, because that’s our training set.
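Generating that training set can be sketched as warping the template around each corner with random affine transforms. This is a minimal sketch under stated assumptions. Rotation, uniform scale, and a squash along a random direction stand in for the paper’s viewpoint sampling, nearest-neighbour sampling replaces proper interpolation, and all names and ranges are hypothetical.

```python
import math
import random


def mat_mul(a, b):
    """2x2 matrix product."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def rotation(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]


def random_affine(rng, scale=(0.7, 1.4), min_squash=0.5):
    """Random rotation, scale, and a squash along a random direction
    (a crude stand-in for out-of-plane tilt)."""
    theta = rng.uniform(-math.pi, math.pi)
    phi = rng.uniform(-math.pi, math.pi)
    s = rng.uniform(*scale)
    squash = [[1.0, 0.0], [0.0, rng.uniform(min_squash, 1.0)]]
    a = mat_mul(rotation(theta), mat_mul(rotation(-phi), mat_mul(squash, rotation(phi))))
    return [[s * v for v in row] for row in a]


def warp_patch(img, cx, cy, a, half):
    """(2*half+1)^2 patch whose pixel at offset (u, v) is read from
    (cx, cy) + a @ (u, v) in the source image (nearest neighbour, clamped)."""
    h, w = len(img), len(img[0])
    out = []
    for v in range(-half, half + 1):
        row = []
        for u in range(-half, half + 1):
            x = cx + a[0][0] * u + a[0][1] * v
            y = cy + a[1][0] * u + a[1][1] * v
            xi = min(max(int(round(x)), 0), w - 1)
            yi = min(max(int(round(y)), 0), h - 1)
            row.append(img[yi][xi])
        out.append(row)
    return out


# Usage: 100 random views of the 5x5 patch around one template corner.
rng = random.Random(0)
template = [[10 * y + x for x in range(9)] for y in range(9)]
views = [warp_patch(template, 4, 4, random_affine(rng), 2) for _ in range(100)]
```

Every view keeps the label of the corner it came from. This is what lets the classifier on the next slides learn appearance changes instead of having them normalized away at run time.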
-Next, these authors simplified the decision tree concept into something they dubbed ferns (primitive trees).
-The idea is that if you ask the same question at every node of a given depth, you can collapse the tree into simple bits of an index. The leaves are simply locations in an array.
-For example, three bits give 2^3 = 8 possible outcomes, so instead of a tree you have an array of 8 probability histograms.
-The class is then selected by simply taking the maximum of the class probabilities for a given set of bits, but you’re probably going to need a lot of bits to get a good result (they determine this empirically).
-Now, if you assume independence between ferns (the features within a fern are still modeled jointly), you can replace one huge fern with a product over several small ferns.
[Figure: three ferns of three pixel comparisons each; each fern’s bits form an index into its array of class histograms, e.g. 110₂ = 6, 001₂ = 1, 101₂ = 5]
Efficient Keypoint Recognition, Lepetit et al
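A minimal fern classifier along these lines is sketched below. It assumes flattened patches, random pixel pairs chosen once per fern, counts initialized to one so that no probability is zero, and equal class priors. The class name and parameters are hypothetical, not the authors’ code. The per-class log-likelihoods of the ferns are added, which is the product over ferns in log form.

```python
import math
import random


class Ferns:
    """Hypothetical fern classifier over flattened patches.

    Each fern asks `depth` fixed questions 'I[a] < I[b]'; the answers form an
    index into that fern's table of per-class counts. Ferns are treated as
    independent, so their class likelihoods multiply (we add logs).
    """

    def __init__(self, n_ferns, depth, n_pixels, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_classes = n_classes
        self.tests = [
            [tuple(rng.sample(range(n_pixels), 2)) for _ in range(depth)]
            for _ in range(n_ferns)
        ]
        # counts[f][index][c] start at 1 so no probability is ever zero.
        self.counts = [[[1] * n_classes for _ in range(2 ** depth)] for _ in range(n_ferns)]
        # totals[f][c] = sum over indices of counts[f][index][c]
        self.totals = [[2 ** depth] * n_classes for _ in range(n_ferns)]

    def index(self, f, patch):
        """Pack fern f's answers into an integer, e.g. bits 1,1,0 -> 6."""
        idx = 0
        for a, b in self.tests[f]:
            idx = (idx << 1) | (1 if patch[a] < patch[b] else 0)
        return idx

    def train(self, patch, label):
        for f in range(len(self.tests)):
            self.counts[f][self.index(f, patch)][label] += 1
            self.totals[f][label] += 1

    def classify(self, patch):
        """Class maximizing the sum over ferns of log P(index_f | class)."""
        scores = [0.0] * self.n_classes
        for f in range(len(self.tests)):
            row = self.counts[f][self.index(f, patch)]
            for c in range(self.n_classes):
                scores[c] += math.log(row[c] / self.totals[f][c])
        return max(range(self.n_classes), key=lambda c: scores[c])


noise = random.Random(2)


def noisy_ramp(up):
    """Made-up class 0 (brightening) or class 1 (darkening) 4x4 patches."""
    return [(10 * k if up else 150 - 10 * k) + noise.uniform(-4, 4) for k in range(16)]


ferns = Ferns(n_ferns=5, depth=3, n_pixels=16, n_classes=2)
for _ in range(50):
    ferns.train(noisy_ramp(True), 0)
    ferns.train(noisy_ramp(False), 1)
print(ferns.classify(noisy_ramp(True)), ferns.classify(noisy_ramp(False)))  # 0 1
```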
Fast Keypoint Recognition in Ten Lines of Code
Mustafa Özuysal Pascal Fua Vincent Lepetit
-This whole algorithm can be expressed in just 10 lines of C code.
-Very, very fast.
From Bits to Images
-So these binary trees throw away all gray values. Do they really characterize images well enough to solve serious problems?
-Ok, let’s say we took an image, found corners, and sampled binary pairs (a few hundred) from 32x32 patches around them. Can we reconstruct the image from just the corner locations, the patch size, and the results of the binary comparisons?
From Bits to Images: Inversion of Local Binary Descriptors
Emmanuel d’Angelo, Laurent Jacques, Alexandre Alahi and Pierre Vandergheynst
-Yes, we can. It’s a bit like solving Sudoku.
-What’s really surprising is how much information we can capture without any gray levels.
-You’re collecting edge information over different scales, and if the features are simple comparisons, they’re immune to brightness / contrast changes and global lighting.
-In many ways it’s superior to other means of characterizing images.
Object Detection
-Let’s talk about object detection.
Viola / Jones Object Detection
"Robust Real-time Object Detection"
Paul Viola and Michael Jones
-The Viola-Jones object detection framework was introduced in the early 2000s and was a breakthrough in object detection. Cheap cameras and cellphones use it all the time.
-A single feature works by taking the difference between the pixel sums of adjacent rectangles and thresholding it. If the difference exceeds a certain value, it votes "face".
-Of course, on its own that’s a very poor face detector, so they strengthened it using the principles of ensemble learning.
-That is, one rectangle comparison makes an awful face detector, but if you have a large number of weak detectors and take a weighted vote, you end up with a much more accurate detector.
-Wisdom of crowds.
-The AdaBoost algorithm shown here is a method for determining the weighting. Basically, give a bigger vote to the more accurate detectors, reweight the dataset to emphasize the samples they got wrong, and train the next detector. Repeat.
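The weighting scheme can be sketched as textbook discrete AdaBoost with decision stumps, where each stump thresholds one precomputed feature value (think of one rectangle feature). This formulation is assumed for illustration. Viola and Jones describe a closely related variant in terms of β = err / (1 - err) and also arrange the boosted classifiers into a cascade, which is omitted here. The data and names are made up.

```python
import math


def stump(x, j, thr, pol):
    """Weak detector: +pol if feature j >= thr, else -pol."""
    return pol if x[j] >= thr else -pol


def train_adaboost(X, y, rounds):
    """Discrete AdaBoost. X: feature vectors, y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for j in range(len(X[0])):
            for thr in sorted(set(x[j] for x in X)):
                for pol in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w) if stump(xi, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate -> bigger vote
        model.append((alpha, j, thr, pol))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, thr, pol)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Weighted vote of all weak detectors."""
    vote = sum(alpha * stump(x, j, thr, pol) for alpha, j, thr, pol in model)
    return 1 if vote >= 0 else -1


X = [[1, 1], [2, 5], [3, 2], [4, 6], [5, 3], [6, 4]]
y = [-1, 1, -1, 1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # [-1, 1, -1, 1, 1, 1]
```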
Viola / Jones Object Detection
Figure 2: The integral image. Left: A simple input of image values. Center: The computed integral image. Right:
Using the integral image to calculate the sum over rectangle D.
3 The Technique
Our adaptive thresholding technique is a simple extension of Wellner’s method [Wellner 1993]. The main idea
in Wellner’s algorithm is that each pixel is compared to an average of the surrounding pixels. Specifically, an
approximate moving average of the last s pixels seen is calculated while traversing the image. If the value of the
current pixel is t percent lower than the average then it is set to black, otherwise it is set to white. This method works
because comparing a pixel to the average of nearby pixels will preserve hard contrast lines and ignore soft gradient
changes. The advantage of this method is that only a single pass through the image is required. Wellner uses 1/8th
of the image width for the value of s and 15 for the value of t. However, a problem with this method is that it is
dependent on the scanning order of the pixels. In addition, the moving average is not a good representation of the
surrounding pixels at each step because the neighbourhood samples are not evenly distributed in all directions. By
using the integral image (and sacrificing one additional iteration through the image), we present a solution that does
not suffer from these problems. Our technique is clean, straightforward, easy to code, and produces the same output
independently of how the image is processed. Instead of computing a running average of the last s pixels seen, we
compute the average of an s x s window of pixels centered around each pixel. This is a better average for comparison
since it considers neighbouring pixels on all sides. The average computation is accomplished in linear time by using
the integral image. We calculate the integral image in the first pass through the input image. In a second pass, we
compute the s x s average using the integral image for each pixel in constant time and then perform the comparison.
If the value of the current pixel is t percent less than this average then it is set to black, otherwise it is set to white.
The following pseudocode demonstrates our technique for input image in, output binary image out, image width w
and image height h.
procedure AdaptiveThreshold(in,out,w,h)
1: for i = 0 to w do
2: sum ← 0
3: for j = 0 to h do
4: sum ← sum+in[i, j]
5: if i = 0 then
we can use an integral image and achieve a constant number of operations per rectangle with preprocessing.
To compute the integral image, we store at each location, I(x,y), the sum of all f(x,y) terms to the left and above the pixel (x,y). This is accomplished in linear time using the following equation for each pixel (taking into account the border cases),
$$I(x,y) = f(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1).$$
Figure 2 (left and center) illustrates the computation of an integral image. Once we have the integral image, the sum of the function for any rectangle with upper left corner (x1,y1) and lower right corner (x2,y2) can be computed in constant time using the following equation,
$$\sum_{x=x_1}^{x_2} \sum_{y=y_1}^{y_2} f(x,y) = I(x_2,y_2) - I(x_2,y_1-1) - I(x_1-1,y_2) + I(x_1-1,y_1-1).$$
Figure 2 (right) illustrates that computing the sum of f(x,y) over the rectangle D using Equation 2 is equivalent to computing the sums over the rectangles (A+B+C+D)-(A+B)-(A+C)+A.
D. Bradley, G. Roth, Adaptive Thresholding using the
Integral Image. J. Graphics Tools 12(2): 13-21 (2007)
-The other trick in Viola-Jones was a fast method of summing rectangles using an integral image.
-If you construct an integral image where each entry is its pixel plus the entries to the left and above, minus the entry to the upper left, you can rapidly compute any rectangle sum using the equation above.
-The problem is that constructing integral images can be slow, plus you’re doing 8 operations per feature.
-Binary features with pixel comparisons need only two pixel lookups, without constructing an integral image or normalizing brightness / contrast.
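The integral-image trick can be checked with a few lines of code. This is a minimal sketch of the recurrence and the four-lookup rectangle sum from the excerpt, using inclusive corners and treating everything outside the image as zero. For contrast it also includes a two-rectangle Haar-like feature and a pixel-pair feature. All function names are hypothetical.

```python
def integral_image(f):
    """ii[y][x] = sum of f over all pixels above and to the left, inclusive,
    built in one pass with I(x,y) = f(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)."""
    h, w = len(f), len(f[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = ii[y][x - 1] if x > 0 else 0
            up = ii[y - 1][x] if y > 0 else 0
            diag = ii[y - 1][x - 1] if x > 0 and y > 0 else 0
            ii[y][x] = f[y][x] + left + up - diag
    return ii


def rect_sum(ii, x1, y1, x2, y2):
    """Sum of f over the inclusive rectangle (x1, y1)-(x2, y2): four lookups."""
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0
    return at(x2, y2) - at(x2, y1 - 1) - at(x1 - 1, y2) + at(x1 - 1, y1 - 1)


def two_rect_feature(ii, x, y, w, h):
    """Viola-Jones style feature: left w x h box minus the box to its right."""
    return rect_sum(ii, x, y, x + w - 1, y + h - 1) - rect_sum(ii, x + w, y, x + 2 * w - 1, y + h - 1)


def pixel_pair_feature(f, x1, y1, x2, y2):
    """The binary-feature alternative: two lookups, one comparison."""
    return f[y1][x1] < f[y2][x2]


f = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(f)
print(ii)                                # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(rect_sum(ii, 1, 1, 2, 2))          # 5 + 6 + 8 + 9 = 28
print(two_rect_feature(ii, 0, 0, 1, 3))  # 12 - 15 = -3
```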
Binary Feature-Based Object Detection
Unconstrained Face Detection
Shengcai Liao, Anil K. Jain, and Stan Z. Li
[Figure: pixel-comparison tree with the questions I[5] < I[11] and I[7] < I[3] at the nodes and YES/NO leaves]
Object Detection with Pixel Intensity Comparisons Organized in Decision Trees
Nenad Markus, Miroslav Frljak, Igor S. Pandzic, Jorgen Ahlberg, and Robert Forchheimer
-This technique was published simultaneously by several groups.
-Here is Nenad Markus’ implementation.
-His runs 30x faster than Viola-Jones and 9x faster than the Local Binary Patterns approach in OpenCV.
-He achieves rotational invariance by rotating the trees N times; it’s fast enough that this is feasible.
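Here is a sketch in the spirit of this approach; it is not Markus’ actual code. Comparison points are stored as offsets from the window centre in units of the window size, so the same tree works at any scale. Evaluating it with rotated offsets is the "rotate the trees N times" idea. The array layout (children of node k at 2k+1 and 2k+2) and all names are assumptions.

```python
import math


def sample(img, cx, cy, size, angle, dx, dy):
    """Pixel at offset (dx, dy), in window-size units, rotated by angle
    around the window centre (cx, cy); clamped to the image."""
    c, s = math.cos(angle), math.sin(angle)
    x = cx + size * (c * dx - s * dy)
    y = cy + size * (s * dx + c * dy)
    xi = min(max(int(round(x)), 0), len(img[0]) - 1)
    yi = min(max(int(round(y)), 0), len(img) - 1)
    return img[yi][xi]


def eval_tree(img, nodes, leaves, depth, cx, cy, size, angle=0.0):
    """Walk a complete binary tree of pixel comparisons.

    nodes[k] = (dx1, dy1, dx2, dy2); children of node k are 2k+1 (no)
    and 2k+2 (yes); leaves[i] holds the output of the i-th leaf.
    """
    k = 0
    for _ in range(depth):
        dx1, dy1, dx2, dy2 = nodes[k]
        yes = sample(img, cx, cy, size, angle, dx1, dy1) < sample(img, cx, cy, size, angle, dx2, dy2)
        k = 2 * k + 1 + int(yes)
    return leaves[k - (2 ** depth - 1)]


def scores_over_rotations(img, nodes, leaves, depth, cx, cy, size, n_angles):
    """Evaluate the same tree with its comparison points rotated N times."""
    return [
        eval_tree(img, nodes, leaves, depth, cx, cy, size, 2 * math.pi * i / n_angles)
        for i in range(n_angles)
    ]


# Toy image: dark left half, bright right half. A depth-1 tree asks whether
# the point left of centre is darker than the point right of centre.
img = [[0 if x < 10 else 100 for x in range(20)] for y in range(20)]
nodes = [(-0.25, 0.0, 0.25, 0.0)]
leaves = [-1.0, 1.0]
print(scores_over_rotations(img, nodes, leaves, 1, 10, 10, 16, 4))  # [1.0, -1.0, -1.0, -1.0]
```

Only the unrotated tree sees the dark-to-bright pattern. A detector would keep the best score over all N rotations for each window.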
Object Landmarking
Face Alignment by Explicit Shape Regression, Cao et al
-Microsoft has been putting a lot of effort into methods for landmarking faces.
-For some reason they call it face alignment. We tend to call it landmarking or segmentation.
-Basically, find points on an object that may or may not lie on the contours of that object.
Based on: Face Alignment by Explicit Shape Regression, Cao et al
[Figure: shape estimates at t = 0, 1, 2, ..., 10; each stage applies an affine transform to the mean shape, runs the regression trees ("Insert Magic"), and transforms back from the mean shape]
-Here is one of their approaches to landmarking faces using regression trees.
-It’s dubbed Explicit Shape Regression.
-Typically done with 10 groups of trees.
-Each group is hundreds of trees, each refining the shape vector produced by the previous tree.
-Although they don’t say it, they’re effectively using gradient boosting with regression trees and a shrinkage (learning rate) lambda of one. A slightly lower lambda would probably improve generalization, but most likely they were not aware of this.
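A minimal sketch of gradient boosting with shrinkage on vector-valued targets follows (think of the targets as shape increments). To keep it short, each stage is a depth-1 regression tree on a single scalar feature, not the regressors used by Cao et al. The function names and the toy data are made up. With lam = 1, each stage jumps straight to the mean of the residuals. With a smaller lam, each stage takes only part of that step.

```python
def mean_vec(vs, dim):
    return [sum(v[d] for v in vs) / len(vs) for d in range(dim)] if vs else [0.0] * dim


def sse(vs, m):
    return sum((v[d] - m[d]) ** 2 for v in vs for d in range(len(m)))


def fit_stump(xs, res):
    """Best threshold on scalar xs for vector residuals; leaves hold means."""
    dim = len(res[0])
    vals = sorted(set(xs))
    best = None
    for lo, hi in zip(vals, vals[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, res) if x < thr]
        right = [r for x, r in zip(xs, res) if x >= thr]
        ml, mr = mean_vec(left, dim), mean_vec(right, dim)
        err = sse(left, ml) + sse(right, mr)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    if best is None:  # all xs equal: a single leaf
        m = mean_vec(res, dim)
        return float("inf"), m, m
    return best[1], best[2], best[3]


def boost(xs, targets, n_stages, lam):
    """Each stage fits the current residuals; only lam of its output is added."""
    dim = len(targets[0])
    preds = [[0.0] * dim for _ in xs]
    stages = []
    for _ in range(n_stages):
        res = [[t[d] - p[d] for d in range(dim)] for t, p in zip(targets, preds)]
        thr, ml, mr = fit_stump(xs, res)
        stages.append((thr, ml, mr))
        for x, p in zip(xs, preds):
            leaf = ml if x < thr else mr
            for d in range(dim):
                p[d] += lam * leaf[d]  # shrinkage
    err = sum((t[d] - p[d]) ** 2 for t, p in zip(targets, preds) for d in range(dim))
    return stages, err


xs = [0, 1, 2, 3]
targets = [(0, 0), (0, 0), (4, 2), (4, 2)]
print(boost(xs, targets, 1, 1.0)[1], boost(xs, targets, 2, 0.5)[1])  # 0.0 2.5
```

On this toy data a lambda of one fits the training set in a single stage. The smaller lambda needs more stages to reach the same training error, and that slower, averaged fitting is where the generalization benefit of shrinkage usually comes from.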
What’s inside?
[Figure: regression tree with the shape-indexed tests I[S5+∆] < I[S11+∆] and I[S7+∆] < I[S3+∆] at its nodes and YES/NO leaves]
-Each regression tree is 5 to 9 levels deep.
-Pixel comparisons are made at locations relative to the landmarks, S.
-Each comparison is parameterized by two landmarks (i, j) and an x/y offset from each landmark.
-The affine transform to the mean shape on the previous slide removes any need to care about scale.
-The leaves store ∆S values that move S closer to the target.
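A sketch of one such tree follows. It assumes integer pixel offsets, a complete binary tree stored as an array (children of node k at 2k+1 and 2k+2), and leaves that hold one (dx, dy) update per landmark. In the real method the offsets and updates live in the mean-shape frame, which is skipped here, and all names and numbers are hypothetical. Because the sample points move with the current shape, the same tree gives a different answer once the shape is closer to the target.

```python
def shape_indexed_pixel(img, shape, landmark, delta):
    """Pixel at S[landmark] + delta, clamped to the image."""
    x = min(max(shape[landmark][0] + delta[0], 0), len(img[0]) - 1)
    y = min(max(shape[landmark][1] + delta[1], 0), len(img) - 1)
    return img[y][x]


def apply_tree(img, shape, nodes, leaves, depth):
    """nodes[k] = (i, delta_i, j, delta_j) asks I[S_i + delta_i] < I[S_j + delta_j].
    The reached leaf holds a (dx, dy) update for every landmark."""
    k = 0
    for _ in range(depth):
        i, di, j, dj = nodes[k]
        yes = shape_indexed_pixel(img, shape, i, di) < shape_indexed_pixel(img, shape, j, dj)
        k = 2 * k + 1 + int(yes)
    delta_s = leaves[k - (2 ** depth - 1)]
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, delta_s)]


# Toy image: a bright vertical line at x = 5 on a dark background.
img = [[200 if x == 5 else 0 for x in range(10)] for y in range(10)]
nodes = [(1, (0, 0), 0, (2, 0))]  # is I[S1] < I[S0 + (2, 0)]?
leaves = [[(2, 0), (0, 0)],       # no:  move landmark 0 right by 2
          [(0, 0), (0, 0)]]       # yes: already aligned, stay put
print(apply_tree(img, [(1, 4), (6, 4)], nodes, leaves, 1))  # [(3, 4), (6, 4)]
print(apply_tree(img, [(3, 4), (6, 4)], nodes, leaves, 1))  # [(3, 4), (6, 4)]
```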
-An average face, S^0, is placed on the image using a face detector like Viola-Jones, LBP, or the pixel-comparison tree detector I just talked about.
-The shape is refined to fit the image using groups of trees, each group followed by an affine transform adjustment.
-Here are examples of landmarked faces.
-The original paper argues that every generated shape is a linear combination of the training face shapes. It implicitly creates a shape model of faces, so you don’t need to worry about generating nonsensical faces.
In Conclusion
I just presented a small subset of a very large topic.
The comparison of two pixels is a surprisingly useful
feature that’s very easy to compute.
Combined with decision trees and ferns, these
techniques replace math with machine learning.
This enables complicated object recognition
techniques to run in realtime on mobile devices.

An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 

Recently uploaded (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 

Binary Features for Object Detection and Landmarking

  • 1. Binary Features Steven C. Mitchell, Ph.D. Componica, LLC
  • 2. What’s a Binary Feature?
  • 3. What’s a Binary Feature? -Let’s take an image and sample a region of interest, a 4x4 patch. Maybe you’re looking for a face, a tumor, or a gun. -In a typical object detection system, this region of interest is scanned across the image at different scales. -Typically you scan left-to-right, top-to-bottom in steps of 10% of the patch size. Then you shrink the image (or scale the patch up) by 20% and start over. Continue until the image becomes too small or you’ve found what you’re looking for.
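Here is a minimal Python sketch of that scan. It assumes a 24x24 base patch (the slides use 4x4 only for illustration) together with the 10% step and 20% shrink figures above. It only generates window coordinates, mapped back to the original image. `sliding_windows` is a hypothetical helper, not from any library.

```python
def sliding_windows(img_w, img_h, patch=24, step_frac=0.10, shrink=0.80):
    """Yield (x, y, size) windows in original-image coordinates.

    Each pyramid level shrinks the image by `shrink`; a patch-sized window in
    the shrunk image covers patch / scale pixels of the original image.
    """
    scale = 1.0
    step = max(1, round(patch * step_frac))            # 10% of the patch size
    while img_w * scale >= patch and img_h * scale >= patch:
        w, h = int(img_w * scale), int(img_h * scale)   # shrunk image size
        for y in range(0, h - patch + 1, step):         # top to bottom
            for x in range(0, w - patch + 1, step):     # left to right
                yield (x / scale, y / scale, patch / scale)
        scale *= shrink                                  # next, smaller level

windows = list(sliding_windows(48, 48))
print(len(windows))   # 250 (169 + 64 + 16 + 1 over four levels)
print(windows[0])     # (0.0, 0.0, 24.0)
```

Shrinking the image and growing the window are equivalent here. The generator reports windows in original coordinates so that detections from every level can be compared directly.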
  • 4. -Let’s start with this patch (we’ll assume gray values only and forget about color for now). -First, each pixel has a value, typically from 0 to 255. -We also need a way of addressing the location of these pixels. I’ll use a simple numbering scheme, since the patches will always be 4x4. -Lastly, I want to compare the brightness of two pixels. I’ll pick locations 5 and 11. -Why those two locations? I’ll explain how locations are chosen in a later slide.
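The comparison of locations 5 and 11 takes only a few lines of Python. The row-major numbering (location k is row k // 4, column k % 4) and the pixel values are assumptions for illustration, because the slide text doesn't give the actual patch.

```python
# A 4x4 grayscale patch (values 0-255), flattened row-major:
# location k sits at row k // 4, column k % 4 (an assumed numbering).
patch = [
     12, 200,  35,  90,
     60,  40, 180, 220,
     15,  75, 130, 250,
     99,  10,  55, 140,
]

def binary_feature(patch, a, b):
    """One binary feature: is pixel a darker than pixel b?  I[a] < I[b]"""
    return patch[a] < patch[b]

print(binary_feature(patch, 5, 11))   # I[5] = 40, I[11] = 250 -> True
```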
  • 11. -OK, let’s try different patches with the same binary feature, that is, comparing locations 5 and 11. -Now imagine I try a whole bunch of pairs on a given patch: 2 vs 14, 8 vs 4, 7 vs 2, etc. I’m going to get a bunch of yes/no responses based on the patch I happen to show the system.
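If you evaluate many pairs on one patch and pack the yes/no answers into an integer, you get a compact binary code. This sketch uses the pairs listed above (5 vs 11, 2 vs 14, 8 vs 4, 7 vs 2) on a hypothetical row-major 4x4 patch. Bit k holds the answer to pair k, and that bit ordering is an assumption.

```python
PAIRS = [(5, 11), (2, 14), (8, 4), (7, 2)]   # the pairs mentioned on the slide

def binary_code(patch, pairs=PAIRS):
    """Answer every pair comparison I[a] < I[b]; bit k holds the answer to pair k."""
    code = 0
    for k, (a, b) in enumerate(pairs):
        if patch[a] < patch[b]:
            code |= 1 << k
    return code

# A hypothetical 4x4 patch, flattened row-major.
patch = [12, 200, 35, 90, 60, 40, 180, 220, 15, 75, 130, 250, 99, 10, 55, 140]
code = binary_code(patch)
print(format(code, "04b"))   # last pair is the most significant bit: 0111
```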
  • 14. Different Types of Binary Features -Of course there are many different types of binary features, that is, different types of questions I can ask. -Simple thresholding, which pixel is brighter, which pixel is brighter by more than a threshold, how similar two pixels are. -With color, it could be comparisons of different channels. -The main points are: each feature has a fixed set of parameters discovered during training and kept fixed for recognition, and the output is a yes or no. -BTW, I really like the simple comparison of two pixels. It’s fast, and any brightness / contrast change of the form I' = a*I + b with a > 0 returns the same result, as long as no pixels clip at 0 or 255.
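The four kinds of questions listed above might look like the following. The function names and the exact form of each test are assumptions. The last lines check the invariance claim: a gain/bias change `a*I + b` with `a > 0` (and no clipping) leaves every pixel-pair comparison unchanged, but a fixed-threshold test can flip.

```python
def brighter_than(p, a, t):          # simple thresholding: I[a] > T
    return p[a] > t

def darker(p, a, b):                 # which pixel is brighter: I[a] < I[b]
    return p[a] < p[b]

def darker_by(p, a, b, t):           # comparison with a threshold: I[b] - I[a] > T
    return p[b] - p[a] > t

def similar(p, a, b, t):             # are two pixels within T of each other?
    return abs(p[a] - p[b]) <= t

def adjust(p, gain, bias):
    """Brightness/contrast change I' = gain*I + bias (gain > 0, no clipping)."""
    return [gain * v + bias for v in p]

p = [12, 200, 35, 90, 60, 40, 180, 220, 15, 75, 130, 250, 99, 10, 55, 140]
q = adjust(p, 1.5, 20)

# Every pixel-pair comparison survives the change...
same = all(darker(p, a, b) == darker(q, a, b) for a in range(16) for b in range(16))
# ...but a fixed threshold does not: I[5] goes from 40 to 80.
print(same, brighter_than(p, 5, 50), brighter_than(q, 5, 50))   # True False True
```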
  • 15. Decision Tree Overview -Now in order to make use of these features, let’s talk about decision trees.
  • 16. Did it rain last night? [Figure: decision tree with the questions “Is grass wet?” and “Did you water the grass?”, each answered Y/N, ending in leaves labeled YES, YES, NO, NO] -Let’s say you’re trying to determine whether it rained last night. -This is a classification problem. -Here I constructed a simple decision tree based on a couple of yes/no questions. -At the leaves of this tree are probability histograms created from my data. -They sum to one. -My decision at each leaf is based on which of the two bars is greater.
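A toy version of that tree can be written as nested tuples. The question names are hypothetical keys and the leaf histograms are invented. Classification walks down to a leaf and returns the taller bar.

```python
# Internal node: (question, yes_branch, no_branch).  Leaf: histogram over the
# answer to "Did it rain last night?".  All probabilities are made up.
TREE = ("grass_wet",
        ("watered", {"YES": 0.3, "NO": 0.7},      # wet, but you watered it
                    {"YES": 0.9, "NO": 0.1}),     # wet, and you didn't
        ("watered", {"YES": 0.05, "NO": 0.95},
                    {"YES": 0.1, "NO": 0.9}))

def find_leaf(node, answers):
    """Follow the yes/no answers from the root down to a leaf histogram."""
    while isinstance(node, tuple):
        question, yes, no = node
        node = yes if answers[question] else no
    return node

def classify(node, answers):
    hist = find_leaf(node, answers)
    return max(hist, key=hist.get)   # the taller bar wins

print(classify(TREE, {"grass_wet": True, "watered": False}))   # YES
```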
  • 17. Selecting Good Questions [Figure: trees for “Is grass wet?” and for an irrelevant question, “Do you like oranges?”, with YES/NO histograms at the leaves] -So how do I pick a good question? First pick a question from my universe of questions, pour my data through it, and measure how well it predicts. -Three commonly used metrics are entropy, Gini impurity, and classification error. -What they basically measure is how far away you are from a 50/50 coin toss. -Here you can see that an irrelevant question like “Do you like oranges?” yields a flat distribution. That gives a high entropy, Gini impurity, or classification error.
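Below is a short sketch of the three metrics, plus a weighted score for a candidate split (lower is better). The class counts in the usage lines are invented to contrast an informative question with the oranges question.

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def gini(p):
    return 1.0 - sum(x * x for x in p)

def classification_error(p):
    return 1.0 - max(p)

def split_score(leaves, metric):
    """Impurity of a split, weighted by how many samples reach each leaf.
    `leaves` holds per-leaf class counts, e.g. [[rain, no_rain], ...]."""
    total = sum(sum(counts) for counts in leaves)
    score = 0.0
    for counts in leaves:
        n = sum(counts)
        if n:
            score += n / total * metric([c / n for c in counts])
    return score

# A coin toss is the worst case for all three metrics.
print(entropy([0.5, 0.5]), gini([0.5, 0.5]), classification_error([0.5, 0.5]))  # 1.0 0.5 0.5

grass_wet = [[40, 10], [5, 45]]   # hypothetical counts: informative question
oranges = [[25, 25], [20, 30]]    # hypothetical counts: irrelevant question
print(split_score(grass_wet, gini), split_score(oranges, gini))   # about 0.25 vs 0.49
```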
  • 18. [Figure: pixel-comparison tree with the tests I[5] < I[11] and I[7] < I[3], Y/N branches, and leaves YES, YES, NO, NO] -Going back to binary features, the questions we ask are based on pixel comparisons. -How do we pick the parameters? We randomly sample from the universe of parameters and choose the one that yields a good score on the given dataset. -In the 4x4 patch, I would pick two random numbers from 0 to 15 (no duplicates) and a random threshold (if I need one). I add that feature to the tree, test the tree on my dataset, and compute a score. I do this 2000 times and keep the binary feature that produced the best score. I then keep growing my tree in a greedy fashion until it’s big enough (5-9 levels deep) or accurate enough. -This answers the question of where x, y, and T come from. -In my experience, sampling 500-2000 candidates works really well, with diminishing returns beyond that. -This is the most time-consuming part of building these trees, but it’s extremely parallelizable.
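This sketch shows the random search for a single node. It assumes Gini impurity as the score and no threshold, and uses a toy dataset whose labels depend on whether pixel 3 is darker than pixel 12. Growing a full tree would repeat the search greedily at each leaf. `best_pixel_pair` is a hypothetical name.

```python
import random

def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)

def best_pixel_pair(samples, n_classes=2, n_trials=2000, n_pixels=16, seed=0):
    """Try `n_trials` random pairs (a, b) and keep the one whose split
    I[a] < I[b] gives the lowest weighted Gini impurity on `samples`,
    a list of (flattened_patch, class_label) tuples."""
    rng = random.Random(seed)
    n = len(samples)
    best_pair, best_score = None, float("inf")
    for _ in range(n_trials):
        a, b = rng.sample(range(n_pixels), 2)      # two distinct locations
        yes, no = [0] * n_classes, [0] * n_classes
        for patch, label in samples:
            (yes if patch[a] < patch[b] else no)[label] += 1
        score = sum(yes) / n * gini(yes) + sum(no) / n * gini(no)
        if score < best_score:
            best_pair, best_score = (a, b), score
    return best_pair, best_score

# Toy data: random 4x4 patches, labelled 1 when pixel 3 is darker than pixel 12.
data_rng = random.Random(42)
samples = []
for _ in range(200):
    patch = [data_rng.randrange(256) for _ in range(16)]
    samples.append((patch, int(patch[3] < patch[12])))

pair, score = best_pixel_pair(samples)
print(pair, score)   # expect pixels 3 and 12, with a score near 0
```

Each trial is independent of the others, which is why this step parallelizes so easily.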
  • 19. Selecting Good Questions [Figure: trees for “Is grass wet?” and “Do you like oranges?”] -That’s for classification. Decision trees can also be used for regression. -Instead of classes like yes/no or cat/dog/horse, the output is the average value of my data at each leaf. -What makes a good question? One that reduces the variance around those averages. -Also note that the output can be multi-dimensional rather than a single value. Computing the variance of multi-dimensional things is fairly easy, don’t worry.
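For regression, the score is the drop in summed squared deviation from the mean, and it works the same way for vector targets. The sketch below uses invented 2-D targets as a stand-in for, say, landmark coordinates. Each leaf would output the mean of its targets.

```python
def spread(targets):
    """Sum of squared distances from the mean vector (n times the variance, summed over dimensions)."""
    if not targets:
        return 0.0
    dims = len(targets[0])
    mean = [sum(t[d] for t in targets) / len(targets) for d in range(dims)]
    return sum((t[d] - mean[d]) ** 2 for t in targets for d in range(dims))

def variance_reduction(targets, answers):
    """How much a yes/no question reduces the spread; bigger is better."""
    yes = [t for t, a in zip(targets, answers) if a]
    no = [t for t, a in zip(targets, answers) if not a]
    return spread(targets) - spread(yes) - spread(no)

# Hypothetical 2-D targets (e.g. one landmark's x, y) for four patches.
targets = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = [True, True, False, False]    # separates the two clusters
bad = [True, False, True, False]     # mixes them
print(variance_reduction(targets, good), variance_reduction(targets, bad))   # 200.0 1.0
```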
  • 20. [Figure: pixel-comparison tree with the tests I[5] < I[11] and I[7] < I[3], and leaves YES, YES, NO, NO] -So here is a binary feature tree that returns a value (like the probability that it’s an object) instead of a class, or it could return a vector, like landmark positions. -Now we can start constructing interesting solutions using these concepts.
  • 21. Corner Detector -First let’s start with corner detection.
  • 22. Harris Corner Detector 1. Compute a smoothed gradient in X and Y. 2. For each pixel, compute this matrix. 3. Solve for R. 4. Apply non-maximum suppression to gather corners. -The Harris corner detector is one of the simplest ways to detect corners. It is based on a second-order approximation of the sum of squared differences between a patch and a shifted copy of itself. -Other detectors include SURF, SIFT, SUSAN, etc. -So what’s the point? These points are stable under changes in angle and translation (and, for detectors like SIFT and SURF, scale). -This reduces the data so that you can rapidly compare the image to a template for techniques like augmented reality, image stitching, and motion tracking. -So you can find corners using these four easy steps... wait... lots of math... slow...
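For reference, the slide text doesn't spell out the matrix and response. In the standard Harris formulation they are as follows, with image gradients $I_x, I_y$, a window $W$ with weights $w$, eigenvalues $\lambda_1, \lambda_2$ of $M$, and an empirical constant $k$ that is commonly set around 0.04 to 0.06:

```latex
M = \sum_{(u,v) \in W} w(u,v)
\begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
R = \det(M) - k\,\operatorname{tr}(M)^2 = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2
```

A large positive R indicates a corner, a negative R an edge, and a small |R| a flat region.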
  • 23. FAST Corner Detector Given a pixel, is this location a corner, based on the 16 pixels surrounding it? FAST uses a decision tree trained on real images and converted to nested if statements in C. It doesn’t use math and averages about 3 comparisons per pixel... very, very fast. http://mi.eng.cam.ac.uk/~er258/work/fast.html -OK, enough of that. Let’s use a more machine learning approach... FAST: Features from Accelerated Segment Test
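FAST's generated C code is a decision tree that learns to answer the segment test quickly. The segment test itself is simple enough to write out directly. This Python sketch implements the plain test, not the learned tree. It assumes the common FAST-9 setting of 9 contiguous pixels and an illustrative threshold `t`.

```python
# (dx, dy) offsets of the 16 pixels on a Bresenham circle of radius 3,
# listed in order around the circle.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_corner(img, x, y, t=20, n=9):
    """Segment test: True if n contiguous circle pixels are all brighter than
    center + t, or all darker than center - t. img is indexed img[y][x] and
    (x, y) must be at least 3 pixels from the border."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):                      # brighter arc, then darker arc
        run = 0
        for v in ring + ring:                 # walk twice to handle wrap-around
            run = run + 1 if sign * (v - center) > t else 0
            if run >= n:
                return True
    return False

img = [[100] * 7 for _ in range(7)]
print(is_corner(img, 3, 3))                   # flat patch -> False
for dx, dy in CIRCLE[:9]:                     # make a 9-pixel bright arc
    img[3 + dy][3 + dx] = 200
print(is_corner(img, 3, 3))                   # -> True
```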
  • 24. FAST Corner Detector The source code is computer generated and free for anyone to use. It is 6000 lines long and not comprehensible. By averaging vectors and taking an arctangent, you can get a rotation vector cheaply. [Figure: excerpt from a paper on target localisation at over 100 FPS that selects FAST-9 as the interest point detector, showing sample locations around an interest point and an orientation computed from the gradients between opposite pixels on the circle] http://mi.eng.cam.ac.uk/~er258/work/fast.html
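The slide's averaging of vectors plus an arctangent can be read as follows. For each pair of opposite pixels on the same 16-pixel circle, weight the direction between them by their intensity difference. Then sum these vectors and take `atan2`. This reading is an assumption, not the code from the paper or the FAST site.

```python
import math

CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def orientation(img, x, y):
    """Angle (radians, image y axis pointing down) of the summed vector.

    Pixel i and pixel i + 8 are opposite each other on the circle; their
    intensity difference weights the direction from one to the other."""
    vx = vy = 0.0
    for i in range(8):
        dx, dy = CIRCLE[i]
        ox, oy = CIRCLE[i + 8]                 # the opposite pixel, (-dx, -dy)
        diff = img[y + dy][x + dx] - img[y + oy][x + ox]
        vx += diff * (dx - ox)
        vy += diff * (dy - oy)
    return math.atan2(vy, vx)

# Brightness increasing to the right gives an angle of 0.
ramp = [[10 * c for c in range(7)] for _ in range(7)]
print(orientation(ramp, 3, 3))   # 0.0
```

The patch can then be rotated by minus this angle before it is compared, which is the normalization mentioned on the next slide.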
  • 25. FAST Example -Here’s a picture of yours truly and a Starbucks logo that I ran for a project. -The lines indicate a direction derived from the rotation vector on the last slide. It’s useful for normalizing patches, for example if you were to create an augmented reality system on a mobile device. -Here is some random dude’s YouTube video running FAST. I’d show you my own, but I didn’t have enough time. -Notice it’s running in real time on a slow iPhone 3; Harris corners and SURF would drag on such a device. As a note, mobile phones typically run 10x-30x slower than desktops.
  • 27. Keypoint Recognition -Once you have corners, the next step is to identify what those corners belong to.
  • 28. Keypoint Recognition Fast Keypoint Recognition using Random Ferns Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua -So in an image stitching problem, an augmented reality solution, or a bag-of-words object recognizer (Amazon’s Product IDer thingy), you sample a region of interest around each corner and try to match it with a known template. -Comparisons are often non-trivial because you have to normalize the patches for distortions caused by rotation and tilt, normalize the brightness, and then come up with some feature vector from the patches. -Finally, you measure the distances between the feature vectors of every patch in the template and every patch in the image. That’s roughly an O(n^2) deal. -Everything about this sounds really slow on an iPhone. -OK, let’s use binary feature trees to solve this.
  • 29. Fast Keypoint Recognition using Random Ferns Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua -First generate patches from each corner in the original template with random orientations, sizes, tilt. Generate a ton of them because that’s our training set.
  • 30. Fast Keypoint Recognition using Random Ferns Mustafa Özuysal, Michael Calonder, Vincent Lepetit and Pascal Fua -Next, these authors simplified the decision tree concept into something they dubbed Ferns (primitive trees). -The idea is that if you ask the same question at every node of a given depth, you can collapse the tree into a string of bits that forms an index. The leaves are simply locations in an array. -So, for example, three bits give 2^3 = 8 possible outcomes. Instead of a tree, you have an array of 8 probability histograms. -Next, the class is selected by simply taking the max of the class probabilities for a given set of bits, but you’re probably going to need a lot of bits to get a good result (they determine this empirically). -Now, if you assume the ferns are independent of each other, you can reduce this to a product over several ferns.
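Here is a compact fern classifier in Python. It is a sketch under several assumptions: pixel-pair tests on flattened patches, counts initialised to 1 as a prior, per-class normalisation to get P(index | class), and noisy copies of random templates standing in for the warped training patches. `Fern` and `classify` are hypothetical names, not the authors' code.

```python
import math
import random

class Fern:
    """S pixel-pair tests whose answers form an S-bit index into a table of
    per-class counts (2**S rows, one column per class)."""

    def __init__(self, n_tests, n_classes, n_pixels, rng):
        self.pairs = [tuple(rng.sample(range(n_pixels), 2)) for _ in range(n_tests)]
        # Counts start at 1 (a uniform prior) so no probability is ever 0.
        self.counts = [[1] * n_classes for _ in range(2 ** n_tests)]

    def index(self, patch):
        idx = 0
        for k, (a, b) in enumerate(self.pairs):
            if patch[a] < patch[b]:
                idx |= 1 << k                   # test k sets bit k
        return idx

    def train(self, patch, label):
        self.counts[self.index(patch)][label] += 1

    def log_likelihoods(self, patch):
        """log P(index | class) for every class."""
        row = self.counts[self.index(patch)]
        totals = [sum(r[c] for r in self.counts) for c in range(len(row))]
        return [math.log(n / total) for n, total in zip(row, totals)]

def classify(ferns, patch):
    """Ferns treated as independent: add their log-likelihoods, take the argmax."""
    scores = [sum(ls) for ls in zip(*(f.log_likelihoods(patch) for f in ferns))]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: 3 random 64-pixel "keypoint" templates; noisy copies stand in
# for the randomly warped training patches.
rng = random.Random(0)
N_PIXELS, N_CLASSES = 64, 3
templates = [[rng.randrange(256) for _ in range(N_PIXELS)] for _ in range(N_CLASSES)]

def noisy(template):
    return [v + rng.randint(-10, 10) for v in template]

ferns = [Fern(8, N_CLASSES, N_PIXELS, rng) for _ in range(10)]
for label, template in enumerate(templates):
    for _ in range(50):
        sample = noisy(template)
        for fern in ferns:
            fern.train(sample, label)

tests = [(noisy(t), label) for label, t in enumerate(templates) for _ in range(20)]
accuracy = sum(classify(ferns, p) == y for p, y in tests) / len(tests)
print(accuracy)
```

Summing logs rather than multiplying probabilities avoids underflow when there are many ferns. The per-class totals could be cached after training instead of being recomputed on every call.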
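  • 31. Fern outputs: bits 0 1 1 | 1 0 0 | 1 0 1 → 110₂ = 6, 001₂ = 1, 101₂ = 5 (each group of three tests is read with the first test as the least significant bit). Efficient Keypoint Recognition, Lepetit et al
  • 32. Fern outputs: bits 1 0 0 | 1 0 1 | 0 1 0 → 001₂ = 1, 101₂ = 5, 010₂ = 2. Efficient Keypoint Recognition, Lepetit et al
  • 33. Fern outputs: bits 1 0 0 | 0 1 1 | 1 0 1 → 001₂ = 1, 110₂ = 6, 101₂ = 5. Efficient Keypoint Recognition, Lepetit et al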
  • 34. Fast Keypoint Recognition in Ten Lines of Code Mustafa Özuysal Pascal Fua Vincent Lepetit -This whole algorithm can be expressed in just 10 lines of C code. -Very, very fast.
  • 36. From Bits to Images -So these binary trees throw away all gray values. Do they really characterize images well enough to solve serious problems? -OK, let’s say we took an image, found corners, and sampled a few hundred binary pairs from 32x32 patches. Can we reconstruct an image from just the corner locations, the patch size, and the binary pairs?
  • 37. From Bits to Images: Inversion of Local Binary Descriptors Emmanuel d’Angelo, Laurent Jacques, Alexandre Alahi and Pierre Vandergheynst -Yes, we can. It’s a bit like solving Sudoku. -What’s really surprising is how much information we can capture without any gray levels. -You’re collecting edge information over different scales. Plus, since these are just simple comparisons, they’re immune to brightness / contrast changes and global lighting. -In many ways it’s superior to other means of characterizing images.
  • 38. Object Detection -Let’s talk about object detection.
  • 39. Viola / Jones Object Detection "Robust Real-time Object Detection" Paul Viola and Michael Jones -The Viola-Jones object detection framework was formulated in the early 2000s and was a breakthrough in object detection. Cheap cameras and cellphones use it all the time. -It works by measuring the differences of the sums of rectangles and applying a threshold. If the difference exceeds a certain value, it’s a face. -Of course, that on its own is a very poor face detector, so they strengthened it using the principles of ensemble learning. -That is, one rectangle comparison makes an awful face detector, but if you have a large number of independent detectors and take a weighted vote, you end up with a much more accurate detector. -Wisdom of crowds. -The AdaBoost algorithm shown here gives a method for determining the weights. Basically, give a higher vote to the more accurate detectors, then retrain on the dataset with more emphasis on the incorrectly classified samples. Repeat.
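Below is a minimal discrete AdaBoost with threshold stumps. It assumes each sample is a list of feature values, standing in for rectangle-difference responses. This is the textbook variant, not Viola and Jones' exact procedure or their cascade. On a 1-D "interval" dataset no single stump is perfect, but a weighted vote of three is.

```python
import math

def train_stump(X, y, w):
    """Weak learner: the (feature, threshold, polarity) with lowest weighted error.
    It predicts `polarity` when x[j] >= threshold, else -polarity."""
    best = None
    for j in range(len(X[0])):
        for theta in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= theta else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, theta, polarity)
    return best

def stump_predict(stump, x):
    _, j, theta, polarity = stump
    return polarity if x[j] >= theta else -polarity

def adaboost(X, y, rounds):
    """Discrete AdaBoost; labels y are +1 / -1."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)               # guard against division by zero
        if err >= 0.5:
            break                                # no better than a coin toss
        alpha = 0.5 * math.log((1 - err) / err)  # accurate stumps get bigger votes
        ensemble.append((alpha, stump))
        # Re-weight: mistakes get heavier, correct samples lighter.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    vote = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if vote >= 0 else -1

# Positive only on an interval, which no single threshold can capture.
X = [[v] for v in range(10)]
y = [1 if 3 <= v <= 6 else -1 for v in range(10)]
ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, x) for x in X] == y)   # True
```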
  • 40. Viola / Jones Object Detection [Figure: excerpt from Bradley and Roth. “Figure 2: The integral image. Left: A simple input of image values. Center: The computed integral image. Right: Using the integral image to calculate the sum over rectangle D.”] The integral image stores at each location I(x,y) the sum of all f(x,y) terms to the left of and above (x,y), computed in linear time with
$$I(x,y) = f(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1).$$
The sum over any rectangle with upper-left corner $(x_1,y_1)$ and lower-right corner $(x_2,y_2)$ then takes constant time:
$$\sum_{x=x_1}^{x_2}\sum_{y=y_1}^{y_2} f(x,y) = I(x_2,y_2) - I(x_2,y_1-1) - I(x_1-1,y_2) + I(x_1-1,y_1-1),$$
which for rectangle D is (A+B+C+D) - (A+B) - (A+C) + A. D. Bradley, G. Roth, Adaptive Thresholding using the Integral Image. J. Graphics Tools 12(2): 13-21 (2007) -The other trick in Viola-Jones was a fast method of summing rectangles using an integral image. -If you construct an integral image in which each entry is the pixel plus the integral values to its left and above, minus the value to its upper left, you can rapidly compute any rectangle sum using the equation above. -The problem is that constructing integral images can be slow, plus you’re doing 8 operations per feature. -Binary features with pixel comparisons can do it with two, without even constructing an integral image or normalizing brightness / contrast.
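The integral-image recurrence and the four-lookup rectangle sum from this slide translate directly into Python. In this sketch arrays are indexed `[y][x]`, and lookups at index -1 are treated as zero.

```python
def integral_image(f):
    """I[y][x] = sum of f over all pixels above and to the left, inclusive,
    via I(x,y) = f(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)."""
    h, w = len(f), len(f[0])
    I = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            I[y][x] = (f[y][x]
                       + (I[y][x - 1] if x > 0 else 0)
                       + (I[y - 1][x] if y > 0 else 0)
                       - (I[y - 1][x - 1] if x > 0 and y > 0 else 0))
    return I

def rect_sum(I, x1, y1, x2, y2):
    """Sum of f over columns x1..x2 and rows y1..y2 (inclusive) with 4 lookups."""
    def at(x, y):
        return I[y][x] if x >= 0 and y >= 0 else 0
    return at(x2, y2) - at(x2, y1 - 1) - at(x1 - 1, y2) + at(x1 - 1, y1 - 1)

f = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
I = integral_image(f)
print(I[2][2])                   # 45, the sum of every pixel
print(rect_sum(I, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
```

A two-rectangle feature needs two of these sums, which is where the handful of lookups per feature comes from. A pixel-pair comparison needs only two reads and no preprocessing pass.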
  • 41. Binary Feature-Based Object Detection Unconstrained Face Detection Shengcai Liao, Anil K. Jain, and Stan Z. Li [Figure: pixel-comparison decision tree with the tests I[5] < I[11] and I[7] < I[3]] Object Detection with Pixel Intensity Comparisons Organized in Decision Trees Nenad Markus, Miroslav Frljak, Igor S. Pandzic, Jorgen Ahlberg, and Robert Forchheimer -This technique was published by several groups at around the same time. -Here is Nenad Markus’ implementation. -His runs 30x faster than Viola-Jones and 9x faster than the Local Binary Patterns approach in OpenCV. -He accomplishes rotational invariance by evaluating the trees at N rotations, which is feasible only because it’s so fast.
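One way to picture rotating the trees is to store each test's two points relative to the window center, in units of half the window size, and rotate those points before sampling. This coordinate convention and the heap-ordered tree layout are assumptions for illustration. They follow the spirit of the pixel-comparison detectors above but are not Markus et al.'s code.

```python
import math

def pixel_test(img, cx, cy, size, test, angle=0.0):
    """I[p1] < I[p2], where p1 and p2 are given relative to the window center
    in units of half the window size and rotated by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)

    def sample(u, v):
        ru, rv = c * u - s * v, s * u + c * v            # rotate the test point
        x = int(round(cx + ru * size / 2))
        y = int(round(cy + rv * size / 2))
        x = min(max(x, 0), len(img[0]) - 1)              # clamp to the image
        y = min(max(y, 0), len(img) - 1)
        return img[y][x]

    (u1, v1), (u2, v2) = test
    return sample(u1, v1) < sample(u2, v2)

def eval_tree(img, cx, cy, size, tests, leaves, angle=0.0):
    """Complete binary tree in heap order: node i has children 2i+1 (no)
    and 2i+2 (yes); leaves follow the len(tests) internal nodes."""
    i = 0
    while i < len(tests):
        yes = pixel_test(img, cx, cy, size, tests[i], angle)
        i = 2 * i + (2 if yes else 1)
    return leaves[i - len(tests)]

# Dark left half, bright right half.
img = [[0 if x < 10 else 255 for x in range(20)] for _ in range(20)]
tests = [((-0.5, 0.0), (0.5, 0.0))]        # "is the left point darker?"
leaves = [-1.0, 1.0]                        # no -> -1, yes -> +1
print(eval_tree(img, 10, 10, 8, tests, leaves))                  # 1.0
print(eval_tree(img, 10, 10, 8, tests, leaves, angle=math.pi))   # -1.0 (test flipped)
```

Trying N rotations then amounts to calling `eval_tree` with N different angles. No image rotation is needed, only N sets of rotated sample points.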
  • 44. Face Alignment by Explicit Shape Regression, Cao et al -Microsoft has been putting a lot of effort into deriving methods for landmarking faces. -For some reason they call it facial alignment. We tend to call it landmarking or segmentation. -Basically find points on an object that may or may not represent contours of that object.
  • 45. Based on: Face Alignment by Explicit Shape Regression, Cao et al [Figure: the shape estimate at stages t = 0, 1, 2, ..., 10; each stage applies an affine transform to the mean shape, a group of regression trees (“Insert Magic”), and a transform back from the mean shape] -Here is one of their approaches to landmarking faces using regression trees, dubbed Explicit Shape Regression. -It’s typically done with 10 groups of trees. -Each group is hundreds of trees, each refining the shape vector from the previous tree. -Although they don’t say it, they’re effectively using gradient boosting with regression trees and a learning rate (shrinkage) λ of one. A slightly lower λ would likely improve generalization, but most likely they were not aware of this.
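The boosting view can be illustrated on a toy 1-D problem. Each regression stump fits the current residuals, and only a fraction `lam` (the lambda mentioned above) of its output is added. A scalar target stands in for the shape vector and stumps stand in for deeper trees. These are simplifications, not Cao et al.'s method.

```python
import math

def fit_stump(xs, rs):
    """Least-squares regression stump on the residuals rs."""
    best = None
    for t in sorted(set(xs))[1:]:               # every split leaving both sides non-empty
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def boost(xs, ys, rounds=50, lam=0.5):
    """Gradient boosting for squared loss with shrinkage `lam`."""
    base = sum(ys) / len(ys)
    pred = [base] * len(ys)
    stumps, history = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]   # the negative gradient
        h = fit_stump(xs, residuals)
        stumps.append(h)
        pred = [p + lam * h(x) for p, x in zip(pred, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys))
    return base, lam, stumps, history

def predict(model, x):
    base, lam, stumps, _ = model
    return base + lam * sum(h(x) for h in stumps)

xs = list(range(20))
ys = [math.sin(x / 3) for x in xs]
for lam in (1.0, 0.5, 0.1):
    model = boost(xs, ys, rounds=50, lam=lam)
    print(lam, model[3][-1])          # training MSE after 50 rounds
```

Training error alone won't show the generalization benefit of a smaller `lam`. Judging that would take held-out data.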
  • 46. Face Alignment by Explicit Shape Regression, Cao et al What’s inside? [Figure: regression tree with the tests I[S5+∆] < I[S11+∆] and I[S7+∆] < I[S3+∆]] -Each regression tree is 5-9 levels deep. -Pixel comparisons are made at locations relative to the landmarks, S. -One comparison requires two landmarks (i, j) and an x/y delta from each landmark. -The affine-to-mean transform on the previous slide removes any need to care about scale. -The leaves store delta S’s that move S closer to the target.
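Here is a sketch of one shape-indexed comparison and a leaf update. The 2x2 matrix `A` is a hypothetical stand-in for the transform from mean-shape coordinates into the image. Offsets and leaf deltas are expressed in mean-shape coordinates.

```python
def to_image(A, d):
    """Map an offset d = (dx, dy) from mean-shape coordinates into the image with a 2x2 matrix A."""
    return (A[0][0] * d[0] + A[0][1] * d[1], A[1][0] * d[0] + A[1][1] * d[1])

def shape_indexed_test(img, shape, A, i, di, j, dj):
    """I[S_i + di] < I[S_j + dj], with both offsets transformed by A."""
    def sample(k, d):
        ox, oy = to_image(A, d)
        x = min(max(int(round(shape[k][0] + ox)), 0), len(img[0]) - 1)
        y = min(max(int(round(shape[k][1] + oy)), 0), len(img) - 1)
        return img[y][x]
    return sample(i, di) < sample(j, dj)

def apply_leaf(shape, A, delta):
    """Move each landmark by its leaf offset (stored in mean-shape coordinates)."""
    moved = []
    for (x, y), d in zip(shape, delta):
        ox, oy = to_image(A, d)
        moved.append((x + ox, y + oy))
    return moved

img = [[x for x in range(10)] for _ in range(10)]   # brightness = column index
shape = [(2, 5), (7, 5)]                             # two landmarks
A = [[2, 0], [0, 2]]                                 # current face is 2x the mean shape
print(shape_indexed_test(img, shape, A, 0, (1, 0), 1, (0, 0)))    # x=4 vs x=7 -> True
print(shape_indexed_test(img, shape, A, 0, (1, 0), 1, (-2, 0)))   # x=4 vs x=3 -> False
print(apply_leaf(shape, A, [(1, 0), (0, -1)]))                    # [(4, 5), (7, 3)]
```

Because the offsets pass through `A`, the same tree samples the corresponding facial points whether the face is large or small in the image.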
  • 47. Face Alignment by Explicit Shape Regression, Cao et al -An average face, S^0, is placed on the image using a face detector like Viola-Jones, LBP, or the tree-based detector I just talked about. -The shape is refined to the image using groups of trees followed by affine transform adjustments. -Here are examples of landmarked faces. -The original paper argues that every generated shape is a linear combination of the training shapes, so it implicitly creates a shape model of faces and you don’t need to worry about generating nonsensical faces.
  • 48. In Conclusion I just presented a small subset of a very large topic. The comparison of two pixels is a surprisingly useful feature that’s very easy to compute. Combined with decision trees and ferns, these techniques substitute math with machine learning. This enables complicated object recognition techniques to run in realtime on mobile devices.