References
[1] G. Guo, G. Mu, Y. Fu, and T. S. Huang, "Human Age Estimation Using Bio-Inspired Features," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2009.
[2] X. He and P. Niyogi, "Locality Preserving Projections," Proc. Conf. on Advances in Neural Information Processing Systems, 2003.
Classification
The Support Vector Machine (SVM) algorithm is used to classify the MORPH-II data by gender. Classification is achieved by finding the optimal hyperplane that separates the classes of data. A hyperplane of an n-dimensional vector space is a subspace of n − 1 dimensions, and a separating hyperplane divides the categories of data. For example, in the figure below, the goal would be to find a line that separates the blue class of data from the green class.
Regression
The human face is an essential form of identification, as each face is unique. Estimating age from an image of a face, however, is problematic in several ways. The biggest issue is that the aging process is itself unique to each individual: outside factors such as lifestyle, health, and gender affect the way people age. This creates high variance in age estimation from facial data; the unique characteristics of human faces produce numerous outliers relative to any fitted model. This section examines a linear model, Support Vector Regression (SVR), that optimizes age prediction accuracy while accounting for the uniqueness of each human face.
Dimension Reduction
Each image is quantified by 4,376 real values, and there are 3,000 images, so the computational cost of regression and classification drops if the number of points per image is reduced. The goal is therefore to reduce each image to 200 meaningful points. The dimension reduction technique used is Locality Preserving Projections (LPP) [2], an algorithm that preserves the local structure between points; its computational complexity is linear, making it efficient for large data sets. Thus, given a set x₁, x₂, …, x₃₀₀₀ ∈ ℝ⁴³⁷⁶, find a transformation matrix A that maps these 3,000 points to a set y₁, y₂, …, y₃₀₀₀ ∈ ℝ²⁰⁰ such that yᵢ represents xᵢ, where yᵢ = Aᵀxᵢ.
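As a minimal illustration of this mapping in MATLAB (assuming the matrix A has already been found; the variable names here are hypothetical):

    % Illustration of y_i = A' * x_i applied to the whole data set.
    % Xc: 4376-by-3000 matrix whose columns are the images x_i.
    % A:  4376-by-200 transformation matrix produced by LPP.
    Y = A' * Xc;   % 200-by-3000; column i is the reduced point y_i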
The objective of SVM classification is to find the separating hyperplane that maximizes the margin between itself and the decision boundaries. The hyperplane can be written as W·X + b = 0, and the decision boundaries as W·X + b = ±1.
By computing the distance from the hyperplane, we find that each decision boundary lies a distance 1/‖W‖ from it (for example, if W = (3, 4), then ‖W‖ = 5 and each boundary lies 1/5 from the hyperplane). The essential problem of SVM classification is to maximize this distance, which is done by minimizing ‖W‖. Since this is a classification algorithm, there are constraints on each point X with label y:
y = −1: −(W·X + b) ≥ 1
y = +1: (W·X + b) ≥ 1
These constraints prevent misclassification. A standard way to solve an optimization problem with constraints is the Lagrange multiplier method; an in-depth mathematical explanation can be found in the paper accompanying this project.
To apply SVM classification to the MORPH-II data, MATLAB's built-in SVM function was used on a subset of 3,000 feature vectors from the initial data, each reduced from 4,376 dimensions to 200. The subset contains 2,747 males and 253 females; males are assigned to the +1 class and females to the −1 class. As seen below, SVM classification performs remarkably well, with over a 99% total success rate.
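A rough sketch of this setup (the poster names MATLAB's built-in SVM, but the specific call shown here, fitcsvm, and the variable names are our assumptions):

    % Sketch: gender classification on the reduced features (assumed setup).
    % X: 3000-by-200 matrix of reduced feature vectors, one image per row.
    % y: 3000-by-1 label vector, +1 for male and -1 for female.
    mdl = fitcsvm(X, y, 'KernelFunction', 'linear');   % train a linear SVM
    yhat = predict(mdl, X);                            % classify the subset
    successRate = mean(yhat == y);                     % fraction classified correctly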
Outliers are very prevalent in facial recognition data: human faces are highly distinctive, so ages are prone to misestimation.
Figures: Basic Linear Regression, Optimization, Support Vector Regression.
Objective
This project focuses on identifying characteristics from a set of images of faces. The faces come from the MORPH-II dataset, and Bio-Inspired Feature technology at West Virginia University was used to extract meaningful data points from the images [1].
Each image is quantified as 4,376 real-valued points; to predict characteristics of new images, we apply the same models that have learned from the old images. Thus, the project breaks down into two steps: 1) dimension reduction and 2) classification and regression. Dimension reduction saves computational time; classification methods help determine gender, and regression helps estimate approximate age.
Introduction
Facial recognition became a topic of interest in the 1960s with Bledsoe, Wolf, and Bisson, who noted the uniqueness of the distances between the eyes, ears, nose, and mouth. In the 1970s, hair color and lip thickness became further subjects of facial recognition research, and in 1988 component analysis was used to reduce the computation time of recognition. Eigenface techniques, introduced in 1991, were brought to public attention when officials monitoring video surveillance at the 2001 Super Bowl recognized 32 individuals with criminal records. Finally, in 2002, verification rates of up to 90% were achieved by mapping a 2D image of a face onto a 3D grid.
Image Processing for Classification and Regression
Danielle Chaung, Bruce Hunt, Jacob Oullette
Dr. Yaw O Chang
Note that the goal of Support Vector Regression is to minimize the slope of the regression line, which compensates for outliers. Compared with basic linear regression, we can see that the fitted slope is smaller.
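A comparable sketch for the regression side, assuming MATLAB's fitrsvm and an assumed epsilon tolerance (not the project's actual settings):

    % Sketch: epsilon-insensitive SVR for age prediction (assumed setup).
    % X: 3000-by-200 reduced feature matrix; age: 3000-by-1 vector of true ages.
    mdl = fitrsvm(X, age, 'KernelFunction', 'linear', 'Epsilon', 2);
    agehat = predict(mdl, X);              % predicted ages
    mae = mean(abs(agehat - age));         % mean absolute error of the fit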
The constraints of our problem can be handled by forming a Lagrange function with one Lagrange multiplier per constraint. For the classification case, this yields the unconstrained optimization problem
L(W, b, α) = ½‖W‖² − Σᵢ αᵢ[yᵢ(W·Xᵢ + b) − 1].
Let G be a graph denoted by G = (V, E), where V is the set of nodes and E is the set of edges; each data point is a node. Then create a 3,000 × 3,000 adjacency matrix over the nodes, marking entry (i, j) with 1 if node i is related to node j and 0 otherwise.
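A sketch of this construction, assuming a k-nearest-neighbor rule decides which points are "related" (k and the one-image-per-row layout are our assumptions):

    % Sketch: 0/1 adjacency matrix over the n = 3000 points (assumed setup).
    % X: 3000-by-4376 matrix, one image per row.
    n = size(X, 1);
    k = 5;                           % assumed neighborhood size
    D = pdist2(X, X);                % n-by-n Euclidean distance matrix
    [~, idx] = sort(D, 2);           % neighbors of each point, nearest first
    S = zeros(n);
    for i = 1:n
        S(i, idx(i, 2:k+1)) = 1;     % mark k nearest; idx(i,1) is i itself
    end
    S = max(S, S');                  % symmetrize: relations go both ways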
Figure 1 and Figure 2: two-dimensional point sets and their one-dimensional projections.
Points related to each other are marked with 1 based on Euclidean distance. When projecting points from two dimensions to one, the projection in Figure 1 would lose the important local structure, whereas the projection in Figure 2 would not, because the relations are preserved in the adjacency matrix.
With the adjacency matrix completed, W is the weight matrix that quantifies how strong each relation in the adjacency matrix is. The value Wᵢⱼ for nodes i and j is assigned by one of two schemes:
1) Simple. If i and j are unrelated, let Wᵢⱼ = 0; otherwise Wᵢⱼ = 1.
2) Heat Kernel. If i and j are unrelated, let Wᵢⱼ = 0; otherwise Wᵢⱼ = exp(−‖xᵢ − xⱼ‖² / t).
The heat kernel quantifies how strong the relationship between two points is: as the points get further apart, the negative exponential becomes small, and as the distance decreases, the value grows.
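Continuing the sketch above (with S and D as constructed there, and t an assumed scale parameter), the heat-kernel weights follow in one step:

    % Sketch: heat-kernel weight matrix built from the adjacency matrix S.
    t = 1;                           % assumed heat parameter
    W = S .* exp(-D.^2 / t);         % W_ij = exp(-||x_i - x_j||^2 / t) if related, else 0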
In order to find the reduced data Y, where yᵢ and yⱼ are two points in Y, the algorithm minimizes the objective function
Σᵢⱼ (yᵢ − yⱼ)² Wᵢⱼ.
If Wᵢⱼ = 0, there is no relation between the original points xᵢ and xⱼ, so the term contributes nothing; but if Wᵢⱼ = 1, or takes a large value, the new points yᵢ and yⱼ must be placed as close together as possible. Minimizing the objective function therefore gives a good mapping of X into Y.
Rewriting the objective function as a generalized eigenvector problem gives
X L Xᵀ a = λ X D Xᵀ a,
where D is the diagonal degree matrix (Dᵢᵢ = Σⱼ Wᵢⱼ), L = D − W is the Laplacian matrix, and the eigenvectors a with the smallest eigenvalues form the columns of the transformation A that maps X into Y via yᵢ = Aᵀxᵢ, thus solving the proposed problem.
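A sketch of this final step, solving the generalized eigenproblem directly (conceptual only: the one-image-per-column matrix Xc and the dense eig call are assumptions, and in practice a PCA step usually precedes this to keep the matrices nonsingular):

    % Sketch: LPP transformation from the generalized eigenproblem.
    % Xc = X' from the adjacency sketch: 4376-by-3000, one image per column.
    Dg = diag(sum(W, 2));            % diagonal degree matrix D
    L  = Dg - W;                     % graph Laplacian L = D - W
    [V, lam] = eig(Xc * L * Xc', Xc * Dg * Xc');  % X L X' a = lambda X D X' a
    [~, order] = sort(diag(lam));    % smallest eigenvalues preserve locality
    A = V(:, order(1:200));          % 4376-by-200 transformation matrix
    Y = A' * Xc;                     % 200-by-3000 reduced data, y_i = A' * x_i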
Data
The subset contains 3,000 African American faces; because the data comes from a single race, classification and regression may yield higher accuracy rates. The two graphs below come from the Locality Preserving Projections dimension reduction.
