Introduction
The goal of this paper is to examine the effectiveness of machine
learning/prediction technologies in making a simple daily decision. One of the first
decisions we make every day is choosing what to wear. In this paper, we evaluate the
effectiveness of the well-known nearest neighbor algorithm in aiding humans to
make this decision. We designed an MS Windows application which takes as input an
outfit descriptor in the form of garment colors, and gives as output one of three possible
ratings: bad, mediocre, or good. We focused on men's suits, as there is a more or less
strict rule-set that governs them, which we thought could perhaps be modeled by a
computer. Women's clothing tends to be less conservative in this regard.
Machine Learning
The problem of data classification/prediction has been one of the important
elements in the growing field of Artificial Intelligence (AI) and machine learning.
Everything from intelligent robots to email spam detection uses some form of
data classification to aid in a decision-making process. The problem is set up
as follows: given a set of inputs that represent some data point, suggest an output (or
classification) based on some knowledge-set. For example, the robot mentioned above
may take some form of its current visual data as input for a learning algorithm and base
its next move (i.e., left/right turn) on the output of the algorithm. Spam detection is a
good example of more straightforward classification; given an email (a set of words
which act as inputs), classify it on an integer scale between 0 (most probably legitimate)
and 10 (most probably spam). In both cases, some algorithm with a predefined
knowledge-set returns a prediction based on its input (and this knowledge).
Perhaps the most important feature of any classification algorithm that falls under
the realm of machine learning is the ability to build a knowledge-base from some
training data set, or in other words, to learn. In terms of the spam example, a training set
might encompass thousands of emails which are pre-classified (by a human) into
the different score-groups. Using a method known as supervised learning, the algorithm
parses all of these input/output pairs and attempts to "learn" the function that
appropriately maps the input vectors onto their corresponding outputs. Whichever
learning method is used, the algorithm builds knowledge based on the training set that
can later be applied to other, unclassified data sets (vectors). Though other types of
learning are possible, including transduction, which evaluates its previous experiences to
learn its own bias, we will concentrate on the simple supervised learning method
outlined above.[1]
The Nearest Neighbor Algorithm
In the realm of supervised learning algorithms, there are many options. Neural
network and Support Vector Machine (SVM) systems are some of the more complicated
and advanced ones; they have been successfully implemented and enjoy widespread use
in industry. However, another popular and often very effective classification system is
the simple nearest-neighbor algorithm. Though quite memory-intensive, as it maintains
a list of all previously-trained vectors and their classifications, it performs just as well as
the others in many of its applications. Because of its simplicity, its often-comparable
performance, and our relatively tiny data set, we chose to use it as our classifier. We were
also interested to see just how well such a simple algorithm would approximate the
human "taste function."

[1] Machine learning, Wikipedia. <http://en.wikipedia.org/wiki/Machine_learning>
Although the nearest neighbor algorithm has both geometric[2] and classification
applications, we will be concentrating on the latter. A good example of its usage in
classification is the prediction of individuals' political party affiliations. With input data
such as age, education level, income level, and gender (all grouped together to form a
d-dimensional vector), the algorithm can be used to predict the party of the person
represented by the inputs. Each person in the data set is represented by a party-labeled
point in d-dimensional space. The classifier determines the party affiliation of a new
person by assigning it the affiliation of its nearest neighbor.[3] The following process is
employed to do this: the geometric distance from the new data point to each element of
the set of classified points is calculated. The shortest such distance identifies the nearest
neighbor of the new data point, and the class (in this case party affiliation) of that
nearest neighbor is assigned to the new data point.[4]
Again, it is important to note that the knowledge-base of the nearest neighbor
algorithm is no more intricate than the entire set of points that have already been
classified during some previous phase. The classification of these points constitutes the
training, or supervised-learning phase of the algorithm, whereas we will refer to the
prediction of new points simply as the prediction phase.
[2] A classic example of its usage in geometry would be emergency dispatch: given the
location of a fire, the dispatcher finds the closest firehouse on a map and dispatches
vehicles from there.
[3] Nearest Neighbor Search.
<http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK4/NODE188.HTM>
[4] Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
The prediction phase of the algorithm is the part in which the actual knowledge
(the database of classified vectors) that the system already has is used to make
statistically educated guesses as to the appropriate classification of new vectors. The
nature of the algorithm, however, makes this part very time consuming: in the brute-
force implementation, some constant amount of computing time, C, must be spent
comparing the new input vector to each of the n vectors in the database. The result is an
algorithm whose running time is linearly proportional to the size of the database. To
combat this problem, various optimizations, such as specialized trees that organize the
pre-classified data, have been developed; these drastically reduce the number of
distances that must be computed. Such methods partition the geometric space so that
only distances within specified limits need be computed.[5]
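
To make the brute-force prediction phase concrete, a minimal sketch follows
(hypothetical types and names; our actual implementation appears in the Appendix):

#include <vector>

// A labeled training example: a feature vector plus its classification.
struct Example {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; taking the square root is unnecessary when
// distances are only compared against one another.
double squaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b) {
    double d = 0;
    for (size_t i = 0; i < a.size(); i++) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Brute-force 1-nearest-neighbor prediction: one distance computation per
// training example, hence running time linear in the database size n.
// Assumes at least one training example.
int predict(const std::vector<Example>& training,
            const std::vector<double>& query) {
    int bestIndex = 0;
    double bestDist = squaredDistance(training[0].features, query);
    for (size_t i = 1; i < training.size(); i++) {
        double dist = squaredDistance(training[i].features, query);
        if (dist < bestDist) {
            bestDist = dist;
            bestIndex = (int)i;
        }
    }
    return training[bestIndex].label;
}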
Alternative Approaches
In the realm of nearest neighbor, there are a variety of other approaches and
options which deserve some attention. Firstly, it is important to note that a common
variant often employed in practice is known as k-nearest neighbor, in which the k
nearest data points are used to estimate the output of the new input data point. To
highlight its effectiveness, we will examine the following example, which maps
1-dimensional vectors to their classifications:
Input : 0.0 1.0 1.7 2.5 3.0 3.5 4.0 5.0 6.0 7.0
Output: D D D R R D R R R R
[5] Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
An input such as 0.6 would be classified as D by the simple nearest neighbor
algorithm. When the k-nearest neighbor algorithm is applied with k = 2 or 3, it is still
classified as D. However, determining the output of an input such as 3.7 with the k-
nearest neighbor algorithm is more difficult. With the simple nearest neighbor algorithm,
the output would be D. When k = 2, the two closest neighbors are one D and one R,
which do not belong to the same class, so some tie-breaking rule is required. When
k = 3, two of the three nearest neighbors are R, and therefore the classification is R.
When k = 10, all the neighbors in the set are taken into account, and the classification
is R.[6]
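
As an illustration, the voting scheme just described might be sketched as follows
(a minimal one-dimensional implementation, not our application code; ties here are
broken by the alphabetical ordering of labels):

#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// One labeled training point in one dimension, as in the example above:
// the label is 'D' or 'R'.
struct Point { double x; char label; };

// Classify a query point by majority vote among its k nearest neighbors.
char knnClassify(const std::vector<Point>& training, double query, int k) {
    // Pair each training point's distance-to-query with its label, then sort.
    std::vector<std::pair<double, char> > byDist;
    for (size_t i = 0; i < training.size(); i++)
        byDist.push_back(std::make_pair(std::fabs(training[i].x - query),
                                        training[i].label));
    std::sort(byDist.begin(), byDist.end());

    // Tally the labels of the k closest points.
    std::map<char, int> votes;
    for (int i = 0; i < k && i < (int)byDist.size(); i++)
        votes[byDist[i].second]++;

    // Return the majority label.
    char best = 0;
    int bestCount = -1;
    for (std::map<char, int>::const_iterator it = votes.begin();
         it != votes.end(); ++it)
        if (it->second > bestCount) { bestCount = it->second; best = it->first; }
    return best;
}

With the training data above, knnClassify(training, 3.7, 1) returns D while
knnClassify(training, 3.7, 3) returns R, matching the walkthrough.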
Unlike in the simple nearest neighbor method, in the k-nearest neighbor method
the calculation of errors becomes important as well. The value of k should be chosen
such that the prediction error is minimized. Calculating the prediction error requires a
loss function. A loss function takes the truth and the prediction as input and produces 0
when the two match, and increasingly large values the further the prediction is from the
truth.[7] Though more complicated, the use of k-nearest neighbor in our
implementation might have proved more effective.
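
To make this concrete, the standard 0-1 loss for a classification task like ours can
be written (our own formulation, not taken from the implementation) as

    L(y, y') = 0 if y = y', and L(y, y') = 1 otherwise,

so that the prediction error for a given k is estimated as the average loss over n
pre-classified points,

    Err(k) = (1/n) * [ L(y_1, y'_1) + ... + L(y_n, y'_n) ],

where y'_i is the k-nearest-neighbor prediction for the i-th point. The value of k is
then chosen to minimize this estimate, typically computed via cross-validation.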
Another option we considered was a slightly less conventional model, in which
the system would be trained only on good outfits. In such a model, the final rating of an
outfit would be some decreasing function of its measured distance from the nearest
neighbor, rather than simply that neighbor's classification. Yet another option is to use
the traditional model, but with different weights on each of the seven items of the suit.
The purpose of this would be to avoid some of the problems previously outlined (a
sketch of such a weighted distance follows below).
[6] Kth Nearest Neighbor Classification: Introduction.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/KNN.pdf>
[7] Cross Validation.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/CV.pdf>
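
Returning to the weighted-distance option above, a minimal sketch follows (the
idea only; suitable per-garment weights would have to be chosen by hand):

#include <vector>

// Weighted squared Euclidean distance: each dimension's squared difference
// is scaled by a per-dimension weight, e.g. higher weights for the jacket
// and pants dimensions and lower ones for the tie.
double weightedDistance(const std::vector<double>& x,
                        const std::vector<double>& y,
                        const std::vector<double>& w) {
    double d = 0;
    for (size_t i = 0; i < x.size(); i++) {
        double t = x[i] - y[i];
        d += w[i] * t * t;
    }
    return d;
}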
There is also the alternative of using a completely different algorithm, perhaps
not even under the umbrella of machine learning. One such algorithm could rate suits
based on a knowledge-set which simply describes the weights of, and required
correlations between, different elements of the suit. Such an algorithm would, for
example, assign a value rating the matchability between different elements of the suit
(jacket/pants, shirt/tie, etc.) and then use these values in determining the score.
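
Such a rule-based scorer might be sketched as follows (the rule structure and the
placeholder matchability function are hypothetical illustrations, not a worked-out
rule-set):

#include <vector>

// A hand-coded compatibility rule between two suit elements (e.g.
// jacket/pants or shirt/tie), with a weight reflecting its importance.
struct MatchRule {
    int elementA, elementB; // indices into the outfit's seven garments
    double weight;          // importance of this pairing
};

// Placeholder stand-in: a real system would consult a hand-built
// color-compatibility table here.
double matchability(int colorA, int colorB) {
    return colorA == colorB ? 1.0 : 0.5;
}

// Score an outfit as a weighted average of pairwise matchability values.
double scoreOutfit(const std::vector<int>& colors,
                   const std::vector<MatchRule>& rules) {
    double score = 0, totalWeight = 0;
    for (size_t i = 0; i < rules.size(); i++) {
        score += rules[i].weight *
                 matchability(colors[rules[i].elementA],
                              colors[rules[i].elementB]);
        totalWeight += rules[i].weight;
    }
    return totalWeight > 0 ? score / totalWeight : 0; // normalized 0-1 score
}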
There are of course many other alternatives both within and outside the realm of
machine learning. We hoped only to scratch the surface of what we thought could be an
interesting way to help humans with a simple every-day decision.
Implementation
As previously mentioned, we chose to limit the scope of our project to that of
men's suits. Women's outfits have numerous varieties in terms of shape, style, color,
cut, and cloth; these elements would make a program that evaluates women's outfits too
complicated for a project of this size. Men's suits are more standard in terms of shape
and style, consisting only of pants, socks, shoes, shirts, jackets, ties, and belts. We
assumed that the main criterion for the evaluation of men's outfits is garment color; it is
the most important element used by humans in determining whether a set of suit
elements is a good "match." These decisions allow for the representation of almost any
men's suit simply in terms of a list of its seven garment/accessory colors.
With the above in mind, the problem of assessing an outfit as bad, mediocre, or
good can essentially be thought of as one of prediction. In terms of machine learning,
some system could be trained on various sets of seven-color combinations, each
associated with some rating (bad, mediocre, or good), and then queried with new color
combinations for a prediction response. For nearest neighbor, the same methodology
applies. There is, however, an important aspect that needed consideration: how exactly
to represent each color in terms that the algorithm can understand.
Although the nearest neighbor algorithm can be implemented to work with
discrete data (as in the party-affiliation problem discussed earlier, in which one of the
inputs is gender), using distance-measuring functions suited to such data, color is
anything but discrete. Color is a continuous spectrum on which humans can often
measure some type of distance. In other words, given three colors, we can usually group
two of them as being "closest" to each other. It is this very measure of distance that the
nearest neighbor algorithm relies on to match certain color groupings with others. A
natural way of attacking this problem is to map each of the possible colors (~16.7
million on most computers today) to a number and then use the standard Euclidean
distance function as a measure of closeness. However, who is to say which colors
should be close to each other on such a number line? A somewhat artificial but more
logical approach is to break each color down into some other representation. In our
case, we chose to represent each color as the intensity levels of the three primary colors
red, green, and blue. (Each primary color can take 256 intensities, so by adjusting them
appropriately it is possible to produce 256^3 ≈ 16.7 million colors.) The result is a
system which maps 21-dimensional input vectors (3 primary-color intensities for each
of the seven garment colors) to one of three rating categories (bad, mediocre, good).
Though this decision triples our vector size, it organizes the colors in an ordering in
which, at least at some level, the distance between colors can be measured via a
Euclidean function.
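
As a sketch, this encoding might look like the following (the struct and field
names are hypothetical, not our application's actual types):

#include <vector>

// One garment color as red/green/blue intensities, each in [0, 255].
struct Color { int r, g, b; };

// An outfit is seven garment/accessory colors: pants, socks, shoes,
// shirt, jacket, tie, and belt.
struct Outfit { Color garments[7]; };

// Flatten an outfit into the 21-dimensional input vector used by the
// nearest neighbor classifier (3 primary-color intensities x 7 garments).
std::vector<int> toVector(const Outfit& outfit) {
    std::vector<int> v;
    for (int i = 0; i < 7; i++) {
        v.push_back(outfit.garments[i].r);
        v.push_back(outfit.garments[i].g);
        v.push_back(outfit.garments[i].b);
    }
    return v;
}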
The following is a screenshot of the developed application:

[Screenshot: the application's main window]
The window allows for the selection of a dataset (pre-classified knowledgebase)
and the setting of any of the seven garment/accessory colors. The “Predict!” button runs
the nearest neighbor algorithm on the 21-dimensional input vector corresponding to the
chosen colors and outputs the response in the text-field (in this example, according to the
knowledgebase, the given outfit is predicted to be a “good” one). The “Add Datapoint”
button is used to add a combination and rating to the currently open dataset – the slider
above it can be set to any of the three ratings (bad being leftmost).
Methodology
To train the program, the first step involved designing sets of color combinations
for the suit and rating each as good, mediocre, or bad. We chose 45 outfits for each
classification. The good outfits were chosen by browsing online men’s advertisements
and finding the latest fashions. The mediocre outfits were created by using our own
tastes to modify the good outfits into merely acceptable ones. Finally, the bad outfits
were created by randomly choosing ridiculous color combinations that we thought
would be tasteless.
To test the success of the nearest neighbor algorithm in suit matching, it was
necessary to create a testing data set consisting of outfits already rated by a human, and
then to compare how the program rated them. The test data set consisted of thirty
different outfits, of which one third were bad, one third good, and one third mediocre.
These outfits were chosen by a member of our group who was not involved in training
the program, so that the results would not be too biased. The thirty test outfits were
input into the program, and the category the program assigned to each outfit was
recorded. The success of the program was measured by assigning a score of 1 if the
program rated the outfit from the test data in the same category as the human assigned
it, and a score of 0 if the program rated it differently than the humanly assigned
category. Note that no weight was placed on how "wrong" the program was in rating the
outfit. For example, if the human assigned the outfit to the good category but the
computer assigned it to either the mediocre or the bad one, the result would receive the
same score of 0, even though a computer rating of mediocre is closer to being "correct."
We designed two experiments to determine some factors that affected the results
of the nearest neighbor algorithm. Our first experiment tested the hypothesis that the
larger the training data set, the more accurate the algorithm would be in predicting a
"correctly" matched outfit. We trained the program with two different data sets: one
consisting of 135 outfits and the other of only 68. The 68 outfits were chosen by
including only every other outfit from the larger training set. We then input the 30 test
outfits under each of the two training data sets and compared the scores. The outcomes
are given in the results section.
The second experiment was, in essence, a repeat of the first with one important
change. As mentioned before, the decision to use the RGB color representation scheme
was somewhat arbitrary: this scheme is simply the most common one used by computers
and offers at least some level of color-difference "measurability." There is another
common scheme which some might say more closely models the human color
perception continuum: HSL. With HSL, each color is likewise broken down into three
numerical descriptors, Hue, Saturation, and Luminance, each measured as some
percentage of a maximum value. Hue and saturation describe qualitative differences
between colors, while luminance describes the quantitative difference in their
brightness.[8] In the second experiment, the same two training sets (of 135 and 68
outfits respectively) were converted into HSL representation. The same was done with
the 30 test outfits, and the training/prediction was repeated. The results appear in the
next section.
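
For reference, a standard RGB-to-HSL conversion can be sketched as follows (one
common formulation; not necessarily the exact code used in our converter):

#include <algorithm>

// Convert red/green/blue intensities in [0, 255] to hue, saturation, and
// luminance, each scaled to [0, 1] (hue as a fraction of a full turn).
void rgbToHsl(int r8, int g8, int b8, double& h, double& s, double& l) {
    double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;
    double mx = std::max(r, std::max(g, b));
    double mn = std::min(r, std::min(g, b));
    l = (mx + mn) / 2.0;
    if (mx == mn) { // achromatic: greys have no hue or saturation
        h = 0.0;
        s = 0.0;
        return;
    }
    double d = mx - mn;
    s = (l > 0.5) ? d / (2.0 - mx - mn) : d / (mx + mn);
    if (mx == r)
        h = (g - b) / d + (g < b ? 6.0 : 0.0);
    else if (mx == g)
        h = (b - r) / d + 2.0;
    else
        h = (r - g) / d + 4.0;
    h /= 6.0; // normalize hue to [0, 1]
}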
[8] Color. <http://encarta.msn.com/text_761577547__1/Color.html>
Results
Effects of the size of Training Data
When the program was trained with 135 different outfits, it incorrectly
categorized 36.7% of the 30 humanly categorized test outfits. ("Incorrect" denotes that
the computer did not place the outfit in the same category as the human.) Statistically,
with 95% confidence, this implies that with 135 different outfits in its knowledgebase,
the algorithm will incorrectly categorize outfits 19.4%-53.9% of the time. When the
program was trained on only 68 different outfits, it incorrectly categorized the outfits
50% of the time; with 95% confidence, with only 68 outfits in its knowledgebase, the
program incorrectly categorizes outfits 32.1%-67.9% of the time. As we hypothesized,
when there is less training data, the nearest neighbor algorithm is less accurate in its
predictions. The more data points in its knowledgebase, the higher the chance that some
new input will have a nearest neighbor that is truly "close." With too few data points, a
new vector's closest neighbor may be quite far away on the color continuum and thus be
too different to trust as a member of the same classification group.
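
For reference, these intervals follow from the standard normal approximation to a
binomial proportion. With an observed error rate of p = 11/30 ≈ 0.367 on n = 30 test
outfits,

    p ± 1.96 * sqrt( p(1 - p) / n ) = 0.367 ± 1.96 * 0.088 ≈ (19.4%, 53.9%),

and with p = 15/30 = 0.5 the same formula gives 0.5 ± 1.96 * 0.091 ≈ (32.1%, 67.9%).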
Effects of a different color representation
Replicating the same experiments with the HSL color representation scheme, we
found only negligible performance differences. For the trial with the 135-outfit training
set, the error rate was again 36.7%, with a 95% confidence interval of 19.4%-53.9%.
The HSL trial with only 68 outfits produced this same error rate and interval.
Conclusion of Results
The results outlined above suggest that, with enough data points, the nearest
neighbor learning algorithm we implemented does decently in terms of agreeing with a
human's classification of outfits. Though the difference in error rates between our two
trials (in RGB) may not be statistically significant, the literature on nearest neighbor,
and on machine learning in general, does support this conjecture. It is important to note
that in our trials, the mean error rate was significantly less than 66.7%, the expected
error rate of a random classifier (a random guess among three categories is correct only
one third of the time). Furthermore, the upper bound of the confidence interval for the
RGB trial with the larger dataset is still below this number.
As for the results from the HSL representation, we see no improvement on the
trial with the larger dataset and only a slight improvement on the trial with the smaller
dataset. We can therefore draw no statistically significant conclusions about the most
appropriate color representation model for use in such an algorithm. It is very possible,
however, that with a larger dataset and more than a single trial, some conclusions could
be reached regarding this question.
Conclusion: Discussion
We have shown that the simple nearest-neighbor algorithm performs relatively
well in rating what we will call the “matchability” of outfits, based solely on color. We
have also demonstrated the use of an alternate color representation scheme and its effect
on the algorithm’s performance. However, the question of where and under what
conditions the algorithm fails still remains.
In developing and testing the algorithm, we came to understand its true
limitations in terms of real-world application. These limitations stem mainly from the
fact that the algorithm, in and of itself, simply does what it says it does: it finds the
nearest neighbor and assigns that neighbor's classification to the new outfit. It follows
that in cases in which a new outfit matches an existing one perfectly in all dimensions
except, for example, jacket color, the algorithm will almost surely rate the outfit
according to its near-perfect match. But herein lies the problem: a "perfectly" matching
outfit immediately goes from good to quite bad the moment the color of a major piece of
the outfit, for example the jacket, is changed to a ridiculous color. The nearest neighbor
algorithm inherently cannot understand this and so often fails in evaluating such outfits.
With more appropriately-trained data points in the region, it might perform better.
Another major problem with the algorithm is its lack of any true understanding
of how humans tend to rate an outfit. Namely, it fails to weight and correlate different
elements of the suit. For example, while the matching of the jacket and pants is
essential, there is often much more leeway with tie color. The basic algorithm, however,
gives these two dimensions the same weight/importance in computing distances and so
fails on this front. Nearest neighbor's failure in correlation is best illustrated by the
rating given to a very well-matched (at least by our standards) but rather colorful suit.
Because our training set consists only of more conservative/traditional suits, the
algorithm ends up classifying such a suit as bad.
APPENDIX
Nearest Neighbor Core Functions:
#include "stdafx.h"
#include ".nearestneighbor.h"
#include <math.h>
#include <queue>
#include <string>
#include <sstream>
using namespace std;
NearestNeighbor::NearestNeighbor(DataSet * d_local, int k_local) :
d(d_local), k(k_local)
{
standardize();
}
NearestNeighbor::~NearestNeighbor(void)
{
}
/* Returns the squared Euclidean distance between two vectors, x and y,
   which are assumed to be standardized already and to have the same
   dimension as vector x. */
double NearestNeighbor::distance(vector<double> &x, vector<double> &y) {
    double d = 0;
    for (unsigned int i = 0; i < x.size(); i++) {
        double t = x[i] - y[i];
        d += t*t;
    }
    return d; // squared distance suffices: distances are only compared
}
void NearestNeighbor::standardize() {
    vector<vector<int> > & input = d->trainEx; // raw training examples
    int numAttrs = d->numAttrs;
    int numExs = d->numTrainExs;

    // Compute the mean of each attribute over the training examples.
    vector<double> mean(numAttrs);
    for (int i = 0; i < numExs; i++) {
        for (int j = 0; j < numAttrs; j++)
            mean[j] += (double)input[i][j];
    }
    for (int i = 0; i < numAttrs; i++)
        mean[i] /= (double)numExs;

    // Compute the standard deviation of each attribute.
    stdev.resize(numAttrs);
    for (int i = 0; i < numExs; i++) {
        for (int j = 0; j < numAttrs; j++) {
            double t = (double)input[i][j] - mean[j];
            stdev[j] += t*t;
        }
    }
    for (int i = 0; i < numAttrs; i++) {
        stdev[i] /= (double)numExs;
        stdev[i] = sqrt((double)stdev[i]);
    }

    // Scale each attribute by its standard deviation. The mean is not
    // subtracted: shifting every point by the same amount does not change
    // pairwise distances, so centering is unnecessary here.
    data.resize(numExs);
    for (int i = 0; i < numExs; i++) {
        data[i].resize(numAttrs);
        for (int j = 0; j < numAttrs; j++)
            if (stdev[j] != 0)
                data[i][j] = (double)input[i][j] / stdev[j];
    }
}
int NearestNeighbor::predict(vector<int> &ex) {
    int numAttrs = d->numAttrs;

    // Standardize the query vector using the training standard deviations.
    vector<double> dex(numAttrs);
    for (int i = 0; i < numAttrs; i++)
        if (stdev[i] != 0)
            dex[i] = (double)ex[i] / stdev[i];

    // Brute-force scan: keep the index of the closest training example.
    double bestDist = distance(data[0], dex);
    int bestIndex = 0;
    for (int i = 1; i < d->numTrainExs; i++) {
        double dist = distance(data[i], dex);
        if (dist < bestDist) {
            bestDist = dist;
            bestIndex = i;
        }
    }
    return d->trainLabel[bestIndex];
}
BIBLIOGRAPHY
Cover, T. M. and P. E. Hart. “Nearest Neighbor Pattern Classification,” IEEE
Transactions on Information Theory, Vol. IT-13, No.1, January 1967.
Gooda, Abdel-Hamid. “Application of The Techniques of Data Compression and Nearest
Neighbor Classification to Information Retrieval,” 2002.
Nayar, Shree K. and Sameer A. Nene. “A Simple Algorithm for Nearest Neighbor Search
in High Dimensions,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 19, No.9, September 1997.
Pace, R. Kelley and Dongya Zou. “Closed-Form Maximum Likelihood Estimates of
Nearest Neighbor Spatial Dependence,” Geographical Analysis, Volume 32, Number 2,
April 2000.
Yau, Hung-Chun and Michael T. Manry. “Iterative Improvement of a Nearest Neighbor
Classifier.”
Color.
<http://encarta.msn.com/text_761577547__1/Color.html>
Cross Validation.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/CV.pdf>
Kth Nearest Neighbor Classification: Introduction.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/KNN.pdf>
Machine learning, Wikipedia.
<http://en.wikipedia.org/wiki/Machine_learning>
Nearest Neighbor Search.
<http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK4/NODE188.HTM>
Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
Nearest Neighbor Search.
<http://www.cs.sunysb.edu/~algorith/files/nearest-neighbor.shtml>