Introduction
The goal of this paper is to examine the effectiveness of machine
learning/prediction technologies in making a simple daily decision. One of the first
decisions we make every day is choosing what to wear. In this paper, we evaluate the
effectiveness of the well-known nearest neighbor algorithm in aiding humans to
make this decision. We designed an MS Windows application which takes as input an
outfit descriptor in the form of garment colors, and gives as output one of three possible
ratings: bad, mediocre, or good. We focused on men's suits, as there is a more or less
strict rule-set that governs them, which we thought could perhaps be modeled by a
computer. Women's clothing tends to be less conservative in this regard.
Machine Learning
The problem of data classification/prediction has been one of the important
elements in the growing field of Artificial Intelligence (AI) and machine learning.
Everything from intelligent robots to email spam detection uses some form of
data classification to aid in a decision-making process. The problem is set up
as follows: given a set of inputs that represent some data point, suggest an output (or
classification) based on some knowledge-set. For example, the robot mentioned above
may take some form of its current visual data as input for a learning algorithm and base
its next move (i.e., left/right turn) on the output of the algorithm. Spam detection is a
good example of more straightforward classification; given an email (a set of words
which act as inputs), classify it on an integer scale between 0 (most probably legitimate)
and 10 (most probably spam). In both cases, some algorithm with a predefined
knowledge-set returns a prediction based on its input (and this knowledge).
Perhaps the most important feature of any classification algorithm that falls under
the realm of machine learning is the ability to build a knowledge-base from some
training data set, or in other words, to learn. In terms of the spam example, a training set
might encompass thousands of emails which are pre-classified (by a human) into
the different score-groups. Using a method known as supervised learning, the algorithm
parses all of these input/output pairs and attempts to "learn" the function that
appropriately maps the input vectors onto their corresponding outputs. Whichever
learning method is used, the algorithm builds knowledge based on the training set that
can later be applied to other, unclassified data sets (vectors). Though other types of
learning are possible, including transduction, which evaluates its previous experiences to
learn its own bias, we will concentrate on the simple supervised learning method
outlined above.[1]
The Nearest Neighbor Algorithm
In the realm of supervised learning algorithms, there are many options. Neural
network and Support Vector Machine (SVM) systems are some of the more complicated
and advanced ones; they have been successfully implemented and enjoy widespread use
in industry. However, another popular and often very effective classification system is
the simple nearest-neighbor algorithm. Though quite memory-intensive, as it maintains
a list of all previously-trained vectors and their classifications, it performs just as well as
the others in many of its applications. Because of its simplicity, its often-comparable
performance, and our relatively tiny data set, we chose to use it as our classifier. We were
also interested to see just how well such a simple algorithm would approximate the
human "taste function."

[1] Machine learning, Wikipedia. <http://en.wikipedia.org/wiki/Machine_learning>
Although the nearest neighbor algorithm has both geometric[2] and classification
applications, we will be concentrating on the latter. A good example of its usage in
classification is the prediction of individuals' political party affiliations. With input data
such as age, education level, income level, and gender (all grouped together to form a
d-dimensional vector), the algorithm can be used to predict the party of the person
represented by the inputs. Each person in the data set is represented by a party-labeled
point in d-dimensional space. The classifier determines the party affiliation of a new
person by assigning it the affiliation of its nearest neighbor.[3] The following process is
employed to do this: the geometric distance from the new data point to each element of
the set of classified points is calculated. The shortest such distance identifies the nearest
neighbor of the new data point, and the class (in this case party affiliation) of that
nearest neighbor is assigned to the new data point.[4]
Again, it is important to note that the knowledge-base of the nearest neighbor
algorithm is no more intricate than the entire set of points that have already been
classified during some previous phase. The classification of these points constitutes the
training, or supervised-learning phase of the algorithm, whereas we will refer to the
prediction of new points simply as the prediction phase.
[2] A classic example of its usage in geometry would be emergency dispatch: given the
location of a fire, the dispatcher finds the closest firehouse on a map and dispatches
vehicles from there.
[3] Nearest Neighbor Search.
<http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK4/NODE188.HTM>
[4] Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
The prediction phase of the algorithm is the part in which the actual knowledge
(the database of classified vectors) that the system already has is used to make
statistically educated guesses as to the appropriate classification of new vectors. The
nature of the algorithm, however, makes this part very time consuming: in the brute-
force implementation, some constant amount of computing time, C, must be spent
comparing the new input vector to each of the n vectors in the database. The result is an
algorithm whose running time is linearly proportional to the size of the database. To
combat this problem, various optimizations, such as specialized trees that organize the
pre-classified data, have been developed; these drastically reduce the number of
distances that must be computed. Such methods partition the geometric space so that
only distances within specified limits need be computed.[5]
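
To make the brute-force prediction phase concrete, a minimal sketch follows
(hypothetical types and names; our actual implementation appears in the Appendix):

#include <vector>

// A labeled training example: a feature vector plus its classification.
struct Example {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; taking the square root is unnecessary when
// distances are only compared against one another.
double squaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b) {
    double d = 0;
    for (size_t i = 0; i < a.size(); i++) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Brute-force 1-nearest-neighbor prediction: one distance computation per
// training example, hence running time linear in the database size n.
// Assumes at least one training example.
int predict(const std::vector<Example>& training,
            const std::vector<double>& query) {
    int bestIndex = 0;
    double bestDist = squaredDistance(training[0].features, query);
    for (size_t i = 1; i < training.size(); i++) {
        double dist = squaredDistance(training[i].features, query);
        if (dist < bestDist) {
            bestDist = dist;
            bestIndex = (int)i;
        }
    }
    return training[bestIndex].label;
}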
Alternative Approaches
In the realm of nearest neighbor, there are a variety of other approaches and
options which deserve some attention. Firstly, it is important to note that a common
variant often employed in practice is known as k-nearest neighbor, in which the k
nearest data points are used to estimate the output of the new input data point. To
highlight its effectiveness, we will examine the following example, which maps
1-dimensional vectors to their classifications:
Input : 0.0 1.0 1.7 2.5 3.0 3.5 4.0 5.0 6.0 7.0
Output: D D D R R D R R R R
[5] Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
An input such as 0.6 would be classified as D by the simple nearest neighbor
algorithm. When the k-nearest neighbor algorithm is applied with k = 2 or 3, it is still
classified as D. However, determining the output of an input such as 3.7 with the k-
nearest neighbor algorithm is more difficult. With the simple nearest neighbor algorithm,
the output would be D. When k = 2, the two closest neighbors are one D and one R,
which do not belong to the same class, so some tie-breaking rule is required. When
k = 3, two of the three nearest neighbors are R, and therefore the classification is R.
When k = 10, all the neighbors in the set are taken into account, and the classification
is R.[6]
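
As an illustration, the voting scheme just described might be sketched as follows
(a minimal one-dimensional implementation, not our application code; ties here are
broken by the alphabetical ordering of labels):

#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// One labeled training point in one dimension, as in the example above:
// the label is 'D' or 'R'.
struct Point { double x; char label; };

// Classify a query point by majority vote among its k nearest neighbors.
char knnClassify(const std::vector<Point>& training, double query, int k) {
    // Pair each training point's distance-to-query with its label, then sort.
    std::vector<std::pair<double, char> > byDist;
    for (size_t i = 0; i < training.size(); i++)
        byDist.push_back(std::make_pair(std::fabs(training[i].x - query),
                                        training[i].label));
    std::sort(byDist.begin(), byDist.end());

    // Tally the labels of the k closest points.
    std::map<char, int> votes;
    for (int i = 0; i < k && i < (int)byDist.size(); i++)
        votes[byDist[i].second]++;

    // Return the majority label.
    char best = 0;
    int bestCount = -1;
    for (std::map<char, int>::const_iterator it = votes.begin();
         it != votes.end(); ++it)
        if (it->second > bestCount) { bestCount = it->second; best = it->first; }
    return best;
}

With the training data above, knnClassify(training, 3.7, 1) returns D while
knnClassify(training, 3.7, 3) returns R, matching the walkthrough.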
Unlike in the simple nearest neighbor method, in the k-nearest neighbor method
the calculation of errors becomes important as well. The value of k should be chosen
such that the prediction error is minimized. Calculating the prediction error requires a
loss function. A loss function takes the truth and the prediction as input and produces 0
when the two match, and increasingly large values the further the prediction is from the
truth.[7] Though more complicated, the use of k-nearest neighbor in our
implementation might have proved more effective.
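
To make this concrete, the standard 0-1 loss for a classification task like ours can
be written (our own formulation, not taken from the implementation) as

    L(y, y') = 0 if y = y', and L(y, y') = 1 otherwise,

so that the prediction error for a given k is estimated as the average loss over n
pre-classified points,

    Err(k) = (1/n) * [ L(y_1, y'_1) + ... + L(y_n, y'_n) ],

where y'_i is the k-nearest-neighbor prediction for the i-th point. The value of k is
then chosen to minimize this estimate, typically computed via cross-validation.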
Another option we considered was a slightly less conventional model, in which
the system would be trained only on good outfits. In such a model, the final rating of an
outfit would be some decreasing function of its measured distance from the nearest
neighbor, rather than simply that neighbor's classification. Yet another option is to use
the traditional model, but with different weights on each of the seven items of the suit.
The purpose of this would be to avoid some of the problems previously outlined (a
sketch of such a weighted distance follows below).
[6] Kth Nearest Neighbor Classification: Introduction.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/KNN.pdf>
[7] Cross Validation.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/CV.pdf>
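
Returning to the weighted-distance option above, a minimal sketch follows (the
idea only; suitable per-garment weights would have to be chosen by hand):

#include <vector>

// Weighted squared Euclidean distance: each dimension's squared difference
// is scaled by a per-dimension weight, e.g. higher weights for the jacket
// and pants dimensions and lower ones for the tie.
double weightedDistance(const std::vector<double>& x,
                        const std::vector<double>& y,
                        const std::vector<double>& w) {
    double d = 0;
    for (size_t i = 0; i < x.size(); i++) {
        double t = x[i] - y[i];
        d += w[i] * t * t;
    }
    return d;
}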
There is also the alternative of using a completely different algorithm, perhaps
not even under the umbrella of machine learning. One such algorithm could rate suits
based on a knowledge-set which simply describes the weights of, and required
correlations between, different elements of the suit. Such an algorithm would, for
example, assign a value rating the matchability between different elements of the suit
(jacket/pants, shirt/tie, etc.) and then use these values in determining the score.
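
Such a rule-based scorer might be sketched as follows (the rule structure and the
placeholder matchability function are hypothetical illustrations, not a worked-out
rule-set):

#include <vector>

// A hand-coded compatibility rule between two suit elements (e.g.
// jacket/pants or shirt/tie), with a weight reflecting its importance.
struct MatchRule {
    int elementA, elementB; // indices into the outfit's seven garments
    double weight;          // importance of this pairing
};

// Placeholder stand-in: a real system would consult a hand-built
// color-compatibility table here.
double matchability(int colorA, int colorB) {
    return colorA == colorB ? 1.0 : 0.5;
}

// Score an outfit as a weighted average of pairwise matchability values.
double scoreOutfit(const std::vector<int>& colors,
                   const std::vector<MatchRule>& rules) {
    double score = 0, totalWeight = 0;
    for (size_t i = 0; i < rules.size(); i++) {
        score += rules[i].weight *
                 matchability(colors[rules[i].elementA],
                              colors[rules[i].elementB]);
        totalWeight += rules[i].weight;
    }
    return totalWeight > 0 ? score / totalWeight : 0; // normalized 0-1 score
}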
There are of course many other alternatives both within and outside the realm of
machine learning. We hoped only to scratch the surface of what we thought could be an
interesting way to help humans with a simple every-day decision.
Implementation
As previously mentioned, we chose to limit the scope of our project to that of
men's suits. Women's outfits have numerous varieties in terms of shape, style, color,
cut, and cloth; these elements would make a program that evaluates women's outfits too
complicated for a project of this size. Men's suits are more standard in terms of shape
and style, consisting only of pants, socks, shoes, shirts, jackets, ties, and belts. We
assumed that the main criterion for the evaluation of men's outfits is garment color; it is
the most important element used by humans in determining whether a set of suit
elements is a good "match." These decisions allow for the representation of almost any
men's suit simply in terms of a list of its seven garment/accessory colors.
With the above in mind, the problem of assessing an outfit as bad, mediocre, or
good can essentially be thought of as one of prediction. In terms of machine learning,
some system could be trained on various sets of seven-color combinations, each
associated with some rating (bad, mediocre, or good), and then queried with new color
combinations for a prediction response. For nearest neighbor, the same methodology
applies. There is, however, an important aspect that needed consideration: how exactly
to represent each color in terms that the algorithm can understand.
Although the nearest neighbor algorithm can be implemented to work with
discrete data (as in the party-affiliation problem discussed earlier, in which one of the
inputs is gender), using distance-measuring functions suited to such data, color is
anything but discrete. Color is a continuous spectrum on which humans can often
measure some type of distance. In other words, given three colors, we can usually group
two of them as being "closest" to each other. It is this very measure of distance that the
nearest neighbor algorithm relies on to match certain color groupings with others. A
natural way of attacking this problem is to map each of the possible colors (~16.7
million on most computers today) to a number and then use the standard Euclidean
distance function as a measure of closeness. However, who is to say which colors
should be close to each other on such a number line? A somewhat artificial but more
logical approach is to break each color down into some other representation. In our
case, we chose to represent each color as the intensity levels of the three primary colors
red, green, and blue. (Each primary color can take 256 intensities, so by adjusting them
appropriately it is possible to produce 256^3 ≈ 16.7 million colors.) The result is a
system which maps 21-dimensional input vectors (3 primary-color intensities for each
of the seven garment colors) to one of three rating categories (bad, mediocre, good).
Though this decision triples our vector size, it organizes the colors in an ordering in
which, at least at some level, the distance between colors can be measured via a
Euclidean function.
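
As a sketch, this encoding might look like the following (the struct and field
names are hypothetical, not our application's actual types):

#include <vector>

// One garment color as red/green/blue intensities, each in [0, 255].
struct Color { int r, g, b; };

// An outfit is seven garment/accessory colors: pants, socks, shoes,
// shirt, jacket, tie, and belt.
struct Outfit { Color garments[7]; };

// Flatten an outfit into the 21-dimensional input vector used by the
// nearest neighbor classifier (3 primary-color intensities x 7 garments).
std::vector<int> toVector(const Outfit& outfit) {
    std::vector<int> v;
    for (int i = 0; i < 7; i++) {
        v.push_back(outfit.garments[i].r);
        v.push_back(outfit.garments[i].g);
        v.push_back(outfit.garments[i].b);
    }
    return v;
}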
The following is a screenshot of the developed application:

[Screenshot: the application's main window]
The window allows for the selection of a dataset (pre-classified knowledgebase)
and the setting of any of the seven garment/accessory colors. The “Predict!” button runs
the nearest neighbor algorithm on the 21-dimensional input vector corresponding to the
chosen colors and outputs the response in the text-field (in this example, according to the
knowledgebase, the given outfit is predicted to be a “good” one). The “Add Datapoint”
button is used to add a combination and rating to the currently open dataset – the slider
above it can be set to any of the three ratings (bad being leftmost).
Methodology
To train the program, the first step involved designing sets of color combinations
for the suit and rating each as good, mediocre, or bad. We chose 45 outfits for each
classification. The good outfits were chosen by browsing online men’s advertisements
and finding the latest fashions. The mediocre outfits were created by using our own
tastes to modify the good outfits into merely acceptable ones. Finally, the bad outfits
were created by randomly choosing ridiculous color combinations that we thought
would be tasteless.
To test the success of the nearest neighbor algorithm in suit matching, it was
necessary to create a testing data set consisting of outfits already rated by a human, and
then to compare how the program rated them. The test data set consisted of thirty
different outfits, of which one third were bad, one third good, and one third mediocre.
These outfits were chosen by a member of our group who was not involved in training
the program, so that the results would not be too biased. The thirty test outfits were
input into the program, and the category the program assigned to each outfit was
recorded. The success of the program was measured by assigning a score of 1 if the
program rated the outfit from the test data in the same category as the human assigned
it, and a score of 0 if the program rated it differently than the humanly assigned
category. Note that no weight was placed on how "wrong" the program was in rating the
outfit. For example, if the human assigned the outfit to the good category but the
computer assigned it to either the mediocre or the bad one, the result would receive the
same score of 0, even though a computer rating of mediocre is closer to being "correct."
We designed two experiments to determine some factors that affected the results
of the nearest neighbor algorithm. Our first experiment tested the hypothesis that the
larger the training data set, the more accurate the algorithm would be in predicting a
"correctly" matched outfit. We trained the program with two different data sets: one
consisting of 135 outfits and the other of only 68. The 68 outfits were chosen by
including only every other outfit from the larger training set. We then input the 30 test
outfits under each of the two training data sets and compared the scores. The outcomes
are given in the results section.
The second experiment was, in essence, a repeat of the first with one important
change. As mentioned before, the decision to use the RGB color representation scheme
was somewhat arbitrary: this scheme is simply the most common one used by computers
and offers at least some level of color-difference "measurability." There is another
common scheme which some might say more closely models the human color
perception continuum: HSL. With HSL, each color is likewise broken down into three
numerical descriptors, Hue, Saturation, and Luminance, each measured as some
percentage of a maximum value. Hue and saturation describe qualitative differences
between colors, while luminance describes the quantitative difference in their
brightness.[8] In the second experiment, the same two training sets (of 135 and 68
outfits respectively) were converted into HSL representation. The same was done with
the 30 test outfits, and the training/prediction was repeated. The results appear in the
next section.
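
For reference, a standard RGB-to-HSL conversion can be sketched as follows (one
common formulation; not necessarily the exact code used in our converter):

#include <algorithm>

// Convert red/green/blue intensities in [0, 255] to hue, saturation, and
// luminance, each scaled to [0, 1] (hue as a fraction of a full turn).
void rgbToHsl(int r8, int g8, int b8, double& h, double& s, double& l) {
    double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;
    double mx = std::max(r, std::max(g, b));
    double mn = std::min(r, std::min(g, b));
    l = (mx + mn) / 2.0;
    if (mx == mn) { // achromatic: greys have no hue or saturation
        h = 0.0;
        s = 0.0;
        return;
    }
    double d = mx - mn;
    s = (l > 0.5) ? d / (2.0 - mx - mn) : d / (mx + mn);
    if (mx == r)
        h = (g - b) / d + (g < b ? 6.0 : 0.0);
    else if (mx == g)
        h = (b - r) / d + 2.0;
    else
        h = (r - g) / d + 4.0;
    h /= 6.0; // normalize hue to [0, 1]
}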
[8] Color. <http://encarta.msn.com/text_761577547__1/Color.html>
Results
Effects of the size of Training Data
When the program was trained with 135 different outfits, it incorrectly
categorized 36.7% of the 30 humanly categorized test outfits. ("Incorrect" denotes that
the computer did not place the outfit in the same category as the human.) Statistically,
with 95% confidence, this implies that with 135 different outfits in its knowledgebase,
the algorithm will incorrectly categorize outfits 19.4%-53.9% of the time. When the
program was trained on only 68 different outfits, it incorrectly categorized the outfits
50% of the time; with 95% confidence, with only 68 outfits in its knowledgebase, the
program incorrectly categorizes outfits 32.1%-67.9% of the time. As we hypothesized,
when there is less training data, the nearest neighbor algorithm is less accurate in its
predictions. The more data points in its knowledgebase, the higher the chance that some
new input will have a nearest neighbor that is truly "close." With too few data points, a
new vector's closest neighbor may be quite far away on the color continuum and thus be
too different to trust as a member of the same classification group.
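
For reference, these intervals follow from the standard normal approximation to a
binomial proportion. With an observed error rate of p = 11/30 ≈ 0.367 on n = 30 test
outfits,

    p ± 1.96 * sqrt( p(1 - p) / n ) = 0.367 ± 1.96 * 0.088 ≈ (19.4%, 53.9%),

and with p = 15/30 = 0.5 the same formula gives 0.5 ± 1.96 * 0.091 ≈ (32.1%, 67.9%).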
Effects of a different color representation
Replicating the same experiments with the HSL color representation scheme, we
found only negligible performance differences. For the trial with the 135-outfit training
set, the error rate was again 36.7%, with a 95% confidence interval of 19.4%-53.9%.
The HSL trial with only 68 outfits produced this same error rate and interval.
Conclusion of Results
The results outlined above suggest that, with enough data points, the nearest
neighbor learning algorithm we implemented does decently in terms of agreeing with a
human's classification of outfits. Though the difference in error rates between our two
trials (in RGB) may not be statistically significant, the literature on nearest neighbor,
and on machine learning in general, does support this conjecture. It is important to note
that in our trials, the mean error rate was significantly less than 66.7%, the expected
error rate of a random classifier (a random guess among three categories is correct only
one third of the time). Furthermore, the upper bound of the confidence interval for the
RGB trial with the larger dataset is still below this number.
As for the results from the HSL representation, we see no improvement on the
trial with the larger dataset and only a slight improvement on the trial with the smaller
dataset. We can therefore draw no statistically significant conclusions about the most
appropriate color representation model for use in such an algorithm. It is very possible,
however, that with a larger dataset and more than a single trial, some conclusions could
be reached regarding this question.
Conclusion: Discussion
We have shown that the simple nearest-neighbor algorithm performs relatively
well in rating what we will call the “matchability” of outfits, based solely on color. We
have also demonstrated the use of an alternate color representation scheme and its effect
on the algorithm’s performance. However, the question of where and under what
conditions the algorithm fails still remains.
In developing and testing the algorithm, we came to understand its true
limitations in terms of real-world application. These limitations stem mainly from the
fact that the algorithm, in and of itself, simply does what it says it does: it finds the
nearest neighbor and assigns that neighbor's classification to the new outfit. It follows
that in cases in which a new outfit matches an existing one perfectly in all dimensions
except, for example, jacket color, the algorithm will almost surely rate the outfit
according to its near-perfect match. But herein lies the problem: a "perfectly" matching
outfit immediately goes from good to quite bad the moment the color of a major piece of
the outfit, for example the jacket, is changed to a ridiculous color. The nearest neighbor
algorithm inherently cannot understand this and so often fails in evaluating such outfits.
With more appropriately-trained data points in the region, it might perform better.
Another major problem with the algorithm is its lack of any true understanding
of how humans tend to rate an outfit. Namely, it fails to weight and correlate different
elements of the suit. For example, while the matching of the jacket and pants is
essential, there is often much more leeway with tie color. The basic algorithm, however,
gives these two dimensions the same weight/importance in computing distances and so
fails on this front. Nearest neighbor's failure in correlation is best illustrated by the
rating given to a very well-matched (at least by our standards) but rather colorful suit.
Because our training set consists only of more conservative/traditional suits, the
algorithm ends up classifying such a suit as bad.
APPENDIX
Nearest Neighbor Core Functions:
#include "stdafx.h"
#include ".nearestneighbor.h"
#include <math.h>
#include <queue>
#include <string>
#include <sstream>
using namespace std;
NearestNeighbor::NearestNeighbor(DataSet * d_local, int k_local) :
d(d_local), k(k_local)
{
standardize();
}
NearestNeighbor::~NearestNeighbor(void)
{
}
/* Returns the squared Euclidean distance between two vectors, x and y,
   which are assumed to be standardized already and to have the same
   dimension as vector x. */
double NearestNeighbor::distance(vector<double> &x, vector<double> &y) {
    double d = 0;
    for (unsigned int i = 0; i < x.size(); i++) {
        double t = x[i] - y[i];
        d += t*t;
    }
    return d; // squared distance suffices: distances are only compared
}
void NearestNeighbor::standardize() {
    vector<vector<int> > & input = d->trainEx; // raw training examples
    int numAttrs = d->numAttrs;
    int numExs = d->numTrainExs;

    // Compute the mean of each attribute over the training examples.
    vector<double> mean(numAttrs);
    for (int i = 0; i < numExs; i++) {
        for (int j = 0; j < numAttrs; j++)
            mean[j] += (double)input[i][j];
    }
    for (int i = 0; i < numAttrs; i++)
        mean[i] /= (double)numExs;

    // Compute the standard deviation of each attribute.
    stdev.resize(numAttrs);
    for (int i = 0; i < numExs; i++) {
        for (int j = 0; j < numAttrs; j++) {
            double t = (double)input[i][j] - mean[j];
            stdev[j] += t*t;
        }
    }
    for (int i = 0; i < numAttrs; i++) {
        stdev[i] /= (double)numExs;
        stdev[i] = sqrt((double)stdev[i]);
    }

    // Scale each attribute by its standard deviation. The mean is not
    // subtracted: shifting every point by the same amount does not change
    // pairwise distances, so centering is unnecessary here.
    data.resize(numExs);
    for (int i = 0; i < numExs; i++) {
        data[i].resize(numAttrs);
        for (int j = 0; j < numAttrs; j++)
            if (stdev[j] != 0)
                data[i][j] = (double)input[i][j] / stdev[j];
    }
}
int NearestNeighbor::predict(vector<int> &ex) {
    int numAttrs = d->numAttrs;

    // Standardize the query vector using the training standard deviations.
    vector<double> dex(numAttrs);
    for (int i = 0; i < numAttrs; i++)
        if (stdev[i] != 0)
            dex[i] = (double)ex[i] / stdev[i];

    // Brute-force scan: keep the index of the closest training example.
    double bestDist = distance(data[0], dex);
    int bestIndex = 0;
    for (int i = 1; i < d->numTrainExs; i++) {
        double dist = distance(data[i], dex);
        if (dist < bestDist) {
            bestDist = dist;
            bestIndex = i;
        }
    }
    return d->trainLabel[bestIndex];
}
BIBLIOGRAPHY
Cover, T. M. and P. E. Hart. “Nearest Neighbor Pattern Classification,” IEEE
Transactions on Information Theory, Vol. IT-13, No.1, January 1967.
Gooda, Abdel-Hamid. “Application of The Techniques of Data Compression and Nearest
Neighbor Classification to Information Retrieval,” 2002.
Nayar, Shree K. and Sameer A. Nene. “A Simple Algorithm for Nearest Neighbor Search
in High Dimensions,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 19, No.9, September 1997.
Pace, R. Kelley and Dongya Zou. “Closed-Form Maximum Likelihood Estimates of
Nearest Neighbor Spatial Dependence,” Geographical Analysis, Volume 32, Number 2,
April 2000.
Yau, Hung-Chun and Michael T. Manry. “Iterative Improvement of a Nearest Neighbor
Classifier.”
Color.
<http://encarta.msn.com/text_761577547__1/Color.html>
Cross Validation.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/CV.pdf>
Kth Nearest Neighbor Classification: Introduction.
<http://stat-www.berkeley.edu/users/nolan/stat133/Fall04/lectures/KNN.pdf>
Machine learning, Wikipedia.
<http://en.wikipedia.org/wiki/Machine_learning>
Nearest Neighbor Search.
<http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK4/NODE188.HTM>
Nearest neighbor (pattern recognition), Wikipedia.
<http://en.wikipedia.org/wiki/Neares_neighbor_%28pattern_recognition%29>
Nearest Neighbor Search.
<http://www.cs.sunysb.edu/~algorith/files/nearest-neighbor.shtml>