2. CONTENTS
1. Importance of Good Features
2. Irrelevant and Redundant Features
3. Feature Pruning and Normalization
4. Evaluating Model Performance
5. Cross Validation
6. Hypothesis Testing and Statistical Significance
7. Debugging Learning Algorithms
8. Bias/Variance Trade-off
3. IMPORTANCE OF GOOD FEATURES
Feature:
• an individual measurable property of the phenomenon being observed
• the basis on which a model is built
Importance of Features:
• choosing features poorly will result in an unreliable model
Figure: Machine learning workflow
4. FEATURE EXTRACTION EXAMPLE
pixel representation
• a 100 x 100 pixel image = a 30,000-dimensional vector
• each dimension corresponds to the red, green, or blue value of one pixel
patch representation
• the unit of interest is a small rectangular block, rather than a single pixel
object recognition from images
Figure: pixel representation
Figure: patch representation
5. FEATURE EXTRACTION EXAMPLE
shape representation
• throw out all color and pixel information
• simply provide a bounding polygon of the object
text categorization: bag of words representation
object recognition from images
Figure: pixel representation
Figure: shape representation
Figure: text categorization
6. IRRELEVANT AND REDUNDANT FEATURES
Irrelevant Feature:
an irrelevant feature is one that is completely uncorrelated with the prediction task
eg: the presence of the word “the” is largely irrelevant for predicting whether a course review is positive or negative
7. IRRELEVANT AND REDUNDANT FEATURES
Redundant Feature:
two features are redundant if they are highly correlated
eg: having a bright red pixel at position (20, 93) in an image is probably highly redundant with having a bright red pixel at position (21, 93); both might be useful for identifying fire hydrants
Figure: fire hydrants
8. FEATURE PRUNING AND NORMALIZATION
Feature Pruning:
eg: the word “good” appears in exactly one training document, which happens to be positive. With only one training example, it is hard to tell whether the word is really correlated with the positive class or is just noise.
Pruning such rare features:
• reduces the size of decision trees
• reduces the complexity of the final classifier
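The pruning idea above can be sketched as dropping words that appear in too few training documents (the `min_df` threshold and helper name are assumptions for illustration, not from the slides):

```python
from collections import Counter

def prune_rare_features(docs, min_df=2):
    """Keep only words appearing in at least `min_df` documents.

    `docs` is a list of token lists; returns the surviving vocabulary.
    """
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for doc in docs:
        for word in set(doc):
            df[word] += 1
    return {w for w, c in df.items() if c >= min_df}

# Toy course-review corpus: "good", "boring", and "easy" each occur in
# only one document, so they are pruned as potential noise.
docs = [["good", "course", "fun"],
        ["boring", "course"],
        ["fun", "course", "easy"]]
vocab = prune_rare_features(docs, min_df=2)
```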
9. FEATURE PRUNING AND NORMALIZATION
Normalization:
rescale features to make it easier for your learning algorithm to learn.
eg: the height of the “A” has been reduced from 8 to 6 pixels, while the width has been reduced from 7 to 5 pixels.
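One common form of normalization is centering each feature and scaling it to unit variance; the slides' image-resizing example is an instance of the same rescaling idea (the function below is a sketch, not the slides' method):

```python
def standardize(columns):
    """Center each feature column at zero and scale to unit variance."""
    result = []
    for col in columns:
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        std = var ** 0.5 or 1.0  # guard: constant features divide by 1
        result.append([(x - mean) / std for x in col])
    return result

# Hypothetical feature column: character heights in pixels.
heights = standardize([[8.0, 6.0, 7.0]])[0]
```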
10. EVALUATING MODEL PERFORMANCE
Purpose:
build a highly accurate classifier, eg for medical diagnosis or spam detection.
There are two major types of binary classification problems:
1. “X versus Y.” For instance, positive versus negative sentiment.
2. “X versus not-X.” For instance, spam versus non-spam.
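In “X versus not-X” problems the classes are often imbalanced, which is why raw accuracy alone can mislead. A hypothetical illustration (the data and degenerate classifier are invented for this sketch):

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical spam data: only 5% of messages are spam (label 1).
labels = [1] * 5 + [0] * 95
always_ham = [0] * 100  # degenerate classifier that never predicts spam
acc = accuracy(always_ham, labels)  # high accuracy despite catching no spam
```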
11. CROSS VALIDATION
Cross validation is a method for:
• evaluating and comparing learning algorithms
• estimating how a model will perform in the future
It works by dividing the data into two segments: one used to learn or train a model, and the other used to validate the model.
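The train/validate split behind k-fold cross validation can be sketched as follows (the function name and interface are assumptions for illustration):

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) pairs for k-fold CV.

    Each of the n examples lands in exactly one validation fold.
    """
    # Distribute examples as evenly as possible across the k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_splits(10, 5))  # five folds, two validation examples each
```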
12. HYPOTHESIS TESTING AND STATISTICAL SIGNIFICANCE
eg: in cross validation, compare 7% error against 6.9% error over 1000 examples. Is the difference meaningful, or could it be due to chance?
Statistical significance plays the same role in machine learning as it does in statistical hypothesis testing.
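The 7% versus 6.9% comparison can be checked with a rough two-proportion z-test. This is a simplified, unpaired sketch; a paired test on the same examples would be more powerful:

```python
def error_difference_z(err_a, err_b, n):
    """Approximate z-statistic for two error rates, each measured on n examples."""
    p = (err_a + err_b) / 2              # pooled error estimate
    se = (p * (1 - p) * (2 / n)) ** 0.5  # standard error of the difference
    return (err_a - err_b) / se

z = error_difference_z(0.07, 0.069, 1000)
# |z| is far below the 1.96 threshold, so the 0.1% gap is not
# statistically significant at the 95% level on 1000 examples.
```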
13. DEBUGGING LEARNING ALGORITHMS
• Learning algorithms are notoriously hard to debug, because:
• it is often unclear whether there is a bug at all, or
• the problem is simply too hard, or
• there is too much noise in the data
• Moreover, sometimes bugs lead to learning algorithms performing better than expected.
14. BIAS/VARIANCE TRADE-OFF
There is a trade-off between estimation error and approximation error.
Let f be the learned classifier, selected from a set F of all possible classifiers using a fixed representation, and let f* be the optimal classifier in F.
• estimation error measures how far the actual learned classifier f is from the optimal classifier f*
• approximation error measures the quality of the model family F itself
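One common way to write this decomposition (a sketch; here ε(·) denotes expected test error, notation not taken from the slides):

```latex
\underbrace{\varepsilon(f)}_{\text{total error}}
  \;=\;
\underbrace{\varepsilon(f) - \varepsilon(f^{*})}_{\text{estimation error}}
  \;+\;
\underbrace{\varepsilon(f^{*})}_{\text{approximation error}}
```

Shrinking F reduces estimation error but can increase approximation error, and vice versa; this is the bias/variance trade-off.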