David Corne, and Nick Taylor, Heriot-Watt University - dwcorne@gmail.com
These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
Data Mining
(and machine learning)
DM Lecture 8: Feature Selection
Today
Feature Selection:
–What
–Why
–How
Feature Selection: What
You have some data, and you want to use it to build a classifier, so that you can predict something (e.g. likelihood of cancer).
The data has 10,000 fields (features).
You need to cut it down to 1,000 fields before you try machine learning. Which 1,000?
The process of choosing the 1,000 fields to use is called Feature Selection.
Datasets with many features
Gene expression datasets (~10,000 features)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gds
Proteomics data (~20,000 features)
http://www.ebi.ac.uk/pride/
Feature Selection: Why?
[Chart: "The accuracy of all test Web URLs when changing the number of top words for category file" – x-axis: number of top words per category file (top10 … top200); y-axis: accuracy, ranging from 74% to 90%.]
Feature Selection: Why?
From http://elpub.scix.net/data/works/att/02-28.content.pdf
It is easy to find many more cases in the literature where experiments show that accuracy falls when more features are used.
• Why does accuracy reduce with more
features?
• How does it depend on the specific choice
of features?
• What else changes if we use more features?
• So, how do we choose the right features?
Why accuracy reduces:
• Note: suppose the best feature set has 20
features. If you add another 5 features,
typically the accuracy of machine learning
may reduce. But you still have the original
20 features!! Why does this happen???
Noise / Explosion
• The additional features typically add noise. Machine learning will pick up on spurious correlations that may hold in the training set but not in the test set. (A small demonstration follows below.)
• For some ML methods, more features means more parameters to learn (more NN weights, more decision tree nodes, etc.) – the increased space of possibilities is more difficult to search.
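To make the first point concrete, here is a small hypothetical experiment (not from the slides; the data, parameters and split are illustrative, and it assumes numpy and scikit-learn are available): a 3-NN classifier is trained on a few genuinely informative features, then on the same features plus increasing numbers of pure-noise features.

```python
# Illustrative only: adding pure-noise features on top of a few informative
# ones tends to drag down a simple k-NN classifier's test accuracy.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 400
informative = rng.normal(size=(n, 5))              # 5 genuinely useful features
y = (informative.sum(axis=1) > 0).astype(int)      # class depends only on these

for n_noise in (0, 20, 100, 500):
    noise = rng.normal(size=(n, n_noise))          # irrelevant 'noise' features
    X = np.hstack([informative, noise])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    acc = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{n_noise:4d} noise features -> test accuracy {acc:.2f}")
```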
Feature selection methods
A big research area! The slide shows a taxonomy diagram from Dash & Liu (1997); we'll look briefly at parts of it.
Correlation-based feature ranking
This is what you used in your assignments, especially CW 2.
It is indeed used often by practitioners (who perhaps don't understand the issues involved in FS).
It is actually fine for certain datasets.
It is not even considered in Dash & Liu's survey. Why? (A minimal sketch of this kind of ranking follows below; the made-up dataset after it suggests the answer.)
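For concreteness, here is a minimal sketch of correlation-based ranking (my own illustration, not the coursework code; it assumes numpy): each feature is scored by the absolute Pearson correlation between its column and the class labels, and the top k are kept.

```python
# Correlation-based feature ranking: score each feature independently by
# |Pearson correlation with the class| and keep the k best. (Sketch only.)
import numpy as np

def rank_by_correlation(X, y, k):
    """Return indices of the k features most correlated with the class, plus all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

# Toy usage with the made-up dataset shown below (features f1..f4, classes 1/2):
X = np.array([[0.4, 0.6, 0.4,  0.6],
              [0.2, 0.4, 1.6, -0.6],
              [0.5, 0.7, 1.8, -0.8],
              [0.7, 0.8, 0.2,  0.9],
              [0.9, 0.8, 1.8, -0.7],
              [0.5, 0.5, 0.6,  0.5]])
y = np.array([1, 1, 1, 2, 2, 2])
best, scores = rank_by_correlation(X, y, k=2)
print("per-feature |correlation| with class:", np.round(scores, 2))
print("top-2 features (0-indexed):", best)
```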
A made-up dataset
f1 f2 f3 f4 … class
0.4 0.6 0.4 0.6 1
0.2 0.4 1.6 -0.6 1
0.5 0.7 1.8 -0.8 1
0.7 0.8 0.2 0.9 2
0.9 0.8 1.8 -0.7 2
0.5 0.5 0.6 0.5 2
In this dataset, f1 and f2 are correlated with the class, while f3 and f4 appear uncorrelated with the class / seemingly random.
Correlation-based FS reduces the dataset to this:
f1 f2 … class
0.4 0.6 1
0.2 0.4 1
0.5 0.7 1
0.7 0.8 2
0.9 0.8 2
0.5 0.5 2
But the fifth column below shows f3 + f4 – which is perfectly correlated with the class!
f1 f2 f3 f4 f3+f4 class
0.4 0.6 0.4 0.6 1 1
0.2 0.4 1.6 -0.6 1 1
0.5 0.7 1.8 -0.8 1 1
0.7 0.8 0.2 0.9 1.1 2
0.9 0.8 1.8 -0.7 1.1 2
0.5 0.5 0.6 0.5 1.1 2
(A quick numerical check follows.)
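A quick check of this point using the made-up values above (a sketch assuming numpy): individually, f3 and f4 have only weak correlation with the class, but their sum is perfectly correlated with it.

```python
# f3 and f4 look unimportant on their own, but f3 + f4 predicts the class exactly.
import numpy as np

f3  = np.array([0.4, 1.6, 1.8, 0.2, 1.8, 0.6])
f4  = np.array([0.6, -0.6, -0.8, 0.9, -0.7, 0.5])
cls = np.array([1, 1, 1, 2, 2, 2])

print("corr(f3, class)    =", round(np.corrcoef(f3, cls)[0, 1], 2))
print("corr(f4, class)    =", round(np.corrcoef(f4, cls)[0, 1], 2))
print("corr(f3+f4, class) =", round(np.corrcoef(f3 + f4, cls)[0, 1], 2))
print("f3 + f4 per row    =", f3 + f4)   # 1.0 for class 1, 1.1 for class 2
```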
Good FS methods therefore:
• Need to consider how well features work together
• As we have noted before, if you take 100 features that are each well correlated with the class, they may simply be correlated strongly with each other, and so provide no more information than just one of them (a small demonstration of this follows below)
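To illustrate the redundancy point with a hypothetical example (not from the slides; it assumes numpy and scikit-learn): two features that are near-copies of each other both correlate well with the class, yet using both gives essentially no benefit over using one.

```python
# Two near-duplicate features: each correlates well with the class,
# but together they carry essentially the same information as one alone.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
n = 300
f1 = rng.normal(size=n)
f2 = f1 + rng.normal(scale=0.01, size=n)       # almost an exact copy of f1
y = (f1 > 0).astype(int)

print("corr(f1, class):", round(np.corrcoef(f1, y)[0, 1], 2))
print("corr(f2, class):", round(np.corrcoef(f2, y)[0, 1], 2))
print("corr(f1, f2)   :", round(np.corrcoef(f1, f2)[0, 1], 2))

for name, X in [("f1 only ", f1.reshape(-1, 1)),
                ("f1 and f2", np.column_stack([f1, f2]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    acc = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy {acc:.2f}")
```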
`Complete' methods
Original dataset has N features.
You want to use a subset of k features.
A complete FS method means: try every subset of k features, and choose the best!
The number of subsets is N! / (k! (N−k)!)
What is this when N is 100 and k is 5?
75,287,520 – almost nothing.
And what is it when N is 10,000 and k is 100?
5,000,000,000,000,000,000,000,000,000,000, … – the original deck carries on printing zeros for another 114 slides to dramatise how big this is (far more zeros, in fact, than the number really has: 10,000! / (100! × 9,900!) is roughly 6.5 × 10^241, which is still vastly more than the ~10^80 atoms in the observable universe).
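These counts are just binomial coefficients, so they are easy to check in a couple of lines (a sketch assuming Python 3.8+ for math.comb):

```python
# C(N, k) = N! / (k! (N-k)!): the number of distinct k-feature subsets.
from math import comb

print(comb(100, 5))       # 75287520 -- small enough to enumerate
big = comb(10_000, 100)
print(len(str(big)))      # about 242 digits, i.e. far beyond 10**80
```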
Can you see a problem with
complete methods?
`forward' methods
These methods `grow' a set S of features:
1. S starts empty.
2. Find the best feature to add (by checking which one gives best performance on a test set when combined with S).
3. If overall performance has improved, return to step 2; else stop.
(A minimal sketch appears below.)
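Here is a minimal sketch of greedy forward selection (my own illustration: the 3-NN hold-out scorer, split size and function names are assumptions, not prescribed by the slides; it assumes numpy and scikit-learn).

```python
# Greedy forward feature selection (sketch). Assumes numpy + scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def score(X, y, features):
    """Hold-out accuracy of 3-NN using only the given feature indices."""
    X_tr, X_va, y_tr, y_va = train_test_split(X[:, features], y,
                                              test_size=0.3, random_state=0)
    return KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_va, y_va)

def forward_select(X, y):
    selected, best = [], 0.0
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            return selected
        # step 2: find the best single feature to add to S
        trials = {j: score(X, y, selected + [j]) for j in candidates}
        j_best = max(trials, key=trials.get)
        # step 3: keep it only if overall performance improved, else stop
        if trials[j_best] > best:
            selected.append(j_best)
            best = trials[j_best]
        else:
            return selected
```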
`backward' methods
These methods remove features one by one:
1. S starts with the full feature set.
2. Find the best feature to remove (by checking which removal from S gives best performance on a test set).
3. If overall performance has improved, return to step 2; else stop.
(Again, a minimal sketch follows.)
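And the mirror-image sketch for backward elimination, under the same assumptions (3-NN hold-out scorer, numpy and scikit-learn; names are illustrative).

```python
# Greedy backward elimination (sketch). Assumes numpy + scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def score(X, y, features):
    """Hold-out accuracy of 3-NN using only the given feature indices."""
    X_tr, X_va, y_tr, y_va = train_test_split(X[:, list(features)], y,
                                              test_size=0.3, random_state=0)
    return KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_va, y_va)

def backward_eliminate(X, y):
    selected = list(range(X.shape[1]))          # S starts with the full feature set
    best = score(X, y, selected)
    while len(selected) > 1:
        # step 2: find the single feature whose removal gives the best score
        trials = {j: score(X, y, [f for f in selected if f != j]) for j in selected}
        j_drop = max(trials, key=trials.get)
        # step 3: keep removing only while overall performance improves
        if trials[j_drop] > best:
            selected.remove(j_drop)
            best = trials[j_drop]
        else:
            return selected
    return selected
```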
• When might you choose forward instead of
backward?
The Relief method
An instance-based, heuristic method: it works out a weight value for each feature, based on how important that feature seems to be in discriminating between near neighbours.
The Relief method: a walkthrough
(The original slides illustrate each step on a 2-D scatter plot of instances.)
There are two features here – the x and the y co-ordinates. Initially they each have zero weight: wx = 0; wy = 0.
Choose an instance at random; call it R.
Find H (hit: the nearest instance to R of the same class) and M (miss: the nearest instance to R of a different class).
Now we update the weights based on the distances between R and H and between R and M. This happens one feature at a time.
To change wx, we add to it (MR − HR)/n, where HR and MR are the distances from R to H and from R to M measured along the x feature; so, the further away the `miss' is in the x direction, the higher the weight of x – the more important x is in terms of discriminating the classes.
To change wy, we add (MR − HR)/n again, but this time calculated in the y dimension; in the illustrated example the difference is clearly smaller – differences in this feature don't seem important in terms of class value.
Maybe now we have wx = 0.07, wy = 0.002.
Pick another instance at random and do the same again: identify H and M, and add the (MR − HR)/n differences, for each feature, again …
The Relief method
In the end, we have a weight value for each feature. The higher the value, the more relevant the feature.
We can use these weights for feature selection simply by choosing the features with the S highest weights (if we want to use S features).
NOTE
– It is important to use Relief only on min–max normalised data in [0,1]. It is fine if categorical attributes are involved, in which case use Hamming distance for those attributes.
– Why divide by n? Then the weight values can be interpreted as a difference of probabilities.
(A minimal code sketch follows.)
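Below is a minimal sketch of basic two-class Relief following the description above (it is not the original Kira & Rendell pseudocode; it assumes numeric features already min–max normalised to [0,1], uses absolute per-feature differences, and samples n random instances).

```python
# Basic Relief (two-class version), following the slides' description.
# Assumes X is numeric and already min-max normalised to [0, 1].
import numpy as np

def relief(X, y, n_samples=100, seed=None):
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_inst, n_feat = X.shape
    w = np.zeros(n_feat)
    for _ in range(n_samples):
        r = rng.integers(n_inst)                              # pick a random instance R
        dists = np.abs(X - X[r]).sum(axis=1)                  # distance from R to every instance
        dists[r] = np.inf                                     # exclude R itself
        hit = np.where(y == y[r], dists, np.inf).argmin()     # nearest same-class instance H
        miss = np.where(y != y[r], dists, np.inf).argmin()    # nearest other-class instance M
        # per-feature update (MR - HR)/n: big 'miss' differences raise the weight,
        # big 'hit' differences lower it
        w += (np.abs(X[r] - X[miss]) - np.abs(X[r] - X[hit])) / n_samples
    return w

# Toy usage: feature 0 determines the class, feature 1 is random noise.
rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(100, 2))
y_toy = (X_toy[:, 0] > 0.5).astype(int)
print(np.round(relief(X_toy, y_toy, n_samples=200, seed=1), 3))  # weight of feature 0 should dominate
```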
(The slides also show the Relief algorithm plucked directly from the original paper, Kira and Rendell, 1992.)
Random(ised) methods
aka Stochastic methods
Suppose you have 1,000 features.
There are 2^1000 possible subsets of features.
One way to try to find a good subset is to run a stochastic search algorithm, e.g. hillclimbing, simulated annealing, a genetic algorithm, particle swarm optimisation, …
One slide introduction to (most)
stochastic search algorithms
A search algorithm:
BEGIN:
1. Initialise a random population P of N candidate solutions (maybe just 1), e.g. each solution is a random subset of features. [GENERATE]
2. Evaluate each solution in P, e.g. accuracy of 3-NN using only the features in that solution. [TEST]
ITERATE:
1. Generate a set C of new solutions, using the good ones in P (e.g. choose a good one and mutate it, combine bits of two or more solutions, etc.). [GENERATE]
2. Evaluate each of the new solutions in C. [TEST]
3. Update P, e.g. by choosing the best N from all of P and C. [UPDATE]
4. If we have iterated a certain number of times, or accuracy is good enough, stop.
(A concrete hill-climbing sketch for feature selection follows below.)
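As one concrete instance of this generate/test/update loop, here is a simple stochastic hill-climber over feature subsets (an illustrative sketch: the single-bit-flip mutation, 3-NN hold-out scorer and iteration budget are my assumptions; it assumes numpy and scikit-learn). A population-based method such as a GA follows the same skeleton with more than one solution in P.

```python
# Stochastic hill-climbing over feature subsets (sketch).
# A subset is a boolean mask; mutation flips one random bit; the 'test' step
# is 3-NN accuracy on a hold-out split. Assumes numpy + scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate(mask, X, y):
    if not mask.any():
        return 0.0
    X_tr, X_va, y_tr, y_va = train_test_split(X[:, mask], y,
                                              test_size=0.3, random_state=0)
    return KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_va, y_va)

def hillclimb_fs(X, y, iters=500, seed=None):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    mask = rng.random(n_feat) < 0.5            # GENERATE an initial random subset
    best = evaluate(mask, X, y)                # TEST it
    for _ in range(iters):
        child = mask.copy()
        child[rng.integers(n_feat)] ^= True    # GENERATE: flip one feature in/out
        s = evaluate(child, X, y)              # TEST the new subset
        if s >= best:                          # UPDATE: keep the better (or equal) subset
            mask, best = child, s
    return mask, best
```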
Why randomised/search methods
are good for FS
Usually you have a large number of features (e.g. 1,000).
You can give each feature a score (e.g. correlation with the target, Relief weight, etc.) and choose the best-scoring features. This is very fast.
However, this does not evaluate how well features work with other features. You could give combinations of features a score, but there are too many combinations of multiple features to score them all.
Search algorithms are the only suitable approach that gets to grips with evaluating combinations of features.
Some recommended reading, if you
are interested, is on the website
Optional: Coursework E
• Give me ~150 words describing how you would improve this module: what aspects you would maintain, and what aspects you would change. In order to pass, you must include at least three distinct criticisms.