Lecture 7 - IBk
1. Introduction to Machine Learning
Lecture 7
Instance Based Learning
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lecture 6
LET’S START WITH DATA CLASSIFICATION
3. Recap of Lecture 6
[Diagram: Data Set → Classification Model. How?]
We are going to deal with:
• Data described by nominal and continuous attributes
• Data that may have instances with missing values
4. Recap of Lecture 6
We want to build decision trees
How can I automatically generate these types of trees?
Decide which attribute we should put in each node
Decide a split point
Rely on information theory
We also saw many other improvements
5. Today’s Agenda
Classification without building a model
K-Nearest Neighbor (kNN)
Effect of k
Distance functions
Variants of kNN
Strengths and weaknesses
6. Classification without Building a Model
Forget about a global model!
Simply store all the training examples
Build a local model for each new test instance
Referred to as lazy learners
Some approaches to instance-based learning (IBL):
Nearest neighbors
Locally weighted regression
Case-based reasoning
7. k-Nearest Neighbors
Algorithm:
Store all the training data
Given a new test instance:
Recover the k nearest neighbors of the test instance
Predict the majority class among the neighbors
Voronoi cells: the feature space is decomposed into several cells (e.g., for k=1)
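The procedure above fits in a few lines of code. A minimal sketch in Python with NumPy, assuming continuous attributes and Euclidean distance; the function name knn_predict is ours, not from the lecture:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the class of x_test by majority vote among its k nearest
    training instances, using Euclidean distance."""
    # Distance from the test instance to every stored training instance
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training instances
    nearest = np.argsort(dists)[:k]
    # Majority class among the neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny illustration: the test point lies next to the two class-0 instances
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=1))  # -> 0
```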
8. k-Nearest Neighbors
But, where is the learning process?
Is selecting the k neighbors and returning the majority class learning?
No, that’s just retrieving
But still, some important issues
Which k should I use?
Which distance functions should I use?
Should I maintain all instances of the training data set?
9. Which k Should I Use?
The effect of k
[Figure: decision boundaries with 15-NN vs. 1-NN]
Do you remember the discussion about overfitting in C4.5?
Apply the same concepts here!
10. Which k Should I Use?
Some experimental results on the use of different k
[Figure: error vs. number of neighbors, with 7-NN marked]
Notice that the test error decreases as k increases, but at k ≈ 5-7 it starts increasing again
Rule of thumb: k=3, k=5, and k=7 seem to work well in the majority of problems
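One way to operationalize this rule of thumb is to choose k by error on a held-out validation set, mirroring the error curve discussed above. A sketch reusing the knn_predict function from the slide 7 example; the name select_k and the candidate values are illustrative:

```python
import numpy as np

def select_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9, 15)):
    """Return the k with the lowest error on a held-out validation set."""
    def val_error(k):
        preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_val])
        return np.mean(preds != y_val)
    return min(candidates, key=val_error)
```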
11. Distance Functions
Distance functions must be able to handle:
Nominal attributes
Continuous attributes
Missing values
The key:
They must return a low value for similar objects and a high value for different objects
Seems obvious, right? But still, it is domain dependent
There are many of them. Let’s see some of the most used
12. Distance Functions
Distance between two points in the same space: d(x, y)
Some properties expected to be satisfied in general:
d(x, y) ≥ 0 and d(x, x) = 0 (non-negativity)
d(x, y) = d(y, x) (symmetry)
d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
13. Distances for Continuous Variables
Given x = (x_1, …, x_n)' and y = (y_1, …, y_n)':
Euclidean: $d_E(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$
Minkowski: $d_M(x, y) = \left[ \sum_{i=1}^{n} |x_i - y_i|^q \right]^{1/q}$
Absolute value (Manhattan): $d_{ABS}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
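These three definitions translate directly into code. A minimal NumPy sketch (function names are ours):

```python
import numpy as np

def euclidean(x, y):
    # Minkowski with q = 2
    return np.sum((x - y) ** 2) ** 0.5

def minkowski(x, y, q):
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

def absolute(x, y):
    # a.k.a. Manhattan distance; Minkowski with q = 1
    return np.sum(np.abs(x - y))
```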
14. Distances for Continuous Variables
What if attributes are measured over different scales?
Attribute 1 ranging in [0,1]
Attribute 2 ranging in [0,1000]
Can you detect any potential problem in the aforementioned distance functions?
[Figure: scatter plots with x in [0,1] vs. y in [0,1000], and x in [0,1000] vs. y in [0,1000]]
15. Distances for Continuous Variables
The larger the scale, the larger the influence of the attribute in the distance function
Solution: normalize each attribute
How:
Normalization by means of the range: $d_a^{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{\max_a - \min_a}$
Normalization by means of the standard deviation: $d_a^{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{4\sigma_a}$
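Equivalently, one can rescale the attributes once before computing distances. A sketch of both normalizations, assuming each attribute is a column of X and is non-constant:

```python
import numpy as np

def normalize_range(X):
    """Rescale each attribute (column) by its range, so every attribute
    contributes on a comparable [0, 1] scale."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins)

def normalize_std(X):
    """Rescale each attribute by 4 standard deviations, matching the
    4*sigma_a denominator on the slide."""
    return (X - X.mean(axis=0)) / (4.0 * X.std(axis=0))
```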
16. Distances for Nominal Attributes
Several metrics to deal with nominal attributes
Overlap distance function
Idea: two nominal attribute values are equal only if they are the same value; the distance is 0 if they match and 1 otherwise
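In code, the overlap distance for one attribute is a single comparison (a sketch; the function name is ours):

```python
def overlap(a, b):
    """Overlap distance for one nominal attribute:
    0 if the two values match, 1 otherwise."""
    return 0.0 if a == b else 1.0
```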
17. Distances for Nominal Attributes
Several metrics to deal with nominal attributes
Value difference metric (VDM): $vdm_a(x, y) = \sum_{c=1}^{C} | P(c \mid a = x) - P(c \mid a = y) |^q$
where C is the number of classes and $P(c \mid a = ex_i^a)$ is the conditional probability that the output class is c given that attribute a has the value $ex_i^a$
Idea: two nominal values are similar if they have similar correlations with the output classes
See Wilson & Martinez (1997) for more distance functions
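A sketch of VDM for a single attribute, estimating the conditional probabilities by counting over the training data; the function name, argument layout, and the default q = 2 are our choices:

```python
from collections import Counter

def vdm(attr_values, class_labels, x, y, q=2):
    """Value Difference Metric for one nominal attribute.

    attr_values:  this attribute's value in each training instance
    class_labels: the class of each training instance
    x, y:         the two attribute values being compared
    Assumes both x and y occur at least once in attr_values.
    """
    classes = set(class_labels)

    def cond_probs(v):
        # Estimate P(c | attribute = v) by counting over the training data
        labels = [c for av, c in zip(attr_values, class_labels) if av == v]
        counts = Counter(labels)
        return {c: counts[c] / len(labels) for c in classes}

    px, py = cond_probs(x), cond_probs(y)
    return sum(abs(px[c] - py[c]) ** q for c in classes)
```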
18. Distances for Heterogeneous Attributes
What if my data set is described by both nominal and continuous attributes?
Combine both within the same distance function:
Use nominal distance functions for the nominal attributes
Use continuous distance functions for the continuous attributes
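A sketch in this spirit, close to the heterogeneous metrics of Wilson & Martinez: overlap for nominal attributes and a range-normalized absolute difference for continuous ones. The nominal_mask and ranges arguments are our own way of passing in attribute types and scales:

```python
def heterogeneous_distance(x, y, nominal_mask, ranges):
    """Sum per-attribute distances: overlap for nominal attributes,
    range-normalized absolute difference for continuous ones."""
    total = 0.0
    for xi, yi, is_nominal, r in zip(x, y, nominal_mask, ranges):
        if is_nominal:
            total += 0.0 if xi == yi else 1.0   # overlap distance
        else:
            total += abs(xi - yi) / r           # range-normalized difference
    return total
```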
19. Variants of kNN
Different variants of kNN
Distance-weighted kNN
Attribute-weighted kNN
20. Distance-Weighted kNN
Inference in the original kNN:
The k nearest neighbors vote for the class
Shouldn’t the closest examples have a higher influence in the decision process?
Weight the contribution of each of the k neighbors with respect to their distance, where $w_i = \frac{1}{d(x_q, x_i)^2}$
For classification: $\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$
For regression: $\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
More robust to noisy instances and outliers
E.g.: Shepard’s method (Shepard, 1968)
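A sketch of the distance-weighted vote in the setting of the earlier knn_predict example; if a neighbor coincides exactly with the query, its class is returned directly to avoid dividing by zero:

```python
import numpy as np

def dw_knn_predict(X_train, y_train, x_test, k=5):
    """Distance-weighted kNN: each neighbor votes with weight 1 / d^2."""
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        if dists[i] == 0.0:            # exact match: return its class directly
            return y_train[i]
        w = 1.0 / dists[i] ** 2        # closer neighbors weigh more
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)
```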
21. Attribute-weighted kNN
What if some attributes are irrelevant or misleading?
If irrelevant: cost increases, but accuracy is not affected
If misleading: cost increases and accuracy may decrease
Weight attributes:
$d_w(x, y) = \sum_{i=1}^{n} w_i (x_i - y_i)^2$
How to determine the weights?
Option 1: The expert provides us with the weights
Option 2: Use a machine learning approach
More will be said in the next lecture!
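The weighted distance is a one-line change to the plain Euclidean computation (a sketch; a weight of zero removes the attribute entirely):

```python
import numpy as np

def weighted_distance(x, y, w):
    """Attribute-weighted (squared) Euclidean distance; a weight near zero
    suppresses an irrelevant or misleading attribute."""
    return np.sum(w * (x - y) ** 2)
```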
22. Strengths and Weaknesses
Strengths of kNN
Builds a new local model for each test instance
Learning (training) has no cost
Empirical results show that the method is highly accurate compared with other machine learning techniques
Weaknesses
It retrieves rather than learns
No global model; the knowledge is not legible
Test cost increases linearly with the number of training instances
No generalization
Curse of dimensionality: what happens if we have many attributes?
Noise and outliers may have a very negative effect
23. Next Class
From instance-based to case-based reasoning
A little bit more on learning
Distance functions
Prototype selection
24. Introduction to Machine Learning
Lecture 7
Instance Based Learning
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull