Lecture 7 – IBk (Instance-Based Learning)



  1. Introduction to Machine Learning, Lecture 7: Instance-Based Learning. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.
  2. Recap of Lecture 6: let's start with data classification.
  3. Recap of Lecture 6. From a data set to a classification model. How? We are going to deal with: data described by nominal and continuous attributes, and data that may have instances with missing values.
  4. Recap of Lecture 6. We want to build decision trees. How can we automatically generate these types of trees? Decide which attribute to put in each node, decide a split point, and rely on information theory. We also saw many other improvements.
  5. Today's Agenda: classification without building a model; k-nearest neighbor (kNN); the effect of k; distance functions; variants of kNN; strengths and weaknesses.
  6. Classification without Building a Model. Forget about a global model! Simply store all the training examples and build a local model for each new test instance. Such methods are referred to as lazy learners. Some approaches to instance-based learning (IBL): nearest neighbors, locally weighted regression, and case-based reasoning.
  7. k-Nearest Neighbors. Algorithm: store all the training data; given a new test instance, recover the k nearest neighbors of the test instance and predict the majority class among those neighbors. Voronoi cells: the feature space is decomposed into several cells, e.g. for k = 1.
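The algorithm on this slide can be sketched in a few lines. This is a minimal illustration, not a production implementation; the function names and the toy data are made up for the example, and Euclidean distance is assumed:

```python
from collections import Counter
import math

def euclidean(x, y):
    # Straight-line distance between two numeric vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training examples.
    `train` is a list of (vector, label) pairs."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], x_test))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two small clusters
train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"), ([0.9, 1.0], "B"), ([1.0, 0.8], "B")]
print(knn_predict(train, [0.2, 0.1], k=3))  # prints "A"
```

Note that no model is built up front: all the work happens at prediction time, which is exactly what makes the method lazy.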
  8. k-Nearest Neighbors. But where is the learning process? Selecting the k neighbors and returning the majority class: is that learning? No, that's just retrieving. Still, there are some important issues: which k should I use? Which distance function should I use? Should I maintain all instances of the training data set?
  9. Which k Should I Use? The effect of k: compare 15-NN against 1-NN. Do you remember the discussion about overfitting in C4.5? Apply the same concepts here!
 10. Which k Should I Use? Some experimental results on the use of different k (e.g. 7-NN, test error plotted against the number of neighbors). Notice that the test error decreases as k increases, but at k ≈ 5–7 it starts increasing again. Rule of thumb: k = 3, k = 5, and k = 7 seem to work well in the majority of problems.
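In practice, k is usually chosen empirically, as the experiments on this slide suggest. A minimal sketch of that idea, assuming a held-out validation set and Euclidean distance (the helper names and data are invented for the example):

```python
from collections import Counter
import math

def knn_predict(train, x, k):
    # Majority vote among the k nearest (Euclidean) training examples
    near = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(lbl for _, lbl in near).most_common(1)[0][0]

def choose_k(train, val, candidates=(1, 3, 5, 7)):
    # Pick the k with the highest accuracy on a held-out validation set
    def acc(k):
        return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)
    return max(candidates, key=acc)

# A noisy "B" sits inside the "A" cluster, so 1-NN overfits but 3-NN does not
train = [([0, 0], "A"), ([0, 1], "A"), ([1, 0], "A"),
         ([0.4, 0.4], "B"),
         ([5, 5], "B"), ([5, 6], "B"), ([6, 5], "B")]
val = [([0.3, 0.3], "A"), ([5.5, 5.5], "B")]
print(choose_k(train, val))  # prints 3
```

This mirrors the overfitting discussion: k = 1 memorizes the noisy instance, while a slightly larger k smooths it out.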
 11. Distance Functions. Distance functions must be able to handle nominal attributes, continuous attributes, and missing values. The key: they must return a low value for similar objects and a high value for different objects. Seems obvious, right? But still, it is domain dependent. There are many of them; let's see some of the most used.
 12. Distance Functions. A distance between two points in the same space, d(x, y), is in general expected to satisfy: d(x, y) ≥ 0 and d(x, x) = 0; d(x, y) = d(y, x) (symmetry); d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality).
 13. Distances for Continuous Variables. Given x = (x1, …, xn)' and y = (y1, …, yn)':
     Euclidean: d_E(x, y) = [Σ_{i=1..n} (x_i − y_i)^2]^(1/2)
     Minkowski: d_M(x, y) = [Σ_{i=1..n} |x_i − y_i|^q]^(1/q)
     Absolute value: d_ABS(x, y) = Σ_{i=1..n} |x_i − y_i|
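The three distances above are all special cases of the Minkowski family, so one function covers them. A minimal sketch (the function name is invented for the example):

```python
def minkowski(x, y, q=2):
    # Minkowski distance of order q:
    #   q = 2 gives the Euclidean distance,
    #   q = 1 gives the absolute-value (Manhattan) distance.
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

print(minkowski([0, 0], [3, 4], q=2))  # 5.0 (the classic 3-4-5 triangle)
print(minkowski([0, 0], [3, 4], q=1))  # 7.0
```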
 14. Distances for Continuous Variables. What if attributes are measured over different scales? Attribute 1 ranging in [0, 1], attribute 2 ranging in [0, 1000]. Can you detect any potential problem in the aforementioned distance functions? Compare the contribution of an attribute in [0, 1] against one in [0, 1000].
 15. Distances for Continuous Variables. The larger the scale, the larger the influence of the attribute in the distance function. Solution: normalize each attribute. How?
     Normalization by the range: d_a^norm(ex_1, ex_2) = d_a(ex_1^a, ex_2^a) / (max_a − min_a)
     Normalization by the standard deviation: d_a^norm(ex_1, ex_2) = d_a(ex_1^a, ex_2^a) / (4σ_a)
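Range normalization is equivalent to rescaling every attribute to [0, 1] before computing distances. A minimal sketch of that preprocessing step (the function name and data are invented for the example):

```python
def min_max_scale(rows):
    # Rescale every attribute to [0, 1] using its observed min and max,
    # so that no attribute dominates the distance just because of its scale.
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

# Attribute 1 ranges in [0, 1]; attribute 2 in [0, 1000]
data = [[0.0, 0.0], [0.5, 500.0], [1.0, 1000.0]]
print(min_max_scale(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

After scaling, both attributes contribute comparably to any of the distances from the previous slide.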
 16. Distances for Nominal Attributes. Several metrics deal with nominal attributes. Overlap distance function. Idea: two nominal values are equal only if they are the same value (distance 0 if equal, 1 otherwise).
 17. Distances for Nominal Attributes. Value difference metric (VDM). C = number of classes; P(c | a = ex_i^a) = conditional probability that the output class is c given that attribute a takes the value ex_i^a. Idea: two nominal values are similar if they have similar correlations with the output classes. See (Wilson & Martinez) for more distance functions.
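The VDM idea can be sketched directly from its definition: estimate P(c | v) from the training data for each value v of the attribute, then compare those class distributions. A minimal sketch for a single nominal attribute, assuming the common form Σ_c |P(c|v1) − P(c|v2)|^q with q = 2 (function name and data invented for the example):

```python
from collections import Counter, defaultdict

def vdm(column, labels, v1, v2, q=2):
    """Value Difference Metric between two values of one nominal attribute.
    P(c | v) is estimated from the data as N(v, c) / N(v)."""
    by_value = defaultdict(Counter)
    for v, c in zip(column, labels):
        by_value[v][c] += 1
    total1 = sum(by_value[v1].values())
    total2 = sum(by_value[v2].values())
    return sum(abs(by_value[v1][c] / total1 - by_value[v2][c] / total2) ** q
               for c in set(labels))

# "red" always predicts "y"; "blue" and "green" are 50/50, so they are close
column = ["red", "red", "blue", "blue", "green", "green"]
labels = ["y",   "y",   "y",    "n",    "y",     "n"]
print(vdm(column, labels, "red", "blue"))    # 0.5
print(vdm(column, labels, "blue", "green"))  # 0.0
```

Notice how "blue" and "green" come out as identical even though they are different symbols: what matters is their correlation with the class, exactly the idea on the slide.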
 18. Distances for Heterogeneous Attributes. What if my data set is described by both nominal and continuous attributes? Apply the same distance function: use nominal distance functions for the nominal attributes and continuous distance functions for the continuous attributes.
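Mixing per-attribute distances as the slide describes can be sketched as follows, assuming overlap distance for nominal attributes and absolute difference for (pre-normalized) continuous ones; the function names are invented for the example:

```python
import math

def overlap(a, b):
    # Overlap distance for nominal values: 0 if equal, 1 otherwise
    return 0.0 if a == b else 1.0

def heterogeneous_distance(x, y, nominal):
    """Per-attribute mix: overlap for attributes whose index is in `nominal`,
    absolute difference for the rest, combined Euclidean-style."""
    parts = [overlap(a, b) if i in nominal else abs(a - b)
             for i, (a, b) in enumerate(zip(x, y))]
    return math.sqrt(sum(p * p for p in parts))

# Attribute 0 is nominal (a color), attribute 1 is continuous
print(heterogeneous_distance(["red", 0.2], ["blue", 0.2], nominal={0}))  # 1.0
```

This is essentially the shape of heterogeneous metrics such as HEOM in the Wilson & Martinez survey cited on the previous slide.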
 19. Variants of kNN: distance-weighted kNN and attribute-weighted kNN.
 20. Distance-Weighted kNN. In the inference of the original kNN, the k nearest neighbors vote for the class. Shouldn't the closest examples have a higher influence on the decision process? Weight the contribution of each of the k neighbors with respect to its distance. For classification:
     f̂(x_q) = argmax_{v ∈ V} Σ_{i=1..k} w_i δ(v, f(x_i))
     For regression:
     f̂(x_q) = Σ_{i=1..k} w_i f(x_i) / Σ_{i=1..k} w_i
     where w_i = 1 / d(x_q, x_i)^2. This is more robust to noisy instances and outliers. E.g., Shepard's method (Shepard, 1968).
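The classification rule above can be sketched directly: each neighbor votes with weight 1/d², and an exact match decides outright (since its weight would be infinite). Function names and data are invented for the example:

```python
from collections import defaultdict
import math

def weighted_knn_predict(train, x, k=3):
    """Distance-weighted kNN: each of the k nearest neighbors votes
    for its class with weight w_i = 1 / d(x, x_i)^2."""
    near = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = defaultdict(float)
    for xi, label in near:
        d = math.dist(xi, x)
        if d == 0:
            return label  # exact match: infinite weight, decide immediately
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)

# Two "B" neighbors outvote one "A" in plain kNN, but the "A" is much closer
train = [([0.0, 0.0], "A"), ([1.0, 1.0], "B"), ([1.1, 1.1], "B")]
print(weighted_knn_predict(train, [0.1, 0.1], k=3))  # prints "A"
```

With unweighted majority voting the same query would be labeled "B" (two votes to one), which shows the robustness the slide claims.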
 21. Attribute-Weighted kNN. What if some attributes are irrelevant or misleading? If irrelevant, the cost increases but accuracy is not affected; if misleading, the cost increases and accuracy may decrease. Weight the attributes:
     d_w(x, y) = [Σ_{i=1..n} w_i (x_i − y_i)^2]^(1/2)
     How to determine the weights? Option 1: an expert provides us with the weights. Option 2: use a machine learning approach. More will be said in the next lecture!
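The weighted distance above is a one-liner; setting a weight to zero removes an irrelevant attribute from the computation entirely. A minimal sketch (the function name is invented for the example):

```python
import math

def weighted_distance(x, y, w):
    # Euclidean distance where attribute i contributes with weight w[i]
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

print(weighted_distance([0, 0], [3, 4], [1, 1]))  # 5.0: ordinary Euclidean
print(weighted_distance([0, 0], [3, 4], [1, 0]))  # 3.0: second attribute zeroed out
```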
 22. Strengths and Weaknesses. Strengths of kNN: a new local model is built for each test instance; learning itself has no cost; empirical results show that the method is highly accurate with respect to other machine learning techniques. Weaknesses: it is a retrieval approach that does not really learn; there is no global model, so the knowledge is not legible; test cost increases linearly with the number of training instances; no generalization; curse of dimensionality (what happens if we have many attributes?); noise and outliers may have a very negative effect.
 23. Next Class. From instance-based to case-based reasoning: a little bit more on learning, distance functions, and prototype selection.
 24. (Closing slide.) Introduction to Machine Learning, Lecture 7: Instance-Based Learning. Albert Orriols i Puig (aorriols@salle.url.edu). Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull.