Lecture4 - Machine Learning



Published in: Education


  1. Introduction to Machine Learning: Lecture 4
     Slides based on Francisco Herrera's course on Data Mining
     Albert Orriols i Puig (aorriols@salle.url.edu)
     Artificial Intelligence – Machine Learning
     Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  2. Recap of Lecture 3
     Typically, ML techniques have been divided into different paradigms:
       • Inductive learning
       • Explanation-based learning
       • Analogy-based learning
       • Evolutionary learning
       • Connectionist learning
  3. Recap of Lecture 3
     Problems that we'll study:
       1. Data classification: C4.5, kNN, Naïve Bayes, …
       2. Statistical learning: SVM
       3. Association analysis: A-priori
       4. Link mining: PageRank
       5. Clustering: k-means
       6. Reinforcement learning: Q-learning, XCS
       7. Regression
       8. Genetic Fuzzy Systems
  4. Today's Agenda
       • Situation: Where Are We?
       • Classification
       • Prediction
       • Clustering
       • Association
       • Data Mining Systems
  5. Situation: Where Are We?
     The input consists of examples described by different characteristics.
  6. Situation: Where Are We?
     What can we do with a bunch of examples? It depends on the type of examples we have:
       • Classification: find the class to which a new instance belongs.
         E.g.: find whether a new patient has cancer or not.
       • Numeric prediction: a variation of classification in which the output consists of numeric values.
         E.g.: find the frequency of cancerous cells found.
       • Regression: find a function that fits your examples.
         E.g.: find a function that controls your chain process.
       • Association: find associations among your problem attributes or variables.
         E.g.: find relations such as "a patient with high blood pressure is more likely to have a heart attack".
       • Clustering: group the instances into classes.
         E.g.: group clients whose purchases are similar.
  7. Data Classification
     [Diagram; labels: Dataset, Training set, Test set, Learner, Knowledge extraction, Information based on experience, Model, New instance, Predicted output]
  8. Example of Data Classification
     [Diagram; labels: Data Set, Classification Model]
     How? The classification model can be implemented in several ways:
       • Rules
       • Decision trees
       • Mathematical formulae
  9. Classification as a Two-Step Process
     Model usage: to classify future or unknown objects
       • Estimate the accuracy of the model:
           • The known label of each test sample is compared with the label predicted by the system.
           • The accuracy rate is the proportion of test examples that are correctly classified by the model.
           • The test set is independent of the training set.
       • If the experts think that the model is acceptable, use the model to predict unknown examples.
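The accuracy estimate described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy_rate` and the toy labels are assumptions made here, not part of the lecture.

```python
def accuracy_rate(true_labels, predicted_labels):
    """Proportion of test examples whose predicted label equals the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical test set: known labels vs. labels predicted by some model.
known = ["cancer", "healthy", "healthy", "cancer", "healthy"]
predicted = ["cancer", "healthy", "cancer", "cancer", "healthy"]
print(accuracy_rate(known, predicted))  # 4 of 5 correct -> 0.8
```

The test labels must come from examples that were not used for training; otherwise the estimate is optimistic.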
  10. Going to Real World
      Definition: given a collection of annotated data (in this case, katydids and grasshoppers), decide what type of insect the following one is.
      [Images; labels: katydids, grasshoppers]
  11. Going to Real World
      How can I put a katydid or a grasshopper into my computer?
  12. Going to Real World
      Thus, the classification problem has been reduced to:

      Insect ID   Abdomen Length   Antennae Length   Insect Class
      1           2.7              5.5               Grasshopper
      2           8.0              9.1               Katydid
      3           0.9              4.7               Grasshopper
      4           1.1              3.1               Grasshopper
      5           5.4              8.5               Katydid
      6           2.9              1.9               Grasshopper
      7           6.1              6.6               Katydid
      8           0.5              1.0               Grasshopper
      9           8.3              6.6               Katydid
      10          8.1              4.7               Katydid

      What about a new observation with abdomen length 5.1 and antennae length 7?
  13. Going to Real World
      Actually, we could write that [figure]
      How do I classify this domain?
  14. How to Create Classification Models
      We will study some of these methods:
        • The decision tree C4.5
        • The instance-based classifier kNN
        • The probabilistic classifier Naïve Bayes
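As a small taste of the instance-based approach, here is a minimal k-nearest-neighbour sketch over the ten insects from slide 12, applied to the new observation (5.1, 7). The choice of k = 3, the Euclidean distance on unscaled features, and the function name `knn_classify` are assumptions made for illustration. The label of insect 5 is read as "Katydid".

```python
import math
from collections import Counter

# Training data from slide 12: (abdomen length, antennae length, class).
INSECTS = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def knn_classify(query, training, k=3):
    """Majority vote among the k training instances closest to `query`."""
    # Sort by Euclidean distance between the query and the two numeric features.
    by_distance = sorted(training, key=lambda row: math.dist(query, row[:2]))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify((5.1, 7.0), INSECTS, k=3))  # -> Katydid
```

With k = 3 the nearest neighbours are insects 7, 5 and 1 (two katydids and one grasshopper), so the vote goes to Katydid.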
  15. Regression or Prediction
      Prediction vs. data classification
        • Similarities: both learn from a data set.
        • Difference:
            • In classification, each example has an associated class.
            • In prediction, each example has an associated numerical value.
  16. How to Extract a Model?
      Prediction works analogously to data classification:
        • Use an algorithm to build a model.
        • Use this model to predict the new unknown example.
      Types of regression:
        • Linear and multiple regression
        • Non-linear regression
      Two of the most-used approaches to regression:
        • Neural networks
        • Fuzzy rule-based systems
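The build-then-predict workflow can be illustrated with the simplest case, linear regression on one variable. The sketch below uses ordinary least squares. The function name `fit_line` and the data points are hypothetical, chosen so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 1: build the model from (hypothetical) training examples.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)

# Step 2: use the model to predict the value of a new, unknown example.
print(slope * 4.0 + intercept)  # -> 9.0
```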
  17. Clustering
      The clustering problem
        • Given a database D = {t1, t2, …, tn} of transactions and an integer value k, the clustering problem is to define a mapping f: D → {1, …, k} where each ti is assigned to one cluster Kj, 1 ≤ j ≤ k.
      Main difference with classification
        • In classification, each example is labeled with a class.
        • In clustering, examples are not labeled.
      Examples of clustering
        • Segment a customer database based on similar buying patterns
        • Group houses in a town into neighborhoods based on similar features
        • Identify new plant species
        • Identify similar web usage patterns
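One way to obtain such a mapping f is k-means, the clustering method named in the recap of Lecture 3. The following is a minimal sketch, not a definitive implementation: seeding the centroids with the first k points, the Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import math


def k_means(points, k, iterations=100):
    """Return a list f where f[i] in {1, ..., k} is the cluster of points[i]."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [a + 1 for a in assignment]  # clusters numbered 1..k as in f: D -> {1, ..., k}


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
print(k_means(points, k=2))  # -> [1, 2, 1, 2, 1, 2]
```

The result depends on the initial centroids. Practical implementations usually try several random seeds.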
  18. Example of Clustering
      Put these people in different clusters. Which are the keys?
        • Define what's similar
        • Group similar things into clusters
            • Size of the clusters?
        • Which type of clustering do I want?
            • Hierarchical clustering?
            • Partition-based clustering?
  19. Are They Similar?
  20. How to Group the Elements?
  21. Which Type of Clustering?
      Many types of clustering:
        • Hierarchical: nested set of clusters
        • Partition-based: one set of clusters
        • Incremental: each element handled one at a time
        • Simultaneous: all elements handled together
        • Overlapping/non-overlapping
      [Figures; labels: Hierarchical Clustering, Partition-based Clustering]
  22. Association Rules
      Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I.
      The association rule problem is to identify all the rules of the form X → Y with minimum support and confidence.
        • Support: fraction of transactions that contain both X and Y
        • Confidence: measure of how often items in Y appear in transactions that contain X
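Written out in the slide's notation, the two measures are usually defined as follows. This is the standard formulation, stated here as a reading of the verbal definitions rather than a formula taken from the slide.

```latex
\mathrm{support}(X \rightarrow Y)
  = \frac{\left|\{\, t \in D : X \cup Y \subseteq t \,\}\right|}{|D|}
\qquad
\mathrm{confidence}(X \rightarrow Y)
  = \frac{\left|\{\, t \in D : X \cup Y \subseteq t \,\}\right|}
         {\left|\{\, t \in D : X \subseteq t \,\}\right|}
```

Confidence is therefore the support of X ∪ Y divided by the support of X.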
  23. Example Association Rules
      I = {Beer, Bread, Jelly, Milk, PeanutButter}
      Support of {Bread, PeanutButter} is 60%
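Support and confidence can be computed directly from a list of transactions. The transaction table from the slide is not preserved in this transcript, so the five transactions below are hypothetical. They are chosen only so that the support of {Bread, PeanutButter} comes out at 60%, as stated on the slide. The function names are likewise illustrative.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """For the rule X -> Y: transactions with X and Y divided by transactions with X."""
    n_x = count(x, transactions)
    if n_x == 0:
        raise ValueError("no transaction contains the antecedent X")
    return count(set(x) | set(y), transactions) / n_x


# Hypothetical transactions over I = {Beer, Bread, Jelly, Milk, PeanutButter}.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]

print(support({"Bread", "PeanutButter"}, TRANSACTIONS))       # 3 of 5 -> 0.6
print(confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS))  # 3 of 4 -> 0.75
```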
  24. Example Association Rules
  25. Before Finishing…
      Some environments that contain algorithms to perform data classification, regression, clustering and association rule mining:
        • KEEL: http://www.keel.es
        • Weka: http://www.cs.waikato.ac.nz/ml/weka/
        • Rapid Miner: http://rapid-i.com/content/blogcategory/38/69/
  26. Next Class
      Start with data classification: C4.5