Introduction to Machine Learning
Lecture 4
Slides based on Francisco Herrera's course on Data Mining
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 3
Typically, ML techniques have been divided into
different paradigms:
Inductive learning
Explanation-based learning
Analogy-based learning
Evolutionary learning
Connectionist learning
Recap of Lecture 3
Problems that we'll study
1. Data classification: C4.5, kNN, Naïve Bayes …
2. Statistical learning: SVM
3. Association analysis: Apriori
4. Link mining: PageRank
5. Clustering: k-means
6. Reinforcement learning: Q-learning, XCS
7. Regression
8. Genetic Fuzzy Systems
Today’s Agenda
Situation: Where Are We?
Classification
Prediction
Clustering
Association
Data Mining Systems
Situation: Where Are We?
The input consists of examples described by
different characteristics (attributes)
Situation: Where Are We?
What can we do with a bunch of examples?
It depends on the type of examples we have:
Classification: Find the class to which a new instance belongs
E.g.: Find whether a new patient has cancer or not
Numeric prediction: A variation of classification in which the output
is a numeric value instead of a class label
E.g.: Predict the frequency of cancerous cells found
Regression: Find a function that fits your examples
E.g.: Find a function that controls your chain process
Association: Find associations among your problem attributes or
variables
E.g.: Find relations such as "a patient with high blood pressure is
more likely to have a heart attack"
Clustering: Group the instances into clusters of similar examples
E.g.: Group clients whose purchases are similar
Data Classification
[Figure: Dataset → training set → learner (knowledge extraction from information based on experience) → model; test set / new instance → model → predicted output]
Example of Data Classification
[Figure panels: Data Set · Classification Model · How]
The classification model can be implemented in several ways:
• Rules
• Decision trees
• Mathematical formulae
Classification as a Two-Step Process
Model usage: to classify future or unknown objects
Estimate the accuracy of the model
The known label of each test sample is compared with the label
predicted by the model
The accuracy rate is the proportion of test examples that are
correctly classified by the model
The test set is independent of the training set
If the experts think that the model is acceptable,
then use the model to predict unknown examples
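The accuracy estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: `holdout_split`, `accuracy`, the 30% test fraction, and the example labels are all hypothetical. The split guarantees that no example is in both sets, which is what keeps the test set independent of the training set.

```python
import random

def holdout_split(examples, test_fraction=0.3, seed=0):
    """Hypothetical helper: shuffle a copy of the examples and split it into
    disjoint training and test sets (no example appears in both)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def accuracy(true_labels, predicted_labels):
    """Proportion of test examples whose predicted label equals the known label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical known labels of four test samples vs. labels predicted by some model
known = ["cancer", "healthy", "healthy", "cancer"]
predicted = ["cancer", "healthy", "cancer", "cancer"]
print(accuracy(known, predicted))  # 0.75
```

Whether 0.75 is "acceptable" is left to the experts, as the slide says; the code only produces the number they judge.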
Going to Real World
[Figure: pictures of katydids and grasshoppers]
Definition: Given a collection of annotated data (in this case, katydids
and grasshoppers), decide what type of insect the following one is
Going to Real World
How can I put a katydid or a grasshopper into my
computer?
Going to Real World
Thus, the classification problem has been reduced to the following table:
Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
We have a new observation with abdomen length 5.1 and
antennae length 7. Which type of insect is it?
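One way to answer the question is the instance-based classifier kNN, which this course covers later. The sketch below is a minimal, hypothetical implementation (the name `knn_classify`, the Euclidean distance, and the default `k=3` are assumptions), using the ten rows of the insect table; row 5 is read as "Katydid", assuming "Katykid" in the extracted table is a typo.

```python
import math
from collections import Counter

# (abdomen length, antennae length, class) rows from the insect table
TRAINING = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]

def knn_classify(abdomen, antennae, k=3):
    """Return the majority class among the k training insects closest
    (Euclidean distance) to the new observation."""
    by_distance = sorted(
        TRAINING,
        key=lambda row: math.dist((abdomen, antennae), (row[0], row[1])),
    )
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(5.1, 7.0))       # Katydid
print(knn_classify(5.1, 7.0, k=1))  # Katydid
```

For the new observation the three nearest insects are IDs 7, 5 and 1 (two katydids, one grasshopper), so the majority vote is Katydid.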
Going to Real World
Actually, we could write that
How do I classify this domain?
How to Create Classification Models
We will study some of these methods:
The decision tree C4.5
The instance based classifier kNN
The probabilistic classifier Naïve Bayes
Regression or Prediction
Prediction vs data classification
Similarity: Both learn from a data set
Difference:
In classification, each example has an associated class
In prediction, each example has an associated numerical value
How to Extract a Model?
Prediction works analogously to data classification
Use an algorithm to build a model
Use this model to predict the value of a new, unknown example
Types of regression
Linear and multiple regression
Non-linear regression
Two of the most-used approaches to regression
Neural networks
Fuzzy rule-based systems
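The two steps (build a model, then use it to predict) can be illustrated with the simplest case listed, linear regression with one input variable, fitted here by ordinary least squares. This is a sketch under assumptions: `fit_line`, `predict`, and the toy data (generated from y = 1 + 2x) are hypothetical, and multiple, non-linear, neural-network or fuzzy approaches are not shown.

```python
def fit_line(xs, ys):
    """Step 1, build the model: ordinary least-squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def predict(model, x):
    """Step 2, use the model: numeric output for a new, unknown example."""
    a, b = model
    return a + b * x

# Hypothetical training data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
model = fit_line(xs, ys)
print(model)                # (1.0, 2.0)
print(predict(model, 4.0))  # 9.0
```

Unlike the kNN classifier, the output here is a number rather than a class, which is exactly the difference between prediction and classification stated above.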
Clustering
The clustering problem
Given a database $D = \{t_1, t_2, \dots, t_n\}$ of transactions and an
integer value $k$, the clustering problem is to define a
mapping $f: D \to \{1, \dots, k\}$ where each $t_i$ is assigned to one cluster
$K_j$, $1 \le j \le k$
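The mapping f can be produced by many algorithms. A minimal sketch using k-means (the clustering algorithm listed for this course) follows, under stated assumptions: each transaction is a numeric point, similarity is Euclidean distance, and the first k points are used as initial centroids (a simplification; practical implementations use random or k-means++ initialisation). The function name `k_means` and the sample points are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    """Lloyd's k-means. Returns a dict: point index -> cluster number in 1..k,
    i.e. the mapping f from the definition above."""
    centroids = [points[i] for i in range(k)]  # naive initialisation: first k points
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (numbered from 1)
        new_assignment = {
            i: 1 + min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for i, p in enumerate(points)
        }
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points
        for j in range(k):
            members = [points[i] for i, c in assignment.items() if c == j + 1]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment

points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (0.5, 1.5), (8.5, 9.0)]
print(k_means(points, 2))  # {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2}
```

Note that no labels are used anywhere: the cluster numbers are produced by the algorithm, which is the main difference from classification.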
Main difference from classification
In classification, each example is labeled with a class
In clustering, examples are not labeled
Examples of clustering
Segment a customer database based on
similar buying patterns
Group houses in a town into
neighborhoods based on similar features
Identify new plant species
Identify similar web usage patterns
Example of Clustering
Put these people in different clusters
What are the key questions?
Define what’s similar
Group similar things in
different clusters
Size of the clusters?
Which type of clustering do I want?
Hierarchical clustering?
Partition-based clustering?
Are They Similar?
How to Group the Elements?
Which Type of Clustering?
Many types of clustering
Hierarchical: Nested set of clusters
Partition-based: One set of clusters
Incremental: Each element is handled one at a time
Simultaneous: All elements are handled together
Overlapping/non-overlapping
[Figure panels: Hierarchical Clustering · Partition-based Clustering]
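To make "nested set of clusters" concrete, here is a sketch of agglomerative (bottom-up) hierarchical clustering with single linkage on 1-D points. The function name `single_linkage` and the sample values are hypothetical, and single linkage is only one of several possible linkage criteria. Each merge produces a cluster that contains earlier clusters, whereas a partition-based method such as k-means returns one set of clusters.

```python
def single_linkage(points):
    """Agglomerative hierarchical clustering of 1-D points with single linkage.
    Returns the cluster created at each merge step; later clusters contain
    earlier ones, giving a nested set of clusters."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

for step in single_linkage([1.0, 2.0, 6.0, 7.0, 20.0]):
    print(step)
# [1.0, 2.0]
# [6.0, 7.0]
# [1.0, 2.0, 6.0, 7.0]
# [1.0, 2.0, 6.0, 7.0, 20.0]
```

Cutting this sequence of merges at different levels yields different numbers of clusters, so the size of the clusters can be chosen after the hierarchy is built.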
Association Rules
Given a set of items $I = \{I_1, I_2, \dots, I_m\}$ and a database of
transactions $D = \{t_1, t_2, \dots, t_n\}$ where $t_i = \{I_{i1}, I_{i2}, \dots, I_{ik}\}$
and $I_{ij} \in I$
The association rule problem is to identify all the rules
of the form
$X \Rightarrow Y$
with minimum support and confidence
Support: Fraction of transactions that contain both X and Y
Confidence: Measure of how often the items in Y appear in
transactions that contain X
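Both measures can be computed directly from their definitions. The sketch below is illustrative only: the helper names are hypothetical and the five transactions are made up (they are not a table from these slides). Confidence is computed as count(X ∪ Y) / count(X), which equals support(X ∪ Y) / support(X).

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain all the items (X and Y together)."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(x, y, transactions):
    """How often the items of Y appear in transactions that contain X."""
    return count_containing(set(x) | set(y), transactions) / count_containing(x, transactions)

# Hypothetical transactions, each a set of purchased items
transactions = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]
print(support({"Bread", "PeanutButter"}, transactions))      # 0.6
print(confidence({"Bread"}, {"PeanutButter"}, transactions))  # 0.75
```

Here the rule Bread ⇒ PeanutButter has support 0.6 (3 of 5 transactions) and confidence 0.75 (3 of the 4 transactions containing Bread); it is reported only if both values reach the chosen minimums.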
Example Association Rules
I = {Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread, PeanutButter} is 60%
Before Finishing…
Some environments that contain algorithms to perform
data classification, regression, clustering, and
association rule mining:
KEEL: http://www.keel.es
Weka: http://www.cs.waikato.ac.nz/ml/weka/
Rapid Miner: http://rapid-i.com/content/blogcategory/38/69/
Next Class
Start with data classification
C4.5