Machine Learning for Language Technology
Lecture 8: Decision Trees and k-Nearest Neighbors

Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014

Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials

Supervised Classification

• Divide instances into (two or more) classes
  – Instance (feature vector): $x = (x_1, \ldots, x_m)$
    • Features may be categorical or numerical
  – Class (label): $y$
  – Training data: $X = \{x^t, y^t\}_{t=1}^{N}$
• Classification in Language Technology
  – Spam filtering (spam vs. non-spam)
  – Spelling error detection (error vs. no error)
  – Text categorization (news, economy, culture, sport, ...)
  – Named entity classification (person, location, organization, ...)

Models for Classification

• Generative probabilistic models:
  – Model of P(x, y)
  – Naive Bayes
• Conditional probabilistic models:
  – Model of P(y | x)
  – Logistic regression
• Discriminative models:
  – No explicit probability model
  – Decision trees, nearest neighbor classification
  – Perceptron, support vector machines, MIRA

Repeating…

• Noise
  – Data cleaning is expensive and time consuming
• Margin
• Inductive bias

Types of inductive biases:
• Minimum cross-validation error
• Maximum margin
• Minimum description length
• [...]

DECISION TREES

Decision Trees

• Hierarchical tree structure for classification
  – Each internal node specifies a test of some feature
  – Each branch corresponds to a value for the tested feature
  – Each leaf node provides a classification for the instance
• Represents a disjunction of conjunctions of constraints
  – Each path from root to leaf specifies a conjunction of tests
  – The tree itself represents the disjunction of all paths

Decision Tree

[Figure: example decision tree]

Divide and Conquer

• Internal decision nodes
  – Univariate: uses a single attribute, $x_i$
    • Numeric $x_i$: binary split: $x_i > w_m$
    • Discrete $x_i$: n-way split for n possible values
  – Multivariate: uses all attributes, $x$
• Leaves
  – Classification: class labels (or proportions)
  – Regression: average of the target values r (or a local fit)
• Learning (see the sketch below):
  – Greedy recursive algorithm
  – Find the best split $X = (X_1, \ldots, X_p)$, then induce a tree for each $X_i$

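To make the greedy recursive procedure concrete, here is a minimal sketch in Python, assuming purely categorical features and entropy as the impurity measure (as defined on the next slides); all function and variable names are illustrative, not from the lecture.

import math
from collections import Counter

def entropy(labels):
    # I_m = -sum_i p_i log2 p_i over the class distribution at a node
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    # Pick the feature whose n-way split minimizes weighted impurity
    def split_impurity(f):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(features, key=split_impurity)

def grow_tree(rows, labels, features):
    # Stop on a pure node (or no features left): emit a majority-class leaf
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    node = {"feature": f, "branches": {}}
    for v in set(row[f] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[f] == v]
        node["branches"][v] = grow_tree([rows[i] for i in idx],
                                        [labels[i] for i in idx],
                                        [g for g in features if g != f])
    return node

# e.g. grow_tree([("a","b"), ("a","c"), ("b","b")], ["A","A","B"], [0, 1])
# splits on feature 0 and yields pure leaves "A" and "B"
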
Classification Trees (ID3, CART, C4.5)

• For node m, $N_m$ instances reach m, and $N_m^i$ of them belong to class $C_i$:
  $\hat{P}(C_i \mid x, m) \equiv p_m^i = \frac{N_m^i}{N_m}$
• Node m is pure if $p_m^i$ is 0 or 1
• The measure of impurity is entropy:
  $I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$

Example: Entropy

• Assume two classes (C1, C2) and four instances (x1, x2, x3, x4); all logarithms are base 2, with the convention 0 log 0 = 0
• Case 1:
  – C1 = {x1, x2, x3, x4}, C2 = { }
  – Im = −(1 log 1 + 0 log 0) = 0
• Case 2:
  – C1 = {x1, x2, x3}, C2 = {x4}
  – Im = −(0.75 log 0.75 + 0.25 log 0.25) = 0.81
• Case 3:
  – C1 = {x1, x2}, C2 = {x3, x4}
  – Im = −(0.5 log 0.5 + 0.5 log 0.5) = 1

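A quick check of the three cases, as a sketch in Python (the 0 log 0 = 0 convention is handled by skipping zero probabilities):

import math

def entropy(probs):
    # I_m = -sum_i p_i log2 p_i, with the convention 0 log 0 = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))    # Case 1: 0.0
print(entropy([0.75, 0.25]))  # Case 2: 0.8112... ≈ 0.81
print(entropy([0.5, 0.5]))    # Case 3: 1.0
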
Best Split

• If node m is pure, generate a leaf and stop; otherwise split with test t and continue recursively
• Find the test that minimizes impurity
• Impurity after a split with test t:
  – $N_{mj}$ of the $N_m$ instances take branch j
  – $N_{mj}^i$ of those belong to class $C_i$:
    $\hat{P}(C_i \mid x, m, j) \equiv p_{mj}^i = \frac{N_{mj}^i}{N_{mj}}$
  – $I_m^t = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$
• For comparison, the impurity at node m before the split (as on the previous slide):
  $I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$, with $\hat{P}(C_i \mid x, m) \equiv p_m^i = \frac{N_m^i}{N_m}$

Information Gain

• We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned.
• Information gain tells us how important a given attribute of the feature vectors is.
• We will use it to decide the ordering of attributes in the nodes of a decision tree.

Information Gain and Gain Ratio

• Choosing the test that minimizes impurity maximizes the information gain (IG):
  $IG_m^t = I_m - I_m^t$
  where $I_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$ and $I_m^t = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$
• Information gain prefers features with many values
• The normalized version is called gain ratio (GR):
  $GR_m^t = \frac{IG_m^t}{V_m^t}$, where $V_m^t = -\sum_{j=1}^{n} \frac{N_{mj}}{N_m} \log_2 \frac{N_{mj}}{N_m}$

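A minimal sketch in Python of both quantities for one categorical split; the function name and the toy data are illustrative, not from the lecture.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain_and_ratio(values, labels):
    # Group the node's labels by the feature value taking branch j
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    split_impurity = sum(len(g) / n * entropy(g) for g in groups.values())   # I_m^t
    ig = entropy(labels) - split_impurity                                    # IG_m^t
    v_t = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())  # V_m^t
    return ig, (ig / v_t if v_t > 0 else 0.0)                                # GR_m^t

# A feature with one distinct value per instance gets maximal IG
# but is penalized by GR:
print(information_gain_and_ratio(["v1", "v2", "v3", "v4"], ["A", "A", "B", "B"]))
# -> (1.0, 0.5)
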
Pruning Trees

• Decision trees are susceptible to overfitting
• Remove subtrees for better generalization:
  – Prepruning: early stopping (e.g., with an entropy threshold)
  – Postpruning: grow the whole tree, then prune subtrees
• Prepruning is faster; postpruning is more accurate (requires a separate validation set)

Rule Extraction from Trees

[Figure: C4.5Rules (Quinlan, 1993)]

Learning Rules

• Rule induction is similar to tree induction, but
  – tree induction is breadth-first
  – rule induction is depth-first (one rule at a time)
• Rule learning:
  – A rule is a conjunction of terms (cf. a tree path)
  – A rule covers an example if all terms of the rule evaluate to true for the example (cf. a sequence of tests)
  – Sequential covering: generate rules one at a time until all positive examples are covered
  – IREP (Fürnkranz and Widmer, 1994), Ripper (Cohen, 1995)

Properties of Decision Trees

• Decision trees are appropriate for classification when:
  – Features can be both categorical and numeric
  – Disjunctive descriptions may be required
  – Training data may be noisy (missing values, incorrect labels)
  – Interpretation of the learned model is important (rules)
• Inductive bias of (most) decision tree learners:
  1. Prefers trees with informative attributes close to the root
  2. Prefers smaller trees over bigger ones (with pruning)
  3. Preference bias (incomplete search of a complete space)

K-NEAREST NEIGHBORS

Nearest Neighbor Classification

• An old idea
• Key components:
  – Storage of old instances
  – Similarity-based reasoning to new instances

"This 'rule of nearest neighbor' has considerable elementary intuitive appeal and probably corresponds to practice in many situations. For example, it is possible that much medical diagnosis is influenced by the doctor's recollection of the subsequent history of an earlier patient whose symptoms resemble in some way those of the current patient." (Fix and Hodges, 1952)

k-Nearest Neighbour

• Learning:
  – Store the training instances in memory
• Classification (see the sketch below):
  – Given a new test instance x,
    • Compare it to all stored instances
    • Compute a distance between x and each stored instance $x^t$
    • Keep track of the k closest instances (the nearest neighbors)
  – Assign to x the majority class of the k nearest neighbours
• A geometric view of learning
  – Proximity in (feature) space → same class
  – The smoothness assumption

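As a sketch, the whole classifier fits in a few lines of Python; here the training set is a list of (feature vector, label) pairs and the distance function is a parameter (all names are illustrative):

from collections import Counter

def knn_classify(x, training_set, k, distance):
    # Compare x to all stored instances and keep the k closest
    neighbors = sorted(training_set, key=lambda inst: distance(x, inst[0]))[:k]
    # Assign the majority class of the k nearest neighbors
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# e.g. with the overlap distance (count of mismatching features):
overlap = lambda x, z: sum(a != b for a, b in zip(x, z))
train = [(("a", "b", "a", "c"), "A"), (("a", "b", "c", "a"), "B")]
print(knn_classify(("a", "b", "b", "a"), train, k=1, distance=overlap))  # -> B
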
Eager and Lazy Learning

• Eager learning (e.g., decision trees)
  – Learning – induce an abstract model from data
  – Classification – apply the model to new data
• Lazy learning (a.k.a. memory-based learning)
  – Learning – store data in memory
  – Classification – compare new data to the data in memory
  – Properties:
    • Retains all the information in the training set – no abstraction
    • Complex hypothesis space – suitable for natural language?
    • Main drawback – classification can be very inefficient

Dimensions of a k-NN Classifier

• Distance metric
  – How do we measure distance between instances?
  – Determines the layout of the instance space
• The k parameter
  – How large a neighborhood should we consider?
  – Determines the complexity of the hypothesis space

Distance Metric 1

• Overlap = count of mismatching features

  $\Delta(x, z) = \sum_{i=1}^{m} \delta(x_i, z_i)$

  $\delta(x_i, z_i) = \begin{cases} \frac{|x_i - z_i|}{\max_i - \min_i} & \text{if numeric} \\ 0 & \text{else, if } x_i = z_i \\ 1 & \text{if } x_i \neq z_i \end{cases}$

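A sketch of this metric in Python; for the numeric case the feature range (max_i, min_i) must be supplied, and here it defaults to [0, 1] purely for illustration:

def delta(xi, zi, lo=0.0, hi=1.0):
    # Numeric feature: scaled absolute difference |x_i - z_i| / (max_i - min_i)
    if isinstance(xi, (int, float)):
        return abs(xi - zi) / (hi - lo)
    # Categorical feature: 0 on a match, 1 on a mismatch
    return 0.0 if xi == zi else 1.0

def overlap_distance(x, z):
    # Delta(x, z) = sum_{i=1}^m delta(x_i, z_i)
    return sum(delta(a, b) for a, b in zip(x, z))

print(overlap_distance(("a", "b"), ("a", "c")))  # -> 1.0
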
Distance Metric 2

• MVDM = Modified Value Difference Metric

  $\Delta(x, z) = \sum_{i=1}^{m} \delta(x_i, z_i)$

  $\delta(x_i, z_i) = \sum_{j=1}^{K} \left| P(C_j \mid x_i) - P(C_j \mid z_i) \right|$

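The class-conditional probabilities $P(C_j \mid v)$ are typically estimated from value/class co-occurrence counts in the training data; a minimal sketch under that assumption (function names are illustrative):

from collections import Counter, defaultdict

def mvdm_table(feature_values, labels):
    # Estimate P(C_j | v) for each value v of one feature from counts
    counts = defaultdict(Counter)
    for v, y in zip(feature_values, labels):
        counts[v][y] += 1
    classes = sorted(set(labels))
    return {v: [c[y] / sum(c.values()) for y in classes]
            for v, c in counts.items()}

def mvdm_delta(xi, zi, table):
    # delta(x_i, z_i) = sum_j |P(C_j | x_i) - P(C_j | z_i)|
    return sum(abs(p - q) for p, q in zip(table[xi], table[zi]))

table = mvdm_table(["a", "a", "b", "c"], ["A", "B", "B", "B"])
print(mvdm_delta("a", "b", table))  # -> 1.0: P(·|a)=[0.5, 0.5] vs P(·|b)=[0.0, 1.0]
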
The k parameter

• Tunes the complexity of the hypothesis space
  – If k = 1, every instance has its own neighborhood
  – If k = N, the whole feature space is one neighborhood
• Error of hypothesis h on a validation set V of M instances:
  $\hat{E} = E(h \mid V) = \sum_{t=1}^{M} 1\big(h(x^t) \neq r^t\big)$

[Figure: decision boundaries for k = 1 and k = 15]

A Simple Example

Overlap distance, $\Delta(x, z) = \sum_{i=1}^{m} \delta(x_i, z_i)$, with $\delta$ as defined on the Distance Metric 1 slide.

Training set:
1. (a, b, a, c) → A
2. (a, b, c, a) → B
3. (b, a, c, c) → C
4. (c, a, b, c) → A

New instance:
5. (a, b, b, a)

Distances (overlap):
Δ(1, 5) = 2
Δ(2, 5) = 1
Δ(3, 5) = 4
Δ(4, 5) = 3

k-NN classification:
1-NN(5) = B
2-NN(5) = A/B (tie)
3-NN(5) = A
4-NN(5) = A

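The distance table above can be reproduced with a few lines of Python, using the categorical overlap metric (a sketch; note that the 2-NN tie between A and B would need a tie-breaking rule):

overlap = lambda x, z: sum(a != b for a, b in zip(x, z))

train = [(("a", "b", "a", "c"), "A"),
         (("a", "b", "c", "a"), "B"),
         (("b", "a", "c", "c"), "C"),
         (("c", "a", "b", "c"), "A")]
query = ("a", "b", "b", "a")

for i, (x, _) in enumerate(train, start=1):
    print(f"Delta({i}, 5) = {overlap(x, query)}")  # -> 2, 1, 4, 3

# Sorted by distance: instances 2 (B), 1 (A), 4 (A), 3 (C),
# so 1-NN = B and 3-NN = majority of {B, A, A} = A
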
Further Variations on k-NN

• Feature weights:
  – The overlap metric gives all features equal weight
  – Features can be weighted by IG or GR
• Weighted voting (see the sketch below):
  – The normal decision rule gives all neighbors equal weight
  – Instances can be weighted by (inverse) distance

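A sketch of inverse-distance weighted voting, assuming the neighbors arrive as (distance, label) pairs; the epsilon guard is an illustrative choice to handle neighbors at distance zero:

from collections import defaultdict

def weighted_vote(neighbors):
    # Each neighbor votes with weight 1 / (distance + eps) instead of weight 1
    eps = 1e-9  # avoids division by zero when a neighbor is at distance 0
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)

# With k = 3, plain majority voting would pick A (two votes to one),
# but the single very close B neighbor outweighs them:
print(weighted_vote([(1, "B"), (2, "A"), (3, "A")]))  # -> B
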
Properties of k-NN

• Nearest neighbor classification is appropriate when:
  – Features can be both categorical and numeric
  – Disjunctive descriptions may be required
  – Training data may be noisy (missing values, incorrect labels)
  – Fast classification is not crucial
• Inductive bias of k-NN:
  1. Nearby instances should have the same label (the smoothness assumption)
  2. All features are equally important (without feature weights)
  3. Complexity is tuned by the k parameter

End of Lecture 8