Oblique Trees Classification
Andrejus Parfionovas
Adele Cutler
Utah State University
Tree-based classification:
[Figure: example decision tree with Yes/No branches. Questions: Fever?, Sore throat?, Runny nose? Leaves: Cold, Healthy, Angina, Flu]
[Figure: decision tree with Yes/No branches and two ratio-based splits, Weight / Height > 26 (Body Mass Index) and Cholesterol / Age > 4 (Framingham Point Score). Leaves: Risk of heart attack, Risk of heart attack, More tests]
Classical tree-based methods:
1. Pick a variable (1 dimension)
2. Find the best split to separate classes
(Gini index as a measure of purity)
3. Repeat steps 1-2 for every variable and choose
the variable that separates the classes best
4. Go to step 1 with fewer observations,
until all nodes in the tree are pure
The feature space is now partitioned
into a set of rectangles
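The four steps above can be sketched in a few lines of Python. This is only an illustration of one level of the recursion, assuming the size-weighted Gini impurity as the purity measure and midpoints between observed values as candidate thresholds; the function names and the toy data are hypothetical and not taken from the talk.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_axis_split(X, y):
    """Return (impurity, variable index, threshold) of the best one-variable split."""
    best = (float("inf"), None, None)
    for j in range(len(X[0])):                      # step 1: pick a variable
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):      # step 2: try each midpoint
            t = (lo + hi) / 2
            left = [c for row, c in zip(X, y) if row[j] < t]
            right = [c for row, c in zip(X, y) if row[j] >= t]
            score = split_impurity(left, right)
            if score < best[0]:                     # step 3: keep the best variable
                best = (score, j, t)
    return best

# Toy data (hypothetical): the first variable separates the classes at 2.5.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = ["a", "a", "b", "b"]
result = best_axis_split(X, y)
```

Step 4 would apply `best_axis_split` again to the observations on each side of the chosen threshold until every node is pure.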
Newly proposed algorithm:
1. Pick two variables
2. Find the best split in 2D space to
separate classes (Gini-based criteria)
3. Repeat steps 1-2 for every pair of variables and choose
the pair that separates the classes best
4. Go to step 1 with fewer observations,
until all nodes in the tree are pure
The feature space is now partitioned
by oblique linear splits
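A rough sketch of the pairwise search follows. It is not the exact method of the talk: instead of enumerating the slopes between points, it tries a fixed grid of directions for every pair of variables (the parameter `n_angles` is an assumption) and cuts the projected values with the weighted Gini criterion. All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_threshold(scores, y):
    """Best cut on one-dimensional scores: returns (weighted Gini, threshold)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    best = (float("inf"), None)
    for k in range(1, len(order)):
        a, b = scores[order[k - 1]], scores[order[k]]
        if a == b:                                   # no cut between tied values
            continue
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if g < best[0]:
            best = (g, (a + b) / 2)
    return best

def best_oblique_split(X, y, n_angles=36):
    """Return (impurity, (i, j), angle, threshold) over all pairs of variables."""
    best = (float("inf"), None, None, None)
    for i, j in combinations(range(len(X[0])), 2):   # step 1: pick two variables
        for k in range(n_angles):                    # step 2: candidate directions
            theta = math.pi * k / n_angles
            proj = [r[i] * math.cos(theta) + r[j] * math.sin(theta) for r in X]
            g, t = best_threshold(proj, y)
            if g < best[0]:                          # step 3: keep the best pair
                best = (g, (i, j), theta, t)
    return best

# Toy data (hypothetical): variables 0 and 1 separate the classes only jointly,
# variable 2 is noise.
X = [[0, 2, 5], [2, 0, 1], [1, 1, 3], [1, 3, 2], [3, 1, 4], [2, 2, 0]]
y = ["a", "a", "a", "b", "b", "b"]
result = best_oblique_split(X, y)
```

On this toy data no single variable gives a pure split, while an oblique line in the plane of the first two variables does.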
Problem formulation:
Finding the best split in 2D:
[Figure: five points A, B, C, D, E in the plane, ranked 1 to 5]
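Finding the best split in 2D:
Slopes of the lines through each pair of points, in increasing order:
Pair   Slope
DE     -3
AB     -1
CE     -0.5
AE      0.25
BE      0.67
AC      1
AD      1.33
CD      2
BD      2.5
BC      3
[Figure: the points A, B, C, D, E with a changed ranking 1 to 5]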
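Ordering of the points
(each row follows from the previous one by swapping the two neighbouring points of the next pair in the slope list; a number between two points is the value shown for the split at that position)
Start           A .3 B .47 C .47 D .4 E
After DE swap   A B C E .3 D
After AB swap   B .3 A C E D
After CE swap   B A E same C D
After AE swap   B E .26 A C D
After BE swap   E same B A C D
After AC swap   E B C 0 A D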
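The sequence of orderings can be reproduced with a short script. The coordinates below are an assumption: the slides do not list them, and these values were chosen only because they yield exactly the ten slopes of the table. The class labels are likewise assumed, picked so that the last ordering shown admits a pure split. Points are ordered by the intercept of a line of slope m passing through them, with one representative m taken from each interval between consecutive critical slopes.

```python
from itertools import combinations

# Hypothetical coordinates that reproduce the ten slopes of the table.
points = {"A": (0, 0), "B": (1, -1), "C": (2, 2), "D": (3, 4), "E": (4, 1)}
# Assumed class labels (not given in the slides).
labels = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}

def slope(p, q):
    return (q[1] - p[1]) / (q[0] - p[0])

# Critical slopes: one per pair of points, in increasing order.
critical = sorted(
    (slope(points[a], points[b]), a + b) for a, b in combinations(points, 2)
)

def ordering(m):
    """Order the points by the intercept y - m*x of the slope-m line through each."""
    return "".join(sorted(points, key=lambda k: points[k][1] - m * points[k][0]))

# One representative slope below the first critical slope, then one per interval.
slopes = [s for s, _ in critical]
representatives = [slopes[0] - 1] + [(a + b) / 2 for a, b in zip(slopes, slopes[1:])]
orderings = [ordering(m) for m in representatives]

def pure_cut(order):
    """Position k where order[:k] and order[k:] are each single-class, else None."""
    for k in range(1, len(order)):
        left = {labels[p] for p in order[:k]}
        right = {labels[p] for p in order[k:]}
        if len(left) == 1 and len(right) == 1:
            return k
    return None

first_pure = next(o for o in orderings if pure_cut(o) is not None)
```

Here `[pair for _, pair in critical]` lists the pairs in the order of the slope table, and `first_pure` is the first ordering in which a single cut separates the assumed classes.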
Algorithm Performance:
$O(n \cdot n_1 - n_1^2)$, i.e. between $O(n - 1)$ and $O(n^2)$,
where $n$ is the sample size and
$n_1 \in (0, n)$ is the size of the biggest class
No need to recalculate Gini index!
(can easily be updated)
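One way to see why the Gini index need not be recomputed: when two neighbouring points swap, only the split between them changes, and the class counts on its left change by one point leaving and one entering. The sketch below assumes the size-weighted Gini index and hypothetical class labels for five points A to E (A and D in class 1, the others in class 0); the helper names are illustrative only.

```python
def weighted_gini(left, total):
    """Weighted Gini of a split, from class counts left of the cut and overall."""
    n = sum(total.values())
    n_left = sum(left.values())
    right = {c: total[c] - left.get(c, 0) for c in total}
    g = 0.0
    for size, counts in ((n_left, left), (n - n_left, right)):
        if size:
            g += size / n * (1.0 - sum((v / size) ** 2 for v in counts.values()))
    return g

def swap_update(left, leaving, entering):
    """Counts left of the cut after two adjacent points trade places across it."""
    updated = dict(left)
    updated[leaving] -= 1
    updated[entering] = updated.get(entering, 0) + 1
    return updated

# Ordering A B C D | E with the assumed labels A=1, B=0, C=0, D=1, E=0.
total = {0: 3, 1: 2}
left = {0: 2, 1: 2}
before = weighted_gini(left, total)

# D and E swap: D (class 1) leaves the left side, E (class 0) enters it.
left = swap_update(left, leaving=1, entering=0)
after = weighted_gini(left, total)
```

The update touches two counters, so the cost per swap depends on the number of classes and not on the sample size.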
Simulation: Orthogonal XOR
Training sample = 500
Test sample = 1000
# of repetitions = 6
[Figure: scatter plot of the simulated data, both axes from -10 to 10]
[Figure: error rate (%) versus number of trees (20 to 100)]
Classical trees: –x–x–
Oblique trees: –o–o–
Simulation: Diagonal XOR
Training sample = 400
Test sample = 1000
# of repetitions = 5
[Figure: scatter plot of the simulated data, both axes from -10 to 10]
[Figure: error rate (%) versus number of trees (20 to 100)]
Classical trees: –x–x–
Oblique trees: –o–o–
Simulation: Mixed XOR
Training sample = 400
Test sample = 1000
# of repetitions = 5
[Figure: scatter plot of the simulated data, both axes from -10 to 10]
[Figure: error rate (%) versus number of trees (20 to 100)]
Classical trees: –x–x–
Oblique trees: –o–o–
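For readers who want to try similar experiments, XOR-type data could be generated as sketched below. The slides show only scatter plots, so everything here is an assumption: points are drawn uniformly on the square from -10 to 10, the orthogonal variant puts the class boundaries on the coordinate axes, and the diagonal variant rotates them by 45 degrees. The mixed variant is not specified in the slides and is left out.

```python
import random

def orthogonal_xor(x, y):
    """Class 1 when exactly one coordinate is positive (boundaries on the axes)."""
    return int((x > 0) != (y > 0))

def diagonal_xor(x, y):
    """Same pattern rotated by 45 degrees (boundaries on y = x and y = -x)."""
    return int((x + y > 0) != (x - y > 0))

def simulate(label, n, seed=0):
    """Draw n points uniformly on [-10, 10] x [-10, 10] and label them."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(n)]
    return pts, [label(x, y) for x, y in pts]

# Sample sizes as on the Orthogonal XOR slide.
train_X, train_y = simulate(orthogonal_xor, 500)
test_X, test_y = simulate(orthogonal_xor, 1000, seed=1)
```

Passing `diagonal_xor` to `simulate` gives the rotated problem, where a single-variable split cannot follow the class boundaries.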
Example: Anderson’s Iris Data
150 observations (50 in each class)
3 classes (species)
4 variables:
Sepal Length
Sepal Width
Petal Length
Petal Width
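Classical (orthogonal) tree:
[Figure: classification of the iris data by the classical tree]
Oblique tree classification:
[Figure: classification of the iris data by the oblique tree]
Trees comparison:
(SL = Sepal Length, SW = Sepal Width, PL = Petal Length, PW = Petal Width)
Orthogonal Tree
Petal Length < 2.45
  Yes: Class 1
  No: Petal Width < 1.75
    Yes: Petal Length < 4.95
      Yes: Class 2
      No: Class 3
    No: Class 3
Oblique Tree
SW > (.79)⋅SL – 1.27
  Yes: Class 1
  No: PW > 3.72 – (.45)⋅PL
    Yes: PW > (.3)⋅SW + .81
      Yes: Class 3
      No: Class 2
    No: Class 2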
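The two trees can be written out as plain functions to make the comparison concrete. This sketch assumes that the left branch in each slide diagram is the "condition holds" branch and that classes 1, 2, 3 follow the usual order of the Iris data (setosa, versicolor, virginica); neither is stated on the slide. The three sample rows are the first observation of each species in the Iris data.

```python
def orthogonal_tree(sl, sw, pl, pw):
    """Classical tree from the comparison slide: one variable per split."""
    if pl < 2.45:
        return 1
    if pw < 1.75:
        return 2 if pl < 4.95 else 3
    return 3

def oblique_tree(sl, sw, pl, pw):
    """Oblique tree from the comparison slide: two variables per split."""
    if sw > 0.79 * sl - 1.27:
        return 1
    if pw > 3.72 - 0.45 * pl:
        return 3 if pw > 0.3 * sw + 0.81 else 2
    return 2

# (sepal length, sepal width, petal length, petal width), one row per species.
samples = [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (6.3, 3.3, 6.0, 2.5)]
orthogonal_pred = [orthogonal_tree(*s) for s in samples]
oblique_pred = [oblique_tree(*s) for s in samples]
```

Both trees have three splits; the oblique tree uses all four variables, the orthogonal tree only the two petal measurements.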
Summary:
Consistently smaller error rate
Fewer trees required for a desired accuracy
Smoother, more adaptive class separation
Provides insight into the data structure
(variable selection)
For orthogonal separation, stay with the classical trees
Future investigations:
Randomization:
  when choosing between ties
  try random pairs of variables
High-dimensional data
C++-based package for R users
Thank you for your attention!
Special thanks to Dr. Adele Cutler
for her contribution and advice
More information:
http://cc.usu.edu/~andrej/
andrej@cc.usu.edu
adele.cutler@usu.edu

Parfionovas_Interface_2008