HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Artificial Data Sets based on Knowledge Generators:
Analysis of Learning Algorithms Efficiency

Joaquin Rios-Boutin
Albert Orriols-Puig
Josep-Maria Garrell-Guiu

Grup de Recerca en Sistemes Intel·ligents
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
{jrios, aorriols, josepmg}@salle.url.edu

Motivation

What is the Holy Grail of Machine Learning?
– Find the right Learning Algorithm for every Problem
– Real Problems are black boxes
• We don’t know what knowledge they contain
• We can’t answer:
– When to stop training?
– How efficient is the learning process?
– Artificial Problems:
• Knowledge-driven
• Property-driven

[Figure: diagram with labels DI, K and Complex. Met.]

Enginyeria i Arquitectura la Salle Slide 2
GRSI

Framework

Machine Learning as a Communication System

[Diagram: Environment (Knowledge to be learned) → Communication Channel (Data Set) → Learning Algorithm (Learned Knowledge)]

Outline

1. Algorithm Evaluation Methodology Definition

2. Methodology Implementation

3. Experiment Description

4. Results and Analysis

5. Conclusions and Further Work

1 Algorithm Evaluation Process

[Diagram: Process Execution and Control DB. Inputs: Problem, Sampling Method, Sampling Size, Algorithm Parameters. Steps: DI Generation → Data Set → Learning Algorithm → Knowledge Comparison. Output plots for data set DS1kMulplx6m1: Accuracy and Optimal Population, x-axis from 0 to 10000.]
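One way to picture the Knowledge Comparison step is as a check of how much of the known optimal rule set appears in the learned population. The sketch below is illustrative only: the encoding of rules as (condition, class) tuples, the function name `optimal_population_fraction`, and the 3-bit multiplexer example are assumptions, not taken from the slides.

```python
# Sketch only: rules are (condition, class) tuples such as ("00#", 0).
# The encoding and the function name are assumptions for illustration.

def optimal_population_fraction(optimal_rules, learned_rules):
    """Return the fraction of optimal rules present in the learned population."""
    optimal = set(optimal_rules)
    if not optimal:
        raise ValueError("optimal rule set must not be empty")
    found = optimal & set(learned_rules)
    return len(found) / len(optimal)


# Hypothetical 3-bit multiplexer rule set, used as the knowledge to be learned.
optimal = [("00#", 0), ("01#", 1), ("1#0", 0), ("1#1", 1)]
# Hypothetical population produced by a learner at some training step.
learned = [("00#", 0), ("1#1", 1), ("0##", 0)]

fraction = optimal_population_fraction(optimal, learned)
print(fraction)
```

Because the problem is generated from a known rule set, this kind of measure can be tracked during training alongside accuracy.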

1 Algorithm Evaluation Process Dimensions

[Figure: evaluation space with three axes: Sampling Methods (SRS, SIS, RRS, RIS), Sampling Size (log scale, 1 to 100000) and Alg. Param. (AP1 ... APn)]

For each Problem


2 Knowledge Representation

Rule Set (each rule: Condition → Class/Action)

Rule 1: a11 a12 ... a1j ... a1m → C1
Rule i: ai1 ai2 ... aij ... aim → Ci
Rule n: an1 an2 ... anj ... anm → Cn

aij ∈ {0, 1, #}, Ci ∈ N
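The representation above can be sketched in Python as follows. The string encoding of conditions, the helper names `matches` and `classify`, and the 3-bit multiplexer rule set are illustrative assumptions rather than the authors' implementation.

```python
# Sketch only: a rule is a (condition, class) pair; the condition is a string
# over {'0', '1', '#'}, where '#' is the "don't care" symbol.

def matches(condition, instance):
    """True if every non-'#' position of the condition equals the instance bit."""
    if len(condition) != len(instance):
        raise ValueError("condition and instance must have the same length")
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def classify(rule_set, instance):
    """Return the class of the first matching rule, or None if no rule matches."""
    for condition, klass in rule_set:
        if matches(condition, instance):
            return klass
    return None


# Hypothetical rule set (3-bit multiplexer): m = 3 attributes, n = 4 rules.
rule_set = [("00#", 0), ("01#", 1), ("1#0", 0), ("1#1", 1)]
print(classify(rule_set, "011"))
print(classify(rule_set, "110"))
```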


2 Sampling Methods

SRS Sequential Rule Selection: random # substitution
SIS Sequential Instance Selection: sequential # substitution
RRS Random Rule Selection: random # substitution
RIS Random Instance Selection: sequential # substitution

[Diagrams: each method is drawn as two steps, labelled 1st and 2nd]
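The four sampling methods can be sketched roughly as below. This is one plausible reading of the slide, not the authors' code: it assumes rule-selection methods pick a rule and then fill each # at random, while instance-selection methods first enumerate every instance a rule covers and then pick among those. All function names and the example rule set are hypothetical.

```python
import itertools
import random


def expand(condition):
    """List every instance covered by a condition, replacing '#' sequentially."""
    choices = [("0", "1") if c == "#" else (c,) for c in condition]
    return ["".join(bits) for bits in itertools.product(*choices)]


def random_fill(condition, rng):
    """Replace each '#' with a random bit."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


def srs(rules, size, rng):
    """Sequential Rule Selection: cycle through rules, random '#' substitution."""
    return [(random_fill(cond, rng), cls)
            for cond, cls in itertools.islice(itertools.cycle(rules), size)]


def sis(rules, size):
    """Sequential Instance Selection: cycle through the expanded instances."""
    instances = [(x, cls) for cond, cls in rules for x in expand(cond)]
    return list(itertools.islice(itertools.cycle(instances), size))


def rrs(rules, size, rng):
    """Random Rule Selection: random rule, then random '#' substitution."""
    picks = (rng.choice(rules) for _ in range(size))
    return [(random_fill(cond, rng), cls) for cond, cls in picks]


def ris(rules, size, rng):
    """Random Instance Selection: draw from the expanded instances at random."""
    instances = [(x, cls) for cond, cls in rules for x in expand(cond)]
    return [rng.choice(instances) for _ in range(size)]


# Hypothetical rule set (3-bit multiplexer) and a seeded generator.
rules = [("00#", 0), ("01#", 1), ("1#0", 0), ("1#1", 1)]
rng = random.Random(0)
print(srs(rules, 4, rng))
print(sis(rules, 4))
```

Under this reading SIS involves no randomness at all, and for a rule set without # symbols SRS produces the same data set as SIS.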


2 Problems to learn and Learning Algorithm

Learning Algorithm: XCS

Problems: Mux6, Mux11, Parity5, Position5, Position11, Parity5-3

Rules (Condition : Class); "..." marks rows not listed

Mux6                Parity5
0 0 # # # 0 : 0     0 0 0 0 0 : 0
0 0 # # # 1 : 1     0 0 0 0 1 : 1
0 1 # # 0 # : 0     0 0 0 1 0 : 1
0 1 # # 1 # : 1     0 0 0 1 1 : 0
...                 ...
1 1 0 # # # : 0     1 1 1 1 0 : 0
1 1 1 # # # : 1     1 1 1 1 1 : 1

Position5           Parity5-3
0 0 0 0 0 : 0       0 0 0 0 0 # # # : 0
0 0 0 0 1 : 1       0 0 0 0 1 # # # : 1
0 0 0 1 # : 2       0 0 0 1 0 # # # : 1
0 0 1 # # : 3       0 0 0 1 1 # # # : 0
...                 ...
1 # # # # : 5       1 1 1 1 0 # # # : 0
                    1 1 1 1 1 # # # : 1

2 Problem Properties

Optimal Rule Sets
– Complete
– Non-overlapping
– Irreducible
Why?
– Simple structure of knowledge complexity
– Well-known artificial problems
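For small binary problems, completeness and non-overlap can be checked by brute force over all instances. The sketch below is a minimal illustration with hypothetical helper names and a hypothetical 3-bit multiplexer rule set; it does not test irreducibility.

```python
import itertools


def matches(condition, instance):
    """True if every non-'#' position of the condition equals the instance bit."""
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def coverage_counts(rule_set, m):
    """Map each of the 2**m binary instances to the number of rules matching it."""
    counts = {}
    for bits in itertools.product("01", repeat=m):
        instance = "".join(bits)
        counts[instance] = sum(matches(cond, instance) for cond, _ in rule_set)
    return counts


def is_complete(rule_set, m):
    """Complete: every instance is matched by at least one rule."""
    return all(n >= 1 for n in coverage_counts(rule_set, m).values())


def is_non_overlapping(rule_set, m):
    """Non-overlapping: no instance is matched by more than one rule."""
    return all(n <= 1 for n in coverage_counts(rule_set, m).values())


# Hypothetical rule set (3-bit multiplexer) with m = 3 attributes.
rule_set = [("00#", 0), ("01#", 1), ("1#0", 0), ("1#1", 1)]
print(is_complete(rule_set, 3), is_non_overlapping(rule_set, 3))
```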


3 Sampling and Learning Iteration

[Diagram: the Algorithm Evaluation Process (Problem, Sampling Method, Sampling Size, Algorithm Parameters → DI Generation → Data Set → Learning Algorithm → Knowledge Comparison) annotated with two loops, {Sampling Iteration} and {Training Iteration}. Output plots for DS1kMulplx6m1: Accuracy and Optimal Population, x-axis from 0 to 10000.]

3 Output Results and Iteration Reduction

Output Results
– 2 Plots for every Problem, Sampling Method, Sampling Size and Algorithm Parameters:
• Optimal Population
• Accuracy

Iteration Reduction
– SIS: Pure sequential
• No Sampling Iteration Needed
– Problems without “don’t care”
• SRS=SIS and RRS=RIS

[Plot: DS1kMulplx6m1, x-axis from 0 to 10000]

3 Experimental Parameters

Number of Problems = 6
Number of Sampling Methods = 4
Number of different Sampling Sizes = 4
Number of different Algorithm Parameter Sets = 2
Number of Sampling Iterations = 10
Number of Training Iterations = 10
Number of Data Sets Generated = 744
Number of Training Processes = 14880
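The two totals can be reproduced with a short calculation. The slides do not spell out the derivation, so the breakdown below is an assumption: SRS, RRS and RIS are each sampled 10 times, while the purely sequential SIS is generated once per problem and sampling size.

```python
# Values taken from the parameter list above.
problems = 6
sampling_sizes = 4
sampling_iterations = 10
parameter_sets = 2
training_iterations = 10

# Assumption: three methods (SRS, RRS, RIS) need 10 sampling iterations each,
# and SIS, being purely sequential, needs only one data set.
data_sets_per_problem_and_size = 3 * sampling_iterations + 1

data_sets = problems * sampling_sizes * data_sets_per_problem_and_size
training_processes = data_sets * parameter_sets * training_iterations

print(data_sets, training_processes)
```

Under this reading, 6 × 4 × 31 gives the 744 data sets, and each data set is trained 2 × 10 times, giving 14880 training processes.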


Problem Dimension

Sampling M. = RIS, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Plots: two panels each for Mux6 (DS1kMulplx6m4) and Parity5 (DS1kParity5m4), x-axis from 0 to 10000]

Sampling Method Dimension

Problem = Position5, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Plots: two panels each for SRS Sequential Rule Selection (DS1kPosition5m1) and RIS Random Instance Selection (DS1kPosition5m4), x-axis from 0 to 10000]

Sampling Size Dimension

Problem = Parity5, Sampling M. = RIS, Learning Alg. Param. = pDNC 0.2

[Plots: two panels each for Sampling Size 100 (DS100Parity5m4) and Sampling Size 10000 (DS10kParity5m4), x-axis from 0 to 10000]

Algorithm Parameter Dimension

Problem = Mux6, Sampling M. = RIS, Sampling Size = 1000

[Plots: two panels each for pDNC 0.8 and pDNC 0.2 (DS1kMulplx6m4), x-axis from 0 to 10000]


Conclusions and Further Work

Conclusions
– Automatic Learning Algorithm Analyzer based on Artificial Data Sets
– Four-dimension comparisons
– Methodology Implementation, Experiment and Results Analysis

Further Work
– Non-ORS Problems
– Real Attributes
– Sampling Methods based on distance or transition matrix
– Multi-Step Problems
– Different Learning Algorithms
– Different Knowledge representations
– Knowledge Covering Metrics
– Applying Data Set Complexity Metrics Suite

Artificial Data Sets based on Knowledge Generators:
Analysis of Learning Algorithms Efficiency
Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
{jrios, aorriols, josepmg}@salle.url.edu

GRSI (Grup de Recerca en Sistemes Intel·ligents)
• http://www.salle.url.edu/GRSI

– Oriented to:
• CBR (Case-Based Reasoning) Algorithms
• Evolutionary Computation Algorithms
• Data Mining Technology Transfer
