The document proposes a framework for evaluating machine learning algorithms using artificially generated data sets based on predefined knowledge. It defines a methodology that involves sampling problems, generating data sets using different sampling methods and sizes, and running learning algorithms with varying parameters. Results are analyzed by plotting accuracy and optimal population over iterations to compare algorithm efficiency under different conditions. The methodology allows reducing the number of iterations needed for evaluation.
Artificial Data Sets based on Knowledge Generators
1. Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
Joaquin Rios-Boutin
Albert Orriols-Puig
Josep-Maria Garrell-Guiu
Grup de Recerca en Sistemes Intel·ligents
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
{jrios, aorriols, josepmg}@salle.url.edu
2. Motivation
What is the Holy Grail of Machine Learning?
– Find the right Learning Algorithm for every Problem
– Real Problems are black boxes
• We don’t know which knowledge they contain
• We can’t answer:
– When to stop training?
– How efficient is the learning process?
– Artificial Problems:
• Knowledge-driven
• Property-driven
[Diagram labels: DI, K, Complex.Met.]
Enginyeria i Arquitectura la Salle Slide 2
GRSI
3. Framework
Machine Learning as a Communication System
[Diagram: Environment (Knowledge to be learned) → Communication Channel (Data Set) → Learning Algorithm (Learned Knowledge)]
4. Outline
1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work
5. 1 Algorithm Evaluation Process
Process Execution and Control DB
[Diagram: Problem → Data Set Generation (Sampling Method, Sampling Size) → Data Set (DI) → Learning Algorithm (Algorithm Parameters) → Knowledge Comparison]
[Plots for data set DS1kMulplx6m1: Accuracy (from the Learning Algorithm) and Optimal Population (from the Knowledge Comparison), each on a 0 to 1.2 vertical axis over 0 to 10000 iterations]
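The two outputs of the process can be illustrated with a minimal Python sketch. The slides do not define either metric, so this assumes that Optimal Population is the fraction of the optimal rule set found in the learned rule set, and that Accuracy is the fraction of correctly classified instances. The function names and the toy rules are hypothetical.

```python
def optimal_population_fraction(learned_rules, optimal_rules):
    """Fraction of the optimal rule set that appears in the learned rule set.

    Assumption: rules are hashable (condition, class) pairs compared for
    exact equality; the slides do not specify the comparison.
    """
    optimal = set(optimal_rules)
    if not optimal:
        raise ValueError("optimal rule set must not be empty")
    return len(optimal & set(learned_rules)) / len(optimal)


def accuracy(predict, instances):
    """Fraction of (input, label) pairs that `predict` classifies correctly."""
    instances = list(instances)
    if not instances:
        raise ValueError("at least one instance is required")
    correct = sum(1 for x, label in instances if predict(x) == label)
    return correct / len(instances)


# Hypothetical toy example: the learner found one of the two optimal rules.
optimal = [("0#", 0), ("1#", 1)]
learned = [("0#", 0), ("11", 1)]
print(optimal_population_fraction(learned, optimal))  # 0.5
```

Evaluating both functions after each training step would give the two curves plotted for every configuration.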
6. 1 Algorithm Evaluation Process Dimensions
Dimensions explored for each Problem:
– Sampling Methods: SRS, SIS, RRS, RIS
– Sampling Size: 1 to 100000 (logarithmic axis)
– Algorithm Parameters: AP1 … APn
8. 2 Knowledge Representation
Rule Set: n rules, each with a Condition of m attributes and a Class/Action
Rule 1: a11 a12 … a1j … a1m → C1
Rule i: ai1 ai2 … aij … aim → Ci
Rule n: an1 an2 … anj … anm → Cn
aij ∈ {0, 1, #}, Ci ∈ ℕ
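As a rough illustration of this representation, the Python sketch below stores each rule as a ternary condition string plus an integer class. The `Rule` type and the `matches` and `classify` helpers are hypothetical names, and treating `#` as a "don't care" symbol that matches either bit is an assumption consistent with the later slides.

```python
from typing import NamedTuple, Optional, Sequence


class Rule(NamedTuple):
    condition: str  # one symbol per attribute, each in {'0', '1', '#'}
    action: int     # class label Ci


def matches(rule: Rule, instance: str) -> bool:
    """True if every non-'#' symbol of the condition equals the instance bit."""
    if len(rule.condition) != len(instance):
        raise ValueError("condition and instance must have the same length")
    return all(c == "#" or c == x for c, x in zip(rule.condition, instance))


def classify(rule_set: Sequence[Rule], instance: str) -> Optional[int]:
    """Return the class of the first matching rule, or None if none matches."""
    for rule in rule_set:
        if matches(rule, instance):
            return rule.action
    return None


# Hypothetical two-rule set over m = 2 attributes.
rule_set = [Rule("0#", 0), Rule("1#", 1)]
print(classify(rule_set, "01"))  # 0
print(classify(rule_set, "10"))  # 1
```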
9. 2 Sampling Methods
Four sampling methods:
– SRS: Sequential Rule Selection
– SIS: Sequential Instance Selection
– RRS: Random Rule Selection
– RIS: Random Instance Selection
[Diagram: each method pairs its selection step with a sequential or a random # substitution step, labelled 1st and 2nd in order of application]
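The two kinds of # substitution named on this slide could look roughly like the Python sketch below. The slides do not spell out the procedure, so this assumes that sequential substitution enumerates every 0/1 assignment of the `#` positions in order, and that random substitution draws each `#` independently and uniformly. Both function names are hypothetical.

```python
import itertools
import random


def sequential_substitution(condition):
    """Yield every instance covered by `condition`, filling '#' with 0/1 in order."""
    slots = [i for i, c in enumerate(condition) if c == "#"]
    for bits in itertools.product("01", repeat=len(slots)):
        chars = list(condition)
        for i, b in zip(slots, bits):
            chars[i] = b
        yield "".join(chars)


def random_substitution(condition, rng):
    """Return one instance, replacing each '#' with a random bit drawn from `rng`."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


print(list(sequential_substitution("1#0#")))
# ['1000', '1001', '1100', '1101']

rng = random.Random(0)  # fixed seed so that a run can be repeated
sample = random_substitution("1#0#", rng)
print(sample in set(sequential_substitution("1#0#")))  # True
```

A condition without `#` yields itself exactly once under either function, which fits the later remark that the rule-based and instance-based methods coincide for problems without "don't care" symbols.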
11. 2 Problem Properties
Optimal Rule Sets
– Complete
– Non-overlapping
– Irreducible
Why?
– Simple structure of knowledge complexity
– Well-known artificial problems
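Two of these properties can be checked by brute force on small binary problems, as in the Python sketch below. It uses an eight-rule set for the 6-bit multiplexer as an example; that rule list is an assumption written for illustration (the data set name DS1kMulplx6m1 only suggests this problem), and the helper names are hypothetical. Irreducibility is not checked here.

```python
import itertools

# Assumed rule set for the 6-bit multiplexer: two address bits select one of
# four data bits, and the class is the value of the selected data bit.
MUX6_RULES = [
    ("000###", 0), ("001###", 1),
    ("01#0##", 0), ("01#1##", 1),
    ("10##0#", 0), ("10##1#", 1),
    ("11###0", 0), ("11###1", 1),
]


def matches(condition, instance):
    """True if every non-'#' symbol of the condition equals the instance bit."""
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def match_counts(rules, n_bits):
    """Map every binary instance of length n_bits to the number of matching rules."""
    counts = {}
    for bits in itertools.product("01", repeat=n_bits):
        instance = "".join(bits)
        counts[instance] = sum(matches(cond, instance) for cond, _ in rules)
    return counts


def is_complete(rules, n_bits):
    """Every instance is covered by at least one rule."""
    return all(n >= 1 for n in match_counts(rules, n_bits).values())


def is_non_overlapped(rules, n_bits):
    """No instance is covered by more than one rule."""
    return all(n <= 1 for n in match_counts(rules, n_bits).values())


print(is_complete(MUX6_RULES, 6), is_non_overlapped(MUX6_RULES, 6))  # True True
```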
13. 3 Sampling and Learning Iteration
[Diagram: {Sampling Iteration} Problem → Data Set Generation (Sampling Method, Sampling Size) → Data Set (DI) → {Training Iteration} Learning Algorithm (Algorithm Parameters) → Knowledge Comparison]
[Plots for data set DS1kMulplx6m1: Accuracy and Optimal Population, each on a 0 to 1.2 vertical axis over 0 to 10000 iterations]
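The nesting of sampling and training iterations might be organised as in the Python sketch below. It assumes that each generated data set is reused for every parameter set and every training iteration; `generate_data_set` and `train` are hypothetical callbacks standing in for the data set generator and the learning algorithm, and their signatures are invented for this example.

```python
import itertools


def run_experiment(problems, methods, sizes, param_sets,
                   n_sampling_iters, n_training_iters,
                   generate_data_set, train):
    """Run every configuration and return {configuration key: training result}."""
    results = {}
    for problem, method, size in itertools.product(problems, methods, sizes):
        for s in range(n_sampling_iters):          # sampling iteration
            data_set = generate_data_set(problem, method, size, seed=s)
            for params in param_sets:              # params must be hashable
                for t in range(n_training_iters):  # training iteration
                    key = (problem, method, size, params, s, t)
                    results[key] = train(data_set, params, seed=t)
    return results


# Stub callbacks, used only to show the call pattern.
generated = []

def fake_generate(problem, method, size, seed):
    generated.append((problem, method, size, seed))
    return [problem] * size

def fake_train(data_set, params, seed):
    return len(data_set)

results = run_experiment(["mux6"], ["SRS", "RRS"], [10], ["AP1", "AP2"],
                         3, 2, fake_generate, fake_train)
print(len(generated), len(results))  # 6 24
```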
14. 3 Output Results and Iteration Reduction
Output Results
– 2 Plots for every Problem, Sampling Method, Sampling Size and Algorithm Parameters:
• Optimal Population
• Accuracy
Iteration Reduction
– SIS is purely sequential
• No Sampling Iteration Needed
– Problems without “don’t care”
• SRS=SIS and RRS=RIS
[Plot for data set DS1kMulplx6m1: 0 to 1.2 vertical axis over 0 to 10000 iterations]
15. 3 Experimental Parameters
Number of Problems = 6
Number of Sampling Methods = 4
Number of different Sampling Sizes = 4
Number of different Algorithm Parameter Sets = 2
Number of Sampling Iterations = 10
Number of Training Iterations = 10
Number of Data Sets Generated = 744
Number of Training Processes = 14880
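The two totals can be reproduced with a short Python calculation. This is one decomposition consistent with the reported figures, not necessarily how the authors counted: it assumes that SIS, being purely sequential, is sampled once instead of 10 times, and that no further reduction is applied.

```python
problems = 6
sampling_sizes = 4
parameter_sets = 2
training_iterations = 10

# Sampling iterations per method; SIS is assumed to need a single pass.
iterations_per_method = {"SRS": 10, "SIS": 1, "RRS": 10, "RIS": 10}

data_sets = problems * sampling_sizes * sum(iterations_per_method.values())
training_processes = data_sets * parameter_sets * training_iterations
print(data_sets, training_processes)  # 744 14880

# Without any iteration reduction, all four methods would use 10 iterations.
full_data_sets = problems * sampling_sizes * 4 * 10
print(full_data_sets, full_data_sets * parameter_sets * training_iterations)  # 960 19200
```

Under these assumptions the SIS shortcut saves 216 data sets and 4320 training processes.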
22. Conclusions and Further Work
Conclusions
– Automatic Learning Algorithm Analyzer based on Artificial Data Sets
– Comparisons along four dimensions
– Methodology Implementation, Experiment and Results Analysis
Further Work
– Non-ORS Problems
– Real Attributes
– Sampling Methods based on distance or transition matrix
– Multi-Step Problems
– Different Learning Algorithms
– Different Knowledge representations
– Knowledge Covering Metrics
– Applying Data Set Complexity Metrics Suite
23. GRSI
Artificial Data Sets based on Knowledge Generators:
Analysis of Learning Algorithms Efficiency
Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
{jrios, aorriols, josepmg}@salle.url.edu
GRSI (Grup de Recerca en Sistemes Intel·ligents)
• http://www.salle.url.edu/GRSI
– Oriented to:
• CBR (Case-Based Reasoning) Algorithms
• Evolutionary Computation Algorithms
• Data Mining Technology Transfer