HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Published in: Education

  1. Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
     Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
     Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
     {jrios, aorriols, josepmg}@salle.url.edu
  2. Motivation
     What is the Holy Grail of Machine Learning?
     – Find the right learning algorithm for every problem
     – Real problems are black boxes
       • We don't know which knowledge they contain
       • We can't answer:
         – When to stop training?
         – How efficient is the learning process?
     – Artificial problems:
       • Knowledge-driven
       • Property-driven
     [Diagram labels: DI, K, Complex. Met.]
     Enginyeria i Arquitectura la Salle, GRSI
  3. Framework
     Machine Learning as a Communication System
     [Diagram: Environment (knowledge to be learned), Communication Channel (data set), Learning Algorithm (learned knowledge)]
  4. Outline
     1. Algorithm Evaluation Methodology Definition
     2. Methodology Implementation
     3. Experiment Description
     4. Results and Analysis
     5. Conclusions and Further Work
  5. 1 Algorithm Evaluation Process
     [Diagram: Process Execution and Control, backed by a DB. Inputs: Problem, Sampling Method, Sampling Size, Algorithm Parameters. Stages: Data Set Generation, Learning Algorithm, Knowledge Comparison. Outputs: two plots for data set DS1kMulplx6m1, Accuracy and Optimal Population, x axis 0 to 10000, y axis 0 to 1.2]
  6. 1 Algorithm Evaluation Process Dimensions
     [Figure: for each Problem, three axes: Sampling Methods (SRS, SIS, RRS, RIS), Sampling Size (1 to 100000, logarithmic scale), Algorithm Parameters (AP1 … APn)]
  7. Outline
     1. Algorithm Evaluation Methodology Definition
     2. Methodology Implementation
     3. Experiment Description
     4. Results and Analysis
     5. Conclusions and Further Work
  8. 2 Knowledge Representation
     A rule set holds n rules; each rule has a condition of m attributes and a class/action:
       Rule 1: a11 a12 … a1j … a1m → C1
       Rule i: ai1 ai2 … aij … aim → Ci
       Rule n: an1 an2 … anj … anm → Cn
     with aij ∈ {0, 1, #} and Ci ∈ ℕ
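The rule representation on this slide can be illustrated with a minimal Python sketch. It assumes the usual reading of "#" as a "don't care" symbol that matches either bit; the function names `matches` and `classify` and the two example rules are illustrative and do not come from the slides.

```python
def matches(condition: str, instance: str) -> bool:
    """Return True if a ternary condition over {0, 1, #} covers a binary instance."""
    if len(condition) != len(instance):
        raise ValueError("condition and instance must have the same length")
    # '#' is treated as "don't care": it matches both 0 and 1 (assumption).
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def classify(rule_set, instance):
    """Return the class of the first rule that covers the instance, or None."""
    for condition, action in rule_set:
        if matches(condition, instance):
            return action
    return None


# Two illustrative rules written as (condition, class) pairs.
rules = [("00###0", 0), ("00###1", 1)]
print(classify(rules, "001101"))
print(classify(rules, "110000"))
```

Storing a rule as a `(condition, class)` tuple keeps it hashable, which is convenient when rule sets are later compared.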
  9. 2 Sampling Methods
     – SRS: Sequential Rule Selection
     – SIS: Sequential Instance Selection
     – RRS: Random Rule Selection
     – RIS: Random Instance Selection
     [Diagram: each method is drawn as two ordered steps (1st, 2nd) combined with either sequential or random "#" substitution; the exact pairing is not recoverable from the transcript]
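The transcript does not preserve how each sampling method works, so the following Python sketch is only one plausible reading of the two random variants. It assumes that rule selection picks a rule first and then fills in its "#" symbols, while instance selection draws a point of the input space first and labels it with the rule that covers it. The function names and the Position5 rule set used as input (including its last two rules) are assumptions made for illustration.

```python
import random

# Assumed Position5 rule set: class = position of the leftmost 1, counted from the right.
POSITION5 = [("00000", 0), ("00001", 1), ("0001#", 2),
             ("001##", 3), ("01###", 4), ("1####", 5)]


def instantiate(condition, rng):
    """Replace every '#' in a condition with a random bit."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


def sample_rrs(rule_set, size, rng):
    """Random Rule Selection (assumed): pick a rule uniformly, then fill its '#'."""
    data = []
    for _ in range(size):
        condition, action = rng.choice(rule_set)
        data.append((instantiate(condition, rng), action))
    return data


def sample_ris(rule_set, size, rng):
    """Random Instance Selection (assumed): pick an instance uniformly, then label it."""
    n = len(rule_set[0][0])
    data = []
    for _ in range(size):
        instance = "".join(rng.choice("01") for _ in range(n))
        # Requires a complete rule set: some rule must cover every instance.
        action = next(a for cond, a in rule_set
                      if all(c in ("#", x) for c, x in zip(cond, instance)))
        data.append((instance, action))
    return data


rng = random.Random(0)
print(sample_rrs(POSITION5, 3, rng))
print(sample_ris(POSITION5, 3, rng))
```

Under this reading the two methods produce different class distributions: rule selection gives each rule the same share of the data set, whereas instance selection gives general rules such as `1####` many more instances than specific rules such as `00000`.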
  10. 2 Problems to Learn and Learning Algorithm
      Learning algorithm: XCS
      Problems (rules as shown on the slide; "…" marks rows the slide elides):
      – Mux6: 00###0 → 0, 00###1 → 1, 01##0# → 0, 01##1# → 1, …, 110### → 0, 111### → 1
      – Mux11: rules not recoverable from the transcript
      – Parity5: 00000 → 0, 00001 → 1, 00010 → 1, 00011 → 0, …, 11110 → 0, 11111 → 1
      – Parity5-3: 00000### → 0, 00001### → 1, 00010### → 1, 00011### → 0, …, 11110### → 0, 11111### → 1
      – Position5: 00000 → 0, 00001 → 1, 0001# → 2, 001## → 3, …, up to class 5 (the last rows are garbled in the transcript)
      – Position11: rules not recoverable from the transcript
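The benchmark problems can be written as small target functions. The Python sketch below is inferred from the rule rows that survive in the transcript, so the conventions are assumptions: in the multiplexer the address 00 selects the last data bit, Parity5-3 takes the parity of the first five bits and ignores the last three, and the position problems return the position of the leftmost 1 counted from the right. The function names are illustrative.

```python
def multiplexer(bits: str, address_bits: int) -> int:
    """Return the data bit selected by the leading address bits."""
    address = int(bits[:address_bits], 2)
    data = bits[address_bits:]
    # Assumed convention (from the visible Mux6 rows): address 0 selects the LAST data bit.
    return int(data[len(data) - 1 - address])


def parity(bits: str, relevant=None) -> int:
    """Return 1 if the first `relevant` bits hold an odd number of ones, else 0."""
    relevant = len(bits) if relevant is None else relevant
    return bits[:relevant].count("1") % 2


def position(bits: str) -> int:
    """Return the position of the leftmost 1 counted from the right, or 0 if none."""
    first_one = bits.find("1")
    return 0 if first_one == -1 else len(bits) - first_one


print(multiplexer("010010", 2))   # Mux6 instance covered by 01##1#
print(parity("00010111", 5))      # Parity5-3 instance covered by 00010###
print(position("00110"))          # Position5 instance covered by 001##
```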
  11. 2 Problem Properties
      Optimal Rule Sets
      – Complete
      – Non-overlapping
      – Irreducible
      Why?
      – Simple structure of knowledge complexity
      – Well-known artificial problems
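Two of the three properties listed above can be checked by brute force on small problems. The Python sketch below assumes that "complete" means every binary instance is covered by at least one rule and that "non-overlapping" means no instance is covered by more than one rule; it does not test irreducibility. The Position5 rule set is an assumed reconstruction, and all names are illustrative.

```python
from itertools import product


def matches(condition, instance):
    """Return True if a ternary condition covers a binary instance."""
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def check_rule_set(rule_set):
    """Return (complete, non_overlapping) by enumerating the whole input space."""
    n = len(rule_set[0][0])
    complete = True
    non_overlapping = True
    for bits in product("01", repeat=n):
        instance = "".join(bits)
        hits = sum(matches(cond, instance) for cond, _ in rule_set)
        if hits == 0:
            complete = False
        if hits > 1:
            non_overlapping = False
    return complete, non_overlapping


# Assumed Position5 rule set (the last two rules are inferred from the pattern).
POSITION5 = [("00000", 0), ("00001", 1), ("0001#", 2),
             ("001##", 3), ("01###", 4), ("1####", 5)]

print(check_rule_set(POSITION5))
```

The enumeration visits 2^n instances, so it is only practical for short conditions such as the 5 to 11 bit problems named in the slides.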
  12. Outline
      1. Algorithm Evaluation Methodology Definition
      2. Methodology Implementation
      3. Experiment Description
      4. Results and Analysis
      5. Conclusions and Further Work
  13. 3 Sampling and Learning Iteration
      [Diagram: the evaluation process for a Problem with two loops, {Sampling Iteration} and {Training Iteration}. Inputs: Sampling Method, Sampling Size, Algorithm Parameters. Stages: Data Set Generation, Learning Algorithm, Knowledge Comparison. Outputs: two plots for data set DS1kMulplx6m1, Accuracy and Optimal Population, x axis 0 to 10000, y axis 0 to 1.2]
  14. 3 Output Results and Iteration Reduction
      Output Results
      – 2 plots for every combination of Problem, Sampling Method, Sampling Size and Algorithm Parameters:
        • Optimal Population
        • Accuracy
      Iteration Reduction
      – SIS is purely sequential
        • No sampling iteration needed
      – Problems without "don't care"
        • SRS = SIS and RRS = RIS
      [Plot: DS1kMulplx6m1, x axis 0 to 10000, y axis 0 to 1.2]
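The slides do not state how the "Optimal Population" curve is computed. A common measure for rule-based learners is the fraction of the optimal rule set that appears in the learned population, and the Python sketch below assumes that definition. The function name, the example population, and the Position5 rule set are all illustrative assumptions.

```python
def optimal_population_fraction(population, optimal_rule_set):
    """Return the fraction of optimal rules found in the learned population."""
    learned = set(population)
    found = sum(1 for rule in optimal_rule_set if rule in learned)
    return found / len(optimal_rule_set)


# Assumed Position5 optimal rule set, written as (condition, class) pairs.
OPTIMAL = [("00000", 0), ("00001", 1), ("0001#", 2),
           ("001##", 3), ("01###", 4), ("1####", 5)]

# Hypothetical learned population: three optimal rules plus two other rules.
population = [("00000", 0), ("0001#", 2), ("1####", 5),
              ("1###0", 5), ("0####", 1)]

print(optimal_population_fraction(population, OPTIMAL))
```

Evaluating this after each batch of training steps would give one curve per configuration, which is the kind of plot the slide describes.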
  15. 3 Experimental Parameters
      – Number of Problems = 6
      – Number of Sampling Methods = 4
      – Number of different Sampling Sizes = 4
      – Number of different Algorithm Parameter Sets = 2
      – Number of Sampling Iterations = 10
      – Number of Training Iterations = 10
      – Number of Data Sets Generated = 744
      – Number of Training Processes = 14880
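The two totals can be reproduced from the other parameters. The Python sketch below shows one reading that matches both numbers: three sampling methods need 10 sampling iterations each, while SIS, being purely sequential, needs a single data set per configuration. This reading is an assumption; the slides give the totals but not the derivation.

```python
problems = 6
sampling_methods = 4
sampling_sizes = 4
parameter_sets = 2
sampling_iterations = 10
training_iterations = 10

# Assumption: SIS is deterministic, so it contributes 1 data set instead of 10.
data_sets_per_problem_and_size = (sampling_methods - 1) * sampling_iterations + 1

data_sets = problems * sampling_sizes * data_sets_per_problem_and_size
training_processes = data_sets * parameter_sets * training_iterations

print(data_sets)           # 6 * 4 * 31 = 744
print(training_processes)  # 744 * 2 * 10 = 14880
```

Without the reduction, the full product would be 6 × 4 × 4 × 10 = 960 data sets.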
  16. Outline
      1. Algorithm Evaluation Methodology Definition
      2. Methodology Implementation
      3. Experiment Description
      4. Results and Analysis
      5. Conclusions and Further Work
  17. Problem Dimension
      Sampling Method = RIS, Sampling Size = 1000, Learning Algorithm Parameter = pDNC 0.2
      [Plots: Mux6 (DS1kMulplx6m4) and Parity5 (DS1kParity5m4), two plots each, x axis 0 to 10000]
  18. Sampling Method Dimension
      Problem = Position5, Sampling Size = 1000, Learning Algorithm Parameter = pDNC 0.2
      [Plots: SRS, Sequential Rule Selection (DS1kPosition5m1) and RIS, Random Instance Selection (DS1kPosition5m4), two plots each, x axis 0 to 10000]
  19. Sampling Size Dimension
      Problem = Parity5, Sampling Method = RIS, Learning Algorithm Parameter = pDNC 0.2
      [Plots: Sampling Size 100 (DS100Parity5m4) and Sampling Size 10000 (DS10kParity5m4), two plots each, x axis 0 to 10000]
  20. Algorithm Parameter Dimension
      Problem = Mux6, Sampling Method = RIS, Sampling Size = 1000
      [Plots: pDNC 0.8 and pDNC 0.2, both on DS1kMulplx6m4, two plots each, x axis 0 to 10000]
  21. Outline
      1. Algorithm Evaluation Methodology Definition
      2. Methodology Implementation
      3. Experiment Description
      4. Results and Analysis
      5. Conclusions and Further Work
  22. Conclusions and Further Work
      Conclusions
      – Automatic Learning Algorithm Analyzer based on Artificial Data Sets
      – Comparisons along four dimensions
      – Methodology Implementation, Experiment and Results Analysis
      Further Work
      – Non-ORS Problems
      – Real Attributes
      – Sampling Methods based on distance or transition matrix
      – Multi-Step Problems
      – Different Learning Algorithms
      – Different Knowledge Representations
      – Knowledge Covering Metrics
      – Applying a Data Set Complexity Metrics Suite
  23. Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
      Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
      {jrios, aorriols, josepmg}@salle.url.edu
      GRSI (Grup de Recerca en Sistemes Intel·ligents)
      • http://www.salle.url.edu/GRSI
        – Oriented to:
          • CBR (Computer Based Reasoning) Algorithms
          • Evolutionary Computation Algorithms
          • Data Mining
        – Technology Transfer
