HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior


This document presents a genetic algorithm approach to generating synthetic data sets for analyzing classifier behavior. The genetic algorithm represents data set labelings as binary strings and uses genetic operators like crossover and mutation to evolve solutions that satisfy the desired complexity based on class boundary length. Experiments show the genetic algorithm can generate intermediate complexity data sets in early generations and produce similar accuracy rates across different classifier paradigms, while allowing control over the data set properties. Future work aims to improve efficiency and scalability, enable multiple criteria optimization, and develop benchmark problems with more realistic structure.

Genetic-based Synthetic Data Sets for the Analysis of Classifiers Behavior
8th International Conference on Hybrid Intelligent Systems

Núria Macià
Albert Orriols-Puig
Ester Bernadó-Mansilla
{nmacia,aorriols,esterb}@salle.url.edu

Grup de Recerca en Sistemes Intel·ligents
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Motivation

[Diagram: Real-world problem, Data Set, Learner, Model, Knowledge Extraction + Prediction]
Necessity of synthetic data sets
To evaluate real learners' performance under
controlled scenarios
How to generate synthetic data sets?
Data complexity (Ho & Basu, 2002)
Length of the class boundary (Macià et al., 2008)

Objective: Set of benchmark problems to analyze
learners' behavior
Overview and Future Research Slide 2

Outline
1. Data complexity
2. Synthetic data sets
3. Design of GA
4. Experiments and results
5. Conclusions and further work


1. Data complexity
Length of the class boundary
Build minimum spanning tree (MST) connecting all
the points regardless of class
Count the number of edges joining
opposite classes

Two cases of many points in boundary:
Very interleaved or random data
Linearly separable problem with narrow margins
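
The measure described on this slide can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, uses Prim's algorithm for the MST, and the function names `minimum_spanning_tree` and `boundary_length` are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns n-1 index pairs."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex that provides that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] != -1:
            edges.append((best_from[u], u))
        # Update the cheapest connection of every vertex still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges

def boundary_length(points, labels):
    """Number of MST edges whose two endpoints carry different class labels."""
    return sum(labels[a] != labels[b] for a, b in minimum_spanning_tree(points))

# Six points on a line: the MST simply connects consecutive points.
line = [(float(x), 0.0) for x in range(6)]
interleaved = boundary_length(line, [0, 1, 0, 1, 0, 1])  # every edge joins opposite classes
separated = boundary_length(line, [0, 0, 0, 1, 1, 1])    # one edge joins opposite classes
```

The same points yield very different boundary lengths depending only on the labeling, which is the property the generation procedure exploits.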


2. Synthetic data sets
Generation procedure
Set the number of instances n, the number of
attributes m, and the length of the class boundary b.
Generate n points distributed randomly and build
the MST.

Label the class of each instance
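
A rough sketch of the first two steps, plus a random starting labeling, might look as follows. Several details are assumptions made only for illustration: attributes are drawn uniformly from [0, 1), distances are Euclidean, the MST is built with Kruskal's algorithm, and the values of n, m and b are arbitrary. How to choose the labeling so that the demanded b is reached is the search problem discussed on the following slides.

```python
import itertools
import math
import random

def generate_points(n, m, rng):
    """n instances with m attributes, each drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(m)] for _ in range(n)]

def kruskal_mst(points):
    """Kruskal's algorithm with union-find over the complete Euclidean graph."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # All candidate edges, shortest first.
    pairs = sorted(itertools.combinations(range(len(points)), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    edges = []
    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:            # the edge joins two separate components: keep it
            parent[ru] = rv
            edges.append((u, v))
    return edges

rng = random.Random(0)          # fixed seed, only for reproducibility
n, m, b = 20, 2, 5              # hypothetical settings
points = generate_points(n, m, rng)
mst_edges = kruskal_mst(points)
labels = [rng.randint(0, 1) for _ in range(n)]   # random starting labeling
current_b = sum(labels[i] != labels[j] for i, j in mst_edges)
```

Because the points and the MST are fixed once generated, only `labels` changes during the search, and `current_b` is compared against the demanded `b`.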


2. Synthetic data sets
Exhaustive search
Labelings grow exponentially with the number of
instances
Heuristic search
Demanded length of the class boundary is not
always achieved
No diverse solutions

Genetic algorithm
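
To make the cost of exhaustive search concrete, the sketch below enumerates every binary labeling of a tiny example and keeps those with the demanded boundary length. The MST is given as a hand-written edge list (a hypothetical chain of four instances), and the function name is illustrative only.

```python
from itertools import product

def labelings_with_boundary(edges, n, b):
    """Enumerate all 2**n binary labelings; keep those with exactly b crossing edges."""
    hits = []
    for labels in product((0, 1), repeat=n):
        crossing = sum(labels[i] != labels[j] for i, j in edges)
        if crossing == b:
            hits.append("".join(map(str, labels)))
    return hits

# Hypothetical MST of four instances arranged as a chain 0-1-2-3.
chain = [(0, 1), (1, 2), (2, 3)]
solutions = labelings_with_boundary(chain, n=4, b=1)

# Size of the search space for a few data set sizes (two classes).
search_space = {n: 2 ** n for n in (10, 20, 50)}
```

With four instances there are only 16 labelings to check, but the count doubles with each added instance, which is why enumeration stops being practical for realistic data set sizes.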


3. Design of GA
Knowledge representation
k-ary string where position i stores the class label of
the i-th instance

Data set i
Att. 1  Att. 2  …  Att. N  Class
0.4     0.5     …  0.4     0
0.2     1.0     …  0.2     1
0.5     0.3     …  0.4     1
0.6     0.5     …  0.4     0
0.7     0.1     …  1.0     1
0.5     0.3     …  0.9     1

Individual i
011011
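
The mapping between a data set and an individual can be illustrated with the slide's own example. This is only a sketch: `encode` and `decode` are hypothetical helper names, and writing each label as a single character assumes at most ten classes.

```python
def encode(dataset):
    """Individual = class labels of the instances, in order, as one string."""
    return "".join(str(row[-1]) for row in dataset)

def decode(individual, attributes):
    """Re-attach the labels stored in an individual to the fixed attribute vectors."""
    if len(individual) != len(attributes):
        raise ValueError("one label per instance is required")
    return [list(attrs) + [int(label)] for attrs, label in zip(attributes, individual)]

# The data set shown on the slide: three attribute values per instance, then the class.
dataset = [
    [0.4, 0.5, 0.4, 0],
    [0.2, 1.0, 0.2, 1],
    [0.5, 0.3, 0.4, 1],
    [0.6, 0.5, 0.4, 0],
    [0.7, 0.1, 1.0, 1],
    [0.5, 0.3, 0.9, 1],
]
individual = encode(dataset)
restored = decode(individual, [row[:-1] for row in dataset])
```

Only the class column is encoded; the attribute values stay fixed, so the genetic operators never move a point, they only relabel it.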


3. Design of GA
Genetic operators
s-wise tournament selection
Two-point crossover
Bit-wise mutation
Fitness function
fitness_i = b_obj − b_i
(b_obj: demanded length of the class boundary; b_i: length obtained by individual i)
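
The operators named on this slide could be combined roughly as below. This is a sketch under several assumptions that the slides do not state: two classes, fitness taken as the absolute difference and minimised, a generational replacement scheme, and arbitrary default parameters. The MST is passed in as an edge list, and all function names are hypothetical.

```python
import random

def fitness(individual, edges, b_obj):
    """Distance to the demanded boundary length; 0 means the demand is met.
    Taking the absolute value is an assumption, so overshooting is penalised too."""
    b_i = sum(individual[u] != individual[v] for u, v in edges)
    return abs(b_obj - b_i)

def tournament(population, scores, s, rng):
    """s-wise tournament: sample s individuals, return the one with the lowest score."""
    contenders = rng.sample(range(len(population)), s)
    return population[min(contenders, key=lambda i: scores[i])]

def two_point_crossover(a, b, rng):
    """Exchange the segment between two cut points of the parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bitwise_mutation(individual, rate, rng):
    """Flip each binary label independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]

def evolve(edges, n, b_obj, pop_size=30, s=3, rate=0.02, generations=200, seed=0):
    """Simple generational GA; returns the best labeling seen so far."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=lambda ind: fitness(ind, edges, b_obj))
    for _ in range(generations):
        if fitness(best, edges, b_obj) == 0:
            break   # demanded boundary length reached
        scores = [fitness(ind, edges, b_obj) for ind in population]
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament(population, scores, s, rng)
            p2 = tournament(population, scores, s, rng)
            c1, c2 = two_point_crossover(p1, p2, rng)
            offspring += [bitwise_mutation(c1, rate, rng),
                          bitwise_mutation(c2, rate, rng)]
        population = offspring[:pop_size]
        best = min(population + [best], key=lambda ind: fitness(ind, edges, b_obj))
    return best

# Hypothetical MST: a chain of 12 instances; demand 4 edges joining opposite classes.
chain = [(i, i + 1) for i in range(11)]
best = evolve(chain, n=12, b_obj=4)
```

Individuals are lists of 0/1 labels here rather than strings, which keeps mutation and crossover simple; converting to the string form of the representation slide is a one-line join.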


4. Experiments and results (I)
Synthetic data set generation
Number of different solutions < number of solutions
Population converges to the same solution
{0100,1011} are equivalent individuals
Intermediate complexities are obtained in early
generations
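
One plausible reading of the equivalence noted above is that swapping the two class labels leaves every MST edge either joining opposite classes or not, exactly as before, so 0100 and 1011 have the same boundary length. Under that assumption (two classes only), different solutions can be counted by mapping each individual to a canonical form; the helper names below are hypothetical.

```python
def complement(individual):
    """Swap the two class labels of a binary individual."""
    return individual.translate(str.maketrans("01", "10"))

def canonical(individual):
    """Representative of {individual, complement}: the one that starts with '0'."""
    return individual if individual.startswith("0") else complement(individual)

def count_distinct(population):
    """Number of different solutions once complementary labelings are merged."""
    return len({canonical(ind) for ind in population})

population = ["0100", "1011", "0100", "0011"]
distinct = count_distinct(population)
```

Counting canonical forms rather than raw strings avoids overstating the diversity of a population that has converged to a labeling and its complement.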


4. Experiments and results (II)
Analysis of classifiers behavior
Three different paradigms: C4.5, Naïve Bayes, and
SMO
Similar accuracy rates with noticeable variability


5. Conclusions
The GA allows us to generate data sets with
the demanded length of the class boundary


6. Further work
Efficiency and scalability
Move from simple GA to competent GA
Capacity of satisfying multiple criteria
Multi-objective strategy
Achieve structure of real-world problems
Provide a set of benchmark problems


