Presentation by Xavier Llorà, Kumara Sastry, & David E. Goldberg showing how linkage learning is possible in Pittsburgh-style learning classifier systems.
OpenTag: Open Attribute Value Extraction From Product Profiles (Subhabrata Mukherjee)
Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li
KDD 2018, London, UK
OpenTag brings deep learning and active learning together into a state-of-the-art system for imputation and open entity extraction.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning... (Xavier Llorà)
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates, which may replace the costly procedure of matching a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve this goal, we show that the probabilistic models need to be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules into dependency structure matrices (DSMs), which allows building accurate models of overlapping building blocks---a necessary condition for accurately estimating the fitness of the evolved rules.
Tutorial on Theory and Application of Generative Adversarial Networks (MLReview)
Description
The generative adversarial network (GAN) has recently emerged as a promising generative modeling approach. It consists of a generative network and a discriminative network; through the competition between the two, it learns to model the data distribution. In addition to modeling the image/video distribution in computer vision problems, the framework can be used to define visual concepts through examples, and to a large extent it eliminates the need to hand-craft objective functions for various computer vision problems. In this tutorial, we will present an overview of generative adversarial network research, covering several recent theoretical studies as well as training techniques and vision applications of generative adversarial networks.
Towards billion bit optimization via parallel estimation of distribution algo... (kknsastry)
This paper presents a highly efficient, fully parallelized implementation of the compact genetic algorithm (cGA) to solve very large scale problems with millions to billions of variables. The paper presents principled results demonstrating the scalable solution of a difficult test function on instances of over a billion variables using a parallel implementation of the cGA. The problem addressed is a noisy, blind problem over a vector of binary decision variables. Noise is added equaling up to a tenth of the deterministic objective function variance of the problem, thereby making it difficult for simple hillclimbers to find the optimal solution. The compact GA, on the other hand, is able to find the optimum in the presence of noise quickly, reliably, and accurately, and the solution scalability follows known convergence theories. These results on a noisy problem, together with other results on problems involving varying modularity, hierarchy, and overlap, foreshadow routine solution of billion-variable problems across the landscape of search problems.
A short presentation on the current state of Generative Adversarial Networks. Some of the materials are borrowed from the ICCV 2017 tutorial on GANs. I have put a reference where applicable at the bottom of the slide.
Empirical Analysis of ideal recombination on random decomposable problems (kknsastry)
This paper analyzes the behavior of a selectorecombinative genetic algorithm (GA) with an ideal crossover on a class of random additively decomposable problems (rADPs). Specifically, additively decomposable problems of order k whose subsolution fitnesses are sampled from the standard uniform distribution U[0,1] are analyzed. The scalability of the selectorecombinative GA is investigated for 10,000 rADP instances. The validity of facetwise models in bounding the population size, run duration, and the number of function evaluations required to successfully solve the problems is also verified. Finally, rADP instances that are easiest and most difficult are also investigated.
Slides for Ekaterina Vylomova's master's thesis, "Neural Network Modeling of Verbal Consciousness", which was defended in English (#iu5, #bmstu).
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ... (Vahid Taslimitehrani)
Presented at 15th International Conference on BioInformatics and BioEngineering (BIBE2014)
Prognostic modeling is central to medicine, as it is often used to predict patients' outcomes and responses to treatments and to identify important medical risk factors. Logistic regression is one of the most widely used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt CPXR (Contrast Pattern Aided Regression, a recently introduced regression method) to develop a new logistic regression method called CPXR(Log) for general binary outcome prediction (including prognostic modeling), and we use the method to carry out prognostic modeling for TBI using admission-time data. The models produced by CPXR(Log) achieved AUC as high as 0.93 and specificity as high as 0.97, much better than those reported by previous studies. Our method produced interpretable prediction models for diverse patient groups for TBI, which show that different kinds of patients should be evaluated differently for TBI outcome prediction and that the odds ratios of some predictor variables differ significantly from those given by previous studies; such results can be valuable to physicians.
High-performance graph analysis is unlocking knowledge in computer security, bioinformatics, social networks, and many other data integration areas. Graphs provide a convenient abstraction for many data problems beyond linear algebra. Some problems map directly to linear algebra. Others, like community detection, look eerily similar to sparse linear algebra techniques. And then there are algorithms that strongly resist attempts at making them look like linear algebra. This talk will cover recent results with an emphasis on streaming graph problems, where the graph changes and results need to be updated with minimal latency. We'll also touch on issues of sensitivity and reliability, where graph analysis needs to learn from numerical analysis and linear algebra.
This talk describes a study showing that integrating foveation into modern convolutional neural networks improves their robustness to adversarial attacks and common image corruptions. These slides are from a talk given by Muhammad Ahmed Shah at RIKEN AIP, Tokyo, Japan as part of the TrustML Young Scientist Seminar.
A quick overview of the seed for Meandre 2.0 series. It covers the main motivations moving forward and the disruptive changes introduced via the use of Scala and MongoDB
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0 (Xavier Llorà)
One hundred and fifty years have passed since the publication of Darwin's world-changing manuscript "On the Origin of Species by Means of Natural Selection". Darwin's ideas have proven their power to reach beyond the biology realm, and their ability to define a conceptual framework which allows us to model and understand complex systems. In the mid-1950s and 60s, the efforts of a scattered group of engineers proved the benefits of adopting an evolutionary paradigm to solve complex real-world problems. In the 70s, the growing availability of computers brought us a new collection of artificial evolution paradigms, among which genetic algorithms rapidly gained widespread adoption. Currently, the Internet has driven an exponential growth of information and computational resources that is clearly disrupting our perception and forcing us to reevaluate the boundaries between technology and social interaction. Darwin's ideas can, once again, help us understand such disruptive change. In this talk, I will review the origin of artificial evolution ideas and techniques. I will also show how these techniques are, nowadays, helping to solve a wide range of applications, from life-science problems to Twitter puzzles, and how high-performance computing can make Darwin's ideas a routine tool to help us model and understand complex systems.
Large Scale Data Mining using Genetics-Based Machine Learning (Xavier Llorà)
We are living in the petabyte era. We have larger and larger data to analyze, process, and transform into useful answers for the domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality while producing human-understandable solutions, are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to have the capacity of processing these vast amounts of data, and they need to process this data within reasonable time. Moreover, massive computation cycles are getting cheaper every day, allowing researchers access to unprecedented degrees of parallelization. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency-enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts.
This tutorial will try to answer these questions, following a roadmap that starts with what "large" means and why large is a challenge for GBML methods. Afterwards, we will discuss different facets in which we can overcome this challenge: efficiency-enhancement techniques, representations able to cope with high-dimensional spaces, and scalability of learning paradigms. We will also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to engineer them for the best performance on large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us... (Xavier Llorà)
Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr... (Xavier Llorà)
Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to utilize microscopic chemical composition for diagnoses. In contrast to visible imaging, the approach results in very large data sets, as each pixel contains the entire molecular vibrational spectroscopy data from all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.
This presentation covers a brief overview of the current stage of the DISCUS project. General overview and introduction to some of the currently available tools
Linkage Learning for Pittsburgh LCS: Making Problems Tractable
1. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab
University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu
2. Motivation and Early Work
• Can we apply Wilson’s ideas for evolving rule sets
formed only by maximally accurate and general rules in
Pittsburgh LCS?
• Previous Multi-objective approaches:
Bottom up (Bernadó, 2002)
• Panmictic populations
• Multimodal optimization (sharing/crowding for niche formation)
Top down (Llorà, Goldberg, Traus, Bernadó, 2003)
• Explicitly address accuracy and generality
• Use them to produce compact rule sets
• The compact classifier system (CCS) is rooted in the bottom-up
approach.
NIGEL 2006 Llorà, X., Sastry, K., and Goldberg, D. 2
3. Maximally Accurate and General Rules
• Accuracy and generality can be computed as
α(r) = (n_t+(r) + n_t−(r)) / n_t        γ(r) = n_t+(r) / n_m
• Fitness should combine accuracy and generality:
f(r) = α(r) · γ(r)
• Such a measure can be applied either to rules or to rule sets.
• The CCS uses this fitness and a compact genetic algorithm
(cGA) to evolve such rules.
• One cGA run provides one rule.
• Multiple rules are required to form a rule set.
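Assuming the definitions above as reconstructed (n_t = number of training examples, n_m(r) = examples matched by rule r, n_t+(r) = positive examples r covers correctly, n_t−(r) = negative examples r correctly leaves unmatched), a minimal Python sketch of this fitness on the 6-input multiplexer could look like:

```python
from itertools import product

def mux6(bits):
    """6-input multiplexer: the first 2 address bits select one of 4 data bits."""
    addr = int(bits[:2], 2)
    return bits[2 + addr] == "1"

# Full training set: every 6-bit string labelled by the multiplexer.
EXAMPLES = [("".join(b), mux6("".join(b))) for b in product("01", repeat=6)]

def matches(rule, x):
    """A rule over {0, 1, #} matches x if every non-# position agrees."""
    return all(r == "#" or r == b for r, b in zip(rule, x))

def fitness(rule):
    """f(r) = alpha(r) * gamma(r), with alpha and gamma as on the slide."""
    n_t = len(EXAMPLES)
    matched = [y for x, y in EXAMPLES if matches(rule, x)]
    n_m = len(matched)
    n_t_pos = sum(1 for y in matched if y)             # positives correctly covered
    n_t_neg = sum(1 for x, y in EXAMPLES
                  if not matches(rule, x) and not y)   # negatives correctly rejected
    alpha = (n_t_pos + n_t_neg) / n_t
    gamma = n_t_pos / n_m if n_m else 0.0
    return alpha * gamma

print(fitness("001###"))  # maximally general, maximally accurate rule
print(fitness("001000"))  # over-specific version of the same rule scores lower
```

The more general rule scores higher, which is the pressure the fitness is designed to apply.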
4. The cGA Can Make It
• Rules may be obtained by optimizing f(r) = α(r) · γ(r)
• The basic cGA scheme:
1. Initialization: p_{x_i}^0 = 0.5
2. Model sampling (two individuals are generated)
3. Evaluation (f(r))
4. Selection (tournament selection)
5. Probabilistic model update
6. Repeat steps 2–5 until the termination criteria are met
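The scheme above can be sketched in Python on a toy onemax objective (illustrative only; the deck applies it to the rule fitness f(r)):

```python
import random

def cga(fitness, length, pop_size=50, max_iters=5000, seed=1):
    """Compact GA: evolve a probability vector instead of a population."""
    rng = random.Random(seed)
    p = [0.5] * length                           # step 1: p_{x_i}^0 = 0.5
    for _ in range(max_iters):
        # step 2: sample two individuals from the model
        a = [int(rng.random() < pi) for pi in p]
        b = [int(rng.random() < pi) for pi in p]
        # steps 3-4: evaluate and select (binary tournament)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # step 5: shift the model by 1/pop_size toward the winner
        for i in range(length):
            if winner[i] != loser[i]:
                p[i] += (1 / pop_size) if winner[i] else -(1 / pop_size)
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi in (0.0, 1.0) for pi in p):    # step 6: model converged
            break
    return p

print(cga(sum, length=10))   # onemax: fitness = number of ones
```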
5. cGA Model Perturbation
• Facilitate the evolution of different rules
• Explore the frequency of appearance of each optimal
rule
• Initial model perturbation: p_{x_i}^0 = 0.5 + U(−0.4, 0.4)
• Experiments using the 3-input multiplexer
• 1,000 independent runs
• Visualize the pair-wise relations of the genes
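The perturbed initialization is a one-liner; a sketch (the function name is my own):

```python
import random

def perturbed_model(length, rng=random):
    """Initial cGA model perturbation: p_{x_i}^0 = 0.5 + U(-0.4, 0.4).
    Each gene starts biased one way or the other, so independent runs
    tend to converge to different optimal rules."""
    return [0.5 + rng.uniform(-0.4, 0.4) for _ in range(length)]

print(perturbed_model(3))   # e.g. an initial model for a 3-gene rule
```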
6. But One Rule Is Not Enough
• Model perturbation in the cGA evolves different rules
• The goal: evolve a population of rules that solve the
problem together
• The fitness measure f(r) can also be applied to rule
sets
• Two mechanisms:
Spawn new populations until the solution is met
Fuse populations when they represent the same rule
7. Spawning and Fusing Populations
8. Experiments & Scalability
• Analysis using multiplexer problems (3-, 6-, and 11-input)
• The number of rules in [O] grows exponentially.
It grows as 2^i, where i is the number of inputs.
Assume equal probability of hitting each rule (binomial model).
The number of runs needed to obtain all the rules in [O] grows
exponentially.
• cGA success rate as a function of the problem size:
3-input: 97%
6-input: 73.93%
11-input: 43.03%
• Scalability over 10,000 independent runs
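Under the equal-probability (binomial) assumption, the expected number of independent single-rule runs needed to collect all k rules of [O] is the coupon-collector quantity k·H_k; a sketch with illustrative values of k:

```python
def expected_runs(k):
    """Coupon collector: expected independent runs until all k equally
    likely rules of [O] have been obtained at least once (k * H_k)."""
    return k * sum(1 / i for i in range(1, k + 1))

for k in (4, 8, 16, 32):
    print(f"{k:>2} rules in [O]: {expected_runs(k):6.1f} expected runs")
```

This ignores failed runs, so it underestimates the real cost; the per-run success rates above make multiple runs even less attractive.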
10. So?
• Open questions:
Multiple runs are not an option.
Could the poor cGA scalability be the result of the existence of linkage?
• The χ-ary extended compact classifier system (χeCCS) needs to
provide answers to:
Perform linkage learning to improve the scalability of the rule learning
process.
Evolve [O] in a single run (rule niching?).
• The χeCCS answer:
Use the extended compact genetic algorithm (Harik, 1999)
Rule niching via restricted tournament replacement (Harik, 1995)
11. Extended Compact Genetic Algorithm
• A probabilistic model-building GA (Harik, 1999):
Builds models of good solutions as linkage groups
• Key idea:
Good probability distribution → linkage learning
• Key components:
Representation: marginal product model (MPM)
• Marginal distribution over a gene partition
Quality: minimum description length (MDL)
• Occam's razor principle
• All things being equal, simpler models are better
Search method: greedy heuristic search
12. Marginal Product Model (MPM)
• Partition variables into clusters
• Product of marginal distributions on a partition of genes
• Gene partition maps to linkage groups
MPM: [1, 2, 3], [4, 5, 6], …, [l−2, l−1, l]
Genes: x1 x2 x3 | x4 x5 x6 | … | x(l−2) x(l−1) xl
Each three-gene partition stores its joint marginal distribution:
{p000, p001, p010, p011, p100, p101, p110, p111}
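Estimating an MPM from a selected population amounts to one joint frequency table per linkage group; a sketch (names are my own):

```python
from collections import Counter

def mpm_marginals(population, partition):
    """Marginal product model: one joint frequency table per gene group.

    population: list of equal-length strings over the alphabet
    partition:  list of gene-index groups, e.g. [[0, 1, 2], [3, 4, 5]]
    """
    n = len(population)
    model = []
    for group in partition:
        counts = Counter("".join(ind[i] for i in group) for ind in population)
        model.append({setting: c / n for setting, c in counts.items()})
    return model

pop = ["000111", "000101", "111111", "111000"]
for table in mpm_marginals(pop, [[0, 1, 2], [3, 4, 5]]):
    print(table)
```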
13. Minimum Description Length Metric
• Hypothesis: for an optimal model,
model size and error are minimal
• Model complexity, Cm:
number of bits required to store all marginal probabilities
• Compressed population complexity, Cp:
entropy of the marginal distributions over all partitions
• MDL metric: Cc = Cm + Cp
14. Building an Optimal MPM
1. Assume independent genes ([1], [2], …, [l])
2. Compute the MDL metric, Cc
3. Form all combinations of two-subset merges,
e.g., {([1,2],[3],…,[l]), ([1,3],[2],…,[l]), …, ([1],[2],…,[l−1,l])}
4. Compute the MDL metric for all model candidates
5. Select the candidate with the minimum MDL metric, C′c
6. If C′c < Cc, accept the model and go to step 2
7. Else, the current model is optimal
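A sketch of the metric and the greedy search together, for a binary alphabet, using the standard eCGA formulas Cm = log2(n+1)·Σ(2^|g| − 1) and Cp = n·Σ H(group marginal); all names here are my own:

```python
import math
from collections import Counter

def mdl(population, partition):
    """MDL metric C_c = C_m + C_p for a binary marginal product model."""
    n = len(population)
    # model complexity: bits to store the marginals of every linkage group
    c_m = math.log2(n + 1) * sum(2 ** len(g) - 1 for g in partition)
    # compressed population complexity: entropy of each group's marginal
    c_p = 0.0
    for group in partition:
        counts = Counter("".join(ind[i] for i in group) for ind in population)
        c_p += n * -sum((c / n) * math.log2(c / n) for c in counts.values())
    return c_m + c_p

def greedy_mpm(population):
    """Greedy model search: start from independent genes, keep taking the
    pairwise merge that lowers C_c the most, stop when no merge helps."""
    partition = [[i] for i in range(len(population[0]))]
    best = mdl(population, partition)
    while len(partition) > 1:
        candidates = []
        for i in range(len(partition)):
            for j in range(i + 1, len(partition)):
                merged = ([partition[i] + partition[j]] +
                          [g for k, g in enumerate(partition) if k not in (i, j)])
                candidates.append((mdl(population, merged), merged))
        score, merged = min(candidates)
        if score >= best:        # no merge lowers C_c: current model is optimal
            break
        partition, best = merged, score
    return partition, best

# Genes 0 and 1 always agree (one linkage group); gene 2 is independent.
pop = ["001", "000", "111", "110", "001", "110", "000", "111"]
partition, score = greedy_mpm(pop)
print(sorted(sorted(g) for g in partition))
```

On this toy population the search merges genes 0 and 1 and then stops, since folding in the independent gene 2 would only raise the metric.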
15. Extended Compact Genetic Algorithm
1. Initialize the population (usually random initialization)
2. Evaluate the fitness of the individuals
3. Select promising solutions (e.g., tournament selection)
4. Build the probabilistic model
5. Optimize its structure & parameters to best fit the selected individuals
(automatic identification of sub-structures)
6. Sample the model to create new candidate solutions
(effective exchange of building blocks)
7. Repeat steps 2–6 until some convergence criterion is met
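The loop can be sketched end to end; for brevity this stand-in builds independent per-gene marginals (a UMDA-style simplification, not the full MPM/MDL search) so the select → model → sample structure stays visible:

```python
import random

def ecga_like(fitness, length, pop_size=100, generations=30, seed=3):
    """Skeleton of the model-building loop with independent marginals."""
    rng = random.Random(seed)
    # 1. initialize the population at random
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2-3. evaluate and select promising solutions (binary tournament)
        selected = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            selected.append(a if fitness(a) >= fitness(b) else b)
        # 4-5. build the model (here: per-gene marginals, not a full MPM)
        p = [sum(ind[i] for ind in selected) / pop_size for i in range(length)]
        # 6. sample the model to create the next population
        pop = [[int(rng.random() < p[i]) for i in range(length)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

print(ecga_like(sum, length=20))   # onemax as a stand-in objective
```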
16. Models built by eCGA
• Use model-building procedure of extended compact GA
Partition genes into (mutually) independent groups
Start with the lowest complexity model
Search for a least-complex, most-accurate model
Model Structure                                                  Metric
[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]    1.0000
[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11]       0.9933
[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11]          0.9819
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11]             0.9644
⋮
[X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11]                      0.9273
⋮
[X0X1X2X3] [X4X5X6X7] [X8X9X10X11]                               0.8895
17. Modifying ecGA for Rule Learning
• Rules are described using χ-ary alphabets {0, 1, #}.
• χeCCS uses a χ-ary version of ecGA (Sastry and Goldberg,
2003; de la Ossa, Sastry, and Lobo, 2006).
• Maximally general and maximally accurate rules may be
obtained using:
f(r) = α(r) · γ(r)
• Needs to maintain multiple rules in a run → niching
We need an efficient niching method that does not adversely
affect the quality of the probabilistic models:
restricted tournament replacement (Harik, 1995)
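A sketch of restricted tournament replacement with Hamming distance and window size w (the two-peak toy fitness is illustrative, not the deck's rule fitness):

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, offspring, fitness, w=8, rng=random):
    """Restricted tournament replacement: the offspring competes only
    against the most similar of w randomly chosen members, so different
    niches (here: different rules) can coexist in one population."""
    window = rng.sample(range(len(population)), min(w, len(population)))
    closest = min(window, key=lambda i: hamming(population[i], offspring))
    if fitness(offspring) > fitness(population[closest]):
        population[closest] = offspring   # replace only its nearest rival

# Toy usage: a two-peak fitness keeps both all-zeros and all-ones niches.
f = lambda s: max(sum(s), len(s) - sum(s))
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
for _ in range(200):
    child = [random.randint(0, 1) for _ in range(8)]
    rtr_insert(pop, child, f)
print(pop[:3])
```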
18. Experiments
• Goals:
1. Is linkage learning useful to solve the multiplexer problem using
Pittsburgh LCS?
2. How far can we push it?
• Multiplexer problems:
Address bits determine which input to use
There is an underlying structure, isn't there?
• The largest instance solved using Pittsburgh approaches (11-input):
Matches all the examples
No linkage learning available
• We borrowed the population-sizing theory from eCGA.
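The multiplexer family itself can be written generically: an instance with a address bits has i = a + 2^a inputs, giving i ∈ {3, 6, 11, 20, 37, 70} for a = 1…6 (sketch):

```python
def multiplexer(bits, a):
    """i-input multiplexer with a address bits (i = a + 2**a):
    the address selects which data bit is passed through."""
    assert len(bits) == a + 2 ** a
    addr = int("".join(map(str, bits[:a])), 2)
    return bits[a + addr]

# 6-input multiplexer (a = 2): address '10' selects data bit 2.
print(multiplexer([1, 0, 0, 0, 1, 0], a=2))   # -> 1
```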
19. χeCCS Models for Different Multiplexers
(Figure: evolved models for the different multiplexers; building block size increases.)
20. χeCCS Scalability
• Follows facet-wise theory:
1. Grows exponentially with the number of address bits (the building block size)
2. Grows quadratically with the problem size
21. Conclusions
• The χeCCS builds on competent GAs
• The facet-wise models from GA theory hold
• The χeCCS is able to:
1. Perform linkage learning to improve the scalability of the rule
learning process.
2. Evolve [O] in a single run.
• The χeCCS shows the need for linkage learning in
Pittsburgh LCS to effectively solve multiplexer
problems.
• The χeCCS solved the 20-input, 37-input, and 70-input
multiplexer problems for the first time using a Pittsburgh
LCS.
22. Linkage Learning for Pittsburgh LCS:
Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab
University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu