The document discusses learning classifier systems (LCS) for addressing class imbalance problems in datasets. It aims to enhance the applicability of LCS to knowledge discovery from real-world datasets that often exhibit class imbalance, where one class is represented by significantly fewer examples than other classes. The author proposes adapting parameters of the XCS learning classifier system, such as learning rate and genetic algorithm threshold, based on estimated class imbalance ratios within classifiers' niches in order to minimize bias towards majority classes and better handle small disjuncts representing minority classes.
Dr. Oner Celepcikay
CS 4319
Machine Learning
Week 6
Data Science Tool I – Classification Part II
Tree Induction
Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.

Issues
- Determine how to split the records
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting
Stopping Criteria for Tree Induction
- Stop expanding a node when all the records belong to the same class
- Stop expanding a node when all the records have similar attribute values
- Early termination (to be discussed later)
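The greedy split selection above can be made concrete with a small impurity calculation (a minimal sketch; Gini is one common splitting criterion, and the function names here are mine, not from the slides):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of a binary split; the greedy strategy
    picks the attribute test that minimizes this."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# A 50/50 node has impurity 0.5; a split into pure children scores 0.0.
print(gini(["m", "m", "n", "n"]))          # 0.5
print(split_gini(["m", "m"], ["n", "n"]))  # 0.0
```

The tree builder would evaluate `split_gini` for every candidate attribute test at a node and keep the one with the lowest score.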
Practical Issues of Classification
- Underfitting and Overfitting
- Missing Values
- Costs of Classification
Underfitting and Overfitting
Overfitting
Underfitting: when the model is too simple, both training and test errors are large
Overfitting due to Noise
Decision boundary is distorted by noise point
Overfitting due to Noise
* Bats and whales are misclassified as non-mammals instead of mammals.
Overfitting due to Noise
Decision boundary is distorted by noise point
Both humans and dolphins were misclassified as non-mammals because their Body Temperature, Gives_Birth, and Four-legged values are identical to mislabeled records in the training set.
Spiny anteaters represent an exceptional case (in the training set, every warm-blooded animal that does not give birth is a non-mammal).
Decision tree perfectly fits training data (training error=0)
But error rate on test data is 30%.
Estimating Generalization Errors
Re-substitution errors: error on the training set ( e(t) )
Generalization errors: error on the test set ( e'(t) )
Methods for estimating generalization errors:
- Optimistic approach: e'(t) = e(t)
- Pessimistic approach:
  For each leaf node: e'(t) = e(t) + 0.5
  Total errors: e'(T) = e(T) + N × 0.5 (N: number of leaf nodes)
  For a tree with 30 leaf nodes and 10 errors on training (out of 1000 instances):
  Training error = 10/1000 = 1%
  Generalization error = (10 + 30 × 0.5)/1000 = 2.5%
- Reduced error pruning (REP): uses a validation data set to estimate the generalization error
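The pessimistic estimate is easy to reproduce in code (a minimal sketch; the function name is mine, while the 0.5-per-leaf penalty follows the slide):

```python
def pessimistic_error(train_errors, n_leaves, n_instances, penalty=0.5):
    """Pessimistic generalization estimate from the slide:
    e'(T) = (e(T) + N * penalty) / n_instances, N = number of leaves."""
    return (train_errors + n_leaves * penalty) / n_instances

# The slide's example: 10 training errors, 30 leaves, 1000 instances.
print(pessimistic_error(10, 30, 1000))  # 0.025, i.e. 2.5%
```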
Occam's Razor
Given two models with similar generalization errors, one should prefer the simpler model over the more complex one.
For complex models, there is a greater chance that the model was fitted accidentally by errors in the data.
Therefore, one should include model complexity when evaluating a model.
How to Address Overfitting
Pre-Pruning (Early Stopping Rule)
- Stop the algorithm before it becomes a fully-grown tree
- Typical stopping conditions for a node:
  - Stop if all instances belong to the same class
  - Stop if all the attribute values are the same
- More restrictive conditions:
  - Stop if the number of instances is less than ...
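The pre-pruning conditions listed above can be sketched as a single check (illustrative only; the minimum-instance threshold is an arbitrary placeholder for the truncated condition):

```python
def should_stop(labels, rows, min_instances=5):
    """Pre-pruning checks: stop growing the tree at this node if any
    of the stopping conditions holds (min_instances is illustrative)."""
    if len(set(labels)) == 1:            # all instances in the same class
        return True
    if all(r == rows[0] for r in rows):  # all attribute values identical
        return True
    if len(rows) < min_instances:        # too few instances to split further
        return True
    return False

print(should_stop(["m", "m", "m"], [(1,), (2,), (3,)]))                  # True
print(should_stop(["m", "n", "m", "n", "m"], [(i,) for i in range(5)]))  # False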
Modeling XCS in class imbalances: Population sizing and parameter settings (kknsastry)
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio (the ratio between the number of instances of the majority class and of the minority class that are presented to XCS) on population initialization and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with the standard configuration scales exponentially.
The causes potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers' parameters, mutation, and subsumption are analyzed, and improvements to XCS's mechanisms are proposed to handle imbalanced problems effectively and efficiently. Once these recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
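The linear-scalability result can be illustrated with a back-of-the-envelope bound (a simplification for intuition only, not the paper's facetwise model; the function name and niche-size constant are mine): if the imbalance ratio is ir, minority-class instances arrive with probability roughly 1/(1 + ir), so sustaining a minority niche of fixed size suggests a population on the order of (1 + ir) times that size.

```python
def min_population_bound(imbalance_ratio, niche_size=20):
    """Illustrative linear bound: minority niches are activated with
    probability ~1/(1 + ir), so keeping `niche_size` minority classifiers
    alive suggests a population of order niche_size * (1 + ir)."""
    return niche_size * (1 + imbalance_ratio)

for ir in (1, 10, 100):
    print(ir, min_population_bound(ir))  # grows linearly, not exponentially
```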
In this project, we study the classification problem and compare some traditional statistical models with neural networks. This work was done as part of the postgraduate programme in Web Science at the Department of Mathematics, Aristotle University of Thessaloniki.
Statistical Analysis of Imaging Trials: Multivariate Methods and Prediction, Probing Cancer with MR II: From Animal Models to Clinical Assessment, 17th Annual Conference of the International Society for Magnetic Resonance in Medicine, Honolulu, Hawai'i, April 19-24.
It's Not Magic - Explaining classification algorithms (Brian Lange)
As organizations increasingly leverage data and machine learning methods, people throughout those organizations need to build a basic "data literacy" in those topics. In this session, data scientist and instructor Brian Lange provides simple, visual, and equation-free explanations for a variety of classification algorithms, geared towards helping anyone understand how they work. Now with Python code examples!
Defect models trained on class-imbalanced datasets (i.e., datasets where the proportions of defective and clean modules are not equally represented) are highly susceptible to producing inaccurate predictions. Prior research compares the impact of class rebalancing techniques on the performance of defect models but arrives at contradictory conclusions due to different choices of datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines on whether class rebalancing techniques should be applied in the context of defect models. In this paper, we investigate the impact of class rebalancing techniques on the performance measures and the interpretation of defect models. We also investigate the experimental settings in which class rebalancing techniques are beneficial for defect models. Through a case study of 101 datasets spanning proprietary and open-source systems, we conclude that the impact of class rebalancing techniques on the performance of defect prediction models depends on the performance measure and the classification technique used. We observe that the optimized SMOTE technique and the under-sampling technique are beneficial when quality assurance teams wish to increase AUC and Recall, respectively, but they should be avoided when deriving knowledge and understanding from defect models.
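Random under-sampling, one of the rebalancing techniques compared in the study, can be sketched in a few lines (a minimal illustration, not the authors' implementation; all names here are mine):

```python
import random

def undersample(X, y, majority_label, seed=0):
    """Random under-sampling: drop majority-class rows at random
    until both classes are equally represented."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    majority = [(x, t) for x, t in zip(X, y) if t == majority_label]
    kept = rng.sample(majority, len(minority))  # keep as many as the minority
    data = minority + kept
    rng.shuffle(data)
    xs, ys = zip(*data)
    return list(xs), list(ys)

# 8 "clean" vs 2 "defective" modules -> balanced 2 vs 2 after resampling.
X = [[i] for i in range(10)]
y = ["clean"] * 8 + ["defective"] * 2
Xb, yb = undersample(X, y, majority_label="clean")
print(sorted(yb))  # ['clean', 'clean', 'defective', 'defective']
```

As the abstract notes, whether such rebalancing helps depends on the performance measure: it tends to raise Recall at the cost of distorting what the fitted model says about the original class distribution.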
A quick overview of the seed for the Meandre 2.0 series. It covers the main motivations moving forward and the disruptive changes introduced via the use of Scala and MongoDB.
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0 (Xavier Llorà)
One hundred and fifty years have passed since the publication of Darwin's world-changing manuscript "On the Origin of Species by Means of Natural Selection". Darwin's ideas have proven their power to reach beyond the realm of biology and their ability to define a conceptual framework that allows us to model and understand complex systems. In the mid 1950s and 60s, the efforts of a scattered group of engineers proved the benefits of adopting an evolutionary paradigm to solve complex real-world problems. In the 70s, the growing availability of computers brought us a new collection of artificial evolution paradigms, among which genetic algorithms rapidly gained widespread adoption. Currently, the Internet has driven an exponential growth of information and computational resources that is clearly disrupting our perception and forcing us to reevaluate the boundaries between technology and social interaction. Darwin's ideas can, once again, help us understand such disruptive change. In this talk, I will review the origin of artificial evolution ideas and techniques. I will also show how these techniques are nowadays helping to solve a wide range of applications, from life-science problems to Twitter puzzles, and how high-performance computing can make Darwin's ideas a routine tool to help us model and understand complex systems.
Large Scale Data Mining using Genetics-Based Machine Learning (Xavier Llorà)
We are living in the petabyte era. We have larger and larger data to analyze, process, and transform into useful answers for domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality while producing human-understandable solutions, are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task due, among other reasons, to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need the capacity to process these vast amounts of data, and they need to process this data within reasonable time. Moreover, massive computation cycles are getting cheaper every day, giving researchers access to unprecedented degrees of parallelization. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts.
This tutorial will try to answer these questions, following a roadmap that starts with what "large" means and why large is a challenge for GBML methods. Afterwards, we will discuss the different facets through which we can overcome this challenge: efficiency enhancement techniques, representations able to cope with high-dimensional spaces, and scalability of learning paradigms. We will also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to better engineer them and get the best performance out of them on large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us... (Xavier Llorà)
Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.
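The embarrassingly parallel step that such paradigms distribute, fitness evaluation, reduces to a map over the population (an illustrative sketch using threads on the standard OneMax benchmark; the real data-intensive versions discussed in the paper target MapReduce-style frameworks, and all names here are mine):

```python
from concurrent.futures import ThreadPoolExecutor

def onemax(individual):
    """OneMax fitness: the number of 1-bits (a standard GA benchmark)."""
    return sum(individual)

def evaluate_population(pop, workers=4):
    """Fitness evaluation as a map over the population: the embarrassingly
    parallel step that data-intensive frameworks spread across cores."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(onemax, pop))

population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(evaluate_population(population))  # [3, 1, 4]
```

Because each individual is evaluated independently, this map scales with the number of available cores, which is the scalability property the paper measures.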
Linkage Learning for Pittsburgh LCS: Making Problems Tractable (Xavier Llorà)
Presentation by Xavier Llorà, Kumara Sastry, and David E. Goldberg showing how linkage learning is possible in Pittsburgh-style learning classifier systems.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning... (Xavier Llorà)
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems (and genetics-based machine learning in general) can greatly benefit from such surrogates, which may replace the costly procedure of matching a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve this goal, we show that the probabilistic models need to be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules into dependency structure matrices (DSMs), which allows building accurate models of overlapping building blocks, a necessary condition for accurately estimating the fitness of the evolved rules.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr... (Xavier Llorà)
Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to exploit microscopic chemical composition for diagnosis. In contrast to visible imaging, the approach results in very large data sets, as each pixel contains the entire molecular vibrational spectroscopy data for all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2022.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GridMate - End-to-end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Epistemic Interaction - tuning interfaces to provide information for AI support
Learning Classifier Systems for Class Imbalance Problems
1. Learning Classifier Systems
for Class Imbalance
Problems
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Barcelona, Spain
2. Aim
Enhance the applicability of LCSs
to knowledge discovery from datasets
Classification problems
Real-world domains
3. Framework
A dataset is given to an LCS, which produces a model together with an estimated performance.
Issues studied:
• Representativity of the target concept
• Evolutionary pressures
• Interpretability
• Geometrical complexity
• Domain of applicability
• Class imbalance
• Noise
4. Class Imbalance
When one class is represented by a small number of
examples, compared to other class/es.
Usually, the class that describes the circumscribed
concept (the positive class) is the minority class
Where?
Rare medical diagnoses
Fraud detection
Oil spills in satellite images
5. Class Imbalance and Classifiers
Is there a bias towards the majority class?
Probably, because…
Most classifier schemes are trained to minimize the global error
As a result
They classify accurately the examples from the majority class
They tend to misclassify the examples of the minority class,
which are often those representing the target concept.
6. Measures of Performance
Confusion matrix:

              Predicted A            Predicted B
  Actual A    true positive (TP)     false negative (FN)
  Actual B    false positive (FP)    true negative (TN)

Accuracy = (TP+TN) / (TP+FN+FP+TN)
TN rate = TN / (TN + FP)
TP rate = TP / (FN + TP)
ROC curves
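The measures above can be computed directly from the four cells of the confusion matrix. The following illustrative helper (not part of the talk) shows why accuracy alone is misleading on unbalanced data:

```python
# Performance measures from a binary confusion matrix, as defined above.
# (Illustrative sketch; the function name is an assumption.)

def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, TP rate (minority recall) and TN rate (majority recall)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tp_rate = tp / (tp + fn)   # recall on the positive (often minority) class
    tn_rate = tn / (tn + fp)   # recall on the negative (majority) class
    return accuracy, tp_rate, tn_rate

# On an unbalanced dataset, accuracy can look excellent while the
# minority class is entirely misclassified:
acc, tpr, tnr = confusion_metrics(tp=0, fn=10, fp=0, tn=990)
print(acc, tpr, tnr)  # 0.99 0.0 1.0
```

This is why the talk reports TP rate and TN rate (and ROC curves) rather than plain accuracy.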
7. The Higher the Class Imbalance, the Higher the Bias?
  Dataset 1: concept: 15, counterpart: 150, ratio 10:1
  Dataset 2: concept: 15, counterpart: 45, ratio 3:1
8. XCS
[diagram: XCS receives inputs from the environment (the dataset) and predicts a class; a set of rules is updated by reinforcement learning from the reward, and searched by genetic algorithms]
9. Our Approach with XCS
Bounding XCS’s parameters for unbalanced datasets
Online identification of small disjuncts
Adaptation of parameters for the discovery of small
disjuncts
10. XCS's Behavior in Unbalanced Datasets
Unbalanced 11-multiplexer problem
[figure: results for ir=16:1, ir=32:1 and ir=64:1]
11. XCS's Population
Most numerous rules, ir=128:1

  Classifier       P          Error   F      Num
  ###########:0    1000       0.12    0.98   385
  ###########:1    1.2·10⁻⁴   0.074   0.98   366

These overgeneral classifiers reach too high numerosities: their steady-state prediction and error would be 992.24 and 15.38 (a deviation of 7.75 toward each class), yet the estimated error stays near zero, so their fitness is estimated high.
Test examples are classified as belonging to the majority class
12. How Imbalance Affects XCS
Classifier’s error
Stability of prediction and error estimates
Occurrence-based reproduction
13. Classifier's Error in Unbalanced Datasets
Will an overgeneral classifier be detected as inaccurate if the imbalance ratio is high?
Bound for an inaccurate classifier: ε > ε₀
Given the estimated prediction and error:
  P = Pc(cl)·Rmax + (1 − Pc(cl))·Rmin
  ε = |P − Rmax|·Pc(cl) + |P − Rmin|·(1 − Pc(cl))
we derive that ε ≥ ε₀ holds whenever:
  −ε₀·p² + 2p·(Rmax − ε₀) − ε₀ ≥ 0
where p is the ratio of minority-class to majority-class examples.
For Rmax = 1000 and ε₀ = 1 we get the maximum imbalance ratio:
  irmax = 1998
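The bound above can be checked numerically. The sketch below (variable names are assumptions drawn from the slide's symbols) solves the quadratic in p and recovers the maximum detectable imbalance ratio:

```python
# Numeric check of the bound on slide 13 (a sketch, not the talk's code).
# With Rmin = 0, an overgeneral classifier's error is
#   eps = 2 * Rmax * p / (1 + p)**2
# where p is the minority/majority ratio; it is flagged inaccurate when
# eps >= eps0, i.e.  -eps0*p**2 + 2*p*(Rmax - eps0) - eps0 >= 0.
import math

Rmax, eps0 = 1000.0, 1.0

# Roots of eps0*p^2 - 2*(Rmax - eps0)*p + eps0 = 0
b = (Rmax - eps0) / eps0
p_low = b - math.sqrt(b * b - 1)
p_high = b + math.sqrt(b * b - 1)

# The classifier is detected as inaccurate for p in [p_low, p_high];
# since ir = 1/p, the maximum detectable imbalance ratio is 1/p_low.
ir_max = 1 / p_low
print(round(ir_max))  # 1998
```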
14. Prediction and Error Estimates and Learning Rate
[figure: evolution of the prediction and error estimates of classifier ###########:0 at ir=128:1, for β=0.2 and β=0.002]
15. Occurrence-based Reproduction
Probability of occurrence (pocc), given ir = maj/min:

  Classifier        poccB   poccI
  ###########:0     1/2     1/2
  ###########:1     1/2     1/2
  0000#######:0     1/32
  0001#######:1     1/32

[figure: probability of occurrence (0 to 0.6) vs. imbalance ratio (1 to 256) for ###########:0, ###########:1, 00000######:0 and 00001######:1]
16. Occurrence-based Reproduction
Probability of reproduction (pGA):
  pGA = 1 / TGA
where
  TGA = θGA    if Tocc < θGA
  TGA = Tocc   otherwise
With θGA = 20:
  TGA(###########:0) → θGA
  TGA(0000#######:0) → Tocc ¹
¹ Assuming non-overlapping niches
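The reproduction rule above can be sketched in a few lines (the helper name is an assumption, not XCS source code): a niche can reproduce at most once every θGA time steps, so rare (minority) niches whose occurrence period exceeds θGA reproduce less often than frequent ones.

```python
# Sketch of the reproduction probability on slide 16.

def p_ga(t_occ, theta_ga=20):
    """Probability of reproduction per time step for a niche whose
    average period between occurrences is t_occ."""
    t_ga = theta_ga if t_occ < theta_ga else t_occ
    return 1.0 / t_ga

print(p_ga(t_occ=2))    # frequent niche: limited by theta_GA -> 1/20
print(p_ga(t_occ=64))   # rare niche: limited by its own period -> 1/64
```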
17. Guidelines for Parameter Tuning
Rmax and ε₀ determine the threshold between negligible noise and the imbalance ratio
β determines the size of the moving window. The window should be large enough to include examples from both classes:
  β = k · fmin / fmaj
θGA can counterbalance the reproduction opportunities of the most frequent (majority) and least frequent (minority) niches:
  θGA = k' · 1 / fmin
18. XCS with Parameter Tuning
[figure: XCS with standard settings (ir=16:1, 32:1, 64:1) vs. XCS with parameter tuning (ir=64:1, 256:1)]
19. XCS Tuning for Real-world Datasets
How can we estimate the niche frequency?
Estimate from the ratio of majority class instances and minority
class instances
Problem:
• This may not be related to the distribution of niches in the feature
space
Take the approach to the small disjuncts problem
20. Online Identification of Small Disjuncts
We search for regions that promote
overgeneral classifiers
Estimate ircl from the classifier's experience on each class:
  ircl = expmax / expmin
Adapt β and θGA according to ircl
Example: ircl = 20/4
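Combining the guidelines of slide 17 with the per-classifier estimate above gives the following sketch (the constants k and k_prime, and the helper itself, are illustrative assumptions, not the talk's implementation):

```python
# Per-classifier adaptation: estimate the local imbalance ratio from the
# classifier's experience on each class, then scale beta and theta_GA.

def adapt_parameters(exp_per_class, k=0.2, k_prime=20.0):
    """Return (ir_cl, beta, theta_ga) from per-class experience counts."""
    exp_max, exp_min = max(exp_per_class), min(exp_per_class)
    total = sum(exp_per_class)
    ir_cl = exp_max / exp_min             # estimated local imbalance ratio
    f_min = exp_min / total               # frequency of the rare class
    f_maj = exp_max / total
    beta = k * f_min / f_maj              # slower updates in unbalanced niches
    theta_ga = k_prime / f_min            # longer GA period for rare niches
    return ir_cl, beta, theta_ga

# The slide's example: 20 majority-class vs. 4 minority-class occurrences.
ir_cl, beta, theta_ga = adapt_parameters([20, 4])
print(ir_cl)  # 5.0
```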
21. Online Parameter Adaptation
[figure: results of online parameter adaptation at ir=256:1]
22. What about UCS?
Supervised XCS:
Needs less exploration
Avoids XCS’s fitness dilemma
More robust to parameter settings
Overgeneral classifiers also tend to take over the
population
Their probability of occurrence depends on the imbalance ratio
Partially minimized with fitness sharing
23. What about UCS?
[figure: UCS results at ir=256:1 and ir=512:1]
25. How Can We Minimize the Effects of Small Disjuncts?
Resampling the dataset:
  Classical methods:
  • Random oversampling
  • Random undersampling
  Heuristic methods:
  • Tomek links
  • CNN
  • One-sided selection
  • SMOTE
  Cluster-based oversampling: addresses small disjuncts, but assumes that clusterization will find the small disjuncts and match the classifier's approximation
Cost-sensitive classifiers
Could XCS benefit from the online identification of small disjuncts?
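Random oversampling, the first classical method listed above, can be sketched in a few lines (an illustrative helper, not from the talk): duplicate minority-class examples at random until all classes have the same count.

```python
# Minimal sketch of random oversampling for class rebalancing.
import random

def random_oversample(X, y, seed=0):
    rng = random.Random(seed)
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        pool = [x for x, label in zip(X, y) if label == c]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # duplicate a random example
            y_out.append(c)
    return X_out, y_out

X, y = [[0], [1], [2], [3]], ["maj", "maj", "maj", "min"]
X2, y2 = random_oversample(X, y)
print(y2.count("maj"), y2.count("min"))  # 3 3
```

Note that plain duplication does not target small disjuncts specifically, which is why the slide lists heuristic and cluster-based variants.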
26. Domains of Applicability
Should we use some counterbalancing scheme?
Which learning scheme should we use?
Is there a combination of counterbalancing
scheme+learner that beats all others?
How can we detect the presence of small
disjuncts?
Are there other complexity factors mixed up with
the small disjuncts problem?
27. Domains of Applicability
[diagram: a dataset is characterized, and the characterization predicts a suggested approach: resampling, classifier, or resampling + classifier ("Learn it!"). Where are LCSs placed?]
Type of dataset:
  Geometrical distribution of classes
  Possible presence of small disjuncts
  Other complexity factors
28. Future Directions
Potential benefit of XCS to discover small disjuncts
…and learn from it online
Further analyze UCS
How do LCSs perform w.r.t. other classifiers for unbalanced
datasets?
Measures for small disjuncts identification
… and other possible complexity factors
What is noise and what is a small disjunct?
In which cases is an LCS applicable?
29. Learning Classifier Systems
for Class Imbalance
Problems
Ester Bernadó-Mansilla
Research Group in Intelligent Systems
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Barcelona, Spain