Abstract
Goal optimization has long been a topic of great interest in computer science. The literature contains many thousands of papers discussing methods for the search for optimal solutions to complex problems. In the case of multi-objective optimization, such a search yields iteratively improved approximations to the Pareto frontier, i.e. the set of best solutions along the trade-off curve of competing objectives.
To approximate the Pareto frontier, one ubiquitous method throughout the field of optimization is stochastic search. Stochastic search engines explore solution spaces by randomly mutating candidate guesses to generate new solutions. This mutation policy is employed by the most commonly used tools (e.g. NSGA-II, SPEA2), with the goals of a) avoiding local optima and b) expanding the diversity of the generated approximations. Such "blind" mutation policies explore many sub-optimal solutions that are discarded when better solutions are found. This approach has two problems. First, stochastic search can be unnecessarily computationally expensive because it evaluates an overwhelming number of candidates. Second, the generated approximations to the Pareto frontier are usually very large and can be difficult to understand.
To solve these two problems, a more directed, less stochastic approach than standard search tools is necessary. This thesis presents GALE (Genetic Active Learning). GALE is an active learner that finds approximations to the Pareto frontier by spectrally clustering candidates using a near-linear-time recursive descent algorithm that iteratively divides candidates into halves (called leaves at the bottom level). Active learning in GALE selects a minimal, most-informative subset of candidates by evaluating only the two most different candidates during each descending split; hence, GALE requires at most $2\log N$ evaluations per generation. These leaves are piecewise approximations to the Pareto frontier, and the candidates in each leaf are then mutated non-stochastically in the most promising direction along that piece.
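The recursive two-pole descent described above can be sketched in a few lines of Python. This is a minimal illustration of the idea rather than GALE's implementation: the Euclidean distance metric, the pruning rule (here we simply recurse into one half), and the leaf size are simplified assumptions, and the objective scoring of the two poles is only counted, not performed.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two decision vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def descend(pop, min_size=4, n_evals=0):
    """Recursively bisect `pop`, 'evaluating' only the two most-different
    candidates (the poles) at each split, so a full descent costs about
    2*log2(N/min_size) evaluations. Returns (leaf, evaluation_count)."""
    if len(pop) <= min_size:
        return pop, n_evals
    pivot = random.choice(pop)
    east = max(pop, key=lambda c: dist(c, pivot))   # pole 1
    west = max(pop, key=lambda c: dist(c, east))    # pole 2
    n_evals += 2                                    # only these two are scored
    c = dist(east, west) or 1e-12
    def along(x):                                   # cosine-rule projection
        a, b = dist(x, west), dist(x, east)
        return (a * a + c * c - b * b) / (2 * c)
    pop = sorted(pop, key=along)
    # In GALE the pole evaluations decide which half is pruned as dominated;
    # here we simply recurse into one half to show the cost structure.
    return descend(pop[: len(pop) // 2], min_size, n_evals)

random.seed(0)
candidates = [[random.random() for _ in range(5)] for _ in range(128)]
leaf, used = descend(candidates)
print(len(leaf), used)  # leaf of 4 candidates reached with 10 evaluations
```

With 128 candidates and a leaf size of 4, the descent splits five times (128, 64, 32, 16, 8), so only ten candidates are ever scored.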
The experiments of this thesis lead to the following conclusion: a near-linear time recursive binary division of the decision space of candidates in a multi-objective optimization algorithm can find useful directions to mutate instances and find quality solutions much faster than traditional randomization approaches. Specifically, in comparative studies with standard methods (NSGA-II and SPEA2) applied to a variety of models, GALE required orders of magnitude fewer evaluations to find solutions. As a result, GALE can perform dramatically faster than the other methods, especially for realistic models.
Going Smart and Deep on Materials at ALCF, by Ian Foster
As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
Dr. Fariba Fahroo presents an overview of her program, Optimization and Discrete Mathematics, at the AFOSR 2013 Spring Review. At this review, Program Officers from AFOSR Technical Divisions will present briefings that highlight basic research programs beneficial to the Air Force.
The document provides an overview of the NTCIR-14 CENTRE Task, which aims to examine the replicability and reproducibility of results from past CLEF, NTCIR, and TREC evaluations. It describes the task specifications, including the replicability and reproducibility subtasks that asked participants to replicate or reproduce past run pairs. It also discusses the additional relevance assessments that were collected and the evaluation measures used, such as root mean squared error and effect ratio. The only participating team was able to mostly replicate the effects observed in the original NTCIR runs for the replicability subtask.
130321 zephyrin soh - on the effect of exploration strategies on maintenanc..., by Ptidej Team
This document presents an empirical study that investigates developers' program exploration strategies. The goal is to understand how developers navigate through a program's entities in order to help them more efficiently. The study analyzes developers' interaction histories to identify common exploration strategies and examines relationships between strategies and other factors like task type and expertise level. The results could help evaluate developer performance, improve comprehension models, and guide less experienced developers.
Implementing Generate-Test-and-Aggregate Algorithms on Hadoop, by Yu Liu
Generate-Test-and-Aggregate is a class of algorithms that can automatically derive efficient MapReduce programs.
MapReduce is a useful and popular programming model for large-scale parallel processing. However, for many complex problems it is not easy to develop efficient parallel algorithms that match the MapReduce paradigm well.
The generator-based parallelization approach was introduced to simplify parallel programming through automatic generation and optimization mechanisms. Efficient parallel algorithms can be generated from users' naive but correct programs by using generators that exploit optimization theorems from the field of skeletal parallel programming. The resulting efficient parallel algorithms are in a form well suited to implementation with MapReduce.
With this approach, a large class of generate-and-test-like computations can be programmed and executed efficiently over MapReduce. A novel programming interface and framework can thus be built on top of MapReduce, helping to resolve difficulties with programmability and efficiency. This paper introduces such a framework; with it, users can concentrate on writing naive but correct programs. We show that many generate-and-test-like computations can be implemented easily and efficiently with this framework over MapReduce.
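The generate-test-and-aggregate pattern can be illustrated with a small, self-contained sketch. The maximum-segment-sum problem and the Kadane-style fused combine below are illustrative choices, not the framework's API: the naive specification generates all segments and aggregates over them, while the fused equivalent uses an associative combine, which is exactly the property that lets such derived programs run as MapReduce jobs.

```python
from functools import reduce

# Naive GTA specification: generate all contiguous segments, then
# aggregate their sums with max. Quadratic in candidates; the GTA
# framework's job is to derive an efficient equivalent automatically.
def segments(xs):
    return [xs[i:j] for i in range(len(xs)) for j in range(i + 1, len(xs) + 1)]

def naive_max_segment_sum(xs):
    return max((sum(s) for s in segments(xs)), default=0)

# Fused, MapReduce-friendly equivalent: each element maps to a tuple
# (best, best_prefix, best_suffix, total), and `combine` is associative,
# so the reduce can be split across workers in any grouping.
def combine(l, r):
    lb, ls, lt, lsum = l
    rb, rs, rt, rsum = r
    return (max(lb, rb, lt + rs),   # best segment overall
            max(ls, lsum + rs),     # best prefix
            max(rt, lt + rsum),     # best suffix
            lsum + rsum)            # total sum

def wrap(x):
    return (max(x, 0), max(x, 0), max(x, 0), x)

def fused_max_segment_sum(xs):
    if not xs:
        return 0
    return reduce(combine, map(wrap, xs))[0]

data = [3, -4, 2, -1, 6, -3]
print(naive_max_segment_sum(data), fused_max_segment_sum(data))  # 7 7
```

The naive version is the "users' naive but correct program"; the fused version stands in for what the generator-based approach would produce.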
Evaluating Machine Learning Algorithms for Materials Science using the Matben..., by Anubhav Jain
1) The document discusses evaluating machine learning algorithms for materials science using the Matbench protocol.
2) Matbench provides standardized datasets, testing procedures, and an online leaderboard to benchmark and compare machine learning performance.
3) This allows different groups to evaluate algorithms independently and identify best practices for materials science predictions.
Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task
by Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Maria Maistro, Zhicheng Dou, Nicola Ferro, Ian Soboroff
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu..., by Anubhav Jain
- The document describes a computational materials design pipeline that uses theory, optimization, and natural language processing (NLP) to accelerate materials discovery.
- Key components of the pipeline include optimization algorithms such as Rocketsled, which find the best materials solutions with fewer calculations, and NLP tools that extract and analyze knowledge from the literature to predict promising new materials and benchmarks.
- The pipeline has shown speedups of 15-30x over random searches and has successfully predicted new thermoelectric materials discoveries 1-2 years before their reporting in literature.
This document summarizes work on developing clear sky detection methods and photovoltaic data analytics tools. It describes collaborating with NREL and kWh Analytics to build a robust clear sky detection method for the RdTools software. The goal is to automatically learn the best parameters for the PVLib clear sky model by comparing its labels to known clear sky labels from satellite data. It also discusses developing open-source software to analyze string-level I-V curves collected by Sandia National Labs to detect mismatching and extract IV parameters. The work aims to help researchers by providing data management, analytics and predictive modeling through a DuraMat Data Hub.
The document discusses using artificial intelligence (AI) to accelerate materials innovation for clean energy applications. It outlines six elements needed for a Materials Acceleration Platform: 1) automated experimentation, 2) AI for materials discovery, 3) modular robotics for synthesis and characterization, 4) computational methods for inverse design, 5) bridging simulation length and time scales, and 6) data infrastructure. Examples of opportunities include using AI to bridge simulation scales, assist complex measurements, and enable automated materials design. The document argues that a cohesive infrastructure is needed to make effective use of AI, data, computation, and experiments for materials science.
TMS workshop on machine learning in materials science: Intro to deep learning..., by BrianDeCost
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
Graph Centric Analysis of Road Network Patterns for CBD’s of Metropolitan Cit..., by Punit Sharnagat
OSMnx is a Python package to retrieve, model, analyze, and visualize street networks from OpenStreetMap.
OpenStreetMap (OSM) is a collaborative mapping project that provides a free and publicly editable map of the world.
OpenStreetMap provides a valuable crowd-sourced database of raw geospatial data for constructing models of urban street networks for scientific analysis.
The Status of ML Algorithms for Structure-property Relationships Using Matb..., by Anubhav Jain
The document discusses the development of Matbench, a standardized benchmark for evaluating machine learning algorithms for materials property prediction. Matbench includes 13 standardized datasets covering a variety of materials prediction tasks. It employs a nested cross-validation procedure to evaluate algorithms and ranks submissions on an online leaderboard. This allows for reproducible evaluation and comparison of different algorithms. Matbench has provided insights into which algorithm types work best for certain prediction problems and has helped measure overall progress in the field. Future work aims to expand Matbench with more diverse datasets and evaluation procedures to better represent real-world materials design challenges.
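The nested cross-validation procedure at the heart of this protocol can be sketched generically. This is a stdlib-only illustration of the idea, not Matbench's code: the toy one-parameter model, the negative-MSE scorer, and the fold counts are all assumptions. The key property is that hyperparameters are chosen only on inner folds of the training data, so the outer test folds give an unbiased estimate.

```python
import random
import statistics

def k_folds(n, k):
    """Split indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv(X, y, fit, score, params, k_outer=5, k_inner=3):
    """Outer folds estimate generalization error; inner folds select
    hyperparameters, so test data never influences model selection."""
    outer_scores = []
    for test in k_folds(len(X), k_outer):
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        def inner_score(p):
            scores = []
            for val in k_folds(len(train), k_inner):
                val_set = set(val)
                tr = [train[i] for i in range(len(train)) if i not in val_set]
                va = [train[i] for i in val]
                model = fit([X[i] for i in tr], [y[i] for i in tr], p)
                scores.append(score(model, [X[i] for i in va], [y[i] for i in va]))
            return statistics.mean(scores)
        best = max(params, key=inner_score)  # chosen on training data only
        model = fit([X[i] for i in train], [y[i] for i in train], best)
        outer_scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return statistics.mean(outer_scores)

# Toy usage: a hypothetical one-parameter model y_hat = p * x, scored by
# negative mean squared error; the data satisfy y = 2x, so p = 2.0 wins.
random.seed(1)
X = [[random.random()] for _ in range(60)]
y = [x[0] * 2.0 for x in X]
fit = lambda Xtr, ytr, p: (lambda x: p * x[0])
score = lambda m, Xte, yte: -statistics.mean((m(x) - t) ** 2 for x, t in zip(Xte, yte))
err = nested_cv(X, y, fit, score, params=[0.5, 1.0, 2.0])
print(err)
```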
Automating materials science workflows with pymatgen, FireWorks, and atomate, by Anubhav Jain
FireWorks is a workflow management system that allows researchers to define and execute complex computational materials science workflows on local or remote computing resources in an automated manner. It provides features such as error detection and recovery, job scheduling, provenance tracking, and remote file access. The atomate library builds on FireWorks to provide a high-level interface for common materials simulation procedures like structure optimization, band structure calculation, and property prediction using popular codes like VASP. Together, these tools aim to make high-throughput computational materials discovery and design more accessible to researchers.
The document provides an overview of materials informatics and the Materials Genome Initiative. It discusses how materials informatics uses data-driven approaches and techniques from fields like signal processing, machine learning and statistics to generate structure-property-processing linkages from materials science data and improve understanding of materials behavior. This includes extracting features from materials microstructure, using statistical analysis and data mining to discover relationships and create predictive models, and evaluating how knowledge has improved.
This document compares different heuristic search methods for optimizing traffic signal timing, in terms of solution quality (optimality) and computation time (run time). It finds that simulated annealing and genetic algorithms achieved near-optimal solutions with similar effectiveness. Within genetic algorithms, tournament and roulette wheel selection methods performed similarly. Tabu search did not provide significant benefits over other methods. Weaker search methods like hill-climbing aborted optimization early along the optimality-versus-run-time trajectory. Parameters like mutation rates and annealing schedules affected search performance and should be carefully selected.
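A minimal simulated-annealing loop illustrates both the method and why the annealing schedule matters to results like those above. The one-dimensional "delay" function, the optimum at 0.6, and every parameter here are hypothetical stand-ins, not the study's traffic model.

```python
import math
import random

def simulated_annealing(cost, x0, neighbor, t0=1.0, cooling=0.995, steps=5000):
    """Accept worse moves with probability exp(-delta/T); T decays each
    step, so the search explores broadly early and settles late. The
    cooling rate is exactly the kind of parameter the study found must
    be selected carefully."""
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x)
        fy = cost(y)
        if fy < fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling
    return best, fbest

# Hypothetical 'signal timing' problem: choose a green split g in [0, 1];
# delay is a convex bowl with its minimum at g = 0.6.
random.seed(2)
cost = lambda g: (g - 0.6) ** 2
step = lambda g: min(1.0, max(0.0, g + random.gauss(0, 0.05)))
x, fx = simulated_annealing(cost, x0=random.random(), neighbor=step)
print(x, fx)
```

Raising `cooling` toward 1 keeps the search exploratory longer; lowering it makes the run behave more like hill-climbing, which the comparison above found aborts optimization early.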
Automated Machine Learning Applied to Diverse Materials Design Problems, by Anubhav Jain
Anubhav Jain presented on developing standardized benchmark datasets and algorithms for automated machine learning in materials science. Matbench provides a diverse set of materials design problems for evaluating ML algorithms, including classification and regression tasks of varying sizes from experiments and DFT. Automatminer is a "black box" ML algorithm that uses genetic algorithms to automatically generate features, select models, and tune hyperparameters on a given dataset, performing comparably to specialized literature methods on small datasets but less well on large datasets. Standardized evaluations can help accelerate progress in automated ML for materials design.
ICML 2018 included papers on generative models, music and audio applications, and AI security. On generative models, papers explored topics like learning many-to-many mappings between domains, joint distribution learning, and reducing amortization gaps in VAEs. In music, works examined hierarchical latent space models for music structure and style transfer for speech synthesis. Regarding security, studies analyzed adversarial attacks across domains, the threat of adversarial examples, and circumventing defenses through obfuscated gradients.
1. Materials Informatics uses Python tools like RDKit for analyzing molecular structures and properties.
2. ORGAN and MolGAN are two generative models that use GANs to generate novel molecular structures based on SMILES strings, with ORGAN incorporating reinforcement learning to optimize for desired properties.
3. Tools like RDKit enable analyzing molecular fingerprints and descriptors that can be used for machine learning applications in materials informatics.
PhD defense presentation of Dominik Kowald: Modeling Activation Processes in Human Memory to Improve Tag Recommendations. Presented at Know-Center / Graz University of Technology (Austria)
Materials discovery through theory, computation, and machine learning, by Anubhav Jain
The document discusses using theory, computation, and machine learning to discover new materials. It summarizes that density functional theory (DFT) can model material properties from first principles, and how DFT calculations have been automated and run on supercomputers to enable high-throughput screening of materials. Examples are given of computations predicting new materials that were later experimentally confirmed, like sidorenkite cathodes for sodium ion batteries. Related projects are outlined like the open-source Materials Project database of DFT data on over 85,000 materials and software libraries to support high-throughput computation and materials science. Text mining of scientific literature is also discussed to help predict new materials in advance.
The document proposes a hybrid approach to estimating biophysical parameters from remote sensing data that combines a theoretical forward model with available reference samples. It aims to improve both accuracy and robustness of estimates. The approach formulates the estimation problem and characterizes the deviation between model outputs and observations using reference samples. An experimental analysis applies the approach to soil moisture estimation using microwave data, demonstrating improved performance over solely using the theoretical model.
While much of the recent literature in spatial statistics has evolved around addressing the big data issue, practical implementations of these methods on high performance computing systems for truly large data are still rare. We discuss our explorations in this area at the National Center for Atmospheric Research for a range of applications, which can benefit from large scale computing infrastructure. These applications include extreme value analysis, approximate spatial methods, spatial localization methods and statistically-based data compression and are implemented in different programming languages. We will focus on timing results and practical considerations, such as speed vs. memory trade-offs, limits of scaling and ease of use.
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ..., by aciijournal
The present work considers the minimization of a bi-criteria objective combining a weighted sum of makespan and total completion time for a multiprocessor task scheduling problem. Genetic algorithms are an appealing choice for NP-hard problems such as multiprocessor task scheduling, but their performance depends on the quality of the initial solution: a good initial solution yields better results. Hybrid genetic algorithms (HGAs) based on different list scheduling heuristics have therefore been proposed and developed for the problem. A computational analysis using a defined performance index was conducted on standard task scheduling problems to evaluate the proposed HGAs. The analysis shows that ETF-GA is the most efficient of the heuristic-based hybrid genetic algorithms in terms of solution quality, especially for large and complex problems.
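The heuristic-seeding idea behind such HGAs can be sketched as follows. This is a generic illustration, not the paper's algorithm: the LPT (longest processing time) list heuristic stands in for heuristics like ETF, the objective is plain makespan rather than the bi-criteria function, and the instance, operators, and parameters are illustrative choices.

```python
import random

def makespan(assign, times, m):
    """Makespan of an assignment: max load over m machines."""
    load = [0.0] * m
    for task, machine in enumerate(assign):
        load[machine] += times[task]
    return max(load)

def lpt_seed(times, m):
    """Longest-Processing-Time list heuristic: place the longest remaining
    task on the least-loaded machine. Serves as the good initial solution
    a hybrid GA is seeded with."""
    assign = [0] * len(times)
    load = [0.0] * m
    for task in sorted(range(len(times)), key=lambda t: -times[t]):
        machine = min(range(m), key=lambda j: load[j])
        assign[task] = machine
        load[machine] += times[task]
    return assign

def hybrid_ga(times, m, pop_size=30, gens=100):
    random.seed(3)
    n = len(times)
    # Seed the population with the heuristic solution plus random ones.
    pop = [lpt_seed(times, m)] + \
          [[random.randrange(m) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(gens):
        pop.sort(key=lambda a: makespan(a, times, m))
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p, q = random.sample(survivors, 2)
            cut = random.randrange(1, n)
            child = p[:cut] + q[cut:]             # one-point crossover
            if random.random() < 0.2:             # mutation
                child[random.randrange(n)] = random.randrange(m)
            children.append(child)
        pop = survivors + children
    return min(makespan(a, times, m) for a in pop)

times = [7, 7, 6, 6, 5, 5, 4, 4, 4]  # total work 48 on 3 machines: bound 16
print(hybrid_ga(times, m=3))
```

On this small instance the LPT seed already reaches the load-balance lower bound, which is precisely the point: a strong heuristic start lets the GA spend its generations refining rather than recovering from random initial assignments.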
Open Source Tools for Materials Informatics, by Anubhav Jain
This document discusses open source tools for materials informatics, including Matminer and Matscholar. Matminer is a library of descriptors for materials science data that can generate features for machine learning models. It includes over 60 featurizer classes and supports scikit-learn. Matscholar applies natural language processing to over 2 million materials science abstracts to extract keywords and enable improved literature searching. The document argues that open datasets like Matbench and automated tools like Automatminer could help lower barriers for developing machine learning models in materials science by making it easier to obtain training data and evaluate model performance.
2D/3D Materials screening and genetic algorithm with ML model, by aimsnist
JARVIS-ML provides concise summaries of materials properties using machine learning models trained on the extensive data in the JARVIS repositories. It has developed regression and classification models that can predict formation energies, bandgaps, and other material properties in seconds, much faster than traditional DFT calculations. The models use gradient boosting decision trees and feature importance analysis to provide explanations. JARVIS-ML is available as a public web app and API for rapid screening and discovery of new materials.
IRJET - Object Detection using Deep Learning with OpenCV and Python, by IRJET Journal
This document summarizes research on object detection techniques using deep learning. It discusses using the YOLO algorithm to identify objects in images using a single neural network that predicts bounding boxes and class probabilities. The document reviews prior research on algorithms like R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN and RetinaNet. It then describes the YOLO loss function and methodology for finding bounding boxes of objects in an image. The document concludes that YOLO is well-suited for real-time object detection applications due to its advantages over other algorithms.
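One concrete building block shared by YOLO and the region-based detectors above is intersection-over-union (IoU), the overlap score used to match predicted bounding boxes to ground truth and to suppress duplicate detections. A small self-contained sketch (the corner-coordinate box format and the example boxes are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2). Returns a value in [0, 1]; 0 means no overlap."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~= 0.1429
```

In non-maximum suppression, a predicted box is discarded when its IoU with a higher-confidence box of the same class exceeds a threshold.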
An application of genetic algorithms to time cost-quality trade-off in constr..., by Alexander Decker
This document summarizes a research paper that develops an optimization model using genetic algorithms to solve the time-cost-quality trade-off problem in construction projects. The model aims to find the minimum cost for a construction project to meet certain quality levels within a given time limit. It does this by considering different activity execution modes and using genetic algorithms to efficiently explore the large solution space. The document provides background on optimization problems and techniques, an overview of the time-cost-quality trade-off problem and prior related research, and describes the objectives and approach of the developed genetic algorithms model.
A literature survey of benchmark functions for global optimisation problems, by Xin-She Yang
The document summarizes a literature survey of 175 benchmark functions for validating global optimization algorithms. The functions have diverse properties like modality, separability, and landscape features to provide a robust test. This set of benchmark functions is the most comprehensive collection to date and can be used to thoroughly evaluate new optimization algorithms.
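As a concrete example of the kind of function such collections contain, the Ackley function is a standard multimodal benchmark: its many local minima punish greedy search, while its single global minimum at the origin makes success easy to verify. The two-dimensional test point below is illustrative.

```python
import math

def ackley(x):
    """Ackley benchmark function. Highly multimodal, with a single
    global minimum f(0, ..., 0) = 0."""
    n = len(x)
    s1 = sum(xi * xi for xi in x) / n                       # mean square
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x) / n    # mean cosine
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

print(round(ackley([0.0, 0.0]), 12))  # 0.0 at the global optimum
print(ackley([1.5, -0.5]) > 0)        # True: any other point is worse
```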
Automated Machine Learning Applied to Diverse Materials Design Problems
Anubhav Jain presented on developing standardized benchmark datasets and algorithms for automated machine learning in materials science. Matbench provides a diverse set of materials design problems for evaluating ML algorithms, including classification and regression tasks of varying sizes from experiments and DFT. Automatminer is a "black box" ML algorithm that uses genetic algorithms to automatically generate features, select models, and tune hyperparameters on a given dataset, performing comparably to specialized literature methods on small datasets but less well on large datasets. Standardized evaluations can help accelerate progress in automated ML for materials design.
ICML 2018 included papers on generative models, music and audio applications, and AI security. On generative models, papers explored topics like learning many-to-many mappings between domains, joint distribution learning, and reducing amortization gaps in VAEs. In music, works examined hierarchical latent space models for music structure and style transfer for speech synthesis. Regarding security, studies analyzed adversarial attacks across domains, the threat of adversarial examples, and circumventing defenses through obfuscated gradients.
1. Materials Informatics uses Python tools like RDKit for analyzing molecular structures and properties.
2. ORGAN and MolGAN are two generative models that use GANs to generate novel molecular structures based on SMILES strings, with ORGAN incorporating reinforcement learning to optimize for desired properties.
3. Tools like RDKit enable analyzing molecular fingerprints and descriptors that can be used for machine learning applications in materials informatics.
PhD defense presentation of Dominik Kowald: Modeling Activation Processes in Human Memory to Improve Tag Recommendations. Presented at Know-Center / Graz University of Technology (Austria)
Materials discovery through theory, computation, and machine learningAnubhav Jain
The document discusses using theory, computation, and machine learning to discover new materials. It summarizes that density functional theory (DFT) can model material properties from first principles, and how DFT calculations have been automated and run on supercomputers to enable high-throughput screening of materials. Examples are given of computations predicting new materials that were later experimentally confirmed, like sidorenkite cathodes for sodium ion batteries. Related projects are outlined like the open-source Materials Project database of DFT data on over 85,000 materials and software libraries to support high-throughput computation and materials science. Text mining of scientific literature is also discussed to help predict new materials in advance.
The document proposes a hybrid approach to estimating biophysical parameters from remote sensing data that combines a theoretical forward model with available reference samples. It aims to improve both accuracy and robustness of estimates. The approach formulates the estimation problem and characterizes the deviation between model outputs and observations using reference samples. An experimental analysis applies the approach to soil moisture estimation using microwave data, demonstrating improved performance over solely using the theoretical model.
While much of the recent literature in spatial statistics has evolved around addressing the big data issue, practical implementations of these methods on high performance computing systems for truly large data are still rare. We discuss our explorations in this area at the National Center for Atmospheric Research for a range of applications, which can benefit from large scale computing infrastructure. These applications include extreme value analysis, approximate spatial methods, spatial localization methods and statistically-based data compression and are implemented in different programming languages. We will focus on timing results and practical considerations, such as speed vs. memory trade-offs, limits of scaling and ease of use.
HYBRID GENETIC ALGORITHM FOR BI-CRITERIA MULTIPROCESSOR TASK SCHEDULING WITH ...aciijournal
Present work considers the minimization of the bi-criteria function including weighted sum of makespan and total completion time for a Multiprocessor task scheduling problem.Genetic algorithm is the most
appealing choice for the different NP hard problems including multiprocessor task scheduling.
Performance of genetic algorithm depends on the quality of initial solution as good initial solution provides the better results. Different list scheduling heuristics based hybrid genetic algorithms (HGAs) have been
proposed and developedfor the problem. Computational analysis with the help of defined performance
index has been conducted on the standard task scheduling problems for evaluating the performance of the
proposed HGAs. The analysis shows that the ETF-GA is quite efficient and best among the other heuristic based hybrid genetic algorithms in terms of solution quality especially for large and complex problems.
Open Source Tools for Materials InformaticsAnubhav Jain
This document discusses open source tools for materials informatics, including Matminer and Matscholar. Matminer is a library of descriptors for materials science data that can generate features for machine learning models. It includes over 60 featurizer classes and supports scikit-learn. Matscholar applies natural language processing to over 2 million materials science abstracts to extract keywords and enable improved literature searching. The document argues that open datasets like Matbench and automated tools like Automatminer could help lower barriers for developing machine learning models in materials science by making it easier to obtain training data and evaluate model performance.
2D/3D Materials screening and genetic algorithm with ML modelaimsnist
JARVIS-ML provides concise summaries of materials properties using machine learning models trained on the extensive data in the JARVIS repositories. It has developed regression and classification models that can predict formation energies, bandgaps, and other material properties in seconds, much faster than traditional DFT calculations. The models use gradient boosting decision trees and feature importance analysis to provide explanations. JARVIS-ML is available as a public web app and API for rapid screening and discovery of new materials.
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
This document summarizes research on object detection techniques using deep learning. It discusses using the YOLO algorithm to identify objects in images using a single neural network that predicts bounding boxes and class probabilities. The document reviews prior research on algorithms like R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN and RetinaNet. It then describes the YOLO loss function and methodology for finding bounding boxes of objects in an image. The document concludes that YOLO is well-suited for real-time object detection applications due to its advantages over other algorithms.
An application of genetic algorithms to time cost-quality trade-off in constr...Alexander Decker
This document summarizes a research paper that develops an optimization model using genetic algorithms to solve the time-cost-quality trade-off problem in construction projects. The model aims to find the minimum cost for a construction project to meet certain quality levels within a given time limit. It does this by considering different activity execution modes and using genetic algorithms to efficiently explore the large solution space. The document provides background on optimization problems and techniques, an overview of the time-cost-quality trade-off problem and prior related research, and describes the objectives and approach of the developed genetic algorithms model.
A literature survey of benchmark functions for global optimisation problemsXin-She Yang
The document summarizes a literature survey of 175 benchmark functions for validating global optimization algorithms. The functions have diverse properties like modality, separability, and landscape features to provide a robust test. This set of benchmark functions is the most comprehensive collection to date and can be used to thoroughly evaluate new optimization algorithms.
This document describes a hybrid approach combining scatter search and simulated annealing to solve multi-objective optimization problems. The approach generates an initial population of solutions using a diversification method. It then uses simulated annealing as an improvement method to enhance solutions. Solutions are added to a reference set based on quality and diversity. A subset generation method operates on the reference set to produce combined solutions. The combination method then transforms subsets into new combined solutions. The approach was tested on benchmark problems and found to perform well.
Computational optimization, modelling and simulation: Recent advances and ove...Xin-She Yang
This document summarizes recent advances in computational optimization, modeling, and simulation. It discusses how optimization is important for engineering design and industrial applications to maximize profits and minimize costs. Metaheuristic algorithms and surrogate-based optimization techniques are becoming widely used for complex optimization problems. The workshop accepted papers that applied optimization, modeling, and simulation to diverse areas like production planning, mixed-integer programming, electromagnetics, and reliability analysis. Overall computational optimization and modeling have broad applications and continued research is needed in areas like metaheuristic convergence and surrogate modeling methods.
Computational Optimization, Modelling and Simulation: Recent Trends and Chall...Xin-She Yang
This document summarizes recent trends and challenges in computational optimization, modeling and simulation. It discusses how nature-inspired algorithms and surrogate modeling have become popular approaches. However, challenges remain around theoretical understanding of algorithms, solving large-scale problems, and constructing accurate yet efficient surrogate models. The document also reviews papers presented at a workshop on these topics, which demonstrate diverse applications in engineering. Open questions are identified regarding improving algorithm performance, developing more intelligent algorithms, and determining best practices for specific problems.
This curriculum vitae summarizes Maxim Sviridenko's professional experience and qualifications. He currently works as a Principal Research Scientist at Yahoo! Labs, and has previously held professor and research positions at various universities and IBM. His areas of expertise include algorithms, optimization, and machine learning. He has published numerous papers in journals and conferences, supervised several students and postdocs, and received multiple awards and grants for his research work.
This document outlines the history and principles of value engineering. It discusses how value engineering seeks to balance cost, reliability, and performance. It describes the typical 8-step job plan process for conducting a value engineering study, including orientation, information gathering, functional analysis, creativity, evaluation, development, presentation, and implementation. Finally, it provides a case study example of applying value engineering to optimize the design of a focus adjustment knob for a slit lamp microscope. The redesign focused on changing the material and production process, resulting in a 38.64% cost savings.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
LNCS 5050 - Bilevel Optimization and Machine Learningbutest
This document discusses using bilevel optimization and machine learning techniques to improve model selection in machine learning problems. It proposes framing machine learning model selection as a bilevel optimization problem, where the inner level problems involve optimizing models on training data and the outer level problem selects hyperparameters to minimize error on test data. This bilevel framing allows for systematic optimization of hyperparameters and enables novel machine learning approaches. The document illustrates the approach for support vector regression, formulating model selection as a Stackelberg game and solving the resulting mathematical program with equilibrium constraints.
This document summarizes Daniel Burg's ABET portfolio containing work from various engineering courses aimed at meeting objectives related to engineering skills. It includes analyses of a truss bridge, cabin beam, fluid mechanics lab, autonomous robot design for a competition, thermodynamics group project, ethics presentation, electronics labs, and a presentation about an international exchange. The portfolio demonstrates the application of math, science, and engineering knowledge; performance of engineering analysis; experimental design skills; system design subject to constraints; effective teamwork; and communication abilities.
Resource Allocation Using Metaheuristic Searchcsandit
This document discusses using metaheuristic search techniques to solve resource allocation and scheduling problems that are common in software development projects. It evaluates the performance of three algorithms - simulated annealing, tabu search, and genetic algorithms - on test problems representative of resource constrained project scheduling problems (RCPSP). The experimental results found that all three metaheuristics can solve such problems effectively, with genetic algorithms performing slightly better overall than the other two techniques.
This presentation is about Value Engineering and contains:
1.History of VE
2.Value Concept
3.What is Value Engineering?
4.Implementation of VE in our project
5.Principle and Purpose of VE
6.Case Study
7.Conclusion
This document provides a review of optimization algorithms that have been used to solve job shop scheduling problems (JSSP). It first discusses how JSSPs are NP-hard combinatorial optimization problems that are difficult to solve exactly. It then reviews both traditional and non-traditional algorithms that have been applied to JSSPs, including mathematical programming approaches, heuristic construction methods, evolutionary algorithms like genetic algorithms, and local search methods like simulated annealing and tabu search. The document also discusses metaheuristic algorithms and provides a classification of different metaheuristics. Overall, the document aims to assess the various techniques that have been used to approach solving JSSPs.
SCHEDULING AND INSPECTION PLANNING IN SOFTWARE DEVELOPMENT PROJECTS USING MUL...ijseajournal
This document presents a multi-objective hyper-heuristic evolutionary algorithm (MHypEA) for scheduling and inspection planning in software development projects. The MHypEA incorporates twelve low-level heuristics based on selection, crossover, and mutation operations of evolutionary algorithms. The algorithm selects heuristics based on reinforcement learning with adaptive weights. An experiment on randomly generated test problems found that MHypEA explores and exploits the search space thoroughly to find high quality solutions, achieving better results than other multi-objective evolutionary algorithms in half the time.
Lecture on “Aerodynamic design of Aircraft” in University of Tokyo 21st December, 2015. Optimization techniques, data-visualization and their applications are inclusive.
This document reviews applications of evolutionary multiobjective optimization (EMO) techniques in production research. It summarizes EMO applications in several areas of production research, including scheduling, production planning and control, cellular manufacturing, flexible manufacturing systems, and assembly-line optimization. The review finds that EMO techniques have been successfully applied to optimization problems in these areas and provide a number of non-dominated solutions. However, future research opportunities remain, such as improved integration of EMO with other metaheuristics and consideration of additional objectives.
A Machine learning approach to classify a pair of sentence as duplicate or not.Pankaj Chandan Mohapatra
The team presented their machine learning project on predicting question pairs on Quora. They used logistic regression, random forest, and XGBoost models with manually engineered features like word count and word match. XGBoost performed best with an AUC score of 0.936. Key lessons were the importance of preprocessing, using words as features requires dimension reduction, and feature hashing improves scalability over storing vocabularies. Future work could experiment with convolutional neural networks for sentence similarity as proposed by H. Hua et al.
Qualitative and Quantitative Research Plans By Malik Muhammad MehranMalik Mughal
This document provides an overview of qualitative and quantitative research plans. It defines key terms like research plan, discusses the purposes and significance of research plans, and outlines the main components and steps in developing a research plan, including defining the problem, reviewing literature, developing hypotheses and methods, collecting and analyzing data, and communicating results. The document emphasizes that a good research plan provides structure, facilitates evaluation, and guides conducting a successful study within budget and timeline.
The paper presents a new language called UDITA for describing tests. UDITA is a Java-based language that includes non-deterministic choice operators and an interface for generating linked data structures. This allows for more efficient and effective test generation compared to previous approaches. The language aims to make test specification easier while generating tests that are faster, of higher quality, and less complex than traditional manually written or randomly generated tests.
1. The document summarizes the PhD thesis of Fouad KHARROUBI on solving the routing and wavelength assignment problem in WDM networks using random search algorithms.
2. It proposes a new mathematical formulation of the maximum routing and wavelength assignment problem and investigates four random search algorithms (ROA, GA, TSA, EP) to solve the problem.
3. A novel efficient Backtracking algorithm is also proposed to generate more possible lightpaths and improve the performance of the random search algorithms.
You will learn how data owners and API providers can document, secure data products on top of Confluent brokers, including schema validation, topic routing and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, Websockets, Server-sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
Faster Evolutionary Multi-Objective Optimization via GALE: the Geometric Active Learner
1. Joseph Krall
In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.
College of Engineering and Mineral Resources
Faster Evolutionary Multi-Objective Optimization via GALE, the Geometric Active Learner
a Ph.D. Final Defense Presentation for the Lane Department of Computer Science and Electrical Engineering
Special Thanks to the NASA Ames Research Center
April 21, 2014
Estimated Duration: 45 minutes
2. 4/21/2014 Faster Multi-Objective Optimization via GALE
Last Time (November, 2013): A Thesis Proposal
- “JMOO: Tools for Faster Multi-Objective Optimization”
Comments from Committee
- Lacking Rigor
- Generalizability of Proposal
- Lacking Details / Misunderstandings
- Some Missing Related Works
- Validity Concerns
- Needed More – Not Substantial Enough
SE or CS?
1. Introduction
3. Final Dissertation
- “Faster Multi-Objective Optimization via GALE”
This Time (April, 2014): Key Changes from Proposal
- Focus on Contributions of GALE
- Focus on Assessing and Validating GALE
- Very rigorous experimental methodology
- Addressing Comments from Proposal
- Expansive Related Works
- Formalizing the Field
- MANY more experimental results
Spring! …Sort of
1. Introduction
4. Search & Optimization of Goals
- the art of decision making
- e.g. shortest-time city navigation
- e.g. managing calorie intake for diets
Not always trivial
- Landing an airplane safely
- Maximizing software project profits
MOO = Multi-Objective Optimization
- Draft solutions to a problem (red)
- Find Pareto Frontiers (green)
- Report to a decision maker
This Thesis
[Figure: rejected solutions vs. areas on the Pareto frontier; “Who do I pick???”]
1. Introduction
5. The Field of MOO: Increasing Interest
- Agile Project Studies; Aircraft Studies
- Software Engineering (SE) and General MOO: 8000 papers since the 1950s
- In this thesis: SE and CS
* Data from:
(MOO) Coello: http://delta.cs.cinvestav.mx/˜ccoello/EMOO/EMOObib.html
(SE) CREST: http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/repository.html
1. Introduction
6. Main Message
[Sayyad & Ammar 2013] Report:
- NSGA-II and SPEA2 are the most popular search tools today
Popular Search Tools Evaluate Too Much
- O(N^2) internal search: fast if solution evaluation is a cheap operation
- Need to count number of evaluations instead: O(2NG)
This Thesis Proposes GALE: O(2 log2(NG))
- GALE adds data mining to evaluate only the most-informative solutions
GALE: 597s vs. NSGA-II: 14,018s
N = population size; G = number of generations
1. Introduction
7. Applications of MOO
Aircraft Studies for Safety Assurance
- Complex Simulations at NASA [8 seconds per run]
Standard MOO Tools
- Many [300] weeks (50,400 hrs)
GALE
- Many [300] hours (1.8 wks)
* Asiana Flight Wreckage, Summer 2013
1. Introduction
8. Assessing GALE
GALE is a Meta-heuristic Search Tool
- Too difficult (maybe impossible) to “prove”
- Can only be assessed experimentally
-> Generalizability (External Validity) concerns
-> A MOO Critique to Improve Validity
Research Questions
- Evaluations
- Runtime
- Solution Quality
4 Experimental Areas:
- #1 Aircraft Safety (CDA)
- #2 Agile Projects (POM3)
- #3 Constrained Lab Problems
- #4 Unconstrained Lab Problems
SE or CS? SE, CS, CS, CS
1. Introduction
9. And The Results
GALE shown to be a strong rival to NSGA-II & SPEA2
- Two orders of magnitude fewer evaluations for all models
- Two orders of magnitude faster (seconds) for big models
- Better Solution Quality
- SPEA2 much slower
- GALE never worse; NSGA-II/SPEA2 never better
1. Introduction
10. Background (2)
In this chapter:
- Formalities
- Definitions
- Related Works
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
10 Slides
11. Formalities
Mathematical Programming: [Dantzig]
- The aim is to find solutions that optimize objectives
- Transformation functions transform decisions (x) into objectives (y)
- Solutions are infeasible if they do not satisfy constraint functions
[Figure labels: objectives, constraint functions, optimality direction, transformation functions]
2. Background a. Defines
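In standard notation, the mathematical program sketched above can be written as follows (f are the transformation functions and g the constraint functions; the counts d, o, m of decisions, objectives, and constraints are added here only for clarity):

```latex
\begin{aligned}
\text{optimize} \quad & y = f(x) = \bigl(f_1(x), \ldots, f_o(x)\bigr) \\
\text{subject to} \quad & g_i(x) \le 0, \qquad i = 1, \ldots, m \\
& x = (x_1, \ldots, x_d) \in X
\end{aligned}
```

A solution x is infeasible if any constraint g_i is violated.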
12. Kinds of Models
Lab Problems
- Schaffer, Viennet, Tanaka, etc.
Real-world Problems
- Simulations
- Too complex for math
- Aircraft Safety
- Software Dev. Profit
The Schaffer Model
2. Background a. Defines
13. Numerical Optimization
Early methods assumed math models
- A bad assumption for real-world practicality
They also assume other aspects:
- Concave vs. Convex
- Differentiability
- Linear vs. Non-linear
- Single vs. Multi-objective
- Objective Functions vs. Simulation
2. Background b. Early Methods
14. Simplex Search
Exterior Search [Dantzig]
- For Linear problems ([Nelder & Mead 1965] made a non-linear version)
- Embed a simplex with solutions along the vertices
- Traverse along the nodes
- Good average complexity
- But bad O(N^3) worst case
Nelder, John A.; R. Mead (1965). "A simplex method for function minimization". Computer Journal 7: 308–313.
2. Background b. Early Methods
15. Interior Point Methods
Karmarkar’s Algorithm [Karmarkar 1984]
- Good for big data
- Fast convergence
- Polynomial complexity
- 50x faster than Simplex
- Single-Objective Only
- Requires Concavity
Narendra Karmarkar (1984). "A New Polynomial Time Algorithm for Linear Programming", Combinatorica, Vol 4, nr. 4, p. 373–395.
2. Background b. Early Methods
16. Heuristic-based Searches
Moving onward from Numerical Methods
- Improve a heuristic, not the actual objectives
- Hill Climbing: accept only improved steps
- Tabu Search: refuse only recently attempted steps
- Simulated Annealing: early bad steps okay, late bad steps refused
2. Background c. Recent Methods
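The three acceptance rules above differ only in when a step is taken. Simulated annealing's rule, for instance, can be sketched as follows (a minimal sketch: the linear cooling schedule and the neighbor function are illustrative assumptions, not the thesis's):

```python
import math
import random

def anneal(fitness, neighbor, x0, kmax=1000):
    """Simulated annealing: early on, worse steps may be accepted
    (to escape local optima); as the temperature cools, worse steps
    are refused. Returns the best solution seen."""
    x = best = x0
    for k in range(kmax):
        t = 1.0 - k / kmax          # temperature cools from 1 toward 0
        x2 = neighbor(x)
        delta = fitness(x2) - fitness(x)
        # Accept improvements always; accept a worse step with
        # probability exp(-delta / t), which shrinks as t drops.
        if delta < 0 or random.random() < math.exp(-delta / max(t, 1e-9)):
            x = x2
        if fitness(x) < fitness(best):
            best = x
    return best
```

For example, minimizing (x - 3)^2 from x = 0 with small random steps converges near 3.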
17. PSO & ACO
Particle Swarm Optimization [Kennedy 1995]
- Inspired by real-life swarms: flocks of birds, etc.
- Swarm towards good solutions
- Self best and Pack best
Ant Colony Optimization [Dorigo 1992]
- Ant Colony Path Searches
- Pheromone density = best path
Kennedy, J.; Eberhart, R. (1995). "Particle Swarm Optimization". Proceedings of IEEE International Conference on Neural Networks IV. pp. 1942–1948.
M. Dorigo, Optimization, Learning and Natural Algorithms, PhD thesis, Politecnico di Milano, Italy, 1992.
2. Background c. Recent Methods
18. Evolutionary Algorithms
Standard EA (Evolutionary Algorithm):
1) Build initial population
2) Repeat for max_generations:
a) crossover
b) mutation
c) select
3) Return final population
a+b) Build Offspring: perturb Population
c) Combine Offspring + Population; cull the worst solutions to retain Population Size
* Malin Åberg: http://physiol.gu.se/maberg/images.html
2. Background c. Recent Methods
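The standard EA loop above can be sketched in Python (a single-objective sketch for illustration; the fitness function, decision bounds, and mutation rate are illustrative assumptions):

```python
import random

def evolve(fitness, n_decisions, pop_size=100, max_generations=50,
           lo=0.0, hi=1.0, mutation_rate=0.1):
    """Standard EA: build, then repeat crossover + mutation + selection."""
    # 1) Build initial population of random candidates.
    pop = [[random.uniform(lo, hi) for _ in range(n_decisions)]
           for _ in range(pop_size)]
    for _ in range(max_generations):
        offspring = []
        for _ in range(pop_size):
            # 2a) Crossover: one-point crossover of two random parents.
            a, b = random.sample(pop, 2)
            cut = random.randrange(1, n_decisions) if n_decisions > 1 else 0
            child = a[:cut] + b[cut:]
            # 2b) Mutation: randomly perturb some decisions.
            child = [random.uniform(lo, hi) if random.random() < mutation_rate
                     else x for x in child]
            offspring.append(child)
        # 2c) Select: combine offspring + population, cull the worst.
        pop = sorted(pop + offspring, key=fitness)[:pop_size]
    # 3) Return final population (best first, since minimizing).
    return pop
```

Note the `2c` step is the "combine then cull" selection described on the slide; NSGA-II and SPEA2 replace the simple sort with dominance-based sorting.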
19. NSGA-II
NSGA-II [Deb 2002]
- Non-dominated Sorting Genetic Algorithm
- Standard select+crossover+mutation
- Sort by ‘bands’, or domination ‘depth’
- Break ties based on density (crowding distance)
Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE Transactions on Evolutionary Computation 6 (2): 182–197.
2. Background c. Recent Methods
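The domination relation that both NSGA-II and SPEA2 sort by can be sketched directly (minimization of every objective is assumed here; the sample points below are illustrative):

```python
def dominates(x, y):
    """x dominates y if x is no worse on every objective (minimizing)
    and strictly better on at least one."""
    return (all(a <= b for a, b in zip(x, y)) and
            any(a < b for a, b in zip(x, y)))

def nondominated(solutions):
    """Keep only solutions that no other solution dominates
    (i.e., the first 'band' of non-dominated sorting)."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

For objective vectors [(1, 2), (2, 1), (2, 2), (3, 3)], the first band is [(1, 2), (2, 1)]: neither dominates the other, and each dominates the rest.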
20. SPEA2
SPEA2 [Zitzler 2001]
- Strength Pareto Evolutionary Algorithm
- Standard select+crossover+mutation
- Sort by ‘strength’: count of solutions someone dominates
- Truncate crowded solutions via nearest neighbor
E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the strength pareto evolutionary algorithm for multiobjective optimization. Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, 95–100, 2001.
2. Background c. Recent Methods
21. MOO Critique (3)
In this chapter:
- Survey
- Rigor
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
4 Slides
22. Survey of MOO
Experimental Rigor
- Want to maximize validity
- Because there are reasons to doubt GALE:
- Does it still do well with few evaluations?
- Can it still run fast?
We looked at the literature for advice
- A search query targeted these questions
- Ended up selecting 21 papers
Statistical Methods?
- [Demsar 2006] recommends KS-Test + Friedman + Nemenyi
Population size?
- 20–100 is good. Over 200 is a waste.
Number of Repeats?
- [Harman 2012]: 30–50 is common. This Thesis: 20.
* J. Demsar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Dec. 2006.
* M. Harman et al., Search based software engineering: techniques, taxonomy, tutorial. In Empirical Software Engineering and Verification, Bertrand Meyer and Martin Nordio (Eds.). Springer-Verlag, Berlin, Heidelberg, 1–59.
3. MOO Critique
23. Principles 1 & 2
1. Use a variety of models
– Real World Models: Practicality.
– Standard Models: Reproducibility.
– Constrained and Unconstrained: Generalizability.
– Many papers used only lab models.
– In this thesis: 7 Constrained, 13 Unconstrained, 1 Privatized (CDA), 1 Public (POM3).
– Use models from all quadrants: Standard Models (Constrained Lab, Unconstrained Lab) and Real World Models (Public, Privatized).
2. How many Repeats
– Pragmatics: keep repeats low to save on computational cost
– Statistics: want high repeats for statistical stability
– The middle ground: for n in 20, 30, 40: no change. So 20 is good.
3. MOO Critique
24. Principles 3 & 4
3. Statistical Methods
– Based on Demsar’s Recommendations
– Begin with Kolmogorov-Smirnov (KS-Test) to test normality
• Data rarely conforms to normality assumptions
– For two-group testing, use Wilcoxon Rank Sum (WRS) Test
– For multi-group testing, use Friedman Test + Nemenyi
– Most papers failed to address number of groups
4. Runtimes
– Report runtimes to aid reproducibility arguments
– Report details of machine
– Half of the papers neglected to report runtimes
3. MOO Critique
25. Principles 5–7
5. Number of Evaluations
– Report number of evaluations
– Because they dominate runtime of real-world models
– Half of the papers neglected to report evaluations
6. Parameters
– Define all parameters carefully
– Reproducibility concerns: pop. size, #gens, stopping criteria
7. Discuss Threats to Validity
– Don’t make the reader do all the work
– Rigorous Experimental Methods = Stronger Conclusions
– Almost no one had a threats to validity section in their paper
3. MOO Critique
26. GALE (4)
In this chapter:
- Spectral Learning
- Active Learning
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
5 Slides
27. Introducing GALE
GALE: Geometric Active Learning (Evolution)
- At most O(2 log2 N) evaluations per generation
- Exactly Θ(2N) evaluations for NSGA-II, SPEA2
Main Differences in GALE:
- cluster solutions
- evaluate some, not all
- directed vs. random mutation
- More on these later
Asymptotic notation: Big-O = worst case; Big-Theta = exact case
4. GALE
28. Components to GALE
Three key phrases:
1. Active Learning (evaluate some, not all)
- Minimize cost of evaluation
- Learn more from using less [Settles 2009]
2. Spectral Learning (WHERE; clustered spectrally)
- Reasoning with eigenvectors via covariance matrix
- “Spectral Clustering” via eigenvectors
- FastMap finds eigenvectors faster than PCA
3. Directed Search (directed mutation)
- Shove solutions along promising directions
4. GALE
29. GALE Pseudo-Code
Algorithm shown here and explained over the next several slides
- WHERE algorithm
- WHERE uses FastMap
- Directed Mutation
1. Build initial population, P0. Initialize generation: t = 0. Set Life = 3.
2. Repeat until stopping criteria is met (stop if life == 0):
a. Run WHERE (with pruning) to select Rt = dominant leaves from WHERE.
b. Perform Directed Mutation on members of Rt.
c. Copy Rt into Pt+1 and generate new random candidates until the new population is full.
d. Increment generation number: t = t + 1.
e. Collect stats and evaluate stopping criteria. Decrement life if no improvement to any objective.
3. Run WHERE (without pruning) to select Rt = dominant leaves from WHERE.
4. Rt contains approximations to the Pareto frontier.
GALE = Spectral Learning + Active Learning + Directed Search
4. GALE
30. Nystrom Method
Spectral clustering is O(n^3) [Kumar12]
- Common method: PCA
- The Nystrom Method reduces this to near-linear
- Low-rank approx. of covariance matrix
e.g.: FastMap is a Nystrom Algorithm [Platt05]
- 1) Pick an arbitrary point, z.
- 2) Let ‘east’ be the furthest point from z.
- 3) Let ‘west’ be the furthest point from ‘east’.
- 4) Project all points onto the line east-west.
- 5) east-west is the first principal component.
[Figure: a point x at distances a and b from east and west, projected onto the east-west line of length c]
Active Learning:
- Only evaluate East & West!
4. GALE
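The five FastMap steps above can be sketched as code (a minimal sketch using Euclidean distance over plain Python lists and the cosine rule for the projection):

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two decision vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fastmap(points):
    """Project points onto an approximation of their first principal
    component: the line between the two most distant points."""
    # 1) Pick an arbitrary point z.
    z = random.choice(points)
    # 2) 'east' is the point furthest from z.
    east = max(points, key=lambda p: dist(z, p))
    # 3) 'west' is the point furthest from east.
    west = max(points, key=lambda p: dist(east, p))
    c = dist(east, west)
    # 4) Project each point x onto the east-west line via the cosine
    #    rule: projection = (a^2 + c^2 - b^2) / (2c), where a and b are
    #    x's distances to east and west.
    projections = []
    for x in points:
        a, b = dist(east, x), dist(west, x)
        projections.append((a * a + c * c - b * b) / (2 * c))
    # 5) The east-west line approximates the first principal component.
    return east, west, projections
```

The active-learning trick is that only east and west ever need evaluating on the objectives; every other candidate is positioned by distances alone.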
31. The WHERE Tool
WHERE = Spectral Learning in GALE
- Similar to Boley’s PDDP: find the first eigenvector and recursively split
- PDDP uses PCA. WHERE uses FastMap.
[Figure, left to right:]
- WHERE clusters the initial population = Spectral Learning
- Only evaluate the best (non-dominated) clusters = Active Learning
- Mutate along those clusters = Directed Search
- Refill the Population
At most 2 log2(NG) evaluations (N = population size, G = number of generations)
4. GALE
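The recursive descent WHERE performs can be sketched as follows (a self-contained sketch: the FastMap-style projection is repeated inline, the split is a median split on the projected axis, and the pruning of dominated halves is omitted):

```python
import math

def dist(a, b):
    """Euclidean distance between two decision vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def where(points, min_size=4):
    """Recursively split points into leaves by projecting each onto the
    east-west axis and dividing at the median projection."""
    if len(points) <= min_size:
        return [points]                       # a leaf cluster
    # FastMap-style poles: east is far from an arbitrary point,
    # west is far from east.
    east = max(points, key=lambda p: dist(points[0], p))
    west = max(points, key=lambda p: dist(east, p))
    c = dist(east, west) or 1.0               # guard against c == 0
    def proj(x):
        a, b = dist(east, x), dist(west, x)
        return (a * a + c * c - b * b) / (2 * c)
    ranked = sorted(points, key=proj)
    mid = len(ranked) // 2
    return where(ranked[:mid], min_size) + where(ranked[mid:], min_size)
```

Each level of the recursion evaluates only its two poles, which is how the logarithmic evaluation bound arises.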
32. Models (5)
In this chapter:
- CDA
- POM3
- Lab Models
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
4 Slides
33. CDA Model
Continuous Descent Arrival
- NASA wants to know if CDA is doable
- Standard descents are less efficient than CDA
-> more {noise, time, fuel, $$$}
- CDA might unnecessarily strain air traffic control (ATC)
5. Models a. CDA
34. Building CDA
Lots of work
- 2 months at NASA Ames Research Center
- CDA was not pre-assembled
Inspiration from the 2013 Asiana Flight Crash
- Pilots had to do unusually many more tasks than normal
- Keeping airspeed nominal was a task they ‘forgot’
- Human Factors models give a pilot an ‘HTM’ = maximum human taskload
Goal of CDA: less forgetting, less time lost to delays and missed tasks
* based on Work Models that Compute by Pritchett, Kim and Feigh, 2011–2013
5. Models a. CDA
35. POM3
POM3
- Model of Agile Software Requirements Engineering
Agile Software Projects
- Programmers rush to complete tasks
- But which tasks get the most priority?
Requirements Prioritization Strategies
- Find good schemes that optimize objectives
Repeat 2 < N < 6 times:
1. Collect Tasks
2. Prioritize Tasks
3. Execute Tasks
4. Find New Tasks
5. Adjust Priorities
Objectives to Minimize
- Total Cost
- % Idle Rate of Teams
Objectives to Maximize
- % Completion of Tasks
* POM3 based on POM2, based on POM, by Portman, Owens, Menzies (2008, 2009)
5. Models b. POM3
36. Standard Lab Models
We explore all of these (e.g., the Constrex Model):
Unconstrained: Fonseca, Golinski, Kursawe, Poloni, Schaffer, Viennet2-3-4, ZDT1-2-3, ZDT4-6
Constrained: BNH, Constrex, Osyczka2, Srinivas, Tanaka, TwoBarTruss, Water
5. Models c. Lab
37. Experiments (6)
In this chapter:
- Results
- Analysis
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
4 Slides
38. Experimental Methods
Research Questions:
- Number of Evaluations
- Runtime
- Quality of Solutions
4 Experiment Areas:
- #1 Aircraft Safety
- #2 Agile Software Development
- #3 Constrained Lab Models
- #4 Unconstrained Lab Models
Quality Score [Zitzler & Kunzli 2004]:
1. Run the Model 500 times
2. Collect an average-case baseline
3. Compute loss(x, baseline) for each solution x (o = number of objectives)
4. The median loss is the “Quality Score”
> 1.0: Loss in Quality from Baseline
= 1.0: No Change from Baseline
< 1.0: Improvement from Baseline
6. Experiments
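The loss function behind the quality score is a continuous-domination indicator in the style of [Zitzler & Kunzli 2004]. A sketch (the weight convention of -1 for minimized and +1 for maximized objectives is assumed here; the exact normalization used in the thesis may differ slightly):

```python
import math

def loss(x, y, weights):
    """Continuous-domination loss: how much is lost moving from
    objective vector x to objective vector y. weights[j] is -1 for a
    minimized objective, +1 for a maximized one. x 'dominates' y when
    loss(x, y) < loss(y, x). n here is 'o', the number of objectives."""
    n = len(x)
    return sum(-math.exp(w * (a - b) / n)
               for w, a, b in zip(weights, x, y)) / n
```

The quality score above is then the median of loss(x, baseline) over the returned frontier, normalized against the average-case baseline.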
39. RQ1: Number of Evaluations
Experiment | GALE | NSGA-II | SPEA2
#1 Aircraft Safety (CDA Model) | 50 +++ | 2800 = | 2450 =
#2 Agile Software (POM3 Models) | 36-46 +++ | 3000-3550 = | 3050-3300 =
#3 Constrained Lab Models | 28-88 +++ | 1050-3250 = | 950-3150 =
#4 Unconstrained Lab Models | 26-45 +++ | 1250-3550 = | 1250-3250 =
GALE needed two orders of magnitude fewer evaluations.
6. Experiments
40. RQ2: Runtime
Experiment | GALE | NSGA-II | SPEA2
#1 Aircraft Safety (CDA Model) | 6–20 mins +++ | 3–5 hrs = | 3–5 hrs =
#2 Agile Software (POM3 Models) | 1.5–9.5 s ++ | 4.0–108 s = | 12–109 s =
#3 Constrained Lab Models | 0.5–1.5 s = | 0.5–1.0 s = | 3–30 s –
#4 Unconstrained Lab Models | 0.5–2.5 s = | 0.5–1.0 s = | 3–30 s –
#5 16 Modes of the CDA Model | 83 hours | 6 months | 6 months
GALE needed two orders of magnitude less runtime.
GALE enabled an even larger study on CDA. NSGA-II and SPEA2 were not used in #5, so those values were extrapolated from #1.
6. Experiments
41. RQ3: Solution Quality
Cells are in ‘Wins-Losses-Ties’ format. KS-Test + Friedman + Nemenyi at the 99% level.
Experiment | GALE | NSGA-II | SPEA2
#1 Aircraft Safety (CDA Model) | 0-0-2 = | 0-0-2 = | 0-0-2 =
#2 Agile Software (POM3 Models) | 0-0-6 = | 0-1-5 = | 1-0-5 =
#3 Constrained Lab Models | 12-0-2 + | 0-6-8 = | 0-6-8 =
#4 Unconstrained Lab Models | 10-3-13 + | 1-5-20 = | 2-5-19 =
GALE never loses. GALE usually wins.
6. Experiments
42. Threats to Validity (7)
In this chapter:
- Validity
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
1 Slide
43. Threats to Validity
Most threats were already addressed.
The others are too minor for this presentation.
7. Validity
44. Conclusion (8)
In this chapter:
- Summary
- Ending
1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion
3 Slides
45. Summary
Popular MOO Tools Need O(2NG) Evaluations
- Very slow for large models
GALE: Geometric Active Learning (Evolution)
- Adds Data Mining to Search
- Evaluates only the most informative Solutions
- At most O(2 log2(NG)) Evaluations (usually fewer)
- Enables large studies with large models
- Finds good solutions for a wide variety of models
(N = population size; G = number of generations)
Active Learning: only evaluate East & West!
Models from all quadrants: Standard Models (Constrained Lab, Unconstrained Lab) and Real World Models (Public, Privatized)
8. Conclusion
46. Principles
Developed principles for rigorous experiments.
Employed those principles in our experiments.
8. Conclusion