Deploying your Predictive Models as a Service via Domino (Jo-fai Chow)
The document discusses how Domino Data Lab can be used to deploy predictive models as APIs. It provides examples of using Domino to build, evaluate, and deploy predictive models for the Iris dataset and stock market forecasting. Key features discussed include the web and R interfaces, code sharing, scheduled runs, automatic version control, and publishing models as APIs for other applications to access.
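As an illustration of the pattern the summary describes (train a model, then publish a single predict function as an API endpoint), here is a minimal Python/scikit-learn sketch for the Iris case. The talk itself worked through Domino's web and R interfaces; the function below and its signature are hypothetical, since Domino lets you choose which file and function to publish.

```python
# Minimal sketch (not the talk's code): an Iris classifier plus an entry-point
# function that a platform such as Domino could publish as an API endpoint.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(iris.data, iris.target)

def predict(sepal_length, sepal_width, petal_length, petal_width):
    """Hypothetical entry point: called once per API request."""
    row = [[sepal_length, sepal_width, petal_length, petal_width]]
    return str(iris.target_names[model.predict(row)[0]])
```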
Taverna workflows can be run in the cloud to automate complex analysis pipelines and access remote data and services. This allows sophisticated computational analyses to be shared as web services. The BioVeL and CA4LS projects are developing cloud-based workflow systems to support life scientists and clinical researchers. Workflows are hidden from users, who access pre-configured analyses via a web interface. This "workflow as a service" approach scales easily and provides a secure environment for data-intensive biomedical research.
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe... (Carole Goble)
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure the reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
The FAIR Guiding Principles for scientific data management and stewardship (http://www.nature.com/articles/sdata201618) have been an effective rallying cry for EU and USA Research Infrastructures. The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has 8 years of experience of asset sharing and data infrastructure, ranging across European programmes (the SysMO and EraSysAPP ERA-Nets), national initiatives (de.NBI, the German Virtual Liver Network, UK SynBio centres) and PIs' labs. It aims to support Systems and Synthetic Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs and models for Systems Biology, highlighting the challenges of and approaches to sharing, credit, citation and asset infrastructures in practice. I'll also highlight recent experiments in influencing sharing through behavioural interventions.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Presented at COMBINE 2016, Newcastle, 19 September.
http://co.mbine.org/events/COMBINE_2016
Tutorial given at the European Conference on Machine Learning (ECML PKDD 2015). It covers OpenML, how to use it in your research, its interfaces in Java, R and Python, and its use through machine learning tools such as WEKA and MOA. It also covers topics in open science and reproducible research.
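For a flavour of the Python interface the tutorial covers, a minimal sketch with the openml package might look like this; the task id is illustrative, and the commented publish call would share the run on OpenML.

```python
# Sketch of the OpenML Python interface (assumes `pip install openml` and,
# for publishing, an OpenML API key). Task 59 is used here for illustration.
import openml
from sklearn.tree import DecisionTreeClassifier

task = openml.tasks.get_task(59)                 # a supervised classification task
clf = DecisionTreeClassifier(max_depth=3)
run = openml.runs.run_model_on_task(clf, task)   # evaluated on the task's splits
print(run)
# run.publish()  # would upload the run to OpenML for reproducible comparison
```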
PhD Thesis: Mining abstractions in scientific workflows (dgarijo)
Slides of the presentation for my PhD dissertation. I strongly recommend downloading the slides, as they have animations that are easier to see in PowerPoint. The abstract of the thesis is as follows: "Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have proven to be useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large number of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples, may become an obstacle for a scientist aiming to reuse the work of other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse, and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have the potential to become useful for users designing new workflows."
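To make the graph-mining idea concrete, here is a toy sketch (my illustration, not the thesis code): each workflow is reduced to labeled step-to-step edges, and patterns recurring across several workflows become candidate abstractions. Real systems mine larger subgraphs, but the principle is the same.

```python
# Toy illustration of mining common abstractions across workflow DAGs:
# count step-to-step patterns and keep those appearing in several workflows.
from collections import Counter

workflows = [  # each workflow as (step, next_step) edges; names are invented
    [("download", "clean"), ("clean", "plot")],
    [("download", "clean"), ("clean", "train")],
    [("download", "clean"), ("clean", "plot")],
]
counts = Counter(edge for wf in workflows for edge in wf)
abstractions = [edge for edge, n in counts.items() if n >= 2]
print(abstractions)  # [('download', 'clean'), ('clean', 'plot')]
```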
Results may vary: Collaborations Workshop, Oxford 2014 (Carole Goble)
Thoughts on computational science reproducibility, with a focus on software. Given at the Software Sustainability Institute's 2014 Collaborations Workshop.
Agile Development in a Regulated Environment (TechWell)
There is no doubt that agile is an accepted development methodology. However, if you work in a regulated industry like health care where you have to comply with its standard operating procedures, heaps of paperwork, and frequent audits, don’t these conflict with agile’s core tenets? Chris Ampenberger describes his operating environment and the applicable regulations that define the constraints for the software development process he can use. He shares how they overcame the incongruity between agile and regulatory requirements. With real-world examples, Chris demonstrates how you can produce the required documentation as a byproduct of the scrum team’s everyday work and illustrates how his teams succeeded in an agile way, achieving significant increases in productivity. Chris points out common pitfalls, details the hurdles they had to overcome, and discusses how to obtain buy-in from stakeholders at all levels of the organization. If you are working in a regulated environment, this session is for you.
The SEALS project conducted the first worldwide evaluation of semantic tools using their SEALS platform. They evaluated ontology engineering tools, storage and reasoning systems, ontology matching tools, semantic search tools, and semantic web services tools. The results showed that certain tools performed better than others in each category. A white paper summarizing the results will be published soon, and the next evaluation campaign using the SEALS platform will begin in July 2011.
In this video from the 2017 Argonne Training Program on Extreme-Scale Computing, Phil Carns from Argonne presents: HPC I/O for Computational Scientists.
"Darshan is a scalable HPC I/O characterization tool. It captures an accurate but concise picture of application I/O behavior with minimum overhead."
Darshan was originally developed on the IBM Blue Gene series of computers deployed at the Argonne Leadership Computing Facility, but it is portable across a wide variety of platforms, including the Cray XE6, Cray XC30, and Linux clusters. Darshan routinely instruments jobs using up to 786,432 compute cores on the Mira system at ALCF.
Watch the video: https://wp.me/p3RLHQ-hv9
Learn more: https://extremecomputingtraining.anl.gov/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5, pp. 4-8, Sept.-Oct. 2014 (IEEE Computer Society).
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
From Scientific Workflows to Research Objects: Publication and Abstraction of... (dgarijo)
Presentation of my PhD work to the UPM group on 12 February 2014. Summary of goals, motivation, OPMW, standards, PROV, P-Plan, Workflow Motifs, workflow fragment detection and Research Objects.
Using SigOpt to Tune Deep Learning Models with Nervana Cloud (SigOpt)
This document discusses using SigOpt to tune deep learning models. It notes that tuning deep learning systems is non-intuitive and expert-intensive using traditional random search or grid search methods. SigOpt provides a more efficient approach using Bayesian optimization to suggest optimal hyperparameters after each trial, reducing wasted expert time and computation. The document provides examples applying SigOpt to tune convolutional neural networks on CIFAR10, demonstrating a 1.6% reduction in error rate over expert tuning with no wasted trials.
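The suggest/observe loop the summary describes is compact enough to sketch. The outline below follows the shape of SigOpt's Python client from that period; the token, parameter ranges and training stub are placeholders.

```python
# Hedged sketch of a SigOpt tuning loop (assumes the `sigopt` client of that
# era); the API token, parameters, and training stub are all placeholders.
from sigopt import Connection

def train_and_evaluate(log_lr, batch_size):
    """Placeholder: train the CNN here and return validation accuracy."""
    return 0.9  # stand-in value; replace with a real training run

conn = Connection(client_token="YOUR_SIGOPT_TOKEN")
experiment = conn.experiments().create(
    name="CIFAR10 CNN (sketch)",
    parameters=[
        {"name": "log_lr", "type": "double", "bounds": {"min": -6.0, "max": -1.0}},
        {"name": "batch_size", "type": "int", "bounds": {"min": 32, "max": 256}},
    ],
)
for _ in range(60):  # each trial: get a suggestion, evaluate it, report back
    suggestion = conn.experiments(experiment.id).suggestions().create()
    accuracy = train_and_evaluate(**suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id, value=accuracy)
```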
- Systems biology uses computational approaches to produce quantitative, predictive models of biological processes by integrating math, biology, and high-throughput data.
- Eclipse technology can help by providing an extensible and customizable user interface for biologists to access modeling tools and IDEs for computational modelers, with reusable components.
- The SBSI software provides clients, a dispatcher, numerics algorithms, and a repository for systems biology modeling and optimization, with plugins for tasks like pathway editing, simulation, and data visualization.
Opquast desktop: quick analysis of an open data dataset (Temesis)
Opquast desktop is a Firefox add-on that analyzes the quality of web pages. In Brussels, at the 14th Libre Software Meeting, the creators of this add-on demonstrated how Opquast desktop can check the quality of an open data dataset.
From Scientific Workflows to Research Objects: Publication and Abstraction of... (dgarijo)
Overview of my current work done at the Ontology Engineering Group. This presentation is similar to http://www.slideshare.net/dgarijo/from-scientific-workflows-to-research-objects-publication-and-abstraction-of-scientific-experiments, with a couple of extra slides with some details of my future plans.
The document discusses the Planets Testbed, which provides a controlled environment for experimenting with and evaluating digital preservation tools and strategies. The Testbed allows for systematic testing of tools on shared content, automated comparison of experiment results, and reproducibility of experiments. This enables more informed decision making about digital preservation approaches tailored to institutional needs and contexts. Key benefits of the Testbed include access to preservation tools and experimental data, as well as contributing to the growing body of knowledge on digital preservation.
Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. The Super Learner algorithm, also called "stacking", learns the optimal combination of the base learner fits. The latest version of H2O now contains a "Stacked Ensemble" method, which allows the user to stack H2O models into a Super Learner. The Stacked Ensemble method is the native H2O version of stacking, previously available only in the h2oEnsemble R package, and it now enables stacking from all the H2O APIs: Python, R, Scala, etc.
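A minimal Python sketch of the Stacked Ensemble route follows; it assumes the h2o package and an illustrative CSV, and uses the documented requirement that base models share fold assignments and keep their cross-validated predictions.

```python
# Minimal sketch of H2O's Stacked Ensemble from Python; file path and column
# layout (last column = binary target) are illustrative assumptions.
import h2o
from h2o.estimators import (H2OGradientBoostingEstimator,
                            H2ORandomForestEstimator,
                            H2OStackedEnsembleEstimator)

h2o.init()
train = h2o.import_file("train.csv")       # hypothetical dataset
x, y = train.columns[:-1], train.columns[-1]
train[y] = train[y].asfactor()             # binary outcome

# Base models must use identical folds and keep their CV predictions.
common = dict(nfolds=5, fold_assignment="Modulo",
              keep_cross_validation_predictions=True, seed=1)
gbm = H2OGradientBoostingEstimator(**common)
gbm.train(x=x, y=y, training_frame=train)
drf = H2ORandomForestEstimator(**common)
drf.train(x=x, y=y, training_frame=train)

# The metalearner is fit on the base models' cross-validated predictions.
ensemble = H2OStackedEnsembleEstimator(base_models=[gbm, drf])
ensemble.train(x=x, y=y, training_frame=train)
```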
Erin is a Statistician and Machine Learning Scientist at H2O.ai. Before joining H2O, she was the Principal Data Scientist at Wise.io (acquired by GE Digital) and Marvin Mobile Security (acquired by Veracode) and the founder of DataScientific, Inc. Erin received her Ph.D. from University of California, Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing.
Multimodal graph-based analysis over the DBLP repository: critical discoverie... (Universidade de São Paulo)
The use of graph theory for analyzing network-like data has gained central importance with the rise of Web 2.0. However, many graph-based techniques are neither well disseminated nor explored to their full potential, which may call for a complementary approach combining multiple techniques. This paper describes the systematic use of graph-based techniques of different types (multimodal), combining the resulting analytical insights around a common domain, the Digital Bibliography & Library Project (DBLP). To do so, we introduce an analytical ensemble based on statistical (degree and weakly-connected-component distributions), topological (average clustering coefficient, and effective diameter evolution), algorithmic (link prediction/machine learning), and algebraic techniques to inspect non-evident features of DBLP, while interpreting the heterogeneous discoveries found along the way. As a result, we have put together a set of techniques demonstrating over DBLP what we call multimodal analysis, an innovative process of information understanding that demands broad technical knowledge and a deep understanding of the data domain. We expect that our methodology and our findings will foster other multimodal analyses and shed light on Computer Science research.
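As a small self-contained illustration of combining these measure families (my sketch, not the paper's code), the networkx snippet below computes a degree distribution, weakly-connected components, average clustering and a link-prediction score over an invented directed graph.

```python
# Illustrative combination of statistical, topological and algorithmic
# measures on a tiny invented citation-style graph, using networkx.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])

# Statistical: degree distribution and weakly-connected components
degrees = sorted(d for _, d in G.degree())
components = list(nx.weakly_connected_components(G))

# Topological: average clustering coefficient
avg_clustering = nx.average_clustering(G.to_undirected())

# Algorithmic: link prediction via Jaccard coefficient (undirected view)
preds = list(nx.jaccard_coefficient(G.to_undirected(), [("b", "d")]))
print(degrees, len(components), avg_clustering, preds)
```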
Fyber implemented XGBoost models for two main use cases: Audience Vault Reach prediction and CTR prediction for their offer wall. For Audience Vault Reach, XGBoost with Spark was used to predict audience size over the next 14 days using historical user activity data. For CTR prediction, XGBoost ranked offers based on attributes to better estimate performance compared to old manual configurations. Both models involved data preprocessing, feature engineering, training XGBoost pipelines on Spark, and integrating the models into products.
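Since the summary names the technique, a simplified stand-in may help: the talk's pipelines ran XGBoost on Spark, but the single-machine Python sketch below (with invented features and labels) shows the core of a CTR ranking model.

```python
# Simplified single-machine sketch of a CTR model; the talk used XGBoost
# pipelines on Spark, and the features/labels here are synthetic.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                  # e.g. payout, category score, recency
y = (rng.random(1000) < 0.1).astype(int)   # clicked / not clicked

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)
ctr = model.predict_proba(X[:5])[:, 1]     # rank offers by predicted CTR
print(ctr)
```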
Scientific Workflows: what do we have, what do we miss? (Paolo Romano)
This document discusses scientific workflows and outlines some key points:
- Scientific workflows are used to automate data retrieval and analysis processes from multiple databases and tools. Workflow management systems help implement these processes.
- Issues with current workflow systems include lack of automatic composition capabilities, performance limitations especially with large data volumes, and ensuring reproducibility of results over time as databases and tools change.
- The document outlines approaches to address these issues such as using ontologies to support automatic composition, optimizing for performance through parallelization and alternative services, and capturing provenance data to improve reproducibility and reuse of analyses.
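As a concrete (if toy) illustration of the provenance point in the last item, the sketch below records each step's inputs, outputs and timing in a log that can later be audited or compared across runs; it is system-agnostic and not tied to any workflow manager mentioned in the talk.

```python
# Toy provenance capture: a decorator logs each step's input/output hashes
# and duration so a pipeline run can be traced and reproduced later.
import functools, hashlib, json, time

PROVENANCE = []

def traced(step):
    @functools.wraps(step)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = step(*args, **kwargs)
        PROVENANCE.append({
            "step": step.__name__,
            "inputs": hashlib.sha1(repr((args, kwargs)).encode()).hexdigest(),
            "output": hashlib.sha1(repr(result).encode()).hexdigest(),
            "seconds": round(time.time() - start, 3),
        })
        return result
    return wrapper

@traced
def normalise(values):
    total = sum(values)
    return [v / total for v in values]

normalise([1, 2, 3])
print(json.dumps(PROVENANCE, indent=2))
```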
The document discusses some of the promises and perils of mining software repositories like Git and GitHub for research purposes. It notes that while these sources contain rich data on software development, there are also challenges to consider. For example, decentralized version control systems like Git allow private collaboration that may be missed. Moreover, most GitHub projects are personal and inactive, and GitHub is also used purely for storage and hosting. The document recommends that researchers approach these data sources carefully and provides lessons on how to properly analyze and interpret data from repositories like Git and GitHub.
This paper presents an approach for mapping products, processes, and resources for assembly automation using ontologies. The approach uses ontologies to represent product, process, and resource knowledge and SWRL rules to infer required components and tasks for product assembly. The approach was tested on a Festo test rig case study. The results demonstrated that the ontology mappings enabled dynamic configuration and analysis of the automation system and eased the modeling task.
Creating abstractions from scientific workflows: PhD symposium 2015 (dgarijo)
This document discusses the creation of abstractions in scientific workflows. It hypothesizes that it is possible to automatically extract reusable patterns and abstractions from scientific workflow repositories that could be useful for developers. The document outlines challenges in workflow representation, abstraction, reuse, and annotation. It then describes an approach to define vocabularies and methodologies for publishing workflows as linked data. This includes defining a catalog of common workflow abstractions and techniques for finding and evaluating these abstractions across different workflow corpora. Evaluation shows the extracted patterns are similar to those defined by users and are considered useful.
Software tools to facilitate materials science research (Anubhav Jain)
The document discusses software tools to facilitate materials science research, noting that the author's group works to standardize and automate computational methods for high-throughput calculations and discovery of new functional materials. It advocates for developing automated workflows and analysis frameworks to reduce errors, improve efficiency, and enable non-experts to easily conduct complex simulations and analyses through intuitive online interfaces. The goal is to make advanced computational materials science accessible to a wider audience.
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles (dgarijo)
This document describes FOOPS, an ontology validation service that checks ontologies for adherence to the FAIR principles. FOOPS tests ontologies against criteria related to findability, accessibility, interoperability, and reusability. It provides explanations for test failures to help users improve their ontologies. FOOPS validation results include an overall FAIRness score and coverage of FAIR categories to assess ontology quality, though there is no single threshold for what makes an ontology fully FAIR. The document demonstrates FOOPS and lists the types of tests it supports under each FAIR category. It invites feedback to help further improve FOOPS.
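FOOPS! can also be invoked programmatically. The sketch below assumes a REST endpoint shaped like the public demo service; the URL, payload and response field are assumptions to check against the FOOPS! documentation.

```python
# Hedged sketch: submitting an ontology URI to FOOPS! for a FAIRness report.
# Endpoint, payload shape and response fields are assumptions, not verified API.
import requests

resp = requests.post(
    "https://foops.linkeddata.es/assessOntology",           # assumed endpoint
    json={"ontologyUri": "https://w3id.org/example/onto"},  # hypothetical ontology
    headers={"Content-Type": "application/json"},
    timeout=60,
)
report = resp.json()
print(report.get("overall_score"))  # field name is an assumption
```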
FAIR Workflows: A step closer to the Scientific Paper of the Future (dgarijo)
Keynote presented at the Computational and Autonomous Workflows workshop (CAW-2021) at Oak Ridge National Laboratory. The keynote gives an overview of the different aspects to take into account when aiming to create FAIR workflows and associated resources.
An increasing number of researchers rely on computational methods to generate the results described in their publications. Research software created to this end is heterogeneous (scripts, libraries, packages, notebooks, etc.) and usually difficult to find, reuse, compare and understand due to its disconnected documentation (dispersed across manuals, readme files, web sites, and code comments) and a lack of structured metadata to describe it. In this talk I will describe the main challenges in finding, comparing and reusing research software; how structured metadata can help address some of them; the best practices being proposed by the community; and current initiatives to aid their adoption by researchers within EOSC.
Impact: The talk addresses an important aspect of the EOSC infrastructure for quality research software by ensuring that software contributed to the EOSC ecosystem can be found, compared and reused by researchers. The talk also aims to address metadata quality of current research products, which is critical for successful adoption.
Presented at the EOSC symposium
SOMEF: a metadata extraction framework from software documentation (dgarijo)
Presentation given at the Council of Software Registries in March 2021. SOMEF is a Python package for automatically extracting over 25 metadata categories from a readme file. The output is then exported in JSON, or in JSON-LD using the CodeMeta representation.
A Template-Based Approach for Annotating Long-Tailed Datasets (dgarijo)
An increasing amount of data is shared on the Web through heterogeneous spreadsheets and CSV files. In order to homogenize and query these data, the scientific community has developed Extract, Transform and Load (ETL) tools and services that help make these files machine readable in Knowledge Graphs (KGs). However, tabular data may be complex, and the level of expertise required by existing ETL tools makes it difficult for users to describe their own data. In this paper we propose a simple annotation schema to guide users when transforming complex tables into KGs. We have implemented our approach by extending T2WML, a table annotation tool designed to help users annotate their data and upload the results to a public KG. We have evaluated our effort with six non-expert users, obtaining promising preliminary results.
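To make the template idea concrete (a generic sketch, not T2WML's actual syntax), a single declarative mapping can turn every row of a table into a knowledge-graph statement:

```python
# Generic illustration of template-based annotation: one template describes
# how every row of a table becomes a (subject, property, value) statement.
import pandas as pd

df = pd.DataFrame({"country": ["Peru", "Chile"],
                   "year": [2020, 2020],
                   "population": [32970000, 19116000]})  # invented figures

def apply_template(row):
    """One template applied uniformly to each data row."""
    return {"subject": row["country"],
            "property": "population",
            "value": row["population"],
            "qualifier": {"year": row["year"]}}

statements = [apply_template(row) for _, row in df.iterrows()]
print(statements)
```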
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs (dgarijo)
In this presentation we describe the Ontology-Based APIs framework (OBA), our approach to automatically create REST APIs from ontologies while following RESTful API best practices. Given an ontology (or ontology network), OBA uses standard technologies familiar to web developers (OpenAPI Specification, JSON) and combines them with W3C standards (OWL, JSON-LD frames and SPARQL) to create maintainable APIs with documentation, unit tests, automated validation of resources, and clients (in Python, JavaScript, etc.) that let non-Semantic-Web experts access the contents of a target knowledge graph. We showcase OBA with three examples that illustrate the capabilities of the framework for different ontologies.
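From the consumer side, an OBA-generated API behaves like any REST service. The calls below are hypothetical (invented host, resource paths and JSON shape) and only indicate the kind of requests a generated client wraps.

```python
# Hypothetical usage of an OBA-generated REST API; the base URL, resource
# names and JSON fields are invented for illustration.
import requests

BASE = "https://api.example.org/v1"
regions = requests.get(f"{BASE}/regions", params={"page": 1}, timeout=30).json()
first = requests.get(f"{BASE}/regions/{regions[0]['id']}", timeout=30).json()
print(first["label"])
```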
Towards Knowledge Graphs of Reusable Research Software Metadata (dgarijo)
Research software is a key asset for understanding, reusing and reproducing results in computational sciences. An increasing amount of software is stored in code repositories, which usually contain human readable instructions indicating how to use it and set it up. However, developers and researchers often need to spend a significant amount of time to understand how to invoke a software component, prepare data in the required format, and use it in combination with other software. In addition, this time investment makes it challenging to discover and compare software with similar functionality. In this talk I will describe our efforts to address these issues by creating and using Open Knowledge Graphs that describe research software in a machine readable manner. Our work includes: 1) an ontology that extends schema.org and CodeMeta, designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; 3) a framework for automatically extracting metadata from software repositories; and 4) a framework to curate, query, explore and compare research software metadata in a collaborative manner. The talk will illustrate our approach with real-world examples, including a domain application for inspecting and discovering hydrology, agriculture, and economic software models; and the results of our framework when enriching the research software entries in Zenodo.org.
Scientific Software Registry Collaboration Workshop: From Software Metadata r... (dgarijo)
In this talk I briefly describe our work on OntoSoft for easy software metadata representation, and how new requirements for software reusability are moving us towards knowledge graphs of scientific software metadata.
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data (dgarijo)
Today, data about any domain can be found on the web in data repositories, web APIs and many millions of spreadsheets and CSV files. Researchers and organizations make these data available in a myriad of formats, layouts, terminologies and states of cleanliness that make them difficult to integrate. As a result, researchers aiming to use data in their analyses face three main challenges. The first one is finding datasets related to a feature, variable or topic of interest. For example, climate scientists need to look for years of observational data from authoritative sources when estimating the climate of a region. The second challenge is completing a given dataset with existing knowledge: machine learning applications are data hungry and require as many data points and features as possible to improve their predictions, which often requires integrating data from different sources. The third challenge is sharing integrated results: once several datasets have been merged together, how to make them available to the rest of the community?
OKG-Soft: An Open Knowledge Graph With Machine Readable Scientific Software M... (dgarijo)
Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain human readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this presentation we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata.
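Because OKG-Soft exposes the metadata as an open knowledge graph, it can be queried with plain SPARQL. In the hedged sketch below the endpoint URL is a placeholder, and the sd: properties follow the Software Description Ontology this work builds on, so verify them against the published vocabulary.

```python
# Hedged sketch of querying a software-metadata knowledge graph with SPARQL;
# the endpoint URL is a placeholder and property names should be verified.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX sd: <https://w3id.org/okn/o/sd#>
    SELECT ?software ?desc WHERE {
        ?software a sd:Software ;
                  sd:description ?desc .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["software"]["value"], row["desc"]["value"])
```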
Towards Human-Guided Machine Learning - IUI 2019 (dgarijo)
Automated Machine Learning (AutoML) systems are emerging that automatically search for possible solutions from a large space of possible kinds of models. Although fully automated machine learning is appropriate for many applications, users often have knowledge that supplements and constrains the available data and solutions. This paper proposes human-guided machine learning (HGML) as a hybrid approach where a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user’s knowledge about the data available. We present: 1) a task analysis of HGML that shows the tasks that a user would want to carry out, 2) a characterization of two scientific publications, one in neuroscience and one in political science, in terms of how the authors would search for solutions using an AutoML system, 3) requirements for HGML based on those characterizations, and 4) an assessment of existing AutoML systems in terms of those requirements.
Capturing Context in Scientific Experiments: Towards Computer-Driven Science (dgarijo)
Scientists publish computational experiments in ways that do not facilitate reproducibility or reuse. Significant domain expertise, time and effort are required to understand scientific experiments and their research outputs. In order to improve this situation, mechanisms are needed to capture the exact details and the context of computational experiments. Only then will intelligent systems be able to help researchers understand, discover, link and reuse the products of existing research.
In this presentation I will introduce my work and vision towards enabling scientists to share, link, curate and reuse their computational experiments and results. In the first part of the talk, I will present my work on capturing and sharing the context of scientific experiments by using scientific workflows and machine readable representations. Thanks to this approach, experiment results are described in an unambiguous manner, have a clear trace of their creation process and include a pointer to the sources used for their generation. In the second part of the talk, I will describe examples of how the context of scientific experiments may be exploited to browse, explore and inspect research results. I will end the talk by presenting new ideas for improving and benefiting from the capture of the context of scientific experiments, and for involving scientists in the process of curating and creating abstractions on top of available research metadata.
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met... (dgarijo)
Traditional approaches to ontology development have a large lapse between the time when a user of the ontology finds a need to extend it and the time when it actually gets extended. For scientists, this delay can be weeks or months and can be a significant barrier to adoption. We present a new approach to ontology development and data annotation enabling users to add new metadata properties on the fly as they describe their datasets, creating terms that can be immediately adopted by others and eventually become standardized. This approach combines a traditional, consensus-based approach to ontology development with a crowdsourced approach where expert users (the crowd) can dynamically add terms as needed to support their work. We have implemented this approach as a socio-technical system that includes: 1) a crowdsourcing platform to support metadata annotation and the addition of new terms, 2) a range of social editorial processes to make standardization decisions for those new terms, and 3) a framework for ontology revision and updates to the metadata created with the previous version of the ontology. We present a prototype implementation for the paleoclimate community, the Linked Earth Framework, currently containing 700 datasets and engaging over 50 active contributors. Users exploit the platform to do science while extending the metadata vocabulary, thereby producing useful and practical metadata.
WIDOCO: A Wizard for Documenting Ontologies (dgarijo)
WIDOCO is a WIzard for DOCumenting Ontologies that guides users through the documentation process of their vocabularies. Given an RDF vocabulary, WIDOCO detects missing vocabulary metadata and creates a documentation with diagrams, human readable descriptions of the ontology terms and a summary of changes with respect to previous versions of the ontology. The documentation consists of a set of linked, enriched HTML pages that can be further extended by end users. WIDOCO is open source and builds on well-established Semantic Web tools. So far, it has been used to document more than one hundred ontologies in different domains.
We propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) a record of events that describes a new result through a workflow and/or the provenance of all the computations executed; 2) persistent entries for the key entities involved, such as data, software versions, and workflows; 3) a set of narrative accounts that are automatically generated, human-consumable renderings of the record and entities, and can be included in a paper. Different narrative accounts can be used for different audiences with different content and details, based on the level of interest or expertise of the reader. Data narratives can make science more transparent and reproducible, because they ensure that the text description of the computational experiment reflects with high fidelity what was actually done. Data narratives can be incorporated in papers, either in the methods section or as supplementary materials. We introduce DANA, a prototype that illustrates how to generate data narratives automatically, and describe the information it uses from the computational records. We also present a formative evaluation of our approach and discuss potential uses of automated data narratives.
Automated Hypothesis Testing with Large Scale Scientific Workflows (dgarijo)
(Credit to Varun Ratnakar and Yolanda Gil).
The automation of important aspects of scientific data analysis would significantly accelerate the pace of science and innovation. Although important aspects of data analysis can be automated, the hypothesize-test-evaluate discovery cycle is largely carried out by hand by researchers. This introduces a significant human bottleneck, which is inefficient and can lead to erroneous and incomplete explorations. We introduce a novel approach to automate the hypothesize-test-evaluate discovery cycle with an intelligent system that a scientist can task to test hypotheses of interest in a data repository. Our approach captures three types of data analytics knowledge: 1) common data analytic methods represented as semantic workflows; 2) meta-analysis methods that aggregate those results, represented as meta-workflows; and 3) data analysis strategies that specify for a type of hypothesis what data and methods to use, represented as lines of inquiry. Given a hypothesis specified by a scientist, appropriate lines of inquiry are triggered, which lead to retrieving relevant datasets, running relevant workflows on that data, and finally running meta-workflows on workflow results. The scientist is then presented with a level of confidence on the initial hypothesis (or a revised hypothesis) based on the data and methods applied. We have implemented this approach in the DISK system, and applied it to multi-omics data analysis.
OntoSoft: A Distributed Semantic Registry for Scientific Software (dgarijo)
Credit to Yolanda Gil.
OntoSoft is a distributed semantic registry for scientific software. This paper describes three major novel contributions of OntoSoft: 1) a software metadata registry designed for scientists, 2) a distributed approach to software registries that targets communities of interest, and 3) metadata crowdsourcing through access control. Software metadata is organized using the OntoSoft ontology along six dimensions that matter to scientists: identify software, understand and assess software, execute software, get support for the software, do research with the software, and update the software. OntoSoft is a distributed registry where each site is owned and maintained by a community of interest, with a distributed semantic query capability that allows users to search across all sites. The registry has metadata crowdsourcing capabilities, supported through access control so that software authors can allow others to expand on specific metadata properties.
OEG tools for supporting Ontology Engineering (dgarijo)
The document summarizes several tools developed by the Ontology Engineering Group (OEG) to support ontology engineering, including Vocabularium for serving ontologies online, OnToology for evaluation reports, documentation and publishing of ontologies, AR2DTool for ontology diagrams, Widoco for HTML documentation, and OOPS! for ontology quality evaluations. It provides an overview of the capabilities of each tool and URLs for their websites and GitHub repositories.
Software Metadata: Describing "dark software" in GeoSciences (dgarijo)
This document discusses describing "dark software" or unshared scientific software in geosciences. It proposes using the OntoSoft ontology to capture standardized metadata about scientific software. This would allow software to be more discoverable, reusable and reproducible. The document outlines the types of metadata captured by OntoSoft and demonstrates how it can be used to describe software and facilitate search and comparison of different tools.
Reproducibility Using Semantics: An Overview (dgarijo)
Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Furthermore, Research Objects are introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This presentation was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Webinar: Designing a schema for a Data Warehouse (Federico Razzoli)
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous sources, including the databases that back the applications used by the company, data files exported by some applications, and APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which first requires gathering information about the business processes to be analysed. These processes must then be translated into so-called star schemas: denormalised schemas in which each table represents either a dimension or facts (a minimal sketch follows the topic list below).
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
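Not covered verbatim in the webinar, but as an illustration of the star-schema idea above, here is a minimal sketch using Python's built-in sqlite3; the toy retail tables, columns, and values are all invented for this example:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: denormalised descriptive attributes.
CREATE TABLE dim_date (
    date_id   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,          -- e.g. '2024-06-01'
    month     TEXT NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
-- Fact table: one row per sale (the chosen granularity), holding measures
-- plus a foreign key into each dimension.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    quantity   INTEGER NOT NULL,
    amount_eur REAL NOT NULL
);
""")
conn.execute("INSERT INTO dim_date VALUES (1, '2024-06-01', 'June', 2024)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Espresso machine', 'Kitchen')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 2, 398.0)")

# A typical analytical query: aggregate the facts, slice by dimension attributes.
for row in conn.execute("""
    SELECT d.year, p.category, SUM(f.amount_eur)
    FROM fact_sales f
    JOIN dim_date d ON d.date_id = f.date_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY d.year, p.category
"""):
    print(row)

Keeping the fact table narrow and the dimensions denormalised is what makes such queries simple to write and fast to aggregate.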
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system (a minimal metrics sketch follows this list).
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
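As a small taste of topic 8 (not code from the tutorial itself), here is a minimal sketch using the official prometheus_client library; the metric names, threshold, and fake model are invented, and a Prometheus server would scrape the endpoint this script exposes on port 8000:

import random
import time

from prometheus_client import Counter, Gauge, start_http_server

ANOMALIES = Counter("anomalies_detected_total",
                    "Number of anomalies flagged by the model")
SCORE = Gauge("anomaly_score",
              "Most recent anomaly score produced by the model")

def fake_inference() -> float:
    # Stand-in for the real anomaly detection model; scores in [0, 1].
    return random.random()

if __name__ == "__main__":
    start_http_server(8000)   # metrics at http://localhost:8000/metrics
    while True:
        score = fake_inference()
        SCORE.set(score)
        if score > 0.9:        # arbitrary threshold for this sketch
            ANOMALIES.inc()
        time.sleep(1)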
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of the presentation accompanying the talk I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data into vector representations and push the vectors to the Milvus vector database for search serving. A minimal sketch of that ingestion path follows.
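The sketch below is not from the talk: it assumes a Milvus instance reachable at localhost:19530 and stubs the embedding model with a deterministic toy function; the collection name and sample documents are invented:

import hashlib

from pymilvus import MilvusClient
from pyspark.sql import SparkSession

DIM = 8

def toy_embed(text: str) -> list:
    # Deterministic stand-in for a real embedding model.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

# Use Spark to turn unstructured text into vector representations.
spark = SparkSession.builder.appName("milvus-ingest").getOrCreate()
docs = spark.createDataFrame([(0, "anomaly detection"), (1, "vector search")],
                             ["id", "text"])
rows = docs.rdd.map(lambda r: {"id": r.id, "vector": toy_embed(r.text)}).collect()
spark.stop()

# Push the vectors to Milvus and run a sample search against them.
client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="docs", dimension=DIM)
client.insert(collection_name="docs", data=rows)
hits = client.search(collection_name="docs", data=[toy_embed("search")], limit=1)
print(hits)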
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing, so that you can lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to part 6 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover test automation with generative AI and OpenAI.
This webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI into UiPath's test automation solution using OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. Testers and automation professionals will gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Threats to mobile devices are more prevalent than ever and increasing in scope and complexity. Users of mobile devices want to take full advantage of their features, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect their personal devices and information.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to immerse yourself in a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open-source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia association, where she was involved in several events, migrations, and training activities related to LibreOffice. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko, she cultivates her curiosity about astronomy (which is where her nickname, deneb_alpha, comes from).
On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
1. On Specifying and Sharing Scientific Workflow Optimization Results Using Research Objects
8th Workshop On Workflows in Support of Large-Scale Science, 17 November 2013
Sonja Holl*, Daniel Garijo+, Khalid Belhajjame$, Olav Zimmermann*, Renato De Giovanni#, Matthias Obst~, Carole Goble$
*Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich, Germany
+Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Spain
$School of Computer Science, University of Manchester, UK
#Reference Center on Environmental Information, Campinas, SP, Brazil
~Department of Biological and Environmental Sciences, University of Gothenburg, Sweden
2. Scientific Workflows
- Popular choice to design, manage, and execute in silico experiments
- Sharing and reuse via workflow repositories
3. Ecological Niche Modeling
- Modelling species adaptation to environmental changes (BioVeL Project)
[Figure: the five numbered steps of the ecological niche modelling process]
4. Ecological Niche Modeling Workflow
[Figure: workflow diagram. Inputs Parameter, Occurrence Data, Environmental Layer, and Geographic Mask feed createModel; the model then passes through testModel and calcAUC, producing the AUC output]
6. Ecological Niche Modeling Workflow
[Figure: the same workflow with its choice points made explicit: parameters Gamma, Cost, and NumberOfPseudoAbsences, and a selection of modelling algorithms (SVM, Maxent, GARP) feeding createModel, followed by testModel and calcAUC producing the AUC]
7. Ecological Niche Modeling Workflow
[Figure: the same workflow overlaid with scattered candidate values (e.g. ‐3.2, 0.5, 100, gaussian) under the labels "Select Algorithms" and "Select Parameters", illustrating how quickly the combined search space of algorithm and parameter settings grows]
8. Common strategies to handle this challenge
- Default parameters & applications
- Trial and error
- Parameter sweeps
But:
- Increasing complexity of scientific workflows
- Rising number of parameters
- Work-time and compute intensive
10. Intelligent automated optimization techniques
Goal: an automated way to find the workflow settings that optimize the output
- Define workflow output(s) as the fitness value
- Use the fitness value for evaluation (e.g. AUC or a correlation coefficient)
- Use a heuristic search algorithm to find the best settings (a minimal sketch follows)
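The deck contains no code, so here is a minimal sketch of the idea, assuming a hypothetical run_workflow function that executes the (sub-)workflow with one parameter setting and returns its fitness (e.g. the AUC). The genetic-algorithm rates match slide 17 (mutation 0.1, crossover 0.7) and the Gamma range matches slide 18; everything else, including the toy fitness surface and the Cost range, is invented for illustration:

import random

# Hypothetical stand-in for executing the (sub-)workflow with one parameter
# setting and reading back its fitness output (e.g. AUC).
def run_workflow(gamma: float, cost: float) -> float:
    # Toy fitness surface for illustration only; a real run would execute
    # createModel/testModel/calcAUC and return the AUC.
    return 1.0 - ((gamma - 4.0) ** 2 + (cost - 2.0) ** 2) / 200.0

# Search space: Gamma is a double in [0, 10] (slide 18); the same range is
# assumed for Cost purely for this sketch.
BOUNDS = {"gamma": (0.0, 10.0), "cost": (0.0, 10.0)}
MUTATION_RATE, CROSSOVER_RATE = 0.1, 0.7   # values shown on slide 17

def random_individual():
    return {k: random.uniform(*b) for k, b in BOUNDS.items()}

def mutate(ind):
    return {k: random.uniform(*BOUNDS[k]) if random.random() < MUTATION_RATE else v
            for k, v in ind.items()}

def crossover(a, b):
    if random.random() < CROSSOVER_RATE:
        return {k: random.choice((a[k], b[k])) for k in a}
    return dict(a)

def optimize(pop_size=20, generations=10):
    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=lambda i: run_workflow(**i))
    for _ in range(generations):           # termination: fixed generation count
        scored = sorted(population, key=lambda i: run_workflow(**i), reverse=True)
        parents = scored[: pop_size // 2]  # simple truncation selection
        population = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(pop_size)]
        best = max(population + [best], key=lambda i: run_workflow(**i))
    return best, run_workflow(**best)

if __name__ == "__main__":
    setting, fitness = optimize()
    print(f"Best setting {setting} with fitness {fitness:.3f}")

In the real framework, the evaluation step would dispatch workflow runs through Taverna (potentially in parallel) rather than call a local function.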
11. How does it work?
- Development of an optimization framework that extends the Taverna workflow management system
- Abstracts the optimization process (e.g. parallel execution, security)
- A developer API allows rapid adaptation of new optimization methods
- Optimization plugins can be added independently
[Figure: architecture stack with the Taverna WMS at the base, the optimization framework layer above it, and an API through which plugins such as Parameter Optimization and Component Optimization are attached]
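The deck names the developer API but does not show it, so the following is a purely hypothetical sketch of what a pluggable optimization contract could look like; every class and method name here is invented, and the real Taverna API will differ:

from abc import ABC, abstractmethod
from typing import Callable

class OptimizationPlugin(ABC):
    # Hypothetical contract: plugins see only parameter settings and fitness
    # values; execution details stay inside the framework.

    @abstractmethod
    def propose(self, history: list) -> list:
        """Return the next batch of parameter settings (dicts) to evaluate."""

    @abstractmethod
    def finished(self, history: list) -> bool:
        """Termination condition, e.g. a generation limit or a fitness plateau."""

def optimize(plugin: OptimizationPlugin,
             run_workflow: Callable[[dict], float]):
    # The framework abstracts parallel execution and security away from the
    # plugin; runs are sequential here for simplicity.
    history = []   # list of (setting, fitness) pairs
    while not plugin.finished(history):
        for setting in plugin.propose(history):
            history.append((setting, run_workflow(setting)))
    return max(history, key=lambda h: h[1])

A genetic-algorithm plugin like the one on the following slides would implement propose as selection, crossover, and mutation over the best settings in history.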
12. Taverna Optimization Framework & Plugin
(1) Define the sub-workflow
(2) Specify input parameters (constraints)
(3) Select fitness output parameters (e.g. AUC)
(4) Define optimization method parameters (population size, termination criteria)
[Figure: screenshot of the Genetic Algorithm Parameter Optimization Plugin stepping through generations 1, 2, ..., x with the best fitness improving from 0.34 to 0.42, 0.48, and finally 0.49, before displaying the optimization result]
13. Status quo
- Workflow optimization starts from scratch each time
- Optimization meta-data are lost
Idea: capture optimization meta-data next to traditional provenance data
⇒ learn from and extend prior optimization runs
⇒ improve and accelerate the optimization process
14. Research Objects
- Aligned with W3C standards
- Aggregate various resources
- Describe scientific processes in a machine-readable format
- Specified by several ontologies
[Figure: a Research Object bundle linking its resources via ore:aggregates]
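To make the ore:aggregates relation concrete, here is a minimal rdflib sketch; the ORE and wf4ever RO namespaces are the real ones, while the example.org resource URIs are invented:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")
RO = Namespace("http://purl.org/wf4ever/ro#")
EX = Namespace("http://example.org/")

g = Graph()
g.bind("ore", ORE)
g.bind("ro", RO)

# A Research Object aggregating a workflow and one of its result files.
ro = EX["research-object/1"]
g.add((ro, RDF.type, RO.ResearchObject))
for resource in (EX["workflow.t2flow"], EX["results/auc.csv"]):
    g.add((ro, ORE.aggregates, resource))

print(g.serialize(format="turtle"))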
15. Taverna Optimization Framework & Plugin
[Slide repeats the plugin screenshot and the four setup steps from slide 12, as the starting point for capturing optimization meta-data]
16. Optimization Research Object Ontology
opt:OptimizationResearchObject is an rdfs:subClassOf ro:ResearchObject and ore:aggregates the following resources:
- opt:Algorithm: describes the optimization algorithm and its parameters
- opt:Fitness: describes the fitness functions
- opt:Generation: defines the population size and generation number for an optimization run
- opt:OptimizationRun: represents one result set: sub-workflow, parameters, and obtained fitness values
- opt:SearchSpace: describes the dependencies and parameter constraints
- opt:TerminationCondition: describes the termination condition defined by the user
- opt:Workflow: the workflow that was optimized
17. Algorithm
- Genetic Algorithm
- Mutation rate: 0.1
- Crossover rate: 0.7
18. Search Space
Gamma:
- Double
- Range 0-10
- Constraint: Cost/2 < Gamma (fictional)
19. Optimization Run
- Origin of the result
- Parameter setting
- Fitness value
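Pulling slides 16-19 together, here is a minimal rdflib sketch of how one optimization run could be recorded. The class names (opt:Algorithm, opt:SearchSpace, opt:OptimizationRun) and the example values (genetic algorithm with mutation rate 0.1 and crossover rate 0.7; Gamma as a double in 0-10 with the fictional constraint Cost/2 < Gamma) come from the deck, but the opt namespace URI and all property names are invented, since the slides do not show them:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

OPT = Namespace("http://example.org/opt#")   # placeholder namespace
EX = Namespace("http://example.org/enm/")

g = Graph()
g.bind("opt", OPT)

# opt:Algorithm -- the optimization algorithm and its parameters (slide 17).
algo = EX["algorithm"]
g.add((algo, RDF.type, OPT.Algorithm))
g.add((algo, OPT.name, Literal("Genetic Algorithm")))
g.add((algo, OPT.mutationRate, Literal(0.1)))
g.add((algo, OPT.crossoverRate, Literal(0.7)))

# opt:SearchSpace -- parameter types, ranges, and constraints (slide 18).
space = EX["searchSpace"]
g.add((space, RDF.type, OPT.SearchSpace))
g.add((space, OPT.parameter, Literal("Gamma: double, 0-10")))
g.add((space, OPT.constraint, Literal("Cost/2 < Gamma")))

# opt:OptimizationRun -- one result set: origin, setting, fitness (slide 19).
run = EX["run/1"]
g.add((run, RDF.type, OPT.OptimizationRun))
g.add((run, OPT.usedAlgorithm, algo))
g.add((run, OPT.parameterSetting, Literal("Gamma=4.55, Cost=6.7")))
g.add((run, OPT.fitnessValue, Literal(0.49)))

print(g.serialize(format="turtle"))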
20. Taverna Optimization Framework & Plugin
[Figure: the plugin screenshot again, now expanded to show generation 1, iteration 1 (fitness 0.05) behind the best-fitness values 0.34, 0.42, 0.48, ..., 0.49]
21. Taverna Optimization Framework & Plugin
[Figure: the screenshot expanded further: generation 1, iterations 1-6 with fitness values 0.05, 0.22, 0.27, 0.19, 0.31, and 0.34, showing the per-iteration meta-data captured behind the best-fitness display]
24. Benefits of sharing and exploiting Optimization Research Objects
- What is the optimal setting? Reuse optimized settings
- What ranges have been explored? Adopt the parameter ranges used
- What algorithm settings were used? Reuse algorithm settings
- Are there similar optimizations? Reuse existing results
- Resume the optimization
- Embed optimization provenance into workflow infrastructures, to be reused by other scientists
25. Conclusion
- Scientific workflows are hard to configure
- Optimization can help, but meta-data get lost
- Extend Research Objects
- Build a new Optimization Research Object Ontology
- Reuse optimization meta-data to speed up optimization
- Shareable with the community in workflow infrastructures
- Outlook: how to learn from similar workflows?