This document discusses workflow monitoring and performance analysis using Stampede. Stampede models monitoring data from running scientific workflows in real-time and performs analysis to predict failures and identify problematic resources. Unsupervised learning is used to cluster workflows based on historical data. Online analysis then classifies new workflows to identify those with high failure rates. The goal is to provide feedback to users and workflow engines to adapt workflows and improve performance.
Online Workflow Management and Performance Analysis with Stampede
1. Online Workflow Management and Performance Analysis with Stampede
Dan Gunter1, Taghrid Samak1, Monte Goode1,
Ewa Deelman2, Gaurang Mehta2, Fabio Silva2, Karan Vahi2,
Christopher Brooks3,
Priscilla Moraes4, Martin Swany4
1 Lawrence Berkeley National Laboratory
2 University of Southern California, Information Sciences Institute
3 University of San Francisco
4 University of Delaware
3. Goal: Predict behavior of running scientific workflows
— Primarily failures
— Is a given workflow going to “fail”?
— Are specific resources causing problems?
— Which application sub-components are failing?
— Is the data staging a problem?
— In large workflows, some failures, etc. are normal
— This work is about learning from known problems, which patterns of failures, etc. are unusual and require adaptation
— Do all of this as generally as possible: can we provide a solution that can apply to all workflow engines?
4. Approach
— Model the monitoring data from running workflows
— Collect all the data in real-time
— Run analysis, also in real-time, on the collected data
— Map low-level failures to application-level characteristics
— Feed back analysis to user, workflow engine
5. Scientific Applications
— Montage (Astronomy)
— Epigenome (Bioinformatics)
— LIGO (Astrophysics)
— CyberShake (Geophysics)
6. Domain: Large Scientific Workflows
SCEC-2009: Millions of tasks completed per day
[Figure annotation: Radius = 11 million]
8. Basic terms and concepts
[Diagram: a Workflow Management System runs a Workflow on Resources; each Execution ends in Success or Fail]
9. Base technologies
— Workflow management system: Pegasus (www.pegasus.isi.edu)
— Monitoring and data analysis: NetLogger (www.netlogger.lbl.gov)
10. Data Model
11. Data Model Goals
— Be widely applicable: there are many workflow engines out there that could benefit
— Provide everything we need for Pegasus workflows
12. Abstract and Executable Workflows
— Workflows start as a resource-independent statement of computations, input and output data, and dependencies
— This is called the Abstract Workflow (AW)
— For each workflow run, Pegasus-WMS plans the workflow, adding helper tasks and clustering small computations together
— This is called the Executable Workflow (EW)
— Note: Most of the logs are from the EW, but the user really only knows the AW
13. Additional Terminology
— Workflow: Container for an entire computation
— Sub-workflow: Workflow that is contained in another workflow
— Task: Representation of a computation in the AW
— Job: Node in the EW
— May represent part of a task (e.g., a stage-in/out), one task, or many tasks
— Job instance: Job scheduled or running by underlying system
— Due to retries, there may be multiple job instances per job
— Invocation: One or more executables for a job instance
— Invocations are the instantiation of tasks, whereas jobs are an intermediate abstraction for use by the planning and scheduling sub-systems
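To make the containment between these terms concrete, here is a minimal Python sketch of the hierarchy; the class and field names are illustrative, not the actual Stampede schema:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Invocation:      # one executable run for a job instance
    exitcode: int = 0

@dataclass
class JobInstance:     # one scheduling attempt of a job; retries create more
    invocations: List[Invocation] = field(default_factory=list)

@dataclass
class Job:             # node in the Executable Workflow (EW)
    task_ids: List[str] = field(default_factory=list)  # part of, one, or many AW tasks
    instances: List[JobInstance] = field(default_factory=list)

@dataclass
class Workflow:        # container for an entire computation
    jobs: List[Job] = field(default_factory=list)
    sub_workflows: List["Workflow"] = field(default_factory=list)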
14. Denormalized Data Model
— Stream of timestamped “events”:
— a unique, hierarchical name
— unique identifiers (workflow, job, etc.)
— values and metadata
— Used the NETCONF YANG data-modeling language, keyed on event name [RFCs 6020, 6021 (6087)]
— YANG schema (see bit.ly/nQfPd1) documents and validates each log event
Snippet of schema
container stampede.xwf.start {
  description "Start of executable workflow";
  uses base-event;
  leaf restart_count {
    type uint32;
    description "Number of times workflow was restarted (due to failures)";
  }
}
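For illustration, a NetLogger Best Practices (BP) event described by a schema like this is a single line of space-separated name=value pairs. The line below is a made-up example: ts, event, and restart_count come from the slide, while level and xwf.id (and all values) are assumptions for illustration:

ts=2011-10-27T10:15:00.000000Z event=stampede.xwf.start level=Info xwf.id=a4799e33-f65e-4c1c-9a27-318f5b2e8f9e restart_count=0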
15. Relational data model
[Schema diagram: on the Abstract Workflow (AW) side, task and task_edge (task parent/child dependencies); on the Executable Workflow (EW) side, job, job_edge (job parent/child dependencies), job_instance, jobstate (job status), and invocation; workflow and workflow_state tie the AW and EW together with overall workflow status]
16. Infrastructure
17. Infrastructure overview
[Diagram: raw logs are collected and normalized; normalized logs can then be queried or subscribed to]
18. Detailed data flow
[Diagram: Pegasus produces raw logs; NetLogger performs log collection and normalization; normalized events feed real-time analysis (failure detection) and a relational archive]
19. Message bus usage
[Diagram: BP log events are published to an AMQP exchange with routing key = event name; each analysis client subscribes to a queue bound to the exchange and receives the matching data]
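As a minimal sketch of this pattern (not the actual Stampede code), the following Python uses the pika AMQP client against an assumed local RabbitMQ broker; the exchange name, queue binding, and BP line are made up:

import pika

# Connect to an assumed local broker and declare a topic exchange.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="stampede", exchange_type="topic")

# Analysis client: bind a private queue to the event names of interest.
q = ch.queue_declare(queue="", exclusive=True).method.queue
ch.queue_bind(exchange="stampede", queue=q, routing_key="stampede.xwf.*")

# Producer side: the routing key is the event name, the body is the BP line.
bp_line = "ts=2011-10-27T10:15:00.000000Z event=stampede.xwf.start restart_count=0"
ch.basic_publish(exchange="stampede", routing_key="stampede.xwf.start",
                 body=bp_line.encode())

# Deliver matching events to the analysis callback.
def on_event(channel, method, properties, body):
    print(method.routing_key, body.decode())

ch.basic_consume(queue=q, on_message_callback=on_event, auto_ack=True)
ch.start_consuming()  # runs until interrupted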
22. Workflow clustering
— Features collected for each workflow run
— Successful jobs
— Failed jobs
— Success duration
— Fail duration
— Offline clustering on historical data
— Algorithm: k-means
— Online analysis classifies workflows according to nearest cluster (sketch below)
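A minimal sketch of this offline-cluster / online-classify loop, assuming scikit-learn; the feature values and the choice of k are made up for illustration:

import numpy as np
from sklearn.cluster import KMeans

# One row per historical workflow run:
# [successful jobs, failed jobs, success duration (s), fail duration (s)]
history = np.array([
    [900.0, 5.0, 3600.0, 40.0],
    [880.0, 20.0, 3500.0, 200.0],
    [100.0, 400.0, 700.0, 5000.0],  # a failure-heavy run
    [850.0, 10.0, 3400.0, 90.0],
])

# Offline: cluster the historical runs.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(history)

# Online: assign a new (possibly still running) workflow to the nearest cluster.
new_run = np.array([[120.0, 380.0, 800.0, 4800.0]])
print("cluster:", model.predict(new_run)[0])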
23. “High Failure” Workflows (HFW)
— The workflow engine keeps retrying workflows until they complete or time out
— But in the experimental logs, workflows are never marked as “failed”
— Aside: this is fixed in the newest version
— Therefore, we use a simple heuristic for identifying workflows as problematic:
— HFW means: > 50% of jobs failed
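The heuristic itself is a one-liner; this sketch just spells it out:

def is_high_failure(failed_jobs: int, total_jobs: int) -> bool:
    """HFW heuristic: more than 50% of the workflow's jobs failed."""
    return total_jobs > 0 and failed_jobs / total_jobs > 0.5

print(is_high_failure(512, 905))  # True (512/905 is about 0.57)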
24. HFW failure patterns
[Figure: Montage application. X-axis is normalized workflow execution time. Y-axis shows the percent of total job failures for this workflow, so far. Legend shows, for each workflow, jobs failed/jobs total.]
25. More HFW Failure Patterns
[Figure panels: Epigenome, Broadband, Montage, CyberShake]
27. Online classification
[Figure: workflow classification over time. X-axis: Lifetime % (0-100). Y-axis: Class (1-4), with annotations marking a class that doesn’t converge and the high-failure workflow class. Legend (as on slide 24, jobs failed/jobs total per workflow): 21:512/905, 24:28/29, 25:28/29, 27:4/4, 33:28/30, 41:64/89.]
28. Anomaly detection
[Figure: Montage application. X-axis: total number of failures. Y-axis: cumulative percent, i.e. the proportion of time-windows experiencing that number of failures or less. One workflow is marked “Anomalous!” (see slide 24). Legend (jobs failed/jobs total): 46:281/496, 48:62/65, 49:44/73, 50:36/65, 51:22/37, 52:38/51, 53:42/57, 54:32/48.]
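A minimal sketch of the curve behind this plot, assuming a per-time-window failure count is already available for one workflow; the windowing and the flagging threshold are assumptions, not necessarily the paper's:

import numpy as np

# Failures counted in each time window of one workflow run (made-up data).
window_failures = np.array([0, 1, 0, 3, 2, 0, 1, 14, 2, 1])

# Empirical CDF: proportion of time-windows with <= x failures.
xs = np.arange(window_failures.max() + 1)
cdf = np.array([(window_failures <= x).mean() for x in xs])
print(dict(zip(xs.tolist(), cdf.round(2).tolist())))

# One plausible flagging rule: windows far out in the tail of the
# historical distribution are anomalous.
threshold = np.quantile(window_failures, 0.95)
print("anomalous windows:", window_failures[window_failures > threshold])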
29. System performance
[Figure: one panel per application (Broadband, CyberShake, Epigenome, LIGO, Montage, Periodograms). Bars show the rate for each type of query, in median queries per minute (log10 scale); dashed black lines are the median arrival rate for the application. Query types: 01-JobsTot, 02-JobsState, 03-JobsType, 04-JobsHost, 05-TimeTot, 06-TimeState, 07-TimeType, 08-TimeHost, 09-JobDelay, 10-WfSumm, 11-HostSumm.]
30. Summary
— Real-time failure prediction for scientific workflows is a challenging but important task
— Unsupervised learning can be used to model high-level workflow failures from historical data
— High failure classes of workflows can be predicted in real-time with high accuracy
— Future directions
— Analysis: root-cause investigation
— System: notifications and updates
— Working with data from other workflow systems
31. Thank you!
For more information, visit the Stampede wiki at:
https://confluence.pegasus.isi.edu/display/stampede/
32. Extra slides..
34. Pegasus
— Maps from abstract to concrete workflow
— Algorithmic and AI-based techniques
— Automatically locates physical locations for both workflow components and data
— Finds appropriate resources to execute
— Reuses existing data products where applicable
— Publishes newly derived data products
— Provides provenance information
35. NetLogger
— Logging Methodology
— Timestamped, named messages at the start and end of significant events, with additional identifiers and metadata in a std. line-oriented ASCII format (Best Practices or BP)
— APIs are provided, incl. in-memory log aggregation for high-frequency events; but message generation is often best done within an existing framework
— Logging and Analysis Tools
— Parse many existing formats to BP
— Load BP into message bus, MySQL, MongoDB, etc.
— Generate profiles, graphs, and CSV from BP data
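Since BP lines are space-separated name=value pairs, a minimal parser sketch is shown below; it deliberately ignores quoting, which real BP values may use for embedded spaces:

def parse_bp(line: str) -> dict:
    """Parse a simple NetLogger BP line into a dict (no quoting support)."""
    return dict(pair.split("=", 1) for pair in line.split())

event = parse_bp("ts=2011-10-27T10:15:00.000000Z event=stampede.xwf.start restart_count=0")
print(event["event"], event["restart_count"])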