Daniel Garijo is working on creating abstractions in scientific workflows through workflow traces, templates, provenance and plan representations. He has developed algorithms to automatically find workflow abstractions and link them to provenance traces. His past work includes developing OPMW for provenance publishing and a motif catalog of recurring workflow patterns. His next steps are finishing the implementation of macro abstraction detection and publishing evaluation results.
Software engineering methodologies also work for Ontology engineering. This presentation from Bio-Ontologies 2012 describes how we are using Jenkins CI in GO and other ontologies.
The document discusses semantic web services and proposes approaches to help describe, discover, compose and mediate between services in a semantic way. It presents technologies like OWL-S, WSMO that model service semantics and proposes semantic templates to describe service aspects like inputs, outputs and properties in a declarative way. It also discusses facilitating service discovery, composition and mediation through semantic annotations and using techniques like faceted search and semantic association querying. The document argues that taking a semantic approach helps address challenges in service interoperability, discovery and mediation.
This document discusses TAO APIs for integrating standalone items into computer-based assessment platforms. It provides an overview of the available APIs, including client-side and server-side APIs, and how they can be used for item I/O, backend setup, event logging, item state, and more. It also describes how to run a standalone item and which features are needed to integrate it with a CBA platform using the Item Runtime and Workflow Runtime APIs. The document encourages contributions to the TAO APIs by discussing support resources on the TAO Forge.
This document provides an overview and agenda for a presentation on using the Splunk Java SDK. The presenter is introduced as a developer evangelist for Splunk. The agenda includes an overview of the Splunk platform, REST API and SDKs, a detailed look at the Java SDK, examples of using the SDK to log events, search data, and work with saved searches, and a discussion of ways to think outside the box when using Splunk with Java.
S4 is a distributed stream computing platform that allows programmers to easily implement applications for processing continuous unbounded streams of data in real-time. It uses an actor-based programming model and is designed to be fault-tolerant, scalable, and pluggable. S4 was originally developed at Yahoo! Labs to enable personalized search ads by modeling users' click behaviors in real-time from streams of user activity data. It aims to maximize revenue and user experience by controlling ad ranking, pricing, filtering, and placement based on personalized models of users' intent.
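The keyed processing-element model described above can be illustrated with a minimal sketch. This is plain Python, not the actual S4 API; the class and method names are invented for illustration:

```python
class CounterPE:
    """A processing element (PE): one instance per key, holding local state."""
    def __init__(self, key):
        self.key = key
        self.count = 0

    def process(self, event):
        self.count += 1  # e.g., count click events per user in real time

class Stream:
    """Routes each event to the PE instance for its key (actor-style dispatch)."""
    def __init__(self, pe_class):
        self.pe_class = pe_class
        self.pes = {}

    def emit(self, key, event):
        if key not in self.pes:
            self.pes[key] = self.pe_class(key)  # PEs are created lazily, one per key
        self.pes[key].process(event)

clicks = Stream(CounterPE)
for user, ad in [("u1", "a"), ("u2", "b"), ("u1", "c")]:
    clicks.emit(user, {"ad": ad})
```

Here `u1` ends with a count of 2 and `u2` with 1; a real S4 deployment additionally distributes PEs across nodes and handles fault tolerance and scaling.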
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk... (Cloudera, Inc.)
This session will answer frequently asked questions about Hadoop, and share proven ways you can overcome challenges in deploying, managing, and tuning Hadoop environments. The discussion topics will include Hadoop operations, configuration management, upgrades and lifecycle management, monitoring and managing power and heat, and Hadoop performance tuning, testing, and optimization. The presenters will also discuss how rapid Hadoop deployment makes life easier for administrators, and talk about Crowbar, an open source Operations Framework.
Salesforce.com uses Hadoop to analyze large amounts of customer data generated from over 130,000 customers and 800 million daily transactions to track product usage and customer behavior. Some key use cases discussed include analyzing product metrics to understand feature adoption, examining user behavior to improve products, and enabling collaborative filtering recommendations. The document outlines Salesforce.com's Hadoop ecosystem and data pipelines used to collect, process, and visualize insights from petabytes of customer data.
Spring Framework provides a comprehensive infrastructure to develop Java applications. It handles dependency injection and inversion of control so developers can focus on domain logic rather than plumbing. Spring promotes best practices like clean code, test-driven development, and design patterns. It includes aspects for cross-cutting concerns and supports web development through frameworks like Spring MVC. The document introduces Spring's lightweight and modular IoC container and AOP capabilities for separating concerns.
Online performance modeling and analysis of message-passing parallel applicat... (MOCA Platform)
Although hardware is improving at an incredible rate, advances in parallel software have been hampered for many reasons, and developing an efficient parallel application is still not an easy task. Our thesis is that many performance problems and their causes can be quickly located and explained with automated techniques that work on unmodified parallel applications. This work identifies the main obstacles to such diagnosis and presents a two-step approach for addressing them, in which the application is automatically modeled and diagnosed during its execution.
First, we introduce an online performance modeling technique that enables automated discovery of causal execution flows through communication and computational activities in message-passing parallel programs. Second, we present a systematic approach to online performance analysis. The automated analysis uses the online model to quickly identify the most important performance problems and correlate them with application source code. Our technique is able to discover causal dependencies between the problems, infer their root causes in some scenarios, and explain them to developers. In this work we focus on diagnosing scientific MPI parallel applications and their communication and computational problems, although the approach can be extended to support other classes of activities and programming models.
We have evaluated our approach on a variety of scientific parallel applications. In all scenarios, our online performance modeling technique proved effective for low-overhead capture of a program's behavior and facilitated performance understanding. With our automated, model-based performance analysis approach, we were able to identify the most severe performance problems during application execution and locate their root causes without prior knowledge of application internals.
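The idea of recovering causal execution flows from timestamped activities can be sketched with a toy model. This is an illustration of longest-path (critical-path) reasoning over causal dependencies, not the thesis's actual algorithm; the event format is invented:

```python
# Toy causal model: each activity has a duration and the activities it
# causally depends on (e.g., a receive depends on the matching send).
events = {
    "compute_rank0": {"dur": 5.0, "deps": []},
    "send_0_to_1":   {"dur": 0.5, "deps": ["compute_rank0"]},
    "recv_on_1":     {"dur": 0.5, "deps": ["send_0_to_1"]},
    "compute_rank1": {"dur": 3.0, "deps": ["recv_on_1"]},
}

def finish_time(name, memo={}):
    """Earliest finish of an activity = its duration plus the latest finish
    among its causal dependencies (i.e., the critical path up to it)."""
    if name not in memo:
        e = events[name]
        memo[name] = e["dur"] + max((finish_time(d, memo) for d in e["deps"]),
                                    default=0.0)
    return memo[name]

# The activity with the largest finish time ends the critical path;
# problems along that path are the ones worth reporting first.
critical = max(events, key=finish_time)
```

With the numbers above, `compute_rank1` finishes at 9.0 and terminates the critical path, so the communication chain feeding it would be the first candidate for diagnosis.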
Chisimba - introduction to practical demo (Derek Keats)
This document provides an overview of the Chisimba open-source eLearning platform:
- Chisimba is built on free and open-source software (FOSS) applications and uses FOSS development and collaboration tools.
- It has a modular architecture that allows new features and functionality to be added through plugins and modules.
- The system is highly configurable and customizable, allowing it to be used for different domains like eLearning, content management, and more.
SharePoint is a set of integrated technologies that provides a platform for organizations to build a flexible, long-term information and knowledge management infrastructure. SharePoint has gained over 250 million users in a few years and continues growing as customers apply it to more business processes. SharePoint is also increasingly becoming a compelling development platform with advanced capabilities in the 2010 release.
The SENSORIA Development Environment is a CASE tool for service-oriented architecture (SOA) development from the SENSORIA EU FP6 project. It has 19 partners from 7 countries over 4 years with 4 million Euro funding. The tool provides an integrated platform for SOA development tools, allowing tools to be discovered, installed, composed, and orchestrated as services. The environment is based on Eclipse and OSGi services. It addresses challenges in SOA such as service specification, composition correctness, and continuous operation in changing environments.
R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies, allowing clients to analyze data in new ways and exposing new insights for the business. Join us as Jeffrey Breen explains the core technology concepts and illustrates how to utilize R and Revolution Analytics' RevoR in Hadoop environments.
Boris Lublinsky and Alexey Yakubovich give us an overview of using Oozie. This presentation was given on December 13th, 2012 at the Nokia offices in Chicago, IL.
View the HD video of this talk here: http://vimeo.com/chug/oozie-overview
The document introduces Activiti, an open source BPMN 2.0 workflow and Business Process Management (BPM) platform. Activiti provides a BPM engine that executes process definitions expressed in BPMN's boxes-and-arrows notation, and persists execution state while processes are waiting. The document outlines how Activiti supports both business users, through visual process modeling, and developers, by embedding workflow capabilities into Java applications.
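The engine behavior described above (run until a wait state, persist, resume on a signal) can be sketched in a few lines. This illustrates the pattern, not Activiti's API; the process definition format is invented:

```python
import json

# A process definition: "auto" tasks run immediately; "wait" tasks suspend
# the process until it is signalled (e.g., a human completes a form).
definition = [("auto", "create order"),
              ("wait", "manager approval"),
              ("auto", "ship order")]

def run(state):
    """Advance the process until it completes or reaches a wait state."""
    log = []
    while state["step"] < len(definition):
        kind, name = definition[state["step"]]
        if kind == "wait" and not state.get("signalled"):
            return state, log            # suspend; state can now be persisted
        state["signalled"] = False       # a signal is consumed by one wait
        log.append(name)
        state["step"] += 1
    return state, log

state, log1 = run({"step": 0})
saved = json.dumps(state)                # persist execution state while waiting
resumed = json.loads(saved)
resumed["signalled"] = True              # the approval task is completed
state, log2 = run(resumed)
```

The first run executes "create order" and suspends; after the state round-trips through storage and is signalled, the second run executes "manager approval" and "ship order".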
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source... (Perforce)
Watch the webinar on-demand here: http://info.perforce.com/US_2013Q1_Webinar_BlackDuck_register.html
With hundreds of millions of lines of code running in most large companies and billions of lines of code available in the world of open source, a significant opportunity exists to reuse code written by others in your next application. Sounds easy, but finding the right code and evaluating it for reuse can be a daunting task - one that often causes people to start from scratch.
Join Dave Gruber and Randy DeFauw as they demonstrate how Black Duck and Perforce Version Management can help you discover, track and reuse software components from both your internal code bases and the world of open source.
Learn concrete steps for:
- Managing open source and internal components
- Consuming or forking these components in your projects
- Coming to grips with open source libraries hosted in public Git repositories
This document provides an overview of Spring fundamentals including dependency injection, the Spring IoC container, bean configuration, bean scopes, the bean lifecycle, AOP with Spring, testing with Spring, and topics beyond the Spring basics. It discusses how Spring addresses common problems developers face through its features like dependency injection and aspects.
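The core dependency-injection idea the document covers (the container wires objects from declared dependencies, instead of objects constructing their own collaborators) can be sketched minimally. This is plain Python, not Spring's actual API:

```python
class Container:
    """A tiny IoC container: beans are registered as factories that declare
    the names of the beans they depend on; the container resolves the graph."""
    def __init__(self):
        self.factories = {}
        self.singletons = {}

    def register(self, name, factory, deps=()):
        self.factories[name] = (factory, deps)

    def get(self, name):
        if name not in self.singletons:          # default "singleton" scope
            factory, deps = self.factories[name]
            self.singletons[name] = factory(*(self.get(d) for d in deps))
        return self.singletons[name]

c = Container()
c.register("repo", lambda: {"users": ["ada"]})
c.register("service", lambda repo: ("UserService", repo), deps=("repo",))
svc = c.get("service")   # the container builds repo first, then injects it
```

Spring adds much more (configuration metadata, other bean scopes, lifecycle callbacks, AOP proxies), but the inversion is the same: `service` never constructs `repo` itself.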
OCCIware: A Formal and Tooled Toolchain for Managing Everything as a Service (OCCIware)
The OCCIware project aims at building a comprehensive engineering toolchain for managing Everything as a Service (XaaS). The objective is to dramatically decrease the cost of using or providing XaaS by breaking silos between layers and domains of Cloud Computing. Leveraging the emerging Open Cloud Computing Interface (OCCI) standard, we are developing a model-driven engineering studio as well as generic runtimes adapted to various domains: Linked Open Data, cloud computing, platform as a service, Big Data, connected objects, etc. The project, funded by the French Ministry of Industry, involves 10 academic and industrial partners and is supervised by a committee of 11 top scientists and industry experts.
Single API for library services (poster) - Milan Janíček
This document introduces a single API that allows access to basic library data from different sources in simple formats like JSON. It describes a JavaScript demo application that fetches data from the API and displays it in a user-friendly way without needing to directly interact with complex library system APIs. The API acts as a middleware that standardizes access to data from systems like Aleph and SFX in order to more easily present information to users.
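The middleware pattern it describes — one simple JSON shape in front of heterogeneous backends — can be sketched as follows. The Aleph and SFX field names here are invented mock data, purely for illustration:

```python
import json

# Mock responses in two backend-specific shapes (invented field names).
aleph_record = {"z13_title": "Dune", "z30_status": "on loan"}
sfx_record = {"object_name": "Nature", "services": ["getFullTxt"]}

def normalize(source, record):
    """Map each backend's fields onto one simple, uniform record shape."""
    if source == "aleph":
        return {"title": record["z13_title"],
                "availability": record["z30_status"]}
    if source == "sfx":
        return {"title": record["object_name"],
                "availability": "online" if record["services"] else "none"}
    raise ValueError(f"unknown source: {source}")

# The client-side demo app only ever sees this uniform JSON.
payload = json.dumps([normalize("aleph", aleph_record),
                      normalize("sfx", sfx_record)])
```

The value of the middleware is exactly this mapping layer: the JavaScript demo application consumes one stable schema regardless of which library system answered.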
The document discusses open source and open standards. It defines open source and notes that common licenses include the GPL, Apache, and ECL. Open source communities typically involve small numbers of developers, with projects hosted on SourceForge. Open standards bodies include the W3C, with standards for HTML, XML, SOAP, and floating-point arithmetic. The document asks what UBC can do to support open source, such as policy statements, user groups, and funding.
Research objects aim to preserve digital science by aggregating all elements needed to understand a research investigation, including data, computational processes, and annotations. They promote reuse and verification of reproducibility. The anatomy of a research object includes resources like datasets and workflows that are described and related using semantic technologies. Tools are being developed to work with research objects, and standards like the Open Annotation Data Model and PROV are being used to represent their evolution over time.
Stefane Fermigier is the Chairman and Founder of Nuxeo, an open source ECM software company established in 2000. Nuxeo EP 5.2 is a full-featured software platform for ECM that provides many new features such as content annotations, content preview, and a visible content store. Nuxeo has many customers, including media companies, and partners, some of whom, such as AFP, were featured in case studies.
EclipseCon Europe 2012 SOA - Models As Operational Documentation (Marc Dutoo)
At EclipseCon Europe 2012, in the SOA Symposium track, JWT's EMF model export for structuring information in Document Management Systems is explained and demonstrated in the case of the EasySOA service documentation registry, with JWT workflows providing a basis for SOA operational documentation.
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles (dgarijo)
This document describes FOOPS, an ontology validation service that checks ontologies for adherence to the FAIR principles. FOOPS tests ontologies against criteria related to findability, accessibility, interoperability, and reusability. It provides explanations for test failures to help users improve their ontologies. FOOPS validation results include an overall FAIRness score and coverage of FAIR categories to assess ontology quality, though there is no single threshold for what makes an ontology fully FAIR. The document demonstrates FOOPS and lists the types of tests it supports under each FAIR category. It invites feedback to help further improve FOOPS.
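The scoring approach described (run per-category checks, report failures, aggregate a score) can be sketched generically. These checks and weights are illustrative stand-ins, not FOOPS's actual tests:

```python
# Each check inspects ontology metadata and returns True (pass) or False (fail).
checks = {
    "findability":      [lambda o: "uri" in o, lambda o: "title" in o],
    "accessibility":    [lambda o: o.get("resolvable", False)],
    "interoperability": [lambda o: o.get("format") in {"ttl", "rdfxml", "jsonld"}],
    "reusability":      [lambda o: "license" in o],
}

def fairness(ontology):
    """Run all checks and return an overall score plus per-category results."""
    results = {cat: [c(ontology) for c in cs] for cat, cs in checks.items()}
    passed = sum(sum(r) for r in results.values())
    total = sum(len(r) for r in results.values())
    return round(100 * passed / total), results

score, results = fairness({"uri": "https://w3id.org/example",
                           "title": "Example", "format": "ttl"})
```

This toy ontology scores 60: it passes findability and interoperability but fails the accessibility and reusability checks, and the per-category breakdown pinpoints what to fix (here, resolvability and a license).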
FAIR Workflows: A step closer to the Scientific Paper of the Future (dgarijo)
Keynote presented at the Computational and Autonomous Workflows workshop (CAW-2021) at the Oak Ridge National Laboratory. The keynote gives an overview of the different aspects to take into account when aiming to create FAIR workflows and associated resources.
An increasing number of researchers rely on computational methods to generate the results described in their publications. The research software created to this end is heterogeneous (e.g., scripts, libraries, packages, notebooks) and usually difficult to find, reuse, compare and understand due to its disconnected documentation (dispersed across manuals, README files, web sites, and code comments) and a lack of structured metadata to describe it. In this talk I will describe the main challenges in finding, comparing and reusing research software; how structured metadata can help address some of them; the best practices being proposed by the community; and current initiatives to aid their adoption by researchers within EOSC.
Impact: The talk addresses an important aspect of the EOSC infrastructure for quality research software by ensuring that software contributed to the EOSC ecosystem can be found, compared and reused by researchers. The talk also aims to address metadata quality of current research products, which is critical for successful adoption.
Presented at the EOSC symposium
SOMEF: a metadata extraction framework from software documentation (dgarijo)
Presentation given at the council of software registries in March 2021. SOMEF is a Python package for automatically extracting over 25 metadata categories from a README file. The output is exported as JSON, or as JSON-LD using the CodeMeta representation.
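The kind of extraction SOMEF performs can be illustrated with a much-simplified sketch: regex heuristics over a README string. The categories and patterns here are illustrative, not SOMEF's actual rules (which also use supervised classifiers):

```python
import re

readme = """# widget-lib
A small library for widgets.

## Installation
pip install widget-lib

## License
MIT
"""

def extract(text):
    """Pull a few metadata categories out of a README with simple heuristics."""
    meta = {}
    m = re.search(r"^# (.+)$", text, re.M)          # top-level heading -> name
    if m:
        meta["name"] = m.group(1).strip()
    for category in ("Installation", "License"):    # section body -> category
        m = re.search(rf"## {category}\n(.+?)(?=\n## |\Z)", text, re.S)
        if m:
            meta[category.lower()] = m.group(1).strip()
    return meta

meta = extract(readme)
```

A real extractor must also cope with badge lines, varied heading names ("Setup", "Install"), and prose sections, which is where SOMEF's trained classifiers come in.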
A Template-Based Approach for Annotating Long-Tailed Datasets (dgarijo)
An increasing amount of data is shared on the Web through heterogeneous spreadsheets and CSV files. In order to homogenize and query these data, the scientific community has developed Extract, Transform and Load (ETL) tools and services that help make these files machine-readable in Knowledge Graphs (KGs). However, tabular data may be complex, and the level of expertise required by existing ETL tools makes it difficult for users to describe their own data. In this paper we propose a simple annotation schema to guide users when transforming complex tables into KGs. We have implemented our approach by extending T2WML, a table annotation tool designed to help users annotate their data and upload the results to a public KG. We have evaluated our effort with six non-expert users, obtaining promising preliminary results.
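The annotation-schema idea can be sketched as follows: the user marks which column holds the subject and which columns hold property values, and one generic routine turns every row into triples. This is a simplified illustration, not T2WML's actual schema:

```python
# A spreadsheet flattened to rows, plus a user-written annotation template.
rows = [
    {"country": "Peru",  "year": 2020, "population": 32970000},
    {"country": "Chile", "year": 2020, "population": 19116000},
]
template = {"subject": "country", "properties": ["year", "population"]}

def to_triples(rows, template):
    """Apply one annotation template to every row, yielding (s, p, o) triples."""
    triples = []
    for row in rows:
        subject = row[template["subject"]]
        for prop in template["properties"]:
            triples.append((subject, prop, row[prop]))
    return triples

triples = to_triples(rows, template)
```

The point of the template is that the user states the table's structure once, declaratively, instead of writing per-dataset transformation code; the same routine then handles any table that fits the schema.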
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs (dgarijo)
In this presentation we describe the Ontology-Based APIs framework (OBA), our approach to automatically create REST APIs from ontologies while following RESTful API best practices. Given an ontology (or ontology network), OBA uses standard technologies familiar to web developers (OpenAPI Specification, JSON) and combines them with W3C standards (OWL, JSON-LD frames and SPARQL) to create maintainable APIs with documentation, unit tests, automated validation of resources and clients (in Python, JavaScript, etc.) that let non-Semantic Web experts access the contents of a target knowledge graph. We showcase OBA with three examples that illustrate the capabilities of the framework for different ontologies.
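The class-to-path mapping at the heart of this approach can be sketched as follows. The pluralization rule and the generated path/summary strings are illustrative assumptions, not OBA's actual output format.

```python
def paths_for_class(class_name):
    """Map one ontology class to RESTful list/detail paths (illustrative)."""
    resource = class_name.lower() + "s"          # e.g. Region -> /regions
    return {
        f"/{resource}": {
            "get": {"summary": f"List all {class_name} resources"}},
        f"/{resource}/{{id}}": {
            "get": {"summary": f"Get a {class_name} by id"}},
    }

def build_openapi(classes):
    """Assemble an OpenAPI 3 skeleton covering every class in the ontology."""
    spec = {"openapi": "3.0.0",
            "info": {"title": "Generated API", "version": "1.0.0"},
            "paths": {}}
    for c in classes:
        spec["paths"].update(paths_for_class(c))
    return spec
```

A real generator would also emit component schemas derived from class properties and wire each operation to a SPARQL query over the target knowledge graph.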
Towards Knowledge Graphs of Reusable Research Software Metadata (dgarijo)
Research software is a key asset for understanding, reusing and reproducing results in computational sciences. An increasing amount of software is stored in code repositories, which usually contain human readable instructions indicating how to use it and set it up. However, developers and researchers often need to spend a significant amount of time to understand how to invoke a software component, prepare data in the required format, and use it in combination with other software. In addition, this time investment makes it challenging to discover and compare software with similar functionality. In this talk I will describe our efforts to address these issues by creating and using Open Knowledge Graphs that describe research software in a machine readable manner. Our work includes: 1) an ontology that extends schema.org and codemeta, designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; 3) a framework for automatically extracting metadata from software repositories; and 4) a framework to curate, query, explore and compare research software metadata in a collaborative manner. The talk will illustrate our approach with real-world examples, including a domain application for inspecting and discovering hydrology, agriculture, and economic software models; and the results of our framework when enriching the research software entries in Zenodo.org.
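A tiny example shows why machine-readable records enable comparison. The record shape and property names below are illustrative, loosely in the spirit of schema.org/CodeMeta; they are not the exact terms of the ontology described in the talk.

```python
# Hypothetical machine-readable software records (property names illustrative).
software = [
    {"@type": "SoftwareSourceCode", "name": "flood-model",
     "inputFormat": ["text/csv"], "programmingLanguage": "Python"},
    {"@type": "SoftwareSourceCode", "name": "crop-model",
     "inputFormat": ["text/csv", "application/json"],
     "programmingLanguage": "R"},
]

def find_by_input_format(records, fmt):
    """A discovery/comparison query: which tools accept a given data format?"""
    return [r["name"] for r in records if fmt in r.get("inputFormat", [])]
```

With free-text READMEs, answering "which models accept CSV?" requires reading each repository; with structured metadata it is a one-line query (or, over a real knowledge graph, a SPARQL query).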
Scientific Software Registry Collaboration Workshop: From Software Metadata r... (dgarijo)
In this talk I briefly describe our work on OntoSoft for easy software metadata representation, and how new requirements for software reusability are making us move towards knowledge graphs of scientific software metadata.
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data (dgarijo)
Today, data about any domain can be found on the web in data repositories, web APIs and many millions of spreadsheets and CSV files. Researchers and organizations make these data available in a myriad of formats, layouts, terminology and cleanliness that make it difficult to integrate together. As a result, researchers aiming to use data in their analyses face three main challenges. The first one is finding datasets related to a feature, variable or topic of interest. For example, climate scientists need to look for years of observational data from authoritative sources when estimating the climate of a region. The second challenge is completing a given dataset with existing knowledge: machine learning applications are data hungry and require as many data points and features as possible to improve their predictions, which often requires integrating data from different sources. The third challenge is sharing integrated results: once several datasets have been merged together, how to make them available to the rest of the community?
OKG-Soft: An Open Knowledge Graph With Machine Readable Scientific Software M... (dgarijo)
Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain human readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this presentation we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata.
Towards Human-Guided Machine Learning - IUI 2019 (dgarijo)
Automated Machine Learning (AutoML) systems are emerging that automatically search for possible solutions from a large space of possible kinds of models. Although fully automated machine learning is appropriate for many applications, users often have knowledge that supplements and constrains the available data and solutions. This paper proposes human-guided machine learning (HGML) as a hybrid approach where a user interacts with an AutoML system and tasks it to explore different problem settings that reflect the user's knowledge about the data available. We present: 1) a task analysis of HGML that shows the tasks that a user would want to carry out, 2) a characterization of two scientific publications, one in neuroscience and one in political science, in terms of how the authors would search for solutions using an AutoML system, 3) requirements for HGML based on those characterizations, and 4) an assessment of existing AutoML systems in terms of those requirements.
Capturing Context in Scientific Experiments: Towards Computer-Driven Science (dgarijo)
Scientists publish computational experiments in ways that do not facilitate reproducibility or reuse. Significant domain expertise, time and effort are required to understand scientific experiments and their research outputs. In order to improve this situation, mechanisms are needed to capture the exact details and the context of computational experiments. Only then will intelligent systems be able to help researchers understand, discover, link and reuse the products of existing research.
In this presentation I will introduce my work and vision towards enabling scientists to share, link, curate and reuse their computational experiments and results. In the first part of the talk, I will present my work for capturing and sharing the context of scientific experiments by using scientific workflows and machine readable representations. Thanks to this approach, experiment results are described in an unambiguous manner, have a clear trace of their creation process and include a pointer to the sources used for their generation. In the second part of the talk, I will present examples of how the context of scientific experiments may be exploited to browse, explore and inspect research results. I will end the talk by presenting new ideas for improving and benefiting from the capture of context of scientific experiments, and for involving scientists in the process of curating and creating abstractions on available research metadata.
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met... (dgarijo)
In traditional approaches to ontology development, there is a long lapse between the moment a user finds a need to extend the ontology and the moment it actually gets extended. For scientists, this delay can be weeks or months and can be a significant barrier for adoption. We present a new approach to ontology development and data annotation enabling users to add new metadata properties on the fly as they describe their datasets, creating terms that can be immediately adopted by others and eventually become standardized. This approach combines a traditional, consensus-based approach to ontology development with a crowdsourced approach where expert users (the crowd) can dynamically add terms as needed to support their work. We have implemented this approach as a socio-technical system that includes: 1) a crowdsourcing platform to support metadata annotation and addition of new terms, 2) a range of social editorial processes to make standardization decisions for those new terms, and 3) a framework for ontology revision and updates to the metadata created with the previous version of the ontology. We present a prototype implementation for the paleoclimate community, the Linked Earth Framework, currently containing 700 datasets and engaging over 50 active contributors. Users exploit the platform to do science while extending the metadata vocabulary, thereby producing useful and practical metadata.
WIDOCO: A Wizard for Documenting Ontologies (dgarijo)
WIDOCO is a WIzard for DOCumenting Ontologies that guides users through the documentation process of their vocabularies. Given an RDF vocabulary, WIDOCO detects missing vocabulary metadata and creates a documentation with diagrams, human-readable descriptions of the ontology terms and a summary of changes with respect to previous versions of the ontology. The documentation consists of a set of linked, enriched HTML pages that can be further extended by end users. WIDOCO is open source and builds on well-established Semantic Web tools. So far, it has been used to document more than one hundred ontologies in different domains.
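The missing-metadata check can be illustrated with a few lines. The field list below is a plausible subset chosen for illustration; WIDOCO's actual checklist of ontology metadata properties is longer.

```python
# Illustrative subset of metadata fields a documentation wizard might require.
RECOMMENDED = ["title", "description", "license", "versionInfo", "creator"]

def missing_metadata(ontology_annotations):
    """Return the recommended fields that are absent or empty."""
    return [f for f in RECOMMENDED if not ontology_annotations.get(f)]
```

In WIDOCO, fields like these are read from the ontology's annotation properties, and the gaps drive the warnings shown to the user before the HTML documentation is generated.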
We propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) a record of events that describes a new result through a workflow and/or the provenance of all the computations executed; 2) persistent entries for the key entities involved: data, software versions, and workflows; 3) a set of narrative accounts that are automatically generated, human-consumable renderings of the record and entities and can be included in a paper. Different narrative accounts can be used for different audiences, with content and details based on the level of interest or expertise of the reader. Data narratives can make science more transparent and reproducible, because they ensure that the text description of the computational experiment reflects with high fidelity what was actually done. Data narratives can be incorporated in papers, either in the methods section or as supplementary materials. We introduce DANA, a prototype that illustrates how to generate data narratives automatically, and describe the information it uses from the computational records. We also present a formative evaluation of our approach and discuss potential uses of automated data narratives.
Automated Hypothesis Testing with Large Scale Scientific Workflows (dgarijo)
(Credit to Varun Ratnakar and Yolanda Gil).
The automation of important aspects of scientific data analysis would significantly accelerate the pace of science and innovation. Although important aspects of data analysis can be automated, the hypothesize-test-evaluate discovery cycle is largely carried out by hand by researchers. This introduces a significant human bottleneck, which is inefficient and can lead to erroneous and incomplete explorations. We introduce a novel approach to automate the hypothesize-test-evaluate discovery cycle with an intelligent system that a scientist can task to test hypotheses of interest in a data repository. Our approach captures three types of data analytics knowledge: 1) common data analytic methods represented as semantic workflows; 2) meta-analysis methods that aggregate those results, represented as meta-workflows; and 3) data analysis strategies that specify for a type of hypothesis what data and methods to use, represented as lines of inquiry. Given a hypothesis specified by a scientist, appropriate lines of inquiry are triggered, which lead to retrieving relevant datasets, running relevant workflows on that data, and finally running meta-workflows on workflow results. The scientist is then presented with a level of confidence on the initial hypothesis (or a revised hypothesis) based on the data and methods applied. We have implemented this approach in the DISK system, and applied it to multi-omics data analysis.
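The triggering logic of a "line of inquiry" can be sketched in data-structure form. The names and fields below are hypothetical illustrations; DISK's actual representations are semantic workflows, meta-workflows and data queries over a repository.

```python
# Hypothetical encoding of a line of inquiry: for a type of hypothesis, which
# data to retrieve, which workflows to run, and which meta-workflow aggregates
# the results.
LINES_OF_INQUIRY = {
    "protein-association": {
        "data_query": "samples with proteomics data",
        "workflows": ["proteomics-analysis"],
        "meta_workflow": "aggregate-pvalues",
    },
}

def trigger(hypothesis_type):
    """Return the execution plan a matching line of inquiry would produce."""
    loi = LINES_OF_INQUIRY.get(hypothesis_type)
    if loi is None:
        return None
    # In the real system this step retrieves the datasets, runs the workflows
    # on them, then runs the meta-workflow over the workflow results to yield
    # a confidence estimate for the hypothesis.
    return {"datasets": loi["data_query"], "run": loi["workflows"],
            "then": loi["meta_workflow"]}
```

The key point the abstract makes is that this knowledge is captured declaratively, so new hypotheses trigger analyses without a human assembling the pipeline by hand.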
OntoSoft: A Distributed Semantic Registry for Scientific Software (dgarijo)
Credit to Yolanda Gil.
OntoSoft is a distributed semantic registry for scientific software. This paper describes three major novel contributions of OntoSoft: 1) a software metadata registry designed for scientists, 2) a distributed approach to software registries that targets communities of interest, and 3) metadata crowdsourcing through access control. Software metadata is organized using the OntoSoft ontology along six dimensions that matter to scientists: identify software, understand and assess software, execute software, get support for the software, do research with the software, and update the software. OntoSoft is a distributed registry where each site is owned and maintained by a community of interest, with a distributed semantic query capability that allows users to search across all sites. The registry has metadata crowdsourcing capabilities, supported through access control so that software authors can allow others to expand on specific metadata properties.
OEG tools for supporting Ontology Engineering (dgarijo)
The document summarizes several tools developed by the Ontology Engineering Group (OEG) to support ontology engineering, including Vocabularium for serving ontologies online, OnToology for evaluation reports, documentation and publishing ontologies, AR2DTool for ontology diagrams, Widoco for HTML documentation, and OOPS! for ontology quality evaluations. It provides an overview of the capabilities of each tool and URLs for their websites and GitHub repositories.
Software Metadata: Describing "dark software" in GeoSciences (dgarijo)
This document discusses describing "dark software" or unshared scientific software in geosciences. It proposes using the OntoSoft ontology to capture standardized metadata about scientific software. This would allow software to be more discoverable, reusable and reproducible. The document outlines the types of metadata captured by OntoSoft and demonstrates how it can be used to describe software and facilitate search and comparison of different tools.
Reproducibility Using Semantics: An Overview (dgarijo)
Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Furthermore, Research Objects are introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This talk was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we show how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
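The serving-side step can be illustrated without either system: given vectors an ETL job has extracted, a query is answered by similarity ranking. This brute-force cosine search is only the concept; Milvus does this at scale with approximate-nearest-neighbour indexes, and the data here is made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index, query, top_k=1):
    """index: list of (doc_id, vector). Return ids ranked by similarity."""
    ranked = sorted(index, key=lambda item: cosine(item[1], query),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

In the pipeline the talk describes, Spark computes the embeddings in bulk and writes them to Milvus, which replaces this linear scan with an indexed lookup.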
Monitoring and Managing Anomaly Detection on OpenShift (Tosin Akinosho)
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
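A minimal notebook-style detector of the kind item 12 alludes to might look like the following. The z-score rule and the threshold are illustrative assumptions; the tutorial's actual models are not reproduced here.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

On an edge device, a loop like this would consume sensor readings from Kafka and push flagged values back out as alerts, with Prometheus scraping counters of how many anomalies were seen.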
The Microsoft 365 Migration Tutorial For Beginners (operationspcvita)
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, outlines common Office 365 migration scenarios, and explains how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3 (Data Hops)
Free, downloadable and printable A4 posters on cyber security and social engineering safety. Promote security awareness in the home or workplace. From training provider datahops.com.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Digital Marketing Trends in 2024 | Guide for Staying Ahead (Wask)
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite optimization efforts that go as far as sacrificing core functionality, state-of-the-art hashtable designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
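The closed-addressing property the abstract highlights, that a delete frees its slot immediately rather than leaving a tombstone, can be shown with a single-threaded sketch. This is only the addressing scheme in miniature; DLHT's actual design is lock-free, cache-line-aware and non-blocking on resize, none of which is reproduced here.

```python
class ChainedHashTable:
    """Single-threaded sketch of closed addressing with per-bucket chaining."""

    def __init__(self, n_buckets=16):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)   # update in place
                return
        b.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        b = self._bucket(key)
        for i, (k, _) in enumerate(b):
            if k == key:
                del b[i]              # slot reclaimed instantly, no tombstone
                return True
        return False
```

In open addressing, the deleted slot would have to keep a tombstone (or the table would block to repair probe sequences); with chaining, the entry simply disappears from its bucket.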
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Leveraging the Graph for Clinical Trials and Standards
Status update OEG - Nov 2012
1. Work at ISI, Current Status, Next Steps
Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Date: 29/11/2012
2. What am I working on?
• Creation of abstractions in scientific workflows
• Workflow traces and template representation
• Provenance representation
• Plan representation
• Abstraction catalog
• Find ways to link the definitions to the provenance traces automatically
• Understandability and reuse of scientific workflows
3. Outline
1. Motivation
2. Overview
3. Workflow systems used
4. Summary of work done in my previous visit to ISI
• OPMW and provenance publishing
5. Summary of work done before my second visit to ISI
• Workflow motif catalog
6. Summary of work done in my second visit to ISI
• OPMW-PROV and P-PLAN
• Automatic macro abstraction detection
7. Next Steps
8. Future work
4. Motivation
• As a designer: Discovery
• Workflows with similar functionality, fragments/methods
• Design based on previous templates
• As a user/reuser: Understandability
• Search workflows by functionality
• Commonalities between execution runs
• Component categorization
5. Overview
[Layered diagram: abstraction definitions and categorization (descriptions, PSMs/ontologies); algorithms for finding the different abstractions automatically (data mining tools, graph analysis, etc.); experiment publication (RDF stores); provenance representation and plan representation (vocabularies)]
6. Taverna and Wings
http://www.taverna.org.uk/
http://www.wings-workflows.org/
IEEE eScience 2012. Chicago, USA
7. Summary: Previous Work at ISI
[Figure: the layered overview, instantiated. Abstraction definitions and categorization; algorithms for finding the different abstractions automatically; experiment publication via Virtuoso, Pubby and Wings (+ plugin); provenance and plan representation via OPMW.]
8. High level architecture
[Figure: WINGS deployments (local laptop, shared host, web server) each run a portal and a workflow core holding workflow templates and instances, and export them as OPM. The exported workflows are published as Linked Data, which enables programmatic access by external applications, interactive browsing through a Pubby frontend, and sharing and reuse by users and other workflow environments. Pipeline: Wings workflow generation → OPM conversion → publication → share/reuse.]
11. Work previous to second visit to ISI
[Figure: the overview, extended with motif detection at the abstraction definitions and categorization layer; algorithms for automatic matching; experiment publication via Virtuoso, Pubby and Wings (+ plugin); provenance and plan representation via OPMW.]
12. Overview
• Empirical analysis of 177 workflow templates from Taverna and Wings
• Catalog of recurring patterns: scientific workflow motifs
  • Data-oriented motifs
  • Workflow-oriented motifs
• Understandability and reuse
13. Approach
• Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence
• Identify workflow abstractions that would facilitate understandability and therefore effective reuse
14. Workflow Motifs
• Workflow motif: a domain-independent conceptual abstraction over workflow steps.
1. Data-oriented motifs (WHAT?): what kinds of data manipulation does the workflow perform?
  • E.g. data retrieval, data preparation, etc.
2. Workflow-oriented motifs (HOW?): how does the workflow perform its operations?
  • E.g. stateful steps, stateless steps, human interactions, etc.
15. Motif Catalog
Data-Oriented Motifs:
• Data Retrieval
• Data Preparation (Format Transformation, Input Augmentation, Output Splitting)
• Data Organisation
• Data Analysis
• Data Curation/Cleaning
• Data Moving
• Data Visualisation
Workflow-Oriented Motifs:
• Intra-Workflow Motifs: Stateful (Asynchronous) Invocations, Stateless (Synchronous) Invocations, Internal Macros, Human Interactions
• Inter-Workflow Motifs: Atomic Workflows, Composite Workflows, Workflow Overloading
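The two motif families in the catalog can be pictured as a small taxonomy for annotating workflow steps. A minimal sketch in Python — the enums mirror the catalog, but the step names and the annotation structure are hypothetical, not part of OPMW or any motif tooling:

```python
from enum import Enum

class DataMotif(Enum):
    """Data-oriented motifs: WHAT a step does to its data."""
    RETRIEVAL = "Data Retrieval"
    PREPARATION = "Data Preparation"
    ORGANISATION = "Data Organisation"
    ANALYSIS = "Data Analysis"
    CURATION = "Data Curation/Cleaning"
    MOVING = "Data Moving"
    VISUALISATION = "Data Visualisation"

class WorkflowMotif(Enum):
    """Workflow-oriented (intra-workflow) motifs: HOW a step is carried out."""
    STATEFUL = "Stateful (Asynchronous) Invocation"
    STATELESS = "Stateless (Synchronous) Invocation"
    INTERNAL_MACRO = "Internal Macro"
    HUMAN_INTERACTION = "Human Interaction"

# A step can carry several motifs at once (hypothetical annotations):
annotations = {
    "fetch_sequences": {DataMotif.RETRIEVAL, WorkflowMotif.STATELESS},
    "align":           {DataMotif.ANALYSIS, WorkflowMotif.STATEFUL},
    "render_plot":     {DataMotif.VISUALISATION, WorkflowMotif.STATELESS},
}

# Query the repository by functionality: which steps perform data analysis?
analysis_steps = [s for s, m in annotations.items() if DataMotif.ANALYSIS in m]
print(analysis_steps)  # → ['align']
```

Annotating steps this way is what enables the "search workflows by functionality" use case from the motivation slide.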
16. Summary: Work done at ISI
[Figure: the overview, updated. Motif detection alongside abstraction definitions and categorization on top; algorithms for automatic matching, macro abstraction detection, and SUBDUE exploration and integration in RDF in the middle; experiment publication via Virtuoso, Pubby and Wings (+ plugin); provenance and plan representation via OPMW + PROV + P-PLAN.]
17. PROV Compatibility
• OPMW fits naturally into PROV:
  • Same usage-generation structure
  • Extends PROV for scientific workflows
• Binary relationships (no n-ary patterns used): simplicity
• Publication of PROV as well as OPMW:
  • Queries can be answered in both vocabularies: flexibility
• http://www.opmw.org/node/8
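The "publish PROV as well as OPMW" idea can be sketched with plain triples: each OPMW statement is mirrored with its PROV counterpart, so the same question can be asked in either vocabulary. The namespaces are the real ones, but the property alignment and the `ex:` resources below are illustrative, not the actual OPMW mapping:

```python
# Dual-vocabulary publication sketch. All alignments are binary relations,
# matching the slide's "no n-ary patterns used" point.
OPMW = "http://www.opmw.org/ontology/"
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical one-to-one property alignment between the two vocabularies.
ALIGNMENT = {
    OPMW + "used": PROV + "used",
    OPMW + "wasGeneratedBy": PROV + "wasGeneratedBy",
}

def publish(triples):
    """Return the input triples plus their PROV mirrors."""
    mirrored = list(triples)
    for s, p, o in triples:
        if p in ALIGNMENT:
            mirrored.append((s, ALIGNMENT[p], o))
    return mirrored

graph = publish([
    ("ex:alignStep", OPMW + "used", "ex:inputFasta"),
    ("ex:outputTree", OPMW + "wasGeneratedBy", "ex:alignStep"),
])

# The same query answered in both vocabularies:
used_opmw = [s for s, p, o in graph if p == OPMW + "used"]
used_prov = [s for s, p, o in graph if p == PROV + "used"]
assert used_opmw == used_prov == ["ex:alignStep"]
```

In the real setting the mirroring would happen in the RDF store or at export time rather than in application code; the point is only that one asserted graph serves consumers of either vocabulary.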
18. P-PLAN
• Plans are not provenance
• P-PLAN: a simple plan model for binding execution traces to template representations
• Aligned with OPMW and PROV
• Documentation in progress
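The core P-PLAN idea — a plan is not provenance, but each trace activity points back to the plan step it realised — can be sketched with two tiny classes. The class names echo p-plan:Step, prov:Activity and p-plan:correspondsToStep, but the code itself is an illustrative data-structure sketch, not the P-PLAN ontology:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    """Template-level step (cf. p-plan:Step)."""
    name: str

@dataclass
class Activity:
    """Trace-level execution (cf. prov:Activity)."""
    name: str
    corresponds_to: Step  # cf. p-plan:correspondsToStep

# Hypothetical template and trace:
plan = [Step("retrieve"), Step("align"), Step("plot")]
trace = [
    Activity("retrieve_run_1", plan[0]),
    Activity("align_run_1", plan[1]),
    Activity("align_run_2", plan[1]),  # one plan step, several executions
]

# Group trace activities by the template step they realise — this is the
# binding that lets us compare commonalities between execution runs.
by_step = {}
for act in trace:
    by_step.setdefault(act.corresponds_to.name, []).append(act.name)
print(by_step["align"])  # → ['align_run_1', 'align_run_2']
```

Keeping the plan and the trace as separate levels, joined only by this correspondence link, is exactly what "plans are not provenance" asks for.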
19. Summary: Work done at ISI
[Figure: recap of the summary figure from slide 16.]
20. Macro abstraction detection
Problem statement: given a repository of workflow templates (either abstract or specific) or workflow execution traces, which workflow fragments can be deduced from it?
Useful for:
• Systems like Taverna and Wings (many templates, little annotation relating them):
  • Finding relationships between workflows and sub-workflows
  • Most used fragments, most executed, etc.
• Systems like GenePattern and Galaxy (many runs, nearly no templates published):
  • Proposing new templates built from the popular fragments
21. Macro abstraction detection
• Work in progress (implementation and evaluation)
  • Applied to WINGS traces
• Similar to sub-graph isomorphism; a kind of "graph clustering"
• Early results: a tool for finding common sub-graphs
  • Handles sequential graphs
  • Efficient and scalable
  • Integrated with RDF (by me)
• To do:
  • Finish the implementation: inference
  • Evaluation
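For the "sequential graphs" early result, the fragment-finding idea can be illustrated by counting every contiguous component sequence of length k across a repository of templates. This is a deliberately simplified, sequential stand-in for SUBDUE-style sub-graph mining; the repository contents and function names are hypothetical:

```python
from collections import Counter

def fragments(template, k):
    """All contiguous component sequences of length k in one template."""
    return [tuple(template[i:i + k]) for i in range(len(template) - k + 1)]

def common_fragments(repository, k, min_support=2):
    """Fragments of length k appearing in at least min_support templates."""
    counts = Counter()
    for template in repository:
        # set(): count each fragment at most once per template
        counts.update(set(fragments(template, k)))
    return {f: n for f, n in counts.items() if n >= min_support}

# Hypothetical repository of three sequential workflow templates:
repo = [
    ["fetch", "clean", "align", "plot"],
    ["fetch", "clean", "align", "stats"],
    ["download", "clean", "align", "plot"],
]
result = common_fragments(repo, k=2)
print(result[("clean", "align")])  # → 3: the fragment occurs in every template
```

A fragment shared by many templates (here, clean → align) is a candidate macro abstraction: it can be proposed as a reusable sub-workflow or, in the GenePattern/Galaxy case, as a new template built from popular run fragments.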
23. Next Steps
• Thesis:
  • Finish up the implementation
  • How to evaluate the results?
• Publications:
  • Workshop: provenance corpus (with the Taverna team), to have something citable
  • Conference: K-CAP — macro detection implementation and evaluation
  • Journal:
    • Decay analysis publication (January)
    • OPMW - PROV - P-PLAN publication (December)
    • Motif extension publication (invited by special issue; now)
24. Future work
• Thesis:
  • Other methods for detecting workflow abstractions automatically
  • Metadata and file analysis (diff, etc.): filter, merge, etc.
  • Provenance reconstruction
• Project:
  • RO model specifications
  • Test cases
  • Workflow abstraction with Isoco
26. Work at ISI, relation with wf4Ever, future steps
Daniel Garijo Verdejo
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial, Facultad de Informática
Universidad Politécnica de Madrid
Date: 03/10/2011