Detecting common scientific workflow fragments using templates and execution provenance (K-CAP 2013)

Date: 21/06/2013
Detecting common scientific workflow fragments using templates and execution provenance
Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ
* Ontology Engineering Group, Universidad Politécnica de Madrid
Ŧ USC Information Sciences Institute
K-CAP 2013. Banff, Canada
Overview
• Creation of abstractions from low-level and high-level tasks in scientific workflows.
• Approach for detecting common groups of tasks among scientific workflows.
• Discoverability, understandability, reuse and design
[Figure labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment]
Background
• Workflows as software artifacts that capture the scientific method
• Addition to paper publication
• Reuse
• Existing repositories of workflows (myExperiment)
• Sharing workflows
• Exploring existing workflows.
• PROBLEMS to address:
• Sometimes workflows are difficult to understand
• Provenance is captured at too low a level. How can it be generalized?
• Workflow descriptions are hard to relate to each other.
• What are the common fragments shared among workflow templates?
http://www.myexperiment.org
Terminology: workflow templates
“A workflow template connects the steps of the workflow together, its inputs, intermediate results and expected outputs, and defines their types and dependencies”.
• Abstract workflow template: Template with some unbound steps
• Specific workflow template: Template in which all the steps are bound to a specific service, tool or code.
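The two definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the `bound_to` field and the step names `Stemmer` and `TermWeighting` are hypothetical, chosen only to show that a template is abstract when at least one step is unbound and specific when every step is bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    """One step of a workflow template (hypothetical structure)."""
    name: str
    # Service, tool or code that implements the step; None means unbound.
    bound_to: Optional[str] = None


@dataclass
class WorkflowTemplate:
    """Steps plus the dependencies that connect them."""
    steps: Dict[str, Step] = field(default_factory=dict)
    # (producer, consumer) pairs: the consumer uses a result of the producer.
    dependencies: List[Tuple[str, str]] = field(default_factory=list)

    def add_step(self, name: str, bound_to: Optional[str] = None) -> None:
        self.steps[name] = Step(name, bound_to)

    def connect(self, producer: str, consumer: str) -> None:
        if producer not in self.steps or consumer not in self.steps:
            raise KeyError("both steps must exist before connecting them")
        self.dependencies.append((producer, consumer))

    def unbound_steps(self) -> List[str]:
        return [s.name for s in self.steps.values() if s.bound_to is None]

    def is_specific(self) -> bool:
        # Specific: all the steps are bound.
        return not self.unbound_steps()

    def is_abstract(self) -> bool:
        # Abstract: some steps are unbound.
        return not self.is_specific()


# Assumed example: the stemming step is left unbound in the abstract template.
abstract = WorkflowTemplate()
abstract.add_step("Stemmer")
abstract.add_step("TermWeighting", bound_to="TF")
abstract.connect("Stemmer", "TermWeighting")

# The specific template binds every step to a concrete component.
specific = WorkflowTemplate()
specific.add_step("Stemmer", bound_to="PorterStemmer")
specific.add_step("TermWeighting", bound_to="TF")
specific.connect("Stemmer", "TermWeighting")
```

Under these assumptions, `abstract.unbound_steps()` lists the steps that still need a concrete component before the template can be executed.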
[Figure labels: Abstract, Specific, Taxonomy of components]
[Figure label: Problem Solving Methods]
Terminology: workflow execution provenance traces
Workflow execution provenance trace: structured log of the workflow execution results.
• Inputs of the run
• Outputs of the run
• Intermediate steps resulting from the run.
• Software codes used by the steps.
[Figure labels: Porter Stemmer, Result, TF, Output, Dataset, ReutersTrain, TestDataset, A12314, TFResultRun, 21-06-2013, DataTemplate, ExecutionProcessP1, ExecutionProcessP2]
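As a rough illustration of a provenance trace as a structured log, the Python sketch below records each executed step with the code it ran and the artifacts it used and generated, then derives the inputs, outputs and intermediate results of the run. The class names, the run identifier and the artifact names `StemmedWords` and `TFResult` are hypothetical; `ReutersTrain`, `PorterStemmer` and `TF` echo labels on the slide, but how they are wired together here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StepExecution:
    """One executed step: the code it ran and the artifacts it touched."""
    step: str
    code: str
    used: List[str]
    generated: List[str]


@dataclass
class ProvenanceTrace:
    """Structured log of one workflow run (hypothetical structure)."""
    run_id: str
    executions: List[StepExecution] = field(default_factory=list)

    def _all(self, attr: str) -> List[str]:
        """Artifacts listed under `attr`, in first-seen order, no repeats."""
        seen: List[str] = []
        for execution in self.executions:
            for artifact in getattr(execution, attr):
                if artifact not in seen:
                    seen.append(artifact)
        return seen

    def inputs(self) -> List[str]:
        # Used by some step but generated by none.
        generated = set(self._all("generated"))
        return [a for a in self._all("used") if a not in generated]

    def outputs(self) -> List[str]:
        # Generated by some step but used by none.
        used = set(self._all("used"))
        return [a for a in self._all("generated") if a not in used]

    def intermediates(self) -> List[str]:
        # Generated by one step and used by another.
        used = set(self._all("used"))
        return [a for a in self._all("generated") if a in used]

    def codes(self) -> Dict[str, str]:
        return {e.step: e.code for e in self.executions}


# Assumed two-step run: stemming followed by term-frequency weighting.
trace = ProvenanceTrace(
    run_id="run-1",
    executions=[
        StepExecution("Stemmer", "PorterStemmer",
                      used=["ReutersTrain"], generated=["StemmedWords"]),
        StepExecution("TermWeighting", "TF",
                      used=["StemmedWords"], generated=["TFResult"]),
    ],
)
```

Deriving inputs and outputs from the per-step records, rather than storing them separately, keeps the log consistent as steps are appended.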

 
Geospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & EsriGeospatial Synergy: Amplifying Efficiency with FME & Esri
Geospatial Synergy: Amplifying Efficiency with FME & Esri
 
Pragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdfPragmatic UI testing with Compose Semantics.pdf
Pragmatic UI testing with Compose Semantics.pdf
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
 
software-quality-assurance question paper 2023
software-quality-assurance question paper 2023software-quality-assurance question paper 2023
software-quality-assurance question paper 2023
 
Large Language Models and Applications in Healthcare
Large Language Models and Applications in HealthcareLarge Language Models and Applications in Healthcare
Large Language Models and Applications in Healthcare
 
Improving IT Investment Decisions and Business Outcomes with Integrated Enter...
Improving IT Investment Decisions and Business Outcomes with Integrated Enter...Improving IT Investment Decisions and Business Outcomes with Integrated Enter...
Improving IT Investment Decisions and Business Outcomes with Integrated Enter...
 

Detecting common scientific workflow fragments using templates and execution provenance (K-CAP 2013)

  • 1. Date: 21/06/2013 Detecting common scientific workflow fragments using templates and execution provenance Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ * Ontology Engineering Group Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute K-CAP 2013. Banff, Canada
  • 2. Overview • Creation of abstractions from low-level and high-level tasks in scientific workflows. • Approach for detecting common groups of tasks among scientific workflows. • Discoverability, understandability, reuse and design. [Figure labels: Lab book, Digital Log, Laboratory Protocol (recipe), Workflow, Experiment]
  • 3. Background • Workflows as software artifacts that capture the scientific method: an addition to paper publication; reuse. • Existing repositories of workflows (myExperiment, http://www.myexperiment.org): sharing workflows; exploring existing workflows. • PROBLEMS to address: • Workflows are sometimes difficult to understand. • Provenance is captured at too low a level. How can it be generalized? • Workflow descriptions are hard to relate to each other. • What are the common fragments shared among workflow templates?
  • 4. Terminology: workflow templates • “A workflow template connects the steps of the workflow together, its inputs, intermediate results and expected outputs, and defines their types and dependencies”. • Abstract workflow template: template with some unbound steps. • Specific workflow template: template in which all the steps are bound to a specific service, tool or code. [Figure labels: Abstract, Specific, Taxonomy of components]
  • 5. Terminology: workflow templates • “A workflow template connects the steps of the workflow together, its inputs, intermediate results and expected outputs, and defines their types and dependencies”. • Abstract workflow template: template with some unbound steps. • Specific workflow template: template in which all the steps are bound to a specific service, tool or code. [Figure labels: Abstract, Specific, Taxonomy of components, Problem Solving Methods]
  • 6. Terminology: workflow execution provenance traces • Workflow execution provenance trace: structured log of the workflow execution results. • Inputs of the run. • Outputs of the run. • Intermediate steps resulting from the run. • Software code used by the steps. [Figure labels: Porter Stemmer Result TF Output Dataset ReutersTrain TestDataset A12314 TFResultRun 21-06-2013 DataTemplate ExecutionProcessP1 ExecutionprocessP2]
  • 7. Internal Macro • Same sequence of steps in different parts of the workflow. • Types of data and steps are the same. • May or may not be found in other workflows: the motif is local to a workflow.
  • 8. Composite Workflows • Same sequence of steps among different workflows. • Types of data and steps are the same.
  • 9. Background: Motifs • Workflow motifs catalogue [Garijo et al. 2012]: domain-independent conceptual abstractions on the workflow steps. 1. Data-oriented motifs: What kind of manipulations does the workflow have? 2. Workflow-oriented motifs: How does the workflow perform its operations? • We aim to automatically detect two types of motifs: • Internal Macro (common sequences of steps within a workflow) • Composite Workflows (common sequences of steps among workflows). [Garijo et al. 2012] Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, Carole Goble. Common motifs in scientific workflows: An empirical analysis. IEEE 8th International Conference on eScience, 2012.
  • 10. Motifs: Summary • Most popular workflow-oriented ("how") motifs: Atomic Workflows, Composite Workflows and Internal Macro.
  • 11. Approach • Pipeline: Workflow Retrieval, Common fragment detection, Result analysis. 1. Retrieval of workflow templates and execution provenance traces from a repository of workflows. 2. Algorithms to obtain the most common fragments among the workflow dataset. 3. Derivation of statistics and annotation of workflows.
  • 12. Workflow representation • Workflows are labeled DAGs (Directed Acyclic Graphs). • The same representation is used for both templates and workflow execution provenance traces. • No loops. • No conditionals. • Popular representation in data-oriented scientific workflows (supported by many workflow engines).
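As a minimal sketch of the representation above, a workflow can be held as a mapping from node identifiers to type labels plus a list of directed edges. The node identifiers, the example labels and the `is_dag` helper are illustrative assumptions, not part of the authors' tooling; the acyclicity check uses Kahn's topological sort.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every node is visited exactly when no cycle blocks the ordering.
    return visited == len(indegree)


# Hypothetical workflow: node id -> type label (datasets and processes).
labels = {
    "d1": "DatasetT1",
    "p1": "ProcessType1",
    "d2": "DatasetT2",
    "p2": "ProcessType2",
    "d3": "DatasetT3",
}
# Directed edges follow the data flow: dataset -> process -> dataset.
edges = [("d1", "p1"), ("p1", "d2"), ("d2", "p2"), ("p2", "d3")]

valid = is_dag(labels, edges)
```

Adding an edge such as `("d3", "p1")` would introduce a loop, which this representation rules out.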
  • 13. Challenges: Common workflow fragment detection • Given a collection of workflows, which are the most common fragments? • Common sub-graphs among the collection. • Sub-graph isomorphism (NP-complete). • We use the SUBDUE algorithm [Holder et al 1994]: • Graph grammar learning: the rules of the grammar are the workflow fragments. • Graph-based hierarchical clustering: each cluster corresponds to a workflow fragment. • Iterative algorithm with two measures for compressing the graph: • Minimum Description Length (MDL) • Size. [Holder et al 1994] L. B. Holder, D. J. Cook, and S. Djoko. Substructure Discovery in the SUBDUE System. AAAI Workshop on Knowledge Discovery, pages 169-180, 1994.
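The size measure can be illustrated with a small calculation. This sketch assumes the size-based value of a fragment S in a graph G is size(G) / (size(S) + size(G|S)), where size counts vertices plus edges and G|S is G with every instance of S collapsed into a single vertex. The graph counts below are made-up numbers, not results from the talk, and the helper names are hypothetical.

```python
def size(vertices, edges):
    """Size of a graph, assumed here to be vertex count plus edge count."""
    return vertices + edges


def size_value(graph, fragment, compressed):
    """Compression value of a fragment; each argument is a (vertices, edges) pair."""
    return size(*graph) / (size(*fragment) + size(*compressed))


# Hypothetical counts: a 15-vertex, 14-edge graph in which a 3-vertex,
# 2-edge fragment occurs three times without overlap.
graph = (15, 14)
fragment = (3, 2)
occurrences = 3

# Collapsing one occurrence into a single vertex removes
# (fragment vertices - 1) vertices and all of the fragment's internal edges.
compressed = (
    graph[0] - occurrences * (fragment[0] - 1),
    graph[1] - occurrences * fragment[1],
)

value = size_value(graph, fragment, compressed)
```

A value above 1 means that describing the fragment once and the compressed graph separately is smaller than the original graph, so the fragment is worth keeping.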
  • 14. How does SUBDUE work? Input graph. [Figure node labels: ProcessType1 DatasetT1 DatasetT2 ProcessType2 DatasetT3 ProcessType3 DatasetT3 ProcessType1 DatasetT1 DatasetT2 ProcessType2 DatasetT3 DatasetT2 ProcessType2 DatasetT3]
  • 15. How does SUBDUE work? Iteration 1: Fragment1 is highlighted in the input graph. [Figure node labels as in the previous slide]
  • 16. How does SUBDUE work? Iteration 1 result: each occurrence of Fragment1 is replaced by FRAG1. [Figure node labels: ProcessType1 DatasetT1 FRAG1 ProcessType3 DatasetT3 ProcessType1 DatasetT1 FRAG1 FRAG1]
  • 17. How does SUBDUE work? Iteration 2: Fragment2 is highlighted in the compressed graph. [Figure node labels as in the previous slide]
  • 18. How does SUBDUE work? Iteration 2 result (STOP): each occurrence of Fragment2 is replaced by FRAG2. [Figure node labels: FRAG2 ProcessType3 DatasetT3 FRAG2 FRAG1]
  • 19. How does SUBDUE work? Results: • Fragment 1 (FRAG1): DatasetT2, ProcessType2, DatasetT3. Occurrences: 3 times. • Fragment 2 (FRAG2): ProcessType1, DatasetT1, FRAG1. Occurrences: 2 times.
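The two iterations above can be replayed with a short sketch. It is not SUBDUE: the fragments are supplied by hand as chains of labels, whereas SUBDUE discovers them itself. The edge layout of the input graph is an assumption (only the node labels survive from the figure), and `find_chain` and `compress` are hypothetical helpers.

```python
def find_chain(labels, edges, pattern):
    """Return non-overlapping node paths whose labels match `pattern` in order."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    used, matches = set(), []

    def extend(path):
        if len(path) == len(pattern):
            return path
        for nxt in succ.get(path[-1], []):
            if nxt not in used and labels[nxt] == pattern[len(path)]:
                found = extend(path + [nxt])
                if found:
                    return found
        return None

    for node in sorted(labels):
        if node not in used and labels[node] == pattern[0]:
            path = extend([node])
            if path:
                matches.append(path)
                used.update(path)
    return matches


def compress(labels, edges, matches, name):
    """Collapse each matched path into a single node labelled `name`."""
    owner = {n: f"{name}_{i}" for i, path in enumerate(matches) for n in path}
    new_labels = {n: lab for n, lab in labels.items() if n not in owner}
    new_labels.update({merged: name for merged in owner.values()})
    new_edges = set()
    for src, dst in edges:
        s, d = owner.get(src, src), owner.get(dst, dst)
        if s != d:  # drop edges internal to a collapsed fragment
            new_edges.add((s, d))
    return new_labels, sorted(new_edges)


# Assumed input graph with the same node labels as the slide (15 nodes).
labels = {
    "d1a": "DatasetT1", "p1a": "ProcessType1", "d2a": "DatasetT2",
    "p2a": "ProcessType2", "d3a": "DatasetT3",
    "d1b": "DatasetT1", "p1b": "ProcessType1", "d2b": "DatasetT2",
    "p2b": "ProcessType2", "d3b": "DatasetT3",
    "d2c": "DatasetT2", "p2c": "ProcessType2", "d3c": "DatasetT3",
    "p3": "ProcessType3", "d3out": "DatasetT3",
}
edges = [
    ("d1a", "p1a"), ("p1a", "d2a"), ("d2a", "p2a"), ("p2a", "d3a"), ("d3a", "p3"),
    ("d1b", "p1b"), ("p1b", "d2b"), ("d2b", "p2b"), ("p2b", "d3b"), ("d3b", "p3"),
    ("d2c", "p2c"), ("p2c", "d3c"), ("d3c", "p3"), ("p3", "d3out"),
]

# Iteration 1: FRAG1 = DatasetT2 -> ProcessType2 -> DatasetT3.
frag1 = find_chain(labels, edges, ["DatasetT2", "ProcessType2", "DatasetT3"])
labels1, edges1 = compress(labels, edges, frag1, "FRAG1")

# Iteration 2: FRAG2 = DatasetT1 -> ProcessType1 -> FRAG1.
frag2 = find_chain(labels1, edges1, ["DatasetT1", "ProcessType1", "FRAG1"])
labels2, edges2 = compress(labels1, edges1, frag2, "FRAG2")
```

Under these assumptions `frag1` has three occurrences and `frag2` has two, and the final graph keeps two FRAG2 nodes, one FRAG1 node, ProcessType3 and DatasetT3, which agrees with the node labels on the last iteration slide.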
  • 20. Challenges: Generalization of workflows • Pipeline: Workflow Retrieval, Workflow Generalization, Common fragment detection, Result analysis. [Figure labels: Porter Stemmer, Lovins Stemmer, Stemmer, Term Weighting, DF, TF, CF]
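The generalization step can be sketched as relabelling each step with its parent class in a component taxonomy, so that workflows using different implementations of the same kind of step yield the same graph labels. The taxonomy below is an assumed reading of the slide's figure (Porter Stemmer and Lovins Stemmer under Stemmer; DF, TF and CF under Term Weighting), and `generalize` is a hypothetical helper.

```python
# Assumed taxonomy: specific component -> more abstract parent class.
TAXONOMY = {
    "Porter Stemmer": "Stemmer",
    "Lovins Stemmer": "Stemmer",
    "DF": "Term Weighting",
    "TF": "Term Weighting",
    "CF": "Term Weighting",
}


def generalize(labels):
    """Replace each step label by its parent class; unknown labels are kept."""
    return {node: TAXONOMY.get(label, label) for node, label in labels.items()}


# Two hypothetical specific workflows with the same shape but different tools.
workflow_a = {"s1": "Porter Stemmer", "s2": "TF"}
workflow_b = {"s1": "Lovins Stemmer", "s2": "DF"}

general_a = generalize(workflow_a)
general_b = generalize(workflow_b)
```

After relabelling, both workflows carry the labels Stemmer and Term Weighting, so a fragment detector that compares labels would treat them as the same fragment.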
  • 21. Analysis setup • Analysis performed on 22 workflow templates with 30 workflow execution provenance traces. • Abstract and specific workflow templates. • Several workflow executions belong to the same template. • Some workflow executions had errors during execution. • Workflows were manually analyzed to find motifs: Internal Macros and SubWorkflows.
  • 22. Evaluation results: Internal Macro • Our goal is to maximize the filtered multi-step fragments. • The algorithm finds more multi-step fragments due to the way it operates. • A filtering step must be applied to the multi-step fragments obtained, because some are part of others.
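One possible form of the filtering step, assuming a fragment counts as "part of another" when its edge set is a strict subset of a larger fragment's edge set. The slides do not spell out the actual criterion, so the fragment encoding, the step names and the `filter_fragments` helper are assumptions for illustration.

```python
def filter_fragments(fragments):
    """Keep fragments whose edge set is not strictly contained in another's."""
    kept = {}
    for name, edges in fragments.items():
        edge_set = frozenset(edges)
        contained = any(
            edge_set < frozenset(other_edges)  # strict subset test
            for other_name, other_edges in fragments.items()
            if other_name != name
        )
        if not contained:
            kept[name] = edges
    return kept


# Hypothetical fragments, each a list of (step, next step) label pairs.
fragments = {
    "F1": [("Stemmer", "Term Weighting")],
    "F2": [("Stemmer", "Term Weighting"), ("Term Weighting", "Merge")],
    "F3": [("Sort", "Plot")],
}

kept = filter_fragments(fragments)
```

Here F1 is dropped because it is wholly contained in F2, while F2 and F3 survive.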
  • 23. Evaluation results: Composite workflows • More filtered multi-step fragments are found automatically than manually. • Manual analysis affects sub-workflows. • More than 50% of the filtered multi-step fragments overlap with the manual ones. • The fragments found automatically have more occurrences than those found manually.
  • 24. Limitations • Overlapping fragments may not be fully detected.
  • 25. Conclusions & future work • Approach for detecting commonalities among scientific workflows, using workflow execution provenance traces and workflow templates. • Detection of the most common workflow fragments. • Generalization of the datasets. Future work: • Expand the analysis to other domains. • Add support for other workflow systems: Taverna, Knime, GenePattern, Galaxy, Vistrails, etc. • Test other graph matching algorithms. • Optimize the algorithm by reducing the search space. • All inputs and results are available here: http://www.oeg-upm.net/files/dgarijo/kcap2013Eval
  • 26. Towards automatic annotation of workflows • Ontology for describing workflow motifs: the Workflow Motif Ontology. URL: http://purl.org/net/wf-motifs • Ontology for linking fragments to the workflows of the dataset (work in progress): the Workflow Fragment Description Ontology. URL: to be announced.
  • 27. Current improvements • Testing other domains. • Expanding the compatible workflow systems: Taverna. • Improving the workflow representation to reduce the graph size.
  • 28. Who are we? • Daniel Garijo, Ontology Engineering Group, UPM • Oscar Corcho, Ontology Engineering Group, UPM • Yolanda Gil, Information Sciences Institute, USC. EU Wf4Ever project (270129) funded under EU FP7 (ICT-2009.4.1) (http://www.wf4ever-project.org).