FAIR Workflows: A step closer to the Scientific Paper of the Future
Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
daniel.garijo@upm.es
@dgarijov
Computational and Autonomous Workflows Workshop (CAW), 20th July, 2021
FAIR Workflows: A step closer to the Scientific Paper of the Future. CAW-2021
A few details about myself
Semantic Web, Linked Data and Knowledge Graphs
Open Science best practices
Semantic Scientific Workflows (WINGS)
Provenance Standards (W3C PROV)
Software metadata representation and extraction
Research Objects (context)
How I started: A personal view on reproducibility
A simple summer job:
- Reproduce this paper:
- Paper was published 1 year before
- Authors were available to help
- Website with data used was available
http://funsite.sdsc.edu/drugome/TB/ now a 404! (see Internet Archive)
But:
- No workflow (or sketch)
- Input data had slight changes
- Software licenses had expired
- Some data cleaning steps (for final results) not available
- Some authors were in different institutions
Phil E. Bourne (UCSD, now Univ. of Virginia)
Yolanda Gil (ISI, USC)
Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, et al. (2010) The
Mycobacterium tuberculosis Drugome and Its Polypharmacological
Implications. PLoS Comput Biol 6(11): e1000976
How I started: A personal view on Reproducibility
Three months later, we were successful. We:
- Quantified effort and expertise
- Stored all resources in a wiki
- Created a list of desiderata for reproducibility
Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case
of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
Reproducibility
Scientists
Industry General Public
There is hope
Changes in the scientific community
- Open Data
- Open Source Software
- Open Access Publications
- Credit and impact
Changes in public institutions: Initiatives in Data Science
Changes in publishers and funders
Best practices and principles
Other guidelines:
● Guidelines for Transparency and Openness Promotion (TOP)
● Reproducibility Enhancement Principles (REP)
● ...
FAIR data principles in a nutshell
Metadata
- Findable: Make sure your resource is findable in a public registry (e.g., by a search engine), and that it has a public unique id
- Accessible: Your resource should be retrievable by using its identifier and a standard communication protocol (e.g., HTTP)
- Interoperable: Use an existing standard to represent your resource
- Reusable: Include documentation, provenance and license for your resource
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3,
160018 (2016). https://doi.org/10.1038/sdata.2016.18
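As a rough illustration of the four points above, the sketch below turns them into a yes/no checklist over a metadata record. The `ResourceMetadata` fields, the `fair_checklist` function and the example values are all hypothetical and chosen only for this example; the FAIR principles themselves are more detailed than this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ResourceMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: Optional[str] = None       # public unique id
    registry: Optional[str] = None         # public registry that indexes the resource
    access_url: Optional[str] = None       # where the resource can be retrieved
    format_standard: Optional[str] = None  # existing standard used to represent it
    documentation: Optional[str] = None
    provenance: Optional[str] = None
    license: Optional[str] = None


def fair_checklist(meta: ResourceMetadata) -> Dict[str, bool]:
    """Rough yes/no answer per FAIR letter, following the summary above."""
    return {
        "findable": bool(meta.identifier and meta.registry),
        "accessible": bool(
            meta.identifier
            and meta.access_url
            and meta.access_url.startswith(("http://", "https://"))
        ),
        "interoperable": bool(meta.format_standard),
        "reusable": bool(meta.documentation and meta.provenance and meta.license),
    }


# Made-up example values (example.org is a placeholder domain).
example = ResourceMetadata(
    identifier="https://example.org/id/dataset-1",
    registry="https://example.org/registry",
    access_url="https://example.org/id/dataset-1",
    format_standard="CSV",
    license="CC-BY-4.0",
)
report = fair_checklist(example)
for principle, ok in report.items():
    print(f"{principle}: {'yes' if ok else 'missing information'}")
```

Here the record would not count as reusable, because documentation and provenance are missing.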
Extensions to FAIR
Since 2016, much has been written about extending FAIR (e.g., a full special issue of Data Intelligence, 2020):
- Software and services
- Semantic artefacts
- Workflows
- ...
Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033
Lamprecht, Anna-Lena et al. Towards FAIR Principles for Research Software. Data Science 3 (1): 37–59 (2020). https://content.iospress.com/articles/data-science/ds190026
Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86
What does FAIR mean for computational workflows?
Anatomy of a workflow
Several aspects to consider:
● Data
● Software
○ Environment
○ Wrapper scripts
● Specification
● Configuration
● Provenance
Anatomy of a workflow: Data
● Data
○ Apply FAIR:
■ Meaningful inputs
■ Meaningful intermediate results
■ Meaningful outputs
■ Streaming?
Anatomy of a workflow: Software
Software
○ Tools & scripts (data preparation, visualization, etc.)
○ Wrapper scripts
Aspects for FAIR:
○ Software changes and decays rapidly (version)
○ (Public) code repository
○ (Open) license
○ Credit (citation)
○ Documentation
The importance of software metadata
Software repository
● Code resides there
● Support software evolution
● Support groups of developers
Software registry
● Capture metadata
● Useful structured information about the code
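To make "useful structured information about the code" concrete, here is a minimal sketch of a registry-style record. It borrows property names from CodeMeta 2.0 (a vocabulary the slides do not name, so treat that choice as an assumption); the `software_metadata` helper and all example values are hypothetical.

```python
import json
from typing import Dict, List


def software_metadata(
    name: str,
    description: str,
    repository: str,
    version: str,
    license_url: str,
    authors: List[str],
) -> Dict[str, object]:
    """Build a small JSON-LD record using CodeMeta 2.0 property names."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository,  # where the code resides
        "version": version,            # software changes and decays rapidly
        "license": license_url,
        "author": [{"@type": "Person", "name": a} for a in authors],
    }


# Hypothetical tool; the repository URL is a placeholder.
record = software_metadata(
    name="my-data-cleaner",
    description="Prepares input tables for a workflow step.",
    repository="https://example.org/git/my-data-cleaner",
    version="1.2.0",
    license_url="https://spdx.org/licenses/Apache-2.0",
    authors=["Jane Doe"],
)
print(json.dumps(record, indent=2))
```

A record like this would typically be stored alongside the code (for instance as a JSON file in the repository) so that registries can harvest it.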
The importance of software metadata
https://twitter.com/mitsuhiko/status/1410886329924194309
Software and its computational environment
Dependencies? OS?
○ Virtual environments
○ Package managers
○ Containers
○ Virtual machines
Aspects for FAIR:
○ Landscape and standards change quickly
○ Documentation
○ Size (long term preservation)
dockerpedia
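One lightweight way to document dependencies and OS, short of shipping a container or virtual machine, is to record a snapshot of the environment next to the results. The sketch below uses only the Python standard library; the `snapshot_environment` function and the shape of its output are assumptions made for this example, not a standard format.

```python
import json
import platform
import sys
from importlib import metadata
from typing import Dict


def snapshot_environment() -> Dict[str, object]:
    """Record the OS, the Python version and the installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "packages": dict(sorted(packages.items())),
    }


env = snapshot_environment()
summary = {key: value for key, value in env.items() if key != "packages"}
print(json.dumps(summary, indent=2))
print(len(env["packages"]), "installed distributions recorded")
```

Such a snapshot is small enough for long term preservation, but it only describes the environment; it does not guarantee that the environment can be rebuilt.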
Computational methods
Workflow
○ Many workflow systems
■ Different capabilities
Aspects for FAIR:
○ Public repositories
○ Standard representations (CWL)
○ Nested workflows?
○ Metadata and documentation
Documentation: workflow sketches
Critical for creating human-readable descriptions!
Workflow configurations
Fix certain data/parameters/software of a workflow
○ Run in different regions
○ Calibrated models
○ Data compatibilities
○ Critical for end users (reuse)
Example configurations:
○ California: K = 10² cm/s
○ Florida: K = 0.001 cm/s
Aspects for FAIR:
● Where to include this information?
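A configuration can be thought of as a workflow with some of its inputs already fixed. The sketch below models that idea with the two regional values of K from the slide; the `WorkflowConfiguration` class, the workflow id, the parameter name `K_cm_per_s` and the input file name are hypothetical and only serve the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class WorkflowConfiguration:
    """A workflow with some parameters fixed, e.g. calibrated for a region."""
    workflow_id: str
    region: str
    fixed_parameters: Dict[str, float] = field(default_factory=dict)

    def resolve(self, user_inputs: Dict[str, object]) -> Dict[str, object]:
        """Merge user inputs with the fixed values, refusing any override."""
        clash = set(user_inputs) & set(self.fixed_parameters)
        if clash:
            raise ValueError(f"fixed by the configuration: {sorted(clash)}")
        return {**self.fixed_parameters, **user_inputs}


# K values taken from the slide; everything else is made up.
california = WorkflowConfiguration(
    "example-workflow", "California", {"K_cm_per_s": 100.0}
)
florida = WorkflowConfiguration(
    "example-workflow", "Florida", {"K_cm_per_s": 0.001}
)

run_inputs = california.resolve({"precipitation_file": "precipitation.csv"})
print(run_inputs)
```

End users then only supply what is left open, which is one reason configurations matter for reuse.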
Workflow provenance
A record of relevant past executions (and results) of a workflow
● Debug
● Reference examples (cause for workflow decay)*
● Critical for reusability
[Figure: California configuration (K = 10² cm/s) run with precipitation from Jan 2020, Feb 2020 and March 2020; provenance record for March 2020]
Aspects for FAIR: How to select (and represent) relevant provenance?
*J. Zhao et al., "Why workflows break — Understanding and combating decay in Taverna workflows," 2012 IEEE 8th International Conference on E-Science, 2012, pp. 1-9, doi: 10.1109/eScience.2012.6404482.
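As a minimal sketch of what a record of one past execution could contain, the code below stores the run, its parameters, and checksums of what it used and generated. The key names are loosely borrowed from W3C PROV terms (activity, used, generated), but this is not a valid PROV serialization; the function names, identifiers, timestamps and file contents are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict


def checksum(content: bytes) -> str:
    """SHA-256 digest, so a later run can detect that an input has changed."""
    return hashlib.sha256(content).hexdigest()


def provenance_record(
    run_id: str,
    workflow_id: str,
    parameters: Dict[str, object],
    inputs: Dict[str, bytes],
    outputs: Dict[str, bytes],
    started: datetime,
    ended: datetime,
) -> Dict[str, object]:
    """Describe one execution: what ran, with which values, on which data."""
    return {
        "activity": {
            "id": run_id,
            "workflow": workflow_id,
            "startedAtTime": started.isoformat(),
            "endedAtTime": ended.isoformat(),
            "parameters": parameters,
        },
        "used": [
            {"id": name, "sha256": checksum(data)} for name, data in inputs.items()
        ],
        "generated": [
            {"id": name, "sha256": checksum(data)} for name, data in outputs.items()
        ],
    }


# Made-up run of the California configuration with March 2020 precipitation.
record = provenance_record(
    run_id="run-2020-03",
    workflow_id="example-workflow",
    parameters={"region": "California", "K_cm_per_s": 100.0},
    inputs={"precipitation_march_2020.csv": b"day,mm\n1,0.0\n"},
    outputs={"result.csv": b"day,value\n1,0.0\n"},
    started=datetime(2020, 3, 31, 12, 0, tzinfo=timezone.utc),
    ended=datetime(2020, 3, 31, 12, 5, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Keeping checksums of reference inputs and outputs gives a baseline to compare against when a workflow is re-run later.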
How to aggregate
everything together, while
preserving its context?
A solution: Workflow-centric Research Objects
https://www.researchobject.org/ro-crate/1.1/
Workflow-centric Research Objects (in detail)
Open community:
https://www.researchobject.org/ro-crate/community
https://github.com/ResearchObject/ro-crate
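To give a feeling for how an RO-Crate aggregates a workflow with its data, the sketch below builds a minimal `ro-crate-metadata.json` structure following the RO-Crate 1.1 layout (metadata descriptor, root dataset, parts). The `workflow_crate` helper, file names and descriptive values are hypothetical, and the output has not been checked with an RO-Crate validator.

```python
import json
from typing import Dict, List


def workflow_crate(
    name: str,
    description: str,
    date_published: str,
    license_url: str,
    workflow_file: str,
    data_files: List[str],
) -> Dict[str, object]:
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict."""
    parts = [workflow_file, *data_files]
    graph = [
        {   # descriptor: says which entity is the root of the crate
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset: aggregates everything and carries the context
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": part} for part in parts],
        },
        {
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": f"Workflow specification ({workflow_file})",
        },
    ]
    graph += [{"@id": data_file, "@type": "File"} for data_file in data_files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = workflow_crate(
    name="Example workflow crate",
    description="A workflow, its configuration and one provenance record.",
    date_published="2021-07-20",
    license_url="https://spdx.org/licenses/CC-BY-4.0",
    workflow_file="workflow.cwl",
    data_files=["config_california.json", "provenance_march_2020.json"],
)
print(json.dumps(crate, indent=2))
```

Writing this dict to a file named `ro-crate-metadata.json` in the same folder as the listed files is what turns the folder into a crate.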
A look beyond FAIR
Beyond FAIR workflows
https://egyptindependent.com/mans-first-steps-on-the-moon-reported-live-by-afp/
A step closer to the Scientific Paper of the Future
The Scientific Paper of the Future
www.scientificpaperofthefuture.org
“Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance” Gil et al, Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136
Geophysics: Special Issue on Geoscience Papers of the Future
Special Section: Geoscience Papers of the Future
An example
Published Articles
www.scientificpaperofthefuture.org/gpf/special-issue
● [David et al 2015]: 10 years of hydrology model software
● [Yu et al 2015]: Model coupling for surface/subsurface flow
● [Essawy et al 2015]: Hydrology workflows for reproducibility
● [Pope et al 2015]: Estimate supraglacial lake depth from imagery
● [Fulweiler et al 2016]: Long-term estuary data & products
● [Tzeng et al 2016]: Data processing for ocean observatory
● [Demir et al 2017]: Sensor network for flood monitoring
● [Peckham et al 2017]: Hydrological modeling toolkit
Adopting FAIR has a crucial social component
It’s not only reproducibility...
1. Practice open science and reproducible research
2. Get credit for all your research products
● Citations for software, data, containers, notebooks, samples, …
3. Increase citations of your papers
4. Write impressive Data Management Plans
5. Extend your CV with data and software sections
6. Improve the management of your research assets
7. Reproduce your work from years ago and build on it
8. Address new funder and journal requirements
9. Attract transformative students
10. Demonstrate leadership by stepping into the future
Towards the Scientific Paper of the Future
Workflows are not just data in terms of FAIR
- Software, datasets, provenance, environment, context, etc.
- Aggregation, versioning, sustainability, etc.
Why do YOU want to support FAIR workflows?
- Level of granularity (usefulness)
- Scientific Paper of the Future
A social change is needed
- Some practices take time but can be easily adopted:
  - Add persistent identifiers
  - Add a license
  - Specify citation
  - Add documentation
Can you execute a workflow you last ran two years ago?
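The easily adopted practices above can be checked mechanically. The sketch below looks for a license, a citation file and a README in a workflow folder, and treats a `doi:` entry in `CITATION.cff` as a sign of a persistent identifier. The file name conventions and the `adoption_checklist` function are assumptions for this example; projects may follow other conventions.

```python
import tempfile
from pathlib import Path
from typing import Dict

# Assumed file name conventions; adapt to the project at hand.
CHECKS = {
    "license": ("LICENSE", "LICENSE.txt", "LICENSE.md"),
    "citation": ("CITATION.cff",),
    "documentation": ("README", "README.md", "README.rst"),
}


def adoption_checklist(repo_dir: str) -> Dict[str, bool]:
    """Report which of the basic practices a workflow folder already follows."""
    repo = Path(repo_dir)
    result = {
        item: any((repo / name).is_file() for name in names)
        for item, names in CHECKS.items()
    }
    citation = repo / "CITATION.cff"
    result["persistent_identifier"] = (
        citation.is_file()
        and "doi:" in citation.read_text(encoding="utf-8").lower()
    )
    return result


# Demonstration on a throwaway folder with only a README and a license.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "README.md").write_text("# My workflow\n", encoding="utf-8")
    (Path(tmp) / "LICENSE").write_text("Apache-2.0\n", encoding="utf-8")
    status = adoption_checklist(tmp)

print(status)
```

In this demonstration the folder still lacks a citation file and a persistent identifier.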
Acknowledgements
Some slides from this talk have been adapted from the Scientific Paper of the Future training materials by Yolanda Gil et al. under a CC-BY license
https://scientificpaperofthefuture.org http://doi.org/10.5281/zenodo.159206
Yolanda Gil, Oscar Corcho, Carole Goble, Stian Soiland-Reyes, Deborah Khider, Varun Ratnakar, Maximiliano Osorio, Hernán Vargas, Suzanne Pierce and all the participants of the Scientific Paper of the Future Initiative.
Questions?
Metadata (+ FAIR) is a love note to the future*
* https://www.slideshare.net/rlovinger/metadata-is-a-love-note-to-the-future
Contact me at:
daniel.garijo@upm.es
@dgarijov

More Related Content

What's hot

Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmers
Kevin Lee
 

What's hot (20)

Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
FAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologiesFAIRness through a novel combination of Web technologies
FAIRness through a novel combination of Web technologies
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Introduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmersIntroduction of semantic technology for SAS programmers
Introduction of semantic technology for SAS programmers
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Better software, better service, better research: The Software Sustainabilit...
Better software, better service, better research: The Software Sustainabilit...Better software, better service, better research: The Software Sustainabilit...
Better software, better service, better research: The Software Sustainabilit...
 
New PID developments
New PID developmentsNew PID developments
New PID developments
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database A Comparative analysis of Graph Databases vs Relational Database
A Comparative analysis of Graph Databases vs Relational Database
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly works
 
Towards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and WikidataTowards a Unified PageRank for DBpedia and Wikidata
Towards a Unified PageRank for DBpedia and Wikidata
 

Similar to FAIR Workflows: A step closer to the Scientific Paper of the Future

Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
Sanjay Padhi, Ph.D
 
Real world e-science use-cases
Real world e-science use-casesReal world e-science use-cases
Real world e-science use-cases
Annette Strauch
 

Similar to FAIR Workflows: A step closer to the Scientific Paper of the Future (20)

IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science ChallengeIronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
 
Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021Hala skafkeynote@conferencedata2021
Hala skafkeynote@conferencedata2021
 
STI 2022 - Generating large-scale network analyses of scientific landscapes i...
STI 2022 - Generating large-scale network analyses of scientific landscapes i...STI 2022 - Generating large-scale network analyses of scientific landscapes i...
STI 2022 - Generating large-scale network analyses of scientific landscapes i...
 
KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
 
LiDIA: An integration architecture to query Linked Open Data from multiple da...
LiDIA: An integration architecture to query Linked Open Data from multiple da...LiDIA: An integration architecture to query Linked Open Data from multiple da...
LiDIA: An integration architecture to query Linked Open Data from multiple da...
 
Tracking research data footprints - slides
Tracking research data footprints - slidesTracking research data footprints - slides
Tracking research data footprints - slides
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
From Web 1.0 to Web 2.0 in the engineering information world - yesterday, tod...
From Web 1.0 to Web 2.0 in the engineering information world - yesterday, tod...From Web 1.0 to Web 2.0 in the engineering information world - yesterday, tod...
From Web 1.0 to Web 2.0 in the engineering information world - yesterday, tod...
 
Defining iot.schema.org: Using Knowledge Extraction from Existing IoT-based ...
Defining iot.schema.org: Using Knowledge Extraction from  Existing IoT-based ...Defining iot.schema.org: Using Knowledge Extraction from  Existing IoT-based ...
Defining iot.schema.org: Using Knowledge Extraction from Existing IoT-based ...
 
Experiments With Knowledge Graphs in Fisheries & Oceans Canada
Experiments With Knowledge Graphs in Fisheries & Oceans CanadaExperiments With Knowledge Graphs in Fisheries & Oceans Canada
Experiments With Knowledge Graphs in Fisheries & Oceans Canada
 
Data Science in 2016: Moving Up
Data Science in 2016: Moving UpData Science in 2016: Moving Up
Data Science in 2016: Moving Up
 
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
 
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
2nd International Conference on Cloud, Big Data and IoT (CBIoT 2021)
 
From Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data ScienceFrom Science to Data: Following a principled path to Data Science
From Science to Data: Following a principled path to Data Science
 
Real world e-science use-cases
Real world e-science use-casesReal world e-science use-cases
Real world e-science use-cases
 
Delivering Agile Data Science on Openshift - Red Hat Summit 2019
Delivering Agile Data Science on Openshift  - Red Hat Summit 2019Delivering Agile Data Science on Openshift  - Red Hat Summit 2019
Delivering Agile Data Science on Openshift - Red Hat Summit 2019
 

More from dgarijo

Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
dgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
dgarijo
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
dgarijo
 
Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods
dgarijo
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 

More from dgarijo (20)

WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigación
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)Similarity in Wikipedia Articles (EDBT Summer School)
Similarity in Wikipedia Articles (EDBT Summer School)
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods Is preserving data enough? Towards the preservation of scientific methods
Is preserving data enough? Towards the preservation of scientific methods
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
Towards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard RepresentationsTowards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representations
 
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline UsersWorkflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
 
Frag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific WorkflowsFrag Flow: Automated Fragment Detection in Scientific Workflows
Frag Flow: Automated Fragment Detection in Scientific Workflows
 
User requirments for geospatial provenance
User requirments for geospatial provenanceUser requirments for geospatial provenance
User requirments for geospatial provenance
 

Recently uploaded

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 

FAIR Workflows: A step closer to the Scientific Paper of the Future

  • 1. FAIR Workflows: A step closer to the Scientific Paper of the Future
    Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
    daniel.garijo@upm.es, @dgarijov
    Computational and Autonomous Workflows Workshop (CAW), 20th July, 2021
  • 2. FAIR Workflows: A step closer to the Scientific Paper of the Future. CAW-2021
    A few details about myself:
    - Semantic Web, Linked Data and Knowledge Graphs
    - Open Science best practices
    - Semantic Scientific Workflows (WINGS)
    - Provenance Standards (W3C PROV)
    - Software metadata representation and extraction
    - Research Objects (context)
  • 3. How I started: A personal view on reproducibility
    A simple summer job:
    - Reproduce this paper: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, et al. (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976
    - Paper was published 1 year before
    - Authors were available to help
    - Website with data used was available: http://funsite.sdsc.edu/drugome/TB/ (now a 404! See the Internet Archive)
    But:
    - No workflow (or sketch)
    - Input data had slight changes
    - Software licenses had expired
    - Some data cleaning steps (for final results) not available
    - Some authors were in different institutions
    Phil E. Bourne (UCSD, now Univ. of Virginia); Yolanda Gil (ISI, USC)
  • 4. How I started: A personal view on reproducibility
    Three months later, we were successful. We:
    - Quantified effort and expertise
    - Stored all resources in a wiki
    - Created desiderata for reproducibility
    Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
  • 5. Reproducibility: Scientists, Industry, General Public
  • 6. There is hope
  • 7. Changes in the scientific community: Open Data, Open Source Software, Open Access Publications, Credit and impact
  • 8. Changes in public institutions: Initiatives in Data Science
  • 9. Changes in publishers and funders
  • 10. Best practices and principles
    Other guidelines:
    ● Guidelines for Transparency and Openness Promotion (TOP)
    ● Reproducibility Enhancement Principles (REP)
    ● ...
  • 11. FAIR data principles in a nutshell
    Metadata:
    - Findable: Make sure your resource is findable in a public registry (e.g., by a search engine), and that it has a public unique id
    - Accessible: Your resource should be retrievable by using its identifier and a standard communication protocol (e.g., HTTP)
    - Interoperable: Use an existing standard to represent your resource
    - Reusable: Include documentation, provenance and license for your resource
    Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
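The four principles on this slide can be read as a checklist over a resource's metadata. The following is a minimal sketch in Python, assuming a hypothetical flat metadata dictionary. The field names (`identifier`, `registry`, `access_url`, `standard`, `documentation`, `provenance`, `license`) and the example.org URLs are illustrative assumptions, not part of the FAIR principles or of any standard schema.

```python
# Hypothetical mapping from each FAIR principle to illustrative metadata fields.
# These field names are assumptions for this sketch, not a standard vocabulary.
FAIR_FIELDS = {
    "Findable": ["identifier", "registry"],
    "Accessible": ["access_url"],
    "Interoperable": ["standard"],
    "Reusable": ["documentation", "provenance", "license"],
}


def missing_fair_fields(metadata):
    """Return {principle: [fields]} for fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not metadata.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps


# Placeholder record: it has an identifier but no provenance or license.
record = {
    "identifier": "https://example.org/id/my-dataset",
    "registry": "https://example.org/registry",
    "access_url": "https://example.org/id/my-dataset",
    "standard": "CSV",
    "documentation": "README.md",
}

print(missing_fair_fields(record))
```

A check like this only tells you that a field is present, not that its value is good; it is meant as a reminder list rather than a FAIRness score.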
  • 12. Extensions to FAIR
    Since 2016, much has been written about FAIR (e.g., a full special issue in Data Intelligence, 2020):
    - Software and services
    - Semantic artefacts
    - Workflows
    - ...
    Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober; FAIR Computational Workflows. Data Intelligence 2020; 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033
    Lamprecht, Anna-Lena et al. ‘Towards FAIR Principles for Research Software’. 1 Jan. 2020: 37–59. https://content.iospress.com/articles/data-science/ds190026
    Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86
  • 13. What does FAIR mean for computational workflows?
  • 14. Anatomy of a workflow
    Several aspects to consider:
    ● Data
    ● Software
      ○ Environment
      ○ Wrapper scripts
    ● Specification
    ● Configuration
    ● Provenance
  • 15. Anatomy of a workflow: Data
    ● Data
      ○ Apply FAIR:
        ■ Meaningful inputs
        ■ Meaningful intermediate results
        ■ Meaningful outputs
        ■ Streaming?
  • 16. Anatomy of a workflow: Software
    Software:
      ○ Tools & scripts (data preparation, visualization, etc.)
      ○ Wrapper scripts
    Aspects for FAIR:
      ○ Software changes and decays rapidly (version)
      ○ (Public) code repository
      ○ (Open) license
      ○ Credit (citation)
      ○ Documentation
  • 17. The importance of software metadata
    Software repository:
    ● Code resides there
    ● Support software evolution
    ● Support groups of developers
    Software registry:
    ● Capture metadata
    ● Useful structured information about the code
  • 18. The importance of software metadata: https://twitter.com/mitsuhiko/status/1410886329924194309
  • 19. Software and its computational environment
    Dependencies? OS?
      ○ Virtual environments
      ○ Package managers
      ○ Containers
      ○ Virtual machines
    Aspects for FAIR:
      ○ Landscape and standards change quickly
      ○ Documentation
      ○ Size (long term preservation)
    dockerpedia
  • 20. Computational methods
    Workflow:
      ○ Many workflow systems
        ■ Different capabilities
    Aspects for FAIR:
      ○ Public repositories
      ○ Standard representations (CWL)
      ○ Nested workflows?
      ○ Metadata and documentation
  • 21. Documentation: workflow sketches. Critical for creating human-readable descriptions!
  • 22. Workflow configurations
    [Figure: California, K = 10² cm/s; Florida, K = 0.001 cm/s]
    Fix certain data/parameters/software of a workflow:
      ○ Run in different regions
      ○ Calibrated models
      ○ Data compatibilities
      ○ Critical for end users (reuse)
    Aspects for FAIR:
    ● Where to include this information?
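One way to picture a workflow configuration is as a reference to a workflow plus a set of bindings that the end user is not expected to change. The sketch below assumes a hypothetical `WorkflowConfiguration` class, a placeholder workflow identifier, and a made-up parameter name `K_cm_per_s`; only the two regions and their K values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowConfiguration:
    """A workflow with some parameters fixed (hypothetical structure)."""
    workflow_id: str  # placeholder identifier, not a real workflow
    region: str
    fixed_parameters: dict = field(default_factory=dict)

    def bind(self, **user_inputs):
        """Merge user inputs with the fixed parameters.

        Fixed values cannot be overridden, which is what makes the
        configuration safe to hand to end users.
        """
        clash = set(user_inputs) & set(self.fixed_parameters)
        if clash:
            raise ValueError(f"fixed by the configuration: {sorted(clash)}")
        return {**self.fixed_parameters, **user_inputs}


# Same (placeholder) workflow, calibrated for two regions.
california = WorkflowConfiguration("example-workflow", "California", {"K_cm_per_s": 10**2})
florida = WorkflowConfiguration("example-workflow", "Florida", {"K_cm_per_s": 0.001})

print(california.bind(precipitation="Feb 2020"))
print(florida.bind(precipitation="Feb 2020"))
```

Keeping the configuration as its own object leaves the slide's open question visible: it still needs an identifier and metadata of its own if it is to be found and reused independently of the workflow.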
  • 23. Workflow provenance
    [Figure: California, K = 10² cm/s; runs with precipitation from Jan 2020, Feb 2020 and March 2020; provenance record for March 2020]
    A record of relevant past executions (and results) of a workflow:
    ● Debug
    ● Reference examples (cause for workflow decay)*
    ● Critical for reusability
    Aspects for FAIR: How to select (and represent) relevant provenance?
    *J. Zhao et al., "Why workflows break - Understanding and combating decay in Taverna workflows," 2012 IEEE 8th International Conference on E-Science, 2012, pp. 1-9, doi: 10.1109/eScience.2012.6404482.
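A provenance record for one run can be written down with a handful of W3C PROV terms. The following is a minimal sketch, assuming a JSON-LD-style dictionary built by a hypothetical helper `provenance_record`; the `urn:example:` identifiers are placeholders, and a real record would usually carry more detail (agents, plans, end times).

```python
import json

PROV_NS = "http://www.w3.org/ns/prov#"  # W3C PROV namespace


def provenance_record(run_id, inputs, outputs, started_at):
    """Describe one workflow run as a PROV activity with used/generated entities."""
    graph = [{
        "@id": run_id,
        "@type": "prov:Activity",
        "prov:startedAtTime": started_at,
        "prov:used": [{"@id": item} for item in inputs],
    }]
    for item in outputs:
        graph.append({
            "@id": item,
            "@type": "prov:Entity",
            "prov:wasGeneratedBy": {"@id": run_id},
        })
    return {"@context": {"prov": PROV_NS}, "@graph": graph}


# Placeholder identifiers for a run over March 2020 precipitation data.
march_run = provenance_record(
    run_id="urn:example:run/march-2020",
    inputs=["urn:example:data/precipitation-march-2020"],
    outputs=["urn:example:results/march-2020"],
    started_at="2020-04-01T00:00:00Z",
)

print(json.dumps(march_run, indent=2))
```

Which runs deserve such a record is exactly the selection question raised on the slide; the sketch only shows how a chosen run could be represented.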
  • 24. Data; Software (Environment, Wrapper scripts); Specification; Configuration; Provenance
  • 25. Data; Software (Environment, Wrapper scripts); Specification; Configuration; Provenance
  • 26. Data; Software (Environment, Wrapper scripts); Specification; Configuration; Provenance. How to aggregate everything together, while preserving its context?
  • 27. A solution: Workflow-centric Research Objects. https://www.researchobject.org/ro-crate/1.1/
  • 28. Workflow-centric Research Objects (in detail)
    Open community:
    - https://www.researchobject.org/ro-crate/community
    - https://github.com/ResearchObject/ro-crate
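To make the aggregation idea concrete, the sketch below builds the JSON-LD content of a small `ro-crate-metadata.json` file in the style of RO-Crate 1.1: a metadata file descriptor, a root dataset, a workflow file, and data files. The helper name `minimal_workflow_crate`, the file paths, and the crate name are hypothetical, and the output has not been validated against the specification, so treat it as an illustration rather than a conforming crate.

```python
import json


def minimal_workflow_crate(name, description, date_published,
                           license_url, workflow_path, data_paths):
    """Return a dict with the JSON-LD content of a small RO-Crate metadata file."""
    graph = [
        {   # Metadata file descriptor: points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: aggregates the workflow and its data.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": p} for p in [workflow_path, *data_paths]],
        },
        {   # The workflow specification itself.
            "@id": workflow_path,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
            "name": f"Workflow for {name}",
        },
    ]
    # One plain File entity per data file.
    graph += [{"@id": p, "@type": "File"} for p in data_paths]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical crate contents.
crate = minimal_workflow_crate(
    name="Example workflow crate",
    description="A workflow with one input file, packaged for illustration.",
    date_published="2021-07-20",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    workflow_path="workflow.cwl",
    data_paths=["inputs/precipitation.csv"],
)

print(json.dumps(crate, indent=2))
```

Configuration and provenance records could be listed under `hasPart` in the same way, which is how a single crate keeps the pieces of a workflow together with their context.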
  • 29. (Transition slide) A look beyond FAIR. Beyond FAIR workflows: A step closer to the Scientific Paper of the Future. https://egyptindependent.com/mans-first-steps-on-the-moon-reported-live-by-afp/
  • 30. The Scientific Paper of the Future
    www.scientificpaperofthefuture.org
    “Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance” Gil et al, Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136
    - Geophysics: Special Issue on Geoscience Papers of the Future
    - Special Section: Geoscience Papers of the Future
  • 31. The Scientific Paper of the Future
  • 32. An example
    Published articles: www.scientificpaperofthefuture.org/gpf/special-issue
    “Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance” Gil et al, Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136
    ● [David et al 2015]: 10 years of hydrology model software
    ● [Yu et al 2015]: Model coupling for surface/subsurface flow
    ● [Essawy et al 2015]: Hydrology workflows for reproducibility
    ● [Pope et al 2015]: Estimate subglacial lake depth from imagery
    ● [Fulweiler et al 2016]: Long-term estuary data & products
    ● [Tzeng et al 2016]: Data processing for ocean observatory
    ● [Demir et al 2017]: Sensor network for flood monitoring
    ● [Peckham et al 2017]: Hydrological modeling toolkit
    Adopting FAIR has a crucial social component
  • 33. It’s not only reproducibility...
    1. Practice open science and reproducible research
    2. Get credit for all your research products
       ● Citations for software, data, containers, notebooks, samples, …
    3. Increase citations of your papers
    4. Write impressive Data Management Plans
    5. Extend your CV with data and software sections
    6. Improve the management of your research assets
    7. Reproduce your work from years ago and build on it
    8. Address new funder and journal requirements
    9. Attract transformative students
    10. Demonstrate leadership by stepping into the future
  • 34. Towards the Scientific Paper of the Future
    Workflows are not just data in terms of FAIR:
    - Software, datasets, provenance, environment, context, etc.
    - Aggregation, versioning, sustainability, etc.
    Why do YOU want to support FAIR workflows?
    - Level of granularity (usefulness)
    - Scientific Paper of the Future
    A social change is needed:
    - Some practices take time but can be easily adopted:
      - Add persistent identifiers
      - Add a licence
      - Specify a citation
      - Add documentation
    Can you execute a workflow you last ran two years ago?
  • 35. Acknowledgements
    Some slides from this talk have been adapted from the Scientific Paper of the Future training materials by Yolanda Gil et al. under a CC-BY license: https://scientificpaperofthefuture.org, http://doi.org/10.5281/zenodo.159206
    Yolanda Gil, Oscar Corcho, Carole Goble, Stian Soiland-Reyes, Deborah Khider, Varun Ratnakar, Maximiliano Osorio, Hernán Vargas, Suzanne Pierce and all the participants of the Scientific Paper of the Future Initiative.
  • 36. Questions?
    Metadata (+ FAIR) is a love note to the future*
    Contact me at: daniel.garijo@upm.es, @dgarijov
    * https://www.slideshare.net/rlovinger/metadata-is-a-love-note-to-the-future