FAIR Workflows: A step closer to the Scientific Paper of the Future

Daniel Garijo, Ontology Engineering Group,
Universidad Politécnica de Madrid, Spain
daniel.garijo@upm.es
@dgarijov
Computational and Autonomous Workflows Workshop
(CAW) 20th July, 2021
FAIR Workflows: A step closer to the Scientific Paper of the Future. CAW-2021
A few details about myself
Semantic Web, Linked Data and Knowledge Graphs
Open Science best practices
Semantic Scientific Workflows (WINGS)
Provenance Standards (W3C PROV)
Software metadata representation and extraction
Research Objects (context)
How I started: A personal view on reproducibility
A simple summer job:
- Reproduce this paper:
- The paper had been published one year earlier
- The authors were available to help
- A website with the data used was available:
http://funsite.sdsc.edu/drugome/TB/ (now a 404! See the Internet Archive)
But:
- There was no workflow (or even a sketch)
- The input data had changed slightly
- Software licenses had expired
- Some data cleaning steps (needed for the final results) were not available
- Some authors had moved to different institutions
Phil E. Bourne (UCSD, now Univ. of Virginia)
Yolanda Gil (ISI, USC)
Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, et al. (2010) The
Mycobacterium tuberculosis Drugome and Its Polypharmacological
Implications. PLoS Comput Biol 6(11): e1000976
How I started: A personal view on Reproducibility
Three months later, we were successful. We:
- Quantified the effort and expertise required
- Stored all resources in a wiki
- Created a set of desiderata for reproducibility
Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case
of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
Reproducibility
Scientists
Industry
General Public
There is hope
Changes in the scientific community
Open Data
Open Source Software
Open Access Publications
Credit and impact
Changes in public institutions: Initiatives in Data Science
Changes in publishers and funders
Best practices and principles
Other guidelines:
● Guidelines for Transparency and
Openness Promotion (TOP)
● Reproducibility Enhancement
Principles (REP)
● ...
FAIR data principles in a nutshell
Metadata
Findable: make sure your resource is findable in a public registry (e.g., by a search engine) and has a public, unique identifier
Accessible: your resource should be retrievable by using its identifier and a standard communication protocol (e.g., HTTP)
Interoperable: use an existing standard to represent your resource
Reusable: include documentation, provenance, and a license for your resource
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3,
160018 (2016). https://doi.org/10.1038/sdata.2016.18
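The four principles above can be turned into a rough self-check on a metadata record. The sketch below is a toy checklist, not an official FAIR assessment. The field names (`identifier`, `access_url`, `format`, `license`, `provenance`, `documentation`) and the example URLs are hypothetical, and they only mirror the wording of the summary.

```python
from urllib.parse import urlparse


def fair_report(record: dict) -> dict:
    """Return a True/False flag per FAIR letter for a (hypothetical) metadata record."""
    access = urlparse(record.get("access_url", ""))
    return {
        # Findable: the resource has a public, unique identifier
        "F": bool(record.get("identifier")),
        # Accessible: retrievable through a standard protocol such as HTTP(S)
        "A": access.scheme in ("http", "https") and bool(access.netloc),
        # Interoperable: described with an existing, named standard format
        "I": bool(record.get("format")),
        # Reusable: documentation, provenance, and a license are all present
        "R": all(record.get(k) for k in ("license", "provenance", "documentation")),
    }


example = {
    "identifier": "https://doi.org/10.xxxx/example",  # hypothetical DOI
    "access_url": "https://example.org/data.csv",     # hypothetical location
    "format": "text/csv",
    "license": "CC-BY-4.0",
    "documentation": "README.md",
    # provenance deliberately missing
}
print(fair_report(example))  # {'F': True, 'A': True, 'I': True, 'R': False}
```

A checklist like this only detects whether fields are present. Whether the identifier actually resolves, or the format is a community standard, still needs human judgement.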
Extensions to FAIR
Since 2016, much has been written about FAIR (e.g., a full special issue in Data Intelligence, 2020), including extensions to:
- Software and services
- Semantic artefacts
- Workflows
- ...
Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033
Lamprecht, Anna-Lena et al. Towards FAIR Principles for Research Software. Data Science, 2020: 37–59. https://content.iospress.com/articles/data-science/ds190026
Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science
2:e86. DOI: 10.7717/peerj-cs.86
What does FAIR mean for computational workflows?
Anatomy of a workflow
Several aspects to consider:
● Data
● Software
○ Environment
○ Wrapper scripts
● Specification
● Configuration
● Provenance
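One way to keep all of these aspects together is to model them as a single record. This is a minimal sketch with hypothetical class and field names. It is not the schema of any particular workflow system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkflowDescription:
    """Hypothetical container for the aspects of a workflow listed above."""
    specification: str                                   # e.g. path to a CWL file
    data: list[str] = field(default_factory=list)        # inputs, intermediate results, outputs
    software: list[str] = field(default_factory=list)    # tools and wrapper scripts
    environment: str = ""                                # e.g. a container image reference
    configuration: dict = field(default_factory=dict)    # parameters fixed for a given use
    provenance: list[str] = field(default_factory=list)  # records of past executions

    def missing_aspects(self) -> list[str]:
        """Names of aspects that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


wf = WorkflowDescription(
    specification="workflow.cwl",  # hypothetical file names
    data=["input.csv"],
    software=["clean.py"],
)
print(wf.missing_aspects())  # ['environment', 'configuration', 'provenance']
```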
Anatomy of a workflow: Data
● Data
○ Apply FAIR:
■ Meaningful inputs
■ Meaningful intermediate results
■ Meaningful outputs
■ Streaming?
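A simple way to give inputs, intermediate results, and outputs a stable, unique identifier is to derive one from their content. The sketch below assumes SHA-256 checksums, a `sha256:` prefix, and `role` labels chosen for illustration. A checksum complements, rather than replaces, a resolvable persistent identifier such as a DOI.

```python
import hashlib


def content_id(payload: bytes) -> str:
    """Content-derived identifier: identical bytes always get the same id."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def register(products: dict) -> dict:
    """Map each data product (name -> (role, bytes)) to its role and content id."""
    return {
        name: {"role": role, "id": content_id(data)}
        for name, (role, data) in products.items()
    }


# Hypothetical products of a small workflow run
catalog = register({
    "raw.csv": ("input", b"site,value\nA,1\n"),
    "clean.csv": ("intermediate", b"site,value\nA,1.0\n"),
    "summary.txt": ("output", b"mean=1.0\n"),
})
for name, entry in catalog.items():
    print(name, entry["role"], entry["id"][:19])
```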
Anatomy of a workflow: Software
Software
○ Tools & scripts (data preparation, visualization, etc.)
○ Wrapper scripts
Aspects for FAIR:
○ Software changes and decays rapidly (versioning)
○ (Public) code repository
○ (Open) license
○ Credit (citation)
○ Documentation
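Because software changes and decays rapidly, it helps to record the exact versions a workflow ran with. The sketch below uses Python's standard `importlib.metadata` (Python 3.8+). The package names passed in are only examples, and the second one is deliberately fake.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def pin_versions(packages: list) -> dict:
    """Return {package: installed version or None}, plus the interpreter version."""
    pins = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            pins[name] = version(name)
        except PackageNotFoundError:
            pins[name] = None  # not installed: worth flagging before sharing the workflow
    return pins


print(pin_versions(["pip", "surely-not-a-real-package-xyz"]))
```

Storing this dictionary next to each run's outputs is a cheap first step. Lock files or containers capture the environment more completely.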
The importance of software metadata
Software repository
● Hosts the code
● Supports software evolution
● Supports groups of developers
Software registry
● Captures metadata
● Provides useful structured information about the code
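Registries can harvest such metadata from a machine-readable file kept next to the code. The sketch below builds a small record using CodeMeta 2.0 terms (`SoftwareSourceCode`, `codeRepository`, `license`, `version`). The tool name, repository URL, and author are hypothetical placeholders, and the list of "required" fields is our own assumption.

```python
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "my-workflow-tool",  # hypothetical
    "description": "Data preparation step of an example workflow",
    "version": "1.2.0",  # the released version a workflow depends on
    "codeRepository": "https://github.com/example/my-workflow-tool",  # hypothetical
    "license": "https://spdx.org/licenses/Apache-2.0",
    "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
}


def missing_fields(record: dict,
                   required=("name", "version", "codeRepository", "license")) -> list:
    """Fields (assumed minimum) a registry would need to describe the software usefully."""
    return [key for key in required if not record.get(key)]


text = json.dumps(codemeta, indent=2)  # save as codemeta.json in the repository root
print(missing_fields(codemeta))  # []
```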
The importance of software metadata
https://twitter.com/mitsuhiko/status/1410886329924194309
Software and its computational environment
Dependencies? OS?
○ Virtual environments
○ Package managers
○ Containers
○ Virtual machines
Aspects for FAIR:
○ Landscape and standards change quickly
○ Documentation
○ Size (long-term preservation)
dockerpedia
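Even when no container or virtual machine is shared, recording basic facts about the computational environment helps others rebuild it. The sketch below uses only the standard `platform` and `sys` modules. The output keys are our own naming.

```python
import json
import platform
import sys


def describe_environment() -> dict:
    """Snapshot of the OS and interpreter a workflow step ran on."""
    return {
        "os": platform.system(),             # e.g. 'Linux', 'Darwin', 'Windows'
        "os_release": platform.release(),
        "architecture": platform.machine(),  # e.g. 'x86_64', 'arm64'
        "python": platform.python_version(),
        "executable": sys.executable,
    }


print(json.dumps(describe_environment(), indent=2))
```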
Computational methods
Workflow
○ Many workflow systems
■ Different capabilities
Aspects for FAIR:
○ Public repositories
○ Standard representations (CWL)
○ Nested workflows?
○ Metadata and documentation
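As an illustration of a standard, engine-independent representation, here is a minimal CWL `CommandLineTool` built as a Python dictionary and serialized to JSON. CWL documents are YAML, and JSON is valid YAML. The wrapped command (`wc -l`) and the names `table` and `line_count` are hypothetical. Validating the result with a CWL runner such as `cwltool --validate` is left to the reader.

```python
import json

# Hypothetical step: count the lines of a tabular input file.
count_lines = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["wc", "-l"],
    "inputs": {
        "table": {"type": "File", "inputBinding": {"position": 1}},
    },
    "stdout": "line_count.txt",
    "outputs": {
        "line_count": {"type": "stdout"},  # captures the tool's standard output
    },
}

cwl_text = json.dumps(count_lines, indent=2)
print(cwl_text)  # save as count_lines.cwl
```

Steps like this one can then be composed in a CWL `Workflow`, which is also how nested workflows are expressed.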
Documentation: workflow sketches
Critical for creating human-readable descriptions!
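Sketches can also be generated from the workflow description itself, so they do not drift out of date. The sketch below turns a hypothetical list of steps (name, inputs, outputs) into Graphviz DOT text. Rendering it, e.g. with `dot -Tpng`, is assumed to happen outside Python.

```python
def to_dot(steps: list) -> str:
    """Render (step, inputs, outputs) triples as a Graphviz digraph: data -> step -> data."""
    lines = ["digraph workflow {", "  rankdir=LR;"]
    for name, inputs, outputs in steps:
        lines.append(f'  "{name}" [shape=box];')  # steps as boxes, data as default ellipses
        lines += [f'  "{item}" -> "{name}";' for item in inputs]
        lines += [f'  "{name}" -> "{item}";' for item in outputs]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-step workflow
steps = [
    ("clean", ["raw.csv"], ["clean.csv"]),
    ("plot", ["clean.csv"], ["figure.png"]),
]
print(to_dot(steps))
```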
Workflow configurations
(Figure: the same workflow configured for California, K = 10² cm/s, and for Florida, K = 0.001 cm/s)
Fix certain data/parameters/software of a workflow:
○ Run in different regions
○ Calibrated models
○ Data compatibilities
○ Critical for end users (reuse)
Aspects for FAIR:
● Where to include this information?
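One possible place for this information is a named configuration stored alongside the workflow. It fixes some parameters and leaves the rest to the user. The sketch below reuses the two regions and conductivity values from the slide's example. The parameter names (`hydraulic_conductivity_cm_s`, `precipitation_file`) and the `instantiate` helper are hypothetical.

```python
# Named configurations of one (hypothetical) hydrology workflow.
# Non-None values are fixed per region; None marks inputs the end user must supply.
CONFIGURATIONS = {
    "california": {"hydraulic_conductivity_cm_s": 1e2, "precipitation_file": None},
    "florida": {"hydraulic_conductivity_cm_s": 0.001, "precipitation_file": None},
}


def instantiate(config_name: str, **user_inputs) -> dict:
    """Merge user inputs into a named configuration without overriding fixed values."""
    base = CONFIGURATIONS[config_name]
    fixed = {key for key, value in base.items() if value is not None}
    clashes = fixed & set(user_inputs)
    if clashes:
        raise ValueError(f"{sorted(clashes)} are fixed by the '{config_name}' configuration")
    run = {**base, **user_inputs}
    missing = [key for key, value in run.items() if value is None]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return run


print(instantiate("florida", precipitation_file="precip_2020_03.csv"))
```

Refusing to override fixed values keeps a calibrated configuration intact. A user who wants a different conductivity would define a new named configuration instead.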
Workflow provenance
(Figure: the California configuration, K = 10² cm/s)
A record of relevant past executions (and results) of a workflow:
● Debug
● Reference examples (missing examples are a cause of workflow decay)*
● Critical for reusability
Aspects for FAIR: How to select (and represent) relevant provenance?
*J. Zhao et al., "Why workflows break — Understanding and combating decay in Taverna workflows," 2012 IEEE 8th International Conference on E-Science, 2012, pp. 1-9, doi: 10.1109/eScience.2012.6404482.
(Figure: runs with precipitation data from Jan 2020, Feb 2020, and March 2020, and the provenance record of the March 2020 run)
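A provenance record does not need to be elaborate to be useful. The sketch below builds a small record for one run, loosely following the W3C PROV-JSON layout (`entity`, `activity`, `used`, `wasGeneratedBy`). The `ex:` identifiers, file names, timestamps, and the custom `ex:configuration` attribute are hypothetical. Real systems would typically also record the agent and software versions.

```python
import json


def provenance_record(run_id, inputs, outputs, start, end, config=None):
    """Minimal PROV-JSON-style record: one activity plus the entities it used and generated."""
    record = {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {entity: {} for entity in inputs + outputs},
        "activity": {run_id: {"prov:startTime": start, "prov:endTime": end}},
        "used": {},
        "wasGeneratedBy": {},
    }
    if config is not None:
        # Custom (non-PROV) attribute linking the run to the configuration it used
        record["activity"][run_id]["ex:configuration"] = config
    for i, entity in enumerate(inputs):
        record["used"][f"_:u{i}"] = {"prov:activity": run_id, "prov:entity": entity}
    for i, entity in enumerate(outputs):
        record["wasGeneratedBy"][f"_:g{i}"] = {"prov:entity": entity, "prov:activity": run_id}
    return record


def inputs_of(record, output):
    """Entities used by whichever activity generated `output`."""
    runs = {
        g["prov:activity"]
        for g in record["wasGeneratedBy"].values()
        if g["prov:entity"] == output
    }
    return sorted(
        u["prov:entity"] for u in record["used"].values() if u["prov:activity"] in runs
    )


march = provenance_record(
    "ex:run_2020_03",
    inputs=["ex:precipitation_2020_03"],
    outputs=["ex:streamflow_2020_03"],
    start="2020-03-31T10:00:00",
    end="2020-03-31T10:42:00",
    config="ex:california",
)
print(json.dumps(march, indent=2))
print(inputs_of(march, "ex:streamflow_2020_03"))  # ['ex:precipitation_2020_03']
```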
● Data
● Software
○ Environment
○ Wrapper scripts
● Specification
● Configuration
● Provenance
How to aggregate everything together while preserving its context?
A solution: Workflow-centric Research Objects
https://www.researchobject.org/ro-crate/1.1/
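To make this concrete, the sketch below assembles a minimal `ro-crate-metadata.json` for a workflow. It follows the RO-Crate 1.1 conventions we are confident of:

- a metadata descriptor;
- a root `Dataset`;
- `mainEntity` pointing at a file typed as `ComputationalWorkflow`.

The file names, license, and choice of parts are hypothetical. Check the RO-Crate 1.1 specification and the Workflow RO-Crate profile before relying on it.

```python
import json


def workflow_crate(workflow_file, parts, name, license_url):
    """Build a minimal RO-Crate 1.1 metadata document for a workflow and its files."""
    graph = [
        {   # metadata descriptor: states which specification this file conforms to
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root dataset: aggregates everything and names the main workflow
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "license": {"@id": license_url},
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": p} for p in [workflow_file] + parts],
        },
        {
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
    ]
    graph += [{"@id": p, "@type": "File"} for p in parts]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = workflow_crate(
    "workflow.cwl",  # hypothetical file names
    parts=["input.csv", "provenance.json", "README.md"],
    name="Example workflow crate",
    license_url="https://spdx.org/licenses/CC-BY-4.0",
)
print(json.dumps(crate, indent=2))
```

The flat `@graph` with `@id` references is what lets data, software, provenance, and documentation stay together while each keeps its own identity and metadata.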
Workflow-centric Research Objects (in detail)
Open community:
https://www.researchobject.org/ro-crate/community
https://github.com/ResearchObject/ro-crate
Beyond FAIR workflows: a step closer to the Scientific Paper of the Future
(Image credit: https://egyptindependent.com/mans-first-steps-on-the-moon-reported-live-by-afp/)
The Scientific Paper of the Future
www.scientificpaperofthefuture.org
“Towards the Geoscience Paper of the Future: Best Practices
for Documenting and Sharing Research from Data to Software
to Provenance” Gil et al, Earth and Space Science, 2016.
http://dx.doi.org/10.1002/2015EA000136
Geophysics: Special Issue on Geoscience Papers of the Future
Special Section: Geoscience Papers of the Future
The Scientific Paper of the Future
An example
Published Articles
www.scientificpaperofthefuture.org/gpf/special-issue
● [David et al 2015]: 10 years of hydrology model software
● [Yu et al 2015]: Model coupling for surface/subsurface flow
● [Essawy et al 2015]: Hydrology workflows for reproducibility
● [Pope et al 2015]: Estimate subglacial lake depth from imagery
● [Fulweiler et al 2016]: Long-term estuary data & products
● [Tzeng et al 2016]: Data processing for ocean observatory
● [Demir et al 2017]: Sensor network for flood monitoring
● [Peckham et al 2017]: Hydrological modeling toolkit
Adopting FAIR has a crucial social component
It’s not only reproducibility...
1. Practice open science and reproducible research
2. Get credit for all your research products
● Citations for software, data, containers, notebooks, samples, …
3. Increase citations of your papers
4. Write impressive Data Management Plans
5. Extend your CV with data and software sections
6. Improve the management of your research assets
7. Reproduce your work from years ago and build on it
8. Address new funder and journal requirements
9. Attract transformative students
10. Demonstrate leadership by stepping into the future
Towards the Scientific Paper of the Future
Workflows are not just data in terms of FAIR
- Software, datasets, provenance, environment, context, etc.
- Aggregation, versioning, sustainability, etc.
Why do YOU want to support FAIR workflows?
- Level of granularity (usefulness)
- Scientific Paper of the Future
A social change is needed
- Some practices take time, but others can be easily adopted:
- Add persistent identifiers
- Add a license
- Specify how to cite your work
- Add documentation
Can you execute a workflow you last ran two years ago?
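The easy practices listed above can be checked automatically before publishing a workflow repository. The sketch below uses `pathlib`, and its assumptions are our own:

- The accepted file names (`LICENSE`, `CITATION.cff`, `README.md`, and a few variants) are common conventions, not requirements of any standard.
- A persistent identifier counts as present only when a DOI-like string appears in the README.

```python
import re
import tempfile
from pathlib import Path

CHECKS = {  # aspect -> accepted file names (assumed conventions)
    "license": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "citation": ["CITATION.cff", "CITATION.md", "CITATION"],
    "documentation": ["README.md", "README.rst", "README.txt", "README"],
}
DOI = re.compile(r"\b10\.\d{4,9}/\S+")  # simplified DOI pattern


def quick_wins(repo: Path) -> dict:
    """Report which of the easy FAIR practices a repository already follows."""
    report = {aspect: any((repo / f).is_file() for f in names)
              for aspect, names in CHECKS.items()}
    readmes = [repo / f for f in CHECKS["documentation"] if (repo / f).is_file()]
    report["persistent_id"] = any(DOI.search(p.read_text(errors="ignore")) for p in readmes)
    return report


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("Archived at https://doi.org/10.5281/zenodo.159206\n")
    (repo / "LICENSE").write_text("Apache License 2.0\n")
    print(quick_wins(repo))
    # {'license': True, 'citation': False, 'documentation': True, 'persistent_id': True}
```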
Acknowledgements
Some slides from this talk have been adapted from the
Scientific Paper of the Future training materials by
Yolanda Gil et al. under a CC-BY license
https://scientificpaperofthefuture.org http://doi.org/10.5281/zenodo.159206
Yolanda Gil, Oscar Corcho, Carole Goble, Stian
Soiland-Reyes, Deborah Khider, Varun Ratnakar,
Maximiliano Osorio, Hernán Vargas, Suzanne Pierce
and all the participants of the Scientific Paper of the
Future Initiative.
Questions?
Metadata (+ FAIR) is a love note to the future*
* https://www.slideshare.net/rlovinger/metadata-is-a-love-note-to-the-future
Contact me at:
daniel.garijo@upm.es
@dgarijov
Effect of deep chemical mixing columns on properties of surrounding soft clay...
AltinKaradagli6 views
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ... by AltinKaradagli
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
Investigation of Physicochemical Changes of Soft Clay around Deep Geopolymer ...
AltinKaradagli6 views
MSA Website Slideshow (16).pdf by msaucla
MSA Website Slideshow (16).pdfMSA Website Slideshow (16).pdf
MSA Website Slideshow (16).pdf
msaucla58 views
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) by Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Instrumentation & Control Lab Manual.pdf by NTU Faisalabad
Instrumentation & Control Lab Manual.pdfInstrumentation & Control Lab Manual.pdf
Instrumentation & Control Lab Manual.pdf
NTU Faisalabad 5 views
Machine learning in drug supply chain management during disease outbreaks: a ... by IJECEIAES
Machine learning in drug supply chain management during disease outbreaks: a ...Machine learning in drug supply chain management during disease outbreaks: a ...
Machine learning in drug supply chain management during disease outbreaks: a ...
IJECEIAES10 views
Generative AI Models & Their Applications by SN
Generative AI Models & Their ApplicationsGenerative AI Models & Their Applications
Generative AI Models & Their Applications
SN6 views

FAIR Workflows: A step closer to the Scientific Paper of the Future

  • 1. FAIR Workflows: A step closer to the Scientific Paper of the Future. Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain (daniel.garijo@upm.es, @dgarijov). Computational and Autonomous Workflows Workshop (CAW), 20th July, 2021.
  • 2. FAIR Workflows: A step closer to the Scientific Paper of the Future. CAW-2021. A few details about myself: Semantic Web, Linked Data and Knowledge Graphs; Open Science best practices; Semantic Scientific Workflows (WINGS); Provenance Standards (W3C PROV); Software metadata representation and extraction; Research Objects (context).
  • 3. How I started: A personal view on reproducibility. A simple summer job: - Reproduce this paper: Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, et al. (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976 - The paper had been published 1 year before - The authors were available to help - The website with the data used was available: http://funsite.sdsc.edu/drugome/TB/ (now a 404; see the Internet Archive). But: - There was no workflow (or sketch) - The input data had slight changes - Software licenses had expired - Some data cleaning steps (for final results) were not available - Some authors were at different institutions. Phil E. Bourne (UCSD, now Univ. of Virginia); Yolanda Gil (ISI, USC).
  • 4. How I started: A personal view on reproducibility. Three months later, we were successful. We: - Quantified effort and expertise - Stored all resources in a wiki - Created a set of desiderata for reproducibility. Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
  • 5. Reproducibility. [Figure: Scientists; Industry; General Public]
  • 6. There is hope.
  • 7. Changes in the scientific community: Open Data; Open Source Software; Open Access Publications; Credit and impact.
  • 8. Changes in public institutions: Initiatives in Data Science.
  • 9. Changes in publishers and funders.
  • 10. Best practices and principles. Other guidelines: ● Guidelines for Transparency and Openness Promotion (TOP) ● Reproducibility Enhancement Principles (REP) ● ...
  • 11. FAIR data principles in a nutshell (metadata): ● Findable: make sure your resource is findable in a public registry (e.g., by a search engine) and has a public unique identifier ● Accessible: your resource should be retrievable by using its identifier and a standard communication protocol (e.g., HTTP) ● Interoperable: use an existing standard to represent your resource ● Reusable: include documentation, provenance and a license for your resource. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
  • 12. Extensions to FAIR. Since 2016, much has been written about FAIR (e.g., a full special issue in Data Intelligence, 2020): - Software and services - Semantic artefacts - Workflows - ... References: Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober; FAIR Computational Workflows. Data Intelligence 2020; 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033. Lamprecht, Anna-Lena et al. 'Towards FAIR Principles for Research Software'. 1 Jan. 2020: 37–59. https://content.iospress.com/articles/data-science/ds190026. Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86
  • 13. What does FAIR mean for computational workflows?
  • 14. Anatomy of a workflow. Several aspects to consider: ● Data ● Software ○ Environment ○ Wrapper scripts ● Specification ● Configuration ● Provenance
  • 15. Anatomy of a workflow: Data. ● Data ○ Apply FAIR: ■ Meaningful inputs ■ Meaningful intermediate results ■ Meaningful outputs ■ Streaming?
  • 16. Anatomy of a workflow: Software. ● Software ○ Tools & scripts (data preparation, visualization, etc.) ○ Wrapper scripts. Aspects for FAIR: ○ Software changes and decays rapidly (versioning) ○ (Public) code repository ○ (Open) license ○ Credit (citation) ○ Documentation
  • 17. The importance of software metadata. Software repository: ● Code resides there ● Supports software evolution ● Supports groups of developers. Software registry: ● Captures metadata ● Provides useful structured information about the code
  • 18. The importance of software metadata: https://twitter.com/mitsuhiko/status/1410886329924194309
  • 19. Software and its computational environment. Dependencies? OS? ○ Virtual environments ○ Package managers ○ Containers ○ Virtual machines. Aspects for FAIR: ○ Landscape and standards change quickly ○ Documentation ○ Size (long-term preservation). [Image: DockerPedia]
  • 20. Computational methods. Workflow: ○ Many workflow systems ■ Different capabilities. Aspects for FAIR: ○ Public repositories ○ Standard representations (CWL) ○ Nested workflows? ○ Metadata and documentation
  • 21. Documentation: workflow sketches. Critical for creating human-readable descriptions!
  • 22. Workflow configurations. Fix certain data/parameters/software of a workflow: ○ Run in different regions ○ Calibrated models ○ Data compatibilities ○ Critical for end users (reuse). Aspects for FAIR: ● Where to include this information? [Figure: the same workflow configured for California (K = 10² cm/s) and Florida (K = 0.001 cm/s)]
  • 23. Workflow provenance. A record of relevant past executions (and results) of a workflow: ● Debug ● Reference examples (missing examples are a cause of workflow decay)* ● Critical for reusability. Aspects for FAIR: how to select (and represent) relevant provenance? [Figure: the California configuration (K = 10² cm/s) run with precipitation from Jan 2020, Feb 2020 and March 2020, with the provenance record for March 2020] *J. Zhao et al., "Why workflows break: Understanding and combating decay in Taverna workflows," 2012 IEEE 8th International Conference on E-Science, 2012, pp. 1-9, doi: 10.1109/eScience.2012.6404482.
  • 24. ● Data ● Software ○ Environment ○ Wrapper scripts ● Specification ● Configuration ● Provenance
  • 25. ● Data ● Software ○ Environment ○ Wrapper scripts ● Specification ● Configuration ● Provenance
  • 26. ● Data ● Software ○ Environment ○ Wrapper scripts ● Specification ● Configuration ● Provenance. How to aggregate everything together while preserving its context?
  • 27. A solution: Workflow-centric Research Objects (RO-Crate): https://www.researchobject.org/ro-crate/1.1/
  • 28. Workflow-centric Research Objects (in detail). Open community: https://www.researchobject.org/ro-crate/community, https://github.com/ResearchObject/ro-crate
  • 29. Transition slide: A look beyond FAIR. Beyond FAIR workflows: A step closer to the Scientific Paper of the Future. (Image: https://egyptindependent.com/mans-first-steps-on-the-moon-reported-live-by-afp/)
  • 30. The Scientific Paper of the Future. www.scientificpaperofthefuture.org. "Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance", Gil et al., Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136. Geophysics: Special Issue on Geoscience Papers of the Future; Special Section: Geoscience Papers of the Future.
  • 31. The Scientific Paper of the Future.
  • 32. An example. Published articles (www.scientificpaperofthefuture.org/gpf/special-issue), following "Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance", Gil et al., Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136: ● [David et al 2015]: 10 years of hydrology model software ● [Yu et al 2015]: Model coupling for surface/subsurface flow ● [Essawy et al 2015]: Hydrology workflows for reproducibility ● [Pope et al 2015]: Estimate subglacial lake depth from imagery ● [Fulweiler et al 2016]: Long-term estuary data & products ● [Tzeng et al 2016]: Data processing for ocean observatory ● [Demir et al 2017]: Sensor network for flood monitoring ● [Peckham et al 2017]: Hydrological modeling toolkit. Adopting FAIR has a crucial social component.
  • 33. It’s not only reproducibility... 1. Practice open science and reproducible research 2. Get credit for all your research products ● Citations for software, data, containers, notebooks, samples, … 3. Increase citations of your papers 4. Write impressive Data Management Plans 5. Extend your CV with data and software sections 6. Improve the management of your research assets 7. Reproduce your work from years ago and build on it 8. Address new funder and journal requirements 9. Attract transformative students 10. Demonstrate leadership by stepping into the future
  • 34. Towards the Scientific Paper of the Future. Workflows are not just data in terms of FAIR: - Software, datasets, provenance, environment, context, etc. - Aggregation, versioning, sustainability, etc. Why do YOU want to support FAIR workflows? - Level of granularity (usefulness) - Scientific Paper of the Future. A social change is needed: - Some practices take time, but they can be easily adopted: - Add persistent identifiers - Add a license - Specify how to cite - Add documentation. Can you execute a workflow you last ran two years ago?
  • 35. Acknowledgements. Some slides from this talk have been adapted from the Scientific Paper of the Future training materials by Yolanda Gil et al., under a CC-BY license (https://scientificpaperofthefuture.org, http://doi.org/10.5281/zenodo.159206). Thanks to Yolanda Gil, Oscar Corcho, Carole Goble, Stian Soiland-Reyes, Deborah Khider, Varun Ratnakar, Maximiliano Osorio, Hernán Vargas, Suzanne Pierce and all the participants of the Scientific Paper of the Future Initiative.
  • 36. Questions? Metadata (+ FAIR) is a love note to the future.* Contact me at: daniel.garijo@upm.es, @dgarijov. *https://www.slideshare.net/rlovinger/metadata-is-a-love-note-to-the-future
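Slide 11 frames FAIR as a set of metadata requirements. The check below is a minimal sketch that flags which of those requirements a resource description leaves unmet. It assumes a hypothetical `ResourceMetadata` record whose field names are illustrative and not taken from any standard. It only inspects the metadata. It does not verify that the identifier actually resolves or that the resource is registered anywhere.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class ResourceMetadata:
    # Hypothetical minimal record; these field names are illustrative, not a standard schema.
    identifier: str = ""     # F: public unique identifier (ideally a resolvable PID)
    standard: str = ""       # I: existing standard used to represent the resource
    license: str = ""        # R: usage license
    documentation: str = ""  # R: human-readable documentation
    provenance: str = ""     # R: where the resource comes from


def fair_gaps(md: ResourceMetadata) -> list:
    """Return human-readable gaps, tagged with the FAIR letter they affect."""
    gaps = []
    if not md.identifier:
        gaps.append("F: no public unique identifier")
    elif urlparse(md.identifier).scheme not in ("http", "https"):
        # A: the identifier should resolve through a standard protocol such as HTTP(S).
        gaps.append("A: identifier is not resolvable over HTTP(S)")
    if not md.standard:
        gaps.append("I: no standard representation declared")
    for name in ("license", "documentation", "provenance"):
        if not getattr(md, name):
            gaps.append(f"R: missing {name}")
    return gaps


record = ResourceMetadata(
    identifier="https://doi.org/10.1162/dint_a_00033",
    license="https://spdx.org/licenses/CC-BY-4.0",
)
print(fair_gaps(record))
# ['I: no standard representation declared', 'R: missing documentation', 'R: missing provenance']
```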
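Slides 16 and 17 list the FAIR aspects of software: version, repository, license, credit and documentation. They also stress structured metadata. One existing vocabulary for this is CodeMeta. The sketch below assumes CodeMeta 2.0 and emits a `codemeta.json` for a hypothetical wrapper script. The name `my-wrapper`, its repository URL and its author are placeholders. Registries and tools may require more properties than shown here.

```python
import json


def codemeta_record(name, version, repository, spdx_license, authors, description=""):
    """Build a minimal CodeMeta 2.0 (JSON-LD) description of a software component.

    authors: list of (given_name, family_name) tuples.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,            # software changes quickly: pin a version
        "codeRepository": repository,  # (public) code repository
        "license": f"https://spdx.org/licenses/{spdx_license}",  # (open) license
        "description": description,    # entry point for documentation
        "author": [                    # credit: who to cite
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# Hypothetical wrapper script used by a workflow step.
meta = codemeta_record(
    name="my-wrapper",
    version="1.0.0",
    repository="https://github.com/example/my-wrapper",
    spdx_license="Apache-2.0",
    authors=[("Jane", "Doe")],
    description="Wrapper script that prepares inputs for a workflow step.",
)
print(json.dumps(meta, indent=2))  # save as codemeta.json in the repository root
```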
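Slide 19 asks how to document dependencies and the OS. Containers and virtual machines are the heavier answers. A lightweight complement is to snapshot the interpreter, platform and installed package versions with the standard library, then store the result next to the workflow outputs. The sketch below assumes the workflow step runs in Python, and `capture_environment` is a hypothetical helper name.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages=None):
    """Snapshot OS, interpreter and installed package versions.

    packages: optional set of lower-case distribution names to keep;
    None keeps every installed distribution.
    """
    installed = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name and (packages is None or name.lower() in packages):
            installed[name] = dist.version
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "python_executable": sys.executable,
        "packages": dict(sorted(installed.items())),
    }


if __name__ == "__main__":
    # Store this next to the workflow outputs (e.g. as environment.json).
    print(json.dumps(capture_environment(), indent=2))
```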
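Slide 20 points to standard representations such as CWL. The sketch below writes the classic single-step `echo` tool as CWL v1.2. It builds the document as a Python dict because CWL documents may be written in JSON as well as YAML. `toy_command_line` is a deliberately simplified, hypothetical illustration of how positional input bindings become a command line; it is not a CWL engine.

```python
import json


def echo_tool():
    """A minimal CWL v1.2 CommandLineTool, expressed as a dict (CWL accepts JSON)."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": "echo",
        "inputs": {
            "message": {"type": "string", "inputBinding": {"position": 1}},
        },
        "outputs": [],
    }


def toy_command_line(tool, job):
    """Toy illustration only (NOT a CWL engine): place bound inputs by position.

    Real engines also handle prefixes, files, types, defaults and containers.
    """
    base = tool["baseCommand"]
    cmd = [base] if isinstance(base, str) else list(base)
    bound = sorted(
        (spec["inputBinding"]["position"], name)
        for name, spec in tool["inputs"].items()
        if "inputBinding" in spec
    )
    return cmd + [str(job[name]) for _, name in bound]


if __name__ == "__main__":
    print(json.dumps(echo_tool(), indent=2))  # save as echo.cwl
    print(toy_command_line(echo_tool(), {"message": "Hello world"}))
```

This assumes `cwltool` (the CWL reference runner) is installed. If so, saving the printed document as `echo.cwl` and running `cwltool echo.cwl --message "Hello world"` should execute it; the CWL user guide covers the details.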
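Slide 22 describes configurations that fix part of a workflow (here, a K value per region) and asks where that information should live. One option is to publish the configuration as its own small object that refuses to let users override the pinned values. The sketch below does this with a hypothetical `WorkflowConfiguration` class and a placeholder workflow name. The K values (in cm/s) are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowConfiguration:
    """A workflow with some inputs/parameters fixed in advance (hypothetical structure)."""
    workflow: str
    region: str
    fixed: dict  # parameters pinned by this configuration, e.g. a calibrated value

    def bind(self, **user_inputs):
        """Merge user inputs with the fixed values, refusing to override fixed ones."""
        clash = self.fixed.keys() & user_inputs.keys()
        if clash:
            raise ValueError(f"{self.region} configuration fixes {sorted(clash)}")
        return {**self.fixed, **user_inputs}


# K values (cm/s) as on the slide; "example-model" is a placeholder workflow name.
CONFIGS = {
    "California": WorkflowConfiguration("example-model", "California", {"K": 1e2}),
    "Florida": WorkflowConfiguration("example-model", "Florida", {"K": 1e-3}),
}

run_inputs = CONFIGS["Florida"].bind(precipitation="precipitation_2020_03.csv")
print(run_inputs)  # {'K': 0.001, 'precipitation': 'precipitation_2020_03.csv'}
```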
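Slide 23 asks how to select and represent relevant provenance. The sketch below assumes one simple selection heuristic: keep the most recent run of each configuration. It represents the chosen run with a small subset of the PROV-JSON serialization of W3C PROV (a W3C Member Submission): entities, one activity, and `used`/`wasGeneratedBy` relations. Run identifiers, timestamps, file names and the `ex:` namespace are all hypothetical; they only mirror the slide's January to March 2020 precipitation runs.

```python
def prov_record(run_id, workflow, inputs, outputs, started, ended):
    """Build a minimal PROV-JSON-style document for one workflow run.

    Only entities, one activity, and 'used'/'wasGeneratedBy' relations are modeled.
    Timestamps are ISO 8601 strings.
    """
    run = f"ex:{run_id}"
    doc = {
        "prefix": {"ex": "http://example.org/"},  # hypothetical namespace
        "entity": {},
        "activity": {run: {"prov:startTime": started, "prov:endTime": ended,
                           "ex:workflow": workflow}},
        "used": {},
        "wasGeneratedBy": {},
    }
    for i, name in enumerate(inputs):
        doc["entity"][f"ex:{name}"] = {}
        doc["used"][f"_:u{i}"] = {"prov:activity": run, "prov:entity": f"ex:{name}"}
    for i, name in enumerate(outputs):
        doc["entity"][f"ex:{name}"] = {}
        doc["wasGeneratedBy"][f"_:g{i}"] = {"prov:entity": f"ex:{name}", "prov:activity": run}
    return doc


def latest_run_per_configuration(runs):
    """One possible selection heuristic: keep only the most recent run per configuration."""
    latest = {}
    for run in runs:
        current = latest.get(run["configuration"])
        # ISO 8601 strings in the same format and time zone compare correctly as text.
        if current is None or run["ended"] > current["ended"]:
            latest[run["configuration"]] = run
    return latest


# Hypothetical runs of the California configuration with monthly precipitation.
runs = [
    {"id": "run-2020-01", "configuration": "California", "ended": "2020-01-31T12:00:00"},
    {"id": "run-2020-03", "configuration": "California", "ended": "2020-03-31T12:00:00"},
    {"id": "run-2020-02", "configuration": "California", "ended": "2020-02-29T12:00:00"},
]
selected = latest_run_per_configuration(runs)["California"]
doc = prov_record(selected["id"], "example-model", ["precipitation_2020_03"],
                  ["results_2020_03"], "2020-03-31T11:00:00", selected["ended"])
```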
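Slide 26 asks how to aggregate everything while preserving its context. Slides 27 and 28 propose Workflow-centric Research Objects (RO-Crate) as one answer. The function below is a minimal sketch that builds `ro-crate-metadata.json`, assuming the RO-Crate 1.1 layout:

- a metadata descriptor;
- a root `Dataset`;
- a workflow typed as `ComputationalWorkflow` and set as `mainEntity`.

File names are hypothetical. The Workflow RO-Crate profile asks for more properties (for example, the workflow language) than shown here.

```python
import json


def workflow_crate(name, workflow_file, license_url, data_files=()):
    """Minimal RO-Crate 1.1 metadata that aggregates a workflow and its data files."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "license": {"@id": license_url},
        "mainEntity": {"@id": workflow_file},  # the workflow is the crate's main entity
        "hasPart": [{"@id": workflow_file}] + [{"@id": f} for f in data_files],
    }
    workflow = {
        "@id": workflow_file,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": workflow_file,
    }
    files = [{"@id": f, "@type": "File"} for f in data_files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow, *files],
    }


# Hypothetical crate contents.
crate = workflow_crate(
    "Example workflow crate",
    "workflow.cwl",
    "https://spdx.org/licenses/CC-BY-4.0",
    data_files=["inputs/precipitation_2020_03.csv", "outputs/results_2020_03.csv"],
)
print(json.dumps(crate, indent=2))  # save as ro-crate-metadata.json at the crate root
```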
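Slide 34 lists practices that can be adopted easily: persistent identifiers, a license, citation information and documentation. The sketch below is a rough heuristic check of a project folder for them. The file names it looks for (`LICENSE`, `CITATION.cff`, `README*`) are common conventions rather than requirements. Finding a DOI-like string is only a proxy for having a persistent identifier.

```python
import re
from pathlib import Path

# Loose DOI pattern: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def quick_wins(project: Path) -> dict:
    """Check a project folder for a few easy-to-adopt practices (file-name heuristics)."""
    files = {p.name.lower(): p for p in project.iterdir() if p.is_file()}
    readme = next((p for n, p in files.items() if n.startswith("readme")), None)
    citation = files.get("citation.cff")
    has_pid = any(
        DOI_PATTERN.search(p.read_text(errors="ignore"))
        for p in (readme, citation)
        if p is not None
    )
    return {
        "license": any(n.startswith(("license", "licence", "copying")) for n in files),
        "citation": citation is not None,
        "documentation": readme is not None,
        "persistent_identifier": has_pid,
    }


if __name__ == "__main__":
    for practice, ok in quick_wins(Path(".")).items():
        print(f"{practice:>22}: {'yes' if ok else 'missing'}")
```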