Towards Reusable
Research Software
Daniel Garijo Verdejo
@dgarijov
daniel.garijo@upm.es
Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Reproducibility: Open Research Data, Software and Methods
Scientific publication
Research Data Research Software Research Methods
EOSC Symposium: Infrastructure for quality research software
Challenges for (Re)using and Sharing Research Software
Software user
• What does the software component do? Which of its methods should I use?
• How to transform my data to use the software component?
• How to interpret the results produced by the software component?
• How to invoke the software component?
• How to configure the software component with the right parameters?
• How to compare against similar methods?
Software designer
• How to ease capturing the dependencies and installation instructions of my software?
• How to encapsulate my software so it can be used with other data?
• How to describe my software so it can be used by others?
• How to test if my software is ready to be used by others?
Community Initiatives and Standards
• Describing Research Software
• Schema.org & Codemeta
• Common Workflow Language (I/O)
• Packaging Research Artefacts (incl. software)
• Research Objects (RO-Crate)
• Aggregators (OpenAIRE, EOSC)
• General (e.g., Zenodo) &
domain-specific registries
• Scicodes (https://scicodes.net/)
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide https://arxiv.org/abs/2012.13117
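As a rough illustration of what a Schema.org/Codemeta description can look like, the sketch below builds a small JSON-LD record in Python. The software name, URLs, license and author are hypothetical placeholders, the helper describe_software is not part of any library, and only a handful of commonly used properties are shown.

```python
import json

# Context URL published for CodeMeta 2.0.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"


def describe_software(name, description, repository, license_url, authors):
    """Return a CodeMeta-style JSON-LD dictionary for one software component."""
    return {
        "@context": CODEMETA_CONTEXT,
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository,
        "license": license_url,
        "author": [{"@type": "Person", "name": author} for author in authors],
    }


# All values below are hypothetical and only serve as an example.
record = describe_software(
    name="example-tool",
    description="Hypothetical research software used for illustration.",
    repository="https://example.org/example-tool",
    license_url="https://spdx.org/licenses/MIT",
    authors=["A. Researcher"],
)

print(json.dumps(record, indent=2))
```

A record like this would typically be saved as a codemeta.json file at the root of the code repository so that registries and aggregators can pick it up.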
Adopting annotation vocabularies: where are we at?
Machine-readable software metadata is not abundant
Can you please describe your
software component with metadata?
I already did! Did you read the
project readme?
Did you see the online
documentation?
Perhaps you saw the
paper?
Many domain-specific registries are curated by
hand by experts
Automated Software Metadata Extraction
SOMEF
SOftware Metadata
Extraction Framework
https://github.com/KnowledgeCaptureAndDiscovery/somef/
[Mao et al 2019]: SoMEF: A Framework for Capturing Software Metadata from its Documentation. 2019 IEEE BigData REU Symposium. Los
Angeles, 2019
Code repository
(readme)
Machine-readable file with software metadata:
• > 20 common metadata fields
• Installation instructions, description, invocation
command, license, author, citation, requirements,
examples, documentation, notebooks, etc.
• Analysis of readme and supplementary files (e.g., notebooks,
Dockerfiles)
• JSON, RDF(graph), Codemeta, RO (in progress)
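To make the idea of extracting metadata from a readme concrete, here is a minimal sketch based on matching Markdown section headers against keywords. It is an illustrative heuristic only and does not reproduce how SOMEF works; the keyword table, function names and sample readme are all assumptions made for this example.

```python
import re

# Hypothetical mapping from metadata fields to header keywords.
HEADER_KEYWORDS = {
    "installation": ("install", "setup"),
    "invocation": ("usage", "run"),
    "citation": ("cite", "citation"),
    "license": ("license",),
}


def split_sections(readme_text):
    """Split a Markdown readme into (header, body) pairs."""
    sections = []
    header, lines = None, []
    for line in readme_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            if header is not None:
                sections.append((header, "\n".join(lines).strip()))
            header, lines = match.group(1).strip(), []
        elif header is not None:
            lines.append(line)
    if header is not None:
        sections.append((header, "\n".join(lines).strip()))
    return sections


def extract_metadata(readme_text):
    """Assign section bodies to metadata fields by header keyword."""
    metadata = {}
    for header, body in split_sections(readme_text):
        lowered = header.lower()
        for field, keywords in HEADER_KEYWORDS.items():
            if field not in metadata and any(k in lowered for k in keywords):
                metadata[field] = body
    return metadata


# Hypothetical readme used only to exercise the sketch.
SAMPLE_README = """\
# example-tool
A hypothetical tool.

## Installation
pip install example-tool

## Usage
example-tool --input data.csv

## License
MIT
"""

metadata = extract_metadata(SAMPLE_README)
print(metadata)
```

A keyword heuristic like this misses sections with unusual titles, which is one reason a dedicated extraction framework is useful in practice.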
Leveraging Software Metadata to create Knowledge Graphs
Explore input/output variables (interoperability)
Explore Software I/O files
(composition)
Knowledge Graphs can link research software (RS) and its
components.
OKG-Soft: machine-readable Software Metadata:
• (From Schema.org) Attribution, license, funding,
usage examples...
• Executable software components
• Software invocation
• Input & output files, variables and units
• Containers used to encapsulate and run software
components
• Parameter validation and suggestion
[Garijo et al 2019]: OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata. International
Conference on eScience, San Diego, USA. 2019
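The composition idea above (exploring which software outputs can feed which inputs) can be sketched with a tiny in-memory graph of triples. The component names, variables and the hasInput/hasOutput predicates are hypothetical and are not taken from the OKG-Soft vocabulary.

```python
from collections import defaultdict

# Hypothetical components with the variables they consume and produce.
COMPONENTS = {
    "rainfall-model": {"inputs": {"precipitation"}, "outputs": {"runoff"}},
    "flood-model": {"inputs": {"runoff", "elevation"}, "outputs": {"flood_extent"}},
    "crop-model": {"inputs": {"precipitation", "soil_moisture"}, "outputs": {"yield"}},
}


def build_triples(components):
    """Encode component I/O as (subject, predicate, object) triples."""
    triples = set()
    for name, io in components.items():
        for variable in io["inputs"]:
            triples.add((name, "hasInput", variable))
        for variable in io["outputs"]:
            triples.add((name, "hasOutput", variable))
    return triples


def find_compositions(triples):
    """Return (producer, variable, consumer) where an output feeds an input."""
    producers, consumers = defaultdict(set), defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "hasOutput":
            producers[obj].add(subject)
        elif predicate == "hasInput":
            consumers[obj].add(subject)
    return sorted(
        (producer, variable, consumer)
        for variable in producers
        for producer in producers[variable]
        for consumer in consumers.get(variable, ())
        if producer != consumer
    )


triples = build_triples(COMPONENTS)
compositions = find_compositions(triples)
print(compositions)
```

In a real knowledge graph the same query would usually be written against an RDF store, and variables would carry units so that only compatible outputs and inputs are linked.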
Conclusions
Research Software Metadata should be actionable and useful for:
• Understanding the differences between two or more software
components
• Helping portability (ROs)
• Adding components to workflows (CWL + ROs)
• Helping to link similar software methods
• Building automated comparison benchmarks
• Reducing the time needed to understand and adopt an existing
software component
• Giving credit to authors
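As one small example of actionable metadata, the sketch below compares two machine-readable records field by field to surface the differences between two software components. The records and the compare_metadata helper are hypothetical and only illustrate the kind of comparison that structured metadata makes possible.

```python
def compare_metadata(first, second):
    """Summarize which fields two metadata records share and where they differ."""
    keys_first, keys_second = set(first), set(second)
    shared = keys_first & keys_second
    return {
        "only_in_first": sorted(keys_first - keys_second),
        "only_in_second": sorted(keys_second - keys_first),
        "different": {
            key: (first[key], second[key])
            for key in sorted(shared)
            if first[key] != second[key]
        },
        "same": sorted(key for key in shared if first[key] == second[key]),
    }


# Hypothetical records for two similar software components.
tool_a = {"name": "tool-a", "license": "MIT", "programmingLanguage": "Python"}
tool_b = {"name": "tool-b", "license": "MIT", "citation": "Example et al."}

report = compare_metadata(tool_a, tool_b)
print(report)
```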
Questions?
Let's create machine-actionable software metadata
Image credit: https://icons8.com/icons/
[Diagram labels: Code + documentation; Automated extraction; Knowledge Graphs. Qualities shown: findable, portable, comparable, executable, reusable]
Acknowledgements: Yolanda Gil, Deborah Khider, Varun Ratnakar, Maximiliano Osorio,
Hernan Vargas, Oscar Corcho
