Towards Reusable Research Software with Automated Metadata Extraction

Towards Reusable
Research Software
Daniel Garijo Verdejo
@dgarijov
daniel.garijo@upm.es
Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid

Reproducibility: Open Research Data, Software and Methods
Scientific publication
Research Data, Research Software, Research Methods
EOSC Symposium: Infrastructure for quality research software

Challenges for (Re)using and Sharing Research Software
Software user
• What does the software component do? Which of its methods should I use?
• How to transform my data to use the software component?
• How to interpret the results produced by the software component?
• How to invoke the software component?
• How to configure the software component with the right parameters?
• How to compare against similar methods?
Software designer
• How to ease capturing the dependencies and installation instructions of my software?
• How to encapsulate my software so it can be used with other data?
• How to describe my software so it can be used by others?
• How to test if my software is ready to be used by others?

Community Initiatives and Standards
• Describing Research Software
  • Schema.org & Codemeta
  • Common Workflow Language (I/O)
• Packaging Research Artefacts (incl. software)
  • Research Objects (RO-Crate)
• Aggregators (OpenAIRE, EOSC)
• General (e.g., Zenodo) & domain-specific registries
  • Scicodes (https://scicodes.net/)
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide https://arxiv.org/abs/2012.13117
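
To make the "describing research software" item concrete, the following is a minimal sketch of a Codemeta-style description built as a Python dictionary and serialised to JSON-LD. The property names (`codeRepository`, `softwareRequirements`, and so on) come from the Codemeta 2.0 and Schema.org vocabularies; the tool name, URL, author and dependency are hypothetical placeholders, not a real project.

```python
import json

# Hypothetical software component: every value below is a placeholder.
codemeta_record = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-tool",
    "description": "Placeholder description of what the component does.",
    "codeRepository": "https://example.org/example-tool",
    "license": "https://spdx.org/licenses/Apache-2.0",
    "programmingLanguage": "Python",
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Example"}
    ],
    "softwareRequirements": ["numpy"],
}

# Serialise the record; this text is what a codemeta.json file would contain.
codemeta_text = json.dumps(codemeta_record, indent=2)
print(codemeta_text)
```

A file like this, stored at the root of a repository, is one way registries and aggregators could read the description without parsing free-text documentation.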

Adopting annotation vocabularies: where are we at?
Machine-readable software metadata is not abundant
"Can you please describe your software component with metadata?"
"I already did! Did you read the project readme?"
"Did you see the online documentation?"
"Perhaps you saw the paper?"
Many domain-specific registries are curated by hand by experts

Automated Software Metadata Extraction
SOMEF: SOftware Metadata Extraction Framework
https://github.com/KnowledgeCaptureAndDiscovery/somef/
[Mao et al 2019]: SoMEF: A Framework for Capturing Software Metadata from its Documentation. 2019 IEEE BigData REU Symposium. Los Angeles, 2019
Input: code repository (readme)
Output: machine-readable file with software metadata:
• > 20 common metadata fields
• Installation instructions, description, invocation command, license, author, citation, requirements, examples, documentation, notebooks, etc.
• Analysis of the readme and supplementary files (e.g., notebooks, Dockerfiles)
• Export formats: JSON, RDF (graph), Codemeta, RO (in progress)
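
As a toy illustration of the idea of turning a readme into metadata fields, the sketch below maps Markdown section headers to a few of the fields listed above using keyword matching. This is not SOMEF's implementation or API; the keyword table, function names and sample readme are all assumptions made for the example.

```python
import re

# Assumed mapping from header keywords to metadata fields (illustrative only).
HEADER_KEYWORDS = {
    "installation": ("install", "setup"),
    "invocation": ("usage", "run", "getting started"),
    "citation": ("cite", "citation"),
    "license": ("license",),
    "requirements": ("requirement", "dependenc"),
}

# Matches Markdown headers such as "## Installation".
HEADER_PATTERN = re.compile(r"^#{1,6}\s+(.*)$")


def classify_header(title):
    """Return the metadata field a header title suggests, or None."""
    lowered = title.lower()
    for field, keywords in HEADER_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return field
    return None


def extract_metadata(readme_text):
    """Collect the non-empty lines under each recognised header."""
    collected = {}
    current_field = None
    for line in readme_text.splitlines():
        match = HEADER_PATTERN.match(line)
        if match:
            current_field = classify_header(match.group(1))
            continue
        if current_field and line.strip():
            collected.setdefault(current_field, []).append(line.strip())
    return {field: "\n".join(lines) for field, lines in collected.items()}


# Hypothetical readme used only to exercise the sketch.
SAMPLE_README = """# example-tool
A placeholder description.

## Installation
pip install example-tool

## Usage
example-tool --input data.csv

## License
Apache-2.0
"""

extracted = extract_metadata(SAMPLE_README)
print(extracted)
```

Keyword matching on headers misses anything written as free text, which is why a real extractor needs more than this one heuristic.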

Leveraging Software Metadata to create Knowledge Graphs
Explore input/output variables (interoperability)
Explore software I/O files (composition)
Knowledge Graphs can link research software (RS) and its components.
OKG-Soft: machine-readable Software Metadata:
• (From Schema.org) Attribution, license, funding, usage examples...
• Executable software components
• Software invocation
• Input & output files, variables and units
• Containers used to encapsulate and run software components
• Parameter validation and suggestion
[Garijo et al 2019]: OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata. International Conference on eScience, San Diego, USA. 2019
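
The composition idea above (one component's output feeding another's input) can be sketched with a tiny in-memory graph of triples. The predicate names (`hasInputVariable`, `hasOutputVariable`, `hasContainer`), the model names and the variables are hypothetical and are not taken from the OKG-Soft vocabulary; a real knowledge graph would use RDF and a proper ontology.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All names are hypothetical.
TRIPLES = [
    ("rainfall-model", "hasOutputVariable", "precipitation_mm"),
    ("rainfall-model", "hasContainer", "example/rainfall:1.0"),
    ("flood-model", "hasInputVariable", "precipitation_mm"),
    ("flood-model", "hasOutputVariable", "water_depth_m"),
    ("crop-model", "hasInputVariable", "soil_moisture"),
]


def index_by_predicate(triples, predicate):
    """Map each object of the given predicate to the subjects that use it."""
    index = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            index[obj].add(subject)
    return index


def find_compositions(triples):
    """Return (producer, variable, consumer) links via shared variables."""
    producers = index_by_predicate(triples, "hasOutputVariable")
    consumers = index_by_predicate(triples, "hasInputVariable")
    links = []
    for variable in sorted(producers.keys() & consumers.keys()):
        for producer in sorted(producers[variable]):
            for consumer in sorted(consumers[variable]):
                if producer != consumer:
                    links.append((producer, variable, consumer))
    return links


compositions = find_compositions(TRIPLES)
print(compositions)
# prints [('rainfall-model', 'precipitation_mm', 'flood-model')]
```

Because variables carry units in their description, the same lookup could also flag pairs that share a quantity but disagree on units, which is where interoperability checks would start.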

Conclusions
Research Software Metadata should be actionable and useful for:
• Understanding the differences between two or more software components
• Helping portability (ROs)
• Adding components to workflows (CWL + ROs)
• Linking similar software methods
• Building automated comparison benchmarks
• Reducing the time needed to understand and adopt an existing software component
• Giving credit to authors
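
As a small example of what "actionable" could mean for the first point, the sketch below compares two metadata records field by field and reports where they differ. The field names and both records are hypothetical; real records would follow a shared vocabulary such as Codemeta.

```python
def compare_metadata(first, second):
    """Return {field: (value_in_first, value_in_second)} for differing fields."""
    differences = {}
    for field in sorted(first.keys() | second.keys()):
        left, right = first.get(field), second.get(field)
        if left != right:
            differences[field] = (left, right)
    return differences


# Hypothetical metadata records for two similar components.
tool_a = {"license": "Apache-2.0", "language": "Python", "input_format": "CSV"}
tool_b = {
    "license": "MIT",
    "language": "Python",
    "input_format": "CSV",
    "container": "example/tool-b:1.0",
}

metadata_diff = compare_metadata(tool_a, tool_b)
print(metadata_diff)
```

A missing field shows up as `None` on one side, so the same output also hints at which description is less complete.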

Questions?
Let's create machine-actionable software metadata
Image credit: https://icons8.com/icons/
[Diagram labels: Code + documentation; Automated extraction; Knowledge Graphs; findable, portable, comparable, executable, reusable]
Acknowledgements: Yolanda Gil, Deborah Khider, Varun Ratnakar, Maximiliano Osorio, Hernan Vargas, Oscar Corcho
