Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain human readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this presentation we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata.
OKG-Soft: An Open Knowledge Graph With Machine Readable Scientific Software Metadata
1. http://mint-project.info
OKG-SOFT: AN OPEN KNOWLEDGE
GRAPH WITH MACHINE READABLE
SCIENTIFIC SOFTWARE METADATA
Daniel Garijo, Maximiliano Osorio, Deborah Khider,
Varun Ratnakar and Yolanda Gil
University of Southern California,
Information Sciences Institute
@dgarijov
OKN-AKBC
May 22nd, Amherst, USA
2. http://mint-project.info
Science is changing: Open Science
Open publications
Open data
Open access
Open source software
Impact and credit
OKG-SOFT: AN OPEN KNOWLEDGE GRAPH WITH MACHINE READABLE SCIENTIFIC SOFTWARE METADATA –ESCIENCE 2019
3. http://mint-project.info
The importance of Scientific Software
Open publications
Open data
Open source software
• Software helps understand data
• Provenance, reproducibility
• Software helps understand methods
• Assumptions, limitations
4. http://mint-project.info
Why is it difficult to reuse Scientific Software?
Let’s imagine we want to reuse existing work:
Software A: Hydrology Model
• Inputs: Weather, DEM, Infiltration
• Outputs: Outflow, Error
• Data sources: FLDAS (climate), remote sensing
Software B: Map-based visualizations
How do I transform data?
How do I invoke code?
How do I interpret results?
5. http://mint-project.info
Outline
1. Requirements that help scientific software reusability
2. Our current approach for representing scientific software metadata
3. A framework to query, explore, exploit and publish software metadata
7. http://mint-project.info
Requirements for Software Reusability
1. Exposing software inputs, outputs and their corresponding variables
Hydrology Software Model
• Inputs: Weather (Input1), DEM (Input2), Infiltration (Input3)
• Outputs: Outflow (Output1), Error (Output2)
• Variables:
- Land surface temperature (degC)
- Precipitation rate (mm/h)
- Land surface wind speed (m/day)
- Net radiation (MJ/(day m^2))
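As a rough illustration of this first requirement, the inputs, outputs and variables of the hydrology model could be captured in a small data structure like the one below. This is a minimal sketch, not the OKG-Soft schema: the class names are hypothetical, and attaching the four listed variables to the Weather input is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """A quantity carried by an input or output file."""
    label: str
    unit: str


@dataclass
class DataSlot:
    """A named input or output of a software component."""
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class SoftwareDescription:
    """Hypothetical container for the exposed inputs and outputs."""
    name: str
    inputs: List[DataSlot] = field(default_factory=list)
    outputs: List[DataSlot] = field(default_factory=list)

    def variables_of(self, slot_name: str) -> List[str]:
        """Return 'label (unit)' strings for one input or output."""
        for slot in self.inputs + self.outputs:
            if slot.name == slot_name:
                return [f"{v.label} ({v.unit})" for v in slot.variables]
        raise KeyError(slot_name)


# Assumption: the four variables belong to the Weather input.
weather = DataSlot("Weather", [
    Variable("Land surface temperature", "degC"),
    Variable("Precipitation rate", "mm/h"),
    Variable("Land surface wind speed", "m/day"),
    Variable("Net radiation", "MJ/(day m^2)"),
])
model = SoftwareDescription(
    name="Hydrology Software Model",
    inputs=[weather, DataSlot("DEM"), DataSlot("Infiltration")],
    outputs=[DataSlot("Outflow"), DataSlot("Error")],
)
```

With such a description, a user can ask which variables and units an input expects before preparing any data, for example `model.variables_of("Weather")`.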
8. http://mint-project.info
Requirements for Software Reusability
1. Exposing software inputs, outputs and their corresponding variables
2. Capturing the functions of the software being used
Hydrology Software Model
Function A: Richards equation for water movement (unsaturated soil)
Function B: Saint-Venant equations (shallow water)
9. http://mint-project.info
Requirements for Software Reusability
1. Exposing software inputs, outputs and their corresponding variables
2. Capturing the functions of software being used
3. Using principled ontologies with structured names for model variables, processes, and methods
Temp, T, T_C → svo:land_surface_air__temperature
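The benefit of structured names can be sketched as a lookup from local labels to one standard name. The mapping table and the helper function below are hypothetical; only the labels and the SVO identifier come from the slide.

```python
# Local labels from the slide, all mapped to one standard name.
STANDARD_NAME = "svo:land_surface_air__temperature"
LOCAL_TO_STANDARD = {
    "Temp": STANDARD_NAME,
    "T": STANDARD_NAME,
    "T_C": STANDARD_NAME,
}


def same_quantity(name_a, name_b, mapping=LOCAL_TO_STANDARD):
    """True when both local names resolve to the same standard name."""
    std_a = mapping.get(name_a)
    std_b = mapping.get(name_b)
    return std_a is not None and std_a == std_b
```

Two software components that label temperature differently can then be matched automatically, while an unmapped label is never treated as a match.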
10. http://mint-project.info
Requirements for Software Reusability
1. Exposing software inputs, outputs and their corresponding variables
2. Capturing the functions of software being used
3. Using principled ontologies with structured names for model variables, processes, and methods
4. Capturing the semantic structure of software invocations
Dependencies?
Sample runs?
Invocation command?
Is data supposed to be in the same folder?
Default arguments/Configuration files?
Volumes?
Do I have to log in to the image?
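One way to picture what capturing invocation structure buys is a record that holds the image, command, default arguments and volumes, from which a runnable command line is derived. The sketch below is only an illustration: the `Invocation` class, the image name `example/hydrology-model` and the paths are all hypothetical, and the only real CLI features used are `docker run` with the `--rm` and `-v` options.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Invocation:
    """Hypothetical record of how a containerized model is invoked."""
    image: str                                             # container image holding the software
    command: str                                           # executable inside the image
    arguments: List[str] = field(default_factory=list)     # default arguments
    volumes: Dict[str, str] = field(default_factory=dict)  # host dir -> container dir

    def to_docker_argv(self) -> List[str]:
        """Assemble the argument vector for `docker run`."""
        argv = ["docker", "run", "--rm"]
        for host_dir, container_dir in self.volumes.items():
            argv += ["-v", f"{host_dir}:{container_dir}"]
        argv.append(self.image)
        argv.append(self.command)
        argv += self.arguments
        return argv


inv = Invocation(
    image="example/hydrology-model",            # hypothetical image name
    command="./run_model",                      # hypothetical executable
    arguments=["--config", "/data/config.yaml"],
    volumes={"/tmp/inputs": "/data"},
)
print(shlex.join(inv.to_docker_argv()))
```

Keeping the pieces separate, rather than storing one opaque command string, is what lets a tool answer questions such as where the data must be mounted or which arguments are defaults.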
12. http://mint-project.info
Prior Work: OntoSoft Software Metadata Registry
OntoSoft
Model and Software Metadata Registry
• Complements code repositories to make them understandable
• Software metadata designed for scientists
• Metadata is curated by decentralized communities of users
• Training scientists on best practices
http://ontosoft.org
Finding Software
[Gil et al. 2015]: OntoSoft: Capturing Scientific Software Metadata. Eighth ACM International Conference on Knowledge Capture, Palisades, NY, 2015.
13. http://mint-project.info
Prior Work: OntoSoft Software Metadata Registry
Example entries: PIHM, PIHMgis, DrEICH, TauDEM, WBMsed
14. http://mint-project.info
Evolving OntoSoft: Software Description Ontology
https://w3id.org/okn/o/sd#
Extensions:
• Schema.org (software metadata)
• W3C Data Cubes (Contents of inputs and outputs)
• NASA QUDT (Units)
• DockerPedia (Software images)
• Scientific Variables Ontology (Standard Variables)
17. http://mint-project.info
Describing Input/Output files, parameters and variables
Scientific Variables Ontology identifiers
M. Stoica and S. D. Peckham, “An Ontology Blueprint for Constructing Qualitative and Quantitative Scientific Variables,” in Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, October 8th to 12th, 2018.
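A description of an input file and its variables could be serialized as linked data along the following lines. This is a sketch under stated assumptions: the namespace is the one given on the slides, but the class and property names (`sd:DatasetSpecification`, `sd:hasPresentation`, `sd:VariablePresentation`, `sd:usesUnit`, `sd:hasStandardVariable`) are assumed here for illustration and should be checked against the published ontology. The input identifier is hypothetical.

```python
import json

SD = "https://w3id.org/okn/o/sd#"  # namespace from the slides


def describe_input(input_id, label, variables):
    """Build a JSON-LD node for one input file and its variables.

    `variables` is a list of (label, unit, standard_name) tuples.
    All sd: class and property names below are assumptions.
    """
    return {
        "@context": {
            "sd": SD,
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        },
        "@id": input_id,
        "@type": "sd:DatasetSpecification",
        "rdfs:label": label,
        "sd:hasPresentation": [
            {
                "@type": "sd:VariablePresentation",
                "rdfs:label": var_label,
                "sd:usesUnit": unit,
                # The svo: prefix is left unexpanded in this sketch.
                "sd:hasStandardVariable": {"@id": standard_name},
            }
            for var_label, unit, standard_name in variables
        ],
    }


node = describe_input(
    "https://example.org/input/weather",  # hypothetical identifier
    "Weather",
    [("Temp", "degC", "svo:land_surface_air__temperature")],
)
print(json.dumps(node, indent=2))
```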
19. http://mint-project.info
Machine readable representation of units
“RAIN” (mm/day) → CCUT → linking and augmenting from Wikidata
• B. Shbita, A. Rajendran, J. Pujara, and C. Knoblock, “Parsing, Representing and Transforming Units of Measure,” in Modeling the World's Systems, 2019.
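To show why a machine readable unit matters, the sketch below parses a simple length-per-time unit such as mm/day and converts values between such units. It does not use CCUT or reproduce its API; the two conversion tables and both functions are hypothetical and cover only a handful of units.

```python
# Conversion factors to SI base units (metres, seconds); a tiny illustrative table.
LENGTH_TO_M = {"mm": 1e-3, "cm": 1e-2, "m": 1.0, "km": 1e3}
TIME_TO_S = {"s": 1.0, "min": 60.0, "h": 3600.0, "day": 86400.0}


def parse_rate_unit(unit):
    """Split a 'length/time' unit such as 'mm/day' into its two parts."""
    length, sep, time = unit.partition("/")
    if not sep or length not in LENGTH_TO_M or time not in TIME_TO_S:
        raise ValueError(f"unsupported unit: {unit!r}")
    return {"length": length, "time": time}


def convert_rate(value, from_unit, to_unit):
    """Convert a length-per-time value between two supported units."""
    src = parse_rate_unit(from_unit)
    dst = parse_rate_unit(to_unit)
    metres_per_second = value * LENGTH_TO_M[src["length"]] / TIME_TO_S[src["time"]]
    return metres_per_second * TIME_TO_S[dst["time"]] / LENGTH_TO_M[dst["length"]]
```

For instance, a rainfall value stored in mm/day can be passed to software that expects mm/h by calling `convert_rate(value, "mm/day", "mm/h")`, which is the kind of automated transformation a structured unit description enables.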
21. http://mint-project.info
Evolving OntoSoft: Describing Containers
Image tag “mintproject/pihm2cycles” → DockerPedia → image semantic representation
https://dockerpedia.inf.utfsm.cl/
• M. Osorio, H. Vargas, and C. Buil Aranda, “DockerPedia: a Knowledge Graph of Docker Images,” in Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with 17th International Semantic Web Conference (ISWC 2018), Monterey, 2018.
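A first step towards a structured description of an image is to split its reference into parts. The helper below is a hypothetical sketch, unrelated to how DockerPedia itself processes images; it handles only the `namespace/repository[:tag]` form and relies on Docker's convention of using the tag `latest` when none is given.

```python
def parse_image_reference(reference):
    """Split 'namespace/repository[:tag]' into its parts.

    Docker uses the tag 'latest' when none is given.
    Registry hosts and digests are not handled in this sketch.
    """
    name, sep, tag = reference.rpartition(":")
    if not sep or "/" in tag:
        # No tag present: the whole reference is the image name.
        name, tag = reference, "latest"
    namespace, _, repository = name.rpartition("/")
    return {"namespace": namespace, "repository": repository, "tag": tag}
```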
23. http://mint-project.info
OKG-SOFT: Framework
Software Model Catalog contains:
• Models from hydrology, agriculture and economy, their versions and model configurations.
• More than 200 variables mapped to SVO.
• All models are executable through scientific workflows.
• Most content is added manually and collaboratively by expert users.
• Automated unit transformations
• Automated software image description
• Semi-automated Wikidata linking
https://query.mint.isi.edu/api/mintproject/MINT-ModelCatalogQueries#/
APIs:
• SPARQL endpoint
• REST APIs (GET/POST)
• Python clients
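Querying a SPARQL endpoint of this kind could look roughly as follows. The slides do not give the endpoint address, so `https://example.org/sparql` is a placeholder, and the `sd:hasInput` property is assumed for illustration. The request follows the standard SPARQL Protocol (a GET request with a `query` parameter and a JSON results media type); it is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SD = "https://w3id.org/okn/o/sd#"        # namespace from the slides
ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL


def build_query(limit=10):
    """SPARQL query listing software and its inputs (property name assumed)."""
    return (
        f"PREFIX sd: <{SD}>\n"
        "SELECT ?software ?input WHERE {\n"
        "  ?software sd:hasInput ?input .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(endpoint=ENDPOINT, limit=10):
    """Prepare a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": build_query(limit)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request()
print(request.full_url)
```

Against a real endpoint, the prepared request could be sent with `urllib.request.urlopen(request)` and the response body parsed with `json.load`.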
24. http://mint-project.info
Exploitation: Exploring Scientific Software Model Metadata
http://models.mint.isi.edu
• Explore variables
• Explore Software I/O
• Find Software Models
27. http://mint-project.info
Summary
Scientific software reusability is crucial to understand
• Existing data
• Published methods
1. Requirements for scientific software reusability include
• Expose inputs, outputs, variables and software invocation details!
2. Our approach for capturing and structuring scientific software metadata
3. A framework to query, explore, exploit and publish software metadata
28. http://mint-project.info
Help us make your software more reusable
We would like to thank Scott Peckham, Maria Stoica, Chris Duffy, Lele Shu, Kelly Cobourn, Zeya Zhang, Suzanne Pierce, Armen Kemanian, Rajiv Mayani, Jay Pujara, Basel Shbita, Dhruv Pattel, Rohit Mayura, Amrish Goel and Anuj Doiphode.
Contact me: dgarijo@isi.edu