Simon Cox (Researcher @ CSIRO) and I presented on outcomes of the CSIRO Summer of Vocabularies.
This project focused on:
- Examining the state of the management of various controlled vocabularies and developing reusable processes to clean & standardise those vocabularies.
- Developing standards & technologies for vocabulary management, curation & visualisation.
We covered these Summer of Vocabularies activities and how they now support the Vocram project and the ongoing development of ANDS vocabulary services.
1. Summer of Vocabularies
Jane Frazier, Data Librarian
Australian National Data Service
Simon Cox, Research Scientist
CSIRO Land & Water
2. 1. Meet the team
2. Why a Summer of Vocabularies?
3. Existing CSIRO & ANDS vocabulary infrastructure
4. Tasks completed during Summer of Vocabularies
5. After the Summer of Vocabularies
3. the SoV team
Simon Cox
Research Scientist
CSIRO Land & Water
Jane Frazier
Data Librarian
ANDS
Fabrizio Giabardo
Bachelor of Science & Computer Science candidate, Monash
CSIRO Summer Vacation Scholar
Ben Leighton
Research Software Engineer
CSIRO Land & Water
Becky Schmidt
Knowledge Delivery Team Lead
CSIRO Land & Water
Sally Tetreault Campbell
Divisional Editor
CSIRO Land & Water
Megan Williams
Bachelor of Environmental & Social Science candidate, RMIT
CSIRO Summer Vacation Scholar
Jonathan Yu
Research Software Engineer
CSIRO Land & Water
4. why?
CSIRO needs science vocabularies to support their research, to provide categorical values within datasets, and to index datasets
Current collection of vocabularies is varied; different formats & locations; different maintenance regimes; often not published openly
● Standardising formats & locations of vocabularies
● Creating a reusable process to standardise & store future vocabularies
● Developing features to improve current vocabulary tools
9. Vocabulary Widget Linked Data Browser
Vocabulary Service API
http://researchdata.ands.org.au:8080/vocab/api/{vocab key}/{service call}
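The URL template above has two placeholders. A minimal sketch of filling it in, assuming each placeholder is a single path segment; the function name and the values `example-vocab` and `exampleCall` are hypothetical and do not come from the slides:

```python
from urllib.parse import quote

# URL template from the slide; the placeholder "{vocab key}" is renamed
# to a valid Python format field for this sketch.
API_TEMPLATE = (
    "http://researchdata.ands.org.au:8080/vocab/api/{vocab_key}/{service_call}"
)


def build_service_url(vocab_key: str, service_call: str) -> str:
    """Fill both placeholders, percent-encoding each as one path segment."""
    return API_TEMPLATE.format(
        vocab_key=quote(vocab_key, safe=""),
        service_call=quote(service_call, safe=""),
    )


# Hypothetical values, for illustration only.
print(build_service_url("example-vocab", "exampleCall"))
```

Encoding each value with `safe=""` keeps a stray `/` or space in a key from changing the path structure.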
10. SoV tasks
1. Curation of a list of relevant vocabularies
2. Cleaning & standardisation of vocabularies sourced from various places
3. Formalisation & documentation of the transformation process for vocabularies
4. Formalisation & documentation of a schema for the description of vocabularies
5. Develop a workflow and technology for vocab management/curation/visualisation
6. Debug/customize the Linked Data Registry tool
7. Improve visualisation in SISSVoc
11. 1. Curation of a list of vocabularies that are appropriate for use by the
Australian research community and, more specifically, the CSIRO
environmental research community
12. 2. Cleaning & standardisation of selected vocabulary content
original format final format
Australian and New Zealand Standard Research Classification (ANZSRC)
American Geophysical Union (AGU) Index Terms
Australian Governments' Interactive Functions Thesaurus (AGIFT)
National Groundwater Information System (NGIS) Terminology
International Chronostratigraphic Chart (Geologic Timescale)
[Format labels shown on the slide: .xls, <HTML>, PDF, SKOS in .ttl, GTS + SKOS in .ttl, RDF in .ttl]
13. 2. Cleaning & standardisation of selected vocabulary content
.xls [original Excel file] → .xls [RDF123 compatible Excel file] → RDF123 → SKOS in .ttl → CSIRO Linked Data Registry
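To make the spreadsheet-to-SKOS step concrete, here is a minimal sketch that turns tabular rows into SKOS concepts in Turtle. It is a standard-library stand-in, not RDF123, and it assumes the spreadsheet has been exported to CSV. The column names (`id`, `label`, `broader`), the `http://example.org/vocab/` namespace, and the sample rows are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/vocab/"  # hypothetical namespace for minted URIs

# Made-up sample rows standing in for a CSV export of the spreadsheet.
SAMPLE = """id,label,broader
water,Water,
groundwater,Groundwater,water
"""


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_skos_turtle(csv_text, base=BASE):
    """Emit one skos:Concept per row, with an optional skos:broader link."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = escape_literal(row["label"])
        props = ["a skos:Concept", f'skos:prefLabel "{label}"@en']
        if row["broader"]:
            props.append(f"skos:broader <{base}{row['broader']}>")
        subject = f"<{base}{row['id']}>"
        lines.append(subject + " " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"


print(rows_to_skos_turtle(SAMPLE))
```

A real conversion would also need a `skos:ConceptScheme`, checks on identifiers before they are placed in URIs, and validation of the output.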
14. 3. Formalisation and documentation of a transformation process for vocabularies
Can we make the cleaning & standardisation process reusable?
What can be completed by (SKOS-novice) users who want to use our vocab services but don’t want to delve into esoteric tools like RDF123 & TopBraid?
15. 3. Formalisation and documentation of a transformation process for
vocabularies
● Select commonly used format
● Select representative vocabulary for demonstration
original format final format
Australian and New Zealand Standard Research Classification (ANZSRC)
machine-friendly Excel
human-friendly Excel
● Examine steps taken to clean & transform vocabulary
● Examine artifact documents created during cleaning & transformation
● Construct support documentation for SKOS-novice users
16. 3. Formalisation and documentation of a transformation process for vocabularies
A guide to the transformation & ingestion process for vocabularies
● Design FAQ
● URI creation FAQ
● Useful functions
● Vocabulary ingestion template Excel spreadsheet
{in progress}
17. 3. Formalisation and documentation of a transformation process for vocabularies
original format vocabulary ingestion template format
18. 4. Formalisation and documentation of a schema for the
description of vocabularies
ANDS Vocabulary Portal
metadata schema recommendations
ANDS Vocabulary Portal
proposed architecture
19. 5. Develop a toolkit and technology for vocab management/curation/visualisation
[Diagram labels: Vocabulary source; SKOS/RDF pipeline; LDR API; Linked Data Registry; SPARQL update; RDF Triple-store; SPARQL query; SISSVoc vocabulary API]
20. Vocabularies can be used through standard Web API (SPARQL)
[Diagram labels: Linked Data Registry; SPARQL update; RDF Triple-store; SPARQL query; SISSVoc vocabulary API; other web apps ...]
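As an illustration of how another web app might issue a SPARQL query against the triple store, the sketch below builds a GET request URL in the style of the SPARQL 1.1 Protocol (the query text goes in a `query` parameter). The endpoint `http://example.org/sparql` is a placeholder, not the actual CSIRO or ANDS endpoint, and the query assumes the vocabularies are stored as SKOS concepts with `skos:prefLabel` values.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# List up to ten SKOS concepts with their preferred labels.
QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept a skos:Concept ;
           skos:prefLabel ?label .
} LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Return the GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The sketch only constructs the URL and does not send it. A client would typically fetch it with an `Accept` header such as `application/sparql-results+json` and parse the returned bindings.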