Cross-domain data discovery and integration
Simon J D Cox
CSIRO Land and Water
7 November 2018
Finding data for x-disciplinary applications
Use cross-domain data catalogs
SCIDATACON 2018-11-07 | Cox et al. | Harmonising data description
SCIDATACON 2018-11-07 | Cox | x-domain discovery and integration
Generic dataset metadata
Metadata standards
Figure: metadata standards arranged by usage (primarily domain usage to generic applications) and by level (variable level to study level): SSN/SOSA, DCAT, ISO 19115, EML, DQV, FHIR, HL7, DATS, DDI, CERIF, QB.
Observation metadata
SSN Observation
- resultTime
- madeBySensor
- usedProcedure
- hasFeatureOfInterest
- hasResult
- phenomenonTime
- observedProperty
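As a minimal sketch, assuming Python and JSON-LD as the serialisation, the seven SSN/SOSA observation properties listed above can be captured in one record. The `Observation` class, its snake_case field names and every example.org IRI are hypothetical; only the `sosa:`, `time:` and `xsd:` terms come from the W3C vocabularies.

```python
import json
from dataclasses import dataclass

# Namespaces of the W3C vocabularies used below.
SOSA = "http://www.w3.org/ns/sosa/"
TIME = "http://www.w3.org/2006/time#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class Observation:
    """Hypothetical container for one SOSA observation; IRIs are plain strings."""
    iri: str
    result_time: str              # when the result became available (xsd:dateTime)
    phenomenon_time: str          # when the property applied (wrapped as a time:Instant)
    made_by_sensor: str           # sensor, instrument or human observer
    used_procedure: str           # method or protocol
    has_feature_of_interest: str  # the thing whose property was observed
    observed_property: str        # ideally a term from a controlled vocabulary
    has_result: str               # the result node

    def to_jsonld(self) -> dict:
        """Serialise as a JSON-LD node using compact sosa: IRIs."""
        return {
            "@context": {"sosa": SOSA, "time": TIME, "xsd": XSD},
            "@id": self.iri,
            "@type": "sosa:Observation",
            "sosa:resultTime": {"@value": self.result_time, "@type": "xsd:dateTime"},
            "sosa:phenomenonTime": {
                "@type": "time:Instant",
                "time:inXSDDateTimeStamp": {
                    "@value": self.phenomenon_time,
                    "@type": "xsd:dateTimeStamp",
                },
            },
            "sosa:madeBySensor": {"@id": self.made_by_sensor},
            "sosa:usedProcedure": {"@id": self.used_procedure},
            "sosa:hasFeatureOfInterest": {"@id": self.has_feature_of_interest},
            "sosa:observedProperty": {"@id": self.observed_property},
            "sosa:hasResult": {"@id": self.has_result},
        }


obs = Observation(
    iri="http://example.org/obs/42",
    result_time="2018-11-07T10:00:00Z",
    phenomenon_time="2018-11-07T09:00:00Z",
    made_by_sensor="http://example.org/sensor/probe-7",
    used_procedure="http://example.org/procedure/tdr-method",
    has_feature_of_interest="http://example.org/site/paddock-3",
    observed_property="http://example.org/vocab/soil-moisture",
    has_result="http://example.org/obs/42/result",
)
print(json.dumps(obs.to_jsonld(), indent=2))
```

Keeping `observedProperty` as an IRI rather than free text is what lets observations from different domains be lined up against shared controlled vocabularies.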
Lining up natural and social science concepts
Controlled vocabularies
Provenance/context
Observation properties
Dataset statistics
Metadata mashup
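One way to read a metadata mashup is as several partial descriptions of the same dataset, each in a vocabulary suited to one concern, merged into a single node. The Python sketch below assumes JSON-LD fragments that share an `@id`; the `merge_fragments` helper, all IRIs and the record count are hypothetical. The fragments use real terms from DCAT (the catalogue record, with `dcat:theme` pointing at an observed-property concept from a controlled vocabulary), PROV-O (provenance) and DQV (a dataset statistic recorded as a quality measurement).

```python
import json


def merge_fragments(*fragments: dict) -> dict:
    """Merge JSON-LD node fragments that describe the same resource (same @id).
    Prefix bindings are unioned (a prefix bound to two namespaces is an error);
    a property asserted by several fragments collects all distinct values."""
    if not fragments:
        raise ValueError("need at least one fragment")
    node_id = fragments[0]["@id"]
    context: dict = {}
    merged: dict = {"@id": node_id}
    for frag in fragments:
        if frag["@id"] != node_id:
            raise ValueError(f"fragment describes {frag['@id']}, expected {node_id}")
        for prefix, ns in frag.get("@context", {}).items():
            if context.setdefault(prefix, ns) != ns:
                raise ValueError(f"prefix {prefix!r} is bound to two namespaces")
        for key, value in frag.items():
            if key in ("@id", "@context"):
                continue
            if key not in merged:
                merged[key] = value
                continue
            old = merged[key] if isinstance(merged[key], list) else [merged[key]]
            new = value if isinstance(value, list) else [value]
            combined = old + [v for v in new if v not in old]
            merged[key] = combined[0] if len(combined) == 1 else combined
    return {"@context": context, **merged}


DS = "http://example.org/dataset/soil-moisture"  # hypothetical dataset IRI

# Generic catalogue record, themed with a controlled-vocabulary concept.
catalog = {"@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
           "@id": DS, "@type": "dcat:Dataset", "dct:title": "Soil moisture observations",
           "dcat:theme": {"@id": "http://example.org/vocab/soil-moisture"}}
# Provenance/context.
provenance = {"@context": {"prov": "http://www.w3.org/ns/prov#"},
              "@id": DS, "prov:wasGeneratedBy": {"@id": "http://example.org/activity/campaign-1"}}
# Dataset statistics, expressed as a DQV quality measurement.
statistics = {"@context": {"dqv": "http://www.w3.org/ns/dqv#"},
              "@id": DS, "dqv:hasQualityMeasurement": {
                  "@type": "dqv:QualityMeasurement",
                  "dqv:isMeasurementOf": {"@id": "http://example.org/metric/record-count"},
                  "dqv:value": 1024}}

print(json.dumps(merge_fragments(catalog, provenance, statistics), indent=2))
```

Because every fragment is keyed by the same dataset IRI, each concern can be maintained by a different party and combined only when a description is needed.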
Is ‘metadata’ the solution?
• Discovery might be done in other ways
Metadata: property -> data source (type)
Semantic Data Lake – Envisaged Architecture
Figure: User query (SPARQL) -> Decomposing -> Finding relevant data sources + query translation -> Execution plan. Sources: database, XML, file, MongoDB; native query languages: SQL, XPath, MongoDB, JSONPath.
Example SPARQL triple patterns:
?item gho:Country ?country .
?item gho:Disease ?disease .
...
Corresponding SQL:
SELECT country, disease, ...
FROM Observations
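The envisaged architecture can be sketched in Python, assuming a small hypothetical mapping table that plays the role of the "property -> data source (type)" metadata. This is not Squerall's implementation: the source names, table and field names, and the XPath and MongoDB query shapes are illustrative, and the final join of sub-query results is omitted. It does show the key property of run-time integration: only sources that hold a queried property receive a sub-query.

```python
from collections import defaultdict

# Hypothetical mapping metadata: property -> (source id, source type, table/collection, field).
# Assumes each source keeps all its mapped properties in one table/collection.
MAPPINGS = {
    "gho:Country": ("who_db", "sql", "Observations", "country"),
    "gho:Disease": ("who_db", "sql", "Observations", "disease"),
    "gho:Year": ("reports", "mongodb", "reports", "year"),
    "gho:Region": ("archive", "xml", "regions", "region"),
}

Triple = tuple[str, str, str]  # (subject variable, property, object variable)


def decompose(patterns: list[Triple]) -> dict[str, list[Triple]]:
    """Group triple patterns by the source that holds each property.
    Sources that hold none of the queried properties are never touched."""
    groups: dict[str, list[Triple]] = defaultdict(list)
    for pattern in patterns:
        prop = pattern[1]
        if prop not in MAPPINGS:
            raise KeyError(f"no data source mapped for {prop}")
        groups[MAPPINGS[prop][0]].append(pattern)
    return dict(groups)


def translate(group: list[Triple]) -> tuple[str, object]:
    """Translate one source's patterns into a native query for that source type."""
    _, kind, container, _ = MAPPINGS[group[0][1]]
    fields = [MAPPINGS[prop][3] for _, prop, _ in group]
    if kind == "sql":
        return kind, f"SELECT {', '.join(fields)} FROM {container}"
    if kind == "mongodb":
        # (collection, filter document, projection document) for a find() call
        return kind, (container, {}, {field: 1 for field in fields})
    if kind == "xml":
        return kind, " | ".join(f"/{container}/*/{field}" for field in fields)
    raise ValueError(f"unsupported source type: {kind}")


def execution_plan(patterns: list[Triple]) -> list[tuple[str, str, object]]:
    """One native sub-query per relevant source. Joining the sub-results on the
    shared subject variable (e.g. ?item) is left out of this sketch."""
    return [(source, *translate(group)) for source, group in decompose(patterns).items()]


query = [("?item", "gho:Country", "?country"), ("?item", "gho:Disease", "?disease")]
print(execution_plan(query))
```

For the two `gho:` patterns in the figure, the plan contains a single SQL sub-query against `Observations`; the MongoDB and XML sources are never contacted.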
Run-time integration?
Thank you
Simon J D Cox
Research Scientist
CSIRO Land and Water
simon.cox@csiro.au
Dataset catalog – schema.org
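A schema.org `Dataset` description is usually embedded in a dataset landing page as JSON-LD, which is the markup that dataset search engines such as Google Dataset Search read. The sketch below assumes Python for generating it and uses only real schema.org terms (`Dataset`, `name`, `description`, `keywords`, `license`, `distribution`, `DataDownload`, `encodingFormat`, `contentUrl`); the helper functions and all example values are hypothetical.

```python
import json


def schema_org_dataset(name: str, description: str, keywords: list[str],
                       license_url: str, download_url: str, media_type: str) -> dict:
    """Build a minimal schema.org Dataset node with one downloadable distribution."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": keywords,
        "license": license_url,
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": media_type,
            "contentUrl": download_url,
        }],
    }


def as_script_tag(doc: dict) -> str:
    """Wrap the JSON-LD in the <script> element a landing page would carry."""
    return '<script type="application/ld+json">\n' + json.dumps(doc, indent=2) + "\n</script>"


ds = schema_org_dataset(
    name="Soil moisture observations",
    description="Hypothetical example dataset.",
    keywords=["soil moisture", "hydrology"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="http://example.org/data/soil-moisture.csv",
    media_type="text/csv",
)
print(as_script_tag(ds))
```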
W3C Data Catalog Vocabulary – DCAT
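DCAT describes a three-level structure: a `dcat:Catalog` lists `dcat:Dataset`s, and each dataset has one or more `dcat:Distribution`s (concrete downloadable or accessible forms). The Python sketch below builds that structure as JSON-LD and walks it to collect download URLs. The helper functions and example IRIs are hypothetical; expressing `dcat:mediaType` as an IANA media-type IRI follows the current DCAT recommendation.

```python
import json

# Prefixes for the W3C Data Catalog Vocabulary and Dublin Core terms.
CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}
IANA = "https://www.iana.org/assignments/media-types/"


def distribution(iri: str, download_url: str, media_type: str) -> dict:
    return {"@id": iri, "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": {"@id": IANA + media_type}}


def dataset(iri: str, title: str, keywords: list[str], distributions: list[dict]) -> dict:
    return {"@id": iri, "@type": "dcat:Dataset", "dct:title": title,
            "dcat:keyword": keywords, "dcat:distribution": distributions}


def catalog(iri: str, title: str, datasets: list[dict]) -> dict:
    return {"@context": CONTEXT, "@id": iri, "@type": "dcat:Catalog",
            "dct:title": title, "dcat:dataset": datasets}


def download_urls(cat: dict) -> dict[str, list[str]]:
    """Walk Catalog -> Dataset -> Distribution and collect download URLs per dataset."""
    return {ds["@id"]: [d["dcat:downloadURL"]["@id"] for d in ds["dcat:distribution"]]
            for ds in cat["dcat:dataset"]}


cat = catalog("http://example.org/catalog", "Example catalogue", [
    dataset("http://example.org/dataset/soil-moisture", "Soil moisture observations",
            ["soil moisture"],
            [distribution("http://example.org/dist/sm-csv",
                          "http://example.org/data/soil-moisture.csv", "text/csv")]),
])
print(json.dumps(cat, indent=2))
print(download_urls(cat))
```

Separating the dataset from its distributions is what lets one catalogue entry point at several formats or access services for the same data.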
CKAN ♥ DCAT
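CKAN represents each dataset as a "package" dictionary (for example, the `result` of its `package_show` action API), and the ckanext-dcat extension publishes packages as DCAT. The function below is a much smaller, hypothetical mapping written for illustration, not the extension's code: it covers only a few package fields (`name`, `title`, `notes`, `tags`, `resources`), and the sample package is made up.

```python
import json


def ckan_to_dcat(package: dict, site_url: str) -> dict:
    """Map a few fields of a CKAN package dict to a DCAT dataset node (JSON-LD)."""
    ds_iri = f"{site_url.rstrip('/')}/dataset/{package['name']}"
    return {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
        "@id": ds_iri,
        "@type": "dcat:Dataset",
        "dct:title": package.get("title") or package["name"],
        "dct:description": package.get("notes") or "",
        "dcat:keyword": [tag["name"] for tag in package.get("tags", [])],
        "dcat:distribution": [
            {
                "@id": f"{ds_iri}/resource/{res['id']}",
                "@type": "dcat:Distribution",
                "dct:title": res.get("name") or res["url"],
                "dcat:accessURL": {"@id": res["url"]},
                "dct:format": res.get("format", ""),
            }
            for res in package.get("resources", [])
        ],
    }


# Hypothetical package, shaped like a package_show "result".
package = {
    "name": "soil-moisture",
    "title": "Soil moisture observations",
    "notes": "Hourly probe readings.",
    "tags": [{"name": "soil"}, {"name": "hydrology"}],
    "resources": [{"id": "r1", "name": "CSV export",
                   "url": "http://example.org/data/soil-moisture.csv", "format": "CSV"}],
}
print(json.dumps(ckan_to_dcat(package, "https://catalogue.example.org/"), indent=2))
```

In practice the maintained ckanext-dcat mapping is the better choice; the point here is only that CKAN's package/resource model lines up with DCAT's dataset/distribution model.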


Editor's Notes

  • #7 There – two dirty words in one title!
  • #13 The Squerall project has shown promising initial results but needs further development. Performance is better than traditional integration at ingestion time (i.e. mapping everything to a single unified data model), because each query touches only the relevant data rather than searching the whole lot for things that aren't there.