This presentation was given at the ISESS'15 conference in Melbourne.
Abstract. netCDF is a well-known and widely used format to exchange array-oriented scientific data such as grids and time-series. We describe a new convention for encoding netCDF based on Linked Data principles called netCDF-LD. netCDF-LD allows metadata elements, given as string values in current netCDF files, to be given as Linked Data objects. netCDF-LD allows precise semantics to be used for elements and expands the type options beyond lists of controlled terms. Using Uniform Resource Identifiers (URIs) for elements allows them to refer to other Linked Data resources for their type and descriptions. This enables improved data discovery through a generic mechanism for element type identification and adds element type expandability to new Linked Data resources as they become available. By following patterns already established for extending existing formats, netCDF-LD applications can take advantage of existing software for processing Linked Data and supporting more effective data discovery and integration across systems.
See http://link.springer.com/chapter/10.1007%2F978-3-319-15994-2_9
netCDF-LD - Towards linked data conventions for delivery of environmental data using netCDF
1. Towards linked data conventions for delivery of environmental data using netCDF
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
Jonathan Yu, Nicholas Car, Adam Leadbetter*, Bruce A. Simons, and Simon J.D. Cox
CSIRO LWF and BODC/ Marine Institute (Galway)
25 March 2014 | ISESS
… netCDF-LD
2. Outline
1. Big data and netCDF
2. Metadata challenges
3. Linked Data, JSON-LD & CSV-on-the-web
4. netCDF-LD
5. Applications
Towards linked data conventions for delivery of environmental data using netCDF | Jonathan Yu2 |
3. Big data
• “90% of the world’s data has been produced over the last two years”
• Environmental and earth observation has always been big-data…
4. National Soil and Landscape Grid
6. Real-time wind data
7. netCDF
File format
• Persistence layer
• Simple: self-describing, user-accessible “flat file”
Software library
• Java, C++, Python, others
API
• Interfaces to the Data Model
9. netCDF cont’
• Really good at handling array-oriented data – efficient subsetting
• Multi-dimensional grids
• CF conventions: climatology and weather forecasting domains
10. netCDF metadata example
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:standard_name = "total_suspended_solids" ;
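The attribute listing above is plain CDL text, so its metadata can be read with very little machinery. A minimal sketch, assuming only the simple `var:attr = value ;` form shown on this slide (the helper name `parse_attributes` is illustrative and is not part of any netCDF library):

```python
import re

# CDL text copied from the slide (quotes normalised to plain ASCII).
CDL = '''
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:standard_name = "total_suspended_solids" ;
'''

# Matches lines of the form `variable:attribute = value ;`
ATTR = re.compile(r'^\s*(\w+):(\w+)\s*=\s*(.+?)\s*;\s*$')


def parse_attributes(cdl):
    """Collect `var:attr = value ;` lines into {var: {attr: value}}."""
    out = {}
    for line in cdl.splitlines():
        m = ATTR.match(line)
        if not m:
            continue  # skip the variable declaration and blank lines
        var, attr, raw = m.groups()
        if raw.startswith('"') and raw.endswith('"'):
            value = raw[1:-1]               # quoted string attribute
        else:
            value = float(raw.rstrip('f'))  # CDL float literal such as -999.f
        out.setdefault(var, {})[attr] = value
    return out


attrs = parse_attributes(CDL)
print(attrs["Nap_MIM"]["standard_name"])
```

Every value that carries meaning here (`units`, `standard_name`) comes back as an opaque string, which is the limitation the following slides discuss.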
11. Use of netCDF for Chlorophyll observations
12. netCDF challenges
• Encoding format and framework
• “Domain agnostic”, but conventions exist for communities, e.g. the Climate and Forecast (CF) conventions
• CF conventions are limited:
• Use of the ‘standard_name’ attribute to denote the combination of medium/transformation/substance/state/quantity using a particular syntax
• Assumes English
• Standard names take a while to be adopted
• Standard names are often variations of each other
• Units are from UDUNITS, which is managed by Unidata
• Narrow scope of scientific domain: cannot be reused for other domains
13. netCDF Metadata challenges – Semantic heterogeneity
[Figure: three environmental applications (#1, #2, #3), each with its own data and database, label the same kind of observation differently: Chl_MIM; mass_conc_chlorophyll_In_sea_water; mass_conc_chlorophyll_a_In_sea_water]
14. Semantic Heterogeneity… leads to data silos
[Figure: the same three environmental applications (#1, #2, #3) with their data and databases, labelled Chl_MIM, mass_conc_chlorophyll_In_sea_water and mass_conc_chlorophyll_a_In_sea_water; the links between them are crossed out (X), with the annotation “Meetings”]
15. Approaches to address netCDF metadata challenge
1. Strict naming grammars for standard_name
(Peckham 2014) CSDMS Standard Naming Conventions
16. Approaches to address netCDF metadata challenge (2)
2. Break up the various concerns in standard_name into multiple attributes (Yu et al. 2014)
float Nap_MIM(time, latitude, longitude) ;
Nap_MIM:_FillValue = -999.f ;
Nap_MIM:long_name = "TSS, MIM SVDC on Rrs" ;
Nap_MIM:units = "mg/L" ;
Nap_MIM:valid_min = 0.01209607f ;
Nap_MIM:valid_max = 226.9626f ;
Nap_MIM:scaledQuantityKind_id = "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended" ;
Nap_MIM:unit_id = "http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre" ;
Nap_MIM:substanceOrTaxon_id = "http://environment.data.gov.au/water/quality/def/object/solids" ;
Nap_MIM:medium_id = "http://environment.data.gov.au/water/quality/def/object/ocean" ;
Nap_MIM:procedure_id = "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS" ;
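Once each concern has its own attribute, a client can pull the semantic bindings out of a variable without parsing any naming grammar. A minimal sketch, assuming the attributes have already been read into a dictionary; the `<facet>_id` suffix rule is taken from the listing above, while the helper `semantic_bindings` is purely illustrative:

```python
from urllib.parse import urlparse

# Attributes of Nap_MIM as listed on the slide.
nap_mim = {
    "long_name": "TSS, MIM SVDC on Rrs",
    "units": "mg/L",
    "scaledQuantityKind_id": "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended",
    "unit_id": "http://environment.data.gov.au/water/quality/def/unit/MilliGramsPerLitre",
    "substanceOrTaxon_id": "http://environment.data.gov.au/water/quality/def/object/solids",
    "medium_id": "http://environment.data.gov.au/water/quality/def/object/ocean",
    "procedure_id": "http://data.ereefs.org.au/ocean-colour/MIM_SVDC_RRS",
}


def semantic_bindings(attributes):
    """Return {facet: uri} for every `<facet>_id` attribute holding an http(s) URI."""
    bindings = {}
    for name, value in attributes.items():
        if not name.endswith("_id"):
            continue  # ordinary netCDF attribute, e.g. long_name or units
        parts = urlparse(str(value))
        if parts.scheme in ("http", "https") and parts.netloc:
            bindings[name[:-3]] = value  # strip the "_id" suffix
    return bindings


bindings = semantic_bindings(nap_mim)
for facet, uri in sorted(bindings.items()):
    print(facet, "->", uri)
```

Each facet (quantity kind, unit, substance, medium, procedure) can then be compared or dereferenced on its own.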
17. Harmonised Publish, Discovery, Access and Use
[Figure: Enviro Applications #1, #2 and #3, each with its own data and database, describe/publish the data and query/use data through shared vocabulary terms, for example:]
substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
• Relies on community agreed vocabularies
• Need to communicate more consistently
• Requires shared, precise, agreed semantics
18. Linked Data, JSON-LD & CSV-on-the-web
"LOD Cloud Diagram as of September 2011" by Anja Jentzsch
19. Linked Data
• A method of connecting related data, existing vocabularies and other semantics using web links (URIs) and a language for describing resources (RDF)
• Applications can then query the data, follow the links and infer new insights from the data
20. JSON-LD
{
"name": "John Lennon",
"born": "1940-10-09",
"spouse":
"http://dbpedia.org/resource/Cynthia_Lennon"
}
[Callout: JSON-LD decorators]
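The JSON above becomes Linked Data once it is decorated with the JSON-LD keywords `@context` and `@id`. A minimal sketch of what such decoration might look like and how the terms then expand to triples; the property IRIs in the context (FOAF and schema.org terms) are assumptions chosen for illustration, and `to_triples` handles only this flat case, not the full JSON-LD expansion algorithm:

```python
import json

doc = {
    # Assumed context: maps each plain JSON key to a property IRI.
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "born": {
            "@id": "http://schema.org/birthDate",
            "@type": "http://www.w3.org/2001/XMLSchema#date",
        },
        "spouse": {"@id": "http://schema.org/spouse", "@type": "@id"},
    },
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}


def to_triples(document):
    """Expand a flat JSON-LD document with an inline context into N-Triples lines."""
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for term, value in document.items():
        if term.startswith("@"):
            continue  # keywords are not properties
        definition = context[term]
        if isinstance(definition, str):
            predicate, kind = definition, None
        else:
            predicate, kind = definition["@id"], definition.get("@type")
        if kind == "@id":
            obj = f"<{value}>"              # value is itself a resource
        elif kind:
            obj = f'"{value}"^^<{kind}>'    # typed literal
        else:
            obj = json.dumps(value)         # plain string literal
        triples.append(f"<{subject}> <{predicate}> {obj} .")
    return triples


triples = to_triples(doc)
print("\n".join(triples))
```

The original keys and values are untouched; only the added keywords tell a Linked Data client how to interpret them.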
21. JSON-LD and Semantic Web
[Figure: RDF graph in which the node http://dbpedia.org/resource/John_Lennon is linked to the node http://dbpedia.org/resource/Cynthia_Lennon and to the literals “John Lennon” and 1940-10-09]
22. CSV-on-the-web
• Linked data for CSV tabular data
• Add context to tables
23. Example: Domain vocab term
24. Example: Domain vocab term
...
wqp:chlorophyll_a_concentration
a skos:Concept, op:ScaledQuantityKind,
qudt:ChemistryQuantityKind ;
skos:broader wqp:chlorophyll_concentration ;
skos:prefLabel "chlorophyll a concentration"@en ;
26. netCDF-LD: Linkifying netCDF
‘Context’ boilerplates
27. netCDF-LD: Linkifying netCDF
[Figure: an “Air Temperature” definition linked to terms in the eReefs Vocabularies: Air (medium), Temperature (quantity kind) and Kelvin (UoM)]
29. Assigning URIs to variable level attributes
z:units = "meters";
z:units_ref = "http://qudt.org/vocab/unit#Meter";
z:a = "http://environment.data.gov.au/def/op#quantityKind";
z:dcPartOf = "http://foo.bar/linked_netCDF_example";
z:valid_range = 0., 5000.;
30. netCDF-LD to RDF
@prefix unit: <http://qudt.org/vocab/unit#> .
@prefix qudt: <http://qudt.org/1.1/schema/qudt#> .
@prefix op: <http://environment.data.gov.au/def/op#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
_:z qudt:unit unit:Meter;
a op:ScaledQuantityKind;
dcterms:isPartOf <http://foo.bar/linked_netCDF_example>.
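The step from URI-valued variable attributes to RDF can be sketched in a few lines. This is an illustration only: the attribute-to-predicate table below is read off the two slides (`units_ref` to `qudt:unit`, `a` to `rdf:type`, `dcPartOf` to `dcterms:isPartOf`) and is an assumption, not a published netCDF-LD rule. The attribute values are passed through unchanged, so the type emitted is the `op#quantityKind` value from the attribute listing rather than the `op:ScaledQuantityKind` shown in the Turtle.

```python
# Assumed mapping from netCDF-LD attribute names to RDF predicates.
PREDICATES = {
    "units_ref": "http://qudt.org/1.1/schema/qudt#unit",
    "a": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "dcPartOf": "http://purl.org/dc/terms/isPartOf",
}

# Attributes of variable z as listed on the previous slide.
z_attributes = {
    "units": "meters",
    "units_ref": "http://qudt.org/vocab/unit#Meter",
    "a": "http://environment.data.gov.au/def/op#quantityKind",
    "dcPartOf": "http://foo.bar/linked_netCDF_example",
    "valid_range": (0.0, 5000.0),
}


def variable_to_ntriples(name, attributes):
    """Emit one N-Triples line per attribute that has a known predicate mapping."""
    subject = f"_:{name}"  # the variable becomes a blank node, as in the Turtle above
    lines = []
    for attr, value in attributes.items():
        predicate = PREDICATES.get(attr)
        if predicate is None:
            continue  # plain netCDF attributes (units, valid_range) are left alone
        lines.append(f"{subject} <{predicate}> <{value}> .")
    return lines


triples = variable_to_ntriples("z", z_attributes)
print("\n".join(triples))
```

Existing attributes such as `units` and `valid_range` are ignored by the conversion, so a file stays readable by tools that know nothing about the linked attributes.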
31. Use of netCDF-LD to support Data Discovery
32. Data annotated with bindings to vocab URIs
[Figure: data served from a THREDDS catalog is annotated with bindings to URIs from domain vocabularies (Water Quality at environment.data.gov.au) and a quantities/units ontology (QUDT):]
substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
unit = http://qudt.org/vocab/unit#Unitless
medium = http://environment.data.gov.au/def/feature/ocean
33. DBL Harvesting and End Use
[Figure: the Data Brokering Layer harvests the annotated THREDDS catalog, together with the domain vocabularies (Water Quality at environment.data.gov.au) and the quantities/units ontology (QUDT); end users search through a client application, e.g. for “chlorophyll”. The harvested bindings are:]
substanceOrTaxon = http://environment.data.gov.au/def/object/chlorophyll
scaledQuantityKind = http://environment.data.gov.au/def/property/chlorophyll_concentration
unit = http://qudt.org/vocab/unit#Unitless
medium = http://environment.data.gov.au/def/feature/ocean
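How a broker might answer such a search can be sketched with a dictionary index. Everything below is hypothetical except the vocabulary idea itself: the dataset identifiers are invented, the catalog is hard-coded rather than harvested from THREDDS, and the chlorophyll_a_concentration URI is an assumed expansion of the `wqp:` prefix used in the vocabulary example earlier in the deck.

```python
BASE = "http://environment.data.gov.au/def/property/"

# skos:broader links (child -> parent), mirroring the vocabulary example.
BROADER = {
    BASE + "chlorophyll_a_concentration": BASE + "chlorophyll_concentration",
}

# Hypothetical harvested index: dataset id -> scaledQuantityKind URI.
CATALOG = {
    "dataset-1": BASE + "chlorophyll_concentration",
    "dataset-2": BASE + "chlorophyll_a_concentration",
    "dataset-3": "http://environment.data.gov.au/water/quality/def/property/solids-total_suspended",
}


def is_a(uri, target):
    """True if uri equals target or reaches it by following skos:broader links."""
    seen = set()
    while uri is not None and uri not in seen:
        if uri == target:
            return True
        seen.add(uri)            # guard against cycles in the vocabulary
        uri = BROADER.get(uri)
    return False


def discover(target):
    """Return ids of datasets whose quantity kind is the target or a narrower term."""
    return sorted(ds for ds, uri in CATALOG.items() if is_a(uri, target))


hits = discover(BASE + "chlorophyll_concentration")
print(hits)
```

A search on the broader concept also returns the dataset annotated with the narrower one, which plain string matching on variable names would miss.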
34. netCDF-LD benefits
• Enhanced metadata – better self-description for any domain
• Allows binding to rich semantics from existing vocabularies being published via SKOS/OWL/RDF in multiple domains
• Linked Data approach: consistent with Semantic Web technologies, RDF, JSON-LD, CSV-on-the-web
• Hook into toolkits and software libraries for data discovery via Semantic Web technologies and other Linked Data already being published online
35. Future/Current Work
1. netCDF-LD requires adoption at community level: need to test the proposal with the OGC, Unidata and CF communities
2. Tooling for parsing netCDF-LD
3. Encoding large volumes using netCDF-LD
4. Demonstrate Data Broker concept using netCDF-LD
36. Summary
• netCDF is really good at big data – esp. array-oriented data
• Limitations with the current netCDF metadata structure
• *-LD pattern is emerging across multiple formats (JSON, CSV)
• Propose netCDF-LD, a non-intrusive extension to netCDF that facilitates better self-described datasets
• netCDF-LD has the potential to enhance data discovery from environmental datasets across to other Linked Data being published online
37. LAND AND WATER
Thank you
Land and Water
Jonathan Yu
Research Software Engineer
t +61 3 9252 6440
e jonathan.yu@csiro.au