Towards a computable standard for
Knowledge Graph Metadata
Michel Dumontier
WG1 Lead
COST Action Distributed Knowledge Graphs
W3C CG Knowledge Graph Construction
June 20, 2022
Metadata are information about data. They often provide a
description, context, provenance, and meaning to the data.
Informative metadata
Technical and administrative details
Descriptive metadata
Information to understand and interpret the data
Relational metadata
Captures the relationship between the data item and other
entities
Data: jpg image file
Informative metadata:
● Size: 155kb
● Date created: 2015-05-25
● Filetype: jpg
Descriptive metadata
● Title: MRI of the head
● Generated by: Ingenia 3.0T
Relational metadata
● About: EHR092376573
● Clinical Study: CT7812356
Image source: https://pixabay.com/photo-782457/
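The three kinds of metadata in this example can be sketched as a small data structure. This is only an illustration: the class name MetadataRecord and the snake_case keys are assumptions made here, and the values are the ones listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical container that groups metadata by the three kinds above."""
    informative: dict = field(default_factory=dict)   # technical/administrative details
    descriptive: dict = field(default_factory=dict)   # helps interpret the data
    relational: dict = field(default_factory=dict)    # links to other entities

    def flatten(self):
        """Return (category, key, value) rows for every metadata entry."""
        rows = []
        for category in ("informative", "descriptive", "relational"):
            for key, value in getattr(self, category).items():
                rows.append((category, key, value))
        return rows


# Values taken from the jpg image example on the slide.
mri_image = MetadataRecord(
    informative={"size": "155kb", "date_created": "2015-05-25", "filetype": "jpg"},
    descriptive={"title": "MRI of the head", "generated_by": "Ingenia 3.0T"},
    relational={"about": "EHR092376573", "clinical_study": "CT7812356"},
)

for category, key, value in mri_image.flatten():
    print(f"{category:12} {key:15} {value}")
```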
Metadata play a key role in finding, understanding, and reusing
digital (and non-digital) assets.
Poor quality (meta)data impedes reuse
Which data elements are in the data, and what is the range of their values?
http://www.nature.com/articles/sdata201618
● What is the name of the KG?
● Who made the KG?
● When was it created or released?
● How was it created?
● What is the KG about?
● What language(s) are used in the KG?
● What kinds of types, relations, and
attributes are in the KG?
● How is the KG accessible? What data
standards does it use?
● What license is it released under?
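One way to make these questions checkable is to pair each with a metadata property and report which ones a given description leaves unanswered. The sketch below is an assumption-laden illustration: the schema.org property names (name, creator, datePublished, about, inLanguage, distribution, license) are real terms, but the keys prefixed with x_ and the example KG with all of its values are hypothetical placeholders, not part of the specification.

```python
import json

# Each question from the list above, paired with a property name.
# Keys starting with "x_" are hypothetical placeholders, not schema.org terms.
QUESTION_TO_PROPERTY = {
    "What is the name of the KG?": "name",
    "Who made the KG?": "creator",
    "When was it created or released?": "datePublished",
    "How was it created?": "x_creationMethod",
    "What is the KG about?": "about",
    "What language(s) are used in the KG?": "inLanguage",
    "What kinds of types, relations, and attributes are in the KG?": "x_schemaSummary",
    "How is the KG accessible? What data standards does it use?": "distribution",
    "What license is it released under?": "license",
}


def unanswered(description):
    """Return the questions whose property is missing or empty."""
    return [q for q, prop in QUESTION_TO_PROPERTY.items() if not description.get(prop)]


# Hypothetical KG description; every value is a placeholder.
example_kg = {
    "name": "Example KG",
    "creator": "Example Lab",
    "datePublished": "2022-06-20",
    "about": "Illustrative knowledge graph",
    "inLanguage": ["en"],
    "distribution": {"contentUrl": "https://example.org/kg.ttl",
                     "encodingFormat": "text/turtle"},
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
}

print(json.dumps(example_kg, indent=2))
for question in unanswered(example_kg):
    print("Unanswered:", question)
```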
A guide to describing data with RDF
vocabularies
● Identifiers
● Descriptors
● Versioning
● Attribution
● Provenance
● Content summarization
Mandatory, recommended, optional descriptors
Reference editor and validation
http://www.w3.org/TR/hcls-dataset/
Metagraph
COST ACTION Distributed Knowledge Graphs
WG1 is concerned with how knowledge graphs can be made
available from various sources, systems and formats, in a scalable,
serviceable, distributed, and FAIR (Findable, Accessible, Interoperable,
and Reusable) manner.
The WG will define requirements and explore ideas, methods, and
tools to make FAIR distributed knowledge graphs, with special
attention as to whether the data are offline or online, and what to do
when the data are privacy-sensitive.
https://cost-dkg.eu
KG Metadata Specification
Purpose: To provide concrete guidance on
which metadata to include in the
description of a KG.
People involved:
● María del Mar Roldán, University of Malaga, Spain.
● Manuel Paneque, University of Malaga, Spain.
● Matthijs Sloep, Maastricht University, The Netherlands
● Ilan Kernerman, K Dictionaries - Lexicala, Israel
● Jinzhou Yang, Maastricht University, The Netherlands
● Maxime Lefrançois, MINES Saint-Étienne, France
● Michel Dumontier, Maastricht University
● Katja Hose, Aalborg University, Denmark
● Flavio De Paoli, University of Milan-Bicocca, Italy
● Chang Sun, Maastricht University
● Maryam Mohammadi, Maastricht University, The
Netherlands
● Remzi Celebi, Maastricht University, The Netherlands
● Erkan Yasar, Ege University, Turkey
DKG Workshop on Metadata4KG
May 18-20, 2022, Lyon
Approach:
1. Examined relevant schemas
2. Brainstormed KG specific metadata
3. Discussed candidate metadata elements
4. Identified pertinent schema.org and RDF
vocabularies
5. Defined datatype ranges
6. Discussed their cardinality
7. Voted on their inclusion
8. Defined a minimal set of metadata elements
9. Re-examined cardinality constraints and added
a few more candidates
10. Included Wikidata metadata as an example
KG specific metadata?
Meta-graph
Graph statistics
Vocabularies used
Query API (SPARQL, GraphQL, etc.)
example queries
KG schema
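Two of these KG-specific items, graph statistics and vocabularies used, can be derived from the graph itself. The following is a minimal sketch over an in-memory list of (subject, predicate, object) string triples; the function names, the statistic names, and the toy triples are assumptions made for illustration, and a real KG would more likely be summarized through its query API.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def namespace(iri):
    """Cut an IRI after its last '#' or '/' to approximate its vocabulary namespace."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def graph_statistics(triples):
    """Compute simple summary statistics for a list of (s, p, o) string triples."""
    triples = list(triples)
    predicates = Counter(p for _, p, _ in triples)
    classes = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return {
        "triples": len(triples),
        "distinct_subjects": len({s for s, _, _ in triples}),
        "distinct_predicates": len(predicates),
        "class_instances": dict(classes),
        "vocabularies": sorted({namespace(p) for p in predicates}),
    }


# Toy graph with hypothetical example.org resources.
EX = "https://example.org/"
SDO = "https://schema.org/"
toy_triples = [
    (EX + "alice", RDF_TYPE, SDO + "Person"),
    (EX + "alice", SDO + "name", "Alice"),
    (EX + "alice", SDO + "knows", EX + "bob"),
    (EX + "bob", RDF_TYPE, SDO + "Person"),
    (EX + "bob", SDO + "name", "Bob"),
]

stats = graph_statistics(toy_triples)
for key, value in stats.items():
    print(key, "=", value)
```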
KG Metadata Specification: Results - 33 elements
Future Work
Ensure relevance, completeness, and correctness of proposed schema, and
to potentially uncover other unmet needs
Define key attributes for the metadata document (e.g. creator, license, date,
schema)
Formalize the metadata specification into a computable standard (e.g. SHACL,
ShEx, JSON Schema, etc.).
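To give a feel for what "computable" could mean here, the sketch below checks a metadata record against per-element datatype and cardinality rules in plain Python. It is a stand-in for the idea, not SHACL, ShEx, or JSON Schema, and the element names and rules are hypothetical; they are not the 33 elements of the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementRule:
    """Hypothetical rule: element name, expected datatype, and cardinality bounds."""
    name: str
    datatype: type
    min_count: int = 0
    max_count: Optional[int] = None  # None means unbounded


def validate(record, rules):
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    for rule in rules:
        raw = record.get(rule.name, [])
        values = raw if isinstance(raw, list) else [raw]
        if len(values) < rule.min_count:
            errors.append(f"{rule.name}: expected at least {rule.min_count} "
                          f"value(s), found {len(values)}")
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append(f"{rule.name}: expected at most {rule.max_count} "
                          f"value(s), found {len(values)}")
        for value in values:
            if not isinstance(value, rule.datatype):
                errors.append(f"{rule.name}: {value!r} is not of type "
                              f"{rule.datatype.__name__}")
    return errors


# Illustrative rules only; names and cardinalities are assumptions.
RULES = [
    ElementRule("name", str, min_count=1, max_count=1),
    ElementRule("license", str, min_count=1, max_count=1),
    ElementRule("inLanguage", str),
    ElementRule("tripleCount", int, max_count=1),
]

record = {"name": "Example KG", "inLanguage": ["en", "fr"], "tripleCount": "many"}
errors = validate(record, RULES)
for message in errors:
    print(message)
```

A declarative shape language would express the same constraints as data rather than code, which is what makes the specification shareable and tool-independent.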
nanobench SHAPE Publisher
https://collaboratory.semanticscience.org/shape-publisher
FAIRnotator (based on CEDAR workbench)
Future Work
Ensure relevance, completeness, and correctness of proposed schema, and
to potentially uncover other unmet needs
Define key attributes for the metadata document (e.g. creator, license, date,
schema)
Formalize the metadata specification into a computable standard (e.g. SHACL,
ShEx, JSON Schema, etc.).
Build a repository of distributed knowledge graphs that relies on the
metadata specification, along with other representations.
Can we do this in the W3C Community Group on Knowledge Graph Construction?
Notes from meeting
Positive indication to join forces.
The Profiles Vocabulary - https://www.w3.org/TR/dx-prof/
Automated metadata generation for linked data generation and publishing workflows
https://events.linkeddata.org/ldow2016/papers/LDOW2016_paper_04.pdf
Agreed to biweekly calls, 3-5pm, until mid-July, then again later in the fall.

A metadata standard for Knowledge Graphs
