Meaningful data and
semantic interoperability:
utopia or a possible reality
(in the Italian public sector)?
Giorgia Lodi - giorgia.lodi@cnr.it
Institute of Cognitive Sciences and Technologies (ISTC) of CNR
Semantic Technology Laboratory (STLab)
16th of April 2021
Data interoperability context
Bilateral
agreements for
specific uses
The rest is more
or less a set of
nice islands!
Data quality examples: null values
In the address column there are also
postcodes and telephone numbers!
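A minimal sketch of how such a mixed column could be profiled before publication. The patterns are assumptions made for illustration (a five-digit postcode and a loose phone heuristic), not rules taken from the dataset shown in the slide.

```python
import re

# Hypothetical patterns: a five-digit postcode and a loose phone heuristic.
POSTCODE = re.compile(r"^\d{5}$")
PHONE = re.compile(r"^\+?[\d\s./-]{6,15}$")


def classify_address_value(value):
    """Return a rough label for a value found in an 'address' column."""
    text = (value or "").strip()
    if not text:
        return "null"
    if POSTCODE.match(text):
        return "postcode"
    if PHONE.match(text):
        return "phone"
    return "address"


# Made-up sample values, including empty and missing entries.
rows = ["Via Roma 12", "00185", "+39 06 4993 1", "", None]
labels = [classify_address_value(v) for v in rows]
print(labels)
```

Counting the labels per column gives a quick signal of how much of an "address" field actually holds addresses.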
Data quality examples: missing semantics
With some context
probably it can be
either «active» or «not
active» or «open» or
«closed»?
Data quality examples: semantics mismatches
Data quality examples: semantics mismatches
Data quality examples: strings and not codes
• A change in the name of a vaccine
provider causes trouble for many
applications built on this data. No
codes are used to identify providers,
only strings
• Many datasets are heavily based
on unstructured text!
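The problem can be illustrated with a small sketch. The provider names and the code `VP001` are hypothetical, invented only to show why a lookup keyed on a display string breaks after a rename while a lookup keyed on a stable code does not.

```python
# Application-side lookups: one keyed on the display name, one on a stable code.
# All names and codes here are made up for illustration.
color_by_provider_name = {"Acme Vaccines": "blue"}
color_by_provider_code = {"VP001": "blue"}

old_record = {"provider_code": "VP001", "provider_name": "Acme Vaccines", "doses": 120}
new_record = {"provider_code": "VP001", "provider_name": "Acme Vaccines Intl.", "doses": 80}


def color_by_name(record):
    """Fragile: silently returns None once the provider is renamed."""
    return color_by_provider_name.get(record["provider_name"])


def color_by_code(record):
    """Robust: the code stays the same when the label changes."""
    return color_by_provider_code.get(record["provider_code"])


print(color_by_name(old_record), color_by_name(new_record))
print(color_by_code(old_record), color_by_code(new_record))
```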
Data quality examples: accuracy
Data quality examples
Can we trust this data?
Garbage in - Garbage out law!
Terrific
thing
Paradigm shift: Web as a blueprint
Persistent URIs for things: from strings to codes!
Dereferenceable HTTP URIs and Content Negotiation
Open Standards
Standardized way to represent and query (APIs) data
Native data integration: link data to other data
Separation of data from applications
In other words, can we leverage the semantic
web to fix interoperability and data quality issues?
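As a rough illustration of content negotiation, the sketch below prepares HTTP requests for the same URI with different `Accept` headers. The URI is a placeholder on example.org, not a real identifier from any Italian catalogue, and the requests are only built, never sent.

```python
from urllib.request import Request

# Hypothetical persistent URI for a "thing" (placeholder domain).
THING_URI = "https://example.org/id/provider/VP001"


def negotiated_request(uri, media_type):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, different representations: HTML for people, RDF for machines.
html_request = negotiated_request(THING_URI, "text/html")
turtle_request = negotiated_request(THING_URI, "text/turtle")
print(turtle_request.full_url, turtle_request.get_header("Accept"))
```

A server that supports content negotiation would answer the first request with a web page and the second with an RDF serialization of the same resource.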
Open standards: RDF, OWL, SPARQL
RDF - standard for data representation (data model) in the Web
• Data is in the form of triples: subject - predicate - object
(interlinked graph - Linked Data)
• Subject and predicate have URIs
• Object can be a literal or have a URI (subject of other triples)
• RDF* is coming: it allows one to predicate over triples
  • << subject predicate object >> predicate object .
OWL - a standard computational logic-based language used to represent
rich and complex knowledge about things and relations between things
SPARQL - a standard protocol and language
• It allows one to query data represented as triples, based on the principle of
triple pattern matching
• SPARQL* is coming: it allows one to query data in RDF*
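To make the triple model and triple pattern matching concrete, here is a toy sketch in plain Python. It is not a SPARQL engine: prefixed names stand in for full URIs, the data is illustrative, and the `match` helper is a hypothetical function that handles a single pattern only.

```python
# Triples as (subject, predicate, object) tuples; prefixed names stand in for URIs.
triples = {
    ("ex:Miles_Davis", "rdf:type", "mysch:Artist"),
    ("ex:Miles_Davis", "mysch:plays", "ex:Jazz"),
    ("ex:Jazz", "rdf:type", "mysch:Music"),
}


def match(pattern, graph):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in graph:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly what SELECT ?who WHERE { ?who mysch:plays ex:Jazz } asks for.
players = sorted(b["?who"] for b in match(("?who", "mysch:plays", "ex:Jazz"), triples))
print(players)
```

A real query combines several such patterns and joins their bindings on shared variables.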
Web of data
https://lod-cloud.net/
Ontologies
COMPUTER SCIENCE - Ontology is a
formal, explicit and shared
representation (contextualization)
of a knowledge domain, defined
on the basis of specific
requirements to be collected
A set of logical axioms that
describe entities and their
relations
https://lov.linkeddata.es/dataset/lov/ https://bioportal.bioontology.org/
Knowledge graph
Ontology
Linked Data
Knowledge graph = Ontology + Linked Data
OntoPiA open ontology framework
30 ontologies, 40 controlled vocabularies, more than 15,500
defined logical axioms
https://github.com/italia/daf-ontologie-vocabolari-controllati
Knowledge graph design and production process
Requirements Collection (user stories, competency questions)
Requirements under different forms
Ontology Design Patterns Identification
Draft ontological modules
Ontologies at the state of the art
Analysis of existing ontologies
Ontology Design Patterns at the state of the art
Modularization
Direct reuse and indirect reuse of ontologies
Final OWL ontologies + OWL alignment modules
Data Production
KG refactoring & enrichment (entity deduplication, disambiguation, linking)
RDF Data
Testing
Knowledge graph (ontology and Linked Data) published
V. Presutti, E. Daga, A. Gangemi, E. Blomqvist. eXtreme Design with Content Ontology Design Patterns.
Workshop on Ontology Patterns, 83-97; 2009
V. Presutti, G. Lodi, A.G. Nuzzolese, A. Gangemi, S. Peroni, L. Asprino. The Role of Ontology Design
Patterns in Linked Data Projects. International Conference on Conceptual Modeling (ER); 2016
Ontology design patterns
• Reusable modelling solutions to recurrent ontological modelling
problems [1]. Useful to reduce the arbitrariness of ontology design
• Research in the field has demonstrated that their reuse can:
  • Reduce modelling errors
  • Help in detecting requirements that are not so evident
  • Help in improving the overall ontology quality [2]
  • Help in representing data that is then more sound [3]
[1] http://ontologydesignpatterns.org/wiki/Main_Page
[2] Blomqvist E., Gangemi A., Presutti V. Experiments in Pattern-based Ontology Design,
Proceedings of KCAP09, Los Angeles, ACM Press, 2009
[3] Paulheim, H. and Gangemi, A. Serving DBpedia with DOLCE – More than Just Adding a Cherry
on Top. Proceedings of ISWC2015, the Fourteenth International Semantic Web Conference, LNCS,
Springer, 2015
Ontology design patterns
Time-indexed situation design pattern
Time-indexed situation design pattern applied to Roles of Agents in Services
Inference
Schema
mysch:Musical_Artist owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        mysch:Artist
        [ a owl:Restriction ; owl:onProperty mysch:plays ; owl:someValuesFrom mysch:Music ]
    )
] .
Data
ex:Miles_Davis a mysch:Artist ;
mysch:plays ex:Jazz .
ex:Jazz a mysch:Music .
Inferred Data
→ ex:Miles_Davis a mysch:Musical_Artist .
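The inference step can be mimicked with a few lines of plain Python. This is only a sketch: the rule is hard-coded for this one class definition (an Artist that plays some Music), it covers only the direction that derives new class membership, and a real system would rely on an OWL reasoner instead.

```python
# The slide's data as (subject, predicate, object) tuples.
data = {
    ("ex:Miles_Davis", "rdf:type", "mysch:Artist"),
    ("ex:Miles_Davis", "mysch:plays", "ex:Jazz"),
    ("ex:Jazz", "rdf:type", "mysch:Music"),
}


def infer_musical_artists(graph):
    """Derive Musical_Artist membership: Artist and (plays some Music)."""
    types = {}
    for subject, predicate, obj in graph:
        if predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)
    inferred = set()
    for subject, predicate, obj in graph:
        if (
            predicate == "mysch:plays"
            and "mysch:Artist" in types.get(subject, set())
            and "mysch:Music" in types.get(obj, set())
        ):
            inferred.add((subject, "rdf:type", "mysch:Musical_Artist"))
    return inferred


inferred_triples = infer_musical_artists(data)
print(inferred_triples)
```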
What can we do with all of this?
System B
System A
System C
System D
Federated Data
Catalogue
(e.g., Google
Dataset Search,
European Data
Portal, dati.gov.it)
Interoperable
Federated Systems
Knowledge graph
What can we do with all of this?
https://catalogo.beniculturali.it/
Based on ArCo Knowledge graph
(169,151,644 triples, linking to 20,479
distinct entities of other datasets)
What can we do with all of this?
MARIO robot - http://www.mario-project.eu/portal/
Use knowledge graph to navigate a virtual immersive environment
What can we do with all of this?
Broken data silos! First European knowledge graph
on water (marine and inland) and health data
Duration
September 2020 to August 2023
EU Programme
2019 CEF Telecom Public Open Data
A co-creation programme has
been launched in order to
engage with potential re-users
since the very first development
phases of the project
Visit https://whowproject.eu/
for more information
Is it a perfect world?
• As with everything, there are still many gaps
and open issues
• Quite significant human effort is needed to represent the
knowledge of all possible domains according to
different requirements
  • But many are available as open resources
• Still a lot of hidden knowledge represented as
unstructured text (many literals)
• Data on the web may be incomplete
FRED
• It is a machine reading tool that produces knowledge graphs from
text by relying on Combinatory Categorial Grammar, Discourse
Representation Theory, Linguistic Frames, and Ontology Design
Patterns
The president Joe Biden decided to stop vaccinations with Janssen
vaccine
[1] Gangemi, A., Presutti, V., Reforgiato Recupero, D., Nuzzolese, A. G., Draicchio, F., & Mongiovì,
M. (2017). Semantic web machine reading with FRED. Semantic Web, 8(6), 873-893.
http://wit.istc.cnr.it/stlab-tools/fred/demo/?
Using machine learning for missing data
GOAL
• To automatically generate a-cd:iconclassCode triples
for the 25k entities missing a code
HOW
• Using linear classifiers trained with stochastic gradient
descent (linear SGD) on the entities that have an
iconclassCode
PRELIMINARY RESULTS
• Ten binary classifiers: one for each of the ten
general categories
• For each of them, additional classifiers are trained
for subcategories, etc.
• A total of 10 + 140 classifiers tested so far
• Training: > 80 entities per classifier
• Precision, Recall > 75%
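The sketch below shows the general shape of one such binary classifier and of its evaluation. It is a stand-in, not the model used in the project: it applies perceptron-style SGD updates to made-up two-dimensional feature vectors, whereas the real features, loss, and hyperparameters are not given in the slides.

```python
def train_linear_sgd(samples, labels, epochs=10, learning_rate=1.0):
    """Train a binary linear classifier with perceptron-style SGD updates."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            target = 1.0 if label else -1.0
            score = sum(w * x for w, x in zip(weights, features)) + bias
            if target * score <= 0:  # misclassified or on the boundary: update
                weights = [w + learning_rate * target * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * target
    return weights, bias


def predict(model, features):
    """Return True when the linear score is positive."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def precision_recall(predicted, actual):
    """Compute precision and recall for the positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, made-up data: True means "belongs to this category".
train_x = [(2.0, 0.0), (1.5, 0.5), (0.0, 2.0), (0.5, 1.5)]
train_y = [True, True, False, False]
model = train_linear_sgd(train_x, train_y)

test_x = [(1.8, 0.2), (0.1, 1.9)]
test_y = [True, False]
predictions = [predict(model, x) for x in test_x]
precision, recall = precision_recall(predictions, test_y)
print(precision, recall)
```

In the hierarchical setup described above, one such classifier would be trained per general category and further ones per subcategory, each evaluated on held-out entities.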
Conclusions
• Knowledge representation and knowledge graphs are
gaining momentum (documented in the Gartner hype
cycle) in industry and in some public institutions
• Separation of data from applications
• New Web from its inventor, Sir Tim Berners-Lee: Solid
• Additional developments for separating data from rules and from
applications
• Third wave of AI: combining neural and symbolic AI
• We are open to collaborations on new Horizon Europe
projects!
Thank you for your attention!

Semantic Interoperability - knowledge graphs
