Session III Census and registers - M. Scannapieco, The Italian Integrated System of Statistical Registers: Design and Implementation of an Ontology-based Data
1. The Italian Integrated System of
Statistical Registers
Design and Implementation
of an Ontology-Based
Data Integration Architecture
Roberta Radini, Monica Scannapieco, Laura Tosco
Directorate for Methodology and Statistical Process Design, Istat
2. The project: the Italian Integrated System of
Statistical Registers (ISSR)
ISSR as an ontology-based data integration system
Piloting activities
Next steps
Outline
3. Significant revision of
statistical production
processes based on an
Integrated System of
Statistical Registers (ISSR)
Global view: Identification
and estimation for the whole
integrated system of units
and variables
Single logical environment to
support the consistency of
statistical production
processes
The Project: ISSR
Individuals
and Families
Economic
Units
Places
4. Requirement: Need to integrate
concepts belonging to different
thematic areas
Adoption of the ontology-based
data management approach for
accessing, integrating and
managing data sources
ISSR as an ontology-based data integration system - 1
Ontology-based Data Management (OBDM) is a new paradigm, rooted
in the idea of using Database Theory and Semantic Technologies for
data management.
OBDM is characterized by the following principles:
Let data reside where they are (no need to move data)
Define a logic-based conceptual specification of the domain of interest
(ontology)
Map the ontology to the concrete data sources
Express services/queries over the ontology and automatically obtain
answers
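The four principles above can be illustrated with a minimal sketch. It is not the Mastro system and does not reflect the ISSR implementation: the table `tab_pers`, its rows, the `MAPPING` dictionary and the `answer` function are all hypothetical, and an in-memory SQLite database stands in for a real data source. The data stay in the source table, the ontology is reduced to two predicate names, the mapping ties each predicate to a SQL query, and a query over the ontology is answered by running the mapped SQL on the fly.

```python
import sqlite3

# Hypothetical data source: the data stay here and are never copied elsewhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_pers (idp TEXT, date_of_birth TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO tab_pers VALUES (?, ?, ?)",
    [("p1", "1980-01-01", "Via Roma 1"), ("p2", None, "Via Roma 1")],
)

# Hypothetical mapping: each ontology predicate is defined by a SQL query
# over the source. The ontology itself is reduced to these predicate names.
MAPPING = {
    "Person": "SELECT idp FROM tab_pers",
    "date_of_birth": (
        "SELECT idp, date_of_birth FROM tab_pers "
        "WHERE date_of_birth IS NOT NULL"
    ),
}


def answer(predicate):
    """Answer an atomic query over the ontology by running the mapped SQL."""
    return sorted(conn.execute(MAPPING[predicate]).fetchall())


persons = answer("Person")
births = answer("date_of_birth")
print(persons)
print(births)
```

A real OBDM system also reasons over the ontology axioms when rewriting a query; this sketch only shows the mapping and virtual-access part.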
5. ISSR as an ontology-based data integration system - 2
Ontology: formal, shared and
explicit representation of the
conceptualization of the domain of
interest, expressed in a
formal language that makes it
“machine-actionable”
Individuals and
Families
Economic
Units
Places
Ontology
Mapping
Data sources: heterogeneous
both semantically
and technologically
Mapping: rules expressing the
correspondence between data
and concepts/attributes of the
domain of interest
6. Detailed benefits
Global ontology layer
Allows the coexistence of
different definitions of a
concept according to
different contexts, allowing
consistent access to the
underlying data
Offers reasoning capabilities
that make it possible to “infer” new
knowledge
Metadata
Complex because of their
sheer size and the lack of direct
control. Ontologies can cope
with such complexity
Represented and accessible
through an IT system: so far
statistical metadata models
are “not represented” in
formal languages
Coupled with data
Integration layer
Allows data sources
to be virtualized
Performs “on-the-fly” query
answering
Transparency and
flexibility
Quality
Automated
Metadata
Governance
7. Raw Data Area
Working Data Area
Validated Data Area
ISSR
Ontology for:
Semantic Data
Integration
Data Access
Data Quality
Check
Ontology for:
Semantic Data
Integration
Data Access
Data Stack
8. ISSR Prototype: The Istat - UNIROMA 1 Experience
Persons, Families and
Cohabitation Ontology
Use of the Mastro system for OBDM:
http://www.obdasystems.com/mastro
Individuals and
Families
Mapping: used to semantically
link data at the sources to the
ontology
9. Domain: portion of Base Register of Individuals, Families and Cohabitations
related to persons, including residential data, and their family
relationships.
Ontology expressed in Graphol (http://www.obdasystems.com/graphol)
10. - Tab_pers[idp, date_of_birth, address]
- Tab_family[idf, idp, head_flag]
Tab_pers(x,y,z) → Person(x)
Tab_pers(x,y,z) ∧ y ≠ NULL → date_of_birth(x,y)
Tab_pers(x,y,z) ∧ z ≠ NULL → address(x,z)
Tab_family(x,y,z) → Family(x)
Tab_family(x,y,z) → belongsTo(y,x)
Tab_family(x,y,z) ∧ z = TRUE → headOf(y,x)
The data sources connected to the ontology are those containing information
about persons and families.
We specified about 100 mapping assertions to link such data sources to
the ontology.
Tables in the data source
containing information on persons
and families
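The six mapping assertions for Tab_pers and Tab_family can be read as rules that turn source rows into ontology facts. The sketch below applies them in plain Python to hypothetical sample rows. It materializes the facts only for illustration; an OBDM system keeps them virtual and evaluates the mappings at query time. The function name `apply_mappings` and the tuple encoding of facts are assumptions made for this example.

```python
# Hypothetical sample rows for the two source tables.
# Tab_pers[idp, date_of_birth, address]
tab_pers = [("p1", "1980-01-01", "Via Roma 1"), ("p2", None, "Via Roma 1")]
# Tab_family[idf, idp, head_flag]
tab_family = [("f1", "p1", True), ("f1", "p2", False)]


def apply_mappings(tab_pers, tab_family):
    """Derive ontology facts from source rows, one rule per mapping assertion."""
    facts = set()
    for idp, dob, addr in tab_pers:
        facts.add(("Person", idp))                  # Tab_pers(x,y,z) -> Person(x)
        if dob is not None:
            facts.add(("date_of_birth", idp, dob))  # only when y is not NULL
        if addr is not None:
            facts.add(("address", idp, addr))       # only when z is not NULL
    for idf, idp, head_flag in tab_family:
        facts.add(("Family", idf))                  # Tab_family(x,y,z) -> Family(x)
        facts.add(("belongsTo", idp, idf))          # person y belongs to family x
        if head_flag is True:
            facts.add(("headOf", idp, idf))         # only when z = TRUE
    return facts


facts = apply_mappings(tab_pers, tab_family)
for fact in sorted(facts, key=str):
    print(fact)
```

Note that person p2 has no date_of_birth fact, because the corresponding mapping fires only on non-NULL values.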
11. Example of Access Query: We ask for persons belonging to
the same family joined by a civil union
This is expressed over the ontology by means of the
following query:
? x,y,z | Resident(x) ∧ ReferencePerson(y) ∧ Family(z)
∧ belongsTo(x,z) ∧ belongsTo(y,z) ∧ headOf(y,z)
∧ civilPartnerOf(x,y)
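To make the semantics of this conjunctive query concrete, the sketch below evaluates it over a small, hypothetical set of ontology facts held in memory. The individuals (p1, p2, p3, f1) and the helper functions `unary`, `binary` and `civil_union_query` are invented for the example; in the prototype the query is posed to the OBDM system, which answers it against the mapped sources.

```python
# Hypothetical ontology facts, encoded as tuples (predicate, arguments...).
facts = {
    ("Resident", "p2"), ("Resident", "p3"),
    ("ReferencePerson", "p1"),
    ("Family", "f1"),
    ("belongsTo", "p1", "f1"), ("belongsTo", "p2", "f1"), ("belongsTo", "p3", "f1"),
    ("headOf", "p1", "f1"),
    ("civilPartnerOf", "p2", "p1"),
}


def unary(name):
    """Instances of a concept, e.g. all x such that Resident(x)."""
    return {f[1] for f in facts if f[0] == name and len(f) == 2}


def binary(name):
    """Pairs in a role, e.g. all (x, z) such that belongsTo(x, z)."""
    return {(f[1], f[2]) for f in facts if f[0] == name and len(f) == 3}


def civil_union_query():
    """All (x, y, z) satisfying every atom of the query."""
    belongs = binary("belongsTo")
    head = binary("headOf")
    partner = binary("civilPartnerOf")
    return sorted(
        (x, y, z)
        for x in unary("Resident")
        for y in unary("ReferencePerson")
        for z in unary("Family")
        if (x, z) in belongs and (y, z) in belongs
        and (y, z) in head and (x, y) in partner
    )


answers = civil_union_query()
print(answers)
```

Person p3 belongs to the same family but is not returned, because no civilPartnerOf fact links p3 to the reference person.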
12. Example of Quality Check: Persons belonging to the same
family cannot have different residential addresses
This is expressed over the ontology by means of the
following constraint:
∀x,y,z,v,w Family(x) ∧ belongsTo(y,x) ∧ address(y,v)
∧ belongsTo(z,x) ∧ address(z,w) ∧ v ≠ w → ⊥
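A constraint of this kind is violated whenever some family has two members with different addresses. A minimal sketch of the check, assuming the belongsTo and address facts are available as sets of pairs (the sample data and the function name `address_violations` are hypothetical), groups addresses by family and reports every family with more than one distinct address.

```python
from collections import defaultdict


def address_violations(belongs_to, address):
    """Return the families whose members have more than one distinct address."""
    addresses_of = defaultdict(set)
    for person, addr in address:
        addresses_of[person].add(addr)

    addresses_by_family = defaultdict(set)
    for person, family in belongs_to:
        addresses_by_family[family] |= addresses_of[person]

    return {
        family: sorted(addrs)
        for family, addrs in addresses_by_family.items()
        if len(addrs) > 1
    }


# Hypothetical facts: family f1 is consistent, family f2 is not.
belongs_to = {("p1", "f1"), ("p2", "f1"), ("p3", "f2"), ("p4", "f2")}
address = {
    ("p1", "Via Roma 1"), ("p2", "Via Roma 1"),
    ("p3", "Via Milano 5"), ("p4", "Via Torino 9"),
}

violations = address_violations(belongs_to, address)
print(violations)
```

An empty result means that the constraint holds on the given facts.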
13. Blue cells represent ontology-mediated access
Green cells represent direct access
Use cases, users and ontology-mediated/direct access to data
14. From prototyping to production
Ontology modelling for the registers of the ISSR
Architectural implementations: choice of production
systems, configuration and set up
Training of IT, statistical and thematic staff
So far we have addressed the Process phase of the GSBPM (Generic
Statistical Business Process Model)
Need to start addressing the Analyse and Disseminate phases
of the GSBPM (Aggregated Data)
Start with prototyping
Conclusions and next steps