
The necessity of metadata for linked open data and its contribution to policy analyses

Anneke Zuiderwijk*, Keith Jeffery**, Marijn Janssen*

*Delft University of Technology, The Netherlands
**Science and Technology Facilities Council, United Kingdom

CEDEM 2012, May 3-4

Open governmental data

• "We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now." (December 12, 2011)

(European Commission Vice President Neelie Kroes, "Digital Agenda: Turning government data into gold")

• This is one of many examples showing that open governmental data have recently gained considerable attention

CEDEM 2012

The ENGAGE project

• ENGAGE (FP7): An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens (http://www.engage-project.eu)

• Main goal: the development and use of a data infrastructure incorporating distributed and diverse public sector information (PSI) resources.

• The ENGAGE platform will enable researchers and citizens to:
  - Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
  - Download the datasets
  - Perform geospatial search of datasets
  - Visualize properly structured datasets in data tables, maps and charts

Open governmental data

• Open governmental data can be defined as "all stored data of the public sector which could be made accessible by government in the public interest without any restrictions on usage and distribution" (Geiger & Von Lucke, 2011, p. 185).

• For example, public sector data can be (MEPSIR study, Dekkers et al., 2006):
  - Geographic data (e.g. cadastral information)
  - Legal data (e.g. court decisions, legislation)
  - Meteorological data (e.g. climate data, weather forecasts)
  - Social data (e.g. population, public administration)
  - Transport data (e.g. traffic congestion, roadworks)
  - Business data (e.g. chamber of commerce, patents)


Linked open data (LOD)

• Focus on turning public sector (policy) data into LOD:
  1. Public body produces data (and metadata)
  2. Data become available on the Web of Data / Semantic Web
  3. Open data can be reused
  4. Open data can be linked to other data to show relationships
  5. Data are both open and linked: Linked Open Data (LOD)

[Figure 1: Process for creating Linked Open Data. Stages: (1) public sector (policy) data and metadata; (2) publication on the Semantic Web; (3) reusing open data; (4) linking data; (5) linked open data.]
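The five stages of Figure 1 can be illustrated with a minimal, standard-library-only Python sketch in which data and metadata are represented as RDF-style (subject, predicate, object) triples. The base URIs, the dataset values and all function names are hypothetical; a real publication pipeline would use an RDF library and an established vocabulary, and owl:sameAs is just one common way of expressing a link.

```python
# Minimal sketch of the LOD creation process (Figure 1), standard library only.
# All base URIs, values and function names are hypothetical illustrations.

DCT = "http://purl.org/dc/terms/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://data.example.gov"  # hypothetical publisher namespace

Triple = tuple[str, str, str]


def produce() -> tuple[dict[str, str], dict[str, str]]:
    """Stage 1: a public body produces data and metadata."""
    data = {"region": f"{BASE}/region/A", "population": "1000"}
    metadata = {"title": "Population by region", "creator": "Example Ministry"}
    return data, metadata


def publish(data: dict[str, str], metadata: dict[str, str]) -> list[Triple]:
    """Stage 2: expose data and metadata as triples on the Web of Data."""
    dataset = f"{BASE}/dataset/population"
    triples = [(dataset, DCT + k, v) for k, v in metadata.items()]
    triples += [(dataset, f"{BASE}/property/{k}", v) for k, v in data.items()]
    return triples


def reuse(triples: list[Triple], predicate: str) -> list[str]:
    """Stage 3: a third party reuses the open data by querying it."""
    return [o for _, p, o in triples if p == predicate]


def link(triples: list[Triple], local: str, external: str) -> list[Triple]:
    """Stage 4: link a local resource to data published elsewhere."""
    return triples + [(local, OWL_SAME_AS, external)]


def is_linked_open(triples: list[Triple]) -> bool:
    """Stage 5: the data carry metadata and at least one link to another namespace."""
    has_metadata = any(p.startswith(DCT) for _, p, _ in triples)
    has_link = any(p == OWL_SAME_AS and not o.startswith(BASE) for _, p, o in triples)
    return has_metadata and has_link


data, metadata = produce()
triples = publish(data, metadata)
print(reuse(triples, DCT + "title"))  # ['Population by region']
triples = link(triples, data["region"], "http://geo.example.org/region/A")
print(is_linked_open(triples))  # True
```

Note that the check in stage 5 requires both metadata and an outgoing link, mirroring the statement that merely linking data is not enough.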


Metadata

• Metadata are part of the LOD process
• Metadata are needed to make sense of the open data (Berners-Lee, 2009)

• Metadata are defined as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource" (National Information Standards Organization, 2004, p. 1).

• Metadata provision in the ideal situation:
  - Discovery metadata, e.g. identifier, title, creator, keywords.
  - Contextual metadata, e.g. organizations, projects, funding.
  - Detailed metadata, e.g. quality and domain-specific parameters.
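The three kinds of metadata in the ideal situation can be pictured as one record with three nested parts. The following Python sketch uses dataclasses; the field names and the simple completeness count are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryMetadata:
    """Flat metadata used to find a dataset."""
    identifier: str
    title: str
    creator: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class ContextualMetadata:
    """Context of the dataset: organizations, projects and funding."""
    organizations: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)
    funding: list[str] = field(default_factory=list)


@dataclass
class DetailedMetadata:
    """Quality and domain-specific parameters as free-form key/value pairs."""
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    discovery: DiscoveryMetadata
    context: ContextualMetadata = field(default_factory=ContextualMetadata)
    detail: DetailedMetadata = field(default_factory=DetailedMetadata)

    def completeness(self) -> int:
        """Count how many of the three levels carry content (1 to 3)."""
        levels = 1  # discovery metadata is mandatory in this sketch
        if any([self.context.organizations, self.context.projects, self.context.funding]):
            levels += 1
        if self.detail.parameters:
            levels += 1
        return levels


record = DatasetMetadata(
    discovery=DiscoveryMetadata("ds-001", "Road works", "Example Municipality", ["transport"]),
    detail=DetailedMetadata({"spatial_resolution": "street", "update_frequency": "daily"}),
)
print(record.completeness())  # 2
```

A record that only fills the discovery level scores 1, which is roughly where many PSI datasets stand compared with the ideal of 3.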


Why metadata are necessary in analyzing LOD

• Metadata for LOD can be useful in the following ways. Metadata:
  - create order within datasets;
  - improve the storage and preservation of LOD;
  - make LOD easier to find;
  - improve the accessibility of LOD;
  - may make it possible to assess and rank the quality of LOD;
  - make it easier to analyze, compare and reproduce LOD, and therefore to find inconsistencies in it;
  - improve the chances of a correct interpretation of LOD;
  - improve the possibilities to find patterns in LOD and generate new hypotheses;
  - may improve the visualization of LOD;
  - make it easier to link data;
  - avoid unnecessary duplication of LOD.
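Two of these uses, finding LOD and avoiding unnecessary duplication, can be made concrete with a small Python sketch over discovery metadata. The catalogue entries are hypothetical, and the duplicate rule (same whitespace- and case-normalized title and creator) is an assumption chosen for illustration.

```python
from collections import defaultdict

# Hypothetical discovery-metadata records in a catalogue.
catalogue = [
    {"id": "ds-1", "title": "Traffic congestion 2011", "creator": "Transport Agency",
     "keywords": {"transport", "traffic"}},
    {"id": "ds-2", "title": "Weather forecasts", "creator": "Weather Service",
     "keywords": {"meteorology"}},
    {"id": "ds-3", "title": "traffic  congestion 2011 ", "creator": "Transport Agency",
     "keywords": {"traffic"}},
]


def find(records, keyword):
    """Return identifiers of records tagged with the keyword."""
    return [r["id"] for r in records if keyword.lower() in r["keywords"]]


def duplicates(records):
    """Group records whose normalized title and creator coincide."""
    groups = defaultdict(list)
    for r in records:
        key = (" ".join(r["title"].lower().split()), r["creator"].lower())
        groups[key].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


print(find(catalogue, "traffic"))  # ['ds-1', 'ds-3']
print(duplicates(catalogue))       # [['ds-1', 'ds-3']]
```

Without the title, creator and keyword fields neither function has anything to work on, which is the point of the list above.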


Problem statement

• There are discrepancies between the benefits described in the literature and the benefits obtained in reality

• The current situation is a long way from the ideal situation:
  - there are usually few and insufficient ways of managing metadata and interpreting LOD (for instance Hernández-Pérez et al., 2009; Schuurman et al., 2008; Xiong et al., 2011);
  - adding metadata is often viewed as an additional activity that only consumes resources.

• Statements:
  - Merely linking data is not enough to make use of open data
  - Metadata are key enablers for the effective use of LOD in policy-making


Requirements for a metadata architecture

• The metadata architecture should:
  - make the metadata easily discoverable;
  - interconvert common metadata formats used in PSI;
  - provide an LOD representation of the metadata for browsing or query;
  - maintain the capabilities of conventional information systems, with structured query including convenient primitive operations.


Outline architecture
• The requirements lead to the following architecture:

[Figure 2: An architecture of a portal server for the provision of metadata. Components: a portal server holding the portal metadata; an application server running the software application; PSI dataset servers hosting the individual PSI datasets.]
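The division of labour in Figure 2 can be sketched in Python: the portal server keeps only metadata and resolves a dataset identifier to the PSI dataset server that holds the data. The server URLs, the URL pattern and all class and method names are hypothetical, and the sketch does not contact any server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    dataset_id: str
    title: str
    server_url: str  # PSI dataset server hosting the actual data (hypothetical)


class PortalServer:
    """Holds metadata only; the datasets stay on the PSI dataset servers."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries[entry.dataset_id] = entry

    def search(self, term: str) -> list[str]:
        """Search the portal metadata, not the datasets themselves."""
        term = term.lower()
        return sorted(e.dataset_id for e in self._entries.values() if term in e.title.lower())

    def locate(self, dataset_id: str) -> str:
        """Return the URL an application server would fetch the data from."""
        entry = self._entries[dataset_id]
        return f"{entry.server_url}/datasets/{entry.dataset_id}"


portal = PortalServer()
portal.register(CatalogueEntry("cadastre-01", "Cadastral parcels", "https://psi-a.example.org"))
portal.register(CatalogueEntry("weather-07", "Weather forecasts", "https://psi-b.example.org"))
print(portal.search("parcels"))     # ['cadastre-01']
print(portal.locate("weather-07"))  # https://psi-b.example.org/datasets/weather-07
```

Keeping the catalogue separate from the dataset servers means a PSI provider can move or update its data while the portal metadata remains the single point of discovery.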


Metadata
• Metadata should be used to implement this architecture

A 3-layer structure for metadata is used:
a) discovery (flat) metadata, for example:
  - Dublin Core (DC);
  - e-Government Metadata Standard (e-GMS);
  - Comprehensive Knowledge Archive Network (CKAN);
  - or similar ‘flat’ metadata;
b) contextual metadata, using the Common European Research Information Format (CERIF);
c) detailed metadata.


The Vision: Metadata for Data Model

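[Figure: three metadata layers. DISCOVERY metadata (DC, eGMS…) is generated from CONTEXT metadata (CERIF), which points to DETAIL metadata (subject or topic specific). The discovery layer serves linked open data; the context layer serves formal information systems.]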

Design
The presented structure provides the following improved facilities:

• CERIF provides much richer metadata than the standards commonly used with PSI datasets.

• The representation of contextual metadata (CERIF) allows rich semantics to be represented, thus making the PSI datasets understandable to the end user (or software) through the metadata.

• The Structured Query Language (SQL) has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
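The convenient primitive operations mentioned above can be shown with Python's built-in sqlite3 module. The table and its values are hypothetical; the point is that COUNT, SUM and AVG are single SQL primitives that can be combined with GROUP BY.

```python
import sqlite3

# In-memory database with a hypothetical table of dataset usage figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, domain TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [("ds-1", "transport", 120), ("ds-2", "transport", 80), ("ds-3", "legal", 40)],
)

# COUNT, SUM and AVG per domain in a single declarative query.
rows = conn.execute(
    "SELECT domain, COUNT(*), SUM(downloads), AVG(downloads) "
    "FROM dataset GROUP BY domain ORDER BY domain"
).fetchall()
for row in rows:
    print(row)
# ('legal', 1, 40, 40.0)
# ('transport', 2, 200, 100.0)
conn.close()
```

SQLite's AVG always returns a floating-point value, which is why the averages print as 40.0 and 100.0.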


Benefits of architecture

• Because CERIF offers powerful, expressive semantics over a formal syntax, we can:
  - generate discovery metadata from CERIF;
  - interconvert common metadata formats used in PSI, using CERIF as the superset exchange mechanism;
  - provide a Semantic Web / LOD representation of the metadata for browsing or query using SPARQL;
  - and still maintain a conventional information systems capability with structured query, including convenient primitive operations.
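The first three capabilities can be sketched in Python: a simplified, CERIF-inspired record (base entities plus role-typed, time-stamped link entries, the feature that gives CERIF its richer semantics) is projected onto flat Dublin Core-style discovery metadata and then onto triples for an LOD representation. The entity names, roles and values only loosely follow CERIF and are simplified assumptions; a real mapping would follow the CERIF model and semantic vocabulary maintained by euroCRIS.

```python
# Simplified CERIF-inspired structure; names are illustrative, not the real CERIF schema.
cerif = {
    "products": {"prod-1": {"name": "Road works register"}},
    "orgunits": {"org-1": {"name": "Example Municipality"}},
    "projects": {"proj-1": {"title": "Open Roads"}},
    # Link entries: (source, target, role, start date, end date)
    "links": [
        ("prod-1", "org-1", "publisher", "2011-01-01", None),
        ("prod-1", "proj-1", "outcome of", "2010-06-01", "2011-12-31"),
    ],
}


def to_dublin_core(cerif, product_id):
    """Generate flat discovery metadata by projecting the richer CERIF links."""
    record = {"identifier": product_id, "title": cerif["products"][product_id]["name"]}
    for source, target, role, _start, _end in cerif["links"]:
        if source == product_id and role == "publisher":
            record["publisher"] = cerif["orgunits"][target]["name"]
        if source == product_id and role == "outcome of":
            record["relation"] = cerif["projects"][target]["title"]
    return record  # role dates are dropped: the flat format cannot hold them


def to_triples(record, base="http://data.example.org/"):
    """Represent the discovery record as (subject, predicate, object) triples."""
    subject = base + record["identifier"]
    return [(subject, "http://purl.org/dc/terms/" + k, v)
            for k, v in record.items() if k != "identifier"]


dc = to_dublin_core(cerif, "prod-1")
print(dc)
# {'identifier': 'prod-1', 'title': 'Road works register', 'publisher': 'Example Municipality', 'relation': 'Open Roads'}
print(len(to_triples(dc)))  # 3
```

The projection is lossy in one direction only: the start and end dates of each role disappear in the flat record, so going from Dublin Core back to CERIF could not recover them. This asymmetry is the reason for treating CERIF as the superset exchange format.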


Models for an infrastructure

• The data model with its metadata, as described above, is only one of the relevant models

• The other models are:
  - User model
  - Processing model
  - Resource model

The Vision: The Models

[Figure: the User Model, Processing Model, Data Model and Resource Model together connect the complete cohort of users with the complete ICT environment for PSI.]

Models – User model

• User Model: controls the way in which the end user interacts with the e-infrastructure:
  - User profile, security certification, privacy;
  - Device and interaction mode preferences (from keyboard/mouse through voice and gesture to brain-connected), language preference;
  - Resource preferences (including contacts) with directories.

• METADATA
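One way the user model's language preference could act on metadata is to choose a dataset title in the user's own language, falling back to a default. This Python sketch assumes multilingual titles keyed by ISO 639-1 language codes; the profile fields and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical subset of the user model: language and device preferences."""
    languages: list[str] = field(default_factory=lambda: ["en"])  # ordered by preference
    device: str = "keyboard/mouse"


def localized_title(titles: dict[str, str], user: UserProfile, default: str = "en") -> str:
    """Return the title in the first preferred language available, else the default."""
    for lang in user.languages:
        if lang in titles:
            return titles[lang]
    return titles[default]


titles = {"en": "Traffic congestion", "nl": "Verkeerscongestie", "de": "Verkehrsstau"}
print(localized_title(titles, UserProfile(languages=["el", "nl"])))  # Verkeerscongestie
print(localized_title(titles, UserProfile(languages=["el"])))        # Traffic congestion
```

This only works if the metadata actually carry titles in several languages, which is one concrete way the user model depends on metadata.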

Models – Processing model

• Processing Model: controls the way processes are constructed and executed in the e-infrastructure
  - Services
    - described for discovery, and for their functional and non-functional (security, privacy, performance) properties
    - mobile (deployed in distributed / parallel execution environments)
    - open source where possible
  - Service composition
    - dynamically (re-)composable during execution
• METADATA
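Services described by functional and non-functional properties, and composed from those descriptions, might be sketched in Python as follows. The service names, the latency property and the greedy composition rule (chain services whose input type matches the current output type, within a latency budget) are illustrative assumptions rather than the ENGAGE design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str                # functional description
    output_type: str
    max_latency_ms: int            # non-functional (performance) property
    run: Callable[[object], object]


def compose(services: list[Service], start: str, goal: str, budget_ms: int) -> list[Service]:
    """Greedily chain services from the start type to the goal type within a latency budget."""
    chain, current, spent = [], start, 0
    while current != goal:
        candidates = [s for s in services
                      if s.input_type == current
                      and spent + s.max_latency_ms <= budget_ms
                      and s not in chain]
        if not candidates:
            raise LookupError(f"no service continues from {current!r}")
        best = min(candidates, key=lambda s: s.max_latency_ms)
        chain.append(best)
        current, spent = best.output_type, spent + best.max_latency_ms
    return chain


def execute(chain: list[Service], value: object) -> object:
    """Run the composed services in order, feeding each output to the next."""
    for service in chain:
        value = service.run(value)
    return value


services = [
    Service("parse_csv", "csv", "rows", 20,
            lambda text: [line.split(",") for line in text.splitlines()]),
    Service("count_rows", "rows", "number", 5, len),
]
chain = compose(services, "csv", "number", budget_ms=50)
print([s.name for s in chain])          # ['parse_csv', 'count_rows']
print(execute(chain, "a,1\nb,2\nc,3"))  # 3
```

Re-composition during execution would amount to calling `compose` again from the current type when a service fails or its advertised properties change.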

Models – Data model

• Data Model: controls data representation and data (re-)use
  - Formal syntax (structure)
    - even for text, images, streamed video
  - Declared semantics (meaning)

• METADATA
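The distinction between formal syntax and declared semantics can be illustrated with a small Python sketch: each field has a declared type (syntax) and a concept URI plus optional unit (semantics), and records are validated against the syntax. The schema, the concept URIs and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type                  # formal syntax: the structure of the value
    concept: str                 # declared semantics: what the value means (hypothetical URI)
    unit: Optional[str] = None


SCHEMA = [
    FieldSpec("station", str, "http://vocab.example.org/WeatherStation"),
    FieldSpec("temperature", float, "http://vocab.example.org/AirTemperature", "degree Celsius"),
]


def validate(record: dict, schema=SCHEMA) -> list[str]:
    """Check a record against the declared syntax; return a list of problems."""
    problems = []
    for spec in schema:
        if spec.name not in record:
            problems.append(f"missing field {spec.name}")
        elif not isinstance(record[spec.name], spec.type_):
            problems.append(f"{spec.name} should be {spec.type_.__name__}")
    return problems


def describe(schema=SCHEMA) -> dict[str, str]:
    """Expose the declared semantics so users or software can interpret each field."""
    return {s.name: s.concept + (f" [{s.unit}]" if s.unit else "") for s in schema}


print(validate({"station": "station-1", "temperature": 11.5}))    # []
print(validate({"station": "station-1", "temperature": "warm"}))  # ['temperature should be float']
print(describe()["temperature"])  # http://vocab.example.org/AirTemperature [degree Celsius]
```

Syntax alone tells software that 11.5 is a float; only the declared semantics say it is an air temperature in degrees Celsius.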

Models – Resource model

• Resource Model: catalogs the available computing resources in the e-infrastructure
  - This allows virtualisation, so that the user neither knows nor cares where the data come from or where the processing is done, as long as quality of service is maintained;
  - Requires updating by resource owners, together with conditions of use
• METADATA
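Virtualisation over the resource model can be pictured as a catalogue lookup: the user asks for a kind of resource, and the infrastructure picks any registered resource whose advertised quality of service and conditions of use fit, without exposing its location. The resource fields, the condition labels and the selection rule in this Python sketch are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical catalogue entry maintained by a resource owner."""
    resource_id: str
    kind: str                # e.g. "storage" or "compute"
    location: str            # hidden from the user by virtualisation
    availability: float      # advertised quality of service, 0..1
    conditions_of_use: str   # e.g. "open" or "research-only"


def allocate(catalogue, kind, min_availability, purpose):
    """Pick the most available resource that meets QoS and conditions of use."""
    suitable = [
        r for r in catalogue
        if r.kind == kind
        and r.availability >= min_availability
        and r.conditions_of_use in ("open", purpose)
    ]
    if not suitable:
        raise LookupError("no resource satisfies the requested quality of service")
    # Only the identifier is returned; the location stays hidden from the user.
    return max(suitable, key=lambda r: r.availability).resource_id


catalogue = [
    Resource("c1", "compute", "NL", 0.95, "open"),
    Resource("c2", "compute", "UK", 0.99, "research-only"),
    Resource("s1", "storage", "AT", 0.999, "open"),
]
print(allocate(catalogue, "compute", 0.9, "research-only"))  # c2
print(allocate(catalogue, "compute", 0.9, "commercial"))     # c1
```

The lookup is only as good as the catalogue, which is why resource owners must keep their entries and conditions of use up to date.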

Conclusions (1)

• Metadata are needed to make sense of the open data
• Merely linking data is not enough to make optimal use of open data
• Metadata are key enablers for policy-making
• Adding metadata can yield considerable benefits, including:
  - creating order in datasets
  - improving findability, accessibility, storage and preservation of LOD
  - making it easier to analyze, compare and reproduce LOD and to find inconsistencies
  - correct interpretation and visualization of LOD
  - finding patterns in LOD to generate new hypotheses
  - making linking of data easier
  - assessing and ranking the quality of LOD and avoiding unnecessary duplication of LOD


Conclusions (2)

• Architecture for metadata:
  - discovery metadata can be generated from CERIF
  - common metadata formats can use CERIF as the superset exchange mechanism
  - an LOD representation of the metadata for browsing or query can be made, allowing the use of SPARQL
  - at the same time, a conventional information systems capability with structured query, including convenient primitive operations, can be maintained
• We recommend further implementation of the proposed metadata architecture

