The necessity of metadata for linked open data and its contribution to policy analyses #CeDEM12
The necessity of metadata for open linked data and its contribution to policy
analyses (Anneke Zuiderwijk, Keith Jeffery, Marijn Janssen) #CeDEM12

Published in Education, Technology

Speaker notes

  • Start with a quotation from Neelie Kroes (December 12, 2011). This example shows that open data have gained considerable attention recently. Another example is the ENGAGE project.
  • Framework Programme 7 (FP7) shows the attention of the European Commission for open data. ENGAGE is part of FP7. Mention the main goal. The paper that we present here stems from the ENGAGE project.
  • What are open governmental data? Mention the definition by Geiger & Von Lucke. We adopt this definition as it excludes the publication of data which must remain confidential, are private or contain industrial secrets. Give examples of open governmental data.
  • Linking data provides us with the benefits of open data: value is obtained by linking and showing relationships. How are LOD created? A public body produces anonymised (non-personally identifiable) data during the course of its ordinary business. The data produced become freely available to everyone on the Web of Data, also referred to as the Semantic Web. The public sector data are then referred to as open data and can be used, reused and redistributed by everyone, without restrictions from copyright, patents or other mechanisms of control. One way of reusing open data is to link them to other data to show relationships with these other data. The Linked Data that are the outcome of this linking are defined as “a collection of interrelated datasets on the Web”. Data which are both open and linked, referred to as LOD, are data that meet the requirements of open data and that also show relationships among the open data, thus providing information which may be defined as structured data in context. After PSI is converted into LOD, this creates interesting possibilities for analyzing policies of public bodies. For example, take two datasets, one with demographic data and one with crime data. Linking them on the basis of postal codes will show relationships between demographic data and crime data.
  • We saw that publishing metadata is part of the LOD process. Metadata are needed to make sense of the open data. Metadata are data about the data. We define metadata as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” In the ideal situation for LOD, different types of metadata are provided: discovery (flat) metadata (which are descriptive and navigational), contextual metadata (which are descriptive, restrictive and navigational) and detailed metadata (which cover schema metadata plus additional metadata to assure quality). These types of metadata describe, among other things, the following information about the LOD. Discovery (flat) metadata: identifier, title, creator, publisher, country, source, type, format, language, sector, subjects, keywords, relative information system, validity date (from – to), audience, legal framework, status, relevant resources and linked data sets. Contextual metadata: organizations, persons, projects, funding, facilities, equipment, services and pointers to detailed metadata. Detailed metadata: quality (accuracy, precision, calibration and other parameters; Charalabidis, Ntanos, & Lampathaki, 2011) and domain or dataset-specific parameters that are used by software accessing and processing the dataset.
  • Benefits of the metadata according to the literature overview.
  • There are discrepancies between the benefits that are described in literature and the benefits that are obtained in reality. The current situation is insufficient. Statements.
  • Based on the literature overview and two use cases we found that the basic capabilities that are created by adding metadata are as follows: the metadata should be easily discovered; the metadata should interconvert common metadata formats used in PSI; the metadata should provide a LOD representation of the metadata for browsing or query; the metadata should maintain the capabilities of conventional information systems with structured query including convenient primitive operations. To accomplish these capabilities we need discovery, contextual and detailed metadata.
  • The challenge is to design an architecture to allow (a) end-user ‘citizen’ and ‘researcher’ access via a portal supported by metadata to PSI datasets for download; (b) access, utilising metadata, to those same datasets via a service from a running program on another system to utilise the information in another context. This leads naturally to the architecture sketched in Figure 2.
  • CERIF provides much richer metadata than the standards commonly used with PSI datasets and so greatly improves the experience of the end user (or the software) in processing the PSI datasets described by the enhanced metadata. The representation of contextual metadata (CERIF) allows rich semantics to be represented simply over a formal syntax, thus making the PSI datasets understandable to the end user (or software) through the enhanced metadata. The Structured Query Language (SQL), usually presented to the end user through an easy-to-use Query By Example (QBE) interface, has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
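
The linking example from the notes (demographic and crime data joined on postal codes) and the SQL primitives mentioned in the last note (sum, count, average) can be sketched together. This is only an illustration, not part of the ENGAGE platform: it uses Python's standard sqlite3 module, and the table names, column names, postal codes and figures are all invented for the example.

```python
import sqlite3

# Hypothetical sample data: postal codes, populations and incident counts are invented.
DEMOGRAPHICS = [("2611", 12000), ("2612", 8000), ("2613", 15000)]
CRIMES = [("2611", "burglary", 30), ("2611", "theft", 50), ("2612", "theft", 20)]


def link_on_postal_code(demographics, crimes):
    """Join two datasets on postal code and aggregate incidents per area."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demographics (postal_code TEXT PRIMARY KEY, population INTEGER)"
    )
    conn.execute(
        "CREATE TABLE crimes (postal_code TEXT, category TEXT, incidents INTEGER)"
    )
    conn.executemany("INSERT INTO demographics VALUES (?, ?)", demographics)
    conn.executemany("INSERT INTO crimes VALUES (?, ?, ?)", crimes)
    # The postal code is the shared key that links the two datasets;
    # COUNT, SUM and AVG are the primitive statistical operations.
    rows = conn.execute(
        """
        SELECT d.postal_code,
               d.population,
               COUNT(c.category) AS categories,
               SUM(c.incidents)  AS incidents,
               AVG(c.incidents)  AS avg_per_category
        FROM demographics AS d
        JOIN crimes AS c ON c.postal_code = d.postal_code
        GROUP BY d.postal_code, d.population
        ORDER BY d.postal_code
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for code, population, _, incidents, _ in link_on_postal_code(DEMOGRAPHICS, CRIMES):
        rate = 1000 * incidents / population
        print(f"{code}: {incidents} incidents, {rate:.1f} per 1,000 residents")
```

The inner join drops areas that appear in only one dataset (postal code 2613 here). Whether that is acceptable depends on the policy question, and it is the kind of detail that metadata about coverage would have to make visible.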

Transcript

  • 1. The necessity of metadata for linked open data and its contribution to policy analyses
    Anneke Zuiderwijk*, Keith Jeffery**, Marijn Janssen*
    *Delft University of Technology, The Netherlands
    **Science and Technology Facilities Council, United Kingdom
    CEDEM 2012, May 3-4
  • 2. Open governmental data
    • “We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now.” (European Commission Vice President Neelie Kroes, Digital Agenda: Turning government data into gold, December 12, 2011)
    • One of many examples that show that open governmental data have gained considerable attention recently
  • 3. The ENGAGE project
    • ENGAGE (FP7): An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens (http://www.engage-project.eu)
    • Main goal: the development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources.
    • The ENGAGE platform will enable researchers and citizens to:
      • Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
      • Download the datasets
      • Perform geospatial search of datasets
      • Visualize properly structured datasets in data tables, maps and charts
  • 4. Open governmental data
    • Open governmental data can be defined as “all stored data of the public sector which could be made accessible by government in the public interest without any restrictions on usage and distribution” (Geiger & Von Lucke, 2011, p. 185).
    • For example, public sector data can be:
      • Geographic data (e.g. cadastral information)
      • Legal data (e.g. court decisions, legislation)
      • Meteorological data (e.g. climate data, weather forecasts)
      • Social data (e.g. population, public administration)
      • Transport data (e.g. traffic congestion, work on roads)
      • Business data (e.g. chamber of commerce, patents)
      (MEPSIR study, Dekkers et al., 2006)
  • 5. Linked open data (LOD)
    • Focus on turning public sector data into LOD
      1. Public body produces data (and metadata)
      2. Data become available on the Web of Data / Semantic Web
      3. Open data can be reused
      4. Open data can be linked to other data → show relationships
      5. Data are both open and linked → Linked Open Data (LOD)
    Figure 1: Process for creating Linked Open Data (box labels: public sector (policy) data and metadata; publication on the Semantic Web; reusing open data; linking data; linked open data)
  • 6. Metadata
    • Metadata are part of the LOD process
    • Metadata are needed to make sense of the open data (Berners-Lee, 2009)
    • Metadata are defined as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.” (National Information Standards Organization, 2004, p. 1).
    • Metadata provision in the ideal situation:
      • Discovery metadata, e.g. identifier, title, creator, keywords.
      • Contextual metadata, e.g. organizations, projects, funding.
      • Detailed metadata, e.g. quality and domain specific parameters.
  • 7. Why metadata are necessary in analyzing LOD
    • Metadata for LOD can be useful in the following situations. Metadata:
      • create order within datasets;
      • improve storing and preservation of LOD;
      • improve easily finding LOD;
      • improve the accessibility of LOD;
      • may make it possible to assess and rank the quality of LOD;
      • improve easily analyzing, comparing, reproducing and therefore finding inconsistencies in LOD;
      • improve chances of a correct interpretation of LOD;
      • improve the possibilities to find patterns in LOD to generate new hypotheses;
      • may improve visualizing LOD;
      • make it easier to link data;
      • avoid unnecessary duplication of LOD.
  • 8. Problem statement
    • Discrepancies between the benefits that are described in literature and the benefits that are obtained in reality
    • Current situation is a long way from the ideal situation:
      • usually few and insufficient ways of managing metadata and interpretation of LOD (for instance Hernández-Pérez et al., 2009; Schuurman et al., 2008; Xiong et al., 2011);
      • adding metadata is often viewed as an additional activity that only consumes resources.
    • Statements:
      • Merely linking data is not enough to make use of open data
      • Metadata are key enablers for the effective use of LOD in policy-making
  • 9. Requirements for a metadata architecture
    • The metadata should:
      • be easily discovered;
      • interconvert common metadata formats used in PSI;
      • provide a LOD representation of the metadata for browsing or query;
      • maintain the capabilities of conventional information systems with structured query including convenient primitive operations.
  • 10. Outline architecture
    • The requirements lead to the following architecture:
    Figure 2: An architecture of a portal server for the provision of metadata (component labels: portal server with portal and metadata; application server with running software application; PSI dataset servers with PSI datasets)
  • 11. Metadata
    • Metadata should be used to implement this architecture
    A 3-layer structure for metadata is used:
    a) discovery (flat) metadata; for example:
      • Dublin Core (DC);
      • e-Government Metadata Standard (e-GMS);
      • Comprehensive Knowledge Archive Network (CKAN);
      • or similar ‘flat’ metadata
    b) contextual metadata; uses the Common European Research Information Format (CERIF);
    c) detailed metadata.
  • 12. The Vision: Metadata for Data Model
    Diagram with three layers: DISCOVERY (DC, eGMS…; linked open data), CONTEXT (CERIF; formal information systems) and DETAIL (subject or topic specific), connected by arrows labelled “Generate” and “Point to”
  • 13. Design
    The presented structure provides the following improved facilities:
    • CERIF provides much richer metadata than the standards commonly used with PSI datasets.
    • The representation of contextual metadata (CERIF) allows rich semantics to be represented, thus making the PSI datasets understandable to the end user (or software) through the metadata.
    • The Structured Query Language (SQL) has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
  • 14. Benefits of architecture
    • Because of the powerful expressive semantics over formal syntax of CERIF we can:
      • Generate discovery metadata from CERIF;
      • Interconvert common metadata formats used in PSI using CERIF as the superset exchange mechanism;
      • Provide a semantic web / LOD representation of the metadata for browsing or query using SPARQL;
      • While maintaining a conventional information systems capability with structured query including convenient primitive operations.
  • 15. Models for an infrastructure
    • The data model with its metadata described is only one relevant model
    • The other models are:
      • User model
      • Processing model
      • Resource model
  • 16. The Vision: The Models
    Diagram labels: User Model; Processing Model; Data Model; Resource Model; complete cohort of users; complete ICT environment for PSI
  • 17. Model – User model
    • User Model: controls the way in which the end-user interacts with the e-infrastructure.
      • User profile, security certification, privacy;
      • Device and interaction mode preferences (keyboard/mouse through voice and gesture to brain-connected), language preference;
      • Resource preferences (including contacts) with directories;
    • METADATA
  • 18. Models – Processing model
    • Process Model controls the way processes are constructed and executed in the e-infrastructure
      • Services
        • Described for discovery, described for functional and non-functional (security, privacy, performance) properties
        • Mobile (deployed in distributed / parallel execution environments)
        • Open source where possible
      • Service composition
        • Dynamically (re-)composable during execution
    • METADATA
  • 19. Models – Data model
    • Data Model controls data representation and data (re-)use
      • Formal syntax (structure)
        • Even for text, images, streamed video
      • Declared semantics (meaning)
    • METADATA
  • 20. Models – Resource model
    • Resource Model catalogs the available computing resources in the e-infrastructure
      • This allows virtualisation so the user neither knows nor cares from where the data comes, or where the processing is done, as long as quality of service is maintained;
      • Requires updating by resource owners, together with conditions of use
    • METADATA
  • 21. Conclusions (1)
    • Metadata are needed to make sense of the open data
    • Merely linking data is not enough to make optimal use of open data
    • Metadata are key enablers for policy-making
    • Adding metadata can yield considerable benefits, including:
      • creating order in datasets
      • improving findability, accessibility, storing and preservation of LOD
      • improving easily analyzing, comparing, reproducing, finding inconsistencies
      • correct interpretation and visualizing of LOD
      • finding patterns in LOD to generate new hypotheses
      • making linking of data easier
      • assessing and ranking the quality of LOD and avoiding unnecessary duplication of LOD
  • 22. Conclusions (2)
    • Architecture for metadata:
      • discovery metadata can be generated from CERIF
      • common metadata formats can use CERIF as the superset exchange mechanism
      • a LOD representation of the metadata for browsing or query can be made allowing the use of SPARQL
      • while a conventional information systems capability with structured query including convenient primitive operations can be maintained
    • We recommend further implementation of the proposed metadata architecture
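
The idea that flat discovery metadata can be generated from a richer contextual record can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ContextualMetadata` class is a heavily simplified stand-in and not the actual CERIF data model, the output keys only loosely follow Dublin Core element names, and the sample record is invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextualMetadata:
    """Hypothetical, simplified contextual record (not the real CERIF schema)."""

    identifier: str
    title: str
    organization: str  # body that publishes the dataset
    persons: list[str]  # people responsible for the dataset
    project: str
    funding: str
    keywords: list[str] = field(default_factory=list)
    detailed_metadata_uri: str = ""  # pointer to the detailed layer


def to_discovery_metadata(ctx: ContextualMetadata) -> dict[str, str]:
    """Derive a flat, Dublin Core-like discovery record from the richer one.

    Contextual fields without a flat counterpart here (project, funding)
    are dropped, which is why the mapping only works in this direction.
    """
    return {
        "identifier": ctx.identifier,
        "title": ctx.title,
        "creator": "; ".join(ctx.persons),
        "publisher": ctx.organization,
        "subject": "; ".join(ctx.keywords),
    }


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    record = ContextualMetadata(
        identifier="example-dataset-001",
        title="Reported crime by postal code",
        organization="Example Statistics Office",
        persons=["A. Analyst", "B. Curator"],
        project="Example open data project",
        funding="Example funding programme",
        keywords=["crime", "postal code"],
        detailed_metadata_uri="https://example.org/detail/001",
    )
    for key, value in to_discovery_metadata(record).items():
        print(f"{key}: {value}")
```

Because the flat record is derived from the contextual one, the two cannot drift apart, and other flat formats could be produced from the same source with additional mapping functions.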