The necessity of metadata for linked open data and its contribution to policy analyses #CeDEM12


By Anneke Zuiderwijk, Keith Jeffery and Marijn Janssen

Slide 1: The necessity of metadata for linked open data and its contribution to policy analyses
  Anneke Zuiderwijk*, Keith Jeffery**, Marijn Janssen*
  *Delft University of Technology, The Netherlands
  **Science and Technology Facilities Council, United Kingdom
  CEDEM 2012, May 3-4
Slide 2: Open governmental data
  - "We are sending a strong signal to administrations today. Your data is worth more if you give it away. So start releasing it now." (Neelie Kroes, European Commission Vice President, Digital Agenda: Turning government data into gold, December 12, 2011)
  - This is one of many examples showing that open governmental data have recently gained considerable attention.
Slide 3: The ENGAGE project
  - ENGAGE (FP7): An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
  - Main goal: the development and use of a data infrastructure incorporating distributed and diverse public sector information (PSI) resources.
  - The ENGAGE platform will enable researchers and citizens to:
    - discover and browse datasets across diverse and dispersed PSI resources (local, national and European) in their own language;
    - download the datasets;
    - perform geospatial searches of datasets;
    - visualize properly structured datasets in data tables, maps and charts.
Slide 4: Open governmental data
  - Open governmental data can be defined as "all stored data of the public sector which could be made accessible by government in the public interest without any restrictions on usage and distribution" (Geiger & Von Lucke, 2011, p. 185).
  - For example, public sector data can be:
    - geographic data (e.g. cadastral information);
    - legal data (e.g. court decisions, legislation);
    - meteorological data (e.g. climate data, weather forecasts);
    - social data (e.g. population, public administration);
    - transport data (e.g. traffic congestion, road works);
    - business data (e.g. chamber of commerce, patents).
    (MEPSIR study, Dekkers et al., 2006)
Slide 5: Linked open data (LOD)
  - Focus on turning public sector (policy) data into LOD.
  Figure 1: Process for creating Linked Open Data
    1. Public sector (policy) data and metadata: a public body produces data (and metadata).
    2. Publication on the Semantic Web: the data become available on the Web of Data / Semantic Web.
    3. Reusing open data: the open data can be reused.
    4. Linking data: the open data can be linked to other data to show relationships.
    5. Linked open data: the data are both open and linked.
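The steps in Figure 1 can be made concrete with a minimal Python sketch that writes N-Triples by hand instead of using an RDF library. It assumes a hypothetical publisher namespace (`http://data.example.org/dataset/`), a made-up dataset and a made-up external URI; only the Dublin Core terms vocabulary (`http://purl.org/dc/terms/`) is real. Step 3 (reuse) needs no code of its own, since any consumer can read the published triples.

```python
# Sketch of Figure 1 with plain N-Triples strings (no RDF library).
# BASE, the dataset id, the creator and the related URI are hypothetical examples.
BASE = "http://data.example.org/dataset/"
DCT = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Serialise a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def publish(record_id: str, title: str, creator: str) -> list[str]:
    """Steps 1-2: data and metadata from a public body get a URI on the Web of Data."""
    subject = f"<{BASE}{record_id}>"
    return [
        f"{subject} <{DCT}title> {literal(title)} .",
        f"{subject} <{DCT}creator> {literal(creator)} .",
    ]


def link(record_id: str, related_uri: str) -> str:
    """Step 4: relate the published dataset to data held elsewhere (dcterms:relation)."""
    return f"<{BASE}{record_id}> <{DCT}relation> <{related_uri}> ."


# Step 5: the published and linked triples together form linked open data.
triples = publish("traffic-2011", "Traffic congestion 2011", "Ministry of Transport")
triples.append(link("traffic-2011", "http://other.example.net/id/roadworks-2011"))
print("\n".join(triples))
```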
Slide 6: Metadata
  - Metadata are part of the LOD process.
  - Metadata are needed to make sense of the open data (Berners-Lee, 2009).
  - Metadata are defined as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource." (National Information Standards Organization, 2004, p. 1).
  - Metadata provision in the ideal situation:
    - discovery metadata, e.g. identifier, title, creator, keywords;
    - contextual metadata, e.g. organizations, projects, funding;
    - detailed metadata, e.g. quality and domain-specific parameters.
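The three kinds of metadata in the ideal situation can be pictured as nested records. The sketch below is a hypothetical Python data model, not a schema from the ENGAGE project; the field names simply mirror the examples on the slide, and `richest_layer` is an illustrative helper that reports how close a record comes to that ideal.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryMetadata:
    """Flat metadata used to find a dataset (identifier, title, creator, keywords)."""
    identifier: str
    title: str
    creator: str
    keywords: list[str] = field(default_factory=list)


@dataclass
class ContextualMetadata:
    """What surrounds the dataset: organizations, projects, funding."""
    organizations: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)
    funding: list[str] = field(default_factory=list)


@dataclass
class DetailedMetadata:
    """Quality indicators and domain-specific parameters."""
    quality: dict[str, float] = field(default_factory=dict)
    domain_parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    discovery: DiscoveryMetadata
    context: ContextualMetadata = field(default_factory=ContextualMetadata)
    detail: DetailedMetadata = field(default_factory=DetailedMetadata)

    def richest_layer(self) -> str:
        """Name the most detailed layer that contains any information."""
        if self.detail.quality or self.detail.domain_parameters:
            return "detailed"
        if self.context.organizations or self.context.projects or self.context.funding:
            return "contextual"
        return "discovery"


# Hypothetical record: starts with discovery metadata only, then gains context.
record = DatasetMetadata(
    DiscoveryMetadata("ds-001", "Population by municipality", "Statistics Office", ["population"])
)
print(record.richest_layer())  # discovery
record.context.projects.append("ENGAGE")
print(record.richest_layer())  # contextual
```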
Slide 7: Why metadata are necessary in analyzing LOD
  - Metadata for LOD can be useful in the following situations. Metadata:
    - create order within datasets;
    - improve the storage and preservation of LOD;
    - make LOD easier to find;
    - improve the accessibility of LOD;
    - may make it possible to assess and rank the quality of LOD;
    - make it easier to analyze, compare and reproduce LOD, and therefore to find inconsistencies in it;
    - improve the chances of a correct interpretation of LOD;
    - improve the possibilities to find patterns in LOD to generate new hypotheses;
    - may improve the visualization of LOD;
    - make it easier to link data;
    - avoid unnecessary duplication of LOD.
Slide 8: Problem statement
  - There are discrepancies between the benefits described in the literature and the benefits obtained in reality.
  - The current situation is a long way from the ideal situation:
    - there are usually few and insufficient ways of managing metadata and interpreting LOD (for instance Hernández-Pérez et al., 2009; Schuurman et al., 2008; Xiong et al., 2011);
    - adding metadata is often viewed as an additional activity that only consumes resources.
  - Statements:
    - Merely linking data is not enough to make use of open data.
    - Metadata are key enablers for the effective use of LOD in policy-making.
Slide 9: Requirements for a metadata architecture
  - The metadata architecture should:
    - make the metadata easy to discover;
    - interconvert the common metadata formats used in PSI;
    - provide a LOD representation of the metadata for browsing or querying;
    - maintain the capabilities of conventional information systems, with structured query including convenient primitive operations.
Slide 10: Outline architecture
  - The requirements lead to the following architecture:
  [Figure 2: An architecture of a portal server for the provision of metadata. Components: a portal server (portal, metadata, running software), an application server (application) and PSI dataset servers (PSI datasets).]
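Figure 2 separates the portal server, which holds the metadata, from the PSI dataset servers, which hold the data. A minimal Python sketch of that split might look like this, under the assumption that the portal answers searches from metadata alone and only hands back a location. The class names, fields and `example.org` URLs are hypothetical, and the application server is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata kept on the portal server for one dataset held on a PSI dataset server."""
    identifier: str
    title: str
    keywords: frozenset[str]  # assumed to be stored in lower case
    location: str             # where the dataset itself lives (hypothetical URL)


class Portal:
    """Answers searches from metadata alone; the data stay on the dataset servers."""

    def __init__(self) -> None:
        self._catalog: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._catalog[entry.identifier] = entry

    def search(self, term: str) -> list[str]:
        """Return identifiers whose keywords or title match the term (case-insensitive)."""
        term = term.lower()
        hits = [
            e.identifier
            for e in self._catalog.values()
            if term in e.keywords or term in e.title.lower()
        ]
        return sorted(hits)

    def locate(self, identifier: str) -> str:
        """Point the user (or an application) to the PSI dataset server."""
        return self._catalog[identifier].location


portal = Portal()
portal.register(CatalogEntry("ds-1", "Weather forecasts", frozenset({"meteorology"}),
                             "https://psi1.example.org/weather.csv"))
portal.register(CatalogEntry("ds-2", "Cadastral parcels", frozenset({"geography"}),
                             "https://psi2.example.org/parcels.csv"))
for hit in portal.search("geography"):
    print(hit, portal.locate(hit))
```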
Slide 11: Metadata
  - Metadata should be used to implement this architecture. A three-layer structure for metadata is used:
    a) discovery ("flat") metadata, for example:
      - Dublin Core (DC);
      - e-Government Metadata Standard (e-GMS);
      - Comprehensive Knowledge Archive Network (CKAN);
      - or similar "flat" metadata;
    b) contextual metadata, using the Common European Research Information Format (CERIF);
    c) detailed metadata.
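Slide 12: The vision: metadata for the data model
  [Figure: three metadata layers.
    - Discovery (DC, e-GMS…): linked open data
    - Context (CERIF): formal information systems; discovery metadata are generated from this layer
    - Detail (subject- or topic-specific): pointed to from the context layer]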
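The "generate" step from the context layer to the discovery layer can be sketched as a function that flattens a contextual record into Dublin Core element names. The record below is a hypothetical, heavily simplified CERIF-like structure: real CERIF models persons, organisation units and projects as separate entities joined by role-bearing link entities, and its actual schema is not reproduced here.

```python
# Hypothetical, heavily simplified CERIF-like record: the dataset plus
# role-qualified links to persons and organisations (real CERIF is far richer).
cerif_like = {
    "dataset": {"id": "ds-042", "title": "Road works 2011", "keywords": ["transport", "roads"]},
    "links": [
        {"role": "creator", "entity": "person", "name": "J. Jansen"},
        {"role": "publisher", "entity": "orgunit", "name": "Ministry of Infrastructure"},
        {"role": "funder", "entity": "orgunit", "name": "Example Funding Agency"},
    ],
}

# Link roles that have a direct Dublin Core element counterpart.
ROLE_TO_DC = {"creator": "creator", "publisher": "publisher", "contributor": "contributor"}


def generate_dublin_core(record: dict) -> dict[str, list[str]]:
    """Derive flat discovery metadata (Dublin Core element names) from context."""
    ds = record["dataset"]
    dc: dict[str, list[str]] = {
        "identifier": [ds["id"]],
        "title": [ds["title"]],
        "subject": list(ds.get("keywords", [])),
    }
    for link in record.get("links", []):
        element = ROLE_TO_DC.get(link["role"])
        if element is not None:  # roles without a DC element (e.g. funder) are left out
            dc.setdefault(element, []).append(link["name"])
    return dc


print(generate_dublin_core(cerif_like))
```

The direction matters: flat metadata can be generated from the richer contextual layer, but the funding information dropped on the way cannot be recovered from the Dublin Core record.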
Slide 13: Design
  - The presented structure provides the following improved facilities:
    - CERIF provides much richer metadata than the standards commonly used with PSI datasets.
    - The representation of contextual metadata (CERIF) allows rich semantics to be represented, thus making the PSI datasets understandable to the end user (or software) through the metadata.
    - The Structured Query Language (SQL) has a simpler structure than SPARQL and includes convenient primitive operations for simple statistical calculations such as sum, count and average.
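The convenient primitive operations of SQL can be shown with Python's built-in `sqlite3` module and an in-memory table of dataset metadata. The table, its columns and the rows are invented for illustration.

```python
import sqlite3

# In-memory database holding a hypothetical table of dataset metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id TEXT PRIMARY KEY, publisher TEXT, level TEXT, n_records INTEGER)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?)",
    [
        ("ds-001", "Statistics Office", "national", 1200),
        ("ds-002", "Statistics Office", "national", 800),
        ("ds-003", "City of Delft", "local", 150),
    ],
)

# COUNT, SUM and AVG per administrative level in a single declarative query.
rows = conn.execute(
    "SELECT level, COUNT(*), SUM(n_records), AVG(n_records) "
    "FROM dataset GROUP BY level ORDER BY level"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same `COUNT`, `SUM` and `AVG` aggregates were later added to SPARQL in SPARQL 1.1, which became a W3C Recommendation in 2013.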
Slide 14: Benefits of the architecture
  - Because of the powerful, expressive semantics over the formal syntax of CERIF, we can:
    - generate discovery metadata from CERIF;
    - interconvert the common metadata formats used in PSI, using CERIF as the superset exchange mechanism;
    - provide a Semantic Web / LOD representation of the metadata for browsing or querying using SPARQL;
    - at the same time, maintain a conventional information systems capability with structured query, including convenient primitive operations.
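Using CERIF as the superset exchange mechanism amounts to a hub-and-spoke conversion: each flat format is mapped once to and from a common vocabulary, instead of writing a converter for every pair of formats. In the minimal Python sketch below, a small hypothetical hub vocabulary stands in for CERIF, and the Dublin Core and CKAN-style field names are simplified (in the CKAN API, for instance, tags are objects rather than plain strings).

```python
# Field mappings from each flat format into a shared hub vocabulary.
# The hub stands in for CERIF; all field names are simplified illustrations.
TO_HUB = {
    "dc": {"identifier": "id", "title": "title", "creator": "creator",
           "subject": "keywords", "rights": "licence"},
    "ckan": {"name": "id", "title": "title", "author": "creator",
             "tags": "keywords", "license_id": "licence"},
}


def to_hub(fmt: str, record: dict) -> dict:
    """Map a record in one flat format onto hub field names."""
    mapping = TO_HUB[fmt]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_hub(fmt: str, hub: dict) -> dict:
    """Map hub fields back onto the field names of a target format."""
    inverse = {hub_key: key for key, hub_key in TO_HUB[fmt].items()}
    return {inverse[key]: value for key, value in hub.items() if key in inverse}


def convert(src: str, dst: str, record: dict) -> dict:
    """Interconvert two formats by going through the hub."""
    return from_hub(dst, to_hub(src, record))


dc_record = {"identifier": "ds-042", "title": "Road works 2011", "creator": "J. Jansen",
             "subject": ["transport", "roads"], "rights": "CC0"}
print(convert("dc", "ckan", dc_record))
```

Any field without a hub counterpart is silently dropped, which is why the hub has to be a superset of the formats it connects. With N formats, this approach needs N mappings rather than N × (N - 1) pairwise converters.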
Slide 15: Models for an infrastructure
  - The data model and its metadata, described above, are only one of the relevant models.
  - The other models are:
    - the user model;
    - the processing model;
    - the resource model.
Slide 16: The vision: the models
  [Figure: the user model, processing model, data model and resource model, with the labels "Complete cohort of users" and "Complete ICT environment for PSI".]
Slide 17: Models: user model
  - The user model controls the way in which the end user interacts with the e-infrastructure:
    - user profile, security certification, privacy;
    - device and interaction-mode preferences (from keyboard/mouse through voice and gesture to brain-connected), language preference;
    - resource preferences (including contacts), with directories.
  - METADATA
Slide 18: Models: processing model
  - The processing model controls the way processes are constructed and executed in the e-infrastructure:
    - Services:
      - described for discovery, and described for functional and non-functional (security, privacy, performance) properties;
      - mobile (deployed in distributed / parallel execution environments);
      - open source where possible.
    - Service composition:
      - dynamically (re-)composable during execution.
  - METADATA
Slide 19: Models: data model
  - The data model controls data representation and data (re-)use:
    - formal syntax (structure), even for text, images and streamed video;
    - declared semantics (meaning).
  - METADATA
Slide 20: Models: resource model
  - The resource model catalogs the available computing resources in the e-infrastructure:
    - this allows virtualisation, so the user neither knows nor cares where the data come from or where the processing is done, as long as quality of service is maintained;
    - it requires updating by resource owners, together with conditions of use.
  - METADATA
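The resource model's virtualisation can be sketched as a catalogue lookup: the user states what kind of resource is needed and a quality-of-service bound, and the catalogue decides where the work runs. The Python sketch below is illustrative only; the resource names, the latency figure used as the QoS measure and the "fastest eligible resource" policy are assumptions, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One catalogued resource, as described by its owner (hypothetical fields)."""
    name: str
    kind: str           # e.g. "compute" or "storage"
    latency_ms: float   # advertised quality-of-service figure (assumed QoS measure)
    conditions: str     # conditions of use supplied by the owner


def select(catalog: list[Resource], kind: str, max_latency_ms: float) -> Resource:
    """Pick the fastest resource of the requested kind that meets the QoS bound.

    The caller never names a server: the catalogue decides where work runs.
    """
    candidates = [r for r in catalog if r.kind == kind and r.latency_ms <= max_latency_ms]
    if not candidates:
        raise LookupError(f"no {kind} resource within {max_latency_ms} ms")
    return min(candidates, key=lambda r: r.latency_ms)


catalog = [
    Resource("cluster-a", "compute", 120.0, "research use only"),
    Resource("cluster-b", "compute", 45.0, "open"),
    Resource("archive-1", "storage", 30.0, "open"),
]
print(select(catalog, "compute", 100.0).name)  # cluster-b
```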
Slide 21: Conclusions (1)
  - Metadata are needed to make sense of open data.
  - Merely linking data is not enough to make optimal use of open data.
  - Metadata are key enablers for policy-making.
  - Adding metadata can yield considerable benefits, including:
    - creating order in datasets;
    - improving the findability, accessibility, storage and preservation of LOD;
    - making it easier to analyze, compare and reproduce LOD, and to find inconsistencies;
    - enabling the correct interpretation and visualization of LOD;
    - finding patterns in LOD to generate new hypotheses;
    - making the linking of data easier;
    - assessing and ranking the quality of LOD, and avoiding unnecessary duplication of LOD.
Slide 22: Conclusions (2)
  - Architecture for metadata:
    - discovery metadata can be generated from CERIF;
    - common metadata formats can use CERIF as the superset exchange mechanism;
    - a LOD representation of the metadata can be made for browsing or querying with SPARQL;
    - at the same time, a conventional information systems capability with structured query, including convenient primitive operations, can be maintained.
  - We recommend further implementing the proposed metadata architecture.