ENGAGE: An Infrastructure for Open, Linked Governmental Data Provision... – Nikos Houssos


OpenAIRE Interoperability Workshop (8 Feb. 2013).
ENGAGE: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens – Nikos Houssos, National Documentation Centre (EKT)/euroCRIS

Published in: Technology, Education

  1. ENGAGE: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
     Nikos Houssos, National Documentation Centre (EKT) / NHRF, euroCRIS
     OpenAIRE Interoperability Workshop, University of Minho, 7/8 February 2013
  2. Agenda
     • ENGAGE project overview
     • ENGAGE interoperability aspects
     • ENGAGE collaboration opportunities
  3. ENGAGE Project Information
     • Acronym: ENGAGE
     • Title: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
     • Website: http://www.engage-project.eu
     • Platform: http://www.engagedata.eu
     • Framework Programme: 7 (2007-2013), Research Infrastructures
     • Contract no: RI-283700
     • Project type: CP-CSA
     • Start date: 01/06/2011
     • Duration: 36 months
     • Partners: 9
     • Project participants: NTUA (Coordinator, GR), TU-DELFT (NL), MIC-GR (GR), IBM-ISRAEL (IL), INTRASOFT (LU), STFC (UK), FhG-FOKUS (DE), AEGEAN (GR), EUROCRIS (NL)
  4. Public Sector Information
     • Data produced by governmental organisations, typically referring to datasets
     • Examples: geospatial, demographic, statistical, environmental, public safety, financial data
     • Growing international movement: open access to PSI datasets in a way that facilitates reuse
     • Opening up PSI datasets can potentially lead to substantial economic gains [1]
     [1] Vickery, G. (2011): Review of recent studies on PSI re-use and related market developments.
  5. Overview of ENGAGE objectives
     • Development and use of a data infrastructure incorporating distributed and diverse public sector information (PSI) resources
     • Capable of supporting scientific collaboration and research, particularly for the Social Science and Humanities (SSH) scientific communities
     • Empowering the deployment of open governmental data towards citizens
     Simply put, ENGAGE is a door for researchers that leads them to the world of Open Government Data. Through the ENGAGE platform, researchers and citizens will be able to search, browse, download, visualise and submit diverse and distributed Public Sector datasets from EU countries.
  6. ENGAGE Two-way Scenario
     Delivering Open Data:
     • Public Sector Information Collection: Public Sector Organisations; Open data initiatives
     • Data Curation: Pre-processing; Anonymisation; Harmonisation; Annotation; Linking
     • Archival: Cloud and Grid Infrastructure; Platform Independence and Interoperability
     • Data Search and Retrieval: Open and intuitive access to the data collection; Context-specific search
     • Advanced Data Services: Visualisation (inc. combined views); Context-specific formatting; Collaboration tools
     Needs and guidelines to Public Sector Organisations:
     • New Problems – new Challenges: Society; Policy
     • Search Data Needs: Research Communities; Policy makers
     • New Service Definition for open data: ENGAGE
     • Utilisation of existing Infrastructures: ENGAGE and eInfrastructures
     • Needs for Governmental data Provision: Public Sector Organisations
  7. ENGAGE traverses across distributed and diverse public sector information resources
     • Ministries / local public agencies websites
     • Publicdata.eu
     • National Statistical Offices
     • Unstructured / “Semi-structured” Public data sources
     ENGAGE provides a single point of access to PSI sources as well as relevant tools in order to cover the needs of researchers and citizens
  8. ENGAGE: not an isolated data silo but a vital part of the Global Data Space. ENGAGE aims to embrace the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models.
  9. ENGAGE will enable EU Researchers / Citizens to
     • Discover and browse datasets across diverse and dispersed public sector information resources (local, national and European) in their own language
     • Upload curated, enhanced or extended versions of existing datasets, originally published by public agencies, in order to address various formats, standards and scientific purposes in a crowdsourcing manner
     • Acquire the datasets
     • Visualize properly structured datasets in data tables, maps and charts
     Additionally
     • Utilize ENGAGE Application Programming Interfaces (APIs) for searching and acquiring the datasets
     • Rate the quality of datasets on various dimensions
     • Request additional datasets or information on existing datasets from the Public Agencies
     • View usage statistics
     • View publications and other material linked to datasets
  10. Public Agencies will be able to
      • Utilize the ENGAGE infrastructure (interface and APIs) to publish governmental data
      • Register and link their datasets within the ENGAGE infrastructure
      • Receive feedback on the quality of their datasets
      • Review the opinions or requests of citizens and researchers
      • View the applications, publications and other datasets uploaded by scientists that are linked to their original published datasets
  11. Moving from low structured, low value datasets to highly structured and / or derived datasets
      [Diagram labels]
      • Input: Unstructured / Semi-structured / Structured Public data sources
      • ENGAGE; JSON; Discovery and Context Metadata; Crowdsourcing
      • Steps: Conversion, Data Enrichment, Metadata Enrichment, Cleansing, “Snapshots”
      • Scale: Low Re-Use Value / Quality structure / metadata to High Re-Use Value / Quality structure / metadata
  12. ENGAGE 2.0
      • An infrastructure that integrates original PSI data and derived / curated datasets created, maintained and extended by users (researchers, citizens, journalists, computer specialists) in a collaborative environment. A curation platform with focus on the SSH research communities.
      • To be released Spring 2013
      • The vision of the ENGAGE infrastructure is to extract, highlight and enhance the RE-USE value of PSI data.
      • HOW: Moving from low-structured, isolated, difficult to find PSI data to easy to link, easy to process datasets with rich, structured metadata
  13. ENGAGE 2.0
      • On top of ENGAGE basic functions (catalog, search, visualizations, API)
      Researchers / Citizens / Journalists:
      • Extend other datasets (official or already extended - derived datasets)
        • Conversions (e.g. HTML/PDF to XLS, PDF to RDF)
        • Data Cleansing (e.g. duplicate records, empty rows, errors)
        • Metadata Enrichment (missing metadata, Linked Data Enablers!)
        • Data Enrichment (enrich datasets with more information)
        • Snapshots of real-time data (e.g. Diavgeia_decisions_10_2012_to_12_2012.xls)
        • Mash-ups / Interlinking (e.g. Combine Election results to UV radiation levels!)
      • View the version tree of official and derived datasets (clean solution, easy to understand and manage the contributions / versions)
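The data cleansing examples named on this slide (duplicate records, empty rows) can be illustrated with a minimal Python sketch. It is not taken from the ENGAGE platform: the helper name `cleanse_rows` and the sample table are hypothetical, and the sketch assumes the dataset has already been read into rows of strings.

```python
import csv
import io


def cleanse_rows(rows):
    """Drop empty rows and exact duplicate records, keeping first occurrences.

    Hypothetical helper for illustration only; not part of any ENGAGE API.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Strip surrounding whitespace so " Attica" and "Attica" compare equal.
        normalised = tuple(cell.strip() for cell in row)
        # A row with no cells, or only blank cells, counts as empty.
        if not any(normalised):
            continue
        # Skip records already seen earlier in the table.
        if normalised in seen:
            continue
        seen.add(normalised)
        cleaned.append(list(normalised))
    return cleaned


# Made-up sample standing in for a published PSI table.
RAW_CSV = """region,year,turnout
Attica,2012,62.5
Attica,2012,62.5
,,
Crete,2012,58.1
"""

rows = list(csv.reader(io.StringIO(RAW_CSV)))
cleaned = cleanse_rows(rows)
for row in cleaned:
    print(",".join(row))
```

Keeping the first occurrence of each record preserves the original row order, which matters when a derived dataset is later compared against the official version it extends.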
  14. ENGAGE 2.0
      Researchers / Citizens / Journalists:
      • Data Requests
        • Looking for a dataset (e.g. I can’t find it elsewhere. Does it exist?)
        • Looking for a curation / conversion / enrichment (e.g. I am looking for the election results in Greece in XLS.)
        • Looking for data verification (e.g. Do you think this dataset is valid?)
        • Freedom of Information Requests
      • Integration of tools
        • Google Refine
        • ScraperWiki
        • Visualizations
  15. ENGAGE 2.0
      Data Providers:
      • Maintainers of Official Datasets
      • Work as a group
      • Bring the community which works on their data closer to them / direct communication
      • See and take advantage of ENGAGE Data Curation Community work (e.g. cleansing, better formats)
      • Easy to see / gather all the Applications that are based on their official datasets
      • See the impact of their datasets
      • Understand which datasets have RE-USE value for users
      • Community Help in the process of Digitalization and Opening of current or older Public Data (history dimension)
  16. Rich, structured metadata to enable Linked Data
      • Structure: Entities and semantic relationships instead of plain fields
      • Each entity has structured metadata, including a URI field
      • Semantics: Each relationship has clear semantics
        • What is the relationship of organisation Y with data set X?
        • Creator, maintainer, commissioner, …
      • Ability to dynamically include vocabularies into the system => linked data, reuse of existing vocabularies/ontologies
      • CERIF (Entities and Semantic Layer) provides the required features for contextual metadata
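The idea of entities joined by relationships with explicit semantics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CERIF data model: the class names `Entity` and `Relationship`, the helpers `link` and `roles_of`, and the `example.org` URIs are all hypothetical. Only the role terms (creator, maintainer, commissioner) come from the slide.

```python
from dataclasses import dataclass

# Role terms taken from the slide; in a real system these would come from a
# vocabulary loaded at run time rather than a hard-coded set.
ROLE_VOCABULARY = {"creator", "maintainer", "commissioner"}


@dataclass(frozen=True)
class Entity:
    uri: str   # every entity carries a URI so other data can link to it
    kind: str  # e.g. "Organisation" or "Dataset"
    name: str


@dataclass(frozen=True)
class Relationship:
    source: Entity
    role: str  # the semantics of the link, drawn from a vocabulary
    target: Entity


def link(source, role, target, vocabulary=ROLE_VOCABULARY):
    """Create a relationship, rejecting roles the vocabulary does not define."""
    if role not in vocabulary:
        raise ValueError(f"unknown role: {role!r}")
    return Relationship(source, role, target)


def roles_of(relationships, organisation, dataset):
    """Answer: what is the relationship of this organisation with this dataset?"""
    return sorted(
        rel.role
        for rel in relationships
        if rel.source == organisation and rel.target == dataset
    )


# Hypothetical example data.
org_y = Entity("http://example.org/org/Y", "Organisation", "Organisation Y")
dataset_x = Entity("http://example.org/dataset/X", "Dataset", "Data set X")
relationships = [
    link(org_y, "creator", dataset_x),
    link(org_y, "maintainer", dataset_x),
]
print(roles_of(relationships, org_y, dataset_x))
```

Passing a larger set as `vocabulary` is a simple stand-in for including a new vocabulary dynamically: the relationship structure stays the same while the set of permitted meanings grows.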
  17. Rich contextual metadata is important
      • Captures context, purpose, provenance, coverage, etc.
      • Allows the user to:
        • Discover a dataset
        • Evaluate utility and re-use potential
        • Reuse it!
      • Enables advanced services
        • Sophisticated search/discovery and navigation, mining, visualisation, reporting
      11th International Conference on Current Research Information Systems (CRIS 2012), Prague, 6-9 June 2012
  18. A 3-level metadata approach
      • Level 1. Discovery metadata. Flat schemata (analogous to Dublin Core). Enables basic search by non-sophisticated users.
      • Level 2. Usage metadata. A structured, semantically rich model for contextual metadata. Enables advanced domain-independent services.
      • Level 3. Domain metadata. Detailed domain-specific metadata. Allows advanced services provided by specialised tools.
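A minimal Python sketch of how one dataset record might carry all three levels side by side. The field names, the sample records and the `discover` function are hypothetical and are not taken from ENGAGE. The sketch only illustrates that basic search needs nothing beyond the flat level-1 fields, while the richer levels remain available to more advanced services.

```python
# Hypothetical records; each keeps the three metadata levels separate.
records = [
    {
        # Level 1: flat discovery fields, in the spirit of Dublin Core.
        "discovery": {
            "identifier": "ds-001",
            "title": "Election results 2012",
            "description": "Results per region",
        },
        # Level 2: structured contextual metadata (provenance, roles, ...).
        "usage": {
            "provenance": {"derived_from": None},
            "roles": [{"organisation": "Organisation Y", "role": "creator"}],
        },
        # Level 3: domain-specific detail for specialised tools.
        "domain": {"unit_of_analysis": "region", "variables": ["turnout"]},
    },
    {
        "discovery": {
            "identifier": "ds-002",
            "title": "UV radiation levels",
            "description": "Daily UV index measurements",
        },
        "usage": {"provenance": {"derived_from": None}, "roles": []},
        "domain": {"unit_of_analysis": "station", "variables": ["uv_index"]},
    },
]


def discover(records, term):
    """Level-1 search: case-insensitive match over the flat discovery fields."""
    needle = term.lower()
    return [
        record["discovery"]["identifier"]
        for record in records
        if any(needle in str(value).lower() for value in record["discovery"].values())
    ]


print(discover(records, "election"))
print(discover(records, "uv"))
```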
  19. Metadata approach
  20. Overview of architecture for PSI metadata
      [Diagram labels] Data Source 1, Data Source 2, …, Data Source N; SPARQL interface; RDF / Linked Open Data; Dublin Core, CERIF, eGMS, DCAT, CKAN
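The architecture above exposes metadata as RDF / Linked Open Data behind a SPARQL interface. The slides do not describe the actual mapping, so the following Python sketch only shows the general idea: a flat record is serialised to N-Triples using the DCAT `Dataset` class and Dublin Core terms. The dataset URI, the sample record and the function names are hypothetical.

```python
# Real vocabulary URIs from RDF, DCAT and Dublin Core terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(dataset_uri, flat_metadata):
    """Serialise a flat record as N-Triples lines.

    Assumes every key of flat_metadata is a Dublin Core term name such as
    "title" or "description"; this is an illustrative mapping, not ENGAGE's.
    """
    lines = [f"<{dataset_uri}> <{RDF_TYPE}> <{DCAT_DATASET}> ."]
    for field in sorted(flat_metadata):
        literal = escape_literal(flat_metadata[field])
        lines.append(f'<{dataset_uri}> <{DCT}{field}> "{literal}" .')
    return lines


# Hypothetical record and URI.
triples = to_ntriples(
    "http://example.org/dataset/ds-001",
    {"title": "Election results 2012", "description": 'Results per "region"'},
)
print("\n".join(triples))
```

Triples of this shape can be loaded into any RDF store, which is what makes a common SPARQL endpoint over otherwise heterogeneous data sources possible.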
  21. Thank you