ENGAGE Workshop at OpenDataWeek2013

Slides supporting the workshop "Accelerating data reuse: international initiatives? what role for the European Commission?", held on June 27th, 2013, in Marseille. The workshop introduced the ENGAGE platform and was led by Valerie BRASSE.

Transcript

  • 1. ENGAGE Workshop, June 27th, 2013, Marseille. Accelerating data re-use: example of an e-infrastructure at European level. Valerie Brasse, euroCRIS / IS4RI, Strasbourg, France. Slides reproduced from presentations by ENGAGE members.
  • 2. Agenda
      – The ENGAGE project, an introduction
      – The ENGAGE 2.0 platform, released in Beta since April 2013
      – Open data for re-use in Europe, some barriers to overcome?
          – Findings from the ENGAGE project
          – Discussion
          – Your suggestions to overcome the barriers
  • 3. ENGAGE Project Information
      – Acronym: ENGAGE
      – Title: An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
      – Contract no: RI-283700
      – Project type: CP-CSA
      – Start date: 01/06/2011
      – Duration: 36 months
      – Partners: 9
      – Programme: Framework Programme 7 (2007-2013), Research Infrastructures
      – Website: http://www.engage-project.eu
      – Platform: http://www.engagedata.eu
      – Project participants: NTUA, GR (Coordinator); TU-DELFT, NL; MIC-GR, GR; IBM-ISRAEL, IL; INTRASOFT, LU; STFC, UK; FhG-FOKUS, DE; AEGEAN, GR; EUROCRIS, NL
  • 4. Public Sector Information
      – Data produced by governmental organisations, typically referring to datasets
      – Examples: geospatial, demographic, statistical, environmental, public safety, financial data
      – Growing international movement: open access to PSI datasets in a way that facilitates reuse
      – Opening up PSI datasets can potentially lead to substantial economic gains [1]
      [1] Vickery, G. (2011): Review of recent studies on PSI re-use and related market developments.
  • 5. Overview of ENGAGE objectives
      – Development and use of a data infrastructure, incorporating distributed and diverse public sector information (PSI) resources
      – Capable of supporting scientific collaboration and research, particularly for the Social Science and Humanities (SSH) scientific communities
      – Empowering the deployment of open governmental data towards citizens
      Simply put, ENGAGE is a door for researchers that leads them to the world of Open Government Data. Through the ENGAGE platform, researchers and citizens will be able to search, browse, download, visualise and submit diverse and distributed Public Sector datasets from EU countries.
  • 6. ENGAGE Two-way Scenario (diagram)
      – Public Sector Information Collection: Public Sector Organisations; Open data initiations
      – Data Curation: Pre-processing; Anonymisation; Harmonisation; Annotation; Linking
      – Archival: Cloud and Grid Infrastructure; Platform Independence and Interoperability
      – Data Search and Retrieval: Open and intuitive access to the data collection; Context-specific search
      – Advanced Data Services: Visualisation (inc. combined views); Context-specific formatting; Collaboration tools
      – Delivering Open Data; Needs and guidelines to Public Sector Organisations
      – Actors shown: Public Sector Organisations; ENGAGE and eInfrastructures; ENGAGE; Society; Policy; Research Communities; Policy makers
      – Other labels: New Problems – new Challenges; Search Data Needs; New Service Definition for open data; Utilisation of existing Infrastructures; Needs for Governmental data Provision
  • 7. ENGAGE provides a single point of access to PSI sources as well as relevant tools in order to cover the needs of researchers and citizens. ENGAGE traverses across distributed and diverse public sector information resources.
      – Public data sources shown: unstructured / "semi-structured" ministries / local public agencies websites; Publicdata.eu; National Statistical Offices
  • 8. ENGAGE aims to embrace the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models. ENGAGE: not an isolated data silo but a vital part of the Global Data Space.
  • 9. ENGAGE will enable EU Researchers / Citizens to
      – Discover and browse datasets across diverse and dispersed public sector information resources (local, National and European) in their own language.
      – Upload curated, enhanced or extended versions of existing datasets, originally published by public agencies, in order to address various formats, standards and scientific purposes in a crowd-sourcing manner.
      – Acquire the datasets
      – Visualize properly structured datasets in data tables, maps and charts
      Additionally
      – Utilize ENGAGE Application Programming Interfaces (APIs) for searching and acquiring the datasets.
      – Rate the quality of datasets on various dimensions
      – Request additional datasets or information on existing datasets from the Public Agencies
      – View usage statistics
      – View publications and other material linked to datasets
  • 10. Public Agencies will be able to
      – Utilize the ENGAGE infrastructure (interface and APIs) to publish governmental data
      – Register and link their datasets within the ENGAGE infrastructure
      – Receive feedback on the quality of their datasets
      – Review the opinions or requests of citizens and researchers
      – View the applications, publications and other datasets uploaded by scientists that are linked to their original published datasets
  • 11.
      – Integration of original PSI data and derived / curated datasets created, maintained and extended by users (researchers, citizens, journalists, computer specialists) in a collaborative environment. A research / data curation community platform with focus on the SSH domain.
      – The vision of the ENGAGE infrastructure is to extract, highlight and enhance the RE-USE value of PSI data.
      – HOW: Moving slowly from low-structured, isolated, difficult to find PSI data to high-structured, easy to link, easy to process datasets => Crowd-sourcing.
  • 12. Moving from low structured, low value datasets to highly structured and / or derived datasets (diagram)
      – Input: Unstructured / Semi-structured / Structured public data sources; Low Re-Use Value / Quality structure / metadata
      – ENGAGE Crowdsourcing: JSON Conversion; Data Enrichment; Metadata Enrichment; Cleansing; "Snapshots"
      – Discovery and Context Metadata
      – Output: High Re-Use Value / Quality structure / metadata
  • 13. ENGAGEDATA.EU
  • 14. ENGAGE 2.0
      On top of ENGAGE basic functions (catalog, search, visualizations, API), Researchers / Citizens / Journalists can:
      – Extend other datasets (official or already extended - derived datasets)
      – Conversions (e.g. HTML / PDF to xls, PDF to RDF)
      – Data Cleansing (e.g. duplicate records, empty rows, errors)
      – Metadata Enrichment (missing metadata, Linked Data Enablers!)
      – Data Enrichment (enrich datasets with more information)
      – Snapshots of real-time data (e.g. Diavgeia_decisions_10_2012_to_12_2012.xls)
      – Mash-ups / Interlinking (e.g. Combine Election results to UV radiation levels!)
      – View the version tree of official – derived datasets (clean solution - easy to understand and manage the contributions / versions)
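
The "Data Cleansing" item above (duplicate records, empty rows) can be illustrated with a minimal Python sketch. It is not the ENGAGE implementation: the function name `cleanse_rows` and the sample CSV content are made up for illustration, and only the standard library is used.

```python
import csv
import io


def cleanse_rows(rows):
    """Drop empty rows and exact duplicate records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip surrounding whitespace so " Attica" and "Attica" compare equal.
        normalised = tuple(cell.strip() for cell in row)
        if not any(normalised):
            continue  # empty row: every cell is blank
        if normalised in seen:
            continue  # duplicate record already kept once
        seen.add(normalised)
        cleaned.append(list(normalised))
    return cleaned


# Invented sample data: one duplicate record and one empty row.
SAMPLE_CSV = """region,year,votes
Attica,2012,1000
Attica,2012,1000
,,
Crete,2012,250
"""

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
cleaned = cleanse_rows(rows)
for row in cleaned:
    print(row)
```

A curated file produced this way would be uploaded as a derived dataset, so the official version stays untouched in the version tree.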
  • 15. ENGAGE 2.0
      Researchers / Citizens / Journalists:
      – Data Requests
          – Looking for a dataset (e.g. I can't find it elsewhere. Does it exist?)
          – Looking for a curation / conversion / enrichment (e.g. I am looking for the election results in Greece in XLS.)
          – Looking for data verification (e.g. Do you think this dataset is valid?)
      – Freedom of Information Requests
      – Integration of tools
          – Google Refine
          – ScraperWiki
          – Visualizations
  • 16. ENGAGE 2.0
      Data Providers:
      – Maintainers of Official Datasets
      – Work as a group
      – Bring the community which works on their data closer to them / direct communication
      – See and take advantage of ENGAGE Data Curation Community work (e.g. cleansing, better formats)
      – Easy to see / gather all the Applications that are based on their official datasets.
      – See the impact of their datasets.
      – Understand which datasets have RE-USE value for users.
      – Community Help in the process of Digitalization and Opening of current or older Public Data (history dimension)
  • 17. Search for a dataset... ...use your own language
  • 18. Check dataset information... ...and download it
  • 19. Faceted search available... ...with several filters
  • 20. Extend the datasets... ...in several ways...
  • 21. ...and keep the provenance information
  • 22. www.engagedata.eu Open Refine
  • 23. Describe the metadata...
  • 24. Join the community... ...and create your groups
  • 25. Rate the datasets... ...and share your thoughts
  • 26. Find out about Open Data sites...
  • 27. ...per country
  • 28. ...or other criteria
  • 29. Learn more about... ...ENGAGE, the ENGAGE API and Data Curation Methods
  • 30. Functionalities of ENGAGE open data e-infrastructure
      Contribution of ENGAGE over existing infrastructures:
      1. Service for researchers and citizens
      2. Metadata specification and content organisation (embracing the Linked Data Paradigm while ensuring the quality and responsiveness of highly structured information models)
      3. Automation in data entry and curation
      4. Crowdsourcing and interaction with and between users of the platform
      5. Data curation tools and services
      6. Dataset visualisation possibilities
      7. Multilinguality
      8. User help and training
  • 31. Value Proposition through individual tools
      – Search in diverse and dispersed data sources in the EU supported by ENGAGE
      – Transform your datasets while keeping the valuable information, using the ENGAGE external tools (Open Refine, ScraperWiki etc.)
      – See your results through visualisation tools
      – Structure your data according to your needs: control all the levels of your dataset (data, metadata, format)
      – Refine existing datasets by metadata enrichment
  • 32. Value Proposition through collaboration
      – Create your community(ies) with members who share your interests
      – Each community will be able to increase the value of its datasets by applying its own perspective, based on its unique needs
      – Upload your work and share it with your community
      – Find other datasets, valuable for your work, uploaded by your community (Collaborate / Exchange / Ask / Provide)
      – Combine their results with yours to make new datasets
  • 33. Platform architecture (diagram)
      – Blocks shown: Gateways and integrated tools; User Interface; ENGAGE Core Components; Storage Components
      – Technologies named: Elastic Search; Ckan API; ScraperWiki API; Open Refine; Django Wiki; Amazon S3; Python / Django Framework; Heroku Postgresql; Virtuoso; PostgreSQL; Apache SolR; HTML / Jquery; Translate; CERIF
  • 34. Performing scenarios
      – Scenario 1: Searching, downloading, extending / visualizing / curating / linking and uploading interesting datasets
      – Scenario 2: Getting information about other open data websites and comparing them via the ENGAGE website
      – Scenario 3: Getting information about manuals, APIs and tutorials (training)
  • 35. engagedata.eu engage-project.eu → Events → Workshops → ENGAGE Online Usability Test Verification Code = ODWM
  • 37. From V1 evaluation
      Users are asking for:
      – More of the specific datasets that users are looking for
      – Better performing advanced search functionality
      – More dataset formats, and more open ones
      – More tools for visualization
      – More metadata
      – More metadata in the language that the user understands
      – More understandable metadata
      – Easy-to-find metadata
      – Information about the quality of the datasets
      – Ability to rate and post comments on datasets
      ➢ Metadata are very important in solving many problems + Multilinguality + Dataset formats
  • 38. Challenges of data sourcing
      – Great diversity among datasets in terms of
          – File format
          – Encoding
          – License
          – Language
          – Metadata standard (Discovery level)
          – Metadata standard (Data - Domain level)
      – Some PSI sites (even new ones) do not provide an API
      – Most sites provide an API only for discovery
      – Linked Data potential still not achieved (IT-savvy users / researchers only)
      – Live querying of other portals' datasets has issues:
          – Schema Mapping
          – Performance
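
To make the "API only for discovery" point concrete, here is a minimal Python sketch of a discovery-level query. It assumes a portal that exposes the CKAN Action API (version 3) `package_search` call; the host `portal.example.org` is a placeholder, the canned response is invented so the sketch runs offline, and none of this is the ENGAGE API itself.

```python
import json
from urllib.parse import urlencode

# Placeholder portal address, used only for illustration.
BASE_URL = "https://portal.example.org"


def build_search_url(base_url, query, rows=10):
    """Build a CKAN Action API (v3) package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_discovery_metadata(response_text):
    """Keep only discovery-level fields: title, licence id and file formats."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("portal reported an unsuccessful request")
    summary = []
    for package in payload["result"]["results"]:
        formats = sorted(
            {r["format"].upper() for r in package.get("resources", []) if r.get("format")}
        )
        summary.append(
            {
                "title": package.get("title"),
                "license_id": package.get("license_id"),
                "formats": formats,
            }
        )
    return summary


# Invented response in the shape CKAN returns for package_search.
CANNED_RESPONSE = json.dumps(
    {
        "success": True,
        "result": {
            "count": 1,
            "results": [
                {
                    "title": "Election results 2012",
                    "license_id": "cc-by",
                    "resources": [{"format": "csv"}, {"format": "XLS"}],
                }
            ],
        },
    }
)

print(build_search_url(BASE_URL, "election results", rows=5))
print(extract_discovery_metadata(CANNED_RESPONSE))
```

The response describes datasets (title, licence, formats) but not their contents, which is why file format, encoding and schema mapping still have to be handled after download.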
  • 39. Barriers to overcome?
      – Metadata
          – Need for a rich format to facilitate discovery and search (DCAT, CKAN..., CERIF?)
          – How to have it filled: human vs extraction
          – Tracking of data ownership and provenance, for trust and security
      – Dataset formats: from pdf/csv toward LOD/rdf
      – Multilinguality (metadata and data)
      – Licences
          – Many "open" licences: CC and national licences
          – Curation: what licence for a merged and enriched A + B, depending on A and B licences?
  • 42. Rich contextual metadata is important
      – Captures context, purpose, provenance, coverage, etc.
      – Allows the user to:
          – Discover a dataset
          – Evaluate utility and re-use potential
          – Reuse it!
      – Enables advanced services
          – Sophisticated search/discovery and navigation, mining, visualisation, reporting
      From the 11th International Conference on Current Research Information Systems (CRIS 2012), Prague, 6-9 June 2012
  • 43. Mapping considerations
      – Need a canonical form to reduce n(n-1) conversions to n
          – PSI data has several different metadata 'standards'
      – The canonical form must be able to ingest or generate the other metadata 'standards'
          – This implies it has to be richer than the others
      – Syntax (structure) and semantics
      – Support multiple semantics over the canonical syntax
      – The canonical form must support whatever architecture is used
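
The n(n-1) versus n argument on the mapping slide can be sketched in Python. This is an illustration under stated assumptions, not the ENGAGE or CERIF mapping: the three field crosswalks are deliberately tiny and simplified, the canonical keys `title` and `creator` are invented, and each standard's mapping is counted once although it is used in both directions (ingest and generate).

```python
def pairwise_converters(n):
    """Direct converters needed when each of n standards maps to every other."""
    return n * (n - 1)


def canonical_mappings(n):
    """Mappings needed when every standard maps only to the canonical form."""
    return n


# Simplified, illustrative crosswalks: native field name -> canonical field name.
TO_CANONICAL = {
    "dc": {"title": "title", "creator": "creator"},
    "ckan": {"title": "title", "author": "creator"},
    "dcat": {"dct:title": "title", "dct:creator": "creator"},
}


def ingest(standard, record):
    """Translate a native record into the canonical form."""
    mapping = TO_CANONICAL[standard]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def generate(standard, canonical):
    """Translate a canonical record back into a native standard."""
    reverse = {canon: native for native, canon in TO_CANONICAL[standard].items()}
    return {reverse[key]: value for key, value in canonical.items() if key in reverse}


def convert(source, target, record):
    """Convert between any two standards through the canonical hub."""
    return generate(target, ingest(source, record))


n = len(TO_CANONICAL)
print(pairwise_converters(n), "direct converters vs", canonical_mappings(n), "canonical mappings")
print(convert("ckan", "dcat", {"title": "Budget 2012", "author": "Ministry"}))
```

Adding a fourth standard to this sketch means writing one more crosswalk entry, whereas direct conversion would need six new converters (three in each direction). Fields that the canonical form cannot express are silently dropped here, which is the reason the slide asks for a canonical form richer than the standards it ingests.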
  • 44. A 3-level metadata approach
      – Level-1. Discovery metadata. Flat schemata (analogous to Dublin Core). Enables basic search by non-sophisticated users.
      – Level-2. Usage metadata. A structured, semantically-rich model for contextual metadata. Enables advanced domain-independent services.
      – Level-3. Domain metadata. Detailed domain-specific metadata. Allows advanced services provided by specialized tools.
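
One way to picture the three levels is as nested records, sketched below in Python. All class and field names are hypothetical and far simpler than CERIF or any real standard; the point is only that basic search touches level 1 alone, while levels 2 and 3 are optional layers for richer services.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DiscoveryMetadata:
    """Level 1: flat, Dublin Core-like fields for basic search."""
    title: str
    description: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class UsageMetadata:
    """Level 2: structured contextual metadata (provenance, licence, ...)."""
    publisher: str
    derived_from: Optional[str] = None  # identifier of the parent dataset, if any
    licence: Optional[str] = None


@dataclass
class DatasetRecord:
    identifier: str
    discovery: DiscoveryMetadata
    usage: Optional[UsageMetadata] = None
    # Level 3: domain-specific metadata, keyed by standard name (e.g. "SDMX").
    domain: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def matches(self, term: str) -> bool:
        """Basic search that needs only level-1 fields."""
        term = term.lower()
        d = self.discovery
        return (
            term in d.title.lower()
            or term in d.description.lower()
            or any(term == keyword.lower() for keyword in d.keywords)
        )


# Invented example record.
record = DatasetRecord(
    identifier="ds-001",
    discovery=DiscoveryMetadata(title="Unemployment by region", keywords=["labour"]),
    usage=UsageMetadata(publisher="National Statistical Office", licence="OGL"),
    domain={"SDMX": {"dimensions": ["region", "year"], "measures": ["rate"]}},
)

print(record.matches("unemployment"), record.matches("geology"))
```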
  • 45. A 3-level metadata approach (diagram)
      – Level-1: eGMS, DCAT, CKAN, DC
      – Level-2: CERIF, with CERIF-generated RDF/LOD
      – Level-3:
          – CSMD: scientific studies (samples, parameters, …)
          – DDI: social sciences (surveys, populations, questionnaires, …)
          – INSPIRE: geospatial data (geospatial info)
          – SDMX: statistical data (measures, dimensions, …)
  • 46. A 3-level metadata approach (diagram). Labels: Target Dataset(s); generate; Point to; Processing model.
  • 47. CERIF: Common European Research Information Format, maintained by euroCRIS. From http://cerifsupport.org/2013/04/02/data-in-cerif/ , B. Joerg
  • 49. Dataset formats
  • 50. Community converting dataset to another format
  • 52. User Interface translation
  • 53. Metadata translation
  • 55. Open licenses landscape
  • 56. Open licenses landscape – per country
      Country (Portal): Licence
      – France (Data.gouv.fr): Licence Ouverte
      – United Kingdom (Data.gov.uk): Open Government Licence
      – Italy (Dati.gov.it): Creative Commons Attribuzione - Non commerciale 2.5 Italia (CC BY-NC 2.5)
      – Germany (Govdata.de): Datenlizenz Deutschland – Namensnennung; Datenlizenz Deutschland – Namensnennung – nicht kommerziell
      – Norway (Data.norge.no): Norsk lisens for offentlige data (NLOD)
      – Netherlands (Data.overheid.nl): No specific common licence but a recommendation for the agencies publishing data through the portal to use the framework of the Open Government Act, and to apply Creative Commons Zero of Public Domain if any licence is desired at all
      – Spain (Datos.gob.es): No specific licence but two parts in extensive legal notes that cover data re-use and are based on different pieces of Spanish national legislation
      – Belgium (Data.gov.be): No specific common licence. Each public service or government institution determines the terms and conditions governing access to and use of its data published through the portal.
      From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government
  • 57. Open licenses landscape – several types
  • 58. Open license content – an example: regulation components of the data.gouv.fr open licence. From Bunakov, V., Jeffery, K. (2013). Licence management for Public Sector Information. Conference for eDemocracy & Open Government
  • 59. Conclusion: Participants' suggestions
      The ENGAGE platform and its features are of interest for promoting data re-use, provided the following points are addressed:
      1. Have a clear view of the positioning of ENGAGE in the Open Data ecosystem, including the added value / differences with respect to HOMER and other Open Data-related EC projects
      2. Ensure ENGAGE sustainability
      3. For ENGAGE in particular, and for EC projects in general, the developed software should be required to be open source in order to ensure its sustainability
      4. Success stories related to the use of ENGAGE should be promoted, for example demonstrating the time savings for researchers
      5. The educational side should be strong, with the inclusion of basic information on Linked Data, video tutorials, ...
  • 60. Thank you for your contribution! Join us: http://www.engage-project.eu. Contact: vbrasse@is4ri.com, @valcas2000