Framework and Roadmap towards an Open Science Infrastructure/Simon Hodson

Presented during the African Open Science Platform ICT Infrastructure meeting on 14 May 2018, Pretoria, South Africa.

Published in: Data & Analytics

  1. 1. Framework and Roadmap towards an Open Science Infrastructure Simon Hodson, Executive Director, CODATA www.codata.org AOSP Workshop: Framework and Roadmap towards an Open Science Infrastructure Centurion Lake Hotel 14 May 2018
  2. 2.  Vision of a coordinating activity to help put in place and link the enabling practices, capacities and technologies for Open Science.  Pan African in ambition.  Funded by Department of Science and Technology via National Research Foundation; delivered by ASSAf, directed by CODATA.  Current three year pilot preparing the foundations for a broader initiative.  Successful first strategy workshop (March 2018) followed by a stakeholder workshop (Sept 2018) to prepare the platform initiative.  Aim for this to be launched at Science Forum South Africa, Dec 2018. African Open Science Platform
  3. 3. • Key deliverables of the pilot project will be foundations for the platform in these four key areas: 1. Frameworks and guidance to assist policy development at national and institutional level. 2. Study and recommendations to reduce barriers and provide constructive incentives for Open Science. 3. Framework for data science training (including RDM, data stewardship and science of data); curriculum framework, training materials, recommendations for training initiatives. 4. Framework and roadmap for data infrastructure development: emphasising partnerships and de-duplication between national systems, economies of scale, institutions and domain initiatives. Framework for Policies, Incentives, Training and Technical Infrastructures
  4. 4. Developing a Framework and Roadmap for Open Science Infrastructure • Today’s meeting: to help inform the project on matters of data infrastructure and to benefit from your expertise. • A preliminary document identifying a set of priorities and a plan for development to inform discussions in September. • Virtualised network, compute and storage: delivered in such a way as to achieve economies of scale (regional, national and institutional dimensions). • Open Science Infrastructure: including international ecosystem for FAIR data, requirements of data stewardship, specialised Research Infrastructures. • A final project output which will lay out a vision and set of priorities and actions for data infrastructure to inform the activities of a proposed phase two.
  5. 5. The Case for Open Data in a Big Data World • Science International Accord on Open Data in a Big Data World: http://www.science-international.org/ • Supported by four major international science organisations. • Presents a powerful case that the profound transformations mean that data should be: • Open by default: as open as possible, as closed as necessary • Intelligently open: FAIR data • Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved. • Campaign for endorsements: over 150 organisations so far. • Please consider endorsing the Accord: http://www.science-international.org/#endorse
  6. 6. Framework for Regional, National and Institutional Data Strategies  National / Institutional Open Science and FAIR Data Strategy  Consultative forum, stakeholder engagement.  Open data policies and guidance at national and institutional level.  Clarify the boundaries of open (particularly privacy, IPR).  Clarify the data in scope, guidelines on selection.  Develop incentives and reward systems.  Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output.  Data ‘publication’ and citations of data included in assessment of research contribution.  Promotion of data skills:  Essential data skills for researchers.  Develop skills and competencies for data stewards, data scientists.
  7. 7. Framework for Regional, National and Institutional Data Strategies  Scope, roadmap and implement data infrastructure.  Network, compute and storage: key components of national, regional infrastructure (network / NREN, economies of scale for storage and compute).  Engagement with international FAIR Data / Open Science data ecosystem components: permanent identifiers, metadata standards, standards for TDRs, etc.  Data Stewardship Infrastructure: Development of regional, national and institutional infrastructure(s) for data stewardship and Open Science (RDM, generic and specialised research platforms/environments, trusted digital repositories).  Collaborative Research Infrastructures: RIs and research tools for certain research disciplines, nationally, regionally to pool expertise and lower costs.
  8. 8. Vision and Mission of an African Open Science Platform  African scientists are at the cutting edge of contemporary, data-intensive science as a fundamental resource for a modern society.  A digital ecosystem with five complementary aims governed by a set of common principles and practices: 1. A virtual space for scientists to find, deposit, manage, share and reuse data, software and metadata; 2. A means of continually developing capacities at all levels of national science systems and amongst professionals and their institutions operating in the public and private domain; 3. A basis for multi-stakeholder consortia that wish to utilise powerful digital tools in addressing major common problems, and for work in the trans-disciplinary mode; 4. A forum for exchange of ideas, best practices and opportunities amongst Platform partners and with the international data-science community. 5. An African Data Science Institute, to advance the frontiers of data science and provide support for interdisciplinary research domains where there are particularly strong data assets in Africa.
  9. 9. African Open Science Platform: Suggested Phase Two Activities 1. Registry of African data initiatives, collections and services 2. Coordination and provision of network, compute and storage (building on current work of NRENs, targeting needs of Open Science, achieving economies of scale). 3. A virtual space for scientists to find, deposit, manage, share and reuse data, software and metadata (i.e. support for / or provision of FAIR data components, data stewardship and Research Infrastructures). 4. An African Data Science Institute (to develop African capacities at the international cutting edge of research in data analytics, artificial intelligence, machine learning and data stewardship). 5. Major data-intensive programmes in science areas where Africa is data-asset rich (process for identifying these areas, obtaining funding, ensuring that RIs are in place). 6. Network for Education and Skills in Data and Information (training programmes in data science, data stewardship, data literacy, targeted at all stages of education). 7. Network for Open Science Access and Dialogue (building full engagement and joint action in transdisciplinary and citizen science initiatives as an essential component of Open Science).
  10. 10. Emerging Policy Consensus? FAIR Data • FAIR Data (see original guiding principles at https://www.force11.org/node/6062) • Findable: have sufficiently rich metadata and a unique and persistent identifier. • Accessible: retrievable by humans and machines through a standard protocol; open and free by default; authentication and authorization where necessary. • Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’. • Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance.
  11. 11. European Commission Expert Group on FAIR Data Core Deliverables 1. To develop recommendations on what needs to be done to turn each component of the FAIR data principles into reality 2. To propose indicators to measure progress on each of the FAIR components 3. Actively support the creation of the FAIR Data Action Plan, by proposing a list of concrete actions as part of its Final Report 4. Draft for consultation, released 11 June 2018, final report October 2018. 5. Support Commission in presentation of FAIR Data Action Plan in Autumn 2018. Report Structure 1. Concepts: Why FAIR? 2. Creating a culture of FAIR data 3. Making FAIR data a reality: technical perspective 4. Skills and capacities for FAIR data 5. Measuring Change 6. Facilitating Change: a FAIR Data Action Plan
  12. 12. FAIR Guiding Principles (1) • To be Findable: • F1. (meta)data are assigned a globally unique and persistent identifier • F2. data are described with rich metadata (defined by R1 below) • F3. metadata clearly and explicitly include the identifier of the data it describes • F4. (meta)data are registered or indexed in a searchable resource • To be Accessible: • A1. (meta)data are retrievable by their identifier using a standardized communications protocol • A1.1 the protocol is open, free, and universally implementable • A1.2 the protocol allows for an authentication and authorization procedure, where necessary • A2. metadata are accessible, even when the data are no longer available (Mons, B., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
  13. 13. FAIR Guiding Principles (2) • To be Interoperable: • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. • I2. (meta)data use vocabularies that follow FAIR principles • I3. (meta)data include qualified references to other (meta)data • To be Reusable: • R1. (meta)data are richly described with a plurality of accurate and relevant attributes • R1.1. (meta)data are released with a clear and accessible data usage license • R1.2. (meta)data are associated with detailed provenance • R1.3. (meta)data meet domain-relevant community standards (Mons, B., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
  14. 14. International ‘ecosystem’ of open science and FAIR data components • Open Science infrastructure is not just the network, storage and compute. • Ecosystem of components which are created and governed internationally. • Reporting Research Outputs: information systems for research output reporting (CRIS), metadata standards e.g. CERIF, managed by euroCRIS. • Persistent and Unique Identifiers: DOIs for articles (CrossRef); DOIs for data sets (DataCite); author IDs (ORCID). • Data and Metadata Standards: CIF in crystallography, FITS in astronomy, DDI in social science surveys, Darwin Core in biodiversity, etc. • DCC Registry of Metadata Standards http://www.dcc.ac.uk/resources/metadata-standards ; now maintained by RDA IG http://rd-alliance.github.io/metadata-directory/ • Data Repositories: listed in Re3Data, registry of data repositories: https://www.re3data.org/ • Trusted Data Repositories: Core Trust Seal https://www.coretrustseal.org/, a merger of Data Seal of Approval and the World Data System criteria. • Criteria for Trustworthy Digital Archives (DIN 31644) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=3 • Audit and certification of trustworthy digital repositories (ISO 16363) http://www.data-archive.ac.uk/curate/trusted-digital-repositories/standards-of-trust?index=2
  15. 15. Components of a FAIR ecosystem
  16. 16. Supporting the Research Data Lifecycle [diagram; stage labels: Plan, Create, Use, Appraise, Publish, Find, Reuse, Store, Annotate, Select, Discard, Describe, Identify, Hand Over?, Access]
  17. 17. RDM lifecycle diagram for maturity assessment, DCC 2018, based on Hodson and Molloy 2013 • Full lifecycle data infrastructures:  Preparation of DMPs  Management of active data  Appraisal and selection  Stewardship and preservation  Ensuring the Data is FAIR (discovery metadata, identifier, access mechanisms and controls, usage license, domain and provenance metadata…) Open Science and FAIR Data Services
  18. 18. Where should research data go? • Homogeneous data collections essential for research (Earth observation data; genetic data; social science survey data…): national and international data archives. • Significant data outputs of publicly funded research (significant data outputs from funded projects; raw and analysed experimental data…): national or institutional data archives; data papers. • Data underpinning research publications (raw and analysed data for reproducibility, as evidence; data behind the graph…): dedicated data archives (e.g. Dryad).
  19. 19. Open Science, FAIR Data: Commons, Clouds, Platforms…  Commons: ‘collectively owned and managed by a community of users’  Clouds: European Open Science Cloud (not just European, not entirely Open, not just for science and not exclusively cloud technology)…  Platform Approaches:  brokerage for discovery and access, reinforced by the development of common standards and principles or policies (e.g. GEOSS, Research Data Australia);  brokerage of services: approaches for discovery and access, augmented by the provision of services for particular research disciplines, including the promotion of skills, training, competences, standards, tools for analysis etc (e.g. Elixir, CESSDA and other ESFRIs, CGIAR on a global scale);  platform environment: utilizing the capacity of Cloud Computing for efficiency, access management, analysis across vast numbers of datasets, marketisation of services in a platform economy in which standards and common rules minimize vendor lock-in (e.g. NIH Data Commons, European Open Science Cloud).
  20. 20. EOSC Declaration  [EOSC architecture] The EOSC will be developed as a data infrastructure commons serving the needs of scientists. It should provide both common functions and localised services delegated to community level. Indeed, the EOSC will federate existing resources across national data centres, European e-infrastructures and research infrastructures  [Service deployment] The EOSC shall support different deployment models (e.g. Infrastructure as a Service, Platform as a Service, Software as a Service), to meet the needs of communities at different levels of maturity in the provision and use of research data service. The EOSC shall support the whole research lifecycle by strong development at platform level that facilitate the provision of a wide set of software, infrastructure, protocols, methods, incentives, training, services.  [Thematic areas] The EOSC shall promote the co-ordination and progressive federation of open data infrastructures developed in specific thematic areas (e.g. health, environment, food, marine, social sciences, transport). The EOSC will implement a common reference scheme to ensure FAIR data uptake and compliance by national and European data providers in all disciplines.
  21. 21. EOSC Declaration • [FAIR principles] Implementation of the FAIR principles must be pragmatic and technology-neutral, encompassing all four dimensions: findability, accessibility, interoperability and reusability. FAIR principles are neither standards nor practices. The disciplinary sectors must develop their specific notions of FAIR data in a coordinated fashion and determine the desired level of FAIR-ness. FAIR principles should apply not only to research data but also to data-related algorithms, tools, workflows, protocols, services and other kinds of digital research objects. • [Research data repositories] Trusted research data repositories play a fundamental role in modern science. Scientists must be able to find, re-use, deposit and share data via trusted data repositories that implement FAIR data principles and that ensure long-term sustainability of research data across all disciplines. • [Data Management Plans] A key element of good data management is a Data Management Plan (DMP); the use of DMPs should become obligatory in all research projects generating or collecting publicly funded research data, based on online tools conforming to common methodologies. Funder and institutional requirements must be aligned and minimum conditions for DMPs must be defined. Researchers' host institutions have a responsibility to oversee and complete the DMPs and hand them over to data repositories.
  22. 22. EOSC Declaration  [Citation system] A data citation system should be put in place to reward the provision of excellent open data. This will assist both the assessment of researchers and their projects, and help implementing the findability, accessibility, interoperability and reusability of research data.  [Common catalogues] There must be catalogues (e.g. for datasets, services, standards) based on machine readable metadata and identifiable by means of a common and persistent identification mechanism that will make research data findable via an 'EOSC Portal'.  [Semantic layer] Research data must be both syntactically and semantically understandable, allowing meaningful data exchange and reuse among scientific disciplines and countries.  [FAIR tools and services] Easy access must be available to a common set of FAIR tools and services, to guide the curation of FAIR data for re-use and to assess FAIR compliance.
  23. 23. INTERNATIONAL DATA WEEK IDW 2018 Gaborone, Botswana: 5-8 November 2018 Information: http://internationaldataweek.org/ Deadline for abstracts, 31 May: https://www.scidatacon.org/IDW2018/
  24. 24. CODATA-RDA School of Research Data Science • Annual foundational school at ICTP, Trieste (with the objective to build a network of partners, train-the-trainers). • Advanced workshops, ICTP, Trieste, following the foundational school. • National or regional schools, organised with local partners. 2018 • Next #DataTrieste Summer School, 6-17 August 2018. • Next #DataTrieste Advanced Workshops 20-24 August 2018. • Call for applications, deadline 21 May: http://www.codata.org/datatrieste2018 • Schools in Brisbane (UQ and Australian Academy of Sciences); ICTP Kigali (October); ICTP São Paulo (December)
  25. 25. Simon Hodson Executive Director CODATA www.codata.org http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org Email: simon@codata.org Twitter: @simonhodson99 Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59 CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris. Thank you for your attention!
  26. 26. RDM lifecycle diagram for maturity assessment, DCC 2018, based on Hodson and Molloy 2013
  27. 27. CODATA Prospectus: https://doi.org/10.5281/zenodo.1167846 Principles, Policies and Practice Capacity Building Frontiers of Data Science Data Science Journal CODATA 2017, Saint Petersburg 8-13 Oct 2017
  28. 28. SciDataCon part of International Data Week  SciDataCon aims to help this community ensure that it has a concrete scientific record of its work: peer reviewed abstracts > presentations > Special Collection in the Data Science Journal.  Themes and Scope: see https://www.scidatacon.org/conference/IDW2018/conference_themes_and_scope/  Approved Sessions: https://www.scidatacon.org/conference/IDW2018/approved_sessions/  Incredibly rich range of topics. If you do not find a topic there you can submit an abstract to the general submissions.  Abstracts can be submitted to Approved Sessions or to General Submissions. Will be peer reviewed and distributed into the programme.  Abstracts for presentations and lightning talks/posters.  Deadline is 31 May: https://www.scidatacon.org/conference/IDW2018/call_for_papers/
  29. 29. International Data Week Keynotes • Joy Phumaphi, former Minister of Health, Botswana; co-chair of WHO Group on Family and Community Health. • Rob Adam, Director of SKA South Africa, a major African science and data initiative. • Ismail Serageldin, founding Director of the new Bibliotheca Alexandrina, noted thinker on science policy issues. • Elizabeth Marincola, former CEO of PLOS; now leading the African Academy of Sciences publication initiatives (see AAS Open Research). • Tshilidzi Marwala, VC of University of Johannesburg, noted thinker in Big Data and AI.
  30. 30. What is Open Science? (1) • Open access to research literature. • Data that is as Open as possible, as closed as necessary. • FAIR Data (Findable, Accessible, Interoperable, Reusable). • Data is a recognised and important output of research. • A culture and methodology of open discussion and enquiry (including methodology, lab notebooks, pre-prints). • Data, code and analysis processes are shared for reproducibility. • Engagement with society and the economy in research activities (citizen science, co-design / transdisciplinary research, interface between research, development and innovation).
  31. 31. What is Open Science? (2) • Open Science is not just Open Access + Open Data. • Individuals, institutions and the science system benefit from putting research outputs (including data) in the open: shop window and repository of all research outputs. • Important role of open processes, open data and reproducibility / replicability. • Role of AI / Machine Learning: analysis at scale. • Open innovation and transdisciplinary research. • The Open Science ethos and co-design help build collaboration between research institutions, societal groups, government agencies, third sector and industry.
  32. 32. CODATA-RDA School of Research Data Science • Contemporary research – particularly when addressing the most significant, interdisciplinary research challenges – increasingly depends on a range of skills relating to data. • These skills include the principles and practice of Open Science; research data management and curation, how to prepare a data management plan and to annotate data; software and data carpentry; principles and practices of visualisation; data analysis, statistics and machine learning; use of computational infrastructures. The ensemble of these skills, relating to data in research, can usefully be called ‘Research Data Science’.
  33. 33. DataTrieste Film on Vimeo: https://vimeo.com/232209813 Call for applications, deadline 21 May: http://www.codata.org/datatrieste2018
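
A rough illustration of how the FAIR Guiding Principles listed on slides 12 and 13 might be turned into an automated check on a single metadata record. This is a minimal sketch, not an established FAIR assessment tool: the field names (`identifier`, `description`, `keywords`, `access_url`, `license`, `provenance`), the helper functions and the example record are all hypothetical, and only a handful of sub-principles are covered, each by a deliberately simple test.

```python
import re

# Syntactic DOI pattern: "10.", a 4-9 digit registrant prefix, "/", then a suffix.
# This checks the form of the identifier only, not whether it resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def fair_checklist(record):
    """Map a few FAIR sub-principles to True/False for one metadata record.

    All field names used here are hypothetical, chosen for illustration.
    """
    identifier = str(record.get("identifier") or "")
    access_url = str(record.get("access_url") or "")
    return {
        # F1: a globally unique and persistent identifier (here: DOI syntax).
        "F1 persistent identifier": bool(DOI_PATTERN.match(identifier)),
        # F2: rich metadata, approximated as a description plus keywords.
        "F2 rich metadata": bool(record.get("description")) and bool(record.get("keywords")),
        # A1: retrievable over a standardized, open protocol (here: HTTP/HTTPS).
        "A1 standard protocol": access_url.startswith(("http://", "https://")),
        # R1.1: a clear data usage license is stated.
        "R1.1 usage licence": bool(record.get("license")),
        # R1.2: provenance information is attached.
        "R1.2 provenance": bool(record.get("provenance")),
    }


def missing_principles(record):
    """List the checks that a record fails, in checklist order."""
    return [name for name, ok in fair_checklist(record).items() if not ok]


# Hypothetical record: the DOI is the one cited on the slides; the licence
# and provenance fields are left empty on purpose to show failing checks.
example = {
    "identifier": "10.1038/sdata.2016.18",
    "description": "The FAIR Guiding Principles for scientific data management and stewardship",
    "keywords": ["FAIR", "data stewardship"],
    "access_url": "https://doi.org/10.1038/sdata.2016.18",
    "license": "",
    "provenance": "",
}

print(missing_principles(example))  # prints ['R1.1 usage licence', 'R1.2 provenance']
```

A check of this kind can only flag missing fields. Whether metadata are genuinely "rich" or meet domain-relevant community standards (R1.3) still needs judgement from the relevant discipline, which matches the EOSC Declaration's point that disciplinary sectors must define their own notions of FAIR data.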
