Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

A coordinated framework for open data open science in Botswana/Simon Hodson

110 views

Published on

Botswana 30-31 Oct 2017

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

A coordinated framework for open data open science in Botswana/Simon Hodson

  1. 1. A Coordinated Framework for Open Data Open Science in Botswana Simon Hodson, Executive Director, CODATA www.codata.org National Forum: Open Data Open Science University of Botswana Conference Centre Gaborone, Botswana 30 October 2017
  2. 2. FAIR Data • FAIR Data: increasingly widely adopted as summary of attributes that increase value of research data. • Chair of European Commission Expert Group on FAIR Data ‘Making FAIR Data a Reality’: http://bit.ly/FAIRdata-EG • One of European Experts in ZA-EU Open Science Dialogue. • Member of EOSC Advisory Board. • The value of data lies in reuse. What are the attributes that make data reusable? • Findable: have sufficiently rich metadata and a unique and persistent identifier. • Accessible: retrievable by humans and machines through a standard protocol; open and free by default; authentication and authorization where necessary. • Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language for knowledge representation’. • Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance. • FAIR is augmented by the principle that data should be ‘as open as possible, as closed as necessary’, or ‘open by default’.
  3. 3. FAIR Guiding Principles (1) • To be Findable: • F1. (meta)data are assigned a globally unique and persistent identifier • F2. data are described with rich metadata (defined by R1 below) • F3. metadata clearly and explicitly include the identifier of the data it describes • F4. (meta)data are registered or indexed in a searchable resource • To be Accessible: • A1. (meta)data are retrievable by their identifier using a standardized communications protocol • A1.1 the protocol is open, free, and universally implementable • A1.2 the protocol allows for an authentication and authorization procedure, where necessary • A2. metadata are accessible, even when the data are no longer available (Mons, B., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
  4. 4. FAIR Guiding Principles (2) • To be Interoperable: • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. • I2. (meta)data use vocabularies that follow FAIR principles • I3. (meta)data include qualified references to other (meta)data • To be Reusable: • R1. meta(data) are richly described with a plurality of accurate and relevant attributes • R1.1. (meta)data are released with a clear and accessible data usage license • R1.2. (meta)data are associated with detailed provenance • R1.3. (meta)data meet domain-relevant community standards (Mons, B., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, http://dx.doi.org/10.1038/sdata.2016.18)
  5. 5. The Case for Open Data in a Big Data World • Science International Accord on Open Data in a Big Data World: http://www.science-international.org/ • Supported by four major international science organisations. • Presents a powerful case that the profound transformations mean that data should be: • Open by default • Intelligently open, FAIR data • Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved. • Campaign for endorsements: over 150 organisations so far. • Please consider endorsing the Accord: http://www.science-international.org/#endorse
  6. 6. The Open Data Iceberg The Technical Challenge The Ecosystem Challenge The Funding Challenge The Support Challenge The Skills Challenge The Incentives Challenge The Mindset Challenge Processes & Organisation People Geoffrey Boulton (CODATA) - developed from an idea by Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing. A(n Inter)National Infrastructure Technology
  7. 7. Establish African Open Data Forum / Platform Funded Research Data Infrastructure Initiatives Funded, co-designed transdisciplinary research projects Co-design African Open Data Policies Develop Incentives Frameworks Develop Research Data Science Training African Research Data Infrastructure Roadmap Activities require low funding for coordination, secondment, contributions in kind and evaluation. Activities require higher investment for coordination, co-design implemenatation and evaluation. African Open Science Platform Pilot Project Workpackages
  8. 8. Framework for National and Institutional Data Strategies  National / Institutional Open Science and FAIR Data Strategy  Consultative forum, stakeholder engagement.  Open data policies and guidance at national and institutional level.  Clarify the boundaries of open (particularly privacy, IPR).  Develop incentives and reward systems.  Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output.  Data ‘publication’ and citations of data included in assessment of research contribution.
  9. 9. Framework for National and Institutional Data Strategies  Promotion of data skills:  Essential data skills for researchers.  Develop skills and competencies for data stewards, data scientists.  Scope, roadmap and implement data infrastructure.  Key components of national and regional infrastructure (network / NREN, economies of scale for storage and compute).  Development of institutional infrastructure for research collaboration and data stewardship/RDM.  Collaborative infrastructures for certain research disciplines, nationally, regionally to pool expertise and lower costs.  International infrastructure / data ecosystem components: permanent identifiers, metadata standards, trusted digital repositories.
  10. 10. Plan Create Use AppraisePublish Discover Reuse Store Annotate Select DiscardDescribe Identify Hand Over? Access Supporting the Research Data Lifecycle
  11. 11. Data Management Planning Managing Active Data Processes for selection and retention Deposit / Handover Data Repositories/ Catalogues Components of RDM support services RDM Policy and Roadmap Business Plan and Sustainability Guidance, Training and Support Research Data Registry / Infrastructure 11 Institutional Research Data Management Policies: http://www.dcc.ac.uk/r esources/policy-and- legal/institutional-data- policies/uk- institutional-data- policies
  12. 12. CODATA-RDA School of Research Data Science • Contemporary research – particularly when addressing the most significant, transdisciplinary research challenges – increasingly depends on a range of skills relating to data. These skills include the principles and practice of Open Science and research data management and curation, the development of a range of data platforms and infrastructures, the techniques of large scale analysis, statistics, visualisation and modelling techniques, software development and data annotation. The ensemble of these skills, relating to data in research, can usefully be called ‘Research Data Science’.
  13. 13. Foundational Curriculum Seven components: open science, data management and curation; software carpentry; data carpentry; data infrastructures; statistics and machine learning; visualisation. Builds on much existing courses to create something more than the sum of its parts:  Open Science – reflection on ethos and requirements of sharing/openness  Open Research Data – Basics of data management, DMPs, RDM life-cycle, data publishing, metadata and annotation  Author Carpentry – Improving research efficiency with command line and OS tools.  Software Carpentry – Introduction the Unix shell and Git (sharing software and data)  Data Carpentry – Introduction to programming in R, and to SQL databases  Visualisation – Tools, Critical Analysis of Visualisation  Analysis – Statistics and Machine Learning (clustering, supervised and unsupervised learning)  Computational Infrastructures – Introduction to cloud computing, launching a Virtual Machine on an IaaS cloud Building international network of short courses http://bit.ly/first_data_school_trieste Programme and materials: http://bit.ly/School_of_Research_Data_Science-Programme ; http://bit.ly/first_data_school_materials
  14. 14. CODATA-RDA Data Science Training Initiative • Annual foundational school hosted at ICTP, Trieste (with the objective to build a network of partners, train-the- trainers). • Advanced workshops, ICTP, Trieste, following the foundational school. • National or regional schools, organised with local partners. • Planning at least two pilot schools as part of the African Open Science Platform project: ICTP, Kigale? Gaborone for IDW? • Next #DataTrieste Summer School, 6-17 August 2018. • Next #DataTrieste Advanced Workshops 20-24 August 2018. • Next regional foundational school ‘CODATA-RDA School of Research Data Science’, São Paulo, 4-15 December 2017: http://www.ictp-saifr.org/?page_id=15270
  15. 15. Data is difficult: benefits and challenges  Open and FAIR data is essential for transparency and reproducibility; to take advantage of analysis at scale; to tackle major interdisciplinary challenges that require integration of data from many resources; has significant economic and other societal benefits, including encouraging partnerships between research, government, innovation and development.  But…  Research funders and research performing institutions will have to invest in data infrastructure.  Essential to consider the cost of data stewardship and dissemination as part of the total cost of doing research.  Data description, definitions and ontologies, data management require significant effort.  Requires new data skills…  Requires a change in culture, new processes and activities…
  16. 16. Open Science and FAIR Data: Benefits for Stakeholders  Government and Innovation / Development  Increased impact from investment in activities relating to data; economic, innovation and research benefits.  Partnerships for research, development and innovation around co-design, Open Science and FAIR data.  Research Institutions:  Development of data capacity and data skills;  Not losing valuable data (stored on hard drives, not annotated or reusable);  Shop window of research activities and expertise (Open Access, Open Data / FAIR Data)  Capacity to build research schools around data assets and skills, attract international collaboration and investment.  Build case for ‘data sovereignty’, data (re-)patriation.  Researchers:  Increased data skills, expertise in FAIR data builds competitive edge.  Citation advantage of Open Access / Open Data.  Culture of certain research disciplines is already strongly in favour of Open Data / Open Science.
  17. 17. Simon Hodson Executive Director CODATA www.codata.org http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org Email: simon@codata.org Twitter: @simonhodson99 Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59 CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris, Thank you for your attention!
  18. 18. What data are in scope? Criteria for preservation, publication?  Typology of things that are in scope of research data policies:  Data underpinning research conclusions presented the literature must be published.  Significant data produced by research projects should be made available.  Major data collection/creation exercises with evident multiple uses (census, Hubble etc).  Unrepeatable observations, measurements in nature or society? (preserve and publish)  Data created by in vitro experiments that can be reproduced and for which the instruments are being improved (perhaps very limited reasons for preserving)  Data collected for a given purpose (traffic management, customer relations, ships logs) but which could be used for research…  Need for collaborative approach (with researchers) to clarifying the criteria to keeping, publishing and discarding data.
  19. 19. Commission on Data Standards for Science  Major transdisciplinary research issues depend on the integration of data and information from different sources.  Fundamental importance of agreed vocabularies and standards.  Fundamental to integration of social science, geospatial and other data  Essential to effective interface of science and monitoring (e.g. Sendai, SDGs, sustainable cities)  LOD for Disaster Research, Nanomaterials Uniform Description System  Huge opportunities but significant challenges.  The ICSU and ISSC, any merged Council, and international scientific unions could have a major role to play to encourage and accelerate these developments.
  20. 20. Commission on Data Standards for Science  ‘Inter-Union Workshop on 21st Century Scientific and Technical Data Developing a roadmap for data integration’, Paris, 19-20 June: http://bit.ly/codata_standards_workshop  Representatives of International Scientific Unions: IUCr for CIF; IUPAC for chemical terminologies; IUGS for GeoSciML; etc.  Representatives of Standards Organisations: e.g. Darwin Core, for biology, biodiversity; DDI for social science surveys; OGC for geospatial data; W3C for the web.  Position paper in development.  Directory of activities involving international scientific unions.  Maturity model for vocabularies and standards.  Case studies of applications of vocabularies and standards for transdiciplinary research.  Larger follow-up workshop 13-15 November, Royal Society, London.  Vision of a decadal initiative to advance science through integration of data and information.

×