A Coordinated Framework for Open Data
Open Science in Botswana
Simon Hodson, Executive Director, CODATA
www.codata.org
National Forum: Open Data Open Science
University of Botswana Conference Centre
Gaborone, Botswana
30 October 2017
FAIR Data
• FAIR Data: increasingly widely adopted as summary of attributes that increase value of research
data.
• Chair of European Commission Expert Group on FAIR Data ‘Making FAIR Data a Reality’:
http://bit.ly/FAIRdata-EG
• One of European Experts in ZA-EU Open Science Dialogue.
• Member of EOSC Advisory Board.
• The value of data lies in reuse. What are the attributes that make data reusable?
• Findable: have sufficiently rich metadata and a unique and persistent identifier.
• Accessible: retrievable by humans and machines through a standard protocol; open and free
by default; authentication and authorization where necessary.
• Interoperable: metadata use a ‘formal, accessible, shared, and broadly applicable language
for knowledge representation’.
• Reusable: metadata provide rich and accurate information; clear usage license; detailed
provenance.
• FAIR is augmented by the principle that data should be ‘as open as possible, as closed as
necessary’, or ‘open by default’.
FAIR Guiding Principles (1)
• To be Findable:
• F1. (meta)data are assigned a globally unique and persistent identifier
• F2. data are described with rich metadata (defined by R1 below)
• F3. metadata clearly and explicitly include the identifier of the data it describes
• F4. (meta)data are registered or indexed in a searchable resource
• To be Accessible:
• A1. (meta)data are retrievable by their identifier using a standardized
communications protocol
• A1.1 the protocol is open, free, and universally implementable
• A1.2 the protocol allows for an authentication and authorization procedure,
where necessary
• A2. metadata are accessible, even when the data are no longer available
(Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data,
http://dx.doi.org/10.1038/sdata.2016.18)
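As a rough illustration of the Findable and Accessible principles listed above, the Python sketch below checks whether a metadata record carries an entry for each of F1 to F4 and builds a resolvable URL for its identifier (A1). The record layout, the field names and the helper functions are assumptions made for this example and are not taken from the FAIR paper; the only real service referenced is the public DOI resolver at https://doi.org/. The presence of a field says nothing about its quality, so this is a minimal completeness check, not a FAIR assessment.

```python
from dataclasses import dataclass, field

# Public DOI resolver. HTTPS is an open, free and widely implemented protocol (A1.1).
DOI_RESOLVER = "https://doi.org/"


@dataclass
class MetadataRecord:
    """Hypothetical metadata record; all field names are illustrative assumptions."""
    identifier: str = ""                            # F1: persistent identifier of the record
    describes: str = ""                             # F3: identifier of the data described
    attributes: dict = field(default_factory=dict)  # F2: descriptive metadata
    indexed_in: list = field(default_factory=list)  # F4: catalogues or indexes


def resolver_url(doi: str) -> str:
    """A1: build the URL at which the identifier can be resolved over HTTPS."""
    return DOI_RESOLVER + doi.strip()


def unmet_findable(record: MetadataRecord) -> list:
    """List the Findable principles for which the record has no entry at all."""
    checks = [
        ("F1", record.identifier),
        ("F2", record.attributes),
        ("F3", record.describes),
        ("F4", record.indexed_in),
    ]
    return [code for code, value in checks if not value]


# Example: the DOI of the FAIR paper cited above, used purely as a sample identifier.
record = MetadataRecord(
    identifier="10.1038/sdata.2016.18",
    attributes={"title": "The FAIR Guiding Principles"},
)
print(resolver_url(record.identifier))
print(unmet_findable(record))
```

In this sample the record has an identifier and some descriptive metadata, so only F3 and F4 are reported as unmet.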
FAIR Guiding Principles (2)
• To be Interoperable:
• I1. (meta)data use a formal, accessible, shared, and broadly applicable language
for knowledge representation.
• I2. (meta)data use vocabularies that follow FAIR principles
• I3. (meta)data include qualified references to other (meta)data
• To be Reusable:
• R1. (meta)data are richly described with a plurality of accurate and relevant
attributes
• R1.1. (meta)data are released with a clear and accessible data usage license
• R1.2. (meta)data are associated with detailed provenance
• R1.3. (meta)data meet domain-relevant community standards
(Wilkinson, M. D., et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data,
http://dx.doi.org/10.1038/sdata.2016.18)
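The Interoperable and Reusable principles above can be sketched in the same spirit. The Python example below reports which sub-principles have no corresponding entry in a metadata dictionary. The key names, the `REQUIRED_FIELDS` mapping and the sample record are hypothetical choices for illustration; a real repository would follow its own metadata schema, and a non-empty field is only a necessary condition for meeting a principle.

```python
# Hypothetical mapping from metadata keys to the sub-principles they support.
REQUIRED_FIELDS = {
    "vocabularies": "I2",         # vocabularies that themselves follow FAIR principles
    "related_identifiers": "I3",  # qualified references to other (meta)data
    "license": "R1.1",            # clear and accessible usage license
    "provenance": "R1.2",         # how the data were produced and processed
    "community_standard": "R1.3", # domain-relevant standard the data follow
}


def reuse_gaps(metadata: dict) -> list:
    """Return the sub-principles whose field is missing or empty in the record."""
    return [code for key, code in REQUIRED_FIELDS.items() if not metadata.get(key)]


# Sample record (made up for this sketch).
sample = {
    "license": "CC-BY-4.0",
    "provenance": "Collected by field survey; cleaned with documented scripts.",
    "community_standard": "Darwin Core",
}
print(reuse_gaps(sample))
```

Here the sample record states a license, provenance and a community standard, so the function flags only I2 and I3.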
The Case for Open Data
in a Big Data World
• Science International Accord on Open Data in a Big Data
World: http://www.science-international.org/
• Supported by four major international science
organisations.
• Presents a powerful case that the profound
transformations mean that data should be:
• Open by default
• Intelligently open, FAIR data
• Lays out a framework of principles, responsibilities and
enabling practices for how the vision of Open Data in a
Big Data World can be achieved.
• Campaign for endorsements: over 150 organisations so
far.
• Please consider endorsing the Accord:
http://www.science-international.org/#endorse
The Open Data Iceberg
The Technical Challenge
The Ecosystem Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
Processes &
Organisation
People
Geoffrey Boulton (CODATA) - developed from an idea by Deetjen, U., E. T. Meyer and R. Schroeder
(2015). OECD Digital Economy Papers, No. 246, OECD Publishing.
A(n Inter)National Infrastructure
Technology
Establish African Open Data Forum / Platform
Funded Research Data Infrastructure Initiatives
Funded, co-designed transdisciplinary research
projects
Co-design African Open Data Policies
Develop Incentives Frameworks
Develop Research Data Science Training
African Research Data Infrastructure Roadmap
Activities require
low funding for
coordination,
secondment,
contributions in
kind and evaluation.
Activities require
higher investment
for coordination,
co-design,
implementation
and evaluation.
African Open Science Platform
Pilot Project Workpackages
Framework for National and Institutional
Data Strategies
• National / Institutional Open Science and FAIR Data
Strategy
• Consultative forum, stakeholder engagement.
• Open data policies and guidance at national and
institutional level.
• Clarify the boundaries of open (particularly privacy,
IPR).
• Develop incentives and reward systems.
• Mechanisms (infrastructure and policy) to ensure
concurrent publication of data as research output.
• Data ‘publication’ and citations of data included in
assessment of research contribution.
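For data citations to count in research assessment, datasets need to be cited in a consistent, machine-resolvable form. The Python sketch below formats a citation using the commonly used pattern Creator (Year). Title. Publisher. Identifier. The function name and the sample values, including the DOI, are made up for illustration and do not refer to a real dataset.

```python
def format_data_citation(creator: str, year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Format a dataset citation as: Creator (Year). Title. Publisher. Identifier."""
    return f"{creator} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# Hypothetical dataset; every value below is a placeholder.
citation = format_data_citation(
    creator="Example, A.",
    year=2017,
    title="Example rainfall observations",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(citation)
```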
Framework for National and Institutional
Data Strategies
• Promotion of data skills:
• Essential data skills for researchers.
• Develop skills and competencies for data stewards,
data scientists.
• Scope, roadmap and implement data infrastructure.
• Key components of national and regional
infrastructure (network / NREN, economies of scale
for storage and compute).
• Development of institutional infrastructure for
research collaboration and data stewardship/RDM.
• Collaborative infrastructures for certain research
disciplines, nationally, regionally to pool expertise and
lower costs.
• International infrastructure / data ecosystem
components: permanent identifiers, metadata
standards, trusted digital repositories.
[Figure: research data lifecycle diagram. Stages shown: Plan, Create, Use, Appraise, Publish, Discover, Reuse, Store, Annotate, Select, Discard, Describe, Identify, Hand Over?, Access]
Supporting the Research Data Lifecycle
Data Management
Planning
Managing Active
Data
Processes for
selection and
retention
Deposit / Handover
Data Repositories/
Catalogues
Components of RDM support services
RDM Policy and Roadmap
Business Plan and
Sustainability
Guidance, Training and Support
Research Data
Registry /
Infrastructure
Institutional Research Data Management Policies:
http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies/uk-institutional-data-policies
CODATA-RDA School of Research
Data Science
• Contemporary research – particularly
when addressing the most significant,
transdisciplinary research challenges –
increasingly depends on a range of skills
relating to data. These skills include the
principles and practice of Open Science
and research data management and
curation, the development of a range of
data platforms and infrastructures, the
techniques of large scale analysis,
statistics, visualisation and modelling
techniques, software development and
data annotation. The ensemble of
these skills, relating to data in research,
can usefully be called ‘Research Data
Science’.
Foundational Curriculum
Seven components: open science; data management and curation; software carpentry; data carpentry;
data infrastructures; statistics and machine learning; visualisation.
Builds on many existing courses to create something more than the sum of its parts:
• Open Science – reflection on ethos and requirements of sharing/openness
• Open Research Data – Basics of data management, DMPs, RDM life-cycle, data publishing,
metadata and annotation
• Author Carpentry – Improving research efficiency with command line and OS tools.
• Software Carpentry – Introduction to the Unix shell and Git (sharing software and data)
• Data Carpentry – Introduction to programming in R, and to SQL databases
• Visualisation – Tools, Critical Analysis of Visualisation
• Analysis – Statistics and Machine Learning (clustering, supervised and unsupervised learning)
• Computational Infrastructures – Introduction to cloud computing, launching a Virtual Machine on
an IaaS cloud
Building international network of short courses http://bit.ly/first_data_school_trieste
Programme and materials: http://bit.ly/School_of_Research_Data_Science-Programme ;
http://bit.ly/first_data_school_materials
CODATA-RDA Data Science
Training Initiative
• Annual foundational school hosted at ICTP, Trieste (with
the objective to build a network of partners, train-the-
trainers).
• Advanced workshops, ICTP, Trieste, following the
foundational school.
• National or regional schools, organised with local
partners.
• Planning at least two pilot schools as part of the African
Open Science Platform project: ICTP, Kigali? Gaborone
for IDW?
• Next #DataTrieste Summer School, 6-17 August 2018.
• Next #DataTrieste Advanced Workshops 20-24 August
2018.
• Next regional foundational school ‘CODATA-RDA School
of Research Data Science’, São Paulo, 4-15 December
2017: http://www.ictp-saifr.org/?page_id=15270
Data is difficult: benefits and challenges
• Open and FAIR data is essential for transparency and reproducibility; for taking advantage of
analysis at scale; and for tackling major interdisciplinary challenges that require integration of data
from many resources. It also has significant economic and other societal benefits, including
encouraging partnerships between research, government, innovation and development.
• But…
• Research funders and research performing institutions will have to invest in data
infrastructure.
• Essential to consider the cost of data stewardship and dissemination as part of the total cost
of doing research.
• Data description, definitions and ontologies, data management require significant effort.
• Requires new data skills…
• Requires a change in culture, new processes and activities…
Open Science and FAIR Data:
Benefits for Stakeholders
• Government and Innovation / Development:
• Increased impact from investment in activities relating to data; economic, innovation and research
benefits.
• Partnerships for research, development and innovation around co-design, Open Science and FAIR
data.
• Research Institutions:
• Development of data capacity and data skills.
• Not losing valuable data (stored on hard drives, not annotated or reusable).
• Shop window of research activities and expertise (Open Access, Open Data / FAIR Data).
• Capacity to build research schools around data assets and skills, attract international collaboration
and investment.
• Build case for ‘data sovereignty’, data (re-)patriation.
• Researchers:
• Increased data skills, expertise in FAIR data builds competitive edge.
• Citation advantage of Open Access / Open Data.
• Culture of certain research disciplines is already strongly in favour of Open Data / Open Science.
Simon Hodson
Executive Director CODATA
www.codata.org
http://lists.codata.org/mailman/listinfo/codata-international_lists.codata.org
Email: simon@codata.org
Twitter: @simonhodson99
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris,
Thank you for your attention!
What data are in scope?
Criteria for preservation, publication?
• Typology of things that are in scope of research data policies:
• Data underpinning research conclusions presented in the literature must be published.
• Significant data produced by research projects should be made available.
• Major data collection/creation exercises with evident multiple uses (census, Hubble etc).
• Unrepeatable observations, measurements in nature or society? (preserve and publish)
• Data created by in vitro experiments that can be reproduced and for which the instruments are
being improved (perhaps very limited reasons for preserving)
• Data collected for a given purpose (traffic management, customer relations, ships' logs) but
which could be used for research…
• Need for collaborative approach (with researchers) to clarifying the criteria for keeping,
publishing and discarding data.
Commission on
Data Standards for Science
• Major transdisciplinary research issues depend on the integration
of data and information from different sources.
• Fundamental importance of agreed vocabularies and standards.
• Fundamental to integration of social science, geospatial and
other data
• Essential to effective interface of science and monitoring (e.g.
Sendai, SDGs, sustainable cities)
• LOD for Disaster Research, Nanomaterials Uniform Description
System
• Huge opportunities but significant challenges.
• The ICSU and ISSC, any merged Council, and international scientific
unions could have a major role to play to encourage and accelerate
these developments.
Commission on
Data Standards for Science
• ‘Inter-Union Workshop on 21st Century Scientific and Technical Data: Developing a roadmap for data
integration’, Paris, 19-20 June: http://bit.ly/codata_standards_workshop
• Representatives of International Scientific Unions: IUCr for CIF; IUPAC for chemical terminologies; IUGS
for GeoSciML; etc.
• Representatives of Standards Organisations: e.g. Darwin Core, for biology, biodiversity; DDI for social
science surveys; OGC for geospatial data; W3C for the web.
• Position paper in development.
• Directory of activities involving international
scientific unions.
• Maturity model for vocabularies and standards.
• Case studies of applications of vocabularies and
standards for transdisciplinary research.
• Larger follow-up workshop 13-15 November, Royal
Society, London.
• Vision of a decadal initiative to advance science through
integration of data and information.


Editor's Notes

  • #3 It is not an exaggeration to say that there is an emerging policy consensus around FAIR. In an accessible way, FAIR summarises attributes of data which have been stressed in a number of policy documents, including the Royal Society report on Science as an Open Enterprise and its definition of ‘intelligent openness’.
  • #4 FAIR data implies responsibilities on PIs, their research groups and institutions. The responsibilities are intended to ensure that the data is as FAIR and above all as assessable and reusable as possible.
  • #5 FAIR data implies responsibilities on PIs, their research groups and institutions. The responsibilities are intended to ensure that the data is as FAIR and above all as assessable and reusable as possible.
  • #6 CODATA made a major contribution to the debate through the Science International Accord on Open Data in a Big Data World. Excellent IUCr position paper in response. Welcome endorsements from other organisations.
  • #7 To support this, we need to think about the challenge holistically. It is not restricted to the technical issues.
  • #8 We are experiencing a
  • #9 We are experiencing a
  • #10 We are experiencing a
  • #16 Data is difficult and the contribution of those who curate data needs to be recognised and rewarded. Data citation is a necessary but not sufficient part of this.
  • #17 Data is difficult and the contribution of those who curate data needs to be recognised and rewarded. Data citation is a necessary but not sufficient part of this.
  • #19 We are experiencing a