CODATA: Open Data, FAIR Data and Open Science/Simon Hodson
Simon Hodson, Executive Director, CODATA
African Open Science Platform Project and Research Data Alliance Workshop
Association of African Universities Conference
Palm Royal Beach Hotel, Accra, Ghana
7 June 2017
Mobilising the Data Revolution
Three priority areas essential to a coordinated
international response to the data revolution.
Promoting implementation of open data
principles, policies and practices;
Advancing the frontiers of data science and
its adaptation to scientific research;
Building capacity by improving data skills and the
functions of science systems needed to support
open data (particularly in LMICs)
CODATA General Assembly in New Delhi:
presented the first vigorous response to the
ICSU Review of CODATA.
New President and Executive Committee
undertook to refocus the CODATA strategy.
The new Committee elected in Denver includes
members from Kenya and South Africa and will
co-opt a member from Latin America.
Principles, Policies and Practice
Frontiers of Data Science
Data Science Journal
CODATA 2017, Saint Petersburg, 8-13 Oct 2017
Research Data: challenges and
Challenges and solutions for data issues
relate to the conduct of science in
national settings and international
CODATA’s membership helps us to
address data issues on these two axes.
Science or Data
Role of CODATA
Join CODATA and form a National Committee.
CODATA membership dues are aligned with GDP.
CODATA National Committees are composed of national
stakeholders and data experts.
What are the benefits of having a CODATA National Committee?
Engage: point of contact with CODATA;
Influence: contribute to CODATA strategy;
Coordinate: forum by which national stakeholders may
advance data agenda in step with international
Collaborate: propose Task Groups, host or participate in
international workshop series, engage with Early Career
Data Professionals Group;
Partner: undertake activities with other National
Committees, bilaterally or in groups.
Data Standards for Science
Major transdisciplinary research issues depend on the integration of
data and information from different sources.
Fundamental importance of agreed vocabularies and standards.
Importance of integration of social science, geospatial and other data.
Essential to an effective interface between science and monitoring (e.g.
LOD for Disaster Research, Nanomaterials Uniform Description System).
Huge opportunities but significant challenges.
The ICSU and ISSC, any merged Council, and international scientific
unions could have a major role to play to encourage and accelerate
‘Inter-Union Workshop on 21st Century Scientific and Technical Data:
Developing a roadmap for data integration’, Paris, 19-20 June.
Larger follow-up workshop later in the year.
Vision of a decadal initiative to advance science through integration
of data and information.
From September 2015 to September 2016, the annual income from membership fees
of c.€205K leveraged further investment in activities to a total of over €1.9M: a
leverage ratio of over 9.6:1.
This estimate includes external contributions to events, Task Groups and similar
activities, sponsorship obtained as well as host and participant investment in events.
As a specific example, in August 2016, the CODATA-RDA School of Research Data
Science was held at ICTP in Trieste. CODATA’s own investment in the event totals
c.€10K in travel and student support. The event as a whole leveraged an additional
c.€270,000 in support, comprising international and local travel and accommodation
for experts and students as well as sponsorship and local expenses.
It should be noted that this estimate considerably undervalues CODATA’s
leveraging power, as it does not include any estimate for contributions in kind (e.g.
Over the next two years: concerted outreach to expand membership and to engage more with
Africa, Data and Open Science
The 21st century is the century of data.
Data skills and infrastructure will be essential for economic
advancement and for sustainable development.
We need to create a ‘world that counts’ that gathers data and uses
data to understand itself.
Open data is essential to increase the impact of research and translation
African governments, research and education systems and
universities have an interest in developing data skills and
African universities have an essential role to play as educators of a
data savvy generation and as the stewards of the data created by
The data from many research projects conducted in Africa is not
looked after in African institutions.
African institutions need to present their research outputs,
including data, as a shop window and a record of their activities,
What is Open Science:
Open access to research literature.
Data that is as Open as possible, as closed as necessary.
FAIR Data (Findable, Accessible, Interoperable, Reusable).
A shop window and repository of all research outputs.
A culture and methodology of open discussion and enquiry
(including methodology, lab notebooks, pre-prints)
Research data is evidence: it is fundamental to the validity and
reproducibility of science.
Those research disciplines that have leapt forward in the past 15-20
years are those that have shared and analysed data at scale: genomics,
astronomy, disciplines using remote sensing data etc.
African research institutions have an opportunity to build their
reputation around research specialisation, and this requires data
specialisation and FAIR data collections.
The Case for Open Data
in a Big Data World
• Science International Accord on Open Data in a Big
Data World: http://www.science-international.org/
• Presents a powerful case that the profound
transformations mean that data should be:
• Open by default
• Intelligently open
• Supported by four major international science
• Lays out a framework of principles, responsibilities and
enabling practices for how the vision of Open Data in a
Big Data World can be achieved.
• Campaign for endorsements: over 100 organisations so
far. Please consider endorsing the Accord.
• Translations: Chinese, Russian, Polish, Spanish, French.
CODATA Data Policy Activities
New Data Policy Committee, chaired by Paul Uhlir,
an international expert in data policy and a member of
CODATA Executive Committee.
Current Best Practice for Research Data Management
The Value of Open Data Sharing, report for GEO
Legal Interoperability, Principles and Implementation
Simon Hodson is chairing the European
Commission’s Expert Group on FAIR Data:
OECD Global Science Forum and CODATA Project on
Business Models for Sustainable Data Repositories:
CODATA-RDA School of Research Data Science
• Contemporary research – particularly
when addressing the most significant,
transdisciplinary research challenges –
increasingly depends on a range of skills
relating to data. These skills include the
principles and practice of Open Science
and research data management and
curation, the development of a range of
data platforms and infrastructures, the
techniques of large scale analysis,
statistics, visualisation and modelling
techniques, software development and
data annotation. The ensemble of
these skills, relating to data in research,
can usefully be called ‘Research Data Science’.
Seven components: open science; data management and curation; software carpentry; data carpentry;
data infrastructures; statistics and machine learning; visualisation.
Builds on many existing courses to create something more than the sum of its parts:
Open Science – reflection on ethos and requirements of sharing/openness
Open Research Data – Basics of data management, DMPs, RDM life-cycle, data publishing,
metadata and annotation
Author Carpentry – Improving research efficiency with command line and OS tools.
Software Carpentry – Introduction to the Unix shell and Git (sharing software and data)
Data Carpentry – Introduction to programming in R, and to SQL databases
Visualisation – Tools, Critical Analysis of Visualisation
Analysis – Statistics and Machine Learning (clustering, supervised and unsupervised learning)
Computational Infrastructures – Introduction to cloud computing, launching a Virtual Machine on
an IaaS cloud
Building international network of short courses http://bit.ly/first_data_school_trieste
Programme and materials: http://bit.ly/School_of_Research_Data_Science-Programme
CODATA Data Science
CODATA-RDA ‘Foundational’ School of Research Data
Science, ICTP, Trieste, Italy, 10-21 July 2017
CODATA-RDA ‘Advanced’ Research Data Science Applied
Workshops, ICTP, Trieste, Italy, 24-28 July 2017
CODATA International Training Workshop in Open Data for
Better Science, Beijing 16-29 July 2017
CODATA International School of Research Data Science, São
Paulo, 4-15 December 2017: announcement coming soon
Advancing the Frontiers of Data Science
Re-launched Data Science Journal
New conference series:
CODATA Conference will happen every ‘odd’ year:
International Data Week, every ‘even’ year: 2018,
Commission on Data Standards.
Commission (Task Group) on Fundamental Constants.
Eight Task Groups (elected by General Assembly every
Eight Working Groups (approved by secretariat and
Executive Committee to address strategic issues).
CODATA Recommended Values of the Fundamental Physical
Constants, 2014: http://dx.doi.org/10.5281/zenodo.22826
CODATA WG on the Description of Nanomaterials:
Uniform Description System v.02, May 2016:
Future Nano Needs Project:
and data experts
in FP7 Future
Challenges in Data Science
TG LOD Global Disaster Risk
TG Earth-Space Science Data
edition of Atlas of
• Contribution to standards for
multidisciplinary GIS for geoscience data
• Increased focus on interoperability and
• International collaboration for conferences
and training activities (Moscow and Sochi,
July 2016; Peterhof, October 2017).
• White Paper: ‘Gap Analysis on Open Data
Interconnectivity for Global Disaster Risk
• Important response from the perspective of
science and data to the post-Sendai framework
• Inviting comments until 30 September.
TG Agricultural Data for
Knowledge and Innovation
TG Coordinating Data Standards
amongst Scientific Unions
• Encourage increased coordination and
collaboration on vocabularies and ontologies
across International Scientific Unions.
• Compile and distribute information about
ISU data and information standards.
• Develop a maturity model and good practice
• Identify opportunities for increased
technical and semantic coordination.
• Coordinating development of policy, more
effective application of standards and
• Initial focus on EAR (East African Region).
IRIDIUM International Glossary of
Research Data Terms
• Building on a glossary first developed by Research Data Canada.
• CODATA helping to internationalise the glossary.
• Adopting CASRAI processes for review and updating in systematic working group
and review circle cycles.
• Current Glossary: http://dictionary.casrai.org/Category:Research_Data_Domain
• Information about the 2017 process: http://bit.ly/IRIDIUM-RDM-Glossary
International Data Week 2016
• Jointly organised by CODATA, RDA and WDS: 12-16 September, Denver, Colorado, USA
1. Two-day research conference, SciDataCon 2016;
2. An international data forum focusing on policy discussion, intersections with open
public data and data science, data driven innovation;
3. RDA Plenary 8
• 640 participants for SciDataCon.
• 820 total registrations over International Data Week, plus a number of collocated events.
• Articles beginning to appear in Data Science Journal: http://datascience.codata.org/
• Abstracts and slides available from http://www.scidatacon.org/2016/programme/
• Call for EoIs for International Data Week 2018 closed at the end of March. Seven valid
proposals were received, three of them from Africa.
CODATA 2017: Global Challenges
and Data Driven Science
Major conference themes:
1. Achievements in Data Driven Science, in all
2. Earth Observations Data and the Earth’s System
3. Data and Disaster Risk Research
4. Data Driven and Sustainable Cities
5. Big Data in Scientific and Commercial Sectors
6. Data Analysis, Event Recognition and Applications
7. National and International Data Services
8. Research Data Services in Universities
9. Coordination of Data Standards and
10. FAIR Data and the Limits of Open Data
11. Metrology, Reference Data and Monitoring Data
Call for Sessions and Papers
• Deadline is 30 June 2017
• Participants may submit session proposals, with
proposed papers, or papers against conference themes.
• Participants are encouraged to submit papers for a special
collection in the Data Science Journal.
Executive Director CODATA
Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris,
Thank you for your attention!