CODATA: Open Data, FAIR Data and Open Science/Simon Hodson


Presentation during the 14th Association of African Universities (AAU) Conference and African Open Science Platform (AOSP)/Research Data Alliance (RDA) Workshop in Accra, Ghana, 7-8 June 2017.

  1. CODATA: Open Data, FAIR Data and Open Science. Simon Hodson, Executive Director, CODATA. African Open Science Platform Project and Research Data Alliance Workshop, Association of African Universities Conference, Palm Royal Beach Hotel, Accra, Ghana, 7 June 2017.
  2. CODATA Strategy: Mobilising the Data Revolution
     - Three priority areas essential to a coordinated international response to the data revolution:
       - Promoting implementation of open data principles, policies and practices;
       - Advancing the frontiers of data science and adaptation to scientific research;
       - Building capacity by improving data skills and the functions of science systems needed to support open data (particularly in LMICs).
     - CODATA General Assembly in New Delhi: presented the first vigorous response to the ICSU Review of CODATA.
     - New President and Executive Committee undertook to refocus the CODATA strategy.
     - New Committee elected in Denver, includes members from Kenya and South Africa, will co-opt a member from Latin America.
  3. CODATA Prospectus:
     - Principles, Policies and Practice
     - Capacity Building
     - Frontiers of Data Science
     - Data Science Journal
     - CODATA 2017, Saint Petersburg, 8-13 Oct 2017
  4. Research Data: challenges and stakeholders
     - Challenges and solutions for data issues relate to the conduct of science in national settings and international research disciplines.
     - CODATA’s membership helps us to address data issues on these two axes:
       - National Research Systems: CODATA National Members (National Academies of Science or Data Organisations)
       - Scientific Disciplines: CODATA International Scientific Union Members
  5. Role of CODATA National Committees
     - Join CODATA and form a National Committee.
     - CODATA membership dues are aligned with GDP.
     - CODATA National Committees are composed of national stakeholders and data experts.
     - What are the benefits of having a CODATA National Committee?
       - Engage: point of contact with CODATA;
       - Influence: contribute to CODATA strategy;
       - Coordinate: forum by which national stakeholders may advance the data agenda in step with international developments;
       - Collaborate: propose Task Groups, host or participate in international workshop series, engage with Early Career Data Professionals Group;
       - Partner: undertake activities with other National Committees, bilaterally or in groups.
  6. Commission on Data Standards for Science
     - Major transdisciplinary research issues depend on the integration of data and information from different sources.
     - Fundamental importance of agreed vocabularies and standards.
     - Importance of integration of social science, geospatial and other data.
     - Essential to effective interface of science and monitoring (e.g. Sendai, SDGs).
     - LOD for Disaster Research, Nanomaterials Uniform Description System.
     - Huge opportunities but significant challenges.
     - The ICSU and ISSC, any merged Council, and international scientific unions could have a major role to play to encourage and accelerate these developments.
     - ‘Inter-Union Workshop on 21st Century Scientific and Technical Data: Developing a roadmap for data integration’, Paris, 19-20 June.
     - Larger follow-up workshop later in the year.
     - Vision of a decadal initiative to advance science through integration of data and information.
  7. CODATA Leverage
     - From September 2015 to September 2016, the annual income from membership fees of c.€205K leveraged further investment in activities to a total of over €1.9M: a leverage ratio of over 9.6:1.
     - This estimate includes external contributions to events, Task Groups and similar activities, sponsorship obtained, as well as host and participant investment in events.
     - As a specific example, in August 2016, the CODATA-RDA School of Research Data Science was held at ICTP in Trieste. CODATA’s own investment in the event totalled c.€10K in travel and student support. The event as a whole leveraged an additional c.€270,000 in support, comprising international and local travel and accommodation for experts and students as well as sponsorship and local expenses.
     - It should be noted that this estimate considerably undervalues CODATA’s leveraging power as it does not include any estimate for contributions in kind (e.g. co-chairs’ time).
     - Over the next two years: concerted outreach to expand membership and to engage more with National Committees.
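The leverage figures on this slide are simple ratios of leveraged investment to CODATA's own income or spend. A minimal sketch in Python, using only the rounded amounts quoted on the slide; the function name `leverage_ratio` is illustrative and does not come from the presentation. With the rounded inputs the annual ratio comes out near 9.3:1, so the quoted "over 9.6:1" presumably rests on an unrounded total closer to €1.97M (an inference, not a figure stated on the slide).

```python
def leverage_ratio(leveraged: float, own: float) -> float:
    """Return how many euros of external investment each euro of own funds attracted."""
    if own <= 0:
        raise ValueError("own investment must be positive")
    return leveraged / own


# Rounded figures quoted on the slide (euros).
annual = leverage_ratio(1_900_000, 205_000)   # membership income, Sep 2015 to Sep 2016
school = leverage_ratio(270_000, 10_000)      # Trieste school, August 2016

# Total that a ratio of exactly 9.6:1 would imply for c.€205K of income.
implied_total = 9.6 * 205_000

print(f"annual ratio (rounded inputs): {annual:.2f}:1")
print(f"Trieste school ratio: {school:.0f}:1")
print(f"total implied by 9.6:1: about EUR {implied_total:,.0f}")
```

The Trieste example works out to 27:1 on the quoted figures, which illustrates why a single well-supported event can far exceed the annual average.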
  8. Africa, Data and Open Science
     - The 21st century is the century of data.
     - Data skills and infrastructure will be essential for economic advancement and for sustainable development.
     - We need to create a ‘world that counts’ that gathers data and uses data to understand itself.
     - Open data is essential to increase impact of research and translation for practitioners.
     - African governments, research and education systems and universities have an interest in developing data skills and infrastructure.
     - African universities have an essential role to play as educators of a data savvy generation and as the stewards of the data created by African research.
     - The data from many research projects conducted in Africa is not looked after in African institutions.
     - African institutions need to present their research outputs, including data, as a shop window and a record of their activities, achievements and impact.
  9. Open Science
     - What is Open Science:
       - Open access to research literature.
       - Data that is as open as possible, as closed as necessary.
       - FAIR Data (Findable, Accessible, Interoperable, Reusable).
       - A shop window and repository of all research outputs.
       - A culture and methodology of open discussion and enquiry (including methodology, lab notebooks, pre-prints).
     - Research data is evidence: it is fundamental to the validity and reproducibility of science.
     - Those research disciplines that have leapt forward in the past 15-20 years are those that have shared and analysed data at scale: genomics, astronomy, disciplines using remote sensing data etc.
     - African research institutions have an opportunity to build their reputation around research specialisation, and this requires data specialisation and FAIR data collections.
  10. The Case for Open Data in a Big Data World
     - Science International Accord on Open Data in a Big Data World:
     - Presents a powerful case that the profound transformations mean that data should be:
       - Open by default
       - Intelligently open
     - Supported by four major international science organisations.
     - Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved.
     - Campaign for endorsements: over 100 organisations so far. Please consider endorsing the Accord.
     - Translations: Chinese, Russian, Polish, Spanish, French.
  11. CODATA Data Policy Activities
     - New Data Policy Committee, chaired by Paul Uhlir, international expert in Data Policies and member of CODATA Executive Committee.
     - Current Best Practice for Research Data Management Policies
     - The Value of Open Data Sharing, report for GEO
     - Legal Interoperability, Principles and Implementation Guidelines
     - FAIR Data
       - Simon Hodson is chairing the European Commission’s Expert Group on FAIR Data:
     - OECD Global Science Forum and CODATA Project on Business Models for Sustainable Data Repositories: sustainable-business-models
  12. CODATA-RDA School of Research Data Science
     - Contemporary research, particularly when addressing the most significant, transdisciplinary research challenges, increasingly depends on a range of skills relating to data. These skills include the principles and practice of Open Science and research data management and curation, the development of a range of data platforms and infrastructures, the techniques of large scale analysis, statistics, visualisation and modelling techniques, software development and data annotation. The ensemble of these skills, relating to data in research, can usefully be called ‘Research Data Science’.
  13. Foundational Curriculum
     - Seven components: open science, data management and curation; software carpentry; data carpentry; data infrastructures; statistics and machine learning; visualisation.
     - Builds on many existing courses to create something more than the sum of its parts:
       - Open Science: reflection on ethos and requirements of sharing/openness
       - Open Research Data: basics of data management, DMPs, RDM life-cycle, data publishing, metadata and annotation
       - Author Carpentry: improving research efficiency with command line and OS tools
       - Software Carpentry: introduction to the Unix shell and Git (sharing software and data)
       - Data Carpentry: introduction to programming in R, and to SQL databases
       - Visualisation: tools, critical analysis of visualisation
       - Analysis: statistics and machine learning (clustering, supervised and unsupervised learning)
       - Computational Infrastructures: introduction to cloud computing, launching a Virtual Machine on an IaaS cloud
     - Building an international network of short courses.
     - Programme and materials: ;
  14. CODATA Data Science Training Opportunities
     - CODATA-RDA ‘Foundational’ School of Research Data Science, ICTP, Trieste, Italy, 10-21 July 2017
     - CODATA-RDA ‘Advanced’ Research Data Science Applied Workshops, ICTP, Trieste, Italy, 24-28 July 2017: RDA_Data_Science_Workshops_2017
     - CODATA International Training Workshop in Open Data for Better Science, Beijing, 16-29 July 2017
     - CODATA International School of Research Data Science, São Paulo, 4-15 December 2017: announcement coming soon
  15. Advancing the Frontiers of Data Science
     - Re-launched Data Science Journal
     - New conference series:
       - CODATA Conference will happen every ‘odd’ year: 2017, 2019…
       - International Data Week, every ‘even’ year: 2018, 2020…
     - Commission on Data Standards.
     - Commission (Task Group) on Fundamental Constants.
     - Eight Task Groups (elected by General Assembly every two years).
     - Eight Working Groups (approved by secretariat and Executive Committee to address strategic issues).
  16. CODATA Recommended Values of the Fundamental Physical Constants, 2014:
  17. CODATA WG on the Description of Nanomaterials
     - Uniform Description System v.02, May 2016:
     - Future Nano Needs Project:
     - Process:
       - Convene ISUs, international stakeholders and data experts
       - Form Working Group
       - Draft Framework for Description of Nanomaterials
       - Refine/validate in FP7 Future Nano Needs Project
  18. Challenges in Data Science
     - TG Earth-Space Science Data Interoperability
       - Preparing second edition of Atlas of the Earth’s Magnetic Field: magnetic-field
       - Contribution to standards for multidisciplinary GIS for geoscience data
       - Increased focus on interoperability and standardisation issues.
       - International collaboration for conferences and training activities (Moscow and Sochi, July 2016; Peterhof, October 2017).
     - TG LOD Global Disaster Risk Research
       - White Paper: ‘Gap Analysis on Open Data Interconnectivity for Global Disaster Risk Research’: LOD_Disaster_Gap_Analysis
       - Important response from the perspective of science and data to the post-Sendai framework
       - Inviting comments until 30 September.
  19. Challenges in Data Science
     - TG Coordinating Data Standards amongst Scientific Unions
       - Encourage increased coordination and collaboration on vocabularies and ontologies across International Scientific Unions.
       - Compile and distribute information about ISU data and information standards.
       - Develop a maturity model and good practice guidelines.
       - Identify opportunities for increased technical and semantic coordination.
     - TG Agricultural Data for Knowledge and Innovation
       - Coordinating development of policy, more effective application of standards and capacity building/training.
       - Initial focus on EAR (East African Region).
  20. IRIDIUM International Glossary of Research Data Terms
     - Building on a glossary first developed by Research Data Canada.
     - CODATA helping to internationalise the glossary.
     - Adopting CASRAI processes for review and updating in systematic working group and review circle cycles.
     - Current Glossary:
     - Information about the 2017 process:
  21. International Data Week 2016, SciDataCon 2016
     - Jointly organised by CODATA, RDA and WDS: 12-16 September, Denver, Colorado, USA
       1. Two-day research conference, SciDataCon 2016;
       2. An international data forum focusing on policy discussion, intersections with open public data and data science, data driven innovation;
       3. RDA Plenary 8
     - 640 participants for SciDataCon.
     - 820 total registrations over International Data Week, plus a number of collocated events.
     - Articles beginning to appear in Data Science Journal:
     - Abstracts and slides available from
     - Call for EoIs for International Data Week 2018 closed at the end of March. Seven valid proposals, of which three from Africa.
  22. CODATA 2017
  23. CODATA 2017: Global Challenges and Data Driven Science
     - Major conference themes:
       1. Achievements in Data Driven Science, in all research disciplines
       2. Earth Observations Data and the Earth’s System
       3. Data and Disaster Risk Research
       4. Data Driven and Sustainable Cities
       5. Big Data in Scientific and Commercial Sectors
       6. Data Analysis, Event Recognition and Applications
       7. National and International Data Services
       8. Research Data Services in Universities
       9. Coordination of Data Standards and Interoperability
       10. FAIR Data and the Limits of Open Data
       11. Metrology, Reference Data and Monitoring Data
     - Call for Sessions and Papers
       - Deadline is 30 June 2017
       - Participants may submit session proposals, with proposed papers, or papers against conference themes.
       - Participants are encouraged to submit papers for a special collection in the Data Science Journal.
  24. Simon Hodson, Executive Director, CODATA
     - Email:
     - Twitter: @simonhodson99
     - Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59
     - CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris
     - Thank you for your attention!