Data as a research output and a research asset: the case for Open Science/Simon Hodson


Presented during Uganda Open Data/Open Science National Dialogue 25-26 April 2018.

Published in: Data & Analytics

  1. 1. Data as a research output and a research asset: the case for Open Science Simon Hodson, Executive Director, CODATA National Dialogue on Mainstreaming Open Data Access and Use in Uganda Skyz Hotel, Naguru Kampala 25 April 2018
  2. 2. CODATA Prospectus: Principles, Policies and Practice Capacity Building Frontiers of Data Science Data Science Journal CODATA 2017, Saint Petersburg 8-13 Oct 2017
  3. 3. INTERNATIONAL DATA WEEK IDW 2018 Gaborone, Botswana: 5-8 November 2018 Information: Deadline for abstracts, 31 May:
  4. 4. Why Open Science / FAIR Data? • Good scientific practice depends on communicating the evidence. • Open research data are essential for reproducibility and self-correction. • Academic publishing has not kept up with the age of digital data. • Danger of a replication / evidence / credibility gap. • Boulton: to fail to communicate the data that supports scientific assertions is malpractice. • Open data practices have transformed certain areas of research. • Genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data… • FAIR data helps use of data at scale, by machines, harnessing technological potential. • Research data often have considerable potential for reuse, reinterpretation, use in different studies. • Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system. • Research data produced by publicly funded research are a public asset.
  5. 5. What is Open Science? (1) • Open access to research literature. • Data that is as Open as possible, as closed as necessary. • FAIR Data (Findable, Accessible, Interoperable, Reusable). • Data is a recognised and important output of research. • A culture and methodology of open discussion and enquiry (including methodology, lab notebooks, pre-prints). • Data, code and analysis processes are shared for reproducibility. • Engagement with society and the economy in research activities (citizen science, co-design / transdisciplinary research, interface between research, development and innovation).
  6. 6. What is Open Science? (2) • Open Science is not just Open Access + Open Data. • Individuals, institutions and the science system benefit from putting research outputs (including data) in the open: shop window and repository of all research outputs. • Important role of open processes, open data and reproducibility / replicability. • Role of AI / Machine Learning: analysis at scale. • Open innovation and transdisciplinary research. • The Open Science ethos and co-design help build collaboration between research institutions, societal groups, government agencies, third sector and industry.
  7. 7.
  8. 8. Developments: Journal Policies • Dryad Joint Data Archiving Policy, Feb 2010: • This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. • PLOS Data Availability Policy, revised Feb 2014: • PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exceptions. • Springer Nature initiative to standardise policies: • RDA Interest Group developing standardised journal data policies.
  9. 9. Developments: Donor Policies • Bill and Melinda Gates Foundation, Open Access and Open Data Policy • ‘Data Underlying Published Research Results Will Be Accessible and Open Immediately. The foundation will require that data underlying the published research results be immediately accessible and open. This too is subject to the transition period and a 12-month embargo may be applied.’ • MSF Data Sharing Policy: • ‘MSF recognizes the ethical imperative it has to share its data openly, transparently and in a timely manner for the greater public health good.’ • Appropriate restrictions for consent, privacy, etc. • European Commission Data Policy: ‘as open as possible, as closed as necessary’, FAIR Data • Wellcome Trust: strong support for Open Data sharing, with appropriate restrictions.
  10. 10. 80% of ecology data irretrievable after 20 years (516 studies) Vines TH et al. (2013) Current Biology DOI: 10.1016/j.cub.2013.11.014
  11. 11. Data Entropy: the Michener Cliff
  12. 12. Global Registry of Data Repositories Country coverage in (registry of data repositories) accessed Dec 2017
  13. 13. Data Seal of Approval Location of repositories having acquired Data Seal of Approval (accessed Dec 2017)
  14. 14. Slide Credit: Laura Merson, IDDO
  15. 15. Slide Credit: Laura Merson, IDDO
  16. 16. Good Data Management / Data Science Saves Time! Lowndes et al., 2017, Our path to better science in less time using open data science tools, Nature Ecology and Evolution, Ocean Health Index Project
  17. 17. Data Revolution: A World that Counts! • Creating a world that counts: Mobilising the Data Revolution for Sustainable Development. • To meet the new sustainability goals ‘there is an urgent need to mobilise the data revolution for all people and the whole planet in order to monitor progress, hold governments accountable and foster sustainable development.’ • Without immediate action, gaps between developed and developing countries, between information-rich and information-poor people, and between the private and public sectors will widen, and risks of harm and abuses of human rights will grow. • Data quality and integrity • Data disaggregation (no-one should be invisible) • Data timeliness • Data transparency and openness • Data usability and curation • Data protection and privacy • Data governance and independence • Data resources and capacity • Data rights
  18. 18. The Value of Open Data Sharing • Report by CODATA for GEO, the Group on Earth Observations. • Provides a concise, accessible, high level synthesis of key arguments and evidence of the benefits and value of open data sharing. • Particular, but not exclusive, reference to Earth Observation data. • Benefits in the areas of: • Economic Benefits • Social Welfare Benefits • Research and Innovation Opportunities • Education • Governance • Available at • GEO DSWG is building on this work with further examples: would be valuable to work with this community.
  19. 19. The Case for Open Data in a Big Data World • Science International Accord on Open Data in a Big Data World: • Supported by four major international science organisations. • Presents a powerful case that the profound transformations mean that data should be: • Open by default: as open as possible, as closed as necessary • Intelligently open, FAIR data • Lays out a framework of principles, responsibilities and enabling practices for how the vision of Open Data in a Big Data World can be achieved. • Campaign for endorsements: over 150 organisations so far. • Please consider endorsing the Accord:
  20. 20. Framework for Regional, National and Institutional Data Strategies • National / Institutional Open Science and FAIR Data Strategy • Consultative forum, stakeholder engagement. • Open data policies and guidance at national and institutional level. • Clarify the boundaries of open (particularly privacy, IPR). • Clarify the data in scope, guidelines on selection. • Develop incentives and reward systems. • Mechanisms (infrastructure and policy) to ensure concurrent publication of data as research output. • Data ‘publication’ and citations of data included in assessment of research contribution. • Promotion of data skills: • Essential data skills for researchers. • Develop skills and competencies for data stewards, data scientists.
  21. 21. Framework for Regional, National and Institutional Data Strategies • Scope, roadmap and implement data infrastructure. • Key components of national and regional infrastructure (network / NREN, economies of scale for storage and compute). • Development of regional, national and institutional infrastructure(s) for research collaboration and data stewardship/RDM, generic research platforms/environments, trusted digital repositories. • Collaborative infrastructures for certain research disciplines, nationally, regionally to pool expertise and lower costs. • International infrastructure / data ecosystem components: permanent identifiers, metadata standards.
  22. 22. Data is difficult: benefits and challenges • Open and FAIR data are essential for transparency and reproducibility; for taking advantage of analysis at scale; and for tackling major interdisciplinary challenges that require integration of data from many resources. They also bring significant economic and other societal benefits, including encouraging partnerships between research, government, innovation and development. • But… • Research funders and research performing institutions will have to invest in data infrastructure. • Essential to consider the cost of data stewardship and dissemination as part of the total cost of doing research. • Data description, definitions and ontologies, data management require significant effort. • Requires new data skills… • Requires a change in culture, new processes and activities…
  23. 23. Open Science and FAIR Data: Benefits for Stakeholders • Government and Innovation / Development • Increased impact from investment in activities relating to data; economic, innovation and research benefits. • Partnerships for research, development and innovation around co-design, Open Science and FAIR data. • Research Institutions: • Development of data capacity and data skills; • Not losing valuable data (stored on hard drives, not annotated or reusable); • Shop window of research activities and expertise (Open Access, Open Data / FAIR Data) • Capacity to build research schools around data assets and skills, attract international collaboration and investment. • Build case for ‘data sovereignty’, data (re-)patriation. • Researchers: • Increased data skills, expertise in FAIR data builds competitive edge. • Citation advantage of Open Access / Open Data. • Culture of certain research disciplines is already strongly in favour of Open Data / Open Science.
  24. 24. Vision and Mission of an African Open Science Platform • African scientists are at the cutting edge of contemporary, data-intensive science as a fundamental resource for a modern society. • A digital ecosystem with four complementary aims governed by a set of common principles and practices: 1. A virtual space for scientists to find, deposit, manage, share and reuse data, software and metadata; 2. A means of continually developing capacities at all levels of national science systems and amongst professionals and their institutions operating in the public and private domain; 3. A basis for multi-stakeholder consortia that wish to utilise powerful digital tools in addressing major common problems, and for work in the trans-disciplinary mode; 4. A forum for exchange of ideas, best practices and opportunities amongst Platform partners and with the international data-science community. 5. An African Data Science Institute, to advance the frontiers of data science and provide support for interdisciplinary research domains where there are particularly strong data assets in Africa.
  25. 25. Simon Hodson Executive Director CODATA Email: Twitter: @simonhodson99 Tel (Office): +33 1 45 25 04 96 | Tel (Cell): +33 6 86 30 42 59 CODATA (ICSU Committee on Data for Science and Technology), 5 rue Auguste Vacquerie, 75016 Paris, Thank you for your attention!
  26. 26. INTERNATIONAL DATA WEEK IDW 2018 Gaborone, Botswana: 5-8 November 2018 Information: Deadline for abstracts, 31 May:
  27. 27. CODATA-RDA School of Research Data Science • Contemporary research – particularly when addressing the most significant, interdisciplinary research challenges – increasingly depends on a range of skills relating to data. • These skills include the principles and practice of Open Science; research data management and curation, how to prepare a data management plan and to annotate data; software and data carpentry; principles and practices of visualisation; data analysis, statistics and machine learning; use of computational infrastructures. The ensemble of these skills, relating to data in research, can usefully be called ‘Research Data Science’.
  28. 28. CODATA-RDA School of Research Data Science • Annual foundational school at ICTP, Trieste (with the objective to build a network of partners, train-the-trainers). • Advanced workshops, ICTP, Trieste, following the foundational school. • National or regional schools, organised with local partners. 2018 • Next #DataTrieste Summer School, 6-17 August 2018. • Next #DataTrieste Advanced Workshops 20-24 August 2018. • Call for applications, deadline 21 May: • Schools in Brisbane (UQ and Australian Academy of Science); ICTP Kigali (October); ICTP São Paulo (December)
  29. 29. DataTrieste Film on Vimeo: Call for applications, deadline 21 May:
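
Slide 5 spells out FAIR as Findable, Accessible, Interoperable, Reusable, and slide 21 names permanent identifiers and metadata standards as ecosystem components. As a rough illustration only, the sketch below shows how a repository might flag which of the four headings a dataset record leaves unaddressed. The record layout, field names and the `fair_gaps` helper are all hypothetical, invented for this example; the published FAIR principles are considerably more detailed than a presence check on six fields.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; every field name is illustrative."""
    identifier: str = ""    # persistent identifier such as a DOI (Findable)
    title: str = ""         # descriptive metadata (Findable)
    access_url: str = ""    # where the data can be retrieved (Accessible)
    file_format: str = ""   # open, documented format (Interoperable)
    licence: str = ""       # terms of reuse (Reusable)
    provenance: str = ""    # how the data were produced (Reusable)


def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return the FAIR headings for which at least one field is empty.

    This is an assumed, simplified mapping of fields to headings,
    not an official FAIR assessment.
    """
    checks = {
        "Findable": [record.identifier, record.title],
        "Accessible": [record.access_url],
        "Interoperable": [record.file_format],
        "Reusable": [record.licence, record.provenance],
    }
    return [
        heading
        for heading, values in checks.items()
        if not all(value.strip() for value in values)
    ]


# Example with made-up values: the record has no licence.
record = DatasetRecord(
    identifier="doi:10.0000/example",
    title="Example field survey",
    access_url="https://repository.example/datasets/1",
    file_format="text/csv",
    provenance="Collected during a hypothetical 2018 survey",
)
print(fair_gaps(record))  # ['Reusable']
```

A check like this only confirms that fields are filled in; whether an identifier resolves or a licence really permits reuse still needs human or repository-level review.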