Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

From Open Data to Open Science, by Geoffrey Boulton

912 views

Published on

1st LEARN Workshop. Embedding Research Data as part of the research cycle. 29 Jan 2016. Presentation by Geoffrey Boulton, University of Edinburgh & CODATA

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

From Open Data to Open Science, by Geoffrey Boulton

  1. 1. From Open Data to Open Science Geoffrey Boulton University of Edinburgh & CODATA “Learn” Workshop University College, London January 2016
  2. 2. Knowledge and understanding - the engines of material progress depend on technologies that enable their accumulation and communication 1454 2002
  3. 3. Openness – the bedrock of science in the modern era Henry Oldenburg
  4. 4. Scientific self correction
  5. 5. /var/folders/ls/nv6g47p94ks4d11f1p72h2ch00 00gn/T/com.apple.Preview/com.apple.Preview .PasteboardItems/rutford_avo_afi_ed_july201 0 (dragged).pdf The Challenge: the “Data Storm” is undermining “self correction” THEN AND NOW
  6. 6. A crisis of reproducibility and credibility? Why such low levels of reproducibility? • Misconduct/fraud • Invalid reasoning • Absent or inadequate data and/or metadata
  7. 7. 19 Exabytes280Exabytes Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html 1 Exabyte=1018 bytes The digital revolution Global information storage capacity In optimally compressed bytes Digital Storage Analogue Storage Explosion of the Digital revolution 1986 1993 2000 2007 2014-4000Exabytes
  8. 8. http://www.wired.co. uk/news/archive/201 4-01/15/1000-dollar- genome/viewgallery/3 31679 Data acquistion: Cost down – Flux up
  9. 9. Information: how much is crystallised into knowledge?
  10. 10. Reinventing reproducibility for the digital age How do we retain an essential principle? The data providing the evidence for a published concept MUST be concurrently published, together with necessary metadata and computer code. To do otherwise is scientific MALPRACTICE
  11. 11. Ozone Levels Four key drivers of change for science • Big data • Semantically-linked data • Open data • Cost reduction Micro-satellite Looking at clouds
  12. 12. Pillars of the Digital Revolution Big Data Volume Velocity Variety Linked Open Data Many databases Semantic Relations Deeper meaning Foundations : Openness Machine analysis & learning Text and data mining
  13. 13. The opportunity: data from “simple” to complex systems from uncoupled to highly coupled behaviour Uncoupled systems Simulating behaviour of highly coupled systems
  14. 14. Simulating system dynamics Mapping a complex state Image of brain cells in a rat Emergent behaviour of a specific 6-component coupled system • patterns not hitherto seen • unsuspected relationship • complex systems e.g. complexity: dynamic evolution and system state Scientific opportunities
  15. 15. Satellite observation Surface monitoring The opportunity: data-modelling: iterative integration Initial conditions Model forecast Model-data iteration - forecast correction
  16. 16. Linear regression Cluster analysis Dynamic/complex behaviour Complex systems No mathematical pipeline Simple relationships Classical statistics System characterisations: from simple to complex Glucose in type II diabetes Topological analysis
  17. 17. A barrier to openness? - Analytic overload. E.g. - Global Earth Observation System of Systems • What is the human role? • Can we analyse & scrutinise what is in the black box? - &who owns the box? • What does it mean to be a researcher in a data intensive age? A disconnect between machine analysis & human cognition?
  18. 18. Mathematics related discussions Tim Gowers - crowd-sourced mathematics An unsolved problem posed on his blog. 32 days – 27 people – 800 substantive contributions Emerging contributions rapidly developed or discarded Problem solved! “Its like driving a car whilst normal research is like pushing it” What inhibits such processes? - The criteria for credit and promotion – ALTMETRICS THE ANSWER? New modes of technology- enabled creativity: e.g Crowd-sourcing
  19. 19. The Open Data Iceberg The Technical Challenge The Consent Challenge The Ecosystem Challenge The Funding Challenge The Support Challenge The Skills Challenge The Incentives Challenge The Mindset Challenge Processes & Organisation People motivation and ethos. Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). A National Infrastructure Technology
  20. 20. The “Science International” Accord: principles of open data (www.icsu.org/science-international) Responsibilities 1-2. Scientists 3. Research institutions & universities 4. Publishers 5. Funding agencies 6. Scholarly societies and academies 7. Libraries & repositories 8. Boundaries of openness Enabling practices 9. Citation and provenance 10. Interoperability 11. Non-restrictive re-use 12. Linkability
  21. 21. Responsibilities Scientists i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re- used and re-purposed. ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
  22. 22. CODATACODATA II SS UU African Open Data/Open Science Platform Platform Forum Coordination Government Priority setting Funders Funding Incentives Capacity Building Training and Skills Infrastructure Roadmaps Flagship Co-Designed Data Intensive Projects International Standards Programmes Shared infrastructure investment; shared good practice; capacity building; system development
  23. 23. EMBL-EBI services Labs around the world send us their data and we… Archive it Classify it Share it with other data providers Analyse, add value and integrate it …provide tools to help researchers use it A collaborative enterprise Disciplinary communities can lead the way e.g. Elixir programme in life sciences/bio-informatics
  24. 24. Regional Platforms for Open Science African Platform? Asian Platform? Australian Platform Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards. S. American Platform?
  25. 25. Inputs Outputs Open access Administrative data (held by public authorities e.g. prescription data) Public Sector Research data (e.g. Met Office weather data) Research Data (e.g. CERN, generated in universities) Research publications (i.e. papers in journals) Open data Open science “science as a public enterprise” Collecting the data Doing research Doing science openly Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists (communication/dialogue – joint production of knowledge) Stakeholders • Communication/dialogue must be audience-sensitive • Is it – with all stakeholder groups?
  26. 26. Open Science Data / Publications Researchers Mono/MultiInterTransdisciplinary Stakeholders RigourInnovationPolicySolutions Open Knowledge
  27. 27. Ins tu onal management and support Na onal policies & e-infrastructure Open Research Data Big Data Analy cs Knowledge Output EXPLOITING THE DATA REVOLUTION Scien fic inference Ins tu onal management & support Na onal policies & e-infrastructure A national data-intensive system
  28. 28. CODATACODATA II SS UU International Research Data Collaboration CODATACODATA II SS UU CODATA  Policies & practice  Frontiers of data science  Capacity Building WDS • Data stewardship • Data standards RDA • Interoperability
  29. 29. 1. Maintaining “self-correction” 2. Open knowledge is creative & productive “If you have an apple and I have an apple and we exchange these apples, then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.” 3. Open data enables semantic linking George Bernard Shaw Why openness & sharing?
  30. 30. • Openly collected science is already helping policy makers. • AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra). Chalara spread: 1992-2012 Citizen Science

×