From Open Data to Open Science: Enabling Reproducibility, Innovation and Solutions

From Open Data to Open Science
Geoffrey Boulton
University of Edinburgh & CODATA
“Learn” Workshop
University College, London
January 2016

Knowledge and understanding, the engines of material progress,
depend on technologies that enable their accumulation and communication
1454 2002

Openness – the bedrock of science in the
modern era
Henry Oldenburg

The Challenge: the “Data Storm” is undermining
“self correction”
THEN AND NOW

A crisis of reproducibility and credibility?
Why such low levels of reproducibility?
• Misconduct/fraud
• Invalid reasoning
• Absent or inadequate data and/or metadata

The digital revolution: global information storage capacity, in optimally compressed bytes
[Figure: analogue versus digital storage for 1986, 1993, 2000 and 2007, showing the explosion of the digital revolution. Labelled values: 19 exabytes and 280 exabytes; 2014: 4000 exabytes.]
Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html
1 exabyte = 10^18 bytes

http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679
Data acquisition: Cost down – Flux up

Information: how much is crystallised into knowledge?

Reinventing reproducibility
for the digital age
How do we retain an essential principle?
The data providing the evidence for a published
concept MUST be concurrently published, together
with necessary metadata and computer code.
To do otherwise is scientific MALPRACTICE
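
One way to make concurrent publication checkable is to deposit a small manifest alongside the data, metadata and code. The following is a minimal sketch, not a prescribed standard: the file names, metadata fields and the helper names build_manifest and verify are all hypothetical, chosen only for illustration.

```python
import hashlib
import json


def build_manifest(files, metadata):
    """Record a SHA-256 checksum for each published file alongside its metadata."""
    return {
        "metadata": metadata,
        "files": {
            name: hashlib.sha256(content).hexdigest()
            for name, content in sorted(files.items())
        },
    }


def verify(manifest, files):
    """Return the names of files that are missing or whose content has changed."""
    problems = []
    for name, expected in manifest["files"].items():
        content = files.get(name)
        if content is None or hashlib.sha256(content).hexdigest() != expected:
            problems.append(name)
    return problems


# Hypothetical deposit: data, analysis code and descriptive metadata together.
deposit = {
    "observations.csv": b"site,year,value\nA,2015,1.2\n",
    "analysis.py": b"print('reproduce figure 1')\n",
}
manifest = build_manifest(
    deposit, {"title": "Example dataset", "licence": "CC-BY-4.0"}
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
```

A reader who later downloads the deposit can call verify(manifest, files) to confirm that the data and code are the ones the published claim was based on.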

Ozone Levels
Four key drivers of change for science
• Big data
• Semantically-linked data
• Open data
• Cost reduction
Micro-satellite
Looking at clouds

Pillars of the Digital Revolution
Big Data: Volume, Velocity, Variety
Linked Open Data: Many databases, Semantic relations, Deeper meaning
Foundations: Openness
Machine analysis & learning; Text and data mining

The opportunity: data from “simple” to complex systems
from uncoupled to highly coupled behaviour
Uncoupled
systems
Simulating behaviour of
highly coupled systems

Simulating system dynamics Mapping a complex state
Image of brain cells in a rat
Emergent behaviour of a specific
6-component coupled system
• patterns not hitherto seen
• unsuspected relationship
• complex systems
e.g. complexity: dynamic evolution and system state
Scientific opportunities

Satellite observation Surface monitoring
The opportunity: data-modelling: iterative integration
Initial conditions
Model forecast
Model-data iteration - forecast correction
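
The model-data iteration named above can be sketched as a loop that alternates a model forecast with a correction towards each new observation. This is a toy illustration only: the growth model, the fixed gain and the function names are assumptions, and operational systems use far more elaborate data-assimilation schemes.

```python
def forecast(state, growth=1.05):
    """Toy model: advance the state by one step (hypothetical dynamics)."""
    return state * growth


def correct(predicted, observed, gain=0.5):
    """Nudge the forecast towards the observation by a fixed gain in [0, 1]."""
    return predicted + gain * (observed - predicted)


def assimilate(initial, observations, gain=0.5):
    """Alternate forecast and correction, returning the corrected states."""
    state = initial
    history = []
    for observed in observations:
        predicted = forecast(state)
        state = correct(predicted, observed, gain)
        history.append(state)
    return history
```

With gain set to 0 the observations are ignored and the model runs freely; with gain set to 1 each forecast is replaced by the observation. Intermediate values blend the two.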

Linear regression
Cluster analysis
Dynamic/complex behaviour
Complex systems
No mathematical pipeline
Simple relationships
Classical statistics
System characterisations: from simple to complex
Glucose in type II diabetes
Topological analysis

A barrier to openness? - Analytic overload.
E.g. - Global Earth Observation System of Systems
• What is the human role?
• Can we analyse & scrutinise what is in the
black box? - & who owns the box?
• What does it mean to be a researcher in a
data intensive age?
A disconnect between machine
analysis & human cognition?

Mathematics related discussions
Tim Gowers
- crowd-sourced mathematics
An unsolved problem posed on
his blog.
32 days – 27 people – 800
substantive contributions
Emerging contributions rapidly
developed or discarded
Problem solved!
“It's like driving a car whilst
normal research is like pushing
it”
What inhibits such processes?
- The criteria for credit and
promotion
– ALTMETRICS THE ANSWER?
New modes of technology-enabled creativity:
e.g. crowd-sourcing

The Open Data Iceberg
The Technical Challenge
The Consent Challenge
The Ecosystem Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
Processes &
Organisation
People
motivation and ethos.
Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015).
A National Infrastructure
Technology

The “Science International” Accord:
principles of open data
(www.icsu.org/science-international)
Responsibilities
1-2. Scientists
3. Research institutions & universities
4. Publishers
5. Funding agencies
6. Scholarly societies and academies
7. Libraries & repositories
8. Boundaries of openness
Enabling practices
9. Citation and provenance
10. Interoperability
11. Non-restrictive re-use
12. Linkability

Responsibilities
Scientists
i. Publicly funded scientists have a responsibility to contribute to the
public good through the creation and communication of new
knowledge, of which associated data are intrinsic parts. They
should make such data openly available to others as soon as
possible after their production in ways that permit them to be re-
used and re-purposed.
ii. The data that provide evidence for published scientific claims
should be made concurrently and publicly available in an
intelligently open form. This should permit the logic of the link
between data and claim to be rigorously scrutinised and the
validity of the data to be tested by replication of experiments or
observations. To the extent possible, data should be deposited in
well-managed and trusted repositories with low access barriers.

African Open Data/Open Science Platform
Platform Forum
Coordination
Government
Priority setting
Funders
Funding
Incentives
Capacity Building
Training and Skills
Infrastructure
Roadmaps
Flagship
Co-Designed Data
Intensive Projects
International
Standards
Programmes
Shared infrastructure investment; shared good practice; capacity building;
system development

EMBL-EBI services
Labs around the
world send us
their data and
we…
Archive it
Classify it
Share it with
other data
providers
Analyse, add
value and
integrate it
…provide
tools to help
researchers
use it
A collaborative
enterprise
Disciplinary communities can lead the way
e.g. Elixir programme in life sciences/bio-informatics

Regional Platforms for Open Science
African
Platform?
Asian
Platform?
Australian
Platform
Shared investment in infrastructure; harvesting and circulating good ideas;
spreading and supporting good practice; capacity building; promoting
applications; linking to international programmes and standards.
S.
American
Platform?

Inputs Outputs
Open access
Administrative
data (held by
public
authorities e.g.
prescription
data)
Public Sector
Research data
(e.g. Met
Office weather
data)
Research
Data (e.g.
CERN,
generated in
universities)
Research
publications
(i.e. papers in
journals)
Open data
Open science
“science as a public enterprise”
Collecting the
data
Doing
research
Doing science
openly
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
(communication/dialogue – joint production of knowledge)
Stakeholders
• Communication/dialogue must be audience-sensitive
• Is it – with all stakeholder groups?

Open Science
Data / Publications
Researchers
Mono/MultiInterTransdisciplinary
Stakeholders
RigourInnovationPolicySolutions
Open Knowledge

Institutional
management and support
National policies
& e-infrastructure
Open
Research
Data
Big Data
Analytics
Knowledge
Output
EXPLOITING THE DATA REVOLUTION
Scientific inference
Institutional
management & support
National policies
& e-infrastructure
A national data-intensive system

International Research Data Collaboration
CODATA
• Policies & practice
• Frontiers of data science
• Capacity Building
WDS
• Data stewardship
• Data standards
RDA
• Interoperability

1. Maintaining “self-correction”
2. Open knowledge is creative & productive
“If you have an apple and I have an apple and we
exchange these apples, then you and I will still
each have one apple. But if you have an idea and I
have an idea and we exchange these ideas, then
each of us will have two ideas.”
George Bernard Shaw
3. Open data enables semantic linking
Why openness & sharing?

• Openly collected science is already helping policy
makers.
• AshTag app allows users to submit photos and
locations of sightings to a team who will refer them on
to the Forestry Commission, which is leading efforts to
stop the disease's spread with the Department for
Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012
Citizen Science
