The material advance of human society has been based on the acquisition and use of knowledge and science, as it has been practised in the last 300 years has proved to be the most effective way of gaining reliable knowledge. I want to talk about the processes whereby science is done and how they need to adapt to a novel environment in which we are able to acquire, store, manipulate and communicate data of unprecedented volume and complexity. What challenges does this environment offer to the essential processes of science, how can we exploit the opportunities that it offers and what barriers inhibit necessary changes. This is not about openness for itself – but open processes in the doing of science) Open science is not new. It was the bedrock on which the extraordinary scientific revolutions of the 18th and 19th centuries were built. But we do need to reinvent it for a data-rich era. So let us start with a little history.
This is Henry Oldenberg, the first secretary of the newly formed Royal Society in the early 1660s. Henry was an inveterate correspondent, with those we would now call scientists both in Europe and beyond. Rather than keep this correspondence private, he thought it would be a good idea to publish it, and persuaded the new Society to do so by creating the Philosophical Transactions, which remains a top-flight journal to the present day. But he demanded two things of his correspondents: that they should submit in the vernacular and not Latin; and that evidence (data) that supported a concept must be published together with the concept. It permitted others to scrutinize the logic of the concept, the extent to which it was supported by the data and permitted replication and re-use. Open publication of concept and evidence is the basis of “scientific self-correction”, which historians of science argue were the crucial building blocks on which the scientific revolution of the 18th and 19th centuries was built and remain fundamental to the progress of science. Openness to scrutiny by scientific peers is the most powerful form of peer review.
The fundamental challenge is to scientific self-correction. Journals can no longer contain the data, and neither scientists nor journals have taken the obvious step of having data relevant to a publication concurrently available in an electronic database. (example of last year’s Nature paper revealing that only 11% of results in 50 benchmark papers in pre-clinical oncology were replicable. If lack of Oldenburg’s rigour in presenting evidence is widespread, a failure of replicability risks undermines science as a reliable way of acquiring knowledge and can therefore undermines its credibility.
Lots of interchangeable and fluid terms but many shared principles. The word “science” is used to mean the systematic organisation of knowledge that can be rationally explained and reliably applied. It is not exclusively restricted to “natural science”.
Human and technical requirements for a sustainable data infrastructure. Network of world data centres. Data policies and data science: bringing data experts together with research scientists.
Ash dieback, caused by the fungus Chalara fraxinea, was found in the UK in October outside of plantations and nurseries in East Anglia, raising fears of a repeat of Dutch elm disease which killed 25 million mature elms in the 1970s and 80s. In an attempt to map and help prevent the spread of the disease across the country, a team of developers and academics worked through the weekend to create an app that smartphone owners can use to report suspected cases of infection. Infected ash trees are recognisable by lesions on their bark, dieback of leaves at the tree's crown, and leaves turning brown – though experts say the arrival of autumn makes the latter harder to accurately spot. zThe AshTag app for IOS and Android devices allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).
From Open Data to Open Science, by Geoffrey Boulton
From Open Data to Open Science
University of Edinburgh & CODATA
University College, London
Knowledge and understanding - the engines of material progress
depend on technologies that enable their accumulation and communication
Openness – the bedrock of science in the
The Challenge: the “Data Storm” is undermining
THEN AND NOW
A crisis of reproducibility and credibility?
Why such low levels of reproducibility?
• Invalid reasoning
• Absent or inadequate data and/or metadata
http://www.martinhilbert.net/WorldOnfoCapacity.html 1 Exabyte=1018 bytes
The digital revolution
Global information storage capacity
In optimally compressed bytes
Explosion of the
Data acquistion: Cost down – Flux up
Information: how much is crystallised into knowledge?
for the digital age
How do we retain an essential principle?
The data providing the evidence for a published
concept MUST be concurrently published, together
with necessary metadata and computer code.
To do otherwise is scientific MALPRACTICE
Four key drivers of change for science
• Big data
• Semantically-linked data
• Open data
• Cost reduction
Looking at clouds
Pillars of the Digital Revolution
Foundations : Openness
Machine analysis & learning Text and data mining
The opportunity: data from “simple” to complex systems
from uncoupled to highly coupled behaviour
Simulating behaviour of
highly coupled systems
Simulating system dynamics Mapping a complex state
Image of brain cells in a rat
Emergent behaviour of a specific
6-component coupled system
• patterns not hitherto seen
• unsuspected relationship
• complex systems
e.g. complexity: dynamic evolution and system state
Satellite observation Surface monitoring
The opportunity: data-modelling: iterative integration
Model-data iteration - forecast correction
No mathematical pipeline
System characterisations: from simple to complex
Glucose in type II diabetes
A barrier to openness? - Analytic overload.
E.g. - Global Earth Observation System of Systems
• What is the human role?
• Can we analyse & scrutinise what is in the
black box? - &who owns the box?
• What does it mean to be a researcher in a
data intensive age?
A disconnect between machine
analysis & human cognition?
Mathematics related discussions
- crowd-sourced mathematics
An unsolved problem posed on
32 days – 27 people – 800
Emerging contributions rapidly
developed or discarded
“Its like driving a car whilst
normal research is like pushing
What inhibits such processes?
- The criteria for credit and
– ALTMETRICS THE ANSWER?
New modes of technology-
The Open Data Iceberg
The Technical Challenge
The Consent Challenge
The Ecosystem Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
motivation and ethos.
Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015).
A National Infrastructure
The “Science International” Accord:
principles of open data
3. Research institutions & universities
5. Funding agencies
6. Scholarly societies and academies
7. Libraries & repositories
8. Boundaries of openness
9. Citation and provenance
11. Non-restrictive re-use
i. Publicly funded scientists have a responsibility to contribute to the
public good through the creation and communication of new
knowledge, of which associated data are intrinsic parts. They
should make such data openly available to others as soon as
possible after their production in ways that permit them to be re-
used and re-purposed.
ii. The data that provide evidence for published scientific claims
should be made concurrently and publicly available in an
intelligently open form. This should permit the logic of the link
between data and claim to be rigorously scrutinised and the
validity of the data to be tested by replication of experiments or
observations. To the extent possible, data should be deposited in
well-managed and trusted repositories with low access barriers.
African Open Data/Open Science Platform
Training and Skills
Shared infrastructure investment; shared good practice; capacity building;
Labs around the
world send us
their data and
Share it with
tools to help
Disciplinary communities can lead the way
e.g. Elixir programme in life sciences/bio-informatics
Regional Platforms for Open Science
Shared investment in infrastructure; harvesting and circulating good ideas;
spreading and supporting good practice; capacity building; promoting
applications; linking to international programmes and standards.
data (held by
(i.e. papers in
“science as a public enterprise”
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
(communication/dialogue – joint production of knowledge)
• Communication/dialogue must be audience-sensitive
• Is it – with all stakeholder groups?
Data / Publications
Ins tu onal
management and support
Na onal policies
EXPLOITING THE DATA REVOLUTION
Scien fic inference
Ins tu onal
management & support
Na onal policies
A national data-intensive system
International Research Data Collaboration
Policies & practice
Frontiers of data
• Data stewardship
• Data standards
1. Maintaining “self-correction”
2. Open knowledge is creative & productive
“If you have an apple and I have an apple and we
exchange these apples, then you and I will still
each have one apple. But if you have an idea and I
have an idea and we exchange these ideas, then
each of us will have two ideas.”
3. Open data enables semantic linking
George Bernard Shaw
Why openness & sharing?
• Openly collected science is already helping policy
• AshTag app allows users to submit photos and
locations of sightings to a team who will refer them on
to the Forestry Commission, which is leading efforts to
stop the disease's spread with the Department for
Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012