Open Data in a Big Data World: easy to say, but hard to do?
Sarah Callaghan
sarah.callaghan@stfc.ac.uk
@sorcha_ni
ORCID: 0000-0002-0517-1031
Geoffrey Boulton, Dominique Babini, Simon Hodson, Jianhui Li, Tshilidzi Marwala, Maria Musoke, Paul Uhlir, Sally Wyatt
3rd LEARN workshop on Research Data Management,
“Make research data management policies work”
Helsinki, 28 June 2016
Principles, Policies & Practice
Responsibilities
1-2. Scientists
3. Research institutions & universities
4. Publishers
5. Funding agencies
6. Scholarly societies and academies
7. Libraries & repositories
8. Boundaries of openness
Enabling practices
9. Citation and provenance
10. Interoperability
11. Non-restrictive re-use
12. Linkability
http://www.icsu.org/science-international/accord
The Data Deluge
http://www.economist.com/node/21521549
http://www.leadformix.com/blog/2013/02/the-big-data-deluge/
It used to be “easy”…
Suber cells and mimosa leaves. Robert Hooke, Micrographia, 1665
The Scientific Papers of William Parsons,
Third Earl of Rosse 1800-1867
…but datasets have gotten so big, it’s not useful to publish them in hard copy anymore
Hard copy of the Human Genome at the Wellcome Collection
Example Big Data: CMIP5
CMIP5: Fifth Coupled Model Intercomparison Project
• Global community activity under the World Meteorological Organisation (WMO) via the World Climate Research Programme (WCRP)
• Aim:
– to address outstanding scientific questions that arose as part of the 4th Assessment Report process,
– to improve understanding of climate, and
– to provide estimates of future climate change that will be useful to those considering its possible consequences.
Many distinct experiments, with very different characteristics, which influence the configuration of the models (what they can do, and how they should be interpreted).
Simulations:
~ 90,000 years
~ 60 experiments
~ 20 modelling centres (from around the world)
using
~ 30 major(*) model configurations
~ 2 million output “atomic” datasets
~ 10s of petabytes of output
~ 2 petabytes of CMIP5 requested output
~ 1 petabyte of CMIP5 “replicated” output
which are replicated at a number of sites (including ours)
Major international collaboration!
Funded by EU FP7 projects (IS-ENES2, Metafor) and US (ESG) and other national sources (e.g. NERC for the UK)
CMIP5 numbers
Summary of the CMIP5 example
The Climate problem needs:
– Major physical e-infrastructure (networks, supercomputers)
– Comprehensive information architectures covering the whole information life cycle, including annotation (particularly of quality)
… and hard work populating these information objects, particularly with provenance detail.
– Sophisticated tools to produce and consume the data and information objects
– State-of-the-art access control techniques
Major distributed systems are social challenges as much as technical challenges.
CMIP5 is Big Data, with lots of different participants and lots of different technologies.
It also has a community willing to work together to standardise and automate data and metadata production and curation, and willing to support the effort needed for openness.
Big Data:
• Industrialised and standardised data and metadata production
• Large groups of people involved
• Established methods for making the data open, and for attribution and credit for data creation
Long Tail Data:
• Bespoke data and metadata creation methods
• Small groups/lone researchers
• No generally accepted methods for attribution and credit for data creation.
Data is often closed because of a lack of effort to open it
https://flic.kr/p/g1EHPR
Most people have an idea of what a publication is
Some examples of data (just from the Earth Sciences)
1. Time series, some still being updated, e.g. meteorological measurements
2. Large 4D synthesised datasets, e.g. Climate, Oceanographic, Hydrological and Numerical Weather Prediction model data generated on a supercomputer
3. 2D scans, e.g. satellite data, weather radar data
4. 2D snapshots, e.g. cloud camera
5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature
6. Datasets consisting of data from multiple instruments as part of the same measurement campaign
7. Physical samples, e.g. fossils
Open Data is not a new idea
Henry Oldenburg
Data, Reproducibility and Science
Science should be reproducible – other people doing the same experiments in the same way should get the same results.
Observational data is not reproducible (unless you have a time machine)
Therefore we need to have access to the data to confirm the science is valid!
Poor data analysis generates false facts – and false facts & inaccessible data undermine science & its credibility
http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
A crisis of reproducibility and credibility?
The data providing the evidence for a published concept MUST be concurrently published, together with the metadata. To do otherwise is scientific MALPRACTICE.
Pre-clinical oncology – 89% not reproducible
Why?
•Misconduct/fraud
•Invalid reasoning
•Absent or inadequate data and/or metadata
We’re only going to get more data
More big data - linked data – machine learning
The internet of things
So, what must we do?
• Concurrently publish data and metadata that are the evidence for a published scientific claim – to do otherwise is malpractice
•Data science skills for researchers
•Re-establish standards of reproducibility for a data-intensive age
• Patterns not hitherto seen
• Unsuspected relationships
• Integrated analysis of diverse data (e.g. natural & social science)
• Complex systems
e.g. complexity: dynamic evolution and system state
But not all research is or needs to be data-intensive
Scientific Opportunities of Big Data
https://www.clickz.com/clickz/column/2389218/create-better-content-via-humor
http://www.tylervigen.com/spurious-correlations
Caveat Emptor!
Data supporting a published claim
Other data for re-use & integration
Pillars of the Digital Revolution
Big Data: Volume, Velocity, Variety, Veracity
Linked Data: Many databases
Semantic Relations: Deeper meaning
Foundations: Openness
Machine analysis & learning
The Open Data Edifice
Open Data initiatives in areas of:
Life sciences
Earth Science
Environmental Science
Food Science
Agricultural Science
Chemical Crystallography
Bioinformatics/Genomics
Linguistics
Social Sciences
Evolutionary biology
Biodiversity
Astronomy
Earth Observation (GEO)
Archaeology
Atmospheric sciences
EMBL-EBI services
Labs around the world send us their data and we…
Archive it
Classify it
Share it with other data providers
Analyse, add value and integrate it
…provide tools to help researchers use it
A collaborative enterprise
Elixir programme
It is happening: bottom-up Open Data initiatives
The Open Data Iceberg
The Technical Challenge
The Consent Challenge
The Institutional Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
Processes & Organisation
People
Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD
A National Infrastructure
Technology
Scientists
i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed.
ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
From the Accord: Responsibilities
Creating a dataset is hard work!
"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com
Documenting a dataset so that it is usable and understandable by others is extra work!
“I’m all for the free sharing of information, provided it’s them sharing their information with us.”
http://discworld.wikia.com/wiki/Mustrum_Ridcully
Mustrum Ridcully, D.Thau., D.M., D.S., D.Mn., D.G., D.D., D.C.L., D.M. Phil., D.M.S., D.C.M., D.W., B.El.L., Archchancellor, Unseen University, Ankh-Morpork, Discworld
- As quoted in “Unseen Academicals”, by Terry Pratchett
Open is not enough!
“When required to make the data available by my program manager, my collaborators, and ultimately by law, I will grudgingly do so by placing the raw data on an FTP site, named with UUIDs like 4e283d36-61c4-11df-9a26-edddf420622d. I will under no circumstances make any attempt to provide analysis source code, documentation for formats, or any metadata with the raw data. When requested (and ONLY when requested), I will provide an Excel spreadsheet linking the names to data sets with published results. This spreadsheet will likely be wrong -- but since no one will be able to analyze the data, that won't matter.”
- http://ivory.idyll.org/blog/data-management.html
https://flic.kr/p/awnCQu
Incentives for Open Data
• Need reward structures and incentives for researchers to encourage them to make their data open
• Data citation and publication
• (again, issues with treating data as a special case of publications…)
The Understandability Challenge: Article
What the data set looks like on disk
What the raw data files look like.
I could make these files open easily, but no one would have a clue how to use them!
The Understandability Challenge: Data
It’s ok, I’ll just put it out there and if it’s important other people will figure it out
These documents have been preserved for thousands of years!
But they’ve both been translated many times, with different meanings each time.
We need Metadata to preserve Information
We can’t rely on Data Archaeology
Phaistos Disc, c. 1700 BC
http://theupturnedmicroscope.com/comic/negative-data/
It’s not just data!
• Experimental protocols
• Workflows
• Software code
• Metadata
• Things that went wrong!
• …
Usability, trust, metadata
http://trollcats.com/2009/11/im-your-friend-and-i-only-want-whats-best-for-you-trollcat/
When you read a journal paper, it’s easy to get a quick understanding of its quality.
You don’t want to download many GB of data just to open it and see whether it’s any use to you.
We need to use proxies for quality:
• Do you know the data source/repository? Can you trust it?
• Is there enough metadata so that you can understand and/or use the data?
In the same way that not all journal publishers are created equal, not all data repositories are created equal
Example metadata from a published dataset:
“rain.csv contains rainfall in mm for each month at Marysville, Victoria from January 1995 to February 2009”
Lindenmayer, David B.; Wood, Jeff; McBurney, Lachlan; Michael, Damian; Crane, Mason; MacGregor, Christopher; Montague-Drake, Rebecca; Gibbons, Philip; Banks, Sam C.; (2011): rain; Dryad Digital Repository. http://doi.org/10.5061/DRYAD.QP1F6H0S/3
Should ALL data be open?
Most data produced through publicly funded research should be open.
But!
• Confidentiality issues (e.g. named persons’ health records)
• Conservation issues (e.g. maps of locations of rare animals at risk from poachers)
• Security issues (e.g. data and methodologies for building biological weapons)
There should be a very good reason for publicly funded data not to be open.
Getting scooped
http://www.phdcomics.com/comics/archive.php?comicid=795
It happened to me!
I shared my data with another research group. They published
the first results using that data.
I wasn’t a co-author. I didn’t get an acknowledgement.
Citeable does not equal Open!
Just as you can cite a paper that is behind a paywall, you can cite a dataset that isn’t open.
Making something citeable means that:
• You know it exists
• You know who’s responsible for it
• You know where to find it
• You know a little bit about it (title, abstract,…)
Even if you can’t download/read the thing yourself.
Citation gives benefits that encourage data producers to make their data open
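The four things a citation tells you map naturally onto a small metadata record. The sketch below is a hypothetical illustration only: the `DatasetCitation` class and `format_citation` function are our own names, not any repository's or DataCite's API. It is populated with the Dryad rainfall dataset cited earlier and renders it in the same "creators (year): title; publisher. identifier" layout shown on that slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical minimal record of what a data citation tells you."""
    creators: List[str]  # who's responsible for it
    year: int            # when it was published
    title: str           # a little bit about it
    publisher: str       # the repository holding it
    identifier: str      # persistent identifier (e.g. a DOI URL): it exists, and where to find it


def format_citation(c: DatasetCitation) -> str:
    """Render the record as 'Creator; ...; Creator; (Year): Title; Publisher. Identifier'."""
    creators = "; ".join(c.creators)
    return f"{creators}; ({c.year}): {c.title}; {c.publisher}. {c.identifier}"


rain = DatasetCitation(
    creators=[
        "Lindenmayer, David B.", "Wood, Jeff", "McBurney, Lachlan",
        "Michael, Damian", "Crane, Mason", "MacGregor, Christopher",
        "Montague-Drake, Rebecca", "Gibbons, Philip", "Banks, Sam C.",
    ],
    year=2011,
    title="rain",
    publisher="Dryad Digital Repository",
    identifier="http://doi.org/10.5061/DRYAD.QP1F6H0S/3",
)

print(format_citation(rain))
```

Note that nothing in the record says whether the data can actually be downloaded: the citation is complete even if access is restricted, which is exactly why citeable does not equal open.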
Be careful of your citations!
Inputs / Outputs
Open access
Administrative data (held by public authorities, e.g. prescription data)
Public Sector Research data (e.g. Met Office weather data)
Research Data (e.g. CERN, generated in universities)
Research publications (i.e. papers in journals)
Open data
Open science
A direction of travel?
Collecting the data
Doing research
Doing science openly
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
(communication/dialogue – joint production of knowledge)
Stakeholders
• Communication/dialogue must be audience-sensitive
• Is it audience-sensitive with all stakeholder groups?
Summary and maybe conclusions?
• We need to open the products of research
  • to encourage innovation and collaboration
  • to give credit to the people who’ve created them
  • to be transparent and trustworthy
• Openness does come at a cost!
• It’s not enough for data to be open
  • it needs to be usable and understandable too
• Data citation and publication are ways of encouraging researchers to make their data open
  • or at least tell the world that their data exists!
• We need a culture change – but it’s already happening!
http://www.keepcalm-o-matic.co.uk/default.asp
Thanks!
Any questions?
sarah.callaghan@stfc.ac.uk
@sorcha_ni
http://citingbytes.blogspot.co.uk/
“Publishing research without data is simply advertising, not science” - Graham Steel
http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/
http://heywhipple.com/dont-show-me-a-something-about-show-me-something/
LEARN Project: The Story So FarLEARN Project
 
The Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research OrganisationsThe Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research OrganisationsLEARN Project
 
Data for Development in the Caribbean
Data for Development in the CaribbeanData for Development in the Caribbean
Data for Development in the CaribbeanLEARN Project
 
Open Data in a Big World by Fernando Ariel López
Open Data in a Big World by Fernando Ariel López Open Data in a Big World by Fernando Ariel López
Open Data in a Big World by Fernando Ariel López LEARN Project
 
Research Data Management in São Paulo by Fabio Kon FAPESP
Research Data Management in São Paulo by Fabio Kon FAPESPResearch Data Management in São Paulo by Fabio Kon FAPESP
Research Data Management in São Paulo by Fabio Kon FAPESPLEARN Project
 
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...LEARN Project
 
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...LEARN Project
 
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...LEARN Project
 
“Data for Development – the value of data for research and society” by Dr. Ma...
“Data for Development – the value of data for research and society” by Dr. Ma...“Data for Development – the value of data for research and society” by Dr. Ma...
“Data for Development – the value of data for research and society” by Dr. Ma...LEARN Project
 
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)LEARN Project
 
Datos Abiertos de Investigacion - Caso Mexico
Datos Abiertos de Investigacion - Caso MexicoDatos Abiertos de Investigacion - Caso Mexico
Datos Abiertos de Investigacion - Caso MexicoLEARN Project
 

More from LEARN Project (20)

Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster Research Data Management, Challenges and Tools - Per Öster
Research Data Management, Challenges and Tools - Per Öster
 
LEARN Final Conference: Tutorial Group | Using the LEARN Model RDM Policy
LEARN Final Conference: Tutorial Group | Using the LEARN Model RDM PolicyLEARN Final Conference: Tutorial Group | Using the LEARN Model RDM Policy
LEARN Final Conference: Tutorial Group | Using the LEARN Model RDM Policy
 
LEARN Final Conference: Tutorial Group | Costing RDM
LEARN Final Conference: Tutorial Group | Costing RDMLEARN Final Conference: Tutorial Group | Costing RDM
LEARN Final Conference: Tutorial Group | Costing RDM
 
Paolo Budroni at COAR Annual Meeting
Paolo Budroni at COAR Annual MeetingPaolo Budroni at COAR Annual Meeting
Paolo Budroni at COAR Annual Meeting
 
LEARN Webinar
LEARN WebinarLEARN Webinar
LEARN Webinar
 
About Data From A Machine Learning Perspective
About Data From A Machine Learning PerspectiveAbout Data From A Machine Learning Perspective
About Data From A Machine Learning Perspective
 
LEARN Carribean Workshop Opening Remarks
LEARN Carribean Workshop Opening RemarksLEARN Carribean Workshop Opening Remarks
LEARN Carribean Workshop Opening Remarks
 
Managing Research Data in the Caribbean: Good practices and challenges
Managing Research Data in the Caribbean: Good practices and challengesManaging Research Data in the Caribbean: Good practices and challenges
Managing Research Data in the Caribbean: Good practices and challenges
 
LEARN Project: The Story So Far
LEARN Project: The Story So FarLEARN Project: The Story So Far
LEARN Project: The Story So Far
 
The Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research OrganisationsThe Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research Organisations
 
Data for Development in the Caribbean
Data for Development in the CaribbeanData for Development in the Caribbean
Data for Development in the Caribbean
 
Open Data in a Big World by Fernando Ariel López
Open Data in a Big World by Fernando Ariel López Open Data in a Big World by Fernando Ariel López
Open Data in a Big World by Fernando Ariel López
 
CENTRO DE DATOS
CENTRO DE DATOSCENTRO DE DATOS
CENTRO DE DATOS
 
Research Data Management in São Paulo by Fabio Kon FAPESP
Research Data Management in São Paulo by Fabio Kon FAPESPResearch Data Management in São Paulo by Fabio Kon FAPESP
Research Data Management in São Paulo by Fabio Kon FAPESP
 
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...
Gestion de datos para la investigacion: el caso peruano by Edward Mezones, Su...
 
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...
TALLER LEARN SOBRE DATOS DE INVESTIGACIÓN IMPLEMENTACIÓN DE POLÍTICAS Y ESTRA...
 
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...
Avances en torno a la Ley 26.899 e iniciativa regional de datos primarios de...
 
“Data for Development – the value of data for research and society” by Dr. Ma...
“Data for Development – the value of data for research and society” by Dr. Ma...“Data for Development – the value of data for research and society” by Dr. Ma...
“Data for Development – the value of data for research and society” by Dr. Ma...
 
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)
Conicyt Y Mandato OECD by Patricia Muñoz, CONICYT (Chile)
 
Datos Abiertos de Investigacion - Caso Mexico
Datos Abiertos de Investigacion - Caso MexicoDatos Abiertos de Investigacion - Caso Mexico
Datos Abiertos de Investigacion - Caso Mexico
 

Recently uploaded

LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringSebastiano Panichella
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxmavinoikein
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...marjmae69
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)Basil Achie
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 

Recently uploaded (20)

LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
The 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software EngineeringThe 3rd Intl. Workshop on NL-based Software Engineering
The 3rd Intl. Workshop on NL-based Software Engineering
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Work Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptxWork Remotely with Confluence ACE 2.pptx
Work Remotely with Confluence ACE 2.pptx
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 

Open Data in a Big Data World: easy to say, but hard to do?

  • 1. Open Data in a Big Data World: easy to say, but hard to do? Sarah Callaghan, sarah.callaghan@stfc.ac.uk, @sorcha_ni, ORCID: 0000-0002-0517-1031. Geoffrey Boulton, Dominique Babini, Simon Hodson, Jianhui Li, Tshilidzi Marwala, Maria Musoke, Paul Uhlir, Sally Wyatt. 3rd LEARN workshop on Research Data Management, “Make research data management policies work”, Helsinki, 28 June 2016.
  • 2. Principles, Policies & Practice. Responsibilities: 1-2. Scientists; 3. Research institutions & universities; 4. Publishers; 5. Funding agencies; 6. Scholarly societies and academies; 7. Libraries & repositories; 8. Boundaries of openness. Enabling practices: 9. Citation and provenance; 10. Interoperability; 11. Non-restrictive re-use; 12. Linkability. http://www.icsu.org/science-international/accord
  • 6. It used to be “easy”… Suber cells and mimosa leaves: Robert Hooke, Micrographia, 1665. The Scientific Papers of William Parsons, Third Earl of Rosse, 1800-1867. …but datasets have become so big that it’s no longer useful to publish them in hard copy.
  • 7. Hard copy of the Human Genome at the Wellcome Collection
  • 8. Example Big Data: CMIP5. CMIP5: Fifth Coupled Model Intercomparison Project • A global community activity under the World Meteorological Organisation (WMO), via the World Climate Research Programme (WCRP) • Aims: – to address outstanding scientific questions that arose as part of the 4th Assessment Report process; – to improve understanding of climate; and – to provide estimates of future climate change that will be useful to those considering its possible consequences. Many distinct experiments, with very different characteristics, which influence the configuration of the models (what they can do, and how they should be interpreted).
  • 9. CMIP5 numbers. Simulations: ~90,000 years; ~60 experiments; ~20 modelling centres (from around the world) using ~30 major(*) model configurations; ~2 million output “atomic” datasets; tens of petabytes of output; ~2 petabytes of CMIP5 requested output; ~1 petabyte of CMIP5 “replicated” output, which is replicated at a number of sites (including ours). A major international collaboration! Funded by EU FP7 projects (IS-ENES2, Metafor), the US (ESG) and other national sources (e.g. NERC for the UK).
  • 10. Summary of the CMIP5 example. The climate problem needs: – major physical e-infrastructure (networks, supercomputers); – comprehensive information architectures covering the whole information life cycle, including annotation (particularly of quality) … and hard work populating these information objects, particularly with provenance detail; – sophisticated tools to produce and consume the data and information objects; – state-of-the-art access control techniques. Major distributed systems are social challenges as much as technical challenges. CMIP5 is Big Data, with lots of different participants and lots of different technologies. It also has a community willing to work together to standardise and automate data and metadata production and curation, and willing to support the effort needed for openness.
  • 11. Big Data: • Industrialised and standardised data and metadata production • Large groups of people involved • Established methods for making the data open, and for attribution and credit for data creation. Long Tail Data: • Bespoke data and metadata creation methods • Small groups/lone researchers • No generally accepted methods for attribution and credit for data creation. Often the data is closed simply because no one has put in the effort to open it. https://flic.kr/p/g1EHPR
  • 12. Most people have an idea of what a publication is
  • 13. Some examples of data (just from the Earth Sciences) 1. Time series, some still being updated e.g. meteorological measurements 2. Large 4D synthesised datasets, e.g. Climate, Oceanographic, Hydrological and Numerical Weather Prediction model data generated on a supercomputer 3. 2D scans e.g. satellite data, weather radar data 4. 2D snapshots, e.g. cloud camera 5. Traces through a changing medium, e.g. radiosonde launches, aircraft flights, ocean salinity and temperature 6. Datasets consisting of data from multiple instruments as part of the same measurement campaign 7. Physical samples, e.g. fossils
  • 14. Open Data is not a new idea Henry Oldenburg
  • 15. Data, Reproducibility and Science. Science should be reproducible: other people doing the same experiments in the same way should get the same results. Observational data is not reproducible (unless you have a time machine). Therefore we need access to the data to confirm that the science is valid! Poor data analysis generates false facts, and false facts and inaccessible data undermine science and its credibility. http://www.flickr.com/photos/31333486@N00/1893012324/sizes/o/in/photostream/
  • 16. A crisis of reproducibility and credibility? The data providing the evidence for a published concept MUST be concurrently published, together with the metadata. To do otherwise is scientific MALPRACTICE. Pre-clinical oncology: 89% not reproducible. Why? • Misconduct/fraud • Invalid reasoning • Absent or inadequate data and/or metadata
  • 17. We’re only going to get more data: more big data, linked data, machine learning, the internet of things. So, what must we do? • Concurrently publish data and metadata that are the evidence for a published scientific claim – to do otherwise is malpractice • Data science skills for researchers • Re-establish standards of reproducibility for a data-intensive age
  • 18. Scientific Opportunities of Big Data • Patterns not hitherto seen • Unsuspected relationships • Integrated analysis of diverse data (e.g. natural & social science) • Complex systems, e.g. complexity: dynamic evolution and system state. But not all research is, or needs to be, data-intensive. https://www.clickz.com/clickz/column/2389218/create-better-content-via-humor
  • 20. The Open Data Edifice. Foundations: Openness. Data supporting a published claim; other data for re-use & integration. Pillars of the Digital Revolution: Big Data (volume, velocity, variety, veracity); Linked Data (many databases, semantic relations, deeper meaning); machine analysis & learning.
  • 21. It is happening: bottom-up Open Data initiatives. Open Data initiatives in the areas of life sciences; Earth science, environmental science; food science; agricultural science; chemical crystallography; bioinformatics/genomics; linguistics; social sciences; evolutionary biology; biodiversity; astronomy; Earth observation (GEO); archaeology; atmospheric sciences. EMBL-EBI services: labs around the world send us their data and we… archive it, classify it, share it with other data providers, analyse, add value and integrate it, …provide tools to help researchers use it. A collaborative enterprise: the Elixir programme.
  • 22. The Open Data Iceberg: the Technical Challenge; the Consent Challenge; the Institutional Challenge; the Funding Challenge; the Support Challenge; the Skills Challenge; the Incentives Challenge; the Mindset Challenge. Processes & Organisation; People; Technology. Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD. A National Infrastructure.
  • 23. From the Accord: Responsibilities. Scientists: i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed. ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
  • 24. Creating a dataset is hard work! "Piled Higher and Deeper" by Jorge Cham www.phdcomics.com Documenting a dataset so that it is usable and understandable by others is extra work!
  • 25. “I’m all for the free sharing of information, provided it’s them sharing their information with us.” Mustrum Ridcully, D.Thau., D.M., D.S., D.Mn., D.G., D.D., D.C.L., D.M. Phil., D.M.S., D.C.M., D.W., B.El.L, Archchancellor, Unseen University, Ankh-Morpork, Discworld, as quoted in “Unseen Academicals” by Terry Pratchett. http://discworld.wikia.com/wiki/Mustrum_Ridcully
  • 26. Open is not enough! “When required to make the data available by my program manager, my collaborators, and ultimately by law, I will grudgingly do so by placing the raw data on an FTP site, named with UUIDs like 4e283d36-61c4-11df-9a26-edddf420622d. I will under no circumstances make any attempt to provide analysis source code, documentation for formats, or any metadata with the raw data. When requested (and ONLY when requested), I will provide an Excel spreadsheet linking the names to data sets with published results. This spreadsheet will likely be wrong -- but since no one will be able to analyze the data, that won't matter.” http://ivory.idyll.org/blog/data-management.html https://flic.kr/p/awnCQu
  • 27. Incentives for Open Data • Need reward structures and incentives for researchers to encourage them to make their data open • Data citation and publication • (again, issues with treating data as a special case of publications…)
  • 29. The Understandability Challenge: Data. What the data set looks like on disk; what the raw data files look like. I could make these files open easily, but no one would have a clue how to use them!
  • 30. “It’s ok, I’ll just put it out there and if it’s important other people will figure it out.” These documents have been preserved for thousands of years! But they’ve both been translated many times, with different meanings each time. (Phaistos Disk, 1700 BC) We need metadata to preserve information: we can’t rely on data archaeology.
  • 32. It’s not just data! • Experimental protocols • Workflows • Software code • Metadata • Things that went wrong! • …
  • 33. Usability, trust, metadata. When you read a journal paper, it’s easy to skim it and get a quick sense of its quality. You don’t want to download many GB of data just to open it and see whether it’s any use to you. You need to use proxies for quality: • Do you know the data source/repository? Can you trust it? • Is there enough metadata for you to understand and/or use the data? In the same way that not all journal publishers are created equal, not all data repositories are created equal. Example metadata from a published dataset: “rain.csv contains rainfall in mm for each month at Marysville, Victoria from January 1995 to February 2009”. Lindenmayer, David B.; Wood, Jeff; McBurney, Lachlan; Michael, Damian; Crane, Mason; MacGregor, Christopher; Montague-Drake, Rebecca; Gibbons, Philip; Banks, Sam C. (2011): rain; Dryad Digital Repository. http://doi.org/10.5061/DRYAD.QP1F6H0S/3 http://trollcats.com/2009/11/im-your-friend-and-i-only-want-whats-best-for-you-trollcat/
  • 34. Should ALL data be open? Most data produced through publicly funded research should be open. But! • Confidentiality issues (e.g. named persons’ health records) • Conservation issues (e.g. maps of locations of rare animals at risk from poachers) • Security issues (e.g. data and methodologies for building biological weapons). There should be a very good reason for publicly funded data not to be open.
  • 36. Getting scooped http://www.phdcomics.com/comics/archive.php?comicid=795 It happened to me! I shared my data with another research group. They published the first results using that data. I wasn’t a co-author. I didn’t get an acknowledgement.
  • 37. Citeable does not equal Open! Just like you can cite a paper that is behind a paywall, you can cite a dataset that isn’t open. Making something citeable means that: • You know it exists • You know who’s responsible for it • You know where to find it • You know a little bit about it (title, abstract,…) Even if you can’t download/read the thing yourself. Citation gives benefits that encourage data producers to make their data open
  • 38. Be careful of your citations!
  • 39. A direction of travel? Inputs and outputs: administrative data (held by public authorities, e.g. prescription data); public sector research data (e.g. Met Office weather data); research data (e.g. CERN, generated in universities); research publications (i.e. papers in journals). Open access; open data; open science. Collecting the data; doing research; doing science openly. Stakeholders: researchers, government & public sector, businesses, citizens, citizen scientists (communication/dialogue – joint production of knowledge). • Communication/dialogue must be audience-sensitive • Is it – with all stakeholder groups?
  • 40. Summary and maybe conclusions? • We need to open the products of research • to encourage innovation and collaboration • to give credit to the people who've created them • to be transparent and trustworthy • Openness does come at a cost! • It's not enough for data to be open • it needs to be usable and understandable too • Data citation and publication are ways of encouraging researchers to make their data open • or at least to tell the world that their data exists! • We need a culture change, but it's already happening! http://www.keepcalm-o-matic.co.uk/default.asp
  • 41. Thanks! Any questions? sarah.callaghan@stfc.ac.uk @sorcha_ni http://citingbytes.blogspot.co.uk/ "Publishing research without data is simply advertising, not science" - Graham Steel http://blog.okfn.org/2013/09/03/publishing-research-without-data-is-simply-advertising-not-science/ http://heywhipple.com/dont-show-me-a-something-about-show-me-something/

Editor's Notes

  1. This is Henry Oldenburg, the first secretary of the newly formed Royal Society in the early 1660s. Henry was an inveterate correspondent with those we would now call scientists, both in Europe and beyond. Rather than keep this correspondence private, he thought it would be a good idea to publish it, and persuaded the new Society to do so by creating the Philosophical Transactions, which remains a top-flight journal to the present day. But he demanded two things of his correspondents: that they should write in the vernacular and not in Latin; and that the evidence (data) supporting a concept must be published together with the concept. This permitted others to scrutinize the logic of the concept and the extent to which it was supported by the data, and it permitted replication and re-use. Open publication of concept and evidence is the basis of "scientific self-correction", which historians of science argue was the crucial building block on which the scientific revolution of the 18th and 19th centuries was built, and which remains fundamental to the progress of science. Openness to scrutiny by scientific peers is the most powerful form of peer review.
  2. The fundamental challenge is to scientific self-correction. Journals can no longer contain the data, and neither scientists nor journals have taken the obvious step of making the data relevant to a publication concurrently available in an electronic database. (An example is last year's Nature paper revealing that only 11% of results in 50 benchmark papers in pre-clinical oncology were replicable.) If a lack of Oldenburg's rigour in presenting evidence is widespread, a failure of replicability risks undermining science as a reliable way of acquiring knowledge, and can therefore undermine its credibility.
  3. Lots of interchangeable and fluid terms but many shared principles. The word “science” is used to mean the systematic organisation of knowledge that can be rationally explained and reliably applied. It is not exclusively restricted to “natural science”.