Building the FAIR Research Commons: A Data Driven Society of Scientists

Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
FAIR
Research
Commons
Symposium: The Future of a Data-Driven Society, Maastricht University, 25 Jan 2018

Data-Driven Science
Simulations, data exploration, data processing, analytics, text mining,
visual analytics, automated inference…
e-Science:
enabling Data Driven Science
e-Infrastructure:
enabling e-Science
Distributed computing
Data management, Catalogues
Virtual Research Environments
Metadata & Semantic Web technologies
Software Engineering Products and Services
Collaboration, Sharing & Publishing Platforms

Open
Science
Open Data
Reproducible Science
Personally Productive Science

“The FAIR Guiding Principles for scientific data management and stewardship”
Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
Principles
Metadata
Identifiers
Access policies
Technical, Political,
Social,
Economic:
A Flag,
A Meme

The Future of a Data-Driven Society
A Society of Scientists
Do Data Driven Science
Data Driven Scholarship
Data contributors,
curators, consumers
Biodiversity Scientists +
Research Infrastructure Techies
Project Teams… of Individuals
Collaborating and Competing Simultaneously

Knowledge Turning
Increase Flow of Information
• Across scattered resources, platform, people
• Coordination, collaboration
• Cumulative, Dynamic
[original figure: Josh Sommer]
Cumulative
Commons
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013, ISBN: 978-3-642-37186-8

• Distributed, Fragmented, Siloed
• No single entry point
• Living software, models, data, catalogues, tools …
What’s the Commons?
Resources
• collectively created
• owned or shared
• between or among a
community
Governance
https://scholarlycommons.org/

Macro, Micro*, pooled
• public resources
• data centres
• journals
• dedicated projects
• governance
• majority of
researchers
• labs & universities
• generators
• my resources
*Meso too, but too complicated for 20 minutes! See
http://www.knowledge-exchange.info/event/ke-approach-open-scholarship

Some Data-driven Predictive Science
in Ecological Niche Modelling
predatory fish
the grazer endemic alga
[Obst, Leidenberger]

Do Research
Research Infrastructure
Services
Assemble
Methods, Materials, Experiment
Observe, Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Marketplace
Services
Publish
Share
Results
Any
research
product
Selected
products
Manage
Results
The Data-Driven Open Science
Public + Personal Commons
Science 2.0 Repositories: Time for a Change in Scholarly Communication, Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015

“The questions don’t change but the
answers do” Dan Reed, Microsoft
Salami Slicing, Scattering

101 Innovations in Scholarly Communication: the Changing Research Workflow, Bosman and Kramer, 2015,
http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826

Research
Infrastructure
Services
Assemble
Observe, Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Marketplace
Services
Share
Results
Manage
Results
Building a FAIR Research Commons
Portable
Automated
Reproducible
Methods
Supporting
Collaborations
Science 2.0 Repositories: Time for a Change in Scholarly Communication
Assante, Candela, Castelli, Manghi, Pagano, DOI: 10.1045/january2015-assante
Mesirov, J. Accessible Reproducible Research, Science
327(5964), 415-416 (2010)

Clear steps
Transparent
Comprehensible
Replicable
Logged
Accessible
Provenance
Standardised
Harmonised
Combined
Method
Materials
Variations X N
Repeat. Compare.
Log & Track
Provenance
Scale
Data-driven Science, Predictive Science
is Software-driven, Method-Driven
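A minimal sketch of "Variations X N, Repeat. Compare. Log & Track Provenance" in Python. It assumes a hypothetical `toy_model` function standing in for a real simulation or analysis step; every parameter combination is run for every seed and logged with a `run_id` derived from its inputs, so identical runs can be recognised and repeated runs compared. A workflow system would normally do this bookkeeping; this only illustrates the idea.

```python
import hashlib
import itertools
import json
import random
from datetime import datetime, timezone


def toy_model(growth_rate: float, capacity: float, seed: int) -> float:
    """Hypothetical stand-in for a real simulation: noisy logistic growth."""
    rng = random.Random(seed)  # seeded, so the same inputs give the same output
    population = 1.0
    for _ in range(50):
        population += growth_rate * population * (1 - population / capacity)
        population += rng.gauss(0, 0.01)
    return population


def run_sweep(grid: dict, seeds: list) -> list:
    """Run every parameter combination for every seed, logging provenance."""
    log = []
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        for seed in seeds:
            result = toy_model(seed=seed, **params)
            # The run_id depends only on the inputs, not on when the run happened.
            key = json.dumps({"params": params, "seed": seed}, sort_keys=True)
            log.append({
                "run_id": hashlib.sha256(key.encode()).hexdigest()[:12],
                "params": params,
                "seed": seed,
                "result": result,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
    return log
```

For example, `run_sweep({"growth_rate": [0.1, 0.2], "capacity": [10.0, 20.0]}, seeds=[1, 2])` yields eight logged runs, and calling it again reproduces the same results because each run is seeded.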

Data Science, Analytics
Machine learning
Discovery, New algorithms
Data stewardship
Standardisation, Harmonisation,
Annotation and enrichment,
Maintaining access, preserving
Software stewardship
Updates, versions, porting
Prep & Processing
Data wrangling & curation
Instrument pipelines
Simulation sweeps

Method Commodities
Workflows ASAP
Automate, Scale, Abstract, Provenance
Taverna 14th Anniversary

Methods
techniques, algorithms, spec.
of the steps, models, versions,
robustness, statistical power …
Materials
datasets, parameters, thresholds,
versions, algorithm seeds, reference
datasets…
Instruments
tools, codes, services, scripts,
underlying libraries, versions,
workflows…
Laboratory
computational environment,
High performance access,
Operating system…
Data Instruments -> Data Scopes
Method Objects, fragile, updating ….
Maintain for Running
Document for Reading
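A hedged sketch of recording Materials, Instruments and Laboratory alongside a run, using only the Python standard library. The helper name `describe_run` and the record layout are hypothetical, not a standard; packages that are not installed are recorded as `None` rather than guessed.

```python
import importlib.metadata
import platform


def describe_run(materials: dict, packages: list) -> dict:
    """Hypothetical helper: record what a run used so it can be documented and rerun."""
    instruments = {}
    for name in packages:
        try:
            instruments[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            instruments[name] = None  # not installed: record the gap, do not guess
    return {
        "materials": materials,      # datasets, parameters, thresholds, seeds
        "instruments": instruments,  # tool and library versions
        "laboratory": {              # computational environment
            "python": platform.python_version(),
            "implementation": platform.python_implementation(),
            "os": platform.system(),
            "os_release": platform.release(),
            "machine": platform.machine(),
        },
    }
```

A record such as `describe_run({"dataset": "occurrences.csv", "threshold": 0.5, "seed": 42}, ["numpy"])` can be serialised with `json.dumps` and kept next to the results, so the run is documented for reading as well as maintained for running.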

Software is a first class member of
Data-driven Science
56% Of UK researchers
develop their own
research software
or scripts
73% Of UK researchers
have had no
formal software
engineering
training
Survey of researchers from 15 Russell Group universities conducted by the Software Sustainability Institute (SSI) between August and October 2014.
406 respondents covering a representative range of funders, disciplines and seniority.
Goble, Better Software, Better Research, IEEE Internet Computing, doi: 10.1109/MIC.2014.88
De Roure, Goble, Software Design for Empowering Scientists, IEEE Software, doi: 10.1109/MS.2009.22
Research Software Engineers
National Capability

10th Anniversary
Workflow Commons
Groups
Social collaboration, credit and
citation around Research Objects
Replicate- Reproduce - Remix -Repurpose
Reuse – Repurpose – avoid Reinvent

FAIR Workflow
Research Object
Reproducibility, Portability, Repurpose
Repair. Preservation,
Executable Publishing
Metadata
Object
metadata, ontologies,
identifiers
Manifest
Provenance
Dependencies
Versions
Checklists
Annotations
Container
System
researchobject.org
Unbounded Objects
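A minimal sketch of the manifest idea behind a Research Object: aggregate every file of a bundle with a checksum so that a later reader can detect what changed. The field names (`title`, `version`, `aggregates`, `sha256`) are hypothetical and do not follow the researchobject.org or RO-Crate specifications; provenance, dependencies and annotations are left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, title: str, version: str) -> dict:
    """Aggregate every file under root with its checksum (hypothetical field names)."""
    files = sorted(p for p in root.rglob("*") if p.is_file() and p.name != "manifest.json")
    return {
        "title": title,
        "version": version,
        "aggregates": [
            {"path": p.relative_to(root).as_posix(), "sha256": sha256_of(p)}
            for p in files
        ],
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the paths that are missing or whose content no longer matches."""
    changed = []
    for entry in manifest["aggregates"]:
        path = root / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(entry["path"])
    return changed
```

Checksums make the bundle self-checking: if a data file is edited after release, `verify` reports it, which is one small step towards versioned, repairable objects.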

FAIR Methods, Different workflow systems
Living
Products

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Don’t Publish, Release
Analogous to software
products and practices
rather than data or
articles
Agile Data-driven
Science
Treat ALL Products and
ALL Research Like Software
“evolving
manuscript”
Sir Mark Walport, Times Higher Education, 14 May 2015

Context
Relationships
Credit
Research Goods FAIR Exchange
Governance
Stewardship
Credit
Tracking
Lifecycles
Fixivity…
Arxiv,
my Lab
myExperiment
GitHub,
Web Service myWebSite
bioModels.org,
openModeller
PubMed
Spreadsheet in
figshare
ArrayExpress,
BioSamples,
PRIDE, GBIF,
my Lab,
institutional
repository
Overlaying the
Research Commons
ecosystem
Unbounded
Composite
Living
ROs

Tracking, credit mining, comparison, auto-
metadata, blockchain, boundary objects….
A FAIR Knowledge Web of Research Objects
Map across metadata
Threaded publications
Navigate, Pivot-Focus, Cite
Self-describing

Unit for Reproducibility / Productivity, Portability,
Preservation, Executable Publishing
researchobject.org
Bechhofer et al. (2013) Why linked data is not enough for scientists, https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Linked Data / Semantic Web
FAIR machine processable metadata
Standards-based generic
metadata framework
Provenance
Dependencies
Versions
Checklists
Annotations

The time is right …
Reproducible Document
Stack project
Social
Technology, Process
Purpose
Publishers, Research
Infrastructures, Communities,
Library services, Agencies ….
Not Joe Public…

Research
Infrastructure
Services
Assemble
Observe, Simulate
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Marketplace
Services
Share
Results
Manage
Results
Building a FAIR Research Commons
Portable
Automated
Reproducible
Methods Supporting collaborations
to make & exchange FAIR
content

Systems Biology Projects
• SME multi-disciplinary groups
• Multi-site collaborations
• Competing
• Experimentalists, dry modellers
• Self-deposit, no stewardship skills
• Funder driven sharing
modellers
experimentalists
Build a Project Commons!!
• Foster stewardship
• Stimulate sharing
• Ensure retention
• Respect global community,
local project resources
http://fair-dom.org Wolstencroft et al., Nucleic Acids Research, 2016, doi: 10.1093/nar/gkw1032

3 Studies
Model analysis,
construction, validation
24 Assays/Analysis
Simulations,
characterisations
Structured organisation
Retain context in one place, Release FAIR products
Use and deposit in the fragmented resources [Penkler, Snoep]
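The Studies and Assays here reflect the Investigation, Study, Assay (ISA) style hierarchy that FAIRDOM-SEEK uses to organise a project. The sketch below is a hypothetical in-memory model of that structure, assuming data files, models and SOPs are attached to assays and publications to studies, to show how a question such as "Where is the validation data for this model?" becomes a simple traversal. It is not the FAIRDOM-SEEK data model or API, and all asset names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    title: str
    data_files: list[str] = field(default_factory=list)
    models: list[str] = field(default_factory=list)
    sops: list[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: list[Assay] = field(default_factory=list)
    publications: list[str] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)

    def data_for_model(self, model: str) -> list[str]:
        """Data files recorded in any assay that also holds the given model."""
        found = set()
        for study in self.studies:
            for assay in study.assays:
                if model in assay.models:
                    found.update(assay.data_files)
        return sorted(found)

    def data_for_publication(self, publication: str) -> list[str]:
        """Data files from every study that cites the given publication."""
        found = set()
        for study in self.studies:
            if publication in study.publications:
                for assay in study.assays:
                    found.update(assay.data_files)
        return sorted(found)
```

Keeping context in one structure is what makes such queries cheap; the assets themselves can still live in, and be deposited to, the external resources.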

FAIRDOMHub Systems Biology Commons
http://fairdomhub.org
Distributed Commons, Integrated View
“During and within” publishing
Simulate
Compare
Validate
10th Anniversary

What methods are being used to
determine enzyme activity?
What SOP was used for this
sample?
Where is the validation data for this
model?
Is there any group
generating kinetic data?
Is this data available?
Track versions of my model
What's the relationship between the
data and model?
Which data belong to
which publications?
Self-controlled spaces
• enclaves -> public
Discover own assets
One entry point
• over external systems

Project Pals
Post-docs, Postgrads,
Data stewards
Building the Commons so they Come
The Programme Funders
Stewardship
Support

The Tragedy of the Commons? FAIR Play?
Values
of assets
of reproducibility
of metadata
economics of infrastructure
priorities
Behaviours
enclave sharing
hoarding, flirting, voyeurism
consumer-producer asymmetry
playground rules
Sweatshop
collaborating but competing
burden - time, skills
short term, shortcuts
principal investigators
tools & templates
seamless join-up
automation, stewards
reproducibility debt is hard
The last mile

Self
Retention, Access
Productivity
Quick, Lightweight
Simple
Short Term
Credit
Trusted & Free
Just Enough
Skills?
Service
Sharing
Reproducibility
Accurate, Reusable
Rich
Long Term
Credit
Sustained
Just in Case
Stewards
Pushing FAIR upstream

“Sloppy Science Wins”
John Ioannidis,
Stanford School of Medicine
Open Science Fair, Athens 2017

Social
Technology, Process
Professional
Stewardship
Ramps
Defeating
Cultural Inertia
Overcoming
The Tragedy of the Commons
Paying for it

By side effect – metadata for FAIR
Universal tagging of Life
Science datasets, tools,
protocols, training materials
Web scale knowledge graph
Embedded ontologies and
metadata templates
Metadata harvesting by
stealth
https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
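Metadata harvesting "by stealth" relies on metadata embedded in ordinary web pages; Bioschemas, for example, builds on schema.org markup, which is commonly embedded as JSON-LD inside a script element. A minimal sketch of a harvester using only Python's `html.parser`, assuming JSON-LD embedding (microdata and RDFa are not handled); the sample page and its Dataset description are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole harvest


def harvest(html: str) -> list:
    """Return every embedded JSON-LD record found in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records
```

Usage: for a page whose head carries `{"@context": "https://schema.org", "@type": "Dataset", "name": "Example records"}` in such a script element, `harvest(page)` returns that record as a dict, ready to be added to a knowledge graph without the page author doing anything beyond publishing the markup.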

Ask what you and Data Science
can do for the FAIR Commons

Building the
FAIR Research Commons:
A Data Driven Society of Scientists
Release FAIR
Research Objects
Manage
Datascopes
FAIR play incentives
FAIR
Research
Commons

All the members of the Wf4Ever team
Colleagues in Manchester’s
Information Management Group,
ELIXIR-UK, Bioschemas
http://www.researchobject.org
http://www.myexperiment.org
http://wf4ever.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://rightfield.org.uk
http://www.bioschemas.org
http://www.commonwl.org
http://www.bioexcel.eu
http://www.openphacts.org
https://www.force11.org/
Mark Robinson
Alan Williams
Jo McEntyre
Norman Morrison
Stian Soiland-Reyes
Paul Groth
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Alasdair Gray
Rafael Jimenez
Iain Buchan
Caroline Jay
Michael Crusoe
Katy Wolstencroft
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Josh Sommer
Matthias Obst
Jacky Snoep
David Gavaghan
Stuart Owen
Finn Bacall
Paolo Missier
Phil Crouch
Oscar Corcho
Dan Katz
Arfon Smith
David De Roure
Marco Roos
Massimiliano Assante
Paolo Manghi
