FAIRy Stories

FAIRy stories
for Christmas
Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
ELIXIR-UK, FAIRDOM, ISBE,
BioExcel CoE, Software Sustainability Institute
Open PHACTS
SWAT4HCLS 2017, 5th Dec 2017, Rome
Once upon a time in
a land far, far away
lived a KinG …
Who wanted all data
to be FAIR….
FAIRy Stories
Mark D. Wilkinson,
Michel Dumontier,
IJsbrand Jan Aalbersberg,
Gabrielle Appleton,
Myles Axton,
Arie Baak,
Niklas Blomberg,
Jan-Willem Boiten,
Luiz Bonino da Silva Santos,
Philip E. Bourne,
Jildau Bouwman,
Anthony J. Brookes,
Tim Clark,
Mercè Crosas,
Ingrid Dillo,
Olivier Dumon,
Scott Edmunds,
Chris T. Evelo,
Richard Finkers,
Alejandra Gonzalez-Beltran,
Alasdair J.G. Gray,
Paul Groth,
Carole Goble,
Jeffrey S. Grethe,
Jaap Heringa,
Peter A.C. ’t Hoen,
Rob Hooft,
Tobias Kuhn,
Ruben Kok,
Joost Kok,
Scott J. Lusher,
Maryann E. Martone,
Albert Mons,
Abel L. Packer,
Bengt Persson,
Philippe Rocca-Serra,
Marco Roos,
Rene van Schaik,
Susanna-Assunta Sansone,
Erik Schultes,
Thierry Sengstag,
Ted Slater,
George Strawn,
Morris A. Swertz,
Mark Thompson,
Johan van der Lei,
Erik van Mulligen,
Jan Velterop,
Andra Waagmeester,
Peter Wittenburg,
Katherine Wolstencroft,
Jun Zhao,
Barend Mons
Wilkinson Dumontier Schultes
Scientific Data 3, 160018 (2016)
doi:10.1038/sdata.2016.18
Queens…
And FAIRY GODMOTHERS
Machine Processable Metadata
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing
FAIR spread across the lands ……
VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
FAIR spread across the lands ……
Stakeholder FAIR Awareness
UK Institutional Research Data Management guidance*
* Jisc: Final Report FAIR in Practice, Nov 2017
Government,
Funder,
Publisher,
National &
International
Infrastructures…
Institutional
Researchers
FAIR spread across the lands …… BUT not
necessarily all the peoples
FAIR spread across the lands ……
Moral: Names are important
Spinning (metadata) straw
into gold
Be careful what you
promise…
Me Too!
staking claims
we { are | will be | always
have been } FAIR
a rallying flag
Hype
Curve
http://dx.doi.org/10.1101/225490
http://blog.ukdataservice.ac.uk/fair-data-assessment-tool/
http://fairmetrics.org/
Beware…
beauty is in the
eye of the
beholder
What’s FAIR from a cataloguer’s
perspective may be useless from
a biologist’s viewpoint
My Semantic FAIRy Stories
The Scientist and
the FAIR Commons
The MAGIC
Research Object
little semantics and
the big Web
The Scientists and the
FAIR Research
Commons
Supporting mixed
types and many
researchers
FAIR
The Scientists and the
FAIR Research
Commons
Find:
ID resolution
Faceted Navigation
Search, RDF
SPARQL endpoint, APIs
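The "ID resolution" item can be illustrated with a small sketch. It assumes only that a DOI resolves through the public https://doi.org/ resolver; the helper name `doi_to_url` is hypothetical and is not part of myExperiment or FAIRDOMHub.

```python
def doi_to_url(identifier: str) -> str:
    """Normalise a DOI written in a few common styles to an https://doi.org/ URL."""
    doi = identifier.strip()
    # Strip the prefixes that commonly appear on slides and in reference lists.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1038/sdata.2016.18"))
print(doi_to_url("http://dx.doi.org/10.1101/225490"))
```

Normalising identifiers at the point of entry is one small piece of the "identifier hygiene" that a commons depends on.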
A Commons for Workflows
myexperiment.org
A Commons for Systems Biology Projects
fairdomhub.org
investigation
study
assay/analysis
data
models
SOPs
Community & Project Commons
Structured
organisation
across standards
and types
Federation over
autonomous
resources
Laissez-Faire
Independent
Users
Ecosystem of
types, stores
and metadata
Own little houses: from straw to bricks
Permission controls
Staged sharing
Licenses
Negotiated access
Embargos
Open
Schema
Dublin core
Datacite,
DCAT, Bioschemas
Catalogue
Level
Investigation
Studies
Assay/Analysis
Content
level
Persistent Identifiers
Content level
subject thematic standards
Content
level
Stratified
Linked Data
Getting the best FAIR metadata….
FAIR Access
– myExperiment -> open
– FAIRDOM -> friends and family
– Hand over straw houses to FAIRDOMHub
“The Tragedy of the Commons”*
– Metadata quality and quantity
– Identifier hygiene
– Curation & contributions
– Public good vs personal burden
– Incorporation into processes
– Community socialisation - obligations mismatches. Credit!
*Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
project PIs, funders
time
burden, distrust
project PIs, funders
PALs – juniors, advocates and
Cinderellas
templates, tools
benefit
Moral: Incentives
Bake in
“Semantic Nudging”
Ontologies stealthily embedded
in Excel spreadsheet templates
Added value -
Model execution
Vanity, guilt, shaming
Automation
rightfield.org.uk
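The idea of "semantic nudging" can be sketched as follows: a spreadsheet column only accepts labels from a controlled vocabulary, and each accepted label is quietly annotated with its term identifier. This is an illustration of the idea only, not how RightField is implemented; the vocabulary, the `EX:` identifiers and the function name `annotate_rows` are all hypothetical.

```python
import csv
import io

# Hypothetical controlled vocabulary (label -> term ID). A real template
# would draw its terms from a community ontology.
ALLOWED_TERMS = {"glucose": "EX:0001", "lactate": "EX:0002", "pyruvate": "EX:0003"}


def annotate_rows(csv_text, column):
    """Return (annotated rows, problems) for one vocabulary-restricted column."""
    annotated, problems = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        label = row[column].strip().lower()
        term_id = ALLOWED_TERMS.get(label)
        if term_id is None:
            # Report the offending cell so the user can pick an allowed term.
            problems.append((line_no, row[column]))
        else:
            # The user typed a familiar label; the term ID is added for them.
            annotated.append({**row, column + "_term": term_id})
    return annotated, problems


SHEET = "sample,metabolite\ns1,Glucose\ns2,sugar\ns3,lactate\n"
annotated, problems = annotate_rows(SHEET, "metabolite")
print(annotated)
print(problems)
```

The researcher keeps working in a spreadsheet; the ontology term travels with the data without anyone having to type an identifier.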
Cinderella?
The Spreadsheet
“The Last Mile”* -> The First Mile
FAIR from bench to cloud
Last mile - Infrastructure
view
First mile - researcher /
resource view
* Dimitrios Koureas et al., Community engagement: The ‘last mile’ challenge for
European research e-infrastructures,
Research Ideas and Outcomes 2: e9933 (20 Jul 2016)
https://doi.org/10.3897/rio.2.e9933
the generic vs specific zig zag path
The MAGIC Research
OBJECT
GENERIC Framework
For exchange,
reproducibility,
Preservation, active
artefacts
Universal Catering,
bottomless content
FAIR
The FAIR Research Object
import, exchange, portability, maintenance
ISA-TAB
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project,
BMC Bioinformatics 2014, 15:369
workflow engine
Workflow Run
Provenance
Inputs Outputs
Intermediates
Parameters
Configs
Narrative
Exchange: between people & platforms
Commons: store, catalogue & archive
Reproduce: preserve, port, repair
Activate: re-compute, mix, compare, evolve
The FAIR Workflow Research Object
researchobject.org
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Standards-based generic
metadata framework for
bundling internal and external
resources with context
citable reproducible packaging
Data used and results produced in study
Methods employed to produce/analyse data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources:-
understanding & interpretation
Linking across ROs and into the
Linked Open Data Cloud
• Recording & linking together the
components of an experiment
• Linking across experiments.
• Linked ROs
• A Semantic Web of Research
Objects
• Resource References – a
bottomless pot
Technology Independent.
The least possible.
The simplest feasible. Low tech.
Low user overhead and thin client
Graceful degradation.
FAIR ROs Desiderata
Construction Content Profile
Types
Identification: to locate things
Aggregates: to link things together
Annotations: about things & their relationships
Type Checklists: what should be there
Provenance: where it came from
Versioning: its evolution
Dependencies: what else is needed
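The desiderata above can be pictured as fields of a single manifest. The sketch below is a simplified illustration and does not follow the Research Object specifications field for field: the key names, the example.org URLs, the placeholder ORCID and the function `build_manifest` are all assumptions made for the example.

```python
import json


def build_manifest(ro_id, resources, annotations, created_by, version, depends_on=()):
    """Assemble a manifest dict; the field names are illustrative only."""
    known = {resource["uri"] for resource in resources}
    for annotation in annotations:
        # An annotation must be about something the object aggregates.
        if annotation["about"] not in known:
            raise ValueError(f"annotation targets unknown resource: {annotation['about']}")
    return {
        "id": ro_id,                       # Identification: to locate things
        "aggregates": list(resources),     # Aggregates: to link things together
        "annotations": list(annotations),  # Annotations: about things & their relationships
        "createdBy": created_by,           # Provenance: where it came from
        "version": version,                # Versioning: its evolution
        "dependsOn": list(depends_on),     # Dependencies: what else is needed
    }


manifest = build_manifest(
    ro_id="https://example.org/ro/study-1/",
    resources=[
        {"uri": "data/results.csv", "type": "data"},
        {"uri": "workflow/analysis.cwl", "type": "workflow"},
    ],
    annotations=[{"about": "data/results.csv", "content": "annotations/results-notes.txt"}],
    created_by="https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    version="1.0",
    depends_on=["https://example.org/ro/reference-data/"],
)
print(json.dumps(manifest, indent=2))
```

Aggregated resources are referenced by URI rather than copied in, which is what lets the object point at external content of any size.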
Manifest checklist
Type Checklists
describing what
should be there
Container
Metadata
Objects
Construction
http://www.researchobject.org/specifications/
RO Model
Identifiers: URI, RRI,
DOI, ORCID
W3C Web
Annotation Vocabulary
Open Archives Initiative
Object Exchange and Reuse
Aggregation
Annotation
Container
Content
Profiles.
Progression Levels
Container
Profile
http://purl.org/minim/description
W3C
Shape Specs
*Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked
Data", IEEE eScience 2012, Chicago, USA, October 2012, http://dx.doi.org/10.1109/eScience.2012.6404489
validators / viewers
Minim model for
defining
checklists*
multiple profiles for
different consumers
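A checklist with "multiple profiles for different consumers" can be sketched as a set of named rule lists evaluated against the same manifest. This is a simplification for illustration and does not use the Minim vocabulary itself: the MUST/SHOULD levels, the profile names, the manifest keys and the helper functions are all assumptions.

```python
def has_field(name):
    """Rule: the manifest has a non-empty top-level field."""
    return lambda manifest: bool(manifest.get(name))


def aggregates_type(kind):
    """Rule: at least one aggregated resource has the given (hypothetical) type."""
    return lambda manifest: any(
        resource.get("type") == kind for resource in manifest.get("aggregates", [])
    )


# Two hypothetical consumers looking at the same object with different needs.
PROFILES = {
    "archive": [
        ("MUST", "states who created it", has_field("createdBy")),
        ("MUST", "states a version", has_field("version")),
        ("SHOULD", "aggregates some data", aggregates_type("data")),
    ],
    "rerun": [
        ("MUST", "aggregates a workflow", aggregates_type("workflow")),
        ("MUST", "aggregates some data", aggregates_type("data")),
        ("SHOULD", "lists its dependencies", has_field("dependsOn")),
    ],
}


def evaluate(manifest, profile_name):
    """Check a manifest against one profile; only MUST failures block it."""
    failures = {"MUST": [], "SHOULD": []}
    for level, description, check in PROFILES[profile_name]:
        if not check(manifest):
            failures[level].append(description)
    return {"satisfied": not failures["MUST"], "failures": failures}


manifest = {
    "createdBy": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    "version": "1.0",
    "aggregates": [{"uri": "data/results.csv", "type": "data"}],
}
archive_report = evaluate(manifest, "archive")
rerun_report = evaluate(manifest, "rerun")
print(archive_report)
print(rerun_report)
```

The same object is good enough to archive but not to re-run, which is the point of keeping the container generic and the profiles specific.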
Generic
Specifics
RO-SHOW
Container
Linked Data
Pharmacological
Discovery Platform
Data Releases
Dataset “build”
RO Library
Earth Sciences
Public Health Learning Systems
Asthma Research e-
Lab sharing and
computing statistical
cohort studies
Happy Endings!
ISA based Packaging,
Systems Biology commons
& publishing
Managing distributed
unmovable large datasets
for Biomedical HTS
analytic pipelines *
* Chard et al I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets,
https://doi.org/10.1109/BigData.2016.7840618
Happy Ending – Workflows
Biomedical HTS analytic pipelines
Manifest description of
CWL workflows + rich
context + provenance +
other objects + snapshots
Precision medicine
NGS pipelines regulation*
*Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org,
2017, https://doi.org/10.1101/191783
EDAM
Biomolecular modelling
Portable Workflows
BagIT, JSON(-LD),
schema.org
https://dokie.li/
https://linkedresearch.org/
Manifest: Schema.org,
JSON-LD, RDF
Archive: .tar.gz
Reproducible Document
Stack project
eLife, Substance and Stencila
BagIT data profile +
schema.org JSON-LD
annotations
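The packaging pattern named above (a BagIt bag, JSON-LD annotations, a .tar.gz archive) can be sketched with the Python standard library alone. The `bagit.txt` declaration and the `manifest-sha256.txt` line format follow the BagIt specification; placing the annotations at `metadata/manifest.json`, the file names and the function `make_bag` are assumptions for illustration, not a description of any particular project's layout.

```python
import hashlib
import json
import tarfile
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict, annotations: dict) -> Path:
    """Write a minimal bag under bag_dir and return the path of its .tar.gz."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")  # checksum, then payload path
    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    # Assumed location for the JSON-LD annotations (a tag directory).
    meta_dir = bag_dir / "metadata"
    meta_dir.mkdir()
    (meta_dir / "manifest.json").write_text(json.dumps(annotations, indent=2), encoding="utf-8")
    archive = bag_dir.parent / (bag_dir.name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)
    return archive


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "study-bag"
    archive = make_bag(
        bag,
        payload={"results.csv": b"sample,value\ns1,42\n"},
        annotations={"@context": "https://schema.org", "@type": "Dataset", "name": "Example study"},
    )
    manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

print(manifest_text)
print(names)
```

Because the checksums sit beside the payload, a recipient can verify the bag without contacting the sender.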
Many Roads
Morals
Incremental, open frameworks hard work
– Extensive reuse of standards is tricky
– Too Generic vs Too Specific
– Multi-element type & nesting challenges
– ROs with a Purpose
– Examples & templates
Representational Beauty vs Tools
– Easy to make, hard to consume
– Be specific, be developer friendly
– Profiles & tools critical
Patience is a virtue
Bioschemas:
Little Semantics and
the big web
Being and keeping light,
small and viral
FAIR
Structured data markup for web pages
Schema.org adds simple
structured metadata markup to
web pages & sitemaps for
harvesting, search and summary
snippet making.
Search engines often highlight
websites containing Schema.org
Widespread commercial and
open source infrastructure
creates a low barrier to adoption
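As a minimal sketch of what such markup looks like, the snippet below builds a Schema.org description as JSON-LD and wraps it in the script element that a web page would carry. The type and property names (`ScholarlyArticle`, `name`, `identifier`, `url`) are standard Schema.org terms; the helper `jsonld_script` is a hypothetical name introduced for the example.

```python
import json


def jsonld_script(thing: dict) -> str:
    """Wrap a Schema.org description in a JSON-LD script element for a web page."""
    doc = {"@context": "https://schema.org", **thing}
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so that no string value can close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = jsonld_script(
    {
        "@type": "ScholarlyArticle",
        "name": "The FAIR Guiding Principles for scientific data management and stewardship",
        "identifier": "doi:10.1038/sdata.2016.18",
        "url": "https://doi.org/10.1038/sdata.2016.18",
    }
)
print(snippet)
```

The page stays readable for people; the embedded block is what harvesters and search engines pick up.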
Goldilocks & the 3 Use Cases
Standardised
metadata
mark-up
Metadata
published &
harvested
without APIs
or special
feeds
3 Use Cases
1. Finding/Citing,
2. Summary snippets
3. Metadata exchange /
ingest
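The "harvested without APIs" idea and the summary-snippet use case can be sketched together: a harvester reads the JSON-LD embedded in an ordinary page and builds a one-line summary from it. The page content, the example.org URL and the class name `JsonLdHarvester` are hypothetical; only the Python standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect every JSON-LD block embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


# Hypothetical page, as a site owner might publish it.
PAGE = """<html><head><title>Course</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "CreativeWork",
 "name": "Example training course", "url": "https://example.org/course"}
</script></head><body><p>Welcome</p></body></html>"""

harvester = JsonLdHarvester()
harvester.feed(PAGE)
snippets = [f'{item["name"]} ({item["@type"]})' for item in harvester.items]
print(snippets)
```

Nothing here depends on the publishing site offering an API or a special feed; the web page itself is the interface.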
Goldilocks
• Reuse ubiquitous
commercial platform
• The least possible change,
the max possible reuse
• Minimum properties – 6
• Reuse domain ontologies –
we are not reinventing
them!
Commodity
Off the Shelf tools
App eco-system
Repository Level
Content type level
Goldilocks & the 3 Use Cases
Training materials
Events
Organizations
Data
Software
Lab Protocols
schema.org tailored to the Biosciences for FAIR
simple structured metadata markup on web pages & sitemaps
bio.tools
• Specific for life sciences
• Extends existing Schema.org types
• Focused on few types and well defined relationships
• Minimum properties for finding and accessing data
• Best practices for selected properties
• Managed by Bioschemas.org
• Generic data model
• Generous list of properties to describe data types
• Managed by Schema.org
Tailored schema.org to improve
Findability and Accessibility in Bioscience
Layer of constraints +
documentation + extensions
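A "layer of constraints" over Schema.org can be pictured as a table that marks each property as minimum or recommended and says whether it takes one value or many. The profile below is hypothetical and is not an actual Bioschemas specification; the property choices and the function `check_markup` are assumptions for illustration.

```python
# Hypothetical profile: property -> (marginality, cardinality).
PROFILE = {
    "name": ("minimum", "ONE"),
    "url": ("minimum", "ONE"),
    "description": ("recommended", "ONE"),
    "keywords": ("recommended", "MANY"),
}


def check_markup(markup, profile=PROFILE):
    """Report missing properties and single-valued properties given as lists."""
    report = {"missing_minimum": [], "missing_recommended": [], "cardinality_errors": []}
    for prop, (marginality, cardinality) in profile.items():
        value = markup.get(prop)
        if value in (None, "", []):
            report["missing_" + marginality].append(prop)
        elif cardinality == "ONE" and isinstance(value, list):
            report["cardinality_errors"].append(prop)
    return report


markup = {
    "@type": "Dataset",
    "name": "Example dataset",
    "url": ["https://example.org/a", "https://example.org/b"],  # two values where one is expected
    "keywords": ["fair"],
}
report = check_markup(markup)
print(report)
```

The underlying Schema.org type is unchanged; the profile only narrows which of its properties a community agrees to fill in.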
Leyla Garcia. Poster & Flashtalk
2-3 Oct 2017, Hinxton, ~50 people
Ideally 6 concepts
Reuse ontologies
schema.org
Real mark-up
Tools
Find, Cite, Snippets,
Metadata exchange
Community
http://www.france-bioinformatique.fr/en/training_material
https://search.google.com/structured-data/testing-tool
Applied Drupal 7 schema.org extension
Took about 2 hours
Included in TeSS in an hour
[Niall Beard]
MORALs
Community Buy-in Worth it
• First specs & main mechanism for training
• Google / Schema & ELIXIR support
• Research Schemas for European Open
Science Cloud pilot
Goldilocks works but is hard work
• Types & Profiles debates
• Elegance vs best for tools
• Reuse domain ontologies
• Validation, mark-up & harvesting tools
Trolls
How are we FAIRing?
Different levels with different emphasis
It's an Ecosystem, not a single solution
• Catalogues, Search, Stores
• Metadata Standards
• Standard Access protocols
• Identifiers, Policies
• Authorised Access
• Licensing
smart rebrand launch
Still hard, same stuff
Rally big communities
and grassroots initiatives
Examine our capabilities
There is no magic
FAIRy Land PEST
Political
Economic
Social
Technical
Platform & user buy-in from the get-go
Passionate, dedicated leadership
Seeding critical mass
Community
Tools Driver
Bottom-up initiatives fostered by big
umbrella infrastructures
FAIR Semantic Village*
Simple & Lightweight
Ramps not revolutions
FAIR with a PURPOSE & With PEOPLE
FAIR
Support typical developer –
Familiarity – JSON, APIs
*Deb McGuinness
Research for FAIR
FAIR representation
• The Semantic Web
Automated metadata
• Deep learning, machine learning, AI
• Text Mining, Ontology mapping
Social metadata
• User Experience, Crowd Sourcing
• Choice architecture
FAIR action
• Blockchain
• Virtualised & remote execution
• Image processing
• Preservation & portability
• Provenance tracking, object trajectories
• Engineering & Design, Ethics, Social Sciences
Research +
Developer Practitioner
practices
Mark Robinson
Norman Morrison
Paul Groth
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Caroline Jay
David De Roure
Oscar Corcho
Steve Pettifer
Khalid Belhajjame
Jun Zhao
Phil Crouch
Lilian Gorea
Oluwatomide Fasugba
Stian Soiland-Reyes
Michael Crusoe
Rafael Jimenez
Alasdair Gray
Barend Mons
Sean Bechhofer
Michel Dumontier
Mark Wilkinson
Leyla Garcia
Stuart Owen
Katy Wolstencroft
Finn Bacall
Alan Williams
Wolfgang Mueller
Olga Krebs
Jacky Snoep
Matthew Gamble
Raul Palma
Mark Musen
http://www.researchobject.org
http://www.myexperiment.org
http://wf4ever.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://rightfield.org.uk
http://www.bioschemas.org
http://www.commonwl.org
http://www.bioexcel.eu
http://www.openphacts.org
Study on Drug Drug Interaction Through Prescription Analysis of Type II Diabe...
Factors affecting fluorescence and phosphorescence.pptx by SamarthGiri1
Factors affecting fluorescence and phosphorescence.pptxFactors affecting fluorescence and phosphorescence.pptx
Factors affecting fluorescence and phosphorescence.pptx
SamarthGiri17 views
application of genetic engineering 2.pptx by SankSurezz
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptx
SankSurezz14 views
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ... by ILRI
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
ILRI5 views
How to be(come) a successful PhD student by Tom Mens
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
Tom Mens537 views
A giant thin stellar stream in the Coma Galaxy Cluster by Sérgio Sacani
A giant thin stellar stream in the Coma Galaxy ClusterA giant thin stellar stream in the Coma Galaxy Cluster
A giant thin stellar stream in the Coma Galaxy Cluster
Sérgio Sacani18 views
Structure of purines and pyrimidines - Jahnvi arora (11228108), mmdu ,mullana... by jahnviarora989
Structure of purines and pyrimidines - Jahnvi arora (11228108), mmdu ,mullana...Structure of purines and pyrimidines - Jahnvi arora (11228108), mmdu ,mullana...
Structure of purines and pyrimidines - Jahnvi arora (11228108), mmdu ,mullana...
jahnviarora9897 views
Discovery of therapeutic agents targeting PKLR for NAFLD using drug repositio... by Trustlife
Discovery of therapeutic agents targeting PKLR for NAFLD using drug repositio...Discovery of therapeutic agents targeting PKLR for NAFLD using drug repositio...
Discovery of therapeutic agents targeting PKLR for NAFLD using drug repositio...
Trustlife142 views
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F... by SwagatBehera9
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
Effect of Integrated Nutrient Management on Growth and Yield of Solanaceous F...
SwagatBehera95 views
Applications of Large Language Models in Materials Discovery and Design by Anubhav Jain
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
Anubhav Jain13 views
2. Natural Sciences and Technology Author Siyavula.pdf by ssuser821efa
2. Natural Sciences and Technology Author Siyavula.pdf2. Natural Sciences and Technology Author Siyavula.pdf
2. Natural Sciences and Technology Author Siyavula.pdf
ssuser821efa10 views
Open Access Publishing in Astrophysics by Peter Coles
Open Access Publishing in AstrophysicsOpen Access Publishing in Astrophysics
Open Access Publishing in Astrophysics
Peter Coles1.2K views
Light Pollution for LVIS students by CWBarthlmew
Light Pollution for LVIS studentsLight Pollution for LVIS students
Light Pollution for LVIS students
CWBarthlmew12 views
Exploring the nature and synchronicity of early cluster formation in the Larg... by Sérgio Sacani
Exploring the nature and synchronicity of early cluster formation in the Larg...Exploring the nature and synchronicity of early cluster formation in the Larg...
Exploring the nature and synchronicity of early cluster formation in the Larg...
Sérgio Sacani1.2K views

FAIRy Stories

  • 1. FAIRy stories for Christmas Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk ELIXIR-UK, FAIRDOM, ISBE, BioExcel CoE, Software Sustainability Institute Open PHACTS SWAT4HCLS 2017, 5th Dec 2017, Rome
  • 2. Once upon a time in a land far, far away lived a KinG … Who wanted all data to be FAIR….
  • 4. Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J.G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A.C ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, Barend Mons Wilkinson Dumontier Schultes Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  • 5. Queens… And FAIRY GODMOTHERS Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  • 6. Machine Processable Metadata Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18 • Catalogues, Search, Stores • Metadata Standards • Standard Access protocols • Identifiers, Policies • Authorised Access • Licensing
  • 7. FAIR spread across the lands …… VIVO/SciTS Conferences 6-8 August 2014, Austin, TX
  • 8. FAIR spread across the lands ……
  • 9. Stakeholder FAIR Awareness UK Institutional Research Data Management guidance* * Jisc: Final Report FAIR in Practice, Nov 2017 Government, Funder, Publisher, National & International Infrastructures… Institutional Researchers FAIR spread across the lands …… BUT not necessarily all the peoples
  • 10. FAIR spread across the lands ……
  • 11. Moral: Names are important Spinning (metadata) straw into gold Be careful what you promise…
  • 12. Me Too! staking claims we { are | will be | always have been } FAIR a rallying flag
  • 15. Beware… beauty is in the eye of the beholder What’s FAIR from a cataloguer’s perspective may be useless from a biologist’s viewpoint
  • 16. My Semantic FAIRy Stories The Scientist and the FAIR Commons The MAGIC Research Object little semantics and the big Web
  • 17. The Scientists and the FAIR Research Commons Supporting mixed types and many researchers FAIR
  • 18. The Scientists and the FAIR Research Commons Find: ID resolution Faceted Navigation Search, RDF SPARQL endpoint, APIs A Commons for Workflows myexperiment.org A Commons for Systems Biology Projects fairdomhub.org investigation study assay/analysis data models SOPs
  • 19. Community & Project Commons Structured organisation across standards and types Federation over autonomous resources Laissez-Faire Independent Users Ecosystem of types, stores and metadata
  • 20. Own little houses: from straw to bricks Permission controls Staged sharing Licenses Negotiated access Embargos Open
  • 21. Schema Dublin core Datacite, DCAT, Bioschemas Catalogue Level Investigation Studies Assay/Analysis Content level Persistent Identifiers Content level subject thematic standards Content level Stratified Linked Data
  • 22. Getting the best FAIR metadata…. FAIR Access – myExperiment -> open – FAIRDOM -> friends and family – Hand over straw houses to FAIRDOMHub “The Tragedy of the Commons”* – Metadata quality and quantity – Identifier hygiene – Curation & contributions – Public good vs personal burden – Incorporation into processes – Community socialisation - obligations mismatches. Credit! *Mark Musen, https://ncip.nci.nih.gov/blog/face-new-tragedy-commons-remedy-better-metadata/
  • 23. project PIs, funders time burden, distrust project PIs, funders PALs – juniors, advocates and Cinderellas templates, tools benefit
  • 25. Bake in “Semantic Nudging” Ontologies stealthily embedded in Excel spreadsheet templates Added value - Model execution Vanity, guilt, shaming Automation rightfield.org.uk
  • 27. “The Last Mile”* -> The First Mile FAIR from bench to cloud Last mile - Infrastructure view First mile - researcher / resource view * Dimitrios Koureas et al., Community engagement: The ‘last mile’ challenge for European research e-infrastructures, Research Ideas and Outcomes 2: e9933 (20 Jul 2016) https://doi.org/10.3897/rio.2.e9933
  • 28. the generic vs specific zig zag path
  • 29. The MAGIC Research OBJECT GENERIC Framework For exchange, reproducibility, Preservation, active artefacts Universal Catering, bottomless content FAIR
  • 30. The FAIR Research Object import, exchange, portability, maintenance ISA-TAB Bergman et al COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
  • 31. workflow engine Workflow Run Provenance Inputs Outputs Intermediates Parameters Configs Narrative Exchange between people & platforms Commons store, catalogue & archive Reproduce preserve, port, repair Activate re-compute, mix, compare, evolve The FAIR Workflow Research Object
  • 32. researchobject.org Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ Standards-based generic metadata framework for bundling internal and external resources with context citable reproducible packaging Data used and results produced in study Methods employed to produce/analyse data Provenance and settings for the experiments People involved in the investigation Annotations about these resources:- understanding & interpretation
  • 33. Linking across ROs and into the Linked Open Data Cloud • Recording & linking together the components of an experiment • Linking across experiments. • Linked ROs • A Semantic Web of Research Objects • Resource References – a bottomless pot
  • 34. Technology Independent. The least possible. The simplest feasible. Low tech. Low user overhead and thin client Graceful degradation. FAIR ROs Desiderata
  • 35. Construction Content Profile Types Identification to locate things Aggregates to link things together Annotations about things & their relationships Type Checklists what should be there Provenance where it came from Versioning its evolution Dependencies what else is needed Manifest checklist Type Checklists describing what should be there Container Metadata Objects
  • 36. Construction http://www.researchobject.org/specifications/ RO Model Identifiers: URI, RRI, DOI, ORCID W3C Web Annotation Vocabulary Open Archives Initiative Object Reuse and Exchange Aggregation Annotation Container
  • 38. Profile http://purl.org/minim/description W3C Shape Specs *Gamble, Zhao, Klyne, Goble. "MIM: A Minimum Information Model Vocabulary and Framework for Scientific Linked Data", IEEE eScience 2012, Chicago, USA (October 2012), http://dx.doi.org/10.1109/eScience.2012.6404489 validators / viewers Minim model for defining checklists* multiple profiles for different consumers Generic Specifics RO-SHOW Container
  • 39. Linked Data Pharmacological Discovery Platform Data Releases Dataset “build” RO Library Earth Sciences Public Health Learning Systems Asthma Research e-Lab sharing and computing statistical cohort studies Happy Endings! ISA based Packaging, Systems Biology commons & publishing Managing distributed unmovable large datasets for Biomedical HTS analytic pipelines* *Chard et al., I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618
  • 40. Happy Ending – Workflows Biomedical HTS analytic pipelines Manifest description of CWL workflows + rich context + provenance + other objects + snapshots Precision medicine NGS pipelines regulation* *Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al., Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783 EDAM Biomolecular modelling Portable Workflows
  • 41. BagIT, JSON(-LD), schema.org https://dokie.li/ https://linkedresearch.org/ Manifest: Schema.org, JSON-LD, RDF Archive: .tar.gz Reproducible Document Stack project eLife, Substance and Stencila BagIT data profile + schema.org JSON-LD annotations Many Roads
  • 42. Morals Incremental, open frameworks hard work – Extensive reuse of standards is tricky – Too Generic vs Too Specific – Multi-element type & nesting challenges – ROs with a Purpose – Examples & templates Representational Beauty vs Tools – Easy to make, hard to consume – Be specific, be developer friendly – Profiles & tools critical Patience is a virtue
  • 43. Bioschemas: Little Semantics and the big web Being and keeping light, small and viral FAIR
  • 44. Structured data markup for web pages Schema.org adds simple structured metadata markup to web pages & sitemaps for harvesting, search and summary snippet making. Search engines often highlight websites containing Schema.org Widespread commercial and open source infrastructure creates a low barrier to adoption
  • 45. Goldilocks & the 3 Use Cases Standardised metadata mark-up Metadata published & harvested without APIs or special feeds 3 Use Cases 1. Finding/Citing, 2. Summary snippets 3. Metadata exchange / ingest Goldilocks • Reuse ubiquitous commercial platform • The least possible change, the max possible reuse • Minimum properties – 6 • Reuse domain ontologies – we are not reinventing them! Commodity Off the Shelf tools App eco-system Repository Level Content type level
  • 46. Standardised metadata mark-up Metadata published & harvested without APIs or special feeds Commodity Off the Shelf tools App eco-system Repository Level Content type level Goldilocks & the 3 Use Cases
  • 47. Training materials Events Organizations Data Software Lab Protocols schema.org tailored to the Biosciences for FAIR simple structured metadata markup on web pages & sitemaps bio.tools
  • 48. schema.org tailored to the Biosciences simple structured metadata markup on web pages & sitemaps • Specific for life sciences • Extends existing Schema.org types • Focused on few types and well defined relationships • Minimum properties for finding and accessing data • Best practices for selected properties • Managed by Bioschemas.org • Generic data model • Generous list of properties to describe data types • Managed by Schema.org
  • 49. Tailored schema.org to improve Findability and Accessibility in Bioscience Layer of constraints + documentation + extensions Leyla Garcia. Poster & Flashtalk
  • 50. 2-3 Oct 2017, Hinxton, ~50 people Ideally 6 concepts Reuse ontologies schema.org Real mark-up Tools Find, Cite, Snippets, Metadata exchange Community
  • 52. MORALs Community Buy-in Worth it • First specs & main mechanism for training • Google / Schema & ELIXIR support • Research Schemas for European Open Science Cloud pilot Goldilocks works but is hard work • Types & Profiles debates • Elegance vs best for tools • Reuse domain ontologies • Validation, mark-up & harvesting tools Trolls
  • 53. How are we FAIRing? Different levels with different emphasis It’s an Ecosystem, not a single solution • Catalogues, Search, Stores • Metadata Standards • Standard Access protocols • Identifiers, Policies • Authorised Access • Licensing
  • 54. smart rebrand launch Still hard, same stuff Rally big communities and grassroots initiatives Examine our capabilities There is no magic
  • 56. Platform & user buy-in from the get-go Passionate, dedicated leadership Seeding critical mass Community Tools Driver Bottom up initiatives fostered by big umbrellas infrastructures FAIR Semantic Village* Simple & Lightweight Ramps not revolutions FAIR with a PURPOSE & With PEOPLE FAIR Support typical developer – Familiarity – JSON, APIs *Deb McGuinness
  • 57. Research for FAIR FAIR representation • The Semantic Web Automated metadata • Deep learning, machine learning, AI • Text Mining, Ontology mapping Social metadata • User Experience, Crowd Sourcing • Choice architecture FAIR action • Blockchain • Virtualised & remote execution • Image processing • Preservation & portability • Provenance tracking, object trajectories • Engineering & Design, Ethics, Social Sciences Research + Developer Practitioner practices
  • 58. Mark Robinson Norman Morrison Paul Groth Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Iain Buchan Caroline Jay David De Roure Oscar Corcho Steve Pettifer Khalid Belhajjame Jun Zhao Phil Crouch Lilian Gorea, Oluwatomide Fasugba Stian Soiland-Reyes Michael Crusoe Rafael Jimenez Alasdair Gray Barend Mons Sean Bechhofer Michel Dumontier Mark Wilkinson Leyla Garcia Stuart Owen Katy Wolstencroft Finn Bacall Alan Williams Wolfgang Mueller Olga Krebs Jacky Snoep Matthew Gamble Raul Palma Mark Musen http://www.researchobject.org http://www.myexperiment.org http://wf4ever.org http://www.fair-dom.org http://www.fairdomhub.org http://seek4science.org http://rightfield.org.uk http://www.bioschemas.org http://www.commonwl.org http://www.bioexcel.eu http://www.openphacts.org
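
Slides 44 to 48 describe Schema.org and Bioschemas as simple structured metadata embedded in web pages so that it can be harvested without APIs or special feeds. The following is a minimal Python sketch of that idea, not a Bioschemas profile: the slides mention six minimum properties but do not list them, so the six chosen here (`name`, `description`, `url`, `identifier`, `license`, `keywords`) are an assumption, and the helper names `dataset_jsonld` and `as_script_tag` and all example values are hypothetical.

```python
import json


def dataset_jsonld(name, description, url, identifier, license_url, keywords):
    """Build a schema.org Dataset description as a plain dict (JSON-LD)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "license": license_url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Serialise the dict into the <script> element a web page would embed."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON text cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record: every value below is invented for illustration.
example = dataset_jsonld(
    name="Example proteomics dataset",
    description="Illustrative record used to show page-level markup.",
    url="https://example.org/datasets/42",
    identifier="https://example.org/id/42",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["proteomics", "example"],
)
snippet = as_script_tag(example)
print(snippet)
```

A harvester that understands JSON-LD can read the element back with an ordinary JSON parser, which is the low barrier to adoption the slides point to.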

Editor's Notes

  1. Findable Accessible Interoperable Reusable < data | models | SOPs | samples | articles | * >. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a bunch of projects and initiatives in the Life Sciences. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVeL), and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
  2. The additions are hidden behind these … just as important and not the same….
  3. Many Princes. Scientific Data 3, Article number: 160018 (2016), doi:10.1038/sdata.2016.18, https://www.nature.com/articles/sdata201618
  4. ELIXIR, RDA
  5. Child as first payment Be careful what you promise
  6. Slide from NLM CLA RIN? CERIF, CLARIN me too! the elephant & blind men
  7. Who are the witches and the godmothers? What’s the get-out clause?
  8. Three – open PHACTS? What did we learn – much harder than you think.
  9. Windsor….what did we learn? Distributed commons Dig out user numbers
  10. Cliques and complementarity Visibility is muted. Licensing… PI leadership Sticking to conventions Local responsibility Time and resource Curation recognition Trust Tribal trading behaviours Enclave sharing Not public donation Reciprocity & credit Drivers … External dominate Personal productivity
  11. Stratified to hide the visible from the invisible. We also have APIs, RAILS
  12. Consumer – producer obligations mismatches Wolves: Project PIs, funders, time Godmothers: Project PIs, “PALs”, templates, funders Deferred pain The ant and the grasshopper Automate or sneak From the IB 13 talk and the Group 09 talk Active enclave sharing Public sharing tricky even after publication, bribery and threats Data Hugging, Flirting and Voyeurism Playground rules apply Fluid, transient collaborations > membership mgt pain in a*se Shameless exploitation of PI competitiveness & vanity PI & Funder leadership Pan project spawned collaborations – YES!!!! But not necessarily visible to us.
  13. PALs are also the cinderellas The scientists’ world does not revolve around your infrastructure or agenda.
  14. Bullying doesn’t work Fame / Shame Money / Burden Love / Fear Side effect / special effort
  15. Templates! Spreadsheets spreadsheets are your friend, not Cinderellas Similarly on myexperiment – metadata in CWL can be extracted… Choice
  16. Don’t necessarily interleave
  17. Across platforms
  18. Bechhofer, Sean, De Roure, David, Gamble, Matthew, Goble, Carole and Buchan, Iain (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. At The Future of the Web for Collaborative Science (FWCS 2010), United States. Bechhofer, Sean, Buchan, Iain, De Roure, David, Missier, Paolo, Ainsworth, John, Bhagat, Jiten, Couch, Philip, Cruickshank, Don, Delderfield, Mark, Dunlop, Ian, Gamble, Matthew, Michaelides, Danius, Owen, Stuart, Newman, David, Sufi, Shoaib and Goble, Carole (28 Feb 2013) Why linked data is not enough for scientists. Future Generation Computer Systems 29(2), 599-611. North-Holland.
  19. Recording & linking together the components of an experiment Linking across experiments. Linked ROs Bigger on the inside than the outside
  20. Predated the FAIR Principles Element enumeration Identification & citation Description tracking attributes (metadata) and origins (provenance) of contents. Simplicity - low user overhead and thin (no) client
  21. RO-bagit
  22. Generic tools multiple bespoke profiles – RDA Data Provenance approach. One for CERIF, one for DataCite Typing
  23. HIDDEN SLIDE Specific to the generic
  24. HIDDEN SLIDE Context of data content together when its scattered transferring and archiving very large HTS datasets in a location-independent way These tools combine a simple and robust method for describing data collections (BDBags), data descriptions (Research Objects), and simple persistent identifiers (Minids) to create a powerful ecosystem of tools and services for big data analysis and sharing. We present these tools and use biomedical case studies to illustrate their use for the rapid assembly, sharing, and analysis of large datasets.
  25. SEAD – Jim Myers
  26. Too vague and too general – needed profile lock-down Can’t make profiles in the abstract
  27. First specifications: Bio data infrastructure Data Catalog Datasets Bio data types Human beacons Samples Plant Phenotypes Proteins (Chemistry) Bio stuff Training materials Events Laboratory protocols Workflows and Tools
  28. Of course this is relevant to ROs – dataset in particular is similar to collection. An RO is a structured collection.
  29. Now the most popular mechanism for publishing and harvesting metadata, beating APIs and scraping.
  30. HIDDEN SLIDE
      Use cases. Biobanks should be able to crawl the BioSamples database to identify all the published (and searchable) datasets derived from samples they have provided. Public archives should be able to crawl Biobank websites in order to identify samples that are known to have public accessions in the BioSamples database AND that can be made publicly available, and thereby link public samples to a provider (“where can I get more of this sample?”). In case of privacy or consent considerations, only the biobank should know which specific samples are connected to publicly available datasets. Public archives should be able to crawl Biobank websites in order to identify ‘sanitised’ sample metadata descriptions (again, in case of confidentiality or consent considerations). Biobanks remain responsible for ensuring only authorised metadata is visible, and can control access to restricted samples.
      Assumptions. Each sample provided by a biobank has an opaque pseudo-anonymous identifier that is assigned by the biobank to identify a specific sample (referred to hereafter as the “sample name”). Each sample reported in a public archive or used to generate a public dataset has a public BioSamples database accession (hereafter called “sample identifier”). In some cases, a biobank may issue different sample identifiers when providing the same sample to different projects. This may result in duplicated sample accessions in the BioSamples database.
      Given these use cases and assumptions, we will use Bioschemas to describe sample links. The main challenge is therefore the identification of links between sample identifiers (within Biobanks) and sample accessions (from the BioSamples database). This is not always possible without considerable additional curation effort, but of the 5 million samples in the BioSamples database, over 4 million declare either a ‘synonym’, ‘sample source name’ or ‘source name’ attribute, frequently used to encode the original biobank sample name. Exposing these in a structured manner through the BioSamples database would allow Biobanks to crawl and analyse this content, marrying samples that are recognised with their own internal identifiers. Once this mapping is done, Biobanks can then re-expose these links through structured content on their own websites, allowing public resources to reciprocate links from public records back to the sample provider.
      Implementation Study Outline. Objectives: Facilitate the ingestion of sample metadata from data repositories (e.g. Biobank databases) into registries like BioSamples, the BBMRI Biobank directory or the UKCRC Tissue Directory via Bioschemas. Engage and help data providers and developers of Biobank LIMS to test and adopt the exposure of sample metadata via Bioschemas. Contribute to contextualising information from data sample registries (e.g. BioSamples), biobank sample repositories (e.g. NL Biobank) and Biobank registries (e.g. BBMRI Biobank directory). Make registries like BioSamples compliant with Bioschemas. Biobanks crawl BioSamples to discover sample accessions, markup etc. if they have 'known' biobank name fields. Sample (study) catalogues provide findability for the individual samples, aligning with the MIABIS Sample Donor and Sample modules. Work with repositories/Biobanks/LIMS to adopt Bioschemas. Develop a general crawler in collaboration with the Bioschemas community. F2Share (Federation framework for data Sharing): https://github.com/MIABIS/logstash-configuration-generator/wiki
  31. More tools needed than thought! 14+ repositories marked up
  32. HIDDEN SLIDE Maintain common profiles across scientific domains focused on finding and accessing data Minimum properties General best practices Support different scientific domains to extend and develop domain specific profiles
  33. Evidence for the funders and researchers Focused on technical and social, but the economics and political is critical.
  34. Ecosystem Grassroots community activities Fostered by Infrastructure Initiatives Don’t squash the start up! Open standards and lightweight Practical engineering Keeping it simple and real Ramps rather than Revolution Specialist, bespoke Rise of containers Too vague and too general – needed profile lock-down Can’t make profiles in the abstract
  35. Added afterwards….
  36. Successes Multiple apps developed 500+ users 20-30 million hits a month Used to answer real pharmaceutical research questions API documentation Lessons Support the typical app developer workflow (i.e. APIs, JSON) Support domain specific (non-RDF) services Identifier equivalence is non-trivial Free text search is important Staying up-to-date with dataset updates is a challenge
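
The biobank use case in note 30 turns on one matching step: a biobank compares its own sample names against the ‘synonym’, ‘sample source name’ and ‘source name’ attributes of public records. Below is a minimal Python sketch of that step under loud assumptions: the record layout (`accession` plus an `attributes` dict), the function name `link_samples`, the case-insensitive and whitespace-tolerant comparison, and every identifier in the example are hypothetical and do not reflect the actual BioSamples data model or API.

```python
from collections import defaultdict

# Attribute names taken from note 30; everything else here is assumed.
NAME_ATTRIBUTES = ("synonym", "sample source name", "source name")


def link_samples(biobank_names, public_records):
    """Map each recognised biobank sample name to public accessions.

    One name may map to several accessions, since the notes warn that the
    same sample can end up with duplicated public accessions.
    """
    # Normalise names once so the comparison ignores case and padding.
    known = {name.strip().lower(): name for name in biobank_names}
    links = defaultdict(list)
    for record in public_records:
        attributes = record.get("attributes", {})
        for key in NAME_ATTRIBUTES:
            value = attributes.get(key)
            if value is None:
                continue
            original = known.get(value.strip().lower())
            if original is not None and record["accession"] not in links[original]:
                links[original].append(record["accession"])
    return dict(links)


# Invented records and names, for illustration only.
records = [
    {"accession": "SAMPLE-0001", "attributes": {"synonym": "BB-17"}},
    {"accession": "SAMPLE-0002", "attributes": {"source name": "bb-17 "}},
    {"accession": "SAMPLE-0003", "attributes": {"organism": "Homo sapiens"}},
    {"accession": "SAMPLE-0004", "attributes": {"sample source name": "BB-99"}},
]
names = ["BB-17", "BB-42"]
links = link_samples(names, records)
print(links)
```

Names that match no public record are simply absent from the result, which fits the notes' caveat that some links cannot be found without additional curation.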