Research Shared
BOSC
July 11th 2015, Dublin
Norman Morrison, The University of Manchester
researchobject.org
Framework
A framework to bundle, exchange and link (scattered) resources about experiments.
Framework desiderata
Technology independent
The least possible
The simplest feasible
Graceful degradation
Standard tooling
How?
The Container
Packaging: Zip files, Docker images, BagIt, Web, …
Catalogues & Commons: FAIRDOM SEEK, Farr Commons CKAN, myExperiment, Zenodo, Figshare, …
Manifest
Describes the aggregated resources, their annotations and provenance
Manifest Construction
•  Identification – id, title, creator, status, …
•  Aggregates – list of ids/links to resources
•  Annotations – list of annotations about resources
Manifest Description
•  Checklists – what should be there
•  Provenance – where it came from
•  Versioning – its evolution
•  Dependencies – what else is needed
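A checklist in the sense above ("what should be there") can be pictured as a list of required resource ids compared against what the manifest aggregates. The sketch below is only an illustration: the function name `missing_resources`, the `checklist` list and the dictionary layout (an `aggregates` list whose entries carry an `id`) are assumptions made for this example, not part of any Research Object tooling.

```python
def missing_resources(manifest, required):
    """Return the required resource ids that the manifest does not aggregate."""
    aggregated = {entry["id"] for entry in manifest.get("aggregates", [])}
    # Keep the checklist order so the report is easy to read.
    return [resource_id for resource_id in required if resource_id not in aggregated]


# Hypothetical manifest and checklist, for illustration only.
manifest = {
    "id": "doi:10.000/zenodo.123",
    "aggregates": [
        {"id": "/sequence/specimen5.bam"},
        {"id": "http://www.myexperiment.org/workflows/3355"},
    ],
}
checklist = ["/sequence/specimen5.bam", "provenance/workflow-evolution.ttl"]

print(missing_resources(manifest, checklist))
```

Here the provenance file is on the checklist but not aggregated, so it is the only item reported as missing.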
  
Manifest
id: doi:10.000/zenodo.123
createdOn: 2015-07-10T16:46:00Z
createdBy: http://orcid.org/0000-0001-9842-9718
aggregates:
  - id: /sequence/specimen5.bam
    conformsTo: http://gemrb.org/iesdp/file_formats/ie_formats/bam_v1.htm
  - id: http://example.com/blog/about-specimen5
    authoredBy: http://orcid.org/0000-0001-7066-3350
  - id: http://www.myexperiment.org/workflows/3355
    history: provenance/workflow-evolution.ttl
annotations:
  - about: /sequence/specimen5.bam
    content: annotations/specimen5-properties.jsonld
    createdBy: http://orcid.org/0000-0001-7066-3350
  - about: /sequence/specimen5.bam
    content: http://example.com/blog/about-specimen5
    oa:motivatedBy oa:questioning
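The three parts of the manifest (identification, aggregates, annotations) could be assembled programmatically along the following lines. This is a minimal sketch that mirrors the field names used on the slide; the helper names `new_manifest`, `aggregate` and `annotate` are hypothetical, and the rule that an annotation must target an already aggregated resource is a design choice of this sketch rather than something the slides state. The exact JSON-LD vocabulary of a real RO manifest may differ.

```python
import json


def new_manifest(ro_id, created_on, created_by):
    """Identification part of the manifest plus empty aggregation lists."""
    return {
        "id": ro_id,
        "createdOn": created_on,
        "createdBy": created_by,
        "aggregates": [],
        "annotations": [],
    }


def aggregate(manifest, resource_id, **properties):
    """Add a resource (local path or URL) to the aggregation."""
    entry = {"id": resource_id, **properties}
    manifest["aggregates"].append(entry)
    return entry


def annotate(manifest, about, content, **properties):
    """Attach an annotation; in this sketch the target must be aggregated."""
    if about not in {entry["id"] for entry in manifest["aggregates"]}:
        raise ValueError(f"{about!r} is not aggregated by this manifest")
    annotation = {"about": about, "content": content, **properties}
    manifest["annotations"].append(annotation)
    return annotation


manifest = new_manifest(
    "doi:10.000/zenodo.123",
    "2015-07-10T16:46:00Z",
    "http://orcid.org/0000-0001-9842-9718",
)
aggregate(manifest, "/sequence/specimen5.bam")
aggregate(
    manifest,
    "http://example.com/blog/about-specimen5",
    authoredBy="http://orcid.org/0000-0001-7066-3350",
)
annotate(
    manifest,
    "/sequence/specimen5.bam",
    "http://example.com/blog/about-specimen5",
)

# Serialise for storage alongside the aggregated resources.
print(json.dumps(manifest, indent=2))
```

Keeping the manifest as a plain dictionary means it can be written out with standard tooling, in line with the "simplest feasible" desideratum.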
  
RO Principles
Use unique identifiers as names for things.
Use some mechanism of aggregation to
group things together.
Provide metadata about those things &
how they relate to each other.
Get tooled up
https://github.com/ResearchObject
Real world examples
•  Reviewed to Reproduced
•  Workflow run (CWL)
•  Farr Commons
•  Capturing and describing Docker images
for CERN ATLAS analyses
•  FAIRDOM http://fair-dom.org/
– SEEK http://seek4science.org/
•  FAIR Publishing - RO to Figshare
Reviewed to Reproduced
From González-Beltrán et al. doi:10.1371/journal.pone.0127612
✔ Reproducibility
– Same data, same code
✔ Systematic and extensible meta-data collection
Workflow Run
workflowrun.prov.ttl (RDF)
outputA.txt
outputC.jpg
outputB/
intermediates/
1.txt 2.txt 3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
inputA.txt
workflow attribution
execution environment
Aggregating in Research Object
ZIP folder structure (RO Bundle)
mimetype: application/vnd.wf4ever.robundle+zip
.ro/manifest.json
URI references
✔ Exchange
✔ Reproducibility
– Same data, same code
✔ Systematic and extensible meta-data collection
✔ Uses RO Model WF Extension - basis of CWL
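The ZIP folder structure of an RO Bundle (a `mimetype` entry plus `.ro/manifest.json` next to the aggregated files) can be produced with standard tooling. The following is a rough sketch using Python's `zipfile` module; the function name `write_bundle` and the sample file contents are hypothetical. Writing `mimetype` first and uncompressed follows the convention of similar ZIP-based container formats and is an assumption here, so the RO Bundle specification should be checked for the exact requirements.

```python
import io
import json
import zipfile

# Media type shown on the slide.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, manifest, files):
    """Write an RO Bundle style ZIP to a path or file-like object.

    `files` maps archive paths to bytes or str content.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        # Assumption: mimetype goes first and is stored without compression.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, data in files.items():
            bundle.writestr(path, data)


# Hypothetical content, for illustration only.
manifest = {"aggregates": [{"id": "/inputA.txt"}, {"id": "/outputA.txt"}]}
buffer = io.BytesIO()
write_bundle(buffer, manifest, {"inputA.txt": "input", "outputA.txt": "output"})

with zipfile.ZipFile(buffer) as bundle:
    print(bundle.namelist())
```

Because the result is an ordinary ZIP file, a recipient without any RO tooling can still unzip it and read the files, which is the graceful degradation the framework aims for.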
ROs and Sensitive data
Farr Commons
✔ Exchange
✔ Systematic and extensible meta-data collection
Use case: ATLAS Collider Data Analytics
Portable, lightweight application runtime and packaging tool.
Image
ATLAS and CMS detector data
Charles Vardeman, Da Huo
All data and files of the execution + Instructions
convert
bundle
manifest
Relate files and layers
Add provenance and annotations
Link in other content
run
✔ Exchange
✔ Reproducibility
– Same data, same code, same run time environment
✔ Systematic and extensible meta-data collection
FAIRDOM SEEK
FAIRDOM
Export as RO Model, Data, SOP,
Parameters
RO Unzip
✔ Reproducibility
✔ Versioning
✔ Systematic and extensible meta-data collection
FAIR Publishing
Research Objects
•  Reproducibility
– Same data, same code, same run time
environment
•  Versioning
•  Exchange
•  Systematic and extensible meta-data
collection
Research Objects
Publish a digital record of your entire scientific enterprise
You can give it to someone else
You can get credit for it
People think you are a good person
You get a promotion
•  Why does this matter to biologists?
Okay, but what does it cost?
Conclusion
•  Simple solution, addressing the need for
transparency in line with the FAIR principles
–  Findable, Accessible, Interoperable, Reusable
•  Adoption
–  Training
•  Online tutorials
•  Face to face
–  Need more tools that take advantage of the RO
Framework and lower the cost (technical
debt) of reproducibility
•  Work together
Acknowledgements
Carole Goble, Stian Soiland-Reyes, Matt Gamble, Rob Haines, Sean Bechhofer, Phil Crouch, Finn Bacall, Stuart Owen
Khalid Belhajjame, Graham Klyne, Jun Zhao, Daniel Garijo, Oscar Corcho, Esteban García Cuesta, Raul Palma
University of Manchester, University of Oxford, Lancaster University, UPM, iSOCO, PSNC, Paris 6
http://researchobject.org
http://fair-dom.org
http://www.seek4science.org
http://www.farrinstitute.org
http://www.wf4ever-project.org
http://myexperiment.org
  
