2013 06-24 Wf4Ever: Annotating research objects (PPTX)

Open Annotation Rollout, Manchester, 2013-06-25
See also PDF version: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting-23289491


1. Wf4Ever: Annotating research objects. Stian Soiland-Reyes, Sean Bechhofer, myGrid, University of Manchester. Open Annotation Rollout, Manchester, 2013-06-24. This work is licensed under a Creative Commons Attribution 3.0 Unported License.
2. Motivation: Scientific workflows. Coordinated execution of services and linked resources: dataflow between services; web services (SOAP, REST); command line tools; scripts; user interactions; components (nested workflows). The method becomes: documented visually; shareable as a single definition; reusable with new inputs; repurposable with other services; reproducible? http://www.myexperiment.org/workflows/3355 http://www.taverna.org.uk/ http://www.biovel.eu/
3. But workflows are complex machines. [Diagram of http://www.myexperiment.org/workflows/3355 labelled Inputs, Outputs, Configuration, Components.] • Will it still work after a year? 10 years? • Expanding components, we see that a workflow involves a series of specific tools and services which • depend on datasets, software libraries and other tools • are often poorly described or understood • evolve, change, break or are replaced over time. • User interactions are not reproducible • but can be tracked and replayed.
4. Electronic Paper Not Enough. [Diagram labels: Investigation, Hypothesis, Experiment, Result, Analysis, Conclusions, Data, Electronic paper, Publish.] Open Research movement: openly share the data of your experiments. http://www.force11.org/beyondthepdf2 http://figshare.com/ http://datadryad.org/
5. Research Object (RO), http://www.researchobject.org/. Goal of research objects: openly share everything about your experiments, including how those things are related.
6. What is in a research object? A Research Object bundles and relates digital resources of a scientific experiment or investigation: data used and results produced in an experimental study; methods employed to produce and analyse that data; provenance and settings for the experiments; people involved in the investigation; annotations about these resources that are essential to the understanding and interpretation of the scientific outcomes captured by a research object. http://www.researchobject.org/
7. Gathering everything. Research Objects (RO) aggregate related resources, their provenance and annotations. They convey “everything you need to know” about a study/experiment/analysis/dataset/workflow. Shareable, evolvable, contributable, citable. ROs have their own provenance and lifecycles.
8. Why Research Objects? i. To share your research materials (RO as a social object); ii. to facilitate reproducibility and reuse of methods; iii. to be recognized and cited (even for constituent resources); iv. to preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun).
9. A research object: http://alpha.myexperiment.org/packs/387
10. Quality assessment of a research object
11. Quality monitoring
12. Annotations in research objects. Types: “This document contains a hypothesis”. Relations: “These datasets are consumed by that tool”. Provenance: “These results came from this workflow run”. Descriptions: “Purpose of this step is to filter out invalid data”. Comments: “This method looks useful, but how do I install it?” Examples: “This is how you could use it”.
13. Annotation guidelines: which properties? Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject. Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt. Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom. Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy. Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset. Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource.
14. What is provenance? [Photo by Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic, http://www.flickr.com/photos/stephendann/3375055368/] Attribution: who did it? Derivation: how did it change? Activity: what happens to it? Licensing: can I use it? Attributes: what is it? Origin: where is it from? Annotations: what do others say about it? Aggregation: what is it part of? Date and tool: when was it made? Using what?
15. Attribution. Who collected this sample? Who helped? Which lab performed the sequencing? Who did the data analysis? Who curated the results? Who produced the raw data this analysis is based on? Who wrote the analysis workflow? Why do I need this? i. To be recognized for my work; ii. who should I give credit to? iii. who should I complain to? iv. can I trust them? v. who should I make friends with? Roles: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy... Agent types: Person, Organization, SoftwareAgent. [Diagram: Data, Alice and The lab linked by wasAttributedTo and actedOnBehalfOf.] http://practicalprovenance.wordpress.com/
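
The attribution questions above can be answered mechanically by following prov:wasAttributedTo from an entity to its agents, then prov:actedOnBehalfOf from each agent to whoever it worked for. A minimal Python sketch, assuming triples held as plain (subject, predicate, object) tuples of CURIE strings instead of a real RDF store; the identifiers :data, :alice and :thelab are hypothetical and only mirror the diagram.

```python
# Sketch only: triples are (subject, predicate, object) CURIE strings.
# :data, :alice and :thelab are hypothetical identifiers mirroring the diagram.
TRIPLES = [
    (":data", "prov:wasAttributedTo", ":alice"),
    (":alice", "prov:actedOnBehalfOf", ":thelab"),
    (":alice", "rdf:type", "prov:Person"),
    (":thelab", "rdf:type", "prov:Organization"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def credited_agents(triples, entity):
    """Agents the entity is attributed to, then anyone they acted on behalf of."""
    agents = []
    queue = list(objects(triples, entity, "prov:wasAttributedTo"))
    while queue:
        agent = queue.pop(0)
        if agent in agents:  # guard against delegation cycles
            continue
        agents.append(agent)
        queue.extend(objects(triples, agent, "prov:actedOnBehalfOf"))
    return agents


if __name__ == "__main__":
    for agent in credited_agents(TRIPLES, ":data"):
        print(agent, objects(TRIPLES, agent, "rdf:type"))
```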
16. Derivation. Which sample was this metagenome sequenced from? Which metagenome was this sequence extracted from? Which sequence was the basis for the results? What is the previous revision of the new results? Why do I need this? i. To verify consistency (did I use the correct sequence?); ii. to find the latest revision; iii. to backtrack where a divergence appeared after a change; iv. to credit work I depend on; v. auditing and defence for peer review. [Diagram: Sample, Metagenome, Sequence, Old results and New results linked by wasDerivedFrom, wasQuotedFrom, wasRevisionOf and wasInfluencedBy.]
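
In PROV-O, prov:wasRevisionOf and prov:wasQuotedFrom are sub-properties of prov:wasDerivedFrom, so "what was this derived from?" can follow all three, while "what is the latest revision?" follows prov:wasRevisionOf in reverse. A sketch in the same tuple style; the chain from :sample to :newresults is a hypothetical example, and the revision lookup assumes revisions never form a cycle.

```python
# prov:wasRevisionOf and prov:wasQuotedFrom specialise prov:wasDerivedFrom.
DERIVATION = {"prov:wasDerivedFrom", "prov:wasRevisionOf", "prov:wasQuotedFrom"}

# Hypothetical derivation chain, loosely following the diagram.
TRIPLES = [
    (":metagenome", "prov:wasDerivedFrom", ":sample"),
    (":sequence", "prov:wasQuotedFrom", ":metagenome"),
    (":oldresults", "prov:wasDerivedFrom", ":sequence"),
    (":newresults", "prov:wasRevisionOf", ":oldresults"),
]


def derived_from(triples, entity):
    """Everything `entity` was (transitively) derived from, depth-first."""
    seen, stack = [], [entity]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p in DERIVATION and o not in seen:
                seen.append(o)
                stack.append(o)
    return seen


def latest_revisions(triples, entity):
    """Follow prov:wasRevisionOf backwards to the newest revision(s).

    Assumes the revision graph is acyclic; a branch yields several results.
    """
    newer = [s for s, p, o in triples if p == "prov:wasRevisionOf" and o == entity]
    if not newer:
        return [entity]
    latest = []
    for revision in newer:
        latest.extend(latest_revisions(triples, revision))
    return latest


if __name__ == "__main__":
    print(derived_from(TRIPLES, ":newresults"))
    print(latest_revisions(TRIPLES, ":oldresults"))
```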
17. Activities. What happened? When? Who? What was used and generated? Why was this workflow started? Which workflow ran? Where? Why do I need this? i. To see which analysis was performed; ii. to find out who did what; iii. what was the metagenome used for? iv. to understand the whole process (“make me a Methods section”); v. to track down inconsistencies. [Diagram: Sample, Sequencing, Metagenome, Workflow run, Workflow server, Workflow definition, Results, Alice (role: Lab technician) and "2012-06-21", linked by used, wasGeneratedBy, wasStartedAt, wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan and hadRole.]
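
The “make me a Methods section” use case can be approximated by turning each prov:Activity into one sentence built from prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:startedAtTime and prov:wasInformedBy. This is a sketch only: the triples are a hypothetical rendering of the diagram, the date is the slide's "2012-06-21", and PROV-O would normally type prov:startedAtTime as an xsd:dateTime literal rather than a plain string.

```python
# Hypothetical rendering of the diagram as (subject, predicate, object) tuples.
TRIPLES = [
    (":sequencing", "rdf:type", "prov:Activity"),
    (":sequencing", "prov:used", ":sample"),
    (":sequencing", "prov:wasAssociatedWith", ":alice"),
    (":metagenome", "prov:wasGeneratedBy", ":sequencing"),
    (":workflowrun", "rdf:type", "prov:Activity"),
    (":workflowrun", "prov:wasInformedBy", ":sequencing"),
    (":workflowrun", "prov:used", ":metagenome"),
    (":workflowrun", "prov:wasAssociatedWith", ":workflowserver"),
    (":workflowrun", "prov:startedAtTime", "2012-06-21"),
    (":results", "prov:wasGeneratedBy", ":workflowrun"),
]


def objects(triples, subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate, obj):
    return [s for s, p, o in triples if p == predicate and o == obj]


def methods_section(triples):
    """One sentence per prov:Activity, in the order the activities are declared."""
    sentences = []
    for activity in subjects(triples, "rdf:type", "prov:Activity"):
        agents = objects(triples, activity, "prov:wasAssociatedWith")
        sentence = f"{activity} was carried out by {', '.join(agents) or 'an unknown agent'}"
        started = objects(triples, activity, "prov:startedAtTime")
        if started:
            sentence += f" on {started[0]}"
        used = objects(triples, activity, "prov:used")
        if used:
            sentence += f", using {', '.join(used)}"
        generated = subjects(triples, "prov:wasGeneratedBy", activity)
        if generated:
            sentence += f", generating {', '.join(generated)}"
        informed_by = objects(triples, activity, "prov:wasInformedBy")
        if informed_by:
            sentence += f" (informed by {', '.join(informed_by)})"
        sentences.append(sentence + ".")
    return "\n".join(sentences)


if __name__ == "__main__":
    print(methods_section(TRIPLES))
```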
18. PROV model. [Figure from the PROV primer, http://www.w3.org/TR/prov-primer/. Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. Provenance Working Group.]
19. Provenance of what? Who made (the content of) the research object? Who maintains it? Who wrote this document? Who uploaded it? Which CSV was this Excel file imported from? Who wrote this description? When? How did we get it? What is the state of this RO (live or published)? What did the research object look like before (revisions), and are there newer versions? Which research objects are derived from this RO?
20. Research object model at a glance. [Diagram: a Research Object («ore:Aggregation», «ro:ResearchObject») ore:aggregates Resources («ore:AggregatedResource», «ro:Resource») and Annotations («oa:Annotation», «ro:AggregatedAnnotation»); each Annotation has oa:hasTarget a Resource and oa:hasBody an Annotation graph («trig:Graph»); the Manifest («ore:ResourceMap», «ro:Manifest») describes the research object.]
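
The model can be read as a recipe for what the manifest states. A sketch that generates those statements as tuples and prints them naively, one triple per line; the Annotation class, the relative URIs (<.>, <workflow1>, <annotations/1.ttl>) and the example values are illustrative assumptions, and prefix declarations are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    uri: str
    body: str                                      # URI of the annotation graph
    targets: list = field(default_factory=list)    # resources the annotation is about


def manifest_triples(ro, resources, annotations):
    """Triples a manifest would state about the RO, its resources and annotations."""
    triples = [(ro, "rdf:type", "ore:Aggregation"), (ro, "rdf:type", "ro:ResearchObject")]
    for resource in resources:
        triples.append((ro, "ore:aggregates", resource))
        triples.append((resource, "rdf:type", "ro:Resource"))
    for ann in annotations:
        triples.append((ro, "ore:aggregates", ann.uri))
        triples.append((ann.uri, "rdf:type", "ro:AggregatedAnnotation"))
        triples.append((ann.uri, "rdf:type", "oa:Annotation"))
        for target in ann.targets:
            triples.append((ann.uri, "oa:hasTarget", target))
        triples.append((ann.uri, "oa:hasBody", ann.body))
        triples.append((ann.body, "rdf:type", "trig:Graph"))
    return triples


def to_turtle(triples):
    """Naive one-triple-per-line serialisation (prefix declarations omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    ann = Annotation("<annotations/1>", "<annotations/1.ttl>", ["<workflow1>"])
    print(to_turtle(manifest_triples("<.>", ["<workflow1>", "<input2.txt>"], [ann])))
```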
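21. Wf4Ever architecture. [Diagram: a research object with its manifest, annotations and annotation graphs; a resource is uploaded to the blob store and, if it is RDF, imported as a named graph into the graph store (queryable with SPARQL); an external reference is an ORE proxy that redirects to the external resource; everything is exposed as REST resources.] http://www.wf4ever-project.org/wiki/display/docs/RO+API+6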
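
One way to picture the upload path: every resource lands in the blob store, RDF uploads are additionally kept as a named graph, and an external reference is a proxy that answers with a redirect. A hypothetical in-memory sketch; the class and method names are not the Wf4Ever RO API, the set of RDF media types is illustrative, and the 303 See Other status is an assumption.

```python
# Illustrative set of media types treated as RDF (an assumption, not exhaustive).
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
    "application/trig",
}


class ResearchObjectStore:
    """Hypothetical in-memory stand-in for the blob store, graph store and proxies."""

    def __init__(self, base):
        self.base = base
        self.blobs = {}          # URI -> raw bytes, for every uploaded resource
        self.named_graphs = {}   # URI -> RDF text, only for RDF uploads
        self.proxies = {}        # proxy URI -> external URI

    def upload(self, path, media_type, content):
        uri = self.base + path
        self.blobs[uri] = content
        base_type = media_type.split(";")[0].strip().lower()  # drop e.g. charset
        if base_type in RDF_MEDIA_TYPES:
            # A real service would parse this into a SPARQL-queryable graph store.
            self.named_graphs[uri] = content.decode("utf-8")
        return uri

    def add_external(self, proxy_path, external_uri):
        proxy = self.base + proxy_path
        self.proxies[proxy] = external_uri
        return proxy

    def resolve(self, uri):
        """Return (HTTP status, body or redirect target) for a GET on `uri`."""
        if uri in self.proxies:
            return 303, self.proxies[uri]
        if uri in self.blobs:
            return 200, self.blobs[uri]
        return 404, None
```

Keeping the raw bytes for RDF uploads as well means the original file can always be served back unchanged, while the named graph is what queries would run against.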
22. Where do RO annotations come from? Imported from uploaded resources, e.g. embedded in a workflow-specific format (creator: unknown!); created by users filling in Title, Description etc. on the website; created by automatically invoked software agents, e.g.: a workflow transformation service extracts the workflow structure as RDF from the native workflow format; a provenance trace from a workflow run describes the origin of aggregated output files in the research object.
23. How we are using the OA model. Multiple oa:Annotation instances are contained within the manifest RDF and aggregated by the RO. Provenance (PAV, PROV) is recorded on the oa:Annotation (who made the link) and on the body resource (who stated it). Typically there is a single oa:hasTarget, either the RO or an aggregated resource. oa:hasBody points to a trig:Graph resource (read: RDF file) with the “actual” annotation as RDF: <workflow1> dct:title "The wonderful workflow" . Multiple oa:hasTarget are used for relationships, e.g. with the graph body: <workflow1> roterms:inputSelected <input2.txt> .
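
A sketch of the single-target versus multi-target convention, assuming that the targets of an annotation are exactly those aggregated resources mentioned as subject or object in its body graph (this rule is an illustration, not part of the model). The annotation, body and user URIs are hypothetical, and pav:createdBy stands in for whichever PAV/PROV provenance is actually recorded.

```python
def annotation_targets(body_triples, aggregated):
    """Aggregated resources mentioned in the body graph, in first-seen order."""
    targets = []
    for s, _, o in body_triples:
        for term in (s, o):
            if term in aggregated and term not in targets:
                targets.append(term)
    return targets


def annotation_turtle(ann, body, body_triples, aggregated, created_by):
    """Turtle for one oa:Annotation in the manifest (prefix declarations omitted)."""
    targets = annotation_targets(body_triples, aggregated)
    if not targets:
        raise ValueError("body graph mentions no aggregated resource")
    return "\n".join([
        f"{ann} a ro:AggregatedAnnotation, oa:Annotation ;",
        f"    oa:hasTarget {', '.join(targets)} ;",
        f"    oa:hasBody {body} ;",
        f"    pav:createdBy {created_by} .",
    ])


# The two body graphs from the slide; the literal is not a target.
AGGREGATED = {"<workflow1>", "<input2.txt>"}
TITLE = [("<workflow1>", "dct:title", '"The wonderful workflow"')]
INPUT = [("<workflow1>", "roterms:inputSelected", "<input2.txt>")]

if __name__ == "__main__":
    print(annotation_turtle("<annotations/2>", "<annotations/2.ttl>",
                            INPUT, AGGREGATED, "<users/alice>"))
```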
24. What should we also be using? Motivations: myExperiment supports commenting, describing, moderating, questioning, replying and tagging; we made our own vocabulary as OA did not exist. Selectors on compound resources: e.g. a description of processors within a workflow in a workflow definition. How do you find this if you only know the workflow definition file? Currently: annotations on separate URIs for each component, described in the workflow structure graph, which is the body of an annotation targeting the workflow definition file. Importing/referring to annotations from other OA systems (how do we discover those?).
25. What is the benefit of OA for us? Existing vocabulary: no need for our project to try to specify and agree on our own way of tracking annotations. Potential interoperability with third-party annotation tools, e.g. we want to annotate a figure in a paper and relate it to a dataset in a research object, and don’t want to write another tool for that! Existing annotations (pre research object) in Taverna and myExperiment map easily to the OA model.
26. History lesson (AO/OAC/OA). When forming the Wf4Ever Research Object model, we found the Open Annotation Collaboration (OAC) and the Annotation Ontology (AO). What was the difference? Technically, for Wf4Ever’s purposes, they are equivalent. Political choice: AO, supported by Utopia (Manchester). We encouraged the formation of the W3C Open Annotation Community Group and a joint model. Next: Research Object model v0.2 and RO Bundle will use the OA model; since we only used 2 properties, the mapping is 1:1. http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations
27. Saving a research object: RO Bundle. A single, transferable research object; a self-contained snapshot. Which files are in the ZIP, and which are URIs? (Up to the user/application.) A regular ZIP file, explored and unpacked with standard tools. The JSON manifest is programmatically accessible without understanding RDF. Works offline and in desktop applications, with no REST API access required. Basis for RO-enabled file formats, e.g. the Taverna run bundle. Exchanged with myExperiment and RO tools.
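
Because an RO Bundle is a regular ZIP file, the standard library is enough to write and read one. A minimal sketch assuming the UCF-style convention that the mimetype entry comes first and is stored uncompressed; the mimetype string and the .ro/manifest.json path come from the slides, while the example manifest content and its @context URL are illustrative assumptions rather than a validated bundle.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"

# Illustrative manifest; the @context URL is an assumption.
EXAMPLE_MANIFEST = {
    "@context": ["https://w3id.org/bundle/context"],
    "id": "/",
    "aggregates": [{"file": "/outputA.txt"}, {"file": "/workflowrun.prov.ttl"}],
}


def write_bundle(files, manifest):
    """Return the bytes of a ZIP holding mimetype, .ro/manifest.json and `files`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # First entry, stored uncompressed, so tools can sniff the type.
        z.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        z.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, data in files.items():
            z.writestr(path, data)
    return buf.getvalue()


def read_manifest(bundle):
    """Check the mimetype entry, then parse the manifest as plain JSON (no RDF)."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as z:
        names = z.namelist()
        if not names or names[0] != "mimetype" or z.read("mimetype").decode("ascii") != MIMETYPE:
            raise ValueError("not an RO bundle")
        return json.loads(z.read(".ro/manifest.json"))


if __name__ == "__main__":
    files = {"outputA.txt": "A", "workflowrun.prov.ttl": "# provenance goes here\n"}
    data = write_bundle(files, EXAMPLE_MANIFEST)
    print(read_manifest(data)["aggregates"])
```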
28. Workflow Results Bundle. [Diagram: a ZIP folder structure (RO Bundle) aggregating a research object, with entries mimetype (application/vnd.wf4ever.robundle+zip), .ro/manifest.json, workflowrun.prov.ttl (RDF), inputA.txt, outputA.txt, outputB/, outputC.jpg, 1.txt, 2.txt, 3.txt and intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt; further labels: workflow, URI references, attribution, execution environment.] https://w3id.org/bundle
29. RO Bundle .ro/manifest.json format. [Annotated example of the JSON-LD manifest, with callouts: What is aggregated? File in ZIP or external URI. Who made the RO? When? Who? External URIs placed in folders. Embedded annotation. External annotation, e.g. blog post. JSON-LD context, RDF. RO provenance.] Note: JSON "quotes" not shown above for brevity. http://json-ld.org/ http://orcid.org/ https://w3id.org/bundle
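
Since the manifest is plain JSON, a consumer can answer “what is aggregated, and where?” without any RDF tooling. A sketch assuming that aggregated entries carry either a "file" key (inside the ZIP) or a "uri" key (external), and that annotations carry "about" and "content" keys; these field names and all example values (including the example.org blog URL) are assumptions to check against https://w3id.org/bundle.

```python
import json

# Hypothetical manifest; field names and values are assumptions for illustration.
EXAMPLE_MANIFEST = json.loads("""
{
  "@context": ["https://w3id.org/bundle/context"],
  "id": "/",
  "createdBy": {"name": "Alice"},
  "aggregates": [
    {"file": "/workflowrun.prov.ttl", "mediatype": "text/turtle"},
    {"file": "/outputA.txt", "mediatype": "text/plain"},
    {"uri": "http://www.myexperiment.org/workflows/3355"}
  ],
  "annotations": [
    {"about": "/outputA.txt", "content": "/workflowrun.prov.ttl"},
    {"about": "/", "content": "http://example.org/blog/review-of-this-ro"}
  ]
}
""")


def split_aggregates(manifest):
    """Return (paths inside the ZIP, external URIs) for aggregated resources."""
    in_zip, external = [], []
    for entry in manifest.get("aggregates", []):
        if "file" in entry:
            in_zip.append(entry["file"])
        elif "uri" in entry:
            external.append(entry["uri"])
    return in_zip, external


def annotations_about(manifest, target):
    """Bodies (embedded paths or external URIs) of annotations about `target`."""
    return [a["content"] for a in manifest.get("annotations", [])
            if a.get("about") == target]


if __name__ == "__main__":
    print(split_aggregates(EXAMPLE_MANIFEST))
    print(annotations_about(EXAMPLE_MANIFEST, "/"))
```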
30. Research Object as RDFa: http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/ (research object http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/). Markup excerpts:
<h3 property="dc:title">Common Motifs in Scientific Workflows:<br>An Empirical Analysis</h3>
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li>
<span property="dc:creator prov:wasAttributedTo" resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
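
A consumer that only needs the aggregated resources from such a page can pick them out of the markup with the standard-library HTML parser. This is deliberately not a full RDFa processor (no prefix mappings, chaining or @about handling), and it assumes the page has no <base> element, so relative hrefs resolve against the page URL given on the slide.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AggregatesParser(HTMLParser):
    """Collects ore:aggregates links and dc:creator from RDFa-style markup.

    Not a full RDFa processor: no prefix mappings, no chaining, no @about.
    """

    def __init__(self, base):
        super().__init__()
        self.base = base       # document URL used to resolve relative hrefs
        self.ro = None
        self.aggregates = []
        self.creators = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lower-cases names, so typeOf arrives as typeof
        types = (a.get("typeof") or "").split()
        props = (a.get("property") or "").split()
        if "ro:ResearchObject" in types and self.ro is None:
            self.ro = a.get("resource")
        if "ore:aggregates" in props and a.get("href"):
            self.aggregates.append(urljoin(self.base, a["href"]))
        if "dc:creator" in props and a.get("resource"):
            self.creators.append(a["resource"])


PAGE = "http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/"
SNIPPET = """
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">
<ul>
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li>
</ul>
<span property="dc:creator prov:wasAttributedTo" resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
</body>
"""

if __name__ == "__main__":
    parser = AggregatesParser(PAGE)
    parser.feed(SNIPPET)
    parser.close()
    print(parser.ro, parser.aggregates, parser.creators)
```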
31. W3C community group for RO: http://www.w3.org/community/rosc/
