2013 06-24 Wf4Ever: Annotating research objects (PDF)


Open Annotation Rollout, Manchester, 2013-06-25
See also PPTX version with Notes: http://www.slideshare.net/soilandreyes/2013-0624annotatingr-osopenannotationmeeting

Published in: Technology


  1. 1. Wf4Ever: Annotating research objects
     Stian Soiland-Reyes, Sean Bechhofer
     myGrid, University of Manchester
     Open Annotation Rollout, Manchester, 2013-06-24
     This work is licensed under a Creative Commons Attribution 3.0 Unported License
  2. 2. Motivation: Scientific workflows
     Coordinated execution of services and linked resources
     Dataflow between services
       • Web services (SOAP, REST)
       • Command line tools
       • Scripts
       • User interactions
       • Components (nested workflows)
     Method becomes:
       • Documented visually
       • Shareable as single definition
       • Reusable with new inputs
       • Repurposable with other services
       • Reproducible?
     http://www.myexperiment.org/workflows/3355
     http://www.taverna.org.uk/
     http://www.biovel.eu/
  3. 3. But workflows are complex machines
     Diagram labels: Outputs, Inputs, Configuration, Components
     http://www.myexperiment.org/workflows/3355
       • Will it still work after a year? 10 years?
       • Expanding components, we see a workflow involves a series of specific tools and services which
           • Depend on datasets, software libraries, other tools
           • Are often poorly described or understood
           • Over time evolve, change, break or are replaced
       • User interactions are not reproducible
           • But can be tracked and replayed
  4. 4. Electronic Paper Not Enough
     Diagram labels: Hypothesis, Experiment, Result, Analysis, Conclusions, Investigation, Data, Electronic paper, Publish
     Open Research movement: Openly share the data of your experiments
     http://www.force11.org/beyondthepdf2
     http://figshare.com/
     http://datadryad.org/
  5. 5. Research Object (RO)
     Research objects goal: Openly share everything about your experiments, including how those things are related
     http://www.researchobject.org/
  6. 6. What is in a research object?
     A Research Object bundles and relates digital resources of a scientific experiment or investigation:
       • Data used and results produced in experimental study
       • Methods employed to produce and analyse that data
       • Provenance and settings for the experiments
       • People involved in the investigation
       • Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object
     http://www.researchobject.org/
  7. 7. Gathering everything
       • Research Objects (RO) aggregate related resources, their provenance and annotations
       • Conveys “everything you need to know” about a study/experiment/analysis/dataset/workflow
       • Shareable, evolvable, contributable, citable
       • ROs have their own provenance and lifecycles
  8. 8. Why Research Objects?
       i. To share your research materials (RO as a social object)
       ii. To facilitate reproducibility and reuse of methods
       iii. To be recognized and cited (even for constituent resources)
       iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
  9. 9. A Research object http://alpha.myexperiment.org/packs/387
  10. 10. Quality Assessment of a research object
  11. 11. Quality Monitoring
  12. 12. Annotations in research objects
       • Types: “This document contains a hypothesis”
       • Relations: “These datasets are consumed by that tool”
       • Provenance: “These results came from this workflow run”
       • Descriptions: “Purpose of this step is to filter out invalid data”
       • Comments: “This method looks useful, but how do I install it?”
       • Examples: “This is how you could use it”
  13. 13. Annotation guidelines – which properties?
       • Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject
       • Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt
       • Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom
       • Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy
       • Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset
       • Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource
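The property names on slide 13 are compact names of the form prefix:local. As a minimal sketch, assuming the commonly used namespace URIs for a handful of those prefixes (the slide does not spell them out), they could be expanded to full URIs as follows. `PREFIXES` and `expand` are hypothetical names used only for illustration, and the Wf4Ever-specific prefixes are left out on purpose.

```python
# Namespace URIs assumed for a subset of the prefixes used on the slide.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "prov": "http://www.w3.org/ns/prov#",
    "pav": "http://purl.org/pav/",
    "oa": "http://www.w3.org/ns/oa#",
}


def expand(curie: str) -> str:
    """Expand a prefix:local name to a full URI; raises KeyError for unknown prefixes."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for name in ["dct:title", "pav:authoredBy", "prov:wasDerivedFrom", "oa:Tag"]:
        print(name, "->", expand(name))
```

Raising on an unknown prefix, rather than guessing, keeps a typo such as `dcterm:` from silently producing a wrong URI.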
  14. 14. What is provenance?
       • Attribution: who did it?
       • Derivation: how did it change?
       • Activity: what happens to it?
       • Licensing: can I use it?
       • Attributes: what is it?
       • Origin: where is it from?
       • Annotations: what do others say about it?
       • Aggregation: what is it part of?
       • Date and tool: when was it made? using what?
     By Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic
     http://www.flickr.com/photos/stephendann/3375055368/
  15. 15. Attribution
       • Who collected this sample? Who helped?
       • Which lab performed the sequencing?
       • Who did the data analysis?
       • Who curated the results?
       • Who produced the raw data this analysis is based on?
       • Who wrote the analysis workflow?
     Why do I need this?
       i. To be recognized for my work
       ii. Who should I give credits to?
       iii. Who should I complain to?
       iv. Can I trust them?
       v. Who should I make friends with?
     Roles: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...
     Agent types: Person, Organization, SoftwareAgent
     Diagram: Data wasAttributedTo Alice; Alice actedOnBehalfOf The lab
     http://practicalprovenance.wordpress.com/
  16. 16. Derivation
       • Which sample was this metagenome sequenced from?
       • Which meta-genomes was this sequence extracted from?
       • Which sequence was the basis for the results?
       • What is the previous revision of the new results?
     Why do I need this?
       i. To verify consistency (did I use the correct sequence?)
       ii. To find the latest revision
       iii. To backtrack where a diversion appeared after a change
       iv. To credit work I depend on
       v. Auditing and defence for peer review
     Relations: wasDerivedFrom, wasQuotedFrom, wasRevisionOf, wasInfluencedBy
     Diagram labels: Sample, Meta-genome, Sequence, Old results, New results
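The derivation questions on slide 16 amount to walking a chain of derivation statements backwards. The sketch below shows one way to do that with plain tuples. The specific edges are an assumption inferred from the slide's questions, since the diagram itself is flattened in this transcript, and `STATEMENTS` and `trace_back` are hypothetical names.

```python
# Derivation statements as (subject, relation, object) tuples.
# Assumption: these edges are read off the slide's questions, not its diagram.
STATEMENTS = [
    ("Meta-genome", "wasDerivedFrom", "Sample"),
    ("Sequence", "wasDerivedFrom", "Meta-genome"),
    ("New results", "wasDerivedFrom", "Sequence"),
    ("New results", "wasRevisionOf", "Old results"),
]


def trace_back(entity, statements, relation="wasDerivedFrom"):
    """Follow `relation` edges breadth-first from `entity` and list what it stems from."""
    chain = []
    seen = {entity}
    frontier = [entity]
    while frontier:
        current = frontier.pop(0)
        for subj, rel, obj in statements:
            if subj == current and rel == relation and obj not in seen:
                seen.add(obj)
                chain.append(obj)
                frontier.append(obj)
    return chain


if __name__ == "__main__":
    print(trace_back("New results", STATEMENTS))
    print(trace_back("New results", STATEMENTS, relation="wasRevisionOf"))
```

The `seen` set guards against cycles, which should not occur in well-formed provenance but can appear in hand-edited records.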
  17. 17. Activities
       • What happened? When? Who?
       • What was used and generated?
       • Why was this workflow started?
       • Which workflow ran? Where?
     Why do I need this?
       i. To see which analysis was performed
       ii. To find out who did what
       iii. What was the metagenome used for?
       iv. To understand the whole process: “make me a Methods section”
       v. To track down inconsistencies
     Relations: used, wasGeneratedBy, wasStartedAt "2012-06-21", wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan, hadRole
     Diagram labels: Sample, Sequencing, Metagenome, Workflow run, Results, Alice, Lab technician, Workflow server, Workflow definition
  18. 18. PROV model
     http://www.w3.org/TR/prov-primer/
     Provenance Working Group
     Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
  19. 19. Provenance of what?
       • Who made the (content of) research object? Who maintains it?
       • Who wrote this document? Who uploaded it?
       • Which CSV was this Excel file imported from?
       • Who wrote this description? When? How did we get it?
       • What is the state of this RO? (Live or Published?)
       • What did the research object look like before? (Revisions) – are there newer versions?
       • Which research objects are derived from this RO?
  20. 20. Research object model at a glance
       • Research Object «ore:Aggregation» «ro:ResearchObject»: ore:aggregates Resources and Annotations
       • Resource «ore:AggregatedResource» «ro:Resource»
       • Annotation «oa:Annotation» «ro:AggregatedAnnotation»: oa:hasTarget a Resource, oa:hasBody an Annotation graph
       • Annotation graph «trig:Graph»
       • Manifest «ore:ResourceMap» «ro:Manifest»
  21. 21. Wf4Ever architecture
     Diagram labels: Blob store; Graph store; Resource; Uploaded to; Manifest; Annotation graph; Research object; Annotation; ORE Proxy; External reference; Redirects to; If RDF, import as named graph; SPARQL; REST resources
     http://www.wf4ever-project.org/wiki/display/docs/RO+API+6
  22. 22. Where do RO annotations come from?
       • Imported from uploaded resources, e.g. embedded in workflow-specific format (creator: unknown!)
       • Created by users filling in Title, Description etc. on website
       • By automatically invoked software agents, e.g.:
           • A workflow transformation service extracts the workflow structure as RDF from the native workflow format
           • Provenance trace from a workflow run, which describes the origin of aggregated output files in the research object
  23. 23. How we are using the OA model
       • Multiple oa:Annotation contained within the manifest RDF and aggregated by the RO.
       • Provenance (PAV, PROV) on oa:Annotation (who made the link) and body resource (who stated it)
       • Typically a single oa:hasTarget, either the RO or an aggregated resource.
       • oa:hasBody to a trig:Graph resource (read: RDF file) with the “actual” annotation as RDF:
           <workflow1> dct:title "The wonderful workflow" .
       • Multiple oa:hasTarget for relationships, e.g. graph body:
           <workflow1> roterms:inputSelected <input2.txt> .
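The pattern on slide 23 (one annotation, one body graph, one or more targets) can be sketched as plain triples. This is only an illustration, not the Wf4Ever implementation: the base URI `http://example.org/ro/`, the body file paths and the helper names `make_annotation` and `to_ntriples` are all hypothetical, and the OA namespace URI is assumed to be `http://www.w3.org/ns/oa#`.

```python
import uuid

OA = "http://www.w3.org/ns/oa#"  # assumed OA namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/ro/"  # hypothetical research object base URI


def make_annotation(targets, body, annotation_uri=None):
    """Return (subject, predicate, object) triples linking the targets to a body graph."""
    ann = annotation_uri or "urn:uuid:" + str(uuid.uuid4())
    triples = [(ann, RDF_TYPE, OA + "Annotation"), (ann, OA + "hasBody", body)]
    triples += [(ann, OA + "hasTarget", target) for target in targets]
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    # Single target: a title stated in the body graph about workflow1.
    title = make_annotation([BASE + "workflow1"], BASE + ".ro/annotations/title.ttl")
    # Two targets: the body graph relates workflow1 to input2.txt.
    relation = make_annotation(
        [BASE + "workflow1", BASE + "input2.txt"],
        BASE + ".ro/annotations/inputs.ttl",
    )
    print(to_ntriples(title + relation))
```

The statements inside the body graphs (the `dct:title` and `roterms:inputSelected` lines on the slide) live in the files the annotations point to, so they do not appear in this output.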
  24. 24. What should we also be using?
       • Motivations
           • myExperiment: commenting, describing, moderating, questioning, replying, tagging – made our own vocabulary as OA did not exist
       • Selectors on compound resources
           • E.g. description on processors within a workflow in a workflow definition. How do you find this if you only know the workflow definition file?
           • Currently: Annotations on separate URIs for each component, described in workflow structure graph, which is body of annotation targeting the workflow definition file
       • Importing/referring to annotations from other OA systems (how to discover those?)
  25. 25. What is the benefit of OA for us?
       • Existing vocabulary – no need for our project to try to specify and agree on our own way of tracking annotations.
       • Potential interoperability with third-party annotation tools
           • E.g. We want to annotate a figure in a paper and relate it to a dataset in a research object – don’t want to write another tool for that!
       • Existing annotations (pre research object) in Taverna and myExperiment map easily to OA model
  26. 26. History lesson (AO/OAC/OA)
     When forming the Wf4Ever Research Object model, we found:
       • Open Annotation Collaboration (OAC)
       • Annotation Ontology (AO)
     What was the difference?
       • Technically, for Wf4Ever’s purposes: They are equivalent
       • Political choice: AO – supported by Utopia (Manchester)
     We encouraged the formation of W3C Open Annotation Community Group and a joint model
     Next: Research Object model v0.2 and RO Bundle will use the OA model – since we only used 2 properties, mapping is 1:1
     http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations
  27. 27. Saving a research object: RO bundle
       • Single, transferable research object
       • Self-contained snapshot
       • Which files in ZIP, which are URIs? (Up to user/application)
       • Regular ZIP file, explored and unpacked with standard tools
       • JSON manifest is programmatically accessible without RDF understanding
       • Works offline and in desktop applications – no REST API access required
       • Basis for RO-enabled file formats, e.g. Taverna run bundle
       • Exchanged with myExperiment and RO tools
  28. 28. Workflow Results Bundle
     Aggregating in Research Object; ZIP folder structure (RO Bundle)
     Diagram labels, in transcript order:
       • workflowrun.prov.ttl (RDF)
       • outputA.txt
       • outputC.jpg
       • outputB/
       • intermediates/
       • 1.txt, 2.txt, 3.txt
       • de/def2e58b-50e2-4949-9980-fd310166621a.txt
       • inputA.txt
       • workflow
       • URI references
       • attribution
       • execution environment
       • mimetype: application/vnd.wf4ever.robundle+zip
       • .ro/manifest.json
     https://w3id.org/bundl
  29. 29. RO Bundle
     .ro/manifest.json format, with callouts:
       • What is aggregated? File in ZIP or external URI
       • Who made the RO? When?
       • Who?
       • External URIs placed in folders
       • Embedded annotation
       • External annotation, e.g. blogpost
       • JSON-LD context → RDF
       • RO provenance
     Note: JSON "quotes" not shown above for brevity
     http://json-ld.org/
     http://orcid.org/
     https://w3id.org/bundle
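Slides 27 to 29 describe the RO bundle as a regular ZIP file with a `mimetype` entry and a JSON manifest at `.ro/manifest.json`. The sketch below writes such a file with Python's standard `zipfile` module. The manifest keys and the context URL in `EXAMPLE_MANIFEST` are assumptions loosely modelled on the RO Bundle draft, because the actual listing is not part of this transcript, and should be checked against https://w3id.org/bundle. Placing `mimetype` first and uncompressed is likewise an assumption borrowed from similar ZIP-based formats. `write_bundle` is a hypothetical helper name.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"  # media type shown on the slide

# Assumed manifest layout; key names and values are illustrative only.
EXAMPLE_MANIFEST = {
    "@context": ["https://w3id.org/bundle/context"],
    "id": "/",
    "createdOn": "2013-06-24T00:00:00Z",
    "createdBy": {"name": "Alice"},
    "aggregates": ["/outputA.txt", "http://example.com/blogpost"],
    "annotations": [
        {"about": "/outputA.txt", "content": "/.ro/annotations/outputA.ttl"}
    ],
}


def write_bundle(target, files, manifest):
    """Write `files` (archive name -> bytes) plus a JSON manifest into a ZIP bundle."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Assumption: mimetype goes first and is stored uncompressed so that
        # tools can identify the file type without unpacking anything.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            bundle.writestr(name, data)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_bundle(buffer, {"outputA.txt": b"hello"}, EXAMPLE_MANIFEST)
    with zipfile.ZipFile(buffer) as bundle:
        print(bundle.namelist())
```

Because the manifest is ordinary JSON, a consumer can read the aggregated resources with `json.loads` alone, which matches the slide's point that no RDF understanding is required.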
  30. 30. Research Object as RDFa
     http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/
     http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/
       <h3 property="dc:title">Common Motifs in Scientific Workflows:<br>An Empirical Analysis</h3>
       <body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">
       <li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
       <li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li>
       <span property="dc:creator prov:wasAttributedTo" resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
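To illustrate how the RDFa markup on slide 30 exposes the aggregated resources, the sketch below pulls out the `href` of every element whose `property` attribute includes `ore:aggregates`. It uses only the standard library and is deliberately not a full RDFa processor: it ignores prefix declarations and does not resolve relative URIs. `AggregatesParser`, `aggregated_resources` and `SNIPPET` are hypothetical names, and the snippet is a trimmed copy of the markup on the slide with a closing `</body>` added.

```python
from html.parser import HTMLParser


class AggregatesParser(HTMLParser):
    """Collect href values of elements whose RDFa property includes ore:aggregates."""

    def __init__(self):
        super().__init__()
        self.aggregated = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An RDFa property attribute may hold several space-separated terms.
        properties = (attributes.get("property") or "").split()
        if "ore:aggregates" in properties and attributes.get("href"):
            self.aggregated.append(attributes["href"])


SNIPPET = """
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject">
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li>
</body>
"""


def aggregated_resources(html):
    """Return the hrefs marked as ore:aggregates, in document order."""
    parser = AggregatesParser()
    parser.feed(html)
    return parser.aggregated


if __name__ == "__main__":
    for href in aggregated_resources(SNIPPET):
        print(href)
```

A real consumer would more likely hand the page to an RDFa-aware parser and query the resulting graph; the point here is only that the aggregation is readable straight from the HTML.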
  31. 31. W3C community group for RO
     http://www.w3.org/community/rosc/