P-Plan

Provenance models are crucial for describing experimental results in science. The W3C Provenance Working Group has recently released the PROV family of specifications for provenance on the Web. While provenance focuses on what is executed, it is important in science to publish the general methods that describe scientific processes at a more abstract and general level. In this paper, we propose P-PLAN, an extension of PROV to represent plans that guided the execution and their correspondence to provenance records that describe the execution itself. We motivate and discuss the use of P-PLAN and PROV to publish scientific workflows as Linked Data.
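
As a rough illustration of the correspondence between provenance records and plans, the sketch below keeps a few triples in memory and looks up the plans that an executed entity belongs to. The predicate names p-plan:correspondsTo and p-plan:isVariableOfPlan are the ones used in the SPARQL query on slide 22 of this deck; the ex: resources, the tuple-based store, and the function names are assumptions made up for this example and are not part of P-PLAN or PROV.

```python
# Minimal sketch: plain Python tuples stand in for an RDF store.
# All ex: resources below are hypothetical example data.
RDF_TYPE = "rdf:type"

TRIPLES = {
    # Execution side (provenance): an entity that was actually used.
    ("ex:dataset1", RDF_TYPE, "prov:Entity"),
    ("ex:dataset1", RDF_TYPE, "p-plan:Entity"),
    ("ex:dataset1", "p-plan:correspondsTo", "ex:inputVar"),
    # An entity with no link to any plan.
    ("ex:dataset2", RDF_TYPE, "prov:Entity"),
    # Plan side (method): a variable and the plan it belongs to.
    ("ex:inputVar", RDF_TYPE, "p-plan:Variable"),
    ("ex:inputVar", "p-plan:isVariableOfPlan", "ex:planA"),
}


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is stored."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def plans_for_entity(triples, entity):
    """Follow entity -> variable -> plan, checking the types along the way."""
    required = {"prov:Entity", "p-plan:Entity"}
    if not required <= objects(triples, entity, RDF_TYPE):
        return set()
    plans = set()
    for variable in objects(triples, entity, "p-plan:correspondsTo"):
        if "p-plan:Variable" in objects(triples, variable, RDF_TYPE):
            plans |= objects(triples, variable, "p-plan:isVariableOfPlan")
    return plans


print(plans_for_entity(TRIPLES, "ex:dataset1"))  # {'ex:planA'}
print(plans_for_entity(TRIPLES, "ex:dataset2"))  # set()
```

A real deployment would run the SPARQL query against a triple store; the tuple set here only makes the two-hop lookup (entity to variable, variable to plan) easy to see.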

  • This presentation was given by Yolanda Gil at ISWC 2012

  1. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data
       • Daniel Garijo, OEG-DIA, Facultad de Informática, Universidad Politécnica de Madrid, dgarijo@delicias.dia.fi.upm.es
       • Yolanda Gil, Information Sciences Institute and Department of Computer Science, University of Southern California, http://www.isi.edu/~gil
     Slide footer: USC Information Sciences, Yolanda Gil, gil@isi.edu
  2. W3C PROV: http://www.w3.org/2011/prov/
  3. A Workflow Execution in PROV
       Benefits:
         • Makes the work inspectable
       Shortcomings:
         • Hard to reproduce
         • Not efficient to reuse
  4. Reproducibility
  5. Replication of Crohn’s Disease Association Study from [Duerr et al, Science 06]
  6. Replication of Early-Onset Parkinson’s Disease Study from [Bayrakli et al, Human Mutation 07]
  7. Reusability
       • Lower cost: “Scientists and engineers spend more than 60% of their time just preparing the data for model input or data-model comparison” (NASA A40)
       • Better quality: “We write QC without thinking about the best way to do the WC. Such approaches perpetuate mediocrity. If someone did it right once, it would benefit many people.” (EC WF CQ)
       • More efficient: “I often see that I’m repeating the work that 100 other people have been doing to obtain and process the data.” (EC WF CQ)
  8. Access to Data Analytics Expertise [Science 2011]
  9. The TB-Drugome [Kinnings et al., PLoS CompBio 2010]
       • “We report a computational approach to construct a drug-target network… applied to the genome of tuberculosis…”
       • “The TB-drugome reveals that approximately one-third of the drugs examined have the potential to… treat tuberculosis…”
       • “The methodology can be applied to other pathogens of interest…”
  10. Executable and Abstract Workflow
       • What I actually run
       • The method that I followed
  11. The Ontology for Biomedical Investigations: http://obi-ontology.org/
  12. Semantic Web Applications in Neuromedicine (SWAN) Ontology: http://www.w3.org/TR/hcls-swan/
  13. Research Objects: http://www.wf4ever-project.org/research-object-model
  14. Executable and Abstract Workflow
       • What I actually run
       • The method that I followed
  15. Semantic Workflows in Wings [Gil et al 10] [Gil et al 09] [Kim & Gil et al 08] [Kim et al 06]
       Workflows are augmented with semantic constraints
         • Each workflow constituent has a variable associated with it
             – Workflow components, arguments, datasets
         • Constraints are used to restrict workflow variables
         • Can define abstract classes of components
             – Concrete components model executable codes
       Workflow reasoners propagate and use semantic constraints
       Uses semantic web standards: OWL/RDF, SPARQL, rules
  16. Ontologies for Data and Workflow Components
       [Figure: ontology diagrams for data and workflow components. Labels include Documents, Language, Correlation Scoring, ChiSq, InfoGain, MutInfo, htmlDoc, latexDoc, Modeler, Model, DecTree, Linear Regression, SVM, C4.5, J48, WSJ-2010, MatLab_LR, R_LR, Weka-C4.5.]
  17. Semantic Workflows: Abstractions Based on Ontologies [Gil et al 2011]
       [Figure: workflow steps labeled Term Weighting and Correlation Scoring, with code labels TF-IDF and Chi Squared.]
  18. Publishing Workflows on the Web with OPMW: http://www.opmw.org
       Extension of the Open Provenance Model
       Legend: red is the OPM model; black is the OPMW profile (extension)
       [Figure: OPMW diagram with two panels, Workflow Template and Execution Results. Labels include hasWorkflowTemplate, hasProcessTemplate, hasArtifactTemplate, hasAbstractComponent, hasSpecificComponent, hasProcess, hasArtifact, used, wasGeneratedBy, wasControlledBy, account, subClassOf.]
  19. Published as Linked Data: Executed Workflow + Abstract Workflow + Data + Steps + Codes…
  20. P-PLAN: Extending PROV to represent plans
       Plan representations can be very complex
         • Iteration, conditionals, decomposition, etc.
       P-PLAN is a core representation with only:
         • Sequences of steps
         • Parallel steps
       P-PLAN, like PROV, is a DAG
         • Simplest representation of plans
  21. P-Plan
  22. Queries about Workflows Published as Linked Data
       Find all abstract workflows (?plan) in which a given entity (?entity) has been used when executing them:
         PREFIX prov: <http://www.w3.org/ns/prov#>
         PREFIX p-plan: <http://purl.org/net/p-plan#>
         SELECT DISTINCT ?plan WHERE {
           ?entity a p-plan:Entity, prov:Entity ;
                   p-plan:correspondsTo ?templVariable .
           ?templVariable a p-plan:Variable ;
                          p-plan:isVariableOfPlan ?plan .
         }
  23. Conclusions
       Linked data as a vehicle to publish science processes
         • Workflows, experiments, …
       Important to publish method, not just provenance
         • Reproducibility, efficiency, access to expertise
       W3C PROV useful to publish execution
       P-PLAN is an extension of PROV for publishing methods
         • Plan, step, variable
       P-PLAN is applicable beyond science
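
To make the core representation from slide 20 more concrete (sequences of steps, parallel steps, and the requirement that a plan is a DAG), here is a small sketch. The step names and the dictionary layout are hypothetical, chosen only for illustration; the sketch relies on Python's standard graphlib module (Python 3.9 or later), not on any P-PLAN tooling.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan: each step maps to the set of steps that must precede it.
PLAN = {
    "load": set(),
    "termWeighting": {"load"},
    "correlationScoring": {"load"},
    "model": {"termWeighting", "correlationScoring"},
}


def is_dag(preceded_by):
    """Return True if the precedence links contain no cycle."""
    try:
        TopologicalSorter(preceded_by).prepare()
        return True
    except CycleError:
        return False


def parallel_levels(preceded_by):
    """Group steps into levels; steps within one level may run in parallel."""
    sorter = TopologicalSorter(preceded_by)
    sorter.prepare()  # raises CycleError if the plan is not a DAG
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose predecessors are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


print(is_dag(PLAN))           # True
print(parallel_levels(PLAN))  # [['load'], ['correlationScoring', 'termWeighting'], ['model']]
```

Steps that share an inner list have no ordering between them, which matches what the slide calls parallel steps; consecutive lists form the sequence.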