Representing Microarray Experiment Metadata Using Provenance Models

1,292
-1

Published on

James McCusker and Deborah L. McGuinness


Presentation Abstract:
MAGE (MicroArray and Gene Expression) representations are primarily representations of workflow: a process was used to derive biomaterial A from biomaterial B. This representation is ideally suited for representation using provenance models such as OPM (Open Provenance Model) and PML (Proof Markup Language). We demonstrate methods and tools, MAGE2OPM and MAGE2PML, to convert RDF representations of MAGE graphs to OPM and PML respectively. We analyze the coverage of representation for each model. We argue that provenance models are sufficient to represent biomedical experimental metadata in general, and may provide a useful point of reference for unifying information from multiple systems, including biospecimen management, Laboratory Information Systems (LIMS), and computational workflow automation tools. This unification results in a complete, computationally useful derivational picture of biomedical experimental data.

Published in: Education, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,292
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
14
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide













































  • Representing Microarray Experiment Metadata Using Provenance Models

    1. 1. The Tetherless World Constellation Representing Microarray Experiment Metadata Using Provenance Models Jim McCusker and Deborah L. McGuinness Tetherless World Constellation, Rensselaer Polytechnic Institute http://tw.rpi.edu
    2. 2. Observation • experimental metadata (n): The information about an experiment that describes it in sufficient detail for someone to understand and reproduce the experiment. • provenance in science (n): The information about an experiment that describes it in sufficient detail for someone to understand and reproduce the experiment.
    3. 3. Experimental metadata is provenance.
    4. 4. Examples: Provenance Representations All of these are representations of provenance.
    5. 5. Examples: Provenance Representations • Computational Workflows All of these are representations of provenance.
    6. 6. Examples: Provenance Representations • Computational Workflows • Experimental Workflows All of these are representations of provenance.
    7. 7. Examples: Provenance Representations • Computational Workflows • Experimental Workflows • Biospecimen Management All of these are representations of provenance.
    8. 8. Examples: Provenance Representations • Computational Workflows • Experimental Workflows • Biospecimen Management • Data Sources, Ownership & Attribution All of these are representations of provenance.
    9. 9. Hypothesis • If experimental metadata is provenance: • Then a good model of provenance can be used to encode experimental metadata as well as a domain-specific metadata. • Then a general model of provenance can model the entirety of experimental metadata. • We try to evaluate this by mapping microarray metadata onto provenance graphs.
    10. 10. Why a generic provenance model?
    11. 11. An Integrated View of Provenance
    12. 12. An Integrated View of Provenance Transformation Normalization Scan Hybridization Labeled Cell Extract Extract Line
    13. 13. An Integrated View of Provenance Transformation Normalization Scan Hybridization Labeled Cell Extract Section Biopsy Participant Extract Line
    14. 14. An Integrated View of Provenance Transformation Normalization Clinical Scan Diagnosis Hybridization Labeled Cell Extract Section Biopsy Participant Extract Line
    15. 15. An Integrated View of Provenance Transformation Normalization Clinical Scan Diagnosis Hybridization DOB Labeled Cell Extract Section Biopsy Participant Extract Line
    16. 16. An Integrated View of Provenance Transformation Normalization Histology Clinical Scan Image Diagnosis Hybridization Slide DOB Labeled Cell Extract Section Biopsy Participant Extract Line
    17. 17. An Integrated View of Provenance Transformation Lab Lab Lab Lab Results Results Normalization Results Results Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line
    18. 18. An Integrated View of Provenance Transformation Lab Lab Adverse Adverse Lab Lab Adverse Adverse Results Results Events Events Normalization Results Results Events Events Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line
    19. 19. An Integrated View of Provenance Transformation Lab Lab Adverse Adverse Lab Lab Adverse Adverse Results Results Events Events Normalization Results Results Events Events Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line Biopsy Event Date
    20. 20. Related Work, ABC Soup •MAGE (MicroArray and Gene Expression Object Model •MAGE-TAB: Tab-delimited spreadsheet of MAGE. •IDF (Investigation Design Format) •SDRF (Sample and Data Relationship Format) •MGED (Microarray Gene Expression Data)Ontology and MGED Society •OPM (Open Provenance Model) •PML (Proof Markup Language)
    21. 21. General Models of Provenance: Open Provenance Model
    22. 22. General Models of Provenance: Proof Markup Language
    23. 23. Methods Overview MAGE-TAB MAGE-TAB IDF SDRF
    24. 24. Methods Overview MAGE-TAB MAGE-TAB IDF SDRF Used(IDF) Used(SDRF) MAGE-TAB to MAGE-RDF WasGeneratedBy(RDF) MAGE-RDF
    25. 25. Methods Overview MAGE-TAB MAGE-TAB IDF SDRF Used(IDF) Used(SDRF) MAGE-TAB to mageprov.rules MAGE-RDF mageprovenance.owl Used(Rules) WasGeneratedBy(RDF) Jena Rule Used(OWL) MAGE-RDF Used(RDF) Engine
    26. 26. Methods Overview MAGE-TAB MAGE-TAB IDF SDRF PML & OPM Used(IDF) Used(SDRF) WasGeneratedBy(RDF) MAGE-TAB to mageprov.rules MAGE-RDF mageprovenance.owl Used(Rules) WasGeneratedBy(RDF) Jena Rule Used(OWL) MAGE-RDF Used(RDF) Engine
    27. 27. MAGE-TAB 2 MAGE-RDF: an RDF Version of MAGE • Used Limpopo to parse MAGE-TAB • Big thanks to HCLS SIG for input • (Small) issues with MGED Ontology: • Missing ProtocolApplication, some key properties. • Cannot perform inferencing without ignoring owl:someValuesFrom. • Contains 234 owl:Classes with 193 owl:someValuesFrom restrictions. • Should be using integrity constraint instead. • http://magetab2rdf.googlecode.com
    28. 28. Methods: MAGE-RDF to Open Provenance Model • 16 rules in Jena Example Rule: Inference Language [opm.generated.sample: (?exp rdf:type mged:Experiment), • 15 subClass (?exp mage:has_protocol_application ?pa), makeTemp(?gen), mappings (?pa mged:has_protocol ?protocol), (?pa mage:has_derivative ?sample) -> • 10 individuals (?gen rdf:type opm:WasGeneratedBy), (?gen opm:cause ?protocol), (Roles) (?gen opm:account ?exp), (?gen opm:effect ?sample), (?sample rdf:type opm:Artifact) • 2 subProperty ] mappings
    29. 29. Methods: MAGE-RDF to Proof Markup Language • 4 rules in Jena Example Rule: Inference [pml.protocolApplication: (?pa rdf:type mage:ProtocolApplication), Language (?pa mage:has_derivation_source ?derivation), (?pa mged:has_protocol ?protocol), (?pa mage:has_derivative ?derived), • 14 subClass (?pa mged:has_protocol ?protocol), (?derivedNodeSet pmlj:hasConclusion ?derived), (?derivationNodeSet pmlj:hasConclusion ?derivation), mappings -> makeTemp(?antecedentList) (?pa rdf:type pmlj:InferenceStep), • 2 subProperty (?pa pmlj:hasInferenceRule ?protocol), (?pa pmlj:hasIndex 1), mappings (?derivedNodeSet pmlj:isConsequentOf ?pa), (?pa pmlj:hasAntecedentList ?antecedentList), (?antecedentList rdf:type pmlj:NodeSetList), (?antecedentList ds:first ?derivationNodeSet) ]
    30. 30. Evaluation: Declarative Mapping Pros: Cons: • Easy to understand. • “Unclean” • Easy to validate. representations. • Little programming • More difficult to involved. debug. • Many languages. • Many languages. • Easy integration with • Logic programming existing ontologies. not everyone's cup of tea. • Powerful rule languages available.
    31. 31. Evaluation: Expressiveness & Coverage
    32. 32. Evaluation: Expressiveness & Coverage Missing in OPM: • See below.
    33. 33. Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values (could have been variable mappings)
    34. 34. Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values Could have added (could have been to OPM and PML: variable mappings) • Publications • Time • Comments
    35. 35. Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values Could have added (could have been to OPM and PML: variable mappings) • Publications MAGE comments are • Time structured name-value pairs. • Comments These would have to be represented as PML Information and OPM Artifacts.
    36. 36. Evaluation: PML Visualization
    37. 37. Evaluation: PML Visualization
    38. 38. Evaluation: OPM Visualization
    39. 39. Evaluation: OPM Visualization
    40. 40. Future Work • Integration with: • Tissue management tools • Clinical data sources • Workflow engines (Taverna already exports OPM) • Use case-based evaluation of sufficiency.
    41. 41. Conclusions • Experimental metadata is provenance. • All experimental provenance can benefit from a common model. • MAGE-provenance means 9,975 assays ready for conversion to a common model. • OPM provides slightly better modeling coverage and workflow integration. • PML provides better reasoning coverage and theorem prover integration.
    42. 42. Acknowledgements and References • Tetherless World (Deborah McGuinness et al.) • Krauthammer Lab (Michael Krauthammer et al.) • W3C Health Care and Life Sciences SIG. • LIMPOPO: limpopo.sourceforge.net • MAGE-TAB: magetab.sourceforge.net • MGED Society: www.mged.org • PML: www.inference-web.org • OPM: www.openprovenance.org • Jiao Tao, Li Ding, Deborah L. McGuinness, Instance Data Evaluation for Semantic Web-Based Knowledge Management Systems, 42th Hawaii International Conference on System Sciences (HICSS-42), January 2009.

    ×