Representing Microarray Experiment Metadata Using Provenance Models
Upcoming SlideShare
Loading in...5
×
 

Representing Microarray Experiment Metadata Using Provenance Models

on

  • 1,738 views

James McCusker and Deborah L. McGuinness...

James McCusker and Deborah L. McGuinness


Presentation Abstract:
MAGE (MicroArray and Gene Expression) representations are primarily representations of workflow: a process was used to derive biomaterial A from biomaterial B. This representation is ideally suited for representation using provenance models such as OPM (Open Provenance Model) and PML (Proof Markup Language). We demonstrate methods and tools, MAGE2OPM and MAGE2PML, to convert RDF representations of MAGE graphs to OPM and PML respectively. We analyze the coverage of representation for each model. We argue that provenance models are sufficient to represent biomedical experimental metadata in general, and may provide a useful point of reference for unifying information from multiple systems, including biospecimen management, Laboratory Information Systems (LIMS), and computational workflow automation tools. This unification results in a complete, computationally useful derivational picture of biomedical experimental data.

Statistics

Views

Total Views
1,738
Views on SlideShare
1,729
Embed Views
9

Actions

Likes
1
Downloads
13
Comments
0

1 Embed 9

http://www.slideshare.net 9

Accessibility

Categories

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />

Representing Microarray Experiment Metadata Using Provenance Models Representing Microarray Experiment Metadata Using Provenance Models Presentation Transcript

  • The Tetherless World Constellation Representing Microarray Experiment Metadata Using Provenance Models Jim McCusker and Deborah L. McGuinness Tetherless World Constellation, Rensselaer Polytechnic Institute http://tw.rpi.edu
  • Observation • experimental metadata (n): The information about an experiment that describes it in sufficient detail for someone to understand and reproduce the experiment. • provenance in science (n): The information about an experiment that describes it in sufficient detail for someone to understand and reproduce the experiment.
  • Experimental metadata is provenance.
  • Examples: Provenance Representations All of these are representations of provenance.
  • Examples: Provenance Representations • Computational Workflows All of these are representations of provenance.
  • Examples: Provenance Representations • Computational Workflows • Experimental Workflows All of these are representations of provenance.
  • Examples: Provenance Representations • Computational Workflows • Experimental Workflows • Biospecimen Management All of these are representations of provenance.
  • Examples: Provenance Representations • Computational Workflows • Experimental Workflows • Biospecimen Management • Data Sources, Ownership & Attribution All of these are representations of provenance.
  • Hypothesis • If experimental metadata is provenance: • Then a good model of provenance can be used to encode experimental metadata as well as a domain-specific metadata. • Then a general model of provenance can model the entirety of experimental metadata. • We try to evaluate this by mapping microarray metadata onto provenance graphs.
  • Why a generic provenance model?
  • An Integrated View of Provenance
  • An Integrated View of Provenance Transformation Normalization Scan Hybridization Labeled Cell Extract Extract Line
  • An Integrated View of Provenance Transformation Normalization Scan Hybridization Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Normalization Clinical Scan Diagnosis Hybridization Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Normalization Clinical Scan Diagnosis Hybridization DOB Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Normalization Histology Clinical Scan Image Diagnosis Hybridization Slide DOB Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Lab Lab Lab Lab Results Results Normalization Results Results Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Lab Lab Adverse Adverse Lab Lab Adverse Adverse Results Results Events Events Normalization Results Results Events Events Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line
  • An Integrated View of Provenance Transformation Lab Lab Adverse Adverse Lab Lab Adverse Adverse Results Results Events Events Normalization Results Results Events Events Histology Clinical Scan Image Diagnosis Blood Hybridization Slide DOB Sample Labeled Cell Extract Section Biopsy Participant Extract Line Biopsy Event Date
  • Related Work, ABC Soup •MAGE (MicroArray and Gene Expression Object Model •MAGE-TAB: Tab-delimited spreadsheet of MAGE. •IDF (Investigation Design Format) •SDRF (Sample and Data Relationship Format) •MGED (Microarray Gene Expression Data)Ontology and MGED Society •OPM (Open Provenance Model) •PML (Proof Markup Language)
  • General Models of Provenance: Open Provenance Model
  • General Models of Provenance: Proof Markup Language
  • Methods Overview MAGE-TAB MAGE-TAB IDF SDRF
  • Methods Overview MAGE-TAB MAGE-TAB IDF SDRF Used(IDF) Used(SDRF) MAGE-TAB to MAGE-RDF WasGeneratedBy(RDF) MAGE-RDF
  • Methods Overview MAGE-TAB MAGE-TAB IDF SDRF Used(IDF) Used(SDRF) MAGE-TAB to mageprov.rules MAGE-RDF mageprovenance.owl Used(Rules) WasGeneratedBy(RDF) Jena Rule Used(OWL) MAGE-RDF Used(RDF) Engine
  • Methods Overview MAGE-TAB MAGE-TAB IDF SDRF PML & OPM Used(IDF) Used(SDRF) WasGeneratedBy(RDF) MAGE-TAB to mageprov.rules MAGE-RDF mageprovenance.owl Used(Rules) WasGeneratedBy(RDF) Jena Rule Used(OWL) MAGE-RDF Used(RDF) Engine
  • MAGE-TAB 2 MAGE-RDF: an RDF Version of MAGE • Used Limpopo to parse MAGE-TAB • Big thanks to HCLS SIG for input • (Small) issues with MGED Ontology: • Missing ProtocolApplication, some key properties. • Cannot perform inferencing without ignoring owl:someValuesFrom. • Contains 234 owl:Classes with 193 owl:someValuesFrom restrictions. • Should be using integrity constraint instead. • http://magetab2rdf.googlecode.com
  • Methods: MAGE-RDF to Open Provenance Model • 16 rules in Jena Example Rule: Inference Language [opm.generated.sample: (?exp rdf:type mged:Experiment), • 15 subClass (?exp mage:has_protocol_application ?pa), makeTemp(?gen), mappings (?pa mged:has_protocol ?protocol), (?pa mage:has_derivative ?sample) -> • 10 individuals (?gen rdf:type opm:WasGeneratedBy), (?gen opm:cause ?protocol), (Roles) (?gen opm:account ?exp), (?gen opm:effect ?sample), (?sample rdf:type opm:Artifact) • 2 subProperty ] mappings
  • Methods: MAGE-RDF to Proof Markup Language • 4 rules in Jena Example Rule: Inference [pml.protocolApplication: (?pa rdf:type mage:ProtocolApplication), Language (?pa mage:has_derivation_source ?derivation), (?pa mged:has_protocol ?protocol), (?pa mage:has_derivative ?derived), • 14 subClass (?pa mged:has_protocol ?protocol), (?derivedNodeSet pmlj:hasConclusion ?derived), (?derivationNodeSet pmlj:hasConclusion ?derivation), mappings -> makeTemp(?antecedentList) (?pa rdf:type pmlj:InferenceStep), • 2 subProperty (?pa pmlj:hasInferenceRule ?protocol), (?pa pmlj:hasIndex 1), mappings (?derivedNodeSet pmlj:isConsequentOf ?pa), (?pa pmlj:hasAntecedentList ?antecedentList), (?antecedentList rdf:type pmlj:NodeSetList), (?antecedentList ds:first ?derivationNodeSet) ]
  • Evaluation: Declarative Mapping Pros: Cons: • Easy to understand. • “Unclean” • Easy to validate. representations. • Little programming • More difficult to involved. debug. • Many languages. • Many languages. • Easy integration with • Logic programming existing ontologies. not everyone's cup of tea. • Powerful rule languages available.
  • Evaluation: Expressiveness & Coverage
  • Evaluation: Expressiveness & Coverage Missing in OPM: • See below.
  • Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values (could have been variable mappings)
  • Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values Could have added (could have been to OPM and PML: variable mappings) • Publications • Time • Comments
  • Evaluation: Expressiveness & Coverage Missing in OPM: Missing in PML: • See below. • Parameters • Parameter Values Could have added (could have been to OPM and PML: variable mappings) • Publications MAGE comments are • Time structured name-value pairs. • Comments These would have to be represented as PML Information and OPM Artifacts.
  • Evaluation: PML Visualization
  • Evaluation: PML Visualization
  • Evaluation: OPM Visualization
  • Evaluation: OPM Visualization
  • Future Work • Integration with: • Tissue management tools • Clinical data sources • Workflow engines (Taverna already exports OPM) • Use case-based evaluation of sufficiency.
  • Conclusions • Experimental metadata is provenance. • All experimental provenance can benefit from a common model. • MAGE-provenance means 9,975 assays ready for conversion to a common model. • OPM provides slightly better modeling coverage and workflow integration. • PML provides better reasoning coverage and theorem prover integration.
  • Acknowledgements and References • Tetherless World (Deborah McGuinness et al.) • Krauthammer Lab (Michael Krauthammer et al.) • W3C Health Care and Life Sciences SIG. • LIMPOPO: limpopo.sourceforge.net • MAGE-TAB: magetab.sourceforge.net • MGED Society: www.mged.org • PML: www.inference-web.org • OPM: www.openprovenance.org • Jiao Tao, Li Ding, Deborah L. McGuinness, Instance Data Evaluation for Semantic Web-Based Knowledge Management Systems, 42th Hawaii International Conference on System Sciences (HICSS-42), January 2009.