C-SHALS 2013 - Talk E.Gombocz

Yes, we can!
Lessons from Using Linked Open Data (LOD) and Public Ontologies to Contextualize and Enrich Experimental Data

The talk slides cover the ideas behind “Linked Open Data” and the challenges it meets in reality, applied to a practical scenario of toxicity biomarkers; details on semantic mapping, harmonization, and resource alignment; the lessons learned and their socioeconomic consequences; and a future outlook.

Published in: Technology

  • “YES, we can! Lessons from Using Linked Open Data (LOD) and Public Ontologies to Contextualize and Enrich Experimental Data” at CSHALS 2013
  • Ideal and real world: the LOD cloud is growing, but how many LOD cloud resources are true 5-star open data? And does 5-star status equally mean contextualized, collaborative usability?
  • Not so fast! Issues to address are manifold; it’s just not nicely lining up (“but it’s all RDF, how come?”) – and if so, will those resources be around forever?
  • Multi-Year Toxicity Studies: The study plan results in multi-modal experimental correlations for pre-selection of putative markers on several tissues. After all the years of experiments, are they really biologically representative? We cannot know from this alone.
  • Why we want to use LOD resources: to enrich and annotate experiments for biological qualification of results. Statistics alone are not enough; we’ve seen that.
  • Approach: Analyze needs: basics (from where? version? link quality? persistent?), 5-star LODs, existing ontologies via web services (NCBO BioPortal: TMO, VoID, PROV-O, …)
  • Workflow: Mapping, Harmonization, Ontology Merge, Resource Alignment. Step-by-step walk-through: be aware of obstacles, but don’t let them derail the objectives
  • Step 1: Map experiments to RDF: template generation, scripted transformation, visual mapping review
  • Step 1: Map to RDF – Term harmonization via one or multiple thesauri; select thesauri for classes during mapping
  • Step 2: Use public ontologies – BioPortal example; merge applications ontology with parts of formal ontologies to utilize their structure (applied VoID, PROV-O and elements of TMO to informal applications ontology)
  • Ontology import and merging: building from parts of well-formed public ontologies to final merged application-specific ontology with common vocabularies
  • Explore common relationships for experimental observations between treatments
  • Perform iterative visual SPARQL queries with perturbation ranges for each putative marker to establish a model pattern
  • Enrichment via queries: Public SPARQL endpoints (UniProt, GO, DrugBank, Diseasome, SIDER, Reactome, ChEMBL): import results to enrich the network. Drill out to NCBI BioSystems and Gene: import results to enrich further
  • Common Toxicity marker across 2 compounds (genes and metabolites) and their involvement in biological systems of diseases
  • Common Toxicity marker (genes and metabolites) and their involvement in biological systems of diseases: 2 different treatments, pulled apart for better visual exploration
  • Common Toxicity marker (genes and metabolites) and their common involvement in biological systems (partial view, 2 treatments compared for similarities in perturbation; affected biosystems depicted around each marker)
  • Common Toxicity marker (genes and metabolites) and their involvement in biological systems: review of external resource details for one pathway in which a genomic marker is involved
  • Common Toxicity marker (genes and metabolites) and their involvement in biological systems of diseases: explore relationships in DrugBank and Diseasome and add them selectively to the Knowledge Base
  • Example: Benzene-like toxicity; 18 genomic, 2 metabolomic markers aligned with their biology
  • SPARQL query patterns of multi-modal biomarker sets characteristic of a specific type of toxicity are generated and published as an ‘Applied Semantic Knowledgebase’. Such ASK Arrays can be used via a web browser for toxicity screening of unknown compounds.
  • General: LOD can be used to qualify experiments. Specifics: Confirmed nucleotide synthesis & repair impact; long-term memory effects; dependency; tissue-specific oxidative stress; chronic toxicity
  • Lessons Learned: Yes, we can! Socioeconomic Consequences. Future Outlook: room for improvement
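
The Step 1 notes above (scripted transformation, term harmonization via a thesaurus, entities vs. literals) can be sketched roughly as follows. This is a minimal illustration and not the tooling used in the talk: the `http://example.org/tox/` namespace, the predicate names, the thesaurus entries, and the input row layout are all assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical base namespace; the talk does not state its URI policy.
BASE = "http://example.org/tox/"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative thesaurus mapping synonyms (lowercased) to one preferred label.
# The synonyms are invented for the example.
THESAURUS = {
    "gpx-2": "GPX2",
    "glutathione peroxidase 2": "GPX2",
    "heme oxygenase 1": "HMOX1",
}


def harmonize(term):
    """Return the preferred label for a term, or the trimmed term if unknown."""
    cleaned = term.strip()
    return THESAURUS.get(cleaned.lower(), cleaned)


def uri(local):
    """Mint an entity URI under the assumed base namespace."""
    return "<" + BASE + quote(local) + ">"


def literal(value, datatype):
    """Format a typed literal with an XSD datatype."""
    return '"' + str(value) + '"^^<' + XSD + datatype + ">"


def row_to_triples(row):
    """Map one observation row to N-Triples lines.

    The marker and treatment become entities (URIs); the fold change
    stays a typed literal.
    """
    obs = uri("obs/" + row["id"])
    marker = uri("marker/" + harmonize(row["marker"]))
    treatment = uri("treatment/" + row["treatment"])
    return [
        obs + " " + uri("measures") + " " + marker + " .",
        obs + " " + uri("treatment") + " " + treatment + " .",
        obs + " " + uri("foldChange") + " " + literal(row["fold_change"], "double") + " .",
    ]


# Example row with a non-preferred spelling of the marker name.
example_row = {"id": "1", "marker": " GPX-2 ", "treatment": "compoundA", "fold_change": 12.5}
triples = row_to_triples(example_row)
```

Harmonizing before minting the URI means that two spellings of the same marker end up on the same node, which is what lets later queries find relationships across treatments.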

    1. YES, WE CAN! LESSONS FROM USING LINKED OPEN DATA (LOD) AND PUBLIC ONTOLOGIES TO CONTEXTUALIZE AND ENRICH EXPERIMENTAL DATA
       Erich A. Gombocz, Andrea Splendiani, Mark A. Musen, Robert A. Stanley, Jason A. Eshleman
    2. OUTLINE
       • IDEAS BEHIND “LINKED OPEN DATA”
       • REALITY CHALLENGES
       • PRACTICAL SCENARIO: TOXICITY BIOMARKERS
       • MAPPING, HARMONIZATION, RESOURCE ALIGNMENT
       • LESSONS LEARNED: YES, WE CAN!
       • SOCIOECONOMIC CONSEQUENCES
       • FUTURE OUTLOOK
       • ACKNOWLEDGEMENTS
       • REFERENCES
    3. IDEAS BEHIND “LINKED OPEN DATA”
       IDEAL: ***** Linked (Open) Data = ***** Collaborative Usability
    4. GREAT! LET’S USE THEM FOR ENRICHMENT
    5. REALITY CHECK
       • NOT SO FAST:
         • Inconsistent namespace policies
         • Use of internal, non-formal application ontologies
         • Misaligned public and experimental corporate standards
         • Versioning and provenance issues
         • Reliability from service-level to URI persistence
         • More and more “Open data” are closed for commercial use
         • Serious funding concerns about government-backed resources
    6. HMMM.. CAN WE REALLY USE THEM?
    7. YES, WE CAN! LET’S SEE HOW MUCH WE GAIN …
    8. STUDY SCENARIO
       • HEPATOTOXICITY STUDIES
         • Panel of hepatotoxicants, single oral dose (placebo, low, mid, high) in groups of 4 rats, at 6, 24 and 48 hrs.
         • Metabolic analysis of liver, serum and urine (1603 metabolic components; Bruker LC/MS-MS)
         • Microarray analysis of liver and whole blood (31096 transcript probes; Affymetrix)
         • Statistical biomarker pre-selection at p<0.005, abs fc>10 (genes) and p<0.005, abs fc>2.5 (metabolites)
       • ALCOHOL STUDIES
         • High doses t.i.d. for four days, with and without 24h withdrawal.
         • Metabolic analysis of plasma, liver and brain (1620 metabolic components)
         • Microarray analysis of liver and brain (31096 transcript probes)
         • Statistical biomarker pre-selection at p<0.005, abs fc>5 (genes) and p<0.005, abs fc>2.5
    9. OBJECTIVES: INTERPRET EXPERIMENTAL FINDINGS IN CONTEXT OF BIOLOGICAL FUNCTIONS
       • HOW?
         • Use public LODs to semantically enrich and annotate experiments, and qualify biological relevancy of putative biomarkers
       • WHY?
         • Multi-modal experimental observations from the same perturbation can represent very different biological processes.
         • Pharmacodynamic correlations are not necessarily functionally linked biologically
    10. APPROACH: RESOURCE ANALYSIS
        • What do we need to accomplish objectives?
        • Basics (provenance, versioning, high interlink quality, persistence)
        • Generally applicable, quick & easy solution
        • Focus on “*****” (5-star) resources
        • UniProt, …
        • Use existing formal ontologies (or parts of) whenever possible
        • NCBO BioPortal
    11. ROADMAP TO MAKE IT WORK
        Map Experiments to RDF → Harmonize & Version → Refine Context → Annotate & Add to Knowledge Base → Get Answers from Contextual Queries
        • Namespace, URI Policy
        • Entities vs. Literals, Data Types
        • Scripted Transformations
        • Provenance
        • Concept alignment
        • Vocabularies, Thesauri
        • Ontology Merging
        • Versioning, Attribution
        • Import only what’s needed
        • Iterate Visual SPARQL Queries
        • Establish Classifier Patterns
        Be aware of challenges, BUT: LOD will save you a lot of time in providing biological context for experimental findings!
    24. RESULTS: BIOLOGY CONFIRMED

        | Marker Class | Instance | UniProt AC | Pathway Gene | Protein | Biology |
        |---|---|---|---|---|---|
        | genes | CYP2C40 | P11510 | cp2cc | Cytochrome P450 2C40 | heme binding, iron ion binding, aromatase activity |
        | genes | AKR7A3 | P38918 | akr7a3 | Aflatoxin B1 aldehyde reductase member 3 | detoxification |
        | genes | GPX2 | P83645 | gpx2 | Glutathione peroxidase 2 | response to oxidative stress, negative regulation of inflammatory response |
        | genes | MYC | P09416 | myc | Myc proto-oncogene protein (Transcription factor p64) | regulation of gene transcription, non-specific DNA binding, activates transcription of growth-related genes |
        | genes | MT1A | P02803, Q91ZP8 | mt1a | Metallothionein-1 | metal ion binding |
        | genes | HMOX1 | P06762 | hmox1 | Heme oxygenase 1 | heme catabolic process, negative regulation of DNA binding |
        | genes | FGF21 | Q8VI80 | fgf21 | Fibroblast growth factor 21 (Protein Fgf21) | positive regulation of ERK1 and ERK2 cascade, MAPKKK cascade and cell proliferation |
        | genes | AKR1B8 | Q91W30 | akr1b8 | Aldose reductase-like protein | oxidoreductase activity |
        | genes | TRIB3 | Q9WTQ6 | trib3 | Tribbles homolog 3 | disrupts insulin signaling by binding directly to Akt kinases, expression induced during programmed cell death |
        | genes | YC2 | P46418 | gsta5 | Glutathione S-transferase alpha-5 (EC 2.5.1.18) | response to drug, xenobiotic catabolic process |
        | genes | ABCB1, RGD:619951 | P43245 | abcb1 | Multidrug resistance protein 1 (EC=3.6.3.44) | response to organic cyclic compound, tumor necrosis factor, arsenic-containing substance or ionizing radiation |
        | genes | RGD:1310991 | Q5U2P3 | Zfand2a | AN1-type zinc finger protein 2A | zinc ion binding |
        | genes | GSTP1, GSTP2 | P04906 | gstp1 | Glutathione S-transferase P (EC 2.5.1.18) | response to toxin, xenobiotic metabolic process, response to reactive oxygen species, response to ethanol |
        | genes | RGD:708417 | Q62789 | ugt2p7 | UDP-glucuronosyltransferase 2B7 (UDPGT 2B7) (EC 2.4.1.17) | major importance in conjugation and subsequent elimination of toxic xenobiotics and endogenous compounds |
        | genes | GCLC | P19468 | gclc | Glutamate--cysteine ligase catalytic subunit (EC=6.3.2.2) | response to oxidative stress |
        | genes | TXNRD1 | O89049 | txnrd1 | Thioredoxin reductase 1, cytoplasmic (EC=1.8.1.9) | benzene-containing compound metabolic process, cell redox homeostasis, response to drug |
        | genes | NQO1 | P05982 | nqo1 | NAD(P)H dehydrogenase [quinone] 1 (EC 1.6.5.2) | response to oxidative stress, response to ethanol, superoxide dismutase activity |
        | genes | DDIT4L | Q8VD50 | ddit4l | DNA damage-inducible transcript 4-like protein | negative regulation of signal transduction, inhibits cell growth by regulating TOR signaling pathway |
        | metabolites | Pyroglutamic acid | Q9ER34 | aco2 | Aconitate hydratase, mitochondrial | citrate metabolism, isocitrate metabolism, tricarboxylic acid cycle |
        | metabolites | Choline | Q64057 | aldh7a1 | Alpha-aminoadipic semialdehyde dehydrogenase (EC 1.2.1.31) | betaine biosynthesis via choline pathway, response to DNA damage stimulus |

    25. FROM COMPLEX TO ACTIONABLE
    26. SUMMARY IN A NUTSHELL
        • GENERAL
          • LOD resources can be used to confidently qualify statistical pharmacogenomic findings with systems biological responses
          • Such relationships are key to better decode complex biological functions involved in toxicity.
          • We were able to qualify biomarker patterns for distinct categories of toxicity (Benzene-like, Halogenated compound-like, Alcohol-like). Confirmation of biological viability enables their use for toxicity screening.
        • SPECIFIC INSIGHTS GAINED
          • NUCLEOTIDE SYNTHESIS AND REPAIR: One-carbon metabolism changes are due to differential methylation
          • LONG-TERM MEMORY: Signaling pathway involvement indicates influence on long-term memory storage in brain
          • DEPENDENCY: Ketoacidosis in liver and depletion of biogenic amine precursors relate to alcohol dependency.
          • TISSUE SPECIFICITY: Major changes in purine metabolism suggest inhibition of xanthine oxidase through oxidative stress, while in plasma, changes in biogenic amine precursors, which rebound during withdrawal, were also indicated by the selective depletion of cytosine and cytidine vs. thymidine.
          • CHRONIC TOXICITY: Purine metabolism changes in liver explain observed processes in Krebs cycle and tryptophan pathway indicative of chronic, long-term toxic effects.
    27. TAKE HOME / FUTURE OUTLOOK
        • YES, ENRICHING EXPERIMENTS WITH LOD RESOURCES FACILITATES BETTER AND FASTER QUALIFICATION!
          • In toxicity assessment at pre-clinical stage, biologically validated system changes associated with common toxicity mechanisms provide better a-priori determination of adverse effects of drug combinations.
          • Models for classification of toxicity types (hepato-, nephro-, drug residue-based) were functionally qualified.
        • THERE IS STILL ROOM FOR IMPROVEMENT
          • Permanent URLs, better inter-linking and provenance.
        • NEED TO RECOGNIZE SOCIOECONOMIC BENEFITS
          • Time and money saved should lead to new business models to secure LOD resource funding.
    28. ACKNOWLEDGMENTS
        • Icoria / Cogenics: Pat Hurban, Alan Higgins, Imran Shah, Hongkang Mei, Ed Lobenhofer
        • Bowles Center for Alcohol Studies / UNC: Fulton Crews
        • BMIR / NCBO Stanford: Mark Musen, Trish Whetzel
        • Bio2RDF II: Michel Dumontier
        • SIB / UniProt Consortium: Jerven Bolleman
        • Wikimedia Foundation: Anja Jentzsch
        • Support for Toxicity Studies: NIST ATP #70NANB2H3009, NIAAA #HHSN281200510008C
        • W3C HCLS / Pharmacogenomics SIG
        • IO Informatics: Andrea Splendiani, Jason Eshleman, Robert Stanley
    29. THANK YOU! QUESTIONS? egombocz@io-informatics.com
    30. REFERENCES
        1) LDOW2012 Linked Data on the Web. Bizer C, Heath T, Berners-Lee T, Hausenblas M. WWW Workshop on Linked Data on the Web, 2012 Apr. 16, Lyon, France.
        2) The National Center for Biomedical Ontology. Musen MA, Noy NF, Shah NH, Whetzel PL, Chute CG, Storey MA, Smith B. J Am Med Inform Assoc. 2012 Mar-Apr; 19(2): 190-5.
        3) BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, Musen MA. Nucleic Acids Res. 2011; 39 (Web Server issue): W541-5.
        4) Using SPARQL to Query BioPortal Ontologies and Metadata. Salvadores M, Horridge M, Alexander PR, Fergerson RW, Musen MA, Noy NF. International Semantic Web Conference, Boston, US. LNCS 7650, pp. 180-195, 2012.
        5) The Translational Medicine Ontology and Knowledge Base: driving personalized medicine by bridging the gap between bench and bedside. Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Denney CK, Domarew C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Lebo T, Marshall SM, McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prud’hommeaux E, Samwald M, Schriml L, Tonellato PJ, Whetzel PL, Zhao J, Stephens S, Dumontier M. J. Biomed. Semantics 2011; 2(Suppl 2): S1.
        6) VoID Vocabulary of Interlinked Datasets. Cyganiak R, Zhao J, Alexander K, Hausenblas M. DERI, W3C Note, 6-Mar-2011.
        7) PROV-O: The PROV Ontology. W3C Candidate Recommendation, 11-Dec-2012.
        8) Does network analysis of integrated data help understanding how alcohol affects biological functions? Results of a semantic approach to biomarker discovery. Gombocz EA, Higgins AJ, Hurban P, Lobenhofer EK, Crews FT, Stanley RA, Rockey C, Nishimura T. Biomarker Discovery Summit, 2008 Sept. 29-Oct. 1, Philadelphia, PA.
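
The statistical biomarker pre-selection listed on the study scenario slide (slide 8) amounts to a two-condition filter, which can be sketched as below. The thresholds are the ones given on the slide, reading "abs fc" as absolute fold change; the record layout, the function name, and the sample values are assumptions for illustration and are not data from the studies. The slide does not label the second alcohol-study threshold, so treating it as the metabolite cutoff is also an assumption.

```python
# p-value cutoff stated on the slide for both study types.
P_CUTOFF = 0.005

# Absolute fold-change thresholds per study and marker class, as on slide 8.
# The alcohol "metabolite" entry is inferred: the slide gives 2.5 without a label.
FC_THRESHOLDS = {
    "hepatotoxicity": {"gene": 10.0, "metabolite": 2.5},
    "alcohol": {"gene": 5.0, "metabolite": 2.5},
}


def preselect(observations, study):
    """Keep observations with p < 0.005 and |fold change| above the class threshold."""
    thresholds = FC_THRESHOLDS[study]
    return [
        obs
        for obs in observations
        if obs["p"] < P_CUTOFF and abs(obs["fc"]) > thresholds[obs["kind"]]
    ]


# Made-up example records (hypothetical names and numbers).
sample = [
    {"name": "probe_A", "kind": "gene", "p": 0.001, "fc": -14.2},
    {"name": "probe_B", "kind": "gene", "p": 0.001, "fc": 6.0},
    {"name": "met_C", "kind": "metabolite", "p": 0.004, "fc": 3.1},
    {"name": "met_D", "kind": "metabolite", "p": 0.02, "fc": 8.0},
]

hepatotox_selected = [obs["name"] for obs in preselect(sample, "hepatotoxicity")]
alcohol_selected = [obs["name"] for obs in preselect(sample, "alcohol")]
```

With these made-up records, `probe_B` fails the hepatotoxicity gene threshold but passes the lower alcohol-study threshold, which shows how the same observation can be pre-selected in one study design and not in the other. As the talk stresses, passing this filter says nothing yet about biological relevance.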
