Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity
  1. Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity. Chimezie Ogbuji (1) and Rong Xu (2). (1) Metacognition LLC; (2) Case Western Reserve University
  2. Outline: ◦ Background ◦ Motivation ◦ Literature review / related work ◦ Opportunity / specific example ◦ Hypothesis ◦ Method ◦ Evaluation ◦ Discussion
  3. Background: Controlled biomedical vocabulary systems (and ontologies) play a key role in the analysis of genetic disease. ◦ They are structured, interoperable, and machine-readable. ◦ They facilitate reproducibility of scientific results and the use of intelligent software that can leverage underlying meaning. ◦ Scientific results, and the structured biomedical knowledge they are based on, may be used for multiple, even unanticipated, purposes.
  4. Motivation: We want descriptive relations that form terminology paths between (congenital) diseases and the anatomical entities that become malformed. We want to use these paths as the basis for analyzing and classifying congenital disorders according to their underlying molecular mechanism.
  5. Opportunity: The Gene Ontology (GO) is arguably the most prominent example of how highly organized and structured medical knowledge can be leveraged to facilitate medical genetics. ◦ It has a hierarchy of biological processes involving organ development. The Foundational Model of Anatomy (FMA) is a vast ontology whose objective is to conceptualize the physical objects and spaces that constitute the human body. ◦ It covers macroscopic, microscopic, and sub-cellular canonical anatomy.
  6. Opportunity (continued): The skeletal relations of the GO and the FMA (is_a, part_of, and has_part) have the same meaning. However, there are no immediately usable terminology paths between concepts in the GO's anatomy development process hierarchy and the participating anatomical entities defined in the FMA.
  7. Literature review: ◦ Cellular components function via interaction with each other in a highly complex and interconnected network. ◦ Interdependencies among a cell's molecular components lead to functional, molecular, and causal relationships among distinct phenotypes. ◦ Network-based approaches to disease have the potential to provide a framework for classifying disease, defining susceptibility, predicting disease outcome, and identifying tailored therapeutic strategies. (Barabási et al., "Network Medicine: A Network-based Approach to Human Disease", Nature Reviews Genetics, 2011.)
  8. For over a decade, analysis of biological networks via network and graph theory has revealed the importance of locally dense and well-connected subgraphs (hubs). (Schwikowski et al., "A network of protein-protein interactions in yeast", 2000; Barabási et al. 2011.)
  9. Related work: ◦ Investigation of structural and lexical concordance between anatomy terms in the FMA and SNOMED-CT (Bodenreider & Zhang 2006). ◦ Leveraging this concordance to integrate modules from each for a specific domain (Ogbuji et al. 2010). ◦ Discussion of the logical consequences of using part_of between both anatomical entities (in the FMA) and biological processes (in the GO) (Jimenez-Ruiz et al. 2010).
  10. Opportunity: Cardiovascular disease and development: Understanding the formation of the heart is critical to understanding cardiovascular diseases. The study of genes and gene products involved in cardiovascular development is an important research area. There have been recent efforts to expand the subset of the GO's anatomy development hierarchy involved in heart development.
  11. Marfan Syndrome (MFS): "[…] mainly characterized by aneurysm formation in the proximal ascending aorta, leading to aortic dissection or rupture at a young age when left untreated. The identification of the underlying genetic cause of MFS, namely mutations in the fibrillin-1 gene (FBN1), has further enhanced [...] insights into the complex pathophysiology of aneurysm formation." In the UMLS Metathesaurus: • Finding site: connective tissue structure (SNOMED-CT) • Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
  12. Marfan Syndrome example: In the GO, FBN1 is annotated with the GO_0001501 (skeletal system development) and GO_0007507 (heart development) concepts (amongst others). The former coincides with the more common finding site and classification of MFS as a congenital skeletal disorder, even though associations (causal and otherwise) between MFS and cardiovascular diseases such as aortic root dilation are well documented in the medical literature.
  13. Hypothesis: A high-quality integration of the GO's development process hierarchy with the FMA will have several benefits: ◦ New biological pathways from genetic diseases to the anatomical entities whose development is involved in their underlying molecular mechanisms. ◦ Graph and network analysis can benefit from the increase in connectivity when discovering biologically meaningful motifs. ◦ Classification algorithms can take advantage of the same increase in connectivity.
  14. Figure legend: Copper: annotates a human gene. Gold: does not annotate a human gene.
  15. Method and materials: Integration is performed on the following GO development process hierarchies: ◦ Anatomical structure development ◦ Anatomical structure arrangement ◦ Anatomical structure morphogenesis. Only GO concepts that annotate human genes are considered. In processing the GO, the logical properties of the relations (transitivity, for example) are fully considered. ◦ This holds for every step that follows.
  16. Method and materials (continued): The FMA ontology is loaded (as OWL/RDF) into a triple store for remote querying via SPARQL. The prefix of the human-readable label of each GO concept in the development hierarchies is stemmed and used as the basis for case-insensitive lexical matching, via a SPARQL query, against the primary labels and exact synonyms of FMA classes. FMA classes that match exactly are considered to denote the anatomical entities that participate in the corresponding GO biological process.
  17. Example: GO_0007507 (heart development). Prefix: heart. Matching FMA concept: FMA_7088 (Heart).
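
The matching step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not list the label suffixes or the stemmer used, so `PROCESS_SUFFIXES` is a hypothetical choice and stemming is omitted; in-memory dictionaries stand in for the SPARQL query against the triple store; and `FMA_PLACEHOLDER` is a made-up identifier used only to show a non-match.

```python
# Hypothetical suffixes that mark a development-process label (assumption).
PROCESS_SUFFIXES = ("development", "morphogenesis", "arrangement")


def go_label_prefix(label):
    """Return the lower-cased label without its process suffix, or None."""
    words = label.lower().split()
    if len(words) > 1 and words[-1] in PROCESS_SUFFIXES:
        return " ".join(words[:-1])
    return None


def match_fma(go_labels, fma_labels):
    """Pair GO ids with FMA ids whose label or synonym equals the GO prefix.

    go_labels:  dict of GO id -> human-readable label
    fma_labels: dict of FMA id -> list of names (primary label, exact synonyms)
    """
    # Case-insensitive index from name to the FMA ids carrying that name.
    index = {}
    for fma_id, names in fma_labels.items():
        for name in names:
            index.setdefault(name.lower(), set()).add(fma_id)

    pairs = set()
    for go_id, label in go_labels.items():
        prefix = go_label_prefix(label)
        # Exact match only: a missing or unknown prefix yields no pairs.
        for fma_id in index.get(prefix, ()):
            pairs.add((go_id, fma_id))
    return pairs


GO_LABELS = {
    "GO_0007507": "heart development",
    "GO_0003170": "heart valve development",
}
FMA_LABELS = {
    "FMA_7088": ["Heart"],
    "FMA_PLACEHOLDER": ["Aorta"],  # made-up id, for illustration only
}
PAIRS = match_fma(GO_LABELS, FMA_LABELS)
```

With this toy data only the heart pair from the example slide is produced, because "heart valve" has no exact counterpart among the two FMA names supplied.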
  18. Evaluation: Result: 1644 pairs of development process and anatomical entity. We calculate the Jaccard coefficient of the overlap between hierarchies for 6 major organs and the anatomical development processes they participate in.
  19. Evaluation (continued): Using the GO development process for some FMA organ O as the starting point, the set of all subordinate terms is calculated: GOsubgraph(O). Example: ◦ GO_0007507 (heart development) has GO_0003170 (heart valve development) as a component (via has_part). ◦ GO_0003170 subsumes GO_0003176 (aortic valve development) and has GO_0003179 (heart valve morphogenesis) as a component. ◦ Each of these is considered a subordinate of GO_0007507.
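
Collecting subordinate terms amounts to a transitive traversal over the skeletal relations. The sketch below reproduces the heart example with a breadth-first search. It is an illustration, not the authors' code: the `EDGES` dictionary holds only the three relations named on the slide, and excluding the starting term from its own subgraph is an assumption, since the slides do not say either way.

```python
from collections import deque


def subordinates(root, edges):
    """Return every term reachable from root, excluding root itself.

    edges: dict of term -> direct subordinates (is_a children and
    has_part components), followed transitively.
    """
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in edges.get(term, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Only the relations named in the slide example.
EDGES = {
    # heart development has_part heart valve development
    "GO_0007507": ["GO_0003170"],
    # heart valve development subsumes aortic valve development and
    # has_part heart valve morphogenesis
    "GO_0003170": ["GO_0003176", "GO_0003179"],
}
GO_SUBGRAPH_HEART = subordinates("GO_0007507", EDGES)
```

Here `GO_SUBGRAPH_HEART` contains the three subordinate terms from the example; the `seen` set keeps the traversal finite even if the input contains cycles.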
  20. Evaluation (continued): In a similar fashion, the subordinate anatomical entities for each O among the 6 chosen organs are calculated: FMAsubgraph(O). For each O, we find the GO terms that are both in GOsubgraph(O) and matched with an FMA class that is in FMAsubgraph(O). This set of GO terms is considered the intersecting set, and the Jaccard coefficient is calculated from it together with FMAsubgraph(O) and GOsubgraph(O).
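
One plausible reading of this calculation is sketched below. The slides do not give the formula, so the denominator used here (the sizes of both subgraphs minus the size of the intersecting set) is an assumption, and the identifiers in the usage data are placeholders rather than real GO or FMA terms.

```python
def jaccard_overlap(go_subgraph, fma_subgraph, go_to_fma):
    """Jaccard coefficient of the overlap for one organ.

    go_subgraph:  set of GO ids subordinate to the organ's process
    fma_subgraph: set of FMA ids subordinate to the organ
    go_to_fma:    dict of GO id -> set of matched FMA ids
    """
    # Intersecting set: GO terms matched to an FMA class inside the organ.
    matched = {
        go for go in go_subgraph
        if go_to_fma.get(go, set()) & fma_subgraph
    }
    # Assumed union size: both subgraphs, counting matched terms once.
    union_size = len(go_subgraph) + len(fma_subgraph) - len(matched)
    return len(matched) / union_size if union_size else 0.0


# Placeholder ids: g2 is matched, but to a class outside the organ.
GO_SUB = {"g1", "g2", "g3"}
FMA_SUB = {"f1", "f2", "f3", "f4"}
MAPPING = {"g1": {"f1"}, "g2": {"f9"}}
OVERLAP = jaccard_overlap(GO_SUB, FMA_SUB, MAPPING)
```

In this toy case one GO term of three intersects with four FMA classes, so the coefficient is 1 / (3 + 4 - 1).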
  21. Figure: Jaccard coefficient (overlap)
  22. Evaluation: network connectivity: We calculate the number of new paths from OMIM diseases through their genes to the anatomical entities in the FMA (P+dgo). Similarly, we calculate the number of new paths starting from the genes to additional FMA anatomical entities (P+go).
  23. Network connectivity (continued): Only genes annotated with anatomical development processes that were matched to FMA classes (Genesdev), and the OMIM diseases associated with these genes, were considered.
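
The path counts can be illustrated with a small sketch. It assumes that a new path is a chain disease → gene → GO process → matched FMA class and that each distinct chain counts once; the slides do not say how far paths extend into the FMA, so this is a simplification rather than the authors' procedure. The toy data uses only the FBN1 annotations and the heart match shown in the slides.

```python
def count_new_paths(disease_genes, gene_go, go_fma):
    """Count new paths per disease and per gene.

    disease_genes: dict of disease -> list of associated genes
    gene_go:       dict of gene -> list of annotating GO process ids
    go_fma:        dict of GO process id -> list of matched FMA ids
    Returns (paths per disease, paths per gene); zero counts are dropped.
    """
    per_gene = {}
    for gene, go_terms in gene_go.items():
        n = sum(len(go_fma.get(go, ())) for go in go_terms)
        if n:
            per_gene[gene] = n

    per_disease = {}
    for disease, genes in disease_genes.items():
        # Each disease-gene pairing contributes all of that gene's paths.
        n = sum(per_gene.get(gene, 0) for gene in genes)
        if n:
            per_disease[disease] = n
    return per_disease, per_gene


DISEASE_GENES = {"MFS": ["FBN1"]}
GENE_GO = {"FBN1": ["GO_0001501", "GO_0007507"]}
# Only the match shown on the example slide; other matches are omitted.
GO_FMA = {"GO_0007507": ["FMA_7088"]}
PER_DISEASE, PER_GENE = count_new_paths(DISEASE_GENES, GENE_GO, GO_FMA)
```

Because a disease's count sums over its genes, a disease associated with many genes accumulates paths from every pairing, which is the combinatorial factor to keep in mind when normalizing by gene count.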
  24. Figure: Number of additional P+dgo paths on a logarithmic scale
  25. Figure: Histogram of the distribution of additional P+dgo paths, as a whole and normalized by the number of genes associated with each disease
  26. Figure: Log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths
  27. Evaluation summary: On average, the mapping introduces 9,549 additional P+dgo paths per OMIM disease. On average, each Genesdev gene had 17,037 additional paths to FMA classes. Caveat in normalizing the number of P+dgo paths by the number of genes: ◦ Paths from diseases to anatomical entities introduce a combinatorial factor of disease-gene pairings.
  28. Discussion: The overlap results indicate little overlap between the GO hierarchies and the corresponding FMA hierarchies. This is not surprising, as they cover disparate domains within medicine and one is specific to humans while the other is not.
  29. Discussion (continued): This, along with the size of the FMA as a whole and of the portions mapped to the GO hierarchies, indicates an opportunity to build on the mapping and to integrate both ontologies in a meaningful way. The connectivity results demonstrate a significant increase in biological paths from genetic diseases (and their genes) to the anatomical entities participating in the development process.
  30. Discussion (continued): As these paths are at least as logically and biologically sound as the ontologies they were forged from, we expect that an appreciable number of them will be useful for analysis. To our knowledge, this is the first attempt of this kind to integrate the anatomical structure development, morphogenesis, and arrangement hierarchies in the GO with the FMA.
  31. Limitations: Regarding deductions (formal or otherwise) that follow from an integration of the FMA and GO: ◦ We need to be careful to consider only annotations for humans, or to have a robust way to manage the uncertainty introduced by not doing so.
