
NCBO haendel talk 2013

Part of the NCBO seminar series
http://www.bioontology.org/webinar-series

  • How can we: Help science be more reproducible? Provide access to resources and expertise? Give credit where credit is due? Make data more interoperable and visible? Make science more efficient? The projects that I am going to talk about all involve supporting and strengthening these connections.
  • Need different figure
  • Chris Mungall: A genome is a genome, whether it’s an amoeba or a human. Tweets. Mention models here. Environment too.
  • Chris Mungall: What tends to happen is that multiple non-interoperable
  • Chris Mungall
  • Chris Mungall: Humans must manually integrate. Machines can’t make sense of this alone.
  • Images: Seth Ruffins
  • ICBO paper.
  • Multiple coordinated: need to describe cell and tissue context

    1. Removing roadblocks: leveraging ontologies for data aggregation and computation
       NCBO Seminar series, March 6th, 2013
       Melissa Haendel, on behalf of very many team members
    2. Topics for today
       • The Research Symbiosis
       • Some Integration Projects Leveraging Ontologies
         – A more complete research profile: integrating research resources and person information
         – Improving query across multiple biospecimen repositories
         – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
    3. The Research Symbiosis
       Consult databases • Share resources/data • Publish papers • Contribute to databases • Get funding • Do experiments • The Web
    4. We’ve all been here before: ontologies can help us do better.
       OMIM query                  # of records
       "large bone"                1032
       "enlarged bone"             207
       "big bones"                 22
       "huge bones"                4
       "massive bones"             39
       "hyperplastic bones"        12
       "hyperplastic bone"         44
       "bone hyperplasia"          173
       "increased bone growth"     836
    5. Why not just map to ontology terms?
       Class A                                Class B                      Mapped?  Useful?
       FMA: extensor retinaculum of wrist     MouseAnatomy: retina         Yes      No
       Vivo: legal decision                   Cognitive Atlas: decision    Yes      No
       PlantOntology: Pith                    MouseAnatomy: medulla        Yes      No
       TaxRank: domain                        NCI: protein domain          Yes      No
       ZfishAnat: hypophysis                  MouseAnatomy: pituitary      No       Yes
       FMA: tibia                             FlyAnatomy: tibia            Yes      No
       FMA: colon                             GAZ: Colón, Panama           Yes      No
       Quality: male                          Chebi: maleate 2(-)          Yes      No
       Mapping requires manual work to perform and maintain; string matching for mapping can lead to spurious results; semantics of mappings and provenance are not always clear.
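
The spurious matches in the table above can be reproduced with a very small sketch. The matcher below is a hypothetical, deliberately naive substring test written for illustration; it is not the method used by any real mapping tool. The label pairs come from the slide, with the ontology prefixes dropped.

```python
def naive_match(label_a: str, label_b: str) -> bool:
    """Hypothetical lexical matcher: True if either label contains the other,
    ignoring case. It knows nothing about what the labels mean."""
    a, b = label_a.lower(), label_b.lower()
    return a in b or b in a


# Label pairs taken from the slide's table (ontology prefixes dropped).
PAIRS = [
    ("extensor retinaculum of wrist", "retina"),
    ("legal decision", "decision"),
    ("domain", "protein domain"),
    ("tibia", "tibia"),
    ("male", "maleate 2(-)"),
    ("hypophysis", "pituitary"),
]

RESULTS = {pair: naive_match(*pair) for pair in PAIRS}

for (a, b), matched in RESULTS.items():
    print(f"{a!r} vs {b!r}: matched={matched}")
```

Every pair except the last is reported as a match, although none of those matches is useful, while the one useful correspondence (hypophysis and pituitary) is missed because the labels share no text.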
    6. Topics for today
       • The Research Symbiosis
       • Some Integration Projects Leveraging Ontologies
         – A more complete research profile: integrating research resources and person information
         – Improving query across multiple biospecimen repositories
         – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
    7. CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources
       CTSA 10-001: 100928SB23
       PROJECT #: 00921-0001
    8. About eagle-i: inventories “invisible” resources
       Ontology-based system for collecting and querying research resources
       eagle-i.net
    9. About VIVO
       • Primarily focused on people, activities, and outcomes typically associated with research networking
       • Eager to represent more diverse components of expertise across domains, e.g., exhibits, performances, specifics about research
       • Had worked with core facilities at Cornell to represent labs, equipment, and services
       • Started collaborating with eagle-i to go further with research resources
    10. At the intersection of VIVO and eagle-i
    11. And then was born the “CTSAconnect” project
        www.ctsaconnect.org
        CTSAconnect: Reveal Connections. Realize Potential.
        OK, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it?
        ARG! The Agents, Resources, and Grants ontology
    12. ISF Content and modularization
        • eagle-i: Research resources
        • VIVO: Person profiling
        • ShareCenter: Discussions, requests, share documents
        • ISF modules: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials
    13. ISF Modularization
        Constraints
        • Different ontology modeling principles
        • Active ongoing development of eagle-i and VIVO applications
        • Investments in existing RDF datasets and the need for stable targets
        Benefits
        • Flexibility in what modules to populate at a given site
        • Extensibility as needs and feedback influence future evolution
    14. Protégé refactoring plugin
        • Annotation view with approved or pending approval.
        • Module view shows pending axiom changes per module and has the ability to save the changes with a log comment and generate the spreadsheet summary
    15. ISF Merging
    16. Relating ICD9 to MeSH in support of clinical expertise
    17. Clinical expertise data visualization
    18. Building translational teams
        • We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing
        • Hard to identify and connect complementary basic and clinical expertise across disciplines
    19. Bringing together clinical expertise and basic science expertise
        • Representation of a clinician expertise extracted from ICD-9 codes for
        • Basic Researcher with Similar Expertise based on MeSH Terms
        • Resources: a resource related to Autoimmune disease
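
One way to picture how a clinician and a basic researcher might be related through a shared vocabulary is sketched below. Everything here is an assumption made for illustration: the two-entry ICD-9 to MeSH lookup is hand-written and is not the project's mapping, the researcher's terms are invented, and Jaccard overlap is just one simple choice of score, not necessarily what CTSAconnect uses.

```python
# Hand-written, illustrative lookup from ICD-9 codes to MeSH headings.
# This is NOT the CTSAconnect mapping; it only has two entries.
ICD9_TO_MESH = {
    "714.0": {"Arthritis, Rheumatoid"},
    "710.0": {"Lupus Erythematosus, Systemic"},
}


def clinician_mesh_profile(icd9_codes):
    """Collect the MeSH headings implied by a clinician's billing codes.
    Codes missing from the lookup are silently skipped."""
    profile = set()
    for code in icd9_codes:
        profile |= ICD9_TO_MESH.get(code, set())
    return profile


def jaccard(a, b):
    """Overlap of two term sets: |intersection| / |union| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical people, invented for this example.
clinician_codes = ["714.0", "710.0", "714.0"]
researcher_terms = {"Arthritis, Rheumatoid", "Autoimmune Diseases", "T-Lymphocytes"}

score = jaccard(clinician_mesh_profile(clinician_codes), researcher_terms)
print(f"expertise overlap: {score:.2f}")
```

A flat set comparison like this ignores the MeSH hierarchy, so a clinician coded for lupus gets no credit against a researcher tagged only with the broader heading "Autoimmune Diseases"; using the hierarchy would be the natural next step.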
    20. Relating researchers across disciplines
    21. Topics for today
        • The Research Symbiosis
        • Some Integration Projects Leveraging Ontologies
          – A more complete research profile: integrating research resources and person information
          – Improving query across multiple biospecimen repositories
          – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
    22. OHSU’s Biolibrary and Search Engine
        • Data aggregated from two repositories:
          – Department of Pathology repository (600K)
          – Knight Cancer Institute repository (16K)
        • A web-based search engine over de-identified data
        • Our group is applying semantic informatics to improve
          – Data format and quality
          – Data integration across the two repositories
          – Search capabilities
        Funded by Medical Research Foundation of Oregon
    23. Opportunities for improving the Biolibrary data
        Limited anatomical data
        – Cancer registry table has 300+ anatomical entities
        – Pathology table only 86
        – 99% of pathology reports (600K) have no anatomical codes
        – No anatomical relationships
        – Coded sites are not as specific as descriptions in the pathology reports
    24. Current Search Interface
        • Two separate search interfaces
        • Multiple forms
    25. Biolibrary Text Search
        Syntactic free text search
    26. Coded Syntactic Search
        Search through anatomy and histology lists
    27. Extracting ontology concepts
        • Pathology reports were the main focus
          – Main source of data in the current system
          – Contain richer information
        • NLP tools were used to identify concepts
        • Existing ontology resources were used to add semantics
    28. Developing a Biospecimen ontology
        • Classes, Types, Vocabulary: Phenotypes (PATO); Information Ontology (IAO); Anatomy (FMA, Uberon); Medicine (OGMS); HPO; SNOMED; NCI Thesaurus; ICDO/ICD9; GO; CHEBI; Cell
        • Data, Instances: Pathology Catalog; Pathology Inventory; Pathology Report; Instance #123; Instance #456
        • Relations: Instantiates; Classifies as; Uses
    29. Structured data vs. pathology report (about 7K cases)
        Available structured data from one case:
        However, pathology report also includes:
        • Low grade pancreatic intraepithelial neoplasia
        • Extensive perineural invasion
        • Acute and chronic cholecystitis
        • Bile duct tissue with chronic inflammation
        • Chronic pancreatitis
        • Acute gastric serositis
    30. Adding Logical Relationships
        • About 400 anatomical entities were mapped to the Foundational Model of Anatomy
        • An additional 300 to SNOMED
        • Used the is_a and part_of relations
        • Re-represented this in a semantic and computable format
        • Allows for semantic queries
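
The kind of semantic query that is_a and part_of relations make possible can be sketched in a few lines. The edges and specimen records below are toy data invented for illustration (they are not copied from FMA, SNOMED, or the Biolibrary), and the function names are hypothetical.

```python
from collections import defaultdict

# Toy (child, relation, parent) edges; illustrative only, not copied from FMA or SNOMED.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreatic duct", "part_of", "pancreas"),
    ("pancreas", "part_of", "digestive system"),
    ("gallbladder", "part_of", "digestive system"),
    ("pancreas", "is_a", "organ"),
]


def ancestors(term, edges=EDGES, relations=("is_a", "part_of")):
    """Return every term reachable from `term` by following the given relations upward."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation in relations:
            parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def specimens_located_in(region, specimens):
    """Find specimens whose coded site is the region itself or a transitive part of it."""
    return sorted(
        case_id
        for case_id, site in specimens.items()
        if site == region or region in ancestors(site, relations=("part_of",))
    )


# Hypothetical specimen records: case id -> coded anatomical site.
SPECIMENS = {
    "case-1": "head of pancreas",
    "case-2": "gallbladder",
    "case-3": "pancreatic duct",
}

print(specimens_located_in("pancreas", SPECIMENS))
print(specimens_located_in("digestive system", SPECIMENS))
```

A purely syntactic search for "pancreas" would miss a record coded only as "pancreatic duct" if the code list is matched exactly; following part_of upward retrieves it without any string comparison.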
    31. Considerations
        • Concept mapping helps with document retrieval
        • Does not necessarily imply a fact
          – Negation
          – Differential diagnosis
          – Past case history
        • Researchers will likely need aggregated facts from multiple sources to support real research queries
        • Information extraction options are being explored as part of this work
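
Why a concept mention does not imply a fact can be shown with a small negation check. This is a minimal sketch under stated assumptions: the cue list is short and illustrative, the fixed word window is arbitrary, and the function is hypothetical rather than the NLP tooling used in the project. Differential diagnosis and past history would need separate handling.

```python
import re

# Illustrative negation cues; a real system would need a much longer, curated list.
NEGATION_CUES = ("no", "not", "without", "negative for", "no evidence of")


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if a negation cue occurs within `window` words before the
    first mention of `concept`. Returns False if the concept is not mentioned."""
    text = sentence.lower()
    position = text.find(concept.lower())
    if position == -1:
        return False
    preceding = " ".join(text[:position].split()[-window:])
    return any(
        re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES
    )


# Hypothetical report sentences written for this example.
print(is_negated("No evidence of perineural invasion.", "perineural invasion"))
print(is_negated("Extensive perineural invasion is present.", "perineural invasion"))
```

Both sentences map to the same concept, so both reports would be retrieved by concept search, yet only the second one asserts the finding.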
    32. Topics for today
        • The Research Symbiosis
        • Some Integration Projects Leveraging Ontologies
          – A more complete research profile: integrating research resources and person information
          – Improving query across multiple biospecimen repositories
          – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
    33. We want to understand gene function across taxa
        Taxa: Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, Echinodermata
        Structures: tetrapod limbs, ampullae, tube feet, parapodia
    34. Databasing phenotypes is hard
        • Free text descriptions
        • Clinical note
        • Models
        • Atlases
        • Images
        • Controlled terms
        • Multiple file formats
        • Measurements
        • …
        Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, … sequence)
    35. Databases proliferate
        Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, … sequence)
    36. Ontologies as a tool for unification
        Figure labels: Disease-Phenotype databases; Disease phenotype ontology; Expression data; Gene function data; Cell and tissue ontology; GO annotations; ontologies
        Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
    37. Yet problems remain
        • Incomplete data
        • Not connected ontology
        • Missing & incorrect annotations
        • Multiple overlapping ontologies
        • Annotations miss the important biology
    38. Ontologies built for one species will not work for others
        http://fme.biostr.washington.edu:8080/FME/index.html
        http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
    39. Uberon: a multi-species anatomy ontology
        • Contents:
          – Over 8,000 classes (terms)
          – Multiple relationships, including subclass, part-of and develops-from
        • Scope: metazoa (animals)
          – Current focus is chordates
          – Federated approach for other taxa
        • Uberon classes are generic / species neutral
          – ‘mammary gland’: you can use this class for any mammal!
          – ‘lung’: you can use this class for any vertebrate (that has lungs)
        Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5
    40. Bridging anatomy ontologies
        Figure labels: ZFA, MA, FMA, EHDAA2, EMAPA, Uberon, SNOMED, NCIt, GO, CL
        C. J. Mungall, C. Torniai, G. V. Gkoutos, S. E. Lewis, M. A. Haendel. Uberon, an integrative multi-species anatomy ontology. Genome Biology 13(1), R5
    41. Uberon enables queries across granularity
        Figure labels: UBERON; MA: mouse; FMA: human; cerebellum; cerebellar vermis; vermis of cerebellum; cerebellum posterior lobe; posterior lobe of cerebellum; CL: Purkinje cell; GO/NIF: subcellular (axon, dendrite); edge labels p and i
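
A rough sketch of how a species-neutral class can bridge species-specific terms is given below. The term labels follow the figure, but the dictionaries, record names, and function names are hypothetical stand-ins; real Uberon bridging is expressed as ontology axioms with proper identifiers, not as Python dictionaries.

```python
# Hypothetical bridge from species-specific terms to species-neutral classes.
# Labels are used in place of real identifiers to avoid guessing at IDs.
BRIDGE = {
    "MA:cerebellum": "UBERON:cerebellum",
    "FMA:cerebellum": "UBERON:cerebellum",
    "MA:cerebellar vermis": "UBERON:cerebellar vermis",
    "FMA:vermis of cerebellum": "UBERON:cerebellar vermis",
    "MA:cerebellum posterior lobe": "UBERON:cerebellum posterior lobe",
    "FMA:posterior lobe of cerebellum": "UBERON:cerebellum posterior lobe",
}

# Toy part_of links between the species-neutral classes (one parent each).
PART_OF = {
    "UBERON:cerebellar vermis": "UBERON:cerebellum",
    "UBERON:cerebellum posterior lobe": "UBERON:cerebellum",
}


def neutral_classes(species_term):
    """Return the bridged class for a species term, followed by everything it is part of."""
    current = BRIDGE[species_term]
    classes = [current]
    while current in PART_OF:
        current = PART_OF[current]
        classes.append(current)
    return classes


def query(neutral_class, annotations):
    """Return ids of records annotated to the class or to any part of it, in any species."""
    return sorted(
        record_id
        for record_id, term in annotations
        if neutral_class in neutral_classes(term)
    )


# Hypothetical annotated records from a mouse and a human resource.
ANNOTATIONS = [
    ("mouse-record-1", "MA:cerebellar vermis"),
    ("human-record-1", "FMA:cerebellum"),
    ("human-record-2", "FMA:posterior lobe of cerebellum"),
]

print(query("UBERON:cerebellum", ANNOTATIONS))
print(query("UBERON:cerebellar vermis", ANNOTATIONS))
```

A single query on the neutral class returns mouse and human records together, and records annotated at a finer grain (vermis, posterior lobe) are included through part_of.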
    42. Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.
        http://bgee.unil.ch
    43. Evo-devo applications
        Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5): e10708. doi:10.1371/journal.pone.0010708
    44. The Monarch Initiative
        The model systems research network
        We are under construction
        Goals are to:
        • Aggregate model systems genotype and phenotype information
        • Integrate with network, genomic, and functional data
        • Leverage ontologies for phenotype similarity matching
        • Build knowledge exploration tools for end users
        • Build services for other applications
        Funded by NIH # 1R24OD011883-01
    45. Can we search by phenotype alone?
        Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247
        http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
    46. Integrating phenotypes using ontologies
    47. But... different organisms record genotypes differently
        Organism     Genotype
        zebrafish    fgf8a^ti282a/+; shha^tq252/tq252 (AB)
        mouse        B6.Cg-Shh^tm1(EGFP/cre)Cjt/J
        worm         daf-2(e1370) III; fog-2(oz40) V
        human*       ATP1A3(NM_152296.3)[c.946G>A, p.Gly316Ser]+[=]
        Phenotypes can be attached to full or partial genotypes, alleles, or variants
    48. Pulling it together
        Figure labels: Model systems phenotype and genotype data; NIF DISCO (DISCOvery, registration and interoperation framework); Data ingest; Ontology annotation; ONTOQUEST; OWLSIM; Extensible Web resource; MONARCH tools and services; Enabling phenotype-based knowledge discovery tools
    49. Figure panels: Ontologies; Semantic Integration; Phenome systems analysis; Genome systems analysis
        • Ontology types: Phenotypic qualities; Cells; Phenotypic abnormalities (Human, Mouse, Zebrafish); Molecular function; Biological process; Cellular component; Anatomy (Human, Mouse, Zebrafish); Molecules; Chemicals; Proteins
        • Example terms (Uberpheno): ZEBRAFISH term "abnormally disrupted pigmentation"; MOUSE term "abnormal ear pigmentation"; HUMAN term "Abnormality of pigmentation"
        • Other labels: HP1, HP2, HP3, HP4, HP5; Human; Mouse; Zebrafish; ZP; MP; Phenogram; Pheno-cluster; Phenotype; Gene; TP73 (tumor protein p73); GNB1 (guanine nucleotide binding protein); GNB2L1 (guanine nucleotide binding protein (G protein)); Cerebral cortical atrophy; Interactome; Orthology; Annotation; CNV syndrome; Gene function
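
Phenotype similarity matching across species can be pictured with the simplified sketch below. It is explicitly not the OWLSim algorithm: it scores two phenotype profiles by Jaccard overlap of their ancestor closures in a toy hierarchy. The is_a links are assumptions made for the example, built around the three pigmentation terms and the "Cerebral cortical atrophy" label shown in the figure.

```python
# Toy is_a hierarchy (child -> parent), assumed for illustration only.
IS_A = {
    "abnormal ear pigmentation": "abnormality of pigmentation",
    "abnormally disrupted pigmentation": "abnormality of pigmentation",
    "abnormality of pigmentation": "phenotypic abnormality",
    "cerebral cortical atrophy": "phenotypic abnormality",
}


def closure(terms):
    """Return the given terms together with all of their is_a ancestors."""
    result = set()
    for term in terms:
        while term is not None and term not in result:
            result.add(term)
            term = IS_A.get(term)
    return result


def similarity(profile_a, profile_b):
    """Jaccard overlap of the two profiles' ancestor closures (0.0 if both are empty)."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical single-term phenotype profiles.
mouse = {"abnormal ear pigmentation"}
zebrafish = {"abnormally disrupted pigmentation"}
human = {"cerebral cortical atrophy"}

print(similarity(mouse, zebrafish))
print(similarity(mouse, human))
```

The mouse and zebrafish profiles share no term text, yet they score higher with each other than with the unrelated human profile because both roll up to the same pigmentation class. Production tools typically weight shared ancestors by how informative they are rather than counting them equally.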
    50. These integration projects… well, integrate
        CTSAconnect: Reveal Connections. Realize Potential.
        OHSU Biolibrary
        Figure labels: people; Research resources; Clinical encounters; Phenotypes; biospecimens; genes; variations
    51. Conclusions
        • Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains
        • Describing biology works best with multiple connected ontologies
        • We need smart data, not just big data
        • We need better tools to integrate multiple ontologies
        • We need better tools to make use of smarter data structures (e.g. reasoning costs)
    52. Projects: Monarch Initiative; CTSAconnect; Biospecimen Ontology
        • OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid
        • Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe
        • University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
        • OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai
        • OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush
        • LBNL: Chris Mungall, Suzi Lewis, Nicole Washington
        • UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta
        • Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
        • Harvard University: Daniela Bourges, Sophia Cheng
        • University at Buffalo: Barry Smith, Dagobert Soergel
        • Zaloni: Will Corbett, Ranjit Das, Ben Sharma
        • University of Pittsburgh: Harry Hochheiser, Chuck Borromeo
