NCBO haendel talk 2013

Part of the NCBO seminar series
http://www.bioontology.org/webinar-series

  • How can we: Help science be more reproducible? Provide access to resources and expertise? Give credit where credit is due? Make data more interoperable and visible? Make science more efficient? The projects that I am going to talk about all involve supporting and strengthening these connections.
  • Need different figure
  • Chris Mungall: A genome is a genome, whether it’s an amoeba or a human. Tweets. Mention models here. Environment too.
  • Chris Mungall: What tends to happen is that multiple non-interoperable
  • Chris Mungall
  • Chris Mungall: Humans must manually integrate. Machines can’t make sense of this alone.
  • Images: Seth Ruffins
  • ICBO paper.
  • Multiple coordinated: need to describe cell and tissue context

Transcript

  • 1. Removing roadblocks: leveraging ontologies for data aggregation and computation. NCBO Seminar series, March 6th, 2013. Melissa Haendel, on behalf of very many team members
  • 2. Topics for today
    – The Research Symbiosis
    – Some Integration Projects Leveraging Ontologies
        – A more complete research profile – integrating research resources and person information
        – Improving query across multiple biospecimen repositories
        – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 3. The Research Symbiosis (diagram labels): Consult databases; Share resources/data; Publish papers; Contribute to databases; Get funding; Do experiments; The Web
  • 4. We’ve all been here before: Ontologies can help us do better.
    OMIM query | # of records
    "large bone" | 1032
    "enlarged bone" | 207
    "big bones" | 22
    "huge bones" | 4
    "massive bones" | 39
    "hyperplastic bones" | 12
    "hyperplastic bone" | 44
    "bone hyperplasia" | 173
    "increased bone growth" | 836
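The synonym problem on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming a tiny hand-made ontology: the class IDs (`EX:...`), the grouping of the query strings into two classes, and the function names are all hypothetical and not taken from OMIM or any real ontology release.

```python
# Hypothetical mini-ontology; IDs and synonym groupings are illustrative only.
ONTOLOGY = {
    "EX:0000001": {
        "label": "increased bone size",
        "synonyms": ["large bone", "enlarged bone", "big bones",
                     "huge bones", "massive bones"],
    },
    "EX:0000002": {
        "label": "bone hyperplasia",
        "synonyms": ["hyperplastic bone", "hyperplastic bones",
                     "increased bone growth"],
    },
}


def normalize(text):
    """Lower-case and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().split())


def build_index(ontology):
    """Map every label and synonym to the class it names."""
    index = {}
    for class_id, entry in ontology.items():
        for name in [entry["label"], *entry["synonyms"]]:
            index[normalize(name)] = class_id
    return index


def expand_query(query, ontology):
    """Return all names of the class the query matches, or the query itself."""
    class_id = build_index(ontology).get(normalize(query))
    if class_id is None:
        return [normalize(query)]
    entry = ontology[class_id]
    return sorted({normalize(n) for n in [entry["label"], *entry["synonyms"]]})


if __name__ == "__main__":
    print(expand_query("Big  Bones", ONTOLOGY))
```

A search front end could run every expanded name against the database and merge the hits, so that "big bones" and "enlarged bone" return the same records instead of 22 versus 207.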
  • 5. Why not just map to ontology terms?
    Class A | Class B | Mapped? | Useful?
    FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
    Vivo: legal decision | Cognitive Atlas: decision | Yes | No
    PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
    TaxRank: domain | NCI: protein domain | Yes | No
    ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
    FMA: tibia | FlyAnatomy: tibia | Yes | No
    FMA: colon | GAZ: Colón, Panama | Yes | No
    Quality: male | Chebi: maleate 2(-) | Yes | No
    Mapping requires manual work to perform and maintain; string matching for mapping can lead to spurious results; semantics of mappings and provenance are not always clear
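The spurious matches in this table are easy to reproduce with a naive lexical matcher. The sketch below is an assumption about how such a matcher might work (case folding, accent stripping, substring containment); it is not the mapping tool used to produce the table, and the function names are hypothetical.

```python
import unicodedata


def normalize(label):
    """Lower-case and strip accents, as naive lexical matchers often do."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lexical_match(label_a, label_b):
    """Declare a mapping when one normalized label contains the other."""
    a, b = normalize(label_a), normalize(label_b)
    return a in b or b in a


# (label A, label B, is the mapping biologically useful?) taken from the slide.
PAIRS = [
    ("extensor retinaculum of wrist", "retina", False),
    ("legal decision", "decision", False),
    ("domain", "protein domain", False),
    ("hypophysis", "pituitary", True),
    ("tibia", "tibia", False),
    ("colon", "Colón, Panama", False),
    ("male", "maleate 2(-)", False),
]

if __name__ == "__main__":
    for a, b, useful in PAIRS:
        print(f"{a!r} ~ {b!r}: mapped={lexical_match(a, b)}, useful={useful}")
```

Every pair the matcher accepts here is one a curator would reject, and the one pair worth mapping (hypophysis and pituitary) shares no characters to match on, which is why mappings need semantics and provenance rather than string overlap alone.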
  • 6. Topics for today
    – The Research Symbiosis
    – Some Integration Projects Leveraging Ontologies
        – A more complete research profile – integrating research resources and person information
        – Improving query across multiple biospecimen repositories
        – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 7. CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources. CTSA 10-001: 100928SB23, PROJECT #: 00921-0001
  • 8. About eagle-i: inventories “invisible” resources. Ontology-system for collecting and querying research resources. eagle-i.net network
  • 9. About VIVO
    – Primarily focused on people, activities, and outcomes typically associated with research networking
    – Eager to represent more diverse components of expertise, across domains, e.g., exhibits, performances, specifics about research
    – Had worked with core facilities at Cornell to represent labs, equipment, and services
    – Started collaborating with eagle-i to go further with research resources
  • 10. At the intersection of Vivo and eagle-i
  • 11. And then was born the “CTSAconnect” project. www.ctsaconnect.org. CTSAconnect: Reveal Connections. Realize Potential. Ok, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it? ARG! The Agents, Resources, and Grants ontology
  • 12. ISF Content and modularization
    – eagle-i: Research resources
    – VIVO: Person profiling
    – ShareCenter: Discussions, requests, share documents
    – ISF: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials
  • 13. ISF Modularization
    Constraints
    – Different ontology modeling principles
    – Active ongoing development of eagle-i and VIVO applications
    – Investments in existing RDF datasets and the need for stable targets
    Benefits
    – Flexibility in what modules to populate at a given site
    – Extensibility as needs and feedback influence future evolution
  • 14. Protégé refactoring plugin
    – Annotation view with approved or pending approval.
    – Module view shows pending axiom changes per module and has ability to save the changes with a log comment, and generate the spreadsheet summary
  • 15. ISF Merging
  • 16. Relating ICD9 to MeSH in support of clinical expertise
  • 17. Clinical expertise data visualization
  • 18. Building translational teams
    – We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing
    – Hard to identify and connect complementary basic and clinical expertise across disciplines
  • 19. Bringing together clinical expertise and basic science expertise
    – Representation of a clinician expertise extracted from ICD-9 codes for
    – Basic researcher with similar expertise based on MeSH terms
    – Resources: a resource related to autoimmune disease
  • 20. Relating researchers across disciplines
  • 21. Topics for today
    – The Research Symbiosis
    – Some Integration Projects Leveraging Ontologies
        – A more complete research profile – integrating research resources and person information
        – Improving query across multiple biospecimen repositories
        – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 22. OHSU’s Biolibrary and Search Engine
    – Data aggregated from two repositories:
        – Department of Pathology repository (600K)
        – Knight Cancer Institute repository (16K)
    – A web-based search engine over de-identified data
    – Our group is applying semantic informatics to improve
        – Data format and quality
        – Data integration across the two repositories
        – Search capabilities
    Funded by Medical Research Foundation of Oregon
  • 23. Opportunities for improving the Biolibrary data
    Limited anatomical data
    – Cancer registry table has 300+ anatomical entities
    – Pathology table only 86
    – 99% of pathology reports (600K) have no anatomical codes
    – No anatomical relationships
    – Coded sites are not as specific as descriptions in the pathology reports
  • 24. Current Search Interface: Two separate search interfaces; Multiple forms
  • 25. Biolibrary Text Search: Syntactic free text search
  • 26. Coded Syntactic Search: Search through anatomy and histology lists
  • 27. Extracting ontology concepts
    – Pathology reports were the main focus
        – Main source of data in the current system
        – Contain richer information
    – NLP tools were used to identify concepts
    – Existing ontology resources were used to add semantics
  • 28. Developing a Biospecimen ontology (diagram labels)
    – Classes, Types, Vocabulary: Phenotypes (PATO); Information Ontology (IAO); HPO; SNOMED; NCI Thesaurus; ICDO/ICD9; GO; CHEBI; Cell; Anatomy (FMA, Uberon); Medicine (OGMS)
    – Data, Instances: Pathology Catalog; Pathology Inventory; Pathology Report; Instance #123; Instance #456
    – Relations: Instantiates; Classifies as; Uses
  • 29. Structured data vs. pathology report (about 7K cases)
    Available structured data from one case:
    However, pathology report also includes:
    – Low grade pancreatic intraepithelial neoplasia
    – Extensive perineural invasion
    – Acute and chronic cholecystitis
    – Bile duct tissue with chronic inflammation
    – Chronic pancreatitis
    – Acute gastric serositis
  • 30. Adding Logical Relationships
    – About 400 anatomical entities were mapped to the Foundational Model of Anatomy
    – An additional 300 to SNOMED
    – Used the is_a and part_of relations
    – Re-represented this in a semantic and computable format
    – Allows for semantic queries
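A semantic query over is_a and part_of relations can be sketched as a walk up the relation graph. The example below is a minimal illustration under stated assumptions: the edges, case IDs, and function names are hypothetical and hand-written for this sketch, not extracted from the FMA, SNOMED, or the Biolibrary.

```python
from collections import defaultdict

# Illustrative (child, relation, parent) edges; not taken from FMA or SNOMED.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreatic duct", "part_of", "pancreas"),
    ("pancreas", "part_of", "digestive system"),
    ("common bile duct", "is_a", "bile duct"),
]

# Hypothetical specimens, each coded with one anatomical site.
SPECIMENS = {
    "case-1": "head of pancreas",
    "case-2": "common bile duct",
    "case-3": "pancreatic duct",
}


def ancestors(term, edges):
    """Collect every term reachable by following is_a / part_of upward."""
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def find_specimens(query_term, specimens, edges):
    """Return cases coded with the query term or with any of its descendants."""
    return sorted(
        case
        for case, site in specimens.items()
        if site == query_term or query_term in ancestors(site, edges)
    )


if __name__ == "__main__":
    print(find_specimens("pancreas", SPECIMENS, EDGES))
```

A plain string search for "pancreas" over the coded sites would miss a specimen coded only as "pancreatic duct" if the index matched whole terms; following the relations retrieves it because the structure is a part of the pancreas.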
  • 31. Considerations
    – Concept mapping helps with document retrieval
    – Does not necessarily imply a fact
        – Negation
        – Differential diagnosis
        – Past case history
    – Researchers will likely need aggregated facts from multiple sources to support real research queries
    – Information extraction options are being explored as part of this work
  • 32. Topics for today
    – The Research Symbiosis
    – Some Integration Projects Leveraging Ontologies
        – A more complete research profile – integrating research resources and person information
        – Improving query across multiple biospecimen repositories
        – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 33. We want to understand gene function across taxa. Taxa shown: Vertebrata, Ascidians, Arthropoda, Annelida, Mollusca, Echinodermata. Structures shown: tetrapod limbs, ampullae, tube feet, parapodia
  • 34. Databasing phenotypes is hard
    – Free text descriptions
    – Clinical note
    – Models
    – Atlases
    – Images
    – Controlled terms
    – Multiple file formats
    – Measurements
    – …
    Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, … sequence)
  • 35. Databases proliferate. Sequence data: ATTCGGATTACCGTATTA… (genes, regulatory elements, … sequence)
  • 36. Ontologies as a tool for unification. Diagram labels: Disease-phenotype databases; Disease phenotype ontology; Expression data; Gene function data; Cell and tissue ontology; GO annotations; ontologies. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
  • 37. Yet problems remain: Incomplete data; Ontologies not connected; Missing & incorrect annotations; Multiple overlapping ontologies; Annotations miss the important biology
  • 38. Ontologies built for one species will not work for others. http://fme.biostr.washington.edu:8080/FME/index.html http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
  • 39. Uberon: a multi-species anatomy ontology
    – Contents:
        – Over 8,000 classes (terms)
        – Multiple relationships, including subclass, part-of and develops-from
    – Scope: metazoa (animals)
        – Current focus is chordates
        – Federated approach for other taxa
    – Uberon classes are generic / species neutral
        – ‘mammary gland’: you can use this class for any mammal!
        – ‘lung’: you can use this class for any vertebrate (that has lungs)
    Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5
  • 40. Bridging anatomy ontologies. Diagram labels: ZFA, MA, FMA, EHDAA2, EMAPA, Uberon, SNOMED, NCIt, GO, CL. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5
  • 41. Uberon enables queries across granularity. [Figure: UBERON classes (cerebellum, cerebellar vermis, posterior lobe of cerebellum) connected by edges labeled "p" and "i" to the corresponding MA (mouse) and FMA (human) classes (cerebellum, vermis of cerebellum, cerebellum posterior lobe), to CL: Purkinje cell, and to GO/NIF subcellular terms (axon, dendrite)]
  • 42. Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5. http://bgee.unil.ch
  • 43. Evo-devo applications. Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5): e10708. doi:10.1371/journal.pone.0010708
  • 44. The Monarch Initiative: The model systems research network. We are under construction. Goals are to:
    – Aggregate model systems genotype and phenotype information
    – Integrate with network, genomic, and functional data
    – Leverage ontologies for phenotype similarity matching
    – Build knowledge exploration tools for end users
    – Build services for other applications
    Funded by NIH # 1R24OD011883-01
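Ontology-based phenotype similarity matching can be sketched with a Jaccard overlap of ancestor closures, one common family of semantic-similarity measures. This is an illustration only: the toy hierarchy, the term names, and the function names are assumptions, and the sketch does not claim to reproduce what Monarch or its tooling computes.

```python
# Hypothetical phenotype hierarchy: term -> set of direct parent terms.
PARENTS = {
    "abnormal ear pigmentation": {"abnormal pigmentation"},
    "disrupted pigmentation": {"abnormal pigmentation"},
    "abnormal pigmentation": {"abnormal phenotype"},
    "cerebral cortical atrophy": {"abnormal brain morphology"},
    "abnormal brain morphology": {"abnormal phenotype"},
    "abnormal phenotype": set(),
}


def closure(terms, parents):
    """Return the terms plus all of their ancestors in the hierarchy."""
    result = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, ()))
    return result


def jaccard_similarity(profile_a, profile_b, parents):
    """Shared ancestors divided by all ancestors of the two profiles."""
    a = closure(profile_a, parents)
    b = closure(profile_b, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    mouse = {"abnormal ear pigmentation"}
    zebrafish = {"disrupted pigmentation"}
    human = {"cerebral cortical atrophy"}
    print(jaccard_similarity(mouse, zebrafish, PARENTS))
    print(jaccard_similarity(mouse, human, PARENTS))
```

The two pigmentation profiles share no literal term, yet they score higher against each other than against the brain phenotype because they meet at a specific common ancestor. That is the property that lets a search by phenotype alone rank candidate models across species.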
  • 45. Can we search by phenotype alone? Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
  • 46. Integrating phenotypes using ontologies
  • 47. But... different organisms record genotypes differently
    Organism | Genotype
    zebrafish | fgf8ati282a/+; shhatq252/tq252(AB)
    mouse | B6.Cg-Shhtm1(EGFP/cre)Cjt/J
    worm | daf-2(e1370) III; fog-2(oz40) V
    human* | ATP1A3(NM_152296.3)[c.946G>A, p.Gly316Ser]+[=]
    Phenotypes can be attached to full or partial genotypes, alleles, or variants
  • 48. Pulling it together. Diagram labels: Model systems phenotype and genotype data; NIF DISCO (DISCOvery, registration and interoperation framework): data ingest; Ontology annotation; OWLSIM; ONTOQUEST; Extensible Web resource; MONARCH tools and services; Enabling phenotype-based knowledge discovery tools
  • 49. [Figure: semantic integration of phenotypes across species]
    – Ontologies: phenotypic qualities; cells; phenotypic abnormalities (human, mouse, zebrafish); molecular function; biological process; cellular component; anatomy (human, mouse, zebrafish); molecules, chemicals, proteins
    – Semantic integration (Uberpheno): zebrafish term "abnormally disrupted pigmentation"; mouse term "abnormal ear pigmentation"; human term "Abnormality of pigmentation"; HP (human), MP (mouse), and ZP (zebrafish) terms
    – Phenome systems analysis: phenogram; pheno-cluster
    – Genome systems analysis: interactome; orthology; annotation; gene function; CNV syndrome
    – Example labels: TP73 (tumor protein p73); GNB1 (guanine nucleotide binding protein); GNB2L1 (guanine nucleotide binding protein (G protein)); cerebral cortical atrophy; Phenotype; Gene
  • 50. These integration projects… well, integrate. Diagram labels: CTSAconnect (Reveal Connections. Realize Potential.); OHSU Biolibrary; people; research resources; clinical encounters; phenotypes; biospecimens; genes; variations
  • 51. Conclusions
    – Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains
    – Describing biology works best with multiple connected ontologies
    – We need smart data, not just big data
    – We need better tools to integrate multiple ontologies
    – We need better tools to make use of smarter data structures (e.g. reasoning costs)
  • 52. Monarch Initiative; CTSAconnect; Biospecimen Ontology
    – OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid
    – Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe
    – University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
    – OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai
    – OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush
    – LBNL: Chris Mungall, Suzi Lewis, Nicole Washington
    – UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta
    – Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
    – Harvard University: Daniela Bourges, Sophia Cheng
    – University at Buffalo: Barry Smith, Dagobert Soergel
    – Zaloni: Will Corbett, Ranjit Das, Ben Sharma
    – University of Pittsburgh: Harry Hochheiser, Chuck Borromeo