
The Monarch Initiative: From Model Organism to Precision Medicine

NIH BD2K all-hands meeting poster November 12, 2015.

Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium developing shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We use semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Applying phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool lets clinicians compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Diseases Program. Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.


Acknowledgements and Contact Info
Monarch is generously supported by NIH Office of the Director grant #5R24OD011883, as well as by NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler), and BD2K PA-15-144-U01 (Kesselman). Contact: info@monarchinitiative.org, @monarchinit

The Problem: Human genome is poorly annotated
A better understanding of human gene function and disease mechanisms is critical for diagnosis, precision medicine, and targeted therapies.

The Approach: Monarch cross-species G2P Integration Pipeline
- Ontologies
- Data Standards
- Curation and Data Modeling
- Algorithms
- Tools

The Solution: Leverage all the species data

Solve the cross-species language divide
www.monarchinitiative.org/sources
PROBLEM: Phenotypic language differs by organism and also by community, thus impeding integration. Example terms shown: "Palmoplantar hyperkeratosis", "Thick hand skin", "Ulcerated paws".
SOLUTION: Monarch integrates the data sources through bridging ontologies.

[Figure: data sources and ontologies connected through bridging ontologies, arranged by phenotypes, diseases, and anatomy for model organism and human. Legend: data source; ontology; bridging ontology; community ontology term; Monarch team maintains; Monarch team contributes.]
Data sources: ClinVar, Coriell, CTD, Elem of Morph, Gene Reviews, GWAS, HPOA, OMIMdb, Orphanet, KEGG, AnimalQTLDB, FlyBase, IMPC, MGI, MPD, OMIA, RGD, WormBase, ZFIN
Ontologies and vocabularies: MeSH, MedGen, OMIM, HP, EFO, ORDO, VT, FBcv, ZP, WP, MP, MONDO, UPheno, MA, ZFA, UBERON, FBbt, WA, CL, EMAPA

The phenotypes are associated with very different aspects of the genotype in each data source.
The Challenge: Fragmented, heterogeneous G2P data
[Figure: G2P data sources by genus. Genera: Mus, Homo, Canis, Macaca, Panthera, Equus, Ovis, Danio, Gallus, Sula, Vulpes, Anas, Coturnix, Peromyscus, Tragelaphus, Bos, Sus, other (>100 species). Sources: mgd, mmrrc, mgi, animalqtldb, cgd, clinvar, gwascatalog, hpoa, kegg, omim, orphanet, coriell, omia, monarch-curated, zfin.]
[Figure: coverage of the human coding genome, "Human only" versus "Human + other", on an axis from 0% to 100%.]
Phenotypic consequences of mutation are known for less than 20% of the human coding genome; including orthologs from other species boosts this number to over 80%. We learn about different phenotypes from different species, and want to use all these data.

Improve data quality and interoperability
PROBLEM: Evidence and provenance for G2P associations are incomplete, not computable, and frequently conflated. This hampers integration and pathogenicity determination.
SOLUTION: Disentangle these concepts, and model data to make it computable.
PROBLEM: Diagnosing rare diseases requires identifying similar patients and models.
SOLUTION: Monarch's integrated cross-species data are available on the patient Matchmaker Exchange (https://mme.monarchinitiative.org).
PROBLEM: Data models for any biological database source, especially G2P sources, are highly heterogeneous.
SOLUTION: Monarch is contributing GA4GH Schemas to bridge the heterogeneous G2P sources (github.com/ga4gh/schemas).
PROBLEM: Data are insufficiently described to understand what they are or how they were produced.
SOLUTION: HCLS provides a guide indicating what the essential metadata are and how to express them.
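One way to picture the "disentangle these concepts" step is to hold the claim, its evidence, and its provenance in separate records. The sketch below is only an illustration: the class names, field names, and example values are assumptions made here and are not taken from the GA4GH schemas or from any Monarch data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """Information used to support a claim (illustrative fields only)."""
    evidence_code: str                      # placeholder label, not a real code
    publications: List[str] = field(default_factory=list)
    p_value: Optional[float] = None


@dataclass
class Provenance:
    """Who produced or asserted something, and how (illustrative fields only)."""
    asserted_by: str
    produced_by: Optional[str] = None
    assay_type: Optional[str] = None


@dataclass
class Claim:
    """A genotype-to-phenotype assertion, kept separate from its support."""
    subject: str
    relation: str
    obj: str
    evidence: List[Evidence] = field(default_factory=list)
    provenance: Optional[Provenance] = None

    def has_support(self) -> bool:
        """True when at least one evidence record is attached."""
        return bool(self.evidence)


# Hypothetical example values, for illustration only.
claim = Claim(
    subject="example-variant-1",
    relation="contributes_to",
    obj="example-disease-A",
    evidence=[Evidence(evidence_code="example-code", publications=["example-ref"])],
    provenance=Provenance(asserted_by="example-curator"),
)
print(claim.has_support())
```

Keeping the three records apart means a consumer can ask "is this claim supported?" without also having to parse who asserted it or which assay was used.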
Monarch was a key contributor toward this community effort and is testing the model for all sources in its corpus.

Compute over diseases, phenotypes, models to diagnose diseases

Exomiser
http://www.sanger.ac.uk/science/tools/exomiser
Combine genotype and phenotype data for variant prioritization:
- Whole exome
- Remove off-target and common variants
- Variant score from allele freq and pathogenicity
- Phenotype score from phenotypic similarity
- PHIVE score to give final candidates
- Mendelian filters

PhenoGrid
https://www.npmjs.com/package/phenogrid
Visualize phenotype profile comparisons between patients and...
- Other patients
- Known diseases
- Models
Embeddable 3rd party widget for data resources.

PhenoTua / Noctua
http://create.monarchinitiative.org/
Curate causal networks between genes, genotypes, phenotypes, diseases, using organism-agnostic standardized OWL models.
- Uniquely identify a model or disease
- Check organism/genotype nomenclature
- Choose terms from any phenotype ontology
- Provide evidence
- Edit collaboratively, group sharing
- View in two modalities: ontology smart spreadsheet; graphical causal networks

Patient Archive
http://patientarchive.org/
Check Annotation Sufficiency
- Automated extraction of Human Phenotype Ontology concepts from free text clinical summaries.
- Intuitive visualization of patient phenotype profiles and diagnoses.
- Immediate visual feedback on phenotype profiles using the Monarch annotation sufficiency score.
- Fine-grained patient sharing access control.
- Encrypted patient sensitive data, yet with the possibility of searching over this data.

HPO Pubmed Browser
http://pubmed-browser.human-phenotype-ontology.org/
Visualize and Browse Relationships. Finding literature relevant to a set of phenotypes should be easy.

18 Published Diagnoses
- Zemojtel, T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Science Translational Medicine Vol. 6, Issue 252, pp. 252ra123 (11 diagnosed families)
- Pippucci, T. et al. A novel null homozygous mutation confirms CACNA2D2 as a gene mutated in epileptic encephalopathy. PLoS One 8, e82154 (2013). (1 diagnosed family)
- Requena, T. et al. Identification of two novel mutations in FAM136A and DTNA genes in autosomal-dominant familial Meniereʼs disease. Human Molecular Genetics. 24, 1119–26 (2015). (2 diagnosed families)
- Bone, W. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genetics in Medicine. In press (2015). doi:10.1038/gim.2015.137 (4 diagnosed families)

www.monarchinitiative.org

Make causal relationships computable: Improve modeling of evidence and provenance
http://brcaexchange.org/ http://tinyurl.com/brca-g2p http://tinyurl.com/acmg-guidelines
Figure labels: Process history; Key participants in process; Outputs of process.
Claim
- Causal relationships, hypothesized relationships, correlations etc.
Evidence
- Data (e.g. images, sequences)
- Evidence codes
- Publications
- Statistical confidence (p-val, z-score)
- Summary figures
- Conclusions from previous studies
- Tacit knowledge of a domain expert
Provenance
- types of assay/technique/study or instances thereof
- agent(s) who produced evidence
- agent(s) who asserted the claim
- time and place
- materials (e.g. model systems, reagents, instruments)

owlsim
www.owlsim.org
Fuzzy matching between patients, phenotypes, and diseases (figure labels: Patient X, Disease Y, Model Z).
PROBLEM: It is difficult to prioritize candidate genes for diagnosis, or to identify the model that best recapitulates a disease.
SOLUTION: Compute similarity of phenotypic profiles. Graph-based semantic similarity.

PROBLEM: Researchers donʼt know when their phenotyping is sufficient to be useful beyond their specialized community. Clinicians donʼt know when their phenotyping is sufficient for diagnosis.
SOLUTION: Compare patient or organism phenotypic profile against all known diseases and genotypes. Get feedback in real time.
Annotation sufficiency: http://tinyurl.com/phenotypesufficiency https://monarchinitiative.org/page/services

[Figure: Mendelian diseases in OMIM: 3,462]

Monarch is a key contributor to identifier standards for big data integration
PROBLEM: Problems with identifier design and provision result in link rot and content drift, thereby compromising the flow and integrity of information. Identifiers must resolve, and when referenced in the same context must not collide. Prefixes play a critical role in these two goals; however, due to confusion and inconsistency about prefixes, a single identifier can be referenced multiple different ways: 12345, MGI:12345, MGI:MGI:12345, MGI:MGI_12345, thus complicating determinations of equivalence and data integration. Moreover, prefixes used in the same context can conflict (e.g. GEO).
SOLUTIONS:
- 10 Simple Rules for Design and Provision of Life Science Database Identifiers for the Web
- Monarch is leading a community effort to coordinate prefixes between the eight active prefix registries
Related efforts and links: JDDCP; prefix commons (github.com/prefixcommons); health care & life sciences (w3.org/TR/hcls-dataset/); zenodo.org/record/31765
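The four spellings of one identifier given above (12345, MGI:12345, MGI:MGI:12345, MGI:MGI_12345) can be reduced to a single form before testing equivalence. The following is a minimal sketch: `normalize_identifier` is a hypothetical helper written for this example, and it assumes that local identifiers never themselves contain `:` or `_`, which does not hold for every database.

```python
import re


def normalize_identifier(raw: str, default_prefix: str) -> str:
    """Collapse repeated or missing prefixes into a single PREFIX:LOCAL form."""
    # Split on the separators seen in the examples and drop empty pieces.
    parts = [p for p in re.split(r"[:_]", raw.strip()) if p]
    if not parts:
        raise ValueError("empty identifier")
    local_id = parts[-1]
    prefixes = parts[:-1]
    # A bare local identifier takes the prefix supplied by the caller.
    prefix = prefixes[0] if prefixes else default_prefix
    # Refuse to guess when the prefixes disagree with each other.
    if any(p != prefix for p in prefixes):
        raise ValueError(f"conflicting prefixes in {raw!r}")
    return f"{prefix}:{local_id}"


variants = ["12345", "MGI:12345", "MGI:MGI:12345", "MGI:MGI_12345"]
normalized = {normalize_identifier(v, default_prefix="MGI") for v in variants}
print(normalized)  # {'MGI:12345'}
```

The bare form `12345` is only resolvable because the caller supplies the prefix, which illustrates why unprefixed identifiers are fragile once data leave their home database.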
[Figure: variants in ClinVar: 47,964. Figure labels: "with no known genetic basis"; "with no known diseases".]

From Model Mechanism to Precision Medicine: an Open Science Integrated Genotype-Phenotype Platform
Nicole Vasilevsky1, Nicole Washington2, Chuck Borromeo3, Matthew Brush1, Seth Carbon2, Michael Davis3, Nathan Dunn2, Mark Englestad1, Jeremy Espino3, Shahim Essaid1, Jeffrey Grethe4, Tudor Groza5, Harry Hochheiser3, Sebastian Köhler6, Suzanna Lewis2, Julie McMurry1, Craig McNamara5, Chris Mungall2, Jeremy Nguyen Xuan2, Peter Robinson7, Kent Shefchek1, Damian Smedley6, Zhou Yuan3, Edwin Zhang5, Melissa Haendel1
1 Oregon Health & Sciences University; Portland, OR • 2 Lawrence Berkeley National Lab, Berkeley, CA • 3 University of Pittsburgh, Pittsburgh, PA • 4 University of California San Diego, San Diego, CA • 5 Garvan Institute, Sydney, Australia • 6 Sanger Center, Hinxton, UK • 7 Charite

Human Disease: HADZISELIMOVIC SYNDROME
- Ventricular hypertrophy HP:0001714
- High-arched palate HP:0000156
- Failure to thrive HP:0001508
- Pulmonary artery atresia HP:0004935
- Renal hypoplasia HP:0000089

mouse model: b2b1035Clo (aka Blue Meanie)
- tricuspid valve atresia MP:0006123
- prenatal growth retardation MP:0010865
- persistent truncus arteriosus MP:0002633
- cleft palate MP:0000111
- duplex kidney MP:0004017

common (UPheno)
- abnormal kidney morphology
- abnormal palate morphology
- growth deficiency
- Malformation of the heart and great vessels
- abnormal heart and great artery attachment
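The Hadziselimovic syndrome example above can be turned into a small cross-species comparison: map each species-specific term to a shared grouping class, then compare the resulting sets. In this sketch the pairing of each HP and MP term with a common class is an assumed reading of the flattened figure, and Jaccard overlap is a deliberately simple stand-in for the graph-based semantic similarity the poster mentions.

```python
from typing import Dict, Iterable, Set

# Assumed term-to-common-class pairing, inferred from the example for illustration.
HUMAN_TO_COMMON: Dict[str, str] = {
    "HP:0001714": "malformation of the heart and great vessels",  # ventricular hypertrophy
    "HP:0004935": "abnormal heart and great artery attachment",   # pulmonary artery atresia
    "HP:0000156": "abnormal palate morphology",                   # high-arched palate
    "HP:0001508": "growth deficiency",                            # failure to thrive
    "HP:0000089": "abnormal kidney morphology",                   # renal hypoplasia
}
MOUSE_TO_COMMON: Dict[str, str] = {
    "MP:0006123": "malformation of the heart and great vessels",  # tricuspid valve atresia
    "MP:0002633": "abnormal heart and great artery attachment",   # persistent truncus arteriosus
    "MP:0000111": "abnormal palate morphology",                   # cleft palate
    "MP:0010865": "growth deficiency",                            # prenatal growth retardation
    "MP:0004017": "abnormal kidney morphology",                   # duplex kidney
}


def to_common(profile: Iterable[str], mapping: Dict[str, str]) -> Set[str]:
    """Translate species-specific term IDs into shared grouping classes."""
    return {mapping[term] for term in profile if term in mapping}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared classes divided by all classes; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


human_profile = to_common(HUMAN_TO_COMMON, HUMAN_TO_COMMON)
full_mouse = to_common(MOUSE_TO_COMMON, MOUSE_TO_COMMON)
partial_mouse = to_common(["MP:0000111", "MP:0004017"], MOUSE_TO_COMMON)

print(jaccard(human_profile, full_mouse))
print(jaccard(human_profile, partial_mouse))
```

Under the assumed pairing, the full mouse profile overlaps the disease completely even though the two share no term identifiers, while a model with only the palate and kidney phenotypes overlaps on two of five classes.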
