Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, organized by the National Centre for Rare Diseases, Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities for “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success to support computational phenotype comparison against known diseases, other patients, and model organisms, aiding the diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms using non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
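The semantic-similarity comparison described above can be sketched with a toy example. Everything below is illustrative: the mini ontology, the term names, and the Jaccard-over-ancestor-closures measure are simple stand-ins for the HPO and the information-content-based measures that real tools use.

```python
# Toy sketch of phenotype-profile comparison over an ontology DAG.
# The mini ontology and the profiles are hypothetical, not HPO terms.

# child -> parents edges of a tiny made-up phenotype hierarchy
PARENTS = {
    "short stature": ["abnormal height"],
    "tall stature": ["abnormal height"],
    "abnormal height": ["growth abnormality"],
    "growth abnormality": ["phenotypic abnormality"],
    "seizure": ["neurological abnormality"],
    "neurological abnormality": ["phenotypic abnormality"],
}

def ancestors(term):
    """Return the term plus all of its ancestors (reflexive closure)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of the ancestor closures of two profiles."""
    closure_a = set().union(*(ancestors(t) for t in profile_a))
    closure_b = set().union(*(ancestors(t) for t in profile_b))
    return len(closure_a & closure_b) / len(closure_a | closure_b)

patient = ["short stature", "seizure"]
disease = ["tall stature", "seizure"]
print(round(profile_similarity(patient, disease), 2))  # → 0.71
```

Because the comparison runs over ancestor closures rather than exact labels, a patient's "short stature" still partially matches a disease's "tall stature" through their shared parent, which is the property that lets patient-entered layperson terms contribute to matching.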
Medical Terminology: Yes I Can (Arabic title: المصطلحات الطبية أنا قدها, roughly “Medical terminology, I can handle it”), by Meshari Alzahrani
When I was a first-year student at medical school, I struggled with medical terminology and no one advised me, so I made this presentation to make medical terminology easy for early-stage medical students. I hope you enjoy it. :)
Women Can’t Win: Despite Making Educational Gains and Pursuing High-Wage Majo..., by CEW Georgetown
Women Can’t Win: Despite Making Educational Gains and Pursuing High-Wage Majors, Women Still Earn Less than Men explores the complex set of reasons that have kept the gender wage gap in place. Even when comparing men and women who have equal educational attainment and work in the same occupation, women still earn only 92 cents for every dollar earned by men.
THE PLACE OF HONEY BEES IN THE WORLD is a PPT slideshow by Dr. Kristen Healy, Entomology Specialist with the LSU AgCenter. This PPT describes the classification of honey bees, the origin of honey bees, and the common types of bees. It also includes a table comparing the advantages and disadvantages of honey bee races.
Summary of Pedigree Chart symbols.
How to use pedigree charts to analyse genetic conditions
Please note: this resource was found on a fileserver on the internet. Author unknown.
The Monarch Initiative: From Model Organism to Precision Medicine, by mhaendel
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium developing shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from humans and model organisms, enhancing our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Applying phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops its models in collaboration with the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Diseases Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal: a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
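As a rough illustration of the kind of combined scoring the abstract attributes to the Exomiser, here is a sketch of ranking candidate variants by several evidence scores. This is not the actual Exomiser algorithm; the field names, weights, and damping rule are all invented for the example.

```python
# Hypothetical variant-prioritization sketch: combine pathogenicity,
# population frequency, and phenotype-similarity evidence per variant.
# All scores are assumed to lie in [0, 1]; weights are arbitrary.

def rank_variants(variants, w_patho=0.5, w_pheno=0.5):
    """Rank candidate variants, best first, by a combined evidence score."""
    def combined(v):
        # Rare variants keep their evidence score; common ones are damped.
        rarity = 1.0 - v["population_frequency"]
        return rarity * (w_patho * v["pathogenicity"]
                         + w_pheno * v["phenotype_similarity"])
    return sorted(variants, key=combined, reverse=True)

candidates = [
    {"id": "var1", "pathogenicity": 0.9, "population_frequency": 0.20,
     "phenotype_similarity": 0.3},
    {"id": "var2", "pathogenicity": 0.8, "population_frequency": 0.001,
     "phenotype_similarity": 0.9},
]
print([v["id"] for v in rank_variants(candidates)])  # → ['var2', 'var1']
```

The point of the sketch is the design choice the abstract describes: a common variant with a high pathogenicity score (var1) is outranked by a rare variant whose phenotype profile closely matches the patient (var2).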
Dataset description using the W3C HCLS standard, by mhaendel
This talk was presented at the BioCaddie (http://biocaddie.org/) workshop at the FORCE2015 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to raise awareness of why a Semantic Web-compliant standard is needed for describing data, where current standards fall short, and how this emerging standard, which extends prior efforts, can aid data discovery and integration. This work is being led by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis..., by mhaendel
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
How open is open? An evaluation rubric for public knowledgebases, by mhaendel
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation are highly decentralized across thousands of specialized databases. Within the biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute on, thereby maximizing impact and minimizing waste. The ever-increasing number of databases forces us to question our priorities with respect to maintaining them, developing new ones, or retiring/subsuming those that have completed their mission. Open biomedical data repositories should therefore be carefully evaluated according to the quality, accessibility, and value of their resources over time and across the translational divide.
Traditional citation counts and publication impact factors are known to be inadequate measures of the success or value of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one cites or mentions it in their publications. While the Nucleic Acids Research database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g., should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable -- with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet they are critical to reuse of the knowledge contained within any given database. It is worth noting that the FAIR principles apply not only to a resource as a whole, but also to its key components; this “fractal FAIRness” means that even the licenses, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, and Reusable. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input on how to further advance this rubric as a biocuration community resource.
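A rubric like the one described can be applied mechanically as a checklist over the seven principles. The principle list below matches the abstract, but the boolean checklist and the fraction-of-principles scoring are a hypothetical sketch, not the actual rubric's scoring scheme.

```python
# Hypothetical checklist scoring over the FAIR + TLC principles named
# in the abstract. The example database assessment is invented.

PRINCIPLES = ["Findable", "Accessible", "Interoperable", "Reusable",
              "Traceable", "Licensed", "Connected"]

def score_resource(checklist):
    """Fraction of rubric principles a resource satisfies (0.0 to 1.0)."""
    met = sum(1 for p in PRINCIPLES if checklist.get(p, False))
    return met / len(PRINCIPLES)

example_db = {"Findable": True, "Accessible": True, "Interoperable": False,
              "Reusable": True, "Traceable": False, "Licensed": True,
              "Connected": False}
print(f"{score_resource(example_db):.2f}")
```

A real assessment would score each principle on graded evidence (e.g., a machine-readable license versus none) rather than a simple yes/no, and would apply the same checklist recursively to the resource's licenses, identifiers, and APIs, per the "fractal FAIRness" point above.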
This talk gives some basic pointers on how to give PowerPoint presentations that are effective and get your point across. Great for young scientists, or really anyone in academia.
GRC Workshop held at Churchill College on Sep 21, 2014. Talk by Bronwen Aken discussing the Ensembl approach to annotating the complete human reference assembly.
Genome resources at EMBL-EBI: Ensembl and Ensembl Genomes, by EBI
Event: Plant and Animal Genomes Conference
Speaker: Bert Overduin
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high-quality, integrated annotation on chordate and selected eukaryotic genomes. All supported species include comprehensive, evidence-based gene annotations, and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. As of Ensembl release 65 (December 2011), 56 species are fully supported. Ensembl data are accessible through an interactive web site, flat files, the data mining tool BioMart, direct database querying and a set of Perl APIs. Moreover, Ensembl is not just a data visualisation tool, but a suite of programs for data production (e.g. gene calling and comparative genomics) that can be deployed individually according to the needs of an individual community. Ensembl Genomes (http://www.ensemblgenomes.org) consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the genomes available in Ensembl. It currently contains data for over 300 species. Many of the databases that support Ensembl Genomes have been built by, or in close collaboration with, groups that maintain specialist data resources for individual species, and we are actively seeking to extend the range of these collaborations. Together, Ensembl and Ensembl Genomes offer a single unified interface across the taxonomic space. This presentation will consist of a short introduction to Ensembl and Ensembl Genomes, followed by a demonstration of the respective websites and the BioMart data retrieval tool. Special attention will be given to recently developed functionality such as the Variant Effect Predictor, which predicts the consequences of substitutions, insertions and deletions on transcripts and protein sequences, and to the possibility of visualising your own data by attaching BAM and VCF files, for example.
Enhancing Rare Disease Literature for Researchers and Patients, by Erin D. Foster
Objectives: In rare disease research, it is crucial to document structured phenotype information in order to draw connections to other known cases and work towards diagnosis and treatment of disease. The Human Phenotype Ontology (HPO) is a standardized vocabulary that describes phenotypic abnormalities encountered in human diseases. To enable increased identification of traits (i.e., phenotypes) associated with rare diseases, the HPO was expanded to include layperson synonyms, making the ontology more accessible and useful to patients. Additionally, the HPO was used to annotate phenotypes in a sample of rare disease case reports, providing structured annotations of rare disease phenotypes.
Methods: The HPO was systematically reviewed and 'layperson synonyms' were added to include terms used by patients and non-medical professionals. Subsequent work annotated phenotypic descriptions in a sample of rare disease case reports with HPO terms. The literature sample was identified by filtering for 'case reports' in PubMed and excluding articles already included in the Online Mendelian Inheritance in Man (OMIM) database. The sample was further restricted to articles from the European Journal of Human Genetics for the pilot set, resulting in a final sample size of 143 articles. The papers were reviewed and annotated for the following information: disease name, associated gene(s), and corresponding phenotypes.
Results: The review of the HPO resulted in approximately half of its terms having layperson synonyms. A subset of the literature sample was annotated to determine the best curation workflow. In that subset (n=20 papers), 353 total phenotypes were identified; 12% of these phenotypes were not included in the HPO and required new term and/or synonym requests. Challenges encountered in this work included maintaining consistency in HPO term definitions and use, as well as annotation reliability.
Conclusion: This work contributes to knowledge of rare diseases by curating the existing literature to provide structured annotation of rare disease traits, which helps with information retrieval and data interoperability and reuse. Additionally, the expansion of the HPO to include layperson synonyms enables patients to 'self-phenotype' and contribute to the identification of rare disease traits. Following the completed annotation of the literature sample, future work will focus on incorporating the annotations into databases that collect rare disease phenotypic information. Further work is also being done to add additional layperson synonyms to the HPO through review of patient forums and medical message boards to continue to identify terminology used by actual patients.
Enhancing the Human Phenotype Ontology for Use by the Layperson, by Nicole Vasilevsky
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
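One simple way to make layperson synonyms usable in patient-facing tools is to index them for search, sketched below. The record structure is hypothetical (not the HPO's actual file format), and the particular synonym assignments are illustrative.

```python
# Sketch of tagging ontology-class synonyms as layperson vs. clinical,
# then building a patient-facing search index from only the lay terms.
# The term-record structure and synonym tags here are illustrative.

terms = {
    "HP:0000365": {
        "label": "Hearing impairment",
        "synonyms": [
            {"text": "Deafness", "layperson": True},
            {"text": "Hypoacusis", "layperson": False},
        ],
    },
}

def layperson_index(terms):
    """Map lay-friendly synonym strings to term IDs for patient search."""
    index = {}
    for term_id, term in terms.items():
        for syn in term["synonyms"]:
            if syn["layperson"]:
                index[syn["text"].lower()] = term_id
    return index

print(layperson_index(terms))  # → {'deafness': 'HP:0000365'}
```

Tagging synonyms rather than replacing labels is the key design point: clinicians keep the precise medical labels, while patients searching in everyday words land on the same class identifiers, so profiles from both sides remain directly comparable.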
The Foundation of P4 Medicine, a keynote presentation by Leroy Hood, M.D., Ph.D., at the Ohio State University Personalized Health Care National Conference 2010.
Envisioning a world where everyone helps solve disease, by mhaendel
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and on how very many people, often unknown to the patient or even to the researchers creating the data, are involved in a diagnosis.
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot..., by mhaendel
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical histories; they are not only among the most engaged patients, but can also themselves provide data for use in clinical evaluation. We therefore created a layperson version of our clinical deep-phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
Supporting Genomics in the Practice of Medicine, by Heidi Rehm (Knome_Inc)
View the webinar at http://www.knome.com/webinar-supporting-genomics-practice-medicine. In this presentation, Dr. Heidi Rehm, Chief Laboratory Director of the Laboratory for Molecular Medicine at Partners Healthcare and one of the Principal Investigators on ClinGen, elucidates the challenges of genomics in medicine and outlines the path to integrating large-scale sequencing into clinical practice.
A neuromuscular disorder that leads to weakness of skeletal muscles.
Symptoms
Causes
Prevention
Complications
Common tests & procedures
Neurological examination:
Repetitive nerve stimulation test:
Antibody test:
Pulmonary function tests (PFTs): To check for any breathing difficulty.
CT scan: To rule out a tumor in the thymus.
Magnetic resonance imaging (MRI): MRI of the chest is performed to rule out a tumor in the thymus.
Edrophonium (Tensilon) test:
Medication
Procedures
Nutrition
The Software and Data Licensing Solution: Not Your Dad’s UBMTA, by mhaendel
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: how well suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder, by mhaendel
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources whose annotations are subject to concept drift and definitional differences, such as diseases. The syntactic issue is that many variations of the same identifier exist, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for infrastructure that handles identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing “beacons” with respect to their identifiers.
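The JSON-LD-context approach to CURIEs amounts to a shared prefix map with expand and contract operations, sketched below. The two prefix entries use commonly seen IRI patterns but are illustrative; a real system would pull them from a prefix metadata registry like the one described.

```python
# Sketch of CURIE handling via a JSON-LD-style context: a shared prefix
# map expands compact identifiers to full IRIs and contracts them back.

CONTEXT = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "OMIM": "https://omim.org/entry/",
}

def expand(curie):
    """Expand a prefixed identifier (CURIE) to a full IRI."""
    prefix, local = curie.split(":", 1)
    return CONTEXT[prefix] + local

def contract(iri):
    """Contract a full IRI back to a CURIE using the shared context."""
    for prefix, base in CONTEXT.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    raise ValueError(f"no known prefix for {iri}")

iri = expand("HP:0001250")
print(iri)            # → http://purl.obolibrary.org/obo/HP_0001250
print(contract(iri))  # → HP:0001250
```

With every source normalized through one context, data joins reduce to exact string matches on CURIEs, which is precisely the syntactic alignment problem the abstract describes.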
Building (and traveling) the data-brick road: A report from the front lines ..., by mhaendel
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist-travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data; equivalency determination for the identifiers within the data, or minting of new ones; and manipulation of the data according to common data models (e.g., a genotype-to-phenotype association in one source may relate a variant to a disease, where in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data require that the data be self-reporting: loading the ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination among different providers. Once the data are found, users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
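The normalization step above, where each source models genotype-phenotype data differently, can be sketched as per-source adapters that emit a common association record. All field names and source shapes here are invented for illustration.

```python
# Sketch of ETL normalization: two differently modeled sources are
# mapped by per-source adapters into one common association record.
# Source shapes and field names are hypothetical.

def normalize_source_a(row):
    # Source A relates a single variant to a single disease.
    return [{"subject": row["variant_id"], "predicate": "associated_with",
             "object": row["disease_id"], "source": "A"}]

def normalize_source_b(row):
    # Source B relates a set of alleles to a set of phenotypes,
    # so one row fans out into one record per (allele, phenotype) pair.
    return [{"subject": a, "predicate": "associated_with",
             "object": p, "source": "B"}
            for a in row["alleles"] for p in row["phenotypes"]]

rows_a = [{"variant_id": "var:1", "disease_id": "disease:X"}]
rows_b = [{"alleles": ["allele:1", "allele:2"], "phenotypes": ["pheno:1"]}]

associations = []
for row in rows_a:
    associations.extend(normalize_source_a(row))
for row in rows_b:
    associations.extend(normalize_source_b(row))
print(len(associations))  # → 3 (1 from source A, 2 from source B)
```

Keeping a "source" field on every record preserves the provenance that the archiving and quality-assessment steps described above depend on.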
Reusable data for biomedicine: A data licensing odyssey, by mhaendel
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those who aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, in which we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, earning 2.5 stars or fewer. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse in licensing practices by data providers.
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
On the frontier of genotype-2-phenotype data integration (mhaendel)
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Force11: Enabling transparency and efficiency in the research landscape (mhaendel)
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
(May 29th, 2024) Advancements in Intravital Microscopy: Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space, studied in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization, and tumor metastasis with exceptional detail. This webinar also gives an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
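The ATP bookkeeping above is easy to sketch in a few lines. This is a toy illustration using the textbook yields of 2 vs. ~36 ATP per glucose; the ATP demand figure is an arbitrary number chosen for the example.

```python
# Rough ATP bookkeeping for one glucose molecule (textbook figures).
ATP_GLYCOLYSIS = 2          # net ATP from glycolysis alone
ATP_FULL_RESPIRATION = 36   # glycolysis + Krebs cycle + oxidative phosphorylation

def glucose_needed(atp_demand, atp_per_glucose):
    """Glucose molecules required to meet an ATP demand (ceiling division)."""
    return -(-atp_demand // atp_per_glucose)

# A cell relying on glycolysis alone needs ~18x more glucose
# for the same energy budget.
demand = 360  # arbitrary illustrative ATP demand
print(glucose_needed(demand, ATP_GLYCOLYSIS))        # 180
print(glucose_needed(demand, ATP_FULL_RESPIRATION))  # 10
```

This 18-fold difference is the arithmetic behind the “glucose addiction” of highly glycolytic cancer cells discussed below.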
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
WARBURG EFFECT: Cancer cells are usually highly glycolytic (“glucose addiction”) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his “discovery of the nature and mode of action of the respiratory enzyme.”
WARBURG EFFECT: The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Insect taxonomy: importance, systematics, and classification
The Application of the Human Phenotype Ontology
1. THE APPLICATION OF THE HUMAN PHENOTYPE ONTOLOGY
Melissa Haendel, Sept 19th, 2014
II International Summer School: RARE DISEASE AND ORPHAN DRUG REGISTRIES
2. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
5. THE CONSTELLATION OF PHENOTYPES SIGNIFIES THE DISEASE – A ‘PROFILE’
http://www.learn.ppdictionary.com/prenatal_development_2.htm
http://www.pyroenergen.com/articles07/downs-syndrome.htm
http://www.theguardian.com/commentisfree/2009/oct/27/downs-syndrome-increase-terminations
http://anthro.palomar.edu/abnormal/abnormal_4.htm
7. SEARCHING FOR PHENOTYPES USING TEXT ALONE IS INSUFFICIENT
OMIM Query                  # Records
“large bone”                      785
“enlarged bone”                   156
“big bone”                         16
“huge bones”                        4
“massive bones”                    28
“hyperplastic bones”               12
“hyperplastic bone”                40
“bone hyperplasia”                134
“increased bone growth”           612
8. TERMS SHOULD BE WELL DEFINED SO THEY GET USED PROPERLY
We need to capture synonyms and use unique labels
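The fix for the free-text search problem above is a synonym index: many surface strings resolve to one unique term ID. A toy sketch follows; “Hypertelorism” (HP:0000316) appears later in this deck, but the synonym list here is illustrative, so check the HPO files for the actual synonyms.

```python
# Toy synonym index: many free-text strings resolve to one term ID.
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())

# Illustrative term with a couple of plausible synonyms (not the
# authoritative HPO synonym list).
TERMS = {
    "HP:0000316": ["Hypertelorism", "Widely spaced eyes",
                   "Ocular hypertelorism"],
}

SYNONYM_TO_ID = {
    normalize(label): hp_id
    for hp_id, labels in TERMS.items()
    for label in labels
}

print(SYNONYM_TO_ID[normalize("widely  SPACED eyes")])  # HP:0000316
```

With a complete synonym table, all nine “large bone” query variants in the OMIM table above would map to the same ontology term instead of nine different record sets.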
9. SO WHAT IS THE PROBLEM?
Obviously similar phenotype descriptions mean the same thing to you, but not to a computer:
generalized amyotrophy
generalized muscle, atrophy
muscular atrophy, generalized
Many publications have little information about the actual phenotypic features seen in patients with particular mutations
Databases cannot talk to one another about phenotypes
10. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
11. ONTOLOGIES CAN HELP.
A controlled vocabulary of logically defined, inter-related terms used to annotate data
Use of common or logically related terms across databases enables integration
Relationships between terms allow annotations to be grouped in scientifically meaningful ways
Reasoning software enables computation of inferred knowledge
Some well-known ontologies are SNOMED-CT, Foundational Model of Anatomy, Gene Ontology, Linnean Taxonomy of species
13. HUMAN PHENOTYPE ONTOLOGY
Used to annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
[Figure: an HPO term graph around the pancreas, with terms such as “Abnormality of pancreatic islet cells”, “Reduced pancreatic beta cells”, “Abnormality of endocrine pancreas physiology”, “Abnormality of exocrine pancreas physiology”, “Pancreatic islet cell adenoma”, “Insulinoma”, and “Multiple pancreatic beta-cell adenomas”]
Mappings to SNOMED-CT, UMLS, MeSH, ICD, etc.
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
14. USING A CONTROLLED VOCABULARY TO LINK PHENOTYPES TO DISEASES
Failure to Thrive
Chromosome 21 Trisomy
Flat Head
Abnormal Ears
Umbilical Hernia
Broad Hands
15. SURVEY OF ANNOTATIONS IN DISEASE CORPUS
7000+ diseases: OMIM + Orphanet + Decipher (ClinVar coming soon)
111,000+ annotations
Phenotype annotations are unevenly distributed across different anatomical systems
16. HOW DOES HPO RELATE TO OTHER CLINICAL VOCABULARIES?
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
17. LOGICAL TERM DEFINITION
Definitions are of the following genus-differentia form:
X = a Y which has one or more differentiating characteristics, where Y is the is_a parent of X.
Definition of a cylinder: surface formed by the set of lines perpendicular to a plane, which pass through a given circle in that plane.
Definition: Blue cylinder = Cylinder that has color blue.
Definition: Red cylinder = Cylinder that has color red.
18. ABOUT REASONERS
A piece of software able to infer logical consequences from a set of asserted facts or axioms.
They are used to check the logical consistency of the ontologies and to extend the ontologies with “inferred” facts or axioms.
For example, a reasoner would infer:
Major premise: All mortals die.
Minor premise: Some men are mortals.
Conclusion: Some men die.
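The syllogism above is, at heart, a transitive-closure computation. A minimal sketch of this one facet of reasoning (real OWL reasoners handle far richer axioms, of course):

```python
# A minimal sketch of subsumption reasoning: compute the transitive
# closure of asserted is_a facts until nothing new can be inferred.
ASSERTED = {
    ("man", "mortal"),   # minor premise: some men are mortals
    ("mortal", "dies"),  # major premise: all mortals die
}

def infer(asserted):
    """Return the asserted facts plus every transitively inferred pair."""
    inferred = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred

# The reasoner reaches the conclusion: (some) men die.
print(("man", "dies") in infer(ASSERTED))  # True
```

The same closure, applied to HPO's is_a links, is what lets annotations made with specific terms be retrieved under their general ancestors.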
19. PHENOTYPES CAN BE CLASSIFIED IN MULTIPLE WAYS
[Figure: “Vitreous hemorrhage” classified under both “Abnormality of the eye” (via “Abnormal eye morphology”, “Abnormal eye physiology”, “Hemorrhage of the eye”, and “Abnormality of the globe”) and “Abnormality of the cardiovascular system” (via “Internal hemorrhage” and “Abnormality of blood circulation”)]
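Because HPO terms can have several is_a parents, one annotation surfaces under multiple organ systems. A toy sketch of the multi-parent graph just described (the edges are simplified from the slide, not taken verbatim from the HPO file):

```python
# Simplified multi-parent DAG from the slide: each term may have
# several is_a parents.
PARENTS = {
    "vitreous hemorrhage": ["abnormal eye morphology",
                            "hemorrhage of the eye"],
    "hemorrhage of the eye": ["internal hemorrhage",
                              "abnormal eye physiology"],
    "abnormal eye morphology": ["abnormality of the eye"],
    "abnormal eye physiology": ["abnormality of the eye"],
    "internal hemorrhage": ["abnormality of blood circulation"],
    "abnormality of blood circulation":
        ["abnormality of the cardiovascular system"],
}

def ancestors(term):
    """Collect every ancestor reachable via is_a edges."""
    result, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

anc = ancestors("vitreous hemorrhage")
print("abnormality of the eye" in anc)                    # True
print("abnormality of the cardiovascular system" in anc)  # True
```

A single “vitreous hemorrhage” annotation is thus retrievable from both an ophthalmology and a cardiovascular perspective.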
20. PHENOTYPE MATCHING
Patient phenotype: resting tremors; REM disorder; unstable posture; myopia; neuronal loss in substantia nigra
Phenotype of known variant: resting tremors; REM disorder; unstable posture; nystagmus; neuronal loss in substantia nigra
21. [Figure: semantic similarity of a query profile against Noonan and Opitz syndromes]
a) Noonan syndrome: the query terms “hypertelorism” and “downward slanting palpebral fissures” both overlap the syndrome's annotations directly (subgraph: abn. of the eye > abn. of the ocular region > abn. of the eyelid and abn. of globe localization or size > abn. of the palpebral fissures).
b) Opitz syndrome: annotated with “telecanthus” and “downward slanting palpebral fissures”; the query term “hypertelorism” matches “telecanthus” only via their shared ancestor “abn. of the eyelid”.
c) Information content (IC) of the best matches:
Query (Q) vs Noonan syndrome: hypertelorism = 3.78; downward slanting palpebral fissures = 3.05
Query (Q) vs Opitz syndrome: hypertelorism ~ telecanthus = 2.45 (IC of abn. of the eyelid); downward slanting palpebral fissures = 3.05
sim(Q, Noonan) = (3.78 + 3.05) / 2 = 3.42
sim(Q, Opitz) = (2.45 + 3.05) / 2 = 2.75
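The similarity arithmetic on this slide reduces to averaging, over the query terms, the information content of each term's best match in the disease profile. A minimal sketch using the slide's own IC values (how the best-match IC is found in a real implementation, via most informative common ancestors, is abstracted away here):

```python
# IC values as given on the slide.
IC = {
    "hypertelorism": 3.78,
    "downward slanting palpebral fissures": 3.05,
    "abn. of the eyelid": 2.45,  # shared ancestor of hypertelorism/telecanthus
}

def profile_similarity(best_match_ics):
    """Average IC of the best match for each query term."""
    return sum(best_match_ics) / len(best_match_ics)

# Against Noonan, both query terms match syndrome annotations directly.
sim_noonan = profile_similarity([IC["hypertelorism"],
                                 IC["downward slanting palpebral fissures"]])
# Against Opitz, hypertelorism matches telecanthus only through the
# shared ancestor "abn. of the eyelid".
sim_opitz = profile_similarity([IC["abn. of the eyelid"],
                                IC["downward slanting palpebral fissures"]])

print(sim_noonan, sim_opitz)  # approximately 3.42 and 2.75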
23. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
24. THE YET-TO-BE DIAGNOSED PATIENT
Known disorders not recognized during prior evaluations?
Atypical presentation of known disorders?
Combinations of several disorders?
Novel, unreported disorder?
25. EXOME ANALYSIS
Remove off-target variants, common variants, and variants not in known disease-causing genes
Recessive, de novo filters
http://compbio.charite.de/PhenIX/
Target panel of 2741 known Mendelian disease genes
Compare phenotype profiles using data from: HGMD, ClinVar, OMIM, Orphanet
Zemojtel et al. Sci Transl Med 3 September 2014: Vol. 6, Issue 252, p.252ra123
26. PHENIX PERFORMANCE TESTING
Figure removed due to restrictions. Please see the paper: http://stm.sciencemag.org/content/6/252/252ra123.full
Simulated datasets created by spiking DAG panel-generated VCF files (with the causative mutation removed)
27. CONTROL PATIENTS WITH KNOWN MUTATIONS
Inheritance | Genes | Average rank
AD | ACVR1, ATL1, BRCA1, BRCA2, CHD7 (4), CLCN7, COL1A1, COL2A1, EXT1, FGFR2 (2), FGFR3, GDF5, KCNQ1, MLH1 (2), MLL2/KMT2D, MSH2, MSH6, MYBPC3, NF1 (6), P63, PTCH1, PTH1R (2), PTPN11 (2), SCN1A, SOS1, TRPS1, TSC1, WNT10A | 1.7
AR | ATM, ATP6V0A2, CLCN1 (2), LRP5, PYCR1, SLC39A4 | 5
X | EFNB1, MECP2 (2), DMD, PHF6 | 1.8
28. WORKFLOW FOR CLINICAL EXOME
Suspected genetic disease → DRG sequencing + deep phenotyping → ANALYSIS → top-ranked candidates → clinical rounds. If consistent, choose the best candidate for Sanger validation and cosegregation studies (passes: diagnosis; fails: exclude candidate gene). If inconsistent, reconsider the short list.
Variant analysis. Annotation sources: HGMD, MAF (dbSNP, ESP), ClinVar. Prediction criteria: predicted pathogenicity, variant class, location in DRG target region.
Computational phenotype analysis. Ontologies: HPO, MP. Annotation sources: OMIM, Orphanet, MGI... Prediction criteria: semantic similarity, mode of inheritance.
29. PHENIX HELPED DIAGNOSE 11/40 PATIENTS
global developmental delay (HP:0001263)
delayed speech and language development (HP:0000750)
motor delay (HP:0001270)
proportionate short stature (HP:0003508)
microcephaly (HP:0000252)
feeding difficulties (HP:0011968)
congenital megaloureter (HP:0008676)
cone-shaped epiphysis of the phalanges of the hand (HP:0010230)
sacral dimple (HP:0000960)
hyperpigmented/hypopigmented macules (HP:0007441)
hypertelorism (HP:0000316)
abnormality of the midface (HP:0000309)
flat nose (HP:0000457)
thick lower lip vermilion (HP:0000179)
thick upper lip vermilion (HP:0000215)
full cheeks (HP:0000293)
short neck (HP:0000470)
31. SKELETOME PATIENT ARCHIVE
Integration with the HPO, Orphanet, and Monarch Initiative
Automated phenotyping from clinical summaries
Collaborative diagnosis
University of Queensland, University of Sydney
32. THE FACES OF RARE DISEASES
Screening and diagnosis
Treatment monitoring
Surgical planning and audit
Genotype-phenotype correlation
Cross-species comparisons
Face-to-text conversion for text mining
Non-invasive, non-irradiating, deeply precise 3D facial analysis
University of Western Australia
33. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
35. USING ONTOLOGIES IN THE CLINIC
Ontologies are large (HPO has > 10,000 terms) and difficult to navigate
Mapping data to an ontology post-visit is time consuming and prone to error
Best time to phenotype using ontologies is during the patient visit
Goals of PhenoTips:
Make deep phenotyping simple
Make it “faster than paper”
36. PhenomeCentral is a Matchmaker
Lets you know about other similar patients
Lets you easily connect with other users
Each patient record can be:
Public: anyone can see the record
Private: only specified users/consortia can see the record
Matchable: the record cannot be seen, but can be “discovered” by users who submit similar patients
37. STEP 1: ADD PATIENT
Can use the interface built into PhenomeCentral
Can export data directly from a local PhenoTips instance
Add a VCF file (or list of genes)
Set each record as Private, Public, or Matchable
40. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
41. HOW MUCH PHENOTYPING IS ENOUGH?
How many annotations…?
How many different categories?
How many within each?
42. “Not everything that counts can be counted and not everything that can be counted counts.” - Albert Einstein
43. METHOD: DERIVE BY CATEGORY REMOVAL
Remove annotations that are subclasses of a single high-level node
Repeat for each 1° subclass
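A toy sketch of the category-removal procedure. The real HPO is multi-parent; for simplicity each term here is assigned one hypothetical top-level category, and the terms/categories are illustrative only.

```python
# Toy single-category assignment (illustrative, not the actual HPO graph).
CATEGORY = {
    "microcephaly": "abnormality of the nervous system",
    "motor delay": "abnormality of the nervous system",
    "hypertelorism": "abnormality of the head and neck",
}

def remove_category(profile, category):
    """Drop every annotation that falls under one high-level node."""
    return [term for term in profile if CATEGORY.get(term) != category]

profile = ["microcephaly", "motor delay", "hypertelorism"]
# Repeat the removal once per top-level category, deriving one
# degraded profile each time.
for top_level in sorted(set(CATEGORY.values())):
    print(top_level, "->", remove_category(profile, top_level))
```

Comparing each derived profile back to the original disease then shows how much a whole missing organ system hurts the match, which is what the next slide measures.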
47. SEMANTIC SIMILARITY ALGORITHMS ARE ROBUST IN THE FACE OF MISSING INFORMATION
[Figure: similarity of derived disease to original, plotted against derived disease profile rank]
On average, 92% of derived diseases are most similar to the original disease
Severity of impact follows the proportion of phenotypes removed
48. METHOD: DERIVE BY LIFTING
Iteratively map each class to its direct superclass(es)
Keep only leaf nodes
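The lifting procedure can also be sketched in a few lines. This is a toy with a two-edge chain of illustrative terms, not the real HPO graph:

```python
# Toy is_a edges (illustrative chain, not the actual HPO file).
PARENTS = {
    "vitreous hemorrhage": ["hemorrhage of the eye"],
    "hemorrhage of the eye": ["abnormality of the eye"],
}

def lift(profile):
    """Replace each term with its direct superclass(es), keep only leaves."""
    lifted = set()
    for term in profile:
        lifted.update(PARENTS.get(term, [term]))  # root terms stay put
    # drop any term that is a direct parent of another lifted term
    return {t for t in lifted
            if not any(t in PARENTS.get(o, []) for o in lifted if o != t)}

print(lift({"vitreous hemorrhage"}))  # {'hemorrhage of the eye'}
```

Applying `lift` repeatedly produces progressively more general profiles, which is how the sensitivity result on the next slide is generated.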
49. SEMANTIC SIMILARITY ALGORITHMS ARE SENSITIVE TO SPECIFICITY OF INFORMATION
[Figure: similarity of derived disease to original, plotted against derived disease profile rank]
Severity of impact increases with more-general phenotypes
51. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed diseases
52. WHAT TO DO WHEN WE CAN’T DIAGNOSE WITH A KNOWN DISEASE?
54. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
GWAS + ClinVar + OMIM
55. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
What else can we leverage?
GWAS + ClinVar + OMIM
56. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
What else can we leverage? …animal models
Orthology via PANTHER v9
57. COMBINED, HUMAN AND MODEL PHENOTYPES CAN BE LINKED TO >75% HUMAN GENES
Orthology via PANTHER v9
58. EACH MODEL CONTRIBUTES DIFFERENT PHENOTYPES
Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology
https://code.google.com/p/phenotype-ontologies/
60. SOLUTION: BRIDGING SEMANTICS
[Figure: a fragment of the Uberon cross-species anatomy graph linking terms such as “lung”, “lung bud”, “lung primordium”, “alveolus of lung”, “foregut”, “endoderm of foregut”, and “swim bladder” via is_a (SubClassOf), part_of, develops_from, capable_of, and only_in_taxon relations, with cross-references to FMA (lung, pulmonary alveolus), MA (lung, lung alveolus), EHDAA (lung bud), GO (respiratory gaseous exchange), and NCBITaxon (Mammalia, Actinopterygii)]
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
61. PHENOTYPE REPRESENTATION REQUIRES MORE THAN PHENOTYPE ONTOLOGIES
Disease: type II diabetes mellitus (DOID:9352) - disease & phenotype data
Gene Ontology: glucose metabolism (GO:0006006) - gene/protein function data
Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361) - metabolomics, toxicogenomics data
Cell: pancreatic beta cell (CL:0000169) - transcriptomic data
62. MODELS BASED ON PHENOTYPIC SIMILARITY
Washington, Haendel, et al. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
63. OWLSIM: PHENOTYPE SIMILARITY ACROSS PATIENTS OR ORGANISMS
[Figure: a patient profile (resting tremors, REM disorder, shuffling gait, unstable posture, neuronal loss in substantia nigra, constipation, hyposmia) matched against a model organism profile (stereotypic behavior, abnormal EEG, decreased stride length, poor rotarod performance, axon degeneration, decreased gut peristalsis, failure to find food) via shared ancestor classes such as abnormal motor function, sleep disturbance, abnormal locomotion, abnormal coordination, CNS neuron degeneration, abnormal digestive physiology, and abnormal olfaction]
https://code.google.com/p/owltools/wiki/OwlSim
64. MONARCH PHENOTYPE DATA
Species | Data source | Genes | Genotypes | Variants | Phenotype annotations | Diseases
mouse | MGI | 13,433 | 59,087 | 34,895 | 271,621 | -
fish | ZFIN | 7,612 | 25,588 | 17,244 | 81,406 | -
fly | FlyBase | 27,951 | 91,096 | 108,348 | 267,900 | -
worm | WormBase | 23,379 | 15,796 | 10,944 | 543,874 | -
human | HPOA | - | - | - | 112,602 | 7,401
human | OMIM | 2,970 | - | 4,437 | - | 3,651
human | ClinVar | 3,215 | 100,523 | 445,241 | - | 4,056
human | KEGG | 2,509 | - | 3,927 | - | 1,159
human | ORPHANET | 3,113 | - | 5,690 | - | 3,064
human | CTD | 7,414 | - | 23,320 | - | 4,912
Also in the system: rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; yeast; CTD; GWAS; PANTHER and HomoloGene orthologs; BioGRID interactions; DrugBank; AutDB; Allen Brain …157 sources to date
Coming soon: animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
65. EXOMISER METHOD
Input: cohort VCF file
Exomiser filters: Mendelian (homozygous recessive, de novo dominant, compound het, X-linked), frequency
Output: candidates, prioritized by phenotype comparison (HP)
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
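The filtering stages named above can be sketched as a simple predicate per variant. This is a hedged illustration, not Exomiser's actual code; the field names, thresholds, and genotype encoding are hypothetical.

```python
# Hypothetical variant records and filter logic (illustrative only).
def passes_filters(variant, max_freq=0.01, model="homozygous_recessive"):
    """Keep rare variants compatible with the chosen inheritance model."""
    if variant["population_freq"] > max_freq:   # frequency filter
        return False
    if model == "homozygous_recessive":
        return variant["genotype"] == "1/1"
    if model == "de_novo_dominant":
        return variant["genotype"] == "0/1" and not variant["in_parents"]
    return True  # other models omitted in this sketch

variants = [
    {"id": "v1", "population_freq": 0.20,  "genotype": "1/1", "in_parents": False},
    {"id": "v2", "population_freq": 0.001, "genotype": "1/1", "in_parents": False},
]
print([v["id"] for v in variants if passes_filters(v)])  # ['v2']
```

In the real pipeline, the variants surviving these filters are then ranked by comparing the patient's HPO profile against gene-phenotype associations, as the results on the next slide show.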
66. EXOMISER RESULTS ON NIH UNDIAGNOSED DISEASE PROGRAM PATIENTS
9 previously diagnosed families: identified causative variants with a rank of at least 7 out of 408 potential variants
21 families without identified disorders: we have now prioritized variants in STIM1, ATP13A2, PANK2, and CSF1R in 5 different families (2 STIM1 families)
Bone et al., submitted
67. UDP_2731
[Figure: patient phenotypes (gait apraxia, spasticity, thyroid stimulating hormone excess, behavioural/psychiatric abnormality, hyperactivity) matched against the mouse model Sh3kbp1 tm1Ivdi -/- (hyperactivity, increased dopamine level, increased exploration in new environment, abnormal locomotor behavior) via shared classes such as abnormal voluntary movement, abnormality of the endocrine system, and behavioral abnormality]
70. PHENOVIZ: INTEGRATE ALL HUMAN, MOUSE, AND FISH DATA TO UNDERSTAND CNVS
Desktop application for differential diagnostics in CNVs
Explain manifestations of CNV diseases based on genes contained in the CNV
E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin
Double the number of explanations using model data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
72. WHO USES THE HPO?
Bayés, Àlex, et al. Nature Neuroscience 2011
Castellano, Sergi, et al. PNAS 2014
Corpas, Manuel, et al. Current Protocols in Human Genetics 2012
Sifrim, Alejandro, et al. Nature Methods 2013
Lappalainen, Ilkka, et al. Nucleic Acids Research 2013
Firth, Helen V., and Caroline F. Wright. Developmental Medicine & Child Neurology 2011
Many more…
73. ADVANTAGES OF HPO
Widely used, flexible, freely available, and community-supported resource
Prioritization of candidate variants through tools such as PhenIX, Exomiser, and others
Extensive links to model organism ontologies, allowing selection of optimal models for wet-lab validation and research, and collaborators
Intuitive clinical interfaces built into tools such as PhenoTips, Certagenia, and others
Ability to easily share data with key international projects (Decipher/DDD, RD-Connect, PhenomeCentral, Matchmaker Exchange, etc.)
74. LIMITATIONS
Quantitative vs. qualitative: much clinical data is quantitative lab data with reference standards. It is possible to convert based on ±3 SD, but there is no way to record the reference measure/population yet.
Temporal presentation: ontologies can support temporal ordering, but data capture tools don’t yet capture this and the comparison algorithms don’t yet take it into account
Severity: semantic encoding is available, but simple in comparison to phenotype-specific measures
Emerging ontology: some areas have poor coverage, such as nervous system, behavior, and imaging results. Need to represent the assays in these contexts.
75. ACKNOWLEDGMENTS
NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Joie Davis, Neal Boerkoel, Cyndi Tifft, Bill Gahl
OHSU: Nicole Vasilevsky, Matt Brush, Bryan Laraway, Shahim Essaid
Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall
UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone
U of Pitt: Chuck Boromeo, Jeremy Espino, Becky Boes, Harry Hochheiser
Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley
Toronto: Marta Girdea, Sergiu Dumitriu, Heather Trang, Mike Brudno
JAX: Cynthia Smith
Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C, HHSN268201400093P
76. WHERE TO GET HPO, AND HOW TO REQUEST NEW CONTENT
We need you!
Browse in the following places:
http://www.human-phenotype-ontology.org/
http://purl.bioontology.org/ontology/HP
Get the file:
http://purl.obolibrary.org/obo/hp.owl
Request content:
https://sourceforge.net/p/obo/human-phenotype-requests/new/
More documentation:
https://code.google.com/p/phenotype-ontologies/
Editor's Notes
A good example of this can be seen here. So the average person has had enough experience with Down’s Syndrome that they are likely able to notice that all three of these patients have it. However, if you asked them to describe this phenotype in short phrases that can be agreed upon by the majority of people, can be used to identify the disorder, and ideally can be easily used by a computer, they are going to have a difficult time. It is a collection of phenotypes “together” that define a disease, and it is difficult for someone who does not have the proper training to parse this information out.
Down’s Syndrome good example
Average person can identify Down’s , but can’t list out the Pheno for a computer
Unless clinically TRAINED tough to outline a phenotype
But, through hard work, and being very specific with our description of phenotype we can begin to approach making this information manageable and use it to identify disease.
Can make a COMPUTABLE list of phenotype with hard work
How can we characterize the diseases? Well, a distribution of the phenotypes across all diseases reveals that most have phenotypes affecting the nervous system, while the fewest have connective tissue phenotypes.
7401 diseases
99,045 annotations
There are 20 categories in all. Note that they are not additive, as some phenotypes might belong to two categories. For example, some eye and ear phenotypes also belong to the head and neck.
MESSAGE: Phenotype annotations unevenly distributed in different anatomical systems.
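The per-category tally behind this kind of distribution is straightforward to compute. A minimal sketch, using a hypothetical handful of disease-to-phenotype annotations (the OMIM IDs and category labels here are illustrative, not real annotation data):

```python
from collections import Counter

# Hypothetical subset of disease-to-phenotype annotations, where each
# phenotype has been rolled up to a top-level organ-system category.
annotations = [
    ("OMIM:142900", "Abnormality of the nervous system"),
    ("OMIM:142900", "Abnormality of the skeletal system"),
    ("OMIM:255800", "Abnormality of the skeletal system"),
    ("OMIM:255800", "Abnormality of the eye"),
    ("OMIM:301500", "Abnormality of the nervous system"),
]

# Tally annotations per category. Note the counts are not additive across
# categories, because one phenotype term can roll up to two branches
# (e.g. some eye phenotypes also belong to head and neck).
by_category = Counter(category for _, category in annotations)
for category, n in by_category.most_common():
    print(category, n)
```

Scaling this up to all 7401 diseases and 99,045 annotations is the same counting exercise over the full annotation file.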
As you might expect, we use this phenotype ontology to search for known variants with phenotypes similar to our patient’s. Using just HPO and the diseases annotated with HPO terms, we can compare against known human diseases, but, as we are all painfully aware, we do not know the consequences of mutations in every gene in the human genome.
We use the patient pheno and search for known variants with similar pheno
Can do this with humans, but we don’t know all human diseases
PhenIX uses human data and predicted deleteriousness
HGMD ClinVAR OMIM Orphanet
HGMD mutations were inserted into variant files from DAG panels from which the causative mutations had been removed and phenotypic annotations of the corresponding diseases were extracted from the HPO database. The genes were ranked with PhenIX. Results were simulated either on the entire disease set (All) or by filtering for known autosomal dominant (AD) or autosomal recessive (AR) diseases (fig. S2). A total of 8504 (All), 3471 (AD), and 5006 (AR) simulations were performed.
See final figure in paper: http://stm.sciencemag.org/content/6/252/252ra123.full
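The core bookkeeping of such a spike-in simulation is simple: insert a known causative mutation into a background variant set, score every gene, and record the rank of the causative gene. A minimal sketch of that evaluation loop; the gene names and scores are invented for illustration and the scoring itself stands in for PhenIX’s combined phenotype/deleteriousness score:

```python
def rank_of_causative(scores, causative):
    # scores: gene -> combined score (phenotype similarity x deleteriousness);
    # higher is better. Return the 1-based rank of the causative gene.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(causative) + 1

# Two simulated runs: a gene-score dict plus the known spiked-in gene.
simulated = [
    ({"CAUSAL": 0.95, "BG1": 0.40, "BG2": 0.10}, "CAUSAL"),
    ({"CAUSAL": 0.55, "BG1": 0.70, "BG2": 0.20}, "CAUSAL"),
]
ranks = [rank_of_causative(scores, gene) for scores, gene in simulated]
top1 = sum(r == 1 for r in ranks) / len(ranks)
print(ranks, top1)
```

Repeating this over thousands of spiked-in HGMD mutations (8504 All, 3471 AD, 5006 AR in the paper) yields the rank distributions shown in the final figure.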
A community-driven knowledge curation platform for skeletal dysplasias
Editorial process for curating phenotype and genotype knowledge on skeletal dysplasias
Integration with HPO
1: Diagnostic Facial Signature via a vector diagram: shows directional differences versus a normative data set. It is a 2D representation of a 3D image. The ends of the coloured lines represent the patient’s (or averaged groups of patients’) image(s). The grey scale represents the normative data (normal equivalent). In this case, if the tips of the arrows are seen sticking out from the grey surface, then the patient (or group of patients) is characterised by a protrusion of that region. E.g., here the patient has a prominent nasal tip and prominent lips. If you were to rotate the 3D image, you would see the lines in the cheek region pointing inwards, i.e., the patient has a flat cheek region compared to the reference range.
2: Monitoring treatment in a multisystem disorder. Progressive reduction in facial dysmorphology (anomaly), top to bottom, over treatment for a rare metabolic condition (Mucopolysaccharidosis type 1)
3: Text mining: converting face to text, specifically, elements of morphology terms.
When we have a patient with an undiagnosed/rare disease, we want to be able to search the whole landscape of knowledge.
How can we effectively utilize phenotype information collected about the patients?
It all boils down to the question, how much is enough to be useful?
What does that information need to be like in order to be useful?
We have partnered with the UDP to try to help figure out how much information might be necessary to collect about the patients with rare diseases.
To illustrate the method, let’s take Schwartz-Jampel Syndrome. It has ~100 phenotype annotations, distributed across 17 phenotypic categories, the majority of which are in the skeletal system.
The problem is in HSPG2, a proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules.
For this example, I’ve taken a subset of the phenotypes, and colored them by category.
We can test the role of category by creating a derived disease that removes all the phenotypes for that category as our “case”…
And then as a control, remove an equal amount of “information” from other categories.
In the case of Schwartz-Jampel Syndrome, removing only the skeletal phenotypes (which comprise 40% of the phenotype profile) significantly reduces its similarity, dropping it to only 86% similar, whereas removing the same amount of information from the controls gives an average of 91% similarity. In this case, there were 73 controls to compare to. We have performed these types of experiments for every disease profile in our corpus (approx. 8K).
The experiment therefore is:
Create a variety of “derived” less-specific diseases
Assess the change in similarity:
Is the derived disease still considered similar to the original disease?
…or more similar to a different disease?
Is it distinguishable beyond random?
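The case/control logic of the experiment above can be sketched in a few lines. This is a toy stand-in, not the actual pipeline: the HPO IDs, categories, and information-content (IC) weights are invented, and an IC-weighted overlap replaces the real OWLSim semantic similarity.

```python
import random

# Hypothetical IC weight per term; rarer, more specific terms weigh more.
ic = {
    "HP:0002650": 6.0, "HP:0002750": 5.5, "HP:0008905": 7.0,  # skeletal
    "HP:0000545": 3.0, "HP:0000486": 2.5,                     # eye
    "HP:0001250": 2.0, "HP:0001263": 2.2,                     # nervous
    "HP:0001638": 4.0,                                        # cardiovascular
}
category = {
    "HP:0002650": "skeletal", "HP:0002750": "skeletal", "HP:0008905": "skeletal",
    "HP:0000545": "eye", "HP:0000486": "eye",
    "HP:0001250": "nervous", "HP:0001263": "nervous",
    "HP:0001638": "cardiovascular",
}
full = set(ic)

def similarity(subset):
    # IC-weighted overlap with the full profile (stand-in for OWLSim).
    return sum(ic[t] for t in subset) / sum(ic[t] for t in full)

# Case: derived disease with one whole category removed.
case = {t for t in full if category[t] != "skeletal"}

# Controls: the same *number* of phenotypes removed at random.
rng = random.Random(0)
controls = []
for _ in range(100):
    terms = sorted(full)
    rng.shuffle(terms)
    controls.append(similarity(set(terms[3:])))  # drop 3 terms at random

print(f"case (skeletal removed): {similarity(case):.2f}")
print(f"controls (random x3):    {sum(controls) / len(controls):.2f}")
```

Dropping one coherent, information-rich category hurts the similarity more than scattering the same number of removals across categories, which is exactly the signal the experiment measures.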
Using the results, we can create a metric to define when a phenotype profile is unique enough to be useful in comparisons to other diseases and model systems.
This has been implemented at UDP so that clinicians can provide quality human data to be able to compare to animal models.
It has also been implemented on monarch genotype pages, where one can see how well annotated any given model is.
This functionality is also available as a service call. We can also generate different types of reports based on these analyses to determine which diseases have the least coverage, which models are the least well annotated, etc.
Our abilities to link genotype to phenotype are constrained by our knowledgebase. To date, <40% of human genes have been directly linked to diseases or traits by ClinVar, OMIM, and GWAS. How are we going to discover the disease-gene links if the phenotype coverage is so poor?
(Of course, there may be many more links for GWAS, once we figure out what to do with all the variants that lie outside of gene boundaries.)
So, <40% of human genes have phenotypes. And when we look at the orthologs for each of the standard multi-cellular model organisms, there doesn’t appear to be any more than about 50% coverage for any given model.
But when put together, they bring the phenotypic coverage of human genes (either directly or inferred via orthology) up to nearly 80%. That is A LOT of coverage. How can we better tap that?
Since any computational methods rely heavily on the data, what does the available data look like?
The distribution of phenotype information per model genotype is different compared to human disease annotations.
For mouse, there’s a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare;
For fish, there’s an increased representation of nervous system, skeletal, head and neck, cardiovascular, and connective tissue phenotypes.
(Note that these do not include “normal” phenotypes for either diseases or genotypes.)
Each model brings something different to the phenotype landscape.
Different terminology is used to describe clinical manifestations than is used to describe model system biological features.
Things like finding models of sirenomelia due to disruption of the lateral plate mesoderm. This helps to find models and gene candidates based on relationships in development.
Without additional knowledge and linking, computers can’t make the connections. These links take us from the molecular to the protein, to the cellular and anatomical, to the disease level of phenotypes
OWLsim computes semantic similarity between sets of phenotypes within and across species using the bridging semantics. Phenotypes in common from the bridging ontologies relate human clinical phenotypes with model organism phenotypes.
Examples include motor systems, olfaction, and digestion.
In this case, data encoded using the Human Phenotype Ontology has been made interoperable with mouse, zebrafish, and other model system ontologies. This also enables the use of more complex algorithms to detect similarity, not based solely on mapping or string matching; e.g. constipation and decreased gut peristalsis are both subtypes of abnormal digestive system physiology.
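The constipation example can be made concrete with a toy bridging hierarchy. A minimal sketch, assuming each species-specific term is pre-mapped to its ancestors in a shared bridging layer; HP:0002019 is the real HPO term for constipation, but the MP ID, labels, and the flat ancestor sets are illustrative (OWLSim uses richer, IC-weighted comparisons over the full ontology):

```python
# Toy bridging hierarchy: each species-specific term maps to a set of
# ancestors in a shared (UPheno-style) bridging ontology.
ancestors = {
    "HP:0002019": {"constipation",
                   "abnormal digestive physiology", "phenotype"},
    "MP:0003544": {"decreased gut peristalsis",
                   "abnormal digestive physiology", "phenotype"},
}

def closure(terms):
    # Union of ancestor sets for a whole phenotype profile.
    out = set()
    for t in terms:
        out |= ancestors[t]
    return out

def sim(profile_a, profile_b):
    # Jaccard over ancestor closures: terms with no string overlap still
    # match through a common superclass in the bridging layer.
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)

human = {"HP:0002019"}   # patient: constipation
mouse = {"MP:0003544"}   # model: decreased gut peristalsis
print(sim(human, mouse))
```

String matching would score this pair zero; the bridging semantics recover the match through the shared "abnormal digestive physiology" superclass.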
Run through pipeline: Exomiser LOT is a version of Exomiser that is less restrictive as far as what transcripts it recognizes (not as worried about off-target reads because of the Mendelian filters and the ability to look at the BAMs)
Exomiser
Exomiser LOT
Pheno only
Exomiser is an exome analysis tool that leverages the Uberpheno cross-species phenotype comparisons and standard exome filtering (pathogenicity, frequency, off-targets, etc.), in combination with Mendelian filtering and interactome walking. The original method paper is published: http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract
This work is unpublished but has been (just) submitted. Happy to share the manuscript if of interest.
Each model organism has a different suite of phenotypes that are examined, because different models are used to explore different types of biological function and malfunction. By using a diversity of model systems, we have the potential to identify candidates based on partial overlaps with the patient phenotype profile by looking at different models with mutations in potential candidates or related via interactions, co-expression, genomic regulatory region, etc.
Phenoviz is a new graphviz plugin that can be used as a standalone app for Windows, Mac, or Linux. The user uploads a list of CNVs detected by array CGH (SNP chips, or even genome sequence data, would also work as a starting point, but the program expects a simple list). You also enter a list of the HPO terms observed in the patient. The application then tries to find “matches” based on the single-gene disorders (human HPO annotations), the mouse models (mainly knockouts, MP annotations from MGI), or fish models (ZFIN E/Q annotations). This is being used in the Charité array CGH diagnostics service to help with interpretation of CNVs. Subjectively, the tool helps you to quickly find good candidates in order to write reports. The program also picks out the best matching CNV in case the user enters several (a typical array CGH finding in our lab has up to 50 CNVs, of which 2-5 are not found in databases of common variants like DGV).
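The matching idea described above reduces to scoring each CNV’s genes against the patient’s HPO terms via gene-to-phenotype annotations. A minimal sketch under made-up data: the gene names, CNV coordinates, and annotation sets below are all invented for illustration, and the score is a simple term-overlap rather than the tool’s actual semantic-similarity scoring:

```python
# Hypothetical gene -> phenotype annotations (from human diseases or
# model-organism knockouts in a real setting).
gene_phenotypes = {
    "GENE_A": {"HP:0001250", "HP:0001263"},
    "GENE_B": {"HP:0000545"},
    "GENE_C": set(),
}

# Hypothetical CNV calls from an array CGH run, each spanning some genes.
cnvs = {
    "chr1:1000-2000(del)": ["GENE_A", "GENE_C"],
    "chr5:500-900(dup)":   ["GENE_B"],
}

# HPO terms observed in the patient.
patient = {"HP:0001250", "HP:0001263", "HP:0000486"}

def cnv_score(genes):
    # Best per-gene overlap with the patient profile; 0 if no gene matches.
    return max((len(gene_phenotypes[g] & patient) for g in genes), default=0)

# Rank CNVs so the best phenotypic match comes first.
ranked = sorted(cnvs, key=lambda c: cnv_score(cnvs[c]), reverse=True)
for c in ranked:
    print(c, cnv_score(cnvs[c]))
```

With up to 50 CNVs per patient, ranking by phenotypic match is what lets a curator zero in on the 2-5 candidates worth writing up.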
There are a lot of people who have contributed to this work over many years.