Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of the language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
The Monarch Initiative: From Model Organism to Precision Medicine (mhaendel)
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool gives clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops data models in collaboration with the community in support of the burgeoning genotype-phenotype disease research field. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Diseases Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
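The phenotype similarity matching described above can be illustrated with a toy sketch. This is not the actual Monarch/OwlSim algorithm (which uses information-content-based semantic similarity); it simply closes each profile over its ontology ancestors and scores the Jaccard overlap. The tiny parent map and the HP-style IDs below are hypothetical.

```python
# Toy sketch of phenotype-profile comparison: close each profile over
# its ontology ancestors, then score the Jaccard overlap of the sets.

def ancestors(term, parents):
    """Return a term plus all of its ancestors under a parent map."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def profile_similarity(profile_a, profile_b, parents):
    """Jaccard similarity of two ancestor-closed phenotype profiles."""
    closed_a = set().union(*(ancestors(t, parents) for t in profile_a))
    closed_b = set().union(*(ancestors(t, parents) for t in profile_b))
    return len(closed_a & closed_b) / len(closed_a | closed_b)

# HP:3 and HP:4 are sibling phenotypes under HP:2, which is under HP:1,
# so two profiles holding one sibling each still share two ancestors.
parents = {"HP:2": ["HP:1"], "HP:3": ["HP:2"], "HP:4": ["HP:2"]}
sim = profile_similarity({"HP:3"}, {"HP:4"}, parents)  # 2 shared / 4 total
```

Ancestor closure is what lets a mouse phenotype and a human phenotype annotated with different but related terms still register as similar.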
The Application of the Human Phenotype Ontology (mhaendel)
Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, organized by the National Centre for Rare Diseases, Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Envisioning a world where everyone helps solve disease (mhaendel)
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and on how very many people, unbeknownst to the patient or even to the researchers creating the data, are involved in a diagnosis.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities for “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used to great success for computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time-consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms using non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
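The layperson-synonym lookup step the abstract describes might be sketched as follows. The mapping table is a hypothetical fragment (the real lay-HPO synonym set is far larger), though the HP IDs follow the real HPO identifier pattern.

```python
# Sketch: resolving patient-entered layperson terms to HPO classes via
# a synonym index. The index here is a tiny illustrative fragment.

LAY_SYNONYMS = {
    "low muscle tone": "HP:0001252",   # Hypotonia
    "droopy eyelid": "HP:0000508",     # Ptosis
    "long fingers": "HP:0001166",      # Arachnodactyly
}

def resolve_profile(patient_terms, synonym_index):
    """Map free-text layperson terms to HPO IDs; keep unmatched apart."""
    matched, unmatched = [], []
    for term in patient_terms:
        hpo_id = synonym_index.get(term.strip().lower())
        if hpo_id:
            matched.append(hpo_id)
        else:
            unmatched.append(term)
    return matched, unmatched

matched, unmatched = resolve_profile(
    ["Low muscle tone", "weird gait"], LAY_SYNONYMS)
```

The matched IDs form the patient-recorded profile that can then be compared, via semantic similarity, against clinician-recorded profiles.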
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... (mhaendel)
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical histories; they are not only among the most engaged patients, but can also themselves provision data for use in clinical evaluation. We therefore created a layperson version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
On the frontier of genotype-2-phenotype data integration (mhaendel)
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear... (Jeremy Yang)
Talk given at 14th Annual New Mexico BioInformatics, Science and Technology (NMBIST) Symposium, entitled Integrative Omics, on March 14-15, 2019. Most slides c/o IDG KMC PI Tudor Oprea, MD, PhD.
Visual Exploration of Clinical and Genomic Data for Patient Stratification (Nils Gehlenborg)
Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014)
http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/
In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline.
http://stratomex.caleydo.org
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Guided visual exploration of patient stratifications in cancer genomics (Nils Gehlenborg)
Talk presented at the "Beyond the Genome 2014: Cancer Genomics" conference (10 October 2014)
http://www.beyond-the-genome.com/2014/
Cancer is a heterogeneous disease, and molecular profiling of tumors from large cohorts has enabled characterization of new tumor subtypes. This is a prerequisite for improving personalized treatment and ultimately better patient outcomes. Potential tumor subtypes can be identified with methods such as unsupervised clustering or network-based stratification, which assign patients to sets based on high-dimensional molecular profiles. Detailed characterization of identified sets and their interpretation, however, remain a time-consuming exploratory process.
To address these challenges, we have developed StratomeX (http://stratomex.caleydo.org), an interactive visualization tool that complements algorithmic approaches. StratomeX also integrates a computational framework for query-based guided exploration directly into the visualization, enabling discovery of novel relationships between patient sets and efficient generation and refinement of hypotheses about tumor subtypes. StratomeX enables analysts to efficiently compare multiple patient stratifications, to correlate patient sets with clinical information or genomic alterations, and to view the differences between molecular profiles across patient sets.
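The kind of stratification comparison StratomeX visualizes can be reduced to a contingency (overlap) matrix between two patient groupings. A minimal sketch, with made-up patient and cluster labels:

```python
# Sketch: the overlap matrix between two patient stratifications, the
# quantity StratomeX's ribbon view renders. Labels are illustrative.

from collections import Counter

def overlap_matrix(strat_a, strat_b):
    """Count patients in each pair of (set in A, set in B)."""
    return Counter((strat_a[p], strat_b[p]) for p in strat_a)

# One stratification from expression clustering, one from a mutation.
strat_by_expression = {"p1": "cluster1", "p2": "cluster1", "p3": "cluster2"}
strat_by_mutation   = {"p1": "TP53-mut", "p2": "TP53-wt",  "p3": "TP53-wt"}
counts = overlap_matrix(strat_by_expression, strat_by_mutation)
```

Large off-diagonal entries suggest that the two stratifications capture different structure in the cohort, which is exactly the kind of relationship an analyst would explore further.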
https://www.creative-bioarray.com/support/resazurin-cell-viability-assay.htm
The resazurin cell viability assay is a simple, rapid, reliable, sensitive, safe, and cost-effective measurement of cell viability.
STR DNA profiling is now a powerful, inexpensive tool that can generate unique DNA signatures, which can be used to authenticate cell lines and detect contamination by more than one cell type. This presentation covers why scientists need cell line authentication, what an STR profile is, and the STR profiling workflow from Creative Bioarray.
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
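The repeat count that an STR profile records per marker can be sketched as a simple scan for contiguous copies of the core unit. The sequence below is invented for illustration.

```python
# Sketch: count contiguous copies of a short core repeat unit at an
# STR locus; the per-marker repeat count is what an STR profile records.

def count_tandem_repeats(sequence, unit, start):
    """Number of contiguous copies of `unit` beginning at `start`."""
    n, i = 0, start
    while sequence[i:i + len(unit)] == unit:
        n += 1
        i += len(unit)
    return n

# "GATA" repeated four times, as at a tetranucleotide STR marker.
seq = "TTCGATAGATAGATAGATACCA"
n_repeats = count_tandem_repeats(seq, "GATA", 3)  # 4
```

A real profile records such counts across a panel of markers; because repeat numbers vary between individuals, the combined counts form a near-unique signature for a cell line.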
Approaches for the Integration of Visual and Computational Analysis of Biomed... (Nils Gehlenborg)
The integration of computational and statistical approaches with visualization tools is becoming crucial as biomedical data sets are rapidly growing in size. Finding efficient solutions that address the interplay between data management, algorithmic and visual analysis tools is challenging. I will discuss some of these challenges and demonstrate how we are addressing them in our Refinery Platform project (http://www.refinery-platform.org).
How to transform genomic big data into valuable clinical information (Joaquin Dopazo)
The impact of genomics in translational medicine: present view
13th October 2014, Vall d’Hebron Institute of Research (VHIR), Barcelona, Spain
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
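A simplified in-memory model of the synonym tagging the abstract describes might look like this. The class structure is hypothetical; the real HPO marks layperson synonyms with a synonym-type annotation in its OWL/OBO representation.

```python
# Sketch: tagging layperson synonyms on an ontology class, mirroring
# (in a much-simplified form) the lay-synonym tagging done in the HPO.

class OntologyClass:
    def __init__(self, curie, label):
        self.curie = curie
        self.label = label
        self.synonyms = []  # list of (text, synonym_type) pairs

    def add_synonym(self, text, synonym_type="exact"):
        self.synonyms.append((text, synonym_type))

    def layperson_synonyms(self):
        return [t for t, kind in self.synonyms if kind == "layperson"]

hypotonia = OntologyClass("HP:0001252", "Hypotonia")
hypotonia.add_synonym("Muscular hypotonia")                    # expert
hypotonia.add_synonym("Low muscle tone", synonym_type="layperson")
lay = hypotonia.layperson_synonyms()
```

Tagging, rather than replacing, the expert terminology is what lets the same ontology serve both clinicians and patients.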
openEHR in Research: Linking Health Data with Computational Models (Koray Atalag)
My presentation at the Medinfo 2017 openEHR Developers Workshop.
The aim was to demonstrate how openEHR supports very advanced research and analytics with examples from computational physiology and biosimulation to create patient-specific decision support.
Scientific Consensus on Brain Fingerprinting and Differing Views on the Scien... (Karlos Svoboda)
The following proposed Scientific Consensus on Brain Fingerprinting has arisen from discussions among forensic scientists, legal experts, psychophysiologists, and experts in law enforcement and national security. These discussions were initiated by Lawrence A. Farwell. This is a work in progress. Discussions of these and other related issues are ongoing. Please refer comments and suggestions to Lawrence A. Farwell at LFarwell@brainwavescience.com. The most fundamental point of consensus among scientists and other relevant experts regarding brain fingerprinting, forensic science, and science in general is that different methods produce different results. Brain fingerprinting, from the seminal Farwell and Donchin (1986; 1991) and Farwell and Smith (2001) papers to the present, has never produced an error, neither a false negative nor a false positive. Some alternative methods of applying the same brain responses in attempts to detect concealed information have resulted in 10% to 15% errors and in some cases as high as nearly 50% errors, no better than chance. Even some purported “replications” of Farwell and Donchin have in fact used fundamentally different methods. Consequently, they have failed to achieve accuracy approaching that of brain fingerprinting and, unlike brain fingerprinting, are susceptible to countermeasures. These fundamental differences in scientific methods are the reason why brain fingerprinting has been successfully applied in the field and ruled admissible in court, and these alternative methods are unsuitable for field use or application in the criminal justice system or national security. In developing this consensus, we have specified precisely the standard scientific methods that constitute brain fingerprinting and attempted to identify the specific standards that are necessary and sufficient to obtain the results that brain fingerprinting has consistently attained.
We have sought to identify differences in methods that are responsible for the widely divergent results obtained in different laboratories conducting related research. Fundamental brain fingerprinting scientific principles, methods, and scientific standards are briefly described in the first section of this article. The proposed Scientific Consensus on Brain Fingerprinting presumes a thorough understanding of the information contained therein. It also assumes familiarity with the articles in the literature cited in the Background section below. In the course of developing a consensus, some points have arisen on which there is considerable diversity of opinion. Some of these Differing Views on Brain Fingerprinting are briefly outlined following the Scientific Consensus on Brain Fingerprinting.
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
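The core association step of a GWAS can be illustrated with a chi-square test on a 2x2 table of allele counts in cases versus controls. The counts below are invented; real pipelines add quality control, covariate adjustment, and multiple-testing correction across millions of SNPs.

```python
# Toy sketch of a single-SNP GWAS association test: chi-square
# statistic for a 2x2 table of allele counts. Counts are made up.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Rows: cases, controls; columns: risk-allele count, other-allele count.
stat = chi_square_2x2(60, 40, 40, 60)
```

A large statistic (compared to the chi-square distribution with one degree of freedom) flags the SNP as associated with the trait; the table here gives a statistic of 8.0, nominally significant before any multiple-testing correction.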
Empowering patients by increasing accessibility to clinical terminology (Nicole Vasilevsky)
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The Software and Data Licensing Solution: Not Your Dad’s UBMTA (mhaendel)
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder (mhaendel)
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources subject to concept drift and differing definitions, such as diseases. The syntactic problem is that the same identifier appears in many variant forms, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing “beacons” with respect to their identifiers.
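The CURIE-plus-context mechanism described above can be sketched as a simple prefix-map expansion. The context entries follow common OBO-style URL patterns but are only an illustrative fragment, not the framework's actual registry.

```python
# Sketch: expanding prefixed identifiers (CURIEs) to full IRIs against
# a JSON-LD-style context (prefix map). Context entries are illustrative.

CONTEXT = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "OMIM": "https://omim.org/entry/",
}

def expand_curie(curie, context):
    """Turn a prefixed identifier like 'HP:0001252' into a full IRI."""
    prefix, _, local_id = curie.partition(":")
    if prefix not in context:
        raise KeyError(f"Unknown prefix: {prefix}")
    return context[prefix] + local_id

iri = expand_curie("HP:0001252", CONTEXT)
```

Agreeing on one prefix map is what lets differently written forms of the same identifier resolve to a single IRI, so that joins across data sources become simple string equality.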
Building (and traveling) the data-brick road: A report from the front lines ... (mhaendel)
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data (or minting of new ones for the data and/or within it), and manipulation of the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, whereas in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting - loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
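The common-data-model step described above can be sketched as follows; the source record shapes and field names are invented for illustration, not actual Commons schemas:

```python
# Two hypothetical sources model genotype-to-phenotype data differently;
# an ETL step maps both onto one common association shape.

def from_source_a(rec):
    # Source A relates a single variant to a disease.
    return [{"subject": rec["variant"], "predicate": "associated_with",
             "object": rec["disease"], "source": "A"}]

def from_source_b(rec):
    # Source B relates a set of alleles to a set of phenotypes:
    # emit one association per (allele, phenotype) pair.
    return [{"subject": a, "predicate": "associated_with",
             "object": p, "source": "B"}
            for a in rec["alleles"] for p in rec["phenotypes"]]

assocs = from_source_a({"variant": "var:1", "disease": "OMIM:0815"})
assocs += from_source_b({"alleles": ["allele:1", "allele:2"],
                         "phenotypes": ["HP:0002185"]})
```

Once normalized into one shape, associations from either source can be indexed and queried uniformly.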
Reusable data for biomedicine: A data licensing odyssey (mhaendel)
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse licensing practices by data providers.
How open is open? An evaluation rubric for public knowledgebases (mhaendel)
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases forces us to question our priorities with respect to maintaining them, developing new ones, or senescing/subsuming those that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable, but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to their key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO, and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
Force11: Enabling transparency and efficiency in the research landscape (mhaendel)
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Dataset description using the W3C HCLS standard (mhaendel)
This talk was presented at the BioCaddie (http://biocaddie.org/) workshop at the Force15 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to increase awareness of why a Semantic Web-compliant standard was needed for describing data, where current standards fall short, and how this new emerging standard, which extends prior efforts, can aid data discovery and integration. This work is being led by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In cancer cells:
Unlike healthy cells that “burn” the entire sugar molecule to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis, and frequently do not complete the subsequent step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the 36 or so ATP healthy cells gain. As a result, cancer cells need to use many more sugar molecules to get enough energy to survive.
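The arithmetic behind this claim can be made explicit, as a worked illustration using the approximate textbook yields of 2 and 36 ATP per glucose:

```python
# Approximate textbook ATP yields per glucose molecule.
ATP_GLYCOLYSIS_ONLY = 2     # cell relying on glycolysis alone
ATP_FULL_RESPIRATION = 36   # glycolysis + Krebs cycle + Ox-Phos

# Glucose needed to produce the same amount of ATP (360 here, arbitrary).
target_atp = 360
glucose_glycolysis_only = target_atp / ATP_GLYCOLYSIS_ONLY    # 180 molecules
glucose_full_respiration = target_atp / ATP_FULL_RESPIRATION  # 10 molecules
fold_more = glucose_glycolysis_only / glucose_full_respiration  # 18-fold
```

That roughly 18-fold extra glucose demand is what underlies the "glucose addiction" discussed next.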
Introduction to the Warburg phenomenon:
Warburg effect: cancer cells are usually highly glycolytic (“glucose addiction”) and take up more glucose from their surroundings than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his “discovery of the nature and mode of action of the respiratory enzyme.”
The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market - which includes products such as functional foods, beverages, and dietary supplements that provide health benefits beyond basic nutrition - is growing significantly. As healthcare expenses rise, the population ages, and demand for natural and preventative health solutions increases, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for personalized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing, offering significant opportunities for research and investment across a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
This PDF is about schizophrenia.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Mammalian Pineal Body: Structure and Functions
Global phenotypic data sharing standards to maximize diagnostic discovery
1. Global Phenotypic Data Sharing Standards
to Maximize Diagnostic Discovery
Melissa Haendel, PhD and Sebastian Köhler, PhD
RD-Action workshop
April 26th and 27th, Brussels
2. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
3. What do we mean by phenotype?
= Phenotypic abnormality = clinical feature
A constellation/pattern of clinical features defines a disease:
– [Disease X] ... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.
(Yes, this is a simplification)
4. Starting point: OMIM
Clinical Synopsis (CS) section
Free text phenotypic description
Very expressive
Online Mendelian Inheritance in Man database
5. (Un)Controlled Vocabularies
Not designed to be easily machine-interpretable
Spelling problems, acronyms, etc.
Homonyms:
“... fibrillation ...” can mean ventricular fibrillation or muscle fibrillation, and ventricular fibrillation ≠ muscle fibrillation
6. Why you should care
OMIM query: number of results
large bones: 264
large bone: 785
enlarged bones: 87
enlarged bone: 156
big bones: 16
huge bones: 4
massive bones: 28
hyperplastic bones: 12
hyperplastic bone: 40
bone hyperplasia: 134
increased bone growth: 612
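The fix motivated by this query variability, which the following slides develop, can be sketched as mapping every lexical variant to one canonical term so a single query replaces many. The synonym list and canonical token below are illustrative, not actual HPO content:

```python
# Map lexical variants to one canonical token (placeholder, not a real HP ID).
CANONICAL = "increased-bone-growth"
SYNONYMS = {
    "large bones": CANONICAL,
    "enlarged bone": CANONICAL,
    "bone hyperplasia": CANONICAL,
    "increased bone growth": CANONICAL,
}

def normalize(query):
    """Return the canonical term for a query string, or None if unknown."""
    return SYNONYMS.get(query.lower())
```

With normalization in place, all eleven query variants above would retrieve the same set of records.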
7. Motivation
HPO started in 2008
Goal: computer-interpretable clinical features!
Reliable information extraction from databases based on clinical features
Compute similarity between diseases based on clinical features
Compute similarity between patients based on clinical features
Compute similarity between patients and diseases based on clinical features
Interoperability with basic research to improve diagnostic discovery
Easy to use
Freely available
8. The Human Phenotype Ontology (HPO)
Description of phenotypic abnormalities (or clinical features) in humans
[Figure: excerpt of the HPO graph from the root “Phenotypic abnormality” down through “Abnormality of the nervous system”, “Abnormality of the central nervous system”, “Abnormality of movement”, “Incoordination”, “Ataxia”, “Gait disturbance”, “Gait ataxia”, “Cerebral inclusion bodies”, and “Neurofibrillary tangles” (a term); free-text Clinical Synopsis entries from OMIM records (“Neurofibrillary tangles may be present”, “Paired helical filaments”) point to the term “Neurofibrillary tangles”]
9. The Human Phenotype Ontology (HPO)
Synonyms merged into one term
Textual definitions for each term
id: HP:0002185
name: Neurofibrillary tangles
def: Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form. [HPO:sdoelken]
synonym: Neurofibrillary tangles may be present EXACT []
synonym: Paired helical filaments EXACT []
[Figure: the same HPO graph excerpt, with the OMIM phrases “Neurofibrillary tangles may be present” and “Paired helical filaments” merged as synonyms of the single term “Neurofibrillary tangles”]
10. The Human Phenotype Ontology (HPO)
Semantic relations (“subclass of”, “is a”)
From top to bottom, terms get more specific
[Figure: the HPO graph excerpt with “is a” edges linking each term to its more general parents]
11. Computable phenotype definitions of disease
HPO terms are used to annotate (describe) diseases.
E.g. Neurofibrillary tangles is used to annotate Alzheimer disease.
Orphanet + Monarch:
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
Köhler et al. https://doi.org/10.1093/nar/gkw1039
[Figure: Clinical Synopsis phrases from two OMIM entries mapping to the term “Neurofibrillary tangles”]
15. HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
[Chart: translation progress for Italian, Spanish, Russian, French, German, English layperson, Japanese, and Chinese, ranging from roughly 11–20% for most languages to near 100% for a few]
16. Adoption of HPO
Public-facing databases using HPO to annotate patients
Tools ingesting HPO-annotated data
Köhler et al. https://doi.org/10.1093/nar/gkw1039
17. Why HPO is a successful standard
One language shared by “all”
Synonyms “map” to one concept (HPO term)
Contains terms that no other ontology has
Comes with disease annotations! (Not just “yet another clinical terminology”)
Simple, qualitative phenotyping by deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis
Documented, traceable editing
Open science community project with diverse contributors
Constantly improved and extended; examples:
Layperson version for patients
Language translations
Opposite-relations between terms
18. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
19. A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with matching phenotype concepts is already good.
Patient: Splenomegaly, Nasal speech
Disease X: Increased spleen size, Nasal voice
These pairs are synonyms in HPO, i.e. they map to the same terms.
20. A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with similar but non-matching phenotypes is difficult.
Patient: Splenomegaly, Oral motor hypotonia
Disease X: Ruptured spleen, Decreased muscle mass
21. Similarity between two terms
[Figure: “Oral motor hypotonia” and “Muscular hypotonia of the trunk” share the informative ancestor “Abnormal muscle tone”, giving a high-scoring match; “Oral motor hypotonia” and “Abnormality of calvarial morphology” share only the root “Phenotypic abnormality”, giving a very low-scoring match]
Score: measured by information content
22. Comparing phenotype profiles
E.g. patient-to-disease comparison
The patient’s phenotypes are more similar to Disease A, so Orphamizer would rank Disease A before Disease B.
[Figure: high-, medium-, and very-low-scoring matches between the patient profile and Diseases A and B]
Score: measured by information content
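The information-content scoring idea from these slides can be sketched over a toy is-a hierarchy. The term names mirror the slides, but the annotation frequencies are invented for illustration, and this is a generic Resnik-style measure, not the exact OWLSim implementation:

```python
import math

# Mini is-a hierarchy mirroring slide 21 (child -> parents).
IS_A = {
    "oral motor hypotonia": ["abnormal muscle tone"],
    "muscular hypotonia of the trunk": ["abnormal muscle tone"],
    "abnormal muscle tone": ["phenotypic abnormality"],
    "abnormality of calvarial morphology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}

def ancestors(term):
    out = {term}
    for p in IS_A[term]:
        out |= ancestors(p)
    return out

# Invented annotation frequencies: fraction of diseases annotated at/below a term.
FREQ = {"phenotypic abnormality": 1.0, "abnormal muscle tone": 0.05,
        "oral motor hypotonia": 0.01, "muscular hypotonia of the trunk": 0.01,
        "abnormality of calvarial morphology": 0.02}

def ic(term):
    # Information content: rarer terms are more informative.
    return -math.log(FREQ[term])

def resnik(t1, t2):
    # Similarity = IC of the most informative common ancestor.
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

high = resnik("oral motor hypotonia", "muscular hypotonia of the trunk")
low = resnik("oral motor hypotonia", "abnormality of calvarial morphology")
```

As in the slides, the pair meeting at "abnormal muscle tone" scores high, while the pair meeting only at the root scores zero; profile comparison aggregates such pairwise scores.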
24. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based visualization tools
Phenotype data standards for exchange
25. The genome is sequenced, but ... we still don’t know very much about what it does
OMIM: 3,398 Mendelian diseases with no known genetic basis
ClinVar: at least 120,000 variants with no known pathogenicity (more than twice the 2016 count!)
27. More species = more coverage
Number of human protein-coding genes in the ExAC DB, as per Lek et al. Nature 2016: 19,008
[Figure: 9,739 genes (51%) have phenotype coverage from human data alone, 14,779 (78%) with additional species; the union of coverage in any species (2,195 + 7,544 + 7,235 = 16,974 genes) gives combined coverage of 89%]
Even inclusion of just four species boosts phenotypic coverage of genes by 38% (51% → 89%)
Mungall et al., Nucleic Acids Research: bit.ly/monarch-nar-2016
28. [Images: ulcerated paws (guinea pig pododermatitis), palmoplantar hyperkeratosis, and thick hand skin]
Image credits: “HandsEBS” by James Heilman, MD (own work), licensed under CC BY-SA 3.0 via Commons, https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG; http://www.guinealynx.info/pododermatitis.html
30. Challenge: each database uses its own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, NCIT, ...
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, ...
31. Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis” (human phenotype)
[Cartoon: the machine replies, “I have absolutely no idea what that means”]
32. Decomposition of complex concepts using species-neutral terms
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
“Palmoplantar hyperkeratosis” (human phenotype) = increased (PATO) + stratum corneum layer of skin (Uberon) + autopod (Uberon) + keratinization (GO)
Species-neutral ontologies, homologous concepts
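This entity-quality (EQ) decomposition can be represented as simple structured data, so comparing two species-specific terms reduces to comparing their species-neutral components. The mouse-style term and the exact component labels below are hypothetical illustrations:

```python
# EQ (entity-quality) post-composition: a complex species-specific term
# decomposed into species-neutral components (labels are illustrative).
EQ = {
    "Palmoplantar hyperkeratosis (HP)": {
        "quality": "increased (PATO)",
        "entity": "stratum corneum layer of autopod skin (Uberon)",
        "process": "keratinization (GO)",
    },
    "Thick footpad skin (hypothetical MP-style term)": {
        "quality": "increased (PATO)",
        "entity": "stratum corneum layer of pedal skin (Uberon, hypothetical)",
        "process": "keratinization (GO)",
    },
}

def shared_components(a, b):
    """Cross-species comparison via shared species-neutral components."""
    return {k for k in EQ[a] if EQ[a][k] == EQ[b][k]}
```

Two terms from different species-specific ontologies can thus be matched by the components they share, even when their labels share no words.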
41. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
42. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data
Patient exome/genome + public genomic data
Patient clinical phenotypes + public clinical phenotype and disease data
Patient environment + public environment and disease data
Patient omics phenotypes + public omics phenotypes and correlations
→ possible diseases → diagnosis & treatment
Under-utilized data
44. Combining G2P data for variant prioritization
Whole exome → remove off-target and common variants → Mendelian filters
Variant score from allele frequency and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
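The final combination step above can be sketched as follows: each candidate gets a variant score and a phenotype score, merged into one rank. The equal-weight mean is an illustrative stand-in for Exomiser's actual PHIVE scoring, and the gene names are placeholders:

```python
# Each candidate gene: a variant score (allele frequency + pathogenicity)
# and a phenotype score (cross-species phenotypic similarity).
candidates = [
    {"gene": "GENE_A", "variant_score": 0.95, "phenotype_score": 0.20},
    {"gene": "GENE_B", "variant_score": 0.70, "phenotype_score": 0.90},
]

def combined_score(c):
    # Illustrative equal weighting of the two evidence types.
    return 0.5 * c["variant_score"] + 0.5 * c["phenotype_score"]

ranked = sorted(candidates, key=combined_score, reverse=True)
```

Note how the phenotype evidence promotes GENE_B above a variant that looks more damaging in isolation, which is the point of the next slide.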
45. Exomiser results for UDP-diagnosed patients
Inclusion of phenotype data improves variant prioritization.
In 60% of the first 1,000 genomes at GEL, Exomiser predicts the top candidate.
In 86% of cases, Exomiser predicts within the top 5.
46. Example case solved by Exomiser
[Figure: the patient’s phenotypic profile matched against candidate genes; the same heterozygous missense mutation in STIM-1 was seen in two patients (no human disease association available, N/A) and matched the Stim1Sax/Sax mouse model]
Exomiser ranked the STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources.
http://bit.ly/exomiser
47. Deep phenotyping and “fuzzy” matching algorithms improve diagnostics
4.9% of exomes had dual molecular diagnoses, differentiated with deep phenotyping
48. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
49. How much phenotyping is enough?
[Cartoon: phenotype frequencies across eight characters: Hair present on head (7), Dark hair (6), Female (4), Male (4), Increased skin pigmentation (3), Enlarged ears (2), Enlarged lip (2), Blue skin (1), Pointy ears (1), Hair absent on head (1), Horns present (1)]
bit.ly/annotationsufficiency
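One way to make "how much phenotyping is enough" concrete, consistent with the cartoon: score a profile by the information content of its phenotypes, so one rare phenotype can outweigh several common ones. The frequencies are the toy counts from the cartoon (out of eight characters); the scoring rule is illustrative, not the Monarch implementation:

```python
import math

# Toy counts from the cartoon: phenotype -> frequency among 8 characters.
FREQ = {"dark hair": 6 / 8, "blue skin": 1 / 8, "enlarged lip": 2 / 8,
        "enlarged ears": 2 / 8, "increased skin pigmentation": 3 / 8}

def ic(pheno):
    # Rare phenotypes carry more information.
    return -math.log(FREQ[pheno])

def profile_score(profile):
    # Illustrative sufficiency score: total information content of the profile.
    return sum(ic(p) for p in profile)
```

Under this scoring, "blue skin" alone distinguishes a profile better than "dark hair" plus "enlarged lip" together.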
51. Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org
http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
52. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
62. Journals are now requiring HPO terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo, etc.)
Each article can be associated with a phenopacket
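A minimal phenopacket-like record, sketched as a plain dict and serialized to JSON for deposit under its own DOI; the field names follow the spirit of the GA4GH Phenopacket schema but are simplified here:

```python
import json

# Minimal phenopacket-like record: a subject plus HPO-coded phenotypic
# features (simplified, not the full GA4GH schema).
phenopacket = {
    "id": "example-patient-1",
    "subject": {"id": "patient:1"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0002185", "label": "Neurofibrillary tangles"}},
        {"type": {"id": "HP:0002066", "label": "Gait ataxia"}},
    ],
}

# Serialize to JSON, ready to deposit in a repository (e.g. Zenodo) under a DOI.
serialized = json.dumps(phenopacket, indent=2)
```

Because the phenotypes are HPO-coded rather than free text, the record is directly usable by the matching tools described earlier.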
63. Community “curate-athons” for HPO
Cardiovascular curate-athon at Stanford: ~20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days.
Abnormal Complex
Voltage to be added to all waves
-increased, decreased, fluctuating (alternans)
Duration to be added to all waves
-increased, decreased
P wave
-notching
-axis
QRS
-fractionation
-axis (right/left/extreme)
Q wave
R wave
S wave
R’ wave
S’ wave (abnormal only)
J wave (can be normal variant)
Epsilon wave (abnormal only)
Osborne wave (abnormal only)
Terminal slur wave (can be normal variant)
Delta wave (abnormal only)
Added 100s of clinically relevant cardiophysiology phenotypes to HPO; new exome analysis possible
64. Summary
The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community, for deep phenotyping of rare disease patients
Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates
Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants for WES, WGS, and CNVs
Patients can provide self-phenotyping information as partners in the deep phenotyping process
Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals
65. Acknowledgements
Orphanet: Ana Rath, Annie Olry, Marc Hanauer, Halima Lourghi
Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
RENCI: Jim Balhoff
OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky, Dan Keith
Genomics England/Queen Mary: Damian Smedley, Jules Jacobson
Jackson Laboratory: Peter Robinson, Leigh Carmody
Garvan: Tudor Groza, Craig McNamara
Hipbi / NeuroCure: Dominik Seelow, Markus Schülke-Gerstenfeld
Charite: Dominik Seelow, Tomasz Zemojtel
With special thanks to Julie McMurry for excellent graphic design
One of the workshop questions was: why has the HPO been recommended as an optimal ontology for clinical (phenotypic) descriptions?
I was not part of the process that led to this recommendation, so I will instead give my impression of why HPO has been so successful over the last 8 years.
First, what is the content of HPO? It contains phenotypic abnormalities ... definition in the context of HPO ... bla bla
What data did we want to use in the beginning. This is what we had.
Problems. Well – known. Just briefly.
Why is it so important to have controlled vocabularies at all
Query today:
Search: 'large bone'
Results: 9,128 entries.
Search: 'enlarged bone'
Results: 3,912 entries.
CHV = Consumer Health Vocabulary
Translation teams at: https://github.com/Human-Phenotype-Ontology/HPO-translations/blob/master/README.md
Contact: sebastian.koehler@charite.de
Merged with next slide
You take it from here Melissa?
There is a lot we don’t know about the genome.
As of March 2017, OMIM numbers: 3,398 unknown, 4,964 known.
ClinVar number: at least 121,000,
with the caveat that these are variants that researchers have found suspicious, due to rarity in the population or something else; contextually, 160k variants in the entire genome is not much.
Each organism provides unique genetic & phenotypic data that helps fill in knowledge gaps in the human genome. For example, much work has been done in chicks to understand limb development. I used to work in a fruit fly lab studying the brain, so I am particularly attached to fly data. As you can imagine, phenotypes described for flies, or other models, use very different terms than those used for humans. Later, I will discuss how Monarch is overcoming this challenge. Now I will show you an example of how using phenotype data from other organisms can improve human health.
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
We make things digestible. Complex concepts into simpler parts. We use ontologies that are comparative by design.
Represent organism as a biological subject
Represent diseases/genotypes as collections of nodes in the graph
Interoperable with other bioinformatics resources and leverage modern semantic standards
5 root classes:
Phenotypic abnormality, Mode of Inheritance, Clinical modifier, Mortality/Ageing, Frequency
11,813 classes/terms in HPO
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
OWLsim algorithm
About HPO 2: We want the vocabulary to enable sophisticated phenotypic matching within and across species
Our team has led international ontology development efforts, including ICD11, the HPO, the Gene Ontology, and major tissue/cell ontologies used for mammalian functional genomics. We have extensive experience integrating data using these ontologies. A fundamental challenge is to translate the vocabularies used by clinicians via EMRs and billing systems into those used in primary research data. For example, a clinician may describe a patient as having “Microcephaly” with the EMR code ICD10-Q02. A basic scientist using mice may describe this condition with MP:0003303. To translate between clinician and scientist, we provide services that map equivalent concepts. Finally, TransMed will generate dynamic ontologies by combining existing classifications with data in the system, e.g. to generate disease nosologies based on pathway membership, orthology, and phenotypic similarity.
///
Nosology: We will prototype dynamic ontology generation based on combining our existing knowledge sources. We will apply a mixture of methods. This includes our own k-BOOM Bayesian algorithm that weighs different knowledge sources and ontologies. We will also apply our data-driven techniques for generating nosologies based on molecular mechanistic information ingested into our knowledge graph. For low probability associations and equivalencies that may have high value, we will perform some curation to reconcile these.
https://github.com/monarch-initiative/monarch-disease-ontology/issues/90
Note the two subgraphs; little overlap in the upper areas
This was the novel case we solved. The UDP patient had a number of signs and symptoms, including various platelet abnormalities. The same heterozygous, missense mutation was seen in 2 patients and ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted to be maximally pathogenic. Finally, a mouse curated by MGI, involving a heterozygous missense point mutation introduced by chemical mutagenesis, exhibited strikingly similar platelet abnormalities.
Example showing how adding fuzzy phenotype matching improves disease diagnosis above using sequence based methodologies alone.
Knowing what the normal distribution and clustering of phenotypes is helps us know that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise to know that if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of 3 non-unique phenotypes offers a perfect match.
This is a lot of text and not easy to see for the audience.
The classic G+E=P. But the = has a lot that can be applied to aid the linking.
G-P or D (disease)
causes
contributes to
is risk factor for
protects against
correlates with
is marker for
modulates
involved in
increases susceptibility to
G-G (kind of)
regulates
negatively regulates (inhibits)
positively regulates (activates)
directly regulates
interacts with
co-localizes with
co-expressed with
P/D - P/D
part of
results in
co-occurs with
correlates with
hallmark of (P->D)
E-P
contributes to (E->P)
influences (E->P)
exacerbates (E->P)
manifest in (P->E)
G-E (kind of)
expressed in
expressed during
contains
inactivated by
Needs adjusting yet
Fully translational – from bench to bedside – group of stakeholders, contributors, and partners