The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities and supports “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success to support computational comparison of phenotypes against known diseases, other patients, and model organisms in the diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time-consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms in non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be used by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
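To make the idea of comparing a patient-recorded profile with a clinician-recorded one concrete, here is a minimal sketch. It is not the method used in the work described above: it swaps in a simple Jaccard overlap of ancestor sets, whereas HPO-based tools typically use information-content measures. The toy hierarchy and its term names are hypothetical and are not real HPO identifiers.

```python
# Toy is-a hierarchy (child -> parents). Term names are hypothetical,
# chosen for illustration only; they are not real HPO terms or IDs.
TOY_PARENTS = {
    "phenotypic_abnormality": [],
    "abnormal_eye": ["phenotypic_abnormality"],
    "abnormal_vision": ["abnormal_eye"],
    "myopia": ["abnormal_vision"],
    "cataract": ["abnormal_eye"],
    "abnormal_ear": ["phenotypic_abnormality"],
    "hearing_loss": ["abnormal_ear"],
}


def ancestors(term, parents=TOY_PARENTS):
    """Return the term itself plus every ancestor reachable via is-a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def profile_closure(profile, parents=TOY_PARENTS):
    """Union of the ancestor sets of all terms in a phenotypic profile."""
    closure = set()
    for term in profile:
        closure |= ancestors(term, parents)
    return closure


def jaccard_similarity(profile_a, profile_b, parents=TOY_PARENTS):
    """Shared ancestors divided by all ancestors of either profile."""
    a = profile_closure(profile_a, parents)
    b = profile_closure(profile_b, parents)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


patient_profile = ["myopia", "hearing_loss"]
clinician_profile = ["cataract", "hearing_loss"]
score = jaccard_similarity(patient_profile, clinician_profile)
print(f"similarity = {score:.3f}")
```

Because the comparison runs over ancestor sets, two profiles that use different but related terms (here, two different eye findings) still receive partial credit, which is the property that makes ontology-based matching tolerant of imprecise patient descriptions.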
Inbreeding coefficient
Inbreeding and self-fertilization
Genotypes mate at random with respect to their genotype at this particular locus.
There are many ways in which this assumption might be violated:
• Some genotypes may be more successful in mating than others: sexual selection.
• Genotypes that are different from one another may mate more often than expected: disassortative mating, e.g., self-incompatibility alleles in flowering plants, MHC loci in humans (the smelly t-shirt experiment).
• Genotypes that are similar to one another may mate more often than expected: assortative mating.
• Some fraction of the offspring produced may be produced asexually.
• Individuals may mate with relatives: inbreeding.
– self-fertilization
– sib-mating
– first-cousin mating
– parent-offspring mating
– etc.
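The effect of inbreeding on genotype frequencies can be sketched numerically. The sketch below uses the standard population-genetics result for a two-allele locus with allele frequency p and inbreeding coefficient F (heterozygotes are reduced by the fraction F relative to random mating); it is not taken from the slides themselves. The function name is hypothetical, and F is restricted to the range 0 to 1 for simplicity.

```python
def genotype_frequencies(p, f):
    """Expected genotype frequencies at a two-allele locus.

    p: frequency of allele A (allele a has frequency q = 1 - p)
    f: inbreeding coefficient; f = 0 reproduces random-mating
       (Hardy-Weinberg) proportions, f = 1 removes all heterozygotes.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("p and f must both lie between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p + f * p * q,      # homozygote excess from inbreeding
        "Aa": 2.0 * p * q * (1.0 - f),  # heterozygote deficit
        "aa": q * q + f * p * q,
    }


random_mating = genotype_frequencies(0.6, 0.0)
partial_inbreeding = genotype_frequencies(0.6, 0.5)
print(random_mating)
print(partial_inbreeding)
```

Note that the allele frequencies are unchanged by inbreeding; only the way alleles are packaged into genotypes shifts toward homozygotes.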
Affordable field high-throughput phenotyping - some tips (CIMMYT)
Remote sensing – Beyond images
Mexico 14-15 December 2013
The workshop was organized by the CIMMYT Global Conservation Agriculture Program (GCAP) and funded by the Bill & Melinda Gates Foundation (BMGF), the Mexican Secretariat of Agriculture, Livestock, Rural Development, Fisheries and Food (SAGARPA), the International Maize and Wheat Improvement Center (CIMMYT), the CGIAR Research Program on Maize, the Cereal Systems Initiative for South Asia (CSISA) and the Sustainable Modernization of Traditional Agriculture (MasAgro).
This presentation was prepared by Sandeep Kumar Maurya, M. Pharma, Department of Pharmaceutical Sciences, Dr. Harisingh Gour University, Sagar, Madhya Pradesh.
This SlideShare covers genetic disorders, molecular pathology, single-gene disorders and their types, and, at an advanced level, cancer: its mechanisms and models of cancer induction.
1. Introduction of genetic disorder
2. Common genetic disorders
3. Causes of genetic disorders
4. Symptoms of genetic disorders
5. Single gene disorder
6. Cancer
8. References
Genetic disorders occur when a mutation (a harmful change to a gene, also known as a pathogenic variant) affects your genes or when you have the wrong amount of genetic material. Genes are made of DNA (deoxyribonucleic acid), which contains instructions for cell functioning and the characteristics that make you unique.
You receive half your genes from each biological parent and may inherit a gene mutation from one parent or both. Sometimes genes change due to issues within the DNA (mutations). This can raise your risk of having a genetic disorder. Some disorders cause symptoms at birth, while others develop over time.
Introduction of Cancer
Cancer is caused by the failure of genetic mechanisms that control the growth and proliferation of cells. In most cases, cumulative damage to multiple genes (the "multi-hit" model) via physical and chemical agents, replication errors, etc. contributes to oncogenesis. However, a person's inherited genetic background may also contribute strongly. In cancer, a single transformed cell grows to become a primary tumor, accumulates more mutations and becomes more aggressive, then metastasizes to another tissue and forms a secondary tumor. The difference between a benign tumor and a malignant one mostly involves the latter's ability to invade and metastasize to other tissues. Tumors are classified according to the embryonic origin of the tissue from which they originate. The term carcinoma is used to denote cancers of endodermal (e.g., gut epithelia cancers) or ectodermal (e.g., skin, neural epithelia) origin. Cancers of mesodermal origin (e.g., muscle, blood cells) are called sarcomas. Carcinomas make up >90% of malignant tumors.
This presentation gives an introduction to genetics, chromosomes, DNA and RNA, and covers the genetics of developmental disorders of teeth, craniofacial disorders and syndromes, cleft lip and palate, malocclusion and dental caries.
DNA sequence variations are sometimes described as mutations and sometimes as polymorphisms. A gene is said to be polymorphic if more than one allele occupies that gene's locus within a population.
Polymorphic sequence variants usually do not cause overt debilitating diseases. Many are found outside of genes and are completely neutral in effect. Others may be found within genes, but may influence characteristics such as height and hair colour rather than characteristics of medical importance.
However, polymorphic sequence variation does contribute to disease susceptibility and can also influence drug responses (Single Nucleotide Polymorphisms).
Polymorphism promotes diversity and persists over many generations because no single form has an overall advantage or disadvantage over the others in terms of natural selection.
The term was originally used to describe visible forms of genes, but it now also includes cryptic modes such as blood types, which require a blood test to decode.
In addition to having more than one allele at a specific locus, each allele must also occur in the population at a rate of at least 1% to generally be considered polymorphic.
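The 1% rule above can be expressed as a small check. This is a sketch under stated assumptions: the function names and the sample data are hypothetical, and the rule is read as "at least two alleles each reach the frequency threshold in the sampled population".

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each observed allele to its relative frequency in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def is_polymorphic(alleles, threshold=0.01):
    """True if at least two alleles each occur at or above the threshold."""
    freqs = allele_frequencies(alleles)
    common = [allele for allele, freq in freqs.items() if freq >= threshold]
    return len(common) >= 2


# Hypothetical samples of 1,000 allele copies at one locus.
rare_variant_sample = ["A"] * 995 + ["G"] * 5    # G at 0.5%
common_variant_sample = ["A"] * 950 + ["G"] * 50  # G at 5%

print(is_polymorphic(rare_variant_sample))
print(is_polymorphic(common_variant_sample))
```

In practice the estimate depends on sample size and on which population was sampled, so a variant can fall on either side of the threshold in different cohorts.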
Gene polymorphisms can occur in any region of the genome.
The majority of polymorphisms are silent, meaning they do not alter the function or expression of a gene.
Some polymorphisms are visible. For example, in dogs the E locus can have any of five different alleles, known as E, Em, Eg, Eh, and e. Varying combinations of these alleles contribute to the pigmentation and patterns seen in dog coats.
Human blood groups are also an example of polymorphism.
Human skin color is influenced by an intergenic DNA polymorphism regulating transcription of the nearby BNC2 pigmentation gene.
FRAGILE X SYNDROME (FXS): an inherited cause of mental retardation (shhhoaib)
- FXS is a genetic syndrome that is the most widespread single-gene cause of autism and of inherited mental retardation.
- It is associated with expansion of the CGG trinucleotide repeat affecting the Fragile X mental retardation 1 (FMR1) gene on the X chromosome.
- This results in a failure to express the fragile X mental retardation protein (FMRP).
- FMRP is required for normal neural development.
- Absence of FMRP leads to abnormalities in brain development and function.
Progeria (HGPS), also known as Hutchinson-Gilford syndrome, is a progressive genetic disorder that causes children to age rapidly, beginning in their first two years.
The Monarch Initiative: From Model Organism to Precision Medicine (mhaendel)
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Diseases Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Empowering patients by increasing accessibility to clinical terminology (Nicole Vasilevsky)
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The Application of the Human Phenotype Ontology (mhaendel)
Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
Talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Nicole Vasilevsky, Jackie Wirz, Bjorn Pederson, Ted Laderas, Shannon McWeeney, William Hersh, David Dorr, and Melissa Haendel
On the Reproducibility of Science: Unique Identification of Research Resourc... (Nicole Vasilevsky)
Poster presentation at the Data Information Literacy Symposium at Purdue University in Indiana, Sept. 2013. This study is published here: https://peerj.com/articles/148/
Couture Curricula - BD2K Data Science Tailored to Your Needs (Nicole Vasilevsky)
Poster presentation at Force2016 (https://www.force11.org/meetings/force2016) describing Big Data to Knowledge (BD2K) efforts at Oregon Health & Science University.
Talk titled "Roles for Libraries in Providing Research Data Management Services" for presentation at the ACRL conference in Portland, OR, on 03/28/15. Presented by Nicole Vasilevsky (Oregon Health & Science University), Victoria Mitchell (University of Oregon) and Jeremy Kenyon (University of Idaho).
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
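As an illustration of how layperson synonyms let a patient's own words resolve to ontology classes, here is a minimal lookup sketch. The records, the `EX:` identifiers, and the function names are hypothetical placeholders; real identifiers and synonyms should be taken from a current HPO release.

```python
# Hypothetical term records. The EX: identifiers are placeholders,
# not real HPO IDs; the synonyms are examples of layperson wording.
TOY_TERMS = [
    {"id": "EX:0000001", "label": "Myopia",
     "lay_synonyms": ["nearsightedness", "near sighted"]},
    {"id": "EX:0000002", "label": "Hearing impairment",
     "lay_synonyms": ["hearing loss", "hard of hearing"]},
    {"id": "EX:0000003", "label": "Seizure",
     "lay_synonyms": ["fits", "convulsions"]},
]


def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore formatting."""
    return " ".join(text.lower().split())


def build_lay_index(terms):
    """Index both clinical labels and layperson synonyms by normalized text."""
    index = {}
    for term in terms:
        index[normalize(term["label"])] = term["id"]
        for synonym in term["lay_synonyms"]:
            index[normalize(synonym)] = term["id"]
    return index


def lookup(phrase, index):
    """Return the matching term ID, or None if the phrase is unknown."""
    return index.get(normalize(phrase))


lay_index = build_lay_index(TOY_TERMS)
print(lookup("  Nearsightedness ", lay_index))
print(lookup("Hard of  hearing", lay_index))
print(lookup("sore elbow", lay_index))
```

An exact-match index like this only finds phrases that curators have already added, which is why the coverage of layperson synonyms across classes matters.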
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO... (Robert H. McDonald)
This is the slide deck for my ACRL 2015 TechConnect Presentation with Nicole Vasilevsky (OHSU). For more on the program see http://bit.ly/1xcQbCr.
Envisioning a world where everyone helps solve disease (mhaendel)
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and how so very many people unbeknownst to the patient or even to the researchers creating the data are involved in a diagnosis.
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Making the most of phenotypes in ontology-based biomedical knowledge discovery (Michel Dumontier)
A phenotype is an observable characteristic of an individual and typically pertains to its morphology, function, and behavior. Phenotypes, whether observed at the bench or the bedside, are increasingly being used to gain insight into the diagnosis, mechanism, and treatment of disease. A key aspect of these approaches involves comparing phenotypes that are defined in multiple terminologies that often cater to altogether different organisms, such as mice and humans. In this seminar, I will discuss computational approaches for harmonizing and utilizing phenotypes for translational research. We will examine case studies which involve the computation of semantic similarity, including the use of phenotypes to inform clinical diagnosis of rare diseases, to identify human drug targets using mouse knock-out models, and to explore phenotype-based approaches for drug repositioning.
Enhancing Rare Disease Literature for Researchers and Patients (Erin D. Foster)
Objectives: In rare disease research, structured phenotype information is crucial to document in order to draw connections between other known cases and work towards diagnosis and treatment of disease. The Human Phenotype Ontology (HPO) is a standardized vocabulary that describes phenotypic abnormalities encountered in human diseases. To enable the increased identification of traits (i.e., phenotypes) associated with rare diseases, the HPO was expanded to include layperson synonyms to make the ontology more accessible and useful to patients. Additionally, the HPO was used to annotate phenotypes in a sample of rare disease case reports to provide structured annotations of rare disease phenotypes.
Methods: The HPO was systematically reviewed and 'layperson synonyms' were added to include terms used by patients and non-medical professionals. Subsequent work annotated phenotypic descriptions in a sample of rare disease case reports with HPO terms. The literature sample was identified by filtering 'case reports' in PubMed and excluding articles that were already included in the Online Mendelian Inheritance in Man (OMIM) database. The sample set was further restricted to articles from the European Journal of Human Genetics for the pilot set, which resulted in a final sample size of 143 articles. The papers were reviewed and annotated for the following information: disease name, associated gene(s), and corresponding phenotypes.
Results: The review of the HPO resulted in approximately half of the terms including layperson synonyms. A subset of the literature sample was annotated to determine the best curation workflow. Of that subset of papers (n=20), 353 total phenotypes were identified. 12% of these phenotypes were not included in the HPO and required new term and/or synonym requests. Some challenges encountered in this work included maintaining consistency in HPO term definitions and use, as well as annotation reliability.
Conclusion: This work contributes to knowledge of rare diseases by curating the existing literature to provide structured annotation of rare disease traits, which helps with information retrieval and data interoperability and reuse. Additionally, the expansion of the HPO to include layperson synonyms enables patients to 'self-phenotype' and contribute to the identification of rare disease traits. Following the completed annotation of the literature sample, future work will focus on incorporating the annotations into databases that collect rare disease phenotypic information. Further work is also being done to add additional layperson synonyms to the HPO through review of patient forums and medical message boards to continue to identify terminology used by actual patients.
The Foundation of P4 Medicine Keynote Presentation as presented by Leroy Hood, M.D., PhD, at the Ohio State University Personalized Health Care National Conference 2010.
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... (mhaendel)
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical history. They are not only some of the most engaged patients, but can also provide data themselves for use in clinical evaluation. We therefore created a layperson version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO, and debut a new software tool for patient-led deep phenotyping.
The Software and Data Licensing Solution: Not Your Dad’s UBMTA (mhaendel)
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder (mhaendel)
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight the discrepancies in conceptual definitions for manual review. This is especially useful for data sources annotated with concepts that are subject to drift and differences, such as diseases. The syntactic issue is that there are so many variations of the same identifier, making data joins difficult. We present a framework to reconcile and provide authoritative and integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
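The syntactic side of the problem can be sketched with a small prefix map that converts between CURIEs and full IRIs. This is an illustrative sketch, not the framework presented in the talk: the function names are hypothetical, and the two-entry prefix map is an assumed example in the style of a JSON-LD context.

```python
# Assumed example prefix map (prefix -> IRI namespace), in the style of
# a JSON-LD context. A real deployment would load this from a registry.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand(curie, prefix_map=PREFIX_MAP):
    """Turn a CURIE such as 'HP:0000001' into a full IRI."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return prefix_map[prefix] + local_id


def contract(iri, prefix_map=PREFIX_MAP):
    """Turn a full IRI back into a CURIE, preferring the longest namespace."""
    matches = [(namespace, prefix)
               for prefix, namespace in prefix_map.items()
               if iri.startswith(namespace)]
    if not matches:
        raise ValueError(f"no registered prefix for IRI: {iri!r}")
    namespace, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{iri[len(namespace):]}"


iri = expand("HP:0000001")
print(iri)
print(contract(iri))
```

Normalizing every source to one canonical CURIE form before joining is what makes records from different providers comparable; conceptual equivalence between different identifiers is a separate step.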
Building (and traveling) the data-brick road: A report from the front lines ... (mhaendel)
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data or minting of new ones for the data and/or within it, and manipulating the data according to common data models (e.g., a genotype-to-phenotype association in one source may relate a variant to a disease, whereas in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting: loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once the data are found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
Reusable data for biomedicine: A data licensing odyssey (mhaendel)
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse licensing practices by data providers.
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
How open is open? An evaluation rubric for public knowledgebases (mhaendel)
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases makes us necessarily question what our priorities are with respect to maintaining them, developing new ones, or senescing/subsuming ones that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles (Findable, Accessible, Interoperable, and Reusable), but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to its key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
On the frontier of genotype-2-phenotype data integration (mhaendel)
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Force11: Enabling transparency and efficiency in the research landscape (mhaendel)
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Dataset description using the W3C HCLS standard (mhaendel)
This talk was presented at the BioCaddie http://biocaddie.org/ workshop at the Force15 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to increase awareness of why a Semantic Web-compliant standard was needed for describing data, where current standards fall short, and how this new emerging standard that extends prior efforts can aid data discovery and integration. This work is being led by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Richard's entangled aventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2015 and the Nobel Prize in Physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space as they occur in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allowing for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Cancer cell metabolism: special Reference to Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two molecules of a smaller chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvates made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In Cancer Cells:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the last step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
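The arithmetic behind "a lot more sugar" can be made explicit. The short sketch below uses only the two yields quoted in the text (2 ATP per glucose for glycolysis alone, about 36 for complete respiration); the constant and function names are illustrative.

```python
import math

ATP_PER_GLUCOSE_GLYCOLYSIS_ONLY = 2    # figure quoted in the text
ATP_PER_GLUCOSE_FULL_RESPIRATION = 36  # "about 36" in the text


def glucose_needed(atp_target, atp_per_glucose):
    """Smallest whole number of glucose molecules yielding at least atp_target ATP."""
    return math.ceil(atp_target / atp_per_glucose)


# How many times more glucose a glycolysis-only cell must consume
# to match the ATP output of a fully respiring cell.
fold_increase = ATP_PER_GLUCOSE_FULL_RESPIRATION / ATP_PER_GLUCOSE_GLYCOLYSIS_ONLY

if __name__ == "__main__":
    print(fold_increase)
    print(glucose_needed(36, ATP_PER_GLUCOSE_GLYCOLYSIS_ONLY))
```

Under these textbook figures, a cell relying on glycolysis alone needs roughly 18 glucose molecules to obtain the ATP that one fully oxidized glucose molecule provides.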
Introduction to the Warburg Phenomenon:
Warburg effect: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme".
Warburg effect: The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA: length 23-25 nt; trans-acting; binds the target mRNA with mismatches; inhibits translation.
siRNA: length 21 nt; cis-acting; binds the target mRNA with perfect complementarity.
Piwi-RNA (piRNA): length 25 to 36 nt; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
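The three steps above can be mimicked with plain string operations. This is a toy sketch, not a biological model: it assumes a fixed 21-nt fragment size, perfect complementarity, and an arbitrary choice of guide strand, and the function names `dice` and `risc_targets` are invented for illustration.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the strand that would base-pair with `rna`, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(strand, size=21):
    """Step 1 (Dicer): cut one strand of a long dsRNA into consecutive
    fragments of `size` nucleotides; a shorter trailing remainder is dropped."""
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def risc_targets(guide, mrna):
    """Steps 2-3 (RISC): keep `guide` as the single retained strand and return
    every start position in `mrna` that is perfectly complementary to it."""
    site = reverse_complement(guide)
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


if __name__ == "__main__":
    target = "AUGGCUACGUUAGCCGAUACG"            # made-up 21-nt region of an mRNA
    mrna = "GGGAAA" + target + "CCCUUU"         # made-up transcript containing it
    antisense = reverse_complement(target)      # one strand of the trigger dsRNA
    for guide in dice(antisense):
        print(guide, risc_targets(guide, mrna))
```

Each reported position marks where the guide-loaded complex would pair with the transcript; in the cell, that pairing is what directs cleavage of the mRNA.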
THE RISC COMPLEX:
RISC is a large (>500 kD) RNA-multiprotein binding complex which triggers mRNA degradation in response to siRNA.
Unwinding of double-stranded siRNA by an ATP-independent helicase.
The active components of RISC are Ago proteins (endonucleases), which cleave the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of target mRNA.
2. PIWI (P-element induced wimpy testis): breaks phosphodiester bonds of mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
2. The genome is sequenced, but…
…we still don’t know very much about what it does
3,435 OMIM Mendelian diseases with no known genetic basis
66,396 ClinVar variants with no known pathogenicity
3. Why we need all the organisms
Model data can provide up to
80% phenotypic coverage of the human coding genome
4. The prevailing clinical diagnosis pipelines leverage only a tiny fraction of the available data
PATIENT EXOME / GENOME
PATIENT PHENOTYPES
PATIENT ENVIRONMENT
PUBLIC GENOMIC DATA
PUBLIC HUMAN & MODEL PHENOTYPE, DISEASE DATA
PUBLIC HUMAN & MODEL ENVIRONMENT, DISEASE DATA
POSSIBLE DISEASES
DIAGNOSIS & TREATMENT
Under-utilized data
5. monarchinitiative.org
PROBLEM
Diagnosis / treatment / prognosis on gestalt
(Experience, intuition, and pattern recognition)
Things are not always what they first seem
Errors are common, and up to 35% of errors cause harm
It takes patients about six years from noticing symptoms to being diagnosed, with trips to eight physicians
25% of patients have to wait between 5 and 30 years
HYPOTHESIS
Diagnosis, treatment and prognosis may be informed and
complemented by democratized deep phenotyping that is
easier to compute, collect, and exchange
6.
7. Ulcerated paws
Palmoplantar hyperkeratosis
Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons: https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
9. Challenge: Each database uses their own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, …
10. Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis”
Human phenotype
I have absolutely no idea what that means
11. The Human Phenotype Ontology for deep phenotyping
Hyposmia
Abnormality of globe location
eyeball of camera-type eye
sensory perception of smell
Abnormal eye morphology
Motor neuron atrophy
Deeply set eyes
motor neuron
CL
34571 annotations in 22 species
157534 phenotype annotations
2150 phenotype annotations
12. Decomposition of complex concepts allows interoperability
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
“Palmoplantar hyperkeratosis” (Human phenotype) = increased; Stratum corneum layer of skin; Autopod; keratinization
Species neutral ontologies, homologous concepts: PATO, Uberon, GO
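One way to picture this decomposition is as a small data structure in which a species-specific phenotype is a set of species-neutral components. The sketch below is illustrative: the assignment of each slide label to PATO, Uberon, or GO is an assumption, and the second phenotype is hypothetical and exists only to show the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    label: str     # term label as shown on the slide
    ontology: str  # species-neutral ontology assumed to supply the term


# Assumed assignment of the slide's labels to ontologies.
PALMOPLANTAR_HYPERKERATOSIS = frozenset({
    Component("increased", "PATO"),
    Component("stratum corneum layer of skin", "Uberon"),
    Component("autopod", "Uberon"),
    Component("keratinization", "GO"),
})

# Hypothetical model-organism phenotype, invented for this example.
HYPOTHETICAL_MODEL_PHENOTYPE = frozenset({
    Component("increased", "PATO"),
    Component("autopod", "Uberon"),
    Component("keratinization", "GO"),
})


def shared_components(phenotype_a, phenotype_b):
    """Components two decomposed phenotypes have in common."""
    return phenotype_a & phenotype_b


if __name__ == "__main__":
    shared = shared_components(PALMOPLANTAR_HYPERKERATOSIS, HYPOTHETICAL_MODEL_PHENOTYPE)
    for component in sorted(shared, key=lambda c: c.label):
        print(component.ontology, component.label)
```

Because both phenotypes are expressed with the same neutral building blocks, they can be compared even though their original labels come from different vocabularies.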
16. Putting all that data to use to diagnose a rare platelet syndrome
http://bit.ly/stim1paper
Phenotypic profile
Genes
Heterozygous, missense mutation STIM-1
MGI mouse
N/A
Heterozygous, missense mutation STIM-1
N/A
Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources
http://bit.ly/exomiser
Stim1Sax/Sax
22. Simulated GenomeConnect survey HPO Profiles
Monarch Initiative reference HPO Profiles
Ensure that the survey is maximally diagnostic
Patient
Expert
Phenotypic Profile overlap
Compare phenotypic profiles
For every known disease, fill the survey and ask:
Does the profile match the disease best based on the survey mapping?
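The comparison step can be sketched with a simple set overlap. Note the substitution: the deck describes semantic similarity over the ontology, whereas the snippet below uses plain Jaccard overlap as a stand-in, so it gives no credit for related but non-identical terms. All term and disease names are placeholders.

```python
def jaccard(profile_a, profile_b):
    """Overlap between two phenotypic profiles, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_diseases(query_profile, reference_profiles):
    """Order disease names from best to worst match against the query profile.
    Ties are broken alphabetically so the ranking is deterministic."""
    scored = [
        (jaccard(query_profile, terms), name)
        for name, terms in reference_profiles.items()
    ]
    return [name for score, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


if __name__ == "__main__":
    # Placeholder reference profiles and a placeholder survey-derived profile.
    reference = {
        "disease_1": {"term_a", "term_b", "term_c"},
        "disease_2": {"term_c", "term_d"},
    }
    survey_profile = {"term_a", "term_b"}
    print(rank_diseases(survey_profile, reference))
```

The question on the slide then becomes whether the disease used to fill the survey comes out first in this ranking.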
24. Assess patient-derived profile generation
Patient
Expert
Phenotypic Profile overlap
Compare phenotypic profiles
For every diagnosed patient:
Can the patients utilize the survey and retrieve the correct disease?
26. Determine the contribution and sufficiency of patient self-phenotyping
UDN patient generated GenomeConnect survey HPO profile
UDN patient Clinical evaluation HPO profile
Patient
Expert
Phenotypic Profile overlap
Compare phenotypic profiles
27. Human Phenotype Ontology, now with 6,200 plain language synonyms for patients, families, and non-experts
www.human-phenotype-ontology.org
@HP_ontology
28. Almost half of the 14k synonyms are plain language
All synonyms
Plain language synonyms
30. Genes + Environment = Phenotypes
Biology central dogma
Standards for encoding and exchanging data must be up to these challenges
@ontowonka
31. The relationships too must be captured
It is not just the bits…
G-P or D (disease)
causes
contributes to
is risk factor for
protects against
correlates with
is marker for
modulates
involved in
increases susceptibility to
G-G (kind of)
regulates
negatively regulates (inhibits)
positively regulates (activates)
directly regulates
interacts with
co-localizes with
co-expressed with
P/D - P/D
part of
results in
co-occurs with
correlates with
hallmark of (P->D)
E-P
contributes to (E->P)
influences (E->P)
exacerbates (E->P)
manifest in (P->E)
G-E (kind of)
expressed in
expressed during
contains
inactivated by
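Capturing relationships rather than "just the bits" amounts to storing typed edges between entities. The sketch below is a minimal illustration using a subset of the relation labels listed above; the `Association` tuple, the entity-type names, and the validation rule are assumptions made for the example, and the entity names are placeholders.

```python
from collections import namedtuple

Association = namedtuple("Association", ["subject", "relation", "object"])

# Subset of the relation labels from the slide, keyed by the pair of
# entity types they connect. The type names are illustrative.
ALLOWED_RELATIONS = {
    ("gene", "phenotype"): {"causes", "contributes to", "is risk factor for", "protects against"},
    ("environment", "phenotype"): {"contributes to", "influences", "exacerbates"},
    ("gene", "gene"): {"regulates", "interacts with", "co-expressed with"},
}


def make_association(subject, subject_type, relation, obj, object_type):
    """Build a typed edge, rejecting relations not listed for that pair of types."""
    allowed = ALLOWED_RELATIONS.get((subject_type, object_type), set())
    if relation not in allowed:
        raise ValueError(
            f"{relation!r} is not a listed relation for {subject_type} -> {object_type}"
        )
    return Association(subject, relation, obj)


if __name__ == "__main__":
    # Placeholder entities; no real biological claim is intended.
    edge = make_association("gene_X", "gene", "causes", "phenotype_Y", "phenotype")
    print(edge)
```

Recording the relation explicitly lets "causes" and "protects against" be told apart later, which a bare gene-phenotype pairing cannot do.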
32. Genes + Environment = Phenotypes
Computable encodings are essential
Base pairs
Variant notation (e.g. HGVS)
Human Phenotype Ontology
Mammalian Phenotype Ontology
Medical procedure coding
Environment Ontology
@ontowonka
33. Genes Environment Phenotypes
VCF, GFF, BED, PXF
Standard exchange formats exist for genes … but for phenotypes? Environment?
@ontowonka
34. If it is alive, it can be PhenoPackaged
Some biodiversity images adapted from http://i.vimeocdn.com/video/417366050_1280x720.jpg
Model Organisms
Biodiversity
Crops
Domestic Animals
Disease vectors
Epidemiological Monitoring
Drug discovery & Development
Rare Disease Diagnosis
Personalized Medicine
Environmental Monitoring
Patients & Cohorts
Genetic Engineering
Mechanistic Discovery
35. Phenopackets for organisms
Biography: This is “Maru”, a 4-year-old, male cat of the Scottish Fold breed
Phenotypes & qualifiers: abnormal sheltering behavior [MP:0014039] (onset at birth)
Measurements: Weighs 6kg
Source: youtube.com/user/mugumogu
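To show how the four parts of this slide could travel together as one exchangeable record, here is a minimal sketch that serializes them to JSON. The field names are invented for illustration and are not the actual Phenopackets schema; only the values come from the slide.

```python
import json

# Illustrative structure only: keys are assumptions, values are from the slide.
maru_packet = {
    "biography": {
        "name": "Maru",
        "species": "cat",
        "breed": "Scottish Fold",
        "sex": "male",
        "age_years": 4,
    },
    "phenotypes": [
        {
            "term_id": "MP:0014039",
            "label": "abnormal sheltering behavior",
            "onset": "at birth",
        }
    ],
    "measurements": [{"type": "weight", "value": 6, "unit": "kg"}],
    "source": "youtube.com/user/mugumogu",
}


def to_json(packet):
    """Serialize a packet to stable, human-readable JSON."""
    return json.dumps(packet, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(maru_packet))
```

The phenotype is carried as an ontology identifier plus qualifiers rather than free text, which is what makes the record computable by any consumer that knows the ontology.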
38. Phenopackets for journals
Each article can be associated with a phenopacket
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside paywall (e.g. Figshare, Zenodo, etc.)
43. Thank you!
Deep Phenotype and have a magical day
Community engagement survey
bit.ly/monarchcommunity
44. Acknowledgements
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Charité
Peter Robinson
Sebastian Kohler
RTI
Jim Balhoff
Cyverse
Ramona Walls
U of Pittsburgh
Harry Hochheiser
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Queen Mary College
London
Damian Smedley
Jules Jacobson
Garvan
Tudor Groza
Alfred Wegener
Pier Buttigieg
GSoC
Satwik Bhattamishra
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C,
HHSN268201400093P, Phenotype Ontology Research Coordination Network (NSF-DEB-0956049)
With special thanks to Julie McMurry for excellent graphic design
Editor's Notes
There is a lot we don’t know about the genome
As of April 2016
OMIM updated number: 3435
ClinVar updated number: 66396
Data from mouse, rat, zebrafish, worm, fruitfly
Human: OMIM, ClinVar
Orthology via PANTHER v9
We’ve been thinking about the need for combined mechanistic/molecular classification with phenomenological nosologies for some time.
This figure is adapted from National Research Council (U.S.). Committee on A Framework for Developing a New Taxonomy of Disease., Toward precision medicine : building a knowledge network for biomedical research and a new taxonomy of disease. 2011, Washington, D.C.: National Academies Press. xiii, 128 p.
http://www.nap.edu/catalog/13284/toward-precision-medicine-building-a-knowledge-network-for-biomedical-research
Figure 3.1 (page 52): Building a biomedical Knowledge Network for basic discovery and Medicine.
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
Represent organism as a biological subject
Represent diseases/genotypes as collections of nodes in the graph
3. Interoperable with other bioinformatics resources and leverage modern semantic standards
We make things digestible. Complex concepts into simpler parts. We use ontologies that are comparative by design.
If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
Highlighting how we get different phenotypic information from different sources, species
Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology
https://code.google.com/p/phenotype-ontologies/
The distribution of phenotype information per model genotype is different compared to human disease annotations.
For mouse, there’s a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare;
For fish, there’s increased nervous, skeletal, head and neck, and cardiovascular, and connective tissue.
(Note that these do not include “normal” phenotypes for either diseases or genotypes.)
What does it mean to replicate a phenotypic profile in a model organism? For many patients or diseases, we may need different models to fully recapitulate the disease. Further, some phenotypes are common in a given species and if present in the patient, would be a less significant result.
This was the novel case we solved. The UDP patient had a number of signs and symptoms including various platelet abnormalities. The same heterozygous, missense mutation was seen in 2 patients and ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted maximally pathogenic. Finally a mouse curated by MGI involving a heterozygous, missense point mutation introduced by chemical mutagenesis exhibited strikingly similar platelet abnormalities.
Experiments to determine the sufficiency of any given phenotype profile.
We can test the role of a category by creating a derived disease that removes all the phenotypes for that category as our “case”. Then, as a control, we remove an equal amount of “information” from other categories.
In the case of Schwartz-Jampel syndrome, removing only skeletal phenotypes (which comprise 40% of the phenotype profile) significantly reduces its similarity, dropping it to only 86% similar, whereas removing the same amount of information from the controls gives an average of 91% similarity. In this case, there were 73 controls to compare to. Other categories that are significantly affected (where the case and control similarity scores differ significantly) are genitourinary system, growth, and musculature, but not always in the same direction.
We did this for all diseases.
The classic G+E=P. But the = has a lot that can be applied to aid the linking.
Mosquito image from https://pixabay.com/en/brazil-health-mosquito-news-virus-1300017/ no attribution required
Reeldx patientslikeme post phenopacket on facebook
Same format
There are a lot of people who have contributed to this work over many years.