Uberon is a multi-species anatomy ontology covering animal anatomy. It contains over 8,000 classes describing anatomical structures across metazoans in a species-neutral way. Uberon bridges species-specific anatomy ontologies and allows cross-species analysis of high-throughput genomics and phenomics data. It is extensively connected to other biomedical ontologies and has been applied in projects involving phenomics, transcriptomics, systematics and finding disease models.
This document summarizes various sensory receptors in the body. It describes receptors based on their origin of stimulus (exteroceptors, interoceptors, proprioceptors), nature of stimuli (mechanoreceptors, chemoreceptors, thermoreceptors etc.), and anatomical position (non-encapsulated like free nerve endings vs encapsulated like Pacinian corpuscles). It then provides more detailed descriptions of specific receptors like Pacinian corpuscles, Meissner's corpuscles, muscle spindles, Golgi tendon organs, taste buds, retina, cochlear duct and organ of Corti.
The respiratory system brings air into the lungs, exchanges oxygen and carbon dioxide between the air sacs and bloodstream, and exhales air. It includes the nasal cavity, pharynx, larynx, trachea, bronchial tubes, and lungs. The nasal cavity cleans and conditions air before it reaches the pharynx and larynx, which direct air to the trachea and lungs for gas exchange to occur in the alveoli.
The document discusses the respiratory system. It describes the major parts of the respiratory system, including the nose, pharynx, larynx, trachea, bronchi, lungs, bronchioles, alveoli, and diaphragm. It then discusses common respiratory diseases such as colds, pharyngitis, laryngitis, asthma, bronchitis, pneumonia, emphysema, and lung cancer. Finally, it mentions treatment options for respiratory diseases, including specialist pulmonology care, antihistamines, halotherapy, and lung transplants.
Science (1. Respiratory System and Circulatory System Working With The Other ..., by Eemlliuq Agalalan
The respiratory system involves air entering the nose and traveling down the trachea, which divides into two branches called bronchi. The bronchi subdivide many times inside the lungs, forming hairlike tubes called bronchioles. At the ends of the bronchioles are tiny, bubble-like air sacs called alveoli. The diaphragm contracts during inhalation and relaxes during exhalation, moving air into and out of the lungs. In the circulatory system, blood travels from the heart through the bloodstream to the body's cells, which use oxygen and release carbon dioxide as waste; the blood then returns to the heart and is pumped to the lungs, where the carbon dioxide is released. The heart is a hollow muscular organ divided into four chambers...
Introduction & investigations to respiratory diseases, by Firoz Hakkim
The document provides an overview of the structure and function of the human respiratory system. The system is divided into the upper respiratory tract, which includes the nose, sinuses, and pharynx, and the lower respiratory tract, which includes the lungs. The conducting portion carries air to the lungs, while the respiratory portion carries out gas exchange. The document also summarizes respiratory functions and methods for investigating the respiratory system, such as imaging techniques, endoscopic examinations, and pulmonary function tests.
The document discusses common ailments of the respiratory system including:
- The common cold and influenza, characterized by sneezing, runny nose, and sore throat, sometimes with fever and body aches.
- Asthma, which causes difficulty breathing in people sensitive to airborne irritants and allergens such as pollen and dust. Asthma patients sometimes cough and wheeze when exhaling.
- Pneumonia, an infection of the lungs that can be caused by bacteria like Streptococcus pneumoniae, viruses, or atypical bacteria. It develops in the small airways and air sacs of the lungs.
This document discusses Uberon, a multi-species anatomy ontology for phenomics and evolutionary developmental biology analyses. It gives an overview of how Uberon integrates species-specific anatomy ontologies, interoperates with other ontologies, uses reasoning and validation, handles taxonomic variation, and supports phenotype analysis. It also discusses how Uberon uses logical definitions and general axioms to manage anatomical variation between species, and how its development involves iterative curation, alignment with other ontologies, and the use of reasoners to detect errors.
This document provides an overview of insect morphology, anatomy, and classification. It describes the external and internal body structures of insects, including the head, thorax, abdomen, and the digestive and reproductive systems. It also outlines the key characteristics of the phylum Arthropoda and the class Insecta, and gives examples of the different insect orders. References for further reading on entomology are listed at the end.
Anatomy and Physiology of Respiratory System-1.pptx, by MuhammadAsif45095
The document summarizes the anatomy and physiology of the respiratory system in birds and compares it with that of mammals. Key differences include the position of the nostrils, which in birds are located on the upper beak. The nasal cavity is also a single chamber in birds, whereas in mammals it is divided by a septum. In addition, birds vocalize with a syrinx located at the tracheal bifurcation, whereas mammals use the larynx, which sits near the pharynx. The bronchial system in birds has three orders of branching before reaching the gas exchange units, and birds possess eight air sacs that aid ventilation of the lungs.
The document discusses Jaak Panksepp's model of the brain's core emotional systems, set within the "triune brain" framework (a concept originally proposed by Paul MacLean). It describes seven core emotional systems: SEEKING, RAGE, FEAR, PANIC/DISTRESS, CARE/NURTURE, PLAY, and LUST, each associated with different affective feelings and motivations. It also outlines the framework's three-level model of motor control in the brain: the neomammalian cortex, the limbic system, and the reptilian reflex system. Finally, it provides diagrams of the neurological pathways and structures involved in sensory processing and motor control.
The document summarizes the 12 cranial nerves, including their origins, paths through the skull, branches, and functions. Key points include:
- Cranial nerves I and II arise from the forebrain, while III-XII arise from the brainstem
- Depending on the nerve, they carry motor fibers, sensory fibers, or both, innervating structures of the head and neck such as muscles, sense organs, and glands
- The trigeminal nerve is the largest cranial nerve and has three major divisions that innervate the face
- The optic nerve carries visual information from the eyes to the brain
- Cranial nerves exit the skull through various foramina to innervate their target structures
The insect head is composed of sclerites that form the cranium. There are three main types of insect heads - hypognathous, prognathous, and opisthognathous - which differ in the orientation of the mouthparts. Antennae are jointed sensory appendages found in pairs on the insect head. They can have various shapes including filiform, capitate, clavate, geniculate, moniliform, pectinate, serrate, aristate, plumose, lamellate, and flabellate. The basic segments of antennae are the scape, pedicel, and multiple flagellomeres.
The insect head is composed of sclerites including the vertex, frons, clypeus, gena, and occiput. It is divided by sutures and contains openings like the occipital foramen. The head supports appendages like the antennae and mouthparts. There are three basic head types - hypognathous, prognathous, and opisthognathous - which differ in the orientation of the mouthparts. Antennae are jointed sensory organs that can be filiform or modified into various shapes like capitate, clavate, geniculate, and pectinate. They are composed of segments including the scape, pedicel, and flagello
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot..., by mhaendel
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical histories; they are not only among the most engaged patients, but can also provide data themselves for use in clinical evaluation. We therefore created a lay-person version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
The Software and Data Licensing Solution: Not Your Dad’s UBMTA, by mhaendel
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder, by mhaendel
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual alignment and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, priors, rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources affected by concept drift and definitional differences, such as those covering diseases. The syntactic problem is that the same identifier appears in many variant forms, which makes data joins difficult. We present a framework to reconcile identifiers and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for an infrastructure that handles identifiers consistently. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
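As a minimal sketch of the two problems described above, the Python below normalizes variant spellings of an identifier to a single CURIE using a prefix map written in the shape of a JSON-LD `@context`, then groups cross-referenced identifiers into cliques with a union-find structure and flags any clique that contains two identifiers from the same source for manual review. The prefixes, base IRIs, and cross-references are hypothetical, and the clique step is only the transitive closure of Xrefs; it does not reproduce the priors, rules, and semantic relations the abstract mentions.

```python
"""Sketch: normalize identifier variants to CURIEs, then group
cross-referenced identifiers into equivalence cliques.

All prefixes, base IRIs, and cross-references here are hypothetical.
"""

# Hypothetical prefix map, written in the shape of a JSON-LD "@context".
CONTEXT = {
    "@context": {
        "SRCA": "http://example.org/srca/",
        "SRCB": "http://example.org/obo/SRCB_",
        "SRCC": "https://example.org/srcc/entry/",
    }
}
PREFIX_MAP = CONTEXT["@context"]
CANONICAL = {prefix.lower(): prefix for prefix in PREFIX_MAP}


def expand(curie: str) -> str:
    """Expand a CURIE such as 'SRCB:10' into a full IRI."""
    prefix, local_id = curie.split(":", 1)
    return PREFIX_MAP[prefix] + local_id


def contract(iri: str) -> str:
    """Contract a full IRI to a CURIE, preferring the longest matching base."""
    matches = [(p, base) for p, base in PREFIX_MAP.items() if iri.startswith(base)]
    if not matches:
        raise ValueError(f"no known prefix for {iri}")
    prefix, base = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(base):]}"


def normalize(raw: str) -> str:
    """Map variants like 'srcb:10', 'SRCB_10', or a full IRI to 'SRCB:10'."""
    raw = raw.strip()
    if raw.startswith(("http://", "https://")):
        return contract(raw)
    sep = ":" if ":" in raw else "_"  # fall back to OBO-style 'PREFIX_ID'
    prefix, local_id = raw.split(sep, 1)
    return f"{CANONICAL[prefix.lower()]}:{local_id}"


def equivalence_cliques(xref_pairs):
    """Group identifiers into connected components of the xref graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in xref_pairs:
        root_a, root_b = find(normalize(a)), find(normalize(b))
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


def flag_for_review(clique):
    """A clique holding two IDs from the same source suggests a conceptual mismatch."""
    prefixes = [curie.split(":", 1)[0] for curie in clique]
    return len(prefixes) != len(set(prefixes))


# Hypothetical cross-references, written with inconsistent syntax.
PAIRS = [
    ("SRCA:1", "srcb:10"),
    ("SRCB_10", "https://example.org/srcc/entry/100"),
    ("SRCA:2", "SRCC:200"),
    ("SRCC:200", "SRCA:3"),
]

if __name__ == "__main__":
    for clique in equivalence_cliques(PAIRS):
        print(clique, "review" if flag_for_review(clique) else "ok")
```

Run as a script, this prints two cliques and flags the second, where `SRCA:2` and `SRCA:3` end up merged through `SRCC:200`. That is the kind of discrepancy that would be routed to a curator rather than accepted automatically.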
Building (and traveling) the data-brick road: A report from the front lines ..., by mhaendel
The NIH Data Commons must treat the data it will contain much like the mortar and stones of a road. To help our fellow scientists travel that road, we must engineer it for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and transformed into common data models. This requires versioned access to the data; determining equivalences among the identifiers in the data, or minting new identifiers where needed; and transforming the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, whereas in another it may relate a set of alleles to a set of phenotypes, because each source models the data differently). Inclusion of data in the Commons must satisfy all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to run the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality over time. Records that disappear from a source need to be archived. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data require the data to be self-reporting: loading the ontologies used in the data for indexing and providing faceted queries over these and other attributes, sophisticated text-mining tools, relevance ranking, and equivalency and similarity determination across different providers. Once users find the data, they need vehicles to drive on the road. These are their workspaces, where they design and implement the operations they need in order to get where they want to go.
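The change-tracking and archiving steps of the regular ETL cycle described above can be sketched as a comparison of consecutive snapshots. This is a minimal Python illustration under assumed conditions: each release is a dict mapping a record ID to its content, and the `Archive` class and `load_release` function are hypothetical names, not part of any Data Commons tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical store for records that disappear between releases."""
    retired: dict = field(default_factory=dict)  # record id -> (release, last content)


def diff_snapshots(old: dict, new: dict):
    """Compare two {record_id: content} snapshots from consecutive ETL runs."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


def load_release(old: dict, new: dict, release: str, archive: Archive) -> dict:
    """Report what changed in this release and archive records that vanished."""
    added, removed, changed = diff_snapshots(old, new)
    for record_id in removed:
        # Keep the last known content so downstream users can still resolve the ID.
        archive.retired[record_id] = (release, old[record_id])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical genotype-to-phenotype associations keyed by record ID.
    previous = {
        "assoc:1": {"variant": "v1", "phenotype": "p1"},
        "assoc:2": {"variant": "v2", "phenotype": "p2"},
        "assoc:3": {"variant": "v3", "phenotype": "p3"},
    }
    current = {
        "assoc:1": {"variant": "v1", "phenotype": "p1"},
        "assoc:2": {"variant": "v2", "phenotype": "p2b"},
        "assoc:4": {"variant": "v4", "phenotype": "p4"},
    }
    archive = Archive()
    print(load_release(previous, current, "release-2", archive))
    print(archive.retired)
```

Storing the retiring release alongside the last content is one simple design choice; a production pipeline would more likely persist this to versioned storage rather than an in-memory dict.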
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
The Monarch Initiative aims to improve disease diagnostics and analysis by using deep phenotyping data. It has developed ontologies such as the Human Phenotype Ontology, with over 13,000 phenotype terms, to help machines understand human phenotypes. It uses "fuzzy" phenotypic profile matching across species to match patient data to known genetic disorders, as demonstrated by a solved case in which a patient's profile was linked to a STIM1 variant. The Initiative is also developing lay-friendly phenotyping tools and connecting data sources through the Matchmaker Exchange to aid diagnosis and research.
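To make the idea of "fuzzy" profile matching concrete, here is a minimal Python sketch that scores a patient profile against disease profiles by the Jaccard overlap of their ontology ancestor sets, so that related but non-identical terms (here, "muscle weakness" and "myopathy") still contribute to a match. The toy ontology, term labels, and disease names are hypothetical, and this simple measure is only a stand-in: it does not reproduce the Monarch Initiative's matching algorithms or its cross-species mappings.

```python
# Hypothetical toy phenotype ontology: term -> list of parent terms.
ONTOLOGY = {
    "phenotypic abnormality": [],
    "abnormality of muscle": ["phenotypic abnormality"],
    "muscle weakness": ["abnormality of muscle"],
    "myopathy": ["abnormality of muscle"],
    "abnormality of blood": ["phenotypic abnormality"],
    "low platelet count": ["abnormality of blood"],
    "abnormality of the eye": ["phenotypic abnormality"],
    "small pupils": ["abnormality of the eye"],
}


def ancestors(term: str) -> set:
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY[current])
    return seen


def profile_closure(profile) -> set:
    """Union of the ancestor sets of every term in a phenotype profile."""
    closure = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def similarity(profile_a, profile_b) -> float:
    """Jaccard overlap of two profiles after propagating terms up the ontology."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    return len(a & b) / len(a | b)


def rank(patient, diseases: dict):
    """Rank candidate diseases by similarity to the patient profile, best first."""
    scores = {name: similarity(patient, terms) for name, terms in diseases.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    patient = ["muscle weakness", "low platelet count"]
    diseases = {
        "disease X": ["myopathy", "low platelet count", "small pupils"],
        "disease Y": ["small pupils"],
    }
    for name, score in rank(patient, diseases):
        print(f"{name}: {score:.3f}")
```

Because both profiles inherit "abnormality of muscle", disease X scores 0.5 even though the patient's "muscle weakness" does not appear in its profile; an exact-term comparison would miss that overlap. Practical measures usually also weight shared terms by how informative they are, so that sharing only a very general term counts for little.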
Reusable data for biomedicine: A data licensing odyssey, by mhaendel
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions prevent most publicly funded, seemingly “open” biomedical data from being used to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those who aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, which is complicated by the fact that each side of the agreement may involve multiple legal entities and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, for which we have created a five-part rubric to evaluate biomedical data sources and their licensing information and to determine the degree to which they permit unnegotiated and unrestricted reuse and redistribution. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and found that approximately half of the resources rank poorly, receiving 2.5 stars or fewer. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project also aims to promote standardized, reuse-friendly licensing practices among data providers.
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery, by mhaendel
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
How open is open? An evaluation rubric for public knowledgebases, by mhaendel
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute over, thereby maximizing impact and minimizing waste. The ever-increasing number of databases forces us to ask what our priorities should be: maintaining existing databases, developing new ones, or senescing or subsuming those that have completed their mission. Open biomedical data repositories should therefore be carefully evaluated for quality, accessibility, and value over time and across the translational divide.
Traditional citation counts and publication impact factors are known to be inadequate measures of a resource's success, value, or usefulness. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one cites or even mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many databases remain unpublished or uncited; even novel methods, applications, and workflows derived from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mentioning the name and access date sufficient?).
We have developed a draft rubric for evaluating open science databases according to the commonly cited FAIR principles (Findable, Accessible, Interoperable, and Reusable), plus three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet they are critical to reuse of the knowledge contained within any given database. Notably, the FAIR principles apply not only to a resource as a whole but also to its key components; this “fractal FAIRness” means that even the licenses, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input on how to further advance this rubric as a Biocuration community resource.
Deep phenotyping to aid identification of coding & non-coding rare disease v..., by mhaendel
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because roughly 100 to 1,000 candidate variants remain even after common or non-pathogenic variants are removed. We present Genomiser, which prioritizes coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology together with a curated database of non-coding Mendelian variants. Genomiser identifies the causal regulatory variant as the top candidate in 77% of simulated whole genomes.
This document discusses making scientific data FAIR, open, and reusable. It defines the FAIR guiding principles of findable, accessible, interoperable, and reusable data and describes what each principle entails. It then expands on these principles by introducing FAIR-TLC, which adds the dimensions of traceable, licensed, and connected. The document argues that adopting FAIR-TLC practices and developing tools to support them can improve the sharing and reuse of scientific data. It also suggests ways to incentivize open science through funding and publication requirements.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis..., by mhaendel
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Similar to A merger of multi-species anatomy ontologies
This document discusses Uberon, a multi-species anatomy ontology for phenomics and evolutionary developmental biology analyses. It provides an overview of Uberon, including that it integrates species anatomy ontologies, interoperates with other ontologies, uses reasoning and validation, handles taxonomic variation, and has applications in phenotype analysis. It also discusses how Uberon uses logical definitions and general axioms to manage anatomical variation between species and how its development involves iterative curation, alignment with other ontologies, and use of reasoners to detect errors.
This document provides an overview of insect morphology, anatomy, and classification. It describes the external and internal body structures of insects, including the head, thorax, abdomen, digestive and reproductive systems. It also outlines the key characteristics of the phylum Arthropoda and class Insecta. Examples are provided of the different insect orders. References for further reading on entomology are listed at the end.
Anatomy and Physiology of Respiratory System-1.pptxMuhammadAsif45095
The document summarizes the anatomy and physiology of the respiratory system in birds and compares it to mammals. Key differences include birds having nostrils located on the upper beak rather than below the nose. The nasal cavity is also a single chamber in birds versus divided by a septum in mammals. Additionally, birds have a syrinx vocal organ at the trachea bifurcation instead of larynx near the pharynx as in mammals. The bronchial system in birds has three orders of branching before reaching gas exchange units, and birds possess eight air sacs to aid ventilation of the lungs.
The document discusses Jaak Panksepp's model of the brain's emotional systems, known as the "triune brain". It describes seven core emotional systems - SEEKING, RAGE, FEAR, PANIC/DISTRESS, CARE/NURTURE, PLAY, and LUST - each of which is associated with different affective feelings and motivations. It also outlines Panksepp's three-level model of motor control in the brain - the neomammalian cortex, limbic system, and reptilian reflex system. Finally, it provides diagrams of the neurological pathways and structures involved in sensory processing and motor control.
The document summarizes the 12 cranial nerves, including their origins, paths through the skull, branches, and functions. Key points include:
- Cranial nerves I-VI originate in the brain, while VII-XII originate in the brainstem
- They have both motor and sensory components that innervate structures of the head and neck like muscles, sense organs, and glands
- The trigeminal nerve is the largest cranial nerve and has three major divisions that innervate the face
- The optic nerve carries visual information from the eyes to the brain
- Cranial nerves exit the skull through various foramina to innervate their target structures
The insect head is composed of sclerites that form the cranium. There are three main types of insect heads - hypognathous, prognathous, and opisthognathous - which differ in the orientation of the mouthparts. Antennae are jointed sensory appendages found in pairs on the insect head. They can have various shapes including filiform, capitate, clavate, geniculate, moniliform, pectinate, serrate, aristate, plumose, lamellate, and flabellate. The basic segments of antennae are the scape, pedicel, and multiple flagellomeres.
The insect head is composed of sclerites including the vertex, frons, clypeus, gena, and occiput. It is divided by sutures and contains openings like the occipital foramen. The head supports appendages like the antennae and mouthparts. There are three basic head types - hypognathous, prognathous, and opisthognathous - which differ in the orientation of the mouthparts. Antennae are jointed sensory organs that can be filiform or modified into various shapes like capitate, clavate, geniculate, and pectinate. They are composed of segments including the scape, pedicel, and flagello
Similar to A merger of multi-species anatomy ontologies (8)
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...mhaendel
Presented at AMIA TBI CRI 2018.
Rare disease patients are expert in their medical history and these patients not only are some of the most engaged, but also they can themselves provision data for use in clinical evaluation. We therefore created a lay-person version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO, and debut a new software tool for patient-led deep phenotyping.
The Software and Data Licensing Solution: Not Your Dad’s UBMTA mhaendel
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals
recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholdermhaendel
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency - conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, priors rules, existing semantic relations, and synonyms to create equivalency cliques than can highlight the discrepancies in conceptual definitions for manual review. This is especially useful for data sources annotated with concept drift and differences, such as diseases. Syntactic issues are that there are so many variations of the same identifier, making data joins difficult. We present a framework to reconcile and provide authoritative and integration-ready prefixed identifiers (CURIES), to capture and consolidate prefixes and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self describing "beacons" with respect to their identifiers.
Building (and traveling) the data-brick road: A report from the front lines ...mhaendel
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientists travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data or minting of new ones for the data and/or within it, manipulating the data according to common data models (e.g. a genotype-to-pehnotype association in one source may relate a variant to a disease, where in another it may be a set of alleles associated with a set of phenotypes, each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Enhance-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting - loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go. 
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
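One ETL concern raised above is keeping the data current while archiving records that disappear. A small release-diffing step illustrates it. The sketch assumes each source release is a versioned mapping from record identifier to record payload. `SourceRelease`, `Archive` and `load_release` are hypothetical names. A real pipeline would also handle licensing checks, validation and transformation into common data models.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRelease:
    version: str
    records: dict  # record identifier -> record payload


@dataclass
class Archive:
    # Records that vanished from a source: identifier -> (last version seen, payload).
    retired: dict = field(default_factory=dict)


def diff_releases(old: SourceRelease, new: SourceRelease):
    """Classify record identifiers as added, removed or changed between two releases."""
    old_ids, new_ids = set(old.records), set(new.records)
    added = new_ids - old_ids
    removed = old_ids - new_ids
    changed = {rid for rid in old_ids & new_ids if old.records[rid] != new.records[rid]}
    return added, removed, changed


def load_release(old: SourceRelease, new: SourceRelease, archive: Archive) -> dict:
    """Archive records dropped by the new release and report what changed."""
    added, removed, changed = diff_releases(old, new)
    for rid in removed:
        archive.retired[rid] = (old.version, old.records[rid])
    return {"added": sorted(added), "removed": sorted(removed), "changed": sorted(changed)}
```

Running this on every release cycle gives both the change report used for quality assurance and a persistent record of anything a provider silently dropped.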
The Monarch Initiative aims to improve disease diagnostics and analysis by utilizing deep phenotyping data. It has developed ontologies like the Human Phenotype Ontology, with over 13,000 phenotype terms, to help machines understand human phenotypes. It uses "fuzzy" phenotypic profile matching across species to match patient data to known genetic disorders. One solved case illustrates this: a patient's profile was linked to a STIM1 variant. The Initiative is working to develop lay-friendly phenotyping tools and to connect data sources through the Matchmaker Exchange to aid in diagnosis and research.
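Fuzzy matching works because phenotype terms sit in a hierarchy. Two profiles can share no exact terms and still share ancestors. The sketch below shows that idea under simple assumptions: a toy is_a hierarchy with made-up term names, and plain Jaccard similarity over each profile's ancestor closure. Monarch's tools use more sophisticated similarity measures, which are not reproduced here.

```python
# Toy is_a hierarchy: term -> parent terms. Names are made up for illustration.
PARENTS = {
    "abnormal muscle": [],
    "muscle weakness": ["abnormal muscle"],
    "proximal muscle weakness": ["muscle weakness"],
    "myopathy": ["abnormal muscle"],
    "abnormal blood": [],
    "thrombocytopenia": ["abnormal blood"],
}


def ancestors(term):
    """Return the term plus all of its is_a ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Union of the ancestor sets of every term in a phenotype profile."""
    return set().union(*(ancestors(term) for term in profile))


def similarity(profile_a, profile_b):
    """Jaccard similarity of the ancestor closures of two phenotype profiles."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

For example, a patient annotated only with "proximal muscle weakness" scores 0.5 against a disease profile of "myopathy" and "muscle weakness", despite sharing no exact term. The same patient scores 0.0 against a profile containing only "thrombocytopenia".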
Reusable data for biomedicine: A data licensing odyssey (mhaendel)
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions prevent most publicly funded, seemingly “open” biomedical data from reaching its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those who aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions. This is complicated by the fact that each side of the agreement may involve multiple legal entities, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project. We have created a five-part rubric to evaluate biomedical data sources and their licensing information, to determine the degree to which they permit unnegotiated and unrestricted reuse and redistribution. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardized, reuse-friendly licensing practices among data providers.
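The rubric's arithmetic can be sketched as follows. The sketch assumes each of the five parts contributes 0, 0.5 or 1 star, so that sources land on the zero-to-five scale in half-star steps. The criterion names are hypothetical placeholders, not the project's actual rubric wording, and any example sources are made up.

```python
# Hypothetical names for the five rubric parts; the real rubric's wording may differ.
CRITERIA = (
    "license_findable",
    "license_standard",
    "scope_clear",
    "no_negotiation",
    "no_use_restrictions",
)
ALLOWED = {0.0, 0.5, 1.0}  # assumed per-part scores, giving half-star steps


def stars(scores: dict) -> float:
    """Sum the five part scores into a 0-5 star rating."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    bad = {name: scores[name] for name in CRITERIA if scores[name] not in ALLOWED}
    if bad:
        raise ValueError(f"scores must be 0, 0.5 or 1: {bad}")
    return sum(scores[name] for name in CRITERIA)


def share_rated_poorly(sources: dict, threshold: float = 2.5) -> float:
    """Fraction of sources whose rating is at or below the threshold."""
    ratings = [stars(scores) for scores in sources.values()]
    return sum(rating <= threshold for rating in ratings) / len(ratings)
```

Keeping the per-part scores explicit, rather than storing only the total, makes it easy to report which licensing issue pulled a given source below the threshold.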
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
How open is open? An evaluation rubric for public knowledgebases (mhaendel)
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases makes us necessarily question our priorities: maintaining existing databases, developing new ones, or senescing/subsuming those that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation counts and publication impact factors are known to be inadequate measures of a resource's success, value, or usefulness. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft rubric for evaluating open science databases according to the commonly cited FAIR principles (Findable, Accessible, Interoperable, and Reusable), with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet they are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to its key components. This “fractal FAIRness” means that even the license, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input on how to further advance this rubric as a Biocuration community resource.
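"Fractal FAIRness" is naturally recursive: the same checklist applies to a resource and to each of its components. A minimal sketch of that walk follows. It assumes each component simply records which of the seven FAIR-TLC principles it has been judged to satisfy. `Component` and `gaps` are hypothetical names, and the judging itself is outside the sketch.

```python
from dataclasses import dataclass, field

PRINCIPLES = ("findable", "accessible", "interoperable", "reusable",
              "traceable", "licensed", "connected")


@dataclass
class Component:
    name: str
    satisfied: set  # principles this component was judged to meet
    parts: list = field(default_factory=list)  # sub-components (license, API, identifiers, ...)


def gaps(component, path=()):
    """Yield (path, missing principles) for the resource and, recursively, its parts."""
    here = path + (component.name,)
    missing = [p for p in PRINCIPLES if p not in component.satisfied]
    if missing:
        yield "/".join(here), missing
    for part in component.parts:
        yield from gaps(part, here)
```

A knowledgebase can thus look fully FAIR-TLC at the top level while its license document, for example, is only findable and accessible; the walk reports that gap by path.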
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidate variants are still left after removing common or non-pathogenic variants. We present Genomiser, which prioritizes coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
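Genomiser's actual scoring model is not described in this abstract and is not reproduced here. As a hypothetical illustration of why phenotype data helps with prioritization, the sketch ranks candidates by a weighted average of two assumed inputs: a variant-level pathogenicity score, and a phenotype-match score such as the similarity between the patient's HPO profile and the gene's known phenotypes. `Candidate`, `rank` and the 0.5 default weight are all assumptions.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    variant: str
    region: str             # "coding" or "non-coding"
    variant_score: float    # assumed pathogenicity prediction in [0, 1]
    phenotype_score: float  # assumed patient-to-gene phenotype similarity in [0, 1]


def combined_score(c: Candidate, weight: float = 0.5) -> float:
    """Hypothetical weighted average; not Genomiser's actual scoring model."""
    return weight * c.variant_score + (1 - weight) * c.phenotype_score


def rank(candidates, weight: float = 0.5):
    """Order candidates from most to least promising under the combined score."""
    return sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)
```

With candidates v1 (coding, 0.9, 0.1), v2 (non-coding, 0.6, 0.9) and v3 (coding, 0.3, 0.3), the default weighting ranks v2 first. Setting `weight=1.0`, so that only variant evidence counts, ranks v1 first instead. This is the kind of reordering that phenotype evidence makes possible.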
This document discusses making scientific data FAIR, open, and reusable. It defines the FAIR guiding principles of findable, accessible, interoperable and reusable data and describes what each principle entails. It then expands on these principles by introducing FAIR-TLC, which adds the dimensions of traceable, licensed and connected. The document argues that adopting FAIR-TLC practices and developing tools to support them can help improve the sharing and reuse of scientific data. It also suggests ways to incentivize open science through funding and publication requirements.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Phenopackets as applied to variant interpretation (mhaendel)
Phenopackets provide a standardized format for representing phenotypic data in order to make such data more findable, accessible, interoperable, and reusable. The format captures information about entities like patients and organisms, their associated conditions and phenotypes, and evidence for these associations. Phenopackets can be exported in different formats like CSV, JSON, and RDF. They allow complex phenotypes to be described through annotation and composition of terms from ontologies. Tools are being developed to work with phenopackets to enable applications in areas like clinical diagnostics, databases, and journals.
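To make the idea concrete, here is a minimal phenopacket-style record built and serialized as JSON. The field names loosely follow the later GA4GH Phenopacket Schema v2 (`id`, `subject`, `phenotypicFeatures`, `metaData`). The output has not been validated against the official schema and may omit required fields. HP:0001250 (Seizure) and HP:0001263 (Global developmental delay) are HPO terms. The packet, subject and curator identifiers are hypothetical.

```python
import json


def make_phenopacket(packet_id, subject_id, hpo_terms, created_by):
    """Build a minimal phenopacket-style dict; hpo_terms is a list of (id, label) pairs."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}} for term_id, label in hpo_terms
        ],
        "metaData": {
            "createdBy": created_by,
            "phenopacketSchemaVersion": "2.0",
            "resources": [
                {"id": "hp", "name": "human phenotype ontology", "namespacePrefix": "HP"}
            ],
        },
    }


packet = make_phenopacket(
    "example-1",
    "patient-1",
    [("HP:0001250", "Seizure"), ("HP:0001263", "Global developmental delay")],
    created_by="example curator",
)
print(json.dumps(packet, indent=2))
```

Because each phenotype is an ontology term ID rather than free text, a receiving tool can compare this profile directly against disease or model-organism annotations.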
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people who are actually engaged in the work, and they represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities for “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success for computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time-consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added non-expert (layperson) synonyms. Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients. This shows how much improvement patient-driven phenotyping provides, and how much it narrows the diagnosis. This patient-centric HPO can be used by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
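Layperson synonyms let patient-entered text be mapped onto HPO terms, as the minimal exact-match lookup below illustrates. HP:0001250 (Seizure) and HP:0002315 (Headache) are HPO terms. The lay synonyms listed for them are examples chosen for illustration, not copied from an HPO release. `build_index` and `encode` are hypothetical helpers, and a real system would need fuzzier text matching than a case-folded dictionary lookup.

```python
# Illustrative excerpt: HPO ID -> (label, layperson synonyms).
# The lay synonyms here are examples, not taken from an HPO release.
TERMS = {
    "HP:0001250": ("Seizure", ["fits", "convulsions"]),
    "HP:0002315": ("Headache", ["head pain"]),
}


def build_index(terms):
    """Index every label and lay synonym (case-folded) to its HPO ID."""
    index = {}
    for term_id, (label, lay_synonyms) in terms.items():
        for name in (label, *lay_synonyms):
            index[name.casefold()] = term_id
    return index


def encode(phrases, index):
    """Map patient-entered phrases to HPO IDs; unmatched phrases are returned for review."""
    matched, unmatched = [], []
    for phrase in phrases:
        term_id = index.get(phrase.strip().casefold())
        if term_id is None:
            unmatched.append(phrase)
        else:
            matched.append(term_id)
    return matched, unmatched
```

The resulting HPO IDs form a patient-recorded profile that can then be compared, via semantic similarity, with the clinically recorded one.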
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
On the frontier of genotype-2-phenotype data integration (mhaendel)
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
The Monarch Initiative: A semantic phenomics approach to disease discovery
A merger of multi-species anatomy ontologies
1. A merger of multi-species anatomy ontologies
or
An experiment in knowledge management
Biocuration 2013
Chris Mungall, Jim Balhoff, Frederic Bastian, David Blackburn, Aurelie Comte, Wasila Dahdul, Alex Dececchi, Nizar Ibrahim, Suzi Lewis, Paula Mabee, Anne Niknejad, Melissa Haendel
2. We want to understand gene function across taxa
[Figure: structures across taxa: Vertebrata (tetrapod limbs), Ascidians (ampullae), Echinodermata (tube feet), Arthropoda, Annelida (parapodia), Mollusca]
Anatomy ontologies are used to describe morphological variation
3. Anatomy ontologies built for one species will not work for others
http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
http://fme.biostr.washington.edu:8080/FME/index.html
4. So we build species-specific ontologies
[Figure: fragments of the FMA, EHDAA2, MA and MPO ontologies around the respiratory system and lung (e.g. lung, lung bud, respiratory primordium, pulmonary acinus, alveolus, alveolar sac, thoracic cavity, pleural sac, and MPO classes such as abnormal lung morphology), connected by is_a (SubClassOf), part_of, develops_from and surrounded_by relationships]
5. But this results in silos
[Figure: the same FMA, EHDAA2, MA and MPO fragments as in slide 4]
6. Why not just map ontology terms?
Class A | Class B | Mapped? | Useful?
FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
Vivo: legal decision | Cognitive Atlas: decision | Yes | No
PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
TaxRank: domain | NCI: protein domain | Yes | No
ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
TAO: fossa | AdverseReactions: depression | Yes | No
FMA: colon | GAZ: Colón, Panama | Yes | No
Quality: male | Chebi: maleate 2(-) | Yes | No
String matching for mapping can lead to spurious results, and the semantics and provenance of mappings are not always clear
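The failure modes in the table above are easy to reproduce with a naive matcher. This sketch is an assumed, deliberately simple baseline, not any particular mapping tool. It lowercases labels, strips accents, and accepts a mapping whenever one label occurs inside the other.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip accents, as a naive matcher might."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def naive_match(label_a: str, label_b: str) -> bool:
    """Flag a mapping if either folded label occurs as a substring of the other."""
    a, b = fold(label_a), fold(label_b)
    return a in b or b in a


for a, b in [("extensor retinaculum of wrist", "retina"),
             ("Colón, Panama", "colon"),
             ("protein domain", "domain"),
             ("hypophysis", "pituitary")]:
    print(f"{a!r} ~ {b!r}: {naive_match(a, b)}")
```

The first three pairs are "mapped" even though they are useless. The only genuinely equivalent pair, hypophysis and pituitary, is missed, because their labels share no string.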
7. Avoiding Silo-ization
Use ontologies that are:
open
documented
reusable
interoperable
built according to shared principles
designed to reuse core relations and patterns
Problem:
How do we re-use in the presence of variability?
8. Long ago in the world of anatomies
[Figure: CARO above the species-specific anatomy ontologies FBbt (Drosophila), EMAP (embryonic mouse), MA (adult mouse, mostly), EHDAA2 (embryonic human), FMA (adult human), XAO (Xenopus) and ZFA (zebrafish)]
An anatomical reference ontology was built to help standardize species-specific ontologies
9. And then came Uberon, created to bridge model organism anatomies
[Figure: the same diagram with UBERON added below CARO, spanning the species-specific ontologies]
10. Subsumption of species-specific classes
[Figure: species-specific classes such as MA:lung, FMA:lung, MA:lung alveolus, FMA:pulmonary alveolus and EHDAA:lung bud shown under species-neutral Uberon classes (anatomical structure, organ part, organ, endoderm, foregut, endoderm of foregut, swim bladder, respiration organ, respiratory primordium, lung primordium, lung bud, lung, pulmonary acinus, alveolar sac, alveolus, alveolus of lung), with links to NCBITaxon:Actinopterygii, NCBITaxon:Mammalia and GO: respiratory gaseous exchange. Relationships: is_a (SubClassOf), part_of, develops_from, capable_of, is_a (taxon equivalent), only_in_taxon.]
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. http://genomebiology.com/2012/13/1/R5
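Subsumption is what makes slide 10 useful for querying. Asking for a species-neutral Uberon class retrieves the matching classes from every species ontology beneath it. The sketch below reads a few is_a edges off the slide in simplified form; the edge from "alveolus of lung" to "alveolus" is an assumption. It uses a plain transitive walk in place of an OWL reasoner.

```python
# (child, parent) is_a edges, read off the slide in simplified form.
IS_A = [
    ("FMA:lung", "UBERON:lung"),
    ("MA:lung", "UBERON:lung"),
    ("FMA:pulmonary alveolus", "UBERON:alveolus of lung"),
    ("MA:lung alveolus", "UBERON:alveolus of lung"),
    ("EHDAA:lung bud", "UBERON:lung bud"),
    ("UBERON:alveolus of lung", "UBERON:alveolus"),  # assumed edge
]


def descendants(root, edges):
    """All classes that are (transitively) is_a the root."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def species_classes(root, edges=IS_A):
    """Species-specific (non-Uberon) classes subsumed by a species-neutral Uberon class."""
    return sorted(c for c in descendants(root, edges) if not c.startswith("UBERON:"))


print(species_classes("UBERON:lung"))      # mouse and human lung classes
print(species_classes("UBERON:alveolus"))  # reached through 'alveolus of lung'
```

The second query shows why transitivity matters. Neither FMA:pulmonary alveolus nor MA:lung alveolus is directly attached to "alveolus", yet both are retrieved.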
11. Uberon includes the Cell Ontology to enable query across granularity
[Figure: the Uberon classes cerebellum, cerebellar vermis and posterior lobe of cerebellum, linked by part_of (p) and is_a (i) to their MA (mouse) and FMA (human) counterparts; CL:Purkinje cell, with dendrite and axon parts, shown for both mouse and human; GO/NIF classes supply the subcellular level]
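Querying across granularity depends on propagating part_of through the ontology. part_of is transitive. It can also be combined with is_a in either order: if every X is a Y and every Y is part of some Z, then every X is part of some Z, and likewise for part_of followed by is_a. The sketch below saturates a set of part_of pairs with those three rules. The edges are simplified from slide 11: attaching the Purkinje cell directly to the cerebellum skips intermediate structures, and "Purkinje cell dendrite" and "MA:cerebellar vermis" are hypothetical class names.

```python
IS_A = [
    ("MA:cerebellum", "UBERON:cerebellum"),
    ("FMA:cerebellum", "UBERON:cerebellum"),
    ("MA:cerebellar vermis", "UBERON:cerebellar vermis"),
]
PART_OF = [
    ("UBERON:cerebellar vermis", "UBERON:cerebellum"),
    ("CL:Purkinje cell", "UBERON:cerebellum"),       # simplified placement
    ("Purkinje cell dendrite", "CL:Purkinje cell"),  # hypothetical subcellular class
]


def infer_part_of(is_a, part_of):
    """Saturate part_of using: part_of o part_of, part_of o is_a, is_a o part_of."""
    parts = set(part_of)
    while True:
        new = set()
        for x, y in parts:
            new |= {(x, z) for (y2, z) in parts if y2 == y}  # part_of is transitive
            new |= {(x, z) for (y2, z) in is_a if y2 == y}   # part of a Y, and every Y is a Z
        for x, y in is_a:
            new |= {(x, z) for (y2, z) in parts if y2 == y}  # every X is a Y, and Y is part of Z
        if new <= parts:
            return parts
        parts |= new


def parts_of(whole):
    """Everything inferred to be part of the given class."""
    return sorted(x for x, z in infer_part_of(IS_A, PART_OF) if z == whole)


print(parts_of("UBERON:cerebellum"))
```

A single query on the cerebellum returns an anatomical subdivision, a cell type and a subcellular structure. It also returns the species-specific MA vermis class, which inherits its part_of link through is_a.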
12. Use of Uberon
Annotation extensions GOA/UniProtKB [Chris’ talk]
Construction of GO terms [Heiko's talk]
Bgee cross-model homology-based expression search [Frederic’s talk]
Annotation of biospecimens from diverse taxa [eagle-i.org]
Phenotype similarity analyses to identify disease gene candidates and models
New project “Monarch Initiative” to build tools and services for navigating phenotypes [see our poster tonight]
PhenoDigm analysis engine [Damian’s talk tomorrow]
13. Fossils, the ultimate silo
Modern diversity is only a fraction of evolutionary diversity
Missing evolutionary transitions, e.g. fin to limb
Extant ontologies are not always compatible with fossil data
Different data sources and resolution between extinct and extant taxa
Shubin et al. 2006
14. And so over time…
[Figure: the previous diagram extended with further multi-species ontologies: vHOG (Vertebrate Homologous Organs Group), VSAO (Vertebrate Skeletal), AAO (Amphibian), TAO (Teleost) and, on the arthropod side, HAO (Hymenoptera), alongside CARO, UBERON, FMA, EMAP, EHDAA2, MA, FBbt, XAO and ZFA]
… additional multi-species ontologies evolved
15. But…
[Figure: the same diagram as slide 14]
…they had a hard time maintaining relationships to one another
17. The big roll-up
[Figure: UBERON-ext, containing Uberon core, CL (Cell Ontology), VSAO (Vertebrate Skeletal), AAO (Amphibian) and TAO (Teleost), shown with CARO and the species-specific ontologies FMA, EMAP, EHDAA2, MA, XAO and ZFA]
18. The new Uberon-ext
Contents:
– Over 8,000 classes (terms), 2500+ added by Phenoscape
– Multiple relationships, including subclass, part-of and develops-from
Scope: metazoa (animals)
– Current focus is chordates
– Includes teleost, amniote, and amphibian specific classes
Uberon classes are generic / species neutral
– ‘mammary gland’: you can use this class for any mammal!
– ‘lung’: you can use this class for any vertebrate (that has lungs)
http://purl.obolibrary.org/obo/uberon/ext.obo
http://purl.obolibrary.org/obo/uberon/ext.owl
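The ext.obo file listed above is in OBO format, a stanza-based tag-value text format. The sketch below is a deliberately minimal parser for the subset of tags used in its own sample. Real files also contain quoted strings, escapes and trailing qualifiers, so a maintained OBO/OWL library would be the safer choice in practice. The sample excerpt is hand-written and simplified rather than copied from a release. UBERON:0000979 (tibia) comes from the editor's note; attaching tibia directly to "bone element" is a simplification.

```python
SAMPLE = """format-version: 1.2
ontology: uberon

[Term]
id: UBERON:0000979
name: tibia
is_a: UBERON:0001474 ! bone element

[Term]
id: UBERON:0001474
name: bone element
relationship: only_in_taxon NCBITaxon:7742 ! Vertebrata

[Typedef]
id: part_of
name: part of
"""


def parse_obo(text):
    """Collect [Term] stanzas as dicts; is_a and relationship tags may repeat."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.split(" ! ", 1)[0].strip()  # drop trailing '! label' comments
        if line.startswith("["):
            # Start a new record for [Term]; ignore other stanza types such as [Typedef].
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ":" in line:
            tag, value = (part.strip() for part in line.split(":", 1))
            if tag in ("is_a", "relationship"):
                current[tag].append(value)
            else:
                current[tag] = value
    return terms


for term in parse_obo(SAMPLE):
    print(term["id"], term["name"], term["is_a"], term["relationship"])
```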
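20. Managing variation: using reasoners to detect errors
[Figure: UBERON:bone only_in_taxon Vertebrata, and Drosophila melanogaster is disjoint with Vertebrata. UBERON:tibia is_a UBERON:bone. The human FMA 'tibia' (part_of Homo sapiens, which is_a Vertebrata) is_a UBERON:tibia, which is consistent; placing the fruit fly FBbt 'tibia' (part_of Drosophila melanogaster) under UBERON:tibia is flagged as an error (✗).]
Developmental Biology, Scott Gilbert, 6th ed.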
21. What can you do with the new uberon-ext?
kb.phenoscape.org
Reason across anatomical variation in extinct and extant taxa
Query for candidate genes relevant to morphological evolution
Modified from Ahn and Ho 2008
22. Different strokes for different folks
Ontology | Contents | URL
basic | simple relationships | http://purl.obolibrary.org/obo/uberon/basic.owl
uberon | main ontology | http://purl.obolibrary.org/obo/uberon.owl
merged | main ontology + links to GO, CL, NCBITaxon, NBO | http://purl.obolibrary.org/obo/uberon/merged.owl
composite-metazoan | Uberon plus species-specific ontology classes merged in | http://purl.obolibrary.org/obo/uberon/composite-metazoan.owl
Uberon-ext | Uberon merged plus TAO, AAO and VSAO terms merged in | http://purl.obolibrary.org/obo/uberon/ext.owl
Formats: OBO-Format, OWL
http://uberon.org
23. Conclusions
Model organism anatomies were difficult to query across
Uberon was developed to help integrate human and model organism anatomy
Uberon has been useful to align model organism anatomy ontologies
Palaeontologists and evo-devo biologists needed wider coverage
A core set of vertebrate terms was needed by all
=> So we merged the ontologies, and now we can have dinosaur bone data, model organism data, and human data all integrated and queryable in one database!
24. Thanks!
Chris Mungall, Jim Balhoff, Frederic Bastian, David Blackburn, Aurelie Comte, Wasila Dahdul, Alex Dececchi, Nizar Ibrahim, Suzi Lewis, Paula Mabee, Anne Niknejad
25. Looking for a post-doc?
http://nescent.org/about/employment.php#PostDoc2
We are recruiting a postdoc with training in bioinformatics who is interested in studying phenotypic evolution by combining model organism genetic data with comparative anatomical data from throughout the vertebrates. Projects may range from primarily computational to primarily biological.
Editor's Notes
Maybe update examples here
Pull out? Make more CL centric?
Fantom5? Anything else to put here?
Alex do you have a higher res image of the limbs?
Uberon uses the only_in_taxon relation to make statements such as lactiferous gland only_in_taxon Mammalia and bone only_in_taxon Vertebrata. These relations are useful for human users of the ontology, and they can be used for consistency checking within the ontology. For example, suppose the FBbt class “tibia” (representing a segment of an insect leg) were accidentally placed as a child of UBERON:0000979 tibia. The reasoner would flag this, because tibia is a bone, bones are found only in vertebrates, and FBbt is a Drosophila ontology.
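The check described in this note can be imitated procedurally. An OWL reasoner derives the inconsistency from axioms such as the disjointness shown in slide 20. The sketch below takes a simpler route: it walks is_a links and compares each inherited only_in_taxon constraint against the taxon lineage of the source ontology. All dictionaries are hypothetical toy data, and the taxon lineage is collapsed to a few levels.

```python
# Toy data; the taxon lineage is collapsed to a few levels for brevity.
TAXON_PARENT = {
    "Homo sapiens": "Vertebrata",
    "Drosophila melanogaster": "Arthropoda",
    "Vertebrata": "Metazoa",
    "Arthropoda": "Metazoa",
}
IS_A = {
    "UBERON:tibia": ["UBERON:bone"],
    "FMA:tibia": ["UBERON:tibia"],
    "FBbt:tibia": ["UBERON:tibia"],  # the accidental placement from the note
}
ONLY_IN_TAXON = {"UBERON:bone": "Vertebrata"}
SOURCE_TAXON = {"FMA:tibia": "Homo sapiens", "FBbt:tibia": "Drosophila melanogaster"}


def lineage(taxon):
    """The taxon and all of its ancestors."""
    chain = []
    while taxon is not None:
        chain.append(taxon)
        taxon = TAXON_PARENT.get(taxon)
    return chain


def superclasses(cls):
    """The class itself plus everything reachable via is_a."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxon_violations():
    """Report (class, constrained superclass, required taxon) for every clash."""
    found = []
    for cls, taxon in SOURCE_TAXON.items():
        for sup in sorted(superclasses(cls)):
            required = ONLY_IN_TAXON.get(sup)
            if required is not None and required not in lineage(taxon):
                found.append((cls, sup, required))
    return found


print(taxon_violations())
```

Only the fruit fly class is reported. The human FMA tibia inherits the same constraint, but Homo sapiens falls within Vertebrata, so it passes.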