The Monarch Initiative aims to improve disease diagnostics and analysis by utilizing deep phenotyping data. It has developed ontologies like the Human Phenotype Ontology, with over 13,000 phenotype terms, to help machines understand human phenotypes. It uses "fuzzy" phenotypic profile matching across species to match patient data to known genetic disorders, as demonstrated by a case solved by linking a patient's profile to a STIM1 variant. The Initiative is working to develop lay-friendly phenotyping tools and connect data sources through the Matchmaker Exchange to aid in diagnosis and research.
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of the language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1,000 candidate variants typically remain after common and non-pathogenic variants are filtered out. We present Genomiser, which prioritizes coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser identifies causal regulatory variants as the top candidate in 77% of simulated whole genomes.
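The idea of combining variant-level evidence with phenotype relevance can be sketched as follows. This is a minimal illustration of the general scoring-and-ranking pattern, not Genomiser's actual algorithm; the gene names, scores, and the multiplicative blend are all hypothetical.

```python
# Hypothetical sketch of phenotype-aware variant prioritization: each
# candidate carries a variant-level score (e.g. pathogenicity, rarity) and a
# gene-level phenotype-relevance score, and candidates are ranked by a
# combined score. All names and numbers below are illustrative only.

def combined_score(variant_score: float, phenotype_score: float) -> float:
    """Blend variant-level and phenotype-level evidence into one rank score."""
    return variant_score * phenotype_score

def prioritize(candidates):
    """Sort candidate variants (dicts) by descending combined score."""
    return sorted(
        candidates,
        key=lambda v: combined_score(v["variant_score"], v["phenotype_score"]),
        reverse=True,
    )

candidates = [
    {"gene": "GENE_A", "variant_score": 0.9, "phenotype_score": 0.2},
    {"gene": "GENE_B", "variant_score": 0.7, "phenotype_score": 0.95},
    {"gene": "GENE_C", "variant_score": 0.4, "phenotype_score": 0.5},
]
ranked = prioritize(candidates)
```

Note how GENE_B, with a moderately damaging variant but a strong phenotype match, outranks GENE_A's highly damaging variant in a phenotypically irrelevant gene; this is the intuition behind phenotype-driven prioritization.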
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities in support of “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success for computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms using non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
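To make the semantic-similarity step concrete, here is a toy sketch of comparing two HPO-style profiles through their shared ancestors in an is-a hierarchy. The tiny ontology and term IDs are invented for illustration, and Jaccard overlap of ancestor closures is only one of several similarity measures used in practice (others weight terms by information content).

```python
# Minimal sketch of ontology-based profile comparison: similarity between two
# HPO-style term sets computed from shared ancestors in a toy is-a hierarchy.
# The ontology below is made up; real comparisons run over the full HPO.

PARENTS = {  # child -> parents (toy is-a edges)
    "HP:ataxia": {"HP:movement_abnormality"},
    "HP:tremor": {"HP:movement_abnormality"},
    "HP:movement_abnormality": {"HP:neuro_abnormality"},
    "HP:seizure": {"HP:neuro_abnormality"},
    "HP:neuro_abnormality": {"HP:root"},
    "HP:root": set(),
}

def ancestors(term):
    """All terms reachable upward from `term`, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen

def profile_similarity(profile_a, profile_b):
    """Jaccard overlap of the ancestor closures of two term sets (0..1)."""
    close_a = set().union(*(ancestors(t) for t in profile_a))
    close_b = set().union(*(ancestors(t) for t in profile_b))
    return len(close_a & close_b) / len(close_a | close_b)

patient = {"HP:ataxia", "HP:seizure"}    # e.g. a patient-recorded profile
clinical = {"HP:tremor", "HP:seizure"}   # e.g. a clinician-recorded profile
```

Even though "ataxia" and "tremor" are different terms, the profiles score well above zero because both roll up to the same movement-abnormality ancestor; this is what lets a patient's lay-worded profile still match a clinician's.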
The Application of the Human Phenotype Ontology (mhaendel)
Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Semantic phenotyping for disease diagnosis and discovery (mhaendel)
Here are a few things to consider about the patient's lower back pain over time:
- Acute vs chronic: Determine if the pain is a new onset (acute) or has been present long-term (chronic). The duration can provide clues.
- Progression: Note if the pain has gotten better, worse or stayed the same over time. Progression may indicate a more serious problem.
- Radiation: Document if the pain radiates anywhere (e.g. legs). Radiating pain can suggest nerve root involvement.
- Relieving/aggravating factors: Identify what makes the pain better or worse (e.g. activity, rest, position). This can help determine the underlying cause.
Phenopackets as applied to variant interpretation (mhaendel)
Phenopackets provide a standardized format for representing phenotypic data in order to make such data more findable, accessible, interoperable, and reusable. The format captures information about entities like patients and organisms, their associated conditions and phenotypes, and evidence for these associations. Phenopackets can be exported in different formats like CSV, JSON, and RDF. They allow complex phenotypes to be described through annotation and composition of terms from ontologies. Tools are being developed to work with phenopackets to enable applications in areas like clinical diagnostics, databases, and journals.
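Since JSON is one of the export formats mentioned above, a minimal phenopacket-like record can be sketched in a few lines. The field names below mirror the general shape described (a subject, phenotypic features typed by ontology terms, evidence), but this is an illustrative simplification, not the official Phenopackets schema.

```python
import json

# Illustrative sketch only: a minimal phenopacket-like record serialized to
# JSON. Field names approximate the concepts described in the text; consult
# phenopackets.org for the real schema.

packet = {
    "id": "example-packet-1",
    "subject": {"id": "patient-1", "taxon": "Homo sapiens"},
    "phenotypic_features": [
        {
            "type": {"id": "HP:0001250", "label": "Seizure"},
            "evidence": [{"code": "clinical observation"}],
        }
    ],
}

serialized = json.dumps(packet, indent=2)  # JSON is one supported export
restored = json.loads(serialized)          # round-trips losslessly
```

The key design point is that each phenotypic feature is an ontology term rather than free text, which is what makes downstream comparison and exchange computable.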
Use of semantic phenotyping to aid disease diagnosis (mhaendel)
This document discusses using semantic phenotyping to aid disease diagnosis. It outlines using ontologies to semantically annotate phenotypes seen in patients, animal models, and genes. This allows computation of semantic similarity between phenotypes to identify potential disease candidates. The document also discusses challenges such as uneven phenotype data distribution and differences in how phenotypes are described across species. It proposes building an integrated cross-species semantic framework called Uberpheno to address these challenges and better leverage animal models for diagnosing rare diseases.
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
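One way to picture the layperson-synonym work is as audience-tagged synonyms sitting alongside the clinical label, so a search can match either vocabulary. The sketch below is a hypothetical data layout, not how the HPO is actually stored (the real ontology uses OBO/OWL synonym annotations); the example term and synonyms are illustrative.

```python
# Hypothetical sketch: each term keeps its clinical label plus synonyms
# tagged by audience, so queries can match medical or layperson wording.

TERMS = {
    "HP:0000365": {
        "label": "Hearing impairment",
        "synonyms": [
            {"text": "Hearing defect", "layperson": False},
            {"text": "Hearing loss", "layperson": True},
        ],
    },
}

def find_terms(query, layperson_only=False):
    """Return IDs of terms whose label or synonym contains `query`.

    With layperson_only=True, only layperson-tagged synonyms are searched,
    approximating a patient-facing search box.
    """
    q = query.lower()
    hits = []
    for tid, term in TERMS.items():
        names = [] if layperson_only else [term["label"]]
        names += [s["text"] for s in term["synonyms"]
                  if s["layperson"] or not layperson_only]
        if any(q in n.lower() for n in names):
            hits.append(tid)
    return hits
```

A patient searching "hearing loss" lands on the same term a clinician reaches via "hearing impairment", which is exactly the accessibility gain the abstract describes.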
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
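The definition above (a short unit of 2-6 bp repeated back-to-back) translates directly into a scan over a sequence. This is a naive illustrative detector, not a production STR caller; the thresholds are the ones stated in the text, and the test sequence is made up.

```python
# Naive sketch of short tandem repeat (STR) detection: for each unit length
# of 2-6 bp, find runs where the unit repeats contiguously at least
# `min_copies` times, reporting (start index, unit, copy count).

def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) tuples for tandem runs in `seq`."""
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next window repeats the unit exactly.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past this run
            else:
                i += 1
    return hits
```

For example, `find_strs("AACAGCAGCAGCAGTT")` reports a CAG unit repeated four times starting at position 2.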
Empowering patients by increasing accessibility to clinical terminology (Nicole Vasilevsky)
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The document discusses two scientific studies:
1) A study of identical twins which found that DNA rearrangements increased with age and may help explain immune system aging. Larger rearrangements were found only in twins over 60, and rearrangement frequency correlated with age.
2) A study linking faulty proteins to ovarian cancer and potential for expanding PARP inhibitor treatments to more patients. Additional defective proteins beyond BRCA may be driving ovarian cancer. This could change ovarian cancer treatment dynamics.
A Primer to Bioinformatics: 29 September 2017 (DocSoc2017)
This document provides an introduction and primer to key concepts in bioinformatics. It discusses DNA structure and genes, how bioinformatics uses computer science to solve biological problems like genome sequencing, and the central dogma of DNA transcription and translation into mRNA and protein. It then outlines three tasks - converting a DNA sequence to mRNA, evaluating a sequence for single nucleotide polymorphisms related to sickle cell disease, and using a restriction endonuclease to identify tandem repeats related to Huntington's disease risk.
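The first two exercises (transcription, and spotting a single-nucleotide change) are simple enough to sketch in code. The helper names are my own, and the sequences are only illustrative fragments, not a clinically complete test; real variant calling works against a full reference genome.

```python
# Sketch of two primer-style exercises: transcribing coding-strand DNA to
# mRNA, and locating single-nucleotide differences between a sample and a
# reference fragment. Sequences below are illustrative only.

def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")

def snp_positions(sample: str, reference: str):
    """0-based positions where two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(sample, reference)) if a != b]

reference = "GTGCACCTGACTCCTGAGGAG"  # illustrative reference fragment
sample    = "GTGCACCTGACTCCTGTGGAG"  # same fragment with one substitution
```

Here `snp_positions(sample, reference)` pinpoints the single A-to-T substitution, the kind of one-base change the sickle-cell exercise asks students to find.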
This study examined the relationship between DNA damage and numerical chromosome abnormalities in sperm samples from 45 infertile men. The study found:
1) A significant correlation between the proportion of sperm with numerical chromosome abnormalities and the level of DNA fragmentation.
2) Sperm cells that were chromosomally abnormal were more likely to display DNA damage than those that were normal based on the chromosomes tested.
3) This association was detected not only in samples with elevated rates of chromosome abnormalities, but also in samples with rates in the normal range. The findings suggest DNA fragmentation may be a marker for the presence of chromosome abnormalities in sperm.
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Re... (Databricks)
Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole-genome scale. This has a profound impact on improving our understanding of human diversity, health, and disease. One promising application of WGS is to identify disease-causal genes that can be therapeutically targeted. However, the majority of disease-associated variants are located in non-coding regions or so-called gene deserts, so the exact function and biological consequences of these variants are unknown. In addition, with numerous variants in linkage disequilibrium (LD), the genetic sequence itself is insufficient to infer the likely causal variant(s) among the many variants in a region of association. Studies have shown that the majority of these variants reside in gene regulatory regions, preferentially in cell type-specific enhancers, providing insight into disease relevance. Cutting-edge sequencing technologies that configure 3D genomic structure and build tissue-specific gene regulatory landscapes can link regulatory elements to their target genes. This allows us to associate disease-associated variants with their underlying target genes.
In this talk, we demonstrate a new approach that incorporates 3D genomic structure and the chromatin states of gene regulatory landscapes in a deep learning framework to predict the functions of disease-associated variants and their target genes. This approach can significantly improve our understanding of the functional importance of otherwise uncharacterized genetic variants, allowing us to evaluate and prioritize high-impact variants and their target genes for the development of new drug interventions.
The Genomics Revolution: The Good, The Bad, and The Ugly (UEOP16 Keynote) (Emiliano De Cristofaro)
The document discusses the genomics revolution and its implications for privacy. It outlines the good of genetic testing and medicine, the bad of collecting sensitive genomic data that is hard to anonymize, and the ugly challenges of balancing privacy and the greater good. It then reviews the history of genome sequencing and cost reductions. The remainder summarizes privacy issues like re-identification risks, kin privacy, and challenges of data sharing. It also outlines cryptography techniques being explored to enable private genomic computation and testing on encrypted genomes. Open problems remain around long-term data storage and usability of privacy techniques.
Researchers from the University of Massachusetts Medical School, Institut Curie in Paris and Stanford University studied the structure of the inactive X chromosome, known as the Barr body, in female mammals. They discovered that the Barr body contains two separately packed lobes of condensed inactive DNA separated by a highly repetitive segment of DNA. This suggests the repetitive DNA may play a role in organizing the Barr body. A separate study from the University of Valencia found that females have a protective effect from genetic mutations due to having two X chromosomes, whereas males only have one unprotected X chromosome, helping to explain differences in lifespan between sexes. Understanding normal genetic expression and abnormalities could enable development of specific medical treatments tailored to individuals.
Lecture presented by Dr. Fatma Taha at BIOCHEM Cairo 2014, organized by the Department of Medical Biochemistry and Molecular Biology, Cairo University. BIOCHEM Cairo 2014 is a Scribe event (www.scribeofegypt.com).
This document provides an overview of exome analysis for identifying causal genes for Mendelian disorders. It discusses technological advances that have enabled exome sequencing, key publications in the field, strategies and tools used for data analysis, and exome sequencing service providers. The document is intended as a useful resource for those interested in how exome analysis is used to identify genes underlying Mendelian conditions.
Neuromics is a recognized leader in providing large pharma, biotech, and academic/government labs with 2D and 3D cell-based assays, which are well suited to drug discovery and toxicology studies.
DNA profiling was developed in 1984 by Sir Alec Jeffreys and involves analyzing variable regions of DNA called STRs or microsatellites that differ between individuals. It is used in forensic investigations to identify suspects or link them to crime scenes by comparing a sample to a reference DNA profile. The process involves extracting DNA from samples, analyzing STR regions to develop a profile of allele lengths, and entering it into DNA databases for comparison to other profiles. Some of the largest DNA databases are maintained by governments like the UK's NDNAD and US's CODIS, which help solve crimes but also raise privacy concerns due to retention of profiles.
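The comparison step described above (matching an evidence profile of STR allele lengths against reference profiles) can be sketched as follows. The locus names and repeat counts are invented for illustration, not real CODIS loci, and real forensic matching additionally weighs population allele frequencies and partial profiles.

```python
# Illustrative sketch of STR profile comparison: a profile maps each locus
# to a pair of allele repeat counts, and a match requires identical
# genotypes at every locus the two profiles share.

def same_genotype(a, b):
    """Allele pairs match regardless of order, e.g. (12, 9) == (9, 12)."""
    return sorted(a) == sorted(b)

def matches(evidence, reference):
    """True if the two profiles agree at every locus present in both."""
    shared = evidence.keys() & reference.keys()
    return bool(shared) and all(
        same_genotype(evidence[l], reference[l]) for l in shared
    )

evidence  = {"LOCUS_1": (12, 9), "LOCUS_2": (7, 7)}
suspect_a = {"LOCUS_1": (9, 12), "LOCUS_2": (7, 7)}  # consistent profile
suspect_b = {"LOCUS_1": (9, 12), "LOCUS_2": (7, 8)}  # excluded at LOCUS_2
```

A single mismatched locus excludes a suspect outright, which is why databases like NDNAD and CODIS compare profiles locus by locus.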
1. The document discusses using phenotypes across species to aid in interpreting genomic data from patients and improving diagnosis and treatment.
2. Building comprehensive phenotype databases from multiple sources is challenging due to disparate data on human genes/variants and model organisms.
3. The Monarch Initiative aims to link human diseases to phenotypes in model systems through an ontology-based knowledge base and portal.
4. Incorporating rich phenotypic data can improve variant filtering and interpretation by providing more context for sequencing results.
The Monarch Initiative: From Model Organism to Precision Medicine (mhaendel)
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Disease Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal: a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
Why the world needs phenopacketeers, and how to be onemhaendel
Keynote presented at the the Ninth International Biocuration Conference Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Semantic phenotyping for disease diagnosis and discovery mhaendel
Here are a few things to consider about the patient's lower back pain over time:
- Acute vs chronic: Determine if the pain is a new onset (acute) or has been present long-term (chronic). The duration can provide clues.
- Progression: Note if the pain has gotten better, worse or stayed the same over time. Progression may indicate a more serious problem.
- Radiation: Document if the pain radiates anywhere (e.g. legs). Radiating pain can suggest nerve root involvement.
- Relieving/aggravating factors: Identify what makes the pain better or worse (e.g. activity, rest, position). This can help determine the
Phenopackets as applied to variant interpretation mhaendel
Phenopackets provide a standardized format for representing phenotypic data in order to make such data more findable, accessible, interoperable, and reusable. The format captures information about entities like patients and organisms, their associated conditions and phenotypes, and evidence for these associations. Phenopackets can be exported in different formats like CSV, JSON, and RDF. They allow complex phenotypes to be described through annotation and composition of terms from ontologies. Tools are being developed to work with phenopackets to enable applications in areas like clinical diagnostics, databases, and journals.
Use of semantic phenotyping to aid disease diagnosismhaendel
This document discusses using semantic phenotyping to aid disease diagnosis. It outlines using ontologies to semantically annotate phenotypes seen in patients, animal models, and genes. This allows computation of semantic similarity between phenotypes to identify potential disease candidates. The document also discusses challenges such as uneven phenotype data distribution and differences in how phenotypes are described across species. It proposes building an integrated cross-species semantic framework called Uberpheno to address these challenges and better leverage animal models for diagnosing rare diseases.
Enhancing the Human Phenotype Ontology for Use by the LaypersonNicole Vasilevsky
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
Empowering patients by increasing accessibility to clinical terminologyNicole Vasilevsky
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The document discusses two scientific studies:
1) A study of identical twins which found that DNA rearrangements increased with age and may help explain immune system aging. Larger rearrangements were only in twins over 60, and rearrangements correlated with age.
2) A study linking faulty proteins to ovarian cancer and potential for expanding PARP inhibitor treatments to more patients. Additional defective proteins beyond BRCA may be driving ovarian cancer. This could change ovarian cancer treatment dynamics.
A Primer to Bioinformatics: 29 September 2017DocSoc2017
This document provides an introduction and primer to key concepts in bioinformatics. It discusses DNA structure and genes, how bioinformatics uses computer science to solve biological problems like genome sequencing, and the central dogma of DNA transcription and translation into mRNA and protein. It then outlines three tasks - converting a DNA sequence to mRNA, evaluating a sequence for single nucleotide polymorphisms related to sickle cell disease, and using a restriction endonuclease to identify tandem repeats related to Huntington's disease risk.
This study examined the relationship between DNA damage and numerical chromosome abnormalities in sperm samples from 45 infertile men. The study found:
1) A significant correlation between the proportion of sperm with numerical chromosome abnormalities and the level of DNA fragmentation.
2) Sperm cells that were chromosomally abnormal were more likely to display DNA damage than those that were normal based on the chromosomes tested.
3) This association was detected not only in samples with elevated rates of chromosome abnormalities, but also in samples with rates in the normal range. The findings suggest DNA fragmentation may be a marker for the presence of chromosome abnormalities in sperm.
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Re...Databricks
Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole genome scale. This has profound impact on improving our understanding of human diversity, health, and diseases. One promising application of WGS is to identify disease-causal genes that can be therapeutically targeted. However, majority of disease-associated variants are located in non-coding regions or so-called genetic deserts, thus the exact function and biological consequences of these variants are unknown. In addition, with numerous variants in linkage disequilibrium (LD), genetic sequence itself is insufficient to infer the likely causal variant(s) among many variants in a region of association. Studies have shown that majority of these variants reside in gene regulatory regions and preferentially in cell type-specific enhancers, providing insights into disease relevance. Novel cutting-edge sequencing technologies to configure 3D genomic structure and to build tissue-specific gene regulatory landscapes can link regulatory elements to their targeted genes. This allows us to associate disease-associated variants and their underlying genes targets.
In this talk, we demonstrate a new approach to incorporate 3D genomic structure and chromatin states of gene regulatory landscapes in a deep learning framework to predict functions of disease-associated variants and their targeted genes. This approach can significantly improve our understanding of the functional importance of those otherwise unknown genetics variants. It allows us to evaluate and prioritize high-impact variants and their targeted genes for development of new drug intervention.
The Genomics Revolution: The Good, The Bad, and The Ugly (UEOP16 Keynote)Emiliano De Cristofaro
The document discusses the genomics revolution and its implications for privacy. It outlines the good of genetic testing and medicine, the bad of collecting sensitive genomic data that is hard to anonymize, and the ugly challenges of balancing privacy and the greater good. It then reviews the history of genome sequencing and cost reductions. The remainder summarizes privacy issues like re-identification risks, kin privacy, and challenges of data sharing. It also outlines cryptography techniques being explored to enable private genomic computation and testing on encrypted genomes. Open problems remain around long-term data storage and usability of privacy techniques.
Researchers from the University of Massachusetts Medical School, Institut Curie in Paris and Stanford University studied the structure of the inactive X chromosome, known as the Barr body, in female mammals. They discovered that the Barr body contains two separately packed lobes of condensed inactive DNA separated by a highly repetitive segment of DNA. This suggests the repetitive DNA may play a role in organizing the Barr body. A separate study from the University of Valencia found that females have a protective effect from genetic mutations due to having two X chromosomes, whereas males only have one unprotected X chromosome, helping to explain differences in lifespan between sexes. Understanding normal genetic expression and abnormalities could enable development of specific medical treatments tailored to individuals.
Lecture presented by Dr. Fatma Taha at BIOCHEM Cairo 2014, organized by the Department of Medical Biochemistry and Molecular Biology, Cairo University. BIOCHEM Cairo 2014 is a Scribe event (www.scribeofegypt.com).
This document provides an overview of exome analysis for identifying causal genes for Mendelian disorders. It discusses technological advances that have enabled exome sequencing, key publications in the field, strategies and tools used for data analysis, and exome sequencing service providers. The document is intended as a useful resource for those interested in how exome analysis is used to identify genes underlying Mendelian conditions.
Neuromics is a recognized leader in providing large pharma, biotech, and academic/government labs with 2D and 3D cell-based assays. They are excellent for use in drug discovery and toxicology studies.
DNA profiling was developed in 1984 by Sir Alec Jeffreys and involves analyzing variable regions of DNA called STRs or microsatellites that differ between individuals. It is used in forensic investigations to identify suspects or link them to crime scenes by comparing a sample to a reference DNA profile. The process involves extracting DNA from samples, analyzing STR regions to develop a profile of allele lengths, and entering it into DNA databases for comparison to other profiles. Some of the largest DNA databases are maintained by governments like the UK's NDNAD and US's CODIS, which help solve crimes but also raise privacy concerns due to retention of profiles.
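The profile-comparison step described above can be sketched as follows. This is an illustrative toy, not forensic software: the locus names are real STR loci commonly used in CODIS-style profiling, but the allele values are made up, and real matching also accounts for partial profiles and statistical match probabilities.

```python
# Illustrative sketch of STR-profile comparison. Each profile maps an STR
# locus to its two allele repeat lengths; a match requires agreement at
# every locus typed in both profiles, compared as unordered pairs.

def str_match(sample, reference):
    """True if every locus present in both profiles has identical alleles."""
    shared = set(sample) & set(reference)
    return bool(shared) and all(
        sorted(sample[locus]) == sorted(reference[locus]) for locus in shared
    )

crime_scene = {"D3S1358": (15, 17), "vWA": (14, 18), "FGA": (21, 23)}
suspect     = {"D3S1358": (17, 15), "vWA": (14, 18), "FGA": (21, 23)}
```

Note that allele order within a locus is irrelevant, which is why the pairs are sorted before comparison.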
1. The document discusses using phenotypes across species to aid in interpreting genomic data from patients and improving diagnosis and treatment.
2. Building comprehensive phenotype databases from multiple sources is challenging due to disparate data on human genes/variants and model organisms.
3. The Monarch Initiative aims to link human diseases to phenotypes in model systems through an ontology-based knowledge base and portal.
4. Incorporating rich phenotypic data can improve variant filtering and interpretation by providing more context for sequencing results.
The Monarch Initiative: From Model Organism to Precision Medicine - mhaendel
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Disease Program.
Ongoing development in coordination with the Global Alliance for Genetic Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
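The phenotype similarity matching at the core of these tools can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical three-level subsumption hierarchy with made-up term IDs; production systems use the full HPO and information-content-weighted measures (e.g. Resnik similarity) rather than the plain Jaccard overlap shown here.

```python
# Toy ontology-aware profile similarity: two profiles are compared via the
# Jaccard overlap of their ancestor closures, so related-but-distinct terms
# (ptosis vs. cataract) still share credit through common ancestors.
# Term IDs and hierarchy are hypothetical.

PARENTS = {
    "HP:ptosis": {"HP:eyelid_abnormality"},
    "HP:eyelid_abnormality": {"HP:eye_abnormality"},
    "HP:cataract": {"HP:eye_abnormality"},
    "HP:eye_abnormality": set(),
}

def ancestors(term):
    """Return the term plus all of its ancestors in the hierarchy."""
    out = {term}
    for parent in PARENTS.get(term, ()):
        out |= ancestors(parent)
    return out

def profile_similarity(profile_a, profile_b):
    """Jaccard overlap of the ancestor closures of two phenotype profiles."""
    a = set().union(*(ancestors(t) for t in profile_a))
    b = set().union(*(ancestors(t) for t in profile_b))
    return len(a & b) / len(a | b)
```

Here `profile_similarity(["HP:ptosis"], ["HP:cataract"])` is nonzero because both terms subsume to the shared eye-abnormality ancestor, which is exactly what enables cross-species and cross-patient matching despite different annotation granularity.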
openEHR in Research: Linking Health Data with Computational Models - Koray Atalag
My prezo at Medinfo 2017 openEHR Developers Workshop.
The aim was to demonstrate how openEHR supports very advanced research and analytics with examples from computational physiology and biosimulation to create patient-specific decision support.
Informatics and data analytics to support exposome-based discovery - Chirag Patel
The document discusses the need for informatics methods, databases, and standards to support exposome-driven discovery research in a similar way that informatics has supported genomic research. Specifically, it notes that estimates of heritability from twin studies indicate that environmental factors likely play an equally important role as genetics in many traits/diseases. However, the chemical space of the exposome is large and heterogeneous, posing challenges to integrate exposome, genome, and phenome data through approaches like exposome-wide association studies.
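An exposome-wide association scan of the kind mentioned above can be sketched as a loop over exposures, each tested for association with the phenotype. This is a deliberately simplified sketch (raw Pearson correlation, no covariate adjustment or multiple-testing correction, which real EWAS analyses require); exposure names and values are synthetic.

```python
# Hedged sketch of an exposome-wide association scan: correlate a phenotype
# against each measured exposure and rank by absolute association strength.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def ewas_scan(exposures, phenotype):
    """Return (exposure, r) pairs sorted by |r|, strongest first."""
    results = [(name, pearson_r(vals, phenotype))
               for name, vals in exposures.items()]
    return sorted(results, key=lambda t: -abs(t[1]))
```

The same loop structure mirrors a GWAS scan with exposures in place of variants, which is the analogy the document draws between exposome and genome informatics.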
The document discusses using structured phenotype data to improve the interpretation and prioritization of candidate genes from exome sequencing data, particularly for undiagnosed diseases. It outlines current challenges in candidate gene prioritization based on phenotypes alone. It then describes how ontologies can be used to semantically represent and compare phenotypes across species to leverage knowledge from model organisms. The document presents results showing that combining phenotype data with variant data using a tool called PhenIX improves the ability to correctly prioritize candidate genes from exome data compared to using variant data alone. This demonstrates the utility of structured phenotype data for computational analysis of exomes to diagnose rare diseases.
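The combination of phenotype and variant evidence described above can be sketched as a score fusion. This is a minimal illustration, not PhenIX's actual scoring scheme (a plain average is used here for clarity; the real tool's weighting differs), and all gene names and scores are hypothetical.

```python
# Sketch of PhenIX-style candidate ranking: each gene carries a variant score
# (pathogenicity/frequency-based) and a phenotype score (similarity of the
# patient's profile to known gene-phenotype associations); candidates are
# ranked by a combined score. Weighting here is a plain average.

def rank_candidates(variant_scores, phenotype_scores):
    """Rank genes by the mean of their variant and phenotype scores (0-1)."""
    combined = {
        gene: (variant_scores[gene] + phenotype_scores.get(gene, 0.0)) / 2
        for gene in variant_scores
    }
    return sorted(combined.items(), key=lambda kv: -kv[1])

variants   = {"GENE_A": 0.9, "GENE_B": 0.95, "GENE_C": 0.4}   # hypothetical
phenotypes = {"GENE_A": 0.8, "GENE_B": 0.1,  "GENE_C": 0.7}
```

Note how `GENE_B` has the strongest variant evidence but drops in the combined ranking because its phenotype match is poor, which is the effect the document reports: phenotype data re-orders candidates that variant data alone cannot distinguish.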
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... - mhaendel
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their medical history; these patients are not only some of the most engaged, but can also themselves provide data for use in clinical evaluation. We therefore created a lay-person version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
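The lay-synonym mechanism can be sketched as a lookup from patient phrasing to clinical terms. The synonym table below is a tiny illustrative stand-in for the real lay-HPO synonym layer, which is far larger; the HPO IDs shown are believed correct but are included for illustration only.

```python
# Illustrative lookup from layperson phrasing to clinical HPO terms.
# The real lay-HPO synonym layer covers thousands of terms; this toy table
# shows the mechanism only.

LAY_SYNONYMS = {
    "droopy eyelids": "HP:0000508",   # Ptosis
    "clouded lens":   "HP:0000518",   # Cataract
    "curved spine":   "HP:0002650",   # Scoliosis
}

def encode_lay_profile(phrases):
    """Translate patient-entered phrases into HPO IDs, skipping unknowns."""
    return [LAY_SYNONYMS[p.lower().strip()] for p in phrases
            if p.lower().strip() in LAY_SYNONYMS]
```

A patient-entered profile encoded this way becomes directly comparable, via semantic similarity, with clinician-entered profiles using the same ontology.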
The Software and Data Licensing Solution: Not Your Dad’s UBMTA - mhaendel
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals
recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder - mhaendel
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources subject to concept drift and annotation differences, such as diseases. The syntactic issue is that so many variations of the same identifier exist, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for infrastructure that handles identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
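The CURIE mechanics behind this can be sketched in a few lines. This is a simplified sketch of JSON-LD-style prefix expansion: the two prefix-to-IRI mappings follow real registry conventions (OBO PURLs, OMIM entry URLs), but the resolution logic is deliberately minimal and handles none of the edge cases a real resolver must.

```python
# Minimal sketch of CURIE handling with a JSON-LD-style prefix context.
# Expansion turns "HP:0000508" into a full IRI; contraction does the reverse.

CONTEXT = {
    "HP":   "http://purl.obolibrary.org/obo/HP_",
    "OMIM": "https://omim.org/entry/",
}

def expand_curie(curie):
    """Expand a prefixed identifier (CURIE) to a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in CONTEXT or not local:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return CONTEXT[prefix] + local

def contract_iri(iri):
    """Contract a full IRI back to its CURIE form, if a prefix matches."""
    for prefix, base in CONTEXT.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri
```

Keeping the prefix map in a shared, versioned registry is what makes identifiers from different sources joinable: two resources agreeing on the context agree on what `HP:0000508` denotes.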
Building (and traveling) the data-brick road: A report from the front lines ... - mhaendel
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data or minting of new ones for the data and/or within it, and manipulating the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, where in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting - loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
Reusable data for biomedicine: A data licensing odyssey - mhaendel
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses are particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data project aims to promote standardization and ease of reuse licensing practices by data providers.
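The five-part rubric and star scale described above can be sketched as a simple scoring function. The criterion names below merely paraphrase the (Re)usable Data Project's themes and the equal weighting is an assumption for illustration; the actual rubric's criteria and scoring rules should be consulted directly.

```python
# Hedged sketch of rubric-style scoring: each of five criteria contributes
# up to one star toward a 0-5 total. Criterion names and equal weighting
# are illustrative, not the project's actual rubric.

CRITERIA = ["license_findable", "license_standard", "scope_clear",
            "no_restrictions", "redistribution_allowed"]

def star_score(assessment):
    """Map per-criterion results (0.0-1.0 each) to a 0-5 star score."""
    return round(sum(assessment.get(c, 0.0) for c in CRITERIA), 1)

# A resource with a findable, standard license but restrictive provisions:
resource = {"license_findable": 1.0, "license_standard": 1.0,
            "scope_clear": 0.5, "no_restrictions": 0.0,
            "redistribution_allowed": 0.0}
```

Under this toy scheme the example resource lands at 2.5 stars, the boundary below which the abstract reports roughly half of the evaluated sources fall.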
How open is open? An evaluation rubric for public knowledgebases - mhaendel
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases necessarily makes us question our priorities with respect to maintaining them, developing new ones, or senescing/subsuming ones that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable, but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to their key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
This document discusses making scientific data fair, open, and reusable. It defines the FAIR guiding principles of findable, accessible, interoperable and reusable data and describes what each principle entails. It then expands on these principles by introducing FAIR-TLC, which adds the dimensions of traceable, licensed and connected. The document argues that adopting FAIR-TLC practices and developing tools to support them can help improve the sharing and reuse of scientific data. It also suggests ways to incentivize open science through funding and publication requirements.
Credit where credit is due: acknowledging all types of contributions - mhaendel
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
On the frontier of genotype-2-phenotype data integration - mhaendel
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Envisioning a world where everyone helps solve disease - mhaendel
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and how so very many people unbeknownst to the patient or even to the researchers creating the data are involved in a diagnosis.
Getting (and giving) credit for all that we do - mhaendel
This document discusses the need to give proper attribution and credit to all contributions in the research process, not just authorship of publications. It notes that many roles and outputs are not adequately recognized currently. It introduces the open Research Information Framework (openRIF) which aims to develop ontologies and tools to connect people to their diverse research outputs and roles through interoperable systems in order to ensure proper attribution for all.
Integrating clinical and model organism G2P data for disease discovery - mhaendel
This document discusses challenges in integrating clinical and model organism genotype-phenotype data to improve disease discovery. It notes there are many variants of unknown significance and phenotypes are not well represented across vocabularies. The document describes using the Human Phenotype Ontology and semantic techniques to standardize and bridge vocabularies and phenotypes across species. This can help compare phenotypic profiles for variant prioritization and disease discovery. Standardizing data and ontologies across species in a graph database is described as a way to propagate evidence and evaluate schemas being developed by GA4GH.
Force11: Enabling transparency and efficiency in the research landscape - mhaendel
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Dataset description using the W3C HCLS standard - mhaendel
This talk was presented at the BioCaddie http://biocaddie.org/ workshop at the Force15 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to increase awareness of why a Semantic Web-compliant standard was needed for describing data, where current standards fall short, and how this new emerging standard that extends prior efforts can aid data discovery and integration. This work is being lead by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Standardizing scholarly output with the VIVO ontology - mhaendel
The document discusses standardizing scholarly output by creating a semantic representation of research activities and products using VIVO-ISF. This would enable identifying potential collaborators and expertise across disciplines. VIVO-ISF can integrate data from different research profiling systems and sources to provide a standardized view. Integrating clinical, research, and publication data from multiple institutions using VIVO-ISF can help answer questions about expertise, collaboration, and identifying advisors.
Immersive Learning That Works: Research Grounding and Paths Forward - Leonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and 'Immersion Cube' frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences, spotlighting research frontiers along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of 'The Philosophy of Science' (https://shorturl.at/4pUXz), I decided not to walk through the details of the many methodologies in order of use. Instead, I chose to employ a long-standing, and ongoing, scientific development as an exemplar. And so, I chose the ever-evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ... - Travis Hills MN
By harnessing the power of High Flux Vacuum Membrane Distillation, Travis Hills from MN envisions a future where clean and safe drinking water is accessible to all, regardless of geographical location or economic status.
ESR spectroscopy in liquid food and beverages.pptx - PRIYANKA PATEL
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not harm human health, quality assessment of the food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid food and beverages is mainly assessed by the spin trapping technique.
The cost of acquiring information by natural selection - Carl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf - Selcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
With many thanks
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Nicole Washington
Charite
Sebastian Kohler
Garvan
Tudor Groza
Craig McNamara
RENCI
Jim Balhoff
Boston Children’s
Ingrid Holm
Catherine Brownstein
John Brownstein
EBI
Helen Parkinson
David Osumi-Sutherland
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Maureen Hoatlin
Tim Putman
JP Gourdine
David Ellison
Genomics England/Queen Mary
Damian Smedley
Jules Jacobson
Tomasz Konopka
Pilar Cacheiro
Jackson Laboratory
Peter Robinson
Leigh Carmody
Hannah Blau
With special thanks to Julie McMurry for excellent graphic design
Johns Hopkins
Chris Chute
Casey Overby
Ada Hamosh
Scripps
Andrew Su
Ben Good
Chunlei Wu
Gregg Stupp
Sanford Health Imagenetics
Neal Boerkoel
Kayli Rageth
Murat Sincan
ClinGen
Heidi Rehm
Larry Babb
Harindra Arachchi