The Monarch Initiative aims to improve disease diagnostics and analysis by utilizing deep phenotyping data. It has developed ontologies like the Human Phenotype Ontology, with over 13,000 phenotype terms, to help machines understand human phenotypes. It uses "fuzzy" phenotypic profile matching across species to match patient data to known genetic disorders, as demonstrated by a case solved by linking a patient's profile to a STIM1 variant. The Initiative is working to develop lay-friendly phenotyping tools and connect data sources through the Matchmaker Exchange to aid in diagnosis and research.
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of the language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1,000 candidate variants typically remain after common and non-pathogenic variants are filtered out. We present Genomiser, which prioritizes coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser identifies causal regulatory variants as the top candidate in 77% of simulated whole genomes.
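The idea of combining variant-level evidence with phenotype relevance can be sketched as follows. This is a minimal illustration of the general scoring-and-ranking pattern, not Genomiser's actual algorithm; the gene names, scores, and the multiplicative blend are all hypothetical.

```python
# Hypothetical sketch of phenotype-aware variant prioritization: each
# candidate carries a variant-level score (e.g. pathogenicity, rarity) and a
# gene-level phenotype-relevance score, and candidates are ranked by a
# combined score. All names and numbers below are illustrative only.

def combined_score(variant_score: float, phenotype_score: float) -> float:
    """Blend variant-level and phenotype-level evidence into one rank score."""
    return variant_score * phenotype_score

def prioritize(candidates):
    """Sort candidate variants (dicts) by descending combined score."""
    return sorted(
        candidates,
        key=lambda v: combined_score(v["variant_score"], v["phenotype_score"]),
        reverse=True,
    )

candidates = [
    {"gene": "GENE_A", "variant_score": 0.9, "phenotype_score": 0.2},
    {"gene": "GENE_B", "variant_score": 0.7, "phenotype_score": 0.95},
    {"gene": "GENE_C", "variant_score": 0.4, "phenotype_score": 0.5},
]
ranked = prioritize(candidates)
```

Note how GENE_B, with a moderately damaging variant but a strong phenotype match, outranks GENE_A's highly damaging variant in a phenotypically irrelevant gene; this is the intuition behind phenotype-driven prioritization.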
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities in support of “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used with great success for computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms using non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
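To make the semantic-similarity step concrete, here is a toy sketch of comparing two HPO-style profiles through their shared ancestors in an is-a hierarchy. The tiny ontology and term IDs are invented for illustration, and Jaccard overlap of ancestor closures is only one of several similarity measures used in practice (others weight terms by information content).

```python
# Minimal sketch of ontology-based profile comparison: similarity between two
# HPO-style term sets computed from shared ancestors in a toy is-a hierarchy.
# The ontology below is made up; real comparisons run over the full HPO.

PARENTS = {  # child -> parents (toy is-a edges)
    "HP:ataxia": {"HP:movement_abnormality"},
    "HP:tremor": {"HP:movement_abnormality"},
    "HP:movement_abnormality": {"HP:neuro_abnormality"},
    "HP:seizure": {"HP:neuro_abnormality"},
    "HP:neuro_abnormality": {"HP:root"},
    "HP:root": set(),
}

def ancestors(term):
    """All terms reachable upward from `term`, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen

def profile_similarity(profile_a, profile_b):
    """Jaccard overlap of the ancestor closures of two term sets (0..1)."""
    close_a = set().union(*(ancestors(t) for t in profile_a))
    close_b = set().union(*(ancestors(t) for t in profile_b))
    return len(close_a & close_b) / len(close_a | close_b)

patient = {"HP:ataxia", "HP:seizure"}    # e.g. a patient-recorded profile
clinical = {"HP:tremor", "HP:seizure"}   # e.g. a clinician-recorded profile
```

Even though "ataxia" and "tremor" are different terms, the profiles score well above zero because both roll up to the same movement-abnormality ancestor; this is what lets a patient's lay-worded profile still match a clinician's.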
The Application of the Human Phenotype Ontology (mhaendel)
Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Semantic phenotyping for disease diagnosis and discovery (mhaendel)
Here are a few things to consider about the patient's lower back pain over time:
- Acute vs chronic: Determine if the pain is a new onset (acute) or has been present long-term (chronic). The duration can provide clues.
- Progression: Note if the pain has gotten better, worse or stayed the same over time. Progression may indicate a more serious problem.
- Radiation: Document if the pain radiates anywhere (e.g. legs). Radiating pain can suggest nerve root involvement.
- Relieving/aggravating factors: Identify what makes the pain better or worse (e.g. activity, rest, position). This can help determine the underlying cause.
Phenopackets as applied to variant interpretation (mhaendel)
Phenopackets provide a standardized format for representing phenotypic data in order to make such data more findable, accessible, interoperable, and reusable. The format captures information about entities like patients and organisms, their associated conditions and phenotypes, and evidence for these associations. Phenopackets can be exported in different formats like CSV, JSON, and RDF. They allow complex phenotypes to be described through annotation and composition of terms from ontologies. Tools are being developed to work with phenopackets to enable applications in areas like clinical diagnostics, databases, and journals.
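Since JSON is one of the export formats mentioned above, a minimal phenopacket-like record can be sketched in a few lines. The field names below mirror the general shape described (a subject, phenotypic features typed by ontology terms, evidence), but this is an illustrative simplification, not the official Phenopackets schema.

```python
import json

# Illustrative sketch only: a minimal phenopacket-like record serialized to
# JSON. Field names approximate the concepts described in the text; consult
# phenopackets.org for the real schema.

packet = {
    "id": "example-packet-1",
    "subject": {"id": "patient-1", "taxon": "Homo sapiens"},
    "phenotypic_features": [
        {
            "type": {"id": "HP:0001250", "label": "Seizure"},
            "evidence": [{"code": "clinical observation"}],
        }
    ],
}

serialized = json.dumps(packet, indent=2)  # JSON is one supported export
restored = json.loads(serialized)          # round-trips losslessly
```

The key design point is that each phenotypic feature is an ontology term rather than free text, which is what makes downstream comparison and exchange computable.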
Use of semantic phenotyping to aid disease diagnosis (mhaendel)
This document discusses using semantic phenotyping to aid disease diagnosis. It outlines using ontologies to semantically annotate phenotypes seen in patients, animal models, and genes. This allows computation of semantic similarity between phenotypes to identify potential disease candidates. The document also discusses challenges such as uneven phenotype data distribution and differences in how phenotypes are described across species. It proposes building an integrated cross-species semantic framework called Uberpheno to address these challenges and better leverage animal models for diagnosing rare diseases.
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
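One way to picture the layperson-synonym work is as audience-tagged synonyms sitting alongside the clinical label, so a search can match either vocabulary. The sketch below is a hypothetical data layout, not how the HPO is actually stored (the real ontology uses OBO/OWL synonym annotations); the example term and synonyms are illustrative.

```python
# Hypothetical sketch: each term keeps its clinical label plus synonyms
# tagged by audience, so queries can match medical or layperson wording.

TERMS = {
    "HP:0000365": {
        "label": "Hearing impairment",
        "synonyms": [
            {"text": "Hearing defect", "layperson": False},
            {"text": "Hearing loss", "layperson": True},
        ],
    },
}

def find_terms(query, layperson_only=False):
    """Return IDs of terms whose label or synonym contains `query`.

    With layperson_only=True, only layperson-tagged synonyms are searched,
    approximating a patient-facing search box.
    """
    q = query.lower()
    hits = []
    for tid, term in TERMS.items():
        names = [] if layperson_only else [term["label"]]
        names += [s["text"] for s in term["synonyms"]
                  if s["layperson"] or not layperson_only]
        if any(q in n.lower() for n in names):
            hits.append(tid)
    return hits
```

A patient searching "hearing loss" lands on the same term a clinician reaches via "hearing impairment", which is exactly the accessibility gain the abstract describes.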
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
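The definition above (a short unit of 2-6 bp repeated back-to-back) translates directly into a scan over a sequence. This is a naive illustrative detector, not a production STR caller; the thresholds are the ones stated in the text, and the test sequence is made up.

```python
# Naive sketch of short tandem repeat (STR) detection: for each unit length
# of 2-6 bp, find runs where the unit repeats contiguously at least
# `min_copies` times, reporting (start index, unit, copy count).

def find_strs(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) tuples for tandem runs in `seq`."""
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next window repeats the unit exactly.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past this run
            else:
                i += 1
    return hits
```

For example, `find_strs("AACAGCAGCAGCAGTT")` reports a CAG unit repeated four times starting at position 2.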
Empowering patients by increasing accessibility to clinical terminology (Nicole Vasilevsky)
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The document discusses two scientific studies:
1) A study of identical twins which found that DNA rearrangements increased with age and may help explain immune system aging. Larger rearrangements were found only in twins over 60, and rearrangement frequency correlated with age.
2) A study linking faulty proteins to ovarian cancer and potential for expanding PARP inhibitor treatments to more patients. Additional defective proteins beyond BRCA may be driving ovarian cancer. This could change ovarian cancer treatment dynamics.
A Primer to Bioinformatics: 29 September 2017 (DocSoc2017)
This document provides an introduction and primer to key concepts in bioinformatics. It discusses DNA structure and genes, how bioinformatics uses computer science to solve biological problems like genome sequencing, and the central dogma of DNA transcription and translation into mRNA and protein. It then outlines three tasks - converting a DNA sequence to mRNA, evaluating a sequence for single nucleotide polymorphisms related to sickle cell disease, and using a restriction endonuclease to identify tandem repeats related to Huntington's disease risk.
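The first two exercises (transcription, and spotting a single-nucleotide change) are simple enough to sketch in code. The helper names are my own, and the sequences are only illustrative fragments, not a clinically complete test; real variant calling works against a full reference genome.

```python
# Sketch of two primer-style exercises: transcribing coding-strand DNA to
# mRNA, and locating single-nucleotide differences between a sample and a
# reference fragment. Sequences below are illustrative only.

def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")

def snp_positions(sample: str, reference: str):
    """0-based positions where two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(sample, reference)) if a != b]

reference = "GTGCACCTGACTCCTGAGGAG"  # illustrative reference fragment
sample    = "GTGCACCTGACTCCTGTGGAG"  # same fragment with one substitution
```

Here `snp_positions(sample, reference)` pinpoints the single A-to-T substitution, the kind of one-base change the sickle-cell exercise asks students to find.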
This study examined the relationship between DNA damage and numerical chromosome abnormalities in sperm samples from 45 infertile men. The study found:
1) A significant correlation between the proportion of sperm with numerical chromosome abnormalities and the level of DNA fragmentation.
2) Sperm cells that were chromosomally abnormal were more likely to display DNA damage than those that were normal based on the chromosomes tested.
3) This association was detected not only in samples with elevated rates of chromosome abnormalities, but also in samples with rates in the normal range. The findings suggest DNA fragmentation may be a marker for the presence of chromosome abnormalities in sperm.
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Re... (Databricks)
Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole-genome scale. This has a profound impact on improving our understanding of human diversity, health, and disease. One promising application of WGS is to identify disease-causal genes that can be therapeutically targeted. However, the majority of disease-associated variants are located in non-coding regions or so-called gene deserts, so the exact function and biological consequences of these variants are unknown. In addition, with numerous variants in linkage disequilibrium (LD), the genetic sequence itself is insufficient to infer the likely causal variant(s) among the many variants in a region of association. Studies have shown that the majority of these variants reside in gene regulatory regions, preferentially in cell type-specific enhancers, providing insight into disease relevance. Cutting-edge sequencing technologies that configure 3D genomic structure and build tissue-specific gene regulatory landscapes can link regulatory elements to their target genes. This allows us to associate disease-associated variants with their underlying target genes.
In this talk, we demonstrate a new approach that incorporates 3D genomic structure and the chromatin states of gene regulatory landscapes in a deep learning framework to predict the functions of disease-associated variants and their target genes. This approach can significantly improve our understanding of the functional importance of otherwise uncharacterized genetic variants, allowing us to evaluate and prioritize high-impact variants and their target genes for the development of new drug interventions.
The Genomics Revolution: The Good, The Bad, and The Ugly (UEOP16 Keynote) (Emiliano De Cristofaro)
The document discusses the genomics revolution and its implications for privacy. It outlines the good of genetic testing and medicine, the bad of collecting sensitive genomic data that is hard to anonymize, and the ugly challenges of balancing privacy and the greater good. It then reviews the history of genome sequencing and cost reductions. The remainder summarizes privacy issues like re-identification risks, kin privacy, and challenges of data sharing. It also outlines cryptography techniques being explored to enable private genomic computation and testing on encrypted genomes. Open problems remain around long-term data storage and usability of privacy techniques.
Researchers from the University of Massachusetts Medical School, Institut Curie in Paris and Stanford University studied the structure of the inactive X chromosome, known as the Barr body, in female mammals. They discovered that the Barr body contains two separately packed lobes of condensed inactive DNA separated by a highly repetitive segment of DNA. This suggests the repetitive DNA may play a role in organizing the Barr body. A separate study from the University of Valencia found that females have a protective effect from genetic mutations due to having two X chromosomes, whereas males only have one unprotected X chromosome, helping to explain differences in lifespan between sexes. Understanding normal genetic expression and abnormalities could enable development of specific medical treatments tailored to individuals.
Lecture presented by Dr. Fatma Taha at BIOCHEM Cairo 2014, organized by the Department of Medical Biochemistry and Molecular Biology, Cairo University. BIOCHEM Cairo 2014 is a Scribe event (www.scribeofegypt.com).
This document provides an overview of exome analysis for identifying causal genes for Mendelian disorders. It discusses technological advances that have enabled exome sequencing, key publications in the field, strategies and tools used for data analysis, and exome sequencing service providers. The document is intended as a useful resource for those interested in how exome analysis is used to identify genes underlying Mendelian conditions.
Neuromics is a recognized leader in providing large pharma, biotech, and academic/government labs with 2D and 3D cell-based assays, which are well suited to drug discovery and toxicology studies.
DNA profiling was developed in 1984 by Sir Alec Jeffreys and involves analyzing variable regions of DNA called STRs or microsatellites that differ between individuals. It is used in forensic investigations to identify suspects or link them to crime scenes by comparing a sample to a reference DNA profile. The process involves extracting DNA from samples, analyzing STR regions to develop a profile of allele lengths, and entering it into DNA databases for comparison to other profiles. Some of the largest DNA databases are maintained by governments like the UK's NDNAD and US's CODIS, which help solve crimes but also raise privacy concerns due to retention of profiles.
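The comparison step described above (matching an evidence profile of STR allele lengths against reference profiles) can be sketched as follows. The locus names and repeat counts are invented for illustration, not real CODIS loci, and real forensic matching additionally weighs population allele frequencies and partial profiles.

```python
# Illustrative sketch of STR profile comparison: a profile maps each locus
# to a pair of allele repeat counts, and a match requires identical
# genotypes at every locus the two profiles share.

def same_genotype(a, b):
    """Allele pairs match regardless of order, e.g. (12, 9) == (9, 12)."""
    return sorted(a) == sorted(b)

def matches(evidence, reference):
    """True if the two profiles agree at every locus present in both."""
    shared = evidence.keys() & reference.keys()
    return bool(shared) and all(
        same_genotype(evidence[l], reference[l]) for l in shared
    )

evidence  = {"LOCUS_1": (12, 9), "LOCUS_2": (7, 7)}
suspect_a = {"LOCUS_1": (9, 12), "LOCUS_2": (7, 7)}  # consistent profile
suspect_b = {"LOCUS_1": (9, 12), "LOCUS_2": (7, 8)}  # excluded at LOCUS_2
```

A single mismatched locus excludes a suspect outright, which is why databases like NDNAD and CODIS compare profiles locus by locus.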
1. The document discusses using phenotypes across species to aid in interpreting genomic data from patients and improving diagnosis and treatment.
2. Building comprehensive phenotype databases from multiple sources is challenging due to disparate data on human genes/variants and model organisms.
3. The Monarch Initiative aims to link human diseases to phenotypes in model systems through an ontology-based knowledge base and portal.
4. Incorporating rich phenotypic data can improve variant filtering and interpretation by providing more context for sequencing results.
The Monarch Initiative: From Model Organism to Precision Medicine (mhaendel)
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Disease Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal: a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
Why the world needs phenopacketeers, and how to be onemhaendel
Keynote presented at the the Ninth International Biocuration Conference Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Semantic phenotyping for disease diagnosis and discovery mhaendel
Here are a few things to consider about the patient's lower back pain over time:
- Acute vs chronic: Determine if the pain is a new onset (acute) or has been present long-term (chronic). The duration can provide clues.
- Progression: Note if the pain has gotten better, worse or stayed the same over time. Progression may indicate a more serious problem.
- Radiation: Document if the pain radiates anywhere (e.g. legs). Radiating pain can suggest nerve root involvement.
- Relieving/aggravating factors: Identify what makes the pain better or worse (e.g. activity, rest, position). This can help determine the
Phenopackets as applied to variant interpretation mhaendel
Phenopackets provide a standardized format for representing phenotypic data in order to make such data more findable, accessible, interoperable, and reusable. The format captures information about entities like patients and organisms, their associated conditions and phenotypes, and evidence for these associations. Phenopackets can be exported in different formats like CSV, JSON, and RDF. They allow complex phenotypes to be described through annotation and composition of terms from ontologies. Tools are being developed to work with phenopackets to enable applications in areas like clinical diagnostics, databases, and journals.
Use of semantic phenotyping to aid disease diagnosismhaendel
This document discusses using semantic phenotyping to aid disease diagnosis. It outlines using ontologies to semantically annotate phenotypes seen in patients, animal models, and genes. This allows computation of semantic similarity between phenotypes to identify potential disease candidates. The document also discusses challenges such as uneven phenotype data distribution and differences in how phenotypes are described across species. It proposes building an integrated cross-species semantic framework called Uberpheno to address these challenges and better leverage animal models for diagnosing rare diseases.
Enhancing the Human Phenotype Ontology for Use by the LaypersonNicole Vasilevsky
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
Empowering patients by increasing accessibility to clinical terminologyNicole Vasilevsky
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The document discusses two scientific studies:
1) A study of identical twins which found that DNA rearrangements increased with age and may help explain immune system aging. Larger rearrangements were only in twins over 60, and rearrangements correlated with age.
2) A study linking faulty proteins to ovarian cancer and potential for expanding PARP inhibitor treatments to more patients. Additional defective proteins beyond BRCA may be driving ovarian cancer. This could change ovarian cancer treatment dynamics.
A Primer to Bioinformatics: 29 September 2017DocSoc2017
This document provides an introduction and primer to key concepts in bioinformatics. It discusses DNA structure and genes, how bioinformatics uses computer science to solve biological problems like genome sequencing, and the central dogma of DNA transcription and translation into mRNA and protein. It then outlines three tasks - converting a DNA sequence to mRNA, evaluating a sequence for single nucleotide polymorphisms related to sickle cell disease, and using a restriction endonuclease to identify tandem repeats related to Huntington's disease risk.
This study examined the relationship between DNA damage and numerical chromosome abnormalities in sperm samples from 45 infertile men. The study found:
1) A significant correlation between the proportion of sperm with numerical chromosome abnormalities and the level of DNA fragmentation.
2) Sperm cells that were chromosomally abnormal were more likely to display DNA damage than those that were normal based on the chromosomes tested.
3) This association was detected not only in samples with elevated rates of chromosome abnormalities, but also in samples with rates in the normal range. The findings suggest DNA fragmentation may be a marker for the presence of chromosome abnormalities in sperm.
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Re...Databricks
Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole genome scale. This has profound impact on improving our understanding of human diversity, health, and diseases. One promising application of WGS is to identify disease-causal genes that can be therapeutically targeted. However, majority of disease-associated variants are located in non-coding regions or so-called genetic deserts, thus the exact function and biological consequences of these variants are unknown. In addition, with numerous variants in linkage disequilibrium (LD), genetic sequence itself is insufficient to infer the likely causal variant(s) among many variants in a region of association. Studies have shown that majority of these variants reside in gene regulatory regions and preferentially in cell type-specific enhancers, providing insights into disease relevance. Novel cutting-edge sequencing technologies to configure 3D genomic structure and to build tissue-specific gene regulatory landscapes can link regulatory elements to their targeted genes. This allows us to associate disease-associated variants and their underlying genes targets.
In this talk, we demonstrate a new approach to incorporate 3D genomic structure and chromatin states of gene regulatory landscapes in a deep learning framework to predict functions of disease-associated variants and their targeted genes. This approach can significantly improve our understanding of the functional importance of those otherwise unknown genetics variants. It allows us to evaluate and prioritize high-impact variants and their targeted genes for development of new drug intervention.
The Genomics Revolution: The Good, The Bad, and The Ugly (UEOP16 Keynote)Emiliano De Cristofaro
The document discusses the genomics revolution and its implications for privacy. It outlines the good of genetic testing and medicine, the bad of collecting sensitive genomic data that is hard to anonymize, and the ugly challenges of balancing privacy and the greater good. It then reviews the history of genome sequencing and cost reductions. The remainder summarizes privacy issues like re-identification risks, kin privacy, and challenges of data sharing. It also outlines cryptography techniques being explored to enable private genomic computation and testing on encrypted genomes. Open problems remain around long-term data storage and usability of privacy techniques.
Researchers from the University of Massachusetts Medical School, Institut Curie in Paris and Stanford University studied the structure of the inactive X chromosome, known as the Barr body, in female mammals. They discovered that the Barr body contains two separately packed lobes of condensed inactive DNA separated by a highly repetitive segment of DNA. This suggests the repetitive DNA may play a role in organizing the Barr body. A separate study from the University of Valencia found that females have a protective effect from genetic mutations due to having two X chromosomes, whereas males only have one unprotected X chromosome, helping to explain differences in lifespan between sexes. Understanding normal genetic expression and abnormalities could enable development of specific medical treatments tailored to individuals.
Lecture presented by Dr. Fatma Taha at BIOCHEM Cairo 2014, organized by the Department of Medical Biochemistry and Molecular Biology, Cairo University. BIOCHEM Cairo 2014 is a Scribe event (www.scribeofegypt.com).
This document provides an overview of exome analysis for identifying causal genes for Mendelian disorders. It discusses technological advances that have enabled exome sequencing, key publications in the field, strategies and tools used for data analysis, and exome sequencing service providers. The document is intended as a useful resource for those interested in how exome analysis is used to identify genes underlying Mendelian conditions.
Neuromics is a recognized leader in providing large pharma, biotech, and academic/government labs with 2D and 3D cell-based assays. They are excellent for use in drug discovery and toxicology studies.
DNA profiling was developed in 1984 by Sir Alec Jeffreys and involves analyzing variable regions of DNA called STRs or microsatellites that differ between individuals. It is used in forensic investigations to identify suspects or link them to crime scenes by comparing a sample to a reference DNA profile. The process involves extracting DNA from samples, analyzing STR regions to develop a profile of allele lengths, and entering it into DNA databases for comparison to other profiles. Some of the largest DNA databases are maintained by governments like the UK's NDNAD and US's CODIS, which help solve crimes but also raise privacy concerns due to retention of profiles.
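The profile-comparison step described above can be sketched as follows. This is an illustrative toy, not forensic software: the locus names are real STR loci commonly used in CODIS-style profiling, but the allele values are made up, and real matching also accounts for partial profiles and statistical match probabilities.

```python
# Illustrative sketch of STR-profile comparison. Each profile maps an STR
# locus to its two allele repeat lengths; a match requires agreement at
# every locus typed in both profiles, compared as unordered pairs.

def str_match(sample, reference):
    """True if every locus present in both profiles has identical alleles."""
    shared = set(sample) & set(reference)
    return bool(shared) and all(
        sorted(sample[locus]) == sorted(reference[locus]) for locus in shared
    )

crime_scene = {"D3S1358": (15, 17), "vWA": (14, 18), "FGA": (21, 23)}
suspect     = {"D3S1358": (17, 15), "vWA": (14, 18), "FGA": (21, 23)}
```

Note that allele order within a locus is irrelevant, which is why the pairs are sorted before comparison.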
1. The document discusses using phenotypes across species to aid in interpreting genomic data from patients and improving diagnosis and treatment.
2. Building comprehensive phenotype databases from multiple sources is challenging due to disparate data on human genes/variants and model organisms.
3. The Monarch Initiative aims to link human diseases to phenotypes in model systems through an ontology-based knowledge base and portal.
4. Incorporating rich phenotypic data can improve variant filtering and interpretation by providing more context for sequencing results.
The Monarch Initiative: From Model Organism to Precision Medicine - mhaendel
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool provides clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops models in collaboration with the community in support of the burgeoning genotype-phenotype disease research community. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Disease Program.
Ongoing development in coordination with the Global Alliance for Genetic Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
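The phenotype similarity matching at the core of these tools can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical three-level subsumption hierarchy with made-up term IDs; production systems use the full HPO and information-content-weighted measures (e.g. Resnik similarity) rather than the plain Jaccard overlap shown here.

```python
# Toy ontology-aware profile similarity: two profiles are compared via the
# Jaccard overlap of their ancestor closures, so related-but-distinct terms
# (ptosis vs. cataract) still share credit through common ancestors.
# Term IDs and hierarchy are hypothetical.

PARENTS = {
    "HP:ptosis": {"HP:eyelid_abnormality"},
    "HP:eyelid_abnormality": {"HP:eye_abnormality"},
    "HP:cataract": {"HP:eye_abnormality"},
    "HP:eye_abnormality": set(),
}

def ancestors(term):
    """Return the term plus all of its ancestors in the hierarchy."""
    out = {term}
    for parent in PARENTS.get(term, ()):
        out |= ancestors(parent)
    return out

def profile_similarity(profile_a, profile_b):
    """Jaccard overlap of the ancestor closures of two phenotype profiles."""
    a = set().union(*(ancestors(t) for t in profile_a))
    b = set().union(*(ancestors(t) for t in profile_b))
    return len(a & b) / len(a | b)
```

Here `profile_similarity(["HP:ptosis"], ["HP:cataract"])` is nonzero because both terms subsume to the shared eye-abnormality ancestor, which is exactly what enables cross-species and cross-patient matching despite different annotation granularity.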
openEHR in Research: Linking Health Data with Computational Models - Koray Atalag
My prezo at Medinfo 2017 openEHR Developers Workshop.
The aim was to demonstrate how openEHR supports very advanced research and analytics with examples from computational physiology and biosimulation to create patient-specific decision support.
Informatics and data analytics to support exposome-based discovery - Chirag Patel
The document discusses the need for informatics methods, databases, and standards to support exposome-driven discovery research in a similar way that informatics has supported genomic research. Specifically, it notes that estimates of heritability from twin studies indicate that environmental factors likely play an equally important role as genetics in many traits/diseases. However, the chemical space of the exposome is large and heterogeneous, posing challenges to integrate exposome, genome, and phenome data through approaches like exposome-wide association studies.
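An exposome-wide association scan of the kind mentioned above can be sketched as a loop over exposures, each tested for association with the phenotype. This is a deliberately simplified sketch (raw Pearson correlation, no covariate adjustment or multiple-testing correction, which real EWAS analyses require); exposure names and values are synthetic.

```python
# Hedged sketch of an exposome-wide association scan: correlate a phenotype
# against each measured exposure and rank by absolute association strength.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def ewas_scan(exposures, phenotype):
    """Return (exposure, r) pairs sorted by |r|, strongest first."""
    results = [(name, pearson_r(vals, phenotype))
               for name, vals in exposures.items()]
    return sorted(results, key=lambda t: -abs(t[1]))
```

The same loop structure mirrors a GWAS scan with exposures in place of variants, which is the analogy the document draws between exposome and genome informatics.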
The document discusses using structured phenotype data to improve the interpretation and prioritization of candidate genes from exome sequencing data, particularly for undiagnosed diseases. It outlines current challenges in candidate gene prioritization based on phenotypes alone. It then describes how ontologies can be used to semantically represent and compare phenotypes across species to leverage knowledge from model organisms. The document presents results showing that combining phenotype data with variant data using a tool called PhenIX improves the ability to correctly prioritize candidate genes from exome data compared to using variant data alone. This demonstrates the utility of structured phenotype data for computational analysis of exomes to diagnose rare diseases.
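The combination of phenotype and variant evidence described above can be sketched as a score fusion. This is a minimal illustration, not PhenIX's actual scoring scheme (a plain average is used here for clarity; the real tool's weighting differs), and all gene names and scores are hypothetical.

```python
# Sketch of PhenIX-style candidate ranking: each gene carries a variant score
# (pathogenicity/frequency-based) and a phenotype score (similarity of the
# patient's profile to known gene-phenotype associations); candidates are
# ranked by a combined score. Weighting here is a plain average.

def rank_candidates(variant_scores, phenotype_scores):
    """Rank genes by the mean of their variant and phenotype scores (0-1)."""
    combined = {
        gene: (variant_scores[gene] + phenotype_scores.get(gene, 0.0)) / 2
        for gene in variant_scores
    }
    return sorted(combined.items(), key=lambda kv: -kv[1])

variants   = {"GENE_A": 0.9, "GENE_B": 0.95, "GENE_C": 0.4}   # hypothetical
phenotypes = {"GENE_A": 0.8, "GENE_B": 0.1,  "GENE_C": 0.7}
```

Note how `GENE_B` has the strongest variant evidence but drops in the combined ranking because its phenotype match is poor, which is the effect the document reports: phenotype data re-orders candidates that variant data alone cannot distinguish.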
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... - mhaendel
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their medical history; these patients are not only some of the most engaged, but can also themselves provide data for use in clinical evaluation. We therefore created a lay-person version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
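The lay-synonym mechanism can be sketched as a lookup from patient phrasing to clinical terms. The synonym table below is a tiny illustrative stand-in for the real lay-HPO synonym layer, which is far larger; the HPO IDs shown are believed correct but are included for illustration only.

```python
# Illustrative lookup from layperson phrasing to clinical HPO terms.
# The real lay-HPO synonym layer covers thousands of terms; this toy table
# shows the mechanism only.

LAY_SYNONYMS = {
    "droopy eyelids": "HP:0000508",   # Ptosis
    "clouded lens":   "HP:0000518",   # Cataract
    "curved spine":   "HP:0002650",   # Scoliosis
}

def encode_lay_profile(phrases):
    """Translate patient-entered phrases into HPO IDs, skipping unknowns."""
    return [LAY_SYNONYMS[p.lower().strip()] for p in phrases
            if p.lower().strip() in LAY_SYNONYMS]
```

A patient-entered profile encoded this way becomes directly comparable, via semantic similarity, with clinician-entered profiles using the same ontology.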
The Software and Data Licensing Solution: Not Your Dad’s UBMTA - mhaendel
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals
recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder - mhaendel
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources subject to concept drift and annotation differences, such as diseases. The syntactic issue is that so many variations of the same identifier exist, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for infrastructure that handles identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing "beacons" with respect to their identifiers.
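The CURIE mechanics behind this can be sketched in a few lines. This is a simplified sketch of JSON-LD-style prefix expansion: the two prefix-to-IRI mappings follow real registry conventions (OBO PURLs, OMIM entry URLs), but the resolution logic is deliberately minimal and handles none of the edge cases a real resolver must.

```python
# Minimal sketch of CURIE handling with a JSON-LD-style prefix context.
# Expansion turns "HP:0000508" into a full IRI; contraction does the reverse.

CONTEXT = {
    "HP":   "http://purl.obolibrary.org/obo/HP_",
    "OMIM": "https://omim.org/entry/",
}

def expand_curie(curie):
    """Expand a prefixed identifier (CURIE) to a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in CONTEXT or not local:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return CONTEXT[prefix] + local

def contract_iri(iri):
    """Contract a full IRI back to its CURIE form, if a prefix matches."""
    for prefix, base in CONTEXT.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri
```

Keeping the prefix map in a shared, versioned registry is what makes identifiers from different sources joinable: two resources agreeing on the context agree on what `HP:0000508` denotes.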
Building (and traveling) the data-brick road: A report from the front lines ... - mhaendel
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data or minting of new ones for the data and/or within it, and manipulating the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, where in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting - loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
Reusable data for biomedicine: A data licensing odyssey - mhaendel
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses are particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data project aims to promote standardization and ease of reuse licensing practices by data providers.
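The five-part rubric and star scale described above can be sketched as a simple scoring function. The criterion names below merely paraphrase the (Re)usable Data Project's themes and the equal weighting is an assumption for illustration; the actual rubric's criteria and scoring rules should be consulted directly.

```python
# Hedged sketch of rubric-style scoring: each of five criteria contributes
# up to one star toward a 0-5 total. Criterion names and equal weighting
# are illustrative, not the project's actual rubric.

CRITERIA = ["license_findable", "license_standard", "scope_clear",
            "no_restrictions", "redistribution_allowed"]

def star_score(assessment):
    """Map per-criterion results (0.0-1.0 each) to a 0-5 star score."""
    return round(sum(assessment.get(c, 0.0) for c in CRITERIA), 1)

# A resource with a findable, standard license but restrictive provisions:
resource = {"license_findable": 1.0, "license_standard": 1.0,
            "scope_clear": 0.5, "no_restrictions": 0.0,
            "redistribution_allowed": 0.0}
```

Under this toy scheme the example resource lands at 2.5 stars, the boundary below which the abstract reports roughly half of the evaluated sources fall.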
How open is open? An evaluation rubric for public knowledgebases - mhaendel
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases necessarily makes us question our priorities with respect to maintaining them, developing new ones, or senescing/subsuming ones that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable, but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to their key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
This document discusses making scientific data fair, open, and reusable. It defines the FAIR guiding principles of findable, accessible, interoperable and reusable data and describes what each principle entails. It then expands on these principles by introducing FAIR-TLC, which adds the dimensions of traceable, licensed and connected. The document argues that adopting FAIR-TLC practices and developing tools to support them can help improve the sharing and reuse of scientific data. It also suggests ways to incentivize open science through funding and publication requirements.
Credit where credit is due: acknowledging all types of contributions - mhaendel
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
On the frontier of genotype-2-phenotype data integration - mhaendel
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Envisioning a world where everyone helps solve disease - mhaendel
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and how so very many people unbeknownst to the patient or even to the researchers creating the data are involved in a diagnosis.
Getting (and giving) credit for all that we do - mhaendel
This document discusses the need to give proper attribution and credit to all contributions in the research process, not just authorship of publications. It notes that many roles and outputs are not adequately recognized currently. It introduces the open Research Information Framework (openRIF) which aims to develop ontologies and tools to connect people to their diverse research outputs and roles through interoperable systems in order to ensure proper attribution for all.
Integrating clinical and model organism G2P data for disease discovery - mhaendel
This document discusses challenges in integrating clinical and model organism genotype-phenotype data to improve disease discovery. It notes there are many variants of unknown significance and phenotypes are not well represented across vocabularies. The document describes using the Human Phenotype Ontology and semantic techniques to standardize and bridge vocabularies and phenotypes across species. This can help compare phenotypic profiles for variant prioritization and disease discovery. Standardizing data and ontologies across species in a graph database is described as a way to propagate evidence and evaluate schemas being developed by GA4GH.
Force11: Enabling transparency and efficiency in the research landscape - mhaendel
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Dataset description using the W3C HCLS standard - mhaendel
This talk was presented at the BioCaddie http://biocaddie.org/ workshop at the Force15 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to increase awareness of why a Semantic Web-compliant standard was needed for describing data, where current standards fall short, and how this new emerging standard that extends prior efforts can aid data discovery and integration. This work is being lead by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Standardizing scholarly output with the VIVO ontology - mhaendel
The document discusses standardizing scholarly output by creating a semantic representation of research activities and products using VIVO-ISF. This would enable identifying potential collaborators and expertise across disciplines. VIVO-ISF can integrate data from different research profiling systems and sources to provide a standardized view. Integrating clinical, research, and publication data from multiple institutions using VIVO-ISF can help answer questions about expertise, collaboration, and identifying advisors.
Immersive Learning That Works: Research Grounding and Paths Forward - Leonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and 'Immersion Cube' frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences, spotlighting research frontiers along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
When I was asked to give a companion lecture in support of 'The Philosophy of Science' (https://shorturl.at/4pUXz), I decided not to walk through the details of the many methodologies in order of use. Instead, I chose to employ a long-standing, and ongoing, scientific development as an exemplar. And so, I chose the ever-evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ... - Travis Hills MN
By harnessing the power of High Flux Vacuum Membrane Distillation, Travis Hills from MN envisions a future where clean and safe drinking water is accessible to all, regardless of geographical location or economic status.
ESR spectroscopy in liquid food and beverages.pptx - PRIYANKA PATEL
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food does not harm human health, quality assessment of the food is still required to provide consumers with the necessary information about it. ESR spectroscopy is the most sophisticated way to investigate the quality of food and the free radicals induced during its processing. The ESR spin trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid food and beverages is mainly assessed by the spin trapping technique.
The cost of acquiring information by natural selection - Carl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdf - Selcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
With many thanks
Lawrence Berkeley
Chris Mungall
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Nicole Washington
Charite
Sebastian Kohler
Garvan
Tudor Groza
Craig McNamara
RENCI
Jim Balhoff
Boston Children’s
Ingrid Holm
Catherine Brownstein
John Brownstein
EBI
Helen Parkinson
David Osumi-Sutherland
OHSU
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Nicole Vasilevsky
Dan Keith
Maureen Hoatlin
Tim Putman
JP Gourdine
David Ellison
Genomics England/Queen Mary
Damian Smedley
Jules Jacobson
Tomasz Konopka
Pilar Cacheiro
Jackson Laboratory
Peter Robinson
Leigh Carmody
Hannah Blau
With special thanks to Julie McMurry for excellent graphic design
Johns Hopkins
Chris Chute
Casey Overby
Ada Hamosh
Scripps
Andrew Su
Ben Good
Chunlei Wu
Gregg Stupp
Sanford Health Imagenetics
Neal Boerkoel
Kayli Rageth
Murat Sincan
ClinGen
Heidi Rehm
Larry Babb
Harindra Arachchi