Information Genetic Content (IGC): a comprehensive discovery platform for disease-gene research association

We developed Information Genetic Content (IGC), a comprehensive knowledgebase and discovery tool for research on human genes and genetic disorders. IGC comprises three components: the Disease-Association Database (DAD), the Gene Scoring Algorithm (GSA), and the Virtual Panel Library (VPL). The DAD module contains over 400,000 associations between over 17,000 genes and 15,000 Mendelian and complex diseases, drawn from both expert-curated and text-mined data. The DAD module also organizes human diseases hierarchically using a UMLS-controlled vocabulary, so that queries can be made at any level of the disease ontology hierarchy. The GSA module prioritizes genes for a specific disease of interest. Its scoring algorithm is distinctive in that it combines the strength of association with the number of associated diseases to provide an unbiased score for each gene. Together with the DAD module, the GSA module can produce a ranked list of genes for one or more diseases at any level of the disease hierarchy. The VPL module generates optimal gene groupings by disease classification using hierarchical-clustering-based network analysis. Genes involved in the same pathological pathways are grouped into the same cluster.
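The GSA description says genes are scored by combining association strength with the number of associated diseases, at any level of the disease hierarchy, but the exact rank-weighted sum score (RWSS) formula is not given here. The sketch below is therefore only an illustration under stated assumptions: the hierarchy and association scores are invented toy data, the helpers `descendants`, `rank_weighted_sum`, and `rank_genes` are hypothetical names, and the 1/rank weighting is an assumed stand-in for the real RWSS. It expands each query disease to all of its descendants, then scores every gene from the associations it has anywhere in that subtree.

```python
from collections import defaultdict

# Hypothetical toy disease hierarchy (parent -> children). IGC takes these
# parent-child links from UMLS; the names here are illustrative only.
HIERARCHY = {
    "Nervous System Diseases": ["Epilepsy", "Neurodegenerative Diseases"],
    "Neurodegenerative Diseases": ["Alzheimer Disease", "Parkinson Disease"],
}

# Hypothetical gene-disease association scores in [0, 1] (DisGeNET-style);
# the values are invented for the example.
ASSOCIATIONS = {
    ("SCN1A", "Epilepsy"): 0.9,
    ("APP", "Alzheimer Disease"): 0.8,
    ("SNCA", "Parkinson Disease"): 0.7,
    ("APOE", "Alzheimer Disease"): 0.6,
    ("APOE", "Parkinson Disease"): 0.3,
}


def descendants(disease, hierarchy):
    """Return the disease itself plus every disease below it in the hierarchy."""
    found, stack = set(), [disease]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(hierarchy.get(node, []))
    return found


def rank_weighted_sum(scores):
    """ASSUMED RWSS form (not taken from the poster): sort a gene's scores
    from strongest to weakest and weight the k-th one by 1/k before summing."""
    ordered = sorted(scores, reverse=True)
    return sum(score / rank for rank, score in enumerate(ordered, start=1))


def rank_genes(diseases, hierarchy, associations):
    """Rank genes for one or more query diseases at any hierarchy level."""
    targets = set()
    for disease in diseases:
        targets |= descendants(disease, hierarchy)
    per_gene = defaultdict(list)
    for (gene, disease), score in associations.items():
        if disease in targets:
            per_gene[gene].append(score)
    scored = {gene: rank_weighted_sum(s) for gene, s in per_gene.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


for gene, score in rank_genes(["Neurodegenerative Diseases"], HIERARCHY, ASSOCIATIONS):
    print(f"{gene}\t{score:.2f}")
```

Querying "Neurodegenerative Diseases" prints APP (0.80), APOE (0.75), and SNCA (0.70) in that order: under this assumed weighting, APOE's two weaker associations (0.6 + 0.3/2) outrank SNCA's single 0.7, while APP's single 0.8 still comes first. Dividing by rank is one way to let additional associations count without letting many weak ones dominate.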

Published in: Science

Yun Zhu, Emily Williams, Yuan Tian, Carol Munroe, John Bucci, Yutao Fu, Fiona Hyland, and Corina Shtir
Clinical Next-Gen Seq Division, Thermo Fisher Scientific Inc., 5781 Van Allen Way, Carlsbad, CA 92008, U.S.A.

INTRODUCTION
The identification of disease-associated genes is an important step toward understanding disease mechanisms and toward future diagnosis and therapy. However, because the problem is complex and distributed, current scientific knowledge is spread across several overlapping databases maintained by independent groups, and it is unclear how gene-disease research associations should be ranked. To fill this gap, we developed Information Genetic Content (IGC), a comprehensive knowledgebase and discovery tool for research on human genes and genetic disorders. IGC is unique in two respects. First, it integrates data from multiple databases into one system. Second, it provides an unbiased scoring algorithm that ranks gene-disease research associations at any level of the disease ontology hierarchy.

METHODS

Figure 1. Overview of IGC framework (diagram components: DisGeNET Database, Disease of interest, Rank-Weighted Sum Score (RWSS), Cluster Groups)

Figure 2. Disease-Association Database (DAD) maps genes to diseases
• DAD contains over 400,000 associations between over 17,000 genes and 15,000 Mendelian and complex diseases, drawn from both expert-curated and text-mined data.
• DAD establishes gene-disease relationships based on DisGeNET¹, which scores gene-disease associations according to expert-curated sources (e.g., CTD, ClinVar, and Orphanet), data predicted from mouse models, and text mining of publications.
• DAD organizes diseases into an effective hierarchical structure for lookup, using the disease parent-child relationships established in the NIH Unified Medical Language System (UMLS).
Blue circles: two neurological diseases, schizophrenia and bipolar disorder. Green circles: genes associated with these two diseases.

Figure 3. Gene Scoring Algorithm (GSA)
• For any disease in the hierarchical tree, the GSA computes the rank-weighted sum score (RWSS), which summarizes the strength of each gene's association with all of that disease's child diseases.
• RWSS is an unbiased gene scoring method that accounts for both the strength and the number of gene-disease pairs.

RESULTS

Figure 4. Gene scoring in multiple disease hierarchies (hierarchy Levels 1 to 4)
• The GSA module uses the RWSS method to prioritize genes for a specific disease of interest.
• In conjunction with the DAD module, the GSA module can produce a list of ranked genes for one or more diseases at any level of the disease hierarchy.

Figure 5. Gene clustering identified 28 VPLs that can be well defined by disease classifications. From the top 5,000 clinically relevant genes identified by the GSA, 28 gene clusters were identified using the WGCNA algorithm³.
A) Hierarchical clustering of genes according to their association patterns with 16 high-level MeSH categories relevant to inherited diseases.
B) Gene cluster association scores with the 16 MeSH disease categories, shown with p-values.

Disease key (MeSH category: description)
C04 Neoplasms
C05 Musculoskeletal Diseases
C06 Digestive System Diseases
C07 Stomatognathic Diseases
C08 Respiratory Tract Diseases
C09 Otorhinolaryngologic Diseases
C10 Nervous System Diseases
C11 Eye Diseases
C12 Male Urogenital Diseases
C13 Female Urogenital Diseases and Pregnancy Complications
C14 Cardiovascular Diseases
C15 Hemic and Lymphatic Diseases
C16 Congenital, Hereditary, and Neonatal Diseases and Abnormalities
C17 Skin and Connective Tissue Diseases
C18 Nutritional and Metabolic Diseases
C19 Endocrine System Diseases
C20 Immune System Diseases

Table 1. Disease annotation for the 28 identified gene clusters.
Module | Module color | Gene count | Disease annotation
1 | turquoise | 530 | Nervous System Diseases
2 | blue | 321 | Nutritional and Metabolic Diseases
3 | brown | 307 | Cardiovascular Diseases
4 | yellow | 280 | Digestive System Diseases
5 | green | 253 | Eye Diseases
6 | red | 250 | Skin and Connective Tissue Diseases
7 | black | 229 | Male and Female Urogenital Diseases
8 | pink | 205 | Musculoskeletal Diseases
9 | magenta | 164 | Nervous System Diseases; Nutritional and Metabolic Diseases
10 | purple | 150 | Hemic and Lymphatic Diseases
11 | greenyellow | 140 | Musculoskeletal Diseases; Nervous System Diseases
12 | tan | 137 | Neoplasms
13 | salmon | 129 | Respiratory Tract Diseases
14 | cyan | 111 | Otorhinolaryngologic Diseases; Nervous System Diseases
15 | midnightblue | 90 | Male Urogenital Diseases
16 | lightcyan | 87 | Immune System Diseases; Male Urogenital Diseases; Female Urogenital Diseases and Pregnancy Complications
17 | grey60 | 76 | Stomatognathic Diseases
18 | lightgreen | 69 | Hemic and Lymphatic Diseases; Immune System Diseases
19 | lightyellow | 67 | Female Urogenital Diseases and Pregnancy Complications; Endocrine System Diseases
20 | royalblue | 63 | Female Urogenital Diseases and Pregnancy Complications
21 | darkred | 61 | Musculoskeletal Diseases; Skin and Connective Tissue Diseases
22 | darkgreen | 60 | Musculoskeletal Diseases; Stomatognathic Diseases
23 | darkgrey | 55 | Female and Male Urogenital Diseases; Nutritional and Metabolic Diseases
24 | darkturquoise | 55 | Nutritional and Metabolic Diseases; Endocrine System Diseases
25 | darkorange | 36 | Musculoskeletal Diseases; Cardiovascular Diseases
26 | orange | 36 | Immune System Diseases
27 | white | 35 | Endocrine System Diseases
28 | skyblue | 34 | Immune System Diseases; Skin and Connective Tissue Diseases

CONCLUSIONS
We created a comprehensive, efficient, and informative engine, the IGC, to optimize gene selection for diseases at any level of the disease ontology hierarchy:
• The DAD organizes diseases into an effective hierarchical structure for lookup and associates diseases with genes.
• The GSA ranks genes by clinical relevance and summarizes the scores for a disease at any level of the hierarchy.
• The VPL efficiently groups genes into pools by disease classification and further ranks the genes within each cluster by their relative importance to the diseases.

REFERENCES
1. Piñero J, Queralt-Rosinach N, Bravo A, et al. (2015) DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database 2015:bav028.
2. Bodenreider O (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 32(Database issue):D267-D270.
3. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559.

For Research Use Only. Not for use in diagnostic procedures. © 2016 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified.
Thermo Fisher Scientific • 5781 Van Allen Way • Carlsbad, CA 92008 • thermofisher.com
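Figure 5 and Table 1 come from clustering top-ranked genes by their association profiles across MeSH disease categories. WGCNA³ is an R package that typically builds a soft-thresholded adjacency, converts it to a topological-overlap dissimilarity, and cuts an average-linkage tree dynamically. The Python sketch below deliberately swaps all of that for plain average-linkage agglomerative clustering on a correlation distance (1 - Pearson r) with a fixed cut-off, only to show how genes with similar disease-category profiles end up in the same group. The gene names, the four-category profiles, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical association profiles: one score per disease category.
# (The poster uses 16 MeSH categories; these four-value rows are invented.)
PROFILES = {
    "GENE_A": [0.9, 0.1, 0.0, 0.2],
    "GENE_B": [0.8, 0.2, 0.1, 0.1],
    "GENE_C": [0.1, 0.9, 0.7, 0.0],
    "GENE_D": [0.0, 0.8, 0.9, 0.1],
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def distance(x, y):
    """Correlation distance: 0 for perfectly correlated profiles, 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles, threshold):
    """Repeatedly merge the two clusters with the smallest mean pairwise
    distance until that distance exceeds threshold. This is plain average
    linkage, not WGCNA's topological-overlap and dynamic-tree-cut pipeline."""
    clusters = [[gene] for gene in profiles]

    def linkage(a, b):
        dists = [distance(profiles[g], profiles[h]) for g in a for h in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations yields i < j, so index i is unchanged
    return sorted(sorted(c) for c in clusters)


print(average_linkage(PROFILES, threshold=0.5))
# prints [['GENE_A', 'GENE_B'], ['GENE_C', 'GENE_D']]
```

GENE_A and GENE_B, whose scores are dominated by the first category, form one group, and GENE_C and GENE_D, which share the second and third categories, form another. Each resulting group can then be labeled by the categories its members share, in the spirit of the disease annotations in Table 1.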
