Identify Disease-Associated Genetic Variants Via 3D Genomics Structure and Regulatory Landscapes Using Deep Learning Frameworks with Yi-Hsiang Hsu and Yongsheng Huang

Whole genome sequencing (WGS) has enabled us to quantify human genomic variation at whole-genome scale. This has had a profound impact on our understanding of human diversity, health, and disease. One promising application of WGS is identifying disease-causal genes that can be targeted therapeutically. However, the majority of disease-associated variants are located in non-coding regions or so-called gene deserts, so their exact functions and biological consequences are unknown. In addition, because numerous variants are in linkage disequilibrium (LD), the genetic sequence by itself is insufficient to pinpoint the likely causal variant(s) among the many variants in an associated region. Studies have shown that most of these variants reside in gene regulatory regions, preferentially in cell type-specific enhancers, which provides insight into their disease relevance. New sequencing technologies that map 3D genome structure and build tissue-specific gene regulatory landscapes can link regulatory elements to the genes they target. This allows us to connect disease-associated variants to their underlying target genes.

In this talk, we demonstrate a new approach that incorporates 3D genome structure and the chromatin states of gene regulatory landscapes into a deep learning framework to predict the functions of disease-associated variants and their target genes. This approach can significantly improve our understanding of the functional importance of otherwise uncharacterized genetic variants, allowing us to evaluate and prioritize high-impact variants and their target genes for the development of new drug interventions.
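The abstract notes that variants in linkage disequilibrium (LD) cannot be told apart by sequence alone. LD between two biallelic SNPs is commonly summarized by r², computed from phased haplotypes as r² = D² / (pA(1 − pA)·pB(1 − pB)), where D = pAB − pA·pB. The sketch below is a minimal illustration assuming phased haplotypes with alleles coded 0/1; the function name `ld_r2` is hypothetical. When r² is close to 1, a causal SNP and its neighbor produce nearly identical association signals, which is why functional data are needed to choose between them.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs; a and b are the alleles (coded 0/1)
    carried at SNP 1 and SNP 2 on the same chromosome copy.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Perfectly correlated SNPs: indistinguishable by association alone.
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))
```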

  1. Identify Disease-Causal Genes from GWAS Loci by 3D Genome Structure, Regulatory Landscapes & Deep Learning. YongSheng Huang, Ph.D.; Yi-Hsiang Hsu, MD, ScD
  2. Deep Learning: The Inspiration. $\mathbf{w}^{T}\mathbf{x} + b$. “The deepest concepts in mathematics are those which link one world of ideas with another.” (Freeman Dyson)
  3. Deep Learning: The Natural Form (Science, Nov 2013)
  4. Deep Learning: The Renaissance. “In the 1960s, … believed that a workable artificial intelligence system was just 10 years away. In the 1980s, a wave of commercial start-ups collapsed, leading to what some people called the ‘A.I. winter.’ But recent achievements have impressed … In October, for example, a team of graduate students studying with the University of Toronto computer scientist Geoffrey E. Hinton won the top prize in a contest sponsored by Merck to design software to help find molecules that might lead to new drugs.” (“Scientists See Promise in Deep-Learning Programs,” by John Markoff, Nov. 23, 2012)
  5. Deep Learning: Impact on Medicine • Performance on par with 21 board-certified dermatologists (Nature, Feb 2017) • >90% sensitivity and specificity, comparable to board-certified ophthalmologists • Arterys’ Cardio DL wins FDA approval for clinical diagnosis (10 seconds vs. 1 hour)
  6. Deep Learning: The New Disruption. Can we leverage DL to identify disease-causal genetic variants, so that we can treat diseases at their root for each individual patient?
  7. Yi-Hsiang Hsu, MD, ScD (yihsianghsu@hsl.harvard.edu; yihsiang@broadinstitute.org) • Director & Associate Professor, HSL GeriOmics Center, Harvard Medical School • Program for Quantitative Genomics, Harvard School of Public Health • Associate Member, Broad Institute of MIT and Harvard • NHLBI Framingham Heart Study Investigator
  8. Genome-Wide Association Studies (GWAS) Catalog, Y-H Hsu • Identified ~13,000 genetic variants (single-nucleotide mutations/polymorphisms) associated with ~2,000 diseases/phenotypes ?
  9. Genome-Wide Association Scans
  10. GWAS (Whole Genome Association) Scans • Study design: 10,000 to 500,000 samples, each with 5 million genetic variant markers, up to all 3 billion DNA bases • Platforms: (1) genotyping SNP arrays/chips; (2) NGS whole-genome sequencing
  11. % Successfully Approved Drugs & Human Genetics (Nature Genetics, 2015; 47: 856–860) • FDA-approved drugs with supporting human genetic information are 5–10× more likely to be successful • Targets that fail at each drug development stage (preclinical, phase I, II, III) are more likely to be those without genetic validation • The impact of GWAS on medical care could be substantial
  12. R&D Spending on New Drugs ≠ Drug Approvals • Need a better drug development pipeline • Utilizing human genetic information/validation is the key
  13. Genome-Wide Association Studies (GWAS) Catalog • Identified ~13,000 genetic variants (single-nucleotide mutations/polymorphisms) associated with ~2,000 diseases/phenotypes
  14. Genome-Wide Association Studies (GWAS) Catalog • Identified ~13,000 genetic variants (single-nucleotide mutations/polymorphisms) associated with ~2,000 diseases/phenotypes • 91% of disease-associated genetic variants are located in non-protein-coding regions, once called “junk DNA” • Their function is unknown, so findings are difficult to translate into clinical use
  15. rs66800491 (Motion Sickness): Associated Variants Located in a Gene Desert • Genomic coordinates: 1D physical location on the linear DNA sequence
  16. Too Many Genes: Which Gene(s)? (Osteoporosis)
  17. FTO Gene Locus (Obesity): Associated Variants Located in Introns. Looks Promising?
  18. Functional Genomics Approaches: Tissue-Specific Active Enhancers Predicted by Histone Marks (H3K27ac, H3K4me1) and p300 (NEJM, 2016)
  19. 3D Genome Interaction Structure with the IRX5 Gene (NEJM, 2016) • Tissue-specific chromatin conformation capture (3C technology) • eQTLs (associations between variants and gene expression) • Allele-specific expression (figure panels: eQTLs; intensity of 3D physical interaction by Hi-C seq; TAD plot; 2 Mb scale)
  20. FTO Genetic Variants and IRX5 Gene Regulation • Obesity-associated genetic variants disrupt TF binding and thereby increase IRX5 gene expression (figure labels: mutations, polymorphisms, IRX5 enhancers, wild type, obesity subjects, healthy subjects)
  21. Gene Editing: Functional Validation (NEJM, 2016) • Gene editing by CRISPR/Cas9 in human adipocytes from subjects carrying the “risk allele” and subjects carrying the “protective allele” • The risk allele C: gain-of-function
  22. FTO Variants Link to the Irx3 Gene in Brain (Nature, 2014) • The obesity-associated variants physically interact with the promoter of Irx3, but not Fto or Irx5, in mouse brain by 4C-seq • 4C-seq: regional chromatin conformation capture (3C technology)
  23. Gene Regulatory Models: Tissue (Cell)-Specific DNA Loops and Enhancer-Promoter Interactions (Nature, 2009; 461: 199–205) • Gene regulatory elements come into physical proximity (3D space) with gene promoters via looping mechanisms
  24. Identify/Predict Target Genes Genome-Wide? • Identified ~13,000 genetic variants (single-nucleotide mutations/polymorphisms) associated with ~2,000 diseases/phenotypes • 91% of disease-associated genetic variants are located in non-protein-coding regions, once called “junk DNA” • Their function is unknown, so findings are difficult to translate into clinical use • They may be involved in tissue/cell type-specific gene regulation
  25. Chromosome Conformation Capture to Identify DNA Loops (Science 2009; 326(5950): 289–293. Nat Rev Genet 2010; 11(6): 439–446. Cell 2014; 159(7): 1665–1680. Nature Genetics 2016; 48: 488–496) • 3C, 4C, 5C, Hi-C, capture Hi-C, etc. estimate 3D interactions across the genome (figure: Hi-C seq contact map; loop domains; physical interactions, i.e. enhancer-promoter, enhancer-enhancer, promoter-promoter; false positives from sequencing errors, mismatched cutting, …; prediction)
  26. Building Tissue-Specific Gene Regulatory Circuits (figure: Hi-C and ATAC-seq)
  27. Building Gene Regulatory Circuits on Human Heart
      • Omics experiments on normal human primary cardiac fibroblasts and myocytes from atrium and ventricle; HMSC; skeletal muscle cells
      • Publicly available (low resolution): left ventricle, right ventricle and aorta tissues
      Experiments, functions and notes:
      • ATAC-seq: active cis-regulatory regions; active TF binding
      • Hi-C: chromatin conformation capture (1.5 to 2 kb resolution; DpnII, 2 billion reads, 2 Tb)
      • ChIP-seq: H3K4me3: active promoter; H3K27ac: active enhancer/promoter; CTCF: insulator; cohesin (RAD21): insulator; cohesin (SMC3): insulator; H3K27me3: Polycomb-repressed/bivalent promoter/enhancer; H3K9me3: heterochromatin; H3K36me3: transcribed region (chromatin states predicted by HMM)
      • RNA-seq: mRNA (active and ) isoforms; coexpression with TFs; microRNA and small RNA; enhancer RNA
  28. Model Gene Regulation with a Deep Neural Network (DNN) • DNN implemented in TensorFlow to predict enhancer-promoter (EP) gene pairs from motifs, ChIP-seq chromatin states, a TSS distance matrix, …, and the Hi-C contact matrix • Training sets (VISTA; enhancer elements within 100 kb of genes): 1,564 EP gene pairs functionally validated to have regulatory relationships in mouse models (the positive set) and 1,207 EP pairs without regulatory relationships (the negative set)
  29. Acknowledgements: TwinsUK • Yi-Hsiang Hsu, MD, ScD (yihsianghsu@hsl.harvard.edu; yihsiang@broadinstitute.org)
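Slide 2 shows the affine map w^T x + b that every artificial neuron computes. A minimal sketch of one neuron, assuming a sigmoid activation (the slide does not name one); the function name `neuron` is hypothetical.

```python
import math

def neuron(w, x, b):
    """Output of one artificial neuron: sigmoid(w^T x + b)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # affine part shown on the slide
    # Sigmoid activation (an assumption), written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

print(neuron([0.5, -1.0], [2.0, 1.0], 0.0))  # w^T x + b = 0, so this prints 0.5
```

Stacking many such units into layers, and learning w and b from data, is what turns this single formula into a deep network.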
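Slide 10 outlines a GWAS scan over thousands of samples and millions of variants; at its core, each variant is tested separately for association with disease. The sketch below uses the simple allelic 2×2 chi-square test on case/control allele counts. This is an illustrative choice, not the talk's pipeline: real analyses more often use logistic regression with covariates such as ancestry principal components. The 5 × 10⁻⁸ cutoff is the widely used genome-wide significance convention (roughly a Bonferroni correction for ~1 million independent tests), and the counts in the usage line are made up.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table
    of allele counts: rows = cases/controls, columns = alt/ref allele."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

# Made-up counts: alt-allele frequency 0.30 in cases vs. 0.25 in controls.
chi2, p = allelic_chi2(6000, 14000, 5000, 15000)
print(p < GENOME_WIDE_ALPHA)
```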
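Slide 16 asks which of many nearby genes a GWAS signal acts through. A common first pass simply lists genes whose transcription start site (TSS) lies within a fixed window of the lead SNP. The sketch below assumes a ±500 kb window, and the gene names and coordinates are hypothetical. It illustrates why positional mapping alone is ambiguous: several candidates appear, and the nearest one is not necessarily causal (the FTO locus in slides 17 to 22 is a case in point).

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site coordinate

def genes_near(chrom, pos, genes, window=500_000):
    """(distance, gene name) for genes with a TSS within +/- window bp of pos, nearest first."""
    hits = [(abs(g.tss - pos), g.name) for g in genes
            if g.chrom == chrom and abs(g.tss - pos) <= window]
    return sorted(hits)

# Hypothetical genes around a lead SNP at chr1:1,100,000.
genes = [Gene("GENE_A", "chr1", 1_000_000), Gene("GENE_B", "chr1", 1_300_000),
         Gene("GENE_C", "chr1", 2_000_000), Gene("GENE_D", "chr2", 1_050_000)]
print(genes_near("chr1", 1_100_000, genes))
```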
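Slide 18 predicts tissue-specific active enhancers from histone marks (H3K27ac, H3K4me1) and p300, and slide 27 lists H3K4me3 as a promoter mark. The rule-based labeler below is a simplified sketch of that logic, assuming each region comes with the set of marks whose peaks overlap it; the exact rules are a common heuristic chosen for illustration, not the authors' criteria.

```python
# Marks taken here as evidence that an H3K4me1-marked region is active (assumption).
ACTIVE_ENHANCER_MARKS = {"H3K27ac", "p300"}

def classify_region(marks):
    """Rough label for a genomic region from the marks whose peaks overlap it."""
    has = set(marks)
    if "H3K4me3" in has:
        return "promoter-like"      # H3K4me3 marks active promoters
    if "H3K4me1" in has and has & ACTIVE_ENHANCER_MARKS:
        return "active enhancer"    # enhancer mark plus an activity mark
    if "H3K4me1" in has:
        return "primed enhancer"    # enhancer mark without an activity mark
    return "unannotated"

print(classify_region({"H3K27ac", "H3K4me1"}))
```

Because histone ChIP-seq is done per tissue or cell type, running the same rules on data from different tissues yields tissue-specific enhancer maps.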
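Slide 19 lists eQTLs, i.e. associations between variants and gene expression. In its simplest form, an eQTL test regresses expression on allele dosage (0, 1 or 2 copies of the alternative allele). The sketch below fits that least-squares line and reports Pearson r. Production eQTL mapping adds covariates, expression normalization and multiple-testing control, all omitted here, and the example data are invented.

```python
import math

def eqtl_fit(dosages, expression):
    """Least-squares slope, intercept and Pearson r for expression ~ allele dosage."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    slope = sxy / sxx                  # expression change per extra alt allele
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Invented data: expression rises with each copy of the alternative allele.
print(eqtl_fit([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]))
```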
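Slide 25 describes turning Hi-C sequencing into a contact map from which loop domains and enhancer-promoter interactions are predicted. A minimal sketch of the first two steps follows, assuming read pairs are already aligned to one chromosome as 0-based (pos1, pos2) coordinates below `chrom_length`. The pairs are binned at a fixed resolution into a symmetric matrix, then each cell is divided by the mean count at the same genomic distance, so that loops stand out against the usual distance decay. Matrix balancing (e.g. ICE/KR normalization) and statistical loop calling are deliberately left out.

```python
def contact_matrix(pairs, chrom_length, resolution):
    """Bin intra-chromosomal read pairs into a symmetric n_bins x n_bins count matrix."""
    n_bins = (chrom_length + resolution - 1) // resolution  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix

def observed_over_expected(matrix):
    """Divide each cell by the mean count on its diagonal (same |i - j| distance)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return [[matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]

# Toy example: 3 kb chromosome, 1 kb bins.
m = contact_matrix([(100, 1500), (1200, 2500), (50, 2900), (2100, 2200)], 3000, 1000)
print(m)
```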
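Slide 26 combines Hi-C with ATAC-seq to build tissue-specific regulatory circuits. One simple way to do this is to link an open-chromatin (ATAC-seq) peak that overlaps one loop anchor to every gene whose TSS falls in the other anchor of the same loop. The sketch below assumes loops are given as pairs of half-open anchor intervals on one chromosome and each gene has a single TSS coordinate; the function names, peak IDs, gene names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def link_peaks_to_genes(loops, peaks, tss):
    """
    loops: list of ((start1, end1), (start2, end2)) loop-anchor intervals
    peaks: dict peak_id -> (start, end) of an ATAC-seq peak
    tss:   dict gene -> TSS coordinate
    Returns sorted (peak_id, gene) links through a loop.
    """
    links = set()
    for anchor_a, anchor_b in loops:
        # Try both orientations: the peak may sit on either anchor.
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            for peak_id, (start, end) in peaks.items():
                if not overlaps(start, end, *near):
                    continue
                for gene, pos in tss.items():
                    if far[0] <= pos < far[1]:
                        links.add((peak_id, gene))
    return sorted(links)

loops = [((10_000, 15_000), (200_000, 205_000))]
peaks = {"peak1": (11_000, 11_500), "peak2": (50_000, 50_400)}
tss = {"GENE_X": 202_000, "GENE_Y": 60_000}
print(link_peaks_to_genes(loops, peaks, tss))
```

The links skip GENE_Y even though it is linearly closer to peak1, which is the point of using 3D contacts instead of 1D distance.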
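Slide 27 mentions chromatin states predicted by an HMM from the ChIP-seq marks. Tools such as ChromHMM binarize each mark per genomic bin and learn a multivariate HMM from the data. The toy sketch below shows only the decoding step: a log-space Viterbi over two hand-set states ("active", "quiet") with independent Bernoulli emissions per mark. All state names and parameter values are invented for illustration.

```python
import math

STATES = ("active", "quiet")
MARKS = ("H3K27ac", "H3K4me1")  # order of the 0/1 flags in each observation
START = {"active": 0.5, "quiet": 0.5}
TRANS = {"active": {"active": 0.9, "quiet": 0.1},
         "quiet": {"active": 0.1, "quiet": 0.9}}
EMIT = {"active": (0.9, 0.8),   # P(mark present | state), one entry per mark
        "quiet": (0.05, 0.1)}

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state path for binarized mark observations, one tuple per bin.

    All probabilities must lie strictly between 0 and 1 so every log is finite.
    """
    def log_emit(state, obs):
        # Marks are treated as independent Bernoulli variables given the state.
        return sum(math.log(p if present else 1.0 - p)
                   for p, present in zip(emit_p[state], obs))

    score = [{s: math.log(start_p[s]) + log_emit(s, observations[0]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            # Best previous state to have come from, given we are now in s.
            prev = max(states, key=lambda r: score[t - 1][r] + math.log(trans_p[r][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + log_emit(s, observations[t]))
            back[t][s] = prev
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers to the start
    return path[::-1]

print(viterbi([(0, 0), (1, 1), (1, 1), (0, 0)], STATES, START, TRANS, EMIT))
```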
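Slide 28 trains a TensorFlow DNN to classify enhancer-promoter pairs as regulatory (positive set) or not (negative set) from motif, chromatin-state, TSS-distance and Hi-C features, but the network architecture is not described. As a dependency-free stand-in, not the authors' model, the sketch below trains a one-hidden-layer network (tanh hidden units, sigmoid output, binary cross-entropy, plain SGD) on a hypothetical two-feature toy dataset. In practice each feature vector would be assembled per EP pair from the data sources on slide 27.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class TinyMLP:
    """One hidden layer of tanh units feeding a single sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def train_step(self, x, y, lr):
        """One SGD step on binary cross-entropy for a single (x, y) example."""
        h, p = self.forward(x)
        dz = p - y  # gradient of cross-entropy w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # back-propagate through tanh
            self.w2[j] -= lr * dz * hj
            for k, xk in enumerate(x):
                self.w1[j][k] -= lr * dh * xk
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

def synthetic_pairs(n, seed=1):
    """Hypothetical toy data: label 1 when the two features sum to more than 1."""
    rng = random.Random(seed)
    data = []
    while len(data) < n:
        x = [rng.random(), rng.random()]
        margin = x[0] + x[1] - 1.0
        if abs(margin) < 0.1:
            continue  # leave a gap so the toy classes are cleanly separable
        data.append((x, 1 if margin > 0 else 0))
    return data

def train(data, n_hidden=4, epochs=200, lr=0.1, seed=0):
    model = TinyMLP(len(data[0][0]), n_hidden, seed)
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            model.train_step(*data[i], lr)
    return model

def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)

data = synthetic_pairs(200)
print(accuracy(train(data), data))
```

A real pipeline would evaluate on held-out EP pairs rather than the training set, and would use a framework such as TensorFlow for larger feature sets.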
