Understanding mechanisms underlying human gene expression variation with RNA sequencing

Published in: Technology

  1. 2. Goals: Long-term: understand the precise mechanisms by which genetic variation in humans influences gene expression, to inform genome-wide association studies (which have identified hundreds of non-coding regions associated with disease). This study: identify the transcribed and polyadenylated RNAs present in a model cell type, and identify genetic variants that influence the expression of these RNAs.
  2. 3. Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation. This talk: 69 cell lines derived from white blood cells from Nigerian individuals. Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project. Genomic data: mRNA expression, DNA methylation, histone marks, etc.
  3. 4. Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation. Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project. This talk: 69 cell lines derived from white blood cells from Nigerian individuals.
  4. 5. RNA-Seq 1: Isolate poly-A RNA, convert to cDNA (diagram labels: DNA, RNA, poly-A tail)
  5. 6. RNA-Seq 2: Generate short sequencing reads from cDNA library (using Illumina GA2) (diagram labels: DNA, RNA, poly-A tail)
  6. 7. RNA-Seq 3: Map short reads back to genome; count of reads/gene measures expression (diagram labels: DNA, RNA, poly-A tail)
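The read-counting step on slide 7 can be illustrated with a minimal sketch. The gene coordinates, gene names, and the `count_reads_per_gene` helper are hypothetical, introduced only for illustration; the talk does not describe the actual mapping or counting software, and a real pipeline would also handle spliced reads, strand, and normalization.

```python
# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def count_reads_per_gene(read_positions, genes):
    """Count mapped reads whose start position falls inside each gene.

    read_positions: iterable of (chromosome, position) pairs for mapped reads.
    genes: dict of gene name -> (chromosome, start, end).
    """
    counts = {name: 0 for name in genes}
    for chrom, pos in read_positions:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mapped reads; the chr2 read overlaps no annotated gene.
    reads = [("chr1", 150), ("chr1", 900), ("chr1", 950), ("chr2", 150)]
    print(count_reads_per_gene(reads, GENES))
```

The nested loop is quadratic and only suitable for toy inputs; an interval index would be the usual choice at the scale of the 1.2 billion reads mentioned in the talk.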
  7. 8. [Diagram: DNA and polyadenylated RNA]
  8. 9. Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation. This talk: 69 cell lines derived from white blood cells from Nigerian individuals. Sequence RNA from each line (total 1.2 billion sequencing reads), average expression levels across lines.
  9. 10. RNA-Seq identifies new exons
  10. 11. [Diagram: DNA and polyadenylated RNA]
  11. 12. [Diagram: DNA and polyadenylated RNA]
  12. 13. RNA-Seq identifies new exons
  13. 14. RNA-Seq identifies new polyadenylation sites
  14. 15. [Diagram: DNA and polyadenylated RNA]
  15. 16. [Diagram: DNA and polyadenylated RNA]
  16. 17. RNA-Seq identifies new polyadenylation sites
  17. 18. Revisiting gene annotations in these cells: ~4,000 unannotated, conserved exons, 115 of which appear protein-coding; unannotated exons have lower expression and are more tissue-specific than annotated exons. ~400 polyadenylation sites over 50 bases from a known site. Conclusion: extensive use of unannotated UTRs.
  18. 19. Lymphoblastoid cell lines: a model system for understanding the genetics of gene regulation. This talk: 69 cell lines derived from white blood cells from Nigerian individuals. Genotypes: > 4M Single Nucleotide Polymorphisms (SNPs) from the HapMap Project. Expression: from RNA-Seq.
  19. 20. (Natural) Genetic variation potentially affects many levels of gene regulation. (1) Transcription initiation: chromatin accessibility, TF binding. (2) mRNA processing: splicing, polyadenylation, capping, export. (3) mRNA degradation: microRNA regulation, NMD. (4) Translation, etc.: tRNA abundances, protein localization, protein degradation.
  20. 21. RNA-Seq 3: Map short reads back to genome; count of reads/gene measures expression (diagram labels: DNA, RNA, poly-A tail, C)
  21. 22. [Diagram: DNA and polyadenylated RNA, labeled T]
  22. 23. Use genotypes to identify associations between genetic variation and expression (eQTLs): ~1000 eQTLs at an FDR of 10%.
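As a rough illustration of what an eQTL scan at a 10% FDR involves, the sketch below tests one SNP against one gene with a permutation test on the Pearson correlation, then applies the Benjamini-Hochberg procedure across tests. Both choices are assumptions made for illustration: the slides do not state which association test or FDR method was used, and all function names and data here are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation; assumes both inputs vary (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes, expression, n_perm=2000, seed=0):
    """Two-sided permutation p-value for genotype/expression association.

    genotypes: allele dosages (0, 1, 2) per individual.
    expression: expression level of one gene per individual.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, fdr=0.10):
    """Return sorted indices of tests declared significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank passing its threshold
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # Hypothetical data for ten individuals at one SNP/gene pair.
    dosage = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
    expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 0.9, 2.0, 2.9, 2.2]
    print(permutation_pvalue(dosage, expr))
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.10))
```

In a genome-wide scan the same test would be repeated for every SNP/gene pair of interest before the FDR step, so the p-value list passed to `benjamini_hochberg` would be far longer than in this toy call.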
  23. 24. Polymorphisms near the transcription start site of a gene are the most likely to affect its transcription. Combining information across all genes, we can ask where SNPs that affect expression lie. SNPs near the TSS and throughout the genic region are the most likely to influence expression. Figure legend: black, bins within the genic region; blue, bins outside the genic region. See also Veyrieras et al. (2008), Stranger et al. (2007), Cheung et al. (2005).
  24. 25. (Natural) Genetic variation potentially affects many levels of gene regulation. (1) Transcription initiation: chromatin accessibility, TF binding. (2) mRNA processing: splicing, polyadenylation, capping, export. (3) mRNA degradation: microRNA regulation, NMD. (4) Translation, etc.: tRNA abundances, protein localization, protein degradation.
  25. 26. Use genotypes to identify associations between genetic variation and splicing (sQTLs): ~200 sQTLs at an FDR of 10%.
  26. 27. Use genotypes to identify associations between genetic variation and splicing (sQTLs): ~200 sQTLs at an FDR of 10%.
  27. 28. Where are SNPs that affect splicing? Figure: odds that a SNP in a given functional annotation affects splicing (relative to SNPs in non-splice-site intronic positions). SNPs in splice sites (defined liberally to include sites beyond the canonical two bases) and within the exon itself are enriched for sQTLs.
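The relative odds plotted on slide 28 can be read as an odds ratio against the baseline category. The sketch below computes a plain 2x2 odds ratio from counts; this is a simplification chosen for illustration, not necessarily the model behind the figure, and the function name and all counts are hypothetical.

```python
def odds_ratio(sqtl_in_category, snps_in_category, sqtl_in_baseline, snps_in_baseline):
    """Odds that a SNP in a category is an sQTL, relative to the baseline category.

    Each category is summarized by its number of sQTL SNPs and its total SNPs.
    The baseline here stands for non-splice-site intronic SNPs.
    """
    a = sqtl_in_category                     # sQTLs in the category
    b = snps_in_category - sqtl_in_category  # non-sQTLs in the category
    c = sqtl_in_baseline                     # sQTLs in the baseline
    d = snps_in_baseline - sqtl_in_baseline  # non-sQTLs in the baseline
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when a denominator cell is zero")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 120 splice-site SNPs vs. 10 of 1010 intronic SNPs.
    print(odds_ratio(20, 120, 10, 1010))
```

A value above 1 means SNPs in the category are enriched for sQTLs relative to the baseline, which is the pattern the slide reports for splice-site and exonic SNPs.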
  28. 29. Conclusions. Goal: understand the mechanisms of natural variation in gene regulation in a model system. RNA sequencing is useful for annotating genomes and comparing mRNA levels across individuals: we observe extensive usage of unannotated UTRs; eQTLs are enriched near transcription start sites, and sQTLs are enriched in and around canonical splice sites. Next steps: can we identify which transcription factors/splice factors have altered binding, leading to variation in expression?
  29. 30. http://dx.doi.org/10.1038/nature08872 http://eqtl.uchicago.edu
