Functionally annotate genomic variants

This seminar aims to answer the question of what to make of the identified variants: specifically, how to evaluate their quality, prioritize them, and functionally annotate them.

  • http://www.wikigenes.org/e/pub/e/84.html
  • http://biostar.stackexchange.com/questions/1728/predicting-phenotype-from-snp-data-help/1730#1730
  • unmethylated ‘C’ bases, or cytosines, are converted to ‘T’

    1. Functionally annotate variants
       The answer is not always 42!
       August 4, 2011
       (Image: Swamibu)
    2. Quick recap: DNA sequence read mapping
       Alignment -> Improving -> Variant calling -> Filtering
       Resulting file type: VCF
       "What are the differences to the reference genome?"
       Searching the haystack: 3.5 million SNPs
       (Image: Darwin Bell)
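The pipeline on slide 2 ends in a VCF file. As a minimal sketch, assuming a well-formed, uncompressed VCF (real pipelines would normally use a dedicated VCF library instead), the eight fixed columns of each data line can be read as follows. The names `VcfRecord`, `parse_vcf_line` and `read_vcf` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """Fixed columns of one VCF data line (ID and sample columns are ignored here)."""
    chrom: str
    pos: int                 # 1-based position, as in the VCF specification
    ref: str
    alts: List[str]          # ALT may list several comma-separated alleles
    qual: Optional[float]    # "." means missing
    filter_status: str       # e.g. "PASS"
    info: Dict[str, str]     # flag keys without "=" map to ""


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    chrom, pos, _id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value
    return VcfRecord(chrom, int(pos), ref, alt.split(","),
                     None if qual == "." else float(qual), flt, info_dict)


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records, skipping meta-information ('##'), header ('#CHROM') and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)
```

Because `read_vcf` accepts any iterable of lines, an open file handle can be passed directly, e.g. `for rec in read_vcf(open("calls.vcf")): ...`, where `calls.vcf` is a placeholder name.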
    3. Finding the causal variant in ideal situations*
       Spot the variant that is common to all affected individuals but absent in all unaffected ones
       This variant lies in a gene with known function and disrupts the protein
       * e.g. some rare autosomal diseases
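The ideal-case rule on slide 3 is plain set logic. A minimal sketch, assuming each individual's calls have already been reduced to a set of variant keys such as `(chrom, pos, ref, alt)` tuples; `candidate_variants` is a hypothetical helper name:

```python
def candidate_variants(affected, unaffected):
    """Variants carried by every affected individual and by no unaffected one.

    Both arguments map an individual ID to the set of variant keys that
    individual carries (keys assumed to be (chrom, pos, ref, alt) tuples).
    """
    if not affected:
        return set()
    # Present in all affected individuals...
    shared = set.intersection(*affected.values())
    # ...and seen in none of the unaffected individuals.
    seen_in_unaffected = set().union(*unaffected.values())
    return shared - seen_in_unaffected
```

With around 3.5 million SNPs per genome and imperfect calls, this strict rule rarely leaves a single clean answer outside rare Mendelian disorders, which is the point slide 4 goes on to make.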
    4. In reality
       You can't spot the difference
       You deal with ~3.5 million SNPs
       You need methods that systematically identify variants that stand out: GWAS
       GWAS taught us that we are unlikely to find a single common causal variant for complex diseases
       A rare variant?
       A mix of rare and common variants?
       An even more complex model?
       1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
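At its simplest, a GWAS tests every SNP for a difference in allele frequency between cases and controls. The sketch below shows one common per-SNP test, a Pearson chi-square on a 2x2 table of allele counts. It is an illustration under simplifying assumptions (no covariates, no population-structure correction, all table margins non-zero), not the procedure used in any particular study. `allelic_chi_square` is a hypothetical name.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on allele counts.

    Returns (chi2, p_value). Assumes every row and column total is > 0.
    For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, 30 alternate alleles out of 100 in cases versus 10 out of 100 in controls gives chi2 = 12.5. Because the test is repeated for millions of SNPs, the p-value must clear a stringent multiple-testing threshold; 5 × 10^-8 is the conventional genome-wide significance level.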
    5. Production Informatics and Bioinformatics
       Produce raw sequence reads: Basic Production Informatics
       Map to genome and generate raw genomic features (e.g. SNPs): Advanced Production Informatics
       Analyze the data; uncover the biological meaning: Bioinformatics Research, Statistical genetics
       (Per one-flowcell project)
    6. Discount erroneous SNPs?
       Maybe most of my SNPs are not real, and by excluding them I can find the causal variant?
       Biological verification
         Re-sequencing with a *different* method (e.g. Sanger)
         "Yes, the individual has a variant at location X"
         But you can't do that for > 3 million SNPs
       Bioinformatics verification
         All quality measures are just proxies, because we do not know which variants are real
    7. Quality control for variants
       Transition (A<->G; C<->T) to transversion (purine<->pyrimidine) ratio
       Concordance with known variants: dbSNP, HapMap, 1000 Genomes
       Mendelian errors
       Estimated rate of de novo germline base substitutions: approx. 10^-8 per base pair per generation
       (Sources: 1000 Genomes Project; Illumina)
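Two of the quality measures on slide 7 are easy to compute once variants are called. The sketch below assumes biallelic single-nucleotide calls given as `(ref, alt)` pairs and trio genotypes coded as alternate-allele counts (0, 1, 2). The function names are hypothetical, and real QC tools handle multi-allelic sites, missing genotypes and sex chromosomes, which this sketch ignores.

```python
# Transitions stay within a chemical class: purine<->purine (A<->G) or
# pyrimidine<->pyrimidine (C<->T). Every other single-base change is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(snvs):
    """Transition/transversion ratio over (ref, alt) pairs; non-SNV pairs are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # indels, multi-base changes and non-variants are not counted
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def is_mendelian_error(child, mother, father):
    """True if the child's genotype cannot be built from one allele of each parent.

    Genotypes are unphased alternate-allele counts (0, 1 or 2) at a biallelic
    autosomal site.
    """
    # Alleles (0 = ref, 1 = alt) that a parent with a given genotype can transmit.
    transmissible = {0: {0}, 1: {0, 1}, 2: {1}}
    possible = {m + f for m in transmissible[mother] for f in transmissible[father]}
    return child not in possible
```

Why these are useful proxies: of the 12 possible single-base substitutions, 4 are transitions and 8 are transversions, so random errors push Ti/Tv towards 0.5, whereas values of roughly 2 are commonly reported for human whole genomes. Similarly, at about 10^-8 de novo substitutions per base pair per generation, a diploid genome of roughly 6 × 10^9 bp gains only on the order of 60 new mutations, so Mendelian errors far in excess of that mostly indicate genotyping errors.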
    8. Just look at exons?
       Genetic variation is reduced in the neighborhood of genes, due to selection at linked sites (1000 Genomes Project)
       We could focus on exons to get started:
         A variant in a protein-coding region is likely to be functional
         We are more likely to find the meaning of a variant in a protein-coding region
       1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
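Restricting the analysis to exons, as slide 8 suggests, amounts to an interval-overlap lookup. A minimal sketch, assuming the exons have already been merged into non-overlapping half-open `[start, end)` intervals and that variant positions use the same coordinate system (e.g. subtract 1 from a VCF POS if the exons come from a 0-based BED file). `ExonIndex` is a hypothetical class; dedicated interval tools would be used at genome scale.

```python
import bisect


class ExonIndex:
    """Per-chromosome lookup of whether a position falls inside an exon.

    Exons are assumed to be non-overlapping, half-open [start, end) intervals.
    """

    def __init__(self, exons):
        self._starts = {}
        self._ends = {}
        for chrom, start, end in sorted(exons):
            self._starts.setdefault(chrom, []).append(start)
            self._ends.setdefault(chrom, []).append(end)

    def contains(self, chrom, pos):
        starts = self._starts.get(chrom)
        if not starts:
            return False
        # Index of the last exon that starts at or before pos.
        i = bisect.bisect_right(starts, pos) - 1
        return i >= 0 and pos < self._ends[chrom][i]
```

Sorting the starts once and using binary search keeps each lookup at O(log n) per chromosome, which matters when millions of variants are checked.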
    9. Influence of a variant in a protein-coding region
       Nonsynonymous SNPs
         Introduce a stop codon
         Disrupt structure
         Disrupt a domain
       Indels
         Cause a frameshift (when the length change is not a multiple of 3)
       Synonymous SNPs
         Alter translation efficiency
       But, on average, each "normal" person is found to carry
         250 to 300 loss-of-function variants in annotated genes
         50 to 100 variants previously implicated in inherited disorders
       1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
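The categories on slide 9 follow from the genetic code. The sketch below assumes the affected codon has already been extracted from the transcript on the coding strand (which in practice requires transcript annotation and strand handling) and uses the standard nuclear genetic code. `classify_snv` and `indel_is_frameshift` are hypothetical helper names.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon, offset, alt):
    """Classify a single-base change inside one codon (coding strand, 5'->3').

    offset is the 0-based position of the changed base within the codon.
    """
    codon = codon.upper()
    mutated = codon[:offset] + alt.upper() + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense (stop gained)"
    if before == "*":
        return "stop lost"
    return "missense"


def indel_is_frameshift(ref, alt):
    """An indel shifts the reading frame unless its length change is a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0
```

For example, TGG (Trp) with its third base changed to A becomes TGA, a stop codon, while CTT to CTC still encodes leucine and is synonymous.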
    10. Non-coding variants are also important
        Disrupt regulatory elements
          Transcription factor binding sites
          Splicing
          ncRNA transcripts
          mRNA editing
        This can change the expression of proteins, with downstream effects on their regulatory targets
        (Figure: promoter, enhancer, silencer, ncRNA and splicing elements acting on the exons of two genes, "Gene Blue" and "Gene Green")
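One simple heuristic for linking a non-coding variant to a gene, as in the promoter example on slide 10, is to check whether it falls within a fixed window around the gene's transcription start site (TSS). The sketch below is strand-aware; the 2 kb upstream default is an illustrative assumption, not a standard promoter definition, and `in_promoter` is a hypothetical name.

```python
def in_promoter(pos, tss, strand, upstream=2000, downstream=0):
    """True if pos lies in a window around a transcription start site (TSS).

    The window is defined relative to the direction of transcription, so on the
    minus strand "upstream" means higher coordinates. The window sizes are
    illustrative assumptions.
    """
    if strand == "+":
        return tss - upstream <= pos <= tss + downstream
    if strand == "-":
        return tss - downstream <= pos <= tss + upstream
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

Enhancers and silencers can act over much larger distances, so a fixed window like this captures only the simplest case.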
    11. Catching a villain does not bring down the mob
        An autosomal translocation between chr1 and chr11 that disrupts DISC1 (Disrupted-In-Schizophrenia 1) co-segregates with schizophrenia (SZ) in a family
        However, this is a rare event and cannot explain the heritability of SZ in the wider population
        (Figure: chr1/chr11 translocation breaking the DISC gene)
        Millar JK, Wilson-Annan JC, Anderson S, Christie S, Taylor MS, Semple CA, Devon RS, St Clair DM, Muir WJ, Blackwood DH, Porteous DJ (May 2000). "Disruption of two novel genes by a translocation co-segregating with schizophrenia". Hum. Mol. Genet. 9 (9): 1415–23. doi:10.1093/hmg/9.9.1415. PMID 10814723.
    12. Isolating SNPs that collectively explain liability
        Different populations may have their own "version" of a change that has the same downstream effects
        A "one-variant one-phenotype" case is unlikely for many diseases
        Prioritize variants or sets of variants to focus the analysis on:
          Variants likely to be functional
          Variants involved in the same pathway
        Model disease liability on this "subset" -> statistical genetics: find variants with relatively large effect sizes that explain a proportion of disease heritability in the population
        1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
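Grouping variants by pathway, as slide 12 proposes, lets different variants with the same downstream effect count towards one signal. The slide does not prescribe a method. One simple option, shown here as a sketch under stated assumptions, is to count per pathway how many individuals carry at least one prioritized variant; the counts for cases and controls can then be compared with an ordinary 2x2 test. All names and input mappings are hypothetical.

```python
from collections import defaultdict


def pathway_burden(carriers, gene_of_variant, pathway_of_gene):
    """Count, per pathway, how many individuals carry at least one qualifying variant.

    carriers:        individual ID -> set of variant IDs (already filtered,
                     e.g. to variants likely to be functional)
    gene_of_variant: variant ID -> gene symbol
    pathway_of_gene: gene symbol -> set of pathway names
    """
    burden = defaultdict(set)
    for person, variants in carriers.items():
        for variant in variants:
            gene = gene_of_variant.get(variant)
            for pathway in pathway_of_gene.get(gene, ()):
                burden[pathway].add(person)  # a set counts each person once
    return {pathway: len(people) for pathway, people in burden.items()}
```

Counting people rather than variants keeps one heavily mutated individual from dominating a pathway's score.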
    13. Functional variants
        SIFT
          Assigns a pre-computed score for how likely an amino acid substitution is to be tolerated, given the sequences of homologous proteins
        PolyPhen
          Machine-learning method that predicts the impact of an amino acid substitution on the protein's structure and function
        ANNOVAR
          Annotates SNPs that overlap functional elements, e.g. domains, transcription factor binding sites, splice sites,…
        Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics. 2011 Jul 1;27(13):1741-8. PMID: 21596790
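Once SIFT and PolyPhen predictions have been attached to each missense variant, they still have to be combined into a decision. A minimal sketch, assuming SIFT scores in [0, 1] (below 0.05 is SIFT's usual "deleterious" cutoff) and PolyPhen-2's qualitative labels as plain strings. Treating a hit from *either* tool as enough is a permissive choice made here for illustration. `likely_damaging` is a hypothetical name.

```python
SIFT_DELETERIOUS_CUTOFF = 0.05  # SIFT scores below this are predicted deleterious

DAMAGING_POLYPHEN_LABELS = {"probably damaging", "possibly damaging"}


def likely_damaging(sift_score=None, polyphen_label=None):
    """Flag a missense variant if either SIFT or PolyPhen predicts damage.

    Missing predictions are passed as None and never count as a hit.
    Requiring both tools to agree would give a stricter filter.
    """
    sift_hit = sift_score is not None and sift_score < SIFT_DELETERIOUS_CUTOFF
    polyphen_hit = polyphen_label in DAMAGING_POLYPHEN_LABELS
    return sift_hit or polyphen_hit
```

Both tools score only amino acid substitutions, so regulatory variants need the region-based annotation that ANNOVAR provides.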
    14. Custom filter approach with Excel
        Filter the annotated variants by your own requirements in Excel to quickly identify a manageable list of "interesting" variants
        Approach taken by the Diamantina Institute (Paul Leo)
        Example filters: exonic; loss of function; carried by 90% of affected; carried by 10% of unaffected
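The same spreadsheet filters can be scripted so they are reproducible and easy to rerun on a new call set. In the sketch below, the row layout (`AnnotatedVariant`) is a hypothetical table, and reading "carried by 90% of affected" as *at least* 90% and "10% of unaffected" as *at most* 10% is an interpretation of the slide, not something it states.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    """One row of a hypothetical annotated-variant table."""
    variant_id: str
    region: str              # e.g. "exonic", "intronic", "intergenic"
    loss_of_function: bool
    affected_carriers: int
    affected_total: int
    unaffected_carriers: int
    unaffected_total: int


def interesting(v, min_affected=0.9, max_unaffected=0.1):
    """Exonic, loss-of-function, carried by >= 90% of affected and <= 10% of unaffected."""
    return (
        v.region == "exonic"
        and v.loss_of_function
        and v.affected_carriers / v.affected_total >= min_affected
        and v.unaffected_carriers / v.unaffected_total <= max_unaffected
    )
```

Given a list of rows, `[v.variant_id for v in rows if interesting(v)]` produces the short list; the thresholds are parameters so they can be relaxed when nothing passes.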
    15. Three things to remember
        A "one-variant one-phenotype" model is rather unlikely
        Variants in non-protein-coding regions are also important
        New methods (bioinformatics and statistical genetics) need to be developed to address this problem
        To be addressed in an upcoming discussion session run by Dr. Jake Gratten
    16. Next week
        Abstract: This session will focus on the differences between standard DNA mapping and RNA-seq-specific transcript mapping: identifying splice variants and isoforms. Transcript quantification and the genomic variants that can be identified from RNA-seq data will also be discussed.
