Functionally annotate genomic variants

This seminar aims at answering the question of what to make of the identified variants, specifically how to evaluate the quality, prioritize and functionally annotate the variants.

Published in: Technology
Slide notes
  • http://www.wikigenes.org/e/pub/e/84.html
  • http://biostar.stackexchange.com/questions/1728/predicting-phenotype-from-snp-data-help/1730#1730
  • unmethylated ‘C’ bases, or cytosines, are converted to ‘T’
  • Transcript

    • 1. [by Swamibu]
      Functionally annotate variants
      The answer is not always 42 !
      August 4, 2011
    • 2. Quick recap: DNA sequence read mapping
      Alignment -> Improving -> Variant calling -> Filtering
      Resulting file type: vcf
      “What are the differences to the reference genome?”
      Searching the haystack
      3.5 million SNPs
      by Darwin Bell
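For readers unfamiliar with the vcf output named on slide 2, here is a minimal sketch of reading its eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) in plain Python. The `Variant` type, the function name, and the example record are made up for illustration; a real pipeline would more likely rely on a dedicated VCF library.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: list
    info: dict


def parse_vcf_line(line):
    """Parse one VCF data line; return None for header lines starting with '#'."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        # INFO entries are either KEY=VALUE pairs or bare flags
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True
    return Variant(chrom, int(pos), ref, alt.split(","), info_dict)


# Hypothetical record, not taken from the talk
example = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30;DB"
variant = parse_vcf_line(example)
print(variant)
```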
    • 3. Finding the causal variant in ideal situations*
      Spot the variant that is common amongst all affected but absent in all unaffected
      This variant is in a gene with known function and causes the protein to be disrupted
      * e.g. some rare autosomal disease
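The ideal case on slide 3 amounts to simple set logic. The sketch below assumes each individual's variants are available as a set of `(chrom, pos, ref, alt)` tuples; the function name and the toy data are hypothetical.

```python
def candidate_variants(affected, unaffected):
    """Variants present in every affected genome and in no unaffected genome."""
    if not affected:
        return set()
    shared = set.intersection(*(set(v) for v in affected))
    seen_in_unaffected = set().union(*(set(v) for v in unaffected))
    return shared - seen_in_unaffected


# Toy data for illustration only
affected = [
    {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
]
unaffected = [{("chr2", 200, "C", "T")}]

candidates = candidate_variants(affected, unaffected)
print(candidates)
```

With millions of SNPs per genome and imperfect calls, such a strict filter rarely leaves a single clean answer, which is the point of the next slide.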
    • 4. In reality
      You can’t spot the difference
      You deal with ~3.5 million SNPs
      You need to employ methods that systematically identify variants that stand out: GWAS
      GWAS taught us that a single causal common variant is unlikely to be found for complex diseases
      Rare variant?
      A bunch of rare and common variants?
      An even more complex model?
      1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
    • 5. Production Informatics and Bioinformatics
      Basic Production Informatics: Produce raw sequence reads
      Advanced Production Informatics: Map to genome and generate raw genomic features (e.g. SNPs)
      Bioinformatics Research, Statistical genetics: Analyze the data; uncover the biological meaning
      Per one-flowcell project
    • 6. Discount erroneous SNPs?
      Maybe most of my SNPs are not real, and by excluding them I can find the causal variant?
      Biological verification
      Re-sequencing with a *different* method (e.g. Sanger)
      “Yes, the individual has a variant at location X”
      But you can’t do that for > 3 million SNPs
      Bioinformatics verification
      All quality measures are just proxies because we do not know which variants are real
    • 7. Quality control for variants
      Transition (A<->G; C<->T) to transversion (purine<->pyrimidine) ratio
      Concordance with known variants: dbSNP, HapMap, 1000 Genomes
      Mendelian errors
      “[rate] of de novo germline base substitution mutations to be approx. 10^-8 per base pair per generation”
      1000 Genomes Project
      Illumina
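Two of the quality proxies on slide 7 are easy to sketch in code: the transition/transversion ratio and a Mendelian consistency check for a trio. The function names and the genotype representation (a pair of alleles per individual) are assumptions made for this example, not part of the talk.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) single-base pairs."""
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def is_mendelian_consistent(child, mother, father):
    """Genotypes are allele pairs such as ("A", "G"). The child must be able to
    take one allele from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Toy input: three transitions and one transversion
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ti_tv_ratio(snps)
print(ratio)
```

A call set whose ratio sits far below the value usually reported for human whole-genome data (roughly 2) is a hint that many calls are errors. Given the quoted de novo rate of about 10^-8 per base pair per generation, true Mendelian violations should be very rare, so an excess of them also points to miscalls.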
    • 8. Just look at exons?
      We know that there is a reduction of genetic variation in the neighborhood of genes, due to selection at linked sites (1000 Genomes Project).
      We could focus on them to get started
      A variant in a protein-coding region is likely to be functional
      We are more likely to find the meaning of a variant in a protein-coding region
      1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
    • 9. Influence of a variant in a protein-coding region
      Nonsynonymous SNPs
      Introduce stop codon
      Disrupt structure
      Disrupt domain
      Indels
      Cause frame shift
      Synonymous SNPs
      Alter translation efficiency
      But, on average, each “normal” person is found to carry
      250 to 300 loss-of-function variants in annotated genes
      50 to 100 variants previously implicated in inherited disorders.
      1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
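The coding-region categories on slide 9 can be illustrated with the standard genetic code. This is a minimal sketch: the function names and category labels are chosen for this example, and it assumes the codon is given on the coding strand with a 0-based offset for the changed base.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, offset, alt_base):
    """Classify a single-base change at position `offset` of `codon`."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "stop_gained"
    if before == "*":
        return "stop_lost"
    return "nonsynonymous"


def classify_indel(ref, alt):
    """An indel shifts the reading frame unless its length change is a multiple of 3."""
    return "frameshift" if (len(alt) - len(ref)) % 3 else "in_frame"


print(classify_snp("GAA", 0, "T"))  # GAA (Glu) -> TAA (stop)
print(classify_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_indel("A", "AT"))    # one inserted base
```

Whether a synonymous change alters translation efficiency, or a nonsynonymous one disrupts a domain, cannot be read off the codon table alone; that is where the tools on slide 13 come in.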
    • 10. Intergenic variants are also important
      Disrupt regulatory elements
      Transcription factor binding sites
      Splicer
      ncRNA transcripts
      mRNA editing
      Causing changes in the expression of proteins that have a downstream effect on their regulatory targets
      Promoter
      Enhancer
      Silencer
      ncRNA
      Splicing
      Exons
      Gene Blue
      Exons
      Gene Green
    • 11. Catching a villain does not bring down the mob
      DISC1
      chr1
      chr11
      An autosomal translocation disrupting the function of the DISC1 gene causes schizophrenia (SZ) in a family
      However, this is a rare event and cannot explain the heritability of SZ in the larger population.
      Millar JK, Wilson-Annan JC, Anderson S, Christie S, Taylor MS, Semple CA, Devon RS, Clair DM, Muir WJ, Blackwood DH, Porteous DJ (May 2000). "Disruption of two novel genes by a translocation co-segregating with schizophrenia". Hum. Mol. Genet. 9 (9): 1415–23. doi:10.1093/hmg/9.9.1415. PMID 10814723.
    • 12. Isolating SNPs that collectively explain liability
      Different populations may have their own “version” of a change that has the same downstream effects.
      A “one-variant one-phenotype” case is unlikely for many diseases
      Prioritize variants or sets of variants to focus analysis on
      Variants likely to be functional
      Involved in the same pathway
      Model disease liability on this “subset” -> Statistical genetics: find variants with relatively large effect sizes that are able to explain a proportion of disease heritability in the population.
      1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
    • 13. Functional variants
      SIFT
      Assigns a pre-computed score that says how likely it is that this substitution is tolerated, given the sequences of homologous proteins.
      PolyPhen
      Machine learning method predicting the impact of an amino acid substitution on the protein’s structure and function.
      ANNOVAR
      Annotates SNPs that overlap functional elements, e.g. domains, transcription factor binding sites, splice variants, …
      Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics. 2011 Jul 1;27(13):1741-8. PMID: 21596790
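The overlap-based annotation described for ANNOVAR can be pictured as an interval lookup. The sketch below is not ANNOVAR's implementation or interface; it is a plain-Python illustration with hypothetical element coordinates, assuming 1-based inclusive intervals.

```python
def annotate_overlaps(variants, elements):
    """Map each (chrom, pos) variant to the labels of the elements it falls in.

    elements: list of (chrom, start, end, label), 1-based inclusive coordinates.
    """
    annotations = {}
    for chrom, pos in variants:
        annotations[(chrom, pos)] = [
            label
            for e_chrom, start, end, label in elements
            if e_chrom == chrom and start <= pos <= end
        ]
    return annotations


# Hypothetical functional elements and variants
elements = [
    ("chr1", 100, 200, "exon"),
    ("chr1", 150, 160, "protein domain"),
    ("chr2", 50, 60, "TF binding site"),
]
variants = [("chr1", 155), ("chr1", 300), ("chr2", 50)]

annotations = annotate_overlaps(variants, elements)
print(annotations)
```

The linear scan keeps the idea visible; for genome-scale data an indexed structure such as an interval tree would be the usual choice.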
    • 14. Custom filter approach with Excel
      Filter annotated variants against your requirements using Excel to quickly identify a manageable list of “interesting” variants
      Approach taken by the Diamantina (Paul Leo)
      Carried by 90% of affected
      Carried by 10% of unaffected
      Loss of function
      exonic
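The same filter can be expressed in a few lines of Python instead of a spreadsheet. This sketch assumes the slide's criteria mean "at least 90% of affected" and "at most 10% of unaffected"; the record fields (`region`, `loss_of_function`, `carriers`) and the sample names are hypothetical.

```python
def carrier_fraction(carriers, group):
    """Fraction of individuals in `group` that carry the variant."""
    return len(carriers & group) / len(group) if group else 0.0


def is_interesting(record, affected, unaffected,
                   min_affected=0.9, max_unaffected=0.1):
    """Apply the four slide criteria to one annotated variant record."""
    return (
        record["region"] == "exonic"
        and record["loss_of_function"]
        and carrier_fraction(record["carriers"], affected) >= min_affected
        and carrier_fraction(record["carriers"], unaffected) <= max_unaffected
    )


# Toy cohort: ten affected and ten unaffected individuals
affected = {f"a{i}" for i in range(10)}
unaffected = {f"u{i}" for i in range(10)}

record = {
    "region": "exonic",
    "loss_of_function": True,
    "carriers": {f"a{i}" for i in range(9)} | {"u0"},
}
keep = is_interesting(record, affected, unaffected)
print(keep)
```

Keeping the thresholds as parameters makes it easy to relax them when the strict settings leave no variants.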
    • 15. Three things to remember
      A “one-variant one-phenotype” model is rather unlikely
      Variants in non-protein-coding regions are also important
      New methods (bioinformatics and statistical genetics) need to be developed to address this problem
      Addressed in upcoming discussion session run by Dr. Jake Gratten
    • 16. Next week:
      Abstract: This session will focus on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. Transcript quantification and the genomic variants that can be identified from RNAseq data will also be discussed.
