Non-synonymous SNP ID

"Large" data set project for a bioinformatics class, identifying non-synonymous SNPs in sockeye salmon

    1. Identifying synonymous and non-synonymous SNPs
       Bioinformatics for Environmental Sciences, Winter 2009
       Large Data Set Analysis
       Caroline Storer
    2. Objective
       - Identify whether a SNP is synonymous or non-synonymous
       - Develop a high-throughput workflow for non-synonymous SNP identification
    3. Context
       SNPs are becoming an abundant and accessible tool for genetic studies in non-model organisms. Additionally:
       - In sockeye salmon, we now have more than 110 SNPs to use for genetic stock identification (GSI)
       - SNPs under selection often provide high resolution for GSI
       - Non-synonymous SNPs might indicate genes under selection
    4. Theory
       A single nucleotide difference could change the encoded amino acid and thereby possibly alter protein function. Such a SNP is non-synonymous.
    5. Non-synonymous SNPs
       Whether a SNP is non-synonymous depends on:
       - the position of the SNP within its codon
       - the reading frame
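
The two dependencies above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1), a forward-strand reading frame given as an offset of 0, 1 or 2, and a 0-based SNP position; `CODON_TABLE` and `classify_snp` are hypothetical names used only for illustration, not part of the original workflow.

```python
from itertools import product

# Standard genetic code (assumed: NCBI table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def classify_snp(seq: str, snp_pos: int, alt: str, frame: int = 0) -> str:
    """Return 'synonymous' or 'non-synonymous' for a SNP in a forward frame.

    seq     -- reference allele sequence, 5'->3' (DNA or RNA)
    snp_pos -- 0-based index of the SNP in seq
    alt     -- the alternative base at snp_pos
    frame   -- hypothetical forward-frame offset: 0, 1 or 2
    """
    seq = seq.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    if snp_pos < frame:
        raise ValueError("SNP lies before the first full codon of this frame")
    # Start of the codon holding the SNP; depends on the reading frame.
    codon_start = snp_pos - (snp_pos - frame) % 3
    ref_codon = seq[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP lies in an incomplete trailing codon")
    # Position of the SNP inside its codon (0, 1 or 2).
    offset = snp_pos - codon_start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"
```

With the sequence from the Difficulties slide, a hypothetical T/C SNP at index 11 is synonymous at frame offset 0 (GGT to GGC, both glycine) but non-synonymous at offset 1 (GTG to GCG, valine to alanine), which is why the reading frame has to be settled first.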
    6. Difficulties
       Determining the reading frame:
         dsDNA:    5'-TCTAAAATGGGTGAC-3'
         RNA:      5'-UCUAAAAUGGGUGAC-3'
         Frame 1:  UCU AAA AUG GGU GAC
         Frame 2:  . CUA AAA UGG GUG AC
         Frame 3:  . . UAA AAU GGG UGA C
       dsDNA has 6 possible reading frames: 3 in each direction.
       Which reading frame is correct?
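
Enumerating all six frames is mechanical once a codon table is available. The original workflow used EMBOSS Transeq for this step; the sketch below is a plain-Python stand-in, assuming the standard genetic code, sequences containing only A, C, G and T (or U), and a labelling convention of our own (+1 to +3 on the given strand, -1 to -3 on its reverse complement, each counted from the first base) that may not match Transeq's frame numbering. `six_frames` and its helpers are hypothetical names.

```python
from itertools import product

# Standard genetic code (assumed: NCBI table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate_frame(dna: str, offset: int) -> str:
    """Translate complete codons starting at offset; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translate 3 frames on the given strand and 3 on its reverse complement."""
    dna = seq.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(rc, offset)
    return frames
```

For 5'-TCTAAAATGGGTGAC-3', frame +1 gives `SKMGD`, while frame +3 contains the stop codons UAA and UGA (`*NG*`). One heuristic, not used in the original workflow, is to discard frames interrupted by stop codons.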
    7. Workflow
    8. Data: Sequences & SNPs
       - 23 sequences, each with 1 SNP
       - Sequence length from 102 bp to 400 bp
       Each sequence with 1 SNP becomes 2 sequences, one for each SNP allele:
         TGATTTCT[C/T]CATTCCATG
         TGATTTCTCCATTCCATG
         TGATTTCTTCATTCCATG
       Result: 46 DNA sequences, 1 for each allele
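
The expansion from one bracketed SNP sequence to two allele sequences can be scripted. A minimal sketch, assuming each input sequence carries exactly one bi-allelic SNP written as `[X/Y]`; `expand_alleles` is a hypothetical helper name.

```python
import re

# One bi-allelic SNP written as [X/Y], e.g. "[C/T]".
SNP_PATTERN = re.compile(r"\[([ACGTU])/([ACGTU])\]", re.IGNORECASE)


def expand_alleles(locus_seq: str) -> tuple[str, str, int]:
    """Split 'ACGT[C/T]ACGT' into both allele sequences plus the 0-based SNP index."""
    matches = list(SNP_PATTERN.finditer(locus_seq))
    if len(matches) != 1:
        raise ValueError(f"expected exactly one [X/Y] SNP, found {len(matches)}")
    m = matches[0]
    before, after = locus_seq[:m.start()], locus_seq[m.end():]
    allele1 = before + m.group(1) + after
    allele2 = before + m.group(2) + after
    return allele1, allele2, len(before)
```

Applied to each of the 23 loci, this yields the 46 allele sequences; the returned index records where the SNP sits, which is needed later to locate the codon that contains it.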
    9. Translation: Transeq
       http://www.ebi.ac.uk/Tools/emboss/transeq/index.html?
       Result: 276 AA sequences, 12 for each SNP locus (6 for each allele sequence)
    10. Protein BLAST
        iNquiry BLASTP: amino acid query against a protein database
        - Top hit only
        - Tabular output
        INPUT: 276 sequences (all 23 loci) → BLASTP → OUTPUT: 84 sequences (18 loci)
        22% loss of loci due to zero hits in BLAST
    11. Picking a Reading Frame
        Table columns: Query (locus, allele, reading frame) | Top Hit | E-value
        Scenarios:
        - Only 1 reading frame had hits
        - Multiple reading frames had hits; 1 had a higher E-value
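
Picking a reading frame from the tabular BLASTP output can be sketched as follows. This assumes the standard 12-column tabular layout (BLAST+ `-outfmt 6`, or legacy `-m 8`), in which the E-value is the 11th column, and hypothetical query IDs of the form `locus|allele|frame`. The selection rule here, keeping the frame whose top hit has the lowest (most significant) E-value, is an assumption and may differ from the criterion actually applied; `best_frames` and `percent_loci_lost` are hypothetical names.

```python
import csv
import io


def best_frames(blast_tab: str) -> dict[str, tuple[str, str, float]]:
    """Map each locus to (frame, top-hit subject, E-value) for its most significant frame.

    Assumes hypothetical query IDs formatted as 'locus|allele|frame'.
    Loci with zero hits never appear in the result.
    """
    best: dict[str, tuple[str, str, float]] = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        locus, _allele, frame = row[0].split("|")
        subject, evalue = row[1], float(row[10])  # column 11 = E-value
        if locus not in best or evalue < best[locus][2]:
            best[locus] = (frame, subject, evalue)
    return best


def percent_loci_lost(total_loci: int, loci_with_hits: int) -> float:
    """Share of loci dropped because no reading frame produced a hit."""
    return 100.0 * (total_loci - loci_with_hits) / total_loci
```

`percent_loci_lost(23, 18)` returns about 21.7, consistent with the 22% loss reported above.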
    12. Sequence Alignment & SNPs
        - 17 synonymous SNPs (no change in amino acid)
        - 1 non-synonymous SNP
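
Once a frame is chosen, classifying each locus reduces to comparing the two allele translations. A minimal sketch, assuming both proteins come from the same frame and differ by at most a single substitution, so they have equal length and a position-by-position comparison can stand in for the alignment step; `compare_alleles` and `tally` are hypothetical names.

```python
from collections import Counter


def compare_alleles(prot_a: str, prot_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue_a, residue_b) for each differing residue."""
    if len(prot_a) != len(prot_b):
        raise ValueError("allele translations differ in length; use a gapped alignment")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(prot_a, prot_b)) if a != b]


def tally(loci: dict[str, tuple[str, str]]) -> Counter:
    """Count loci whose allele translations are identical (synonymous) or not."""
    counts: Counter = Counter()
    for prot_a, prot_b in loci.values():
        kind = "non-synonymous" if compare_alleles(prot_a, prot_b) else "synonymous"
        counts[kind] += 1
    return counts
```

Run over the 18 loci that kept BLAST hits, these counts would correspond to the 17 synonymous and 1 non-synonymous SNPs listed above.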
    13. A Non-synonymous SNP
        GO Terms
        SNP U1214: [A/C]
        Gene: Sialyltransferase
    14. High-throughput
    15. Conclusions
        - Can identify non-synonymous mutations
        - Method is not high-throughput
        - Method could be more automated with a sockeye salmon genome