General methods of SNP discovery: PolyBayes
Gabor T. Marth
Department of Biology, Boston College
Chestnut Hill, MA 02467
Bi820 2005 S Marth Poly Bayes

  1. General methods of SNP discovery: PolyBayes. Gabor T. Marth, Department of Biology, Boston College, Chestnut Hill, MA 02467
  2. General methods of SNP mining – PolyBayes
       Two innovative ideas:
       1. Utilize the genome reference sequence as a template to organize other sequence fragments from arbitrary sources
       2. Use sequence quality information (base quality values) to distinguish true mismatches from sequencing errors
       [Figure labels: sequencing error; true polymorphism]
  3. Computational SNP mining – PolyBayes
       • Sequence clustering simplifies to a database search with the genome reference
       • Paralog filtering by counting mismatches weighted by quality values
       • Multiple alignment by anchoring fragments to the genome reference
       • SNP detection by differentiating true polymorphism from sequencing error using quality values
  4. Sequence clustering
       • Clustering simplifies to a search against a sequence database to recruit relevant sequences
       • Clusters = groups of overlapping sequence fragments matching the genome reference
       [Figure labels: genome reference; fragments; cluster 1; cluster 2; cluster 3]
  5. (Anchored) multiple alignment
       • Advantages
           • efficient: involves only pairwise comparisons
           • accurate: correctly aligns alternatively spliced ESTs
       • The genomic reference sequence serves as an anchor
           • fragments are aligned pairwise to the genomic sequence
           • insertions are propagated ("sequence padding")
  6. Paralog filtering
       • The "paralog problem"
           • unrecognized paralogs give rise to spurious SNP predictions
           • SNPs in duplicated regions may be useless for genotyping
       • Challenge
           • to differentiate between sequencing errors and paralogous differences
       [Figure labels: paralogous difference; sequencing errors]
  7. SNP detection
       • Goal: to discern true variation from sequencing error
       [Figure labels: sequencing error; polymorphism]
  8. SNP discovery with PolyBayes
       1. Fragment recruitment (database search)
       2. Anchored alignment
       3. Paralog identification
       4. SNP detection
       [Figure label: genome reference sequence]
  9. Bayesian-statistical SNP detection
       1. The algorithm
       [Figure labels: probability of polymorphism; base call, base quality; a priori polymorphism rate; base composition; depth of coverage]
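
To make the quantities named on the last slide concrete, here is a minimal Python sketch of a Bayesian polymorphism score for a single alignment column. It is an illustration under stated assumptions, not the PolyBayes implementation: the function names (`phred_to_error`, `call_likelihood`, `snp_probability`), the default prior of 0.001, the uniform base composition, the even split of sequencing errors over the three other bases, and the uniform prior over all mixed (non-identical) assignments of true bases are all hypothetical choices made for the example. Base calls are assumed to be one of A, C, G, T, and base qualities are assumed to be Phred-scaled.

```python
from math import prod

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred-scaled base quality Q into an error probability."""
    return 10 ** (-q / 10)


def call_likelihood(true_base, called, q):
    """P(observed call | true base); errors are split evenly over
    the three other bases (an assumption of this sketch)."""
    e = phred_to_error(q)
    return 1 - e if called == true_base else e / 3


def snp_probability(column, prior_poly=0.001, base_comp=None):
    """Posterior probability that one alignment column is polymorphic.

    column     : list of (called_base, phred_quality), one entry per read
    prior_poly : a priori polymorphism rate (illustrative value)
    base_comp  : prior base composition; uniform if omitted (assumption)
    """
    n = len(column)  # depth of coverage
    if n < 2:
        return 0.0  # a single read cannot show two alleles
    comp = base_comp or {b: 1 / len(BASES) for b in BASES}

    # Likelihood of the data if every read carries the same true base b.
    same = {b: prod(call_likelihood(b, c, q) for c, q in column) for b in BASES}
    mono = (1 - prior_poly) * sum(comp[b] * same[b] for b in BASES)

    # Each read's likelihoods sum to 1 over the four true bases, so the
    # likelihood summed over all mixed assignments is 1 - sum(same).
    n_mixed = len(BASES) ** n - len(BASES)
    poly = prior_poly / n_mixed * (1 - sum(same.values()))

    return poly / (poly + mono)
```

Under these assumptions, a column with two high-quality A calls and two high-quality G calls, e.g. `snp_probability([("A", 40), ("A", 40), ("G", 40), ("G", 40)])`, scores close to 1, while three high-quality A calls plus one low-quality G, e.g. `snp_probability([("A", 40)] * 3 + [("G", 5)])`, scores close to 0, because the lone mismatch is better explained as a sequencing error.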