Mayank

Published in: Career

  1. 1. GENERAL SEMINAR
     TOPIC: IMPORTANCE OF FASTA AS A BIOINFORMATICS TOOL
     PRESENTED BY: MAYANK SRIVASTAVA
  2. 2. INTRODUCTION
     • FASTA (Pearson and Lipman 1988) is a fast sequence-searching algorithm for comparing a query sequence against a database.
     • FASTA (pronounced FAST-AYE) stands for FAST-All, reflecting the fact that it can be used for a fast protein comparison or a fast nucleotide comparison.
     • The program achieves a high level of sensitivity for similarity searching at high speed.
  3. 3. HEURISTIC METHODS
     • Using an approximate (heuristic) search method does not mean that the search algorithm will find a wrong solution.
     • If a solution is found, it is guaranteed to be valid, but it may not be optimal.
     • Problem with dynamic programming (DP): DP computes scores over large areas of the matrix that do not contribute to the optimal alignment.
     • FASTA focuses on the diagonal area.

     Score matrix shown on the slide (row labels at right, column labels at bottom):

       6 5 5 5 5 4 3 3 3 2 1   A
       5 5 5 5 4 4 3 3 2 2 1   G
       4 4 4 4 4 4 3 3 2 2 1   C
       3 3 3 3 3 3 3 3 2 2 1   T
       2 2 2 2 2 2 2 2 2 2 1   A
       2 2 2 2 1 1 1 1 1 1 1   G
       1 1 1 1 1 1 1 1 1 1 1   G
       A T T G A C T T A A G
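To make the cost of full dynamic programming concrete, here is a minimal Smith-Waterman sketch that fills every cell of the matrix and reports how many cells it touched. The scoring values (match = 1, mismatch = -1, gap = -1) and the function name are illustrative assumptions, not values from the slides, and the scoring scheme is not the one used for the matrix on the slide.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of matrix cells filled).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    cells = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
            cells += 1  # every cell is computed, on or off the best diagonal
    return best, cells


score, cells = smith_waterman_score("GAATTCAGTTA", "GGATCGA")
print(score, cells)
```

For the 11-residue query and 7-residue sequence used later in the slides, all 11 × 7 = 77 cells are filled, although only a few lie near the diagonal that carries the best alignment.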
  4. 4. Dynamic programming requires on the order of N²L computations, where N is the size of the query sequence and L is the size of the database.
     Figure annotation: "FASTA focuses on this area."
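As a rough illustration of why restricting the search to a region around one diagonal saves work, the sketch below counts the cells of a full matrix against the cells inside a narrow band. The band definition (cells whose diagonal lies within `width` of a chosen `offset`) and the function name are assumptions made for this example; the slides do not specify a band width.

```python
def band_cells(n, m, offset=0, width=1):
    """Count cells (i, j) of an n x m matrix with |(i - j) - offset| <= width."""
    return sum(
        1
        for i in range(n)
        for j in range(m)
        if abs((i - j) - offset) <= width
    )


full = 11 * 7                                   # every cell of an 11 x 7 matrix
banded = band_cells(11, 7, offset=0, width=1)   # main diagonal plus one on each side
print(full, banded)  # 77 20
```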
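  5. 6. Use a look-up table
     • Query:    G A A T T C A G T T A
     • Sequence: G G A T C G A

     Look-up table (query positions numbered 1 to 11):

       Q   Location
       A   2, 3, 7, 11
       C   6
       G   1, 8
       T   4, 5, 9, 10

     Dot matrix (* marks an identity, . marks no match; row labels at right, column labels at bottom):

       * . . . * . . . * * .   A
       . . . * . . . . . . *   G
       . . . . . * . . . . .   C
       . * * . . . * * . . .   T
       * . . . * . . . * * .   A
       . . . * . . . . . . *   G
       . . . * . . . . . . *   G
       A T T G A C T T A A G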
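The look-up table above can be built in a single pass over the query. The following is a minimal sketch; the function names `build_lookup` and `dot_matrix_hits` are hypothetical, and positions are 1-based to match the slide.

```python
from collections import defaultdict


def build_lookup(query, ktup=1):
    """Map each word of length ktup to its 1-based start positions in the query."""
    table = defaultdict(list)
    for pos in range(len(query) - ktup + 1):
        table[query[pos:pos + ktup]].append(pos + 1)
    return dict(table)


def dot_matrix_hits(query, subject, ktup=1):
    """List (query_pos, subject_pos) pairs where a word is shared (the dots)."""
    table = build_lookup(query, ktup)
    hits = []
    for s_pos in range(len(subject) - ktup + 1):
        word = subject[s_pos:s_pos + ktup]
        for q_pos in table.get(word, []):
            hits.append((q_pos, s_pos + 1))
    return hits


lookup = build_lookup("GAATTCAGTTA")
print(lookup)
print(len(dot_matrix_hits("GAATTCAGTTA", "GGATCGA")))
```

Because the table is indexed by word, each residue of the database sequence is matched against the query with one dictionary access instead of a scan over the whole query.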
  6. 7. FASTA ALGORITHM
     (a) Find runs of identical words
         Identify regions shared by the two sequences that have the highest density of single identities (ktup=1) or two consecutive identities (ktup=2).
     (b) Re-score using a PAM matrix
         The longest diagonals are scored again using the PAM-250 matrix (or another matrix). The best scores are saved as "init1" scores.
     (c) Join segments using gaps and eliminate other segments
         Long diagonals that are neighbors are joined. The score for this joined region is "initn". This score may be lower due to a gap penalty.
     (d) Use DP to create the optimal alignment
         Construct an optimal alignment of the query sequence and the library sequence (Smith-Waterman algorithm). This score is reported as the optimized score.
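Step (a) can be sketched as counting word hits per diagonal: two hits lie on the same diagonal when the difference between their query and sequence positions (the offset) is equal. This is a simplified illustration, not the FASTA program's actual code; the function name and the 0-based offset convention (query index minus sequence index) are assumptions.

```python
from collections import Counter, defaultdict


def diagonal_hit_counts(query, subject, ktup=2):
    """Count shared ktup-words per diagonal, keyed by offset = query_idx - subject_idx."""
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    counts = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            counts[i - j] += 1
    return counts


counts = diagonal_hit_counts("GAATTCAGTTA", "GGATCGA", ktup=1)
print(counts.most_common(3))
```

With ktup=1 and the example sequences from the look-up table slide, the diagonal with offset 0 collects the most hits (4 of 19), so it would be the first candidate passed on to re-scoring.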
  7. 8. FASTA Steps
     1. Local regions of identity are found (figure labels: different offset values; identical offset values in a contiguous sequence).
     2. Rescore the local regions using a PAM or BLOSUM matrix; diagonals are extended.
     3. Eliminate short diagonals below a cutoff score.
     4. Create a gapped alignment in a narrow segment and then perform a Smith-Waterman (S-W) alignment.
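Steps 2 and 3 can be sketched as finding the best-scoring ungapped segment along one diagonal and then discarding diagonals whose best segment falls below a cutoff. In this sketch a simple match/mismatch score (+5 / -4) stands in for a PAM or BLOSUM matrix, and the cutoff of 10 is arbitrary; both are assumptions for illustration, as is the function name.

```python
def best_segment_on_diagonal(query, subject, offset, match=5, mismatch=-4):
    """Best ungapped segment on the diagonal query_idx = subject_idx + offset.

    Returns (score, first_subject_idx, last_subject_idx), 0-based and inclusive.
    The match/mismatch scores are a stand-in for a real substitution matrix.
    """
    best = (0, None, None)
    running, start = 0, None
    for j in range(len(subject)):
        i = j + offset
        if not 0 <= i < len(query):
            continue  # this subject position has no partner on the diagonal
        if running <= 0:
            running, start = 0, j  # start a new candidate segment here
        running += match if query[i] == subject[j] else mismatch
        if running > best[0]:
            best = (running, start, j)
    return best


CUTOFF = 10  # arbitrary illustrative cutoff
query, subject = "GAATTCAGTTA", "GGATCGA"
for offset in (0, 4):
    score, first, last = best_segment_on_diagonal(query, subject, offset)
    print(offset, score, "kept" if score >= CUTOFF else "eliminated")
```

The running-sum scan is the standard maximum-subarray technique, so each diagonal is re-scored in time proportional to its length.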
  8. 10. OUTPUT: HIT LIST
  9. 11. ALIGNMENT OF QUERY TO A HIT
  10. 12. VERSIONS OF FASTA
      • FASTA: nucleotide or protein sequence searching.
      • FASTX / FASTY: compare a translated DNA query sequence to a protein sequence database (forward or backward translation of the query).
      • TFASTX / TFASTY: compare a protein query sequence to a DNA sequence database that has been translated into three forward and three reverse reading frames.
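The "three forward and three reverse reading frames" mentioned for the translated searches can be illustrated with a short six-frame translation sketch. It uses the standard genetic code and only shows the translation step; it is not how the FASTA programs themselves are implemented, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate complete codons starting at 0-based offset `frame`."""
    codons = (dna[p:p + 3] for p in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)  # X = unknown


def six_frame_translation(dna):
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


print(six_frame_translation("ATGGCCTAA"))
```

Each of the six protein strings can then be compared against a protein query with the same word look-up and diagonal scoring used for ordinary protein searches.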
  11. 13. Application
      • Quality control and preprocessing of metagenomic datasets.
      • CENTROIDFOLD: a web server for RNA secondary structure prediction.
  12. 14. BIBLIOGRAPHY
      1) Schmieder R, Edwards R. Epub 2011 Mar 15;27(6):863-4.
      2) Sato K, Hamada M, Asai K, Mituyama T. Epub 2009 Jul;1:W277-80.
      3) Chawla HS. Plant biotechnology, 3rd edition, 2010. Genomics and bioinformatics.
