  • GENERAL SEMINAR TOPIC: IMPORTANCE OF FASTA AS A BIOINFORMATIC TOOL PRESENTED BY: MAYANK SRIVASTAVA
  • INTRODUCTION
    • FASTA (Pearson and Lipman, 1988)
    • A fast sequence-searching algorithm for comparing a query sequence against a database.
    • FASTA (pronounced FAST-AYE) stands for FAST-All, reflecting the fact that it can be used for a fast protein comparison or a fast nucleotide comparison.
    • The program achieves a high level of sensitivity for similarity searching at high speed.
  • HEURISTIC METHODS
    • Using an approximate (or heuristic) search method does not mean that the search algorithm will find a wrong solution.
    • If a solution is found, that solution is guaranteed to be valid, but it may not be optimal.
    • Problem of dynamic programming
    • Dynamic programming (DP) computes scores over large areas of the matrix that do not contribute to the optimal alignment.
    • FASTA focuses on the diagonal area.
    DP score matrix (query across the top, sequence down the side):
            G A A T T C A G T T A
        G   1 1 1 1 1 1 1 1 1 1 1
        G   1 1 1 1 1 1 1 2 2 2 2
        A   1 2 2 2 2 2 2 2 2 2 2
        T   1 2 2 3 3 3 3 3 3 3 3
        C   1 2 2 3 3 4 4 4 4 4 4
        G   1 2 2 3 3 4 4 5 5 5 5
        A   1 2 3 3 3 4 5 5 5 5 6
    • Dynamic programming requires on the order of N²L computations (where N is the size of the query sequence and L is the size of the database).
  •  
    • Use a look-up table
    • Query : G A A T T C A G T T A
    • Sequence: G G A T C G A
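    Look-up table (query positions are numbered 1 to 11):
        Q   Location
        A   2, 3, 7, 11
        C   6
        G   1, 8
        T   4, 5, 9, 10
    Dot-matrix (* marks an identity; query across the top, sequence down the side):
            G A A T T C A G T T A
        G   * . . . . . . * . . .
        G   * . . . . . . * . . .
        A   . * * . . . * . . . *
        T   . . . * * . . . * * .
        C   . . . . . * . . . . .
        G   * . . . . . . * . . .
        A   . * * . . . * . . . *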
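The look-up table and dot-matrix above can be reproduced with a short Python sketch. The function names `build_lookup` and `find_hits` are hypothetical, chosen only for this illustration; they are not part of the FASTA program. Positions are 1-based to match the slide.

```python
from collections import defaultdict

def build_lookup(query, ktup=1):
    """Map each word of length ktup to its 1-based start positions in the query."""
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i + 1)
    return dict(table)

def find_hits(table, sequence, ktup=1):
    """Return (query_pos, sequence_pos) pairs: the dots of the dot-matrix."""
    hits = []
    for j in range(len(sequence) - ktup + 1):
        word = sequence[j:j + ktup]
        # One table look-up per word replaces a scan of the whole query.
        for i in table.get(word, []):
            hits.append((i, j + 1))
    return hits

query = "GAATTCAGTTA"
sequence = "GGATCGA"
table = build_lookup(query)
hits = find_hits(table, sequence)

for base in sorted(table):
    print(base, table[base])
print(len(hits), "identities in the dot-matrix")
```

Each word of the sequence costs a single dictionary look-up, so the dots are found without comparing every sequence position against every query position.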
  • FASTA ALGORITHM
    • (a) Find runs of identical words
    • Identify regions shared by the two sequences that have the highest density of single identities (ktup=1) or two consecutive identities (ktup=2).
    • (b) Re-score using a PAM matrix
    • The longest diagonals are scored again using the PAM-250 matrix (or another matrix). The best scores are saved as "init1" scores.
    • (c) Join segments using gaps and eliminate other segments
    • Long diagonals that are neighbors are joined. The score for this joined region is "initn". This score may be lower due to a penalty for a gap.
    • (d) Use DP to create the optimal alignment
    • Construct an optimal alignment of the query sequence and the library sequence (SW algorithm). This score is reported as the optimized score.
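Step (a) can be illustrated with a minimal Python sketch that counts word identities per diagonal. This is a simplification under stated assumptions: it only counts hits per diagonal offset and does not re-score with a PAM matrix or join diagonals as steps (b) and (c) describe. The function name `diagonal_counts` is hypothetical.

```python
from collections import Counter

def diagonal_counts(query, subject, ktup=1):
    """Count shared ktup-words on each diagonal.

    A diagonal is identified by its offset: query index minus subject index.
    """
    positions = {}
    for i in range(len(query) - ktup + 1):
        positions.setdefault(query[i:i + ktup], []).append(i)
    counts = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            counts[i - j] += 1
    return counts

# Sequences taken from the look-up table slide.
counts = diagonal_counts("GAATTCAGTTA", "GGATCGA", ktup=1)
best_offset, best_hits = counts.most_common(1)[0]
print("densest diagonal: offset", best_offset, "with", best_hits, "identities")
```

Diagonals with many identities are the candidates that the later steps re-score and join; a larger ktup gives fewer, more specific hits.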
  • FASTA Steps
    • 1. Local regions of identity are found (figure labels: different offset values; identical offset values in a contiguous sequence).
    • 2. Rescore the local regions using a PAM or BLOSUM matrix. Diagonals are extended.
    • 3. Eliminate short diagonals below a cutoff score.
    • 4. Create a gapped alignment in a narrow segment and then perform S-W alignment.
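The S-W (Smith-Waterman) step can be sketched as follows. This is an illustrative version only: it fills the full matrix rather than a narrow band, and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions for the example, not the PAM or BLOSUM scores and gap penalties that FASTA uses.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Sequences from the look-up table slide; the score depends on the assumed parameters.
print(smith_waterman_score("GAATTCAGTTA", "GGATCGA"))
```

Keeping only two rows is enough to obtain the score; recovering the alignment itself would need the full matrix or a traceback structure.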
  • OUTPUT: HIT LIST
  • ALIGNMENT OF QUERY TO A HIT
  • VERSIONS OF FASTA
    • FASTA: nucleotide or protein sequence searching.
    • FASTX / FASTY: compare a translated DNA query sequence to a protein sequence database (forward or backward translation of the query).
    • TFASTX / TFASTY: compare a protein query sequence to a DNA sequence database that has been translated into three forward and three reverse reading frames.
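The three forward and three reverse reading frames mentioned above can be sketched in Python. This is a minimal illustration assuming the standard genetic code and an input containing only A, C, G and T; the helper names are hypothetical, and the real programs also handle frameshifts, which this sketch does not.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frames(dna):
    """Return translations of frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]

frames = six_frames("ATGGCCTAA")
print(frames)
```

A protein query can then be compared against each of the six translations, which is the idea behind searching a translated DNA database.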
  • Application
    • Quality control and preprocessing of metagenomic datasets.
    • CENTROIDFOLD: a web server for RNA secondary structure prediction.
  • BIBLIOGRAPHY
    • 1) Schmieder R, Edwards R / Epub / 2011 Mar 15;27(6):863-4.
    • 2) Sato K, Hamada M, Asai K, Mituyama T / Epub / 2009 Jul;1:W277-80.
    • 3) H.S. Chawla / Plant Biotechnology, 3rd edition, 2010 / Genomics and bioinformatics.