Sequence comparison techniques
Heuristic approach: BLAST and FASTA

Statistics

Views

Total Views
2,363
Views on SlideShare
2,363
Embed Views
0

Actions

Likes
2
Downloads
104
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Sequence comparison techniques Sequence comparison techniques Presentation Transcript

    • Sequence comparison technique
      Ms. Ruchi Yadav, Lecturer, Amity Institute of Biotechnology, Amity University, Lucknow (UP)
    • Sequence comparison technique
      Pairwise Alignment
      Local Alignment (Smith-Waterman Algorithm)
      Global Alignment (Needleman-Wunsch Algorithm)
      Multiple Alignment
      Heuristic Methods
      Rather than struggling to find the optimal alignment, we can save a lot of time by using heuristic algorithms.
      Execution time is much faster.
      They may completely miss the optimal alignment.
      Examples: FASTA and BLAST
    • [Figure: dynamic-programming matrix of the sequence GGATCGA (rows) against the query GAATTCAGTTA (columns)]
      Heuristic Methods
      Problem with dynamic programming
      DP computes scores over a large area of the matrix that is useless for finding the optimal alignment.
      FASTA focuses on the diagonal area.
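To make the cost of full DP concrete, here is a minimal sketch that fills every cell of a table for the two example sequences used in this deck (GGATCGA and GAATTCAGTTA). It assumes a simple longest-common-subsequence recurrence (+1 per match, no gap penalty) rather than a substitution-matrix score; the function name `lcs_table` is hypothetical.

```python
def lcs_table(a: str, b: str) -> list[list[int]]:
    """Fill every cell of the (len(a)+1) x (len(b)+1) table using the
    longest-common-subsequence recurrence (+1 per match, no gap penalty)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1  # extend a match diagonally
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


query = "GAATTCAGTTA"
sequence = "GGATCGA"
table = lcs_table(sequence, query)
for residue, row in zip(sequence, table[1:]):
    print(residue, row[1:])
print("cells computed:", len(sequence) * len(query))
print("best score:", table[-1][-1])
```

All 7 × 11 = 77 cells are computed no matter where the best path lies, which is the work FASTA tries to avoid.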
    • Heuristic
      A good local alignment should contain some exact-match subsequences.
      FASTA focuses on these regions.
    • Heuristic Methods: FASTA and BLAST
      FASTA
      The first fast sequence-searching algorithm for comparing a query sequence against a database.
      BLAST
      Basic Local Alignment Search Tool
      Improves on FASTA in search speed, ease of use, and statistical rigor.
    • FASTA ALGORITHM
      (a) Find runs of identical words.
      Identify the regions shared by the two sequences that have the highest density of single identities (ktup=1) or of two consecutive identities (ktup=2).
      (b) Re-score using a PAM matrix.
      The longest diagonals are rescored using the PAM250 matrix (or another matrix). The best score is saved as the “init1” score.
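A simplified sketch of step (b): score each diagonal residue by residue and keep its best ungapped run (a maximum-subarray scan). Assumptions: hypothetical toy DNA scores (+5 match, -4 mismatch) stand in for PAM250, and every diagonal is scanned instead of only the best ones from step (a); the function names are illustrative.

```python
def pair_score(x: str, y: str) -> int:
    # Hypothetical toy scores standing in for PAM250 (proteins).
    return 5 if x == y else -4


def best_run_on_diagonal(query: str, subject: str, offset: int) -> tuple[int, int, int]:
    """Best ungapped segment on the diagonal subject_index = query_index + offset.

    Returns (score, first query index, last query index), 0-based;
    (0, -1, -1) if nothing scores above zero.
    """
    best = (0, -1, -1)
    running, run_start = 0, 0
    for i in range(len(query)):
        j = i + offset
        if not 0 <= j < len(subject):
            continue  # this query position has no partner on this diagonal
        if running <= 0:  # a non-positive prefix never helps: restart here
            running, run_start = 0, i
        running += pair_score(query[i], subject[j])
        if running > best[0]:
            best = (running, run_start, i)
    return best


query, subject = "GAATTCAGTTA", "GGATCGA"
offsets = range(-(len(query) - 1), len(subject))
init1 = max(best_run_on_diagonal(query, subject, d)[0] for d in offsets)
print("init1-style score:", init1)
```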
    • FASTA Algorithm
      [Figure: “init1” region, ktup = 2]
    • FASTA ALGORITHM
      (c) Join segments using gaps and eliminate other segments.
      Long diagonals that are neighbors are joined. The score of this joined region is “initn”. This score may be lower than the sum of the joined segments' scores because of the gap penalty.
      (d) Use DP to create the optimal alignment.
      Construct an optimal alignment of the query sequence and the library sequence (Smith-Waterman algorithm). This score is reported as the optimized score.
    • FASTA Alignments
      [Figure: joined regions forming the “initn” score]
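Step (c) can be sketched as a small chaining DP over diagonal regions: a region may follow another only if it starts after the previous one ends in both sequences, and each join costs a gap penalty. The `Region` type, `initn_score`, and the penalty values are hypothetical; the two example regions are the GA and TC runs of the example sequences under toy +5/-4 scoring.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """An ungapped diagonal region; 0-based, inclusive coordinates."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int


def initn_score(regions: list[Region], join_penalty: int) -> int:
    """Best total score of a chain of non-overlapping regions, paying
    join_penalty for every gap between consecutive regions."""
    ordered = sorted(regions, key=lambda r: r.q_start)
    chain = []  # chain[i] = best chain score ending with ordered[i]
    for i, region in enumerate(ordered):
        best_prev = 0  # 0 means "start a new chain at this region"
        for j in range(i):
            prev = ordered[j]
            if prev.q_end < region.q_start and prev.s_end < region.s_start:
                best_prev = max(best_prev, chain[j] - join_penalty)
        chain.append(region.score + best_prev)
    return max(chain, default=0)


regions = [Region(0, 1, 1, 2, 10), Region(4, 5, 3, 4, 10)]
print(initn_score(regions, join_penalty=5))   # 15: joining pays off
print(initn_score(regions, join_penalty=12))  # 10: the gap costs more than it gains
```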
    • FASTA Algorithm: Find Runs of Identical Words
      A look-up table showing the positions of each word of length k (k-tuple) is constructed for each sequence.
      The relative position (offset) of each word in the two sequences is then calculated by subtracting its position in the first sequence from its position in the second.
      Words that have the same offset are in phase and reveal a region of alignment between the two sequences.
    • Look-up table
    • FASTA Algorithm: Use a Look-up Table
      Query:    G A A T T C A G T T A   (positions 1 to 11)
      Sequence: G G A T C G A
      [Figure: dot matrix of the query against the sequence]
      Look-up table for the query (ktup = 1):
      Residue   Positions in query
      A         2, 3, 7, 11
      C         6
      G         1, 8
      T         4, 5, 9, 10
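The look-up table and offset rule described above can be sketched as follows, using the example query and sequence. Positions are 1-based to match the table; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_lookup(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-tuple in seq to its 1-based start positions."""
    table = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        table[seq[pos:pos + k]].append(pos + 1)
    return dict(table)


def offset_counts(query: str, subject: str, k: int) -> Counter:
    """Count k-tuple matches per offset (subject position - query position)."""
    lookup = build_lookup(query, k)
    counts = Counter()
    for s_pos in range(1, len(subject) - k + 2):
        for q_pos in lookup.get(subject[s_pos - 1:s_pos - 1 + k], []):
            counts[s_pos - q_pos] += 1  # same offset = same diagonal
    return counts


query, subject = "GAATTCAGTTA", "GGATCGA"
print(build_lookup(query, 1))
print(offset_counts(query, subject, 1).most_common(3))
```

With ktup = 1, offset 0 collects the most identities (4), so that diagonal has the highest density of matches.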
    • FASTA Algorithm
      Use dynamic programming in a restricted band around the best-scoring alignment to find an alignment that scores higher than that best-scoring alignment.
      The width of this band is a parameter.
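A minimal sketch of banded Smith-Waterman: only cells whose diagonal (subject index minus query index) lies within `width` of `center` are computed. Assumptions: `center` would come from the best initial region, the scores (+2 match, -1 mismatch, -2 per gap) are hypothetical, and a simple linear gap penalty is used (FASTA's own scoring is more elaborate).

```python
def banded_smith_waterman(a: str, b: str, center: int, width: int,
                          match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment score computed only for cells with
    center - width <= j - i <= center + width.

    Returns (best score, number of cells computed). Cells outside the band
    stay 0, which the local recurrence treats as a fresh start.
    """
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = cells = 0
    for i in range(1, len(a) + 1):
        lo = max(1, i + center - width)
        hi = min(len(b), i + center + width)
        for j in range(lo, hi + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
            cells += 1
    return best, cells


query, subject = "GAATTCAGTTA", "GGATCGA"
narrow = banded_smith_waterman(query, subject, center=0, width=2)
full = banded_smith_waterman(query, subject, center=0, width=len(query) + len(subject))
print("band of width 2:", narrow)
print("whole matrix:  ", full)
```

The band of half-width 2 computes 32 cells instead of 77; the price is that an alignment wandering outside the band can be missed.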
    • FASTA - Complexity
      Steps 1 and 2 (select the best 10 diagonal runs)
      Let n be the length of a database sequence and m the length of the query.
      O(n), because Step 1 only uses a look-up table.
      O(n) << O(mn) for m, n = 100 to 200
    • FASTA - Complexity
      Computing partial DP depends on the size of the restricted area, so it costs < O(mn).
      Therefore, FASTA is faster than full DP.
      The width of this band is a parameter.
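A rough count with the slide's upper length (m = n = 200) shows the gap; the band half-width w = 8 is a hypothetical value, and the look-up scan also does extra work per hit found.

```latex
\begin{aligned}
\text{full DP:} \quad & mn = 200 \times 200 = 40\,000 \text{ cells} \\
\text{look-up scan (Steps 1 and 2):} \quad & \approx n = 200 \text{ positions} \\
\text{banded DP, half-width } w = 8\text{:} \quad & \le (2w + 1)\,m = 17 \times 200 = 3\,400 \text{ cells}
\end{aligned}
```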
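    • Step 1: Finding Seeds
      [Figure: dot plot, axes t and s]
    • Step 2: Re-scoring Segments, Keeping Top 10
      [Figure: dot plot, axes t and s]
    • Step 3: Eliminating Unlikely Segments
      [Figure: dot plot, axes t and s]
    • Step 4: Finding the Best Alignment
      [Figure: dot plot, axes t and s]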
    • Versions of FASTA
      FASTA compares a query protein sequence to a protein sequence library to find similar sequences. FASTA also compares a DNA sequence to a DNA sequence library.
      TFASTA compares a query protein sequence to a DNA sequence library, after translating the DNA sequence library in all six reading frames.
      FASTX and FASTY translate a query DNA sequence in all three forward reading frames and compare all three frames to a protein sequence database.
      TFASTX and TFASTY compare a query protein sequence to a DNA sequence database, translating each DNA sequence in all six possible reading frames.
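The "six reading frames" used by TFASTA, TFASTX, and TFASTY are the three frames of the DNA strand plus the three frames of its reverse complement. A minimal sketch that splits a sequence into the codons of each frame, assuming uppercase A/C/G/T input; translating codons to amino acids (a codon table) is omitted, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> list[list[str]]:
    """Codons of the three forward frames, then the three reverse-complement
    frames. Incomplete trailing codons are dropped."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for start in range(3):
            frames.append([strand[i:i + 3] for i in range(start, len(strand) - 2, 3)])
    return frames


for frame in six_frames("ATGGCCTAA"):
    print(frame)
```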
    • BLAST
      Publications:
      Ungapped BLAST – Altschul et al., 1990
      Gapped BLAST, PSI-BLAST - Altschul et al., 1997
      Basic Local Alignment Search Tool
      Altschul et al. 1990, 1994, 1997
      Heuristic method for local alignment
      Designed specifically for database searches
      Based on the same assumption as FASTA: good alignments contain short stretches of exact matches
    • Basic Local Alignment Search Tool (BLAST)
      Input:
      Query (target) sequence: DNA, RNA, or protein
      Scoring scheme: gap penalties, a substitution matrix for proteins, identity/mismatch scores for DNA/RNA
      Word length W: typically W = 3 for proteins and W = 11 for DNA/RNA
      Output:
      Statistically significant matches
    • BLAST ALGORITHM PARAMETERS
    • Algorithm of BLAST
      There are three distinct steps:
      Step 1: Query preprocessing
      Step 2: Scan the database for hits
      Step 3: Extension of hits
    • BLAST - Algorithm
      Step 1: Query preprocessing
      Create neighborhood words for each query word.
      A query of length L contains at most L - w + 1 distinct words of length w.
      [Figure: a query word and its neighborhood words]
    • BLAST - Algorithm
      Step 1: Query preprocessing
      Build a list of words of length 3 for proteins (word length 11 is used for DNA sequences).
    • BLAST -Query preprocessing
      Compile the list of high-scoring words from the query.
      The query word length is 3.
      Words scoring below the threshold are not pursued further.
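A minimal sketch of query preprocessing: list the L - w + 1 overlapping query words, then, for a word, enumerate every word over the alphabet whose score reaches the threshold. Assumptions: a DNA alphabet with hypothetical +2/-1 scores stands in for a protein substitution matrix such as BLOSUM62, and all names are illustrative.

```python
from itertools import product


def query_words(query: str, w: int) -> list[str]:
    """All overlapping words of length w; a query of length L has L - w + 1."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word: str, alphabet: str, score, threshold: int) -> list[str]:
    """Every word over `alphabet` that scores at least `threshold` against `word`."""
    return [
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(word))
        if sum(score(x, y) for x, y in zip(word, candidate)) >= threshold
    ]


def toy_score(x: str, y: str) -> int:
    # Hypothetical +2 / -1 scores; protein BLAST would use a matrix such as BLOSUM62.
    return 2 if x == y else -1


words = query_words("GAATTCAGTTA", 3)
print(len(words), words[:3])
print(neighborhood("GAT", "ACGT", toy_score, threshold=3))
```

With threshold 3, the neighborhood of GAT is the word itself plus its nine single-mismatch variants; raising the threshold shrinks the list and speeds up the scan.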
    • BLAST - Algorithm
      Step 2: Scan the database for hits
      For each word in the list, identify all exact matches in the database sequences.
      [Figure: Step 1 builds the neighborhood word list for a query word; Step 2 finds exact matches to those words in the database sequences (Sequence 1, Sequence 2)]
      The purpose of Steps 1 and 2 is the same as in FASTA.
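A minimal sketch of the database scan: index the accepted words by the query position they came from, then slide a window of length w along each database sequence and record every exact hit. The input shapes (a dict of word lists per query position, a dict of named sequences) and all names are hypothetical.

```python
from collections import defaultdict


def scan_database(word_lists: dict[int, list[str]], database: dict[str, str], w: int):
    """word_lists maps a 0-based query position to the words accepted for it.
    Returns (sequence name, query position, subject position) for every exact hit."""
    index = defaultdict(list)
    for q_pos, words in word_lists.items():
        for word in words:
            index[word].append(q_pos)
    hits = []
    for name, seq in database.items():
        for s_pos in range(len(seq) - w + 1):
            for q_pos in index.get(seq[s_pos:s_pos + w], []):
                hits.append((name, q_pos, s_pos))
    return hits


word_lists = {0: ["GAA", "GAT"], 3: ["TTC"]}  # e.g. GAA plus one neighbor, and TTC
database = {"seq1": "GGATCGA", "seq2": "CTTCAA"}
print(scan_database(word_lists, database, w=3))
```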
    • Step3:Extension of the hits
      Every hit is now extended in both directions, without gaps, to determine whether it is part of a longer segment pair with a higher score.
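The slide does not say when an extension stops. A common rule, used here as an assumption, is an X-drop cutoff: keep extending while the running score stays within `x_drop` of the best score seen, then keep the best end point. A minimal sketch with hypothetical +2/-1 scores and illustrative names:

```python
def _best_extension(pairs, score, x_drop):
    """Scan (x, y) residue pairs outward; return (best gain, pairs used)."""
    best = run = best_len = 0
    for length, (x, y) in enumerate(pairs, start=1):
        run += score(x, y)
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break  # fell too far below the best: stop this direction
    return best, best_len


def extend_hit(query, subject, q_pos, s_pos, w, score, x_drop):
    """Ungapped extension of a word hit (0-based starts, word length w).
    Returns (score, q_start, q_end_exclusive, s_start)."""
    seed = sum(score(query[q_pos + k], subject[s_pos + k]) for k in range(w))
    right = zip(query[q_pos + w:], subject[s_pos + w:])
    left = zip(reversed(query[:q_pos]), reversed(subject[:s_pos]))
    gain_r, len_r = _best_extension(right, score, x_drop)
    gain_l, len_l = _best_extension(left, score, x_drop)
    return seed + gain_l + gain_r, q_pos - len_l, q_pos + w + len_r, s_pos - len_l


def toy_score(x: str, y: str) -> int:
    return 2 if x == y else -1  # hypothetical match/mismatch scores


# The word hit "AT" at query index 2 and subject index 2.
print(extend_hit("GAATTCAGTTA", "GGATCGA", 2, 2, 2, toy_score, x_drop=2))
```

With `x_drop=2` the hit grows leftward to GAAT/GGAT (score 5); with `x_drop=0` the first mismatch in each direction stops the extension and the score stays at the seed's 4.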
    • Step3:Extension of the hits
      HSP (High-scoring Segment Pair)
      If the extended segment pair has a score greater than or equal to S (set as a program parameter), it is called an HSP.
      MSP (Maximal Segment Pair)
      In a comparison, the best-scoring HSP for each database sequence is called the MSP.
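The two definitions translate directly into a filter and a per-sequence maximum. A minimal sketch, assuming extended hits arrive as (sequence name, score) pairs; the function name and the data are hypothetical.

```python
def select_hsps(segment_pairs: list[tuple[str, int]], S: int):
    """Pairs scoring >= S are HSPs; the best HSP of each sequence is its MSP."""
    hsps = [(name, sc) for name, sc in segment_pairs if sc >= S]
    msps: dict[str, int] = {}
    for name, sc in hsps:
        msps[name] = max(sc, msps.get(name, sc))
    return hsps, msps


pairs = [("seq1", 5), ("seq1", 12), ("seq2", 3), ("seq3", 9)]
hsps, msps = select_hsps(pairs, S=5)
print(hsps)  # [('seq1', 5), ('seq1', 12), ('seq3', 9)]
print(msps)  # {'seq1': 12, 'seq3': 9}
```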
    • High-Scoring Segment Pair (HSP)
    • Maximal Segment Pair (MSP)
    • Step 2: Extracting Seeds
      [Figure: dot plot, axes t and s]
    • Step 3: Finding HSPs
      [Figure: dot plot, axes t and s]
    • Step 4: Combining HSPs
      [Figure: dot plot, axes t and s]
    • BLAST
    • Basic BLAST
    • Specialized BLAST
      • Make specific primers with Primer-BLAST
      • Search trace archives
      • Find conserved domains in your sequence (cds)
      • Find sequences with similar conserved domain architecture (cdart)
      • Search sequences that have gene expression profiles (GEO)
      • Search immunoglobulins (IgBLAST)
      • Search for SNPs (snp)
      • Screen sequence for vector contamination (vecscreen)
      • Align two (or more) sequences using BLAST (bl2seq)
      • Search protein or nucleotide targets in PubChem BioAssay
      • Search SRA transcript and genomic libraries
      • Constraint Based Protein Multiple Alignment Tool
      • Needleman-Wunsch Global Sequence Alignment Tool
    • BLAST DATABASES
    • Databases available on BLAST Web server
    • Databases available on BLAST Web server
    • Options and parameter settings available on the BLAST server