Sequence Alignment, BLAST, FASTA, MSA

Similarity searches (BLAST, FASTA, sequence alignments) and pairwise sequence alignments for the AcSIR Ph.D. course work

  • A series of methods that rely on pairwise alignments

    1. 1. Sucheta Tripathy, 16th November 2012
    2. 2. A protein sequence from species A
          ◦ Which species has the most similar protein?
          ◦ Where did it originate?
          ◦ What is its putative function?
          ◦ Does it contain a conserved motif, etc.?
    3. 3. • BLAST (Basic Local Alignment Search Tool)
            ◦ NCBI BLAST
            ◦ WU-BLAST
            ◦ PSI-BLAST
          • FASTA
          • SSEARCH
    4. 4. • Heuristic (an educated guess)
          • Does not compare the sequences over their entire length
          • Quickly locates short matches (seeds)
          • Word size determines the seed length
          • Seeds are extended in both directions
          • A score threshold is defined
            ◦ Score > threshold -> keep the alignment
            ◦ Score < threshold -> discard the alignment
    5. 5. GLKFA with word size 3 -> GLK, LKF, KFA
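
The word list on this slide can be reproduced with a sliding window. This is a minimal sketch; the function name `words` and its default word size of 3 are illustrative choices, not something the slides prescribe.

```python
def words(sequence: str, word_size: int = 3) -> list[str]:
    """Return every overlapping word (k-mer) of length word_size."""
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


print(words("GLKFA"))  # ['GLK', 'LKF', 'KFA']
```

A sequence of length L yields L - k + 1 words of size k, so the five-residue example gives three words.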
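    6. 6. • A query sequence
            ◦ Nucleotide
            ◦ Protein
          • A target database
            ◦ Nucleotide
            ◦ Protein
          • BLAST program
            ◦ blastn (nucleotide query against a nucleotide database)
            ◦ blastp (protein query against a protein database)
            ◦ tblastx (translated nucleotide query against a translated nucleotide database; the slowest)
            ◦ tblastn (protein query against a translated nucleotide database)
            ◦ blastx (translated nucleotide query against a protein database)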
    7. 7. • E-value -> the number of hits with at least this score expected to occur by chance
          • Score -> similarity score
            ◦ Analogy: the chance of rain is 0.001
            ◦ Analogy: passing by chance, etc.
            ◦ The lower the E-value, the more significant the alignment
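
One common way to relate the score and the E-value is the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The sketch below only illustrates that relationship. The default values of `lam` and `k` are assumptions chosen for illustration; real BLAST derives them from the scoring matrix and gap penalties and also uses length-corrected ("effective") values of m and n.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int,
                 lam: float = 0.267, k: float = 0.041) -> float:
    """Expected number of chance hits scoring at least raw_score.

    lam and k are illustrative defaults (assumptions), not values
    taken from the slides.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Higher scores give smaller E-values; larger databases give larger ones.
for s in (30, 60, 90):
    print(s, expect_value(s, query_len=300, db_len=1_000_000))
```

Because the score sits in the exponent, a modest increase in score lowers the E-value by orders of magnitude, while doubling the database size only doubles it.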
    8. 8. • Remove low-complexity regions
          • Generate all the k-mers
          • List all possible matching words
            - BLAST keeps only high-scoring pairs
            - FASTA stores all pairs irrespective of their scores
          • Extend the matches into high-scoring segment pairs (HSPs)
          • Evaluate the results against the thresholds set
          • Extend the HSPs and join them together
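
The seed-and-extend idea in these steps can be sketched as follows. This is a deliberately simplified toy, not BLAST itself: it uses exact word matches as seeds (BLAST also accepts similar "neighborhood" words), ungapped extension with a simple drop-off rule, no low-complexity filtering, and no joining of HSPs. The function names and the parameters `x_drop` and `min_score` are hypothetical.

```python
from collections import defaultdict


def _extend(query, target, qi, tj, step, match, mismatch, x_drop):
    """Extend without gaps in one direction; return (best_gain, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= tj < len(target):
        score += match if query[qi] == target[tj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
        qi += step
        tj += step
    return best, best_len


def find_hsps(query, target, word_size=3, match=1, mismatch=-1,
              x_drop=2, min_score=4):
    """Return HSPs as (query_start, target_start, length, score), best first."""
    # Index every word of the query by its start position(s).
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    hsps = set()
    for j in range(len(target) - word_size + 1):
        for i in index.get(target[j:j + word_size], []):
            right, r_len = _extend(query, target, i + word_size,
                                   j + word_size, 1, match, mismatch, x_drop)
            left, l_len = _extend(query, target, i - 1, j - 1, -1,
                                  match, mismatch, x_drop)
            total = word_size * match + left + right
            if total >= min_score:  # threshold: keep or discard
                hsps.add((i - l_len, j - l_len,
                          word_size + l_len + r_len, total))
    return sorted(hsps, key=lambda h: -h[3])


print(find_hsps("ACGTACGT", "TTACGTACGTTT")[0])  # (0, 2, 8, 8)
```

Several seeds on the same diagonal extend to the same segment, which is why the results are collected in a set before sorting.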
    9. 9. ATGGGGCGAGGCAGCGGCACCTTCGAGCGTCTCCTAGACAAGGCGACCAGCCAGCTCCTGTTGGAGACAGATTGGGAGTCCATTTTGCAGATCTGCGACCTGATCCGCCAAGGGGACACACAAGCAAAATATGCTGTGAATTCCATCAAGAAGAAAGTCAACGACAAGAACCCACACGTCGCCTTGTATGCCCTGGAGGTCATGGAATCTGTGGTAAAGAACTGTGGCCAGACAGTTCATGATGAGGTGGCCAACAAGCAGACCATGGAGGAGCTGAAGGACCTGCTGAAGAGACAAGTGGAGGTAAACGTCCGTAACAAGATCCTGTACCTGATCCAGGCCTGGGCGCATGCCTTCCGGAACGAGCCCAAGTACAAGGTGGTCCAGGACACCTACCAGATCATGAAGGTGGAGGGGCACGTCTTTCCAGAATTCAAAGAGAGCGATGCCATGTTTGCTGCCGAGAGAGCCCCAGACTGGGTGGACGCTGAGGAATGCCACCGCTGCAGGGTGCAGTTCGGGGTGATGACCCGTAAGCACCACTGCCGGGCGTGTGGGCAGATATTCTGTGGAAAGTGTTCTTCCAAGTACTCCACCATCCCCAAGTTTGGCATCGAGAAGGAGGTGCGCGTGTGTGAGCCCTGCTACGAGCAGCTGAACAGGAAAGCGGAGGGAAAGGCCACTTCCACCACTGA
    10. 10. • Dot matrix method (bioinfx.net)
            • Dynamic programming method
              ◦ Global (Needleman-Wunsch method)
              ◦ Local (Smith-Waterman method)
            • Word method or k-tuple method (heuristic)
            FTFTALILLAVAV
            FTALLLAAV
            http://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/pdf/pnas01096-
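
The two dynamic programming variants differ only in how the score table is initialised and whether cell values may drop below zero. The sketch below computes the optimal score only (no traceback) and assumes a simple match/mismatch scheme with a linear gap penalty; those scoring values are illustrative assumptions, and a substitution matrix with affine gaps would normally be used for proteins.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score: global (Needleman-Wunsch) by default,
    local (Smith-Waterman) when local=True."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell

    return best if local else score[-1][-1]


# The two example sequences from the slide, scored both ways.
print(alignment_score("FTFTALILLAVAV", "FTALLLAAV"))
print(alignment_score("FTFTALILLAVAV", "FTALLLAAV", local=True))
```

Both variants fill a table of size (len(a)+1) x (len(b)+1), so time and memory grow with the product of the sequence lengths, which is the cost the heuristic word methods avoid.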
    11. 11. • Uses a neighbor-joining (NJ) guide tree
              ◦ For N sequences, the number of pairs is ½ * N! / (N-r)! with r = 2, i.e. N(N-1)/2
              ◦ 5 sequences (5,4,3,2,1) give 10 pairs:
                (5,4), (5,3), (5,2), (5,1); (4,3), (4,2), (4,1); (3,2), (3,1); (2,1)
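
The pair count on this slide can be checked by enumerating the pairs directly. A small sketch using the standard library; the function name `all_pairs` is illustrative.

```python
from itertools import combinations
from math import comb


def all_pairs(n_sequences: int) -> list[tuple[int, int]]:
    """List every unordered pair of sequence labels, numbered N down to 1."""
    labels = range(n_sequences, 0, -1)  # 5, 4, 3, 2, 1 as on the slide
    return list(combinations(labels, 2))


pairs = all_pairs(5)
print(len(pairs))  # 10
print(pairs[:4])   # [(5, 4), (5, 3), (5, 2), (5, 1)]
assert len(pairs) == comb(5, 2) == 5 * 4 // 2
```

The count grows as N(N-1)/2, so 100 sequences already require 4,950 pairwise comparisons before the guide tree can be built.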
    12. 12. PAM, BLOSUM, GONNET, DNA identity matrix, DNA PUPY matrix
    13. 13. Substitution matrices
            • Insertions and deletions are less likely than substitutions
            • An insertion or deletion in a DNA sequence can lead to a frameshift
            • PAM matrices (Point Accepted Mutation matrices), Margaret Dayhoff, 1978
              ◦ PAM1 -> expected substitution rates when 1% of the amino acids have changed
            • BLOSUM: Blocks Substitution Matrix (the number gives the % identity threshold)
    14. 14. PAM matrices are based on a simple evolutionary model
            MATLFC
            MLTLCC
            M(A/L)TL(F/C)C   Two changes. Ancestral sequence?
            • Only mutations are allowed
            • Sites evolve independently
    15. 15. Guidelines for using matrices

            Protein query length   Matrix     Open gap   Extend gap
            >300                   BLOSUM50   -10        -2
            85-300                 BLOSUM62   -7         -1
            50-85                  BLOSUM80   -16        -4
            >300                   PAM250     -10        -2
            85-300                 PAM120     -16        -4
            35-85                  MDM40      -12        -2
            <=35                   MDM20      -22        -4
            <=10                   MDM10      -23        -4

            PAM100 ==> BLOSUM90
            PAM120 ==> BLOSUM80
            PAM160 ==> BLOSUM60
            PAM200 ==> BLOSUM52
            PAM250 ==> BLOSUM45
    16. 16. Scoring matrices
            S = [sij] gives the score of aligning character i with character j, for every pair i, j.
            STPP
            CTCA
            0 + 3 + (-3) + 1 = 1
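
The column-by-column sum on this slide can be written as a short lookup. The sketch below stores only the four pair scores shown in the example, so it is not a full substitution matrix; the names `PAIR_SCORES` and `score_alignment` are illustrative.

```python
# Only the four pair scores used in the slide's example.
PAIR_SCORES = {
    ("S", "C"): 0,
    ("T", "T"): 3,
    ("P", "C"): -3,
    ("P", "A"): 1,
}


def score_alignment(seq1: str, seq2: str, scores: dict) -> int:
    """Sum s_ij over the aligned columns of two equal-length, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq1, seq2):
        # Substitution matrices are symmetric, so try both orders.
        total += scores[(a, b)] if (a, b) in scores else scores[(b, a)]
    return total


print(score_alignment("STPP", "CTCA", PAIR_SCORES))  # 1
```

With a complete matrix in place of `PAIR_SCORES`, the same function scores any gap-free alignment.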
