Sucheta Tripathy, 16th November 2012
   A protein sequence from species A
    ◦ Which species has the most similar protein?
    ◦ Where did it originate?
    ◦ What is its putative function?
    ◦ Does it contain a conserved motif, etc.?
   BLAST (Basic Local Alignment Search Tool)
    ◦ NCBI BLAST
    ◦ WU-BLAST
    ◦ PSI-BLAST
   FASTA
   SSEARCH
   Heuristic (educated guess)
   Does not compare the sequences over their entire length.
   Quickly locates short matches (seeds)
   Seed length is set by the word size
   Seeds are extended in both directions
   A score threshold is defined
    ◦ Score > threshold -> keep the alignment
    ◦ Score < threshold -> discard the alignment
GLKFA, word size 3
   GLK, LKF, KFA
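The word list for a sequence can be generated with a sliding window. The following is a minimal sketch; the function name `words` is only an illustrative choice.

```python
def words(seq: str, k: int) -> list[str]:
    """Return every overlapping word (k-mer) of length k in seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(words("GLKFA", 3))  # ['GLK', 'LKF', 'KFA']
```

A sequence of length n yields n - k + 1 words, so GLKFA with word size 3 gives three words.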
   A Query sequence:
    ◦ Nucleotide
    ◦ Protein
   A Target Database
    ◦ Nucleotide
    ◦ Protein
   Blast Program
    ◦ Blastn
    ◦ Blastp
    ◦ tBlastx (slowest: translated nucleotide query against a
      translated nucleotide database)
    ◦ tBlastn (protein query against a translated nucleotide database)
    ◦ Blastx (translated nucleotide query against a protein database)
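The choice of program follows from the type of the query and the type of the database. A small lookup sketch of the list above; the dictionary and the function name `candidate_programs` are hypothetical helpers, not part of any BLAST distribution.

```python
# (query type, database type) -> programs from the list above
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("protein", "nucleotide"): ["tblastn"],
    ("nucleotide", "protein"): ["blastx"],
}


def candidate_programs(query_type: str, db_type: str) -> list[str]:
    """Return the BLAST programs that accept this query/database combination."""
    return PROGRAMS[(query_type, db_type)]


print(candidate_programs("protein", "nucleotide"))  # ['tblastn']
```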
   E value -> Number of hits with a score at least this
    high that are expected to occur by chance
   Score -> Similarity score.
    ◦ Compare: probability of rain by chance is 0.001
    ◦ Compare: passing by chance, etc.
    ◦ The lower the E value, the more significant the
      alignment.
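For reference, the relationship between score and E value in BLAST statistics is usually written with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The sketch below assumes that formula; the default values of `k` and `lam` are illustrative placeholders and do not come from these slides, since the real values depend on the scoring matrix and gap penalties.

```python
import math


def expected_hits(score: float, query_len: int, db_len: int,
                  k: float = 0.13, lam: float = 0.32) -> float:
    """E value: chance hits expected with a score of at least `score`.

    k and lam are placeholder statistical parameters (assumption).
    """
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


low = expected_hits(score=40, query_len=300, db_len=1_000_000)
high = expected_hits(score=80, query_len=300, db_len=1_000_000)
print(low > high)  # a higher score gives a smaller E value
```

For very small E values the P value and the E value are nearly identical, which is why the two are often used interchangeably in practice.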
   Remove low-complexity regions
   Generate all the k-mers.
   List all possible matching words.
    - BLAST keeps only high-scoring word pairs
    - FASTA stores all pairs irrespective of their
    scores.
   Extend the matches into high-scoring segment
    pairs (HSPs)
   Evaluate the results against the thresholds set.
   Extend HSPs and join them together.
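The seed-and-extend steps above can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: it assumes exact word matches only (BLAST also considers similar "neighborhood" words for proteins), a toy +1/-1 score instead of a substitution matrix, ungapped extension, and no low-complexity filtering. The names `find_seeds`, `extend`, `find_hsps` and the `x_drop` parameter are hypothetical.

```python
from collections import defaultdict

MATCH, MISMATCH = 1, -1  # toy scores (assumption); BLAST uses a substitution matrix


def find_seeds(query, target, k):
    """Return (query_pos, target_pos) for every exact k-mer shared by both."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend(query, target, i, j, k, x_drop=2):
    """Ungapped extension of one seed in both directions.

    Extension in a direction stops once the running score has fallen
    x_drop below the best score seen so far.
    Returns (score, query_start, query_end, target_start), end exclusive.
    """
    best = k * MATCH
    q_start, q_end = i, i + k

    # Extend to the right of the seed.
    run, qi, tj = best, i + k, j + k
    while qi < len(query) and tj < len(target):
        run += MATCH if query[qi] == target[tj] else MISMATCH
        qi += 1
        tj += 1
        if run > best:
            best, q_end = run, qi
        elif best - run >= x_drop:
            break

    # Extend to the left of the seed.
    run, qi, tj = best, i - 1, j - 1
    while qi >= 0 and tj >= 0:
        run += MATCH if query[qi] == target[tj] else MISMATCH
        if run > best:
            best, q_start = run, qi
        elif best - run >= x_drop:
            break
        qi -= 1
        tj -= 1

    return best, q_start, q_end, j - (i - q_start)


def find_hsps(query, target, k=3, threshold=4):
    """Extend every seed and keep the distinct HSPs scoring >= threshold."""
    hsps = {extend(query, target, i, j, k) for i, j in find_seeds(query, target, k)}
    return sorted(h for h in hsps if h[0] >= threshold)


print(find_hsps("AAGLKFAWW", "CCGLKFACC"))  # [(5, 2, 7, 2)]
```

In this example the three seeds GLK, LKF and KFA all extend to the same segment GLKFA, so only one HSP is reported.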
ATGGGGCGAGGCAGCGGCACCTTCGAGCGTCTCCTAGACAAGGCGACCAGCCAGCTCCTGTTG
GAGACAGATTGGGAGTCCATTTTGCAGATCTGCGACCTGATCCGCCAAGGGGACACACAAGCA
AAATATGCTGTGAATTCCATCAAGAAGAAAGTCAACGACAAGAACCCACACGTCGCCTTGTATG
CCCTGGAGGTCATGGAATCTGTGGTAAAGAACTGTGGCCAGACAGTTCATGATGAGGTGGCCA
ACAAGCAGACCATGGAGGAGCTGAAGGACCTGCTGAAGAGACAAGTGGAGGTAAACGTCCGTA
ACAAGATCCTGTACCTGATCCAGGCCTGGGCGCATGCCTTCCGGAACGAGCCCAAGTACAAGG
TGGTCCAGGACACCTACCAGATCATGAAGGTGGAGGGGCACGTCTTTCCAGAATTCAAAGAGA
GCGATGCCATGTTTGCTGCCGAGAGAGCCCCAGACTGGGTGGACGCTGAGGAATGCCACCGCT
GCAGGGTGCAGTTCGGGGTGATGACCCGTAAGCACCACTGCCGGGCGTGTGGGCAGATATTCT
GTGGAAAGTGTTCTTCCAAGTACTCCACCATCCCCAAGTTTGGCATCGAGAAGGAGGTGCGCGT
GTGTGAGCCCTGCTACGAGCAGCTGAACAGGAAAGCGGAGGGAAAGGCCACTTCCACCACTGA
   Dot matrix method (bioinfx.net)
   Dynamic programming method
    ◦ Global (Needleman-Wunsch method)
    ◦ Local (Smith-Waterman method)
   Word method or k-tuple method (heuristic)
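The dot matrix method is the simplest of these to sketch: one sequence labels the rows, the other labels the columns, and a cell is marked wherever the two residues are identical. Diagonal runs of marks then indicate similar regions. The version below is a minimal illustration without the window or stringency filtering that real dot plot tools usually apply; `dot_matrix` and `render` are hypothetical names.

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """Cell [i][j] is True when residue a[i] equals residue b[j]."""
    return [[x == y for y in b] for x in a]


def render(a: str, b: str) -> str:
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    rows = ["  " + " ".join(b)]
    for x, row in zip(a, dot_matrix(a, b)):
        rows.append(x + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


print(render("FTFTALILLAVAV", "FTALLLAAV"))
```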




    FTFTALILLAVAV
    FTALLLAAV
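A global (Needleman-Wunsch) alignment of these two sequences can be sketched with dynamic programming. The scoring used here (+1 match, -1 mismatch, -1 per gap position) is an assumption chosen for simplicity; a real protein alignment would use a substitution matrix and separate gap open and extend penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("FTFTALILLAVAV", "FTALLLAAV")
print(total)
print(top)
print(bottom)
```

A local (Smith-Waterman) version differs mainly in that cell scores are never allowed to drop below zero and the traceback starts from the highest-scoring cell rather than the corner.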



http://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/pdf/pnas01096-
   Uses a neighbor-joining (NJ) guide tree.
    ◦ N sequences
      ½ * N! / (N-r)! with r = 2, i.e. N(N-1)/2 -> number of pairs
      5 sequences (5,4,3,2,1) -> 10 pairs
        (5,4), (5,3), (5,2), (5,1); (4,3), (4,2), (4,1); (3,2), (3,1); (2,1)
PAM
BLOSUM
GONNET
DNA Identity Matrix
DNA PUPY (purine/pyrimidine) matrix
   Substitution Matrices
      Insertions and deletions are less likely than
    a substitution
      An insertion or deletion in a coding DNA sequence can cause a frame
       shift.



PAM Matrices (Point Accepted Mutation Matrices)
Margaret Dayhoff, 1978

PAM1 -> Expected substitution rates when 1% of the
amino acids have changed
BLOSUM: Blocks Substitution Matrix (the number is the % identity used to cluster the blocks)
PAM matrices are based on a
   simple evolutionary model
    MATLFC          MLTLCC




          M(A/L)TL(F/C)C     Two changes
       Ancestral sequence?
• Only mutations are allowed
• Sites evolve independently
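Under this model, matrices for larger distances are obtained by applying the PAM1 mutation probabilities repeatedly, that is, by raising the PAM1 matrix to a power. The sketch below illustrates only that extrapolation step on a made-up two-letter alphabet with 1% change per step; the values in `PAM1_TOY` are not Dayhoff's data, and the real 20 x 20 matrices are additionally converted to log-odds scores.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each letter has a 1% chance of changing in one step (assumption).
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250_toy = mat_pow(PAM1_TOY, 250)
print(pam250_toy[0])  # probabilities of staying or changing after 250 steps
```

Note that 250 steps do not mean 250% of sites differ: repeated and reverse changes at the same site are included, so each row still sums to 1.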
Guidelines for using matrices


Protein Query Length   Matrix      Open Gap   Extend Gap
>300                   BLOSUM50    -10        -2
85-300                 BLOSUM62    -7         -1
50-85                  BLOSUM80    -16        -4
>300                   PAM250      -10        -2
85-300                 PAM120      -16        -4
35-85                  MDM40       -12        -2
<=35                   MDM20       -22        -4
<=10                   MDM10       -23        -4
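The BLOSUM rows of the table can be written as a simple lookup. This is a sketch only: the function name `suggest_blosum` is hypothetical, and because the table's ranges touch at 85 and 300, the boundary handling below (85 and 300 both go to BLOSUM62) is an assumption.

```python
def suggest_blosum(query_length: int):
    """Return (matrix, open_gap, extend_gap) from the BLOSUM rows of the table.

    Returns None for queries shorter than 50, where the table lists no
    BLOSUM matrix.
    """
    if query_length > 300:
        return ("BLOSUM50", -10, -2)
    if query_length >= 85:
        return ("BLOSUM62", -7, -1)
    if query_length >= 50:
        return ("BLOSUM80", -16, -4)
    return None


print(suggest_blosum(120))  # ('BLOSUM62', -7, -1)
```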

PAM100   ==>    Blosum90
PAM120   ==>    Blosum80
PAM160   ==>    Blosum60
PAM200   ==>    Blosum52
PAM250   ==>    Blosum45
Scoring Matrices
S = [sij] gives score of aligning character i
  with character j for every pair i, j.


                              STPP
                              CTCA

                               0 + 3 + (-3) + 1

                                  =1
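The same sum can be written as a lookup over aligned positions. In this sketch the `SCORES` dictionary holds only the four pair scores shown in the example above; a full matrix would have an entry for every pair of residues. The helper names are illustrative.

```python
# Pair scores taken from the worked example; a real matrix covers all pairs.
SCORES = {("S", "C"): 0, ("T", "T"): 3, ("P", "C"): -3, ("P", "A"): 1}


def pair_score(i: str, j: str) -> int:
    """Look up s_ij, treating the matrix as symmetric."""
    return SCORES[(i, j)] if (i, j) in SCORES else SCORES[(j, i)]


def alignment_score(a: str, b: str) -> int:
    """Sum the pair scores over an ungapped alignment of equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


print(alignment_score("STPP", "CTCA"))  # 0 + 3 + (-3) + 1 = 1
```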

Sequence Alignment,Blast, Fasta, MSA

Editor's Notes

  1. A series of methods that rely on pairwise alignments