Heuristic (educated guess)
Does not compare the sequences in their entirety.
Quickly locates short matches (seeds).
Word size sets the length of a seed.
Seeds are extended in both directions.
A score threshold is defined:
◦ Score > threshold -> keep the alignment
◦ Score < threshold -> discard the alignment
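The seeding idea can be sketched in a few lines of Python. This is a minimal illustration that assumes exact word matches only; the function name `find_seeds` and the example sequences are hypothetical and are not part of BLAST itself.

```python
def find_seeds(query, target, word_size=3):
    """Return (query_pos, target_pos) pairs where a word matches exactly."""
    # Index every word of the target by its start position.
    index = {}
    for j in range(len(target) - word_size + 1):
        index.setdefault(target[j:j + word_size], []).append(j)

    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


seeds = find_seeds("ACGTAC", "TTACGTT", word_size=3)
print(seeds)  # [(0, 2), (1, 3), (3, 1)]
```

Only these short hits are examined further, which is why the search avoids comparing the two sequences in full.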
A query sequence:
◦ Nucleotide
◦ Protein
A target database:
◦ Nucleotide
◦ Protein
BLAST program:
◦ blastn (nucleotide query against nucleotide database)
◦ blastp (protein query against protein database)
◦ tblastx (translated nucleotide query against translated nucleotide database; slowest)
◦ tblastn (protein query against translated nucleotide database)
◦ blastx (translated nucleotide query against protein database)
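The choice of program follows from the query and database types, which can be written as a small lookup. This is only a sketch; the function name `choose_blast_program` and the `translated_search` flag are hypothetical helpers for illustration, not part of any BLAST tool.

```python
# Program selected by (query type, database type).
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, db_type, translated_search=False):
    """Pick a BLAST program name from the query and database types."""
    if translated_search:
        # tblastx translates both sides, so both must be nucleotide.
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return PROGRAMS[(query_type, db_type)]


print(choose_blast_program("protein", "nucleotide"))  # tblastn
```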
E-value -> Expected number of hits with an equal or better score that would occur by chance in a database of this size.
Score -> Similarity score of the alignment.
◦ Analogy: rain occurring by chance with probability 0.001
◦ Analogy: passing by chance, etc.
◦ The lower the E-value, the more significant the alignment.
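The relation between score and E-value is commonly written in the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below assumes that form; the default values of `k` and `lam` are placeholders chosen for illustration, since the real values depend on the scoring system in use.

```python
import math


def expected_hits(score, query_len, db_len, k=0.13, lam=0.32):
    """E = K * m * n * exp(-lambda * S); k and lam are illustrative only."""
    return k * query_len * db_len * math.exp(-lam * score)


low_score = expected_hits(score=40, query_len=200, db_len=1_000_000)
high_score = expected_hits(score=80, query_len=200, db_len=1_000_000)

# A higher score gives a smaller E-value, i.e. fewer hits expected by chance.
print(high_score < low_score)  # True
```

Under this form the E-value also grows linearly with database size, so the same score is less significant in a larger database.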
Remove low-complexity regions.
Generate all the k-mers (words) of the query.
List all possible matching words.
◦ BLAST cares only about high-scoring pairs.
◦ FASTA stores all pairs irrespective of their scores.
Extend the matches into high-scoring segment pairs (HSPs).
Evaluate the results against the thresholds set.
Extend the HSPs and join them together.
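The extension step can be illustrated with a simplified ungapped extension. This sketch assumes a match/mismatch score of +1/-1 and a drop-off limit `x_drop`; these values and the function name `extend_seed` are hypothetical, and real BLAST scores with a substitution matrix and does considerably more.

```python
def extend_seed(query, target, qpos, tpos, word_size,
                match=1, mismatch=-1, x_drop=2):
    """Extend an exact seed without gaps; return (score, q_start, q_end)."""
    best = word_size * match          # score of the seed itself
    q_start, q_end = qpos, qpos + word_size

    # Extend to the right until the score falls x_drop below the best.
    run = best
    i, j = qpos + word_size, tpos + word_size
    while i < len(query) and j < len(target):
        run += match if query[i] == target[j] else mismatch
        i += 1
        j += 1
        if run > best:
            best, q_end = run, i
        elif best - run > x_drop:
            break

    # Extend to the left in the same way.
    run = best
    i, j = qpos - 1, tpos - 1
    while i >= 0 and j >= 0:
        run += match if query[i] == target[j] else mismatch
        if run > best:
            best, q_start = run, i
        elif best - run > x_drop:
            break
        i -= 1
        j -= 1

    return best, q_start, q_end


hsp = extend_seed("AACGTTA", "CACGTTG", qpos=2, tpos=2, word_size=3)
print(hsp)  # (5, 1, 6)

threshold = 4                 # illustrative cutoff
keep = hsp[0] >= threshold    # HSPs below the threshold are discarded
```

Here the seed `CGT` grows into the segment `ACGTT`, whose score of 5 passes the illustrative threshold.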
Substitution Matrices
Insertions and deletions are less likely than substitutions.
An insertion or deletion in a coding DNA sequence can lead to a frameshift.
PAM matrices (Point Accepted Mutation matrices)
◦ Margaret Dayhoff, 1978
◦ PAM1 -> expected substitution rates when 1% of the amino acids have changed
BLOSUM: Blocks Substitution Matrix
◦ Numbered by % identity of the sequences used to build it (e.g. BLOSUM62)
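A substitution matrix is used by summing the score of each aligned residue pair. The sketch below uses a toy matrix whose values are invented for illustration; they are not taken from any PAM or BLOSUM table, and the helper names are hypothetical.

```python
def build_toy_matrix(residues, identity=4):
    """Toy scores (NOT real PAM/BLOSUM values): identity plus two penalties."""
    matrix = {(r, r): identity for r in residues}
    matrix[("A", "L")] = -1
    matrix[("C", "F")] = -2
    return matrix


def pair_score(matrix, x, y):
    """Symmetric lookup: the score of (x, y) equals the score of (y, x)."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def score_ungapped(seq1, seq2, matrix):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(matrix, a, b) for a, b in zip(seq1, seq2))


toy = build_toy_matrix("MATLFC")
total = score_ungapped("MATLFC", "MLTLCC", toy)
print(total)  # 13
```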
PAM matrices are based on a simple evolutionary model
MATLFC
MLTLCC
M(A/L)TL(F/C)C
Two changes. Ancestral sequence?
• Only mutations are allowed
• Sites evolve independently
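The comparison of the two sequences can be reproduced with a short script that marks each differing site as ambiguous. This is a minimal sketch assuming equal-length sequences and no insertions or deletions; the function name `ambiguous_ancestor` is hypothetical.

```python
def ambiguous_ancestor(seq1, seq2):
    """Return (pattern, changes): differing sites are written as (x/y)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    parts = []
    changes = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            parts.append(a)
        else:
            # Either residue could have been the ancestral one.
            parts.append(f"({a}/{b})")
            changes += 1
    return "".join(parts), changes


pattern, changes = ambiguous_ancestor("MATLFC", "MLTLCC")
print(pattern, changes)  # M(A/L)TL(F/C)C 2
```

Because each site is handled on its own, the sketch mirrors the assumption that sites evolve independently.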