# Global alignment

Global Alignment algorithm with example and applications

### Global alignment

1. GLOBAL ALIGNMENT. Pinky Sheetal V, M.Tech Bioinformatics
2. CONTENTS: Sequence Alignment, Dynamic Programming Algorithm, Global Alignment
3. An alignment is the result of inserting gaps into the strings so that afterwards as many positions as possible coincide. X: AGGCTATCA, Y: TAGCTATCA
4. Scoring weights:
   • For a match: +m
   • For a mismatch: -s
   • For a gap: -d

   Alignment score: F = (# matches) × m - (# mismatches) × s - (# gaps) × d
5. [Diagram: a complex problem is divided into sub-problems 1, 2 and 3, which yield solutions 1, 2 and 3.]
6. GLOBAL ALIGNMENT
7. • The algorithm proposed by Needleman and Wunsch obtains the optimal alignment under a linear gap cost by computing a score for each position of the aligned sequences.
   • It is based on the dynamic programming technique.
   • For two sequences of lengths m and n, it defines a matrix of dimensions (m+1) × (n+1).
8. Termination condition: the optimal score between the two sequences is found in the last cell of the matrix (last row, last column).
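9. Sequences: S: ATTATCT, T: TTTCTA. Match score = +2, mismatch score = 0, gap penalty = -1. Scoring matrix (S down the rows, T across the columns):

   |   | _ | T | T | T | C | T | A |
   |---|---|---|---|---|---|---|---|
   | _ | 0 | -1 | -2 | -3 | -4 | -5 | -6 |
   | A | -1 | 0 | -1 | -2 | -3 | -4 | -3 |
   | T | -2 | 1 | 2 | 1 | 0 | -1 | -2 |
   | T | -3 | 0 | 3 | 4 | 3 | 2 | 1 |
   | A | -4 | -1 | 2 | 3 | 4 | 3 | 4 |
   | T | -5 | -2 | 1 | 4 | 3 | 6 | 5 |
   | C | -6 | -3 | 0 | 3 | 6 | 5 | 6 |
   | T | -7 | -4 | -1 | 2 | 5 | 8 | 7 |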
10. [The scoring matrix from slide 9, shown again with S down the rows and T across the columns.]
11. Optimal alignment:
   • S: ATTATCT-
   • T: -TT-TCTA
   • Number of matches = 5
   • Number of gaps = 3
   • Score = (5 × 2) + (3 × -1) = 7
12. Tools that utilize the global alignment algorithm:
   • EMBOSS Needle
   • EMBOSS Stretcher

   Applications:
   • Identifying conserved interaction pathways and complexes [Brian P. Kelley et al., 2003]
   • Functional orthology detection [Rohit Singh et al., 2008]

   Advantages:
   • Suited to sequences whose similar regions occur in the same order and orientation.

   Disadvantages:
   • Slow and memory intensive: the matrix has (m+1) × (n+1) cells, so time and memory both grow with the product of the sequence lengths.
   • Cannot be applied to genome-sized sequences.
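
The matrix fill, termination condition and alignment score from the slides can be sketched in Python. This is only an illustration under stated assumptions, not the code of EMBOSS Needle or any other tool. The function names (`nw_matrix`, `traceback`, `alignment_score`) are hypothetical. The default weights are those of the worked example (match +2, mismatch 0, gap -1). The traceback step is the usual one for this algorithm and is not spelled out on the slides; it breaks ties in an arbitrary order (diagonal, then up, then left), so the alignment it returns may differ from the one on the slides while having the same score.

```python
def nw_matrix(s, t, match=2, mismatch=0, gap=-1):
    """Fill the (len(s)+1) x (len(t)+1) score matrix with a linear gap cost."""
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned against gaps only.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            diag = F[i - 1][j - 1] + sub  # align s[i-1] with t[j-1]
            up = F[i - 1][j] + gap        # s[i-1] against a gap in t
            left = F[i][j - 1] + gap      # t[j-1] against a gap in s
            F[i][j] = max(diag, up, left)
    return F


def traceback(F, s, t, match=2, mismatch=0, gap=-1):
    """Walk back from the last cell to recover one optimal alignment."""
    i, j = len(s), len(t)
    a, b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def alignment_score(a, b, match=2, mismatch=0, gap=-1):
    """Score two gapped strings of equal length, column by column."""
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


S, T = "ATTATCT", "TTTCTA"
F = nw_matrix(S, T)
print(F[-1][-1])                                  # optimal global score: 7
print(alignment_score("ATTATCT-", "-TT-TCTA"))    # slide alignment: 7
```

The optimal score is read from `F[-1][-1]`, the last cell of the last row and column, which matches the termination condition on the slides. Storing the full matrix is also what makes the method memory intensive for long sequences.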