Sequence Alignment
Where we left off
• Output from Sequencing:
AACTGACCTA…
CCGTTGGCAT…
TTTGCGGTCA…
…
…
Where we left off
• I have hundreds of millions of short sequences
• Most applications need to figure out where they
came from
• Given a reference genome sequence and millions of
short sequences, how do I figure out where each of
the short sequences came from?
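The question above can be made concrete with a deliberately naive baseline: look for each short sequence as an exact substring of the reference. This is only a sketch. The function name `locate_reads` and the toy reference and reads are made up for illustration, and exact matching ignores sequencing errors and would be far too slow for hundreds of millions of sequences.

```python
def locate_reads(reference, reads):
    """Map each read to the start positions of its exact occurrences."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Resume one position later so overlapping hits are found too.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


# Toy data, invented for this example (not from a real genome).
reference = "AACTGACCTAGGACTGA"
reads = ["ACTGA", "CCTAG", "TTTT"]
print(locate_reads(reference, reads))
```

A read with an empty list has no exact match, which is one reason the rest of the lecture turns to alignment with mismatches and gaps.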
What is an algorithm?
• A process or set of rules to be followed in
calculations or other problem-solving operations,
esp. by a computer
• X = 2 * n
• Make a burrito
• Unwrap burrito
• Place on plate in microwave
• Turn microwave on for 2 minutes
What makes a good algorithmic
solution?
• Speed
• Memory
• Optimality of answer
The alignment problem
• Input
• Two sequences s and t of lengths n and m, respectively
• Output
• An alignment between the two sequences with gaps
inserted appropriately
• Objective Function
• A scoring function that weights particular character-to-character
alignments
How fast do I have to eat?
• Input
• # of sandwiches
• Output
• Sandwiches / minute
• Objective function
• Minimize the number of sandwiches I have to eat per
minute such that I finish all sandwiches in an hour
Scoring Function
• Which alignment of AATAC and ATATC is better?
• AATAC AATA-C
• ATATC -ATATC
• How did you decide?
• Example scoring function:
• +1 for matches
• -1 for gaps and mismatches
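As a small sketch of the example scoring function, the snippet below scores an alignment column by column: +1 for a match, -1 for a gap or mismatch. The function name `score_alignment` and its parameter names are chosen here for illustration only.

```python
def score_alignment(top, bottom, match=1, penalty=-1):
    """Score two equal-length aligned strings; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        # A column earns the match score only when both characters
        # are the same base; gaps and mismatches take the penalty.
        if a == b and a != "-":
            score += match
        else:
            score += penalty
    return score


print(score_alignment("AATAC", "ATATC"))    # ungapped alignment: -1
print(score_alignment("AATA-C", "-ATATC"))  # gapped alignment: 2
```

Under this scoring function the gapped alignment wins, because four matches outweigh two gap penalties.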
Kinds of Alignment
http://3.bp.blogspot.com/_OcdkdnkXwIg/SnRLqde_8PI/AAAAAAAAAZs/_ETauY69JiM/s320/global-local-alignment.png
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
T
T
G
C
Rules
• Fill in top row and left-most column according to
scoring function
• Start in the upper left-most corner of unfilled squares
• Move left to right filling in the result of the scoring
function
• Break ties arbitrarily
• Trace back from bottom right corner to upper left
corner
Needleman-Wunsch Recurrence
$$
S(i, j) = \max
\begin{cases}
S(i-1, j-1) + \delta(s_i, t_j) \\
S(i-1, j) + \delta(-) \\
S(i, j-1) + \delta(-)
\end{cases}
$$
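The recurrence can be sketched as a matrix fill. This is a minimal illustration, assuming the example scoring function from earlier (+1 match, -1 mismatch, -1 gap). The name `nw_matrix` is hypothetical, and the grid is laid out as on the slides, with s along the top and t down the side.

```python
def nw_matrix(s, t, match=1, mismatch=-1, gap=-1):
    """Fill the Needleman-Wunsch score matrix (columns follow s, rows follow t)."""
    rows, cols = len(t) + 1, len(s) + 1
    S = [[0] * cols for _ in range(rows)]
    # Top row and left-most column: only gaps are possible there.
    for j in range(1, cols):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, rows):
        S[i][0] = S[i - 1][0] + gap
    # Remaining cells: best of diagonal (match/mismatch), up, and left (gaps).
    for i in range(1, rows):
        for j in range(1, cols):
            delta = match if t[i - 1] == s[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + delta,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    return S


for row in nw_matrix("ATTG", "TTGC"):
    print(row)
```

The bottom-right cell holds the score of the best global alignment.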
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0
T
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1
T
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2
T
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3
T
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1
T
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1
T -2
G
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1
T -2
G -3
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1
T -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1
T -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0
T -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1
T -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
-
C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
G -
G C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
T G -
T G C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
T T G -
T T G C
Global Alignment:
Needleman-Wunsch Algorithm
A T T G
0 -1 -2 -3 -4
T -1 -1 0 -1 -2
T -2 -2 0 1 0
G -3 -3 -1 0 2
C -4 -4 -2 -1 1
A T T G -
- T T G C
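The fill and traceback walked through above can be sketched end to end. This is a minimal illustration, not a reference implementation: the function name `needleman_wunsch` is chosen here, the scores assume the example scoring function (+1 match, -1 mismatch, -1 gap), and ties are broken by preferring the diagonal move, then up, then left, which is one arbitrary choice among several.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Return (aligned s, aligned t, score) for a global alignment."""
    rows, cols = len(t) + 1, len(s) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        S[0][j] = j * gap
    for i in range(1, rows):
        S[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            delta = match if t[i - 1] == s[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + delta,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Trace back from the bottom-right corner to the upper-left corner.
    top, bottom = [], []
    i, j = len(t), len(s)
    while i > 0 or j > 0:
        delta = match if i > 0 and j > 0 and t[i - 1] == s[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + delta:
            top.append(s[j - 1]); bottom.append(t[i - 1])
            i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append("-"); bottom.append(t[i - 1])  # gap in s
            i -= 1
        else:
            top.append(s[j - 1]); bottom.append("-")  # gap in t
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), S[-1][-1]


aligned_s, aligned_t, score = needleman_wunsch("ATTG", "TTGC")
print(aligned_s)  # ATTG-
print(aligned_t)  # -TTGC
print(score)      # 1
```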
Rules
• Fill in top row and left-most column according to
scoring function
• Start in the upper left-most corner of unfilled squares
• Move left to right filling in the result of the scoring
function
• Break ties arbitrarily
• Trace back from the max element in the matrix to the
first STOP (a cell with score 0)
Local Alignment:
Smith-Waterman Algorithm
$$
S(i, j) = \max
\begin{cases}
0 \\
S(i-1, j-1) + \delta(s_i, t_j) \\
S(i-1, j) + \delta(-) \\
S(i, j-1) + \delta(-)
\end{cases}
$$
Local Alignment:
Smith-Waterman Algorithm
A T T G
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0
T
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0
T 0
G
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0
T 0
G 0
C
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0
T 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0
T 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1
T 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1
T 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0 0
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0 0 2
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0 0 2
G
G
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0 0 2
T G
T G
Local Alignment:
Smith-Waterman Algorithm
A T T G
0 0 0 0 0
T 0 0 1 1 0
T 0 0 1 2 1
G 0 0 0 1 3
C 0 0 0 0 2
T T G
T T G
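The local-alignment walkthrough above can be sketched the same way. This is a minimal illustration under the example scoring function (+1 match, -1 mismatch, -1 gap). The name `smith_waterman` is chosen here, only the first maximum found is traced, and ties prefer the diagonal move.

```python
def smith_waterman(s, t, match=1, mismatch=-1, gap=-1):
    """Return (aligned s, aligned t, score) for one best local alignment."""
    rows, cols = len(t) + 1, len(s) + 1
    # Top row and left-most column stay at 0 for local alignment.
    S = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            delta = match if t[i - 1] == s[j - 1] else mismatch
            S[i][j] = max(0,
                          S[i - 1][j - 1] + delta,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)

    # Trace back from the max element until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and S[i][j] > 0:
        delta = match if t[i - 1] == s[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + delta:
            top.append(s[j - 1]); bottom.append(t[i - 1])
            i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            top.append("-"); bottom.append(t[i - 1])  # gap in s
            i -= 1
        else:
            top.append(s[j - 1]); bottom.append("-")  # gap in t
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


print(smith_waterman("ATTG", "TTGC"))  # ('TTG', 'TTG', 3)
```

Compared with the global version, the only changes are the extra 0 in the max, the zero-filled borders, and starting the traceback at the largest cell.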
How to evaluate statistical
significance?
• Everyone pick a number between 1 and 10 (keep it
to yourself!)
The problem with databases
• Query is: ACCT
• Is a match significant?
• Database A:
• ACCT
• CAGG
• AAAA
• Database B:
• ACCT
• ACCT
• ACCT
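One rough way to see the point of the two databases is to count how often the query turns up in each. The sketch below is not a statistical test, and the name `hit_fraction` is made up for illustration. It only shows that a hit in Database B is unsurprising because every entry matches, while a hit in Database A is rarer.

```python
def hit_fraction(query, database):
    """Fraction of database entries that match the query exactly."""
    hits = sum(1 for entry in database if entry == query)
    return hits / len(database)


database_a = ["ACCT", "CAGG", "AAAA"]
database_b = ["ACCT", "ACCT", "ACCT"]

print(hit_fraction("ACCT", database_a))  # one of three entries matches
print(hit_fraction("ACCT", database_b))  # every entry matches
```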
Alignment Projects
• Research BWA
• Research Bowtie
• Research MAQ
• Code a program in the language of your choice that
performs Needleman-Wunsch or Smith-Waterman
