2016.09.28
TOPIC REVIEW
• Exam
• PS2 Sequence Alignment
• Command Line Blast
• PS1 Molecular Biology
• Personal Microbiome Project
LET’S NEGOTIATE
CURRENTLY
• Problem sets (4) - 10%
• Microbiome project - 20%
• Exam (1) - 20%
• Research project - 45%
• Participation - 5%
OR
• Problem sets (4) - 10%
• Microbiome project - 20%
• Exam 1 - 15%
• Exam 2 - 15%
• Research project - 35%
• Participation - 5%
PS2 SEQUENCE ALIGNMENT
RefSeqs, protein (experimentally supported)
On chromosome 17
Reverse strand
PRCD Progressive rod-cone degeneration
PS2: GLOBAL ALIGNMENT
BLOSUM62
• Substitutions are penalized less and are preferred to gaps. The level of identity also decreases.
BLOSUM80
• Substitutions are penalized more, so gaps are favored.
PAM60
• Substitutions are penalized more, so gaps are favored.
PAM250
• Substitutions are penalized less and are preferred to gaps. The level of identity also decreases.
PS2: LOCAL ALIGNMENT
SEQ1 A L S C V W M I P (9 residues)
SEQ2 A I S C M I P T (8 residues)
Create matrix: (length of seq1 + 1) x (length of seq2 + 1)
Matrix 10 x 9
        A    L    S    C    V    W    M    I    P
    0   -2   -4   -6   -8  -10  -12  -14  -16  -18
A  -2
I  -4
S  -6
C  -8
M -10
I -12
P -14
T -16
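The empty matrix above can be set up in a few lines. This is a minimal sketch, assuming seq1 runs along the columns and seq2 along the rows, and that the border steps of -2 are a linear gap cost; the names `init_matrix`, `cols_seq`, and `rows_seq` are illustrative and do not come from the slides.

```python
# Sequences from the slide: seq1 along the columns, seq2 along the rows.
seq1 = "ALSCVWMIP"
seq2 = "AISCMIPT"
GAP = -2  # assumed linear gap cost, matching the -2 steps on the borders


def init_matrix(cols_seq, rows_seq, gap):
    """Build a (len(rows_seq) + 1) x (len(cols_seq) + 1) grid with gap-filled borders."""
    n_rows, n_cols = len(rows_seq) + 1, len(cols_seq) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        grid[0][j] = j * gap  # first row: 0, -2, -4, ...
    for i in range(n_rows):
        grid[i][0] = i * gap  # first column: 0, -2, -4, ...
    return grid


matrix = init_matrix(seq1, seq2, GAP)
print(len(matrix[0]), "columns x", len(matrix), "rows")
print(matrix[0])
```

Interior cells are left as `None` here so that it is obvious which cells still have to be scored.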
Exercise: fill the scores of the alignment matrix using the BLOSUM62 substitution matrix.
Gap opening penalty: -5
Gap extension penalty: -1
Sequence 1 (columns): S V E T D
Sequence 2 (rows): T S I N Q E T
Ala A 4
Arg R -1 5
Asn N -2 0 6
Asp D -2 -2 1 6
Cys C 0 -3 -3 -3 9
Gln Q -1 1 0 0 -3 5
Glu E -1 0 0 2 -4 2 5
Gly G 0 -2 0 -1 -3 -2 -2 6
His H -2 0 1 -1 -3 0 0 -2 8
Ile I -1 -3 -3 -3 -1 -3 -3 -4 -3 4
Leu L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
Lys K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
Met M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
Phe F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
Pro P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
Ser S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
Thr T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
Trp W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
Tyr Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
Val V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
Columns (left to right): A R N D C Q E G H I L K M F P S T W Y V
Dynamic programming - global alignment
BLOSUM62
GAP COST: -2
At each cell, 3 scores are calculated:
• Match score = diagonal cell score + score from the substitution matrix.
• Vertical gap score = upper neighbor + gap cost
• Horizontal gap score = left neighbor + gap cost
• The highest score is retained and the arrow is labelled
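The three candidate scores for a single cell can be sketched as a small helper. This is an illustration under stated assumptions, not the course's own code: the function name `score_cell` and the arrow labels are made up here, and ties are broken in favor of the diagonal, which the slides do not specify. The two calls use the first two cells of row A from the worked example (gap cost -2, BLOSUM62 scores A/A = 4 and A/L = -1).

```python
def score_cell(diag, up, left, substitution, gap):
    """Return (best_score, arrow) for one cell of the global alignment matrix."""
    candidates = [
        (diag + substitution, "diagonal"),  # match score
        (up + gap, "up"),                   # vertical gap score
        (left + gap, "left"),               # horizontal gap score
    ]
    # max() returns the first of several equal scores, so ties go to the diagonal.
    return max(candidates, key=lambda c: c[0])


GAP = -2
# Cell (A, A): diagonal 0, both neighbors -2, substitution score 4
print(score_cell(diag=0, up=-2, left=-2, substitution=4, gap=GAP))
# Cell (A, L): diagonal -2, upper neighbor -4, left neighbor 4, substitution score -1
print(score_cell(diag=-2, up=-4, left=4, substitution=-1, gap=GAP))
```

Returning the arrow together with the score is a design choice that makes the later traceback a simple pointer walk; recomputing the arrows from the finished matrix works just as well.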
First cell (A vs A): the 4 comes from the substitution matrix.
Match score = 0 + (4) = 4
Vertical gap score = -2 + (-2) = -4
Horizontal gap score = -2 + (-2) = -4
The highest score (4) is retained. Row A so far: -2 4
Second cell (A vs L): the -1 comes from the substitution matrix.
Match score = -2 + (-1) = -3
Vertical gap score = -4 + (-2) = -6
Horizontal gap score = 4 + (-2) = 2
The highest score (2) is retained. Row A so far: -2 4 2
Third cell (A vs S): the +1 comes from the substitution matrix.
Match score = -4 + (1) = -3
Vertical gap score = -6 + (-2) = -8
Horizontal gap score = 2 + (-2) = 0
The highest score (0) is retained. Row A so far: -2 4 2 0
Completed matrix:
        A    L    S    C    V    W    M    I    P
    0   -2   -4   -6   -8  -10  -12  -14  -16  -18
A  -2    4    2    0   -2   -4   -6   -8  -10  -12
I  -4    2    6    4    2    1   -1   -3   -4   -6
S  -6    0    4   10    8    6    4    2    0   -2
C  -8   -2    2    8   19   17   15   13   11    9
M -10   -4    0    6   17   20   18   20   18   16
I -12   -6   -2    4   15   20   18   19   24   22
P -14   -8   -4    2   13   18   16   17   22   31
T -16  -10   -6    0   11   16   16   15   20   29
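Filling every cell by hand is error-prone, so a short script is a useful cross-check. The sketch below assumes the linear gap cost of -2 and takes its substitution scores from the lower-triangular BLOSUM62 table in these slides; the names `ORDER`, `LOWER`, and `fill` are illustrative. Running it is one way to check hand-filled cells, and the bottom-right cell should come out as 29.

```python
ORDER = "ARNDCQEGHILKMFPSTWYV"
# Lower triangle of BLOSUM62, one row per residue in ORDER (copied from the slide table).
LOWER = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""

# Expand the triangle into a symmetric lookup keyed by residue pairs.
BLOSUM62 = {}
for i, line in enumerate(LOWER.strip().splitlines()):
    for j, value in enumerate(line.split()):
        a, b = ORDER[i], ORDER[j]
        BLOSUM62[a, b] = BLOSUM62[b, a] = int(value)


def fill(cols_seq, rows_seq, gap=-2):
    """Fill the global alignment score matrix with a linear gap cost."""
    rows, cols = len(rows_seq) + 1, len(cols_seq) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        F[0][j] = j * gap
    for i in range(rows):
        F[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + BLOSUM62[rows_seq[i - 1], cols_seq[j - 1]],  # match
                F[i - 1][j] + gap,                                             # vertical gap
                F[i][j - 1] + gap,                                             # horizontal gap
            )
    return F


F = fill("ALSCVWMIP", "AISCMIPT")
for label, row in zip("-AISCMIPT", F):
    print(label, " ".join(f"{v:4d}" for v in row))
```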
Traceback, starting from the bottom-right cell (score 29):
Seq1: -
Seq2: T
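The traceback that the slides step through one column at a time can be sketched as follows. This is a hedged illustration: the function name `align` and the table `SUB` are not from the slides, `SUB` holds only the BLOSUM62 scores needed for these two sequences (read off the slide table), and the tie-breaking order (diagonal, then up, then left) is an assumption, since the slides do not say how ties are resolved.

```python
COLS, ROWS, GAP = "ALSCVWMIP", "AISCMIPT", -2
# BLOSUM62 scores: one row per ROWS residue, one column per COLS residue.
SUB = [
    [4, -1, 1, 0, 0, -3, -1, -1, -1],     # A
    [-1, 2, -2, -1, 3, -3, 1, 4, -3],     # I
    [1, -2, 4, -1, -2, -3, -1, -2, -1],   # S
    [0, -1, -1, 9, -1, -2, -1, -1, -3],   # C
    [-1, 2, -1, -1, 1, -1, 5, 1, -2],     # M
    [-1, 2, -2, -1, 3, -3, 1, 4, -3],     # I
    [-1, -3, -1, -3, -2, -4, -2, -3, 7],  # P
    [0, -1, 1, -1, 0, -2, -1, -1, -1],    # T
]


def align(cols_seq, rows_seq, sub, gap):
    """Fill the score matrix, then trace back from the bottom-right cell."""
    n, m = len(rows_seq), len(cols_seq)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        F[0][j] = j * gap
    for i in range(n + 1):
        F[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub[i - 1][j - 1],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    top, bottom = [], []  # aligned Seq1 and Seq2, built right to left
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[i - 1][j - 1]:
            top.append(cols_seq[j - 1]); bottom.append(rows_seq[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append("-"); bottom.append(rows_seq[i - 1])  # gap in Seq1
            i -= 1
        else:
            top.append(cols_seq[j - 1]); bottom.append("-")  # gap in Seq2
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, seq1_aligned, seq2_aligned = align(COLS, ROWS, SUB, GAP)
print(score)
print("Seq1:", seq1_aligned)
print("Seq2:", seq2_aligned)
```

For these two sequences the sketch returns a score of 29, with Seq1 aligned as ALSCVWMIP- and Seq2 as AISC--MIPT. The first step it takes is the same one shown above: a gap in Seq1 opposite the final T of Seq2.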
Seq1: P -
Seq2: P T
Seq1: I P -
Seq2: I P T
Seq1: M I P -
Seq2: M I P T
Seq1: W M I P -
Seq2: - M I P T
Seq1: V W M I P -
Seq2: - - M I P T
C V W M I P -
C - - M I P T
Seq1
Seq2
A L S C V W M I P
0 -2 -4 -6 -8 -10 -12 -14 -16 -18
-2 4 2 0 -2 -4 -6 -8 -10 -12
-4 2 6 4 2 1 -1 -3 -4 -6
-6 0 0 10 8 6 4 2 0 -2
-8 -2 -1 -1 19 17 15 13 11 9
-10 -4 0 -2 17 20 18 20 18 16
-12 -6 -2 -2 15 20 18 19 24 22
-14 -8 -4 -3 13 18 16 17 22 31
-16 -10 -6 -3 11 16 16 15 20 29
A
I
S
C
M
I
P
T
Exercise: fill the scores of the alignment matrix
using the BLOSUM62 substitution matrix.
Gap opening penalty: -5
Gap extension penalty: -1
S V E T D
T
S
I
N
Q
E
T
Ala A 4
Arg R -1 5
Asn N -2 0 6
Asp D -2 -2 1 6
Cys C 0 -3 -3 -3 9
Gln Q -1 1 0 0 -3 5
Glu E -1 0 0 2 -4 2 5
Gly G 0 -2 0 -1 -3 -2 -2 6
His H -2 0 1 -1 -3 0 0 -2 8
Ile I -1 -3 -3 -3 -1 -3 -3 -4 -3 4
Leu L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
Lys K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
Met M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
Phe F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
Pro P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
Ser S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
Thr T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
Trp W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
Tyr Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
Val V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
A
la
A
rg
A
sn
A
sp
C
y
s
G
ln
G
lu
G
ly
H
is
Il
e
L
e
u
L
y
s
M
e
t
P
h
e
P
ro
S
e
r
T
h
r
T
rp
T
y
r
V
a
l
A R N D C Q E G H I L K M F P S T W Y V
Traceback, continued:
Seq1  S C V W M I P -
Seq2  S C - - M I P T
Traceback, continued:
Seq1  L S C V W M I P -
Seq2  I S C - - M I P T
Final global alignment (BLOSUM62, gap cost -2):
Seq1  A L S C V W M I P -
Seq2  A I S C - - M I P T
Column scores: 4 2 4 9 -2 -2 5 4 7 -2
Alignment score: 29 (the bottom-right cell of the matrix; the aligned residues alone sum to 31, and the final gap costs -2)
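The three-way rule from the slides (diagonal, vertical gap, horizontal gap, keep the highest) can be sketched in Python as below. This is a minimal sketch rather than the course's own code: the function name `needleman_wunsch` and the variable names are made up for illustration, and the scores are the lower triangle of the BLOSUM62 table as printed on the slides.

```python
# Lower triangle of BLOSUM62 copied from the slide table; rows and columns
# follow the order A R N D C Q E G H I L K M F P S T W Y V.
ORDER = "ARNDCQEGHILKMFPSTWYV"
ROWS = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""

# Expand the triangle into a symmetric lookup keyed by residue pairs.
BLOSUM62 = {}
for i, line in enumerate(ROWS.strip().split("\n")):
    for j, value in enumerate(line.split()):
        BLOSUM62[ORDER[i], ORDER[j]] = int(value)
        BLOSUM62[ORDER[j], ORDER[i]] = int(value)


def needleman_wunsch(seq1, seq2, gap=-2):
    """Global alignment; seq1 runs across the columns, seq2 down the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        F[0][j] = j * gap              # leading gaps in seq2
    for i in range(rows):
        F[i][0] = i * gap              # leading gaps in seq1
    for i in range(1, rows):
        for j in range(1, cols):
            match = F[i - 1][j - 1] + BLOSUM62[seq2[i - 1], seq1[j - 1]]
            F[i][j] = max(match, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback from the bottom-right cell, preferring the diagonal on ties.
    a1, a2 = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
                F[i][j] == F[i - 1][j - 1] + BLOSUM62[seq2[i - 1], seq1[j - 1]]:
            a1.append(seq1[j - 1]); a2.append(seq2[i - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a1.append("-"); a2.append(seq2[i - 1]); i -= 1
        else:
            a1.append(seq1[j - 1]); a2.append("-"); j -= 1
    return F, "".join(reversed(a1)), "".join(reversed(a2))


F, aligned1, aligned2 = needleman_wunsch("ALSCVWMIP", "AISCMIPT")
print(aligned1)
print(aligned2)
print("score:", F[-1][-1])
```

The score of a global alignment is read from the bottom-right cell, so trailing gaps count toward it. Printing `F` row by row gives a matrix that can be checked against a hand-filled one.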
COMMAND LINE BLAST
• Resource: https://www.ncbi.nlm.nih.gov/books/NBK279675/
• Log onto compile
source /usr/local/ncbi-blast-2.4.0+/blast_env.sh
export BNFO301=/home/bnfo301/assignments/2016-09-28
COMMAND LINE BLAST
• Find the documentation for makeblastdb.
makeblastdb -help
• Create blast database
makeblastdb -in $BNFO301/protein-db.faa -dbtype prot -parse_seqids -out /home/bnfo301/huangb2/protein-db.faa
• Run blastp with query1.faa file
blastp -query $BNFO301/query1.faa -db /home/bnfo301/huangb2/protein-db.faa -out query1-out.txt
• Run blastp with query2.faa file
blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt
COMMAND LINE BLAST
• Change the output format
blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7
• Change the evalue
blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7 -evalue 1e-05
• Change the substitution matrix
blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7 -evalue 1e-05 -matrix BLOSUM80
PS1 MOLECULAR BIOLOGY
BNFO301: PS Molecular Biology
DUE:
1. Complete the following table:
2. Below is the double-stranded DNA sequence of a hypothetical genome, which happens to have a very small gene.
a. Which strand of DNA shown, the top or the bottom, is the template strand?
b. What is the sequence of the mRNA produced from this gene?
c. What is the sequence of the protein produced from the mRNA?
d. If a mutation were found where a T/A (top/bottom) base pair were added immediately after the T/A base pair shown in bold, what would be the sequence of the mRNA? What would be the sequence of the protein? What type of mutation is it?
Translation Problem Set - 1
BNFO 301: Introduction to Bioinformatics
Introduction to Molecular Biology: Translation - Problem Set
1. Complete the following table:
DNA double helix:                        A G A
                                         T G T
mRNA transcribed:                        5' A U
Appropriate tRNA anticodon:              U G 5'
Amino acids incorporated into protein:   met
2. List the changes that can be produced by a single basepair mutation in the AGA codon encoding arginine and label each silent (no effect on protein structure), conservative (mild effect on protein structure), hydrophobic-to-hydrophilic, hydrophilic-to-hydrophobic, or other.
3. Hemophilia A is an X-linked disease associated with the absence of an essential blood clotting factor, factor VIII (if you don't have any idea what an X-linked trait is, don't worry about it). Factor VIII is encoded by the gene called FACTOR8. This gene was cloned from several individuals -- some affected, some not -- and sequenced. A portion of each sequence that you're sure contains the beginning of the gene (i.e., the start codon) was compared with the same portion of the wild-type sequence, as shown below. Each sequence contains only one mutation, shown emphasized.
Wild-type    5'-GGAGTTGAGTCATGGACTCTAAGCAGCGATCCACAAAG...
Individual a 5'-GGAGTTTAGTCATGGACTCTAAGCAGCGATCCACAAAG...
Individual b 5'-GGAGTTGAGTCATTGACTCTAAGCAGCGATCCACAAAG...
Individual c 5'-GGAGTTGAGTCATGGACTCTTAGCAGCGATCCACAAAG...
Individual d 5'-GGAGTTGAGTCATGGACTCTAAGCAGCTATCCACAAAG...
Individual e 5'-GGAGTTGAGTCATGGACTCTAAGCAGCGATCCACTAAG...
For each individual, choose from the list below to describe what you predict would be the severity of the phenotype, and give the reason for your choice.
A. Severe hemophilia
B. Mild hemophilia
C. No hemophilia
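For question 2 above, the nine possible single-base changes to AGA can be listed mechanically. The sketch below is only an illustration (the helper name `single_base_changes` is made up). It uses the standard genetic code and labels each change as silent, missense, or nonsense. The finer conservative and hydrophobic/hydrophilic labels that the question asks for still need a judgment about side-chain chemistry.

```python
# Standard genetic code, built from the usual T, C, A, G ordering of each
# codon position (DNA alphabet; "*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_changes(codon):
    """Return (mutant codon, amino acid, label) for every one-base substitution."""
    original = CODON_TABLE[codon]
    results = []
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue                      # not a mutation
            mutant = codon[:pos] + base + codon[pos + 1:]
            amino = CODON_TABLE[mutant]
            if amino == original:
                label = "silent"
            elif amino == "*":
                label = "nonsense"
            else:
                label = "missense"
            results.append((mutant, amino, label))
    return results


for mutant, amino, label in single_base_changes("AGA"):
    print(mutant, amino, label)
```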
[Figure: completed table for question 1, with the DNA double helix, mRNA, and tRNA anticodon rows filled in and their 5’ and 3’ ends marked; amino acids shown include Lys, Ala, Stop]
PS1 MOLECULAR BIOLOGY
• Which strand of DNA shown, the top or the bottom, is the template strand?
• What is the sequence of the mRNA produced from this gene?
• What is the sequence of the protein produced from the mRNA?
• If a mutation were found where a T/A (top/bottom) base pair were added immediately after the T/A base pair shown in bold, what would be the sequence of the mRNA? What would be the sequence of the protein? What type of mutation is it?
5’- CTATAAAGAGCCATG CAT TAT CTA GAT AGT AGG CTC TGA GAATTTATCTCACT -3’
3’- GATATTTCTCGGTAC GTA ATA GAT CTA TCA TCC GAG ACT CTTAAATAGAGTGA -5’
    PROMOTER                                            TERMINATOR
Template strand: bottom
mRNA    5’- GAGCCAUG CAU UAU CUA GAU AGU AGG CUC UGA GAAUUUAUCUC -3’
protein N- Met His Tyr Leu Asp Ser Arg Leu Stop -C
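The answers above can be checked with a short script. This is a simplified sketch, not part of the problem set: the function names are made up, the whole bottom strand is copied rather than only the stretch between promoter and terminator, and translation simply starts at the first AUG and stops at the first stop codon.

```python
# Standard genetic code, built from the usual T, C, A, G ordering of each
# codon position (DNA alphabet; "*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(template_3_to_5):
    """mRNA (5'->3') copied from a template strand written 3'->5'."""
    return template_3_to_5.translate(COMPLEMENT).replace("T", "U")


def translate(mrna):
    """One-letter protein sequence from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[pos:pos + 3].replace("U", "T")]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# Bottom (template) strand from the slide, written 3'->5' with spaces removed.
bottom = "GATATTTCTCGGTACGTAATAGATCTATCATCCGAGACTCTTAAATAGAGTGA"
mrna = transcribe(bottom)
print(mrna)
print(translate(mrna))
```

Inserting a base into `bottom` and rerunning the two calls is a quick way to explore the frameshift in the last bullet.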
Standard Score Problems
Assuming a population mean of 500 and a population standard
deviation of 100 for the verbal subtest of the SAT exam:
1. What percentage of the student population has SAT-V scores
greater than 600?
2. What percentage of the student population has SAT-V scores
greater than 700?
3. What percentage of the student population has SAT-V scores
lower than 420?
4. What percentage of the student population has SAT-V scores
between 300 and 520?
5. What percentage of the student population has SAT-V scores
between 250 and 600?
6. What percentage of the student population has SAT-V scores
between 500 and 550?
7. A student gets a 620 on this test. Convert this to a percentile.
8. A student gets a 340 on this test. Convert this to a
percentile.
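Each of these problems converts a score to a z-score, z = (x - mean) / standard deviation, and then reads an area under the normal curve. A minimal sketch using Python's standard library is shown below. The helper names are made up for illustration, and it assumes, as the problems imply, that SAT-V scores are normally distributed.

```python
from statistics import NormalDist

# Population parameters given in the problem statement.
sat_v = NormalDist(mu=500, sigma=100)


def z_score(x, dist=sat_v):
    """Number of standard deviations that x lies above the mean."""
    return (x - dist.mean) / dist.stdev


def percent_above(x, dist=sat_v):
    """Percentage of the population scoring higher than x."""
    return 100 * (1 - dist.cdf(x))


def percent_between(low, high, dist=sat_v):
    """Percentage of the population scoring between low and high."""
    return 100 * (dist.cdf(high) - dist.cdf(low))


def percentile(x, dist=sat_v):
    """Percentage of the population scoring at or below x."""
    return 100 * dist.cdf(x)


print(z_score(600))                    # problem 1: z = 1.0
print(round(percent_above(600), 2))    # problem 1: area above z = 1
```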
BNFO301: Exam 1
1. List all the changes that can be produced by a single base
pair mutation in the AGA
codon encoding arginine and label the resulting amino acid. In
addition label each
mutation as silent, missense or nonsense. (4pts)
2. What would be the value of using a dot plot to compare a
sequence to its own reverse
complement? (2 pts)
3. Sketch the dot plot of a 1 kb sequence in which a motif of approximately 50
consecutive bases appears six
times in the N terminal region of the sequence. (4 pts)
4. Use the PAM250 matrix to answer question 4.
a. Give the score for aligning two alanines (A) (1 pt)
b. Give the score for aligning two tryptophans (W) (1 pt)
c. Both of these alignments constitute “matches”, so why are the
scores so different? (2
pts)
Use the BLOSUM62 matrix for questions 5 and 6.
5. Calculate the dynamic programming matrix and an optimal
GLOBAL alignment for the
protein sequences FKHMEDPLE and FMDTPLNE , scoring -2
for a gap (i.e. 2 is the gap
penalty). Use the BLOSUM62 substitution matrix (given
above).
a. Fill out the matrix. (6 pts)
b. Highlight the traceback alignment. (1 pt)
c. Write out the final alignment. (2 pts)
d. Score the final alignment. (1 pt)
6. Calculate the dynamic programming matrix and an optimal
LOCAL alignment for the
protein sequences FKHMEDPLE and FMDTPLNE . Use the
BLOSUM62 matrix (provided
above).
a. Fill out the matrix. (6 pts)
b. Highlight the traceback local alignments. (1 pt)
c. Write out the final alignment. (2 pts)
d. Score the final alignment. (1 pt)
7. What is 16S rRNA and what is its function inside a cell? (2
pts)
8. 16S rRNA is widely used in microbiome studies. List two
strengths and two limitations of
16S rRNA sequencing. (4 pts)
9. Can 16S rRNA be used to classify viruses? Why or why not?
(2 pts)
10. Which of the following amino acids is least mutable
according to the PAM scoring
matrix? (2 pts)
a. Alanine
b. Glutamine
c. Methionine
d. Cysteine
1. Which of the following sentences BEST describes the
difference between a global
alignment and a local alignment between two sequences? (2 pts)
a. Global alignment is usually used for DNA sequences, while
local alignment is usually
used for protein sequences.
b. Global alignment has gaps, while local alignment does not
have gaps.
c. Global alignment finds the global maximum, while local
alignment finds the local
maximum.
d. Global alignment aligns the whole sequence, while local
alignment finds the best
subsequence that aligns.
2. How does the BLOSUM scoring matrix differ most notably
from the PAM scoring matrix?
(2 pts)
a. It is best used for aligning very closely related proteins.
b. It is based on global multiple alignment from closely related
proteins.
c. It is based on local multiple alignments from distantly related
proteins.
d. It combines local and global alignment information.
3. A global alignment algorithm (such as Needleman-Wunsch
algorithm) is guaranteed to
find an optimal alignment. Such an algorithm: (2 pts)
a. puts the two proteins being compared into a matrix and finds
the optimal score by
exhaustively searching every possible combination of
alignments.
b. puts the two proteins being compared into a matrix and finds
the optimal score by
iterative recursions.
c. puts the two proteins being compared into a matrix and finds
the optimal alignment
by finding optimal subpaths that define the best alignment(s)
d. can be used for proteins but not for DNA sequences.
4. What are the basic concepts of library preparation? (4 pts)
5. List 3 applications of next-generation sequencing. (2 pts)
6. How many reads do you need to get 30x coverage of your
genome if your read length is
300bp and your genome size is 10Mb? (2 pts)
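The coverage question above rests on one relation: coverage = (number of reads × read length) / genome size. A small sketch of the rearranged formula follows. The function name is made up, and it assumes every read is full length and maps to the genome.

```python
import math


def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Smallest whole number of reads that reaches the requested coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# 30x coverage of a 10 Mb genome with 300 bp reads.
print(reads_needed(30, 10_000_000, 300))
```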
Command line
Log in to compile. Navigate to the bnfo301 (/home/bnfo301)
directory. There is a folder
called exam1 where you will find all the files you need to
answer the next set of questions.
Instructions for this section:
• Write your output files to your user specific folder in
/home/bnfo301 (ex. my user specific
folder is /home/bnfo301/huangb2 ). You will be graded on the
files found in your specific
folder. If the files are not in that folder you will not get credit
for your answers. No
exceptions.
• Make sure you name your output file as instructed in each
question. I will take off 1 point
for each output file that is not correctly named.
• Code is typically written using a fixed width font. Use a fixed
width font to type your
commands in this section (ex. courier, inconsolata, menlo,
monaco).
• For each question, provide the command when specified, or
the command and answer. All
output files from this section should be written to your user
specific folder on compile. I will
access your user specific folder to grade this section.
1. List the files in the exam1 folder. command only (2 pts)
2. Count how many sequences are in the protein-db.faa file?
command and answer (2
pts)
3. You have an unknown1.faa sequence that you want to blast
against sequences in the
protein-db.faa file.
a. Copy the protein-db.faa to your user specific folder.
command only
b. Create a blast database for protein-db.faa . command only (2
pts)
c. Blast unknown1.faa against the database you just created.
Name your blast output
file 3b-unknown-output.txt . command only, leave output file on
Compile (2 pts)
d. Filter your blast results for hits with an evalue greater than
1e-05. Name your blast
output file 3c-unknown-output.txt . command only, leave output
file on Compile
(2 pts)
e. What is the percent identity and alignment length of the best
hit in your blast results
when you filter based on an evalue greater than 1e-05? Hint:
you may need to change
your output format. (8 pts)
f. What is the percent identity and alignment length of the worst
hit in your blast results
when you filter based on an evalue greater than 1e-05? Hint:
you may need to change
your output format. (4 pts)
4. BLAST is a tool that can be used to query multiple databases.
It is not always necessary
to create your own database. One of the most common blast
databases is the
non-redundant database (nr).
a. Blast the unknown1.faa sequence against the nr database
(/home/norrissw/bin/I-TASSER4.2/lib/nr/nr ) to find out what it
is. Name
your blast output file 4a-unknown-nr-output.txt . NOTE: you do
not need to run
the makeblastdb command. Also, it can take a few minutes for
your blast to run
because the nr database is very big. command only, leave output
file on Compile (2
pts)
b. Filter your blast results for hits with an e-value greater than
1e-10. Name your blast
output file 4b-unknown-nr-output.txt . command only, leave
output file on
Compile (2 pts)
c. Based on the best hit from nr, take the accession number and
identify what that
protein is. (4 pts)
5. The next set of questions involves the pipeline.py script.
a. Copy the pipeline.py script to your /home/bnfo301/vcuid folder. (2
pts)
b. Rename the pipeline.py script to 5b-pipeline.py . (2 pts)
c. Describe in detail what the script is doing, including what the
output from each step
is. (4 pts)
d. Modify the script so it filters the blast results using an e-value cutoff of 1e-05. Save
the modified script as 5d-pipeline.py . You do not need to run
the script, just add
in your modification. Leave output file on Compile. (2 pts)
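The questions about percent identity and alignment length are easiest to answer from tabular output (-outfmt 6 or 7), where each hit is one tab-separated line with the default columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below shows one way to read such a file. It is not the course's pipeline.py, the function name `read_hits` is made up, and the two sample hits are invented purely for illustration.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6 and 7).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle, max_evalue=1e-5):
    """Parse tabular BLAST lines, keeping hits with evalue <= max_evalue."""
    hits = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue                         # -outfmt 7 comment or blank line
        hit = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Invented example data standing in for a real output file.
sample = io.StringIO(
    "# BLASTP 2.4.0+\n"
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410\n"
    "query1\tsubjB\t35.00\t80\t50\t2\t10\t89\t5\t84\t0.002\t40.1\n"
)
hits = read_hits(sample)
best = max(hits, key=lambda hit: hit["bitscore"])
print(best["sseqid"], best["pident"], best["length"])
```

With a real file, `open("your-output.txt")` would replace the `io.StringIO` sample. The file name here is a placeholder.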
2016.09.28TOPIC REVIEW• Exam • PS2 Sequence Alignment .docx

More Related Content

Similar to 2016.09.28TOPIC REVIEW• Exam • PS2 Sequence Alignment .docx

Automatic phase changer
Automatic phase changerAutomatic phase changer
Automatic phase changerRahul Kumar
 
bioinfo_6th_20070720
bioinfo_6th_20070720bioinfo_6th_20070720
bioinfo_6th_20070720sesejun
 
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient InferenceDeep Learning JP
 
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfMathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfJulio Banks
 
Py con 2018_youngsooksong
Py con 2018_youngsooksongPy con 2018_youngsooksong
Py con 2018_youngsooksongYoung Sook Song
 
ホームページ制作の見積もり・発注 完全マニュアル
ホームページ制作の見積もり・発注 完全マニュアルホームページ制作の見積もり・発注 完全マニュアル
ホームページ制作の見積もり・発注 完全マニュアルWeb幹事
 
E xact micro 10 photometer v4
E xact micro 10 photometer v4E xact micro 10 photometer v4
E xact micro 10 photometer v4Ronnie Lewis
 
Alignment scoring functions
Alignment scoring functionsAlignment scoring functions
Alignment scoring functionsavrilcoghlan
 
Compact Street Lights - 25W LED STELLAR STREET LIGHT Specifications
Compact Street Lights - 25W LED STELLAR STREET LIGHT SpecificationsCompact Street Lights - 25W LED STELLAR STREET LIGHT Specifications
Compact Street Lights - 25W LED STELLAR STREET LIGHT SpecificationsCompact Lighting
 
Remote control of electrical equipment(eee499.blogspot.com)
Remote control of electrical equipment(eee499.blogspot.com)Remote control of electrical equipment(eee499.blogspot.com)
Remote control of electrical equipment(eee499.blogspot.com)slmnsvn
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Alexander Litvinenko
 
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...inventionjournals
 
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.Deep Learning JP
 

Similar to 2016.09.28TOPIC REVIEW• Exam • PS2 Sequence Alignment .docx (20)

Automatic phase changer
Automatic phase changerAutomatic phase changer
Automatic phase changer
 
Fast updating GG.pptx
Fast updating GG.pptxFast updating GG.pptx
Fast updating GG.pptx
 
bioinfo_6th_20070720
bioinfo_6th_20070720bioinfo_6th_20070720
bioinfo_6th_20070720
 
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
 
Global alignment
Global alignmentGlobal alignment
Global alignment
 
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdfMathcad - CMS (Component Mode Synthesis) Analysis.pdf
Mathcad - CMS (Component Mode Synthesis) Analysis.pdf
 
Py con 2018_youngsooksong
Py con 2018_youngsooksongPy con 2018_youngsooksong
Py con 2018_youngsooksong
 
ホームページ制作の見積もり・発注 完全マニュアル
ホームページ制作の見積もり・発注 完全マニュアルホームページ制作の見積もり・発注 完全マニュアル
ホームページ制作の見積もり・発注 完全マニュアル
 
E xact micro 10 photometer v4
E xact micro 10 photometer v4E xact micro 10 photometer v4
E xact micro 10 photometer v4
 
Alignment scoring functions
Alignment scoring functionsAlignment scoring functions
Alignment scoring functions
 
Compact Street Lights - 25W LED STELLAR STREET LIGHT Specifications
Compact Street Lights - 25W LED STELLAR STREET LIGHT SpecificationsCompact Street Lights - 25W LED STELLAR STREET LIGHT Specifications
Compact Street Lights - 25W LED STELLAR STREET LIGHT Specifications
 
Remote control of electrical equipment(eee499.blogspot.com)
Remote control of electrical equipment(eee499.blogspot.com)Remote control of electrical equipment(eee499.blogspot.com)
Remote control of electrical equipment(eee499.blogspot.com)
 
Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...Application of parallel hierarchical matrices and low-rank tensors in spatial...
Application of parallel hierarchical matrices and low-rank tensors in spatial...
 
4
44
4
 
Bioinformatics life sciences_v2015
Bioinformatics life sciences_v2015Bioinformatics life sciences_v2015
Bioinformatics life sciences_v2015
 
ShowNet2003-Topology
ShowNet2003-TopologyShowNet2003-Topology
ShowNet2003-Topology
 
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...Inventory Model with Different Deterioration Rates with Stock and Price Depen...
Inventory Model with Different Deterioration Rates with Stock and Price Depen...
 
Markov chain
Markov chainMarkov chain
Markov chain
 
Www.kutub.info 9472
Www.kutub.info 9472Www.kutub.info 9472
Www.kutub.info 9472
 
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.
[DL輪読会]深層強化学習はなぜ難しいのか?Why Deep RL fails? A brief survey of recent works.
 

More from felicidaddinwoodie

Business UseWeek 1 Assignment #1Instructions1. Plea.docx
Business UseWeek 1 Assignment #1Instructions1. Plea.docxBusiness UseWeek 1 Assignment #1Instructions1. Plea.docx
Business UseWeek 1 Assignment #1Instructions1. Plea.docxfelicidaddinwoodie
 
Business UsePALADIN ASSIGNMENT ScenarioYou are give.docx
Business UsePALADIN ASSIGNMENT ScenarioYou are give.docxBusiness UsePALADIN ASSIGNMENT ScenarioYou are give.docx
Business UsePALADIN ASSIGNMENT ScenarioYou are give.docxfelicidaddinwoodie
 
Business UsePractical Connection WorkThis work is a writte.docx
Business UsePractical Connection WorkThis work is a writte.docxBusiness UsePractical Connection WorkThis work is a writte.docx
Business UsePractical Connection WorkThis work is a writte.docxfelicidaddinwoodie
 
Business System AnalystSUMMARY· Cognos Business.docx
Business System AnalystSUMMARY· Cognos Business.docxBusiness System AnalystSUMMARY· Cognos Business.docx
Business System AnalystSUMMARY· Cognos Business.docxfelicidaddinwoodie
 
Business StrategyOrganizations have to develop an international .docx
Business StrategyOrganizations have to develop an international .docxBusiness StrategyOrganizations have to develop an international .docx
Business StrategyOrganizations have to develop an international .docxfelicidaddinwoodie
 
Business StrategyGroup BCase Study- KFC Business Analysis.docx
Business StrategyGroup BCase Study- KFC Business Analysis.docxBusiness StrategyGroup BCase Study- KFC Business Analysis.docx
Business StrategyGroup BCase Study- KFC Business Analysis.docxfelicidaddinwoodie
 
Business Strategy Differentiation, Cost Leadership, a.docx
Business Strategy Differentiation, Cost Leadership, a.docxBusiness Strategy Differentiation, Cost Leadership, a.docx
Business Strategy Differentiation, Cost Leadership, a.docxfelicidaddinwoodie
 
Business Research Methods, 11e, CooperSchindler1case.docx
Business Research Methods, 11e, CooperSchindler1case.docxBusiness Research Methods, 11e, CooperSchindler1case.docx
Business Research Methods, 11e, CooperSchindler1case.docxfelicidaddinwoodie
 
Business RequirementsReference number Document Control.docx
Business RequirementsReference number Document Control.docxBusiness RequirementsReference number Document Control.docx
Business RequirementsReference number Document Control.docxfelicidaddinwoodie
 
Business ProposalThe Business Proposal is the major writing .docx
Business ProposalThe Business Proposal is the major writing .docxBusiness ProposalThe Business Proposal is the major writing .docx
Business ProposalThe Business Proposal is the major writing .docxfelicidaddinwoodie
 
Business ProjectProject Progress Evaluation Feedback Form .docx
Business ProjectProject Progress Evaluation Feedback Form .docxBusiness ProjectProject Progress Evaluation Feedback Form .docx
Business ProjectProject Progress Evaluation Feedback Form .docxfelicidaddinwoodie
 
BUSINESS PROCESSES IN THE FUNCTION OF COST MANAGEMENT IN H.docx
BUSINESS PROCESSES IN THE FUNCTION OF COST MANAGEMENT IN H.docxBUSINESS PROCESSES IN THE FUNCTION OF COST MANAGEMENT IN H.docx
BUSINESS PROCESSES IN THE FUNCTION OF COST MANAGEMENT IN H.docxfelicidaddinwoodie
 
Business Process Management JournalBusiness process manageme.docx
Business Process Management JournalBusiness process manageme.docxBusiness Process Management JournalBusiness process manageme.docx
Business Process Management JournalBusiness process manageme.docxfelicidaddinwoodie
 
Business Process DiagramACCESS for ELL.docx
Business Process DiagramACCESS for ELL.docxBusiness Process DiagramACCESS for ELL.docx
Business Process DiagramACCESS for ELL.docxfelicidaddinwoodie
 
Business Plan[Your Name], OwnerPurdue GlobalBUSINESS PLANDate.docx
Business Plan[Your Name], OwnerPurdue GlobalBUSINESS PLANDate.docxBusiness Plan[Your Name], OwnerPurdue GlobalBUSINESS PLANDate.docx
Business Plan[Your Name], OwnerPurdue GlobalBUSINESS PLANDate.docxfelicidaddinwoodie
 
Business PlanCover Page  Name of Project, Contact Info, Da.docx
Business PlanCover Page  Name of Project, Contact Info, Da.docxBusiness PlanCover Page  Name of Project, Contact Info, Da.docx
Business PlanCover Page  Name of Project, Contact Info, Da.docxfelicidaddinwoodie
 
Business Planning and Program Planning A strategic plan.docx
Business Planning and Program Planning          A strategic plan.docxBusiness Planning and Program Planning          A strategic plan.docx
Business Planning and Program Planning A strategic plan.docxfelicidaddinwoodie
 
Business Plan In your assigned journal, describe the entity you wil.docx
Business Plan In your assigned journal, describe the entity you wil.docxBusiness Plan In your assigned journal, describe the entity you wil.docx
Business Plan In your assigned journal, describe the entity you wil.docxfelicidaddinwoodie
 
Business Plan Part IVPart IV of the Business PlanPart IV of .docx
Business Plan Part IVPart IV of the Business PlanPart IV of .docxBusiness Plan Part IVPart IV of the Business PlanPart IV of .docx
Business Plan Part IVPart IV of the Business PlanPart IV of .docxfelicidaddinwoodie
 
BUSINESS PLAN FORMAT          Whether you plan to apply for a bu.docx
BUSINESS PLAN FORMAT          Whether you plan to apply for a bu.docxBUSINESS PLAN FORMAT          Whether you plan to apply for a bu.docx
BUSINESS PLAN FORMAT          Whether you plan to apply for a bu.docxfelicidaddinwoodie
 

2016.09.28TOPIC REVIEW• Exam • PS2 Sequence Alignment .docx

  • 1. 2016.09.28 TOPIC REVIEW
    • Exam
    • PS2 Sequence Alignment
    • Command Line Blast
    • PS1 Molecular Biology
    • Personal Microbiome Project
    LET’S NEGOTIATE
    CURRENTLY
    • Problem sets (4) - 10%
    • Microbiome project - 20%
    • Exam (1) - 20%
    • Research project - 45%
    • Participation - 5%
    OR
    • Problem sets (4) - 10%
    • Microbiome project - 20%
    • Exam 1 - 15%
    • Exam 2 - 15%
    • Research project - 35%
    • Participation - 5%
    PS2 SEQUENCE ALIGNMENT
  • 2. PS2 SEQUENCE ALIGNMENT
    PRCD (progressive rod-cone degeneration)
    • RefSeqs, protein (experimentally supported)
    • On chromosome 17
    • Reverse strand
    PS2: GLOBAL ALIGNMENT
    BLOSUM62
    • Substitutions are penalized less and are preferred to gaps. The level of identity also decreases.
    BLOSUM80
    • Substitutions are penalized more and gaps are favored.
  • 3. PAM60
    • Substitutions are penalized more and gaps are favored.
    PAM250
    • Substitutions are penalized less and are preferred to gaps. The level of identity also decreases.
    PS2: LOCAL ALIGNMENT
    SEQ1  A L S C V W M I P  (9 residues)
    SEQ2  A I S C M I P T    (8 residues)
    Create matrix: (length of seq1 + 1) x (length of seq2 + 1)
    Matrix: 10 x 9
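The matrix set-up can be sketched in Python. This is a minimal illustration rather than anything taken from the slides: the function name `init_matrix` is hypothetical, and the border values assume the linear gap cost of -2 that the worked example uses.

```python
def init_matrix(seq1, seq2, gap=-2):
    """Build a (len(seq2) + 1) x (len(seq1) + 1) grid with the borders filled.

    seq1 runs along the columns and seq2 along the rows.
    The gap cost of -2 is an assumption taken from the worked example.
    """
    n_rows, n_cols = len(seq2) + 1, len(seq1) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        grid[0][j] = j * gap      # first row: 0, -2, -4, ...
    for i in range(n_rows):
        grid[i][0] = i * gap      # first column: 0, -2, -4, ...
    return grid


seq1 = "ALSCVWMIP"   # 9 residues
seq2 = "AISCMIPT"    # 8 residues
grid = init_matrix(seq1, seq2)
print(len(grid[0]), "x", len(grid))   # columns x rows
print(grid[0])                        # first row
print([row[0] for row in grid])       # first column
```

Cells that are not on the border stay `None` until the scoring step fills them.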
  • 4. Initial matrix:
             A   L   S   C   V   W   M   I   P
         0  -2  -4  -6  -8 -10 -12 -14 -16 -18
    A   -2
    I   -4
    S   -6
    C   -8
    M  -10
    I  -12
    P  -14
    T  -16
    Exercise: fill the scores of the alignment matrix using the BLOSUM62 substitution matrix.
    Gap opening penalty: -5
    Gap extension penalty: -1
    S V E T D
    T S I N Q E T
  • 5. BLOSUM62 substitution matrix (lower triangle):
               A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    Ala A   4
    Arg R  -1   5
    Asn N  -2   0   6
    Asp D  -2  -2   1   6
    Cys C   0  -3  -3  -3   9
    Gln Q  -1   1   0   0  -3   5
    Glu E  -1   0   0   2  -4   2   5
    Gly G   0  -2   0  -1  -3  -2  -2   6
    His H  -2   0   1  -1  -3   0   0  -2   8
    Ile I  -1  -3  -3  -3  -1  -3  -3  -4  -3   4
    Leu L  -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4
    Lys K  -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5
    Met M  -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5
    Phe F  -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6
    Pro P  -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7
    Ser S   1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4
    Thr T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5
    Trp W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11
    Tyr Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7
    Val V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
  • 7. Dynamic programming - global alignment
    BLOSUM62, GAP COST: -2
  • 8. At each cell, 3 scores are calculated:
    • Match score = diagonal cell score + score from the substitution matrix
    • Vertical gap score = upper neighbor + gap cost
    • Horizontal gap score = left neighbor + gap cost
    • The highest score is retained and the arrow is labelled
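The three-score rule can be written as a single recurrence. The notation is an assumption introduced here for illustration and does not appear on the slides: F(i, j) is the score in row i and column j, a_j is the j-th residue of seq1 (columns), b_i is the i-th residue of seq2 (rows), s is the BLOSUM62 score, and g = -2 is the gap cost.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,g, \qquad F(0,j) = j\,g, \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + s(a_j, b_i) & \text{(match score)} \\
F(i-1,\,j) + g & \text{(vertical gap score)} \\
F(i,\,j-1) + g & \text{(horizontal gap score)}
\end{cases}
\end{aligned}
```

With g = -2 the border conditions reproduce the first row 0, -2, ..., -18 and the first column 0, -2, ..., -16 of the 10 x 9 matrix.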
  • 12. First cell (A aligned with A): 4 comes from the substitution matrix.
    Match score = 0 + (4) = 4
  • 17. Match score = 0 + (4) = 4
    Vertical gap score = -2 + (-2) = -4
  • 21. Match score = 0 + (4) = 4
    Horizontal gap score = -2 + (-2) = -4
  • 22. Vertical gap score = -2 + (-2) = -4
    The highest score, 4, is retained. Row A now reads: -2 4
  • 30. Next cell (A aligned with L): -1 comes from the substitution matrix.
    Match score = -2 + (-1) = -3
  • 35. Match score = -2 + (-1) = -3
    Vertical gap score = -4 + (-2) = -6
  • 39. Match score = -2 + (-1) = -3
    Horizontal gap score = 4 + (-2) = 2
    Vertical gap score = -4 + (-2) = -6
  • 40. The highest score, 2, is retained. Row A now reads: -2 4 2
  • 49. Next cell (A aligned with S): +1 comes from the substitution matrix.
    Match score = -4 + (1) = -3
  • 53. Match score = -4 + (1) = -3
    Vertical gap score = -6 + (-2) = -8
  • 58. Match score = -4 + (1) = -3
    Horizontal gap score = 2 + (-2) = 0
    Vertical gap score = -6 + (-2) = -8
    The highest score, 0, is retained. Row A now reads: -2 4 2 0
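The first three cells of row A can be reproduced with a few lines of Python. This is a minimal sketch: the helper `cell_scores` and the variable names are hypothetical, and only the three BLOSUM62 entries needed for these cells are copied in.

```python
GAP = -2  # linear gap cost used in the worked example

# BLOSUM62 scores for A paired with A, L and S.
SUBST = {("A", "A"): 4, ("A", "L"): -1, ("A", "S"): 1}


def cell_scores(diagonal, upper, left, substitution, gap=GAP):
    """Return (match, vertical gap, horizontal gap) scores for one cell."""
    match = diagonal + substitution
    vertical = upper + gap
    horizontal = left + gap
    return match, vertical, horizontal


top_row = [0, -2, -4, -6]  # border row above row A
row_a = [-2]               # border value at the start of row A
for j, residue in enumerate("ALS", start=1):
    scores = cell_scores(top_row[j - 1], top_row[j], row_a[j - 1],
                         SUBST[("A", residue)])
    row_a.append(max(scores))  # the highest of the three scores is retained
    print(residue, scores, "->", max(scores))

print(row_a)
```

Each printed tuple lists the match, vertical gap, and horizontal gap scores in that order, so the values can be compared one by one with the slides.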
  • 62. Dynamic programming - global alignment
    BLOSUM62, gap cost: -2
    Cell at row A, column A:
    Match score = 0 + (4) = 4
    Horizontal gap score = -2 + (-2) = -4
    Vertical gap score = -2 + (-2) = -4
    Retained score: 4 (match)
  • 63. Completed matrix (Seq1 across the top, Seq2 down the side):
              A    L    S    C    V    W    M    I    P
         0   -2   -4   -6   -8  -10  -12  -14  -16  -18
    A   -2    4    2    0   -2   -4   -6   -8  -10  -12
    I   -4    2    6    4    2    1   -1   -3   -4   -6
    S   -6    0    4   10    8    6    4    2    0   -2
    C   -8   -2    2    8   19   17   15   13   11    9
    M  -10   -4    0    6   17   20   18   20   18   16
    I  -12   -6   -2    4   15   20   18   19   24   22
    P  -14   -8   -4    2   13   18   16   17   22   31
    T  -16  -10   -6    0   11   16   16   15   20   29
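The fill step for the whole matrix can be sketched as follows. This is a minimal illustration, assuming a linear gap cost of -2 and the BLOSUM62 lower triangle exactly as listed in the slides; the names `ORDER`, `LOWER`, `SCORES`, and `fill_matrix` are made up for this example.

```python
# Lower triangle of BLOSUM62 as listed in the slides; rows and columns follow ORDER.
ORDER = "ARNDCQEGHILKMFPSTWYV"
LOWER = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""

# Expand the triangle into a symmetric lookup keyed by residue pairs.
SCORES = {}
for i, row in enumerate(LOWER.strip().splitlines()):
    for j, value in enumerate(row.split()):
        SCORES[ORDER[i], ORDER[j]] = SCORES[ORDER[j], ORDER[i]] = int(value)


def fill_matrix(seq1, seq2, gap=-2):
    """Needleman-Wunsch fill: seq1 runs across the columns, seq2 down the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    F = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):          # first row: leading gaps in seq2
        F[0][j] = j * gap
    for i in range(1, rows):          # first column: leading gaps in seq1
        F[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = F[i - 1][j - 1] + SCORES[seq2[i - 1], seq1[j - 1]]
            vertical = F[i - 1][j] + gap
            horizontal = F[i][j - 1] + gap
            F[i][j] = max(match, vertical, horizontal)
    return F


F = fill_matrix("ALSCVWMIP", "AISCMIPT")
for label, row in zip(" AISCMIPT", F):
    print(label, " ".join(f"{v:4d}" for v in row))
```

The bottom-right cell of `F` is the score of the optimal global alignment. Only scores are stored here; a traceback can recover the moves afterwards by re-checking which neighbor produced each cell.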
  • 67. Traceback, built from the bottom-right cell:
    Seq1  -
    Seq2  T
  • 72. Traceback:
    Seq1  P  -
    Seq2  P  T
  • 77. Traceback:
    Seq1  I  P  -
    Seq2  I  P  T
  • 81. Traceback:
    Seq1  M  I  P  -
    Seq2  M  I  P  T
  • 86. Traceback:
    Seq1  W  M  I  P  -
    Seq2  -  M  I  P  T
  • 91. Traceback:
    Seq1  V  W  M  I  P  -
    Seq2  -  -  M  I  P  T
  • 95. Traceback:
    Seq1  C  V  W  M  I  P  -
    Seq2  C  -  -  M  I  P  T
  • 100. Traceback:
    Seq1  S  C  V  W  M  I  P  -
    Seq2  S  C  -  -  M  I  P  T
  • 105. Traceback:
    Seq1  L  S  C  V  W  M  I  P  -
    Seq2  I  S  C  -  -  M  I  P  T
  • 109. Traceback complete:
    Seq1  A  L  S  C  V  W  M  I  P  -
    Seq2  A  I  S  C  -  -  M  I  P  T
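The traceback frames above can be reproduced with a short Python sketch. It assumes the matrix has already been filled with the stated recurrence (gap cost -2) and is hard-coded here so the block runs on its own; the function name `traceback` and the tie-breaking order (gap moves checked before the diagonal) are choices made for this illustration, not something the slides specify.

```python
GAP = -2
SEQ1 = "ALSCVWMIP"  # across the columns
SEQ2 = "AISCMIPT"   # down the rows

# Filled global-alignment matrix for SEQ1 vs SEQ2 (BLOSUM62, gap cost -2).
F = [
    [0, -2, -4, -6, -8, -10, -12, -14, -16, -18],
    [-2, 4, 2, 0, -2, -4, -6, -8, -10, -12],
    [-4, 2, 6, 4, 2, 1, -1, -3, -4, -6],
    [-6, 0, 4, 10, 8, 6, 4, 2, 0, -2],
    [-8, -2, 2, 8, 19, 17, 15, 13, 11, 9],
    [-10, -4, 0, 6, 17, 20, 18, 20, 18, 16],
    [-12, -6, -2, 4, 15, 20, 18, 19, 24, 22],
    [-14, -8, -4, 2, 13, 18, 16, 17, 22, 31],
    [-16, -10, -6, 0, 11, 16, 16, 15, 20, 29],
]


def traceback(F, seq1, seq2, gap=GAP):
    """Walk from the bottom-right cell back to the origin, emitting one column per move."""
    i, j = len(seq2), len(seq1)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            # Vertical move: residue of seq2 aligned to a gap in seq1.
            top.append("-")
            bottom.append(seq2[i - 1])
            i -= 1
        elif j > 0 and F[i][j] == F[i][j - 1] + gap:
            # Horizontal move: residue of seq1 aligned to a gap in seq2.
            top.append(seq1[j - 1])
            bottom.append("-")
            j -= 1
        else:
            # Diagonal move: the two residues are aligned.
            top.append(seq1[j - 1])
            bottom.append(seq2[i - 1])
            i -= 1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


aligned1, aligned2 = traceback(F, SEQ1, SEQ2)
print(aligned1)
print(aligned2)
print("score:", F[-1][-1])
```

Any neighbor whose value plus the move cost reproduces the current cell is a valid optimal predecessor, so a different tie-breaking order may return a different alignment with the same score.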
  • 114. Final alignment and column scores:
    Seq1    A   L   S   C   V   W   M   I   P   -
    Seq2    A   I   S   C   -   -   M   I   P   T
    Score   4   2   4   9  -2  -2   5   4   7  -2
    Alignment score: 29 (the bottom-right cell of the matrix; the columns through P/P sum to 31 and the terminal gap opposite T costs -2)
    COMMAND LINE BLAST
    • Resource: https://www.ncbi.nlm.nih.gov/books/NBK279675/
    • Log onto compile
  • 115. source /usr/local/ncbi-blast-2.4.0+/blast_env.sh
    export BNFO301=/home/bnfo301/assignments/2016-09-28
    COMMAND LINE BLAST
    • Find the documentation for makeblastdb.
      makeblastdb -help
    • Create blast database
      makeblastdb -in $BNFO301/protein-db.faa -dbtype prot -parse_seqids -out /home/bnfo301/huangb2/protein-db.faa
    • Run blastp with query1.faa file
      blastp -query $BNFO301/query1.faa -db /home/bnfo301/huangb2/protein-db.faa -out query1-out.txt
    • Run blastp with query2.faa file
      blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt
    COMMAND LINE BLAST
    • Change the output format
    • Change the evalue
  • 116. • Change the substitution matrix
    Output format (-outfmt 7):
      blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7
    E-value threshold (-evalue 1e-05):
      blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7 -evalue 1e-05
    Substitution matrix (-matrix BLOSUM80):
      blastp -query $BNFO301/query2.faa -db /home/bnfo301/huangb2/protein-db.faa -out query2-out.txt -outfmt 7 -evalue 1e-05 -matrix BLOSUM80
    PS1 MOLECULAR BIOLOGY
    BNFO301: PS Molecular Biology
    DUE:
    1. Complete the following table:
    2. Below is the double-stranded DNA sequence of a hypothetical genome, which happens to have a very small gene.
      a. Which strand of DNA shown, the top or the bottom, is the template strand?
  • 117.   b. What is the sequence of the mRNA produced from this gene?
      c. What is the sequence of the protein produced from the mRNA?
      d. If a mutation were found where a T/A (top/bottom) base pair were added immediately after the T/A base pair shown in bold, what would be the sequence of the mRNA? What would be the sequence of the protein? What type of mutation is it?
    Translation Problem Set - 1
    BNFO 301: Introduction to Bioinformatics
    Introduction to Molecular Biology: Translation - Problem Set
    1. Complete the following table:
      DNA double helix: A G A T G T
      mRNA transcribed: 5' A U
  • 118.   Appropriate tRNA anticodon: U G 5'
      Amino acids incorporated into protein: met
    2. List the changes that can be produced by a single base-pair mutation in the AGA codon encoding arginine and label each silent (no effect on protein structure), conservative (mild effect on protein structure), hydrophobic-to-hydrophilic, hydrophilic-to-hydrophobic, or other.
    3. Hemophilia A is an X-linked disease associated with the absence of an essential blood clotting factor, factor VIII (if you don't have any idea what an X-linked trait is, don't worry about it). Factor VIII is encoded by the gene called FACTOR8. This gene was cloned from several individuals (some affected, some not) and sequenced. A portion of each sequence that you're sure contains the beginning of the gene (i.e., the start codon) was compared with the same portion of the wild-type sequence, as shown below. Each sequence contains only one mutation, shown emphasized.
      Wild-type     5'- GGAGTTGAGTCATGGACTCTAAGCAGCGATCCACAAAG...
  • 119.   Individual a  5'- GGAGTTTAGTCATGGACTCTAAGCAGCGATCCACAAAG...
      Individual b  5'- GGAGTTGAGTCATTGACTCTAAGCAGCGATCCACAAAG...
      Individual c  5'- GGAGTTGAGTCATGGACTCTTAGCAGCGATCCACAAAG...
      Individual d  5'- GGAGTTGAGTCATGGACTCTAAGCAGCTATCCACAAAG...
      Individual e  5'- GGAGTTGAGTCATGGACTCTAAGCAGCGATCCACTAAG...
    For each individual, choose from the list below to describe what you predict would be the severity of the phenotype, and give the reason for your choice.
      A. Severe hemophilia
      B. Mild hemophilia
      C. No hemophilia
  • 120. [Figure: translation diagram with 5’/3’ end labels and the amino acid labels Lys, Ala, Stop]
    PS1 MOLECULAR BIOLOGY
    • Which strand of DNA shown, the top or the bottom, is the template strand?
    • What is the sequence of the mRNA produced from this gene?
    • What is the sequence of the protein produced from the mRNA?
    • If a mutation were found where a T/A (top/bottom) base pair were added immediately after the T/A base pair shown in bold, what would be the sequence of the mRNA? What would be the sequence of the protein? What type of mutation is it?
    Template strand: Bottom
  • 121. DNA, mRNA, and protein sequences:
    5’- CTATAAAGAGCCATG CAT TAT CTA GAT AGT AGG CTC TGA GAATTTATCTCACT -3’
        ||||||||||||||| ||| ||| ||| ||| ||| ||| ||| ||| ||||||||||||||
    3’- GATATTTCTCGGTAC GTA ATA GAT CTA TCA TCC GAG ACT CTTAAATAGAGTGA -5’
        PROMOTER                                        TERMINATOR
    mRNA     5’- GAGCCAUG CAU UAU CUA GAU AGU AGG CUC UGA GAAUUUAUCUC -3’
    protein  N- met his tyr leu asp ser arg leu stop -C
    Standard Score Problems
    Assuming a population mean of 500 and a population standard deviation of 100 for the verbal subtest of the SAT exam:
    1. What percentage of the student population has SAT-V scores greater than 600?
    2. What percentage of the student population has SAT-V scores greater than 700?
    3. What percentage of the student population has SAT-V scores lower than 420?
    4. What percentage of the student population has SAT-V scores between 300 and 520?
    5. What percentage of the student population has SAT-V scores between 250 and 600?
    6. What percentage of the student population has SAT-V scores between 500 and 550?
  • 122. 7. A student gets a 620 on this test. Convert this to a percentile.
    8. A student gets a 340 on this test. Convert this to a percentile.
    BNFO301: Exam 1
    1. List all the changes that can be produced by a single base pair mutation in the AGA codon encoding arginine and label the resulting amino acid. In addition, label each mutation as silent, missense, or nonsense. (4 pts)
    2. What would be the value of using a dot plot to compare a sequence to its own reverse complement? (2 pts)
  • 123. 3. Sketch the dot plot of a 1 kb sequence in which a motif of approximately 50 consecutive bases appears six times in the N-terminal region of the sequence. (4 pts)
    4. Use the PAM250 matrix to answer question 4.
      a. Give the score for aligning two alanines (A). (1 pt)
      b. Give the score for aligning two tryptophans (W). (1 pt)
      c. Both of these alignments constitute “matches”, so why are the scores so different? (2 pts)
  • 124. Use the BLOSUM62 matrix for questions 5 and 6.
    5. Calculate the dynamic programming matrix and an optimal GLOBAL alignment for the protein sequences FKHMEDPLE and FMDTPLNE, scoring -2 for a gap (i.e., 2 is the gap penalty). Use the BLOSUM62 substitution matrix (given above).
      a. Fill out the matrix. (6 pts)
      b. Highlight the traceback alignment. (1 pt)
      c. Write out the final alignment. (2 pts)
      d. Score the final alignment. (1 pt)
  • 125. 6. Calculate the dynamic programming matrix and an optimal LOCAL alignment for the protein sequences FKHMEDPLE and FMDTPLNE. Use the BLOSUM62 matrix (provided above).
      a. Fill out the matrix. (6 pts)
      b. Highlight the traceback local alignments. (1 pt)
      c. Write out the final alignment. (2 pts)
      d. Score the final alignment. (1 pt)
  • 126. 7. What is 16S rRNA and what is its function inside a cell? (2 pts)
    8. 16S rRNA is widely used in microbiome studies. List two strengths and two limitations of 16S rRNA sequencing. (4 pts)
  • 127. 9. Can 16S rRNA be used to classify viruses? Why or why not? (2 pts)
    10. Which of the following amino acids is least mutable according to the PAM scoring matrix? (2 pts)
      a. Alanine
      b. Glutamine
      c. Methionine
      d. Cysteine
  • 128. 1. Which of the following sentences BEST describes the difference between a global alignment and a local alignment between two sequences? (2 pts)
      a. Global alignment is usually used for DNA sequences, while local alignment is usually used for protein sequences.
      b. Global alignment has gaps, while local alignment does not have gaps.
      c. Global alignment finds the global maximum, while local alignment finds the local maximum.
      d. Global alignment aligns the whole sequence, while local alignment finds the best subsequence that aligns.
    2. How does the BLOSUM scoring matrix differ most notably from the PAM scoring matrix? (2 pts)
      a. It is best used for aligning very closely related proteins.
      b. It is based on global multiple alignment from closely related proteins.
      c. It is based on local multiple alignments from distantly related proteins.
      d. It combines local and global alignment information.
  • 129. 3. A global alignment algorithm (such as the Needleman-Wunsch algorithm) is guaranteed to find an optimal alignment. Such an algorithm: (2 pts)
      a. puts the two proteins being compared into a matrix and finds the optimal score by exhaustively searching every possible combination of alignments.
      b. puts the two proteins being compared into a matrix and finds the optimal score by iterative recursions.
      c. puts the two proteins being compared into a matrix and finds the optimal alignment by finding optimal subpaths that define the best alignment(s).
      d. can be used for proteins but not for DNA sequences.
    4. What are the basic concepts of library preparation? (4 pts)
    5. List 3 applications of next-generation sequencing. (2 pts)
    6. How many reads do you need to get 30x coverage of your genome if your read length is 300 bp and your genome size is 10 Mb? (2 pts)
  • 130. Command line
    Log in to compile. Navigate to the bnfo301 (/home/bnfo301) directory. There is a folder called exam1 where you will find all the files you need to answer the next set of questions.
    Instructions for this section:
    • Write your output files to your user-specific folder in /home/bnfo301 (ex. my user-specific folder is /home/bnfo301/huangb2). You will be graded on the files found in your specific folder. If the files are not in that folder you will not get credit for your answers. No exceptions.
    • Make sure you name your output file as instructed in each question. I will take off 1 point for each output file that is not correctly named.
    • Code is typically written using a fixed-width font. Use a fixed-width font to type your commands in this section (ex. courier, inconsolata, menlo, monaco).
    • For each question, provide the command when specified, or the command and answer. All output files from this section should be written to your user-specific folder on compile. I will access your user-specific folder to grade this section.
  • 131. 1. List the files in the exam1 folder. command only (2 pts)
    2. Count how many sequences are in the protein-db.faa file. command and answer (2 pts)
    3. You have an unknown1.faa sequence that you want to blast against sequences in the protein-db.faa file.
      a. Copy the protein-db.faa to your user-specific folder. command only
      b. Create a blast database for protein-db.faa. command only (2 pts)
      c. Blast unknown1.faa against the database you just created. Name your blast output file 3b-unknown-output.txt. command only, leave output file on Compile (2 pts)
      d. Filter your blast results for hits with an evalue greater than 1e-05. Name your blast output file 3c-unknown-output.txt. command only, leave output file on Compile (2 pts)
      e. What is the percent identity and alignment length of the best hit in your blast results when you filter based on an evalue greater than 1e-05? Hint: you may need to change your output format. (8 pts)
  • 132.   f. What is the percent identity and alignment length of the worst hit in your blast results when you filter based on an evalue greater than 1e-05? Hint: you may need to change your output format. (4 pts)
    7. BLAST is a tool that can be used to query multiple databases. It is not always necessary to create your own database. One of the most common blast databases is the non-redundant database (nr).
      a. Blast the unknown1.faa sequence against the nr database (/home/norrissw/bin/I-TASSER4.2/lib/nr/nr) to find out what it is. Name your blast output file 4a-unknown-nr-output.txt. NOTE: you do not need to run the makeblastdb command. Also, it can take a few minutes for your blast to run because the nr database is very big. command only, leave output file on Compile (2 pts)
      b. Filter your blast results for hits with an e-value greater than 1e-10. Name your blast output file 4b-unknown-nr-output.txt. command only, leave output file on Compile (2 pts)
  • 133.   c. Based on the best hit from nr, take the accession number and identify what that protein is. (4 pts)
    8. The next set of questions involves the pipeline.py script.
      a. Copy the pipeline.py script to your /home/bnfo301/vcuid folder. (2 pts)
      b. Rename the pipeline.py script to 5b-pipeline.py. (2 pts)
      c. Describe in detail what the script is doing, including what the output from each step is. (4 pts)
      d. Modify the script so it filters the blast results using an e-value cutoff of 1e-05. Save the modified script as 5d-pipeline.py. You do not need to run the script, just add in your modification. leave output file on Compile (2 pts)
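For reference, filtering tabular BLAST output by e-value after the search has run can be sketched in Python as below. This is not the course's pipeline.py; it is a minimal illustration that assumes the default 12-column layout of -outfmt 6 and -outfmt 7. The function names `parse_tabular` and `filter_by_evalue` and the three sample hits are invented for the example.

```python
# Default column order of BLAST tabular output (-outfmt 6 and -outfmt 7).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(text):
    """Turn tabular BLAST output into a list of dicts, skipping '#' comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(COLUMNS, line.split("\t")))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def filter_by_evalue(hits, cutoff=1e-05):
    """Keep hits whose e-value is at or below the cutoff (smaller is more significant)."""
    return [hit for hit in hits if hit["evalue"] <= cutoff]


# Made-up rows in the -outfmt 7 layout, for illustration only.
SAMPLE = "\n".join([
    "# BLASTP (made-up example)",
    "\t".join(["query1", "subjA", "98.50", "200", "3", "0", "1", "200", "1", "200", "2e-120", "410"]),
    "\t".join(["query1", "subjB", "45.00", "120", "60", "2", "5", "124", "10", "129", "3e-08", "55.1"]),
    "\t".join(["query1", "subjC", "30.00", "40", "28", "0", "50", "89", "7", "46", "0.5", "20.4"]),
])

hits = parse_tabular(SAMPLE)
kept = filter_by_evalue(hits, cutoff=1e-05)
for hit in kept:
    print(hit["sseqid"], hit["pident"], hit["length"], hit["evalue"])
```

Passing -evalue 1e-05 to blastp applies the threshold during the search itself; a post-hoc filter like this one is mainly useful when several cutoffs are tried against one saved result file.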