FBW 
21-10-2014 
Wim Van Criekinge
Class on 4 November; NO class on 18 November
Rat versus mouse RBP
Rat versus bacterial lipocalin
– Henikoff and Henikoff have compared the 
BLOSUM matrices to PAM by evaluating how 
effectively the matrices can detect known members 
of a protein family from a database when searching 
with the ungapped local alignment program 
BLAST. They conclude that overall the BLOSUM 
62 matrix is the most effective. 
• However, all the substitution matrices investigated 
perform better than BLOSUM 62 for a proportion of 
the families. This suggests that no single matrix is 
the complete answer for all sequence comparisons. 
• It is probably best to complement the BLOSUM 62
matrix with comparisons using the PAM250 matrix and
Overington structurally derived matrices.
– It seems likely that as more protein three 
dimensional structures are determined, substitution 
tables derived from structure comparison will give 
the most reliable data. 
Overview
Dotplots 
• What is it ? 
– Graphical representation using two orthogonal 
axes and “dots” for regions of similarity. 
– In a bioinformatics context two sequences are
used on the axes and dots are plotted when a
given threshold is met in a given window.
• Dot-plotting is the best way to see all of the 
structures in common between two 
sequences or to visualize all of the repeated 
or inverted repeated structures in one 
sequence
Visual Alignments (Dot Plots) 
• Matrix 
– Rows: Characters in one sequence 
– Columns: Characters in second sequence 
• Filling 
– Loop through each row; if character in row, col match, fill 
in the cell 
– Continue until all cells have been examined
Dotplot-simulator.pl 
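# assumes $seq1, $seq2 and $window are set earlier in the script
my ($seq1_length, $seq2_length) = (length($seq1), length($seq2));
print " $seq1\n";                            # header row: sequence 1
for (my $teller = 0; $teller < $seq2_length; $teller++) {
    print substr($seq2, $teller, 1);         # row label: one residue of sequence 2
    my $w2 = substr($seq2, $teller, $window);
    for (my $teller2 = 0; $teller2 < $seq1_length; $teller2++) {
        my $w1 = substr($seq1, $teller2, $window);
        # plot a dot when both windows are identical (stringency 100%)
        if ($w1 eq $w2) { print "*"; } else { print " "; }
    }
    print "\n";
}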
Window size = 1, stringency 100%
Noise in Dot Plots 
• Nucleic Acids (DNA, RNA) 
– 1 out of 4 bases matches at random 
• Stringency 
– Window size is considered 
– Percentage of bases matching in the window is 
set as threshold
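The window and stringency idea can be sketched as follows. This is a minimal illustration in Python rather than the course's Perl; the function name `dotplot` and its parameters are assumptions made for this example, not part of Dotplot-simulator.pl. A dot is plotted when the fraction of matching positions in the window reaches the stringency threshold.

```python
def dotplot(seq1, seq2, window=1, stringency=1.0):
    """Return the plot as a list of strings: rows follow seq2, columns follow seq1."""
    rows = []
    for i in range(len(seq2) - window + 1):
        row = []
        for j in range(len(seq1) - window + 1):
            # count matching positions inside the two windows
            matches = sum(a == b for a, b in zip(seq1[j:j + window], seq2[i:i + window]))
            row.append("*" if matches >= stringency * window else " ")
        rows.append("".join(row))
    return rows


# self-comparison of the example sequence used in the slides
seq = "ACCTGAGCTCACCTGAGTTA"
for line in dotplot(seq, seq, window=3, stringency=1.0):
    print(line)
```

Raising `window` or `stringency` removes isolated random dots, while the main diagonal and true repeats stay visible.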
Reduction of Dot Plot Noise 
Self alignment of ACCTGAGCTCACCTGAGTTA
Dotplot-simulator.pl 
Example: ZK822 Genomic and cDNA 
Gene prediction: 
How many exons ? 
Confirm donor and acceptor sites ?
Remember to check the reverse complement !
Chromosome Y self comparison
• Regions of similarity appear 
as diagonal runs of dots 
• Reverse diagonals 
(perpendicular to diagonal) 
indicate inversions 
• Reverse diagonals crossing 
diagonals (Xs) indicate 
palindromes 
• A gap is introduced by each 
vertical or horizontal skip 
• Window size changes with goal 
of analysis 
– size of average exon 
– size of average protein structural 
element 
– size of gene promoter 
– size of enzyme active site 
Rules of thumb
• Don't plot too many points: about 3-5 times
the length of the sequence is about right (1-2%)
• Window size about 20 for distant
proteins, 12 for nucleic acids
• Check sequence vs. itself
• Check sequence vs. sequence
• Anticipate results
(e.g. “in-house” sequence vs genomic,
question)
Available Dot Plot Programs 
Dotlet (Java Applet) 
http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
Sequence Alignments 
Introduction 
Algorithms 
What ? 
Examples 
Properties 
Dynamic Programming for Pairwise Alignment 
Concept 
Example 
Needleman-Wunsch(.pl) 
Smith-Waterman(.pl) 
Multiple Alignment 
MSA 
Hierarchical Pairwise Alignment
ClustalW, PileUp 
Formatting 
Interpretation 
Alternative Methods 
SIM 
Blast2 
Dali
Global and local alignment 
Pairwise sequence alignment can be global or local 
Global: the sequences are completely aligned 
(Needleman and Wunsch, 1970) 
Local: only the best sub-regions are aligned 
(Smith and Waterman, 1981). BLAST 
uses local alignment.
Why do we do multiple alignments?
– In order to characterize protein families, identify 
shared regions of homology in a multiple 
sequence alignment; (this happens generally 
when a sequence search revealed homologies to 
several sequences) 
– Determination of the consensus sequence of 
several aligned sequences 
– Help prediction of the secondary and tertiary 
structures of new sequences; 
– Preliminary step in molecular evolution analysis 
using Phylogenetic methods for constructing 
phylogenetic trees 
– Garbage in, Garbage out 
– Chicken/egg
Why do we do multiple alignments?
• To find conserved regions 
– Local multiple alignment reveals conserved 
regions 
– Conserved regions usually are key functional 
regions 
– These regions are prime targets for drug 
developments 
• To do phylogenetic analysis: 
– Same protein from different species 
– Optimal multiple alignment probably implies 
history 
– Discover irregularities, such as Cystic Fibrosis 
gene
VTISCTGSSSNIGAG-NHVKWYQQLPG 
VTISCTGTSSNIGS--ITVNWYQQLPG 
LRLSCSSSGFIFSS--YAMYWVRQAPG 
LSLTCTVSGTSFDD--YYSTWVRQPPG 
PEVTCVVVDVSHEDPQVKFNWYVDG-- 
ATLVCLISDFYPGA--VTVAWKADS-- 
AALGCLVKDYFPEP--VTVSWNSG--- 
VSLTCLVKGFYPSD--IAVEWWSNG--
Algorithms and Programs 
• Algorithm: a method or a process followed 
to solve a problem. 
– A recipe. 
• An algorithm takes the input to a problem 
(function) and transforms it to the output. 
– A mapping of input to output. 
• A problem can have many algorithms.
Bubble Sort Algorithm 
One of the simplest sorting algorithms proceeds by walking down the list, comparing 
adjacent elements, and swapping them if they are in the wrong order. The process is 
continued until the list is sorted. 
More formally: 
1. Initialize the size of the list to be sorted to be the actual size of the list. 
2. Loop through the list until no element needs to be exchanged with another 
to reach its correct position. 
2.1 Loop (i) from 0 to size of the list to be sorted - 2. 
2.1.1 Compare the ith and (i + 1)st elements in the unsorted list. 
2.1.2 Swap the ith and (i + 1)st elements if not in order ( ascending or 
descending as desired). 
2.2 Decrease the size of the list to be sorted by 1. 
Each pass "bubbles" the largest element in the unsorted part of the list to its correct location. 
A 13 7 43 5 3 19 2 23 29 ?? ?? ?? ?? ??
Bubble Sort Implementation 
Here is an ascending-order implementation of the bubblesort algorithm for integer arrays: 
void BubbleSort(int List[], int Size) {
    int tempInt;                                      // temp variable for swapping list elems
    for (int Stop = Size - 1; Stop > 0; Stop--) {
        for (int Check = 0; Check < Stop; Check++) {  // make a pass
            if (List[Check] > List[Check + 1]) {      // compare elems
                tempInt = List[Check];                // swap if in the
                List[Check] = List[Check + 1];        // wrong order
                List[Check + 1] = tempInt;
            }
        }
    }
}
Bubblesort compares and swaps adjacent elements; simple but not very efficient. 
Efficiency note: the outer loop could be modified to exit if the list is already sorted.
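The efficiency note can be illustrated with a short sketch, written in Python for brevity rather than the C-style code of the slide. The function name `bubble_sort` and the `swapped` flag are choices made for this example: the outer loop stops as soon as a full pass makes no swap.

```python
def bubble_sort(items):
    """Ascending bubble sort that exits early when a pass makes no swap."""
    items = list(items)          # work on a copy
    stop = len(items) - 1        # last index of the unsorted part
    swapped = True
    while stop > 0 and swapped:
        swapped = False
        for check in range(stop):                 # make a pass
            if items[check] > items[check + 1]:   # compare neighbours
                items[check], items[check + 1] = items[check + 1], items[check]
                swapped = True
        stop -= 1                # the largest element has bubbled to its place
    return items


print(bubble_sort([13, 7, 43, 5, 3, 19, 2, 23, 29]))
```

On an already sorted list this version makes a single pass, so the best case needs about n comparisons instead of about n²/2.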
Ice cream
• 6 egg yolks + 105 g S1 granulated sugar
• Beat for 1 minute until “ruban” (ribbon stage)
• Meanwhile, heat 500 ml whole milk
with 105 g S1 sugar
• Add vanilla and/or chocolate (cinnamon)
• Slowly whisk the nearly boiling milk into the ruban
(off the heat)
• Back on the heat: “Porter à la nappe”
• Cool down
• Churn (in the ice cream machine)
• 15 seconds before it sets, add 500 ml cream
Ice cream implementation
"Great algorithms are the poetry of computation" 
1946: The Metropolis Algorithm for Monte Carlo. Through the use of random 
processes, this algorithm offers an efficient way to stumble toward answers to 
problems that are too complicated to solve exactly. 
1947: Simplex Method for Linear Programming. An elegant solution to a common 
problem in planning and decision-making. 
1950: Krylov Subspace Iteration Method. A technique for rapidly solving the linear 
equations that abound in scientific computation. 
1951: The Decompositional Approach to Matrix Computations. A suite of techniques 
for numerical linear algebra. 
1957: The Fortran Optimizing Compiler. Turns high-level code into efficient 
computer-readable code. 
1959: QR Algorithm for Computing Eigenvalues. Another crucial matrix operation 
made swift and practical. 
1962: Quicksort Algorithms for Sorting. For the efficient handling of large databases. 
1965: Fast Fourier Transform. Perhaps the most ubiquitous algorithm in use today, it 
breaks down waveforms (like sound) into periodic components. 
1977: Integer Relation Detection. A fast method for spotting simple equations satisfied 
by collections of seemingly unrelated numbers. 
1987: Fast Multipole Method. A breakthrough in dealing with the complexity of n-body 
calculations, applied in problems ranging from celestial mechanics to protein folding. 
From Random Samples, Science page 799, February 4, 2000.
Algorithm Properties 
• An algorithm possesses the following 
properties: 
– It must be correct. 
– It must be composed of a series of concrete steps. 
– There can be no ambiguity as to which step will be 
performed next. 
– It must be composed of a finite number of steps. 
– It must terminate. 
• A computer program is an instance, or
concrete representation, of an algorithm
in some programming language.
Measuring Algorithm Efficiency 
• Types of complexity 
– Space complexity 
– Time complexity 
• Analysis of algorithms 
– The measuring of the complexity of an algorithm 
• Cannot compute actual time for an algorithm 
– We usually measure worst-case time
Measuring Algorithm Efficiency 
Three algorithms for computing 
1 + 2 + … n for an integer n > 0
Measuring Algorithm Efficiency 
The number of operations required by the algorithms
Measuring Algorithm Efficiency 
The number of operations required by the algorithms as a 
function of n
Big Oh Notation 
• To say "Algorithm A has a worst-case time 
requirement proportional to n" 
– We say A is O(n) 
– Read "Big Oh of n" 
• For the other two algorithms 
– Algorithm B is O(n²)
– Algorithm C is O(1) 
• O is derived from order (magnitude)
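The three algorithms themselves were shown only as a figure, so the sketch below is an assumption about what they look like: three common ways of computing 1 + 2 + … + n whose operation counts grow as n, n² and a constant. It is written in Python, and the function names are hypothetical.

```python
def sum_a(n):
    """Algorithm A, O(n): one addition per term."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_b(n):
    """Algorithm B, O(n^2): builds each term i by adding 1 i times."""
    total = 0
    for i in range(1, n + 1):
        for _ in range(i):
            total += 1
    return total


def sum_c(n):
    """Algorithm C, O(1): closed formula n(n + 1)/2."""
    return n * (n + 1) // 2


print(sum_a(100), sum_b(100), sum_c(100))
```

All three return the same value; only the amount of work differs.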
Picturing Efficiency 
O(n) algorithm
Picturing Efficiency 
An O(n²) algorithm.
Picturing Efficiency
Another O(n²) algorithm.
The best alignment: 
The one with the maximum total 
score
• Exhaustive … 
– All combinations: 
• Algorithm 
– Dynamic programming (much faster) 
• Heuristics 
– Needleman-Wunsch for global
alignments
(Journal of Molecular Biology, 1970)
– Later adapted by Smith-Waterman
for local alignment
• Score of an alignment: reward 
matches and penalize mismatches 
and spaces. 
– eg, each column gets a (different) 
value for: 
• a match: +1, (both have the same 
characters); 
• a mismatch : -1, (both have different 
characters); and 
• a space in a column: -2. 
– The total score of an alignment is the 
sum of the values assigned to its 
columns.
A metric … 
GACGGATTAG, GATCGGAATAG 
GA-CGGATTAG 
GATCGGAATAG 
+1 (a match), -1 (a mismatch),-2 (gap) 
9*1 + 1*(-1)+1*(-2) = 6
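The column-by-column scoring can be written down directly. The following is a small Python sketch (the function name `alignment_score` is an assumption for this example) that sums +1 per match, -1 per mismatch and -2 per column containing a space, using the two aligned rows above.

```python
def alignment_score(row1, row2, match=1, mismatch=-1, space=-2):
    """Sum the column values of an alignment given as two equal-length rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += space       # a space in this column
        elif a == b:
            total += match       # both rows have the same character
        else:
            total += mismatch    # both rows have different characters
    return total


print(alignment_score("GA-CGGATTAG", "GATCGGAATAG"))
```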
Dynamic programming 
Reduce the problem:
solving a large problem becomes
simpler if we first know the
solution to a smaller problem that
is a subset of the larger problem
[Figure: problem P split into subproblems P1, P2, P3 and combined again into P]
Dynamic Programming 
• Finding optimal solution to search 
problem 
• Recursively computes solution 
• Fundamental principle is to produce 
optimal solutions to smaller pieces of 
the problem first and then glue them 
together 
• Efficient divide-and-conquer strategy 
because it uses a bottom-up approach 
and utilizes a look-up table instead of 
recomputing optimal solutions to sub-problems 
[Figure: problem P split into subproblems P1, P2, P3 and combined again into P]
Dynamic Programming 
What is the best way to get from A to C ? 
Rules: Three stops 
Solutions: Try all and select best, requires 
(combin(13,3)) = 286 calculations 
A C
Dynamic Programming 
What is the best way to get from A to C ? 
If we know that B is on the optimal path ?
A B C
Dynamic Programming 
What is the best way to get from A to B ? 
[Figure: nodes A, B and C with routes numbered 1 to 6]
Dynamic Programming 
What is the best way to get from B to C ? 
[Figure: nodes A, B and C with routes numbered 1 to 6]
Dynamic Programming 
How many paths from A to C via B ? 
6 * 6 = 36 
[Figure: nodes A, B and C with routes numbered 1 to 6]
Dynamic Programming 
Solve the subproblem A to B: 6 calculations 
[Figure: nodes A, B and C with routes numbered 1 to 6]
Dynamic Programming 
Solve the subproblem B to C: 6 calculations 
[Figure: nodes A, B and C with routes numbered 1 to 6]
Dynamic Programming 
If B is on optimal path from A->C, this 
optimal path = optimal path from A to B + 
optimal path from B to C 
12 calculations needed (not 36 or 286) 
[Figure: nodes A, B and C; routes labelled 5 and 3]
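The saving from 36 to 12 calculations can be checked with a toy example in Python. The route costs below are hypothetical numbers invented for this sketch (they are not taken from the slides), and "best" is taken to mean lowest total cost.

```python
from itertools import product

# hypothetical costs of the six routes A -> B and the six routes B -> C
a_to_b = [7, 3, 9, 4, 8, 6]
b_to_c = [5, 9, 6, 8, 5, 7]

# exhaustive search: evaluate every A -> B -> C combination
best_exhaustive = min(x + y for x, y in product(a_to_b, b_to_c))
evaluated_exhaustive = len(a_to_b) * len(b_to_c)

# dynamic programming: solve each subproblem once, then glue the optima together
best_dp = min(a_to_b) + min(b_to_c)
evaluated_dp = len(a_to_b) + len(b_to_c)

print(best_exhaustive, evaluated_exhaustive)
print(best_dp, evaluated_dp)
```

Both approaches find the same optimum, but the second looks at 6 + 6 routes instead of 6 * 6 combinations.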
the best alignment between 
• a zinc-finger core sequence: 
–CKHVFCRVCI 
• and a sequence fragment 
from a viral polyprotein: 
–CKKCFCKCV
    C K H V F C R V C I
  +--------------------
C | 1         1     1
K |   1
K |   1
C | 1         1     1
F |         1
C | 1         1     1
K |   1
C | 1         1     1
V |       1       1
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         1     1 0
K |   1               0
K |   1               0
C | 1         1     1 0
F |         1         0
C | 1         1     1 0
K |   1               0
C | 1         1     1 0
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         1     1 0
K |   1               0
K |   1               0
C | 1         1     1 0
F |         1         0
C | 1         1     1 0
K |   1               0
C | 2         1     1 0
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         1     1 0
K |   1             0 0
K |   1             0 0
C | 1         1     1 0
F |         1       0 0
C | 1         1     1 0
K |   1             0 0
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         1   1 1 0
K |   1           1 0 0
K |   1           1 0 0
C | 1         1   1 1 0
F |         1     1 0 0
C | 1         1   1 1 0
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         1 1 1 1 0
K |   1         1 1 0 0
K |   1         1 1 0 0
C | 1         1 1 1 1 0
F |         1   1 1 0 0
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1         2 1 1 1 0
K |   1       1 1 1 0 0
K |   1       1 1 1 0 0
C | 1         2 1 1 1 0
F | 2 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1       2 2 1 1 1 0
K |   1     2 1 1 1 0 0
K |   1     2 1 1 1 0 0
C | 4 3 3 3 2 2 1 1 1 0 
F | 2 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1     3 2 2 1 1 1 0
K |   1   3 2 1 1 1 0 0
K | 3 4 3 3 2 1 1 1 0 0 
C | 4 3 3 3 2 2 1 1 1 0 
F | 2 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 1   3 3 2 2 1 1 1 0
K | 4 4 3 3 2 1 1 1 0 0 
K | 3 4 3 3 2 1 1 1 0 0 
C | 4 3 3 3 2 2 1 1 1 0 
F | 2 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 5 3 3 3 2 2 1 1 1 0 
K | 4 4 3 3 2 1 1 1 0 0 
K | 3 4 3 3 2 1 1 1 0 0 
C | 4 3 3 3 2 2 1 1 1 0 
F | 2 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 5 3 3 3 2 2 1 1 1 0 
K | 4 4 3 3 2 1 1 1 0 0 
K | 3 4 3 3 2 1 1 1 0 0 
C | 4 3 3 3 2 2 1 1 1 0 
F | 3 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
Dynamic Programming
    C K H V F C R V C I
  +--------------------
C | 5 3 3 3 2 2 1 1 1 0 
K | 4 4 3 3 2 1 1 1 0 0 
K | 3 4 3 3 2 1 1 1 0 0 
C | 4 3 3 3 2 2 1 1 1 0 
F | 3 2 2 2 3 1 1 1 0 0 
C | 4 2 2 2 2 2 1 1 1 0 
K | 2 3 2 2 2 1 1 1 0 0 
C | 2 1 1 1 1 2 1 0 1 0 
V | 0 0 0 1 0 0 0 1 0 0 
C K H V F C R V C I 
C K K C F C - K C V 
C K H V F C R V C I 
C K K C F C K - C V 
C - K H V F C R V C I 
C K K C - F C - K C V 
C K H - V F C R V C I 
C K K C - F C - K C V 
Dynamic Programming
Extensions to basic dynamic programming method 
use gap penalties 
– constant gap penalty for gap > 1 
– gap penalty proportional to gap size 
• one penalty for starting a gap (gap opening 
penalty) 
• different (lower) penalty for adding to a gap 
(gap extension penalty) 
• for nucleic acids, can be used to mimic 
thermodynamics of helix formation 
– two kinds of gap opening penalties 
• one for gap closed by AT, different for GC 
Dynamic Programming
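The opening and extension scheme can be expressed as a one-line formula. The Python sketch below assumes hypothetical penalty values (-5 to open a gap, -1 for each further position); the slides do not give numbers, and the function names are chosen for this example only.

```python
def affine_gap_penalty(length, opening=-5, extension=-1):
    """Penalty for one gap: pay `opening` once, then `extension` for each extra position."""
    if length <= 0:
        return 0
    return opening + extension * (length - 1)


def proportional_gap_penalty(length, per_position=-2):
    """Penalty proportional to the gap size."""
    return per_position * max(length, 0)


for size in (1, 2, 5, 10):
    print(size, affine_gap_penalty(size), proportional_gap_penalty(size))
```

With these example values one long gap is cheaper than several short ones under the affine scheme, which is the behaviour the opening and extension penalties are meant to produce.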
• See the course notes for an example with gap penalties
– find the errors ;-)
• Available as a Perl program that we can
experiment with
Needleman-Wunsch.pl 
# initialization
my @matrix;
$matrix[0][0]{score}   = 0;
$matrix[0][0]{pointer} = "none";
for (my $j = 1; $j <= length($seq1); $j++) {
    $matrix[0][$j]{score}   = $GAP * $j;
    $matrix[0][$j]{pointer} = "left";
}
for (my $i = 1; $i <= length($seq2); $i++) {
    $matrix[$i][0]{score}   = $GAP * $i;
    $matrix[$i][0]{pointer} = "up";
}
Needleman-Wunsch-edu.pl 
The Score Matrix
----------------
Seq1(j)        1   2   3   4   5   6   7
Seq2       *   C   K   H   V   F   C   R
(i)  *     0  -1  -2  -3  -4  -5  -6  -7
1    C    -1   1   0  -1  -2  -3  -4  -5
2    K    -2   0   2   1   0  -1  -2  -3
3    K    -3  -1   1   1   0  -1  -2  -3
4    C    -4  -2   0   0   0  -1   0  -1
5    F    -5  -3  -1  -1  -1   1   0  -1
6    C    -6  -4  -2  -2  -2   0   2   1
7    K    -7  -5  -3  -3  -3  -1   1   1
8    C    -8  -6  -4  -4  -4  -2   0   0
9    V    -9  -7  -5  -5  -3  -3  -1  -1
Needleman-Wunsch.pl 
# fill
for (my $i = 1; $i <= length($seq2); $i++) {
    for (my $j = 1; $j <= length($seq1); $j++) {
        my ($diagonal_score, $left_score, $up_score);
        # calculate match score
        my $letter1 = substr($seq1, $j-1, 1);
        my $letter2 = substr($seq2, $i-1, 1);
        if ($letter1 eq $letter2) {
            $diagonal_score = $matrix[$i-1][$j-1]{score} + $MATCH;
        }
        else {
            $diagonal_score = $matrix[$i-1][$j-1]{score} + $MISMATCH;
        }
        # calculate gap scores
        $up_score   = $matrix[$i-1][$j]{score} + $GAP;
        $left_score = $matrix[$i][$j-1]{score} + $GAP;
        # choose best score (ties: diagonal first, then up, then left)
        if ($diagonal_score >= $up_score) {
            if ($diagonal_score >= $left_score) {
                $matrix[$i][$j]{score}   = $diagonal_score;
                $matrix[$i][$j]{pointer} = "diagonal";
            }
            else {
                $matrix[$i][$j]{score}   = $left_score;
                $matrix[$i][$j]{pointer} = "left";
            }
        } else {
            if ($up_score >= $left_score) {
                $matrix[$i][$j]{score}   = $up_score;
                $matrix[$i][$j]{pointer} = "up";
            }
            else {
                $matrix[$i][$j]{score}   = $left_score;
                $matrix[$i][$j]{pointer} = "left";
            }
        }
    }   # end of loop over $j (closing braces missing on the slide)
}       # end of loop over $i
Needleman-Wunsch.pl 
#!e:\perl\bin -w
use strict;
# usage statement
die "usage: $0 <sequence 1> <sequence 2>\n" unless @ARGV == 2;
# get sequences from command line
my ($seq1, $seq2) = @ARGV;
# scoring scheme
my $MATCH    =  1; # +1 for letters that match
my $MISMATCH = -1; # -1 for letters that mismatch
my $GAP      = -1; # -1 for any gap
Needleman-Wunsch-edu.pl 
The Score Matrix
----------------
Seq1(j)        1   2   3   4   5   6   7
Seq2       *   C   K   H   V   F   C   R
(i)  *     0  -1  -2  -3  -4  -5  -6  -7
1    C    -1   1   0  -1  -2  -3  -4  -5
2    K    -2   0   2   1   0  -1  -2  -3
3    K    -3  -1   1   1   0  -1  -2  -3
4    C    -4  -2   0   0   0  -1   0  -1
5    F    -5  -3  -1  -1  -1   1   0  -1
6    C    -6  -4  -2  -2  -2   0   2   1
7    K    -7  -5  -3  -3  -3  -1   1   1
8    C    -8  -6  -4  -4  -4  -2   0   0
9    V    -9  -7  -5  -5  -3  -3  -1  -1
The labels a, b and c in the original figure mark cells in rows 1 and 2; the rules they refer to are:
A: diagonal_score = matrix(i-1,j-1) + MATCH if substr(seq1,j-1,1) eq substr(seq2,i-1,1), otherwise + MISMATCH
B: up_score = matrix(i-1,j) + GAP
C: left_score = matrix(i,j-1) + GAP
Needleman-Wunsch-edu.pl
Needleman-Wunsch.pl 
# trace-back: follow the pointers from the bottom-right cell to the origin
my $align1 = "";
my $align2 = "";
my $j = length($seq1);
my $i = length($seq2);
while (1) {
    last if $matrix[$i][$j]{pointer} eq "none";
    if ($matrix[$i][$j]{pointer} eq "diagonal") {
        $align1 .= substr($seq1, $j-1, 1);
        $align2 .= substr($seq2, $i-1, 1);
        $i--; $j--;
    }
    elsif ($matrix[$i][$j]{pointer} eq "left") {
        $align1 .= substr($seq1, $j-1, 1);
        $align2 .= "-";
        $j--;
    }
    elsif ($matrix[$i][$j]{pointer} eq "up") {
        $align1 .= "-";
        $align2 .= substr($seq2, $i-1, 1);
        $i--;
    }
}
# the alignment was built back to front, so reverse it
$align1 = reverse $align1;
$align2 = reverse $align2;
print "$align1\n";
print "$align2\n";
Needleman-Wunsch-edu.pl 
Seq1:CKHVFCRVCI 
Seq2:CKKCFC-KCV 
++--++--+- score = 0
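For readers who want the three stages (initialization, fill, trace-back) in one place, here is a compact Python sketch of the same procedure. It is a re-expression for illustration, not the course's Needleman-Wunsch.pl; the function name is an assumption, and the scoring (+1 match, -1 mismatch, -1 gap) and the tie-breaking order (diagonal, then up, then left) follow the Perl fragments.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned seq1, aligned seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    pointer = [["none"] * cols for _ in range(rows)]

    # initialization: first row and column hold increasing gap penalties
    for j in range(1, cols):
        score[0][j], pointer[0][j] = gap * j, "left"
    for i in range(1, rows):
        score[i][0], pointer[i][0] = gap * i, "up"

    # fill: each cell takes the best of its three neighbours
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + s, "diagonal"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # max() keeps the first maximal item, so ties go diagonal > up > left
            score[i][j], pointer[i][j] = max(candidates, key=lambda c: c[0])

    # trace-back from the bottom-right corner to the origin
    out1, out2 = [], []
    i, j = len(seq2), len(seq1)
    while pointer[i][j] != "none":
        if pointer[i][j] == "diagonal":
            out1.append(seq1[j - 1]); out2.append(seq2[i - 1])
            i -= 1; j -= 1
        elif pointer[i][j] == "left":
            out1.append(seq1[j - 1]); out2.append("-")
            j -= 1
        else:  # "up"
            out1.append("-"); out2.append(seq2[i - 1])
            i -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


total, row1, row2 = needleman_wunsch("CKHVFCRVCI", "CKKCFCKCV")
print(row1)
print(row2)
print("score =", total)
```

Swapping the fixed match and mismatch values for a lookup in a substitution matrix is the change suggested for the practical session.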
• Practicum: use similarity function in 
initialization step -> scoring tables 
• Time Complexity 
• Use random proteins to generate 
histogram of scores from aligned 
random sequences
Time complexity with needleman-wunsch.pl 
Sequence Length (aa)   Execution Time (s)
10                     0
25                     0
50                     0
100                    1
500                    5
1000                   19
2500                   559
5000                   Memory could not be written
• -edu version 
• Monte-carlo version
Average around -64 ! 
-80 
-78 
-76 
-74 
-72 ** 
-70 ******* 
-68 *************** 
-66 ************************* 
-64 ************************************************************ 
-60 *********************** 
-58 *************** 
-56 ******** 
-54 **** 
-52 * 
-50 
-48 
-46 
-44 
-42 
-40 
-38
If the sequences are similar, the path 
of the best alignment should be very 
close to the main diagonal. 
Therefore, we may not need to fill the 
entire matrix, rather, we fill a narrow 
band of entries around the main 
diagonal. 
An algorithm that fills in a band of 
width 2k+1 around the main 
diagonal.
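A band of width 2k+1 can be sketched as follows in Python. This is an illustrative assumption of how such a banded fill might look, not code from the course: cells with |i - j| > k are never computed. For readability the sketch still allocates the full matrix, so it saves time but not memory.

```python
NEG_INF = float("-inf")


def banded_global_score(seq1, seq2, k, match=1, mismatch=-1, gap=-1):
    """Global alignment score restricted to cells with |i - j| <= k."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    score = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0
    for i in range(m + 1):
        # only the columns inside the band are visited in each row
        for j in range(max(0, i - k), min(n, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                s = match if seq1[j - 1] == seq2[i - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + s)   # diagonal
            if i > 0:
                best = max(best, score[i - 1][j] + gap)     # up
            if j > 0:
                best = max(best, score[i][j - 1] + gap)     # left
            score[i][j] = best
    return score[m][n]


print(banded_global_score("CKHVFCRVCI", "CKKCFCKCV", k=2))
```

The result equals the full-matrix score only when an optimal path stays inside the band; for dissimilar sequences a larger k is needed.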
Smith-Waterman.pl 
• Three changes 
– The edges of the matrix are initialized to 0 instead 
of increasing gap penalties 
– The maximum score is never less than 0, and no 
pointer is recorded unless the score is greater 
than 0 
– The trace-back starts from the highest score in 
the matrix (rather than at the end of the matrix) 
and ends at a score of 0 (rather than the start of 
the matrix) 
• Demonstration
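The three changes can be seen in a short Python sketch of local alignment. As with the other examples, this is an illustration with an assumed function name and not the course's Smith-Waterman.pl: edges start at 0, scores are floored at 0 with no pointer stored for a 0, and the trace-back runs from the highest-scoring cell until a cell without a pointer.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Local alignment; returns (score, aligned part of seq1, aligned part of seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    # change 1: the edges are initialized to 0
    score = [[0] * cols for _ in range(rows)]
    pointer = [["none"] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            candidates = [
                (0, "none"),                              # change 2: never below 0
                (score[i - 1][j - 1] + s, "diagonal"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            score[i][j], pointer[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j

    # change 3: trace back from the highest score until a cell with no pointer
    out1, out2 = [], []
    i, j = best_i, best_j
    while pointer[i][j] != "none":
        if pointer[i][j] == "diagonal":
            out1.append(seq1[j - 1]); out2.append(seq2[i - 1])
            i -= 1; j -= 1
        elif pointer[i][j] == "left":
            out1.append(seq1[j - 1]); out2.append("-")
            j -= 1
        else:  # "up"
            out1.append("-"); out2.append(seq2[i - 1])
            i -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


print(smith_waterman("CKHVFCRVCI", "CKKCFCKCV"))
```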
The best alignment: 
The one with the maximum total score 
Multiple Alignment: n>2
2 to 3: hyperlattice
On its top-left side, the cube is 
"covered" by the polyhedron. The 
edges 1, 2, 3, 6 and 7 are coming 
from the inside, and edges 4 and 5 
can be ignored (and are therefore 
not labeled in the figure).
Computational Complexity of MA by standard Dynamic Programming 
• Each node in the k-dimensional hyperlattice is 
visited once, and therefore the running time 
must be proportional to the number of nodes in 
the lattice. 
– This number is the product of the lengths of the 
sequences. 
– eg. the 3-dimensional lattice as visualized.
• The memory space requirement is even worse. 
To trace back the alignment, we need to store the 
whole lattice, a data structure the size of a 
multidimensional skyscraper. 
– In fact, space is the No.1 problem here, bogging down 
multiple alignment methods that try to achieve 
optimality. 
– Furthermore, incorporating a realistic gap model, we 
will further increase our demands on space and running 
time
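How quickly the lattice grows can be checked with a few lines of Python. The function name `lattice_size` is an assumption for this sketch; it takes the number of nodes to be the product of the sequence lengths, as stated above.

```python
from math import prod


def lattice_size(lengths):
    """Number of lattice nodes, taken as the product of the sequence lengths."""
    return prod(lengths)


print(lattice_size([300, 300]))             # two sequences: a flat matrix
print(lattice_size([300, 300, 300]))        # three sequences: a cube
print(f"{lattice_size([2000] * 8):.2e}")    # eight sequences of length 2000
```

Each added sequence multiplies both the running time and the storage by its length, which is why exact methods are limited to a handful of short sequences.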
Size/Time limits…
• The most practical and widely used
method in multiple sequence alignment
is the hierarchical extension of
pairwise alignment methods.
• The principle is that a multiple alignment
is achieved by successive application
of pairwise methods.
– First do all pairwise alignments (not just one 
sequence with all others) 
– Then combine pairwise alignments to generate 
overall alignment 
Multiple Alignment Method
• The steps are summarized as follows: 
– Compare all sequences pairwise. 
– Perform cluster analysis on the pairwise data to 
generate a hierarchy for alignment. This may be in 
the form of a binary tree or a simple ordering 
– Build the multiple alignment by first aligning the 
most similar pair of sequences, then the next most 
similar pair and so on. Once an alignment of two 
sequences has been made, then this is fixed. 
Thus for a set of sequences A, B, C, D having 
aligned A with C and B with D the alignment of A, 
B, C, D is obtained by comparing the alignments 
of A and C with that of B and D using averaged 
scores at each aligned position. 
Multiple Alignment Method
Multiple Sequence Alignment programs 
• Automatic multiple alignment
– extend dynamic programming (MSA - Lipman)
• limit: computing power: length and number of sequences
(e.g. 2000^8)
– progressive alignment (Feng & Doolittle) 
• use “guide tree” (PileUp, ClustalW etc) 
• Dedicated alignment editing program 
– Boxshade 
– SeaView 
– SeqPup (Java) 
• Combination (Biology – Computation)
• ClustalW is a general purpose multiple 
alignment program for DNA or proteins. 
• ClustalW is produced by Julie D. Thompson, 
Toby Gibson of European Molecular Biology 
Laboratory, Germany and Desmond Higgins 
of European Bioinformatics Institute, 
Cambridge, UK.
• Algorithmic: improves the sensitivity of progressive
multiple sequence alignment through
sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic
Acids Research, 22:4673-4680.
ClustalW
Running ClustalW 
****** MULTIPLE ALIGNMENT MENU ****** 
1. Do complete multiple alignment now (Slow/Accurate) 
2. Produce guide tree file only 
3. Do alignment using old guide tree file 
4. Toggle Slow/Fast pairwise alignments = SLOW 
5. Pairwise alignment parameters 
6. Multiple alignment parameters 
7. Reset gaps between alignments? = OFF 
8. Toggle screen display = ON 
9. Output format options 
S. Execute a system command 
H. HELP 
or press [RETURN] to go back to main menu 
Your choice:
PileUp 
• Before you run PILEUP, it is necessary to study 
the sequences that will be aligned. 
• PILEUP is very sensitive to gaps, so if a set of 
sequences are of different lengths, gaps will be 
added to the ends of all shorter sequences to 
make them equal to the longest one in the set. 
• If you try to align five 300-nucleotide ESTs with a
single 20,000 nucleotide cosmid, you are adding 
5 X 19,700 gaps to the alignment - and PILEUP 
will crash!
Formatting Multiple Alignments 
• The final product of a PILEUP run is a set of aligned 
sequences, which are stored in a Multiple 
Sequence File (called .msf by GCG). 
This msf file is a text file that can be formatted with 
a text editor, but GCG has some dedicated tools for 
improving the looks of msf files for easier 
interpretation and for publication. 
• Consensus sequences can be calculated and the 
relationship of each character of each sequence to 
the consensus can be highlighted using the 
program PRETTY
Formatting Multiple Alignments 
• Shading of regions of high homology can be created using 
the programs BOXSHADE and PRETTYBOX , but that 
goes beyond the scope of this tutorial. (Boxshade: 
http://www.ch.embnet.org/software/BOX_form.html) 
• In addition to these programs that run on the Alpha, the 
output of PILEUP (or CLUSTAL) can be moved by FTP 
from your RCR account to a local Mac or PC. 
• Since this output is a plain text file, it can be edited with 
any word processing program, or imported into any 
drawing program to add boldface text, underlining, 
shading, boxes, arrows, etc
http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html
An example of Multiple Alignment … immunoglobulin 
VTISCTGSSSNIGAG-NHVKWYQQLPG 
VTISCTGTSSNIGS--ITVNWYQQLPG 
LRLSCSSSGFIFSS--YAMYWVRQAPG 
LSLTCTVSGTSFDD--YYSTWVRQPPG 
PEVTCVVVDVSHEDPQVKFNWYVDG-- 
ATLVCLISDFYPGA--VTVAWKADS-- 
AALGCLVKDYFPEP--VTVSWNSG--- 
VSLTCLVKGFYPSD--IAVEWWSNG--
An example of Multiple Alignment … immunoglobulin 
• Their alignment highlights conserved 
residues (one of the cysteines forming the 
disulphide bridges, and the tryptophan are 
notable) 
• conserved regions (in particular, "Q.PG" at 
the end of the first 4 sequences), and more 
sophisticated patterns, like the dominance of 
hydrophobic residues at fragment positions 1 
and 3. 
• The alternating hydrophobicity pattern is 
typical for the surface beta-strand at the 
beginning of each fragment. Indeed, multiple 
alignments are helpful for protein structure 
prediction.
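The observations above can be checked mechanically. The following Python sketch (an illustration only; the helper name `conserved_columns` is an assumption, and Python is used instead of the Perl found elsewhere in the slides) lists the columns that are identical in all eight fragments and tests which fragments end in the "Q.PG" motif.

```python
import re

# The eight immunoglobulin fragments from the alignment slide.
fragments = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]

def conserved_columns(alignment):
    """Return (1-based position, residue) for columns identical in all sequences."""
    hits = []
    for position, column in enumerate(zip(*alignment), start=1):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append((position, column[0]))
    return hits

# Fully conserved columns: the cysteine and the tryptophan.
print(conserved_columns(fragments))  # [(5, 'C'), (21, 'W')]

# "Q.PG" at the end: '.' matches any single residue.
ends_with_motif = [bool(re.search(r"Q.PG$", s)) for s in fragments]
print(ends_with_motif)
```

Only the first four fragments match the motif, in line with the slide text.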
A Practical Approach: Interpretation 
• Provided the alignment is accurate, 
the following may be inferred 
about the secondary structure from 
a multiple sequence alignment: 
– The position of insertions and 
deletions (indels) suggests 
regions where surface loops exist. 
– Conserved glycine or proline 
suggests a beta-turn.
A Practical Approach: Interpretation 
– Residues with hydrophobic properties 
conserved at i, i+2, i+4, separated by 
unconserved or hydrophilic residues, 
suggest a surface beta-strand. 
– A short run of hydrophobic amino acids 
(4 residues) suggests a buried beta-strand. 
– Pairs of conserved hydrophobic amino 
acids separated by pairs of 
unconserved or hydrophilic residues 
suggest an alpha-helix with one face 
packing in the protein core. The same 
holds for an i, i+3, i+4, i+7 pattern of conserved 
hydrophobic residues.
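The spacing rules above can be turned into a simple pattern scan. The sketch below is a simplified Python illustration, not a structure predictor. The hydrophobic residue set, the names `hydrophobicity_profile` and `find_pattern`, and the use of a single sequence instead of conserved alignment columns are all assumptions made for brevity.

```python
# Assumed hydrophobic classification; published scales differ in detail.
HYDROPHOBIC = set("AVLIMFWYC")

def hydrophobicity_profile(sequence):
    """Map each residue to 'H' (hydrophobic) or 'P' (anything else)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)

def find_pattern(profile, offsets):
    """Return start indices where 'H' occurs exactly at the given offsets.

    Positions inside the window that are not listed in offsets must be 'P'.
    """
    span = max(offsets) + 1
    wanted = set(offsets)
    starts = []
    for i in range(len(profile) - span + 1):
        window = profile[i:i + span]
        if all((window[k] == "H") == (k in wanted) for k in range(span)):
            starts.append(i)
    return starts

BETA_SURFACE = (0, 2, 4)   # i, i+2, i+4
HELIX_FACE = (0, 3, 4, 7)  # i, i+3, i+4, i+7

profile = hydrophobicity_profile("VTISCTG")  # start of the first fragment
print(profile, find_pattern(profile, BETA_SURFACE))
```

With this residue set, the start of the first immunoglobulin fragment gives the alternating profile expected for a surface beta-strand. On a real alignment the profile would be built from conserved columns, not from one sequence.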
A Practical Approach: Which sequences to use ? 
• Take out noise (GAPS) 
• Extra information (structure - function) 
• Recursive selection 
– first align the most similar sequences to get an idea of the 
conserved regions 
– manually scan for these regions in more distant 
members, then include those sequences
Sequence Alignments 
Introduction 
Algorithms 
What ? 
Examples 
Properties 
Dynamic Programming for Pairwise Alignment 
Concept 
Example 
Needleman-Wunsch(.pl) 
Smith-Waterman(.pl) 
Multiple Alignment 
MSA 
Hierarchical Pairwise Alignment 
ClustalW, PileUp 
Formatting 
Interpretation 
Alternative Methods 
SIM 
Blast2 
Dali
L-align (2 sequences) 
SIM (www.expasy.ch) 
LALNVIEW is available for UNIX, Mac 
and PC on the ExPASy anonymous 
FTP server. 
very nice TWEAKING tool (70% criteria)
SIM (figure slides; labels: Length, P-value)
How can I use NCBI 
to compare two 
sequences? 
Answer: 
Use the 
“BLAST 2 Sequences” 
program
Practical guide to pairwise alignment: 
the “BLAST 2 sequences” website 
• Go to http://www.ncbi.nlm.nih.gov/BLAST 
• Choose BLAST 2 sequences 
• In the program, 
[1] choose blastp (protein search) or blastn (for DNA) 
[2] paste in your accession numbers 
(or use FASTA format) 
[3] select optional parameters, such as 
--BLOSUM62 matrix is the default for proteins; 
try PAM250 for distantly related proteins 
--gap creation and extension penalties 
[4] click “align”
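To see how the gap creation and extension penalties interact, the following Python sketch computes an affine gap cost. It assumes the convention that a gap of length k costs creation + k × extension (some programs charge the first gapped position differently). The function name and the numeric values are illustrative assumptions, not settings read from the BLAST 2 sequences page.

```python
def affine_gap_cost(length, creation, extension):
    """Cost of one gap of the given length under an affine penalty.

    Assumed convention: cost = creation + extension * length.
    A gap of length 0 costs nothing.
    """
    if length <= 0:
        return 0
    return creation + extension * length

# Illustrative values only.
creation, extension = 11, 1

one_long_gap = affine_gap_cost(3, creation, extension)
three_short_gaps = 3 * affine_gap_cost(1, creation, extension)
print(one_long_gap, three_short_gaps)  # 14 36
```

Because opening a gap is expensive compared with extending it, a single gap of three positions is much cheaper than three separate single-position gaps. Raising the creation penalty therefore favours alignments with fewer, longer gaps.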
Question #2: 
How can I use NCBI 
to compare a 
sequence to an 
entire database? 
BLAST!
• An introduction to Basic Concepts in 
Computer Science for Life Scientists 
• Dotplot patterns: A Literal Look at 
Pattern Languages
Practicum 3 
• CpG Islands 
– Download 1000 (random) promoters (3000 bp each) from ENSEMBL (hint: 
use BioMart) 
– How many times would you expect to observe CG if all nucleotides 
were equiprobable? 
– Count the number of times CG is observed for these 1000 genes and 
make a histogram of these counts. 
– Are any other dinucleotides over- or underrepresented? 
– CG dinucleotides are often methylated. In order to study methylation 
patterns, bisulfite treatment of DNA is used. In this simplified model, bisulfite changes every C 
which is not followed by G into T. Generate computationally the 
bisulfite-treated version of the DNA (hint: while (s/C([^G])/T$1/g) {};) 
– How would you find primers that discriminate between methylated and 
unmethylated DNA? Given that the genome is 3 × 10^9 bp, how long do 
you need to make a primer to avoid mispriming?
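One possible starting point for these exercises is sketched below in Python (the slides' own hint is in Perl). All function names are assumptions made for this example, and the primer-length estimate is a crude lower bound that ignores both strands, base composition and mismatch tolerance.

```python
import re
from collections import Counter

def dinucleotide_counts(seq):
    """Count all overlapping dinucleotides in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def expected_dinucleotide_count(length):
    """Expected count of one given dinucleotide if the four bases are
    equiprobable and independent: (length - 1) positions, each with p = 1/16."""
    return (length - 1) / 16

def bisulfite_convert(seq):
    """Turn every C that is not followed by G into T (simplified model).

    The negative lookahead does not consume the next base, so runs such as
    'CC' are handled in one pass; a C at the very end is also converted.
    """
    return re.sub(r"C(?!G)", "T", seq.upper())

def min_unique_length(genome_size):
    """Smallest n such that 4**n exceeds the genome size."""
    n = 1
    while 4 ** n <= genome_size:
        n += 1
    return n

print(expected_dinucleotide_count(3000))
print(dinucleotide_counts("ACGCGT")["CG"])
print(bisulfite_convert("ACGTCCGA"))
print(min_unique_length(3 * 10 ** 9))
```

The Perl hint uses `[^G]`, which consumes the base after the C and therefore needs the surrounding `while` loop to catch adjacent Cs. It also leaves a C in the last position untouched, unlike the lookahead version here. A histogram over the 1000 promoters can be built by collecting the `"CG"` count of each sequence.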
Weblems 
W4.1: Align the amino acid sequence of acetylcholine 
receptor from human, rat, mouse, dog with 
ClustalW 
T-Coffee 
Dali 
MSA 
W4.2: Use BoxShade to create a Word file indicating 
the different conserved residues in colours 
W4.3: Perform a local alignment on the same 
sequences using SIM, Lalign and Blast2 
W4.4: Do the different methods give different results? 
What default settings do they use? 
W4.5: How would you identify critical residues for 
catalytic activity?

FBWScheduleDotPlots

  • 1.
  • 2. FBW 21-10-2014 Wim Van Criekinge
  • 3. Wel les op 4 november en GEEN les op 18 november
  • 4. Rat versus mouse RBP Rat versus bacterial lipocalin
  • 5. – Henikoff and Henikoff have compared the BLOSUM matrices to PAM by evaluating how effectively the matrices can detect known members of a protein family from a database when searching with the ungapped local alignment program BLAST. They conclude that overall the BLOSUM 62 matrix is the most effective. • However, all the substitution matrices investigated perform better than BLOSUM 62 for a proportion of the families. This suggests that no single matrix is the complete answer for all sequence comparisons. • It is probably best to compliment the BLOSUM 62 matrix with comparisons using 250 PAMS, and Overington structurally derived matrices. – It seems likely that as more protein three dimensional structures are determined, substitution tables derived from structure comparison will give the most reliable data. Overview
  • 6. Dotplots • What is it ? – Graphical representation using two orthogonal axes and “dots” for regions of similarity. – In a bioinformatics context two sequence are used on the axes and dots are plotted when a given treshold is met in a given window. • Dot-plotting is the best way to see all of the structures in common between two sequences or to visualize all of the repeated or inverted repeated structures in one sequence
  • 7. Visual Alignments (Dot Plots) • Matrix – Rows: Characters in one sequence – Columns: Characters in second sequence • Filling – Loop through each row; if character in row, col match, fill in the cell – Continue until all cells have been examined
  • 8. Dotplot-simulator.pl print " $seq1n"; for(my $teller=0;$teller<=$seq2_length;$teller++){ print substr($seq2,$teller,1); $w2=substr($seq2,$teller,$window); for(my $teller2=0;$teller2<=$seq_length;$teller2++){ $w1=substr($seq1,$teller2,$window); if($w1 eq $w2){print "*";}else{print " ";} } print"n"; }
  • 9. Overview Window size = 1, stringency 100%
  • 10. Noise in Dot Plots • Nucleic Acids (DNA, RNA) – 1 out of 4 bases matches at random • Stringency – Window size is considered – Percentage of bases matching in the window is set as threshold
  • 11. Reduction of Dot Plot Noise Self alignment of ACCTGAGCTCACCTGAGTTA
  • 12. Dotplot-simulator.pl Example: ZK822 Genomic and cDNA Gene prediction: How many exons ? Confirm donor and aceptor sites ? Remember to check the reverse complement !
  • 13. Chromosome Y self comparison
  • 14. • Regions of similarity appear as diagonal runs of dots • Reverse diagonals (perpendicular to diagonal) indicate inversions • Reverse diagonals crossing diagonals (Xs) indicate palindromes • A gap is introduced by each vertical or horizontal skip Overview
  • 15. • Window size changes with goal of analysis – size of average exon – size of average protein structural element – size of gene promoter – size of enzyme active site Overview
  • 16. Rules of thumb  Don't get too many points, about 3- 5 times the length of the sequence is about right (1-2%)  Window size about 20 for distant proteins 12 for nucleic acid  Check sequence vs. itself  Check sequence vs. sequence  Anticipate results (e.g. “in-house” sequence vs genomic, question) Overview
  • 17. Available Dot Plot Programs Dotlet (Java Applet) http://www.isrec.isb-sib. ch/java/dotlet/Dotlet. html
  • 18. Sequence Alignments Introduction Algorithms What ? Examples Properties Dynamic Programming for Pairwise Alignment Concept Example Needleman-Wunsch(.pl) Smith-Waterman(.pl) Multiple Alignment MSA Hierarchical Pairwise Alignent ClustalW, PileUp Formatting Interpretation Alternative Methods SIM Blast2 Dali
  • 19. Global and local alignment Pairwise sequence alignment can be global or local Global: the sequences are completely aligned (Needleman and Wunsch, 1970) Local: only the best sub-regions are aligned (Smith and Waterman, 1981). BLAST uses local alignment.
  • 20. Why we do multiple alignments? – In order to characterize protein families, identify shared regions of homology in a multiple sequence alignment; (this happens generally when a sequence search revealed homologies to several sequences) – Determination of the consensus sequence of several aligned sequences – Help prediction of the secondary and tertiary structures of new sequences; – Preliminary step in molecular evolution analysis using Phylogenetic methods for constructing phylogenetic trees – Garbage in, Garbage out – Chicken/egg
  • 21. Why we do multiple alignments? • To find conserved regions – Local multiple alignment reveals conserved regions – Conserved regions usually are key functional regions – These regions are prime targets for drug developments • To do phylogenetic analysis: – Same protein from different species – Optimal multiple alignment probably implies history – Discover irregularities, such as Cystic Fibrosis gene
  • 22.
    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGTSSNIGS--ITVNWYQQLPG
    LRLSCSSSGFIFSS--YAMYWVRQAPG
    LSLTCTVSGTSFDD--YYSTWVRQPPG
    PEVTCVVVDVSHEDPQVKFNWYVDG--
    ATLVCLISDFYPGA--VTVAWKADS--
    AALGCLVKDYFPEP--VTVSWNSG---
    VSLTCLVKGFYPSD--IAVEWWSNG--
  • 24. Algorithms and Programs • Algorithm: a method or a process followed to solve a problem. – A recipe. • An algorithm takes the input to a problem (function) and transforms it to the output. – A mapping of input to output. • A problem can have many algorithms.
  • 25. Bubble Sort Algorithm One of the simplest sorting algorithms proceeds by walking down the list, comparing adjacent elements, and swapping them if they are in the wrong order. The process is continued until the list is sorted. More formally:
    1. Initialize the size of the list to be sorted to be the actual size of the list.
    2. Loop through the list until no element needs to be exchanged with another to reach its correct position.
      2.1 Loop (i) from 0 to size of the list to be sorted - 2.
        2.1.1 Compare the ith and (i + 1)st elements in the unsorted list.
        2.1.2 Swap the ith and (i + 1)st elements if not in order (ascending or descending as desired).
      2.2 Decrease the size of the list to be sorted by 1.
    Each pass "bubbles" the largest element in the unsorted part of the list to its correct location.
    Example list A: 13 7 43 5 3 19 2 23 29
  • 26. Bubble Sort Implementation Here is an ascending-order implementation of the bubblesort algorithm for integer arrays:
    void BubbleSort(int List[], int Size) {
        int tempInt;                                      // temp variable for swapping list elems
        for (int Stop = Size - 1; Stop > 0; Stop--) {
            for (int Check = 0; Check < Stop; Check++) {  // make a pass
                if (List[Check] > List[Check + 1]) {      // compare elems
                    tempInt = List[Check];                // swap if in the
                    List[Check] = List[Check + 1];        // wrong order
                    List[Check + 1] = tempInt;
                }
            }
        }
    }
    Bubblesort compares and swaps adjacent elements; simple but not very efficient. Efficiency note: the outer loop could be modified to exit if the list is already sorted.
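The efficiency note can be illustrated with a short Python sketch. It is one possible way to add the early exit, not the course's own code; the function name `bubble_sort` and the `swapped` flag are assumptions for this example.

```python
def bubble_sort(items):
    """Return an ascending copy of `items`, stopping early when sorted."""
    data = list(items)          # work on a copy, leave the input untouched
    stop = len(data) - 1        # last index of the unsorted part
    swapped = True
    while stop > 0 and swapped:
        swapped = False
        for check in range(stop):            # one pass over the unsorted part
            if data[check] > data[check + 1]:
                data[check], data[check + 1] = data[check + 1], data[check]
                swapped = True
        stop -= 1               # the largest element is now in place
    return data


if __name__ == "__main__":
    print(bubble_sort([13, 7, 43, 5, 3, 19, 2, 23, 29]))
```

If a pass makes no swap, the list is already sorted and the loop ends, so an already sorted input costs a single pass.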
  • 27. Ice cream • 6 egg yolks + 105 g S1 granulated sugar • Whisk for 1' to the “ruban” (ribbon) stage • Meanwhile, heat 500 ml whole milk with 105 g S1 sugar • Add vanilla and/or chocolate (cinnamon) • Slowly whisk the almost boiling milk into the ruban (off the heat) • Back on the heat: “Porter à la nappe” • Cool down • Churn (in an ice cream machine) • Add 500 ml cream 15" before it sets
  • 29. "Great algorithms are the poetry of computation"
  • 30. "Great algorithms are the poetry of computation" 1946: The Metropolis Algorithm for Monte Carlo. Through the use of random processes, this algorithm offers an efficient way to stumble toward answers to problems that are too complicated to solve exactly. 1947: Simplex Method for Linear Programming. An elegant solution to a common problem in planning and decision-making. 1950: Krylov Subspace Iteration Method. A technique for rapidly solving the linear equations that abound in scientific computation. 1951: The Decompositional Approach to Matrix Computations. A suite of techniques for numerical linear algebra. 1957: The Fortran Optimizing Compiler. Turns high-level code into efficient computer-readable code. 1959: QR Algorithm for Computing Eigenvalues. Another crucial matrix operation made swift and practical. 1962: Quicksort Algorithms for Sorting. For the efficient handling of large databases. 1965: Fast Fourier Transform. Perhaps the most ubiquitous algorithm in use today, it breaks down waveforms (like sound) into periodic components. 1977: Integer Relation Detection. A fast method for spotting simple equations satisfied by collections of seemingly unrelated numbers. 1987: Fast Multipole Method. A breakthrough in dealing with the complexity of n-body calculations, applied in problems ranging from celestial mechanics to protein folding. From Random Samples, Science page 799, February 4, 2000.
  • 31. Algorithm Properties • An algorithm possesses the following properties: – It must be correct. – It must be composed of a series of concrete steps. – There can be no ambiguity as to which step will be performed next. – It must be composed of a finite number of steps. – It must terminate. • A computer program is an instance, or concrete representation, for an algorithm in some programming language.
  • 32. Measuring Algorithm Efficiency • Types of complexity – Space complexity – Time complexity • Analysis of algorithms – The measuring of the complexity of an algorithm • Cannot compute actual time for an algorithm – We usually measure worst-case time
  • 33. Measuring Algorithm Efficiency Three algorithms for computing 1 + 2 + … + n for an integer n > 0
  • 34. Measuring Algorithm Efficiency The number of operations required by the algorithms
  • 35. Measuring Algorithm Efficiency The number of operations required by the algorithms as a function of n
  • 36. Big Oh Notation • To say "Algorithm A has a worst-case time requirement proportional to n" – We say A is O(n) – Read "Big Oh of n" • For the other two algorithms – Algorithm B is O(n^2) – Algorithm C is O(1) • O is derived from order (magnitude)
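The three algorithms themselves are shown only as figures on the slides, so the following Python sketch is an assumption about what they plausibly look like: one O(n) loop, one O(n^2) nested loop and one O(1) closed formula, all computing 1 + 2 + … + n. The names `sum_a`, `sum_b` and `sum_c` are hypothetical.

```python
def sum_a(n):
    """O(n): add the numbers one by one."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_b(n):
    """O(n^2): add 1 a total of 1 + 2 + ... + n times."""
    total = 0
    for i in range(1, n + 1):
        for _ in range(i):
            total += 1
    return total


def sum_c(n):
    """O(1): closed formula n(n + 1)/2."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, sum_a(n), sum_b(n), sum_c(n))
```

All three return the same value; only the number of operations grows differently with n.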
  • 38. Picturing Efficiency An O(n^2) algorithm.
  • 39. Picturing Efficiency Another O(n^2) algorithm.
  • 41. The best alignment: The one with the maximum total score
  • 42. • Exhaustive … – All combinations: • Algorithm – Dynamic programming (much faster) • Heuristics – Needleman-Wunsch for global alignments (Journal of Molecular Biology, 1970) – Later adapted by Smith-Waterman for local alignment
  • 43. • Score of an alignment: reward matches and penalize mismatches and spaces. – e.g., each column gets a (different) value for: • a match: +1 (both have the same character); • a mismatch: -1 (both have different characters); and • a space in a column: -2. – The total score of an alignment is the sum of the values assigned to its columns.
  • 44. A metric … Sequences: GACGGATTAG, GATCGGAATAG
    GA-CGGATTAG
    GATCGGAATAG
    +1 (a match), -1 (a mismatch), -2 (gap)
    9*1 + 1*(-1) + 1*(-2) = 6
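The column-by-column scoring can be checked with a small Python sketch. The function name `alignment_score` is an assumption for this example; the default values follow the +1 / -1 / -2 scheme used on the slide.

```python
def alignment_score(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum the column scores of two already aligned rows ('-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


if __name__ == "__main__":
    # 9 matches, 1 mismatch and 1 gap
    print(alignment_score("GA-CGGATTAG", "GATCGGAATAG"))
```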
  • 45. Dynamic programming Reduce the problem: a large problem becomes simpler to solve if we first know the solution to a smaller problem that is a subset of the larger problem. (Figure labels: P, P1, P2, P3.)
  • 46. Dynamic Programming • Finding optimal solution to search problem • Recursively computes solution • Fundamental principle is to produce optimal solutions to smaller pieces of the problem first and then glue them together • Efficient divide-and-conquer strategy because it uses a bottom-up approach and utilizes a look-up table instead of recomputing optimal solutions to sub-problems (Figure labels: P, P1, P2, P3.)
  • 47. Dynamic Programming What is the best way to get from A to C? Rules: three stops. Solution: try all routes and select the best, which requires combin(13,3) = 286 calculations.
  • 48. Dynamic Programming What is the best way to get from A to C if we know that B is on the optimal path?
  • 49. Dynamic Programming What is the best way to get from A to B? (Figure: routes numbered 1 to 6.)
  • 50. Dynamic Programming What is the best way to get from B to C? (Figure: routes numbered 1 to 6.)
  • 51. Dynamic Programming How many paths from A to C via B? 6 * 6 = 36
  • 52. Dynamic Programming Solve the subproblem A to B: 6 calculations
  • 53. Dynamic Programming Solve the subproblem B to C: 6 calculations
  • 54. Dynamic Programming If B is on the optimal path from A to C, then this optimal path = optimal path from A to B + optimal path from B to C. 12 calculations needed (not 36 or 286). (Figure labels: A, B, C, 5, 3.)
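The counting argument above can be sketched in Python. The route costs below are made-up numbers for illustration only (the slides do not give any); the point is that solving the two subproblems separately needs 6 + 6 evaluations, while enumerating all A-to-C routes via B needs 6 * 6.

```python
from itertools import product
from math import comb

# Hypothetical costs of the six routes A->B and the six routes B->C.
A_TO_B = [7, 5, 9, 6, 8, 10]
B_TO_C = [4, 6, 3, 9, 5, 7]


def best_by_enumeration(first_leg, second_leg):
    """Try every combination of routes: len(first_leg) * len(second_leg) sums."""
    return min(x + y for x, y in product(first_leg, second_leg))


def best_by_subproblems(first_leg, second_leg):
    """Solve each leg on its own: len(first_leg) + len(second_leg) look-ups."""
    return min(first_leg) + min(second_leg)


if __name__ == "__main__":
    print("choices of 3 stops out of 13:", comb(13, 3))
    print("enumeration :", best_by_enumeration(A_TO_B, B_TO_C))
    print("subproblems :", best_by_subproblems(A_TO_B, B_TO_C))
```

Both functions return the same optimum because the best route through B is the best A-to-B route followed by the best B-to-C route.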
  • 55. Find the best alignment between • a zinc-finger core sequence: – CKHVFCRVCI • and a sequence fragment from a viral polyprotein: – CKKCFCKCV
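  • 56. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 . . 1 .
    K | . 1 . . . . . . . .
    K | . 1 . . . . . . . .
    C | 1 . . . . 1 . . 1 .
    F | . . . . 1 . . . . .
    C | 1 . . . . 1 . . 1 .
    K | . 1 . . . . . . . .
    C | 1 . . . . 1 . . 1 .
    V | . . . 1 . . . 1 . .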
  • 58. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . . 0
    K | . 1 . . . . . . . 0
    C | 1 . . . . 1 . . 1 0
    F | . . . . 1 . . . . 0
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . . 0
    C | 1 . . . . 1 . . 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 59. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . . 0
    K | . 1 . . . . . . . 0
    C | 1 . . . . 1 . . 1 0
    F | . . . . 1 . . . . 0
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . . 0
    C | 2 . . . . 1 . . 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 60. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . 0 0
    K | . 1 . . . . . . 0 0
    C | 1 . . . . 1 . . 1 0
    F | . . . . 1 . . . 0 0
    C | 1 . . . . 1 . . 1 0
    K | . 1 . . . . . . 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 61. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 . 1 1 0
    K | . 1 . . . . . 1 0 0
    K | . 1 . . . . . 1 0 0
    C | 1 . . . . 1 . 1 1 0
    F | . . . . 1 . . 1 0 0
    C | 1 . . . . 1 . 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 62. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 1 1 1 1 0
    K | . 1 . . . . 1 1 0 0
    K | . 1 . . . . 1 1 0 0
    C | 1 . . . . 1 1 1 1 0
    F | . . . . 1 . 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 63. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . . 2 1 1 1 0
    K | . 1 . . . 1 1 1 0 0
    K | . 1 . . . 1 1 1 0 0
    C | 1 . . . . 2 1 1 1 0
    F | 2 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 64. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . . 2 2 1 1 1 0
    K | . 1 . . 2 1 1 1 0 0
    K | . 1 . . 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 2 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 65. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . . 3 2 2 1 1 1 0
    K | . 1 . 3 2 1 1 1 0 0
    K | 3 4 3 3 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 2 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 66. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 1 . 3 3 2 2 1 1 1 0
    K | 4 4 3 3 2 1 1 1 0 0
    K | 3 4 3 3 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 2 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 67. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 5 3 3 3 2 2 1 1 1 0
    K | 4 4 3 3 2 1 1 1 0 0
    K | 3 4 3 3 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 2 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 68. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 5 3 3 3 2 2 1 1 1 0
    K | 4 4 3 3 2 1 1 1 0 0
    K | 3 4 3 3 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 3 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0
  • 77. Dynamic Programming
        C K H V F C R V C I
      +--------------------
    C | 5 3 3 3 2 2 1 1 1 0
    K | 4 4 3 3 2 1 1 1 0 0
    K | 3 4 3 3 2 1 1 1 0 0
    C | 4 3 3 3 2 2 1 1 1 0
    F | 3 2 2 2 3 1 1 1 0 0
    C | 4 2 2 2 2 2 1 1 1 0
    K | 2 3 2 2 2 1 1 1 0 0
    C | 2 1 1 1 1 2 1 0 1 0
    V | 0 0 0 1 0 0 0 1 0 0

    C K H V F C R V C I
    C K K C F C - K C V

    C K H V F C R V C I
    C K K C F C K - C V

    C - K H V F C R V C I
    C K K C - F C - K C V

    C K H - V F C R V C I
    C K K C - F C - K C V
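The filled matrix can be reproduced with a short Python sketch. It assumes the fill rule that appears to be used on these slides: starting from the bottom-right corner, each cell is 1 for a match (0 otherwise) plus the largest value found in the row directly below and the column directly to the right, beyond the cell itself, with no gap penalty. The function name `fill_matrix` is an assumption for this example.

```python
def fill_matrix(seq_cols, seq_rows):
    """Fill the score matrix from the bottom-right corner, without gap penalty."""
    n, m = len(seq_rows), len(seq_cols)
    matrix = [[0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            match = 1 if seq_rows[i] == seq_cols[j] else 0
            best = 0
            if i + 1 < n and j + 1 < m:
                # best continuation: anywhere further right in the next row,
                # or anywhere further down in the next column
                row_below = max(matrix[i + 1][j + 1:])
                col_right = max(matrix[k][j + 1] for k in range(i + 1, n))
                best = max(row_below, col_right)
            matrix[i][j] = match + best
    return matrix


if __name__ == "__main__":
    cols, rows = "CKHVFCRVCI", "CKKCFCKCV"
    print("    " + " ".join(cols))
    for letter, values in zip(rows, fill_matrix(cols, rows)):
        print(letter, "|", " ".join(str(v) for v in values))
```

The top-left value is the number of matched positions in a best alignment; the alignments listed on the slide are the paths that realise it.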
  • 79. Dynamic Programming Extensions to the basic dynamic programming method use gap penalties – constant gap penalty for gap > 1 – gap penalty proportional to gap size • one penalty for starting a gap (gap opening penalty) • different (lower) penalty for adding to a gap (gap extension penalty) • for nucleic acids, can be used to mimic thermodynamics of helix formation – two kinds of gap opening penalties • one for gap closed by AT, different for GC
  • 80. • See the course notes for an example with gap penalties – find the errors ;-) • Available as a Perl program that we can experiment with
  • 82. Needleman-Wunsch.pl
    # initialization
    my @matrix;
    $matrix[0][0]{score}   = 0;
    $matrix[0][0]{pointer} = "none";
    for (my $j = 1; $j <= length($seq1); $j++) {
        $matrix[0][$j]{score}   = $GAP * $j;
        $matrix[0][$j]{pointer} = "left";
    }
    for (my $i = 1; $i <= length($seq2); $i++) {
        $matrix[$i][0]{score}   = $GAP * $i;
        $matrix[$i][0]{pointer} = "up";
    }
  • 83. Needleman-Wunsch-edu.pl
    The Score Matrix
    ----------------
         Seq1(j)      1   2   3   4   5   6   7
    Seq2          *   C   K   H   V   F   C   R
    (i)  *        0  -1  -2  -3  -4  -5  -6  -7
    1    C       -1   1   0  -1  -2  -3  -4  -5
    2    K       -2   0   2   1   0  -1  -2  -3
    3    K       -3  -1   1   1   0  -1  -2  -3
    4    C       -4  -2   0   0   0  -1   0  -1
    5    F       -5  -3  -1  -1  -1   1   0  -1
    6    C       -6  -4  -2  -2  -2   0   2   1
    7    K       -7  -5  -3  -3  -3  -1   1   1
    8    C       -8  -6  -4  -4  -4  -2   0   0
    9    V       -9  -7  -5  -5  -3  -3  -1  -1
  • 85. Needleman-Wunsch.pl
    # fill
    for (my $i = 1; $i <= length($seq2); $i++) {
        for (my $j = 1; $j <= length($seq1); $j++) {
            my ($diagonal_score, $left_score, $up_score);

            # calculate match score
            my $letter1 = substr($seq1, $j-1, 1);
            my $letter2 = substr($seq2, $i-1, 1);
            if ($letter1 eq $letter2) {
                $diagonal_score = $matrix[$i-1][$j-1]{score} + $MATCH;
            }
            else {
                $diagonal_score = $matrix[$i-1][$j-1]{score} + $MISMATCH;
            }

            # calculate gap scores
            $up_score   = $matrix[$i-1][$j]{score} + $GAP;
            $left_score = $matrix[$i][$j-1]{score} + $GAP;

            # choose best score
            if ($diagonal_score >= $up_score) {
                if ($diagonal_score >= $left_score) {
                    $matrix[$i][$j]{score}   = $diagonal_score;
                    $matrix[$i][$j]{pointer} = "diagonal";
                }
                else {
                    $matrix[$i][$j]{score}   = $left_score;
                    $matrix[$i][$j]{pointer} = "left";
                }
            }
            else {
                if ($up_score >= $left_score) {
                    $matrix[$i][$j]{score}   = $up_score;
                    $matrix[$i][$j]{pointer} = "up";
                }
                else {
                    $matrix[$i][$j]{score}   = $left_score;
                    $matrix[$i][$j]{pointer} = "left";
                }
            }
        }   # end of loop over j
    }       # end of loop over i
  • 86. Needleman-Wunsch.pl
    #!e:\perl\bin -w
    use strict;

    # usage statement
    die "usage: $0 <sequence 1> <sequence 2>\n" unless @ARGV == 2;

    # get sequences from command line
    my ($seq1, $seq2) = @ARGV;

    # scoring scheme
    my $MATCH    =  1; # +1 for letters that match
    my $MISMATCH = -1; # -1 for letters that mismatch
    my $GAP      = -1; # -1 for any gap
  • 87. Needleman-Wunsch-edu.pl
    The Score Matrix
    ----------------
         Seq1(j)      1   2   3   4   5   6   7
    Seq2          *   C   K   H   V   F   C   R
    (i)  *        0  -1  -2  -3  -4  -5  -6  -7
    1    C       -1   1a  0  -1  -2  -3  -4  -5
    2    K       -2   0c  2b  1   0  -1  -2  -3
    3    K       -3  -1   1   1   0  -1  -2  -3
    4    C       -4  -2   0   0   0  -1   0  -1
    5    F       -5  -3  -1  -1  -1   1   0  -1
    6    C       -6  -4  -2  -2  -2   0   2   1
    7    K       -7  -5  -3  -3  -3  -1   1   1
    8    C       -8  -6  -4  -4  -4  -2   0   0
    9    V       -9  -7  -5  -5  -3  -3  -1  -1
    A: diagonal_score = matrix(i-1,j-1) + MATCH if substr(seq1,j-1,1) eq substr(seq2,i-1,1), otherwise + MISMATCH
    B: up_score = matrix(i-1,j) + GAP
    C: left_score = matrix(i,j-1) + GAP
  • 90. Needleman-Wunsch.pl
    # trace-back
    my $align1 = "";
    my $align2 = "";

    # start at the last cell of the matrix
    my $j = length($seq1);
    my $i = length($seq2);

    while (1) {
        last if $matrix[$i][$j]{pointer} eq "none"; # ends at first cell of matrix

        if ($matrix[$i][$j]{pointer} eq "diagonal") {
            $align1 .= substr($seq1, $j-1, 1);
            $align2 .= substr($seq2, $i-1, 1);
            $i--;
            $j--;
        }
        elsif ($matrix[$i][$j]{pointer} eq "left") {
            $align1 .= substr($seq1, $j-1, 1);
            $align2 .= "-";
            $j--;
        }
        elsif ($matrix[$i][$j]{pointer} eq "up") {
            $align1 .= "-";
            $align2 .= substr($seq2, $i-1, 1);
            $i--;
        }
    }

    $align1 = reverse $align1;
    $align2 = reverse $align2;
    print "$align1\n";
    print "$align2\n";
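The three Perl fragments (initialization, fill, trace-back) can be put together in one compact Python sketch. This is a transliteration made for illustration, not the course script itself; the function name `needleman_wunsch` is an assumption, while the scoring defaults (+1, -1, -1) and the tie-breaking order (diagonal, then up, then left) follow the Perl listing.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (aligned seq1, aligned seq2, score matrix)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]
    pointer = [["none"] * cols for _ in range(rows)]

    # initialization: first row and column carry increasing gap penalties
    for j in range(1, cols):
        score[0][j], pointer[0][j] = gap * j, "left"
    for i in range(1, rows):
        score[i][0], pointer[i][0] = gap * i, "up"

    # fill: max() keeps the first maximal candidate, which reproduces the
    # nested if/else tie-breaking of the Perl code
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            candidates = [
                (score[i - 1][j - 1] + (match if same else mismatch), "diagonal"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            score[i][j], pointer[i][j] = max(candidates, key=lambda c: c[0])

    # trace-back from the last cell to the first
    align1, align2 = [], []
    i, j = len(seq2), len(seq1)
    while pointer[i][j] != "none":
        move = pointer[i][j]
        if move == "diagonal":
            align1.append(seq1[j - 1]); align2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif move == "left":
            align1.append(seq1[j - 1]); align2.append("-")
            j -= 1
        else:  # "up"
            align1.append("-"); align2.append(seq2[i - 1])
            i -= 1
    return "".join(reversed(align1)), "".join(reversed(align2)), score


if __name__ == "__main__":
    a1, a2, matrix = needleman_wunsch("CKHVFCR", "CKKCFCKCV")
    print(a1)
    print(a2)
    print("score:", matrix[-1][-1])
```

Run on CKHVFCR and CKKCFCKCV, the score matrix should agree with the one printed by the -edu version on the slides.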
  • 92. • Practicum: use similarity function in initialization step -> scoring tables • Time Complexity • Use random proteins to generate histogram of scores from aligned random sequences
  • 93. Time complexity with needleman-wunsch.pl
    Sequence Length (aa)   Execution Time (s)
    10                     0
    25                     0
    50                     0
    100                    1
    500                    5
    1000                   19
    2500                   559
    5000                   Memory could not be written
  • 94. • -edu version • Monte-carlo version
  • 95. Average around -64 !
    -80
    -78
    -76
    -74
    -72 **
    -70 *******
    -68 ***************
    -66 *************************
    -64 ************************************************************
    -60 ***********************
    -58 ***************
    -56 ********
    -54 ****
    -52 *
    -50
    -48
    -46
    -44
    -42
    -40
    -38
  • 96. If the sequences are similar, the path of the best alignment should be very close to the main diagonal. Therefore, we may not need to fill the entire matrix, rather, we fill a narrow band of entries around the main diagonal. An algorithm that fills in a band of width 2k+1 around the main diagonal.
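A banded fill can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `banded_global_score`, the half-width parameter `k` and the +1 / -1 / -2 scoring are choices made for this example, and for simplicity the full table is still allocated, so only time (not memory) is saved.

```python
NEG_INF = float("-inf")


def banded_global_score(seq1, seq2, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score using only cells with |i - j| <= k."""
    n, m = len(seq1), len(seq2)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the final cell")
    # cells outside the band stay at -infinity and are never chosen
    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                step = match if seq1[i - 1] == seq2[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + step)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)
            if j > 0:
                best = max(best, score[i][j - 1] + gap)
            score[i][j] = best
    return score[n][m]


if __name__ == "__main__":
    # a band of width 2k+1 = 5 is enough for these two similar sequences
    print(banded_global_score("GACGGATTAG", "GATCGGAATAG", k=2))
```

With `k` at least as large as the longer sequence the band covers the whole matrix, so the result equals the ordinary full fill; a narrow band gives the same score only when the best path stays close to the diagonal.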
  • 97. Smith-Waterman.pl • Three changes – The edges of the matrix are initialized to 0 instead of increasing gap penalties – The maximum score is never less than 0, and no pointer is recorded unless the score is greater than 0 – The trace-back starts from the highest score in the matrix (rather than at the end of the matrix) and ends at a score of 0 (rather than the start of the matrix) • Demonstration
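The three changes can be sketched in Python. This is an illustration, not the Smith-Waterman.pl script; the function name `smith_waterman` and the +1 / -1 / -1 scoring are assumptions for this example.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Local alignment; returns (aligned part of seq1, of seq2, best score)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    # change 1: the edges are initialized to 0
    score = [[0] * cols for _ in range(rows)]
    pointer = [["none"] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            candidates = [
                (0, "none"),  # change 2: never below 0, no pointer at 0
                (score[i - 1][j - 1] + (match if same else mismatch), "diagonal"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            score[i][j], pointer[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # change 3: trace back from the highest score until a cell without pointer
    align1, align2 = [], []
    i, j = best_pos
    while pointer[i][j] != "none":
        move = pointer[i][j]
        if move == "diagonal":
            align1.append(seq1[j - 1]); align2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif move == "left":
            align1.append(seq1[j - 1]); align2.append("-")
            j -= 1
        else:  # "up"
            align1.append("-"); align2.append(seq2[i - 1])
            i -= 1
    return "".join(reversed(align1)), "".join(reversed(align2)), best


if __name__ == "__main__":
    print(smith_waterman("WWWCKHVFWWW", "PPCKHVFPP"))
```

In the usage example only the shared core CKHVF is reported; the unrelated flanking residues are left out of the alignment.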
  • 99. The best alignment: The one with the maximum total score Multiple Alignment: n>2
  • 100. 2 to 3: hyperlattice
  • 101. On its top-left side, the cube is "covered" by the polyhedron. The edges 1, 2, 3, 6 and 7 are coming from the inside, and edges 4 and 5 can be ignored (and are therefore not labeled in the figure).
  • 102. Computational Complexity of MA by standard Dynamic Programming • Each node in the k-dimensional hyperlattice is visited once, and therefore the running time must be proportional to the number of nodes in the lattice. – This number is the product of the lengths of the sequences. – eg. the 3-dimensional lattice as visualized.
  • 103. • The memory space requirement is even worse. To trace back the alignment, we need to store the whole lattice, a data structure the size of a multidimensional skyscraper. – In fact, space is the No.1 problem here, bogging down multiple alignment methods that try to achieve optimality. – Furthermore, incorporating a realistic gap model, we will further increase our demands on space and running time
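To get a feeling for these numbers, the size of the lattice can be computed directly. The sketch below is only a back-of-the-envelope illustration; the function names and the assumed 4 bytes per node are hypothetical choices for this example.

```python
import math


def lattice_nodes(lengths):
    """Number of nodes: the product of the sequence lengths."""
    return math.prod(lengths)


def lattice_bytes(lengths, bytes_per_node=4):
    """Memory needed to store the whole lattice (assumed bytes per node)."""
    return lattice_nodes(lengths) * bytes_per_node


if __name__ == "__main__":
    for k in (2, 3, 4, 8):
        lengths = [2000] * k
        print(k, "sequences:", lattice_nodes(lengths), "nodes,",
              lattice_bytes(lengths) / 1e9, "GB")
```

Two sequences of 2000 residues give 4 million nodes; every extra sequence multiplies that number by another 2000, which is why the exact method stops being practical after a handful of sequences.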
  • 105. Multiple Alignment Method • The most practical and widely used method in multiple sequence alignment is the hierarchical extension of pairwise alignment methods. • The principle is that a multiple alignment is achieved by successive application of pairwise methods. – First do all pairwise alignments (not just one sequence with all others) – Then combine pairwise alignments to generate the overall alignment
  • 106. Multiple Alignment Method • The steps are summarized as follows: – Compare all sequences pairwise. – Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may be in the form of a binary tree or a simple ordering. – Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair and so on. Once an alignment of two sequences has been made, it is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
  • 109. Multiple Sequence Alignment programs • Automatic multiple alignment – extend dynamic programming (MSA - Lipman) • limit: computing power: length and number of sequences (e.g. 2000^8) – progressive alignment (Feng & Doolittle) • use “guide tree” (PileUp, ClustalW etc) • Dedicated alignment editing program – Boxshade – SeaView – SeqPup (Java) • Combination (Biology – Computation)
  • 110. ClustalW • ClustalW is a general purpose multiple alignment program for DNA or proteins. • ClustalW is produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK. • Algorithm: improves the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.
  • 111. Running ClustalW
    ****** MULTIPLE ALIGNMENT MENU ******
    1. Do complete multiple alignment now (Slow/Accurate)
    2. Produce guide tree file only
    3. Do alignment using old guide tree file
    4. Toggle Slow/Fast pairwise alignments = SLOW
    5. Pairwise alignment parameters
    6. Multiple alignment parameters
    7. Reset gaps between alignments? = OFF
    8. Toggle screen display = ON
    9. Output format options
    S. Execute a system command
    H. HELP
    or press [RETURN] to go back to main menu
    Your choice:
  • 113. PileUp • Before you run PILEUP, it is necessary to study the sequences that will be aligned. • PILEUP is very sensitive to gaps, so if a set of sequences are of different lengths, gaps will be added to the ends of all shorter sequences to make them equal to the longest one in the set. • If you try to align five 300 nucleotide ESTs with a single 20,000 nucleotide cosmid, you are adding 5 × 19,700 gaps to the alignment, and PILEUP will crash!
  • 114. Formatting Multiple Alignments • The final product of a PILEUP run is a set of aligned sequences, which are stored in a Multiple Sequence File (called .msf by GCG). This msf file is a text file that can be formatted with a text editor, but GCG has some dedicated tools for improving the looks of msf files for easier interpretation and for publication. • Consensus sequences can be calculated and the relationship of each character of each sequence to the consensus can be highlighted using the program PRETTY
  • 115. Formatting Multiple Alignments • Shading of regions of high homology can be created using the programs BOXSHADE and PRETTYBOX , but that goes beyond the scope of this tutorial. (Boxshade: http://www.ch.embnet.org/software/BOX_form.html) • In addition to these programs that run on the Alpha, the output of PILEUP (or CLUSTAL) can be moved by FTP from your RCR account to a local Mac or PC. • Since this output is a plain text file, it can be edited with any word processing program, or imported into any drawing program to add boldface text, underlining, shading, boxes, arrows, etc
  • 117. An example of Multiple Alignment … immunoglobulin
    VTISCTGSSSNIGAG-NHVKWYQQLPG
    VTISCTGTSSNIGS--ITVNWYQQLPG
    LRLSCSSSGFIFSS--YAMYWVRQAPG
    LSLTCTVSGTSFDD--YYSTWVRQPPG
    PEVTCVVVDVSHEDPQVKFNWYVDG--
    ATLVCLISDFYPGA--VTVAWKADS--
    AALGCLVKDYFPEP--VTVSWNSG---
    VSLTCLVKGFYPSD--IAVEWWSNG--
  • 118. An example of Multiple Alignment … immunoglobulin • Their alignment highlights conserved residues (one of the cysteines forming the disulphide bridges, and the tryptophan are notable) • conserved regions (in particular, "Q.PG" at the end of the first 4 sequences), and more sophisticated patterns, like the dominance of hydrophobic residues at fragment positions 1 and 3. • The alternating hydrophobicity pattern is typical for the surface beta-strand at the beginning of each fragment. Indeed, multiple alignments are helpful for protein structure prediction.
  • 119. A Practical Approach: Interpretation • Provided the alignment is accurate, the following may be inferred about the secondary structure from a multiple sequence alignment. • The position of insertions and deletions (INDELS) suggests regions where surface loops exist. • Conserved glycine or proline suggests a beta-turn.
  • 120. A Practical Approach: Interpretation • Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands. • A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand. • Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues.
  • 121. A Practical Approach: Which sequences to use ? • Take out noise (GAPS) • Extra information (structure - function) • Recursive selection – first most similar to have an idea about conserved regions – manual scan for these in more distant members then include these
  • 123. L-align (2 sequences) SIM (www.expasy.ch) LALNVIEW is available for UNIX, Mac and PC on the ExPASy anonymous FTP server. very nice TWEAKING tool (70% criteria)
  • 125. SIM
  • 127. How can I use NCBI to compare two sequences? Answer: Use the “BLAST 2 Sequences” program
  • 128. Practical guide to pairwise alignment: the “BLAST 2 sequences” website • Go to http://www.ncbi.nlm.nih.gov/BLAST • Choose BLAST 2 sequences • In the program, [1] choose blastp (protein search) or blastn (for DNA) [2] paste in your accession numbers (or use FASTA format) [3] select optional parameters, such as -- the scoring matrix (BLOSUM62 is the default for proteins; try PAM250 for distantly related proteins) -- gap creation and extension penalties [4] click “align”
  • 131. Question #2: How can I use NCBI to compare a sequence to an entire database? BLAST!
  • 134. • An introduction to Basic Concepts in Computer Science for Life Scientists • Dotplot patterns: A Literal Look at Pattern Languages
  • 135. Practicum 3 • CpG Islands – Download 1000 (random) promoters (3000 bp) from ENSEMBL (hint: use Biomart) – How many times would you expect to observe CG if all nucleotides were equiprobable? – Count the number of times CG is observed for these 1000 genes and make a histogram from these scores. – Are there any other dinucleotides over- or underrepresented? – CG repeats are often methylated. In order to study methylation patterns, bisulfite treatment of DNA is used. Bisulfite changes every C which is not followed by G into T. Generate computationally the bisulfite-treated version of DNA (hint: while (s/C([^G])/T$1/g) {};) – How would you find primers that discriminate between methylated and unmethylated DNA? Given that the genome is 3 × 10^9 bp, how long do you need to make a primer to avoid mispriming?
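A possible starting point for the counting and conversion steps, sketched in Python rather than Perl. Everything here is an assumption made for illustration: the function names are hypothetical, the bisulfite step follows the simplification of the Perl hint (a C is converted only when followed by a base other than G, so a C at the very end of the sequence is left alone), and the primer estimate simply asks for the shortest length L with 4^L larger than the genome size.

```python
import re


def expected_cg_count(length):
    """Expected CG count if all four nucleotides are equiprobable:
    (length - 1) dinucleotide positions, each CG with probability 1/16."""
    return (length - 1) / 16


def observed_cg_count(seq):
    """Number of CG dinucleotides in the sequence."""
    return seq.upper().count("CG")


def bisulfite_convert(seq):
    """Replace every C that is followed by a non-G base with T.
    The look-ahead does not consume the next base, so runs of C are
    handled in a single pass (the Perl hint needs a loop for that)."""
    return re.sub(r"C(?=[^G])", "T", seq.upper())


def min_primer_length(genome_size):
    """Shortest length L for which 4**L exceeds the genome size."""
    length = 1
    while 4 ** length <= genome_size:
        length += 1
    return length


if __name__ == "__main__":
    promoter = "ACGTCCAGCGCGTTACCG"   # made-up example sequence
    print(expected_cg_count(3000))
    print(observed_cg_count(promoter))
    print(bisulfite_convert(promoter))
    print(min_primer_length(3 * 10 ** 9))
```

The histogram over the 1000 promoters and the primer design itself are left as in the practicum; the sketch only covers the elementary building blocks.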
  • 136. Weblems W4.1: Align the amino acid sequences of the acetylcholine receptor from human, rat, mouse and dog with ClustalW, T-Coffee, Dali and MSA. W4.2: Use BoxShade to create a Word file indicating the different conserved residues in colours. W4.3: Perform a local alignment using SIM and Lalign on the same sequences, and with Blast2. W4.4: Do the different methods give different results? What default settings do they use? W4.5: How would you identify critical residues for catalytic activity?