In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
The following slides were prepared by POORNIMA M.S, a student of II M.Sc. Life Science, Bangalore University, Bangalore.
3. INTRODUCTION
• A sequence alignment is a way of arranging the primary sequences of
DNA, RNA or proteins to identify regions of similarity that may be a
consequence of FUNCTIONAL, STRUCTURAL or EVOLUTIONARY relationships
between the sequences.
• Sequence alignment is the procedure of comparing two sequences (pairwise
alignment) or more (multiple sequence alignment) by searching for a series of
individual characters/patterns that occur in the same order in the sequences.
• An alignment can be made between a known sequence and an unknown
sequence, or between two unknown sequences.
4. Contd.
• Here, the known sequence is called the “REFERENCE SEQUENCE” and the
unknown sequence is called the “QUERY SEQUENCE”.
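Searching for characters that occur in the same order in a reference and a query sequence is, in its simplest form, the longest common subsequence (LCS) problem. Below is a minimal Python sketch of that idea. It assumes plain strings and uses no substitution scores or gap penalties. The function name `longest_common_subsequence` and the example sequences are illustrative only.

```python
def longest_common_subsequence(reference: str, query: str) -> str:
    """Return one longest series of characters found in the same order
    (not necessarily adjacent) in both sequences."""
    n, m = len(reference), len(query)
    # table[i][j] holds the LCS length of reference[:i] and query[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == query[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])  # skip one character
    # Walk back from the bottom-right cell to recover the shared characters.
    shared = []
    i, j = n, m
    while i > 0 and j > 0:
        if reference[i - 1] == query[j - 1]:
            shared.append(reference[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(shared))


# Hypothetical reference and query sequences.
print(longest_common_subsequence("GATTACA", "GCATGCU"))  # a 4-character result
```

Real alignment methods go further than this sketch. They score each pair of characters and penalise gaps instead of simply counting shared characters.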
5. SEQUENCE ALIGNMENT
• Motivation: assess the similarity of
sequences and learn about their
evolutionary relationship.
• Sequence homology: two or more
sequences are homologous if they
evolved from a common ancestor.
• A good alignment is one with few
substitutions and indels (insertions/deletions).
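To make "few substitutions and indels" concrete, the sketch below classifies each column of a finished pairwise alignment, with gaps written as `-`. The function name `count_edits` and the example alignment are hypothetical. Each gap column is counted separately, so a 3-residue indel counts as 3.

```python
def count_edits(aligned_a: str, aligned_b: str) -> dict:
    """Classify each column of a pairwise alignment ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"matches": 0, "substitutions": 0, "indels": 0}
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # a column of two gaps carries no information
        if x == "-" or y == "-":
            counts["indels"] += 1  # an insertion in one sequence is a deletion in the other
        elif x == y:
            counts["matches"] += 1
        else:
            counts["substitutions"] += 1
    return counts


print(count_edits("GATT-ACA", "GCTTGAC-"))
# {'matches': 5, 'substitutions': 1, 'indels': 2}
```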
6. WHY IS ALIGNMENT NEEDED?
• We need to be able to compare
sequences for similarities &
differences.
• Often what we are looking for are
not exact matches, but similarities.
• Homology: similarity due to
descent from a common ancestor.
• We can sometimes infer structure/function
from sequence similarity.
7. PRINCIPLES
• Alignment can reveal HOMOLOGY between sequences.
• Similarity is a descriptive term for the degree of
match between two sequences.
• Sequence similarity doesn’t always imply a common function.
• Conserved function doesn’t always imply similarity at the
sequence level.
• Convergent evolution: sequences can be highly similar without being
homologous, because the similarity arose independently rather than from a common ancestor.
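Similarity is often reported as percent identity. Definitions differ between tools in which columns go into the denominator. The sketch below is one common convention: identical columns divided by all columns that are not double gaps. The function name `percent_identity` is hypothetical. As the principles above note, a high percentage is descriptive only and does not by itself prove homology.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues (one common convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both rows are gaps; every other column counts toward the length.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("GATT-ACA", "GCTTGAC-"))  # 5 identical of 8 columns -> 62.5
```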
8. GOALS OF SEQUENCE ALIGNMENT
• To identify conserved regions and differences.
• To see whether a substring in one sequence
aligns well with a substring in the other.
9. EXAMPLE ALIGNMENT: GLOBINS
• The figure at right shows the prototypical structure
of globins.
• The figure below shows part of an alignment of
8 globins.
11. Sequence alignment problems
• Number of sequences:
✓ 2 sequences: pairwise alignment
✓ >2 sequences: multiple sequence alignment (MSA)
• Which part to align?
✓ Whole sequence: global alignment
✓ Parts of sequence: local alignment
• How to compute similarity?
✓ Ways to compute substitution scores
✓ Ways to compute gap penalties.
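The two scoring questions above can be illustrated by scoring a finished alignment. This sketch makes several assumptions. It uses a simple match/mismatch substitution score in place of a real matrix such as BLOSUM62 or PAM. It uses an affine gap penalty in which a gap of length k costs `gap_open + (k - 1) * gap_extend`. Conventions for affine gaps vary between tools, and the parameter values and the name `score_alignment` are hypothetical.

```python
def score_alignment(aligned_a: str, aligned_b: str,
                    match: int = 1, mismatch: int = -1,
                    gap_open: int = 2, gap_extend: int = 1) -> int:
    """Score a pairwise alignment ('-' = gap) with match/mismatch scores
    and affine gap penalties (assumed convention: open + (k - 1) * extend)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in_a = gap_in_b = False  # currently inside a gap in row a / row b?
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # empty column contributes nothing
        if x == "-":
            score -= gap_extend if gap_in_a else gap_open
            gap_in_a, gap_in_b = True, False
        elif y == "-":
            score -= gap_extend if gap_in_b else gap_open
            gap_in_a, gap_in_b = False, True
        else:
            score += match if x == y else mismatch  # substitution score
            gap_in_a = gap_in_b = False
    return score


print(score_alignment("GA--CA", "GATTCA"))  # 1 + 1 - 2 - 1 + 1 + 1 = 1
```

The affine penalty is used here because opening a gap is usually penalised more than lengthening one. Under this scheme, one long indel scores better than several scattered short ones.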
12. TYPES OF SEQUENCE ALIGNMENT
There are 2 types of sequence alignment:
1. PAIRWISE ALIGNMENT: a method used
to find the best-matching piecewise (global
or local) alignment of 2 query sequences at
a time.
2. MULTIPLE SEQUENCE ALIGNMENT: an
extension of pairwise alignment that incorporates
3 or more sequences of similar length at a time.
13. PAIRWISE ALIGNMENT TYPES
There are 2 main types of pairwise alignment. They are:
1. Global alignment.
2. Local alignment.
14. GLOBAL ALIGNMENT
• The alignment is stretched over the entire sequence length
to include as many matching amino acids as possible, up to
and including the sequence ends. Vertical lines between the
sequences indicate the presence of identical amino acids.
• Implemented in tools such as EMBOSS Needle.
• Ex: Needleman-Wunsch algorithm
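A minimal sketch of the Needleman-Wunsch idea follows. It assumes a linear gap penalty and simple match/mismatch scores, and all parameter values are hypothetical. Tools such as EMBOSS Needle use substitution matrices and affine gaps instead. Because the traceback starts at the bottom-right cell and must reach the top-left cell, the alignment covers both sequences end to end, which is what makes it global.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner all the way to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1  # gap in b
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1  # gap in a
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = needleman_wunsch("GATTACA", "GCATGCU", gap=-1)
print(best)  # 0 with match +1, mismatch -1, gap -1
print(top)
print(bottom)
```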
15. LOCAL ALIGNMENT
• The alignment tends to stop at the ends of regions of
identity or strong similarity. A much higher priority is given to
finding these local regions than to extending the alignment to
include more neighboring amino acid pairs.
• Used in tools such as BLAST, which applies a fast heuristic to find local alignments.
• Ex: Smith-Waterman algorithm
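The Smith-Waterman recurrence differs from the global version in two ways. Every cell is floored at 0, so a poor prefix never drags down a good local region. The traceback also starts from the highest-scoring cell and stops at the first 0. Below is a minimal sketch with a linear gap penalty. The parameter values in the usage lines are hypothetical example settings.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0 for local alignment
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2)
print(score)  # 13: GTT-AC aligned with GTTGAC
print(top)
print(bottom)
```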
16. Pairwise Alignment in MSA
• The most practical and widely used approach to multiple sequence
alignment is the hierarchical extension of pairwise alignment
methods.
• The principle is that the multiple alignment is built by
successive application of pairwise methods.
Alignment helps to analyze sequence data: organize and visualize.
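Progressive multiple aligners such as ClustalW begin by aligning every pair of sequences and use those results to build a guide tree. The tree decides the order in which sequences are merged. The sketch below shows only that first pairwise step, in a deliberately simplified form. It computes a global (Needleman-Wunsch) score for every pair and ranks the pairs from most to least similar. A real guide tree (for example, neighbour-joining on pairwise distances) and the profile merging step are not shown. The names `nw_score` and `guide_order` and the example sequences are hypothetical.

```python
from itertools import combinations


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score only (linear gap), keeping just two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def guide_order(seqs: list) -> list:
    """Simplified stand-in for a guide tree: sequence index pairs, most similar first."""
    scores = {(i, j): nw_score(seqs[i], seqs[j])
              for i, j in combinations(range(len(seqs)), 2)}
    return sorted(scores, key=scores.get, reverse=True)


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
print(guide_order(seqs)[0])  # (0, 1): the two most similar sequences are merged first
```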
17. ISSUES IN SEQUENCE ALIGNMENT
• The sequences we are comparing probably differ in length.
• There may be only relatively small regions in the sequences that
match.
• Variable-length regions may have been inserted into or deleted from the
common ancestral sequence.
18. Advantages of Sequence alignment:
• Sequences of different lengths can be compared.
• Long sequences containing both coding and non-coding regions can be
compared.
• Proteins from different protein families can be compared
to find conserved domains.
• E-values can be determined to assess the statistical significance of a match.
• Minor differences between 2 sequences can be checked.
• The complete sequence is shown in the output, which makes it easy to understand.
• Functional orthologs can be detected.
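BLAST-style local alignment tools report E-values based on Karlin-Altschul statistics. Here S is the raw alignment score, m and n are the lengths of the query and the database (in residues), and K and λ are statistical parameters that depend on the scoring matrix and gap penalties:

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\quad\Longrightarrow\quad
E = m\,n\,2^{-S'}
```

S' is the normalised bit score. E is the number of alignments with at least this score expected by chance alone, so a smaller E-value indicates a more significant match. Raising the score by one bit halves E.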
20. “How can WORDS not matter
most, when our DNA is a sequence of
LETTERS?”