M.SRI ARAVIND
LAL
B841018
INTRODUCTION
• In computational biology, a dot plot is a graphical method
for comparing two biological sequences and identifying
regions of close similarity.
• It is a type of recurrence plot (a graph with horizontal and
vertical axes).
HISTORY
• Dot plots were introduced by Gibbs and McIntyre in 1970.
• These plots are two-dimensional matrices that have the
sequences of the proteins being compared along the vertical
and horizontal axes.
• Individual cells in the matrix are shaded black if the residues
are identical.
• Matching sequence segments therefore appear as runs of
diagonal lines across the matrix.
PRINCIPLE
• The principle used to generate the dot plot is:
• The top (X) and left (Y) axes of a rectangular array represent the two
sequences to be compared.
• Calculation:
• Matrix columns = residues of sequence 1
• Matrix rows = residues of sequence 2
EXAMPLE
• Seq 1: TWILIGHTZONE
• Seq 2: MIDNIGHTZONE
• Matrix = 12 × 12
• A dot is plotted at every coordinate where the two residues are identical.
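The example above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool; the function names `dot_matrix` and `render` are made up for this sketch. Columns follow sequence 1 and rows follow sequence 2, as on the PRINCIPLE slide.

```python
def dot_matrix(seq1, seq2):
    """Return a list of rows of 0/1 values.

    Rows follow seq2 (left axis), columns follow seq1 (top axis).
    A cell is 1 where the two residues are identical.
    """
    return [[1 if a == b else 0 for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Draw the matrix as text, with '.' for a dot and ' ' for no dot."""
    matrix = dot_matrix(seq1, seq2)
    lines = ["  " + " ".join(seq1)]
    for residue, row in zip(seq2, matrix):
        cells = " ".join("." if cell else " " for cell in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


seq1 = "TWILIGHTZONE"
seq2 = "MIDNIGHTZONE"
matrix = dot_matrix(seq1, seq2)
print(render(seq1, seq2))
```

The shared suffix IGHTZONE shows up as an unbroken run of eight dots on the main diagonal; the remaining dots come from the repeated letters I, T and N.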
DOT PLOT INTERPRETATION
• Seq1: ATGATAT
• Seq2: ATGATAT
SIMPLE PLOT TERMS
• Window: the size of the sequence block used for comparison.
example:
window = 1
• Stringency: the number of matches required within the window to score
positive.
example:
stringency = 1 (an exact match is required)
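Window and stringency can be sketched as follows. This is an illustrative version only: the function name `windowed_dot_matrix` is made up here, and placing the dot at the start of the window is an assumption (a tool may centre it instead).

```python
def windowed_dot_matrix(seq1, seq2, window=1, stringency=1):
    """Dot matrix with a sliding window.

    A cell (i, j) gets a 1 when at least `stringency` of the `window`
    residue pairs starting at seq2[i] and seq1[j] are identical.
    Rows follow seq2, columns follow seq1.
    """
    rows = len(seq2) - window + 1
    cols = len(seq1) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(
                1 for k in range(window) if seq1[j + k] == seq2[i + k]
            )
            row.append(1 if matches >= stringency else 0)
        matrix.append(row)
    return matrix


seq = "ATGATAT"
plain = windowed_dot_matrix(seq, seq, window=1, stringency=1)
strict = windowed_dot_matrix(seq, seq, window=3, stringency=3)
print(sum(map(sum, plain)), sum(map(sum, strict)))
```

With window = 1 every identical residue pair is marked, including the off-diagonal noise. Raising the window and stringency to 3 keeps only the main diagonal for this sequence, which is how these two settings filter out chance matches.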
DOTPLOT SCORING
• Dot plot: a matrix with one sequence across the top and the other down the side. Put a dot, or a 1,
wherever there is identity.
  G A T C T
G .
A   .
T     .   .
C       .
T     .   .
[Figure: dot matrix of the sequence GATACTGCGATACTGCGCA compared with itself (same sequence across the top and down the side); a 1 marks each cell where the residues are identical.]
[Figure: dot matrix of the sequence GATACTGCATCGTCACTCA compared with itself (same sequence across the top and down the side); a 1 marks each cell where the residues are identical.]
INTRAGENIC COMPARISON
• Rat Groucho Gene
INTERGENIC COMPARISON
• Rat and Drosophila Groucho Gene
INTERGENIC COMPARISON
• The nucleotide sequence contains three
domains.
• 50-350: strong conservation
• An indel places the comparison out of register
• 450-1300: slightly weaker conservation
• 1300-2400: strong conservation
ANALYSIS OF DOT PLOT MATRIX
• The principal diagonal shows identical sequence.
• Both global and local alignments are visible.
• Multiple diagonals indicate repeats.
• A reverse diagonal (perpendicular to the main diagonal) indicates an
INVERSION.
• A reverse diagonal crossing the main diagonal (X) indicates a
PALINDROME.
• A filled box indicates a low-complexity region.
DIRECT REPEAT
PALINDROMIC SEQUENCE
• A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same
whether read 5' to 3' on one strand or 5' to 3' on the complementary strand with
which it forms a double helix.
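The definition above amounts to saying that the sequence equals its own reverse complement. A minimal sketch for DNA follows; it assumes uppercase A, C, G, T only, and the function names are made up for illustration.

```python
# Watson-Crick base pairing for DNA (uppercase only; an assumption of this sketch).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def is_palindromic(dna):
    """True if the strand reads the same as its complementary strand."""
    return dna == reverse_complement(dna)


print(is_palindromic("GAATTC"))   # the EcoRI recognition site
print(is_palindromic("GATTACA"))
```

GAATTC is palindromic in this sense, while GATTACA is not.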
INVERTED REPEAT
• An inverted repeat is a sequence of nucleotides followed downstream by its
reverse complement.
• Inverted repeat: abcdeedcbafghijklmno
LOW-COMPLEXITY REGIONS
• Low-complexity regions in sequences appear as regions around the
diagonal that all obtain a high score. They are identified from
the redundancy of amino acids within a limited region.
DOT PLOT SOFTWARE
• The EMBOSS package provides the following dot plot programs:
  • Dotmatcher
  • Dotpath
  • Polydot
  • Dottup
(http://emboss.bioinformatics.nl/cgi-bin/emboss/dottup)
JOURNALS
APPLICATION
• Shows all possible alignments between two nucleic
acid or amino acid sequences.
• Helps to recognise large regions of similarity.
• An excellent approach for finding sequence transpositions.
• Helps to locate corresponding genes between two genomes.
• Helps to find non-sequential alignments.
LIMITATION
• For longer sequences, the memory required for the graphical
representation is very high, so long sequences cannot be compared.
(Only two sequences can be compared at a time.)
• Many insignificant matches make the plot noisy (many off-diagonal
dots appear).
• The time required to compare two sequences is proportional to the
product of the sequence lengths times the size of the search window
(not very quick),
i.e. efficiency is higher for short sequences
and lower for long sequences.
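The scaling described above can be made concrete with a rough estimate. This is only a sketch: `dotplot_cost` is a made-up helper, and it ignores the slight trimming of the matrix at the sequence ends when the window is larger than 1.

```python
def dotplot_cost(len1, len2, window=1):
    """Rough size of a dot plot computation.

    Returns (cells, comparisons): the number of matrix cells to store
    and the number of residue comparisons needed to fill them.
    """
    cells = len1 * len2
    comparisons = cells * window
    return cells, comparisons


print(dotplot_cost(12, 12))             # the TWILIGHTZONE example
print(dotplot_cost(1000, 2000, 10))     # two longer sequences, window of 10
```

Doubling the length of both sequences quadruples both numbers, which is why the method becomes slow and memory-hungry for long sequences.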
GAP PENALTY
• A gap penalty is a method of scoring alignments of two or more sequences.
• Inserting gaps into a sequence can let it match more positions than the same
sequence without gaps.
• Too many gaps can make an alignment meaningless.
• Types of gap penalty:
  Constant
  Linear
  Affine
SCORING SCHEMES
TYPES OF GAP PENALTY
Constant
This is the simplest type of gap penalty: a fixed negative score is given to
every gap, regardless of its length.
ATTGACCTGA
AT---CCTGA
Each match = 1; whole gap = -1
Score: 7 − 1 = 6
TYPES OF GAP PENALTY
Linear
The linear gap penalty takes into account the length (L) of each insertion/deletion
in the gap.
ATTGACCTGA
AT---CCTGA
Each match = 1; each gap position = -1
The score here is (7 − 3 = 4).
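The constant and linear scores for this example can be checked with a short sketch. The helper names are made up for illustration, the three-residue gap is written as `---`, and mismatches are assumed to score 0 because the slides give no mismatch score.

```python
def gap_lengths(aligned, gap_char="-"):
    """Return the length of each run of gap characters in an aligned sequence."""
    lengths, run = [], 0
    for ch in aligned:
        if ch == gap_char:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


def score(top, bottom, gap_penalty, match=1):
    """Matches minus gap penalties (mismatches score 0 in this sketch)."""
    matches = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    gaps = gap_lengths(top) + gap_lengths(bottom)
    return matches * match - sum(gap_penalty(length) for length in gaps)


def constant_penalty(length):
    return 1              # one fixed penalty per gap, whatever its length


def linear_penalty(length):
    return 1 * length     # one penalty per gap position


top, bottom = "ATTGACCTGA", "AT---CCTGA"
print(score(top, bottom, constant_penalty))  # 7 - 1 = 6
print(score(top, bottom, linear_penalty))    # 7 - 3 = 4
```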
TYPES OF GAP PENALTY
Affine
• The most widely used gap penalty; it combines the linear and
constant gap penalties.
• The penalty has the form A + B·L.
• A is the gap opening penalty, B the gap extension penalty
and L the length of the gap.
• Gap opening refers to the cost required to open a gap of any length,
and gap extension to the cost of extending an existing gap
by 1.
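The affine form A + B·L can be sketched as below. The values A = 2 and B = 1 are hypothetical, chosen only for illustration. Some tools instead charge A + B·(L − 1), so a real program's convention should be checked before comparing scores.

```python
def affine_gap_penalty(length, gap_open, gap_extend):
    """Penalty A + B*L for a gap of the given length (0 if there is no gap)."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


# Hypothetical parameters: A = 2 (opening), B = 1 (extension).
A, B = 2, 1
for gap_length in (1, 2, 3):
    print(gap_length, affine_gap_penalty(gap_length, A, B))
```

With these values, one gap of length 3 costs 5, while three separate gaps of length 1 cost 9 in total, so the scheme favours a few long gaps over many short ones.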
VALUE IS 26
VALUE IS 7
Dot matrix seminar