This document discusses patterns and profiles in bioinformatics. It gives the conserved pattern L-X(6)-L-X(6)-L-X(6)-L found in proteins that contain the leucine zipper motif, then shows how a profile is built by scoring each residue at each position of a multiple sequence alignment according to how often it occurs. Profiles capture the conservation within a protein family better than any single sequence can. Highly conserved residues are likely to be functionally important. Patterns help classify homologues into subfamilies and identify further homologous sequences. The document also introduces PROSITE, a database for identifying protein families, domains, and functional sites.
1. H.E.J. Research Institute of Chemistry,
International Center for Chemical and Biological Sciences,
University of Karachi,
Karachi
Chem-727 (Bioinformatics)
Lecture-21
Patterns and Profiles in Bioinformatics
3. Conserved pattern in proteins containing the leucine zipper motif
L-X(6)-L-X(6)-L-X(6)-L
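The pattern reads as four leucines, each separated from the next by any six residues. A minimal sketch of how such a pattern could be searched for with a regular expression is given here; the function name and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L: four leucines, each followed by any six residues
# before the next leucine. Wrapping the pattern in a lookahead lets
# overlapping occurrences be reported too.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(sequence):
    """Return (start, matched_text) pairs; start is a 0-based index."""
    return [(m.start(), m.group(1)) for m in LEUCINE_ZIPPER.finditer(sequence)]


# Hypothetical sequence, constructed only so that it contains the motif once.
toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
print(find_leucine_zippers(toy))  # [(2, 'LAAAAAALAAAAAALAAAAAAL')]
```

A hit only shows that the leucine spacing is present; it does not by itself establish that the region forms a leucine zipper.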
4. EFGHIVW
EYAHMIW
DYAHSLW
EFGHPLW
[ED]-[FY]-[GA]-H-X-[VIL]-W
A pattern obtained from the multiple alignment of a family of proteins.
But the pattern hides a detail: at the first position there are more glutamates (3 of 4) than aspartates (1 of 4) in the alignment!
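A pattern of this kind can be derived column by column from the four aligned sequences. The sketch below is one possible way to do it; the function name and the cutoff of three alternatives (above which a column is written as X) are assumptions made for illustration, not rules stated in the lecture.

```python
def pattern_from_alignment(sequences, max_alternatives=3):
    """Build a pattern string from equal-length, gap-free aligned sequences.

    max_alternatives is an assumed cutoff: a column with more distinct
    residues than this is written as the wildcard X.
    """
    elements = []
    for column in zip(*sequences):
        # Distinct residues in the column, in order of first appearance.
        residues = sorted(set(column), key=column.index)
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("X")
    return "-".join(elements)


alignment = ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]
print(pattern_from_alignment(alignment))  # [ED]-[FY]-[GA]-H-X-[VIL]-W
```

Note that the pattern records only which residues occur in a column, not how often, which is the limitation that profiles address.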
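5. Making a profile
Let's add some numbers to the problem! Each residue at each position is scored by how often it occurs in the alignment; in the table that follows, each occurrence counts 5 (so 3 glutamates score 15).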
EFGHIVW
EYAHMIW
DYAHSLW
EFGHPLW
Positions    E    D    F    Y    H    G    A    I    M    S    P    L    V    W
One         15    5    0    0    0    0    0    0    0    0    0    0    0    0
Two          0    0   10   10    0    0    0    0    0    0    0    0    0    0
Three        0    0    0    0    0   10   10    0    0    0    0    0    0    0
Four         0    0    0    0   20    0    0    0    0    0    0    0    0    0
Five         0    0    0    0    0    0    0    5    5    5    5    0    0    0
Six          0    0    0    0    0    0    0    5    0    0    0   10    5    0
Seven        0    0    0    0    0    0    0    0    0    0    0    0    0   20
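The profile values can be reproduced by counting residues in each alignment column and multiplying by 5. The sketch below does this and then scores a candidate sequence by summing its per-position values. The weight of 5 is inferred from the numbers in the table, and the additive scoring rule and function names are assumptions for illustration; the lecture does not state how a sequence is scored against the profile.

```python
from collections import Counter

ALIGNMENT = ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]
WEIGHT = 5  # points per occurrence, inferred from the table (3 E's -> 15)


def build_profile(sequences, weight=WEIGHT):
    """Return one {residue: score} dict per alignment column."""
    return [
        {residue: count * weight for residue, count in Counter(column).items()}
        for column in zip(*sequences)
    ]


def score(profile, sequence):
    """Sum the profile values of a sequence (assumed additive scoring).

    Residues never seen at a position contribute 0.
    """
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column.get(residue, 0) for column, residue in zip(profile, sequence))


profile = build_profile(ALIGNMENT)
print(profile[0])                 # {'E': 15, 'D': 5}
print(score(profile, "EFGHILW"))  # 15 + 10 + 10 + 20 + 5 + 10 + 20 = 90
print(score(profile, "DYAHSVW"))  # 5 + 10 + 10 + 20 + 5 + 5 + 20 = 75
```

Both candidate sequences match the pattern equally well, but the profile ranks the first higher because it uses the more frequent residues at positions one and six.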
6. Significance of Patterns and Profiles
• Profiles express the patterns inherent in a multiple alignment of a set of homologous sequences.
• They permit greater accuracy in alignments of distantly related sequences.
• Highly conserved residues are likely to be part of the active site.
• Conservation patterns facilitate the identification of other homologous sequences.
• Patterns help in classifying subfamilies within a set of homologues.
• Residues that show little conservation and are subject to indels (insertions and deletions) are likely to lie in the surface loops of protein structures.
• The reliability of protein-structure prediction methods (in particular homology modeling) increases with the information obtained from multiple alignments.
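The notion of conservation used in these points can be made concrete with a simple per-column measure. The sketch below reports, for each alignment column, the most frequent residue and the fraction of sequences that carry it. This particular measure and the function name are assumptions chosen for simplicity; other measures (for example entropy-based ones) are also in use.

```python
from collections import Counter


def column_conservation(sequences):
    """Return (most_common_residue, fraction) for each alignment column."""
    result = []
    for column in zip(*sequences):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


alignment = ["EFGHIVW", "EYAHMIW", "DYAHSLW", "EFGHPLW"]
for position, (residue, fraction) in enumerate(column_conservation(alignment), start=1):
    note = "fully conserved" if fraction == 1.0 else ""
    print(position, residue, f"{fraction:.2f}", note)
```

In this alignment only positions four (H) and seven (W) are fully conserved, which would make them the first candidates to examine for functional importance.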
7. PROSITE: a premier bioinformatics service for identifying patterns and profiles
• PROSITE is a database of protein families and domains. It consists of entries describing the domains, families, and functional sites, as well as the amino acid patterns, signatures, and profiles found in them.
• Applications include (a) identifying possible functions of newly discovered proteins and (b) analysing known proteins for previously undetermined activity.
• PROSITE output contains information about biologically meaningful residues, such as active sites, substrate- or cofactor-binding sites, post-translational modification sites, and disulfide bonds, to help determine function.
• E.g. prediction of phosphorylation and glycosylation sites in protein sequences.
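As an illustration of site prediction with a pattern, the sketch below translates a basic PROSITE-style pattern into a regular expression and scans a sequence with it. The example pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern from PROSITE entry PS00001, where {P} means "any residue except proline". The converter, its function names, and the toy sequence are hypothetical; it covers only basic syntax and is not PROSITE's own scanning tool.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Handles: '-' separators, x/X wildcards, [..] allowed sets,
    {..} forbidden sets, (n) or (n,m) repeats, and '<' / '>' anchors.
    """
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".").replace("X", ".")
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex


def scan(pattern, sequence):
    """Return (1-based start, matched_text) for all hits, overlaps included."""
    compiled = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in compiled.finditer(sequence)]


# Hypothetical sequence built for illustration; N-P-T must not match.
toy = "MANGSAQNPTLNKTW"
print(prosite_to_regex("N-{P}-[ST]-{P}"))  # N[^P][ST][^P]
print(scan("N-{P}-[ST]-{P}", toy))         # [(3, 'NGSA'), (12, 'NKTW')]
```

Short patterns like this one match frequently by chance, so a hit indicates a possible site rather than a confirmed modification.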