Introduction to Protein Families and Databases

An Introduction to Protein Families and Databases
Jamia Millia Islamia

Pitching in (Vrinda Sharma)
• Protein families and the need for classification
• Domains & motifs, with GPCRs as an example
Groundwork (Wanchha Maurya)
• Sequence features
• Protein signatures
• Patterns & profiles
• HMMs
Showstopper (Rohit Satyam)
• DUFs: a story worth reciting
• Databases of protein families
• Demystifying the hypotheticals

Need for classification
Proteins can be classified into groups based on sequence or structural similarity. These groups often contain well-characterised proteins whose function is known. Thus, when a novel protein is identified, its functional properties can be proposed based on the group to which it is predicted to belong.
Source: EMBL-EBI Training Course: https://www.ebi.ac.uk/training-beta/online/courses/protein-classification-intro-ebi-resources/protein-classification/what-are-protein-families/

Protein Families in Brief
A protein family is a group of proteins that:
• share a common evolutionary origin
• perform related functions
• are similar in sequence or structure
Families are organised hierarchically:
Superfamily
  Family A
    Subfamily A1
    Subfamily A2
  Family B
  Family C
    Subfamily C1
    Subfamily C2
    Subfamily C3

Domains and Motifs Aren’t Synonyms
• Domains are distinct functional and/or structural units in a protein. Each is responsible for a particular function or interaction, contributing to the overall role of the protein.
• Motifs are supersecondary structures formed by interactions between secondary-structure elements such as alpha-helices and beta-strands.
Figure: Structure of the SH3 domain.
Figure: Domain composition of Nck. Nck contains three SH3 domains plus another domain known as SH2.
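A domain composition like the one described for Nck can be written down as an ordered list of domain names. The snippet below is a minimal sketch of that idea. The helper names `domain_counts` and `shared_domains` are hypothetical, and the N-to-C order of the domains is an assumption made for illustration.

```python
from collections import Counter

# Domain composition of Nck as described in the caption above:
# three SH3 domains plus one SH2 domain. The order used here is an
# assumption for illustration; no residue coordinates are implied.
NCK_DOMAINS = ["SH3", "SH3", "SH3", "SH2"]


def domain_counts(domains):
    """Count how many times each domain type occurs in a protein."""
    return Counter(domains)


def shared_domains(domains_a, domains_b):
    """Return the set of domain types that two proteins have in common."""
    return set(domains_a) & set(domains_b)


counts = domain_counts(NCK_DOMAINS)
print(counts["SH3"], counts["SH2"])
```

Comparing such lists is one simple way to see that two proteins share a domain even when the rest of their composition differs.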

G-Protein-Coupled Receptors
An example to understand protein families

G-Protein Signaling
• Regulator of G-protein signaling (RGS) domains are protein structural units that activate GTPases.
• They are found in sequences belonging to the RGS protein family (multifunctional GTPase-accelerating proteins).
• All RGS protein family members contain an RGS domain; some (e.g. RGS1) consist of little more than the domain.
• Others (e.g. RGS3 and RGS6) contain additional domains that confer other functions.

GPCRs have seven transmembrane domains and interact with specialized proteins (called G proteins) to influence intracellular pathways after binding extracellular signals.
Figure source: G-protein-coupled receptors and cancer (Dorsam et al., 2007)

GPCRs
GPCRs regulate biological processes including photoreception, regulation of the immune system, and nervous system transmission.
Similarity increases from superfamily to subfamily:
Superfamily: GPCRs
  Level 1: Rhodopsin-like GPCRs
    Level 2: Opsins
      Subfamily: Red-sensitive opsins
      Subfamily: Green-sensitive opsins
      Subfamily: Blue-sensitive opsins
    Level 2: APJ receptors
    Level 2: Relaxin receptors
  Level 1: cAMP receptors
  Level 1: Secretin-like GPCRs
  Etc.
Figure: The GPCR superfamily hierarchy. Families and subfamilies to which the short-wave-sensitive opsin 1 protein belongs are highlighted in violet.
What Are Sequence Features?
Groups of amino acids that confer certain characteristics upon a protein and may be important for its overall function:
1. Active sites
2. Binding sites
3. Post-translational modifications (PTMs)
4. Repeats

Protein Signatures
The Purpose and the Process
• To classify a protein into a family and to predict its domains or sequence features, we use computational tools. These tools rely on predictive models known as protein signatures.
• A model is used to search a sequence database and is refined as more distantly related sequences are identified.
• Once the model is mature, the signature is ready for protein sequence analysis.

How do Protein Signatures compare to other ways of classifying proteins?
Identifying the conserved residues
• A multiple sequence alignment lets us identify amino acid residues that are conserved across distantly related proteins, which informs classification.
• Protein signatures built from multiple sequence alignments are usually better at detecting divergent homologues than pairwise comparison methods.
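To make "identifying the conserved residues" concrete, the sketch below scans the columns of a small multiple sequence alignment and reports those where one residue dominates. The alignment is invented for illustration, and the function names and the frequency threshold are assumptions rather than any tool's actual method.

```python
from collections import Counter

# Toy alignment invented for illustration; '-' marks a gap.
ALIGNMENT = [
    "MKTAYIAK",
    "MRTAYLAK",
    "MKSAY-AK",
    "MKTGYIAR",
]


def column_conservation(alignment):
    """Return (most common residue, its frequency) for every column."""
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:  # column made of gaps only
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Gaps count against conservation: divide by the number of sequences.
        result.append((residue, count / len(column)))
    return result


def conserved_positions(alignment, threshold=1.0):
    """0-based columns whose top residue reaches the frequency threshold."""
    return [
        i
        for i, (_, freq) in enumerate(column_conservation(alignment))
        if freq >= threshold
    ]


print(conserved_positions(ALIGNMENT))        # fully conserved columns
print(conserved_positions(ALIGNMENT, 0.75))  # columns conserved in 3 of 4 sequences
```

Lowering the threshold admits columns that tolerate some substitution, which is closer to what is seen in divergent homologues.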

Signature types
Approaches to generate signatures
• Patterns
• Profiles
• Fingerprints
• Hidden Markov Models (HMMs)

Patterns & Profiles
Signature Types
Patterns recognise sequence features, such as binding sites or enzyme active sites, that consist of only a few amino acids.
Ex: PROSITE database.
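A pattern of this kind behaves much like a regular expression. The sketch below translates PROSITE-style pattern syntax into a Python regex and searches a sequence with it. The N-glycosylation pattern `N-{P}-[ST]-{P}` is a PROSITE pattern; the helper name `prosite_to_regex` and the test sequence are assumptions made for illustration, and only the basic syntax elements are handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [ABC] groups and single letters are already valid regex.
        parts.append(prefix + core + ("{%s}" % repeat if repeat else "") + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MNGSAANPSA"  # invented sequence
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping occurrences of a pattern would need a lookahead-based search.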
Profiles are built by converting multiple sequence alignments into position-specific scoring matrices (PSSMs).
Ex: CDD, HAMAP, PROSITE and ProDom.
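The conversion from an alignment to a position-specific scoring system can be sketched in a few lines. The example below builds log-odds scores per column from a toy ungapped alignment and uses them to score candidate sequences. The alignment, the uniform background frequency of 1/20 and the pseudocount of 1 are all assumptions chosen for illustration; real profile methods use more careful weighting and also handle gaps.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy ungapped alignment invented for illustration.
ALIGNMENT = ["MKTAY", "MRTAY", "MKSAY", "MKTGY"]


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds score per residue and column, with a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence):
    """Sum the position-specific scores for a sequence of matching length."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must equal the profile length")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


pssm = build_pssm(ALIGNMENT)
for candidate in ("MKTAY", "MRSGY", "WWWWW"):
    print(candidate, round(score(pssm, candidate), 2))
```

A sequence built from residues seen in the alignment scores higher than one built from unseen residues, and the same residue can score differently at different positions.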

Fingerprints and HMMs
Signature Types
Fingerprints are composed of multiple short conserved motifs drawn from sequence alignments. They can distinguish individual subfamilies within protein families.
Ex: PRINTS database.
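The idea of a fingerprint as several motifs matched together can be sketched as follows. The three motifs and both sequences are hypothetical. Regular expressions are swapped in to keep the example short and are not how PRINTS itself scores motifs.

```python
import re

# Hypothetical fingerprint: three short motifs expected in this order.
# Regular expressions stand in for real motif scoring in this sketch.
FINGERPRINT = ["G.GK[ST]", "DE.D", "W[FY]R"]


def match_fingerprint(sequence, motifs):
    """Return the start of each motif found in order (None if missing)."""
    positions = []
    cursor = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, cursor)
        if hit is None:
            positions.append(None)
        else:
            positions.append(hit.start())
            cursor = hit.end()  # later motifs must follow this one
    return positions


def motifs_matched(positions):
    """Number of motifs matched; equals len(motifs) for a full match."""
    return sum(p is not None for p in positions)


full = match_fingerprint("MAGSGKTLLDEADPPWFRK", FINGERPRINT)
partial = match_fingerprint("MAGSGKTLLWFRK", FINGERPRINT)
print(full, motifs_matched(full))
print(partial, motifs_matched(partial))
```

Which motifs of a fingerprint are present, and which are missing, is the kind of information that can help separate one subfamily from another.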
Hidden Markov models (HMMs) can also be used to convert multiple sequence alignments into position-specific scoring systems.
Ex: Pfam, SMART, TIGRFAMs, PANTHER, SFLD, SUPERFAMILY and Gene3D.
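A full profile HMM is too long to show here, so the sketch below uses a generic two-state HMM instead, decoded with the Viterbi algorithm, to show how an HMM assigns a hidden state to each residue. It is not the model used by Pfam or the other databases listed. The states, the hydrophobic residue grouping and every probability are invented for illustration.

```python
import math

STATES = ("loop", "helix")
HYDROPHOBIC = set("AFILMVWC")  # grouping chosen for illustration

# All probabilities below are invented for illustration.
START = {"loop": 0.9, "helix": 0.1}
TRANS = {
    "loop": {"loop": 0.9, "helix": 0.1},
    "helix": {"loop": 0.1, "helix": 0.9},
}
EMIT = {  # probability of emitting a hydrophobic (H) or polar (P) residue
    "loop": {"H": 0.3, "P": 0.7},
    "helix": {"H": 0.8, "P": 0.2},
}


def viterbi(sequence):
    """Return the most probable state path for a protein sequence."""
    if not sequence:
        return []
    symbols = ["H" if aa in HYDROPHOBIC else "P" for aa in sequence]
    # Log-probability of the best path ending in each state at position 0.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][symbols[0]])
               for s in STATES}]
    back = []
    for sym in symbols[1:]:
        row, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = (scores[-1][prev] + math.log(TRANS[prev][s])
                      + math.log(EMIT[s][sym]))
            pointers[s] = prev
        scores.append(row)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


path = viterbi("KKKKLLLLLLLLLLKKKK")
print("".join("H" if s == "helix" else "-" for s in path))
```

Profile HMMs follow the same principle with one set of match, insert and delete states per alignment column, which is what lets them model insertions and deletions.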

Families in search of function
Domains of unknown function (DUFs)
Popovic et al., 2017, Scientific Reports
• A DUF is a domain whose function is yet to be discovered.
• The DUF naming scheme was introduced by Chris Ponting through the addition of DUF1 and DUF2 to the SMART database.
Goodacre et al., 2014

Databases at a Glance
Databases of Protein Families
1. Pfam
Protein families, domains, motifs and repeats (generated from MSAs and HMMs)
2. PIRSF
MSA and clustering with high similarity thresholds
3. PROSITE
MSA of homologous proteins; based on ProRules
4. PRIDE
Mass-spectrometry-based identification; provides PTM information and literature evidence
5. PRINTS
Combines multi-domain/motif information for family categorization; MSA and fuzzy logic (regex)
6. MobiDB
Database of intrinsically disordered regions (homology-based, predicted and curated)
7. TIGRFAM
MSA and HMMs, mainly for prokaryotic proteins
8. SUPERFAMILY2
Uses HMMs and protein sequences; domain organisation, sequence alignments and protein sequence details can be obtained for a query sequence

InterPro: A Protein Family Compendium
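InterPro brings together signatures from several of the databases named in the Signature Types slides. As a rough sketch, that grouping can be held in a lookup table keyed by signature type. The table only restates the examples given in those slides, and the helper `signature_types_for` is hypothetical.

```python
# Signature types and the example databases listed on the earlier slides.
SIGNATURE_TYPES = {
    "Patterns": ["PROSITE"],
    "Profiles": ["CDD", "HAMAP", "PROSITE", "ProDom"],
    "Fingerprints": ["PRINTS"],
    "HMMs": ["Pfam", "SMART", "TIGRFAMs", "PANTHER", "SFLD",
             "SUPERFAMILY", "Gene3D"],
}


def signature_types_for(database):
    """Return the signature types a database is listed under (case-insensitive)."""
    wanted = database.lower()
    return [
        signature_type
        for signature_type, databases in SIGNATURE_TYPES.items()
        if wanted in (name.lower() for name in databases)
    ]


print(signature_types_for("PROSITE"))
print(signature_types_for("Pfam"))
```

PROSITE appears under two types because the slides list it as an example for both patterns and profiles.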

GOFeat Tutorial
Protein Under Investigation: LOC645967

InterPro Tutorial
Protein Under Investigation: LOC645967

References
• Dorsam, R.T. and Gutkind, J.S., 2007. G-protein-coupled receptors and cancer. Nature Reviews Cancer, 7(2), pp.79-94.
• Bateman, A., Coggill, P. and Finn, R.D., 2010. DUFs: families in search of function. Acta Crystallographica Section F: Structural Biology and Crystallization Communications, 66(10), pp.1148-1152.
• Goodacre, N.F., Gerloff, D.L. and Uetz, P., 2014. Protein domains of unknown function are essential in bacteria. mBio, 5(1).
• EMBL-EBI Training Course: https://www.ebi.ac.uk/training-beta/online/courses/protein-classification-intro-ebi-resources/protein-classification/what-are-protein-families/

Thanks
Drop in
@RohitSatyam1
+91 9870953351
Jamia Millia Islamia University
