2. A protein database can be a sequence database or a
structure database.
Protein sequence database:
The protein sequence database was developed at the
National Biomedical Research Foundation (NBRF) at
Georgetown University by Margaret Dayhoff in the 1960s.
The protein sequence database was later maintained
collaboratively by PIR, JIPID (Japan International Protein
Information Database) and
MIPS (Martinsried Institute for Protein Sequences).
3. PIR (PROTEIN INFORMATION RESOURCE) DATABASE:
It is a major protein sequence database.
The database is divided into 4 sections:
PIR1: Classified and annotated entries.
PIR2: Preliminary entries.
PIR3: Unverified entries.
PIR4: Conceptual translations of sequences that are
not transcribed, that are genetically engineered, etc.
4. SWISS-PROT
It is a protein sequence database maintained
collaboratively by the Department of Medical Biochemistry
at the University of Geneva and the EMBL.
The database endeavours to provide high-level annotation:
descriptions of the function of the protein, the
structure of its domains, post-translational
modifications, variants and so on.
Entries are cross-linked to many other sources and have
minimal redundancy.
5. TrEMBL:
It was created in 1996 as a computer-annotated
supplement to Swiss-Prot.
The database contains translations of all coding sequences
in the EMBL nucleotide database that are not yet in Swiss-Prot.
2 main sections:
SP-TrEMBL: contains entries that have not yet been
annotated but will eventually be incorporated into
Swiss-Prot.
REM-TrEMBL: contains entries that will not be included
in Swiss-Prot, e.g. immunoglobulin sequences, synthetic sequences.
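The idea of translating a coding sequence into a protein sequence can be sketched in a few lines. This is only an illustration using the standard genetic code; the function name translate_cds and the example sequence are made up for this sketch, and it is not how TrEMBL itself is produced.

```python
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA letters
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X marks an unknown codon
        if residue == "*":  # stop codon ends the translation
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy sequence: ATG GCC AAA TAA
print(translate_cds("ATGGCCAAATAA"))  # MAK
```

A real pipeline would also have to deal with reading frames, partial sequences and alternative genetic codes, which this sketch ignores.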
NRL-3D:
This database is produced by PIR from sequences
extracted from the PDB.
6. NRL-3D is used both for similarity searches and for keyword
interrogation.
The ATLAS retrieval system is used to access information in
NRL-3D.
Structural database:
Structural databases store collections of 3-dimensional
structures of biological macromolecules such as proteins.
The best-established database for protein structures is the Protein
Data Bank (PDB).
7. PDB: Each entry contains the following information:
Name of the protein
The species (source organism)
Description of the structure determination
Amino acid sequence
Additional information
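As a rough illustration of how these items appear in a PDB-format text file, the sketch below reads the TITLE, SOURCE, EXPDTA and SEQRES records. The function name summarize_pdb and the sample record are assumptions made for this example; the sample is a toy record, not a real PDB entry, and the parser ignores many details of the full format.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Toy record written for this sketch (NOT a real PDB entry).
SAMPLE_ENTRY = """\
TITLE     EXAMPLE PEPTIDE (TOY RECORD, NOT A REAL PDB ENTRY)
SOURCE    ORGANISM_SCIENTIFIC: HOMO SAPIENS
EXPDTA    X-RAY DIFFRACTION
SEQRES   1 A    5  GLY ILE VAL GLU GLN
"""


def summarize_pdb(text):
    """Collect name, source, method and per-chain sequence from PDB-style records."""
    summary = {"title": "", "source": "", "method": "", "sequences": {}}
    field_for_record = {"TITLE": "title", "SOURCE": "source", "EXPDTA": "method"}
    for line in text.splitlines():
        record = line[:6].strip()  # record name occupies the first 6 columns
        if record in field_for_record:
            key = field_for_record[record]
            # Text starts after column 10; continuation lines are appended.
            summary[key] = (summary[key] + " " + line[10:].strip()).strip()
        elif record == "SEQRES":
            # Whitespace split assumes a non-blank chain identifier.
            tokens = line.split()
            chain, residues = tokens[2], tokens[4:]
            letters = "".join(THREE_TO_ONE.get(r, "X") for r in residues)
            summary["sequences"][chain] = summary["sequences"].get(chain, "") + letters
    return summary


info = summarize_pdb(SAMPLE_ENTRY)
print(info["method"])     # X-RAY DIFFRACTION
print(info["sequences"])  # {'A': 'GIVEQ'}
```

For real work an established parser (for example the one in Biopython) is a safer choice than hand-written code like this.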
SCOP (Structural Classification of Proteins):
SCOP describes the structural and evolutionary relationships
between proteins of known structure.
Proteins are clustered into families with clear evolutionary
relationships if they have sequence identities of more than 30%.
Proteins are said to have a common fold if they have the
same secondary structures in the same arrangement, whether or
not they have a common evolutionary origin.
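The sequence identity criterion mentioned above can be sketched as follows. The function names and the example alignment are hypothetical, and the definition used here (identical positions divided by aligned, non-gap positions) is only one of several conventions in use; SCOP's own classification also relies on expert judgement of structure and function, which this does not capture.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":  # skip gap columns
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_family_threshold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply the 'more than 30% identity' rule of thumb from the text."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical pre-aligned toy sequences: 6 of 8 positions are identical.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))        # 75.0
print(passes_family_threshold("MKTAYIAK", "MKSAYLAK"))  # True
```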
8. CATH DATABASE:
(Class, Architecture, Topology and Homology)
Class is derived from gross secondary structure content and
packing.
Architecture describes the gross arrangement of secondary
structures.
Topology encompasses both the overall shape and the connectivity
of secondary structures.
Homology groups domains that share more than 35%
sequence identity and are thought to share a common ancestor.
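The four levels form a hierarchy, and CATH identifies a domain's position in it with a dotted four-number code (C.A.T.H). The sketch below parses such codes and reports how many levels two domains share. The names CathCode, parse_cath_code and shared_levels are invented for this illustration, and the two codes are used simply as examples of the dotted form.

```python
from typing import NamedTuple

# Names of the four main CATH classes.
CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    homology: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H code such as '3.40.50.720' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected a code with four dot-separated numbers")
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> list:
    """Return the hierarchy levels, from the top down, on which two codes agree."""
    shared = []
    for name, x, y in zip(CathCode._fields, a, b):
        if x != y:  # once a level differs, deeper levels are not comparable
            break
        shared.append(name)
    return shared


first = parse_cath_code("3.40.50.720")
second = parse_cath_code("3.40.50.300")
print(CLASS_NAMES.get(first.cath_class, "Unknown"))  # Alpha Beta
print(shared_levels(first, second))  # ['cath_class', 'architecture', 'topology']
```

In this example the two domains share class, architecture and topology but fall into different homologous superfamilies.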
9. OTHER DATABASES:
DALI: Based on extraction of similar structures from
distance matrices.
CE (Combinatorial Extension): Database of structural alignments.
Proteopedia: A collaborative 3D encyclopedia of
proteins and other molecules.
OPM (Orientations of Proteins in Membranes): Provides spatial
positions of protein 3-dimensional structures relative to the
lipid bilayer.
CONSERVED DOMAIN DATABASE: A collection of
sequence alignments and profiles representing protein
domains conserved in molecular evolution.
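The distance matrices mentioned for DALI can be illustrated with a short sketch: for a list of 3D coordinates (for example one C-alpha atom per residue), entry [i][j] is the distance between residue i and residue j. The function name and the coordinates below are hypothetical, and DALI's actual comparison and scoring of such matrices is considerably more elaborate and is not reproduced here.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances between points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical coordinates (in angstroms) for three residues.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

for row in distance_matrix(toy_coords):
    print(row)
# [0.0, 5.0, 13.0]
# [5.0, 0.0, 12.0]
# [13.0, 12.0, 0.0]
```

Because the matrix depends only on distances within one structure, it is unchanged when the whole structure is rotated or translated, which is what makes it convenient for comparing structures.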