PROTEIN DATABASES
PDB
PIR
SWISS-PROT
PROTEIN DATABASES
What are PROTEINS?
TYPES OF PROTEIN DATABASES
• Protein Information Resource (PIR)
• SWISS-PROT
• Protein Databank (PDB)
Importance of Protein Databases
Protein Information Resource (PIR)
History
The Protein Information Resource (PIR) is an integrated
public bioinformatics resource to support genomic, proteomic and
systems biology research and scientific studies.
PIR was established in 1984 by the National Biomedical
Research Foundation (NBRF) as a resource to assist researchers in
the identification and interpretation of protein sequence
information.
For over four decades, beginning with the Atlas of Protein
Sequence and Structure, PIR has provided protein databases and
analysis tools, including the Protein Sequence Database (PSD),
that are freely accessible to the scientific community.
In 2002, PIR, along with its international
partners, EBI (European Bioinformatics Institute)
and SIB (Swiss Institute of Bioinformatics), was awarded
a grant from NIH to create UniProt, a single worldwide
database of protein sequence and function, by unifying
the PIR-PSD, Swiss-Prot, and TrEMBL databases.
Today, PIR maintains staff at the University of Delaware (UD)
and Georgetown University Medical Center (GUMC) and
continues to offer world-leading resources to assist with
proteomic and genomic data integration and the
propagation and standardization of protein annotation.
Protein Databank (PDB):
• PDB is a primary protein structure database. It is a
crystallographic database for the three-dimensional
structure of large biological molecules, such as proteins.
• In spite of the name, PDB archives the three-dimensional
structures of not only proteins but also other biologically
important molecules, such as nucleic acid fragments,
RNA molecules, large peptides such as the antibiotic
gramicidin, and complexes of proteins and nucleic acids.
• The database holds data derived mainly from three
sources: structures determined by X-ray crystallography,
NMR experiments, and molecular modeling.
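To illustrate how such three-dimensional data is stored, the sketch below reads two kinds of records from text in the legacy fixed-column PDB file format: the EXPDTA record, which names the experimental technique, and ATOM records, which carry the x, y, z coordinates. The sample records and the function name parse_pdb_text are made up for illustration and are not taken from a real entry.

```python
# Minimal sketch: read the experimental method and atom coordinates
# from text in the legacy fixed-column PDB format.
# The sample records below are illustrative, not a real PDB entry.

SAMPLE_PDB = "\n".join([
    "EXPDTA    X-RAY DIFFRACTION",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "END",
])


def parse_pdb_text(text):
    """Return (method, atoms); atoms is a list of dicts, one per ATOM record."""
    method = None
    atoms = []
    for line in text.splitlines():
        record = line[:6].strip()            # record name, columns 1-6
        if record == "EXPDTA":
            method = line[10:79].strip()     # technique, columns 11-79
        elif record == "ATOM":
            atoms.append({
                "name": line[12:16].strip(),     # atom name, columns 13-16
                "residue": line[17:20].strip(),  # residue name, columns 18-20
                "chain": line[21],               # chain ID, column 22
                "resseq": int(line[22:26]),      # residue number, columns 23-26
                "xyz": (float(line[30:38]),      # x, columns 31-38
                        float(line[38:46]),      # y, columns 39-46
                        float(line[46:54])),     # z, columns 47-54
            })
    return method, atoms


method, atoms = parse_pdb_text(SAMPLE_PDB)
print(method, len(atoms), atoms[0]["xyz"])
```

Slicing by column position rather than splitting on whitespace matters here because adjacent fields in this format can run together without a separating space.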
SWISS-PROT
• SWISS-PROT is another well-known and extensively
used protein database.
• The data in each entry can be considered
separately as core data and annotation.
• The core data consists of the sequence, entered
in the common single-letter amino acid code, and the
related references and bibliography. The
taxonomy of the organism from which the
sequence was obtained also forms part of this
core information.
The annotation contains information on the
function or functions of the protein;
post-translational modifications such as phosphorylation
and acetylation; functional and structural domains
and sites, such as calcium-binding regions,
ATP-binding sites and zinc fingers; known secondary
structural features, for example alpha helices and beta
sheets; the quaternary structure of the protein;
similarities to other proteins, if any; and associated
diseases. Sequence conflicts, which may arise due to
different authors publishing different sequences for the
same protein, or due to mutations in different strains of
an organism, are also described as part of the annotation.
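The split between core data and annotation can be sketched against the flat-file layout of an entry, in which every line starts with a two-letter line code (for example OS for organism, RA for reference authors, CC for comments, FT for features, SQ for the sequence header). The grouping of codes into "core" and "annotation" below is an assumption that mirrors the description above, and the entry itself is a made-up placeholder, not a real record.

```python
# Minimal sketch: separate a Swiss-Prot style flat-file entry into
# core data (sequence, references, taxonomy) and annotation.
# The code grouping and the toy entry are assumptions for illustration.

CORE_CODES = {"OS", "OC", "RN", "RA", "RT", "RL"}   # taxonomy and references
ANNOTATION_CODES = {"CC", "FT", "KW"}               # comments, features, keywords

TOY_ENTRY = "\n".join([
    "ID   EXAMPLE_HUMAN           Reviewed;          10 AA.",
    "DE   RecName: Full=Hypothetical example protein;",
    "OS   Homo sapiens (Human).",
    "OC   Eukaryota; Metazoa; Chordata.",
    "RN   [1]",
    "RA   Doe J.;",
    "RL   Placeholder journal reference.",
    "CC   -!- FUNCTION: Placeholder description of function.",
    "FT   MOD_RES         3",
    'FT                   /note="Phosphoserine"',
    "KW   Phosphoprotein.",
    "SQ   SEQUENCE   10 AA;",
    "     MKSAYIAKQR",
    "//",
])


def split_entry(entry_text):
    """Return a dict with the sequence, core lines and annotation lines."""
    core, annotation, seq_parts = [], [], []
    in_sequence = False
    for line in entry_text.splitlines():
        code = line[:2]
        if code == "//":                 # end-of-entry marker
            break
        if code == "SQ":                 # sequence lines follow this header
            in_sequence = True
        elif in_sequence:
            seq_parts.append("".join(line.split()))  # drop spacing in sequence
        elif code in CORE_CODES:
            core.append((code, line[5:].strip()))
        elif code in ANNOTATION_CODES:
            annotation.append((code, line[5:].strip()))
    return {"sequence": "".join(seq_parts), "core": core, "annotation": annotation}


entry = split_entry(TOY_ENTRY)
print(entry["sequence"])
print([code for code, _ in entry["core"]])
print([code for code, _ in entry["annotation"]])
```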
TrEMBL (for Translated EMBL)
TrEMBL is a computer-annotated protein
sequence database that is released as a
supplement to SWISS-PROT. It contains the
translations of all coding sequences present in
the EMBL Nucleotide database that have not yet
been fully annotated. Thus it may contain the
sequences of proteins that are never expressed
and have never actually been identified in the organisms.
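The translation step that produces such entries can be sketched as follows: a coding sequence is read codon by codon and mapped to amino acids with the standard genetic code. This is only a minimal illustration; the function name translate_cds and the input sequence are made up, and the actual TrEMBL production pipeline is not described in these slides.

```python
# Minimal sketch: translate a coding sequence (CDS) into a protein
# sequence using the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate from the first base and stop at the first stop codon."""
    cds = cds.upper().replace("U", "T")        # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":                            # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAATGGTAA"))  # prints MAKW
```

Because the result is derived purely from the nucleotide sequence, nothing in it shows that the protein is actually made in the cell, which is why such entries need later review.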
The UniProt Knowledgebase (UniProtKB) has two sections:
• UniProtKB/Swiss-Prot, which is manually
annotated and reviewed, and
• UniProtKB/TrEMBL, which is automatically
annotated and not reviewed.
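In sequence files downloaded from UniProt, the two sections can be told apart from the FASTA header, which starts with "sp|" for Swiss-Prot entries and "tr|" for TrEMBL entries. The sketch below reads that prefix; the function name classify_header is made up, and the TrEMBL header used in the example is a hypothetical placeholder rather than a real entry.

```python
# Minimal sketch: tell reviewed (Swiss-Prot) and unreviewed (TrEMBL)
# entries apart from a UniProt-style FASTA header line.

SECTIONS = {
    "sp": "UniProtKB/Swiss-Prot (reviewed)",
    "tr": "UniProtKB/TrEMBL (unreviewed)",
}


def classify_header(header):
    """Return (accession, section) for a header like '>sp|P69905|HBA_HUMAN ...'."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    db, accession, _rest = header[1:].split("|", 2)
    return accession, SECTIONS.get(db, "unknown section")


reviewed = classify_header(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens")
# Hypothetical placeholder header for an unreviewed entry:
unreviewed = classify_header(">tr|A0A000AAA0|A0A000AAA0_HUMAN Uncharacterized protein OS=Homo sapiens")
print(reviewed)
print(unreviewed)
```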
Importance of Protein Databases
Huge amounts of data on protein structures,
functions, and particularly sequences are being
generated. Searching databases is often the first
step in the study of a new protein. Protein databases
have the following uses:
• Comparison between proteins or between
protein families provides information about the
relationship between proteins within a genome
or across different species and hence offers much
more information than can be obtained by
studying only an isolated protein.
• Secondary databases derived from
experimental databases are also widely
available. These databases reorganize and
annotate the data or provide predictions.
• The use of multiple databases often helps
researchers understand the structure and
function of a protein.
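The comparison between proteins mentioned above is often summarized as percent identity between two aligned sequences. The sketch below assumes the two sequences are already aligned, with "-" marking gaps, and counts only columns where neither sequence has a gap; this is one common convention among several, and the sequences and the function name percent_identity are made up for illustration.

```python
# Minimal sketch: percent identity between two pre-aligned protein
# sequences. Columns containing a gap ('-') are skipped (an assumed
# convention; other definitions of identity exist).


def percent_identity(aligned_a, aligned_b):
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":     # ignore gap columns
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# 7 gap-free columns, 5 of them identical.
identity = percent_identity("MKT-AYIA", "MKSQAYLA")
print(round(identity, 1))  # prints 71.4
```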
Thanking You
