Protein sequence databases

Introduction:
The Protein database is a collection of sequences from several sources, including translations from
annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR,
PRF and PDB. Protein sequences are the fundamental determinants of biological structure and
function.
SWISS-PROT
– Manually curated
– High-quality annotations, fewer entries
GenPept/TrEMBL
– Translated coding sequences from GenBank/EMBL
– Few annotations, more up to date
PIR
– Phylogenetic-based annotations
SWISS-PROT, TrEMBL and PIR have since combined their efforts to form UniProt (http://www.uniprot.org)
PDB (Protein Databank)
– Stores 3-dimensional atomic coordinates for biological molecules, including proteins and
nucleic acids
– Data obtained by X-ray crystallography, NMR or computer modelling
http://www.rcsb.org/pdb/
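To make "atomic coordinates" concrete: in the legacy fixed-column PDB file format, each atom is written as an ATOM (or HETATM) record, with its x, y and z coordinates (in ångström) in columns 31-38, 39-46 and 47-54. The minimal Python sketch below reads those columns. The Atom class, the function names and the sample record are hypothetical illustrations, and the sketch ignores fields such as alternate locations, occupancy and B-factors. PDB archives also distribute mmCIF files, which need a different parser (Biopython's Bio.PDB module handles both formats).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Read one ATOM/HETATM record from the legacy fixed-column PDB format."""
    # PDB columns are 1-based and inclusive; Python slices are 0-based and
    # end-exclusive, so columns 31-38 (x) become line[30:38], and so on.
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_coordinates(lines: list[str]) -> list[Atom]:
    """Keep only coordinate records (ATOM/HETATM) and parse them."""
    return [parse_atom_record(line) for line in lines
            if line.startswith(("ATOM  ", "HETATM"))]


# Hypothetical record, assembled field by field so the fixed columns line up.
SAMPLE_ATOM = (
    f"{'ATOM':<6}{1:>5} {'N':^4} {'MET':>3} {'A'}{1:>4}    "
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.00:6.2f}{0.00:6.2f}"
)

atoms = read_coordinates(["HEADER    EXAMPLE", SAMPLE_ATOM])
print(atoms[0])
```

Slicing by fixed columns, rather than splitting on whitespace, matters because adjacent PDB fields can run together (for example, large negative coordinates leave no space between columns).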
MMDB (Molecular Modeling Database)
– Over 28,000 3D macromolecular structures, including proteins and
polynucleotides (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure)
SCOP (Structural Classification of Proteins)
– Classification of proteins according to structural and evolutionary relationships
SWISS-PROT
Introduction:
SWISS-PROT is an annotated protein sequence database, which was created at the
Department of Medical Biochemistry of the University of Geneva and has been a collaborative
effort of the Department and the European Molecular Biology Laboratory (EMBL) since 1987.
SWISS-PROT is now an equal partnership between the EMBL and the Swiss Institute of
Bioinformatics (SIB). The EMBL activities are carried out by its Hinxton Outstation, the European
Bioinformatics Institute (EBI). The SWISS-PROT protein sequence database consists of sequence
entries, and each entry is composed of several line types, each with its own format.
The SWISS-PROT database distinguishes itself from other protein sequence databases by three
distinct criteria:
(i) annotations
(ii) minimal redundancy and
(iii) integration with other databases.
Annotations
CORE DATA
• The sequence data
• The citation information (bibliographical references)
• The taxonomic data (description of the biological source of the protein)
Annotation: Additional Data
• Descriptions include:
• Function(s) of the protein
• Post-translational modification(s), such as carbohydrate attachment, phosphorylation,
acetylation and GPI anchors
• Domains and sites, for example, calcium-binding regions, ATP-binding sites, zinc fingers,
homeoboxes, and SH2 and SH3 domains
• Secondary structure, e.g. alpha helix, beta sheet
• Quaternary structure, e.g. homodimer, heterotrimer
• Similarities to other proteins
• Disease(s) associated with deficiencies in the protein
• Sequence conflicts, variants, etc.
Minimal Redundancy
• Much of the data on a protein comes from more than one literature report
• These data are condensed and merged into a single, more concise and coherent entry
• Conflicts between sources are listed in each entry
Integration with other databases
• Cross-references to more than 50 other databases
• Nucleic acid sequences, protein tertiary structures, protein 3-D models, etc.
• Allows SWISS-PROT to play a major role as a focal point for interconnecting
biomolecular databases
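In the flat-file entries, these cross-references appear as DR lines of the form "DR   DATABASE; PRIMARY_ID; optional fields.". A minimal Python sketch, assuming that layout (data from column 6, fields separated by semicolons, the line ending in a full stop), groups an entry's cross-references by database. The function names are hypothetical, and the identifiers in the usage example are made-up placeholders rather than a real entry.

```python
from __future__ import annotations

from collections import defaultdict


def parse_dr_line(line: str) -> tuple[str, list[str]]:
    """Split one DR line into (database name, remaining fields)."""
    if not line.startswith("DR   "):
        raise ValueError(f"not a DR line: {line!r}")
    # Data starts at column 6; drop the trailing full stop before splitting.
    body = line[5:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]


def cross_references(entry_lines: list[str]) -> dict[str, list[str]]:
    """Map each cross-referenced database to the primary identifiers listed for it."""
    index: dict[str, list[str]] = defaultdict(list)
    for line in entry_lines:
        if line.startswith("DR   "):
            database, fields = parse_dr_line(line)
            index[database].append(fields[0])  # first field = primary identifier
    return dict(index)


# Usage with made-up DR lines (placeholder identifiers, not a real entry).
example = [
    "DR   EMBL; XX000001; XXX00001.1; -; mRNA.",
    "DR   PDB; 9ZZZ; X-ray; 2.00 A; A=1-100.",
    "DR   PDB; 9ZZY; NMR; -; A=1-100.",
]
print(cross_references(example))
```

An index like this is what lets a single entry act as a hub: each database name leads to the identifiers needed to look the protein up in that resource.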
Documentation
• All files documented and indexed
• Documentation kept up-to-date
Applications for the Knowledgebase
• Provides highly organized data and information on a wide variety of proteins
• Can be used as a starting point for protein research
• Allows searches to be started from a variety of search terms
• Serves as a biochemical encyclopedia

Line types in a SWISS-PROT entry
ID - Identification.
AC - Accession number(s).
DT - Date.
DE - Description.
GN - Gene name(s).
OS - Organism species.
OG - Organelle.
OC - Organism classification.
RN - Reference number.
RP - Reference position.
RC - Reference comments.
RX - Reference cross-references.
RA - Reference authors.
RL - Reference location.
CC - Comments or notes.
DR - Database cross-references.
KW - Keywords.
FT - Feature table data.
SQ - Sequence header.
(blanks) - Sequence data.
// - Termination line.
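The line types listed above can be seen in action with a minimal Python sketch that groups the lines of one flat-file entry by their two-letter code and joins the sequence lines. It assumes the layout used by SWISS-PROT/UniProt flat files: the line code in columns 1-2, data starting at column 6, sequence lines with a blank code, and "//" ending the entry. The function name parse_entry and the sample entry are hypothetical and simplified; for real files, Biopython's Bio.SwissProt module provides a complete parser.

```python
from collections import defaultdict

# Made-up, simplified entry for illustration (not a real SWISS-PROT record).
SAMPLE_ENTRY = "\n".join([
    "ID   EXAMPLE_HUMAN   Reviewed;  12 AA.",
    "AC   EXAMPLE1;",
    "DE   Hypothetical example protein.",
    "KW   Example.",
    "SQ   SEQUENCE   12 AA;",
    "     MKTAYIAKQR QL",
    "//",
])


def parse_entry(text: str) -> dict:
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    sequence_parts = []
    for line in text.splitlines():
        if line.startswith("//"):
            break  # termination line: end of this entry
        if line.startswith("     "):
            # Sequence data lines have a blank line code; drop the spacing.
            sequence_parts.append(line.replace(" ", ""))
        elif len(line) >= 2:
            # Line code in columns 1-2, data from column 6 onwards.
            fields[line[:2]].append(line[5:].rstrip())
    entry = dict(fields)
    entry["sequence"] = "".join(sequence_parts)
    return entry


entry = parse_entry(SAMPLE_ENTRY)
print(entry["AC"], entry["sequence"])
```

Keeping every line code as a list reflects the fact that most line types (RN, CC, DR, FT and so on) can occur several times in one entry.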
