protein sequence database bioinformatics.pdf

TrEMBL (for Translated EMBL)
• TrEMBL is a computer-annotated protein sequence database that contains the translations
of all coding sequences in DDBJ/EMBL/GenBank nucleotide sequence entries that are not yet
integrated into SWISS-PROT; it therefore covers all protein categories.
• It was introduced as a computer-annotated supplement to SWISS-PROT.
• It also contains protein sequences extracted from the literature as well as sequences
submitted directly by users.
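Because TrEMBL entries start life as translations of coding sequences (CDS) in nucleotide entries, the core step can be illustrated with a small translation routine. This is a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and a CDS that starts in frame 1. Real pipelines use the translation table and CDS coordinates recorded in each EMBL entry. The function name `translate_cds` is hypothetical.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in T, C, A, G order for the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_cds(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence (CDS) into a one-letter protein sequence.

    Hypothetical helper: assumes the CDS starts in frame 1 and uses table 1.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Walk codon by codon, ignoring a trailing partial codon.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # "X" for ambiguous codons such as "NNN"
        if aa == "*" and to_stop:
            break  # a stop codon ends translation
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Passing `to_stop=False` keeps translating past stop codons and marks them with `*`, which can help when checking a reading frame by eye.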
Protein sequence Database

▪ TrEMBL is updated automatically with data from nucleotide sequence databases such
as EMBL, GenBank, and DDBJ.
▪ It also includes sequences obtained from projects such as genome sequencing
initiatives and metagenomics studies.
▪ TrEMBL can be used to identify novel proteins or characterize proteins from organisms
whose genomes have been sequenced but not yet fully annotated.
▪ TrEMBL data are used in combination with experimental data (e.g., from mass
spectrometry) to identify proteins.
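Mass-spectrometry-based identification (for example, peptide mass fingerprinting) compares measured peptide masses with masses predicted from database sequences such as TrEMBL entries. The sketch below shows that prediction step under simplifying assumptions: a simplified trypsin rule (cut after K or R unless followed by P), monoisotopic masses, no missed cleavages and no modifications. The function names and the 0.02 Da default tolerance are hypothetical choices for illustration, not the settings of any particular search engine.

```python
import re

# Monoisotopic residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free termini


def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide (standard residues only)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_peaks(protein: str, observed: list[float], tol: float = 0.02) -> list[str]:
    """Return tryptic peptides whose predicted mass lies within tol Da of an observed mass."""
    return [
        pep
        for pep in trypsin_digest(protein)
        if any(abs(peptide_mass(pep) - mass) <= tol for mass in observed)
    ]


if __name__ == "__main__":
    print(trypsin_digest("AKPLRGK"))  # ['AKPLR', 'GK']
```

Scoring a database entry then amounts to counting how many observed peaks it explains. Real tools add missed cleavages, modifications and statistical scoring on top of this.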

PIR-PSD
• Protein Information Resource-Protein Sequence Database.
• It is the world's first database of classified and functionally annotated protein sequences.
• PIR-PSD was the most comprehensive and expertly curated protein sequence database
in the public domain for over 20 years.
• PIR-PSD was developed and distributed by the Protein Information Resource in collaboration
with MIPS (Munich Information Center for Protein Sequences) and JIPID (Japan International
Protein Information Database).

• A unique feature of the PIR-PSD is its superfamily-based classification of
protein sequences.
• Sequences in PIR-PSD are further classified by homology domains
and sequence motifs.
• Homology domains may correspond to evolutionary building blocks, while
sequence motifs represent functional sites or conserved regions. This
classification approach supports a more complete understanding of
sequence-structure-function relationships.

https://proteininformationresource.org/pirwww/dbinfo/pir_psd.shtml
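Motif-based classification relies on matching short conserved patterns against a sequence. PIR-PSD's own motif tooling is not described here, so the sketch below swaps in the pattern syntax of PROSITE, a separate motif database, to show the idea. It supports only a subset of that syntax (`x`, single residues, `[...]`, `{...}`, `(n)`/`(n,m)` repeats, and `<`/`>` anchors). The example pattern `N-{P}-[ST]-{P}` is PROSITE's N-glycosylation site (PS00001); the function names and test sequence are hypothetical.

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden}, plus optional (n) or (n,m).
ELEMENT_RE = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = "^" if element.startswith("<") else ""
        end_anchor = "$" if element.endswith(">") else ""
        match = ELEMENT_RE.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."  # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            token = residue  # single letters and [..] classes are already valid regex
        if high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        parts.append(start_anchor + token + end_anchor)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return 1-based start positions of all (including overlapping) matches."""
    # A zero-width lookahead lets finditer report overlapping hits.
    scanner = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in scanner.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))            # N[^P][ST][^P]
    print(find_motif("MNGSANPTNKTW", "N-{P}-[ST]-{P}."))  # [2, 9]
```

The `N` at position 6 is skipped because it is followed by `P`, which the `{P}` element forbids.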

Before becoming part of UniProt, PIR-PSD was the oldest annotated and curated
protein-sequence database, established in 1984 as a successor to the original
National Biomedical Research Foundation (NBRF) Protein Sequence Database.

❖ In 2002, PIR joined EBI (European Bioinformatics Institute) and SIB (Swiss
Institute of Bioinformatics) to form the UniProt consortium.
❖ PIR-PSD sequences and annotations have been integrated into UniProt
Knowledgebase.
❖ Bidirectional cross-references between UniProt (UniProt Knowledgebase and/or
UniParc) and PIR-PSD allow former PIR-PSD entries to be tracked easily, because new
entries are now deposited directly into UniProt.
❖ The UniProt consortium comprises the EBI, the SIB, and the PIR, which together host a
large resource of bioinformatics databases and services. The mission of UniProt is to
provide the scientific community with a comprehensive, high-quality and freely
accessible resource of protein sequence and functional information.
❖ Structural information is now also linked to each entry when it is available for that protein.
UniProt Database

Before this collaboration, EMBL-EBI maintained TrEMBL, SIB maintained Swiss-Prot,
and PIR maintained the Protein Sequence Database (PIR-PSD).

❖ UniProt was launched in December 2003 and is mainly supported by grants from the
National Institutes of Health (NIH), USA.
❖ UniProt acts as a central hub for biomolecular information archived in more
than 50 cross-referenced databases.
❖ It provides cross-references to external biological data collections, such as the
underlying DNA sequence entries in the DDBJ/EMBL/GenBank nucleotide sequence databases,
3D protein structure databases, various protein domain and family characterization
databases, post-translational modification databases, species-specific data
collections, variant databases and disease databases.
❖ Explain it with an example of 1410800.
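In UniProtKB text-format (flat-file) entries, each cross-reference sits on a `DR` line: the database abbreviation, the identifier in that database, then optional fields, separated by semicolons and ending with a period. The sketch below groups such lines by database. It assumes only that simple layout and does not handle any variations of it; the sample entry is illustrative, all identifiers in it are placeholders, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative UniProtKB flat-file fragment; all identifiers are placeholders.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         141 AA.
DR   EMBL; XX000001; XXX00001.1; -; mRNA.
DR   PDB; 0XYZ; X-ray; 2.00 A; A=1-141.
DR   PDB; 0XYW; NMR; -; A=1-141.
DR   Pfam; PF00000; Placeholder; 1.
"""


def parse_dr_lines(entry_text: str) -> dict[str, list[tuple[str, list[str]]]]:
    """Group DR cross-references by database: {db: [(identifier, extra_fields), ...]}."""
    xrefs = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue  # skip every line type other than cross-references
        # Drop the "DR   " prefix and the final period, then split on semicolons.
        fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
        database, identifier, *extra = fields
        xrefs[database].append((identifier, extra))
    return dict(xrefs)


if __name__ == "__main__":
    for db, refs in parse_dr_lines(SAMPLE_ENTRY).items():
        print(db, [identifier for identifier, _ in refs])
```

Grouping by database mirrors how UniProt acts as a hub: one entry points outward to nucleotide, structure, domain and other resources.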

❖ The Universal Protein Resource Knowledgebase (UniProtKB) was initiated in 2002
by the UniProt consortium.
❖ UniProtKB consists of two sections: UniProtKB/Swiss-Prot (reviewed,
manually annotated) and UniProtKB/TrEMBL (unreviewed, automatically
annotated; TrEMBL = Translated EMBL).
❖ UniProtKB/Swiss-Prot contains manually annotated records and information
obtained from the literature and curator-evaluated computational analysis,
whereas UniProtKB/TrEMBL contains computationally analyzed records that
still need full manual annotation.
❖ Protein sequences in UniProtKB come from multiple sources, such as translated
coding sequences from the EMBL-Bank/GenBank/DDBJ nucleotide-sequence databases, the
Protein Data Bank (PDB), the Protein Information Resource (PIR) database, and
sequences submitted directly to UniProtKB.
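The split between the reviewed and unreviewed sections of UniProtKB is visible in UniProtKB FASTA files: the first header field is `sp` for Swiss-Prot entries and `tr` for TrEMBL entries, followed by the accession, the entry name, the protein name and tags such as `OS=` (organism) and `OX=` (taxonomy ID). The sketch below assumes that header layout and ignores the optional `GN=`, `PE=` and `SV=` tags. The function names are hypothetical, and the headers in the usage example are placeholders.

```python
import re

# UniProtKB FASTA header: >db|Accession|EntryName ProteinName OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(r">(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=(\d+)")


def parse_uniprot_header(header: str) -> dict:
    """Extract the fields this sketch needs; GN, PE and SV are ignored."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    db, accession, entry_name, protein_name, organism, taxon_id = match.groups()
    return {
        "reviewed": db == "sp",  # sp = Swiss-Prot (reviewed), tr = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        "organism": organism,
        "taxon_id": int(taxon_id),
    }


def count_by_section(fasta_text: str) -> dict[str, int]:
    """Count reviewed (Swiss-Prot) and unreviewed (TrEMBL) records in FASTA text."""
    counts = {"reviewed": 0, "unreviewed": 0}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            key = "reviewed" if parse_uniprot_header(line)["reviewed"] else "unreviewed"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    demo = (
        ">sp|P00000|EXAMPLE_HUMAN Example protein OS=Homo sapiens OX=9606 PE=1 SV=1\nMKV\n"
        ">tr|A0A000|A0A000_HUMAN Uncharacterized protein OS=Homo sapiens OX=9606 PE=4 SV=1\nMAA\n"
    )
    print(count_by_section(demo))  # {'reviewed': 1, 'unreviewed': 1}
```

Filtering on the `reviewed` flag is one way to restrict an analysis to manually annotated Swiss-Prot records.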
