This presentation gives detailed information about the Swiss-Prot database, one of the two sections of UniProtKB. It also covers TrEMBL, a computer-annotated supplement to Swiss-Prot.
2. INTRODUCTION
• The Universal Protein Resource Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins.
It consists of two sections:
23-Jan-17 2
Swiss-Prot
• Reviewed
• Manually annotated
• Records with information extracted from the literature and curator-evaluated computational analysis
TrEMBL (Translated EMBL; EMBL: European Molecular Biology Laboratory)
• Unreviewed
• Computationally annotated
• Records that await full manual annotation
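The split between the two sections is visible in UniProtKB FASTA files, where each header starts with `sp|` for a reviewed Swiss-Prot entry or `tr|` for an unreviewed TrEMBL entry. The minimal Python sketch below, assuming that header convention, counts records per section. The function name `count_sections` and the sample records are hypothetical; the accessions are made up and do not correspond to real entries.

```python
from collections import Counter


def count_sections(fasta_text: str) -> Counter:
    """Count FASTA records by UniProtKB section, using the header prefix."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines carry no section information
        db = line[1:].split("|", 1)[0]  # text between ">" and the first "|"
        if db == "sp":
            counts["Swiss-Prot (reviewed)"] += 1
        elif db == "tr":
            counts["TrEMBL (unreviewed)"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up records for illustration only.
sample = """>sp|EX0001|PROT1_MOUSE Example protein 1
MKTAYIAKQR
>tr|EX0002|EX0002_MOUSE Example protein 2
MSTNPKPQRK
>tr|EX0003|EX0003_MOUSE Example protein 3
MAAAA
"""

print(count_sections(sample))
```

Because `Counter` returns 0 for missing keys, a file with no TrEMBL records simply reports zero for that section.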
4. INTRODUCTION
• Swiss-Prot was created at the Department of Medical Biochemistry of the University of Geneva and has been maintained in collaboration with the European Molecular Biology Laboratory (EMBL) since 1987
• Swiss-Prot strives to provide a high level of annotation, a minimal level of redundancy and integration with other databases
• It is now an equal partnership between the EMBL and the Swiss Institute of Bioinformatics (SIB)
• TrEMBL is a computer-annotated supplement to Swiss-Prot
• Swiss-Prot uses a format similar to that of the EMBL Nucleotide Sequence Database of the European Bioinformatics Institute
5. Features of Swiss-Prot
• Annotation
• Minimal Redundancy
• Integration with other databases
• Documentation
6. Annotation
Data in a Swiss-Prot entry falls into two classes:
Core data
• Sequence data
• Citation information (bibliographical references)
• Taxonomic data (description of the biological source of the protein)
Annotation
• Post-translational modification(s), for example phosphorylation, acetylation, etc.
• Domains and sites, for example calcium-binding regions, zinc fingers
• Secondary structure, for example alpha helix, beta sheet, etc.
• Quaternary structure, for example homodimer, heterotrimer, etc.
• Disease(s) associated with deficiencies in the protein
7. Minimal redundancy
• Much of the data about a protein comes from more than one literature report
• These data are condensed and merged into a single, concise and coherent entry
• Conflicts between the reports are listed in each entry
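The idea of merging several reports into one entry while keeping track of disagreements can be illustrated with a small Python sketch. It assumes, for simplicity, that every report gives a full sequence of the same length and resolves each position by majority vote. This is a hypothetical illustration of "merge, but record conflicts", not Swiss-Prot's actual curation procedure, which is carried out by expert curators; the function name `merge_reports` is made up.

```python
from __future__ import annotations

from collections import Counter


def merge_reports(reports: list[str]) -> tuple[str, list[tuple[int, set[str]]]]:
    """Merge equal-length sequence reports into one sequence plus a conflict list.

    Each position takes the residue reported most often (ties go to the residue
    seen first). Conflicts are 1-based positions where the reports disagree,
    together with the set of residues that were reported there.
    """
    if not reports:
        raise ValueError("need at least one report")
    length = len(reports[0])
    if any(len(r) != length for r in reports):
        raise ValueError("this sketch only handles reports of equal length")

    merged: list[str] = []
    conflicts: list[tuple[int, set[str]]] = []
    for i in range(length):
        column = Counter(r[i] for r in reports)
        merged.append(column.most_common(1)[0][0])  # majority residue
        if len(column) > 1:
            conflicts.append((i + 1, set(column)))  # record the disagreement
    return "".join(merged), conflicts


sequence, conflicts = merge_reports(["MKTAYIAK", "MKTAYLAK", "MKTAYIAK"])
print(sequence, conflicts)
```

In this example, two reports give I and one gives L at position 6, so the merged sequence keeps I and position 6 is listed as a conflict rather than silently discarded.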
8. Integration with other databases
• Swiss-Prot provides cross-references to external data collections
• Links the three types of sequence-related databases: nucleic acid sequences, protein sequences and protein tertiary structures
• Sample entry: Protein AAR2 homolog (gene Aar2), Mus musculus (Mouse)
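In the flat-file format, cross-references are stored in DR lines of the form `DR   DATABASE; IDENTIFIER; ...`, for example links to a nucleic acid database (EMBL) or a structure database (PDB). The minimal Python sketch below assumes that layout and collects the identifiers per database. The function name `parse_dr_lines` is hypothetical, and the sample lines use made-up identifiers.

```python
from __future__ import annotations

from collections import defaultdict


def parse_dr_lines(entry_text: str) -> dict[str, list[str]]:
    """Map each cross-referenced database to the identifiers listed for it."""
    xrefs: dict[str, list[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue  # only DR lines hold database cross-references
        body = line[5:].rstrip().rstrip(".")  # drop the line code and final period
        fields = [f.strip() for f in body.split(";")]
        if len(fields) < 2:
            continue  # malformed line: no identifier present
        database, identifier = fields[0], fields[1]
        xrefs[database].append(identifier)
    return dict(xrefs)


# Made-up entry fragment for illustration only.
SAMPLE_ENTRY = """\
ID   EXMPL_MOUSE             Reviewed;         100 AA.
DR   EMBL; XX000001; XXX00001.1; -; mRNA.
DR   PDB; 0XXX; X-ray; 2.00 A; A=1-100.
DR   EMBL; XX000002; XXX00002.1; -; Genomic_DNA.
//
"""

print(parse_dr_lines(SAMPLE_ENTRY))
```

Only the first two fields are kept here; the optional fields (such as the PDB method and resolution) vary by database and would need database-specific handling.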
10. Swiss-Prot Statistics
Number of entries (Jan 18, 2017 release)
• New entries: 245
• Updated entries: 64,182
• Unchanged entries: 489,047
• Total: 553,474
• Entries with updated sequences: 40
• With a fragmented amino acid sequence: 9,143
• With known alternative products: 24,759
Source: http://www.uniprot.org/statistics/Swiss-Prot
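The first rows of this table can be read as a partition of the release: assuming the categories do not overlap (the figures are consistent with this), new, updated and unchanged entries should add up to the total. The short Python check below confirms this for the Swiss-Prot figures and also computes the share of entries with a fragmented sequence; the variable names are illustrative.

```python
# Swiss-Prot figures from the table above (Jan 18, 2017 release).
new_entries = 245
updated_entries = 64_182
unchanged_entries = 489_047
reported_total = 553_474
fragmented = 9_143

# Assumed partition: every entry is new, updated or unchanged.
computed_total = new_entries + updated_entries + unchanged_entries
print(f"Computed total: {computed_total:,} (reported: {reported_total:,})")

# Share of entries whose amino acid sequence is only a fragment.
fragment_share = fragmented / reported_total
print(f"Entries with a fragmented sequence: {fragment_share:.2%}")
```

The computed total matches the reported 553,474, and about 1.65% of Swiss-Prot entries carry a fragmented sequence.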
11. TrEMBL: A computer-annotated supplement to Swiss-Prot
• TrEMBL (Translation of EMBL nucleotide sequence database) was introduced in 1996
Why TrEMBL?
• The data flow from genome projects to the sequence databases has increased sharply
• To maintain Swiss-Prot's high annotation quality
• To make new sequences available as quickly as possible
• TrEMBL consists of computer-annotated entries derived from the translation of all coding sequences (CDS) in the nucleotide sequence databases, except for CDS already included in Swiss-Prot.
• It also contains protein sequences extracted from the literature and protein sequences
submitted directly by the user community.
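The translation step that TrEMBL entries derive from can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and a CDS that starts in frame, and it stops at the first stop codon. It does not describe TrEMBL's actual production pipeline, and it ignores alternative genetic codes, which real pipelines must handle. The function name `translate_cds` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence into a one-letter protein sequence."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):  # step through complete codons only
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for ambiguous codons, e.g. with N
        if aa == "*":
            break  # a stop codon ends the translation
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAGTAA"))
```

The call above translates ATG, GCC and AAG to M, A and K and then stops at TAA, giving `MAK`.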
12. TrEMBL
TrEMBL is split into two sections:
SP-TrEMBL (Swiss-Prot TrEMBL)
• Contains sequences that will eventually be incorporated into Swiss-Prot
REM-TrEMBL (Remaining TrEMBL)
• Contains sequences that will not be incorporated into Swiss-Prot, for example synthetic sequences, patent application sequences, fragments of fewer than 8 amino acids, and coding sequences for which there is strong experimental evidence that they do not code for a real protein
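The rules listed for REM-TrEMBL can be written as a simple decision function. In the minimal Python sketch below, the flags on `Candidate` are hypothetical inputs (in practice this information comes from the source records and curator judgment), and any sequence shorter than 8 residues is treated as a qualifying fragment, which is a simplification.

```python
from dataclasses import dataclass

MIN_FRAGMENT_LENGTH = 8  # fragments of fewer than 8 amino acids go to REM-TrEMBL


@dataclass
class Candidate:
    """Hypothetical description of a translated sequence awaiting placement."""
    sequence: str
    synthetic: bool = False            # synthetic sequence
    from_patent: bool = False          # taken from a patent application
    evidence_not_coding: bool = False  # strong evidence it codes for no real protein


def trembl_section(c: Candidate) -> str:
    """Return the TrEMBL section the candidate would be placed in."""
    if (
        c.synthetic
        or c.from_patent
        or c.evidence_not_coding
        or len(c.sequence) < MIN_FRAGMENT_LENGTH
    ):
        return "REM-TrEMBL"  # will not be incorporated into Swiss-Prot
    return "SP-TrEMBL"  # expected to be incorporated into Swiss-Prot eventually


print(trembl_section(Candidate("MKTAYIAKQR")))
print(trembl_section(Candidate("MKTAY")))
```

Keeping each exclusion rule as its own flag makes it easy to see which criterion sent a sequence to REM-TrEMBL.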
13. TrEMBL Statistics
Number of entries (Jan 18, 2017 release)
• New entries: 3,031,100
• Updated entries: 20,906,527
• Unchanged entries: 49,774,254
• Total: 73,711,881
• Entries with updated sequences: 746
• With a fragmented amino acid sequence: 8,492,670
• With known alternative products: 0
Source: http://www.uniprot.org/statistics/TrEMBL
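Putting the two statistics tables side by side shows the difference in scale between the sections. The Python sketch below uses only the figures from the two tables; it checks that the TrEMBL categories add up to the total (assuming, as for Swiss-Prot, that new, updated and unchanged do not overlap), then compares fragment shares and overall size. The dictionary names are illustrative.

```python
# Figures copied from the two statistics tables (Jan 18, 2017 release).
SWISS_PROT = {"new": 245, "updated": 64_182, "unchanged": 489_047,
              "total": 553_474, "fragmented": 9_143}
TREMBL = {"new": 3_031_100, "updated": 20_906_527, "unchanged": 49_774_254,
          "total": 73_711_881, "fragmented": 8_492_670}

# New + updated + unchanged entries should add up to the release total.
trembl_sum = TREMBL["new"] + TREMBL["updated"] + TREMBL["unchanged"]
print(f"TrEMBL total check: {trembl_sum:,} vs {TREMBL['total']:,}")

# Share of entries whose amino acid sequence is only a fragment.
shares = {name: db["fragmented"] / db["total"]
          for name, db in (("Swiss-Prot", SWISS_PROT), ("TrEMBL", TREMBL))}
for name, share in shares.items():
    print(f"{name}: {share:.1%} fragmented")

# Number of TrEMBL entries per Swiss-Prot entry.
ratio = TREMBL["total"] / SWISS_PROT["total"]
print(f"TrEMBL / Swiss-Prot size ratio: {ratio:.0f}")
```

With these figures, TrEMBL is roughly 133 times the size of Swiss-Prot, and about 11.5% of its entries are fragments compared with about 1.7% in Swiss-Prot.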
15. CONCLUSION
• Swiss-Prot has continuously enhanced its format and content to keep pace with the growing body of knowledge in proteomics, while maintaining high-quality annotation
• Automated annotation procedures are used for Swiss-Prot in a very conservative manner
• The extensive integration of Swiss-Prot with specialized databases lets users navigate current knowledge in the life sciences and gain insight into the universe of proteins
Annotation includes, for example, the description of the function of a protein, its domain structure, post-translational modifications and variants.
In a Swiss-Prot entry, annotation is mainly found in the comment lines (CC), the feature table (FT) and the keyword lines (KW).
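Because every line of a Swiss-Prot flat-file entry begins with a two-letter line code, the annotation can be extracted by filtering on CC, FT and KW. The minimal Python sketch below assumes a layout of a two-letter code followed by three spaces, with content starting at column 6; the function names `annotation_lines` and `keywords` are hypothetical, and the sample entry is made up and heavily shortened.

```python
from __future__ import annotations

from collections import defaultdict

ANNOTATION_CODES = ("CC", "FT", "KW")  # comment, feature table, keyword lines


def annotation_lines(entry_text: str) -> dict[str, list[str]]:
    """Group the annotation lines of one entry by their two-letter line code."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        code, content = line[:2], line[5:].rstrip()
        if code in ANNOTATION_CODES:
            grouped[code].append(content)
    return dict(grouped)


def keywords(entry_text: str) -> list[str]:
    """Return the keywords from the KW lines, which are separated by semicolons."""
    kw_text = " ".join(annotation_lines(entry_text).get("KW", []))
    return [kw.strip().rstrip(".") for kw in kw_text.split(";") if kw.strip()]


# Made-up, shortened entry for illustration only.
SAMPLE_ENTRY = """\
ID   EXMPL_MOUSE             Reviewed;         100 AA.
AC   XX0001;
CC   -!- FUNCTION: Hypothetical example function.
FT   DOMAIN          10..50
FT                   /note="Hypothetical example domain"
KW   Phosphoprotein; Zinc-finger.
SQ   SEQUENCE   100 AA;
//
"""

print(annotation_lines(SAMPLE_ENTRY))
print(keywords(SAMPLE_ENTRY))
```

KW lines are joined before splitting, so a keyword list that wraps over several lines is still read as one list.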
In addition, TrEMBLnew is a weekly update to TrEMBL, built from new nucleotide sequences deposited in the EMBL Nucleotide Sequence Database. At each TrEMBL release, the TrEMBLnew entries are processed: any entries that are redundant with Swiss-Prot or TrEMBL are merged, and the remainder are moved into TrEMBL.
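The release-time processing of TrEMBLnew can be sketched as a merge-or-promote step. In the minimal Python sketch below, "redundant" simply means an identical sequence, which is a deliberate simplification of the real redundancy criteria; the function name `process_trembl_new` and all accessions are hypothetical.

```python
from __future__ import annotations


def process_trembl_new(
    new_entries: dict[str, str], existing: dict[str, str]
) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Split TrEMBLnew entries into merged and promoted ones.

    new_entries and existing map an entry ID to its protein sequence.
    Returns (merged, promoted): merged maps a kept entry ID to the new
    entry IDs folded into it; promoted holds the entries moved into TrEMBL.
    """
    by_sequence = {seq: acc for acc, seq in existing.items()}
    merged: dict[str, list[str]] = {}
    promoted: dict[str, str] = {}
    for entry_id, seq in new_entries.items():
        target = by_sequence.get(seq)
        if target is not None:
            # Identical sequence already present: merge instead of adding.
            merged.setdefault(target, []).append(entry_id)
        else:
            promoted[entry_id] = seq
            by_sequence[seq] = entry_id  # later duplicates merge into this entry
    return merged, promoted


existing = {"SP0001": "MKTAYIAKQR", "TR0001": "MSTNPKPQRK"}
new = {"NEW1": "MKTAYIAKQR", "NEW2": "MAAAGGGLLL", "NEW3": "MAAAGGGLLL"}
print(process_trembl_new(new, existing))
```

Here NEW1 is merged into the existing entry SP0001, NEW2 is promoted into TrEMBL, and NEW3 is merged into NEW2 because both arrived in the same batch.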
Automated annotation procedures are only applied to Swiss-Prot where they achieve the same level of quality as manual annotation.