Swiss-Prot
SAGRIKA CHUGH
(M.Tech Bioinformatics)
23-Jan-17 1
INTRODUCTION
• The Universal Protein Resource Knowledgebase (UniProtKB) is the central hub for the
collection of functional information on proteins.
It consists of two sections:
Swiss-Prot
• Reviewed
• Manually annotated
• Records with information extracted from
literature and curator-evaluated
computational analysis.
TrEMBL (Translated EMBL; EMBL is the
European Molecular Biology Laboratory)
• Unreviewed
• Computationally annotated
• Records that await full manual
annotation.
Source: http://www.uniprot.org/
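In the UniProtKB flat-file format, the review status appears on each entry's ID line as "Reviewed" (Swiss-Prot) or "Unreviewed" (TrEMBL). The following is a minimal sketch of telling the two sections apart from that field. The function name and the entry names in the sample lines are hypothetical placeholders, not real records.

```python
def section_of(id_line: str) -> str:
    """Return the UniProtKB section implied by the status field of an ID line."""
    if not id_line.startswith("ID"):
        raise ValueError("not an ID line")
    # ID line layout: ID   EntryName   Status;   Length AA.
    fields = id_line.split()
    status = fields[2].rstrip(";")
    if status == "Reviewed":
        return "Swiss-Prot"
    if status == "Unreviewed":
        return "TrEMBL"
    raise ValueError(f"unknown status: {status}")


# Hypothetical ID lines, for illustration only.
print(section_of("ID   EXMP1_MOUSE             Reviewed;         100 AA."))
print(section_of("ID   A0A000EXMP_MOUSE        Unreviewed;       100 AA."))
```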
• Swiss-Prot was created at the Department of Medical Biochemistry of the University of
Geneva and has been maintained in collaboration with the European Molecular Biology
Laboratory (EMBL) since 1987.
• Swiss-Prot strives to provide a high level of annotation, a minimal level of redundancy,
and integration with other databases.
• It is now an equal partnership between the EMBL and the Swiss Institute of
Bioinformatics (SIB).
• TrEMBL is a computer-annotated supplement to Swiss-Prot.
• Swiss-Prot entries use a format similar to that of the EMBL Nucleotide Sequence
Database, maintained at the European Bioinformatics Institute.
Features of Swiss-Prot
• Annotation
• Minimal Redundancy
• Integration with other databases
• Documentation
Annotation
Data in an entry fall into two classes: core data and annotation.
Core data
• Sequence data
• Citation information (bibliographical references)
• Taxonomic data (description of the biological source of the protein)
Annotation
• Post-translational modification(s), for example phosphorylation, acetylation, etc.
• Domains and sites, for example calcium-binding regions, zinc fingers.
• Secondary structure, for example alpha helix, beta sheet, etc.
• Quaternary structure, for example homodimer, heterotrimer, etc.
• Disease(s) associated with deficiencies in the protein
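In the flat-file format, annotation is found mainly in the comment lines (CC), the feature table (FT) and the keyword lines (KW). The sketch below groups the lines of one entry by their two-letter line code and keeps only those three. The sample entry is a made-up fragment used for illustration, and `annotation_lines` is a hypothetical helper name.

```python
from collections import defaultdict

# Line codes that carry most of the annotation in a flat-file entry.
ANNOTATION_CODES = ("CC", "FT", "KW")


def annotation_lines(entry_text: str) -> dict:
    """Group the CC, FT and KW lines of one flat-file entry by line code."""
    grouped = defaultdict(list)
    for line in entry_text.splitlines():
        # Each line starts with a two-letter code; content begins at column 6.
        code, content = line[:2], line[5:].rstrip()
        if code in ANNOTATION_CODES:
            grouped[code].append(content)
    return dict(grouped)


# Made-up entry fragment, for illustration only.
SAMPLE = """\
ID   EXMP_MOUSE              Reviewed;         100 AA.
CC   -!- SUBUNIT: Homodimer.
FT   MOD_RES         10
FT                   /note="Phosphoserine"
KW   Phosphoprotein; Zinc-finger.
//"""

for code, lines in annotation_lines(SAMPLE).items():
    print(code, lines)
```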
Minimal redundancy
• For many proteins, data come from more than one literature report
• These reports are condensed and merged into a single concise, coherent entry
• Conflicts between reports are listed in each entry
Integration with other databases
• Swiss-Prot provides cross-references to external data collections
• Integration between the three types of sequence-related databases (nucleic acid
sequences, protein sequences and protein tertiary structures)
• Swiss-Prot Sample Entry swiss prot entry.txt
• Original entry Aar2 - Protein AAR2 homolog - Mus musculus (Mouse) - Aar2 gene & protein.html
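In a flat-file entry, cross-references to external collections are carried on DR lines, each naming the target database followed by an identifier in that database. The following sketch extracts (database, identifier) pairs from such lines. The identifiers in the sample are placeholders rather than real accessions, and `cross_references` is a hypothetical helper name.

```python
def cross_references(entry_text: str) -> list:
    """Return (database, primary identifier) pairs from the DR lines of an entry."""
    refs = []
    for line in entry_text.splitlines():
        if line.startswith("DR   "):
            # Fields are separated by ';' and the line ends with a full stop.
            body = line[5:].rstrip().rstrip(".")
            fields = [field.strip() for field in body.split(";")]
            refs.append((fields[0], fields[1]))
    return refs


# Placeholder DR lines covering a nucleotide database, a structure database
# and a domain database; none of the identifiers is real.
SAMPLE = """\
DR   EMBL; XX000000; YYY00000.1; -; mRNA.
DR   PDB; 0XXX; X-ray; 2.00 A; A=1-100.
DR   InterPro; IPR000000; Example_domain.
"""

for database, identifier in cross_references(SAMPLE):
    print(database, identifier)
```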
Documentation
• All files documented and indexed.
• Documentation kept up-to-date.
Swiss-Prot Statistics
Number of entries
New entries 245
Updated entries 64,182
Unchanged entries 489,047
Total 553,474
Entries with updated sequences 40
With a fragmented AA sequence 9,143
With known alternative products 24,759
Source: http://www.uniprot.org/statistics/Swiss-Prot (Jan 18, 2017 release)
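As a quick consistency check on the figures above, the new, updated and unchanged counts should add up to the total. The sketch below verifies that and expresses two of the other counts as a share of all entries. The dictionary keys and function names are illustrative choices.

```python
# Swiss-Prot figures from the Jan 18, 2017 release, as listed above.
swiss_prot = {
    "new": 245,
    "updated": 64_182,
    "unchanged": 489_047,
    "total": 553_474,
    "fragments": 9_143,
    "alternative_products": 24_759,
}


def components_match_total(stats: dict) -> bool:
    """Check that new + updated + unchanged equals the reported total."""
    return stats["new"] + stats["updated"] + stats["unchanged"] == stats["total"]


def share(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


print(components_match_total(swiss_prot))
print(f"{share(swiss_prot['fragments'], swiss_prot['total']):.2f}% fragments")
print(f"{share(swiss_prot['alternative_products'], swiss_prot['total']):.2f}% with alternative products")
```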
TrEMBL: A computer-annotated supplement to Swiss-Prot
• TrEMBL (Translation of the EMBL nucleotide sequence database) was introduced in 1996.
Why TrEMBL?
• To cope with the increased data flow from genome projects into the sequence databases.
• To maintain the high annotation quality of Swiss-Prot.
• To make sequences available as quickly as possible.
• TrEMBL consists of computer-annotated entries derived from the translation of all
coding sequences (CDS) in the nucleotide sequence databases, except for CDS
already included in Swiss-Prot.
• It also contains protein sequences extracted from the literature and protein sequences
submitted directly by the user community.
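The translation step that gives TrEMBL its name can be sketched as follows. This is a minimal illustration using the standard genetic code only. It is not the pipeline used to build TrEMBL, and the short input sequence is made up.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence into protein, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up CDS: ATG (M), GCC (A), AAG (K), TAA (stop).
print(translate_cds("ATGGCCAAGTAA"))
```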
TrEMBL
SP-TrEMBL (Swiss-Prot TrEMBL)
• Contains sequences that will eventually be incorporated into Swiss-Prot.
REM-TrEMBL (Remaining TrEMBL)
• Contains sequences that will not be incorporated into Swiss-Prot, for example
synthetic sequences, patent application sequences, fragments of fewer than
8 amino acids, and coding sequences where there is strong experimental
evidence that the sequence does not code for a real protein.
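The split between the two TrEMBL subsections can be expressed as a simple rule. The sketch below is an illustration of the criteria listed above, not the actual curation logic. The `Candidate` class, its flag names and the function name are assumptions, and any sequence shorter than 8 residues is treated as a short fragment for simplicity.

```python
from dataclasses import dataclass

# Fragments shorter than this many amino acids are kept out of Swiss-Prot.
MIN_LENGTH = 8


@dataclass
class Candidate:
    """A translated sequence plus hypothetical flags describing its origin."""
    sequence: str
    synthetic: bool = False
    from_patent: bool = False
    not_real_protein: bool = False  # strong experimental evidence against it


def trembl_section(candidate: Candidate) -> str:
    """Assign a candidate to SP-TrEMBL or REM-TrEMBL using the listed criteria."""
    excluded = (
        candidate.synthetic
        or candidate.from_patent
        or candidate.not_real_protein
        or len(candidate.sequence) < MIN_LENGTH
    )
    return "REM-TrEMBL" if excluded else "SP-TrEMBL"


print(trembl_section(Candidate("MAKLVTESGH")))
print(trembl_section(Candidate("MAKLV")))
print(trembl_section(Candidate("MAKLVTESGH", from_patent=True)))
```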
TrEMBL Statistics
Number of entries
New entries 3,031,100
Updated entries 20,906,527
Unchanged entries 49,774,254
Total 73,711,881
Entries with updated sequences 746
With a fragmented AA sequence 8,492,670
With known alternative products 0
Source: http://www.uniprot.org/statistics/TrEMBL (Jan 18, 2017 release)
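Putting the two release totals side by side shows how small the manually reviewed section is relative to the computer-annotated one. The sketch below works only from the totals quoted in the two statistics slides. The variable names are illustrative.

```python
# Entry totals from the Jan 18, 2017 release statistics quoted in this deck.
SWISS_PROT_TOTAL = 553_474
TREMBL_TOTAL = 73_711_881

uniprotkb_total = SWISS_PROT_TOTAL + TREMBL_TOTAL

# Share of UniProtKB that is reviewed (Swiss-Prot), as a percentage.
reviewed_percent = 100.0 * SWISS_PROT_TOTAL / uniprotkb_total

# Number of TrEMBL entries per Swiss-Prot entry.
trembl_per_swiss_prot = TREMBL_TOTAL / SWISS_PROT_TOTAL

print(f"UniProtKB entries: {uniprotkb_total:,}")
print(f"Reviewed share: {reviewed_percent:.3f}%")
print(f"TrEMBL entries per Swiss-Prot entry: {trembl_per_swiss_prot:.1f}")
```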
Summary
Source: www.expasy.org
CONCLUSION
• Swiss-Prot has continuously enhanced its format and content to keep pace with the
growing body of knowledge in proteomics while maintaining a high quality of annotation.
• Automated annotation procedures are used for Swiss-Prot in a very conservative
manner.
• The extensive integration of Swiss-Prot with specialized databases enables users
to navigate through current knowledge in the life sciences, providing insight into
the universe of proteins.
References
• The Swiss-PROT protein knowledgebase and its supplement TrEMBL in 2003.
Brigitte Boeckmann et al. Nucl Acids Res (2003) 31 (1): 365-370.
• The Swiss-PROT protein sequence database and its supplement TrEMBL in 2000.
Amos Bairoch, Rolf Apweiler. Nucl Acids Res (2000) 28 (1): 45-48.
• The Swiss-PROT protein sequence data bank and its supplement TrEMBL in 1999.
Amos Bairoch, Rolf Apweiler. Nucl Acids Res (1999) 27 (1): 49-54.
• The Swiss-PROT protein sequence data bank and its supplement TrEMBL.
Amos Bairoch, Rolf Apweiler. Nucl Acids Res (1997) 25 (1): 31-36.
• The Swiss-PROT protein sequence data bank and its new supplement TrEMBL.
Amos Bairoch, Rolf Apweiler. Nucl Acids Res (1996) 24 (1): 21-25.
• http://www.uniprot.org/uniprot/

Editor's Notes

  • #3 Encyclopedia of proteins
  • #5 annotation (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.), except the CDSs already included in SWISS-PROT
  • #7 annotation is mainly found in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW)
  • #13 In addition, there is a weekly update to TrEMBL called TrEMBLnew. TrEMBLnew is produced from new nucleotide sequences deposited in the EMBL nucleotide sequence database. At each TrEMBL release, the TrEMBLnew entries are processed; any entries redundant against SWISS-PROT/TrEMBL (4) are merged and the remainder are then moved into TrEMBL (5).
  • #16 Automated annotation procedures are only applied where they achieve the same level of quality as manual annotation