This document describes the PRESAGE database, which aims to improve communication among structural genomics researchers. The database contains protein sequence annotations from experimental and computational research. Researchers can submit annotations about protein structures they are studying experimentally or predicting computationally. Annotations are classified either as experimental, which track progress toward a solved structure, or as predictions, which are recorded at three levels of detail. The database is publicly available online and allows registered users to receive notifications about annotations of interest.
4. Background
Structural Genomics
- term first used by Barry Honig, Wayne Hendrickson, and colleagues (1997) in the
context of solving structures across whole genomes.
- describes the high-throughput generation of new protein structures and their
analysis in the context of emerging genome sequence data (Terry Gaasterland,
July 1998).
Aim - to characterize the structure of the genome.
- to provide an experimental structure or a good theoretical model for every protein in
all completed genomes.
5. Approaches:
1) Experimental:
- provides essential information about a relatively small number of individual
proteins.
2) Computational:
- expands knowledge obtained from experiments and applies it to the potentially
large families of related proteins.
- Computational methods are first used to assign known protein structures to genomic proteins.
- The remaining proteins are clustered into families, and representatives from these
families are selected for experimental characterization. The newly solved
structures are compared with other proteins of known structure in classifications
such as SCOP, CATH or FSSP, to yield information about their evolution and
thence about function.
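The computational steps above (assign known structures, cluster the rest into families, pick representatives for experiment) can be sketched as follows. This is only an illustration: the protein IDs, the structure ID, the family labels, and the rule for choosing a representative are all assumptions, not part of PRESAGE or of any published target-selection method.

```python
from collections import defaultdict

def select_targets(proteins, known_structures, family_of):
    """Split proteins into structure-assigned ones and experimental targets.

    proteins: iterable of protein IDs (hypothetical identifiers).
    known_structures: dict of protein ID -> structure ID, for proteins that a
        computational method could assign to a solved structure.
    family_of: dict of protein ID -> family label (assumed to come from some
        clustering method that is not shown here).
    """
    assigned = {}
    families = defaultdict(list)
    for pid in proteins:
        if pid in known_structures:
            # Step 1: a known structure is assigned computationally.
            assigned[pid] = known_structures[pid]
        else:
            # Step 2: the remaining proteins are grouped by family.
            families[family_of[pid]].append(pid)
    # Step 3: one representative per family becomes an experimental target.
    # Taking the first ID in sorted order is a placeholder rule.
    targets = {fam: sorted(members)[0] for fam, members in families.items()}
    return assigned, targets

# Hypothetical input data, for illustration only.
proteins = ["P1", "P2", "P3", "P4", "P5"]
known = {"P1": "1abc"}
family_of = {"P2": "famA", "P3": "famA", "P4": "famB", "P5": "famB"}
assigned, targets = select_targets(proteins, known, family_of)
```

In practice the representative would be chosen on experimental grounds (for example, how tractable the protein is), which this sketch does not model.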
6. Why was PRESAGE developed?
No co-ordination in the selection of new structures in the PDB.
Impact - multiple groups may inadvertently begin studies on the same protein, even
though there are more than enough important families to go around.
Computational studies have often been performed in isolation, with researchers
unaware of their colleagues’ efforts or the details of their work.
Lack of consistent organization and repositories for these data.
7. PRESAGE DATABASE
Protein Resource Entailing Structural Annotation of Genomic Entities.
Aim - to improve communication among structural genomics researchers.
To achieve this,
• provides a repository of capsule information about progress in the field.
• aids in the distribution of this knowledge to the biology research community.
8. Database Model
Core - a database of protein sequences (derived from SWISS-PROT + TrEMBL), each of
which can carry annotations.
Unlike SWISS-PROT, the authors of the database do not create and edit these
annotations.
Instead, any active structural genomics researcher may submit information.
Original contributors retain full credit for their annotations.
Entries have links with information about the contributor & optional links to
relevant literature references and associated Web sites.
The database also provides annotated summary data and analyses.
9. ANNOTATIONS
The fundamental unit of information in PRESAGE.
Each annotation is attached to a single protein sequence entry.
It records the name of the annotator and the date on which it was entered.
Annotations have details specific to their class.
Each permits free-text comments, listings of relevant papers with MEDLINE
references, and links to other Web sites associated with the annotation.
Two classes:
1) Experimental
2) Prediction
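As a rough illustration of the fields listed above, an annotation could be modelled as a small record type. This is a sketch only: the class and field names are invented here and do not reflect PRESAGE's actual schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List

class AnnotationClass(Enum):
    """The two annotation classes named in the slides."""
    EXPERIMENTAL = "experimental"
    PREDICTION = "prediction"

@dataclass
class Annotation:
    """One annotation attached to a single protein sequence entry."""
    sequence_id: str                   # the sequence entry it is attached to
    annotator: str                     # name of the contributor
    entered_on: date                   # date the annotation was entered
    annotation_class: AnnotationClass
    comments: str = ""                 # free-text comments
    medline_refs: List[str] = field(default_factory=list)  # relevant papers
    web_links: List[str] = field(default_factory=list)     # associated sites

# Placeholder values, not real database content.
example = Annotation(
    sequence_id="SEQ0001",
    annotator="A. Researcher",
    entered_on=date(1999, 1, 1),
    annotation_class=AnnotationClass.EXPERIMENTAL,
    comments="Selected for structure determination.",
)
```

Class-specific details (experimental stages, prediction levels) would extend this common record.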
10. 1) Experimental Annotations:
Indicates that a protein has been selected for structure determination and tracks
the progress towards the solved structure.
Analogous to the NCBI/HUGO Human Genome Sequencing Index
(http://www.ncbi.nlm.nih.gov/HUGO/), which records sequencing efforts and so helps
prevent inadvertent overlapping studies.
Experimental annotators record the stages their experiments have reached and
specific details associated with those stages.
2) Prediction Annotations:
Computational biologists can register predicted protein structures at three levels:
11. A) Level 1: Assignment
associates a region of the sequence with a known structure, and asserts that the
two proteins will share a common fold.
B) Level 2: Alignment
augments this information by indicating how the database sequence maps onto the
solved structure.
C) Level 3: Model
provides predicted three-dimensional coordinates for the protein sequence.
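The three prediction levels can be read as one record that carries progressively more detail. The sketch below is a hypothetical data structure, not PRESAGE's storage format: the field names, the residue-numbering convention, and the rule that coordinates alone imply Level 3 are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Prediction:
    """A prediction annotation at Level 1, 2 or 3 (illustrative only)."""
    sequence_id: str
    region: Tuple[int, int]     # residue range of the sequence (assumed 1-based, inclusive)
    structure_id: str           # known structure asserted to share the fold
    # Level 2: how sequence residues map onto residues of the solved structure.
    alignment: Optional[Dict[int, int]] = None
    # Level 3: predicted three-dimensional coordinates (x, y, z).
    coordinates: Optional[List[Tuple[float, float, float]]] = None

    @property
    def level(self) -> int:
        """Return the most detailed level for which data is present."""
        if self.coordinates is not None:
            return 3
        if self.alignment is not None:
            return 2
        return 1

# Hypothetical examples; "1abc" is a placeholder structure ID.
assignment = Prediction("SEQ0001", (10, 120), "1abc")
aligned = Prediction("SEQ0001", (10, 120), "1abc", alignment={10: 5, 11: 6})
model = Prediction("SEQ0001", (10, 120), "1abc",
                   coordinates=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
```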
12. FACILITIES
• Retrieval of entries -
Several methods, including
- searches by various identifiers [those used by SWISS-PROT, TrEMBL and
GenBank] or
- by keywords in the SWISS-PROT description and comments about the proteins.
• Awareness Function-
- allows a user to register interest in a protein; the user then receives e-mail
notification when annotations are made to that protein.
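The two facilities above can be illustrated with a small in-memory sketch: a keyword search over descriptions and a registry that queues a notification for each user who registered interest in a protein. Everything here is hypothetical (class, function, and variable names, the message text, and the use of an `outbox` list in place of real e-mail); it is not how the PRESAGE server is implemented.

```python
from collections import defaultdict

def search_by_keyword(entries, keyword):
    """Return IDs of entries whose description contains the keyword.

    entries: dict of sequence ID -> description text (hypothetical data).
    The match is a case-insensitive substring test.
    """
    keyword = keyword.lower()
    return [sid for sid, desc in entries.items() if keyword in desc.lower()]

class AwarenessRegistry:
    """Tracks which users want to hear about which proteins."""

    def __init__(self):
        self._watchers = defaultdict(set)  # sequence ID -> e-mail addresses
        self.outbox = []                   # queued (address, message) pairs

    def register_interest(self, email, sequence_id):
        self._watchers[sequence_id].add(email)

    def annotation_added(self, sequence_id, annotator):
        # Queue one message per registered watcher of this protein.
        for email in sorted(self._watchers.get(sequence_id, ())):
            message = f"New annotation on {sequence_id} by {annotator}"
            self.outbox.append((email, message))

# Hypothetical usage.
entries = {"SEQ0001": "Putative kinase", "SEQ0002": "Hypothetical protein"}
hits = search_by_keyword(entries, "KINASE")

registry = AwarenessRegistry()
registry.register_interest("user@example.org", "SEQ0001")
registry.annotation_added("SEQ0001", "A. Researcher")
registry.annotation_added("SEQ0002", "A. Researcher")  # nobody is watching this one
```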
13. Availability
• The database is publicly available at http://presage.stanford.edu/.
• Contributors and individuals wishing to use the awareness function may register
on-line, through links from that page.
14. Conclusion
• The database will help to link researchers in the decentralized field of
structural genomics.
• It will help to make their results readily available.