Archives and Information Retrieval Lecture by; AQSAD RASHDA BIOINFORMATICS
Database indexing and specification of search terms
An index is a set of pointers to information in a database
Entries: discrete coherent parcels of information.
The information retrieval software
Primary data collections related to biological macromolecules include: Nucleic acid sequences, including whole-genome projects Amino acid sequences of proteins Protein and nucleic acid structures Small-molecule crystal structures Protein functions Expression patterns of genes Publications The archives
The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene Organism Classification Keyword Description Date Identification Accession Organism Source Reference Number
Reference Title Reference Author Reference Position Reference Location Feature Table Header
FT: The feature table may indicate regions that 1. perform or affect function 2. interact with other molecules 3. affect replication 4. are involved in recombination 5. are a repeated unit 6. have secondary or tertiary structure 7. are revised or corrected Sequence Header
The PIR maintains several databases about proteins:
1. PIR-PSD: the main protein sequence database
2. iProClass: classification of proteins according to structure and function
3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
4. P/R-NREF: a comprehensive non-redundant collection of over 800 000 protein sequences merged from all available sources
5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
6. ALN: a database of protein sequence alignments, and
7.RESID: a database of covalent protein structure modifications (recall that important structural features of proteins such as disulphide bridges are not inferrable from gene sequences, and will not appear in protein sequence databases derived solely by translation of genomic data)
Structure databases archive, annotate and distribute sets of atomic coordinates
Protein Data Bank (PDB).
The information contained includes:
What protein is the subject of the entry, and what species it came from
Who solved the structure, and references to publications describing the structure determination
Experimental details about the structure determination, including information related to the general quality of the result such as resolution of an X-ray structure determination and stereochemical statistics
The amino acid sequence
What additional molecules appear in the structure, including cofactors, inhibitors, and water molecules
VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses.
In the field of immunology :
IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in Immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species.
The IMGT server provides a common access to all Immunogenetics data.
At present, it includes two databases : IMGT/LIGM-DB , a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences, and IMGT/HLA-DB , a database of the human MHC referred to as HLA (Human Leucocyte Antigens)
Web Resource: Databases for Specific Protein Families
Collections of links to databases on specific protein families
KABAT - Database of Sequences of Proteins of Immunological Interest - North-Western University (USA)
MHCPEP - Major Histocompatibility Complex Binding Peptides Database - Walter and Eliza Hall Institute (Melbourne, Australia)