Protein databases


  2. Protein databases can be sequence databases or structure databases.
     Protein sequence database:
     The protein sequence database was developed at the National Biomedical Research Foundation (NBRF) at Georgetown University by Margaret Dayhoff in the 1960s.
     It was collaboratively maintained by PIR, JIPID (Japan International Protein Information Database) and MIPS (Martinsried Institute for Protein Sequences).
  3. PIR (Protein Information Resource) database:
     It is a major protein sequence database. Its entries are classified into four classes:
     PIR1: Classified and annotated entries.
     PIR2: Preliminary entries.
     PIR3: Unverified entries.
     PIR4: Conceptual translations of sequences that are not transcribed, that are genetically engineered, etc.
  4. SWISS-PROT:
     It is a protein sequence database maintained collaboratively by the Department of Medical Biochemistry at the University of Geneva and the EMBL Data Library.
     The database endeavours to provide high-level annotation: descriptions of the function of the protein, the structure of its domains, post-translational modifications, variants and so on.
     Entries are interlinked with many other sources and have minimal redundancy.
  5. TrEMBL:
     It was created in 1996 as a computer-annotated supplement to SWISS-PROT.
     The database contains translations of all coding sequences in the EMBL nucleotide sequence database.
     It has two main sections:
     SP-TrEMBL: Contains entries that have not yet been manually annotated but will eventually be incorporated into SWISS-PROT.
     REM-TrEMBL: Contains entries that will not be included in SWISS-PROT, e.g. immunoglobulin sequences and synthetic sequences.
     NRL-3D:
     This database is produced by PIR from sequences extracted from the PDB.
  6. NRL-3D is used for both similarity searches and keyword interrogation.
     The ATLAS retrieval system is used to access information from NRL-3D.
     Structural database:
     Structural databases store collections of three-dimensional structures of biological macromolecules such as proteins.
     The best-established database for protein structures is the Protein Data Bank (PDB).
  7. PDB:
     Each entry contains the following information:
     Name of the protein
     The species
     Description of the structure determination
     Amino acid sequence
     Additional information
     SCOP (Structural Classification of Proteins):
     SCOP describes the structural and evolutionary relationships between proteins of known structure.
     Proteins are clustered into families with clear evolutionary relationships if they have sequence identities of more than 30%.
     Proteins are said to have a common fold if they have the same secondary structures in the same arrangement, whether or not they have a common evolutionary origin.
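The 30% sequence-identity cutoff for SCOP families can be illustrated with a small calculation. The sketch below assumes the two sequences have already been aligned (gaps written as `-`) and counts identity only over columns where neither sequence has a gap. That choice of denominator is an assumption, since several conventions exist, and the function names are hypothetical rather than part of any SCOP tool.

```python
FAMILY_THRESHOLD = 30.0  # percent; the SCOP family cutoff quoted in the slide


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are ignored (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_family_threshold(aligned_a: str, aligned_b: str,
                           threshold: float = FAMILY_THRESHOLD) -> bool:
    """True if the pair exceeds the identity cutoff (strictly more than 30%)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

A pair sharing 3 of 10 aligned residues sits exactly at 30% and therefore does not pass, because the criterion is "more than 30%". Identity alone is only one signal; actual family assignment also draws on structural and functional evidence.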
  8. CATH database (Class, Architecture, Topology and Homology):
     Class is derived from gross secondary structure content and packing.
     Architecture describes the gross arrangement of secondary structures.
     Topology encompasses both the overall shape and the connectivity of secondary structures.
     Homology groups domains that share more than 35% sequence identity and are thought to share a common ancestor.
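The four CATH levels form a hierarchy, so a domain's classification can be written as four numbers, one per level. The sketch below assumes the dotted notation commonly used for CATH codes (for example `3.40.50.720`), which the slide itself does not show. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("class", "architecture", "topology", "homology")


@dataclass(frozen=True)
class CathCode:
    cath_class: int
    architecture: int
    topology: int
    homology: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        """Parse a dotted four-part code such as '3.40.50.720' (assumed format)."""
        parts = code.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {code!r}")
        c, a, t, h = (int(p) for p in parts)
        return cls(c, a, t, h)

    def as_tuple(self) -> tuple:
        return (self.cath_class, self.architecture, self.topology, self.homology)


def deepest_shared_level(a: CathCode, b: CathCode) -> Optional[str]:
    """Name of the deepest level at which two codes still agree, or None."""
    shared = None
    for level, x, y in zip(LEVELS, a.as_tuple(), b.as_tuple()):
        if x != y:
            break  # levels are nested, so stop at the first mismatch
        shared = level
    return shared
```

Two domains whose codes agree in the first three numbers but differ in the fourth share a topology without being grouped at the homology level.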
  9. Other databases:
     DALI: Based on extraction of similar structures from distance matrices.
     CE: Database of structural alignments.
     Proteopedia: A collaborative 3D encyclopedia of proteins and other molecules.
     OPM: Provides the spatial positions of three-dimensional protein structures with respect to the lipid bilayer.
     Conserved Domain Database: A collection of sequence alignments and profiles representing protein domains conserved in molecular evolution.
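The distance matrices mentioned for DALI can be made concrete with a short example. The sketch below builds the matrix of pairwise distances between residue positions (for instance C-alpha coordinates) and compares two such matrices with a simple average difference. It is not DALI's actual comparison procedure, only an illustration of the representation. The function names and the scoring are assumptions.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of pairwise Euclidean distances between 3D points."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def mean_abs_difference(m1, m2):
    """Average absolute difference over the upper triangles of two matrices."""
    n = len(m1)
    if n != len(m2):
        raise ValueError("matrices must have the same size")
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    total = sum(abs(m1[i][j] - m2[i][j])
                for i in range(n) for j in range(i + 1, n))
    return total / pairs
```

Because the matrix depends only on distances within one structure, it is unchanged when the structure is translated or rotated, which is why two structures can be compared this way without first superimposing them.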
