Structural databases and their classification, by Abdul Qahar


Published in: Technology

  1. 1. Amino acids
  2. 2. Presented by Abdul Qahar (A Q), Buner Campus. Edited, prepared and shared by Abdul Qahar
  3. 3. Structural databases and their classification.
  4. 4. Basic concepts about databases. What is a database? A database is a collection of data which can be used: • alone, or • combined with / related to other data to provide answers to the user’s question.
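The two ways of using a database named on this slide (alone, or related to other data) can be sketched with Python's built-in `sqlite3` module. This is only an illustrative sketch: the table names, accession IDs and residues below are invented and do not come from any real database.

```python
import sqlite3

# Hypothetical in-memory database; all records are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequences (acc TEXT PRIMARY KEY, residues TEXT);
    CREATE TABLE structures (acc TEXT, method TEXT);
    INSERT INTO sequences VALUES ('X0001', 'MKTAYIAK'), ('X0002', 'GSHMLEDP');
    INSERT INTO structures VALUES ('X0001', 'X-ray');
""")

def sequence_alone(acc):
    """Use one table on its own: look up a sequence by accession."""
    row = conn.execute(
        "SELECT residues FROM sequences WHERE acc = ?", (acc,)
    ).fetchone()
    return row[0] if row else None

def sequences_with_structure():
    """Relate two tables: which sequences also have a structure entry?"""
    return conn.execute(
        "SELECT s.acc, s.residues, t.method "
        "FROM sequences s JOIN structures t ON s.acc = t.acc "
        "ORDER BY s.acc"
    ).fetchall()
```

Calling `sequence_alone("X0002")` answers a question from one table only, while `sequences_with_structure()` answers a question that needs both tables.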
  5. 5. Data types • Primary data: sequence (DNA, amino acid), e.g. DMPVERILEALAVE… → primary database • Secondary data: secondary protein structure (e.g. alpha-helices, beta-strands) and “motifs” (regular expressions, blocks, profiles, fingerprints) → secondary db • Tertiary data: tertiary protein structure (domains, folding units; atomic co-ordinates) → tertiary db • Interaction data: binary protein-protein interactions/networks, pathways and functional networks → interaction db
  6. 6. Primary biological databases • Nucleic acid databases: EMBL, GenBank, DDBJ (DNA Data Bank of Japan) • Protein databases: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D
  7. 7. Nucleotide Databases • EMBL: Nucleotide sequence database • Ensembl: Automatic annotation of eukaryotic genomes • Genome Server: Overview of completed genomes at EBI • Genome-MOT: Genome monitoring table • EMBL-Align: Multiple sequence alignment database
  8. 8. Sequence data = strings of letters • Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T) • Triplet codons → genetic code → 20 amino acids (A, L, V, S etc.)
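Because sequence data are plain strings, reading triplet codons is ordinary string slicing. The sketch below is a minimal illustration, not a full translator: `CODON_TABLE` holds only the six codons used in the example (taken from the standard genetic code), and the function names and the example DNA string are made up for this illustration.

```python
# Partial codon table: only the codons needed for the example below.
# "*" marks a stop codon. A real table has all 64 codons.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "CTG": "L",
    "GTT": "V", "TCT": "S", "TAA": "*",
}

def codons(dna):
    """Split a DNA string into consecutive triplets, dropping any trailing bases."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]

def translate(dna):
    """Translate codon by codon until a stop codon; unknown codons become 'X'."""
    protein = []
    for codon in codons(dna):
        amino_acid = CODON_TABLE.get(codon, "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For instance, `translate("ATGGCTCTGGTTTCTTAA")` reads the codons ATG, GCT, CTG, GTT, TCT and stops at TAA.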
  9. 9. Three-dimensional protein structure = atomic coordinates in 3D space
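To make "atomic coordinates in 3D space" concrete, the sketch below reads x, y, z from a PDB-style ATOM line and measures the distance between two atoms. It assumes the fixed-width layout of the PDB format (x, y, z in columns 31-38, 39-46 and 47-54). The helper `format_atom` is a simplified, hypothetical line builder used only to produce example input, and the atoms are invented.

```python
import math

def format_atom(serial, name, resname, chain, resseq, x, y, z):
    """Build a simplified PDB-style ATOM line (fixed-width columns, no occupancy etc.)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_xyz(line):
    """Read the x, y, z coordinates (in angstroms) from columns 31-54 of an ATOM line."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))

def distance(line_a, line_b):
    """Euclidean distance between the atoms on two ATOM lines."""
    return math.dist(parse_xyz(line_a), parse_xyz(line_b))

# Two invented atoms, for illustration only.
atom_a = format_atom(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0)
atom_b = format_atom(2, "CA", "MET", "A", 1, 3.0, 4.0, 0.0)
```

A structure file is essentially a long list of such lines, one per atom, so distances, contacts and overall shape all derive from these three numbers per atom.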
  10. 10. Protein folding
  11. 11. EMBL/GenBank/DDBJ • These 3 databases contain mainly the same information (with a few differences in format and syntax) • Serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from: – Genome projects and sequencing centers – Individual scientists – Patent offices (e.g. USPTO, EPO) • Non-confidential data are exchanged daily.
  12. 12. Databases related to Genomics • Contain information on genes, gene location (mapping), gene nomenclature and links to sequence databases; • Exist for most organisms important for life science research; • Examples: MIM, GDB (human), MGD (mouse), FlyBase (Drosophila), SGD (yeast), MaizeDB (maize), SubtiList (B. subtilis), etc.
  13. 13. Swiss-Prot • Annotated protein sequence database established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and EBI • Complete, curated, non-redundant and cross-referenced with 34 other databases • Available from a variety of servers and through sequence analysis software tools • More than 8,000 different species • First 20 species represent about 42% of all sequences in the database • More than 129,000 entries with 4.7 × 10^7 amino acids
  14. 14. PDB: Protein Data Bank • Holds 3D models of biological macromolecules (proteins, RNA, DNA). • All data are available to the public. • Structures are obtained by X-ray crystallography (84%) or NMR spectroscopy (16%). • Submitted by biologists and biochemists from around the world.
  15. 15. EMBL Nucleotide Sequence Database • An annotated collection of all publicly available nucleotide and protein sequences • Created in 1980 at the European Molecular Biology Laboratory in Heidelberg. • Maintained since 1994 by EBI, Cambridge.
  16. 16. DDBJ – DNA Data Bank of Japan • An annotated collection of all publicly available nucleotide and protein sequences • Started in 1984 at the National Institute of Genetics (NIG) in Mishima. • Still maintained at this institute by a team led by Takashi Gojobori.
  17. 17. Why protein structure? Proteins are fundamental components of all living cells, performing a variety of biological tasks. Each protein has a particular 3D structure that determines its function. Protein structure is more conserved than protein sequence, and more closely related to function.
  18. 18. Supersecondary structures: assemblies of secondary structure elements that are shared by many protein structures. Examples: • Beta hairpin • Beta-alpha-beta unit • Helix hairpin
  19. 19. Structural Databases • SCOP: Structural Classification of Proteins. Current release: 686 folds; 1,073 superfamilies; 1,827 families, representing 15,979 PDB entries • CATH: Class, Architecture, Topology, Homologous superfamily
  20. 20. Levels in SCOP 1. Class 2. Fold 3. Superfamily 4. Family
  21. 21. Major classes in SCOP • Classes – All alpha proteins – All beta proteins – Alpha and beta proteins (a/b) – Alpha and beta proteins (a+b) – Multi-domain proteins – Membrane and cell surface proteins – Small proteins
  22. 22. Folds • Each class may be divided into one or more folds • Proteins which have the same secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold
  23. 23. Superfamilies • Superfamilies are subdivisions of folds • A superfamily contains proteins which are thought to be evolutionarily related due to – Sequence – Function – Special structural features • Relationships between members of a superfamily may not be readily recognizable from the sequence alone
  24. 24. Families • Subdivisions of superfamilies • Contain members whose relationship is readily recognizable from the sequence • Families are further subdivided into Proteins • Proteins are divided into Species – The same protein may be found in several species
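The SCOP levels described on the preceding slides (class, fold, superfamily, family, then protein and species) form a strict hierarchy, which can be modelled as an ordered record. The sketch below is an assumption-laden illustration: `ScopEntry` and `deepest_shared_level` are hypothetical names, and the example classifications are written from memory for illustration rather than copied from a SCOP release.

```python
from dataclasses import dataclass, astuple

# Level names in order, from most general to most specific.
SCOP_LEVELS = ("class", "fold", "superfamily", "family", "protein", "species")

@dataclass(frozen=True)
class ScopEntry:
    scop_class: str
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str

def deepest_shared_level(a, b):
    """Return the most specific level at which two entries still agree, or None."""
    shared = None
    for level, x, y in zip(SCOP_LEVELS, astuple(a), astuple(b)):
        if x != y:
            break
        shared = level
    return shared

# Illustrative entries only; the labels are not taken from a specific SCOP release.
hemoglobin = ScopEntry("All alpha proteins", "Globin-like", "Globin-like",
                       "Globins", "Hemoglobin, alpha chain", "Human")
myoglobin = ScopEntry("All alpha proteins", "Globin-like", "Globin-like",
                      "Globins", "Myoglobin", "Sperm whale")
antibody = ScopEntry("All beta proteins", "Immunoglobulin-like", "Immunoglobulin",
                     "V set domains", "Immunoglobulin light chain", "Human")
```

Here the two globins agree down to the family level, while the globin and the antibody domain already differ at the class level, so they share no level at all.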
  25. 25. All alpha: Hemoglobin
  26. 26. All beta: Immunoglobulin (8fab)
  27. 27. Alpha/beta: Triosephosphate isomerase
  28. 28. CATH levels • Class • Architecture – this level is unique to CATH • Topology – roughly Fold (/superfamily) in SCOP • Homologous superfamily – roughly Superfamily (/family) in SCOP
  29. 29. Architecture • Same overall arrangement of secondary structures – Example: the two-layer beta-sheet architecture contains different folds, each with a distinct number and connectivity of strands
  30. 30. Abdul Qahar Buneri