Presented by Abdul Qahar (A Q), Buner Campus
Edited, prepared and shared by Abdul Qahar
Structural databases and their classification
Basic concepts about databases
1. What is a database?
A database is a collection of data that can be used:
• alone, or
• combined with / related to other data
to answer users' questions.
Data types
• Primary data: sequences (DNA, amino acid), e.g. DMPVERILEALAVE… → primary databases
• Secondary data: secondary protein structure (e.g. alpha-helices, beta-strands) and "motifs" (regular expressions, blocks, profiles, fingerprints) → secondary databases
• Tertiary data: tertiary protein structure (domains, folding units, atomic coordinates) → tertiary databases
• Interaction data: binary protein-protein interactions/networks, pathways and functional networks → interaction databases
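Motifs stored as "regular expressions" in secondary databases can be matched against sequences with ordinary regex tools. The sketch below converts a simplified subset of PROSITE-style pattern syntax into a Python regular expression. It assumes only these elements: `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<`/`>` for the sequence ends. The function name `prosite_to_regex` and the example patterns and sequences are hypothetical illustrations, not taken from any database entry.

```python
import re

# One pattern element: x, [allowed], {forbidden} or a single residue,
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a regex string."""
    pattern = pattern.strip().rstrip(".")
    anchor_start = pattern.startswith("<")  # '<': match must start at the N-terminus
    anchor_end = pattern.endswith(">")      # '>': match must end at the C-terminus
    pattern = pattern.lstrip("<").rstrip(">")

    pieces = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)

    return ("^" if anchor_start else "") + "".join(pieces) + ("$" if anchor_end else "")


# Hypothetical example pattern, for illustration only.
print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."))
# C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

# Hypothetical N-terminal pattern: Met, then anything but Pro, then Ser or Thr.
n_term = prosite_to_regex("<M-{P}-[ST]")
print(bool(re.search(n_term, "MKSLLV")))  # True
print(bool(re.search(n_term, "MPSLLV")))  # False
```

Real pattern languages have extra cases (for example `>` inside brackets), so this subset is a teaching sketch rather than a complete parser.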
Primary biological databases
• Nucleic acid databases
– EMBL
– GenBank
– DDBJ (DNA Data Bank of Japan)
• Protein databases
– PIR
– MIPS
– SWISS-PROT
– TrEMBL
– NRL-3D
Nucleotide Databases
• EMBL: Nucleotide sequence database
• Ensembl: Automatic annotation of eukaryotic genomes
• Genome Server: Overview of completed genomes at EBI
• Genome-MOT: Genome monitoring table
• EMBL-Align: Multiple sequence alignment database
Sequence data = strings of letters
• Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• Triplet codons are translated through the genetic code into the 20 amino acids (A, L, V, S, etc.)
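The step from nucleotide letters to amino-acid letters can be sketched in a few lines. This minimal example assumes the standard genetic code, a DNA string containing only A, C, G and T that is already in the correct reading frame, and translation that stops at the first stop codon. The names `CODON_TABLE` and `translate` and the example DNA strings are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence until the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # step through complete triplet codons
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGATCCTTAA"))     # MDP
print(translate("gatatgcctgttgaa"))  # DMPVE
```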
Three-dimensional protein structure = atomic coordinates in 3D space
Protein folding
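Because a three-dimensional structure is stored as (x, y, z) coordinates, geometric properties follow from simple arithmetic. The sketch below computes an interatomic distance and an unweighted radius of gyration (the root-mean-square distance of the atoms from their centroid, ignoring atomic masses) for a hypothetical set of four coordinates in ångströms. The function names and the coordinates are invented for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.dist(a, b)


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


# Hypothetical coordinates (in ångströms) for four atoms.
atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(distance(atoms[0], atoms[1]))  # 5.0
print(radius_of_gyration(atoms))     # 2.5
```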
EMBL/GenBank/DDBJ
• These three databases contain essentially the same information, with small differences in format and syntax.
• They serve as archives of all sequences (single genes, ESTs, complete genomes, etc.) derived from:
– Genome projects and sequencing centers
– Individual scientists
– Patent offices (e.g. USPTO, EPO)
• Non-confidential data are exchanged among them daily.
Databases related to Genomics
• Contain information on genes, gene location (mapping), gene nomenclature, and links to sequence databases
• Exist for most organisms important for life science research
• Examples: MIM, GDB (human), MGD (mouse), FlyBase (Drosophila), SGD (yeast), MaizeDB (maize), SubtiList (B. subtilis), etc.
Swiss-Prot
• Annotated protein sequence database, established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EBI
• Curated, non-redundant and highly cross-referenced (links to 34 other databases)
• Available from a variety of servers and through sequence analysis software tools
• Covers more than 8,000 different species
• The 20 best-represented species account for about 42% of all sequences in the database
• More than 129,000 entries, containing about 4.7 × 10^7 amino acids
PDB: Protein Data Bank
• Holds 3D structures of biological macromolecules (proteins, RNA, DNA).
• All data are publicly available.
• Structures are determined by X-ray crystallography (84%) or NMR spectroscopy (16%).
• Submitted by biologists and biochemists from around the world.
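In the classic PDB flat-file format, each atom is a fixed-column `ATOM` (or `HETATM`) record, with the x, y and z coordinates in columns 31-38, 39-46 and 47-54. The sketch below extracts the main fields with plain string slicing. The function name `parse_atom_records` and the two sample records are illustrative and not copied from a specific entry; in practice an established parser such as Biopython's `Bio.PDB`, which also reads the newer mmCIF format, is usually the safer choice.

```python
def parse_atom_records(lines):
    """Extract basic fields from fixed-column ATOM/HETATM records of a PDB file."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END and other record types
        atoms.append({
            "serial": int(line[6:11]),          # columns 7-11
            "name": line[12:16].strip(),        # columns 13-16
            "res_name": line[17:20].strip(),    # columns 18-20
            "chain": line[21],                  # column 22
            "res_seq": int(line[22:26]),        # columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


# Illustrative records (not from a specific PDB entry).
sample = [
    "HEADER    ILLUSTRATIVE EXAMPLE",
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147  1.00  0.00",
    "END",
]

for atom in parse_atom_records(sample):
    print(atom["name"], atom["res_name"], atom["chain"], atom["res_seq"], atom["xyz"])
# N MET A 1 (11.104, 6.134, -6.504)
# CA MET A 1 (11.639, 6.071, -5.147)
```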
EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide and protein sequences
• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg
• Maintained since 1994 by the EBI in Cambridge
DDBJ: DNA Data Bank of Japan
• An annotated collection of all publicly available nucleotide and protein sequences
• Started in 1984 at the National Institute of Genetics (NIG) in Mishima
• Still maintained at this institute, by a team led by Takashi Gojobori
Why Protein Structure?
Proteins are fundamental components of all living cells, performing a wide variety of biological tasks.
Each protein has a particular 3D structure that determines its function.
Protein structure is more conserved than protein sequence during evolution, and more closely related to function.
Supersecondary structures
Assemblies of secondary structure elements that recur in many different protein structures, for example:
• Beta hairpin
• Beta-alpha-beta unit
• Helix hairpin
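If each residue is labelled with a three-state secondary-structure code (assumed here: H for helix, E for strand, C for coil, a common reduction of DSSP output), candidate supersecondary units can be searched for as patterns along the chain. This sketch only checks the order of elements along the sequence; confirming a real beta hairpin or beta-alpha-beta unit needs 3D information, such as whether the two strands pair in the same sheet and run antiparallel or parallel. The pattern definitions, the 2-5 residue loop limit and the example string are all hypothetical choices.

```python
import re

# Order-only patterns over a three-state string (H = helix, E = strand, C = coil).
# The 2-5 residue loop limit is an arbitrary assumption for this sketch.
CANDIDATE_PATTERNS = {
    "beta hairpin": r"E+C{2,5}E+",
    "beta-alpha-beta unit": r"E+C*H+C*E+",
    "helix hairpin": r"H+C{2,5}H+",
}


def find_candidates(ss: str):
    """Return (motif, start, end) for non-overlapping matches of each pattern."""
    hits = []
    for motif, pattern in CANDIDATE_PATTERNS.items():
        for m in re.finditer(pattern, ss):
            hits.append((motif, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])


ss = "CCEEEECCEEEEECCHHHHHHHHCCEEEEECC"  # hypothetical secondary-structure string
for motif, start, end in find_candidates(ss):
    print(motif, start, end, ss[start:end])
# beta hairpin 2 13 EEEECCEEEEE
# beta-alpha-beta unit 8 30 EEEEECCHHHHHHHHCCEEEEE
```

Candidates from different patterns may overlap, since the same strand can take part in more than one unit.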
Structural Databases
SCOP: Structural Classification of Proteins
Release described here: 686 folds; 1,073 superfamilies; 1,827 families, representing 15,979 PDB entries
CATH: Class, Architecture, Topology, Homologous superfamily
Levels in SCOP
1. Class
2. Folds
3. Superfamilies
4. Families
Major classes in SCOP
• Classes
– All alpha proteins
– All beta proteins
– Alpha and beta proteins (α/β)
– Alpha and beta proteins (α+β)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins
Folds
• Each class is divided into one or more folds
• Proteins that have the same secondary structure elements, arranged in the same order in the protein chain and in the same way in three dimensions, are classified as having the same fold
Superfamilies
• Superfamilies are subdivisions of folds
• A superfamily contains proteins that are thought to be evolutionarily related on the basis of:
– Sequence
– Function
– Special structural features
• Relationships between members of a superfamily may not be readily recognizable from the sequence alone
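Whether a relationship is readily recognizable from sequence is often judged from pairwise sequence identity over an alignment. A minimal sketch, assuming the two sequences are already aligned to equal length with `-` marking gaps, and counting identity only over columns where neither sequence has a gap (other conventions divide by the shorter sequence or the full alignment length, giving different numbers). The function name and both example alignments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical alignment: 11 identical residues out of 14 gap-free columns.
print(round(percent_identity("DMPVERILEA-LAVE", "DMPIERLLEAKLSVE"), 1))  # 78.6
```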
Families
• Subdivisions of superfamilies
• Contain members whose relationship is readily recognizable from the sequence
• Families are further subdivided into proteins
• Proteins are divided into species
– The same protein may be found in several species
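Because the SCOP levels form a strict hierarchy, each domain can be described by its path from class down to species, and two domains can be compared by the deepest level they share. The sketch below uses placeholder labels (hypothetical, not real SCOP identifiers): two domains that share a superfamily but not a family are probably evolutionarily related even if that is not obvious from their sequences.

```python
from typing import NamedTuple, Optional

LEVELS = ("class", "fold", "superfamily", "family", "protein", "species")


class ScopPath(NamedTuple):
    """Hypothetical record of one domain's position in the SCOP hierarchy."""
    cls: str  # 'class' is a Python keyword, hence 'cls'
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str


def deepest_shared_level(a: ScopPath, b: ScopPath) -> Optional[str]:
    """Return the lowest SCOP level at which two domains are grouped together."""
    shared = None
    for level, label_a, label_b in zip(LEVELS, a, b):
        if label_a != label_b:
            break  # paths diverge here, so the domains sit in different nodes below
        shared = level
    return shared


# Placeholder labels, not real SCOP entries.
d1 = ScopPath("all alpha", "fold A", "superfamily A1", "family A1a", "protein p", "species 1")
d2 = ScopPath("all alpha", "fold A", "superfamily A1", "family A1b", "protein q", "species 2")
d3 = ScopPath("all alpha", "fold A", "superfamily A1", "family A1a", "protein p", "species 2")

print(deepest_shared_level(d1, d2))  # superfamily
print(deepest_shared_level(d1, d3))  # protein
```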
Examples:
• All alpha: hemoglobin
• All beta: immunoglobulin (8fab)
• Alpha/beta: triosephosphate isomerase
CATH
• Levels
– Class
– Architecture (this level is unique to CATH)
– Topology (roughly a fold, or superfamily, in SCOP)
– Homologous superfamily (roughly a superfamily, or family, in SCOP)
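CATH labels each node with a dotted code containing one number per level (class.architecture.topology.homologous superfamily), so the prefixes of a code name the enclosing levels. A minimal sketch: the four class names are CATH's top-level classes, while the function name `cath_levels` and the example code are illustrative only.

```python
CATH_CLASSES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}

LEVEL_NAMES = ("class", "architecture", "topology", "homologous superfamily")


def cath_levels(code: str) -> dict:
    """Split a dotted CATH code into the node ID at each level it covers."""
    parts = code.split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH code: {code!r}")
    levels = {
        name: ".".join(parts[: i + 1])  # each level's ID is a prefix of the full code
        for i, name in enumerate(LEVEL_NAMES[: len(parts)])
    }
    levels["class name"] = CATH_CLASSES.get(parts[0], "unknown")
    return levels


print(cath_levels("3.40.50.300"))
# {'class': '3', 'architecture': '3.40', 'topology': '3.40.50',
#  'homologous superfamily': '3.40.50.300', 'class name': 'Alpha Beta'}
```

Two domains whose codes share the first three numbers therefore share a topology, the level CATH treats as roughly equivalent to a SCOP fold.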
Architecture
• Same overall arrangement of secondary structures, regardless of their connectivity
– Example: the two-layer beta-sheet architecture contains different folds, each with a distinct number and connectivity of strands
Abdul Qahar Buneri abdulqahar045@gmail.com
www.slideshare.net/abdulqahar045
