BIOLOGICAL DATABASES
A. SEKHAR REDDY
1st Sem M.S. (Pharm)
Department of Pharmacoinformatics
NIPER S.A.S Nagar
• Biological databases are libraries of life sciences information, collected from scientific
experiments, published literature, high-throughput experimental technologies, and
computational analysis.
• They contain information from research areas including genomics, proteomics,
metabolomics, microarray gene expression, and phylogenetics.
• Information contained in biological databases includes gene function, structure, and
localization (both cellular and chromosomal), the clinical effects of mutations, and
similarities between biological sequences and structures.
Biological databases
  Based on the source
    Primary database
    Secondary database
  Based on the nature of data
    Sequence database
      Nucleotide database (primary, secondary)
      Protein database (primary, secondary)
    Structure database (primary, secondary)
    Literature
    Gene expression
    Metabolic pathway
• Primary databases: experimental results are submitted directly to the database by
researchers, and the data are essentially archival in nature.
• Secondary databases: comprise data derived from the results of analysing primary data.
Primary nucleotide database | Secondary nucleotide database | Primary protein database | Secondary protein database
GenBank                     | UniGene                       | PIR                      | PROSITE
EMBL                        | Ensembl                       | SWISS-PROT               | PRINTS
DDBJ                        | EBI Genomes                   | NRL-3D                   | TrEMBL
NUCLEOTIDE DATABASES
• GenBank (www.ncbi.nlm.nih.gov/Genbank/): maintained by the National Center for
Biotechnology Information (NCBI); contains nucleotide sequences together with their
amino acid translations and is part of the International Nucleotide Sequence Database
Collaboration.
• EMBL (www.ebi.ac.uk/embl/): the EMBL (European Molecular Biology Laboratory)
nucleotide sequence database is maintained by the European Bioinformatics Institute
(EBI); it incorporates, organises and distributes nucleotide sequences from public
sources.
• DDBJ (www.ddbj.nig.ac.jp): DNA Data Bank of Japan, a biological database that
collects DNA sequences.
• UniGene: an NCBI database of the transcriptome and thus, despite the name, not
primarily a database of genes.
• Ensembl and EBI Genomes: contain data derived from EMBL-EBI.
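As a rough illustration of how a GenBank record can be retrieved programmatically, the sketch below builds a request URL for the NCBI E-utilities EFetch service. The function name genbank_fetch_url is hypothetical, and the accession U49845 is used only as an example identifier; the sketch constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str) -> str:
    """Return an EFetch URL asking for one nucleotide record in GenBank flat-file format.

    Hypothetical helper for illustration; it only assembles the query string.
    """
    params = {
        "db": "nuccore",     # nucleotide collection
        "id": accession,     # accession number of the record
        "rettype": "gb",     # GenBank flat-file layout
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Example accession, assumed here purely for demonstration.
print(genbank_fetch_url("U49845"))
```

The resulting URL can be opened in a browser or passed to any HTTP client; NCBI asks users to respect its request-rate guidelines when doing so.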
PROTEIN DATABASES
• PIR: Protein Information Resource, maintained by the National Biomedical Research
Foundation (NBRF).
• SWISS-PROT: maintained collaboratively by the Swiss Institute of Bioinformatics (SIB)
and the EBI, an outstation of EMBL.
• NRL-3D: produced by PIR.
• UniProt: Universal Protein Resource, a central repository of protein data created by
combining the Swiss-Prot, TrEMBL and PIR-PSD databases.
• PROSITE: consists of entries describing protein families, domains and functional
sites, together with the amino acid patterns and profiles that identify them.
• PRINTS: protein fingerprint database; it provides both a detailed annotation
resource for protein families and a diagnostic tool for newly determined sequences.
• TrEMBL: (translated EMBL) a computer-annotated supplement of Swiss-Prot that
contains all the translations of EMBL nucleotide sequence entries not yet
integrated in Swiss-Prot.
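To make the idea of a PROSITE amino acid pattern concrete, the sketch below converts the basic PROSITE pattern notation into a Python regular expression and scans a sequence with it. The function name prosite_to_regex is hypothetical, and the converter is a simplification: it handles single residues, x wildcards, [..] allowed sets, {..} excluded sets, (n) and (n,m) repeats and the < and > anchors, but not anchors placed inside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into regular-expression syntax.

    Hypothetical helper covering only the core notation described above.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]

        # Optional repeat count such as x(3) or x(2,4).
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if match:
            element = match.group(1)
            upper = "," + match.group(3) if match.group(3) else ""
            repeat = "{" + match.group(2) + upper + "}"

        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # any residue except those listed
        else:
            core = element                   # single residue or [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# A C2H2 zinc-finger style pattern, used here as an example input.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(regex)

# Made-up test sequence that satisfies the pattern.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
print(bool(re.search(regex, sequence)))
```

Secondary databases such as PROSITE are applied to new sequences in essentially this way: each stored pattern is matched against the sequence, and a hit suggests a family or functional site.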
STRUCTURAL DATABASES
• PDB (www.rcsb.org): the Protein Data Bank, a database of three-dimensional
structural data for large biomolecules such as proteins and nucleic acids, determined
mainly by X-ray crystallography and NMR spectroscopy.
• SCOP: the Structural Classification of Proteins (SCOP) database is a largely
manual classification of protein structural domains based on similarities of their
structures and amino acid sequences.
• CATH: Class, Architecture, Topology, Homology. The CATH Protein Structure
Classification database is a free, publicly available online resource that provides
information on the evolutionary relationships of protein domains.
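The three-dimensional data held in the PDB can be pictured through its legacy flat-file format, in which each atom occupies one fixed-width ATOM record. The sketch below slices such a record into fields; the Atom type, the parse_atom_line function and the sample record are illustrative assumptions, and the column positions follow the published wwPDB format description as far as I understand it.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record using the fixed columns of the legacy PDB format."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38, in angstroms
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record, assembled field by field so the column widths stay visible.
sample = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "THR"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and blanks
    "  17.047"    # columns 31-38: x coordinate
    "  14.099"    # columns 39-46: y coordinate
    "   3.625"    # columns 47-54: z coordinate
)

print(parse_atom_line(sample))
```

For real work an established parser is usually preferable; the point here is only that a structure entry is, at bottom, a list of atoms with Cartesian coordinates.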
• KEGG: the Kyoto Encyclopedia of Genes and Genomes is a collection of databases
dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
• SMPDB: the Small Molecule Pathway Database (SMPDB) is a comprehensive,
high-quality, freely accessible online database containing more than 600 small
molecule (i.e. metabolic) pathways found in humans.
• BioCyc: the BioCyc database collection is an assortment of organism-specific
Pathway/Genome Databases (PGDBs) that provide reference genome and
metabolic pathway information for thousands of organisms.
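Pathway databases are usually queried by identifier. As a minimal sketch, the code below splits a KEGG pathway identifier into its prefix and pathway number and builds a URL for the KEGG REST "get" operation. The names PathwayId, parse_pathway_id and kegg_get_url are hypothetical, the identifier format (two to four lowercase letters followed by five digits) is an assumption based on common KEGG identifiers such as hsa00010, and nothing is actually requested from the server.

```python
import re
from typing import NamedTuple

# Base address of the KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


class PathwayId(NamedTuple):
    prefix: str  # "map" for reference pathways, or an organism code such as "hsa"
    number: str  # five-digit pathway number


def parse_pathway_id(identifier: str) -> PathwayId:
    """Split an identifier such as 'hsa00010' into prefix and number.

    Assumes the form <2-4 lowercase letters><5 digits>.
    """
    match = re.fullmatch(r"([a-z]{2,4})(\d{5})", identifier)
    if match is None:
        raise ValueError(f"unexpected pathway identifier: {identifier!r}")
    return PathwayId(match.group(1), match.group(2))


def kegg_get_url(identifier: str) -> str:
    """Return the REST URL that retrieves the flat-file entry for one pathway."""
    parse_pathway_id(identifier)  # validate before building the URL
    return f"{KEGG_REST}/get/{identifier}"


print(parse_pathway_id("hsa00010"))
print(kegg_get_url("hsa00010"))
```

The same pathway number under the "map" prefix refers to the organism-independent reference pathway, which is why the prefix and the number are kept as separate fields.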
THANK YOU