Biological Databases
Shweta Kagliwal

Synopsis
  • Units of biological information
  • Database types
  • A look at some databases

Data
  • Primary or derived data:
      Primary databases: direct experimental results
      Secondary databases: results of analysis on primary databases
  • Sequences
  • Structure
  • Genome
  • Pathways
  • Lipids, carbohydrates
  • Literature
Database Types
  • Subject wise: Crop Information Systems, Taxonomic Databases, Nucleotide Databases, Genomic Databases, Protein Databases, Metabolic Pathway Databases, Bibliographic Databases
  • Data levels: Primary databases, Secondary databases
  • Taxonomy wise: General plant/animal, Genera specific, Organism specific
  • Ownership wise: Public, Commercial
Taxonomic Databases
  • TaxBrowser
  • GRIN Taxonomy for Plants
  • Species 2000

Nucleotide Databases
The nucleotide sequence databases are data repositories, accepting nucleic acid sequence data from the scientific community and making it freely available.
A Sample Record
Specialized Nucleotide Databases
Focusing on a particular feature:
1) Markers
2) Mitop2 - organelle specific
3) HIV-SD
4) REBASE - for restriction enzymes and restriction enzyme sites

Bibliographic/Literature Databases
Scientific literature databases have been available since the 1960s.
  • PubMed
  • OMIM/OMIA
  • AGRICOLA
Genomic Databases
In addition to the human genome, the genomes of about 800 organisms have been sequenced in recent years.
  • Entrez Genome: a resource from the National Center for Biotechnology Information (NCBI) for accessing information about completed and in-progress genomes.
  • The 'Plant Genome Information Resource' (PGDIC) provides access to many different plant genome databases, including Chlamydomonas, cotton, alfalfa, wheat, barley, rye, rice, millet, sorghum, and species of Solanaceae and trees.
Complete & Ongoing Genomic Projects (revised Dec 11, 2009)

                 Completed   Draft   In Progress
Prokaryotes
  Archaea               67      13            41
  Bacteria             912    1085          1099
Eukaryotes
  Animals                4      79            56
  Plants                 2      14            45
  Fungi                 10      77            38
  Protists               6      24            24
Plant Sequencing Projects
Protein Databases
  • Primary protein sequence databases such as UniProt/SwissProt
  • Structure databases such as PDB
  • Specialized protein databases such as ENZYME

Metabolic Databases
  • KEGG Pathway Database
  • EcoCyc
  • AraCyc, RiceCyc, SorghumCyc, LycoCyc, CapCyc, etc.
AraCyc Statistics
Compound Pathway Gene Enzyme Reaction
Secondary Databases
  • Pfam (pfam.sanger.ac.uk/): a database of protein families defined as domains (contiguous segments of entire protein sequences). For each domain, it contains a multiple alignment of a set of defining sequences (the seeds) and the other sequences in SWISS-PROT and TrEMBL that can be matched to that alignment.
  • PROSITE (www.expasy.ch/prosite/): a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.
  • InterPro: classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites.
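
To make the idea of a PROSITE pattern concrete, here is a minimal Python sketch that turns a pattern written in PROSITE's notation into a regular expression and scans a sequence with it. The pattern used, N-{P}-[ST]-{P}, is the PROSITE N-glycosylation site pattern. The function names and the short test sequence are hypothetical, made up for illustration, and the translation is simplified: it covers only the common syntax elements and does not handle profiles.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: handles x, [..], {..}, (n), (n,m), and the
    < and > anchors at the ends of the pattern.
    """
    pattern = pattern.rstrip(".")          # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by hyphens
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Separate an optional repetition suffix such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":                    # x means any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        # [..] and single letters are already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if anchored_start else "") + core
                     + ("$" if anchored_end else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return 0-based start positions of non-overlapping pattern matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]


# Hypothetical sequence, for illustration only (not a real protein).
sequence = "MKNGSALNPSQ"
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", sequence))
```

In this example only the first N is reported, because the second N is followed by P, which the pattern excludes. Note that re.finditer skips overlapping matches, so a real scan would need extra handling for that case.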
 
