Biological Databases

Overview of biological databases.

Published in: Technology

  1. Biological Databases (Shweta Kagliwal)
  2. Synopsis
      - Units of biological information
      - Database types
      - A look at some databases
  3. Data
      - Data can be primary or derived:
      - Primary databases hold direct experimental results.
      - Secondary databases hold the results of analyses performed on primary databases.
      - Types of data: sequences, structures, genomes, pathways, lipids, carbohydrates, literature
  4. Database Types
      - Subject-wise: crop information systems, taxonomic databases, nucleotide databases, genomic databases, protein databases, metabolic pathway databases, bibliographic databases
      - By data level: primary databases, secondary databases
      - Taxonomy-wise: general, plant/animal, genus-specific, organism-specific
      - Ownership-wise: public, commercial
  5. Taxonomic Databases
      - TaxBrowser
      - GRIN Taxonomy for Plants
      - Species 2000
  6. Nucleotide Databases
      - The nucleotide sequence databases are data repositories that accept nucleic acid sequence data from the scientific community and make it freely available.
  7. A Sample Record
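The sample record itself is not reproduced here. Assuming it follows the GenBank flat-file layout (a LOCUS line, labelled fields such as DEFINITION and ACCESSION, an ORIGIN section of numbered sequence lines, and a closing //), a minimal sketch of pulling the key fields out of one record might look like this. The toy record and its accession XX000001 are hypothetical, and multi-line DEFINITION fields and feature tables are deliberately ignored.

```python
def parse_genbank(text: str) -> dict:
    """Extract accession, definition and sequence from one GenBank-style record.

    Simplification (assumption): only the first DEFINITION line is kept and
    the FEATURES table is skipped entirely.
    """
    record = {"accession": None, "definition": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record


# Hypothetical toy record, for illustration only.
TOY_RECORD = """\
LOCUS       XX000001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy sequence for illustration.
ACCESSION   XX000001
ORIGIN
        1 atgaattcgc gatccgggat cctg
//
"""

rec = parse_genbank(TOY_RECORD)
print(rec["accession"], len(rec["sequence"]))  # XX000001 24
```

For real records, a maintained parser (for example Biopython's SeqIO) handles the many edge cases this sketch skips.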
  8. Specialized Nucleotide Databases
      Focusing on particular features:
      1) Markers
      2) MitoP2: organelle-specific
      3) HIV-SD
      4) REBASE: restriction enzymes and restriction enzyme sites
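REBASE catalogues restriction enzymes together with the short recognition sequences they cut. To illustrate what "restriction enzyme sites" means in practice, here is a minimal sketch that scans a DNA sequence for two well-known recognition sequences, EcoRI (GAATTC) and BamHI (GGATCC). The two-enzyme table is a hand-picked assumption, not data pulled from REBASE; both sites are palindromic, so scanning one strand is enough here.

```python
import re

# Two well-known recognition sequences (5'->3'); both are palindromic,
# so a single-strand scan finds every site.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return the 0-based start position of every recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        # Zero-width lookahead so overlapping occurrences are also reported.
        hits[enzyme] = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return hits


print(find_sites("ATGAATTCGCGATCCGGGATCCTG"))
# {'EcoRI': [2], 'BamHI': [16]}
```

Non-palindromic sites would also need a scan of the reverse complement.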
  9. Bibliographic/Literature Databases
      - Scientific literature databases have been available since the 1960s.
      - PubMed
      - OMIM/OMIA
      - AGRICOLA
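PubMed can also be queried programmatically. Assuming NCBI's E-utilities interface (not covered in the slides), a search is an HTTP GET to the ESearch endpoint with a database name and a query term. The sketch below only builds the request URL, so it runs offline; the helper name esearch_url is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities base URL; ESearch returns the IDs of matching records.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL (hypothetical helper) for an Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("pubmed", "biological databases", retmax=5)
print(url)
```

Fetching that URL (for example with urllib.request.urlopen) should return JSON whose esearchresult.idlist holds PubMed IDs. NCBI documents request-rate limits and asks clients to identify themselves, so check the current E-utilities guidelines before running queries in bulk.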
  10. Genomic Databases
      - In addition to the human genome, the genomes of about 800 organisms have been sequenced in recent years.
      - Entrez Genome: a resource from the National Center for Biotechnology Information (NCBI) for accessing information about completed and in-progress genomes.
      - Plant Genome Information Resource (PGDIC): provides access to many plant genome databases, including Chlamydomonas, cotton, alfalfa, wheat, barley, rye, rice, millet, sorghum, Solanaceae species and trees.
  11. Complete and Ongoing Genomic Projects

      Group            Completed   Draft   In Progress
      Prokaryotes
        Archaea              67       13            41
        Bacteria            912     1085          1099
      Eukaryotes
        Animals               4       79            56
        Plants                2       14            45
        Fungi                10       77            38
        Protists              6       24            24

      Revised: Dec 11, 2009
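The slide-11 table gives counts per group but no totals. A small sketch that stores those counts (transcribed by hand, in the order completed, draft, in progress) and sums them shows how lopsided the picture was in December 2009: almost all completed projects were prokaryotic.

```python
# Counts transcribed from the slide-11 table: (completed, draft, in progress).
PROJECTS = {
    "Prokaryotes": {
        "Archaea": (67, 13, 41),
        "Bacteria": (912, 1085, 1099),
    },
    "Eukaryotes": {
        "Animals": (4, 79, 56),
        "Plants": (2, 14, 45),
        "Fungi": (10, 77, 38),
        "Protists": (6, 24, 24),
    },
}
STATUSES = ("completed", "draft", "in_progress")


def totals_by_status(projects):
    """Sum each status column over every group in every domain."""
    sums = [0, 0, 0]
    for groups in projects.values():
        for counts in groups.values():
            for i, n in enumerate(counts):
                sums[i] += n
    return dict(zip(STATUSES, sums))


def completed_by_domain(projects):
    """Number of completed projects per top-level domain."""
    return {d: sum(c[0] for c in groups.values()) for d, groups in projects.items()}


print(totals_by_status(PROJECTS))
# {'completed': 1001, 'draft': 1292, 'in_progress': 1303}
print(completed_by_domain(PROJECTS))
# {'Prokaryotes': 979, 'Eukaryotes': 22}
```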
  12. Plant Sequencing Projects
  13. Protein Databases
      - Primary protein sequence databases, such as UniProt/Swiss-Prot
      - Structure databases, such as PDB
      - Specialized protein databases, such as ENZYME
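Structure databases such as PDB store atomic coordinates. In the legacy PDB text format, each ATOM or HETATM record is a fixed-column line (residue name in columns 18 to 20, chain in column 22, x/y/z in columns 31 to 54, and so on). A minimal sketch of reading those columns follows; the Atom class and function names are hypothetical, and the newer PDBx/mmCIF format is not handled.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using legacy PDB fixed columns (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),         # columns 7-11
        name=line[12:16].strip(),       # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain=line[21],                 # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
    )


def parse_atoms(lines):
    """Keep only coordinate records; other record types are skipped."""
    return [parse_atom_line(l) for l in lines if l.startswith(("ATOM  ", "HETATM"))]


SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67"
print(parse_atom_line(SAMPLE))
```

Production code would normally rely on an established parser such as Biopython's Bio.PDB rather than hand-sliced columns.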
  14. Metabolic Databases
      - KEGG Pathway Database
      - EcoCyc
      - AraCyc, RiceCyc, SorghumCyc, LycoCyc, CapCyc, etc.
  15. AraCyc Statistics
  16. Figure: Compound, Pathway, Gene, Enzyme, Reaction
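Pathway databases of the AraCyc/EcoCyc kind relate five kinds of object: a pathway is a set of reactions, each reaction converts compounds and is catalysed by enzymes, and each enzyme is encoded by genes. A minimal sketch of that data model, with hypothetical class names and a toy two-step pathway (compounds A, B, C; genes g1, g2), shows how such links let you ask "which genes does this pathway involve?". It is an illustration of the idea, not the actual Pathway Tools schema.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str


@dataclass
class Gene:
    name: str


@dataclass
class Enzyme:
    name: str
    genes: list[Gene] = field(default_factory=list)  # genes encoding this enzyme


@dataclass
class Reaction:
    substrates: list[Compound]
    products: list[Compound]
    enzymes: list[Enzyme] = field(default_factory=list)  # catalysts


@dataclass
class Pathway:
    name: str
    reactions: list[Reaction] = field(default_factory=list)

    def genes(self) -> set[str]:
        """Names of all genes encoding enzymes of any reaction in the pathway."""
        return {g.name for r in self.reactions for e in r.enzymes for g in e.genes}

    def compounds(self) -> set[str]:
        """Names of all substrates and products appearing in the pathway."""
        return {c.name for r in self.reactions for c in r.substrates + r.products}


# Toy pathway A -> B -> C (all names hypothetical).
a, b, c = Compound("A"), Compound("B"), Compound("C")
e1 = Enzyme("E1", [Gene("g1")])
e2 = Enzyme("E2", [Gene("g2")])
toy = Pathway("toy pathway", [Reaction([a], [b], [e1]), Reaction([b], [c], [e2])])
print(sorted(toy.genes()), sorted(toy.compounds()))  # ['g1', 'g2'] ['A', 'B', 'C']
```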
  17. Secondary Databases
      - Pfam (pfam.sanger.ac.uk/)
        Pfam is a database of protein families defined as domains (contiguous segments of protein sequences). For each domain, it contains a multiple alignment of a set of defining sequences (the seeds) and the other sequences in SWISS-PROT and TrEMBL that can be matched to that alignment.
      - PROSITE (www.expasy.ch/prosite/)
        PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
      - InterPro
        InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites.
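PROSITE patterns use a compact syntax: elements separated by "-", x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, and < or > for the N- or C-terminus. The N-glycosylation pattern N-{P}-[ST]-{P} is a familiar example. A minimal sketch, assuming only that core syntax, translates a pattern into a Python regular expression; it is not the ScanProsite tool, and forms such as ">" inside brackets are not supported.

```python
import re

# One pattern element: a residue spec, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a core-syntax PROSITE pattern into a Python regex string."""
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        start = element.startswith("<")   # anchored at the N-terminus
        end = element.endswith(">")       # anchored at the C-terminus
        element = element.strip("<>")
        m = _ELEMENT.fullmatch(element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                    # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [..] and single residues map to regex unchanged.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi is not None else "") + "}"
        tokens.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(tokens)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)  # N[^P][ST][^P]
print([m.start() for m in re.finditer(regex, "MANVTSPNPSA")])  # [2]
```

re.finditer reports non-overlapping matches only; a lookahead wrapper would be needed to list overlapping sites.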
