Biological Databases
Submitted
For the M.Sc. (BIOTECHNOLOGY) III-SEMESTER EXAM, DEC. 2018 (REGULAR)
Subject
SEMINAR, SCIENTIFIC WRITING AND PRESENTATION
Submitted by
Shradheya R.R. Gupta
M.Sc. Biotechnology
Roll Number
1832128
Content
Biological Databases
Types
A. On the basis of type:-
1. Sequence Databases
2. Structure Databases
3. Functional Databases
B. On the basis of order:-
1. Primary Databases
2. Secondary Databases
3. Composite Databases
Conclusion
1. Biological Databases
• Biological databases are storehouses of life science information.
• Information is collected from scientific experiments, published literature,
high-throughput experimental technologies, and computational analysis.
A. On the basis of type
1. Sequence database:-
• Composed of a large collection of nucleic acid and protein sequences.
• BLAST is the most commonly used tool for sequence similarity searches.
• Many sequence annotations are inferred from the results of similarity
searches against previously annotated sequences.
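The idea of a similarity search can be sketched with a small local-alignment scorer. This is only an illustration: BLAST itself uses fast heuristics, whereas the sketch below uses the exact Smith-Waterman dynamic programming that such heuristics approximate. The scoring values (`match`, `mismatch`, `gap`) and the helper name `best_hit` are assumptions made for this example, not BLAST parameters.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, database):
    """Return (identifier, score) of the highest-scoring database entry."""
    return max(
        ((name, local_alignment_score(query, seq)) for name, seq in database.items()),
        key=lambda pair: pair[1],
    )
```

In an annotation workflow, the annotation of the best-scoring, previously annotated entry would be a candidate annotation for the query sequence.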
2. Structure database:-
• The main aim is to organize and annotate protein structures.
Examples:-
1. PDB
2. Database of Macromolecular Movements
3. Functional database:-
• Describes the physiological role of gene products: enzyme activities, mutant
phenotypes, biological pathways, etc.
Examples:-
1. KEGG PATHWAY Database
2. BRENDA
3. Reactome
4. HMDB
B. On the basis of order
1. Primary database:-
• A primary database contains information obtained experimentally.
• Experimental results are submitted directly into the database by researchers,
and the data are essentially archival in nature.
A. Nucleotide Primary database:-
• Three chief databases store and make available raw nucleic acid sequences.
1. GenBank:-
Located in the U.S.A.
2. DDBJ:-
Located in Japan
3. EMBL:-
Located in the U.K.
• They use similar (but not identical) data formats and exchange data on a
daily basis.
B. Protein Primary database:-
• PIR-PSD is a comprehensive, non-redundant and annotated protein sequence database.
It classifies protein sequences based on the superfamily concept.
• SWISS-PROT provides a high level of annotation.
• Both PIR-PSD and SWISS-PROT have software that enables the user to easily
search through the database to obtain only the required information.
• TrEMBL contains the translations of all coding sequences present in the EMBL
nucleotide database that are not yet integrated into SWISS-PROT.
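The translation step behind a resource such as TrEMBL can be illustrated with a short sketch. It assumes the standard genetic code and a coding sequence that is already in frame; the function name `translate_cds` is hypothetical and this is not the pipeline the database actually runs.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate_cds("ATGGCCAAATAA")` reads the codons ATG, GCC, AAA and then stops at TAA.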
2. Secondary database:-
• Comprises data derived from the analysis of primary data.
• Secondary databases have become the molecular biologist’s reference library
over the past decade.
A. Nucleotide Secondary database:-
• UniGene automatically partitions GenBank sequences into a non-redundant
set of gene-oriented clusters.
• Ensembl provides a centralized resource for geneticists, molecular biologists
and other researchers studying genomes.
• Microbial Resource is focused on a single organism.
• ACeDB was originally developed for the C. elegans (a nematode worm) genome
project. It is a repository of sequence, genetic map and phenotypic information
about C. elegans.
• FlyBase covers the genome of the fruit fly D. melanogaster to a high degree of
completeness and quality.
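Partitioning sequences into non-redundant clusters, as described for UniGene above, can be sketched with a union-find structure. The linking rule used here (two sequences belong together if they share any k-mer) is a deliberate simplification chosen for illustration, and the function name and the default `k` are assumptions; the real clustering procedure is considerably more involved.

```python
def cluster_by_shared_kmers(sequences, k=8):
    """Group sequence ids so that ids sharing any k-mer end up in one cluster."""
    parent = {name: name for name in sequences}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # k-mer -> first sequence id in which it was seen
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owner = first_owner.setdefault(seq[i:i + k], name)
            parent[find(name)] = find(owner)  # merge the two clusters

    clusters = {}
    for name in sequences:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Each returned list is one cluster of sequence identifiers; a sequence that shares no k-mer with any other forms a cluster of its own.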
B. Protein Secondary database:-
• InterPro is a database of protein families, domains and functional sites, in
which identifiable features found in known proteins can be applied to new
protein sequences.
• UniProt is a database of protein sequence and functional information.
• GPCRDB is focused on a single protein family, the G protein-coupled receptors
(GPCRs). These are transmembrane proteins used by cells to communicate with the outside world.
• CluSTr (Clusters of SWISS-PROT and TrEMBL) offers an automatic
classification of the entries in the SWISS-PROT and TrEMBL databases into
groups of related proteins.
• COGs is the Clusters of Orthologous Groups of proteins database.
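Applying a feature known from characterized proteins to a new sequence, as described for InterPro above, can be sketched as a pattern scan. The example uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} written as a regular expression; the function name `find_motif_sites` is hypothetical, and InterPro's member databases rely on richer signatures (profiles, hidden Markov models) than a single regular expression.

```python
import re

# PROSITE pattern N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr,
# any residue except Pro. The lookahead lets overlapping sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif_sites(protein, pattern=N_GLYCOSYLATION):
    """Return the 1-based start position of every match in the protein."""
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]
```

A new protein with one or more reported positions would be flagged as carrying a candidate site, which is the sense in which a known feature is "applied" to an uncharacterized sequence.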
3. Composite database:-
• It is an amalgamation of different primary database sources,
which removes the need to search multiple resources.
• NCBI hosts such resources for researchers.
Examples:-
1. OMIM
Catalog of human genes, genetic disorders and related literature.
2. Gene
Molecular data and literature related to genes, with extensive links to
other databases.
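The amalgamation idea can be sketched as a merge of several source tables into one non-redundant table that keeps cross-references back to each source. The input layout, the source names and the accessions in the usage note are all hypothetical and chosen only for illustration.

```python
def build_composite(sources):
    """Merge {source_name: {accession: sequence}} into one non-redundant table.

    Returns {sequence: [(source_name, accession), ...]}, so each distinct
    sequence is stored once with links back to every source that holds it.
    """
    composite = {}
    for source_name, records in sources.items():
        for accession, sequence in records.items():
            composite.setdefault(sequence.upper(), []).append((source_name, accession))
    return composite
```

With two made-up sources, `build_composite({"db_a": {"A1": "MKT"}, "db_b": {"B7": "mkt"}})` stores the sequence once and lists both `("db_a", "A1")` and `("db_b", "B7")` as its origins, so a single lookup replaces a search of each source.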
Conclusion
• The present challenges are to:-
1. Handle huge volumes of data.
2. Improve database design.
3. Develop software for database access and manipulation.
• There is no doubt about the involvement of bioinformatics in the biological
sciences and the betterment of human lives.
Thank You
