Features of biological databases

CHARU SHARMA
B.Sc. (H) BOTANY, 3rd YEAR

Biological Database
 It is a collection of data that is
structured, searchable, updated
periodically and cross-referenced.
 Stores biological data in electronic
form.
 Purpose-
Systemization of data
Availability of biological data
Computational analysis of biological data

HISTORY
 Insulin, first protein that was sequenced;
composed of 51 amino acids.
 The sequence was published in “Atlas Of
Protein Sequence and Structure” in 1965 by
Margaret Dayhoff.
 Became base for PIR database.
 First nucleic acid sequenced was yeast
alanine tRNA, composed of 77 nucleotides.
 First organism whose genome was
sequenced was the free-living bacterium
Haemophilus influenzae, in 1995 by Craig
Venter and colleagues.

Features of Biological
Databases
1. Heterogeneity
2. High volume data
3. Uncertainty
4. Data curation
5. Data integration
6. Data sharing
7. Dynamics

1. Data Heterogeneity
Availability of diverse and complex
data types.
Data Types :
 Sequence- Nucleotide, Protein
 Graph - Data whose elements are related
to one another can be captured
as a graph. This includes pathway data,
genetic maps and structured taxonomies.

 High dimensional data –
Data generated from microarray
experiments that involve thousands of
genes and hundreds of experimental
conditions.
 Shapes –
It consists of 3D molecular structural
data.
Example- Docking
 Temporal data –
For studying dynamics of any biological
system.
Example- Developmental biology

 Patterns –
There are patterns lying within the
genome that characterize biological
entities.
Example-Regulatory sequence
(promoter)
 Scalar and Vector fields –
 Extracted features data –
Numerical data derived from one or
more of the data types mentioned
above.
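
A minimal sketch of how three of these data types (sequence, graph, high dimensional) might be held in code. All values, gene names and function names here are made up for illustration; they do not come from any real database.

```python
# Illustrative only: toy values, not taken from any real database.

# Sequence data: a plain string over the nucleotide alphabet.
sequence = "ATGGCCATTGTAATGGGCCGC"

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

# Graph data: a pathway as an adjacency list (compound -> next compounds).
pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": [],
}

def reachable(graph, start):
    """All nodes reachable from start by following edges (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

# High dimensional data: one row per gene, one column per condition.
expression = {
    "geneA": [2.0, 4.0, 6.0],
    "geneB": [1.0, 1.0, 1.0],
}

def mean_expression(matrix):
    """Mean expression level of each gene across all conditions."""
    return {gene: sum(vals) / len(vals) for gene, vals in matrix.items()}
```

Each type calls for a different structure (string, adjacency list, matrix), which is one reason heterogeneous biological data are hard to store in a single uniform scheme.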

2. High volume data
In addition to being highly
heterogeneous, biological data are
voluminous, since large amounts are
needed to support comprehensive
investigations in various fields and
directions.
3. Uncertainty
Biological data carry a great deal of
uncertainty, as they represent biological
phenomena that are observed and
assumed.

4. Data curation
 Biological data are collected from
various sources across different
structural and functional boundaries.
 There are always chances of missing
links.
 To fill these, the data is analyzed and
curated via automated methods.
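
As a rough illustration of an automated curation step, the sketch below flags records with missing fields or repeated accessions for later review. The field names (accession, organism, sequence) and the function name curate are assumptions made for this example, not part of any real database's schema.

```python
def curate(records, required=("accession", "organism", "sequence")):
    """Split records into clean entries and entries flagged for review.

    The required field names are hypothetical, chosen for illustration.
    """
    clean, flagged, seen = [], [], set()
    for rec in records:
        # A field counts as missing if it is absent or empty.
        missing = [field for field in required if not rec.get(field)]
        if missing:
            flagged.append((rec, "missing: " + ", ".join(missing)))
        elif rec["accession"] in seen:
            flagged.append((rec, "duplicate accession"))
        else:
            seen.add(rec["accession"])
            clean.append(rec)
    return clean, flagged
```

Flagged records are kept together with the reason they were flagged, so a human curator can follow up on what the automated pass could not resolve.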

5. Data integration
After years of research, across
different structural and functional
scales, data is collected from
laboratories worldwide, and integrated
together through a database and
made available for use.
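
A minimal sketch of integration, assuming every source identifies its records by a shared accession key. The function name integrate and the record fields are hypothetical; real integration pipelines are considerably more involved.

```python
def integrate(*sources):
    """Merge records from several sources, keyed by accession.

    Later sources fill in fields that earlier ones lack. Where two
    sources disagree, the first value is kept and the disagreement
    is reported separately.
    """
    merged, conflicts = {}, []
    for source in sources:
        for rec in source:
            entry = merged.setdefault(rec["accession"], {})
            for field, value in rec.items():
                if field not in entry:
                    entry[field] = value
                elif entry[field] != value:
                    conflicts.append(
                        (rec["accession"], field, entry[field], value)
                    )
    return merged, conflicts
```

Reporting conflicts instead of silently overwriting them leaves the decision about which laboratory's value to trust to a later curation step.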

6. Data sharing
 Biological data is shared via
databases.
 Purpose:
For scientific community’s inspection
For cross verification
To prevent repetition and to validate
data

7. Dynamics
 New data is generated every day in
laboratories.
 And sometimes this new data
contradicts the old data.
 So, it is necessary to develop new
organizational database schemes to
incorporate new data.
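
One possible way to accommodate new data that contradict old data is to keep every version of a record instead of overwriting it. The sketch below is an assumption made for illustration, not a scheme described in these slides; the class name VersionedStore and its methods are hypothetical.

```python
class VersionedStore:
    """Keeps every version of a record instead of overwriting it."""

    def __init__(self):
        self._history = {}  # accession -> list of record versions

    def submit(self, accession, data):
        """Store data as a new version; return its 1-based version number."""
        versions = self._history.setdefault(accession, [])
        if versions and versions[-1] == data:
            return len(versions)  # unchanged, so no new version is created
        versions.append(dict(data))  # copy, so later edits do not leak in
        return len(versions)

    def latest(self, accession):
        """Most recent version of the record."""
        return self._history[accession][-1]

    def version(self, accession, number):
        """A specific earlier version, counted from 1."""
        return self._history[accession][number - 1]
```

With this layout, an older annotation stays retrievable by its version number even after a newer, contradicting one has been submitted.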

Classification of biological
databases
o Data type
o Maintainer status
o Data access
o Data source
o Database design
o Organism

1. Data type
 Sequence database
a. Nucleotide database : GenBank, EMBL-Bank
b. Protein database : Swiss-Prot, PIR
 Structure database - PDB, NDB, DALI, MSD
 Microarray database - ArrayExpress (follows the MIAME reporting standard)
 Chemical database - PubChem
 Pathway database - KEGG, BioSilico
 Enzyme database - ExPASy, REBASE
 Disease database - OMIM, OMIA
 Literature database - PubMed, Scopus

2. Maintainer status
 NCBI, EMBL
 Academic group or scientist
 Commercial company

3. Data access
 Publicly available
 Available with copyright
 Browsing only, accessible but not
downloadable
 Academic but not freely available
 Restricted

4. Data source
a) Primary database (archival)
Holds original data submitted directly by researchers.
Examples:
Nucleotide - GenBank, EMBL, DDBJ
Protein - UniProt
Structure - PDB
Literature - Medline (PubMed)
b) Secondary database (curated)
- Results of analysis of primary databases.
- Either manually curated or by automated
methods
Examples: PROSITE, Pfam, RefSeq
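
To illustrate how a secondary database can be derived by analysing primary data, the sketch below scans raw protein sequences for a motif written in PROSITE-style pattern syntax and records where it occurs. The example pattern N-{P}-[ST]-{P} has the form of the N-glycosylation site pattern; the converter handles only a small subset of the syntax, and the function names and toy sequences are assumptions made for this example.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles only: single residues, x (any residue), [..] (allowed set),
    {..} (forbidden set) and repeat counts such as x(2) or x(2,4).
    """
    parts = []
    for element in pattern.split("-"):
        # Forbidden sets first, so the braces do not clash with repeats.
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("x", ".")
        element = element.replace("(", "{").replace(")", "}")
        parts.append(element)
    return "".join(parts)

def annotate(sequences, pattern):
    """Map each sequence ID to the 1-based start positions of matches."""
    # A lookahead is used so that overlapping matches are also reported.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return {
        seq_id: [m.start() + 1 for m in regex.finditer(seq)]
        for seq_id, seq in sequences.items()
    }
```

The input sequences play the role of the primary (archival) data, and the returned table of motif positions plays the role of the derived, secondary annotation.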

5. Database design
 Flat files
 Relational database (SQL)
 Object oriented database
 Exchange/publication technologies
(FTP, HTML, SOAP, CORBA, XML)
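
The difference between the first two designs can be sketched by parsing a flat file and loading it into a relational table. The example assumes FASTA-formatted text (header lines begin with ">") and uses Python's built-in sqlite3 module; the toy records, the table name sequences and its columns are made up for illustration.

```python
import sqlite3

# A toy flat file: two hypothetical records in FASTA format.
FLAT_FILE = """>seq1 toy record one
ATGGCC
ATTGTA
>seq2 toy record two
GGGCCC
"""

def _record(header, chunks):
    """Split a header into ID and description; join the sequence lines."""
    ident, _, description = header.partition(" ")
    return ident, description, "".join(chunks)

def parse_fasta(text):
    """Yield (identifier, description, sequence) from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)

def load(text):
    """Load the parsed records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences "
        "(id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)", parse_fasta(text)
    )
    conn.commit()
    return conn
```

Once loaded, the records can be queried with SQL, for example `SELECT id, length(residues) FROM sequences`, whereas the flat file has to be re-read and re-parsed for every question.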

6. Organism
 Bacteria
 Virus
 Human
