databases in bioinformatics

Introduction
 Rapid growth in biological information
 Biological science has now become a data-rich science
 Gene sequences
 Amino acid sequences in proteins
 Motifs and domains in proteins
 Structural data from XRD & NMR
 Metabolic pathways
 Protein-protein interactions
 Gene expression data from DNA microarrays

Biological databases
 A biological database is a collection of data that is structured, searchable, periodically updated and cross-referenced.
 Some databases are multifunctional
 The major purposes of databases are as follows:
 Availability of biological data
 Systematization of data
 Analysis of computed biological data

History
 1956: insulin was sequenced, giving the first recorded protein sequence
 51 amino acids
 Atlas of Protein Sequence and Structure (1965), by Margaret Dayhoff et al., was a printed book.
 It became the basis for PIR, the Protein Information Resource
 First nucleotide sequence: yeast tRNA
 77 bases
 During this period the 3D structure of proteins was being studied, and the renowned PDB (Protein Data Bank) was created.

…
 The first genome published for a free-living organism was that of the bacterium Haemophilus influenzae, in 1995
 Genome?
 All genes? Or all DNA?
 Why are complete genomes interesting?

Aspects of genome analysis
 Ab initio gene prediction
 Locus
 Gene identification by EST (expressed sequence tags)
 Gene prediction via EST
 Gene prediction via comparison, coding and regulatory regions
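
To make "ab initio gene prediction" concrete, here is a minimal sketch that scans a DNA string for open reading frames (ORFs). It is only a toy: real ab initio predictors rely on statistical models of coding and regulatory signals. The function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for illustration, and only the forward strand is scanned.

```python
# Toy ab initio gene finder: look for open reading frames (ORFs).
# Assumption: an ORF starts at ATG and ends at the first in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ORF on the forward strand.

    start and end are 0-based and end-exclusive; the stop codon is included.
    min_codons counts the codons before the stop codon (start codon included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                    # close it and keep scanning
    return sorted(orfs)


example = "CCATGAAATTTGGGTAACC"  # made-up sequence, not real data
for start, end, seq in find_orfs(example):
    print(start, end, seq)
```

A gene identified by EST or by comparison would instead start from experimental or homologous sequences; the ORF scan uses nothing but the genomic sequence itself.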

Features of biological databases
1) Data heterogeneity
2) High volume data
3) Uncertainty
4) Data Curation
5) Large scale data integration
6) Data sharing
7) Dynamic and subject to change

Classification scheme for biological databases
Data type
Maintenance status
Data access
Data source
Database design
Organism

Data type
 Genome database
 Sequence database
 Structure database
 Microarray database
 Chemical database
 Pathway database
 Enzyme database
 Disease database
 Literature database

Based on maintenance status
NCBI, EMBL, SIB

Based on data access
1) Publicly available
2) Available with copyright
3) Browsing only, accessible but not
downloadable
4) Academic but not freely available
5) Proprietary commercial
6) Restricted

Based on data sources

Primary databases
 Contain original data submitted by researchers
 Mostly public or open access
 GenBank (NCBI)
 EMBL
 SWISS-PROT
 NDB

Secondary databases
 Contain results derived from entries in primary databases
 Manually created or automatically generated
 Swiss-Prot, whose entries are manually curated, is also cited as an example of a secondary database
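
As a rough illustration of how a secondary database can be generated automatically from primary entries, the sketch below scans primary protein sequences for a motif and emits one derived record per hit. The pattern N-{P}-[ST]-{P} is written in a PROSITE-like syntax and translated to a regular expression; the function name `derive_motif_entries`, the record layout and the toy sequences are assumptions, not the schema of any real database.

```python
import re

# Motif N-{P}-[ST]-{P} as a regex: N, anything but P, S or T, anything but P.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def derive_motif_entries(primary):
    """Build secondary (derived) records from primary sequence entries.

    primary maps an accession to a protein sequence; each hit becomes one
    record that points back to its source entry (the cross-reference).
    """
    secondary = []
    for accession, sequence in primary.items():
        for match in MOTIF.finditer(sequence):
            secondary.append({
                "source": accession,            # link to the primary entry
                "position": match.start() + 1,  # 1-based residue position
                "match": match.group(1),
            })
    return secondary


# Made-up primary entries, for illustration only
primary_entries = {"TOY1": "MKNASAPNPTG", "TOY2": "MKLLVAG"}
for record in derive_motif_entries(primary_entries):
    print(record)
```

A manually created secondary database would add expert review on top of hits like these.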

Biological sequence databases
Lecture # 5
By: Hira Shahzad

DDBJ
 DNA Data Bank of Japan
 Nucleotide sequence database
 Established in 1986
 Has been working in collaboration with EMBL & NCBI
 After 20 years, another collaborative project named INSDC (International Nucleotide Sequence Database Collaboration) was formed
 Members: EMBL, GenBank, DDBJ

SWISS-PROT
 Protein sequence database
 Maintained by the SIB (Swiss Institute of Bioinformatics) in Switzerland and also by the European Bioinformatics Institute (EBI)
 The output format is the Swiss-Prot file format
 This format has been explained in molecular file formats
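
A minimal sketch of reading a Swiss-Prot style flat-file entry is given here as a reminder of the format. It assumes the usual layout: each line begins with a two-letter line code (ID, AC, DE, SQ), the sequence lines follow the SQ line, and `//` ends the entry. The entry below is invented for illustration, its SQ line is simplified, and `parse_entry` is a hypothetical helper rather than part of any library.

```python
def parse_entry(text):
    """Extract the entry name, accessions and sequence from one entry."""
    entry = {"id": None, "accessions": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):               # end-of-entry marker
            break
        code, rest = line[:2], line[5:]         # line code, then the data
        if in_sequence:
            # Sequence lines are indented and split into blocks by spaces
            entry["sequence"] += "".join(line.split())
        elif code == "ID":
            entry["id"] = rest.split()[0]
        elif code == "AC":
            entry["accessions"] += [a.strip() for a in rest.split(";") if a.strip()]
        elif code == "SQ":
            in_sequence = True                  # sequence starts on next line
    return entry


# Invented entry: name, accessions and sequence are placeholders
record = """\
ID   TOY_EXAMPLE             Reviewed;          20 AA.
AC   XXXXX1; XXXXX2;
DE   RecName: Full=Hypothetical toy protein;
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
"""

print(parse_entry(record))
```

For real work an established parser is usually the better choice; this sketch only shows how the line codes structure an entry.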
