2. CONTENT
What is bioinformatics?
What is data and information?
Biological databases
Types of biological databases
Retrieval of databases
Advantages of biological databases
3. Bioinformatics
Definition
A marriage between computer science and molecular biology.
Applies the techniques of computer science to problems of molecular biology.
Information technology applied to the analysis of biological data.
Helps in gaining an understanding of biological data.
Plays an important role in molecular medicine, evolutionary studies, drug
development and biotechnology.
Used for analysis of gene and protein expression, comparison of genomic data, and storage of
biological information.
5. BIOLOGICAL DATABASES
Biological: relating to living things.
Database: a collection of data held in an organized manner (i.e. information).
Libraries of life sciences information, collected from scientific experiments and
stored using computational analysis.
The information can be accessed, managed and updated.
Features:
Data heterogeneity
High volume data
Data curation
6. Types of biological databases
Biological databases are classified in two ways:
On the basis of source of data: Primary, Secondary
On the basis of data stored:
Sequence: Nucleic acid, Protein
Structure: PDB, SCOP, CATH
7. On the basis of data sources
Primary databases:
Contain experimentally derived data, i.e. raw data.
Examples: nucleotide sequences, protein or other macromolecular sequences.
Experimental results are submitted directly to the databases.
Swiss-Prot, PIR, GenBank, DDBJ.
Secondary databases:
Contain data derived from analysing primary data, i.e. information.
Examples: conserved regions, signature sequences, etc.
Submitted data are analysed and then stored.
SCOP, CATH.
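To illustrate how secondary data can be derived from primary data, the following minimal sketch finds conserved positions in a set of sequences that are assumed to be already aligned. The function name `conserved_columns` and the three toy sequences are made up for illustration; real secondary databases rely on far more elaborate, curated analyses.

```python
def conserved_columns(aligned):
    """Return 0-based positions where every aligned sequence has the same residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    return [
        i for i in range(length)
        # A column counts as conserved if it holds one distinct residue and no gap.
        if len({seq[i] for seq in aligned}) == 1 and aligned[0][i] != "-"
    ]

# Hypothetical primary data: three aligned protein fragments (toy example).
primary = ["MKTAYIAK", "MKSAYLAK", "MKTAYVAK"]
print(conserved_columns(primary))
```

The list of conserved positions is the derived "information"; the raw sequences themselves remain the primary data.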
8. On the basis of data stored
Structural databases:
Include experimentally derived structures of proteins and domains.
The main aim is to organize protein structures and give the biological community access to the
information.
A. PDB (Protein Data Bank):
Database of experimentally determined 3D structures of proteins.
Stored about 80,000 protein structures at the time of writing.
Structures are obtained by NMR spectroscopy and X-ray crystallography.
Easily accessible; entries can be downloaded and used.
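As a rough illustration of using a downloaded structure, the sketch below reads a few fields from one `ATOM` record of the legacy fixed-width PDB text format. The column positions follow that format as commonly documented; the helper name `parse_atom_line` and the sample record are assumptions made for this example, and a real project would more likely use an established parsing library.

```python
def parse_atom_line(line):
    """Extract a few fixed-width fields from a PDB-format ATOM record."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "atom": line[12:16].strip(),      # columns 13-16: atom name
        "residue": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue sequence number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

# Made-up record, assembled field by field so the column widths are explicit.
sample = (
    "ATOM  " + f"{1:>5}" + " " + f"{' N':<4}" + " " + "MET" + " " + "A"
    + f"{1:>4}" + " " * 4 + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"
)
print(parse_atom_line(sample))
```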
9. B. SCOP (Structural Classification of Proteins):
Contains information about the classification and structures of proteins.
Also describes evolutionary relationships between proteins.
Contained 38,000 protein structures at the time of writing.
Freely accessible over the internet.
C. CATH (Class, Architecture, Topology, Homologous superfamily):
Contains information about the classification and structures of proteins.
Also gives information on the bonding of proteins and on their evolutionary relationships.
Contained information on 8,078 protein domains at the time of writing.
10. Sequence databases:
Large collections of nucleic acid and protein sequences stored on computers.
Mainly of two types: nucleic acid sequence and protein sequence.
Nucleic acid sequence:
Contains collections of genome, gene and transcript sequences.
Three chief databases store raw nucleic acid data and make it available to the public:
GenBank, EMBL, and DDBJ.
These are referred to as primary sequence databases.
GenBank:
Located in the USA.
Accessible through the NCBI portal.
Contains annotated collections of nucleotide sequences and their protein translations.
Receives sequences from about 100,000 distinct organisms from all over the world.
11. EMBL (European Molecular Biology Laboratory):
Maintained by the EBI (European Bioinformatics Institute).
Comprises primary nucleotide sequences.
Receives data from genome sequencing centers.
DDBJ (DNA Data Bank of Japan):
1. Located at the National Institute of Genetics (NIG).
2. The only nucleotide sequence data bank in Asia.
3. Exchanges data with GenBank and EMBL.
4. Mainly receives data from Japanese researchers.
12. Protein sequence:
1. Databases that include a protein’s amino acid sequence, conformation, structure, and features
such as active sites.
2. Compiled by translating DNA sequences from different gene databases.
3. An important resource because proteins mediate most biological functions.
4. Include PIR, Swiss-Prot, PDB.
PIR (Protein Information Resource):
1. Established in 1984 by the National Biomedical Research Foundation.
2. Provides a high level of annotation.
3. Contains amino acid sequences and information about protein function prediction.
4. Also contains sequences of domains.
13. Swiss-Prot:
1. A databank maintained by the Swiss Institute of Bioinformatics in collaboration with EMBL.
2. Offers very high quality and consistent annotations.
3. It incorporates:
A. Functions of proteins
B. Post-translational modifications such as phosphorylation and acetylation
C. Domains and sites
D. Secondary structural features and quaternary structure of the protein
PDB (Protein Data Bank):
1. Includes sequences of proteins.
2. Helps to predict the 3D structure of proteins.
3. Holds data derived mainly from two sources: structures determined by X-ray
crystallography and by NMR experiments.
14. Retrieval of biological databases
Accessing the stored data on an organism or a particular gene from the databases.
When a new DNA sequence is obtained, one needs to know whether it has already been deposited in
the databanks.
Requirements for retrieval:
name of organism
name of gene
Data retrieval systems:
Entrez
SRS
BLAST
15. Entrez:
Molecular biology database and retrieval system.
Developed by NCBI.
Covers nucleotide and protein sequence data and 3D structure data.
Easy to access, but the information available for searching is limited.
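Retrieval needs the name of an organism and the name of a gene. The sketch below shows how those two inputs might be combined into an Entrez search through NCBI's E-utilities `esearch` endpoint. It only builds the URL and sends no request. The function name `build_search_url`, the choice of the `nucleotide` database, and the example organism and gene are assumptions for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; no request is sent in this sketch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_search_url(organism, gene, db="nucleotide", retmax=5):
    """Combine an organism name and a gene name into an Entrez search URL."""
    # [Organism] and [Gene Name] are Entrez search field tags.
    term = f'"{organism}"[Organism] AND {gene}[Gene Name]'
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

# Example inputs chosen only for illustration.
print(build_search_url("Homo sapiens", "BRCA1"))
```

Opening the printed URL in a browser or fetching it with an HTTP client should return a list of matching record identifiers, subject to NCBI's usage guidelines.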
SRS (Sequence Retrieval System):
Home to over 80,000 biological databases.
Developed by the European Bioinformatics Institute (EBI).
Includes data on sequences, metabolic pathways, transcription factors, and conserved regions.
Provides the description of a gene and the dates on which its entry was uploaded and updated.
16. BLAST (Basic Local Alignment Search Tool):
Developed by NCBI.
BLAST programs were designed for fast database searching.
Helps to retrieve data.
Also helps in comparing primary biological sequence information.
Workflow (flowchart):
Raw data obtained from experiment
-> Submit that data to databases
-> Accession number
-> Enter accession number in BLAST
-> Search
-> Find relationships among them
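The "local alignment" in BLAST's name refers to finding the best-matching stretch shared by two sequences. The sketch below computes a Smith-Waterman local alignment score with full dynamic programming. This is not how BLAST itself works, since BLAST uses a faster heuristic to approximate such alignments. The scoring values (match +2, mismatch -1, gap -2) and the example sequences are arbitrary assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, which lets an alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Toy sequences sharing the stretch "ACGT".
print(local_alignment_score("GGACGTGG", "TTACGTTT"))
```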
17. Variants of BLAST
BLASTN - Compares a DNA query to a DNA database.
BLASTP - Compares a protein query to a protein database.
BLASTX - Compares a DNA query to a protein database by translating the query in all 6 possible reading frames.
TBLASTN - Compares a protein query to a DNA database translated in all 6 possible reading frames.
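The "6 possible frames" used by BLASTX and TBLASTN are the three reading frames on each strand of the DNA. A minimal sketch of six-frame translation with the standard genetic code follows. The function names and the example sequence are made up for illustration, and BLAST's own implementation may differ in its details.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

# Toy query: start codon, one alanine codon, stop codon.
for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

In this picture, BLASTX would compare each of the six translated strings against the protein database.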
18. Advantages
Databases act as a storehouse of information.
They are used to store and organize data in such a way that information can be retrieved
easily via a variety of search criteria.
They allow knowledge discovery, which refers to the identification of connections
between pieces of information.
Databases are important tools in assisting scientists to analyze and explain a
host of biological phenomena, from the structure of biomolecules and their
interactions, to the whole metabolism of organisms, to understanding the
evolution of species.