2. Biological Database
A biological database is a large, organized body of
persistent data, usually associated with computerized
software designed to update, query, and retrieve
components of the data stored within the system.
For example, a record associated with a nucleotide
sequence database typically contains information such as
contact name; the input sequence with a description of
the type of molecule; the scientific name of the source
organism from which it was isolated; and, often, literature
citations associated with the sequence.
3. Biological Database
o Easy access to the information.
o A method for extracting only that information
needed to answer a specific biological question.
4. Biological Database
“A lot of bioinformatics work is concerned with the
technology of databases. These databases include both
"public" repositories of gene data like GenBank or the
Protein Data Bank (the PDB), and private databases like
those used by research groups involved in gene mapping
projects or those held by biotech companies.”
5. Biological Database
“A few popular databases are GenBank from NCBI
(National Center for Biotechnology Information),
SwissProt from the Swiss Institute of Bioinformatics and
PIR from the Protein Information Resource.”
6. Biological Database
GenBank
GenBank (Genetic Sequence Databank) is one of the fastest
growing repositories of known genetic sequences.
It has a flat-file structure: each entry is an ASCII text file, readable by
both humans and computers.
In addition to sequence data, GenBank files contain information
like accession numbers and gene names, phylogenetic classification
and references to published literature.
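As a minimal sketch of what "readable by both humans and computers" means in practice, the Python snippet below pulls a few fields out of a GenBank-style flat-file record. The record itself is hand-written and hypothetical (the accession `DUMMY0001`, the definition, and the sequence are invented), and `parse_flat_record` is an illustrative helper, not part of any GenBank tool. It assumes only the common layout: keywords start in column 1, `ORIGIN` introduces the sequence, and `//` ends the record.

```python
# Hypothetical, hand-written record in the style of a GenBank flat file.
# The accession, definition, and sequence are invented for illustration.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01         16 bp    DNA     linear   MAM
DEFINITION  Hypothetical example sequence.
ACCESSION   DUMMY0001
ORIGIN
        1 aagcattcaa gcattc
//
"""


def parse_flat_record(text):
    """Return the header fields and the sequence of one flat-file record."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if line.startswith("ORIGIN"):    # sequence lines follow this keyword
            in_sequence = True
            continue
        if in_sequence:
            # Drop the leading position number; keep the sequence blocks.
            sequence_parts.extend(line.split()[1:])
        elif line[:1].isalpha():
            # A keyword starts in column 1; the rest of the line is its value.
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    fields["SEQUENCE"] = "".join(sequence_parts).upper()
    return fields


record = parse_flat_record(SAMPLE_RECORD)
print(record["ACCESSION"], record["SEQUENCE"])
```

Real records carry many more fields (features, references, multi-line values), which this sketch deliberately ignores.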
As of June 1994 it held approximately 191,400,000 bases in 183,000
sequences.
7. Biological Database
EMBL
[European Molecular Biology Laboratory] The EMBL Nucleotide
Sequence Database is a comprehensive database of DNA and RNA
sequences collected from the scientific literature and patent
applications and directly submitted from researchers and
sequencing groups.
Data collection is done in collaboration with GenBank (USA) and
the DNA Database of Japan (DDBJ).
The database doubles in size every 18 months and, as of June 1994,
contains nearly 200 million bases from 182,615 sequence entries.
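To make the "doubles every 18 months" figure concrete, here is a rough Python sketch of the arithmetic. It assumes the doubling time stays constant and, purely for illustration, applies it to the June 1994 entry count; `projected_size` is a hypothetical helper, and the output is an extrapolation, not reported data.

```python
def projected_size(current_size, months_ahead, doubling_months=18):
    """Project a size forward assuming a constant doubling time."""
    return current_size * 2 ** (months_ahead / doubling_months)


entries_june_1994 = 182_615  # entry count quoted on the slide

# Extrapolate one and two doubling periods ahead (assumption: constant rate).
for months in (18, 36):
    print(months, round(projected_size(entries_june_1994, months)))
```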
8. Biological Database
Data file Division
VRT - Vertebrates.
INV - Invertebrates.
BCT - Bacterial.
VRL - Viral.
MAM - Mammalian.
PLN - Plants, Algae & Fungi.
EST - Expressed Sequence Tags.
GSS - Genomic Survey Sequences.
HTG - High-Throughput Genomic Sequences
(genomic sequences of organisms that are stored in an
unfinished form).
STS - Sequence-Tagged Sites.
9. Biological Database
Data file Division
The database can, for example, retrieve the following types of
data:
CFTR gene (containing introns, exons, etc.).
CFTR cDNA (sequence containing only exons).
CFTR mRNA.
CFTR protein sequences.
Example search:
CFTR, MAM, Homo sapiens, mRNA
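The example search can be sketched as a filter over entries, matching on gene, division, organism, and molecule type. This is an illustrative Python sketch only: the `Entry` class, the `search` function, and the in-memory entries are hypothetical and do not reflect the query interface of any real database. The field values simply mirror the search terms on the slide.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    gene: str
    division: str   # data file division code, e.g. "MAM"
    organism: str
    molecule: str   # e.g. "DNA", "cDNA", "mRNA", "protein"


# Invented entries for illustration; not real database records.
ENTRIES = [
    Entry("CFTR", "MAM", "Homo sapiens", "mRNA"),
    Entry("CFTR", "MAM", "Homo sapiens", "cDNA"),
    Entry("CFTR", "VRT", "Xenopus laevis", "mRNA"),
]


def search(entries, gene, division, organism, molecule):
    """Return the entries that match all four search terms exactly."""
    wanted = (gene, division, organism, molecule)
    return [
        e for e in entries
        if (e.gene, e.division, e.organism, e.molecule) == wanted
    ]


hits = search(ENTRIES, "CFTR", "MAM", "Homo sapiens", "mRNA")
print(hits)
```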
11. Mutations
These are changes in the DNA coding region that may lead to
severe disease or genetic disorders.
Point Mutations
Wild-type strand = A A G C A T T C
Query strand = A A C C A T T C
(G → C at position 3)
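A point mutation like the one above can be located by comparing the two strands position by position. The Python sketch below assumes both strands have the same length (substitutions only, no insertions or deletions); `point_mutations` is a hypothetical helper name.

```python
def point_mutations(wild, query):
    """List (position, wild_base, query_base) for every mismatch.

    Positions are 1-based. Both strands must have the same length.
    """
    if len(wild) != len(query):
        raise ValueError("strands must have the same length")
    return [
        (pos, w, q)
        for pos, (w, q) in enumerate(zip(wild, query), start=1)
        if w != q
    ]


wild_strand = "AAGCATTC"
query_strand = "AACCATTC"
print(point_mutations(wild_strand, query_strand))  # [(3, 'G', 'C')]
```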
12. Types of substitution mutations
Transition Substitution Mutation.
Pyrimidines are single-ring nitrogenous bases (C & T).
Purines are double-ring nitrogenous bases (A & G).
Pyrimidine → Pyrimidine
C → T / T → C
Purine → Purine
A → G / G → A
13. Types of substitution mutations
Transversion Substitution Mutation.
Pyrimidine → Purine
C → A
Purine → Pyrimidine
A → C
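The two substitution types can be told apart by checking whether the original and the new base belong to the same ring class. A minimal Python sketch, assuming DNA bases only (A, C, G, T); `classify_substitution` is a hypothetical helper name.

```python
PURINES = {"A", "G"}       # double-ring bases
PYRIMIDINES = {"C", "T"}   # single-ring bases


def classify_substitution(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        return "none"
    # Same ring class on both sides means a transition;
    # a change of ring class means a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("C", "T"))  # transition
print(classify_substitution("G", "C"))  # transversion
```

Under this rule, the G → C change between AAGCATTC and AACCATTC is a transversion.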