Biological database by kk sahu

BIOLOGICAL DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon (C. G.)

SYNOPSIS
INTRODUCTION
HISTORY
WHAT ARE THE DATABASES?
WHY DATABASES?
THE “PERFECT” DATABASE
IDENTIFIERS AND ACCESSION NUMBERS
TECHNICAL DESIGN
MAINTENANCE OF BIOLOGICAL DATABASES
GENERAL FEATURES
SOURCES OF BIOLOGICAL DATA
DIFFERENT TYPES OF BIOLOGICAL DATABASES
FUNCTION
DATA ENTRY AND QUALITY CONTROL
AVAILABILITY
APPLICATION
DATA RECORDS IN THE YEAR 2004
CONCLUSION
REFERENCES

INTRODUCTION
Biological databases are libraries of life sciences information,
collected from scientific experiments, published literature,
high-throughput experiment technology, and computational analyses.
They contain information from research areas including genomics,
proteomics, metabolomics, and microarray gene expression.

HISTORY
- In 1965, Margaret Dayhoff developed the first protein sequence
  database, the Atlas of Protein Sequence and Structure.
- The first protein secondary-structure prediction algorithm was
  developed by Chou and Fasman in 1974.
- The 1980s saw the establishment of GenBank (1982) and the
  development of fast database-searching algorithms such as FASTA by
  William Pearson, followed by BLAST by Stephen Altschul and
  coworkers (1990).

WHAT ARE THE DATABASES?
- Protein structures
  - Experiments
  - Models (homologues)
- Literature information
- Original DNA sequences (genomes)
- Protein sequences
  - Inferred
  - Direct sequencing
- Expressed DNA sequences (= mRNA sequences = cDNA sequences)
- Expressed Sequence Tags (ESTs)

THE “PERFECT” DATABASE
1. Comprehensive, but easy to search
2. Annotated, but not “too annotated”
3. A simple, easy to understand structure
4. Cross-referenced
5. Minimum redundancy
6. Easy retrieval of data

IDENTIFIERS AND ACCESSION NUMBERS
- Identifier: a string of letters and digits that is generally
  “understandable” to a human reader.
  - Example: TPIS_CHICK (triosephosphate isomerase from chicken,
    Gallus gallus) in Swiss-Prot.
- Accession code: a string of letters and digits that uniquely
  identifies an entry in its database.
  - The accession number for TPIS_CHICK in Swiss-Prot is P00940.
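
The difference between the two kinds of label can be sketched in a few lines of Python. This is only an illustration: the accession pattern follows the format UniProt documents for its accession numbers, while the entry-name pattern (a short protein mnemonic, an underscore, a short species mnemonic) and the function names `classify` and `split_entry_name` are simplifying assumptions made for this example.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Assumed, simplified entry-name format: PROTEIN_SPECIES mnemonics.
ENTRY_NAME_RE = re.compile(r"^([A-Z0-9]{1,5})_([A-Z0-9]{1,5})$")


def classify(token):
    """Return 'identifier', 'accession' or 'unknown' for a label."""
    if ENTRY_NAME_RE.match(token):
        return "identifier"
    if ACCESSION_RE.match(token):
        return "accession"
    return "unknown"


def split_entry_name(token):
    """Split an identifier into (protein mnemonic, species mnemonic)."""
    match = ENTRY_NAME_RE.match(token)
    if match is None:
        raise ValueError(f"not an entry name: {token!r}")
    return match.group(1), match.group(2)


print(classify("TPIS_CHICK"))          # identifier
print(classify("P00940"))              # accession
print(split_entry_name("TPIS_CHICK"))  # ('TPIS', 'CHICK')
```

The identifier carries meaning a person can read (protein and species), while the accession is an opaque but stable key, which is why cross-references between databases normally use the accession.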

TECHNICAL DESIGN
- Flat-files
- Relational database (SQL)
- Object-oriented database

1. Flat-files
All names and information are present in a single file, e.g. name,
subject name, subject number.

2. Relational database (SQL)
- Relation = table. It consists of a heading (a fixed set of
  attributes) and a body of tuples.
- Attribute = column; tuple = row.
- Primary key = unique identifier: an attribute or combination of
  attributes that uniquely identifies each tuple.

STUDENT  NAME     STATE
1        NEERAJ   SHIMLA
2        ADITYA   CHHATTISGARH
3        AMIT     KASHMIR
4        BHARTI   BILASHPUR
5        RUCHI    MAHASAMUND
6        SUNAINA  RAIGARH
7        ARCHANA  JAGADALPUR

STUDENT  SUBJECT
1        GENETIC777
2        MOLBIO654
3        MICRO615
4        BIOCHE575
5        INSTRU551
6        BIOSTA544
7        ENVIR541

SUBJECT     SUBJECT NAME
GENETIC777  GENETIC ENGINEERING777
MOLBIO654   MOLECULAR BIOLOGY654
MICRO615    MICROBIOLOGY615
BIOCHE575   BIOCHEMISTRY575
INSTRU551   INSTRUMENT551
BIOSTA544   BIOSTATISTICS544
ENVIR541    ENVIRONMENTAL541

3. Object-oriented database (hierarchical relationships between data
items)
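
The relational design can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: it loads the first three students from the example tables into an in-memory database, and the table name `enrolment`, the column name `subject_name`, and the helper functions are assumptions introduced for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three related tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student (
            student INTEGER PRIMARY KEY,   -- primary key: one row per student
            name    TEXT NOT NULL,
            state   TEXT NOT NULL
        );
        CREATE TABLE subject (
            subject      TEXT PRIMARY KEY, -- primary key: subject code
            subject_name TEXT NOT NULL
        );
        CREATE TABLE enrolment (           -- hypothetical name for the link table
            student INTEGER NOT NULL REFERENCES student(student),
            subject TEXT    NOT NULL REFERENCES subject(subject),
            PRIMARY KEY (student, subject) -- composite primary key
        );
    """)
    con.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "NEERAJ", "SHIMLA"),
         (2, "ADITYA", "CHHATTISGARH"),
         (3, "AMIT", "KASHMIR")],
    )
    con.executemany(
        "INSERT INTO subject VALUES (?, ?)",
        [("GENETIC777", "GENETIC ENGINEERING777"),
         ("MOLBIO654", "MOLECULAR BIOLOGY654"),
         ("MICRO615", "MICROBIOLOGY615")],
    )
    con.executemany(
        "INSERT INTO enrolment VALUES (?, ?)",
        [(1, "GENETIC777"), (2, "MOLBIO654"), (3, "MICRO615")],
    )
    con.commit()
    return con


def subjects_for(con, student_name):
    """Join the three tables to list the subject names of one student."""
    rows = con.execute(
        """
        SELECT su.subject_name
        FROM student AS st
        JOIN enrolment AS en ON en.student = st.student
        JOIN subject   AS su ON su.subject = en.subject
        WHERE st.name = ?
        ORDER BY su.subject
        """,
        (student_name,),
    ).fetchall()
    return [row[0] for row in rows]


con = build_demo_db()
print(subjects_for(con, "ADITYA"))  # ['MOLECULAR BIOLOGY654']
```

Each fact is stored once (a student's state, a subject's full name), and the keys link the tables together. A flat file would have to repeat those values on every line that mentions them.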

MAINTENANCE OF BIOLOGICAL DATABASES
- Large, public institution funded by government (EMBL, NCBI).
- Quasi-academic institute (Swiss Institute of Bioinformatics, TIGR).
- Academic group or scientist.
- Commercial company.
- TCAG(2017)

GENERAL FEATURES
Biological databases are an important tool in assisting scientists to
understand and explain a host of biological phenomena.
Biological knowledge is distributed among many different general and
specialized databases.

SOURCES OF BIOLOGICAL DATA
[Figure: flow of sequence data into NCBI databases]
- GenBank: sequences (e.g. TATAGCCG, AGCTCCGATA, CCGATGACAA) submitted
  by labs and sequencing centers. Updated ONLY by submitters.
- UniGene (built by algorithms), RefSeq (built by curators) and
  Genome Assembly. Updated continually by NCBI.

DIFFERENT TYPES OF BIOLOGICAL DATABASES
- Nucleotide sequence databases
- Protein sequence databases
- Genome databases
- Protein structure databases
- Protein structural classification databases
- Microarray and gene expression databases
- Immunological databases
- Metabolic pathway databases

[Figure: nucleotide sequence database statistics: 10,378,022 and 11,302,156,937]
[Figure: immunological databases: IMGT, MHCPEP]

FUNCTION
- Make biological data available to scientists.
- Consolidate data (gather data from different sources).
- Provide access to large datasets that cannot be published explicitly
  (genome, …).
- Make biological data available in a computer-readable format.
- Make data accessible for automated analysis.
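
To illustrate what "computer-readable" buys in practice, here is a minimal sketch that reads sequence records in FASTA format, a plain-text layout widely used for sequence data, and computes their lengths. The record names `example_1` and `example_2` are hypothetical; the sequences are the short fragments shown elsewhere in these slides.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records built from the example fragments.
EXAMPLE = """>example_1 hypothetical record
TATAGCCG
AGCTCCGATA
>example_2 hypothetical record
CCGATGACAA
"""

records = list(read_fasta(StringIO(EXAMPLE)))
lengths = {header.split()[0]: len(seq) for header, seq in records}
print(lengths)  # {'example_1': 18, 'example_2': 10}
```

Because the layout is fixed, the same few lines work for two records or two million, which is the point of distributing data in a form that programs can consume.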

DATA ENTRY AND QUALITY CONTROL
- Scientists (teams) deposit data directly.
- Appointed curators add and update data.
- Are erroneous data removed or marked?
- Type and degree of error checking.
- Consistency, redundancy, conflicts, updates.
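
One of the simplest quality checks, detecting redundant entries, can be sketched as follows. This assumes that "redundant" means two records carrying exactly the same sequence (ignoring letter case). The accession labels `X0001` to `X0003` and the function name `find_redundant` are hypothetical.

```python
from collections import defaultdict


def find_redundant(records):
    """Group accession codes whose sequences are identical."""
    by_sequence = defaultdict(list)
    for accession, sequence in records:
        # Normalise case so 'tata' and 'TATA' count as the same sequence.
        by_sequence[sequence.upper()].append(accession)
    return [sorted(group) for group in by_sequence.values() if len(group) > 1]


# Hypothetical records: X0001 and X0003 hold the same sequence.
records = [
    ("X0001", "TATAGCCG"),
    ("X0002", "AGCTCCGATA"),
    ("X0003", "tatagccg"),
]
print(find_redundant(records))  # [['X0001', 'X0003']]
```

Real curation pipelines go further (near-identical sequences, conflicting annotations), but the principle of grouping records by a normalised key is the same.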

AVAILABILITY
- Publicly available, no restrictions.
- Available, but with copyright.
- Accessible, but not downloadable.
- Academic, but not freely available.
- Proprietary, commercial; possibly free for academics.

APPLICATION
- Sequence comparison
- Evolutionary relationships between genes
- Gene expression comparison
- Primer design

DATA RECORDS IN THE YEAR 2004
Nucleotide records                             36,653,899
Protein sequences                               4,436,362
3D structures                                      19,640
Interactions & complexes                           52,385
Human UniGene clusters                            118,517
Maps and complete genomes                           6,948
Different taxonomy nodes                          283,121
Human dbSNP                                    13,179,601
Human RefSeq records                               22,079
bp in human contigs > 5,000 kb (116)        2,487,920,000
PubMed records                                 12,570,540
OMIM records                                       15,138
~11,000/day by different sources

REFERENCES
- Jin Xiong, Essential Bioinformatics.
- Dr. Jayaram Reddy, Centre for Molecular and Computational Biology,
  St. Joseph’s College, Bangalore (pdf).
- Qi Sun, Computational Biology Service Unit, Cornell University (pdf).
- francis@bioinformatics.ubc.ca
- Wikipedia
- www.ncbi.nlm.nih.gov/
