Biological database by kk sahu
1. BIOLOGICAL DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon (C.G.)
2. SYNOPSIS
INTRODUCTION
HISTORY
WHAT ARE THE DATABASES?
WHY DATABASES?
THE “PERFECT” DATABASE
IDENTIFIERS AND ACCESSION NUMBERS
TECHNICAL DESIGN
MAINTENANCE OF BIOLOGICAL DATABASES
GENERAL FEATURES
SOURCES OF BIOLOGICAL DATA
DIFFERENT TYPES OF BIOLOGICAL DATABASES
FUNCTION
DATA ENTRY AND QUALITY CONTROL
AVAILABILITY
APPLICATION
DATA RECORDS AS OF 2004
CONCLUSION
REFERENCES
3. Biological databases are libraries of life-sciences information collected from scientific experiments, published literature, high-throughput experimental technology and computational analyses. They contain information from many research areas, including genomics, proteomics, metabolomics and microarray gene expression.
4. HISTORY
The first protein sequence database, the Atlas of Protein Sequence and Structure, was developed by Margaret Dayhoff in 1965.
One of the first protein secondary-structure prediction algorithms was developed by Chou and Fasman in 1974.
The 1980s saw the establishment of GenBank and the development of fast database-searching algorithms such as FASTA by William Pearson; BLAST, by Stephen Altschul and coworkers, followed in 1990.
5. WHAT ARE THE DATABASES?
Protein structures
-Experiments
-Models (homologues)
Literature information
Original DNA sequences (genomes)
Protein sequences
-Inferred
-Direct sequencing
Expressed DNA sequences (mRNA sequences, cDNA sequences)
Expressed Sequence Tags (ESTs)
7. THE “PERFECT” DATABASE
1. Comprehensive, but easy to search
2. Annotated, but not “too annotated”
3. A simple, easy to understand structure
4. Cross-referenced
6. Minimal redundancy
6. Easy retrieval of data
8. IDENTIFIERS AND ACCESSION NUMBERS
Identifier: a string of letters and digits that is generally “understandable” (human-readable).
Example: TPIS_CHICK (triosephosphate isomerase from chicken, Gallus gallus) in Swiss-Prot.
Accession code: a string of letters and digits that uniquely identifies an entry in its database.
The accession number for TPIS_CHICK in Swiss-Prot is P00940.
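The difference between a readable identifier and a stable accession can be sketched in a few lines of Python. This is a minimal sketch, assuming the accession pattern that UniProt (which now includes Swiss-Prot) documents for its accessions, and assuming entry names follow the PROTEIN_SPECIES form seen in TPIS_CHICK; the helper names and the small index are hypothetical.

```python
import re

# Accession pattern as documented by UniProt: 6-character accessions
# (e.g. P00940) or the newer 10-character form.
ACCESSION_RE = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def is_accession(code):
    """Return True if `code` has the shape of a UniProt accession."""
    return ACCESSION_RE.fullmatch(code) is not None


def split_entry_name(entry_name):
    """Split an entry name such as TPIS_CHICK into (protein, species) mnemonics."""
    protein, sep, species = entry_name.partition("_")
    if not sep or not protein or not species:
        raise ValueError(f"not an entry name: {entry_name!r}")
    return protein, species


# Hypothetical in-memory index: the accession is the stable key,
# the readable identifier is stored as an ordinary attribute.
index = {"P00940": {"entry_name": "TPIS_CHICK", "organism": "Gallus gallus"}}

print(is_accession("P00940"))          # True
print(is_accession("TPIS_CHICK"))      # False
print(split_entry_name("TPIS_CHICK"))  # ('TPIS', 'CHICK')
print(index["P00940"]["entry_name"])   # TPIS_CHICK
```

Keying the index on the accession rather than the identifier reflects the point of the slide: the readable name may be revised, while the accession is meant to keep pointing at the same entry.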
9. TECHNICAL DESIGN
Flat-files
Relational database (SQL)
Object-oriented database
1. Flat-files: all names and information (e.g. student name, subject name, subject number) are present in a single file.
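A flat file is plain text read record by record, with no indexes or links between tables. Swiss-Prot and EMBL distribute entries this way: each line starts with a two-letter code (such as ID, AC and OS) and "//" closes an entry. The sketch below parses that layout; the sample record is simplified for illustration and is not a verbatim database entry, and `parse_flat_file` is a hypothetical helper.

```python
from collections import defaultdict

# Simplified, illustrative flat-file text in the Swiss-Prot/EMBL style:
# a two-letter line code, three spaces, then the data; "//" ends an entry.
FLAT_FILE = """\
ID   TPIS_CHICK
AC   P00940;
OS   Gallus gallus (Chicken).
//
"""


def parse_flat_file(text):
    """Yield one dict per entry, mapping line codes to lists of values."""
    entry = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            yield dict(entry)
            entry = defaultdict(list)
        elif line.strip():
            code, value = line[:2], line[5:].strip()
            entry[code].append(value)


records = list(parse_flat_file(FLAT_FILE))
print(records[0]["AC"])   # ['P00940;']
```

Finding one record in such a file means scanning it from the top, which is why large collections are usually also loaded into indexed databases.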
10.
STUDENT | NAME | STATE
1 | NEERAJ | SHIMLA
2 | ADITYA | CHHATTISGARH
3 | AMIT | KASHMIR
4 | BHARTI | BILASHPUR
5 | RUCHI | MAHASAMUND
6 | SUNAINA | RAIGARH
7 | ARCHANA | JAGADALPUR

STUDENT | SUBJECT | SUBJECT | SUBJECT NAME
1 | GENETICH777 | GENETIC777 | GENETIC ENGINEERING777
2 | MOLBIO654 | MOLBIO654 | MOLECULAR BIOLOGY654
3 | MICRO615 | MICRO615 | MICROBIOLOGY615
4 | BIOCHE575 | BIOCHE575 | BIOCHEMISTRY575
5 | INSTRU551 | INSTRU551 | INSTRUMENT551
6 | BIOSTA544 | BIOSTA544 | BIOSTATISTICS544
7 | ENVIR541 | ENVIR541 | ENVIRONMENTAL 541
Relation = table; it consists of a heading (a fixed set of attributes) and a set of tuples (rows).
Attribute = a column of the table; tuple = a row.
Primary key = the unique identifier: an attribute or combination of attributes that uniquely identifies each tuple.
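The terms relation, attribute, tuple and primary key map directly onto SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `student` and its column names are assumptions chosen for illustration, and the rows are taken from the student table on slide 10.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"  # primary key: uniquely identifies each tuple
    " name TEXT NOT NULL,"
    " state TEXT)"
)

# Each Python tuple becomes one tuple (row) of the relation.
rows = [(1, "NEERAJ", "SHIMLA"), (2, "ADITYA", "CHHATTISGARH"), (3, "AMIT", "KASHMIR")]
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)

# Look up a tuple by its key.
print(conn.execute("SELECT name, state FROM student WHERE student_id = ?", (2,)).fetchone())
# ('ADITYA', 'CHHATTISGARH')

# A second tuple with the same key violates the primary-key constraint.
try:
    conn.execute("INSERT INTO student VALUES (2, 'RUCHI', 'MAHASAMUND')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The rejected insert is the practical meaning of "uniquely identifies each tuple": the database itself refuses a second row with the same key.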
2. Relational database (SQL)
3. Object-oriented database (hierarchical relationships between data items)
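In the object-oriented design, data items are objects that hold references to related objects, so relationships form a hierarchy instead of being joined across tables. The sketch below only models that idea with plain Python dataclasses (a real object-oriented database would also persist the objects); the classes `Organism`, `Gene` and `Protein` and the gene symbol used are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes forming the hierarchy Organism -> Gene -> Protein.
@dataclass
class Protein:
    accession: str
    name: str


@dataclass
class Gene:
    symbol: str
    products: List[Protein] = field(default_factory=list)


@dataclass
class Organism:
    species: str
    genes: List[Gene] = field(default_factory=list)

    def all_proteins(self):
        """Walk down the hierarchy and collect every protein object."""
        return [p for g in self.genes for p in g.products]


chicken = Organism("Gallus gallus")
tpi = Gene("TPI1")  # illustrative gene symbol
tpi.products.append(Protein("P00940", "Triosephosphate isomerase"))
chicken.genes.append(tpi)

print([p.accession for p in chicken.all_proteins()])  # ['P00940']
```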
11. MAINTENANCE OF BIOLOGICAL DATABASES
Large, public institutions funded by governments (EMBL, NCBI).
Quasi-academic institutes (Swiss Institute of Bioinformatics, TIGR).
Academic groups or individual scientists.
Commercial companies.
TCAG(2017)
12. Biological databases are an important tool for helping scientists understand and explain a host of biological phenomena.
Biological knowledge is distributed among many different general and specialized databases.
13. SOURCES OF BIOLOGICAL DATA
[Figure: sequencing centers and labs submit sequences to GenBank, which is updated ONLY by submitters; algorithms (UniGene), curators (RefSeq) and genome assembly produce derived data sets that are updated continually by NCBI.]
14. DIFFERENT TYPES OF BIOLOGICAL DATABASES
Nucleotide sequence databases
Protein sequence databases
Genome databases
Protein structure databases
Protein structural classification databases
Microarray and gene expression databases
Immunological databases
Metabolic pathway databases
19. FUNCTION
Make biological data available to scientists
Consolidation of data (gather data from different sources)
Provide access to large datasets that cannot be published explicitly (genomes, …)
Make biological data available in computer-readable format
Make data accessible for automated analysis
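Computer-readable access means a program, not a person, can request records. As one example, NCBI's E-utilities `efetch` service takes the query parameters `db`, `id`, `rettype` and `retmode` in a URL. The sketch below only builds such a URL with the standard library and does not send it; `efetch_url` is a hypothetical helper, and whether a particular accession resolves in a particular NCBI database is an assumption to check before relying on it.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build (but do not send) an efetch query URL for one or more record IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EFETCH + "?" + urlencode(params)


url = efetch_url("protein", ["P00940"])
print(url)
# Passing the URL to urllib.request.urlopen() would return FASTA text
# that a script can parse without manual copying.
```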
20. Data entry and quality control
Scientists (teams) deposit data directly.
Appointed curators add and update data.
Are erroneous data removed or marked?
Type and degree of error checking.
Consistency, redundancy, conflicts, updates.
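Two of the checks listed above, consistency and redundancy, can be automated for nucleotide submissions. The sketch below is a hypothetical quality-control pass, not the procedure of any particular database: it flags sequences containing characters outside the IUPAC nucleotide codes and reports exact duplicates of an earlier record. The record IDs and sequences are invented for illustration.

```python
import re

# IUPAC nucleotide codes (upper case); any other character is flagged.
VALID_NT = re.compile(r"[ACGTURYSWKMBDHVN]+")


def check_submissions(records):
    """Return (errors, duplicates) for a dict of {record_id: sequence}.

    errors:     IDs whose sequence contains non-IUPAC characters
    duplicates: (ID, earlier ID) pairs whose sequences are identical
    """
    errors, duplicates, seen = [], [], {}
    for rec_id, seq in records.items():
        seq = seq.upper()
        if not VALID_NT.fullmatch(seq):
            errors.append(rec_id)
        elif seq in seen:
            duplicates.append((rec_id, seen[seq]))
        else:
            seen[seq] = rec_id
    return errors, duplicates


# Hypothetical submissions.
submissions = {
    "rec1": "TATAGCCG",
    "rec2": "AGCTCCGATA",
    "rec3": "tatagccg",    # same sequence as rec1, lower case
    "rec4": "CCGATG#CAA",  # '#' is not a nucleotide code
}
print(check_submissions(submissions))
# (['rec4'], [('rec3', 'rec1')])
```

Whether a flagged record is then removed or only marked is the policy question raised on the slide; the sketch simply reports both lists.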
21. Availability
Publicly available, no restrictions.
Available, but with copyright.
Accessible, but not downloadable.
Academic, but not freely available.
Proprietary, commercial; possibly free for academics.
23. DATA RECORDS AS OF 2004
Record type | Count
Nucleotide records | 36,653,899
Protein sequences | 4,436,362
3D structures | 19,640
Interactions & complexes | 52,385
Human UniGene clusters | 118,517
Maps and complete genomes | 6,948
Taxonomy nodes | 283,121
Human dbSNP | 13,179,601
Human RefSeq records | 22,079
bp in human contigs > 5,000 kb (116) | 2,487,920,000
PubMed records | 12,570,540
OMIM records | 15,138
~11,000/day by different sources
24. REFERENCES
Jin Xiong, Essential Bioinformatics.
Dr. Jayaram Reddy, Centre for Molecular and Computational Biology, St. Joseph’s College, Bangalore (PDF).
Qi Sun, Computational Biology Service Unit, Cornell University (PDF).
francis@bioinformatics.ubc.ca
Wikipedia
www.ncbi.nlm.nih.gov/