Primary, secondary, tertiary biological database

Page 1
PRIMARY, SECONDARY, TERTIARY BIOLOGICAL DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon ( C. G. )

Page 2
Synopsis
Introduction
Biological database
Types:
1. Primary database
• Nucleic acid sequence database: GenBank, EMBL, DDBJ
• Protein sequence database: PIR, SWISS-PROT, TrEMBL
2. Secondary database
• PRINTS
• PROSITE
• PROFILES
• BLOCKS
• IDENTIFY
3. Composite database
• Non-redundant database (NRDB)
• Non-redundant protein sequence database (OWL)
• SWISS-PROT + TrEMBL
• MIPSX
Important database search tool
Application

Page 3
INTRODUCTION
DATABASE
• A convenient way to organise vast amounts of information.
• Allows proper storing, searching and retrieving of data.
• Before data can be analysed, they must be assembled into central,
shareable resources.
• Different database types exist, depending on:
• The nature of the information stored (sequences, 2D gel or
3D structure images)
• The manner of storage (flat files, tables in a relational database, etc.)
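
The two manners of storage named above can be contrasted with a small Python sketch. It uses the standard sqlite3 module; the flat-file layout, the table and column names, and both records are made up for illustration and do not come from any real database.

```python
import sqlite3

# Hypothetical flat file: one record per line, fields separated by '|'.
FLAT_FILE = """\
X00001|Homo sapiens|ATGGCCTAA
X00002|Mus musculus|ATGAAATAG
"""


def load_into_table(flat_text):
    """Parse the flat-file text and store each record as a row in a table."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing on disk
    conn.execute(
        "CREATE TABLE sequences "
        "(accession TEXT PRIMARY KEY, organism TEXT, seq TEXT)"
    )
    rows = [line.split("|") for line in flat_text.splitlines() if line]
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    return conn


conn = load_into_table(FLAT_FILE)
# Retrieval becomes a query instead of a scan through the text file.
hit = conn.execute(
    "SELECT accession, seq FROM sequences WHERE organism = ?",
    ("Mus musculus",),
).fetchone()
print(hit)
```

The same records are held in both forms; the relational form simply makes searching and retrieving by a chosen field more direct.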

Page 4
BIOLOGICAL DATABASE
• A biological database is a library of life science information
collected from scientific experiments, published literature and
computational analysis.
• As far as possible, a particular type of information should be
available in one single place, and the data should be available in
computer-readable form.
• These databases contain information from research areas including
genomics, proteomics, metabolomics, microarray gene expression, and
phylogenetics.

Page 5
Biological Databases
Types of biological data and the information they contain
• Bibliographic databases: literature
• Taxonomic databases: classification
• Nucleic acid databases: DNA information
• Genomic databases: gene-level information
• Protein databases: protein information
• Protein families, domains and functional sites: classification of
proteins and identification of domains
• Enzymes / metabolic pathways: metabolic pathways

Page 6
Types of Biological Databases
• Primary database:
• A primary sequence database stores biomolecular sequences (protein
or nucleic acid) together with associated annotation (organism,
species, function, mutations linked to particular diseases,
functional/structural patterns, bibliography, etc.).
• Primary database search tools are effective for identifying sequence
similarities, but their output is sometimes difficult to analyse and
cannot always answer the more sophisticated questions of
sequence analysis.

Page 7
Primary sequence databases
• Nucleic acid sequence databases: GenBank, EMBL, DDBJ
• Protein sequence databases: PIR, SWISS-PROT, TrEMBL

Page 8
Nucleic acid sequence database
GenBank
• In general usage, the term "gene bank" refers to any system by which
the genetic composition of some population is identified
and stored.
• Set up in 1979 at LANL (Los Alamos National Laboratory).
• Web server: http://www.ncbi.nlm.nih.gov
• GenBank is the main nucleotide sequence database held by the
National Center for Biotechnology Information (NCBI).
• GenBank files contain information such as accession numbers and
gene names, phylogenetic classification and references to
published literature.
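
As a rough illustration of how such a file can be read, the Python sketch below pulls a few keyword fields and the sequence out of a GenBank-style flat-file record. The record is invented for this example (accession X00000 is not a real entry), and the parser is a simplification: it assumes one line per field and ignores continuation lines and the FEATURES table.

```python
# Made-up GenBank-style record, for illustration only.
GENBANK_TEXT = """\
LOCUS       EXAMPLE01  18 bp  DNA  linear
DEFINITION  Made-up record for illustration only.
ACCESSION   X00000
SOURCE      synthetic construct
  ORGANISM  synthetic construct
REFERENCE   1
  TITLE     Hypothetical reference title
ORIGIN
        1 atggccaaag gtacctaa
//
"""


def parse_genbank(text):
    """Collect keyword fields and the sequence from one simplified record."""
    fields = {}
    bases = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if not line.strip():
            continue
        if in_origin:
            # Sequence lines carry a position number; keep only the letters.
            bases.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ORIGIN"):
            in_origin = True
        else:
            # First word is the keyword, the rest of the line is its value.
            keyword, _, value = line.strip().partition(" ")
            fields[keyword] = value.strip()
    fields["sequence"] = "".join(bases).upper()
    return fields


record = parse_genbank(GENBANK_TEXT)
print(record["ACCESSION"], record["ORGANISM"], record["sequence"])
```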

Page 13
FIG: GenBank file format

Page 14
European Molecular Biology Laboratory (EMBL)
• Established in 1978 at Heidelberg.
• Place: Heidelberg, Germany.
• Site: http://www.embl-heidelberg
• The EMBL nucleotide sequence database is a comprehensive
database of DNA and RNA sequences collected from the scientific
literature and patent applications, and submitted directly by
researchers and sequencing groups.
• Data collection is done in collaboration with GenBank (USA) and
the DNA Data Bank of Japan (DDBJ).

Page 15
Homepage of EMBL

Page 16
DNA Data Bank of Japan (DDBJ)
• It is located in Japan.
• Sites: http://www.ddbj.nig.ac.jp
• http://biodatabase.org/index.php/DDBJ
• Established in 1984 at the National Institute of Genetics (NIG) in
Mishima, Japan.
• DDBJ functions as an international nucleotide sequence database.

Page 17
Protein sequence database
• SWISS-PROT protein sequence database
• SWISS-PROT was created in 1986 at the Department of Medical
Biochemistry of the University of Geneva.
• Since 1987 it has been developed and maintained collaboratively, as
equal partners, by the European Molecular Biology Laboratory (EMBL)
and the Swiss Institute of Bioinformatics (SIB) as a highly annotated
repository of protein sequences.
• It provides high-quality annotation with minimum redundancy.
• The structure of a SWISS-PROT entry is similar to the EMBL nucleotide
sequence database format.
• The format is convenient for humans to read and is used by several
computer programs for analysis.
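
The reason the format suits both people and programs is that every line starts with a two-letter line code (for example ID, AC, DE, OS, SQ), followed by the data, and an entry ends with "//". A minimal Python sketch of reading such an entry is given below. The entry itself is invented for illustration (P00000 is not meant to be a real accession), and only a handful of line codes are shown.

```python
from collections import defaultdict

# Made-up SWISS-PROT-style entry, for illustration only.
ENTRY_TEXT = """\
ID   EXAMPLE_HUMAN           Reviewed;           6 AA.
AC   P00000;
DE   Made-up protein for illustration only.
OS   Homo sapiens (Human).
SQ   SEQUENCE   6 AA;
     MKTAYI
//
"""


def parse_entry(text):
    """Group the data of one entry by its two-letter line code."""
    by_code = defaultdict(list)
    residues = []
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, data = line[:2], line[5:].strip()
        if code == "  ":
            # Sequence lines have a blank line code; drop spacing between blocks.
            residues.append(data.replace(" ", ""))
        elif code.strip():
            by_code[code].append(data)
    entry = {code: " ".join(parts) for code, parts in by_code.items()}
    entry["sequence"] = "".join(residues)
    return entry


entry = parse_entry(ENTRY_TEXT)
print(entry["AC"], entry["OS"], entry["sequence"])
```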

Page 18
Translated EMBL (TrEMBL)
• It was created in 1996 to fill the gap between the flow of genomic
data and annotated protein sequences.
• TrEMBL contains computer-annotated records generated by
translating coding sequences (CDS) available in the EMBL nucleotide
sequence database.
• It does not contain translations of those CDS that are already
available in SWISS-PROT, and acts as a computer-annotated
supplement to SWISS-PROT.
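
The translation step can be sketched in Python as follows. This is only an illustration of the idea, not the actual TrEMBL pipeline: it uses the standard genetic code, reads a CDS in frame from its first base, and decides what is "already in SWISS-PROT" by comparing sequences against a small made-up set, which is an assumption of this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def supplement(cds_list, curated_proteins):
    """Keep only translations that are not in the curated set."""
    translated = (translate_cds(cds) for cds in cds_list)
    return [p for p in translated if p not in curated_proteins]


# Hypothetical inputs: two CDS, one of which is already "curated".
print(supplement(["ATGGCCTAA", "ATGAAATAG"], {"MA"}))
```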

Page 19
Protein Information Resource (PIR)
• PIR was established in 1984 by the National Biomedical Research
Foundation (NBRF) as a resource to assist researchers in the
identification and interpretation of protein sequence information.
• The database is split into four sections PIR1 to PIR4

Page 20
Secondary databases:
• These databases contain additional information derived from the
analysis of data available in primary repositories.
Secondary (or pattern) databases:
• PROFILES
• PRINTS
• Pfam
• IDENTIFY
• BLOCKS
• PROSITE

Page 21
1. PROSITE:
• It provides a method of determining the function of
uncharacterized proteins translated from genomic or cDNA sequences.
• It consists of a database of biologically significant sites, patterns and
profiles that help to reliably identify to which known family of proteins
(if any) a new sequence belongs.
• PROSITE was the first secondary database to be developed.
• Maintained collaboratively at the Swiss Institute of Bioinformatics.
• Site: http://ftp.expasy.ch
• It includes protein pattern motifs indicative of a protein's function,
which are widely used for function prediction studies, cellular
localization annotation, and sequence classification.
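
To show how a pattern motif is used to scan a sequence, the sketch below converts a PROSITE-style pattern into a Python regular expression. It covers the common pattern elements only (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, and the < and > anchors). The example pattern N-{P}-[ST]-{P} is the well-known N-glycosylation site pattern; the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:  # repetition such as x(2,4)
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element  # a single residue or an [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return the 0-based start positions of non-overlapping matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer(regex, sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_sites("N-{P}-[ST]-{P}.", "AANGTAA"))
```

Short patterns like this one match by chance quite often, which is one reason a pattern hit is treated as a hint about function rather than as proof.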

Page 22
Home page of PROSITE

Page 24
2. PRINTS:
– This database uses a different approach to pattern recognition,
termed "fingerprinting".
– Diagnostically, it makes sense to use many, or all, of
the conserved regions to build a family signature.
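
The idea of a signature built from several conserved regions can be sketched in Python. This is a deliberate simplification: each motif is written here as a regular expression and the motifs must occur in order, whereas the real PRINTS database stores aligned motifs and scores matches against them. The three motifs and both sequences are hypothetical.

```python
import re

# Hypothetical fingerprint: three conserved motifs expected in this order.
FINGERPRINT = ["GKT", "D[ED]AH", "SAT"]


def match_fingerprint(sequence, motifs):
    """Return the start position of each motif in order, or None if absent."""
    positions = []
    search_from = 0
    for motif in motifs:
        found = re.compile(motif).search(sequence, search_from)
        if found is None:
            positions.append(None)
        else:
            positions.append(found.start())
            search_from = found.end()  # later motifs must lie downstream
    return positions


full = match_fingerprint("MAGKTLLDEAHQQSATK", FINGERPRINT)
partial = match_fingerprint("MAGKTLLQQSATK", FINGERPRINT)
print(full, partial)
```

A sequence that matches every motif is a stronger family candidate than one matching only some of them, which is the diagnostic advantage of using many conserved regions rather than a single pattern.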

Page 25
Direct PRINTS access:
• By accession number
• By PRINTS code
• By database code
• By text
• By sequence
• By title
• By number of motifs
• By author
• By query language

Page 26
3. BLOCKS
• Blocks are multiply aligned ungapped segments corresponding to
the most highly conserved regions of proteins.
• The BLIMPS (BLocks IMProved Searcher) program searches the
BLOCKS database.
4. Pfam
• A database of protein families, built around protein domains.
• Its entries are thus particularly useful when analyzing multidomain
proteins.
• The biggest drawback of Pfam is its lack of biological information
(annotation) of the protein families.

Page 27
Composite database
• A composite database combines information from various primary
databases, making it convenient to search for the desired
information without querying each of these primary databases.
Composite protein sequence databases:
• Non-redundant database (NRDB)
• Non-redundant protein sequence database (OWL)
• MIPSX
• SWISS-PROT + TrEMBL

Page 28
OWL:
• A composite protein sequence database.
• OWL supports fast similarity searches because it is non-redundant,
which makes it highly compact.
Non-redundant database (NRDB):
• It is a composite database formed from PDB sequences,
SWISS-PROT, PIR and TrEMBL.
• It contains all non-identical sequences and is hence bigger than OWL,
but less efficient to search.
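
How a composite database stays non-redundant can be sketched in Python: records from several sources are merged, and identical sequences are stored only once while remembering every source entry they came from. The source names follow the slide, but the identifiers and sequences are made up, and real composite databases apply their own, more involved merging rules.

```python
def merge_non_redundant(sources):
    """Merge {database: {identifier: sequence}} into {sequence: [origins]}."""
    merged = {}
    for db_name, records in sources.items():
        for identifier, sequence in records.items():
            # An identical sequence is kept once; only its origin is added.
            merged.setdefault(sequence, []).append(f"{db_name}:{identifier}")
    return merged


# Hypothetical source databases with one sequence in common.
SOURCES = {
    "SWISS-PROT": {"a1": "MKTAYI", "a2": "MSLLNV"},
    "PIR": {"b7": "MKTAYI", "b8": "MGGHWQ"},
}

merged = merge_non_redundant(SOURCES)
print(len(merged), merged["MKTAYI"])
```

Four source records collapse to three stored sequences here, so a single search covers both sources without comparing the query against the duplicate twice.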

Page 29
MIPSX
• A merged database.
• Produced at the Max-Planck Institute in Martinsried; it contains
information merged from several resources.
SWISS-PROT + TrEMBL
• The combination of SWISS-PROT and TrEMBL provides a resource
that contains fewer errors.
• Not truly non-redundant.

Page 30
Important database search tools:
Search tool: function provided
• BLAST (Basic Local Alignment Search Tool): used to analyze sequence
information and detect homologous sequences.
• ENTREZ: used to access literature, sequence and structure databases.
• DNAPLOT: sequence alignment tool.
• LOCUS LINK: access to information on homologous genes.
• STRUCTURE: supports the Molecular Modeling Database (MMDB) and
software tools for structure analysis.
• TAXONOMY BROWSER: taxonomic classification of various species, as
well as genetic information.

Page 31
Applications
• Protein sequence
• Determination of macromolecular structure
• Molecular evolution
• Biological databases in medicine
• Biological databases in agriculture
• Drug development
• Sequence alignment
• Evolutionary studies

Page 32
Conclusion:
• Biological databases represent an invaluable resource in
support of biological research.
• Access to biological databases is so important that today virtually
every molecular biological project starts and ends with querying
biological databases.

