Powerpoint Templates
Page 1
PRIMARY, SECONDARY, TERTIARY BIOLOGICAL DATABASE
By
KAUSHAL KUMAR SAHU
Assistant Professor (Ad Hoc)
Department of Biotechnology
Govt. Digvijay Autonomous P. G. College
Raj-Nandgaon (C. G.)
Synopsis
Introduction
Biological database
Types:
1. Primary database
• Nucleic acid sequence database: GenBank, EMBL, DDBJ
• Protein sequence database: PIR, SWISS-PROT, TrEMBL
2. Secondary database
• PRINTS
• PROSITE
• PROFILES
• BLOCKS
• IDENTIFY
3. Composite database
• Non-redundant database (NRDB)
• Non-redundant protein sequence database (OWL)
• SWISS-PROT + TrEMBL
• MIPSX
Important database search tools
Applications
INTRODUCTION
DATABASE
• A convenient method of organizing a vast amount of information.
• Allows for proper storing, searching and retrieving of data.
• Before data can be analyzed, they need to be assembled into central, shareable resources.
• Different database types:
• Depend on the nature of the information stored (sequences, 2D gel or 3D structure images)
• Depend on the manner of storage (flat files, tables in a relational database, etc.)
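The storing, searching and retrieving described above can be sketched with a small relational table. This is only an illustration using Python's built-in sqlite3 module; the table name, column names and the three records are hypothetical toy values, not entries from any real biological database.

```python
import sqlite3

# Hypothetical toy records: (accession, organism, sequence).
# These are illustrative values, not real database entries.
RECORDS = [
    ("X0001", "Escherichia coli", "ATGGCC"),
    ("X0002", "Homo sapiens", "ATGAAATGA"),
    ("X0003", "Homo sapiens", "GGGCCC"),
]


def build_db(records):
    """Store the records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def find_by_organism(conn, organism):
    """Retrieve the accession numbers stored for one organism."""
    rows = conn.execute(
        "SELECT accession FROM entry WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in rows]


conn = build_db(RECORDS)
print(find_by_organism(conn, "Homo sapiens"))
```

A flat file would hold the same information as lines of text; the relational form mainly makes searching by a field such as organism a single query.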
BIOLOGICAL DATABASE
• A biological database is a library of life science information collected from scientific experiments, published literature and computational analysis. As far as possible, a particular type of information should be available in one single place, and the biological data should be available in computer-readable form.
• They contain information from research areas including genomics,
proteomics, metabolomics, microarray gene expression, and
phylogenetics.
Biological Databases
Types of biological databases and the information they contain
DATABASE TYPE                                     INFORMATION CONTAINED
Bibliographic databases                           Literature
Taxonomic databases                               Classification
Nucleic acid databases                            DNA information
Genomic databases                                 Gene-level information
Protein databases                                 Protein information
Protein families, domains and functional sites    Classification of proteins and identifying domains
Enzymes/metabolic pathways                        Metabolic pathways
Types of Biological Databases
• Primary database:
• Primary sequence databases store biomolecular sequences (protein or nucleic acid) and associated annotation information (organism, species, function, mutations linked to particular diseases, functional/structural patterns, bibliographic references, etc.).
• Primary database tools are effective for identifying sequence similarities, but analysis of the output is sometimes difficult, and they cannot always answer the more sophisticated questions of sequence analysis.
Primary sequence database
• Nucleic acid sequence database: GenBank, EMBL, DDBJ
• Protein sequence database: PIR, SWISS-PROT, TrEMBL
Nucleic acid sequence database
GenBank
• The general term "gene bank" refers to any system by which the genetic composition of some population is identified and stored.
• Set up in 1979 at LANL (Los Alamos).
• Web server: http://www.ncbi.nlm.nih.gov
• GenBank is the main nucleotide sequence database held by the National Center for Biotechnology Information (NCBI).
• GenBank files contain information such as accession numbers, gene names, phylogenetic classification and references to published literature.
FIG: GenBank file format
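As a rough illustration of how the fields of a GenBank-style flat file can be read, the sketch below parses a tiny record. The record itself (TOY0001 and its sequence) is invented for this example, and the parser handles only the simple layout shown here, assuming keywords occupy the first 12 columns; real GenBank entries contain many more fields (such as FEATURES and REFERENCE blocks) that this sketch does not interpret.

```python
# A hypothetical toy record laid out like a GenBank flat file.
# It is made up for illustration and is not a real database entry.
TOY_RECORD = "\n".join([
    "LOCUS       TOY0001                   12 bp    DNA     linear   SYN 01-JAN-2000",
    "DEFINITION  Hypothetical toy record for illustration.",
    "ACCESSION   TOY0001",
    "SOURCE      synthetic construct",
    "  ORGANISM  synthetic construct",
    "ORIGIN",
    "        1 atggccaaat ga",
    "//",
])


def parse_genbank(text):
    """Collect keyword fields and the sequence from one flat-file record."""
    fields = {}
    sequence = []
    last_key = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if in_origin:
            # Sequence lines mix position numbers, spaces and bases.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key == "ORIGIN":
            in_origin = True
        elif key:
            fields[key] = value
            last_key = key
        elif last_key and value:
            # A line with an empty keyword column continues the previous field.
            fields[last_key] += " " + value
    fields["SEQUENCE"] = "".join(sequence).upper()
    return fields


record = parse_genbank(TOY_RECORD)
print(record["ACCESSION"], record["ORGANISM"], record["SEQUENCE"])
```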
European Molecular Biology Laboratory
(EMBL)
• Established in 1978 at Heidelberg.
• Place: Heidelberg, Germany.
• Site: http://www.embl-heidelberg
• The EMBL nucleotide sequence database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications and submitted directly by researchers and sequencing groups.
• Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ).
Homepage of EMBL
DNA Data Bank of Japan (DDBJ)
• It is located in Japan.
• Sites: http://www.ddbj.nig.ac.jp
• http://biodatabase.org/index.php/DDBJ
• Established: 1984 at the National Institute of Genetics (NIG) in Mishima, Japan.
• DDBJ has been functioning as an international nucleotide sequence database.
Protein sequence database
• SWISS-PROT protein sequence database
• SWISS-PROT was created in 1986 at the Department of Medical Biochemistry of the University of Geneva.
• From 1987, the European Molecular Biology Laboratory and the Swiss Institute of Bioinformatics (SIB) worked in collaboration, as equal partners, to develop and maintain this highly annotated repository of protein sequences.
• It provides high-quality annotation with minimum redundancy.
• The structure of a SWISS-PROT entry is similar to the EMBL nucleotide sequence database format.
• The format is convenient for humans to read and is used by several computer programs for analysis.
Translated EMBL (TrEMBL)
• It was created in 1996 to fill the gap between the flow of genomic data and annotated protein sequences.
• TrEMBL contains computer-annotated records generated by translating the coding sequences (CDS) available in the EMBL nucleotide sequence database.
• It does not contain translations of CDS that are already available in SWISS-PROT, and it acts as a computer-annotated supplement to SWISS-PROT.
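The idea of translating coding sequences and keeping only those not already present in a curated set can be sketched as follows. This is a simplified illustration, not the actual TrEMBL production pipeline: it uses the standard genetic code, and the function names, the example CDS strings and the curated set are all hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def build_supplement(cds_list, curated_proteins):
    """Keep only translations that are not already in the curated set."""
    supplement = []
    for cds in cds_list:
        protein = translate_cds(cds)
        if protein not in curated_proteins:
            supplement.append(protein)
    return supplement


# Hypothetical toy inputs, for illustration only.
curated = {"MAK"}
candidates = ["ATGGCCAAATGA", "ATGTGGTAA"]
print(build_supplement(candidates, curated))
```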
Protein Information Resource
(PIR)
• PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information.
• The database is split into four sections, PIR1 to PIR4.
Secondary databases:
• These databases contain additional information derived from the analysis of data available in primary repositories.
• Secondary or pattern databases: PROFILES, PRINTS, Pfam, IDENTIFY, BLOCKS, PROSITE
1. PROSITE:
• It is a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences.
• It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify to which known family of proteins (if any) a new sequence belongs.
• PROSITE was the first secondary database to be developed.
• Maintained collaboratively at the Swiss Institute of Bioinformatics.
• Site: http://ftp.expasy.ch
• It includes protein pattern motifs indicative of a protein's function, which are widely used for function prediction studies, cellular localization annotation and sequence classification.
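To show how a pattern can be matched against a new sequence, the sketch below converts a pattern written in PROSITE-style syntax into a regular expression and scans a sequence with it. It covers only a simplified subset of the syntax (single residues, x, [..], {..}, repeat counts and the < and > anchors). The pattern used is the N-glycosylation site pattern N-{P}-[ST]-{P}; the function names and the test sequence are hypothetical.

```python
import re

# One pattern element: optional "<", a core, an optional repeat, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a simplified PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = match.groups()
        if core == "x":                 # x means any residue
            core = "."
        elif core.startswith("{"):      # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        piece = core + ("{" + repeat + "}" if repeat else "")
        parts.append(("^" if start else "") + piece + ("$" if end else ""))
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("N-{P}-[ST]-{P}", "MKNGSAANPTL"))  # hypothetical sequence
```

In the toy sequence, the second N is followed by P, which the pattern excludes, so only the first site is reported.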
Home page of PROSITE
2. PRINTS:
– A different approach to pattern recognition, termed
"fingerprinting" is used by this database.
– Diagnostically, it makes sense to use many, or all, of
the conserved regions to build a family signature.
Direct PRINTS access:
• By accession number
• By PRINTS code
• By database code
• By text
• By sequence
• By title
• By number of motifs
• By author
• By query language
3. BLOCKS
• Blocks are multiply aligned ungapped segments corresponding to
the most highly conserved regions of proteins.
• The BLIMPS (BLocks IMProved Searcher) program searches the BLOCKS database.
4. Pfam
• Pfam creates protein families, with entries built around protein domains.
• These are thus particularly useful when analyzing multidomain proteins.
• The biggest drawback of Pfam is its lack of biological information (annotation) on the protein families.
Composite database
• A composite database combines information from various primary databases, making it convenient to search for the desired information without querying each of these primary databases.
• Composite protein sequence databases: non-redundant database (NRDB), non-redundant protein sequence database (OWL), MIPSX, SWISS-PROT + TrEMBL
OWL:
• A composite protein sequence database.
• OWL performs fast similarity searches because its non-redundancy makes it highly compact.
Non-redundant database (NRDB):
• It is a composite database formed from PDB sequences, SWISS-PROT, PIR and TrEMBL.
• It contains non-identical sequences and is hence bigger than OWL but less efficient to search.
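The basic idea of a non-redundant composite, merging several sources and collapsing identical sequences into one entry, can be sketched as below. This is only a minimal illustration of exact-duplicate removal; the real OWL and NRDB build procedures use their own redundancy criteria, and the entry identifiers and sequences here are hypothetical.

```python
def merge_non_redundant(sources):
    """Merge entries from several sources, keeping each sequence only once.

    `sources` maps a source name to a list of (entry_id, sequence) pairs.
    The result maps each distinct sequence to the list of entries it came from.
    """
    merged = {}
    for source_name, entries in sources.items():
        for entry_id, sequence in entries:
            key = sequence.upper()
            merged.setdefault(key, []).append(f"{source_name}:{entry_id}")
    return merged


# Hypothetical toy entries, for illustration only.
toy_sources = {
    "SWISS-PROT": [("sp1", "MAK"), ("sp2", "MWL")],
    "PIR": [("pir1", "MAK")],
    "TrEMBL": [("tr1", "MKK")],
}

composite = merge_non_redundant(toy_sources)
for sequence, origins in composite.items():
    print(sequence, origins)
```

Four input entries collapse to three distinct sequences, which is the sense in which a non-redundant composite is more compact to search than its sources taken together.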
MIPSX
• A merged database.
• Produced at the Max Planck Institute in Martinsried. The database contains information from several resources.
SWISS-PROT+TrEMBL
• Combines SWISS-PROT and TrEMBL into a single resource that contains fewer errors.
• Not truly non-redundant.
Important database search tools:
SEARCH TOOL                                  FUNCTION PROVIDED
BLAST (Basic Local Alignment Search Tool)    Used to analyze sequence information and detect homologous sequences.
ENTREZ                                       Used to access literature, sequence and structure databases.
DNAPLOT                                      Sequence alignment tool.
LOCUSLINK                                    Accessing information on homologous genes.
STRUCTURE                                    Supports the Molecular Modeling Database (MMDB) and software tools for structure analysis.
TAXONOMY BROWSER                             Taxonomic classification of various species as well as genetic information.
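To give a feel for what local alignment means in a tool such as BLAST, the sketch below scores the best local alignment between two sequences with the classic Smith-Waterman dynamic-programming method. This is not how BLAST itself works (BLAST uses a faster heuristic), and the scoring values (match 2, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below zero, so an alignment can start anywhere.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Hypothetical toy sequences sharing the short region "ACG".
print(local_alignment_score("TTACGTTT", "GGACGGG"))
print(local_alignment_score("ACGT", "ACGT"))
```

A higher score indicates a longer or better-conserved shared region, which is the signal used to flag possibly homologous sequences.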
Applications
• Protein sequence
• Determination of macromolecular structure
• Molecular evolution
• Biological databases in medicine
• Biological databases in agriculture
• Drug development
• Sequence alignment
• Evolutionary studies
Conclusion:
• Biological databases represent an invaluable resource in
support of biological research.
• Access to biological databases is so important that today virtually
every molecular biological project starts and ends with querying
biological databases.
