BIOLOGICAL SEQUENCE DATABASES
NCBI
What is NCBI?
- National Center for Biotechnology Information
- Established in 1988
- Part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH)
- Major aim: maintaining public databases
- Develops software tools for sequence analysis and disseminates biomedical information
Roles of NCBI
1) Maintenance of biological databases, whether primary or secondary, including GenBank
2) Provision of data retrieval systems such as Entrez
3) Provision of computational resources for the analysis of GenBank data and other biological data
Kinds of databases
Primary databases
- Original submissions by the experimentalists who generated the data
- Content is controlled by the submitters
- Examples include GenBank, SNP and GEO
Secondary databases
- Built from data retrieved from primary databases
- Content is controlled by a third party (NCBI)
- Examples include RefSeq, RefSNP, NCBI Structure, Protein, etc.
NCBI homepage
NCBI TOOLS
- BLAST: Standard BLAST, MegaBLAST, PSI-BLAST, PHI-BLAST, RPS-BLAST, BLAST 2 Sequences
- Database retrieval tool: Entrez
- Specialized tools: ORF Finder, e-PCR, Spidey
- Sequence submission tool: BankIt
- Databases: nucleotide, literature, protein, expression and structure databases
Retrieval tool: Entrez
- Integrated database search and retrieval system
- Provides extensive links between and within database records
- Cross-references records across different databases
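Entrez can also be queried from a program through NCBI's E-utilities web interface, which the slides do not cover. The sketch below only builds ESearch and EFetch request URLs (it sends nothing over the network); the base URL and the `db`, `term`, `retmax`, `id`, `rettype` and `retmode` parameters are as I understand the E-utilities interface, while the helper names `esearch_url` and `efetch_url` are hypothetical.

```python
from urllib.parse import urlencode

# E-utilities base URL (assumed current; check NCBI's documentation before use).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL: search one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """Build an EFetch URL that retrieves the given record IDs as plain text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("nucleotide", "BRCA1[gene] AND human[orgn]"))
    # "12345" is a placeholder ID, not a real record chosen for this example.
    print(efetch_url("protein", ["12345"]))
```

A typical workflow runs ESearch first to get record IDs, then passes those IDs to EFetch; keeping URL construction separate from fetching makes the queries easy to inspect.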
Sequence submission to NCBI
- Databases are constantly updated with newly submitted sequences via sequence submission tools such as:
  - BankIt
  - Sequin
BankIt
- Web-based sequence submission tool
- Accessed from the NCBI home page via the GenBank link in the left sidebar
- Tool of choice for simple submissions
- Can also be used to update previously submitted information
Sequin
- Stand-alone sequence submission and updating tool
- Handles submissions of multiple sequences
- Provides increased capacity for submitting long sequences
- Supports multiple annotations
- Supports phylogenetic and population study submissions
BLAST
- Basic Local Alignment Search Tool
- Performs sequence similarity searches against a variety of sequence databases
- Examples: UniGene, Gene, MMDB, GEO
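BLAST's speed comes from a seed-and-extend heuristic: find short exact word matches between query and database sequence, then extend them into local alignments. The toy sketch below illustrates only that idea for DNA, assuming a +1/-1 match/mismatch score, a small word size `w` and an X-drop stopping rule; it leaves out BLAST's neighbourhood words, gapped extension and E-value statistics, and the function names are hypothetical.

```python
def find_seeds(query: str, subject: str, w: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return [
        (i, j)
        for j in range(len(subject) - w + 1)
        for i in index.get(subject[j:j + w], [])
    ]


def extend_seed(query: str, subject: str, i: int, j: int, w: int,
                match: int = 1, mismatch: int = -1, x_drop: int = 3):
    """Extend an exact seed without gaps, stopping once the running score
    falls more than x_drop below the best score seen (X-drop rule).

    Returns (score, q_start, q_end, s_start); q_end is exclusive.
    """
    best = w * match  # score of the exact seed itself
    # Extend to the right of the seed.
    run, right, k = best, 0, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        run += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > x_drop:
            break
    # Extend to the left, continuing from the best right-extended score.
    run, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        run += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if run > best:
            best, left = run, k
        elif best - run > x_drop:
            break
    return best, i - left, i + w + right, j - left


def best_ungapped_hit(query: str, subject: str, w: int = 4):
    """Highest-scoring ungapped alignment reachable from any seed, or None."""
    hits = [extend_seed(query, subject, i, j, w)
            for i, j in find_seeds(query, subject, w)]
    return max(hits, default=None)
```

Indexing the query's words once and then scanning the subject is what lets the search skip most of the database without ever computing a full dynamic-programming alignment.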
Kinds of BLAST
- blastn, blastp, blastx, tblastn, tblastx
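The five programs differ in what kind of query is compared against what kind of database, and in which side is translated into protein (in all six reading frames). The lookup below encodes that distinction; the table layout and the helper name `programs_for` are illustrative choices, not part of BLAST itself.

```python
# program -> (query type, database type, what is translated before comparison)
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide", "nothing"),
    "blastp": ("protein", "protein", "nothing"),
    "blastx": ("nucleotide", "protein", "query, in six frames"),
    "tblastn": ("protein", "nucleotide", "database, in six frames"),
    "tblastx": ("nucleotide", "nucleotide", "query and database, in six frames"),
}


def programs_for(query_type: str, db_type: str) -> list[str]:
    """List the BLAST programs that accept this query/database combination."""
    return sorted(
        name
        for name, (q, d, _) in BLAST_PROGRAMS.items()
        if q == query_type and d == db_type
    )


if __name__ == "__main__":
    for q in ("nucleotide", "protein"):
        for d in ("nucleotide", "protein"):
            print(f"{q} query vs {d} database: {', '.join(programs_for(q, d))}")
```

Note that a nucleotide-versus-nucleotide search has two options: blastn compares the DNA directly, while tblastx compares translations and is far slower.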
SPECIALIZED TOOLS
- There are many sequence analysis tools; three of them are described below:
1) ORF Finder
2) e-PCR
3) Spidey
ORF FINDER
- Open Reading Frame Finder
- Graphical analysis tool
- Finds all open reading frames in a user's sequence or in a sequence already in the databases
- Uses the standard or alternative genetic codes to analyse reading frames
- Packaged with Sequin
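The core of an ORF search is a scan of all six reading frames (three on each strand) for a start codon followed in frame by a stop codon. The sketch below assumes the standard genetic code only (ATG start; TAA, TAG, TGA stops), reports the longest ORF from the first ATG, ignores ORFs that run off the end, and uses a hypothetical minimum length `min_codons`; it is not ORF Finder's own code.

```python
# Standard genetic code only; an alternative code would change these sets.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Find ATG...stop ORFs in all six frames.

    Returns (strand, frame, start, end) with 0-based, end-exclusive
    coordinates on the strand that was scanned; the stop codon is included
    and counted toward min_codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == START:
                    start = pos  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=4)` finds the single ORF ATG AAA TTT TAG on the forward strand in frame 2.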
e-PCR
- Electronic polymerase chain reaction
- Searches for sequence-tagged sites (STSs)
- The whole template DNA is searched for STSs
- A newer search mode searches a query sequence against a sequence database
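e-PCR predicts where an STS lies in a template by finding its primer pair in the correct orientation, separated by roughly the expected product size. The sketch below shows that idea under simplifying assumptions: exact primer matches only, one strand only, and a hypothetical `margin` on the product size; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sts(template: str, fwd: str, rev: str, size: int,
             margin: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) of each predicted product on the given strand.

    Primers are written 5'->3', so the reverse primer appears on the template
    as its reverse complement. A hit is kept only if the product length
    (end - start) is within size +/- margin. Matches must be exact.
    """
    template, fwd = template.upper(), fwd.upper()
    rev_site = reverse_complement(rev.upper())
    hits = []
    start = template.find(fwd)
    while start != -1:
        # Only test reverse-primer positions that give an allowed product size.
        first = max(start + len(fwd), start + size - margin - len(rev_site))
        last = start + size + margin - len(rev_site)
        for site in range(first, last + 1):
            if template.startswith(rev_site, site):
                hits.append((start, site + len(rev_site)))
        start = template.find(fwd, start + 1)
    return hits
```

Restricting the reverse-primer search to the size window is what keeps the scan cheap: each forward-primer hit only triggers a handful of comparisons.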
Spidey
- An mRNA-to-genomic-sequence alignment tool
- Searches databases via BLAST
- Input: a single genomic sequence and a set of mRNA sequences in FASTA format
- Pseudogenes and paralogues are eliminated in this search, and the true gene is selected
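Spidey's input, like that of many NCBI tools, is FASTA text: a `>` definition line followed by sequence lines. The sketch below reads such text into (header, sequence) pairs; the function name `read_fasta` is hypothetical, and it assumes blank lines may be skipped and that sequence lines can simply be concatenated.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text.

    The header is the text after '>' on the definition line; the sequence
    lines that follow are stripped and joined into one string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    text = ">genomic example\nACGTACGT\nACGT\n>mRNA example\nACGTACGT\n"
    for name, seq in read_fasta(io.StringIO(text)):
        print(name, len(seq))
```

Yielding records one at a time keeps memory use low when a file holds many mRNA sequences.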
Databases of NCBI
- Nucleotide
- Literature
- Protein
- Gene expression
- Structure
- Chemical
Nucleotide database: GenBank
- NCBI's primary sequence database
- Comprehensive public database of nucleotide sequences
- Bibliographic support: records carry literature references
- Built from sequences submitted directly by authors, including ESTs
- GenBank, EMBL and DDBJ together form the International Nucleotide Sequence Database Collaboration (INSDC)
- The collaborating databases share data daily
HOMOLOGENE
- Automated detection of homologues
- Covers genes from completely sequenced eukaryotic genomes
- Proteins of the input organisms are compared using BLASTP
- Matches are grouped using a taxonomic tree
- Each match is analysed statistically, and orthologs and paralogs are identified
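A common, simpler way to call orthologs from all-against-all BLASTP scores is the reciprocal best hit rule: proteins a and b are paired if each is the other's top-scoring match. This is a swapped-in illustration, not HomoloGene's own pipeline, and the score tables and gene names in the test are made up.

```python
# (query, subject) -> alignment score, e.g. a BLASTP bit score.
Scores = dict[tuple[str, str], float]


def best_hits(scores: Scores) -> dict[str, str]:
    """Map each query protein to its highest-scoring subject (first wins ties)."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][0]:
            best[query] = (score, subject)
    return {q: s for q, (_, s) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Requiring the match in both directions filters out many paralogs, which is the same concern the statistical analysis on the slide addresses.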
dbSNP
- Database of single nucleotide polymorphisms (SNPs)
- Also includes short deletion and insertion polymorphisms
- SNPs can be mapped onto 3D structures via Cn3D and MMDB
- Functional variants can be matched with OMIM records
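dbSNP holds SNPs as well as short insertion and deletion polymorphisms. The sketch below sorts a reference/alternate allele pair into those kinds, assuming VCF-style alleles in which an indel keeps one shared leading base (for example `AT` to `A` is a deletion); the function name and category labels are illustrative, not dbSNP's own variant classes.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a ref/alt allele pair as 'SNV', 'MNV', 'deletion',
    'insertion' or 'complex' (VCF-style, left-anchored indels assumed)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == len(alt):
        # Same length: count positions that actually differ.
        diffs = sum(r != a for r, a in zip(ref, alt))
        return "SNV" if diffs == 1 else "MNV"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"  # bases after the shared anchor were removed
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"  # bases after the shared anchor were added
    return "complex"
```

Counting differing positions for equal-length alleles means a padded change such as `AC` to `AT` is still reported as a single-nucleotide variant.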
Literature database: PMC
- PubMed Central
- Digital archive of peer-reviewed life-science journals
- Contains a very large collection of full-text journal articles
- Full text is available immediately or within 12 months of publication
Protein database
- Entrez Protein: NCBI's protein sequence database
- Records from other databases, such as PDB and Swiss-Prot, are cross-searched
- Taxonomic relationships
- CDD (Conserved Domain Database)
Gene expression database
- Distribution and regulation of transcriptional products
- In normal and abnormal cell types
- Many techniques have been developed for surveying genome-wide transcript expression
SAGEmap
- Serial Analysis of Gene Expression map
- Gene expression data analysis
- Tag-to-gene mapping: assigns SAGE tags to gene clusters or to a single gene
- A reciprocal gene-to-tag map is also available
- Updated weekly
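A tag-to-gene map can be built by extracting the expected SAGE tag from each known transcript and indexing it both ways. The sketch assumes classic SAGE conventions (NlaIII anchoring enzyme, site CATG; a 10-base tag taken just 3' of the 3'-most site); these constants and the function names are assumptions for illustration, not SAGEmap's actual pipeline.

```python
from typing import Optional

ANCHOR = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)
TAG_LEN = 10     # classic SAGE tag length (assumption)


def extract_tag(transcript: str) -> Optional[str]:
    """Return the TAG_LEN bases just 3' of the 3'-most CATG, or None."""
    seq = transcript.upper()
    site = seq.rfind(ANCHOR)
    if site == -1:
        return None
    tag = seq[site + len(ANCHOR): site + len(ANCHOR) + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None  # too close to the 3' end


def build_maps(transcripts: dict[str, str]):
    """Build the tag->genes map and the reciprocal gene->tag map."""
    tag_to_genes: dict[str, list[str]] = {}
    gene_to_tag: dict[str, str] = {}
    for gene, seq in transcripts.items():
        tag = extract_tag(seq)
        if tag is None:
            continue  # no usable tag for this transcript
        gene_to_tag[gene] = tag
        tag_to_genes.setdefault(tag, []).append(gene)
    return tag_to_genes, gene_to_tag
```

Because two transcripts can yield the same tag, the tag-to-gene direction maps to a list, matching the slide's point that a tag may point to a gene cluster or a single gene.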
Structural database: MMDB
- Molecular Modeling Database (MMDB)
- 3D macromolecular structures
- Structures determined experimentally by X-ray diffraction (XRD) and NMR
- Helps trace the evolutionary history of function
- Shows relationships between macromolecules
Chemical database: PubChem
- Database of chemical molecules
- Freely accessible through a web user interface
- Chemical structures
- Diagnostic and therapeutic agents
- Molecular mass below 2000 u
- Bridge between macromolecular genomics and the small organic molecules of cellular metabolism
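The 2000 u figure refers to molecular mass, which can be computed from a molecular formula by summing atomic weights. The sketch handles only flat formulas such as aspirin's C9H8O4 with a small table of rounded standard atomic weights (an assumption of the sketch); it is not how PubChem computes its values, and the helper names are hypothetical.

```python
import re

# Rounded conventional atomic weights for a few common elements (assumed table).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "P": 30.974, "S": 32.06, "Cl": 35.45,
}
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_mass(formula: str) -> float:
    """Average molecular mass in u for a flat formula such as 'C9H8O4'.

    Parentheses, hydrates and charges are not supported.
    """
    total, pos = 0.0, 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # skipped characters the pattern cannot read
            break
        element, count = m.group(1), int(m.group(2) or "1")
        total += ATOMIC_MASS[element] * count  # KeyError for elements not in the table
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula at {formula[pos:]!r}")
    return total


def below_mass_limit(formula: str, limit: float = 2000.0) -> bool:
    """True if the molecular mass is below the limit (default 2000 u)."""
    return molecular_mass(formula) < limit
```

For aspirin, C9H8O4, this gives about 180.16 u, well below the 2000 u figure.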
PubChem examples (screenshots): Display settings; Aspirin
Thanks