Nucleic acid database
Aims:
• Meet the growing need to store and communicate large datasets.
• Make biological data available to scientists.
• Provide biological data in computer-readable form.
• Make the data easier to access.
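Classification of databases:
• Primary database: stores experimentally derived data, such as sequences, as submitted by researchers.
• Composite database: merges several primary databases into a single, usually non-redundant, resource.
• Secondary database: holds information derived from primary data (e.g. curated annotations, motifs, patterns).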
Forms:
• Technical design
– Flat files
– Relational database (SQL)
– Exchange/publication technologies (FTP, HTML, CORBA, XML, ...)
Availability:
• Publicly available, no restrictions
• Available, but with copyright
• Accessible, but not downloadable
• Academic, but not freely available
• Proprietary, commercial; possibly free for academics
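Terminology (fields of a GenBank flat-file entry):
• LOCUS: first line of the entry, giving
– size of sequence (in base pairs)
– nature of molecule (e.g. DNA or RNA)
– topology (linear or circular)
• DEFINITION: brief description of the sequence (e.g. organism and gene name)
• ACCESSION: unique identifier of the record, shared by GenBank, EMBL and DDBJ
• VERSION: accession number with a version suffix (e.g. ".1") that increases each time the sequence changes; older records also give a GI number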
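The three LOCUS sub-fields above all sit on the first line of a GenBank entry. A rough token-based parser could look like the sketch below, assuming the length is the token before `bp`, the molecule type follows it and an optional `linear`/`circular` token comes next. The function name and the sample lines are made up for illustration, and real records have layout variants this does not cover.

```python
def parse_locus(line):
    """Pull name, length, molecule type and topology out of a LOCUS line.

    Token-based sketch: assumes the length is the token right before
    'bp', the molecule type comes right after 'bp', and an optional
    'linear'/'circular' token follows the molecule type.
    """
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    i = tokens.index("bp")  # raises ValueError if there is no 'bp' token
    after = tokens[i + 1:]
    topology = None
    if len(after) > 1 and after[1] in ("linear", "circular"):
        topology = after[1]
    return {
        "name": tokens[1],
        "length_bp": int(tokens[i - 1]),
        "molecule": after[0] if after else None,
        "topology": topology,
    }


if __name__ == "__main__":
    # Invented example line; spacing is approximate, not exact GenBank columns.
    print(parse_locus("LOCUS       DEMO2     5400 bp    DNA     circular BCT 01-JAN-2020"))
```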
Terminology:
• KEYWORDS: list of terms related to the entry; can be used for keyword searches for related data
• SOURCE: common name of the organism
• ORGANISM: formal scientific name of the organism, followed by its taxonomic classification
Terminology:
• REFERENCE: credits the author(s) who initially determined the sequence; includes subsections:
– AUTHORS
– TITLE
– JOURNAL
– PUBMED
• COMMENT: free-format text that does not fit in another category
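In the flat file, these terms are keywords at the start of a line: sub-keywords such as ORGANISM or AUTHORS are indented beneath their parent, and long values wrap onto continuation lines whose first 12 columns are blank. A simplified Python sketch that groups such lines is shown below; the helper `parse_header` and the sample entry are hypothetical, and continuation text is simply joined with spaces. For real records, a maintained parser such as Biopython's `SeqIO` (format `"genbank"`) is the safer choice.

```python
def parse_header(text):
    """Group GenBank-style keyword lines into entries.

    Layout assumptions (a simplification of the flat-file rules):
    - a line whose first 12 columns are blank continues the previous value;
    - a keyword starting in column 1 begins a new top-level entry;
    - an indented keyword (e.g. ORGANISM, AUTHORS) is a sub-entry of the
      most recent top-level entry.
    """
    entries = []
    target = None  # the entry or sub-entry that continuation lines extend
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[:12].strip():
            if target is None:
                raise ValueError("continuation line before any keyword")
            target["value"] = f"{target['value']} {raw.strip()}".strip()
            continue
        key, _, value = raw.strip().partition(" ")
        item = {"key": key, "value": value.strip()}
        if raw.startswith(" "):
            if not entries:
                raise ValueError("sub-keyword before any keyword")
            entries[-1]["sub"].append(item)
        else:
            item["sub"] = []
            entries.append(item)
        target = item
    return entries


# Invented entry; the lineage is abbreviated, not the full NCBI lineage.
SAMPLE = """\
DEFINITION  Hypothetical gene A, complete cds.
ACCESSION   DEMO1
KEYWORDS    example; demonstration.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Mammalia; Primates; Hominidae;
            Homo.
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Doe,J. and Roe,R.
  TITLE     A hypothetical paper
  JOURNAL   Unpublished
COMMENT     Invented entry used only for illustration.
"""

if __name__ == "__main__":
    for entry in parse_header(SAMPLE):
        print(entry["key"], "->", entry["value"])
        for sub in entry["sub"]:
            print("   ", sub["key"], "->", sub["value"])
```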
Primary nucleotide sequence databases
• EMBL www.ebi.ac.uk/embl/
• GenBank www.ncbi.nlm.nih.gov/Genbank/
• DDBJ www.ddbj.nig.ac.jp
GenBank
• An annotated collection of all publicly available nucleotide and protein sequences
• Set up in 1979 at the LANL (Los Alamos).
• Maintained since 1992 by NCBI (Bethesda).
• http://www.ncbi.nlm.nih.gov
EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide and protein sequences
• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
• Maintained since 1994 by the EBI (Hinxton, near Cambridge, UK).
• http://www.ebi.ac.uk/embl.html
DDBJ: DNA Data Bank of Japan
• An annotated collection of all publicly available nucleotide and protein sequences
• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
• Still maintained at this institute, by a team led by Takashi Gojobori.
• http://www.ddbj.nig.ac.jp
Derived databases
• CUTG: codon usage tabulated from GenBank
http://www.kazusa.or.jp/codon/
• Genetic Codes: deviations from the standard genetic code in various organisms and organelles
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
• TIGR Gene Indices: organism-specific databases of EST and gene sequences
http://www.tigr.org/tdb/tgi.shtml
• UniGene: unified clusters of ESTs and full-length mRNA sequences
http://www.ncbi.nlm.nih.gov/UniGene/
• ASAP: alternatively spliced isoforms
http://www.bioinformatics.ucla.edu/ASAP
• Intronerator: introns and alternative splicing in C. elegans and C. briggsae
http://www.cse.ucsc.edu/~kent/intronerator/
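CUTG tabulates how often each codon is used. As a rough illustration of that kind of tabulation (not CUTG's actual pipeline), the sketch below counts codons in one coding sequence and converts the counts to frequencies per 1,000 codons, the unit in which the Kazusa tables report frequencies. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence read from its first base (frame 1).

    Assumes one in-frame CDS: U is treated as T, a trailing partial codon is
    ignored, and codons with characters other than A/C/G/T are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    counts = Counter()
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts[codon] += 1
    return counts


def per_thousand(counts):
    """Convert raw codon counts to frequency per 1,000 codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000 * n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Invented toy CDS: Met-Ala-Ala-stop.
    counts = codon_usage("ATGGCTGCTTAA")
    print(counts)
    print(per_thousand(counts))
```

A real codon usage table sums these counts over every annotated CDS of an organism, which is what makes the per-thousand figures comparable between species.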
7/14/2020 5:50 PM
Nucleic acid structure databases
• NDB: nucleic acid-containing structures
http://ndbserver.rutgers.edu/
• NTDB: thermodynamic data for nucleic acids
http://ntdb.chem.cuhk.edu.hk/
• RNABase: RNA-containing structures from PDB and NDB
http://www.rnabase.org/
• SCOR: structural classification of RNA; RNA motifs by structure, function and tertiary interactions
http://scor.lbl.gov/
Sequence Retrieval Tools
• Various tools retrieve sequences of interest from databases:
– Entrez at NCBI
http://www.ncbi.nlm.nih.gov/Entrez
– SRS (Sequence Retrieval System) for EMBL and other databases
http://srs.ebi.ac.uk
– Fetch in the GCG package
– Seqret in EMBOSS
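Entrez records can also be retrieved from a script. As a hedged illustration, the sketch below builds a request for NCBI's E-utilities `efetch` service, the programmatic interface to Entrez, which the slide itself does not mention. The endpoint and the parameters `db`, `id`, `rettype` and `retmode` are assumed to follow NCBI's E-utilities conventions (check the current documentation); the helper names are hypothetical and U49845 is used only as an example accession.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (programmatic access to Entrez records).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an efetch URL; rettype 'gb' requests the GenBank flat file,
    'fasta' a FASTA sequence."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_record(accession, **kwargs):
    """Download one record as text. Needs network access; NCBI asks clients
    without an API key to send no more than three requests per second."""
    with urlopen(efetch_url(accession, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the URLs; call fetch_record("U49845") to actually download.
    print(efetch_url("U49845"))
    print(efetch_url("U49845", rettype="fasta"))
```

Keeping URL construction separate from the download makes the request easy to inspect or test offline; command-line tools such as EMBOSS Seqret cover the same retrieval and format-conversion needs without writing code.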
Figure: flow chart showing the organization of the Nucleic Acid Database project.