Nucleic acid database

Aims:
• Meet the growing need to store and
communicate large datasets.
• Make biological data available to
scientists.
• Make biological data available in
computer-readable form.
• Enhance the availability of the data.

Classification of databases:
• Primary database
• Composite database
• Secondary database

Forms:
• Technical design
– Flat files
– Relational database (SQL)
• Exchange/publication
technologies (FTP, HTML,
CORBA, XML, ...)
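
To make the difference between the flat-file and relational forms concrete, here is a minimal sketch. The two-record flat file, its simplified two-letter line codes (loosely modeled on EMBL-style ID/DE/SQ lines), and the table name `entry` are all assumptions made for illustration, not the layout of any real database release.

```python
import sqlite3

# Hypothetical flat file: one tagged line per field, "//" closes each record.
FLAT_FILE = """\
ID   DEMO0001
DE   example DNA record
SQ   ATGGCCATTGTAATGGGCCGC
//
ID   DEMO0002
DE   example RNA record
SQ   AUGGCCAUUGUAAUG
//
"""


def parse_flat_file(text):
    """Turn the tagged lines into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                records.append(current)
            current = {}
        elif line.strip():
            parts = line.split(None, 1)  # tag, then the rest of the line
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return records


def load_into_sql(records):
    """Store the same records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (id TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [(r["ID"], r["DE"], r["SQ"]) for r in records],
    )
    return conn


records = parse_flat_file(FLAT_FILE)
conn = load_into_sql(records)
rows = conn.execute(
    "SELECT id, LENGTH(sequence) FROM entry ORDER BY id"
).fetchall()
print(rows)
```

The flat file has to be scanned from top to bottom to answer any question, whereas the relational form lets a query such as the `SELECT` above pick out fields directly.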

Availability:
• Publicly available, no
restrictions
• Available, but with copyright
• Accessible, but not
downloadable
• Academic, but not freely
available
• Proprietary, commercial;
possibly free for academics

Terminology:
• LOCUS
– size of sequence (in base pairs)
– nature of molecule (e.g. DNA or RNA)
– topology (linear or circular)
• DEFINITION: brief description of gene
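• ACCESSION: unique identifier for the record,
shared by this and some other databases
• VERSION: accession number with a version
suffix (e.g. U49845.1) that is incremented
whenever the sequence changes; older records
also list past ID numbers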

Terminology:
• KEYWORDS: list of terms related
to entry; can be used for
keyword searching for related
data
• SOURCE: common name of
relevant organism
• ORGANISM: complete id, with
taxonomic classification

Terminology:
• REFERENCE: credits author(s) who initially
determined the sequence; includes
subsections:
– AUTHORS
– TITLE
– JOURNAL
– PUBMED
• COMMENT: free-formatted text that doesn’t
fit in another category
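
The fields above can be pulled out of a record header with a few lines of code. The following is a minimal sketch, assuming a simplified GenBank-style header; the record `DEMO0001` is invented for illustration, and the parser ignores many details of the real format (feature table, sequence block, multiple references).

```python
import re

# Hypothetical record header laid out like a GenBank flat file.
SAMPLE = """\
LOCUS       DEMO0001                  21 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example gene, partial sequence.
ACCESSION   DEMO0001
VERSION     DEMO0001.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
"""

# A keyword line starts with at most two spaces followed by an upper-case keyword.
KEY_LINE = re.compile(r"^ {0,2}([A-Z]+)\s+(.*)$")


def parse_header(text):
    """Map each keyword to its text, joining indented continuation lines."""
    fields, last = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = KEY_LINE.match(line)
        if match:
            last = match.group(1)
            fields[last] = match.group(2).strip()
        elif last is not None:
            fields[last] += " " + line.strip()
    return fields


def parse_locus(value):
    """Split the LOCUS line into name, size, molecule type and topology."""
    parts = value.split()
    return {
        "name": parts[0],
        "length_bp": int(parts[1]),
        "molecule": parts[3],
        "topology": parts[4],
    }


fields = parse_header(SAMPLE)
locus = parse_locus(fields["LOCUS"])
print(locus)
print(fields["ORGANISM"])
```

For real records an established parser is a safer choice than this sketch, since actual LOCUS lines and continuation rules have more variants than are handled here.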

Primary nucleotide sequence
databases
• EMBL www.ebi.ac.uk/embl/
• GenBank www.ncbi.nlm.nih.gov/Genbank/
• DDBJ www.ddbj.nig.ac.jp

GenBank
• An annotated collection of all publicly
available nucleotide and protein sequences
• Set up in 1979 at LANL (Los Alamos).
• Maintained since 1992 by NCBI (Bethesda).
• http://www.ncbi.nlm.nih.gov

EMBL Nucleotide Sequence Database
• An annotated collection of all publicly
available nucleotide and protein sequences
• Created in 1980 at the European Molecular
Biology Laboratory in Heidelberg.
• Maintained since 1994 by EBI (Cambridge).
• http://www.ebi.ac.uk/embl.html

DDBJ – DNA Data Bank of Japan
• An annotated collection of all publicly
available nucleotide and protein sequences
• Started in 1984 at the National Institute of
Genetics (NIG) in Mishima.
• Still maintained at this institute by a team
led by Takashi Gojobori.
• http://www.ddbj.nig.ac.jp

Derived databases
• CUTG: Codon usage tabulated from GenBank
http://www.kazusa.or.jp/codon/
• Genetic Codes: Deviations from the standard genetic code in
various organisms and organelles
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
• TIGR Gene Indices: Organism-specific databases of EST and gene
sequences
http://www.tigr.org/tdb/tgi.shtml
• UniGene: Unified clusters of ESTs and full-length mRNA
sequences
http://www.ncbi.nlm.nih.gov/UniGene/
• ASAP: Alternative spliced isoforms
http://www.bioinformatics.ucla.edu/ASAP
• Intronerator: Introns and alternative splicing in C. elegans and
C. briggsae
http://www.cse.ucsc.edu/~kent/intronerator/
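
As an illustration of what a derived database adds to the primary data, here is a rough sketch of the kind of tabulation a codon usage table is built on: counting codons in a coding sequence and expressing them per thousand codons. The function names and the short example sequence are hypothetical, and this is not a description of how CUTG itself is produced.

```python
from collections import Counter


def codon_usage(cds):
    """Count non-overlapping triplets, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def per_thousand(usage):
    """Express each codon count as a frequency per 1000 codons."""
    total = sum(usage.values())
    return {codon: 1000.0 * count / total for codon, count in usage.items()}


# Hypothetical coding sequence of eight codons.
usage = codon_usage("ATGGCCATTGTAATGGGCCGCTGA")
freq = per_thousand(usage)
print(usage.most_common(1))
print(freq["ATG"])
```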

Nucleic acid structure
databases
• NDB: Nucleic acid-containing structures
http://ndbserver.rutgers.edu/
• NTDB: Thermodynamic data for nucleic acids
http://ntdb.chem.cuhk.edu.hk/
• RNABase: RNA-containing structures from PDB and
NDB
http://www.rnabase.org/
• SCOR: Structural classification of RNA: RNA motifs by
structure, function and tertiary interactions
http://scor.lbl.gov/
7/14/2020
5:50 PM

Sequence Retrieval Tools
• Various tools to get sequences of interest
from databases
• Entrez in NCBI
http://www.ncbi.nlm.nih.gov/Entrez
• SRS for EMBL and other DBs
http://srs.ebi.ac.uk
• Fetch in GCG package
• Seqret in EMBOSS
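
Retrieval can also be scripted. The sketch below builds a request to the NCBI E-utilities `efetch` service, the programmatic interface to Entrez; the helper names are hypothetical, and the accession U49845 is used only as an example identifier. The network call is wrapped in a function and is not executed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build an efetch URL for one nucleotide record, returned as plain text."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def fetch_sequence(accession, rettype="fasta"):
    """Download the record; requires network access."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


url = efetch_url("U49845")
print(url)
```

Passing `rettype="gb"` instead of the default requests the GenBank flat-file form of the record rather than FASTA.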

Figure: Flow chart showing the organization of the Nucleic Acid Database project.
