libraries of life sciences information
collected from scientific experiments
high-throughput experiment technology
genomics, proteomics, metabolomics, mic
roarray gene expression,
Biological database design, development,
and long-term management is a core area
of the discipline of bioinformatics.
There are three major sites for finding
information about nucleic acids (DNA and/or
RNA sequences) on the Web, and all of them
contain basically the same information.
The methods and databases that you will
want to use will depend mainly on how much
data you want and in what form.
The databases EMBL, GenBank, and DDBJ
are the three primary nucleotide sequence
They include sequences submitted directly
by scientists and genome sequencing group,
and sequences taken from literature and
There is comparatively little error checking
and there is a fair amount of redundancy
The entries in the EMBL, GenBank and DDBJ
databases are synchronized on a daily basis, and
the accession numbers are managed in a
consistent manner between these three centers.
The nucleotide databases have reached such
large sizes that they are available
in subdivisions that allow searches or
downloads that are more limited, and hence less
For example, GenBank has currently 17
There are no legal restrictions on the use
of the data in these databases. However,
there are patented sequences in the
GENBANK OR NCBI:
For most sequence searches, GenBank is
your best bet.
It offers a daily exchange of information
with other major sequence databases, has a
variety of user interfaces, fairly detailed
online help (with e-mail addresses for more
information if what is already available is
not sufficient), and a speedy interface.
Because of its popularity, however, GenBank
can also be very slow during peak research
Very detailed searches or searches with massive
amounts of output might be completed more
quickly after hours.
Established by the National Center for
Biotechnology Information (NCBI), GenBank is
a collection of all known DNA sequences from
scientists around the world.
As of July 1, 1996, approximately
286,000,000 bases and 352,400 sequences are
stored in GenBank, and many more are added
Searching GenBank is fairly straightforward
and can be done with a variety of search
if you have never used GenBank before, you
will probably want to start your search with a
Other means of searching GenBank include:
BLAST (Basic Local Alignment
Search Tool) Searches
dbEST (Database of Expressed
dbSTS (Database of Sequence Tagged
Submitting sequences to Gen Bank is
also very easy and is required by most
journals before articles pertaining to the
sequence are published (this provides
easy access to the information for the
You can submit sequences via the WWW
EMBL is good to use when you need a limited
amount of data and when you are not trying to
identify a gene by sequence analysis.
However, because EMBL and all of its mirror
sites are located in Europe, your connection will
be slow more often than not.
All of the information submitted to EMBL is
mirrored daily in both GenBank and DDBJ, so
searching elsewhere might provide the same
amount of information in less time.
EMBL is the database for the European
Molecular Biology Laboratory.
It is a flat-file database that is searched by a
multitude of various search engines.
EMBL sequences are stored in a form
corresponding to the biological state of the
information in vivo.
Thus, cDNA sequences are stored in the
database as RNA sequences, even though
they usually appear in the literature as DNA.
Because DDBJ mirrors its information
daily with GenBank and EMBL, beginning
sequence searchers might want to try a
database with a friendlier searching
However, DDBJ also offers all of its pages
in Japanese as well, so if you are more
comfortable reading the Japanese versions
of the pages, it can be very useful.
DDBJ, the DNA Data Bank of Japan, was
established in 1986 to be one of the major
international DNA Databases (with GenBank
It is certified to collect information from
researchers and assign accession numbers to
SEARCHING DDBJ is some what
awkward as the only way to access most of
the data is by its accession number via
Biological databases are an important tool in
assisting scientists to understand and explain a
host of biological phenomena from the
structure of biomolecules and their interaction,
to the whole metabolism of organisms and to
understanding the evolution of species.
This knowledge helps facilitate the fight
against diseases, assists in the development
of medications and in discovering basic
relationships amongst species in the history of