DNA Data Bank of Japan (DDBJ)

INTRODUCTION
 The DNA Data Bank of Japan is a public
database of nucleotide sequences established at
the National Institute of Genetics (NIG).
DDBJ, http://www.ddbj.nig.ac.jp

HISTORY
 Since 1987, the DDBJ has been collecting
annotated nucleotide sequences as its traditional
database service.
 This endeavor has been conducted in
collaboration with GenBank at the National
Center for Biotechnology Information (NCBI)
and with the European Molecular Biology
Laboratory (EMBL) at the European
Bioinformatics Institute (EBI). The collaborative
framework is called the International Nucleotide
Sequence Database Collaboration (INSDC).

 DDBJ collects and edits about 20% of the data
released by these three international databases.

 DDBJ began data bank activities in 1986 at NIG
and remains the only nucleotide sequence data
bank in Asia. Although DDBJ mainly receives its
data from Japanese researchers, it can accept
data from contributors from any other country.

DATA UNIT
The nucleotide sequence database is a set of
data units called ENTRIES.
In addition to the nucleotide sequence itself, each
entry contains information about the researcher
who determined the sequence, related references,
the source organism, and gene functions and
features.

DIVISIONS OF DDBJ ENTRIES
 DDBJ classifies entries into 21 divisions, as
listed below:

a: TAXONOMIC DIVISIONS
1. HUM Human
2. PRI Primates (other than human)
3. ROD Rodents
4. MAM Mammals (other than primates and rodents)
5. VRT Vertebrates (other than mammals)
6. INV Invertebrates (animals other than vertebrates)
7. PLN Plants, Fungi, Plastids (eukaryotes other than
animals)
8. BCT Bacteria (including both Eubacteria and
Archaea)
9. VRL Viruses
10. PHG Bacteriophages

b: OTHER DIVISIONS
1. PAT Sequence Data Related To Patent Application
2. ENV Sequences Obtained Via Environmental
Sampling Methods
3. SYN Synthetic Constructs; Artificially Constructed
Sequences
4. EST Expressed Sequence Tags; Short Single Pass
cDNA Sequences
5. TSA Transcriptome Shotgun Assemblies;
Assembled mRNA Sequences
6. GSS Genome Survey Sequences; Short Single
Pass Genomic Sequences
7. HTC High Throughput cDNA Sequences
8. HTG High Throughput Genomic Sequences
9. STS Sequence Tagged Sites
10. UNA The Data Not Annotated
11. CON Contig / Constructed

DATA RETRIEVAL IN DDBJ
 To retrieve data from DDBJ, click on SEARCH
AND ANALYSIS on the homepage.
 A new tab will open with various search
options.

1- GETENTRY
 Retrieval of DDBJ annotated/assembled data by
accession number.
 KEYWORD = ACCESSION NUMBER. Only the
accession no. is used for the sequence search in
this method of data retrieval.

 In the ID box, enter the exact accession no.
of the sequence you want to find.
 The database is set to DDBJ/EMBL/GENBANK
by default.
 Any OUTPUT FORMAT can be chosen,
depending on the user's requirement.

[Figure: list of the available output formats]

 Choose a format and click on SEARCH.
 The sequence information will open in a new
window.

[Figure: example result in CDS amino acid sequence FASTA format]

 The same steps are repeated to search for a
PROTEIN SEQUENCE on getentry.
 Choose one of the PROTEIN DATABASES:
UNIPROT, PDB, DAD or PATENT.
 Select the OUTPUT FORMAT and click on SEARCH.
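Because getentry needs the exact accession number, it can help to check the string before pasting it into the ID box. The sketch below assumes the widely documented layout for ordinary INSDC nucleotide accessions (1 letter + 5 digits, 2 letters + 6 digits, or 2 letters + 8 digits, optionally followed by a version such as ".1"). Other record types use other layouts and are deliberately not covered; the function name is illustrative only.

```python
import re

# Assumed layouts for ordinary nucleotide accessions:
#   1 letter + 5 digits, 2 letters + 6 digits, 2 letters + 8 digits,
#   each optionally followed by ".<version>".
_NUCLEOTIDE_ACCESSION = re.compile(
    r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?"
)


def looks_like_nucleotide_accession(text):
    """Return True if text has the shape of a nucleotide accession number.

    This only checks the format; it does not tell whether the entry exists.
    """
    candidate = text.strip().upper()
    return _NUCLEOTIDE_ACCESSION.fullmatch(candidate) is not None


if __name__ == "__main__":
    for value in ["AB000100", "U49845", "ab000100.1", "AB0001", "insulin"]:
        print(value, looks_like_nucleotide_accession(value))
```

A keyword such as "insulin" fails this check, which is a hint to use ARSA rather than getentry.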

2- ARSA
 DDBJ annotated/assembled data retrieval by
accession numbers and keywords.
 Both ACCESSION NUMBERS and KEYWORDS can
be used for this method of sequence search.

 Enter the keywords or accession no. in the
search bar.
 The more keywords you enter, the narrower the
search becomes.
 A LIST OF ENTRIES matching the search is
shown below the search bar.

 Flatfile, XML and FASTA formats are provided,
flatfile being the default.
 Click on any sequence, or view/download multiple
sequences by selecting more than one sequence.
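The narrowing effect of extra keywords can be illustrated with a toy filter. This is a minimal sketch that assumes every keyword must match (AND semantics), which is consistent with the behaviour described above but is not a statement about how ARSA is implemented. The entries and their IDs are made-up placeholders, not real DDBJ records.

```python
def search_entries(entries, keywords):
    """Return the entries whose description contains every keyword (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    return [
        entry for entry in entries
        if all(keyword in entry["text"].lower() for keyword in wanted)
    ]


# Made-up placeholder records for illustration only.
TOY_ENTRIES = [
    {"id": "TOY0001", "text": "Homo sapiens mRNA for insulin"},
    {"id": "TOY0002", "text": "Mus musculus mRNA for insulin"},
    {"id": "TOY0003", "text": "Homo sapiens gene for hemoglobin"},
]

if __name__ == "__main__":
    broad = search_entries(TOY_ENTRIES, ["insulin"])
    narrow = search_entries(TOY_ENTRIES, ["insulin", "homo sapiens"])
    print(len(broad), "hits for one keyword")
    print(len(narrow), "hit(s) for two keywords")
```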

3- TX SEARCH
 Taxonomy database search of DDBJ.
 Type in the organism name and click on
SEARCH.
 The complete lineage of the organism will be
displayed.
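A lineage is simply the chain of parent taxa from the organism up to the root. The sketch below assumes a child-to-parent table and walks it upwards; the table is a heavily abbreviated toy subset (many intermediate ranks are omitted) and the function name is illustrative, not part of TX Search.

```python
# Heavily abbreviated toy taxonomy: child -> parent.
# Real lineages contain many more intermediate ranks.
PARENT_OF = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mammalia": "Chordata",
    "Chordata": "Metazoa",
    "Metazoa": "Eukaryota",
}


def lineage(name, parent_of):
    """Return the lineage from the root down to name.

    A name that is missing from the table yields a one-element lineage.
    """
    path = [name]
    seen = {name}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:  # guard against a malformed, cyclic table
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return list(reversed(path))


if __name__ == "__main__":
    print("; ".join(lineage("Homo sapiens", PARENT_OF)))
```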

4- BLAST
 A BLAST page is also provided on the Search and
Analysis page.

 The BLAST OUTPUT is formatted differently
from that of NCBI BLAST; the results, however,
are the same.
 The SIGNIFICANT ALIGNMENTS table comes
first on DDBJ, whereas it comes second on
NCBI BLAST. The HITS come second on DDBJ,
with an IDENTIFICATION LINE and the SCORE
written below each hit.

 Lastly, the alignment of the query with each hit is
given, along with several details.

5- CLUSTALW
 For multiple alignment and phylogenetic tree-making,
ClustalW is also provided on DDBJ.

 The ClustalW output is almost the same as the
standard output, with one small difference.
 In DDBJ, ClustalW gives only the “*” symbol, i.e.
only fully conserved positions are marked
with a symbol.
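What the “*” line means can be shown with a few lines of code. This is a minimal sketch, assuming already aligned sequences of equal length with "-" as the gap character; it marks a column with "*" only when every sequence has the same residue there. It is an illustration of the symbol, not ClustalW itself.

```python
def conservation_line(aligned):
    """Return a string with '*' under fully conserved columns and ' ' elsewhere."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    marks = []
    for column in zip(*aligned):
        first = column[0]
        # A column of gaps is not counted as conserved.
        conserved = first != "-" and all(residue == first for residue in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    alignment = ["ACGT-A", "ACCT-A", "ACGTTA"]
    for seq in alignment:
        print(seq)
    print(conservation_line(alignment))  # prints "** * *"
```

Standard ClustalW output for proteins can also mark similar (not identical) columns with ":" and "."; those symbols are left out here to match the behaviour described above.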

6- GGGenome
 An ultrafast sequence search: you can type in
any sequence and it will tell you which organism
and chromosome no. the sequence belongs to.
It also gives the exact base pair position of the
sequence.

 Type in or paste the query sequence in the
search bar, choose the organism and hit
SEARCH.
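The idea of reporting exact base pair positions can be sketched locally. The example below assumes an exact-match search of a short query against one reference string and reports 1-based start and end coordinates; searching the reverse-complement strand as well is an assumption of this sketch. It does not call GGGenome, and the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact(query, reference):
    """Return (start, end, strand) for every exact match, with 1-based coordinates.

    Overlapping matches are reported. A query that equals its own reverse
    complement is reported once per strand.
    """
    q = query.strip().upper()
    ref = reference.upper()
    if not q:
        raise ValueError("query must not be empty")

    hits = []
    for strand, pattern in (("+", q), ("-", reverse_complement(q))):
        start = ref.find(pattern)
        while start != -1:
            hits.append((start + 1, start + len(pattern), strand))
            start = ref.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    reference = "TTACGGATTT"  # toy reference, not a real chromosome
    print(find_exact("ACGG", reference))
    print(find_exact("ATCC", reference))
```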

7- GENDOO
 Functional profiling of gene and disease features
for omics analysis.
 Gendoo provides keywords, including diseases,
drugs and biological phenomena, related to the
genes and diseases of interest.

 Type in the disease/gene, or its ID, to get all
the relevant genes/diseases associated with
it.

DDBJ STATISTICS
 DDBJ statistics gives information about the
releases and their records on DDBJ.
 It also provides useful facts about DDBJ's total data
volume, its contribution to INSDC, the proportion of
each division and much more.

 Some important statistical information provided
by DDBJ:
 Data category distribution at each archive
 Journal ranking by counts in flat file

DATA SUBMISSION
 If you wish to make your sequence public
through DDBJ, and the sequence
is acceptable to DDBJ, you can submit it
even if you have no plan to
publish a research paper related to the
sequence.
 Once released, the nucleotide sequences
submitted to INSDC, including DDBJ, are available
to everyone.

(A) Nucleotide Sequence Submission
System
 DDBJ generally recommends using the
Nucleotide Sequence Submission System.

(B) Mass Submission System (MSS)
DDBJ recommends the use of MSS if:
 The submission consists of a large number of
sequences (entries), i.e. more than 1024,
 The submission involves long (greater than 500
kb) nucleotide sequences that result in a
complex submission containing many features
(more than 30 in an entry), as in the case of
genome data, or
 The submission cannot be handled by the Nucleotide
Sequence Submission System.
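The three criteria above amount to a simple decision rule. The following sketch encodes them under two stated assumptions: 500 kb is taken as 500,000 bp, and the length and feature-count conditions are read as applying together, as the wording of the second criterion suggests. The function and parameter names are hypothetical.

```python
WEB_SYSTEM = "Nucleotide Sequence Submission System"
MSS = "Mass Submission System (MSS)"


def recommend_submission_system(
    n_entries,
    max_length_bp=0,
    max_features_per_entry=0,
    web_system_can_handle=True,
):
    """Return the submission route suggested by the criteria listed above."""
    # Criterion 1: a large number of entries.
    if n_entries > 1024:
        return MSS
    # Criterion 2 (assumed conjunction): long sequences with many features.
    if max_length_bp > 500_000 and max_features_per_entry > 30:
        return MSS
    # Criterion 3: anything the web system cannot handle.
    if not web_system_can_handle:
        return MSS
    return WEB_SYSTEM


if __name__ == "__main__":
    print(recommend_submission_system(n_entries=12))
    print(recommend_submission_system(n_entries=5000))
    print(recommend_submission_system(1, max_length_bp=4_600_000,
                                      max_features_per_entry=4000))
```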

Assignment and Notification of
Accession Number
 DDBJ sends the accession number (a unique number
assigned by the International Nucleotide
Sequence Database Collaboration) to the Contact
Person whose e-mail address is entered in the
"Contact person E-mail address" field.
This notification is normally sent within five
business days after receipt of the data.

Submitter
 The submitter of an entry is, in principle, the
person who is responsible for the data submitted
in that entry.
Only the submitter can update his/her entry. The
submitter is also responsible for replying to inquiries
from DDBJ or DDBJ users about his/her data.

Contact Person
 The "Contact person" is responsible for
the descriptions in the entry and acts as the
representative who corresponds with DDBJ and
its users. The "Contact person" has to be one of the
submitters, in principle.
 Because the "Contact person" is, in principle, the
one who communicates with DDBJ and its users about
the entry, do not block e-mails from DDBJ.
 When a user wishes to contact the submitter(s)
of an entry of interest, the user should send DDBJ
the inquiry form with a brief statement of the reasons;
DDBJ will then forward the message to the submitter(s).

Right of Entry Update
 Only the submitters of an entry can update and
modify it. After modifying the data, the
submitter can also specify either
immediate release or hold until publication.
However, in principle, once an entry has
been opened to the public, it cannot be returned
to hold status.

GROWTH IN DDBJ DATA
 When DDBJ first released its nucleotide
sequence database in July 1987, it consisted of
only 66 entries and 108,970 base pairs. In recent
years, the INSDC databases have been growing at
an annual rate of 130-150%.
 Between June 2014 and May 2015, the DDBJ
periodical release grew by 11,879,389
entries and 31,427,753,923 base pairs.

NIG SUPERCOMPUTER
 The NIG supercomputer serves as a sequence
analysis platform. The DDBJ Center operates
the NIG supercomputer, which specializes in the
analysis of large-scale sequence data. The NIG
supercomputer offers computational
infrastructure for the construction of DDBJ
databases and analysis services, and provides
researchers with a large-scale data analysis and
supercomputing environment.

 The NIG supercomputer is currently composed of
two computer systems:
(i) the Phase 1 system, which was introduced in
2012;
(ii) the Phase 2 system, which went into production
in 2014.
