Databases for Protein and Nucleic Acid Sequences

Presented by – SWARUP MALAKAR

A database is a repository of sequences (DNA or amino acid) stored on a
computer, which provides a centralized and homogeneous view of its contents.
Alternatively, it is a vast collection of data pertaining to a specific topic, e.g.,
nucleotide sequences, protein sequences, etc.
Basically, it is an electronic environment.
Databases are at the heart of bioinformatics.

1. Sequence databases: these contain the sequences of both proteins and nucleic
acids.
2. Structural databases: these cover only protein structures.
In addition, databases are also classified into three categories:
A. Primary databases B. Secondary databases C. Composite databases.

A primary database contains information on the sequence or structure alone, for either
protein or nucleic acid.
Examples: PIR and SWISS-PROT for protein sequences; GenBank (NCBI), EMBL and DDBJ for
nucleotide sequences.

PIR (Protein Information Resource): a database of functionally annotated
protein sequences and structures.
PIR has collaborated with EBI and SIB to establish UniProt (Universal
Protein Resource), the central resource of protein sequence and function.

NCBI (National Center for Biotechnology Information):
- On Nov 4, 1988, the NCBI was established as a division of the National Library of Medicine for the
development of information systems in molecular biology.
- The NCBI is located in Bethesda, Maryland (U.S.A.).
- NCBI maintains GenBank, which is an annotated collection of publicly available nucleotide and
protein sequences.
- In 1988, the three partners (DDBJ, EMBL and GenBank) of the International Nucleotide
Sequence Database Collaboration had a meeting and agreed to use a common format.

i. Maintains collaboration with several NIH institutes, academia, industry and other governmental
agencies.
ii. Develops, distributes, supports and coordinates access to a variety of databases and software for
the scientific and medical communities.
iii. Develops and promotes standards for databases, data deposition and exchange, and biological
nomenclature.
iv. Engages the members of the international scientific community in informatics research and training
through the scientific visitors programs.
Link: https://www.ncbi.nlm.nih.gov/

- In 1992, NCBI took over responsibility for making the GenBank DNA sequence
database available.
- It coordinates with individual laboratories and other sequence
databases, such as those of EMBL and DDBJ.
- Moreover, NCBI has grown to provide other databases in
addition to GenBank.
- GenBank is a comprehensive sequence database that contains
publicly available DNA sequences for more than 119,000
different organisms, obtained through submissions of
sequence data from individual laboratories and batch submissions from
large-scale sequencing projects.
- Daily data exchange with the EMBL Data Library in the UK and
the DNA Data Bank of Japan helps ensure worldwide coverage.
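
As a rough illustration of how a GenBank record can be addressed programmatically, the sketch below builds a request URL for the `efetch` endpoint of NCBI's E-utilities. The helper name `build_efetch_url` is hypothetical, the accession DF196826 is taken from the DDBJ data-set table later in this deck purely as a sample identifier, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one accession (hypothetical helper)."""
    params = {
        "db": db,            # nucleotide database
        "id": accession,     # accession number to retrieve
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text response
    }
    return EFETCH_BASE + "?" + urlencode(params)


# Sample accession borrowed from the DDBJ table in this deck.
url = build_efetch_url("DF196826")
print(url)
```

Because the three INSDC partners exchange data daily, an accession issued by any one of them should in principle be retrievable from the others.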

- The EMBL database is developed and maintained by the European Molecular Biology
Laboratory's European Bioinformatics Institute (EMBL-EBI).
- It provides comprehensive nucleotide sequence information.

- The European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database is a
comprehensive collection of primary nucleotide sequences maintained at the European
Bioinformatics Institute (EBI).
- Link: http://www.ebi.ac.uk/embl/
- EMBL is supported by 22 member states, four prospect member states, and two associate member states.
- The laboratory operates from five sites: the main laboratory in Heidelberg, and
outstations in Hinxton (EBI, in England), Grenoble (France), Hamburg (Germany) and
Monterotondo (near Rome).

- EMBL groups and laboratories perform basic research in molecular biology and molecular
medicine, as well as training for science students and visitors.
- Since 1982 this work has been done in collaboration with GenBank (NCBI, Bethesda, USA)
and the DNA Data Bank of Japan (Mishima).
- For sequence similarity searching, a variety of tools (e.g., FASTA and BLAST)
are available that allow external users to compare their own sequences against the data in
the EMBL Nucleotide Sequence Database and other databases.
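
To give a feel for what a similarity search computes, here is a minimal sketch of a Smith-Waterman local alignment score. This is not how FASTA or BLAST are implemented (both use faster heuristics); the scoring values, the function name and the two toy "database" sequences are assumptions made up for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy query and toy "database" (made-up sequences).
query = "ACGTTGCA"
database = {"seq1": "TTACGTTGCATT", "seq2": "GGGGCCCC"}

scores = {name: smith_waterman_score(query, seq) for name, seq in database.items()}
best_hit = max(scores, key=scores.get)
print(best_hit, scores)
```

The entry with the highest score is reported as the best hit; real tools additionally attach statistical significance to each hit.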

- The DNA Data Bank of Japan (DDBJ) is a biological database that collects DNA
sequences. It was established in 1986.
- Link: https://www.ddbj.nig.ac.jp
- It is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture,
Japan.
- DDBJ is a member of the International Nucleotide Sequence Database
Collaboration (INSDC).
- It exchanges its data with the European Molecular Biology Laboratory at the European
Bioinformatics Institute and with GenBank at the National Center for Biotechnology
Information on a daily basis.

- DDBJ Center collects nucleotide sequence data as a member of INSDC (International
Nucleotide Sequence Database Collaboration) and provides freely available nucleotide
sequence data and a supercomputer system to support research activities in life science.
- FEATURES
- Group 1: biological source of the sequence (source). The feature “source” (group 1) is
mandatory for all entries in the international nucleotide database. ...
- Group 2: biological function features of the region. ...
- Group 3: difference and/or change of the sequence data.
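
The rule that every entry must carry a "source" feature can be sketched as a simple check. The entry below is hypothetical: the feature keys `CDS` (as a group 2 example) and `variation` (as a group 3 example), the locations, and the helper name are assumptions chosen for illustration, not taken from a real record.

```python
# Hypothetical entry: each feature is a (key, location) pair.
entry_features = [
    ("source", "1..120"),   # group 1: biological source (mandatory)
    ("CDS", "10..99"),      # assumed group 2 example: functional region
    ("variation", "42"),    # assumed group 3 example: sequence difference
]


def has_mandatory_source(features):
    """Return True if at least one feature has the key 'source'."""
    return any(key == "source" for key, _location in features)


print(has_mandatory_source(entry_features))
```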

Data type | Organism | Accession numbers for annotated sequences (number of entries) | Accession numbers for raw reads
Genome | Radish (Raphanus sativus cv. Aokubi S-h) | WGS: BAOO01000001-BAOO01072909 (72,909 entries); scaffold CON: DF196826-DF236948 (40,123 entries) | DRR012610-DRR012624
Genome | Soybean (Glycine max cv. Enrei) | BBNX02000001-BBNX02108601 (108,601 entries) | DRR021740-DRR021744
Genome | Common marmoset (Callithrix jacchus) | WGS: BBXK01000001-BBXK01109198 (109,198 entries); scaffold CON: DG000097-DG000120 (24 entries); GSS: LB274659-LB427105 (152,447 entries) | DRR036754-DRR036764
List of notable data sets released from the DNA Data Bank of Japan (DDBJ) sequence databases from June 2015 to May 2016
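
The entry counts in the table follow directly from the accession ranges: each range is inclusive, so the count is the last number minus the first number plus one. A minimal sketch, assuming both ends of a range share the same letter prefix and digit width (the function name is hypothetical):

```python
import re


def count_accessions(range_text):
    """Count entries in an inclusive range such as 'DF196826-DF236948'."""
    first, last = range_text.split("-")
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last:
        raise ValueError("unrecognized accession format")
    # Both ends must belong to the same series (same prefix, same width).
    if m_first.group(1) != m_last.group(1) or len(m_first.group(2)) != len(m_last.group(2)):
        raise ValueError("range ends do not belong to the same series")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


radish_scaffolds = count_accessions("DF196826-DF236948")
radish_wgs = count_accessions("BAOO01000001-BAOO01072909")
print(radish_scaffolds, radish_wgs)
```

For the radish rows this gives 40,123 scaffold entries and 72,909 WGS entries, matching the figures listed in the table.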

- DDBJ is hosted at the National Institute of Genetics.
- It collects data mainly from scientists in Japan, and also from researchers all over the world, and shares this
nucleotide data with EMBL and GenBank.
- It is officially certified to collect nucleotide sequences from researchers and to issue
internationally recognized accession numbers to data submitters.
- About 99% of the nucleotide data submitted to INSDC by Japanese researchers come through DDBJ.
- This database plays a major role in improving the quality of INSDC.
- Each database entry includes details of the sequence, submitter details, bibliographic
references, biological significance, and the scientific name and taxonomy of the organism.

- Features that identify coding regions, transcription units, mutation sites, etc. are displayed
in a feature table.
Major activities of the database:
- Providing internationally recognized accession numbers to sequences.
- Managing bioinformatics databases and developing tools for the analysis and visualization of
biological data.
- Conducting courses for beginners to reduce the complexity of biological data analysis.
