2. EMBL
European Molecular Biology Laboratory
Nucleotide Sequence Database maintained by the European Bioinformatics
Institute (EBI) in Hinxton, Cambridge, UK.
International Nucleotide Sequence Database Collaboration members:
• EMBL
• DNA Data Bank of Japan
• GenBank
3. Key Goal
• Build, maintain & prepare biological databases.
• Provide computational services to support deposition and data analysis.
• Make them available to the scientific community.
4. Sequence Retrieval
Sequence Retrieval System (SRS)
• Primary database retrieval system of EMBL.
• Links several databases with the main nucleotide and protein databases.
Linked databases:
• Nucleotide databases: GenBank
• Protein databases: SWISS-PROT
5. SRS Libraries
1. EMBL: Virtual library comprising EMBLRELEASE, EMBLNEW, EMBLTPA, EMBLWGS
2. EMBLRELEASE: Latest public release of EMBL Nucleotide Sequence Database.
3. EMBLNEW: containing updated & new entries created since last official release
4. EMBLTPA: containing Third Party Annotation entries
5. EMBLWGS: containing Whole Genome Shotgun entries
6. EMBLCON: containing entries of the CON (constructed) database division
7. EMBLCDS: containing individual Coding Sequence data.
6. Sequence Searching in EMBL
Searches can be run directly through the World Wide Web, by email, or with programs.
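PROGRAMS
• FASTA3: finds high-scoring gapped alignments between a query sequence and the sequences in a database.
• fastx/y3: compares a nucleotide query sequence against a protein sequence database.
• tfastx/y3: compares a protein query sequence against a translated DNA databank.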
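The choice among these programs depends only on the type of the query and the type of the database. A minimal Python sketch of that choice is given here; the function name and the "nucleotide"/"protein" labels are illustrative assumptions and are not part of the FASTA package or of any EMBL service.

```python
VALID_TYPES = {"nucleotide", "protein"}


def choose_search_program(query_type, database_type):
    """Return the program label from the slide for a query/database pair.

    The labels ("FASTA3", "fastx/y3", "tfastx/y3") are copied from the
    slide; everything else here is a hypothetical helper.
    """
    if query_type not in VALID_TYPES or database_type not in VALID_TYPES:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == database_type:
        # Same alphabet on both sides: plain gapped comparison.
        return "FASTA3"
    if query_type == "nucleotide":
        # Nucleotide query compared against protein sequences.
        return "fastx/y3"
    # Protein query compared against a translated DNA databank.
    return "tfastx/y3"


if __name__ == "__main__":
    print(choose_search_program("protein", "nucleotide"))
```

Keeping the decision in one small function makes the three cases on the slide easy to check against each other.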
7. Sequence Submission at EMBL
Webin
• Web-based system for fast submission of single, multiple, or large numbers of sequences.
Webin collects:
• Submitter information
• Reference citation information
• Feature information (coding regions, regulatory sequences)
8. Sequence Submission at EMBL
Sequin
• NCBI tool for sequence submission and update.
• Can handle multiple sequence submissions, including long sequences, multiple annotations, segmented sets of DNA, and population studies.
• Provides graphical viewing and editing options.
• Runs on Windows PCs, Macintosh, and Linux.
9. Sequence Submission at EMBL
Webin-Align
• Submission of alignment data from phylogenetic and population analyses of nucleotide sequences.
• Each submission is assigned a unique accession number, which is sent to the researcher who submitted the data.
Accession Number Format
• 1 letter + 5 numbers (Example: A54321)
• 2 letters + 6 numbers (Example: AB654321)
10. Sequence Identifiers
Version 1 (original submission):
Sequence 1 ATGCATGCATGCATGC
Sequence 2 ATGGCATGGCATGGCA
Sequence 3 ATTGCATTGCATTGCA
Sequence 4 ATTTGATTTGATTTGC
Submitter update
Version 2:
Sequence 1 ATGCATGCATGCATGC
Sequence 2 ATGGCATGGCATGGCA
Sequence 3 ATTGCATTGCATTGCA
Sequence 4 ATTTGATTTGATTTGC
Sequence 5 ATGCATGGCCCGTAAT
Sequence Identifier = Unchanged
Version Number = Changed
The EMBL database includes sequence identifiers and version numbers that record
changes to the sequences.
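One way to picture this rule is a record whose identifier never changes while its version number goes up whenever the sequence changes. The Python sketch below is only an illustration: the class name, the identifier EXAMPLE1, and the identifier.version label are assumptions made for the example, not a description of how EMBL stores entries.

```python
from dataclasses import dataclass


@dataclass
class SequenceEntry:
    identifier: str   # fixed for the life of the entry
    sequence: str
    version: int = 1  # starts at 1 for the original submission

    def update(self, new_sequence):
        """Apply a submitter update and return the versioned label.

        The version is bumped only when the sequence itself changes.
        """
        if new_sequence != self.sequence:
            self.sequence = new_sequence
            self.version += 1
        return f"{self.identifier}.{self.version}"


if __name__ == "__main__":
    entry = SequenceEntry("EXAMPLE1", "ATGCATGCATGCATGC")  # hypothetical ID
    print(entry.update("ATGGCATGGCATGGCA"))
```

Resubmitting an identical sequence leaves the version untouched, which keeps the version number meaningful as a marker of real sequence changes.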
11. Sequence Identifiers Examples
Protein Identifiers
Protein Identifiers (PI) are currently assigned to all protein translations of coding
features in the nucleotide sequence database to identify the exact protein
translation for each coding sequence.
Example nucleotide sequence: ATGCATGCATGCATGCATCGATCG
• Coding Sequence 1: PI-X
• Coding Sequence 2: PI-Y
• Coding Sequence 3: PI-Z
Protein databases: SWISS-PROT
Search for PI-X Sequence in EMBL
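To make "protein translation of a coding sequence" concrete, here is a minimal Python sketch that translates a nucleotide string with the standard genetic code. It assumes, purely for illustration, that the whole example sequence from the slide is one coding sequence read from its first base; the slide does not give the real coordinates of the three coding sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Trailing bases that do not fill a codon are ignored; codons with
    unknown characters become "X"; stop codons become "*".
    """
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate("ATGCATGCATGCATGCATCGATCG"))  # prints MHACMHRS
```

Each distinct translation produced this way is what a protein identifier is meant to pin down.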
12. Protein Identifier Format
3-letter prefix code + 5 numbers
Example: CAB66605.1
The number after the decimal point denotes the version number.
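The format can be checked mechanically. The Python sketch below splits an identifier into prefix, number, and version with a regular expression; the function name is hypothetical, and the pattern assumes uppercase letters and exactly the 3 + 5 layout described on this slide.

```python
import re

# Three letters, five digits, a dot, then the version number.
PROTEIN_ID_PATTERN = re.compile(r"([A-Z]{3})(\d{5})\.(\d+)")


def parse_protein_id(text):
    """Split a protein identifier into (prefix, number, version)."""
    match = PROTEIN_ID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a protein identifier: {text!r}")
    prefix, number, version = match.groups()
    return prefix, number, int(version)


if __name__ == "__main__":
    print(parse_protein_id("CAB66605.1"))  # prints ('CAB', '66605', 1)
```

The number is kept as a string so that leading zeros survive, while the version is converted to an integer so that versions can be compared.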
13. Resources of EMBL
2 Resources
Databases
• Nucleotide databases
• Protein databases
• Structure databases
• Microarray databases
Tools (Web-Software)
• Align
• Clustal W