The document discusses various types of biological databases, including primary databases that contain original sequence data, secondary databases that contain processed sequence data, and specialized databases. It provides examples of major biological databases available on the web, including GenBank, SWISS-PROT, and Pfam. It also discusses algorithms and tools for sequence analysis, such as BLAST, and pattern searches with PROSITE and MEME.
2. Biological Databases
Primary Databases: contain original biological data
(raw nucleic acid sequence data produced and submitted
by researchers worldwide)
-GenBank (https://www.ncbi.nlm.nih.gov/genbank/)
-EMBL (European Molecular Biology Laboratory)
(https://www.ebi.ac.uk)
-DDBJ (DNA Data Bank of Japan)
(https://www.ddbj.nig.ac.jp/index-e.html)
3. Secondary Databases: contain computationally processed
sequence information derived from the primary databases
-SWISS-PROT: provides detailed sequence annotation that
includes structure, function, and protein family assignment
-TrEMBL: a database of translated nucleic acid sequences stored
in the EMBL database
-UniProt: (SWISS-PROT + TrEMBL + PIR), which has larger
coverage than any one of the three databases
-Pfam and Blocks: aligned protein sequence information, motifs,
patterns
-DALI: protein secondary structure database that is vital for protein
structure classification and threading analysis
4. Major Biological Databases Available Via the World Wide Web
-SWISS-PROT: Curated protein sequence database (www.ebi.ac.uk/swissprot/access.html)
-AceDB: Genome database for Caenorhabditis elegans (www.acedb.org)
-DDBJ: Primary nucleotide sequence database in Japan (www.ddbj.nig.ac.jp)
-EMBL: Primary nucleotide sequence database in Europe (www.ebi.ac.uk/embl/index.html)
-Entrez: NCBI portal for a variety of biological databases (www.ncbi.nlm.nih.gov/gquery/gquery.fcgi)
-ExPASy: Proteomics database (http://us.expasy.org/)
-FlyBase: A database of the Drosophila genome (http://flybase.bio.indiana.edu/)
-FSSP: Protein secondary structures (www.bioinfo.biocenter.helsinki.fi:8080/dali/index.html)
-GenBank: Primary nucleotide sequence database in NCBI (www.ncbi.nlm.nih.gov/Genbank)
-HIV Databases: HIV sequence data and related immunologic information (www.hiv.lanl.gov/content/index)
-Microarray gene expression database: DNA microarray data and analysis tools (www.ebi.ac.uk/microarray)
-OMIM: Genetic information of human diseases (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
-SRS: General sequence retrieval system (http://srs6.ebi.ac.uk)
-PubMed: Biomedical literature information (www.ncbi.nlm.nih.gov/PubMed)
-TAIR: Arabidopsis information database (www.arabidopsis.org)
11. Algorithms for pairwise alignments
Web resources
• LALIGN - pairwise sequence alignment
• Global alignment: Needle (EMBOSS): https://www.ebi.ac.uk/Tools/psa/
• Local alignment: Water (EMBOSS): https://www.ebi.ac.uk/Tools/psa/
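The difference between global and local alignment can be illustrated with a small dynamic-programming sketch. Needle is based on the Needleman-Wunsch algorithm and Water on Smith-Waterman; the EMBOSS tools use substitution matrices and affine gap penalties, whereas the function names and the simple match/mismatch/linear-gap scores below are assumptions chosen only to keep the example short.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman style score: best-scoring pair of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                        # restart the alignment here
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "AGT"))
    print(local_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

The only structural difference is the extra 0 in the local recurrence and the zero-initialised borders, which let a local alignment start and end anywhere instead of being charged for unaligned flanks.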
12. BLAST (Basic Local Alignment Search Tool)
(www.ncbi.nlm.nih.gov/BLAST/)
• The BLAST program was developed by Stephen Altschul of NCBI in 1990
and has since become one of the most popular programs for sequence
analysis
• BLAST uses heuristics to align a query sequence with all sequences in a
database
• The objective is to find high-scoring ungapped segments among related
sequences
• The existence of such segments above a given threshold indicates pairwise
similarity beyond random chance, which helps to discriminate related
sequences from unrelated sequences in a database
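The heuristic idea of finding high-scoring ungapped segments can be sketched as a seed-and-extend search. This is a simplified illustration, not the NCBI implementation: it seeds on exact word matches and uses plain match/mismatch scores, whereas protein BLAST seeds on neighbourhood words scored with a substitution matrix. The function names, parameter names, and default values are all assumptions.

```python
def extend(query, subject, q_pos, s_pos, step, match, mismatch, x_drop):
    """Extend without gaps in one direction; return (best score, residues kept)."""
    score = best = best_len = length = 0
    while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
        score += match if query[q_pos] == subject[s_pos] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
        q_pos += step
        s_pos += step
    return best, best_len


def find_hsps(query, subject, word_size=4, match=1, mismatch=-1,
              x_drop=3, min_score=6):
    """Return ungapped segment pairs as (score, query_start, subject_start, length)."""
    # 1. Index every word of the query by its start position.
    words = {}
    for i in range(len(query) - word_size + 1):
        words.setdefault(query[i:i + word_size], []).append(i)

    hsps = set()
    # 2. Scan the subject for word hits and extend each hit in both directions.
    for j in range(len(subject) - word_size + 1):
        for i in words.get(subject[j:j + word_size], []):
            right, r_len = extend(query, subject, i + word_size, j + word_size,
                                  1, match, mismatch, x_drop)
            left, l_len = extend(query, subject, i - 1, j - 1,
                                 -1, match, mismatch, x_drop)
            score = word_size * match + left + right
            hsps.add((score, i - l_len, j - l_len, l_len + word_size + r_len))

    # 3. Keep only segments scoring at or above the threshold.
    return sorted((h for h in hsps if h[0] >= min_score), reverse=True)


if __name__ == "__main__":
    for hsp in find_hsps("GGGACGTTGCATCC", "TTTTACGTTGCATAAAA"):
        print(hsp)
```

Because only word hits trigger an extension, most of the database is never aligned at all, which is where the speed advantage over full dynamic programming comes from.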
17. PSI-BLAST Contd.
• An iterative search in which sequences found in one round of
searching are used to build a score model for the next round of
searching
• An important tool for predicting both biochemical activity and
function
• Identifies weak homologies (distant relatives of a protein) that are
not found by FASTA or BLAST
Information:
https://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-2.html
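The "score model" built between rounds is a position-specific scoring matrix (PSSM). The sketch below shows one simple way to derive such a matrix from already aligned hits. It assumes a uniform background frequency and a fixed pseudocount, and it skips the sequence weighting that PSI-BLAST applies, so it illustrates the idea rather than reproducing the program. The function names and the toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from equal-length sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:       # ignore gaps and unknown symbols
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm, segment):
    """Score an ungapped segment of the same length as the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


if __name__ == "__main__":
    hits = ["DFG", "DFG", "DLG", "DFG"]  # hypothetical aligned hits from one round
    pssm = build_pssm(hits)
    for candidate in ("DFG", "DLG", "AAA"):
        print(candidate, round(score_segment(pssm, candidate), 2))
```

In an iterative search, segments that score well against the matrix would be added to the alignment and the matrix rebuilt, so conserved positions gain weight from round to round and more distant relatives become detectable.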
22. Patterns can be generated from multiple sequences using PRATT (ExPASy)
https://web.expasy.org/pratt/
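To give a feel for what a generated pattern looks like, the sketch below derives a PROSITE-style pattern column by column from sequences that are already aligned. This is not PRATT's algorithm, which searches unaligned sequences for conserved patterns; the function name, the `max_choices` parameter, and the example sequences are assumptions made for illustration.

```python
def pattern_from_alignment(aligned, max_choices=3):
    """Summarise each alignment column as a residue, a [class], or the wildcard x."""
    elements = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if "-" in residues or len(residues) > max_choices:
            elements.append("x")              # gapped or too variable
        elif len(residues) == 1:
            elements.append(residues[0])      # fully conserved position
        else:
            elements.append("[" + "".join(residues) + "]")
    return "-".join(elements)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from a real protein family.
    print(pattern_from_alignment(["DRHNDN", "DRHSDN", "DRHNSN"]))
```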
23. PHI-BLAST
PROSITE pattern for the kinase active site, starting from the conserved DRH and
making use of the very conserved DFG region:
D-R-H-[NS]-[DS]-N-[IL]-x-[IV]-x-[DEK]-[DGST]-G-[NQR]-L-F-H-I-D-F-G
The above query sequence and the PROSITE pattern are used as inputs for the
PHI-BLAST search (see next slide)
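A PROSITE pattern of this kind maps directly onto a regular expression, which makes it easy to check where the pattern occurs in a sequence before submitting a search. The converter below is a minimal sketch covering single residues, `[..]` classes, `{..}` exclusions, `x`, and repetition counts; the function name and the test sequence are hypothetical, and PHI-BLAST itself additionally requires the hits to align significantly around the pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if counted:                                   # e.g. x(2) or x(2,4)
            element, low, high = counted.groups()
            repeat = "{%s,%s}" % (low, high) if high else "{%s}" % low
        if element == "x":
            core = "."                                # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"         # any residue except these
        else:
            core = element                            # residue or [..] class
        regex.append(core + repeat)
    return "".join(regex)


if __name__ == "__main__":
    kinase = ("D-R-H-[NS]-[DS]-N-[IL]-x-[IV]-x-[DEK]-"
              "[DGST]-G-[NQR]-L-F-H-I-D-F-G")
    regex = prosite_to_regex(kinase)
    print(regex)
    # Hypothetical sequence constructed to contain one occurrence of the pattern.
    hit = re.search(regex, "MKDRHNDNIAVKDDGNLFHIDFGLS")
    print(hit.start(), hit.group())
```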