This document provides an overview of several database searching tools used for comparing biological sequences, including BLAST, FASTA, and Entrez. BLAST is commonly used to compare gene and protein sequences against public databases and find high-scoring local alignments. It breaks queries into fragments to seek matches. FASTA is a similar heuristic homology search tool. Entrez allows text-based searches across various NCBI biological databases and integrates cross-referenced information. The document discusses the types of BLAST searches and the advantages of database systems, such as reduced redundancy and faster access, as well as disadvantages such as complexity and cost.
3. INTRODUCTION
A database search engine is a search engine that operates on
material stored in a digital database.
A database searching tool lets the user find database objects
whose name or definition contains a given string.
The most important DNA databases are those of the European
Molecular Biology Laboratory (EMBL), GenBank, and the
DNA Data Bank of Japan (DDBJ); the main protein
databases are Swiss-Prot and TrEMBL. The commonly used
BLAST and FASTA algorithms are described in detail, and
alternative approaches are mentioned briefly.
4. BLAST (Basic Local Alignment Search Tool)
BLAST is used for comparing gene and protein sequences
against others in public databases.
It is a set of sequence comparison algorithms used to search
databases for high-scoring local alignments to the query.
It breaks the query and database sequences into fragments
("words") and seeks matches between them.
BLAST is a computer algorithm that is available for use online
at the National Center for Biotechnology Information (NCBI)
website and many other sites.
BLAST comes in several variants:
blastn, blastp, PSI-BLAST (blastpgp), blastx, tblastx, tblastn
and MegaBLAST.
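The fragment-matching idea can be illustrated with a minimal sketch. It assumes exact matches of fixed-length words only; the function name `find_word_hits` and the `word_size` parameter are hypothetical, and real BLAST additionally scores similar words and extends each hit into a local alignment.

```python
from collections import defaultdict


def find_word_hits(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    # Index every overlapping word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject sequence and look each of its words up in the index.
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            hits.append((i, j))
    return hits


hits = find_word_hits("ACGTAC", "TTACGTT", word_size=4)
print(hits)
```

Each hit is only a seed: a candidate position where a longer alignment might be found.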
5. Types of BLAST
blastn: given a DNA query, this program returns the most similar
DNA sequences from the DNA database that the user specifies.
blastp: given a protein query, this program returns the most similar
protein sequences from the protein database that the user specifies.
PSI-BLAST (blastpgp): this program is used to find distant relatives
of a protein.
blastx: this program compares the six-frame conceptual translation
products of the nucleotide query sequence against a protein
database.
tblastx: this program compares the six-frame translations of a
nucleotide query against the six-frame translations of a nucleotide
database; its main purpose is to find distant relationships between
nucleotide sequences.
tblastn: this program compares a protein query against all six
reading frames of a nucleotide sequence database.
MegaBLAST: when comparing a large number of input sequences via
command-line BLAST, megablast is much faster than running
BLAST multiple times.
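The six-frame conceptual translation used by blastx, tblastn and tblastx can be sketched as follows. This is an illustration only, assuming the standard genetic code and an unambiguous DNA alphabet; the function names are hypothetical and are not part of any BLAST distribution.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons give X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```

A translated search then compares each of these six protein strings, rather than the nucleotide sequence itself, against the other side of the comparison.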
7. FASTA
FASTA is a DNA and protein sequence alignment software
package.
It is a fast homology search tool.
Like BLAST, it uses a heuristic to speed up sequence comparison
relative to full dynamic-programming alignment; BLAST is
generally considered the faster of the two.
The package includes programs for protein-protein, DNA-DNA,
protein-translated DNA, and ordered or unordered peptide searches.
Recent versions of FASTA can handle frameshift errors in
translated searches.
The FASTA package is available from the University of Virginia and
the European Bioinformatics Institute.
FASTA follows a largely heuristic method, which contributes
to its high speed of execution.
It initially observes the pattern of word hits (word-to-word matches
of a given length) and marks potential matches before running a
slower optimized alignment.
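The word-hit step can be sketched by counting exact k-tuple matches per diagonal (subject position minus query position). This is a simplified illustration, not the FASTA implementation: the function names are hypothetical, and the rescoring and optimized alignment stages are omitted.

```python
from collections import Counter, defaultdict


def diagonal_hit_counts(query, subject, ktup=2):
    """Count exact k-tuple matches on each diagonal (subject_pos - query_pos)."""
    # Lookup table: every k-tuple of the query mapped to its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    # Matches that lie on the same diagonal belong to the same ungapped region.
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


def best_diagonal(query, subject, ktup=2):
    """Return (diagonal, hit_count) for the densest diagonal, or None."""
    counts = diagonal_hit_counts(query, subject, ktup)
    return counts.most_common(1)[0] if counts else None


print(best_diagonal("HEAGAW", "PAWHEAE", ktup=2))
```

Diagonals with many hits mark the potential matches that are worth passing to the slower alignment step.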
9. ENTREZ
The most popular data retrieval systems for biological databases are
Entrez and the Sequence Retrieval System (SRS).
NCBI develops and maintains Entrez, a retrieval system for
biological data.
It allows text-based searches for a wide variety of data, including
annotated genetic sequence information, structural information, as
well as citations and abstracts.
It can integrate information through cross-referencing between
NCBI databases, which makes it convenient to move between
related records.
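A text-based Entrez search can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and does not send it; the helper name `build_esearch_url` and the example query are assumptions for illustration, and NCBI's usage guidelines (such as request rate limits) should be checked before running real queries.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for a text query against one Entrez database."""
    params = {
        "db": db,           # Entrez database name, e.g. "protein" or "pubmed"
        "term": term,       # text query, optionally with field tags
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("protein", "insulin AND human[Organism]", retmax=5)
print(url)
```

Fetching the URL, for example with `urllib.request.urlopen`, would return the matching record identifiers, which can then be used to retrieve the records themselves.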
11. Advantages
Reduced data redundancy
Fewer updating errors and increased
consistency
Excellent data integrity
Independence from application programs
Better data transfer and data security
Faster data access
12. Disadvantages
Database systems are complex, difficult, and
time-consuming to design.
Substantial hardware and software start-up costs.
Damage to the database affects virtually all application
programs.
Extensive conversion costs in moving from a
file-based system to a database system.
Initial training required for all programmers and
users.