This document discusses various bioinformatics tools and their functions. It provides details on sequence alignment and search tools such as CLUSTAL Omega, CLUSTALW, BLAST, and FASTA. It explains that CLUSTAL Omega can align a large number of sequences quickly and accurately using progressive alignment. CLUSTALW performs multiple sequence alignment in three steps: pairwise alignment, guide tree creation, and multiple alignment using the guide tree. BLAST can identify unknown sequences by comparing them to known sequences. FASTA uses short exact matches to find similar regions between sequences. Expasy provides access to databases for proteomics, genomics, and other areas. MASCOT searches peptide mass fingerprinting and shotgun proteomics datasets.
Article
Bioinformatics (401)
Name: Misbah Tabsum
ID: mc170201553
Bioinformatics tools and software: their specificity and functioning
Abstract: Biological systems have provided us with vast qualitative and quantitative
descriptions of the diverse collection of cellular components. This information is used in
much research into the dynamic responses of living systems such as cells, tissues,
organelles and organ systems. It is nowadays used in many fields, such as
proteomics, genomics, phylogeny, population genetics, transcriptomics and systems biology.
Many platforms are responsible for storing, scoring and analyzing all this
information, and many databases provide information about DNA and
protein sequences. Here we discuss these databases and several other online
bioinformatics tools, their specificity and their applications.
Key words: bioinformatics, CLUSTAL OMEGA, BLAST, FASTA, CLUSTALW, EXPASY, MASCOT,
GENBANK
Introduction:
Bioinformatics tools provide a basic platform from which we can obtain information at the genetic and proteomic level. Using these tools we can sequence, score and analyze data obtained from an organism's genetic history. We can also use these data in many fields, such as proteomics, genomics, phylogeny, population genetics, transcriptomics and systems biology. Here we discuss some free online bioinformatics tools and their applications [5].
CLUSTAL Omega
Introduction:
It is a new multiple sequence alignment program. It is often used to generate alignments between three or more sequences [1].
Specificity and functionality:
It is a new version of the widely used Clustal tool, which is a series of programs used for multiple sequence alignment [2].
This program can handle a large number of DNA, RNA or protein sequences, so it is mainly used to align large sets of nucleotide or amino acid sequences.
It does this quickly and accurately because it uses a progressive alignment heuristic [3]. On small test cases its accuracy is comparable to that of high-quality aligners.
It is considered one of the best alignment tools in terms of execution time and quality [4].
Clustal Omega is also useful because it can add sequences and information to existing alignments.
CLUSTALW
Introduction:
ClustalW is an online tool for performing multiple sequence alignment (MSA). Clustal was first described in 1988 [6].
It is used for aligning multiple protein or nucleotide sequences. It is freely available and easy to use.
Applications:
Clustal programs are also widely used in molecular systematics. We can use these tools to identify different genes and species. Because Clustal tools are applied to ribosomal RNA and intergenic region sequences, we can identify differences at the gene, species and strain level.
Clustal is also used to draw phylogenetic trees [7].
Specificity and functionality:
ClustalW is a widely used heuristic method for computing multiple sequence alignments.
It was developed at the European Molecular Biology Laboratory and the European Bioinformatics Institute [8].
Clustal aligns the sequences in three basic steps:
1. First, pairwise alignments are performed.
2. Then a guide tree is created.
3. Finally, the multiple alignment is carried out using this guide tree.
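The first two of these steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ClustalW's actual implementation: it uses unit-cost edit distance as the pairwise alignment score and UPGMA (average linkage) to build the guide tree, whereas ClustalW itself builds its guide tree by neighbour-joining. The function names and toy sequences are hypothetical, and step 3 (the progressive alignment along the tree) is omitted.

```python
from itertools import combinations


def edit_distance(a, b):
    """Global alignment cost with unit penalties (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # gap in b
                           cur[j - 1] + 1,              # gap in a
                           prev[j - 1] + (ca != cb)))   # match or mismatch
        prev = cur
    return prev[-1]


def pairwise_distances(seqs):
    """Step 1: score every pair of sequences (normalised edit distance)."""
    return {
        frozenset((x, y)): edit_distance(seqs[x], seqs[y])
        / max(len(seqs[x]), len(seqs[y]))
        for x, y in combinations(seqs, 2)
    }


def guide_tree(seqs):
    """Step 2: join the closest clusters repeatedly (UPGMA, an assumption)."""
    dist = pairwise_distances(seqs)
    sizes = {name: 1 for name in seqs}
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Average-linkage update of the distance to the merged cluster.
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))
```

The nested tuple returned by `guide_tree` gives the order in which a progressive aligner would merge sequences: the most similar pair is aligned first, and more distant sequences are added later.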
Its pairwise alignment stage can run either in a slow but accurate mode or in a fast but approximate mode.
Scope of ClustalW:
It can create multiple alignments.
It can optimize existing alignments.
It can perform profile analysis and create phylogenetic trees.
FASTA
Introduction:
It was developed in 1988. It is a program for searching databases with a query protein or nucleotide sequence. It performs fast alignment: FASTA first extracts regions of absolute identity (100% similarity).
Functionality and specificity:
FASTA uses short stretches of exact matches, that is, short subsequences of the query that exactly match subsequences in the database [9].
It tries to find small regions of one sequence that occur in the other sequence in exactly the same pattern. Once it has extracted these regions, it runs the Needleman-Wunsch or Smith-Waterman algorithm over the resulting candidate regions and finds the best alignment.
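The exact-match stage described above can be sketched as a word lookup followed by a count of hits per diagonal. This is only an illustration under simplifying assumptions: the function names and the word length `k` are hypothetical, and the real FASTA program additionally rescores and joins the best regions before the final dynamic-programming alignment.

```python
from collections import Counter, defaultdict


def ktup_index(seq, k):
    """Map every word of length k in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, subject, k=2):
    """Return (diagonal, hit_count) for the diagonal with most exact word hits.

    A diagonal is subject_position - query_position; many hits on the same
    diagonal indicate an ungapped region of similarity.
    """
    index = ktup_index(subject, k)
    diagonals = Counter()
    for q_pos in range(len(query) - k + 1):
        for s_pos in index.get(query[q_pos:q_pos + k], ()):
            diagonals[s_pos - q_pos] += 1
    if not diagonals:
        return None
    return diagonals.most_common(1)[0]
```

For example, `best_diagonal("GATTACA", "CCGATTACACC", k=3)` finds all five query words on diagonal 2, which is where a Smith-Waterman refinement would then be focused.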
We can select a protein or DNA sequence from NCBI or another online resource and submit it to the FASTA algorithm to find matching sequences in the database [10].
Types of FASTA:
There are six FASTA programs:
fasts35:
It compares unordered peptides to a protein sequence database.
fastm35:
It compares ordered peptides (or short DNA sequences) to a protein (DNA) sequence database.
fasta35:
It scans a protein or DNA sequence library for similar sequences.
fastx35:
It compares a translated DNA sequence (six reading frames) to a protein sequence database.
tfastx35:
It compares a protein sequence to a DNA sequence database (six reading frames).
fasty35:
It compares a DNA sequence (six reading frames) to a protein sequence database [11].
Conclusion:
It is a very fast way to compare a sequence of interest against the sequences in a database.
BLAST
Introduction:
BLAST is an abbreviation of Basic Local Alignment Search Tool. It was developed in 1990 at the National Center for Biotechnology Information (NCBI), USA [6].
BLAST is a free online program used to search databases with a query protein or nucleotide sequence. It can also search translation products.
Functionality and specificity:
BLAST can identify an unknown sequence by comparing it to known sequences. It helps in searching sequence databases. Using BLAST we can infer the source organism, function and evolutionary history of a sequence.
To work with BLAST, we first select a specific amino acid sequence from NCBI. Unknown sequences are obtained by NGS and mass spectrometry, while known sequences are taken from NCBI and UCS [12].
We then align these sequences using different scoring schemes to obtain the best alignment. BLAST provides a quick alignment of the sequences.
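To illustrate how the scoring scheme affects an alignment score, the sketch below computes a Smith-Waterman local alignment score with adjustable match, mismatch and gap values. It is not how BLAST works internally (BLAST uses a faster seed-and-extend heuristic instead of full dynamic programming); the function name and the default scores are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            cur.append(max(0, diagonal, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

Calling the function with different `match`, `mismatch` and `gap` values on the same pair of sequences shows how the choice of scheme changes which alignment is considered best.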
Types of Blast:
There are two basic types of BLAST.
Nucleotides:
Blastn:
It compares a nucleotide query sequence against a nucleotide database.
Proteins:
Blastp:
It compares an amino acid query sequence against a protein sequence database.
There are also several other types of BLAST:
Blastx:
It compares a nucleotide query sequence against a protein sequence database. It helps us find potential translation products of unknown nucleotide sequences.
tBlastn:
It compares a protein query sequence against a nucleotide sequence database that is dynamically translated in all reading frames.
tBlastx:
It compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
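The translated searches above (Blastx, tBlastn, tBlastx) rely on translating a nucleotide sequence in all six reading frames. A minimal sketch follows, assuming the standard genetic code and an unambiguous DNA alphabet; the function names and the frame labels such as `+1` and `-1` are illustrative choices.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the translation of each of the six reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six protein strings can then be compared against a protein database, which is the idea behind searching with a nucleotide query whose reading frame is not known.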
Conclusion:
BLAST has become an essential tool for scientists because of its sensitivity and speed. Biologists use BLAST to compare nucleotide and protein sequences against both single sequences and large databases. It can identify an unknown sequence by comparing it to known sequences [13].
Expasy
Introduction:
Expasy is an online resource developed by the Swiss Institute of Bioinformatics (SIB). Expasy provides access to databases and tools in areas such as proteomics, genomics, systems biology, phylogeny, transcriptomics and population genetics [14].
Functionality and specificity:
Expasy is a website that can be reached at expasy.org.
The site opens onto a large amount of information. From this page we can reach the following tools [15].
ScanProsite:
It helps us find all patterns in a protein sequence. It allows a protein sequence to be scanned for occurrences of the patterns, profiles and motifs stored in the PROSITE database [16].
PeptideMass:
It helps us estimate the masses of peptides (small fragments of a protein) that result from enzymatic digestion of a protein [17].
PROWL and FindMod:
These help predict the post-translational modifications of a protein [18].
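The pattern scanning that ScanProsite performs can be approximated by converting a PROSITE pattern into a regular expression. The sketch below covers only the common syntax elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends); the function names are hypothetical, and profiles and rarer syntax cases are not handled. The N-glycosylation pattern `N-{P}-[ST]-{P}` is used as the example.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

The lookahead in `scan` is a design choice so that overlapping motif occurrences are all reported rather than only the first of each overlapping group.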
GenBank
Introduction:
GenBank is an online database that consists of a collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), GenBank at NCBI and the European Molecular Biology Laboratory (EMBL).
These organizations exchange their data on a daily basis.
Functionality and specificity:
GenBank is maintained by the National Center for Biotechnology Information (NCBI).
The website gives us access to databases of protein, DNA and other nucleotide sequences. GenBank also links to many other tools and websites in areas such as proteomics, genomics, phylogeny [19], population genetics and systems biology.
How can we access it?
Several structural, sequence and molecular interaction databases are available through GenBank. Because they are online, we can reach them through the web portal and freely access and download data from them [20].
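Besides the web portal, records can also be downloaded programmatically through NCBI's E-utilities `efetch` service. The sketch below only builds the request URL and wraps the download in a function; the helper names are hypothetical, the accession `U49845` is used purely as an example, and NCBI's usage guidelines (request rate, optional API key) should be checked before fetching in bulk.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession, rettype="gb"):
    """Build an efetch URL for one nucleotide record (GenBank flat file by default)."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_record(accession, rettype="gb"):
    """Download the record as text; requires network access."""
    with urlopen(genbank_url(accession, rettype)) as response:
        return response.read().decode("utf-8")
```

Passing `rettype="fasta"` requests the same record in FASTA format instead of the full GenBank flat file.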
MASCOT
Introduction:
MASCOT is an online tool that helps in searching peptide mass fingerprinting and shotgun proteomics datasets. It is an online bottom-up proteomics search engine developed by Matrix Science.
Functionality and specificity:
Even when sequence similarity is low, MASCOT is reported to achieve good alignments by using three-way alignment in addition to two-way alignment. Matrix Science provides a web-based bottom-up proteomics search engine, Mascot, which can search peptide mass fingerprinting and shotgun proteomics datasets. Mascot is the most widely used online search tool for proteomics data. However, it lacks a batch processing mode. Additionally, it does not cater for top-down proteomics data.
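The idea behind a peptide mass fingerprinting search can be sketched as follows: digest each database protein in silico, compute the peptide masses, and count how many observed masses are explained. This is a simplified illustration and not Mascot's algorithm, which uses probability-based scoring. It assumes trypsin cleavage after K or R except before P, no missed cleavages, neutral monoisotopic masses, and hypothetical function names and tolerance.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def count_matches(observed, protein, tol=0.01):
    """Number of observed masses within tol Da of a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def rank_proteins(observed, database, tol=0.01):
    """Rank database proteins by how many observed masses they explain."""
    scores = {name: count_matches(observed, seq, tol) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A simple match count is used here only to keep the example short; it does not account for protein length or the chance of random matches.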
ORF finder:
ORF finder searches for open reading frames (ORFs) in the DNA sequence that we enter. The program returns the range of each ORF together with its protein translation. We use ORF finder to search newly sequenced DNA for potential protein-coding segments; we then verify the predicted protein using the newly developed SmartBLAST or regular blastp.
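A basic ORF search of the kind described above can be sketched by scanning the three frames of each strand for an ATG followed in frame by a stop codon. The sketch assumes ATG as the only start codon and the standard stop codons, and the function names and `min_codons` parameter are hypothetical; the actual ORF finder offers further options, and the protein translation step is left out here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=1):
    """Return (strand, frame, start, end, orf_dna) for each ATG...stop ORF.

    start and end are 0-based, end-exclusive coordinates on the strand that
    was read; min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, seq[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Raising `min_codons` filters out very short ORFs, which are frequent by chance in any DNA sequence.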
ProSight:
ProSight PTM searches top-down proteomics data and reports the precursor protein. This online top-down proteomics search engine was developed by Kelleher et al.
Functionality and specificity:
Post-translational modifications can be accurately identified using ProSight. ProSight PTM [21] was developed as a web-based application that enabled researchers to search neutral mass lists of fragment ions against proteomic databases [22]. When combined with predicted PTM information, this allows researchers to identify and characterize proteins by finding which proteins both have the observed precursor mass and produce the observed fragment pattern.
Two types of database schema were supported: a simple schema and a highly annotated schema. The simple schema considers only sequence variations and a few specific phosphorylation [23] cases. Highly annotated schema databases, however, take into account a large number of potential post-translational modifications, alone and in combination with others. By querying the observed neutral masses against these databases, a user can accomplish protein identification and characterization using the top-down approach.
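The precursor-mass part of this idea can be illustrated by checking which combinations of modifications, added to an unmodified protein mass, explain an observed neutral mass. This is a toy sketch and not ProSight's implementation: the function name, the three-entry modification table and the tolerance are assumptions, and fragment-ion matching is not shown.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts in daltons for a few common modifications.
PTM_SHIFTS = {"acetyl": 42.01056, "methyl": 14.01565, "phospho": 79.96633}


def explain_precursor(observed_mass, unmodified_mass, max_mods=2, tol=0.01):
    """List modification combinations whose total shift explains the observed mass."""
    hits = []
    for count in range(max_mods + 1):
        for combo in combinations_with_replacement(sorted(PTM_SHIFTS), count):
            predicted = unmodified_mass + sum(PTM_SHIFTS[name] for name in combo)
            if abs(predicted - observed_mass) <= tol:
                hits.append(combo)
    return hits
```

In a top-down search the candidate combinations found this way would then be confirmed or rejected by comparing the predicted fragment masses with the observed fragment ions.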
References:
1. Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
2. Myers,E.W. and Miller,W. (1988) Optimal alignments in linear space. Comput. Applic. Biosci., 4, 11–17.
3. Feng,D.F. and Doolittle,R.F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., 25, 351–360.
4. Taylor,W.R. (1988) A flexible method to align large numbers of biological sequences. J. Mol. Evol., 28, 161–169.
5. Wilbur,W.J. and Lipman,D.J. (1983) Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl Acad. Sci. USA, 80, 726–730.
6. Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73, 237–244.
7. Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol., 266, 383–402.
8. Jeanmougin,F., Thompson,J.D., Gouy,M., Higgins,D.G. and Gibson,T.J. (1998) Multiple sequence alignment with Clustal X. Trends Biochem. Sci., 23, 403–405.
9. Pearson,W.R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 183, 63–98.
10. Pearson,W.R. (1991) Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics, 11, 635–650.
11. EC Franklin, M Stat, X Pochon… - Molecular Ecology …, 2012 - Wiley Online Library
12. Altschul, S. F., et al. Basic Local Alignment Search Tool. Journal of Molecular Biology 215,
403–410 (1990) doi:10.1016/S0022-2836(05)80360-2 (link to article)
13. Altschul, S. F., et al. Gapped Blast and PSI-Blast: A new generation of protein database
search programs. Nucleic Acids Research 25, 3389–3402 (1997)
14. Gasteiger, E.; Gattiker, A; Hoogland, C; Ivanyi, I; Appel, RD; Bairoch, A (2003). "ExPASy: The proteomics server for in-depth protein knowledge and analysis". Nucleic Acids Research. 31 (13): 3784–3788. doi:10.1093/nar/gkg563. PMC 168970. PMID 12824418.
15. Ellis, R.H., T.D. Hong and E.H. Roberts (1985). Handbook of Seed Technology for Genebanks Vol. II: Compendium of Specific Germination Information and Test Recommendations. IBPGR (now Bioversity International). Rome, Italy. Archived from the original on 11 December 2008.
16. Altschul,S.F., Madden,T.L., Schaeffer,A.A., Zhang,J., Miller,W. and Lipman,D.J. (1997)
Gapped BLAST and PSI-BLAST: a new generation of protein database
17. Gattiker,A., Gasteiger,E. and Bairoch,A. (2002) ScanProsite: a reference implementation of a PROSITE scanning tool. Applied Bioinform.
18. Peitsch,M.C. (1995) Protein modelling by E-Mail. Biotechnology
19. Guex,N. and Peitsch,M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis
20. Sievers F, Wilm A, Dineen D et al (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7:539. doi: 10.1038/msb.2011.75
21. LeDuc RD, Taylor GK, Kim Y-B, Januszyk TE, Bynum LH, Sola JV, Garavelli JS, Kelleher NL.
ProSight PTM: an integrated environment for protein identification and characterization by top-
down mass spectrometry, Nucleic Acids Res , 2004, vol. 32 (pg. W340-W345)
22. Roth MJ, Forbes AJ, Boyne MTII, Kim Y-B, Robinson DE, Kelleher NL. Precise and parallel
characterization of coding polymorphisms, alternative splicing, and modifications in human
proteins by mass spectrometry, Mol. Cell. Proteom , 2005, vol. 4 (pg. 1002-1008)
23. Pesavento JJ, Kim YB, Taylor GK, Kelleher NL. Shotgun annotation of histone modifications:
a new approach for streamlined characterization of proteins by top down mass spectrometry, J.
Am. Chem. Soc , 2004, vol.