Seyed mohammad motevalli
December 2013
outline
 Introduction to bioinformatics
 Biological databases
 Sequence alignment and their algorithms
 Structural prediction

 Web-based tools
 Stand-alone software
Introduction to bioinformatics
 What is the bioinformatics?
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science.
Introduction to bioinformatics
 What are differences between bioinformatics and

informatics?
 What are differences between bioinformatics and
computational biology?
 What is the algorithm?
What is the proteomics!?
Biological databases
 Database

A database is a computerized archive used to store and organize data in such a
way that information can be retrieved easily via a variety of search criteria
 Entry
Each record should contain a number of fields that hold the actual data items
 Value
a particular piece of information
 Making a query
To retrieve a particular record from the database, a user can specify a value to
be found in a particular field and expect the computer to retrieve the whole
data record
Biological databases
 Primary databases
 Gen bank (NCBI)
 EMBL
 DDBJ

www.ncbi.nlm.nih.gov
www.ebi.ac.uk/embl/index.html
www.ddbj.nig.ac.jp

 Secondary databases
 ExPASY
 PIR
 SWISS-Prot

http://web.expasy.org
http://pir.georgetown.edu/pirwww/pirhome3.shtml
www.ebi.ac.uk/swissprot/access.html
Biological databases
 Interconnection between Biological Databases
Biological databases
 Pitfalls of biological databases
 The causes of redundancy include: repeated submission of identical or

overlapping sequences by the same or different authors, revision of
annotations, dumping of expressed sequence tags (EST) data
 Redundant sequences
 Non-redundant sequences (Ref Seq)
Biological databases
 Further databases
 NCBI







www.ncbi.nlm.nih.gov
Uniprot
http://www.uniprot.org
ExPASY
http://web.expasy.org
PIR
http://pir.georgetown.edu/
SWISS-Prot
http://swissmodel.expasy.org/
PDB
http://www.rcsb.org/pdb/home/home.do
Enzyme structure http://www.ebi.ac.uk/thornton-srv/databases/enzymes
Biological databases
 NCBI

www.ncbi.nlm.nih.gov
Biological databases
 Uniprot

http://www.uniprot.org
Biological databases
 ExPASY

http://web.expasy.org
Biological databases
 PIR

http://pir.georgetown.edu/
Biological databases
 SWISS-Prot

http://swissmodel.expasy.org/
Biological databases
 PDB

http://www.rcsb.org/pdb/home/home.do
Biological databases
 Enzyme structure

http://www.ebi.ac.uk/thornton-srv/databases/enzymes
Sequence alignment and their
algorithms
 Pairwise sequence alignment
Pairwise sequence alignment is the process of aligning two sequences and is
the basis of database similarity searching and multiple sequence alignment

 Sequence similarity versus sequence homology
When two sequences are descended from a common evolutionary origin, they
are said to have a homologous relationship or share homology. A related but
different term is sequence similarity, which is the percentage of aligned
residues that are similar in physiochemical properties such as size, charge,
and hydrophobicity

 Sequence similarity versus sequence identity
In a protein sequence alignment, sequence identity refers to the percentage of
matches of the same amino acid residues between two aligned sequences.
Similarity refers to the percentage of aligned residues that have similar
physicochemical characteristics and can be more readily substituted for each
other
Sequence alignment and their
algorithms
 Sequence alignment strategies
 Global alignment

In global alignment, two sequences to be aligned are assumed to be generally
similar over their entire length. Alignment is carried out from beginning to end
of both sequences to find the best possible alignment across the entire length
between the two sequences
 Local alignment
In local alignment does not assume that the two sequences in question have
similarity over the entire length. It only finds local regions with the highest
level of similarity between the two sequences and aligns these regions without
regard for the alignment of the rest of the sequence regions
Sequence alignment and their
algorithms
Sequence alignment and their
algorithms
Linear gap penalty: The cost for creation and extension of gaps are the same

W(I)= gI, g is the cost for each gap and I is the length

Affine gap penalty: different cost for creation and extension
W(I)=gopen + gext (I-1) and gopen < Gext

S

S

,

W I
Sequence alignment and their
algorithms
 Alignment Algorithms And Methodes
 The dot matrix method
 The word method
 The dynamic programming method
Sequence alignment and their
algorithms
 Alignment Algorithms
 The dot matrix method

The most basic sequence alignment method is the dot matrix method, also
known as the dot plot method
Sequence alignment and their
algorithms
 Alignment Algorithms
 The word method

It works by finding short stretches of identical or nearly identical letters in
two sequences. These short strings of characters are called words, which
are similar to the windows used in the dot matrix method
Sequence alignment and their
algorithms
 Alignment Algorithms
 The word method
Sequence alignment and their
algorithms
 Alignment Algorithms
 The dynamic programming method

Dynamic programming is a method that determines optimal alignment by
matching two sequences for all possible pairs of characters between the
two sequences
Sequence alignment and their
algorithms
 Alignment Algorithms
 The dynamic programming method
 Global alignment

The classical global pairwise alignment algorithm using dynamic
programming is the Needleman–Wunsch algorithm. In this algorithm, an
optimal alignment is obtained over the entire lengths of the two sequences
 Local alignment

The first application of dynamic programming in local alignment is the
Smith–Waterman algorithm. In this algorithm, positive scores are
assigned for matching residues and zeros for mismatches. No negative
scores are used
Sequence alignment and their
algorithms
 substitution matrix
 PAM matrices (point accepted mutation)

The PAM matrices were subsequently derived based on the evolutionary
divergence between sequences of the same cluster. One PAM unit is defined as
1% of the amino acid positions that have been changed. Because of the use of
very closely related homologs, the observed mutations were not expected to
significantly change the common function of the proteins
Sequence alignment and their
algorithms
 substitution matrix
 PAM matrices (point accepted mutation)
Sequence alignment and their
algorithms
 substitution matrix
 BLOSUM matrices

This is the series of blocks amino acid substitution matrices (BLOSUM), all of
which are derived based on direct observation for every possible amino acid
substitution in multiple sequence alignments
Sequence alignment and their
algorithms
 substitution matrix
 BLOSUM matrices
Sequence alignment and their
algorithms
What Matrices should be used and when?
Matrix
PAM40

Best use
Similarity (%)
Short alignment that are
70-90
highly similar
PAM160
Detecting members of a
50-60
protein family
PAM250
Longer alignments of more App. 30
divergent sequences
BLUSOM90
Short alignment that are
70-90
highly similar
BLUSOME80
Detecting members of a
50-60
protein family
BLUSOME62
Most effective in finding
30-40
all potential similarities
BLUSOME30
Longer alignments of more <30
divergent sequences
Similarity: the range of similarities that the matrix is able to best tdetecr.
Comparison
• PAM is based on an evolutionary model
using phylogenetic trees
• BLOSUM assumes no evolutionary model,
but rather conserved “blocks” of proteins
Sequence alignment and their
algorithms
 Heuristic database searching
The heuristic algorithms perform faster searches because they examine only a
fraction of the possible alignments examined in regular dynamic programming
 BLAST (basic local alignment search tool)
BLAST uses heuristics to align a query sequence with all sequences in a
database
Sequence alignment and their
algorithms
 BLAST (basic local alignment search tool)
Sequence alignment and their
algorithms
6- finishing

Negative scores from scoring matrix

Threshold for stopping extension

Minimum
Score (S)
Neighborhood
Score Threshold (T)

If the extension stopped after crossing the X, the alignment is called
High-scoring segment pair (HSP)
Sequence alignment and their
algorithms
Suggested BLAST Cutoffs
Finding by chance in nucleotide database is more than proteins
Identity in proteins is more informative than in the nucleic acids
For nucleotide-based searches: hits with E values of 10-6 or
less and seq identity 70% or more
For protein-based searches: hits with E values of 10-3 or less and
seq. identity of 25% or more.
Sequence alignment and their
algorithms
 BLAST (basic local alignment search tool)
 BLASTN

queries nucleotide sequences with a nucleotide sequence database
 BLASTP
uses protein sequences as queries to search against a protein sequence
database
 BLASTX
uses nucleotide sequences as queries and translates them in all six reading
frames to produce translated protein sequences, which are used to query a
protein sequence database
 TBLASTN
queries protein sequences to a nucleotide sequence database with the
sequences translated in all six reading frames
 TBLASTX
uses nucleotide sequences, which are translated in all six frames, to search
against a nucleotide sequence database that has all the sequences
translated in six frames
Sequence alignment and their
algorithms
 PSI-BLAST

Position-specific iterated BLAST (PSI-BLAST) builds profiles and performs
database searches in an iterative fashion. The main feature of PSI-BLAST is
that profiles are constructed automatically and arefine-tunedin each successive
cycle
Sequence alignment and their
algorithms
 PSI-BLAST
Sequence alignment and their
algorithms
 Multiple sequence alignment
Sequence alignment and their
algorithms
 Multiple sequence alignment
 Exhaustive algorithms

The exhaustive alignment method involves examining all possible aligned
positions simultaneously
 Heuristic algorithms
 Because the use of dynamic programming is not feasible for routine multiple
sequence alignment, faster and heuristic algorithms have been developed.
computational strategy to find a near-optimal solution by using rules of
thumb. Essentially, this strategy takes shortcuts by reducing the search
space according to certain criteria
Sequence alignment and their
algorithms
 Multiple sequence alignment
 Heuristic algorithms
 Progressive alignment
 Progressive alignment depends on the stepwise assembly of multiple

alignment and is heuristic in nature
 Clustal
It is a progressive multiple alignment program available either as a standalone or on-line program
 T-coffee
T-coffee performs progressive sequence alignments as in Clustal. The main
difference is that, in processing a query, T-Coffee performs both global and
local pairwise alignment for all possible pairs involved. The global pairwise
alignment is performed using the Clustal program
Sequence alignment and their
algorithms
 Multiple sequence alignment
 Heuristic algorithms
 Iterative alignment

The iterative approach is based on the idea that an optimal
solution can be found by repeatedly modifying existing
suboptimal solutions
Sequence alignment and their
algorithms
 Multiple sequence alignment
 Heuristic algorithms
 Block-Based Alignment

The strategy identifies a block of ungapped alignment shared by all the
sequences, hence, the block-based local alignment strategy
Structural prediction
 Structural prediction methods
 Ab-initio prediction

Computational prediction based on first principles or using the most
elementary information
 Threading
Method of predicting the most likely protein structural fold based on secondary
structure similarity with database structures and assessment of energies of the
potential fold. The term has been used interchangeably with fold recognition
 Homology-based modeling
Method for predicting the three-dimensional structure of a protein based on
homology by assigning the structure of an unknown protein using an existing
homologous protein structure as a template
Hidden Markova algorithm
Statistical model composed of a number of interconnected. Markov chains
with the capability to generate the probability value of an event by taking
into account the influence from hidden variables. Mathematically, it
calculates probability values of connected states among the Markov chains
to find an optimal path within the network of states. It requires training to
obtain the probability values of state transitions. When using a hidden
Markov model to represent a multiple sequence alignment, a sequence can
be generated through the model by incorporating probability values of
match, insertion, and deletion states
Hidden Markova algorithm
Neural network algorithm
Machine-learning algorithm for pattern recognition. It is composed of
input, hidden, and output layers. Units of information in each layer are
called nodes. The nodes of different layers are interconnected to form a
network analogous to a biological nervous system. Between the nodes are
mathematical weight parameters that can be trained with known patterns
so they can be used for later predictions. After training, the network is able
to recognize correlation between an input and output
Neural network algorithm
Web-based tools
 Alignment tools
 Sequence-based methods
 T-coffee










http://tcoffee.crg.cat/apps/tcoffee/do:regular
NCBI
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Uniprot
http://www.uniprot.org
EMBL
http://coot.embl.de/Alignment
Structural-based methods
Dali server
http://ekhidna.biocenter.helsinki.fi/dali_server
FSSP
http://protein.hbu.cn/fssp
Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html
Active site prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
Web-based tools
 T-coffee

http://tcoffee.crg.cat/apps/tcoffee/do:regular
Web-based tools
 NCBI

http://blast.ncbi.nlm.nih.gov/Blast.cgi
Web-based tools
 Uniprot

http://www.uniprot.org
Web-based tools
 EMBL

http://coot.embl.de/Alignment
Web-based tools
 Dali server

http://ekhidna.biocenter.helsinki.fi/dali_server
Web-based tools
FSSP
http://protein.hbu.cn/fssp

Web-based tools
 Secondary structures prediction
 Sopma







http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html
Jpred3
http://www.compbio.dundee.ac.uk/www-jpred
PreSSaPro
http://bioinformatica.isa.cnr.it/PRESSAPRO
HMM protein structure prediction
http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
PROF
http://www.aber.ac.uk/~phiwww/prof
Software package http://molbiol-tools.ca/Protein_secondary_structure.htm


Web-based tools
Sopma
http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html


Web-based tools
Sopma
http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html
Web-based tools
 Jpred3

http://www.compbio.dundee.ac.uk/www-jpred
Web-based tools
 PreSSaPro

http://bioinformatica.isa.cnr.it/PRESSAPRO
Web-based tools
 HMM protein structure prediction

http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
Web-based tools
 PROF

http://www.aber.ac.uk/~phiwww/prof
Web-based tools
Software package


http://molbiol-tools.ca/Protein_secondary_structure.htm
Web-basedhttp://proline.bic.nus.edu.sg/spdb/searchn.html
tools
Signal peptide resource

Web-based tools
 Active site prediction

http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
Web-based tools
 Tertiary structure prediction
 Phyre2

http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
Web-based tools
 Biochemical features
 Protein calculator






http://www.scripps.edu/~cdputnam/protcalc.html
Amino acid calculator
http://proteome.gs.washington.edu/cgibin/aa_calc.pl
Peptide property calculator
https://www.genscript.com/sslbin/site2/peptide_calculation.cgi
Peptide property calculator
http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp
Physico-chemical profiles
http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html
Tagldent tool
http://web.expasy.org/tagident/
Web-based tools
 Biochemical features
 Peptide cutter








http://web.expasy.org/peptide_cutter/
Kyte doolittle hydropahty plot http://gcat.davidson.edu/DGPB/kd/kytedoolittle.htm
GRAVY calculator
http://www.gravy-calculator.de/index.php
ProtScale
http://web.expasy.org/protscale/
ProtParam
http://web.expasy.org/protparam/
Prosite
http://prosite.expasy.org/prosite.html
Interpro
http://www.ebi.ac.uk/interpro/
Web-based tools
Protein calculator http://www.scripps.edu/~cdputnam/protcalc.html

Web-based tools
Amino acid calculator


http://proteome.gs.washington.edu/cgi- bin/aa_calc.pl
Web-based tools
Peptide property calculator


https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi
Web-based tools
 Peptide property calculator

http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp
Web-based tools
 Physico-chemical profiles

http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html
Web-based tools
 Tagldent tool

http://web.expasy.org/tagident/
Web-based tools
Peptide cutter
http://web.expasy.org/peptide_cutter/

Web-based tools
Kyte doolittle hydropahty plot http://gcat.davidson.edu/DGPB/kd/kyte

doolittle.htm
Web-based http://www.gravy-calculator.de/index.php
tools
GRAVY calculator

Web-based tools
 ProtScale

http://web.expasy.org/protscale/
Web-based tools
 ProtParam

http://web.expasy.org/protparam/
Web-based tools
Prosite
http://prosite.expasy.org/prosite.html

Web-based tools
Interpro
http://www.ebi.ac.uk/interpro/

Stand-alone softwares
 MEGA
Stand-alone softwares
 CLC main workbench
Stand-alone softwares
 UGENE
Stand-alone softwares
 Spdb viewer
Stand-alone softwares
 Pairwise structure alignment
Stand-alone softwares
 Cn3D
Stand-alone software
 BioEdit
Stand-alone software
 ClustalX

Bioinformatics

  • 1.
  • 2.
    outline  Introduction tobioinformatics  Biological databases  Sequence alignment and their algorithms  Structural prediction  Web-based tools  Stand-alone software
  • 3.
    Introduction to bioinformatics What is the bioinformatics? Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science.
  • 4.
    Introduction to bioinformatics What are differences between bioinformatics and informatics?  What are differences between bioinformatics and computational biology?  What is the algorithm?
  • 6.
    What is theproteomics!?
  • 7.
    Biological databases  Database Adatabase is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria  Entry Each record should contain a number of fields that hold the actual data items  Value a particular piece of information  Making a query To retrieve a particular record from the database, a user can specify a value to be found in a particular field and expect the computer to retrieve the whole data record
  • 8.
    Biological databases  Primarydatabases  Gen bank (NCBI)  EMBL  DDBJ www.ncbi.nlm.nih.gov www.ebi.ac.uk/embl/index.html www.ddbj.nig.ac.jp  Secondary databases  ExPASY  PIR  SWISS-Prot http://web.expasy.org http://pir.georgetown.edu/pirwww/pirhome3.shtml www.ebi.ac.uk/swissprot/access.html
  • 9.
    Biological databases  Interconnectionbetween Biological Databases
  • 10.
    Biological databases  Pitfallsof biological databases  The causes of redundancy include: repeated submission of identical or overlapping sequences by the same or different authors, revision of annotations, dumping of expressed sequence tags (EST) data  Redundant sequences  Non-redundant sequences (Ref Seq)
  • 11.
    Biological databases  Furtherdatabases  NCBI       www.ncbi.nlm.nih.gov Uniprot http://www.uniprot.org ExPASY http://web.expasy.org PIR http://pir.georgetown.edu/ SWISS-Prot http://swissmodel.expasy.org/ PDB http://www.rcsb.org/pdb/home/home.do Enzyme structure http://www.ebi.ac.uk/thornton-srv/databases/enzymes
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
    Biological databases  Enzymestructure http://www.ebi.ac.uk/thornton-srv/databases/enzymes
  • 19.
    Sequence alignment andtheir algorithms  Pairwise sequence alignment Pairwise sequence alignment is the process of aligning two sequences and is the basis of database similarity searching and multiple sequence alignment  Sequence similarity versus sequence homology When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or share homology. A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physiochemical properties such as size, charge, and hydrophobicity  Sequence similarity versus sequence identity In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences. Similarity refers to the percentage of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other
  • 20.
    Sequence alignment andtheir algorithms  Sequence alignment strategies  Global alignment In global alignment, two sequences to be aligned are assumed to be generally similar over their entire length. Alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length between the two sequences  Local alignment In local alignment does not assume that the two sequences in question have similarity over the entire length. It only finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions
  • 21.
    Sequence alignment andtheir algorithms
  • 22.
    Sequence alignment andtheir algorithms Linear gap penalty: The cost for creation and extension of gaps are the same W(I)= gI, g is the cost for each gap and I is the length Affine gap penalty: different cost for creation and extension W(I)=gopen + gext (I-1) and gopen < Gext S S , W I
  • 23.
    Sequence alignment andtheir algorithms  Alignment Algorithms And Methodes  The dot matrix method  The word method  The dynamic programming method
  • 24.
    Sequence alignment andtheir algorithms  Alignment Algorithms  The dot matrix method The most basic sequence alignment method is the dot matrix method, also known as the dot plot method
  • 25.
    Sequence alignment andtheir algorithms  Alignment Algorithms  The word method It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, which are similar to the windows used in the dot matrix method
  • 26.
    Sequence alignment andtheir algorithms  Alignment Algorithms  The word method
  • 27.
    Sequence alignment andtheir algorithms  Alignment Algorithms  The dynamic programming method Dynamic programming is a method that determines optimal alignment by matching two sequences for all possible pairs of characters between the two sequences
  • 29.
    Sequence alignment andtheir algorithms  Alignment Algorithms  The dynamic programming method  Global alignment The classical global pairwise alignment algorithm using dynamic programming is the Needleman–Wunsch algorithm. In this algorithm, an optimal alignment is obtained over the entire lengths of the two sequences  Local alignment The first application of dynamic programming in local alignment is the Smith–Waterman algorithm. In this algorithm, positive scores are assigned for matching residues and zeros for mismatches. No negative scores are used
  • 30.
    Sequence alignment andtheir algorithms  substitution matrix  PAM matrices (point accepted mutation) The PAM matrices were subsequently derived based on the evolutionary divergence between sequences of the same cluster. One PAM unit is defined as 1% of the amino acid positions that have been changed. Because of the use of very closely related homologs, the observed mutations were not expected to significantly change the common function of the proteins
  • 31.
    Sequence alignment andtheir algorithms  substitution matrix  PAM matrices (point accepted mutation)
  • 32.
    Sequence alignment andtheir algorithms  substitution matrix  BLOSUM matrices This is the series of blocks amino acid substitution matrices (BLOSUM), all of which are derived based on direct observation for every possible amino acid substitution in multiple sequence alignments
  • 33.
    Sequence alignment andtheir algorithms  substitution matrix  BLOSUM matrices
  • 34.
    Sequence alignment andtheir algorithms What Matrices should be used and when? Matrix PAM40 Best use Similarity (%) Short alignment that are 70-90 highly similar PAM160 Detecting members of a 50-60 protein family PAM250 Longer alignments of more App. 30 divergent sequences BLUSOM90 Short alignment that are 70-90 highly similar BLUSOME80 Detecting members of a 50-60 protein family BLUSOME62 Most effective in finding 30-40 all potential similarities BLUSOME30 Longer alignments of more <30 divergent sequences Similarity: the range of similarities that the matrix is able to best tdetecr.
  • 35.
    Comparison • PAM isbased on an evolutionary model using phylogenetic trees • BLOSUM assumes no evolutionary model, but rather conserved “blocks” of proteins
  • 36.
    Sequence alignment andtheir algorithms  Heuristic database searching The heuristic algorithms perform faster searches because they examine only a fraction of the possible alignments examined in regular dynamic programming  BLAST (basic local alignment search tool) BLAST uses heuristics to align a query sequence with all sequences in a database
  • 37.
    Sequence alignment andtheir algorithms  BLAST (basic local alignment search tool)
  • 38.
    Sequence alignment andtheir algorithms 6- finishing Negative scores from scoring matrix Threshold for stopping extension Minimum Score (S) Neighborhood Score Threshold (T) If the extension stopped after crossing the X, the alignment is called High-scoring segment pair (HSP)
  • 39.
    Sequence alignment andtheir algorithms Suggested BLAST Cutoffs Finding by chance in nucleotide database is more than proteins Identity in proteins is more informative than in the nucleic acids For nucleotide-based searches: hits with E values of 10-6 or less and seq identity 70% or more For protein-based searches: hits with E values of 10-3 or less and seq. identity of 25% or more.
  • 40.
    Sequence alignment andtheir algorithms  BLAST (basic local alignment search tool)  BLASTN queries nucleotide sequences with a nucleotide sequence database  BLASTP uses protein sequences as queries to search against a protein sequence database  BLASTX uses nucleotide sequences as queries and translates them in all six reading frames to produce translated protein sequences, which are used to query a protein sequence database  TBLASTN queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames  TBLASTX uses nucleotide sequences, which are translated in all six frames, to search against a nucleotide sequence database that has all the sequences translated in six frames
  • 41.
    Sequence alignment andtheir algorithms  PSI-BLAST Position-specific iterated BLAST (PSI-BLAST) builds profiles and performs database searches in an iterative fashion. The main feature of PSI-BLAST is that profiles are constructed automatically and arefine-tunedin each successive cycle
  • 42.
    Sequence alignment andtheir algorithms  PSI-BLAST
  • 43.
    Sequence alignment andtheir algorithms  Multiple sequence alignment
  • 44.
    Sequence alignment andtheir algorithms  Multiple sequence alignment  Exhaustive algorithms The exhaustive alignment method involves examining all possible aligned positions simultaneously  Heuristic algorithms  Because the use of dynamic programming is not feasible for routine multiple sequence alignment, faster and heuristic algorithms have been developed. computational strategy to find a near-optimal solution by using rules of thumb. Essentially, this strategy takes shortcuts by reducing the search space according to certain criteria
  • 45.
    Sequence alignment andtheir algorithms  Multiple sequence alignment  Heuristic algorithms  Progressive alignment  Progressive alignment depends on the stepwise assembly of multiple alignment and is heuristic in nature  Clustal It is a progressive multiple alignment program available either as a standalone or on-line program  T-coffee T-coffee performs progressive sequence alignments as in Clustal. The main difference is that, in processing a query, T-Coffee performs both global and local pairwise alignment for all possible pairs involved. The global pairwise alignment is performed using the Clustal program
  • 47.
    Sequence alignment andtheir algorithms  Multiple sequence alignment  Heuristic algorithms  Iterative alignment The iterative approach is based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions
  • 48.
    Sequence alignment andtheir algorithms  Multiple sequence alignment  Heuristic algorithms  Block-Based Alignment The strategy identifies a block of ungapped alignment shared by all the sequences, hence, the block-based local alignment strategy
  • 49.
    Structural prediction  Structuralprediction methods  Ab-initio prediction Computational prediction based on first principles or using the most elementary information  Threading Method of predicting the most likely protein structural fold based on secondary structure similarity with database structures and assessment of energies of the potential fold. The term has been used interchangeably with fold recognition  Homology-based modeling Method for predicting the three-dimensional structure of a protein based on homology by assigning the structure of an unknown protein using an existing homologous protein structure as a template
  • 50.
    Hidden Markova algorithm Statisticalmodel composed of a number of interconnected. Markov chains with the capability to generate the probability value of an event by taking into account the influence from hidden variables. Mathematically, it calculates probability values of connected states among the Markov chains to find an optimal path within the network of states. It requires training to obtain the probability values of state transitions. When using a hidden Markov model to represent a multiple sequence alignment, a sequence can be generated through the model by incorporating probability values of match, insertion, and deletion states
  • 51.
  • 52.
    Neural network algorithm Machine-learningalgorithm for pattern recognition. It is composed of input, hidden, and output layers. Units of information in each layer are called nodes. The nodes of different layers are interconnected to form a network analogous to a biological nervous system. Between the nodes are mathematical weight parameters that can be trained with known patterns so they can be used for later predictions. After training, the network is able to recognize correlation between an input and output
  • 53.
  • 54.
    Web-based tools  Alignmenttools  Sequence-based methods  T-coffee         http://tcoffee.crg.cat/apps/tcoffee/do:regular NCBI http://blast.ncbi.nlm.nih.gov/Blast.cgi Uniprot http://www.uniprot.org EMBL http://coot.embl.de/Alignment Structural-based methods Dali server http://ekhidna.biocenter.helsinki.fi/dali_server FSSP http://protein.hbu.cn/fssp Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html Active site prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
    Web-based tools  Daliserver http://ekhidna.biocenter.helsinki.fi/dali_server
  • 60.
  • 61.
    Web-based tools  Secondarystructures prediction  Sopma      http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html Jpred3 http://www.compbio.dundee.ac.uk/www-jpred PreSSaPro http://bioinformatica.isa.cnr.it/PRESSAPRO HMM protein structure prediction http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PROF http://www.aber.ac.uk/~phiwww/prof Software package http://molbiol-tools.ca/Protein_secondary_structure.htm
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
    Web-based tools  HMMprotein structure prediction http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
  • 67.
  • 68.
  • 69.
  • 70.
    Web-based tools  Activesite prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
  • 71.
    Web-based tools  Tertiarystructure prediction  Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
  • 72.
    Web-based tools  Biochemicalfeatures  Protein calculator      http://www.scripps.edu/~cdputnam/protcalc.html Amino acid calculator http://proteome.gs.washington.edu/cgibin/aa_calc.pl Peptide property calculator https://www.genscript.com/sslbin/site2/peptide_calculation.cgi Peptide property calculator http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp Physico-chemical profiles http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html Tagldent tool http://web.expasy.org/tagident/
  • 73.
    Web-based tools  Biochemicalfeatures  Peptide cutter       http://web.expasy.org/peptide_cutter/ Kyte doolittle hydropahty plot http://gcat.davidson.edu/DGPB/kd/kytedoolittle.htm GRAVY calculator http://www.gravy-calculator.de/index.php ProtScale http://web.expasy.org/protscale/ ProtParam http://web.expasy.org/protparam/ Prosite http://prosite.expasy.org/prosite.html Interpro http://www.ebi.ac.uk/interpro/
  • 74.
    Web-based tools Protein calculatorhttp://www.scripps.edu/~cdputnam/protcalc.html 
  • 75.
    Web-based tools Amino acidcalculator  http://proteome.gs.washington.edu/cgi- bin/aa_calc.pl
  • 76.
    Web-based tools Peptide propertycalculator  https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi
  • 77.
    Web-based tools  Peptideproperty calculator http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp
  • 78.
    Web-based tools  Physico-chemicalprofiles http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html
  • 79.
    Web-based tools  Tagldenttool http://web.expasy.org/tagident/
  • 80.
  • 81.
    Web-based tools Kyte doolittlehydropahty plot http://gcat.davidson.edu/DGPB/kd/kyte doolittle.htm
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
  • 88.
  • 89.
  • 90.
  • 91.
  • 92.
  • 93.
  • 94.