Bioinformatics

  1. Seyed Mohammad Motevalli, December 2013
  2. Outline
     • Introduction to bioinformatics
     • Biological databases
     • Sequence alignment and its algorithms
     • Structural prediction
     • Web-based tools
     • Stand-alone software
  3. Introduction to bioinformatics
     • What is bioinformatics? Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science.
  4. Introduction to bioinformatics
     • What are the differences between bioinformatics and informatics?
     • What are the differences between bioinformatics and computational biology?
     • What is an algorithm?
  5. What is proteomics?
  6. Biological databases
     • Database: a computerized archive used to store and organize data so that information can be retrieved easily via a variety of search criteria.
     • Entry: each record contains a number of fields that hold the actual data items.
     • Value: a particular piece of information.
     • Making a query: to retrieve a particular record from the database, a user specifies a value to be found in a particular field and expects the computer to retrieve the whole data record.
  7. Biological databases
     • Primary databases
        - GenBank (NCBI): www.ncbi.nlm.nih.gov
        - EMBL: www.ebi.ac.uk/embl/index.html
        - DDBJ: www.ddbj.nig.ac.jp
     • Secondary databases
        - ExPASy: http://web.expasy.org
        - PIR: http://pir.georgetown.edu/pirwww/pirhome3.shtml
        - Swiss-Prot: www.ebi.ac.uk/swissprot/access.html
  8. Biological databases
     • Interconnection between biological databases
  9. Biological databases
     • Pitfalls of biological databases: redundancy. Its causes include repeated submission of identical or overlapping sequences by the same or different authors, revision of annotations, and dumping of expressed sequence tag (EST) data.
     • Redundant sequences
     • Non-redundant sequences (RefSeq)
  10. Biological databases
     • Further databases
        - NCBI: www.ncbi.nlm.nih.gov
        - UniProt: http://www.uniprot.org
        - ExPASy: http://web.expasy.org
        - PIR: http://pir.georgetown.edu/
        - SWISS-MODEL: http://swissmodel.expasy.org/
        - PDB: http://www.rcsb.org/pdb/home/home.do
        - Enzyme structure: http://www.ebi.ac.uk/thornton-srv/databases/enzymes
  11. Biological databases: NCBI, www.ncbi.nlm.nih.gov
  12. Biological databases: UniProt, http://www.uniprot.org
  13. Biological databases: ExPASy, http://web.expasy.org
  14. Biological databases: PIR, http://pir.georgetown.edu/
  15. Biological databases: SWISS-MODEL, http://swissmodel.expasy.org/
  16. Biological databases: PDB, http://www.rcsb.org/pdb/home/home.do
  17. Biological databases: Enzyme structure, http://www.ebi.ac.uk/thornton-srv/databases/enzymes
  18. Sequence alignment and its algorithms
     • Pairwise sequence alignment: the process of aligning two sequences. It is the basis of database similarity searching and multiple sequence alignment.
     • Sequence similarity versus sequence homology: when two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship, or to share homology. A related but different term is sequence similarity, the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity.
     • Sequence similarity versus sequence identity: in a protein sequence alignment, sequence identity is the percentage of aligned positions at which the two sequences have the same amino acid residue. Similarity is the percentage of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other.
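To make the identity/similarity distinction concrete, the following Python sketch counts both over an aligned pair. It is an illustration under stated assumptions: gapped columns are skipped, and `SIMILAR_GROUPS` is a hypothetical grouping of physicochemically similar residues. Real tools usually count positions with a positive score in a substitution matrix such as BLOSUM62 instead, and some divide by the full alignment length, so reported percentages depend on the convention.

```python
# Hypothetical grouping of "similar" residues, for illustration only.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _similar(a, b):
    """True if a and b are identical or fall in the same illustrative group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(s1, s2):
    """Percent identity and percent similarity over ungapped aligned columns."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    # Skip any column where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(a == b for a, b in pairs)
    similar = sum(_similar(a, b) for a, b in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)


print(identity_and_similarity("KLVE-D", "RLIEAD"))  # → (60.0, 100.0)
```

For `KLVE-D` against `RLIEAD`, three of the five ungapped columns are identical, while all five pair residues from the same group (K/R and V/I are similar but not identical).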
  19. Sequence alignment and its algorithms
     • Sequence alignment strategies
        - Global alignment: the two sequences are assumed to be generally similar over their entire length. Alignment is carried out from the beginning to the end of both sequences to find the best possible alignment across their entire length.
        - Local alignment: does not assume that the two sequences have similarity over their entire length. It finds only the local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequences.
  20. Sequence alignment and its algorithms
  21. Sequence alignment and its algorithms
     • Linear gap penalty: creating and extending a gap cost the same, so a gap of length $l$ costs $W(l) = g\,l$, where $g$ is the cost of each gap position.
     • Affine gap penalty: creating a gap and extending it have different costs, $W(l) = g_{open} + g_{ext}\,(l-1)$, with $g_{open} > g_{ext}$, so that one long gap is penalized less than several short gaps of the same total length.
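The two penalty functions can be compared directly. This is a minimal sketch; the values g = 4, g_open = 11 and g_ext = 1 are illustrative assumptions, not the defaults of any particular program.

```python
def linear_gap(length, g):
    """W(l) = g * l: every gap position costs the same."""
    return g * length


def affine_gap(length, g_open, g_ext):
    """W(l) = g_open + g_ext * (l - 1): the first gap position pays the opening cost."""
    if length == 0:
        return 0
    return g_open + g_ext * (length - 1)


# Illustrative (assumed) penalties: g = 4, g_open = 11, g_ext = 1.
one_long = (linear_gap(3, 4), affine_gap(3, 11, 1))
three_short = (3 * linear_gap(1, 4), 3 * affine_gap(1, 11, 1))
print(one_long, three_short)  # → (12, 13) (12, 33)
```

With the affine scheme, one gap of length 3 costs 13 while three separate one-residue gaps cost 33; the linear scheme charges 12 either way. This is why affine penalties favor a few long gaps over many short ones.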
  22. Sequence alignment and its algorithms
     • Alignment algorithms and methods
        - The dot matrix method
        - The word method
        - The dynamic programming method
  23. Sequence alignment and its algorithms
     • Alignment algorithms: the dot matrix method. The most basic sequence alignment method is the dot matrix method, also known as the dot plot method. The two sequences are written along the axes of a matrix and a dot is placed wherever their residues match, so that diagonal runs of dots reveal similar regions.
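A dot plot is easy to sketch. In this minimal version, the `window`/`threshold` filter is a common scheme assumed here rather than taken from a specific program: cell (i, j) is marked when the windows starting at position i of the first sequence and position j of the second share at least `threshold` identical residues. Larger windows and higher thresholds filter out random noise.

```python
def dot_matrix(seq1, seq2, window=1, threshold=1):
    """Return a grid of booleans: True where the window starting at seq1[i]
    and the window starting at seq2[j] share at least `threshold`
    identical positions."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    return [[sum(seq1[i + k] == seq2[j + k] for k in range(window)) >= threshold
             for j in range(cols)]
            for i in range(rows)]


def render(plot):
    """Draw the plot as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


print(render(dot_matrix("GATTACA", "GATTACA")))
print()
print(render(dot_matrix("GATTACA", "GATTACA", window=3, threshold=3)))
```

Plotting a sequence against itself gives a full main diagonal. With single-letter windows, repeated letters such as the A's and T's add off-diagonal dots; with window 3 and threshold 3 only the main diagonal survives, because no three-letter window of GATTACA occurs twice.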
  24. Sequence alignment and its algorithms
     • Alignment algorithms: the word method. It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, and they are similar to the windows used in the dot matrix method.
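The word step can be sketched with a lookup table: index every k-letter word of the query, then scan the target and report where a word reappears. This minimal version, with the hypothetical function name `word_hits` and k = 3 as an assumed default, finds exact matches only. Programs differ in the details; BLAST, for example, also accepts neighborhood words that score at least T against a query word, and hits on the same diagonal can then be joined.

```python
from collections import defaultdict


def word_hits(query, target, k=3):
    """Return (query_pos, target_pos, word) for every exact k-letter word
    shared by the two sequences."""
    # Lookup table: word -> list of start positions in the query.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(target) - k + 1):
        word = target[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, word))
    return hits


print(word_hits("ACGTAC", "TTACGT"))
# → [(3, 1, 'TAC'), (0, 2, 'ACG'), (1, 3, 'CGT')]
```

Here `ACG` and `CGT` hit at positions (0, 2) and (1, 3), which lie on the same diagonal (i - j = -2) and would be candidates for joining into a longer alignment.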
  25. Sequence alignment and its algorithms
     • Alignment algorithms: the word method
  26. Sequence alignment and its algorithms
     • Alignment algorithms: the dynamic programming method. Dynamic programming determines the optimal alignment by scoring the match of every possible pair of characters between the two sequences.
  27. Sequence alignment and its algorithms
     • Alignment algorithms: the dynamic programming method
        - Global alignment: the classical global pairwise alignment algorithm using dynamic programming is the Needleman–Wunsch algorithm, which obtains an optimal alignment over the entire lengths of the two sequences.
        - Local alignment: the first application of dynamic programming to local alignment is the Smith–Waterman algorithm. Matching residues receive positive scores, while mismatches and gaps receive negative scores; however, any cell whose score would fall below zero is reset to zero, so the matrix holds no negative values and a local alignment can start afresh at any position.
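Both algorithms fill the same kind of scoring matrix; the local version differs only in flooring cells at zero and taking the best cell anywhere in the matrix rather than the bottom-right corner. The sketch below returns scores only (no traceback), uses a linear gap penalty, and assumes illustrative scores of +1 for a match, -1 for a mismatch and -2 per gap position; the function name `dp_align_score` is hypothetical.

```python
def dp_align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Needleman–Wunsch (local=False) or Smith–Waterman (local=True)
    alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps are penalized along the first row and column.
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                        F[i - 1][j] + gap,     # gap in b
                        F[i][j - 1] + gap)     # gap in a
            if local:
                score = max(score, 0)          # negative cells reset to zero
                best = max(best, score)
            F[i][j] = score
    return best if local else F[n][m]


print(dp_align_score("ACGT", "AGT"))                            # global → 1
print(dp_align_score("TTTACGTTTT", "GGACGGG", local=True))      # local → 3
```

The global score of 1 corresponds to ACGT aligned with A-GT; the local score of 3 comes from the shared segment ACG, ignoring the dissimilar flanks that would make a global alignment of these two sequences negative.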
  28. Sequence alignment and its algorithms
     • Substitution matrices: PAM (point accepted mutation) matrices. The PAM matrices were derived from the evolutionary divergence between closely related sequences of the same cluster. One PAM unit is defined as a change in 1% of the amino acid positions. Because very closely related homologs were used, the observed mutations were not expected to significantly change the common function of the proteins.
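Under the Dayhoff model, a PAM n matrix of mutation probabilities is obtained by multiplying the PAM1 matrix by itself n times, and the result is then converted to log-odds scores. The sketch below shows only the extrapolation step, on a hypothetical two-letter alphabet in which each residue stays unchanged with probability 0.99, mirroring the 1% change per PAM unit; real PAM matrices are 20 × 20 and estimated from observed substitutions.

```python
def mat_mult(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1, n):
    """Extrapolate a PAM1 mutation-probability matrix to PAMn (= PAM1 ** n)."""
    size = len(pam1)
    # Start from the identity matrix (PAM0: nothing has changed yet).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical toy PAM1 for a two-letter alphabet: 1% change per PAM unit.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]
PAM250_TOY = pam_n(PAM1_TOY, 250)
print(PAM250_TOY[0][0])
```

After 250 steps the toy residues have drifted almost to a 50/50 mix (the diagonal entry is close to 0.5), which is why high PAM numbers correspond to more divergent sequences.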
  29. Sequence alignment and its algorithms
     • Substitution matrices: PAM (point accepted mutation) matrices
  30. Sequence alignment and its algorithms
     • Substitution matrices: BLOSUM matrices. This is the series of blocks amino acid substitution matrices (BLOSUM), all of which are derived from direct observation of every possible amino acid substitution in multiple sequence alignments.
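The direct observation behind BLOSUM can be sketched as counting residue pairs in each column of an ungapped alignment block and comparing their frequencies with what chance would give, in half-bit log-odds units (as in BLOSUM62): $s_{ab} = 2\log_2(q_{ab}/e_{ab})$, where $e_{ab} = p_a^2$ if $a = b$ and $2p_ap_b$ otherwise. This is a simplified illustration: it uses one toy block and omits the clustering of sequences above an identity threshold (62% for BLOSUM62) that the real procedure applies.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(block):
    """Log-odds scores (half-bit units) from one ungapped alignment block.
    Omits the sequence clustering step of the real BLOSUM procedure."""
    # Count every unordered residue pair within each column.
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}
    # Background frequency of each residue, derived from the pair frequencies.
    p = Counter()
    for (x, y), freq in q.items():
        p[x] += freq / 2
        p[y] += freq / 2
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


print(blosum_scores(["AC", "AC", "AD"]))
# → {('A', 'A'): 2, ('C', 'C'): 1, ('C', 'D'): 3}
```

Pairs observed more often than chance predicts get positive scores; in a full matrix, pairs seen less often than expected get negative ones.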
  31. Sequence alignment and its algorithms
     • Substitution matrices: BLOSUM matrices
  32. Sequence alignment and its algorithms
     • Which matrices should be used, and when?

        Matrix     Best use                                                Similarity (%)
        PAM40      Short alignments that are highly similar                70-90
        PAM160     Detecting members of a protein family                   50-60
        PAM250     Longer alignments of more divergent sequences           approx. 30
        BLOSUM90   Short alignments that are highly similar                70-90
        BLOSUM80   Detecting members of a protein family                   50-60
        BLOSUM62   Most effective in finding all potential similarities    30-40
        BLOSUM30   Longer alignments of more divergent sequences           <30

        Similarity: the range of similarities that the matrix is best able to detect.
  33. Comparison
     • PAM is based on an evolutionary model using phylogenetic trees.
     • BLOSUM assumes no evolutionary model, but is based on conserved "blocks" of proteins.
  34. Sequence alignment and its algorithms
     • Heuristic database searching: heuristic algorithms perform faster searches because they examine only a fraction of the possible alignments examined in regular dynamic programming.
     • BLAST (Basic Local Alignment Search Tool) uses heuristics to align a query sequence with all sequences in a database.
  35. Sequence alignment and its algorithms
     • BLAST (Basic Local Alignment Search Tool)
  36. Sequence alignment and its algorithms
     • Step 6, finishing: each word hit is extended in both directions. Mismatches contribute negative scores from the scoring matrix, and extension stops once the running score has dropped more than a threshold X below the best score reached. The resulting alignment is called a high-scoring segment pair (HSP); only HSPs scoring at least the minimum score S are reported.
     • Parameters: minimum score (S), neighborhood score threshold (T), and the drop-off threshold for stopping extension (X).
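The extension step can be sketched as an ungapped X-drop extension: walk outward from the word hit, track the best running score, and stop once the score falls more than X below that best. This is a simplified illustration, not BLAST's actual implementation; the names `extend_seed` and `MIN_SCORE_S` and the scores (+2 match, -3 mismatch, X = 5, S = 10) are assumptions.

```python
def _extend(pairs, x_drop, match, mismatch):
    """Walk outward over (a, b) residue pairs; stop once the running score
    falls more than x_drop below the best score seen.
    Returns (best_gain, number_of_positions_kept)."""
    best = run = best_len = 0
    for n, (a, b) in enumerate(pairs, start=1):
        run += match if a == b else mismatch
        if run > best:
            best, best_len = run, n
        elif best - run > x_drop:
            break
    return best, best_len


def extend_seed(query, target, qi, ti, k, x_drop=5, match=2, mismatch=-3):
    """Extend a word hit of length k at query[qi], target[ti] in both
    directions. Returns (score, query_start, query_end)."""
    seed = sum(match if query[qi + n] == target[ti + n] else mismatch
               for n in range(k))
    right = zip(query[qi + k:], target[ti + k:])
    left = zip(reversed(query[:qi]), reversed(target[:ti]))
    r_gain, r_len = _extend(right, x_drop, match, mismatch)
    l_gain, l_len = _extend(left, x_drop, match, mismatch)
    return seed + l_gain + r_gain, qi - l_len, qi + k + r_len


MIN_SCORE_S = 10  # hypothetical minimum score for reporting an HSP

score, start, end = extend_seed("CCACGTACGTGG", "TTACGTACGTAA", 4, 4, 3)
print(score, start, end, score >= MIN_SCORE_S)  # → 16 2 10 True
```

In the example the extension stops after two mismatches on each side and trims them off, leaving the eight-residue segment `ACGTACGT` (query positions 2 to 9) with score 16, which clears the assumed minimum score S.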
  37. Sequence alignment and its algorithms
     • Suggested BLAST cutoffs
        - Chance matches are more likely in nucleotide databases than in protein databases, because the nucleotide alphabet has only four letters.
        - Identity between proteins is more informative than identity between nucleic acids.
        - For nucleotide-based searches: hits with E-values of 10^-6 or less and sequence identity of 70% or more.
        - For protein-based searches: hits with E-values of 10^-3 or less and sequence identity of 25% or more.
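The suggested cutoffs translate directly into a filter. In this sketch the `Hit` record, its field names and the `CUTOFFS` table are hypothetical; in practice the E-value and percent identity would be read from BLAST's report.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str           # hypothetical field names
    evalue: float
    percent_identity: float


# (maximum E-value, minimum percent identity) per search type, from the slide.
CUTOFFS = {
    "nucleotide": (1e-6, 70.0),
    "protein": (1e-3, 25.0),
}


def passes_cutoffs(hit, search_type):
    """True if the hit meets both the E-value and the identity cutoff."""
    max_evalue, min_identity = CUTOFFS[search_type]
    return hit.evalue <= max_evalue and hit.percent_identity >= min_identity


hits = [Hit("seq1", 1e-10, 82.0), Hit("seq2", 1e-4, 80.0), Hit("seq3", 1e-5, 20.0)]
print([h.subject_id for h in hits if passes_cutoffs(h, "nucleotide")])
print([h.subject_id for h in hits if passes_cutoffs(h, "protein")])
```

Note that seq2 is rejected as a nucleotide hit (its E-value is too high) but accepted as a protein hit, reflecting the stricter E-value cutoff for nucleotide searches.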
  38. Sequence alignment and its algorithms
     • BLAST programs
        - BLASTN queries a nucleotide sequence against a nucleotide sequence database.
        - BLASTP uses protein sequences as queries against a protein sequence database.
        - BLASTX translates nucleotide query sequences in all six reading frames and uses the resulting protein sequences to query a protein sequence database.
        - TBLASTN queries protein sequences against a nucleotide sequence database whose sequences are translated in all six reading frames.
        - TBLASTX translates nucleotide query sequences in all six frames and searches them against a nucleotide sequence database whose sequences are also translated in all six frames.
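BLASTX, TBLASTN and TBLASTX all rely on six-frame translation: three reading frames on the given strand and three on its reverse complement. A minimal sketch, assuming the standard genetic code (stop codons shown as `*`, codons containing unknown bases as `X`) and hypothetical frame labels `+1` to `-3`:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; incomplete tail is ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frames("ATGGCC"))
```

For `ATGGCC`, frame +1 reads ATG GCC (M, A), and frame -1 reads the reverse complement GGC CAT (G, H).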
  39. Sequence alignment and its algorithms
     • PSI-BLAST: position-specific iterated BLAST (PSI-BLAST) builds profiles and performs database searches in an iterative fashion. Its main feature is that profiles are constructed automatically and fine-tuned in each successive cycle.
  40. Sequence alignment and its algorithms
     • PSI-BLAST
  41. Sequence alignment and its algorithms
     • Multiple sequence alignment
  42. Sequence alignment and its algorithms
     • Multiple sequence alignment
        - Exhaustive algorithms: the exhaustive alignment method examines all possible aligned positions simultaneously.
        - Heuristic algorithms: because dynamic programming is not feasible for routine multiple sequence alignment, faster heuristic algorithms have been developed. A heuristic is a computational strategy that finds a near-optimal solution by using rules of thumb; essentially, it takes shortcuts by reducing the search space according to certain criteria.
  43. Sequence alignment and its algorithms
     • Multiple sequence alignment, heuristic algorithms: progressive alignment
        - Progressive alignment builds a multiple alignment by stepwise assembly and is heuristic in nature.
        - Clustal: a progressive multiple alignment program, available either as a stand-alone or as an online program.
        - T-Coffee: performs progressive sequence alignment as in Clustal. The main difference is that, in processing a query, T-Coffee performs both global and local pairwise alignments for all possible pairs involved. The global pairwise alignments are performed using the Clustal program.
  44. Sequence alignment and its algorithms
     • Multiple sequence alignment, heuristic algorithms: iterative alignment. The iterative approach is based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions.
  45. Sequence alignment and its algorithms
     • Multiple sequence alignment, heuristic algorithms: block-based alignment. This strategy identifies blocks of ungapped alignment shared by all the sequences, hence the name block-based local alignment strategy.
  46. Structural prediction
     • Structural prediction methods
        - Ab initio prediction: computational prediction based on first principles or using the most elementary information.
        - Threading: a method of predicting the most likely protein structural fold based on secondary structure similarity with database structures and an assessment of the energies of the potential fold. The term has been used interchangeably with fold recognition.
        - Homology-based modeling: a method for predicting the three-dimensional structure of an unknown protein by using an existing homologous protein structure as a template.
  47. Hidden Markov model
     • A statistical model composed of a number of interconnected Markov chains, capable of generating the probability value of an event while taking into account the influence of hidden variables. Mathematically, it calculates the probability values of connected states among the Markov chains to find an optimal path within the network of states. It requires training to obtain the probability values of the state transitions. When a hidden Markov model represents a multiple sequence alignment, a sequence can be generated through the model by incorporating the probability values of match, insertion, and deletion states.
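Finding the optimal path through a network of states is what the Viterbi algorithm does; the slide does not name it, so treat this as one standard way to compute such a path. The two-state model below, with a GC-rich state `H` and an AT-rich state `L`, is entirely hypothetical: its probabilities are invented for illustration rather than obtained by training. A profile HMM for a multiple alignment would instead use match, insertion and deletion states.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `obs`, computed in log space."""
    # V[t][s]: log-probability of the best path ending in state s at position t.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for reaching s at position t.
            prev = max(states, key=lambda r: V[t - 1][r] + math.log(trans_p[r][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], V[-1][last]


# Hypothetical toy model (invented probabilities).
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {"H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path, log_prob = viterbi("GGCCAATT", STATES, START, TRANS, EMIT)
print("".join(path))  # → HHHHLLLL
```

Working with log-probabilities avoids numerical underflow when the products of many small probabilities are computed over long sequences.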
  48. Hidden Markov model
  49. Neural network algorithm
     • A machine-learning algorithm for pattern recognition, composed of input, hidden, and output layers. The units of information in each layer are called nodes. Nodes of different layers are interconnected to form a network analogous to a biological nervous system. Between the nodes are mathematical weight parameters that can be trained with known patterns and then used for later predictions. After training, the network is able to recognize the correlation between an input and an output.
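A forward pass through input, hidden and output layers can be sketched in a few lines. The weights below are hand-picked (an assumption for illustration, not the result of training) so that the output node fires when exactly one of the two inputs is active. That pattern needs the hidden layer, because no single weighted sum of the inputs followed by a threshold can separate it. Training would adjust such weights automatically from known input/output patterns.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: each node sums its weighted inputs plus a
    bias, then applies the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hand-picked (hypothetical) weights: hidden node 1 acts like OR,
# hidden node 2 like AND, and the output like "OR but not AND".
HIDDEN_W = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_B = [-10.0, -30.0]
OUTPUT_W = [[20.0, -20.0]]
OUTPUT_B = [-10.0]

for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
    out = forward(pattern, HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)[0]
    print(pattern, round(out, 3))
```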
  50. Neural network algorithm
  51. Web-based tools
     • Alignment tools, sequence-based methods
        - T-Coffee: http://tcoffee.crg.cat/apps/tcoffee/do:regular
        - NCBI: http://blast.ncbi.nlm.nih.gov/Blast.cgi
        - UniProt: http://www.uniprot.org
        - EMBL: http://coot.embl.de/Alignment
     • Alignment tools, structure-based methods
        - Dali server: http://ekhidna.biocenter.helsinki.fi/dali_server
        - FSSP: http://protein.hbu.cn/fssp
        - Signal peptide resource: http://proline.bic.nus.edu.sg/spdb/searchn.html
        - Active site prediction: http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
  52. Web-based tools: T-Coffee, http://tcoffee.crg.cat/apps/tcoffee/do:regular
  53. Web-based tools: NCBI, http://blast.ncbi.nlm.nih.gov/Blast.cgi
  54. Web-based tools: UniProt, http://www.uniprot.org
  55. Web-based tools: EMBL, http://coot.embl.de/Alignment
  56. Web-based tools: Dali server, http://ekhidna.biocenter.helsinki.fi/dali_server
  57. Web-based tools: FSSP, http://protein.hbu.cn/fssp
  58. Web-based tools
     • Secondary structure prediction
        - SOPMA: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
        - Jpred3: http://www.compbio.dundee.ac.uk/www-jpred
        - PreSSaPro: http://bioinformatica.isa.cnr.it/PRESSAPRO
        - HMM protein structure prediction: http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
        - PROF: http://www.aber.ac.uk/~phiwww/prof
        - Software package: http://molbiol-tools.ca/Protein_secondary_structure.htm
  59. Web-based tools: SOPMA, http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
  60. Web-based tools: SOPMA, http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
  61. Web-based tools: Jpred3, http://www.compbio.dundee.ac.uk/www-jpred
  62. Web-based tools: PreSSaPro, http://bioinformatica.isa.cnr.it/PRESSAPRO
  63. Web-based tools: HMM protein structure prediction, http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
  64. Web-based tools: PROF, http://www.aber.ac.uk/~phiwww/prof
  65. Web-based tools: Software package, http://molbiol-tools.ca/Protein_secondary_structure.htm
  66. Web-based tools: Signal peptide resource, http://proline.bic.nus.edu.sg/spdb/searchn.html
  67. Web-based tools: Active site prediction, http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp
  68. Web-based tools
     • Tertiary structure prediction: Phyre2, http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index
  69. Web-based tools
     • Biochemical features
        - Protein calculator: http://www.scripps.edu/~cdputnam/protcalc.html
        - Amino acid calculator: http://proteome.gs.washington.edu/cgi-bin/aa_calc.pl
        - Peptide property calculator: https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi
        - Peptide property calculator: http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp
        - Physico-chemical profiles: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html
        - TagIdent tool: http://web.expasy.org/tagident/
  70. Web-based tools
     • Biochemical features
        - PeptideCutter: http://web.expasy.org/peptide_cutter/
        - Kyte–Doolittle hydropathy plot: http://gcat.davidson.edu/DGPB/kd/kytedoolittle.htm
        - GRAVY calculator: http://www.gravy-calculator.de/index.php
        - ProtScale: http://web.expasy.org/protscale/
        - ProtParam: http://web.expasy.org/protparam/
        - PROSITE: http://prosite.expasy.org/prosite.html
        - InterPro: http://www.ebi.ac.uk/interpro/
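The hydropathy plot and GRAVY tools both rest on the Kyte–Doolittle scale: GRAVY is the mean hydropathy value over all residues, and a hydropathy plot averages the values in a sliding window. A minimal sketch; the window length is a parameter (the default of 9 here is an assumption), and longer windows of around 19 residues are commonly used to look for transmembrane segments.

```python
# Kyte–Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte–Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each window of `window` residues, one value per start."""
    seq = seq.upper()
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


print(gravy("MKVLAAGIV"))
print(hydropathy_profile("MKVLAAGIVRRDE", window=5))
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones; the profile falls toward the charged C-terminal residues in the example.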
  71. Web-based tools: Protein calculator, http://www.scripps.edu/~cdputnam/protcalc.html
  72. Web-based tools: Amino acid calculator, http://proteome.gs.washington.edu/cgi-bin/aa_calc.pl
  73. Web-based tools: Peptide property calculator, https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi
  74. Web-based tools: Peptide property calculator, http://www.innovagen.se/custom-peptidesynthesis/peptide-property-calculator/peptide-property-calculator.asp
  75. Web-based tools: Physico-chemical profiles, http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html
  76. Web-based tools: TagIdent tool, http://web.expasy.org/tagident/
  77. Web-based tools: PeptideCutter, http://web.expasy.org/peptide_cutter/
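PeptideCutter predicts where proteases cut a sequence. For trypsin, a commonly used simplified rule is cleavage after lysine (K) or arginine (R) unless the next residue is proline (P); PeptideCutter itself applies more detailed rules, so this sketch is only an approximation. The function name `trypsin_digest` is hypothetical, and the code relies on `re.split` accepting a zero-width pattern, which requires Python 3.7 or later.

```python
import re


def trypsin_digest(seq):
    """Split after every K or R that is not followed by P (simplified trypsin rule)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", seq.upper())
    return [piece for piece in pieces if piece]  # drop the empty tail after a final K/R


print(trypsin_digest("AKPGRMKA"))  # → ['AKPGR', 'MK', 'A']
```

The K at position 2 is followed by P, so it is not cut; the R and the second K are.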
  78. Web-based tools: Kyte–Doolittle hydropathy plot, http://gcat.davidson.edu/DGPB/kd/kytedoolittle.htm
  79. Web-based tools: GRAVY calculator, http://www.gravy-calculator.de/index.php
  80. Web-based tools: ProtScale, http://web.expasy.org/protscale/
  81. Web-based tools: ProtParam, http://web.expasy.org/protparam/
  82. Web-based tools: PROSITE, http://prosite.expasy.org/prosite.html
  83. Web-based tools: InterPro, http://www.ebi.ac.uk/interpro/
  84. Stand-alone software: MEGA
  85. Stand-alone software: CLC Main Workbench
  86. Stand-alone software: UGENE
  87. Stand-alone software: Swiss-PdbViewer (SPDBV)
  88. Stand-alone software: pairwise structure alignment
  89. Stand-alone software: Cn3D
  90. Stand-alone software: BioEdit
  91. Stand-alone software: ClustalX
