Automated Alphabet Reduction
   Method with Evolutionary
           Algorithms
for Protein Structure Prediction
  Jaume Ba...
What is a protein?
Protein Structure Prediction (PSP)
The goal is to predict the (complex) 3D structure (and some sub-
features) of a protein...
Alphabet reduction process and
                validation

  Domain
               Size = N (<20)
(CN, RSA, …)
           ...
This entry is human competitive
            because:
 G: The result solves a problem of indisputable
 difficulty in its fi...
G:Difficulty
PSP is, after many decades of research, still one of the main
unsolved problems in Science
In the 2006 CASP e...
Why is this entry human-
                  competitive?
             The initial version of our alphabet reduction
       ...
B:Innovative
    Comparison of our results against other reduced alphabets existing in
    the literature and human-design...
Why is this entry better than the
         other entries?
 PSP is a very difficult and very relevant domain
    It has bee...
Better than other entries: New
understanding of the folding process
Simpler rules obtained by BioHEL
  AA alphabet: If AA−...
Better than other entries: run-
time reduction & conclusions
 Alphabet reduction is also beneficial in the short
 term
   ...
Upcoming SlideShare
Loading in …5
×

Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction

914 views

Published on

Published in: Business, Health & Medicine
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
914
On SlideShare
0
From Embeds
0
Number of Embeds
31
Actions
Shares
0
Downloads
8
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction

  1. 1. Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara Sastry, Xavier Llorà and Natalio Krasnogor University of Nottingham and University of Illinois at Urbana-Champaign
  2. 2. What is a protein?
  3. 3. Protein Structure Prediction (PSP) The goal is to predict the (complex) 3D structure (and some sub- features) of a protein from its amino acid sequence (a 1D object) Primary Sequence 3D Structure
  4. 4. Alphabet reduction process and validation Domain Size = N (<20) (CN, RSA, …) Test set Dataset Dataset Ensemble ECGA BioHEL Card=N Card=20 of rule sets (<20) Accuracy Mutual Information
  5. 5. This entry is human competitive because: G: The result solves a problem of indisputable difficulty in its field (Difficult) D: The result is publishable in its own right as a new scientific result - independent of the fact that the result was mechanically created (Publishable) E: The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions (≥Human) B: The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal (Innovative)
  6. 6. G:Difficulty PSP is, after many decades of research, still one of the main unsolved problems in Science In the 2006 CASP experiment, one of the best methods (Rosetta@home) used > 3 cpu yrs to predict a single protein Amino acid sequence is a string drawn from a 20-letter alphabet Some AAs are similar & could be grouped, reducing the dimensionality of the domain We can find a new alphabet with much lower cardinality than the AA representation without loosing critical information in the process We can tailor alphabet reduction automatically to a variety of PSP-related domains
  7. 7. Why is this entry human- competitive? The initial version of our alphabet reduction process has been accepted in GECCO D:Publish. 2007, in the biological applications track One of the most famous alphabet reductions is the HP model that reduces AA types to only two: Hydrophobic & Polar (e.g. [Broome & Hecht, 2000]) E:≥Human Other experts use a broader set of physico- chemical properties to propose reduced alphabets (examples in later slides) We have improved upon both of the above
  8. 8. B:Innovative Comparison of our results against other reduced alphabets existing in the literature and human-designed ones, applied to two PSP-related datasets, Coordination Number (CN) and Solvent Accessibility (SA) Our method produces the best reduced alphabets Alphabet Letters CN acc. SA acc. Diff. Ref. AA 20 74.0±0.6 70.7±0.4 --- --- Our method 5 73.3±0.5 70.3±0.4 0.7/0.4 This work WW5 6 73.1±0.7 69.6±0.4 0.9/1.1 [Wang & Wang, 99] Alphabets SR5 6 73.1±0.7 69.6±0.4 0.9/1.1 [Solis & Rackovsky, 00] from the MU4 5 72.6±0.7 69.4±0.4 1.4/1.3 [Murphy et al., 00] literature MM5 6 73.1±0.6 69.3±0.3 0.9/1.4 [Melo & Marti-Renom, 06] HD1 7 72.9±0.6 69.3±0.4 1.1/1.4 This work Expert HD2 9 73.0±0.6 69.3±0.4 1.0/1.4 This work designed alphabets HD3 11 73.2±0.6 69.9±0.4 0.8/0.8 This work
  9. 9. Why is this entry better than the other entries? PSP is a very difficult and very relevant domain It has been named as Grand Challenge by the USA government [1] Impact of having better protein structure models are countless Genetic therapy Synthesis of drugs for incurable diseases Improved crops Environmental remediation Our contribution is a small but concrete step towards achieving this goal [1] Mathematical Committee on Physical, Engineering Engineering Sciences, Federal Coordinating Council for Science, and Technology. Grand challenges 1993: High performance computing and communications, 1992.
  10. 10. Better than other entries: New understanding of the folding process Simpler rules obtained by BioHEL AA alphabet: If AA−4 ∉ {F, G, I, L, V,X, Y }, AA−3 ∉ {F, G, Q,W}, AA−2 ∉ {C,N, P}, AA−1 ∉ {A, I, Q, V, Y }, AA ∈ {K}, AA1 ∉ {F, I, L,M,N, P, T, V }, AA2 ∉ {N, P, Q, S}, AA3 ∉ {C, I, L,R,W}, AA4 ∉ {A,C, I, L,R, S} then AA is exposed Reduced alphabet: If AA−4 ∈ {1, 3}, AA−3 ∈ {1, 3}, AA ∈ {3}, AA1 ∈ {1, 3}, AA2 ∉ {1}, AA3 ∉ {0} then AA is exposed 0 = ACFHILMVWY, 1 = DEKNPQRST (EK for AA), 3 =X Unexpected explanations: Alphabet reduction clustered AA types that experts did not expect. Analyzing the data verified that groups were sound
  11. 11. Better than other entries: run- time reduction & conclusions Alphabet reduction is also beneficial in the short term We have extrapolated the reduced alphabet to Position- Specific Scoring Matrices (PSSM) PSSM is the state-of-the-art representation for PSP with orders of magnitude more information than the AA alphabet Learning time of BioHEL using PSSM has been reduced from 2 weeks to 3 days with only 0.5% accuracy drop We consider that our entry is the best because it addresses successfully and in many ways a very relevant, important, high profile and timely problem

×