►Introduction
  ‫ ٭‬Background
  ‫ ٭‬Problem
  ‫ ٭‬Energy Forms
►Methods
  ‫ ٭‬Genetic Algorithm
►Results and Discussion
►Conclusion
►VBA (Visual Basic for Applications) Program Demonstration
►A protein is a string of amino acids connected
 by peptide bonds.

►Amino acid types
  ٭ Acidic
  ٭ Basic
  ٭ Aliphatic
  ٭ Polar uncharged
  ٭ Aromatic
[Figure: amino acid structure with the N-terminus and C-terminus labeled]
►Proteins catalyze over 1,000 biochemical reactions in
 the human body.
►Protein misfolding is responsible for over 20
 diseases.
   ٭ Mad cow disease is caused by a misfolded ("evil") prion
     protein (PrP): the diseased and normal proteins have
     identical primary structures, but their tertiary
     structures differ.
[Figure: normal PrP versus diseased PrP]
►Some proteins fold in as little as a millionth of a second

►Theoretically, a protein of only 100 amino acids that
 tried out every possible conformation by trial and error
 would need 100 billion years to finish!

►Protein structures are highly dependent upon environmental
 parameters such as temperature, pH, and solvent.
► Comparative modeling - uses evolutionarily related proteins
   ٭ Advantages: fast and simple
   ٭ Disadvantages: conformation depends upon environmental parameters

► Fold recognition - utilizes a database of known 3-D protein
  structures
   ٭ Advantages: more accurate than comparative modeling
   ٭ Disadvantages: not enough NMR-confirmed protein structures

► Ab initio - uses both scientific and engineering approaches
   ٭ Advantages: has the potential to predict the exact shape and
     intermediate structures
   ٭ Disadvantages: computing limitations, difficulty in selecting the
     correct potential energy function
►Not enough NMR-confirmed protein structures in the Protein
 Data Bank (PDB)

►Evolutionary relatedness does not necessarily translate to
 similar structure

►Ab initio difficulties
   ٭ Hydrophilic and hydrophobic modeling gives only the general
     arrangement of the protein
   ٭ 2-D modeling does not predict the 3-D shape of the protein
   ٭ The Monte Carlo computing method is time consuming and does not
     necessarily reach the global minimum
►Develop a genetic-algorithm-based program to predict
 protein conformation

►Reduce the generations needed for prediction, thus
 enhancing the efficiency of the search

►Explore additional operators to modify the genetic
 algorithm

►Predict the conformation of a short 5-AA
 peptide, enkephalin
►Electrostatic Energy

►Nonbonding Energy

►Hydrogen Bonding Energy

►Cysteine-Cysteine Loop Energy
►Energy term calculated over atom pairs
  ٭ Modeled after the Coulomb force

►Force between two charges separated by a distance r_ij
[Figure: electrostatic term, energy E (Joule) versus separation r (Angstrom) for two positive charges]
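A minimal sketch of the pairwise electrostatic term described above, assuming point partial charges in units of e and coordinates in Angstrom. The function name, the constant `COULOMB_K` (about 332 kcal·Å/(mol·e²), a common force-field convention rather than the Joule scale of the figure), and the `dielectric` parameter are illustrative assumptions, not taken from the slides.

```python
import math
from itertools import combinations

# Assumed conversion constant: kcal*Angstrom/(mol*e^2). Not from the slides.
COULOMB_K = 332.06


def electrostatic_energy(atoms, dielectric=1.0):
    """Sum the Coulomb energy q_i*q_j/(eps*r_ij) over all atom pairs.

    atoms: list of (charge, (x, y, z)) tuples.
    """
    total = 0.0
    for (q_i, pos_i), (q_j, pos_j) in combinations(atoms, 2):
        r_ij = math.dist(pos_i, pos_j)  # separation of the pair
        total += COULOMB_K * q_i * q_j / (dielectric * r_ij)
    return total
```

Like charges give a positive (repulsive) energy that falls off as 1/r, which is the shape the figure plots; opposite charges give a negative energy.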
►Two types of Lennard-Jones potential
  o 1-4 atoms - connected by three bonds
  o 1-5 atoms and higher interactions - connected by more than three
    bonds
►Modeled after the Lennard-Jones potential (repulsive/attractive forces)
[Figure: repulsive force F and attractive force -F between atoms 1 and 2; 1-4 and 1-5 interactions]
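The nonbonded term can be sketched with the standard 6-12 form of the Lennard-Jones potential. The parameter names `epsilon` (well depth) and `r_min` (separation at the minimum) and the `scale` argument for treating 1-4 pairs differently from 1-5 pairs are hypothetical; the slides do not give the parameter values GAPSS uses.

```python
def lennard_jones(r, epsilon, r_min, scale=1.0):
    """6-12 potential: epsilon * ((r_min/r)**12 - 2*(r_min/r)**6).

    The r**-12 term is the short-range repulsion and the r**-6 term the
    attraction; the minimum value is -epsilon at r == r_min.
    scale is a hypothetical factor, e.g. to weight 1-4 pairs differently.
    """
    ratio6 = (r_min / r) ** 6
    return scale * epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
```

For example, `lennard_jones(4.0, 0.1, 4.0)` sits at the bottom of the well, and the energy rises steeply as `r` shrinks below `r_min`.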
►Energy associated with the hydrogen bonding in the
 protein.
►Included if there are one or more intramolecular
 disulfide bonds
►The rotation angle between
 the bond joining one pair of
 adjacent atoms and the bond
 joining the next pair is
 called a dihedral angle

►Phi is the rotation about the
 N-Cα bond, psi about the Cα-C’
 bond, and omega about the C’-N bond
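As an illustration of the definition above, a dihedral angle can be computed from the Cartesian positions of four consecutive atoms. This is a generic sketch using the usual atan2 formulation; the function and helper names are illustrative and not part of GAPSS.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Passing the backbone atoms C’(i-1), N, Cα, C’ would give phi for residue i; a cis arrangement returns 0 and a trans arrangement returns 180 in magnitude.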
► The first 3 atoms on the peptide chain are fixed

► The coordinate system is arbitrarily set up around the first H atom of the
  N-terminus

► Assumptions:
    ٭ Minimal bond length stretch
    ٭ Bond angle stays constant
    ٭ Torsion angle (dihedral angle) applies to the 4th atom

[Figure: X, Y, Z axes with H- at (0, 0, 0), N at (-1.04, 0, 0), and Cα at (-1.52, 1.37, 0); bond angle θ and dihedral angle ω marked]

\[
\begin{pmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ 1 \end{pmatrix}
= B_1 B_2 \cdots B_n
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\qquad
B_n =
\begin{pmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i+1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{pmatrix}
\]


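The first three B matrices are fixed by the previous assumptions: B1, B2, and B3 correspond
to the H-, -N-, and Cα atoms.

\[
B_1 =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
B_2 =
\begin{pmatrix}
-1 & 0 & 0 & -r_{12} \\
0 & 1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\qquad
B_3 =
\begin{pmatrix}
-\cos\theta_{13} & -\sin\theta_{13} & 0 & -r_{23}\cos\theta_{13} \\
\sin\theta_{13} & -\cos\theta_{13} & 0 & r_{23}\sin\theta_{13} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]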
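The chain-building scheme can be sketched as a running product of 4x4 matrices whose last column is the position of the newest atom. The sign convention below is an assumption (the minus signs are not legible in the slide text); it was chosen because it reproduces the H, N, and Cα coordinates shown in the figure. The N-Cα length of 1.4517 Å and the 109.3° angle are back-computed from those coordinates, and the fourth atom's values are purely hypothetical.

```python
import math


def bond_matrix(r, theta, omega):
    """4x4 transform for one bond: length r, bond angle theta, dihedral omega (radians)."""
    ct, st = math.cos(theta), math.sin(theta)
    co, so = math.cos(omega), math.sin(omega)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * co, -ct * co, -so, r * st * co],
        [st * so, -ct * so, co, r * st * so],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def build_chain(internal):
    """internal: (r, theta, omega) for atoms 2..n; atom 1 sits at the origin."""
    acc = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # B1
    coords = [(0.0, 0.0, 0.0)]
    for r, theta, omega in internal:
        acc = matmul4(acc, bond_matrix(r, theta, omega))
        # acc times (0, 0, 0, 1) is simply the last column of acc
        coords.append((acc[0][3], acc[1][3], acc[2][3]))
    return coords


chain = build_chain([
    (1.04, 0.0, math.pi),                                # B2: places N
    (1.4517, math.radians(109.3), 0.0),                  # B3: places C-alpha
    (1.53, math.radians(110.0), math.radians(60.0)),     # hypothetical 4th atom
])
```

Because the upper-left 3x3 block of each matrix is a rotation, every new atom lands exactly one bond length away from its predecessor regardless of the torsion angle chosen.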
►Fischer projections are used to determine the dihedral
 angles of side-group atoms, where ω1 is the dihedral angle
►Assumptions:
   ٭ Tetrahedral structure: 120° apart (ω2 = 120° + ω1)
   ٭ Bent structure: 180° apart (ω2 = 180° + ω1)
► Search and optimization method
  that mimics natural selection

► Terms to define
    ٭ Chromosome – a set of torsion angles
    ٭ Gene – an individual torsion angle
    ٭ Generation – a single pass through the
      GA search loop


► Loops through the reproduction,
  mutation, and adaptation processes
  to obtain the best-fit model
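The loop described above can be sketched as follows, with a chromosome held as a list of torsion angles in degrees. This is a generic skeleton under stated assumptions, not the GAPSS code: the truncation selection, single-cut crossover, ±30° mutation step, elitism, and the `toy_energy` function are all placeholders, and the adaptation step is omitted. The defaults of 20 individuals, 15 generations, and 90% mutation are the values quoted later in the deck.

```python
import math
import random


def toy_energy(chromosome):
    """Placeholder energy with a minimum when every angle equals 60 degrees."""
    return sum(1.0 - math.cos(math.radians(a - 60.0)) for a in chromosome)


def wrap(angle):
    """Map an angle into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def run_ga(energy, n_genes, pop_size=20, generations=15, mutation_rate=0.9, seed=0):
    """Minimal GA loop; requires n_genes >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = min(pop, key=energy)
    history = [energy(best)]
    for _ in range(generations):
        parents = sorted(pop, key=energy)[: pop_size // 2]  # selection
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]                        # crossover
            if rng.random() < mutation_rate:                 # mutation
                i = rng.randrange(n_genes)
                child[i] = wrap(child[i] + rng.uniform(-30.0, 30.0))
            children.append(child)
        pop = [best] + children                              # keep the elite
        best = min(pop, key=energy)
        history.append(energy(best))
    return best, history


best, history = run_ga(toy_energy, n_genes=4)
```

Because the best chromosome is carried into every new generation, `history` never increases from one generation to the next.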
►Use a computer
 simulation to perform
 an intelligent
 search/optimization to
 find the native protein
 conformation that
 requires the least
 amount of energy

[Figure: native conformation]
►GAPSS was developed in the Visual Basic for Applications
 (VBA) environment

►Modified genetic operators
   ‫٭‬   Fitness function based selection
   ‫٭‬   Multiple entries crossover
   ‫٭‬   Non-uniform mutation
   ‫٭‬   Adaptation

►Advantages
   ‫ ٭‬Faster convergence
   ‫ ٭‬User-friendly
► Three basic primary energies:
  electrostatic, nonbonded (6-
  12), and hydrogen bonded

► Torsion energy excluded
   ٭ Not a real interaction energy
   ٭ Introduce penalty for positive
     torsion

► Cystine loop-closing energy
  introduced only when more
  than one cysteine is present
  in the protein
►Selection Operator
   ٭ Ranked Selection – the higher
     the rank, the higher the
     probability of being chosen
   ٭ Fitness Selection – the better
     the fitness, the higher the
     probability of being chosen

►Benefits of Selection
   ٭ Aids the elitism search
[Figure: individuals ordered from higher rank or better fitness down to lower rank or worse fitness]
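The two selection rules can be sketched as weighted random draws, assuming lower energy means better fitness. The weighting schemes (linear in rank, and linear in the energy gap to the worst individual) are illustrative assumptions; the slides do not specify how GAPSS maps rank or fitness to probability.

```python
import random


def rank_selection(population, energies, rng):
    """Pick one individual with probability proportional to its rank."""
    # Worst (highest energy) gets rank 1, best gets rank len(population).
    order = sorted(range(len(population)), key=lambda i: energies[i], reverse=True)
    weights = [0] * len(population)
    for rank, index in enumerate(order, start=1):
        weights[index] = rank
    return rng.choices(population, weights=weights, k=1)[0]


def fitness_selection(population, energies, rng):
    """Pick one individual with probability proportional to its fitness."""
    worst = max(energies)
    # Fitness is the energy gap to the worst individual; the small offset
    # keeps every weight positive.
    weights = [worst - e + 1e-9 for e in energies]
    return rng.choices(population, weights=weights, k=1)[0]
```

Rank selection depends only on ordering, so one very low-energy outlier cannot dominate the draw the way it can under fitness-proportional selection.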
► Mutation Operator
   ٭ Uniform mutation – randomly
     replace an angle with a value
     from -180 to 180
   ٭ Non-uniform mutation – add
     or subtract a random value
     between 0 and 180

► Effects of Mutation
   ٭ Introduces variance into the search
   ٭ Aids the search for the global
     minimum by directing the
     gradient search out of
     local minima
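Both mutation operators can be sketched on a chromosome stored as a list of torsion angles in degrees. Mutating exactly one randomly chosen gene per call and wrapping the result back into [-180, 180) are assumptions made for this illustration.

```python
import random


def wrap_angle(angle):
    """Map an angle into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uniform_mutation(chromosome, rng):
    """Replace one randomly chosen angle with a fresh value from -180 to 180."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.uniform(-180.0, 180.0)
    return child


def non_uniform_mutation(chromosome, rng, max_step=180.0):
    """Add or subtract a random value between 0 and max_step to one angle."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    step = rng.choice((-1.0, 1.0)) * rng.uniform(0.0, max_step)
    child[i] = wrap_angle(child[i] + step)
    return child
```

Lowering the hypothetical `max_step` argument turns the non-uniform operator into a local perturbation, while the uniform operator always jumps anywhere in the angle range.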
►Crossover Operator
   ٭ Random 2-point crossover
     – randomly exchange 2 angles
     at a time between parents
   ٭ Multiple entries crossover
     – multiple random
     exchanges

►Benefits of Crossover
   ٭ Aids the search for elites
   ٭ Optimizes the search by
     keeping the optimal folding
     segments
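One possible reading of the two crossover operators is sketched below: both swap the angles at randomly chosen positions between two parents, with the 2-point version swapping exactly two positions and the multiple-entries version swapping a configurable number. This interpretation and the `n_exchanges` parameter are assumptions; the slides do not define the operators in more detail.

```python
import random


def exchange(parent_a, parent_b, positions):
    """Return two children with the genes at the given positions swapped."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in positions:
        child_a[i], child_b[i] = parent_b[i], parent_a[i]
    return child_a, child_b


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the angles at two randomly chosen positions."""
    return exchange(parent_a, parent_b, rng.sample(range(len(parent_a)), 2))


def multiple_entries_crossover(parent_a, parent_b, rng, n_exchanges=3):
    """Swap the angles at n_exchanges randomly chosen positions."""
    count = min(n_exchanges, len(parent_a))
    return exchange(parent_a, parent_b, rng.sample(range(len(parent_a)), count))
```

Since genes only change places between the two parents, every angle in the children comes from one parent or the other at the same chain position.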
►Adaptation Operator
   ٭ Gradient search applied to
     each chromosome
   ٭ Predicts the energy profile


►Benefits of Adaptation
   ٭ Provides the local minima
     search
   ٭ Determines the energy
     profile of the native folding
     process
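The adaptation step can be illustrated with a simple gradient descent on the torsion angles, using a central finite difference for the gradient. The step-halving rule, the iteration count, and the finite-difference width `h` are assumptions made for this sketch, not GAPSS settings.

```python
def numerical_gradient(energy, angles, h=1e-3):
    """Central-difference estimate of dE/d(angle) for each torsion angle."""
    grad = []
    for i in range(len(angles)):
        up, down = list(angles), list(angles)
        up[i] += h
        down[i] -= h
        grad.append((energy(up) - energy(down)) / (2.0 * h))
    return grad


def adapt(energy, angles, step=1.0, iterations=50):
    """Move downhill along the gradient, halving the step when a move fails."""
    current = list(angles)
    profile = [energy(current)]  # energy recorded after each iteration
    for _ in range(iterations):
        grad = numerical_gradient(energy, current)
        trial = [a - step * g for a, g in zip(current, grad)]
        if energy(trial) < energy(current):
            current = trial
        else:
            step *= 0.5
        profile.append(energy(current))
    return current, profile
```

The returned `profile` is a non-increasing sequence of energies, which is one way to picture the energy profile the slide mentions.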
► Free GA search – no restriction on dihedral angles, with the
  exception of omega and ring structures
   ٭ Advantages: usable in any protein search, an empirical way of obtaining
     protein conformation, and useful for energy profile search

► α-helix and β-sheet specific GA search – randomly selects
  segments of the protein as α-helices and β-sheets
   ٭ Advantages: enhances the speed of the free GA and gives an accurate
     search for α-helices and β-sheets

► Binary GA search – uses binary instead of decimal to represent
  dihedral angles
   ٭ Advantages: no barrier when doing crossover
►Creates α-helices and β-sheets
 of random lengths at random
 start positions

►Each α-helix or β-sheet created
 in this way is described by two
 parameters

►Crossover involves trading
 the two parameters between
 two individuals
►When α-helices are crossed
 over, each individual’s new
 energy is compared to its old
 energy. If there is a net
 improvement, the crossover
 is kept.

►The “former helix” regions
 are filled with random
 torsion angles, as in the normal search
[Figure: green and blue regions marked on the chain]
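A hedged sketch of the segment representation: it assumes the two parameters are the start residue and the segment length, and it uses approximate textbook (phi, psi) values for an ideal α-helix and an antiparallel β-sheet. All names and values here are illustrative and not taken from GAPSS.

```python
import random

# Approximate textbook backbone angles in degrees (assumed, not from the slides).
HELIX_PHI_PSI = (-57.0, -47.0)
SHEET_PHI_PSI = (-139.0, 135.0)


def random_segment(n_residues, rng):
    """Pick a random start position and a random length that fits the chain."""
    start = rng.randrange(n_residues)
    length = rng.randint(1, n_residues - start)
    return start, length


def apply_segment(phi_psi, segment, ideal):
    """Return a copy of the chain with the segment set to the ideal angles."""
    start, length = segment
    out = list(phi_psi)
    for i in range(start, min(start + length, len(out))):
        out[i] = ideal
    return out


def keep_if_improved(energy, old, new):
    """Keep the crossed-over individual only if its energy went down."""
    return new if energy(new) < energy(old) else old
```

Trading a `(start, length)` pair between two individuals and then calling `keep_if_improved` mirrors the accept-on-improvement rule described above.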
►Convert torsion angles to binary code
   ٭ Integer and decimal parts coded separately to shorten the total
     number of digits - 17 digits altogether
►The idea is to represent the torsion angles on a single
 chromosome as one long continuous
 chain
   ٭ Crossover and mutation operators are all similar to those of the GA

                             10100101010010000101001110101100001
                             01011010100100001010010101001000010
                             10010101001010010100101010011100
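One way a 17-digit code per angle could be laid out is sketched below. The split into 1 sign bit, 8 bits for the integer part, and 8 bits for the hundredths is an assumption chosen only because it adds up to 17; the slides do not state the actual layout used.

```python
BITS_PER_ANGLE = 17  # 1 sign bit + 8 integer bits + 8 fraction bits (assumed split)


def encode_angle(angle):
    """Encode an angle in [-180, 180] with two decimal places as 17 bits."""
    if not -180.0 <= angle <= 180.0:
        raise ValueError("angle out of range")
    sign = "1" if angle < 0 else "0"
    whole, hundredths = divmod(round(abs(angle) * 100), 100)
    return sign + format(whole, "08b") + format(hundredths, "08b")


def decode_angle(bits):
    """Inverse of encode_angle."""
    sign = -1.0 if bits[0] == "1" else 1.0
    return sign * (int(bits[1:9], 2) + int(bits[9:17], 2) / 100.0)


def encode_chromosome(angles):
    """Join the per-angle codes into one continuous bit string."""
    return "".join(encode_angle(a) for a in angles)


def decode_chromosome(bits):
    return [decode_angle(bits[i:i + BITS_PER_ANGLE])
            for i in range(0, len(bits), BITS_PER_ANGLE)]
```

With this layout a chromosome of six torsion angles becomes a 102-bit string, and crossover can cut it at any bit position rather than only at angle boundaries.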
►All single amino acids were predicted with GAPSS

►GA parameters
  ٭ Initial population: 20
  ٭ Generation limit: 15
  ٭ Percentage of mutations: 90%

►Compared to native single amino acid folding
[Figures: single amino acid structures for Alanine (A, Ala), Asparagine (N, Asn), Aspartic Acid (D, Asp), Cysteine (C, Cys), Glutamine (Q, Gln), Glutamic Acid (E, Glu), Glycine (G, Gly), Isoleucine (I, Ile), Leucine (L, Leu), Methionine (M, Met), Serine (S, Ser), Threonine (T, Thr), and Valine (V, Val)]
►Enkephalin is a pentapeptide that is involved in
 regulating pain

►Two forms of enkephalin
   ٭ Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met
   ٭ Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu

►Short enough to confirm the accuracy of
 GAPSS, yet still contains complex ring side
 groups
►Zero-gradient conformations suggest that GAPSS
 is capable of obtaining local minima

►Backbone conformations showed close
 similarities

►Side-group conformations still show discrepancies
 between the predicted and theoretical structures
►GAPSS was able to locate a few local-minimum
 protein conformations
►Backbone structure was predicted by GAPSS
[Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure]
► Discrepancies between side groups are due to the lack of
  entropy and solvation energy terms and to the center partial
  charge assumption
[Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure]
► (a) The minimum energy of each
  generation for different initial
  populations at a 3-generation limit
  and 20% mutation

► (b) The minimum energy of each
  generation for different mutation
  percentages at a 10-generation
  limit and an initial population
  of 20.

► The optimal condition was found
  to be an initial population of 30,
  a generation limit of 15, and a
  mutation percentage of 90%
► Progression of protein folding for the best prediction: the potential energy
  continues to decrease, suggesting that more stringent GA parameters could lead
  to the global minimum
►Due to limited computing capability, less stringent GA
 parameters were used

►The energy level of the predicted enkephalin structure is lower
 than the theoretical one; however, the code still shows the
 energy decreasing

►More sophisticated partial charge and non-bonded energy
 calculations could improve the prediction

►GAPSS predicted zero-gradient structures
► GA-based search and optimization is a simple and efficient method
  for predicting the structure of an isolated native protein

► Continuous decimal representation of dihedral angles is more
  efficient than binary representation of dihedral angles, despite the
  crossover barriers

► α-helix and β-sheet search converges faster than free torsion
  angle search

► The backbone dihedrals predicted by the VBA GA are similar to those in the
  Protein Data Bank
Chemical, Biological, and Materials Engineering
     Department, University of Oklahoma



             Advanced Design II
►Distance calculation from the origin
[Figure: X, Y, Z axes with H- at (0, 0, 0), N at (-1.04, 0, 0), and Cα at (-1.52, 1.37, 0); bond angle θ and dihedral angle ω marked]

\[
x = -R\cos\theta_1, \qquad
y = R\sin\theta_1\cos\beta_1, \qquad
z = R\sin\theta_1\sin\beta_1
\]

\[
\begin{aligned}
x^2 + y^2 + z^2
&= (R\cos\theta_1)^2 + (R\sin\theta_1\cos\beta_1)^2 + (R\sin\theta_1\sin\beta_1)^2 \\
&= R^2\cos^2\theta_1 + R^2\sin^2\theta_1\left(\cos^2\beta_1 + \sin^2\beta_1\right) \\
&= R^2\cos^2\theta_1 + R^2\sin^2\theta_1\,(1) \\
&= R^2\left(\cos^2\theta_1 + \sin^2\theta_1\right) \\
&= R^2\,(1)
\end{aligned}
\]
►Rotate about one axis at a time to compensate for the bond
 and dihedral angles; there is no rotation around y
[Figure: rotation by θz about the z axis (z = z’) and rotation by θx about the x axis (x = x’)]
Because θy is 0, most of the trigonometric functions cancel.

\[
\det B_n =
\begin{vmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i+1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{vmatrix}
= 1
\]

\[
\begin{pmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{pmatrix}
=
\begin{pmatrix}
-R\cos\theta_{ij} \\
R\sin\theta_{ij}\cos\omega_{ij} \\
R\sin\theta_{ij}\sin\omega_{ij}
\end{pmatrix},
\qquad R = r_{i+1,j}
\]

\[
x_{i+1}^2 + y_{i+1}^2 + z_{i+1}^2
= (R\cos\theta_{ij})^2 + (R\sin\theta_{ij}\cos\omega_{ij})^2 + (R\sin\theta_{ij}\sin\omega_{ij})^2
= R^2
\]

RESUME BUILDER APPLICATION Project for students
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 

Protein Folding Prediction

  • 1.
  • 2. ►Introduction ‫ ٭‬Background ‫ ٭‬Problem ‫ ٭‬Energy Forms ►Methods ‫ ٭‬Genetic Algorithm ►Results and Discussion ►Conclusion ►VBA (Visual Basic Add-in) Program Demonstration
  • 3. ►A protein is a string of amino acids connected by peptide bonds. ►Amino acid types ٭ Acidic ٭ Basic ٭ Aliphatic ٭ Polar uncharged ٭ Aromatic (Figure labels: N-Terminus, C-Terminus)
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. ►Proteins catalyze over 1,000 biochemical reactions in the human body.
  • 11. ►Protein misfoldings are responsible for over 20 diseases. ٭ Mad Cow disease is caused by an “evil” protein: the “evil” protein and the normal protein have identical primary structures, but their tertiary structures are different. (Figure: normal PrP versus diseased PrP)
  • 12. ►Some proteins fold as fast as a millionth of a second ►Theoretically, a protein of only 100 amino acids following the trial and error method would take 100 billion years to try out all possible conformations! ►Protein structures are highly dependent upon various environmental parameters. ‫ ٭‬Such as temperature, pH, solvent, etc.
  • 13. ► Comparative - Uses evolutionarily related proteins ٭ Advantages: fast and simple ٭ Disadvantages: conformation depends upon environmental parameters ► Folding Recognition - Utilizes a database of known 3-D protein structures ٭ Advantages: more accurate than comparative ٭ Disadvantages: not enough NMR-confirmed protein structures ► Ab Initio - Uses both scientific and engineering approaches ٭ Advantages: has potential to predict exact shape and immediate structures ٭ Disadvantages: computing limitations, difficulty in selecting the correct potential energy function
  • 14. ►Not enough NMR-confirmed protein structures in the Protein Data Bank (PDB) ►Evolutionary relatedness does not necessarily translate to similar structure ►Ab initio difficulties ٭ Hydrophilic and hydrophobic modeling gives only the general arrangement of the protein ٭ 2-D modeling does not predict the 3-D shape of the protein ٭ The Monte Carlo computing method is time consuming and does not necessarily reach the global minimum
  • 15. ►Develop a genetic algorithm based program to predict protein conformation ►Reduce the generations needed for prediction, thus enhancing the efficiency of the search ►Explore different additional operators to modify the genetic algorithm ►Predict the protein conformation of a short 5-AA peptide, enkephalin
  • 16.
  • 17. ►Electrostatic Energy ►Nonbonding Energy ►Hydrogen Bonding Energy ►Cysteine-Cysteine Loop Energy
  • 18. ►Energy term calculated over atom pairs ٭ Modeled after the Coulomb force ►Force between two charges at a certain distance (r_ij)
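The pairwise Coulomb term described on this slide can be sketched as follows. This is only an illustration, not the GAPSS code: the function names, the use of SI units (energy in joules, distance in angstroms), partial charges expressed in units of the elementary charge, and the `dielectric` parameter are all assumptions, since the slides do not state them.

```python
import math

COULOMB_K = 8.9875517923e9   # Coulomb constant, N*m^2/C^2
E_CHARGE = 1.602176634e-19   # elementary charge, C
ANGSTROM = 1e-10             # metres per angstrom


def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Energy in joules of two point charges (in units of e) r_ij angstroms apart.

    The dielectric parameter is a hypothetical knob, not taken from the slides.
    """
    if r_ij <= 0:
        raise ValueError("distance must be positive")
    return (COULOMB_K * (q_i * E_CHARGE) * (q_j * E_CHARGE)
            / (dielectric * r_ij * ANGSTROM))


def electrostatic_energy(charges, coords, dielectric=1.0):
    """Sum the Coulomb term over every unique atom pair (i < j)."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            total += coulomb_energy(charges[i], charges[j], r_ij, dielectric)
    return total
```

Like charges give a positive (repulsive) energy and opposite charges a negative one, and the energy falls off as 1/r.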
  • 19. (Figure: electrostatic term for two positive charges; energy E in joules versus separation r in angstroms)
  • 20. ►Two types of Lennard-Jones potential o 1-4 atom - connected by three bonds o 1-5 atom, higher interaction - connected by more than three bonds
  • 21. ►Modeled after the Lennard-Jones potential (Figure: repulsive and attractive forces, F and -F, between atoms 1 and 2, for 1-4 interactions and 1-5 interactions)
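A minimal sketch of a 6-12 Lennard-Jones nonbonded term with separate parameter sets for 1-4 pairs (three bonds apart) and 1-5 pairs (more than three bonds apart), as the two slides above describe. The function names and the `(epsilon, sigma)` parameterization are assumptions for illustration; the slides do not give the constants GAPSS uses.

```python
def lennard_jones(r, epsilon, sigma):
    """6-12 potential: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_energy(r, bonds_apart, params_14, params_15):
    """Pick the parameter set from the number of bonds separating two atoms.

    params_14 and params_15 are hypothetical (epsilon, sigma) tuples.
    Pairs closer than three bonds are skipped in this sketch.
    """
    if bonds_apart < 3:
        return 0.0
    epsilon, sigma = params_14 if bonds_apart == 3 else params_15
    return lennard_jones(r, epsilon, sigma)
```

In this form the energy is zero at `r = sigma`, strongly repulsive below it, and reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`.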
  • 22. ►Energy associated with the hydrogen bonding in the protein.
  • 23. ►Included if there are one or more intramolecular disulfide bonds
  • 24.
  • 25. ►The rotational angle between the bond joining one pair of adjacent atoms and the bond joining the next pair is called a dihedral angle ►Phi is the rotation about the N-Cα bond, psi is about the Cα-C’ bond, and omega is about the C’-N bond
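As an illustration of the dihedral angle defined above, the sketch below computes the signed angle from the Cartesian positions of four consecutive atoms. The helper names are hypothetical, and the atan2 formula is a common textbook formulation rather than anything taken from the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle about the p1-p2 bond, in degrees (-180 to 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)          # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone, passing (C’ of the previous residue, N, Cα, C’) would give phi, and (N, Cα, C’, N of the next residue) would give psi.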
  • 26. ► First 3 atoms on the peptide chain are fixed ► The coordinate system is arbitrarily determined around the first H atom of the N-terminus ► Assumptions: ٭ Minimal bond length stretch ٭ Bond angle stays constant ٭ Torsion angle (dihedral angle) applies to the 4th atom (Figure labels: H- (0,0,0), N (-1.04,0,0), Cα (-1.52,1.37,0); axes x, Y, Z; angles θ and ω)
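  • 27.

$$
\begin{pmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ 1 \end{pmatrix}
= B_1 B_2 \cdots B_n \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\qquad
B_n = \begin{pmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i+1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

The first 3 B_n matrices are fixed due to the previous assumption; B_1, B_2, and B_3 correspond to H-, -N-, and Cα:

$$
B_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_2 = \begin{pmatrix} -1 & 0 & 0 & -r_{12} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_3 = \begin{pmatrix}
-\cos\theta_{13} & -\sin\theta_{13} & 0 & -r_{23}\cos\theta_{13} \\
\sin\theta_{13} & -\cos\theta_{13} & 0 & r_{23}\sin\theta_{13} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$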
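The matrix chain on this slide can be sketched in a few lines: each atom's position is the last column of the running product B_1 B_2 ... B_n, because that product is applied to (0, 0, 0, 1). The sketch assumes the signed form of B_n with rows (-cos θ, -sin θ, 0, -r cos θ), (sin θ cos ω, -cos θ cos ω, -sin ω, r sin θ cos ω), (sin θ sin ω, -cos θ sin ω, cos ω, r sin θ sin ω), (0, 0, 0, 1). The function names are hypothetical, and the values r23 = 1.45 Å and θ13 = 109.3° are assumptions chosen so that the result reproduces the Cα coordinate shown on the previous slide.

```python
import math


def bond_matrix(r, theta, omega):
    """4x4 matrix B_n for bond length r, bond angle theta, torsion omega (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    cw, sw = math.cos(omega), math.sin(omega)
    return [
        [-c, -s, 0.0, -r * c],
        [s * cw, -c * cw, -sw, r * s * cw],
        [s * sw, -c * sw, cw, r * s * sw],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def chain_positions(matrices):
    """Cartesian position of each atom: last column of B_1 B_2 ... B_n."""
    acc = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    positions = []
    for b in matrices:
        acc = matmul(acc, b)
        positions.append((acc[0][3], acc[1][3], acc[2][3]))
    return positions


R12 = 1.04                      # H-N distance read from the slide's N coordinate
R23 = 1.45                      # assumed N-C-alpha bond length (angstrom)
THETA13 = math.radians(109.3)   # assumed bond angle

B1 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B2 = [[-1.0, 0.0, 0.0, -R12],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, -1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
B3 = bond_matrix(R23, THETA13, 0.0)

fixed_atoms = chain_positions([B1, B2, B3])
```

Appending further `bond_matrix(r, theta, omega)` entries to the list places the 4th and later atoms, with `omega` as the torsion angle the genetic algorithm varies.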
  • 28. ►Fischer projections are used to determine the dihedral angles of side-group atoms, where ω1 = dihedral angle ►Assumption: ٭ Tetrahedral structure: 120° apart (ω2 = 120° + ω1) ٭ Bent structure: 180° apart (ω2 = 180° + ω1)
  • 29.
  • 30. ► Search and optimization method that mimics natural selection ► Terms to define ٭ Chromosome: a set of torsion angles ٭ Gene: an individual torsion angle ٭ Generation: a single pass of the GA search loop ► Loops through the reproduction, mutation, and adaptation processes to obtain the best-fit model
  • 31. ►Use a computer simulation to perform an intelligent search/optimization to find the native protein conformation, the one that requires the least amount of energy (Figure: native conformation)
  • 32. ►GAPSS is developed under Visual Basic Add-in environment ►Modified genetic operators ‫٭‬ Fitness function based selection ‫٭‬ Multiple entries crossover ‫٭‬ Non-uniform mutation ‫٭‬ Adaptation ►Advantages ‫ ٭‬Faster convergence ‫ ٭‬User-friendly
  • 33. ► Three primary energy terms: Electrostatic, Nonbonded (6-12), and Hydrogen Bonded ► Exclude Torsion Energy ٭ Not a real interaction energy ٭ Introduce penalty for positive torsion ► Cystine loop-closing term is introduced only when more than one cysteine is present in the protein
  • 34. ►Selection Operator ٭ Ranked Selection: the higher the rank, the higher the probability of being chosen ٭ Fitness Selection: the better the fitness, the higher the probability of being chosen ►Benefits of Selection ٭ Aid the elitism search (Figure labels: higher rank or better fitness; lower rank or worse fitness)
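The two selection operators above can be sketched as weighted random draws. This is an illustrative reading, not the GAPSS implementation: it assumes lower energy means better fitness, uses linear rank weights, and shifts energies so the worst individual gets a near-zero weight. The function names and the `rng` argument (a `random.Random` instance) are hypothetical.

```python
import random


def ranked_selection(population, energies, rng):
    """Pick one chromosome; probability grows linearly with rank (lowest energy is best)."""
    order = sorted(range(len(population)), key=lambda i: energies[i])
    n = len(order)
    weights = [n - rank for rank in range(n)]   # best gets n, worst gets 1
    chosen = rng.choices(order, weights=weights, k=1)[0]
    return population[chosen]


def fitness_selection(population, energies, rng):
    """Pick one chromosome; probability grows with how far its energy is below the worst."""
    worst = max(energies)
    # Small offset keeps the total weight positive even if all energies are equal.
    weights = [worst - e + 1e-9 for e in energies]
    return rng.choices(population, weights=weights, k=1)[0]
```

Ranked selection depends only on the ordering of energies, so one very low-energy individual cannot dominate the draw the way it can under fitness-proportional selection.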
  • 35. ► Mutation Operator ‫ ٭‬Uniform Mutation – randomly replace with a value from -180 to 180 ‫ ٭‬Non-uniform mutation – add or subtract a random value between 0 and 180 ► Effects of Mutation ‫ ٭‬Introduce variance to search ‫ ٭‬Aid the search for global minimum by directing gradient search out of the local minima
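A minimal sketch of the two mutation operators as the slide defines them: uniform mutation replaces a torsion angle with a random value from -180 to 180, and non-uniform mutation adds or subtracts a random value between 0 and 180. Applying the mutation rate per gene and wrapping the result back into the -180 to 180 range are assumptions; the slides do not specify either.

```python
import random


def wrap_angle(angle):
    """Map any angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uniform_mutation(chromosome, rate, rng):
    """Replace each gene, with probability `rate`, by a fresh random angle."""
    return [rng.uniform(-180.0, 180.0) if rng.random() < rate else gene
            for gene in chromosome]


def non_uniform_mutation(chromosome, rate, rng, max_step=180.0):
    """Shift each gene, with probability `rate`, by a random step of either sign."""
    mutated = []
    for gene in chromosome:
        if rng.random() < rate:
            step = rng.uniform(0.0, max_step)
            gene = wrap_angle(gene + rng.choice((-1.0, 1.0)) * step)
        mutated.append(gene)
    return mutated
```

Both functions return a new list, so the parent chromosome stays available for comparison after mutation.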
  • 36. ►Crossover Operator ‫ ٭‬Random 2-point Crossover – randomly exchange between parents 2 angles at a time ‫ ٭‬Multiple Entries Crossover – multiple random exchange ►Benefits of Crossover ‫ ٭‬Aid the search for elites ‫ ٭‬Optimize the search by keeping the optimal folding segments
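One possible reading of the crossover operators above is an exchange of torsion angles at randomly chosen positions: two positions for the "2 angles at a time" variant, more for the multiple-entries variant. The sketch below follows that interpretation, which is an assumption; the slide could also mean a classic two-cut-point segment swap. The function name and the `rng` argument are hypothetical.

```python
import random


def exchange_crossover(parent_a, parent_b, n_points, rng):
    """Swap the genes of two parents at n_points randomly chosen positions."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    child_a, child_b = list(parent_a), list(parent_b)
    for idx in rng.sample(range(len(parent_a)), n_points):
        child_a[idx], child_b[idx] = child_b[idx], child_a[idx]
    return child_a, child_b
```

Calling it with `n_points=2` corresponds to the two-angle exchange, and a larger `n_points` to the multiple-entries crossover under this reading.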
  • 37. ►Adaptation Operator ‫ ٭‬Gradient search applied to each chromosome ‫ ٭‬Predict energy profile ►Benefits of Adaptation ‫ ٭‬Provide the local minima search ‫ ٭‬Determine the energy profile of the native folding process
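The adaptation step, a gradient search applied to each chromosome that also records an energy profile, might look like the sketch below. It uses plain steepest descent with a central-difference gradient; the step size, iteration limit, tolerance, and function names are all assumptions, and `energy` stands for whatever potential energy function is in use.

```python
def numerical_gradient(energy, angles, h=1e-4):
    """Central-difference estimate of dE/d(angle) for every torsion angle."""
    grad = []
    for i in range(len(angles)):
        up, down = list(angles), list(angles)
        up[i] += h
        down[i] -= h
        grad.append((energy(up) - energy(down)) / (2.0 * h))
    return grad


def adapt(energy, angles, step=0.1, iterations=200, tol=1e-6):
    """Descend the energy surface from `angles`; return (final angles, energy profile)."""
    current = list(angles)
    profile = [energy(current)]
    for _ in range(iterations):
        grad = numerical_gradient(energy, current)
        if max(abs(g) for g in grad) < tol:   # zero-gradient structure reached
            break
        current = [a - step * g for a, g in zip(current, grad)]
        profile.append(energy(current))
    return current, profile
```

A fixed step is the simplest choice; on a real energy surface a line search or adaptive step would likely be needed to keep the descent stable.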
  • 38. ► Free GA search: no restriction on dihedral angles, with the exception of omega and ring structures ٭ Advantages: usable in any protein search, an empirical way of obtaining protein conformation, and useful for energy profile search ► α-helix and β-sheet specific GA search: randomly selects segments of the protein as α-helices and β-sheets ٭ Advantages: enhances the speed of free GA and gives an accurate search for α-helices and β-sheets ► Binary GA search: uses binary to represent dihedral angles instead of decimal ٭ Advantages: no barrier when doing crossover
  • 39. ►Creates α-helices and β-sheets of random lengths at random start positions ►Each α-helix or β-sheet created in this way is described by two parameters ►Crossover will involve trading the two parameters between two individuals
  • 40. ►When α-helices are crossed over, each individual’s new energy is compared to its old energy. If there is a net improvement, the crossover is kept. ►The “former helix” regions will be filled with random torsion angles like normal (Figure labels: Green region, Blue region)
  • 41. ►Transfer torsion angles to binary code ‫ ٭‬Integer and decimal coded separately to shorten the total number of digits - 17 digits altogether ►Idea is to make the torsion angles on a single chromosome represented by one long continuous chain ‫ ٭‬Cross over and Mutation operators all similar to GA 10100101010010000101001110101100001 01011010100100001010010101001000010 10010101001010010100101010011100
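To make the 17-digit binary representation concrete, the sketch below encodes each torsion angle with the integer and fractional parts stored in separate fields and then joins all angles into one continuous string. The exact split is not given on the slide; 1 sign bit, 8 integer bits, and 8 fractional bits (steps of 1/256 degree) is purely an assumed layout that happens to total 17 bits, and the function names are hypothetical.

```python
SIGN_BITS = 1
INT_BITS = 8      # assumed: enough for 0 to 180 degrees
FRAC_BITS = 8     # assumed: fraction stored in steps of 1/256 degree
GENE_BITS = SIGN_BITS + INT_BITS + FRAC_BITS   # 17 bits per torsion angle


def encode_angle(angle):
    """Encode one angle as sign bit + integer field + fraction field."""
    sign = "1" if angle < 0 else "0"
    magnitude = abs(angle)
    whole = int(magnitude)
    frac = int(round((magnitude - whole) * 2 ** FRAC_BITS))
    if frac == 2 ** FRAC_BITS:          # rounding carried into the integer part
        whole, frac = whole + 1, 0
    if whole >= 2 ** INT_BITS:
        raise ValueError("angle magnitude does not fit the integer field")
    return sign + format(whole, f"0{INT_BITS}b") + format(frac, f"0{FRAC_BITS}b")


def decode_angle(bits):
    """Inverse of encode_angle for one 17-bit gene."""
    whole = int(bits[SIGN_BITS:SIGN_BITS + INT_BITS], 2)
    frac = int(bits[SIGN_BITS + INT_BITS:], 2) / 2 ** FRAC_BITS
    value = whole + frac
    return -value if bits[0] == "1" else value


def encode_chromosome(angles):
    """One long continuous bit string for all torsion angles."""
    return "".join(encode_angle(a) for a in angles)


def decode_chromosome(bits):
    """Split the bit string back into 17-bit genes and decode each one."""
    return [decode_angle(bits[i:i + GENE_BITS])
            for i in range(0, len(bits), GENE_BITS)]
```

With this layout a crossover cut can fall anywhere in the string, which is the "no barrier" property the binary search relies on; a decoded gene may exceed 180 degrees after bit-level mutation, so a real implementation would need to wrap or reject such values.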
  • 42.
  • 43. ►All single amino acids (AA) were predicted with GAPSS ►GA parameters ٭ Initial population: 20 ٭ Generation limit: 15 ٭ Percentage of mutations: 90% ►Compared to native single-AA folding
  • 44. Asparagine (N, Asn), Alanine (A, Ala), Aspartic Acid (D, Asp), Cysteine (C, Cys), Glutamine (Q, Gln), Glutamic Acid (E, Glu), Glycine (G, Gly), Isoleucine (I, Ile)
  • 45. Leucine (L, Leu), Serine (S, Ser), Methionine (M, Met), Valine (V, Val), Threonine (T, Thr)
  • 46. ►Enkephalin is a pentapeptide that is involved in regulating pain ►Two forms of enkephalin ٭ Methionine-enkephalin: Tyr-Gly-Gly-Phe-Met ٭ Leucine-enkephalin: Tyr-Gly-Gly-Phe-Leu ►Short enough to confirm the accuracy of GAPSS, yet it still contains complex ring side groups
  • 47. ►Zero-gradient conformations suggest that GAPSS is capable of obtaining local minima ►Backbone conformations showed close similarity ►Side group conformations still show discrepancies between predicted and theoretical structures
  • 48. ►GAPSS was able to locate a few local minimum protein conformations
  • 49. ►Backbone structure was predicted by GAPSS (Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure)
  • 50. ► Discrepancies between side groups are due to the lack of entropy and solvation energy terms and to the center partial charge assumption (Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure)
  • 51. ► (a) The minimum energy of each generation with different initial populations at a generation limit of 3 and 20% mutation ► (b) The minimum energy of each generation with different mutation percentages at a generation limit of 10 and an initial population of 20 ► The optimal condition was found to be an initial population of 30, a generation limit of 15, and a mutation percentage of 90%
  • 52. ► Progression of protein folding for the best prediction: the potential energy continues to decrease, which suggests that more stringent GA parameters could lead to the global minimum
  • 53. ►Due to limited computing capability, less stringent GA parameters were used ►The energy level of the predicted enkephalin structure is lower than the theoretical one; however, the code still shows the energy decreasing ►More sophisticated partial charge calculation and non-bonded energy terms could improve the prediction ►GAPSS predicted zero-gradient structures
  • 54. ► GA-based search and optimization is a simple and efficient method for isolated native protein structure prediction ► Continuous decimal representation of dihedral angles is more efficient than binary representation of dihedral angles, despite the crossover barriers ► α-helix and β-sheet search converges faster than free torsion angle search ► Similar backbone dihedrals predicted by the VBA GA compared to the Protein Data Bank
  • 55. Chemical, Biological, and Materials Engineering Department, University of Oklahoma Advanced Design II
  • 56. ►Distance calculation from the origin

$$
\begin{aligned}
x &= R\cos(\pi-\theta_1) = -R\cos\theta_1 \\
y &= R\sin(\pi-\theta_1)\cos\beta_1 = R\sin\theta_1\cos\beta_1 \\
z &= R\sin(\pi-\theta_1)\sin\beta_1 = R\sin\theta_1\sin\beta_1
\end{aligned}
$$

$$
\begin{aligned}
\sqrt{x^2+y^2+z^2}
&= \sqrt{(-R\cos\theta_1)^2 + (R\sin\theta_1\cos\beta_1)^2 + (R\sin\theta_1\sin\beta_1)^2} \\
&= \sqrt{R^2\cos^2\theta_1 + R^2\sin^2\theta_1\left(\cos^2\beta_1+\sin^2\beta_1\right)} \\
&= \sqrt{R^2\cos^2\theta_1 + R^2\sin^2\theta_1\,(1)} \\
&= \sqrt{R^2\left(\cos^2\theta_1+\sin^2\theta_1\right)} = \sqrt{R^2\,(1)} = R
\end{aligned}
$$

(Figure labels: H- (0,0,0), N (-1.04,0,0), Cα (-1.52,1.37,0); axes x, Y, Z; angles θ and ω)
  • 57. ►Rotate about one axis at a time to compensate for the bond angle and the dihedral angle; there is no rotation around y (Figure: rotation by θz about z = z’ and rotation by θx about x = x’)
  • 58.
  • 59. θy is 0, so most of the trigonometric functions cancel
  • 60.

$$
\det(B_n) = \det\begin{pmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i+1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i+1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{pmatrix} = 1
$$

$$
\begin{pmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{pmatrix}
= \begin{pmatrix} -R\cos\theta_{ij} \\ R\sin\theta_{ij}\cos\omega_{ij} \\ R\sin\theta_{ij}\sin\omega_{ij} \end{pmatrix},
\qquad
\left\lVert \begin{pmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{pmatrix} \right\rVert^2
= \left\lVert \begin{pmatrix} -R\cos\theta_{ij} \\ R\sin\theta_{ij}\cos\omega_{ij} \\ R\sin\theta_{ij}\sin\omega_{ij} \end{pmatrix} \right\rVert^2
$$