►Introduction
  ‫ ٭‬Background
  ‫ ٭‬Problem
  ‫ ٭‬Energy Forms
►Methods
  ‫ ٭‬Genetic Algorithm
►Results and Discussion
►Conclusion
►VBA (Visual Basic for Applications) Program Demonstration
►A protein is a string of amino acids connected
 by peptide bonds.

►Amino acid
   ٭ Acidic
   ٭ Basic
   ٭ Aliphatic
   ٭ Polar uncharged
   ٭ Aromatic

[Figure labels: N-Terminus, C-Terminus]
►Proteins catalyze over 1,000 biochemical reactions in
 the human body.
►Protein misfolding is responsible for over 20
 diseases.
   ٭ Mad cow disease is caused by an “evil” (misfolded) prion protein, PrP. The
     “evil” protein and the normal protein have identical primary
     structures, but their tertiary structures are different.

[Figure: normal PrP and diseased PrP]
►Some proteins fold as fast as a millionth of a second

►Theoretically, a protein of only 100 amino acids
 following the trial and error method would take 100
 billion years to try out all possible conformations!
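
The size of the trial-and-error estimate above depends heavily on what is assumed. The sketch below is only an illustration: the number of states per residue and the sampling rate are hypothetical inputs, not values taken from the slides, and the function name is made up for this example.

```python
# Rough exhaustive-search estimate for a chain of n residues.
# Assumptions (hypothetical): each residue has a fixed number of
# discrete states and conformations are sampled at a constant rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation exactly once."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for states in (2, 3):
        years = exhaustive_search_years(100, states, 1e13)
        print(f"{states} states per residue: about {years:.1e} years")
```

Even with only two states per residue the search takes billions of years under these assumptions, which is why a guided search is needed.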

►Protein structures are highly dependent upon various
 environmental parameters.
   ‫ ٭‬Such as temperature, pH, solvent, etc.
► Comparative modeling - uses evolutionarily related proteins
   ٭ Advantages: fast and simple
   ٭ Disadvantages: conformation depends upon environmental parameters

► Fold recognition - uses a database of known 3-D protein
  structures
   ٭ Advantages: more accurate than comparative modeling
   ٭ Disadvantages: not enough NMR-confirmed protein structures

► Ab initio - uses both scientific and engineering approaches
   ٭ Advantages: has the potential to predict the exact shape and intermediate
     structures
   ٭ Disadvantages: computing limitations, difficulty in selecting the correct
     potential energy function
►Not enough NMR-confirmed protein structures in the Protein
 Data Bank (PDB)

►Evolutionary relatedness does not necessarily translate to
 similar structure

►Ab initio difficulties
   ٭ Hydrophilic and hydrophobic modeling gives only the general
     arrangement of the protein
   ٭ 2-D modeling does not predict the 3-D shape of the protein
   ٭ The Monte Carlo computing method is time consuming and does not
     necessarily reach the global minimum
►Develop a genetic-algorithm-based program to predict
 protein conformation

►Reduce the number of generations needed for prediction, thus
 enhancing the efficiency of the search

►Explore additional operators to modify the genetic
 algorithm

►Predict the conformation of a short 5-amino-acid (5-AA)
 peptide, enkephalin
►Electrostatic Energy

►Nonbonding Energy

►Hydrogen Bonding Energy

►Cysteine-Cysteine Loop Energy
►Energy term calculated over atom pairs
   ٭ Modeled after the Coulomb force

►Force between two charges separated by a distance
 r_ij

[Figure: electrostatic term, energy E (joules) versus separation r (angstroms), for two positive charges]
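
A minimal sketch of a Coulomb-type pair energy, assuming point charges given in units of the elementary charge and distances in angstroms. The `dielectric` parameter and the function names are hypothetical additions for illustration; the slides do not state which partial charges or dielectric constant GAPSS uses.

```python
import itertools
import math

COULOMB_CONSTANT = 8.9875517923e9     # N*m^2/C^2
ELEMENTARY_CHARGE = 1.602176634e-19   # C
ANGSTROM = 1e-10                      # m


def electrostatic_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Coulomb energy in joules for charges in units of e and r_ij in angstroms."""
    charges = (q_i * ELEMENTARY_CHARGE) * (q_j * ELEMENTARY_CHARGE)
    return COULOMB_CONSTANT * charges / (dielectric * r_ij * ANGSTROM)


def total_electrostatic_energy(charges, coords, dielectric=1.0):
    """Sum the pair energy over every unique atom pair."""
    total = 0.0
    for i, j in itertools.combinations(range(len(charges)), 2):
        r_ij = math.dist(coords[i], coords[j])
        total += electrostatic_energy(charges[i], charges[j], r_ij, dielectric)
    return total


if __name__ == "__main__":
    # Two like charges repel (positive energy); opposite charges attract.
    print(electrostatic_energy(1.0, 1.0, 1.0))
    print(electrostatic_energy(1.0, -1.0, 1.0))
```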
►Two types of Lennard-Jones potential
  o 1-4 interactions - atoms connected by three bonds
  o 1-5 and higher interactions - atoms connected by more than three
    bonds
►Modeled after the Lennard-Jones potential (repulsive/attractive forces)

[Figure: repulsive (F) and attractive (-F) forces between atoms 1 and 2; curves for 1-4 interactions and 1-5 interactions]
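
The 6-12 form of the Lennard-Jones potential can be sketched as below. The well depth `epsilon` and contact distance `sigma` are placeholders, and the idea of switching parameter sets by bond separation is only an assumed reading of the 1-4 versus 1-5 distinction above; the parameter values GAPSS actually uses are not given in the slides.

```python
def lennard_jones(r, epsilon, sigma):
    """6-12 Lennard-Jones energy: repulsive at short range, attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_energy(r, bonds_apart, params_14, params_15):
    """Pick a parameter set by bond separation (hypothetical scheme).

    params_14 and params_15 are (epsilon, sigma) tuples for atoms that are
    exactly three bonds apart and more than three bonds apart, respectively.
    """
    if bonds_apart < 3:
        return 0.0  # directly bonded or 1-3 pairs are not treated as nonbonded here
    epsilon, sigma = params_14 if bonds_apart == 3 else params_15
    return lennard_jones(r, epsilon, sigma)


if __name__ == "__main__":
    # Illustrative numbers only.
    for r in (3.0, 3.4, 3.8, 5.0):
        print(r, lennard_jones(r, epsilon=0.2, sigma=3.4))
```

The energy is zero at `r = sigma` and reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`.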
►Energy associated with the hydrogen bonding in the
 protein.
►Included if there are one or more intramolecular
 disulfide bonds
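►A dihedral (torsion) angle is defined by four consecutive atoms:
 it is the rotational angle between the bond joining
 the first pair of adjacent atoms
 and the bond joining the next pair,
 viewed along the central bond

►Phi (φ) is the rotation about the N-Cα bond, psi (ψ)
 is the rotation about the Cα-C’ bond, and omega (ω)
 is the rotation about the C’-N bond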
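
As an illustration of the definition above, a dihedral angle can be computed from four atom positions with the common atan2 formulation. This is a generic sketch, not code from GAPSS; the helper names are made up for the example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) about the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # cis (eclipsed) arrangement gives 0 degrees
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    # trans arrangement gives a magnitude of 180 degrees
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

For a peptide backbone, phi would use the atoms C’(i-1), N, Cα, C’ and psi the atoms N, Cα, C’, N(i+1).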
► The first 3 atoms on the peptide
  chain are fixed

► The coordinate system is
  arbitrarily placed on
  the first H atom of the
  N-terminus

► Assumptions:
    ٭ Minimal bond length stretch
    ٭ Bond angle stays constant
    ٭ Torsion angle (dihedral angle)
      applies to the 4th atom

[Figure: axes x, y, z with H at (0, 0, 0), N at (-1.04, 0, 0), and Cα at (-1.52, 1.37, 0); bond angle θ and dihedral angle ω marked]

\[
\begin{bmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ 1 \end{bmatrix}
= B_1 B_2 \cdots B_n
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix},
\qquad
B_n =
\begin{bmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i-1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

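The first three B matrices are fixed by the previous assumptions; B1, B2, and B3 correspond
to the H-, -N-, and Cα atoms.

\[
B_1 =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
B_2 =
\begin{bmatrix}
-1 & 0 & 0 & -r_{12} \\
0 & 1 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
B_3 =
\begin{bmatrix}
-\cos\theta_{13} & -\sin\theta_{13} & 0 & -r_{23}\cos\theta_{13} \\
\sin\theta_{13} & -\cos\theta_{13} & 0 & r_{23}\sin\theta_{13} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]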
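
The chained-matrix construction can be sketched numerically as follows. The sign pattern inside `bond_matrix` is an assumption (a commonly used convention for this kind of 4x4 bond transformation), chosen because it reproduces the H, N, and Cα coordinates shown on the slide. The bond length and angle for Cα are back-calculated from those coordinates rather than taken from a force field, and all function names are hypothetical.

```python
import math

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def bond_matrix(r, theta_deg, omega_deg):
    """4x4 transform for one bond: length r, bond angle theta, dihedral omega."""
    t, w = math.radians(theta_deg), math.radians(omega_deg)
    ct, st, cw, sw = math.cos(t), math.sin(t), math.cos(w), math.sin(w)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * cw, -ct * cw, -sw, r * st * cw],
        [st * sw, -ct * sw, cw, r * st * sw],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def atom_positions(matrices):
    """Position of atom n is the last column of B1 B2 ... Bn."""
    positions, product = [], IDENTITY
    for b in matrices:
        product = matmul(product, b)
        positions.append(tuple(row[3] for row in product[:3]))
    return positions


# Values below are derived from the slide's coordinates, not from a force field.
r12 = 1.04
r23 = math.hypot(-0.48, 1.37)
theta3 = math.degrees(math.atan2(1.37, -0.48))
chain = [IDENTITY, bond_matrix(r12, 0.0, 180.0), bond_matrix(r23, theta3, 0.0)]

if __name__ == "__main__":
    for name, pos in zip(("H", "N", "CA"), atom_positions(chain)):
        print(name, tuple(round(c, 2) for c in pos))
```

Appending one more `bond_matrix` per atom extends the chain, so each new torsion angle only changes the matrices from that atom onward.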
►Fischer projections are used to
 determine the
 dihedral angles of
 side-group atoms
   ٭ ω1 = dihedral angle
   ٭ ω2 = 120° + ω1 (tetrahedral structure)
   ٭ ω2 = 180° + ω1 (bent structure)
►Assumption:
   ٭ Tetrahedral structure:
     120° apart
   ٭ Bent structure: 180°
     apart
► Search and optimization method
  that mimics natural selection

► Terms to define
    ٭ Chromosome – a set of torsion angles
    ٭ Gene – an individual torsion angle
    ٭ Generation – a single pass through the GA
      search loop


► Loops through the reproduction,
  mutation, and adaptation processes
  to obtain the best-fit model
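
The loop described above can be sketched in a few lines. This is a generic illustration, not the GAPSS implementation: `toy_energy` is a hypothetical stand-in for the real potential energy, and the one-point crossover, single-gene mutation, and keep-the-best rule are simplifying assumptions.

```python
import math
import random


def toy_energy(chromosome):
    # Hypothetical stand-in energy: zero when every torsion angle is 60 degrees.
    return sum(1.0 - math.cos(math.radians(a - 60.0)) for a in chromosome)


def run_ga(n_genes, population_size, generations, mutation_rate, rng):
    """Return the best chromosome and the best energy seen at each generation."""
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=toy_energy)
        history.append(toy_energy(population[0]))
        parents = population[: population_size // 2]   # better half reproduces
        children = [population[0][:]]                  # keep the current best
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)            # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:           # mutate one gene
                child[rng.randrange(n_genes)] = rng.uniform(-180.0, 180.0)
            children.append(child)
        population = children
    best = min(population, key=toy_energy)
    history.append(toy_energy(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga(n_genes=10, population_size=20, generations=15,
                           mutation_rate=0.9, rng=random.Random(0))
    print("best energy per generation:", [round(e, 3) for e in history])
```

Because the best chromosome is always copied into the next generation, the recorded minimum energy never increases from one generation to the next.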
►Use a computer
 simulation to perform
 an intelligent
 search/optimization that
 finds the native protein
 conformation, the one that
 requires the least
 amount of energy

[Figure: native conformation]
►GAPSS was developed in the Visual Basic for Applications (VBA)
 environment

►Modified genetic operators
   ‫٭‬   Fitness function based selection
   ‫٭‬   Multiple entries crossover
   ‫٭‬   Non-uniform mutation
   ‫٭‬   Adaptation

►Advantages
   ‫ ٭‬Faster convergence
   ‫ ٭‬User-friendly
► Three primary energy terms:
  electrostatic, nonbonded (6-12),
  and hydrogen bonding

► Torsion energy is excluded
   ٭ Not a real interaction energy
   ٭ A penalty is introduced for positive
     torsion

► Cystine loop-closing energy is
  included only when more
  than one cysteine is present
  in the protein
►Selection Operator
   ٭ Ranked selection – the higher
     the rank, the higher the
     probability of being chosen
   ٭ Fitness selection – the better
     the fitness, the higher the
     probability of being chosen

►Benefits of Selection
   ٭ Aids the elitist search

[Figure: individuals ordered from higher rank or better fitness down to lower rank or worse fitness]
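
The two selection schemes can be compared with a small sketch. Lower energy is treated as better; the linear rank weights and the shift used to turn energies into non-negative fitness values are assumptions made for this example, since the slides do not give the exact fitness function.

```python
import random


def rank_probabilities(energies):
    """Ranked selection: the best of n individuals gets weight n, the worst gets 1."""
    n = len(energies)
    order = sorted(range(n), key=lambda i: energies[i])   # lowest energy first
    weights = [0] * n
    for rank, index in enumerate(order):
        weights[index] = n - rank
    total = sum(weights)
    return [w / total for w in weights]


def fitness_probabilities(energies):
    """Fitness selection: probability proportional to (worst energy - own energy)."""
    worst = max(energies)
    fitness = [worst - e for e in energies]
    total = sum(fitness)
    if total == 0:                       # all energies equal: choose uniformly
        return [1.0 / len(energies)] * len(energies)
    return [f / total for f in fitness]


def select(population, probabilities, rng):
    """Draw one individual according to the given probabilities."""
    return rng.choices(population, weights=probabilities, k=1)[0]


if __name__ == "__main__":
    energies = [-3.0, -1.0, -2.0]
    print(rank_probabilities(energies))
    print(fitness_probabilities(energies))
    print(select(["a", "b", "c"], rank_probabilities(energies), random.Random(0)))
```

With this particular shift the worst individual is never chosen under fitness selection, whereas ranked selection always leaves it a small chance.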
► Mutation Operator
   ٭ Uniform mutation – randomly
     replaces an angle with a value from
     -180° to 180°
   ٭ Non-uniform mutation – adds
     or subtracts a random value
     between 0° and 180°

► Effects of Mutation
   ٭ Introduces variance into the search
   ٭ Aids the search for the global
     minimum by directing
     the gradient search out of
     local minima
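
Both mutation operators can be sketched as below. Mutating a single randomly chosen gene and wrapping the result back into the range -180 to 180 degrees are assumptions of this example, not details stated on the slide.

```python
import random


def wrap(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uniform_mutation(chromosome, rng):
    """Replace one randomly chosen angle with a fresh value from -180 to 180."""
    mutated = chromosome[:]
    mutated[rng.randrange(len(mutated))] = rng.uniform(-180.0, 180.0)
    return mutated


def non_uniform_mutation(chromosome, rng, max_step=180.0):
    """Add or subtract a random value between 0 and max_step to one angle."""
    mutated = chromosome[:]
    index = rng.randrange(len(mutated))
    step = rng.uniform(0.0, max_step) * rng.choice((-1.0, 1.0))
    mutated[index] = wrap(mutated[index] + step)
    return mutated


if __name__ == "__main__":
    rng = random.Random(1)
    angles = [-60.0, -45.0, 180.0, 120.0, 75.0]
    print(uniform_mutation(angles, rng))
    print(non_uniform_mutation(angles, rng))
```

Lowering `max_step` makes the non-uniform operator a local perturbation, which is one way to trade exploration for refinement.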
►Crossover Operator
   ٭ Random 2-point crossover
     – randomly exchanges
     2 angles at a time between
     parents
   ٭ Multiple entries crossover
     – performs multiple random
     exchanges

►Benefits of Crossover
   ٭ Aids the search for elites
   ٭ Optimizes the search by
     keeping the optimal folding
     segments
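
One possible reading of the two crossover operators is an exchange of angles at randomly chosen positions: two positions for the 2-point version and more for the multiple-entries version. The sketch below follows that interpretation, which is an assumption; the slide does not spell out the exact exchange rule.

```python
import random


def exchange_crossover(parent_a, parent_b, n_exchanges, rng):
    """Swap the angles at n_exchanges random positions between two parents."""
    child_a, child_b = parent_a[:], parent_b[:]
    for index in rng.sample(range(len(parent_a)), n_exchanges):
        child_a[index], child_b[index] = child_b[index], child_a[index]
    return child_a, child_b


if __name__ == "__main__":
    rng = random.Random(2)
    a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    b = [-10.0, -20.0, -30.0, -40.0, -50.0, -60.0]
    print(exchange_crossover(a, b, 2, rng))   # 2 angles at a time
    print(exchange_crossover(a, b, 4, rng))   # multiple exchanges
```

Because each position keeps the same pair of values across the two children, no angle is lost or duplicated by the exchange.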
►Adaptation Operator
   ٭ Gradient search applied to
     each chromosome
   ٭ Predicts the energy profile


►Benefits of Adaptation
   ٭ Provides the local minimum
     search
   ٭ Determines the energy
     profile of the native folding
     process
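
A gradient-based adaptation step might look like the sketch below: a finite-difference gradient drives a simple descent, and the energies of accepted steps are recorded as a profile. The toy energy function, step size, and stopping rule are all assumptions for illustration; the slides do not describe how GAPSS computes its gradient.

```python
import math


def toy_energy(angles):
    # Hypothetical stand-in energy with its minimum at 60 degrees per angle.
    return sum(1.0 - math.cos(math.radians(a - 60.0)) for a in angles)


def numerical_gradient(energy, angles, h=1e-4):
    """Central-difference estimate of dE/d(angle) for every angle."""
    gradient = []
    for i in range(len(angles)):
        up, down = angles[:], angles[:]
        up[i] += h
        down[i] -= h
        gradient.append((energy(up) - energy(down)) / (2.0 * h))
    return gradient


def adapt(energy, angles, step=1000.0, iterations=200, tolerance=1e-8):
    """Descend along the negative gradient; return final angles and energy profile."""
    current = angles[:]
    profile = [energy(current)]
    for _ in range(iterations):
        gradient = numerical_gradient(energy, current)
        if max(abs(g) for g in gradient) < tolerance:
            break                                  # zero-gradient conformation
        trial = [a - step * g for a, g in zip(current, gradient)]
        trial_energy = energy(trial)
        if trial_energy < profile[-1]:
            current = trial
            profile.append(trial_energy)
        else:
            step *= 0.5                            # overshoot: shrink the step
    return current, profile


if __name__ == "__main__":
    final, profile = adapt(toy_energy, [100.0, -20.0, 60.5])
    print([round(a, 3) for a in final], len(profile), round(profile[-1], 9))
```

The descent stops at the nearest zero-gradient point, so on a real energy surface it finds a local minimum and relies on mutation and crossover to move between basins.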
► Free GA search – no restriction on dihedral angles, with the
  exception of omega and ring structures
   ٭ Advantages: usable for any protein search, an empirical way of obtaining
     the protein conformation, and useful for energy profile searches

► α-helix and β-sheet specific GA search – randomly selects
  segments of the protein as α-helices and β-sheets
   ٭ Advantages: speeds up the free GA and searches accurately for
     α-helices and β-sheets

► Binary GA search – uses binary instead of decimal to represent
  dihedral angles
   ٭ Advantages: no barrier when doing crossover
►Creates α-helices and β-sheets
 of random lengths at random
 start positions

►Each α-helix or β-sheet created
 in this way is described by two
 parameters

►Crossover will involve trading
 the two parameters between
 two individuals
►When α-helices are crossed
 over, each individual’s new
 energy is compared to its old
 energy. If there is a net
 improvement, the crossover
 is kept.

►The “former helix” regions
 will be filled with random
 torsion angles as usual

[Figure: green region and blue region highlighted]
►Torsion angles are converted to binary code
   ٭ The integer and decimal parts are coded separately to shorten the total
     number of digits: 17 digits altogether
►The idea is to represent the torsion angles on a single
 chromosome as one long continuous
 chain
   ٭ Crossover and mutation operators are all similar to those of the GA

                             10100101010010000101001110101100001
                             01011010100100001010010101001000010
                             10010101001010010100101010011100
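
One way to reach 17 binary digits per angle is sketched below. The layout (1 sign bit, 8 bits for the integer part, 8 bits for two decimal places) is purely an assumption for illustration; the slides state only that the integer and decimal parts are coded separately and total 17 digits.

```python
def encode_angle(angle):
    """Encode an angle in degrees (two decimal places) as a 17-character bit string."""
    hundredths = round(abs(angle) * 100)
    integer_part, decimal_part = divmod(hundredths, 100)
    sign_bit = "1" if angle < 0 else "0"
    return sign_bit + format(integer_part, "08b") + format(decimal_part, "08b")


def decode_angle(bits):
    """Inverse of encode_angle under the same assumed layout."""
    sign = -1.0 if bits[0] == "1" else 1.0
    integer_part = int(bits[1:9], 2)
    decimal_part = int(bits[9:17], 2)
    return sign * (integer_part + decimal_part / 100.0)


def encode_chromosome(angles):
    """Concatenate all angles into one continuous binary chain."""
    return "".join(encode_angle(a) for a in angles)


def decode_chromosome(bits):
    return [decode_angle(bits[i:i + 17]) for i in range(0, len(bits), 17)]


if __name__ == "__main__":
    chain = encode_chromosome([-57.25, 180.0, 120.5])
    print(chain, len(chain))
    print(decode_chromosome(chain))
```

Under this layout, a crossover or bit flip can produce fields outside their valid ranges (a decimal field above 99, for example), so a real implementation would need a repair or rejection step.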
►Every single amino acid (AA) was predicted with GAPSS

►GA parameters
   ٭ Initial population: 20
   ٭ Generation limit: 15
   ٭ Percentage of mutations: 90%

►Predictions were compared with the native single-AA folding
[Figure panels, one per amino acid:]
Alanine (A, Ala)
Asparagine (N, Asn)
Aspartic acid (D, Asp)
Cysteine (C, Cys)
Glutamine (Q, Gln)
Glutamic acid (E, Glu)
Glycine (G, Gly)
Isoleucine (I, Ile)
Leucine (L, Leu)
Methionine (M, Met)
Serine (S, Ser)
Threonine (T, Thr)
Valine (V, Val)
►Enkephalin is a pentapeptide that is involved in
 regulating pain

►Two forms of enkephalin
   ٭ Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met
   ٭ Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu

►Short enough to confirm the accuracy of
 GAPSS, yet it still contains complex ring side
 groups
►Zero-gradient conformations suggest that GAPSS
 is capable of locating local minima

►Backbone conformations showed strong
 similarities

►Side-group conformations still show discrepancies
 between the predicted and theoretical structures
►GAPSS was able to locate a few local-minimum
 protein conformations
►Backbone structure was predicted by GAPSS

[Figure: GA-predicted backbone structure and NMR-confirmed backbone structure]
► Discrepancies between side groups are due to the omission of
  entropy and solvation energy and to the centered partial charge
  assumption

[Figure: GA-predicted backbone structure and NMR-confirmed backbone structure]
► (a) The minimum energy of each
  generation with different initial
  populations at a 3-generation limit
  and 20% mutation

► (b) The minimum energy of each
  generation with different
  mutation percentages at a 10-generation
  limit and an initial population
  of 20

► The optimal condition was found
  to be an initial population of 30, a
  15-generation limit, and a 90%
  mutation percentage
► Progression of protein folding for the best prediction: the potential energy
  continues to decrease, which suggests that more stringent GA parameters could lead to
  the global minimum
►Due to limited computing capability, less stringent GA
 parameters were used

►The energy of the predicted enkephalin structure is lower than
 the theoretical value, yet the energy was still
 decreasing when the run ended

►More sophisticated partial charge and nonbonded
 energy calculations could improve the prediction

►GAPSS predicted zero-gradient structures
► GA-based search and optimization is a simple and efficient method
  for predicting the structure of an isolated native protein

► Continuous decimal representation of dihedral angles is more
  efficient than binary representation, despite the
  crossover barriers

► The α-helix and β-sheet search converges faster than the free torsion
  angle search

► The backbone dihedrals predicted by the VBA GA are similar to those in the
  Protein Data Bank
Chemical, Biological, and Materials Engineering
     Department, University of Oklahoma



             Advanced Design II
►Distance calculation from the origin

[Figure: axes x, y, z with H at (0, 0, 0), N at (-1.04, 0, 0), and Cα at (-1.52, 1.37, 0); angles θ and ω marked]

\[
\begin{aligned}
x &= R\cos(\pi - \theta_1) = -R\cos(\theta_1) \\
y &= R\sin(\pi - \theta_1)\cos(\beta_1) = R\sin(\theta_1)\cos(\beta_1) \\
z &= R\sin(\pi - \theta_1)\sin(\beta_1) = R\sin(\theta_1)\sin(\beta_1)
\end{aligned}
\]

\[
\begin{aligned}
(x)^2 + (y)^2 + (z)^2
&= \left(R\cos(\theta_1)\right)^2 + \left(R\sin(\theta_1)\cos(\beta_1)\right)^2 + \left(R\sin(\theta_1)\sin(\beta_1)\right)^2 \\
&= R^2\cos^2(\theta_1) + \sin^2(\theta_1)\,R^2\left(\cos^2(\beta_1) + \sin^2(\beta_1)\right) \\
&= R^2\cos^2(\theta_1) + \sin^2(\theta_1)\,R^2\,(1) \\
&= R^2\left(\cos^2(\theta_1) + \sin^2(\theta_1)\right) \\
&= R^2\,(1)
\end{aligned}
\]
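►Rotate about one axis at a time to account for the bond
 and dihedral angles; there is no rotation around y

[Figure: rotation by θz about the z axis (z = z') taking x, y to x', y', and rotation by θx about the x axis (x = x') taking y, z to y', z']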
θy is 0, which cancels most of the trigonometric functions.

\[
\det B_n =
\det
\begin{bmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i-1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{bmatrix}
= 1
\]

Writing R for the bond length r_{i-1,j}, the last column gives the position of the next atom:

\[
\begin{bmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{bmatrix}
=
\begin{bmatrix}
-R\cos(\theta_{ij}) \\
R\sin(\theta_{ij})\cos(\omega_{ij}) \\
R\sin(\theta_{ij})\sin(\omega_{ij})
\end{bmatrix}
\]

\[
\begin{bmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{bmatrix}
\cdot
\begin{bmatrix} x_{i+1} \\ y_{i+1} \\ z_{i+1} \end{bmatrix}
=
\left\lVert
\begin{bmatrix}
-R\cos(\theta_{ij}) \\
R\sin(\theta_{ij})\cos(\omega_{ij}) \\
R\sin(\theta_{ij})\sin(\omega_{ij})
\end{bmatrix}
\right\rVert^{2}
\]
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Protein Folding Prediction

  • 1.
  • 2. ►Introduction ‫ ٭‬Background ‫ ٭‬Problem ‫ ٭‬Energy Forms ►Methods ‫ ٭‬Genetic Algorithm ►Results and Discussion ►Conclusion ►VBA (Visual Basic Add-in) Program Demonstration
  • 3. ►A protein is a string of amino acids connected by peptide bonds. ►Amino acid ٭ Acidic ٭ Basic ٭ Aliphatic ٭ Polar uncharged ٭ Aromatic (Figure: peptide chain labeled from N-terminus to C-terminus.)
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. ►Proteins catalyze over 1,000 biochemical reactions in the human body.
  • 11. ►Protein misfoldings are responsible for over 20 diseases. ٭ Mad Cow disease is caused by an “evil” protein: the “evil” protein and the normal protein have identical primary structures, but their tertiary structures are different. (Figure: normal PrP and diseased PrP.)
  • 12. ►Some proteins fold as fast as a millionth of a second ►Theoretically, a protein of only 100 amino acids following the trial and error method would take 100 billion years to try out all possible conformations! ►Protein structures are highly dependent upon various environmental parameters. ‫ ٭‬Such as temperature, pH, solvent, etc.
  • 13. ► Comparative - uses evolutionarily related proteins ٭ Advantages: fast and simple ٭ Disadvantages: conformation depends upon environmental parameters ► Fold Recognition - utilizes a database of known 3-D protein structures ٭ Advantages: more accurate than comparative ٭ Disadvantages: not enough NMR-confirmed protein structures ► Ab Initio - uses both scientific and engineering approaches ٭ Advantages: has the potential to predict the exact shape and intermediate structures ٭ Disadvantages: computing limitations, difficulty in selecting the correct potential energy function
  • 14. ►Not enough NMR-confirmed protein structures in the Protein Data Bank (PDB) ►Evolutionary relatedness does not necessarily translate to similar structure ►Ab initio difficulties ٭ Hydrophilic and hydrophobic modeling gives only the general arrangement of the protein ٭ 2-D modeling does not predict the 3-D shape of the protein ٭ The Monte Carlo computing method is time consuming and does not necessarily reach the global minimum
  • 15. ►Develop a genetic algorithm based program to predict protein conformation ►Reduce the number of generations needed for prediction, thus enhancing the efficiency of the search ►Explore additional operators to modify the genetic algorithm ►Predict the conformation of a short 5-AA peptide, enkephalin
  • 16.
  • 17. ►Electrostatic Energy ►Nonbonding Energy ►Hydrogen Bonding Energy ►Cysteine-Cysteine Loop Energy
  • 18. ►Energy term calculated over atom pairs ٭ Modeled after the Coulomb force ►Force between two charges at a certain distance (r_ij)
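The pairwise Coulomb term described on this slide can be sketched as follows. This is a minimal illustration, not the GAPSS code: the function name, the conversion constant of 332 (kcal·Å per mol·e², a value commonly used in force fields), and the dielectric constant of 2 are all assumptions, since the slide gives no numerical parameters.

```python
import math

# Assumed constants (not given in the slides).
COULOMB_CONSTANT = 332.0  # kcal*angstrom/(mol*e^2), a common force-field value
DIELECTRIC = 2.0          # hypothetical effective dielectric constant


def electrostatic_energy(atoms):
    """Sum the Coulomb energy over all distinct atom pairs.

    atoms: list of (partial_charge, (x, y, z)) with coordinates in angstroms.
    """
    total = 0.0
    for i in range(len(atoms)):
        q_i, pos_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            q_j, pos_j = atoms[j]
            r_ij = math.dist(pos_i, pos_j)  # distance between the two charges
            total += COULOMB_CONSTANT * q_i * q_j / (DIELECTRIC * r_ij)
    return total
```

Like charges give a positive (repulsive) energy that falls off as 1/r, and opposite charges give a negative (attractive) one.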
  • 19. (Figure: electrostatic term for two positive charges; energy E in joules versus separation r in angstroms.)
  • 20. ►Two types of Lennard-Jones potential o 1-4 atoms: connected by three bonds o 1-5 atoms (higher interaction): connected by more than three bonds
  • 21. ►Modeled after the Lennard-Jones potential: repulsive and attractive forces (Figure: panels for 1-4 interactions and 1-5 interactions.)
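A 6-12 Lennard-Jones term with separate handling of 1-4 and 1-5 pairs could look like the sketch below. The well depth `epsilon`, the contact distance `sigma`, and the `scale_1_4` factor are hypothetical parameters chosen for illustration; the slides only state that 1-5 pairs interact more strongly than 1-4 pairs.

```python
def lennard_jones(r, epsilon, sigma):
    """6-12 potential: repulsive r^-12 term minus attractive r^-6 term."""
    attraction = (sigma / r) ** 6
    return 4.0 * epsilon * (attraction ** 2 - attraction)


def nonbonded_energy(pairs, scale_1_4=0.5):
    """Sum the 6-12 energy over atom pairs.

    pairs: iterable of (r, epsilon, sigma, bonds_apart).
    Pairs separated by fewer than three bonds are skipped, pairs separated by
    exactly three bonds (1-4) are scaled by the assumed factor scale_1_4, and
    pairs separated by more than three bonds (1-5) count in full.
    """
    total = 0.0
    for r, epsilon, sigma, bonds_apart in pairs:
        if bonds_apart < 3:
            continue
        energy = lennard_jones(r, epsilon, sigma)
        if bonds_apart == 3:
            energy *= scale_1_4
        total += energy
    return total
```

The potential crosses zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6)·sigma.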
  • 22. ►Energy associated with the hydrogen bonding in the protein.
  • 23. ►Included if there are one or more intramolecular disulfide bonds
  • 24.
  • 25. ►The dihedral angle is the angle of rotation between the bond joining one pair of adjacent atoms and the bond joining the next pair ►Phi (φ) is the rotation about the N-Cα bond, psi (ψ) about the Cα-C’ bond, and omega (ω) about the C’-N bond
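To make the definition concrete, the dihedral angle of four consecutive atoms can be computed from their Cartesian coordinates. The sketch below uses the usual atan2 formulation with three bond vectors; the helper names are illustrative and not taken from the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone, φ would use the atoms C’(i-1), N, Cα, C’, and ψ would use N, Cα, C’, N(i+1).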
  • 26. ► The first 3 atoms on the peptide chain are fixed ► The coordinate system is arbitrarily set around the first H atom of the N-terminus ► Assumptions: ٭ Minimal bond length stretch ٭ Bond angle stays constant ٭ Torsion angle (dihedral angle) applies to the 4th atom (Figure: x, y, z axes with angles θ and ω; fixed atoms H (0, 0, 0), N (-1.04, 0, 0), and Cα (-1.52, 1.37, 0).)
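  • 27.

$$
\begin{pmatrix} x_{n1} \\ x_{n2} \\ x_{n3} \\ 1 \end{pmatrix}
= B_1 B_2 \cdots B_n \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\qquad
B_n = \begin{pmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i-1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

The first 3 B_n matrices are fixed due to the previous assumption; B_1, B_2, and B_3 correspond to H, N, and Cα:

$$
B_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_2 = \begin{pmatrix} -1 & 0 & 0 & -r_{12} \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\quad
B_3 = \begin{pmatrix}
-\cos\theta_{13} & -\sin\theta_{13} & 0 & -r_{23}\cos\theta_{13} \\
\sin\theta_{13} & -\cos\theta_{13} & 0 & r_{23}\sin\theta_{13} \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$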
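The chain-building step can be sketched as a running product of 4×4 homogeneous transforms, one per bond. The sign pattern of `bond_matrix` below is an assumption (a standard form whose rotation part has determinant 1), as are the N-Cα bond length of 1.45 Å, the 109.3° angle, and the use of ω = π for the H-N bond; these values were chosen only because they reproduce the fixed coordinates N (-1.04, 0, 0) and Cα (-1.52, 1.37, 0) quoted in the slides.

```python
import math


def bond_matrix(r, theta, omega):
    """Homogeneous transform for one bond (angles in radians, r in angstroms)."""
    ct, st = math.cos(theta), math.sin(theta)
    co, so = math.cos(omega), math.sin(omega)
    return [
        [-ct, -st, 0.0, -r * ct],
        [st * co, -ct * co, -so, r * st * co],
        [st * so, -ct * so, co, r * st * so],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def atom_positions(bonds):
    """Place atoms along a chain; bonds is a list of (r, theta, omega)."""
    product = [[float(i == j) for j in range(4)] for i in range(4)]  # B1 = I
    positions = [(0.0, 0.0, 0.0)]  # first atom sits at the origin
    for r, theta, omega in bonds:
        product = matmul(product, bond_matrix(r, theta, omega))
        # The last column of the running product is the image of (0, 0, 0, 1).
        positions.append(tuple(product[i][3] for i in range(3)))
    return positions


# Assumed geometry for the three fixed atoms H, N, C-alpha.
fixed_atoms = atom_positions([
    (1.04, 0.0, math.pi),              # H -> N
    (1.45, math.radians(109.3), 0.0),  # N -> C-alpha
])
```

Each further backbone atom would append one more `(r, theta, omega)` triple, with ω as the torsion angle that the genetic algorithm varies.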
  • 28. ►Fischer projections are used to determine the dihedral angles of side-group atoms, where ω1 is the dihedral angle: ω2 = 120° + ω1 (tetrahedral) or ω2 = 180° + ω1 (bent) ►Assumption: ٭ Tetrahedral structure: 120° apart ٭ Bent structure: 180° apart
  • 29.
  • 30. ► Search and optimization method that mimics natural selection ► Terms to define ٭ Chromosome – a set of torsion angles ٭ Gene – an individual torsion angle ٭ Generation – a single pass through the GA search loop ► Loops through the reproduction, mutation, and adaptation processes to obtain the best-fit model
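The loop described on this slide can be sketched in a few lines. This is an illustrative skeleton rather than the GAPSS implementation: `toy_energy` is a hypothetical stand-in for the real potential energy, the adaptation (gradient search) step is left out for brevity, and the default population size, generation limit, and mutation percentage are the values reported later in the deck for the single amino acid runs.

```python
import random


def toy_energy(chromosome):
    """Hypothetical energy: lowest when every torsion angle equals 60 degrees."""
    return sum((angle - 60.0) ** 2 for angle in chromosome)


def run_ga(energy, n_genes, population_size=20, generations=15,
           mutation_rate=0.9, seed=0):
    rng = random.Random(seed)
    # A chromosome is a list of torsion angles (genes) in degrees.
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(population_size)]
    history = [energy(min(population, key=energy))]
    for _ in range(generations):  # one pass through the loop = one generation
        ranked = sorted(population, key=energy)
        parents = ranked[: population_size // 2]  # keep the fitter half
        children = []
        while len(children) < population_size - 1:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]  # reproduction by crossover
            if rng.random() < mutation_rate:      # mutation of one gene
                child[rng.randrange(n_genes)] = rng.uniform(-180.0, 180.0)
            children.append(child)
        population = [ranked[0]] + children       # elitism: best one survives
        history.append(energy(min(population, key=energy)))
    return min(population, key=energy), history
```

Because the best chromosome is always carried over, the minimum energy recorded in `history` never increases from one generation to the next.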
  • 31. ►Use a computer simulation to perform an intelligent search/optimization to find the native protein conformation, which requires the least amount of energy (Figure: native conformation.)
  • 32. ►GAPSS is developed under Visual Basic Add-in environment ►Modified genetic operators ‫٭‬ Fitness function based selection ‫٭‬ Multiple entries crossover ‫٭‬ Non-uniform mutation ‫٭‬ Adaptation ►Advantages ‫ ٭‬Faster convergence ‫ ٭‬User-friendly
  • 33. ► Three basic primary energies: Electrostatic, Nonbonded (6-12), and Hydrogen Bonded ► Exclude Torsion Energy ٭ Not a real interaction energy ٭ Introduces a penalty for positive torsion ► Cystine Loop-Closing is introduced only when more than one cysteine is present in the protein
  • 34. ►Selection Operator ٭ Ranked Selection – the higher (better) the rank, the higher the probability of being chosen ٭ Fitness Selection – the better the fitness, the higher the probability of being chosen ►Benefits of Selection ٭ Aids the elitism search (Figure: individuals ordered from higher rank or better fitness to lower rank or worse fitness.)
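The two selection schemes can be sketched with weighted random draws. The weighting choices below are assumptions made for illustration: linear rank weights for ranked selection, and "energy gap to the worst individual" as the fitness weight, since lower energy means better fitness.

```python
import random


def ranked_selection(population, energies, rng):
    """Pick one individual with probability proportional to its rank."""
    order = sorted(range(len(population)), key=lambda i: energies[i])
    n = len(order)
    weights = [n - rank for rank in range(n)]  # best gets n, worst gets 1
    chosen = rng.choices(order, weights=weights, k=1)[0]
    return population[chosen]


def fitness_selection(population, energies, rng):
    """Pick one individual with probability proportional to its fitness."""
    worst = max(energies)
    # Small offset keeps every weight positive, even for the worst individual.
    weights = [worst - energy + 1e-9 for energy in energies]
    return rng.choices(population, weights=weights, k=1)[0]
```

Ranked selection depends only on the ordering of the energies, while fitness selection also reacts to how large the energy differences are.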
  • 35. ► Mutation Operator ‫ ٭‬Uniform Mutation – randomly replace with a value from -180 to 180 ‫ ٭‬Non-uniform mutation – add or subtract a random value between 0 and 180 ► Effects of Mutation ‫ ٭‬Introduce variance to search ‫ ٭‬Aid the search for global minimum by directing gradient search out of the local minima
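Both mutation variants, as described on this slide, can be sketched as follows. The per-gene mutation `rate` and the wrapping of results back into the range of -180 to 180 degrees are assumptions added for the example.

```python
import random


def wrap_angle(angle):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def uniform_mutation(chromosome, rate, rng):
    """Replace each selected gene with a fresh value from -180 to 180."""
    return [rng.uniform(-180.0, 180.0) if rng.random() < rate else angle
            for angle in chromosome]


def non_uniform_mutation(chromosome, rate, rng):
    """Add or subtract a random value between 0 and 180 to selected genes."""
    mutated = []
    for angle in chromosome:
        if rng.random() < rate:
            shift = rng.choice((-1.0, 1.0)) * rng.uniform(0.0, 180.0)
            angle = wrap_angle(angle + shift)
        mutated.append(angle)
    return mutated
```

Uniform mutation forgets the old angle entirely, whereas the non-uniform variant perturbs it, so the new value stays related to the old one.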
  • 36. ►Crossover Operator ‫ ٭‬Random 2-point Crossover – randomly exchange between parents 2 angles at a time ‫ ٭‬Multiple Entries Crossover – multiple random exchange ►Benefits of Crossover ‫ ٭‬Aid the search for elites ‫ ٭‬Optimize the search by keeping the optimal folding segments
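One possible reading of the two crossover operators is sketched below. The interpretation is an assumption: the 2-point version swaps the block of angles between two random cut points, and the multiple-entries version swaps each angle independently with a hypothetical `swap_probability`.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points."""
    n = len(parent_a)
    start, end = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b


def multiple_entries_crossover(parent_a, parent_b, rng, swap_probability=0.5):
    """Swap several randomly chosen genes between the parents."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(parent_a)):
        if rng.random() < swap_probability:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

In both cases every gene position in the children still holds one angle from each parent, so well-folded segments can be passed on intact.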
  • 37. ►Adaptation Operator ‫ ٭‬Gradient search applied to each chromosome ‫ ٭‬Predict energy profile ►Benefits of Adaptation ‫ ٭‬Provide the local minima search ‫ ٭‬Determine the energy profile of the native folding process
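The adaptation step can be sketched as a gradient descent on the torsion angles of one chromosome. The finite-difference gradient, the step-halving rule, and the quadratic `toy_energy` are all assumptions for illustration; the slides do not say which gradient method GAPSS uses.

```python
def toy_energy(angles):
    """Hypothetical energy with a single minimum at 60 degrees per angle."""
    return sum((angle - 60.0) ** 2 for angle in angles)


def numerical_gradient(energy, angles, step=1e-3):
    """Central-difference estimate of dE/d(angle) for every torsion angle."""
    gradient = []
    for i in range(len(angles)):
        up, down = list(angles), list(angles)
        up[i] += step
        down[i] -= step
        gradient.append((energy(up) - energy(down)) / (2.0 * step))
    return gradient


def adapt(energy, angles, learning_rate=0.1, iterations=100):
    """Move a chromosome downhill toward the nearest local minimum."""
    current = list(angles)
    for _ in range(iterations):
        gradient = numerical_gradient(energy, current)
        candidate = [a - learning_rate * g for a, g in zip(current, gradient)]
        if energy(candidate) >= energy(current):
            learning_rate *= 0.5  # step overshot; try a smaller one next time
            continue
        current = candidate
    return current
```

Recording `energy(current)` at each iteration would give the kind of energy profile the slide mentions.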
  • 38. ► Free GA search – no restriction on dihedral angles, with the exception of omega and ring structures ٭ Advantages: usable in any protein search, an empirical way of obtaining protein conformation, and useful for energy profile search ► α-helix and β-sheet specific GA search – randomly selects segments of the protein as α-helices and β-sheets ٭ Advantages: enhances the speed of the free GA and gives an accurate search for α-helices and β-sheets ► Binary GA search – uses binary instead of decimal to represent dihedral angles ٭ Advantages: no barrier when doing crossover
  • 39. ►Creates α-helices and β-sheets of random lengths at random start positions ►Each α-helix or β-sheet created in this way is described by two parameters ►Crossover involves trading the two parameters between two individuals
  • 40. ►When α-helices are crossed over, each individual’s new energy is compared to its old energy. If there is a net improvement, the crossover is kept. ►The “former helix” regions will be filled with random torsion angles as normal (Figure: green region and blue region marked on the chromosome.)
  • 41. ►Transfer torsion angles to binary code ٭ Integer and decimal parts are coded separately to shorten the total number of digits: 17 digits altogether ►The idea is to represent the torsion angles on a single chromosome as one long continuous chain ٭ Crossover and mutation operators are all similar to GA ►Example: 10100101010010000101001110101100001 01011010100100001010010101001000010 10010101001010010100101010011100
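A 17-digit encoding with separately coded integer and decimal parts could be laid out as in the sketch below. The exact layout is not given in the slides, so the split used here (1 sign bit, 8 bits for the integer part, 8 bits for the hundredths) is purely hypothetical.

```python
BITS_PER_ANGLE = 17  # 1 sign bit + 8 integer bits + 8 fraction bits (assumed)


def encode_angle(angle):
    """Encode an angle in degrees, to two decimals, as a 17-character string."""
    sign = "1" if angle < 0 else "0"
    hundredths = round(abs(angle) * 100)
    whole, fraction = divmod(hundredths, 100)
    return sign + format(whole, "08b") + format(fraction, "08b")


def decode_angle(bits):
    """Invert encode_angle."""
    sign = -1.0 if bits[0] == "1" else 1.0
    whole = int(bits[1:9], 2)
    fraction = int(bits[9:17], 2)
    return sign * (whole + fraction / 100.0)


def encode_chromosome(angles):
    """Concatenate all torsion angles into one continuous binary chain."""
    return "".join(encode_angle(angle) for angle in angles)


def decode_chromosome(bits):
    return [decode_angle(bits[i:i + BITS_PER_ANGLE])
            for i in range(0, len(bits), BITS_PER_ANGLE)]
```

Under this layout six torsion angles occupy 6 × 17 = 102 digits, and a crossover cut may fall anywhere in the chain, including inside a single angle.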
  • 42.
  • 43. ►All single amino acids were predicted with GAPSS ►GA parameters ٭ Initial population: 20 ٭ Generation limit: 15 ٭ Percentage of mutations: 90% ►Compared to native single-AA folding
  • 44. Asparagine (N, Asn), Alanine (A, Ala), Aspartic Acid (D, Asp), Cysteine (C, Cys), Glutamine (Q, Gln), Glutamic Acid (E, Glu), Glycine (G, Gly), Isoleucine (I, Ile)
  • 45. Leucine (L, Leu), Serine (S, Ser), Methionine (M, Met), Valine (V, Val), Threonine (T, Thr)
  • 46. ►Enkephalin is a pentapeptide that is involved in regulating pain ►Two forms of enkephalin ٭ Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met ٭ Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu ►Short enough to confirm the accuracy of GAPSS, yet it still contains complex ring side groups
  • 47. ►Zero-gradient conformations suggest that GAPSS is capable of obtaining local minima ►Backbone conformations showed close similarity ►Side-group conformations still show discrepancies between predicted and theoretical
  • 48. ►GAPSS was able to locate a few local-minimum protein conformations
  • 49. ►Backbone structure was predicted by GAPSS (Figure: GA-predicted backbone structure and NMR-confirmed backbone structure.)
  • 50. ► Discrepancies between side groups are due to the lack of entropy and solvation energy terms and to the center partial charge assumption (Figure: GA-predicted backbone structure and NMR-confirmed backbone structure.)
  • 51. ► (a) The minimum energy of each generation for different initial populations, at a generation limit of 3 and 20% mutation ► (b) The minimum energy of each generation for different mutation percentages, at a generation limit of 10 and an initial population of 20 ► The optimal conditions were found to be an initial population of 30, a generation limit of 15, and a mutation percentage of 90%
  • 52. ► Progression of protein folding for the best prediction: the potential energy continues to decrease, suggesting that more stringent GA parameters could lead to the global minimum
  • 53. ►Due to limited computing capability, less stringent GA parameters were used ►The energy level of the predicted enkephalin structure is lower than the theoretical value; however, the code still shows the energy decreasing ►More sophisticated partial charge calculation and non-bonded energy terms could improve the prediction ►There are zero-gradient structures predicted by GAPSS
  • 54. ► GA-based search and optimization is a simple and efficient method for isolated native protein structure prediction ► Continuous decimal representation of dihedral angles is more efficient than binary representation, despite the crossover barriers ► α-helix and β-sheet search converges faster than free torsion angle search ► Backbone dihedrals predicted by the VBA GA are similar to those in the Protein Data Bank
  • 55. Chemical, Biological, and Materials Engineering Department, University of Oklahoma Advanced Design II
  • 56. ►Distance calculation from the origin

$$
\begin{aligned}
\Delta x &= R\cos(\pi - \theta_1) = -R\cos(\theta_1) \\
\Delta y &= R\sin(\pi - \theta_1)\cos(\beta_1) = R\sin(\theta_1)\cos(\beta_1) \\
\Delta z &= R\sin(\pi - \theta_1)\sin(\beta_1) = R\sin(\theta_1)\sin(\beta_1)
\end{aligned}
$$

$$
\begin{aligned}
\sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2}
&= \sqrt{\left(-R\cos(\theta_1)\right)^2 + \left(R\sin(\theta_1)\cos(\beta_1)\right)^2 + \left(R\sin(\theta_1)\sin(\beta_1)\right)^2} \\
&= \sqrt{\left(R\cos(\theta_1)\right)^2 + \sin^2(\theta_1)\,R^2\left(\cos^2(\beta_1) + \sin^2(\beta_1)\right)} \\
&= \sqrt{\left(R\cos(\theta_1)\right)^2 + \sin^2(\theta_1)\,R^2\,(1)} \\
&= \sqrt{\left(\cos^2(\theta_1) + \sin^2(\theta_1)\right)R^2} = \sqrt{R^2\,(1)} = R
\end{aligned}
$$

(Figure: x, y, z axes with angles θ and ω; atoms H (0, 0, 0), N (-1.04, 0, 0), and Cα (-1.52, 1.37, 0).)
  • 57. ►Rotate about one axis at a time to compensate for the bond angle and the dihedral angle; there is no rotation around y (Figure: rotation by θz about the z axis, with z = z’, and rotation by θx about the x axis, with x = x’.)
  • 58.
  • 59. θy is 0, so most of the trigonometric functions cancel
  • 60.

$$
\det(B_n) = \det\begin{pmatrix}
-\cos\theta_{ij} & -\sin\theta_{ij} & 0 & -r_{i-1,j}\cos\theta_{ij} \\
\sin\theta_{ij}\cos\omega_{ij} & -\cos\theta_{ij}\cos\omega_{ij} & -\sin\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\cos\omega_{ij} \\
\sin\theta_{ij}\sin\omega_{ij} & -\cos\theta_{ij}\sin\omega_{ij} & \cos\omega_{ij} & r_{i-1,j}\sin\theta_{ij}\sin\omega_{ij} \\
0 & 0 & 0 & 1
\end{pmatrix} = 1
$$

With R = r_{i-1,j}, applying B_n to the previous atom position gives

$$
\begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix}
= \det(B_n)\begin{pmatrix} x_{i-1} \\ y_{i-1} \\ z_{i-1} \end{pmatrix}
+ \begin{pmatrix} -R\cos(\theta_{ij}) \\ R\sin(\theta_{ij})\cos(\omega_{ij}) \\ R\sin(\theta_{ij})\sin(\omega_{ij}) \end{pmatrix}
$$

$$
\left\lVert \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} - \begin{pmatrix} x_{i-1} \\ y_{i-1} \\ z_{i-1} \end{pmatrix} \right\rVert^2
= \left\lVert \begin{pmatrix} -R\cos(\theta_{ij}) \\ R\sin(\theta_{ij})\cos(\omega_{ij}) \\ R\sin(\theta_{ij})\sin(\omega_{ij}) \end{pmatrix} \right\rVert^2
$$