• We propose an accurate potential that combines several useful features
• HP, HH and PP interactions among the amino acids
• Sequence-based accessibility obtained for each amino acid
• 3D structure-based properties, i.e. uPhi and uPsi
• The improved potential can be used for
• Protein-ligand binding site prediction
• Ab initio protein structure prediction
• Fold recognition
• Drug design and
• Enzyme design
• On the independent test sets, the proposed potential outperforms the compared state-of-the-art approaches in weighted-average native count.
• 3D structure prediction is useful in drug design and novel enzyme design.
• Energy functions can aid in
• Protein structure prediction and
• Fold recognition
• We propose the 3DIGARS3.0 potential for improved accuracy.
• We introduce two 3D structural features
• uPhi-based energy
• uPsi-based energy
• The motivation is that 3D structural features help improve accuracy.
• uPhi and uPsi are linearly combined with the prior energy components
• 3DIGARS energy, which is based on HP, HH and PP interactions and
their respective ideal gas reference states
• ASA energy, computed by modeling the error between real accessibility and
accessibility predicted from the protein sequence
• The linearly combined energies are optimized using a genetic algorithm (GA)
• Three decoy sets were used in optimization
• Moulder
• Rosetta and
• I-Tasser
• Five independent test decoy sets were used to evaluate the accuracy
• 4state_reduced
• fisa_casp3
• hg_structal
• ig_structal and
• ig_structal_hires
• 3DIGARS3.0 outperformed the state-of-the-art approaches on the independent test datasets
• DFIRE by 440.91%
• RWplus by 440.91%
• dDFIRE by 72.46%
• GOAP by 20.20%
• 3DIGARS by 417.39%
• 3DIGARS2.0 by 440.91%
• The percentage weighted average improvement (WA%) is calculated as the sum of (yi - xi) × 100 over all decoy sets divided by the sum of xi,
where yi represents the new value and xi the old value for decoy set i
Figure 1: (a) Native-like protein conformation, presented in a 3D hexagonal-close-packing (HCP) configuration
using hydrophobic (H) and hydrophilic or polar (P) residues. The H-H interaction space is relatively smaller
than the P-P interaction space, since hydrophobic residues (black balls) avoid water and tend to remain inside
the central space. (b) 3D metaphoric HP folding kernels, based on the HCP-configuration HP
model, showing the three layers of the amino-acid distribution.
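As a rough illustration of the HP model in Figure 1, the sketch below assigns each residue to the H or P class and then labels a residue pair as HH, PP or HP. The poster does not list which amino acids it treats as hydrophobic, so the `HYDROPHOBIC` set, and the function names, are hypothetical choices made only for this example.

```python
# ASSUMPTION: this hydrophobic set is illustrative only; the poster does not
# specify the H/P assignment used by 3DIGARS.
HYDROPHOBIC = frozenset("ACFGILMVWY")


def residue_class(residue: str) -> str:
    """Return 'H' for a hydrophobic one-letter residue code, else 'P'."""
    if len(residue) != 1 or not residue.isalpha():
        raise ValueError(f"expected a one-letter residue code, got {residue!r}")
    return "H" if residue.upper() in HYDROPHOBIC else "P"


def pair_group(res_a: str, res_b: str) -> str:
    """Label a residue pair as 'HH', 'PP' or 'HP' (order-independent)."""
    classes = residue_class(res_a) + residue_class(res_b)
    if classes in ("HH", "PP"):
        return classes
    return "HP"
```

With a grouping like this, each interacting pair can be routed to its own HH, PP or HP statistics, which is the separation the 3DIGARS core function relies on.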
Figure 5: Process flow of the design and development of 3DIGARS3.0 energy function.
• 3DIGARS potential
• Core statistical function based on HP, HH and PP interactions (see Fig. 1)
• Segregated ideal gas reference states and libraries for the HP, HH and PP groups
• Better training dataset (a 100% sequence identity cutoff can capture the natural frequency
distribution)
• Three shape parameters (αhp, αhh and αpp) control the shape of the assumed spherical protein
surface
• Three contribution parameters (βhp, βhh and βpp) control the contribution of each group
• 3DIGARS2.0 potential
• Integration of the core energy and sequence-specific features
• The sequence-specific feature is computed by modeling the error between the real and predicted
ASA (see Fig. 2)
• Real and predicted ASA are obtained from DSSP and REGAd3p, respectively
• 3DIGARS2.0 is a linearly weighted accumulation of 3DIGARS and mined ASA
• 3DIGARS3.0 potential
• Integration of the core energy, sequence-specific energy and 3D structural features (see Fig. 5)
• The added 3D structural features are derived from the uPhi and uPsi angles
• uPhi and uPsi are computed using the Cartesian coordinates of a set of 4 atoms (see Fig. 3 and 4)
• uPhi- and uPsi-based energies are computed in the following steps (see Fig. 4)
• The cosine range (-1 to 1) of the uPhi and uPsi angles is divided into 20 bins, each of
width 0.1
• Individual frequency tables for uPhi and uPsi are computed
• The frequency tables are then used to compute individual energy score libraries
• The energy scores are then used to compute the uPhi and uPsi energies for a given protein
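The binning and scoring steps for the uPhi and uPsi energies can be sketched as follows. The 20 bins of width 0.1 over the cosine range come from the poster. The conversion from frequencies to energy scores is not given there, so the negative log-ratio against a uniform reference with a pseudocount used below is an assumption, as are all function names.

```python
import math

N_BINS = 20  # from the poster: cos(angle) in [-1, 1] split into 20 bins of width 0.1


def cosine_bin(cos_value: float) -> int:
    """Map a cosine in [-1, 1] to a bin index 0..19 (1.0 goes to the last bin)."""
    if not -1.0 <= cos_value <= 1.0:
        raise ValueError("cosine must lie in [-1, 1]")
    return min(int((cos_value + 1.0) * N_BINS / 2.0), N_BINS - 1)


def frequency_table(cosines):
    """Count how many observed cosines fall into each bin."""
    counts = [0] * N_BINS
    for c in cosines:
        counts[cosine_bin(c)] += 1
    return counts


def energy_library(counts, pseudocount=1.0):
    """ASSUMPTION: score = -ln(observed / expected) with a uniform expectation.

    The pseudocount keeps empty bins finite; the poster does not state
    how its energy score libraries are derived from the frequency tables.
    """
    total = sum(counts) + pseudocount * N_BINS
    expected = total / N_BINS
    return [-math.log((n + pseudocount) / expected) for n in counts]


def angle_energy(cosines, library):
    """Sum the library scores of all angle cosines observed in one protein."""
    return sum(library[cosine_bin(c)] for c in cosines)
```

The same functions would be run twice, once on uPhi cosines and once on uPsi cosines, giving two separate libraries and two energy terms.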
• Protein folding and structure prediction problems rely on an accurate energy function.
• The accuracy of the potential function depends on
• Interaction distance between atom pairs
• Hydrophobic (H) and hydrophilic (P) properties
• Sequence-specific information
• Orientation-dependent interactions and
• Optimization techniques
• We develop a potential function that is an optimized, linearly weighted accumulation of
• 3-Dimensional Ideal Gas Reference State based Energy Function (3DIGARS)
• It is formulated using the HP, HH and PP properties of amino acids
• Mined accessible surface area (ASA) and
• Ubiquitously computed Phi (uPhi) and Psi (uPsi) energies
• Optimization is performed using a Genetic Algorithm (GA).
• On the independent test datasets, the proposed energy function significantly outperformed state-of-the-art approaches.
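To make the GA step more concrete, here is a minimal sketch of a genetic algorithm that searches for the three weights of the linear combination. The poster only states that a GA optimizes the combined energies on decoy sets; the fitness (number of targets whose native has the lowest energy), the population size, the crossover and mutation operators, and the toy data are all assumptions made for illustration.

```python
import random


def native_count(weights, targets):
    """Count targets whose native scores strictly below every decoy.

    Each structure is a tuple (E_3DIGARS, E_ASA, E_uPhi, E_uPsi).
    """
    w1, w2, w3 = weights

    def energy(t):
        return t[0] + w1 * t[1] + w2 * t[2] + w3 * t[3]

    return sum(
        1
        for native, decoys in targets
        if all(energy(native) < energy(d) for d in decoys)
    )


def optimize_weights(targets, pop_size=30, generations=40, seed=0):
    """Toy elitist GA over non-negative weights (w1, w2, w3)."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 5.0) for _ in range(3)] for _ in range(pop_size)]
    population[0] = [0.0, 0.0, 0.0]  # keep the unweighted baseline in the pool
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: native_count(w, targets), reverse=True)
        elite = ranked[: pop_size // 5]  # survivors are copied unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(3)
            child[i] = max(0.0, child[i] + rng.gauss(0.0, 0.5))  # Gaussian mutation
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: native_count(w, targets))


# Hypothetical toy data: two targets, each a native plus its decoys.
toy_targets = [
    ((-5.0, 0.2, -1.0, -1.0), [(-6.0, 0.1, 1.0, 1.0), (-5.5, 0.3, 0.5, 0.8)]),
    ((-3.0, 0.1, -0.5, -0.7), [(-3.5, 0.2, 0.4, 0.6)]),
]
best_weights = optimize_weights(toy_targets)
```

Because the elite is carried over unchanged, the best fitness never decreases between generations, so the result is at least as good as the unweighted baseline on the training data.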
An Eclectic Energy Function to Discriminate Native From Decoys
Avdesh Mishra, Sumaiya Iqbal, Md Tamjidul Hoque
email: {amishra2, siqbal1, thoque}@uno.edu
Department of Computer Science, University of New Orleans, New Orleans, LA, USA
Poster sections: Introduction, Methods, Results, Discussions, Conclusions, Acknowledgements
Figure 4: (a) Arrangement of the atoms and the vectors created from their
Cartesian coordinates. (b) The dihedral angle ϴ involving the
four atoms.
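$$E_{3DIGARS} = \beta_{hp} E_{HP} + \beta_{hh} E_{HH} + \beta_{pp} E_{PP}$$

$$E_{3DIGARS2.0} = E_{3DIGARS} + (w \times E_{ASA})$$

$$E_{3DIGARS3.0} = E_{3DIGARS} + (w_1 \times E_{ASA}) + (w_2 \times E_{uPhi}) + (w_3 \times E_{uPsi})$$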
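The linear combination that defines 3DIGARS3.0 (core 3DIGARS energy plus weighted ASA, uPhi and uPsi terms) is simple enough to write down directly. The class and function names below are hypothetical, and the numeric values are made up purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Per-structure energy components (names chosen for this sketch)."""
    e_3digars: float
    e_asa: float
    e_uphi: float
    e_upsi: float


def total_energy(terms: EnergyTerms, w1: float, w2: float, w3: float) -> float:
    """E_3DIGARS3.0 = E_3DIGARS + w1*E_ASA + w2*E_uPhi + w3*E_uPsi."""
    return terms.e_3digars + w1 * terms.e_asa + w2 * terms.e_uphi + w3 * terms.e_upsi


example = EnergyTerms(e_3digars=-10.0, e_asa=2.0, e_uphi=-1.0, e_upsi=0.5)
combined = total_energy(example, w1=0.5, w2=2.0, w3=4.0)  # -10 + 1 - 2 + 2
```

Setting `w2` and `w3` to zero reduces the expression to the 3DIGARS2.0 form, and setting all three weights to zero recovers the core 3DIGARS energy.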
Figure 3: Definition of the angle ϴ formed by four atoms (At1, At2, At3 and At4). uPhi is
computed using At1 belonging to one residue and a set of atoms (At2, At3, At4) belonging
to other residues. Similarly, uPsi is computed using a set of atoms (At1, At2, At3)
belonging to some residues and an atom At4 belonging to another residue.
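The angle ϴ defined by four atoms can be obtained from their Cartesian coordinates with the standard dihedral construction: build the three bond vectors, take the normals of the two planes they span, and measure the angle between the normals. The sketch below returns the cosine directly, since the method bins cosine values. Whether the authors construct their vectors in exactly this way is an assumption, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_cosine(at1, at2, at3, at4):
    """Cosine of the dihedral angle defined by four points in 3D."""
    b1 = _sub(at2, at1)
    b2 = _sub(at3, at2)
    b3 = _sub(at4, at3)
    n1 = _cross(b1, b2)  # normal of the plane through At1, At2, At3
    n2 = _cross(b2, b3)  # normal of the plane through At2, At3, At4
    norm = math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    if norm == 0.0:
        raise ValueError("three consecutive atoms are collinear")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return max(-1.0, min(1.0, _dot(n1, n2) / norm))
```

For uPhi, At1 would come from one residue and At2 to At4 from others; for uPsi, At1 to At3 form the set and At4 comes from another residue, as described in the Figure 3 caption.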
Figure 2: The dark central area, composed of atoms, can be thought of as a 3D protein,
and the green and red outlines around it can be thought of as the real and predicted
accessible surface area, respectively. The error between real and predicted ASA is
modelled as an energy feature.
Table 1: Performance comparison of different energy functions on optimization datasets based on correct native count.

| Decoy set (no. of targets) | DFIRE | RWplus | dDFIRE | GOAP | 3DIGARS | 3DIGARS2.0 | 3DIGARS3.0 |
|---|---|---|---|---|---|---|---|
| Moulder (20) | 19 (-2.97) | 19 (-2.84) | 18 (-2.74) | 19 (-3.58) | 19 (-2.99) | 19 (-2.68) | 20 (-3.851) |
| Rosetta (58) | 20 (-1.82) | 20 (-1.47) | 12 (-0.83) | 45 (-3.70) | 31 (-2.023) | 49 (-2.987) | 46 (-2.683) |
| I-Tasser (56) | 49 (-4.02) | 56 (-5.77) | 48 (-5.03) | 45 (-5.36) | 53 (-4.036) | 56 (-4.296) | 56 (-5.573) |
| Weighted average in % | 38.64 | 28.42 | 56.41 | 11.93 | 18.45 | -1.61 | - |

Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best scores.
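Each table entry pairs a native count with a z-score. A minimal sketch of how such values are commonly computed for one target is given below. The poster does not define its z-score, so the form used here (native energy minus the mean decoy energy, divided by the population standard deviation of the decoy energies) is an assumption, as are the function names.

```python
import statistics


def native_ranked_first(native_energy, decoy_energies):
    """True if the native has strictly lower energy than every decoy."""
    return all(native_energy < e for e in decoy_energies)


def z_score(native_energy, decoy_energies):
    """ASSUMPTION: z = (E_native - mean(E_decoys)) / std(E_decoys)."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0.0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / spread


decoys = [-8.0, -6.0, -4.0]
native = -12.0
picked = native_ranked_first(native, decoys)
z = z_score(native, decoys)
```

Under this convention a more negative z-score means the native is separated more clearly from the decoys, and the native count for a decoy set is the number of targets for which `native_ranked_first` holds.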
Table 2: Performance comparison of different energy functions on independent test datasets based on correct native count.

| Decoy set (no. of targets) | DFIRE | RWplus | dDFIRE | GOAP | 3DIGARS | 3DIGARS2.0 | 3DIGARS3.0 |
|---|---|---|---|---|---|---|---|
| 4state_reduced (7) | 6 (-3.48) | 6 (-3.51) | 7 (-4.15) | 7 (-4.38) | 6 (-3.371) | 4 (-2.642) | 7 (-3.456) |
| fisa_casp3 (5) | 4 (-4.80) | 4 (-5.17) | 4 (-4.83) | 5 (-5.27) | 5 (-4.319) | 5 (-4.682) | 4 (-4.076) |
| hg_structal (29) | 12 (-1.97) | 12 (-1.74) | 16 (-1.33) | 22 (-2.73) | 12 (-1.914) | 12 (-1.589) | 28 (-3.678) |
| ig_structal (61) | 0 (0.92) | 0 (1.11) | 26 (-1.02) | 47 (-1.62) | 0 (0.645) | 0 (0.268) | 60 (-2.526) |
| ig_structal_hires (20) | 0 (0.17) | 0 (0.32) | 16 (-2.05) | 18 (-2.35) | 0 (-0.002) | 1 (0.030) | 20 (-2.378) |
| Weighted average in % | 440.91 | 440.91 | 72.46 | 20.20 | 417.39 | 440.91 | - |
Legend: Entry format is native-count (z-score). Bold indicates best scores. Underscore indicates close to best scores.

$$WA\% = \frac{\sum_{i=1}^{n} (y_i - x_i) \times 100}{\sum_{i=1}^{n} x_i}$$
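The weighted average improvement can be checked against the native counts in Table 2. The sketch below takes y_i as the 3DIGARS3.0 count and x_i as the compared method's count for each decoy set; the function name is illustrative, and the counts are copied from the table.

```python
def weighted_average_improvement(new_counts, old_counts):
    """WA% = 100 * sum(y_i - x_i) / sum(x_i) over all decoy sets."""
    if len(new_counts) != len(old_counts):
        raise ValueError("count lists must have the same length")
    total_old = sum(old_counts)
    if total_old == 0:
        raise ValueError("old counts sum to zero; improvement is undefined")
    gain = sum(y - x for y, x in zip(new_counts, old_counts))
    return 100.0 * gain / total_old


# Native counts from Table 2, in the order: 4state_reduced, fisa_casp3,
# hg_structal, ig_structal, ig_structal_hires.
counts_3digars3 = [7, 4, 28, 60, 20]
table2_counts = {
    "DFIRE": [6, 4, 12, 0, 0],
    "RWplus": [6, 4, 12, 0, 0],
    "dDFIRE": [7, 4, 16, 26, 16],
    "GOAP": [7, 5, 22, 47, 18],
    "3DIGARS": [6, 5, 12, 0, 0],
    "3DIGARS2.0": [4, 5, 12, 0, 1],
}
improvements = {
    name: round(weighted_average_improvement(counts_3digars3, old), 2)
    for name, old in table2_counts.items()
}
```

Running this reproduces the last row of Table 2, for example 440.91 for DFIRE (97 additional natives over a baseline of 22) and 20.2 for GOAP.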
We gratefully acknowledge the Louisiana Board of Regents
through the Board of Regents Support Fund, LEQSF (2013-16)-RD-A-19.
