1. Isoelectric point estimation using peptide descriptors and support vector machines
Y. Perez-Riverol, E. Audain, A. Millan, Y. Ramos, A. Sanchez, J. Vizcaíno, R. Wang,
M. Müller, Y. Machado, L. Betancourt, L. González, G. Padrón, V. Besada
Center for Genetic Engineering and Biotechnology, P.O. Box 6162, Havana 10600, Cuba
Contact email: yasset.perez@biocomp.cigb.edu.cu
Feature Selection Protocol & Complexity reduction
Introduction
IPG (Immobilized pH Gradient) based separations are frequently
used as the first step in shotgun proteomics methods; they increase
both the dynamic range and the resolution of peptide
separation prior to LC-MS analysis. Experimental isoelectric
point (pI) values can improve peptide identifications in conjunction
with MS/MS information [1]. Our group has previously reported the
possibility of identifying peptides and proteins theoretically, based
on different experimental properties [2]. Accurate estimation
of the pI value from the amino acid sequence is therefore critical
for these kinds of experiments. Currently, pI is commonly
predicted using the charge-state model [3] and/or the cofactor
algorithm [4]. However, neither of these methods is capable of
calculating the pI value for basic peptides accurately. In this
work, we present a new approach that can significantly
improve pI estimation by using Support Vector Machines
(SVM), an experimental amino acid descriptor taken from the
AAindex database [5] and the isoelectric point predicted by the
charge-state model.
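As a rough illustration of the charge-state idea mentioned above, the sketch below sums Henderson-Hasselbalch charges over the ionizable groups of a peptide and bisects for the pH at which the net charge is zero. The pK values are generic illustrative numbers chosen for this example; they are an assumption and are not the Bjellqvist values [3]. The function names are likewise hypothetical.

```python
# Illustrative pK values, assumed for this sketch only (NOT the Bjellqvist set).
POSITIVE_PK = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PK = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PK = 8.6
C_TERM_PK = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PK))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PK - ph))   # C-terminal carboxyl
    for aa in sequence:
        if aa in POSITIVE_PK:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK[aa]))
        elif aa in NEGATIVE_PK:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PK[aa] - ph))
    return charge


def charge_state_pi(sequence, tol=1e-4):
    """Bisect between pH 0 and 14 for the pH where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        # Net charge decreases monotonically with pH.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return round((low + high) / 2, 2)


acidic_pi = charge_state_pi("DDDD")
basic_pi = charge_state_pi("KKKK")
```

With these assumed constants an acidic peptide yields a low pI and a basic peptide a high one; the absolute values depend entirely on the pK set used.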
A correlation matrix of the predictors shows the correlation between all the calculated
descriptors. A subset of the most problematic descriptors (correlation > 0.7) was then removed.
Finally, the feature selection algorithm reduced the feature space from 555 to 44
descriptors.
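The correlation-based pruning described above might be sketched as follows. This is a minimal greedy filter and not necessarily the procedure used for the poster: the use of the absolute Pearson correlation, the keep-first ordering, and the names `pearson` and `drop_correlated` are all assumptions, and the descriptor values are toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(descriptors, threshold=0.7):
    """Keep a descriptor only if |r| <= threshold against every kept one."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor columns (made-up values, one entry per peptide).
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(descriptors)
```

Here `d2` is dropped because it is almost perfectly correlated with `d1`, while `d3` is retained.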
Four different SVM kernel functions with automated sigma estimation were evaluated using the
kernlab R package. The final model selects only two predictors: the isoelectric point
predicted with the Bjellqvist algorithm and the experimental AAindex descriptor from
Zimmerman.
Kernel Function    Number of Predictors    RMSD      R2
Polynomial         (not recoverable)       0.3387    0.97
Linear             20                      0.3866    0.96
Exponential        25 or 2 (ambiguous)     0.4       0.96
Radial             2                       0.32      0.98
Experimental data & processing protocol
Workflow: D. melanogaster Kc167 cells → Protein Extraction → Protein Digestion → OFFGEL (Off-gel Electrophoresis) → LTQ-FT-ICR → X!Tandem & Peptide Prophet → Peptide Identifications → Filter by Probability (Probability > 0.97, Non-PTMs) → High Probability Identifications
[Figure: MS/MS spectrum "4700 MS/MS Precursor 1570.7 Spec #1 MC[BP = 175.1, 3106]"; axes: Mass (m/z), % Intensity]
SVM algorithm vs Current algorithms
The theoretical and experimental values are more correlated in the 3.0–4.0 pH range. The average
standard deviation over the first five fractions was 0.26, 0.23 and 0.25 for the SVM model, the
charge-state model and the adjacent algorithm, respectively. Over the last five fractions, the average
standard deviation was 0.20, 0.52 and 0.32 for the SVM model, the charge-state model and the adjacent
algorithm, respectively.
Detection of possible false positive peptide identifications
[Figure: values by Probability (axis ticks 0.1 to 1); labels: Identified Peptides, Non-redundant Peptides, Non-redundant, % peptides]
Value series as they appear in the source:
211687, 33492, 15960, 11244, 9780, 9540, 10200, 11556, 16212, 4344
16893, 2791, 1330, 937, 815, 795, 850, 963, 1351, 362
0.2, 2.6, 5.9, 6.1, 9.3, 14.0, 16.4, 16.8, 22.6, 31.2
10, 34, 39, 33, 45, 68, 94, 113, 228, 86
References
[1] Cargile BJ, Stephenson JL, Jr. An Alternative to Tandem Mass Spectrometry: Isoelectric Point and Accurate Mass for the
Identification of Peptides. Anal Chem. 2004;76:267-75.
[2] Perez-Riverol Y, Sanchez A, Ramos Y, Schmidt A, Muller M, Betancourt L, et al. In silico analysis of accurate proteomics,
complemented by selective isolation of peptides. J Proteomics. 2011;74:2071-82.
[3] Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, et al. The focusing positions of polypeptides in immobilized
pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993;14:1023-31.
[4] Cargile BJ, Sevinsky JR, Essader AS, Eu JP, Stephenson JL, Jr. Calculation of the isoelectric point of tryptic peptides in the
pH 3.5-4.5 range based on adjacent amino acid effects. Electrophoresis. 2008;29:2768-78.
[5] Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress
report 2008. Nucleic Acids Res. 2008;36:D202-5.
Descriptors Estimation & isoelectric point calculator
Using pICalculator: Bjellqvist pI and Cargile pI
Physicochemical and biological properties from AAindex: PD = (∑AD)/NA
Using ChemAxon (http://www.chemaxon.com): refractivity index, polarizability, surface area, LogP
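The poster gives the AAindex-based descriptor as PD = (∑AD)/NA. Reading PD as the peptide-level descriptor, AD as the per-residue AAindex value and NA as the number of amino acids is an assumption here, as is the function name; the scale values below are placeholders and not real AAindex entries. Under that reading, a minimal sketch would be:

```python
def peptide_descriptor(sequence, aa_scale):
    """PD = (sum of per-residue values AD) / (number of residues NA)."""
    if not sequence:
        raise ValueError("empty peptide sequence")
    # A residue missing from the scale raises KeyError rather than being skipped.
    return sum(aa_scale[aa] for aa in sequence) / len(sequence)


# Placeholder scale for three residues (made-up numbers, not AAindex data).
toy_scale = {"A": 1.0, "D": 2.0, "K": 6.0}
pd_value = peptide_descriptor("ADKK", toy_scale)  # (1 + 2 + 6 + 6) / 4
```

Averaging rather than summing keeps the descriptor comparable across peptides of different lengths.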
Conclusion
We combined a SVM approach with only two simple peptide
descriptors to predict the isoelectric point of identified peptides, and
our results have shown better accuracy than the existing methods.
Furthermore, the ability of calculating the pI of peptides to this
accurate level is desirable for peptide pI filtering. We envisage that
the same approach could also be applied to predict the effect of
posttranslational modifications. The use of SVMs and the approach
described in this work could be useful for these types of analyses.
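Accuracy comparisons of this kind are usually expressed through RMSD and R2, the two metrics reported for the kernels. The sketch below shows one way to compute them from paired experimental and predicted pI values. Taking R2 as the squared Pearson correlation is an assumption, since the poster does not state which definition was used, and the numbers are made-up examples rather than results.

```python
from math import sqrt


def rmsd(experimental, predicted):
    """Root-mean-square deviation between paired pI values."""
    n = len(experimental)
    return sqrt(sum((e - p) ** 2 for e, p in zip(experimental, predicted)) / n)


def r_squared(experimental, predicted):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_e * var_p)


# Made-up pI values for illustration only.
experimental_pi = [4.0, 5.0, 6.0, 7.0]
predicted_pi = [4.1, 4.9, 6.2, 6.8]
error = rmsd(experimental_pi, predicted_pi)
fit = r_squared(experimental_pi, predicted_pi)
```

A lower RMSD and an R2 closer to 1 indicate closer agreement between predicted and experimental pI.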