Isoelectric point estimation using peptide descriptors and support vector machines
Y. Perez-Riverol, E. Audain, A. Millan, Y. Ramos, A. Sanchez, J. Vizcaíno, R. Wang,
M. Müller, Y. Machado, L. Betancourt, L. González, G. Padrón, V. Besada
Center for Genetic Engineering and Biotechnology, P.O. Box 6162, Havana 10600, Cuba
Contact email: yasset.perez@biocomp.cigb.edu.cu

Feature Selection Protocol & Complexity reduction

Introduction
Immobilized pH gradient (IPG) based separations are frequently
used as the first step in shotgun proteomics methods; they
increase both the dynamic range and the resolution of peptide
separation prior to LC-MS analysis. Experimental isoelectric
point (pI) values can improve peptide identifications in conjunction
with MS/MS information [1]. Our group has previously reported the
possibility of identifying peptides and proteins theoretically, based
on different experimental properties [2]. Thus, accurate estimation
of the pI value from the amino acid sequence becomes critical
for these kinds of experiments. Currently, pI is commonly
predicted using the charge-state model [3] and/or the cofactor
algorithm [4]. However, neither of these methods can accurately
calculate the pI value of basic peptides. In this
work, we present a new approach that can significantly
improve pI estimation by using Support Vector Machines
(SVM), an experimental amino acid descriptor taken from the
AAindex database [5], and the isoelectric point predicted by the
charge-state model.
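
As a rough illustration of the charge-state model mentioned above, the sketch below finds the pH at which the summed Henderson-Hasselbalch charges of a peptide's ionizable groups cancel. The pK values and the function names are assumptions chosen for illustration only; they are not the Bjellqvist scale [3], which uses its own experimentally derived, position-dependent values.

```python
# Illustrative pK values only (assumed, textbook-style). They are NOT the
# Bjellqvist values used in the poster.
POSITIVE_PK = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PK = {"Cterm": 2.0, "D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Both termini plus every residue; non-ionizable residues are skipped.
    groups = ["Nterm", "Cterm"] + list(sequence)
    charge = 0.0
    for group in groups:
        if group in POSITIVE_PK:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PK[group]))
        elif group in NEGATIVE_PK:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PK[group] - ph))
    return charge


def charge_state_pi(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection search for the pH where the net charge crosses zero."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged or neutral: pI lies at lower pH
    return (low + high) / 2.0
```

Because the net charge decreases monotonically with pH, bisection is sufficient here; a value computed this way could serve as one input feature to a regression model.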


The correlation matrix of the predictors shows the correlation between all the calculated
descriptors. A subset of highly correlated descriptors (cor > 0.7) was then removed.
Finally, the feature selection algorithm reduced the feature space from 555 to 44
descriptors.
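
The correlation-based pruning step can be sketched as follows. This is a minimal greedy filter under stated assumptions: the poster does not say which member of a correlated pair was discarded, so this version simply keeps the descriptor seen first. The function names and the toy data are hypothetical; only the 0.7 cutoff comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant descriptor carries no correlation signal
    return cov / (sd_x * sd_y)


def drop_correlated(descriptors, cutoff=0.7):
    """Keep a descriptor only if |cor| <= cutoff with every one kept so far.

    `descriptors` maps a descriptor name to its values across peptides.
    Greedy, order-dependent choice (an assumption, not the poster's rule).
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: "b" is almost a multiple of "a" and gets dropped.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(toy)
```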
Kernel function   Number of predictors   RMSD     R2
Polynomial        25                     0.3387   0.97
Linear            20                     0.3866   0.96
Exponential       2                      0.4      0.96
Radial            2                      0.32     0.98

Four different SVM kernel functions with automated sigma estimation were evaluated
using the kernlab R package. The final model selects only two predictors: the
isoelectric point predicted with the Bjellqvist algorithm and the experimental
AAindex descriptor from Zimmerman.

Experimental data & processing protocol
D. melanogaster Kc167 cells
Protein Extraction
Protein Digestion
Off-gel Electrophoresis (OFFGEL)
LTQ-FT-ICR
X!Tandem & Peptide Prophet
Peptide Identifications
Filter by Probability (Probability > 0.97, Non-PTMs)
High Probability Identifications
Descriptors Estimation & isoelectric point calculator

[Figure: MS/MS spectrum, 4700 MS/MS Precursor 1570.7; axes: % Intensity vs. Mass (m/z)]

SVM algorithm vs Current algorithms
[Figure: panels (A), (B), (C)]

The theoretical and experimental values are more correlated in the 3.0–4.0 pH range. The average
standard deviation over the first five fractions was 0.26, 0.23 and 0.25 for the SVM model, the
charge-state model and the adjacent algorithm, respectively. Over the last five fractions, the average
standard deviation was 0.20, 0.52 and 0.32 for the SVM model, the charge-state model and the adjacent
algorithm, respectively.
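
The per-fraction statistic quoted above can be sketched as below: the standard deviation of the predicted pI values is computed within each fraction and then averaged over a group of fractions. The use of the sample standard deviation, the function name and the example numbers are assumptions; the poster does not state these details.

```python
from statistics import mean, stdev


def average_fraction_stdev(fractions):
    """Average, over fractions, of the sample standard deviation of predicted pI.

    `fractions` is a list of lists: the predicted pI values of the peptides
    identified in each fraction. Fractions with fewer than two peptides are
    skipped because a sample standard deviation is undefined for them.
    """
    return mean(stdev(pis) for pis in fractions if len(pis) > 1)


# Hypothetical predicted pI values for two fractions (not poster data).
example = [[4.0, 4.2, 4.4], [5.0, 5.0, 5.0]]
average = average_fraction_stdev(example)
```

A lower average indicates that the peptides found in the same fraction receive more similar predicted pI values, which is what a good predictor should produce.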

[Figure: Detection of possible false positive peptide identifications; x-axis: Probability]

References
[1] Cargile BJ, Stephenson JL, Jr. An Alternative to Tandem Mass Spectrometry: Isoelectric Point and Accurate Mass for the
Identification of Peptides. Anal Chem. 2004;76:267-75.
[2] Perez-Riverol Y, Sanchez A, Ramos Y, Schmidt A, Muller M, Betancourt L, et al. In silico analysis of accurate proteomics,
complemented by selective isolation of peptides. J Proteomics. 2011;74:2071-82.
[3] Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, et al. The focusing positions of polypeptides in immobilized
pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993;14:1023-31.
[4] Cargile BJ, Sevinsky JR, Essader AS, Eu JP, Stephenson JL, Jr. Calculation of the isoelectric point of tryptic peptides in the
pH 3.5-4.5 range based on adjacent amino acid effects. Electrophoresis. 2008;29:2768-78.
[5] Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress
report 2008. Nucleic Acids Res. 2008;36:D202-5.

Using pICalculator: Bjellqvist pI and Cargile pI
Using ChemAxon (http://www.chemaxon.com): refractivity index, polarizability, surface area, LogP
Physicochemical and biological properties from AAindex: PD = (∑AD)/NA
Conclusion
We combined an SVM approach with only two simple peptide
descriptors to predict the isoelectric point of identified peptides, and
our results show better accuracy than the existing methods.
Furthermore, the ability to calculate the pI of peptides at this
level of accuracy is desirable for peptide pI filtering. We envisage that
the same approach could also be applied to predict the effect of
posttranslational modifications on the pI.

More Related Content

What's hot

Integrative bioinformatics analysis of Parkinson's disease related omics data
Integrative bioinformatics analysis of Parkinson's disease related omics dataIntegrative bioinformatics analysis of Parkinson's disease related omics data
Integrative bioinformatics analysis of Parkinson's disease related omics dataEnrico Glaab
 
Nc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologyNc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologySean Ekins
 
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...Premier Publishers
 
Arrays and alternative splicing
Arrays and alternative splicingArrays and alternative splicing
Arrays and alternative splicingAnn Loraine
 
The Utility of H/DX-MS in Biopharmaceutical Comparability Studies
The Utility of H/DX-MS in Biopharmaceutical Comparability StudiesThe Utility of H/DX-MS in Biopharmaceutical Comparability Studies
The Utility of H/DX-MS in Biopharmaceutical Comparability StudiesAbhijeet Lokras
 
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016Jatinder Singh, PhD, ERT.
 
PCB Amperometry_NBTS 2012
PCB Amperometry_NBTS 2012PCB Amperometry_NBTS 2012
PCB Amperometry_NBTS 2012Jenna Fielding
 
Nanoparticle of plant extract: A Novel approach for cancer therap
Nanoparticle of plant extract: A Novel approach for cancer therapNanoparticle of plant extract: A Novel approach for cancer therap
Nanoparticle of plant extract: A Novel approach for cancer theraproshan telrandhe
 
chlorophyll mutation
 chlorophyll mutation  chlorophyll mutation
chlorophyll mutation surehuasb
 
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...Physical and Structural Characterization of Biofield Treated Imidazole Deriva...
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...albertdivis
 
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big Data
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big DataCIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big Data
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big DataDaniel Russo
 
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010TigerTox
 
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]Dylan Easterday
 
CYP121 Drug Discovery (M. tuberculosis)
CYP121 Drug Discovery (M. tuberculosis)CYP121 Drug Discovery (M. tuberculosis)
CYP121 Drug Discovery (M. tuberculosis)Anthony Coyne
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverySean Ekins
 
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...Discrimination of symbiotic/parasitic bacterial type III secretion system eff...
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...Y-h Taguchi
 
Thesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINALThesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINALQuang Ong
 

What's hot (20)

Integrative bioinformatics analysis of Parkinson's disease related omics data
Integrative bioinformatics analysis of Parkinson's disease related omics dataIntegrative bioinformatics analysis of Parkinson's disease related omics data
Integrative bioinformatics analysis of Parkinson's disease related omics data
 
Nc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational ToxicologyNc state lecture v2 Computational Toxicology
Nc state lecture v2 Computational Toxicology
 
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...
Study of Chlorophyll Mutations and Chlorophyll Content in Young Oil Palm (Ela...
 
Arrays and alternative splicing
Arrays and alternative splicingArrays and alternative splicing
Arrays and alternative splicing
 
The Utility of H/DX-MS in Biopharmaceutical Comparability Studies
The Utility of H/DX-MS in Biopharmaceutical Comparability StudiesThe Utility of H/DX-MS in Biopharmaceutical Comparability Studies
The Utility of H/DX-MS in Biopharmaceutical Comparability Studies
 
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016
Misner Kauss Singh et al NAMPT cardiotox CardioVascular Tox 2016
 
PCB Amperometry_NBTS 2012
PCB Amperometry_NBTS 2012PCB Amperometry_NBTS 2012
PCB Amperometry_NBTS 2012
 
Nanoparticle of plant extract: A Novel approach for cancer therap
Nanoparticle of plant extract: A Novel approach for cancer therapNanoparticle of plant extract: A Novel approach for cancer therap
Nanoparticle of plant extract: A Novel approach for cancer therap
 
chlorophyll mutation
 chlorophyll mutation  chlorophyll mutation
chlorophyll mutation
 
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...Physical and Structural Characterization of Biofield Treated Imidazole Deriva...
Physical and Structural Characterization of Biofield Treated Imidazole Deriva...
 
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big Data
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big DataCIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big Data
CIIProCluster: Developing Read-Across Predictive Toxicity Models Using Big Data
 
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010
Fundamentals Of Genetic Toxicology In The Pharmaceutical Industry Sept 2010
 
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]
Effect of Food Source on Enzymatic Activity in C. maculatus [draft 2]
 
CYP121 Drug Discovery (M. tuberculosis)
CYP121 Drug Discovery (M. tuberculosis)CYP121 Drug Discovery (M. tuberculosis)
CYP121 Drug Discovery (M. tuberculosis)
 
Yasset perezriverol csi2011
Yasset perezriverol csi2011Yasset perezriverol csi2011
Yasset perezriverol csi2011
 
SBR Final Presentaion
SBR Final PresentaionSBR Final Presentaion
SBR Final Presentaion
 
dual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discoverydual-event machine learning models to accelerate drug discovery
dual-event machine learning models to accelerate drug discovery
 
2906.full
2906.full2906.full
2906.full
 
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...Discrimination of symbiotic/parasitic bacterial type III secretion system eff...
Discrimination of symbiotic/parasitic bacterial type III secretion system eff...
 
Thesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINALThesis defense - QUANG ONG - FINAL
Thesis defense - QUANG ONG - FINAL
 

Viewers also liked

PlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptPlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptgrssieee
 
Jesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...Videoguy
 
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSTH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSgrssieee
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementHarshal Ladhe
 
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...CSCJournals
 

Viewers also liked (9)

PlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.pptPlenaryAwards Presentation 2011.ppt
PlenaryAwards Presentation 2011.ppt
 
Sport tandem
Sport tandemSport tandem
Sport tandem
 
Jesse Sampermans Research Project Presentation
Jesse Sampermans Research Project PresentationJesse Sampermans Research Project Presentation
Jesse Sampermans Research Project Presentation
 
test.pptx
test.pptxtest.pptx
test.pptx
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...
 
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONSTH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
TH1.L10.1: TANDEM-X: SCIENTIFIC CONTRIBUTIONS
 
Gomezetal ismir2012
Gomezetal ismir2012Gomezetal ismir2012
Gomezetal ismir2012
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancement
 
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
Instantaneous Frequency Estimation Based On Time-Varying Auto Regressive Mode...
 

Similar to Yasset iso point-cigb-2012

An untargeted metabolomics approach to MRS in the human brain: a comparison b...
An untargeted metabolomics approach to MRS in the human brain: a comparison b...An untargeted metabolomics approach to MRS in the human brain: a comparison b...
An untargeted metabolomics approach to MRS in the human brain: a comparison b...Uzay Emir
 
Methods for Protein Sequencing.pdf
Methods for Protein Sequencing.pdfMethods for Protein Sequencing.pdf
Methods for Protein Sequencing.pdfCreative Proteomics
 
Three Methods for Protein Sequencing
Three Methods for Protein SequencingThree Methods for Protein Sequencing
Three Methods for Protein SequencingCreative Proteomics
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutationElsa von Licy
 
Analytical method development and validation
Analytical method development and validationAnalytical method development and validation
Analytical method development and validationCreative Peptides
 
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...Simone Brogi
 
1.proteomics coursework-3 dec2012-aky
1.proteomics coursework-3 dec2012-aky1.proteomics coursework-3 dec2012-aky
1.proteomics coursework-3 dec2012-akyAmit Yadav
 
consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...Deepak Rohilla
 
Metabomeeting2008_rev230408-Jack-parag-final1
Metabomeeting2008_rev230408-Jack-parag-final1Metabomeeting2008_rev230408-Jack-parag-final1
Metabomeeting2008_rev230408-Jack-parag-final1Shahid Malik
 
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...Rafael Casiano
 
Development, safety and efficacy analysis of liquid state rabies
Development, safety and efficacy analysis of liquid state rabiesDevelopment, safety and efficacy analysis of liquid state rabies
Development, safety and efficacy analysis of liquid state rabiesBalaganesh Kuruba
 
Advances in Breast Tumor Biomarker Discovery Methods
Advances in Breast Tumor Biomarker Discovery MethodsAdvances in Breast Tumor Biomarker Discovery Methods
Advances in Breast Tumor Biomarker Discovery MethodsThermo Fisher Scientific
 
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...QIAGEN
 

Similar to Yasset iso point-cigb-2012 (20)

MpM
MpMMpM
MpM
 
An untargeted metabolomics approach to MRS in the human brain: a comparison b...
An untargeted metabolomics approach to MRS in the human brain: a comparison b...An untargeted metabolomics approach to MRS in the human brain: a comparison b...
An untargeted metabolomics approach to MRS in the human brain: a comparison b...
 
Methods for Protein Sequencing.pdf
Methods for Protein Sequencing.pdfMethods for Protein Sequencing.pdf
Methods for Protein Sequencing.pdf
 
Three Methods for Protein Sequencing
Three Methods for Protein SequencingThree Methods for Protein Sequencing
Three Methods for Protein Sequencing
 
Q biomarkersomaticmutation
Q biomarkersomaticmutationQ biomarkersomaticmutation
Q biomarkersomaticmutation
 
Shape Signatures Light
Shape Signatures LightShape Signatures Light
Shape Signatures Light
 
Analytical method development and validation
Analytical method development and validationAnalytical method development and validation
Analytical method development and validation
 
defense 2.0
defense 2.0defense 2.0
defense 2.0
 
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...
PROGRAM PHASE IN LIGAND-BASED PHARMACOPHORE MODEL GENERATION AND 3D DATABASE ...
 
1.proteomics coursework-3 dec2012-aky
1.proteomics coursework-3 dec2012-aky1.proteomics coursework-3 dec2012-aky
1.proteomics coursework-3 dec2012-aky
 
consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...consensus superiority of the pharmacophore based alignment, over maximum comm...
consensus superiority of the pharmacophore based alignment, over maximum comm...
 
Metabomeeting2008_rev230408-Jack-parag-final1
Metabomeeting2008_rev230408-Jack-parag-final1Metabomeeting2008_rev230408-Jack-parag-final1
Metabomeeting2008_rev230408-Jack-parag-final1
 
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...
A novel platform for in situ, multiomic, hyper-plexed analyses of systems bio...
 
Austin Neurology & Neurosciences
Austin Neurology & NeurosciencesAustin Neurology & Neurosciences
Austin Neurology & Neurosciences
 
JPR2010_TDMB
JPR2010_TDMBJPR2010_TDMB
JPR2010_TDMB
 
Development, safety and efficacy analysis of liquid state rabies
Development, safety and efficacy analysis of liquid state rabiesDevelopment, safety and efficacy analysis of liquid state rabies
Development, safety and efficacy analysis of liquid state rabies
 
Advances in Breast Tumor Biomarker Discovery Methods
Advances in Breast Tumor Biomarker Discovery MethodsAdvances in Breast Tumor Biomarker Discovery Methods
Advances in Breast Tumor Biomarker Discovery Methods
 
HPLC2005
HPLC2005HPLC2005
HPLC2005
 
Sirm core2 (2)
Sirm core2 (2)Sirm core2 (2)
Sirm core2 (2)
 
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
Rapid and accurate Cancer somatic mutation profiling with the qBiomarker Soma...
 

More from Yasset Perez-Riverol

Biocontainers 2019: Presentation for the ELIXIR All Hands
Biocontainers 2019: Presentation for the ELIXIR All HandsBiocontainers 2019: Presentation for the ELIXIR All Hands
Biocontainers 2019: Presentation for the ELIXIR All HandsYasset Perez-Riverol
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesYasset Perez-Riverol
 
Systematic integration of millions of peptidoform evidences into Ensembl and ...
Systematic integration of millions of peptidoform evidences into Ensembl and ...Systematic integration of millions of peptidoform evidences into Ensembl and ...
Systematic integration of millions of peptidoform evidences into Ensembl and ...Yasset Perez-Riverol
 
Biocontainers Hackathon Introduction
Biocontainers Hackathon IntroductionBiocontainers Hackathon Introduction
Biocontainers Hackathon IntroductionYasset Perez-Riverol
 
BioContainers on ELIXIR All Hands 2017
BioContainers on ELIXIR All Hands 2017BioContainers on ELIXIR All Hands 2017
BioContainers on ELIXIR All Hands 2017Yasset Perez-Riverol
 
OpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scaleOpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scaleYasset Perez-Riverol
 
Do we need to make public our proteomics data?
Do we need to make public our proteomics data?Do we need to make public our proteomics data?
Do we need to make public our proteomics data?Yasset Perez-Riverol
 
Design of an hexapeptide database for proteomics studies
Design of an hexapeptide database for proteomics studiesDesign of an hexapeptide database for proteomics studies
Design of an hexapeptide database for proteomics studiesYasset Perez-Riverol
 
Parallel conformational search of small molecules
Parallel conformational search of small moleculesParallel conformational search of small molecules
Parallel conformational search of small moleculesYasset Perez-Riverol
 
Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesYasset Perez-Riverol
 
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
PRIDE and ProteomeXchange – Making proteomics data accessible and reusablePRIDE and ProteomeXchange – Making proteomics data accessible and reusable
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable Yasset Perez-Riverol
 
SintCompound: A Small Compound Database for Virtual Screening
SintCompound: A Small Compound Database for Virtual ScreeningSintCompound: A Small Compound Database for Virtual Screening
SintCompound: A Small Compound Database for Virtual ScreeningYasset Perez-Riverol
 

More from Yasset Perez-Riverol (14)

Introduction to Proteogenomics
Introduction to Proteogenomics Introduction to Proteogenomics
Introduction to Proteogenomics
 
Biocontainers 2019: Presentation for the ELIXIR All Hands
Biocontainers 2019: Presentation for the ELIXIR All HandsBiocontainers 2019: Presentation for the ELIXIR All Hands
Biocontainers 2019: Presentation for the ELIXIR All Hands
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome Coordinates
 
Systematic integration of millions of peptidoform evidences into Ensembl and ...
Systematic integration of millions of peptidoform evidences into Ensembl and ...Systematic integration of millions of peptidoform evidences into Ensembl and ...
Systematic integration of millions of peptidoform evidences into Ensembl and ...
 
Biocontainers Hackathon Introduction
Biocontainers Hackathon IntroductionBiocontainers Hackathon Introduction
Biocontainers Hackathon Introduction
 
BioContainers on ELIXIR All Hands 2017
BioContainers on ELIXIR All Hands 2017BioContainers on ELIXIR All Hands 2017
BioContainers on ELIXIR All Hands 2017
 
OpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scaleOpenMS: Quantitative proteomics at large scale
OpenMS: Quantitative proteomics at large scale
 
Do we need to make public our proteomics data?
Do we need to make public our proteomics data?Do we need to make public our proteomics data?
Do we need to make public our proteomics data?
 
Design of an hexapeptide database for proteomics studies
Design of an hexapeptide database for proteomics studiesDesign of an hexapeptide database for proteomics studies
Design of an hexapeptide database for proteomics studies
 
Parallel conformational search of small molecules
Parallel conformational search of small moleculesParallel conformational search of small molecules
Parallel conformational search of small molecules
 
PBS Web (Spanish)
PBS Web (Spanish)PBS Web (Spanish)
PBS Web (Spanish)
 
Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata files
 
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
PRIDE and ProteomeXchange – Making proteomics data accessible and reusablePRIDE and ProteomeXchange – Making proteomics data accessible and reusable
PRIDE and ProteomeXchange – Making proteomics data accessible and reusable
 
SintCompound: A Small Compound Database for Virtual Screening
SintCompound: A Small Compound Database for Virtual ScreeningSintCompound: A Small Compound Database for Virtual Screening
SintCompound: A Small Compound Database for Virtual Screening
 

Yasset iso point-cigb-2012

  • 1. Isoelectric point estimation using peptide descriptors and support vector machines

Y. Perez-Riverol, E. Audain, A. Millan, Y. Ramos, A. Sanchez, J. Vizcaíno, R. Wang, M. Müller, Y. Machado, L. Betancourt, L. González, G. Padrón, V. Besada
Center for Genetic Engineering and Biotechnology, P.O. Box 6162, Havana 10600, Cuba
Contact email: yasset.perez@biocomp.cigb.edu.cu

Therapy with APL led to significant reduction of TNFα
Feature Selection Protocol & Complexity reduction

Introduction
IPG (Immobilized pH Gradient) based separations are frequently used as the first step in shotgun proteomics methods; they increase both the dynamic range and the resolution of peptide separation prior to LC-MS analysis. Experimental isoelectric point (pI) values can improve peptide identifications in conjunction with MS/MS information [1]. Our group has previously reported the possibility of theoretically identifying peptides and proteins based on different experimental properties [2]. Accurate estimation of the pI value from the amino acid sequence is therefore critical for these kinds of experiments. Currently, pI is commonly predicted using the charge-state model [3] and/or the cofactor algorithm [4]. However, neither of these methods calculates the pI value of basic peptides accurately. Here we present a new approach that can significantly improve pI estimation by using Support Vector Machines (SVM), an experimental amino acid descriptor taken from the AAindex database [5] and the isoelectric point predicted by the charge-state model.

TLC analysis of in-vitro fructan production using extracts from leaves, stem and seeds of transgenic line 3.
A) Leaf extracts were subjected to IMAC. Proteins bound to Ni-NTA beads were eluted with 250 mM imidazole and incubated with 200 mM sucrose for 24 h at 30ºC. Lanes: 1, fructans from onion bulb; 2, non-transformed plant; 3, transgenic line 3.
B) Stem and seed extracts from transgenic line 3 were incubated with 200 mM sucrose for 24 h at 30ºC. Lanes: 1, substrate (control); 2, heat-inactivated stem extract; 3, stem extract; 4, seed extract; 5, marker.

Feature selection: a correlation matrix of the predictors shows the correlation between all the calculated descriptors. A subset of the more problematic descriptors (cor > 0.7) was then removed. Finally, the feature selection algorithm reduced the feature space from 555 to 44 descriptors.

Four different SVM kernel functions, with automated sigma estimation, were evaluated using the kernlab R package. The final model selects only two predictors: the isoelectric point predicted with the Bjellqvist algorithm and the experimental AAindex descriptor from Zimmerman.

Kernel function   Number of predictors   RMSD     R2
Polynomial        (not recoverable)      0.3387   0.97
Linear            20                     0.3866   0.96
Exponential       (not recoverable)      0.4      0.96
Radial            2                      0.32     0.98
(The values 25 and 2 also appear in the "Number of predictors" column; the extracted text does not show which belongs to the Polynomial row and which to the Exponential row.)

Experimental data & processing protocol
D. melanogaster Kc167 cells
Protein Extraction
Protein Digestion
(A) OFFGEL (off-gel electrophoresis)
LTQ-FT-ICR
(B) X!Tandem & Peptide Prophet
(C) Peptide Identifications
Filter by Probability (Probability > 0.97, Non-PTMs)
High Probability Identifications
Descriptors Estimation & isoelectric point calculator

[Figure: example MS/MS spectrum (4700 MS/MS, precursor 1570.7); x-axis: Mass (m/z); y-axis: % Intensity]

SVM algorithm vs Current algorithms
The theoretical and experimental values are more correlated in the 3.0–4.0 pH range.
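The correlation filter from the feature selection step above (dropping descriptors with cor > 0.7) could be sketched roughly as follows. This is a minimal illustration in Python, not the poster's own R code: the greedy keep-or-drop rule, the function names and the toy data are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(columns, cutoff=0.7):
    """Greedily keep descriptors whose |r| with every kept one is <= cutoff.

    columns: dict mapping descriptor name -> list of values (one per peptide).
    Returns the names of the retained descriptors, in input order.
    (Assumed rule for illustration; the poster does not state its exact rule.)
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: three descriptors computed for five peptides.
descriptors = {
    "pI_charge_state": [4.1, 4.5, 5.0, 6.2, 9.8],
    "pI_scaled":       [8.2, 9.0, 10.0, 12.4, 19.6],  # exactly 2x the first
    "hydrophobicity":  [0.3, -1.2, 0.8, -0.5, 0.1],
}
print(drop_correlated(descriptors))
```

In this toy case the second descriptor is a rescaled copy of the first, so it is dropped, while the weakly correlated third descriptor is kept. The order in which descriptors are visited affects which member of a correlated pair survives.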
The average standard deviation over the first five fractions was 0.26, 0.23 and 0.25 for the SVM model, the charge-state algorithm and the adjacent algorithm, respectively. Over the last five fractions, the average standard deviation (stdv) was 0.20, 0.52 and 0.32 for the SVM model, the charge-state algorithm and the adjacent algorithm, respectively.

Detection of possible false positive peptide identifications
[Figure axis: Probability]

References
[1] Cargile BJ, Stephenson JL Jr. An alternative to tandem mass spectrometry: isoelectric point and accurate mass for the identification of peptides. Anal Chem. 2004;76:267-75.
[2] Perez-Riverol Y, Sanchez A, Ramos Y, Schmidt A, Muller M, Betancourt L, et al. In silico analysis of accurate proteomics, complemented by selective isolation of peptides. J Proteomics. 2011;74:2071-82.
[3] Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, et al. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis. 1993;14:1023-31.
[4] Cargile BJ, Sevinsky JR, Essader AS, Eu JP, Stephenson JL Jr. Calculation of the isoelectric point of tryptic peptides in the pH 3.5-4.5 range based on adjacent amino acid effects. Electrophoresis. 2008;29:2768-78.
[5] Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 2008;36:D202-5.
[Table: peptide counts by identification probability; axis ticks run from 0.1 to 1 in steps of 0.1. Rows are given in source order with ten values each; the column-to-probability assignment is not recoverable from the extracted text.]
Identified Peptides:     211687  33492  15960  11244  9780  9540  10200  11556  16212  4344
Non-redundant Peptides:  16893  2791  1330  937  815  795  850  963  1351  362
% peptides:              0.2  2.6  5.9  6.1  9.3  14.0  16.4  16.8  22.6  31.2
Non-redundant:           10  34  39  33  45  68  94  113  228  86

Descriptor sources:
Using pICalculator: Bjellqvist pI and Cargile pI
Physicochemical and biological properties from AAindex: PD = (∑AD)/NA
Using ChemAxon (http://www.chemaxon.com): refractivity index, polarizability, surface area, LogP

Conclusion
We combined an SVM approach with only two simple peptide descriptors to predict the isoelectric point of identified peptides, and our results show better accuracy than the existing methods. The ability to calculate peptide pI at this level of accuracy is desirable for peptide pI filtering. We envisage that the same approach could also be applied to predict the effect of post-translational modifications, and that SVMs as used in this work could be useful for these types of analyses.
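As an illustration of the two kinds of predictors mentioned above, the following Python sketch computes a charge-state pI by bisection on the Henderson-Hasselbalch net charge, and a sequence-averaged descriptor following PD = (∑AD)/NA. It reads PD as the peptide descriptor, AD as the per-residue amino acid descriptor and NA as the number of amino acids, which is an interpretation of the formula rather than something the poster spells out. The pKa values and the small descriptor table are generic textbook-style numbers chosen for the example; they are not the Bjellqvist parameters from [3] or the AAindex entry used on the poster, and all function names are hypothetical.

```python
# Illustrative pKa values for ionizable groups (assumed for this sketch;
# NOT the Bjellqvist parameter set from reference [3]).
POSITIVE_PKA = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}

# Approximate pI values of a few free amino acids, used as a stand-in
# amino acid descriptor (AD). Illustrative only, not copied from AAindex.
AA_DESCRIPTOR = {
    "G": 5.97, "A": 6.00, "D": 2.77, "E": 3.22, "K": 9.74,
    "R": 10.76, "H": 7.59, "L": 5.98, "S": 5.68, "V": 5.96,
}

def net_charge(peptide, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Free N-terminus (positive) and free C-terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA["Cterm"] - ph))
    for aa in peptide:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge

def charge_state_pi(peptide, tol=1e-4):
    """pH at which the net charge crosses zero, found by bisection."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2

def peptide_descriptor(peptide, table=AA_DESCRIPTOR):
    """PD = (sum of AD over residues) / NA; raises KeyError for residues not in the table."""
    return sum(table[aa] for aa in peptide) / len(peptide)

def features(peptide):
    """The two-value feature vector (charge-state pI, averaged descriptor)."""
    return charge_state_pi(peptide), peptide_descriptor(peptide)

for pep in ("AEDLSGK", "AKRLSGK"):  # hypothetical example peptides
    pi, pd = features(pep)
    print(f"{pep}: charge-state pI = {pi:.2f}, PD = {pd:.2f}")
```

Bisection works here because the net charge decreases monotonically with pH. In a setup like the one described on the poster, such a two-value feature vector would then be passed to an SVM regression model trained against experimental pI values; that training step is not shown.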