
Increasing resistance to conventional antibiotics has become a global concern. Antimicrobial peptides (AMPs) are potential alternatives to conventional antibiotics. Because designing and synthesizing AMPs is costly, machine learning based prediction tools are indispensable.

Published in: Technology, Education


  2. We will be discussing
     • The problem
     • The solution
     • Objectives
     • Literature reviews
     • Machine learning in biological problems
     • Antimicrobial activity prediction
     • Technical background
     • Methodology
     • Results
     • Conclusions
     • Future perspective
     • Availability and publications
  3. The Problem
     • Increasing resistance to conventional antibiotics has become a global concern.
     Source: Center for Global Development (CGD)
  4. The solution
     • Novel antibacterial agents
     • Antimicrobial peptides (AMPs) are potential alternatives to conventional antibiotics because of their
       1. ability to kill target cells rapidly,
       2. broad spectrum of activity,
       3. modularity.
  5. Yet another obstacle
     • The exact mode of action (MOA) and structure-activity relationship (SAR) of AMPs are not completely known *.
     • Several reasons can be given for this:
       1. Diversity in AMP sequences
       2. Varied structures
       3. Unorganized structure in solution
       4. Unknown structure of numerous AMPs
     • Beyond high-throughput screening, methods for large-scale synthesis, and automated assay techniques, two other important prerequisites are
       a) open-source in silico libraries of AMPs
       b) efficient computational methods.
     • Computational methods include prediction tools for antimicrobial activity.
     * Mohammad Rahnamaeian: Antimicrobial peptides: modes of mechanism, modulation of defense responses. Plant Signaling & Behavior 6:9, 1325-1332; 2011 Landes Bioscience.
  6. Objectives
     • Machine learning based prediction tools for antimicrobial activity.
     • Comparison of SVM, RF and ANN based prediction models.
     • Relative importance of various peptide descriptors in the prediction ability of the models.
  7. Literature Reviews
     • AMPs are an abundant and diverse group of biomolecules.
     • Selectively lethal against microbes.
     • Found everywhere, e.g. Monera (Eubacteria), Protista (protozoans and algae), Fungi (yeasts), Plantae (plants) and Animalia (insects, fish, amphibians, reptiles, birds and mammals) (Sang Y et al. 2008).
     • Exist as α-helical peptides and β-sheet peptides.
     • Differences in the composition, polarization, and structure of eukaryotic and prokaryotic cell membranes are responsible for the selective action (Brogden KA 2005).
     • Attraction, attachment and pore formation are seen during the action of AMPs (Roland, 2009).
  8. Literature Reviews…
     • Two significant properties are considered for de novo design of AMPs (Richard W. 2008 and Prenner 2005):
       1. Net positive charge to interact with the negatively charged bacterial membrane.
       2. Amphipathic structure to facilitate integration into the bacterial membrane (Sarika P 2011).
     Figure legend: red, basic (positively charged) amino acids; green, hydrophobic amino acids. Michael Zasloff (2002)
  9. Machine learning in biological problems
     • 1958 - First attempt to model the neuronal architecture of the brain.
     • 1982 - Stormo et al. used the ‘Perceptron’ algorithm to distinguish E. coli translational initiation sequences from other sites.
     • Machine learning is employed for:
       1. Prediction models
       2. Automatic annotation
       3. Protein structure and function prediction
       4. Active site determination in proteins
       5. Evolutionary analysis
       6. Determination of binding sites on protein targets
       7. Biological network analysis
       8. Pattern discovery in biochemical pathways
       9. Phylogenetic tree analysis
       10. Identifying genetic markers of disease
  10. Antimicrobial activity prediction
     • Several machine learning based prediction methods have been developed *.
     • Still, a huge gap exists between what needs to be achieved and what has been achieved.

     Algorithm / method *   Reference         Associated database
     SVM                    Lata et al.       AntiBP
     ANN                    Lata et al.       AntiBP
     SW                     Torrent et al.    --
     DA                     Thomas et al.     CAMP
     QM                     Lata et al.       AntiBP
     WFST                   Whelan et al.     --
     HMM                    Hammami et al.    PhytAMP
                            Hammami et al.    BACTIBASE
     SA                     Wang et al.       APD2

     * Support vector machines (SVM), discriminant analysis (DA), sliding window (SW), artificial neural network (ANN), quantitative matrix (QM), hidden Markov model (HMM), sequence alignment (SA), weighted finite-state transducers (WFST)
  11. Antimicrobial activity prediction
     • This is a challenging task because of
       • low sequence similarity among diverse AMPs (Hancock RE 1999),
       • unorganised conformation,
       • costly experimental methods.
     • So we need good prediction models.
     • Physicochemical properties such as charge, size, amphipathicity, amino acid composition, structural conformation, hydrophobicity and polar angle are responsible for antimicrobial activity.
     • A total of 257 peptide descriptors are used, including dipeptide and tripeptide composition, composition based on reduced alphabets, amino acid indices, charge, and hydrophobicity indices.
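A few of the descriptor families named above can be sketched in Python. This is only an illustration, not the Perl script used in the study: the function names are hypothetical, the net charge is a crude residue count (histidine and termini ignored), the Kyte-Doolittle scale is just one possible hydrophobicity index, and the peptide in the usage note is made up.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here as the hydrophobicity index).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues in the sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each of the 400 possible dipeptides (overlapping pairs)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return {a + b: pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (a simplification)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def describe(seq):
    """Collect the descriptors into one flat feature dictionary."""
    features = {
        "length": len(seq),
        "net_charge": net_charge(seq),
        "mean_hydropathy": mean_hydropathy(seq),
    }
    for aa, value in amino_acid_composition(seq).items():
        features["aac_" + aa] = value
    for pair, value in dipeptide_composition(seq).items():
        features["dpc_" + pair] = value
    return features
```

For the made-up peptide `"KKLLKKLLKK"`, `describe` returns 423 features (3 global values, 20 amino acid fractions, 400 dipeptide fractions), with a net charge of 6 under the simplification above.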
  12. Technical background: SVMs
     • Supervised learning model.
     • Originally formulated for the linearly separable case.
     • In 1995 it was extended to linearly non-separable cases as well.
  13. Linear SVMs
  14. Linear SVMs…
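For reference, the standard textbook formulation of the linear SVM can be written as below. The symbols are the usual textbook ones and are assumptions rather than the slides' own notation: training points x_i with labels y_i in {-1, +1}, weight vector w, bias b, slack variables ξ_i and penalty parameter C.

```latex
% Hard margin (linearly separable case)
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n

% Soft margin (extension to the linearly non-separable case)
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \qquad i = 1,\dots,n
```

Setting every ξ_i to zero recovers the hard-margin problem, so the second form contains the first as a special case.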
  15. Non-linear SVM
     • Kernel trick.
     • Data points are nonlinearly mapped to a high-dimensional feature space.
     • The transformation used is f([x y]) = [x y (x^2+y^2)].
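The effect of the transformation f([x y]) = [x y (x^2+y^2)] can be illustrated with a small Python sketch. Two concentric rings cannot be separated by a straight line in the plane, but after the mapping a flat plane on the third coordinate separates them. The ring radii, the threshold of 4.0 and the function names are assumptions chosen for the illustration. Note that this computes the feature map explicitly; a kernel SVM obtains the same effect implicitly through inner products.

```python
import math


def lift(point):
    """Map a 2-D point [x, y] to [x, y, x^2 + y^2]."""
    x, y = point
    return [x, y, x * x + y * y]


def ring(radius, n):
    """Return n points evenly spaced on a circle of the given radius."""
    return [[radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)] for k in range(n)]


inner = ring(1.0, 8)  # class -1: not linearly separable from `outer` in 2-D
outer = ring(3.0, 8)  # class +1


def classify(point, threshold=4.0):
    """Linear rule in the lifted space: compare the third coordinate to a plane."""
    return 1 if lift(point)[2] > threshold else -1
```

Every point of `inner` has a third coordinate near 1 and every point of `outer` near 9, so the plane at 4.0 classifies both rings correctly.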
  16. Random Forest
     • Ensemble learning framework.
     • It grows multiple classification trees.
     • A decision tree is a common flowchart-like scheme for representing classification problems.
  17. Random forest…
     • Each decision tree in RF is grown as follows:
     • Sample N cases with replacement from the original data; about 1/3 of the original dataset is left out of each sample.
     • At each node, randomly select m of the M predictors (m << M); the variable that provides the best split among them is used to split the node.
     • Each tree is grown to its largest possible extent, and each tree votes for a class label.
     • The class with the most votes is chosen.
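The procedure above can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the randomForest package used in the study: each "tree" is only a one-split stump rather than a fully grown tree, the m features are drawn once per tree rather than at every node, and all names and the toy data in the usage note are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def grow_stump(rows, labels, features):
    """Best single split (lowest weighted Gini) among the given features."""
    # Fallback when no split exists: send everything left, predict majority.
    fallback = majority(labels)
    best_score = float("inf")
    best_tree = (features[0], float("inf"), fallback, fallback)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:
                continue  # threshold at the maximum value splits nothing
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score = score
                best_tree = (f, t, majority(left), majority(right))
    return best_tree


def grow_forest(rows, labels, n_trees=25, m=1, seed=0):
    """Grow stumps on bootstrap samples, each restricted to m random features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample N with replacement
        features = rng.sample(range(n_features), m)  # m of the M predictors
        forest.append(grow_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Every tree votes for a class label; the label with most votes wins."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two numeric features, both informative.
rows = [[1, 10], [2, 11], [3, 12], [4, 13], [6, 20], [7, 21], [8, 22], [9, 23]]
labels = ["NAMP"] * 4 + ["AMP"] * 4
forest = grow_forest(rows, labels)
```

On this toy data, `predict(forest, [0, 9])` votes for the low-valued class and `predict(forest, [10, 30])` for the high-valued one.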
  18. Advantages of RF
     • High prediction accuracy.
     • Works well for large-scale datasets with a large number of variables.
     • Built-in variable selection based on importance and variable interaction.
     • Deals efficiently with data having missing values.
     • Ability to reuse the forest for future estimation.
     • Computation of the relation between variables and classification.
     • Proximity calculation between cases.
     • Can be used for unsupervised learning and outlier detection.
     • Internal unbiased estimate of the generalization error.
  19. ANN
  20. ANN…
     • An interconnected, complex network of perceptrons forms an ANN.
  21. Perceptron learning rule
     • It involves learning a weight vector that predicts the correct ±1 output.
     • It is a method to alter and readjust the weights.
     Perceptron rule
     • Assign initial weights randomly.
     • Then iteratively apply the perceptron to each training example.
     • If the perceptron miscalculates the output, readjust the weights. Repeat this.
     Delta rule
     • The perceptron rule fails to converge in the nonlinearly separable case.
     • The delta rule is based on the gradient descent search algorithm.
     • It searches a hypothesis space of weights for a suitable weight vector.
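Both rules can be sketched in Python as follows. This is a minimal textbook-style illustration rather than code from the study: the learning rates, the epoch limit, the function names and the AND-gate training data are assumptions. The perceptron rule updates weights only on misclassified examples of a thresholded unit, while the delta rule takes a gradient descent step on the squared error of an unthresholded linear unit.

```python
import random


def predict(weights, x):
    """Threshold unit: +1 if w0 + w.x > 0, else -1 (weights[0] is the bias)."""
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if activation > 0 else -1


def train_perceptron(samples, targets, rate=0.1, epochs=1000, seed=0):
    """Perceptron rule: start from small random weights, fix each mistake."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.05, 0.05) for _ in range(len(samples[0]) + 1)]
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            o = predict(weights, x)
            if o != t:
                mistakes += 1
                weights[0] += rate * (t - o)          # bias input is 1
                for i, xi in enumerate(x):
                    weights[i + 1] += rate * (t - o) * xi
        if mistakes == 0:  # every example classified correctly
            break
    return weights


def squared_error(weights, samples, targets):
    """E = 1/2 * sum((t - o)^2) for the unthresholded linear unit o = w.x."""
    total = 0.0
    for x, t in zip(samples, targets):
        o = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
        total += (t - o) ** 2
    return 0.5 * total


def delta_rule_epoch(weights, samples, targets, rate=0.05):
    """Delta rule: one batch gradient descent step on the squared error."""
    grads = [0.0] * len(weights)
    for x, t in zip(samples, targets):
        xs = [1.0] + list(x)
        o = sum(w * xi for w, xi in zip(weights, xs))
        for i, xi in enumerate(xs):
            grads[i] += (t - o) * xi
    return [w + rate * g for w, g in zip(weights, grads)]


# Linearly separable toy problem: logical AND with +1/-1 targets.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [-1, -1, -1, 1]
and_weights = train_perceptron(samples, targets)
```

Because AND is linearly separable, the perceptron rule stops once all four examples are classified correctly. The delta rule does not need separability: repeated calls to `delta_rule_epoch` keep lowering `squared_error` toward its minimum.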
  22. Methodology
     Key issues
     • Data representation
     • Cross validation
     • Measurement of the classifier’s performance
     • Sensitivity
     • Specificity
     • MCC
     • Prediction accuracy
  23. Methodology…
     • CAMP currently contains 4020 AMPs.
     • Sequences containing X were removed.
     • Redundant sequences were removed with CD-HIT (cut-off of 0.9).
     • Final negative dataset: 4011 sequences.
     • Perl script to calculate 257 peptide features.
     • Train and test data split 70:30.
     • Best 64 features selected by RF Gini score.
     • Package randomForest in R for RF, with 1000 trees and default mtry.
     • kernlab package for SVM, polynomial kernel.
     • nnet package for ANN; log-linear model with 65 weights.
     • Package “ROCR” for evaluation.
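The data preparation steps can be sketched in Python under loud assumptions. The study used CD-HIT with a 0.9 cut-off and R packages; the sketch below only drops sequences containing X, removes exact duplicates as a crude stand-in for CD-HIT clustering, and makes a shuffled 70:30 split. Function names and example sequences are hypothetical.

```python
import random


def clean(sequences):
    """Drop sequences containing 'X' and exact duplicates.

    Exact de-duplication is only a stand-in for similarity clustering
    such as CD-HIT, which also removes near-identical sequences.
    """
    seen, kept = set(), []
    for s in sequences:
        s = s.strip().upper()
        if s and "X" not in s and s not in seen:
            seen.add(s)
            kept.append(s)
    return kept


def split_70_30(items, seed=0):
    """Shuffle reproducibly, then split into 70% train and 30% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(0.7 * len(items))
    return items[:cut], items[cut:]
```

For example, `clean(["ACK", "AXK", "ack", "GLL"])` keeps only `"ACK"` and `"GLL"`, and `split_70_30(range(10))` returns 7 training items and 3 test items with no overlap.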
  24. Results
     • 1470 AMPs and 532 NAMPs in the test dataset.
     • RF shows the best prediction accuracy.

     Algorithm   MCC against test dataset   Prediction accuracy (%)   AUC of ROC curve
     RF          0.87                       94.2                      0.98
     SVM         0.82                       92.3                      0.97
     ANN         0.74                       87.9                      0.94
  25. Comparison with other prediction tools

     Prediction accuracy (%)
     Server / tool   RF     SVM    ANN     SW   QM
     Our method      94.2   92.3   87.9    --   --
     AntiBP          --     92.1   88.17   --   90.37
     AMPA            --     --     --      85   --

     Random forest (RF), support vector machines (SVM), artificial neural network (ANN), sliding window (SW), quantitative matrix (QM)
  26. Figures 1-3
     Figure 1 - Plot of cumulative error rates in RF: black (overall), red (class 0, AMP), green (class 1, NAMP).
     Figure 2 - A variable importance plot. Variable importance is determined by mean decrease in Gini score.
     Figure 3 - Scatter plot of the RF model (red triangle - AMP, black circle - NAMP).
  27. Conclusions
     General conclusions
     • Prediction tools are crucial for the design and synthesis of novel AMPs.
     • The sequence of an AMP plays an important role in its antimicrobial activity.
     • It is necessary to understand the role of peptide features in antimicrobial activity.
     • Prediction accuracy relies on the relevant information contained within the descriptors.
     Specific conclusions
     • RF has the highest prediction performance. The ensemble technique seems to be the reason behind this.
     • The best 64 peptide features were identified.
     • The prediction tools developed during this study should help in identifying new potential AMPs.
  28. Future perspective
     • Better prediction methods, by incorporating diverse peptide features and a more stringent noise removal strategy.
     • Antimicrobial region prediction in a peptide would be very useful.
     • Developing a benchmark dataset would be a great milestone.
     • Position specific scoring matrix (PSSM) based prediction.
     • Classifying a predicted AMP into further subfamilies based on function. Although this work has been done, it still leaves room for improvement in accuracy and methodology.
  29. Availability & Publication
     • Version 2 of CAMP
     • The publication on CAMP version 2 has been communicated to Nucleic Acids Research (NAR).