PREDICTION OF ANTIMICROBIAL
PEPTIDES USING MACHINE LEARNING
METHODS
BILAL NIZAMI
M.Tech (Bioinformatics)
Under the guidance of
Dr. SUSAN THOMAS
Biomedical Informatics Center (BIC)
NIRRH, Mumbai
We will be discussing..
• The problem
• The solution
• Objectives
• Literature reviews
• Machine learning in biological problems
• Antimicrobial activity prediction
• Technical background
• Methodology
• Results
• Conclusions
• Future perspective
• Availability and publications
The Problem
• Increasing resistance to conventional antibiotics has become a
global concern.
Source: Center for Global Development (CGD)
The solution
• Novel antibacterial agents
• Antimicrobial peptides (AMPs) are
potential alternatives to conventional
antibiotics because of their:
1. ability to kill target cells rapidly
2. broad spectrum of activity
3. modularity
Yet another obstacle
• The exact mechanism of action (MOA) and structure-activity relationship (SAR) of AMPs are not completely known *.
• Several reasons account for this:
1. Diversity in AMP sequences
2. Varied structures
3. Unorganized structure in solution
4. Unknown structure of numerous AMPs
• Beyond high-throughput screening, methods for large-scale
synthesis and automated assay techniques, two other important
prerequisites are
a) open-source in silico libraries of AMPs
b) efficient computational methods.
• Computational methods include prediction tools for antimicrobial
activity.
* Mohammad Rahnamaeian: Antimicrobial peptides: Modes of mechanism, modulation of defense
responses. Plant Signaling & Behavior 6:9, 1325-1332; 2011 Landes Bioscience.
Objectives
• Machine learning based prediction tools for
antimicrobial activity.
• Comparison of SVM, RF and ANN based prediction
models
• Relative importance of various peptide descriptors in
prediction ability of models.
Literature Reviews
• AMPs are an abundant and diverse group of biomolecules.
• Selectively lethal against microbes.
• Found everywhere, e.g. Monera (Eubacteria), Protista
(protozoans and algae), Fungi (yeasts), Plantae (plants) and
Animalia (insects, fish, amphibians, reptiles, birds and
mammals) (Sang Y et al. 2008).
• Exist as α-helical peptides and β-sheet peptides.
• Differences in cell membrane composition,
polarization and structure between eukaryotes and prokaryotes are
responsible for the selective action (Brogden KA 2005).
• Attraction, attachment and pore formation are seen during
the action of AMPs (Roland 2009).
Literature Reviews…
• Two significant properties which are considered for de-novo
design of AMPs (Richard W. 2008 and Prenner 2005)
1. Net positive charge to interact with negatively charged
bacterial membrane.
2. Amphipathic structure to facilitate its integration into the
bacterial membrane. (Sarika P 2011)
[Figure: peptide structures. Red: basic (positively charged) amino acids; green: hydrophobic amino acids. Source: Michael Zasloff (2002)]
Machine learning in Biological problems
• 1958 - First attempt to model the neuronal architecture of the brain.
• 1982 - Stormo et al. used the ‘Perceptron’ algorithm to distinguish E. coli
translational initiation sequences from other sites.
• Machine learning is employed for:
1. Prediction models
2. Automatic annotation
3. Protein structure and function prediction
4. Active sites determination in proteins
5. Evolutionary analysis
6. Determination of binding sites on protein target
7. Biological network analysis
8. Patterns discovery in biochemical pathways
9. Phylogenetic tree analysis
10. Identifying genetic markers of disease.
Antimicrobial activity prediction
• Several machine learning based prediction methods have been developed.
* Support vector machines (SVM), discriminant analysis (DA), sliding window (SW), artificial neural network
(ANN), quantitative matrix (QM), hidden Markov model (HMM), sequence alignment (SA), weighted
finite-state transducers (WFST)
• Still, a large gap exists between what needs to be achieved and what has
been achieved.
Algorithm / method *   Reference         Associated database
SVM                    Lata et al.       AntiBP
ANN                    Lata et al.       AntiBP
SW                     Torrent et al.    --
DA                     Thomas et al.     CAMP
QM                     Lata et al.       AntiBP
WFST                   Whelan et al.     --
HMM                    Hammami et al.    PhytAMP
HMM                    Hammami et al.    BACTIBASE
SA                     Wang et al.       APD2
Antimicrobial activity prediction
• This is a challenging task, due to
• low sequence similarity among diverse AMPs (Hancock RE
1999)
• unorganised conformation
• costly experimental methods.
• So we need good prediction models.
• Physicochemical properties such as charge, size, amphipathicity,
amino acid composition, structural conformation,
hydrophobicity and polar angle are responsible for
antimicrobial activity.
• A total of 257 peptide descriptors were used, including dipeptide
and tripeptide composition, composition based on reduced
alphabets, amino acid indices, charge, and hydrophobicity
indices.
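As an illustration of what sequence-derived descriptors look like, here is a minimal Python sketch of three simple ones: amino acid composition, dipeptide composition and a crude net charge. It is not the feature script used in this work (the 257 descriptors are not listed on the slides); the function names and the toy peptide `GKKLLKKA` are assumptions made up for the example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (assumes len(seq) >= 1)."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (assumes len(seq) >= 2)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for pair in pairs:
        if pair in counts:  # pairs with non-standard letters are ignored
            counts[pair] += 1
    return {dp: c / len(pairs) for dp, c in counts.items()}

def net_charge(seq):
    """Crude net charge: +1 per K/R, -1 per D/E (ignores pH, termini and His)."""
    seq = seq.upper()
    return sum(seq.count(r) for r in "KR") - sum(seq.count(r) for r in "DE")

# Hypothetical toy peptide, not taken from any database.
peptide = "GKKLLKKA"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
charge = net_charge(peptide)
```

Each peptide then becomes a fixed-length numeric vector regardless of its length, which is what lets classifiers such as SVM, RF and ANN be trained on sequences of different sizes.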
Technical background
SVMs
• Supervised learning model.
• Originally formulated for the linearly separable case.
• In 1995 it was extended to the linearly non-separable case
as well.
Linear SVMs
Linear SVMs…
Non linear SVM
• Kernel trick.
• Data points are nonlinearly mapped to a feature space of high
dimensions.
• The transformation used is f([x y]) = [x y (x^2+y^2)].
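A small Python sketch of why this mapping helps, assuming two made-up classes that lie on concentric rings (radius 1 and radius 3). No straight line separates them in the plane, but after applying f the third coordinate alone does. Note that this applies the mapping explicitly; a kernel obtains the same inner products without ever building the mapped vectors.

```python
import math

def feature_map(x, y):
    """Map a 2-D point to 3-D: f([x, y]) = [x, y, x^2 + y^2]."""
    return (x, y, x * x + y * y)

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (toy data)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring(1.0, 8)  # hypothetical class A
outer = ring(3.0, 8)  # hypothetical class B

# Third coordinate after the mapping: about 1 for the inner ring, about 9 for the outer.
inner_z = [feature_map(x, y)[2] for x, y in inner]
outer_z = [feature_map(x, y)[2] for x, y in outer]

# The plane z = 5 is a linear separator in the mapped space.
threshold = 5.0
separable = max(inner_z) < threshold < min(outer_z)
```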
Random Forest
• Ensemble learning framework.
• It grows multiple classification trees.
• A decision tree is a common flowchart-like schema for representing
classification problems.
Random forest..
• Each decision tree in RF is grown as follows:
• Sample N cases with replacement from the original data (a bootstrap sample); about 1/3 of the original cases are left out of each sample.
• At each node, randomly select m of the M predictors (m << M); the variable that
provides the best split is used to split the node.
• Each tree is grown to its largest possible extent, and each tree votes for a class label.
• The class that wins the most votes is chosen.
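The procedure above can be sketched in a few lines of Python. To stay short, this sketch deliberately grows one-split "stumps" instead of fully grown trees, so it illustrates bootstrap sampling, random predictor selection and majority voting rather than a complete random forest. All names and the toy data are assumptions for illustration; it is not the randomForest R package used in this work.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(X, y, features):
    """Best single split (feature, threshold) among candidate features by weighted Gini."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def train_forest(X, y, n_trees=25, m=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample with m random predictors."""
    rng = random.Random(seed)
    n, M = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # N cases, with replacement
        feats = rng.sample(range(M), m)             # m of the M predictors
        stump = best_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        if stump is not None:                       # skip degenerate samples
            forest.append(stump[1:])
    return forest

def predict(forest, row):
    """Each stump votes for a class label; the majority wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data with two numeric descriptors per peptide.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [7.0, 8.0], [8.0, 9.0], [9.0, 7.0]]
y = ["NAMP", "NAMP", "NAMP", "AMP", "AMP", "AMP"]
forest = train_forest(X, y)
```

A single stump trained on a bootstrap sample can misplace its threshold; the vote across many stumps is what makes the ensemble prediction stable.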
Advantages of RF
• High prediction accuracy.
• Works well on large-scale datasets with a large number
of variables.
• Built-in variable selection based on importance and variable
interaction.
• Deals efficiently with data having missing values.
• Ability to reuse the forest for future estimation.
• Computes relations between variables and classification.
• Proximity calculation between cases.
• Can be used for unsupervised learning and outlier detection.
• Internal unbiased estimate of the generalization error.
ANN
ANN..
• An interconnected, complex network of perceptrons forms an ANN.
Perceptron learning rule
• It involves learning a weight vector so that the perceptron predicts the
correct ±1 output.
• It is a method to alter and readjust the weights.
Perceptron rule
• Assign initial weights randomly.
• Then iteratively apply the perceptron to each training example.
• If the perceptron miscalculates the output, readjust the weights. Repeat until all training examples are classified correctly.
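A minimal Python sketch of the perceptron rule, assuming the usual update w_i <- w_i + lr * (t - o) * x_i with target t and output o in {+1, -1}. The learning rate, the epoch cap and the toy AND data are illustrative choices, not values from this work.

```python
import random

def perceptron_output(w, x):
    """Return +1 if w0 + w.x > 0, else -1 (w[0] is the bias weight)."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(samples, lr=0.1, max_epochs=1000, seed=0):
    """samples: list of (inputs, target) pairs with target in {+1, -1}."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]  # small random start
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in samples:
            o = perceptron_output(w, x)
            if o != t:  # wrong output: readjust the weights
                mistakes += 1
                w[0] += lr * (t - o)
                for i, xi in enumerate(x):
                    w[i + 1] += lr * (t - o) * xi
        if mistakes == 0:  # every example classified correctly
            break
    return w

# Logical AND with +/-1 targets: a linearly separable toy problem.
AND_DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights = train_perceptron(AND_DATA)
```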
Delta rule
• The perceptron rule fails to converge when the data are not linearly separable; the delta rule addresses this.
• It is based on the gradient descent search algorithm.
• It searches the hypothesis space of weight vectors for the weights that best fit the training data.
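A minimal Python sketch of the delta rule, assuming batch gradient descent on the squared error E = 1/2 * sum((t - o)^2) of an unthresholded linear unit. The one-dimensional toy data are made up and are deliberately not linearly separable; the learning rate and epoch count are illustrative.

```python
def linear_output(w, x):
    """Unthresholded linear unit: o = w0 + w.x (w[0] is the bias weight)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def squared_error(w, samples):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in samples)

def train_delta_rule(samples, lr=0.05, epochs=200):
    """Batch gradient descent: w_i <- w_i + lr * sum((t - o) * x_i)."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)
    history = [squared_error(w, samples)]
    for _ in range(epochs):
        step = [0.0] * (n + 1)  # accumulates the negative gradient of E
        for x, t in samples:
            err = t - linear_output(w, x)
            step[0] += err
            for i, xi in enumerate(x):
                step[i + 1] += err * xi
        w = [wi + lr * si for wi, si in zip(w, step)]
        history.append(squared_error(w, samples))
    return w, history

# Toy 1-D data that no single threshold separates (-0.5 is +1, 0.5 is -1).
DATA = [((-2.0,), -1), ((-1.0,), -1), ((-0.5,), 1),
        ((0.5,), -1), ((1.0,), 1), ((2.0,), 1)]
weights, history = train_delta_rule(DATA)
```

Unlike the perceptron rule, the error here settles at the least-squares minimum even though some examples remain on the wrong side of the decision boundary.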
Methodology
Key issues
• Data representation
• Cross validation
• Measurement of classifier’s performance
• Sensitivity
• Specificity
• MCC
• Prediction accuracy
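The four performance measures can be computed from the confusion-matrix counts using their standard definitions, as in the Python sketch below. The counts in the example are hypothetical and are not results from this study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC ranges from -1 to +1 and stays informative when the two classes differ in size, which is why it is often reported alongside accuracy.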
Methodology …
• CAMP currently contains 4020
AMPs.
• Sequences containing X were removed.
• Redundant sequences were removed with CD-HIT
(cut-off of 0.9).
• Final negative dataset: 4011
sequences.
• Perl script to calculate 257 peptide
features.
• Train and test data split 70:30.
• Best 64 features selected by RF Gini score.
• Package randomForest in R for RF.
• 1000 trees and default mtry.
• kernlab package for SVM,
polynomial kernel.
• nnet package for ANN. Log-linear
model with 65 weights.
• Package “ROCR” for evaluation.
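Two of the data-preparation steps above, dropping sequences that contain X and splitting 70:30, can be sketched in Python as follows. This is an illustration under stated assumptions, not the scripts used in this work: exact-duplicate removal stands in for CD-HIT, which clusters by sequence similarity and is not reproduced here, and the function names, seed and toy sequences are made up.

```python
import random

def remove_ambiguous_and_duplicates(seqs):
    """Drop sequences containing 'X' and exact duplicates, keeping first occurrences."""
    seen = set()
    kept = []
    for s in seqs:
        s = s.strip().upper()
        if not s or "X" in s or s in seen:
            continue
        seen.add(s)
        kept.append(s)
    return kept

def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical toy sequences.
cleaned = remove_ambiguous_and_duplicates(["GKKL", "GXKL", "gkkl", "AKLL"])
train, test = split_train_test(range(10))
```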
Results
• 1470 AMPs and 532 non-AMPs (NAMPs) in the test dataset.
• RF shows the best prediction accuracy.
Algorithm   MCC against test dataset   Prediction accuracy (in %)   AUC of ROC curve
RF          0.87                       94.2                         0.98
SVM         0.82                       92.3                         0.97
ANN         0.74                       87.9                         0.94
Comparison with other prediction tools
Prediction accuracy (%)
Server / tools   RF     SVM    ANN     SW    QM
Our method       94.2   92.3   87.9    --    --
AntiBP           --     92.1   88.17   --    90.37
AMPA             --     --     --      85    --
Random Forest (RF), Support vector machines (SVM),
Artificial neural network (ANN), sliding window (SW),
quantitative matrix (QM)
Figure 1 - Plot of cumulative error rates in RF: black - overall, red - class 0 (AMP), green - class 1 (NAMP).
Figure 2 - A variable importance plot. Variable importance is determined by mean decrease in Gini score.
Figure 3 - Scatter plot of RF model (red triangle - AMP, black circle - NAMP).
Conclusions
General conclusion
• Prediction tools are crucial for the design and synthesis of novel AMPs.
• The sequence of an AMP plays an important role in its antimicrobial activity.
• It is necessary to understand the role of peptide features in antimicrobial
activity.
• Prediction accuracy relies on the relevant information contained within
the descriptors.
Specific conclusions
• RF has the highest prediction performance of the three methods. The ensemble technique seems to be
the reason behind this.
• The best 64 peptide features were identified.
• The prediction tools developed during this study will help in
identifying new potential AMPs.
Future perspective
• Better prediction methods, by incorporating diverse peptide
features and a more stringent noise removal strategy.
• Antimicrobial region prediction in a peptide would be very
useful.
• Developing a benchmark dataset would be a great milestone.
• Position-specific scoring matrix (PSSM) based prediction.
• Classifying a predicted AMP into sub-families based on
function. Although this has been done, there is still
room for improvement in accuracy and methodology.
Availability & Publication
• Version 2 of CAMP:
http://www.bicnirrh.res.in/antimicrobial/
• The publication on CAMP version 2 is in
communication with Nucleic Acids Research
(NAR), http://nar.oxfordjournals.org/.
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 

PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS

  • 1. PREDICTION OF ANTIMICROBIAL PEPTIDES USING MACHINE LEARNING METHODS BILAL NIZAMI M.Tech (Bioinformatics) Under the guidance of Dr. SUSAN THOMAS Biomedical Informatics Center (BIC) NIRRH, Mumbai
  • 2. We will be discussing.. • The problem • The solution • Objectives • Literature reviews • Machine learning in biological problems • Antimicrobial activity prediction • Technical background • Methodology • Results • Conclusions • Future perspective • Availability and publications
  • 3. The Problem • Increasing resistance to conventional antibiotics has become a global concern. Source: Center for Global Development (CGD)
  • 4. The solution • Novel antibacterial agents • Antimicrobial peptides (AMPs) are potential alternatives to conventional antibiotics because of their 1. ability to kill target cells rapidly, 2. broad spectrum of activity, and 3. modularity.
  • 5. Yet another obstacle • The exact mechanism of action (MOA) and structure-activity relationship (SAR) of AMPs are not completely known *. • Several reasons account for this: 1. Diversity in AMP sequences 2. Varied structures 3. Unorganized structure in solution 4. Unknown structure of numerous AMPs. • Beyond high-throughput screening, methods for large-scale synthesis, and automated assay techniques, two other important prerequisites are a) open source in silico libraries of AMPs b) efficient computational methods. • Computational methods include prediction tools for antimicrobial activity. * Mohammad Rahnamaeian: Antimicrobial peptides: Modes of mechanism, modulation of defense responses. Plant Signaling & Behavior 6:9, 1325-1332; 2011, Landes Bioscience.
  • 6. Objectives • Machine learning based prediction tools for antimicrobial activity. • Comparison of SVM, RF and ANN based prediction models • Relative importance of various peptide descriptors in prediction ability of models.
  • 7. Literature Reviews • AMPs are an abundant and diverse group of biomolecules. • Selectively lethal against microbes. • Found everywhere, e.g. in Monera (Eubacteria), Protista (protozoans and algae), Fungi (yeasts), Plantae (plants) and Animalia (insects, fish, amphibians, reptiles, birds and mammals) (Sang Y et al. 2008). • Exist as α-helical peptides and β-sheet peptides. • Differences between eukaryotes and prokaryotes in cell membrane composition, polarization, and structure are responsible for the selective action (Brogden KA 2005). • Attraction, attachment and pore formation are seen during the action of AMPs (Roland 2009).
  • 8. Literature Reviews… • Two significant properties are considered in the de novo design of AMPs (Richard W. 2008 and Prenner 2005): 1. Net positive charge, to interact with the negatively charged bacterial membrane. 2. Amphipathic structure, to facilitate integration into the bacterial membrane (Sarika P 2011). Figure legend: red, basic (positively charged) amino acids; green, hydrophobic amino acids. Michael Zasloff (2002)
  • 9. Machine learning in Biological problems • 1958 - First attempt to model the neuronal architecture of the brain. • 1982 - Stormo et al. applied the ‘Perceptron’ algorithm to distinguish E. coli translational initiation sites from other sites. • Machine learning is employed for: 1. Prediction models 2. Automatic annotation 3. Protein structure and function prediction 4. Active site determination in proteins 5. Evolutionary analysis 6. Determination of binding sites on protein targets 7. Biological network analysis 8. Pattern discovery in biochemical pathways 9. Phylogenetic tree analysis 10. Identifying genetic markers of disease.
  • 10. Antimicrobial activity prediction • Several machine learning based prediction methods have been developed *: support vector machines (SVM), discriminant analysis (DA), sliding window (SW), artificial neural network (ANN), quantitative matrix (QM), hidden Markov model (HMM), sequence alignment (SA), weighted finite-state transducers (WFST). • Still, a huge gap exists between what needs to be achieved and what has been achieved.
      Algorithm / method *   Reference         Associated database
      SVM                    Lata et al.       AntiBP
      ANN                    Lata et al.       AntiBP
      SW                     Torrent et al.    --
      DA                     Thomas et al.     CAMP
      QM                     Lata et al.       AntiBP
      WFST                   Whelan et al.     --
      HMM                    Hammami et al.    PhytAMP
                             Hammami et al.    BACTIBASE
      SA                     Wang et al.       APD2
  • 11. Antimicrobial activity prediction • This is a challenging task, due to • Low sequence similarity among diverse AMPs (Hancock RE 1999) • Unorganised conformation • Costly experimental methods • So we need good prediction models. • Physicochemical properties such as charge, size, amphipathicity, amino acid composition, structural conformation, hydrophobicity and polar angle are responsible for antimicrobial activity. • A total of 257 peptide descriptors, including dipeptide and tripeptide composition, composition based on reduced alphabets, amino acid indices, charge, and hydrophobicity indices.
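To make the idea of sequence-derived descriptors concrete, here is a minimal Python sketch of three of the descriptor families named above: amino acid composition, dipeptide composition, and a charge value. It is not the Perl script used in the study. The function names, the example peptide, and the simplified charge rule (+1 for K/R, -1 for D/E, ignoring histidine, termini and pH) are all assumptions made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence):
    """Fraction of each standard residue in the sequence."""
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in AMINO_ACIDS}

def dipeptide_composition(sequence):
    """Fraction of each overlapping residue pair that occurs in the sequence."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

def net_charge(sequence):
    """Crude net charge (assumption): +1 per K or R, -1 per D or E."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative

peptide = "GKKLLKEL"  # hypothetical sequence, for illustration only
composition = amino_acid_composition(peptide)
dipeptides = dipeptide_composition(peptide)
charge = net_charge(peptide)
print(composition["K"], dipeptides["KK"], charge)
```

Concatenating such values for every peptide gives one fixed-length feature vector per sequence, which is the form of input a classifier needs.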
  • 12. Technical background SVMs • Supervised learning model. • Originally formulated for the linearly separable case. • In 1995 it was extended to the linearly non-separable case as well.
  • 15. Non linear SVM • Kernel trick. • Data points are nonlinearly mapped to a feature space of high dimensions. • The transformation used is f([x y]) = [x y (x^2+y^2)].
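The mapping on this slide can be tried on toy data. The sketch below, in plain Python, applies f([x y]) = [x y (x^2+y^2)] to two concentric classes that no straight line separates in 2-D, and checks that a plane separates them in the mapped space. It also shows the kernel that matches this mapping, k(u, v) = u·v + |u|^2 |v|^2, which gives the same inner product without building the mapped vectors. The data, the radii, the plane z = 2.5 and all function names are assumptions chosen for illustration.

```python
import math
import random

def feature_map(x, y):
    """Map a 2-D point to 3-D: f([x, y]) = [x, y, x^2 + y^2]."""
    return (x, y, x * x + y * y)

def kernel(u, v):
    """Inner product in the mapped space, computed from the 2-D points only."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot + (u[0] ** 2 + u[1] ** 2) * (v[0] ** 2 + v[1] ** 2)

def sample_ring(rng, r_min, r_max, n):
    """Draw n points with radius in [r_min, r_max] and a random angle."""
    points = []
    for _ in range(n):
        r = rng.uniform(r_min, r_max)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

rng = random.Random(0)
inner = sample_ring(rng, 0.0, 1.0, 50)  # class A: inside the unit circle
outer = sample_ring(rng, 2.0, 3.0, 50)  # class B: a surrounding ring

# After mapping, the third coordinate alone separates the classes.
inner_z = [feature_map(x, y)[2] for x, y in inner]
outer_z = [feature_map(x, y)[2] for x, y in outer]
separable = max(inner_z) < 2.5 < min(outer_z)

# The kernel equals the explicit inner product of the mapped points.
u, v = inner[0], outer[0]
explicit = sum(a * b for a, b in zip(feature_map(*u), feature_map(*v)))
print(separable, math.isclose(kernel(u, v), explicit))
```

An SVM with such a kernel only ever evaluates `kernel(u, v)`, which is why the high-dimensional mapping never has to be computed explicitly.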
  • 16. Random Forest • Ensemble learning framework. • It grows multiple classification trees. • A decision tree is a common flowchart-like schema for representing classification problems.
  • 17. Random forest.. • Each decision tree in RF is grown as follows: • Sample N cases with replacement from the original data (a bootstrap sample; about 1/3 of the original cases are left out of each sample). • At each node, randomly select m predictors out of the M predictors (m << M); the variable that provides the best split is used to split the node. • Each tree is grown to its largest possible extent, and each tree votes for a class label. • The class winning the most votes is chosen.
  • 18. Advantages of RF • High prediction accuracy. • Works well on large datasets with a large number of variables. • Built-in variable selection based on importance and variable interaction. • Deals efficiently with data having missing values. • Ability to reuse the forest for future estimation. • Computation of the relation between variables and classification. • Proximity calculation between cases. • Can be used for unsupervised learning and outlier detection. • Internal unbiased estimate of the generalization error.
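The growing procedure can be sketched in a few lines of plain Python. This is a deliberately reduced illustration, not the randomForest R package used in the study: each "tree" here is a single-split stump rather than a fully grown tree, the split criterion is assumed to be Gini impurity, and the data and all names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def best_stump(rows, labels, features):
    """Best single split (lowest weighted Gini) among the given predictors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0  # midpoint between observed values
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    return best

def grow_forest(rows, labels, n_trees, m, rng):
    """Grow n_trees stumps, each on a bootstrap sample with m random predictors."""
    n, n_features = len(rows), len(rows[0])
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]   # N cases, with replacement
        features = rng.sample(range(n_features), m)  # m of the M predictors
        stump = best_stump([rows[i] for i in idx], [labels[i] for i in idx], features)
        if stump is not None:  # skip samples that offer no possible split
            forest.append(stump[1:])
    return forest

def predict(forest, row):
    """Every tree votes for a class label; the label with the most votes wins."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical descriptor values: 4 predictors per peptide, two separated classes.
rows = [[rng.random() for _ in range(4)] for _ in range(10)] + \
       [[2.0 + rng.random() for _ in range(4)] for _ in range(10)]
labels = ["NAMP"] * 10 + ["AMP"] * 10
forest = grow_forest(rows, labels, n_trees=25, m=2, rng=rng)
print(predict(forest, [0.5, 0.5, 0.5, 0.5]), predict(forest, [2.5, 2.5, 2.5, 2.5]))
```

A real random forest repeats the random predictor selection at every node of a deep tree; the stump keeps the sketch short while still showing bootstrap sampling, predictor subsampling and voting.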
  • 19. ANN
  • 20. ANN.. • An ANN is a complex, interconnected network of perceptrons.
  • 21. Perceptron learning rule • It involves learning a weight vector so that the perceptron predicts the correct ±1 output. • It is a method to alter and readjust the weights. Perceptron rule • Assign initial weights randomly. • Then iteratively apply the perceptron to each training example. • If the perceptron miscalculates the output, readjust the weights. Repeat this. Delta rule • The perceptron rule fails to converge in the nonlinearly separable case. • The delta rule is based on the gradient descent search algorithm. • It searches for suitable weights in a hypothesis space of weights.
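Both rules can be compared side by side on a tiny problem. The Python sketch below trains a single unit on the logical AND function with ±1 targets, once with the perceptron rule (weights change only on a misclassification) and once with the delta rule (batch gradient descent on the squared error of the unthresholded output). The learning rate, epoch counts, toy data and function names are assumptions for illustration, not settings from the study.

```python
import random

def predict(weights, bias, x):
    """Thresholded perceptron output: +1 or -1."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else -1

def train_perceptron(samples, labels, rate=0.1, max_epochs=1000, seed=0):
    """Perceptron rule: readjust the weights whenever an example is misclassified."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.05, 0.05) for _ in samples[0]]  # random initial weights
    bias = rng.uniform(-0.05, 0.05)
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            output = predict(weights, bias, x)
            if output != target:
                step = rate * (target - output)  # w <- w + rate * (t - o) * x
                weights = [w + step * xi for w, xi in zip(weights, x)]
                bias += step
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias

def train_delta(samples, labels, rate=0.1, epochs=1000):
    """Delta rule: batch gradient descent on the squared error of the linear output."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, target in zip(samples, labels):
            error = target - (bias + sum(w * xi for w, xi in zip(weights, x)))
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w + rate * g for w, g in zip(weights, grad_w)]
        bias += rate * grad_b
    return weights, bias

# Logical AND with +1/-1 targets: a linearly separable toy problem.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
p_weights, p_bias = train_perceptron(samples, labels)
d_weights, d_bias = train_delta(samples, labels)
print([predict(p_weights, p_bias, x) for x in samples])
print([predict(d_weights, d_bias, x) for x in samples])
```

On separable data both rules end up classifying every example correctly; the practical difference is that the delta rule still settles on a least-squares fit when the data are not separable, whereas the perceptron rule keeps oscillating.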
  • 22. Methodology Key issues • Data representation • Cross validation • Measurement of classifier’s performance: sensitivity, specificity, Matthews correlation coefficient (MCC), prediction accuracy
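The four performance measures listed here all follow from the counts of a binary confusion matrix. A minimal Python sketch is given below, treating AMP as the positive class. The counts in the example are invented for illustration and are not results from this study; the function name is likewise an assumption.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # fraction of true AMPs recovered
    specificity = tn / (tn + fp)                # fraction of non-AMPs rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # fraction of all calls that are correct
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts: 100 positives (90 found), 100 negatives (80 rejected).
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```

MCC ranges from -1 to +1 and stays informative when the two classes differ in size, which is one reason to report it next to plain accuracy.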
  • 23. Methodology … • CAMP currently contains 4020 AMPs. • Sequences containing X were removed. • Redundant sequences were removed with CD-HIT (cut-off of 0.9). • Final negative dataset - 4011 sequences. • Perl script to calculate 257 peptide features. • Train and test data split 70:30. • Best 64 features selected by RF Gini score. • Package randomForest in R for RF: 1000 trees and default mtry. • kernlab package for SVM, polynomial kernel. • nnet package for ANN: log-linear model with 65 weights. • Package “ROCR” for evaluation.
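Two of the data preparation steps above, dropping sequences that contain X and splitting 70:30 into training and test sets, can be sketched in Python as follows. Redundancy removal with CD-HIT is an external tool and is not reproduced here. The sequences are made up, and the random shuffling with a fixed seed is an assumption, since the slides do not say how the split was drawn.

```python
import random

def remove_ambiguous(sequences):
    """Drop sequences that contain the unknown-residue symbol X."""
    return [s for s in sequences if "X" not in s.upper()]

def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and split it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical peptide sequences, for illustration only.
raw = [
    "GKKLLKEL", "KWKLFKKI", "GLFDIVKK", "ALWKTLLK", "FLPIIAKL", "GIGAVLKV",
    "KKLLXKLL", "RRWWRRWW", "LLGDFFRK", "AXKKLLKK", "VNWKKILG", "GWLKKIGK",
]
clean = remove_ambiguous(raw)
train, test = split_train_test(clean)
print(len(raw), len(clean), len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the same test peptides are held out each time the models are compared.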
  • 24. Results • 1470 AMP and 532 NAMP in test dataset. • RF shows the best prediction accuracy.
      Algorithm   MCC against test dataset   Prediction accuracy (%)   AUC of ROC curve
      RF          0.87                       94.2                      0.98
      SVM         0.82                       92.3                      0.97
      ANN         0.74                       87.9                      0.94
  • 25. Comparison with other prediction tools
      Prediction accuracy (%)
      Server / tool   RF     SVM    ANN     SW   QM
      Our method      94.2   92.3   87.9    --   --
      AntiBP          --     92.1   88.17   --   90.37
      AMPA            --     --     --      85   --
      Random forest (RF), support vector machines (SVM), artificial neural network (ANN), sliding window (SW), quantitative matrix (QM)
  • 26. Figure 1 - Plot of cumulative error rates in RF: black (overall), red - class 0 (AMP), green - class 1 (NAMP). Figure 2 - A variable importance plot. Variable importance is determined by mean decrease in Gini score. Figure 3 - Scatter plot of RF model (red triangle - AMP and black circle - NAMP).
  • 27. Conclusions General conclusions • Prediction tools are crucial for the design and synthesis of novel AMPs. • The sequence of an AMP plays an important role in its antimicrobial activity. • It is necessary to understand the role of peptide features in antimicrobial activity. • Prediction accuracy relies on the relevant information contained within the descriptors. Specific conclusions • RF has the highest prediction performance. The ensemble technique seems to be the reason behind this. • The best 64 peptide features were identified. • The prediction tools developed during this study will help in identifying new potential AMPs.
  • 28. Future perspective • Better prediction methods - by incorporating diverse peptide features and a more stringent noise removal strategy. • Antimicrobial region prediction in a peptide would be very useful. • Developing a benchmark dataset would be a great milestone. • Position specific scoring matrix (PSSM) based prediction. • Classifying a predicted AMP into further subfamilies based on function. Although this work is complete, it still leaves room for improvement in accuracy and methodology.
  • 29. Availability & Publication • Version 2 of CAMP http://www.bicnirrh.res.in/antimicrobial/ • The publication on CAMP version 2 has been communicated to Nucleic Acids Research (NAR) http://nar.oxfordjournals.org/.