Machine Learning Model to Predict the
Activity of Short Antimicrobial Peptides
from Antimicrobial Peptide Sets (QSAR)
By: Kashaf Naz
Outline
1. What peptides are and their activity
2. List of libraries I used
3. Importance of Pfeature composition in computing AMP
4. What is Lazypredict
5. What is Random Forest
6. Goal: display the DataFrame of the dataset after feature selection (variance threshold)
Peptides
• Peptides are short chains of between
two and fifty amino acids, linked by
peptide bonds. Chains of fewer than ten
or fifteen amino acids are called
oligopeptides, and include dipeptides,
tripeptides, and tetrapeptides. A
polypeptide is a longer, continuous,
unbranched peptide chain of up to
approximately fifty amino acids.
Antimicrobial Peptide
• Antimicrobial peptides are a unique and
diverse group of molecules, which are
divided into subgroups on the basis of
their amino acid composition and
structure. Antimicrobial peptides are
generally between 12 and 50 amino
acids.
Activity of short
Antimicrobial Peptides
• Antimicrobial peptides (AMPs) are a
class of short, usually positively charged
polypeptides that exist in humans,
animals, and plants. Considering the
increasing number of drug-resistant
pathogens,
the antimicrobial activity of AMPs has
attracted much attention.
Predict activity of short Antimicrobial Peptides
We have to play around with these:
• Conda: where we install packages such as Python; our working environment
• Lazypredict: AutoML
• Pfeature: lets us compute amino acid properties, which are crucial for quantifying the molecular properties of peptides
• Jupyter Notebook / Colab
• CD-HIT (from Bioconda): lets us filter out redundancy in peptide sequences, meaning that peptides that are very similar are removed, so we get a non-redundant, unique subset of peptide sequences to work with
• Pandas: DataFrames for viewing the data
• Python: for programming
• Random Forest: classifier modeling
• Matplotlib: graph visualization
Pfeature Composition Table
Feature class   Description                         Function
AAC             Amino acid composition              aac_wp
DPC             Dipeptide composition               dpc_wp
TPC             Tripeptide composition              tpc_wp
ABC             Atom and bond composition           atc_wp, btc_wp
PCP             Physico-chemical properties         pcp_wp
AAI             Amino acid index composition        aai_wp
RRI             Repetitive residue information      rri_wp
DDR             Distance distribution of residues   ddr_wp
A glance at a FASTA file of amino acid sequences
• In bioinformatics and biochemistry, the
FASTA format is a text-based format for
representing either nucleotide sequences
or amino acid (protein) sequences, in
which nucleotides or amino acids are
represented using single-letter codes. The
format also allows for sequence names and
comments to precede the sequences.
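To make the format concrete, here is a minimal sketch of a FASTA reader in plain Python. The function name `parse_fasta` and the two example peptides are made up for illustration; real projects often use an existing parser instead.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if there is one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example input (not taken from the dataset in these slides)
example = """>pep1 hypothetical antimicrobial peptide
GIGKFLHSAKKF
GKAFVGEIMNS
>pep2
KWKLFKKI
"""
records = parse_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

Each record starts with a ">" line holding the name and optional comment; everything until the next ">" is the sequence in single-letter codes.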
Define functions for calculating the different features
Amino acid composition (AAC)
from Pfeature.pfeature import aac_wp
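As a rough idea of what an amino acid composition descriptor measures, the sketch below computes the percentage of each of the 20 standard residues in one sequence. It is a plain-Python illustration, not Pfeature's implementation; the function name and the percentage scaling are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the percentage of each standard amino acid in the sequence."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")
    # One feature per residue: count divided by sequence length, as a percentage
    return {aa: 100.0 * sequence.count(aa) / length for aa in AMINO_ACIDS}


# Hypothetical example peptide
composition = amino_acid_composition("KWKLFKKI")
print(composition["K"], composition["W"])
```

Every peptide, whatever its length, becomes a fixed-length vector of 20 numbers, which is what a classifier needs as input.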
Define functions for calculating the different features
Tripeptide composition (TPC)
from Pfeature.pfeature import tpc_wp
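Tripeptide composition extends the same idea to overlapping windows of three residues, giving 20 × 20 × 20 = 8000 features. The sketch below is an illustration only; the function name and the normalization by the number of windows are assumptions and may differ from what Pfeature does.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tripeptide_composition(sequence):
    """Return the percentage of each possible tripeptide among all overlapping windows."""
    sequence = sequence.upper()
    n_windows = len(sequence) - 2
    if n_windows < 1:
        raise ValueError("sequence must have at least three residues")
    # Start every one of the 8000 possible tripeptides at zero
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    for i in range(n_windows):
        tri = sequence[i:i + 3]
        if tri in counts:  # windows with non-standard letters are ignored
            counts[tri] += 1
    return {tri: 100.0 * c / n_windows for tri, c in counts.items()}


# Hypothetical example peptide: windows are KKK, KKK, KKW
tpc_values = tripeptide_composition("KKKKW")
print(len(tpc_values), tpc_values["KKW"])
```

Because short peptides contain only a handful of windows, most of the 8000 columns are zero, which is one reason a feature selection step such as a variance threshold is useful afterwards.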
Calculate features for both positive and negative classes, combine the two classes, and merge with class labels
pos = 'train_po_cdhit.txt'
neg = 'train_ne_cdhit.txt'
feature = feature_calc(pos, neg, aac)  # AAC
Tripeptide composition (TPC)
pos = 'train_po_cdhit.txt'
neg = 'train_ne_cdhit.txt'
feature = feature_calc(pos, neg, tpc)  # TPC
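The slides do not show the body of `feature_calc`, so the following is only a guess at its shape: compute the features for each class, label the rows, and stack them. Here the inputs are lists of sequences rather than file names, and `toy_length_feature` is a hypothetical stand-in for a real descriptor function.

```python
import pandas as pd


def feature_calc(pos, neg, feature_fn):
    """Compute features for both classes and stack them with a 'class' column."""
    pos_features = feature_fn(pos).copy()
    neg_features = feature_fn(neg).copy()
    # Attach the class label to every row before combining
    pos_features["class"] = "positive"
    neg_features["class"] = "negative"
    return pd.concat([pos_features, neg_features], ignore_index=True)


def toy_length_feature(sequences):
    """Hypothetical descriptor: one column holding the sequence length."""
    return pd.DataFrame({"length": [len(s) for s in sequences]})


# Hypothetical example data
feature = feature_calc(["KWKLFKKI", "GIGKFLHSAKKF"], ["AAAA"], toy_length_feature)
print(feature)
```

Passing the descriptor as an argument is what lets the same call compute AAC, TPC, or any other feature class.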
Quickly compare >30 ML algorithms
Lazypredict: the AutoML library
Lazy Predict helps build a lot of basic models without much code and helps you understand which models work better without any parameter tuning.
There are two classes, LazyClassifier and LazyRegressor, for classification and regression respectively. Import the classifier class if your problem is classification, and the regressor class if you have a regression problem.
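Data split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
Defines and builds the LazyClassifier
from lazypredict.Supervised import LazyClassifier
from sklearn.metrics import matthews_corrcoef
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=matthews_corrcoef)
# Fit on the training set and evaluate on the training set itself
models_train, predictions_train = clf.fit(X_train, X_train, y_train, y_train)
# Fit on the training set and evaluate on the held-out test set
models_test, predictions_test = clf.fit(X_train, X_test, y_train, y_test)
Prints the model performance (Training set)
models_train
Prints the model performance (Test set)
models_test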
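Conceptually, this kind of comparison fits several models on the same split and tabulates their scores. The sketch below does that by hand for three scikit-learn classifiers on synthetic data; it is not how Lazypredict is implemented, and the dataset, the choice of models, and all variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the peptide feature matrix and class labels
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = []
for name, model in candidates.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Record accuracy and Matthews correlation coefficient on the test set
    results.append(
        (name, accuracy_score(y_test, y_pred), matthews_corrcoef(y_test, y_pred))
    )

# Rank the models by MCC, best first
results.sort(key=lambda row: row[2], reverse=True)
for name, acc, mcc in results:
    print(f"{name:20s} accuracy={acc:.3f} MCC={mcc:.3f}")
```

MCC is a useful companion to accuracy because it accounts for all four cells of the confusion matrix, which matters when the classes are imbalanced.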
Plot of Accuracy
Plot of MCC
What is Random Forest?
• Random forest is a technique used in
predictive modeling and behavior
analysis, and it is built on decision trees.
It contains many decision trees, each
representing a distinct instance of the
classification of the data input into the
random forest. The technique considers
the trees' predictions individually and
takes the class with the majority of
votes as the selected prediction.
• The random forest technique can
handle large data sets because it can
work with many variables, running
into the thousands.
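The voting step can be sketched in a few lines. The labels and the function name below are hypothetical, and this shows plain majority voting only; library implementations may combine the trees differently, for example by averaging predicted class probabilities.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for one peptide
votes = ["AMP", "non-AMP", "AMP", "AMP", "non-AMP"]
prediction = majority_vote(votes)
print(prediction)
```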
Build random forest model
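The slide shows the model-building code only as an image, so here is a minimal sketch of what such a step might look like with scikit-learn. The synthetic data, the number of trees, and the variable names are assumptions rather than the settings used in the presentation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the peptide feature matrix and class labels
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the forest on the training set only
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set
y_test_pred = rf.predict(X_test)
mcc_test = matthews_corrcoef(y_test, y_test_pred)
print(f"Test MCC: {mcc_test:.3f}")
```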
Receiver operating characteristic (ROC) curve
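An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below computes those points and the area under the curve with scikit-learn on synthetic data; the data and variable names are assumptions, and the plotting itself is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the peptide feature matrix and class labels
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

# The ROC curve needs a score per sample: the predicted probability of class 1
y_score = rf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
print(f"Test AUC: {roc_auc:.3f}")
```

Passing `fpr` and `tpr` to a line plot gives the curve; an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.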
Combine feature names and Gini values into a DataFrame
Plot of feature importance
Sort by Gini in descending order
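These steps can be sketched as follows, assuming a fitted scikit-learn random forest whose `feature_importances_` attribute holds the Gini-based importances. The synthetic data and the column names "feature" and "Gini" are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the peptide feature matrix and class labels
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Combine feature names and Gini importances into one DataFrame
importance = pd.DataFrame({"feature": feature_names, "Gini": rf.feature_importances_})

# Sort by Gini in descending order so the most important features come first
importance = importance.sort_values("Gini", ascending=False).reset_index(drop=True)
print(importance.head())
```

The sorted table can then be drawn as a horizontal bar chart to produce the feature importance plot.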
Thank You
