Prediction of proteins for insecticidal activity using the Python toolkit iFeature
ABSTRACT
To improve crop yield, agriculture companies have successfully developed insect-resistant crops by expressing insecticidal (insect-killing) proteins in plants. As a leader in the agricultural biotechnology industry, Bayer tests hundreds of genes every year for insecticidal activity in its proprietary pipeline to develop the next generation of insect control solutions. Identifying and nominating insecticidal proteins with traditional methods such as BLAST and structure similarity has drawbacks: more than 90% of the nominated proteins end up showing little or no activity against insects, and testing these proteins consumes an enormous amount of time and resources. We therefore adopted a machine learning (ML) approach to identify these proteins. We generated numerous features for more than 5000 amino acid sequences using iFeature, a Python toolkit developed by Chen et al. (2018), and built ML models to identify proteins with insecticidal activity. Proteins identified with this method are tested in the pipeline to check their efficacy against insect pests. This presentation discusses the challenges faced while building the model and the methods used to overcome them.
HOW WE BUILT AN ML MODEL
TO PREDICT PROTEINS WITH
INSECTICIDAL ACTIVITY?
Karnam Vasudeva Rao,
Senior Scientist, Data Science,
Monsanto (A Subsidiary of Bayer)
CONTENTS
▰ What are insecticidal proteins?
▰ Why machine learning for protein activity identification?
▰ Different approaches used by researchers
▰ Why not general methods?
▰ iFeature Python toolkit
▰ Why did we choose iFeature?
▰ What features does iFeature have?
▰ How did we adapt it to our needs?
▰ What were the challenges?
▰ How did we overcome them?
▰ Key learnings
IMPROVE CROP YIELD BY DEVELOPING PEST-RESISTANT
CROPS THAT EXPRESS INSECTICIDAL PROTEINS
WHY DO WE NEED ML FOR GENE NOMINATIONS?
[Diagram: current state vs. future state of the nomination pipeline]
What? Predict protein activity against insect pests from amino acid sequence features, to enable quality nominations to the insect control pipeline at Bayer.
Why? Hundreds of proteins are nominated and analyzed each year, and many nominations have turned out to be inactive proteins/toxins. The goal is to develop a model that predicts the propensity for toxicity.
How? Extract features from more than 5000 protein (amino acid) sequences and develop a predictive model, trained on historical data, that predicts inactive toxins.
THREE MAJOR APPROACHES ARE USED BY RESEARCHERS TO PREDICT PROTEIN FUNCTIONS
1. Sequence similarity between amino acid (AA) sequences
2. Protein structure comparison
3. Sequence- and structure-derived features
Disadvantages of the traditional methods:
• A high-similarity BLAST hit does not always imply homology.
• Proteins with the same function can have different structures.
• Proteins that have diverged from a common ancestral gene may have the same function but different sequences.
• Sequence similarity-based approaches are often inadequate when no similar sequences are available, or when the similarity among known protein sequences is statistically weak (the so-called "twilight zone" or "midnight zone") (reference: Proteome Science 2009, 7:27).
• Biological experiments for protein identification are time-consuming and resource-intensive.
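To make the "twilight zone" point concrete, here is a minimal pure-Python sketch that computes percent identity between two sequences that have already been aligned (gaps written as `-`), counting only columns where neither sequence has a gap. Identity conventions differ between alignment tools, and the 30% cutoff in the hypothetical `in_twilight_zone` helper is an illustrative assumption, not a value taken from the cited reference.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contain a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def in_twilight_zone(identity: float, cutoff: float = 30.0) -> bool:
    # Hypothetical cutoff for illustration only: where similarity stops being
    # informative depends on alignment length and on the study consulted.
    return identity < cutoff


print(percent_identity("MKT-LV", "MKSALV"))  # 4 identical of 5 ungapped columns
```

A low identity score alone says nothing about function, which is exactly why the talk turns to sequence-derived features instead.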
iFeature - AN OPEN-SOURCE PYTHON TOOLKIT FOR
PREDICTING PROTEIN ACTIVITY
iFeature
▰ http://iFeature.erc.monash.edu/
▰ https://github.com/Superzchen/iFeature/
▰ Features:
▰ Protein length, molecular weight, number of atoms,
grand average of hydropathicity (GRAVY), amino
acid composition, periodicity, physicochemical
properties, predicted secondary structures,
subcellular location, sequence motifs or highly
conserved regions, classification of protein function,
hydrophobicity, solvent accessibility, secondary
structure, surface tension, charge, polarisability,
polarity, and normalized van der Waals volume and
annotations in protein databases.
• DPPI: Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics, 34, 2018, i802–i810.
• DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics, 34(4), 2018, 660–668.
• Classifiers: Predicting protein function by machine learning on amino acid sequences – a critical evaluation. BMC Genomics 2007, 8:78.
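iFeature computes descriptors like these for you; the pure-Python sketch below only illustrates what two of the simplest ones on the list measure: amino acid composition (the fraction of each of the 20 standard residues) and GRAVY (the mean Kyte-Doolittle hydropathy over the sequence). It is not iFeature's code, and the helper names `aac` and `gravy` are hypothetical. Non-standard residue letters are simply skipped here, which is an assumption about how to treat them.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> dict:
    """Amino acid composition: fraction of each standard residue (20 features)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]  # skip non-standard letters
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


print(round(gravy("MKTAYIAKQR"), 3))
```

Positive GRAVY values indicate an overall hydrophobic sequence and negative values a hydrophilic one, which is one reason this single number is a common baseline feature.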
iFeature - AN OPEN-SOURCE PYTHON TOOLKIT
GitHub repository with code, usage instructions and examples.
SAMPLE DATA
toxin  sequence  score
protein3345  MNSYQNQYEILESSSNNTNMPNRYPFANDPNIFPINLDACQGRPWQDTWKSVSDIVTIGTYLIQFLREPGIGGIPVILSIINKLIPSSG  0
protein10357  MSDLEVKIGVNPADVRYTANFKVAPNDGYVMYEKNTPIIPEIGVNITVINTGREEMEVHYEWAPPFGGWQCASTTIIPPDGKPVYIA  0
protein7062  MSINIDPSKEFVKVSNFAGYEIATSQDSEEEGANLIIYYTADPYLLFYLDEERNNGILVSRRTGFVIGVKSGSNKDGELIIQCEWDGEPYS  0
protein000023  MKICVVNILLGLLMIVGESAANIGYADLTTNVYFVATIKSSTCQMSLEGGTAGGGDSYTIPVGSNGKVGAIDIINGTENAMANFSLDI  0
protein3518  MKSISKKVMAGLLVGATSLSIWAPISEAAAPENNRYYNIALKSNTKKVWNVSQASNDNDRAIVLWQGGSADHERFAFFQLDGGA  0
protein10355  MGIKKTIKFILCLSISLCILNYPSISFAETLDTNSSSVKSKSDIDTGIANLNYNNREVLAVNGDRVDSFVPKEGLNSNDKFIVVERNKKSL  0
protein5481  MENSNYFEKNNFSQEDSALDSLLNTFLVIQNKKTNQVIGRPEHYIQKGIITYYFINLENEADIPEQQLILYKLDNKSYYIVSRNKSAYYSF  0
protein000025  MKRIFFFIPLILGLVACADDDSFSTSTGLRLDFPSDTIKLDTVFSRTASSTYTFWVNNRNDNGVKLQSVRLKRGNQTGFRVNVDGMY  0
protein3918  MNGGKNMNQNNQNEMQIIDSSSNDFSQSNRYPRYPLAKESNYKDWLASCDESNVDTLSTTSDVKGSVSRVLGIVNQILGFLGLGF  0
protein000021  MSNDIYGSSTELIANSIYETDYHVLLGIRNSNILFMTPHGGGVETGATELSIASGGTDHNYYCFEGWRTSNNGDMHVTSANFNEPVC  0
protein9439  MKKKVSMMLTCVLLAPLFLNGNAPVAHAGDPFLITSIDEPTIDREGLIGYYYREDQFKNLQLFTPTRNHTLVYDQGTARDLLADSQQQ  1
protein8184  MNQKKYIFMKPISILSIVCFCVSITPTSSLADMYRSRGNFTSKNENTKHTNEYYPRAIFNPYIEPAPEIITETRFASIKSTDTIAITTKNHPK  0
protein2126  MTKNHKKILSMTLVTSMLAGTYIPTAYTAFAETEQKEGSQENQTGLINKGSLPLDSYGLFENPYKGVTFDQFMNAFNNNTWNPLLV  0
protein9438  MKKKITKTLLCATMGISILTPLAVSAKTEDNNEQQLITQINQRENSFPNVGLGTQWLFQYYDKYLRANGLLRVAPVVTVEDLEVKNSY  0
• 5000+ amino acid sequences and activity scores (0-5).
• 0-5: inactive to highly active.
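Before feature extraction, rows like these have to be turned into model-ready inputs. The sketch below, under stated assumptions, parses whitespace-separated `toxin sequence score` rows, collapses the 0-5 activity score into a binary label (0 = inactive, 1-5 = active, consistent with the inactive-to-highly-active scale above), reports the class counts, and writes plain FASTA, the format of iFeature's example input files. All function names are hypothetical, and iFeature may accept additional header fields, so check its README before relying on this exact header layout.

```python
from collections import Counter


def parse_rows(lines):
    """Parse 'toxin sequence score' rows into (name, sequence, score) tuples."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] == "toxin":
            continue  # skip the header row and malformed lines
        name, sequence, score = parts
        records.append((name, sequence, int(score)))
    return records


def binarize(score):
    """Map the 0-5 activity score to 0 (inactive) or 1 (active, scores 1-5)."""
    if not 0 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return 1 if score >= 1 else 0


def class_counts(records):
    """Count inactive (0) and active (1) examples to expose class imbalance."""
    return Counter(binarize(score) for _, _, score in records)


def to_fasta(records):
    """Write records as plain FASTA text (one '>name' header per sequence)."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence, _ in records)


rows = ["toxin sequence score", "protA MKTAYIAK 0", "protB MKKKVSMM 3"]
records = parse_rows(rows)
print(class_counts(records))
print(to_fasta(records), end="")
```

Checking the class counts early matters here: if actives are rare, accuracy alone will look deceptively good later on.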
iFEATURE POSSESSES 37 FEATURE DESCRIPTORS
• iFeature.py: extracts the feature descriptors, e.g. composition of k-spaced amino acid pairs (CKSAAP) or dipeptide deviation from expected mean (DDE):
python iFeature.py --file examples/test-protein.txt --type CKSAAP
python iFeature.py --file examples/test-protein.txt --type DDE
• iFeaturePseKRAAC.py: program used to extract the 16 types of pseudo K-tuple reduced amino acid composition (PseKRAAC) feature descriptors.
• feaSelector.py: program used to implement the feature selection algorithms.
• cluster.py: program used for running the feature or sample clustering algorithms.
• pcaAnalysis.py: three dimensionality reduction algorithms (PCA, LDA and t-SNE).
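To show what the CKSAAP command computes, here is an independent pure-Python sketch (not iFeature's code). For each gap k, it counts residue pairs separated by k positions and normalizes by the number of pairs observed at that gap. With k from 0 to 5 this gives 6 × 400 = 2400 features, matching the CKSAAP dimension in iFeature's descriptor list. The function name and the `AC.gap0` column naming are hypothetical; iFeature's own naming and edge-case handling may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence: str, k_max: int = 5) -> dict:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max."""
    seq = sequence.upper()
    features = {}
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair residue i with residue i + k + 1 (k residues in between).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # ignore pairs containing non-standard letters
                counts[pair] += 1
                total += 1
        for pair in PAIRS:
            features[f"{pair}.gap{k}"] = counts[pair] / total if total else 0.0
    return features


vector = cksaap("MKTAYIAKQRQISFVKSHFSRQ")
print(len(vector))
```

Because the vector length is fixed regardless of sequence length, proteins of different sizes can be compared directly, which is why fixed-length descriptors like this suit tabular models.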
LIST OF VARIOUS DESCRIPTORS CALCULATED BY iFEATURE
Descriptor group          Descriptor                                          Dimension
AA composition            Amino acid composition (AAC)                        20
                          Enhanced amino acid composition (EAAC)              variable
                          Composition of k-spaced AA pairs (CKSAAP)           2400
                          Dipeptide composition (DPC)                         400
                          Dipeptide deviation from expected mean (DDE)        400
                          Tripeptide composition (TPC)                        8000
Grouped AA composition    Grouped amino acid composition (GAAC)               5
                          Enhanced grouped AA composition (GEAAC)             variable
                          Composition of k-spaced AA group pairs (CKSAAGP)    150
                          Grouped dipeptide composition (GDPC)                25
                          Grouped tripeptide composition (GTPC)               125
Binary                    Binary (BINARY)                                     variable
Autocorrelation           Moran (Moran)                                       240
                          Geary (Geary)                                       240
                          Normalized Moreau-Broto (NMBroto)                   240
C/T/D                     Composition (CTDC)                                  39
                          Transition (CTDT)                                   39
                          Distribution (CTDD)                                 195
Conjoint triad            Conjoint triad (CTriad)                             343
                          Conjoint k-spaced triad (KSCTriad)                  343 × (k+1)
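The conjoint triad (CTriad) row above explains its 343 dimensions: residues are mapped to 7 classes and every triad of consecutive classes is counted, giving 7³ = 343 combinations. The sketch below assumes the seven-class grouping commonly used for this descriptor (originally from Shen et al., 2007) and normalizes counts to frequencies. The published method, and iFeature's implementation, may apply a different normalization, so treat this as an illustration rather than iFeature's code; the function name is hypothetical.

```python
from itertools import product

# Seven residue classes commonly used for the conjoint triad descriptor.
CTRIAD_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
GROUP_OF = {aa: str(i + 1) for i, members in enumerate(CTRIAD_GROUPS) for aa in members}
TRIADS = ["".join(t) for t in product("1234567", repeat=3)]  # 7**3 = 343 triads


def conjoint_triad(sequence: str) -> dict:
    """Frequency of each class triad among consecutive residues (343 features)."""
    # Non-standard residues are dropped, so their neighbours become adjacent (a simplification).
    classes = [GROUP_OF[aa] for aa in sequence.upper() if aa in GROUP_OF]
    counts = dict.fromkeys(TRIADS, 0)
    for i in range(len(classes) - 2):
        counts["".join(classes[i:i + 3])] += 1
    total = sum(counts.values())
    return {triad: (c / total if total else 0.0) for triad, c in counts.items()}


features = conjoint_triad("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))
```

Grouping residues by physicochemical class trades some resolution for robustness: substitutions within a class leave the descriptor unchanged, which is one reason it can be biologically meaningful.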
CHALLENGES IN USING SEQUENCE-BASED ML APPROACHES
1. Data preparation: What is there to explore in the data? Only two variables: sequences and assay values.
2. Feature extraction: No independent variables! Features need to be generated from the sequences.
3. Feature selection: Thousands of features; which ones should we select, and what do they explain?
4. Model building: Which model should we choose?
5. Performance of the models: Confusion matrix. Does the result make biological sense?
Meaningful features for protein function prediction
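Once iFeature has produced thousands of columns, a common first pass is to drop uninformative and redundant features before any model-based selection. This pure-Python sketch applies two generic filters: a variance threshold, then greedy pruning of features that are highly correlated with one already kept. It is not the selection procedure used in the talk, which combined feature importance with domain knowledge, and the threshold defaults and function names are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def variance_filter(columns: dict, threshold: float = 0.0) -> dict:
    """Keep features whose population variance is strictly above the threshold."""
    return {name: values for name, values in columns.items() if pvariance(values) > threshold}


def pearson(x, y) -> float:
    """Pearson correlation; assumes neither input is constant (filter those out first)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_prune(columns: dict, max_abs_r: float = 0.95) -> dict:
    """Greedily keep a feature only if it is not highly correlated with any kept feature."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= max_abs_r for other in kept.values()):
            kept[name] = values
    return kept


features = {
    "AAC_A": [0.10, 0.20, 0.30, 0.40],
    "DPC_AA": [0.20, 0.40, 0.60, 0.80],  # perfectly correlated with AAC_A
    "GAAC_x": [0.50, 0.50, 0.50, 0.50],  # constant across proteins
    "CTriad_111": [0.40, 0.10, 0.30, 0.20],
}
print(list(correlation_prune(variance_filter(features))))
```

Greedy pruning keeps whichever correlated feature comes first, so ordering columns by a preferred criterion (for example, domain relevance) before pruning lets domain knowledge decide which of a redundant pair survives.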
NUMEROUS SEQUENCE FEATURES WERE GENERATED USING iFEATURE
[Figure: the sample data (toxin, sequence, score) passed through iFeature to generate sequence features]
MODEL EVALUATION
RANDOM FOREST WAS THE FAVORITE
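The slide reports that random forest performed best among the models compared. Comparing several models fairly requires evaluating them on the same splits, and with a skewed active/inactive ratio those splits should preserve class proportions. The talk does not say which validation scheme was used; the pure-Python sketch below shows stratified k-fold splitting as one reasonable option (libraries such as scikit-learn provide the same idea as `StratifiedKFold`). The function name and its defaults are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions in each.

    Returns a list of (train_indices, test_indices) tuples.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so every candidate model sees identical folds
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Deal this class's samples round-robin across the folds.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)

    all_indices = set(range(len(labels)))
    return [(sorted(all_indices - set(fold)), sorted(fold)) for fold in folds]


labels = [0] * 8 + [1] * 4  # e.g. 8 inactive, 4 active proteins
for train, test in stratified_kfold(labels, k=4):
    print(len(train), [labels[i] for i in test])
```

Each candidate model (random forest or otherwise) would then be trained on every `train` split and scored on the matching `test` split, so the resulting metrics are directly comparable across models.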
KEY LEARNINGS
FEATURES
▰ iFeature is an ‘all-in-one’ package.
▰ There were very few independent variables before using iFeature, and too many after.
▰ Use domain knowledge, not only feature importance, to choose input variables (e.g. k-spaced descriptors, conjoint triad).
DATA
▰ Data bias can be overcome using domain knowledge: score 0 = inactive; scores 1-5 = active (multiclass to binary).
MODEL BUILDING
▰ Build multiple models instead of one or two, and choose the best based on business needs and parameters.
▰ Where multiple models perform equally well, select a model based on business needs and domain knowledge (false positives vs. false negatives), i.e. sensitivity and specificity.
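The last learning can be made concrete: sensitivity and specificity come straight from the confusion matrix, and the relative cost of false positives (inactive proteins sent to the pipeline) versus false negatives (active proteins missed) can drive the decision threshold. The pure-Python sketch below assumes binary labels (1 = active) and predicted probabilities; the function names and the cost weights are hypothetical, not values from the talk.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = active, 0 = inactive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 (inactive) or 1 (active)")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def pick_threshold(y_true, scores, fp_cost=1.0, fn_cost=1.0):
    """Return (threshold, cost) minimizing fp_cost * FP + fn_cost * FN over observed scores."""
    best = None
    for threshold in sorted(set(scores)):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        _, fp, _, fn = confusion_counts(y_true, y_pred)
        cost = fp_cost * fp + fn_cost * fn
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


y_true = [1, 0, 1, 0]
scores = [0.9, 0.7, 0.5, 0.2]
print(pick_threshold(y_true, scores, fp_cost=1.0, fn_cost=5.0))  # missing actives is costly
print(pick_threshold(y_true, scores, fp_cost=5.0, fn_cost=1.0))  # wasted assays are costly
```

The same model yields different operating points depending on which error the business can better afford, which is why two equally accurate models may still not be interchangeable.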
OTHER APPLICATIONS
▰ iFeature and the approach above can also be applied to identify disease-related proteins and to protein-protein interaction studies.
THANKS!
https://www.linkedin.com/in/karnam-vasudeva-rao-phd-9032759/
vasukarnam@gmail.com
vkarnam@monsanto.com; vasudevarao.karnam@bayer.com
Senior Scientist - Data Science,
Monsanto (Subsidiary of
Bayer), Bengaluru, India.

More Related Content

What's hot

JBEI Research Highlights - October 2017
JBEI Research Highlights - October 2017 JBEI Research Highlights - October 2017
JBEI Research Highlights - October 2017 Irina Silva
 
JBEI Research Highlights - January 2018
JBEI Research Highlights - January 2018  JBEI Research Highlights - January 2018
JBEI Research Highlights - January 2018 Irina Silva
 
JBEI Research Highlight Slides - February 2021
JBEI Research Highlight Slides - February 2021JBEI Research Highlight Slides - February 2021
JBEI Research Highlight Slides - February 2021SaraHarmon4
 
JBEI Research Highlights - February 2018
JBEI Research Highlights - February 2018JBEI Research Highlights - February 2018
JBEI Research Highlights - February 2018Irina Silva
 
JBEI Research Highlights - March 2018
JBEI Research Highlights - March 2018JBEI Research Highlights - March 2018
JBEI Research Highlights - March 2018Irina Silva
 
JBEI Research Highlights - December 2017
JBEI Research Highlights - December 2017 JBEI Research Highlights - December 2017
JBEI Research Highlights - December 2017 Irina Silva
 
JBEI Research Highlights - May 2019
JBEI Research Highlights - May 2019JBEI Research Highlights - May 2019
JBEI Research Highlights - May 2019Irina Silva
 
JBEI Research Highlights - January 2019
JBEI Research Highlights - January 2019JBEI Research Highlights - January 2019
JBEI Research Highlights - January 2019Irina Silva
 
JBEI Research Highlights - March 2019
JBEI Research Highlights - March 2019JBEI Research Highlights - March 2019
JBEI Research Highlights - March 2019Irina Silva
 
JBEI highlights September 2019
JBEI highlights September 2019JBEI highlights September 2019
JBEI highlights September 2019LeahFreemanSloan
 
JBEI highlights November 2019
JBEI highlights November 2019JBEI highlights November 2019
JBEI highlights November 2019LeahFreemanSloan
 
JBEI Research Highlights - April 2019
JBEI Research Highlights - April 2019JBEI Research Highlights - April 2019
JBEI Research Highlights - April 2019Irina Silva
 
September 2021 - JBEI Research Highlights Slides
September 2021 - JBEI Research Highlights SlidesSeptember 2021 - JBEI Research Highlights Slides
September 2021 - JBEI Research Highlights SlidesSaraHarmon4
 
JBEI August 2020 Highlights
JBEI August 2020 HighlightsJBEI August 2020 Highlights
JBEI August 2020 HighlightsLeahFreemanSloan
 
JBEI Highlights - October 2014
JBEI Highlights - October 2014JBEI Highlights - October 2014
JBEI Highlights - October 2014Irina Silva
 
JBEI Highlights December 2014
JBEI Highlights December 2014JBEI Highlights December 2014
JBEI Highlights December 2014Irina Silva
 
JBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsJBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsSaraHarmon4
 
JBEI August 2019 highlights
JBEI August 2019 highlightsJBEI August 2019 highlights
JBEI August 2019 highlightsLeahFreemanSloan
 

What's hot (20)

JBEI Research Highlights - October 2017
JBEI Research Highlights - October 2017 JBEI Research Highlights - October 2017
JBEI Research Highlights - October 2017
 
JBEI Research Highlights - January 2018
JBEI Research Highlights - January 2018  JBEI Research Highlights - January 2018
JBEI Research Highlights - January 2018
 
JBEI Research Highlight Slides - February 2021
JBEI Research Highlight Slides - February 2021JBEI Research Highlight Slides - February 2021
JBEI Research Highlight Slides - February 2021
 
JBEI Research Highlights - February 2018
JBEI Research Highlights - February 2018JBEI Research Highlights - February 2018
JBEI Research Highlights - February 2018
 
JBEI Research Highlights - March 2018
JBEI Research Highlights - March 2018JBEI Research Highlights - March 2018
JBEI Research Highlights - March 2018
 
JBEI Research Highlights - December 2017
JBEI Research Highlights - December 2017 JBEI Research Highlights - December 2017
JBEI Research Highlights - December 2017
 
JBEI Research Highlights - May 2019
JBEI Research Highlights - May 2019JBEI Research Highlights - May 2019
JBEI Research Highlights - May 2019
 
JBEI Research Highlights - January 2019
JBEI Research Highlights - January 2019JBEI Research Highlights - January 2019
JBEI Research Highlights - January 2019
 
JBEI Research Highlights - March 2019
JBEI Research Highlights - March 2019JBEI Research Highlights - March 2019
JBEI Research Highlights - March 2019
 
JBEI highlights September 2019
JBEI highlights September 2019JBEI highlights September 2019
JBEI highlights September 2019
 
20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop
 
JBEI highlights November 2019
JBEI highlights November 2019JBEI highlights November 2019
JBEI highlights November 2019
 
JBEI Research Highlights - April 2019
JBEI Research Highlights - April 2019JBEI Research Highlights - April 2019
JBEI Research Highlights - April 2019
 
September 2021 - JBEI Research Highlights Slides
September 2021 - JBEI Research Highlights SlidesSeptember 2021 - JBEI Research Highlights Slides
September 2021 - JBEI Research Highlights Slides
 
Ripa buffer - Invent Biotechnologies
Ripa buffer - Invent BiotechnologiesRipa buffer - Invent Biotechnologies
Ripa buffer - Invent Biotechnologies
 
JBEI August 2020 Highlights
JBEI August 2020 HighlightsJBEI August 2020 Highlights
JBEI August 2020 Highlights
 
JBEI Highlights - October 2014
JBEI Highlights - October 2014JBEI Highlights - October 2014
JBEI Highlights - October 2014
 
JBEI Highlights December 2014
JBEI Highlights December 2014JBEI Highlights December 2014
JBEI Highlights December 2014
 
JBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsJBEI October 2020 Research Highlights
JBEI October 2020 Research Highlights
 
JBEI August 2019 highlights
JBEI August 2019 highlightsJBEI August 2019 highlights
JBEI August 2019 highlights
 

Similar to Prediction of proteins for insecticidal activity using python toolkit iFeature

Sample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchSample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchQIAGEN
 
Introduction to Bioprocessing Sample Slides
Introduction to Bioprocessing Sample SlidesIntroduction to Bioprocessing Sample Slides
Introduction to Bioprocessing Sample SlidesPeteDeOlympio
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in BioinformaticsAli Kishk
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformaticscontactsoorya
 
Protein microarray
Protein microarrayProtein microarray
Protein microarrayGhalia Nawal
 
Functional annotation
Functional annotationFunctional annotation
Functional annotationRavi Gandham
 
The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...Iddo
 
Proteomics & Metabolomics
Proteomics & MetabolomicsProteomics & Metabolomics
Proteomics & Metabolomicsgumccomm
 
Importance of microbial research
Importance of microbial researchImportance of microbial research
Importance of microbial researchCreative Proteomics
 
Proteomics contributes to your microbial research
Proteomics contributes to your microbial researchProteomics contributes to your microbial research
Proteomics contributes to your microbial researchCreative Proteomics
 
Novozymes Enzyme Stability Prediction
Novozymes Enzyme Stability PredictionNovozymes Enzyme Stability Prediction
Novozymes Enzyme Stability PredictionIttrainingIttraining
 
Protein Qualitative Analysis Services
Protein Qualitative Analysis ServicesProtein Qualitative Analysis Services
Protein Qualitative Analysis ServicesCreative Proteomics
 
JBEI May 2021 - Research Highlights
JBEI May 2021 - Research HighlightsJBEI May 2021 - Research Highlights
JBEI May 2021 - Research HighlightsSaraHarmon4
 
protein microarray-types and approaches.pptx
protein microarray-types and approaches.pptxprotein microarray-types and approaches.pptx
protein microarray-types and approaches.pptxSachin Teotia
 
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...cscpconf
 

Similar to Prediction of proteins for insecticidal activity using python toolkit iFeature (20)

Sample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome ResearchSample Prep Solutions for Microbiome Research
Sample Prep Solutions for Microbiome Research
 
Introduction to Bioprocessing Sample Slides
Introduction to Bioprocessing Sample SlidesIntroduction to Bioprocessing Sample Slides
Introduction to Bioprocessing Sample Slides
 
AI in Bioinformatics
AI in BioinformaticsAI in Bioinformatics
AI in Bioinformatics
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
 
Biopharmaceutical
Biopharmaceutical Biopharmaceutical
Biopharmaceutical
 
Protein microarray
Protein microarrayProtein microarray
Protein microarray
 
Functional annotation
Functional annotationFunctional annotation
Functional annotation
 
The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...
 
Primer designing
Primer designingPrimer designing
Primer designing
 
Proteomics & Metabolomics
Proteomics & MetabolomicsProteomics & Metabolomics
Proteomics & Metabolomics
 
Introduction to Proteogenomics
Introduction to Proteogenomics Introduction to Proteogenomics
Introduction to Proteogenomics
 
Importance of microbial research
Importance of microbial researchImportance of microbial research
Importance of microbial research
 
Proteomics contributes to your microbial research
Proteomics contributes to your microbial researchProteomics contributes to your microbial research
Proteomics contributes to your microbial research
 
Novozymes Enzyme Stability Prediction
Novozymes Enzyme Stability PredictionNovozymes Enzyme Stability Prediction
Novozymes Enzyme Stability Prediction
 
Protein Qualitative Analysis Services
Protein Qualitative Analysis ServicesProtein Qualitative Analysis Services
Protein Qualitative Analysis Services
 
Biopharma VS Small Molecules Therapeutic
Biopharma VS Small Molecules TherapeuticBiopharma VS Small Molecules Therapeutic
Biopharma VS Small Molecules Therapeutic
 
JBEI May 2021 - Research Highlights
JBEI May 2021 - Research HighlightsJBEI May 2021 - Research Highlights
JBEI May 2021 - Research Highlights
 
protein microarray-types and approaches.pptx
protein microarray-types and approaches.pptxprotein microarray-types and approaches.pptx
protein microarray-types and approaches.pptx
 
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...
NACSVMPred: A MACHINE LEARNING APPROACH FOR PREDICTION OF NAC PROTEINS IN RIC...
 

Recently uploaded

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Prediction of proteins for insecticidal activity using python toolkit iFeature

  • 1. ABSTRACT To improve the crop plant yield, agriculture companies have successfully adopted development of insect resistant crops by expressing insecticidal (insect killing) proteins in plants. As a leader in Agriculture Biotechnology industry, Bayer tests hundreds of genes every year for insecticidal activity in their proprietary pipeline to develop next generation of insect control solutions. Identification and nomination insecticidal proteins using traditional methods like blast and structure similarity have some drawbacks because of which more than 90% of the nominated proteins end up displaying no or less activity against insects. The testing of these proteins consumes enormous amount of time and resource. So we adopted machine learning (ML) approach to identify these proteins. We generated numerous features for more than 5000 amino acid sequences using a Python toolkit, iFeature, developed by Chen et al, in 2018 and built ML models to identify proteins with insecticidal activity. Proteins identified using this method are tested in the pipeline to check their efficacy against insect pests. Challenges faced while building the model and methods to overcome those challenges are discussed in this presentation. 1
  • 2. HOW WE BUILT AN ML MODEL TO PREDICT PROTEINS WITH INSECTICIDAL ACTIVITY. Karnam Vasudeva Rao, Senior Scientist, Data Science, Monsanto (a subsidiary of Bayer)
  • 3. CONTENTS ▰ What are insecticidal proteins? ▰ Why machine learning for protein activity identification? ▰ Different approaches used by researchers ▰ Why not general methods? ▰ iFeature Python toolkit ▰ Why did we choose iFeature? ▰ What features does iFeature provide? ▰ How did we adapt it to our needs? ▰ What were the challenges? ▰ How did we overcome them? ▰ Key learnings
  • 4. IMPROVE CROP YIELD BY DEVELOPING PEST-RESISTANT CROPS THAT EXPRESS INSECTICIDAL PROTEINS
  • 5. WHY DO WE NEED ML FOR GENE NOMINATIONS? What? Predict protein activity against insect pests from amino acid sequence features to enable quality nominations to Bayer's insect control pipeline. Why? Hundreds of proteins are nominated and analyzed each year, and many nominations have turned out to be inactive proteins/toxins. The goal is to develop a model that predicts the propensity for toxicity. How? Extract features from more than 5,000 protein (amino acid) sequences and use historical data to build a predictive model that flags inactive toxins. [Diagram: current state vs. future state of the pipeline]
  • 6. THREE MAJOR APPROACHES ARE USED BY RESEARCHERS TO PREDICT PROTEIN FUNCTIONS: (1) sequence similarity between AA sequences; (2) protein structure comparison; (3) sequence- and structure-derived features. Disadvantages of the traditional methods: High BLAST similarity does not always imply homology. Proteins with the same function can have different structures. Proteins that have diverged from a common ancestral gene may share a function but have different sequences. Sequence-similarity approaches are often inadequate when no similar sequences exist or when the similarity among known protein sequences is statistically weak (the so-called "twilight zone" or "midnight zone") (reference: Proteome Science 2009, 7:27). Biological experiments for protein identification are time-consuming and resource-intensive.
  • 7. iFeature: AN OPEN-SOURCE PYTHON TOOLKIT FOR PREDICTING PROTEIN ACTIVITY ▰ http://iFeature.erc.monash.edu/ ▰ https://github.com/Superzchen/iFeature/ ▰ Features: protein length, molecular weight, number of atoms, grand average of hydropathicity (GRAVY), amino acid composition, periodicity, physicochemical properties, predicted secondary structures, subcellular location, sequence motifs or highly conserved regions, classification of protein function, hydrophobicity, solvent accessibility, surface tension, charge, polarisability, polarity, normalized van der Waals volume, and annotations in protein databases. ▰ Related work (classifiers): DPPI, "Predicting protein–protein interactions through sequence-based deep learning", Bioinformatics 34 (2018) i802–i810; DeepGO, "Predicting protein functions from sequence and interactions using a deep ontology-aware classifier", Bioinformatics 34(4) (2018) 660–668; "Predicting protein function by machine learning on amino acid sequences – a critical evaluation", BMC Genomics 8 (2007) 78.
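GRAVY, one of the features listed above, is simply the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The sketch below computes it directly in Python, independently of iFeature (no iFeature API is used); the function name `gravy` is illustrative, and the scale values are the standard Kyte-Doolittle hydropathy index.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # N-terminal fragment of protein9439 from the sample data (illustrative input only).
    print(f"GRAVY = {gravy('MKKKVSMMLTCVLLAPLFLNG'):.3f}")
```

Positive GRAVY values indicate an overall hydrophobic sequence and negative values a hydrophilic one; rejecting non-standard residues keeps the average from being silently skewed.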
  • 8. iFeature: AN OPEN-SOURCE PYTHON TOOLKIT. [Screenshot: GitHub repository with code, usage instructions and examples]
  • 9. SAMPLE DATA
    toxin          score  sequence
    protein3345    0      MNSYQNQYEILESSSNNTNMPNRYPFANDPNIFPINLDACQGRPWQDTWKSVSDIVTIGTYLIQFLREPGIGGIPVILSIINKLIPSSG
    protein10357   0      MSDLEVKIGVNPADVRYTANFKVAPNDGYVMYEKNTPIIPEIGVNITVINTGREEMEVHYEWAPPFGGWQCASTTIIPPDGKPVYIA
    protein7062    0      MSINIDPSKEFVKVSNFAGYEIATSQDSEEEGANLIIYYTADPYLLFYLDEERNNGILVSRRTGFVIGVKSGSNKDGELIIQCEWDGEPYS
    protein000023  0      MKICVVNILLGLLMIVGESAANIGYADLTTNVYFVATIKSSTCQMSLEGGTAGGGDSYTIPVGSNGKVGAIDIINGTENAMANFSLDI
    protein3518    0      MKSISKKVMAGLLVGATSLSIWAPISEAAAPENNRYYNIALKSNTKKVWNVSQASNDNDRAIVLWQGGSADHERFAFFQLDGGA
    protein10355   0      MGIKKTIKFILCLSISLCILNYPSISFAETLDTNSSSVKSKSDIDTGIANLNYNNREVLAVNGDRVDSFVPKEGLNSNDKFIVVERNKKSL
    protein5481    0      MENSNYFEKNNFSQEDSALDSLLNTFLVIQNKKTNQVIGRPEHYIQKGIITYYFINLENEADIPEQQLILYKLDNKSYYIVSRNKSAYYSF
    protein000025  0      MKRIFFFIPLILGLVACADDDSFSTSTGLRLDFPSDTIKLDTVFSRTASSTYTFWVNNRNDNGVKLQSVRLKRGNQTGFRVNVDGMY
    protein3918    0      MNGGKNMNQNNQNEMQIIDSSSNDFSQSNRYPRYPLAKESNYKDWLASCDESNVDTLSTTSDVKGSVSRVLGIVNQILGFLGLGF
    protein000021  0      MSNDIYGSSTELIANSIYETDYHVLLGIRNSNILFMTPHGGGVETGATELSIASGGTDHNYYCFEGWRTSNNGDMHVTSANFNEPVC
    protein9439    1      MKKKVSMMLTCVLLAPLFLNGNAPVAHAGDPFLITSIDEPTIDREGLIGYYYREDQFKNLQLFTPTRNHTLVYDQGTARDLLADSQQQ
    protein8184    0      MNQKKYIFMKPISILSIVCFCVSITPTSSLADMYRSRGNFTSKNENTKHTNEYYPRAIFNPYIEPAPEIITETRFASIKSTDTIAITTKNHPK
    protein2126    0      MTKNHKKILSMTLVTSMLAGTYIPTAYTAFAETEQKEGSQENQTGLINKGSLPLDSYGLFENPYKGVTFDQFMNAFNNNTWNPLLV
    protein9438    0      MKKKITKTLLCATMGISILTPLAVSAKTEDNNEQQLITQINQRENSFPNVGLGTQWLFQYYDKYLRANGLLRVAPVVTVEDLEVKNSY
    ▰ 5000+ amino acid sequences with activity scores 0-5 (0 = inactive, 5 = highly active).
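The sample data above has three fields per record: a toxin ID, an amino acid sequence and an activity score from 0 to 5. A minimal loader might look like the following, assuming (hypothetically) that the data is stored as whitespace-separated text with a `toxin sequence score` header; the `Record` type and `parse_records` function are illustrative names, not part of iFeature.

```python
from dataclasses import dataclass

# The 20 standard amino acids; anything else is rejected up front.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class Record:
    toxin: str
    sequence: str
    score: int  # activity score: 0 = inactive ... 5 = highly active


def parse_records(text: str) -> list[Record]:
    """Parse whitespace-separated 'toxin sequence score' rows, header line first."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if header != ["toxin", "sequence", "score"]:
        raise ValueError(f"unexpected header: {header}")
    records = []
    for fields in rows:
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {fields}")
        toxin, seq, score = fields[0], fields[1].upper(), int(fields[2])
        if not set(seq) <= VALID_AA:
            raise ValueError(f"{toxin}: non-standard residues")
        if not 0 <= score <= 5:
            raise ValueError(f"{toxin}: score {score} outside 0-5")
        records.append(Record(toxin, seq, score))
    return records


if __name__ == "__main__":
    # Shortened rows from the sample data, for illustration only.
    sample = """toxin sequence score
protein3345 MNSYQNQYEILESSSNN 0
protein9439 MKKKVSMMLTCVLLAPL 1"""
    for rec in parse_records(sample):
        print(rec.toxin, len(rec.sequence), rec.score)
```

Validating residues and score ranges at load time catches malformed rows before thousands of features are computed from them.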
  • 10. ▰ iFeature.py: main program, which possesses 37 types of feature descriptors. Example commands:
        python iFeature.py --file examples/test-protein.txt --type CKSAAP   (composition of k-spaced amino acid pairs)
        python iFeature.py --file examples/test-protein.txt --type DDE   (dipeptide deviation from expected mean)
    ▰ iFeaturePseKRAAC.py: extracts the 16 types of pseudo K-tuple reduced amino acid composition (PseKRAAC) feature descriptors.
    ▰ feaSelector.py: implements the feature selection algorithms.
    ▰ cluster.py: runs the feature or sample clustering algorithms.
    ▰ pcaAnalysis.py: provides three dimensionality reduction algorithms (PCA, LDA and t-SNE).
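For intuition about what `--type CKSAAP` computes, here is a minimal pure-Python sketch of the composition of k-spaced amino acid pairs. It follows the usual definition (for each gap k, the frequency of every ordered residue pair separated by k positions) but it is not iFeature's code, and details such as skipping pairs with non-standard residues are assumptions. With gaps k = 0 to 5 it yields 400 × 6 = 2400 features, matching the dimension listed for CKSAAP among the descriptors.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors line up.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence: str, kmax: int = 5) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..kmax.

    For each gap k, count pairs (seq[i], seq[i + k + 1]) and divide by the number
    of valid pairs, giving 400 * (kmax + 1) features.
    """
    seq = sequence.upper()
    if len(seq) < kmax + 2:
        raise ValueError("sequence too short for requested kmax")
    features = []
    for k in range(kmax + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    print(len(cksaap("MKKKVSMMLTCVLLAPLFLNG")))
```

Because each gap block is normalized separately, every block of 400 values sums to 1 for a sequence of standard residues, which makes sequences of different lengths comparable.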
  • 11. LIST OF VARIOUS DESCRIPTORS CALCULATED BY iFeature
    Descriptor group          Descriptor                                        Dimension
    AA composition            Amino acid composition (AAC)                      20
                              Enhanced amino acid composition (EAAC)            variable
                              Composition of k-spaced AA pairs (CKSAAP)         2400
                              Dipeptide composition (DPC)                       400
                              Dipeptide deviation from expected mean (DDE)      400
                              Tripeptide composition (TPC)                      8000
    Grouped AA composition    Grouped amino acid composition (GAAC)             5
                              Enhanced grouped AA composition (GEAAC)           variable
                              Composition of k-spaced AA group pairs (CKSAAGP)  150
                              Grouped dipeptide composition (GDPC)              25
                              Grouped tripeptide composition (GTPC)             125
    Binary                    Binary (BINARY)                                   variable
    Autocorrelation           Moran (Moran)                                     240
                              Geary (Geary)                                     240
                              Normalized Moreau-Broto (NMBroto)                 240
    C/T/D                     Composition (CTDC)                                39
                              Transition (CTDT)                                 39
                              Distribution (CTDD)                               195
    Conjoint triad            Conjoint triad (CTriad)                           343
                              Conjoint k-spaced triad (KSCTriad)                343 × (k+1)
    (variable: the dimension depends on sequence length)
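Three of the simplest descriptors in the table can be written out directly. The sketch below implements AAC (20 features), DPC (400) and DDE (400) in plain Python, assuming sequences contain only uppercase standard residues and are at least two residues long. DDE follows its commonly stated definition: the observed dipeptide composition Dc is compared with a theoretical mean Tm = (Cr/61)(Cs/61), where Cr and Cs are the numbers of codons for the two residues, and scaled by the theoretical variance Tv = Tm(1 - Tm)/(N - 1). Treat it as an illustration, not a drop-in replacement for iFeature's output.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
# Codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each of the 20 residues (20 features)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: frequency of each adjacent residue pair (400 features)."""
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("need at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def dde(seq: str) -> list[float]:
    """Dipeptide deviation from expected mean (400 features): (Dc - Tm) / sqrt(Tv)."""
    n_pairs = len(seq) - 1
    features = []
    for d, dc in zip(DIPEPTIDES, dpc(seq)):
        tm = (CODONS[d[0]] / 61) * (CODONS[d[1]] / 61)  # theoretical mean
        tv = tm * (1 - tm) / n_pairs                     # theoretical variance
        features.append((dc - tm) / sqrt(tv))
    return features


if __name__ == "__main__":
    s = "MKKKVSMMLTCVLLAPLFLNG"
    print(len(aac(s)), len(dpc(s)), len(dde(s)))
```

DDE is worth having alongside DPC because it corrects raw dipeptide counts for how likely each pair is a priori, given how many codons encode each residue.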
  • 12. CHALLENGES IN USING SEQUENCE-BASED ML APPROACHES
    ▰ 1. Data preparation: What should we explore in the data? There are only two variables: sequences and assay values.
    ▰ 2. Feature extraction: There are no independent variables, so features must be generated from the sequences (iFeature: meaningful features for protein function prediction).
    ▰ 3. Feature selection: Thousands of features; which ones should we select, and what do they explain?
    ▰ 4. Model building: Which model should we choose?
    ▰ 5. Performance of the models: Confusion matrix; do the results make biological sense?
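The feature-selection challenge above (thousands of generated features) is often approached first with simple unsupervised filters before any model-based importance ranking. The sketch below shows one such filter, not necessarily the authors' approach: it drops near-constant features, then greedily drops any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold. The thresholds `min_var` and `max_corr` are hypothetical defaults to tune.

```python
from statistics import pvariance


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def filter_features(columns: dict[str, list[float]], min_var: float = 1e-8,
                    max_corr: float = 0.95) -> list[str]:
    """Return names of features that vary across samples and are not near-duplicates.

    columns maps feature name -> values across all samples (same sample order).
    Features are scanned in insertion order; a feature is dropped if its variance
    is below min_var or its |r| with an already-kept feature exceeds max_corr.
    """
    kept: list[str] = []
    for name, values in columns.items():
        if pvariance(values) < min_var:
            continue
        if any(abs(pearson(values, columns[k])) > max_corr for k in kept):
            continue
        kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [5, 5, 5, 5], "d": [4, 1, 3, 2]}
    print(filter_features(toy))
```

The pairwise scan is quadratic in the number of features, so for blocks as large as TPC (8000 features) a vectorized library would be more practical; the filter also ignores the labels, so it complements rather than replaces importance-based and domain-driven selection.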
  • 13. NUMEROUS SEQUENCE FEATURES WERE GENERATED USING iFeature (the slide repeats the toxin/sequence/score sample data shown on slide 9, with truncated sequences)
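Among the generated features, the conjoint triad descriptors (CTriad, 343 features; KSCTriad, 343 × (k+1)) group the 20 amino acids into seven classes and count class triplets. A minimal sketch follows, assuming the seven-class grouping commonly used for conjoint triads ({A,G,V}, {I,L,F,P}, {Y,M,T,S}, {H,N,Q,W}, {R,K}, {D,E}, {C}) and assuming the k-spaced variant takes residues at positions i, i+g+1 and i+2(g+1) for each gap g. Counts are normalized here as simple frequencies; iFeature and the original method may normalize differently, so this illustrates the idea rather than reproducing iFeature's numbers.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: str(i + 1) for i, grp in enumerate(GROUPS) for aa in grp}
TRIADS = ["".join(t) for t in product("1234567", repeat=3)]  # 7**3 = 343


def ks_ctriad(sequence: str, k: int = 0) -> list[float]:
    """Conjoint k-spaced triad frequencies for gaps 0..k: 343 * (k + 1) features.

    With k = 0 this is the plain conjoint triad over consecutive residues.
    Triads that touch a non-standard residue are skipped.
    """
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    features = []
    for gap in range(k + 1):
        step = gap + 1
        counts = dict.fromkeys(TRIADS, 0)
        total = 0
        for i in range(len(classes) - 2 * step):
            a, b, c = classes[i], classes[i + step], classes[i + 2 * step]
            if a and b and c:
                counts[a + b + c] += 1
                total += 1
        features.extend(counts[t] / total if total else 0.0 for t in TRIADS)
    return features


if __name__ == "__main__":
    print(len(ks_ctriad("MKKKVSMMLTCVLLAPLFLNG", k=1)))
```

Reducing 20 residues to 7 physicochemical classes keeps the triad space small (343 instead of the 8000 of TPC), which is one reason these descriptors are attractive when features must stay interpretable.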
  • 14. MODEL EVALUATION: RANDOM FOREST WAS THE FAVORITE
  • 15. KEY LEARNINGS
    FEATURES ▰ iFeature is an "all-in-one package". ▰ There were very few independent variables before using iFeature and too many after. ▰ Use domain knowledge, not just feature importance, to choose input variables (e.g. k-spaced descriptors, conjoint triad).
    DATA ▰ Data bias can be reduced using domain knowledge: collapse the 0-5 scores into 0 = inactive and 1-5 = active (multinomial to binomial).
    MODEL BUILDING ▰ Build several models rather than one or two, and choose the best based on business needs and performance parameters. ▰ Where several models perform equally well, select one based on business needs and domain knowledge, i.e. the relative cost of false positives versus false negatives (specificity versus sensitivity).
    OTHER APPLICATIONS ▰ iFeature and the above approach can also be used to identify disease-related proteins and in protein–protein interaction studies.
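The data and model-building learnings above can be made concrete in a few lines: collapse the 0-5 activity score into a binary label, compute a confusion matrix, and choose between equally good models by the relative cost of false positives and false negatives. In this sketch the function names, the model names, the confusion-matrix counts and the cost weights are all hypothetical illustrations, not results from this work.

```python
from dataclasses import dataclass


def binarize(score: int) -> int:
    """Collapse the 0-5 activity score: 0 -> inactive (0), 1-5 -> active (1)."""
    if not 0 <= score <= 5:
        raise ValueError(f"score {score} outside 0-5")
    return int(score > 0)


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:  # fraction of active proteins caught
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # fraction of inactive proteins correctly rejected
        return self.tn / (self.tn + self.fp)

    def cost(self, fp_cost: float, fn_cost: float) -> float:
        """Total misclassification cost under the given per-error weights."""
        return self.fp * fp_cost + self.fn * fn_cost


def confusion(y_true: list[int], y_pred: list[int]) -> Confusion:
    """Build a binary confusion matrix (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(tp=pairs.count((1, 1)), fp=pairs.count((0, 1)),
                     tn=pairs.count((0, 0)), fn=pairs.count((1, 0)))


def pick_model(results: dict[str, Confusion], fp_cost: float, fn_cost: float) -> str:
    """Choose the model with the lowest total misclassification cost."""
    return min(results, key=lambda name: results[name].cost(fp_cost, fn_cost))


if __name__ == "__main__":
    # Hypothetical confusion matrices for two hypothetical models.
    results = {"rf": Confusion(tp=8, fp=5, tn=80, fn=2),
               "lr": Confusion(tp=6, fp=2, tn=83, fn=4)}
    print(pick_model(results, fp_cost=1, fn_cost=5))  # missing actives is costly -> "rf"
    print(pick_model(results, fp_cost=5, fn_cost=1))  # wasted assays are costly -> "lr"
```

If missing an active protein is costlier than testing an inactive one, the higher-sensitivity model wins; if pipeline capacity is the bottleneck, the higher-specificity model wins, even when headline accuracy is similar.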