Guide for reproducing results of Bioassay paper using Weka
Important points to remember before starting a run:
• All datasets must be in ARFF format; otherwise Weka will report an incompatible format during training and testing.
• Standard classifiers are used for confirmatory screen data, which is smaller and less imbalanced, whereas cost-sensitive classifiers are used with primary and mixed datasets, which are more imbalanced.
• We have two goals:
    1. To find the most robust and versatile classifier for imbalanced bioassay data.
    2. To find the optimal misclassification cost setting for a classifier.
• The misclassification cost for False Negatives has to be set so as to achieve the maximum number of True Positives with a False Positive rate below 20%.
• Each dataset is randomly split into an 80% training and validation set and a 20% independent test set, so there should be two files per dataset: one for training the classifier and one for testing the model built by that classifier.
• Use 5-fold cross-validation for the larger datasets (primary and mixed screens) and 10-fold cross-validation for the smaller datasets (confirmatory screens).
• CostSensitiveClassifier is used for the base classifiers Naïve Bayes, SMO (Sequential Minimal Optimization) and Random Forest, as it outperforms other meta-learners.
• MetaCost with J48 produces better results than other meta-learners.
• For Naïve Bayes and Random Forest, default options are used.
• For SMO, the option BuildLogisticModels was set to true.
• For J48, the option Unpruned was set to true.
• For more details, please refer to the paper.
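The 80%/20% random split described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the paper: the function name `split_arff`, the seed value and the assumption of a dense (non-sparse) ARFF file with one instance per line are all choices made for this example.

```python
import random

def split_arff(text, train_fraction=0.8, seed=1):
    """Randomly split the data rows of a dense ARFF file into two ARFF strings."""
    lines = text.splitlines()
    # Locate the @data marker; ARFF keywords are case-insensitive.
    data_idx = next(i for i, ln in enumerate(lines)
                    if ln.strip().lower() == "@data")
    header = lines[:data_idx + 1]
    # Keep only real data rows: skip blank lines and '%' comment lines.
    rows = [ln for ln in lines[data_idx + 1:]
            if ln.strip() and not ln.lstrip().startswith("%")]
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(rows)
    cut = round(len(rows) * train_fraction)
    # Both output files keep the full header so Weka sees compatible formats.
    train = "\n".join(header + rows[:cut]) + "\n"
    test = "\n".join(header + rows[cut:]) + "\n"
    return train, test
```

Writing the two returned strings to files such as AID1608red_train.arff and AID1608red_test.arff would give the pair of files the guide expects. The split here is purely random and does not preserve the class ratio in each part.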
Step-by-step guide to setting up a Weka run:
1. Start the Weka Explorer.
2. In the Preprocess tab, click Open file…
3. Select a training file in ARFF format and click Open.
4. For example, AID1608red_train.arff.
5. Once opened, the file is shown in the Preprocess tab.
6. Now click on the Classify tab in the menu bar.
7. We will first train a model using the Naïve Bayes classifier. Because AID1608 is a confirmatory screen, we first apply standard classifiers; if the False Positive rate is below 20%, cost-sensitive classifiers are then used.
8. Click the Choose button to select a classifier. From the bayes folder, choose Naïve Bayes.
9. Your window should now show Cross-validation selected with 10 folds.
10. Click the Start button; the model will start building.
11. Because 10-fold cross-validation is selected, Weka builds models for the 10 folds. Check the status area to follow progress and to see when the run has completed.
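To make the idea of 10-fold cross-validation concrete, the sketch below partitions instance indices into k folds and uses each fold once for validation. It is a simplified illustration, not Weka's implementation: the function name `k_fold_indices` and the seed are assumptions, and Weka additionally stratifies the folds by class for nominal class attributes, which is omitted here for brevity.

```python
import random

def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k disjoint folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training data is everything outside the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Each instance is held out exactly once across the k rounds.
for train_idx, test_idx in k_fold_indices(25, k=5):
    print(len(train_idx), len(test_idx))
```

With k=5 the same function describes the 5-fold setting recommended for the larger primary and mixed screens.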
12. Look at the output section and scroll to the bottom.
13. This is the model generated by the Naïve Bayes classifier from the training set AID1608red_train.
14. The next step is to test this model on the independent test set AID1608red_test.
15. In the Test options section, select Supplied test set and click Set.
16. Open the test file AID1608red_test.
17. After the file has been read, close the Test instances dialog by clicking Close.
18. Right-click your model in the Result list and choose Re-evaluate model on current test set.
19. Within a fraction of a second, the results appear in the same output window.




[Screenshot: confusion matrix in the output window, with the True positive, False negative, False positive and True negative cells labelled]
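The four cells labelled in the output combine into the two rates used throughout this guide. A minimal sketch follows; the counts are hypothetical and chosen only to illustrate the arithmetic, they are not the AID1608 results.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)  # fraction of actives correctly predicted active
    fp_rate = fp / (fp + tn)  # fraction of inactives wrongly predicted active
    return tp_rate, fp_rate

# Hypothetical counts for illustration only.
tp_rate, fp_rate = rates(tp=4, fn=16, fp=30, tn=170)
print(f"TP rate = {tp_rate:.1%}, FP rate = {fp_rate:.1%}")
# prints: TP rate = 20.0%, FP rate = 15.0%
print("FP rate within the 20% limit:", fp_rate < 0.20)
```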




20. We have obtained a False Positive rate of 14.5%, which is less than 20%, and a True Positive rate of 15.4%, which is very low. We will now set up a cost-sensitive classifier to improve the results.
21. As mentioned under "Important points to remember" at the start of this tutorial, for Naïve Bayes we will use Weka's CostSensitiveClassifier.
22. The author used incremental costing, where the cost was increased in stages from 2 to 1000000 until a 20% False Positive rate was reached.
23. So we will set up a cost matrix, starting with a misclassification cost of 2.
24. Click the Choose button and select CostSensitiveClassifier from the meta folder.
25. Click on the text box; the GenericObjectEditor dialog box will open.
26. In this dialog box, use the classifier Choose button to select Naïve Bayes.
27. Next, click on costMatrix to set up the misclassification cost.
28. Our dataset has 2 classes, actives and inactives, so we will set up a 2x2 matrix (covering TP, FP, TN and FN):
   • In Classes, enter 2.
   • Click Resize to create a 2x2 matrix.
   • Change the misclassification cost for false negatives to 2 (write 2 in place of 1).
   • Then close the dialog box.
29. Leave all other options at their defaults and close the GenericObjectEditor dialog by clicking OK.
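The effect of raising the false-negative cost can be seen with a small sketch of the minimum-expected-cost rule. Several things here are assumptions for illustration: the class order (active first, inactive second), the rows-are-actual and columns-are-predicted layout, and the function name. Note also that CostSensitiveClassifier can either reweight the training instances or predict the minimum-expected-cost class, depending on its minimizeExpectedCost option, so with the default options used in this guide the sketch conveys the intuition behind the cost matrix rather than the exact computation Weka performs.

```python
# Rows = actual class, columns = predicted class.
# Assumed class order: index 0 = active, index 1 = inactive.
COST = [[0.0, 2.0],   # actual active:   TP costs 0, FN costs 2
        [1.0, 0.0]]   # actual inactive: FP costs 1, TN costs 0

def min_expected_cost_class(probs, cost=COST):
    """Return the index of the predicted class with the lowest expected cost."""
    n = len(cost)
    expected = [sum(probs[actual] * cost[actual][pred] for actual in range(n))
                for pred in range(n)]
    return min(range(n), key=expected.__getitem__)

# A compound judged 40% likely to be active:
probs = [0.4, 0.6]
print(min_expected_cost_class(probs))                      # FN cost 2 -> class 0 (active)
print(min_expected_cost_class(probs, [[0, 1], [1, 0]]))    # unit costs -> class 1 (inactive)
```

With unit costs the compound is called inactive; doubling the false-negative cost tips the decision to active, which is how a higher cost buys more True Positives at the price of more False Positives.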
30. Click Start to begin building the cost-sensitive model.
31. Repeat steps 13-19 as described above for testing.
32. The results have improved: the number of True Positives has increased while the False Positive rate stays within the 20% limit.
33. We stop here, as we have achieved our goal.
34. Similarly, you can build models using SMO, Random Forest and J48. Check their settings, as listed under "Important points to remember" at the start of this tutorial, before starting the run.
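The incremental costing described in step 22 amounts to a simple search loop, sketched below. This is an assumption-laden illustration rather than the author's procedure: `evaluate` is a hypothetical stand-in for one complete Weka run (train with the given false-negative cost, then test) that returns the True Positive and False Positive rates, and both the list of cost stages and the rates in the example are made up.

```python
def find_cost(evaluate, costs, max_fp_rate=0.20):
    """Return (cost, tp_rate, fp_rate) for the best setting within the FP limit."""
    best = None
    for cost in costs:
        tp_rate, fp_rate = evaluate(cost)
        if fp_rate >= max_fp_rate:
            break  # limit reached: stop raising the cost
        if best is None or tp_rate > best[1]:
            best = (cost, tp_rate, fp_rate)
    return best

# Made-up results standing in for real Weka runs at each cost stage.
fake_runs = {2: (0.30, 0.10), 5: (0.45, 0.16), 10: (0.60, 0.23)}
print(find_cost(fake_runs.get, costs=[2, 5, 10]))
# prints: (5, 0.45, 0.16)
```

The loop keeps the setting with the most True Positives among those whose False Positive rate is still below 20%, matching the goal stated at the start of the guide.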

Weka guide

  • 1. Guide for reproducing results of Bioassay paper using Weka
  • 2. Important points to remember before starting a run:  All datasets should be in ARFF format, otherwise weka will complain for incompatible format during training and testing.  Standard classifiers are used for confirmatory screen data as it is smaller and less im- balanced, whereas cost-sensitive classifiers are used with primary & mixed datasets as they are more imbalanced.  We have two goals- 1. To find most robust and versatile classifier for imbalanced bioassay data. 2. To find out optimal misclassification cost setting for a classifier.  The misclassification cost for False Negatives has to be set in order to achieve maxi- mum number of True Positives with a False Positive rate less than 20%.  The datasets are randomly split into 80% training and validation set and 20% independ- ent test set, so we should have two files for each dataset one for training the classifier and one for testing the model built by that classifier.  Use 5 fold cross-validation for larger datasets i.e. primary and mixed screens and use 10 fold cross–validation for smaller datasets i.e. confirmatory screens.  CostSensitiveClassifier is used for base classifiers Naïve Bayes, SMO (Sequential Minimal Optimization) and Random Forest, as it outperforms other meta-learners.  MetaCost with J48 produces bettet results than other meta-learners.  For Naïve Bayes and Random Forest, default options are used.  For SMO, option BuildLogisticModels was set to true.  For J48, option Unpruned was set to true.  For more details please refer the paper.
  • 3. Step-wise guide to set up a Weka run:
      1. Start the Weka Explorer.
      2. In the Preprocess tab, go to Open file…
      3. Open a training file in ARFF format (click Open).
      4. For example, AID1608red_train.arff.
      5. After opening, the file should look like: [screenshot]
  • 4.
      6. Now click on the Classify tab in the menu bar.
      7. We will first train a model using the Naïve Bayes classifier. As we are using the confirmatory screen AID1608, we first apply a standard classifier; if its False Positive rate is less than 20%, we then apply a cost-sensitive classifier.
      8. Click on the Choose button to select a classifier. From the Bayes folder, choose Naïve Bayes.
      9. Your window should appear as below, with cross-validation selected with 10 folds: [screenshot]
  • 5.
      10. Now click on the Start button; the model will start building.
      11. Since we have used 10-fold cross-validation, it will build models for 10 folds. [Screenshot callouts: "Check status here", "Run completed"]
  • 6.
      12. Look at the output section and scroll to the bottom, as shown: [screenshot]
      13. This is the model generated by the Naïve Bayes classifier using the training set AID1608red_train.
      14. The next step is to test this model on the independent test set AID1608red_test.
      15. Go to the Test options section, select Supplied test set and click on Set.
      16. Open the test file AID1608red_test.
  • 7.
      17. After the file has been read, close the Test instances dialog by clicking on Close.
      18. Now right-click on your model in the Result list and choose Re-evaluate model on current test set.
  • 8.
      19. Within a fraction of a second, results are produced in the same output window. [Screenshot callouts: False positive, True positive, False negative, True negative]
      20. We have obtained a False Positive rate of 14.5%, which is less than 20%, and a True Positive rate of 15.4%, which is very low. Now we will set up a cost-sensitive classifier to improve the results.
      21. As mentioned on page 2 of this tutorial, for Naïve Bayes we will use Weka's CostSensitiveClassifier.
      22. The author used incremental costing, where the cost was increased in stages from 2 to 1000000 until a 20% False Positive rate was reached.
      23. So we will set up a cost matrix, starting with a misclassification cost of 2.
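The True Positive and False Positive rates quoted in step 20 are derived from the four confusion matrix counts. A minimal sketch of that arithmetic follows; the counts used here are hypothetical and are not the AID1608 results, and the helper name `rates` is an assumption.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) as fractions."""
    tpr = tp / (tp + fn)  # share of actual actives that were predicted active
    fpr = fp / (fp + tn)  # share of actual inactives that were predicted active
    return tpr, fpr

# Hypothetical counts, chosen only to illustrate the calculation.
tpr, fpr = rates(tp=30, fn=70, fp=100, tn=900)
print(f"TP rate {tpr:.1%}, FP rate {fpr:.1%}")

# The guide's target: as many True Positives as possible with FP rate below 20%.
within_limit = fpr < 0.20
```

A run such as the one in step 20 passes the `within_limit` check but has a low True Positive rate, which is why the guide moves on to cost-sensitive classification.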
  • 9.
      24. Go to the Choose button and select CostSensitiveClassifier from the meta folder.
      25. Click on the text box to open the GenericObjectEditor dialog box, as shown: [screenshot]
  • 10.
      26. In this dialog box, select Naïve Bayes from the classifier Choose button.
      27. Next, click on costMatrix to set up the misclassification cost.
      28. We have 2 classes in our dataset, i.e. actives and inactives, so we will set up a 2×2 matrix (for TP, FP, TN, FN):
          - In Classes, enter 2.
          - Click Resize to create a 2×2 matrix.
          - Change the misclassification cost for false negatives to 2 (write 2 in place of 1).
          - Then close the dialog box.
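To see why raising the false negative cost from 1 to 2 can change predictions, here is a small sketch of a 2×2 cost matrix combined with a minimum expected cost decision. It rests on several assumptions: rows are taken as the actual class and columns as the predicted class, "active" is assumed to be the first class (the real order depends on the ARFF header), and the probabilities are made up. Weka's CostSensitiveClassifier can either reweight training instances or predict the minimum expected cost class; this sketch illustrates only the second idea and does not reproduce Weka's internals.

```python
# Assumed class order: index 0 = active, index 1 = inactive.
ACTIVE, INACTIVE = 0, 1

def make_cost_matrix(fn_cost, fp_cost=1.0):
    """Build cost[actual][predicted]; correct predictions cost nothing."""
    return [
        [0.0, fn_cost],  # actual active: predicted inactive is a false negative
        [fp_cost, 0.0],  # actual inactive: predicted active is a false positive
    ]

def min_expected_cost_class(probs, cost):
    """Pick the predicted class with the lowest expected misclassification cost."""
    n = len(probs)
    expected = [
        sum(probs[actual] * cost[actual][pred] for actual in range(n))
        for pred in range(n)
    ]
    return expected.index(min(expected))

probs = [0.4, 0.6]  # hypothetical P(active), P(inactive) for one compound
print(min_expected_cost_class(probs, make_cost_matrix(fn_cost=1)))  # 1 (inactive)
print(min_expected_cost_class(probs, make_cost_matrix(fn_cost=2)))  # 0 (active)
```

With equal costs the compound is called inactive; doubling the false negative cost tips the same probabilities towards active, which is the effect the cost matrix is meant to have on True Positives.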
  • 11.
      29. Leave all other options at their defaults and close the GenericObjectEditor dialog by clicking OK.
      30. Click Start to begin building the cost-sensitive model.
      31. Repeat steps 13-19 as described above for testing.
      32. See the improved results: the number of True Positives has increased while False Positives stay within the 20% limit.
      33. We stop here, as we have achieved our goal.
      34. Similarly, you can build models using SMO, Random Forest and J48. Check their settings, as mentioned on page 2 of this tutorial, before starting the run.
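The incremental costing described in step 22 can be thought of as a simple search loop: raise the false negative cost stage by stage and keep the last setting whose False Positive rate is still under 20%. The sketch below is an assumption-laden illustration, not the author's procedure: the cost stages, the scores and the `evaluate` function (a toy stand-in for a full Weka train and test run) are all hypothetical.

```python
def evaluate(fn_cost, active_scores, inactive_scores):
    """Toy stand-in for one train/test run: returns (tpr, fpr)."""
    # With a false positive cost of 1, predicting active is cheaper
    # whenever P(active) exceeds 1 / (1 + fn_cost).
    threshold = 1.0 / (1.0 + fn_cost)
    tp = sum(p > threshold for p in active_scores)
    fp = sum(p > threshold for p in inactive_scores)
    return tp / len(active_scores), fp / len(inactive_scores)

def search_cost(costs, active_scores, inactive_scores, max_fpr=0.20):
    """Return (cost, tpr, fpr) for the last cost that keeps fpr below max_fpr."""
    best = None
    for cost in costs:
        tpr, fpr = evaluate(cost, active_scores, inactive_scores)
        if fpr >= max_fpr:
            break  # limit reached: keep the previous setting
        best = (cost, tpr, fpr)
    return best

# Hypothetical predicted P(active) values and hypothetical cost stages.
actives = [0.9, 0.45, 0.3, 0.15]
inactives = [0.4, 0.12, 0.1, 0.08, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01]
print(search_cost([2, 5, 10, 50, 100], actives, inactives))  # (5, 0.75, 0.1)
```

In the Explorer, each loop iteration corresponds to editing the cost matrix, clicking Start and re-evaluating on the supplied test set.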