Classification by Machine Learning Approaches
Presentation Transcript

  • Classification by Machine Learning Approaches - Exercise Solution Michael J. Kerner – [email_address] Center for Biological Sequence Analysis Technical University of Denmark
  • Exercise Solution:
    • donors_trainset.arff - All features:
    • trees.J48
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4972 94.5967 %
    • Incorrectly Classified Instances 284 5.4033 %
    • Kappa statistic 0.8381
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.87 0.034 0.875 0.87 0.872 true
    • 0.966 0.13 0.965 0.966 0.966 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 971 145 | a = true
    • 139 4001 | b = false
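As a sanity check (not part of the original exercise), all of the summary statistics above can be recomputed from the confusion matrix alone. The snippet below does this in plain Python for the J48 counts on this slide:

```python
# Recompute Weka's summary statistics for J48 from its confusion matrix.
# Rows = actual class, columns = predicted class (a = true, b = false).
tp, fn = 971, 145    # actual true:  971 predicted a, 145 predicted b
fp, tn = 139, 4001   # actual false: 139 predicted a, 4001 predicted b
n = tp + fn + fp + tn

accuracy = (tp + tn) / n
# Cohen's kappa: observed agreement corrected for the chance agreement
# implied by the row/column marginals of the matrix.
p_chance = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (accuracy - p_chance) / (1 - p_chance)

# Per-class measures for the 'true' class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)          # identical to the TP rate
f_measure = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.4%}, kappa = {kappa:.4f}")
print(f"true class: P = {precision:.3f}, R = {recall:.3f}, F = {f_measure:.3f}")
```

The printed values reproduce the slide: 94.5967 % accuracy, a kappa statistic of 0.8381, and precision/recall/F-measure of 0.875/0.870/0.872 for the true class.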
  • Exercise Solution:
    • donors_trainset.arff - All features:
    • bayes.NaiveBayes
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4910 93.417 %
    • Incorrectly Classified Instances 346 6.583 %
    • Kappa statistic 0.8056
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.862 0.046 0.834 0.862 0.848 true
    • 0.954 0.138 0.962 0.954 0.958 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 962 154 | a = true
    • 192 3948 | b = false
  • Exercise Solution:
    • donors_trainset.arff - All features:
    • functions.SMO
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4986 94.863 %
    • Incorrectly Classified Instances 270 5.137 %
    • Kappa statistic 0.8455
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.871 0.03 0.885 0.871 0.878 true
    • 0.97 0.129 0.965 0.97 0.967 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 972 144 | a = true
    • 126 4014 | b = false
    • @RELATION donors.train
    • @ATTRIBUTE -7_A {0,1}
    • @ATTRIBUTE -7_T {0,1}
    • @ATTRIBUTE -7_C {0,1}
    • [...]
    • @ATTRIBUTE 6_A {0,1}
    • @ATTRIBUTE 6_T {0,1}
    • @ATTRIBUTE 6_C {0,1}
    • @ATTRIBUTE 6_G {0,1}
    • @ATTRIBUTE class {true,false}
    • @DATA
    • 0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true
    • 0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true
    • [...]
  • Exercise Solution:
    • donors_trainset.arff – binary feature encoding (four {0,1} attributes per position)
    • donors_trainset_diffencod.arff – fewer features; four (nominal) values per feature:
    • @RELATION donors.train
    • @ATTRIBUTE -7 {A,C,G,T}
    • @ATTRIBUTE -6 {A,C,G,T}
    • @ATTRIBUTE -5 {A,C,G,T}
    • @ATTRIBUTE -4 {A,C,G,T}
    • [...]
    • @ATTRIBUTE +3 {A,C,G,T}
    • @ATTRIBUTE +4 {A,C,G,T}
    • @ATTRIBUTE +5 {A,C,G,T}
    • @ATTRIBUTE +6 {A,C,G,T}
    • @ATTRIBUTE splicesite {true,false}
    • @DATA
    • C,T,C,C,G,A,A,A,G,G,A,T,T,true
    • T,C,A,G,A,A,G,G,A,G,G,G,C,true
    • T,T,G,G,A,A,G,T,C,G,C,A,G,true
    • [...]
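To make the relationship between the two encodings concrete, here is a small sketch (my illustration, not from the slides) that expands one nominal-encoded row into the 52 binary indicator features of donors_trainset.arff, assuming the attribute order -7_A, -7_T, -7_C, -7_G, ... shown in the binary file:

```python
# Expand the nominal encoding (one {A,C,G,T} value per position) into the
# binary encoding (four {0,1} indicator attributes per position).
BASES = "ATCG"  # per-position attribute order in donors_trainset.arff: _A, _T, _C, _G

def one_hot(bases):
    """13 nucleotides -> 52 binary features, 4 indicator bits per position."""
    bits = []
    for base in bases:
        bits.extend(1 if base == b else 0 for b in BASES)
    return bits

# First @DATA row of the nominal file (positions -7..-1 and +1..+6):
row = "C,T,C,C,G,A,A,A,G,G,A,T,T".split(",")
print(",".join(map(str, one_hot(row))) + ",true")
```

The printed line reproduces the first @DATA row of the binary-encoded file shown earlier: the two ARFF files describe the same training sequences, only with different feature representations.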
  • Exercise Solution:
    • donors_trainset_diffencod.arff - All features:
    • trees.J48
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4948 94.14 %
    • Incorrectly Classified Instances 308 5.86 %
    • Kappa statistic 0.8248
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.862 0.037 0.862 0.862 0.862 true
    • 0.963 0.138 0.963 0.963 0.963 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 962 154 | a = true
    • 154 3986 | b = false
  • Exercise Solution:
    • donors_trainset_diffencod.arff - All features:
    • bayes.NaiveBayes
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4922 93.6454 %
    • Incorrectly Classified Instances 334 6.3546 %
    • Kappa statistic 0.8078
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.834 0.036 0.862 0.834 0.848 true
    • 0.964 0.166 0.956 0.964 0.96 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 931 185 | a = true
    • 149 3991 | b = false
  • Exercise Solution:
    • donors_trainset_diffencod.arff - All features:
    • functions.SMO
    • === Stratified cross-validation ===
    • === Summary ===
    • Correctly Classified Instances 4986 94.863 %
    • Incorrectly Classified Instances 270 5.137 %
    • Kappa statistic 0.8456
    • === Detailed Accuracy By Class ===
    • TP Rate FP Rate Precision Recall F-Measure Class
    • 0.872 0.031 0.885 0.872 0.878 true
    • 0.969 0.128 0.966 0.969 0.967 false
    • === Confusion Matrix ===
    • a b <-- classified as
    • 973 143 | a = true
    • 127 4013 | b = false
  • Exercise Solution:
    • Feature Selection:
    • CfsSubsetEval, BestFirst:
    • Features: -2A, -1G, 1A, 2A, 3G
    • CorrelationCoefficients:
    • J48: 0.7981
    • NaiveBayes: 0.7762
    • SMO: 0.7388
    • MultilayerPerceptron: 0.8053
    • ClassifierSubsetEval (w/ NaiveBayes), BestFirst:
    • Features: -7A, -7C, -6G, -4A, -1G, 1A, 1T, 1C, 2A, 3G, 4T, 5A
    • CorrelationCoefficients:
    • J48: 0.7935
    • NaiveBayes: 0.8033
    • SMO: 0.7597
    • MultilayerPerceptron: 0.7765
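The slides report correlation coefficients per classifier without showing a formula. One common correlation measure for two-class problems is the Matthews correlation coefficient (MCC), sketched below on the full-feature J48 confusion matrix from the first slide; this is my illustration, and the measure behind the numbers above may differ:

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient for a 2x2 confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Full-feature J48 confusion matrix (a = true, b = false).
print(round(mcc(tp=971, fn=145, fp=139, tn=4001), 4))
```

For this matrix the MCC comes out around 0.84, close to the kappa statistic Weka reports for J48, whereas the feature-reduced runs above land in the 0.74–0.81 range.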
  • Summary
    • Generally, there is no ‘best’ method for all problems.
    • Feature representation can influence classification results.
    • Feature selection often improves classification performance, but not always.
    • Feature selection significantly speeds up classification, which in turn makes even computationally very demanding classifiers feasible.
      • Always try to test multiple methods!
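All of the results above come from stratified cross-validation. As a closing illustration (not from the slides), here is a minimal stdlib-only sketch of a stratified k-fold split, using hypothetical labels that mimic the roughly 1:4 true/false imbalance of the donor data:

```python
import random

def stratified_folds(labels, k, seed=0):
    """Assign each instance index to one of k folds, keeping the
    class proportions roughly equal in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Group instance indices by class label, shuffle within each class,
    # then deal them round-robin across the folds.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

# Toy label set: 200 true, 800 false (hypothetical, for illustration only).
labels = ["true"] * 200 + ["false"] * 800
folds = stratified_folds(labels, k=10)
for fold in folds[:2]:
    trues = sum(labels[i] == "true" for i in fold)
    print(f"fold size {len(fold)}, true instances {trues}")
```

Each of the ten folds receives 100 instances with 20 true and 80 false, so every held-out fold sees the same class ratio as the full training set, which is exactly what stratification guarantees.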