# Classification by Machine Learning Approaches

## on Apr 26, 2010


## Classification by Machine Learning Approaches Presentation Transcript

• Classification by Machine Learning Approaches - Exercise Solution Michael J. Kerner – [email_address] Center for Biological Sequence Analysis Technical University of Denmark
• Exercise Solution: donors_trainset.arff - all features, trees.J48
• Stratified cross-validation summary:
• Correctly Classified Instances: 4972 (94.5967 %)
• Incorrectly Classified Instances: 284 (5.4033 %)
• Kappa statistic: 0.8381

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.87    | 0.034   | 0.875     | 0.87   | 0.872     |
| false | 0.966   | 0.13    | 0.965     | 0.966  | 0.966     |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 971                 | 145                  |
| b = false | 139                 | 4001                 |
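All of the summary statistics in the Weka output can be recomputed directly from the confusion matrix. A minimal sketch in plain Python, using the matrix values from the J48 run above:

```python
# Confusion matrix from the J48 run: rows = actual class, columns = predicted.
# [[true->true, true->false], [false->true, false->false]]
cm = [[971, 145], [139, 4001]]

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]
accuracy = correct / total  # "Correctly Classified Instances" as a fraction

# Cohen's kappa: observed agreement corrected for chance agreement,
# where chance agreement comes from the row and column marginals.
row_sums = [sum(row) for row in cm]
col_sums = [cm[0][j] + cm[1][j] for j in range(2)]
p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
kappa = (accuracy - p_chance) / (1 - p_chance)

# Per-class precision and recall for the "true" class.
precision_true = cm[0][0] / col_sums[0]
recall_true = cm[0][0] / row_sums[0]

print(f"accuracy  = {accuracy:.4%}")   # ~94.5967 %, as in the slide
print(f"kappa     = {kappa:.4f}")      # ~0.8381, matching the Weka output
print(f"precision = {precision_true:.3f}, recall = {recall_true:.3f}")
```

The same calculation applies to every confusion matrix in this deck; only the four cell values change.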
• Exercise Solution: donors_trainset.arff - all features, bayes.NaiveBayes
• Stratified cross-validation summary:
• Correctly Classified Instances: 4910 (93.417 %)
• Incorrectly Classified Instances: 346 (6.583 %)
• Kappa statistic: 0.8056

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.862   | 0.046   | 0.834     | 0.862  | 0.848     |
| false | 0.954   | 0.138   | 0.962     | 0.954  | 0.958     |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 962                 | 154                  |
| b = false | 192                 | 3948                 |
• Exercise Solution: donors_trainset.arff - all features, functions.SMO
• Stratified cross-validation summary:
• Correctly Classified Instances: 4986 (94.863 %)
• Incorrectly Classified Instances: 270 (5.137 %)
• Kappa statistic: 0.8455

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.871   | 0.03    | 0.885     | 0.871  | 0.878     |
| false | 0.97    | 0.129   | 0.965     | 0.97   | 0.967     |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 972                 | 144                  |
| b = false | 126                 | 4014                 |
• donors_trainset.arff - binary feature encoding, one 0/1 attribute per nucleotide per position:

    @RELATION donors.train
    @ATTRIBUTE -7_A {0,1}
    @ATTRIBUTE -7_T {0,1}
    @ATTRIBUTE -7_C {0,1}
    [...]
    @ATTRIBUTE 6_A {0,1}
    @ATTRIBUTE 6_T {0,1}
    @ATTRIBUTE 6_C {0,1}
    @ATTRIBUTE 6_G {0,1}
    @ATTRIBUTE class {true,false}
    @DATA
    0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true
    0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true
    [...]
• Exercise Solution: donors_trainset_diffencod.arff - the same data with fewer features, using one nominal attribute with four values per position instead of the binary feature encoding of donors_trainset.arff:

    @RELATION donors.train
    @ATTRIBUTE -7 {A,C,G,T}
    @ATTRIBUTE -6 {A,C,G,T}
    @ATTRIBUTE -5 {A,C,G,T}
    @ATTRIBUTE -4 {A,C,G,T}
    [...]
    @ATTRIBUTE +3 {A,C,G,T}
    @ATTRIBUTE +4 {A,C,G,T}
    @ATTRIBUTE +5 {A,C,G,T}
    @ATTRIBUTE +6 {A,C,G,T}
    @ATTRIBUTE splicesite {true,false}
    @DATA
    C,T,C,C,G,A,A,A,G,G,A,T,T,true
    T,C,A,G,A,A,G,G,A,G,G,G,C,true
    T,T,G,G,A,A,G,T,C,G,C,A,G,true
    [..]
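The relationship between the two encodings can be sketched as follows. This is an illustrative Python helper (`binary_encode` is my name, not from the exercise), assuming the A, T, C, G attribute order shown in the binary ARFF header:

```python
# One window position in the nominal encoding is a single A/C/G/T value;
# in the binary encoding it expands to four 0/1 indicator features.
ALPHABET = ["A", "T", "C", "G"]  # attribute order from donors_trainset.arff

def binary_encode(window: str) -> list[int]:
    """Expand a nucleotide window into 0/1 indicator features (4 per position)."""
    features = []
    for base in window:
        features.extend(int(base == letter) for letter in ALPHABET)
    return features

# First training sequence from donors_trainset_diffencod.arff (13 positions,
# -7 .. +6) expands to 13 * 4 = 52 binary features.
window = "CTCCGAAAGGATT"
print(binary_encode(window)[:8])  # [0, 0, 1, 0, 0, 1, 0, 0]
```

The printed prefix matches the first eight values of the first @DATA line in donors_trainset.arff (position -7 = C, position -6 = T), which is how the two files represent the same sequences.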
• Exercise Solution: donors_trainset_diffencod.arff - all features, trees.J48
• Stratified cross-validation summary:
• Correctly Classified Instances: 4948 (94.14 %)
• Incorrectly Classified Instances: 308 (5.86 %)
• Kappa statistic: 0.8248

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.862   | 0.037   | 0.862     | 0.862  | 0.862     |
| false | 0.963   | 0.138   | 0.963     | 0.963  | 0.963     |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 962                 | 154                  |
| b = false | 154                 | 3986                 |
• Exercise Solution: donors_trainset_diffencod.arff - all features, bayes.NaiveBayes
• Stratified cross-validation summary:
• Correctly Classified Instances: 4922 (93.6454 %)
• Incorrectly Classified Instances: 334 (6.3546 %)
• Kappa statistic: 0.8078

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.834   | 0.036   | 0.862     | 0.834  | 0.848     |
| false | 0.964   | 0.166   | 0.956     | 0.964  | 0.96      |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 931                 | 185                  |
| b = false | 149                 | 3991                 |
• Exercise Solution: donors_trainset_diffencod.arff - all features, functions.SMO
• Stratified cross-validation summary:
• Correctly Classified Instances: 4986 (94.863 %)
• Incorrectly Classified Instances: 270 (5.137 %)
• Kappa statistic: 0.8456

Detailed accuracy by class:

| Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
|-------|---------|---------|-----------|--------|-----------|
| true  | 0.872   | 0.031   | 0.885     | 0.872  | 0.878     |
| false | 0.969   | 0.128   | 0.966     | 0.969  | 0.967     |

Confusion matrix (rows: actual class, columns: predicted class):

| Actual    | Classified a (true) | Classified b (false) |
|-----------|---------------------|----------------------|
| a = true  | 973                 | 143                  |
| b = false | 127                 | 4013                 |
• Exercise Solution: Feature Selection
• CfsSubsetEval, BestFirst:
• Selected features: -2A, -1G, 1A, 2A, 3G
• Correlation coefficients:
• J48: 0.7981
• NaiveBayes: 0.7762
• SMO: 0.7388
• MultilayerPerceptron: 0.8053
• ClassifierSubsetEval (with NaiveBayes), BestFirst:
• Selected features: -7A, -7C, -6G, -4A, -1G, 1A, 1T, 1C, 2A, 3G, 4T, 5A
• Correlation coefficients:
• J48: 0.7935
• NaiveBayes: 0.8033
• SMO: 0.7597
• MultilayerPerceptron: 0.7765
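The idea behind filter-style feature selection can be sketched without Weka. The following is a simplified stand-in, not Weka's CfsSubsetEval (which additionally penalizes redundancy between the selected features): it ranks binary features by the absolute value of their Pearson correlation with the class. The tiny data set is made up for illustration, not taken from the donor data:

```python
# Rank binary features by correlation with the class label
# (a simplified filter-style selector; toy data, not the donor set).
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Each row: (binary feature vector, class label; 1 = true splice site).
data = [
    ([1, 0, 1], 1),
    ([1, 0, 0], 1),
    ([0, 1, 1], 0),
    ([0, 1, 0], 0),
    ([1, 0, 1], 1),
    ([0, 0, 0], 0),
]
labels = [y for _, y in data]
scores = [pearson([row[i] for row, _ in data], labels) for i in range(3)]
ranked = sorted(range(3), key=lambda i: -abs(scores[i]))
print(ranked)  # [0, 1, 2] - feature indices, most class-correlated first
```

A subset search such as BestFirst then explores combinations of the top-ranked features rather than scoring each feature in isolation.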
• Summary
• Generally, there is no ‘best’ method for all problems.
• Feature representation can influence classification results.
• Feature selection often improves classification performance, but not always.
• Feature selection significantly speeds up classification, which in turn makes even computationally very demanding classifiers practical.
• Always test multiple methods!