Classification by Machine Learning Approaches - Exercise Solution
Michael J. Kerner – [email_address]
Center for Biological Sequence Analysis, Technical University of Denmark
Exercise Solution: donors_trainset.arff - All features: trees.J48

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4972    94.5967 %
Incorrectly Classified Instances     284     5.4033 %
Kappa statistic                     0.8381

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.87     0.034    0.875      0.87    0.872      true
0.966    0.13     0.965      0.966   0.966      false

=== Confusion Matrix ===
   a    b   <-- classified as
 971  145 |  a = true
 139 4001 |  b = false
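All of the summary figures above follow directly from the confusion matrix; a quick sanity check of the J48 numbers in plain Python (standard metric definitions, not Weka code):

```python
# Confusion matrix for J48 on donors_trainset.arff (from the slide above):
tp, fn = 971, 145    # actual true:  classified a (true), b (false)
fp, tn = 139, 4001   # actual false: classified a (true), b (false)

total = tp + fn + fp + tn                      # 5256 instances
accuracy = (tp + tn) / total                   # correctly classified fraction

# Per-class metrics for the 'true' class
precision = tp / (tp + fp)                     # 971 / 1110
recall    = tp / (tp + fn)                     # 971 / 1116 (= TP rate)
f_measure = 2 * precision * recall / (precision + recall)

# Cohen's kappa: agreement beyond chance
p_true  = ((tp + fn) / total) * ((tp + fp) / total)  # chance agreement on 'true'
p_false = ((fp + tn) / total) * ((fn + tn) / total)  # chance agreement on 'false'
p_e = p_true + p_false
kappa = (accuracy - p_e) / (1 - p_e)

print(f"accuracy={accuracy:.4%} precision={precision:.3f} "
      f"recall={recall:.3f} F={f_measure:.3f} kappa={kappa:.4f}")
```

Rounding to the precision shown on the slide reproduces 94.5967 %, 0.875, 0.87, 0.872 and a kappa of 0.8381.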
Exercise Solution: donors_trainset.arff - All features: bayes.NaiveBayes

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4910    93.417  %
Incorrectly Classified Instances     346     6.583  %
Kappa statistic                     0.8056

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.862    0.046    0.834      0.862   0.848      true
0.954    0.138    0.962      0.954   0.958      false

=== Confusion Matrix ===
   a    b   <-- classified as
 962  154 |  a = true
 192 3948 |  b = false
Exercise Solution: donors_trainset.arff - All features: functions.SMO

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4986    94.863  %
Incorrectly Classified Instances     270     5.137  %
Kappa statistic                     0.8455

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.871    0.03     0.885      0.871   0.878      true
0.97     0.129    0.965      0.97    0.967      false

=== Confusion Matrix ===
   a    b   <-- classified as
 972  144 |  a = true
 126 4014 |  b = false
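Every run above uses Weka's stratified 10-fold cross-validation. The same protocol can be sketched outside Weka with scikit-learn on toy data (illustrative only; DecisionTreeClassifier is just a rough stand-in for J48, and the data shapes are assumptions):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier  # rough stand-in for trees.J48

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 52))   # toy binary features (like the 52 indicators)
y = rng.integers(0, 2, size=500)         # toy true/false labels

# Stratified folds preserve the class ratio in every fold, which matters here
# because the donor dataset is imbalanced (~79% of windows are non-sites).
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=skf)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On random labels the mean accuracy hovers around 0.5; the point is the fold protocol, not the score.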
Exercise Solution: two feature encodings of the same data

donors_trainset.arff - binary feature encoding (one 0/1 indicator per position and nucleotide):

@RELATION donors.train
@ATTRIBUTE -7_A {0,1}
@ATTRIBUTE -7_T {0,1}
@ATTRIBUTE -7_C {0,1}
[...]
@ATTRIBUTE 6_A {0,1}
@ATTRIBUTE 6_T {0,1}
@ATTRIBUTE 6_C {0,1}
@ATTRIBUTE 6_G {0,1}
@ATTRIBUTE class {true,false}
@DATA
0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true
0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true
[...]

donors_trainset_diffencod.arff - fewer features, four (nominal) values per feature:

@RELATION donors.train
@ATTRIBUTE -7 {A,C,G,T}
@ATTRIBUTE -6 {A,C,G,T}
@ATTRIBUTE -5 {A,C,G,T}
@ATTRIBUTE -4 {A,C,G,T}
[...]
@ATTRIBUTE +3 {A,C,G,T}
@ATTRIBUTE +4 {A,C,G,T}
@ATTRIBUTE +5 {A,C,G,T}
@ATTRIBUTE +6 {A,C,G,T}
@ATTRIBUTE splicesite {true,false}
@DATA
C,T,C,C,G,A,A,A,G,G,A,T,T,true
T,C,A,G,A,A,G,G,A,G,G,G,C,true
T,T,G,G,A,A,G,T,C,G,C,A,G,true
[...]
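The two files encode the same windows; only the representation differs. A minimal sketch of expanding one 13-nucleotide window into the 52 binary indicators (attribute order A, T, C, G per position, as in the binary header above; the helper name is ours):

```python
# Indicator order per position matches the binary ARFF header: _A, _T, _C, _G.
ALPHABET = ["A", "T", "C", "G"]

def one_hot(window: str) -> list[int]:
    """Expand a nucleotide window into the binary feature encoding."""
    bits = []
    for base in window:
        bits.extend(int(base == b) for b in ALPHABET)
    return bits

# First @DATA row of donors_trainset_diffencod.arff, without the class label:
window = "CTCCGAAAGGATT"
bits = one_hot(window)
print(bits)
# 13 positions x 4 indicators = 52 binary features
assert len(bits) == 52
```

Applied to the first diffencod row, this reproduces exactly the first row of the binary-encoded file.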
Exercise Solution: donors_trainset_diffencod.arff - All features: trees.J48

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4948    94.14   %
Incorrectly Classified Instances     308     5.86   %
Kappa statistic                     0.8248

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.862    0.037    0.862      0.862   0.862      true
0.963    0.138    0.963      0.963   0.963      false

=== Confusion Matrix ===
   a    b   <-- classified as
 962  154 |  a = true
 154 3986 |  b = false
Exercise Solution: donors_trainset_diffencod.arff - All features: bayes.NaiveBayes

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4922    93.6454 %
Incorrectly Classified Instances     334     6.3546 %
Kappa statistic                     0.8078

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.834    0.036    0.862      0.834   0.848      true
0.964    0.166    0.956      0.964   0.96       false

=== Confusion Matrix ===
   a    b   <-- classified as
 931  185 |  a = true
 149 3991 |  b = false
Exercise Solution: donors_trainset_diffencod.arff - All features: functions.SMO

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      4986    94.863  %
Incorrectly Classified Instances     270     5.137  %
Kappa statistic                     0.8456

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0.872    0.031    0.885      0.872   0.878      true
0.969    0.128    0.966      0.969   0.967      false

=== Confusion Matrix ===
   a    b   <-- classified as
 973  143 |  a = true
 127 4013 |  b = false
Exercise Solution: Feature Selection

CfsSubsetEval, BestFirst:
  Features: -2A, -1G, 1A, 2A, 3G
  Correlation coefficients:
    J48: 0.7981
    NaiveBayes: 0.7762
    SMO: 0.7388
    MultilayerPerceptron: 0.8053

ClassifierSubsetEval (w/ NaiveBayes), BestFirst:
  Features: -7A, -7C, -6G, -4A, -1G, 1A, 1T, 1C, 2A, 3G, 4T, 5A
  Correlation coefficients:
    J48: 0.7935
    NaiveBayes: 0.8033
    SMO: 0.7597
    MultilayerPerceptron: 0.7765
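Weka's ClassifierSubsetEval with BestFirst search has no exact scikit-learn counterpart; a rough wrapper-style analogue is greedy forward selection scored by a cross-validated NaiveBayes. A sketch on toy data (illustrative only; sizes and estimator choice are assumptions):

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import BernoulliNB  # NaiveBayes for binary features

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 52))   # toy stand-in for the 52 binary features
y = rng.integers(0, 2, size=300)

# Greedy forward search, scored by cross-validated NaiveBayes accuracy:
# a wrapper analogue of ClassifierSubsetEval (w/ NaiveBayes) + BestFirst.
sfs = SequentialFeatureSelector(BernoulliNB(), n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())
print("selected feature indices:", selected)
```

Unlike this greedy sketch, Weka's BestFirst also allows backtracking, and CfsSubsetEval is a filter (classifier-independent) method rather than a wrapper.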
Summary

Generally, there is no single 'best' method for all problems.
Feature representation can influence classification results.
Feature selection often improves classification performance, but not always.
Feature selection significantly speeds up classification, which also makes computationally very demanding classifiers feasible.
Always test multiple methods!
