# Classification by Machine Learning Approaches

1. Classification by Machine Learning Approaches: Exercise Solution. Michael J. Kerner ([email_address]), Center for Biological Sequence Analysis, Technical University of Denmark
2. Exercise Solution: `donors_trainset.arff`, all features, `trees.J48`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4972 (94.5967 %)
   - Incorrectly Classified Instances: 284 (5.4033 %)
   - Kappa statistic: 0.8381

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.87    | 0.034   | 0.875     | 0.87   | 0.872     |
   | false | 0.966   | 0.13    | 0.965     | 0.966  | 0.966     |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 971      | 145       |
   | b = false | 139      | 4001      |
3. Exercise Solution: `donors_trainset.arff`, all features, `bayes.NaiveBayes`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4910 (93.417 %)
   - Incorrectly Classified Instances: 346 (6.583 %)
   - Kappa statistic: 0.8056

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.862   | 0.046   | 0.834     | 0.862  | 0.848     |
   | false | 0.954   | 0.138   | 0.962     | 0.954  | 0.958     |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 962      | 154       |
   | b = false | 192      | 3948      |
4. Exercise Solution: `donors_trainset.arff`, all features, `functions.SMO`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4986 (94.863 %)
   - Incorrectly Classified Instances: 270 (5.137 %)
   - Kappa statistic: 0.8455

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.871   | 0.03    | 0.885     | 0.871  | 0.878     |
   | false | 0.97    | 0.129   | 0.965     | 0.97   | 0.967     |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 972      | 144       |
   | b = false | 126      | 4014      |
5. Exercise Solution: two encodings of the same data. Each instance is a window of 13 nucleotide positions; the binary file uses four 0/1 attributes per position (52 features), the nominal file one attribute with four values per position (13 features).

   `donors_trainset.arff` (binary feature encoding):
   - `@RELATION donors.train`
   - `@ATTRIBUTE -7_A {0,1}`
   - `@ATTRIBUTE -7_T {0,1}`
   - `@ATTRIBUTE -7_C {0,1}`
   - `[...]`
   - `@ATTRIBUTE 6_A {0,1}`
   - `@ATTRIBUTE 6_T {0,1}`
   - `@ATTRIBUTE 6_C {0,1}`
   - `@ATTRIBUTE 6_G {0,1}`
   - `@ATTRIBUTE class {true,false}`
   - `@DATA`
   - `0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true`
   - `0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true`
   - `[...]`

   `donors_trainset_diffencod.arff` (fewer features; four nominal values per feature):
   - `@RELATION donors.train`
   - `@ATTRIBUTE -7 {A,C,G,T}`
   - `@ATTRIBUTE -6 {A,C,G,T}`
   - `@ATTRIBUTE -5 {A,C,G,T}`
   - `@ATTRIBUTE -4 {A,C,G,T}`
   - `[...]`
   - `@ATTRIBUTE +3 {A,C,G,T}`
   - `@ATTRIBUTE +4 {A,C,G,T}`
   - `@ATTRIBUTE +5 {A,C,G,T}`
   - `@ATTRIBUTE +6 {A,C,G,T}`
   - `@ATTRIBUTE splicesite {true,false}`
   - `@DATA`
   - `C,T,C,C,G,A,A,A,G,G,A,T,T,true`
   - `T,C,A,G,A,A,G,G,A,G,G,G,C,true`
   - `T,T,G,G,A,A,G,T,C,G,C,A,G,true`
   - `[...]`
6. Exercise Solution: `donors_trainset_diffencod.arff`, all features, `trees.J48`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4948 (94.14 %)
   - Incorrectly Classified Instances: 308 (5.86 %)
   - Kappa statistic: 0.8248

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.862   | 0.037   | 0.862     | 0.862  | 0.862     |
   | false | 0.963   | 0.138   | 0.963     | 0.963  | 0.963     |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 962      | 154       |
   | b = false | 154      | 3986      |
7. Exercise Solution: `donors_trainset_diffencod.arff`, all features, `bayes.NaiveBayes`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4922 (93.6454 %)
   - Incorrectly Classified Instances: 334 (6.3546 %)
   - Kappa statistic: 0.8078

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.834   | 0.036   | 0.862     | 0.834  | 0.848     |
   | false | 0.964   | 0.166   | 0.956     | 0.964  | 0.96      |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 931      | 185       |
   | b = false | 149      | 3991      |
8. Exercise Solution: `donors_trainset_diffencod.arff`, all features, `functions.SMO`

   Stratified cross-validation, summary:
   - Correctly Classified Instances: 4986 (94.863 %)
   - Incorrectly Classified Instances: 270 (5.137 %)
   - Kappa statistic: 0.8456

   Detailed Accuracy By Class:

   | Class | TP Rate | FP Rate | Precision | Recall | F-Measure |
   |-------|---------|---------|-----------|--------|-----------|
   | true  | 0.872   | 0.031   | 0.885     | 0.872  | 0.878     |
   | false | 0.969   | 0.128   | 0.966     | 0.969  | 0.967     |

   Confusion Matrix (rows: actual class; columns: classified as):

   |           | a = true | b = false |
   |-----------|----------|-----------|
   | a = true  | 973      | 143       |
   | b = false | 127      | 4013      |
9. Exercise Solution: Feature Selection

   `CfsSubsetEval`, `BestFirst`:
   - Features: -2A, -1G, 1A, 2A, 3G
   - Correlation coefficients:
     - J48: 0.7981
     - NaiveBayes: 0.7762
     - SMO: 0.7388
     - MultilayerPerceptron: 0.8053

   `ClassifierSubsetEval` (with `NaiveBayes`), `BestFirst`:
   - Features: -7A, -7C, -6G, -4A, -1G, 1A, 1T, 1C, 2A, 3G, 4T, 5A
   - Correlation coefficients:
     - J48: 0.7935
     - NaiveBayes: 0.8033
     - SMO: 0.7597
     - MultilayerPerceptron: 0.7765
10. Summary
    - In general, no single method is best for all problems.
    - The feature representation can influence classification results.
    - Feature selection often improves classification performance, but not always.
    - Feature selection can substantially speed up classification, which makes it feasible to use even computationally very demanding classifiers.
    - Always try to test multiple methods!
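Every result slide (2 to 4 and 6 to 8) reports stratified cross-validation: the data are split into folds so that each fold keeps roughly the overall class ratio (1116 true vs. 4140 false instances, from the confusion-matrix row totals). The following is a minimal sketch of that idea, assuming k = 10 (the Weka Explorer default; the slides do not state the number of folds). The helpers `stratified_folds` and `cv_splits` are hypothetical, and Weka's own procedure differs in detail, for example in how it randomizes the instance order.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with (nearly) equal class ratios.

    Indices of each class are shuffled, then dealt round-robin over the
    folds; the deal continues across classes so fold sizes stay balanced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    nxt = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[nxt].append(i)
            nxt = (nxt + 1) % k
    return folds


def cv_splits(folds):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    for t, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != t for i in fold]
        yield train, test


# Class sizes taken from the confusion-matrix row totals: 1116 true, 4140 false.
labels = ["true"] * 1116 + ["false"] * 4140
folds = stratified_folds(labels, k=10)
for train, test in cv_splits(folds):
    counts = Counter(labels[i] for i in test)
    print(len(train), len(test), counts["true"], counts["false"])
```

With these class sizes every test fold receives 414 false and 111 or 112 true instances, so each fold mirrors the overall share of about 21 % true sites.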
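The "Detailed Accuracy By Class" rows on the result slides follow directly from the confusion matrix, whose rows are actual classes and whose columns are predicted classes. This sketch recomputes them for the NaiveBayes run on `donors_trainset.arff` (slide 3); `per_class_metrics` is a hypothetical helper, and it assumes no row or column of the matrix is empty.

```python
def per_class_metrics(cm, labels):
    """Weka-style per-class metrics from a square confusion matrix.

    cm[i][j] counts instances of actual class i classified as class j,
    matching the row/column layout of Weka's "Confusion Matrix" output.
    Returns {label: (TP rate, FP rate, precision, recall, F-measure)}.
    """
    k = len(labels)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for c in range(k):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row total
        predicted = sum(cm[r][c] for r in range(k))  # column total
        fp = predicted - tp
        tp_rate = tp / actual                        # recall is the same quantity
        fp_rate = fp / (total - actual)
        precision = tp / predicted
        f_measure = 2 * precision * tp_rate / (precision + tp_rate)
        metrics[labels[c]] = (tp_rate, fp_rate, precision, tp_rate, f_measure)
    return metrics


# bayes.NaiveBayes on donors_trainset.arff (slide 3).
nb_cm = [[962, 154],
         [192, 3948]]
for label, values in per_class_metrics(nb_cm, ["true", "false"]).items():
    print(label, " ".join(f"{v:.3f}" for v in values))
```

The printed rows (0.862 0.046 0.834 0.862 0.848 for true; 0.954 0.138 0.962 0.954 0.958 for false) match the NaiveBayes slide, which also shows that Weka's TP rate and recall columns are the same quantity.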
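The Kappa statistic on each result slide is Cohen's kappa. It compares the observed agreement `p_o` (the fraction of correctly classified instances) with the agreement `p_e` expected by chance from the confusion matrix's row and column totals: `kappa = (p_o - p_e) / (1 - p_e)`. The sketch recomputes it for the SMO run on slide 4; `accuracy_and_kappa` is a hypothetical helper name.

```python
def accuracy_and_kappa(cm):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] = number of instances of actual class i classified as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n                 # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)


# functions.SMO on donors_trainset.arff (slide 4).
smo_cm = [[972, 144],
          [126, 4014]]
acc, kappa = accuracy_and_kappa(smo_cm)
print(f"accuracy {100 * acc:.3f} %, kappa {kappa:.4f}")
```

This prints an accuracy of 94.863 % and a kappa of 0.8455, as on slide 4. Kappa is well below accuracy because the classes are skewed (about 79 % false), so the chance agreement `p_e` is already about 0.67.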
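Slide 5 shows both encodings of the same windows. Comparing the two data excerpts, each nucleotide maps to four indicator attributes in the order A, T, C, G (not the `{A,C,G,T}` order of the nominal declaration). This sketch reproduces that mapping; the 13 positions are assumed to be -7 to -1 and +1 to +6, which is consistent with the attribute names shown and with the feature names on slide 9. The helpers `one_hot` and `attribute_names` are hypothetical.

```python
# Attribute order per position, as listed in donors_trainset.arff (_A, _T, _C, _G).
ORDER = "ATCG"
# Assumed 13 window positions: -7..-1 and +1..+6 (no position 0).
POSITIONS = [-7, -6, -5, -4, -3, -2, -1, 1, 2, 3, 4, 5, 6]


def one_hot(window):
    """Encode a 13-nt window as 52 binary values, four per position."""
    if len(window) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} nucleotides, got {len(window)}")
    bits = []
    for base in window:
        if base not in ORDER:
            raise ValueError(f"unexpected symbol {base!r}")
        bits.extend(1 if base == b else 0 for b in ORDER)
    return bits


def attribute_names():
    """Binary attribute names in the file's style, e.g. '-7_A' ... '6_G'."""
    return [f"{p}_{b}" for p in POSITIONS for b in ORDER]


# First nominal instance on slide 5, converted to the binary encoding.
print(",".join(map(str, one_hot("CTCCGAAAGGATT"))) + ",true")
```

The printed line is identical to the first binary data row on slide 5, and `one_hot("TCAGAAGGAGGGC")` reproduces the second row.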
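Slides 2 and 6 run J48, Weka's implementation of the C4.5 decision tree, on the two encodings. C4.5 picks the attribute to split on by gain ratio (information gain divided by split information), and for a nominal attribute it creates one branch per value by default. With the nominal encoding a test on a position is therefore a four-way split, while with the binary encoding each test is a two-way split on one position/nucleotide indicator. A minimal sketch of the gain-ratio computation follows, ignoring pruning and C4.5's other refinements; the helper names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of splitting on a nominal attribute.

    One branch per distinct attribute value (a multi-way split);
    no pruning or other C4.5 refinements.
    """
    n = len(labels)
    branches = {}
    for v, y in zip(attribute, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute)  # entropy of the branch sizes
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the nucleotide at one position and the class label.
position = ["G", "G", "G", "A", "C", "T", "G", "A"]
is_site = [True, True, True, False, False, False, False, False]

# Nominal encoding: one four-way split on the position.
print(round(gain_ratio(position, is_site), 4))
# Binary encoding: a two-way split on the "is G" indicator for that position.
print(round(gain_ratio([v == "G" for v in position], is_site), 4))
```

On this toy data both splits have the same information gain (about 0.549 bits), but the four-way split has larger split information (1.75 vs. 1 bit) and hence a lower gain ratio (0.3136 vs. 0.5488). This only illustrates the mechanics; it does not by itself explain the differences between slides 2 and 6.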
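Slides 3 and 7 run NaiveBayes, which scores each class by its prior times the product of per-attribute conditional probabilities, treating attributes as independent given the class. For the nominal encoding that means one table of P(nucleotide | class) per position, with four values each. The sketch estimates these tables from counts with add-one (Laplace) smoothing and works in log space to avoid underflow. This is a common formulation, not necessarily identical to Weka's `bayes.NaiveBayes`; the class name `CategoricalNB` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log

ALPHABET = "ACGT"  # four nominal values per position, as in the nominal file


class CategoricalNB:
    """Naive Bayes over fixed-length nucleotide windows (add-one smoothing)."""

    def fit(self, windows, labels):
        self.n = len(labels)
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # counts[c][i][b] = number of class-c windows with base b at position i
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        for window, y in zip(windows, labels):
            for i, base in enumerate(window):
                self.counts[y][i][base] += 1
        return self

    def log_score(self, window, c):
        """Unnormalized log posterior: log P(c) + sum_i log P(base_i | c)."""
        n_c = self.class_counts[c]
        score = log(n_c / self.n)
        for i, base in enumerate(window):
            # Add-one smoothing: unseen bases still get a non-zero probability.
            score += log((self.counts[c][i][base] + 1) / (n_c + len(ALPHABET)))
        return score

    def predict(self, window):
        return max(self.classes, key=lambda c: self.log_score(window, c))


# Hypothetical toy windows (3 positions) and labels, for illustration only.
windows = ["GTA", "GTG", "GTA", "CAT", "TTC", "AGC"]
labels = [True, True, True, False, False, False]
nb = CategoricalNB().fit(windows, labels)
print(nb.predict("GTA"), nb.predict("CAC"))
```

It prints `True False`: "GTA" matches the toy positive windows at every position, while each base of "CAC" was seen only in negative windows.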
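Slide 9 selects features with `CfsSubsetEval`. CFS (correlation-based feature selection) rates a subset S of k features by its merit, `Merit_S = k * r_cf / sqrt(k + k * (k - 1) * r_ff)`, where `r_cf` is the mean feature-class correlation and `r_ff` the mean feature-feature inter-correlation; a subset scores well when its features predict the class without being redundant with each other. The sketch below assumes absolute Pearson correlation on 0/1 columns and a plain greedy forward search. Weka's `CfsSubsetEval` measures correlation between nominal attributes differently (Hall's original formulation uses symmetrical uncertainty), and `BestFirst` can backtrack, so this is only an approximation. All helper names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def merit(subset, columns, target):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff), with |Pearson r| as correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(columns, target):
    """Greedy forward search: add the best feature while the merit improves."""
    selected, best = [], 0.0
    while True:
        candidates = [f for f in columns if f not in selected]
        if not candidates:
            break
        # max() keeps the first of several equally good candidates.
        f = max(candidates, key=lambda c: merit(selected + [c], columns, target))
        score = merit(selected + [f], columns, target)
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best


# Hypothetical 0/1 columns named in the style of the binary encoding.
target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "-1_G": [1, 1, 1, 1, 0, 0, 0, 1],      # strongly related to the class
    "1_A": [1, 1, 0, 0, 0, 0, 0, 0],       # weaker, partly correlated with -1_G
    "1_A_copy": [1, 1, 0, 0, 0, 0, 0, 0],  # exact duplicate of 1_A
    "5_C": [1, 0, 1, 0, 1, 0, 1, 0],       # unrelated to the class
}
selected, best = greedy_forward_cfs(columns, target)
print(selected, round(best, 4))
```

The search stops at `['-1_G', '1_A']`. The duplicate `1_A_copy` is never added: with `1_A` already selected it brings no new information about the class but raises the mean inter-correlation, so the merit would drop to about 0.74; adding the unrelated `5_C` would lower it as well.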