
Evaluating multi-class classification using binary SVMs
(IT-642: Course Project)

Vijay T. Raisinghani                Pradeep Jagannath
rvijay@it.iitb.ac.in                pradeep@it.iitb.ac.in
Roll No: 01429703                   Roll No: 00329010

Abstract

We study how SVM-based binary classifiers are used for multi-way classification. We present the results of experiments run on various UCI and KDD datasets, using the SVMLight package. The methods evaluated are 1-versus-1, 1-versus-many and Error Correcting Output Coding (ECOC).

1. Main Objectives

- Use three binary classification schemes (1-vs-1, 1-vs-many, ECOC), with SVMLight [SVM02], on various UCI datasets and the KDD intrusion detection dataset.
- Report accuracy and run-time for the various methods.

2. Status and other details

- All tasks fully completed.
- Percentage contribution of members:
  - Pradeep Jagannath – 50%
  - Vijay T. Raisinghani – 50%
- Total time spent on the project: ??

3. Major stumbling blocks

- SVM parameter estimation. We referred to [Dua01] and other papers (see section "Related Work") and had discussions with Shantanu Godbole (Ph.D. student, KR School of IT, IIT Bombay) to "estimate" the required kernel parameters. Still, with no prior estimate of the time required for the various datasets, especially the KDD dataset, we had to abort tests that had been running for days.
- The KDD dataset was tested using only 1% of the data (i.e. 50,000 records); the full dataset has 5 million records. Even 10% of the records took a very large amount of time.

4. Introduction

Many supervised learning tasks can be cast as the problem of assigning elements to a finite set of classes or categories. For example, the goal of optical character recognition (OCR) is to determine a digit's value (0..9) from its image. A number of other applications also require such multi-way classification, e.g. text and speech categorisation, natural language processing, and gesture and object recognition in machine vision [All00].

In designing machine learning algorithms, it is often easier to first devise algorithms for distinguishing between only two classes [All00]. Ensemble schemes have been proposed which use binary (two-class) classification algorithms to solve K-class classification problems. Decomposing a K-class classification problem into a number of binary classification problems allows an ensemble scheme to model binary class boundaries with much greater flexibility at a lower computational cost [Goh01].

Three representative ensemble schemes are one per class (1-vs-many), pairwise coupling (1-vs-1), and error-correcting output coding (ECOC) [Goh01].

1. One per class (OPC). This is also known as "one against others." OPC trains K binary classifiers, each of which separates one class from the other (K-1) classes. Given a point X to classify, the binary classifier with the largest output determines the class of X.
2. Pairwise coupling (PWC). PWC constructs K(K-1)/2 pairwise binary classifiers. The classifying decision is made by aggregating the outputs of the pairwise classifiers.

3. Error-correcting output coding (ECOC). ECOC was first proposed by Dietterich and Bakiri [Die95] to reduce classification error by exploiting the redundancy of the coding scheme. ECOC employs a set of binary classifiers assigned codewords such that the Hamming distance between each pair of codewords is large enough to enable good error correction.

(Illustrative sketches of all three decision rules appear after section 6 below.)

5. Related Work

[Die95] discusses the use of the ECOC method versus multi-way classification using decision trees. [Gha00] and [Ren01] present the use of ECOC for improving the performance of Naïve Bayes for text classification. [All00], [Wes99] and [Hsu02] propose extensions to SVMs for multi-way classification. [Goh01] provides details of how to boost the output of binary SVMs for image classification. [Mor98] discusses various methods of combining the outputs of 1-vs-1 classifiers.

6. Implementation details

All our test scripts were shell scripts which invoked SVMLight. Additionally, for ECOC we used a modified form of bch3.c [Zar02], an encoder/decoder program for BCH codes written in C. We modified the program to encode and decode in parts. The program generates the code matrix based on the input set of classes and accepts the 'received' code from our shell scripts for decoding. We hard-coded the other parameters: the code length to 31 bits and the error-correcting capability to 15 bits. This resulted in a maximum data length of 6 bits, i.e. we could encode a maximum of 64 classes with these settings. This was sufficient for the data sets we used, which had a maximum of 26 classes.
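To make the OPC rule concrete, here is a minimal illustrative sketch in Python (our actual tests used shell scripts invoking SVMLight; the function below shows only the aggregation step, and its name and inputs are our own):

    def opc_predict(scores):
        # scores[k] is the output of the binary classifier trained to
        # separate class k from the remaining K-1 classes.
        # The class whose classifier produces the largest output wins.
        return max(range(len(scores)), key=lambda k: scores[k])

    # Example: outputs of K = 3 one-per-class classifiers for a point X.
    print(opc_predict([-0.4, 1.7, 0.2]))  # -> class 1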
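The PWC aggregation step can similarly be sketched as majority voting over the K(K-1)/2 pairwise classifiers (again illustrative; pairwise_sign is an assumed interface, not part of SVMLight):

    from itertools import combinations

    def pwc_predict(pairwise_sign, num_classes):
        # pairwise_sign(i, j) is assumed to return a positive value if
        # the classifier trained on classes {i, j} prefers i, else j.
        # Each of the K(K-1)/2 classifiers casts one vote.
        votes = [0] * num_classes
        for i, j in combinations(range(num_classes), 2):
            winner = i if pairwise_sign(i, j) > 0 else j
            votes[winner] += 1
        return max(range(num_classes), key=lambda k: votes[k])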
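Finally, the ECOC decoding step performed by our modified bch3.c amounts to nearest-codeword (minimum Hamming distance) decoding. A sketch with a toy code matrix (our real runs used 31-bit BCH codewords with 15-bit error-correcting capability, not the 5-bit codewords shown here):

    def ecoc_predict(received, codewords):
        # received is the vector of binary classifier outputs; decode to
        # the class whose codeword is nearest in Hamming distance.
        def hamming(a, b):
            return sum(x != y for x, y in zip(a, b))
        return min(codewords, key=lambda cls: hamming(received, codewords[cls]))

    # Toy 5-bit code matrix for 3 classes (illustrative only):
    codes = {0: [0, 0, 0, 0, 0], 1: [1, 1, 1, 0, 0], 2: [0, 0, 1, 1, 1]}
    print(ecoc_predict([1, 1, 0, 0, 0], codes))  # -> class 1, despite one flipped bit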
7. Experiments and Results

All our tests were run on cygnus (PIII, 3 processors, 512 MB RAM) running Linux 2.4.17 (RedHat 7.1).

We experimented with various kernel settings. In some cases the test did not terminate and had to be aborted; for some settings, the accuracy from all the methods was very low. We exclude the results for which the test had to be aborted or the accuracy was low for all the methods.

Data Set                   No. of classes          Train records        Test records
iris                       3                       100                  50
dermatology                6                       244                  122
glass                      7                       142                  72
ecoli                      8                       224                  112
yeast                      10                      989                  495
KDD intrusion detection    23 (train), 38 (test)   4,898,431 (4.9 M)    311,029 (0.3 M)
letter                     26                      15,000               5,000

Figure 1: Data sets

The dermatology dataset had missing values in the age attribute. We substituted these with the most frequent value of age.

For the KDD dataset we did the following (a sketch of the de-duplication and stratification steps appears at the end of this section):
- Reduced the training data to only 1%, i.e. 50,000 records.
- Scanned the 1% set for duplicates and found that 50% were duplicates. After eliminating them, the training data had 23,000 records.
- One training record had 55 features while all others had 41. We eliminated this record, although it may not have contributed to any problems.
- Performed feature selection using "Inducer" [MLC++] and C4.5, selecting 16 features from the original set of 41.
- Stratified the de-duplicated file to a maximum of 50 records per class, yielding 689 records, in order to run a simpler and faster test.
- Used these 689 records to train with the RBF kernel with parameters -g 0.03 -c 10 -q 50 -n 40 (an invocation sketch also appears at the end of this section).

In KDD with feature subset selection, using only the 689 training records and testing on 10 percent of the test file (29,615 records):
- 5,000 test records had class labels that were non-existent in the training data; these 5,000 records directly contributed to classification errors.
- 1-vs-1: almost all records got classified as class 1 (3.7% accuracy).
- 1-vs-many: 13,687 errors (53.9% accuracy).
- ECOC: 12,489 errors (57.8% accuracy).

Data Set                   Parameters
iris                       kernel: poly d=3; other: c=0.001
dermatology                kernel: RBF g=0.01; other: c=10
glass                      kernel: RBF g=0.8; other: c=10
ecoli                      kernel: poly d=3; other: -
yeast                      kernel: RBF g=10; other: c=10
KDD intrusion detection    kernel: RBF g=0.001; other: c=1, q=50, n=40
letter                     kernel: RBF g=0.01; other: c=1, q=50, n=40

Figure 2: SVM parameters for the various data sets

Dataset                      1-vs-1    1-vs-many    ECOC
iris                         94%       98%          98%
dermatology                  86.9%     89.3%        90.98%
glass                        66.7%     68%          73.6%
ecoli                        80.3%     55.3%        73.2%
yeast                        53.3%     44.6%        55.8%
KDD intrusion detection*     5.8%      47.5%        63.7%
KDD intrusion detection++    3.7%      53.9%        57.8%
letter                       34%       78.5%        88.38%

Figure 3: Accuracy with the various methods
* 1% train data, de-duplicated = 23,000 records; 10% test data = 29,615 records
++ 0.01% train data, de-duplicated = 689 records; 10% test data = 29,615 records

Run times for file conversion and learning with each method:

Dataset                      1-vs-1             1-vs-many          ECOC
                             Convert   Learn    Convert   Learn    Convert   Learn
iris                         0.09      0.11     0.06      0.17     0.79      1.21
dermatology                  0.43      3.1      0.29      1.19     2.1       9.79
glass                        0.29      1.14     0.13      0.38     0.96      3.45
ecoli                        0.65      1.22     0.19      0.45     1.16      2.53
yeast                        1.66      9.87     0.6       14.73    4.29      97.1
KDD intrusion detection*     75.96     6530     44.37     2980     81.25     13675
KDD intrusion detection++    6.63      101.9    1.44      19.87    1.2       83.96
letter                       105.4     1609     33.49     972.7    108       9230

[Chart: number of classes vs. accuracy for 1-vs-1, 1-vs-many and ECOC]
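As noted above, here is a minimal sketch of the de-duplication and stratification steps applied to the 1% KDD slice (the file name, comma-separated format and last-field class label are assumptions; the actual work was done with shell scripts):

    import random
    from collections import defaultdict

    def dedup_and_stratify(records, max_per_class=50):
        # Drop exact duplicate records while preserving order, then keep
        # at most max_per_class records per class (stratified sampling).
        unique = list(dict.fromkeys(records))
        by_class = defaultdict(list)
        for rec in unique:
            by_class[rec.rsplit(',', 1)[-1].strip()].append(rec)
        sample = []
        for recs in by_class.values():
            sample.extend(random.sample(recs, min(max_per_class, len(recs))))
        return sample

    # Hypothetical usage on the 1% training slice:
    # with open('kdd_train_1pct.csv') as f:
    #     stratified = dedup_and_stratify(f.read().splitlines())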
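For reference, one of the 1-vs-many training runs above could be driven from Python roughly as follows (the relabelled per-class training files are hypothetical; the flags are those reported for our 689-record KDD run, and -t 2 selects SVMLight's RBF kernel):

    import subprocess

    # One svm_learn call per class: records of class k are relabelled +1,
    # all others -1. Parameters match the 689-record KDD run above.
    for k in range(23):
        subprocess.run(
            ['svm_learn', '-t', '2',        # RBF kernel
             '-g', '0.03', '-c', '10',      # gamma and C
             '-q', '50', '-n', '40',        # QP subproblem size / new variables
             f'train_class{k}.dat',         # hypothetical relabelled file
             f'model_class{k}'],
            check=True)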
[Chart: file conversion time vs. rows per class for 1-vs-1, 1-vs-many and ECOC]

[Chart: learning time vs. rows per class for 1-vs-1, 1-vs-many and ECOC]

References

[All00] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113-141, 2000.

[Die95] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263-286, 1995.

[Dua01] Kaibo Duan, S. Sathiya Keerthi, and Aun Neow Poo. ICONIP 2001, 8th International Conference on Neural Information Processing, Shanghai, China, November 14-18, 2001.

[Gha00] Rayid Ghani. Using error-correcting codes for text classification. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.

[Goh01] K. Goh, E. Chang, and K. Cheng. SVM binary classifier ensembles for image classification. CIKM'01, November 5-10, 2001, Atlanta, Georgia, USA.

[Hsu02] C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, 2002.

[MLC++] Silicon Graphics, Inc. MLC++, http://www.sgi.com/tech/mlc/, 2002.

[Mor98] M. Moreira and E. Mayoraz. Improving pairwise coupling classification with error correcting classifiers. In Proceedings of the Tenth European Conference on Machine Learning, April 1998.

[Ren01] Jason D. M. Rennie. Improving multi-class text classification with naïve Bayes. Master's thesis, Massachusetts Institute of Technology, 2001.

[SVM02] Thorsten Joachims. SVMLight, http://svmlight.joachims.org/, Cornell University, Department of Computer Science.

[Wes99] J. Weston. Extensions to the support vector method. PhD thesis, Royal Holloway University of London, 1999.

[Zar02] R. Morelos-Zaragoza. BCH codes, The Error Correcting Codes (ECC) Page, http://www.csl.sony.co.jp/person/morelos/ecc/codes.html, 2002.
