Classification of data using semi supervised learning  a learning disability case

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 432-440, © IAEME

CLASSIFICATION OF DATA USING SEMI-SUPERVISED LEARNING (A LEARNING DISABILITY CASE STUDY)

Pooja Manghirmalani Mishra1, Dr. Sushil Kulkarni2
1Dept. of Computer Science, University of Mumbai, Mumbai, India
2Dept. of Mathematics, Jai Hind College, Mumbai, India

ABSTRACT

In classification, semi-supervised learning arises when a large amount of unlabeled data is available alongside a small amount of labeled data. In such a situation, the focus is how to enhance the predictability of classification through unlabeled data. In this paper, we propose a semi-supervised learning methodology based on the Support Vector Machine and implement it on case samples of learning disability. It is observed that about 10% of children enrolled in school have a learning disability. Predicting learning disability in school-age children is a very complicated task because it tends to be identified in elementary school, where no single sign identifies it. As the information is in the form of labeled and unlabeled data, applying both together with the concept of margins gives better accuracy for predicting learning disability in children.

Keywords: Support Vector Machine, Learning Disability, Semi-Supervised Learning, Hyperplane.

I. INTRODUCTION

Learning refers to the process of inferring common rules by surveying examples. For instance, a child can learn what a 'car' is just by being shown examples of objects that are cars and objects that are not cars. The child need not be told any rules about what makes an object a car; the child can simply learn the model 'car' by observing examples. As each car example comes with a label that defines it, such examples (data or samples) belong to the supervised learning category. Semi-supervised learning (SSL) is a class of machine learning techniques that make use of both labeled and unlabeled samples for training.
SSL falls between unsupervised learning and supervised learning. Because SSL requires less human effort and claims to give better accuracy, it is of great interest both in theory and in practice. Computational models are needed because, in classification, SSL arises when a large number of unlabeled samples are available with only a small number of labeled samples. In such a situation, the focus is how to enhance the predictability of classification through unlabeled samples. As classification is a data mining technique used to predict group membership for data instances, this work aims at novel large-margin SSL methodologies that use grouping information from unlabeled samples in the form of a regularization controlling the interplay between labeled and unlabeled samples. The working of SSL is demonstrated on a data set of Learning Disability (LD).

Learning disability refers to a neurobiological disorder which affects a person's brain and interferes with the person's ability to think and remember [1]. The causes that lead to learning disability (LD) are maturational delay, some unexplained disorder of the nervous system, and injuries before birth or in early childhood. Children born prematurely and children who had medical problems soon after birth can also have LD [2]. LD can be broadly classified into three types: difficulty in learning to read (Dyslexia), to write (Dysgraphia), or to do simple mathematical calculations (Dyscalculia) [3].

The term "specific learning disability" means a disorder in one or more of the basic psychological processes involved in understanding or in using language, spoken or written. This may manifest itself in an imperfect ability to listen, speak, read, write, spell, or do mathematical calculations. The term includes such conditions as perceptual handicaps, brain injury, minimal brain dysfunction, dyslexia, and developmental aphasia. The term does not include children who have learning disabilities which are primarily the result of visual, hearing, or motor handicaps, mental retardation, emotional disturbance, or of environmental, cultural, or economic disadvantage [4].
LD cannot be cured completely by medication. Children with LD go through remedial study to help them keep up with non-LD children of their age. There is no global method for detecting LD. This paper proposes a model for the diagnosis and classification of LD.

Section II of this paper explores in detail different computational methods and models applied for SSL. Having explored the different approaches, we have found that there are still possible ways of approaching the given problem. Section III discusses the Support Vector Machine concept designed to classify the problem of LD. Section IV gives the implementation requirements of the system, and Sections V and VI discuss the results and future objectives respectively.

II. TAXONOMY

Lately, several SSL methods have been introduced [5]. Comprehensive reviews of SSL algorithms are given in [6]. One of the popular approaches to SSL is based on a weighted graph [7], where labeled and unlabeled points compose the vertices of the graph and the edge weights express the similarities between pairs of data points. A function is then used to label the unlabeled points on the graph. The technique for finding the appropriate weights and selecting the labeling function may differ. Many of these graph-based methods, such as [8, 9, 10, 11], presume a transductive situation, in which the learner needs to observe the unlabeled (testing) data while training. As a result, these transductive algorithms must be retrained every time a test sample is to be classified. Hence transductive algorithms may not satisfy the run-time requirements of many real-world applications, including computer-aided diagnosis applications where new patient cases need to be classified in real time as part of the physician's workflow.

Apart from the graph-based methods, T. Joachims [12] introduced an SVM-based semi-supervised algorithm (TSVM), where the labels of the unlabeled points are initialized with the prediction of the SVM classifier trained on the labeled data. The labels of the unlabeled points are then altered iteratively to improve the margin. However, this approach may lead to a locally optimal solution and can be comparatively time consuming. Bennett et al. [13] introduced a mixed integer programming (MIP) formulation that results in inductive classifiers (i.e., the algorithm produces a classifier that can be used directly to classify new samples without retraining). Nevertheless, this method needs a complex optimization solver and is not feasible for data where the size of the unlabeled set is not small. There are also methods that attempt to find efficient approximate solutions to the MIP formulation [14]. The drawback of these formulations is that they converge to a local minimum which may not be a sufficiently "good" solution.

The main problem seen in SSL is scalability. Current semi-supervised learning methods have not yet handled large amounts of data. The complexity of many elegant graph-based methods is close to $O(n^3)$. Speed-up improvements have been proposed by many researchers, but their effectiveness has yet to be proven on real large problems [15].

III. CLASSIFYING LEARNING DISABILITY DATA SAMPLE

Detection of LD is mostly done using the Wechsler Intelligence Scale for Children (WISC) test [16], conducted under the supervision of special educators and with the observations of parents and teachers. In this context, a computational approach to detecting LD is quite significant. Although states vary considerably in the IQ and achievement criteria used to designate a child as LD, discrepancy is used in the definition and/or the criteria by virtually all states, and the use of an IQ test to establish "aptitude" is equally common. The IQ-achievement discrepancy criterion is the most controversial and best-studied component of the central definition of LD.
From a classification perspective, it is a hypothesis that children with poor achievement below the level predicted by an IQ score are different from children with poor achievement consistent with their IQ score. IQ-discrepant children with LD have been proposed to differ from low achievers who are not IQ discrepant on several dimensions, including neurological integrity, cognitive characteristics, response to intervention, prognosis, gender, and the heritability of LD. A solution is a form of knowledge representation suitable for notions that cannot be defined precisely but depend upon their contexts. The soft computing technique called Support Vector Machine provides an alternative way to represent linguistic and subjective attributes of the real world in computing. It deals with labeled and unlabeled data together. In this circumstance, a computational approach to classifying LD is quite significant.

A. Collection of Exhaustive Parameters

A curriculum-based test was designed with respect to the syllabus of primary-level school-going children. This test was conducted in schools to collect LD datasets for testing. Historic data for LD cases were collected from the LD clinics of government hospitals, where the tests were conducted in real-time medical environments. The system was fed with 11 input units, which correspond to the 11 sections of the curriculum-based test. Table-1 shows the initial 11 inputs corresponding to the curriculum-based test. Column 1 gives the name of the parameter, column 2 gives the total marks allocated to that section, and column 3 gives the category of LD the section corresponds to. The dataset consists of 340 cases of LD children. The system was trained using 70% of the data items, and the remaining 30% was used to test the system [17, 18, 19, 20].
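As a rough illustration of the 70/30 partition described above, the sketch below shuffles 340 case indices and splits them into training and test sets. The random shuffling, the fixed seed, and the function name `split_cases` are assumptions made for illustration; the paper does not state how cases were assigned to the two sets.

```python
import random


def split_cases(n_cases, train_fraction=0.7, seed=0):
    """Shuffle case indices and split them into training and test lists."""
    indices = list(range(n_cases))
    # The seed only makes this illustrative split repeatable (assumption).
    random.Random(seed).shuffle(indices)
    n_train = round(n_cases * train_fraction)
    return indices[:n_train], indices[n_train:]


# 340 cases, as reported for the LD dataset.
train_idx, test_idx = split_cases(340)
print(len(train_idx), len(test_idx))
```

With 340 cases this gives 238 training cases and 102 test cases.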
Input Parameter    Marks    Category of LD
Essay              10       Dysgraphia
Reading            10       Dyslexia
Comprehension      10       Dyslexia, Dysgraphia
Spelling           10       Dysgraphia
Perception         10       Dyslexia
Solve              10       Dysgraphia
Word Problem       10       Dyscalculia, Dyslexia
Mental Sums        10       Dyscalculia
Time               10       Dyscalculia
Calendar           05       Dyscalculia
Money              05       Dyscalculia

Table-1: Parameters and marks of curriculum-based test

B. Support Vector Machine

Vladimir Vapnik invented the Support Vector Machine in 1979 [21]. The Support Vector Machine algorithm is based on statistical learning theory. It is a method for the classification of both linear and non-linear data. The fundamental idea behind the SVM is to map the original data into a feature space of high dimensionality through a non-linear mapping function and to create an optimal hyperplane in the new space [22]. SVM can be applied to both classification and regression. In the case of classification, an optimal hyperplane is found that separates the data into two classes, whereas in the case of regression a hyperplane is constructed that lies close to as many points as possible [23, 24].

Separating the classes with a large margin minimizes a bound on the expected generalization error. A minimum generalization error means that when new examples arrive for classification, the chance of making an error in the prediction based on the learned classifier should be minimal [25]. Such a classifier is one which achieves the maximum separation margin between the classes. The two planes that are parallel to the classifier and pass through one or more points in the data set are called bounding planes [26]. SVMs select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible [15].
The points in the dataset falling on the bounding planes are called support vectors. The SVM algorithm transforms the original data into a higher dimension, where it can find a hyperplane that separates the data using essential training tuples called support vectors [27]. If the training vectors are separated without errors by an optimal hyperplane, the expected error rate on a test sample is bounded by the ratio of the expected number of support vectors to the number of training vectors [28]. Since this ratio is independent of the dimension of the problem, good generalization is guaranteed if one can find a small set of support vectors.
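To make the notions of bounding planes and support vectors concrete, the following minimal sketch takes a separating hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$ that is already known, computes the quantity $y_i(\mathbf{w} \cdot \mathbf{x}_i + b)$ for each point, and reports which points lie on a bounding plane. The four toy points and the values of `w` and `b` are invented for illustration (the hyperplane was derived by hand for this toy set); the sketch does not train an SVM and is not the authors' Java implementation.

```python
import math


def functional_margins(w, b, points, labels):
    """Return y_i * (w . x_i + b) for every labeled point."""
    return [y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
            for x, y in zip(points, labels)]


def margin_width(w):
    """Distance 2 / ||w|| between the two bounding planes."""
    return 2.0 / math.sqrt(sum(wk * wk for wk in w))


def support_vectors(w, b, points, labels, tol=1e-9):
    """Points lying on a bounding plane, i.e. with y * (w . x + b) equal to 1."""
    margins = functional_margins(w, b, points, labels)
    return [x for x, m in zip(points, margins) if abs(m - 1.0) <= tol]


# Toy two-class data (hypothetical): two points per class.
points = [(1.0, 1.0), (0.0, 1.0), (3.0, 3.0), (4.0, 3.0)]
labels = [-1, -1, +1, +1]

# Hand-derived maximum-margin hyperplane for the toy data (assumption).
w, b = (0.5, 0.5), -2.0

margins = functional_margins(w, b, points, labels)
svs = support_vectors(w, b, points, labels)
width = margin_width(w)
print(margins, svs, width)
```

Every point has a value of at least 1, so no point falls between the bounding planes, and only the two points closest to the separating hyperplane qualify as support vectors.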
Figure-1: Maximum-margin hyperplane and margins for an SVM trained with samples from two classes. Samples on the margin are called the support vectors.

The overall aim is to generalize well to test data. This is achieved by introducing a separating hyperplane which maximizes the margin between the two classes; this is known as the optimum separating hyperplane [30]. Assume each example consists of $m$ feature values $(x_1, \ldots, x_m)$ followed by a label, which in the two-class classification considered here is +1 or -1, with -1 representing one state and +1 the other. The two classes are then separated by an optimum hyperplane, illustrated in Figure-1, which maximizes the distance between the closest +1 and -1 points, known as support vectors [29]. The right-hand side of the separating hyperplane represents the +1 class and the left-hand side represents the -1 class. This classification divides two separate classes, which are generated from training examples.

C. Algorithm

We consider data points of the form $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\}$, where $y_i = 1$ or $-1$ is a constant denoting the class to which the point $\mathbf{x}_i$ belongs, and $n$ is the number of samples.
i. Each $\mathbf{x}_i$ is a $p$-dimensional real vector. Scaling is important to guard against variables (attributes) with larger variance. We can view this training data by means of the dividing (or separating) hyperplane, which takes the form $\mathbf{w} \cdot \mathbf{x} + b = 0$ ----- (1)
ii. Here $b$ is a scalar and $\mathbf{w}$ is a $p$-dimensional vector. The vector $\mathbf{w}$ points perpendicular to the separating hyperplane. Adding the offset parameter $b$ allows us to increase the margin. In the absence of $b$, the hyperplane is forced to pass through the origin, restricting the solution.
Since we are interested in the maximum margin, we consider the support vectors and the parallel hyperplanes through them. The parallel hyperplanes can be described by the equations $\mathbf{w} \cdot \mathbf{x} + b = 1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$.
iii. If the training data are linearly separable, we can select these hyperplanes so that there are no points between them and then try to maximize their distance. By geometry, the distance between the hyperplanes is $2 / \|\mathbf{w}\|$, so we want to minimize $\|\mathbf{w}\|$. To exclude data points from the margin, we need to ensure that for all $i$ either $\mathbf{w} \cdot \mathbf{x}_i + b \geq 1$ or $\mathbf{w} \cdot \mathbf{x}_i + b \leq -1$.
iv. This can be written as: $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad 1 \leq i \leq n$ ----- (2)
v. Samples along the bounding hyperplanes are called Support Vectors (SVs). The separating hyperplane with the largest margin, $M = 2 / \|\mathbf{w}\|$, is specified by the support vectors, i.e. the training data points closest to it, which satisfy $y_j (\mathbf{w} \cdot \mathbf{x}_j + b) = 1$ for every support vector $\mathbf{x}_j$ ----- (3)
vi. The Optimal Canonical Hyperplane (OCH) is a canonical hyperplane having a maximum margin.
vii. For all the data, the OCH should satisfy the constraints $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, l$ ----- (4) where $l$ is the number of training data points.
viii. In order to find the optimal separating hyperplane having a maximal margin, a learning machine should minimize $\|\mathbf{w}\|^2$ subject to the inequality constraints $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, l$.

D. Steps

i. The data is in the form of pairs, each consisting of an input object and a desired output value.
ii. Take the input marks of the 11 subjects for each student.
iii. Calculate the weighted sum.
iv. Plot the graph as Student Name Index vs. Marks (weighted sum).
v. Based on the data plotted on the graph, find the maximum-margin hyperplane that divides the points having class -1 or +1.
vi. After getting the hyperplane, check the points: IF a point is above the hyperplane THEN assign the label LD->NO, ELSE IF the point is below the hyperplane THEN assign the label LD->YES.
vii. Then compare this Predicted Label with the Actual Label and calculate the accuracy.

IV. IMPLEMENTATION

The system is implemented using JAVA. The LD data collected is stored in Excel sheets. The experiments were conducted on a workstation with an Intel Pentium(R) 4 CPU, 3.06 GHz, 2 GB of RAM, running on Microsoft Windows 7 Home Edition, Version 2002 with Service Pack 3.

V. RESULT AND DISCUSSION

SVM belongs to the group of supervised learning algorithms in which the learning machine is given a set of examples with the associated labels. As in the case of decision trees, the examples are in the form of attribute vectors. This case study has been carried out on more than 300 real cases whose attributes, which represent the symptoms of LD, take binary values. More work needs to be carried out on quantitative data, as that is an important part of any data set. In comparison with our other studies in this field using LVQ and Single Layer Perceptron [17, 18], we found that SVM is more suitable for attribute selection, while a decision tree would probably be better suited to classification.
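The steps listed in Section III.D can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' Java implementation: the weights are taken as equal because the paper does not give them, the marks are invented toy values, and the maximum-margin boundary on the one-dimensional weighted sum is taken as the midpoint between the highest LD score and the lowest non-LD score, which holds only when the training scores are separable.

```python
def weighted_sum(marks, weights):
    """Step iii: combine the 11 section marks into one score."""
    return sum(m * w for m, w in zip(marks, weights))


def max_margin_threshold(scores, labels):
    """Step v on 1-D scores: midpoint between the closest points of each class."""
    highest_ld = max(s for s, y in zip(scores, labels) if y == "YES")
    lowest_non_ld = min(s for s, y in zip(scores, labels) if y == "NO")
    if highest_ld >= lowest_non_ld:
        raise ValueError("scores are not separable by a single threshold")
    return (highest_ld + lowest_non_ld) / 2


def predict(score, threshold):
    """Step vi: above the boundary -> LD NO, otherwise LD YES."""
    return "NO" if score > threshold else "YES"


def accuracy(predicted, actual):
    """Step vii: percentage of predicted labels that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


weights = [1.0] * 11  # equal weights (assumption)

# Toy training cases: 11 marks each, with the actual LD label (hypothetical).
train = [
    ([3, 2, 4, 3, 2, 3, 2, 3, 2, 1, 1], "YES"),
    ([4, 4, 3, 4, 3, 4, 3, 4, 3, 2, 2], "YES"),
    ([8, 7, 8, 7, 8, 7, 8, 7, 8, 4, 4], "NO"),
    ([6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 3], "NO"),
]
test = [
    ([5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 2], "NO"),
    ([2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1], "YES"),
    ([5, 5, 5, 5, 5, 5, 5, 4, 4, 2, 2], "NO"),
]

train_scores = [weighted_sum(m, weights) for m, _ in train]
threshold = max_margin_threshold(train_scores, [y for _, y in train])

predictions = [predict(weighted_sum(m, weights), threshold) for m, _ in test]
acc = accuracy(predictions, [y for _, y in test])
print(threshold, predictions, acc)
```

On this toy data the boundary falls at a weighted sum of 48, and the third test case, which scores 47 but is labeled non-LD, is the one misclassification.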
Graph 1: SVM classifier graph of the entire dataset. The margin represents the border: the points above the line belong to the non-LD group and the points below the line belong to the LD group of children.

Correctly classified instances: 84.615%
Incorrectly classified instances: 15.385%
Hence the system accuracy is 84.615%.

VI. FUTURE WORK OBJECTIVE

We want to propose various algorithms that will overcome the limitations of SSL.
i. The first limitation is that it is very costly to obtain labeled instances. This is mainly observed in the medical domain, where a physician must evaluate each case and assign a label, which is time consuming. We want to propose methods that exploit the relationship between labeled and unlabeled samples and then incorporate the information hidden in the unlabeled samples into learning algorithms such as SVM.
ii. The second limitation we have observed is that classification techniques assume that samples from the training set and the test set are independent and identically distributed (i.i.d.), i.e. each random variable has the same probability distribution as the others and all are mutually independent. In practical applications this assumption may not be realistic, as sub-groups of samples can have a high degree of correlation among both their features and their labels. Thus we intend to introduce approaches that relax the i.i.d. assumption in learning algorithms such as SVM.
iii. The last limitation we want to study is that most classification techniques are designed for binary classification, while many applications require more than two classes. Thus we are interested in studying algorithms whose learning can be extended to multiple classes with two goals in mind: efficiency in terms of training and testing times, and increased accuracy from information hidden in inter-class relationships. For instance, SVM is designed for binary classification, and our interest is to study different algorithms where SVM learning can be extended to multiple classes.
REFERENCES

[1]. S. A. Kirk, 'Educating Exceptional Children', Wadsworth Publishing, ISBN: 0547124139.
[2]. Lisa L. Weyandt; 'The physiological bases of cognitive and behavioral disorders'; Blausen Medical Communications, United States.
[3]. Lerner, Janet W.; 'Learning disabilities: theories, diagnosis, and teaching strategies'; Boston: Houghton Mifflin; ISBN 0395961149.
[4]. Fletcher, Francis, Rourke, Shaywitz & Shaywitz; 'Classification of learning disabilities: an evidence based evaluation'; 1993.
[5]. Belkin, Matveeva, and Niyogi; 'Regularization and semi-supervised learning on large graphs'; Proceedings of Workshop on Computational Learning Theory, Morgan Kaufmann, 2004.
[6]. M. Seeger; 'Learning with labeled and unlabeled data'; technical report, Institute for ANC, Edinburgh, UK, 2000.
[7]. A. Blum and S. Chawla; 'Learning from labeled and unlabeled data using graph mincuts'; in ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning, pages 19-26, Morgan Kaufmann, 2001.
[8]. A. Corduneanu and T. Jaakkola; 'Distributed information regularization on graphs'; in NIPS '04: Advances in Neural Information Processing Systems, 2004.
[9]. D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf; 'Learning with local and global consistency'; in NIPS '03: Advances in Neural Information Processing Systems, MIT Press, 2003.
[10]. D. Zhou and B. Schölkopf; 'Learning from labeled and unlabeled data using random walks'; in German Pattern Recognition Symposium, pages 237-244, 2004.
[11]. X. Zhu, Z. Ghahramani, and J. D. Lafferty; 'Semi-supervised learning using Gaussian fields and harmonic functions'; in ICML '03: Proceedings of the Twentieth International Conference on Machine Learning, pages 912-919, 2003.
[12]. T. Joachims; 'Transductive inference for text classification using support vector machines'; in ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning, pages 200-209, Morgan Kaufmann, 1999.
[13]. K. P. Bennett and A. Demiriz; 'Semi-supervised support vector machines'; in NIPS '98: Advances in Neural Information Processing Systems 10, pages 368-374, MIT Press, 1998.
[14]. G. Fung and O. L. Mangasarian; 'Semi-supervised support vector machines for unlabeled data classification'; Optimization Methods and Software, 15:29-44, 2001.
[15]. Xiaojin Zhu; 'Semi-Supervised Learning Literature Survey'; Computer Sciences TR 1530, University of Wisconsin-Madison, July 19, 2008.
[16]. Kaplan, Robert M.; Saccuzzo, Dennis P. (2009); 'Psychological Testing: Principles, Applications, and Issues (Seventh ed.)', Belmont (CA): Wadsworth, p. 262 (citing Wechsler (1958) The Measurement and Appraisal of Adult Intelligence), ISBN 978-0-495-09555-2.
[17]. Kavita Jain, Pooja Manghirmalani, Jyotshna Dongardive, Siby Abraham; 'Computational Diagnosis of Learning Disability'; International Journal of Recent Trends in Engineering, Vol 2, No. 3, pages 64-44, ACEEE 2009.
[18]. Pooja Manghirmalani, Zenobia Panthaky, Kavita Jain; 'Learning Disability Diagnosis and Classification - a Soft Computing Approach'; IEEE World Congress on Information and Communication Technologies, pages 479-484, 2011.
[19]. Pooja Manghirmalani, Darshana More, Kavita Jain; 'A Fuzzy Approach to Classify Learning Disability'; International Journal of Advanced Research in Artificial Intelligence (IJARAI), pages 1-7, 2012.
[20]. Kavita Jain, Pooja Manghirmalani Mishra, Sushil Kulkarni; 'A Neuro-Fuzzy System to Diagnose Learning Disability'; IEEE International Conference on Radar, Communication and Computing (ICRCC 2012).
[21]. N. Cristianini and J. Shawe-Taylor; 'An Introduction to Support Vector Machines'; Cambridge University Press, 2000; ISBN: 0521780195.
[22]. Asa Ben-Hur, Jason Weston; 'A User's Guide to Support Vector Machines'; in O. Carugo, F. Eisenhaber (eds.), Data Mining Techniques for the Life Sciences, Methods in Molecular Biology 609, DOI 10.1007/978-1-60327-241-4_13, Humana Press, a part of Springer Science + Business Media, LLC 2010.
[23]. Soman K. P., Loganathan R., Ajay V.; 'Machine Learning with SVM and other Kernel Methods'; New Delhi, PHI Learning Pvt. Ltd, ISBN 978-81-203-3435-9, 2009.
[24]. Stuart Russell, Peter Norvig; 'Artificial Intelligence - A Modern Approach'; Pearson Prentice Hall, 2009.
[25]. Anshu Bharadwaj; 'Support Vector Machines'; chapter, Indian Agricultural Statistics Research Institute, New Delhi, India.
[26]. Radhika Y., Shashi M.; 'Atmospheric Temperature Prediction using Support Vector Machines'; International Journal of Computer Theory and Engineering, Vol. 1, No. 1, April 2009, 1793-8201, 55-58.
[27]. Chapelle, O., Zien, A., & Schölkopf, B. (Eds.); 'Semi-supervised learning'; MIT Press, 2006.
[28]. Chawla, N. V., & Karakoulas, G.; 'Learning from labeled and unlabeled data: An empirical study across techniques and domains'; Journal of Artificial Intelligence Research, 23, 331-366, 2005.
[29]. De Bie, T., & Cristianini, N.; 'Semi-supervised learning using semi definite programming'; in O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning, Cambridge, Massachusetts: MIT Press, 48, 2006.
[30]. Fung, G., & Mangasarian, O.; 'Semi-supervised support vector machines for unlabeled data classification'; Technical Report 99-05, Data Mining Institute, University of Wisconsin Madison, 1999.
[31]. S. Aruna, L. V. Nandakishore and Dr S. P. Rajagopalan; 'A Novel Lns Semi Supervised Learning Algorithm for Detecting Breast Cancer'; International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 44-53, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[32]. Jagadanand G., Kiran Y. M., Saly George and Jeevamma Jacob; 'Single Chip Implementation of Support Vector Machine Based Bi-Classifier'; International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 74-84, ISSN Print: 0976-6367, ISSN Online: 0976-6375.