313 318


Published on

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

313 318

  1. 1. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 CLASSIFICATION OF TEXT USING FUZZY BASED INCREMENTAL FEATURE CLUSTERING ALGORITHMANILKUMARREDDY TETALI B P N MADHUKUMAR K.CHANDRAKUMAR M.Tech Scholar Associate Professor Associate Professor Department of CSE , Department of CSE, Department of CSE,B V C Engineering college, B V C Engineering college, VSL Engineering college, Odalarevu Odalarevu Kakinadaakr.tetali@gmail.com bpnmadhukumar@hotmail.com chandhu_kynm@yahoo.comAbstract: applied before the classification of the text takes The dimensionality of feature vector place. Feature selection [1] and feature extractionplays a major in text classification. We can [2][3] approaches have been proposed for featurereduce the dimensionality of feature vector by reduction.using feature clustering based on fuzzy logic. Classical feature extraction methodsWe propose a fuzzy based incremental feature uses algebraic transformations to convert theclustering algorithm. Based on the similarity representation of the original high dimensional datatest we can classify the feature vector of a set into a lower-dimensional data by a projectingdocument set are grouped into clusters process. Even though different algebraicfollowing clustering properties and each cluster transformations are available the complexity ofis characterized by a membership function with these approaches is still high. Feature clustering isstatistical mean and deviation .Then a desired the most effective technique for feature reductionnumber of clusters are formed automatically. in text classification. The idea of feature clusteringWe then take one extracted feature from each is to group the original features into clusters with acluster which is a weighted combination of high degree of pair wise semantic relatedness. Eachwords contained in a cluster. By using our cluster is treated as a single new feature, and, thus,algorithm the derived membership function feature dimensionality can be drastically reduced.match closely with real distribution of training McCallum proposed a first feature extractiondata. By our work we reduce the burden on user algorithm which was derived from thein specifying the number of features in advance. “distributional clustering”[4] idea of Pereira et al. to generate an efficient representation ofKeywords: documents and applied a learning logic approach for training text classifiers. In these featureIncremental feature clustering, fuzzy clustering methods, each new feature is generatedsimilarity, dimensionality reduction, by combining a subset of the original words andweighting matrix, text classifier. follows hard clustering, also mean and variance of a cluster are not considered. These methodsIntroduction: impose a burden on the user in specifying the number of clusters. A feature vector contains a set of features We propose a fuzzy based incrementalwhich are used for the classification of the text. feature clustering algorithm, which is anThe dimensionality of feature vector plays a incremental feature clustering[[5][6] approach tomajor role in classification of text. For example if a reduce the number of features for the textdocument set have 100000 words then it becomes classification task. The feature vector of adifficult task for the classification of text. To solve document are grouped into clusters followingthis problem, feature reduction approaches are clustering properties and each cluster is 313 All Rights Reserved © 2012 IJARCET
  2. 2. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012characterized by a membership function with Dimensionality Reduction of thestatistical mean and deviation This forms thedesired number of clusters automatically. We then Feature Vector:take one extracted feature from each cluster which In general, there are two ways of doingis a weighted combination of words contained in a feature reduction, feature selection, and featurecluster. By using our algorithm the derived extraction. By feature selection approaches, a newmembership function match closely with real feature set W0 is obtained, which is a subset of thedistribution of training data. Also user need not to original feature set W. Then W0 is used as inputsspecify the number of features in advance. for classification tasks. Information Gain (IG) is frequently employed in the feature selection approach. Feature clustering is an efficientThe main advantages of the approach for feature reduction which groups allproposed work are: features into some clusters, where features in a  A fuzzy incremental feature clustering cluster are similar to each other. The feature clustering methods proposed before are “hard” (FIFC) algorithm which is an incremental clustering methods, where each word of the clustering approach to reduce the original features belongs to exactly one word cluster. Therefore each word contributes to the dimensionality of the features in text synthesis of only one new feature. Each new classification. feature is obtained by summing up the words belonging to one cluster.  Determine the number of features 2(a).Proposed Method: automatically. There are some drawbacks to the existing  Match membership functions closely methods. First up all the user need to specify the number of clusters in advance. Second when with the real distribution of the training calculating the similarities the variance of the data. underlying cluster are not considered. Third all words in a cluster have the same degree of  Runs faster than other methods. contribution to the resulting extracted feature. Our fuzzy incremental feature clustering algorithm is  Better extracted features than other proposed to deal with these issues. methods. Suppose we are given a document set D of n documents d1,d2…dn together with a featureBackground and Related work: vector W of m words w1,w2…wn, and p classes Let D=<d1,d2…dn> be a document set c1,c2…cp. We then construct one word pattern forof n documents, where d1,d2…dn are individual each word in W. For word wi, its word pattern xi isdocuments and each document belongs to one of defined asthe classes in the set {c1,c2…cp}. If a two ordocument belongs to two or more classes, then twoor more copies of the document with differentclasses are included in D. Let the word set WhereW={w1,w2...wn} be the feature vector of thedocument set. The feature reduction task is to finda new word set W0 such that W and W0 workequally but W0<W well for all the desired For i≤j≤p. Here dqi indicates the number ofproperties with D. Based on the new feature vector occurrences of wi in document dqthe documents are classified into correspondingclusters. 314 All Rights Reserved © 2012 IJARCET
  3. 3. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012Also initialized. On the contrary, when the word pattern is combined into an existing cluster, the membership function of that cluster should be updated accordingly. Let k be the number of currently existing clusters. Therefore we have m word patterns in The clusters are G1,G2…Gk respectively. Eachtotal. It is these word patterns, our clustering cluster Gj have the mean mj=<mj1,mj2…mjp>algorithm will work on. Our goal is to group the and deviation σj=< σj1, σj2… σjp>. Let Sj be thewords in W into clusters, based on these word size of the cluster Gj. Initially we have k=0.so nopatterns. A cluster contains a certain number of clusters exist at the beginning. For each wordword patterns, and is characterized by the productof P one-dimensional Gaussian functions. Gaussian pattern xi=<xi1,xi2…xip>, 1≤i≤m,we calculatefunctions[7][8] are adopted because of their the similarity of xi to each existing clusters assuperiority over other functions in performance.Let G be a cluster containing q word patternsXj=<xj1,xj2…xjp> 1≤j≤p. Then the mean isdefined as m=<m1,m2…mp> and the deviationσ =< σ1, σ2… σp> of G are defined For 1≤j≤k, we sat that xi passes the similarity test on cluster Gj, if Where ρ, o≤ρ≤1 is a predefined threshold. If the user intends to have larger clusters, then he/she can give a smaller threshold. Otherwise, a bigger threshold can be given. As the threshold increases,For i≤j≤p, where |G| denotes the size of G.the fuzzy the number of clusters also increases. Two cases may occur. First, there are no existing fuzzysimilarity of a word pattern X=<x1,x2…xp> clusters on which Xi has passed the similarity test.to cluster G is defined by the following For this case, we assume that xi is not similarmembership function enough to any existing cluster and a new cluster Gh, h=k+1, is created withWhere 0≤µG≤1. Where σ0 is a user defined constant vector. The new vector Gh contains only one member i.e., the2(b).Fuzzy based Incremental word pattern xi at this point, since it contains only one member the deviation of a cluster is zero. WeFeature Clustering: cannot use deviation zero in calculating fuzzy Our clustering algorithm is an similarities. Hence we initialize the deviation of aincremental, self constructing algorithm. Word newly created cluster by and the number of clusterspatters are considered one by one. No clusters exist in increased by 1 and the size of cluster Gh, Shthe beginning, and clusters are created can be should be initialized i.e.,created if necessary. For each word pattern, thesimilarity of this word pattern to each existingcluster is calculated to decide whether it iscombined into an existing cluster or a new clusteris created. Once a new cluster is created, thecorresponding membership function should be 315 All Rights Reserved © 2012 IJARCET
  4. 4. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012Second, if there are existing clusters on which xi 3. Feature extraction usinghas passed the similarity test, let cluster Gt be thecluster with the largest membership degree, i.e., Weighting Matrix: Feature Extraction can be expressed in the following form:In this case the modification of cluster Gt is Wheredescribes as follows With For 1≤i≤n. Where T is a weighting matrix. The goal of feature reduction is achieved by finding We discuss briefly here the appropriate T such that k is smaller than m. In thecomputational cost of our method and compare it divisive information theoretic feature clusteringwith DC[9] , IOC[10] , and IG[11]. For an input algorithm the elements of T are binary and can bepattern, we have to calculate the similarity between defined as follows:the input pattern and every existing cluster. Eachpattern consists of p components where p is thenumber of classes in the document set. Therefore,in worst case, the time complexity of our method isO(mkp) where m is the number of original features By applying our feature clusteringand k is the number of clusters finally obtained. algorithm word patterns have been grouped intoFor DC, the complexity is O(mkp) where t is the clusters, and words in the feature vector W are alsonumber of iterations to be done. The complexity of clustered accordingly. For one cluster, we have oneIG is O(mp+mlogm) and the complexity of IOC is extracted feature. Since we have k clusters, weO(mkpn) where n is the number of documents have k extracted features. The elements of T areinvolved. Apparently, IG is the quickest one. Our derived based on the obtained clusters, and featuremethod is better than DC and IOC. extraction will be done. We propose three weighting approaches: hard, soft, and mixed. In the hard-weighting approach, each word is only allowed to belong to a cluster, and so it only contributes to a new extracted feature. In this case, the elements of T are defined as follows: 316 All Rights Reserved © 2012 IJARCET
  5. 5. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 2012 In the soft-weighting approach, each word isallowed to contribute to all new extracted features,with the degrees depending on the values of themembership functions. The elements of T are Where l is the number of training patterns, C is adefined as follows: parameter, which gives a tradeoff between maximum margin and classification error, and yi,The mixed-weighting approach is a combination of being +1 or -1, is the target label of pattern xi.the hard-weighting approach and the soft- Ø:X → F is a mapping from the input space to theweighting approach. In this case, the elements of T feature space F, where patterns are more easilyare defined as follows: separated, and is the hyper plane to be derived with w, and b being weight vector and offset, respectively. We follow the idea to construct an SVM-based classifier. Suppose, d is an unknown By selecting the value of γ, we provide document. We first convert d to d0 byflexibility to the user. When the similaritythreshold is small, the number of clusters is small,and each cluster covers more training patterns. In Then we feed dʹto the classifier. We getthis case, a smaller γ will favor soft-weighting and p values, one from each SVM. Then d belongs toget a higher accuracy. When the similarity those classes with 1, appearing at the outputs ofthreshold is large, the number of clusters is large, their corresponding SVMs.and each cluster covers fewer training patternswhich get a higher accuracy. 5. Conclusions: We have presented a fuzzy based4. Classification Of Text Data:: incremental feature clustering (FIFC) Given a set D of training documents, text algorithm, which is an incremental clusteringclassification can be done as follows: We specify approach to reduce the dimensionality of thethe similarity threshold ρ, and apply our clustering features classification of text. Feature that arealgorithm. Assume that k clusters are obtained for similar to each other are placed in the samethe words in the feature vector W. Then we find cluster. New clusters formed automatically, ifthe weighting matrix T and convert D to D’ .Using D’ as training data, a text classifier based a word is not similar to any existing cluster.on support vector machines (SVM) is built. SVM is Each cluster so formed is characterized by aa kernel method, which finds the maximum margin membership function with statistical meanhyperplane in feature space separating the images and deviation. By our work the derivedof the training patterns into two groups[12][13]. A membership functions match closely with the realslack variables ξi are introduced to account for distribution of the training data. We reduce themisclassifications. The objective function and burden on the user in specifying the number ofconstraints of the classification problem can be extracted features in advance. Experiments resultsformulated as: shows that our method can run faster and obtain better extracted features methods. 317 All Rights Reserved © 2012 IJARCET
  6. 6. ISSN: 2278 – 1323 International Journal of Advanced Research in Computer Engineering & Technology Volume 1, Issue 5, July 20126. References: [10]J. Yan, B. Zhang, N. Liu, S. Yan, Q. Cheng, W. Fan,[1]Y.Yang and J.O.Pedersen, “A Comparative Q. Yang, W. Xi, and Z. Chen, “Effective and EfficientStudy on Feature Selection in Text Dimensionality Reduction for Large-Scale and StreamingCategorization,” Proc. 14th Int’l Conf. Machine Data Preprocessing,” IEEE Trans. Knowledge andLearning, pp. 412-420, 1997. Data Eng., vol. 18, no. 3, pp. 320-333, Mar. 2006.[2]D.D.Lewis, “Feature Selection and Feature [11]Y. Yang and J.O. Pedersen, “A Comparative StudyExtraction for Text Categorization,” Proc. on Feature Selection in Text Categorization,” Proc. 14th Int’l Conf. Machine Learning, pp. 412-420, 1997.Workshop Speech and Natural Language, pp. 212-217, 1992. [12]B.Scho¨lkopf and A.J.Smola, Learning with[3]H.Li,T.Jiang, and K.Zang, “Efficient and Robust Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.Feature Extraction by Maximum MarginCriterion,” T.Sebastian, S.Lawrence, and [13]J.Shawe-Taylor and N.Cristianini, KernelS. Bernhard eds. Advances in Neural InformationProcessing System, pp. 97-104, Springer, 2004. Methods for Pattern Analysis. Cambridge Univ. Press, 2004..[4]L.D.Baker and A.McCallum, “DistributionalClustering of Words for Text Classification,” Proc. 7. About the Authors:ACM SIGIR, pp. 96-103, 1998. Anil Kumar Reddy Tetali is currently pursuing his[5]L.D.Baker and A.McCallum, “Distributional M.Tech in Computer Science and Engineering atClustering of Words for Text Classification,” Proc. ACM BVC Engineering College Odalarevu.SIGIR, pp. 96-103, 1998. B P N Madhu Kumar is currently working as an [6]R.Bekkerman, R. El-Yaniv, N. Tishby, and Y. Winter, Associate Professor in Computer Science and“Distributional Word Clusters versus Words for Text Engineering department, BVC Engineering CollegeCategorization,” J. Machine Learning Research, vol. Odalarevu. His research interests include data3, pp. 1183-1208, 2003. mining, web mining. K.Chandra Kumar is currently working as an[7J.Yen and RLangari, Fuzzy Logic-Intelligence, Associate Professor in Computer Science andControl, and Information. Prentice-Hall, 1999. Engineering Department,VSL Engineering College Kakinada. His research interests include data[8]J.S.Wang and C.S.G.Lee, “Self-Adaptive mining and text mining.Neurofuzzy Inference Systems for ClassificationApplications,” IEEE Trans. Fuzzy Systems, vol.10, no. 6, pp. 790-802, Dec. 2002.[9]I.S. Dhillon, S. Mallela, and R. Kumar, “A DivisiveInfomation- Theoretic Feature Clustering Algorithm forText Classification,” J. Machine Learning Research,vol. 3, pp. 1265-1287, 2003. 318 All Rights Reserved © 2012 IJARCET