SlideShare a Scribd company logo
1 of 7
Download to read offline
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. V (Jan тАУ Feb. 2015), PP 83-89
www.iosrjournals.org
DOI: 10.9790/0661-17158389 www.iosrjournals.org 83 | Page
Computational Intelligence Methods for Clustering of Sense
Tagged Nepali Documents
Sunita Sarkar1
, Arindam Roy2 ,
Bipul Syam Purkayastha3
1,2,3
(Department of Computer Science , Assam University ,India)
Abstract: This paper presents a method using hybridization of self organizing map (SOM ), particle swarm
optimization(PSO) and k-means clustering algorithm for document clustering. Document representation is an
important step for clustering purposes. The common way of represent a text is bag of words approach. This
approach is simple but has two drawbacks viz. synonymy and polysemy which arise because of the ambiguity of
the words and the lack of information about the relations between the words. To avoid the drawbacks of bag of
words approach words are tagged with senses in WordNet in this paper. Sense tagging of words provide exact
senses of words. Feature vectors are generated using sense tagged documents and clustering is carried out
using proposed hybrid SOM+PSO+K-means algorithm. In the proposed algorithm initially SOM is applied to
the feature vectors to produce the prototypes and then K-means clustering algorithm is applied to cluster the
prototypes. Particle Swarm Optimization algorithm is used to find the initial centroid for K-means algorithm.
Text documents in Nepali language are used to test the hybrid SOM+PSO+K-means clustering algorithm.
Keywords: Computational Intelligence, Sense tagging, Self organization map, Particle swarm optimization.
I. Introduction
Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches to
address complex real-world problems. Paradigms that comprise CI techniques are Neural networks,
evolutionary computing, swarm intelligence and fuzzy systems. Computational Intelligence methods have been
successfully applied to many fields such as diagnosis of diseases, speech recognition, data mining, composing
music, image processing, forecasting, robot control, credit approval, classification, pattern recognition, planning
game strategies, compression, combinatorial optimization, fault diagnosis, clustering, scheduling, and time
series approximation. control systems, gear transmission and braking systems in vehicles, controlling lifts, home
appliances, controlling traffic signals, and many others[1]. This paper concerns with the application of CI
methods in document clustering.
Document clustering is the process of grouping/dividing a set of documents into subsets (called
clusters) so that the documents are similar to one another within the cluster and are dissimilar to documents in
other clusters. Vector space model with bag of words is the common approach for representing a text. This
approach suffers from two drawbacks viz. several words can have same meanings (synonymy) and same words
can have multiple meanings (polysemy). In this paper an attempt has been made to handle these issues by
tagging the words with senses. Given a word and its possible senses, as defined in a knowledge base, sense
tagging is the process of assigning the most appropriate senses to the words in the corpus
within a given context where sense can be defined as semantic value (content) of a word
when compared to other words; i.e. when it is part of a group or set of related words[ 2].
When words are sense tagged, the most appropriate senses are attached to the words . In this
work feature vectors are generated using sense tagged document corpus and clustering is done by a hybrid
SOM+PSO+K-means clustering algorithm.
Self organizing map[3] is an artificial neural network and have been successfully applied to document
clustering. The SOM is an algorithm used to visualize and interpret large high-dimensional data sets. It is an
unsupervised learning algorithm. It produces a set of prototype vectors representing the data set and carries out a
topology preserving projection of the prototypes from the n-dimensional input space onto a low-dimensional
grid.
PSO is a computational intelligence technique first introduced by Kennedy and Eberhart in 1995[4] .
PSO is a population-based stochastic search algorithm which is modeled after the social behavior of a bird flock.
In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where
each potential solution is referred to as a particle. The aim of the PSO is to find the particle position that results
in the best evaluation of a given fitness (objective) function[5].
In the context of clustering, a single particle represents the Nc cluster centroid vectors. That is, each
particle xi is constructed as follows:
xi=(oi1,...,oij,...,oiNc) (1)
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 84 | Page
Where oij refers to the jth
cluster centroid vector of the ith
particle in cluster Cij. Therefore, a swarm represents a
number of candidate clusters for the current data vectors. The fitness of particles is measured using the equation
given below.
(2)
where mij denotes the jth
data vector, which belongs to cluster i; oi is the centroid vector of the ith
cluster;
d(oi, mij) is the distance between data vector mij and the cluster centroid oi; Pi stands for the number of dataset,
which belongs to cluster Ci and Nc stands for the number of clusters.
In this paper a hybrid SOM+PSO+K-means algorithms is presented for document clustering. Several
experiments were performed to analyze the performance of clustering algorithm on sense-tagged Nepali
document corpus and Nepali document corpus without sense-tagging.
Nepali is an Indo-Aryan language spoken by approximately 45 million people in Nepal, where it is the
language of government and the medium of much education, and also in neighboring countries (India, Bhutan
and Myanmar). Nepali is written in the Devanagari alphabet. It is written phonetically, that is, the sounds
correspond almost exactly to the written letters. Nepali has many loanwords from Arabic and Persian languages,
as well as some Hindi and English borrowings[6].
The rest of this paper is organized as follows: Related work is discussed in section II. Section III
provides a description of generation of document vectors. In section IV clustering algorithm is discussed.
Section V discusses the experimental results and Section VI concludes the paper.
II. Related Works
Many clustering techniques have been developed and successfully applied for clustering of text
documents. Cui and Potok [7] proposed a hybrid PSO + K -Means algorithm for clustering of text documents.
Document vectors were generated using tf-idf method. They applied three different clustering algorithms
namely K-Means, PSO and Hybrid PSO+ K-Means to the generated document vectors. The result showed that
the proposed Hybrid PSO+ K-Means algorithm achieves higher clustering quality than other two clustering
algorithms.
Lo[8] applied Self-Organizing Map (SOM) algorithm to cluster Chinese botanical documents. Vector
Space Model with bag of words approach was used to represent each botanical document.
Xinwu[9] presented a text clustering algorithm combining K-means algorithm and SOM. In the
proposed approach clustering is typically carried out in two stages: first, the data set is clustered using the SOMs
to obtain a group of output weights which. the number of SOM networkтАЩs output nodes equals to the number of
the textsтАЩ categories. The results obtained from SOM are used to initialize K-means algorithmтАЩs cluster centers
and implement K-means algorithm to cluster the text sets.
Sridevi and Nagaveni,[10] showed that combination of ontology and optimization improve the
clustering performance. They proposed a ontology similarity measure to identify the importance of the concepts
in the document. Ontology similarity measures is defined using wordnet synsets and the particle swarm
optimization is used to cluster the document.
In [11]authors proposed a semantic text document clustering approach based on the WordNet lexical
categories and Self Organizing Map (SOM) neural network. The proposed approach generates documents
vectors using the lexical category mapping of WordNet after preprocessing the input documents.
Yang and Hodges [12] used sense disambiguation method to construct feature vector for document
representation. In this system, words are first mapped to word senses using a semantic relatedness based word
sense disambiguation algorithm. Then these senses are used to construct the feature vector to represent the
documents. Two different sense representation methods, namely, senseno and offset, are used. For clustering
four algorithms are applied namely K-Means, Buckshot, HAC and bisecting k-means.
Meshrif et.al [13] have developed a system that utilizes KohonenтАЩs Self Organizing Map to cluster
Arabic textual documents containing information about different types of crimes. The system used the
correlation or dependency relationship between some intransitive verbs and some prepositions to recognize the
type of crime.
III. Generation of document vectors
Vector space model with bag of words is the simple and common way of representing a text. In this
method words are used as the component of the document vector. The limitations of this approach is that it
cannot capture ambiguity, synonymy, semantic relations between words. For example suppose a sentence,
"A bear can bear very cold temperatures", the word bear has frequency count of 2. In the above stated example
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 85 | Page
sentence the word "bear" has two different senses, viz., first bear means an animal and meaning of second
bear is tolerate. Hence, the problem with VSM using bag of words approach is that it finds the frequency count
of a word whereas for a proper generation of document vectors the frequency count of the senses of a word is
required. In this work rather than bag of words senses has been used for generation of document vectors. For
example consider the sentences, "the fly can fly" and "A bear can bear very cold temperatures" which appear in
a document, the vector corresponding to these sentences considering words as component of the vector is
shown in the Table 1
Table 1: Document Vector
Fly Cold Bear Temperature
D1 2 0 0 0
D2 0 1 2 1
If the above example sentences are sense tagged then they will look like "fly_02192818 can
fly_01944262" and the A bear_02134305 can bear_00670017 very cold_14168983 temperatures_05018974.
Vectors corresponding to these sense tagged sentences where we have considered synset id as the component
of the vectors are shown in Table 2
Table 2: Document Vector with senses
Fly_
02192818
Fly_
01944262
Cold_
14168983
Bear_
02134305
Bear_
00670017
Temperature_
05018974
D1 1 1 0 0 0 0
D2 0 0 1 1 1 1
In the example sentences stated above the word fly and bear have two senses. In the first sentence first
'fly' is an two- winged insect and second 'fly' is travel through the air. In the second sentence first bear is an
animal and second bear is tolerate. The term based method ignores the fact that words may be ambiguous and
generate vectors by considering only the frequency count of the words. We can see from the example that vector
generated by the term based method both the words fly has been considered as a single term and has frequency
count of 2.Similarly the word bear has frequency count of 2. But in the vector generated from sense tagged
sentences they have find different places in the vector because they have different synset ids. The synset id of
first fly is 02192818 and the synset id of second fly is 01944262 and the frequency count of synset
id_02192818 and synset id_01944262 is 1. Similarly both bears have find different places in the vector
because they have different synset ids. Once the feature vectors are completed in this way, weights are
assigned to each word/ synset id across the corpus using TF*IDF method [14], which is the combination of the
term/sense frequency (TF), and the inverse document frequency (IDF). Terms which are not present in the
WordNet are not considered as element of the feature vector.
Table 3, 4,5 and 6 presents some sample Napali text documents and their corresponding sense tagged
Nepali text documents.
Table 3 Sample Napali text document1
рд╡рд░реНрддрдорд╛рди рдпреБрдЧрдорд╛ рдЯреЗрдХреНрдиреЛрдХрд▓реНрдЪрд░рдХреЛ рд╡реНрдпрд╛рдйрдХрд░реНрд╛ рдмрдврдврд░рд╣реЗрдЫред рдпрд╕ рд╡реНрдпрд╛рдйрдХ рд╡рд╡рд╢реНрд╡ рдмрдЬрд╛рд░рдорд╛ рдкреНрд░рддрд░реНрдврд┐рди рдмрдврдврд░рд╣реЗрдХреЛ рд░реНрдХреНрддрдирдХреА рд░ рднреМрддрд░реНрдХ рд╕реКрд╕рд╛рдзрдирд▒реЗ рд╕рдмреИрд▒рд╛рдИ
рдЕрдзреАрдирд╕реНрде рдЧрд░реЗрдХреЛ рдЫред
Table 4. Sample Napali text document2
рд╣реЗрд▓рд▒рдХрдкреНрдЯрд░рдорд╛ рд░реН рдмрд╕рдорд╛ рдЬрд╕реНрд░реНреЛ рд▓рднрдб рднрдЗрд╣рд╛рд▓реНрд┐реЛ рд░реИрдирдЫ рддрди рд░реН рдйрд░реИрдмрдб (рдмрд╛рдЯ) рд▒рдЗрди (рд▒рд╛рдЗрди) рд▒рдЧрд╛рдПрд░ рд▒рд╛реЙрд┐реЛ рд░реИрдЫтАЩредрдЙрдиреАрд╣рд░реВрд▒реЗ рдЬреАрд╡рдирдХрд╛рд▒рдорд╛ рд╕рдзреИрдБ рдмрд╕ рд▓рднрдб рднрдПрдХреЛ
рд┐реЗрдЦреЗрдХрд╛ рдЫрди реНред рдмрд╕рдорд╛ рдЪрдвреНрдиреБрднрдиреНрд┐рд╛ рдЕрдЧрд╛рдбрдб рдйреИрд╕рд╛ рдЙрдард╛рдЙрдиреЗ рдПрдХрд┐реЗрдЦрдЦ рд┐реБрдИ рдШрдгреНрдЯрд╛ рдйрдЦрд╛рддрдЙрдиреЗред
Table 5. Sample sense tagged Napali text document1
рд╡рд░реНрддрдорд╛рди_12784 рдпреБрдЧ_128346рдорд╛ рдЯреЗрдХреНрдиреЛрдХрд▓реНрдЪрд░рдХреЛ рд╡реНрдпрд╛рдйрдХрд░реНрд╛_13365 рдмрдврдврд░рд╣реЗрдЫ_210794ред рдпрд╕ рд╡реНрдпрд╛рдйрдХ_4895 рд╡рд╡рд╢реНрд╡ рдмрдЬрд╛рд░_15303рдорд╛
рдкреНрд░рддрд░реНрдврд┐рди_36459 рдмрдврдврд░рд╣реЗрдХреЛ рд░реНрдХреНрддрдирдХреА рд░ рднреМрддрд░реНрдХ_42443 рд╕реКрд╕рд╛рдзрди_110146рд▒реЗ рд╕рдмреИ_46962рд▒рд╛рдИ рдЕрдзреАрдирд╕реНрде_415356 рдЧрд░реЗрдХреЛ рдЫ_210794ред
Table 6. Sample sense tagged Napali text document2
рд╣реЗрд▓рд▒рдХрдкреНрдЯрд░_117822рдорд╛ рд░реН рдмрд╕_19610рдорд╛ рдЬрд╕реНрд░реНреЛ_45655 рд▓рднрдб_15499 рднрдЗрд╣рд╛рд▓реНрд┐реЛ рд░реИрдирдЫ_210794 рддрди рд░реН рдйрд░реИрдмрдб (рдмрд╛рдЯ) рд▒рдЗрди (рд▒рд╛рдЗрди_11878) рд▒рдЧрд╛рдПрд░
рд▒рд╛реЙрд┐реЛ рд░реИрдЫтАЩред рдЙрдиреАрд╣рд░реВрд▒реЗ рдЬреАрд╡рдирдХрд╛рд▒_13665рдорд╛ рд╕рдзреИрдБ_37161 рдмрд╕_19610 рд▓рднрдб_15499 рднрдПрдХреЛ_41469 рд┐реЗрдЦреЗрдХрд╛ рдЫрди реНред рдмрд╕_19610рдорд╛ рдЪрдвреНрдиреБрднрдиреНрд┐рд╛
рдЕрдЧрд╛рдбрдб_38219 рдйреИрд╕рд╛_1859 рдЙрдард╛рдЙрдиреЗ рдПрдХ_12929рд┐реЗрдЦрдЦ рд┐реБрдИ_49167 рдШрдгреНрдЯрд╛_15014 рдйрдЦрд╛рддрдЙрдиреЗред
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 86 | Page
IV. Hybrid SOM+PSO+K-MEANS Clustering Algorithm
The proposed hybrid SOM+PSO+k-Means clustering algorithm consists of three phases as described below:
Phase I: In this phase Self Organizing Map Algorithm is applied on the input vectors to produce the prototype
vectors which can be interpreted as тАЬprotoclusters. The number of prototype vectors are much larger than the
expected number of clusters. The prototypes are grouped in the next step to form the actual clusters.
Phase II: In this phase PSO algorithm is executed on the prototypes vectors to find clusters' centroid for K-
menas algorithm.
Phase III: The cluster centres obtained in Phase II are used in phase III. K-Means algorithm is applied in this
phase to generate the actual clusters . The result from PSO which is found in phase II is used as the initial seed
of the K-means algorithm. Phase III converges quickly when the centroids from Phase II are used.
The overall process of clustering is described in figure 1.
Fig.1 Overall Process of Clustering
The complete algorithm is now presented next.
Phase I Self Organizing Map
1). Each node's weights are initialized.
2). A vector is chosen at random from the set of training data and presented to the network.
3). Every node in the network is examined to calculate which ones' weights are most like the input vector. The
winning node is commonly known as the Best Matching Unit (BMU).
n is the total number of grid neurons.
4). The radius of the neighborhood of the BMU is calculated. This value starts large. Typically it is set to be the
radius of the network, diminishing each time-step.
5). Any nodes found within the radius of the BMU, are adjusted to make them more like the input vector. The
closer a node is to the BMU, the more its weights are altered.
where
t time;
adaptation coefficient;
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 87 | Page
neighborhood kernel centered on the winner unit:
where rc and ri are positions of neurons c and i on the SOM grid. Both and decrease
monotonically with time.
6). Repeat step 2 for N iterations.
Phase II Particle Swarm Optimization
(1) Initialize each particle with K random cluster centroid vectors.
(2) For t= 1 to tmax do
a)For each particle i do:
b)For each data vector mp do
(i) Calculate the distance d (mp,oij), to all cluster centroids Cij
(ii) Assign each data vector to the closest centroid vector.
(iii) Calculate the fitness value based on equation (2).
c) Update the global best and local best positions
d) Update the cluster centroids using equations (3) and (4)
vid (t+1)=╧Й.vid(t)+c1.rand().(pid-xid)+c2.rand().(pgd-xgd) (3)
xid(t+1)=vid(t+1)+xid(t) (4)
Where tmax is the maximum number of iterations.
Phase III K-means Algorithm
15. Inheriting cluster centroid vectors from Phase II.
16. Assigning each prototype vector to the closest cluster centroids.
17. Update the cluster centers in each cluster using equation (5)
(5)
where dj denotes the document vectors that belong to cluster Sj; cj stands for the centroid vector; nj is the number
of document vectors belong to cluster Sj.
18. Repeat the steps 16 and 17 until there are no more changes in the values of the centroids.
V. Experimental Results
To demonstrate the performance of the proposed hybrid SOM+PSO + K-means clustering algorithm
an evaluation study was carried out. Comparison with other clustering algorithms viz. K-means, SOM + K-
means and PSO + K-means was conducted on sense tagged Nepali text corpus and Nepali text corpus without
sense tagging . Nepali document dataset has been collected from Technology Development for Indian Language
website [15].The corpus in Nepali language provides data from different domains such as literature, science,
media, art etc. Sense tagging of the whole document corpus was done using sense marking tool. In this study the
quality of the clustering is measured using Intra-cluster similarities and Inter-cluster similarity. A large average
intracluster value indicates that the vectors within a category are grouped tightly together. A small average
intercluster value indicates that the individual clusters are far apart. The results obtained are shown in table 7.
Table 7 Performance Comparison Of K-Means, Hybrid Pso+K-Means, Hybrid Som+K-Means And Hybrid
Som+Pso+K-Means Algorithms
No. of documents Dimension Method Clustering Algorithm Intracluster Intercluster
100 3174 Word based K-Means 3.6403 181.8389
PSO+Kmeans 3.6657 77.7284
SOM+K-means 1.6121 .6507
SOM+PSO+K-Means 1.6033 .6507
2435 Sense Based K-Means 4.1226 112.5602
PSO-Kmeans 4.0470 77.4100
SOM-K-means 2.3611 1.0084
SOM+PSO+K-Means 2.3682 1.0293
The bar graph of the comparison of the parameters is plotted for the different algorithm (k-means, hybrid
PSO+K-means, SOM+K-Means and hybrid SOM+PSO+K-meansтАЭ).
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 88 | Page
Fig2: Intra cluster similarity using various algorithms
Fig3: Inter cluster similarity using various algorithms
Fig 4: Number of iteration
From the experimental results it is observed that hybrid SOM+PSO+K-means algorithm generates
better result compared to K-means, hybrid PSO+K-means, SOM+K-means in terms of both intercluster
Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents
DOI: 10.9790/0661-17158389 www.iosrjournals.org 89 | Page
similarity and number of iterations. Table 1 shows that hybrid SOM+PSO+K-means algorithm with sense
based performs better than the hybrid PSO clustering algorithm with bag of words representation.
VI. Conclusion
In this paper a hybrid SOM+PSO+K-means algorithm is presented for clustering of document dataset.
The algorithm is executed on vectors generated from sense tagged Nepali document dataset where synset ids are
considered as component of the vectors. By sense tagging exact senses of words in the context are tagged with
words as synset id is unique for each word for each sense. In the experiments K-means and PSO+K-means
were executed both on direct document vector and on prototype vectors generated by SOM. Experimental
results indicate that using hybrid algorithm that is execution of PSO+K-means on prototype vectors can
produce compact clustering result compared to K-means, PSO+K-means and SOM+K-means.
References
[1]. A.P. Engebrecht, Computational Intelligence An Introduction (John Wiley & Sons, Ltd, 2007)
[2]. S. Urooj, S. Shams, S. Hussain, F. Adeeba, Sense Tagged CLE Urdu Digest Corpus, Proc. Conf. on Language and Technology,
Karachi, 2014
[3]. T. Kohonen, : Self-organized formation of topologically correct feature maps. Biol.Cybern. 43 59тАУ69 (1982)
[4]. J. Kennedy and R.C. Eberhart, тАЬParticle Swarm Optimization,тАЭ Proc. IEEE, International Conference on Neural Networks.
[5]. Piscataway. Vol. 4, pp 1942-1948,1995
[6]. S.C. Satapathy, N. VSSV P B. Rao, JVR. Murthy, R. P.V.G.D. Prasad, тАЬA Comparative Analysis of Unsupervised K-means, PSO
and Self- Organizing PSO for Image Clustering,тАЭProc. International Conference on Computational Intelligence and Multimedia
Applications,2007.
[7]. A. Roy, S. Sarkar, B. S. Purkayastha, тАЬA Proposed Nepali Synset Entry and Extraction Tool,тАЭ Proc. 6th
Global wordnet conference,
Matsue,Japan,2012.
[8]. X. Cui, T.E. Potok, Document Clustering Analysis Based on Hybrid PSO+ K-Means Algorithm, Journal of Computer Sciences
(Special Issue): 27-33, 2005 ISSN 1549-3636
[9]. H. Lo, C.C. Lin, R-J. Fang, C. Lee, and Y-C. Weng, Chinese Document Clustering Using Self Organizing Map Based on
Botanical Document Warehouse, proceedings European Computing Conference, Lecture Notes in Electrical Engineering 28.
[10]. U.K. Sridevi and N. Nagaveni, Semantically Enhanced Document Clustering Based on PSO Algorithm, European Journal of
Scientific Research, ISSN 1450-216X Vol.57 No.3, pp.485-493,2011
[11]. T.F Gharib, M.M Fouad, A. Mashat, I. Bidawi1, Self Organizing Map based Document Clustering Using WordNet Ontologie.
IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No. 2, 2012
[12]. L. Xinwu, Research on Text Clustering Algorithm Based on K_means and SOM International Symposium on Intelligent
Information Technology Application Workshops, 2008.13
[13]. Y. Wang, J. Hodges, Document clustering with semantic analysis, Proc. 39th Annual hawaii International Conference on System
Sciences, HICSS, Vol. 03, pp.54.3, 2006
[14]. M. Alruily, A. Ayesh, A. Al-Marghilani, Using Self Organizing Map to Cluster Arabic Crime Documents, Proc. International
Multiconference on Computer Science and Information Technology, pp. 357тАУ363, ISBN 978-83-60810-27-9 ISSN 1896-7094
[15]. G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18:613тАУ620.
[16]. http://tdil-dc.in.

More Related Content

What's hot

Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...
Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...
Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...IJECEIAES
┬а
Topic models
Topic modelsTopic models
Topic modelsAjay Ohri
┬а
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingNimrita Koul
┬а
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextUniversity of Bari (Italy)
┬а
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
┬а
Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlescsandit
┬а
Blei lafferty2009
Blei lafferty2009Blei lafferty2009
Blei lafferty2009Ajay Ohri
┬а
Hyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyHyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyIJwest
┬а
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationRichard Littauer
┬а
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...mlaij
┬а
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modellingcsandit
┬а
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...irjes
┬а
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENT
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENTA DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENT
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENTcscpconf
┬а
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
┬а
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
┬а

What's hot (17)

Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...
Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...
Complete agglomerative hierarchy documentтАЩs clustering based on fuzzy luhnтАЩs ...
┬а
Topic models
Topic modelsTopic models
Topic models
┬а
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
┬а
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
┬а
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
┬а
Novelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articlesNovelty detection via topic modeling in research articles
Novelty detection via topic modeling in research articles
┬а
Blei lafferty2009
Blei lafferty2009Blei lafferty2009
Blei lafferty2009
┬а
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
┬а
Hyponymy extraction of domain ontology
Hyponymy extraction of domain ontologyHyponymy extraction of domain ontology
Hyponymy extraction of domain ontology
┬а
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
┬а
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...
A Newly Proposed Technique for Summarizing the Abstractive NewspapersтАЩ Articl...
┬а
Artificial Intelligence of the Web through Domain Ontologies
Artificial Intelligence of the Web through Domain OntologiesArtificial Intelligence of the Web through Domain Ontologies
Artificial Intelligence of the Web through Domain Ontologies
┬а
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
┬а
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
┬а
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENT
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENTA DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENT
A DOMAIN INDEPENDENT APPROACH FOR ONTOLOGY SEMANTIC ENRICHMENT
┬а
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
┬а
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESNAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES
┬а

Viewers also liked

Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...
Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...
Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...IOSR Journals
┬а
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...IOSR Journals
┬а
Case Study of Tolled Road Project
Case Study of Tolled Road ProjectCase Study of Tolled Road Project
Case Study of Tolled Road ProjectIOSR Journals
┬а
Roles, Contributions and Dangers of Technologicaldevelopment in Nigeria
Roles, Contributions and Dangers of Technologicaldevelopment in NigeriaRoles, Contributions and Dangers of Technologicaldevelopment in Nigeria
Roles, Contributions and Dangers of Technologicaldevelopment in NigeriaIOSR Journals
┬а
Determinants of Cash holding in German Market
Determinants of Cash holding in German MarketDeterminants of Cash holding in German Market
Determinants of Cash holding in German MarketIOSR Journals
┬а
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...IOSR Journals
┬а
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...IOSR Journals
┬а
Soft Switched Resonant Converters with Unsymmetrical Control
Soft Switched Resonant Converters with Unsymmetrical ControlSoft Switched Resonant Converters with Unsymmetrical Control
Soft Switched Resonant Converters with Unsymmetrical ControlIOSR Journals
┬а
Characterization and Humidity Sensing Application of WO3-SnO2 Nanocomposite
Characterization and Humidity Sensing Application of WO3-SnO2 NanocompositeCharacterization and Humidity Sensing Application of WO3-SnO2 Nanocomposite
Characterization and Humidity Sensing Application of WO3-SnO2 NanocompositeIOSR Journals
┬а
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyThe Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyIOSR Journals
┬а
Sequential Pattern Tree Mining
Sequential Pattern Tree MiningSequential Pattern Tree Mining
Sequential Pattern Tree MiningIOSR Journals
┬а
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...IOSR Journals
┬а
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...IOSR Journals
┬а
Comparative Study of Geotechnical Properties of Abia State Lateritic Deposits
Comparative Study of Geotechnical Properties of Abia State Lateritic DepositsComparative Study of Geotechnical Properties of Abia State Lateritic Deposits
Comparative Study of Geotechnical Properties of Abia State Lateritic DepositsIOSR Journals
┬а
F1303013440
F1303013440F1303013440
F1303013440IOSR Journals
┬а

Viewers also liked (20)

Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...
Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...
Performance of Concrete at Elevated Temperatures: Utilizing A Blended Ordinar...
┬а
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...
Performance Analysis of Minimum Hop Source Routing Algorithm for Two Dimensio...
┬а
B010310612
B010310612B010310612
B010310612
┬а
G010245056
G010245056G010245056
G010245056
┬а
Case Study of Tolled Road Project
Case Study of Tolled Road ProjectCase Study of Tolled Road Project
Case Study of Tolled Road Project
┬а
Roles, Contributions and Dangers of Technologicaldevelopment in Nigeria
Roles, Contributions and Dangers of Technologicaldevelopment in NigeriaRoles, Contributions and Dangers of Technologicaldevelopment in Nigeria
Roles, Contributions and Dangers of Technologicaldevelopment in Nigeria
┬а
Determinants of Cash holding in German Market
Determinants of Cash holding in German MarketDeterminants of Cash holding in German Market
Determinants of Cash holding in German Market
┬а
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...
The evaluation of the effect of Sida acuta leaf extract on the microanatomy a...
┬а
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...
Design and Simulation of transformer less Single Phase Photovoltaic Inverter ...
┬а
Soft Switched Resonant Converters with Unsymmetrical Control
Soft Switched Resonant Converters with Unsymmetrical ControlSoft Switched Resonant Converters with Unsymmetrical Control
Soft Switched Resonant Converters with Unsymmetrical Control
┬а
D012412931
D012412931D012412931
D012412931
┬а
Characterization and Humidity Sensing Application of WO3-SnO2 Nanocomposite
Characterization and Humidity Sensing Application of WO3-SnO2 NanocompositeCharacterization and Humidity Sensing Application of WO3-SnO2 Nanocomposite
Characterization and Humidity Sensing Application of WO3-SnO2 Nanocomposite
┬а
B0560713
B0560713B0560713
B0560713
┬а
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyThe Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
┬а
Sequential Pattern Tree Mining
Sequential Pattern Tree MiningSequential Pattern Tree Mining
Sequential Pattern Tree Mining
┬а
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...
Feasibility of Using Tincal Ore Waste as Barrier Material for Solid Waste Con...
┬а
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...
The Effect of Impurity Concentration on Activation Energy Change of Palm Oil ...
┬а
Comparative Study of Geotechnical Properties of Abia State Lateritic Deposits
Comparative Study of Geotechnical Properties of Abia State Lateritic DepositsComparative Study of Geotechnical Properties of Abia State Lateritic Deposits
Comparative Study of Geotechnical Properties of Abia State Lateritic Deposits
┬а
F1303013440
F1303013440F1303013440
F1303013440
┬а
F0523740
F0523740F0523740
F0523740
┬а

Similar to Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents

L1803058388
L1803058388L1803058388
L1803058388IOSR Journals
┬а
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...iosrjce
┬а
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationElena-Oana Tabaranu
┬а
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSkevig
┬а
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
┬а
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
┬а
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
┬а
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations onijistjournal
┬а
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
┬а
An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...IJECEIAES
┬а
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
┬а
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
┬а
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsLisa Graves
┬а
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisGina Rizzo
┬а
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsIJCSIS Research Publications
┬а
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
┬а
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
┬а
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Miningmlaij
┬а
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
┬а

Similar to Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents (20)

L1803058388
L1803058388L1803058388
L1803058388
┬а
F017243241
F017243241F017243241
F017243241
┬а
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...
┬а
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense Disambiguation
┬а
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHSPATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS
┬а
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
┬а
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE
┬а
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
┬а
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
┬а
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
┬а
An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...An Improved Similarity Matching based Clustering Framework for Short and Sent...
An Improved Similarity Matching based Clustering Framework for Short and Sent...
┬а
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
┬а
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
┬а
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
┬а
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
┬а
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
┬а
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
┬а
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
┬а
Analysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion MiningAnalysis of Opinionated Text for Opinion Mining
Analysis of Opinionated Text for Opinion Mining
┬а
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
┬а

More from IOSR Journals

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
┬а
M0111397100
M0111397100M0111397100
M0111397100
┬а
L011138596
L011138596L011138596
L011138596
┬а
K011138084
K011138084K011138084
K011138084
┬а
J011137479
J011137479J011137479
J011137479
┬а
I011136673
I011136673I011136673
I011136673
┬а
G011134454
G011134454G011134454
G011134454
┬а
H011135565
H011135565H011135565
H011135565
┬а
F011134043
F011134043F011134043
F011134043
┬а
E011133639
E011133639E011133639
E011133639
┬а
D011132635
D011132635D011132635
D011132635
┬а
C011131925
C011131925C011131925
C011131925
┬а
B011130918
B011130918B011130918
B011130918
┬а
A011130108
A011130108A011130108
A011130108
┬а
I011125160
I011125160I011125160
I011125160
┬а
H011124050
H011124050H011124050
H011124050
┬а
G011123539
G011123539G011123539
G011123539
┬а
F011123134
F011123134F011123134
F011123134
┬а
E011122530
E011122530E011122530
E011122530
┬а
D011121524
D011121524D011121524
D011121524
┬а

Recently uploaded

Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭ
Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭModel Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭ
Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭsoniya singh
┬а
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
┬а
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
┬а
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJo├гo Esperancinha
┬а
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
┬а
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
┬а
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
┬а
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
┬а
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
┬а
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
┬а
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
┬а
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
┬а
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
┬а
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
┬а
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
┬а
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
┬а
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube ExchangerAnamika Sarkar
┬а
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
┬а

Recently uploaded (20)

Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭ
Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭModel Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭ
Model Call Girl in Narela Delhi reach out to us at ЁЯФЭ8264348440ЁЯФЭ
┬а
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
┬а
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
┬а
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
┬а
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
┬а
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
┬а
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
┬а
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
┬а
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
┬а
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
┬а
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
┬а
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
┬а
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
┬а
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
┬а
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
┬а
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
┬а
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
┬а
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
┬а
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned я╗┐Tube Exchanger
┬а
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
┬а

Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. V (Jan тАУ Feb. 2015), PP 83-89 www.iosrjournals.org DOI: 10.9790/0661-17158389 www.iosrjournals.org 83 | Page Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents Sunita Sarkar1 , Arindam Roy2 , Bipul Syam Purkayastha3 1,2,3 (Department of Computer Science , Assam University ,India) Abstract: This paper presents a method using hybridization of self organizing map (SOM ), particle swarm optimization(PSO) and k-means clustering algorithm for document clustering. Document representation is an important step for clustering purposes. The common way of represent a text is bag of words approach. This approach is simple but has two drawbacks viz. synonymy and polysemy which arise because of the ambiguity of the words and the lack of information about the relations between the words. To avoid the drawbacks of bag of words approach words are tagged with senses in WordNet in this paper. Sense tagging of words provide exact senses of words. Feature vectors are generated using sense tagged documents and clustering is carried out using proposed hybrid SOM+PSO+K-means algorithm. In the proposed algorithm initially SOM is applied to the feature vectors to produce the prototypes and then K-means clustering algorithm is applied to cluster the prototypes. Particle Swarm Optimization algorithm is used to find the initial centroid for K-means algorithm. Text documents in Nepali language are used to test the hybrid SOM+PSO+K-means clustering algorithm. Keywords: Computational Intelligence, Sense tagging, Self organization map, Particle swarm optimization. I. Introduction Computational intelligence (CI) is a set of nature-inspired computational methodologies and approaches to address complex real-world problems. Paradigms that comprise CI techniques are Neural networks, evolutionary computing, swarm intelligence and fuzzy systems. Computational Intelligence methods have been successfully applied to many fields such as diagnosis of diseases, speech recognition, data mining, composing music, image processing, forecasting, robot control, credit approval, classification, pattern recognition, planning game strategies, compression, combinatorial optimization, fault diagnosis, clustering, scheduling, and time series approximation. control systems, gear transmission and braking systems in vehicles, controlling lifts, home appliances, controlling traffic signals, and many others[1]. This paper concerns with the application of CI methods in document clustering. Document clustering is the process of grouping/dividing a set of documents into subsets (called clusters) so that the documents are similar to one another within the cluster and are dissimilar to documents in other clusters. Vector space model with bag of words is the common approach for representing a text. This approach suffers from two drawbacks viz. several words can have same meanings (synonymy) and same words can have multiple meanings (polysemy). In this paper an attempt has been made to handle these issues by tagging the words with senses. Given a word and its possible senses, as defined in a knowledge base, sense tagging is the process of assigning the most appropriate senses to the words in the corpus within a given context where sense can be defined as semantic value (content) of a word when compared to other words; i.e. when it is part of a group or set of related words[ 2]. When words are sense tagged, the most appropriate senses are attached to the words . In this work feature vectors are generated using sense tagged document corpus and clustering is done by a hybrid SOM+PSO+K-means clustering algorithm. Self organizing map[3] is an artificial neural network and have been successfully applied to document clustering. The SOM is an algorithm used to visualize and interpret large high-dimensional data sets. It is an unsupervised learning algorithm. It produces a set of prototype vectors representing the data set and carries out a topology preserving projection of the prototypes from the n-dimensional input space onto a low-dimensional grid. PSO is a computational intelligence technique first introduced by Kennedy and Eberhart in 1995[4] . PSO is a population-based stochastic search algorithm which is modeled after the social behavior of a bird flock. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of the PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function[5]. In the context of clustering, a single particle represents the Nc cluster centroid vectors. That is, each particle xi is constructed as follows: xi=(oi1,...,oij,...,oiNc) (1)
  • 2. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 84 | Page Where oij refers to the jth cluster centroid vector of the ith particle in cluster Cij. Therefore, a swarm represents a number of candidate clusters for the current data vectors. The fitness of particles is measured using the equation given below. (2) where mij denotes the jth data vector, which belongs to cluster i; oi is the centroid vector of the ith cluster; d(oi, mij) is the distance between data vector mij and the cluster centroid oi; Pi stands for the number of dataset, which belongs to cluster Ci and Nc stands for the number of clusters. In this paper a hybrid SOM+PSO+K-means algorithms is presented for document clustering. Several experiments were performed to analyze the performance of clustering algorithm on sense-tagged Nepali document corpus and Nepali document corpus without sense-tagging. Nepali is an Indo-Aryan language spoken by approximately 45 million people in Nepal, where it is the language of government and the medium of much education, and also in neighboring countries (India, Bhutan and Myanmar). Nepali is written in the Devanagari alphabet. It is written phonetically, that is, the sounds correspond almost exactly to the written letters. Nepali has many loanwords from Arabic and Persian languages, as well as some Hindi and English borrowings[6]. The rest of this paper is organized as follows: Related work is discussed in section II. Section III provides a description of generation of document vectors. In section IV clustering algorithm is discussed. Section V discusses the experimental results and Section VI concludes the paper. II. Related Works Many clustering techniques have been developed and successfully applied for clustering of text documents. Cui and Potok [7] proposed a hybrid PSO + K -Means algorithm for clustering of text documents. Document vectors were generated using tf-idf method. They applied three different clustering algorithms namely K-Means, PSO and Hybrid PSO+ K-Means to the generated document vectors. The result showed that the proposed Hybrid PSO+ K-Means algorithm achieves higher clustering quality than other two clustering algorithms. Lo[8] applied Self-Organizing Map (SOM) algorithm to cluster Chinese botanical documents. Vector Space Model with bag of words approach was used to represent each botanical document. Xinwu[9] presented a text clustering algorithm combining K-means algorithm and SOM. In the proposed approach clustering is typically carried out in two stages: first, the data set is clustered using the SOMs to obtain a group of output weights which. the number of SOM networkтАЩs output nodes equals to the number of the textsтАЩ categories. The results obtained from SOM are used to initialize K-means algorithmтАЩs cluster centers and implement K-means algorithm to cluster the text sets. Sridevi and Nagaveni,[10] showed that combination of ontology and optimization improve the clustering performance. They proposed a ontology similarity measure to identify the importance of the concepts in the document. Ontology similarity measures is defined using wordnet synsets and the particle swarm optimization is used to cluster the document. In [11]authors proposed a semantic text document clustering approach based on the WordNet lexical categories and Self Organizing Map (SOM) neural network. The proposed approach generates documents vectors using the lexical category mapping of WordNet after preprocessing the input documents. Yang and Hodges [12] used sense disambiguation method to construct feature vector for document representation. In this system, words are first mapped to word senses using a semantic relatedness based word sense disambiguation algorithm. Then these senses are used to construct the feature vector to represent the documents. Two different sense representation methods, namely, senseno and offset, are used. For clustering four algorithms are applied namely K-Means, Buckshot, HAC and bisecting k-means. Meshrif et.al [13] have developed a system that utilizes KohonenтАЩs Self Organizing Map to cluster Arabic textual documents containing information about different types of crimes. The system used the correlation or dependency relationship between some intransitive verbs and some prepositions to recognize the type of crime. III. Generation of document vectors Vector space model with bag of words is the simple and common way of representing a text. In this method words are used as the component of the document vector. The limitations of this approach is that it cannot capture ambiguity, synonymy, semantic relations between words. For example suppose a sentence, "A bear can bear very cold temperatures", the word bear has frequency count of 2. In the above stated example
  • 3. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 85 | Page sentence the word "bear" has two different senses, viz., first bear means an animal and meaning of second bear is tolerate. Hence, the problem with VSM using bag of words approach is that it finds the frequency count of a word whereas for a proper generation of document vectors the frequency count of the senses of a word is required. In this work rather than bag of words senses has been used for generation of document vectors. For example consider the sentences, "the fly can fly" and "A bear can bear very cold temperatures" which appear in a document, the vector corresponding to these sentences considering words as component of the vector is shown in the Table 1 Table 1: Document Vector Fly Cold Bear Temperature D1 2 0 0 0 D2 0 1 2 1 If the above example sentences are sense tagged then they will look like "fly_02192818 can fly_01944262" and the A bear_02134305 can bear_00670017 very cold_14168983 temperatures_05018974. Vectors corresponding to these sense tagged sentences where we have considered synset id as the component of the vectors are shown in Table 2 Table 2: Document Vector with senses Fly_ 02192818 Fly_ 01944262 Cold_ 14168983 Bear_ 02134305 Bear_ 00670017 Temperature_ 05018974 D1 1 1 0 0 0 0 D2 0 0 1 1 1 1 In the example sentences stated above the word fly and bear have two senses. In the first sentence first 'fly' is an two- winged insect and second 'fly' is travel through the air. In the second sentence first bear is an animal and second bear is tolerate. The term based method ignores the fact that words may be ambiguous and generate vectors by considering only the frequency count of the words. We can see from the example that vector generated by the term based method both the words fly has been considered as a single term and has frequency count of 2.Similarly the word bear has frequency count of 2. But in the vector generated from sense tagged sentences they have find different places in the vector because they have different synset ids. The synset id of first fly is 02192818 and the synset id of second fly is 01944262 and the frequency count of synset id_02192818 and synset id_01944262 is 1. Similarly both bears have find different places in the vector because they have different synset ids. Once the feature vectors are completed in this way, weights are assigned to each word/ synset id across the corpus using TF*IDF method [14], which is the combination of the term/sense frequency (TF), and the inverse document frequency (IDF). Terms which are not present in the WordNet are not considered as element of the feature vector. Table 3, 4,5 and 6 presents some sample Napali text documents and their corresponding sense tagged Nepali text documents. Table 3 Sample Napali text document1 рд╡рд░реНрддрдорд╛рди рдпреБрдЧрдорд╛ рдЯреЗрдХреНрдиреЛрдХрд▓реНрдЪрд░рдХреЛ рд╡реНрдпрд╛рдйрдХрд░реНрд╛ рдмрдврдврд░рд╣реЗрдЫред рдпрд╕ рд╡реНрдпрд╛рдйрдХ рд╡рд╡рд╢реНрд╡ рдмрдЬрд╛рд░рдорд╛ рдкреНрд░рддрд░реНрдврд┐рди рдмрдврдврд░рд╣реЗрдХреЛ рд░реНрдХреНрддрдирдХреА рд░ рднреМрддрд░реНрдХ рд╕реКрд╕рд╛рдзрдирд▒реЗ рд╕рдмреИрд▒рд╛рдИ рдЕрдзреАрдирд╕реНрде рдЧрд░реЗрдХреЛ рдЫред Table 4. Sample Napali text document2 рд╣реЗрд▓рд▒рдХрдкреНрдЯрд░рдорд╛ рд░реН рдмрд╕рдорд╛ рдЬрд╕реНрд░реНреЛ рд▓рднрдб рднрдЗрд╣рд╛рд▓реНрд┐реЛ рд░реИрдирдЫ рддрди рд░реН рдйрд░реИрдмрдб (рдмрд╛рдЯ) рд▒рдЗрди (рд▒рд╛рдЗрди) рд▒рдЧрд╛рдПрд░ рд▒рд╛реЙрд┐реЛ рд░реИрдЫтАЩредрдЙрдиреАрд╣рд░реВрд▒реЗ рдЬреАрд╡рдирдХрд╛рд▒рдорд╛ рд╕рдзреИрдБ рдмрд╕ рд▓рднрдб рднрдПрдХреЛ рд┐реЗрдЦреЗрдХрд╛ рдЫрди реНред рдмрд╕рдорд╛ рдЪрдвреНрдиреБрднрдиреНрд┐рд╛ рдЕрдЧрд╛рдбрдб рдйреИрд╕рд╛ рдЙрдард╛рдЙрдиреЗ рдПрдХрд┐реЗрдЦрдЦ рд┐реБрдИ рдШрдгреНрдЯрд╛ рдйрдЦрд╛рддрдЙрдиреЗред Table 5. Sample sense tagged Napali text document1 рд╡рд░реНрддрдорд╛рди_12784 рдпреБрдЧ_128346рдорд╛ рдЯреЗрдХреНрдиреЛрдХрд▓реНрдЪрд░рдХреЛ рд╡реНрдпрд╛рдйрдХрд░реНрд╛_13365 рдмрдврдврд░рд╣реЗрдЫ_210794ред рдпрд╕ рд╡реНрдпрд╛рдйрдХ_4895 рд╡рд╡рд╢реНрд╡ рдмрдЬрд╛рд░_15303рдорд╛ рдкреНрд░рддрд░реНрдврд┐рди_36459 рдмрдврдврд░рд╣реЗрдХреЛ рд░реНрдХреНрддрдирдХреА рд░ рднреМрддрд░реНрдХ_42443 рд╕реКрд╕рд╛рдзрди_110146рд▒реЗ рд╕рдмреИ_46962рд▒рд╛рдИ рдЕрдзреАрдирд╕реНрде_415356 рдЧрд░реЗрдХреЛ рдЫ_210794ред Table 6. Sample sense tagged Napali text document2 рд╣реЗрд▓рд▒рдХрдкреНрдЯрд░_117822рдорд╛ рд░реН рдмрд╕_19610рдорд╛ рдЬрд╕реНрд░реНреЛ_45655 рд▓рднрдб_15499 рднрдЗрд╣рд╛рд▓реНрд┐реЛ рд░реИрдирдЫ_210794 рддрди рд░реН рдйрд░реИрдмрдб (рдмрд╛рдЯ) рд▒рдЗрди (рд▒рд╛рдЗрди_11878) рд▒рдЧрд╛рдПрд░ рд▒рд╛реЙрд┐реЛ рд░реИрдЫтАЩред рдЙрдиреАрд╣рд░реВрд▒реЗ рдЬреАрд╡рдирдХрд╛рд▒_13665рдорд╛ рд╕рдзреИрдБ_37161 рдмрд╕_19610 рд▓рднрдб_15499 рднрдПрдХреЛ_41469 рд┐реЗрдЦреЗрдХрд╛ рдЫрди реНред рдмрд╕_19610рдорд╛ рдЪрдвреНрдиреБрднрдиреНрд┐рд╛ рдЕрдЧрд╛рдбрдб_38219 рдйреИрд╕рд╛_1859 рдЙрдард╛рдЙрдиреЗ рдПрдХ_12929рд┐реЗрдЦрдЦ рд┐реБрдИ_49167 рдШрдгреНрдЯрд╛_15014 рдйрдЦрд╛рддрдЙрдиреЗред
  • 4. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 86 | Page IV. Hybrid SOM+PSO+K-MEANS Clustering Algorithm The proposed hybrid SOM+PSO+k-Means clustering algorithm consists of three phases as described below: Phase I: In this phase Self Organizing Map Algorithm is applied on the input vectors to produce the prototype vectors which can be interpreted as тАЬprotoclusters. The number of prototype vectors are much larger than the expected number of clusters. The prototypes are grouped in the next step to form the actual clusters. Phase II: In this phase PSO algorithm is executed on the prototypes vectors to find clusters' centroid for K- menas algorithm. Phase III: The cluster centres obtained in Phase II are used in phase III. K-Means algorithm is applied in this phase to generate the actual clusters . The result from PSO which is found in phase II is used as the initial seed of the K-means algorithm. Phase III converges quickly when the centroids from Phase II are used. The overall process of clustering is described in figure 1. Fig.1 Overall Process of Clustering The complete algorithm is now presented next. Phase I Self Organizing Map 1). Each node's weights are initialized. 2). A vector is chosen at random from the set of training data and presented to the network. 3). Every node in the network is examined to calculate which ones' weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU). n is the total number of grid neurons. 4). The radius of the neighborhood of the BMU is calculated. This value starts large. Typically it is set to be the radius of the network, diminishing each time-step. 5). Any nodes found within the radius of the BMU, are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights are altered. where t time; adaptation coefficient;
  • 5. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 87 | Page neighborhood kernel centered on the winner unit: where rc and ri are positions of neurons c and i on the SOM grid. Both and decrease monotonically with time. 6). Repeat step 2 for N iterations. Phase II Particle Swarm Optimization (1) Initialize each particle with K random cluster centroid vectors. (2) For t= 1 to tmax do a)For each particle i do: b)For each data vector mp do (i) Calculate the distance d (mp,oij), to all cluster centroids Cij (ii) Assign each data vector to the closest centroid vector. (iii) Calculate the fitness value based on equation (2). c) Update the global best and local best positions d) Update the cluster centroids using equations (3) and (4) vid (t+1)=╧Й.vid(t)+c1.rand().(pid-xid)+c2.rand().(pgd-xgd) (3) xid(t+1)=vid(t+1)+xid(t) (4) Where tmax is the maximum number of iterations. Phase III K-means Algorithm 15. Inheriting cluster centroid vectors from Phase II. 16. Assigning each prototype vector to the closest cluster centroids. 17. Update the cluster centers in each cluster using equation (5) (5) where dj denotes the document vectors that belong to cluster Sj; cj stands for the centroid vector; nj is the number of document vectors belong to cluster Sj. 18. Repeat the steps 16 and 17 until there are no more changes in the values of the centroids. V. Experimental Results To demonstrate the performance of the proposed hybrid SOM+PSO + K-means clustering algorithm an evaluation study was carried out. Comparison with other clustering algorithms viz. K-means, SOM + K- means and PSO + K-means was conducted on sense tagged Nepali text corpus and Nepali text corpus without sense tagging . Nepali document dataset has been collected from Technology Development for Indian Language website [15].The corpus in Nepali language provides data from different domains such as literature, science, media, art etc. Sense tagging of the whole document corpus was done using sense marking tool. In this study the quality of the clustering is measured using Intra-cluster similarities and Inter-cluster similarity. A large average intracluster value indicates that the vectors within a category are grouped tightly together. A small average intercluster value indicates that the individual clusters are far apart. The results obtained are shown in table 7. Table 7 Performance Comparison Of K-Means, Hybrid Pso+K-Means, Hybrid Som+K-Means And Hybrid Som+Pso+K-Means Algorithms No. of documents Dimension Method Clustering Algorithm Intracluster Intercluster 100 3174 Word based K-Means 3.6403 181.8389 PSO+Kmeans 3.6657 77.7284 SOM+K-means 1.6121 .6507 SOM+PSO+K-Means 1.6033 .6507 2435 Sense Based K-Means 4.1226 112.5602 PSO-Kmeans 4.0470 77.4100 SOM-K-means 2.3611 1.0084 SOM+PSO+K-Means 2.3682 1.0293 The bar graph of the comparison of the parameters is plotted for the different algorithm (k-means, hybrid PSO+K-means, SOM+K-Means and hybrid SOM+PSO+K-meansтАЭ).
  • 6. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 88 | Page Fig2: Intra cluster similarity using various algorithms Fig3: Inter cluster similarity using various algorithms Fig 4: Number of iteration From the experimental results it is observed that hybrid SOM+PSO+K-means algorithm generates better result compared to K-means, hybrid PSO+K-means, SOM+K-means in terms of both intercluster
  • 7. Computational Intelligence Methods for Clustering of Sense Tagged Nepali Documents DOI: 10.9790/0661-17158389 www.iosrjournals.org 89 | Page similarity and number of iterations. Table 1 shows that hybrid SOM+PSO+K-means algorithm with sense based performs better than the hybrid PSO clustering algorithm with bag of words representation. VI. Conclusion In this paper a hybrid SOM+PSO+K-means algorithm is presented for clustering of document dataset. The algorithm is executed on vectors generated from sense tagged Nepali document dataset where synset ids are considered as component of the vectors. By sense tagging exact senses of words in the context are tagged with words as synset id is unique for each word for each sense. In the experiments K-means and PSO+K-means were executed both on direct document vector and on prototype vectors generated by SOM. Experimental results indicate that using hybrid algorithm that is execution of PSO+K-means on prototype vectors can produce compact clustering result compared to K-means, PSO+K-means and SOM+K-means. References [1]. A.P. Engebrecht, Computational Intelligence An Introduction (John Wiley & Sons, Ltd, 2007) [2]. S. Urooj, S. Shams, S. Hussain, F. Adeeba, Sense Tagged CLE Urdu Digest Corpus, Proc. Conf. on Language and Technology, Karachi, 2014 [3]. T. Kohonen, : Self-organized formation of topologically correct feature maps. Biol.Cybern. 43 59тАУ69 (1982) [4]. J. Kennedy and R.C. Eberhart, тАЬParticle Swarm Optimization,тАЭ Proc. IEEE, International Conference on Neural Networks. [5]. Piscataway. Vol. 4, pp 1942-1948,1995 [6]. S.C. Satapathy, N. VSSV P B. Rao, JVR. Murthy, R. P.V.G.D. Prasad, тАЬA Comparative Analysis of Unsupervised K-means, PSO and Self- Organizing PSO for Image Clustering,тАЭProc. International Conference on Computational Intelligence and Multimedia Applications,2007. [7]. A. Roy, S. Sarkar, B. S. Purkayastha, тАЬA Proposed Nepali Synset Entry and Extraction Tool,тАЭ Proc. 6th Global wordnet conference, Matsue,Japan,2012. [8]. X. Cui, T.E. Potok, Document Clustering Analysis Based on Hybrid PSO+ K-Means Algorithm, Journal of Computer Sciences (Special Issue): 27-33, 2005 ISSN 1549-3636 [9]. H. Lo, C.C. Lin, R-J. Fang, C. Lee, and Y-C. Weng, Chinese Document Clustering Using Self Organizing Map Based on Botanical Document Warehouse, proceedings European Computing Conference, Lecture Notes in Electrical Engineering 28. [10]. U.K. Sridevi and N. Nagaveni, Semantically Enhanced Document Clustering Based on PSO Algorithm, European Journal of Scientific Research, ISSN 1450-216X Vol.57 No.3, pp.485-493,2011 [11]. T.F Gharib, M.M Fouad, A. Mashat, I. Bidawi1, Self Organizing Map based Document Clustering Using WordNet Ontologie. IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 1, No. 2, 2012 [12]. L. Xinwu, Research on Text Clustering Algorithm Based on K_means and SOM International Symposium on Intelligent Information Technology Application Workshops, 2008.13 [13]. Y. Wang, J. Hodges, Document clustering with semantic analysis, Proc. 39th Annual hawaii International Conference on System Sciences, HICSS, Vol. 03, pp.54.3, 2006 [14]. M. Alruily, A. Ayesh, A. Al-Marghilani, Using Self Organizing Map to Cluster Arabic Crime Documents, Proc. International Multiconference on Computer Science and Information Technology, pp. 357тАУ363, ISBN 978-83-60810-27-9 ISSN 1896-7094 [15]. G. Salton, A. Wong, and C.S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18:613тАУ620. [16]. http://tdil-dc.in.