A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Editor IJMTER
Databases are build with the fixed number of fields and records. Uncertain database contains a
different number of fields and records. Clustering techniques are used to group up the relevant records
based on the similarity values. The similarity measures are designed to estimate the relationship between
the transactions with fixed attributes. The uncertain data similarity is estimated using similarity
measures with some modifications.
Clustering on uncertain data is one of the essential tasks in mining uncertain data. The existing
methods extend traditional partitioning clustering methods like k-means and density-based clustering
methods like DBSCAN to uncertain data. Such methods cannot handle uncertain objects. Probability
distributions are essential characteristics of uncertain objects have not been considered in measuring
similarity between uncertain objects.
The customer purchase transaction data is analyzed using uncertain data clustering scheme. The
density based clustering mechanism is used for the uncertain data clustering process. This model
produces results with minimum accuracy levels. The clustering technique is improved with distribution
based similarity model for uncertain data. The nearest neighbor search technique is applied on the
distribution based data environment. The system is designed using java as a front end and oracle as a
back end.
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Editor IJMTER
Databases are build with the fixed number of fields and records. Uncertain database contains a
different number of fields and records. Clustering techniques are used to group up the relevant records
based on the similarity values. The similarity measures are designed to estimate the relationship between
the transactions with fixed attributes. The uncertain data similarity is estimated using similarity
measures with some modifications.
Clustering on uncertain data is one of the essential tasks in mining uncertain data. The existing
methods extend traditional partitioning clustering methods like k-means and density-based clustering
methods like DBSCAN to uncertain data. Such methods cannot handle uncertain objects. Probability
distributions are essential characteristics of uncertain objects have not been considered in measuring
similarity between uncertain objects.
The customer purchase transaction data is analyzed using uncertain data clustering scheme. The
density based clustering mechanism is used for the uncertain data clustering process. This model
produces results with minimum accuracy levels. The clustering technique is improved with distribution
based similarity model for uncertain data. The nearest neighbor search technique is applied on the
distribution based data environment. The system is designed using java as a front end and oracle as a
back end.
An Improved Similarity Matching based Clustering Framework for Short and Sent...IJECEIAES
Text clustering plays a key role in navigation and browsing process. For an efficient text clustering, the large amount of information is grouped into meaningful clusters. Multiple text clustering techniques do not address the issues such as, high time and space complexity, inability to understand the relational and contextual attributes of the word, less robustness, risks related to privacy exposure, etc. To address these issues, an efficient text based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between the words are computed using the cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance the proposed text based clustering framework is analyzed using the metrics such as Mean Square Error (MSE), Peak Signal Noise Ratio (PSNR) and Processing time. From the experimental results, it is found that, the proposed text based clustering framework produced optimal MSE, PSNR and processing time when compared to the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
automatic classification in information retrievalBasma Gamal
automatic classification in information retrieval-automatic classification of documents
Chapter 3 from IR_VAN_Book
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGijnlc
In this paper, we propose a novel algorithm that rearrange the topic assignment results obtained from topic
modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how much
the results conform to expert opinion, which is a data structure called TDAG that we defined to represent the
probability that a pair of highly correlated words appear together. In order to make sure that the internal
structure does not get changed too much from the rearrangement, coherence, which is a well known metric
for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure.
We developed two ways to systematically obtain the expert opinion from data, depending on whether the
data has relevant expert writing or not. The final algorithm which takes into account both coherence and
expert opinion is presented. Finally we compare amount of adjustments needed to be done for each topic
modeling method, NMF and LDA.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search in effectively navigating, summarizing, and organizing information. A global optimal solution can be obtained by applying high-speed and high-quality optimization algorithms. The optimization technique performs a globalized search in the entire solution space. In this paper, a brief survey on optimization approaches to text document clustering is turned out.
Data mining , knowledge discovery is the process
of analyzing data from different perspectives and summarizing it
into useful information - information that can be used to increase
revenue, cuts costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize
it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among
dozens of fields in large relational databases. The goal of
clustering is to determine the intrinsic grouping in a set of
unlabeled data. But how to decide what constitutes a good
clustering? It can be shown that there is no absolute “best”
criterion which would be independent of the final aim of the
clustering. Consequently, it is the user which must supply this
criterion, in such a way that the result of the clustering will suit
their needs.
For instance, we could be interested in finding
representatives for homogeneous groups (data reduction), in
finding “natural clusters” and describe their unknown properties
(“natural” data types), in finding useful and suitable groupings
(“useful” data classes) or in finding unusual data objects (outlier
detection).Of late, clustering techniques have been applied in the
areas which involve browsing the gathered data or in categorizing
the outcome provided by the search engines for the reply to the
query raised by the users. In this paper, we are providing a
comprehensive survey over the document clustering.
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
Nowadays, document clustering is considered as a da
ta intensive task due to the dramatic, fast increas
e in
the number of available documents. Nevertheless, th
e features that represent those documents are also
too
large. The most common method for representing docu
ments is the vector space model, which represents
document features as a bag of words and does not re
present semantic relations between words. In this
paper we introduce a distributed implementation for
the bisecting k-means using MapReduce programming
model. The aim behind our proposed implementation i
s to solve the problem of clustering intensive data
documents. In addition, we propose integrating the
WordNet ontology with bisecting k-means in order to
utilize the semantic relations between words to enh
ance document clustering results. Our presented
experimental results show that using lexical catego
ries for nouns only enhances internal evaluation
measures of document clustering; and decreases the
documents features from thousands to tens features.
Our experiments were conducted using Amazon ElasticMapReduce to deploy the Bisecting k-means
algorithm
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
/produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that employs to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
we perform the experiments using the publicly available Datasets Ambient and ODP-239
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...IJORCS
The increasing nature of World Wide Web has imposed great challenges for researchers in improving the search efficiency over the internet. Now days web document clustering has become an important research topic to provide most relevant documents in huge volumes of results returned in response to a simple query. In this paper, first we proposed a novel approach, to precisely define clusters based on maximal frequent item set (MFI) by Apriori algorithm. Afterwards utilizing the same maximal frequent item set (MFI) based similarity measure for Hierarchical document clustering. By considering maximal frequent item sets, the dimensionality of document set is decreased. Secondly, providing privacy preserving of open web documents is to avoiding duplicate documents. There by we can protect the privacy of individual copy rights of documents. This can be achieved using equivalence relation.
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Novel text categorization by amalgamation of augmented k nearest neighbourhoo...ijcsity
Machine learning for text classification is the
underpinning
of document
cataloging
, news filtering,
document
steering
and
exemplif
ication
. In text mining realm, effective feature selection is significant to
make the learning task more accurate and competent. One of the
traditional
lazy
text classifier
k
-
Nearest
Neighborhood (
k
NN) has
a
major pitfall in calculating the similarity between
all
the
objects in training and
testing se
t
s,
there by leads to exaggeration of
both
computational complexity
of the algorithm
and
massive
consumption
of
main memory
. To diminish these shortcomings
in
viewpoint
of a
data
-
mining
practitioner
a
n
amalgamati
ve technique is proposed in this paper using
a novel restructured version of
k
NN called
Augmented
k
NN
(AkNN)
and
k
-
Medoids
(kMdd)
clustering.
The proposed work
comprises
preprocesses
on
the
initial training
set
by
imposing
attribute feature selection
for reduc
tion of high dimensionality, also it
detects and excludes the high
-
fliers
samples
in t
he
initial
training set
and
re
structure
s
a
constricted
training
set
.
The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category
and
restructures
the constricted training set
with centroids
. This technique
is
amalgamated with
AkNN
classifier
that
was prearranged with
text mining similarity measure
s.
Eventually, s
ignifican
tweights
and ranks were
assigned to each object in the new
training set based upon the
ir
accessory towards the
object in testing set
.
Experiments
conducted
on Reuters
-
21578 a
UCI benchmark
text mining
data
set
, and
comparisons with
traditional
k
NN
classifier designates
the
referred
method
yield
spreeminentrecital
in b
oth clustering and
classification
An Improved Similarity Matching based Clustering Framework for Short and Sent...IJECEIAES
Text clustering plays a key role in navigation and browsing process. For an efficient text clustering, the large amount of information is grouped into meaningful clusters. Multiple text clustering techniques do not address the issues such as, high time and space complexity, inability to understand the relational and contextual attributes of the word, less robustness, risks related to privacy exposure, etc. To address these issues, an efficient text based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between the words are computed using the cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance the proposed text based clustering framework is analyzed using the metrics such as Mean Square Error (MSE), Peak Signal Noise Ratio (PSNR) and Processing time. From the experimental results, it is found that, the proposed text based clustering framework produced optimal MSE, PSNR and processing time when compared to the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
automatic classification in information retrievalBasma Gamal
automatic classification in information retrieval-automatic classification of documents
Chapter 3 from IR_VAN_Book
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGijnlc
In this paper, we propose a novel algorithm that rearrange the topic assignment results obtained from topic
modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how much
the results conform to expert opinion, which is a data structure called TDAG that we defined to represent the
probability that a pair of highly correlated words appear together. In order to make sure that the internal
structure does not get changed too much from the rearrangement, coherence, which is a well known metric
for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure.
We developed two ways to systematically obtain the expert opinion from data, depending on whether the
data has relevant expert writing or not. The final algorithm which takes into account both coherence and
expert opinion is presented. Finally we compare amount of adjustments needed to be done for each topic
modeling method, NMF and LDA.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search in effectively navigating, summarizing, and organizing information. A global optimal solution can be obtained by applying high-speed and high-quality optimization algorithms. The optimization technique performs a globalized search in the entire solution space. In this paper, a brief survey on optimization approaches to text document clustering is turned out.
Data mining , knowledge discovery is the process
of analyzing data from different perspectives and summarizing it
into useful information - information that can be used to increase
revenue, cuts costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize
it, and summarize the relationships identified. Technically, data
mining is the process of finding correlations or patterns among
dozens of fields in large relational databases. The goal of
clustering is to determine the intrinsic grouping in a set of
unlabeled data. But how to decide what constitutes a good
clustering? It can be shown that there is no absolute “best”
criterion which would be independent of the final aim of the
clustering. Consequently, it is the user which must supply this
criterion, in such a way that the result of the clustering will suit
their needs.
For instance, we could be interested in finding
representatives for homogeneous groups (data reduction), in
finding “natural clusters” and describe their unknown properties
(“natural” data types), in finding useful and suitable groupings
(“useful” data classes) or in finding unusual data objects (outlier
detection).Of late, clustering techniques have been applied in the
areas which involve browsing the gathered data or in categorizing
the outcome provided by the search engines for the reply to the
query raised by the users. In this paper, we are providing a
comprehensive survey over the document clustering.
O NTOLOGY B ASED D OCUMENT C LUSTERING U SING M AP R EDUCE ijdms
Nowadays, document clustering is considered as a da
ta intensive task due to the dramatic, fast increas
e in
the number of available documents. Nevertheless, th
e features that represent those documents are also
too
large. The most common method for representing docu
ments is the vector space model, which represents
document features as a bag of words and does not re
present semantic relations between words. In this
paper we introduce a distributed implementation for
the bisecting k-means using MapReduce programming
model. The aim behind our proposed implementation i
s to solve the problem of clustering intensive data
documents. In addition, we propose integrating the
WordNet ontology with bisecting k-means in order to
utilize the semantic relations between words to enh
ance document clustering results. Our presented
experimental results show that using lexical catego
ries for nouns only enhances internal evaluation
measures of document clustering; and decreases the
documents features from thousands to tens features.
Our experiments were conducted using Amazon ElasticMapReduce to deploy the Bisecting k-means
algorithm
Clustering the results of a search helps the user to overview the information returned. In this paper, we
look upon the clustering task as cataloguing the search results. By catalogue we mean a structured label
list that can help the user to realize the labels and search results. Labelling Cluster is crucial because
meaningless or confusing labels may mislead users to check wrong clusters for the query and lose extra
time. Additionally, labels should reflect the contents of documents within the cluster accurately. To be able
to label clusters effectively, a new cluster labelling method is introduced. More emphasis was given to
/produce comprehensible and accurate cluster labels in addition to the discovery of document clusters. We
also present a new metric that employs to assess the success of cluster labelling. We adopt a comparative
evaluation strategy to derive the relative performance of the proposed method with respect to the two
prominent search result clustering methods: Suffix Tree Clustering and Lingo.
we perform the experiments using the publicly available Datasets Ambient and ODP-239
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...IJORCS
The increasing nature of World Wide Web has imposed great challenges for researchers in improving the search efficiency over the internet. Now days web document clustering has become an important research topic to provide most relevant documents in huge volumes of results returned in response to a simple query. In this paper, first we proposed a novel approach, to precisely define clusters based on maximal frequent item set (MFI) by Apriori algorithm. Afterwards utilizing the same maximal frequent item set (MFI) based similarity measure for Hierarchical document clustering. By considering maximal frequent item sets, the dimensionality of document set is decreased. Secondly, providing privacy preserving of open web documents is to avoiding duplicate documents. There by we can protect the privacy of individual copy rights of documents. This can be achieved using equivalence relation.
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Novel text categorization by amalgamation of augmented k nearest neighbourhoo...ijcsity
Machine learning for text classification is the
underpinning
of document
cataloging
, news filtering,
document
steering
and
exemplif
ication
. In text mining realm, effective feature selection is significant to
make the learning task more accurate and competent. One of the
traditional
lazy
text classifier
k
-
Nearest
Neighborhood (
k
NN) has
a
major pitfall in calculating the similarity between
all
the
objects in training and
testing se
t
s,
there by leads to exaggeration of
both
computational complexity
of the algorithm
and
massive
consumption
of
main memory
. To diminish these shortcomings
in
viewpoint
of a
data
-
mining
practitioner
a
n
amalgamati
ve technique is proposed in this paper using
a novel restructured version of
k
NN called
Augmented
k
NN
(AkNN)
and
k
-
Medoids
(kMdd)
clustering.
The proposed work
comprises
preprocesses
on
the
initial training
set
by
imposing
attribute feature selection
for reduc
tion of high dimensionality, also it
detects and excludes the high
-
fliers
samples
in t
he
initial
training set
and
re
structure
s
a
constricted
training
set
.
The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category
and
restructures
the constricted training set
with centroids
. This technique
is
amalgamated with
AkNN
classifier
that
was prearranged with
text mining similarity measure
s.
Eventually, s
ignifican
tweights
and ranks were
assigned to each object in the new
training set based upon the
ir
accessory towards the
object in testing set
.
Experiments
conducted
on Reuters
-
21578 a
UCI benchmark
text mining
data
set
, and
comparisons with
traditional
k
NN
classifier designates
the
referred
method
yield
spreeminentrecital
in b
oth clustering and
classification
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Hierarchal clustering and similarity measures along with multi representationeSAT Journals
Abstract All clustering methods have to assume some cluster relationship on the list of data objects that they really are applied on. Graph-Based Document Clustering works with frequent senses rather than frequent keywords used in traditional text mining techniques.Similarity between a pair of objects can be defined either explicitly or implicitly. With this paper, we analyzed existing multi-viewpoint based similarity measure and two related clustering methods. The main difference between a traditional dissimilarity/similarity measure and ours could be that the former uses merely a single viewpoint, which is the origin, even though the latter utilizes many viewpoints, which you ll find are objects assumed to not have the very same cluster using the two objects being measured. Using multiple viewpoints, more informative assessment of similarity could well be achieved. Theoretical analysis and empirical study are conducted to back up this claim. Two criterion functions for document clustering are proposed dependent on this wonderful measure. We compare them several well-known clustering algorithms which use other popular similarity measures on various document collections confirming the good sides of our proposal. Keywords –Multiview Cluster, Document id, ClusterDistance
Data mining is utilized to manage huge measure of information which are put in the data ware houses and databases, to discover required information and data. Numerous data mining systems have been proposed, for example, association rules, decision trees, neural systems, clustering, and so on. It has turned into the purpose of consideration from numerous years. A re-known amongst the available data mining strategies is clustering of the dataset. It is the most effective data mining method. It groups the dataset in number of clusters based on certain guidelines that are predefined. It is dependable to discover the connection between the distinctive characteristics of data.
In k-mean clustering algorithm, the function is being selected on the basis of the relevancy of the function for predicting the data and also the Euclidian distance between the centroid of any cluster and the data objects outside the cluster is being computed for the clustering the data points. In this work, author enhanced the Euclidian distance formula to increase the cluster quality.
The problem of accuracy and redundancy of the dissimilar points in the clusters remains in the improved k-means for which new enhanced approach is been proposed which uses the similarity function for checking the similarity level of the point before including it to the cluster.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
-In this study a comprehensive evaluation of two
supervised feature selection methods for dimensionality
reduction is performed - Latent Semantic Indexing (LSI) and
Principal Component Analysis (PCA). This is gauged against
unsupervised techniques like fuzzy feature clustering using
hard fuzzy C-means (FCM) . The main objective of the study is
to estimate the relative efficiency of two supervised techniques
against unsupervised fuzzy techniques while reducing the
feature space. It is found that clustering using FCM leads to
better accuracy in classifying documents in the face of
evolutionary algorithms like LSI and PCA. Results show that
the clustering of features improves the accuracy of document
classification
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text
classification. In this paper, Fast Fuzzy Feature clustering for text classification is proposed. It
is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011.
The word in the feature vector of the document is grouped into the cluster in less iteration. The
numbers of iterations required to obtain cluster centers are reduced by transforming clusters
center dimension from n-dimension to 2-dimension. Principle Component Analysis with slit
change is used for dimension reduction. Experimental results show that, this method improve
the performance by significantly reducing the number of iterations required to obtain the cluster
center. The same is being verified with three benchmark datasets
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Recent Trends in Incremental Clustering: A ReviewIOSRjournaljce
This paper presents a review on recent trends in incremental clustering algorithms. It tries to focus on both clustering based on similarity measure and clustering not based on similarity measure. In this context, the paper is devoted to various typical incremental clustering algorithms. Mainly optimization, genetic and fuzzy approaches of these algorithms is covered in the paper. The paper is original with respect to one aspect that is, it provides a complete overview that is fully devoted to evolutionary algorithms for incremental clustering. A number of references are provided that describe applications of evolutionary algorithms for incremental clustering in different domains, such as human activity detection, online fault detection, information security, track an object consistently throughout the network solving boundary problem etc.
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...IJECEIAES
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process to hypothesize useful knowledge from the extensive data. Based upon the classical statistical prototypes the data can be exploited beyond the storage and management of the data. Cluster analysis a primary investigation with little or no prior knowledge, consists of research and development across a wide variety of communities. Cluster ensembles are melange of individual solutions obtained from different clusterings to produce final quality clustering which is required in wider applications. The method arises in the perspective of increasing robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensemble. The survey is to analyze the various techniques and cluster ensemble methods.
Electrically small antennas: The art of miniaturizationEditor IJARCET
We are living in the technological era, were we preferred to have the portable devices rather than unmovable devices. We are isolating our self rom the wires and we are becoming the habitual of wireless world what makes the device portable? I guess physical dimensions (mechanical) of that particular device, but along with this the electrical dimension is of the device is also of great importance. Reducing the physical dimension of the antenna would result in the small antenna but not electrically small antenna. We have different definition for the electrically small antenna but the one which is most appropriate is, where k is the wave number and is equal to and a is the radius of the imaginary sphere circumscribing the maximum dimension of the antenna. As the present day electronic devices progress to diminish in size, technocrats have become increasingly concentrated on electrically small antenna (ESA) designs to reduce the size of the antenna in the overall electronics system. Researchers in many fields, including RF and Microwave, biomedical technology and national intelligence, can benefit from electrically small antennas as long as the performance of the designed ESA meets the system requirement.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Search and Society: Reimagining Information Access for Radical Futures
Volume 2-issue-6-1969-1973
1. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, Issue 6, June 2013
1969
www.ijarcet.org
Abstract— Clustering is the division of data into groups called
as clusters. Document clustering is done to analyse the large
number of documents distributed over various sites. The similar
documents are grouped together to form a cluster. The success
or failure of a clustering method depends on the nature of
similarity measure used. The multiviewpoint-based similarity
measure or MVS uses different viewpoints unlike the
traditional similarity measures that use only a single viewpoint.
This increases the accuracy of clustering. A hierarchical
clustering algorithm creates a hierarchical tree of the given set
of data objects. Depending on the decomposition approach,
hierarchical algorithms are classified as agglomerative
(merging) or divisive (splitting). This paper focuses on applying
multiviewpoint-based similarity measure on hierarchical
clustering.
Index Terms—Document Clustering, Hierarchical
Clustering, Similarity Measure.
I. INTRODUCTION
Clustering is the process of organizing objects into groups
whose members are similar in some way. Thus a cluster is a
collection of objects which are “similar” to each other and are
“dissimilar” to the objects that are in other clusters.
Clustering [1] is ultimately a process of reducing a mountain
of data to manageable piles. For cognitive and computational
simplification, these piles may consist of "similar" items.
A. Document Clustering
Document clustering has become an increasingly
important task in analysing huge numbers of documents
distributed among various sites. The important feature is to
organize the documents in such a way that it results in better
search without having much cost and complexity. The
Cluster Hypothesis [2] is fundamental to the issue of
improved effectiveness which states that relevant documents
are more similar to each other than non-relevant documents
and thus tend to appear in the same clusters. In a clustered
collection, this relevant document may be clustered together
with other relevant items that have the required query terms
and could therefore be retrieved through a clustered search.
Document clustering offers an alternative file organization to
Manuscript received June, 2013.
Merin Paul, PG Scholar, Computer Science and Engineering,
Coimbatore Institute of Engineering and Technology. Narasipuram,
Coimbatore, Tamil Nadu,,India, 8129898069
P Thangam, Assistant Professor, Computer Science and Engineering,
Coimbatore Institute of Engineering and Technology, Narasipuram,
Coimbatore, Tamil Nadu,,India, 8098099829.,
that of best-match retrieval and it has the potential to address
this issue, thereby increase the effectiveness of an IR system.
There are two approaches to document clustering,
particularly in information retrieval, they are known as term
and item clustering. Term clustering is a method, which
groups redundant terms. The grouping reduces noise and
increase frequency of assignment. The dimension is also
reduced if there are fewer clusters the original terms. But the
semantic properties will be affected.
There are many different algorithms available for term
clustering. These are cliques, stars, single link and connected
components. Cliques need all items in a cluster to be within
the threshold of all other items. In single link clustering the
strong constraint that every term in a class is similar to every
other term is relaxed. The star technique selects a term and
then places in the class all terms that is related to that term.
Terms not yet in classes are selected as new seeds until all
terms are assigned to a class. There are many different classes
that can be created using the star technique. Item clustering
helps the user in identifying relevant items.
When items in the database have been clustered, it is
possible to retrieve all of the items in a cluster, even though
the search statement does not identify them. When the user
retrieves a strongly significant item, the user can look at other
items like it without issuing another search. When significant
items are used to create a new query, the retrieved hits are
similar to what might be produced by a clustering algorithm.
However, term clustering and item clustering in a sense
achieve the same objective even though they are the inverse
of each other. For all of the terms within the same cluster,
there will be significant overlap of the set of items they are
found in. Item clustering is based upon the same terms being
found in the other items in the cluster. Thus the set of items
that caused a term clustering has a strong possibility of being
in the same item cluster based upon the terms.
B. Similarity Measures
The set of terms shared between a pair of documents is
typically used as an indication of the similarity of the pair.
The nature of similarity measure plays a very important role
in the success or failure of a clustering method.
Text document clustering groups similar documents to
form a cluster, while documents that are different have
separated apart into different clusters. Accurate clustering
requires a precise definition of the closeness between a pair
A Modified Hierarchical Clustering Algorithm
for Document Clustering
Merin Paul, P Thangam
3. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, Issue 6, June 2013
1971
www.ijarcet.org
where 0 is vector 0 that represents the origin point. According
to this formula, the measure takes 0 as one and only reference
point. The similarity between two documents di and dj is
determined w.r.t. the angle between the two points when
looking from the origin [8].
C. Multiviewpoint-based Similarity Identification
Similarity of two documents di and dj -given that they are
in the same cluster-is equal to the product of the cosine of the
angle between di and dj looking from dh and the euclidean
distances from dh to di and dj.
(3)
The two objects to be measured, di and dj , must be in the
same cluster, while the points from where to establish this
measurement, dh must be outside of the cluster.
D. Constrained Kmeans Clustering
The traditional kmeans algorithm is used for clustering. IR
and IV are used to determine the quality of the clusters formed
[9].
(4)
(5)
D denotes the composite vector of all the documents and Dr
denotes composite vector of cluster r. The value of ranges
from 0 to 1.
E. Constrained Hierarchical Clustering
Hierarchical agglomerative clustering is used to form
clusters. In an agglomerative method, each object forms a
cluster. Then the two most similar clusters are merged
iteratively until some termination criterion is satisfied. The
cluster merging process is repeated until all the objects are
merged to form one cluster.
After the clusters are formed, the quality of clusters thus
generated is determined using IR and IV, which are given in (4)
and (5).
Fig 1 shows the system flow diagram of the proposed
work.
Fig. 1 System Flow Diagram
IV. ANALYSIS OF DOCUMENT CLUSTERING
METHODS
To verify the advantages of the proposed work, their
performance have to be evaluated. The objective of this
section is to compare hierarchical clustering with kmeans
clustering after applying MVS on both of the methods.
The analysis of the two document clustering methods are
done using two data sets each containing twenty documents.
The data sets ‘autos’ and ‘motorcycles’ are used to analyse
the document clustering methods.
‘Autos’ and ‘motorcycle’ are available with the 20
newsgroup. ‘Autos’ consist of 2344 newspaper articles,
among which 20 are taken for this experiment. ‘Motorcycle’
consist of 2344 newspaper articles, among which 20 are
taken for this experiment.
Fig. 2 shows the sample screen shot of the menu page for
constrained kmeans clustering.
Input dataset
Cosine similarity calculation
MVS matrix generation
Hierarchical clustering Clustering based on K means
Calculate IR & IV
Evaluate cluster
if (IR &
IV=1)
Performance evaluation
No
Yes
5. ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, Issue 6, June 2013
1973
www.ijarcet.org
1 2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of Classes
Recall
K Means Clustering
Hierarchical Clustering
Fig. 7 Clustering Results on Recall
VI. CONCLUSION
In this paper a new hierarchical clustering method using
MVS is proposed for document clustering. MVS uses
multiple points of reference. The similarity between two
documents in a cluster is calculated with respect to another
document that is outside the cluster. This increases the
accuracy of clustering and thus the quality of the clusters
formed increases. Finally the proposed work is compared
with kmeans clustering using MVS and is found that
performance of hierarchical clustering is higher.
REFERENCES
[1] M. Pelillo, “What Is a Cluster? Perspectives from Game
Theory,”Proc. NIPS Workshop Clustering Theory, 2009.
[2] Guyon I, R.C. Williamson and U.V. Luxburg , “Clustering:
Science or Art?,” Proc. NIPS Workshop Clustering Theory,2009.
[3] Anna Huang, Department of Computer Science The University of
Waikato, Hamilton, New Zealand “Similarity Measures for Text
Document Clustering”,2005.
[4] Ng A, B. Liu, D.J. Hand, D. Steinberg, G.J. McLachlan, J. Ghosh,
J.R. Quinlan, M. Steinbach, P.S. Yu, V. Kumar, Q. Yang, X. Wu,
and Z.-H. Zhou (2007), “Top 10 Algorithms in Data Mining,”
Knowledge Information Systems, vol. 14, no. 1, pp. 1-37.
[5] S. Zhong, “Efficient Online Spherical K-means Clustering,” Proc.
IEEE Int’l Joint Conf. Neural Networks (IJCNN), pp.
3180-3185,2005.
[6] G. Karypis, “CLUTO a Clustering Toolkit,” technical report,
Department of Computer Science,
Univ.ofMinnesota,http://glaros.dtc.umn.edu/~gkhome/views/clu
to, 2003.
[7] Chim H and X. Deng, “Efficient Phrase-Based Document
Similarity for Clustering,” IEEE Trans. Knowledge and Data
Eng., vol. 20, no. 9, pp. 1217-1229 Sept. 2008.
[8] Zha H, C. Ding, X. He, M. Gu, and H. Simon, “A Min-Max Cut
Algorithm for Graph Partitioning and Data Clustering,”
Proc.IEEE Int’l Conf. Data Mining (ICDM), pp. 107-114, 2001.
[9] Chee Keong Chan, Duc Thang Nguyen, Lihui Chen and Senior
Member, IEEE, “Clustering with Multiviewpoint-Based
Similarity Measure” IEEE transactions on knowledge and data
engineering, Vol. 24, No. 6, 2012.
[10] Modha D and I. Dhillon, “Concept Decompositions for Large
Sparse Text Data Using Clustering,” Machine Learning, Vol. 42,
nos. 1/2, pp. 143-175, Jan. 2001.
Merin Paul is currently pursuing M.E
Computer Science and Engineering at
Coimbatore Institute of Engineering and
Technology, Coimbatore, Tamil Nadu, (Anna
University, Chennai). She completed her
B.Tech in Information Technology from MES
College of Engineering, Kuttipuram, Kerala,
(University of Calicut) in 2010. Her research
interests include Data Mining and Testing.
Ms.P.Thangam received her B.E Degree in
Computer Hardware and software Engineering
from Avinashilingam University, Coimbatore in
2001. She has received her M.E degree in
Computer Science and Engineering from
Government College of Technology,
Coimbatore in 2007. She is currently doing her
PhD in the area of Medical Image Processing
under Anna University, Chennai. Presently she
is working as an Assistant Professor in the
Department of Computer Science and
Engineering at Coimbatore Institute of
Engineering and Technology, Coimbatore. Her
research interests are in Image Processing,
Medical Image Analysis, Data Mining,
Classification and Pattern Recognition.