Machine learning for text classification underpins document cataloging, news filtering, document routing, and exemplification. In the text mining realm, effective feature selection is essential to make the learning task accurate and efficient. The traditional lazy text classifier k-Nearest Neighbor (kNN) has a major pitfall: it computes the similarity between all objects in the training and testing sets, which inflates both the computational complexity of the algorithm and its main-memory consumption. To diminish these shortcomings from a data-mining practitioner's viewpoint, this paper proposes an amalgamative technique combining a novel restructured version of kNN, called Augmented kNN (AkNN), with k-Medoids (kMdd) clustering. The proposed work preprocesses the initial training set by applying attribute feature selection to reduce high dimensionality, detects and excludes outlier (high-flier) samples from the initial training set, and restructures it into a constricted training set. The kMdd clustering algorithm generates cluster centers (as interior objects) for each category and restructures the constricted training set around these medoids. This technique is amalgamated with the AkNN classifier, which is equipped with text-mining similarity measures. Finally, significance weights and ranks are assigned to each object in the new training set based on its affinity towards the objects in the testing set. Experiments conducted on Reuters-21578, a UCI benchmark text mining dataset, and comparisons with the traditional kNN classifier indicate that the proposed method yields preeminent performance in both clustering and classification.
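The two-stage pipeline the abstract describes, shrinking each class to medoids and then classifying against the reduced set, can be sketched as follows. All function names and the seeding strategy are illustrative assumptions, not the paper's actual AkNN/kMdd algorithm:

```python
import math
from collections import Counter

def k_medoids(points, k, iters=20):
    """Toy k-medoids on dense vectors; returns medoid points.
    Naive first-k seeding and Euclidean distance are assumptions for the sketch."""
    medoids = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, medoids[j]))
            clusters[i].append(p)
        new = []
        for cl, m in zip(clusters, medoids):
            if not cl:
                new.append(m)
                continue
            # medoid = the member minimizing total distance to its own cluster
            new.append(min(cl, key=lambda c: sum(math.dist(c, q) for q in cl)))
        if new == medoids:
            break
        medoids = new
    return medoids

def knn_predict(train, query, k=3):
    """Classify a query by majority vote among the k nearest labeled medoids."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Running k_medoids per class and feeding only the medoids to knn_predict is what cuts the all-pairs similarity cost the abstract complains about.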
Distribution Similarity based Data Partition and Nearest Neighbor Search on U... (Editor IJMTER)
Databases are built with a fixed number of fields and records, whereas an uncertain database contains a varying number of fields and records. Clustering techniques are used to group related records based on similarity values. Standard similarity measures are designed to estimate the relationship between transactions with fixed attributes; uncertain-data similarity is estimated using these measures with some modifications.
Clustering uncertain data is one of the essential tasks in mining uncertain data. Existing methods extend traditional partitioning clustering methods such as k-means and density-based clustering methods such as DBSCAN to uncertain data, but such methods cannot fully handle uncertain objects: probability distributions, which are essential characteristics of uncertain objects, have not been considered when measuring similarity between them.
Customer purchase transaction data is analyzed using an uncertain-data clustering scheme, with density-based clustering used for the uncertain-data clustering process. This model produces results with limited accuracy, so the clustering technique is improved with a distribution-based similarity model for uncertain data, and the nearest neighbor search technique is applied in the distribution-based data environment. The system is designed using Java as the front end and Oracle as the back end.
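A distribution-based similarity of the kind described can be sketched with the Jensen-Shannon divergence, a symmetric, bounded relative of KL divergence for comparing the discrete probability distributions that characterize uncertain objects. This is an illustrative choice of measure, not necessarily the one used in the system above:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.
    Returns 0 for identical distributions and 1 for fully disjoint ones."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # KL divergence; zero-probability terms contribute nothing
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Nearest-neighbor search over uncertain objects then ranks candidates by this divergence instead of a point-to-point distance.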
Textual Data Partitioning with Relationship and Discriminative Analysis (Editor IJMTER)
Data partitioning methods are used to partition data values by similarity, and similarity measures are used to estimate transaction relationships. Hierarchical clustering produces tree-structured results, while partitional clustering produces results in a grid format. Text documents are unstructured data with high-dimensional attributes. Document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) for the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is chosen.
Textual data elements are divided into two types: discriminative words and nondiscriminative words. Only discriminative words are useful for grouping documents; the involvement of nondiscriminative words confuses the clustering process and leads to poor clustering solutions. A variational inference algorithm is used to infer the document collection structure and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; DPM clustering uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and DPMFP clustering is performed without requiring the number of clusters as input.
Document labels are used to guide the discriminative word identification process. Concept relationships are analyzed with Ontology support, and a semantic weight model is used for document similarity analysis. The system improves scalability with the support of labels and concept relations for the dimensionality reduction process.
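A crude, label-based stand-in for the discriminative/nondiscriminative word split can be sketched by scoring each word by how concentrated its occurrences are in a single class (1 = fully concentrated, lower = spread evenly). This is an illustrative heuristic, not the DPMFP model itself:

```python
from collections import Counter

def discriminative_scores(docs, labels):
    """Score each word as (max per-class count) / (total count).
    Words near 1.0 discriminate between classes; words near 1/num_classes do not."""
    classes = set(labels)
    per_class = {c: Counter() for c in classes}
    total = Counter()
    for words, c in zip(docs, labels):
        per_class[c].update(words)
        total.update(words)
    return {w: max(per_class[c][w] for c in classes) / n
            for w, n in total.items()}
```

Thresholding these scores keeps the discriminative vocabulary and drops the stop-word-like remainder before clustering.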
Multilevel techniques for the clustering problem (csandit)
Data Mining is concerned with the discovery of interesting patterns and knowledge in data
repositories. Cluster Analysis which belongs to the core methods of data mining is the process
of discovering homogeneous groups called clusters. Given a data-set and some measure of
similarity between data objects, the goal in most clustering algorithms is maximizing both the
homogeneity within each cluster and the heterogeneity between different clusters. In this work,
two multilevel algorithms for the clustering problem are introduced. The multilevel
paradigm suggests looking at the clustering problem as a hierarchical optimization process
going through different levels evolving from a coarse grain to fine grain strategy. The clustering
problem is solved by first reducing the problem level by level to a coarser problem where an
initial clustering is computed. The clustering of the coarser problem is mapped back level-by-level
to obtain a better clustering of the original problem by refining the intermediate
clusterings obtained at various levels. A benchmark using a number of data sets collected from a
variety of domains is used to compare the effectiveness of the hierarchical approach against its
single-level counterpart.
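One coarsening level of the multilevel scheme can be sketched as greedy nearest-neighbor pairing: each point is matched with its closest unmatched neighbor and the pair is replaced by its midpoint, with a parent map recording how to project a coarse clustering back to the fine level. The pairing rule and midpoint merge are illustrative simplifications of the paradigm, not the paper's exact algorithms:

```python
import math

def coarsen(points):
    """One coarsening level: pair each point with its nearest unmatched
    neighbor and replace the pair by its midpoint. Returns the merged
    points and a parent map fine_index -> coarse_index."""
    unmatched = list(range(len(points)))
    merged, parent = [], {}
    while unmatched:
        i = unmatched.pop()
        if not unmatched:  # odd leftover point is carried up unchanged
            parent[i] = len(merged)
            merged.append(points[i])
            break
        j = min(unmatched, key=lambda u: math.dist(points[i], points[u]))
        unmatched.remove(j)
        mid = [(a + b) / 2 for a, b in zip(points[i], points[j])]
        parent[i] = parent[j] = len(merged)
        merged.append(mid)
    return merged, parent

def project_labels(parent, coarse_labels):
    """Map a clustering computed on the coarse problem back to the fine level."""
    return {i: coarse_labels[ci] for i, ci in parent.items()}
```

Applying coarsen repeatedly, clustering the coarsest level, then projecting back and refining at each level is the coarse-to-fine loop the abstract describes.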
A Novel Clustering Method for Similarity Measuring in Text Documents (IJMER)
Clustering of high-dimensional data, which arises in almost all fields these days, is becoming a very tedious process. The key disadvantage of high-dimensional data is the curse of dimensionality: as datasets grow, the data points become sparse and local density drops, making the data difficult to cluster and further degrading the performance of traditional clustering algorithms. Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors [2]. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering
approaches. We first show that a recently-proposed objective function for semi-supervised clustering
based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of
constraint penalty functions, can be expressed as a special case of the global kernel k-means objective
[3]. A recent theoretical connection between global kernel k-means and several graph clustering
objectives enables us to perform semi-supervised clustering of data. In particular, some methods have been proposed for semi-supervised clustering based on pairwise similarity or dissimilarity information. In this paper, we propose a kernel approach for semi-supervised clustering and present in detail two special cases of this kernel approach.
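The HMRF-style objective mentioned above combines a distortion term with penalties for violated pairwise constraints. A minimal sketch of such an objective (uniform penalty weight and squared Euclidean distance are assumptions for illustration):

```python
import math

def hmrf_objective(points, labels, centers, must, cannot, w=1.0):
    """HMRF-style constrained clustering objective: squared distance to the
    assigned center, plus a penalty w for every violated must-link pair
    (split across clusters) or cannot-link pair (placed in one cluster)."""
    j = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    j += w * sum(1 for a, b in must if labels[a] != labels[b])
    j += w * sum(1 for a, b in cannot if labels[a] == labels[b])
    return j
```

A semi-supervised clustering algorithm then searches for the labeling (and centers) minimizing this value; the kernelized version replaces the squared distance with distances computed in feature space.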
Comparison of methods for combination of multiple classifiers that predict b... (IJERA Editor)
Predictive analysis includes techniques from data mining that analyze current and historical data to make predictions about the future. Predictive analytics is used in actuarial science, financial services, retail, travel, healthcare, insurance, pharmaceuticals, marketing, telecommunications, and other fields. Predicting patterns can be considered a classification problem, and combining different classifiers gives better results. We will study and compare three methods used to combine multiple classifiers. Bayesian networks perform classification based on conditional probability; the naïve Bayesian approach is effective and easy to interpret, as it assumes that the predictors are independent. Tree augmented naïve Bayes (TAN) constructs a maximum weighted spanning tree that maximizes the likelihood of the training data to perform classification; this tree structure eliminates the independence assumption of naïve Bayesian networks. The behavior-knowledge space method works in two phases and can provide very good performance if large and representative data sets are available.
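The behavior-knowledge space (BKS) combiner's two phases can be sketched directly: a knowledge-building phase fills a table indexed by the tuple of individual classifier outputs, and a prediction phase looks up the most frequent true label for the observed output pattern. The fallback for unseen patterns is an illustrative choice:

```python
from collections import Counter, defaultdict

class BKSCombiner:
    """Behavior-Knowledge Space combiner: each cell of the table is keyed by
    the joint output pattern of the base classifiers and votes with the most
    frequent true label recorded for that pattern."""
    def __init__(self):
        self.table = defaultdict(Counter)

    def fit(self, classifier_outputs, true_labels):
        # Phase 1: record (pattern -> true label) frequencies
        for outputs, y in zip(classifier_outputs, true_labels):
            self.table[tuple(outputs)][y] += 1

    def predict(self, outputs):
        # Phase 2: look up the cell; fall back to majority vote if unseen
        cell = self.table.get(tuple(outputs))
        if not cell:
            return Counter(outputs).most_common(1)[0][0]
        return cell.most_common(1)[0][0]
```

The table's size grows with the number of distinct output patterns, which is why the method needs large, representative data sets to perform well.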
Ensemble based Distributed K-Modes Clustering (IJERD Editor)
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed on large datasets, but it cannot directly handle datasets with categorical attributes, which occur frequently in real-life data. Huang proposed the K-Modes clustering algorithm, introducing a new dissimilarity measure to cluster categorical data; it replaces the means of clusters with a frequency-based method that updates modes during the clustering process to minimize the cost function. Most distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited both to handling categorical data sets and to performing the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning repository.
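The two K-Modes ingredients named above, Huang's matching dissimilarity and the frequency-based mode update, are small enough to sketch directly (a simplified reading of the measure, for illustration):

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Huang's dissimilarity for categorical records: the number of
    attributes on which the two records disagree."""
    return sum(a != b for a, b in zip(x, y))

def update_mode(records):
    """Frequency-based mode update: per attribute, take the most common
    category among the cluster's records (ties broken by first occurrence)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))
```

K-Modes iterates exactly like K-Means, but assigns records by matching_dissimilarity and recomputes each cluster's representative with update_mode instead of a mean.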
Centralized Class Specific Dictionary Learning for wearable sensors based phy... (Sherin Mathews)
With recent progress in pervasive healthcare,
physical activity recognition with wearable body sensors has
become an important and challenging area in both research and
industrial communities. Here, we address a novel technique for
a sensor platform that performs physical activity recognition by
leveraging a class specific regularizer term into the dictionary
pair learning objective function. The proposed algorithm jointly
learns a synthesis dictionary and an analysis dictionary in
order to simultaneously perform signal representation and
classification once the time-domain features have been extracted.
Specifically, the class specific regularizer term ensures that the
sparse codes belonging to the same class will be concentrated
thereby proving beneficial for the classification stage. In order
to develop a more practical approach, we employ a combination
of an alternating direction method of multipliers and an l1-ls
minimization method to approximately minimize the objective
function. We validate the effectiveness of our proposed model by employing it on two activity recognition problems and an intensity estimation problem, both of which include a large number of physical activities. Experimental results demonstrate that classifiers built in this dictionary-learning-based framework outperform state-of-the-art algorithms using simple features, thereby achieving competitive results when compared with classical systems built upon features with prior knowledge.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS (ijseajournal)
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach (using pre-trained word embeddings). We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points using common clustering algorithms like K-means.
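The embed-then-cluster pipeline can be sketched end to end. The tiny 2-d "embeddings" below stand in for pre-trained GloVe vectors, and the first-k seeding is a simplification; both are assumptions for illustration only:

```python
import math

# Hypothetical 2-d vectors standing in for pre-trained GloVe embeddings.
EMB = {"cat": [0.9, 0.1], "dog": [0.8, 0.2],
       "paris": [0.1, 0.9], "london": [0.2, 0.8]}

def kmeans(vectors, k, iters=10):
    """Minimal k-means over embedding vectors; returns a cluster id per vector."""
    centers = vectors[:k]  # naive seeding for the sketch
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[min(range(k), key=lambda j: math.dist(v, centers[j]))].append(v)
        # recompute each center as its group's mean (keep old center if empty)
        centers = [[sum(c) / len(g) for c in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]

words = list(EMB)
labels = kmeans([EMB[w] for w in words], k=2)
```

With real GloVe vectors the same loop groups semantically related categories together, which is what makes the clustering "context-aware".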
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity... (sherinmm)
Due to its symbolic role in ubiquitous health monitoring,
physical activity recognition with wearable body sensors has been in the
limelight in both research and industrial communities. Physical activity
recognition is difficult due to the inherent complexity involved with different
walking styles and human body movements. Thus we present a
correntropy induced dictionary pair learning framework to achieve this
recognition. Our algorithm for this framework jointly learns a synthesis
dictionary and an analysis dictionary in order to simultaneously perform
signal representation and classification once the time-domain features
have been extracted. In particular, the dictionary pair learning algorithm
is developed based on the maximum correntropy criterion, which
is much less sensitive to outliers. In order to develop a more tractable
and practical approach, we employ a combination of alternating direction
method of multipliers and an iteratively reweighted method to approximately
minimize the objective function. We validate the effectiveness of
our proposed model by employing it on an activity recognition problem
and an intensity estimation problem, both of which include a large number
of physical activities from the recently released PAMAP2 dataset.
Experimental results indicate that classifiers built using this correntropy
induced dictionary learning based framework achieve high accuracy by
using simple features, and that this approach gives results competitive
with classical systems built upon features with prior knowledge.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selection of relevant features from a given feature set is one of the important issues in the fields of data mining and classification. In general, a dataset may contain many features, but the whole feature set is not necessarily important for a particular analysis or decision-making task, because features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation or because of incomplete information about the observed system. In both cases, the data will contain features that merely increase the processing burden and may ultimately distort the outcome of the analysis. For these reasons, methods are required to detect and remove such features; hence, in this paper we present an efficient approach that not only removes unimportant features but also reduces the size of the complete dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that the presented algorithm not only reduces the feature set and data size but also improves classifier performance.
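The information-gain scoring step can be sketched for discrete features as label entropy minus the conditional entropy given the feature (the MST grouping and fuzzy c-means stages of the paper's pipeline are not shown):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - H(labels | feature) for a discrete feature;
    higher values mean the feature is more relevant for ranking/selection."""
    n = len(labels)
    cond = 0.0
    for v, cnt in Counter(feature_values).items():
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += cnt / n * entropy(subset)
    return entropy(labels) - cond
```

Features are ranked by this score; a perfectly predictive feature scores H(labels) and an irrelevant one scores 0.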
Soft Computing Techniques Based Image Classification using Support Vector Mac... (ijtsrd)
In this paper we compare different kernels that have been developed for support vector machine (SVM) based time-series classification. Despite the good performance of SVMs on many concrete classification problems, the algorithm is not directly applicable to multidimensional trajectories of different lengths. Training SVMs with indefinite kernels has recently attracted attention in the machine learning community, partly because many similarity functions that arise in practice are not symmetric positive semidefinite. In this paper, we extend the Gaussian RBF kernel to the Gaussian elastic metric kernel, in two variants: time warp distance and its real-penalty counterpart. Experimental results on 17 time-series datasets show that, in terms of classification accuracy, SVM with the Gaussian elastic metric kernel is much superior to other kernels and to state-of-the-art similarity-measure methods. We use the indefinite similarity function or distance directly, without any conversion, and hence always treat training and test examples consistently. Finally, the Gaussian elastic metric kernel achieves the highest accuracy among all methods that train SVMs with kernels, whether positive semidefinite (PSD) or not, with statistically significant evidence, while also retaining sparsity of the support vector set. Tarun Jaiswal | Dr. S. Jaiswal | Dr. Ragini Shukla, "Soft Computing Techniques Based Image Classification using Support Vector Machine Performance", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume-3, Issue-3, April 2019. URL: https://www.ijtsrd.com/papers/ijtsrd23437.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23437/soft-computing-techniques-based-image-classification-using-support-vector-machine-performance/tarun-jaiswal
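The time-warp variant of the Gaussian elastic metric kernel can be sketched as an RBF kernel with dynamic time warping in place of the Euclidean distance. This is an illustrative reading, not the authors' implementation, and such kernels are generally not positive semidefinite, which is exactly the indefinite-kernel setting the abstract discusses:

```python
import math

def dtw(a, b):
    """Dynamic-time-warping distance between two 1-d series,
    using squared local cost, via the standard O(n*m) recurrence."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def gdtw_kernel(a, b, sigma=1.0):
    """Gaussian elastic metric kernel (DTW variant): RBF form with the
    elastic DTW distance replacing the Euclidean one."""
    return math.exp(-dtw(a, b) / (2 * sigma ** 2))
```

Because DTW aligns series of different lengths, the kernel compares trajectories that a plain RBF kernel on raw samples could not.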
A survey on methods and applications of meta-learning with GNNs (Shreya Goyal)
This survey paper provides a comprehensive review of works that combine graph neural networks (GNNs) and meta-learning, together with a thorough summary of the methods and applications in these categories. The application of meta-learning to GNNs is a growing and exciting field; many graph problems will benefit immensely from the combination of the two approaches.
Text classification based on gated recurrent unit combines with support vecto... (IJECEIAES)
As humanity produces ever larger amounts of unstructured text and the volume of text on the Internet grows, intelligent techniques are required to process it and extract different kinds of knowledge from it. Gated recurrent units (GRU) and support vector machines (SVM) have been successfully applied in Natural Language Processing (NLP) systems with comparable, remarkable results. GRU networks perform well in sequential learning tasks and overcome the vanishing and exploding gradient issues of standard recurrent neural networks (RNNs) when capturing long-term dependencies. In this paper, we propose a text classification model that improves on this norm by using a linear support vector machine (SVM) as the replacement for Softmax in the final output layer of a GRU model; furthermore, the cross-entropy function is replaced with a margin-based function. Empirical results show that the proposed GRU-SVM model achieves comparatively better results than the baseline approaches BLSTM-C and DABN.
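The margin-based function that replaces cross-entropy in such a GRU-SVM head can be sketched as a Crammer-Singer style multiclass hinge loss on the model's raw output scores (the exact loss in the paper may differ; this is one standard choice):

```python
def multiclass_hinge(scores, y, margin=1.0):
    """Multiclass hinge loss on raw output scores: zero when the correct
    class beats every other class by at least `margin`, otherwise the
    margin violation of the strongest competitor."""
    correct = scores[y]
    strongest_rival = max(s for i, s in enumerate(scores) if i != y)
    return max(0.0, margin + strongest_rival - correct)
```

Unlike cross-entropy, this loss is exactly zero once the margin is satisfied, so confidently classified examples stop contributing gradient, which is the SVM-like behavior the model exploits.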
Study of the Class and Structural Changes Caused By Incorporating the Target ... (ijceronline)
When high-dimensional data is processed using various machine learning and pattern recognition techniques, it undergoes several changes. Dimensionality reduction is one successfully used pre-processing technique for analyzing and representing high-dimensional data, and it causes several structural changes in the data along the way. When high-dimensional data is used to extract just the target class from among several spatially scattered classes, the philosophy of dimensionality reduction is to find an optimal subset of features, either from the original space or from a transformed space, using the control set of the target class, and then project the input space onto this optimal feature subspace. This paper is an exploratory analysis of the class properties and the structural properties that are affected by such target-class-guided feature subsetting. K-nearest neighbors and minimum spanning trees are employed to study the structural properties, and cluster analysis is applied to understand the properties of the target class and the other classes. The experiments are conducted on target-class-derived features for selected benchmark data sets, namely IRIS, AVIRIS Indiana Pines, and ROSIS Pavia University. Experimentation is also extended to data represented in the optimal principal components obtained by transforming the feature subset, and results are compared.
Mining academic social networks is becoming increasingly necessary with the growing amount of data, and it is a favorite research topic for many researchers. Data mining techniques are used to mine academic social networks. In this paper, we present an efficient frequent item set mining technique for academic social networks. The proposed framework first processes the research documents, then applies enhanced frequent item set mining to find the strength of the relationships between researchers. The proposed method is faster than older algorithms and also takes less main memory for computation.
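The relationship-strength step can be sketched as mining frequent co-occurring pairs, for example co-author pairs across a set of papers, and keeping those that meet a minimum support. This is a minimal single-pass sketch, not the enhanced algorithm the paper proposes:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count all 2-item co-occurrences (e.g. co-author pairs per paper)
    and keep those appearing in at least `min_support` transactions."""
    counts = Counter()
    for t in transactions:
        # sort + dedupe so ("a", "b") and ("b", "a") count as the same pair
        counts.update(combinations(sorted(set(t)), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}
```

The support count of each surviving pair is a direct measure of the strength of the relationship between the two researchers.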
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Â
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
Centralized Class Specific Dictionary Learning for wearable sensors based phy... (Sherin Mathews)
With recent progress in pervasive healthcare, physical activity recognition with wearable body sensors has become an important and challenging area in both the research and industrial communities. Here, we address a novel technique for a sensor platform that performs physical activity recognition by adding a class-specific regularizer term to the dictionary pair learning objective function. The proposed algorithm jointly learns a synthesis dictionary and an analysis dictionary in order to simultaneously perform signal representation and classification once the time-domain features have been extracted. Specifically, the class-specific regularizer term ensures that sparse codes belonging to the same class are concentrated, which proves beneficial for the classification stage. To develop a more practical approach, we employ a combination of the alternating direction method of multipliers and an l1-ls minimization method to approximately minimize the objective function. We validate the effectiveness of the proposed model on two activity recognition problems and an intensity estimation problem, all of which include a large number of physical activities. Experimental results demonstrate that classifiers built in this dictionary learning framework outperform state-of-the-art algorithms using simple features, achieving results competitive with classical systems built upon features with prior knowledge.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS (ijseajournal)
In this paper we propose a novel method to cluster categorical data while retaining its context. Typically, clustering is performed on numerical data; however, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach using pre-trained word embeddings. We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points with common clustering algorithms like K-means.
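As a rough sketch of the pipeline described above: the hand-made two-dimensional vectors below stand in for real pre-trained GloVe embeddings (the values are invented for illustration), and a minimal pure-Python K-means groups the encoded words by context.

```python
import math
import random

# Toy stand-in for pre-trained GloVe vectors (hypothetical values, not real GloVe).
embeddings = {
    "cat": (0.9, 0.1), "dog": (0.85, 0.15), "tiger": (0.8, 0.2),
    "car": (0.1, 0.9), "bus": (0.15, 0.85), "train": (0.2, 0.8),
}

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's K-means on a list of coordinate tuples."""
    random.seed(seed)
    centers = random.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        # Recompute each center as the mean of its group (keep old center if empty).
        centers = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

words = list(embeddings)
centers, groups = kmeans([embeddings[w] for w in words], k=2)
```

In a real context-aware setup the embedding dictionary would come from a GloVe file (and t-SNE could be used for visualization); the clustering step itself is unchanged.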
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity... (sherinmm)
Due to its symbolic role in ubiquitous health monitoring, physical activity recognition with wearable body sensors has been in the limelight in both the research and industrial communities. Physical activity recognition is difficult due to the inherent complexity of different walking styles and human body movements. We therefore present a correntropy-induced dictionary pair learning framework to achieve this recognition. Our algorithm jointly learns a synthesis dictionary and an analysis dictionary in order to simultaneously perform signal representation and classification once the time-domain features have been extracted. In particular, the dictionary pair learning algorithm is developed on the basis of the maximum correntropy criterion, which is much less sensitive to outliers. To obtain a more tractable and practical approach, we employ a combination of the alternating direction method of multipliers and an iteratively reweighted method to approximately minimize the objective function. We validate the effectiveness of the proposed model on an activity recognition problem and an intensity estimation problem, both of which include a large number of physical activities from the recently released PAMAP2 dataset. Experimental results indicate that classifiers built using this correntropy-induced dictionary learning framework achieve high accuracy with simple features, and that this approach gives results competitive with classical systems built upon features with prior knowledge.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selection of relevant features from a given feature set is one of the important issues in the fields of data mining and classification. In general, a dataset may contain many features, but the whole set is not necessarily important for a particular analysis or decision-making task, because features may share common information or be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation or because of incomplete information about the observed system. In either case, the data will contain features that merely increase the processing burden and may ultimately distort the outcome of the analysis. For these reasons, methods are required to detect and remove such features; hence in this paper we present an efficient approach that not only removes unimportant features but also reduces the size of the complete dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it not only reduces the feature set and data length but also improves classifier performance.
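The information-gain scoring step this abstract refers to can be illustrated as follows. This is the generic entropy/information-gain computation for categorical features, not the paper's full MST-plus-fuzzy-c-means pipeline; the example feature and label arrays are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature's values."""
    n = len(labels)
    split_entropy = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

labels = ["a", "a", "b", "b"]
# A feature that perfectly predicts the label recovers the full label entropy;
# a feature independent of the label gains nothing.
perfect = information_gain([0, 0, 1, 1], labels)    # 1.0 bit
useless = information_gain([0, 1, 0, 1], labels)    # 0.0 bits
```

Ranking features by this score and keeping the top scorers is the standard way an entropy-based filter reduces dimensionality before grouping or classification.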
Soft Computing Techniques Based Image Classification using Support Vector Mac... (ijtsrd)
In this paper we compare different kernels developed for support vector machine based time series classification. Despite the good performance of the Support Vector Machine (SVM) on many concrete classification problems, the algorithm is not directly applicable to multi-dimensional trajectories of differing lengths. Training SVMs with indefinite kernels has recently attracted attention in the machine learning community, partly because many similarity functions that arise in practice are not symmetric positive semidefinite. In this paper, we extend the Gaussian RBF kernel to the Gaussian elastic metric kernel, in two variants: time warp distance and real penalty. Experimental results on 17 time series datasets show that, in terms of classification accuracy, SVM with the Gaussian elastic metric kernel is much superior to other kernels and to state-of-the-art similarity measure methods. We use the indefinite similarity function or distance directly without any conversion and, hence, always treat both training and test examples consistently. Finally, the Gaussian elastic metric kernel achieves the highest accuracy among all methods that train SVMs with kernels, both positive semidefinite (PSD) and non-PSD, with statistically significant evidence, while also retaining sparsity of the support vector set. Tarun Jaiswal | Dr. S. Jaiswal | Dr. Ragini Shukla "Soft Computing Techniques Based Image Classification using Support Vector Machine Performance" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23437.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23437/soft-computing-techniques-based-image-classification-using-support-vector-machine-performance/tarun-jaiswal
A survey on methods and applications of meta-learning with GNNs (Shreya Goyal)
This survey paper provides a comprehensive review of works that combine graph neural networks (GNNs) and meta-learning, together with a thorough summary of methods and applications in these categories. The application of meta-learning to GNNs is a growing and exciting field; many graph problems will benefit immensely from the combination of the two approaches.
Text classification based on gated recurrent unit combines with support vecto... (IJECEIAES)
As the amount of unstructured text data that humanity produces grows rapidly on the Internet, intelligent techniques are required to process it and extract different types of knowledge from it. Gated recurrent units (GRU) and support vector machines (SVM) have been successfully applied to Natural Language Processing (NLP) systems with comparable, remarkable results. GRU networks perform well in sequential learning tasks and overcome the vanishing and exploding gradient issues of standard recurrent neural networks (RNNs) when capturing long-term dependencies. In this paper, we propose a text classification model that improves on this norm by using a linear support vector machine (SVM) as the replacement of Softmax in the final output layer of a GRU model; furthermore, the cross-entropy function is replaced with a margin-based function. Empirical results show that the proposed GRU-SVM model achieved comparatively better results than the baseline approaches BLSTM-C and DABN.
Study of the Class and Structural Changes Caused By Incorporating the Target ... (ijceronline)
When high-dimensional data are processed using various machine learning and pattern recognition techniques, they undergo several changes. Dimensionality reduction is one such successfully used pre-processing technique for analyzing and representing high-dimensional data, and it causes several structural changes in the data along the way. When high-dimensional data are used to extract just the target class from among several spatially scattered classes, the philosophy of dimensionality reduction is to find an optimal subset of features, either from the original space or from a transformed space, using the control set of the target class, and then project the input space onto this optimal feature subspace. This paper is an exploratory analysis of the class properties and structural properties affected by target-class-guided feature subsetting in particular. K-nearest neighbors and the minimum spanning tree are employed to study the structural properties, and cluster analysis is applied to understand the properties of the target class and the other classes. The experimentation is conducted on target-class-derived features on selected benchmark data sets, namely IRIS, AVIRIS Indiana Pine and the ROSIS Pavia University data set. Experimentation is also extended to data represented in the optimal principal components obtained by transforming the subset of features, and the results are compared.
Mining academic social networks is becoming increasingly necessary with the growing amount of data, and it is a favorite research topic for many researchers. Data mining techniques are used for mining academic social networks. In this paper, we present an efficient frequent itemset mining technique for academic social networks. The proposed framework first processes the research documents, and then enhanced frequent itemset mining is applied to find the strength of the relationships between researchers. The proposed method is faster than older algorithms and also requires less main memory for computation.
This project retrieves similar geographic images from a dataset based on the extracted features. Retrieval is the process of collecting the relevant images from a dataset containing a large number of images. Initially, a preprocessing step removes noise from the input image with the help of a Gaussian filter. As the second step, the Gray Level Co-occurrence Matrix (GLCM), Scale Invariant Feature Transform (SIFT), and Moment Invariant Feature algorithms are implemented to extract features from the images. After this process, the relevant geographic images are retrieved from the dataset using Euclidean distance. The dataset consists of 40 images in total, from which the images related to the input image are retrieved using Euclidean distance. For SIFT to perform reliable recognition, it is important that the features extracted from the training image be detectable even under changes in image scale, noise and illumination. The GLCM calculates how often a pixel with a given gray level value occurs. While the focus is on image retrieval, the project is also effectively applicable to detection and classification.
Mine Blood Donors Information through Improved K-Means Clustering (ijcsity)
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood, so there is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is the fundamental algorithm for modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to every data point. This paper improves the original K-means algorithm by choosing the initial centroids according to the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time when finding donor information.
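One common distribution-based seeding scheme in the spirit of this abstract (an assumption for illustration; the paper's exact procedure may differ) sorts the points by distance from the origin, splits them into k equal-size parts, and takes each part's mean as an initial centroid, so the seeds are spread across the data's distribution instead of being random.

```python
import math

def distribution_seeds(points, k):
    """Sort points by distance from the origin, split them into k
    equal-size chunks, and return each chunk's mean as an initial centroid."""
    pts = sorted(points, key=lambda p: math.hypot(*p))
    size = len(pts) // k
    chunks = [pts[i * size:(i + 1) * size] for i in range(k - 1)]
    chunks.append(pts[(k - 1) * size:])  # last chunk absorbs any remainder
    return [tuple(sum(dim) / len(ch) for dim in zip(*ch)) for ch in chunks]

# Two well-separated groups yield one seed per group.
seeds = distribution_seeds([(1, 1), (2, 2), (9, 9), (10, 10)], k=2)
```

Deterministic seeds like these remove the run-to-run variance of random initialization, which is where the reported reduction in iterations comes from.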
Performance evaluation of clustering algorithms (ijcsity)
Clustering is an important data mining technique in which we are interested in minimizing intracluster distance while maximizing intercluster distance. We have utilized clustering techniques for detecting deviations in product sales and also to identify and compare sales over a particular period of time. Clustering is suited to grouping items that seem to fall naturally together when there is no specified class for any new item. We have utilized annual sales data of a steel major to analyze sales volume and value with respect to dependent attributes like products, customers and quantities sold. The demand for steel products is cyclical and depends on many factors like customer profile, price, discounts and tax issues. In this paper, we have analyzed sales data with clustering algorithms like K-Means and EM, which revealed many interesting patterns useful for improving sales revenue and achieving higher sales volume. Our study confirms that partition methods like the K-Means and EM algorithms are better suited to analyzing our sales data than density-based methods like DBSCAN and OPTICS or hierarchical methods like COBWEB.
ANALYSE THE PERFORMANCE OF MOBILE PEER TO PEER NETWORK USING ANT COLONY OPTIM... (ijcsity)
A mobile peer-to-peer computer network is one in which each computer in the network can act as a client or server for the other computers in the network. The communication process among the nodes of a mobile peer-to-peer network requires a large number of messages. Because of this large volume of message passing, we propose an interconnection structure called the Distributed Spanning Tree (DST), which improves the efficiency of the mobile peer-to-peer network. The proposed method improves data availability and consistency across the entire network and also reduces data latency and the number of message passes required for any specific application. To further enhance the effectiveness of the proposed system, the DST network is optimized with the Ant Colony Optimization method, which yields an optimal solution for the DST method and increases the availability, consistency and scalability of the network. Simulation results show that the method reduces the number of messages sent for any specific application and the average delay, and increases the packet delivery ratio in the network.
Google Plus and Your Mobile Recruiting Strategy (Jon Parks)
You have a great mobile recruiting strategy. But are you using one of the most powerful social networks available today? Google Plus can provide a powerful boost to your mobile recruiting efforts as long as you know how to make the best use of the platform. In this presentation, I provide an overview of Google Plus, the personal Google Plus profile, Google Plus Communities and Hangouts on Air. The presentation concludes with three key tactics you can begin using today to improve your use of Google Plus in your mobile recruiting strategy. This presentation was delivered at the 2013 Mobile Recruiting Conference in Atlanta, GA on September 24, 2013 (www.mrec.net).
Cancer is one of the deadliest diseases in the world and is responsible for around 13% of all deaths worldwide, and its incidence rate is growing at an alarming rate. Despite the fact that cancer is preventable and curable in its early stages, the vast majority of patients are diagnosed very late. Furthermore, cancer commonly comes back after years of treatment. It is therefore of paramount importance to predict cancer recurrence so that specific treatments can be sought. Nonetheless, conventional methods of predicting cancer recurrence rely solely on histopathology, and the results are not very reliable. Microarray gene expression technology is a promising technology that could predict cancer recurrence by analyzing the gene expression of sample cells; it allows researchers to examine the expression of thousands of genes simultaneously. This paper describes a state-of-the-art machine learning approach, averaged one-dependence estimators with subsumption resolution, to tackle the problem of predicting from DNA microarray gene expression data whether a particular cancer will recur within a specific timeframe, usually 5 years. To lower the computational complexity, we employ an entropy-based gene selection approach to select relevant prognostic genes that are directly responsible for recurrence prediction. The proposed system has achieved an average accuracy of 98.9% in predicting cancer recurrence over 3 datasets, and the experimental results demonstrate the efficacy of our framework.
Max stable set problem to find the initial centroids in clustering problem (nooriasukmaningtyas)
In this paper, we propose a new approach to document clustering using the K-Means algorithm, which is sensitive to the random selection of the k cluster centroids in the initialization phase. To improve the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much better optimized in terms of time, and provides better clustering quality than other methods.
Text Categorization Using Improved K Nearest Neighbor Algorithm (IJTET Journal)
Text categorization is the process of identifying and assigning a predefined class to which a document belongs. A wide variety of algorithms are currently available to perform text categorization. Among them, the K-Nearest Neighbor text classifier is the most commonly used: it tests the degree of similarity between a document and the k training data, thereby determining the category of test documents. In this paper, an improved K-Nearest Neighbor algorithm for text categorization is proposed. In this method, the text is categorized into different classes based on the K-Nearest Neighbor algorithm and constrained one-pass clustering, which provides an effective strategy for categorizing the text and improves the efficiency of the K-Nearest Neighbor algorithm by generating the classification model. Text classification using the K-Nearest Neighbor algorithm has a wide variety of text mining applications.
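The baseline kNN step that the improved algorithm builds on can be sketched with bag-of-words vectors and cosine similarity. This is a generic illustration, not the paper's constrained one-pass clustering variant, and the tiny training corpus is invented for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, training, k=3):
    """training: list of (bag_of_words, label). Majority vote among the k
    training documents most similar to doc."""
    neighbors = sorted(training, key=lambda tl: cosine(doc, tl[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

bag = lambda text: Counter(text.lower().split())
train = [(bag("stock market shares rise"), "finance"),
         (bag("bank profits stock gains"), "finance"),
         (bag("team wins football match"), "sport"),
         (bag("goal scored in the match"), "sport")]
```

The pitfall the improved method targets is visible here: every test document is compared against every training document, which is what clustering the training set into a smaller model avoids.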
Analysis On Classification Techniques In Mammographic Mass Data Set (IJERA Editor)
Data mining, the extraction of hidden information from large databases, is used to predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data-mining classification techniques determine the group with which each data instance is associated, and they can deal with a wide variety of data so that large amounts of data can be involved in processing. This paper analyzes various data mining classification techniques, such as Decision Tree Induction, Naïve Bayes, and k-Nearest Neighbour (KNN) classifiers, on the mammographic mass dataset.
ONTOLOGY INTEGRATION APPROACHES AND ITS IMPACT ON TEXT CATEGORIZATION (IJDKP)
This article introduces some approaches for improving text categorization models by integrating previously imported ontologies. From the Reuters Corpus Volume I (RCV1) dataset, some categories very similar in content and related to the telecommunications, Internet and computer areas were selected for the model experiments. Several domain ontologies covering these areas were built and integrated into the categorization models to improve them.
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. Within data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
A Formal Machine Learning or Multi Objective Decision Making System for Deter... (Editor IJCATR)
Decision-making typically needs mechanisms to compromise among opposing criteria. When multiple objectives are involved in machine learning, a vital step is to relate the weights of individual objectives to system-level performance. Determining the weights of multiple objectives is an analysis process, and it has often been treated as an optimization problem. However, our preliminary investigation has shown that existing methodologies for managing the weights of multiple objectives have some obvious limitations: the determination of weights is treated as a single problem, the results supporting such an optimization are limited, and they can even be unreliable when knowledge about the multiple objectives is incomplete, for example due to poor data. The constraints on weights are also discussed. Variable weights are natural in decision-making processes, so we need to develop a systematic methodology for determining variable weights of multiple objectives. The roles of weights in multi-objective decision-making and machine learning are analyzed, and the weights are then determined with the help of a standard neural network.
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis... (ijsrd.com)
A cluster is a group of objects that are similar to each other within the cluster and dissimilar to the objects of other clusters; the similarity is typically calculated on the basis of the distance between two objects or clusters, and two or more objects belong to the same cluster only if they are close to each other. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for clustering unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of Fuzzy Possibilistic C-Means clustering is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography, using clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior as evaluation parameters.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. Several techniques have been proposed for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search for effectively navigating, summarizing, and organizing information. A globally optimal solution can be obtained by applying high-speed, high-quality optimization algorithms, which perform a globalized search over the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is presented.
A Novel Approach to Mathematical Concepts in Data Mining (ijdmtaiir)
This paper describes three different fundamental mathematical programming approaches that are relevant to data mining: feature selection, clustering and robust representation. It covers two clustering algorithms, the K-means algorithm and the K-median algorithm. Clustering is illustrated by the unsupervised learning of patterns and clusters that may exist in a given database, and is a useful tool for Knowledge Discovery in Databases (KDD). The results of the K-median algorithm are used to identify blood cancer patients in a medical database. K-means clustering is a data mining/machine learning algorithm used to cluster observations into groups of related observations without any prior knowledge of those relationships; it is one of the simplest clustering techniques and is commonly used in medical imaging, biometrics and related fields.
In the recent machine learning community, there is a trend of constructing nonlinear versions of linear algorithms through the "kernel method", for example kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and the current kernel clustering algorithms. Typically, in unsupervised kernel clustering algorithms, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and then clustering is performed. A drawback of these kernel clustering algorithms is that the clustering prototypes reside in the high-dimensional feature space and therefore lack intuitive, clear descriptions unless an additional approximate projection from the feature space back to the data space is used, as done in the literature. This paper utilizes the kernel method to propose a novel clustering algorithm, founded on the conventional fuzzy c-means algorithm (FCM) and called the kernel fuzzy c-means algorithm (KFCM). The method adopts a novel kernel-induced metric in the data space to replace the original Euclidean norm, so that the cluster prototypes still reside in the data space and the clustering results can be interpreted in the original space. This property is used for clustering incomplete data. Experiments on synthetic data illustrate that KFCM has better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
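The kernel-induced metric at the heart of KFCM replaces the Euclidean distance with a distance computed in the implicit feature space. For a Gaussian kernel, K(x, x) = 1, so the squared feature-space distance reduces to 2(1 - K(x, v)). A minimal sketch of that metric and the standard fuzzy membership update built on it (illustrative, not the paper's full algorithm; the sample points are invented):

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Kernel-induced squared distance ||phi(x) - phi(v)||^2
    = K(x,x) + K(v,v) - 2 K(x,v) = 2 (1 - K(x,v)) since K(x,x) = 1."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def memberships(x, prototypes, m=2.0, sigma=1.0):
    """Fuzzy membership of x in each prototype, FCM-style, with the
    kernel-induced distance in place of the Euclidean norm."""
    d = [max(kernel_distance_sq(x, v, sigma), 1e-12) for v in prototypes]
    inv = [dk ** (-1.0 / (m - 1)) for dk in d]
    total = sum(inv)
    return [i / total for i in inv]

# A point near the first prototype gets membership close to 1 there.
u = memberships((0.0, 0.0), [(0.0, 0.1), (5.0, 5.0)])
```

Because the prototypes stay as ordinary data-space vectors, the clustering result remains directly interpretable, which is the property the abstract highlights.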
International Journal of Computational Science and Information Technology (IJCSITY) Vol.1, No.4, November 2013
NOVEL TEXT CATEGORIZATION BY
AMALGAMATION OF AUGMENTED K-NEAREST
NEIGHBOURHOOD CLASSIFICATION AND
K-MEDOIDS CLUSTERING
RamachandraRao Kurada1, Dr. K Karteeka Pavan2, M Rajeswari3 and M Lakshmi Kamala3
1 Research Scholar (Part-time), AcharyaNagarjuna University, Guntur;
Assistant Professor, Department of Computer Applications,
Shri Vishnu Engineering College for Women, Bhimavaram
2 Professor, Department of Information Technology,
RVR & JC College of Engineering, Guntur
3 Department of Computer Applications,
Shri Vishnu Engineering College for Women, Bhimavaram
Abstract
Machine learning for text classification underpins document cataloging, news filtering, document
routing and summarization. In the text mining realm, effective feature selection is essential to make
the learning task accurate and efficient. The traditional lazy text classifier k-Nearest Neighborhood
(kNN) has a major pitfall: it computes the similarity between every object in the training and testing
sets, which inflates both the computational complexity of the algorithm and the consumption of
main memory. To diminish these shortcomings from the viewpoint of a data-mining practitioner,
this paper proposes an amalgamative technique that combines a novel restructured version of kNN,
called Augmented kNN (AkNN), with k-Medoids (kMdd) clustering. The proposed work
preprocesses the initial training set by applying attribute feature selection to reduce its high
dimensionality; it also detects and excludes the high-flier samples in the initial training set and
restructures the result into a constricted training set. The kMdd clustering algorithm then generates
cluster centers (as interior objects) for each category and restructures the constricted training set
around these centroids. This technique is amalgamated with the AkNN classifier, which is equipped
with text mining similarity measures. Finally, weights and ranks are assigned to each object in the
new training set according to its affinity to the object in the testing set. Experiments conducted on
Reuters-21578, a UCI benchmark text mining data set, and comparisons with the traditional kNN
classifier indicate that the proposed method yields preeminent performance in both clustering and
classification.
Keywords
Data mining, Dimension reduction, high-fliers, k-Nearest Neighborhood, k-Medoids, Text classification
1. Introduction
Pattern recognition is concerned with assigning categories to samples, which are described by a set
of measurements known as features or characteristics. Despite prolific investigation over the past
decades, and the contemporary hypotheses, improvised thoughts, suspicion and speculation
DOI : 10.5121/ijcsity.2013.1406
that persist, pattern recognition still faces challenges in resolving real-world problems [1]. There are
two major types of pattern recognition problems, called unsupervised and supervised. In
classification, or supervised learning, each sample in the dataset comes with a pre-designated
category label, whereas in clustering, or unsupervised learning, there is no pre-designated category
label and grouping is done based on similarities. In this paper, a combinational learning approach
using classification and clustering is projected to improve the accuracy and adequacy of judgment
of the text classifier.
With the growth of the internet, efficient and accurate Information Retrieval (IR) systems [2] are
of great importance. Present-day search engines must handle enormous amounts of data, measured
in exabytes. Text classification attempts to decide automatically whether a document, or part of a
document, has particular characteristics, typically based on whether the document encloses a
certain type of topic or not. Often, the topic of interest is not defined absolutely or precisely by the
users; as a substitute, they supply a set of documents that embody both sides of the distinction
(positive and negative training sets). The decision-making procedure [3] extracts the features of
text documents that help to discriminate positives from negatives, and applies those features
automatically to sample documents with the help of text classification systems. The k-nearest
neighbor rule is a simple and effective classifier for document classification. In this technique, a
document is assigned to a particular category if that category has the utmost plurality among the
k nearest neighbors of the document in the training set. The k nearest neighbors of a test document
are ordered based on their content similarity with the documents in the training set.
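The kNN rule just described can be sketched in a few lines. This is a minimal illustration with hypothetical names, assuming documents are already represented as term-frequency vectors and that cosine similarity measures content similarity:

```python
import numpy as np
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two term-frequency vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def knn_classify(train_vecs, train_labels, test_vec, k=3):
    """Assign the majority label among the k most cosine-similar
    training documents (a minimal sketch of the kNN rule)."""
    sims = [cosine_sim(v, test_vec) for v in train_vecs]
    top_k = np.argsort(sims)[::-1][:k]          # indices of the k nearest
    votes = Counter(train_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]
```

Note that every test document is compared against the whole training set, which is exactly the computational pitfall the proposed work aims to diminish.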
This paper probes and recognizes the payoff of amalgamating clustering and classification for
text categorization. The paper is organized as follows: Section 2 gives a brief review of antecedent
work on text classification. Section 3 is devoted to presenting the proposed approach for document
classification. In Section 4, an extensive experimental evaluation on a text corpus derived from
Reuters-21578, a real-world data set, is summarized and analyzed; the resulting text categorization
rates of the referred methods compare favorably with those of the standard method. Finally,
conclusions are drawn and future research directions are identified in Section 5.
2. Literature Survey
TM Cover and PE Hart first proposed an elementary and straightforward classification method
called the k-Nearest Neighbor (kNN) rule [4], where no prior knowledge about the distribution of
the data is needed and neighborhood estimation is based on the value of k. kNN is a non-parametric
lazy learning algorithm, since it makes no assumptions about the underlying data distribution and
has no explicit training phase to perform any generalization over the data points. The appeal of
kNN lies in its minimal training phase, at the cost of a lavish testing phase.
Several methods lead to the current state of the art. The wkNN method introduced by T Bailey
and AK Jain [5] is an improved version of simple kNN that assigns weights to training points
according to their distance from the centroid. Outliers in the training set that do not affect its mean
point are purged in the work of GW Gates [6], thereby reducing the recognition cost of samples in
large datasets. One of the major pitfalls of kNN, namely its memory limitation, was addressed by
Angiulli [7] by eliminating data points that show similarity within the training set. Guo and Bell
[8] proposed a model to automatically recognize the k value in dynamic web pages of large web
repositories; automatic classification of data points in the training set based on the k value was a
research finding of G Guo. In 2003, SC Baguai and K Pal assigned ranks to the training data of
each category by using a Gaussian distribution, thus dominating the other variations of kNN [9].
Modified kNN [10], suggested by Parvin et al. in 2008, used weights and the validity of data points
to classify nearest neighbors, together with various outlier elimination techniques. In 2009, Zeng
et al. planned a novel idea for kNN by applying n-1 classes to the entire training set to address its
computational complexity [11]. A related paper to this publication is Z Youg's work on cluster
formation using k nearest neighbors in text mining domains [12], which proved to be a robust
method and overcame the defects of an unevenly distributed training set.
Hubert et al. in 2013 composed an algorithm for similarity search in the metric space of measure
distributions in a thesaurus [13]. In 2013, Du et al. proposed an effective strategy to accelerate
standard kNN, based on a simple principle: points that are near in space usually remain near when
projected onto a direction, which is used to remove irrelevant points and decrease the computational
cost [14]. Basu et al. in 2013 assigned a document to a predefined class for a large value of k when
the margin of majority voting is one or when a tie occurs. The majority voting method discriminates
the criterion to prune the actual search space of the test document; this rule enhances the confidence
of the voting process and makes no prior assumption about the number of nearest neighbors [15].
One of the major drawbacks of the k-Means clustering technique is its sensitivity to high-fliers,
because its mean value can be easily influenced by extreme values. The other partitioned clustering
technique, k-Medoids clustering, a variation of k-Means, is more robust to noise and high-fliers.
k-Medoids clustering represents the center of a cluster with an actual point instead of a mean
point: a point is chosen as a cluster center if it is located centrally, with the minimum sum of
distances to the other points [16].
Figure 1. Roadmap of the amalgamative technique for text categorization
3. Proposed Work
The proposed algorithm operates in three niches. The first niche performs preprocessing on the
text mining corpus arranged as a training set. This is accomplished by attribute feature selection,
which removes the higher dimensions of the training set and derives a constricted training set with
minimal dimensions. The second niche uses the partitioned clustering k-Medoids algorithm to
define k objects as the most centrally located points in the classes of the constricted training set. A
solemnity model is designed to identify the cluster centers, using edit distance as a similarity
measure for each category in the training set; the worthiness of the constricted training set is
ensured by identifying and eliminating high-fliers. The final niche embeds an accurate text
classification system using the novel Augmented kNN classifier with the cosine text mining
similarity measure, thus enhancing the confidence of assigning a test sample to a trained
categorical class. This guides the text classifier criterion to assign
83
4. International Journal of Computational Science and Information Technology (IJCSITY) Vol.1, No.4, November 2013
considerable weights to each object in the new constricted training set base upon their minimal
average dissimilarity with the latest computed centroids. It was proven from the results attained
that, the amalgamation of all the three niches rationale a better solution for evaluating the
effectiveness of the text classification. The overview of the proposed intuition is outlined as
figure 1 roadmap of the amalgamative technique for text categorization.
3.1. First Niche – Preprocessing of Text Corpus
The preprocessing of text classification over datasets includes many key technologies before
extracting the features from the text. The data cleaning process involves removing formatting,
converting the data into plain text, sentence segmentation, stemming [17], removing
whitespace and uppercase characters, stop-word removal [18] and weight calculation. Text
classification is made simpler by eradication of inappropriate dimensions in the dataset. This is
justified by dispensation of the curse of dimensionality [19], i.e. higher-dimensional data sets are
reduced into lower-dimensional data sets without significant loss of information. In
general, attribute feature selection methods are used to rank each latent feature according to a
particular feature selection metric, and then retain the best k features. Ranking entails counting
the occurrence of each feature in the training set as either positive or negative samples discretely,
and then computing a function over each of these classes.
3.1.1. Feature selection
Feature selection methods select only a subset of meaningful or useful dimensions, specific to the
application, from the original set of dimensions. In the text mining domain, attribute feature
selection methods include Document Frequency (DF), Term Frequency (TF), Inverse Document
Frequency (IDF), etc. These methods rank terms based on numerical statistics computed for
each document in the text corpus.
The documents in the proposed work use the vector space model to represent document objects.
Each document d is symbolized by a vector in the term space, d = (tf_1, tf_2, ..., tf_n), where
tf_i is the TF of the term t_i in the document. To represent every document with the same set of
terms, it is necessary to extract all the terms found in the documents and use them as the
generated feature vector. The other method used was the combination of term frequency with
inverse document frequency (TF-IDF). The document frequency df_i is the number of documents,
in an assortment of N documents, in which the term t_i transpires. Usually IDF is given as

  idf_i = log(N / df_i).

The weight of a term t_i in a document is given by

  w_i = tf_i * log(N / df_i).

To keep these attribute feature vector dimensions sensible, only the n terms with the maximum
weights in all the documents are selected as features. Thus TF-IDF is

  tfidf(t, d) = tf(t, d) * log(N / df(t)),

where t is a term in document d and N is the total number of documents in D.
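The TF-IDF weighting above can be sketched directly from its definition. This is a minimal illustration on a toy corpus; the function name and the tiny example documents are assumptions, not data from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weights w(t, d) = tf(t, d) * log(N / df(t))
    for a small corpus of tokenized documents."""
    N = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)               # raw term frequency in this document
        vectors.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return vectors

docs = [["wheat", "grain", "export"],
        ["wheat", "price", "rise"],
        ["grain", "ship", "export"]]
vecs = tfidf_vectors(docs)
```

A term such as "price", which appears in only one of the three documents, receives a higher weight (log 3) than "wheat", which appears in two (log 1.5), reflecting its greater discriminative power.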
Stemming [21] refers to the process of erasing word suffixes to retrieve the root (or stem) of the
words, which reduces the complexity of the data without significant loss of information in a
bag-of-words representation of the text data. An attractive alternative to stemming used in this
proposed work is character n-gram tokenization. Adamson and Boreham in 1974 introduced the
n-gram stemming method [22], a term-conflation technique also called the shared-digram method.
A digram is a pair of consecutive letters. In this approach, the n-gram method is used as an
association measure to calculate the distance between pairs of terms based on shared unique
digrams. The distinctive digrams for each word pair are first extracted and counted, and a
similarity quantifier called Dice's coefficient is computed as

  S = 2C / (A + B),

where A is the number of distinctive digrams in the first word, B the number of distinctive
digrams in the second, and C the number of distinctive digrams shared by the two. This similarity
quantifier is used to compare all pairs of terms in the database, and also to build a similarity
matrix. Since Dice's coefficient is symmetric, a lower-triangle similarity matrix is used. At the
end of the first niche, the text preprocessing is well-rounded by building a strong bottom line
and preparedness for accurate text classification.
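The shared-digram computation can be sketched in a few lines. This is an illustrative implementation of Dice's coefficient over unique digrams as defined above; the function names are assumptions.

```python
def digrams(word):
    """Set of distinct consecutive letter pairs (digrams) in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(word1, word2):
    """Dice's coefficient S = 2C / (A + B), where A and B count each
    word's distinctive digrams and C counts the shared ones."""
    a, b = digrams(word1), digrams(word2)
    if not a and not b:
        return 1.0          # two empty/one-letter words: treat as identical
    return 2 * len(a & b) / (len(a) + len(b))
```

For example, "night" and "nacht" share only the digram "ht" out of four digrams each, so S = 2(1)/(4+4) = 0.25, while a word compared with itself yields 1.0.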
3.2. Second Niche – High-flier Identification, Elimination & k-Medoids Clustering
High-fliers are the noisy and redundant data present in a data set; they appear to be
inconsistent with the other records in the data set. Such high-fliers are to be eliminated because
they are candidates for abnormal data that adversely lead to model vagueness, biased parameter
inference and inaccurate results. In text classification, high-fliers arise between categories
because of the uneven distribution of data, which can cause the distance between samples in the
same category to be larger than the distance between samples in different categories. This is
addressed by building a solemnity model in the k-Medoids clustering algorithm.
3.2.1. Solemnity Model
To reduce the complexity of the constricted training set and overcome the uneven distribution of
text samples, the k-Medoids clustering algorithm is used. The similarity measure used in
identifying the categories in the training set was the Levenshtein distance (LD), or edit
distance [23]. This quantification is a string metric for assessing the dissimilarity between two
sequences. The edit distance lev(i, j) between the first i characters of string a and the first j
characters of string b is given by

  lev(i, j) = max(i, j)                              if min(i, j) = 0,
  lev(i, j) = min( lev(i-1, j) + 1,                  (deletion)
                   lev(i, j-1) + 1,                  (insertion)
                   lev(i-1, j-1) + [a_i != b_j] )    otherwise,

where [a_i != b_j] is 1 when the i-th character of a differs from the j-th character of b and 0
when they match. LD has numerous upper and lower bounds: it lies in [0, max(|a|, |b|)]; if the
compared strings are identical, it is zero; if the strings are of identical size, the Hamming
distance is an upper bound on the LD; and the LD between two strings is no greater than the sum
of their LDs from a third string. At the end of the second niche, the high-fliers are eliminated,
a solemnity model is built using k-Mdd with the LD similarity measure, and the training set is
rebuilt by partitioning the objects into different categories, which reduces the computational
complexity and the memory usage. The cluster centers are treated as the k representative objects
in the constricted training set.
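The recurrence above translates directly into a dynamic-programming routine. This is a standard two-row implementation of the Levenshtein distance, shown for illustration; the paper does not specify its implementation details.

```python
def levenshtein(s, t):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions required to turn s into t."""
    m, n = len(s), len(t)
    # prev[j] holds lev(i-1, j); curr[j] holds lev(i, j).
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution / match
        prev = curr
    return prev[n]
```

Keeping only two rows reduces the memory cost from O(mn) to O(n) while computing the same distance.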
3.3. Final Niche – Application of AkNN and Amending Weights Towards the Computed Centroid
The second niche classifies the objects in the training phase by building a constricted training set
using k centroids, but when an object from the testing set has to be classified, it is necessary to
calculate similarities between the object in the testing set and the existing samples in the
constricted training set. It then chooses the k nearest neighbor samples, which have the largest
similarities, and amends their weights using a weighting vector.
To accomplish this model, cosine similarity is used as the distance measure in the kNN text
classifier to classify the test sample by its similarity with k samples in the training set. The
cosine similarity distance measure is commonly used in high-dimensional positive spaces to
compare documents in text mining. IR uses cosine similarity for comparing documents; its values
range from 0 to 1, since TF-IDF weights cannot be negative. Each term is notionally assigned a
distinct dimension, and a vector is described for each document in which the value of each
dimension matches the number of times the term appears in the document. The cosine of two
vectors is derived from the Euclidean dot product formula A . B = ||A|| ||B|| cos(theta). Given
two attribute vectors A and B, the cosine similarity is symbolized using the dot product and
magnitudes as

  cos(theta) = (A . B) / (||A|| ||B||) = sum_i(A_i B_i) / ( sqrt(sum_i A_i^2) * sqrt(sum_i B_i^2) ).

The ensuing similarity varies from -1 (exactly opposite) to 1 (exactly the same), with 0
indicating independence. For text similarities, the attribute vectors A and B are usually the TF
vectors of the documents.
3.3.1. Assigning weights to samples
To assign precedence to the samples in the constricted training set that have high similarity with
the test sample, the kNN classifier uses the distance-weighted kNN rule proposed by Dudani [24].
Let w_i be the weight of the i-th nearest sample. The WkNN rule resolves the weight w_i by a
distance function between the test sample and the i-th nearest neighbor, i.e. samples with smaller
distances are weighted more heavily than those with larger distances. The simple function that
scales the weights linearly is

  w_i = (d_k - d_i) / (d_k - d_1)   if d_k != d_1,
  w_i = 1                           if d_k = d_1,

where d_1 and d_k are the distances to the test sample of the nearest and the farthest (k-th)
neighbor respectively. The WkNN rule uses this rank weighting function to assign the weights.
The proposed work used this function for computing the weights of the kNN samples belonging to
individual classes.
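Dudani's linear weighting can be sketched as follows; the function name is an assumption, and the distances are expected to be sorted in ascending order as in the rule above.

```python
def dudani_weights(distances):
    """Dudani's distance-weighted kNN rule: the nearest neighbor gets
    weight 1, the k-th (farthest) gets 0, the rest scale linearly:
        w_i = (d_k - d_i) / (d_k - d_1)  if d_k != d_1, else 1.
    `distances` must be sorted in ascending order."""
    d1, dk = distances[0], distances[-1]
    if dk == d1:
        return [1.0] * len(distances)
    return [(dk - d) / (dk - d1) for d in distances]
```

A class vote is then the sum of the weights of the neighbors belonging to that class, so near neighbors dominate the decision instead of each neighbor counting equally.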
The upshot observed at the end of the final niche is that the AkNN classifies the objects in the
testing set with the distance-weighted kNN rule, computes ranks for each category, and ranks the
k samples that have the highest similarities and affinity towards the centroids. The amalgamated
classification-and-clustering text categorization algorithm is presented as Algorithm 1.
Algorithm 1: Amalgamation of Classification and Clustering for Text Categorization
Input: A raw training set D of text documents, a set of predefined category labels C, c clusters,
and a testing set sample document.
First niche
1: Preprocess the training set documents of dataset D into vectors
2: For each feature in the dataset do
3:   Attribute feature selection by removing formatting, sentence segmentation, stop-word
     removal, white-space removal, weight calculation, n-gram stemming
4:   Compute DF, TF, IDF and TF-IDF
5: End for
Second niche
6: Recognize and purge high-fliers by building a solemnity model using edit distance
7: Obtain the constricted training set from the preprocessed text document collection
8: For each processed document in D do
9:   Compute c clusters using the k-Medoids clustering algorithm for all categories of
     documents in D
10:  For every cluster ID, generate and index the feature cluster vectors
11:  Redistribute the constricted training set with the cluster representatives of each text category
12: End for
Final niche
13: For each processed document in D of the constricted training set do
14:   Classify the testing set sample document using the AkNN text classifier with cosine
      similarity over the constricted training set
15:   Assign weights using the distance-weighted kNN rule to the training set documents and
      assign ranks to the k samples that have the largest similarity and belongingness to categories
16:   Position the testing set document in the category in C with the largest similarity and
      consign it the most heavily weighted rank
17:   Build the classifier on these mapped text documents and record the classification accuracy
18: End for
Output: Testing documents classified into categories with appropriate weight values.
4. Experimental Analysis and Observations
4.1. Dataset
The referred algorithm uses Reuters-21578, a classical benchmark dataset for text categorization
[25]. Reuters-21578 consists of documents from the Reuters newswire, which were categorized
and indexed into groups by Reuters Ltd. and the Carnegie Group in 1987. In 1991, David D. Lewis
and Peter Shoemaker of the Center for Information and Language Studies, University of Chicago,
prepared and formatted the metafile for the Reuters-21578 dataset [26]. The experimental
environment used is an Intel Core i3 CPU 530 @ 2.93 GHz, 4 GB RAM, 32-bit Windows OS, and
Java software. The package html2text-1.3.2 is used to convert the text corpus into text documents.
The categories in the text documents are separated into individual text documents using
Amberfish-1.6.4 software. The text corpus consists of 21,578 documents, assigned to 123
different categories. Table 1 presents the initial distribution of all categories in the
Reuters-21578 text corpus before preprocessing.
Table 1. Initial distribution of all categories in Reuters-21578

Category    #Docs | Category        #Docs | Category        #Docs
Earn         3987 | Feed               51 | Potato              6
Acq          2448 | Rubber             51 | Propane             6
Money         991 | Zinc               44 | Austdlr             4
Fx            801 | Palm               43 | Belly               4
Crude         634 | Chem               41 | Cpu                 4
Grain         628 | Pet                41 | Nzdlr               4
Trade         552 | Silver             37 | Plywood             4
Interest      513 | Lead               35 | Pork                4
Wheat         306 | Rapeseed           35 | Tapioca             4
Ship          305 | Sorghum            35 | Cake                3
Corn          255 | Tin                33 | Can                 3
Oil           238 | Metal              32 | Copra               3
Dlr           217 | Strategic          32 | Dfl                 3
Gas           195 | Wpi                32 | F                   3
Oilseed       192 | Orange             29 | Lin                 3
Supply        190 | Fuel               28 | Lit                 3
Sugar         184 | Hog                27 | Nkr                 3
Gnp           163 | Retail             27 | Palladium           3
Coffee        145 | Heat               25 | Palmkernel          3
Veg           137 | Housing            21 | Rand                3
Gold          135 | Stg                21 | Saudriyal           3
Nat           130 | Income             18 | Sfr                 3
Soybean       120 | Lei                17 | Castor              2
Bop           116 | Lumber             17 | Cornglutenfeed      2
Livestock     114 | Sunseed            17 | Fishmeal            2
Cpi           112 | Dmk                15 | Linseed             2
Reserves       84 | Tea                15 | Rye                 2
Meal           82 | Oat                14 | Wool                2
Copper         78 | Coconut            13 | Bean                1
Cocoa          76 | Cattle             12 | Bfr                 1
Jobs           76 | Groundnut          12 | Castorseed          1
Carcass        75 | Platinum           12 | Citruspulp          1
Yen            69 | Nickel             11 | Cottonseed          1
Iron           67 | Sun                10 | Cruzado             1
Rice           67 | L                   9 | Dkr                 1
Steel          67 | Rape                9 | Hk                  1
Cotton         66 | Jet                 8 | Peseta              1
Ipi            65 | Debt                7 | Red                 1
Alum           63 | Instal              7 | Ringgit             1
Barley         54 | Inventories         7 | Rupiah              1
Soy            52 | Naphtha             7 | Skr                 1
In the first niche, preprocessing is imposed on the text corpus. Further, to test the effectiveness
and efficiency of category-specific classification, the data set is divided according to the
standard Modified Apte split. This split yields 13,575 documents in the training set and 6,231
documents in the testing set, and removes 1,771 documents from the Reuters-21578 collection
since they are used in neither the training nor the testing set. The utmost number of categories
allocated to a document is 16, and the average is 1.24. It also eliminates 47 categories that exist
in neither the training nor the testing set; the total number of distinct terms in the text corpus
after preprocessing is 20,298. Table 2 lists the top 12 categories with the number of documents
assigned in both the training and testing sets after completion of preprocessing.
Table 2. Top 12 categories in Reuters-21578 after preprocessing
4.2. Evaluation Measures
Information Retrieval (IR) is concerned with the targeted retrieval of information from an
outsized number of text documents. Two basic criteria widely used in the document categorization
field are precision and recall. Precision is the fraction of returned documents that are correct
targets, while recall is the fraction of correct target documents that are returned:

  Precision = TP / (TP + FP),   Recall = TP / (TP + FN),

where TP, FP and FN are the numbers of true positives, false positives and false negatives
respectively. Van Rijsbergen [28] introduced the F1 measure as an estimate to evaluate the text
classification structure. It combines both recall and precision and is given as

  F1 = 2 * Precision * Recall / (Precision + Recall),

where the F1 score reaches its best value at 1 and its worst at 0. Accuracy is defined as the
percentage of documents correctly classified into classes based on the documents' target labels.
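The three measures above can be sketched for a single category, treating retrieved and relevant documents as sets; the function name and the set-based formulation are illustrative assumptions.

```python
def precision_recall_f1(retrieved, relevant):
    """Per-category evaluation:
    precision = |retrieved & relevant| / |retrieved|,
    recall    = |retrieved & relevant| / |relevant|,
    F1        = harmonic mean of precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                  # true positives
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

For instance, returning documents {1, 2, 3, 4} when {2, 3, 5} are relevant gives precision 1/2, recall 2/3 and F1 = 4/7.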
The preprocessed text corpus was supplied as input to the second niche. The kMdd clustering
algorithm groups the documents into different categories. Table 3 summarizes the results after
the application of the k-Mdd clustering algorithm over the text corpus. Multiple category labels
that tend to produce high-fliers are discarded and, to uplift clustering accuracy, clusters having
fewer than 5 documents are also removed. The average number of categories per document is
1.334, and the average number of documents per category is about 151, or 1.39% of the corpus.
Table 4 gives the documents in clusters 01 and 06 and the set of categories associated with these
documents. Table 5 summarizes the total number of clusters generated, with the minimum and
maximum numbers of text documents as cluster sizes. The k-Mdd clustering accuracy based on
the k value is given in Table 6. An important observation from Table 6 is that as the k value
increases, a proportionate rise in computational time and a plunge in accuracy occur.
Table 3. Results abridged after k-Medoids clustering on text corpus

Description                    Reuters text corpus
Total number of documents      21578
Number of documents used       19806
Number of documents un-used    1772
Number of clusters             123
Max. cluster size              3854
Min. cluster size              5
Average cluster size           171
Table 4. Sample cluster numbers with document identifiers and categories

Cluster 01
  Document IDs: 12385, 12874, 12459, 16518, 18719, 2059, 3150, 3565, 3864, 4335, 5805, 5625,
  6211, 6481, 6501, 6744, 7994, 8050, 8997, 8449, 8333, 9121, 9381, 9355, 9457
  Categories: ship, earn, gas, strategic, metal, acq, grain, dlr, trade, money-fx

Cluster 06
  Document IDs: 12664, 12378, 14712, 2539, 456, 2564, 3751, 5421, 6789, 64561, 6781, 6977,
  7947, 8239
  Categories: earn, gas, strategic, metal, acq, tin
At the end of the second niche the high-fliers are eliminated and the training set text corpus is
clustered into different categories based on the k value. The ingrained final niche follows by
triggering the augmented kNN over the training set and serving requests against the testing set.
The weighting vector using the distance-weighted kNN rule is imposed over the testing set based
on high similarities and affinity towards the centroids for each category in the testing set.
It ranks the k nearest neighbors from the training set. The performance of the AkNN text
classifier with and without the weighting vector, measured using the evaluation criteria, together
with comparisons against the traditional kNN, is summarized in Table 7.
Table 5. Number of clusters formed, by size

Size (#docs)  #Clusters | Size (#docs)  #Clusters | Size (#docs)  #Clusters
0-5                   4 | 61-65                 4 | 176-200               3
6-34                  5 | 66-75                 7 | 201-300               7
35-40                 8 | 76-85                 5 | 301-400               1
41-45                 7 | 86-95                 2 | 401-500               1
46-50                 9 | 96-125                7 | 501-600               2
51-55                 6 | 126-150               3 | 601-700               2
56-60                 0 | 151-175               1 | Total clusters       84
Table 6. k-Medoids clustering accuracy on text corpus based on k value

k   Accuracy (%)  CPU time (sec) | k    Accuracy (%)  CPU time (sec)
1   0.9145        2.58           | 7    0.6230        8.43
2   0.9622        2.63           | 8    0.5724        9.73
3   0.7750        3.13           | 9    0.6185        24.28
4   0.7586        3.86           | 10   0.6573        30.65
5   0.6711        4.98           | 15   0.6091        56.53
6   0.7162        6.51           | 20   0.5458        57.26
Table 7. Performance evaluation of AkNN text classifier on top 10 categories of Reuters-21578

            |        Precision        |          Recall         |         F-Measure
Category    | kNN   AkNN-   AkNN+     | kNN   AkNN-   AkNN+     | kNN     AkNN-   AkNN+
Acq         | 0.89  0.91    0.93      | 0.93  0.94    0.95      | 0.9096  0.9248  0.9399
Crude       | 0.70  0.74    0.77      | 0.88  0.88    0.89      | 0.7797  0.8040  0.8257
Earn        | 0.91  0.91    0.93      | 0.94  0.95    0.95      | 0.9248  0.9296  0.9399
Grain       | 0.78  0.81    0.84      | 0.81  0.84    0.87      | 0.7947  0.8247  0.8547
Interest    | 0.71  0.74    0.79      | 0.80  0.81    0.83      | 0.7523  0.7734  0.8095
Money       | 0.60  0.64    0.69      | 0.87  0.87    0.91      | 0.7102  0.7375  0.7849
Fx          | 0.77  0.81    0.88      | 0.85  0.84    0.87      | 0.8080  0.8247  0.8750
Ship        | 0.66  0.74    0.79      | 0.89  0.89    0.93      | 0.7579  0.8081  0.8543
Trade       | 0.52  0.54    0.57      | 0.67  0.71    0.74      | 0.5855  0.6134  0.6440
Wheat       | 0.63  0.66    0.71      | 0.31  0.35    0.37      | 0.4155  0.4574  0.4865

(AkNN- = AkNN without weights; AkNN+ = AkNN with weights.)
The momentous observation on F-measure from Table 7 is that AkNN with weights is the best
performer across every category, despite the competing methods. Figure 2 exhibits the F-measure
dominance of the weighted AkNN text classifier over the top 10 categories in the Reuters text
corpus. The overall accuracy and CPU computational efficiency of AkNN with and without the
weighting vector are compared with the traditional kNN in Table 8.
The significant observation from Table 8 is that AkNN with weights outperforms its competitors
in both the training and testing phases. The overall accuracy of the conventional classifier could
not match that of the weighted-vector AkNN, which confirms the effectiveness of the
amalgamative technique for text mining.
Figure 2. F-Measure of text classifiers over top 10 categories in Reuters-21578 text corpus
Table 8. Comparison of algorithms' CPU computational time and accuracy

Modified Apte split    Training Set   Testing Set   Overall Accuracy (%)
AkNN with Weight       21.6 min       15.4 sec      0.942
AkNN without Weight    58.47 min      51.42 sec     0.951
kNN                    1.50 hrs       1.54 sec      0.873
5. Conclusion
Due to the exponential increase in huge volumes of text documents, there is a need for analyzing
text collections, and several techniques have been developed for mining knowledge from text
documents. To strengthen the expressiveness of text classification, a novel amalgamative method
using the AkNN text classifier and kMdd clustering is tailored. It is well suited for dimension
reduction, attribute feature selection, high-flier identification and reduction, and assigning
weights/ranks to the k nearest neighbors using a weighting vector. The experimental analysis
conducted on the benchmark dataset Reuters-21578 shows that the referred method substantially
and significantly surpasses traditional text categorization techniques, which makes the
amalgamation method a very promising and easy-to-use method for text categorization.
Further research may address the issue of reducing the computational cost by using soft
computing approaches for the text classification and clustering methods. There is also scope to
investigate adjustments to the weighting vector based on the k value in accordance with the size
of the categories and their belongingness to the text classifier, promising gains in accuracy and
performance over the training and testing sets.
References
[1] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, New York: Wiley, 2005.
[2] A. Singhal, "Modern Information Retrieval: A Brief Overview", Bulletin of the IEEE Computer
    Society Technical Committee on Data Engineering, 24(4): 35-43, 2001.
[3] M. Shi, D. S. Edwin, R. Menon, et al., "A machine learning approach for the curation of biomedical
    literature - KDD Cup 2002 (task 1)", ACM SIGKDD Explorations Newsletter, Vol. 4(2), pp. 93-94,
    2002.
[4] T. M. Cover, P. E. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information
    Theory, 13(1): 21-27, 1967, doi:10.1109/TIT.1967.1053964.
[5] T. Bailey, K. Jain, "A note on distance-weighted k-nearest neighbor rules", IEEE Trans. Systems,
    Man, Cybernetics, Vol. 8, pp. 311-313, 1978.
[6] G. W. Gates, "Reduced Nearest Neighbor Rule", IEEE Trans. Information Theory, Vol. 18, No. 3,
    pp. 431-433.
[7] F. Angiulli, "Fast Condensed Nearest Neighbor", ACM International Conference Proceedings,
    Vol. 119, pp. 25-32.
[8] G. Guo, et al., "Using kNN model for automatic text categorization", International Journal of
    Soft Computing, Vol. 10, pp. 423-430, 2006.
[9] S. C. Bagui, S. Bagui, K. Pal, "Breast Cancer Detection using Nearest Neighbor Classification
    Rules", Pattern Recognition, Vol. 36, pp. 25-34, 2003.
[10] H. Parvin, H. Alizadeh, B. Minaei, "Modified k Nearest Neighbor", Proceedings of the World
    Congress on Engineering and Computer Science, 2008.
[11] Y. Zeng, Y. Yang, L. Zhou, "Pseudo Nearest Neighbor Rule for Pattern Recognition", Expert
    Systems with Applications, Vol. 36, pp. 3587-3595, 2009.
[12] Z. Yong, "An Improved kNN Text Classification Algorithm based on Clustering", Journal of
    Computers, Vol. 4, No. 3, March 2009.
[13] H. H. Duan, V. Pestov, V. Singla, "Text Categorization via Similarity Search: An Efficient and
    Effective Novel Algorithm", arXiv preprint arXiv:1307.2669, 2013.
[14] M. Du, X. Chen, "Accelerated k-nearest neighbors algorithm based on principal component
    analysis for text categorization", Journal of Zhejiang University SCIENCE C, 14(6): 407-416,
    2013.
[15] T. Basu, C. A. Murthy, "Towards enriching the quality of k-nearest neighbor rule for document
    classification", International Journal of Machine Learning and Cybernetics, pp. 1-9, 2013.
[16] L. Kaufman, P. J. Rousseeuw, "Clustering by means of Medoids", in Statistical Data Analysis
    Based on the L1-Norm and Related Methods, edited by Y. Dodge, North-Holland, pp. 405-416,
    1987.
[17] E. Bingham, H. Mannila, "Random projection in dimensionality reduction: applications to image
    and text data", in Proc. SIGKDD, pp. 245-250, 2001.
[18] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, R. Harshman, "Indexing by latent
    semantic analysis", Journal of the American Society for Information Science, Vol. 41(6),
    pp. 391-407, 1990.
[19] R. E. Bellman, Rand Corporation, Dynamic Programming, Princeton University Press,
    ISBN 978-0-691-07951-6; republished 2003.
[20] W. Wong, A. Fu, "Incremental Document Clustering for Web Page Classification", Int. Conf. on
    Information Society in the 21st Century: Emerging Technologies and New Challenges (IS2000),
    Nov. 5-8, 2000, Japan.
[21] T. Bailey, A. K. Jain, "A note on distance-weighted k-nearest neighbor rules", IEEE Trans.
    Systems, Man, Cybernetics, Vol. 8, pp. 311-313, 1978.
[22] G. W. Adamson, J. Boreham, "The use of an association measure based on character structure to
    identify semantically related pairs of words and document titles", Information Storage and
    Retrieval, pp. 253-260, 1974.
[23] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals",
    Soviet Physics Doklady, Vol. 10, pp. 707-710, 1967.
[24] S. A. Dudani, "The distance-weighted k-nearest neighbor rule", IEEE Transactions on Systems,
    Man, and Cybernetics, 6 (1976), pp. 325-327.
[25] D. D. Lewis, "Test Collections, Reuters-21578", http://www.daviddlewis.com/resources/
    testcollections/reuters21578.
[26] A. Cardoso-Cachopo, "Datasets for single-label text categorization", http://web.ist.utl.pt/
    acardoso/datasets.
Authors
Ramachandra Rao Kurada is currently working as Asst. Prof. in the Department of Computer Applications
at Shri Vishnu Engineering College for Women, Bhimavaram. He has 12 years of teaching experience and
is a part-time Research Scholar in the Dept. of CSE, ANU, Guntur under the guidance of Dr. K. Karteeka
Pavan. He is a life member of CSI and ISTE. His research interests are Computational Intelligence, Data
Warehousing & Mining, Networking and Security.
Dr. K. Karteeka Pavan received her PhD in Computer Science & Engg. from Acharya Nagarjuna
University in 2011. Earlier she received her postgraduate degree in Computer Applications from
Andhra University in 1996. She has 16 years of teaching experience and is currently working as
Professor in the Department of Information Technology of RVR & JC College of Engineering, Guntur.
She has published more than 20 research publications in various International Journals and Conferences.
Her research interests include Soft Computing, Bioinformatics, Data Mining, and Pattern Recognition.