Owing to its pivotal role in ubiquitous health monitoring,
physical activity recognition with wearable body sensors has been in the
limelight in both the research and industrial communities. Physical activity
recognition is difficult due to the inherent complexity of different
walking styles and human body movements. We therefore present a
correntropy induced dictionary pair learning framework to achieve this
recognition. Our algorithm for this framework jointly learns a synthesis
dictionary and an analysis dictionary in order to simultaneously perform
signal representation and classification once the time-domain features
have been extracted. In particular, the dictionary pair learning algorithm
is developed based on the maximum correntropy criterion, which
is much less sensitive to outliers. In order to develop a more tractable
and practical approach, we employ a combination of alternating direction
method of multipliers and an iteratively reweighted method to approximately
minimize the objective function. We validate the effectiveness of
our proposed model by employing it on an activity recognition problem
and an intensity estimation problem, both of which include a large number
of physical activities from the recently released PAMAP2 dataset.
Experimental results indicate that classifiers built using this correntropy
induced dictionary learning based framework achieve high accuracy by
using simple features, and that this approach gives results competitive
with classical systems built upon features with prior knowledge.
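The maximum correntropy criterion can be optimized by iteratively reweighted least squares: residuals are reweighted by a Gaussian kernel, so outliers receive near-zero weight. The sketch below shows this IRLS idea on a plain regression problem; it is an illustrative simplification, not the paper's full dictionary pair learning objective, and the kernel width `sigma` is an assumed tuning parameter.

```python
import numpy as np

def correntropy_irls(A, b, sigma=2.0, iters=20):
    """Robust fit under the maximum correntropy criterion via IRLS.

    Each pass solves a weighted least-squares problem whose weights
    w_i = exp(-r_i^2 / (2 sigma^2)) shrink toward zero for large
    residuals, so outliers barely influence the solution."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = A @ x - b
        w = np.exp(-r**2 / (2 * sigma**2))
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# One gross outlier barely shifts the correntropy-based fit.
rng = np.random.default_rng(0)
A = np.c_[np.ones(50), rng.normal(size=50)]
b = A @ np.array([1.0, 2.0]) + 0.01 * rng.normal(size=50)
b[0] += 100.0                      # inject an outlier
x = correntropy_irls(A, b)
```

An ordinary least-squares fit on the same data would be pulled visibly toward the corrupted point; the reweighting suppresses it.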
Classification of MR medical images Based Rough-Fuzzy KMeans (IOSRJM)
The document summarizes a proposed algorithm for classifying MR medical images using Rough-Fuzzy K-Means (FRKM). It begins with an introduction to the challenges of medical image classification and a literature review of previous techniques. It then provides background on rough set theory, fuzzy set theory, and K-means clustering. The proposed FRKM algorithm is described as using rough set theory for feature selection and dimensionality reduction, followed by a K-means clustering with probabilities assigned based on rough set approximations to classify ambiguous areas. Experimental results show the FRKM approach achieves 94.4% accuracy, higher than other techniques.
This document summarizes a research paper that evaluates cluster quality using a modified density subspace clustering approach. It discusses how density subspace clustering can be used to identify clusters in high-dimensional datasets by detecting density-connected clusters in all subspaces. The proposed approach uses a density subspace clustering algorithm to select attribute subsets and identify the best clusters. It then calculates intra-cluster and inter-cluster distances to evaluate cluster quality and compares the results to other clustering algorithms in terms of accuracy and runtime. Experimental results showed that the proposed method improves clustering quality and performs faster than existing techniques.
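The intra-/inter-cluster evaluation mentioned above can be illustrated with a generic implementation: average distance of points to their own centroid versus average distance between centroids. This is a common-form sketch, not the paper's exact formulas.

```python
import numpy as np

def cluster_quality(X, labels):
    """Return (intra, inter): mean within-cluster spread and mean
    pairwise distance between cluster centroids. Lower intra and
    higher inter indicate better-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    intra = np.mean([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ks, cents)])
    # mean pairwise distance between centroids
    pair = [np.linalg.norm(a - b) for i, a in enumerate(cents)
            for b in cents[i + 1:]]
    inter = float(np.mean(pair))
    return intra, inter

# Two well-separated blobs: intra spread far smaller than inter distance.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(5, 0.2, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
intra, inter = cluster_quality(X, labels)
```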
Mr image compression based on selection of mother wavelet and lifting based w... (ijma)
Magnetic Resonance (MR) imaging is a medical imaging technique that requires enormous data to be stored
and transmitted for high-quality diagnostic applications. Various algorithms have been proposed to improve
the performance of the compression scheme. In this paper we extend commonly used image compression
algorithms and compare their performance. For the image compression technique, we have linked different
wavelet techniques using traditional mother wavelets and lifting based Cohen-Daubechies-Feauveau
wavelets with the low-pass filters of the length 9 and 7 (CDF 9/7) wavelet transform with Set Partition in
Hierarchical Trees (SPIHT) algorithm. A novel image quality index that highlights the shape of the
histogram of the target image is introduced to assess image compression quality. The index can be used in
place of the existing traditional Universal Image Quality Index (UIQI) “in one go”. It offers extra
information about the distortion between an original image and a compressed image in comparison with UIQI. The proposed
index is designed based on modelling image compression as combinations of four major factors: loss of
correlation, luminance distortion, contrast distortion and shape distortion. This index is easy to calculate
and applicable in various image processing applications. One of our contributions is to demonstrate that
the choice of mother wavelet is very important for achieving superior wavelet compression performance
based on the proposed image quality indexes. Experimental results show that the proposed image quality
index plays a significant role in the quality evaluation of image compression on the open-source “BrainWeb:
Simulated Brain Database (SBD)”.
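The proposed index builds on the classical UIQI, which combines loss of correlation, luminance distortion and contrast distortion into one closed form. A minimal sketch of the standard UIQI follows; the paper's fourth, histogram-shape term is not reproduced here.

```python
import numpy as np

def uiqi(x, y):
    """Universal Image Quality Index (Wang & Bovik): the product of a
    correlation term, a luminance term and a contrast term, written in
    its combined closed form. Q = 1 only for identical images."""
    x, y = x.astype(float).ravel(), y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return 4 * cxy * mx * my / ((vx + vy) * (mx**2 + my**2))

img = np.arange(64.0).reshape(8, 8)
q_same = uiqi(img, img)          # identical images
q_shift = uiqi(img, img + 5.0)   # a brightness shift lowers the index
```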
SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability u... (cscpconf)
Improving the accuracy of supervised classification algorithms in biomedical applications,
especially CADx, is an active area of research. This paper proposes the construction of a rotation
forest (RF) ensemble using 20 learners over two clinical datasets namely lymphography and
backache. We propose a new feature selection strategy based on support vector machines
optimized by particle swarm optimization to find a relevant, minimal feature subset and obtain
higher ensemble accuracy. We quantitatively analyzed the 20 base learners over the two
datasets and carried out the experiments with 10-fold cross-validation and a leave-one-out strategy;
the performance of the 20 classifiers is evaluated using metrics namely accuracy
(acc), kappa value (K), root mean square error (RMSE) and area under the receiver operating
characteristic curve (ROC). Base classifiers achieved 79.96% and 81.71% average accuracies
for lymphography & backache datasets respectively. As for RF ensembles, they produced
average accuracies of 83.72% & 85.77% for respective diseases. The paper presents promising
results using RF ensembles and provides a new direction towards construction of reliable and robust medical diagnosis systems.
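Binary particle swarm optimization for feature selection encodes each particle as a 0/1 mask over features and moves it with a sigmoid transfer function. The sketch below uses a toy fitness in place of the paper's SVM-based score; the particle count, inertia and acceleration constants are illustrative assumptions.

```python
import numpy as np

def binary_pso_select(fitness, n_feats, n_particles=12, iters=40, seed=3):
    """Binary PSO over feature masks: velocities follow the usual
    pbest/gbest update, and a sigmoid of the velocity gives the
    probability that each bit is set (a generic sketch; the paper
    scores masks with a support vector machine instead)."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, (n_particles, n_feats)).astype(float)
    vel = np.zeros((n_particles, n_feats))
    pbest, pscore = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pscore)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))         # sigmoid transfer
        pos = (rng.random(pos.shape) < prob).astype(float)
        score = np.array([fitness(p) for p in pos])
        better = score > pscore
        pbest[better], pscore[better] = pos[better], score[better]
        gbest = pbest[np.argmax(pscore)].copy()
    return gbest

# Toy fitness: reward masks close to the three "relevant" features.
relevant = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
def fitness(mask):
    return -np.abs(mask - relevant).sum()

best = binary_pso_select(fitness, n_feats=8)
```

In the paper's setting the fitness would be cross-validated SVM accuracy of the selected subset, which makes each evaluation far more expensive than this toy score.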
Improved probabilistic distance based locality preserving projections method ... (IJECEIAES)
In this paper, dimensionality reduction in large datasets is achieved using the proposed distance based Non-integer Matrix Factorization (NMF) technique, which is intended to solve the data dimensionality problem. Here, NMF and distance measurement aim to resolve the non-orthogonality problem caused by increased dataset dimensionality. The method initially partitions the datasets, organizes them into a defined geometric structure, and avoids capturing the dataset structure through a distance based similarity measurement. The proposed method is designed to fit dynamic datasets and includes the intrinsic structure using data geometry. The complexity of the data is thereby further reduced using an Improved Distance based Locality Preserving Projection. The proposed method is evaluated against existing methods in terms of accuracy, average accuracy, mutual information and average mutual information.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
Novel text categorization by amalgamation of augmented k nearest neighbourhoo... (ijcsity)
Machine learning for text classification is the underpinning of document cataloging, news filtering, document steering and exemplification. In the text mining realm, effective feature selection is significant to make the learning task more accurate and competent. One of the traditional lazy text classifiers, k-Nearest Neighborhood (kNN), has a major pitfall in calculating the similarity between all the objects in the training and testing sets, which thereby leads to exaggeration of both the computational complexity of the algorithm and massive consumption of main memory. To diminish these shortcomings from the viewpoint of a data-mining practitioner, an amalgamative technique is proposed in this paper using a novel restructured version of kNN called Augmented kNN (AkNN) and k-Medoids (kMdd) clustering. The proposed work preprocesses the initial training set by imposing attribute feature selection to reduce high dimensionality; it also detects and excludes the high-flier samples in the initial training set and restructures a constricted training set. The kMdd clustering algorithm generates the cluster centers (as interior objects) for each category and restructures the constricted training set with centroids. This technique is amalgamated with an AkNN classifier that was prearranged with text mining similarity measures. Eventually, significant weights and ranks were assigned to each object in the new training set based upon their accessory towards the objects in the testing set. Experiments conducted on Reuters-21578, a UCI benchmark text mining dataset, and comparisons with the traditional kNN classifier designate that the referred method yields preeminent recital in both clustering and classification.
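The core idea of condensing a training set to per-class medoids before nearest-neighbour lookup can be sketched in a few lines. This toy version uses Euclidean distance and 1-NN; the paper's weighted/ranked AkNN scheme and text similarity measures are not reproduced.

```python
import numpy as np

def medoid(points):
    """The medoid is the member with the smallest total distance to the
    other members, i.e. the 'interior object' used as a cluster center."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[np.argmin(d.sum(axis=1))]

def classify(query, centers, center_labels):
    """1-NN over the condensed set of medoids."""
    d = np.linalg.norm(centers - query, axis=1)
    return center_labels[int(np.argmin(d))]

A = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2]])   # class "a"
B = np.array([[5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])   # class "b"
centers = np.array([medoid(A), medoid(B)])
pred = classify(np.array([4.9, 5.0]), centers, ["a", "b"])
```

Classifying against two medoids instead of six training points is what shrinks the similarity computations and memory footprint that the abstract criticizes in plain kNN.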
The document provides a literature review of different clustering techniques. It begins by defining clustering and its applications. It then categorizes and describes several clustering methods including hierarchical (BIRCH, CURE, CHAMELEON), partitioning (k-means, k-medoids), density-based (DBSCAN, OPTICS, DENCLUE), grid-based (CLIQUE, STING, MAFIA), and model-based (RBMN, SOM) methods. For each method, it discusses the algorithm, advantages, disadvantages and time complexity. The document aims to provide an overview of various clustering techniques for classification and comparison.
IJERA (International Journal of Engineering Research and Applications) is an international, online, peer-reviewed journal.
This document summarizes a research paper on using a k-means clustering method to detect brain tumors in MRI images. The paper introduces brain tumors and MRI imaging. It then describes using k-means clustering for tumor segmentation, which groups similar image patterns into clusters to identify the tumor region. The paper presents results of applying k-means to two MRI images, including statistical measures of segmentation accuracy, tumor area comparison, and timing. The k-means method achieved average rand index of 0.8358, low average errors, and tumor areas close to manual segmentation in under 3 seconds, demonstrating potential for accurate and efficient brain tumor detection.
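The intensity-grouping step described above is plain Lloyd's k-means applied to pixel values. A minimal sketch follows; the paper's pre-processing, features and choice of k are not reproduced, and the quantile-based initialization is an assumption made here to avoid empty clusters.

```python
import numpy as np

def kmeans_intensity(pixels, k=3, iters=30):
    """Lloyd's k-means on 1-D pixel intensities: assign each pixel to
    its nearest center, then move each center to the mean of its
    assigned pixels. Centers start at spread-out quantiles."""
    centers = np.quantile(pixels, np.linspace(0.05, 0.95, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels, centers

# Synthetic scan: dark background, mid-gray tissue, a bright lesion.
img = np.concatenate([np.full(100, 10.0), np.full(80, 120.0), np.full(20, 240.0)])
labels, centers = kmeans_intensity(img)
```

On a real MRI slice the brightest cluster would then be post-processed (e.g. by area and location) to isolate the tumor region.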
Reduct generation for the incremental data using rough set theory (csandit)
In today’s changing world a huge amount of data is generated and transferred frequently.
Although the data is sometimes static, most commonly it is dynamic and transactional. New
data that is being generated is constantly added to the old/existing data. To discover
knowledge from this incremental data, one approach is to run the algorithm repeatedly on the
modified data sets, which is time consuming. The paper proposes a dimension reduction
algorithm that can be applied in a dynamic environment for generation of a reduced attribute set as
a dynamic reduct. The method analyzes the new dataset, when it becomes available, and modifies
the reduct accordingly to fit the entire dataset. The concepts of discernibility relation, attribute
dependency and attribute significance of Rough Set Theory are integrated for the generation of
the dynamic reduct set, which not only reduces the complexity but also helps to achieve higher
accuracy of the decision system. The proposed method has been applied on a few benchmark
datasets collected from the UCI repository and a dynamic reduct is computed. Experimental
results show the efficiency of the proposed method.
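The attribute-dependency notion from Rough Set Theory mentioned above measures what fraction of objects can be classified unambiguously from a set of condition attributes (the positive region). A small sketch of that measure follows; it illustrates the dependency degree only, not the paper's incremental dynamic-reduct algorithm.

```python
from collections import defaultdict

def dependency(rows, cond, decision):
    """Rough-set dependency gamma(C, D): the fraction of objects whose
    values on the condition attributes C determine the decision
    attribute uniquely (i.e. objects in the positive region)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in cond)].append(row[decision])
    consistent = sum(len(v) for v in groups.values() if len(set(v)) == 1)
    return consistent / len(rows)

# Two objects share condition values (0, 1) but disagree on the
# decision, so only half the table lies in the positive region.
table = [(0, 0, "a"), (0, 0, "a"), (0, 1, "b"), (0, 1, "a")]
gamma = dependency(table, cond=(0, 1), decision=2)
```

A reduct is then a minimal attribute subset whose dependency equals that of the full condition set; attribute significance compares gamma with and without each attribute.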
A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data (ijscai)
This document summarizes a research paper that proposes using a binary bat algorithm to classify breast cancer data. The researchers developed a hybrid model combining a binary bat algorithm with a feedforward neural network. The binary bat algorithm was used to generate an activation function for training the neural network and to minimize error. Testing of the model on three breast cancer datasets produced an accuracy of 92.61% on training data and 89.95% on testing data, showing potential for classifying breast cancer as malignant or benign.
The document describes a new algorithm for long-term robust visual tracking of moving objects in video sequences. The algorithm aims to overcome challenges such as geometric deformations, partial or total occlusions, and recovery after the target leaves the field of vision. It does not rely on a probabilistic process or require prior detection data. Experimental results on difficult video sequences demonstrate advantages over recent trackers. The algorithm can be used in applications like video surveillance, active vision, and industrial visual servoing.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high
dimensional data. Many significant subspace clustering algorithms exist, each having different
characteristics caused by the use of different techniques, assumptions and heuristics. A comprehensive
classification scheme is essential which will consider all such characteristics to divide subspace clustering
approaches in various families. The algorithms belonging to same family will satisfy common
characteristics. Such a categorization will help future developers to better understand the quality criteria to
be used and similar algorithms to be used to compare results with their proposed clustering algorithms. In
this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms’ Family).
Characteristics of SCAF are based on classes such as cluster orientation, overlap of dimensions, etc.
As an illustration, we further provide a comprehensive, systematic description and comparison of a few
significant algorithms belonging to the “Axis parallel, overlapping, density based” SCAF.
Multi fractal analysis of human brain mr image (eSAT Journals)
Abstract: In computer programming, a code smell may be the origin of latent problems in source code. Detecting and resolving bad smells remains time-intensive for software engineers despite proposals for bad-smell detection and refactoring tools. Numerous code smells have been recognized, yet the sequence in which the detection and resolution of different kinds of code smells should be performed remains open, because software engineers do not know how to optimize the sequence. In this paper, a novel refactoring approach is proposed to improve the performance of programs. In the recommended approach the code smells are automatically detected and refactored. The simulation results show that a reduction of time over semi-automated refactoring is achieved when code smells are refactored using multi-step automated refactoring. Keywords: code smell, multi-step refactoring, detection, code resolution, restructuring, etc.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology.
Geometric Correction for Braille Document Images (csandit)
Image processing is an important research area in computer vision. Clustering is an unsupervised
study and can also be used for image segmentation, for which many methods exist. Image
segmentation plays an important role in image analysis; it is one of the first
and most important tasks in image analysis and computer vision. This proposed system
presents a variation of the fuzzy c-means algorithm that provides image clustering. The kernel fuzzy
c-means clustering algorithm (KFCM) is derived from the fuzzy c-means clustering
algorithm (FCM). The KFCM algorithm provides image clustering and improves accuracy
significantly compared with the classical fuzzy c-means algorithm. The new algorithm is called the
Gaussian kernel based fuzzy c-means clustering algorithm (GKFCM); the major characteristic of
GKFCM is the use of a fuzzy clustering approach aiming to guarantee noise insensitiveness and
image-detail preservation. The objective of the work is to cluster the low-intensity inhomogeneity
areas from noisy images using the clustering method, and to segment that portion separately using a
content level set approach. The purpose of designing this system is to produce better segmentation
results for images corrupted by noise, so that it can be useful in various fields like medical image
analysis, such as tumor detection, study of anatomical structure, and treatment planning.
International Journal of Engineering and Science Invention (IJESI)
This document discusses multidimensional clustering methods for data mining and their industrial applications. It begins with an introduction to clustering, including definitions and goals. Popular clustering algorithms are described, such as K-means, fuzzy C-means, hierarchical clustering, and mixture of Gaussians. Distance measures and their importance in clustering are covered. The K-means and fuzzy C-means algorithms are explained in detail. Examples are provided to illustrate fuzzy C-means clustering. Finally, applications of clustering algorithms in fields such as marketing, biology, and earth sciences are mentioned.
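The fuzzy C-means algorithm explained above alternates two updates: cluster centers are fuzzy-weighted means, and memberships depend on relative distances to all centers. A minimal sketch of standard FCM with squared Euclidean distance follows; seeding the centers from one point at each end of the data is an illustrative assumption (k-means++-style seeding is common in practice).

```python
import numpy as np

def fuzzy_c_means(X, V, m=2.0, iters=50):
    """Standard FCM iteration: membership update
    u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), then center update
    v_i = sum_k u_ik^m x_k / sum_k u_ik^m. V holds initial centers."""
    V = np.asarray(V, dtype=float)
    for _ in range(iters):
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return V, U

# Two synthetic blobs; crude init with the first and last data point.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(5, 0.2, (40, 2))])
V, U = fuzzy_c_means(X, V=X[[0, -1]])
```

Unlike k-means, every point retains a graded membership in every cluster; the fuzzifier m controls how soft those memberships are.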
Ensemble based Distributed K-Modes Clustering (IJERD Editor)
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
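Huang's two ingredients mentioned above, the simple-matching dissimilarity and the frequency-based mode update, can be sketched directly. This shows plain k-modes building blocks only, not the distributed ensemble variant proposed in the paper.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Huang's simple-matching dissimilarity for categorical records:
    the number of attributes on which the two records differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(cluster):
    """Frequency-based mode update: per attribute, take the most
    frequent value among the cluster's records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = mode_of(cluster)
d = matching_dissimilarity(("blue", "large"), mode)
```

Replacing means with modes and Euclidean distance with this count is what lets the k-means scheme handle purely categorical attributes.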
The document summarizes research on medical image segmentation algorithms. It discusses k-means clustering, fuzzy c-means clustering, and proposes enhancements to these algorithms. Specifically, it introduces an enhanced k-means algorithm that improves initial cluster center selection. It also presents a kernelized fuzzy c-means approach that maps data points into a feature space to perform clustering. The algorithms are tested on MRI brain images and evaluated based on segmentation accuracy. The enhanced methods aim to produce more precise segmentations for medical applications such as diagnosis and treatment planning.
Mine Blood Donors Information through Improved K-Means Clustering (ijcsity)
The number of accidents and health diseases, which are increasing at an alarming rate, is resulting in a huge increase in the demand for blood. There is a need for organized analysis of blood donor databases and blood bank repositories. Clustering analysis is one of the data mining applications, and the K-means clustering algorithm is fundamental to modern clustering techniques. K-means is a traditional, iterative algorithm: at every iteration, it computes the distance from the centroid of each cluster to each and every data point. This paper improves the original K-means algorithm by improving the initial centroids using the distribution of the data. Results and discussion show that the improved K-means algorithm produces accurate clusters in less computation time for finding donor information.
This document summarizes four techniques used to extract brain tumor regions from MRI images: 1) Gray level stretching and Sobel edge detection, 2) K-Means clustering based on location and intensity, 3) Fuzzy C-Means clustering, and 4) an adapted K-Means and Fuzzy C-Means technique. The techniques were able to successfully detect and extract brain tumors, which helps doctors identify tumor size and location. Clustering algorithms like K-Means and Fuzzy C-Means were used to segment MRI images into clusters representing different tissue types to identify tumor regions.
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL... (IJITCA Journal)
In this paper, we apply a dynamic model for manoeuvring targets within the SIR particle filter algorithm to improve the tracking accuracy of multiple manoeuvring targets. In our approach, a color distribution model is used to detect changes in the target's model, and the approach controls the deformation of that model: if the deformation exceeds a predetermined threshold, the model is updated. The Global Nearest Neighbor (GNN) algorithm is used for data association. We name the proposed method the Deformation Detection Particle Filter (DDPF) and compare it with the basic SIR-PF algorithm on real airshow videos. The comparison shows that the basic SIR-PF algorithm cannot track manoeuvring targets when rotation or scaling occurs in the target's model, whereas DDPF updates the target's model in those cases and thus tracks manoeuvring targets more efficiently and accurately.
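As a rough illustration of the model-update step described above (the exact color model and threshold used in DDPF are not given in this abstract), one common choice is a Bhattacharyya distance between normalized histograms of the reference model and the current tracked patch:

```python
import numpy as np

def color_hist(patch, bins=8):
    # Normalized intensity histogram standing in for the color model.
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def bhattacharyya_distance(p, q):
    return float(np.sqrt(max(0.0, 1.0 - float(np.sum(np.sqrt(p * q))))))

def maybe_update_model(ref_hist, patch, threshold=0.3):
    # If the tracked patch's distribution drifted past the threshold
    # (e.g. the target rotated or was rescaled), adopt the patch as the
    # new reference model; otherwise keep the existing model.
    new_hist = color_hist(patch)
    if bhattacharyya_distance(ref_hist, new_hist) > threshold:
        return new_hist, True
    return ref_hist, False
```

The threshold value here is illustrative; in practice it would be tuned so that gradual appearance drift triggers an update while brief occlusions do not.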
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...IJECEIAES
Data analysis plays a prominent role in interpreting various phenomena, and data mining is the process of extracting useful knowledge from extensive data; based on classical statistical prototypes, data can be exploited beyond mere storage and management. Cluster analysis, a primary investigation conducted with little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles blend individual solutions obtained from different clusterings to produce a final, higher-quality clustering, which is required in many applications; the method arose from the need for greater robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions used in cluster ensembles, and surveys and analyzes the various techniques and cluster ensemble methods.
This document provides a literature review of deep reinforcement learning applications in medical imaging. It begins with introducing deep reinforcement learning and reinforcement learning concepts. It then discusses several medical imaging applications of deep reinforcement learning, including landmark detection, image registration, object lesion localization and detection, view plane localization, and plaque tracking. Deep reinforcement learning has also been used for optimization tasks in medical imaging like hyperparameter tuning, data augmentation, and neural architecture search. While promising results have been shown, the document notes that deep reinforcement learning has not been fully utilized to meet clinical image segmentation and classification requirements.
Centralized Class Specific Dictionary Learning for wearable sensors based phy...Sherin Mathews
With recent progress in pervasive healthcare,
physical activity recognition with wearable body sensors has
become an important and challenging area in both research and
industrial communities. Here, we present a novel technique for
a sensor platform that performs physical activity recognition by
leveraging a class specific regularizer term into the dictionary
pair learning objective function. The proposed algorithm jointly
learns a synthesis dictionary and an analysis dictionary in
order to simultaneously perform signal representation and
classification once the time-domain features have been extracted.
Specifically, the class specific regularizer term ensures that
sparse codes belonging to the same class are concentrated,
which proves beneficial for the classification stage. To
develop a more practical approach, we employ a combination of
the alternating direction method of multipliers and an
l1-regularized least squares (l1-ls) minimization method to
approximately minimize the objective function. We validate the
effectiveness of the proposed model on an activity recognition
problem and an intensity estimation problem, both of which
include a large number of physical activities. Experimental
results demonstrate that classifiers built in this dictionary
learning based framework outperform state-of-the-art
algorithms using only simple features, achieving results
competitive with classical systems built upon features
designed with prior knowledge.
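A practical appeal of dictionary pair learning is that classification at test time is cheap: the analysis dictionary produces the code by a single matrix product, with no sparse-coding optimization. A minimal sketch of this decision rule, with hand-made toy dictionaries standing in for the learned per-class pairs:

```python
import numpy as np

def dpl_classify(x, D_list, P_list):
    # The analysis dictionary P_k gives the code P_k @ x directly; the
    # synthesis dictionary D_k reconstructs the signal from that code.
    # The label is the class with the smallest reconstruction residual.
    residuals = [np.linalg.norm(x - D @ (P @ x)) for D, P in zip(D_list, P_list)]
    return int(np.argmin(residuals))
```

In the actual framework the pairs (D_k, P_k) come from the joint learning stage; the toy dictionaries in the test simply project onto different axes to make the residual comparison visible.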
Centralized Class Specific Dictionary Learning for Wearable Sensors based phy...sherinmm
This document proposes a centralized class specific dictionary learning framework for physical activity recognition using wearable sensors. It incorporates a class specific regularizer term into the dictionary pair learning objective function to make sparse codes belonging to the same class more concentrated. This improves classification performance. The framework involves learning a synthesis dictionary and analysis dictionary jointly. It extracts time-domain features from acceleration and heart rate sensor data. Experimental results on activity recognition datasets demonstrate the framework outperforms state-of-the-art methods using only simple features, achieving competitive results to approaches using more complex, domain-specific features.
Wearable sensor-based human activity recognition with ensemble learning: a co...IJECEIAES
The spectacular growth of wearable sensors has provided a key contribution to the field of human activity recognition. Due to its effective and versatile usage and application in various fields such as smart homes and medical areas, human activity recognition has always been an appealing research topic in artificial intelligence. From this perspective, there are a lot of existing works that make use of accelerometer and gyroscope sensor data for recognizing human activities. This paper presents a comparative study of ensemble learning methods for human activity recognition. The methods include random forest, adaptive boosting, gradient boosting, extreme gradient boosting, and light gradient boosting machine (LightGBM). Among the ensemble learning methods in comparison, light gradient boosting machine and random forest demonstrate the best performance. The experimental results revealed that light gradient boosting machine yields the highest accuracy of 94.50% on UCI-HAR dataset and 100% on single accelerometer dataset while random forest records the highest accuracy of 93.41% on motion sense dataset.
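A minimal version of such an ensemble comparison can be sketched with scikit-learn's built-in ensembles. LightGBM and XGBoost are third-party packages, so gradient boosting stands in for them here, and the dataset is synthetic rather than UCI-HAR:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a sensor-feature activity dataset.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))
```

The same loop structure extends directly to AdaBoost, XGBoost, or LightGBM by appending their classifiers to the list.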
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
A comparison study of algorithms is very much required before implementing them for the needs of any organization. Comparisons of algorithms depend on various parameters such as data frequency, types of data, and relationships among the attributes in a given data set. A number of learning and classification algorithms are available to analyse data, learn patterns, and categorize data, but the problem is to find the best algorithm for a given problem and desired output. The desired result has always been higher accuracy in predicting future values or events from the given dataset. The algorithms taken for this comparison study are Neural Net, SVM, Naïve Bayes, BFT, and Decision Stump. These are among the most influential data mining algorithms in the research community and are widely used in the field of knowledge discovery and data mining.
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST...cscpconf
The growing population of elders in society calls for new approaches to caregiving. By inferring which activities elderly people are performing in their homes, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely soft-margin Support Vector Machines (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in the activity recognition field, which is known to hinder the learning performance of classifiers. Cost-sensitive learning is attractive under most imbalanced circumstances, but it is difficult to determine precise misclassification costs in practice. We introduce a new criterion for selecting a suitable cost parameter C for the C-SVM method. Through our evaluation on four real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms state-of-the-art discriminative methods in activity recognition.
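The paper's criterion for choosing the cost parameter is its contribution and is not reproduced in the abstract, but the underlying mechanism of cost-sensitive C-SVM, per-class misclassification costs, can be sketched with scikit-learn. The toy data and the 19:1 weight ratio below are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced toy data: 190 samples of a common activity, 10 of a rare one.
X = np.vstack([rng.normal(0.0, 1.0, (190, 2)), rng.normal(3.0, 1.0, (10, 2))])
y = np.array([0] * 190 + [1] * 10)

# class_weight rescales the misclassification cost C per class, so the
# rare activity is not swamped by the majority class; here the ratio
# simply mirrors the class frequencies.
clf = SVC(C=1.0, class_weight={0: 1.0, 1: 19.0}).fit(X, y)
```

Selecting the base cost C well (the problem the paper's criterion targets) then determines how aggressively both classes trade margin width against training errors.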
This document summarizes a research paper on using a k-means clustering method to detect brain tumors in MRI images. The paper introduces brain tumors and MRI imaging. It then describes using k-means clustering for tumor segmentation, which groups similar image patterns into clusters to identify the tumor region. The paper presents results of applying k-means to two MRI images, including statistical measures of segmentation accuracy, tumor area comparison, and timing. The k-means method achieved average rand index of 0.8358, low average errors, and tumor areas close to manual segmentation in under 3 seconds, demonstrating potential for accurate and efficient brain tumor detection.
Reduct generation for the incremental data using rough set theorycsandit
In today's changing world a huge amount of data is generated and transferred frequently. Although the data is sometimes static, most commonly it is dynamic and transactional: newly generated data is constantly added to the old/existing data. To discover knowledge from this incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time consuming. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set as a dynamic reduct. The method analyzes the new dataset when it becomes available and modifies the reduct accordingly to fit the entire dataset. The concepts of discernibility relation, attribute dependency and attribute significance from Rough Set Theory are integrated for the generation of the dynamic reduct set, which not only reduces the complexity but also helps achieve higher accuracy of the decision system. The proposed method has been applied to a few benchmark datasets collected from the UCI repository and a dynamic reduct has been computed. Experimental results show the efficiency of the proposed method.
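The attribute-dependency measure that reduct generation builds on can be sketched directly from the rough set definitions: an attribute subset C determines the decision D to the degree gamma(C, D) = |POS_C(D)| / |U|, where the positive region collects all objects whose C-equivalence class is decision-consistent. A minimal sketch:

```python
from collections import defaultdict

def dependency(rows, cond_idx, dec_idx):
    # gamma(C, D): fraction of objects whose condition-attribute
    # equivalence class maps to a single decision value (the positive
    # region), divided by the total number of objects |U|.
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[i] for i in cond_idx)].append(r[dec_idx])
    pos = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return pos / len(rows)
```

A reduct is then a minimal condition-attribute subset whose dependency equals that of the full attribute set; the incremental method in the paper adjusts such a subset as new rows arrive instead of recomputing it from scratch.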
A Binary Bat Inspired Algorithm for the Classification of Breast Cancer Data ijscai
This document summarizes a research paper that proposes using a binary bat algorithm to classify breast cancer data. The researchers developed a hybrid model combining a binary bat algorithm and a feedforward neural network, with the binary bat algorithm used to generate an activation function for training the network and to minimize error. Testing the model on three breast cancer datasets produced an accuracy of 92.61% on training data and 89.95% on testing data, showing potential for classifying breast cancer as malignant or benign.
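The summary does not detail the hybrid model, but the binary bat optimizer at its core can be sketched generically: a random frequency drives real-valued velocities toward the best-known bit string, and a sigmoid transfer function converts velocities into bit probabilities. All parameter values below are illustrative:

```python
import numpy as np

def binary_bat(fitness, dim, n_bats=20, iters=60, fmin=0.0, fmax=2.0, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, (n_bats, dim))
    vel = np.zeros((n_bats, dim))
    fits = np.array([fitness(p) for p in pos])
    best, best_fit = pos[np.argmax(fits)].copy(), fits.max()
    for _ in range(iters):
        # Random frequency pulls velocities toward the best solution.
        freq = fmin + (fmax - fmin) * rng.random((n_bats, 1))
        vel = np.clip(vel + (pos - best) * freq, -6.0, 6.0)
        # Sigmoid transfer function maps velocity to bit probability.
        pos = (rng.random((n_bats, dim)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fits = np.array([fitness(p) for p in pos])
        if fits.max() > best_fit:
            best, best_fit = pos[np.argmax(fits)].copy(), fits.max()
    return best, best_fit
```

In the paper's setting the fitness would score a candidate network configuration by its training error; the OneMax objective in the test below is only a stand-in.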
The document describes a new algorithm for long-term robust visual tracking of moving objects in video sequences. The algorithm aims to overcome challenges such as geometric deformations, partial or total occlusions, and recovery after the target leaves the field of vision. It does not rely on a probabilistic process or require prior detection data. Experimental results on difficult video sequences demonstrate advantages over recent trackers. The algorithm can be used in applications like video surveillance, active vision, and industrial visual servoing.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics caused by the use of different techniques, assumptions, and heuristics. A comprehensive classification scheme is essential that considers all such characteristics to divide subspace clustering approaches into families; the algorithms belonging to the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to be used and which similar algorithms to compare against when proposing new clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the “Axis parallel, overlapping, density based” SCAF.
Multi fractal analysis of human brain mr imageeSAT Journals
Abstract: In computer programming, code smells may be the origin of latent problems in source code. Detecting and resolving bad smells remains time intensive for software engineers despite proposals for bad smell detection and refactoring tools. Numerous code smells have been recognized, yet software engineers do not know how to optimize the sequence in which different kinds of code smells should be detected and resolved. In this paper, a novel refactoring approach is proposed to improve the performance of programs: code smells are automatically detected and refactored. The simulation results show that a reduction in time over semi-automated refactoring is achieved when code smells are refactored using multi-step automated refactoring. Keywords: code smell, multi-step refactoring, detection, code resolution, restructuring
Geometric Correction for Braille Document Images csandit
Image processing is an important research area in computer vision. Clustering is an unsupervised technique that can also be used for image segmentation, for which many methods exist. Image segmentation plays an important role in image analysis; it is one of the first and most important tasks in image analysis and computer vision. The proposed system presents a variation of the fuzzy c-means algorithm for image clustering. The kernel fuzzy c-means clustering algorithm (KFCM) is derived from the fuzzy c-means clustering algorithm (FCM); it provides image clustering and improves accuracy significantly compared with the classical fuzzy c-means algorithm. The new algorithm, called the Gaussian kernel based fuzzy c-means clustering algorithm (GKFCM), is characterized by a fuzzy clustering approach that aims to guarantee noise insensitivity and preservation of image detail. The objective of this work is to cluster the low-intensity inhomogeneity areas of noisy images, segmenting those portions separately using a content level set approach. The purpose of the system is to produce better segmentation results for images corrupted by noise, making it useful in fields such as medical image analysis, e.g. tumor detection, study of anatomical structure, and treatment planning.
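A compact sketch of the kernelized update: for a Gaussian kernel the feature-space distance reduces to 2(1 − K(x, v)), which replaces the squared Euclidean distance in the classic FCM membership and center updates. This is the textbook KFCM form; the paper's GKFCM variant may add further terms (e.g. spatial regularization) not shown here:

```python
import numpy as np

def gkfcm(X, c, m=2.0, sigma=2.0, iters=50):
    # Deterministic seeding: pick c points spread across the dataset.
    V = X[np.linspace(0, len(X) - 1, c).astype(int)].astype(float)
    for _ in range(iters):
        # Feature-space distance for a Gaussian kernel: 2 * (1 - K(x, v)).
        K = np.exp(-((X[:, None] - V[None]) ** 2).sum(-1) / sigma ** 2)
        d2 = 2.0 * (1.0 - K) + 1e-12
        U = d2 ** (-1.0 / (m - 1.0))
        U = U / U.sum(axis=1, keepdims=True)      # fuzzy memberships
        W = (U ** m) * K                          # kernel-weighted memberships
        V = (W.T @ X) / W.sum(axis=0)[:, None]    # center update
    return U, V
```

Because the kernel weight K downplays points far from a center, outliers pull the centers far less than in plain FCM, which is the source of the claimed noise insensitivity.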
International Journal of Engineering and Science Invention (IJESI)inventionjournals
This document discusses multidimensional clustering methods for data mining and their industrial applications. It begins with an introduction to clustering, including definitions and goals. Popular clustering algorithms are described, such as K-means, fuzzy C-means, hierarchical clustering, and mixture of Gaussians. Distance measures and their importance in clustering are covered. The K-means and fuzzy C-means algorithms are explained in detail. Examples are provided to illustrate fuzzy C-means clustering. Finally, applications of clustering algorithms in fields such as marketing, biology, and earth sciences are mentioned.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering, in which distributed datasets are clustered without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it cannot directly handle datasets with categorical attributes, which occur frequently in real-life data. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure for categorical data; it replaces cluster means with a frequency-based method that updates modes during clustering to minimize the cost function. Most distributed clustering algorithms in the literature address numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to categorical data sets and performs the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. Experiments are carried out on various datasets from the UCI machine learning repository.
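Huang's two key ingredients, the simple-matching dissimilarity and the frequency-based mode update, can be sketched in a few lines (a minimal sketch of the per-iteration steps, not the full distributed ensemble):

```python
from collections import Counter

def mismatch(a, b):
    # Huang's simple-matching dissimilarity: count of attribute mismatches.
    return sum(x != y for x, y in zip(a, b))

def update_mode(records):
    # Frequency-based update: the most frequent category per attribute.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(records, modes):
    # Assign each record to the mode with the fewest mismatched attributes.
    return [min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records]
```

Alternating `assign` and `update_mode` until the modes stop changing gives the K-Modes iteration; the distributed variant runs this locally per site and combines the resulting clusterings in an ensemble step.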
The document summarizes research on medical image segmentation algorithms. It discusses k-means clustering, fuzzy c-means clustering, and proposes enhancements to these algorithms. Specifically, it introduces an enhanced k-means algorithm that improves initial cluster center selection. It also presents a kernelized fuzzy c-means approach that maps data points into a feature space to perform clustering. The algorithms are tested on MRI brain images and evaluated based on segmentation accuracy. The enhanced methods aim to produce more precise segmentations for medical applications such as diagnosis and treatment planning.
This document compares the performance of various machine learning and classification algorithms, including neural networks, support vector machines, Naive Bayes, decision trees, and decision stumps. It analyzes these algorithms using a dataset of annual and monthly temperature data from India over 1901-2012. The analysis is conducted in RapidMiner and finds that neural networks and support vector machines can effectively model complex nonlinear relationships to predict temperature. Neural networks achieved reasonably accurate predictions of annual temperature compared to the original data values. The document concludes by comparing the performance of the different algorithms.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
This document compares three classification methods - artificial neural networks, decision trees, and logistic regression - for predicting malignancy in thyroid tumor patients using a clinical dataset. It describes each method and applies them to a dataset of 259 thyroid tumor patients. The artificial neural network achieved 98% accuracy on the training set and 92% on the validation set. The decision tree method used 150 cases to build a model and achieved 86% accuracy. Logistic regression analysis resulted in 88% accuracy. The artificial neural network was able to accurately predict malignancy and identified important attributes like multiple nodules and family cancer history.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...IJDKP
Many applications of automatic document classification require learning accurately with little training
data. The semi-supervised classification technique uses labeled and unlabeled data for training. This
technique has been shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
On the other hand, the emergence of web technologies has originated the collaborative development of
ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency
of the semi-supervised document classification.
We used support vector machines, which is one of the most effective algorithms that have been studied for
text. Our algorithm enhances the performance of transductive support vector machines through the use of
ontologies. We report experimental results applying our algorithm to three different datasets. Our
experiments show an increase in accuracy of 4% on average, and up to 20%, in comparison with the
traditional semi-supervised model.
Noise-robust classification with hypergraph neural networknooriasukmaningtyas
This paper presents a novel version of the hypergraph neural network method, utilized to solve the noisy label learning problem. First, we apply PCA dimensionality reduction to the feature matrices of the image datasets, in order to reduce the noise and the redundant features and to reduce the runtime of constructing the hypergraph for the hypergraph neural network method. Then, the classic graph based semi-supervised learning method, the classic hypergraph based semi-supervised learning method, the graph neural network, the hypergraph neural network, and our proposed hypergraph neural network are employed to solve the noisy label learning problem, and the accuracies of these five methods are evaluated and compared. Experimental results show that the hypergraph neural network methods achieve the best performance as the noise level increases, and that they are at least as good as the graph neural network.
This document summarizes an empirical study comparing several supervised machine learning approaches for word sense disambiguation: Naive Bayes, decision tree, decision list, and support vector machine (SVM). The study used a dataset of 15 words annotated with senses from WordNet and Senseval-3. Each approach was implemented and evaluated on its accuracy in identifying the correct sense of each word. The results showed that the decision list approach achieved the highest overall accuracy of 69.12%, followed by naive Bayes at 58.32%, SVM at 56.11%, and decision tree at 45.14%. The study thus concluded that the decision list performed best on this dataset for word sense disambiguation.
An overlapping conscious relief-based feature subset selection method
Feature selection is considered a fundamental preprocessing step in various data mining and machine learning based works. The quality of features is essential to achieve good classification performance and to have a better data analysis experience. Among several feature selection methods, distance-based methods are gaining popularity because of their ability to capture feature interdependency and relevancy with the endpoints. However, most of the distance-based methods only rank the features and ignore class overlapping issues. Features with class overlapping data act as an obstacle during classification. Therefore, the objective of this research work is to propose a method named overlapping conscious MultiSURF (OMsurf) to handle data overlapping and select a subset of informative features, discarding the noisy ones. Experimental results over 20 benchmark datasets demonstrate the superiority of OMsurf over six existing state-of-the-art methods.
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
The document proposes a Particle Swarm Optimization (PSO) based ensemble classification model to improve classification of high-dimensional biomedical datasets. It develops an optimized PSO technique to select optimal features and initialize weights for base classifiers in the ensemble model. Experimental results on microarray datasets show the proposed model achieves higher accuracy, true positive rate, and lower error rate compared to traditional feature selection based classification models.
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
As the size of biomedical databases grows day by day, finding essential features for disease prediction has become more complex due to high dimensionality and sparsity problems. Also, due to the availability of a large number of micro-array datasets in the biomedical repositories, it is difficult to analyze, predict and interpret the feature information using traditional feature selection based classification models. Most traditional feature selection based classification algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for the extreme learning machine due to its high efficiency and fast processing speed for real-time applications. The main objective of feature selection based ensemble learning models is to classify high dimensional data with high computational efficiency and a high true positive rate. In this proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model was developed on high dimensional microarray datasets. Experimental results proved that the proposed model has high computational efficiency compared to traditional feature selection based classification models as far as accuracy, true positive rate and error rate are concerned.
A Formal Machine Learning or Multi Objective Decision Making System for Deter...
Decision-making typically needs mechanisms to compromise among opposing norms. Once multiple objectives are involved in machine learning, a vital step is to relate the weights of individual objectives to the system-level performance. Determining the weights of multiple objectives is an analysis process, and it has typically been treated as an optimization problem. However, our preliminary investigation has shown that existing methodologies for managing the weights of multiple objectives have some obvious limitations: when the determination of weights is treated as a single optimization problem, the result of such an optimization is limited, and it can even be unreliable when knowledge about the multiple objectives is incomplete, for example due to integrity problems caused by poor data. The constraints on weights are also discussed. Variable weights are natural in decision-making processes, so we need to develop a systematic methodology for determining variable weights of multiple objectives. The roles of weights in multi-objective decision-making or machine learning are analyzed, and the weights are then determined with the help of a standard neural network.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
Predicting student performance is a great concern to higher education managements. This prediction helps to identify and to improve students' performance. Several factors may improve this performance. In the present study, we employ data mining processes, particularly classification, to enhance the quality of the higher educational system. Recently, a new direction has been taken to improve classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called "Ada-GA", where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined classifier's performance. The Ada-GA algorithm proved to be of considerable usefulness in identifying students at risk early, especially in very large classes. This early prediction allows the instructor to provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on the ASSISTments dataset; the results showed that this algorithm successfully improved detection accuracy as well as reduced the complexity of computation.
This document describes two machine learning techniques, particle swarm optimization with support vector machines (PSO-SVM) and recursive feature elimination with support vector machines (RFE-SVM), that were used to classify autism neuroimaging data from the Autism Brain Imaging Data Exchange database. PSO-SVM was used to select discriminative features for classification, while RFE-SVM ranked features by importance. Both techniques aimed to improve classification accuracy and reduce overfitting by selecting optimal feature subsets from the high-dimensional neuroimaging data. The results could help develop brain-based diagnostic criteria for autism.
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST...
The growing population of elders in society calls for a new approach to caregiving. By inferring what activities the elderly are performing in their houses, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely Soft-Support Vector Machines (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in the activity recognition field, which has been known to hinder the learning performance of classifiers. Cost-sensitive learning is attractive under most imbalanced circumstances, but it is difficult to determine the precise misclassification costs in practice. We introduce a new criterion for selecting the suitable cost parameter C of the C-SVM method. Through our evaluation on four real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms state-of-the-art discriminative methods in activity recognition.
A multilabel classification approach for complex human activities using a com...
In our daily lives, humans perform different Activities of Daily Living (ADL), such as cooking and studying. By nature, humans perform these activities in a sequential/simple or an overlapping/complex scenario. Many research attempts have addressed simple activity recognition, but complex activity recognition is still a challenging issue. Recognition of complex activities is a multilabel classification problem, such that a test instance is assigned to multiple overlapping activities. Existing data-driven techniques for complex activity recognition can recognize a maximum of two overlapping activities and require a training dataset of complex (i.e. multilabel) activities. In this paper, we propose a multilabel classification approach for complex activity recognition using a combination of Emerging Patterns and Fuzzy Sets. Our approach requires a training dataset of only simple (i.e. single-label) activities. First, we use a pattern mining technique to extract discriminative features called Strong Jumping Emerging Patterns (SJEPs) that exclusively represent each activity. Then, our scoring function takes SJEPs and fuzzy membership values of incoming sensor data and outputs the activity label(s). We validate our approach using two different datasets. Experimental results demonstrate the efficiency and superiority of our approach against other approaches.
Pattern recognition system based on support vector machines
This document describes a study that uses support vector machines (SVM) to develop quantitative structure-activity relationship (QSAR) models for predicting the anti-HIV activity of 1,3,4-oxadiazole substituted naphthyridine derivatives based on their molecular descriptors. The SVM model achieved a cross-validation R2 value of 0.90 and RMSE of 0.145, outperforming artificial neural network and multiple linear regression models. An external validation on an independent test set found the SVM model had an R value of 0.96 and RMSE of 0.166, demonstrating good predictive ability.
Performance Comparison of Machine Learning Algorithms
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, namely K-Nearest Neighbor classification and Logistic Regression. We considered a large dataset of 7981 data points and 112 features, and then examined the performance of the above-mentioned machine learning algorithms. The processing time and accuracy of the different machine learning techniques are estimated on the collected dataset, using 60% of the data for training and the remaining 40% for testing. The paper is organized as follows. Section I includes the introduction and background analysis of the research, and Section II the problem statement. Section III briefly describes our application and data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions that would eliminate the problems existing with the current research methodology.
Maximum Correntropy Based Dictionary Learning Framework for Physical Activity Recognition Using Wearable Sensors
Sherin M. Mathews¹, Chandra Kambhamettu², and Kenneth E. Barner¹
¹ Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA ({sherinm,barner}@udel.edu)
² Department of Computer and Information Science, University of Delaware, Newark, DE 19716, USA (chandrak@udel.edu)
Abstract. Due to its symbolic role in ubiquitous health monitoring, physical activity recognition with wearable body sensors has been in the limelight in both research and industrial communities. Physical activity recognition is difficult due to the inherent complexity involved with different walking styles and human body movements. Thus we present a correntropy induced dictionary pair learning framework to achieve this recognition. Our algorithm for this framework jointly learns a synthesis dictionary and an analysis dictionary in order to simultaneously perform signal representation and classification once the time-domain features have been extracted. In particular, the dictionary pair learning algorithm is developed based on the maximum correntropy criterion, which is much less sensitive to outliers. In order to develop a more tractable and practical approach, we employ a combination of the alternating direction method of multipliers and an iteratively reweighted method to approximately minimize the objective function. We validate the effectiveness of our proposed model by employing it on an activity recognition problem and an intensity estimation problem, both of which include a large number of physical activities from the recently released PAMAP2 dataset. Experimental results indicate that classifiers built using this correntropy induced dictionary learning based framework achieve high accuracy using simple features, and that this approach gives results competitive with classical systems built upon features with prior knowledge.
1 Introduction and Related Work
Recognition of basic physical activities (such as walk, run, cycle) and basic postures (lie, sit, stand) is well researched [1–4], and good recognition performance can be achieved with a single 3D-accelerometer and simple classifiers. Recent works have focused on estimating the intensity of an activity (i.e., light, moderate or vigorous) (e.g., in [5,6]) by means of the metabolic equivalent (MET), a parameter that refers to the energy expenditure of a physical activity.
© Springer International Publishing AG 2016
G. Bebis et al. (Eds.): ISVC 2016, Part II, LNCS 10073, pp. 123–132, 2016.
DOI: 10.1007/978-3-319-50832-0_13
Over the years, many studies have analysed which features in this activity data are the most informative, and how these data can be most effectively employed to classify the activities [3,4,7,8]. Other research has investigated which computational model is the most appropriate to represent human activity data [9–11]. Despite such research efforts, the scalability of handling large intra-class variations and the robustness of many existing human-activity recognition techniques to the model parameters remain limited.

To make physical activity monitoring feasible in everyday life scenarios, an activity-recognition framework must be robust, i.e., it must handle a wide range of everyday, household, or sport activities and must manage a variety of potential users. Two challenges facing these frameworks are dataset size and simultaneously occurring background activities. First, using smaller activity datasets consisting of merely a few basic recorded activities limits the scope of the framework, since these methods would only apply to specific scenarios. Although current research has focused on increasing the number of activities that are recognized, each increase in the number of activities causes the classification performance to fall off. Secondly, recording and using only a small set of a few activities in basic activity recognition, without having simultaneous background activities, limits the applicability of the developed algorithms. Real-time scenarios might include such activity switching, thus requiring testing on a wider range of activities than were used for training. Thus, successful activity recognition requires frameworks that are sufficiently robust for classification even with limited training data and also insensitive to outliers, thereby reducing misclassifications.
In this work, we propose to develop a dictionary pair learning framework based on the maximum correntropy criterion [12] to solve the wearable sensor based classification problem. Correntropy has been demonstrated to obtain robust inferences in information theoretic learning (ITL) [13] and to effectively handle non-Gaussian noise and large outliers [12]. Inspired by dictionary learning experiments that achieved highly successful recognition rates using a few representative samples on high dimensional data, we propose a unified dictionary pair learning based framework based on maximum correntropy for human physical activity monitoring and recognition. To optimize the non-convex correntropy objective function, a new alternate minimization algorithm incorporating an iteratively reweighted method is developed to facilitate convergence. To the best of our knowledge, dictionary learning frameworks, and specifically correntropy based dictionary pair learning frameworks, have not to date been used in wearable sensor-based applications. Consequently, our novel dictionary learning based framework will engender future research on this method's potential applicability for accurate sensor-based data classification and for other physiological-signal classifications.
2 Proposed Methodology
The proposed framework consists of two stages: data processing and recognition. The data processing stage consists of preprocessing, segmentation and feature extraction. During the preprocessing step, raw sensory data is synchronized, timestamped, and labeled; 3D-acceleration and heart rate data are collected. During the segmentation step, this data is segmented with a sliding window, using a window size of 512 samples. For the feature extraction step, signal features extracted from the segmented 3D-acceleration data are computed for each axis separately and for the 3 axes together. Including the Heart Rate (HR) monitor with the commonly used inertial sensors proved especially useful for physical activity intensity estimation. Hence, mean and gradient features are calculated on both the raw and normalized heart rate signals from the HR data. In addition, derived features were extracted as they better distinguish activities involving both upper and lower body movements, thereby improving recognition of physical activities. Overall, a total of 137 basic features are extracted from each data window of 512 samples: 133 features from Inertial Measurement Unit (IMU) acceleration data and 4 features from HR data.
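As a rough illustration of the segmentation and feature extraction steps above, the following sketch segments a 3-axis acceleration stream into 512-sample windows and computes a few per-axis time-domain features. The feature choices here (mean, standard deviation, range) are illustrative only and do not reproduce the paper's full 137-feature set.

```python
import numpy as np

def sliding_windows(signal, win=512, step=512):
    """Segment a (T, channels) signal into fixed-size windows."""
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, step)]

def window_features(w):
    """Illustrative per-axis time-domain features: mean, std, min-max range."""
    feats = []
    for axis in range(w.shape[1]):
        x = w[:, axis]
        feats += [x.mean(), x.std(), x.max() - x.min()]
    return np.array(feats)

# A synthetic 3-axis accelerometer stream of 2048 samples
acc = np.random.randn(2048, 3)
X = np.stack([window_features(w) for w in sliding_windows(acc)])
print(X.shape)  # (4, 9): 4 windows, 3 features per axis
```

With an overlapping step (e.g. `step=256`) the same code produces a denser window sequence, which is a common variant in activity recognition pipelines.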
2.1 Correntropy Based Dictionary Learning Criterion
Maximum Correntropy Criterion (MCC)
Recognition against outliers and noise is critically challenging, mainly due to the unpredictable nature of the errors (bias) caused by noise and outliers. The concept of correntropy was proposed in ITL [13] to process non-Gaussian noise. Correntropy is directly related to Renyi's quadratic entropy [12], wherein the Parzen windowing method is employed to estimate the data distribution. The maximum correntropy criterion (MCC) cost function is defined by maximizing
$$ V(X, Y) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(x_i - y_i) = \frac{1}{N} \sum_{i=1}^{N} k_\sigma(e_i) \qquad (1) $$
where X = [x_1, x_2, ..., x_N] is the desired signal, Y = [y_1, y_2, ..., y_N] is the system output, and E = [e_1, e_2, ..., e_N] is the error signal, each of them being an N-dimensional vector, where N is the training data size and k_σ(x) is the Gaussian kernel with bandwidth σ given by:
$$ k_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \qquad (2) $$
In a general framework of M-estimation, MCC can be defined as
$$ \rho(e) = \frac{1 - \exp\!\left(-e^2/2\sigma^2\right)}{\sqrt{2\pi}\,\sigma} \qquad (3) $$
The MCC cost function has been proved to satisfy non-negativity, translation invariance, the triangle inequality, and symmetry, and is thus a well-defined metric. Adopting MCC to train adaptive systems makes the output signal close to the desired signal. By analyzing the contour maps [12], it can be inferred that when the error vector is close to zero, the metric behaves like the l2 distance; as the error grows, it becomes equivalent to the l1 distance; and when the error is very large, the cost metric levels off and becomes insensitive to the error magnitude, which intuitively explains the robustness of MCC.
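As a concrete illustration of this robustness, the toy example below (not from the paper) evaluates Eqs. 1–2 on the same error signal before and after a single gross outlier is injected; the mean squared error explodes while the correntropy barely changes:

```python
import numpy as np

def gaussian_kernel(e, sigma):
    # Eq. (2): k_sigma(e) = exp(-e^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return np.exp(-(e ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def correntropy(x, y, sigma=1.0):
    # Eq. (1): V(X, Y) = (1/N) * sum_i k_sigma(x_i - y_i)
    return float(np.mean(gaussian_kernel(x - y, sigma)))

rng = np.random.default_rng(0)
x = rng.normal(size=100)                    # desired signal
y_clean = x + 0.01 * rng.normal(size=100)   # system output with small errors
y_dirty = y_clean.copy()
y_dirty[0] += 100.0                         # inject one gross outlier

# MSE is dominated by the single outlier ...
mse_ratio = np.mean((x - y_dirty) ** 2) / np.mean((x - y_clean) ** 2)
# ... while correntropy saturates for large errors and barely moves
v_clean = correntropy(x, y_clean)
v_dirty = correntropy(x, y_dirty)
```

The saturating Gaussian kernel is exactly what makes the MCC cost level off for large errors, as described above.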
4. 126 S.M. Mathews et al.
Correntropy Based Dictionary Pair Learning Framework
The dictionary pair learning classification algorithm jointly learns a synthesis dictionary and an analysis dictionary to achieve the goals of signal representation and discrimination. To define discriminative dictionary learning, we denote a set of p-dimensional training samples from K classes by X = [X_1, ..., X_k, ..., X_K], where X_k ∈ R^{p×n} is the training sample set of class k, and n is the number of samples per class. Discriminative dictionary learning (DL) methods aim to learn an effective data representation model from X for classification tasks by exploiting the class label information of the training data, and can be formulated under the following framework:
$$\min_{D,A}\ \|X - DA\|_F^2 + \lambda\|A\|_p + \Psi(D, A, Y) \qquad (4)$$
Here λ ≥ 0 is a scalar constant; Y represents the class label matrix of the samples in X; D is the synthesis dictionary to be learned; and A is the coding coefficient matrix of X over D. In the training model (Eq. 4), the data fidelity term ||X − DA||²_F ensures the representation ability of D; ||A||_p is the l_p-norm regularizer on A; and Ψ(D, A, Y) stands for a discrimination promotion function that ensures the discrimination power of D and A [14].
If, when using an analysis dictionary denoted by P ∈ R^{mK×p}, the code A can be obtained analytically as A = PX, then the representation of X becomes efficient. Based on this idea, the DPL model learns an analysis dictionary P together with the synthesis dictionary D, leading to the following DPL model [14]:
$$\{P^*, D^*\} = \arg\min_{P,D}\ \|X - DPX\|_F^2 + \Psi(D, P, X, Y) \qquad (5)$$
Here Ψ(D, P, X, Y) is the discrimination function, and D and P form a dictionary pair: the analysis dictionary P is used to analytically code X, and the synthesis dictionary D is used to reconstruct X. In the learned structured synthesis dictionary D = [D_1, ..., D_k, ..., D_K] and structured analysis dictionary P = [P_1, ..., P_k, ..., P_K], each (D_k, P_k) forms a sub-dictionary pair corresponding to class k. Using the structured analysis dictionary P, we want the sub-dictionary P_k to project the samples from class i, i ≠ k, to a nearly null space, thereby making the coefficient matrix PX nearly block diagonal. Using a variable matrix A to relax the non-convex problem, we readily have the following DPL model:
$$\{P^*, A^*, D^*\} = \arg\min_{P,A,D}\ \sum_{k=1}^{K} \|X_k - D_k A_k\|_F^2 + \tau\|P_k X_k - A_k\|_F^2 + \lambda\|P_k \bar{X}_k\|_F^2 \qquad (6)$$
Here X̄_k denotes the complementary data matrix of X_k in the whole training set X, λ > 0 is a scalar constant, and d_i denotes the i-th atom of the synthesis dictionary D. Incorporating the correntropy-based criterion (Eq. 2), Eq. 6 can be rewritten as:
$$\{P^*, A^*, D^*\} = \arg\min_{P,A,D}\ \sum_{k=1}^{K} (1 + \tau + \lambda) - \exp\left(-\frac{\|X_k - D_k A_k\|_F^2}{2\sigma^2}\right) - \tau\exp\left(-\frac{\|P_k X_k - A_k\|_F^2}{2\sigma^2}\right) - \lambda\exp\left(-\frac{\|P_k \bar{X}_k\|_F^2}{2\sigma^2}\right) \qquad (7)$$
5. Maximum Correntropy Based Dictionary Learning Framework 127
In order to solve this equation, we employ the alternating direction method of multipliers together with an iteratively reweighted method to facilitate convergence. A general maximization problem solved by an iteratively reweighted method can be described as follows. Consider the general maximization problem

$$\max_{x}\ f(x) + \sum_{i} h_i(g_i(x)) \qquad (8)$$
where f(x) and g_i(x) are arbitrary functions, x denotes the optimization variable, and h_i(·) is an arbitrary concave function on the domain of g_i(x). The details of solving the general maximization problem (Eq. 8) using iteratively reweighted optimization are described in Algorithm 1 [15], where h'_i(g_i(x)) denotes any supergradient of the concave function h_i at the point g_i(x).
Problem transformation: h_i(g_i(x)) → Tr(D_i^T g_i(x))
Algorithm 1. Optimization algorithm for a general maximization problem
0: Initialize D_i = I
1: Update x as the optimal solution to the problem max_x f(x) + Σ_i Tr(D_i^T g_i(x))
2: Calculate D_i = h'_i(g_i(x)) for each i
3: Repeat steps 1–2 until convergence
Now consider the maximum correntropy criterion problem:

$$\min_{x}\ f(x) + \sum_{i} -\exp\left(-\frac{l_i^2(x)}{2\sigma^2}\right) \qquad (9)$$
where f(x) and l_i(x) are arbitrary functions and x denotes the optimization variable. Comparing with Eq. 8, let h_i(z) = 1 − exp(−z/2σ²) (z > 0) and g_i(x) = l_i²(x); then h_i(g_i(x)) = 1 − exp(−l_i²(x)/2σ²), where h_i(z) is a concave function. Using the iteratively reweighted method (Algorithm 1) [15], the problem transformation and the steps to solve the maximum correntropy criterion problem can be described as:
Problem transformation: 1 − exp(−l_i²(x)/2σ²) → d_i l_i²(x), with d_i = (1/2σ²) exp(−l_i²(x_t)/2σ²)
Algorithm 2. Optimization algorithm for the maximum correntropy criterion
0: Initialize d_i = 1
1: Update x as the optimal solution to the problem min_x f(x) + Σ_i d_i l_i²(x)
2: Calculate d_i = (1/2σ²) exp(−l_i²(x_t)/2σ²) for each i
3: Repeat steps 1–2 until convergence
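To make Algorithm 2 concrete, the sketch below applies it to a toy problem that is not in the paper: robust location estimation, min_μ Σ_i (1 − exp(−(x_i − μ)²/2σ²)). With the weights d_i frozen, step 1 becomes a weighted least-squares problem with the closed form μ = Σ_i d_i x_i / Σ_i d_i:

```python
import numpy as np

def mcc_mean(x, sigma=1.0, iters=50):
    """Iteratively reweighted solver (Algorithm 2) for the toy problem
    min_mu sum_i 1 - exp(-(x_i - mu)^2 / (2 sigma^2))."""
    d = np.ones_like(x)                         # step 0: initialize d_i = 1
    mu = 0.0
    for _ in range(iters):
        mu = float(np.sum(d * x) / np.sum(d))   # step 1: weighted least squares
        # step 2: reweight, d_i = (1/2 sigma^2) exp(-l_i^2 / (2 sigma^2))
        d = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (2 * sigma ** 2)
    return mu

# 20 inliers at 1.0 plus one gross outlier at 10.0
data = np.concatenate([np.full(20, 1.0), [10.0]])
mu_mcc = mcc_mean(data)      # robust estimate: the outlier's weight vanishes
mu_ls = float(data.mean())   # plain least-squares mean, dragged toward it
```

The outlier's weight decays as exp(−l²/2σ²), so after a couple of iterations it contributes essentially nothing to the weighted average.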
The original objective function (Eq. 7) can now be solved by a combination of ADMM and an iteratively reweighted algorithm. The alternating direction method of multipliers (ADMM) solves convex optimization problems by fixing some variables and solving for the others, thereby breaking the problem into smaller pieces, each of which is easier to handle. The minimization alternates between the following two steps:
1: Update A:

$$A^* = \arg\min_{A}\ \sum_{k=1}^{K} -\exp\left(-\frac{\|X_k - D_k A_k\|_F^2}{2\sigma^2}\right) - \tau\exp\left(-\frac{\|P_k X_k - A_k\|_F^2}{2\sigma^2}\right) \qquad (10)$$
Using the problem transformation defined in Algorithm 2, we have the closed-form solution:

$$A^* = (d_1 D_k^T D_k + d_2\tau I)^{-1}(d_1 D_k^T X_k + \tau d_2 P_k X_k) \qquad (11)$$

where

$$d_1 = \frac{1}{2\sigma^2}\exp\left(-\frac{\|X_k - D_k A_k\|_F^2}{2\sigma^2}\right) \quad\text{and}\quad d_2 = \frac{1}{2\sigma^2}\exp\left(-\frac{\|P_k X_k - A_k\|_F^2}{2\sigma^2}\right) \qquad (12)$$
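With the weights d1 and d2 frozen, this A-update is a ridge-type weighted least-squares problem, and Eq. 11 is its exact minimizer. The sketch below (toy sizes and random data, not the paper's code) checks this numerically: perturbing the closed-form solution can only increase the frozen-weight objective:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 8, 5, 20                     # signal dim, sub-dictionary atoms, samples
Xk = rng.normal(size=(p, n))
Dk = rng.normal(size=(p, m))
Pk = rng.normal(size=(m, p))
tau, d1, d2 = 0.05, 0.3, 0.7           # d1, d2 held fixed during this step

# Eq. (11): A* = (d1 Dk^T Dk + d2 tau I)^{-1} (d1 Dk^T Xk + tau d2 Pk Xk)
A_star = np.linalg.solve(d1 * Dk.T @ Dk + d2 * tau * np.eye(m),
                         d1 * Dk.T @ Xk + tau * d2 * Pk @ Xk)

def frozen_objective(A):
    # d1 ||Xk - Dk A||_F^2 + tau d2 ||Pk Xk - A||_F^2
    return (d1 * np.linalg.norm(Xk - Dk @ A) ** 2
            + tau * d2 * np.linalg.norm(Pk @ Xk - A) ** 2)

base = frozen_objective(A_star)
perturbed = frozen_objective(A_star + 1e-3 * rng.normal(size=A_star.shape))
```

Because d2·τ·I makes the quadratic strictly convex, the closed form is the unique minimizer for the current weights.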
2: Update P:

$$P^* = \arg\min_{P}\ \sum_{k=1}^{K} -\tau\exp\left(-\frac{\|P_k X_k - A_k\|_F^2}{2\sigma^2}\right) - \lambda\exp\left(-\frac{\|P_k \bar{X}_k\|_F^2}{2\sigma^2}\right) \qquad (13)$$

The closed-form solution for P can be obtained as:

$$P^* = d_2\tau A_k X_k^T\left(d_2\tau X_k X_k^T + d_3\lambda \bar{X}_k \bar{X}_k^T + \gamma I\right)^{-1} \qquad (14)$$

where γ > 0 is a small regularization constant, and

$$d_2 = \frac{1}{2\sigma^2}\exp\left(-\frac{\|P_k X_k - A_k\|_F^2}{2\sigma^2}\right) \quad\text{and}\quad d_3 = \frac{1}{2\sigma^2}\exp\left(-\frac{\|P_k \bar{X}_k\|_F^2}{2\sigma^2}\right) \qquad (15)$$
We iterate between these two steps until convergence.
In the testing phase, the analysis sub-dictionary P_k has been trained to produce small coefficients for samples from classes other than k, and can thus only generate significant coding coefficients for samples from class k. Meanwhile, the synthesis sub-dictionary D_k has been trained to reconstruct the samples of class k from their projective coefficients P_k X_k, i.e., the reconstruction residual will be small. Conversely, since P_k X_i will be small and D_k is not trained to reconstruct X_i, the residual for samples of other classes will be much larger. Thus, if a query sample y is from class k, its projective coding vector P_k y will likely be large, while its projective coding vectors P_i y, i ≠ k, will be small. Therefore, the class-specific reconstruction residuals are used to identify the class label of a testing sample.
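This decision rule reduces to picking the class whose sub-dictionary pair reconstructs the query best. The sketch below uses tiny hand-made dictionaries (hypothetical, for illustration only; real pairs come from the training above):

```python
import numpy as np

def classify(y, D_list, P_list):
    """Assign y to the class minimizing the residual ||y - D_k P_k y||^2."""
    residuals = [float(np.linalg.norm(y - D @ (P @ y)) ** 2)
                 for D, P in zip(D_list, P_list)]
    return int(np.argmin(residuals))

# toy 2-class example in R^2: each pair (D_k, P_k) reconstructs its own axis,
# so samples lying along that axis have near-zero residual
D_list = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
P_list = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]

pred = classify(np.array([0.9, 0.1]), D_list, P_list)  # mostly along class-0 axis
```

A query sample aligned with class k's subspace is reconstructed almost perfectly by (D_k, P_k) and poorly by the other pairs, which is exactly the block-diagonal behavior the training objective encourages.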
3 Experimental Results and Discussion
For our evaluation experiments, we used the recently released PAMAP2 Physical Activity Monitoring dataset, available at the UCI machine learning repository [16]. Briefly, this database not only incorporates basic physical activities and postures, but also includes a wide range of everyday, household, and fitness activities. The dataset captures 18 physical activities performed by 9 subjects wearing 3 inertial measurement units (IMUs) and a heart rate (HR) monitor. Each subject followed a predefined data collection protocol of 12 activities (lie, sit, stand, walk, run, cycle, Nordic walk, iron, vacuum, jump rope, ascend and descend stairs) and optionally performed 6 other activities (watch TV, work at a computer, drive a car, fold laundry, clean house, play soccer), added to enrich the range of activities.
To evaluate the proposed framework, we employed the activity recognition and intensity estimation classification problems defined on the PAMAP2 dataset. In the PAMAP2 dataset, the activity classification task is extended to 15 activities by including 3 additional activities from the optional activity list (fold laundry, clean house, play soccer). The complete activity recognition task consisting of these 15 different activity classes is referred to as the PAMAP2 Activity Recognition (PAMAP2-AR) task. The intensity estimation classification task aims to distinguish activities of light, moderate, and vigorous effort, and is referred to as the PAMAP2 Intensity Estimation (PAMAP2-IE) task. Differentiating these levels of effort is based on the metabolic equivalent (MET) values of the various physical activities, as provided by [17]. Therefore, intensity classes are defined as activities of light effort (<3.0 METs: lie, sit, stand, drive a car, iron, fold laundry, clean house, watch TV, work at a computer), moderate effort (3.0–6.0 METs: walk, cycle, descend stairs, vacuum, and Nordic walk), or vigorous effort (>6.0 METs: run, ascend stairs, jump rope, play soccer). Thus two classification tasks were defined: (1) activity recognition and (2) intensity estimation. Using these two classification tasks, we evaluated different boosting methods and compared them to our proposed correntropy-based dictionary pair learning approach. For the evaluation procedure, we randomly selected 75% of the data for training and 25% for testing; the final result is the value averaged over 10 such runs.
Results on Intensity Estimation Task. In [18], base-level classifiers were used for activity monitoring classification tasks. Among the various classification approaches, the C4.5 decision tree algorithm was tested on the PAMAP2 dataset and achieved an accuracy of 70.07%. The results obtained in [18] also indicated that further improvement in classification accuracy was possible, since even the best result, obtained with an AdaBoost classifier, was only 73.93% accuracy. The overall confusion matrix using the proposed framework for the three intensity estimation classes (light (Class 1), moderate (Class 2), and vigorous (Class 3)) is given in Table 1. An independent performance assessment of the proposed framework results in an accuracy of 87.6% on the PAMAP2-IE task, demonstrating that our framework outperforms the C4.5 decision tree and AdaBoost classifiers (Table 2).
Results on Activity Recognition Task. To study our framework's classification performance on the PAMAP2-AR task, we evaluated it on the 15-class activity recognition task defined on the PAMAP2 dataset. Table 3 enumerates the overall confusion matrix for this task. We find that our framework outperforms the C4.5 decision tree with an accuracy of 74.12% versus 71.59%,
Table 1. Overall confusion matrix using dictionary pair learning framework on
PAMAP2-IE dataset. The table shows how different annotated activities are classi-
fied into different classes.
Annotated activity Class 1 Class 2 Class 3
Predicted Class 1 12893 422 9
Predicted Class 2 1258 6291 13
Predicted Class 3 218 1136 760
Table 2. Comparison of proposed approach on PAMAP2-IE dataset to state-of-the-art
methods in terms of accuracy (calculated in %)
Methodology C4.5 [18] Adaboost [18] Proposed approach
Accuracy 70.07% 73.93% 87.59%
and it gives competitive results when compared to an AdaBoost classifier on the PAMAP2-AR task (Table 4). In addition to competitive accuracy, our framework provides the additional advantages of lower training and testing times.
Further examination of our results indicates that, averaged over the 10 test runs, the confusion matrix of the best-performing classifier on the PAMAP2-AR task yields an overall accuracy of 74.12%, showing that some activities are recognized with high accuracy, such as lying, walking, or even distinguishing between ascending and descending stairs. Overall, misclassifications, where activities belonging to one class are mistakenly assigned to neighboring classes, are low. One example of overlapping activity characteristics is the over 5% confusion between Nordic walk (class 7) and cycle (class 6) and ascend stairs (class 9)
Table 3. Confusion matrix using proposed framework on PAMAP2 -AR dataset.
Annotated Class Class 1 Class 2 Class 3 Class 4 Class 5 Class 6 Class 7 Class 8 Class 9 Class 10 Class 11 Class 12 Class 13 Class 14 Class 15
Predicted Class 1 1660 2 2 19 14 0 0 0 1 0 0 0 28 2 0
Predicted Class 2 0 1387 34 126 24 4 0 5 0 2 0 0 68 4 1
Predicted Class 3 0 23 1364 144 102 1 2 1 6 0 0 0 57 1 2
Predicted Class 4 0 10 55 1779 90 0 0 0 0 0 0 0 236 19 0
Predicted Class 5 0 0 13 194 1086 5 0 6 0 2 0 0 243 0 8
Predicted Class 6 0 0 0 0 2 474 13 210 69 2 0 0 9 0 0
Predicted Class 7 0 0 0 0 0 159 295 56 81 1 0 1 38 0 0
Predicted Class 8 0 0 0 0 0 73 16 1824 275 2 0 0 3 0 0
Predicted Class 9 0 0 0 0 0 5 6 98 1600 0 0 0 0 0 0
Predicted Class 10 0 0 0 3 10 8 9 49 90 1271 0 0 27 0 5
Predicted Class 11 0 0 0 0 199 0 0 0 6 0 624 0 0 0 6
Predicted Class 12 0 0 0 0 25 0 49 0 38 0 0 253 3 0 0
Predicted Class 13 0 9 23 320 478 10 6 1 8 11 0 0 860 16 6
Predicted Class 14 0 4 16 439 209 0 0 0 0 0 0 0 145 88 0
Predicted Class 15 0 0 0 0 15 50 1 22 52 2 8 3 55 0 187
Table 4. Comparison of proposed approach on PAMAP2-AR dataset to state-of-the-
art methods in terms of accuracy (calculated in %)
Methodology C4.5 [18] Adaboost [18] Proposed approach
Accuracy 71.59% 71.78% 74.12%
that is a function of the positioning of the sensors; thus an additional IMU on the thigh would help reliably differentiate these activities.
Another example comes from the playing soccer activity: because playing soccer (class 14) is a composite activity, it becomes problematic to distinguish running with a ball from just running (class 4). Arguably, however, the main reason for these misclassifications is the diversity inherent in each subject's performance of physical activities. Therefore, further increasing the accuracy of physical activity recognition will require the introduction and investigation of personalized approaches.
In [18], the evaluation technique was leave-one-activity-out (LOAO), where an activity monitoring system is tested on a previously unseen activity. Our framework, in contrast, takes in data randomly: in our random 75–25% validation approach, evaluation can be based on a completely unknown activity from an unknown user, while training is performed using different activities from different users. Such subject-independent and activity-independent validation techniques are preferred for physical activity monitoring since they provide results with more practical meaning. Using our framework, we can not only achieve good classifier performance but also eliminate the need to pre-train on a particular activity for classification. Thus, our proposed method makes it possible to design a robust physical activity monitoring system with the desired generalization characteristics.
4 Conclusion
In this paper, we presented an effective dictionary pair learning-based framework built on the maximum correntropy criterion for robust activity recognition and intensity estimation of aerobic activities using data from wearable sensors. The proposed objective function is robust to outliers and can be efficiently optimized by a combination of an iteratively reweighted technique and the alternating direction method of multipliers. Experimental results indicate that classifiers built in this framework not only provide competitive performance but also demonstrate subject-independent activity classification using accelerometers. Both of these are important considerations for developing systems that must be robust and scalable and must perform well in real-world settings. Based on our promising results, we will further extend this work by incorporating a tree structure and a smoothness constraint into the classification framework.
References
1. Long, X., Yin, B., Aarts, R.M.: Single-accelerometer-based daily physical activity
classification. In: Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, EMBC 2009, pp. 6107–6110. IEEE (2009)
2. Lee, M.-h., Kim, J., Kim, K., Lee, I., Jee, S.H., Yoo, S.K.: Physical activity recog-
nition using a single tri-axis accelerometer. In: Proceedings of the World Congress
on Engineering and Computer Science, vol. 1 (2009)
3. Pärkkä, J., Ermes, M., Korpipää, P., Mäntyjärvi, J., Peltola, J., Korhonen, I.: Activity classification using realistic data from wearable sensors. IEEE Trans. Inf. Technol. Biomed. 10, 119–128 (2006)
4. Ermes, M., Pärkkä, J., Cluitmans, L.: Advancing from offline to online activity
recognition with wearable sensors. In: 30th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society, EMBS 2008, pp. 4451–4454.
IEEE (2008)
5. Tapia, E.M., Intille, S.S., Haskell, W., Larson, K., Wright, J., King, A., Friedman,
R.: Real-time recognition of physical activities and their intensities using wire-
less accelerometers and a heart rate monitor. In: 2007 11th IEEE International
Symposium on Wearable Computers, pp. 37–40. IEEE (2007)
6. Parkka, J., Ermes, M., Antila, K., van Gils, M., Manttari, A., Nieminen, H.: Esti-
mating intensity of physical activity: a comparison of wearable accelerometer and
gyro sensors and 3 sensor locations. In: 29th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society, EMBS 2007, pp. 1511–1514.
IEEE (2007)
7. Ravi, N., Dandekar, N., Mysore, P., Littman, M.L.: Activity recognition from
accelerometer data. In: AAAI, vol. 5, pp. 1541–1546 (2005)
8. Reiss, A., Stricker, D.: Introducing a modular activity monitoring system. In: 2011
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, EMBC, pp. 5621–5624. IEEE (2011)
9. Dohnálek, P., Gajdoš, P., Peterek, T.: Tensor modification of orthogonal matching pursuit based classifier in human activity recognition. In: Zelinka, I., Chen, G., Rössler, O.E., Snasel, V., Abraham, A. (eds.) Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems. AISC, vol. 210, pp. 497–505. Springer, Heidelberg (2013). doi:10.1007/978-3-319-00542-3_49
10. Gjoreski, H., Kozina, S., Gams, M., Lustrek, M., Álvarez-García, J.A., Hong, J.H.,
Ramos, J., Dey, A.K., Bocca, M., Patwari, N.: Competitive live evaluations of
activity-recognition systems. IEEE Pervasive Comput. 14, 70–77 (2015)
11. Martínez, A.M., Webb, G.I., Chen, S., Zaidi, N.A.: Scalable learning of Bayesian network classifiers (2015)
12. Liu, W., Pokharel, P.P., Príncipe, J.C.: Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans. Sig. Process. 55, 5286–5298 (2007)
13. Principe, J.C., Xu, D., Fisher, J.: Information theoretic learning. Unsupervised
Adapt. Filter. 1, 265–319 (2000)
14. Gu, S., Zhang, L., Zuo, W., Feng, X.: Projective dictionary pair learning for pattern
classification. In: Advances in Neural Information Processing Systems, pp. 793–801
(2014)
15. Nie, F., Yuan, J., Huang, H.: Optimal mean robust principal component analysis.
In: Proceedings of the 31st International Conference on Machine Learning (ICML
2014), pp. 1062–1070 (2014)
16. Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml
17. Ainsworth, B.E., Haskell, W.L., Whitt, M.C., Irwin, M.L., Swartz, A.M., Strath, S.J., O'Brien, W.L., Bassett, D.R., Schmitz, K.H., Emplaincourt, P.O., et al.: Compendium of physical activities: an update of activity codes and MET intensities. Med. Sci. Sports Exerc. 32, S498–S504 (2000)
18. Reiss, A.: Personalized mobile physical activity monitoring for everyday life. Ph.D.
thesis, Technical University of Kaiserslautern (2014)