The document describes using reversible jump Markov chain Monte Carlo (RJMCMC) for clustering with an unknown number of components. It summarizes two papers on this topic: Richardson & Green (1997) and Tadesse et al. (2005). Richardson & Green apply RJMCMC to one-dimensional data using split/merge and birth/death moves. Tadesse et al. extend this approach to high-dimensional data by incorporating variable selection into the model. The document outlines the clustering overview, RJMCMC methodology, and key aspects and results of the two papers.
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI... (IJDKP)
Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics that
replaces each point in a given dataset with a Gaussian. The width of the Gaussian is
σ, a hyper-parameter that can be set manually to suit the application.
Numerical methods are used to find all the minima of the quantum potential, since they correspond to cluster
centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the
exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is a
difficult task because such expressions normally cannot be solved analytically. However, we
prove that if the points all lie within a square region of size σ, there is only one minimum. This bound
is useful not only for the number of solutions to look for by numerical means; it also allows us to propose a new
numerical approach, “per block”. This technique decreases the number of particles by approximating some
groups of particles by weighted particles. These findings are useful not only for the quantum clustering
problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics,
and other applications.
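As a rough illustration of the quantity whose minima are being sought, the sketch below evaluates the quantum potential in the form commonly used for quantum clustering, V(x) = E − d/2 + (1/(2σ²ψ(x))) Σᵢ |x − xᵢ|² exp(−|x − xᵢ|²/(2σ²)), where ψ is the sum of Gaussians centred on the data points. The abstract does not give this formula, so the form is an assumption here; the function name `quantum_potential` is hypothetical and the additive constant E is dropped.

```python
import math


def quantum_potential(x, points, sigma):
    """Quantum potential at x, up to the additive constant E (assumed form).

    x      -- query location, a sequence of d coordinates
    points -- data points, each a sequence of d coordinates
    sigma  -- Gaussian width (the hyper-parameter described in the abstract)
    """
    d = len(x)
    psi = 0.0       # sum of Gaussians centred on the data points
    weighted = 0.0  # sum of squared distances weighted by those Gaussians
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        g = math.exp(-sq / (2.0 * sigma ** 2))
        psi += g
        weighted += sq * g
    # Far from every data point psi underflows to 0; this sketch assumes
    # that x stays within a few sigma of the data.
    return -d / 2.0 + weighted / (2.0 * sigma ** 2 * psi)


# One point at the origin: the potential is a parabola with its minimum there.
print(quantum_potential((0.0, 0.0), [(0.0, 0.0)], sigma=1.0))
print(quantum_potential((1.0, 0.0), [(0.0, 0.0)], sigma=1.0))
```

A grid or gradient search over this function is one way to locate the minima numerically; the paper's "per block" approach is not reproduced here.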
A comparative study of clustering and biclustering of microarray data (ijcsit)
There are subsets of genes that behave similarly under some subsets of conditions, so we say that they
coexpress, but behave independently under other subsets of conditions. Discovering such coexpressions can
help uncover genomic knowledge such as gene networks or gene interactions. It is therefore
important to cluster genes and conditions simultaneously, to identify clusters of genes
that are coexpressed under clusters of conditions. This type of clustering is called biclustering.
Biclustering is an NP-hard problem. Consequently, heuristic algorithms are typically used to approximate
this problem by finding suboptimal solutions. In this paper, we present a new survey of clustering and
biclustering of gene expression data, also called microarray data.
Estimating Reconstruction Error due to Jitter of Gaussian Markov Processes (Mudassir Javed)
This paper presents an estimation of the reconstruction error due to jitter in Gaussian Markov processes. Two samples are considered for the analysis in two different situations. In the first situation, the first sample has no jitter while the other is affected by jitter. In the second situation, both samples are affected by jitter. The probability density functions of the jitter are given by the uniform distribution and the Erlang distribution. Statistical averaging is applied to the conditional expectation with respect to the jitter random variable. From that, the conditional variance is obtained, which is defined as the reconstruction error function; from it, the reconstruction error of a Gaussian Markov process is determined.
This is a lecture in a series on combustion chemical kinetics for engineers. The course topics are selections from thermodynamics and kinetics especially geared to the interests of engineers involved in combustion.
Change Detection of Water-Body in Synthetic Aperture Radar Images (CSCJournals)
Change detection is the art of quantifying the changes in Synthetic Aperture Radar (SAR) images that have happened over a period of time. Remote sensing has been the parent technique for change detection analysis. This paper empirically investigates the impact of applying a combination of texture features with different classification techniques to separate water bodies from non-water bodies. First, the images are classified using unsupervised Principal Component Analysis (PCA) based K-means clustering for dimension reduction. Then texture features such as Energy, Entropy, Contrast, Inverse Differential Moment, Directional Moment and the Median are extracted using the Gray Level Co-occurrence Matrix (GLCM), and these features are used in Linear Vector Quantization (LVQ) and Support Vector Machine (SVM) classifiers. This paper aims to apply a combination of the texture features in order to significantly improve the accuracy of detection. Detection analysis informs management and policy decision making for long-term construction projects by predicting preventable losses.
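To make the GLCM step concrete, here is a minimal pure-Python sketch that builds a normalized co-occurrence matrix and derives four of the texture features named above. The pixel offset (one step to the right), the textbook Haralick-style feature definitions, and the names `glcm` and `texture_features` are assumptions for illustration; the abstract does not specify the paper's exact settings.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the offset (dy, dx).

    image  -- 2D list of integer gray levels in range(levels)
    levels -- number of gray levels
    Assumes at least one pixel pair exists for the chosen offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Energy, contrast, entropy and inverse difference moment of a GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


flat = texture_features(glcm([[0, 0], [0, 0]], levels=2))
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
print(flat)
print(checker)
```

A flat patch gives maximal energy and zero contrast, while the checkerboard patch gives lower energy and higher contrast, which is the kind of separation a downstream classifier would rely on.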
Fabric Textile Defect Detection, By Selection A Suitable Subset Of Wavelet Co... (CSCJournals)
This paper presents a novel approach for defect detection in fabric textiles. First, all wavelet coefficients are extracted from a defect-free fabric. An optimal subset of these coefficients can remove the main fabric pattern from the image and reveal the defects. We therefore used a Genetic Algorithm (GA) to find a suitable subset. The evaluation function in the GA was Shannon entropy. Finally, it was shown that better defect detection results can be obtained by using two separate sets of wavelet coefficients for horizontal and vertical defects. This approach not only increases the accuracy of fabric defect detection but also decreases computation time.
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data (IJMER)
Clustering mixed type data is one of the major research topics in the area of data mining. In
this paper, a new algorithm for clustering mixed type data is proposed, in which the concept of a distribution
centroid is used to represent the prototype of the categorical variables in a cluster; this is then combined
with the mean to represent the prototype of clusters with mixed type variables. In the method, data is
observed from different views and the variables are grouped into different views. Instances that
can be viewed differently from different viewpoints are defined as multiview data. During the clustering
process, the differences among views are usually ignored. Here, both view and variable weights
are computed simultaneously. The view weight is used to determine the closeness or density of a view, and
the variable weight is used to identify the significance of each variable. Both weights are used in the
distance function to determine the cluster of each object. The proposed method
enhances k-prototypes so that it automatically computes both view and variable
weights. The proposed MK-Prototypes algorithm is compared with two other clustering
algorithms.
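For orientation, the baseline that this abstract builds on can be sketched as follows. This is the standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches), not the MK-Prototypes measure itself: the distribution centroid and the view and variable weights are not reproduced. The function name and the parameter `gamma` are illustrative choices.

```python
def k_prototypes_distance(x_num, x_cat, proto_num, proto_cat, gamma):
    """Baseline k-prototypes dissimilarity between an object and a prototype.

    x_num, proto_num -- numeric attribute values (same length)
    x_cat, proto_cat -- categorical attribute values (same length)
    gamma            -- weight balancing the categorical part against the numeric part
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    mismatches = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * mismatches


# Numeric part: (1 - 0)^2 + (2 - 2)^2 = 1; one categorical mismatch weighted by 0.5.
print(k_prototypes_distance([1.0, 2.0], ["red", "s"], [0.0, 2.0], ["red", "m"], gamma=0.5))
```

In the baseline, `gamma` is a single global constant; the abstract's contribution is to replace that kind of fixed weighting with view and variable weights computed during clustering.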
Accelerating materials property predictions using machine learning (Ghanshyam Pilania)
The materials discovery process can be significantly expedited and simplified if we can learn effectively from available knowledge and data. In the present contribution, we show that efficient and accurate prediction of a diverse set of properties of material systems is possible by employing machine (or statistical) learning methods trained on quantum mechanical computations in combination with the notions of chemical similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us to discover decision rules that establish a mapping between easily accessible attributes of a system and its properties. It is shown that fingerprints based on either chemo-structural (compositional and configurational information) or the electronic charge density distribution can be used to make ultra-fast, yet accurate, property predictions. Harnessing such learning paradigms extends recent efforts to systematically explore and mine vast chemical spaces, and can significantly accelerate the discovery of new application-specific materials.
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural... (IJECEIAES)
Variable selection is an important technique for reducing the dimensionality of data and is frequently used in data preprocessing for data mining. This paper presents a new variable selection algorithm that uses heuristic variable selection (HVS) and Minimum Redundancy Maximum Relevance (MRMR). We enhance the HVS method for variable selection by incorporating the MRMR filter. Our algorithm is based on a wrapper approach using a multi-layer perceptron. We call this algorithm the HVS-MRMR wrapper for variable selection. The relevance of a set of variables is measured by a convex combination of the relevance given by the HVS criterion and the MRMR criterion. This approach selects new relevant variables; we evaluate the performance of HVS-MRMR on eight benchmark classification problems. The experimental results show that, on most datasets, HVS-MRMR selects fewer variables with high classification accuracy compared to MRMR alone, HVS alone, and no variable selection. HVS-MRMR can be applied to various classification problems that require high classification accuracy.
The tensor language provides a unifying approach that simplifies notation, leading to compact modeling of multi-way information objects in many fields of knowledge, as well as a framework for thought. Using this language, a generic system that connects to its environment through its boundaries is modeled.
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATION (orajjournal)
This paper presents a lognormal ordinary kriging (LOK) metamodel algorithm and its application to
optimizing a stochastic simulation problem. Kriging models were developed as an interpolation method
in geology. They have been successfully used for the deterministic simulation optimization (SO) problem. In
recent years, kriging metamodeling has attracted growing interest for stochastic problems. SO
researchers have begun using ordinary kriging through global optimization in stochastic systems. The
goals of this study are to present the LOK metamodel algorithm and to analyze the results of the application
step by step. The results show that LOK is a powerful alternative metamodel in simulation optimization
when the data are highly skewed.
Classification accuracy analyses using Shannon’s Entropy (IJERA Editor)
There are many methods for determining classification accuracy. In this paper, the significance of the entropy of
training signatures in classification is shown. The entropy of the training signatures of a raw digital image
represents the heterogeneity of the brightness values of the pixels in different bands. This implies that an image
comprising a homogeneous lu/lc category will be associated with nearly the same reflectance values, which
results in a very low entropy value. On the other hand, an image characterized by the
occurrence of diverse lu/lc categories will consist of largely differing reflectance values, so the
entropy of such an image will be relatively high. This concept leads to analyses of classification accuracy.
Although entropy has been used many times in RS and GIS, its use in the determination of classification
accuracy is a new approach.
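The contrast between homogeneous and diverse brightness values can be illustrated with a short sketch of Shannon entropy over a list of pixel values. The function name `shannon_entropy` and the toy pixel lists are made up for illustration; how the paper forms training signatures per band is not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


homogeneous = [120, 120, 120, 120]  # one land-cover class, identical brightness
diverse = [10, 80, 150, 220]        # four different brightness values

print(shannon_entropy(homogeneous))
print(shannon_entropy(diverse))
```

The homogeneous patch has zero entropy, while four equally frequent values give two bits, matching the low-versus-high behaviour the abstract describes.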
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Dimensionality Reduction Evolution and Validation (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Xmeasures - Accuracy evaluation of overlapping and multi-resolution clusterin... (Artem Lutov)
The performance of clustering algorithms is evaluated with the help of accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, only a few metrics exist for measuring the accuracy of overlapping and multi-resolution clustering algorithms on large datasets. In this paper, we first discuss existing metrics, how they satisfy a set of formal constraints, and how they can be applied to specific cases. Then, we propose several optimizations and extensions of these metrics. More specifically, we introduce a new indexing technique to reduce both the runtime and the memory complexity of the Mean F1 score evaluation. Our technique can be applied to large datasets, and it is faster on a single CPU than state-of-the-art implementations running on high-performance servers. In addition, we propose several extensions of the discussed metrics to improve their effectiveness and their satisfaction of formal constraints without affecting their efficiency. All the metrics discussed in this paper are implemented in C++ and are available for free as open-source packages that can be used either as stand-alone tools or as part of a benchmarking system to compare various clustering algorithms.
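As a point of reference for the Mean F1 score mentioned above, the following is a naive sketch of one common variant: each cluster is matched to its best-F1 counterpart in the other clustering, the best scores are averaged per direction, and the two directions are averaged. Which variant the paper optimizes is not stated in the abstract, so this definition is an assumption, and the quadratic all-pairs loop is exactly the cost that an indexing technique would aim to avoid. All function names are hypothetical.

```python
def f1(a, b):
    """F1 score between two clusters given as sets of member ids."""
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def best_match_average(source, target):
    """Average, over clusters in `source`, of the best F1 against `target`."""
    return sum(max(f1(s, t) for t in target) for s in source) / len(source)


def mean_f1(clusters_a, clusters_b):
    """Symmetric mean F1 between two (possibly overlapping) clusterings."""
    forward = best_match_average(clusters_a, clusters_b)
    backward = best_match_average(clusters_b, clusters_a)
    return 0.5 * (forward + backward)


overlapping = [{1, 2, 3}, {3, 4}]
print(mean_f1(overlapping, overlapping))
print(mean_f1([{1, 2}], [{1, 2, 3, 4}]))
```

Because clusters are plain sets, a node may belong to several clusters, which is what makes the measure usable for overlapping clusterings.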
AN IMPROVED ITERATIVE METHOD FOR SOLVING GENERAL SYSTEM OF EQUATIONS VIA GENE... (Zac Darcy)
Various algorithms are known for solving linear systems of equations. Iterative methods are
recommended for solving large sparse linear systems. But in the case of general n × m matrices, the classic
iterative algorithms are not applicable except in a few cases. The algorithm presented here is based on the
minimization of the residual of the solution and has some genetic characteristics, which require the use of Genetic
Algorithms. This algorithm is therefore well suited to the construction of parallel algorithms. In this
paper, we describe a sequential version of the proposed algorithm and present its theoretical analysis.
Moreover, we show some numerical results of the sequential algorithm, supply an improved algorithm,
and compare the two algorithms.
APPLYING DYNAMIC MODEL FOR MULTIPLE MANOEUVRING TARGET TRACKING USING PARTICL... (IJITCA Journal)
In this paper, we apply a dynamic model for manoeuvring targets in the SIR particle filter algorithm to improve the tracking accuracy of multiple manoeuvring targets. In our proposed approach, a color distribution model is used to detect changes in the target's model. Our approach controls the deformation of the target's model: if the deformation is larger than a predetermined threshold, the model is updated. The Global Nearest Neighbor (GNN) algorithm is used as the data association algorithm. We named our proposed method the Deformation Detection Particle Filter (DDPF). The DDPF approach is compared with the basic SIR-PF algorithm on real airshow videos. The comparison results show that the basic SIR-PF algorithm is not able to track manoeuvring targets when rotation or scaling occurs in the target's model, whereas the DDPF approach updates the target's model when rotation or scaling occurs. Thus, the proposed approach is able to track manoeuvring targets more efficiently and accurately.
Several approaches have been proposed to solve global numerical optimization problems. Most researchers have tested the robustness of their algorithms on minimization problems. In this paper, we focus on maximization problems using several hybrid chemical reaction optimization algorithms that have shown success in minimization, including orthogonal chemical reaction optimization (OCRO), a hybrid algorithm based on particle swarm and chemical reaction optimization (HP-CRO), real-coded chemical reaction optimization (RCCRO) and the hybrid mutation chemical reaction optimization algorithm (MCRO). The aim of this paper is to demonstrate that approaches inspired by chemical reaction optimization are not limited to minimization but are also suitable for maximization. Moreover, an experimental comparison with other maximization algorithms is presented and discussed.
Change Detection of Water-Body in Synthetic Aperture Radar ImagesCSCJournals
Change detection is the art of quantifying the changes in the Synthetic Aperture Radar (SAR) images that have happened over a period of time. Remote sensing has been the parental technique to perform change detection analysis. This paper empirically investigates the impact of applying the combination of texture features for different classification techniques to separate water body from non-water body. At first, the images are classified using unsupervised Principle Component Analysis (PCA) based K-means clustering for dimension reduction. Then the texture features like Energy, Entropy, Contrast , Inverse Differential Moment , Directional Moment and the Median are extracted using Gray Level Co-occurrence Matrix (GLCM) and these features are utilized in Linear Vector Quantization (LVQ) and Support Vector Machine (SVM) classifiers. This paper aims to apply a combination of the texture features in order to significantly improve the accuracy of detection. The utility of detection analysis, influences management and policy decision making for long-term construction projects by predicting the preventable losses.
Fabric Textile Defect Detection, By Selection A Suitable Subset Of Wavelet Co...CSCJournals
This paper presents a novel approach for defect detection of fabric textile. For this purpose, First, all wavelet coefficients were extracted from an perfect fabric. But an optimal subset of These coefficients can delete main fabric of image and indicate defects of fabric textile. So we used Genetic Algorithm for finding a suitable subset. The evaluation function in GA was Shannon entropy. Finally, it was shown that we can gain better results for defect detection, by using two separable sets of wavelet coefficients for horizontal and vertical defects. This approach, not only increases accuracy of fabric defect detection, but also, decreases computation time.
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data IJMER
Clustering mixed type data is one of the major research topics in the area of data mining. In
this paper, a new algorithm for clustering mixed type data is proposed where the concept of distribution
centroid is used to represent the prototype of categorical variables in a cluster which is then combined
with the mean to represent the prototype of clusters with mixed type variables. In the method, data is
observed from different views and the variables are grouped into different views. Those instances that
can be viewed differently from different viewpoints can be defined as multiview data. During clustering
process the differences among views are ignored in usual cases. Here, both views and variables weights
are computed simultaneously. The view weight is used to determine the closeness or density of view and
variable weight is used to identify the significance of each variable. With the intention of determining
the cluster of objects both these weights are used in the distance function. In the proposed method,
enhancement to the k-prototypes is done so that it automatically computes both view and variable
weights. The proposed algorithm MK-Prototypes algorithm is compared with two other clustering
algorithms.
Accelerating materials property predictions using machine learningGhanshyam Pilania
The materials discovery process can be significantly expedited and simplified if we can learn effectively from available knowledge and data. In the present contribution, we show that efficient and accurate prediction of a diverse set of properties of material systems is possible by employing machine (or statistical) learning
methods trained on quantum mechanical computations in combination with the notions of chemical similarity. Using a family of one-dimensional chain systems, we present a general formalism that allows us to discover decision rules that establish a mapping between easily accessible attributes of a system and its properties. It is shown that fingerprints based on either chemo-structural (compositional and configurational information) or the electronic charge density distribution can be used to make ultra-fast, yet accurate, property predictions. Harnessing such learning paradigms extends recent efforts to systematically explore and mine vast chemical spaces, and can significantly accelerate the discovery of new application-specific materials.
Hybrid Method HVS-MRMR for Variable Selection in Multilayer Artificial Neural...IJECEIAES
The variable selection is an important technique the reducing dimensionality of data frequently used in data preprocessing for performing data mining. This paper presents a new variable selection algorithm uses the heuristic variable selection (HVS) and Minimum Redundancy Maximum Relevance (MRMR). We enhance the HVS method for variab le selection by incorporating (MRMR) filter. Our algorithm is based on wrapper approach using multi-layer perceptron. We called this algorithm a HVS-MRMR Wrapper for variables selection. The relevance of a set of variables is measured by a convex combination of the relevance given by HVS criterion and the MRMR criterion. This approach selects new relevant variables; we evaluate the performance of HVS-MRMR on eight benchmark classification problems. The experimental results show that HVS-MRMR selected a less number of variables with high classification accuracy compared to MRMR and HVS and without variables selection on most datasets. HVS-MRMR can be applied to various classification problems that require high classification accuracy.
The tensor language provides a unifying approach that simplifies notation, which leads to compact modeling of multi-way information objects in many knowledge fields, and a thought framework as well. By such a language, it is modeled a generic system that connects to environment through its boundaries.
LOGNORMAL ORDINARY KRIGING METAMODEL IN SIMULATION OPTIMIZATIONorajjournal
This paper presents a lognormal ordinary kriging (LOK) metamodel algorithm and its application to
optimize a stochastic simulation problem. Kriging models have been developed as an interpolation method
in geology. They have been successfully used for the deterministic simulation optimization (SO) problem. In
recent years, kriging metamodeling has attracted a growing interest with stochastic problems. SO
researchers have begun using ordinary kriging through global optimization in stochastic systems. The
goals of this study are to present LOK metamodel algorithm and to analyze the result of the application
step-by-step. The results show that LOK is a powerful alternative metamodel in simulation optimization
when the data are too skewed.
Classification accuracy analyses using Shannon’s EntropyIJERA Editor
There are many methods for determining the Classification Accuracy. In this paper significance of Entropy of
training signatures in Classification has been shown. Entropy of training signatures of the raw digital image
represents the heterogeneity of the brightness values of the pixels in different bands. This implies that an image
comprising a homogeneous lu/lc category will be associated with nearly the same reflectance values that would
result in the occurrence of a very low entropy value. On the other hand an image characterized by the
occurrence of diverse lu/lc categories will consist of largely differing reflectance values due to which the
entropy of such image would be relatively high. This concept leads to analyses of classification accuracy.
Although Entropy has been used many times in RS and GIS but its use in determination of classification
accuracy is new approach.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Dimensionality Reduction Evolution and Validationiosrjce
RJMCMC in clustering
1. Clustering by mixture model
Pham The Thong
April 22, 2011
Pham The Thong ( ) Clustering by mixture model April 22, 2011 1 / 44
2. Outline
1 RJMCMC in clustering
    Clustering overview
    Reversible Jump MCMC
2 Richardson & Green (1997): On Bayesian Analysis of Mixtures with an Unknown Number of Components
    Overview
    Split/Merge and Birth/Death Mechanism
    Algorithm
    Result
3 Tadesse et al. (2005): Bayesian Variable Selection in Clustering High-Dimensional Data
    Overview
    Variable Selection
    RJMCMC Mechanism
    Result
    Weakness of the model
4. RJMCMC in clustering Clustering overview
Clustering overview
Divide the observations into groups.
Predict the group of a new observation.
Model-based clustering: select a probabilistic model that underlies the observations and make statistical inferences based on that model. One popular model is the mixture model.
5. RJMCMC in clustering Clustering overview
Clustering via mixture model
Let $X = (x_1, \dots, x_n)$ be independent $p$-dimensional observations from $G$ populations, with mixture density

$$f(x_i \mid w, \theta) = \sum_{k=1}^{G} w_k \, f(x_i \mid \theta_k)$$

$f(x_i \mid \theta_k)$ is the density of an observation $x_i$ from the $k$th component.
$w = (w_1, \dots, w_G)^T$ are the component weights.
$\theta = (\theta_1, \dots, \theta_G)^T$ are the component parameters.
Clustering is done via the allocation vector $y = (y_1, \dots, y_n)^T$: $y_i = k$ if the $i$th observation $x_i$ comes from component $k$.
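The mixture density and the allocation probabilities can be illustrated with a minimal Python sketch. It assumes univariate normal components with $\theta_k = (\mu_k, \sigma_k)$, which the slide does not require; the function names and the numbers in the example are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal N(mu, sigma^2) at x (assumed component family)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, params):
    """f(x | w, theta) = sum_k w_k f(x | theta_k), with theta_k = (mu_k, sigma_k)."""
    return sum(w * normal_pdf(x, mu, sigma) for w, (mu, sigma) in zip(weights, params))

def allocation_probs(x, weights, params):
    """P(y_i = k | x_i, w, theta), proportional to w_k f(x_i | theta_k)."""
    terms = [w * normal_pdf(x, mu, sigma) for w, (mu, sigma) in zip(weights, params)]
    total = sum(terms)
    return [t / total for t in terms]

# Hypothetical two-component example.
weights = [0.3, 0.7]
params = [(0.0, 1.0), (5.0, 1.0)]
probs = allocation_probs(0.2, weights, params)
```

An observation near 0 is allocated to the first component with probability close to 1, even though that component has the smaller weight.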
6. RJMCMC in clustering Clustering overview
Some approaches
Model selection: compare a model selection criterion across fixed-G models for various values of G and choose the best G. Inference on a fixed-G model is often done via the EM algorithm or a Gibbs sampler.
Nonparametric method: use a Dirichlet process.
Trans-dimensional Markov chain Monte Carlo (MCMC): allow G to change during the inference process by combining a Gibbs sampler with MCMC moves that can change the dimension of the model. Reversible jump MCMC (RJMCMC) is one possible scheme.
8. RJMCMC in clustering Reversible Jump MCMC
Overview
First developed in Green (1995).
Its applications range well beyond mixture model analysis.
Its power for mixture model analysis was first demonstrated in Richardson & Green (1997), who considered only the 1-dimensional case.
Applied to the multidimensional setting in Tadesse et al. (2005).
9. RJMCMC in clustering Reversible Jump MCMC
Some advantages of clustering by RJMCMC
Avoids the task of model selection.
Provides a coherent Bayesian framework. The cluster number G is not treated as a special parameter.
Can provide useful summaries of the data that are difficult to obtain by other methods.
10. RJMCMC in clustering Reversible Jump MCMC
General ideas of RJMCMC I
Simulate a Markov chain that converges to the full posterior distribution $p(G, y, w, \theta \mid X)$.
The hybrid sampler consists of a Gibbs sampler (the base) and jump moves (the extension).
The Gibbs sampler samples $(y, w, \theta)$; the jump moves sample the cluster number $G$.
The jump moves come in pairs: Split/Merge and Birth/Death.
11. RJMCMC in clustering Reversible Jump MCMC
General ideas of RJMCMC II
Split move: split one component into two components.
Merge move: combine two components into one component.
Birth move: create an empty component.
Death move: delete an empty component.
At each iteration, propose a Split (Birth) move with some fixed probability $b_k$, and propose a Merge (Death) move with probability $1 - b_k$.
For each proposal, calculate all the changes to the model as if the move were made.
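The choice between the two moves of a pair can be sketched in Python as follows. The values of $b_k$ used here (0.5 in the interior, 1 when there is a single component, 0 at an assumed upper bound `k_max`) are illustrative assumptions; the slides only say that $b_k$ is fixed.

```python
import random

def propose_move_type(k, k_max, rng):
    """Choose 'split' or 'merge' for a model that currently has k components.

    b_k is the probability of proposing the dimension-increasing move.
    The specific values below are assumptions made for this sketch.
    """
    if k == 1:
        b_k = 1.0      # nothing to merge, so always propose a split
    elif k == k_max:
        b_k = 0.0      # at the assumed upper bound, so always propose a merge
    else:
        b_k = 0.5
    # rng.random() lies in [0, 1), so b_k = 1.0 always splits and b_k = 0.0 never does.
    return "split" if rng.random() < b_k else "merge"

rng = random.Random(0)
move = propose_move_type(3, 10, rng)
```

The same function can be reused for the Birth/Death pair by reading "split" as "birth" and "merge" as "death".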
12. RJMCMC in clustering Reversible Jump MCMC
General ideas of RJMCMC III
Calculate the acceptance ratio A, which is the product of three terms:
    the ratio of the posterior of the new model to that of the old model
    the ratio of the probability of proposing the reverse move (from the new model back to the old model) to that of proposing the forward move (from the old model to the new model)
    the Jacobian that arises from the change of dimension
To ensure convergence to the desired distribution, actually carry out the move only with probability min(1, A).
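The accept/reject step can be sketched in Python. Working with logarithms of the three terms is a design choice of this sketch (it avoids overflow when the posterior ratio is extreme), and the argument names are hypothetical; how each term is computed depends on the specific move and is not shown.

```python
import math
import random

def acceptance_probability(log_posterior_ratio, log_proposal_ratio, log_jacobian):
    """Return min(1, A), where log A is the sum of the three log terms."""
    log_a = log_posterior_ratio + log_proposal_ratio + log_jacobian
    return 1.0 if log_a >= 0.0 else math.exp(log_a)

def accept_move(log_posterior_ratio, log_proposal_ratio, log_jacobian, rng):
    """Carry out the proposed move with probability min(1, A)."""
    p = acceptance_probability(log_posterior_ratio, log_proposal_ratio, log_jacobian)
    return rng.random() < p

rng = random.Random(0)
accepted = accept_move(math.log(2.0), 0.0, 0.0, rng)  # A = 2, so the move is always accepted
```

For the reverse move of a pair, the ratio is $A^{-1}$, which in this log form means negating all three arguments.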
14. Richardson & Green (1997) Overview
Overview
1-dimensional data.
Goal:
    Clustering data.
    Estimating component parameters.
    Estimating the distribution of data.
    Predicting the group of new data.
Demonstrated on three real datasets: Enzyme, Acid, and Galaxy.
16. Richardson & Green (1997) Split/Merge and Birth/Death Mechanism
Split/Merge Mechanism
In a Split move, select one component $(w_{j^*}, \mu_{j^*}, \sigma_{j^*})$ and split it into two components $(w_{j_1}, \mu_{j_1}, \sigma_{j_1})$ and $(w_{j_2}, \mu_{j_2}, \sigma_{j_2})$.
In a Merge move, select two components $(w_{j_1}, \mu_{j_1}, \sigma_{j_1})$ and $(w_{j_2}, \mu_{j_2}, \sigma_{j_2})$ and merge them into one new component $(w_{j^*}, \mu_{j^*}, \sigma_{j^*})$.
In both moves, the zeroth, first, and second moments of the single component are set equal to those of the combination of the two components.
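The moment-matching constraint can be made concrete with a short Python sketch. The merge follows directly from equating the zeroth, first, and second moments. The split shown is one parametrisation that satisfies the same constraints, driven by three auxiliary numbers `u1`, `u2`, `u3` in (0, 1); the Beta distributions used to draw them here are assumptions of this sketch, not something stated on the slides.

```python
import math
import random

def merge(c1, c2):
    """Merge two components (w, mu, sigma) by matching moments 0, 1 and 2."""
    w1, mu1, s1 = c1
    w2, mu2, s2 = c2
    w = w1 + w2                                   # zeroth moment
    mu = (w1 * mu1 + w2 * mu2) / w                # first moment
    second = (w1 * (mu1 ** 2 + s1 ** 2) + w2 * (mu2 ** 2 + s2 ** 2)) / w
    return (w, mu, math.sqrt(second - mu ** 2))   # second moment

def split(c, u1, u2, u3):
    """Split one component into two whose combined moments equal those of c."""
    w, mu, sigma = c
    w1, w2 = w * u1, w * (1.0 - u1)
    mu1 = mu - u2 * sigma * math.sqrt(w2 / w1)
    mu2 = mu + u2 * sigma * math.sqrt(w1 / w2)
    var1 = u3 * (1.0 - u2 ** 2) * sigma ** 2 * w / w1
    var2 = (1.0 - u3) * (1.0 - u2 ** 2) * sigma ** 2 * w / w2
    return (w1, mu1, math.sqrt(var1)), (w2, mu2, math.sqrt(var2))

rng = random.Random(0)
# Assumed proposal distributions for the auxiliary variables.
u1, u2, u3 = rng.betavariate(2, 2), rng.betavariate(2, 2), rng.betavariate(1, 1)
parent = (0.4, 1.5, 2.0)
child1, child2 = split(parent, u1, u2, u3)
rebuilt = merge(child1, child2)
```

Merging the two children returns the parent up to rounding error, which is what makes the Split and Merge moves a reversible pair.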
17. Richardson & Green (1997) Split/Merge and Birth/Death Mechanism
Birth/Death Mechanism
Birth move
    Generate $(w_{j^*}, \mu_{j^*}, \sigma_{j^*})$ for a new, empty component from proposal distributions.
    Rescale the weights so that they sum to 1.
Death move
    Delete a randomly chosen empty component.
    Rescale the weights so that they sum to 1.
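The weight rescaling in the Birth and Death moves can be sketched in Python. Drawing the new weight from Beta(1, k) is an assumption of this sketch, and `new_mu` and `new_sigma` are hypothetical arguments standing in for draws from whatever distributions the sampler uses for the new component's parameters.

```python
import random

def birth(components, new_mu, new_sigma, rng):
    """Add an empty component and rescale the existing weights by (1 - w_new)."""
    k = len(components)
    w_new = rng.betavariate(1, k)   # assumed proposal for the new weight
    rescaled = [(w * (1.0 - w_new), mu, s) for (w, mu, s) in components]
    return rescaled + [(w_new, new_mu, new_sigma)]

def death(components, j):
    """Delete component j (assumed empty) and rescale the rest to sum to 1."""
    w_j = components[j][0]
    return [(w / (1.0 - w_j), mu, s)
            for i, (w, mu, s) in enumerate(components) if i != j]

rng = random.Random(1)
before = [(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)]
after_birth = birth(before, 10.0, 1.0, rng)
after_death = death(after_birth, 2)   # remove the component that was just born
```

A death applied to the component created by the preceding birth restores the original weights, so the two moves undo each other.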
19. Richardson & Green (1997) Algorithm
One iteration consists of:
Gibbs sampler:
    Updating the weights w
    Updating the parameters µ, σ
    Updating the allocation y
Split/Merge move
Birth/Death move
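Two of the Gibbs steps listed above can be sketched in Python. The sketch assumes normal components and a symmetric Dirichlet prior with parameter `delta` on the weights, neither of which is spelled out on the slide. The update of µ and σ is omitted because it depends on prior choices the slides do not give.

```python
import math
import random

def update_allocations(x, weights, mus, sigmas, rng):
    """Draw each y_i with probability proportional to w_k N(x_i; mu_k, sigma_k^2)."""
    labels = range(len(weights))
    y = []
    for xi in x:
        # The constant 1 / sqrt(2 * pi) cancels after normalisation, so it is dropped.
        probs = [w * math.exp(-0.5 * ((xi - m) / s) ** 2) / s
                 for w, m, s in zip(weights, mus, sigmas)]
        y.append(rng.choices(labels, weights=probs)[0])
    return y

def update_weights(y, k, delta, rng):
    """Draw w | y ~ Dirichlet(delta + n_1, ..., delta + n_k) via normalised Gamma draws."""
    counts = [0] * k
    for yi in y:
        counts[yi] += 1
    draws = [rng.gammavariate(delta + n, 1.0) for n in counts]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
x = [-0.1, 0.2, 4.9, 5.3]
y = update_allocations(x, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], rng)
w = update_weights(y, 2, 1.0, rng)
```

In a full sampler these two functions would be called once per iteration, followed by the parameter update and the two jump moves.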
21. Richardson & Green (1997) Result
Post simulation
By processing the raw output of the simulation, one can:
    cluster the data by selecting the allocation vector y that has the highest frequency.
    estimate the component parameters by their posterior means.
    estimate the distribution of the data.
    predict the group of new data.
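The post-processing can be sketched in Python, assuming the sampler has stored one value of G and one allocation vector per iteration. The function names and the toy samples are hypothetical.

```python
from collections import Counter

def posterior_of_G(g_samples):
    """Estimate p(G | X) by the relative frequency of each G in the chain."""
    counts = Counter(g_samples)
    n = len(g_samples)
    return {g: c / n for g, c in sorted(counts.items())}

def most_frequent_allocation(y_samples):
    """Return the allocation vector that occurs most often in the chain."""
    counts = Counter(tuple(y) for y in y_samples)
    return list(counts.most_common(1)[0][0])

def posterior_mean(values):
    """Posterior mean of a scalar parameter from its sampled values."""
    return sum(values) / len(values)

# Toy chain output, for illustration only.
g_samples = [2, 3, 3, 3, 4]
y_samples = [[0, 0, 1], [0, 0, 1], [0, 1, 1]]
g_posterior = posterior_of_G(g_samples)
best_y = most_frequent_allocation(y_samples)
```

In practice the early iterations would be discarded as burn-in before computing these summaries.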
22. Richardson & Green (1997) Result
The three datasets
Enzyme data: enzymatic activity of one enzyme in the blood of 245 unrelated people. The interest is in identifying subgroups of slow or fast activity as a marker of genetic polymorphism in the general population (i.e., to some extent, people in the same subgroup may have a similar genetic structure although they are unrelated).
Acid data: acidity level of 155 lakes in Wisconsin.
Galaxy data: velocities of 82 galaxies diverging from our galaxy.
26. Tadesse et al. (2005) Overview
Overview
High-dimensional data.
Goal:
    Variable selection.
    Clustering data.
    Predicting the group of new data.
Applied to microarray data.
28. Tadesse et.al.(2005) Variable Selection
Concept
Perhaps not all variables are useful for clustering.
By throwing away non-discriminating variables
(irrelevant variables) and clustering only on
discriminating variables (relevant variables) we may
improve clustering accuracy.
We can think of variable selection as one way to
generalize the basic approach “clustering by the full
set of variables” to “clustering by a subset of
variables”.
29. Tadesse et.al.(2005) Variable Selection
The model of Tadesse et.al. I
Introduce $\gamma = (\gamma_1, \dots, \gamma_p)$: $\gamma_j = 1$ if the $j$th variable is
a discriminating variable and $\gamma_j = 0$ if it is not.
Use $(\gamma)$ and $(\gamma^c)$ to index the discriminating variables and
the non-discriminating variables.
Three assumptions:
The set of discriminating variables and the set of
non-discriminating variables are independent.
Restricted to $(\gamma^c)$, the data $X_{(\gamma^c)}$ have a single
normal distribution (hence are unsuitable for clustering).
Restricted to $(\gamma)$, the data $X_{(\gamma)}$ have a mixture
distribution of $G$ normal components (hence are
suitable for clustering).
30. Tadesse et.al.(2005) Variable Selection
The model of Tadesse et.al. II
$(\eta_{(\gamma^c)}, \Omega_{(\gamma^c)})$: mean and covariance for the
non-discriminating variables.
$(\mu_{k(\gamma)}, \Sigma_{k(\gamma)})$: mean and covariance for the $k$th
component $C_k$.
The three assumptions can be written as
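\[
p(X \mid G, \gamma, w, y, \mu, \Sigma, \eta, \Omega)
= \prod_{i=1}^{n} N\left(x_{i(\gamma^c)}, \eta_{(\gamma^c)}, \Omega_{(\gamma^c)}\right)
\prod_{k=1}^{G} \prod_{x_i \in C_k} N\left(x_{i(\gamma)}, \mu_{k(\gamma)}, \Sigma_{k(\gamma)}\right)
\]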
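As a rough illustration of this factorized likelihood, the sketch below evaluates its logarithm in plain Python. The function names (cholesky, log_normal_pdf, log_likelihood) and the data layout (full-length means and p x p covariances from which the relevant sub-blocks are extracted) are assumptions made for the example, not part of the original slides.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_normal_pdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    d = len(x)
    if d == 0:
        return 0.0  # an empty block contributes a factor of 1
    low = cholesky(cov)
    # Solve low * z = (x - mean) by forward substitution.
    z = []
    for i in range(d):
        r = x[i] - mean[i] - sum(low[i][m] * z[m] for m in range(i))
        z.append(r / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def take(vec, idx):
    """Sub-vector of vec at the positions in idx."""
    return [vec[j] for j in idx]

def submatrix(mat, idx):
    """Square sub-block of mat restricted to rows and columns in idx."""
    return [[mat[a][b] for b in idx] for a in idx]

def log_likelihood(X, gamma, y, mu, Sigma, eta, Omega):
    """Log of the factorized likelihood.

    X: list of n observations of length p; gamma: 0/1 list of length p;
    y[i]: component index of observation i (assumed layout);
    mu[k], Sigma[k]: mean and covariance of component k;
    eta, Omega: mean and covariance of the non-discriminating block.
    """
    disc = [j for j, g in enumerate(gamma) if g == 1]
    non = [j for j, g in enumerate(gamma) if g == 0]
    total = 0.0
    for x, k in zip(X, y):
        # Non-discriminating variables: one normal shared by all observations.
        total += log_normal_pdf(take(x, non), take(eta, non), submatrix(Omega, non))
        # Discriminating variables: the normal of the assigned component.
        total += log_normal_pdf(take(x, disc), take(mu[k], disc), submatrix(Sigma[k], disc))
    return total
```

Working on the log scale avoids underflow when n or p is large; the weights w do not appear because the allocation y is conditioned on.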
31. Tadesse et.al.(2005) Variable Selection
Searching for γ
The problem of variable selection is recast as a
problem of searching for the most probable binary
vector $\gamma$.
Use a Metropolis search (of which simulated
annealing is one type).
At each step, randomly choose one of the following
two transition moves: flip one bit of $\gamma$ or swap two bits
of $\gamma$. Accept the move with probability
\[
\min\left\{1, \frac{p(\gamma^{new} \mid X, y, w, G)}{p(\gamma^{old} \mid X, y, w, G)}\right\}.
\]
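The flip/swap search can be sketched as below. This is a minimal illustration, not the authors' implementation: the 50/50 choice between moves, the swap being between a randomly chosen 1 and a randomly chosen 0, and the function toy_log_post (a stand-in for the log marginal posterior of γ) are all assumptions made for the example.

```python
import math
import random

def propose(gamma, rng):
    """Return a candidate gamma: flip one bit, or swap a 1 with a 0."""
    new = list(gamma)
    ones = [j for j, g in enumerate(new) if g == 1]
    zeros = [j for j, g in enumerate(new) if g == 0]
    if rng.random() < 0.5:
        j = rng.randrange(len(new))
        new[j] = 1 - new[j]
    elif ones and zeros:
        new[rng.choice(ones)] = 0
        new[rng.choice(zeros)] = 1
    # If a swap is chosen but impossible, the candidate equals the current state.
    return new

def metropolis_step(gamma, log_post, rng):
    """Accept the candidate with probability min(1, p(new) / p(old))."""
    cand = propose(gamma, rng)
    log_ratio = log_post(cand) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand
    return gamma

def toy_log_post(gamma, target=(1, 0, 1, 1, 0, 0, 1, 0)):
    """Hypothetical log posterior, sharply peaked at `target`."""
    return -50.0 * sum(g != t for g, t in zip(gamma, target))

rng = random.Random(0)
gamma = [0] * 8
for _ in range(2000):
    gamma = metropolis_step(gamma, toy_log_post, rng)
```

Both moves are symmetric proposals, so the acceptance ratio needs no proposal correction. In the real model, toy_log_post would be replaced by the marginalized posterior of γ given X, y, w, and G.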
33. Tadesse et.al.(2005) RJMCMC Mechanism
Difficulties in high dimension
Unlike the one-dimensional case, there is no obvious way
to split a covariance matrix into two covariance
matrices. Even if this could be done [4], the Jacobian
may not have a closed form.
The number of model parameters grows rapidly,
on the order of $p^2$. The chain may converge very slowly.
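To make the quadratic growth concrete, here is a count of the free parameters of a $G$-component normal mixture in $p$ dimensions, assuming unrestricted mean vectors and covariance matrices (a worked illustration, not a formula taken from the slides):

```latex
\[
\underbrace{G\,p}_{\text{means}}
+ \underbrace{G\,\frac{p(p+1)}{2}}_{\text{covariances}}
+ \underbrace{(G-1)}_{\text{weights}}
\]
% For G = 3:
%   p = 1   gives  3 + 3 + 2       = 8
%   p = 10  gives  30 + 165 + 2    = 197
%   p = 100 gives  300 + 15150 + 2 = 15452
```

The covariance term dominates, which is why integrating the means and covariances out of the sampler is attractive.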
34. Tadesse et.al.(2005) RJMCMC Mechanism
Approach of Tadesse et.al.
Integrate out the mean vectors and the covariance
matrices to obtain a marginalized posterior in which
only $G$, $w$, $\gamma$, and $y$ are involved.
Although quite tedious, the math follows a
standard framework: define conjugate priors for
the mean and covariance matrix and then integrate
them out.
The Split/Merge move only needs to split or merge the
weights of components. The Birth/Death
move is the same as in the one-dimensional case.
35. Tadesse et.al.(2005) RJMCMC Mechanism
Algorithm
One iteration contains
Metropolis search for γ
Gibbs sampler:
Updating the weights w
Updating the allocation y
Split/Merge move
Birth/Death move
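As one concrete piece of the iteration above, the Gibbs update of the weights can be sketched as follows. The sketch assumes a symmetric Dirichlet prior on w with hyperparameter delta, so that the full conditional is Dirichlet(delta + n_1, ..., delta + n_G), where n_k is the number of observations currently allocated to component k. The names sample_weights and delta are illustrative.

```python
import random

def sample_weights(y, G, delta=1.0, rng=random):
    """Draw w ~ Dirichlet(delta + n_1, ..., delta + n_G), n_k = #{i : y_i = k}.

    A Dirichlet draw is obtained by normalizing independent Gamma variates.
    """
    counts = [0] * G
    for k in y:
        counts[k] += 1
    draws = [rng.gammavariate(delta + n, 1.0) for n in counts]
    total = sum(draws)
    return [d / total for d in draws]

# Example: six observations allocated among three components.
w = sample_weights([0, 0, 1, 2, 2, 2], G=3, rng=random.Random(1))
```

Because the means and covariances are integrated out, this update depends on the data only through the allocation vector y.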
37. Tadesse et.al.(2005) Result
Post simulation
Since the means and covariances are integrated out,
there are no estimates of the component parameters.
Variable selection:
Method 1: select the vector $\gamma$ that has the highest
frequency.
Method 2: select all variables $j$ whose marginal posterior
probability exceeds some threshold $a$: $p(\gamma_j = 1 \mid X, G) \ge a$.
Clustering and group prediction can be done in the
same way as in the univariate case.
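The two variable selection rules can be sketched from the sampled γ vectors as below. The input `samples` is assumed to hold the γ draws from the iterations that visited the chosen G, and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter

def most_frequent_gamma(samples):
    """Method 1: the gamma vector visited most often."""
    return list(Counter(tuple(g) for g in samples).most_common(1)[0][0])

def marginal_inclusion(samples):
    """Estimate p(gamma_j = 1 | X, G) by the fraction of samples with gamma_j = 1."""
    n = len(samples)
    return [sum(g[j] for g in samples) / n for j in range(len(samples[0]))]

def select_by_threshold(samples, a=0.5):
    """Method 2: indices j whose estimated inclusion probability is >= a."""
    return [j for j, pj in enumerate(marginal_inclusion(samples)) if pj >= a]

# Toy draws of gamma for p = 3 variables.
samples = [[1, 0, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1]]
best = most_frequent_gamma(samples)          # [1, 0, 1]
probs = marginal_inclusion(samples)          # [0.75, 0.5, 0.75]
chosen = select_by_threshold(samples, 0.5)   # [0, 1, 2]
```

On this toy input the two methods disagree about variable 1, which shows that the most frequent vector and the marginal threshold rule need not select the same set.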
38. Tadesse et.al.(2005) Result
Microarray data
14 samples (the samples come from tissues).
The variables are genes. There are 762 variables.
By clustering the samples into subgroups, one may
find out which genes are relevant to each subgroup.
42. Tadesse et.al.(2005) Weakness of the model
Weakness of the model [5]
The independence assumption often leads to
cases in which an irrelevant variable is wrongly
identified as a discriminating one because it is
related to some discriminating variables.
It is not known whether one can relax this
assumption while still being able to perform
an RJMCMC-based fully Bayesian analysis.
43. Tadesse et.al.(2005) Weakness of the model
References
[1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo
computation and Bayesian model determination, Biometrika
82(4), 711-732.
[2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of
Mixtures with an Unknown Number of Components, J. R. Statist.
Soc. B 59(4), 731-792.
[3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable
Selection in Clustering High-Dimensional Data, Journal of the
American Statistical Association 100(470), 602-617.
[4] P. Dellaportas and I. Papageorgiou (2006), Multivariate
mixtures of normals with unknown number of components, Statistics
and Computing 16(1), 57-68.
[5] Maugis et al. (2009), Variable Selection for Clustering with
Gaussian Mixture Models, Biometrics 65, 701-709.