Slides for the ENBIS 2018 presentation of "Deep k-Means: Jointly Clustering with k-Means and Learning Representations" by Thibaut Thonet. Joint work with Maziar Moradi Fard and Eric Gaussier.
The document discusses the 0/1 knapsack problem and the greedy algorithm approach. It describes the knapsack problem as selecting a subset of items with weights and values that fit within a knapsack capacity while maximizing the total value. The greedy algorithm works by selecting the highest value item at each step that fits within remaining capacity. The document provides an example problem of selecting boxes to fill a knapsack of 15kg capacity to maximize profit. It outlines the recurrence relation and time/space complexity of the greedy knapsack algorithm.
This document provides information about an algorithms course, including the course syllabus and topics that will be covered. The course topics include introduction to algorithms, analysis of algorithms, algorithm design techniques like divide and conquer, greedy algorithms, dynamic programming, backtracking, and branch and bound. It also covers NP-hard and NP-complete problems. The syllabus outlines 5 units that will analyze performance, teach algorithm design methods, and solve problems using techniques like divide and conquer, dynamic programming, and backtracking. It aims to help students choose appropriate algorithms and data structures for applications and understand how algorithm design impacts program performance.
This document describes the closest pair problem and an efficient divide and conquer algorithm to solve it in O(n log n) time. The problem is to find the shortest distance between any two points in a set of n points. A naive O(n²) brute force approach compares all pairs of points. The divide and conquer approach partitions the points into halves, solves the subproblems recursively, and merges the results by checking at most 6 points within a rectangle region to find the closest pair between subsets.
Closest pair problems (Divide and Conquer) - Gem WeBlog
The document describes the divide and conquer algorithm for solving the closest pair problem. It divides the set of points into two equal subsets, recursively finds the closest pairs within each subset, and then examines point pairs between the two subsets that fall within a strip of width 2d, where d is the minimum of the closest pairs in each subset. It scans points within this strip to update the closest pair distance. The runtime of this algorithm is O(n log n) when applying the Master Theorem.
This document provides definitions and concepts related to graph theory. It begins with a brief history of graph theory and then defines basic concepts such as graphs, nodes, edges, adjacency, incidence, isomorphism, subgraphs, walks, trails, paths, connectedness, trees, and spanning trees. It also introduces different types of graphs including null graphs, cycle graphs, path graphs, complete graphs, bipartite graphs, and complete bipartite graphs. Finally, it discusses how vector spaces can be associated with graphs and defines the properties of cycle and cutset spaces.
Asymptotic notations (Big O, Omega, Theta) - swapnac12
The document discusses different asymptotic notations used to characterize the complexity of algorithms: Big-O(O) notation provides an upper bound, Big-Omega(Ω) provides a lower bound, and Big-Theta(Θ) indicates the same order of growth. It defines each notation, explaining that Big-O represents f(n) growing less than or equal to g(n), Big-Omega represents f(n) growing greater than or equal to g(n), and Big-Theta represents f(n) growing equal to g(n). The document then discusses basics of probability theory, defining a sample space as the set of all possible outcomes of an experiment, with events being subsets of the sample space.
The document discusses applications and algorithms for satisfiability problems in Boolean logic and functions. It covers topics such as truth tables, composing functions, optimization, decision trees, binary decision diagrams, satisfiability communities and conferences, SAT file formats, and algorithms like resolution, variable elimination, local search, and circuit-based value assignment.
This file contains the concepts of Class P, Class NP, NP-completeness, the Travelling Salesman Problem, the Clique Problem, the Vertex Cover problem, the Hamiltonian problem, FFT and DFT.
This document summarizes the DBSCAN clustering algorithm. DBSCAN finds clusters based on density, requiring only two parameters: Eps, which defines the neighborhood distance, and MinPts, the minimum number of points required to form a cluster. It can discover clusters of arbitrary shape. The algorithm works by expanding clusters from core points, which have at least MinPts points within their Eps-neighborhood. Points that are not part of any cluster are classified as noise. Applications include spatial data analysis, image segmentation, and automatic border detection in medical images.
- NP-hard problems are at least as hard as problems in NP. A problem is NP-hard if any problem in NP can be reduced to it in polynomial time.
- Cook's theorem states that if the SAT problem can be solved in polynomial time, then every problem in NP can be solved in polynomial time.
- Vertex cover problem is proven to be NP-hard by showing that independent set problem reduces to it in polynomial time, meaning there is a polynomial time algorithm that converts any instance of independent set into an instance of vertex cover.
- Therefore, if there was a polynomial time algorithm for vertex cover, it could be used to solve independent set in polynomial time. Since independent set is NP-complete, this establishes that vertex cover is NP-hard.
This document discusses algorithm analysis and complexity. It defines key terms like algorithm, asymptotic complexity, Big-O notation, and time complexity. It provides examples of analyzing simple algorithms like summing array elements. The running time is expressed as a function of input size n. Common complexities like constant, linear, quadratic, and exponential time are introduced. Nested loops and sequences of statements are analyzed. The goal of analysis is to classify algorithms into complexity classes to understand how input size affects runtime.
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that projects data onto a lower dimensional space to maximize separation between classes. It works by computing eigenvectors from within-class and between-class scatter matrices to generate a linear transformation of the data. The transformation projects the high-dimensional data onto a new subspace while preserving the separation between classes.
The document discusses pushdown automata (PDA). It defines a PDA as a 7-tuple that includes a set of states, input alphabet, stack alphabet, initial/start state, starting stack symbol, set of final/accepting states, and a transition function. PDAs operate on an input tape with a stack, and can accept languages that finite automata cannot, such as a^n b^n. The document provides examples of designing PDAs for specific languages and converting between context-free grammars and PDAs.
Riemannian stochastic variance reduced gradient on Grassmann manifold (ICCOPT... - Hiroyuki KASAI
Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite, number of loss functions. In this paper, we propose a novel Riemannian extension of the Euclidean stochastic variance reduced gradient algorithm (R-SVRG) to a compact manifold search space. To this end, we show the developments on the Grassmann manifold. The key challenges of averaging, addition, and subtraction of multiple gradients are addressed with notions like logarithm mapping and parallel translation of vectors on the Grassmann manifold. We present a global convergence analysis of the proposed algorithm with a decay step-size and a local convergence rate analysis under a fixed step-size under some natural assumptions. The proposed algorithm is applied on a number of problems on the Grassmann manifold like principal component analysis, low-rank matrix completion, and the Karcher mean computation. In all these cases, the proposed algorithm outperforms the standard Riemannian stochastic gradient descent algorithm.
Cheat Sheet for Machine Learning in Python: Scikit-learn - Karlijn Willems
Get started with machine learning in Python thanks to this scikit-learn cheat sheet, which is a handy one-page reference that guides you through the several steps to make your own machine learning models. Thanks to the code examples, you won't get lost!
Problem | Problem v/s Algorithm v/s Program | Types of Problems | Computational complexity | P class v/s NP class Problems | Polynomial time v/s Exponential time | Deterministic v/s non-deterministic Algorithms | Functions of non-deterministic Algorithms | Non-deterministic searching Algorithm | Non-deterministic sorting Algorithm | NP - Hard and NP - Complete Problems | Reduction | properties of reduction | Satisfiability problem and Algorithm
This document discusses graph coloring and its applications. It begins by defining graph coloring as assigning labels or colors to elements of a graph such that no two adjacent elements have the same color. It then provides examples of vertex coloring, edge coloring, and face coloring. The document also discusses the chromatic number and chromatic polynomial. It describes several real-world applications of graph coloring, including frequency assignment in cellular networks.
This document discusses k-means clustering and different initialization methods. K-means clustering partitions objects into k clusters based on their attributes, with objects in the same cluster being similar and objects in different clusters being dissimilar. The initialization method affects the clustering result and number of iterations, with better initialization methods leading to fewer iterations. The document compares random, Forgy, MacQueen, and Kaufman initialization methods.
Introduction to Algorithms and Asymptotic Notation - Amrinder Arora
Asymptotic Notation is a notation used to represent and compare the efficiency of algorithms. It is a concise notation that deliberately omits details, such as constant time improvements, etc. Asymptotic notation consists of 5 commonly used symbols: big oh, small oh, big omega, small omega, and theta.
Red-black trees are binary search trees where each node is colored red or black. They provide O(log N) operations by ensuring that every path from root to leaf contains the same number of black nodes. They can be viewed as representations of 2-3-4 trees, where each black node corresponds to a node in the 2-3-4 tree and each red node is placed in the same node as its black parent. Inserting a new node may cause a "double red" violation, requiring restructuring by rotations or recoloring to maintain the red-black properties.
The document discusses the convex hull algorithm. It begins by defining a convex hull as the shape a rubber band would take if stretched around pins on a board. It then provides explanations of extreme points, edges, and applications of convex hulls. Various algorithms for finding convex hulls are presented, including divide and conquer in O(n log n) time and Jarvis march in O(n^2) time in the worst case.
This document provides information about clustering and cluster analysis. It begins by defining clustering as the process of grouping objects into classes of similar objects. It then discusses what a cluster is and different types of clustering techniques, including partitioning methods like k-means clustering. K-means clustering is explained as an algorithm that assigns objects to clusters based on minimizing distance between objects and cluster centers, then updating the cluster centers. Examples are provided to demonstrate how k-means clustering works on a sample dataset.
Mask R-CNN is an algorithm for instance segmentation that builds upon Faster R-CNN by adding a branch for predicting masks in parallel with bounding boxes. It uses a Feature Pyramid Network to extract features at multiple scales, and RoIAlign instead of RoIPool for better alignment between masks and their corresponding regions. The architecture consists of a Region Proposal Network for generating candidate object boxes, followed by two branches - one for classification and box regression, and another for predicting masks with a fully convolutional network using per-pixel sigmoid activations and binary cross-entropy loss. Mask R-CNN achieves state-of-the-art performance on standard instance segmentation benchmarks.
This document discusses the merge sort algorithm for sorting a sequence of numbers. It begins by introducing the divide and conquer approach, which merge sort uses. It then provides an example of how merge sort works, dividing the sequence into halves, sorting the halves recursively, and then merging the sorted halves together. The document proceeds to provide pseudocode for the merge sort and merge algorithms. It analyzes the running time of merge sort using recursion trees, determining that it runs in O(n log n) time. Finally, it covers techniques for solving recurrence relations that arise in algorithms like divide and conquer approaches.
This document presents information about graph coloring. It defines graph coloring as assigning colors to the vertices of a graph so that no adjacent vertices have the same color. There are two types: edge coloring and vertex coloring. Graph coloring is useful for problems like scheduling and radio channel assignment. However, finding the minimum number of colors to color a graph is an NP-complete problem, meaning there is no known efficient algorithm. Some common graph coloring algorithms discussed are greedy algorithm and Welsh-Powell algorithm.
Information-theoretic clustering with applications - Frank Nielsen
Information-theoretic clustering with applications
Abstract: Clustering is a fundamental and key primitive to discover structural groups of homogeneous data in data sets, called clusters. The most famous clustering technique is the celebrated k-means clustering that seeks to minimize the sum of intra-cluster variances. k-Means is NP-hard as soon as the dimension and the number of clusters are both greater than 1. In the first part of the talk, we first present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means but also other kinds of clustering algorithms like the k-medoids, the k-medians, the k-centers, etc.
We extend the method to incorporate cluster size constraints and show how to choose the appropriate number of clusters using model selection. We then illustrate and refine the method on two case studies: 1D Bregman clustering and univariate statistical mixture learning maximizing the complete likelihood. In the second part of the talk, we introduce a generalization of k-means to cluster sets of histograms that has become an important ingredient of modern information processing due to the success of the bag-of-word modelling paradigm.
Clustering histograms can be performed using the celebrated k-means centroid-based algorithm. We consider the Jeffreys divergence that symmetrizes the Kullback-Leibler divergence, and investigate the computation of Jeffreys centroids. We prove that the Jeffreys centroid can be expressed analytically using the Lambert W function for positive histograms. We then show how to obtain a fast guaranteed approximation when dealing with frequency histograms and conclude with some remarks on the k-means histogram clustering.
References: - Optimal interval clustering: Application to Bregman clustering and statistical mixture learning IEEE ISIT 2014 (recent result poster) http://arxiv.org/abs/1403.2485
- Jeffreys Centroids: A Closed-Form Expression for Positive Histograms and a Guaranteed Tight Approximation for Frequency Histograms.
IEEE Signal Process. Lett. 20(7): 657-660 (2013) http://arxiv.org/abs/1303.7286
http://www.i.kyoto-u.ac.jp/informatics-seminar/
The document outlines various statistical and data analysis techniques that can be performed in R including importing data, data visualization, correlation and regression, and provides code examples for functions to conduct t-tests, ANOVA, PCA, clustering, time series analysis, and producing publication-quality output. It also reviews basic R syntax and functions for computing summary statistics, transforming data, and performing vector and matrix operations.
The document discusses various clustering algorithms and concepts:
1) K-means clustering groups data by minimizing distances between points and cluster centers, but it is sensitive to initialization and may find local optima.
2) K-medians clustering is similar but uses point medians instead of means as cluster representatives.
3) K-center clustering aims to minimize maximum distances between points and clusters, and can be approximated with a farthest-first traversal algorithm.
MVPA with SpaceNet: sparse structured priors - Elvis DOHMATOB
The GraphNet (aka S-Lasso), as well as other “sparsity + structure” priors like TV (Total-Variation), TV-L1, etc., are not easily applicable to brain data because of technical problems relating to the selection of the regularization parameters. Also, in their own right, such models lead to challenging high-dimensional optimization problems. In this manuscript, we present some heuristics for speeding up the overall optimization process: (a) early stopping, whereby one halts the optimization process when the test score (performance on left-out data) for the internal cross-validation for model selection stops improving, and (b) univariate feature screening, whereby irrelevant (non-predictive) voxels are detected and eliminated before the optimization problem is entered, thus reducing the size of the problem. Empirical results with GraphNet on real MRI (Magnetic Resonance Imaging) datasets indicate that these heuristics are a win-win strategy, as they add speed without sacrificing the quality of the predictions. We expect the proposed heuristics to work on other models like TV-L1, etc.
Optimal interval clustering: Application to Bregman clustering and statistica... - Frank Nielsen
This document summarizes an academic paper on optimal interval clustering and its applications to Bregman clustering and statistical mixture learning. It begins by introducing hard clustering and center-based clustering approaches. It then describes how k-means clustering is NP-hard in higher dimensions but polynomial-time in 1D using dynamic programming. The document outlines an optimal interval clustering algorithm using dynamic programming with runtime O(n²k T1(n)), or O(n² T1(n)) using a lookup table. It discusses how this can be applied to 1D Bregman clustering and learning statistical mixtures, providing experimental results on Gaussian mixture models. Finally, it considers perspectives on hierarchical clustering, dynamic clustering maintenance, and streaming approximations.
Digital Signal Processing [ECEG-3171] - Ch1_L03 - Rediet Moges
This Digital Signal Processing lecture material is the property of the author (Rediet M.). It is not for publication, nor is it to be sold or reproduced.
The document summarizes a presentation on revocable identity-based encryption (RIBE) from codes with rank metric. Key points:
- RIBE adds an efficient revocation procedure to identity-based encryption by using a binary tree structure and key updates.
- The construction is based on low rank parity-check codes, with the master secret key defined as the "trapdoor" generated by the RankSign algorithm.
- Security relies on the rank syndrome decoding problem. Key updates are done efficiently through the binary tree with logarithmic complexity.
- Parameters are given that allow decoding of up to 2wr errors with small failure probability, suitable for the identity-based encryption scheme.
Parallel Computing 2007: Bring your own parallel application - Geoffrey Fox
This document discusses parallelizing several algorithms and applications including k-means clustering, frequent itemset mining, integer programming, computer chess, and support vector machines (SVM). For k-means and frequent itemset mining, the algorithms can be parallelized by partitioning the data across processors and performing partial computations locally before combining results with an allreduce operation. Computer chess can be parallelized by exploring different game tree branches simultaneously on different processors. SVM problems involve large dense matrices that are difficult to solve in parallel directly due to their size exceeding memory; alternative approaches include solving smaller subproblems independently.
This document summarizes and compares two popular Python libraries for graph neural networks - Spektral and PyTorch Geometric. It begins by providing an overview of the basic functionality and architecture of each library. It then discusses how each library handles data loading and mini-batching of graph data. The document reviews several common message passing layer types implemented in both libraries. It provides an example comparison of using each library for a node classification task on the Cora dataset. Finally, it discusses a graph classification comparison in PyTorch Geometric using different message passing and pooling layers on the IMDB-binary dataset.
The document describes smart multitask clustering and multitask kernel clustering techniques. It introduces multitask Bregman clustering (MBC) which can have negative effects due to shared clusters. Smart multitask Bregman clustering and smart multitask kernel clustering are proposed to minimize these negative effects by comparing local losses between single and multitask clustering. The techniques are evaluated on three cases of the 20 Newsgroups dataset using clustering accuracy and normalized mutual information.
Introduction to machine learning terminology.
Applications within High Energy Physics and outside HEP.
* Basic problems: classification and regression.
* Nearest neighbours approach and spatial indices
* Overfitting (intro)
* Curse of dimensionality
* ROC curve, ROC AUC
* Bayes optimal classifier
* Density estimation: KDE and histograms
* Parametric density estimation
* Mixtures for density estimation and EM algorithm
* Generative approach vs discriminative approach
* Linear decision rule, intro to logistic regression
* Linear regression
This document introduces a generalized method for constructing sub-quadratic complexity multipliers for finite fields of characteristic 2. It begins by reintroducing the Winograd short convolution algorithm in the context of polynomial multiplication. It then presents a recursive construction technique that extends any d-point multiplier into an n = d^k-point multiplier with sub-quadratic area and logarithmic delay complexity. Several new constructions are obtained using this technique, one of which is identical to the Karatsuba multiplier. The techniques aim to develop bit-parallel multipliers with better time and/or space complexity than the traditional quadratic complexity approaches.
A generalized class of normalized distance functions called Q-Metrics is described in this presentation. The Q-Metrics approach relies on a unique functional, using a single bounded parameter (Lambda), which characterizes the conventional distance functions in a normalized per-unit metric space. In addition to this coverage property, a distinguishing and extremely attractive characteristic of the Q-Metric function is its low computational complexity. Q-Metrics satisfy the standard metric axioms. Novel networks for classification and regression tasks are defined and constructed using Q-Metrics. These new networks are shown to outperform conventional feed forward back propagation networks with the same size when tested on real data sets.
This document discusses dynamic programming and greedy algorithms. It begins by defining dynamic programming as a technique for solving problems with overlapping subproblems. It provides examples of dynamic programming approaches to computing Fibonacci numbers, binomial coefficients, the knapsack problem, and other problems. It also discusses greedy algorithms and provides examples of their application to problems like the change-making problem, minimum spanning trees, and single-source shortest paths.
This document discusses dynamic programming and greedy algorithms. It begins by defining dynamic programming as a technique for solving problems with overlapping subproblems. Examples provided include computing the Fibonacci numbers and binomial coefficients. Greedy algorithms are introduced as constructing solutions piece by piece through locally optimal choices. Applications discussed are the change-making problem, minimum spanning trees using Prim's and Kruskal's algorithms, and single-source shortest paths. Floyd's algorithm for all pairs shortest paths and optimal binary search trees are also summarized.
This document summarizes a semi-supervised regression method that combines graph Laplacian regularization with cluster ensemble methodology. It proposes using a weighted averaged co-association matrix from the cluster ensemble as the similarity matrix in graph Laplacian regularization. The method (SSR-LRCM) finds a low-rank approximation of the co-association matrix to efficiently solve the regression problem. Experimental results on synthetic and real-world datasets show SSR-LRCM achieves significantly better prediction accuracy than an alternative method, while also having lower computational costs for large datasets. Future work will explore using a hierarchical matrix approximation instead of low-rank.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... - University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
ESPP presentation to EU Waste Water Network, 4th June 2024 “EU policies driving nutrient removal and recycling
and the revised UWWTD (Urban Waste Water Treatment Directive)”
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx - MAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poorquality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
Authoring a personal GPT for your research and practice: How we created the Q... - Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
The debris of the ‘last major merger’ is dynamically young - Sérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the ‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space, because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia DR3 have positive caustic velocities, making them fundamentally different than the phase-mixed chevrons found in simulations at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based on a simple phase-mixing model, the observed number of caustics are consistent with a merger that occurred 1–2 Gyr ago. We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data 1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’ did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within the last few Gyr, consistent with the body of work surrounding the VRM.
This MS Word-generated PowerPoint presentation covers major details about the micronuclei test: its significance and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism; this formation takes place during chromosomal separation at metaphase.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub... - Leonel Morgado
Current descriptions of immersive learning cases are often difficult or impossible to compare. This is due to a myriad of different options on what details to include, which aspects are relevant, and on the descriptive approaches employed. Also, these aspects often combine very specific details with more general guidelines or indicate intents and rationales without clarifying their implementation. In this paper we provide a method to describe immersive learning cases that is structured to enable comparisons, yet flexible enough to allow researchers and practitioners to decide which aspects to include. This method leverages a taxonomy that classifies educational aspects at three levels (uses, practices, and strategies) and then utilizes two frameworks, the Immersive Learning Brain and the Immersion Cube, to enable a structured description and interpretation of immersive learning cases. The method is then demonstrated on a published immersive learning case on training for wind turbine maintenance using virtual reality. Applying the method results in a structured artifact, the Immersive Learning Case Sheet, that tags the case with its proximal uses, practices, and strategies, and refines the free text case description to ensure that matching details are included. This contribution is thus a case description method in support of future comparative research of immersive learning cases. We then discuss how the resulting description and interpretation can be leveraged to change immersion learning cases, by enriching them (considering low-effort changes or additions) or innovating (exploring more challenging avenues of transformation). The method holds significant promise to support better-grounded research in immersive learning.
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
ENBIS 2018 presentation on Deep k-Means
1. Deep k-Means: Jointly Clustering with k-Means and Learning Representations
Thibaut THONET
thibaut.thonet@univ-grenoble-alpes.fr
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG
Joint work with Maziar MORADI FARD and Eric GAUSSIER
5 September 2018 @ ENBIS, Nancy
2. Clustering
Clustering is the process of organizing unlabeled objects into groups (clusters) whose members are similar in some way.
Clustering approaches may be classified as:
- Hard clustering: each object belongs to at most one cluster
- Soft clustering: each object can belong to more than one cluster
3. k-Means clustering
k-Means is a centroid-based approach for hard clustering [MacQueen, 1967].
Given a set of objects X, k-Means clustering aims to group the objects into k clusters of similar samples by minimizing the following loss function:
\min_R \sum_{x \in X} \| x - c(x; R) \|_2^2
where R are the cluster centers and c(x; R) = \arg\min_{r \in R} \| x - r \|_2 is the nearest cluster center to x.
[Figure: alternation between "assign objects to clusters" and "update cluster centers" with centers r_1, r_2, ..., r_K]
4. k-Means clustering
(Same definition and loss function as on the previous slide.)
...But the input space is often high-dimensional, sparse and/or with redundant dimensions
⇒ It may not be suitable for clustering
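Before moving to the embedded variants, here is a minimal NumPy sketch of Lloyd's two-step alternation for this loss. It is purely illustrative (not the authors' code); the data matrix `X`, the number of clusters `k`, and the iteration count are placeholders.

```python
# Minimal NumPy sketch of Lloyd's algorithm for the k-Means objective
# min_R sum_x ||x - c(x; R)||_2^2 (illustrative only, not the authors' code).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the cluster centers R with k randomly chosen objects.
    R = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: c(x; R) = argmin_{r in R} ||x - r||_2 for every object x.
        dists = np.linalg.norm(X[:, None, :] - R[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned objects.
        for j in range(k):
            if np.any(assign == j):
                R[j] = X[assign == j].mean(axis=0)
    loss = np.sum((X - R[assign]) ** 2)  # value of the k-Means loss
    return R, assign, loss
```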
5. k-Means in an embedded space: Auto-Encoder + k-Means
1. Train an auto-encoder on the dataset to learn object embeddings (e.g., for text, low-dimensional dense representations):
\min_\theta \sum_{x} L_{rec}(x) = \sum_{x} \| x - Auto(x) \|_2^2
2. Perform k-Means in the embedding space:
\min_R \sum_{x} L_{clust}(x) = \sum_{x} \| h_\theta(x) - c(h_\theta(x); R) \|_2^2
with c(h_\theta(x); R) = \arg\min_{r \in R} \| h_\theta(x) - r \|_2
[Figure: auto-encoder mapping x to the embedding h_\theta(x) and reconstruction Auto(x); k-Means with centers r_1, r_2, ..., r_K run in the embedding space]
6. k-Means in an embedded space: Auto-Encoder + k-Means
(Same two-step procedure as on the previous slide.)
...But embeddings are not specifically learned for clustering purposes
⇒ They may still not be suitable for clustering
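A minimal sketch of this two-step pipeline, assuming a small PyTorch auto-encoder and scikit-learn's KMeans; the layer sizes, optimizer, and training budget are illustrative placeholders, not the setup from the slides.

```python
# Two-step baseline: (1) train an auto-encoder for reconstruction only,
# (2) run k-Means on the learned embeddings h_theta(x). Illustrative sketch.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def train_autoencoder(X, embed_dim=10, epochs=50, lr=1e-3):
    d = X.shape[1]
    encoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, embed_dim))
    decoder = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, d))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        h = encoder(X)                                        # embeddings h_theta(x)
        loss_rec = ((X - decoder(h)) ** 2).sum(dim=1).mean()  # ||x - Auto(x)||_2^2
        loss_rec.backward()
        opt.step()
    return encoder

# Usage (X: float tensor of shape (n_objects, d)):
# encoder = train_autoencoder(X)
# embeddings = encoder(X).detach().numpy()
# labels = KMeans(n_clusters=10).fit_predict(embeddings)  # step 2: k-Means in the embedding space
```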
7. k-Means in an embedded space: Deep Clustering Network
The Deep Clustering Network (DCN) [Yang+, 2017] alternately (i) learns cluster representatives R and auto-encoder parameters θ using SGD and (ii) assigns data points to the cluster with the nearest representative in the embedding space:
\min_{R, \theta} L = \sum_{x} L_{rec}(x) + \lambda L_{clust}(x)
with L_{rec}(x) = \| x - Auto(x) \|_2^2
L_{clust}(x) = \| h_\theta(x) - c(h_\theta(x); R) \|_2^2
and c(h_\theta(x); R) = \arg\min_{r \in R} \| h_\theta(x) - r \|_2
[Figure: auto-encoder with reconstruction loss L_rec and clustering loss L_clust computed on the embedding h_\theta(x), with cluster centers r_1, r_2, ..., r_K]
8. k-Means in an embedded space: Deep Clustering Network
The Deep Clustering Network (DCN) [Yang+, 2017] alternatively (i) learns cluster
representatives R and auto-encoder parameters θ using SGD and (ii) assigns data
points to the cluster with the nearest representative in the embedding space
$\min_{R, \theta} \mathcal{L} = \sum_{x} \mathcal{L}_{rec}(x) + \lambda \, \mathcal{L}_{clust}(x)$
with $\mathcal{L}_{rec}(x) = \|x - \text{Auto}(x)\|_2^2$, $\mathcal{L}_{clust}(x) = \|h_\theta(x) - c(h_\theta(x); R)\|_2^2$,
and $c(h_\theta(x); R) = \arg\min_{r \in R} \|h_\theta(x) - r\|_2$
[Figure: auto-encoder with embedding $h_\theta(x)$, reconstruction $\text{Auto}(x)$, and cluster representatives $r_1, r_2, \dots$]
...But the discrete assignments (argmin) make it impossible to rely solely on SGD
⇒ non-joint and less scalable training
Thibaut Thonet Deep k-Means 5 / 16
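A rough, simplified sketch of one such alternation, assuming SGD updates for both θ and R as stated on the slide; the actual DCN update rules differ in their details, and all names here are illustrative.

```python
import torch

def dcn_step(x, encoder, decoder, R, opt, lam=1.0):
    """One simplified DCN-style alternation: (ii) hard assignment, then (i) SGD update."""
    h = encoder(x)
    # (ii) Assign each point to its nearest representative; argmin is discrete,
    # so no gradient can flow through this step (hence the detach)
    assign = torch.cdist(h.detach(), R.detach()).argmin(dim=1)
    # (i) SGD step on theta and R with the assignments held fixed
    rec = ((decoder(h) - x) ** 2).sum(dim=1)       # L_rec(x)
    clust = ((h - R[assign]) ** 2).sum(dim=1)      # L_clust(x) with fixed assignments
    loss = (rec + lam * clust).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```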
9. Deep k-means: overview
$\min_{R, \theta} \mathcal{L} = \sum_{x} \mathcal{L}_{rec}(x) + \lambda \, \mathcal{L}_{clust}(x)$
with $\mathcal{L}_{rec}(x) = \|x - \text{Auto}(x)\|_2^2$
and $\mathcal{L}_{clust}(x) = \sum_{k} \text{closeness}(h_\theta(x), r_k) \times \|h_\theta(x) - r_k\|_2^2$
[Figure: auto-encoder with embedding $h_\theta(x)$, reconstruction $\text{Auto}(x)$, and cluster representatives $r_1, r_2, \dots$]
Thibaut Thonet Deep k-Means 6 / 16
11. Deep k-means: a differentiable surrogate to DCN
We propose to solve a fully differentiable surrogate of DCN's problem [Moradi Fard+,
2018]:
$P^{(\alpha)}_{\text{DKM}}: \; \min_{R, \theta} \mathcal{L}^{(\alpha)} = \sum_{x \in X} \mathcal{L}_{rec}(x) + \lambda \, \mathcal{L}^{(\alpha)}_{clust}(x)$
with $\mathcal{L}^{(\alpha)}_{clust}(x) = \sum_{r \in R} \text{closeness}(h_\theta(x), r; \alpha) \times \|h_\theta(x) - r\|_2^2$
such that:
$\text{closeness}(h_\theta(x), r; \alpha)$ is differentiable w.r.t. both $\theta$ and $r$
$\lim_{\alpha \to \infty} \text{closeness}(h_\theta(x), r; \alpha) = 1$ if $r = \arg\min_{r' \in R} \|h_\theta(x) - r'\|_2$, and $0$ otherwise
Intuitively, $\text{closeness}(h_\theta(x), r; \alpha)$ can be seen as a relaxation of DCN's hard
clustering assignments such that $\lim_{\alpha \to \infty} P^{(\alpha)}_{\text{DKM}} = P_{\text{DCN}}$ holds
Thibaut Thonet Deep k-Means 7 / 16
13. Deep k-means: choice of closeness and α
We define closeness based on a parameterized softmax:
$\text{closeness}(h_\theta(x), r; \alpha) = \dfrac{\exp(-\alpha \|h_\theta(x) - r\|_2^2)}{\sum_{r' \in R} \exp(-\alpha \|h_\theta(x) - r'\|_2^2)}$
where $\alpha$ can either be set to a constant or progressively increased (deterministic
annealing)
$\alpha$ plays two roles: (a) approximation of hard clustering and (b) inverse temperature
in a deterministic annealing scheme
DKM_a: random initialization of $\theta$ and $R$ + annealing: sequence $(\alpha_n)_n$ with $\alpha_1 = 0.1$
DKM_p: pretraining of $\theta$ and k-Means-based initialization of $R$ + no annealing: constant $\alpha = 1000$
where the sequence $(\alpha_n)_n$ is defined as $\alpha_{n+1} = 2^{1/\log(n)^2} \times \alpha_n$
Thibaut Thonet Deep k-Means 8 / 16
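A small Python sketch of the parameterized softmax and of the annealing sequence; the handling of the first term of the sequence (where log(1) = 0 makes the recurrence undefined) is an assumption made only for illustration.

```python
import math
import torch

def closeness(h, R, alpha):
    """Parameterized softmax: h is (batch, dim), R is (K, dim); returns (batch, K) weights."""
    sq_dists = torch.cdist(h, R) ** 2               # ||h_theta(x) - r||_2^2 for every r in R
    return torch.softmax(-alpha * sq_dists, dim=1)  # exp(-alpha d_r) / sum_{r'} exp(-alpha d_{r'})

def annealing_schedule(alpha1=0.1, alpha_max=1000.0):
    """Annealing sequence alpha_{n+1} = 2^(1 / log(n)^2) * alpha_n, starting at alpha_1 = 0.1.
    The recurrence is undefined at n = 1, so alpha_1 is simply repeated once here."""
    alphas = [alpha1, alpha1]
    n = 2
    while alphas[-1] < alpha_max:
        alphas.append(2 ** (1 / math.log(n) ** 2) * alphas[-1])
        n += 1
    return alphas
```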
14. Deep k-means: SGD-based training algorithm
Algorithm 1: Deep k-Means
Input: data X, number of clusters K, trade-off hyperparameter λ, scheme for α, number of epochs T, number of minibatches N, learning rate η
Output: auto-encoder parameters θ, cluster representatives R
Initialize θ and r_k, 1 ≤ k ≤ K (randomly or through pretraining)
for each α do  # α levels (if α is not constant)
    for t = 1 to T do  # epochs per α
        for n = 1 to N do  # minibatches
            Draw a minibatch X̃ ⊂ X
            Update (θ, R) ← (θ, R) − η (1/|X̃|) ∇_(θ,R) L̃^(α)
        end for
    end for
end for
Thibaut Thonet Deep k-Means 9 / 16
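Below is a minimal PyTorch sketch of the training loop for the constant-α case (no annealing loop), where every SGD step jointly updates θ and R through the differentiable loss; the architecture, optimizer, and all names are illustrative assumptions rather than the authors' implementation, and the pretraining/k-Means initialization used by DKM_p is omitted.

```python
import torch
import torch.nn as nn

def train_dkm(X, K, lam=1.0, alpha=1000.0, epochs=50, batch_size=256, lr=1e-3):
    """Minimal Deep k-Means training loop (constant alpha): each SGD step jointly
    updates the auto-encoder parameters theta and the cluster representatives R."""
    d = X.shape[1]
    encoder = nn.Sequential(nn.Linear(d, 500), nn.ReLU(), nn.Linear(500, K))
    decoder = nn.Sequential(nn.Linear(K, 500), nn.ReLU(), nn.Linear(500, d))
    R = nn.Parameter(torch.randn(K, K) * 0.1)  # K representatives in the K-dim embedding space
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()) + [R], lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):           # minibatches X_tilde
            xb = X[perm[i:i + batch_size]]
            h = encoder(xb)
            rec = ((decoder(h) - xb) ** 2).sum(dim=1)    # L_rec(x)
            sq_dists = torch.cdist(h, R) ** 2            # ||h_theta(x) - r||_2^2
            w = torch.softmax(-alpha * sq_dists, dim=1)  # closeness(h_theta(x), r; alpha)
            clust = (w * sq_dists).sum(dim=1)            # L_clust^(alpha)(x)
            loss = (rec + lam * clust).mean()            # minibatch estimate of L^(alpha)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder, R
```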
15. Experimental setup
AE architecture: encoder with d-500-500-2000-K neurons and mirrored decoder
Baselines
k-Means
AE + k-Means
Deep Clustering Network [Yang+, 2017]
Improved Deep Embedded Clustering [Guo+, 2017]
Datasets
Text
20 Newsgroups: 20 classes, 18,846 samples
RCV1: 4 classes, 10,000 samples
Image
MNIST: 10 classes, 70,000 samples
USPS: 10 classes, 9,298 samples
Clustering metrics
Clustering accuracy (ACC)
Normalized Mutual Information (NMI)
Thibaut Thonet Deep k-Means 10 / 16
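For concreteness, a PyTorch definition of such an encoder/decoder pair might look as follows; the choice of ReLU activations and of leaving the embedding and output layers linear are assumptions, not specified on the slide.

```python
import torch.nn as nn

def make_autoencoder(d, K):
    """Encoder d-500-500-2000-K and mirrored decoder K-2000-500-500-d."""
    encoder = nn.Sequential(
        nn.Linear(d, 500), nn.ReLU(),
        nn.Linear(500, 500), nn.ReLU(),
        nn.Linear(500, 2000), nn.ReLU(),
        nn.Linear(2000, K),               # K-dimensional embedding h_theta(x)
    )
    decoder = nn.Sequential(
        nn.Linear(K, 2000), nn.ReLU(),
        nn.Linear(2000, 500), nn.ReLU(),
        nn.Linear(500, 500), nn.ReLU(),
        nn.Linear(500, d),                # reconstruction Auto(x)
    )
    return encoder, decoder
```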
16. Clustering performance
Mean ± std for ACC and NMI computed over 10 (seeded) runs. Bold (resp. underlined)
values correspond to results with no significant difference (p > 0.05) to the best
approach with (resp. without) pretraining for each dataset/metric pair
Model      MNIST                  USPS                   20NEWS                 RCV1
           ACC       NMI          ACC       NMI          ACC       NMI          ACC       NMI
KM         53.5±0.3  49.8±0.5     67.3±0.1  61.4±0.1     23.2±1.5  21.6±1.8     50.8±2.9  31.3±5.4
AE-KM      80.8±1.8  75.2±1.1     72.9±0.8  71.7±1.2     49.0±2.9  44.5±1.5     56.7±3.6  31.5±4.3
Deep clustering approaches without pretraining
DCN_np     34.8±3.0  18.1±1.0     36.4±3.5  16.9±1.3     17.9±1.0   9.8±0.5     41.3±4.0   6.9±1.8
IDEC_np    61.8±3.0  62.4±1.6     53.9±5.1  50.0±3.8     22.3±1.5  22.3±1.5     56.7±5.3  31.4±2.8
DKM_a      82.3±3.2  78.0±1.9     75.5±6.8  73.0±2.3     44.8±2.4  42.8±1.1     53.8±5.5  28.0±5.8
Deep clustering approaches with pretraining
DCN_p      81.1±1.9  75.7±1.1     73.0±0.8  71.9±1.2     49.2±2.9  44.7±1.5     56.7±3.6  31.6±4.3
IDEC_p     85.7±2.4  86.4±1.0     75.2±0.5  74.9±0.6     40.5±1.3  38.2±1.0     59.5±5.7  34.7±5.0
DKM_p      84.0±2.2  79.6±0.9     75.7±1.3  77.6±1.1     51.2±2.8  46.7±1.2     58.3±3.8  33.1±4.9
Thibaut Thonet Deep k-Means 11 / 16
18. Conclusion
We proposed Deep k-Means, a new approach to jointly perform k-Means
clustering and representation learning
Take-home messages:
Pretraining is clearly beneficial to deep clustering
The differentiable formulation of DKM enables fully joint SGD training and thus
efficient use of GPUs
k-Means-based approaches can perform on par with state-of-the-art deep
clustering approaches
Thibaut Thonet Deep k-Means 13 / 16
20. Ongoing work: Constrained Deep k-Means
We wish to guide the clustering results in order to capture information that is relevant
to the user (e.g., expert knowledge on the classes). We consider here that this
information takes the form of lexical constraints, given as sets of keywords, for document
clustering.
[Figure: example keyword groups, e.g., {car, engine}, {diet, food}, {novel, book}]
Two approaches considered:
Constrain the document embeddings to put more emphasis on the keywords
Constrain the cluster representatives to be related to subsets of the keywords
Thibaut Thonet Deep k-Means 14 / 16
21. Thank you!
Paper pre-print available at: https://arxiv.org/pdf/1806.10069.pdf
Thibaut Thonet Deep k-Means 15 / 16
22. References
Guo, X., Gao, L., Liu, X., & Yin, J. (2017). Improved Deep Embedded Clustering
with Local Structure Preservation. In Proceedings of the 26th International Joint
Conference on Artificial Intelligence (pp. 1753–1759).
MacQueen, J. (1967). Some Methods for Classification and Analysis of
Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on
Mathematical Statistics and Probability (pp. 281–297).
Moradi Fard, M., Thonet, T., & Gaussier, E. (2018). Deep k-Means: Jointly
Clustering with k-Means and Learning Representations. arXiv:1806.10069.
Yang, B., Fu, X., Sidiropoulos, N. D., & Hong, M. (2017). Towards
K-means-friendly Spaces: Simultaneous Deep Learning and Clustering. In Proceedings
of the 34th International Conference on Machine Learning (pp. 3861–3870).
Thibaut Thonet Deep k-Means 16 / 16
23. Appendix: clustering metrics
Given the groundtruth classes S = {S1, . . . , SK }, the obtained clusters
C = {C1, . . . , CK }, and the dataset X:
$ACC(C, S) = \max_{\phi} \frac{1}{|X|} \sum_{i=1}^{|X|} \mathbb{1}\{s_i = \phi(c_i)\}$
$NMI(C, S) = \frac{2 \, I(C, S)}{H(C) + H(S)}$
with $I(C, S) = \sum_{j,k} \frac{|C_j \cap S_k|}{|X|} \log \frac{|X| \, |C_j \cap S_k|}{|C_j| \, |S_k|}$ and $H(C) = -\sum_{j} \frac{|C_j|}{|X|} \log \frac{|C_j|}{|X|}$
Thibaut Thonet Deep k-Means 16 / 16
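Both metrics can be computed with standard tooling: the maximization over the mapping φ in ACC is a linear assignment problem (Hungarian algorithm), and NMI is available in scikit-learn. A small sketch with illustrative function names:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one mapping phi between clusters and classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    K = max(y_true.max(), y_pred.max()) + 1
    counts = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1                        # co-occurrence counts |C_j ∩ S_k|
    rows, cols = linear_sum_assignment(-counts)  # Hungarian algorithm (maximize matches)
    return counts[rows, cols].sum() / len(y_true)

def clustering_nmi(y_true, y_pred):
    """NMI = 2 I(C, S) / (H(C) + H(S)); matches sklearn's arithmetic normalization."""
    return normalized_mutual_info_score(y_true, y_pred, average_method="arithmetic")
```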