UMAP is a dimensionality reduction technique, proposed in 2018, that quickly gained widespread adoption.
In this presentation I try to demystify UMAP by comparing it to tSNE. I also sketch its theoretical background in topology and fuzzy sets.
1. Dimensionality reduction with UMAP
Advanced Data Mining seminar
Jakub Bartczuk
March 2020
2. Overview
1 Manifold learning recap
Multidimensional Scaling
Distances, graphs and matrix decomposition
2 Stochastic Neighbor Embedding
3 UMAP
Topological and geometrical preliminaries
Theoretical foundations
Algorithm
4 Libraries
3. Teaser
Reducing dimensionality of COIL20 Dataset
[Figures: the COIL20 dataset embedded by tSNE (left) and by UMAP (right)]
4. Manifold learning recap
We want to uncover lower-dimensional structure in high-dimensional space.
Is the structure linear? If yes, use PCA.
What to do if it is not linear?
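Since the linear case is the baseline for everything that follows, here is a minimal scikit-learn sketch, assuming a data matrix `X` of shape `(n_samples, n_features)`:

```python
from sklearn.decomposition import PCA

# Project the data onto its two leading principal components;
# this only captures *linear* structure in X.
X_2d = PCA(n_components=2).fit_transform(X)
```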
6. Classical Multidimensional Scaling
Algorithm: MDS(X, d)
Compute $D_{i,j} = \|x_i - x_j\|^2$.
Find $y_i \in \mathbb{R}^d$ with $D'_{i,j} = \|y_i - y_j\|^2$ such that $D \approx D'$.
Minimize $R = \sum_{i,j} \left( \|x_i - x_j\|^2 - \|y_i - y_j\|^2 \right)$.
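A minimal numpy sketch of classical MDS, assuming a precomputed squared-distance matrix `D`: double centering yields a low-rank Gram matrix, which is then eigendecomposed.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS. D is an (n, n) matrix of *squared* Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D @ J                  # Gram matrix of the centered data
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim] # keep the top `dim` components
    scale = np.sqrt(np.maximum(eigvals[idx], 0.0))
    return eigvecs[:, idx] * scale        # embedding coordinates y_i

# Usage: Y = classical_mds(squared_distance_matrix, dim=2)
```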
7. Classical MDS properties
easy to compute - decompose a low-rank matrix derived from $D$
fast - the Euclidean distance matrix takes just a couple of vectorized matrix computations
treats all distortions alike
What structure does MDS preserve?
The last property basically means that we preserve local and global structure alike:
if x's are close, they are close in the embedding;
if x's are distant, they are equally distant in the embedding.
11. Distances, graphs and matrix decomposition
Local vs global structure
We may not care about preserving exact large distances in the embedding space as much as small distances.
Idea: calculate distance along the set 'spanned' by the datapoints.
Manifold hypothesis
The data lies in a lower-dimensional, possibly nonlinear space which is embedded in the ambient space.
12. MDS breakdown
1 estimate distances between some pairs of close points
2 make a graph (X, E) with edges weighted by distance
3 define D as the distance induced by the graph
4 decompose a matrix that represents the graph
Getting manifold learning algorithms from this schematic
Setting $E = \{((x_i, x_j), \|x_i - x_j\|^2) \mid x_i, x_j \in X\}$ (the complete graph) we recover classical MDS.
Modifying steps 1-3 will get us Isomap and Hessian Eigenmaps.
tSNE and UMAP will change step 4.
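To see steps 1-3 in action, here is a sketch of the Isomap-style instantiation, assuming a data matrix `X` of shape `(n_samples, n_features)`:

```python
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

# Steps 1-2: estimate distances only between close points (kNN graph).
G = kneighbors_graph(X, n_neighbors=10, mode='distance')

# Step 3: define D as the shortest-path (geodesic) distance in the graph.
D_geo = shortest_path(G, method='D', directed=False)

# Step 4: decompose a matrix built from D_geo, e.g. classical MDS on the
# squared geodesic distances (see the classical_mds sketch above).
```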
14. Stochastic Neighbor Embedding
Idea: embed into a lower-dimensional space, preserving distance statistics.

$$q_{j|i} = \frac{f(\|y_i - y_j\|)}{\sum_{k \neq i} f(\|y_i - y_k\|)}, \qquad p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / s_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / s_i^2)}$$

$$KL(p, q) = \sum_{i,j} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

Notes
The choice of $f$ determines the algorithm:
$f(x) = \exp(-x^2)$ - original SNE
$f(x) = (1 + x^2)^{-1}$ - tSNE (note this is the density of the Cauchy distribution up to a constant)
15. Stochastic Neighbor Embedding
Perplexity
$s_i$ gives rise to the perplexity $P_i = 2^{H(p_i)}$, which controls the effective neighborhood size at $x_i$.
High level algorithm
for each $x_i$ find $s_i$ that matches a given perplexity
initialize the $y_i$ randomly
minimize KL with gradient descent w.r.t. the $y_i$
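A toy sketch of the calibration step, assuming `d2` holds the squared distances from $x_i$ to the other points: binary-search $s_i$ until the perplexity of $p_{\cdot|i}$ matches a target (this mirrors what tSNE implementations do internally).

```python
import numpy as np

def calibrate_bandwidth(d2, target_perplexity=30.0, n_iter=50):
    """Find s_i so that perplexity(p_{.|i}) ~ target. d2: squared distances."""
    lo, hi = 1e-10, 1e10
    for _ in range(n_iter):
        s2 = np.sqrt(lo * hi)                # geometric midpoint of s_i^2
        p = np.exp(-d2 / s2)
        p /= p.sum()
        H = -np.sum(p * np.log2(p + 1e-12))  # entropy in bits
        if 2.0 ** H > target_perplexity:
            hi = s2                          # too spread out: shrink s_i
        else:
            lo = s2                          # too concentrated: grow s_i
    return np.sqrt(s2), p
```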
16. Problems with SNE: Crowding problem
there is 'more room' for intermediate distances in higher dimensions
pairs of points will tend to have similar distances (remember the curse of dimensionality for kNN)
it is harder to embed them faithfully into a lower dimensional space
somewhat fixed by using the t distribution, which has heavier tails (not so concentrated around the maximum)
This problem is not specific to tSNE - it relates to the fact that the data may not be sampled uniformly from the manifold.
20. Bird's view of UMAP's theory
Recall UMAP means Uniform Manifold Approximation and Projection.
Algorithm
1 get input data representation
2 initialize embedding
3 calculate probabilities of points being nearest neighbors
4 optimize embedding
This also describes tSNE; what are the differences?
1 Fuzzy set built using local distance functions (Riemannian metric)
2 kNN graph's Laplacian
3 Probability between points corresponds to the likelihood of the points being connected in the neighborhood graph
4 Cross entropy between fuzzy sets
22. UMAP vs tSNE in a nutshell
UMAP paper appendix C
Initialization: UMAP - Laplacian (spectral) decomposition; tSNE - random Gaussian.
Optimization: UMAP - SGD + negative sampling; tSNE - gradient descent.
$p_{i|j}$: UMAP - $\exp\left(-\frac{d(x_i, x_j) - \rho_i}{\sigma_i}\right)$; tSNE - $\frac{\exp(-\|x_i - x_j\|^2 / s_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / s_i^2)}$.
$q_{i|j}$: UMAP - $(1 + a\|y_i - y_j\|^{2b})^{-1}$; tSNE - $\frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq j} (1 + \|y_i - y_k\|^2)^{-1}}$.
Loss: UMAP - fuzzy cross entropy $H(p, q) = H(p) + KL(p \| q)$; tSNE - $KL(p \| q)$.
Why such p, q?
$\rho_i$ - Riemannian structure
$\sigma_i$ - analogous to $s_i$
$q$ - approximation of membership for the fuzzy simplicial complex
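To make the table concrete, a small numpy sketch of the two UMAP quantities. The helper names are illustrative, not the library's API; `d`, `rho` and `sigma` are assumed to be already calibrated, and the default `a`, `b` values roughly correspond to min-dist = 0.1 in umap-learn.

```python
import numpy as np

def umap_p(d, rho, sigma):
    # High-dimensional membership of edges from x_i: distances are
    # measured relative to the nearest neighbor (rho), then rescaled.
    return np.exp(-np.maximum(d - rho, 0.0) / sigma)

def umap_q(y_i, y_j, a=1.577, b=0.895):
    # Low-dimensional membership; a, b are fitted from min-dist (slide 41).
    d2 = np.sum((y_i - y_j) ** 2)
    return 1.0 / (1.0 + a * d2 ** b)
```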
23. Riemannian structure
Remember we want to calculate distance along the manifold.
Curve length in Euclidean space
$$L_\gamma = \int_0^1 \|\gamma'(t)\| \, dt = \int_0^1 \sqrt{\langle \gamma'(t) \mid \gamma'(t) \rangle} \, dt$$
Definition
A Riemannian space has a local inner product.
Precisely, it has an inner product on each tangent space that varies smoothly.
Idea
Estimate the Riemannian structure locally from finite samples - set it constant on neighborhoods.
Problem
We get incompatible local structures.
27. Some Category Theory
Category theory is not a theory per se, it's rather a framework for thinking about mathematics.
Definition
A category C is a collection of objects with morphisms that can be composed, where the composition satisfies some technical conditions.
Think of a set of some sets with functions that preserve structure.
Definition
A morphism $f : A \to B$ is an isomorphism if there is a morphism $g : B \to A$ such that $g \circ f = \mathrm{id}_A$ and $f \circ g = \mathrm{id}_B$.
28. Some Category Theory
Examples
the category FinSet of finite sets and functions between them
the category Graphs of graphs and graph homomorphisms
the category Ab of Abelian groups and group homomorphisms
the category of metric spaces and contractions
Definition
A mapping $F : C \to D$ is a functor if $F(f \circ g) = F(f) \circ F(g)$.
Warning: technically this is a pair of mappings (on objects and on morphisms) but we usually omit this.
Example
$F : \mathrm{Graphs} \to \mathrm{FinSet}$, $F((V, E)) = V$
29. Topological preliminaries
Extended pseudometric space
A space is called pseudometric if it satisfies the metric space axioms, but without requiring that $d(x, y) = 0 \implies x = y$:
$d : X \times X \to \mathbb{R}_+$
$d(x, y) = d(y, x)$
$d(x, y) + d(y, z) \geq d(x, z)$
In an extended pseudometric space it can also happen that $d(x, y) = \infty$.
30. Topological preliminaries
Simplex
A d-dimensional simplex is a set $S \subset \mathbb{R}^n$ given as the convex hull $S = \mathrm{conv}(\{v_0, \dots, v_d\})$ of $d + 1$ affinely independent points.
Simplicial complex
A set $C$ of simplices such that $s, s' \in C \implies s \cap s' \in C$.
31. Topology
Definition
The nerve of a family of sets $U = \{U_i \mid i \in I\}$ is the set of finite subsets $s$ of $I$ for which $\bigcap_{i \in s} U_i \neq \emptyset$.
Definition
$U$ is a cover of $X$ if $\bigcup_i U_i = X$.
Nerve theorem
If $U$ is a good open cover of $X$ (all nonempty finite intersections are contractible) then $X \sim N(U)$ (they are homotopy equivalent, one notion of equivalence from topology).
32. Fuzzy sets
Definitions
For a fixed set $X$, take $S \subset X$ and $m : S \to [0, 1]$ (a fuzzy membership function).
A pair $(S, m)$ with $0 \leq m(x) \leq 1$ is a fuzzy set.
Fuzzy sets have natural generalizations of the set operations:
if $(S, m_S), (R, m_R)$ are fuzzy sets then $(S \cup R, m_{S \cup R})$ is a fuzzy set, where $m_{S \cup R}(x) = \max(m_S(x), m_R(x))$.
Example
$S_1 = ([0, 1], m)$, $m(x) = [x \in S] \, x$
Category of fuzzy sets
objects: fuzzy sets
morphisms: functions $f : S \to R$ such that $m_S(x) \leq m_R(f(x))$
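A tiny sketch of these operations, representing a fuzzy set as a dict from elements to membership values (a toy model, assuming finite supports):

```python
def fuzzy_union(m_s, m_r):
    """Union of fuzzy sets given as {element: membership} dicts."""
    return {x: max(m_s.get(x, 0.0), m_r.get(x, 0.0))
            for x in set(m_s) | set(m_r)}

# Example over the elements {a, b, c}:
print(fuzzy_union({'a': 0.2, 'b': 0.9}, {'b': 0.5, 'c': 1.0}))
# {'a': 0.2, 'b': 0.9, 'c': 1.0}  (key order may vary)
```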
35. Extended pseudometric spaces and fuzzy sets
UMAP paper section 2.2 and appendix B
Fuzzy simplicial sets are a generalization of simplicial complexes. Their category is denoted by Fin-sFuzz.
The category of finite extended pseudometric spaces is denoted by FinEPMet.
Metric realization of fuzzy simplicial sets
There exist functors
$\mathrm{Real} : \text{Fin-sFuzz} \to \text{FinEPMet}$
$\mathrm{Sing} : \text{FinEPMet} \to \text{Fin-sFuzz}$
that are adjoint.
An adjoint pair of functors establishes a weak equivalence between the categories.
36. Global structure from local structures
Theory
Construct extended pseudometric spaces locally
Get fuzzy sets from pseudometric spaces
Merge local fuzzy sets
37. Global structure from local structures
The dataset defines extended pseudometrics:
$X = \{x_i\}_{i<n}$, with $\rho_i$ the distance from $x_i$ to its closest neighbor, and
$$d_i(x_j, x_k) = \begin{cases} d(x_j, x_k) - \rho_i & \text{if } i = j \text{ or } i = k \\ \infty & \text{otherwise} \end{cases}$$
Extended pseudometric spaces → fuzzy simplicial sets:
$(X, d_i) \mapsto \mathrm{Sing}((X, d_i))$
Local fuzzy sets → global fuzzy set:
$\{(X, d_i)\}_{i<n} \mapsto \bigcup_{i<n} \mathrm{Sing}((X, d_i))$
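A numpy sketch of this construction restricted to the 1-skeleton, assuming a symmetric distance matrix `D` and already-calibrated bandwidths `sigma` (the function names are illustrative, not the paper's): each point contributes edge memberships measured relative to its nearest neighbor, and the local views are merged with the fuzzy (probabilistic) union.

```python
import numpy as np

def local_memberships(D, sigma):
    """Row i holds memberships of edges (i, j) under the local metric d_i."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)            # ignore self-distances
    rho = D.min(axis=1, keepdims=True)     # distance to the nearest neighbor
    return np.exp(-np.maximum(D - rho, 0.0) / sigma[:, None])

def fuzzy_union_graph(P):
    # Merge the incompatible local views with the probabilistic t-conorm:
    # p_ij = p_{j|i} + p_{i|j} - p_{j|i} * p_{i|j}  (symmetric; cf. slide 42).
    return P + P.T - P * P.T
```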
40. Algorithm
UMAP paper section 4.1
UMAP(X, n-neighbors, dim, min-dist, n-epochs)
  forall $x \in X$: $fs_x$ = LocalFuzzySet(x, n-neighbors)
  top-rep = $\bigcup_{x \in X} fs_x$
  Y := SpectralEmbedding(top-rep, dim)
  Y := OptimizeEmbedding(top-rep, Y, min-dist, n-epochs)
41. Algorithm - details
Simplification
We use only the 1-skeleton: a weighted neighborhood graph.
Approximation
To use gradient descent we need a differentiable approximation of the fuzzy set membership.
The authors propose $q_{i|j} = (1 + a\|y_i - y_j\|^{2b})^{-1}$.
This is probably the most important difference from tSNE (see tSNE paper section 6.2).
$a, b$ are set to predefined values or estimated using least squares to approximate
$$\varphi(i, j) = \begin{cases} 1 & \text{if } \|y_i - y_j\|_2 \leq \text{min-dist} \\ \exp(-(\|y_i - y_j\|_2 - \text{min-dist})) & \text{otherwise} \end{cases}$$
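A sketch of that least-squares fit using scipy (umap-learn does something very similar internally; the `spread` parameter and exact grid below are assumptions, so treat the details as an approximation):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_ab(min_dist=0.1, spread=1.0):
    """Fit a, b so that (1 + a d^{2b})^{-1} approximates the target curve phi."""
    d = np.linspace(0.0, 3.0 * spread, 300)
    phi = np.where(d <= min_dist, 1.0, np.exp(-(d - min_dist) / spread))
    f = lambda d, a, b: 1.0 / (1.0 + a * d ** (2.0 * b))
    (a, b), _ = curve_fit(f, d, phi)
    return a, b

# For min_dist=0.1 this gives roughly a ~ 1.58, b ~ 0.90.
print(fit_ab(0.1))
```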
42. Optimization
UMAP paper section 4.2
Optimization
Optimize $C(p, q) = H(p) + KL(p \| q)$.
For $p, q$ use the symmetrized $p_{i,j} = p_{i|j} + p_{j|i} - p_{i|j} \, p_{j|i}$.
Computational shortcut
calculate the gradient of the error for similar points
approximate the gradient for dissimilar points by negative sampling
Implementation details - nearest neighbors
A naive implementation needs $O(n^2)$ operations.
The UMAP implementation uses approximate kNN.
The Nearest Neighbor Descent algorithm is reported to have $O(n^{1.14})$ empirical complexity.
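A highly simplified sketch of one SGD epoch with negative sampling, assuming `edges` holds the positive pairs from the fuzzy graph and `Y` is the current embedding; the attractive/repulsive coefficients are the gradients of the two cross-entropy terms for $q = (1 + a d^{2b})^{-1}$, but learning-rate schedules and edge-weight sampling are omitted relative to the reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_epoch(Y, edges, a=1.577, b=0.895, lr=1.0, n_neg=5):
    n = Y.shape[0]
    for i, j in edges:                      # positive (similar) pairs: attract
        diff = Y[i] - Y[j]
        d2 = diff @ diff
        if d2 > 0.0:
            grad = (-2.0 * a * b * d2 ** (b - 1.0)) / (1.0 + a * d2 ** b)
            Y[i] += lr * grad * diff
            Y[j] -= lr * grad * diff
        for _ in range(n_neg):              # random points: treat as dissimilar
            k = rng.integers(n)
            diff = Y[i] - Y[k]
            d2 = diff @ diff
            grad = (2.0 * b) / ((0.001 + d2) * (1.0 + a * d2 ** b))
            Y[i] += lr * grad * diff        # push away from the negative sample
    return Y
```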
43. Python implementations
tSNE and other algorithms
scikit-learn implements tSNE, MDS, Isomap, Hessian Eigenmaps, Locally Linear Embedding
faster implementations are available in the megaman package
UMAP
the original author's package (pip install umap-learn)
GPU-accelerated implementation in NVIDIA RAPIDS (cuML)
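For completeness, minimal usage of the author's package, assuming a data matrix `X` (the parameter names below are umap-learn's actual API):

```python
import umap

# n_neighbors controls the local neighborhood size and min_dist the tightness
# of the embedding; both map directly onto the quantities in the slides.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
embedding = reducer.fit_transform(X)   # X: (n_samples, n_features) array
```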
44. Further reading
Understanding UMAP - interactive visualizations
Exploratory Analysis of Interesting Datasets - UMAP's documentation
UMAP author's Google Scholar page
How Exactly UMAP Works - the author's other articles mention applications in life sciences
The Category Theory Behind UMAP
UMAP - Mathematics and implementational details