The Uniform Manifold Approximation and Projection Algorithm
Dimensionality reduction from local metric learning via fuzzy simplicial sets
Umberto Lupo
April 26, 2019
Table of contents
1. The old mathematics
2. The fuzzy mathematics
3. Uniformity and local metric structure
4. Implementational details
In one slide!
By L. McInnes, J. Healy and J. Melville (arXiv:1802.03426). Python
library umap-learn: based on scikit-learn, optimized with numba.
An unsupervised algorithm for non-linear dimensionality reduction. A
noteworthy alternative to t-SNE.
1. Input: N × N distance matrix (e.g. from N points in Euclidean $\mathbb{R}^m$).
2. Parameters: number of neighbours κ, embedding dimension d, etc.
3. Topological simplification steps:
a) For each i = 1, . . . , N, construct an “almost metric” space $\mathcal{M}_i$ local to entry i by normalizing distances with respect to the κth nearest entry.
b) Distill the topological and geometric content of each $\mathcal{M}_i$ into a fuzzy simplicial set $F_i$.
c) The fuzzy union $\bigcup_i F_i$ is a global topological representation.
4. Dimensionality reduction steps:
a) Initialize a cloud Z of N points in Euclidean $\mathbb{R}^d$.
b) Use fuzzy set cross-entropy to measure distance between Z’s fuzzy simplicial representation and the input’s.
c) Move points of Z around until this distance is minimized.
The old mathematics
Abstracting away abstract simplicial complexes
An abstract simplicial complex (ASC) is a family $X$ of non-empty finite sets such that $\alpha \in X,\ \emptyset \neq \beta \subseteq \alpha \implies \beta \in X$.
If $\mathrm{card}(\alpha) = n + 1$ then $\alpha$ is an n-simplex of $X$. The set of all n-simplices of $X$ is denoted by $X_n$. $V = X_0$ is the set of vertices.
Can construct a geometric realization $|X|$ of $X$ as a simplicial complex in the vector space $\mathbb{R}^J = \{\text{functions } J \to \mathbb{R}\}$ where $J$ is any sufficiently large index set ($J = V$ works).
No real need for a total ordering on $V$ so far. With one, could define face maps $d^n_i : X_n \to X_{n-1}$ for each $n > 0$ and $0 \le i \le n$:
$\alpha = \{v_0, \dots, v_n\}$ where $v_0 < \dots < v_n \implies d^n_i(\alpha) = \alpha \setminus \{v_i\}$.
Idea for a generalization: Do not impose that n-simplices for $n \ge 1$ be sets of vertices. Let them simply be elements of an abstract set $X_n$.
Trade off this loss for a collection of face maps which should behave as if
they arose from a total ordering.
→ Promote to axioms key structural properties of the collection of $d^n_i : X_n \to X_{n-1}$ which don’t require knowing what the simplices look like.
. . . Not much! Only the simplicial identity
(SI) $d^{n-1}_i \circ d^n_j = d^{n-1}_{j-1} \circ d^n_i : X_n \to X_{n-2} \quad \forall\, 0 \le i < j \le n$.
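A minimal sketch of (SI) in the concrete case where simplices are ordered tuples of vertices and the i-th face map drops the i-th vertex. The function names `face` and `simplicial_identity_holds` are illustrative choices, not notation from the talk.

```python
from itertools import combinations


def face(simplex, i):
    """Return the i-th face of an ordered simplex by dropping its i-th vertex."""
    return simplex[:i] + simplex[i + 1:]


def simplicial_identity_holds(simplex):
    """Check d_i(d_j(s)) == d_{j-1}(d_i(s)) for every 0 <= i < j <= n."""
    n = len(simplex) - 1
    return all(
        face(face(simplex, j), i) == face(face(simplex, i), j - 1)
        for i, j in combinations(range(n + 1), 2)
    )


# A 3-simplex whose vertices are ordered a < b < c < d.
tetrahedron = ("a", "b", "c", "d")
print(simplicial_identity_holds(tetrahedron))
```

Both sides remove the same two vertices; the index shift from j to j − 1 only accounts for the renumbering after the i-th vertex is gone.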
Sequence of sets $(X_n)_{n \in \mathbb{N}_0}$ and $\{d^n_i : X_n \to X_{n-1}\}$ satisfying (SI) → data for a Delta set (sometimes: “abstract Delta complex”). More general than ASCs because e.g.:
1. $i \neq j \nRightarrow d_i(\alpha) \neq d_j(\alpha)$;
2. $d_i(\alpha) = d_i(\beta)\ \forall i \nRightarrow \alpha = \beta$.
Geometric realization: For each simplex $\alpha$ let $|\Delta_\alpha| = |\Delta^{\dim \alpha}|$ where $|\Delta^n| \subseteq \mathbb{R}^{n+1}$ is the standard geometric n-simplex. Identify the faces appropriately to construct the topological space $\mathrm{Real}(X)$ as a quotient of the disjoint union $\bigsqcup_\alpha |\Delta_\alpha|$. Hint: $(d^n_i \alpha, x) \sim (\alpha, D^i_n x)$ where $D^i_n : |\Delta^{n-1}| \to |\Delta^n|$ is the inclusion of the i-th face (a coface map).
Reorganize: Prototype ordered combinatorial n-simplex: $[n] = \{0, \dots, n\}$. Since $\{[n]\}_{n \in \mathbb{N}_0} \cong \mathbb{N}_0$, can think of $(X_n)_{n \in \mathbb{N}_0}$ as $X : [n] \mapsto X([n]) = X_n$.
Know how to extract i-th faces of all n-simplices at once: $d^n_i : X([n]) \to X([n-1])$. $d^n_i$ “corresponds to” $[n] \setminus \{i\}$. But
$\{[n] \setminus \{i\} : 0 \le i \le n\} \cong \{f : [n-1] \to [n],\ \text{strictly order-preserving}\}$.
$d^n_i$ implements in $X_n$ the prototype map $D^i_n : [n-1] \to [n]$ given by $0 \mapsto 0, \dots, i \mapsto i + 1, \dots, n - 1 \mapsto n$ . . . Familiar?
$\implies$ Our Delta set $X$ is an implementation of $\{[n]\}_{n \in \mathbb{N}_0}$ and of the collection of coface maps. Boring until we notice:
$D^j_n \circ D^i_{n-1} = D^i_n \circ D^{j-1}_{n-1} \quad \forall\, 0 \le i < j \le n$ . . . Again familiar?
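The coface identity can be checked mechanically. In this sketch a map [m] → [n] is stored as the tuple of its images; `coface`, `compose` and `coface_identity_holds` are hypothetical helper names introduced only for illustration.

```python
def coface(n, i):
    """D^i_n : [n-1] -> [n], the strictly increasing map that skips the value i."""
    return tuple(k if k < i else k + 1 for k in range(n))


def compose(g, f):
    """Return g after f, for maps stored as tuples of images."""
    return tuple(g[k] for k in f)


def coface_identity_holds(n):
    """Check D^j_n . D^i_{n-1} == D^i_n . D^{j-1}_{n-1} for all 0 <= i < j <= n."""
    return all(
        compose(coface(n, j), coface(n - 1, i))
        == compose(coface(n, i), coface(n - 1, j - 1))
        for j in range(1, n + 1)
        for i in range(j)
    )


print(coface(2, 1))              # the map [1] -> [2] that skips 1
print(coface_identity_holds(4))
```

Both composites are the unique strictly increasing map [n − 2] → [n] whose image misses i and j, which is why the identity mirrors (SI) with the order of composition reversed.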
For $[l] \xrightarrow{f} [m] \xrightarrow{g} [n]$ let $f \circ^{\mathrm{op}} g := g \circ f$. Starting from $X(D^i_n) := d^n_i$ we can define $X(D^i_{n-1} \circ^{\mathrm{op}} D^j_n) := X(D^i_{n-1}) \circ X(D^j_n)$ consistently thanks to (SI)! And extend to arbitrary compositions s.t. $X(f \circ^{\mathrm{op}} g) = X(f) \circ X(g)$.
Abstract nonsense: A Delta set is a functor $X : \hat{\Delta}^{\mathrm{op}} \to \mathrm{Sets}$ where $\hat{\Delta}$ is the category with objects the $[n]$s, and arrows the strictly order-preserving maps.
Further generalize (yes, really): Easy with categories and functors! Enlarge the collection of arrows to include all order-preserving maps, not only the strict ones. Call the new category $\Delta$. A simplicial set is a functor $X : \Delta^{\mathrm{op}} \to \mathrm{Sets}$. The collection of simplicial sets has the structure of a category S.
But why? We would like to include “degenerate” simplices. Degeneracy maps $s^n_i : X([n]) \to X([n + 1])$ expose any hidden degenerate simplices “by repeating the i-th vertex”. Example: $(v_0, v_1, v_1) = s^1_1((v_0, v_1))$, a degenerate 2-simplex “living inside” $(v_0, v_1)$. $s^n_i$ corresponds to and implements the unique order-preserving surjection $S^i_n : [n + 1] \to [n]$ that takes the value $i$ twice: a codegeneracy map and the prototype of a “collapse” of an ordered simplex. Additional easy-to-check-but-tedious-to-write identities are satisfied when codegeneracy maps are added to the coface maps. Functoriality yields corresponding identities satisfied by the face and degeneracy maps.
Geometric realization: As for Delta sets, but add equivalences $(s^n_i \alpha, x) \sim (\alpha, S^i_n x)$. Real: S → Top is a functor.
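A small sketch of degeneracy maps on ordered tuples of vertices, reproducing the example above and checking one of the mixed face/degeneracy identities, $d_i \circ s_i = \mathrm{id}$. The helper names are illustrative assumptions.

```python
def face(simplex, i):
    """d_i: drop the i-th vertex."""
    return simplex[:i] + simplex[i + 1:]


def degeneracy(simplex, i):
    """s_i: repeat the i-th vertex."""
    return simplex[: i + 1] + simplex[i:]


def is_degenerate(simplex):
    """An ordered simplex is degenerate when two consecutive vertices coincide."""
    return any(a == b for a, b in zip(simplex, simplex[1:]))


edge = ("v0", "v1")
print(degeneracy(edge, 1))                       # the degenerate 2-simplex over the edge
print(is_degenerate(degeneracy(edge, 1)))
print(face(degeneracy(edge, 1), 1) == edge)      # d_1 . s_1 is the identity
```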
Motivation for us: Variations on the theme of singular homology of a topological space $Y$: Sing($Y$) is the simplicial set defined by
$\mathrm{Sing}(Y) : [n] \mapsto \{\sigma : |\Delta^n| \to Y \text{ continuous}\}$,
with $d_i \sigma$ the restriction of $\sigma$ to the i-th face and $s_i \sigma$ the composition of $\sigma$ with a collapse. Sing: Top → S is in fact a functor.
This is just another definition, I want my time back. OK, but first note down this theorem: for any $Y \in$ Top and $X \in$ S,
(Adj) $\{\text{Top-arrows } \mathrm{Real}(X) \to Y\} \cong \{\text{S-arrows } X \to \mathrm{Sing}(Y)\}$.
Interpretation
Sing and Real are not inverses, but if you did Real(Sing(Y )) the result
would have topologically a lot in common with Y .
UMAP employs a cousin of this result where Top is replaced by a
category of finite “almost metric” spaces because these are directly and
naturally defined by the data. What, then, must replace S, Real and Sing
to yield something analogous to (Adj)?
The fuzzy mathematics
Fuzzy sets
In sets, the membership relation ∈ is binary: either $x \in A$ or $x \notin A$. A fuzzy set is a pair $(A, \mu)$ where $A$ is a carrier set and $\mu : A \to [0, 1]$ is a membership function, i.e. $\mu(x)$ is the membership strength of $x$ to $A$.
Interpreting $\mu$ as a “field of Bernoulli probabilities” suggests fuzzy analogues to the standard Boolean operators ∪ and ∩, built from a conjunction $T$ on membership strengths and a negation ¬:
$(A, \mu) \cap (B, \nu) = (A \cap B,\ T(\mu, \nu))$, with e.g. $T(\mu, \nu) := \mu\nu$
$(A, \mu) \cup (B, \nu) = (A \cup B,\ \neg T(\neg\mu, \neg\nu))$, with e.g. $\neg(x) := 1 - x$
$\implies \neg T(\neg\mu, \neg\nu) = \mu + \nu - \mu\nu$.
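As a sketch, fuzzy sets can be stored as dictionaries from elements to membership strengths. The convention that an element missing from a carrier has strength 0 in the union is an assumption made here so that the two carriers may differ.

```python
def fuzzy_intersection(a, b):
    """Product conjunction on the common carrier: strength mu * nu."""
    return {x: a[x] * b[x] for x in a.keys() & b.keys()}


def fuzzy_union(a, b):
    """Dual of the product conjunction: strength mu + nu - mu * nu.

    Assumption: an element absent from one carrier has strength 0 there.
    """
    out = {}
    for x in a.keys() | b.keys():
        m, n = a.get(x, 0.0), b.get(x, 0.0)
        out[x] = m + n - m * n
    return out


A = {"p": 0.5, "q": 1.0}
B = {"q": 0.25, "r": 0.5}
print(fuzzy_intersection(A, B))
print(fuzzy_union(A, B))
```

Read probabilistically, the union strength is the chance that at least one of two independent Bernoulli memberships fires.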
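If $A = B = U$, the fuzzy set cross entropy between $(U, \mu)$ and $(U, \nu)$ is
$C((U, \mu), (U, \nu)) = \sum_{u \in U} \mathrm{KL}\big(\mathrm{Bern}(\mu(u)) \,\|\, \mathrm{Bern}(\nu(u))\big) = \sum_{u \in U} \left[ \mu(u) \log\frac{\mu(u)}{\nu(u)} + (1 - \mu(u)) \log\frac{1 - \mu(u)}{1 - \nu(u)} \right]$.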
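The cross entropy formula translates directly into code. This sketch assumes both fuzzy sets are dictionaries over the same carrier, uses the convention 0 · log 0 = 0, and clips the denominator with a small `eps` (an implementation assumption) to avoid division by zero.

```python
import math


def _term(x, y, eps):
    """x * log(x / y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x / max(y, eps))


def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Sum over the shared carrier of KL(Bern(mu(u)) || Bern(nu(u)))."""
    total = 0.0
    for u in mu:
        m, n = mu[u], nu[u]
        total += _term(m, n, eps) + _term(1.0 - m, 1.0 - n, eps)
    return total


mu = {"e1": 0.9, "e2": 0.2, "e3": 1.0}
nu = {"e1": 0.6, "e2": 0.4, "e3": 0.8}
print(fuzzy_cross_entropy(mu, mu))   # identical fuzzy sets
print(fuzzy_cross_entropy(mu, nu))   # different fuzzy sets
```

Each summand is a Kullback-Leibler divergence, so the total is non-negative and vanishes exactly when the two membership functions agree.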
Fuzzy simplicial sets
A simplicial set was a functor $\Delta^{\mathrm{op}} \to \mathrm{Sets}$. A fuzzy simplicial set is a functor $X : \Delta^{\mathrm{op}} \to \mathrm{Fuzz}$ where Fuzz is the category of fuzzy sets. sFuzz is the category of fuzzy simplicial sets.
“Concretely”: Let $I$ be $(0, 1] \subset \mathbb{R}$,¹ then can view $X \in$ sFuzz as a functor $X : (\Delta \times I)^{\mathrm{op}} \to \mathrm{Sets}$. For each n, there is a fuzzy set $(X_n, \mu_n)$. Define $X([n], a) := \mu_n^{-1}([a, 1])$.
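The sets $X([n], a)$ are simply the superlevel sets of the membership function. A minimal sketch, with a fuzzy set stored as a dictionary (the name `level_set` is illustrative):

```python
def level_set(fuzzy_set, a):
    """Return the preimage of [a, 1] under the membership function, for a in (0, 1]."""
    if not 0.0 < a <= 1.0:
        raise ValueError("a must lie in (0, 1]")
    return {x for x, strength in fuzzy_set.items() if strength >= a}


simplices = {"s1": 1.0, "s2": 0.6, "s3": 0.1}
print(sorted(level_set(simplices, 0.5)))
print(sorted(level_set(simplices, 1.0)))
```

Raising a can only shrink the level set, which is the nesting that the contravariance in the $I$ factor of $(\Delta \times I)^{\mathrm{op}}$ encodes.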
Geometric realization. . . ? For simplicial sets, $\mathrm{Real}(X) = \bigsqcup_\alpha |\Delta_\alpha| / \sim$ where each $|\Delta_\alpha| = |\Delta^{\dim \alpha}|$. Reliant on the fact that for each object in $\Delta^{\mathrm{op}}$, i.e. for each n, we have a model space $|\Delta^n| \in$ Top. Here objects in the source category $(\Delta \times I)^{\mathrm{op}}$ contain the extra piece of information $a \in (0, 1]$. If we had equivalent model spaces $|\Delta^n_a|$ and chose a category $C \ni |\Delta^n_a|$ to replace Top we could define a fuzzy set realization functor fReal: sFuzz → C “analogously” to Real.
¹ As a category. . .
The correct adjunction
Recall (Adj) relating Sing: Top → S and Real: S → Top. $|\Delta^n|$ appears in the definition of Real but also of Sing:
$\mathrm{Sing}(Y)([n]) = \{\sigma : |\Delta^n| \to Y \text{ cts}\} = \{\text{Top-arrows } |\Delta^n| \to Y\}$.
With a choice of “geometric” category C and of model space $|\Delta^n_a| \in C$, we can define by analogy
$\mathrm{fSing}(Y)([n], a) = \{C\text{-arrows } |\Delta^n_a| \to Y\}$ so that fSing: C → sFuzz.
The obvious question
What are “correct” choices of C and $|\Delta^n_a|$?
Our answer
Ones yielding a relation between fSing and fReal analogous to (Adj): e.g.
$C = \mathrm{E\psi Met}$, $|\Delta^n_a| = \left\{ (t_0, \dots, t_n) \in \mathbb{R}^{n+1} \,\middle|\, \sum_{i=0}^{n} t_i = -\log(a),\ t_i \ge 0 \right\}$
(Spivak 2012). EψMet is the category of extended (dist = ∞ allowed) pseudo ($\mathrm{dist}(x, y) = 0 \nRightarrow x = y$) metric spaces.
Finite version
Starting from a real-life point cloud we can at best hope to encode the metric structure in a finite almost-metric space. Need finite analogs
Fin-EψMet, Fin-sFuzz, $|\Delta^n_a|_{\mathrm{Fin}} \in$ Fin-EψMet,
$\text{Fin-E}\psi\text{Met} \xrightarrow{\ \text{Fin-fSing}\ } \text{Fin-sFuzz} \xrightarrow{\ \text{Fin-fReal}\ } \text{Fin-E}\psi\text{Met}$,
and a finite fuzzy analog (Fin-fAdj) of (Adj). Their (straightforward) definitions and a proof of (Fin-fAdj) are the main mathematical contributions of the UMAP paper.
Where we at?
If our data problem naturally yields an object $\mathcal{M} \in$ Fin-EψMet, we can theoretically distill much of the topological information by computing $\text{Fin-fSing}(\mathcal{M})([n], a)\ \forall\, n \ge 1,\ a \in (0, 1]$. If we have a collection $\{\mathcal{M}_i\}_{i=1}^{N}$ instead, we can first apply Fin-fSing individually and then take fuzzy unions! This will give us a global, fuzzy simplicial representation.
Computer-friendly version
We descend back to planet Earth.
Truncate: Stop the computation of Fin-fSing($\mathcal{M}$) at some small finite n! Maximally cheap: n = 1.
Understand the output data structure: Requires a look at the definitions.
$|\Delta^n_a|_{\mathrm{Fin}} := (\{v_0, \dots, v_n\}, d_a), \quad d_a(v_i, v_j) = -(1 - \delta_{ij}) \log a$,
$\text{Fin-fSing}(\mathcal{M})([n], a) := \{\text{Fin-E}\psi\text{Met-arrows } |\Delta^n_a|_{\mathrm{Fin}} \to \mathcal{M}\} = \{\text{distance non-increasing maps } |\Delta^n_a|_{\mathrm{Fin}} \to \mathcal{M}\}$.
So $|\Delta^1_a| \cong (\{0, -\log a\}, d_{\mathrm{Eucl}})$ and, if $\mathcal{M} = (M, d)$:
$\text{Fin-fSing}(\mathcal{M})([1], a) = \{(p, q) \in M \times M \mid d(p, q) \le -\log a\}$.
So the fuzzy set of 1-simplices is $(M \times M, \mu)$ where $\mu(p, q) = e^{-d(p, q)}$.
Just a weighted graph!
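A sketch of the n = 1 truncation for a small extended pseudo-metric space given as a distance matrix (infinite entries allowed). It builds the edge weights $e^{-d(p,q)}$ and compares the level set at strength a with the set of pairs at distance at most $-\log a$. Function names are illustrative.

```python
import math


def edge_memberships(dist):
    """Fuzzy set of 1-simplices: strength exp(-d(p, q)) for every ordered pair."""
    n = len(dist)
    return {(p, q): math.exp(-dist[p][q]) for p in range(n) for q in range(n)}


def one_simplices(dist, a):
    """Pairs (p, q) with d(p, q) <= -log(a), i.e. Fin-fSing at level ([1], a)."""
    threshold = -math.log(a)
    n = len(dist)
    return {(p, q) for p in range(n) for q in range(n) if dist[p][q] <= threshold}


inf = math.inf
dist = [
    [0.0, 1.0, inf],
    [1.0, 0.0, 2.0],
    [inf, 2.0, 0.0],
]
mu = edge_memberships(dist)
level = {pair for pair, strength in mu.items() if strength >= 0.2}
print(level == one_simplices(dist, 0.2))
```

Pairs at infinite distance receive strength 0, so they never appear in any level set; only finitely distant pairs contribute edges to the weighted graph.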
Fuzzy set cross-entropy
Let $E$ be the abstract set of all possible 1-simplices and suppose we have two fuzzy sets $(E, \mu_h)$ and $(E, \mu_l)$; in our view these should correspond to high and low dimensional representations respectively. Then the fuzzy set cross entropy will be
$\sum_{e \in E} \left[ \mu_h(e) \log\frac{\mu_h(e)}{\mu_l(e)} + (1 - \mu_h(e)) \log\frac{1 - \mu_h(e)}{1 - \mu_l(e)} \right]$.
For fixed µh, minimizing this as a function of µl can be viewed as a
force directed graph layout algorithm:
• First term is minimized when µl (e) is as large as possible, i.e. when
the distance between the points is as small as possible =⇒ an
“attractive force” which is larger when µh(e) is large.
• The second term will be minimized by making µl (e) as small as
possible =⇒ a “repulsive force” between the ends of e whenever
µh(e) is small.
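The force picture can be made explicit by differentiating the contribution of a single edge with respect to its low dimensional strength. Writing $\mu_h = \mu_h(e)$ and $\mu_l = \mu_l(e)$ with $0 < \mu_l < 1$ (a worked step added for illustration, using only the formula above):

```latex
\[
c(\mu_l) = \mu_h \log\frac{\mu_h}{\mu_l} + (1-\mu_h)\log\frac{1-\mu_h}{1-\mu_l},
\qquad
\frac{\partial c}{\partial \mu_l}
  = -\frac{\mu_h}{\mu_l} + \frac{1-\mu_h}{1-\mu_l}
  = \frac{\mu_l - \mu_h}{\mu_l\,(1-\mu_l)} .
\]
```

The first summand of the derivative comes from the attractive term and the second from the repulsive term. The derivative is negative when $\mu_l < \mu_h$ (increasing $\mu_l$, i.e. pulling the endpoints together, lowers the loss), positive when $\mu_l > \mu_h$, and zero exactly at $\mu_l = \mu_h$.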
Uniformity and local metric structure
Why uniformity? (Very vaguely)
Some motivation: the Čech complex construction from a finite sample of points is best at topologically reconstructing the underlying manifold when the points are sampled uniformly.
Theorem (Niyogi et al. 2008). Let $M$ be a smooth, compact submanifold of $\mathbb{R}^n$ with injectivity radius $\tau$. Let $D$ be a collection of points on $M$ such that the minimal distance between any point of $M$ and $D$ is less than $\varepsilon/2$ for $\varepsilon < \tau\sqrt{3/5}$; say that $D$ is $\frac{\varepsilon}{2}$-dense in $M$. Then the Čech complex $\check{C}_{2\varepsilon}(D)$ deformation retracts to $M$ ($\implies$ homotopy equivalence $\implies$ same homology).
Other results show that the more points we sample uniformly from $M$, the higher the probability that the resulting $D$ will be $\frac{\varepsilon}{2}$-dense.
Learning local metric spaces from data
Basic idea: If enough data is sampled uniformly from a Riemannian
manifold, we should be able to estimate the local metric from the local
density of sample points.
Can estimate the local metric structure relative to which the data would
be uniformly sampled by enforcing that spheres of radius δ centred at
different locations in the point cloud should contain the same number K
of sample points.
In practice, locally rescale distances between each reference point and the
rest of the cloud by making sure this is the case.
Implementational details
Local (extended pseudo-)metric spaces
Start from an N × N distance matrix $D$, fix κ ≥ 1. Naïve idea: define, for i = 1, . . . , N, $\mathcal{M}_i = (M = \{x_i\}_{i=1}^{N}, d_i)$ where, for all $j \neq i$,
$d_i(x_i, x_j) = \frac{D_{ij} - \rho_i}{\sigma_i}$, with $\rho_i$ := dist. between $x_i$ and its 1st NN, $\sigma_i$ := dist. between $x_i$ and its κth NN,
and all other independent distances are infinite. $d_i(x_i, \text{1st NN}) = 0 \implies$ corresponding edge has membership strength 1 $\implies$ local connectivity.
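A sketch of the naïve local rescaling for one reference point. It returns only the distances from $x_i$ to every other point; distances between two points other than $x_i$ are infinite in $\mathcal{M}_i$ and are not materialised. The function name is illustrative, and the code assumes the κth nearest neighbour lies at non-zero distance.

```python
import math


def local_distances(D, i, kappa):
    """Rescaled distances d_i(x_i, x_j) = (D[i][j] - rho_i) / sigma_i for all j.

    rho_i is the distance to the nearest neighbour of x_i, sigma_i the distance
    to its kappa-th nearest neighbour (assumed > 0 in this sketch).
    """
    others = sorted(D[i][j] for j in range(len(D)) if j != i)
    rho, sigma = others[0], others[kappa - 1]
    return [0.0 if j == i else (D[i][j] - rho) / sigma for j in range(len(D))]


D = [
    [0.0, 1.0, 2.0, 4.0],
    [1.0, 0.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 2.0],
    [4.0, 3.0, 2.0, 0.0],
]
d0 = local_distances(D, 0, kappa=3)
print(d0)
print([math.exp(-d) for d in d0])   # edge strengths out of x_0
```

The nearest neighbour always lands at rescaled distance 0, so its edge has strength $e^{0} = 1$, which is the local connectivity property stated above.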
Current implementational shortcuts
Using the nearest neighbour descent algorithm (Dong et al. 2011) to efficiently yield an approximate κ-nearest neighbour graph data structure.
The actual normalizing factor is a “smoothed” version of $\sigma_i$: $\hat\sigma_i$ s.t.
$\sum_{x_{j_k} \in \kappa\text{-NN}_i} \exp\!\left(-\frac{D_{i j_k} - \rho_i}{\hat\sigma_i}\right) = \log_2 \kappa$.
RHS chosen experimentally! Final Eψ-metric has points outside κ-NN$_i$ ∞-ly far away from $x_i$. Reduction in complexity from $O(N^2)$ to $O(N\kappa)$!
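The defining equation for $\hat\sigma_i$ has no closed form, but its left-hand side is increasing in $\hat\sigma_i$, so a bisection search works. The sketch below is one way to solve it under the assumptions of finite distances and κ ≥ 3; it is not claimed to match the library's routine line by line.

```python
import math


def smoothed_sigma(knn_dists, n_iter=64):
    """Solve sum_k exp(-(d_k - rho) / sigma) = log2(kappa) for sigma by bisection.

    knn_dists holds the distances from one point to its kappa nearest neighbours.
    """
    kappa = len(knn_dists)
    if kappa < 3:
        raise ValueError("this sketch assumes kappa >= 3")
    rho = min(knn_dists)
    target = math.log2(kappa)

    def total(sigma):
        return sum(math.exp(-(d - rho) / sigma) for d in knn_dists)

    lo, hi = 0.0, 1.0
    while total(hi) < target:       # grow the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(n_iter):         # halve the bracket
        mid = 0.5 * (lo + hi)
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


dists = [1.0, 1.5, 2.0, 3.0, 4.0]
sigma_hat = smoothed_sigma(dists)
print(sigma_hat)
```

As $\hat\sigma_i \to 0$ the sum tends to the number of neighbours at distance exactly $\rho_i$, and as $\hat\sigma_i \to \infty$ it tends to κ, so the target $\log_2 \kappa$ is bracketed whenever the nearest neighbour is unique and κ ≥ 3.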
Embedding initialization
Fuzzy union of all local fuzzy sets of edges gives an undirected weighted graph with weighted adjacency matrix $B$. With $D$ the degree matrix,
$L := D^{-1/2}(D - B)D^{-1/2} = I - D^{-1/2} B D^{-1/2}$
is the symmetric normalized Laplacian. If the data were generated by sampling from a Riemannian manifold, $L$ should be closely related to the Laplace–Beltrami operator. Exploit this to initialize the low dimensional representation into a good state by spectral embedding techniques.
In practice
Components of eigenvectors associated with the d smallest non-zero eigenvalues of $L$ (listed in ascending eigenvalue order) are used to initialize the embedding to a point cloud $Z = \{Z_1, \dots, Z_N\} \subset \mathbb{R}^d$.
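A dependency-free sketch of the two matrix constructions. It assumes the directed local strengths are stored in a matrix `A` with `A[i][j]` the strength of the edge from $x_i$ to $x_j$ in $\mathcal{M}_i$, takes the fuzzy union entrywise as $A + A^{\top} - A \circ A^{\top}$ (the product-based union from the fuzzy sets slide), and assumes every vertex has positive degree. The eigenvectors themselves would come from any symmetric eigensolver and are not computed here.

```python
import math


def fuzzy_union_graph(A):
    """Symmetrize directed strengths: B[i][j] = a + b - a * b with a = A[i][j], b = A[j][i]."""
    n = len(A)
    return [[A[i][j] + A[j][i] - A[i][j] * A[j][i] for j in range(n)] for i in range(n)]


def normalized_laplacian(B):
    """L = I - D^{-1/2} B D^{-1/2}, with D the diagonal matrix of row sums of B."""
    n = len(B)
    deg = [sum(row) for row in B]
    return [
        [(1.0 if i == j else 0.0) - B[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


A = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
B = fuzzy_union_graph(A)
L = normalized_laplacian(B)
for row in L:
    print([round(v, 4) for v in row])
```

A quick sanity check on the output is that the vector with entries $\sqrt{D_{ii}}$ lies in the kernel of $L$; this is the zero eigenvalue that the spectral initialization skips.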
Embedding optimization (briefly)
Recall the optimization objective: if $(E, \mu_h) = \bigcup_{i=1}^{N} \text{Fin-fSing}(\mathcal{M}_i)([1])$ and $\mathcal{Z} := (Z, d_{\mathrm{Eucl}})$ then the loss function is
$L(Z) = C\big((E, \mu_h), (E, \mu(Z))\big)$ where $(E, \mu(Z)) := \text{Fin-fSing}(\mathcal{Z})([1])$.
Several shortcuts:
• Use stochastic gradient descent.
• (S)GD would benefit from the final objective function being differentiable. But Fin-fSing, as a function of N points in $\mathbb{R}^d$, is not! Use a smooth approximation of the actual membership strength function for the low dimensional representation, selecting from a suitably versatile family. In practice UMAP uses the family of curves $\frac{1}{1 + a x^{2b}}$.
• Don’t want to have to deal with all possible edges, so use the negative sampling trick (as in word2vec and LargeVis) to sample negative examples as needed.
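The shortcuts can be combined into a rough Monte Carlo surrogate for the loss. This is a sketch under stated assumptions, not the library's optimizer: terms that do not depend on Z are dropped, negatives are drawn uniformly at random, and the values a = b = 1 and the helper names are purely illustrative.

```python
import math
import random


def low_dim_membership(x, a, b):
    """Smooth membership curve 1 / (1 + a * x^(2b)) at Euclidean distance x."""
    return 1.0 / (1.0 + a * x ** (2.0 * b))


def euclid(p, q):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(p, q)))


def sampled_loss(Z, mu_h, a, b, n_neg, rng, eps=1e-12):
    """Attractive terms over known edges plus repulsive terms over sampled non-edges.

    mu_h maps index pairs (i, j) to high dimensional strengths; for each such
    edge, n_neg random vertices k are treated as negatives for vertex i.
    """
    total = 0.0
    for (i, j), w in mu_h.items():
        mu_l = low_dim_membership(euclid(Z[i], Z[j]), a, b)
        total += -w * math.log(max(mu_l, eps))               # attractive part
        for _ in range(n_neg):
            k = rng.randrange(len(Z))
            if k == i:
                continue
            mu_neg = low_dim_membership(euclid(Z[i], Z[k]), a, b)
            total += -math.log(max(1.0 - mu_neg, eps))       # repulsive part
    return total


Z = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
edges = {(0, 1): 0.9, (1, 2): 0.1}
print(sampled_loss(Z, edges, a=1.0, b=1.0, n_neg=3, rng=random.Random(0)))
```

The curve equals 1 at distance 0 and decays smoothly, so both parts of the surrogate are differentiable in the coordinates of Z wherever the points are distinct, which is what (S)GD needs.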
Thank you for your attention!

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 

UMAP - Mathematics and implementational details

  • 1. The Uniform Manifold Approximation and Projection Algorithm: Dimensionality reduction from local metric learning via fuzzy simplicial sets. Umberto Lupo, April 26, 2019
  • 2. Table of contents: 1. The old mathematics; 2. The fuzzy mathematics; 3. Uniformity and local metric structure; 4. Implementational details
  • 3. In one slide! By L. McInnes, J. Healy and J. Melville (arXiv:1802.03426). Python library umap-learn: based on scikit-learn, optimized with numba. An unsupervised algorithm for non-linear dimensionality reduction. A noteworthy alternative to t-SNE.
    1. Input: $N \times N$ distance matrix (e.g. from $N$ points in Euclidean $\mathbb{R}^m$).
    2. Parameters: number of neighbours $\kappa$, embedding dimension $d$, etc.
    3. Topological simplification steps:
       a) For each $i = 1, \dots, N$, construct an "almost metric" space $M_i$ local to entry $i$ by normalizing distances with respect to the $\kappa$-th nearest entry.
       b) Distill the topological and geometric content of each $M_i$ into a fuzzy simplicial set $F_i$.
       c) The fuzzy union $\bigcup_i F_i$ is a global topological representation.
    4. Dimensionality reduction steps:
       a) Initialize a cloud $Z$ of $N$ points in Euclidean $\mathbb{R}^d$.
       b) Use fuzzy set cross-entropy to measure distance between $Z$'s fuzzy simplicial representation and the input's.
       c) Move points of $Z$ around until this distance is minimized.
  • 5. Abstracting away abstract simplicial complexes. An abstract simplicial complex (ASC) is a family $X$ of non-empty finite sets such that $\alpha \in X,\ \emptyset \neq \beta \subseteq \alpha \Rightarrow \beta \in X$. If $\operatorname{card}(\alpha) = n + 1$ then $\alpha$ is an $n$-simplex of $X$. The set of all $n$-simplices of $X$ is denoted by $X_n$. $V = X_0$ is the set of vertices. Can construct a geometric realization $|X|$ of $X$ as a simplicial complex in the vector space $\mathbb{R}^J = \{\text{functions } J \to \mathbb{R}\}$ where $J$ is any sufficiently large index set ($J = V$ works). No real need for a total ordering on $V$ so far. With one, could define face maps $d^n_i : X_n \to X_{n-1}$ for each $n > 0$ and $0 \le i \le n$: $\alpha = \{v_0, \dots, v_n\}$ where $v_0 < \dots < v_n \implies d^n_i(\alpha) = \alpha \setminus \{v_i\}$. Idea for a generalization: Do not impose that $n$-simplices for $n \ge 1$ be sets of vertices. Let them simply be elements of an abstract set $X_n$. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering.
  • 6. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering. → Promote to axioms key structural properties of the collection of $d^n_i : X_n \to X_{n-1}$ which don't require knowing what the simplices look like... Not much! Only the simplicial identity (SI): $d^{n-1}_i \circ d^n_j = d^{n-1}_{j-1} \circ d^n_i : X_n \to X_{n-2}$ for all $0 \le i < j \le n$. Sequence of sets $(X_n)_{n \in \mathbb{N}_0}$ and $\{d^n_i : X_n \to X_{n-1}\}$ satisfying (SI) → data for a Delta set (sometimes: "abstract Delta complex"). More general than ASCs because e.g.: 1. $i \neq j \not\Rightarrow d_i(\alpha) \neq d_j(\alpha)$; 2. $d_i(\alpha) = d_i(\beta)\ \forall i \not\Rightarrow \alpha = \beta$. Geometric realization: For each simplex $\alpha$ let $|\Delta_\alpha| = |\Delta^{\dim \alpha}|$ where $|\Delta^n| \subseteq \mathbb{R}^{n+1}$ is the standard geometric $n$-simplex. Identify the faces appropriately to construct the topological space $\mathrm{Real}(X)$ as a quotient of the disjoint union $\bigsqcup_\alpha |\Delta_\alpha|$. Hint: $(d^n_i \alpha, x) \sim (\alpha, D^i_n x)$ where $D^i_n : |\Delta^{n-1}| \to |\Delta^n|$ is the inclusion of the $i$-th face (a coface map).
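
The simplicial identity (SI) can be checked mechanically in the special case where simplices really are ordered vertex sets. The sketch below is only an illustration: the helper names `face` and `satisfies_simplicial_identity` are hypothetical, and simplices are assumed to be sorted Python tuples.

```python
from itertools import combinations


def face(simplex, i):
    """Return the i-th face of an ordered simplex (a sorted tuple) by dropping vertex i."""
    return simplex[:i] + simplex[i + 1:]


def satisfies_simplicial_identity(simplex):
    """Check d_i(d_j(s)) == d_{j-1}(d_i(s)) for every pair 0 <= i < j <= n."""
    n = len(simplex) - 1
    return all(
        face(face(simplex, j), i) == face(face(simplex, i), j - 1)
        for i, j in combinations(range(n + 1), 2)  # yields pairs with i < j
    )


if __name__ == "__main__":
    tetrahedron = (0, 1, 2, 3)  # an ordered 3-simplex
    print(satisfies_simplicial_identity(tetrahedron))
```

Dropping vertex $j$ first shifts nothing below it, whereas dropping vertex $i < j$ first shifts vertex $j$ down to position $j - 1$; that index shift is all the identity records.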
  • 7. Reorganize: Prototype ordered combinatorial $n$-simplex: $[n] = \{0, \dots, n\}$. Since $\{[n]\}_{n \in \mathbb{N}_0} \cong \mathbb{N}_0$, can think of $(X_n)_{n \in \mathbb{N}_0}$ as $X : [n] \mapsto X([n]) = X_n$. Know how to extract $i$-th faces of all $n$-simplices at once: $d^n_i : X([n]) \to X([n-1])$. $d^n_i$ "corresponds to" $[n] \setminus \{i\}$. But $\{[n] \setminus \{i\} : 0 \le i \le n\} \cong \{f : [n-1] \to [n], \text{ strictly order-preserving}\}$. $d^n_i$ implements in $X_n$ the prototype map $D^i_n : [n-1] \to [n]$ given by $0 \mapsto 0, \dots, i \mapsto i + 1, \dots, n - 1 \mapsto n$... Familiar? $\implies$ Our Delta set $X$ is an implementation of $\{[n]\}_{n \in \mathbb{N}_0}$ and of the collection of coface maps. Boring until we notice: $D^j_n \circ D^i_{n-1} = D^i_n \circ D^{j-1}_{n-1}$ for all $0 \le i < j \le n$... Again familiar? For $[l] \xrightarrow{f} [m] \xrightarrow{g} [n]$ let $f \circ^{\mathrm{op}} g := g \circ f$. Starting from $X(D^i_n) := d^n_i$ we can define $X(D^i_{n-1} \circ^{\mathrm{op}} D^j_n) := X(D^i_{n-1}) \circ X(D^j_n)$ consistently thanks to (SI)! And extend to arbitrary compositions s.t. $X(f \circ^{\mathrm{op}} g) = X(f) \circ X(g)$. Abstract nonsense: A Delta set is a functor $X : \Delta^{\mathrm{op}} \to \mathrm{Sets}$ where $\Delta$ is the category with objects the $[n]$s, and arrows the strictly order-preserving maps.
  • 8. Further generalize (yes, really): Easy with categories and functors! Enlarge collection of arrows to include all non-strictly order-preserving maps. Call the new category $\Delta$. A simplicial set is a functor $X : \Delta^{\mathrm{op}} \to \mathrm{Sets}$. The collection of simplicial sets has the structure of a category $S$. But why? We would like to include "degenerate" simplices. Degeneracy maps $s^n_i : X([n]) \to X([n + 1])$ expose any hidden degenerate simplices "by repeating the $i$-th vertex". Example: $(v_0, v_1, v_1) = s^1_1((v_0, v_1))$, a degenerate 2-simplex "living inside" $(v_0, v_1)$. $s^n_i$ corresponds to and implements the unique order-preserving map $S^i_n : [n + 1] \to [n]$ repeating $i$ twice: a codegeneracy map and the prototype of a "collapse" of an ordered simplex. Additional easy-to-check-but-tedious-to-write identities are satisfied when codegeneracy maps are added to the coface maps. Functoriality yields corresponding identities satisfied by the face and degeneracy maps. Geometric realization: As for Delta sets, but add equivalences $(s^n_i \alpha, x) \sim (\alpha, S^i_n x)$. $\mathrm{Real} : S \to \mathrm{Top}$ is a functor.
  • 10. Motivation for us: variations on the theme of singular homology of a topological space $Y$. $\mathrm{Sing}(Y)$ is the simplicial set defined by $\mathrm{Sing}(Y) : [n] \mapsto \{\sigma : |\Delta^n| \to Y \text{ continuous}\}$, with $d_i \sigma$ the restriction of $\sigma$ to the $i$-th face and $s_i \sigma$ the composition of $\sigma$ with a collapse. $\mathrm{Sing} : \mathrm{Top} \to S$ is in fact a functor. "This is just another definition, I want my time back." OK, but first note down this theorem: for any $Y \in \mathrm{Top}$ and $X \in S$, (Adj) $\{\text{Top-arrows } \mathrm{Real}(X) \to Y\} \cong \{S\text{-arrows } X \to \mathrm{Sing}(Y)\}$. Interpretation: Sing and Real are not inverses, but $\mathrm{Real}(\mathrm{Sing}(Y))$ would have topologically a lot in common with $Y$. UMAP employs a cousin of this result where Top is replaced by a category of finite "almost metric" spaces, because these are directly and naturally defined by the data. What, then, must replace $S$, Real and Sing to yield something analogous to (Adj)?
  • 12. Fuzzy sets. In sets, the membership relation $\in$ is binary: either $x \in A$ or $x \notin A$. A fuzzy set is a pair $(A, \mu)$ where $A$ is a carrier set and $\mu : A \to [0, 1]$ is a membership function, i.e. $\mu(x)$ is the membership strength of $x$ to $A$. Interpreting $\mu$ as a "field of Bernoulli probabilities" suggests fuzzy analogues to the standard Boolean operators $\cup$ and $\cap$: $(A, \mu) \cap (B, \nu) = (A \cap B, \top(\mu, \nu))$, with e.g. $\top(\mu, \nu) := \mu\nu$; $(A, \mu) \cup (B, \nu) = (A \cup B, \neg\top(\neg\mu, \neg\nu))$, with e.g. $\neg(x) := 1 - x \implies \neg\top(\neg\mu, \neg\nu) = \mu + \nu - \mu\nu$. If $A = B = U$, the fuzzy set cross entropy between $(U, \mu)$ and $(U, \nu)$ is $C((U, \mu), (U, \nu)) = \sum_{u \in U} \mathrm{KL}\big(\mathrm{Bern}(\mu(u)) \,\|\, \mathrm{Bern}(\nu(u))\big) = \sum_{u \in U} \left[ \mu(u) \log \frac{\mu(u)}{\nu(u)} + (1 - \mu(u)) \log \frac{1 - \mu(u)}{1 - \nu(u)} \right]$.
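
The fuzzy operations on this slide translate almost directly into code. The following is a minimal sketch rather than anything taken from umap-learn: fuzzy sets are assumed to be dictionaries mapping elements to membership strengths, elements missing from one operand of a union are assumed to have strength 0, and the clipping constant `eps` is an assumption added only to keep the logarithms finite.

```python
import math


def fuzzy_intersection(mu, nu):
    """Product t-norm on the common carrier: strength mu * nu."""
    return {x: mu[x] * nu[x] for x in mu.keys() & nu.keys()}


def fuzzy_union(mu, nu):
    """Probabilistic sum mu + nu - mu * nu; a missing element counts as strength 0."""
    out = {}
    for x in mu.keys() | nu.keys():
        m, v = mu.get(x, 0.0), nu.get(x, 0.0)
        out[x] = m + v - m * v
    return out


def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Sum over the shared carrier of KL(Bern(mu(u)) || Bern(nu(u)))."""
    total = 0.0
    for u in mu:
        # Clip away from 0 and 1 so that every logarithm is defined (an assumption).
        m = min(max(mu[u], eps), 1.0 - eps)
        v = min(max(nu[u], eps), 1.0 - eps)
        total += m * math.log(m / v) + (1.0 - m) * math.log((1.0 - m) / (1.0 - v))
    return total


if __name__ == "__main__":
    mu = {"a": 0.5, "b": 0.9}
    nu = {"a": 0.5, "b": 0.2}
    print(fuzzy_union(mu, nu))
    print(fuzzy_cross_entropy(mu, nu))
```

The cross entropy vanishes exactly when the two membership functions agree, which is what makes it usable as a loss later in the deck.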
  • 13. Fuzzy simplicial sets. A simplicial set was a functor $\Delta^{\mathrm{op}} \to \mathrm{Sets}$. A fuzzy simplicial set is a functor $X : \Delta^{\mathrm{op}} \to \mathrm{Fuzz}$ where Fuzz is the category of fuzzy sets. sFuzz is the category of fuzzy simplicial sets. "Concretely": Let $I$ be $(0, 1] \subset \mathbb{R}$ (as a category...), then can view $X \in \mathrm{sFuzz}$ as a functor $X : (\Delta \times I)^{\mathrm{op}} \to \mathrm{Sets}$. For each $n$, there is a fuzzy set $(X_n, \mu_n)$. Define $X([n], a) := \mu_n^{-1}([a, 1])$. Geometric realization...? For simplicial sets, $\mathrm{Real}(X) = \bigsqcup_\alpha |\Delta_\alpha| / \sim$ where each $|\Delta_\alpha| = |\Delta^{\dim \alpha}|$. Reliant on the fact that for each object in $\Delta^{\mathrm{op}}$, i.e. for each $n$, we have a model space $|\Delta^n| \in \mathrm{Top}$. Here objects in the source category $(\Delta \times I)^{\mathrm{op}}$ contain the extra piece of information $a \in (0, 1]$. If we had equivalent model spaces $|\Delta^n_a|$ and chose a category $C \ni |\Delta^n_a|$ to replace Top, we could define a fuzzy set realization functor $\mathrm{fReal} : \mathrm{sFuzz} \to C$ "analogously" to Real.
  • 14. The correct adjunction. Recall (Adj) relating $\mathrm{Sing} : \mathrm{Top} \to S$ and $\mathrm{Real} : S \to \mathrm{Top}$. $|\Delta^n|$ appears in the definition of Real but also of Sing: $\mathrm{Sing}(Y)([n]) = \{\sigma : |\Delta^n| \to Y \text{ continuous}\} = \{\text{Top-arrows } |\Delta^n| \to Y\}$. With a choice of "geometric" category $C$ and of model space $|\Delta^n_a| \in C$, we can define by analogy $\mathrm{fSing}(Y)([n], a) = \{C\text{-arrows } |\Delta^n_a| \to Y\}$ so that $\mathrm{fSing} : C \to \mathrm{sFuzz}$. The obvious question: What are "correct" choices of $C$ and $|\Delta^n_a|$? Our answer: Ones yielding a relation between fSing and fReal analogous to (Adj), e.g. $C = \mathrm{E\psi Met}$, $|\Delta^n_a| = \{(t_0, \dots, t_n) \in \mathbb{R}^{n+1} \mid \sum_{i=0}^{n} t_i = -\log(a),\ t_i \ge 0\}$ (Spivak 2012). EψMet is the category of extended (distance $= \infty$ allowed) pseudo- ($\mathrm{dist}(x, y) = 0 \not\Rightarrow x = y$) metric spaces.
  • 15. Finite version. Starting from a real-life point cloud we can at best hope to encode the metric structure in a finite almost-metric space. Need finite analogs Fin-EψMet, Fin-sFuzz, $|\Delta^n_a|_{\mathrm{Fin}} \in$ Fin-EψMet, $\text{Fin-E}\psi\text{Met} \xrightarrow{\text{Fin-fSing}} \text{Fin-sFuzz} \xrightarrow{\text{Fin-fReal}} \text{Fin-E}\psi\text{Met}$, and a finite fuzzy analog (Fin-fAdj) of (Adj). Their (straightforward) definitions and a proof of (Fin-fAdj) are the main mathematical contributions of the UMAP paper. Where are we at? If our data problem naturally yields an object $M \in$ Fin-EψMet, we can theoretically distill much of the topological information by computing $\text{Fin-fSing}(M)([n], a)$ for all $n \ge 1$, $a \in (0, 1]$. If we have a collection $\{M_i\}_{i=1}^{N}$ instead, we can first apply Fin-fSing individually and then take fuzzy unions! This will give us a global, fuzzy simplicial representation.
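
At the level of edges, taking the fuzzy union over the local spaces reduces to a simple symmetrization, sketched below under stated assumptions: `local[i][j]` is a hypothetical matrix holding the membership strength of the edge between points i and j as seen from $M_i$, an edge that is not incident to point k has strength 0 in the fuzzy set coming from $M_k$, and the union uses the probabilistic sum $\mu + \nu - \mu\nu$ from the fuzzy sets slide.

```python
def fuzzy_union_of_local_graphs(local):
    """Combine directed local edge strengths into one symmetric weighted graph.

    The edge {i, j} has strength local[i][j] in the fuzzy set built from M_i,
    local[j][i] in the one built from M_j, and 0 in all others, so the
    probabilistic union over all N local sets reduces to a + b - a * b.
    """
    n = len(local)
    return [
        [local[i][j] + local[j][i] - local[i][j] * local[j][i] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical strengths: point 0 sees point 1 weakly, point 1 sees point 0 strongly.
    local = [
        [0.0, 0.5, 0.0],
        [1.0, 0.0, 0.25],
        [0.0, 0.0, 0.0],
    ]
    for row in fuzzy_union_of_local_graphs(local):
        print(row)
```

The result is symmetric by construction, so it can be read as the weighted adjacency matrix of an undirected graph.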
  • 16. Computer-friendly version. We descend back to planet Earth. Truncate: Stop the computation of $\text{Fin-fSing}(M)$ at some small finite $n$! Maximally cheap: $n = 1$. Understand the output data structure: Requires a look at the definitions. $|\Delta^n_a|_{\mathrm{Fin}} := (\{x_0, \dots, x_n\}, d_a)$, $d_a(x_i, x_j) = -(1 - \delta_{ij}) \log a$, $\text{Fin-fSing}(M)([n], a) := \{\text{Fin-E}\psi\text{Met-arrows } |\Delta^n_a|_{\mathrm{Fin}} \to M\} = \{\text{distance non-increasing maps } |\Delta^n_a|_{\mathrm{Fin}} \to M\}$. So $|\Delta^1_a| \cong (\{0, -\log a\}, d_{\mathrm{Eucl}})$ and, if $M = (M, d)$: $\text{Fin-fSing}(M)([1], a) = \{(p, q) \in M \times M \mid d(p, q) \le -\log a\}$. So the fuzzy set of 1-simplices is $(M \times M, \mu)$ where $\mu(p, q) = e^{-d(p, q)}$. Just a weighted graph!
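
The passage from level sets to edge weights can be made concrete with a few lines of code. This is a sketch under the assumption that the finite space is given as a dense distance matrix (a list of lists, possibly containing `math.inf`); the function names are hypothetical.

```python
import math


def one_simplices(dist, a):
    """Level set Fin-fSing(M)([1], a): all pairs (p, q) with d(p, q) <= -log a."""
    threshold = -math.log(a)
    n = len(dist)
    return {(p, q) for p in range(n) for q in range(n) if dist[p][q] <= threshold}


def edge_memberships(dist):
    """Membership strength exp(-d(p, q)); an infinite distance gives strength 0."""
    return [[math.exp(-d) for d in row] for row in dist]


if __name__ == "__main__":
    dist = [
        [0.0, 1.0, math.inf],
        [1.0, 0.0, 2.0],
        [math.inf, 2.0, 0.0],
    ]
    print(one_simplices(dist, 0.3))  # -log(0.3) is about 1.2, so (0, 1) is included
    print(edge_memberships(dist))
```

A pair $(p, q)$ belongs to the level set at $a$ precisely when $a \le e^{-d(p, q)}$, so $e^{-d(p, q)}$ is the largest level at which the edge still appears, which is why it serves as the edge's membership strength.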
  • 18. Fuzzy set cross-entropy. Let $E$ be the abstract set of all possible 1-simplices and suppose we have two fuzzy sets $(E, \mu_h)$ and $(E, \mu_l)$; in our view these should correspond to high and low dimensional representations respectively. Then the fuzzy set cross entropy will be $\sum_{e \in E} \left[ \mu_h(e) \log \frac{\mu_h(e)}{\mu_l(e)} + (1 - \mu_h(e)) \log \frac{1 - \mu_h(e)}{1 - \mu_l(e)} \right]$. For fixed $\mu_h$, minimizing this as a function of $\mu_l$ can be viewed as a force directed graph layout algorithm:
    • The first term is minimized when $\mu_l(e)$ is as large as possible, i.e. when the distance between the points is as small as possible $\implies$ an "attractive force" which is larger when $\mu_h(e)$ is large.
    • The second term is minimized by making $\mu_l(e)$ as small as possible $\implies$ a "repulsive force" between the ends of $e$ whenever $\mu_h(e)$ is small.
  • 19. Uniformity and local metric structure
  • 20. Why uniformity? (Very vaguely) Some motivation: the Čech complex construction from a finite sample of points is best at topologically reconstructing the underlying manifold when the points are sampled uniformly. Theorem (Niyogi et al. 2008). Let $M$ be a smooth, compact submanifold of $\mathbb{R}^n$ with injectivity radius $\tau$. Let $D$ be a collection of points on $M$ such that the minimal distance between any point of $M$ and $D$ is less than $\epsilon/2$ for $\epsilon < \tau\sqrt{3/5}$; say that $D$ is $\epsilon/2$-dense in $M$. Then the Čech complex $\check{C}_{2\epsilon}(D)$ deformation retracts to $M$ ($\implies$ homotopy equivalence $\implies$ same homology). Other results show that the more points we sample uniformly from $M$, the higher the probability that the resulting $D$ will be $\epsilon/2$-dense.
  • 21. Learning local metric spaces from data. Basic idea: If enough data is sampled uniformly from a Riemannian manifold, we should be able to estimate the local metric from the local density of sample points. Can estimate the local metric structure relative to which the data would be uniformly sampled by enforcing that spheres of radius $\delta$ centred at different locations in the point cloud should contain the same number $K$ of sample points. In practice, locally rescale distances between each reference point and the rest of the cloud by making sure this is the case.
  • 23. Local (extended pseudo-)metric spaces. Start from an $N \times N$ distance matrix $D$, fix $\kappa \ge 1$. Naïve idea: define, for $i = 1, \dots, N$, $M_i = (M = \{x_i\}_{i=1}^{N}, d_i)$ where, for all $j \neq i$, $d_i(x_i, x_j) = \frac{D_{ij} - \rho_i}{\sigma_i}$, with $\rho_i$ := distance between $x_i$ and its 1st nearest neighbour (NN), $\sigma_i$ := distance between $x_i$ and its $\kappa$-th NN, and all other independent distances are infinite. $d_i(x_i, \text{1st NN}) = 0 \implies$ corresponding edge has membership strength 1 $\implies$ local connectivity. Current implementational shortcuts: The nearest neighbour descent algorithm (Dong et al. 2011) is used to efficiently yield an approximate $\kappa$-nearest neighbour graph data structure. The actual normalizing factor is a "smoothed" version of $\sigma_i$: $\hat\sigma_i$ s.t. $\sum_{x_{j_k} \in \kappa\text{-NN}_i} \exp\big(-(D_{i j_k} - \rho_i) / \hat\sigma_i\big) = \log_2 \kappa$. RHS chosen experimentally! Final Eψ-metric has points outside $\kappa\text{-NN}_i$ infinitely far away from $x_i$. Reduction in complexity from $O(N^2)$ to $O(N\kappa)$!
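
The smoothed normalizing factor has no closed form, but the left-hand side of its defining equation is monotone in $\hat\sigma_i$, so a bisection search finds it. The sketch below illustrates that idea only and is not the umap-learn implementation: the function name, the bracketing strategy and the iteration count are assumptions, and the input is assumed to be the sorted list of distances from $x_i$ to its $\kappa$ nearest neighbours.

```python
import math


def smooth_sigma(knn_dists, n_iter=64):
    """Return (rho, sigma_hat) with sum_k exp(-(d_k - rho) / sigma_hat) = log2(kappa).

    knn_dists: distances from x_i to its kappa nearest neighbours, sorted ascending.
    """
    kappa = len(knn_dists)
    rho = knn_dists[0]  # distance to the 1st nearest neighbour
    target = math.log2(kappa)

    def total(sigma):
        # Non-decreasing in sigma: tends to kappa as sigma grows without bound.
        return sum(math.exp(-(d - rho) / sigma) for d in knn_dists)

    lo, hi = 1e-12, 1.0
    while total(hi) < target:  # grow the bracket until it contains the solution
        hi *= 2.0
    for _ in range(n_iter):  # plain bisection
        mid = 0.5 * (lo + hi)
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return rho, 0.5 * (lo + hi)


if __name__ == "__main__":
    rho, sigma_hat = smooth_sigma([1.0, 2.0, 3.0, 4.0])
    print(rho, sigma_hat)
```

Because the nearest neighbour always contributes exactly 1 to the sum, the edge to it keeps membership strength 1 whatever value $\hat\sigma_i$ takes, which preserves the local connectivity property noted on the slide.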
  • 24. Embedding initialization. Fuzzy union of all local fuzzy sets of edges gives an undirected weighted graph with weighted adjacency matrix $B$. With $D$ the degree matrix, $L := D^{-1/2}(D - B)D^{-1/2} = I - D^{-1/2} B D^{-1/2}$ is the symmetric normalized Laplacian. If the data were generated by sampling from a Riemannian manifold, $L$ should be closely related to the Laplace–Beltrami operator. Exploit this to initialize the low dimensional representation into a good state by spectral embedding techniques. In practice: Components of eigenvectors associated with the $d$ smallest non-zero eigenvalues of $L$ (listed in ascending eigenvalue order) are used to initialize the embedding to a point cloud $Z = \{Z_1, \dots, Z_N\} \subset \mathbb{R}^d$.
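
A dense-matrix version of this initialization fits in a few lines of NumPy. The sketch below makes simplifying assumptions that a real implementation would not: the graph is assumed connected with no isolated vertices (so that exactly one eigenvalue of $L$ is zero and $D^{-1/2}$ exists), and a full eigendecomposition is used instead of a sparse solver. The function name `spectral_init` is hypothetical.

```python
import numpy as np


def spectral_init(B, d):
    """Initial embedding from the symmetric normalized Laplacian of adjacency matrix B.

    Assumes B is symmetric with positive degrees and the graph is connected,
    so that only the first eigenvalue of L is zero.
    """
    B = np.asarray(B, dtype=float)
    degrees = B.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    # L = I - D^{-1/2} B D^{-1/2}, computed without forming D explicitly.
    L = np.eye(len(B)) - inv_sqrt[:, None] * B * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigenvectors = np.linalg.eigh(L)
    # Skip the eigenvector of the zero eigenvalue and keep the next d.
    return eigenvectors[:, 1:d + 1]


if __name__ == "__main__":
    # Hypothetical weighted path graph on four points.
    B = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ])
    Z = spectral_init(B, d=2)
    print(Z.shape)
```

Row $i$ of the returned array is the initial position $Z_i \in \mathbb{R}^d$ of the $i$-th data point.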
  • 25. Embedding optimization (briefly). Recall the optimization objective: if $(E, \mu_h) = \bigcup_{i=1}^{N} \text{Fin-fSing}(M_i)([1])$ and $Z := (Z, d_{\mathrm{Eucl}})$ then the loss function is $L(Z) = C\big((E, \mu_h), (E, \mu(Z))\big)$ where $(E, \mu(Z)) := \text{Fin-fSing}(Z)([1])$. Several shortcuts:
    • Use stochastic gradient descent.
    • (S)GD would benefit from the final objective function being differentiable. But Fin-fSing, as a function of $N$ points in $\mathbb{R}^d$, is not! Use a smooth approximation of the actual membership strength function for the low dimensional representation, selecting from a suitably versatile family. In practice UMAP uses the family of curves $\frac{1}{1 + a x^{2b}}$.
    • Don't want to have to deal with all possible edges, so use the negative sampling trick (as in word2vec and LargeVis) to sample negative examples as needed.
  • 26. Thank you for your attention!
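
The smooth family $\frac{1}{1 + a x^{2b}}$ and its derivative with respect to the low dimensional distance $x$ are easy to write down, and the derivative is what a gradient-based optimizer needs. The sketch below is illustrative only: the values of `a` and `b` in the usage example are arbitrary placeholders rather than values taken from the slides or from umap-learn.

```python
def low_dim_membership(x, a, b):
    """Smooth membership strength 1 / (1 + a * x^(2b)) for a distance x >= 0."""
    return 1.0 / (1.0 + a * x ** (2.0 * b))


def low_dim_membership_grad(x, a, b):
    """Derivative with respect to x: -2ab * x^(2b - 1) / (1 + a * x^(2b))^2, for x > 0."""
    return -2.0 * a * b * x ** (2.0 * b - 1.0) / (1.0 + a * x ** (2.0 * b)) ** 2


if __name__ == "__main__":
    a, b = 1.0, 1.0  # placeholder curve parameters
    for x in (0.0, 0.5, 1.0, 2.0):
        print(x, low_dim_membership(x, a, b))
```

The curve equals 1 at distance 0 and decays smoothly towards 0, so it behaves like the exact membership $e^{-d}$ qualitatively while remaining differentiable in the embedding coordinates.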