UMAP is a dimensionality-reduction technique proposed two years ago that has quickly gained widespread adoption.
In this presentation I will try to demystify UMAP by comparing it to t-SNE. I will also sketch its theoretical background in topology and fuzzy sets.
2. Table of contents
1. The old mathematics
2. The fuzzy mathematics
3. Uniformity and local metric structure
4. Implementational details
3. In one slide!
By L. McInnes, J. Healy and J. Melville (arXiv:1802.03426). Python library umap-learn: based on scikit-learn, optimized with numba; a usage sketch follows this slide.
An unsupervised algorithm for non-linear dimensionality reduction. A noteworthy alternative to t-SNE.
1. Input: N × N distance matrix (e.g. from N pts in Euclidean R^m).
2. Parameters: num. neighbours κ, embedding dimension d, etc.
3. Topological simplification steps:
a) ∀ i = 1, ..., N, construct an "almost metric" space M_i local to entry i by normalizing distances with respect to the κ-th nearest entry.
b) Distill the topological and geometric content of each M_i into a fuzzy simplicial set F_i.
c) The fuzzy union ∪_i F_i is a global topological representation.
4. Dimensionality reduction steps:
a) Initialize a cloud Z of N points in Euclidean R^d.
b) Use fuzzy set cross-entropy to measure the distance between Z's fuzzy simplicial representation and the input's.
c) Move points of Z around until this distance is minimized.
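All of the above sits behind a scikit-learn-style interface in umap-learn. A minimal usage sketch (the data X and the parameter values are illustrative; n_neighbors plays the role of κ and n_components the role of d):

```python
import numpy as np
import umap  # pip install umap-learn

X = np.random.rand(500, 30)  # toy stand-in for N = 500 points in R^30

reducer = umap.UMAP(
    n_neighbors=15,   # κ: size of the local neighbourhoods
    n_components=2,   # d: embedding dimension
    metric="euclidean",
)
Z = reducer.fit_transform(X)  # the optimized low-dimensional cloud Z
print(Z.shape)                # (500, 2)
```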
5. Abstracting away abstract simplicial complexes
An abstract simplicial complex (ASC) is a family X of non-empty finite sets such that α ∈ X, ∅ ≠ β ⊆ α ⇒ β ∈ X.
If card(α) = n + 1 then α is an n-simplex of X. The set of all n-simplices of X is denoted by X_n. V = X_0 is the set of vertices.
Can construct a geometric realization |X| of X as a simplicial complex in the vector space R^J = {functions J → R} where J is any sufficiently large index set (J = V works).
No real need for a total ordering on V so far. With one, could define face maps d^n_i : X_n → X_{n−1} for each n > 0 and 0 ≤ i ≤ n:
α = {v_0, ..., v_n} where v_0 < ··· < v_n ⟹ d^n_i(α) = α \ {v_i}.
Idea for a generalization: Do not impose that n-simplices for n ≥ 1 be sets of vertices. Let them simply be elements of an abstract set X_n. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering.
6. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering.
→ Promote to axioms key structural properties of the collection of d^n_i : X_n → X_{n−1} which don't require knowing what the simplices look like.
... Not much! Only the simplicial identity
(SI) d^{n−1}_i ∘ d^n_j = d^{n−1}_{j−1} ∘ d^n_i : X_n → X_{n−2} ∀ 0 ≤ i < j ≤ n.
A sequence of sets (X_n)_{n∈N₀} and maps {d^n_i : X_n → X_{n−1}} satisfying (SI) is the data for a Delta set (sometimes: "abstract Delta complex"). More general than ASCs because e.g. the following ASC properties may fail:
1. i ≠ j ⟹ d_i(α) ≠ d_j(α);
2. d_i(α) = d_i(β) ∀ i ⟹ α = β.
Geometric realization: For each simplex α let |∆_α| = |∆^{dim α}| where |∆^n| ⊆ R^{n+1} is the standard geometric n-simplex. Identify the faces appropriately to construct the topological space Real(X) as a quotient of the disjoint union ⊔_α |∆_α|. Hint: (d^n_i α, x) ∼ (α, D^i_n x) where D^i_n : |∆^{n−1}| → |∆^n| is the inclusion of the i-th face (a coface map).
7. Reorganize: Prototype ordered combinatorial n-simplex: [n] = {0, ..., n}.
Since {[n]}_{n∈N₀} ≅ N₀, can think of (X_n)_{n∈N₀} as X : [n] ↦ X([n]) = X_n.
Know how to extract i-th faces of all n-simplices at once: d^n_i : X([n]) → X([n−1]). d^n_i "corresponds to" [n] \ {i}. But
{[n] \ {i} : 0 ≤ i ≤ n} ≅ {f : [n−1] → [n], strictly order-preserving}.
d^n_i implements in X_n the prototype map D^i_n : [n−1] → [n] given by 0 ↦ 0, ..., i ↦ i+1, ..., n−1 ↦ n ... Familiar?
⟹ Our Delta set X is an implementation of {[n]}_{n∈N₀} and of the collection of coface maps. Boring until we notice:
D^j_n ∘ D^i_{n−1} = D^i_n ∘ D^{j−1}_{n−1} ∀ 0 ≤ i < j ≤ n ... Again familiar?
For f : [l] → [m] and g : [m] → [n] let f ∘^op g := g ∘ f. Starting from X(D^i_n) := d^n_i we can define X(D^i_{n−1} ∘^op D^j_n) := X(D^i_{n−1}) ∘ X(D^j_n) consistently thanks to (SI)! And extend to arbitrary compositions s.t. X(f ∘^op g) = X(f) ∘ X(g).
Abstract nonsense: A Delta set is a functor X : ∆^op → Sets where ∆ is the category with objects the [n]s, and arrows the strictly o.-p. maps.
8. Further generalize (yes, really): Easy with categories and functors!
Enlarge the collection of arrows to include all non-strictly o.-p. maps. Call the new category ∆. A simplicial set is a functor X : ∆^op → Sets. The collection of simplicial sets has the structure of a category S.
But why? We would like to include "degenerate" simplices. Degeneracy maps s^n_i : X([n]) → X([n+1]) expose any hidden degenerate simplices "by repeating the i-th vertex". Example: (v_0, v_1, v_1) = s^1_1((v_0, v_1)), a degenerate 2-simplex "living inside" (v_0, v_1). s^n_i corresponds to and implements the unique o.-p. map S^i_n : [n+1] → [n] repeating i twice – a codegeneracy map and the prototype of a "collapse" of an ordered simplex. Additional easy-to-check-but-tedious-to-write identities are satisfied when codegeneracy maps are added to the coface maps. Functoriality yields corresponding identities satisfied by the face and degeneracy maps.
Geometric realization: As for Delta sets, but add the equivalences (s^n_i α, x) ∼ (α, S^i_n x). Real : S → Top is a functor.
10. Motivation for us: Variations on the theme of singular homology of a topological space Y: Sing(Y) is the simplicial set defined by
Sing(Y) : [n] ↦ {σ : |∆^n| → Y continuous},
with d_i σ the restriction of σ to the i-th face and s_i σ the composition of σ with a collapse. Sing : Top → S is in fact a functor.
This is just another definition, I want my time back. OK, but first note down this theorem: for any Y ∈ Top and X ∈ S,
(Adj) {Top-arrows Real(X) → Y} ≅ {S-arrows X → Sing(Y)}.
Interpretation
Sing and Real are not inverses, but if you did Real(Sing(Y)) the result would have topologically a lot in common with Y.
UMAP employs a cousin of this result where Top is replaced by a category of finite "almost metric" spaces because these are directly and naturally defined by the data. What, then, must replace S, Real and Sing to yield something analogous to (Adj)?
12. Fuzzy sets
In sets, the membership relation ∈ is binary: either x ∈ A or x ∉ A. A fuzzy set is a pair (A, µ) where A is a carrier set and µ : A → [0, 1] is a membership function, i.e. µ(x) is the membership strength of x in A.
Interpreting µ as a "field of Bernoulli probabilities" suggests fuzzy analogues of the standard Boolean operators ∪ and ∩:
(A, µ) ∩ (B, ν) = (A ∩ B, ∧(µ, ν)), with e.g. ∧(µ, ν) := µν
(A, µ) ∪ (B, ν) = (A ∪ B, ¬∧(¬µ, ¬ν)), with e.g. ¬(x) := 1 − x ⟹ ¬∧(¬µ, ¬ν) = µ + ν − µν.
If A = B = U, the fuzzy set cross entropy between (U, µ) and (U, ν) is
C((U, µ), (U, ν)) = Σ_{u∈U} KL(Bern(µ(u)) ‖ Bern(ν(u)))
= Σ_{u∈U} [ µ(u) log(µ(u)/ν(u)) + (1 − µ(u)) log((1 − µ(u))/(1 − ν(u))) ].
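As a concreteness check, a minimal numpy sketch of these definitions, with membership functions stored as arrays over a shared carrier set (the eps clipping is a numerical guard, not part of the definition):

```python
import numpy as np

def fuzzy_intersection(mu, nu):
    """Probabilistic fuzzy intersection: ∧(µ, ν) = µν."""
    return mu * nu

def fuzzy_union(mu, nu):
    """Probabilistic fuzzy union: ¬∧(¬µ, ¬ν) = µ + ν − µν."""
    return mu + nu - mu * nu

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Sum over the carrier set of KL(Bern(µ(u)) || Bern(ν(u)))."""
    mu = np.clip(mu, eps, 1 - eps)
    nu = np.clip(nu, eps, 1 - eps)
    return float(np.sum(mu * np.log(mu / nu)
                        + (1 - mu) * np.log((1 - mu) / (1 - nu))))
```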
13. Fuzzy simplicial sets
A simplicial set was a functor ∆^op → Sets. A fuzzy simplicial set is a functor X : ∆^op → Fuzz where Fuzz is the category of fuzzy sets. sFuzz is the category of fuzzy simplicial sets.
"Concretely": Let I be (0, 1] ⊂ R (as a category...), then can view X ∈ sFuzz as a functor X : (∆ × I)^op → Sets. For each n, there is a fuzzy set (X_n, µ_n). Define X([n], a) := µ_n^{−1}([a, 1]).
Geometric realization...? For simplicial sets, Real(X) = ⊔_α |∆_α| / ∼ where each |∆_α| = |∆^{dim α}|. Reliant on the fact that for each object in ∆^op – i.e. for each n – we have a model space |∆^n| ∈ Top. Here objects in the source category (∆ × I)^op contain the extra piece of information a ∈ (0, 1]. If we had equivalent model spaces |∆^n_a| and chose a category C ∋ |∆^n_a| to replace Top we could define a fuzzy set realization functor fReal : sFuzz → C "analogously" to Real.
14. The correct adjunction
Recall (Adj) relating Sing : Top → S and Real : S → Top. |∆^n| appears in the definition of Real but also of Sing:
Sing(Y)([n]) = {σ : |∆^n| → Y cts} = {Top-arrows |∆^n| → Y}.
With a choice of "geometric" category C and of model space |∆^n_a| ∈ C, we can define by analogy
fSing(Y)([n], a) = {C-arrows |∆^n_a| → Y} so that fSing : C → sFuzz.
The obvious question
What are "correct" choices of C and |∆^n_a|?
Our answer
Ones yielding a relation between fSing and fReal analogous to (Adj): e.g.
C = EψMet, |∆^n_a| = { (t_0, ..., t_n) ∈ R^{n+1} : Σ_{i=0}^n t_i = −log(a), t_i ≥ 0 }
(Spivak 2012). EψMet is the category of extended (dist = ∞ allowed) pseudo- (distinct points may be at distance 0) metric spaces.
15. Finite version
Starting from a real-life point cloud we can at best hope to encode the metric structure in a finite almost-metric space. Need finite analogs
Fin-EψMet, Fin-sFuzz, |∆^n_a|_Fin ∈ Fin-EψMet,
Fin-fSing : Fin-EψMet → Fin-sFuzz, Fin-fReal : Fin-sFuzz → Fin-EψMet,
and a finite fuzzy analog (Fin-fAdj) of (Adj). Their (straightforward) definitions and a proof of (Fin-fAdj) are the main mathematical contributions of the UMAP paper.
Where we at?
If our data problem naturally yields an object M ∈ Fin-EψMet, we can theoretically distill much of the topological information by computing Fin-fSing(M)([n], a) ∀ n ≥ 1, a ∈ (0, 1]. If we have a collection {M_i}_{i=1}^N instead, we can first apply Fin-fSing individually and then take fuzzy unions! This will give us a global, fuzzy simplicial representation.
16. Computer-friendly version
We descend back to planet Earth.
Truncate: Stop the computation of Fin-fSing(M) at some small finite n! Maximally cheap: n = 1.
Understand the output data structure: Requires a look at the definitions.
|∆^n_a|_Fin := ({0, ..., n}, d_a), d_a(i, j) = −(1 − δ_ij) log a,
Fin-fSing(M)([n], a) := {Fin-EψMet-arrows |∆^n_a|_Fin → M} = {distance non-increasing maps |∆^n_a|_Fin → M}.
So |∆^1_a| ≅ ({0, −log a}, d_Eucl) and, if M = (M, d):
Fin-fSing(M)([1], a) = {(p, q) ∈ M × M | d(p, q) ≤ −log a}.
So the fuzzy set of 1-simplices is (M × M, µ) where µ(p, q) = e^{−d(p,q)}. Just a weighted graph!
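In code, the truncation at n = 1 is just an exponential applied to a distance matrix. A dense numpy sketch (the real implementation works on a sparse κ-NN graph and uses the locally rescaled distances of the later slides):

```python
import numpy as np

def fuzzy_one_skeleton(D):
    """Membership strengths of the 1-simplices of Fin-fSing(M):
    µ(p, q) = exp(−d(p, q)), i.e. a weighted adjacency matrix."""
    mu = np.exp(-D)
    np.fill_diagonal(mu, 0.0)  # drop the trivial pairs (p, p)
    return mu
```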
18. Fuzzy set cross-entropy
Let E be the abstract set of all possible 1-simplices and suppose we have two fuzzy sets (E, µ_h) and (E, µ_l) – in our view these should correspond to the high and low dimensional representations respectively. Then the fuzzy set cross entropy will be
Σ_{e∈E} [ µ_h(e) log(µ_h(e)/µ_l(e)) + (1 − µ_h(e)) log((1 − µ_h(e))/(1 − µ_l(e))) ].
For fixed µ_h, minimizing this as a function of µ_l can be viewed as a force-directed graph layout algorithm (see the sketch after this slide):
• The first term is minimized when µ_l(e) is as large as possible, i.e. when the distance between the points is as small as possible ⟹ an "attractive force" which is larger when µ_h(e) is large.
• The second term will be minimized by making µ_l(e) as small as possible ⟹ a "repulsive force" between the ends of e whenever µ_h(e) is small.
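A small sketch of this decomposition, keeping only the µ_l-dependent parts of the cross-entropy (the µ_h-only terms are constant while optimizing the layout):

```python
import numpy as np

def layout_forces(mu_h, mu_l, eps=1e-12):
    """Split the edge-wise cross-entropy (up to µ_h-only constants) into
    an attractive and a repulsive contribution."""
    mu_l = np.clip(mu_l, eps, 1 - eps)
    attractive = -(mu_h * np.log(mu_l)).sum()           # pulls endpoints together
    repulsive = -((1 - mu_h) * np.log(1 - mu_l)).sum()  # pushes them apart
    return attractive, repulsive
```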
20. Why uniformity? (Very vaguely)
Some motivation: the Čech complex construction from a finite sample of points is best at topologically reconstructing the underlying manifold when the points are sampled uniformly.
Theorem (Niyogi et al. 2008). Let M be a smooth, compact submanifold of R^n with injectivity radius τ. Let D be a collection of points on M such that the minimal distance between any point of M and D is less than ε/2 for ε < τ√(3/5) – say that D is ε/2-dense in M. Then the Čech complex Č_ε(D) deformation retracts to M (⟹ homotopy equivalence ⟹ same homology).
Other results show that the more points we sample uniformly from M, the higher the probability that the resulting D will be ε/2-dense.
21. Learning local metric spaces from data
Basic idea: If enough data is sampled uniformly from a Riemannian manifold, we should be able to estimate the local metric from the local density of sample points.
Can estimate the local metric structure relative to which the data would be uniformly sampled by enforcing that spheres of radius δ centred at different locations in the point cloud should contain the same number K of sample points.
In practice, locally rescale the distances between each reference point and the rest of the cloud to make sure this is the case.
23. Local (extended pseudo-)metric spaces
Start from an N × N distance matrix D, fix κ ≥ 1. Naïve idea: define, for i = 1, ..., N, M_i = (M, d_i) with M = {x_1, ..., x_N}, where ∀ j ≠ i
d_i(x_i, x_j) = (D_ij − ρ_i)/σ_i, with ρ_i := dist. between x_i and its 1st NN, σ_i := dist. between x_i and its κ-th NN,
and all other independent distances are infinite. d_i(x_i, 1st NN) = 0 ⟹ the corresponding edge has membership strength 1 ⟹ local connectivity.
Current implementational shortcuts
Using the nearest neighbour descent algorithm (Dong et al. 2011) to efficiently yield an approximate κ-nearest-neighbour graph data structure.
The actual normalizing factor is a "smoothed" version of σ_i: σ̂_i s.t.
Σ_{x_j ∈ κ-NN_i} exp(−(D_ij − ρ_i)/σ̂_i) = log₂(κ).
RHS chosen experimentally! The final Eψ-metric has points outside κ-NN_i infinitely far away from x_i. Reduction in complexity from O(N²) to O(Nκ)! (A sketch of the σ̂_i search follows this slide.)
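A sketch of the search for σ̂_i, assuming dists holds the sorted distances from x_i to its κ nearest neighbours (the shipped implementation adds further guards, e.g. for repeated points, but the bisection idea is the same):

```python
import numpy as np

def smoothed_sigma(dists, kappa, n_iter=64, tol=1e-5):
    """Bisection for σ̂ such that Σ_j exp(−(d_j − ρ)/σ̂) = log2(κ)."""
    rho = dists[0]                 # distance to the 1st nearest neighbour
    target = np.log2(kappa)
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        total = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
        if abs(total - target) < tol:
            break
        if total > target:         # σ too large: shrink
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                      # σ too small: grow
            lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
    return rho, sigma
```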
24. Embedding initialization
The fuzzy union of all local fuzzy sets of edges gives an undirected weighted graph with weighted adjacency matrix B. With D the degree matrix,
L := D^{−1/2}(D − B)D^{−1/2} = I − D^{−1/2}BD^{−1/2}
is the symmetric normalized Laplacian. If the data were generated by sampling from a Riemannian manifold, L should be closely related to the Laplace–Beltrami operator. Exploit this to initialize the low dimensional representation into a good state by spectral embedding techniques.
In practice
The components of the eigenvectors associated with the d smallest non-zero eigenvalues of L (listed in ascending eigenvalue order) are used to initialize the embedding to a point cloud Z = {Z_1, ..., Z_N} ⊂ R^d (a sketch follows).
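A scipy sketch of this initialization, assuming B is the symmetric weighted adjacency matrix of a connected graph with no isolated vertices:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import eigsh

def spectral_init(B, d=2):
    """Embed via the eigenvectors of L = I − D^{−1/2} B D^{−1/2}
    attached to the d smallest non-zero eigenvalues."""
    B = csr_matrix(B)
    deg = np.asarray(B.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(deg))
    L = identity(B.shape[0]) - D_inv_sqrt @ B @ D_inv_sqrt
    vals, vecs = eigsh(L, k=d + 1, which="SM")  # d+1 smallest eigenpairs
    order = np.argsort(vals)
    return vecs[:, order[1:d + 1]]  # drop the trivial 0-eigenvector
```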
25. Embedding optimization (briefly)
Recall the optimization objective: if (E, µ_h) = ∪_{i=1}^N Fin-fSing(M_i)([1]) and Z := (Z, d_Eucl), then the loss function is
L(Z) = C((E, µ_h), (E, µ(Z))) where (E, µ(Z)) := Fin-fSing(Z)([1]).
Several shortcuts:
• Use stochastic gradient descent.
• (S)GD would benefit from the final objective function being differentiable. But Fin-fSing – as a function of N points in R^d – is not! Use a smooth approximation of the actual membership strength function for the low dimensional representation, selecting from a suitably versatile family. In practice UMAP uses the family of curves 1/(1 + a x^{2b}) (a fitting sketch follows this slide).
• Don't want to have to deal with all possible edges, so use the negative sampling trick (as in word2vec and LargeVis) to sample negative examples as needed.
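The constants a and b are fit once, before the SGD, so that the smooth curve approximates the desired membership function. A sketch of such a fit (the target curve with its min_dist/spread parametrization mirrors what umap-learn does, but treat the details here as illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_ab(min_dist=0.1, spread=1.0):
    """Fit a, b so that 1/(1 + a x^(2b)) approximates the target membership
    curve: 1 for x ≤ min_dist, exp(−(x − min_dist)/spread) afterwards."""
    x = np.linspace(0.0, 3.0 * spread, 300)
    target = np.where(x <= min_dist, 1.0, np.exp(-(x - min_dist) / spread))
    curve = lambda t, a, b: 1.0 / (1.0 + a * t ** (2.0 * b))
    (a, b), _ = curve_fit(curve, x, target)
    return a, b
```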