UMAP is a dimensionality-reduction technique proposed two years ago that has quickly gained widespread adoption.
In this presentation I will try to demystify UMAP by comparing it to t-SNE. I will also sketch its theoretical background in topology and fuzzy sets.
2. Table of contents
1. The old mathematics
2. The fuzzy mathematics
3. Uniformity and local metric structure
4. Implementational details
3. In one slide!
By L. McInnes, J. Healy and J. Melville (arXiv:1802.03426). Python library umap-learn: based on scikit-learn, optimized with numba; a usage sketch follows this slide.
An unsupervised algorithm for non-linear dimensionality reduction. A noteworthy alternative to t-SNE.
1. Input: N × N distance matrix (e.g. from N pts in Euclidean R^m).
2. Parameters: num. neighbours κ, embedding dimension d, etc.
3. Topological simplification steps:
a) ∀ i = 1, ..., N, construct an "almost metric" space M_i local to entry i by normalizing distances with respect to the κ-th nearest entry.
b) Distill the topological and geometric content of each M_i into a fuzzy simplicial set F_i.
c) The fuzzy union ∪_i F_i is a global topological representation.
4. Dimensionality reduction steps:
a) Initialize a cloud Z of N points in Euclidean R^d.
b) Use fuzzy set cross-entropy to measure the distance between Z's fuzzy simplicial representation and the input's.
c) Move points of Z around until this distance is minimized.
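All of the above sits behind a scikit-learn-style interface in umap-learn. A minimal usage sketch (the data X and the parameter values are illustrative; n_neighbors plays the role of κ and n_components the role of d):

```python
import numpy as np
import umap  # pip install umap-learn

X = np.random.rand(500, 30)  # toy stand-in for N = 500 points in R^30

reducer = umap.UMAP(
    n_neighbors=15,   # κ: size of the local neighbourhoods
    n_components=2,   # d: embedding dimension
    metric="euclidean",
)
Z = reducer.fit_transform(X)  # the optimized low-dimensional cloud Z
print(Z.shape)                # (500, 2)
```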
5. Abstracting away abstract simplicial complexes
An abstract simplicial complex (ASC) is a family X of non-empty finite sets such that α ∈ X, ∅ ≠ β ⊆ α ⇒ β ∈ X.
If card(α) = n + 1 then α is an n-simplex of X. The set of all n-simplices of X is denoted by X_n. V = X_0 is the set of vertices.
Can construct a geometric realization |X| of X as a simplicial complex in the vector space R^J = {functions J → R} where J is any sufficiently large index set (J = V works).
No real need for a total ordering on V so far. With one, could define face maps d^n_i : X_n → X_{n−1} for each n > 0 and 0 ≤ i ≤ n:
α = {v_0, ..., v_n} where v_0 < ··· < v_n ⟹ d^n_i(α) = α \ {v_i}.
Idea for a generalization: Do not impose that n-simplices for n ≥ 1 be sets of vertices. Let them simply be elements of an abstract set X_n. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering.
6. Trade off this loss for a collection of face maps which should behave as if they arose from a total ordering.
→ Promote to axioms key structural properties of the collection of d^n_i : X_n → X_{n−1} which don't require knowing what the simplices look like.
... Not much! Only the simplicial identity
(SI) d^{n−1}_i ∘ d^n_j = d^{n−1}_{j−1} ∘ d^n_i : X_n → X_{n−2} ∀ 0 ≤ i < j ≤ n.
A sequence of sets (X_n)_{n∈N₀} and maps {d^n_i : X_n → X_{n−1}} satisfying (SI) is the data for a Delta set (sometimes: "abstract Delta complex"). More general than ASCs because e.g. the following ASC properties may fail:
1. i ≠ j ⟹ d_i(α) ≠ d_j(α);
2. d_i(α) = d_i(β) ∀ i ⟹ α = β.
Geometric realization: For each simplex α let |∆_α| = |∆^{dim α}| where |∆^n| ⊆ R^{n+1} is the standard geometric n-simplex. Identify the faces appropriately to construct the topological space Real(X) as a quotient of the disjoint union ⊔_α |∆_α|. Hint: (d^n_i α, x) ∼ (α, D^i_n x) where D^i_n : |∆^{n−1}| → |∆^n| is the inclusion of the i-th face (a coface map).
7. Reorganize: Prototype ordered combinatorial n-simplex: [n] = {0, ..., n}.
Since {[n]}_{n∈N₀} ≅ N₀, can think of (X_n)_{n∈N₀} as X : [n] ↦ X([n]) = X_n.
Know how to extract i-th faces of all n-simplices at once: d^n_i : X([n]) → X([n−1]). d^n_i "corresponds to" [n] \ {i}. But
{[n] \ {i} : 0 ≤ i ≤ n} ≅ {f : [n−1] → [n], strictly order-preserving}.
d^n_i implements in X_n the prototype map D^i_n : [n−1] → [n] given by 0 ↦ 0, ..., i ↦ i+1, ..., n−1 ↦ n ... Familiar?
⟹ Our Delta set X is an implementation of {[n]}_{n∈N₀} and of the collection of coface maps. Boring until we notice:
D^j_n ∘ D^i_{n−1} = D^i_n ∘ D^{j−1}_{n−1} ∀ 0 ≤ i < j ≤ n ... Again familiar?
For f : [l] → [m] and g : [m] → [n] let f ∘^op g := g ∘ f. Starting from X(D^i_n) := d^n_i we can define X(D^i_{n−1} ∘^op D^j_n) := X(D^i_{n−1}) ∘ X(D^j_n) consistently thanks to (SI)! And extend to arbitrary compositions s.t. X(f ∘^op g) = X(f) ∘ X(g).
Abstract nonsense: A Delta set is a functor X : ∆^op → Sets where ∆ is the category with objects the [n]s, and arrows the strictly o.-p. maps.
8. Further generalize (yes, really): Easy with categories and functors!
Enlarge the collection of arrows to include all non-strictly o.-p. maps. Call the new category ∆. A simplicial set is a functor X : ∆^op → Sets. The collection of simplicial sets has the structure of a category S.
But why? We would like to include "degenerate" simplices. Degeneracy maps s^n_i : X([n]) → X([n+1]) expose any hidden degenerate simplices "by repeating the i-th vertex". Example: (v_0, v_1, v_1) = s^1_1((v_0, v_1)), a degenerate 2-simplex "living inside" (v_0, v_1). s^n_i corresponds to and implements the unique o.-p. map S^i_n : [n+1] → [n] repeating i twice – a codegeneracy map and the prototype of a "collapse" of an ordered simplex. Additional easy-to-check-but-tedious-to-write identities are satisfied when codegeneracy maps are added to the coface maps. Functoriality yields corresponding identities satisfied by the face and degeneracy maps.
Geometric realization: As for Delta sets, but add the equivalences (s^n_i α, x) ∼ (α, S^i_n x). Real : S → Top is a functor.
10. Motivation for us: Variations on the theme of singular homology of a topological space Y: Sing(Y) is the simplicial set defined by
Sing(Y) : [n] ↦ {σ : |∆^n| → Y continuous},
with d_i σ the restriction of σ to the i-th face and s_i σ the composition of σ with a collapse. Sing : Top → S is in fact a functor.
This is just another definition, I want my time back. OK, but first note down this theorem: for any Y ∈ Top and X ∈ S,
(Adj) {Top-arrows Real(X) → Y} ≅ {S-arrows X → Sing(Y)}.
Interpretation
Sing and Real are not inverses, but if you did Real(Sing(Y)) the result would have topologically a lot in common with Y.
UMAP employs a cousin of this result where Top is replaced by a category of finite "almost metric" spaces because these are directly and naturally defined by the data. What, then, must replace S, Real and Sing to yield something analogous to (Adj)?
12. Fuzzy sets
In sets, the membership relation ∈ is binary: either x ∈ A or x ∉ A. A fuzzy set is a pair (A, µ) where A is a carrier set and µ : A → [0, 1] is a membership function, i.e. µ(x) is the membership strength of x in A.
Interpreting µ as a "field of Bernoulli probabilities" suggests fuzzy analogues of the standard Boolean operators ∪ and ∩:
(A, µ) ∩ (B, ν) = (A ∩ B, ∧(µ, ν)), with e.g. ∧(µ, ν) := µν
(A, µ) ∪ (B, ν) = (A ∪ B, ¬∧(¬µ, ¬ν)), with e.g. ¬(x) := 1 − x ⟹ ¬∧(¬µ, ¬ν) = µ + ν − µν.
If A = B = U, the fuzzy set cross entropy between (U, µ) and (U, ν) is
C((U, µ), (U, ν)) = Σ_{u∈U} KL(Bern(µ(u)) ‖ Bern(ν(u)))
= Σ_{u∈U} [ µ(u) log(µ(u)/ν(u)) + (1 − µ(u)) log((1 − µ(u))/(1 − ν(u))) ].
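As a concreteness check, a minimal numpy sketch of these definitions, with membership functions stored as arrays over a shared carrier set (the eps clipping is a numerical guard, not part of the definition):

```python
import numpy as np

def fuzzy_intersection(mu, nu):
    """Probabilistic fuzzy intersection: ∧(µ, ν) = µν."""
    return mu * nu

def fuzzy_union(mu, nu):
    """Probabilistic fuzzy union: ¬∧(¬µ, ¬ν) = µ + ν − µν."""
    return mu + nu - mu * nu

def fuzzy_cross_entropy(mu, nu, eps=1e-12):
    """Sum over the carrier set of KL(Bern(µ(u)) || Bern(ν(u)))."""
    mu = np.clip(mu, eps, 1 - eps)
    nu = np.clip(nu, eps, 1 - eps)
    return float(np.sum(mu * np.log(mu / nu)
                        + (1 - mu) * np.log((1 - mu) / (1 - nu))))
```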
13. Fuzzy simplicial sets
A simplicial set was a functor ∆^op → Sets. A fuzzy simplicial set is a functor X : ∆^op → Fuzz where Fuzz is the category of fuzzy sets. sFuzz is the category of fuzzy simplicial sets.
"Concretely": Let I be (0, 1] ⊂ R (as a category...), then can view X ∈ sFuzz as a functor X : (∆ × I)^op → Sets. For each n, there is a fuzzy set (X_n, µ_n). Define X([n], a) := µ_n^{−1}([a, 1]).
Geometric realization...? For simplicial sets, Real(X) = ⊔_α |∆_α| / ∼ where each |∆_α| = |∆^{dim α}|. Reliant on the fact that for each object in ∆^op – i.e. for each n – we have a model space |∆^n| ∈ Top. Here objects in the source category (∆ × I)^op contain the extra piece of information a ∈ (0, 1]. If we had equivalent model spaces |∆^n_a| and chose a category C ∋ |∆^n_a| to replace Top we could define a fuzzy set realization functor fReal : sFuzz → C "analogously" to Real.
14. The correct adjunction
Recall (Adj) relating Sing : Top → S and Real : S → Top. |∆^n| appears in the definition of Real but also of Sing:
Sing(Y)([n]) = {σ : |∆^n| → Y cts} = {Top-arrows |∆^n| → Y}.
With a choice of "geometric" category C and of model space |∆^n_a| ∈ C, we can define by analogy
fSing(Y)([n], a) = {C-arrows |∆^n_a| → Y} so that fSing : C → sFuzz.
The obvious question
What are "correct" choices of C and |∆^n_a|?
Our answer
Ones yielding a relation between fSing and fReal analogous to (Adj): e.g.
C = EψMet, |∆^n_a| = { (t_0, ..., t_n) ∈ R^{n+1} : Σ_{i=0}^n t_i = −log(a), t_i ≥ 0 }
(Spivak 2012). EψMet is the category of extended (dist = ∞ allowed) pseudo- (distinct points may be at distance 0) metric spaces.
15. Finite version
Starting from a real-life point cloud we can at best hope to encode the metric structure in a finite almost-metric space. Need finite analogs
Fin-EψMet, Fin-sFuzz, |∆^n_a|_Fin ∈ Fin-EψMet,
Fin-fSing : Fin-EψMet → Fin-sFuzz, Fin-fReal : Fin-sFuzz → Fin-EψMet,
and a finite fuzzy analog (Fin-fAdj) of (Adj). Their (straightforward) definitions and a proof of (Fin-fAdj) are the main mathematical contributions of the UMAP paper.
Where we at?
If our data problem naturally yields an object M ∈ Fin-EψMet, we can theoretically distill much of the topological information by computing Fin-fSing(M)([n], a) ∀ n ≥ 1, a ∈ (0, 1]. If we have a collection {M_i}_{i=1}^N instead, we can first apply Fin-fSing individually and then take fuzzy unions! This will give us a global, fuzzy simplicial representation.
16. Computer-friendly version
We descend back to planet Earth.
Truncate: Stop the computation of Fin-fSing(M) at some small finite n! Maximally cheap: n = 1.
Understand the output data structure: Requires a look at the definitions.
|∆^n_a|_Fin := ({0, ..., n}, d_a), d_a(i, j) = −(1 − δ_ij) log a,
Fin-fSing(M)([n], a) := {Fin-EψMet-arrows |∆^n_a|_Fin → M} = {distance non-increasing maps |∆^n_a|_Fin → M}.
So |∆^1_a| ≅ ({0, −log a}, d_Eucl) and, if M = (M, d):
Fin-fSing(M)([1], a) = {(p, q) ∈ M × M | d(p, q) ≤ −log a}.
So the fuzzy set of 1-simplices is (M × M, µ) where µ(p, q) = e^{−d(p,q)}. Just a weighted graph!
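In code, the truncation at n = 1 is just an exponential applied to a distance matrix. A dense numpy sketch (the real implementation works on a sparse κ-NN graph and uses the locally rescaled distances of the later slides):

```python
import numpy as np

def fuzzy_one_skeleton(D):
    """Membership strengths of the 1-simplices of Fin-fSing(M):
    µ(p, q) = exp(−d(p, q)), i.e. a weighted adjacency matrix."""
    mu = np.exp(-D)
    np.fill_diagonal(mu, 0.0)  # drop the trivial pairs (p, p)
    return mu
```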
18. Fuzzy set cross-entropy
Let E be the abstract set of all possible 1-simplices and suppose we have two fuzzy sets (E, µ_h) and (E, µ_l) – in our view these should correspond to the high and low dimensional representations respectively. Then the fuzzy set cross entropy will be
Σ_{e∈E} [ µ_h(e) log(µ_h(e)/µ_l(e)) + (1 − µ_h(e)) log((1 − µ_h(e))/(1 − µ_l(e))) ].
For fixed µ_h, minimizing this as a function of µ_l can be viewed as a force-directed graph layout algorithm (see the sketch after this slide):
• The first term is minimized when µ_l(e) is as large as possible, i.e. when the distance between the points is as small as possible ⟹ an "attractive force" which is larger when µ_h(e) is large.
• The second term will be minimized by making µ_l(e) as small as possible ⟹ a "repulsive force" between the ends of e whenever µ_h(e) is small.
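A small sketch of this decomposition, keeping only the µ_l-dependent parts of the cross-entropy (the µ_h-only terms are constant while optimizing the layout):

```python
import numpy as np

def layout_forces(mu_h, mu_l, eps=1e-12):
    """Split the edge-wise cross-entropy (up to µ_h-only constants) into
    an attractive and a repulsive contribution."""
    mu_l = np.clip(mu_l, eps, 1 - eps)
    attractive = -(mu_h * np.log(mu_l)).sum()           # pulls endpoints together
    repulsive = -((1 - mu_h) * np.log(1 - mu_l)).sum()  # pushes them apart
    return attractive, repulsive
```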
20. Why uniformity? (Very vaguely)
Some motivation: the Čech complex construction from a finite sample of points is best at topologically reconstructing the underlying manifold when the points are sampled uniformly.
Theorem (Niyogi et al. 2008). Let M be a smooth, compact submanifold of R^n with injectivity radius τ. Let D be a collection of points on M such that the minimal distance between any point of M and D is less than ε/2 for ε < τ√(3/5) – say that D is ε/2-dense in M. Then the Čech complex Č_ε(D) deformation retracts to M (⟹ homotopy equivalence ⟹ same homology).
Other results show that the more points we sample uniformly from M, the higher the probability that the resulting D will be ε/2-dense.
21. Learning local metric spaces from data
Basic idea: If enough data is sampled uniformly from a Riemannian manifold, we should be able to estimate the local metric from the local density of sample points.
Can estimate the local metric structure relative to which the data would be uniformly sampled by enforcing that spheres of radius δ centred at different locations in the point cloud should contain the same number K of sample points.
In practice, locally rescale the distances between each reference point and the rest of the cloud to make sure this is the case.
23. Local (extended pseudo-)metric spaces
Start from an N × N distance matrix D, fix κ ≥ 1. Naïve idea: define, for i = 1, ..., N, M_i = (M, d_i) with M = {x_1, ..., x_N}, where ∀ j ≠ i
d_i(x_i, x_j) = (D_ij − ρ_i)/σ_i, with ρ_i := dist. between x_i and its 1st NN, σ_i := dist. between x_i and its κ-th NN,
and all other independent distances are infinite. d_i(x_i, 1st NN) = 0 ⟹ the corresponding edge has membership strength 1 ⟹ local connectivity.
Current implementational shortcuts
Using the nearest neighbour descent algorithm (Dong et al. 2011) to efficiently yield an approximate κ-nearest-neighbour graph data structure.
The actual normalizing factor is a "smoothed" version of σ_i: σ̂_i s.t.
Σ_{x_j ∈ κ-NN_i} exp(−(D_ij − ρ_i)/σ̂_i) = log₂(κ).
RHS chosen experimentally! The final Eψ-metric has points outside κ-NN_i infinitely far away from x_i. Reduction in complexity from O(N²) to O(Nκ)! (A sketch of the σ̂_i search follows this slide.)
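A sketch of the search for σ̂_i, assuming dists holds the sorted distances from x_i to its κ nearest neighbours (the shipped implementation adds further guards, e.g. for repeated points, but the bisection idea is the same):

```python
import numpy as np

def smoothed_sigma(dists, kappa, n_iter=64, tol=1e-5):
    """Bisection for σ̂ such that Σ_j exp(−(d_j − ρ)/σ̂) = log2(κ)."""
    rho = dists[0]                 # distance to the 1st nearest neighbour
    target = np.log2(kappa)
    lo, hi, sigma = 0.0, np.inf, 1.0
    for _ in range(n_iter):
        total = np.exp(-np.maximum(dists - rho, 0.0) / sigma).sum()
        if abs(total - target) < tol:
            break
        if total > target:         # σ too large: shrink
            hi = sigma
            sigma = (lo + hi) / 2.0
        else:                      # σ too small: grow
            lo = sigma
            sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
    return rho, sigma
```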
24. Embedding initialization
The fuzzy union of all local fuzzy sets of edges gives an undirected weighted graph with weighted adjacency matrix B. With D the degree matrix,
L := D^{−1/2}(D − B)D^{−1/2} = I − D^{−1/2}BD^{−1/2}
is the symmetric normalized Laplacian. If the data were generated by sampling from a Riemannian manifold, L should be closely related to the Laplace–Beltrami operator. Exploit this to initialize the low dimensional representation into a good state by spectral embedding techniques.
In practice
The components of the eigenvectors associated with the d smallest non-zero eigenvalues of L (listed in ascending eigenvalue order) are used to initialize the embedding to a point cloud Z = {Z_1, ..., Z_N} ⊂ R^d (a sketch follows).
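A scipy sketch of this initialization, assuming B is the symmetric weighted adjacency matrix of a connected graph with no isolated vertices:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity
from scipy.sparse.linalg import eigsh

def spectral_init(B, d=2):
    """Embed via the eigenvectors of L = I − D^{−1/2} B D^{−1/2}
    attached to the d smallest non-zero eigenvalues."""
    B = csr_matrix(B)
    deg = np.asarray(B.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(deg))
    L = identity(B.shape[0]) - D_inv_sqrt @ B @ D_inv_sqrt
    vals, vecs = eigsh(L, k=d + 1, which="SM")  # d+1 smallest eigenpairs
    order = np.argsort(vals)
    return vecs[:, order[1:d + 1]]  # drop the trivial 0-eigenvector
```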
25. Embedding optimization (briefly)
Recall the optimization objective: if (E, µ_h) = ∪_{i=1}^N Fin-fSing(M_i)([1]) and Z := (Z, d_Eucl), then the loss function is
L(Z) = C((E, µ_h), (E, µ(Z))) where (E, µ(Z)) := Fin-fSing(Z)([1]).
Several shortcuts:
• Use stochastic gradient descent.
• (S)GD would benefit from the final objective function being differentiable. But Fin-fSing – as a function of N points in R^d – is not! Use a smooth approximation of the actual membership strength function for the low dimensional representation, selecting from a suitably versatile family. In practice UMAP uses the family of curves 1/(1 + a x^{2b}) (a fitting sketch follows this slide).
• Don't want to have to deal with all possible edges, so use the negative sampling trick (as in word2vec and LargeVis) to sample negative examples as needed.
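The constants a and b are fit once, before the SGD, so that the smooth curve approximates the desired membership function. A sketch of such a fit (the target curve with its min_dist/spread parametrization mirrors what umap-learn does, but treat the details here as illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_ab(min_dist=0.1, spread=1.0):
    """Fit a, b so that 1/(1 + a x^(2b)) approximates the target membership
    curve: 1 for x ≤ min_dist, exp(−(x − min_dist)/spread) afterwards."""
    x = np.linspace(0.0, 3.0 * spread, 300)
    target = np.where(x <= min_dist, 1.0, np.exp(-(x - min_dist) / spread))
    curve = lambda t, a, b: 1.0 / (1.0 + a * t ** (2.0 * b))
    (a, b), _ = curve_fit(curve, x, target)
    return a, b
```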