1. 1 KYOTO UNIVERSITY
Fast Unbalanced Optimal Transport on a Tree
Ryoma Sato
Kyoto University / RIKEN AIP
2. 2 / 13 KYOTO UNIVERSITY
Self Introduction
I am a second-year master's student at
Kyoto University
I’m interested in algorithmic aspects of machine
learning and data mining for structured data, including
Graph neural networks:
Ryoma Sato, Makoto Yamada, Hisashi Kashima. Approximation Ratios of
Graph Neural Networks for Combinatorial Problems. NeurIPS 2019.
Ryoma Sato, Makoto Yamada, Hisashi Kashima. Random Features
Strengthen Graph Neural Networks. SDM 2021.
Optimal transport:
Ryoma Sato, Makoto Yamada, Hisashi Kashima. Fast Unbalanced Optimal
Transport on a Tree. NeurIPS 2020.
Today’s topic
3. 3 / 13 KYOTO UNIVERSITY
Background: optimal transport is useful
The optimal transport (OT) distance measures
the discrepancy between two distributions.
We consider discrete distributions in this presentation.
The OT distance is the minimum cumulative
distance that all masses need to travel from
one distribution to the other.
In generative modeling, a mass is a sample
→ discrepancy between the model distribution and the sample distribution
In text classification, a mass is a word.
OT does not require the same support (unlike KL divergence)
OT exploits the underlying geometry
From Word Embeddings To
Document Distances, ICML 2015
4. 4 / 13 KYOTO UNIVERSITY
Background: sliced OT is computationally cheap
OT is formulated as a linear program → cubic cost
$\min_{P \ge 0} \langle C, P \rangle$ subject to $P\mathbf{1} = a$, $P^\top\mathbf{1} = b$
$C$: distance matrix (input), $P$: matching matrix (variable)
$a$: 1st mass vector (input), $b$: 2nd mass vector (input)
Sliced OT projects distributions onto random
1D spaces and computes OT there
Greedy matching solves 1D OT exactly → linear cost
The leftmost mass should be matched to
the leftmost mass
The second leftmost mass should be matched
to the second leftmost mass ...
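The leftmost-to-leftmost rule can be sketched as a two-pointer sweep over the sorted masses. This is a minimal illustration, assuming a balanced instance (equal total mass) and ground cost |x - y|; the function name `ot_1d` and its argument layout are hypothetical, not taken from the talk. The sweep itself is linear; sorting adds a logarithmic factor when the inputs are not already sorted.

```python
def ot_1d(xs, a, ys, b):
    """Exact 1D OT cost for masses a at positions xs and masses b at positions ys.

    Assumes sum(a) == sum(b) and ground cost |x - y| (illustrative assumptions).
    """
    src = sorted(zip(xs, a))  # (position, mass), left to right
    dst = sorted(zip(ys, b))
    i = j = 0
    rem_a, rem_b = src[0][1], dst[0][1]  # mass still unassigned at src[i], dst[j]
    cost = 0
    while i < len(src) and j < len(dst):
        # Match as much as possible between the current leftmost masses.
        moved = min(rem_a, rem_b)
        cost += moved * abs(src[i][0] - dst[j][0])
        rem_a -= moved
        rem_b -= moved
        if rem_a == 0:  # source mass exhausted: advance to the next source
            i += 1
            if i < len(src):
                rem_a = src[i][1]
        if rem_b == 0:  # sink mass exhausted: advance to the next sink
            j += 1
            if j < len(dst):
                rem_b = dst[j][1]
    return cost


# Unit masses: 0→1, 2→3, 5→4.
print(ot_1d([0, 2, 5], [1, 1, 1], [1, 3, 4], [1, 1, 1]))
```

With non-unit masses the same sweep splits a mass across several targets, for example `ot_1d([0, 10], [2, 1], [1, 9], [1, 2])`.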
https://www.programmersought.com/article/67174999352/
https://analyticsindiamag.com/how-to-establish-domain-transferability-in-neural-models/
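The slicing idea can be sketched as follows, under simplifying assumptions that are not from the talk: both point clouds have the same number of points with uniform masses, so 1D OT reduces to pairing the sorted projections. The function name `sliced_ot` and the parameters `n_projections` and `seed` are hypothetical.

```python
import math
import random


def sliced_ot(points_x, points_y, n_projections=50, seed=0):
    """Monte Carlo sliced OT between two equal-size, uniformly weighted point clouds."""
    assert len(points_x) == len(points_y)
    rng = random.Random(seed)
    n, dim = len(points_x), len(points_x[0])
    total = 0.0
    for _ in range(n_projections):
        # Random direction on the unit sphere: normalise a Gaussian vector.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Project both clouds onto the direction and sort.
        px = sorted(sum(p[k] * d[k] for k in range(dim)) for p in points_x)
        py = sorted(sum(p[k] * d[k] for k in range(dim)) for p in points_y)
        # 1D OT with uniform masses: i-th leftmost goes to i-th leftmost.
        total += sum(abs(u - v) for u, v in zip(px, py)) / n
    return total / n_projections


cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
shifted = [(x + 3.0, y) for x, y in cloud]
print(sliced_ot(cloud, shifted))
```

For a pure translation, each projected cost is the projected shift length, so the sliced value is positive and never exceeds the length of the shift.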
5. 5 / 13 KYOTO UNIVERSITY
Background: unbalanced OT is robust
OT is sensitive to outliers because the cost of transporting
outliers becomes the dominant term
Unbalanced OT (UOT) allows masses to be discarded and
created at the cost of a penalty
→ We can discard outliers → robust to outliers
UOT is also formulated as a linear program
→ cubic cost
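One common way to write such a penalised problem is sketched below. The notation is an assumption made for illustration: $C$ is the distance matrix, $P$ the matching matrix, $a$ and $b$ the two mass vectors, and a single penalty $\lambda$ is used for both discarding and creating mass. The formulation used in the paper may differ, for example by using separate penalties.

```latex
\min_{P \ge 0} \;
  \langle C, P \rangle
  + \lambda \, \lVert a - P\mathbf{1} \rVert_1
  + \lambda \, \lVert b - P^{\top}\mathbf{1} \rVert_1
\quad \text{s.t.} \quad
  P\mathbf{1} \le a, \;\; P^{\top}\mathbf{1} \le b
```

Under the two inequality constraints the norms reduce to linear terms, $\lVert a - P\mathbf{1} \rVert_1 = \mathbf{1}^{\top}a - \mathbf{1}^{\top}P\mathbf{1}$ and likewise for $b$, so the problem remains a linear program.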
6. 6 / 13 KYOTO UNIVERSITY
Background: UOT is difficult even in 1D spaces
We want a cheap alternative to UOT, as 1D OT is for OT
But the greedy matching fails to solve 1D UOT
Let’s consider the following instance with discard cost λ
The following plan costs 3λ.
The following plan costs 2λ + 2ε. Thus this is better (when 2ε < λ).
[Figure: 1D instance with labels λ, λ, λ]
7. 7 / 13 KYOTO UNIVERSITY
Background: UOT is difficult even in 1D spaces
Let’s consider the following instance with discard cost λ
The following plan costs λ + 2ε.
The following plan costs 2λ + 2ε. Thus this is worse.
[Figure: 1D instance with labels λ, ε]
8. 8 / 13 KYOTO UNIVERSITY
Background: UOT is difficult even in 1D spaces
Although these two instances share the leftmost part,
the leftmost mass in the first instance should be
discarded while that in the second instance should not
The optimal UOT plan cannot be determined locally
In contrast, the optimal plan of standard 1D OT is determined locally
Thus the greedy algorithm fails to solve 1D UOT
We propose an efficient algorithm for solving 1D UOT
[Figure: the two instances, with labels λ, λ, λ and λ, ε]
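The non-locality can be checked on a small example with a simple quadratic-time dynamic program. This sketch is not the algorithm proposed in the talk, and the two instances below are hypothetical ones chosen for illustration, not the instances from the slides. It assumes unit masses, ground cost |x - y|, and a cost of λ (`lam`) for every unmatched mass on either side; for this cost an optimal matching can be taken to be non-crossing, which is what the recurrence relies on.

```python
def uot_1d_unit(xs, ys, lam):
    """Exact 1D UOT for unit masses via an alignment-style DP (O(n * m) time).

    Returns (cost, list of matched (x, y) pairs). Unmatched masses cost lam each.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    # dp[i][j]: optimal cost using the i leftmost sources and j leftmost sinks.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * lam
    for j in range(1, m + 1):
        dp[0][j] = j * lam
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + lam,                              # discard source i
                dp[i][j - 1] + lam,                              # discard sink j
                dp[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]),   # match them
            )
    # Backtrack to recover which masses are matched.
    matches, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]):
            matches.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + lam:
            i -= 1
        else:
            j -= 1
    return dp[n][m], matches[::-1]


# Both instances share the left part (source at 0, sink at 9, source at 10).
print(uot_1d_unit([0, 10], [9], 10))       # the source at 0 is discarded
print(uot_1d_unit([0, 10], [9, 11], 10))   # the source at 0 is matched to 9
```

Adding a single sink on the far right changes the fate of the leftmost source. Greedy leftmost-to-leftmost matching on the first instance would pay 9 + 10 = 19 instead of the optimal 11.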
9. 9 / 13 KYOTO UNIVERSITY
Algorithm: prune redundant plans
Our proposed method determines assignments from
left to right (like the greedy algorithm)
Although there are exponentially many plans, most of
them are redundant.
We proved that only O(n) plans are non-redundant
In standard OT, only one plan is non-redundant (thus greedy is valid)
[Figure: candidate plans labeled "not yet", "not yet", "non-redundant", "redundant"]
10. 10 / 13 KYOTO UNIVERSITY
Algorithm: we solve 1D UOT in O(n log² n) time
A naive algorithm requires cubic time even with this
(non-redundant plan) observation
More algorithmic techniques are required for further
speedup (skipped in this presentation)
Dynamic programming
Fast convex min-sum convolution
Efficient data structure (BBST)
Weighted union heuristics
Finally, we derived a quasi-linear time algorithm
which runs in O(n log² n) time in the worst case
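As a generic illustration of one listed ingredient, the min-sum (min-plus) convolution of two convex sequences can be computed in linear time by merging their slopes, instead of the quadratic brute force. This is the textbook form of the technique, not a reconstruction of how the paper applies it; the function names are hypothetical.

```python
def convex_min_plus(a, b):
    """c[k] = min over i + j = k of a[i] + b[j], for convex sequences a and b.

    Convex means the consecutive differences are non-decreasing.
    Runs in O(len(a) + len(b)) by merging the two sorted slope lists.
    """
    da = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    db = [b[j + 1] - b[j] for j in range(len(b) - 1)]
    c = [a[0] + b[0]]
    i = j = 0
    while i < len(da) or j < len(db):
        # Always take the smaller remaining slope next.
        if j == len(db) or (i < len(da) and da[i] <= db[j]):
            c.append(c[-1] + da[i])
            i += 1
        else:
            c.append(c[-1] + db[j])
            j += 1
    return c


def brute_min_plus(a, b):
    """Quadratic reference implementation, valid for arbitrary sequences."""
    c = [None] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if c[i + j] is None or ai + bj < c[i + j]:
                c[i + j] = ai + bj
    return c


a = [5, 2, 1, 3]      # slopes -3, -1, 2
b = [4, 1, 0, 0, 2]   # slopes -3, -1, 0, 2
print(convex_min_plus(a, b) == brute_min_plus(a, b))
```

The result is again convex, so such convolutions can be chained without losing the property.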
11. 11 / 13 KYOTO UNIVERSITY
Algorithm: tree UOT generalizes 1D UOT
Our method can be extended to tree spaces
A 1D space (path) is a special case of tree spaces
In text classification, the word
space can be represented by a
word tree. Each mass (word)
travels on the word tree to a
nearby (semantically similar) word.
We can “tree-slice” high-dimensional
spaces instead of 1D-slicing,
which captures richer structures
http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/
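To illustrate how masses travel on a tree, here is a sketch of the well-known closed form for balanced OT on a tree: every edge carries the net surplus of the subtree below it. This covers the balanced case only and is not the tree UOT algorithm of the talk. The encoding is an assumption for illustration: node 0 is the root, `parent[v] < v` for every other node, and `weight[v]` is the length of the edge between `v` and `parent[v]`.

```python
def tree_ot(parent, weight, a, b):
    """Balanced OT cost on a tree with masses a and b on the nodes.

    Assumes sum(a) == sum(b), node 0 is the root, and parent[v] < v otherwise.
    """
    n = len(parent)
    surplus = [a[v] - b[v] for v in range(n)]  # net mass at each node
    cost = 0
    # Visit children before parents (valid because parent[v] < v).
    for v in range(n - 1, 0, -1):
        # The subtree's net surplus must cross the edge (v, parent[v]).
        cost += weight[v] * abs(surplus[v])
        surplus[parent[v]] += surplus[v]
    return cost


# A path 0 - 1 - 2 - 3 is the 1D special case: one unit moves from node 0 to node 3.
print(tree_ot([-1, 0, 1, 2], [0, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]))

# A star with three leaves: one unit moves from leaf 1 to leaf 2 through the root.
print(tree_ot([-1, 0, 0, 0], [0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 1, 0]))
```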
12. 12 / 13 KYOTO UNIVERSITY
Experiments: our algorithm is empirically fast
We confirmed that our algorithm could compute tree
UOT with one million masses within one second
We also confirmed that tree-slicing high-dimensional
spaces could approximate the original UOT problem
13. 13 / 13 KYOTO UNIVERSITY
Conclusion: fast computation of tree UOT
Sliced OT is a fast alternative to OT
UOT is a robust variant of OT
1D UOT is more difficult than 1D OT
We proposed the first efficient algorithm for 1D UOT
Our method can be extended to tree spaces
Our method is empirically fast (1M masses in 1 sec)