Fast Unbalanced Optimal Transport on a Tree

Ryoma Sato
Kyoto University / RIKEN AIP

Self Introduction

I am a second-year master's student at
Kyoto University

I’m interested in algorithmic aspects of machine
learning and data mining for structured data, including
Graph neural networks:

Ryoma Sato, Makoto Yamada, Hisashi Kashima. Approximation Ratios of
Graph Neural Networks for Combinatorial Problems. NeurIPS 2019.

Ryoma Sato, Makoto Yamada, Hisashi Kashima. Random Features
Strengthen Graph Neural Networks. SDM 2021.
Optimal transport:

Ryoma Sato, Makoto Yamada, Hisashi Kashima. Fast Unbalanced Optimal
Transport on a Tree. NeurIPS 2020. (today's topic)

Background: optimal transport is useful

The optimal transport (OT) distance measures
the discrepancy between two distributions.
We consider discrete distributions in this presentation.

The OT distance is the minimum cumulative
distance that the masses need to travel to turn
one distribution into the other
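To make the definition concrete, here is a minimal sketch that assumes both distributions consist of the same number of unit masses on the real line. The function name `ot_distance_bruteforce` is hypothetical, and the exhaustive search is only meant for tiny inputs.

```python
from itertools import permutations


def ot_distance_bruteforce(xs, ys):
    """Minimum total travel distance over all one-to-one assignments.

    Assumption: xs and ys hold equally many unit masses on the real line.
    Exponential time, so this only illustrates the definition.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch needs equally many unit masses")
    return min(
        sum(abs(x - y) for x, y in zip(xs, assignment))
        for assignment in permutations(ys)
    )


# Three unit masses on each side.
print(ot_distance_bruteforce([0.0, 1.0, 3.0], [0.5, 2.0, 2.5]))
```

The minimum is attained by matching the masses in sorted order, which is the property that greedy 1D OT relies on.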

In generative modeling, a mass is a sample.
OT measures the discrepancy between the model distribution and the sample distribution

In text classification, a mass is a word.
→ OT does not require the same support (unlike KL divergence)
→ OT exploits the underlying geometry
From Word Embeddings To
Document Distances, ICML 2015

Background: sliced OT is computationally cheap

OT is formulated as a linear program → cubic cost

Sliced OT projects distributions to random
1D spaces and computes OT there
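A minimal sketch of this slicing idea, assuming equally many unit masses in both point sets. The names `ot_1d_sorted` and `sliced_ot` and the parameters `n_projections` and `seed` are hypothetical, chosen for illustration only.

```python
import math
import random


def ot_1d_sorted(us, vs):
    """Exact 1D OT between equally many unit masses: sort, then match in order."""
    return sum(abs(u - v) for u, v in zip(sorted(us), sorted(vs)))


def sliced_ot(points_a, points_b, n_projections=100, seed=0):
    """Average the 1D OT cost over random directions (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(points_a[0])
    total = 0.0
    for _ in range(n_projections):
        # Random unit direction: normalise a standard Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both point sets onto the direction and solve OT in 1D.
        proj_a = [sum(c * x for c, x in zip(direction, p)) for p in points_a]
        proj_b = [sum(c * x for c, x in zip(direction, p)) for p in points_b]
        total += ot_1d_sorted(proj_a, proj_b)
    return total / n_projections


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0)]
print(sliced_ot(a, b))
```

Each projection costs only a sort, so the whole estimate avoids solving a linear program.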

Greedy matching solves 1D OT exactly → linear cost
The leftmost mass should be matched to the leftmost mass
The second leftmost mass should be matched to the second leftmost mass ...

Notation of the linear program for OT, $\min_{P \ge 0} \langle C, P \rangle$ subject to $P \mathbf{1} = a$, $P^\top \mathbf{1} = b$:
$C$: distance matrix (input), $P$: matching matrix (variable)
$a$: 1st mass vector (input), $b$: 2nd mass vector (input)
https://www.programmersought.com/article/67174999352/
https://analyticsindiamag.com/how-to-establish-domain-transferability-in-neural-models/
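The leftmost-to-leftmost rule can be sketched as a two-pointer sweep. This is an illustration under the assumption that both sides carry the same total mass; the name `ot_1d_greedy` and the `(position, mass)` input format are hypothetical.

```python
def ot_1d_greedy(source, target):
    """Greedy left-to-right matching for balanced 1D OT.

    source, target: lists of (position, mass) pairs with equal total mass.
    Returns (cost, plan); plan holds (source_pos, target_pos, moved_mass).
    """
    src, tgt = sorted(source), sorted(target)
    if not src or not tgt:
        return 0.0, []
    i = j = 0
    rem_s, rem_t = src[0][1], tgt[0][1]  # mass still unmatched at each pointer
    cost, plan = 0.0, []
    while True:
        # Move as much as possible between the two leftmost unmatched masses.
        moved = min(rem_s, rem_t)
        if moved > 0:
            cost += moved * abs(src[i][0] - tgt[j][0])
            plan.append((src[i][0], tgt[j][0], moved))
        rem_s -= moved
        rem_t -= moved
        if rem_s == 0:  # current source mass is used up
            i += 1
            if i == len(src):
                break
            rem_s = src[i][1]
        if rem_t == 0:  # current target mass is filled
            j += 1
            if j == len(tgt):
                break
            rem_t = tgt[j][1]
    return cost, plan


cost, plan = ot_1d_greedy([(0, 1), (2, 1)], [(1, 2)])
print(cost, plan)
```

After sorting, the sweep visits each mass once, which is where the linear cost comes from.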

Background: unbalanced OT is robust

OT is sensitive to outliers because transporting outliers
becomes the dominating term

Unbalanced OT (UOT) allows masses to be discarded and created
at the cost of some penalties
→ We can discard outliers → robust to outliers

UOT is also formulated as a linear program
→ cubic cost

Background: UOT is difficult even in 1D spaces

We want a cheap alternative to UOT, just as 1D OT is a cheap alternative to OT

But the greedy matching fails to solve 1D UOT

Let’s consider the following instance with discard cost λ

The following plan costs 3λ.

The following plan costs 2λ + 2ε. Thus this is better.
[Figure: the 1D instance with its two candidate plans (labels: λ, λ, λ)]


Let’s consider the following instance with discard cost λ

The following plan costs λ + 2ε.

The following plan costs 2λ + 2ε. Thus this is worse.
[Figure: the 1D instance with its two candidate plans (labels: λ, ε)]


Although these two instances share the leftmost part,
the leftmost mass in the first instance should be
discarded while that in the second instance should not

The optimal UOT plan cannot be determined locally
(in contrast, the optimal OT plan is determined locally)

Thus the greedy algorithm fails to solve 1D UOT
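The non-locality can be reproduced with a small quadratic-time dynamic program. This is a sketch under an assumed cost model (unit masses, matching x to y costs |x - y|, discarding any mass costs `lam`), and it assumes matched pairs can be taken in sorted, non-crossing order as in 1D OT. It is a baseline for illustration, not the quasi-linear algorithm of the talk, and the two instances below are hypothetical rather than the ones in the figures.

```python
def uot_1d_dp(xs, ys, lam):
    """Quadratic DP for 1D UOT with unit masses (illustrative baseline).

    Returns (cost, plan). Plan entries are ("match", x, y),
    ("discard_source", x) or ("discard_target", y), from left to right.
    """
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    # dp[i][j]: optimal cost for the first i sources and first j targets.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + lam
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + lam
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]),  # match
                dp[i - 1][j] + lam,  # discard source i
                dp[i][j - 1] + lam,  # discard target j
            )
    # Backtrack to recover one optimal plan.
    plan, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]):
            plan.append(("match", xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + lam:
            plan.append(("discard_source", xs[i - 1]))
            i -= 1
        else:
            plan.append(("discard_target", ys[j - 1]))
            j -= 1
    plan.reverse()
    return dp[n][m], plan


# Two hypothetical instances sharing the leftmost part (source 0.0, target 1.5).
print(uot_1d_dp([0.0], [1.5], lam=1.0))       # leftmost source is matched
print(uot_1d_dp([0.0, 1.6], [1.5], lam=1.0))  # leftmost source is discarded
```

Adding a single mass on the right changes what happens to the leftmost mass, so a left-to-right greedy rule cannot commit early.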

We propose an efficient way to solve 1D UOT
[Figure: the two instances side by side (labels: λ, λ, λ and λ, ε)]

Algorithm: prune redundant plans

Our proposed method determines assignments from
left to right (like the greedy algorithm)

Although there are exponentially many plans, most of
them are redundant.
We proved that only O(n) plans are non-redundant
→ In standard OT, only one plan is non-redundant (thus greedy is valid)
[Figure: partial plans with undecided ("not yet") masses, each marked as non-redundant or redundant]

Algorithm: we solve 1D UOT in O(n log² n) time

A naive algorithm requires cubic time even with this
observation about non-redundant plans

More algorithmic techniques are required for further
speedup (skipped in this presentation)

Dynamic programming

Fast convex min-sum convolution

Efficient data structure (BBST)

Weighted-union heuristic

Finally, we derived a quasi-linear time algorithm
which runs in O(n log² n) time in the worst case
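One of the listed ingredients can be illustrated in isolation. The sketch below computes the min-sum (min-plus) convolution of two convex sequences in linear time by merging their sorted differences. How this step is used inside the full algorithm is skipped in the talk and is not reconstructed here; the function names are hypothetical.

```python
def convex_min_plus_convolution(a, b):
    """c[k] = min over i + j = k of a[i] + b[j], for convex sequences a and b.

    Convex means the consecutive differences are non-decreasing. In that
    case the differences of c are the sorted merge of those of a and b,
    so the result takes O(len(a) + len(b)) time.
    """
    da = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    db = [b[j + 1] - b[j] for j in range(len(b) - 1)]
    c = [a[0] + b[0]]
    i = j = 0
    while i < len(da) or j < len(db):
        # Always take the smaller of the next two slopes.
        if j == len(db) or (i < len(da) and da[i] <= db[j]):
            c.append(c[-1] + da[i])
            i += 1
        else:
            c.append(c[-1] + db[j])
            j += 1
    return c


def naive_min_plus_convolution(a, b):
    """Quadratic reference implementation, used only for checking."""
    return [
        min(a[i] + b[k - i] for i in range(len(a)) if 0 <= k - i < len(b))
        for k in range(len(a) + len(b) - 1)
    ]


a = [5, 2, 1, 3]     # differences -3, -1, 2
b = [4, 1, 0, 0, 2]  # differences -3, -1, 0, 2
print(convex_min_plus_convolution(a, b))
print(naive_min_plus_convolution(a, b))
```

The two printed lists agree; without convexity the merge shortcut is not valid and the quadratic version is needed.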

Algorithm: tree UOT generalizes 1D UOT

Our method can be extended to tree spaces
A 1D space (path) is a special case of tree spaces

In text classification, the word
space can be represented by a
word tree. Each mass (word)
travels on the word tree to a
nearby (semantically similar) word.

We can “tree-slice” high-dimensional
spaces instead of 1D-slicing them,
which captures richer structures
http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/
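For intuition about transport on a tree, the balanced case (no discarding or creation) has a well-known closed form: every edge is charged its length times the net mass that must cross it. The sketch below covers only this balanced case, not the tree UOT algorithm of the talk. The parent-array encoding and the name `tree_ot` are assumptions made for illustration.

```python
def tree_ot(parent, weight, mass_a, mass_b):
    """Balanced OT on a rooted tree via the closed form.

    parent[v]: parent of node v, with root 0 having parent -1 and
               parent[v] < v for every other node (assumed encoding).
    weight[v]: length of the edge between v and parent[v] (unused for root).
    mass_a, mass_b: masses per node with equal totals.
    """
    n = len(parent)
    # diff[v] becomes the net surplus of mass_a over mass_b in v's subtree.
    diff = [mass_a[v] - mass_b[v] for v in range(n)]
    cost = 0.0
    for v in range(n - 1, 0, -1):  # children are visited before parents
        cost += weight[v] * abs(diff[v])  # net mass crossing edge (v, parent)
        diff[parent[v]] += diff[v]
    return cost


# A path 0 - 1 - 2 with edge lengths 1 and 2: a 1D space as a special case.
print(tree_ot([-1, 0, 1], [0, 1, 2], [1, 0, 0], [0, 0, 1]))
```

The path example moves one unit of mass across both edges, so the cost is the sum of the two edge lengths.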

Experiments: our algorithm is empirically fast

We confirmed that our algorithm could compute tree
UOT with one million masses within one second

We also confirmed that tree-slicing high-dimensional
spaces could approximate the original UOT problem

Conclusion: fast computation of tree UOT

Sliced OT is a fast alternative to OT

UOT is a robust variant of OT

1D UOT is more difficult than 1D OT

We proposed the first efficient algorithm for 1D UOT

Our method can be extended to tree spaces

Our method is empirically fast (1M masses in 1 sec)
