Fast Unbalanced Optimal Transport on a Tree

Slides on "Fast Unbalanced Optimal Transport on a Tree" (NeurIPS 2020) presented in Australia-Japan Workshop on Machine Learning (AJML) 2021

  1. Fast Unbalanced Optimal Transport on a Tree. Ryoma Sato, Kyoto University / RIKEN AIP
  2. Self Introduction
     - I am a second-year master's student at Kyoto University.
     - I am interested in algorithmic aspects of machine learning and data mining for structured data, including:
       - Graph neural networks:
         - Ryoma Sato, Makoto Yamada, Hisashi Kashima. Approximation Ratios of Graph Neural Networks for Combinatorial Problems. NeurIPS 2019.
         - Ryoma Sato, Makoto Yamada, Hisashi Kashima. Random Features Strengthen Graph Neural Networks. SDM 2021.
       - Optimal transport:
         - Ryoma Sato, Makoto Yamada, Hisashi Kashima. Fast Unbalanced Optimal Transport on a Tree. NeurIPS 2020. (Today's topic)
  3. Background: optimal transport is useful
     - The optimal transport (OT) distance measures the discrepancy between two distributions. We consider discrete distributions in this presentation.
     - The OT distance is the minimum cumulative distance that all masses must travel to move from one distribution to the other.
     - In generative modeling, a mass is a sample, and OT measures the discrepancy between the model distribution and the sample distribution.
     - In text classification, a mass is a word. (Figure from "From Word Embeddings To Document Distances", ICML 2015.)
     - OT does not require the two distributions to share the same support, unlike the KL divergence.
     - OT exploits the underlying geometry of the space.
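The verbal definition above can be written as a minimization over matching matrices. The symbols here are assumptions chosen for illustration, not taken from the slides: $C$ is the distance matrix, $P$ the matching matrix, and $a \in \mathbb{R}^n_{\ge 0}$, $b \in \mathbb{R}^m_{\ge 0}$ the two mass vectors with equal total mass.

```latex
\mathrm{OT}(a, b)
  = \min_{P \in \mathbb{R}_{\ge 0}^{n \times m}} \sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij} P_{ij}
  \quad \text{s.t.} \quad P \mathbf{1}_m = a, \qquad P^{\top} \mathbf{1}_n = b .
```

Here $P_{ij}$ is the amount of mass moved from point $i$ of the first distribution to point $j$ of the second, so the objective is the cumulative distance travelled.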
  4. Background: sliced OT is computationally cheap
     - OT is formulated as a linear program, which has cubic cost.
     - The inputs of the linear program are the distance matrix and the 1st and 2nd mass vectors; the variable is the matching matrix.
     - Sliced OT projects the distributions onto random 1D spaces and computes OT there.
     - Greedy matching solves 1D OT exactly at linear cost: the leftmost mass is matched to the leftmost mass, the second leftmost mass to the second leftmost mass, and so on.
     - Image sources: https://www.programmersought.com/article/67174999352/ and https://analyticsindiamag.com/how-to-establish-domain-transferability-in-neural-models/
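The greedy matching for 1D OT can be sketched as a two-pointer sweep over the sorted points. This is a minimal illustration, not code from the talk: the function name `ot_1d` and its argument layout are hypothetical, and it assumes both distributions have the same total mass. The sweep itself is linear; the initial sort adds a log factor when the input is not already sorted.

```python
def ot_1d(xs, a, ys, b, tol=1e-12):
    """Greedy 1D optimal transport (hypothetical helper for illustration).

    xs, ys: point positions; a, b: non-negative masses with equal totals.
    Returns (cost, plan), where plan is a list of (x, y, moved_mass).
    """
    src = sorted(zip(xs, a))  # first distribution, left to right
    dst = sorted(zip(ys, b))  # second distribution, left to right
    i, j = 0, 0
    rem_a, rem_b = src[0][1], dst[0][1]  # mass still unmatched at src[i], dst[j]
    cost, plan = 0.0, []
    while True:
        moved = min(rem_a, rem_b)  # leftmost remaining mass meets leftmost remaining mass
        cost += moved * abs(src[i][0] - dst[j][0])
        plan.append((src[i][0], dst[j][0], moved))
        rem_a -= moved
        rem_b -= moved
        if rem_a <= tol:  # source point exhausted, advance
            i += 1
            if i == len(src):
                break
            rem_a = src[i][1]
        if rem_b <= tol:  # target point exhausted, advance
            j += 1
            if j == len(dst):
                break
            rem_b = dst[j][1]
    return cost, plan


cost, plan = ot_1d([0, 1, 3], [1, 1, 1], [2, 4, 5], [1, 1, 1])
print(cost, plan)
```

Each loop iteration exhausts at least one point, so the sweep performs at most n + m steps.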
  5. Background: unbalanced OT is robust
     - OT is sensitive to outliers because the cost of transporting outliers becomes the dominating term.
     - Unbalanced OT (UOT) allows masses to be discarded and created at the price of a penalty.
     - Outliers can be discarded, so UOT is robust to outliers.
     - UOT is also formulated as a linear program, which again has cubic cost.
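One common way to write such a penalized problem is shown below. This is a sketch under assumptions, not necessarily the exact formulation used in the paper: it reuses the assumed symbols $C$, $P$, $a$, $b$ for the distance matrix, matching matrix, and mass vectors, and it charges a single penalty $\lambda$ per unit of mass that is discarded or created.

```latex
\mathrm{UOT}_{\lambda}(a, b)
  = \min_{P \in \mathbb{R}_{\ge 0}^{n \times m}}
    \sum_{i,j} C_{ij} P_{ij}
    + \lambda \, \lVert a - P \mathbf{1}_m \rVert_1
    + \lambda \, \lVert b - P^{\top} \mathbf{1}_n \rVert_1
  \quad \text{s.t.} \quad P \mathbf{1}_m \le a, \qquad P^{\top} \mathbf{1}_n \le b .
```

Under the inequality constraints the two norms are linear in $P$, so the problem remains a linear program, consistent with the last bullet of the slide.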
  6. Background: UOT is difficult even in 1D spaces
     - We want a cheap alternative to UOT, as 1D OT is for OT.
     - However, greedy matching fails to solve 1D UOT.
     - Consider the following instance with discard cost λ. [Figure: the instance, with labels λ, λ, λ]
     - One plan costs 3λ.
     - Another plan costs 2λ + 2ε, which is better whenever 2ε < λ.
  7. Background: UOT is difficult even in 1D spaces
     - Consider the following instance with discard cost λ. [Figure: the instance, with labels λ, ε]
     - One plan costs λ + 2ε.
     - Another plan costs 2λ + 2ε, which is worse.
  8. Background: UOT is difficult even in 1D spaces
     - Although the two instances share the same leftmost part, the leftmost mass should be discarded in the first instance but not in the second.
     - The optimal UOT plan cannot be determined locally, whereas the optimal OT plan can.
     - Thus the greedy algorithm fails to solve 1D UOT.
     - We propose a way to solve 1D UOT efficiently.
     - [Figure: the two instances side by side, with labels λ, λ, λ and λ, ε]
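The non-locality can be checked on tiny inputs with an exhaustive search. The sketch below is an illustration under assumptions: it handles unit masses only, charges `lam` for every unmatched mass on either side, and uses a made-up pair of instances, not the ones drawn on the slides. The function name `uot_1d_bruteforce` is hypothetical.

```python
def uot_1d_bruteforce(sources, targets, lam):
    """Exhaustive 1D UOT for unit masses (illustration only, exponential time).

    Every source is either discarded (cost lam) or matched to an unused target
    (cost |x - y|); every target left unmatched also costs lam.
    Returns (best_cost, plan) with plan mapping source index -> target index.
    """
    best = [float("inf"), None]

    def rec(i, used, cost, plan):
        if i == len(sources):
            total = cost + lam * (len(targets) - len(used))
            if total < best[0]:
                best[0], best[1] = total, dict(plan)
            return
        # Option 1: discard source i.
        rec(i + 1, used, cost + lam, plan)
        # Option 2: match source i to any target that is still free.
        for j in range(len(targets)):
            if j not in used:
                plan[i] = j
                rec(i + 1, used | {j}, cost + abs(sources[i] - targets[j]), plan)
                del plan[i]

    rec(0, frozenset(), 0.0, {})
    return best[0], best[1]


# Hypothetical instances: both start with a source at 0 and a target at 1.5.
print(uot_1d_bruteforce([0.0], [1.5], lam=1.0))       # leftmost source is matched
print(uot_1d_bruteforce([0.0, 2.0], [1.5], lam=1.0))  # leftmost source is discarded
```

In the first instance the source at 0 is matched to the target at 1.5. Adding a second source at 2.0 changes the optimum: the new source takes the target and the source at 0 is discarded, so the fate of the leftmost mass depends on what lies to its right.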
  9. Algorithm: prune redundant plans
     - Our proposed method determines assignments from left to right, as the greedy algorithm does.
     - Although there are exponentially many plans, most of them are redundant. We proved that only O(n) plans are non-redundant.
     - In standard OT, only one plan is non-redundant, which is why greedy is valid.
     - [Figure: partial plans labelled "not yet", "non-redundant", and "redundant"]
  10. Algorithm: we solve 1D UOT in O(n log^2 n) time
     - A naive algorithm requires cubic time even with the non-redundant-plan observation.
     - More algorithmic techniques are required for further speedup (skipped in this presentation):
       - Dynamic programming
       - Fast convex min-sum convolution
       - Efficient data structure (BBST)
       - Weighted union heuristics
     - Finally, we derived a quasi-linear time algorithm which runs in O(n log^2 n) time in the worst case.
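For intuition about the dynamic-programming ingredient, here is a much simpler baseline. It is not the O(n log^2 n) algorithm of the talk: it is a textbook quadratic DP that assumes unit masses and a penalty `lam` per unmatched mass, and it relies on the fact that in 1D some optimal partial matching is non-crossing once both sides are sorted. The function name `uot_1d_unit_dp` is hypothetical.

```python
def uot_1d_unit_dp(sources, targets, lam):
    """Quadratic DP for 1D UOT with unit masses (baseline sketch, not the paper's method).

    f[i][j] is the optimal cost of handling the first i sorted sources and the
    first j sorted targets, where each unmatched mass costs lam.
    """
    xs, ys = sorted(sources), sorted(targets)
    n, m = len(xs), len(ys)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * lam  # all of the first i sources are discarded
    for j in range(1, m + 1):
        f[0][j] = j * lam  # all of the first j targets are left unmatched
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = min(
                f[i - 1][j] + lam,                               # discard source i
                f[i][j - 1] + lam,                               # leave target j unmatched
                f[i - 1][j - 1] + abs(xs[i - 1] - ys[j - 1]),    # match source i with target j
            )
    return f[n][m]


print(uot_1d_unit_dp([0.0, 2.0], [1.5], lam=1.0))
```

The table has (n + 1)(m + 1) entries, each filled in constant time, which is the gap that the techniques listed on the slide are meant to close for general masses.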
  11. Algorithm: tree UOT generalizes 1D UOT
     - Our method can be extended to tree spaces. A 1D space (a path) is a special case of a tree space.
     - In text classification, the word space can be represented by a word tree. Each mass (word) travels on the word tree to a nearby (semantically similar) word.
     - We can "tree-slice" high-dimensional spaces instead of 1D-slicing them, which captures richer structures.
     - Image source: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/
     - [Figure: word tree, with the annotation "a → b"]
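To make "mass travels on a tree" concrete, the sketch below computes balanced OT on a tree, which has a well-known closed form: each edge contributes its length times the absolute mass imbalance of the subtree below it. This is the balanced case only, not the tree UOT algorithm of the talk. The function name `tree_ot` and the input encoding (a parent array with `parent[v] < v` and root 0) are assumptions made for illustration.

```python
def tree_ot(parent, weight, a, b):
    """Balanced OT on a tree via subtree mass differences (illustration only).

    parent[v]: parent of node v, with parent[0] == -1 for the root and
               parent[v] < v for every other node (assumed encoding).
    weight[v]: length of the edge between v and parent[v] (weight[0] is unused).
    a, b:      masses on the nodes, with equal totals.
    """
    n = len(parent)
    diff = [a[v] - b[v] for v in range(n)]  # becomes the net surplus of each subtree
    cost = 0.0
    # Visit children before parents; parent[v] < v makes reverse index order valid.
    for v in range(n - 1, 0, -1):
        cost += weight[v] * abs(diff[v])  # surplus of subtree(v) must cross edge (v, parent[v])
        diff[parent[v]] += diff[v]
    return cost


# A path 0 - 1 - 2 with unit edge lengths: one unit of mass moves from node 0 to node 2.
print(tree_ot([-1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]))
```

On a path this reduces to 1D OT, matching the slide's remark that a 1D space is a special case of a tree.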
  12. Experiments: our algorithm is empirically fast
     - We confirmed that our algorithm can compute tree UOT with one million masses within one second.
     - We also confirmed that tree-slicing high-dimensional spaces can approximate the original UOT problem.
  13. Conclusion: fast computation of tree UOT
     - Sliced OT is a fast alternative to OT.
     - UOT is a robust variant of OT.
     - 1D UOT is more difficult than 1D OT.
     - We proposed the first efficient algorithm for 1D UOT.
     - Our method can be extended to tree spaces.
     - Our method is empirically fast (1M masses in 1 sec).
