
NeurIPS2018 Reading Club@PFN https://connpass.com/event/115476/


- 1. Minimax Statistical Learning with Wasserstein Distances, by Jaeho Lee and Maxim Raginsky. January 26, 2019. Presenter: Kenta Oono @ NeurIPS 2018 Reading Club
- 2. Kenta Oono (@delta2323) Profile • 2011.3: M.Sc. (Mathematics) • 2011.4-2014.10: Preferred Infrastructure (PFI) • 2014.10-current: Preferred Networks (PFN) • 2018.4-current: Ph.D. student @ U.Tokyo Interests • Mathematics • Bioinformatics • Theory of Deep Learning 2/18
- 3. Summary What this paper does • Develops a distributionally robust risk minimization problem. • Derives the excess-risk rate O(n^{-1/2}), the same as in the non-robust case. • Application to domain adaptation. Why I chose this paper • Spotlight talk. • Wanted to learn statistical learning theory, especially minimax optimality of DL (but this paper turned out not to be about that). • Wanted to learn about the Wasserstein distance. 3/18
- 4. Problem Setting (Expected Risk) Given • Z: sample space • P: (unknown) distribution over Z • Dataset: D = (z_1, ..., z_n) ~ P i.i.d. For a hypothesis f : Z → R, we evaluate its expected risk • Expected risk: R(P, f) = E_{Z~P}[f(Z)] • Hypothesis space: F ⊂ {Z → R} 4/18
- 5. Problem Setting (Estimator) Goal: • Devise an algorithm A : D → f̂ = f̂(D) • We treat D as a random variable, so f̂ is one too. • If A is a randomized algorithm (e.g. SGD), the randomness of f̂(D) comes from A as well. • Evaluate the excess risk: R(P, f̂) − inf_{f∈F} R(P, f) Typical forms of theorems: • E_{A,D}[R(P, f̂) − inf_{f∈F} R(P, f)] = O(g(n)) • R(P, f̂) − inf_{f∈F} R(P, f) = O(g(n, δ)) with probability 1 − δ with respect to the choice of D (and A) 5/18
- 6. Problem Setting (ERM Estimator) Since we cannot compute the expected risk R, we compute the empirical risk instead: R̂_D(f) = (1/n) Σ_{i=1}^n f(z_i) = R(P_n, f) (P_n: empirical distribution). The ERM (Empirical Risk Minimization) estimator for hypothesis space F is f̂ = f̂(D) ∈ argmin_{f∈F} R(P_n, f) 6/18
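The ERM recipe on slide 6 can be sketched in a few lines. This is a toy illustration, not the paper's setting: the hypothesis class here is the constant predictors {f_c : c ∈ R} under squared loss, for which the argmin of the empirical risk has a closed form (the sample mean). All names are illustrative.

```python
# Toy ERM sketch: hypothesis class F = {constant predictors c},
# per-sample loss f_c(z) = (c - z)^2.

def empirical_risk(c, data):
    """R_hat_D(f_c) = (1/n) * sum_i (c - z_i)^2."""
    return sum((c - z) ** 2 for z in data) / len(data)

def erm_estimator(data):
    """argmin_c R_hat_D(f_c); for squared loss this is the sample mean."""
    return sum(data) / len(data)

D = [1.0, 2.0, 3.0, 6.0]
c_hat = erm_estimator(D)  # sample mean = 3.0

# The ERM solution attains the minimum of the empirical risk:
assert all(empirical_risk(c_hat, D) <= empirical_risk(c, D)
           for c in [0.0, 1.0, 2.5, 5.0])
```

In richer classes (e.g. neural networks) the argmin has no closed form and is only approximated, e.g. by SGD, which is exactly why slide 5 allows A to be a randomized algorithm.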
- 8. Assumptions Ref. Lee and Raginsky (2018) 8/18
- 9. Example Supervised learning • Z = (X, Y), X = R^D: input space, Y = R: label space • ℓ : Y × Y → R: loss function • H ⊂ {X → Y}: set of models • F = {f_h(x, y) = ℓ(h(x), y) | h ∈ H} Regression • X = R^D, Y = R, ℓ(y, y′) = (y − y′)^2 • H = (functions realized by neural networks with a fixed architecture) 9/18
- 10. Classical Result Typically, we have R(P, f̂) − inf_{f∈F} R(P, f) = O_P((complexity of F)/√n) where "complexity of F" is a model complexity measure (intuitively, how "large" F is) 10/18
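The shrinking excess risk on slide 10 can be seen numerically in the constant-predictor toy problem from slide 6. A hedged caveat: for this strongly convex toy problem the averaged excess risk actually decays at the fast O(1/n) rate, even faster than the generic O((complexity)/√n) bound on the slide; the sketch only illustrates that the excess risk vanishes as n grows.

```python
import random

# Monte Carlo sketch: excess risk of ERM for mean estimation under squared
# loss.  Here R(P, f_c) - inf_c R(P, f_c) = (c - mu)^2, and ERM returns the
# sample mean, so the averaged excess risk is roughly sigma^2 / n.

random.seed(0)

def avg_excess_risk(n, trials=100, mu=0.0, sigma=1.0):
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        c_hat = sum(sample) / n      # ERM = sample mean
        total += (c_hat - mu) ** 2   # excess risk of this draw
    return total / trials

# More data => smaller excess risk (on average over repetitions):
assert avg_excess_risk(3200) < avg_excess_risk(100)
```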
- 11. Covering number Definition (Covering Number) For F ⊂ F_0 := {f : [−1, 1]^D → R} and ε > 0, the (external) covering number of F is N(F, ε) := inf{ N ∈ N | ∃ f_1, ..., f_N ∈ F_0 s.t. ∀ f ∈ F, ∃ n ∈ [N] s.t. ‖f − f_n‖_∞ ≤ ε }. • Intuition: the minimum number of balls (of radius ε) needed to cover the space F. • Entropy integral: C(F) := ∫_0^∞ √(log N(F, u)) du. 11/18
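For intuition, the covering number of a *finite* function class can be upper-bounded by a greedy cover. In this illustrative sketch (not from the paper), functions are represented by their values on a fixed grid, ‖·‖_∞ is the max over grid points, and the greedy count is an upper bound on N(F, ε), not the exact infimum.

```python
# Greedy upper bound on the covering number N(F, eps) of a finite class.
# Each "function" is a tuple of its values on a fixed grid of points.

def sup_dist(f, g):
    """Sup-norm distance between two functions sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(f, g))

def greedy_cover_size(F, eps):
    """Number of centers picked greedily; an upper bound on N(F, eps)."""
    centers = []
    for f in F:
        if all(sup_dist(f, c) > eps for c in centers):
            centers.append(f)  # f is not yet covered; make it a center
    return len(centers)

# Four "functions" on a 3-point grid, forming two tight clusters:
F = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.95, 1.0)]
print(greedy_cover_size(F, eps=0.1))  # 2: each cluster fits in one ball
```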
- 12. Distributionally Robust Framework Minimize the worst-case risk over distributions close to the true distribution P: minimize R(P, f) ↓ minimize R_{ρ,p}(P, f) := sup_{Q∈A_{ρ,p}(P)} R(Q, f) We consider the p-Wasserstein distance: A_{ρ,p}(P) = {Q | W_p(P, Q) ≤ ρ} Applications • Adversarial attack: ρ = noise level • Domain adaptation: ρ = discrepancy level of train/test distributions 12/18
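The ambiguity set A_{ρ,p}(P) on slide 12 can be made concrete for p = 1 in one dimension. This sketch uses the standard 1-D identity that W_1 between two empirical distributions with the same number of atoms is the mean absolute difference of the sorted samples; the function names are illustrative, not the paper's.

```python
# Sketch of the Wasserstein ambiguity set A_{rho,1}(P) on the real line.

def w1_empirical(xs, ys):
    """W_1 between two equal-size empirical distributions on R:
    mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def in_ambiguity_set(q_sample, p_sample, rho):
    """Q in A_{rho,1}(P)  <=>  W_1(P, Q) <= rho."""
    return w1_empirical(p_sample, q_sample) <= rho

P = [0.0, 1.0, 2.0]
Q = [0.1, 1.1, 2.1]  # every atom shifted by 0.1, so W_1(P, Q) ≈ 0.1
print(w1_empirical(P, Q))               # ≈ 0.1
print(in_ambiguity_set(Q, P, rho=0.2))  # True: Q is within the rho-ball
```

This matches the adversarial-attack reading on the slide: shifting every sample by δ moves the empirical distribution by exactly W_1 = δ, so ρ caps the allowed perturbation budget.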
- 13. Estimator Correspondingly, we change the estimator to f̂ ∈ argmin_{f∈F} R_{ρ,p}(P_n, f) We want to evaluate R_{ρ,p}(P, f̂) − inf_{f∈F} R_{ρ,p}(P, f) 13/18
- 14. Main Theorems Same excess-risk rate as the non-robust setting. Ref. Lee and Raginsky (2018) 14/18
- 15. Strategy From the authors' slides. Ref: https://nips.cc/media/Slides/nips/2018/517cd(05-09-45)-05-10-20-12649-Minimax_Statist.pdf 15/18
- 16. Key Lemmas Ref. Lee and Raginsky (2018) 16/18
- 17. Why are these lemmas important? (Complexity of Ψ_{Λ,F}) ≈ (Complexity of F) × (Complexity of Λ) 17/18
- 18. Impression • The duality form of the risk (R_ρ(P, f) = inf_{λ≥0} E[ψ_{λ,f}(Z)]) may be useful on its own. • Mysterious Assumption 4 (an incredibly local property of F). • Is there a special structure to the p = 1 Wasserstein distance? 18/18