[NeurIPS 2019 Paper Reading Group] A Meta-Analysis of Overfitting in Machine Learning

A Meta-Analysis of Overfitting in Machine Learning
Masanari Kimura (mkimura@ridge-i.com)

© 2020 Ridge-i All Rights Reserved.
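Abstract
• Paper accepted at NeurIPS 2019 [1]
• A meta-analysis of overfitting caused by repeated reuse of test data
• The analysis covers Kaggle competitions held over the past several years
Paper URL: http://papers.nips.cc/paper/9117-a-meta-analysis-of-overfitting-in-machine-learning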

Holdout
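Holdout: setting aside evaluation data from the training data in advance, before training a machine learning model
• This scheme is used in almost every evaluation setting in the machine learning community:
  • Competitions
  • Benchmark experiments
  • Hyperparameter search
Overfitting to the holdout set, caused by experimenters reusing the same holdout data many times, has been raised as a problem [2, 3]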
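The effect of holdout reuse can be illustrated with a small simulation. The sketch below is not taken from the paper or the slides. It assumes hypothetical "models" that guess binary labels at random (so their true accuracy is 0.5), selects the one that scores best on a reused holdout set, and then scores that same model on fresh data. The names `accuracy` and `simulate_holdout_reuse` and all parameter values are illustrative assumptions.

```python
import random


def accuracy(predictions, labels):
    """Fraction of positions where the prediction equals the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def simulate_holdout_reuse(n_examples=200, n_models=1000, seed=0):
    """Pick the best of many random guessers on a reused holdout set.

    Returns (holdout accuracy of the selected model,
             accuracy of that same model on fresh data).
    """
    rng = random.Random(seed)
    # Binary labels for the reused holdout set and for a fresh, unseen set.
    holdout_labels = [rng.randint(0, 1) for _ in range(n_examples)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_examples)]

    best_holdout_acc = -1.0
    best_fresh_acc = None
    for _ in range(n_models):
        # Hypothetical "model": random guesses, true accuracy 0.5.
        holdout_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        holdout_acc = accuracy(holdout_preds, holdout_labels)
        # Model selection looks only at the reused holdout score.
        if holdout_acc > best_holdout_acc:
            best_holdout_acc = holdout_acc
            best_fresh_acc = accuracy(fresh_preds, fresh_labels)
    return best_holdout_acc, best_fresh_acc


best_holdout_acc, best_fresh_acc = simulate_holdout_reuse()
print(f"selected model, holdout accuracy: {best_holdout_acc:.3f}")
print(f"selected model, fresh accuracy:   {best_fresh_acc:.3f}")
```

Because the winner is selected on the holdout set, its holdout accuracy is biased upward, while its accuracy on fresh data is expected to stay near chance.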

Related work: Do ImageNet Classifiers Generalize to ImageNet?
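Related work focusing on CIFAR-10 and ImageNet, two heavily used benchmark datasets [4]
• Warns that holdout reuse may undermine the reliability of benchmark results reported in existing work
• When new test data from the same domain was actually collected, the results of all existing models got worse
Paper URL: https://arxiv.org/abs/1902.10811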

Motivation
• The results above covered only CIFAR-10 and ImageNet
• The question is whether overfitting to the holdout set is also observed in machine learning tasks in general

Kaggle: The platform of machine learning competitions
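Running an experiment like the one in the related work requires building new test data
• This needs human labor and is very time-consuming
Kaggle competitions were chosen as the object of study instead
• A rich variety of data sources
• Participants apply a very diverse range of methods
• The test data is queried many times during the competition period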

Kaggle Ranking System
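In Kaggle competitions, the test data is split into a public part and a private part
• The test data is released to participants without revealing which examples belong to which part
• During the competition, only the evaluation on the public test data is shown
• At the end of the competition, the evaluation on the full test data is released and the final ranking is determined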
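The public/private mechanism described above can be sketched as follows. This is a minimal illustration, not Kaggle's actual implementation: the helper names `split_public_private` and `score`, the 30% public fraction, and the toy labels are all assumptions, and accuracy is used as the metric.

```python
import random


def split_public_private(n_test, public_fraction, seed=0):
    """Randomly partition test indices into (public, private).

    The partition is kept hidden from participants, who only see
    the unlabeled test examples.
    """
    indices = list(range(n_test))
    random.Random(seed).shuffle(indices)
    n_public = round(n_test * public_fraction)
    return sorted(indices[:n_public]), sorted(indices[n_public:])


def score(predictions, labels, indices):
    """Accuracy of a submission restricted to the given test indices."""
    correct = sum(predictions[i] == labels[i] for i in indices)
    return correct / len(indices)


# Toy test set with hypothetical labels and one hypothetical submission.
labels = [i % 2 for i in range(10)]
predictions = [0] * 10

public_idx, private_idx = split_public_private(len(labels), public_fraction=0.3)

# Shown on the leaderboard during the competition.
public_score = score(predictions, labels, public_idx)
# Withheld until the competition ends.
private_score = score(predictions, labels, private_idx)
print(public_score, private_score)
```

Since participants never receive feedback from the private part during the competition, any adaptation to leaderboard feedback can only target the public part.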

MetaKaggle Dataset
• Metadata about competitions, published by Kaggle
• The analysis uses the submission-related information in this dataset
https://www.kaggle.com/kaggle/meta-kaggle

Adaptive Overfitting
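Hypothesis
• Participants cannot access the private test data during the competition, so if overfitting occurs, it should show up on the public test data
Definition
• This analysis uses the difference between the score on the public test data and the score on the private test data as the degree of overfitting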
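As a minimal sketch of this definition, the gap can be computed per submission and then averaged. The function name `overfit_gap` and the score pairs below are hypothetical, and the sign convention assumes a metric such as accuracy where higher is better.

```python
def overfit_gap(public_score, private_score):
    """Public score minus private score.

    For a higher-is-better metric such as accuracy, a positive gap
    means the submission looks better on public than on private data.
    """
    return public_score - private_score


# Hypothetical (public, private) accuracy pairs for three submissions.
submissions = [(0.90, 0.89), (0.85, 0.86), (0.95, 0.90)]

gaps = [overfit_gap(pub, priv) for pub, priv in submissions]
mean_gap = sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap:.4f}")
```

A gap that stays close to zero across many submissions suggests little adaptive overfitting, while a consistently positive gap points toward it.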

Examples of Competitions
ID    Name                                        # Submissions  n_public  n_private
5275  Can we predict voting outcomes?             35,247         249,344   249,343
3788  Allstate Purchase Prediction Challenge      24,532         59,657    139,199
7634  TensorFlow Speech Recognition Challenge     24,263         3,171     155,365
7115  Cdiscount’s Image Classification Challenge  5,859          530,455   1,237,727
Examples of competitions with many submissions among those analyzed. The evaluation metric is accuracy in all cases.

Private versus Public Accuracy
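Private vs. public score, submissions from all participants
Private vs. public score, top 10% of submissions
• Scatter plots with the public score on the x-axis and the private score on the y-axis
• If there is no overfitting, the points should lie along the line 𝑦 = 𝑥
• Overall, the results look largely healthy
• Looking only at the top 10%, signs of overfitting are observed in some competitions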
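Even without overfitting, points will not fall exactly on 𝑦 = 𝑥, because both scores are estimated from finite samples. One rough way to judge a deviation is to compare the gap with its standard error under a binomial model. The sketch below is an assumption for illustration and not necessarily the test used in the paper: `gap_z_score` is a hypothetical helper, the accuracies 0.91 and 0.90 are made up, and only the sample sizes 3,171 and 155,365 come from the TensorFlow Speech Recognition Challenge row of the table.

```python
import math


def gap_z_score(public_acc, private_acc, n_public, n_private):
    """Public-minus-private accuracy gap in units of its standard error.

    Treats the two accuracies as independent binomial proportions,
    which is a simplifying assumption.
    """
    variance = (public_acc * (1.0 - public_acc) / n_public
                + private_acc * (1.0 - private_acc) / n_private)
    return (public_acc - private_acc) / math.sqrt(variance)


# Hypothetical accuracies; sample sizes taken from the table above.
z = gap_z_score(public_acc=0.91, private_acc=0.90,
                n_public=3171, n_private=155365)
print(f"z = {z:.2f}")
```

With a small public set, a one-point accuracy gap stays within about two standard errors, so such a point sitting slightly below the diagonal is weak evidence of overfitting on its own.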

Conclusion and Discussion
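• An examination of 120 Kaggle competitions found almost no adaptive overfitting
• This result casts doubt on the claim that reusing test data undermines the reliability of machine learning models
• At the very least, the way holdout data is handled in Kaggle's competition setup appears to be appropriate
• On the other hand, score gaps caused by distribution shift are widely reported in related work [5,6,7,8]
• In terms of current importance for the machine learning community: holdout overfitting < distribution shift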

References
• [1] R. Roelofs et al. A Meta-Analysis of Overfitting in Machine Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
• [2] C. Dwork et al. Preserving statistical validity in adaptive data analysis. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, 2015.
• [3] C. Robert. Machine learning, a probabilistic perspective. 2014, pp. 62-63.
• [4] B. Recht et al. Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811, 2019.
• [5] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry. Exploring the landscape of spatial robustness. In International Conference on Machine Learning (ICML), 2019.
• [6] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence. Dataset Shift in Machine Learning. The MIT Press, 2009.
• [7] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning (ICML), 2019.
• [8] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
