References

1. Mandelbrot, B. B. The variation of certain speculative prices.
2. Edgar E. Peters. Fractal market analysis: applying chaos theory to investment and economics.
3. Guangxi Cao, Jie Cao, Longbing Xu. Asymmetric multifractal scaling behavior in the Chinese stock market: Based on
asymmetric MF-DFA.
4. Walid Mensi, Atef Hamdi, Syed Jawad Hussain Shahzad, Muhammad Shafiullah, Khamis Hamed Al-Yahyaee. Modeling cross-
correlations and efficiency of Islamic and conventional banks from Saudi Arabia: Evidence from MF-DFA and MF-DXA
approaches.
5. Minhyuk Lee, Jae Wook Song, Sondo Kim and Woojin Chang. Asymmetric market efficiency using the index-based
asymmetric-MFDFA.
6. Neyshabur, B., Tomioka, R. and Srebro, N. Norm-based capacity control in neural networks.
7. Bartlett, P. L., Foster, D. J. and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks.
8. Golowich, N., Rakhlin, A. and Shamir, O. Size-independent sample complexity of neural networks.
9. Hardt, M., Recht, B. and Singer, Y. Train faster, generalize better: Stability of stochastic gradient descent.
10. Keskar, N. S., Nocedal, J., Tang, P. T. P., Mudigere, D. and Smelyanskiy, M. On large-batch training for deep learning:
Generalization gap and sharp minima.
11. Belkin, M., Hsu, D., Ma, S. and Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance
trade-off.
12. Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B. and Sutskever, I. Deep double descent: Where bigger models and
more data hurt.
13. Hastie, T., Montanari, A., Rosset, S. and Tibshirani, R. J. Surprises in high-dimensional ridgeless least squares interpolation.
14. Saxe, A. M., McClelland, J. L., and Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural
networks.
15. Pennington, J., Schoenholz, S., and Ganguli, S. Resurrecting the sigmoid in deep learning through dynamical isometry: theory
and practice.
16. Pennington, J., Schoenholz, S. S., and Ganguli, S. The emergence of spectral universality in deep networks.
17. Xiao, L., Bahri, Y., Sohl-Dickstein, J., Schoenholz, S. S., and Pennington, J. Dynamical Isometry and a Mean Field Theory of
CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks.
18. Anthony, M. and Bartlett, P. L. Neural Network Learning: Theoretical Foundations.
19. Tan, M. and Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks.
20. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. Visualizing the Loss Landscape of Neural Nets.
21. Wojciech Tarnowski, Piotr Warchoł, Stanisław Jastrzębski, Jacek Tabor, Maciej A. Nowak. Dynamical Isometry is Achieved in
Residual Networks in a Universal Way for any Activation Function.
22. J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation.
23. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition.
24. C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going
Deeper with Convolutions.
25. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia
Polosukhin. Attention is all you need.
26. Ilya Sutskever, Oriol Vinyals, Quoc V. Le. Sequence to Sequence Learning with Neural Networks.
27. Li Deng, Geoffrey Hinton, and Brian Kingsbury. New types of deep neural network learning for speech recognition and related
applications.
28. A. Mohamed, G.E. Dahl, and G. Hinton. Acoustic Modeling Using Deep Belief Networks.
29. Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks.
30. Vaishnavh Nagarajan and J Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning.
31. Gintare Karolina Dziugaite and Daniel M Roy. Computing nonvacuous generalization bounds for deep (stochastic) neural
networks with many more parameters than training data.
32. Kenji Kawaguchi, Leslie Pack Kaelbling, and Yoshua Bengio. Generalization in deep learning.
33. Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro. In search of the real inductive bias: On the role of implicit
regularization in deep learning.
34. Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, and James Stokes. Fisher-rao metric, geometry, and complexity of
neural networks.
35. Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A PAC-Bayesian Approach to Spectrally-Normalized Margin
Bounds for Neural Networks.
36. Sanjeev Arora, Rong Ge, Behnam Neyshabur, and Yi Zhang. Stronger generalization bounds for deep nets via a compression
approach.
37. Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P Adams, and Peter Orbanz. Non-vacuous generalization bounds at the
ImageNet scale: a PAC-Bayesian compression approach.
38. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires
rethinking generalization.
39. Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks.
40. Simonyan, K. and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition.
41. He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition.
42. Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger. Densely Connected Convolutional Networks.
43. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav
Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child,
Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray,
Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020.
Language Models are Few-Shot Learners.
44. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and
Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach.
45. Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio. On the Number of Linear Regions of Deep Neural
Networks.
46. Huan Xiong, Lei Huang, Mengyang Yu, Li Liu, Fan Zhu, Ling Shao. On the Number of Linear Regions of Convolutional Neural
Networks.
47. Xiao Zhang & Dongrui Wu. Empirical Studies on the Properties of Linear Regions in Deep Neural Networks.
48. Razvan Pascanu, Tomas Mikolov, Yoshua Bengio. On the difficulty of training Recurrent Neural Networks.
49. Yoshua Bengio, Patrice Simard, Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult.
50. Sepp Hochreiter. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions.
51. Y. Bengio. Learning Deep Architectures for AI.
52. X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks.
53. D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol, and P. Vincent. Why Does Unsupervised Pre-training Help Deep Learning?
54. Y.N. Dauphin and Y. Bengio. Big Neural Networks Waste Capacity.
55. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires
rethinking generalization.
56. P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A.
Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C.
Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep
learning, and graph networks.
57. Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. The Implicit Bias of Gradient
Descent on Separable Data.
58. Vatsal Shah, Anastasios Kyrillidis, and Sujay Sanghavi. Minimum norm solutions do not always generalize well for over-
parameterized problems.
59. Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit Regularization in Deep Matrix Factorization.
60. Depen Morwani & Harish G. Ramaswamy. Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth
Homogeneous Neural Nets.
61. Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu. A Convergence Analysis of Gradient Descent for Deep Linear Neural
Networks.
62. Avrim Blum and Ronald L Rivest. Training a 3-Node Neural Network is NP-Complete.
63. Murty, K. G. and Kabadi, S. N. Some NP-complete problems in quadratic and nonlinear programming.
64. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires
rethinking generalization.
65. Grzegorz Swirszcz, Wojciech Marian Czarnecki, and Razvan Pascanu. Local minima in training of deep networks.
66. Mor Shpigel Nacson, Jason Lee, Suriya Gunasekar, Pedro Henrique Pamplona Savarese, Nathan Srebro, and Daniel Soudry.
Convergence of Gradient Descent on Separable Data.
67. Mor Shpigel Nacson, Nathan Srebro, and Daniel Soudry. Stochastic Gradient Descent on Separable Data: Exact Convergence
with a Fixed Learning Rate.
68. Suriya Gunasekar, Jason Lee, Daniel Soudry, and Nathan Srebro. Characterizing Implicit Bias in Terms of Optimization
Geometry.
69. Ziwei Ji and Matus Telgarsky. A refined primal-dual analysis of the implicit bias.
70. Ziwei Ji and Matus Telgarsky. The implicit bias of gradient descent on nonseparable data.
71. Ziwei Ji and Matus Telgarsky. Gradient descent aligns the layers of deep linear networks.
72. Suriya Gunasekar, Jason D Lee, Daniel Soudry, and Nati Srebro. Implicit Bias of Gradient Descent on Linear Convolutional
Networks.
73. Mor Shpigel Nacson, Suriya Gunasekar, Jason Lee, Nathan Srebro, and Daniel Soudry. Lexicographic and Depth-Sensitive
Margins in Homogeneous and Non-Homogeneous Deep Models.
74. Lénaïc Chizat, Francis Bach. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic
Loss.
75. Kaifeng Lyu and Jian Li. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.
76. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD:
Single Shot MultiBox Detector.
77. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only Look Once: Unified, Real-time Object Detection.
78. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-time Object Detection with Region
Proposal Networks.
79. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN.
80. Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic Segmentation.
81. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation.
82. Jonathan L Long, Ning Zhang, and Trevor Darrell. Do Convnets Learn Correspondence?
83. Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, and Alexander C Berg. MatchNet: Unifying Feature and Metric
Learning for Patch-Based Matching.
84. Sergey Zagoruyko and Nikos Komodakis. Learning to Compare Image Patches via Convolutional Neural Networks.
85. Joao Carreira and Andrew Zisserman. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset.
86. Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and
ImageNet?
87. Karen Simonyan and Andrew Zisserman. Two-Stream Convolutional Networks for Action Recognition in Videos.
88. Leon A Gatys, Alexander S Ecker, and Matthias Bethge. Image Style Transfer Using Convolutional Neural Networks.
88. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua
Bengio. Generative Adversarial Networks.
89. Durk P Kingma and Prafulla Dhariwal. Glow: Generative Flow with Invertible 1x1 Convolutions.
90. Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, Dong Yu. Convolutional Neural Networks
for Speech Recognition.
91. Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series.
92. Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew
Senior, and Koray Kavukcuoglu. WaveNet: A Generative Model for Raw Audio.
93. Keunwoo Choi, György Fazekas, Mark Sandler, and Kyunghyun Cho. Convolutional recurrent neural networks for music
classification.
94. Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal,
Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson. CNN Architectures for Large-Scale
Audio Classification.
95. Stanley J Reeves. Fast image restoration without boundary artifacts.
96. Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the Properties of Neural Machine
Translation: Encoder-Decoder Approaches.
97. Cicero Dos Santos and Maira Gatti. Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts.
98. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation.
99. Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral Networks and Locally Connected Networks on
Graphs.
100. David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P
Adams. Convolutional Networks on Graphs for Learning Molecular Fingerprints.
101. Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, Max Welling. Modeling Relational Data
with Graph Convolutional Networks.
102. Anadi Chaman, Ivan Dokmanić. Truly shift-invariant convolutional neural networks.
103. David A. McAllester. Some PAC-Bayesian Theorems.
104. David A. McAllester. PAC-Bayesian Model Averaging.
105. Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nati Srebro. Exploring Generalization in Deep Learning.
106. Behnam Neyshabur, Srinadh Bhojanapalli, and Nathan Srebro. A PAC-Bayesian Approach to Spectrally-Normalized Margin
Bounds for Neural Networks.
107. Jiang, Y., Neyshabur, B., Krishnan, D., Mobahi, H., and Bengio, S. Fantastic Generalization Measures and Where to Find
Them.
108. Guillermo Valle-Pérez, Ard A. Louis. Generalization bounds for deep learning.
109. Tsung-Han Hsieh, Li Su, Yi-Hsuan Yang. A Streamlined Encoder/Decoder Architecture for Melody Extraction.
109. Ta-Wei Tang, Wei-Han Kuo, Jauh-Hsiang Lan, Chien-Fang Ding, Hakiem Hsu, Hong-Tsu Young. Anomaly Detection Neural
Network with Dual Auto-Encoders GAN and Its Industrial Inspection Applications.
110. Yan Hong, Li Niu, Jianfu Zhang, Weijie Zhao, Chen Fu, Liqing Zhang. F2GAN: Fusing-and-Filling GAN for Few-shot Image
Generation.
111. Vasiliy Kuzmin, Fyodor Kravchenko, Artem Sokolov, Jie Geng. Real-time Streaming Wave-U-Net with Temporal Convolutions
for Multichannel Speech Enhancement.
112. Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas. Attention is not all you need: pure attention loses rank doubly
exponentially with depth.
113. Vaishnavh Nagarajan and J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning.
114. Ohad Shamir. Are ResNets Provably Better than Linear Predictors?