These slides were used by Umemoto, a member of our company, at an in-house technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that began at the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to deliver advanced AI systems as solutions across a wide range of fields. We believe it is our job to work out how AI can best be used to improve work efficiency and to produce results that are useful to people and society.
17. Research Background
• Papers that analyze DNNs via statistics and learning theory:
• Suzuki, T. (2018). Fast learning rate of deep learning via a kernel perspective. JMLR W&CP (AISTATS).
• Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with ReLU activation function. arXiv.
• Neyshabur, B., Tomioka, R., & Srebro, N. (2015). Norm-based capacity control in neural networks. JMLR W&CP (COLT).
• Sun, S., Chen, W., Wang, L., & Liu, T. Y. (2015). Large margin deep neural networks: theory and algorithms. arXiv.
• In these works, non-smooth structures are not the main concern.
19. Formulating Piecewise Smooth Functions
• Flow of the formulation (how the two ingredients combine is sketched just below this list):
• 1. Smooth functions on [0,1]^D
• 2. Pieces contained in [0,1]^D
• 1. Smooth functions on [0,1]^D
• Preparation: the Hölder norm
• Definition: the Hölder space
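The slide lists only the two ingredients. As a hedged sketch (the combination rule below is an assumption based on the standard definition of piecewise smoothness, not stated on this slide), smooth functions $f_m$ are patched together over pieces $R_m \subset [0,1]^D$ via indicator functions:
\[
  f(x) \;=\; \sum_{m=1}^{M} f_m(x)\,\mathbf{1}\{x \in R_m\},
  \qquad x \in [0,1]^D,
\]
so that $f$ agrees with a smooth function on each piece but may jump across piece boundaries.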
\[
  G[\theta_\ell](x) = x^{(\ell)},
\]
where $x^{(\ell)}$ is defined inductively as
\[
  x^{(0)} := x, \qquad
  x^{(\ell')} := \eta\bigl(A_{\ell'} x^{(\ell'-1)} + b_{\ell'}\bigr)
  \quad \text{for } \ell' = 1, \ldots, \ell,
\]
where $\eta$ is an element-wise ReLU function, i.e., $\eta(x) = (\max\{0, x_1\}, \ldots, \max\{0, x_D\})$.
Here, $c(\theta)$ denotes the number of non-zero parameters in $\theta$.
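This network definition is easy to state in code. Below is a minimal NumPy sketch of the forward computation $G[\theta_\ell](x)$ and of $c(\theta)$; the helper names (`forward`, `num_nonzero_params`) and the toy layer shapes are ours, not from the paper.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: eta(x) = (max{0, x_1}, ..., max{0, x_D})
    return np.maximum(0.0, x)

def forward(theta, x):
    """Compute G[theta_l](x) = x^(l) by the inductive rule
    x^(0) := x,  x^(l') := eta(A_{l'} x^(l'-1) + b_{l'})."""
    out = x
    for A, b in theta:  # theta = ((A_1, b_1), ..., (A_l, b_l))
        out = relu(A @ out + b)
    return out

def num_nonzero_params(theta):
    # c(theta): the number of non-zero parameters in theta
    return sum(int(np.count_nonzero(A)) + int(np.count_nonzero(b))
               for A, b in theta)

# Tiny usage example: input dimension D = 3, two layers.
rng = np.random.default_rng(0)
theta = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
         (rng.standard_normal((2, 4)), rng.standard_normal(2))]
x = rng.standard_normal(3)
print(forward(theta, x))          # x^(2), a length-2 vector
print(num_nonzero_params(theta))  # 12 + 4 + 8 + 2 = 30 (all entries non-zero here)
```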
1.2. Characterization of True Functions. We consider piecewise smooth functions to characterize $f^*$. To this end, we introduce formulations of several sets of functions.

Smooth Functions. Second, a set of smooth functions is introduced.
With $\alpha > 0$, let us define the Hölder norm
\[
  \|f\|_{\mathcal{H}^\alpha} :=
  \max_{|a| \le \lfloor \alpha \rfloor} \;\sup_{x \in [-1,1]^D} |\partial^a f(x)|
  \;+\; \max_{|a| = \lfloor \alpha \rfloor} \;\sup_{x, x' \in [-1,1]^D,\; x \ne x'}
    \frac{|\partial^a f(x) - \partial^a f(x')|}{|x - x'|^{\alpha - \lfloor \alpha \rfloor}},
\]
and also let $\mathcal{H}^\alpha([-1,1]^D)$ be the Hölder space such that
\[
  \mathcal{H}^\alpha = \mathcal{H}^\alpha([-1,1]^D)
  := \bigl\{ f : [-1,1]^D \to \mathbb{R} \;\bigm|\; \|f\|_{\mathcal{H}^\alpha} \le C_{\mathcal{H}} \bigr\},
\]
where $C_{\mathcal{H}}$ is some finite constant.
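As a quick sanity check (our example, not on the slides): for $0 < \alpha < 1$ we have $\lfloor \alpha \rfloor = 0$, so only the multi-index $a = 0$ contributes and the norm reduces to $\sup_x |f(x)| + \sup_{x \ne x'} |f(x) - f(x')| / |x - x'|^{\alpha}$. For instance, with $D = 1$ and $f(x) = \sqrt{|x|}$,
\[
  \|f\|_{\mathcal{H}^{1/2}}
  = \sup_{x \in [-1,1]} \sqrt{|x|}
  + \sup_{x \ne x'} \frac{\bigl|\sqrt{|x|} - \sqrt{|x'|}\bigr|}{|x - x'|^{1/2}}
  \le 1 + 1 = 2,
\]
using $|\sqrt{a} - \sqrt{b}| \le \sqrt{|a - b|}$; hence $f \in \mathcal{H}^{1/2}([-1,1])$ even though it is not Lipschitz at $0$.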
\[
  \mathcal{H}^\alpha = \mathcal{H}^\alpha([0,1]^D)
  = \bigl\{ f : [0,1]^D \to \mathbb{R} \;\bigm|\; \|f\|_{\mathcal{H}^\alpha} < \infty \bigr\}
\]
43. References
• Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, 1040-1053.
• Suzuki, T. (2018). Fast learning rate of deep learning via a kernel perspective. JMLR W&CP (AISTATS).
• Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with ReLU activation function. arXiv.
• Neyshabur, B., Tomioka, R., & Srebro, N. (2015). Norm-based capacity control in neural networks. JMLR W&CP (COLT).
• Sun, S., Chen, W., Wang, L., & Liu, T. Y. (2015). Large margin deep neural networks: theory and algorithms. arXiv.
• Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2015). The loss surfaces of multilayer networks. JMLR W&CP (AISTATS).
• Kawaguchi, K. (2016). Deep learning without poor local minima. In Advances in Neural Information Processing Systems.
• Yarotsky, D. (2017). Error bounds for approximations with deep ReLU networks. Neural Networks, 94, 103-114.
• Safran, I., & Shamir, O. (2017). Depth-width tradeoffs in approximating natural functions with neural networks. JMLR W&CP (ICML).
• Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. ICLR.
• Xu, A., & Raginsky, M. (2017). Information-theoretic analysis of generalization capability of learning algorithms. In Advances in Neural Information Processing Systems.