This document summarizes a research paper on scaling laws for neural language models. Some key findings of the paper include:
- Language model performance depends strongly on model scale and weakly on model shape. With enough compute and data, performance scales as a power law of parameters, compute, and data.
- Overfitting is universal, with penalties depending on the ratio of parameters to data.
- Large models are more sample-efficient, reaching the same performance with fewer optimization steps and fewer data points.
- The paper motivated subsequent work by OpenAI on applying scaling laws to other domains like computer vision and developing increasingly large language models like GPT-3.
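The power-law finding above can be sketched numerically. This is a minimal illustration of the form L(N) = (N_c / N)^α reported in the paper; the constants below are illustrative values of the rough magnitude Kaplan et al. report, not exact figures.

```python
# Sketch of the power-law scaling form: loss as a function of parameter
# count N, L(N) = (N_c / N) ** alpha_N.
# The constants are assumptions of the magnitude reported in the paper.

ALPHA_N = 0.076   # scaling exponent for parameter count (illustrative)
N_C = 8.8e13      # critical parameter count (illustrative)

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# A power law means doubling parameters multiplies the loss by a fixed
# ratio, 2 ** (-alpha_N) -- a straight line on a log-log plot.
ratio = loss_from_params(2e9) / loss_from_params(1e9)
print(round(ratio, 4))
```

The constant ratio under doubling is the signature of power-law scaling: each doubling of model size buys the same multiplicative improvement in loss, regardless of the starting size.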
These slides were used by our company's Umemoto at an internal technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
"Arithmer Seminar" is held weekly; professionals from inside and outside our company lecture on their respective areas of expertise.
These slides were prepared by an outside lecturer and are shared here with their permission.
Arithmer Inc. is a mathematics company founded out of the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to bring advanced AI systems to solutions across many fields. Our job is to work out how to use AI effectively to make work more efficient and to produce results that benefit people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems provides solutions to tough, complex problems. At Arithmer, we believe it is our job to realize the potential of AI by improving work efficiency and producing results that are useful to society.
[DL Reading Group] Unbiased Gradient Estimation for Marginal Log-likelihood (Deep Learning JP)
1. The document proposes methods for estimating the marginal log-likelihood of latent variable models in an unbiased manner.
2. It discusses using Monte Carlo methods like MCMC and importance sampling to estimate the intractable integral in the marginal log-likelihood. Multilevel Monte Carlo can provide an unbiased estimate with fewer samples than standard Monte Carlo.
3. Stochastically Unbiased Marginalization Objective (SUMO) is introduced to provide an unbiased estimate of the marginal log-likelihood using a single sample. This involves weighting the importance weighted bound with a geometric distribution.
4. RQ: Why do neural networks generalize well?
• Typical deep learning setting: number of samples <<<< number of parameters
• Yet generalization performance is excellent
• At the same time, it is easy to construct NNs that generalize poorly
• What separates NNs that generalize "well" from those that generalize "badly"?
“What is it then that distinguishes neural networks that
generalize well from those that don’t?”