[Paper introduction] Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View (Makoto Takenaka)
・This slide deck was prepared by a third party to introduce the paper at the event below.
・Please report any errors or corrections to @functionalaho.
・Event information
https://lpixel.connpass.com/event/135045/
・Paper introduced
Yiping Lu et al. "Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View"
https://arxiv.org/abs/1906.02762
Imputation of Missing Values using Random Forest (Satoshi Kato)
An introduction to the missForest package:
"MissForest - non-parametric missing value imputation for mixed-type data" (D. J. Stekhoven and P. Bühlmann (2011), Bioinformatics 28 (1), 112-118)
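The missForest idea (iteratively refitting a random forest on the observed part of each column to predict its missing entries) can be sketched in Python. This is a minimal sketch using scikit-learn's IterativeImputer with a random-forest estimator as an analogue; it is not the R missForest package itself, and the toy data below is invented for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Toy data: three columns, the third correlated with the first so
# imputation has signal to exploit.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)

# Knock out ~10% of entries at random.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# missForest-style loop: each column with missing values is modeled
# from the others by a random forest, repeated until convergence
# (here capped at 5 rounds).
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
```

After fitting, `X_imputed` has the same shape as the input with every NaN replaced by a forest prediction.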
1) The document discusses recent advances in deep reinforcement learning algorithms for continuous control tasks. It examines factors like network architecture, reward scaling, random seeds, environments and codebases that impact reproducibility of deep RL results.
2) It analyzes the performance of algorithms such as ACKTR, PPO, DDPG, and TRPO on benchmarks such as Hopper and HalfCheetah, and identifies unstable behaviors and unfair comparisons.
3) Simpler approaches like nearest neighbor policies are explored as alternatives to deep networks for solving continuous control tasks, especially in sparse reward settings.
[DL reading group] The Cramer Distance as a Solution to Biased Wasserstein Gradients (Deep Learning JP)
This document discusses several mathematical concepts related to probability and statistics:
- It defines the Kullback-Leibler divergence KL(P||Q) as a measure of how one probability distribution P differs from a second distribution Q.
- It presents an equation for Qθ(y|x) as a Gaussian distribution with mean fθ(x) and variance 1/2, commonly used in probabilistic modeling.
- It defines the Wasserstein distance Wp(μ,ν) as a way to measure the distance between two probability distributions based on the minimum cost of transporting mass between them.
- It provides equations for the probability density function and cumulative distribution function of a Dirac delta function and a
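The two divergences summarized above can be computed numerically. This is a small illustration, assuming SciPy: `scipy.stats.entropy(p, q)` gives the discrete KL divergence KL(P||Q), and `scipy.stats.wasserstein_distance` gives the 1-Wasserstein distance between empirical samples. The distributions below are toy examples, not taken from the paper.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

# Discrete KL divergence: KL(P||Q) = sum_i p_i * log(p_i / q_i).
# Note it is asymmetric: entropy(p, q) != entropy(q, p) in general.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.15, 0.05])
kl_pq = entropy(p, q)

# 1-Wasserstein distance between two empirical sample sets.
# For equal-variance Gaussians it approaches the distance between
# the means, here 2.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=1000)
b = rng.normal(loc=2.0, scale=1.0, size=1000)
w1 = wasserstein_distance(a, b)
```

The contrast the paper exploits is visible even here: KL blows up when Q puts near-zero mass where P does not, while the Wasserstein distance stays finite and reflects how far mass must be moved.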