https://arxiv.org/abs/1808.00177
https://openai.com/blog/learning-dexterity/
This slide deck was prepared for an assignment at a great, visionary company that pursues the automation of everything. In the end, I could not meet its requirements, because the deck is missing an important point. I hope you can find it.
34. References
・Shadow Dexterous Hand™ – Now available for purchase!
https://www.shadowrobot.com/products/dexterous-hand/
・The shadow dexterous hand.
https://www.researchgate.net/figure/The-shadow-dexterous-hand_fig1_312082386
・Kinematics and Statics Analysis of Dexterous Hand
https://download.atlantis-press.com/article/25866110.pdf
・Sawyer Robot - Precision Using 7 Degrees of Freedom
https://www.youtube.com/watch?v=KBrR6tr_b_4
・What is rolling friction?
https://byjus.com/physics/rolling-friction/
・Damping
https://simple.wikipedia.org/wiki/Damping
35. References
・How to build a Recurrent Neural Network in TensorFlow (1/7)
https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767
・Domain Randomization for Sim2Real Transfer
https://lilianweng.github.io/lil-log/2019/05/05/domain-randomization.html
・PANDAN TREE: Thai textiles
https://www.pandantree.com/textile/thailand.html
・Building an LSTM from Scratch in PyTorch (LSTMs in Depth Part 1)
https://mlexplained.com/2019/02/15/building-an-lstm-from-scratch-in-pytorch-lstms-in-depth-part-1/
・PhaseSpace
http://www.phasespace.com/companyMain.html
・Proximal Policy Optimization Algorithms
https://arxiv.org/abs/1707.06347
36. References
・High-Dimensional Continuous Control Using Generalized Advantage Estimation
https://arxiv.org/abs/1506.02438
・A (Long) Peek into Reinforcement Learning
https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#value-estimation
・Reinforcement Learning: Eligibility Traces and TD(lambda)
https://amreis.github.io/ml/reinf-learn/2017/11/02/reinforcement-learning-eligibility-traces.html
・Bias-variance Tradeoff in Reinforcement Learning
https://www.endtoend.ai/blog/bias-variance-tradeoff-in-reinforcement-learning/
・Sign function
https://ja.wikipedia.org/wiki/%E7%AC%A6%E5%8F%B7%E9%96%A2%E6%95%B0
・Gakumon Zukan (Illustrated Guide to Academic Fields) - Kei-Net
https://www.keinet.ne.jp/gl/10/04/zukan_1004.pdf
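Equation 1
Applying a weighted average removes the need to commit to a single value of k.
As a weight decay, the hyperparameter λ ∈ [0,1] is multiplied in at every k-step, so the k-step estimate carries a weight proportional to λ^(k-1).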
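The slide's equation is not reproduced in these notes. Assuming it is the generalized advantage estimator from the cited GAE paper (arXiv:1506.02438), the λ-weighted average of the k-step estimators can be written as follows, where the symbols \hat{A}_t^{(k)} and \delta_t follow that paper's notation rather than anything shown here:

```latex
\hat{A}_t^{(k)} = \sum_{l=0}^{k-1} \gamma^{l}\,\delta_{t+l},
\qquad
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)

\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)}
  = (1-\lambda)\sum_{k=1}^{\infty} \lambda^{k-1}\,\hat{A}_t^{(k)}
  = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}
```

The factor (1-λ) normalizes the weights λ^(k-1) so that they sum to 1, which is what makes the expression a weighted average over k.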
Equation 2
λ=0: low variance, high bias
λ=1: high variance, low bias
→ λ and γ are tuned together to look for better generalization.
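The two extremes of λ can be checked numerically. The following is a minimal sketch, assuming the standard backward recursion for GAE over a finite trajectory; the function name `gae_advantages` and the default values of `gamma` and `lam` are illustrative choices, not values taken from the slides.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one finite trajectory.

    rewards: length-T sequence of r_t
    values:  length-(T+1) sequence of V(s_t), including a bootstrap
             value for the state after the last step
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy trajectory (hypothetical numbers): reward 1 per step, value estimates all 0.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
adv_lam0 = gae_advantages(rewards, values, gamma=1.0, lam=0.0)
adv_lam1 = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
```

With `lam=0.0` each advantage is just the one-step TD residual, so it leans entirely on the value estimate (low variance, high bias). With `lam=1.0` each advantage sums all remaining residuals, which is the discounted return minus the value baseline (high variance, low bias).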
Policy network: determines the actual action from the observation.
Value network: predicts the discounted sum of future rewards starting from a given state.
Unlike the policy, it is used only during training in simulation.
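The division of labor between the two networks can be sketched as below. This is only an illustration: the linear maps `W_policy` and `w_value`, the dimensions, and the function names are hypothetical stand-ins, not the architecture used in the original work.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 3  # hypothetical sizes

# Hypothetical linear stand-ins for the two networks.
W_policy = rng.normal(size=(ACT_DIM, OBS_DIM))
w_value = rng.normal(size=OBS_DIM)

def policy(obs):
    """Maps an observation to an action (used in training and deployment)."""
    return np.tanh(W_policy @ obs)

def value(obs):
    """Predicts the discounted sum of future rewards from this state."""
    return float(w_value @ obs)

def training_step(obs):
    # In simulation both networks are evaluated; the value estimate
    # feeds the advantage computation that drives the policy update.
    return policy(obs), value(obs)

def deployment_step(obs):
    # After training, only the policy is needed to act.
    return policy(obs)

obs = np.ones(OBS_DIM)
train_action, train_value = training_step(obs)
deploy_action = deployment_step(obs)
```

The deployed action is identical to the one produced during training for the same observation; the value estimate simply has no consumer once training ends.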