
Deep parking


Internship report at Preferred Networks, Inc.


  1. Deep parking: an implementation of automatic parking with deep reinforcement learning. Shintaro Shiba, Feb. 2016 - Dec. 2016. Engineer internship at Preferred Networks. Mentors: Abe-san, Fujita-san
  2. About me. Shintaro Shiba • Graduate student at the University of Tokyo – Major in neuroscience and animal behavior • Part-time engineer (internship) at Preferred Networks, Inc. – Blog post: https://research.preferred.jp/2017/03/deep-parking/
  3. Contents • Original idea • Background: DQN and Double DQN • Task definition – Environment: car simulator – Agents: 1. Coordinate 2. Bird's-eye view 3. Subjective view • Discussion • Summary
  4. Achievement. [Figure: trajectory of the car agent, and the subjective camera views (0 deg, -120 deg, +120 deg) used as input for DQN.]
  5. Original idea: DQN for parking. https://research.preferred.jp/2016/01/ces2016/ https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/ Prior PFN work (links above) succeeded in driving smoothly with DQN. Input: 32 virtual sensors, 3 previous actions + current speed and steering. Output: 9 actions. Question: is it possible for a car agent to learn to park itself, with camera images as input?
  6. Reinforcement learning. [Diagram: the agent sends actions to the environment; the environment returns a state and a reward; the learning algorithm updates the agent.]
  7. DQN: Deep Q-Network (Volodymyr Mnih et al. 2015). Loop structure: for each episode, for each action taken, update the Q function.
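As a reading aid, here is a minimal, runnable sketch of that loop (episode loop, action loop, Q-function update with experience replay and epsilon-greedy exploration). The toy 1-D environment and the linear Q-function are stand-ins for illustration, not the internship code; only gamma = 0.97 comes from the hyperparameter slide.

```python
import random
from collections import deque
import numpy as np

GAMMA = 0.97  # discount factor from the hyperparameter slide

class ToyEnv:
    """1-D stand-in for the car simulator: drive a point to the goal at 0."""
    def reset(self):
        self.pos, self.t = 5.0, 0
        return np.array([self.pos])
    def step(self, action):            # actions: 0 = left, 1 = stay, 2 = right
        self.pos += action - 1
        self.t += 1
        done = abs(self.pos) < 0.5 or self.t >= 50
        reward = 1.0 if abs(self.pos) < 0.5 else -0.01 * abs(self.pos)
        return np.array([self.pos]), reward, done

def q_values(w, s):
    """Q(s, .) for a linear Q-function; features are (s, 1)."""
    return w @ np.array([s[0], 1.0])

env, w = ToyEnv(), np.zeros((3, 2))
buffer, batch_size, eps, lr = deque(maxlen=10000), 32, 0.1, 0.01
for episode in range(200):                      # each episode
    s, done = env.reset(), False
    while not done:                             # each action
        a = random.randrange(3) if random.random() < eps \
            else int(np.argmax(q_values(w, s)))  # epsilon-greedy
        s2, r, done2 = env.step(a)
        buffer.append((s, a, r, s2, done2))
        s, done = s2, done2
        if len(buffer) >= batch_size:           # update Q function
            for bs, ba, br, bs2, bd in random.sample(buffer, batch_size):
                # TD target: r + gamma * max_a' Q(s', a')
                target = br if bd else br + GAMMA * np.max(q_values(w, bs2))
                td = target - q_values(w, bs)[ba]
                w[ba] += lr * td * np.array([bs[0], 1.0])
```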
  8. Double DQN (Hado van Hasselt et al. 2015): prevents overestimation of Q values.
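The difference from vanilla DQN is only in how the target is computed: the online network selects the next action and the target network evaluates it. A small runnable illustration (values are made up):

```python
import numpy as np

def dqn_target(r, q_next_target, gamma=0.97):
    # Vanilla DQN: the target network both selects and evaluates the
    # next action, so noisy high estimates are systematically picked.
    return r + gamma * np.max(q_next_target)

def double_dqn_target(r, q_next_online, q_next_target, gamma=0.97):
    # Double DQN: the online network selects, the target net evaluates.
    a_star = int(np.argmax(q_next_online))
    return r + gamma * q_next_target[a_star]

# The online net overrates action 2; Double DQN still evaluates that
# action with the more conservative target net, giving a lower target.
q_online = np.array([1.0, 0.5, 3.0])
q_target = np.array([1.5, 0.4, 1.2])
print(dqn_target(0.0, q_target))                    # 0.97 * 1.5
print(double_dqn_target(0.0, q_online, q_target))   # 0.97 * 1.2
```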
  9. Reinforcement learning in this project. Environment: the car simulator. Agent: a different sensor and a different neural network per agent; the state is the sensor input.
  10. Environment: car simulator. Forces modeled: traction, air resistance, rolling resistance, centrifugal force, brake, cornering force. F = F_traction + F_aero + F_rr + F_centrifugal + F_brake + F_cornering
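A minimal sketch of how such a force balance drives the simulation, keeping only the longitudinal terms (the lateral centrifugal and cornering forces are omitted here). All constants and the function itself are illustrative assumptions, not the simulator's actual values:

```python
import numpy as np

def longitudinal_force(v, throttle, braking,
                       c_aero=0.4257, c_rr=12.8,
                       f_engine=8000.0, f_brake=10000.0):
    """Longitudinal part of F = F_traction + F_aero + F_rr + F_brake."""
    f_traction = throttle * f_engine           # engine drive force
    f_aero = -c_aero * v * abs(v)              # air resistance ~ v^2
    f_rr = -c_rr * v                           # rolling resistance ~ v
    f_br = -np.sign(v) * braking * f_brake     # brake opposes motion
    return f_traction + f_aero + f_rr + f_br

# One Euler integration step for a 1200 kg car at 10 m/s, full throttle:
v, m, dt = 10.0, 1200.0, 0.01
v += longitudinal_force(v, throttle=1.0, braking=0.0) / m * dt
```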
  11. Common specifications: state, action, reward. Input (states): features specific to each agent, plus car speed and car steering. Output (actions): 9 – accelerate, decelerate, steer right, steer left, throw (do nothing), accelerate + steer right, accelerate + steer left, decelerate + steer right, decelerate + steer left. Reward: +1 when the car is in the goal; -1 when the car is out of the field; 0.01 - 0.01 * distance_to_goal otherwise (changed afterward). Goal: the car is inside the goal region, with no other conditions such as car direction. Termination: time up (500 actions, changed to 450 afterward) or driving out of the field.
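The initial reward scheme, written out directly from this slide (the function name and boolean arguments are hypothetical):

```python
def reward_v1(in_goal, out_of_field, distance_to_goal):
    """Initial reward from the slide (later modified; see slide 29)."""
    if in_goal:
        return 1.0            # reached the goal region
    if out_of_field:
        return -1.0           # drove out of the field
    # distance-based shaping: slightly positive near the goal
    return 0.01 - 0.01 * distance_to_goal
```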
  12. Common specifications: hyperparameters. Maximum episodes: 50,000. Gamma: 0.97. Optimizer: RMSpropGraves with lr=0.00015, alpha=0.95, momentum=0.95, eps=0.01 (changed afterward to momentum=0). Batch size: 50 or 64. Epsilon: linearly decreased from 1.0 at the start to 0.1 at the end.
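RMSpropGraves is the optimizer shipped with Chainer, PFN's framework; the deck does not actually name the framework, so that part is an assumption. The slide's settings and the linear epsilon schedule look like this:

```python
from chainer import optimizers

# Initial setting from the slide; later changed to momentum=0.
optimizer = optimizers.RMSpropGraves(
    lr=0.00015, alpha=0.95, momentum=0.95, eps=0.01)
# optimizer.setup(q_network) would then bind it to the Q-network.

def epsilon(step, decay_steps):
    """Linear anneal from 1.0 down to 0.1, then hold at 0.1."""
    frac = min(step / decay_steps, 1.0)
    return 1.0 - 0.9 * frac
```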
  13. Agents: 1. Coordinate 2. Bird's-eye view 3. Subjective view – with three cameras or with four cameras.
  14. Coordinate agent. Input features: the relative coordinates from the car to the goal, e.g. (80, 300); input shape (2,), normalized.
  15. Coordinate agent. Neural network: fully connected layers only (3 layers). Inputs: coordinates (2) and car parameters (2); hidden layers of 64 and 64; output: number of actions (9).
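A sketch of that network in Chainer (again assuming Chainer as the framework); the wiring is read off the slide's layer sizes, so details like the activation function are assumptions:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class CoordinateQNet(chainer.Chain):
    """FC-only Q-network: (2 coords + 2 car params) -> 64 -> 64 -> 9."""
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.fc1 = L.Linear(4, 64)
            self.fc2 = L.Linear(64, 64)
            self.out = L.Linear(64, n_actions)

    def __call__(self, x):      # x: batch of (dx, dy, speed, steering)
        h = F.relu(self.fc1(x))
        h = F.relu(self.fc2(h))
        return self.out(h)      # Q-values, one per action
```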
  16. Coordinate agent: result.
  17. Bird's-eye view agent. Input features: a bird's-eye image of the whole field; input size 80 x 80, normalized.
  18. Bird's-eye view agent. Neural network. [Diagram: 80 x 80 input image -> conv layers (128, 192) -> fully connected 400 -> concatenated with the car parameters (2) -> 64 -> number of actions.]
  19. Bird's-eye view agent. Neural network (continued; same architecture diagram as the previous slide).
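A Chainer sketch of the convolutional Q-network as drawn. The diagram gives only the layer widths (conv 128 and 192, FC 400 and 64), so the kernel sizes, strides, and activations below are assumptions:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class BirdsEyeQNet(chainer.Chain):
    """80x80 image -> conv(128) -> conv(192) -> fc(400)
    -> concat 2 car params -> fc(64) -> 9 actions."""
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 128, ksize=8, stride=4)
            self.conv2 = L.Convolution2D(None, 192, ksize=4, stride=2)
            self.fc1 = L.Linear(None, 400)
            self.fc2 = L.Linear(None, 64)
            self.out = L.Linear(64, n_actions)

    def __call__(self, img, car_params):
        h = F.relu(self.conv1(img))
        h = F.relu(self.conv2(h))
        h = F.relu(self.fc1(h))
        h = F.concat((h, car_params), axis=1)  # append speed, steering
        h = F.relu(self.fc2(h))
        return self.out(h)
```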
  20. Bird's-eye view agent. Result: 18k episodes.
  21. Bird's-eye view agent. Result after 18k episodes: unclear (?). But we had already spent about six months on this agent, so we moved on to the next one.
  22. Subjective view agent. Input features: n_of_camera images of the subjective view from the car; three or four cameras; FoV = 120 deg per camera. Example input images for the four-camera agent: front +0, back +180, right +90, left +270.
  23. Subjective view agent. Neural network. [Diagram: 80 x 80 camera images -> conv -> 200 x 3 features -> fully connected 400 -> 256 -> concatenated with the car parameters (2) -> 64 -> number of actions.]
  24. Subjective view agent. Neural network (continued; same architecture diagram as the previous slide).
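One plausible reading of the diagram, sketched in Chainer: a shared conv encoder maps each 80x80 camera image to 200 features (the "200 x 3"), the features are concatenated and passed through FC layers of 400 and 256, the car parameters are appended, and a final 64-unit layer feeds the 9 action outputs. This wiring is inferred; only the sizes appear on the slide:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class SubjectiveQNet(chainer.Chain):
    """Multi-camera Q-network sketch (wiring inferred from the slide)."""
    def __init__(self, n_actions=9):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 32, ksize=8, stride=4)
            self.conv2 = L.Convolution2D(None, 64, ksize=4, stride=2)
            self.enc = L.Linear(None, 200)   # per-camera feature vector
            self.fc1 = L.Linear(None, 400)
            self.fc2 = L.Linear(None, 256)
            self.fc3 = L.Linear(None, 64)
            self.out = L.Linear(64, n_actions)

    def __call__(self, cams, car_params):    # cams: list of camera images
        feats = [F.relu(self.enc(F.relu(self.conv2(F.relu(self.conv1(c))))))
                 for c in cams]               # shared weights per camera
        h = F.concat(feats, axis=1)           # e.g. 200 x 3 = 600 features
        h = F.relu(self.fc1(h))
        h = F.relu(self.fc2(h))
        h = F.concat((h, car_params), axis=1)
        h = F.relu(self.fc3(h))
        return self.out(h)
```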
  25. Subjective view agent. Problems. Calculation time (GeForce GTX TITAN X): at first 3 [min/ep] x 50k [ep] = 100 days; after a review by Abe-san, 1.6 [min/ep] x 50k [ep] = 55 days. The overhead came from copies and synchronization between GPU and CPU. Learning was interrupted as soon as the DNN output diverged; (fortunately) the agent "learned" to reach the goal within ~10k episodes in some trials. Memory usage: DQN stores the last 1M inputs, i.e. 1M x (80 x 80 x 3 ch x 4 cameras), so images are saved to disk and read back on every access.
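For scale: 1M transitions of 80 x 80 x 3 x 4 uint8 frames is roughly 77 GB, far beyond RAM, which motivates the disk workaround. A minimal sketch of such a disk-backed replay buffer; the class and file layout are illustrative, not the internship code:

```python
import os
import numpy as np

class DiskReplayBuffer:
    """Keep small transition records in memory, frames on disk."""
    def __init__(self, path, capacity=1_000_000):
        self.path, self.capacity = path, capacity
        self.records, self.next_id = [], 0
        os.makedirs(path, exist_ok=True)

    def append(self, frames, action, reward, done):
        fname = os.path.join(self.path,
                             f"{self.next_id % self.capacity}.npy")
        np.save(fname, frames.astype(np.uint8))   # frames go to disk
        rec = (fname, action, reward, done)
        if len(self.records) < self.capacity:
            self.records.append(rec)
        else:                                     # overwrite oldest slot
            self.records[self.next_id % self.capacity] = rec
        self.next_id += 1

    def sample(self, batch_size):
        idx = np.random.randint(len(self.records), size=batch_size)
        batch = [self.records[i] for i in idx]
        # Images are loaded only when a minibatch is actually sampled.
        frames = [np.load(f).astype(np.float32) for f, _, _, _ in batch]
        return frames, batch
```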
  26. Subjective view agent. Result: three cameras, 6k episodes. [Figure: trajectory of the car agent, and the subjective views (0 deg, -120 deg, +120 deg) used as input for DQN.]
  27. Subjective view agent. Result: three cameras, 50k episodes. The policy degenerates to "move anyway" (?) >> points to the reward setting. The agent seems unable to reach the goal every time; it achieves only "easy" goals, with frequent goals near the start >> points to variable task difficulty (curriculum).
  28. Subjective view agent. Four cameras at 30k episodes.
  29. Modified reward. Previous: +1 when the car is in the goal; -1 when the car is out of the field; 0.01 - 0.01 * distance_to_goal otherwise. New: +1 - speed when the car is in the goal (in order to make the car stop); -1 when the car is out of the field; -0.005 otherwise.
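The new scheme, written out in the same hypothetical form as the earlier reward_v1 sketch:

```python
def reward_v2(in_goal, out_of_field, speed):
    """Modified reward from the slide."""
    if in_goal:
        return 1.0 - speed   # speed penalty pushes the car to stop in the goal
    if out_of_field:
        return -1.0
    return -0.005            # small constant step cost replaces the
                             # distance-based shaping term
```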
  30. Modified difficulty. Difficulty is set by the initial car direction and position. Constraint: the car always starts near the middle of the field, facing toward the center to within ±π/4. Curriculum: the allowed starting direction widens to ±π/12 · n, where n is the curriculum level; criterion for advancing: mean reward of 0.6 over 100 episodes. [Diagram: goal and start regions for n = 1 and n = 2.]
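A short sketch of that curriculum, under the assumption that "0.6 of mean reward over 100 episodes" is a threshold for advancing the level (function names are hypothetical):

```python
import math
import random
from collections import deque

def sample_initial_direction(level):
    """Starting direction: face-toward-center heading plus noise
    within +/- pi/12 * n, where n is the curriculum level."""
    bound = math.pi / 12 * level
    return random.uniform(-bound, bound)

recent = deque(maxlen=100)   # rewards of the last 100 episodes

def maybe_advance(level, episode_reward):
    recent.append(episode_reward)
    # Criterion from the slide: mean reward over 100 episodes >= 0.6
    if len(recent) == recent.maxlen and sum(recent) / len(recent) >= 0.6:
        recent.clear()
        return level + 1
    return level
```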
  31. Subjective view agent: modifications.
      N cameras | Reward   | Difficulty | Learning result
      3         | default  | default    | o at about 6k episodes; x by 50k
      3         | modified | default    | o at about 16k episodes
      3         | modified | constraint | ? (still learning)
      3         | modified | curriculum | o (though only at curriculum level 1 yet)
      4         | default  | default    | x
      4         | modified | curriculum | △ (not bad, but not yet successful at 6k)
  32. Subjective view agent: modifications. Curriculum + three cameras, at curriculum level 1. The advancement criteria need to be modified. [Plots: mean reward (0.0 to 1.0) and reward sum (0 to 500) versus episode number (0 to 20k).]
  33. Discussion. 1. The initial settings included situations in which the car cannot reach the goal, e.g. starting toward the edge of the field; this made learning unstable. 2. Why was the coordinate agent nevertheless successful, even though such situations could also occur for it?
  34. Discussion. 3. Comparing three and four cameras: considering success rate and execution time, three cameras are better. Why was the four-camera agent not successful? Would it need several trials? 4. DQN often diverged: roughly one run in three (a subjective impression), and slightly more often with four cameras. This underlines the importance of the dataset for learning: memory size and batch size.
  35. Discussion. 5. Curriculum: ideally it would be better to quantify the "difficulty of the task". In this case it may be roughly represented by the "bias of the distribution" of the selected actions (accelerate, decelerate, throw (do nothing), steer right, steer left, accelerate + steer right, accelerate + steer left, decelerate + steer right, decelerate + steer left): choosing each action equally often corresponds to going straight, while a biased distribution corresponds to going right/left. A sketch of one such measure follows below.
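One way to make that "bias of distribution" concrete is normalized entropy over action counts; this is an assumption about how the idea might be quantified, not something from the deck:

```python
import numpy as np

def action_bias(counts):
    """Normalized entropy of the selected-action distribution:
    1.0 = all actions equally often (going straight),
    0.0 = a single action dominates (strongly biased)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    h = -(p[p > 0] * np.log(p[p > 0])).sum()
    return h / np.log(len(p))

print(action_bias([100] * 9))                      # 1.0: uniform
print(action_bias([500, 10, 10, 400, 5, 5, 5, 5, 5]))  # low: biased
```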
  36. Summary. The car agent can park itself using subjective camera views, though learning is not always stable. There is a trade-off between reward design and learning difficulty: a simple reward is difficult to learn from (try other algorithms such as A3C), while a complex reward is difficult to set (try other settings for distance_to_goal).
