
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

Presentation slides for the 4th 3D Study Group @ Kanto (3D勉強会@関東), 2019/02/23.


  1. Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos [Casser+, AAAI '19] / 4th 3D Study Group @ Kanto / Shinya Sumikura (@sumicco_cv, @shinsumicco)
  2. Self-introduction (M1 student): interests include SfM, SLAM, 3D computer vision, bundle adjustment, C/C++, and GPU programming (CUDA).
  3. Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos [Casser+, AAAI '19]
  4. Key ideas of [Casser+, AAAI '19]: 1. model the motion of individual objects, not just depth and ego-motion → Motion model; 2. regularize depth with known real-world object sizes → Object size constraints; 3. keep fine-tuning the model during inference → Test time refinement.
  5. What is SfMLearner? • "Unsupervised Learning of Depth and Ego-Motion from Video" [Zhou+, CVPR '17]: learns monocular depth and ego-motion without depth or pose supervision. • Training uses only image sequences; a Depth CNN predicts depth and a Pose CNN predicts relative camera pose.
  6. How SfMLearner works (a minimal sketch follows below): 1. take three consecutive frames (t − 1, t, t + 1); 2. predict a depth map for the middle frame t; 3. predict the ego-motions E_{t→t−1} and E_{t→t+1}; 4. warp the neighboring frames into frame t and use the photometric reconstruction error as the training loss.
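
A minimal NumPy sketch of this view-synthesis objective. The names predict_depth, predict_pose, and warp are hypothetical stand-ins for the depth network, the pose network, and differentiable image warping; they are not the authors' code.

    import numpy as np

    def photometric_warp_loss(frames, predict_depth, predict_pose, warp):
        """Sketch of the SfMLearner-style training signal on one 3-frame snippet.

        frames: (I_prev, I_t, I_next) as HxWx3 float arrays.
        predict_depth(I) -> HxW depth map for the middle frame.
        predict_pose(I_a, I_b) -> relative camera motion a -> b.
        warp(I_src, depth_tgt, T, K) -> I_src resampled into the target view
            (differentiable in a real implementation).
        """
        I_prev, I_t, I_next = frames
        K = np.eye(3)                          # placeholder camera intrinsics

        D_t = predict_depth(I_t)               # depth of the middle frame
        T_t_prev = predict_pose(I_t, I_prev)   # ego-motion t -> t-1
        T_t_next = predict_pose(I_t, I_next)   # ego-motion t -> t+1

        # Synthesize frame t from each neighbor and penalize the photometric error.
        I_from_prev = warp(I_prev, D_t, T_t_prev, K)
        I_from_next = warp(I_next, D_t, T_t_next, K)
        return np.abs(I_from_prev - I_t).mean() + np.abs(I_from_next - I_t).mean()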
  7. Works derived from SfMLearner: • "Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints" [Mahjourian+, CVPR '18]: adds an SSIM term to the L1 photometric loss and enforces consistency between consecutive 3D point clouds with an ICP-based loss that is back-propagated. • "Learning Depth from Monocular Videos using Direct Methods" [Wang+, CVPR '18]: estimates pose with differentiable direct visual odometry (DDVO) instead of a Pose CNN. • "Digging Into Self-Supervised Monocular Depth Estimation" [Godard+, arXiv]: shares an encoder between the depth and pose networks, among other refinements. • For a broader overview, see "Monocular Depth Estimation: A Survey" [Bhoi, arXiv].
  8. Limitation of these methods: the warping loss assumes a static scene, so depth for independently moving objects is not handled well (figure from "Digging Into Self-Supervised Monocular Depth Estimation" [Godard+, arXiv]).
  9. Prior work that models object motion: • "SfM-Net: Learning of Structure and Motion from Video" [Vijayanarasimhan+, arXiv]: predicts motion masks and per-object motion in addition to depth and ego-motion. • "GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose" [Yin+, CVPR '18]: decomposes 2D flow into rigid flow (from depth and pose) plus residual flow and uses it for image warping.
  10. Prior work based on 3D scene flow: • "Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding" [Yang+, ECCV '18]: estimates per-pixel 3D motion (3D scene flow) and uses it for image warping. • "Learning Independent Object Motion from Unlabelled Stereoscopic Videos" [Cao+, arXiv, 2019/01/07]: lifts bounding boxes to 3D view frustums, reasons about 3D structure and object motion on a 3D grid, and predicts depth, 3D scene flow, and object masks from stereoscopic videos.
  11. Proposed approach: 1. run an off-the-shelf instance segmentation model on the training frames; 2. estimate a motion for each instance (object motion) in addition to ego-motion; 3. perform image warping with ego-motion + object motion.
  12. 1. Motion model
  13. 1. Motion model: instance segmentation. • Run an instance segmentation network (e.g., Mask R-CNN) on the three input frames, giving images I_1, I_2, I_3 and instance segmentation masks S_{i,1}, S_{i,2}, S_{i,3}. • Instance IDs i are matched across the three frames so that the same object keeps the same ID.
  14. 1. Motion model: binary masks. • For each instance ID i, define a binary mask O_i(S_j) that is 1 (true) on pixels belonging to instance i in frame j and 0 (false) elsewhere. • The background mask is O_0(S_j) = 1 − ⋃_i O_i(S_j). (A NumPy sketch follows below.)
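
A small NumPy sketch of these masks, assuming (my own convention, not the paper's data format) that seg is an H×W array of instance IDs with 0 meaning background.

    import numpy as np

    def object_mask(seg, i):
        """O_i(S): 1 on pixels of instance i, 0 elsewhere."""
        return (seg == i).astype(np.float32)

    def background_mask(seg):
        """O_0(S) = 1 - union of all instance masks."""
        union = np.zeros(seg.shape, dtype=np.float32)
        for i in np.unique(seg):
            if i != 0:
                union = np.maximum(union, object_mask(seg, i))
        return 1.0 - union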
  15. 1. Motion model: ego-motion (1). • Combine the background masks of the three frames into a single static mask V = O_0(S_1) ⊙ O_0(S_2) ⊙ O_0(S_3), so that a pixel is kept only if it is background in every frame.
  16. 1. Motion model: ego-motion (2). • Mask out all object regions in each frame with V: I_1 ⊙ V, I_2 ⊙ V, I_3 ⊙ V, leaving only the static background visible.
  17. 1. Motion model: ego-motion (3). • Feed the masked frames to the ego-motion estimator ψ_E: E_{1→2}, E_{2→3} = ψ_E(I_1 ⊙ V, I_2 ⊙ V, I_3 ⊙ V). (Sketch below.)
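
A sketch of this masking step, again assuming the 0-is-background segmentation encoding above and a hypothetical ego_motion_net callable standing in for ψ_E.

    import numpy as np

    def estimate_ego_motion(I1, I2, I3, seg1, seg2, seg3, ego_motion_net):
        """E_1->2, E_2->3 = psi_E(I1*V, I2*V, I3*V), with V the combined static mask."""
        # V = O_0(S_1) * O_0(S_2) * O_0(S_3): keep a pixel only if it is background
        # in all three frames, so moving objects cannot bias the ego-motion estimate.
        V = ((seg1 == 0) & (seg2 == 0) & (seg3 == 0)).astype(np.float32)[..., None]
        return ego_motion_net(I1 * V, I2 * V, I3 * V)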
  18. 1. Motion model: object motion (1). • Using the predicted depth D_2 = θ(I_2) and the ego-motions E_{1→2}, E_{2→3}, warp the neighboring frames into frame 2 (image warping), producing Î_{1→2} and Î_{3→2}. • The segmentation masks S_{i,1}, S_{i,3} are warped in the same way.
  19. 1. Motion model: object motion (2). • For each instance i, mask the ego-motion-warped frames with the corresponding (warped) binary masks and feed them to the object motion estimator ψ_M: M^{(i)}_{1→2}, M^{(i)}_{2→3} = ψ_M(Î_{1→2} ⊙ O_i(Ŝ_{1→2}), I_2 ⊙ O_i(S_2), Î_{3→2} ⊙ O_i(Ŝ_{3→2})). (Sketch below.)
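
A sketch of the per-instance step, with object_motion_net a hypothetical stand-in for ψ_M and the ego-motion-warped images and masks assumed to be precomputed as on the previous slide.

    import numpy as np

    def estimate_object_motions(I2, I1_warp, I3_warp, seg2, seg1_warp, seg3_warp,
                                instance_ids, object_motion_net):
        """Per-instance motions M^(i)_1->2, M^(i)_2->3 from ego-motion-compensated frames."""
        def mask(seg, i):
            # O_i(S): 1 on pixels of instance i, broadcast over color channels.
            return (seg == i).astype(np.float32)[..., None]

        motions = {}
        for i in instance_ids:
            # After ego-motion compensation, any residual misalignment inside the
            # instance region is attributed to the object's own motion.
            motions[i] = object_motion_net(I1_warp * mask(seg1_warp, i),
                                           I2 * mask(seg2, i),
                                           I3_warp * mask(seg3_warp, i))
        return motions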
  20. 1. Motion model: final warp. • Combine the ego-motion warp for the static background with the per-object warps into Î^{(F)}_{1→2} (and analogously Î^{(F)}_{3→2}): Î^{(F)}_{1→2} = Î_{1→2} ⊙ V + Σ_{i=1}^{N} Î^{(i)}_{1→2} ⊙ O_i(S_2), where Î_{1→2} is warped with depth D_2 and ego-motion E_{1→2}, and each Î^{(i)}_{1→2} is warped with depth D_2, ego-motion E_{1→2}, and object motion M^{(i)}_{1→2}.
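
A sketch of how the final warp could be assembled from the pieces above; I1_obj_warps is assumed to hold, per instance ID, frame 1 warped with both ego-motion and that object's own motion.

    import numpy as np

    def combine_warps(I1_ego_warp, I1_obj_warps, seg2, V):
        """Î^(F)_1->2 = Î_1->2 ⊙ V + Σ_i Î^(i)_1->2 ⊙ O_i(S_2)."""
        # Static background: taken from the ego-motion-only warp.
        out = I1_ego_warp * V[..., None]
        # Each object region: taken from the warp that also applies that
        # object's estimated motion M^(i)_1->2.
        for i, warped in I1_obj_warps.items():
            out = out + warped * (seg2 == i).astype(np.float32)[..., None]
        return out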
  21. 1. Motion model: training loss. • Total loss, summed over image scales i with weights α_1, α_2, α_3: L = Σ_i ( α_1 L_rec^{(i)} + α_2 L_SSIM^{(i)} + α_3 (1/2^i) L_sm^{(i)} ). • Warping (reconstruction) loss, taking the per-pixel minimum over the two warped sources to tolerate occlusions: L_rec = min( ||Î^{(F)}_{1→2} − I_2||, ||Î^{(F)}_{3→2} − I_2|| ). • SSIM loss: L_SSIM = [1 − SSIM(Î^{(F)}_{1→2}, I_2)] / 2. • Edge-aware smoothness loss on the mean-normalized depth D* = D / D̄: L_sm = |∂_x D*| e^{−|∂_x I|} + |∂_y D*| e^{−|∂_y I|}. (A sketch of one pyramid level follows below.)
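
A sketch of one pyramid level of this loss. The default weights and the ssim helper are illustrative assumptions, not values from the paper.

    import numpy as np

    def total_loss_one_scale(I2, I1_to_2, I3_to_2, D2,
                             alpha=(0.85, 0.15, 0.04), ssim=None):
        """One pyramid level of L = a1*L_rec + a2*L_SSIM + a3*L_sm (weights illustrative)."""
        # Reconstruction: per-pixel minimum over the two warped sources, which
        # tolerates pixels that are occluded in one of the neighboring frames.
        err1 = np.abs(I1_to_2 - I2).mean(axis=-1)
        err3 = np.abs(I3_to_2 - I2).mean(axis=-1)
        L_rec = np.minimum(err1, err3).mean()

        # Structural similarity term (ssim is an assumed helper returning a scalar).
        L_ssim = (1.0 - ssim(I1_to_2, I2)) / 2.0 if ssim is not None else 0.0

        # Edge-aware smoothness on mean-normalized depth D* = D / mean(D).
        Ds = D2 / (D2.mean() + 1e-8)
        gray = I2.mean(axis=-1)
        L_sm = (np.abs(np.diff(Ds, axis=1)) * np.exp(-np.abs(np.diff(gray, axis=1)))).mean() \
             + (np.abs(np.diff(Ds, axis=0)) * np.exp(-np.abs(np.diff(gray, axis=0)))).mean()

        a1, a2, a3 = alpha
        return a1 * L_rec + a2 * L_ssim + a3 * L_sm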
  22. 2. Object size constraints: object size priors. • Assign each object category a prior height in the real world (the talk lists values such as 1.8 m, 1.7 m, and 2.5 m for different categories). • Each instance ID i maps to a category ID t(i) (ℕ⁺ → ℕ⁺), and each category to a height prior p_{t(i)} ∈ ℝ⁺ [m].
  23. 2. Object size constraints: approximate depth from object size. • Given the pixel height h(S_i) [pix] of the segmentation blob of instance i, its real-world height prior p_{t(i)} [m], and the focal length f_y [pix], the depth of that instance can be approximated as D_approx(p; h) = f_y · p / h [m].
  24. 2. Object size constraints: scale constraint loss. • Penalize the difference between the predicted depth inside each instance mask and the approximate depth D_approx: L_sc = Σ_{i=1}^{N} || D ⊙ O_i(S) / D̄ − D_approx(p^{t(i)}; h^{(i)}) ⊙ O_i(S) / D̄ ||, where D is the predicted depth, O_i(S) the binary mask of instance i, and D_approx its reference depth. • Both terms are divided by the mean depth D̄ (cf. disparity normalization); otherwise the network could shrink the whole depth map toward zero to satisfy the constraint, see "Learning Depth from Monocular Videos using Direct Methods" [Wang+, CVPR '18]. (Sketch below.)
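
A sketch of this constraint, assuming (as above) an instance-ID segmentation map and a height_prior lookup that I introduce only for illustration.

    import numpy as np

    def size_constraint_loss(D, seg, focal_y, height_prior, instance_ids):
        """Keep the predicted depth of each instance consistent with the depth
        implied by its known physical height (all names are illustrative).

        D:            HxW predicted depth map.
        seg:          HxW instance-ID map (0 = background).
        height_prior: dict {instance_id: prior height p in meters}.
        """
        D_mean = D.mean() + 1e-8
        loss = 0.0
        for i in instance_ids:
            mask = (seg == i)
            if not mask.any():
                continue
            # Pixel height h of the segmentation blob of instance i.
            rows = np.where(mask.any(axis=1))[0]
            h = rows.max() - rows.min() + 1
            # Approximate depth from the pinhole model: D_approx = f_y * p / h.
            D_approx = focal_y * height_prior[i] / h
            # Compare inside the mask, normalizing both sides by the mean depth
            # so the network cannot shrink the whole depth map to cheat.
            loss += np.abs(D[mask] / D_mean - D_approx / D_mean).mean()
        return loss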
  25. 3. Test time refinement: keep training during inference. • At inference time, fine-tune the model on the 3-frame window around each test image using the same unsupervised objective, which improves accuracy (evaluation on KITTI). • Running too many steps over-trains on the window; N = 20 steps is used. • The fine-tuning can be done online while the system runs, trading extra computation for accuracy. (Sketch below.)
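
A sketch of the refinement loop, with model and optimizer as hypothetical interfaces wrapping the networks and the unsupervised loss described on the earlier slides.

    def test_time_refinement(model, optimizer, window, num_steps=20):
        """Online fine-tuning sketch: before predicting depth for the middle frame,
        run a few self-supervised training steps on its 3-frame window.

        model.training_loss(window) stands in for the full warping loss above;
        N = 20 steps is the value reported in the talk, and more steps tend to
        over-fit to the window.
        """
        for _ in range(num_steps):
            loss = model.training_loss(window)   # same unsupervised objective
            optimizer.step(loss)                 # hypothetical optimizer interface
        return model.predict_depth(window[1])    # refined depth for the middle frame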
  26. Quantitative results. • M: Motion model, R: Test time refinement. • M+R reaches state-of-the-art accuracy. • The Motion model alone already performs well.
  27. (figure)
  28. Motion model: trained on KITTI, inference on Cityscapes. • The Motion model improves depth, especially on moving objects. • Worst-case examples are also shown.
  29. Motion model: trained on Cityscapes, evaluated on KITTI. • The Motion model (M) transfers across the domain gap. • Test time refinement (R) improves the results further.
  30. Motion model: object motion learned during training. • A 6-DoF motion is estimated per instance. • This relies on instance segmentation and on instance IDs being consistent across frames.
  31. Test time refinement: baseline vs. refinement mode (R). • R yields clearly better depth. • It helps even under zero-shot domain transfer. • Figure settings: training on KITTI, inference on KITTI / training on KITTI, inference on Cityscapes.
  32. Indoor dataset: trained on Cityscapes, inference on an indoor dataset. • Comparing the baseline with the refinement model (R). • Test time refinement helps even for the large "outdoor (Cityscapes) → indoor" domain transfer.
  33. Summary and discussion. • The Motion model improves depth in scenes with moving objects, and test time refinement boosts accuracy further. • Using off-the-shelf instance segmentation to drive motion estimation is somewhat naïve. • No gradients are back-propagated into the instance segmentation network; could the pipeline be made end-to-end? • Consistent instance IDs across frames are required.
