Visual Geometry with Deep Learning

  1. 1. Visual Geometry with Deep Learning Kwang Moo Yi, University of Victoria
  2. 2. Data
  3. 3. Data !4 “make use of the best ally we have: the unreasonable effectiveness of data.” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  4. 4. Effectiveness of data in deep learning !5 Sun C., Shrivastava A., Singh S., Gupta A. Revisiting unreasonable effectiveness of data in deep learning era. In Computer Vision (ICCV), 2017 IEEE International Conference on, pp. 843-852. IEEE, 2017. Image from arXiv preprint version. [Charts: object detection performance on MS COCO and PASCAL VOC 2007]
  5. 5. Why is data useful? !6 “… perhaps when it comes to natural language processing … will never have the elegance of physical equations…” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  6. 6. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  7. 7. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  8. 8. Multi-view Geometry !9
  9. 9. Multi-view Geometry Hotel Images are in the public domain. Modified to simulate 3D rotation !10
  10. 10. C1 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry !11
  11. 11. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry !12
  12. 12. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation How did the camera move? Multi-view Geometry !13
  13. 13. Hotel Images are in the public domain. Modified to simulate 3D rotation Drone image is from Parrot. Reproduced for educational purposes. Multi-view Geometry !14
  14. 14. Hotel Images are in the public domain. Modified to simulate 3D rotation Drone image is from Parrot. Reproduced for educational purposes. Multi-view Geometry !15 Car image is CC0
  15. 15. Camera Pose !16 [Crivelaro et al., TPAMI, 2019]
  16. 16. Camera Pose !17 [Klein and Murray, ISMAR, 2007]
  17. 17. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry How did the camera move? !18
  18. 18. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !19
  19. 19. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !20
  20. 20. C1 C2 Hotel Images are in the public domain. Modified to simulate 3D rotation Multi-view Geometry Find corresponding points and triangulate! !21
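Once correspondences are found, triangulation itself is standard. A minimal sketch with OpenCV, where the intrinsics K, the relative motion (R, t), and the matched pixel coordinates are all hypothetical placeholders:

```python
import numpy as np
import cv2

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                        # assumed intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # camera C1 at the origin
R, _ = cv2.Rodrigues(np.array([0., 0.2, 0.]))       # C2 rotated ~11 degrees
t = np.array([[1.], [0.], [0.]])                    # C2 translated along x
P2 = K @ np.hstack([R, t])

pts1 = np.array([[100., 120.], [300., 200.]]).T     # 2xN matches in image 1
pts2 = np.array([[ 90., 118.], [290., 205.]]).T     # 2xN matches in image 2

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # 4xN homogeneous points
X = (X_h[:3] / X_h[3]).T                            # Nx3 3D points
print(X)
```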
  21. 21. Interest Points !22 The best tool for matching points across images. SIFT (Lowe, ICCV’99) started the trend: ~68k citations.
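For reference, the classical SIFT matching baseline is a few lines of OpenCV. A minimal sketch, assuming two grayscale images on disk (paths are placeholders), with Lowe's ratio test to filter matches:

```python
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test: keep a match only if it is clearly better than the
# second-best candidate.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.8 * n.distance]
print(f"{len(good)} putative matches")
```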
  22. 22. LIFT: Learned Invariant Feature Transform !23 [LIFT pipeline: DET → score map → softargmax → Crop → ORI → Rot → DESC → description vector] Y. Verdie, K.M. Yi, P. Fua, V. Lepetit: "TILDE: A Temporally Invariant Learned DEtector", CVPR 2015. K.M. Yi, Y. Verdie, V. Lepetit, P. Fua: "Learning to Assign Orientations to Feature Points", CVPR 2016 (Oral). K.M. Yi, E. Trulls, V. Lepetit, P. Fua: "LIFT: Learned Invariant Feature Transform", ECCV 2016 (Spotlight).
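The softargmax that turns the detector's score map into a differentiable keypoint location can be sketched in a few lines. A minimal NumPy version, where the temperature beta is a hypothetical choice:

```python
import numpy as np

def softargmax2d(score_map, beta=10.0):
    """Differentiable expected (x, y) over a softmaxed score map."""
    h, w = score_map.shape
    weights = np.exp(beta * (score_map - score_map.max()))
    weights /= weights.sum()               # softmax over all pixels
    ys, xs = np.mgrid[0:h, 0:w]
    return (weights * xs).sum(), (weights * ys).sum()

print(softargmax2d(np.random.rand(32, 32)))
```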
  23. 23. Quantitative results !24 [Bar charts: average matching score on ‘Strecha’, ‘DTU’, and ‘Webcam’ for SIFT, SURF, ORB, Daisy, sGLOH, MROGH, LIOP, BiCE, BRISK, FREAK, VGG, DeepDesc, PN-Net, KAZE, LIFT (pic), LIFT (rf)] • Best performance on all datasets, with either ‘pic’ or ‘rf’. • Surprising? SIFT remains #3 overall (#1: ours, #2: VGG).
  24. 24. LF-Net: Inference !25
  25. 25. LF-Net: Training !26
  26. 26. Quantitative results on outdoor scenes !27
  27. 27. Quantitative results on outdoor scenes !28 Simply training for scale invariance gave best results
  28. 28. Camera Pose? !29 [Bar chart: mAP at 20° for SIFT+RANSAC, SIFT+CVPR'18, SIFT+arXiv'19, LF-Net+arXiv'19]
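The SIFT+RANSAC baseline in this chart corresponds to the classical two-view pose recipe. A minimal sketch with OpenCV, where the intrinsics and the correspondences are hypothetical placeholders:

```python
import numpy as np
import cv2

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
pts1 = np.random.rand(50, 2) * 480 + 80        # placeholder correspondences
pts2 = pts1 + np.array([5.0, 0.0])             # shifted, as if the camera moved

# Robustly fit the essential matrix, then decompose it into (R, t).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print(R)          # relative rotation
print(t.ravel())  # translation direction (scale is unrecoverable)
```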
  29. 29. TL;DR • End-to-end pipeline for local feature matching • Learning with non-differentiable components within Deep Learning • Tighter formulation → better performance !30
  30. 30. TL;DR • End-to-end pipeline for local feature matching • Learning with non-differentiable components within Deep Learning • Tighter formulation → better performance !31 Beyond?
  31. 31. Towards practical benchmarks Beyond !32 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  32. 32. Towards practical benchmarks Beyond !33 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  33. 33. Image Matching: Local Features and Beyond https://image-matching-workshop.github.io Vassileios Balntas (Scape), Vincent Lepetit (U. Bordeaux), Johannes Schönberger (Microsoft), Eduard Trulls (Google), Kwang Moo Yi (U. Victoria)
  34. 34. Image Matching Challenge !35
  35. 35. The phototourism challenge: Data 36
  36. 36. The phototourism challenge: Data 37
  37. 37. The phototourism challenge: Data ● 25k images in total for training. ● “Quasi” ground truth data is generated by performing SfM with COLMAP with all images. ○ Assumption: Images registered in COLMAP are accurate given enough images. ● Valid pairs are generated via simple visibility check. 38
  38. 38. The phototourism challenge: Data ● 4k images in total for testing. ● Random bags of images are subsampled to form test subsets (size: 3, 5, 10, 25). 39
  39. 39. The phototourism challenge: local features Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Features ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 40
  40. 40. The phototourism challenge: matches Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Features + Matches ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 41
  41. 41. The phototourism challenge: poses Hotel Images are in the public domain. Modified to simulate 3D rotation ● Submission: Poses ● IMW evaluates them via a typical stereo/SfM pipeline ○ Nearest neighbor matching ○ 1-to-1 matching ○ RANSAC_F ○ COLMAP 42
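As a sketch of what the stereo part of that evaluation looks like in code: a minimal, hypothetical version of the nearest-neighbour matching, 1-to-1 check, and RANSAC_F stages with OpenCV (the COLMAP step is omitted), assuming kp/des come from a submitted feature method:

```python
import numpy as np
import cv2

def evaluate_pair(kp1, des1, kp2, des2):
    """Nearest-neighbour matching, mutual 1-to-1 check, RANSAC on F."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # 1-to-1 matches
    matches = matcher.match(des1, des2)                    # nearest neighbour
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    F, inlier_mask = cv2.findFundamentalMat(
        pts1, pts2, cv2.FM_RANSAC, ransacReprojThreshold=1.0)
    n_inliers = 0 if inlier_mask is None else int(inlier_mask.sum())
    return F, n_inliers
```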
  42. 42. Improving with descriptors (multi-view task) [Bar chart: relative gains of +12%, +23%, +26%, +28%, +30%, and +32%] Full results: https://image-matching-workshop.github.io/leaderboard 43
  43. 43. Improving with matching (multi-view task) [Bar chart: relative gains of +11%, +37%, +14%, and +35%] SuperPoint: Self-Supervised Interest Point Detection and Description. DeTone et al., 2018. ContextDesc: Local Descriptor Augmentation with Cross-Modality Context. Luo et al., CVPR'19. Learning to Find Good Correspondences. Yi et al., CVPR'18. 44
  44. 44. End-to-end pipelines SuperPoint: Self-Supervised Interest Point Detection and Description. DeTone et al., 2018. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. Dusmanu et al., CVPR'19 45
  45. 45. Image Matching: Local Features and Beyond https://image-matching-workshop.github.io Vassileios Balntas (Scape), Vincent Lepetit (U. Bordeaux), Johannes Schönberger (Microsoft), Eduard Trulls (Google), Kwang Moo Yi (U. Victoria)
  46. 46. Towards practical benchmarks Beyond !47 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  47. 47. Towards practical benchmarks Beyond !48 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  48. 48. LF-Net: Inference !49
  49. 49. LF-Net: Inference Image-level Scale-space Heatmap Learning !50
  50. 50. LF-Net: Inference Image-level Scale-space Heatmap Learning Extract top-K patches !51
  51. 51. LF-Net: Inference Back propagation breaks Extract top-K patches !52
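Why the break happens: hard top-K selection returns integer indices, so no gradient can flow back to the score map through the chosen positions. An illustrative PyTorch sketch (not LF-Net code):

```python
import torch

score = torch.rand(1, 16 * 16, requires_grad=True)   # toy flattened score map
vals, idx = torch.topk(score, k=4)   # idx is integer-valued: no gradient
xy = torch.stack([idx % 16, idx // 16], dim=-1)  # keypoint coords, detached

vals.sum().backward()                # gradients flow through values only
print(score.grad.count_nonzero())    # nonzero at just the K selected pixels
```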
  52. 52. LF-Net: Training !53 Back propagation until here
  53. 53. LF-Net: Training !54
  54. 54. LF-Net: Training !55 Back prop. with results from other branch
  55. 55. LF-Net: Training !56 Apply score map cleaning, etc. (traditional heuristics)
  56. 56. LF-Net: Training !57 Can we simply back propagate without requiring the second branch?
  57. 57. !58 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning to localize & understand is easy when there are only single instances of the object in the scene
  58. 58. !59 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Non-trivial when multiple instances exist
  59. 59. !60 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Non-trivial when multiple instances exist
  60. 60. Key Idea Lifting via slack variable !61
  61. 61. !62 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks
  62. 62. !63 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Lifting the optimization to circumvent top-K
  63. 63. !64 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Treat intermediate heatmap as slack variable
  64. 64. !65 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Back propagate in two stages
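A conceptual PyTorch sketch of that two-stage scheme, under my reading of the slides (the network, the toy task loss, and the step size are all hypothetical): the heatmap H is treated as a free slack variable, optimized directly for the task, and the heatmap network is then regressed onto the improved H.

```python
import torch

heatmap_net = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in detector
image = torch.rand(1, 1, 32, 32)

# Stage 1: treat the heatmap H as a slack variable and optimize it directly
# for the (toy) downstream task -- top-K is applied to H itself, so no
# gradient ever has to cross the hard selection back into the network.
H = heatmap_net(image).detach().requires_grad_(True)
task_loss = (H.flatten().topk(4).values - 1.0).pow(2).sum()
task_loss.backward()
with torch.no_grad():
    H_star = H - 0.1 * H.grad                       # one gradient step on H

# Stage 2: regress the heatmap network onto the improved heatmap.
opt = torch.optim.SGD(heatmap_net.parameters(), lr=0.01)
(heatmap_net(image) - H_star).pow(2).mean().backward()
opt.step()
```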
  65. 65. !66 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning digits with supervision on “number of things”
  66. 66. !67 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning basis kernel with supervision on “number of things”
  67. 67. !68 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Learning to find digits without locational supervision
  68. 68. !69 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks
  69. 69. !70 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Better than with supervision?!
  70. 70. !71 [Angles et al., arXiv, 2019] MIST Multiple Instance Spatial Transformer Networks Back propagate in two stages
  71. 71. Towards practical benchmarks Beyond !72 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  72. 72. Towards practical benchmarks Beyond !73 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  73. 73. !74 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling [Figure: gradients w.r.t. crop location, bilinear sampling vs. our method; they should point towards the centre]
  74. 74. !75 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  75. 75. !76 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  76. 76. Key Idea Linearize !77
  77. 77. !78 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  78. 78. !79 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities
  79. 79. !80 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates
  80. 80. !81 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates Plane equation — dY/dX
  81. 81. !82 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling Intensities Coordinates Plane equation — dY/dX
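A minimal NumPy sketch of that linearization as I read the slides (the sample count and jitter scale are hypothetical): sample intensities at a few jittered coordinates, fit a local plane I(x, y) ≈ a·x + b·y + c by least squares, and use the plane slopes (a, b) as smooth gradients with respect to the sampling location, in place of the noisy bilinear ones.

```python
import numpy as np

def plane_fit_gradient(image, x, y, n=8, sigma=0.5, rng=np.random):
    """Fit I(x, y) ~ a*x + b*y + c around (x, y); return (a, b) = dI/dx, dI/dy."""
    xs = x + sigma * rng.randn(n)
    ys = y + sigma * rng.randn(n)
    xi = np.clip(np.round(xs).astype(int), 0, image.shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, image.shape[0] - 1)
    I = image[yi, xi]                           # sampled intensities
    A = np.stack([xs, ys, np.ones(n)], axis=1)  # coordinates [x, y, 1]
    (a, b, c), *_ = np.linalg.lstsq(A, I, rcond=None)
    return a, b

img = np.outer(np.arange(16.0), np.ones(16))    # intensity grows with y
print(plane_fit_gradient(img, 8.0, 8.0))        # approx (0.0, 1.0)
```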
  82. 82. !83 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  83. 83. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  84. 84. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  85. 85. Qualitative Highlights: Image alignment Blue: bounding-box of the target region Red: bounding-box from bilinear sampling Green: bounding-box from our method Target image Bilinear sampling [14] Our method [Jiang et al., arXiv, 2019]
  86. 86. !87 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  87. 87. !88 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  88. 88. !89 [Jiang et al., arXiv, 2019] Linearized Multi-Sampling
  89. 89. Towards practical benchmarks Beyond !90 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  90. 90. Towards practical benchmarks Beyond !91 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  91. 91. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  92. 92. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  93. 93. Using data • Learn the limitations of your data • Understand how data is acquired • Identify where the mathematical elegance becomes impractical • Domain knowledge
  94. 94. Magnetic Resonance Imaging !96 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  95. 95. !97 [Jin et al., arXiv, 2019] Accelerated MRI [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  96. 96. Magnetic Resonance Imaging !98 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  97. 97. Magnetic Resonance Imaging !99 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; highlighted: the sampling pattern can be learned from data]
  98. 98. !100 [Jin et al., arXiv, 2019] Accelerated MRI [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  99. 99. !101 [Jin et al., arXiv, 2019] Accelerated MRI Learning both to acquire data and use data [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  100. 100. Magnetic Resonance Imaging !102 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction]
  101. 101. Magnetic Resonance Imaging !103 [Diagram: acquisitions → fixed, randomly chosen sampling → FT⁻¹ → Reconstruction; the sampling replaced by a Sampler (Deep Net)]
  102. 102. Magnetic Resonance Imaging !104 [Diagram: acquisitions → Sampler (Deep Net) → FT⁻¹ → Reconstructor (Deep Net)]
  103. 103. Magnetic Resonance Imaging !105 [Diagram: acquisitions → Sampler (Deep Net) → FT⁻¹ → Reconstructor (Deep Net); the sampling step is non-differentiable]
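To make the forward model concrete: a minimal NumPy sketch of the acquisition pipeline from these slides, with a toy phantom and a hypothetical sampling mask, reconstructing by zero-filled inverse FFT:

```python
import numpy as np

img = np.zeros((64, 64)); img[20:44, 20:44] = 1.0    # toy phantom
kspace = np.fft.fftshift(np.fft.fft2(img))           # full k-space acquisition

mask = np.zeros((64, 64), dtype=bool)                # hypothetical pattern:
mask[:, ::4] = True                                  #  every 4th column, plus
mask[:, 28:36] = True                                #  the low frequencies

# Zero-filled reconstruction: keep only the sampled k-space entries.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace * mask)))
print(f"mean reconstruction error: {np.abs(recon - img).mean():.4f}")
```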
  104. 104. Key Idea Self-supervision Reinforcement Learning !106
  105. 105. Key Idea !107
  106. 106. !108 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Decompose & Simplify • ReconNet learns to reconstruct • SampleNet learns to predict the next best sample position
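A conceptual sketch of that progressive loop, with hypothetical stand-ins for SampleNet and ReconNet (the real ones are trained deep networks): at each step the sampler scores the unacquired k-space columns, the best one is acquired, and the reconstructor runs on everything gathered so far.

```python
import numpy as np

def sample_net(mask, recon):
    """Hypothetical SampleNet stand-in: score unacquired k-space columns."""
    scores = np.random.rand(mask.shape[1])
    return np.where(mask.any(axis=0), -np.inf, scores)

def recon_net(kspace, mask):
    """Hypothetical ReconNet stand-in: zero-filled inverse FFT."""
    return np.abs(np.fft.ifft2(kspace * mask))

img = np.random.rand(64, 64)
kspace = np.fft.fft2(img)
mask = np.zeros((64, 64), dtype=bool)

for step in range(16):                      # progressively acquire 16 columns
    recon = recon_net(kspace, mask)
    best = int(np.argmax(sample_net(mask, recon)))
    mask[:, best] = True                    # "acquire" the chosen column
print(f"sampled {mask.any(axis=0).sum()} of {mask.shape[1]} columns")
```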
  107. 107. !109 [Jin et al., arXiv, 2019] Accelerated MRI Self-supervision through MCTS with implicit minimax Enhance via self-supervision • MCTS provides better direction • Supervision to improve, not ground truth
  108. 108. !110 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Self-supervision through MCTS with implicit minimax
  109. 109. !111 [Jin et al., arXiv, 2019] Accelerated MRI Performs best when using both components of our method together.
  110. 110. !112 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Performs best when using both components of our method together.
  111. 111. !113 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Neither does the learned reconstruction when used with other sampling patterns. Performs best when using both components of our method together.
  112. 112. !114 [Jin et al., arXiv, 2019] Accelerated MRI When reconstructing via a simple zero-filling inverse Fourier transform, learned sampling does not perform well. Neither does the learned reconstruction when used with other sampling patterns. Performs best when using both components of our method together.
  113. 113. !115 Accelerated MRI [Jin et al., arXiv, 2019] [Figure: sampling pattern in Fourier space; (reconstructed) image; residual]
  114. 114. Accelerated MRI [Jin et al., arXiv, 2019] [Figure: sampling pattern in Fourier space; (reconstructed) image; residual] !116
  115. 115. !117 [Jin et al., arXiv, 2019] Accelerated MRI Progressive sampling Self-supervision through MCTS with implicit minimax
  116. 116. Towards practical benchmarks Beyond !118 Towards less/no supervision Towards stable optimization Towards “active” data acquisition
  117. 117. Data !119 “make use of the best ally we have: the unreasonable effectiveness of data.” Alon Halevy, Peter Norvig, and Fernando Pereira, The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8-12. 2009
  118. 118. Thank you! People behind our research (in the order of appearance) Code and Datasets: https://github.com/vcg-uvic
