RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
Hyeongmin Lee
Image and Video Pattern Recognition LAB
Electrical and Electronic Engineering Dept, Yonsei University
6th Semester
PR-278
Content
 Datasets for Optical Flow
 Supervised Optical Flow Estimation
 Network Structure
 Training & Inference
 Results & Conclusion
Teed, Zachary, and Jia Deng. "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow." arXiv preprint arXiv:2003.12039 (2020). Some images and slides are from the authors' oral presentation of this paper.
Datasets for Optical Flow
 Flying Chairs
Ground-truth labels are hard to obtain → synthetic dataset
But it is limited in both generality and volume
 Flying Things 3D
 MPI-Sintel
 KITTI
Supervised Optical Flow Estimation
 PWC-Net [CVPR 2018]
• Pyramid
• Warping
• Cost Volume (Correlation Layer)
 Pyramid Structure
G_l: a CNN composed of 7 × 7 convolutions
w: backward warping module
• SpyNet [CVPR 2017]
 Warping (Backward Warping)
(Figure: input frames I1 and I2, and the flow field F12 between them)
(Figure: warped image Î2 compared with the target I2)
Î2(x) = I2(x + F12(x))
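The backward-warping equation above can be sketched in a few lines of numpy. This is a minimal illustration using nearest-neighbor sampling for clarity; real implementations (e.g. PWC-Net) use differentiable bilinear sampling. All names here are illustrative, not from the paper's code.

```python
import numpy as np

def backward_warp(img2, flow):
    """Warp frame 2 toward frame 1: warped(x) = img2(x + F12(x)).

    img2: (H, W) array; flow: (H, W, 2) array of (dx, dy) displacements.
    Nearest-neighbor sampling with border clamping, for simplicity.
    """
    H, W = img2.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Sampling coordinates in frame 2: x + F12(x)
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return img2[sy, sx]
```

For a constant flow of (1, 0), every pixel of the warped image is taken one column to the right in frame 2, which is exactly the "pull-back" behavior that makes backward warping hole-free.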
 Cost Volume (Correlation Layer)
• FlowNet [ICCV 2015; PR-214]
 Cost Volume (Correlation Layer)
(Figure: for each position x1 in frame 1, K × K patches are correlated with patches around x2 over a D × D search range in frame 2, turning two H × W feature maps into an H × W × D² cost volume)
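The FlowNet-style local cost volume can be sketched as follows. This is a simplified illustration (patch size K = 1, i.e. per-pixel dot products over a (2d+1) × (2d+1) search window); the function name and shapes are assumptions for this sketch, not the paper's API.

```python
import numpy as np

def local_cost_volume(f1, f2, d):
    """Local correlation: for each pixel of f1, dot products with f2
    over a (2d+1) x (2d+1) search window -> (H, W, (2d+1)**2) volume.

    f1, f2: (H, W, C) feature maps.
    """
    H, W, C = f1.shape
    D = 2 * d + 1
    # Zero-pad f2 so out-of-range displacements contribute zero
    f2p = np.pad(f2, ((d, d), (d, d), (0, 0)))
    cost = np.zeros((H, W, D * D))
    for i, dy in enumerate(range(-d, d + 1)):
        for j, dx in enumerate(range(-d, d + 1)):
            # f2 shifted by (dy, dx), aligned with f1
            shifted = f2p[d + dy:d + dy + H, d + dx:d + dx + W]
            cost[:, :, i * D + j] = (f1 * shifted).sum(-1)
    return cost
```

With identical inputs, the center channel (zero displacement) equals the squared feature norm at each pixel, which is a quick sanity check on the indexing.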
Network Structure
 Energy Minimization
"Two pixels connected by a flow vector have the same value."
Image I(x, y, t), flow (u, v)
• Optical Flow Constraint (brightness constancy):
I(x, y, t) = I(x + u, y + v, t + 1)
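A first-order Taylor expansion of the right-hand side of the brightness-constancy equation yields the classical linearized optical-flow constraint used in variational (energy-minimization) methods:

```latex
I(x+u,\,y+v,\,t+1) \;\approx\; I(x,y,t) + I_x u + I_y v + I_t
\quad\Longrightarrow\quad I_x u + I_y v + I_t = 0
```

Here I_x, I_y, I_t are the spatial and temporal image derivatives; one equation per pixel leaves the two unknowns (u, v) under-determined, which is why energy-minimization methods add a smoothness term.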
 Estimating Optical Flow by Iteration
 Overall Structure
Feature Extractor
Correlation Volume
GRU Structure
 Feature Extractor
• Extracts 256-channel features
• Fnet: extracts motion features (shared between frames 1 and 2)
• Cnet: extracts context features (from frame 1 only)
 Correlation Layer [Before Iter]
• Correlation
(Figure: all-pairs correlation. Each frame's feature map has shape H × W × D; the dot product between the feature at every position (i, j) of frame 1 and every position (k, l) of frame 2 gives a 4D correlation volume of shape H × W × H × W)
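The all-pairs correlation volume is a single tensor contraction, which numpy's `einsum` expresses directly. A minimal sketch, with illustrative names:

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """RAFT-style all-pairs correlation.

    f1, f2: (H, W, D) feature maps for frames 1 and 2.
    Returns C of shape (H, W, H, W) with
    C[i, j, k, l] = <f1[i, j], f2[k, l]>.
    """
    return np.einsum('ijd,kld->ijkl', f1, f2)
```

Unlike the local cost volume above, no search radius is fixed in advance: every pair of positions is correlated once, and range is handled later by the pyramid and lookup.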
• Correlation Pyramid
(Figure: correlation pyramid. The last two dimensions of the 4D volume are repeatedly downsampled by a factor of 2, while the first two keep full resolution)
C^k: [H × W × H/2^k × W/2^k]
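Building the pyramid by average-pooling only the last two dimensions can be sketched as below; the reshape-and-mean trick is a standard numpy idiom for 2× pooling (this assumes the pooled dimensions are divisible by 2 at every level).

```python
import numpy as np

def correlation_pyramid(C, num_levels=4):
    """Pool the LAST two dims of the 4D volume by 2x per level, so every
    level keeps full H x W resolution for frame 1.

    C: (H, W, H2, W2) volume; returns [C^0, C^1, ...] where C^k has
    shape (H, W, H2 / 2^k, W2 / 2^k).
    """
    pyramid = [C]
    for _ in range(num_levels - 1):
        H, W, h, w = pyramid[-1].shape
        # 2x2 average pooling over the frame-2 dimensions only
        p = pyramid[-1].reshape(H, W, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
        pyramid.append(p)
    return pyramid
```

Keeping the first two dimensions untouched is the point of the design: coarse levels widen the effective search range without losing where in frame 1 each correlation row belongs.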
 Correlation Layer [In Iter]
• Correlation Lookup (4D cost volume → 3D correlation feature)
(Figure: the current optical flow maps each pixel of frame 1 to a location in frame 2; a (2r+1) × (2r+1) grid around that location is sampled from each pyramid level, giving an H × W × (2r+1)² correlation feature per level)
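The lookup step for a single pyramid level can be sketched as follows. This illustration uses nearest-neighbor indexing and explicit loops for readability; RAFT samples the grid with bilinear interpolation, and all names here are assumptions of this sketch.

```python
import numpy as np

def correlation_lookup(C, flow, r):
    """Sample a (2r+1) x (2r+1) grid around x + flow(x) from one level
    of the correlation volume.

    C: (H, W, H, W) level-0 volume; flow: (H, W, 2) as (dx, dy).
    Returns an (H, W, (2r+1)**2) correlation feature.
    """
    H, W = C.shape[:2]
    out = np.zeros((H, W, (2 * r + 1) ** 2))
    for i in range(H):
        for j in range(W):
            # Center of the lookup window in frame 2
            cy = int(round(i + flow[i, j, 1]))
            cx = int(round(j + flow[i, j, 0]))
            idx = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y = min(max(cy + dy, 0), H - 1)
                    x = min(max(cx + dx, 0), W - 1)
                    out[i, j, idx] = C[i, j, y, x]
                    idx += 1
    return out
```

Because only this small grid is read at each iteration, the 4D volume is built once and then indexed cheaply, which is what makes the recurrent refinement affordable.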
 GRU [In Iter]
• ℎ𝑡𝑡: Hidden Unit of GRU
• 𝑥𝑥𝑡𝑡: Flow, Correlation Feature, Context Feature
f_{k+1} = f_k + Δf
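The recurrent update can be sketched with a plain fully connected GRU cell; RAFT uses a convolutional GRU, but the gating equations are identical. Everything here (weight names, the linear flow head `Wf`, the input function `x_of`) is an assumption of this sketch, not the paper's implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h, x, Wz, Wr, Wh):
    """One GRU step. h: (n_h,) hidden state; x: (n_x,) input built from
    flow, correlation feature, and context feature; biases omitted."""
    hx = np.concatenate([h, x])
    z = sigmoid(Wz @ hx)                                  # update gate
    r = sigmoid(Wr @ hx)                                  # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r * h, x]))    # candidate state
    return (1 - z) * h + z * h_tilde

def iterate_flow(f0, h0, x_of, steps, Wz, Wr, Wh, Wf):
    """Residual refinement: f_{k+1} = f_k + delta_f, where delta_f is
    decoded from the hidden state (here a simple linear head Wf)."""
    f, h = f0, h0
    for _ in range(steps):
        h = gru_step(h, x_of(f), Wz, Wr, Wh)
        f = f + Wf @ h                                    # delta_f = Wf h
    return f
```

The key design choice shown here is that each iteration predicts a residual Δf rather than the flow itself, so early iterations can make large corrections and later ones small ones.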
Training & Inference
 Training
 Inference
• Zero Initialization: f_t(x) = 0
• Warm Start: f_t(x + f_{t−1}(x)) = f_{t−1}(x), i.e., the previous frame's flow is forward-projected to initialize the next frame's flow
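The warm-start projection f_t(x + f_{t−1}(x)) = f_{t−1}(x) can be sketched as a forward splat of the previous flow. This simplified version rounds to the nearest pixel, leaves unprojected pixels at zero, and resolves collisions by last-writer-wins; a real implementation would handle these cases more carefully.

```python
import numpy as np

def warm_start(prev_flow):
    """Initialize the next frame's flow from the previous one:
    each pixel x carries its flow vector to x + f_{t-1}(x).

    prev_flow: (H, W, 2) as (dx, dy); returns an array of the same shape.
    """
    H, W, _ = prev_flow.shape
    new_flow = np.zeros_like(prev_flow)
    for i in range(H):
        for j in range(W):
            # Destination of pixel (i, j) under the previous flow
            x = int(round(j + prev_flow[i, j, 0]))
            y = int(round(i + prev_flow[i, j, 1]))
            if 0 <= x < W and 0 <= y < H:
                new_flow[y, x] = prev_flow[i, j]
    return new_flow
```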
Results & Conclusion
 End-Point Error (EPE)
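End-point error, the standard optical-flow metric, is simply the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch:

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Average end-point error between two (H, W, 2) flow fields:
    the mean per-pixel Euclidean distance of the flow vectors."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))
```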
 Performance Plots
 Conclusion
• In terms of performance, the results are very impressive.
• For a Best Paper, it is somewhat disappointing that the contributions are concentrated on engineering details and raw performance.
• Although the method is framed as optimization, apart from running multiple iterations there is little that truly qualifies as optimization.
• However, in a field like optical flow, where the input itself allows a degree of self-evaluation even without ground truth, performing inference through repeated iterations is both reasonable and novel.
Thank You!
