2. 4D VISION FOR DYNAMIC SCENE UNDERSTANDING
ARMIN MUSTAFA
What is 4D Vision?
Multi-view video → 3D → 4D
Spatio-temporally coherent models
3. Why 4D Vision?
Robotics
Computer Animation, Computer Graphics, Medical Imaging
Virtual Reality, Digital Media
4. Why 4D Vision?
4D Vision enables machine perception
o Autonomous machine perception for:
o Online video-rate content capture
o Tools for content production in films (e.g. automatic rotoscoping)
o Next-generation intelligent gaming
o VR/AR/MR (e.g. holoportation for general scenes, virtual tourism, immersive storytelling)
5. 4D Vision - Challenges
Input:
o Uncalibrated wide-baseline multi-views from static/moving cameras
o Challenging outdoor scenes:
o Large capture volume
o Natural scene backgrounds
o Uncontrolled illumination and repetitive texture
o Fast dynamic scene motion
6. 4D Vision - Challenges
o Temporally coherent reconstruction of complex dynamic scenes.
o Unknown background, structure and segmentation.
7. No prior information
Moving cameras
Multi-view scene
Framework
Object identification
Temporal coherence
4D scene reconstruction and segmentation
4D Vision – Overview
8. Contributions to 4D Vision
General Scene Reconstruction
Temporally Coherent Reconstruction
4D Light-field Video
Non-sequential Alignment
Semantic Reconstruction
9. General scene reconstruction (3D)
General Dynamic Scene Reconstruction from Multiple View Video.
A. Mustafa, H. Kim, J-Y. Guillemaut and A. Hilton
International Conference on Computer Vision (ICCV) 2015
Multi-scale Segmentation based Features for Wide-baseline Scene Reconstruction.
A. Mustafa, H. Kim and A. Hilton
IEEE Transactions on Image Processing (TIP) 2018
10. Existing methods - Problems
Segmentation Depth map
o Requires accurate segmentation of dynamic foreground objects
o Known background and structure
J-Y. Guillemaut and A. Hilton. Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications. IJCV 2010
11. Contributions
o Unsupervised dense reconstruction of general scenes without priors.
o Robust joint refinement of reconstruction and segmentation.
16. General dynamic reconstruction
A method to segment and reconstruct dynamic objects with improved quality
No prior information on background appearance or structure
17. Limitations
Per-frame inconsistent reconstruction and segmentation
The quality of results is far from perfect
18. Temporally coherent general scene reconstruction
Temporally coherent 4D reconstruction of complex dynamic scenes
A. Mustafa, H. Kim, J-Y. Guillemaut and A. Hilton
Conference on Computer Vision and Pattern Recognition (CVPR) 2016
19. Contributions
o Temporally coherent general scene reconstruction and segmentation.
o Improved joint refinement by introducing geodesic star-convexity.
20. Temporal coherence
Previous frame mesh, Optical flow, Dense temporal correspondence, Initial coarse reconstruction, Final mesh
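The slide only names the stages (previous frame mesh, optical flow, dense temporal correspondence). As a minimal sketch of how a flow field could carry points from one frame to the next, the snippet below advects 2D projections of mesh vertices with a precomputed flow. The function name `propagate_points`, the nearest-pixel lookup and the nested-list flow layout are illustrative assumptions, not the method from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]                   # (x, y) image position of a projected vertex
Flow = Sequence[Sequence[Tuple[float, float]]]  # flow[row][col] = (dx, dy), assumed precomputed


def propagate_points(points: Sequence[Point], flow: Flow) -> List[Point]:
    """Move each point by the flow vector stored at its nearest pixel.

    Points outside the image are clamped to the border pixel for the lookup.
    """
    height, width = len(flow), len(flow[0])
    moved = []
    for x, y in points:
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        moved.append((x + dx, y + dy))
    return moved


# Toy 2x2 flow field and two projected vertices from the previous frame.
toy_flow = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.5, 0.5), (-1.0, 0.0)],
]
previous_points = [(0.0, 0.0), (1.0, 1.0)]
current_points = propagate_points(previous_points, toy_flow)
```

The pairs `(previous_points[k], current_points[k])` then act as frame-to-frame correspondences; a real system would interpolate the flow and lift the matches back to 3D.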
22. Limitations
The segmentation and reconstruction quality is not perfect
What we want! What we get!
23. Semantically Coherent Co-segmentation and Reconstruction
Semantically coherent co-segmentation and reconstruction of dynamic scenes
A. Mustafa and A. Hilton
Conference on Computer Vision and Pattern Recognition (CVPR) 2017
24. Contributions
o Semantic co-segmentation and reconstruction of complex scenes
o Temporal semantic coherence across sequence
25. Framework – Semantically coherent reconstruction
Multi-view data
26. Framework – Semantically coherent reconstruction
Scene level: Multi-view data, Initial Semantic Segmentation
FCNs produce segmentations with poorly localized object boundaries
27. Framework – Semantically coherent reconstruction
Scene level: Multi-view data, Initial Semantic Segmentation, Sparse Reconstruction, Object Clustering
28. Framework – Semantically coherent reconstruction
Scene level: Multi-view data, Initial Semantic Segmentation, Sparse Reconstruction, Object Clustering
Object level: Initial Semantic 3D Reconstruction
29. Framework – Semantically coherent reconstruction
Scene level: Multi-view data, Initial Semantic Segmentation, Sparse Reconstruction, Object Clustering
Object level: Initial Semantic 3D Reconstruction, Semantic Tracklets
Introduce temporal and semantic coherence
30. Semantic tracklets
S(i,j) = \frac{1}{3N} \sum_{c=1}^{N} (A_m + S_m + L_m)
where N is the number of views,
A_m is the measure of appearance similarity,
L_m is the measure of class labels in the semantic segmentation region.
All frames with similarity > 0.75 are selected to form a semantic tracklet.
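The similarity score S(i,j) and the 0.75 selection rule above can be sketched in a few lines. This is an illustration under assumptions: each of the three per-view terms is taken to be a score in [0, 1], the third term S_m is treated as an opaque score because the slide does not define it, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One (A_m, S_m, L_m) triple per view; values assumed to lie in [0, 1].
ViewTerms = Tuple[float, float, float]


def frame_similarity(per_view_terms: Sequence[ViewTerms]) -> float:
    """S(i,j) = 1/(3N) * sum over the N views of (A_m + S_m + L_m)."""
    n_views = len(per_view_terms)
    if n_views == 0:
        raise ValueError("need at least one view")
    total = sum(a + s + l for a, s, l in per_view_terms)
    return total / (3.0 * n_views)


def select_tracklet(similarities: Sequence[float], threshold: float = 0.75) -> List[int]:
    """Indices of frames whose similarity is strictly above the threshold."""
    return [idx for idx, score in enumerate(similarities) if score > threshold]


# Two views of one frame pair, then a made-up list of per-frame scores.
pair_score = frame_similarity([(0.9, 0.8, 1.0), (0.7, 0.6, 1.0)])
tracklet_frames = select_tracklet([0.9, 0.75, 0.5, 0.8])
```

Dividing by 3N keeps the score in [0, 1] when every term is, which is what makes a fixed threshold such as 0.75 meaningful across sequences with different numbers of views.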
31. Framework – Semantically coherent reconstruction
Scene level: Multi-view data, Initial Semantic Segmentation, Sparse Reconstruction, Object Clustering
Object level: Initial Semantic 3D Reconstruction, Semantic Tracklets, Refinement
E(l,d) = δ Esemantic(l,d) + + + +
32. Framework – Semantically coherent reconstruction
Frame 11
Frame 42
Frame 26
Frame 56
33. Framework – Semantically coherent reconstruction
Multi-view data
Initial Semantic Segmentation
Semantically Coherent Segmentation
35. Results and Evaluation
Panels: Original videos, Semantic reconstruction, Semantic co-segmentation, Segmentation comparison
36. Results and Evaluation
Panels: Input videos, Semantic reconstruction, Semantic co-segmentation
37. Results and Evaluation
Panels: Input videos, Semantically coherent reconstruction, Semantic segmentation comparison
38. Semantically coherent reconstruction - Conclusions
o Semantic co-segmentation and reconstruction of dynamic scenes
o Temporal semantic coherence enforced by semantic tracklets
o Improved segmentation and reconstruction of dynamic scenes
Original Image, Frame 195
39. Results – 4D Reconstruction
Dance dataset, Juggler dataset
40. Limitations
Sequential alignment is prone to errors due to drift and large complex motions
Frame 26 Frame 86 Frame 86
What we want!
41. 4D match trees for non-sequential surface alignment
4D Match Trees for Non-rigid Surface Alignment
A. Mustafa, H. Kim and A. Hilton
European Conference on Computer Vision (ECCV) 2016
42. Contributions
o Robust global 4D alignment of partial reconstructions of non-rigid shape
o Sparse matching between wide-timeframe image pairs using SFD
o 4D Match Trees to represent the optimal non-sequential alignment path
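One way to picture a non-sequential alignment path is as a tree over frames in which each frame is aligned via its most similar already-aligned neighbour rather than via the previous frame in time. The sketch below assumes the tree is a minimum spanning tree over a pairwise frame dissimilarity matrix and builds it with Prim's algorithm. The slides do not state how the tree is built, so both the construction and the toy cost matrix are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def minimum_spanning_tree(cost: Sequence[Sequence[float]], root: int = 0) -> List[Optional[int]]:
    """Prim's algorithm on a dense symmetric cost matrix.

    Returns parent[v] for every frame v; the root has parent None.
    """
    n = len(cost)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent: List[Optional[int]] = [None] * n
    best[root] = 0.0
    for _ in range(n):
        # Pick the cheapest frame not yet attached to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v] = cost[u][v]
                parent[v] = u
    return parent


def path_from_root(parent: Sequence[Optional[int]], frame: int) -> List[int]:
    """Frames visited when aligning from the root to the given frame."""
    path = [frame]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


# Hypothetical dissimilarities between four frames (lower means easier to align).
toy_cost = [
    [0, 1, 4, 5],
    [1, 0, 3, 1],
    [4, 3, 0, 1],
    [5, 1, 1, 0],
]
tree = minimum_spanning_tree(toy_cost)
route_to_frame_2 = path_from_root(tree, 2)
```

In this toy case frame 2 is reached through frame 3 rather than directly after frame 1, which is the kind of reordering that avoids accumulating drift along the time axis.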
43. Results – Non-sequential alignment
Pipeline: Multi-view video + Surface, Wide-timeframe sparse matches, 4D Match Tree, Dense correspondence, 4D scene reconstruction
Frame 1
44. Results – Non-sequential alignment
Frame 1 Frame 1
45. 4D Light-field Video
4D Temporally Coherent Light-field Video
A. Mustafa, M. Volino, J-Y. Guillemaut and A. Hilton
International Conference on 3D Vision (3DV) 2017
46. ALIVE – 4D Light-field Video
• Address limitations of 360° video
• Introduce light-fields for immersive virtual experiences
“Kinch and the double world” – Figment Cinematic VR Experience
SIGGRAPH 2018 VR Festival
Film festivals (Raindance, Strasbourg)
47. 4D Light-field Video
• 4D Temporally coherent light-field video for dynamic scenes
• Light-field scene flow using Epipolar Plane Image
• Efficient light-field representations for live action VR
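For readers unfamiliar with Epipolar Plane Images, the sketch below shows the basic construction: take the same scanline from every camera in a row of views and stack them, so that a scene point traces a line whose slope reflects its disparity between neighbouring views. It assumes rectified, evenly spaced, horizontally aligned cameras and images stored as nested lists; the function names are hypothetical and the slides do not describe the actual camera rig or scene flow estimator.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # image[row][col], grayscale


def epipolar_plane_image(views: Sequence[Image], row: int) -> List[List[float]]:
    """Stack the same scanline from every view: epi[view][col]."""
    return [list(view[row]) for view in views]


def brightest_feature_slope(epi: Sequence[Sequence[float]]) -> float:
    """Horizontal shift in pixels per view step of the brightest pixel in each EPI row."""
    if len(epi) < 2:
        raise ValueError("need at least two views")
    columns = [max(range(len(line)), key=lambda c: line[c]) for line in epi]
    return (columns[-1] - columns[0]) / (len(epi) - 1)


# Three toy views in which one bright point shifts by one pixel per camera.
toy_views = [
    [[0, 0, 0, 0, 0], [0, 9, 0, 0, 0]],
    [[0, 0, 0, 0, 0], [0, 0, 9, 0, 0]],
    [[0, 0, 0, 0, 0], [0, 0, 0, 9, 0]],
]
toy_epi = epipolar_plane_image(toy_views, row=1)
toy_slope = brightest_feature_slope(toy_epi)
```

Tracking how such EPI lines change from one time instant to the next is one route to a motion estimate per light-field ray; the two-endpoint slope here is only the simplest possible stand-in for a proper line fit.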
48. 4D Light-field Video
Panels: Light-field video, Camera 2 video, Light-field scene flow, 4D light-field video
49. 4D Vision - Summary
I. Outdoor 3D and Scene Understanding
I. SFD features for wide-baseline reconstruction [3DV 2015, TIP 2018]
II. Unsupervised general scene reconstruction [ICCV 2015]
III. Semantic reconstruction and segmentation [CVPR 2017]
II. 3D video to 4D models
I. Temporally coherent general scene reconstruction [CVPR 2016]
II. Non-sequential alignment [ECCV 2016]
III. 4D light-field video for virtual reality [3DV 2017]
4D Vision - Spatio-temporally coherent models from video
50. Future work: 4D Vision for perceptive machines
Robust machine perception of general dynamic scenes from video
4D Vision for Perceptive Machines: Reconstruction, Registration, Machine Learning, Artificial Intelligence, Recognition