Armin mustafa talk_08.11.18_a_imeetup

ARMIN MUSTAFA
ROYAL ACADEMY OF ENGINEERING RESEARCH FELLOW
4D Vision for Dynamic Scene Understanding

What is 4D Vision?
Multi-view video → 3D → 4D
Spatio-temporally coherent models

Why 4D Vision?
Robotics
Computer Animation
Computer Graphics
Medical Imaging
Virtual Reality
Digital Media

Why 4D Vision?
4D Vision enables machine perception
o Autonomous machine perception for:
  o Online video-rate content capture
  o Tools for content production in films (e.g. automatic rotoscoping)
  o Intelligent next-generation gaming
  o VR/AR/MR (e.g. holoportation for general scenes, virtual tourism, immersive storytelling)

4D Vision - Challenges
Input:
o Uncalibrated wide-baseline multi-view video from static or moving cameras
o Challenging outdoor scenes:
  o Large capture volume
  o Natural scene backgrounds
  o Uncontrolled illumination and repetitive texture
  o Fast dynamic scene motion

4D Vision - Challenges
o Temporally coherent reconstruction of complex dynamic scenes.
o Unknown background, structure and segmentation.

4D Vision – Overview
No prior information
Moving cameras
Framework: Multi-view scene → Object identification → Temporal coherence → 4D scene reconstruction and segmentation

Contributions to 4D Vision
General Scene Reconstruction
Temporally Coherent Reconstruction
4D Light-field Video
Non-sequential Alignment
Semantic Reconstruction

General scene reconstruction (3D)
General Dynamic Scene Reconstruction from Multiple View Video.
A. Mustafa, H. Kim, J-Y. Guillemaut and A. Hilton
International Conference on Computer Vision (ICCV) 2015
Multi-scale Segmentation based Features for Wide-baseline Scene Reconstruction.
A. Mustafa, H. Kim and A. Hilton
IEEE Transactions on Image Processing (TIP) 2018

Existing methods - Problems
Segmentation Depth map
o Requires accurate segmentation of dynamic foreground objects
o Known background and structure
J-Y. Guillemaut and A. Hilton. Joint Multi-Layer Segmentation and Reconstruction for Free-Viewpoint Video Applications. IJCV 2010

Contributions
o Unsupervised dense reconstruction of general scenes without priors.
o Robust joint refinement of reconstruction and segmentation.

Framework – General scene reconstruction
Scene level: Multi-view data → Feature Detection → Feature Matching → Sparse Reconstruction
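The feature matching stage of a pipeline like this is often implemented as a nearest-neighbour search over descriptors with a ratio test. The sketch below shows that generic technique only; it is not the matcher used in the talk, and `match_features`, the `ratio` value and the toy descriptors are assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_features(desc_a: Sequence[Descriptor],
                   desc_b: Sequence[Descriptor],
                   ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Match descriptors from view A to view B (hypothetical helper).

    A match is kept only if the best candidate in B is clearly closer
    than the second best (ratio test), which rejects ambiguous matches
    such as those caused by repetitive texture.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Rank all candidates in B by Euclidean descriptor distance.
        ranked = sorted((dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


# Toy 2-D descriptors; real descriptors have many more dimensions.
view_a = [(0.0, 0.0), (10.0, 10.0)]
view_b = [(0.1, 0.0), (10.0, 10.1), (5.0, 5.0)]
matches = match_features(view_a, view_b)
```

Matched feature pairs across calibrated views would then be triangulated to give the sparse reconstruction; that step is omitted here.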

Scene level: Multi-view data → Feature Detection → Feature Matching → Sparse Reconstruction → Object Clustering
Sparse point cloud → Clustering
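One simple way to group a sparse point cloud into objects is Euclidean clustering: points closer than a chosen radius end up in the same cluster. The sketch below illustrates that idea with a union-find structure; the slides do not say which clustering method is used, so `cluster_points` and the `radius` parameter are assumptions.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_points(points: Sequence[Point], radius: float) -> List[List[int]]:
    """Group point indices into clusters of mutually reachable points.

    Two points are linked if they lie within `radius` of each other;
    clusters are the connected components of that graph.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair check; fine for a sketch, too slow for large clouds.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= radius:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
objects = cluster_points(cloud, radius=0.5)
```

Each resulting cluster would seed one object-level coarse reconstruction in the next stage.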

Scene level: Multi-view data → Feature Detection → Feature Matching → Sparse Reconstruction → Object Clustering
Object level: Initial coarse reconstruction

Multi-view data → Feature Detection → Feature Matching → Sparse Reconstruction → Object Clustering → Initial coarse reconstruction → Refinement
Refinement:
o Joint segmentation and reconstruction optimization
o Photo-consistency, smoothness and contrast constraints
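To make the three refinement constraints concrete, the sketch below evaluates a toy joint energy over segmentation labels and depth indices on a single row of pixels. It is an assumed formulation for illustration, not the energy from the paper: the function name `joint_energy`, the weights, `beta` and `d_max` are all hypothetical, and the optimisation that would minimise this energy is not shown.

```python
from math import exp
from typing import Sequence


def joint_energy(labels: Sequence[int],
                 depths: Sequence[int],
                 photo_cost: Sequence[Sequence[float]],
                 intensity: Sequence[float],
                 w_photo: float = 1.0,
                 w_smooth: float = 1.0,
                 w_contrast: float = 1.0,
                 beta: float = 1.0,
                 d_max: float = 2.0) -> float:
    """Toy energy for one pixel row (all parameters are assumptions).

    photo_cost[p][d] is the photo-consistency cost of giving pixel p
    the depth index d (low when the views agree at that depth).
    """
    # Photo-consistency: data cost of the chosen depth at each pixel.
    photo = sum(photo_cost[p][depths[p]] for p in range(len(depths)))

    smooth = 0.0
    contrast = 0.0
    for p in range(len(labels) - 1):
        q = p + 1
        if labels[p] == labels[q]:
            # Smoothness: truncated depth difference inside one segment.
            smooth += min(abs(depths[p] - depths[q]), d_max)
        else:
            # Contrast: a label boundary is cheap across a strong
            # intensity edge and expensive in a uniform region.
            contrast += exp(-beta * (intensity[p] - intensity[q]) ** 2)

    return w_photo * photo + w_smooth * smooth + w_contrast * contrast


costs = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]
flat = joint_energy([0, 0, 1], [0, 1, 1], costs, [0.0, 0.0, 0.0])
edge = joint_energy([0, 0, 1], [0, 1, 1], costs, [0.0, 0.0, 3.0])
```

The same labelling scores lower when the segment boundary coincides with an intensity edge, which is the behaviour a contrast term is meant to encourage.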

General dynamic reconstruction
o A method to segment and reconstruct dynamic objects with improved quality
o No prior information on background appearance or structure

Limitations
o Per-frame inconsistent reconstruction and segmentation
o The quality of results is far from perfect

Temporally coherent general scene reconstruction
Temporally coherent 4D reconstruction of complex dynamic scenes
A. Mustafa, H. Kim, J-Y. Guillemaut and A. Hilton
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016

Contributions
o Temporally coherent general scene reconstruction and segmentation.
o Improved joint refinement by introducing geodesic star-convexity.

Temporal coherence
Previous frame mesh
Optical flow
Dense temporal correspondence
Initial coarse reconstruction
Final mesh
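The role of optical flow in this stage can be illustrated by moving the image-space positions of the previous frame's mesh vertices along a dense flow field, which gives an initial correspondence for the current frame. This is a minimal sketch under stated assumptions: `propagate_points` is a hypothetical helper, the flow is assumed to be given as one `(dx, dy)` vector per pixel, and lookup is nearest-pixel rather than interpolated.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]


def propagate_points(points: Sequence[Vec2],
                     flow: Sequence[Sequence[Vec2]]) -> List[Vec2]:
    """Advect 2-D points by a dense flow field indexed as flow[row][col]."""
    height, width = len(flow), len(flow[0])
    moved = []
    for x, y in points:
        # Clamp to the image so points on the border still get a vector.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        moved.append((x + dx, y + dy))
    return moved


# A 2x2 flow field in which every pixel moves right by 1 and down by 0.5.
uniform_flow = [[(1.0, 0.5), (1.0, 0.5)],
                [(1.0, 0.5), (1.0, 0.5)]]
projected_vertices = [(0.0, 0.0), (1.0, 1.0)]
tracked = propagate_points(projected_vertices, uniform_flow)
```

In a full system the propagated positions from several views would be lifted back to 3D and used to deform the previous mesh towards the current frame.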

Input
4D Reconstruction

Limitations
o The segmentation and reconstruction quality is not perfect
What we want! What we get!

Semantically Coherent Co-segmentation and Reconstruction
Semantically coherent co-segmentation and reconstruction of dynamic scenes
A. Mustafa and A. Hilton
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017

Contributions
o Semantic co-segmentation and reconstruction of complex scenes
o Temporal semantic coherence across sequence

Framework – Semantically coherent reconstruction
Multi-view data

Scene level: Multi-view data → Initial Semantic Segmentation
Fully convolutional networks (FCNs) produce segmentations with poorly localized object boundaries

Scene level: Multi-view data → Initial Semantic Segmentation → Object Clustering
Sparse Reconstruction

Scene level: Multi-view data → Initial Semantic Segmentation → Object Clustering
Object level: Initial Semantic 3D Reconstruction

Scene level: Multi-view data → Initial Semantic Segmentation → Object Clustering
Object level: Initial Semantic 3D Reconstruction → Semantic Tracklets
Introduce temporal and semantic coherence

Semantic tracklets
$$ S(i,j) = \frac{1}{3N} \sum_{c=1}^{N} \left( A_m + S_m + L_m \right) $$
where N is the number of views,
A_m is the measure of appearance similarity,
L_m is the measure of class labels in the semantic segmentation region.
All frames with similarity > 0.75 are selected to form a semantic tracklet.
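The similarity score and the 0.75 threshold above can be sketched in a few lines. This is an illustration under assumptions: the per-view scores are taken to be already normalised to [0, 1], the names `ViewScores`, `frame_similarity` and `select_tracklet` are hypothetical, and S_m is carried through as an opaque term because the slide does not define it.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class ViewScores:
    appearance: float  # A_m: appearance similarity in one view
    s_term: float      # S_m: not defined on the slide, treated as given
    label: float       # L_m: class-label measure in the segmented region


def frame_similarity(per_view: Sequence[ViewScores]) -> float:
    """S(i, j): mean of the three measures, averaged over the N views."""
    n = len(per_view)
    if n == 0:
        raise ValueError("at least one view is required")
    total = sum(v.appearance + v.s_term + v.label for v in per_view)
    return total / (3 * n)


def select_tracklet(scores_by_frame: Dict[int, Sequence[ViewScores]],
                    threshold: float = 0.75) -> List[int]:
    """Return the frames whose similarity to the reference exceeds the threshold."""
    return [frame for frame, views in sorted(scores_by_frame.items())
            if frame_similarity(views) > threshold]


candidates = {
    3: [ViewScores(0.9, 0.8, 1.0), ViewScores(0.7, 0.6, 0.8)],
    5: [ViewScores(0.5, 0.5, 0.5), ViewScores(0.5, 0.5, 0.5)],
}
tracklet = select_tracklet(candidates)
```

Dividing by 3N keeps the score in [0, 1] when each measure is in [0, 1], which is what makes a fixed threshold such as 0.75 meaningful.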

Scene level: Multi-view data → Initial Semantic Segmentation → Object Clustering
Object level: Initial Semantic 3D Reconstruction → Semantic Tracklets → Refinement
E(l,d) = δ Esemantic(l,d) + + + +

Frame 11
Frame 42
Frame 26
Frame 56

Multi-view data
Initial Semantic Segmentation
Semantically Coherent Segmentation

Results and Evaluation
CVPR16 Proposed Input

Original videos Semantic reconstruction
Semantic co-segmentation Segmentation comparison

Input videos Semantic reconstruction
Semantic co-segmentation

Input videos Semantically coherent reconstruction
Semantic segmentation comparison

Semantically coherent reconstruction - Conclusions
o Semantic co-segmentation and reconstruction of dynamic scenes
o Temporal semantic coherence enforced by semantic tracklets
o Improved segmentation and reconstruction of dynamic scenes
Original Image
Frame 195

Results – 4D Reconstruction
Dance dataset Juggler dataset

Limitations
o Sequential alignment is prone to errors due to drift and large complex motions
Frame 26 Frame 86 Frame 86
What we want!

4D match trees for non-sequential surface alignment
4D Match Trees for Non-rigid Surface Alignment
A. Mustafa, H. Kim and A. Hilton
European Conference on Computer Vision (ECCV) 2016

Contributions
o Robust global 4D alignment of partial reconstructions of non-rigid shape
o Sparse matching between wide-timeframe image pairs using SFD
o 4D Match Trees to represent the optimal non-sequential alignment path
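A non-sequential alignment path can be represented as a tree over frames in which each frame is aligned from its most similar already-aligned frame instead of from its temporal predecessor. The sketch below builds such a tree as a minimum spanning tree with Prim's algorithm; the slides do not state how the 4D Match Tree is constructed, so the use of a minimum spanning tree, the dissimilarity matrix and the names `match_tree` and `alignment_path` are assumptions.

```python
import heapq
from typing import Dict, List, Optional, Sequence


def match_tree(dissimilarity: Sequence[Sequence[float]],
               root: int = 0) -> Dict[int, Optional[int]]:
    """Return {frame: parent frame} for a minimum spanning tree (Prim)."""
    n = len(dissimilarity)
    parent: Dict[int, Optional[int]] = {root: None}
    heap = [(dissimilarity[root][j], root, j) for j in range(n) if j != root]
    heapq.heapify(heap)
    while heap and len(parent) < n:
        _, i, j = heapq.heappop(heap)
        if j in parent:
            continue  # frame j was already reached by a cheaper edge
        parent[j] = i
        for k in range(n):
            if k not in parent:
                heapq.heappush(heap, (dissimilarity[j][k], j, k))
    return parent


def alignment_path(parent: Dict[int, Optional[int]], frame: int) -> List[int]:
    """Frames to traverse, root first, to align `frame`."""
    path = [frame]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


# Pairwise dissimilarity between four frames (lower means easier to align).
D = [[0, 1, 9, 2],
     [1, 0, 8, 7],
     [9, 8, 0, 3],
     [2, 7, 3, 0]]
tree = match_tree(D)
path_to_frame_2 = alignment_path(tree, 2)
```

In this toy example frame 2 is reached through frame 3 instead of frame 1, so the large change between frames 1 and 2 never has to be tracked directly.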

Results – Non-sequential alignment
Multi-view video + Surface → Wide-timeframe sparse matches → 4D Match Tree → Dense correspondence → 4D scene reconstruction
Frame 1

Results – Non-sequential alignment
Frame 1 Frame 1

4D Temporally Coherent Light-field Video
A. Mustafa, M. Volino, J-Y. Guillemaut and A. Hilton
International Conference on 3D Vision (3DV) 2017

ALIVE – 4D Light-field Video
• Address limitations of 360° video
• Introduce light-fields for immersive virtual experiences
“Kinch and the double world” – Figment Cinematic VR Experience
SIGGRAPH 2018 VR Festival
Film festivals (Raindance, Strasbourg)

• 4D Temporally coherent light-field video for dynamic scenes
• Light-field scene flow using Epipolar Plane Image
• Efficient light-field representations for live action VR
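An Epipolar Plane Image (EPI) stacks the same scanline from every camera in a horizontal array, so a scene point traces a line whose slope is its disparity between neighbouring views. The sketch below shows only that construction and a slope estimate for a single bright feature; it does not estimate scene flow, and `extract_epi`, `epi_slope` and the toy images are assumptions.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # image[row][col]


def extract_epi(views: Sequence[Image], row: int) -> List[List[float]]:
    """Stack scanline `row` from each view; result is indexed [view][col]."""
    return [list(view[row]) for view in views]


def epi_slope(epi: Sequence[Sequence[float]]) -> float:
    """Least-squares slope (pixels per view) of the brightest pixel's track.

    Assumes at least two views and one dominant feature on the scanline.
    """
    cols = [max(range(len(line)), key=line.__getitem__) for line in epi]
    n = len(cols)
    mean_k = (n - 1) / 2
    mean_c = sum(cols) / n
    num = sum((k - mean_k) * (c - mean_c) for k, c in enumerate(cols))
    den = sum((k - mean_k) ** 2 for k in range(n))
    return num / den


# Three 2x5 views; a bright point shifts one pixel right per camera.
views = [
    [[0, 9, 0, 0, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 9, 0, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 0, 9, 0], [0, 0, 0, 0, 0]],
]
epi = extract_epi(views, row=0)
disparity = epi_slope(epi)
```

Tracking how such EPI lines change between consecutive time instants is one route from per-frame depth towards motion, but that step is outside this sketch.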

Light-field video Camera 2 video
Light-field scene flow 4D light-field video

4D Vision - Summary
I. Outdoor 3D and Scene Understanding
   I. SFD features for wide-baseline reconstruction [3DV 2015, TIP 2018]
   II. Unsupervised general scene reconstruction [ICCV 2015]
   III. Semantic reconstruction and segmentation [CVPR 2017]
II. 3D video to 4D models
   I. Temporally coherent general scene reconstruction [CVPR 2016]
   II. Non-sequential alignment [ECCV 2016]
   III. 4D light-field video for virtual reality [3DV 2017]
4D Vision - Spatio-temporally coherent models from video

Future work: 4D Vision for perceptive machines
o Robust machine perception of general dynamic scenes from video
4D Vision for Perceptive Machines
Reconstruction
Registration
Machine Learning
Artificial Intelligence
Recognition
THANK YOU!
