ICRA 2015 interactive presentation

Semantic Octree: Unifying Recognition, Reconstruction and Representation via an Octree Constrained Higher Order MRF
Paul Sturgess* and Sunando Sengupta*
Oxford Brookes University
ICRA 2015
*Joint first authors, {paul.sturgess.cv,sunando.sengupta}@gmail.com

Semantic Octree
● Recognition
○ Structured prediction is widely adopted in vision: AHRF [1].
○ Efficiency of the output structure is not the focus.
● Reconstruction
○ Octrees are widely adopted in robotics: OctoMap [2].
○ Incorporating high-level semantic information is not the focus.
● Unifying representation
○ Complementary to recognition and reconstruction.
○ Efficient for further manipulation of the underlying data.
○ Combines OctoMap and AHRF to get the best of both.
[1] P. Kohli et al., Robust Higher Order Potentials for Enforcing Label Consistency.
[2] A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.

Recognition
● AHRF - Associative Higher-order Random Fields Framework.
● Multi-resolution approach to Semantic image segmentation.
● Efficient and bounded inference with alpha-expansion.

Reconstruction
● The main elements of an occupancy-based scene reconstruction are:
○ Occupied: objects present in the world.
○ Free: required for collision avoidance and path planning.
○ Unmapped: unknown areas of the scene, which need to be avoided.

Representation
• Efficient access to, and manipulation of, 3D object models is at the heart of robotics.
o Point clouds, meshes: cannot map free and unknown areas.
o Stixels/height maps/2.5D: one height value per 2D grid cell, so free space is not accurately mapped.
o Fixed-size voxel grids: voxels are not indexed, which makes access inefficient.
• Octree-based volumetric representation
o Represents 3D space accurately, with efficient indexing of the volume.
Image courtesy: A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.
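To illustrate what efficient indexing can mean for an octree, the minimal sketch below maps a 3D point to an integer voxel key and derives the key of an enclosing internal node by dropping low-order bits. The function names are hypothetical, and the 16-level depth and 8 cm leaf size are assumptions borrowed from the experiments slide; this is not OctoMap's API.

```python
import math

TREE_DEPTH = 16                      # assumed number of levels
RESOLUTION_M = 0.08                  # assumed leaf edge length (8 cm)
KEY_OFFSET = 1 << (TREE_DEPTH - 1)   # puts the world origin at the centre of the key range


def point_to_key(x, y, z):
    """Return the integer (kx, ky, kz) key of the leaf voxel containing a point."""
    key = tuple(math.floor(c / RESOLUTION_M) + KEY_OFFSET for c in (x, y, z))
    if not all(0 <= k < (1 << TREE_DEPTH) for k in key):
        raise ValueError("point lies outside the mapped volume")
    return key


def ancestor_key(key, levels_up):
    """Key of the internal node `levels_up` levels above a leaf (drops low-order bits)."""
    return tuple(k >> levels_up for k in key)


def child_index(key, level):
    """Which of the 8 children (0..7) is followed at `level` of the descent; level 0 is the root."""
    bit = TREE_DEPTH - 1 - level
    kx, ky, kz = key
    return ((kx >> bit) & 1) | (((ky >> bit) & 1) << 1) | (((kz >> bit) & 1) << 2)
```

Two points that fall in the same 16 cm cube share the same `ancestor_key(..., 1)`, which is the kind of grouping an internal node provides without any search over the voxels.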

Semantic Octree - framework
Chap 6, Sec 6.3
● Input stereo images.
● Generate point clouds and class hypotheses for every pixel.
● Fuse into an octree using the estimated camera poses.
○ Octree: each volume is subdivided into 8 sub-volumes.
○ Leaf nodes (xi) are the smallest voxels.
○ Any internal node (xc) gives a natural grouping of 3D space.
● Perform inference over the 3D voxels to give a labelled scene.
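The point-cloud step can be sketched with the standard pinhole relation for a rectified stereo pair, Z = f·B/d. The deck does not say which stereo method or calibration is used, so the function name and the focal length, baseline and principal point below are purely hypothetical.

```python
def disparity_to_point(u, v, disparity, focal_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) with a given disparity into camera coordinates.

    Assumes a rectified stereo pair and a pinhole camera model. Returns None
    when the disparity is not positive, i.e. no valid depth is available.
    """
    if disparity <= 0:
        return None
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (u - cx) * z / focal_px             # offset to the right of the optical centre
    y = (v - cy) * z / focal_px             # offset below the optical centre
    return (x, y, z)


# Hypothetical calibration values, for illustration only.
point = disparity_to_point(u=700, v=200, disparity=20.0,
                           focal_px=700.0, baseline_m=0.5, cx=600.0, cy=180.0)
```

Each such point would then be transformed by the estimated camera pose before being fused into the octree.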

CRF graph on Octree voxels
● The octree divides space into sub-volumes, indexed through a tree with nodes:
○ τint: internal nodes in the tree (xc)
○ τleaf: leaf-level voxels (xi)
● There is a random variable for every leaf voxel.
● Every internal node is associated with a set of leaf voxels, resulting in a clique.
● Label set defined as
● Final energy:
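The energy expression itself did not survive in this transcript. A form consistent with the two terms the deck describes, a unary score on every leaf voxel and a higher-order potential on every internal-node clique, would be the following; the symbols ψ_i and ψ_c are assumed notation, not taken from the slides.

```latex
E(\mathbf{x}) \;=\; \sum_{i \in \tau_{leaf}} \psi_i(x_i) \;+\; \sum_{c \in \tau_{int}} \psi_c(\mathbf{x}_c)
```

Here x_i is the label of leaf voxel i and x_c collects the labels of the leaf voxels under internal node c.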

Unary score for leaf voxels
Chap 6, Sec 6.3.1
● Octree volume update
○ All voxels are initially set to unknown, with occupancy probability P(xi) = 0.5 and log odds log[P(xi) / (1 - P(xi))] = 0.
○ For each 3D point (obtained from the stereo pairs), the voxels' log odds are updated in a ray-casting manner.
○ Log odds are updated for all 3D points of every stereo pair.
○ The final occupancy probability is obtained by inverting the accumulated log odds L(xi): P(xi) = 1 - 1 / (1 + exp(L(xi))).
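The volume update described on this slide can be sketched as additive log-odds bookkeeping. The hit and miss probabilities (0.7 and 0.4) are illustrative assumptions, not values given in the deck, and the list of voxels traversed by each ray is assumed to come from a ray-casting routine that is not shown.

```python
import math


def prob_to_log_odds(p):
    """Log odds of an occupancy probability (0.5 maps to 0)."""
    return math.log(p / (1.0 - p))


def log_odds_to_prob(log_odds):
    """Inverse of prob_to_log_odds."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


# Assumed sensor-model probabilities (not from the deck).
L_HIT = prob_to_log_odds(0.7)    # voxel containing the ray end point
L_MISS = prob_to_log_odds(0.4)   # voxel the ray passes through


def integrate_ray(log_odds, traversed_keys, end_key):
    """Update a dict {voxel_key: log_odds} for one ray.

    Voxels absent from the dict are unknown: log odds 0, i.e. P = 0.5.
    """
    for key in traversed_keys:
        log_odds[key] = log_odds.get(key, 0.0) + L_MISS
    log_odds[end_key] = log_odds.get(end_key, 0.0) + L_HIT
    return log_odds


# Two rays that pass through voxels 0, 1, 2 and end in voxel 3.
grid = {}
for _ in range(2):
    integrate_ray(grid, traversed_keys=[0, 1, 2], end_key=3)
p_end = log_odds_to_prob(grid[3])
p_free = log_odds_to_prob(grid[0])
```

After the two rays, the end voxel's probability rises above 0.5 and the traversed voxels fall below it, while any voxel never touched stays unknown.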

Unary score for leaf voxels
● Each occupied voxel xi is associated with a set of 3D points.
● The corresponding image pixels are denoted as
● The pixel scores are combined together
● Given the initial occupancy P(xi), the unary is given as:
● Thus voxels initially estimated as occupied incur a high cost for the free label, and vice versa.

Hierarchical tree potential
Chap 6, Sec 6.3.2
● Robust P^N potential applied over hierarchical groupings of voxels.
● Penalises label inconsistency within each grouping of voxels.
● Takes the form
● Maximum cost truncated to γmax.
● Groupings of voxels correspond to internal nodes in the octree.
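The exact form of the potential is missing from this transcript. As a hedged illustration, the sketch below implements a simplified, label-independent robust P^N cost in the style of Kohli et al.: the cost grows linearly with the number of voxels that disagree with the group's dominant label and is truncated at the maximum cost. The function name and the truncation parameter `truncation_q` are assumptions, and the authors' variant may differ.

```python
from collections import Counter


def robust_pn_cost(labels, gamma_max, truncation_q):
    """Simplified robust P^N cost for the labels of one voxel group.

    labels       -- labels of the leaf voxels under one internal node
    gamma_max    -- maximum (truncated) cost
    truncation_q -- number of disagreeing voxels at which the cost saturates
    """
    if not labels:
        return 0.0
    dominant_count = Counter(labels).most_common(1)[0][1]
    n_inconsistent = len(labels) - dominant_count
    return min(gamma_max, n_inconsistent * gamma_max / truncation_q)
```

For example, with `gamma_max=10.0` and `truncation_q=4`, a group of eight voxels with one outlier costs 2.5, while an even split between two labels already reaches the truncated cost of 10.0.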

Experiments
● Octree defined with 16 levels.
● Smallest voxel resolution = (8 × 8 × 8) cm^3.
● Maximum mapped volume: (2^16 × 8 cm)^3, i.e. a cube of side ≈ 5.24 km.
● Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 are considered.
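The arithmetic behind these figures can be checked with a few lines. The sketch assumes that level 0 is the root and level 16 holds the 8 cm leaves; that numbering convention is an assumption, not something stated on the slide.

```python
LEAF_SIZE_CM = 8     # smallest voxel edge, from the slide
TREE_DEPTH = 16      # number of levels, from the slide


def node_size_cm(level):
    """Edge length of a node at `level` (assumed: 0 = root, TREE_DEPTH = leaves)."""
    return LEAF_SIZE_CM * 2 ** (TREE_DEPTH - level)


side_km = node_size_cm(0) / 100_000      # cm -> km
volume_km3 = side_km ** 3
group_sizes_cm = {level: node_size_cm(level) for level in (13, 14, 15)}
```

Under these assumptions the root cube has a side of about 5.24 km, so the enclosed volume is about 144 km^3, and the groupings at levels 13, 14 and 15 are cubes of 64 cm, 32 cm and 16 cm respectively.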

Results
Chap 6, Sec 6.4
● Hierarchical grouping during inference vs. leaf-level voxel labelling (much sparser).

Results
● Quantitative evaluation:
○ Performed by projecting into the image domain.
[Figure/table: comparison with [1] and [2]]
● Observations
○ Small objects tend to get decimated due to octree quantization, while the mesh-based representation is better at representing surfaces.
[1] Sengupta et al., "Urban 3D semantic modelling using stereo vision," in ICRA, 2013.
[2] Valentin et al., "Mesh based semantic modelling for indoor and outdoor scenes," in CVPR, 2013.

Occupancy mapping
● Hierarchical grouping of voxels increases the occupied volume, reducing sparsity.

Conclusion
● Proposed a method that performs reconstruction in an efficient representation, aided by the semantics of the scene.
● Combined AHRF and OctoMap to get the best of both.
● Some Future Applications
○ Scene interaction and manipulation.
○ Collision detection, with known object types.
○ Path Planning with known affordances.
