PAUL STURGESS AND SUNANDO SENGUPTA
OXFORD BROOKES UNIVERSITY
ICRA 2015
Semantic Octree: Unifying Recognition, Reconstruction and Representation via an Octree Constrained Higher Order MRF
*Joint First Author, {paul.sturgess.cv,sunando.sengupta}@gmail.com
Semantic Octree
• Recognition
o Structured prediction is widely adopted in vision, e.g. AHRF [1].
o Efficiency of the output structure is not the focus.
• Reconstruction
o Octrees are widely adopted in robotics, e.g. OctoMap [2].
o Incorporating high-level semantic information is not the focus.
• Unifying representation
o Complementary to recognition and reconstruction.
o Efficient for further manipulation of the underlying data.
• Combine OctoMap and AHRF to get the best of both.
[1] P. Kohli et al., “Robust Higher Order Potentials for Enforcing Label Consistency.”
[2] A. Hornung et al., “OctoMap: An efficient probabilistic 3D mapping framework based on octrees.”
Recognition
● AHRF - Associative Higher-order Random Fields Framework.
● Multi-resolution approach to Semantic image segmentation.
● Efficient and bounded inference with alpha-expansion.
Reconstruction
• The main elements of an occupancy-based scene reconstruction are:
o Occupied: objects present in the world.
o Free: space required for collision avoidance and path planning.
o Unmapped: unknown areas of the scene that need to be avoided.
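As a minimal illustration of the three states above, the sketch below maps a per-voxel occupancy estimate to occupied, free or unmapped. It is hypothetical and not taken from the slides: `VoxelState` and `classify_voxel` are illustrative names, a voxel that was never observed is assumed to be stored as `None`, and the 0.5 threshold is an assumed default.

```python
from enum import Enum
from typing import Optional


class VoxelState(Enum):
    OCCUPIED = "occupied"  # objects present in the world
    FREE = "free"          # traversable space for collision avoidance / planning
    UNMAPPED = "unmapped"  # never observed; should be avoided


def classify_voxel(p_occupied: Optional[float], threshold: float = 0.5) -> VoxelState:
    """Map a voxel's occupancy probability to one of the three states.

    Hypothetical convention: None means the voxel was never observed.
    """
    if p_occupied is None:
        return VoxelState.UNMAPPED
    return VoxelState.OCCUPIED if p_occupied > threshold else VoxelState.FREE
```

Keeping "unmapped" distinct from "free" is the point of the design: a planner can treat unobserved space conservatively instead of assuming it is empty.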
Representation
• Efficient access to, and manipulation of, 3D object models is at the heart of robotics.
o Point clouds, meshes: cannot map free and unknown areas.
o Stixels/height maps/2.5D: only one height value per cell of a 2D grid, so free space is not mapped accurately.
o Fixed-size grid of voxels: voxels are not indexed hierarchically, which makes it inefficient.
• Octree-based volumetric representation
o Represents 3D space accurately, with efficient indexing of the volume.
Image courtesy: A. Hornung et al., “OctoMap: An efficient probabilistic 3D mapping framework based on octrees.”
Semantic Octree - framework
• Input stereo images
Chap 6, Sec 6.3
Semantic Octree - framework
• Generate point clouds and class hypotheses for every pixel.
Chap 6, Sec 6.3
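Point generation from a rectified stereo pair typically goes through the disparity-to-depth relation Z = f·B/d. The sketch below assumes a rectified pinhole pair with a hypothetical focal length `f` (pixels), baseline `B` (metres) and principal point `(cx, cy)`. The slides do not specify the stereo or labelling method, so pairing each 3D point with its source pixel's class scores is an illustrative assumption.

```python
from typing import Dict, List, Optional, Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, disparity: float,
                f: float, baseline: float, cx: float, cy: float) -> Optional[Point3]:
    """Back-project pixel (u, v) with a disparity into camera coordinates.

    Rectified stereo: Z = f * B / d, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    Returns None for non-positive disparity (no valid match).
    """
    if disparity <= 0:
        return None
    z = f * baseline / disparity
    return ((u - cx) * z / f, (v - cy) * z / f, z)


def labelled_point_cloud(disparities: List[List[float]],
                         class_scores: List[List[Dict[str, float]]],
                         f: float, baseline: float, cx: float, cy: float):
    """Pair every valid 3D point with the class hypothesis of its source pixel."""
    cloud = []
    for v, row in enumerate(disparities):
        for u, d in enumerate(row):
            p = backproject(u, v, d, f, baseline, cx, cy)
            if p is not None:
                cloud.append((p, class_scores[v][u]))
    return cloud
```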
Semantic Octree - framework
• Fuse into an octree using the estimated camera poses.
o Octree: each volume is subdivided into 8 sub-volumes.
o Leaf nodes (x_i) are the smallest voxels.
o Any internal node (x_c) gives a natural grouping of 3D space.
Chap 6, Sec 6.3
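A sketch of how a fused point can be routed into the octree, assuming the 16-level tree and 8 cm leaves given on the experiments slide. The names `to_world`, `leaf_key` and `child_path`, the pose convention p_w = R·p_c + t, and the key offset of 2^15 (centring the tree on the origin, in the style of OctoMap's integer keys) are all assumptions for illustration. Each level consumes one bit per axis, which is why every volume splits into exactly 8 sub-volumes.

```python
import math
from typing import List, Sequence, Tuple

TREE_DEPTH = 16                      # levels, as in the experiments slide
RESOLUTION = 0.08                    # leaf voxel edge in metres (8 cm)
KEY_OFFSET = 1 << (TREE_DEPTH - 1)   # centres the tree on the origin (assumption)


def to_world(p_cam: Sequence[float], R: Sequence[Sequence[float]],
             t: Sequence[float]) -> Tuple[float, ...]:
    """Apply an estimated camera pose (rotation R, translation t): p_w = R p_c + t."""
    return tuple(sum(R[r][c] * p_cam[c] for c in range(3)) + t[r] for r in range(3))


def leaf_key(p_world: Sequence[float]) -> Tuple[int, ...]:
    """Integer key of the leaf voxel containing a world point."""
    return tuple(int(math.floor(c / RESOLUTION)) + KEY_OFFSET for c in p_world)


def child_path(key: Tuple[int, ...]) -> List[int]:
    """Child index (0-7) chosen at each level, from the root down to the leaf.

    At level `lvl` the split uses bit (TREE_DEPTH - 1 - lvl) of each axis key:
    x contributes 1, y contributes 2, z contributes 4.
    """
    path = []
    for lvl in range(TREE_DEPTH):
        bit = TREE_DEPTH - 1 - lvl
        idx = (((key[0] >> bit) & 1)
               | (((key[1] >> bit) & 1) << 1)
               | (((key[2] >> bit) & 1) << 2))
        path.append(idx)
    return path
```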
Semantic Octree - framework
• Perform inference over the 3D voxels to give a labelled scene.
Chap 6, Sec 6.3
CRF graph on Octree voxels
• The octree divides space into sub-volumes, indexed through a tree with nodes:
o τ_int: internal nodes of the tree (x_c)
o τ_leaf: leaf-level voxels (x_i)
• A random variable for every leaf voxel.
• Every internal node is associated with a set of leaf voxels, resulting in a clique.
• Label set defined as:
• Final energy:
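The clique structure above can be read directly off integer voxel keys: two leaves belong to the same internal node at a given depth exactly when their keys agree after dropping the low-order bits. This is a sketch under assumptions: depth is counted from the root (depth 0) with leaves at depth 16, keys are non-negative integers per axis, and `ancestor`, `cliques_at_depth` and `hierarchical_cliques` are hypothetical names. The default levels 13-15 mirror the experiments slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Key = Tuple[int, int, int]
TREE_DEPTH = 16  # leaves at depth 16, root at depth 0 (assumed convention)


def ancestor(key: Key, depth: int) -> Key:
    """Key of the internal node at `depth` that contains the leaf `key`."""
    shift = TREE_DEPTH - depth
    return tuple(k >> shift for k in key)


def cliques_at_depth(leaf_keys: Iterable[Key], depth: int) -> Dict[Key, List[Key]]:
    """Group leaf voxels by their ancestor: one clique per internal node."""
    groups: Dict[Key, List[Key]] = defaultdict(list)
    for key in leaf_keys:
        groups[ancestor(key, depth)].append(key)
    return dict(groups)


def hierarchical_cliques(leaf_keys: Iterable[Key],
                         depths: Sequence[int] = (13, 14, 15)) -> List[List[Key]]:
    """Collect cliques from several internal levels of the octree."""
    keys = list(leaf_keys)
    cliques: List[List[Key]] = []
    for d in depths:
        cliques.extend(cliques_at_depth(keys, d).values())
    return cliques
```

Because the cliques at different levels are nested, every leaf appears in exactly one clique per level, which is what gives the potential its hierarchical structure.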
Unary score for leaf voxels
• Octree volume update
o All voxels are initially set to unknown, with occupancy probability P(x_i) = 0.5, i.e. log odds L(x_i) = log(P(x_i) / (1 - P(x_i))) = 0.
o For each 3D point (obtained from the stereo pairs), the voxels' log odds are updated by ray casting.
o Log odds are updated for all 3D points of every stereo pair.
o Final occupancy probability obtained as P(x_i) = 1 - 1 / (1 + exp(L(x_i))).
Chap 6, Sec 6.3.1
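The update above follows an OctoMap-style log-odds scheme: voxels traversed by the ray from the camera to a 3D point receive a "miss" update, and the voxel containing the point receives a "hit" update. The sketch below is simplified and uses a dictionary-backed grid. The increments `L_HIT`/`L_MISS`, the clamping bounds, and the half-voxel sampling of the ray are illustrative assumptions; an exact voxel traversal (3D DDA) would replace the sampling in practice.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Key = Tuple[int, int, int]
RESOLUTION = 0.08                 # 8 cm leaves
L_HIT = math.log(0.7 / 0.3)       # illustrative log-odds increment for a hit
L_MISS = math.log(0.4 / 0.6)      # illustrative (negative) increment for a miss
L_MIN, L_MAX = -2.0, 3.5          # illustrative clamping bounds


def key_of(p: Sequence[float]) -> Key:
    return tuple(int(math.floor(c / RESOLUTION)) for c in p)


def ray_keys(origin: Sequence[float], end: Sequence[float]) -> Set[Key]:
    """Voxels crossed between origin and end, excluding the end voxel.

    Approximated by sampling the segment every half voxel (a simplification).
    """
    steps = max(1, int(math.dist(origin, end) / (RESOLUTION / 2)))
    end_key = key_of(end)
    keys = set()
    for s in range(steps):
        t = s / steps
        k = key_of([o + t * (e - o) for o, e in zip(origin, end)])
        if k != end_key:
            keys.add(k)
    return keys


def integrate_point(log_odds: Dict[Key, float], origin, point) -> None:
    """Update the map for one 3D point: misses along the ray, a hit at the end."""
    for k in ray_keys(origin, point):
        # Unknown voxels start at log odds 0, i.e. P = 0.5.
        log_odds[k] = max(L_MIN, log_odds.get(k, 0.0) + L_MISS)
    k = key_of(point)
    log_odds[k] = min(L_MAX, log_odds.get(k, 0.0) + L_HIT)


def occupancy(log_odds: Dict[Key, float], k: Key) -> float:
    """P(x_i) = 1 - 1 / (1 + exp(L(x_i))); unseen voxels return 0.5."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds.get(k, 0.0)))
```

Clamping keeps a voxel's log odds bounded, so a voxel that was observed many times can still change state when the scene changes.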
Unary score for leaf voxels
• Each occupied voxel x_i is associated with a set of 3D points.
• The corresponding image pixels are denoted as:
• The pixel scores are combined.
• Given the initial occupancy P(x_i), the unary is given as:
• Thus, every voxel initially estimated as occupied has a high cost for the free label, and vice versa.
Chap 6, Sec 6.3.1
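The exact unary from the slide is not recoverable here, so the following is only a hypothetical form that has the stated property. Pixel class scores are averaged over the pixels linked to the voxel (an assumption). The free label costs -log(1 - P(x_i)), and each semantic label costs -log(P(x_i) · score). `combine_pixel_scores`, `voxel_unary` and `EPS` are illustrative names, not the authors' definitions.

```python
import math
from typing import Dict, List

EPS = 1e-9  # avoids log(0)


def combine_pixel_scores(pixel_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-class scores of all pixels whose 3D points fall in the voxel."""
    classes = pixel_scores[0].keys()
    n = len(pixel_scores)
    return {c: sum(s[c] for s in pixel_scores) / n for c in classes}


def voxel_unary(p_occ: float, pixel_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical unary costs for one leaf voxel (lower cost = preferred label).

    free:           -log(1 - P(x_i))
    semantic class: -log(P(x_i) * combined pixel score)
    """
    costs = {"free": -math.log(max(1.0 - p_occ, EPS))}
    for c, s in combine_pixel_scores(pixel_scores).items():
        costs[c] = -math.log(max(p_occ * s, EPS))
    return costs
```

With P(x_i) = 0.9 the free label becomes expensive and the best semantic class wins. With P(x_i) = 0.1 the free label is cheapest.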
Hierarchical tree potential
• Robust P^N potential applied over hierarchical groupings of voxels.
o Penalises label inconsistency within each grouping of voxels.
o Takes the form:
o Maximum cost truncated to γ_max.
o Groupings of voxels correspond to internal nodes in the octree.
Chap 6, Sec 6.3.2
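Since the slide's equation is missing, this sketch follows the general Robust P^N definition from Kohli et al. rather than the exact parameterisation used in the talk: ψ_c(x_c) = min( min_k (N_k(x_c)·θ_k + γ_k), γ_max ), where θ_k = (γ_max - γ_k)/Q and N_k is the number of voxels in the clique not taking label k. The truncation parameter `Q`, the per-label costs `gamma`, and the function name `robust_pn` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


def robust_pn(labels: Sequence[Hashable], label_set: Iterable[Hashable],
              gamma_max: float, Q: float,
              gamma: Optional[Dict[Hashable, float]] = None) -> float:
    """Robust P^N cost of one clique (a grouping of leaf voxels).

    The cost grows linearly with the number of voxels disagreeing with label k
    (slope theta_k = (gamma_max - gamma_k) / Q) and is truncated at gamma_max.
    """
    if Q <= 0:
        raise ValueError("Q must be positive")
    gamma = gamma or {}
    n = len(labels)
    counts = Counter(labels)
    best = gamma_max
    for k in label_set:
        g_k = gamma.get(k, 0.0)
        theta_k = (gamma_max - g_k) / Q
        disagree = n - counts.get(k, 0)
        best = min(best, disagree * theta_k + g_k)
    return best
```

The truncation is what makes the potential "robust": a few mislabelled voxels raise the cost gradually, but a genuinely mixed grouping (for example, an internal node straddling two objects) is never penalised by more than γ_max.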
Experiments
• Octree defined with 16 levels.
• Smallest voxel resolution: 8 cm × 8 cm × 8 cm.
• Maximum mapped volume: (2^16 × 8 cm)^3 ≈ (5.24 km)^3.
• Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 are considered.
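The experiment settings can be checked with a little arithmetic. A 16-level octree with 8 cm leaves spans 2^16 × 8 cm = 5242.88 m per side, so the mapped cube is about 144 km^3. The sketch also lists the size of the internal nodes used for grouping, assuming depth is counted from the root (depth 0) so that leaves sit at depth 16. The slides do not state this convention.

```python
TREE_DEPTH = 16        # octree levels
LEAF_EDGE_M = 0.08     # 8 cm leaf voxels

# Edge length of the whole mapped cube and its volume.
edge_m = (2 ** TREE_DEPTH) * LEAF_EDGE_M   # 5242.88 m, i.e. ~5.24 km per side
volume_km3 = (edge_m / 1000.0) ** 3        # ~144.1 km^3


def node_edge_m(depth: int) -> float:
    """Edge length of a node at `depth` (root = 0, leaves = 16; assumed convention)."""
    return LEAF_EDGE_M * 2 ** (TREE_DEPTH - depth)


def leaves_per_node(depth: int) -> int:
    """Number of leaf voxels grouped under one node at `depth`."""
    return 8 ** (TREE_DEPTH - depth)


for d in (13, 14, 15):
    print(d, round(node_edge_m(d), 2), leaves_per_node(d))
# 13 0.64 512
# 14 0.32 64
# 15 0.16 8
```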
Results
• Hierarchical grouping during inference vs. leaf-level voxel labelling (much sparser).
Chap 6, Sec 6.4
Results
• Quantitative evaluation:
o Performed by projecting into the image domain.
• Observations:
o Small objects tend to get decimated due to octree quantization, whereas a mesh-based representation is better at representing surfaces.
• Comparison against [1] and [2].
[1] S. Sengupta et al., “Urban 3D semantic modelling using stereo vision,” in ICRA, 2013.
[2] J. Valentin et al., “Mesh based semantic modelling for indoor and outdoor scenes,” in CVPR, 2013.
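The evaluation protocol is only summarised as "projecting into the image domain". The sketch below shows one plausible way to do it, under stated assumptions. Labelled voxel centres are projected through a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and a world-to-camera pose p_cam = R·p_world + t. A z-buffer keeps the nearest voxel per pixel, and the result is scored as pixel accuracy against a ground-truth label map. All function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def project(p_world: Vec3, R: Sequence[Sequence[float]], t: Vec3,
            fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[int, int, float]]:
    """World point -> (u, v, depth) with p_cam = R p_world + t; None if behind the camera."""
    x, y, z = (sum(R[r][c] * p_world[c] for c in range(3)) + t[r] for r in range(3))
    if z <= 0:
        return None
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy)), z


def render_labels(voxels: List[Tuple[Vec3, str]], R, t, fx, fy, cx, cy,
                  width: int, height: int) -> Dict[Pixel, str]:
    """Z-buffered label image: each pixel keeps the label of the nearest voxel centre."""
    depth: Dict[Pixel, float] = {}
    labels: Dict[Pixel, str] = {}
    for centre, label in voxels:
        hit = project(centre, R, t, fx, fy, cx, cy)
        if hit is None:
            continue
        u, v, z = hit
        if 0 <= u < width and 0 <= v < height and z < depth.get((u, v), float("inf")):
            depth[(u, v)] = z
            labels[(u, v)] = label
    return labels


def pixel_accuracy(pred: Dict[Pixel, str], gt: Dict[Pixel, str]) -> float:
    """Fraction of pixels covered by both maps whose labels agree."""
    common = [px for px in pred if px in gt]
    if not common:
        return 0.0
    return sum(pred[px] == gt[px] for px in common) / len(common)
```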
Occupancy mapping
• Grouping voxels hierarchically increases the occupied volume, reducing sparsity.
Conclusion
● Proposed a method that performs reconstruction in an efficient representation, aided by the semantics of the scene.
● Combined AHRF and OctoMap to get the best of both.
● Some future applications:
○ Scene interaction and manipulation.
○ Collision detection with known object types.
○ Path planning with known affordances.

ICRA 2015 interactive presentation
