I gave this talk at the Machine Vision seminar at Jacobs University. I presented the state of the art in 3D point cloud classification and described the approach of X. Xiong et al. from a paper published in 2010.
YOLO (You Only Look Once) is a real-time object detection system that frames object detection as a regression problem. It uses a single neural network that predicts bounding boxes and class probabilities directly from full images in one evaluation. This approach allows YOLO to process images at 45 frames per second while maintaining high accuracy compared to previous systems. YOLO was trained on natural images from PASCAL VOC and can generalize to new domains like artwork without significant degradation in performance, unlike other methods that struggle with domain shift.
This document discusses the real-time object detection method YOLO (You Only Look Once). YOLO divides an image into grids and predicts bounding boxes and class probabilities for each grid cell. It sees the full image at once rather than using a sliding window approach. This allows it to detect objects in one pass of the neural network, making it very fast compared to other methods. YOLO is also accurate, achieving a high mean average precision. However, it can struggle to precisely localize small objects and objects that appear in dense groups.
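The grid prediction described above fixes the size of YOLOv1's output tensor. A quick sketch using the settings from the original paper (S=7 grid, B=2 boxes per cell, C=20 PASCAL VOC classes):

```python
# YOLOv1 output layout: an S x S grid where each cell predicts
# B boxes (x, y, w, h, confidence) plus C conditional class probabilities.
S, B, C = 7, 2, 20

per_cell = B * 5 + C       # 2*5 + 20 = 30 values per grid cell
output_size = S * S * per_cell

print(per_cell)     # 30
print(output_size)  # 1470
```

This is why YOLOv1's final layer is a 7x7x30 tensor: the whole detection problem is flattened into a single fixed-size regression target.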
YOLO releases are one-stage object detection models that predict bounding boxes and class probabilities in an image using a single neural network. YOLO v1 divides the image into a grid and predicts bounding boxes and confidence scores for each grid cell. YOLO v2 improves on v1 with anchor boxes, batch normalization, and a Darknet-19 backbone network. YOLO v3 uses a Darknet-53 backbone, multi-scale feature maps, and a logistic classifier to achieve better accuracy. The YOLO models aim to perform real-time object detection with high accuracy while remaining fast and unified end-to-end models.
This document discusses the YOLO object detection algorithm and its applications in real-time object detection. YOLO frames object detection as a regression problem to predict bounding boxes and class probabilities in one pass. It can process images at 30 FPS. The document compares YOLO versions 1-3 and their improvements in small object detection, resolution, and generalization. It describes implementing YOLO with OpenCV and its use in self-driving cars due to its speed and contextual awareness.
Object detection is an important computer vision technique with applications in several domains such as autonomous driving, personal and industrial robotics. The below slides cover the history of object detection from before deep learning until recent research. The slides aim to cover the history and future directions of object detection, as well as some guidelines for how to choose which type of object detector to use for your own project.
An introduction to selective search for object proposals, deep dives into the R-CNN family and the state-of-the-art RetinaNet model for object detection, the mAP concept for evaluating models, and how anchor boxes help the model learn where to draw bounding boxes.
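Both the mAP evaluation and anchor-box matching mentioned above rest on intersection-over-union (IoU). A minimal sketch for axis-aligned boxes:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Overlap rectangle: max of the top-left corners, min of the bottom-right.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~= 0.143
```

During evaluation, a detection counts as a true positive only when its IoU with a ground-truth box exceeds a threshold (commonly 0.5); anchor boxes are matched to ground truth the same way during training.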
You Only Look Once: Unified, Real-Time Object Detection (DADAJONJURAKUZIEV)
YOLO is a new approach to object detection: a single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
YOLO is an end-to-end, real-time object detection system that uses a single convolutional neural network to predict bounding boxes and class probabilities directly from full images. It uses the deeper Darknet-53 backbone network and multi-scale predictions to achieve state-of-the-art accuracy while running faster than other algorithms. YOLO is trained on a merged ImageNet and COCO dataset and predicts bounding boxes using predefined anchor boxes and associated class probabilities at three different scales to localize and classify objects in images with just one pass through the network.
Deep learning based object detection basics (Brodmann17)
The document discusses different approaches to object detection in images using deep learning. It begins with describing detection as classification, where an image is classified into categories for what objects are present. It then discusses approaches that involve separating detection into a classification head and localization head. The document also covers improvements like R-CNN which uses region proposals to first generate candidate object regions before running classification and bounding box regression on those regions using CNN features. This helps address issues with previous approaches like being too slow when running the CNN over the entire image at multiple locations and scales.
This document describes improvements made to the YOLO object detection system, including batch normalization, fine-tuning the classifier at high resolution, k-means clustering of bounding boxes, direct location prediction, fine-grained feature concatenation, multi-scale training, and replacing the last convolutional layer with additional convolutional layers. It also introduces YOLO9000, which can detect over 9000 object categories using a hierarchical classification approach that maps classes to concepts in a WordNet tree to merge datasets.
YOLO v2 improves upon YOLO v1 along three axes:
1. Better: batch normalization, a high-resolution classifier, anchor boxes selected by dimension clustering, direct location prediction, fine-grained features, and multi-scale training improve accuracy.
2. Faster: the lightweight Darknet-19 backbone and a 416x416 input size keep inference fast.
3. Stronger: joint training with a hierarchical classification tree (YOLO9000) extends detection to over 9000 object categories.
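The dimension clustering mentioned above can be sketched as k-means over box widths and heights, using 1 - IoU as the distance so that large and small boxes are treated fairly. A toy pure-Python sketch, not the paper's implementation:

```python
import random

def iou_wh(wh, anchor):
    """IoU of two boxes that share a top-left corner (width/height only)."""
    w, h = min(wh[0], anchor[0]), min(wh[1], anchor[1])
    inter = w * h
    return inter / (wh[0] * wh[1] + anchor[0] * anchor[1] - inter)

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """YOLOv2-style dimension clustering: distance = 1 - IoU."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[best].append(b)
        # Move each anchor to the mean width/height of its group.
        centers = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

anchors = kmeans_anchors([(10, 12), (11, 10), (50, 60), (55, 58)], k=2)
```

On this toy data the two anchors settle near the small-box and large-box cluster means; YOLOv2 runs the same procedure over all training-set boxes with k=5.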
SSD is a single shot detector model that uses multiple feature maps from different layers to detect objects at different scales. It directly predicts bounding boxes and class probabilities using convolutional layers, unlike previous models that separated classification and regression. SSD achieves accuracy comparable to state-of-the-art models while running in real-time by using default bounding boxes of different aspect ratios on feature maps to predict offsets for object detection.
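The default boxes of different aspect ratios described above can be sketched for a single feature map. The scale and aspect ratios below are illustrative, not SSD's exact published configuration:

```python
from math import sqrt

def default_boxes(fmap_size, scale, aspect_ratios):
    """Default (anchor) boxes for one SSD feature map, in relative coords.

    One box per aspect ratio is centered in every cell of the grid;
    each box is returned as (cx, cy, w, h).
    """
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # sqrt keeps the box area fixed while varying the shape.
                boxes.append((cx, cy, scale * sqrt(ar), scale / sqrt(ar)))
    return boxes

boxes = default_boxes(4, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))  # 4 * 4 * 3 = 48
```

SSD repeats this over several feature maps with increasing scales, so early high-resolution maps catch small objects and late coarse maps catch large ones.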
This document provides an overview of the YOLO object detection system. YOLO frames object detection as a single regression problem to predict bounding boxes and class probabilities in one step. It divides the image into a grid where each cell predicts bounding boxes and conditional class probabilities. YOLO is very fast, processing images in real-time. However, it struggles with small objects and localization accuracy compared to methods like Fast R-CNN that have a region proposal step. Combining YOLO with Fast R-CNN can improve performance by leveraging their individual strengths.
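The conditional class probabilities described above combine with the per-box confidence at test time to give class-specific scores. A minimal sketch:

```python
def class_scores(class_probs, box_conf):
    """YOLO test-time score per class: Pr(class_i | object) * Pr(object) * IOU.

    box_conf is the predicted box confidence, which already encodes
    Pr(object) * IOU, so the product gives a class-specific confidence.
    """
    return [p * box_conf for p in class_probs]

# Hypothetical cell: three classes, one box with confidence 0.9.
scores = class_scores([0.1, 0.7, 0.2], box_conf=0.9)
print(scores)  # [0.09, 0.63, 0.18]
```

Detections are then thresholded on these scores and deduplicated with non-maximum suppression.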
#6 PyData Warsaw: Deep learning for image segmentation (Matthew Opala)
Deep learning techniques ignited great progress in many computer vision tasks like image classification, object detection, and segmentation. Almost every month a new method is published that achieves state-of-the-art results on some common benchmark dataset. In addition, DL is being applied to new problems in CV.
In the talk we're going to focus on DL applied to the image segmentation task. We want to show the practical importance of this task for the fashion industry by presenting our case study, with results achieved through various attempts and methods.
Object Detection using Deep Neural Networks (Usman Qayyum)
A recent talk at the PI School covering the following contents:
Object Detection
Recent Architecture of Deep NN for Object Detection
Object Detection on Embedded Computers (or for edge computing)
SqueezeNet for embedded computing
TinySSD (object detection for edge computing)
DeconvNet, DecoupledNet, TransferNet in Image Segmentation (NamHyuk Ahn)
The document discusses three neural network models for semantic segmentation: DeconvNet, DecoupledNet, and TransferNet. DeconvNet uses deconvolution layers to generate dense pixel-wise segmentation maps from convolutional features. DecoupledNet is designed for semi-supervised learning, using separate networks for classification and binary segmentation with bridging layers. TransferNet introduces an attention model to enable transferring a segmentation model trained on one dataset to a different dataset with new classes.
(1) YOLO frames object detection as a single regression problem to predict bounding boxes and class probabilities directly from full images in one step. (2) It resizes images as input to a convolutional network that outputs a grid of predictions with bounding box coordinates, confidence, and class probabilities. (3) YOLO achieves real-time speeds while maintaining high average precision compared to other detection systems, with most errors coming from inaccurate localization rather than predicting background or other classes.
PR-132: SSD: Single Shot MultiBox Detector (Jinwon Lee)
SSD is a single-shot object detector that processes the entire image at once, rather than proposing regions of interest. It uses a base VGG16 network with additional convolutional layers to predict bounding boxes and class probabilities at multiple scales simultaneously. SSD achieves state-of-the-art accuracy while running significantly faster than two-stage detectors like Faster R-CNN. It introduces techniques like default boxes, hard negative mining to address class imbalance, and data augmentation to improve results on small objects. On PASCAL VOC 2007, SSD detects objects at 59 FPS with 74.3% mAP, comparable in accuracy to Faster R-CNN but much faster.
This document discusses object detection using the Single Shot Detector (SSD) algorithm with the MobileNet V1 architecture. It begins with an introduction to object detection and a literature review of common techniques. It then describes the basic architecture of convolutional neural networks and how they are used for feature extraction in SSD. The SSD framework uses multi-scale feature maps for detection and convolutional predictors. MobileNet V1 reduces model size and complexity through depthwise separable convolutions. This allows SSD with MobileNet V1 to perform real-time object detection with reduced parameters and computations compared to other models.
PR-207: YOLOv3: An Incremental Improvement (Jinwon Lee)
YOLOv3 makes the following incremental improvements over previous versions of YOLO:
1. It predicts bounding boxes at three different scales to detect objects more accurately at a variety of sizes.
2. It uses Darknet-53 as its feature extractor, which achieves accuracy comparable to ResNet backbones while being faster to evaluate.
3. It predicts more bounding boxes overall (over 10,000) to detect objects more precisely, as compared to YOLOv2 which predicts around 800 boxes.
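The box counts above follow directly from the three grid sizes. A quick check, assuming the standard 416x416 input:

```python
# YOLOv3 predicts 3 boxes per cell at three scales. The strides 32, 16, 8
# turn a 416x416 input into 13x13, 26x26, and 52x52 grids.
total = sum(3 * (416 // stride) ** 2 for stride in (32, 16, 8))
print(total)  # 3*(169 + 676 + 2704) = 10647
```

That is the "over 10,000 boxes" figure: most are rejected by confidence thresholding and non-maximum suppression at test time.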
This document discusses semantic image segmentation with deep learning. It begins by defining semantic segmentation as classifying each pixel in an image. Convolutional neural networks (CNNs) can be used for pixel-wise prediction but do not capture spatial context. Conditional random fields (CRFs) can model contextual information but are typically applied as a post-processing step. The document proposes a method called CRF-RNN that integrates CRFs into CNNs by treating mean-field inference as a recurrent neural network. This allows end-to-end training and improves results over applying CRFs as a post-processing step. Examples of semantic segmentation results on various images are shown along with challenges in segmenting certain images.
- R-CNN was the first CNN model to achieve high performance in object detection. It used a multi-stage pipeline involving region proposals, feature extraction via CNN, and SVM classification. It was slow due to computing CNN features for each region individually.
- Fast R-CNN improved on R-CNN by introducing a ROI pooling layer to share computation and enabling end-to-end training. However, region proposals were still generated externally, slowing down detection.
- Faster R-CNN addressed this by introducing a Region Proposal Network to generate proposals, allowing the entire model to be trained end-to-end. This led to faster and more accurate detection compared to previous models.
- YOLO took a different, one-stage route: it frames detection as a single regression problem, trading some localization accuracy for real-time speed.
The document describes using YOLOv3 to recognize kangaroos and raccoons from images. The author encountered difficulties with low confidence predictions and code errors. While the model performed poorly, the author learned from modifying hyperparameters, debugging code, and clustering anchors. The root causes of low confidence were identified as limited training and restricting updates in early epochs. Further training is needed to improve model convergence and recognition ability.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Recent Progress on Object Detection_20170331 (Jihong Kang)
These slides provide a brief summary of recent progress on object detection using deep learning.
The concepts of selected previous works (R-CNN series, YOLO, SSD) and 6 recent papers (uploaded to arXiv between Dec 2016 and Mar 2017) are introduced.
Most of the papers focus on improving the performance of small object detection.
Clustering algorithms are used to group similar data points together. K-means clustering aims to partition data into k clusters by minimizing distances between data points and cluster centers. Hierarchical clustering builds nested clusters by merging or splitting clusters based on distance metrics. Density-based clustering identifies clusters as areas of high density separated by areas of low density, like DBScan which uses parameters of minimum points and epsilon distance.
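The density-based clustering described above, with DBSCAN's minimum-points and epsilon parameters, can be sketched in pure Python (a toy sketch, not a library-grade implementation):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns a cluster id per point, -1 for noise."""
    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # not a core point: mark as noise for now
            continue
        labels[i] = cluster         # start a new cluster from this core point
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep expanding the cluster
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=2, min_pts=2)
```

On this toy data the two tight groups get distinct cluster ids and the isolated point at (50, 50) is labeled -1, which is exactly the "high density separated by low density" picture above.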
The document provides an overview of various machine learning classification algorithms including decision trees, lazy learners like K-nearest neighbors, decision lists, naive Bayes, artificial neural networks, and support vector machines. It also discusses evaluating and combining classifiers, as well as preprocessing techniques like feature selection and dimensionality reduction.
This document provides an overview of various machine learning classification techniques including decision trees, k-nearest neighbors, decision lists, naive Bayes, artificial neural networks, and support vector machines. For each technique, it discusses the basic approach, how models are trained and tested, and potential issues that may arise such as overfitting, parameter selection, and handling different data types.
"An adaptive modular approach to the mining of sensor network ...butest
This document summarizes an adaptive modular approach for mining sensor network data using machine learning techniques. It presents a two-layer architecture that uses an online compression algorithm (PCA) in the first layer to reduce data dimensionality and an adaptive lazy learning algorithm (KNN) in the second layer for prediction and regression tasks. Simulation results on a wave propagation dataset show the approach can handle non-stationarities like concept drift, sensor failures and network changes in an efficient and adaptive manner.
The document discusses different machine learning algorithms for instance-based learning. It describes k-nearest neighbor classification which classifies new instances based on the labels of the k closest training examples. It also covers locally weighted regression which approximates the target function based on nearby training data. Radial basis function networks are discussed as another approach using localized kernel functions to provide a global approximation of the target function. Case-based reasoning is presented as using rich symbolic representations of instances and reasoning over retrieved similar past cases to solve new problems.
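The k-nearest-neighbour classification described above can be sketched in a few lines; the toy training set is purely illustrative:

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """k-NN classification: majority vote among the k training points
    closest to the query. train is a list of (point, label) pairs."""
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 0)))  # "a"
```

The sorted scan here is O(n) per query; the spatial indices mentioned elsewhere in these summaries (e.g. k-d trees) exist precisely to avoid that linear scan on large training sets.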
Machine learning applications in aerospace domain (홍배 김)
1. The document discusses machine learning applications in aerospace domains such as detecting faults in aerospace systems, anomaly detection for aircraft and spacecraft, machine learning applications for planetary rovers, and predictive modeling of spacecraft telemetry data.
2. Various machine learning techniques are described including neural networks, clustering, and Gaussian processes for applications like satellite image analysis, spacecraft engineering, modeling 3D shapes, and computational fluid dynamics.
3. The document advocates an approach where machine learning assists and improves physics-based models rather than replacing them, such as using machine learning to correct Reynolds stress terms in fluid simulations.
Digital image classification is the process of sorting pixels into categories based on their spectral values. There are supervised and unsupervised classification methods. Supervised classification involves using training sites of known categories to define statistical signatures for each class. Unsupervised classification groups pixels into clusters without prior class definitions. Validation is needed to assess classification accuracy by comparing results to ground truth data. Factors like training site selection and signature separability impact classification performance.
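The minimum-distance flavour of supervised classification described above can be sketched as follows; the mean spectral signatures are made up for illustration:

```python
from math import dist

def classify_pixel(pixel, signatures):
    """Minimum-distance-to-mean classification: assign the pixel to the
    class whose mean spectral signature is closest in band space."""
    return min(signatures, key=lambda cls: dist(pixel, signatures[cls]))

# Hypothetical per-class mean band values derived from training sites.
sigs = {"water": (30, 20, 10), "vegetation": (40, 80, 35), "soil": (90, 85, 70)}
print(classify_pixel((42, 78, 30), sigs))  # "vegetation"
```

Signature separability matters here directly: if two class means sit close together in band space, pixels near the boundary are assigned almost arbitrarily, which is why validation against ground truth is essential.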
Introduction to machine learning terminology.
Applications within High Energy Physics and outside HEP.
* Basic problems: classification and regression.
* Nearest neighbours approach and spacial indices
* Overfitting (intro)
* Curse of dimensionality
* ROC curve, ROC AUC
* Bayes optimal classifier
* Density estimation: KDE and histograms
* Parametric density estimation
* Mixtures for density estimation and EM algorithm
* Generative approach vs discriminative approach
* Linear decision rule, intro to logistic regression
* Linear regression
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
Molinier - Feature Selection for Tree Species Identification in Very High res...grssieee
This document summarizes a study that used feature selection and classification methods to identify tree species in high-resolution satellite images. The researchers tested 35 features on over 1000 ground reference samples to rank their effectiveness for classification. They found that 6 spectral features performed best when used in a 5-nearest neighbor classifier, achieving over 80% accuracy for tree species identification. While species proportions were estimated accurately, stem numbers per species showed only moderate correlation with field data. Future work could explore more advanced classifiers, cross-validation, and improving stem number estimation.
Raster data is represented by a grid of cells, where each cell contains numeric or qualitative values. Raster data comes from sources like images, maps, and satellite imagery. Common analyses of raster data include buffering, reclassification, hillshades, interpolation, and surface calculation. Buffering assigns "in" and "out" values to cells based on their distance from a feature. Reclassification reassigns cell values. Hillshades create shaded relief maps from elevation data. Interpolation estimates values between known data points. Surface calculation performs cell-by-cell mathematical functions on rasters.
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개할 논문은 3D관련 업무를 진행 하시는/ 희망하시는 분들의 필수 논문인 VoxelNET 입니다.
발표자료:https://www.slideshare.net/taeseonryu/mcsemultimodal-contrastive-learning-of-sentence-embeddings
안녕하세요! 딥러닝 논문읽기 모임입니다.
오늘은 자율 주행, 가정용 로봇, 증강/가상 현실과 같은 다양한 응용 분야에서 중요한 문제인 3D 포인트 클라우드에서의 객체 탐지에 대한 획기적인 진전을 소개하고자 합니다. 이를 위해 'VoxelNet'이라는 새로운 3D 탐지 네트워크에 대해 알아보겠습니다.
1. 기존 방법의 한계
기존의 많은 노력은 수동으로 만들어진 특징 표현, 예를 들어 새의 눈 시점 투영 등에 집중해 왔습니다. 하지만 이러한 방법들은 LiDAR 포인트 클라우드와 영역 제안 네트워크(RPN) 사이의 연결을 효과적으로 수행하기 어렵습니다.
2. VoxelNet의 혁신적 접근법
VoxelNet은 3D 포인트 클라우드를 위한 수동 특징 공학의 필요성을 없애고, 특징 추출과 바운딩 박스 예측을 단일 단계, end-to-end 학습 가능한 깊은 네트워크로 통합합니다. VoxelNet은 포인트 클라우드를 균일하게 배치된 3D 복셀로 나누고, 새롭게 도입된 복셀 특징 인코딩(VFE) 레이어를 통해 각 복셀 내의 포인트 그룹을 통합된 특징 표현으로 변환합니다.
3. 효과적인 기하학적 표현 학습
이 방식을 통해 포인트 클라우드는 서술적인 체적 표현으로 인코딩되며, 이는 RPN에 연결되어 탐지를 생성합니다. VoxelNet은 다양한 기하학적 구조를 가진 객체의 효과적인 구별 가능한 표현을 학습합니다.
4. 성능 평가
KITTI 자동차 탐지 벤치마크에서의 실험 결과, VoxelNet은 기존의 LiDAR 기반 3D 탐지 방법들을 큰 차이로 능가했습니다. 또한, LiDAR만을 기반으로 한 보행자와 자전거 탐지에서도 희망적인 결과를 보였습니다.
VoxelNet의 도입은 3D 포인트 클라우드에서의 객체 탐지를 혁신적으로 개선하고 있으며, 이 분야에서의 미래 발전에 중요한 영향을 미칠 것으로 기대됩니다.
오늘 논문 리뷰를 위해 이미지처리 허정원님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
https://youtu.be/yCgsCyoJoMg
Sensitivity of Support Vector Machine Classification to Various Training Feat...Nooria Sukmaningtyas
Remote sensing image classification is one of the most important techniques in image
interpretation, which can be used for environmental monitoring, evaluation and prediction. Many algorithms
have been developed for image classification in the literature. Support vector machine (SVM) is a kind of
supervised classification that has been widely used recently. The classification accuracy produced by SVM
may show variation depending on the choice of training features. In this paper, SVM was used for land
cover classification using Quickbird images. Spectral and textural features were extracted for the
classification and the results were analyzed thoroughly. Results showed that the number of features
employed in SVM was not the more the better. Different features are suitable for different type of land
cover extraction. This study verifies the effectiveness and robustness of SVM in the classification of high
spatial resolution remote sensing images.
2.6 support vector machines and associative classifiers revisedKrish_ver2
Support vector machines (SVMs) are a type of supervised machine learning model that can be used for both classification and regression analysis. SVMs work by finding a hyperplane in a multidimensional space that best separates clusters of data points. Nonlinear kernels can be used to transform input data into a higher dimensional space to allow for the detection of complex patterns. Associative classification is an alternative approach that uses association rule mining to generate rules describing attribute relationships that can then be used for classification.
This document proposes two tensor voting (TV) based binary classification algorithms and evaluates them experimentally on real and synthetic data.
The first algorithm (TVBC1) finds potential decision boundary points by matching closest training points from different classes. It then models the decision boundary using local planes estimated with TV. Test points are classified based on these local plane equations.
The second algorithm (TVBC2) computes a class similarity measure for each test point that combines distance and orientation alignment with training points. The test point is assigned to the class with the best similarity measure.
Experiments on synthetic and real data validate the approaches and compare their accuracy and time performance to standard classifiers like k-nearest neighbors and decision trees.
Slides were formed by referring to the text Machine Learning by Tom M Mitchelle (Mc Graw Hill, Indian Edition) and by referring to Video tutorials on NPTEL
The document discusses machine learning techniques for multivariate data analysis using the TMVA toolkit. It describes several common classification problems in high energy physics (HEP) and summarizes several machine learning algorithms implemented in TMVA for supervised learning, including rectangular cut optimization, likelihood methods, neural networks, boosted decision trees, support vector machines and rule ensembles. It also discusses challenges like nonlinear correlations between input variables and techniques for data preprocessing and decorrelation.
Similar to 3D Scene Analysis via Sequenced Predictions over Points and Regions (20)
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Large Language Model (LLM) and it’s Geospatial Applications
3D Scene Analysis via Sequenced Predictions over Points and Regions
1. 3-D Scene Analysis via Sequenced Predictions over Points and Regions
Xuehan Xiong, Daniel Munoz, J. Andrew Bagnell, Martial Hebert (Carnegie Mellon University)
Presenter: Flavia Grosan, Jacobs University Bremen, 2011
me@flaviagrosan.com | http://flaviagrosan.com
2. Introduction
- Range scanners are becoming standard equipment
- Scan segmentation: distribute the points into object classes
- Enables scene understanding and robot localization
- Difficulties in 3D:
  - No color information
  - Often noisy and sparse data
  - Handling of previously unseen object instances or configurations
3. Definition
3D point cloud classification: assign one of the predefined class labels to each point of a cloud, based on:
- Local properties of the point
- Global properties of the cloud
4. Segmentation algorithms should:
- Exploit different features, and trade them off automatically
- Enforce spatial contiguity: adjacent points in the scan tend to have the same label
- Adapt to the scanner used: different scanners produce qualitatively different outputs
5. Classification = Training + Validation
- Data: labeled instances (a manually labeled 3D scan), split into a training set, a validation set, and a test set
- Training:
  - Estimate parameters on the training set
  - Tune parameters on the validation set
  - Report results on the test set; anything short of this yields over-optimistic claims
- Evaluation: many different metrics; ideally, the criteria used to train the classifier should be closely related to those used to evaluate it
- Statistical issues:
  - We want a classifier that does well on test data
  - Overfitting: fitting the training data very closely, but not generalizing well
  - Error bars: we want realistic (conservative) estimates of accuracy
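The split-then-tune protocol above can be sketched in a few lines. The 60/20/20 fractions and the fixed seed are illustrative assumptions, not values from the talk:

```python
import random

def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labeled instances and split them into train / validation / test.
    The fractions and seed are illustrative, not the ones used in the talk."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```

Parameters are fit on `train`, hyperparameters (e.g., the number of up-down iterations later in the talk) are chosen on `val`, and only `test` results are reported.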
6. Some State-of-the-Art Classifiers
- Support vector machines
- Random forests (Apache Mahout)
- Perceptron
- Nearest neighbor (kNN)
- Bayesian classifiers
- Logistic regression
7. Approach: Generative Model
- Learning step: estimate p(y) and p(x|y)
- Classification step: use Bayes' rule, p(y|x) = p(x|y) p(y) / p(x)
- Classify as: argmax over y of p(x|y) p(y)
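The generative recipe above fits in a few lines for a discrete toy feature. All probability values here are made up purely for illustration:

```python
# Generative classification: learn p(y) and p(x|y), then apply Bayes' rule
# p(y|x) is proportional to p(x|y) p(y); pick the most probable label.
# Toy discrete example with hypothetical numbers.

priors = {"ground": 0.6, "vegetation": 0.4}        # p(y)
likelihood = {                                     # p(x|y) for feature x in {"low", "high"}
    "ground":     {"low": 0.9, "high": 0.1},
    "vegetation": {"low": 0.2, "high": 0.8},
}

def classify(x):
    # argmax over y of p(x|y) p(y); the evidence p(x) cancels in the argmax
    return max(priors, key=lambda y: likelihood[y][x] * priors[y])

label = classify("high")   # compares 0.1 * 0.6 against 0.8 * 0.4
```

The evidence term p(x) never needs to be computed, since it is the same for every candidate label.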
8. Approach – Generative Model Carnegie Mellon University, Artificial Intelligence, Fall 2010
9. State of the Art 3D Point Cloud Classifier: Markov Random Fields
- Scan points modeled as random variables (nodes); each random variable corresponds to the label of one point
- Proximity links between points (edges)
- Defines a joint distribution
- Pairwise Markov networks: nodes and edges are associated with potentials
  - Node potential: a point's 'individual' preference for different labels
  - Edge potential: encodes interactions between the labels of related points
10. Markov Random Fields
- Conditional probability query: P(Y | X = xi) = ?
- Naive approach: generate the joint distribution and exhaustively sum it out
- Bad news: this is NP-hard
11. Xiong et al. Approach
- Problems with an explicit joint probability distribution model:
  - Exact inference is hard
  - Approximate inference leads to poor results
- Their approach does not model P(y|x) via a joint model; instead, it directly designs and trains an inference procedure as a sequence of predictions from simple machine learning modules
- Uses a discriminative model: logistic regression, trained as a maximum-likelihood estimation problem
12. Overview
- 2-level hierarchy: top level = regions (mixed labels), bottom level = points
- k-means++ with k = 1% of the number of points establishes the initial clusters (regions)
- Predict a label distribution per region
- Update each region's intra-level context using neighboring regions' predictions
- Pass the predicted label distribution to the region's points as inter-level context
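The initial regions come from k-means++ clustering. A minimal pure-Python sketch of just the k-means++ seeding step (the subsequent Lloyd iterations are omitted) might look like:

```python
import random

def kmeanspp_seeds(points, k, seed=0):
    """k-means++ seeding (Arthur & Vassilvitskii): pick the first center
    uniformly at random, then pick each next center with probability
    proportional to the squared distance to the nearest center so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    for _ in range(k - 1):
        # Squared distance from each point to its closest existing center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
centers = kmeanspp_seeds(points, 2)
```

On a 3D point cloud, `k` would be set to roughly 1% of the number of points, as on the slide; the seeding bias toward far-apart centers tends to spread the regions over the scene.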
13. Overview
- At the point level, train 2 classifiers using:
  - Inter-level context + point cloud descriptors
  - Neighboring points' predictions
- Move up in the hierarchy:
  - Average the predicted label distributions of the points in a region
  - Send the average to the region as inter-level context
- The validation set determines the number of up-down iterations
14. Base Classifier (LogR)
- Assumption: log p(y|x) of each class is a linear function of x plus a normalization constant
- Ci: random variable for the class of region i
- xi: features
- yi: ground truth distribution over the K labels
- w: parameters
- This yields the multinomial logistic model p(Ci = k | xi) = exp(wk . xi) / sum over j of exp(wj . xi)
15. Base Classifier
- Maximum-likelihood estimation, with regularization to avoid overfitting
- Concave problem, solved with stochastic gradient descent:
  - Choose an initial guess for w
  - Take a small step in the direction opposite the gradient, giving a new configuration
  - Iterate until the gradient is 0
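As a minimal runnable sketch of this base classifier, here is the binary case of logistic regression trained by stochastic gradient descent with L2 regularization. All hyperparameters and the toy data are illustrative assumptions, not values from the talk:

```python
import math
import random

def train_logreg(xs, ys, epochs=200, lr=0.5, lam=1e-3, seed=0):
    """Binary logistic regression fit by stochastic gradient descent with
    L2 regularization (a simplified sketch of the talk's base classifier)."""
    rng = random.Random(seed)
    d = len(xs[0])
    w = [0.0] * d
    b = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            z = b + sum(wj * xj for wj, xj in zip(w, xs[i]))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - ys[i]  # gradient of the log-loss w.r.t. z
            # Step opposite the gradient of the regularized loss
            w = [wj - lr * (err * xj + lam * wj) for wj, xj in zip(w, xs[i])]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Linearly separable toy data in one dimension
xs = [[0.0], [0.2], [0.8], [1.0]]
ys = [0, 0, 1, 1]
w, b = train_logreg(xs, ys)
```

The slides' multiclass version replaces the sigmoid with a softmax over K label weights, but the gradient-step structure is the same.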
16. Contextual Features
- Construct a sphere around the region centroid O, with a 12-meter radius
- Divide the sphere into 3 vertical slices, 4 m each
- Average the points' label distributions within each slice: a feature vector of length K per slice
- Average the angles formed between the z-axis and the vectors [O, Ni], where Ni is a neighboring point (not part of this region); this models the spatial configuration of neighboring points
- Result: 3(K+1) contextual features, appended to xi (the region's features)
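This construction can be sketched as follows. The slice geometry (equal-height slices spanning the whole sphere) and the input representation are simplifying assumptions on my part, not the paper's exact layout:

```python
import math

def contextual_features(centroid, neighbors, radius=12.0, n_slices=3):
    """For neighbors inside a sphere around the region centroid, average the
    label distributions and the z-axis angles per vertical slice.
    `neighbors` is a list of (point_xyz, label_distribution) pairs.
    Simplified sketch of the slide's 3(K+1)-feature construction."""
    cx, cy, cz = centroid
    k = len(neighbors[0][1])
    slice_h = 2 * radius / n_slices
    sums = [[0.0] * k for _ in range(n_slices)]
    ang = [0.0] * n_slices
    cnt = [0] * n_slices
    for (x, y, z), dist in neighbors:
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r > radius or r == 0.0:
            continue
        s = min(int((dz + radius) / slice_h), n_slices - 1)
        for j in range(k):
            sums[s][j] += dist[j]
        ang[s] += math.acos(dz / r)  # angle between z-axis and [O, Ni]
        cnt[s] += 1
    feats = []
    for s in range(n_slices):
        n = max(cnt[s], 1)
        feats.extend(v / n for v in sums[s])
        feats.append(ang[s] / n)
    return feats  # length n_slices * (k + 1)

# Two hypothetical neighbors with 2-label distributions, one above and one below
neighbors = [((0.0, 0.0, 5.0), [1.0, 0.0]), ((0.0, 0.0, -5.0), [0.0, 1.0])]
feats = contextual_features((0.0, 0.0, 0.0), neighbors)
```

The point of the angle term is that a purely distance-based feature cannot tell "vegetation above" from "vegetation beside", which the learned spatial layouts on slide 19 depend on.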
17. Multi-Round Stacking (MRS)
- X = {xi}: training set; Y = {yi}: label distributions (ground truth)
- w1 = T(X, Y): first trained classifier
- Y' = P(X, w1): its predictions on the training set
- Use Y' to compute new contextual features for X, giving X'
- w2 = T(X', Y'): train a second classifier
- Repeat until no improvement is seen
- Problem: Y' is optimistically accurate on the data w1 was trained on, so w2 is prone to overfitting
18. MRS: Avoiding Overfitting
- Generate multiple temporary classifiers:
  - Partition the training set into 5 disjoint sets
  - Train a temporary classifier γ = T(X − Xi, Y − Yi)
  - Use γ only on Xi to generate Y'i, then discard γ
- Perform one or more rounds of stacking
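The held-out prediction scheme above can be sketched generically. Here `train_fn` and `predict_fn` are hypothetical stand-ins for the base learner (the talk uses logistic regression and 5 folds):

```python
def heldout_predictions(xs, ys, train_fn, predict_fn, n_folds=5):
    """Stacking without overfitting: partition the training set into folds,
    train a temporary classifier on the other folds, use it only to predict
    on the held-out fold, then discard it. `train_fn(xs, ys)` returns a
    model; `predict_fn(model, x)` returns a prediction for one instance."""
    n = len(xs)
    preds = [None] * n
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    for fold in folds:
        held = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        model = train_fn(tr_x, tr_y)  # temporary classifier, discarded below
        for i in fold:
            preds[i] = predict_fn(model, xs[i])
    return preds  # the Y' used to build contextual features

# Demo with a trivial "classifier" that predicts the training-label mean
xs = list(range(10))
ys = [0] * 5 + [1] * 5
preds = heldout_predictions(xs, ys, lambda X, Y: sum(Y) / len(Y), lambda m, x: m)
```

Because every prediction comes from a model that never saw that instance, the contextual features fed to the next round reflect realistic, not optimistic, accuracy.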
19. Examine the learned w parameters
A tree trunk region likely has:
- vegetation above, but not below
- car and ground below, but not on top
20. Stacked 3D Parsing Algorithm (S3DP)
- Input: a labeled point cloud
- Construct the 2-level hierarchy (top and bottom)
- Extract point cloud features
- Create ground truth label distributions: (Xt, Yt) for the top level, (Xb, Yb) for the bottom level
21. Stacked 3D Parsing Algorithm (S3DP)
Parse UP the hierarchy:
- Apply N rounds of MRS on (Xb, Yb): N+1 classifiers; Yb is the label prediction from the last round
- Extend each region's feature vector with the average of its children's probability distributions in Yb
- Apply N rounds of MRS on (Xt, Yt): N+1 classifiers
- Save ft and fb for inference
22. Stacked 3D Parsing Algorithm (S3DP)
Parse DOWN the hierarchy:
- Apply N rounds of MRS on (Xt, Yt): N+1 classifiers; Yt is the label prediction from the last round
- Extend each point's feature vector with the average of its parents' probability distributions in Yt
- Apply N rounds of MRS on (Xb, Yb): N+1 classifiers
- Save ft and fb for inference
23. Experimental Setup: Features (bottom level)
- Local neighborhood: 0.8 m / 2 m radius
- Compute the covariance matrix of the neighborhood and its eigenvalues a1 > a2 > a3:
  - Scattered points: a1 ≅ a2 ≅ a3 (vegetation)
  - Linear structures: a1 >> a2, a3 (wires)
  - Solid surfaces: a1, a2 >> a3 (tree trunks)
- Scalar projections of the local tangent and normal directions onto the z-axis
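Given sorted eigenvalues, the qualitative shape classes above can be scored with spectral "saliency" ratios. The normalizations below are one common construction for illustration, not necessarily the paper's exact definitions:

```python
def local_structure(a1, a2, a3):
    """Classify local geometry from sorted eigenvalues a1 >= a2 >= a3 of the
    neighborhood covariance matrix. Each score compares how dominant the
    first one or two principal directions are; the largest score wins."""
    assert a1 >= a2 >= a3 >= 0
    if a1 == 0:
        return "scattered"            # degenerate (empty) neighborhood
    total = a1 + a2 + a3
    scatter = 3 * a3 / total          # near 1 when a1 ~ a2 ~ a3
    linear = (a1 - a2) / a1           # near 1 when a1 >> a2, a3
    surface = (a2 - a3) / a1          # near 1 when a1, a2 >> a3
    best = max(("scattered", scatter), ("linear", linear),
               ("surface", surface), key=lambda t: t[1])
    return best[0]
```

With eigenvalues (10, 0.5, 0.4) this reports a linear structure (one dominant direction, e.g. a wire); with (5, 4.8, 0.1) it reports a surface (two dominant directions).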
24. Experimental Setup: Features (bottom & top levels)
- Bounding box enclosing the points: over the local neighborhood at the bottom level, over the region itself at the top level
- Relative elevations:
  - Take a horizontal cell of 10 m × 10 m, centered at the centroid
  - Compute the min and max z-coordinates within the cell
  - Compute the 2 differences in elevation between the region centroid's elevation and the cell's 2 extrema
25. Evaluation Metrics
For a class k:
- Precision = TP / (TP + FP): the fraction of all questions answered (points classified as k) that were answered correctly
- Recall = TP / (TP + FN): the fraction of all objects of class k that were correctly classified
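The two metrics can be computed per class directly from prediction and ground-truth pairs:

```python
def precision_recall(predictions, truths, k):
    """Per-class precision and recall. For class k:
    precision = TP / (TP + FP)  (fraction of k-predictions that are correct)
    recall    = TP / (TP + FN)  (fraction of true-k objects that are found)"""
    tp = sum(1 for p, t in zip(predictions, truths) if p == k and t == k)
    fp = sum(1 for p, t in zip(predictions, truths) if p == k and t != k)
    fn = sum(1 for p, t in zip(predictions, truths) if p != k and t == k)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example with three points
preds = ["car", "car", "tree"]
truth = ["car", "tree", "tree"]
p, r = precision_recall(preds, truth, "car")
```

The speaker notes mention reporting F1 rather than accuracy under class imbalance; F1 is the harmonic mean of these two values.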
26. Experimental Results: VMR-Oakland-v2 Dataset
- CMU campus, 3.1 M points
- 36 sets, each ~85,000 points: 6 training sets, 6 validation sets, all remaining as test sets
- Labels: wire, pole, ground, vegetation, tree trunk, building, car
- Comparison with the associative Max-Margin Markov Network (M3N) algorithm
27. VMR-Oakland-v2 Dataset: M3N
- A conditional random field: an MRF trained discriminatively
- Pairwise model with associative (Potts) potentials
30. Experimental Results: GML-PCV Dataset
- 2 aerial datasets, A and B
- Each dataset split into training and test, ~1 M points each; each training set split into learning and validation
- Labels: ground, roof/building, tree, low vegetation/shrub, car
- Comparison with the Non-Associative Markov Network (NAMN): a pairwise Markov network constructed over segments, with edge potentials that can be non-zero for different labels
32. Experimental Results: RSE-RSS Dataset
- 10 scans, each ~65,000 points, from a Velodyne laser on the ground
- The most difficult set: noisy, with sparse measurements and ground truth
- Labels: ground, street signs, tree, building, fence, person, car, background
- Comparison with the approach of Lai and Fox, which uses information from the World Wide Web (Google 3D Warehouse) to reduce the need for manually labeled training data
33. Final Comments
- S3DP performs a series of simple predictions
- Effective encoding of neighboring contexts
- Learns meaningful spatial layouts, e.g., tree trunks are below vegetation
- Usable in many environments scanned with different sensors
- S3DP requires about 42 seconds
34. References
- X. Xiong, D. Munoz, J. A. Bagnell, M. Hebert, 3-D Scene Analysis via Sequenced Predictions over Points and Regions, ICRA 2011
- D. Anguelov, B. Taskar, V. Chatalbashev, Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data, CVPR 2005
- G. Obozinski, Practical Machine Learning CS 294, Multi-Class and Structured Classification, UC Berkeley, 2008
- A. Kulesza, F. Pereira, Structured Learning with Approximate Inference, NIPS 2007
- K. Lai, D. Fox, 3D Laser Scan Classification Using Web Data and Domain Adaptation, International Journal of Robotics Research, Special Issue on Robotics: Science & Systems 2009, July 2010
- D. Munoz, J. A. Bagnell, M. Hebert, Stacked Hierarchical Labeling, ECCV 2010
- D. Munoz, J. A. Bagnell, N. Vandapel, M. Hebert, Contextual Classification with Functional Max-Margin Markov Networks, CVPR 2009
- R. Shapovalov, A. Velizhev, O. Barinova, Non-Associative Markov Networks for 3D Point Cloud Classification, PCV 2010
- C. Sutton, An Introduction to Conditional Random Fields, Statistical Machine Learning class, University of Edinburgh
- M. Jordan, Machine Learning class, UC Berkeley, classification lecture
- P. J. Flynn, A. K. Jain, Surface Classification: Hypothesis Testing and Parameter Estimation, CVPR 1988
- S. Lacoste-Julien, Combining SVM with Graphical Models for Supervised Classification: an Introduction to Max-Margin Markov Networks, UC Berkeley, 2003
- D. Koller, N. Friedman, L. Getoor, B. Taskar, Graphical Models in a Nutshell, in Introduction to Statistical Relational Learning, 2007
Editor's Notes
Inferring labels from local features alone is very difficult: e.g., the viewpoint from which the objects are perceived can vary widely, the sensor irregularly samples points from objects, and there is often local ambiguity in appearance.
Exploit different features: trees may require different features from cars; ground can be detected simply based on a "height" feature. As the number of features grows, it becomes important to learn how to trade them off automatically.
Enforce spatial contiguity.
Adapt to the scanner used: particularly relevant, because real-world scans can violate standard assumptions made in the synthetic data used to evaluate segmentation algorithms.
- CPQ: the probability distribution over the values of Y conditioned on X = xi
No real improvement is seen beyond 2 levels. The regions in the hierarchy do not change during the procedure. Main idea about neighboring regions' predictions: contextual information should refine the prediction.
Used to classify regions
- Large radii are needed to cover the large spatial extent from an object’s centroid to its neighboring regions’ points
By sequentially training a series of classifiers, we can ideally learn how to fix the mistakes of the previous one. Additionally, we can use these previous predictions as contextual cues.
Top left: ground truth, initial labeling. Top right: one round of stacking; a pole is mistaken for a building. Bottom left: 2 rounds, better results. Bottom right: 4 rounds. Regions that do not fit the context are iteratively corrected.
The F1 score is used instead of accuracy because accuracy can hide the performance of classes with few samples, and there is class imbalance in all datasets.
- Recall = objects correctly classified / total number of objects
- Precision = correct answers / total classified
(f) The top border is vegetation and gets misclassified; higher-order potentials over the region prefer the region to have a single label, so S3DP is better at boundary regions. (e) Context learning fails: shrubs are labeled as building; S3DP learned that vegetation is "above" and corrected this label assignment.
Shape information is lost (aerial set): car and low vegetation look the same. Ground points are not at the same elevation, so elevation features give little information. LogR does NOT differentiate between low and high vegetation; S3DP uses stacking and learns that low vegetation has a higher ground distribution.