Decision Forests and discriminant analysis
3,441 views
You can watch the video lecture here: https://www.youtube.com/watch?v=pmIDt2dwQ8I
Published in: Software, Technology, Education
  1. 1. Randomised Decision Forests Tae-Kyun Kim http://www.iis.ee.ic.ac.uk/icvl/ 1 BMVA 2014 Computer Vision Summer School
  2. 2. Randomised Forests in the field [Moosmann et al., 06] [Shotton et al., 08] water boat chair tree road Visual Words [Lepetit et al., 06] Keypoint Recognition [Gall et al., 09] Hough Forest [Shotton et al., 11] KINECT
  3. 3. 3 Randomised Forests @ ICL ICCV13 (oral), CVPR14 (oral) CVPR13 CVPR14 ICCV13 http://www.iis.ee.ic.ac.uk/icvl/publication.htm
  4. 4. 4 Randomised Forests @ ICL Ongoing, partially at IJCV12 BMVC12 BMVC10 ECCV 10, ICCV 11 http://www.iis.ee.ic.ac.uk/icvl/publication.htm CVPRW14 ECCV14
  5. 5. Resource • ICCV09 Tutorial on Boosting and Random Forest (by T-K Kim et al) http://www.iis.ee.ic.ac.uk/icvl/iccv09_tutorial.html • MVA13 Tutorial on Randomised Forests and Tree-structured Algorithms (by T-K Kim) http://www.iis.ee.ic.ac.uk/icvl/res_interest.htm • ICCV11 Tutorial on Decision Forests (by Criminisi and Shotton) http://research.microsoft.com/en-us/projects/decisionforests/ • ICCV13 Tutorial on Decision Forests and Field (by Nowozin and Shotton) http://research.microsoft.com/en-us/um/cambridge/projects/iccv2013tutorial/ 5
  6. 6. Resource • For Random Forest Matlab codes – (i) P. Dollar’s toolbox http://vision.ucsd.edu/~pdollar/toolbox/doc/ – (ii) http://code.google.com/p/randomforest-matlab/ – (iii) http://www.mathworks.co.uk/matlabcentral/fileexchange/31036-random-forest – (iv) Karpathy’s toolbox https://github.com/karpathy/Random-Forest-Matlab • For open-source visualisation tool for machine learning algorithms, including RF – MLDEMOS http://mldemos.epfl.ch/ 6
  7. 7. Part I (overview diagram): Boosting cascade with stages t1, t2, …, tT → Boosting → Decision tree → Randomised Decision Forest [IJCV 2012] [NIPS 2008] 7
  8. 8. Part II (overview diagram): a split node evaluates f(x) and splits In into Il and Ir by the gain ΔE; trees 1 … N each map an input x to a leaf-node posterior p(c). Topics: more discriminative split functions (BMVC12); multiple split criteria / adaptive weighting (CVPR13, ICCV13); capturing structural info. (BMVC10); probabilistic outputs (ICCV11, ECCV10) 8
  9. 9. Tutorial Structure • Part I: Tree-structured Algorithms • Computational Demands • Boosting – Speed-up, Multiple-classes • Committee Machine • Randomised Decision Forest (Basics) 9 Criminisi and Shotton, ICCV11 tutorial • Part II: Randomised Forest • Classification forest • Regression forest • Density/Manifold forest • Semi-supervised forest
  10. 10. Tutorial Structure 10 (diagram: split node f(x), ΔE, In → Il, Ir; leaf node p(c)) • Part II: Randomised Forests (Case studies) • Capturing structural information • Probabilistic interpretation of outputs • More discriminative split functions • Multiple split criteria / adaptive weighting • Part I: Tree-structured Algorithms • Computational Demands • Boosting – Speed-up, Multiple-classes • Committee Machine • Randomised Decision Forest (Basics)
  11. 11. PART I Tree-structured Algorithms 11
  12. 12. Extremely fast classification by tree structure algorithms 12
  13. 13. Object Detection 13
  14. 14. Object Detection 14
  15. 15. 15 Object Detection
  16. 16. Number of windows 16 Number of windows: 747,666 (≈ # of pixels × # of scales)
  17. 17. Time per window 17 (diagram: feature descriptors or raw pixels of dimension D; num of feature vectors: 747,666; every vector must be classified)
  18. 18. Time per window 18 (feature descriptors or raw pixels of dimension D; num of feature vectors: 747,666) Time per window (or vector): 1 sec / 747,666 ≈ 0.00000134 sec, in order to finish the task in 1 sec. Neural Network? Nonlinear SVM?
  19. 19. Real-Time Object Tracking by Fast (Re-) Detection 19 From Time t to t+1 Detection again Search region • Online discriminative feature selection [Collins et al 03], Ensemble tracking [Avidan 07] Previous object location
  20. 20. Semantic Segmentation • Requiring pixel-wise classification 20 Input Output
  21. 21. • Integrating Visual Cues [Darrell et al IJCV 00] – Face pattern detection output (left). – Connected components recovered from stereo range data (mid). – Flesh hue regions from skin hue classification (right). 21 More traditionally… Narrow down the search space
  22. 22. Since about 2001… Boosting Simple Features [Viola & Jones 01] • AdaBoost classification • Weak classifiers: Haar-basis like functions (45,396 in total) 22 Weak classifier Strong classifier
  23. 23. Boosting Simple Features [Viola and Jones CVPR 01] • Integral image – A value at (x,y) is the sum of the pixel values above and to the left of (x,y). • The sum of original image values within the rectangle can be computed: Sum = A-B-C+D 23
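The integral-image trick above makes any box sum a four-lookup operation; a minimal numpy sketch (function names are illustrative, not from the slides):

    import numpy as np

    def integral_image(img):
        # ii(x, y) = sum of img values above and to the left of (x, y), inclusive
        return img.cumsum(axis=0).cumsum(axis=1)

    def box_sum(ii, top, left, bottom, right):
        # Sum over img[top:bottom+1, left:right+1] via four corner lookups,
        # analogous to the slide's Sum = A - B - C + D.
        a = ii[bottom, right]
        b = ii[top - 1, right] if top > 0 else 0
        c = ii[bottom, left - 1] if left > 0 else 0
        d = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
        return a - b - c + d

    img = np.arange(16.0).reshape(4, 4)
    ii = integral_image(img)
    assert box_sum(ii, 1, 1, 3, 2) == img[1:4, 1:3].sum()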
  24. 24. Boosting as an optimisation framework 24
  25. 25. AdaBoost [Freund and Schapire 04] • Input/output: samples {(xi, yi)}, yi ∈ {−1, +1}; strong classifier H(x) = sign(Σt αt ht(x)) • Init: sample weights wi = 1/N • For t=1 to T – Learn ht that minimises the weighted error εt = Σi wi·[ht(xi) ≠ yi] – Set the hypothesis weight αt = ½ log((1 − εt)/εt) – Update the sample weights wi ← wi·exp(−αt yi ht(xi)), then renormalise • Break if εt ≥ 1/2 25
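The updates above are the standard AdaBoost rules (the slide's formula images were lost in extraction); a self-contained sketch with decision stumps as weak learners, all names illustrative:

    import numpy as np

    def adaboost(X, y, T=50):
        # X: (N, D) features; y: (N,) labels in {-1, +1}
        N, D = X.shape
        w = np.full(N, 1.0 / N)
        stumps = []
        for _ in range(T):
            best = None
            for d in range(D):                      # weak learner = decision stump
                for thr in np.unique(X[:, d]):
                    for sign in (1, -1):
                        pred = sign * np.where(X[:, d] >= thr, 1, -1)
                        err = w[pred != y].sum()    # weighted error eps_t
                        if best is None or err < best[0]:
                            best = (err, d, thr, sign)
            err, d, thr, sign = best
            if err >= 0.5:                          # no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
            pred = sign * np.where(X[:, d] >= thr, 1, -1)
            w *= np.exp(-alpha * y * pred)          # up-weight misclassified samples
            w /= w.sum()
            stumps.append((alpha, d, thr, sign))
        return stumps

    def predict(stumps, X):
        H = sum(a * s * np.where(X[:, d] >= t, 1, -1) for a, d, t, s in stumps)
        return np.sign(H)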
  26. 26. Boosting • Iteratively reweighting training samples. • Higher weights to previously misclassified samples. 26 (figure: decision boundary after 1, 2, 3, 4, 5 and 50 rounds) Slide credit to J. Shotton
  27. 27. Unified Boosting Framework
  28. 28. AnyBoost : unified framework [Mason et al 00] • Most boosting algorithms have in common that they iteratively update sample weights and select the next hypothesis based on the weighted samples. • Input: samples, a loss ftn L(F); Output: strong classifier F • Init: F = 0 • For t=1 to T – Find ht that minimises the weighted error – Set the hypothesis weight αt – Update sample weights as the negative gradient of the loss, wi ∝ −∂L/∂F(xi) 28
  29. 29. AnyBoost an example • Loss ftn (exponential loss): L(F) = Σi exp(−yi F(xi)) • Weight samples: wi ∝ exp(−yi F(xi)), the gradient magnitude of the loss, recovering AdaBoost 29
  30. 30. AnyBoost related 30 • Multi-component boosting [Dollar et al ECCV08] • MP boosting [Babenko et al ECCVW08] • MCBoost [Kim and Cipolla NIPS08]  Noisy-OR boosting for multiple instance learning [Viola et al NIPS06]
  31. 31. Boosting as a Tree-structured Classifier
  32. 32. Boosting (very shallow network) • The strong classifier H as boosted decision stumps has a flat structure – Cf. Decision “ferns” have been shown to outperform “trees” [Zisserman et al, 07] [Fua et al, 07] 32 (diagram: input x passes through a flat chain of stumps, each with outcomes c0/c1)
  33. 33. Boosting -continued • Good generalisation by a flat structure • Fast evaluation • Sequential optimisation 33 A strong boosting classifier Boosting Cascade [Viola & Jones 04], Boosting chain [Xiao et al] Very unbalanced tree Speeds up for unbalanced binary problems Hard to design
  34. 34. Speeding up
  35. 35. A sequential structure of varying length • FloatBoost [Li et al PAMI04] – A backtrack mechanism deleting non-effective weak-learners at each round. • WaldBoost [Sochman and Matas CVPR05] – Sequential probability ratio test for the shortest set of weak-learners for the given error rate. 35 Classify x as + and exit Classify x as - and exit
  36. 36. Making a Shallow Network Deep 36 v v T-K Kim, I Budvytis, R. Cipolla, IJCV 2012 Imperial College, Univ. of Cambridge
  37. 37. 37 Decision tree Boosting • Many short paths for speeding up • Preserving (smooth) decision regions for good generalisation Converting a boosting classifier to a decision tree
  38. 38. • Many short paths for speeding up • Preserving (smooth) decision regions for good generalisation 38 (figure: Boosting vs Super tree — average path lengths shrink, about 5 times speed up) Converting a boosting classifier to a decision tree
  39. 39. 39
  40. 40. Boolean optimisation formulation • A learnt boosting classifier with m binary weak-learners splits the data space into 2^m primitive regions. • Code regions Ri, i=1,…,2^m by boolean expressions. 40 (example with m=3 weak-learners W1, W2, W3):
      W1 W2 W3 | C
  R1   0  0  0 | 0
  R2   0  0  1 | 0
  R3   0  1  0 | 0
  R4   0  1  1 | 1
  R5   1  0  0 | 1
  R6   1  0  1 | 1
  R7   1  1  0 | 1
  R8   1  1  1 | x (don’t care)
  41. 41. Boolean expression minimization • Optimally joining the regions of the same class label or don’t care label. • A short tree built from the minimised boolean expression. 41 (for the table above, the sum-of-products ¬W1·W2·W3 ∨ W1·¬W2·¬W3 ∨ W1·¬W2·W3 ∨ W1·W2·¬W3, with R8 as don’t care, minimises to W1 ∨ W2·W3; the resulting tree has leaves {R5,R6,R7,R8}, {R4}, {R1,R2}, {R3})
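This step is classical two-level logic minimisation; a sketch using sympy’s Quine–McCluskey implementation reproduces the example above (variable names are illustrative):

    from sympy import symbols
    from sympy.logic import SOPform

    w1, w2, w3 = symbols('w1 w2 w3')
    minterms = [[0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0]]  # regions with C = 1
    dontcares = [[1, 1, 1]]                                  # R8: don't care
    print(SOPform([w1, w2, w3], minterms, dontcares))
    # -> w1 | (w2 & w3): each product term becomes a short root-to-leaf path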
  42. 42. Boolean optimisation formulation • Optimally short tree defined with the average expected path length of data points, E = Σi p(Ri)·l(Ri), where p(Ri) = Mi/M (the fraction of the M data points falling in region Ri) and l(Ri) is the depth of the leaf for Ri; the tree must duplicate the Boosting decision regions. 42 Solutions • Boolean expression minimization • Growing a tree from the decision regions • Extended region coding
  43. 43. Examples generated from GMMs Synthetic data exp1 43
  44. 44. Face detection experiment (total data points = 57499; false positive / false negative counts and average path length for Boosting, Fast exit [Zhou 05] and the proposed Super tree):
  No. of weak learners | Boosting: FP, FN, avg path | Fast exit: FP, FN, avg path | Super tree: FP, FN, avg path
   20 | 501, 120,  20  | 501, 120,  11.70 | 476, 122,  7.51
   40 | 264, 126,  40  | 264, 126,  23.26 | 258, 128, 15.17
   60 | 222, 143,  60  | 222, 143,  37.24 | 246, 126, 13.11
  100 | 148, 146, 100  | 148, 146,  69.28 | 145, 152, 15.1
  200 | 120, 143, 200  | 120, 143, 146.19 | 128, 146, 15.8
  44 • The proposed solution is about 3 to 5 times faster than boosting and 1.5 to 2.8 times faster than [Zhou 05], at the similar accuracy.
  45. 45. ST vs Random forest • ST is about 2 to 3 times faster than RF at similar accuracy 45
  46. 46. 46 Experiments on tracking and segmentation by ST
  47. 47. Segmentation by binary-class RFs 47 (results: Building/Non-building — global 74.50%, average 79.89%; Tree/Non-tree — global 88.33%, average 87.26%; Car/Non-car — global 77.46%, average 80.45%; Road/Non-road — global 85.55%, average 85.24%)
  48. 48. Multi-Class Boosting
  49. 49. Multi-view and multi-category object detection • Images exhibit multi-modality. • A single boosting classifier is not sufficient. • Manual labeling sub-categories. 49 Images from Torralba et al 07
  50. 50. Multiple Classifier Boosting (MCBoost) T-K Kim, R Cipolla, NIPS 08
  51. 51. Problem 51 K-means clustering Face cluster 1 Face cluster 2 MCBoost
  52. 52. MCBoost: Multiple Classifier Boosting • Objective ftn: the joint probability of all samples under K strong classifiers • The joint prob. as a Noisy-OR: P(y|x) = 1 − Πk (1 − Pk(y|x)), where Pk(y|x) is the posterior of the k-th boosted classifier Hk(x) • By AnyBoost Framework [Mason et al. 00] – For t=1 to T For k=1 to K – Update sample weights 52
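A sketch of the Noisy-OR combination; the per-classifier weight shown is an illustrative AnyBoost-style gradient (assuming a sigmoid link), not the paper's exact derivation:

    import numpy as np

    def noisy_or(posteriors):
        # posteriors: (K, N) array of p_k(y=1|x_i) for K boosted classifiers
        return 1.0 - np.prod(1.0 - posteriors, axis=0)

    def sample_weights(posteriors, y):
        # Gradient of the noisy-OR log-likelihood w.r.t. each classifier's
        # score, assuming p_k = sigmoid(H_k); y in {0, 1}.
        p = noisy_or(posteriors)
        K, N = posteriors.shape
        w = np.empty((K, N))
        for k in range(K):
            dp_dpk = np.prod(1.0 - np.delete(posteriors, k, axis=0), axis=0)
            dll_dp = np.where(y == 1, 1.0 / np.clip(p, 1e-12, 1.0),
                              -1.0 / np.clip(1.0 - p, 1e-12, 1.0))
            # chain rule through the sigmoid: dp_k/dH_k = p_k (1 - p_k)
            w[k] = dll_dp * dp_dpk * posteriors[k] * (1.0 - posteriors[k])
        return np.abs(w)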
  53. 53. State diagram 53
  54. 54. Toy XOR classification problem • Discriminative clustering of positive samples. • It needs not partition an input space. 54
  55. 55. Toy XOR classification problem • Matlab demo 55 (plot: weak-learner weight vs boosting round, rounds 10–30, for classifier1, classifier2, classifier3)
  56. 56. Experiments • INRIA pedestrian data set containing 1207 pedestrian images and 11466 random images. • PIE face data set involving 1800 face images and 14616 random images. • A total number of 21780 simple rectangle features. (figures: face images and pedestrian images; random images + simple features; image cluster centers for K=5 and K=3) 56
  57. 57. Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09] 57 • TUD Brussels onboard dataset
  58. 58. Pedestrian detection by MCBoost [Wojek, Walk, Schiele et al CVPR09] 58 (plot: recall vs 1-precision, both axes 0.0–1.0)
  59. 59. 59
  60. 60. Multi-class Boosting + Speeding up
  61. 61. Multiclass object detection [Torralba et al PAMI 07] • Learning multiple boosting classifiers by sharing features • Tree structure speeds up the classification 61
  62. 62. 62 Multiclass object detection [Torralba et al PAMI 07]
  63. 63. 63 Multiclass object detection [Torralba et al PAMI 07]
  64. 64. ClusterBoost [Wu et al ICCV07] • Algorithm – Start with one sub-category – Select a feature per branch by Boosting – Divide the branch if the feature is too weak by clustering on the feature value – Continue to grow while updating the weak-learners of parent nodes • Cf. VBT [Huang et al ICCV05], PBT [Tu ICCV05] 64 (diagram: feature sharing, with branches splitting)
  65. 65. ClusterBoost result 65
  66. 66. AdaTree Result [Grossmann 04] 66 (plots: error vs boosting round and mean computation cost, for 1% and 10% noise in data)
  67. 67. Multiple Classifier System Committee Machine
  68. 68. Mixture of Experts [Jordan, Jacobs 94] • Gating network encourages specialization (local experts) instead of cooperation 68 (diagram: input x feeds Expert 1 … Expert L with outputs y1(x) … yL(x), combined through a gating network with gating ftns g1(x) … gL(x))
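A minimal sketch of the combination rule; the softmax gate is an assumption, the slide only specifies input-dependent gating:

    import numpy as np

    def mixture_of_experts(x, experts, gate_W):
        # experts: list of callables y_l(x); gate_W: (L, D) gating parameters
        scores = gate_W @ x
        g = np.exp(scores - scores.max())
        g /= g.sum()                       # softmax gate: g_l(x) >= 0, sum to 1
        return sum(g_l * f(x) for g_l, f in zip(g, experts))

    # With constant gates g_l = 1/L this reduces to the ensemble
    # (boosting/bagging-style) averaging on the next slide.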
  69. 69. Ensemble learning: Boosting and Bagging • Cooperation instead of specialization 69 (same diagram as the mixture of experts, but with constant gating weights g1 … gL)
  70. 70. Committee Machine 70 • Model each member as yl(x) = h(x) + εl(x); the average sum-of-squares error of the members acting individually is EAV = (1/L) Σl E[εl²] • The committee prediction is yCOM(x) = (1/L) Σl yl(x), with expected error ECOM = E[((1/L) Σl εl)²] • If the member errors have zero mean and are uncorrelated, ECOM = (1/L) EAV
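A quick numerical check of the 1/L error reduction under the zero-mean, uncorrelated-error assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    L, N = 10, 200_000
    eps = rng.normal(size=(L, N))           # uncorrelated zero-mean member errors
    e_av = (eps ** 2).mean()                # average individual squared error
    e_com = (eps.mean(axis=0) ** 2).mean()  # committee (average) squared error
    print(e_av, e_com)                      # e_com is close to e_av / L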
  71. 71. Committee Machine 71
  72. 72. Randomised Decision Forests (Basics)
  73. 73. Binary Decision Trees (diagram: a binary tree with split nodes 1–8 and leaf nodes 9–17; a data vector v is routed left/right at each split by comparing against a threshold, and reaches a leaf storing a class distribution) • data vector v • split functions fn(v) • thresholds tn • Classifications Pn(c) Slide credit to J. Shotton 73
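A minimal sketch of routing a data vector down such a tree (the node layout and names are illustrative):

    import numpy as np

    def traverse(node, v):
        # Each internal node holds a split function f(v) and threshold t;
        # each leaf holds a class distribution P(c).
        while not node["is_leaf"]:
            node = node["right"] if node["f"](v) >= node["t"] else node["left"]
        return node["P"]

    leaf0 = {"is_leaf": True, "P": np.array([0.9, 0.1])}
    leaf1 = {"is_leaf": True, "P": np.array([0.2, 0.8])}
    root = {"is_leaf": False, "f": lambda v: v[0], "t": 0.5,
            "left": leaf0, "right": leaf1}
    print(traverse(root, np.array([0.7, 0.3])))   # -> [0.2, 0.8]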
  74. 74. Randomized Tree Learning • Train data: • Recursive algorithm – the set In of training examples that reach node n is split into Il and Ir: • Features f and thresholds t chosen at random • Choose f and t to maximize gain in information (diagram: class histograms of In and of the left/right splits Il, Ir) 74 Slide credit to J. Shotton
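A sketch of one split selection, choosing among randomly sampled (feature, threshold) pairs by information gain (names illustrative):

    import numpy as np

    def entropy(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def best_random_split(X, y, n_trials=100, rng=np.random.default_rng()):
        best, n = None, len(y)
        for _ in range(n_trials):
            f = rng.integers(X.shape[1])                   # feature at random
            t = rng.uniform(X[:, f].min(), X[:, f].max())  # threshold at random
            left = X[:, f] < t
            if left.all() or (~left).all():
                continue
            # information gain = parent entropy - weighted child entropies
            gain = entropy(y) - (left.sum() / n) * entropy(y[left]) \
                              - ((~left).sum() / n) * entropy(y[~left])
            if best is None or gain > best[0]:
                best = (gain, f, t)
        return best   # (gain, feature, threshold)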
  75. 75. • Forest is an ensemble of decision trees • Bagging (Bootstrap AGGregatING) – For each tree, it randomly draws examples from the training set with replacement, so some examples are duplicated and others left out. Forest of Trees : Learning (diagram: data set DS resampled into DS1 … DST, one per tree t1 … tT) 75
  76. 76. • Forest is an ensemble of decision trees – Classification is done by averaging the tree posteriors: p(c|v) = (1/T) Σt pt(c|v) Forest of Trees : Recognition (diagram: v descends each of tree t1 … tT through split nodes to a leaf holding a category distribution) 76
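Putting the pieces together, a toy forest that bootstraps the data per tree and averages leaf posteriors — a sketch built on the best_random_split helper above (binary labels 0/1 assumed for brevity), not a tuned implementation:

    import numpy as np

    def grow_tree(X, y, depth=0, max_depth=5, rng=np.random.default_rng()):
        if depth == max_depth or len(np.unique(y)) == 1:
            return {"is_leaf": True, "P": np.bincount(y, minlength=2) / len(y)}
        split = best_random_split(X, y, rng=rng)   # from the earlier sketch
        if split is None:
            return {"is_leaf": True, "P": np.bincount(y, minlength=2) / len(y)}
        _, f, t = split
        left = X[:, f] < t
        return {"is_leaf": False, "f": f, "t": t,
                "left": grow_tree(X[left], y[left], depth + 1, max_depth, rng),
                "right": grow_tree(X[~left], y[~left], depth + 1, max_depth, rng)}

    def grow_forest(X, y, T=10, rng=np.random.default_rng()):
        # bagging: draw N examples with replacement for each tree
        boots = [rng.integers(len(y), size=len(y)) for _ in range(T)]
        return [grow_tree(X[i], y[i], rng=rng) for i in boots]

    def forest_posterior(forest, v):
        leaf = lambda n: n if n["is_leaf"] else \
            leaf(n["right"] if v[n["f"]] >= n["t"] else n["left"])
        return np.mean([leaf(tree)["P"] for tree in forest], axis=0)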
  77. 77. Decision Tree vs Random Forest 77 (figure: a single decision tree overfits, while averaging trees t1, t2, …, tT yields a reasonably smooth decision boundary)
  78. 78. Wrap-up
  79. 79. 79 (diagram relating the families along the axes Boosting / Bagging / Tree hierarchy: a decision stump, boosted decision stumps, decision tree, boosted decision trees, random forest, bagged boosting classifiers, random ferns?) Inspired by Yin, Criminisi CVPR07
  80. 80. Comparisons in literature • In motion segmentation [Yin, Criminisi CVPR07], • RF>GB(Boosting stumps)>=BT(Boosting trees) in accuracy • RF=GB>BT in speed at their best accuracy • In face detection/recognition [Belle et al ICPR08], • RF>AB (AdaBoost) at low false positive rate, • AB>RF at high false positive rate, in face detection accuracy • RF>AB in speed • SVM>RF in face recognition 80
  81. 81. Summary 81
  Random Forest — Pros: generalization through random samples/features; very fast classification; inherently multi-class; simple training. Cons: inconsistency; difficulty for adaptation.
  Boosting Decision Stumps — Pros: generalisation by a flat structure; fast classification; optimisation framework. Cons: slower than RF; slow training.
  82. 82. PART II Randomised Forests (by Case Studies) 82
  83. 83. Part II (Recap) 83 (overview diagram, as before: split node f(x)/ΔE, In → Il, Ir; trees 1 … N; leaf node p(c); more discriminative split functions BMVC12; multiple split criteria / adaptive weighting CVPR13, ICCV13; capturing structural info. BMVC10; probabilistic outputs ICCV11, ECCV10)
  84. 84. Random Forest Codebook 1. Capturing Structural + Semantic Info. 84 p(c) Leaf node … Capturing structural info. BMVC10
  85. 85. Real-time Action Categorisation 85 T. Yu, T-K Kim, R. Cipolla, BMVC 2010 Univ. of Cambridge, Imperial College
  86. 86. A novel real-time solution for action recognition • Our contribution is three-fold: Efficient codebook learning High run-time performance Local appearance + structural information 86
  87. 87. Comparison with existing approaches 87
  Typical approaches — Feature encoding: K-means clustering (slow for a large codebook; quantisation error); Feature matching: the “Bag of Words” (BOW) model (lacks structural information).
  Our method — Feature encoding: Semantic Texton Forest (efficient); Feature matching: PSRM (structural information; hierarchical matching; robust).
  88. 88. Overview Spatiotemporal Semantic Texton Forest V-FAST Corner PSRM BOST Random Forest Classifier K-means Forest Results Spatio-temporal Cuboids Feature detection Feature extraction Feature matching Classification 88
  89. 89. Building a codebook using STF • Extract small video cuboids at detected keypoints • Visual codebook using STF: • Efficient visual codebook • One feature → multiple codewords. • Quantisation and partial matching Random forest based codebook • Work on pixels directly • Hierarchical splits “Textonises” patches recursively 89
  90. 90. Randomized Forest Codebook (diagram: example cuboids extracted at FASTEST corners descend trees t1 … tT; the node indices they visit serve as codewords)
  91. 91. Histogram of Forest Codewords (diagram: each cuboid descends trees t1 … tT; per-tree histograms of node-index frequencies are concatenated into a “bag of words”) Extremely Fast • Powerful Discriminative Codebook
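A sketch of a forest codebook: each descriptor is dropped through every tree and the visited codeword indices are histogrammed. This simplification counts only leaves, whereas Semantic Texton Forests also count internal nodes; names are illustrative:

    import numpy as np

    def tree_codeword(tree, v):
        # Walk to a leaf; leaves carry a codeword index assigned at training time.
        node = tree
        while not node["is_leaf"]:
            node = node["right"] if v[node["f"]] >= node["t"] else node["left"]
        return node["codeword"]

    def bag_of_forest_codewords(forest, descriptors, n_codewords_per_tree):
        # One histogram per tree, concatenated into a single BoW vector.
        hist = np.zeros(len(forest) * n_codewords_per_tree)
        for v in descriptors:
            for t, tree in enumerate(forest):
                hist[t * n_codewords_per_tree + tree_codeword(tree, v)] += 1
        return hist / max(len(descriptors), 1)   # frequency normalisation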
  92. 92. Pyramidal Spatiotemporal Relationship Match (PSRM) 92 (diagram: per-tree codeword maps)
  93. 93. Multiple Structural Relationship Histograms Pyramid Match Kernel (PMK) Pyramidal Spatiotemporal Relationship Match (PSRM) 93
  94. 94. Experiments • Short video sequences (50 frames ~ 2 seconds) are extracted from the input video. • Sampling frequency is 5 frames for experiment and 1 frame for the laptop demo. (so it is a frame-by-frame recognition) • Two datasets are used for performance evaluation: • KTH dataset — the standard benchmark; six classes, with viewpoint changes, illumination changes, zoom, etc. • UT interaction dataset (for ICPR contest on Semantic Description of Human Activities 2010) — human interactions, 6 classes of actions, cluttered background • Hardware specifications: Intel Core i7 920 (for accuracy and speed tests); Core 2 Duo P9400 (for laptop demo) 94
  95. 95. Experiments: Results (KTH dataset) — comparison with recent state-of-the-art (accuracy, %): Mined features (ICCV2009) 96.7; CCA (CVPR2007) 95.33; Neighbourhood (CVPR2010) 94.53; Info. Max. (CVPR2008) 94.15; Shape-motion tree (ICCV2009) 93.43; Vocabulary Forest (CVPR2008) 93.17; Point clouds (CVPR2009) 93.17; our method (sequence) 95.67; our method (snippets) 93.55 • Comparable to most state-of-the-art. • Around ~3% below the top performer • Is it a sensible trade-off? • Useful for many more practical applications. (surveillance, robotics, etc.) snippet: subsequence level recognition; sequence: majority voting of subsequence labels; leave-one-out cross-validation 95
  96. 96. Experiments: Results • Results: UT interaction dataset • Run time performance PSRM and BOST gave low accuracies when applied separately. ~20% performance improved by simply combining the class labels! < 25 fps, but enough for most real-time applications Can be further optimised (e.g. GPU, multi-core processing) 96
  97. 97. 2. Probabilistic Interpretation of RF Output 98 p(c) Leaf node … Probabilistic outputs ICCV11ECCV10
  98. 98. Object Phenotype Recognition using 3D Shape Priors 99 Y. Chen, T-K Kim, R. Cipolla, ICCV 2011, ECCV 2010 Univ. of Cambridge, Imperial College
  99. 99. Pose Estimation • Signal: Poses • Noise: Shape variations + camera view points Challenges: Pose and viewpoint variations >> phenotype variations Phenotype Recognition vs • Signal: Shapes • Noise: Poses + camera view points 100
  100. 100. • Shape Generator (MS) by GPLVM – Training Set: shapes in the canonical pose. – Input: PCA coefficients of vertex positions. Learning 3D Shape Priors 101
  101. 101. • Pose Generator (MA) by GPLVM – Training Set: Synthetic 3D poses sequence. – Input: PCA coefficients of vertex positions and vertex-wise Jacobian matrices. Learning 3D Shape Priors 102
  102. 102. The Graphical Model Shape Synthesis Pose Generator Shape Generator
  103. 103. Shape Synthesis: Demo Shape Generator Pose Generator (Running) 104
  104.–112. Shape Synthesis: Demo (the same demo slide, Shape Generator + Pose Generator (Running), repeated) 105–113
  113. 113. Shape Synthesis (diagram: the zero shape V0 is deformed by the pose generator MA and the shape generator MS, producing VA and VS, which are combined by shape transfer into the final shape V) 114
  114. 114. Solution: Two-Stage Model • Main Ideas: Discriminative + Generative • Two stages: 1. Hypothesising – Discriminative; – Using random forests; 2. Shape Synthesis and Verification – Generative; – Synthesising 3D shapes using shape priors; – Silhouette verification. • Recognition by a model selection process. 115
  115. 115. • Use 3 RFs to quickly hypothesize phenotype, pose, and camera parameters. • Learned on synthetic silhouettes generated by the shape priors. Parameter Hypothesizing FA: Pose classifier FC: Camera pose classifier FS: Phenotype classifier (canonical pose) 116
  116. 116. Examples of RF Classifiers The phenotype classifier The pose classifier 117
  117. 117. Shape Synthesis and Verification • Generate 3D shapes V – From candidate parameters given by RFs. – Use GPLVM shape priors [Chen et al.’10]. • Compare the projection of V with the query silhouette Sq. – Oriented Chamfer matching (OCM). [Stenger et al’03] 118
  118. 118. • We infer the label c∗ of the query instance by maximizing the posterior probability 119
  119. 119. • We infer the label c∗ of the query instance by maximizing the posterior probability 120
  120. 120. Experiments • Testing data: – Manually segmented silhouettes; • Current Datasets – Human jumping jack • (13 instances, 170 images); – Human walking • (16 instances, 184 images); – Shark swimming • (13 instances, 168 images). • Phenotype Categorisation 121
  121. 121. Comparative Approaches: • Learn a single RF phenotype classifier; • Histogram of Shape Context (HoSC) – [Agarwal and Triggs, 2006] • Inner-Distance Shape Context (IDSC) – [Ling and Jacob, 2007] • 2D Oriented Chamfer matching (OCM) – [Stenger et al. 2006] • Mixture of Experts for the shape reconstruction – [Sigal et al. 2007]. – Modified into a recognition algorithm 122
  122. 122. Comparative Approaches: • Internal comparisons: – Proposed method with both feature error modelling and similarity-aware criteria function (G+D); – Proposed method w.o feature error modelling (G+D–E); – Proposed method w.o similarity-aware criteria function (G+D–S) • Using standard diversity index instead. 123
  123. 123. Recognition Performance • Cross-validation by splitting the dataset instances. • 5 phenotype categories for every test. • Selecting one instance from each category. 124
  124. 124. Experiments on Single View Reconstruction Sharks: 125
  125. 125. Experiments on Single View Reconstruction Humans: 126
  126. 126. 3. More discriminative split functions 127 f(x) ΔE In Il Ir Split node More discriminative split functions BMVC12
  127. 127. Classification forest: the split function model Split ftn: axis aligned Split ftn: oriented line Split ftn: conic section Examples of split functions See Appendix C for relation with kernel trick. Slide credit to Criminisi and Shotton, ICCV11 tutorial
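Minimal illustrations of the three split-function geometries named on this slide (a sketch; the parameter names are not from the tutorial):

    import numpy as np

    def axis_aligned(v, d, t):
        return v[d] >= t                      # threshold a single coordinate

    def oriented_line(v, w, t):
        return w @ v >= t                     # linear: threshold a projection

    def conic_section(v, A, b, c):
        return v @ A @ v + b @ v + c >= 0     # quadratic surface in feature space

    v = np.array([1.0, 2.0])
    print(axis_aligned(v, 0, 0.5),
          oriented_line(v, np.array([1.0, -1.0]), 0.0),
          conic_section(v, np.eye(2), np.zeros(2), -4.0))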
  128. 128. Fast Pedestrian Detection by Cascaded Random Forest with Dominant Orientation Templates 129 D. Tang, Y. Liu, T-K Kim, BMVC 2012 Imperial College
  129. 129. Beginning with Hough Forest … [Gall et al 2009] Random Forest + Implicit Shape Model Multi-cue features (HOG etc) Hough voting Patch input 130 ...≤ τ > τ ≤ τ > τ 2 pixel test (axis aligned splits) i j f(p)=I(i)-I(j) c
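The two-pixel test is as cheap as splits get; a sketch for an image patch p (the pixel indices i, j are illustrative):

    import numpy as np

    def two_pixel_test(patch, i, j, tau):
        # f(p) = I(i) - I(j): compare an intensity difference to a threshold
        f = int(patch[i]) - int(patch[j])
        return f > tau    # route the patch to the right child if True

    patch = np.random.default_rng(0).integers(0, 256, size=(16, 16))
    print(two_pixel_test(patch, (3, 4), (10, 12), 20))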
  130. 130. Experiments - Accuracy 1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching; 4. Dominant Colours; 5. Cascade; 6. Clustered Votes. 131
  131. 131. Experiments - Efficiency 132
  132. 132. Multi-cue features (HOG etc) Hough voting Patch input 133 ...≤ τ > τ ≤ τ > τ 2 pixel test (axis aligned splits) i j f(p)=I(i)-I(j) Beginning with Hough Forest … [Gall et al 2009] To accelerate run-time speed
  133. 133. Extract gradient information Discretize dominant orientations into 1 byte Discard magnitude information 134 Binarization of HOG… Dominant Orientation Templates [Hinterstoisser, et al, CVPR10]
  134. 134. DOT Hough voting Patch input 135 ...≤ τ > τ ≤ τ > τ 2 pixel test (axis aligned splits) i j f(p)=I(i)-I(j) Beginning with Hough Forest … [Gall et al 2009] To accelerate run-time speed
  135. 135. Experiments - Accuracy 1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching; 4. Dominant Colours; 5. Cascade; 6. Clustered Votes. 136
  136. 136. Template matching split function 2 pixel test (axis aligned splits), efficient but incapable… 137
  137. 137. Template matching split function Template based (nonlinear), capable but efficient? 138
  138. 138. 139 Template Matching Split Function using Binary Bit Operations (example: template bits 0 0 1 0 0 1 0 0 vs patch bits 0 0 1 0 1 0 0 0 → 1 matching set bit; the match count is compared to τ: ≤ τ / > τ)
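With dominant orientations packed one byte per cell, template similarity reduces to AND plus popcount; a sketch (names illustrative):

    import numpy as np

    def dot_similarity(template, patch):
        # template/patch: uint8 arrays, one bit per quantised orientation bin.
        # A cell matches if template and patch share any dominant orientation.
        both = np.bitwise_and(template, patch)
        return int(np.unpackbits(both).sum())        # popcount over all cells

    def template_split(template, patch, tau):
        # nonlinear split evaluated with binary bit operations only
        return dot_similarity(template, patch) > tau

    t = np.array([0b00100100], dtype=np.uint8)
    p = np.array([0b00101000], dtype=np.uint8)
    print(dot_similarity(t, p))   # -> 1, as in the slide's example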
  139. 139. DOT Hough voting Patch input 140 ...≤ τ > τ ≤ τ > τ Template Matching Split Beginning with Hough Forest … [Gall et al 2009] To accelerate run-time speed
  140. 140. Experiments - Accuracy 1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching; 4. Dominant Colours; 5. Cascade; 6. Clustered Votes. 141
  141. 141. Experiments - Efficiency 142
  142. 142. DOT Hough voting Patch input 143 ...≤ τ > τ ≤ τ > τ Template Matching Split Beginning with Hough Forest … [Gall et al 2009] To accelerate run-time speed
  143. 143. Architecture DOT Feature Extraction from Input ... Holistic Random Forest ⊗ ... Patch-based Random Forest ⊗ Hough VotingOutput Applied DOT to pedestrian detection, and extended to colour domain Template matching split function Cascade of detectors Clustering hough votes (similar to Girshick et al, ICCV11) 144
  144. 144. Experiments - Accuracy 1. Dominant Orientations; 2. 2-Pixel Test; 3. Template Matching; 4. Dominant Colours; 5. Cascade; 6. Clustered Votes. 145
  145. 145. Experiments - Efficiency Time per window: 1.07×10⁻⁶ sec 5 Frames per sec Cf. Fastest Pedestrian Detector in the West (FPDW) [Dollar 2010]: 5 fps [Benenson et al 2012]: 50 fps by GPU 146
  146. 146. 147
  147. 147. 4. Multiple split criteria 148 f(x) ΔE In Il Ir Split node Multiple split criteria /adaptive weighting CVPR13 ICCV13
  148. 148. Unconstrained Monocular 3D Human Pose Estimation by Action Detection T. Yu, T-K Kim, R. Cipolla, CVPR 2013 Univ. of Cambridge, Imperial College
  149. 149. Challenges Typical techniques for 3D human pose estimation: • Segmentation • Multi-camera approach • Depthmaps • Inverse kinematics Multiple cameras with inverse kinematics [Bissacco et al. CVPR2007] [Yao et al. IJCV2012] [Sigal IJCV2011] Learning-based (regression) [Navaratnam et al. BMVC2006] [Andriluka et al. CVPR2010] Specialized hardware (e.g. structured light sensor, TOF camera) [Baak et al. ICCV2011] [Ye et al. CVPR2011] [Sun et al. CVPR2012] Require controlled environments / special hardware!
  150. 150. Our Approach • Action recognition + detection 1. Actions are temporally and spatially structured • Action recognition: select the best manifold for pose regression. [Yao et al. IJCV2012] • Action detection: detect the occurrence of an action in space-time [Yang, ICCV2009] Actions provide a strong prior for pose estimation
  151. 151. Our Approach 2. Unconstrained part detection • Low level feature descriptors are replaced by a deformable part model (DPM) to detect high level 2D body parts in the video (e.g. Pictorial structure [Felzenszwalb and Huttenlocher, IJCV2005]) 2D HPE •Body parts detection (The pose is known by detecting body parts) •A problem of 2D detection and tracking 3D HPE •Full 3D pose estimation •A problem of optimization and manifold learning [Eichner et al. IJCV2012] [Yang and Ramanan.ICCV2011] [Andriluka et al. CVPR2009]
  152. 152. System overview
  153. 153. Method: Feature extraction Input: • Action snippet, short, overlapping video subsequences [Schindler CVPR2009] • A DPM, using HoG, as an off-the-shelf 2D body part detector. (Any 2D part detector will do) Feature representation • Pairwise displacements between parts – 325 dimensions • Dimensionality reduction: PCA – 6 dimensions (separately per action) For each frame • A pose is represented by a bag of feature vectors extracted from the “action snippet”
  154. 154. Action Detection Action as an atomic sequence of shape Each detection provides the spatiotemporal structure of an action. Hough Forest: Each frame in the video votes for the current pose. ACTION: CLAP ACTION: DANCE Locate space-time position Time Classify action Reference Starting Time
  155. 155. Method: Action Detection •Randomized Hough Forest for action recognition and pose estimation [Gall et al. PAMI2011] • Action and pose are estimated from a feature vector simultaneously. • At each split, the quality measure combines two terms: • Ha, the information gain used in a standard classification forest • and Hp, which describes the spread of pose in a node (the covariance of the joint positions)
  156. 156. Method: Action Detection •Randomized Hough Forest for action recognition and pose estimation [Gall et al. PAMI2011] • ω is the purity of the current tree node • Each terminal node gives 1. The posterior of the action P(action class | features) 2. A vote of estimated pose Action detection Action recog.
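A sketch of such a two-term split quality, adaptively weighted by node purity ω. This only illustrates the idea; the paper's exact formula and the normalisation between the two terms (which have different scales) may differ:

    import numpy as np

    def entropy(y):
        _, c = np.unique(y, return_counts=True)
        p = c / c.sum()
        return -(p * np.log2(p)).sum()

    def pose_spread(P):
        # P: (N, 3J) joint positions; spread = trace of the covariance
        return float(np.trace(np.cov(P.T))) if len(P) > 1 else 0.0

    def split_quality(y, P, left):
        n, nl = len(y), left.sum()
        # H_a: classification information gain (action labels)
        Ha = entropy(y) - nl / n * entropy(y[left]) \
                        - (n - nl) / n * entropy(y[~left])
        # H_p: reduction in pose spread (regression term)
        Hp = pose_spread(P) - nl / n * pose_spread(P[left]) \
                            - (n - nl) / n * pose_spread(P[~left])
        omega = np.bincount(y).max() / n   # node purity steers the mix
        return omega * Ha + (1 - omega) * Hp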
  157. 157. Method: Action Detection • (Testing) • Pass down feature vectors extracted from a video snippet • Action classification: same as typical randomized classification forest. • Pose: Distributions of votes (joint positions) is modeled as 3D Gaussians.
  158. 158. Pose Regression & Combination Global: Action detection Local: Regression Forests Final 3D pose estimation: A mixture of 3D Gaussians Joint positions are refined locally by regression forests
  159. 159. Method: Pose Refinement •Randomized Regression Forest for local joint position refinement •Differences among individual action instances (of the same class) are corrected using regression forests [Criminisi et al. MICCAI2010] • Input: • Training feature vectors • Corresponding 3D joint locations • Each forest refines one joint location • so there are 15 forests (150 trees!). • Output: Joint locations in 3D Gaussians
  160. 160. Method: Final Pose Estimation •Compute the final 3D pose estimation •Assuming the pose distributions from (1) Action detection, and (2) Pose regression are independent • the final pose estimation is computed by the joint distribution of Gaussians in (1) and (2) •Traditional method: single output, global distribution. •Our approach produces distributions of local "confidence levels" of joint locations.
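Under the stated independence assumption, combining the two Gaussian estimates per joint is a precision-weighted product; a minimal sketch (names illustrative):

    import numpy as np

    def fuse_gaussians(mu1, cov1, mu2, cov2):
        # Product of two independent Gaussian estimates of the same joint:
        # precision-weighted mean and combined covariance.
        p1, p2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
        cov = np.linalg.inv(p1 + p2)
        mu = cov @ (p1 @ mu1 + p2 @ mu2)
        return mu, cov

    # e.g. action-detection vote vs regression-forest refinement for one joint
    mu, cov = fuse_gaussians(np.array([0.1, 1.2, 0.9]), 0.04 * np.eye(3),
                             np.array([0.0, 1.0, 1.0]), 0.01 * np.eye(3))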
  161. 161. Experiments • A relatively new problem • No public dataset that provides both 2D and 3D ground truths • We collected a new challenging dataset (APE dataset) • 2D and 3D “ground truths” from Kinect • 7 action classes, 7 different subjects, 5 sequences / class / subject • Different environment for each subject • No segmentation can be applied • Moving backgrounds and occlusions • Unseen testing environments --- traditional methods do not work in this dataset! Bend Box Wave 1 Wave 2 BalanceDance Clap
  162. 162. Experiments • Comparison with the state-of-the-art 2D pose estimator • A better accuracy is observed: Action detection (and joint refinement), which gives strong cues to the 3D pose • Limitation: the action performed needs to be seen during the training phase. Per-class localization error (pixels)
  163. 163. Experiments • Demo Video 1. APE dataset 2. Extra challenges: KTH and Weizmann dataset 1. Originally used in action recognition 2. Extremely low resolution for typical 3D HPE KTH dataset: 160 x 120 pixels, each subject is ~80-100 pixels high. Weizmann dataset: 180 x 144 pixels, each subject is about ~60-75 pixels high. KTH action dataset Weizmann action dataset
  164. 164. 165
  165. 165. 166 Multiple Split Criteria / Adaptive Weighting … goes on … For Hand Pose Estimation
  166. 166. 167
  167. 167. 168 Summary
  168. 168. Part I (summary diagram, as before): Boosting cascade t1, t2, …, tT → Boosting → Decision tree → Randomised Decision Forest [IJCV 2012] [NIPS 2008] 169
  169. 169. Part II (summary diagram, as before: split node f(x)/ΔE, In → Il, Ir; trees 1 … N; leaf node p(c); more discriminative split functions BMVC12; multiple split criteria / adaptive weighting CVPR13, ICCV13; capturing structural info. BMVC10; probabilistic outputs ICCV11, ECCV10) 170
  170. 170. 171 What other topics @ ICL 1 2 3
  171. 171. Thanks to Jamie Shotton Roberto Cipolla Wenhan Luo Akis Tsiotsios Yuru Pei Yang Liu Tsz-Ho Yu Danhang Tang Yu Chen A. Doumanoglou Chanki Yu Chao Xiong Mang Shao Kyuhwa Lee Alykhan Tejani Ignas Budvytis Tom Woodley Bjorn Stenger
  172. 172. 173 Any Questions? http://www.iis.ee.ic.ac.uk/icvl