Recognizing Actions Across Cameras
by Exploring the Correlation Subspace
4th International Workshop on Video Event Categorization,
Tagging and Retrieval (VECTaR), in conjunction with ECCV 2012
Chun-Hao Huang, Yi-Ren Yeh, and Yu-Chiang Frank Wang
Research Center for IT Innovation, Academia Sinica, Taiwan
Oct 12th, 2012
Outline
• Introduction
• Our Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with A Novel Correlation Regularizer
• Experiments
• Conclusion
2
Outline
• Introduction
• Our Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with A Novel Correlation Regularizer
• Experiments
• Conclusion
3
Representing an Action
4
• Actions are represented as high-dim vectors.
• Bag of spatio-temporal visual word model.
• State-of-the-art classifiers (e.g., SVM) are applied to
address the recognition task.
• Spatio-temporal interest points [Laptev, IJCV, 2005] [Dollár et al., ICCV WS on VS-PETS, 2005]
(a minimal sketch of this representation follows below)
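For concreteness, here is a minimal sketch of the bag-of-spatio-temporal-visual-words step this slide refers to, assuming local descriptors around detected interest points have already been extracted; the codebook size (1000) and the use of scikit-learn's KMeans are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, n_words=1000, seed=0):
    """Quantize local spatio-temporal descriptors (one per row) into visual words."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(all_descriptors)

def bow_histogram(codebook, video_descriptors):
    """Represent one action video as an L1-normalized histogram over visual words."""
    words = codebook.predict(video_descriptors)            # one word index per interest point
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```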
Cross-Camera Action Recognition
5
Source view / Target view
• Models learned at source views typically do not generalize well at target views.
[Figure: source-view actions (check watch, punch, kick) v_1^s, v_2^s, v_3^s in feature space 𝒳^s ∈ ℝ^{d_s}; target-view actions v_1^t, v_2^t, v_3^t in 𝒳^t ∈ ℝ^{d_t}. Colored: labeled data; uncolored: test data.]
Cross-Camera Action Recognition (cont’d)
6
Source view / Target view
• An unsupervised strategy (one branch of transfer learning):
 Only unlabeled data are available at the target view.
 They are exploited to learn the relationship between data at the source and target views.
[Figure legend: colored = labeled data; gray = unlabeled data; uncolored = test data.]
Approaches based on Transfer Learning
• Learn a common feature representation (e.g., a joint subspace) for both source- and target-view data.
• Training/testing can be performed in terms of such representations.
• How to exploit unlabeled data from both views for determining this
joint subspace is the key issue.
• Previous approaches:
1. Split-based feature transfer [Farhadi and Tabrizi, ECCV ‘08]
 Requires frame-wise correspondence
2. Bag of bilingual words model (BoBW) [Liu et al., CVPR ‘11 ]
 Considers each dimension of the derived representation to be equally important.
7
Outline
• Introduction
• Our Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with A Novel Correlation Regularizer
• Experiments
• Conclusion
8
Overview of Our Proposed Method
9
1. Learn a joint subspace via canonical correlation analysis (CCA)
2. Project the source labeled data onto it
3. Learn a new SVM with constraints on domain transfer ability
4. Prediction
[Figure: source view 𝒳^s ∈ ℝ^{d_s} and target view 𝒳^t ∈ ℝ^{d_t} are both mapped into the correlation subspace 𝒳_c ∈ ℝ^d, where the projected actions v_1^{s,t}, v_2^{s,t} reside.]
Requirements of CCA
10
• CCA requires unlabeled data pairs: actions observed simultaneously by both cameras (i.e., at both views).
[Figure: source view vs. target view; colored = labeled data, gray = paired unlabeled data, uncolored = test data.]
Learning the Correlation Subspace via CCA
11
• CCA aims at maximizing the correlation between two sets of variables.
• Given two sets of n centered, paired unlabeled observations
  $\mathbf{X}^s = [\mathbf{x}_1^s, \ldots, \mathbf{x}_n^s] \in \mathbb{R}^{d_s \times n}$ and $\mathbf{X}^t = [\mathbf{x}_1^t, \ldots, \mathbf{x}_n^t] \in \mathbb{R}^{d_t \times n}$,
• CCA learns two projection vectors $\mathbf{u}^s$ and $\mathbf{u}^t$ maximizing the correlation coefficient ρ between the projected data, i.e.,
  $$\max_{\mathbf{u}^s,\,\mathbf{u}^t}\ \rho = \frac{\mathbf{u}^{s\top}\mathbf{X}^s\mathbf{X}^{t\top}\mathbf{u}^t}{\sqrt{\mathbf{u}^{s\top}\mathbf{X}^s\mathbf{X}^{s\top}\mathbf{u}^s}\,\sqrt{\mathbf{u}^{t\top}\mathbf{X}^t\mathbf{X}^{t\top}\mathbf{u}^t}} = \frac{\mathbf{u}^{s\top}\boldsymbol{\Sigma}_{st}\,\mathbf{u}^t}{\sqrt{\mathbf{u}^{s\top}\boldsymbol{\Sigma}_{ss}\,\mathbf{u}^s}\,\sqrt{\mathbf{u}^{t\top}\boldsymbol{\Sigma}_{tt}\,\mathbf{u}^t}},$$
  where $\boldsymbol{\Sigma}_{ss} = \mathbf{X}^s\mathbf{X}^{s\top}$, $\boldsymbol{\Sigma}_{st} = \mathbf{X}^s\mathbf{X}^{t\top}$, and $\boldsymbol{\Sigma}_{tt} = \mathbf{X}^t\mathbf{X}^{t\top}$ are the (cross-)covariance matrices. (An off-the-shelf sketch of this step follows below; the closed-form solution is given in the appendix slides.)
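As a quick way to reproduce this step, the sketch below fits CCA with scikit-learn on the paired unlabeled observations and estimates the per-dimension correlation coefficients ρ_i. Note that scikit-learn expects samples in rows (the slide uses column-sample matrices) and solves CCA iteratively rather than by the eigen-decomposition given in the appendix slides, so treat this as an illustrative substitute rather than the authors' implementation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def learn_correlation_subspace(Xs_unlab, Xt_unlab, d=30):
    """Xs_unlab: (n, d_s), Xt_unlab: (n, d_t) paired unlabeled actions seen by both cameras."""
    cca = CCA(n_components=d, scale=True, max_iter=1000)
    cca.fit(Xs_unlab, Xt_unlab)
    # Estimate rho_i on the unlabeled pairs: correlation of the i-th projected coordinates.
    Zs, Zt = cca.transform(Xs_unlab, Xt_unlab)
    rho = np.array([np.corrcoef(Zs[:, i], Zt[:, i])[0, 1] for i in range(d)])
    return cca, rho
```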
CCA Subspace as Common Feature Representation
12
• Source-view data are projected into the correlation subspace 𝒳_c ∈ ℝ^d as $\mathbf{P}^{s\top}\mathbf{x}^s$, and target-view data as $\mathbf{P}^{t\top}\mathbf{x}^t$.
• Each dimension i of the correlation subspace is associated with a triple $(\rho_i, \mathbf{u}_i^s, \mathbf{u}_i^t)$.
• The projection matrices stack the learned directions:
  $$\mathbf{P}^s = [\mathbf{u}_1^s \cdots \mathbf{u}_d^s] \in \mathbb{R}^{d_s \times d}, \qquad \mathbf{P}^t = [\mathbf{u}_1^t \cdots \mathbf{u}_d^t] \in \mathbb{R}^{d_t \times d}.$$
(A sketch of projecting and classifying in this subspace follows below.)
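Given the projection matrices P^s and P^t (columns u_i^s and u_i^t, however they were obtained), projection and classification reduce to two matrix products. The sketch below trains a standard linear SVM on the projected source data, which corresponds to the "CCA + SVM" baseline (method D in the experiments); LinearSVC and the row-major data layout are illustrative choices.

```python
from sklearn.svm import LinearSVC

# Ps: (d_s, d) and Pt: (d_t, d) projection matrices whose columns are u_i^s and u_i^t.
def train_in_correlation_subspace(Ps, Xs_labeled, y_labeled, C=1.0):
    Zs = Xs_labeled @ Ps                        # each row becomes P_s^T x_i^s
    return LinearSVC(C=C).fit(Zs, y_labeled)    # the "CCA + SVM" baseline (method D)

def predict_target(clf, Pt, Xt_test):
    return clf.predict(Xt_test @ Pt)            # classify projected target-view data P_t^T x^t
```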
Outline
• Introduction
• The Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with A Novel Correlation Regularizer
• Experiments
• Conclusion
13
Domain Transfer Ability of CCA
• Learn SVMs in the derived CCA subspace…Problem solved?
- Yes and No!
• Domain Transfer Ability:
- In the CCA subspace, each dimension v_i^{s,t} is associated with a different correlation coefficient ρ_i.
- How well do classifiers learned in this subspace from the projected source-view data generalize to the projected target-view data?
• See the example below…
14
Outline
• Introduction
• The Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with a Novel Correlation Regularizer
• Experiments
• Conclusion
15
Our Proposed SVM with Domain Transfer Ability
16
• Proposed SVM formulation:
  $$\min_{\mathbf{w},\,b,\,\xi}\ \frac{1}{2}\|\mathbf{w}\|_2^2 \;-\; \mathbf{r}^\top \mathrm{Abs}(\mathbf{w}) \;+\; \frac{C}{2}\sum_{i=1}^{N}\xi_i^2$$
  $$\text{s.t.}\ \ y_i\left(\langle \mathbf{w},\, \mathbf{P}^{s\top}\mathbf{x}_i^s\rangle + b\right) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ \forall\, (\mathbf{x}_i^s, y_i) \in D_l$$
• The introduced correlation regularizer $\mathbf{r}^\top\mathrm{Abs}(\mathbf{w})$ uses
  $\mathrm{Abs}(\mathbf{w}) = [\,|w_1|, |w_2|, \ldots, |w_d|\,]^\top$ and $\mathbf{r} = [\rho_1, \rho_2, \ldots, \rho_d]^\top$.
• Larger/smaller ρ_i
  → stronger/weaker correlation between source- and target-view data
  → the SVM weight w_i is more/less reliable at that dimension of the CCA subspace.
• Our regularizer favors SVM solutions that are dominant in reliable CCA dimensions
  (i.e., larger correlation coefficients ρ_i imply larger |w_i| values).
• Classification of (projected) target-view test data: $f(\mathbf{x}^t) = \mathrm{sgn}\left(\langle \mathbf{w},\, \mathbf{P}^{t\top}\mathbf{x}^t\rangle + b\right)$
An Approximation for the Proposed SVM
17
• It is not straightforward to solve the previous formulation with Abs(w).
• An approximate solution can be derived by relaxing $\mathrm{Abs}(\mathbf{w}) \approx \tfrac{1}{2}(\mathbf{w} \odot \mathbf{w})$, where ⊙ indicates element-wise multiplication:
  $$\min_{\mathbf{w},\,b,\,\xi}\ \frac{1}{2}\|\mathbf{w}\|_2^2 \;-\; \frac{1}{2}\,\mathbf{r}^\top(\mathbf{w} \odot \mathbf{w}) \;+\; \frac{C}{2}\sum_{i=1}^{N}\xi_i^2$$
  $$\text{s.t.}\ \ y_i\left(\langle \mathbf{w},\, \mathbf{P}^{s\top}\mathbf{x}_i^s\rangle + b\right) \ge 1-\xi_i,\ \ \xi_i \ge 0,\ \ \forall\, (\mathbf{x}_i^s, y_i) \in D_l$$
• We can further simplify the approximated problem as:
  $$\min_{\mathbf{w},\,b,\,\xi}\ \frac{1}{2}\sum_{i=1}^{d}(1-\rho_i)\,w_i^2 \;+\; \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \quad \text{s.t. the same constraints as above.}$$
• We apply SSVM* to solve the above optimization problem (a numerical sketch follows below).
17
*: Lee et al., Computational Optimization and Applications, 2001
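A minimal numerical sketch of the relaxed problem above: since the simplified objective is a smooth quadratic with squared-hinge slack, it can be minimized with a generic solver. This is only a stand-in for the SSVM solver the authors actually use, and the per-dimension penalty (1 − ρ_i)w_i² follows the reconstruction of the slide's equations.

```python
import numpy as np
from scipy.optimize import minimize

def fit_correlation_regularized_svm(Zs, y, rho, C=1.0):
    """Zs: (N, d) source data already projected into the CCA subspace (rows are P_s^T x_i^s);
    y: labels in {-1, +1}; rho: (d,) correlation coefficients of the CCA dimensions."""
    N, d = Zs.shape
    reg = 1.0 - rho                                        # weight on w_i^2 per dimension

    def objective(theta):
        w, b = theta[:d], theta[d]
        slack = np.maximum(0.0, 1.0 - y * (Zs @ w + b))    # squared-hinge slack
        f = 0.5 * np.sum(reg * w ** 2) + 0.5 * C * np.sum(slack ** 2)
        gw = reg * w - C * (Zs.T @ (slack * y))            # analytic gradient w.r.t. w
        gb = -C * np.sum(slack * y)                        # gradient w.r.t. b
        return f, np.append(gw, gb)

    res = minimize(objective, np.zeros(d + 1), jac=True, method="L-BFGS-B")
    return res.x[:d], res.x[d]                             # (w, b)

def predict(w, b, Zt):
    return np.sign(Zt @ w + b)                             # f(x^t) = sgn(<w, P_t^T x^t> + b)
```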
Outline
• Introduction
• The Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with a Novel Correlation Regularizer
• Experiments
• Conclusion
18
Dataset
• IXMAS multi-view action dataset
 Action videos of eleven action classes
 Each action is performed three times by each of twelve actors
 The actions are captured simultaneously by five cameras
19
Experiment Setting
20
• Leave-one-class-out protocol (LOCO): the left-out action (e.g., Kick) is excluded when learning the correlation subspace (a split sketch follows below).
• 2/3 of the videos serve as unlabeled data for learning the correlation subspace via CCA (without the Kick action).
• 1/3 of the videos serve as labeled data for training and testing.
[Figure: paired source-/target-view videos of Check-watch, Scratch-head, Sit-down, …, plus the left-out Kick action.]
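The sketch below illustrates one way to realize this setting: permute the videos, put 2/3 in the unlabeled pool used for CCA and 1/3 in the labeled pool used for training and testing, and remove the left-out action class from the unlabeled pool (the "without Kick" case). Interpreting LOCO as removing the left-out class only from the subspace-learning data, and splitting at the video level, are assumptions here.

```python
import numpy as np

def loco_split(labels, left_out_class, unlabeled_ratio=2 / 3, seed=0):
    """labels: (n,) numpy array of action classes per paired video.
    Returns (unlabeled indices for CCA, labeled indices for training/testing)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_unlab = int(len(idx) * unlabeled_ratio)
    unlab, lab = idx[:n_unlab], idx[n_unlab:]
    # The left-out action is unseen when learning the correlation subspace.
    unlab = unlab[labels[unlab] != left_out_class]
    return unlab, lab
```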
Experimental Results
21
• A: BoW from the source view directly
• B: BoBW + SVM [Liu et al., CVPR ’11]
• C: BoBW + our SVM
• D: CCA + SVM
• E: our proposed framework (CCA + our SVM)

Recognition rates (%):

|      | camera0                       | camera1                       | camera2                       |
|      | A     B     C     D     E     | A     B     C     D     E     | A     B     C     D     E     |
| c0   | -                             | 9.29  60.96 63.03 63.18 64.90 | 11.62 41.21 50.76 56.97 60.61 |
| c1   | 10.71 58.08 59.70 66.72 70.25 | -                             | 7.12  33.54 38.03 57.83 59.34 |
| c2   | 8.79  52.63 49.34 57.37 62.47 | 6.67  50.86 45.79 59.19 61.87 | -                             |
| c3   | 6.31  40.35 44.44 65.30 66.01 | 9.75  33.59 33.27 46.77 52.68 | 5.96  41.26 43.99 61.36 61.36 |
| c4   | 5.35  38.59 40.91 54.39 55.76 | 9.44  37.53 37.00 53.59 55.00 | 9.19  34.80 38.28 57.88 60.15 |
| avg. | 7.79  47.41 48.60 60.95 63.62 | 8.79  45.73 44.77 55.68 58.61 | 8.47  37.70 42.77 58.51 60.37 |

|      | camera3                       | camera4                       |
|      | A     B     C     D     E     | A     B     C     D     E     |
| c0   | 7.78  39.65 41.36 63.64 62.17 | 7.12  24.60 37.02 43.69 48.23 |
| c1   | 12.02 35.91 39.14 48.59 54.85 | 8.89  26.87 22.22 44.24 49.29 |
| c2   | 6.46  41.46 42.78 60.00 61.46 | 10.35 28.03 33.43 45.05 51.82 |
| c3   | -                             | 8.89  27.53 28.28 40.66 41.06 |
| c4   | 9.60  27.68 34.60 48.03 48.89 | -                             |
| avg. | 8.96  36.17 39.47 55.06 56.84 | 8.81  26.76 30.24 43.41 47.60 |
Effects of the Correlation Coefficient ρ
22
• Example: source camera 3, target camera 2, left-out action: get-up.
• (a) Averaged |w_i| of the standard SVM vs. (b) averaged |w_i| of our SVM, plotted against the dimension index of the CCA subspace.
• Our formulation successfully suppresses |w_i| at dimensions with lower ρ_i.
• Recognition rates for the two models were 47.22% and 77.78%, respectively.
Outline
• Introduction
• The Proposed Framework
Learning Correlation Subspaces via CCA
Domain Transfer Ability of CCA
SVM with A Novel Correlation Regularizer
• Experiments
• Conclusion
23
Conclusions
• We presented a transfer-learning based approach to cross-
camera action recognition.
• We considered the domain transfer ability of CCA, and proposed
a novel SVM formulation with a correlation regularizer.
• Experimental results on the IXMAS dataset confirmed
performance improvements using our proposed method.
24
25
Thank You!
Representing an action
26
• Human body model [Mikić et al., IJCV, 2003] [Junejo et al., TPAMI, 2010]
Representing an action
27
• Spatio-temporal volumes:
   Space-time shapes [Blank et al., ICCV, 2005]
   Motion history volume [Weinland et al., CVIU, 2006]
Split-based feature transfer (ECCV ‘08)
28
[Figure: frame descriptors in ℝ^276 from each action video are quantized by K-means into source-view words v_1^s, v_2^s, v_3^s (𝒳^s ∈ ℝ^40) and target-view words (𝒳^t ∈ ℝ^40); a target instance is expressed in the source representation by matching frames according to their split-based features.]
How to construct split-based features
29
• Source view: each frame descriptor in ℝ^276 is reduced to ℝ^30 by 1000 different random projections; max-margin clustering assigns a binary (±1) split label per projection, and the best 25 random projections are kept, giving a split-based feature in {±1}^25 per frame.
• Target view: using the same best 25 random projections, SVMs are trained on unlabeled target frames (in ℝ^30) with the split-based features as labels, so target frames receive the same 25-dimensional split-based representation. (A rough sketch follows below.)
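A rough, assumption-laden sketch of the split construction described above. The original uses max-margin clustering and a transfer-aware criterion for picking the 25 best splits; here 2-means and the distance between cluster centers stand in for both, purely to make the data flow concrete.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_splits(src_frames, n_proj=1000, proj_dim=30, n_keep=25, seed=0):
    """src_frames: (n, 276) frame descriptors from the source view.
    Returns the kept random projections and an (n, n_keep) matrix of ±1 split labels."""
    rng = np.random.default_rng(seed)
    projections, separations, split_labels = [], [], []
    for _ in range(n_proj):
        R = rng.standard_normal((src_frames.shape[1], proj_dim))   # random projection to R^30
        km = KMeans(n_clusters=2, n_init=3, random_state=seed).fit(src_frames @ R)
        projections.append(R)
        separations.append(np.linalg.norm(km.cluster_centers_[0] - km.cluster_centers_[1]))
        split_labels.append(2 * km.labels_ - 1)                    # binary split in {-1, +1}
    best = np.argsort(separations)[-n_keep:]
    return [projections[i] for i in best], np.stack([split_labels[i] for i in best], axis=1)
```

On the target view, one SVM per kept split would then be trained on frames projected with the same 25 projections, using these ±1 split values as labels, so that target frames obtain the same 25-dimensional split-based representation.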
Bag of Bilingual Words (CVPR ‘11)
30
1. Exploit unlabeled data to model the two codebooks (source words v_1^s, …, v_{d_s}^s and target words v_1^t, …, v_{d_t}^t) as a bipartite graph
2. Perform spectral clustering
3. Construct the codebook of bilingual words
4. Train models and predict with this representation
(An illustrative sketch follows below.)
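To make the bilingual-words idea concrete, the sketch below builds a source-word/target-word co-occurrence matrix from paired unlabeled videos and co-clusters the resulting bipartite graph; scikit-learn's SpectralCoclustering stands in for the spectral clustering step, so this is an illustration of the idea rather than Liu et al.'s exact procedure.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def bilingual_words(src_word_ids, tgt_word_ids, n_src_words, n_tgt_words, n_bilingual=50):
    """src_word_ids / tgt_word_ids: per-video arrays of visual-word indices for the
    unlabeled action videos observed by both cameras (paired across the two lists)."""
    W = np.zeros((n_src_words, n_tgt_words))
    for ws, wt in zip(src_word_ids, tgt_word_ids):
        W[np.ix_(np.unique(ws), np.unique(wt))] += 1.0       # word co-occurrence across views
    model = SpectralCoclustering(n_clusters=n_bilingual, random_state=0).fit(W + 1e-6)
    # Every source word and every target word is mapped to one bilingual word.
    return model.row_labels_, model.column_labels_
```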
Learning correlation subspace via CCA
31
• The projection vector $\mathbf{u}^s$ can be solved via a generalized eigenvalue decomposition problem:
  $$\boldsymbol{\Sigma}_{st}\,\boldsymbol{\Sigma}_{tt}^{-1}\,\boldsymbol{\Sigma}_{st}^\top\,\mathbf{u}^s = \eta\,\boldsymbol{\Sigma}_{ss}\,\mathbf{u}^s,
  \qquad \text{regularized as} \qquad
  \boldsymbol{\Sigma}_{st}\,(\boldsymbol{\Sigma}_{tt}+\epsilon_t\mathbf{I})^{-1}\,\boldsymbol{\Sigma}_{st}^\top\,\mathbf{u}^s = \eta\,(\boldsymbol{\Sigma}_{ss}+\epsilon_s\mathbf{I})\,\mathbf{u}^s.$$
• The largest η corresponds to the largest ρ.
• Once $\mathbf{u}^s$ is obtained, $\mathbf{u}^t$ can be calculated by
  $$\mathbf{u}^t = \frac{1}{\sqrt{\eta}}\,\boldsymbol{\Sigma}_{tt}^{-1}\,\boldsymbol{\Sigma}_{st}^\top\,\mathbf{u}^s.$$
• Keeping the d leading solutions, with eigenvalues $\eta_1 > \cdots > \eta_d$ and correlation coefficients $\rho_1 > \cdots > \rho_d$, gives
  $$\mathbf{P}^s = [\mathbf{u}_1^s \cdots \mathbf{u}_d^s] \in \mathbb{R}^{d_s \times d}, \qquad \mathbf{P}^t = [\mathbf{u}_1^t \cdots \mathbf{u}_d^t] \in \mathbb{R}^{d_t \times d}.$$
(A numerical sketch follows below.)
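A direct numerical sketch of the closed-form solution above, using scipy's generalized symmetric eigensolver; the εI terms implement the regularization mentioned on the slide, and the 1/√η scaling of u^t follows the reconstruction of the garbled equation (ρ = √η), so treat the exact normalization as an assumption.

```python
import numpy as np
from scipy.linalg import eigh

def cca_eig(Xs, Xt, d=30, eps=1e-4):
    """Xs: (d_s, n), Xt: (d_t, n) centered, paired observations (columns are samples)."""
    Sss = Xs @ Xs.T + eps * np.eye(Xs.shape[0])
    Stt = Xt @ Xt.T + eps * np.eye(Xt.shape[0])
    Sst = Xs @ Xt.T
    M = Sst @ np.linalg.solve(Stt, Sst.T)              # Σ_st (Σ_tt + εI)^{-1} Σ_st^T
    eta, Us = eigh(M, Sss)                             # generalized eigenproblem M u^s = η Σ_ss u^s
    order = np.argsort(eta)[::-1][:d]                  # keep the d largest η (largest ρ)
    eta, Us = eta[order], Us[:, order]
    Ut = np.linalg.solve(Stt, Sst.T @ Us) / np.sqrt(np.maximum(eta, 1e-12))
    rho = np.sqrt(np.clip(eta, 0.0, 1.0))              # ρ_i = sqrt(η_i)
    return Us, Ut, rho                                 # columns of Us, Ut form P^s and P^t
```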
LOCO protocol in a real application: a new action class appears at test time
32
[Figure: source view vs. target view; colored = labeled data, gray = unlabeled data, uncolored = test data from the new (left-out) action class.]
Editor's Notes
1. Among these approaches, …. Partly inspired by the progress of feature extraction in image classification, tracking, and detection, etc. In our work, we adopt the … to represent an action.
2. This is regarded as “cross-camera action recognition.” Traditional learning methods fail to predict test data in another view successfully, not only because the distributions differ, but also because the feature dimensions of the two views can differ.
3. Since test data are not available beforehand, one has to assume there are some other data in the target view in order to facilitate the recognition task. One scenario is introducing unlabeled data, whose labels are not of interest for the time being. This scenario is often called “unsupervised cross-camera action recognition” because there are no labeled data in the target view, and it belongs to a branch of transfer learning.
4. Specifically, transfer learning.
5. Note that we only have labeled source data for training.
6. Standard SVM: aims at separating the data in the correlated subspace without considering the domain transfer ability (i.e., the correlation between projected data), and thus we still observe prominent |wi| values at non-dominant feature dimensions (e.g., the 11th dimension). Our proposed SVM: suppresses the contributions of non-dominant feature dimensions in the correlated subspace, and thus only results in large |wi| values for dominant feature dimensions.
7. To recognize an action, one has to decide how to represent it. Some authors utilized human body models: they determined body poses by tracking limbs and torso, and recognized actions accordingly.
8. Besides that, some researchers focused more on the action itself rather than the human body. They proposed spatio-temporal volumes, which encode not only the spatial shape of the silhouette but also its change in the temporal domain.
9. After a short derivation, the maximization problem reduces to a generalized eigenvalue decomposition problem. Usually one introduces a regularization term to alleviate singularity and overfitting issues.