Cross-view Activity Recognition using Hankelets
1. CVPR 2012
Binlong Li, Octavia I. Camps and Mario Sznaier
Northeastern University
Mobuddies
2. Dynamic Systems
Dynamical systems have recently been used in a wide range of computer vision applications
Given a temporal sequence of observations (e.g., track coordinates), model its temporal evolution as a function of a low-dimensional state vector that changes over time
Simplest case: a linear time-invariant (LTI) system (w: noise)
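Practical limitation: given a set of observations, the triple (A, C, x_0) is not unique, since any invertible change of state basis T yields the equivalent triple (TAT^{-1}, CT^{-1}, Tx_0)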
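The non-uniqueness can be checked numerically. The sketch below assumes the noise-free model x_{k+1} = A x_k, y_k = C x_k with a 2-D state. The matrices A, C, T and the initial state are arbitrary illustrative values, and the helper names are hypothetical. Replacing (A, C, x_0) by (TAT^{-1}, CT^{-1}, Tx_0) should leave every output unchanged.

```python
# Sketch: two different LTI triples that produce identical outputs.
# Assumed model: x_{k+1} = A x_k, y_k = C x_k (no noise). Matrices are lists of rows.

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def matvec(P, v):
    """Multiply a matrix by a vector."""
    return [sum(P[i][k] * v[k] for k in range(len(v))) for i in range(len(P))]

def inv2(T):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def simulate(A, C, x0, steps):
    """Return the outputs y_0, ..., y_{steps-1}."""
    x, ys = x0, []
    for _ in range(steps):
        ys.append(matvec(C, x))
        x = matvec(A, x)
    return ys

A = [[0.9, -0.3], [0.3, 0.9]]   # hypothetical damped rotation
C = [[1.0, 0.0]]                # observe the first state coordinate
x0 = [1.0, 0.5]

T = [[2.0, 1.0], [0.0, 1.0]]    # any invertible change of basis
Ti = inv2(T)
A2 = matmul(matmul(T, A), Ti)   # T A T^{-1}
C2 = matmul(C, Ti)              # C T^{-1}
x02 = matvec(T, x0)             # T x0

y1 = simulate(A, C, x0, 10)
y2 = simulate(A2, C2, x02, 10)
print(max(abs(a[0] - b[0]) for a, b in zip(y1, y2)))  # differs only by rounding
```

Because the outputs cannot tell the two triples apart, any method that compares activities through (A, C, x_0) must cope with this ambiguity. Hankel-matrix methods sidestep the problem.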
3. Hankel Matrices
Given a sequence of measurements $\{y_k\}$, its block Hankel matrix is defined as:
Columns correspond to overlapping subsequences of the data
Block anti-diagonals of the matrix are constant
This structure encapsulates the dynamic
information of the system
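The construction can be sketched as follows. This is a minimal sketch with a hypothetical function name. It assumes each measurement is a list of d floats and that the caller chooses the number of block rows r. Block (i, j) holds ys[i + j], so each column stacks an overlapping subsequence and each block anti-diagonal repeats the same measurement.

```python
def block_hankel(ys, num_rows):
    """Block Hankel matrix of d-dimensional measurements ys[0], ..., ys[n-1].

    num_rows is the number of block rows r; the result has r*d scalar rows
    and m = n - r + 1 columns. Column j stacks ys[j], ..., ys[j + r - 1].
    """
    n, d = len(ys), len(ys[0])
    m = n - num_rows + 1
    if m < 1:
        raise ValueError("need at least num_rows measurements")
    H = []
    for i in range(num_rows):          # block row i
        for c in range(d):             # one scalar row per coordinate
            H.append([ys[i + j][c] for j in range(m)])
    return H

H = block_hankel([[1, 10], [2, 20], [3, 30], [4, 40]], num_rows=2)
# H == [[1, 2, 3], [10, 20, 30], [2, 3, 4], [20, 30, 40]]
```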
4. Initial condition invariance
Linear time-invariant (LTI) system:
In the absence of noise (w = 0):
Then the Hankel matrix factors as:
The columns of the Hankel matrix span the same subspace regardless of the initial conditions
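The factorization can be worked out explicitly under the noise-free state-space model $x_{k+1} = A x_k$, $y_k = C x_k$ with initial state $x_0$, which is an assumed form of the slide's system. Under this model $y_k = C A^k x_0$. For a Hankel matrix with $r$ block rows and $m$ columns, each block satisfies $y_{i+j} = (C A^i)(A^j x_0)$, so the matrix splits into an observability part and a state part:

```latex
H_y =
\begin{bmatrix}
y_0 & y_1 & \cdots & y_{m-1}\\
y_1 & y_2 & \cdots & y_m\\
\vdots & \vdots & \ddots & \vdots\\
y_{r-1} & y_r & \cdots & y_{r+m-2}
\end{bmatrix}
=
\underbrace{\begin{bmatrix} C\\ CA\\ \vdots\\ CA^{r-1}\end{bmatrix}}_{\Gamma_r}
\underbrace{\begin{bmatrix} x_0 & A x_0 & \cdots & A^{m-1} x_0\end{bmatrix}}_{X_0}
```

The column space of $H_y$ lies in the range of $\Gamma_r$, and $\Gamma_r$ does not involve $x_0$. The two subspaces coincide whenever $X_0$ has full row rank.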
5. Autoregressive measurements
Suppose the sequence of measurements is autoregressive:
Recall that:
Setting r = n in the above, we obtain:
In other words, the last column of the Hankel matrix is a linear combination of the other columns
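This property can be checked on a small example. The sketch assumes a scalar autoregressive sequence of order n = 2: a sampled sinusoid satisfies y_k = 2cos(w) y_{k-1} - y_{k-2}. The frequency, phase and sequence length are hypothetical. With n + 1 = 3 columns, the AR coefficients should reproduce the last column from the previous two.

```python
import math

def hankel(ys, num_rows):
    """Scalar Hankel matrix: entry (i, j) is ys[i + j]."""
    m = len(ys) - num_rows + 1
    return [[ys[i + j] for j in range(m)] for i in range(num_rows)]

# Hypothetical AR(2) sequence: y_k = a1 * y_{k-1} + a2 * y_{k-2}.
w, phase = 0.4, 0.7
a1, a2 = 2 * math.cos(w), -1.0
ys = [math.sin(w * k + phase) for k in range(12)]

# n = 2 coefficients, so use n + 1 = 3 columns (12 - 10 + 1 = 3).
H = hankel(ys, num_rows=len(ys) - 2)

# Predict the last column from the previous n columns using the regressor.
pred = [a1 * row[1] + a2 * row[0] for row in H]
err = max(abs(p - row[2]) for p, row in zip(pred, H))
print(err)  # floating-point noise only
```

Equivalently, the vector (a2, a1, -1) lies in the null space of H, so H has rank at most n.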
6. Affine transformation invariance
Suppose we have two Hankel matrices, corresponding to a trajectory and to its affine transformation. The autoregressive property allows us to write:
Suppose the affine transformation is defined as
Then, by its linearity:
In other words, both sequences share the same autoregressor
Recall that
Therefore, the columns of the two Hankel matrices span
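The shared-autoregressor argument can be checked numerically. The sketch assumes an affine map y -> P y + t on 2-D points and an AR(2) trajectory whose coordinates share scalar coefficients. The values of P and t and the ellipse-shaped trajectory are hypothetical. Raw positions pick up a translation term t(1 - a1 - a2) under the map. First differences (velocities) cancel t, so the transformed velocities obey the same regressor. Presumably this is one reason Hankelets are built from velocities.

```python
import math

def apply_affine(P, t, pts):
    """Map each 2-D point p to P p + t."""
    return [[P[0][0] * x + P[0][1] * y + t[0], P[1][0] * x + P[1][1] * y + t[1]]
            for x, y in pts]

def velocities(pts):
    """First differences v_k = y_k - y_{k-1}; any translation cancels out."""
    return [[b[0] - a[0], b[1] - a[1]] for a, b in zip(pts, pts[1:])]

def ar2_residual(seq, a1, a2):
    """Largest error of s_k = a1 s_{k-1} + a2 s_{k-2} over a 2-D sequence."""
    return max(abs(seq[k][c] - (a1 * seq[k - 1][c] + a2 * seq[k - 2][c]))
               for k in range(2, len(seq)) for c in range(2))

w = 0.3
a1, a2 = 2 * math.cos(w), -1.0
# Hypothetical trajectory: both coordinates satisfy the same AR(2) recursion.
traj = [[3 * math.cos(w * k), math.sin(w * k + 0.5)] for k in range(20)]

P = [[1.2, 0.4], [-0.3, 0.8]]  # arbitrary linear part
t = [5.0, -2.0]                # arbitrary translation
moved = apply_affine(P, t, traj)

print(ar2_residual(velocities(traj), a1, a2))   # ~0
print(ar2_residual(velocities(moved), a1, a2))  # ~0: same regressor after the map
print(ar2_residual(moved, a1, a2))              # not ~0: translation breaks positions
```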
7. Previous work
B. Li et al., “Activity Recognition using Dynamic Subspace Angles”, CVPR 2011
Builds on initial condition invariance.
Assume that a class of actions (e.g., “walk”) can be represented by a single dynamical system, and that in-class variations are captured by different initial conditions
Then distinguishing between two actions reduces to determining whether the columns of the two corresponding Hankel matrices lie in the same subspace
Uses angles between subspaces as a measure of
8. Overview of the method
Uses dense trajectories to extract many short 15-frame tracklets
Builds a Hankel matrix for each tracklet from its velocities
Employs a bag-of-features-like approach (Bag of Hankelets, BoHk)
Evaluates three settings: single-view data, multiple views with knowledge transfer, and multiple views without knowledge transfer
9. Hankelets
A Hankelet is the Hankel matrix of a short 15-frame trajectory, formed from its sequence of normalized velocities:
The Hankelets are then normalized:
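One way to build a Hankelet from a 15-frame tracklet is sketched below. The function name, the choice of 4 block rows and the normalization are all assumptions, since the slide's exact formulas are not reproduced here. The sketch takes first differences to get 14 velocities and stacks them into a block Hankel matrix. It then divides the result by its Frobenius norm, which removes both translation and scale.

```python
import math

def hankelet(tracklet, num_rows=4):
    """Hypothetical Hankelet builder for one tracklet of (x, y) points.

    1. Differentiate positions to get 2-D velocities (drops translation).
    2. Stack the velocities into a block Hankel matrix with num_rows block rows.
    3. Divide by the Frobenius norm (assumed normalization; removes scale).
    """
    vel = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(tracklet, tracklet[1:])]
    m = len(vel) - num_rows + 1
    if m < 1:
        raise ValueError("tracklet too short for num_rows")
    # Row order: block row i, then coordinate c (x first, then y).
    H = [[vel[i + j][c] for j in range(m)] for i in range(num_rows) for c in range(2)]
    norm = math.sqrt(sum(v * v for row in H for v in row))
    if norm == 0:
        raise ValueError("static tracklet: all velocities are zero")
    return [[v / norm for v in row] for row in H]

track = [(5 * math.cos(0.2 * k) + k, math.sin(0.3 * k)) for k in range(15)]
H = hankelet(track)  # 8 rows (4 block rows x 2 coordinates), 11 columns
```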
10. Comparing Hankelets
A dissimilarity score between two Hankelets is introduced:
The derivation shows that d ≈ 0 for Hankelets corresponding to noisy measurements of the same dynamical system
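The score can be sketched in code. The formula used here is an assumption rather than a transcription of the slide: d(H_p, H_q) = 2 - ||G_p + G_q||_F, where G = H H^T / ||H H^T||_F. We believe this matches the paper's definition, but it should be verified against the original. Each G is positive semidefinite with unit norm. Under this form, d = 0 exactly when G_p = G_q, for example for Hankelets that differ only by scale, and d never exceeds 2 - sqrt(2). The function names are hypothetical.

```python
import math

def gram_normalized(H):
    """Return H H^T divided by its Frobenius norm."""
    G = [[sum(a * b for a, b in zip(ri, rj)) for rj in H] for ri in H]
    norm = math.sqrt(sum(v * v for row in G for v in row))
    return [[v / norm for v in row] for row in G]

def dissimilarity(Hp, Hq):
    """Assumed score d = 2 - ||Gp + Gq||_F with G = H H^T / ||H H^T||_F.

    Both Hankelets must have the same number of rows (columns may differ).
    """
    Gp, Gq = gram_normalized(Hp), gram_normalized(Hq)
    s = math.sqrt(sum((a + b) ** 2 for rp, rq in zip(Gp, Gq) for a, b in zip(rp, rq)))
    return 2.0 - s

H = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 1.0], [0.0, 0.0]]
print(dissimilarity(H, H), dissimilarity(H, K))
```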
11. Building codebook: cluster center
The K-means algorithm is modified to work with dissimilarity scores:
Each Hankelet is assigned to the cluster whose “representative” has the smallest dissimilarity to it
Each cluster’s “representative” is chosen as follows: take a random Hankelet within the cluster, compute the dissimilarities between it and all other Hankelets in the cluster, and compute their mean.
The Hankelet whose dissimilarity is closest to this mean is selected as the cluster’s “representative”
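The clustering loop described above can be sketched as follows. This is a minimal sketch with hypothetical function names. It works with any dissimilarity function `dis`, so it is not tied to a particular score. The seeded random initialization, the iteration cap and keeping the old representative when a cluster empties are all assumptions not stated on the slide.

```python
import random

def choose_representative(cluster, dis, rng):
    """Representative rule from the slide.

    Pick a random member h0, score every other member by dis(h0, h), and
    return the member whose score is closest to the mean score.
    """
    if len(cluster) == 1:
        return cluster[0]
    i0 = rng.randrange(len(cluster))
    others = cluster[:i0] + cluster[i0 + 1:]
    scores = [dis(cluster[i0], h) for h in others]
    mean = sum(scores) / len(scores)
    best = min(range(len(others)), key=lambda i: abs(scores[i] - mean))
    return others[best]

def dissimilarity_kmeans(items, k, dis, iters=20, seed=0):
    """K-means variant driven only by a dissimilarity function (sketch)."""
    rng = random.Random(seed)
    reps = rng.sample(items, k)  # assumed initialization: k random items
    labels = None
    for _ in range(iters):
        # Assign each item to the representative with the smallest dissimilarity.
        new_labels = [min(range(k), key=lambda c: dis(x, reps[c])) for x in items]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(items, labels) if lab == c]
            if members:  # keep the old representative if a cluster empties
                reps[c] = choose_representative(members, dis, rng)
    return labels, reps
```

For a quick check, scalars with `dis = lambda a, b: abs(a - b)` can stand in for Hankelets and the dissimilarity score.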
12. Building codebook: Gamma pdf
The histogram of dissimilarities for a typical cluster in
the dictionary of Hankelets:
Each cluster is represented by its representative and a gamma pdf over its dissimilarities:
In addition, each cluster w has a prior probability
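Fitting these per-cluster models can be sketched as follows. The sketch assumes a method-of-moments fit (shape = mean^2 / var, scale = var / mean), although the paper may use maximum likelihood instead. It also assumes that each cluster's prior is its share of all Hankelets. The inputs are the dissimilarities between each cluster's members and its representative. Function names are hypothetical.

```python
import math

def fit_gamma_moments(ds):
    """Method-of-moments Gamma fit: shape = mean^2 / var, scale = var / mean."""
    n = len(ds)
    mean = sum(ds) / n
    var = sum((d - mean) ** 2 for d in ds) / n
    if mean <= 0 or var <= 0:
        raise ValueError("need positive, non-constant dissimilarities")
    return mean * mean / var, var / mean

def gamma_pdf(d, shape, scale):
    """Gamma density d^(k-1) exp(-d/scale) / (Gamma(k) scale^k); 0 for d <= 0."""
    if d <= 0:
        return 0.0
    log_p = ((shape - 1) * math.log(d) - d / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)

def cluster_models(dissims_per_cluster):
    """Per-cluster (shape, scale, prior); prior assumed = cluster size / total."""
    total = sum(len(ds) for ds in dissims_per_cluster)
    return [(*fit_gamma_moments(ds), len(ds) / total) for ds in dissims_per_cluster]
```

Computing the density in log space through `math.lgamma` avoids overflow in Gamma(k) when the fitted shape is large.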
13. Bag of Hankelets (BoHk)
Each activity video is represented by a histogram of labels over the dictionary of K Hankelets
Each Hankelet’s cluster label is assigned by maximum probability:
where the probability depends on the cluster representative and the cluster prior
Finally, a one-against-all non-linear SVM is trained for activity recognition
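Label assignment and the BoHk descriptor can be sketched as follows. Each Hankelet takes the label w that maximizes p(d_w | w) P(w), where d_w is its dissimilarity to cluster w's representative. Each cluster is modelled as a hypothetical (shape, scale, prior) tuple with a gamma likelihood. The histogram normalization and all function names are assumptions. The resulting per-video histograms would then be the inputs to the SVM.

```python
import math

def gamma_pdf(d, shape, scale):
    """Gamma density d^(k-1) exp(-d/scale) / (Gamma(k) scale^k); 0 for d <= 0."""
    if d <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(d) - d / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def assign_label(dists_to_reps, models):
    """argmax over clusters w of gamma_pdf(d_w) * prior_w."""
    scores = [gamma_pdf(d, shape, scale) * prior
              for d, (shape, scale, prior) in zip(dists_to_reps, models)]
    return max(range(len(scores)), key=scores.__getitem__)

def bag_of_hankelets(video_dists, models):
    """Normalized label histogram for one video (assumed normalization).

    video_dists holds, per Hankelet, its dissimilarities to all K representatives.
    """
    hist = [0.0] * len(models)
    for dists in video_dists:
        hist[assign_label(dists, models)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

models = [(2.0, 0.1, 0.5), (2.0, 0.1, 0.5)]
hist = bag_of_hankelets([[0.1, 0.5], [0.5, 0.1], [0.1, 0.5]], models)
```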
14. Bi-Lingual Hankelets
Bi-lingual Hankelets can be easily learned from unlabeled videos captured simultaneously from different viewpoints by matching Hankelets across views (~80% are matched)
Two Hankelets are matched if their start times are the same and their dissimilarity score is below a threshold (no spatial information is used)
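The matching rule above can be sketched as follows. Hankelets from two synchronized views are paired only when their start frames are equal and their dissimilarity is below a threshold; no spatial position is used. The input layout, the function name and the choice to keep only the best-scoring candidate for each Hankelet in the first view are assumptions. The sketch does not enforce one-to-one matching.

```python
def match_bilingual(view_a, view_b, dis, threshold):
    """Pair Hankelets across two synchronized views (sketch).

    view_a, view_b: lists of (start_frame, hankelet) tuples.
    dis: dissimilarity function between two Hankelets.
    """
    by_start = {}
    for start, h in view_b:
        by_start.setdefault(start, []).append(h)
    pairs = []
    for start, ha in view_a:
        candidates = by_start.get(start, [])
        if not candidates:
            continue  # no Hankelet starts at the same frame in the other view
        hb = min(candidates, key=lambda h: dis(ha, h))
        if dis(ha, hb) < threshold:
            pairs.append((ha, hb))
    return pairs
```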
16. Cross-view action recognition
A labeled dataset with Source and Target views is given
Training
Extract and match Bi-lingual Hankelets using the dissimilarity score
Build a codebook of Bi-lingual Hankelets using K-means
Label the Hankelets in the Source data by maximum posterior probability
Train a one-against-all non-linear SVM on the Source data
Testing
17. Experiments
Single-view
KTH dataset: 95.89% avg.
Cross-View with data transfer
Use only Bi-lingual Hankelets
IXMAS dataset: 56.4% avg. (45.5%
improvement)
Cross-View without data transfer
Use all Hankelets (not only Bi-lingual)
IXMAS dataset: 90.57% avg. (20.28%
improvement)