Cross-view Activity Recognition using Hankelets

CVPR 2012
Binlong Li, Octavia I. Camps and Mario Sznaier
Northeastern University

Mobuddies

Dynamic Systems
• Dynamic systems have recently been used in a wide range of computer vision applications
• Given a temporal sequence of observations (e.g. track coordinates), they model its temporal evolution as a function of a low-dimensional state vector that changes over time
• Simplest case: a linear time-invariant (LTI) system, where $w_k$ is noise:
$$x_{k+1} = A x_k, \qquad y_k = C x_k + w_k$$

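• Practical limitation: given a set of observations, the triple $(A, C, x_1)$ is not unique. Any invertible $T$ gives an equivalent triple $(TAT^{-1}, CT^{-1}, Tx_1)$ that produces the same outputs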
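As a quick illustration of this non-uniqueness, the NumPy sketch below simulates a noise-free LTI system and a copy transformed by an invertible change of basis `T`. Both produce the same observations. The helper `simulate_lti` and all numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_lti(A, C, x1, steps):
    """Noise-free outputs y_k = C x_k of x_{k+1} = A x_k, starting from x_1."""
    x = np.asarray(x1, dtype=float)
    outputs = []
    for _ in range(steps):
        outputs.append(C @ x)
        x = A @ x
    return np.array(outputs)  # shape (steps, output_dim)


# Illustrative system: a slowly decaying 2-D rotation observed through one output.
A = np.array([[0.9, -0.3], [0.3, 0.9]])
C = np.array([[1.0, 0.5]])
x1 = np.array([1.0, 0.0])

# Any invertible T gives an equivalent realization (T A T^-1, C T^-1, T x1).
T = np.array([[2.0, 1.0], [0.5, 1.5]])
T_inv = np.linalg.inv(T)
A2, C2, x1_2 = T @ A @ T_inv, C @ T_inv, T @ x1

y_a = simulate_lti(A, C, x1, 20)
y_b = simulate_lti(A2, C2, x1_2, 20)
print(np.allclose(y_a, y_b))  # True: different triples, same observations
```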

Hankel Matrices
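• Given a sequence of measurements $y_1, y_2, \dots, y_{m+r}$ with $y_k \in \mathbb{R}^d$, its block Hankel matrix is defined as:
$$H_y = \begin{bmatrix} y_1 & y_2 & \cdots & y_{r+1} \\ y_2 & y_3 & \cdots & y_{r+2} \\ \vdots & \vdots & \ddots & \vdots \\ y_m & y_{m+1} & \cdots & y_{m+r} \end{bmatrix}$$
• Columns correspond to overlapping subsequences of the data
• Block anti-diagonals of the matrix are constant
• This structure encapsulates the dynamic information of the system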
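A minimal NumPy sketch of this construction, assuming a layout in which column j stacks the measurements y_j, ..., y_{j+m-1}. The function name `block_hankel` and its `num_cols` parameter are hypothetical.

```python
import numpy as np


def block_hankel(y, num_cols):
    """Block Hankel matrix of a sequence y (shape (N,) or (N, d)).

    Column j stacks the measurements y_j, ..., y_{j+m-1} into one vector,
    where m = N - num_cols + 1 is the number of block rows.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # treat a scalar sequence as d = 1
    n_samples, d = y.shape
    m = n_samples - num_cols + 1
    if m < 1:
        raise ValueError("sequence too short for the requested number of columns")
    H = np.empty((m * d, num_cols))
    for j in range(num_cols):
        H[:, j] = y[j:j + m].reshape(-1)  # overlapping subsequence, stacked
    return H


print(block_hankel([1, 2, 3, 4, 5], num_cols=3))
# [[1. 2. 3.]
#  [2. 3. 4.]
#  [3. 4. 5.]]

# 2-D track coordinates: each block row holds one (u, v) pair.
track = np.array([[0, 0], [1, 2], [2, 4], [3, 6]])
print(block_hankel(track, num_cols=2).shape)  # (6, 2)
```

Note how each anti-diagonal of the scalar example repeats the same value.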

Initial condition invariance
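• Linear time-invariant (LTI) system:
$$x_{k+1} = A x_k, \qquad y_k = C x_k + w_k$$
• In the absence of noise ($w_k = 0$):
$$y_k = C A^{k-1} x_1$$
• Then the Hankel matrix factors as:
$$H_y = \underbrace{\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{m-1} \end{bmatrix}}_{\mathcal{O}_m} \begin{bmatrix} x_1 & A x_1 & \cdots & A^{r} x_1 \end{bmatrix}$$
• Columns of the Hankel matrix span the same subspace, the column space of $\mathcal{O}_m$, regardless of the initial condition $x_1$ (provided the right-hand factor has full row rank)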
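The invariance can be checked numerically. This sketch (hypothetical helper names, illustrative system matrices) builds Hankel matrices from two different initial conditions of the same noise-free system and compares the orthogonal projectors onto their column spaces. Equal projectors mean equal subspaces.

```python
import numpy as np


def block_hankel(y, num_cols):
    """Hankel matrix whose column j stacks y_j, ..., y_{j+m-1}."""
    y = np.asarray(y, dtype=float)
    y = y[:, None] if y.ndim == 1 else y
    m = len(y) - num_cols + 1
    return np.column_stack([y[j:j + m].reshape(-1) for j in range(num_cols)])


def simulate_lti(A, C, x1, steps):
    """Noise-free outputs y_k = C A^(k-1) x_1."""
    x, out = np.asarray(x1, dtype=float), []
    for _ in range(steps):
        out.append(C @ x)
        x = A @ x
    return np.array(out)


def column_space_projector(H, rel_tol=1e-8):
    """Orthogonal projector onto the numerical column space of H, and its rank."""
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0]))
    U = U[:, :rank]
    return U @ U.T, rank


A = 0.95 * np.array([[np.cos(0.4), -np.sin(0.4)], [np.sin(0.4), np.cos(0.4)]])
C = np.array([[1.0, 0.0]])

# Same (A, C), two different initial conditions.
H1 = block_hankel(simulate_lti(A, C, [1.0, 0.0], 12), num_cols=6)
H2 = block_hankel(simulate_lti(A, C, [-0.3, 2.0], 12), num_cols=6)

P1, rank1 = column_space_projector(H1)
P2, rank2 = column_space_projector(H2)
print(rank1, rank2, np.allclose(P1, P2))  # 2 2 True
```

The two Hankel matrices differ entry by entry; only their column spaces agree.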

Autoregressive measurements
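• Suppose the sequence of measurements is autoregressive of order $n$:
$$y_k = \sum_{i=1}^{n} a_i\, y_{k-i}$$
• Recall that the $j$-th column of $H_y$ stacks $y_j, y_{j+1}, \dots, y_{j+m-1}$ and that $H_y$ has $r+1$ columns
• Setting $r = n$, every block of the last column satisfies the recursion above, so we obtain:
$$H_y \begin{bmatrix} a_n & a_{n-1} & \cdots & a_1 & -1 \end{bmatrix}^T = 0$$
• In other words, the last column of the Hankel matrix is a linear combination of the other columns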
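A small check of the autoregressive property, assuming a Hankel layout in which column j starts at y_j. For an illustrative order-2 sequence, the vector [a_2, a_1, -1] annihilates the 3-column Hankel matrix, and a least-squares fit of the last column on the others recovers the coefficients. Names and numbers are hypothetical.

```python
import numpy as np


def block_hankel(y, num_cols):
    """Hankel matrix whose column j stacks y_j, ..., y_{j+m-1}."""
    y = np.asarray(y, dtype=float)
    y = y[:, None] if y.ndim == 1 else y
    m = len(y) - num_cols + 1
    return np.column_stack([y[j:j + m].reshape(-1) for j in range(num_cols)])


# Illustrative order-2 autoregressive sequence: y_k = a1*y_{k-1} + a2*y_{k-2}.
a1, a2 = 1.5, -0.7
y = [1.0, 0.5]
for _ in range(18):
    y.append(a1 * y[-1] + a2 * y[-2])

n = 2
H = block_hankel(y, num_cols=n + 1)  # r = n, so n + 1 columns

# The vector [a_n, ..., a_1, -1] annihilates H ...
print(np.allclose(H @ np.array([a2, a1, -1.0]), 0.0))  # True

# ... so the AR coefficients can be recovered by regressing the last column
# on the first n columns.
coeffs, *_ = np.linalg.lstsq(H[:, :n], H[:, n], rcond=None)
print(np.round(coeffs, 6))  # recovers a_2 = -0.7 and a_1 = 1.5
```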

Affine transformation invariance
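• Suppose we have two Hankel matrices $H_y$ and $H_{y'}$ corresponding to a trajectory and its affine transformation. The autoregressive property allows us to write (with $r = n$):
$$H_y \begin{bmatrix} a_n & \cdots & a_1 & -1 \end{bmatrix}^T = 0$$
• Suppose the affine transformation is defined as $y'_k = P y_k + t$ with $P$ invertible. The translation $t$ cancels when the trajectory is described by its velocities $y_{k+1} - y_k$ (as Hankelets are), so only the linear part $y'_k = P y_k$ matters
• Then, taking into account its linearity:
$$y'_k = P y_k = \sum_{i=1}^{n} a_i\, P y_{k-i} = \sum_{i=1}^{n} a_i\, y'_{k-i}$$
• In other words, the two sequences share the same autoregressor $(a_1, \dots, a_n)$
• Recall that the autoregressor fixes how the last column of a Hankel matrix depends on the other columns; moreover, $H_{y'} = (I_m \otimes P)\, H_y$, where $I_m \otimes P$ is invertible
• Therefore, the columns of the two Hankel matrices obey the same linear dependence: $H_y$ and $H_{y'}$ share the same null space, which contains $\begin{bmatrix} a_n & \cdots & a_1 & -1 \end{bmatrix}^T$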
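The sketch below illustrates the argument on an illustrative 2-D trajectory whose coordinates share an order-2 autoregressor. It uses velocities (via `np.diff`) to remove the translation, an assumption consistent with Hankelets being built from velocities. The matrix `P`, offset `t`, and helper names are hypothetical.

```python
import numpy as np


def block_hankel(y, num_cols):
    """Hankel matrix whose column j stacks y_j, ..., y_{j+m-1}."""
    y = np.asarray(y, dtype=float)
    y = y[:, None] if y.ndim == 1 else y
    m = len(y) - num_cols + 1
    return np.column_stack([y[j:j + m].reshape(-1) for j in range(num_cols)])


# A 2-D trajectory whose coordinates follow the same order-2 autoregressor.
a1, a2 = 1.5, -0.7
traj = [np.array([1.0, 0.0]), np.array([0.5, 1.0])]
for _ in range(18):
    traj.append(a1 * traj[-1] + a2 * traj[-2])
traj = np.array(traj)                      # shape (20, 2)

# Illustrative affine map y' = P y + t (P invertible).
P = np.array([[1.2, 0.4], [-0.3, 0.9]])
t = np.array([3.0, -2.0])
traj_aff = traj @ P.T + t

null_vec = np.array([a2, a1, -1.0])        # [a_n, ..., a_1, -1] with n = 2

# Velocities remove t, and the linear part P preserves the autoregressor.
H_vel = block_hankel(np.diff(traj, axis=0), num_cols=3)
H_vel_aff = block_hankel(np.diff(traj_aff, axis=0), num_cols=3)
print(np.allclose(H_vel @ null_vec, 0), np.allclose(H_vel_aff @ null_vec, 0))  # True True

# Positions alone are not enough here: the translation breaks the relation
# because a1 + a2 != 1.
H_pos_aff = block_hankel(traj_aff, num_cols=3)
print(np.allclose(H_pos_aff @ null_vec, 0))  # False
```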

Previous work
• B. Li et al., “Activity Recognition using Dynamic Subspace Angles”, CVPR 2011
• Considers invariance to initial conditions
• Assumes that a class of actions (e.g. “walk”) can be represented by a single dynamical system, with in-class variations captured by different initial conditions
• Then distinguishing between two actions reduces to determining whether the columns of the two corresponding Hankel matrices lie in the same subspace
• Uses the angles between these subspaces as a measure of their dissimilarity
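Principal angles between two column spaces can be computed from the singular values of Q1^T Q2, where Q1 and Q2 are orthonormal bases (the standard SVD-based method). The sketch below is not the CVPR 2011 implementation: the helper names, the rank tolerance, and the damped-cosine test signals are assumptions. SciPy users could call `scipy.linalg.subspace_angles` instead.

```python
import numpy as np


def block_hankel(y, num_cols):
    """Hankel matrix whose column j stacks y_j, ..., y_{j+m-1}."""
    y = np.asarray(y, dtype=float)
    y = y[:, None] if y.ndim == 1 else y
    m = len(y) - num_cols + 1
    return np.column_stack([y[j:j + m].reshape(-1) for j in range(num_cols)])


def orth_basis(H, rel_tol=1e-8):
    """Orthonormal basis of the numerical column space of H."""
    U, s, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, : int(np.sum(s > rel_tol * s[0]))]


def principal_angles(H1, H2):
    """Principal angles (radians) between the column spaces of H1 and H2."""
    Q1, Q2 = orth_basis(H1), orth_basis(H2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def damped_cosine(freq, phase, steps=12, rho=0.95):
    """Output of a 2-state LTI system: rho^k cos(freq k + phase)."""
    k = np.arange(steps)
    return rho ** k * np.cos(freq * k + phase)


walk_a = block_hankel(damped_cosine(0.3, 0.0), 6)  # same system,
walk_b = block_hankel(damped_cosine(0.3, 1.2), 6)  # different initial condition
other = block_hankel(damped_cosine(1.0, 0.0), 6)   # different system

print(principal_angles(walk_a, walk_b).max())  # close to 0
print(principal_angles(walk_a, other).max())   # clearly nonzero
```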

Overview of the method
• Uses dense trajectories to extract many short 15-frame tracklets
• Builds a Hankel matrix for each tracklet from the tracklet's velocities
• Employs a bag-of-features (BoF)-like approach, the Bag of Hankelets (BoHk)
• Runs three experiments: single-view data, multiple views with knowledge transfer, and multiple views without knowledge transfer

Hankelets
• A Hankelet is the Hankel matrix of a short trajectory of 15 frames, formed from its sequence of normalized velocities:
• Normalize Hankelets:
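The slides do not give the velocity or normalization formulas, so this sketch makes explicit assumptions: velocities are frame-to-frame differences of a 15-frame (u, v) tracklet, the Hankelet has a hypothetical `num_cols=4` columns, and a single Frobenius-norm scaling stands in for the normalization steps.

```python
import numpy as np


def hankelet(tracklet, num_cols=4):
    """Hankelet of a tracklet of shape (15, 2) holding (u, v) point positions.

    Assumptions: velocities are frame differences, and the Hankel matrix is
    scaled to unit Frobenius norm (a stand-in for the paper's normalization).
    """
    tracklet = np.asarray(tracklet, dtype=float)
    vel = np.diff(tracklet, axis=0)           # 14 velocity vectors, shape (14, 2)
    m = len(vel) - num_cols + 1               # number of block rows
    H = np.column_stack([vel[j:j + m].reshape(-1) for j in range(num_cols)])
    norm = np.linalg.norm(H)                  # Frobenius norm
    return H / norm if norm > 0 else H


# A hypothetical 15-frame tracklet: a point moving along a gentle arc.
k = np.arange(15)
tracklet = np.column_stack([10 + 2.0 * k, 50 + 5.0 * np.sin(0.3 * k)])

H = hankelet(tracklet)
print(H.shape)                             # (22, 4)
print(round(float(np.linalg.norm(H)), 6))  # 1.0
```

With these assumptions, translating or uniformly scaling the tracklet leaves the Hankelet unchanged: differencing removes the offset and the norm scaling removes the scale.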

Comparing Hankelets
• Introduce a dissimilarity score d between two Hankelets:
• Derivations show that d ≈ 0 for Hankelets corresponding to noisy measurements of the same dynamical system
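The score itself is not reproduced on the slide. As an assumed stand-in, the sketch compares normalized Gram matrices $\hat G = HH^T/\|HH^T\|_F$ via $d = 2 - \|\hat G_p + \hat G_q\|_F$. This score is 0 exactly when the normalized Grams coincide, ignores the overall scale of each Hankelet, and never exceeds $2 - \sqrt{2}$. It should not be read as the paper's exact definition; helper names and test signals are hypothetical.

```python
import numpy as np


def normalized_gram(H):
    """H H^T scaled to unit Frobenius norm (removes the overall scale of H)."""
    G = H @ H.T
    return G / np.linalg.norm(G)


def dissimilarity(Hp, Hq):
    """Assumed score d = 2 - ||G_p + G_q||_F on normalized Gram matrices.

    Both Hankelets must have the same number of rows. The result is 0 when
    the normalized Grams coincide and at most 2 - sqrt(2) otherwise.
    """
    return 2.0 - np.linalg.norm(normalized_gram(Hp) + normalized_gram(Hq))


def hankel(seq, num_cols=4):
    """Scalar Hankel matrix whose column j starts at seq[j]."""
    seq = np.asarray(seq, dtype=float)
    m = len(seq) - num_cols + 1
    return np.column_stack([seq[j:j + m] for j in range(num_cols)])


k = np.arange(14)
H_slow = hankel(np.cos(0.3 * k))
H_alt = hankel((-1.0) ** k)

print(abs(dissimilarity(H_slow, 5.0 * H_slow)) < 1e-12)  # True: scale invariant
print(dissimilarity(H_slow, H_alt) > 0.0)                 # True: different dynamics
```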

Building codebook: cluster center
• Modify the K-means algorithm to work with dissimilarity scores:
• Each Hankelet is assigned to the cluster whose “representative” has the smallest dissimilarity with it
• The cluster’s “representative” is chosen as follows. Take a random Hankelet within the cluster, compute the dissimilarities between that Hankelet and all other Hankelets in the cluster, and compute their mean. The Hankelet whose dissimilarity is closest to the mean is selected as the “representative”
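A sketch of the modified K-means loop, assuming only a pairwise dissimilarity function is available. The representative rule follows the slide. The random initialization, fixed iteration count, and handling of empty clusters are assumptions, and scalars with |a - b| stand in for Hankelets.

```python
import numpy as np


def assign(items, reps, dissim):
    """Index of the representative with the smallest dissimilarity, per item."""
    return np.array([int(np.argmin([dissim(x, r) for r in reps])) for x in items])


def choose_representative(cluster, dissim, rng):
    """Pick a random anchor, average its dissimilarities to the other members,
    and return the member whose dissimilarity to the anchor is closest to it."""
    if len(cluster) == 1:
        return cluster[0]
    a = int(rng.integers(len(cluster)))
    others = [x for i, x in enumerate(cluster) if i != a]
    d = np.array([dissim(cluster[a], x) for x in others])
    return others[int(np.argmin(np.abs(d - d.mean())))]


def modified_kmeans(items, k, dissim, iters=10, seed=0):
    """K-means-style loop that only needs a dissimilarity function."""
    rng = np.random.default_rng(seed)
    reps = [items[i] for i in rng.choice(len(items), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(items, reps, dissim)
        for c in range(k):
            members = [x for x, lab in zip(items, labels) if lab == c]
            if members:  # keep the old representative if a cluster empties
                reps[c] = choose_representative(members, dissim, rng)
    return reps, assign(items, reps, dissim)


def dist(a, b):
    """Toy dissimilarity standing in for the Hankelet score."""
    return abs(a - b)


items = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(assign(items, [0.1, 5.1], dist))  # [0 0 0 1 1 1]
reps, labels = modified_kmeans(items, k=2, dissim=dist)
```

Because only `dissim` is called, the same loop works unchanged for any Hankelet dissimilarity score.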

Building codebook: Gamma pdf
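• The histogram of dissimilarities for a typical cluster in the dictionary of Hankelets:
• Represent each cluster $w$ by its representative $\bar{H}_w$ and a gamma pdf over dissimilarities:
$$p(d \mid w) = \frac{d^{\,k_w - 1}\, e^{-d/\theta_w}}{\Gamma(k_w)\, \theta_w^{k_w}}$$
• Furthermore, each cluster $w$ has a prior probability $P(w)$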
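A sketch of the per-cluster model. It assumes a method-of-moments gamma fit (maximum likelihood is an equally valid choice) and a prior equal to each cluster's share of the training Hankelets; the slide specifies neither. The data are synthetic.

```python
import math

import numpy as np


def fit_gamma_moments(dissims):
    """Method-of-moments gamma fit: shape k = mean^2/var, scale theta = var/mean."""
    d = np.asarray(dissims, dtype=float)
    mean, var = d.mean(), d.var()
    return mean ** 2 / var, var / mean


def gamma_pdf(x, shape, scale):
    """Gamma density x^(k-1) exp(-x/theta) / (Gamma(k) theta^k).

    Returned as 0 for x <= 0 (the density is infinite at 0 when k < 1).
    """
    if x <= 0:
        return 0.0
    log_p = ((shape - 1) * math.log(x) - x / scale
             - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_p)


def cluster_priors(labels, k):
    """Assumed prior P(w): fraction of training Hankelets in each cluster."""
    counts = np.bincount(labels, minlength=k).astype(float)
    return counts / counts.sum()


# Synthetic dissimilarities standing in for one cluster's members.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=3.0, scale=0.1, size=20000)
shape_hat, scale_hat = fit_gamma_moments(samples)
print(round(shape_hat, 2), round(scale_hat, 3))  # roughly 3 and 0.1
print(cluster_priors(np.array([0, 0, 1, 2, 2, 2]), k=3))
```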

Bag of Hankelets (BoHk)
• Each activity video is represented by a histogram of labels from the dictionary of K Hankelets
• The cluster label of a Hankelet $H$ is assigned by maximum posterior probability:
$$w^* = \arg\max_{w}\; p\big(d(H, \bar{H}_w) \mid w\big)\, P(w)$$
where $\bar{H}_w$ is the cluster representative and $P(w)$ is the cluster prior
• Finally, a one-against-all non-linear SVM is trained for activity recognition
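The labeling rule and the BoHk histogram might be sketched as follows. The sketch assumes the dissimilarities to all K representatives are already computed (matrix `D`) and that gamma parameters and priors are known. The block redefines the gamma density so that it runs on its own. Helper names and toy values are hypothetical, and the one-against-all SVM stage is omitted.

```python
import math

import numpy as np


def gamma_pdf(x, shape, scale):
    """Gamma density, taken as 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def assign_labels(D, shapes, scales, priors):
    """D[i, w] = dissimilarity of Hankelet i to the representative of cluster w.

    Returns argmax_w p(D[i, w] | w) P(w) for every Hankelet i.
    """
    n, K = D.shape
    post = np.array([[gamma_pdf(D[i, w], shapes[w], scales[w]) * priors[w]
                      for w in range(K)] for i in range(n)])
    return post.argmax(axis=1)


def bag_of_hankelets(labels, K):
    """Normalized histogram of cluster labels: one BoHk descriptor per video."""
    hist = np.bincount(labels, minlength=K).astype(float)
    return hist / hist.sum()


# Toy video with three Hankelets and a two-word dictionary.
D = np.array([[0.10, 0.90],
              [0.90, 0.10],
              [0.12, 0.80]])
labels = assign_labels(D, shapes=[2.0, 2.0], scales=[0.05, 0.05], priors=[0.5, 0.5])
print(labels)                        # [0 1 0]
print(bag_of_hankelets(labels, K=2))
```

In the full pipeline, one such histogram per video would be the input to the activity classifier.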

Bi-Lingual Hankelets
• Bi-lingual Hankelets can be easily learned from unlabeled videos captured simultaneously from different viewpoints, by matching Hankelets across views (~80% are matched)
• Two Hankelets from different views are matched if their start times are the same and their dissimilarity score is below a threshold (no spatial information is used)
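A sketch of the cross-view matching rule: a pair needs the same start frame and a dissimilarity below a threshold. The greedy one-to-one strategy, the data layout (lists of `(start_frame, hankelet)` tuples), and the toy scalar stand-ins are assumptions, not details from the slides.

```python
from collections import defaultdict


def match_bilingual(view_a, view_b, dissim, threshold):
    """Pair Hankelets across two synchronized views.

    view_a, view_b: lists of (start_frame, hankelet). Two Hankelets are paired
    only if they start on the same frame and their dissimilarity is below
    `threshold`; no spatial information is used. Greedy one-to-one matching
    (lowest dissimilarity first, per Hankelet of view_a) is an assumption.
    """
    by_start = defaultdict(list)
    for j, (start, _) in enumerate(view_b):
        by_start[start].append(j)

    used, pairs = set(), []
    for i, (start, h_a) in enumerate(view_a):
        candidates = [(dissim(h_a, view_b[j][1]), j)
                      for j in by_start[start] if j not in used]
        if candidates:
            best_d, best_j = min(candidates)
            if best_d < threshold:
                pairs.append((i, best_j))
                used.add(best_j)
    return pairs


# Toy usage: scalars stand in for Hankelets, |a - b| for the dissimilarity.
view_a = [(0, 1.0), (0, 5.0), (3, 2.0)]
view_b = [(0, 1.05), (0, 4.9), (3, 9.0)]
pairs = match_bilingual(view_a, view_b, lambda a, b: abs(a - b), threshold=0.2)
print(pairs)  # [(0, 0), (1, 1)]
```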

Cross-view action recognition
• A labeled dataset is given, with Source and Target views
Training
• Extract Bi-lingual Hankelets and match them using the dissimilarity score
• Build a codebook of Bi-lingual Hankelets using the modified K-means
• Label Hankelets in the Source data using the maximum posterior probability
• Train a one-against-all non-linear SVM on the Source data
Testing

Experiments
Single-view
• KTH dataset: 95.89% average accuracy
Cross-View with knowledge transfer
• Uses only Bi-lingual Hankelets
• IXMAS dataset: 56.4% average accuracy (45.5% improvement)
Cross-View without knowledge transfer
• Uses all Hankelets (not only Bi-lingual ones)
• IXMAS dataset: 90.57% average accuracy (20.28% improvement)
