Human Action Recognition using Lagrangian Descriptors

Esra Acar, Tobias Senst, Alexander Kuhn, Ivo Keller, Holger Theisel, Sahin Albayrak, Thomas Sikora



Esra Acar
Competence Center Information Retrieval and Machine Learning

                            VideoSense Project (www.videosense.eu)
                               SemSeg Project (www.semseg.org)
Outline

▶ Introduction
▶ Lagrangian Methods
▶ Our Method
   Lagrangian Descriptors
   Lagrangian Classifier
   Performance Evaluation
▶ Privacy Aspect
▶ Conclusions & Future Work




Introduction

▶ Human action recognition
   pertains to a multitude of application domains, ranging
      from automatic video surveillance to semantic video
      indexing and retrieval.
   requires the description of complex motion patterns in
      image sequences.
▶ Our aim in this work: To provide a reliable solution for
  recognizing basic human actions such as walking, running
  and boxing.
▶ Final goal: To build a solution that recognizes more complex
  actions, in particular in scenarios involving more than one
  basic action.


Lagrangian Methods

▶ When a sequence of optical flow fields is treated as an
  unsteady vector field, Lagrangian analysis can be applied.
▶ Lagrangian methods have been used for crowd
  segmentation.

▶ The concept of Lagrangian Coherent Structures (LCS)
  describes the properties of trajectories in the space-time
  domain.
▶ In video analysis, this trajectory description corresponds to
  the notion of an edge of an enclosed moving object within
  the image.


Our Method

▶ We focus on two important Lagrangian motion features for
  the description of human actions in image sequences:

     Motion boundaries denote areas of high particle trajectory
      separation and segment areas of different motion patterns
      over varying time intervals.

     Areas of coherent flow motion group regions of similar
      speed over the respective time interval.




Lagrangian Descriptors - 1




▶ The flow map describes the mapping of an initial position to its
  position after a predefined integration time τ starting at t0.
▶ Combining all positions of one specific point over the time
  interval [t0, t0 + τ] yields a polygonal curve denoted as a
  path line (see the sketch below).
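As a rough, hedged illustration (not the authors' code), the sketch below traces one path line by stepping a particle through a sequence of optical flow fields; flow_fields, t0 and tau are assumed inputs, and a simple nearest-neighbour flow lookup is used. The flow map value Φ_{t0}^{t0+τ}(x0, y0) is then simply the last point of the returned path.

    import numpy as np

    def trace_path_line(flow_fields, x0, y0, t0, tau):
        """Integrate one particle from (x0, y0) at frame t0 over tau frames.

        flow_fields is assumed to be a sequence of (H, W, 2) arrays holding
        the per-frame optical flow (dx, dy) in pixels.
        """
        h, w = flow_fields[t0].shape[:2]
        x, y = float(x0), float(y0)
        path = [(x, y)]
        for t in range(t0, t0 + tau):
            # nearest-neighbour lookup of the flow at the current position;
            # bilinear interpolation would yield a smoother path line
            ix = int(np.clip(round(x), 0, w - 1))
            iy = int(np.clip(round(y), 0, h - 1))
            dx, dy = flow_fields[t][iy, ix]
            x, y = x + float(dx), y + float(dy)
            path.append((x, y))
        # positions of the point over [t0, t0 + tau]: the polygonal path line
        return np.asarray(path)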

Lagrangian Descriptors - 2

▶ Technique to extract LCS: the Finite-Time Lyapunov Exponent
  (FTLE), a measure that
   describes the motion boundaries, and
   separates regions of coherent flow behavior.
▶ Given the optical flow field, FTLE is computed as follows:
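Assuming the slide uses the standard definition, the FTLE is

    σ_{t0}^{τ}(x) = (1/|τ|) · ln √( λ_max( (∇Φ_{t0}^{t0+τ}(x))ᵀ ∇Φ_{t0}^{t0+τ}(x) ) ),

i.e. the largest eigenvalue of the Cauchy-Green deformation tensor of the flow-map gradient, log-scaled and normalized by the integration time; ridges of this field mark the motion boundaries. A minimal numpy sketch under that assumption (flow_map_x and flow_map_y are hypothetical dense arrays holding the final x/y position of every pixel):

    import numpy as np

    def ftle_field(flow_map_x, flow_map_y, tau):
        """FTLE from a dense flow map (final x/y positions per pixel)."""
        # spatial gradients of the flow map; axis 0 is y (rows), axis 1 is x (cols)
        dphix_dy, dphix_dx = np.gradient(flow_map_x)
        dphiy_dy, dphiy_dx = np.gradient(flow_map_y)
        h, w = flow_map_x.shape
        ftle = np.zeros((h, w))
        for i in range(h):
            for j in range(w):
                # Jacobian of the flow map at pixel (i, j)
                J = np.array([[dphix_dx[i, j], dphix_dy[i, j]],
                              [dphiy_dx[i, j], dphiy_dy[i, j]]])
                C = J.T @ J                          # Cauchy-Green deformation tensor
                lam_max = np.linalg.eigvalsh(C)[-1]  # largest eigenvalue
                ftle[i, j] = np.log(np.sqrt(max(lam_max, 1e-12))) / abs(tau)
        return ftle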




Lagrangian Descriptors - 3

▶ Similar motion behavior can be formulated using further
  geometric features of the path lines.

▶ The areas of similar motion behavior over time are
  represented with the time-normalized arc length (Λ_arcL).


▶ The time-normalized arc length for a given time t0 and an
  integration interval τ is defined as:
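A plausible reconstruction of this definition (an assumption, consistent with the term "time-normalized arc length") is

    Λ_arcL(x0, t0, τ) = (1/|τ|) ∫_{t0}^{t0+τ} || ∂Φ_{t0}^{t}(x0)/∂t || dt  ≈  (1/|τ|) Σ_i || p_{i+1} − p_i ||,

i.e. the length of the polygonal path line divided by the integration time. A minimal sketch under this assumption, reusing the path produced by the hypothetical trace_path_line helper above:

    import numpy as np

    def time_normalized_arc_length(path, tau):
        """Arc length of a polygonal path line, normalized by the integration time."""
        segments = np.diff(np.asarray(path, dtype=float), axis=0)
        return np.sum(np.linalg.norm(segments, axis=1)) / abs(tau)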




Lagrangian Descriptors - 4




Lagrangian Descriptors - 5

▶ We compute Histograms of Oriented Gradients (HoG) on the
  FTLE and Λ_arcL fields to avoid explicit extraction of ridge
  structures.
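A minimal sketch of this step, assuming scikit-image's hog is an acceptable stand-in for the HoG implementation used in the paper (the cell and block sizes below are illustrative, not the authors' settings):

    from skimage.feature import hog

    def lagrangian_hog(field, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2)):
        """HoG descriptor of a scalar Lagrangian field (FTLE or arc-length map)."""
        return hog(field,
                   orientations=orientations,
                   pixels_per_cell=pixels_per_cell,
                   cells_per_block=cells_per_block,
                   feature_vector=True)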




Lagrangian Classifier

▶ FTLE-HoG and Λ_arcL-HoG descriptors of video frames provide
  complementary information about an action.
▶ These two descriptors are fused before training linear multi-
  class SVMs.
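A minimal scikit-learn sketch of this fusion-then-classification step; the descriptor matrices, labels and the regularization constant C are assumptions rather than values from the paper:

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_action_classifier(X_ftle_hog, X_arc_hog, y, C=1.0):
        """Early (feature-level) fusion of the two HoG descriptors,
        followed by a linear one-vs-rest multi-class SVM."""
        X = np.hstack([X_ftle_hog, X_arc_hog])   # concatenate the descriptors
        clf = LinearSVC(C=C)
        return clf.fit(X, y)

Feature-level fusion keeps the classifier simple and linear; late fusion of per-descriptor SVM scores would be an alternative design.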




Performance Evaluation - 1

             Results on the Weizmann dataset




                Results on the KTH dataset




Performance Evaluation - 2
   Confusion Matrix on the Weizmann dataset

    The most confused action pair is skip-jump.




   Confusion Matrix on the KTH dataset

    The most confused action pairs are:
    • jogging-running,
    • jogging-walking,
    • hand clapping-hand waving,
    • boxing-hand clapping.



Privacy Aspect

▶ Working with image data introduces another important
  aspect: the privacy of the recorded individuals.

▶ In our method, we do not store image sequences directly,
  but optical flow fields.

▶ Optical flow fields only encode abstract motion patterns.

▶ The recognition of person-related features extracted from
  abstract Lagrangian motion patterns is much more difficult.




Conclusions & Future Work

▶     This study is a starting point for representing and recognizing
      complex human actions.
▶     The main difference from previous works is that our method
      transforms the motion information of an individual in a given time
      interval into a 2D space.
▶     It is shown that the FTLE and Λ_arcL Lagrangian measures are well
      adapted to model individual human activities.

▶     Future Work
        Local motion representation with an interest point-based method.
        Multi-level motion representation with a multi-level Bag-of-Words
          approach.
        Recognizing more complex human motions.


Thanks!

                  Questions?



