An MKL-Based Fusion Framework for Real-Time Multi-View Action Recognition
Feng Gu, Francisco Florez-Revuelta, Dorothy Monekosso and 
Paolo Remagnino 
Digital Imaging Research Centre 
Kingston University, London, UK 
December 3rd, 2014 
Outline 
1 Introduction 
2 Framework Overview 
3 Experimental Conditions 
4 Results and Analysis 
5 Conclusions and Future Work 
Background and Motivations 
Real-time multi-view action recognition: 
Gains increasing interest in video surveillance, human-computer interaction, multimedia retrieval, etc.
Provides complementary fields of view (FOVs) of a monitored scene via multiple cameras
Leads to more robust decision making based on multiple heterogeneous video streams
Real-time capability enables continuous long-term monitoring
Where possible, multiple cameras should be deployed to monitor human behaviour, so that data fusion techniques can be applied.
Illustration of the Monitored Scenario 
[Figure: a monitored scene observed by four cameras, C1-C4, with complementary fields of view.]
Motion-Based Person Detector 
We use a state-of-the-art motion-based tracker [6]: 
Each pixel modelled as a mixture of Gaussians in RGB space
Background model used to find foreground pixels in each new frame
Foreground pixels grouped to form large regions associated with the person of interest
Kalman filters used to track foreground detections
Person detections generated for every frame
A minimal sketch of this detection pipeline is given below.
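As an illustration only (not the tracker of [6]), here is a minimal Python sketch of such a pipeline, assuming OpenCV's MOG2 background subtractor and a constant-velocity Kalman filter over the centroid of the largest moving region:

```python
# Illustrative sketch: mixture-of-Gaussians background subtraction, blob
# grouping, and Kalman smoothing of the person detection (not the authors' code).
import cv2
import numpy as np

bg_model = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

# Constant-velocity Kalman filter over the blob centroid state (x, y, dx, dy).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2

def detect_person(frame):
    """Return a smoothed (x, y, w, h) detection for the largest moving region."""
    fg = bg_model.apply(frame)                                  # foreground mask
    fg = cv2.threshold(fg, 127, 255, cv2.THRESH_BINARY)[1]      # drop shadow pixels
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    kf.predict()
    cx, cy = kf.correct(np.array([[x + w / 2], [y + h / 2]], np.float32))[:2, 0]
    return int(cx - w / 2), int(cy - h / 2), w, h
```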
Feature Representation of Videos 
Use STIP and improved dense trajectories (IDT) [7] as local descriptors to extract visual features from a video
Person detections and frame spans define an XYT cuboid associated with an action performed by the monitored person
Apply bag of words (BoW) to compute the feature vector of a cuboid, where K-means clustering is used to generate the codebook
A minimal BoW sketch is given below.
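A minimal sketch of the BoW step, assuming scikit-learn's KMeans; the helper names (build_codebook, bow_vector) are hypothetical and descriptor extraction (STIP/IDT) is treated as given:

```python
# Illustrative BoW encoding: cluster local descriptors with K-means to form a
# codebook, then represent each XYT cuboid as a normalised word histogram.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, num_words=4000, seed=0):
    """descriptors: (M, D) array of local descriptors pooled from training videos."""
    return KMeans(n_clusters=num_words, random_state=seed, n_init=4).fit(descriptors)

def bow_vector(codebook, cuboid_descriptors):
    """L1-normalised histogram of visual-word assignments for one XYT cuboid."""
    words = codebook.predict(cuboid_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```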
Discriminative Models for Classification
Let $x_i^k \in \mathbb{R}^D$, where $i \in \{1, 2, \ldots, N\}$ is the index of a feature vector corresponding to an XYT cuboid and $k \in \{1, 2, \ldots, K\}$ is the index of a camera view. We learn an SVM classifier as
$f(x) = \sum_{i=1}^{N} \alpha_i y_i k(x_i, x) + b$   (1)
We then compute a classification score via a sigmoid function as
$p(y = 1 \mid x) = \dfrac{1}{1 + \exp(-f(x))}$   (2)
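A minimal per-view sketch, assuming scikit-learn's SVC; it maps the SVM decision values f(x) to scores with the sigmoid of Eq. (2). This is illustrative, not the authors' implementation:

```python
# Illustrative per-view classifier: kernel SVM on BoW vectors of one camera
# view, with decision values mapped to classification scores via Eq. (2).
import numpy as np
from sklearn.svm import SVC

def train_view_classifier(X_train, y_train):
    """X_train: (N, D) BoW vectors of one view; y_train: labels in {-1, +1}."""
    return SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

def classification_score(clf, X):
    """Sigmoid of the SVM decision value, as in Eq. (2)."""
    f = clf.decision_function(X)   # f(x) = sum_i alpha_i * y_i * k(x_i, x) + b
    return 1.0 / (1.0 + np.exp(-f))
```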
Simple Fusion Strategies 
Concatenation of Features: concatenate the feature vectors of all camera views into a single feature vector, $\tilde{x}_i = [x_i^1, \ldots, x_i^K]$
Sum of Classification Scores: compute a classification score $p(y = 1 \mid x^k)$ for each camera view as in (2), then average them as $\frac{1}{K} \sum_{k=1}^{K} p(y = 1 \mid x^k)$
Product of Classification Scores: apply the product rule to the classification scores of all camera views, $\prod_{k=1}^{K} p(y = 1 \mid x^k)$
A small numeric sketch of the sum and product rules is given below.
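A small numeric sketch of the sum and product fusion rules, using hypothetical per-view scores from K = 3 cameras:

```python
# Illustrative sum and product fusion of per-view scores p(y=1|x^k).
import numpy as np

view_scores = np.array([0.82, 0.64, 0.91])   # hypothetical p(y=1|x^k), k = 1..3

sum_fusion = view_scores.mean()              # (1/K) * sum_k p(y=1|x^k)
product_fusion = view_scores.prod()          # prod_k p(y=1|x^k)

print(sum_fusion, product_fusion)            # ~0.79 and ~0.478
```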
Multiple Kernel Learning 
Combines multiple kernels corresponding to different data sources (e.g. camera views) via a convex combination such as
$\mathcal{K}(x_i, x_j) = \sum_{k=1}^{K} \beta_k k_k(x_i, x_j)$   (3)
where $\beta_k \geq 0$ and $\sum_{k=1}^{K} \beta_k = 1$. A minimal sketch of this combination is given below.

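A minimal sketch of Eq. (3) with assumed fixed weights beta (a real MKL solver would learn them jointly with the classifier); the RBF base kernels and the combined_kernel helper are illustrative choices, not the authors' implementation:

```python
# Illustrative convex combination of per-view kernel matrices as in Eq. (3).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(view_features, betas, gamma=0.5):
    """view_features: list of (N, D_k) arrays, one per camera view.
    betas: non-negative kernel weights summing to one."""
    betas = np.asarray(betas, dtype=float)
    assert np.all(betas >= 0) and np.isclose(betas.sum(), 1.0)
    return sum(b * rbf_kernel(X, X, gamma=gamma)
               for b, X in zip(betas, view_features))

# The resulting Gram matrix can be fed to a precomputed-kernel SVM,
# e.g. sklearn.svm.SVC(kernel="precomputed").
```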