Action Recognition based Graph Cut

Action Recognition using Graph-Cut (I)
J.Iveel
2014-9-23

Intro
• The proposal is to recognize human actions from video
using a Graph-Cut approach.
• The algorithmic stages are defined as follows:
– Pre-GraphCut: The input video segment S is
converted into a graphical representation Gs(V,E).
– Post-GraphCut: Given Gs(V,E) and the optimum
action-category labels Lv for its nodes V, which are the
output of Graph-Cut, select a set of sub-graphs where
the action(s) of interest might have happened.
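The sub-graph selection in the last stage can be sketched as below. This is only an illustration under an assumption the slides do not state: a "sub-graph" is taken to be a connected component of nodes that share the label of interest. The function name `action_subgraphs` and the toy labels are hypothetical.

```python
def action_subgraphs(labels, edges, target):
    """Group nodes labelled `target` into connected components.

    labels: dict node -> label (as produced by some Graph-Cut solver).
    edges:  iterable of (node_a, node_b) pairs from the video graph.
    Assumption: a sub-graph of interest is a connected component of
    same-label nodes; the slides do not define the selection rule.
    """
    # Adjacency restricted to nodes that carry the target label.
    adj = {v: set() for v in labels if labels[v] == target}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.add(v)
            stack.extend(adj[v] - seen)
        components.append(comp)
    return components


# Toy chain of four nodes; node 3 is background and splits the "walk" nodes.
labels = {1: "walk", 2: "walk", 3: "bg", 4: "walk"}
edges = [(1, 2), (2, 3), (3, 4)]
components = action_subgraphs(labels, edges, "walk")
```

Each returned set is one candidate region in which the action may have occurred.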

Some Notation
• “Video segment”, S, refers to a set of local feature points,
each extracted at a location X and described by a descriptor D.
• “Confidence score” refers to the likelihood of a class label l
given an observation o, i.e. P(l | o).

Pre-GraphCut
• Converting a video segment into a graphical representation
requires:
(1) Breaking down the whole video segment S into spatio-temporal
grids. Each grid volume is a node Vi, connected
to its neighbours by edges Ei in graph Gs.
(2) Assigning a confidence score to each node Vi.
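Step (1) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes feature locations are (x, y, t) triples, that only grid cells containing at least one feature become nodes, and that each node is linked to its face-adjacent cells (6-neighbourhood). The function name and cell sizes are hypothetical.

```python
from collections import defaultdict


def build_grid_graph(points, cell_size):
    """Bin feature locations into spatio-temporal cells and link neighbours.

    points:    list of (x, y, t) feature locations.
    cell_size: (sx, sy, st) extent of one grid volume.
    Returns (nodes, edges): nodes maps a cell index (i, j, k) to the list
    of point indices inside it; edges is a set of adjacent cell pairs.
    Assumptions: empty cells are skipped, 6-neighbourhood connectivity.
    """
    sx, sy, st = cell_size
    nodes = defaultdict(list)
    for idx, (x, y, t) in enumerate(points):
        cell = (int(x // sx), int(y // sy), int(t // st))
        nodes[cell].append(idx)

    edges = set()
    for (i, j, k) in nodes:
        # Looking only in the +x, +y, +t directions adds each edge once.
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbour = (i + di, j + dj, k + dk)
            if neighbour in nodes:
                edges.add(((i, j, k), neighbour))
    return dict(nodes), edges


points = [(5, 5, 1), (12, 6, 2), (7, 3, 9), (40, 40, 40)]
nodes, edges = build_grid_graph(points, cell_size=(10, 10, 8))
```

In this toy input the first three points fall into adjacent cells and the last one forms an isolated node.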

Node Confidence-Score
• The most challenging problem is (2): assigning a
confidence score to each node.
– A node is simply the set of feature points within
its grid volume.
– Therefore, node confidence can be defined by
an unknown function, g, over the feature
points inside the node.

Node Confidence-Score
• The naïve approach is to find a confidence score for
each feature point inside the node and accumulate
(sum) these scores to get the node score.
• The next step is then to find the feature confidence score,
i.e. the likelihood of class l given local feature fj.
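The accumulation step can be sketched as below, assuming each feature point already carries one score per class label. The function name and the example labels are illustrative only.

```python
def node_confidence(feature_scores, labels):
    """Sum per-feature confidence scores into one node score per label.

    feature_scores: list of dicts, one per feature point in the node,
                    each mapping label -> confidence score.
    labels:         iterable of class labels to accumulate.
    """
    return {l: sum(fs[l] for fs in feature_scores) for l in labels}


# Two feature points inside one node, scored for two hypothetical classes.
scores = [{"walk": 0.5, "run": -0.25}, {"walk": 0.25, "run": 0.5}]
node_score = node_confidence(scores, ["walk", "run"])
```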

Feature Confidence-Score (1)
• The target is to measure the likelihood of class l given a
local feature fj, P(l | fj).
• A BOV histogram is constructed for each test video
segment, using centroids C.
• A binary linear SVM is trained to produce a weight
vector for class label l.
• Given a feature point from a test segment, its
confidence score is computed in one of two ways:
(1) Hard Assignment
(2) N-Soft Assignment
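The two assignment schemes might look like the sketch below. The slide formulas are not available, so everything here is an assumption: the SVM vector `w` is taken to hold one weight per centroid, the bias is ignored, hard assignment returns the weight of the nearest centroid, and N-soft assignment averages the weights of the N nearest centroids using Gaussian weights with a hypothetical bandwidth `sigma`.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hard_score(descriptor, centroids, w):
    """Assumed hard assignment: SVM weight of the single nearest centroid."""
    k = min(range(len(centroids)),
            key=lambda i: sq_dist(descriptor, centroids[i]))
    return w[k]


def soft_score(descriptor, centroids, w, n=2, sigma=1.0):
    """Assumed N-soft assignment: weighted mean over the n nearest centroids.

    The Gaussian weighting exp(-d^2 / (2 sigma^2)) is an illustrative
    choice, not taken from the slides.
    """
    nearest = sorted((sq_dist(descriptor, c), i)
                     for i, c in enumerate(centroids))[:n]
    weights = [math.exp(-d / (2 * sigma ** 2)) for d, _ in nearest]
    total = sum(weights)
    return sum(wt / total * w[i] for wt, (_, i) in zip(weights, nearest))


centroids = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
w = [1.0, -1.0, 0.5]          # hypothetical per-centroid SVM weights
feature = (0.5, 0.0)
hard = hard_score(feature, centroids, w)
soft = soft_score(feature, centroids, w, n=2)
```

With n=1 the soft score reduces to the hard score, which is a quick sanity check on any implementation of the two schemes.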

Experiment: Feature Confidence (1)
• Hard-Assignment case:

Experiment: Feature Confidence (2)
• N-Soft Assignment case:

Node Cost-Value (1)
• The Graph-Cut framework minimizes the total penalty/cost
of single nodes and of neighbouring node pairs, given a node
label configuration L.
• The node cost score is inversely related to the likelihood
or confidence score: the higher the confidence, the lower the cost.
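The quantity being minimized can be illustrated by evaluating it for a given labelling. This sketch only computes the total cost; it does not perform the min-cut optimization. The pairwise term is assumed to be a Potts penalty (a constant `smoothness` charged whenever two connected nodes disagree), which the slides do not specify.

```python
def total_cost(labels, unary, edges, smoothness=1.0):
    """Total cost of a label configuration.

    labels:     dict node -> chosen label.
    unary:      dict node -> dict label -> node cost score.
    edges:      iterable of (node_a, node_b) pairs.
    smoothness: assumed Potts penalty for neighbours with different labels.
    """
    data_term = sum(unary[v][labels[v]] for v in labels)
    pair_term = sum(smoothness for a, b in edges if labels[a] != labels[b])
    return data_term + pair_term


unary = {
    "a": {"walk": 0.0, "bg": 1.0},
    "b": {"walk": 0.5, "bg": 0.25},
    "c": {"walk": 1.0, "bg": 0.0},
}
edges = [("a", "b"), ("b", "c")]
cost_1 = total_cost({"a": "walk", "b": "walk", "c": "bg"}, unary, edges, 0.5)
cost_2 = total_cost({"a": "walk", "b": "bg", "c": "bg"}, unary, edges, 0.5)
```

Here the second configuration is cheaper, so a solver minimizing this cost would prefer it over the first.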

Node Cost-Score (2)
• The node confidence score is assumed to be the sum of the
feature point scores (using hard assignment).
• The following inverse relationships were considered for
deriving the node cost score:
(1) NLog (Negative Log-likelihood)
(2) Norm (Negative Normalized Confidence Score)
(3) Naive (Negative Raw Confidence Score)

Method 1: NLog
• Probabilistic interpretation: Platt [1] showed that an SVM
confidence score s can be interpreted probabilistically by fitting
a parametric sigmoid: P(l | s) = 1 / (1 + exp(A·s + B)).
• Negative log-likelihood: In MRF (Graph-Cut) formulations, cost
values are often the negative log of the measurement noise
likelihood. Similarly, once the confidence values are translated
into probabilities, the negative log is applied to derive the
cost score: cost = -log P(l | s).
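A minimal sketch of the NLog mapping, using the standard Platt sigmoid P = 1 / (1 + exp(A·s + B)). The default values A = -1 and B = 0 are placeholders chosen for illustration, not the parameters used in the experiments.

```python
import math


def nlog_cost(score, A=-1.0, B=0.0):
    """Negative log of the Platt probability for a raw SVM score.

    -log(1 / (1 + e^z)) = log(1 + e^z) with z = A * score + B,
    evaluated in a numerically stable form.
    """
    z = A * score + B
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def platt_probability(score, A=-1.0, B=0.0):
    """Platt sigmoid, recovered from the cost to avoid overflow."""
    return math.exp(-nlog_cost(score, A, B))


confident = nlog_cost(5.0)    # high confidence -> low cost
doubtful = nlog_cost(-5.0)    # low confidence -> high cost
```

With a negative A, larger scores map to probabilities closer to 1 and therefore to smaller costs, which matches the inverse relationship the method relies on.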

Method 2: Norm
• The confidence score is scaled to lie between 0 and 1. The
cost value is then the negative of the scaled value.
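One way to realize this mapping is sketched below. The slides do not say how the scaling is done, so min-max scaling over the scores being compared is an assumption, as is the handling of the degenerate case where all scores are equal.

```python
def norm_costs(scores):
    """Scale scores to [0, 1] (min-max, assumed) and negate them.

    Returns one cost per input score, each in [-1, 0].
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: no score is preferred, so use a constant cost.
        return [0.0 for _ in scores]
    return [-(s - lo) / (hi - lo) for s in scores]


costs = norm_costs([-2.0, 0.0, 2.0])
```

The highest-scoring entry receives the lowest cost (-1) and the lowest-scoring entry receives cost 0.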

Method 3: Naive
• The cost value is simply the negative of the raw
confidence score: cost = -score.

Experiment: Node Cost Score (1)
• With default parameters, the Naive approach surprisingly outperforms the other two methods. The
worst performance is observed with the Norm method.
• The NLog approach performed worse than expected. The reason may lie in
the tuning parameters, A and B, of the sigmoid equation.
• In particular, the parameter A controls the slope. Let's inspect this parameter's effect on the
performance.
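The role of A as a slope control can be checked numerically. The sketch below assumes the Platt sigmoid form 1 / (1 + exp(A·s + B)) with B = 0 and estimates the slope at s = 0 by a central difference; the sampled A values are arbitrary and are not the settings used in the experiment.

```python
import math


def sigmoid(score, A, B=0.0):
    """Platt-style sigmoid 1 / (1 + exp(A * score + B))."""
    return 1.0 / (1.0 + math.exp(A * score + B))


def slope_at_zero(A, h=1e-6):
    """Central-difference estimate of the sigmoid slope at score 0."""
    return (sigmoid(h, A) - sigmoid(-h, A)) / (2 * h)


# Slope for a few illustrative values of A.
slopes = {A: slope_at_zero(A) for A in (-0.5, -1.0, -4.0)}
```

Analytically the slope at zero is -A/4 when B = 0, so a larger |A| gives a sharper transition between low and high probability.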

• NLog approach: Sigmoid parameter A's effect on the performance

Num  Method                       Avg. Recognition
1    NLog (optimized parameter)   96.8 %
2    Norm                         95.8 %
3    Naive                        93.5 %

Conclusion
• These slides explored two main questions, both related
to the construction of the video graph G. A few methods were
proposed for each and evaluated on the KTH dataset.
– (i) Assigning a confidence score at feature level
● Soft assignment
● Hard assignment
– (ii) Assigning a confidence score at node level
● NLog (Negative Log-likelihood)
● Norm
● Naive

Future Work
• Future work will explore:
– Alternative construction of the video graph:
● Instead of a predefined grid, use super-voxels to
choose node regions.
– Single-feature confidence score:
● Instead of BOV, use the VLAD descriptor to
obtain a more discriminative feature
representation.
