This survey first briefly reviews the technical problems that arise in video analysis. It then categorizes existing approaches to give readers an overview of the status quo. Finally, it lists the relevant tools.
A brief review on video representation
Xiang Xiang
Department of Computer Science
Johns Hopkins University
May 16, 2018
Outline
1. Problems
  • Action recognition and action quality assessment
  • Spatial/temporal action detection/segmentation
2. Approaches
  • Image-set models
  • Temporally local motion models
  • Temporally global models
  • Locally spatiotemporal models
3. Tools used
  • Compressed sensing and sparse recovery
  • Representation learning
Action recognition and action quality assessment
• Facial expression recognition.
• Facial action intensity estimation.
• Human action recognition.
• Human action quality assessment.
• Video face recognition.
• Video categorization.
Spatial/temporal action detection/segmentation
• Facial expression detection.
• Facial action unit detection.
• Camera motion estimation.
• Action detection.
• Object tracking.
• Change detection.
• Video summarization.
Image-set models
• Set theory.
• Distribution based: KL divergence on Gaussian mixtures.
• Subspace based: linear subspaces on Grassmann manifold.
• Sample based: feature averaging, aggregation, covariance matrix, and Vector of Locally Aggregated Descriptors (VLAD).
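To make the sample-based bullet concrete, here is a minimal sketch of VLAD aggregation over a set of per-frame descriptors. The function name `vlad`, the hard nearest-center assignment, and the final L2 normalization are illustrative assumptions; the codebook `centers` is assumed to have been learned beforehand (for example by k-means).

```python
def vlad(descriptors, centers):
    """Aggregate a set of D-dimensional descriptors into one K*D VLAD vector.

    descriptors: list of feature vectors (e.g. one per frame).
    centers: list of K codebook centers, assumed to be learned beforehand.
    """
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest center (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])),
        )
        # Accumulate the residual between the descriptor and that center.
        for i in range(d):
            residual_sums[nearest][i] += x[i] - centers[nearest][i]
    # Concatenate the per-center residual sums and L2-normalize the result.
    flat = [v for row in residual_sums for v in row]
    norm = sum(v * v for v in flat) ** 0.5
    return [v / norm for v in flat] if norm > 0 else flat
```

The output length depends only on the codebook, not on the number of frames, so videos of different lengths map to vectors of the same size.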
Temporally local motion models
• Data association, region proposal linking, and thread building.
• Motion model: optical flow, Two-Stream CNN, and Bayesian filter.
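As one possible instance of the Bayesian filter mentioned above, the sketch below implements a scalar Kalman filter, which is the Bayesian filter under linear-Gaussian assumptions. The random-walk motion model, the function name `kalman_1d`, and all parameter names are assumptions made for illustration, for example to smooth one coordinate of a tracked box across frames.

```python
def kalman_1d(measurements, process_var, meas_var, init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter with an assumed random-walk motion model.

    State model:       x_t = x_{t-1} + w,  w ~ N(0, process_var)
    Measurement model: z_t = x_t + v,      v ~ N(0, meas_var)
    Returns a list of (posterior mean, posterior variance), one per measurement.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        # Predict: the random walk keeps the mean and inflates the uncertainty.
        var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        mean += gain * (z - mean)
        var *= 1.0 - gain
        estimates.append((mean, var))
    return estimates
```

With `process_var` set to zero the filter reduces to a running Bayesian estimate of a constant, and the posterior variance shrinks with every frame.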
Temporally global models
• Temporal aggregation: NetVLAD.
• Hidden Markov Models (HMM) and Conditional Random Fields (CRF).
• Temporal coding: Temporal Convolutional Networks (TCN), Temporal Segment Networks (TSN), and Deep Temporal Linear Encoding Networks.
• Recurrent neural networks (RNN): Long Short-Term Memory (LSTM).
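To illustrate how an HMM reasons over a whole sequence rather than frame by frame, here is a minimal sketch of Viterbi decoding. The function name `viterbi`, the state labels, and the probability tables in the usage note are hypothetical; in practice the tables would be estimated from annotated video.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: probability of the most likely path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the best predecessor state for s at time t.
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score * emit_p[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

For example, with two assumed states `"rest"` and `"move"`, self-transition probability 0.9, and per-frame motion-energy observations `"low"`/`"high"` emitted with probability 0.8 by the matching state, the call `viterbi(["low", "low", "high", "high"], ...)` segments the clip into two resting frames followed by two moving frames.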
Locally spatiotemporal models
• Spatiotemporal aggregation: ActionVLAD.
• Volumetric modeling: 3D graph models.
• Action tube/tubelet proposal: 3D Convolutional Neural Network (CNN).
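The core operation of a 3D CNN is a convolution that slides over time as well as space. The sketch below is a naive, single-channel "valid" 3D cross-correlation on nested lists, written for readability rather than speed; the function name `conv3d_valid` and the time-by-height-by-width layout are assumptions.

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation without padding.

    volume: nested list indexed as [time][row][col].
    kernel: nested list indexed the same way, no larger than the volume.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(t - kt + 1):
        plane = []
        for j in range(h - kh + 1):
            row = []
            for k in range(w - kw + 1):
                # Weighted sum over a local spatiotemporal neighborhood.
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += volume[i + a][j + b][k + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

A kernel of temporal extent 2 and spatial extent 1, such as `[[[-1]], [[1]]]`, computes a plain frame difference, which shows how temporal filters respond to motion rather than appearance.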
Compressed sensing and sparse recovery
One goal is to find underlying structures in noisy observations.
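• Linear algebra: $y = Dx + e = [\,D \mid I\,] \begin{bmatrix} x \\ e \end{bmatrix} = \tilde{D}\tilde{x}$, where $\tilde{D} = [\,D \mid I\,]$ and $\tilde{x} = [x, e]^T$.
• Sparse coding: $[x^*, e^*]^T = \tilde{x}^* = \arg\min_{\tilde{x}} \mathrm{sparsity}(\tilde{x})$ subject to $y = \tilde{D}\tilde{x}$.
• Convex optimization: $x^* = \arg\min_x \|y - Dx\|_2^2 + \lambda \|x\|_1$.
• Matrix theory and matrix approximation: $Y = L + E$, so that $L^* = \arg\min_L \|Y - L\|_F^2 + \lambda \|L\|_*$, which is the robust version of Principal Component Analysis (PCA). PCA finds the projection $U = \arg\min_U \|Y - UU^T Y\|_F^2$ s.t. $U^T U = I$.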
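The L1-regularized least-squares problem in the convex optimization bullet can be solved with iterative soft thresholding (ISTA). The sketch below is one standard solver, not necessarily the one intended by the slides; the function names and the conservative step size based on the Frobenius norm of D are assumptions chosen to keep the example dependency-free.

```python
def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau (proximal operator of tau * L1 norm)."""
    return [max(abs(a) - tau, 0.0) * (1 if a > 0 else -1) for a in v]


def ista(D, y, lam, n_iter=500):
    """Minimize ||y - D x||_2^2 + lam * ||x||_1 by iterative soft thresholding.

    D: list of rows (m x n), y: list of length m.
    """
    m, n = len(D), len(D[0])
    # Step size 1/L with L = 2 * ||D||_F^2, an upper bound on the
    # Lipschitz constant 2 * ||D||_2^2 of the gradient of the quadratic term.
    step = 1.0 / (2.0 * sum(d * d for row in D for d in row))
    x = [0.0] * n
    for _ in range(n_iter):
        # Gradient of ||y - D x||_2^2 is 2 * D^T (D x - y).
        residual = [sum(D[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [2.0 * sum(D[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x
```

When D is the identity, the minimizer is known in closed form as the soft thresholding of y by lam/2, which gives a quick sanity check: small entries of y are set exactly to zero and large ones are shrunk.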
Representation learning
Representation here means transforming the data so that its essential structure becomes more visible or accessible.
• PCA.
• K-means clustering.
• Linear Discriminant Analysis (LDA).
• Independent Component Analysis (ICA).
• Deep CNN and RNN.
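As a small example of making structure "more visible", the sketch below finds the first principal component of a data set by power iteration on the sample covariance matrix. The function name and the choice of power iteration (instead of a full eigendecomposition or SVD) are assumptions made to keep the example self-contained.

```python
def first_principal_component(samples, n_iter=200):
    """Return (mean, unit direction of largest variance) for a list of vectors.

    Assumes the samples are not all identical and that the all-ones start
    vector is not orthogonal to the leading eigenvector of the covariance.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    u = [1.0] * d
    for _ in range(n_iter):
        # Multiply by the covariance and renormalize; u converges to the
        # eigenvector with the largest eigenvalue.
        v = [sum(cov[a][b] * u[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    return mean, u
```

Projecting each centered sample onto the returned direction gives a one-dimensional representation that preserves as much variance as possible.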