A brief review
on video
representation
Xiang Xiang
Problems
Action
recognition and
action quality
assessment
Spatial/temporal
action detec-
tion/segmentation
Approaches
Image-set
models
Temporally local
motion models
Temporally
global models
Locally
spatiotemporal
models
Tools used
Compressed
sensing and
sparse recovery
Representation
learning
A brief review on video representation
Xiang Xiang
Department of Computer Science
Johns Hopkins University
May 16, 2018
Outline
1 Problems
Action recognition and action quality assessment
Spatial/temporal action detection/segmentation
2 Approaches
Image-set models
Temporally local motion models
Temporally global models
Locally spatiotemporal models
3 Tools used
Compressed sensing and sparse recovery
Representation learning
Action recognition and action quality assessment
• Facial expression recognition.
• Facial action intensity estimation.
• Human action recognition.
• Human action quality assessment.
• Video face recognition.
• Video categorization.
Spatial/temporal action detection/segmentation
• Facial expression detection.
• Facial action unit detection.
• Camera motion estimation.
• Action detection.
• Object tracking.
• Change detection.
• Video summarization.
Image-set models
• Set theory.
• Distribution based: KL divergence on Gaussian mixtures.
• Subspace based: linear subspaces on Grassmann manifold.
• Sample based: feature averaging, aggregation, covariance
matrix, and Vector of Locally Aggregated Descriptors
(VLAD).
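The sample-based options above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `vlad_encode` and `mean_and_covariance` are hypothetical, and the codebook `centers` is assumed to be given (for example from k-means on training descriptors).

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Aggregate a set of frame descriptors into one VLAD vector.

    descriptors: (n, d) array, one row per frame feature.
    centers:     (k, d) codebook, assumed to be precomputed.
    Returns an L2-normalized vector of length k * d.
    """
    # Squared distance from every descriptor to every center: shape (n, k).
    diffs = descriptors[:, None, :] - centers[None, :, :]
    nearest = np.argmin(np.sum(diffs ** 2, axis=2), axis=1)

    k, d = centers.shape
    vlad = np.zeros((k, d))
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members) > 0:
            # Sum of residuals between descriptors and their assigned center.
            vlad[j] = np.sum(members - centers[j], axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

def mean_and_covariance(descriptors):
    """Feature averaging and covariance-matrix descriptors of an image set."""
    return descriptors.mean(axis=0), np.cov(descriptors, rowvar=False)
```

Both functions ignore frame order, which is what makes them image-set models rather than temporal ones.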
Temporally local motion models
• Data association, region proposal linking, and thread
building.
• Motion model: optical flow, Two-Stream CNN, and
Bayesian filter.
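As one concrete instance of a Bayesian filter, a constant-velocity Kalman filter can smooth a tracked coordinate (say, the x-position of a box center) from frame to frame. The sketch below is illustrative only: the function name `kalman_track` and the default noise variances are assumptions, not values from the slides.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, process_var=1e-2, meas_var=1.0):
    """Filter noisy 1D positions with a [position, velocity] state."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity transition
    H = np.array([[1.0, 0.0]])             # only the position is observed
    Q = process_var * np.eye(2)            # process noise (assumed value)
    R = np.array([[meas_var]])             # measurement noise (assumed value)

    x = np.array([measurements[0], 0.0])   # start at first measurement, zero velocity
    P = np.eye(2)
    estimates = []
    for z in measurements:
        # Predict the state one frame ahead.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the new measurement.
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

The model is temporally local in the sense that each update depends only on the previous state and the current frame's measurement.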
Temporally global models
• Temporal aggregation: NetVLAD.
• Hidden Markov Models (HMM) and Conditional Random
Fields (CRF).
• Temporal coding: Temporal Convolutional Networks
(TCN), Temporal Segment Networks (TSN), and Deep
Temporal Linear Encoding Networks.
• Recurrent neural networks (RNN): Long Short-Term
Memory (LSTM).
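To make the temporal-coding idea concrete, here is a sketch in the spirit of TSN: split the video into equal segments, take one frame per segment, and average the per-frame class scores. It is a simplification under stated assumptions: the helper names are hypothetical, the per-frame scores are assumed to come from some frame-level classifier, and the center frame of each segment is used deterministically.

```python
import numpy as np

def sample_segment_indices(num_frames, num_segments):
    """Index of the center frame of each of num_segments equal segments."""
    edges = np.linspace(0, num_frames, num_segments + 1)
    centers = (edges[:-1] + edges[1:]) / 2.0
    return np.minimum(centers.astype(int), num_frames - 1)

def segment_consensus(frame_scores, num_segments):
    """Average the class scores of one sampled frame per segment.

    frame_scores: (num_frames, num_classes) array of per-frame scores.
    Returns a (num_classes,) video-level score vector.
    """
    idx = sample_segment_indices(len(frame_scores), num_segments)
    return frame_scores[idx].mean(axis=0)
```

Because the sampled frames span the whole clip, the resulting score reflects the video globally rather than a short local window.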
Locally spatiotemporal models
• Spatiotemporal aggregation: ActionVLAD.
• Volumetric modeling: 3D graph models.
• Action tube/tubelet proposal: 3D Convolutional Neural
Network (CNN).
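The basic operation behind a 3D CNN is a filter that slides over time as well as space. The naive single-channel version below is a sketch for intuition only (the name `conv3d_valid` is hypothetical, and real networks use optimized, multi-channel, learned filters).

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 3D cross-correlation with no padding.

    clip:   (T, H, W) grayscale video volume.
    kernel: (t, h, w) spatiotemporal filter.
    Returns a (T - t + 1, H - h + 1, W - w + 1) response volume.
    """
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output value sees a small spatiotemporal neighborhood.
                out[i, j, k] = np.sum(clip[i:i + t, j:j + h, k:k + w] * kernel)
    return out
```

For example, a kernel of shape (2, 1, 1) with values -1 and +1 computes the frame-to-frame difference at every pixel, a simple local motion cue.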
Compressed sensing and sparse recovery
One goal is to find underlying structures in noisy observations.
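• Linear algebra: $y = Dx + e = [\,D \mid I\,] \begin{bmatrix} x \\ e \end{bmatrix} = \tilde{D}\tilde{x}$, where $\tilde{D} = [\,D \mid I\,]$ and $\tilde{x} = [x, e]^T$.
• Sparse coding: $[x^*, e^*]^T = \tilde{x}^* = \arg\min_{\tilde{x}} \operatorname{sparsity}(\tilde{x})$ subject to $y = \tilde{D}\tilde{x}$.
• Convex optimization: $x^* = \arg\min_x \|y - Dx\|_2^2 + \lambda \|x\|_1$.
• Matrix theory and matrix approximation: $Y = L + E$ so
that $L^* = \arg\min_L \|Y - L\|_F^2 + \lambda \|L\|_*$, which is the robust
version of Principal Component Analysis (PCA). PCA finds the
projection $U = \arg\min_U \|Y - UU^T Y\|_F^2$ s.t. $U^T U = I$.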
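The two regularized problems above have simple solvers. The sketch below is one possible choice, not the method used in the talk: iterative soft-thresholding (ISTA) for the l1-regularized least-squares problem, and singular value thresholding for the nuclear-norm problem. All function names and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(y, D, lam, num_iters=500):
    """Minimize ||y - D x||_2^2 + lam * ||x||_1 by iterative soft-thresholding."""
    # The gradient 2 D^T (D x - y) has Lipschitz constant 2 * sigma_max(D)^2.
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    x = np.zeros(D.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def singular_value_threshold(Y, lam):
    """Closed-form minimizer of ||Y - L||_F^2 + lam * ||L||_*."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # With an unscaled squared Frobenius term, the shrinkage amount is lam / 2.
    return (U * np.maximum(s - lam / 2.0, 0.0)) @ Vt
```

To recover the signal and the error jointly, as in the stacked form on this slide, one could call `ista` with `np.hstack([D, np.eye(len(y))])` in place of `D` and split the result into its `x` and `e` parts.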
Representation learning
Representation here means that we somehow transform the
data so that its essential structure is made more visible or
accessible.
• PCA.
• K-means clustering.
• Linear Discriminant Analysis (LDA).
• Independent component analysis (ICA).
• Deep CNN and RNN.
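As a small example of such a transformation, PCA can be computed from the singular value decomposition of the centered data. The sketch assumes one sample per row (the transpose of a column-per-sample convention), and the names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np

def pca_fit(X, num_components):
    """Fit PCA on X of shape (n_samples, n_features).

    Returns the feature mean and a (n_features, num_components)
    basis U with orthonormal columns.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:num_components].T

def pca_transform(X, mean, U):
    """Project samples onto the principal subspace."""
    return (X - mean) @ U
```

Projecting with `pca_transform` and mapping back with `codes @ U.T + mean` gives the best rank-limited reconstruction in the least-squares sense, which is the structure PCA makes accessible.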
