Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection
Foteini Markatopoulou1,2, Vasileios Mezaris1, Ioannis Patras2
1 Information Technologies Institute (ITI), CERTH, Thessaloniki, Greece
2 Queen Mary University of London, London, UK
A) Deciding on a fine-tuning process: comparison of three alternatives ((i) Baseline, (ii) Extension, (iii) Re-initialization) on the pre-trained 8-layer AlexNet (MXinfAP, %):

Fine-tuning process         (i) Baseline   (ii) Extension           (iii) Re-initialization
Fine-tuning parameters      (a)            (b)      (c)      (d)    (e)      (f)
#Neurons in the new layer   -              1096     2048     4096   1096     2048
DefaultTL-Softmax           16.76          16.22    15.53    14.79  16.24    16.68
DefaultTL-Hinge             13.26          19.91    19.89    18.76  19.20    15.30
Proposed DMTL               12.71          15.82    14.89    19.93  18.39    19.47
Proposed DMTL_LC            15.78          20.13    22.60    20.84  22.54    21.47
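To make the three fine-tuning processes concrete, here is a minimal PyTorch sketch; it is our illustration, not code from the paper. It assumes torchvision's AlexNet layer layout, and the choice of exactly which classifier layers each process modifies is our reading of the descriptions above.

```python
# Illustrative only: the three fine-tuning processes from (A), sketched on
# torchvision's AlexNet (classifier[4] = fc7, classifier[6] = fc8/classifier).
import torch.nn as nn
import torchvision.models as models

NUM_CONCEPTS = 38  # number of evaluated concepts in the experimental setup

def baseline(net):
    # (i) Baseline: only swap the 1000-way ImageNet classifier for a new
    # concept-scoring layer, then fine-tune.
    net.classifier[6] = nn.Linear(4096, NUM_CONCEPTS)
    return net

def extension(net, width=2048):
    # (ii) Extension: insert one extra fully-connected layer of the chosen
    # width between the last hidden layer and the new classification layer.
    net.classifier[6] = nn.Sequential(
        nn.Linear(4096, width), nn.ReLU(inplace=True),
        nn.Linear(width, NUM_CONCEPTS))
    return net

def reinitialization(net, width=2048):
    # (iii) Re-initialization: replace the last hidden layer (fc7) with a
    # freshly initialized one of the chosen width, plus a new classifier.
    net.classifier[4] = nn.Linear(4096, width)
    net.classifier[6] = nn.Linear(width, NUM_CONCEPTS)
    return net

net = extension(models.alexnet(pretrained=True), width=2048)  # column (c) in (A)
```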
B) Comparisons: DMTL_LC compared to different STL and MTL methods using two pre-trained DCNNs (MXinfAP, %):

Methods                       AlexNet   AlexNet+DefaultTL (best from (A))
Direct output                 -         19.91
STL (e.g. [4])    LR          18.57     22.34
                  LSVM        20.59     22.21
                  KSVM        18.81     21.79
MTL               AMTL [2]    20.44     22.21
                  CMTL [3]    18.18     22.38
                  2S-NN [1]   20.19     23.12
Proposed DMTL_LC              22.60     25.04
Experimental setup
• Dataset: TRECVID SIN 2013
o Internet Archive videos
o Training set: 800 hours
o Test set: 200 hours
• Evaluated concepts: 38
• Evaluation measure: mean extended inferred average precision, MXinfAP (%)
• Compared methods:
o STL using: a) LR, b) LSVM, c) kernel SVM (KSVM).
o MTL using: a) AMTL [2], b) CMTL [3], c) 2S-NN [1], i.e., the two-sided neural network instantiated with GO-MTL.
Proposed DMTL_LC
• Proposed method: Deep multi-task learning with label correlation constraint (DMTL_LC).
• Contribution: A deep convolutional neural network (DCNN) that jointly considers task relations and label relations.
• Algorithmic details:
o Extend a DCNN with a two-sided network, which is equivalent to optimizing an MTL-like loss (specifically, the GO-MTL loss function).
o Extend the two-sided network with a label-based constraint, in order to incorporate statistical information on the pairwise correlations between concepts (see the sketch below).
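To illustrate these two steps, below is a minimal PyTorch sketch of a two-sided output head with a GO-MTL-style weight factorization plus a label-correlation penalty. The names, the binary cross-entropy surrogate, and the exact form of the correlation term are our assumptions for illustration; the poster states only that the loss is GO-MTL-like and that the constraint encodes pairwise concept correlations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoSidedHead(nn.Module):
    """Factorized output layer: concept scores = (x @ L) @ S, i.e. W = L S."""
    def __init__(self, feat_dim=4096, n_latent=10, n_concepts=38):
        super().__init__()
        self.L = nn.Linear(feat_dim, n_latent, bias=False)  # shared latent side
        self.S = nn.Linear(n_latent, n_concepts)            # task-specific side

    def forward(self, x):          # x: DCNN features, shape (batch, feat_dim)
        return self.S(self.L(x))   # concept scores, shape (batch, n_concepts)

def dmtl_lc_loss(scores, targets, head, corr, mu=1e-4, lam=1e-4, gamma=1e-2):
    # Per-concept classification loss (binary cross-entropy here for brevity;
    # the poster also evaluates softmax and hinge variants).
    cls = F.binary_cross_entropy_with_logits(scores, targets)
    # GO-MTL-style regularizers: sparse task codes S, small shared basis L.
    reg = mu * head.S.weight.abs().sum() + lam * head.L.weight.pow(2).sum()
    # Label-based constraint (illustrative form): pull the predicted
    # probabilities of positively correlated concept pairs towards each other.
    p = torch.sigmoid(scores)                   # (batch, C)
    diff = p.unsqueeze(2) - p.unsqueeze(1)      # (batch, C, C) pairwise diffs
    lc = gamma * (corr.clamp(min=0) * diff.pow(2)).mean()
    return cls + reg + lc

# corr: a fixed (C, C) matrix of pairwise concept correlations (e.g., phi
# coefficients) estimated from the training-set ground-truth annotations.
```

The key point is that every concept's weight vector is a sparse combination of shared latent detectors (task relations), while the correlation term couples the outputs of related concepts (label relations).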
Experimental results
Method variants appearing in (A) and (B):
• DefaultTL [4]: Replaces the classification layer of AlexNet with a new layer.
• DMTL: Uses a two-sided network but does not use the label-based constraint of DMTL_LC.
• DMTL_LC: Uses a two-sided network and the label-based constraint.
References:
[1] Y. Yang and T. M. Hospedales. A unified perspective on multi-domain and multi-task learning. In Int. Conf. on Learning Representations (ICLR), San Diego, 2015.
[2] G. Sun et al. Adaptive multi-task learning for fine-grained categorization. In IEEE Int. Conf. on Image Processing (ICIP), Quebec, 2015.
[3] J. Zhou et al. Clustered multi-task learning via alternating structure optimization. In Advances in Neural Information Processing Systems (NIPS), 2011.
[4] K. Chatfield et al. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference (BMVC), 2014.
Problem and motivation
• Video concept detection: Assign one or more semantic concepts to video fragments (e.g., video keyframes) based on a predefined concept list.
• Motivation: Concepts do not appear in isolation from each other.
Background
• Two kinds of relations can be exploited: task relations (between the learning tasks) and label relations (between the concept labels).
• Single-task learning (STL): each concept detector is learned independently of the others.
• Multi-task learning (MTL): related tasks are learned jointly. Typical solution: learn k latent concepts that are common to all of the tasks; for each task t, learn which of these latent concepts describe it.
• Learning task grouping and overlap in MTL (GO-MTL): let 𝐰^(t) = 𝐋𝐬^(t), where the k columns of the shared matrix 𝐋 are the latent concepts and 𝐬^(t) is a sparse, task-specific combination vector. For linear models (e.g., SVM): 𝐲^(t) = 𝐗^(t)𝐰^(t).
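For concreteness, here is a small NumPy illustration of this factorization; all sizes and values are made up.

```python
import numpy as np

D, k, T, n = 4096, 10, 38, 5      # feature dim, latent concepts, tasks, samples
rng = np.random.default_rng(0)

L = rng.standard_normal((D, k))   # shared basis: one column per latent concept
S = rng.standard_normal((k, T))   # s^(t) columns: how each task combines them
S[np.abs(S) < 1.0] = 0.0          # crude sparsification, since each s^(t) is sparse

X = rng.standard_normal((n, D))   # X^(t): feature matrix (shared across tasks here)
W = L @ S                         # w^(t) = L s^(t), for all T tasks at once
Y = X @ W                         # y^(t) = X^(t) w^(t): scores for all tasks
print(Y.shape)                    # (5, 38): n samples x T task scores
```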