The document summarizes a research paper on facial landmark detection using deep multi-task learning. It proposes a Tasks-Constrained Deep Convolutional Network (TCDCN) that uses facial landmark detection as the main task and related auxiliary tasks like pose estimation and attribute inference to improve performance. The TCDCN learns shared representations across tasks using a deep convolutional network. It introduces task-wise early stopping to halt learning on auxiliary tasks that reach optimal performance early to avoid overfitting and improve convergence on the main task of landmark detection. Experimental results showed the proposed approach outperformed existing methods.
2. Contents
¤ Paper Information
¤ Introduction
¤ Related work
¤ Tasks-Constrained Deep Convolutional Network
¤ Experiment
¤ Conclusion
3. Paper Information
Title : Facial Landmark Detection by Deep Multi-task Learning (2014)
Authors : Zhanpeng Zhang, Ping Luo, Chen Change Loy, and Xiaoou Tang
¤ The Chinese University of Hong Kong / Multimedia Laboratory
Deep Learning (CNN) + Multitask Learning
¤ Motivation
¤ I'm studying lifelong learning (online multi-task learning) with deep learning
4. Facial landmark detection
¤ Facial landmark detection is a fundamental component in many face analysis tasks
¤ facial attribute inference
¤ face verification
¤ face recognition
¤ remains a formidable challenge
¤ partial occlusion and large head pose variations
5. Approach
the authors thought that …
¤ facial landmark detection is not a standalone problem
¤ its estimation can be influenced by a number of heterogeneous and subtly correlated factors
Main task
Auxiliary task
Multitask learning
6. Contribution
They propose a Tasks-Constrained Deep Convolutional Network (TCDCN)
¤ the first attempt to investigate how facial landmark detection can be optimized together with heterogeneous but subtly correlated tasks
¤ show that …
¤ the representations learned from related tasks facilitate the learning of the main task
¤ task relatedness is captured implicitly by the proposed model
¤ the proposed approach outperforms the existing methods
¤ demonstrate the effectiveness of using five-landmark estimation as robust initialization for improving a state-of-the-art face alignment method
7. Contents
¤ Paper Information
¤ Introduction
¤ Related work
¤ Tasks-Constrained Deep Convolutional Network
¤ Experiment
¤ Conclusion
9. Landmark detection by CNN
cascaded CNN [Sun et al. 2013]
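¤ The face is split into several parts in advance, a CNN estimates the landmarks for each part, and the estimates are averaged to give the final output
¤ Reading the original paper, it appears that CNNs are applied stage by stage
¤ The work closest to this study (from the same laboratory as the present authors)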
Figure 2: Three-level cascaded convolutional networks. The input is the face region returned by a face detector. The three
networks at level 1 are denoted as F1, EN1, and NM1. Networks at level 2 are denoted as LE21, LE22, RE21, RE22, N21,
N22, LM21, LM22, RM21, and RM22. Both LE21 and LE22 predict the left eye center, and so forth. Networks at level 3
10. Multi-task learning
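¤ Deep learning and multi-task learning go well together
¤ Features learned for one task can also be used for other tasks
¤ Conventional multi-task learning assumes that every task has the same difficulty and convergence rate
¤ In this problem the tasks are not equal, so conventional multi-task learning cannot be applied as-is
This work sets early stopping for each task separately
(inspired by [Caruana et al. 1997])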
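11. Problem Formulation
Conventional multi-task learning (MTL) seeks to improve the generalization performance of multiple related tasks by learning them jointly. Suppose we have T tasks and the training data for the t-th task are denoted as $(\mathbf{x}_i^t, y_i^t)$, where $t = \{1, \dots, T\}$, $i = \{1, \dots, N\}$, with $\mathbf{x}_i^t \in \mathbb{R}^d$ and $y_i^t \in \mathbb{R}$ being the feature vector and label, respectively. The goal of the MTL is to minimize
$$\operatorname*{argmin}_{\{\mathbf{w}^t\}_{t=1}^{T}} \sum_{t=1}^{T} \sum_{i=1}^{N} \ell\big(y_i^t, f(\mathbf{x}_i^t; \mathbf{w}^t)\big) + \Phi(\mathbf{w}^t), \qquad (1)$$
where $f(\mathbf{x}^t; \mathbf{w}^t)$ is a function of $\mathbf{x}^t$ parameterized by a weight vector $\mathbf{w}^t$.
¤ Conventional multi-task learning raises generalization performance by learning several related tasks at the same time
¤ Annotations on Eq.(1): task $t$, training examples $i = 1, \dots, N$, loss function $\ell$, regularization term $\Phi$, label $y$, feature $\mathbf{x}$, weight $\mathbf{w}$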
12. Proposed Formulation
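¤ Multi-task learning in this work
Characteristics
¤ Two different loss functions can be optimized at the same time (regression together with classification is possible)
¤ The feature x is shared across tasks instead of being task-dependent
The loss function is denoted by $\ell(\cdot)$. A typical choice is the least square for regression and the hinge loss for classification. The $\Phi(\mathbf{w}^t)$ is the regularization term that penalizes the complexity of weights.
In contrast to conventional MTL that maximizes the performance of all tasks, our aim is to optimize the main task $r$, which is facial landmark detection, with the assistance of an arbitrary number of related/auxiliary tasks $a \in A$. Examples of related tasks include facial pose estimation and attribute inference. To this end, our problem can be formulated as
$$\operatorname*{argmin}_{\mathbf{W}^r, \{\mathbf{W}^a\}_{a \in A}} \sum_{i=1}^{N} \ell^r\big(\mathbf{y}_i^r, f(\mathbf{x}_i; \mathbf{W}^r)\big) + \sum_{i=1}^{N} \sum_{a \in A} \lambda^a \ell^a\big(y_i^a, f(\mathbf{x}_i; \mathbf{W}^a)\big) \qquad (2)$$
Footnote 1 of the paper: In this paper, scalar, vector, and matrix are denoted by lowercase, bold lowercase and bold capital letter, respectively.
¤ The first sum is the main task; the second sum covers the auxiliary tasks
¤ $\lambda^a$ is the importance of the a-th auxiliary task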
13. Proposed Formulation
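¤ The main task is a regression problem and the auxiliary tasks are classification problems, so their loss functions are the squared error and the cross-entropy error, respectively
¤ The shared image features are learned by a deep convolutional network
The two expressions (main task and auxiliary tasks) are combined and learned jointly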
… can be combined, while existing methods [30] that employ Eq.(1) assume implicitly that the loss functions across all tasks are identical. Second, Eq.(1) allows data $\mathbf{x}_i^t$ in different tasks to have different input representations, while Eq.(2) focuses on a shared input representation $\mathbf{x}_i$. The latter is more suitable for our problem, since all tasks share similar facial representation.
In the following, we formulate our facial landmark detection model based on Eq.(2). Suppose we have a set of feature vectors in a shared feature space across tasks $\{\mathbf{x}_i\}_{i=1}^{N}$ and their corresponding labels $\{\mathbf{y}_i^r, y_i^p, y_i^g, y_i^w, y_i^s\}_{i=1}^{N}$, where $\mathbf{y}_i^r$ is the target of landmark detection and the remaining are the targets of auxiliary tasks, including inferences of 'pose', 'gender', 'wear glasses', and 'smiling'. More specifically, $\mathbf{y}_i^r \in \mathbb{R}^{10}$ is the 2D coordinates of the five landmarks (centers of the eyes, nose, corners of the mouth), $y_i^p \in \{0, 1, \dots, 4\}$ indicates five different poses ($0^\circ, \pm 30^\circ, \pm 60^\circ$), and $y_i^g, y_i^w, y_i^s \in \{0, 1\}$ are binary attributes. It is reasonable to employ the least square and cross-entropy as the loss functions for the main task (regression) and the auxiliary tasks (classification), respectively. Therefore, the objective function can be rewritten as
$$\operatorname*{argmin}_{\mathbf{W}^r, \{\mathbf{W}^a\}} \frac{1}{2} \sum_{i=1}^{N} \big\| \mathbf{y}_i^r - f(\mathbf{x}_i; \mathbf{W}^r) \big\|^2 - \sum_{i=1}^{N} \sum_{a \in A} \lambda^a y_i^a \log\big(p(y_i^a \mid \mathbf{x}_i; \mathbf{W}^a)\big) + \sum_{t=1}^{T} \|\mathbf{W}\|_2^2, \qquad (3)$$
where $f(\mathbf{x}_i; \mathbf{W}^r) = (\mathbf{W}^r)^T \mathbf{x}_i$ in the first term is a linear function. The second term is a softmax function $p(y_i = m \mid \mathbf{x}_i) = \frac{\exp\{(\mathbf{W}_m^a)^T \mathbf{x}_i\}}{\sum_j \exp\{(\mathbf{W}_j^a)^T \mathbf{x}_i\}}$, which models the class posterior probability ($\mathbf{W}_j^a$ denotes the jth column of the matrix), and the third term penalizes large weights ($\mathbf{W} = \{\mathbf{W}^r, \{\mathbf{W}^a\}\}$). In this work, we adopt the deep convolutional network (DCN) to jointly learn the shared feature space $\mathbf{x}$, since the unique structure of DCN allows for multitask and shared representation.
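The three terms of Eq.(3) can be written out as a short NumPy sketch. This is only an illustration of the formula on a batch of already-computed shared features, not the authors' code: the function names, the dictionary layout for the auxiliary tasks, and the `weight_decay` multiplier are assumptions made here (Eq.(3) writes the penalty without an explicit coefficient).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def tcdcn_objective(x, y_main, W_main, aux_labels, aux_weights, aux_lambdas,
                    weight_decay=0.0):
    """Evaluate the three terms of Eq.(3) on a batch of shared features.

    x           : (N, d) shared features x_i
    y_main      : (N, 10) landmark coordinates y_i^r
    W_main      : (d, 10) regression weights W^r
    aux_labels  : dict name -> (N,) integer class labels y_i^a
    aux_weights : dict name -> (d, C_a) classifier weights W^a
    aux_lambdas : dict name -> importance coefficient lambda^a
    weight_decay: multiplier on the squared-norm penalty (hypothetical parameter)
    """
    # First term: least-squares loss of the main task.
    residual = y_main - x @ W_main
    loss = 0.5 * np.sum(residual ** 2)

    # Second term: weighted cross-entropy of every auxiliary task.
    for name, labels in aux_labels.items():
        probs = softmax(x @ aux_weights[name])
        picked = probs[np.arange(len(labels)), labels]
        loss += -aux_lambdas[name] * np.sum(np.log(picked))

    # Third term: squared L2 norm of all task weights.
    penalty = np.sum(W_main ** 2) + sum(np.sum(W ** 2) for W in aux_weights.values())
    return loss + weight_decay * penalty
```

Because the regression and classification losses are simply added, each auxiliary task can be switched off by dropping its entry from the dictionaries, which is what task-wise early stopping relies on.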
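In particular, given a face image $\mathbf{x}^0$, the DCN projects it to higher level representation gradually by learning a sequence of non-linear mappings
$$\mathbf{x}^0 \xrightarrow{\sigma((\mathbf{W}^{s_1})^T \mathbf{x}^0)} \mathbf{x}^1 \xrightarrow{\sigma((\mathbf{W}^{s_2})^T \mathbf{x}^1)} \cdots \xrightarrow{\sigma((\mathbf{W}^{s_l})^T \mathbf{x}^{l-1})} \mathbf{x}^l. \qquad (4)$$
Here, $\sigma(\cdot)$ and $\mathbf{W}^{s_l}$ indicate the non-linear activation function and the filters needed to be learned in the layer $l$ of DCN. For instance, $\mathbf{x}^l = \sigma\big((\mathbf{W}^{s_l})^T \mathbf{x}^{l-1}\big)$. Note that $\mathbf{x}^l$ is the shared representation between the main task $r$, and related …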
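The chain of mappings in Eq.(4) can be sketched in a few lines. The sketch below treats every layer as a plain matrix product followed by an activation, which is a simplification: the network in the paper uses convolutional filters, and the `forward` name and the `tanh` default are assumptions made for illustration.

```python
import numpy as np

def forward(x0, filters, activation=np.tanh):
    """Apply x^l = sigma((W^{s_l})^T x^{l-1}) layer by layer and keep every x^l."""
    activations = [x0]
    for W in filters:
        activations.append(activation(W.T @ activations[-1]))
    return activations
```

The last element of the returned list plays the role of the shared representation that all task heads read from.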
15. Task-wise early stopping
¤ In multi-task learning, different tasks differ in difficulty and convergence rate
¤ The auxiliary tasks look easier than the main task, so they are likely to converge sooner
¤ If multi-task learning continues after an auxiliary task has already reached its optimum, that task overfits and harms the main task
→ task-wise early stopping, which halts learning task by task
¤ A criterion for stopping a task automatically
… of the training process, the TCDCN is constrained by all tasks to avoid being trapped at a bad local minima. As training proceeds, certain auxiliary tasks are no longer beneficial to the main task after they reach their peak performance; their learning process thus should be halted. Note that the regularization offered by early stopping is different from weight regularization in Eq.(3). The latter globally helps to prevent over-fitting in each task through penalizing certain parameter configurations. In Section 4.2, we show that task-wise early stopping is critical for multi-task learning convergence even with weight regularization.
Now we introduce a criterion to automatically determine when to stop learning an auxiliary task. Let $E_{val}^a$ and $E_{tr}^a$ be the values of the loss function of task $a$ on the validation set and training set, respectively. We stop the task if its measure exceeds a threshold $\epsilon$ as below
$$\frac{k \cdot \operatorname{med}_{j=t-k}^{t} E_{tr}^a(j)}{\sum_{j=t-k}^{t} E_{tr}^a(j) - k \cdot \operatorname{med}_{j=t-k}^{t} E_{tr}^a(j)} \cdot \frac{E_{val}^a(t) - \min_{j=1..t} E_{tr}^a(j)}{\lambda^a \cdot \min_{j=1..t} E_{tr}^a(j)} > \epsilon, \qquad (5)$$
where $t$ denotes the current iteration and $k$ controls a training strip of length $k$. The 'med' denotes the function for calculating median value. The first term in Eq.(5) represents the tendency of the training error. If the training error drops rapidly within a period of length $k$, the value of the first term is small, indicating that training can be continued as the task is still valuable; otherwise, …
¤ $\epsilon$: threshold
¤ First factor: tendency of the training error. If the training error drops sharply within the strip of length k, the value becomes small → the task is not stopped
¤ Second factor: generalization error, measured relative to the training error. When the gap between generalization error and training error grows → the task is stopped
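The stopping rule of Eq.(5) translates almost directly into code. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the argument layout, and the handling of a non-positive denominator (which Eq.(5) does not address) are choices made here for illustration.

```python
import numpy as np

def should_stop(train_err, val_err, k, lam, eps):
    """Return True when the Eq.(5) measure of one auxiliary task exceeds eps.

    train_err : training losses E_tr(1..t) of the task, one value per iteration
    val_err   : validation losses E_val(1..t) of the task
    k         : training-strip length
    lam       : importance coefficient lambda^a of the task
    eps       : stopping threshold
    """
    train_err = np.asarray(train_err, dtype=float)
    if len(train_err) <= k:
        return False                      # not enough history for a full strip yet

    strip = train_err[-(k + 1):]          # E_tr(j) for j = t-k .. t
    k_med = k * np.median(strip)
    denom = strip.sum() - k_med           # small when the error has stopped falling

    best_train = train_err.min()
    gap = (val_err[-1] - best_train) / (lam * best_train)

    if denom <= 0:
        # Assumption of this sketch: a non-positive denominator means the
        # training error is no longer decreasing, so only the gap decides.
        return gap > 0
    return (k_med / denom) * gap > eps
```

For example, a training loss that is still falling with a validation loss close behind gives a small measure and the task keeps training, while a flat training loss with a growing validation loss pushes the measure past the threshold.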
16. Learning procedure
¤ Solved by gradient descent
… otherwise, the first term of Eq.(5) is large, then the task is more likely to be stopped. The second term measures the generalization error compared to the training error. The $\lambda^a$ is the importance coefficient of a-th task's error, which can be learned through gradient descent. Its magnitude reveals that more important task tends to have longer impact. This strategy achieves satisfactory results for learning deep convolution network given multiple tasks. Its superior performance is demonstrated in Section 4.2.
Learning procedure: We have discussed when and how to switch off an auxiliary task during training before it over-fits. For each iteration, we perform stochastic gradient descent to update the weights of the tasks and filters of the network. For example, the weight matrix of the main task is updated by $\Delta \mathbf{W}^r = -\eta \frac{\partial E^r}{\partial \mathbf{W}^r}$ with $\eta$ being the learning rate ($\eta = 0.003$ in our implementation), and $\frac{\partial E^r}{\partial \mathbf{W}^r} = (\mathbf{y}_i^r - (\mathbf{W}^r)^T \mathbf{x}_i)\mathbf{x}_i^T$. Also, the derivative of the auxiliary task's weights can be calculated in a similar manner as $\frac{\partial E^a}{\partial \mathbf{W}^a} = (p(y_i^a \mid \mathbf{x}_i; \mathbf{W}^a) - y_i^a)\mathbf{x}_i$.
For the filters in the lower layer, we compute the gradients by propagating the loss error back following the back-propagation strategy as
$$\varepsilon^1 \xleftarrow{(\mathbf{W}^{s_2})^T \varepsilon^2 \frac{\partial \sigma(u^1)}{\partial u^1}} \varepsilon^2 \xleftarrow{(\mathbf{W}^{s_3})^T \varepsilon^3 \frac{\partial \sigma(u^2)}{\partial u^2}} \cdots \xleftarrow{(\mathbf{W}^{s_l})^T \varepsilon^l \frac{\partial \sigma(u^{l-1})}{\partial u^{l-1}}} \varepsilon^l, \qquad (6)$$
where $\varepsilon^l$ is the error at the shared representation layer and $\varepsilon^l = (\mathbf{W}^r)^T [\mathbf{y}_i^r - (\mathbf{W}^r)^T \mathbf{x}_i] + \sum_{a \in A} (p(y_i^a \mid \mathbf{x}_i; \mathbf{W}^a) - y_i^a)\mathbf{W}^a$, which is the integration of all tasks' derivatives. The errors of the lower layers are computed following Eq.(6). For instance, $\varepsilon^{l-1} = (\mathbf{W}^{s_l})^T \varepsilon^l \frac{\partial \sigma(u^{l-1})}{\partial u^{l-1}}$, where $\frac{\partial \sigma(u)}{\partial u}$ is the gradient of the activation function. Then, the gradient of the filter is obtained by $\frac{\partial E}{\partial \mathbf{W}^{s_l}} = \varepsilon^l \mathbf{x}_{\Omega}^{l-1}$, where $\Omega$ represents the receptive field of the filter.
¤ Main task: updated with $\partial E^r / \partial \mathbf{W}^r$
¤ Auxiliary tasks: updated with $\partial E^a / \partial \mathbf{W}^a$
¤ Filters in the lower layers: backpropagation, Eq.(6)
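The update of the task weights can be sketched as follows for a single training sample. This is an illustrative sketch, not the authors' code: the gradients are derived here in the standard way from the squared-error and cross-entropy losses of Eq.(3), so their signs follow that derivation. The `sgd_step` name, the dictionary layout of the auxiliary tasks, and the `active` flag used to model task-wise early stopping are assumptions.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D logit vector, shifted for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_step(x, y_main, W_main, aux, lr=0.003):
    """One SGD update of the task weights for a single sample.

    x      : (d,) shared representation x_i
    y_main : (10,) landmark target
    W_main : (d, 10) main-task weights, updated in place
    aux    : list of dicts with keys 'W' (d, C), 'label' (int), 'lam' (float),
             'active' (bool, False once early stopping has halted the task)
    Returns the error at the shared layer, to be back-propagated further.
    """
    # Main task: gradient of 0.5 * ||y - W^T x||^2 with respect to W and x.
    diff = W_main.T @ x - y_main              # (10,)
    grad_main = np.outer(x, diff)             # (d, 10)
    err_shared = W_main @ diff                # (d,)

    for task in aux:
        if not task["active"]:
            continue                          # halted tasks no longer contribute
        p = softmax(task["W"].T @ x)
        p[task["label"]] -= 1.0               # p - one_hot(y)
        err_shared += task["lam"] * (task["W"] @ p)
        task["W"] -= lr * task["lam"] * np.outer(x, p)

    W_main -= lr * grad_main
    return err_shared
```

The returned vector is the quantity that Eq.(6) would propagate to the lower layers; a halted auxiliary task stops influencing it, which is the effect that task-wise early stopping is meant to have on the shared filters.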
17. Experiments
¤ Network Structure
¤ Model training
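¤ Training dataset: 10,000 outdoor face images from the web
¤ Collected without much concern for translation, rotation, or zoom
¤ Test data: AFLW and AFW
¤ Evaluation metrics
¤ Mean error
¤ Compute the distance between the ground-truth and estimated landmarks, normalized by the inter-ocular distance
¤ Failure rate
¤ An error above 10% counts as a failure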
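The two evaluation metrics above can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function names are hypothetical, landmarks are assumed to be `(x, y)` pixel tuples, and counting failures per landmark (rather than per face) is an assumption.

```python
import math


def normalized_errors(pred, truth, left_eye, right_eye):
    """Per-landmark error: Euclidean distance divided by the inter-ocular distance."""
    inter_ocular = math.dist(left_eye, right_eye)
    return [math.dist(p, t) / inter_ocular for p, t in zip(pred, truth)]


def mean_error_and_failure_rate(all_errors, threshold=0.10):
    """Mean normalized error and the fraction of errors above the threshold."""
    flat = [e for errors in all_errors for e in errors]
    mean_error = sum(flat) / len(flat)
    failure_rate = sum(e > threshold for e in flat) / len(flat)
    return mean_error, failure_rate


# Toy face with two landmarks (made-up coordinates); the eyes are 40 px apart.
truth = [(30.0, 40.0), (70.0, 40.0)]
pred = [(30.0, 42.0), (76.0, 40.0)]
errors = normalized_errors(pred, truth, truth[0], truth[1])
print(mean_error_and_failure_rate([errors]))
```

Normalizing by the eye distance makes the error independent of face size in the image, which is why a fixed 10% threshold can be applied across the whole test set.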
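18. The Effectiveness of Learning with Related Task
¤ Evaluated on AFLW
¤ Left: the error rate for each landmark; right: the failure rate over all landmarks
¤ The auxiliary tasks clearly reduce both the error rate and the failure rate
¤ Using all auxiliary tasks improves the failure rate by as much as 10%
¤ Pose appears to help the most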
Facial Landmark Detection by Deep Multi-task Learning
Fig. 4. Comparison of different model variants of TCDCN: the mean error over different landmarks (left eye, right eye, nose, left mouth corner, right mouth corner), and the overall failure rate.
Overall failure rate (%), in legend order: FLD 35.62, FLD+gender 31.86, FLD+glasses 32.87, FLD+smile 32.37, FLD+pose 28.76, FLD+all 25.00.
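To make the size of the improvement concrete, the failure rates printed in Fig. 4 can be compared against the FLD-only baseline. The sketch below assumes that the six values pair with the six variants in legend order; the function name `failure_rate_reductions` is hypothetical.

```python
def failure_rate_reductions(rates, baseline="FLD"):
    """Return {variant: (absolute_points, relative_percent)} against the baseline."""
    base = rates[baseline]
    return {
        name: (base - rate, 100.0 * (base - rate) / base)
        for name, rate in rates.items()
    }


# Values read off Fig. 4; pairing them with variants in legend order is an assumption.
FIG4_FAILURE_RATE = {
    "FLD": 35.62,
    "FLD+gender": 31.86,
    "FLD+glasses": 32.87,
    "FLD+smile": 32.37,
    "FLD+pose": 28.76,
    "FLD+all": 25.00,
}

for name, (points, percent) in failure_rate_reductions(FIG4_FAILURE_RATE).items():
    print(f"{name:12s} -{points:5.2f} points ({percent:4.1f}% relative)")
```

Under that pairing, FLD+all lowers the failure rate by about 10.6 percentage points, which is roughly a 30% relative reduction, and pose is the single auxiliary task with the largest effect.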
4.1 Evaluating the Effectiveness of Learning with Related Task
To examine the influence of related tasks, we evaluate five variants of the proposed model. In particular, the first variant is trained only on facial landmark detection. We train another four model variants on facial landmark detection along with the auxiliary task of recognizing ‘pose’, ‘gender’, ‘wearing glasses’,
19. FLD vs. FLD + smile
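Examines which landmarks benefit from the smile task
(a): Effective for the nose and mouth
¤ Because smiling concerns the lower half of the face
(b): Pearson correlation coefficient of the last-layer weights
¤ Strong correlation with the mouth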
Z. Zhang, P. Luo, C. C. Loy, and X. Tang
Fig. 5. FLD vs. FLD+smile. (a) Mean error (%) of FLD and FLD+smile on each landmark (left eye, right eye, nose, left mouth corner, right mouth corner). (b) Learned landmark-detection weights’ correlation with the weights of the ‘smiling’ task (the values shown range from 0.11 to 0.40). The smiling attribute helps detection more on the nose and corners of mouth, than the centers of eyes, since ‘smiling’ mainly affects the lower part of a face.
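The Pearson correlation used to compare each landmark's weight vector with the weights of the smiling task can be computed as below. This is a generic sketch of the statistic; the toy vectors are made up and are not the learned weights.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


# Hypothetical weight vectors for one landmark regressor and the smiling classifier.
w_mouth_corner = [0.8, -0.2, 0.5, 0.1]
w_smiling = [0.6, -0.1, 0.7, 0.0]
print(pearson(w_mouth_corner, w_smiling))
```

A value near 1 means the two tasks rely on the shared features in a similar way, which is the sense in which the mouth corners are reported to correlate most with smiling.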
20. FLD vs. FLD + pose
Examines the effect of pose
(a): The error rate drops for every pose
(b): Measured by accuracy improvement as well, every pose gets better
Fig. 6. FLD vs. FLD+pose. (a) Mean error in different poses, and (b) Accuracy improvement by the FLD+pose in different poses. The poses shown are left profile, left, frontal, right, and right profile.
weight vectors, which are learned to predict the positions of the mouth’s corners, have high correlation with the weights of ‘smiling’ inference. This demonstrates that TCDCN implicitly learns the relationship between tasks.
FLD vs. FLD+pose: As observed in Figure 6(a), detection errors of FLD
21. The Benefits of Task-wise Early Stopping
(a): Task-wise early stopping reduces the error considerably
(b): Both the training error and the generalization error become smaller with early stopping
Fig. 7. (a) Task-wise early stopping leads to substantially lower [...] different landmarks. (b) Its benefit is also reflected on the training [...] convergence rate. The error is measured in L2-norm with respect [...] of the 10 coordinates values (normalized to [0,1]) for the 5 landmarks [...]
Panel (a) compares FLD+all and FLD+all with task-wise early-stopping on each landmark (mean error, %). Panel (b) annotations: stop ‘glasses’, stop ‘gender’, stop ‘smile’, stop ‘pose’.
4.3 Comparison with the Cascaded CNN [21]
Although both the TCDCN and the cascaded CNN [21] a[...] we show that the proposed model can achieve better detect[...] significantly lower computational cost. We use the full mo[...] the publicly available binary code of the cascaded CNN in t[...]
Landmark localization accuracy: Similar to Section 4.1
22. Comparison with the Cascaded CNN
¤ Tested on AFLW with the same training data
¤ The only difference is whether multi-task learning is used
¤ Outperforms the cascaded CNN on four landmarks
¤ Overall, it beats the cascaded CNN
that we use the same 10,000 training faces as in the cascaded CNN method. Thus the only difference is that we exploit a multi-task learning approach. It is observed from Figure 8 that our method performs better in four out of five landmarks, and the overall accuracy is superior to that of cascaded CNN.
Fig. 8. The proposed TCDCN vs. cascaded CNN [21]: (a) mean error over different landmarks and (b) the overall failure rate.
Computational efficiency: Suppose the computation time of a 2D-convolution operation is $\tau$, the total time cost for a CNN with $L$ layers can be approximated by $\sum_{l=1}^{L} s_l^2 q_l q_{l-1} \tau$, where $s^2$ is the 2D size of the input feature map for the $l$-th layer, and $q$ is the number of filters. The algorithm complexity of a CNN is thus $O(s^2 q^2)$, directly related to the input image size and number of filters. Note that
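The cost approximation above is easy to evaluate numerically. The sketch below sums $s_l^2 q_l q_{l-1} \tau$ over the layers; the function name `cnn_time_cost`, the layer sizes, and the filter counts are hypothetical values chosen only for illustration, and the number of input channels stands in for $q_0$.

```python
def cnn_time_cost(feature_map_sides, filter_counts, input_channels=1, tau=1.0):
    """Approximate cost: sum over layers of s_l^2 * q_l * q_{l-1} * tau.

    feature_map_sides[l] is the side length s_l of the input feature map of
    layer l, and filter_counts[l] is the number of filters q_l in that layer.
    """
    total = 0.0
    q_prev = input_channels  # plays the role of q_0
    for s, q in zip(feature_map_sides, filter_counts):
        total += (s ** 2) * q * q_prev * tau
        q_prev = q
    return total


# Hypothetical four-layer network on a 40x40 grayscale input.
sides = [40, 18, 8, 3]
filters = [16, 48, 64, 64]
print(cnn_time_cost(sides, filters))                   # cost in units of tau
print(cnn_time_cost([2 * s for s in sides], filters))  # doubling every side
```

Doubling every feature-map side multiplies each term by four, which is the $s^2$ dependence in $O(s^2 q^2)$. Running several networks in a cascade adds their costs, whereas a single network pays this sum once.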
23. Comparison with other State-of-the-art Methods
¤ Results on AFLW
¤ Outperforms all of the existing methods
¤ Results on AFW
¤ Same as on AFLW
Fig. 9. Comparison with RCPR [3], TSPM [32], CDM [27], Luxand [18], and SDM [25] on AFLW [11] (the first row) and AFW [32] (the second row) datasets. The left subfigures show the mean errors on different landmarks, while the right subfigures show the overall errors.
Overall mean error (%), in legend order (TSPM, ESR, CDM, Luxand, RCPR, SDM, Ours):
AFLW: 15.9, 13.0, 13.1, 12.4, 11.6, 8.5, 8.0
AFW: 14.3, 12.2, 11.1, 10.4, 9.3, 8.8, 8.2
multiple CNNs in different cascaded layers (23 CNNs in its implementation). Hence, TCDCN has much lower computational cost. The cascaded CNN requires
24. Comparison with other State-of-the-art Methods
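Results on a variety of images
¤ Row 1: wearing glasses
¤ Row 2: pose variations
¤ Row 3:
¤ Columns 1 and 2: different lighting
¤ Column 3: poor image quality
¤ Columns 4 and 5: different expressions
¤ Columns 6 to 8: failure cases (red marks the incorrect parts)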
Fig. 10. Example detections by the proposed model on AFLW [11] and AFW [32] images. The labels below each image denote the tagging results for the related tasks: (0°, ±30°, ±60°) for pose; S/NS = smiling/not-smiling; G/NG = with-glasses/without-glasses; M/F = male/female. Red rectangles indicate wrong tagging.
4.5 TCDCN for Robust Initialization
This section shows that the TCDCN can be used to generate a good initialization to improve the state-of-the-art method, owing to its accuracy and efficiency. We take RCPR [3] as an example. Instead of drawing training samples randomly as initialization as was done in [3], we initialize RCPR by first applying TCDCN on the
25. TCDCN for Robust Initialization
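¤ TCDCN can also be used as a way to obtain a good initialization
¤ For the existing method RCPR, results with and without TCDCN initialization are compared
(a): Relative improvement (reduced error / original error)
(b): Visualization of the improvement (top: standard RCPR; bottom: RCPR initialized with TCDCN)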
Fig. 11. Initialization with our five-landmark estimation for RCPR [3] on COFW dataset [3]. (a) shows the relative improvement on each landmark ($\text{relative improvement} = \frac{\text{reduced error}}{\text{original error}}$), plotted in % for landmarks 1 to 29. (b) visualizes the improvement. The upper row depicts the results of RCPR [3], while the lower row shows the improved results by our initialization.
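The relative improvement plotted in panel (a) is a simple ratio. The sketch below reads "reduced error" as the amount by which the error drops (original minus new), which is an interpretation of the caption rather than something the excerpt spells out; the numbers are made up.

```python
def relative_improvement(original_error, new_error):
    """Error reduction as a fraction of the original error."""
    return (original_error - new_error) / original_error


# Hypothetical per-landmark errors before and after TCDCN initialization.
original = [0.10, 0.08, 0.12]
improved = [0.08, 0.07, 0.09]
for i, (before, after) in enumerate(zip(original, improved), start=1):
    print(f"landmark {i}: {100 * relative_improvement(before, after):.1f}%")
```

With this definition a value of 0% means no change, and larger values mean the initialization removed a larger share of the original error for that landmark.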
heterogeneous but subtly correlated tasks, such as appearance attribute, expression, demographic, and head pose. The proposed Tasks-Constrained DCN allows errors of related tasks to be back-propagated in deep hidden layers for constructing a shared representation to be relevant to the main task. We have shown that