2. What is multi-task learning?
Auxiliary tasks
Examples
Why does MTL work?
Intuition needed for MTL
3. STL – single-task learning
• Optimize a single task
Minimize the loss according to this task only
4. Simple thought experiment
Given 50K female and 50K male medical records:
Should you train separate models for each gender?
Should you use gender as an additional input feature?
6. Simple thought experiment
• We don't know whether to train separate models or a combined one
• Let the neural network weights make the decision for us
• Common features can be learned in the shared hidden layer
• A feature that develops for one task can be shared with another
• Weights for features a task does not use can stay low, so loosely coupled tasks can effectively decouple
7. MTL – multi-task learning
• Optimize several related tasks
Minimize the loss according to several related tasks
• Learn related tasks in parallel
Use shared representations
Leverage information from other tasks
9. MTL – multi-task learning
STL: $\min_{w}\ \frac{1}{m}\sum_{i=1}^{m} L\left(f_w(x^i),\, Y^i\right) + \lambda R(w)$
MTL: $\min_{w_1,\dots,w_4}\ \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{4} L\left(f_{w_j}(x^i),\, Y_j^i\right) + \lambda R(w_j)$
L – loss function, such as the hinge loss or square loss
R – regularization function, such as L2 or L1
(A code sketch of this objective follows below.)
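A minimal PyTorch sketch of the MTL objective above, assuming four regression tasks that share one hidden layer; the layer sizes, the square loss, and using weight decay for the λR(w) term are illustrative assumptions, not from the slides.

```python
import torch
import torch.nn as nn

n_tasks = 4  # matches the j = 1..4 sum above

class HardSharingMTL(nn.Module):
    def __init__(self, in_dim=16, hidden=64):
        super().__init__()
        # Shared hidden layer: common features develop here once
        # and are reused by every task head.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # One small head (the per-task weights w_j) per task.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.shared(x)
        return [head(h) for head in self.heads]

model = HardSharingMTL()
loss_fn = nn.MSELoss()  # L: the square loss from the slide
# weight_decay plays the role of the lambda * R(w) term with R = L2.
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x = torch.randn(32, 16)                            # dummy batch (m = 32)
ys = [torch.randn(32, 1) for _ in range(n_tasks)]  # one label set Y_j per task

preds = model(x)
loss = sum(loss_fn(p, y) for p, y in zip(preds, ys))  # sum over tasks j
opt.zero_grad()
loss.backward()  # gradients from all tasks sum at the shared layer
opt.step()
```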
11. Disadvantages of individually learned tasks
• To perform several tasks, you must train several times
• More resources are needed for several different networks*
• No information learned from one task can be transferred to another
12. Advantages of multi-task learning
• Get more samples from the other tasks' training sets
• Decompose a complex task (hard to codify) into several simpler tasks
• The model generalizes better (it is not over-optimized for one specific task)
13. What if I care about only one task?
Surprisingly, most real-world problems can still reap the benefits of MTL by using auxiliary tasks.
14. Auxiliary tasks – learn from hints
• Predict features as auxiliary tasks
Instead of directly recognizing complex objects like cars or pedestrians, also train on edges, shapes, regions, textures, text, orientation, distance, shadows, and reflections
15. Auxiliary tasks – learn from instances
To predict the sentiment of a sentence, add an auxiliary task that predicts whether the sentence contains a positive or negative sentiment word, as sketched below.
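A minimal sketch of how such an auxiliary label could be derived for free from the training sentences; the tiny lexicon and helper name are assumptions for illustration.

```python
# Toy sentiment lexicon (an assumption; in practice use a real one).
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "hate"}

def has_sentiment_word(sentence: str) -> int:
    """Auxiliary label: 1 if the sentence contains any lexicon word."""
    words = set(sentence.lower().split())
    return int(bool(words & (POSITIVE | NEGATIVE)))

# Main task label: sentence-level sentiment, from the dataset.
# Auxiliary label: derived automatically, giving the shared encoder
# an extra, easier training signal.
print(has_sentiment_word("I love this movie"))  # -> 1
```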
16. Auxiliary tasks – focusing attention
• Use the auxiliary task to focus attention on parts of the image that might otherwise be ignored.
• For example, lane markings might be ignored because they do not always appear and are relatively small. If we force the model to learn them, they can serve the main task.
17. Auxiliary tasks – quantization smoothing
• Train an auxiliary task at a different quantization level
• If a less quantized, or continuous, version of the problem exists, the smoother problem can be easier to learn
• Example: for distance estimation, instead of only the labels {close, far}, also learn the real distance (see the sketch below)
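A sketch of this idea for the distance example: the coarse {close, far} label is derived from the continuous distance, and both targets train a shared trunk. The threshold, layer sizes, and loss choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

CLOSE_THRESHOLD = 10.0  # meters; an assumed cutoff for "close"

distance = torch.rand(32, 1) * 50.0                      # continuous ground truth
coarse = (distance > CLOSE_THRESHOLD).long().squeeze(1)  # 0 = close, 1 = far

trunk = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
cls_head = nn.Linear(32, 2)  # main task: {close, far}
reg_head = nn.Linear(32, 1)  # auxiliary: smoother, continuous distance

x = torch.randn(32, 8)
h = trunk(x)
loss = nn.functional.cross_entropy(cls_head(h), coarse) \
     + nn.functional.mse_loss(reg_head(h), distance)
loss.backward()  # the smooth regression signal shapes the shared trunk
```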
18. Auxiliary tasks – use the future
• Future measurements can be used in offline learning problems.
For example, when driving, far objects are hard to identify; only after the car passes near them can they be identified accurately. Sometimes the results are available only after the test; use them to train offline.
19. Auxiliary tasks – time series prediction
• When learning a task on a short time scale, the learner may find it difficult to recognize the longer-term processes, and vice versa. Train both time scales on a single net.
20. Auxiliary tasks – the same task from different points of view
• Use different metrics as tasks, so your model learns something different from each loss
For example, minimize squared loss, log loss, rank loss, or an accuracy surrogate on the same task (see the sketch after this list)
• You can also learn the problem in several representations
For example, if the task is easier to learn in polar coordinates but the application needs Cartesian coordinates
• Sometimes it helps to learn the same task multiple times
The random weights connected to each task head let each copy learn the features in different ways
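A sketch of scoring one prediction with several loss functions at once; the particular mix of losses is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

pred = torch.randn(32, 1, requires_grad=True)  # stand-in for a model output
target = torch.randn(32, 1)

# Each loss penalizes different error patterns, so their gradients
# pull the shared representation in complementary directions.
loss = F.mse_loss(pred, target) + F.l1_loss(pred, target) \
     + F.smooth_l1_loss(pred, target)
loss.backward()
```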
29. Lesson to be learned – time series prediction
• Tasks can sometimes help or interfere with each other
• The help tasks give each other can be asymmetric
• Always try different models to find the best match for your task
30. Why MTL works
• Several mechanisms help MTL backprop nets generalize better.
• All of these mechanisms derive from summing the error gradient terms of the different tasks at the shared hidden layer.
• Each, however, exploits a different relationship between tasks.
31. Why MTL works – representation bias
• With random weight initialization, different runs can end in different local minima
• If T and T′ share a common minimum A along with other, uncommon minima, then training on both tasks makes it more likely that we end in the common minimum
• The opposite is also interesting: MTL tasks prefer NOT to use hidden-layer representations that other tasks prefer NOT to use, so if one task has a strong bias toward an uncommon minimum, the other task can end up in that uncommon minimum as well
32. Why MTL works – eavesdropping
• Suppose T′ can learn a feature F that is useful to T more easily than T can, while T on its own would learn only a complex representation of F. Once F is learned through T′, T can eavesdrop and use the simpler representation.
• In the extreme, T′ can be the feature F itself (learning from hints).
33. Why MTL works – generalization
• When learning several tasks, the risk of overfitting to a specific feature decreases
• If T and T′ use F differently (depending on the weights), the only changes allowed in F are those supported by both tasks' losses; F cannot drift in a direction that helps only one task
34. Why MTL works – feature amplification
• We want to learn a good representation of a feature without the data-dependent noise of any one task.
• Because different tasks have different noise patterns, learning several tasks with a common internal feature lets the model obtain a better representation of that feature and ignore the noise each task adds to it.
36. Which auxiliary tasks will be helpful?
• Open question
• We do not yet have a good notion of when tasks are similar or related
• Currently we rely on the assumption that an auxiliary task should be related to the main task in some way for it to be helpful
• You must test several models and find which one best fits your task
37. Loss function considerations
• Some tasks are more important than others
• Some tasks are learned much more easily
• Some tasks have more data
• Some tasks have more noise
$\min_{w}\ \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n} L\left(f_{w_j}(x^i),\, Y_j^i\right) + \lambda R(w_j)$
These considerations suggest weighting each task's term in the summed loss, as sketched below.
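A sketch of folding per-task weights α_j into the summed loss; the weight values here are illustrative assumptions that would in practice be tuned on validation data to reflect task importance, ease of learning, data volume, and noise.

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
alpha = [1.0, 0.5, 2.0, 0.25]  # assumed per-task weights, j = 1..4

preds = [torch.randn(32, 1, requires_grad=True) for _ in range(4)]  # stand-ins
targets = [torch.randn(32, 1) for _ in range(4)]

# Weighted MTL loss: sum_j alpha_j * L(f_{w_j}(x), Y_j)
total = sum(a * loss_fn(p, y) for a, p, y in zip(alpha, preds, targets))
total.backward()
```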
40. References
• Abu-Mostafa, Y. S., "Learning from Hints in Neural Networks," Journal of Complexity, 1990, 6(2), pp. 192–198.
• Caruana, R., "Multitask Learning: A Knowledge-Based Source of Inductive Bias," Proceedings of the Tenth International Conference on Machine Learning, 1993.
• Ruder, S., "An Overview of Multi-Task Learning in Deep Neural Networks."
• ICML talks:
Karpathy, A., "Multi-Task Learning in the Wilderness."
Caruana, R., "Multi-Task Learning: Tricks of the Trade."
• Coursera:
Ng, A., "Multi-task Learning."
Editor's Notes
Both E01 and W02 are affected by the same W1, B1, and B2.
Resources in memory and on the GPU (the same feature may be learned several times across different networks).
*The MTL network should be big enough to train on all tasks together (again, several tasks are learned here).
If the network is big enough, the tasks can share weights (according to Rich Caruana).
More samples: if we have multiple related tasks, each with a limited number of samples, MTL can train on the combined training sets of all the different tasks.
How would you code a minimum loss for driving? It is a hard mission if you don't separate it into subtasks. At first glance a human cannot tell from a complex picture whether it is safe to start driving; they need to examine things separately.
These four tasks are related: each task is defined using a common computed subfeature, the parity of bits 2 through 6. Moreover, on those inputs where Task 1 must compute the parity of bits 2 through 8, Task 2 does not need to compute parity, and vice versa. That is, if B1 = 0, then Task 1 = Parity(B2–B6) but Task 2 = 1 independent of the value of Parity(B2–B8). Task 3 and Task 4 are related similarly: Task 3 needs Parity(B2–B6) when B1 = 1, but Task 4 does not, etc.
We tested MTL on time sequence data in a robot domain where the goal is to predict future sensory states from the current sensed state and the planned action. For example, we were interested in predicting the sonar readings and camera image that would be sensed N meters in the future given the current sonar and camera readings, for N between 1 and 8 meters. As the robot moves, it collects a stream of sense data.
We used a backprop net with four sets of outputs. Each set predicts the sonar and camera image that will be sensed at a future distance. Output set 1 is the prediction for 1 meter, set 2 is for 2 meters, set 3 is for 4 meters, and set 4 for 8 meters. The performance of this net on each prediction distance is compared in Table 5 with separate STL nets learning to predict each distance separately. Each entry is the SSE averaged over all sense predictions. Error increases with distance, and MTL outperforms STL at all distances except 1 meter.
The robot reads sonar and camera signals and needs to predict the readings several meters into the future.
Abu-Mostafa (1990): if the set of candidate functions is significantly reduced by the constraint that they must satisfy the invariance property, the number of examples of F needed for the learning process decreases accordingly.
[Caruana, 1998] defines two tasks to be similar if they use the same features to make a decision. [Baxter, 2000] argues, only theoretically, that related tasks share a common optimal hypothesis class, i.e., have the same inductive bias. [Ben-David and Schuller, 2003] propose that two tasks are F-related if the data for both tasks can be generated from a fixed probability distribution using a set of transformations F. While this allows reasoning over tasks where different sensors collect data for the same classification problem, e.g., object recognition with data from cameras with different angles and lighting conditions, it is not applicable to tasks that do not deal with the same problem. [Xue et al., 2007] finally argue that two tasks are similar if their classification boundaries, i.e., parameter vectors, are close.
Early stopping usually monitors the validation loss and stops training when the model starts to overfit.
Now you have several tasks, each of which trains at a different rate and overfits at a different point.
Reasons: different training rates (some tasks are easier), different amounts of data, and different noise levels.
Ideally we want all tasks to stop at the same point, or at least for features needed by task A that are learned on task B to be learned before task A finishes.
You should manipulate the tasks so they stop at the same point:
oversample differently, regularize differently, or weight the tasks so they ideally overfit at the same spot (see the sketch below).
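A sketch of one way to implement this, assuming per-task validation losses are available each epoch: a task that stops improving is dropped from the summed loss, so the shared layers keep training only on tasks that still generalize. The patience scheme and variable names are assumptions, not from the talk.

```python
n_tasks = 4
patience = 3                       # assumed: epochs without improvement
best = [float("inf")] * n_tasks    # best validation loss seen per task
stale = [0] * n_tasks              # epochs since that best
active = [True] * n_tasks          # whether task j still contributes

def update_task_schedule(val_losses):
    """val_losses: this epoch's per-task validation losses."""
    for j, v in enumerate(val_losses):
        if v < best[j]:
            best[j], stale[j] = v, 0
        else:
            stale[j] += 1
            if stale[j] >= patience:
                active[j] = False  # task j has started overfitting

# In the training loop, sum only the active tasks' losses:
#   loss = sum(L_j for j in range(n_tasks) if active[j])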
Without the MTL paradigm, the compute budget does not stretch to training a different network for every task and camera.
Use the same hidden layers for all the common features, such as edges and shadows, and split into heads according to the relevant tasks.
Activate only the relevant part of your network for the current task (a minimal sketch follows below).
For example, in cut-in prediction you may want predictions over a time series,
and you only want the Main and Narrow cameras.
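A minimal sketch of task-conditional activation on a shared trunk; the head names "cut_in" and "lane_marking" are placeholders borrowed from the examples above, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

trunk = nn.Linear(16, 32)  # shared features (edges, shadows, ...)
heads = nn.ModuleDict({
    "cut_in": nn.Linear(32, 1),
    "lane_marking": nn.Linear(32, 1),
})

def forward_task(x, task: str):
    # The shared trunk always runs; only the requested head is activated.
    return heads[task](trunk(x))

out = forward_task(torch.randn(4, 16), "cut_in")
```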
Tuning one feature of the net affects the other features (in terms of loss, number of samples, and the specific parts of the net involved).