2. • By Sebastian Ruder, Insight Centre for Data Analytics, Dublin.
• Published in June 2017
• Cited by 1849
• Multitask Learning (1997) by Rich Caruana
• Caruana's thesis on multi-task learning helped create interest in a new subfield of
machine learning called transfer learning.
3. Introduction
Traditional machine learning (single task):
We typically care about optimizing for a particular metric, whether this is a score on a
certain benchmark or a business KPI. To do this, we generally train a single
model or an ensemble of models to perform our desired task, then fine-tune
and tweak these models until their performance no longer improves.
4. Multi-task learning (MTL):
is a machine learning approach in which we try to learn multiple tasks
simultaneously, optimizing multiple loss functions at once. Rather than training
independent models for each task, we allow a single model to learn to complete all
of the tasks at once.
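The definition above can be sketched in a few lines of NumPy. This is a toy illustration, not any particular architecture: one shared layer feeds two task-specific heads, and the model is trained on the sum of both task losses at once. All names and shapes here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one input batch with a separate target per task.
X = rng.normal(size=(8, 4))      # 8 samples, 4 features
y_a = rng.normal(size=(8, 1))    # regression target for task A
y_b = rng.normal(size=(8, 1))    # regression target for task B

# A single shared layer feeding two task-specific linear heads.
W_shared = rng.normal(size=(4, 3)) * 0.1
W_a = rng.normal(size=(3, 1)) * 0.1
W_b = rng.normal(size=(3, 1)) * 0.1

h = np.tanh(X @ W_shared)        # shared representation used by both tasks

loss_a = np.mean((h @ W_a - y_a) ** 2)
loss_b = np.mean((h @ W_b - y_b) ** 2)

# Multiple loss functions optimized at once: the training signal is their sum.
total_loss = loss_a + loss_b
```

Gradient steps on `total_loss` would update `W_shared` using the signal from both tasks, which is exactly what lets the shared representation benefit from each.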
6. Real-world implementation
Tesla Autopilot
Andrej Karpathy: Tesla Autopilot and Multi-Task Learning for Perception and
Prediction
21. MTL methods for Deep Learning
• Hard parameter sharing:
the hidden layers are shared between all tasks, while several task-specific
output layers are kept.
• Soft parameter sharing:
each task has its own model with its own parameters. The distance between the
parameters of the models is then regularized in order to encourage the parameters
to be similar.
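Soft parameter sharing can be sketched as an extra penalty term. The function and weight values below are illustrative, assuming a squared-L2 distance as the regularizer (other norms, such as the trace norm, are also used):

```python
import numpy as np

rng = np.random.default_rng(1)

# Soft parameter sharing: each task keeps its own weight matrix,
# but a penalty on their distance pulls the parameters toward each other.
W_task_a = rng.normal(size=(4, 3))
W_task_b = rng.normal(size=(4, 3))

def soft_sharing_penalty(w1, w2, strength=0.1):
    """Scaled squared L2 distance between the two tasks' parameters."""
    return strength * np.sum((w1 - w2) ** 2)

penalty = soft_sharing_penalty(W_task_a, W_task_b)
# total_loss = loss_a + loss_b + penalty   (per-task losses omitted here)
```

The penalty is zero only when the two weight matrices are identical, so minimizing it alongside the task losses trades off per-task fit against parameter similarity.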
25. • Implicit data augmentation
MTL effectively increases the sample size that we are using for training our model.
As all tasks are at least somewhat noisy, when training a model on some task A our
aim is to learn a good representation for task A that ideally ignores the data-
dependent noise and generalizes well. As different tasks have different noise
patterns, a model that learns two tasks simultaneously is able to learn a more
general representation. Learning just task A bears the risk of overfitting to task A,
while learning A and B jointly enables the model to obtain a better representation
by averaging out the noise patterns.
26. • Attention focusing
If a task is very noisy, or data is limited and high-dimensional, it can be difficult for a
model to differentiate between relevant and irrelevant features. MTL helps the model
focus its attention on the features that actually matter.
• Eavesdropping
Some features G are easy to learn for some task B, while being difficult to learn for
another task A. This might either be because A interacts with the features in a more
complex way or because other features are impeding the model's ability to learn G.
Through MTL, we can allow the model to eavesdrop, i.e. learn G through task B. The
easiest way to do this is through hints, i.e. directly training the model to predict the
most important features.
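The "hints" mechanism can be sketched by extending the earlier toy setup: an auxiliary head is trained to predict the features G directly, so the shared layer is steered toward representing them. Which columns of the input stand in for G, and the 0.5 weighting, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hints (Abu-Mostafa, 1990): add an auxiliary output that directly
# predicts the features G that are hard to learn from task A alone.
X = rng.normal(size=(8, 4))
y_main = rng.normal(size=(8, 1))   # main task A target
g_hint = X[:, :2]                  # pretend the first 2 input columns are G

W_shared = rng.normal(size=(4, 3)) * 0.1
W_main = rng.normal(size=(3, 1)) * 0.1
W_hint = rng.normal(size=(3, 2)) * 0.1

h = np.tanh(X @ W_shared)
loss_main = np.mean((h @ W_main - y_main) ** 2)
loss_hint = np.mean((h @ W_hint - g_hint) ** 2)

# The hint loss pushes the shared layer to encode G, which task A can reuse.
total_loss = loss_main + 0.5 * loss_hint
```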
27. • Representation bias
MTL biases the model to prefer representations that other tasks also prefer. This
also helps the model generalize to new tasks in the future, as a hypothesis space
that performs well for a sufficiently large number of training tasks will also perform
well for learning novel tasks, as long as they come from the same environment.
• Regularization
Finally, MTL acts as a regularizer by introducing an inductive bias. As such, it
reduces the risk of overfitting as well as the Rademacher complexity of the model,
i.e. its ability to fit random noise.
28. Experiment
• Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and
Semantics
By:
• Alex Kendall
• Yarin Gal
• Roberto Cipolla
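The core idea of Kendall et al. (2018) can be sketched numerically: each task loss is weighted by a learned homoscedastic uncertainty. Writing s_i = log(sigma_i^2), the regression form of the objective is total = sum_i exp(-s_i) * L_i + s_i, where the log term stops the model from inflating the uncertainties to zero out the losses. The function below is an illustrative sketch of that formula, not the authors' implementation; classification tasks use a closely related form.

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learned log-variances s_i = log(sigma_i^2).

    Each loss is down-weighted by exp(-s_i); the +s_i term penalizes
    large uncertainties so the trivial solution sigma -> inf is avoided.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

# With log_vars = 0 (sigma = 1) this reduces to a plain sum of the losses.
combined = uncertainty_weighted_loss([0.5, 2.0], [0.0, 0.0])  # 2.5
```

In the paper the `log_vars` are trainable parameters optimized jointly with the network weights, so the relative task weighting is learned rather than hand-tuned.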
31. Conclusion
• From natural language processing and speech recognition to computer vision and
drug discovery, multi-task learning (MTL) has led to success in a variety of machine
learning applications.
• This work aims to assist ML practitioners in implementing MTL by explaining how it
works and offering advice for selecting relevant auxiliary tasks.
• Our understanding of tasks – their similarity, relationship, hierarchy, and benefit for
MTL – is still limited, and we need to study them further to acquire a
deeper grasp of MTL's generalization capabilities in deep neural networks.
32. References
• Ruder, S. (2017). An overview of multi-task learning in deep neural networks. arXiv
preprint arXiv:1706.05098.
• Caruana, R. (1997). Multitask learning. Machine learning, 28(1), 41-75.
• Karpathy, A. (2019, December 3). Tesla Autopilot and Multi-Task Learning for
Perception and Prediction [Video]. YouTube.
https://www.youtube.com/watch?v=IHH47nZ7FZU
• Abu-Mostafa, Y. S. (1990). Learning from hints in neural networks. Journal of
Complexity, 6(2), 192–198. https://doi.org/10.1016/0885-064X(90)90006-Y
• Kendall, A., Gal, Y., & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh
losses for scene geometry and semantics. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 7482-7491).