DA 5330 – Advanced Machine Learning
Applications
Lecture 11 – Advanced Learning Techniques
Maninda Edirisooriya
manindaw@uom.lk
End-to-End Learning
• Earlier, intermediate features were generated first and then used to train another ML model
• But when more data is available, it is more accurate to train directly from the original data against the result we expect, as sketched below
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
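As a rough illustration of the contrast above, here is a minimal sketch (PyTorch assumed; the layer sizes are arbitrary) of a model trained end-to-end from the raw input to the expected output, with no hand-engineered intermediate features:

```python
import torch
import torch.nn as nn

# One network learns its own internal features from the raw input,
# instead of a human designing intermediate features for a second model.
end_to_end_model = nn.Sequential(
    nn.Linear(784, 128),   # raw input, e.g. a flattened 28x28 image
    nn.ReLU(),
    nn.Linear(128, 10),    # predicted output, compared directly with the labels
)

x = torch.randn(32, 784)             # a batch of raw inputs
y = torch.randint(0, 10, (32,))      # the result information we expect
loss = nn.functional.cross_entropy(end_to_end_model(x), y)
loss.backward()                      # gradients flow end-to-end through the whole model
```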
Multi-Task Learning
• Different tasks (e.g.: News Summarization, News Sentiment Analysis) need different labeled datasets, which are rare
• The available datasets may be insufficient in size to train a model with a sufficient level of accuracy
• When business needs change, new ML tasks emerge for which there are no labeled datasets to train on
• To address these problems we need a way to learn more than one task at a time, so that a new task can be trained with the same model using less data and at a higher speed. This is known as Multi-Task Learning
Examples for Multi-Task Learning
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
Assumption of Multi-Task Learning
• To learn in a multi-task manner, the tasks should share some structure
• Otherwise, single-task learning is the better choice
• Fortunately, most tasks have common structure. E.g.:
• Share the same laws of physics
• Languages like English and French share common patterns due to historical reasons
• Psychology and physiology of humans are very similar
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
Notations of Multi-Task Learning
• In multi-task learning, a new variable zi, known as the Task Descriptor, is added to the approximation function; it is generally a one-hot encoded vector
• The task descriptor encodes which task the model is performing
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
Encoding the Task Descriptor in NN
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
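A minimal sketch of one common way to do this (PyTorch assumed; the class name, dimensions and the choice of concatenating zi with the input are illustrative assumptions, not the only possible encoding):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TASKS = 3
INPUT_DIM = 10

class MultiTaskNet(nn.Module):
    def __init__(self, input_dim, num_tasks, hidden_dim=64, output_dim=1):
        super().__init__()
        # The shared network sees both the input features and the task descriptor.
        self.body = nn.Sequential(
            nn.Linear(input_dim + num_tasks, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x, task_id):
        # zi: one-hot encoded task descriptor, appended to the input x.
        z = F.one_hot(task_id, NUM_TASKS).float()
        return self.body(torch.cat([x, z], dim=-1))

model = MultiTaskNet(INPUT_DIM, NUM_TASKS)
x = torch.randn(8, INPUT_DIM)
task_id = torch.full((8,), 2, dtype=torch.long)   # this batch comes from task 2
y_hat = model(x, task_id)
```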
Weighted Multi-Task Learning
• Instead of giving an equal weight to each task during training, different weights can be assigned based on criteria such as:
• Manually setting a priority-based weight
• Dynamically adjusting the weights during the training process
• These weights are applied to the per-task losses in the loss function during optimization (see the sketch below)
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
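A minimal sketch of the weighting idea (PyTorch assumed; the weight values and the dictionary structure are illustrative):

```python
import torch

# Manually set, priority-based weights for each task (these could also be
# adjusted dynamically as training progresses).
task_weights = {0: 1.0, 1: 0.5, 2: 2.0}

def weighted_multi_task_loss(per_task_losses):
    """Combine per-task losses; per_task_losses maps task id -> scalar loss tensor."""
    return sum(task_weights[t] * loss for t, loss in per_task_losses.items())

# Example: losses already computed for each task on the current batch.
losses = {0: torch.tensor(0.8), 1: torch.tensor(1.2), 2: torch.tensor(0.3)}
total_loss = weighted_multi_task_loss(losses)   # this scalar is what the optimizer minimizes
```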
Training With Vanilla Multi-Task Learning
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
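A rough sketch of one vanilla multi-task training step (`model`, `sample_tasks`, `sample_batch` and `loss_fn` are hypothetical helpers): sample a mini-batch of tasks, sample a data batch for each task, sum the per-task losses, then back-propagate through the shared parameters.

```python
def vanilla_mtl_step(model, optimizer, sample_tasks, sample_batch, loss_fn):
    tasks = sample_tasks()                          # 1. sample a mini-batch of task ids
    total_loss = 0.0
    for task_id in tasks:
        x, y = sample_batch(task_id)                # 2. sample a data batch for this task
        y_hat = model(x, task_id)                   # forward pass conditioned on the task descriptor
        total_loss = total_loss + loss_fn(y_hat, y) # 3. accumulate the per-task losses
    optimizer.zero_grad()
    total_loss.backward()                           # 4. back-propagate through the shared parameters
    optimizer.step()
    return total_loss.item()
```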
Introduction to Transfer Learning
• Transfer Learning refers to the process of leveraging knowledge
gained from solving one problem and applying it to a different, but
related, problem
• Unlike traditional ML, where models are trained to perform a specific task on a specific dataset, Transfer Learning allows knowledge to be transferred from one task/domain to another. This improves the performance of the target task, especially when labeled data for the target task is limited or expensive to obtain
• E.g.: to train a cat image classifier, you can take a CNN pre-trained on the huge ImageNet dataset of miscellaneous images and then train only the last few layers of the CNN on the smaller, available cat image dataset (see the sketch below)
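A minimal sketch of this example, assuming torchvision (version 0.13 or later for the `weights` argument) is available: load a CNN pre-trained on ImageNet, freeze its layers, and train only a new final layer on the small cat dataset.

```python
import torch.nn as nn
from torchvision import models

# CNN pre-trained on the huge ImageNet dataset.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze all pre-trained layers so only the new head will be updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new, randomly initialized one for cat vs. not-cat.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the parameters of model.fc are then passed to the optimizer and trained
# on the small cat image dataset.
```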
Motivation of Transfer Learning
• Scarcity of Labeled Data: Annotated datasets required for training
machine learning models are often scarce and expensive to acquire.
Transfer learning mitigates this issue by utilizing knowledge from
related tasks or domains
• Model Generalization: By transferring knowledge from a pre-trained
model, the model can generalize better to new tasks or domains,
even with limited data
• Efficiency: Transfer learning can significantly reduce the
computational resources and time required for training models from
scratch, making it a practical approach in various real-world scenarios
Types of Transfer Learning
1. Inductive Transfer Learning: Involves transferring knowledge from a source
domain to a target domain by learning a new task in the target domain using
the knowledge gained from solving a related task in the source domain
Example: Suppose you have a model trained to classify different types of
fruits based on images in one dataset (source domain). You can then use the
knowledge gained from this task to classify different types of vegetables
based on images in a separate dataset (target domain)
2. Transductive Transfer Learning: Focuses on adapting a model to a new
domain where the target data distribution may differ from the source domain.
Instead of learning a new task, transductive transfer learning aims to adapt
the model to perform well on the target domain.
Example: Let's say you have a model trained on data from one country
(source domain) to predict housing prices. However, when you try to apply
this model to a different country (target domain), you encounter differences
in housing market dynamics. Transductive transfer learning involves
adapting the model to the target domain's characteristics without explicitly
learning a new task
Pre-Trained Models
• Task-specific models can be developed by training on the available small labeled dataset with supervised learning, on top of commonly available pre-trained models
• Models trained on large generic datasets, such as ImageNet-trained vision models, and GPT models are examples of pre-trained models
• ImageNet is an example of a large labeled dataset
• However, there are also many unsupervised pre-trained models available as open-source content, such as the GPT and BERT large language models (see the sketch below)
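A minimal sketch of loading such an open-source pre-trained model, assuming the Hugging Face `transformers` library is installed (the model name and number of labels are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",   # weights pre-trained without labels (masked language modelling)
    num_labels=2,          # a new classification head, randomly initialized
)

# The model can now be fine-tuned with supervised learning on a small labeled dataset.
inputs = tokenizer("An example sentence to classify", return_tensors="pt")
outputs = model(**inputs)   # logits from the (not yet fine-tuned) classification head
```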
Transfer Learning via Fine Tuning
• The model pre-trained on the source data is trained again on the target domain data
• Sometimes all the layers of the NN are trained:
• Either a small Learning Rate is used for all the layers
• Or smaller Learning Rates are used for the earlier layers
• Sometimes only the last layers are trained at first, with the earlier layers frozen and then gradually unfrozen
• Sometimes only the last one or few layers are trained while the other layers are kept frozen
• When the target task is simpler than the source task there is no need to update the earlier layers
• The best techniques/hyperparameters are selected with cross-validation (see the sketch below)
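A minimal sketch of two of the strategies above (PyTorch and torchvision assumed; layer names follow the ResNet example from the earlier slide): freezing all but the last layers, and using a smaller learning rate for the earlier layers than for the last layer.

```python
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained on the source data

# Strategy A: freeze everything except the last block and the final layer.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("layer4", "fc"))

# Strategy B: smaller learning rate for the earlier (pre-trained) layers,
# larger learning rate for the last layer.
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(),     "lr": 1e-3},
])
```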
Transfer Learning via Fine Tuning
• Overfitting can be mitigated with the Early Stopping technique (see the sketch below)
• New layers can be added and initialized with Random Initialization while keeping the earlier layers as they are
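A rough sketch of Early Stopping during fine-tuning (`train_one_epoch` and `evaluate` are hypothetical helpers): stop as soon as the validation loss has not improved for a few consecutive epochs.

```python
def fine_tune_with_early_stopping(model, optimizer, train_one_epoch, evaluate,
                                  max_epochs=100, patience=3):
    best_val_loss, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(model, optimizer)     # one pass over the target training data
        val_loss = evaluate(model)            # loss on held-out validation data
        if val_loss < best_val_loss:
            best_val_loss, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:        # no improvement for `patience` epochs: stop
                break
    return best_val_loss
```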
Unintuitive Facts about Transfer Learning
• When pre-training is done with unsupervised ML and fine-tuning with supervised ML (e.g. Transformer models), the pre-training data does not need to be very diverse
• You can even use the target dataset itself for pre-training without sacrificing much accuracy!
• This may change when both pre-training and fine-tuning are done with supervised ML
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Unintuitive Facts about Transfer Learning
• The last layer of a NN may not be the best layer to fine-tune
• In some scenarios, fine-tuning selected middle layers may perform better than a full fine-tuning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Rule of Thumb for Transfer Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Meta Learning
• “Given a set of training tasks, can we optimize for the ability to learn
these tasks quickly, so that we can learn new tasks quickly too?”
• This is what is achieved by Meta Learning
• In other words, optimizing for transferability is known as Meta Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Two Views of Meta Learning Algorithms
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Bayes View of Meta Learning
• The probabilities of the label values yi,j depend on the probabilities of the parameters 𝜙𝑖 of the model for task i
• The parameter probabilities 𝜙𝑖 of all the tasks depend on the meta-level parameters 𝜃
• If the 𝜙𝑖 were independent across tasks i, then 𝜃 would carry no information, and vice versa
• Learning 𝜃 is the idea of Meta Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
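A compact way to write this hierarchy (a sketch using the notation above, with xi,j the inputs of task i and N the number of tasks):

```latex
% Task parameters \phi_i are drawn from a prior governed by the meta-level
% parameters \theta; the labels of task i depend only on that task's \phi_i.
\[
  p\big(\theta,\ \phi_{1:N},\ \{y_{i,j}\} \,\big|\, \{x_{i,j}\}\big)
  \;=\; p(\theta)\,
        \prod_{i=1}^{N} p(\phi_i \mid \theta)
        \prod_{j} p\big(y_{i,j} \mid x_{i,j},\ \phi_i\big)
\]
```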
Mechanistic View of Meta Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Questions?
