Neural Architecture Search: A Probabilistic Approach
Author: Volodymyr LUT
Supervisors: Yuriy KHOMA and Vasilii GANISHEV
What is AutoML?
AutoML is a general name for the automation of routine work done by ML engineers. It covers different areas including, but not limited to, data preparation, feature engineering, feature extraction, neural architecture search, and hyperparameter selection.
Why AutoML?
- Democratization of technology
- Extension of the ML engineering toolkit
- Effective resource management
- Leverage to move the industry forward
Timeline of accuracy advances in ImageNet image recognition (Google)
Neural Architecture Search Strategies
- Reinforcement learning
- Evolutionary algorithms
- Bayesian optimization
- Grid search
- Other
Reinforcement Learning Strategy
- The architecture of the student CNN defines the action space of the MDP.
- The maximum accuracy obtained after evaluating the student CNN becomes the immediate reward for the controller.
- This stochastic process has the Markov property, meaning that the future state depends only on the present state.
- The reinforcement learning agent maximizes the cumulative reward.
DQN in terms of the NAS problem
The controller (CNN) samples an architecture A with probability p, trains the student network with architecture A to obtain accuracy R, and updates the weights of the controller based on R.
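The loop above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names (`sample_architecture`, `train_student`, `nas_loop`) are hypothetical placeholders, and the student training is stubbed with a random reward.

```python
import random

# The per-layer choices used later in the experiment.
FILTERS = [2, 4, 8, 16, 32, 64]
KERNELS = [1, 3, 6, 9, 12, 24]

def sample_architecture():
    """Sample an architecture A: a (kernel size, filter count) pair
    for each of the 4 convolutional layers."""
    return [(random.choice(KERNELS), random.choice(FILTERS)) for _ in range(4)]

def train_student(architecture):
    """Train the student CNN briefly and return its accuracy R.
    Stubbed here with a random value in [0, 1]."""
    return random.random()

def nas_loop(steps):
    """One pass of the controller loop: sample A, evaluate it to get R,
    record (A, R); a real controller would update its weights on R."""
    history = []
    for _ in range(steps):
        arch = sample_architecture()   # sample architecture A with probability p
        reward = train_student(arch)   # accuracy R is the immediate reward
        history.append((arch, reward))
    return history
```

In the real system, the reward step is the expensive part: each sampled architecture requires training a full student network before the controller can learn anything from it.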
Probabilistic approximation of Q
We are interested in the full predictive distribution, not just a single best fit of the Q(state, action) value received from the controller CNN. The CNN may memorize specific input examples and their associated outputs. To prevent this, we use the mean and standard deviation of the target variable to model it as a Gaussian distribution, and treat those parameters as a measure of uncertainty in the target-variable prediction.
Gaussian Layer and loss function
The output of the last layer of a regression model relates to a well-known probability distribution, the Gaussian. We use the maximum likelihood estimate (MLE) of the variance as a measure of uncertainty, and the Gaussian log-likelihood as the loss function of the controller CNN.
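As a sketch of the loss described above, the negative Gaussian log-likelihood of a single target y under predicted parameters (mu, sigma) can be written as follows; this is the standard formula, not code taken from the project:

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2).
    Minimizing this trains the network to predict both the mean (fit)
    and the standard deviation (uncertainty) of the target."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
```

The first term penalizes claiming large uncertainty everywhere; the second penalizes prediction error, scaled down when the predicted uncertainty is high. This trade-off is what lets sigma act as a calibrated uncertainty estimate.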
Details of the experiment
The following limitations were set:
- 80 training epochs per dataset (CIFAR-10/CIFAR-100) for the controller CNN
- 8 training epochs per architecture A of the student CNN
- Action space limited to the selection of kernel size and number of output filters for 4 convolutional layers
Action space limitation

Action type         Values available in experiment
Number of filters   2, 4, 8, 16, 32, 64
Kernel size         1, 3, 6, 9, 12, 24
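The table above implies 6 x 6 = 36 choices per layer and, across 4 convolutional layers, 36^4 = 1,679,616 distinct architectures. A quick enumeration of that search space:

```python
from itertools import product

FILTERS = [2, 4, 8, 16, 32, 64]
KERNELS = [1, 3, 6, 9, 12, 24]

# One action fixes a (kernel size, filter count) pair for a single layer.
layer_choices = list(product(KERNELS, FILTERS))   # 36 options per layer

# Four independent layers give 36^4 possible student architectures.
total_architectures = len(layer_choices) ** 4
```

Even this deliberately small action space contains over 1.6 million architectures, which is why each candidate is trained for only 8 epochs.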
Demo
Results

Algorithm                 CIFAR-10 mean accuracy    CIFAR-100 mean accuracy
                          (last 10 epochs)          (last 10 epochs)
Gaussian epsilon-greedy   0.3807                    0.1588
Classic epsilon-greedy    0.3308                    0.0846
Gaussian UCB              0.3875                    0.1078
Classic UCB               0.3737                    0.0799
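The Gaussian variants of the two exploration strategies can be sketched as follows. This is an illustrative reading of how a predicted standard deviation feeds an upper-confidence-bound rule; the function names and the exploration constant `c` are assumptions, not the project's code:

```python
def gaussian_ucb(mu, sigma, c=1.0):
    """Score each action by predicted mean plus c times predicted
    standard deviation: exploit high means, explore high uncertainty."""
    return [m + c * s for m, s in zip(mu, sigma)]

def select_action(mu, sigma, c=1.0):
    """Pick the action with the highest upper confidence bound."""
    scores = gaussian_ucb(mu, sigma, c)
    return scores.index(max(scores))
```

The classic counterparts use only point estimates (and, for UCB, visit counts); the Gaussian versions replace that with the uncertainty predicted by the controller itself.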
Conclusions
- The Gaussian modification of the controller CNN was able to yield better architectures on a previously unknown dataset.
- Reinforcement learning is a good tool for the NAS problem; however, the NAS problem is not the best environment for such research because it is computationally expensive.
- Even though the Gaussian modification yields better results, it has not prevented the algorithm from overfitting in a small action space, and other tools should also be considered.
Log-likelihood
Because the logarithm is a strictly increasing function, maximizing the likelihood is equivalent to maximizing the log-likelihood. Since the Gaussian density is log-concave, it is convenient to use the log-likelihood in this case.
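Concretely, for a target y under a Gaussian with predicted mean and variance, the log-likelihood being maximized is the standard expression:

```latex
\log p(y \mid \mu, \sigma^2)
  = -\tfrac{1}{2}\log\!\left(2\pi\sigma^{2}\right)
    - \frac{(y - \mu)^{2}}{2\sigma^{2}}
```

Maximizing this over the network outputs mu and sigma is equivalent to minimizing its negation, which is the loss function used for the controller CNN.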
Detailed results

             CIFAR-10                  CIFAR-100
Algorithm    max. reward   max. acc.   max. reward   max. acc.
Gaussian     0.42795802    0.6592      0.7739771     0.3239
Classic      1.38204733    0.6437      1.2746971     0.1929

The complete CSV log with the training history of the experiments can be found at
https://github.com/volodymyrlut/masters-project

Master defence 2020 - Volodymyr Lut - Neural Architecture Search: A Probabilistic Approach
