AutoML
NeurIPS 2018 Reading Session (Yomikai) @ PFN (2019/01/26)
Shotaro Sano
Agenda
• AutoML @ NeurIPS 2018
• Paper 1: “Massively Parallel Hyperparameter Tuning” [Li et al.]
• Paper 2: “Neural Architecture Optimization” [Luo et al.]
What is AutoML?
• Hyperparameter Optimization (HPO): automatically tuning a model's hyperparameters
• Neural Architecture Search (NAS): automatically searching for NN architectures
• Meta Learning: learning how to learn, so that experience transfers across tasks
“The user simply provides data, and the AutoML system automatically determines the approach that performs best for this particular application.”
Hutter et al., 2018, AutoML: Methods, Systems, Challenges
AutoML @ NeurIPS 2018
• AutoML was a prominent topic at NeurIPS 2018
• Papers covered HPO, NAS, and Meta Learning
• Meta Learning had an especially large number of papers
• AutoML Meetup @ Google AI
• Related workshops and tracks:
– Systems for ML workshop
– Meta Learning workshop
– NeurIPS 2018 Competition Track
Hyperparameter Optimization @ NeurIPS 2018
• Trend: combining Bayesian Optimization with meta-learning
• Roughly 10 papers at the main conference, e.g.:
– “Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior”
– “Automating Bayesian Optimization with Bayesian Optimization”
– etc.
• At the Systems for ML workshop:
– “Massively Parallel Hyperparameter Tuning”
– “Population Based Training as a Service”
– etc.
Neural Architecture Search @ NeurIPS 2018
• Target tasks are expanding beyond image classification, e.g., Semantic Segmentation
• 4+ papers at the main conference, e.g.:
– “Neural Architecture Optimization”
– “Neural Architecture Search with Bayesian Optimization and Optimal Transport”
– etc.
• An AutoDL competition is planned for 2019
Meta Learning @ NeurIPS 2018
• Keywords: Model-agnostic Meta-Learning, Few-shot Learning, Transfer Learning, etc.
• Roughly 20 papers at the main conference, e.g.:
– “Bayesian Model-Agnostic Meta-Learning”
– “Meta-Reinforcement Learning of Structured Exploration Strategies”
– etc.
• Meta Learning also appears as a building block of other AutoML methods
– e.g., combined with HPO and NAS
Competition Track: AutoML3 @ NeurIPS 2018
• AutoML3: AutoML for Lifelong Machine Learning
– Tasks arrive sequentially, and the data distribution drifts over time
– Systems must adapt automatically, without human intervention
• Tree-structured Parzen Estimator (TPE) + LightGBM/XGBoost

[Figure: tasks A–E are processed one after another, each with its own train & test data]
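As a concrete illustration of that TPE + LightGBM recipe, here is a minimal sketch (not taken from the slides or from any competition entry), assuming Optuna's TPESampler as the TPE implementation and a stand-in scikit-learn dataset:

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in for a challenge task

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = lgb.LGBMClassifier(n_estimators=200, **params)
    # Mean CV accuracy is the quantity TPE tries to maximize.
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```

Swapping lgb.LGBMClassifier for an XGBoost model gives the XGBoost variant of the same recipe.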
Today’s Papers
• Hyperparameter Optimization: “Massively Parallel Hyperparameter Tuning” [Li et al.]
• Neural Architecture Search: “Neural Architecture Optimization” [Luo et al.]
Systems for ML Workshop (NeurIPS 2018)
Massively Parallel Hyperparameter Tuning
[Figure: hyperparameter tuning as blackbox optimization. A tuner (e.g., Grid Search, Bayesian Optimization, …) proposes configurations from a search space such as LR: 0.00001 ~ 0.1, Dropout: 0.0 ~ 0.5]
• Proposes ASHA, an asynchronous, massively parallel variant of Successive Halving
– Builds on the Successive Halving algorithm described below
– A Successive Halving pruner in the same spirit is also implemented in Optuna
Related Work: Successive Halving (SHA)
• Repeatedly evaluates many configurations with a small budget and keeps only the most promising ones (principled early stopping)
• Simple, yet very strong in practice
• Core subroutine of Hyperband [’16, Li et al.]
Related Work: Successive Halving (SHA)
[Figure: step 1 of SHA. Sample N configurations and evaluate each one with a small resource budget]
Related Work: Successive Halving (SHA)
[Figure: step 2 of SHA. Keep the top N/η configurations and give each survivor η times more resource (example: η = 3)]
Related Work: Successive Halving (SHA)
[Figure: step 3 of SHA. Keep the top N/η² configurations and give each survivor η² times more resource (example: η = 3)]
Related Work: Successive Halving (SHA)
[Figure: the halving repeats until only the top configuration remains, trained with the full resource budget]
Simple and Powerful!

[Figure: on the same total budget, Successive Halving reaches better configurations faster than Random Search]
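To make the steps above concrete, here is a minimal, self-contained sketch of synchronous Successive Halving (my own illustration, not the paper's code); `sample_config` and `evaluate` are hypothetical placeholders for the user's search space and training routine:

```python
import math
import random

def successive_halving(sample_config, evaluate, n_configs=27, min_resource=1, eta=3):
    """Keep the top 1/eta configs at each rung and give the survivors eta times more resource."""
    configs = [sample_config() for _ in range(n_configs)]
    resource = min_resource
    n_rungs = round(math.log(n_configs, eta))            # number of halving rounds
    for _ in range(n_rungs + 1):
        # Evaluate every surviving config with the current budget.
        scored = sorted(((evaluate(c, resource), c) for c in configs),
                        key=lambda t: t[0], reverse=True)
        configs = [c for _, c in scored[: max(1, len(configs) // eta)]]
        resource *= eta                                   # survivors get eta times more resource
    return configs[0]

# Toy usage: a "config" is just a number, and more resource means a less noisy evaluation.
best = successive_halving(
    sample_config=lambda: random.uniform(0.0, 1.0),
    evaluate=lambda cfg, r: cfg + random.gauss(0.0, 1.0 / r),
)
print(best)
```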
Related Work: Synchronous SHA

[Figure: SHA parallelized across workers. Each worker evaluates configs for rung 1, then rung 2, then rung 3; a rung starts only after the previous rung has completely finished]

• Drawback: all workers must synchronize at every rung boundary
• Sensitive to stragglers and dropped jobs, which leave workers idle
Problem with Synchronous SHA
[Figure: worker timelines for synchronous vs. asynchronous scheduling of configs across rungs 1–3]

• To pick the “top” 1/η configs of a rung, synchronous SHA must wait until every config in that rung has finished
• A single slow config therefore blocks all the other configs
Proposed Method: Asynchronous SHA
[Figure: asynchronous promotion across rungs (example: η = 2); “?” marks evaluations that are still running]

• PROS: no waiting at rung boundaries, so workers are never idle
• CONS: configs may be mis-promoted based on partial information
– a config can look like the top 1/η of its rung before all of its peers have reported
– the fraction of mis-promoted configs shrinks on the order of N^(-1/2) as the number of configs N grows
Proposed Method: Asynchronous SHA
PROS: workers stay fully utilized even with a massive number of workers, hence “Massively Parallel Hyperparameter Tuning”

[Figure: worker timelines with asynchronous promotions across rungs; no worker ever waits for a rung to complete]
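Below is a minimal, single-process sketch of the asynchronous promotion rule described above (my own illustration under the stated assumptions, not the authors' implementation): when a worker asks for work, promote any config that is in the top 1/η of its rung and has not been promoted yet; otherwise start a fresh config at the bottom rung.

```python
import random

class ASHA:
    """Toy sketch of asynchronous Successive Halving (eta = reduction factor)."""

    def __init__(self, sample_config, eta=2, min_resource=1, max_rung=3):
        self.sample_config = sample_config
        self.eta, self.min_resource, self.max_rung = eta, min_resource, max_rung
        self.rungs = {k: [] for k in range(max_rung + 1)}        # rung -> list of (score, config)
        self.promoted = {k: set() for k in range(max_rung + 1)}  # result indices already promoted

    def get_job(self):
        """Return (config, rung, resource) for a free worker, preferring promotions from high rungs."""
        for k in reversed(range(self.max_rung)):
            results = self.rungs[k]
            order = sorted(range(len(results)), key=lambda i: results[i][0], reverse=True)
            for i in order[: len(order) // self.eta]:            # top 1/eta of the rung so far
                if i not in self.promoted[k]:
                    self.promoted[k].add(i)
                    return results[i][1], k + 1, self.min_resource * self.eta ** (k + 1)
        return self.sample_config(), 0, self.min_resource        # nothing promotable: new config

    def report(self, config, rung, score):
        """Called when a worker finishes evaluating `config` at `rung`."""
        self.rungs[rung].append((score, config))

# Toy driver: simulate workers finishing jobs one at a time.
scheduler = ASHA(sample_config=lambda: {"lr": random.uniform(1e-4, 1e-1)}, eta=2)
for _ in range(30):
    cfg, rung, resource = scheduler.get_job()
    score = -abs(cfg["lr"] - 0.01) + random.gauss(0.0, 0.01 / resource)  # fake evaluation
    scheduler.report(cfg, rung, score)
best_rung = max(k for k in scheduler.rungs if scheduler.rungs[k])
print(best_rung, max(scheduler.rungs[best_rung], key=lambda t: t[0]))
```

In a real deployment, get_job and report would simply be called by many workers in parallel; the promotion rule itself is unchanged.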
Experiments: Single Worker Setting
[Figure: single-worker comparison. Asynchronous SHA performs comparably to Synchronous SHA, i.e., mis-promotions do little harm in practice]
Experiments: Multi Worker Setting
[Figure: multi-worker comparison. With many workers, Asynchronous SHA finds good configs faster than Synchronous SHA]
Conclusion
• ASHA removes the rung-level synchronization from Successive Halving and scales it to massive parallelism
• One of the notable hyperparameter-optimization systems presented around NeurIPS 2018
Neural Architecture Optimization
Neural Architecture Search
• Automatically searches for neural network architectures instead of designing them by hand
• Searched architectures have reached SOTA on ImageNet
Design dimensions of Neural Architecture Search:
• Search space: Chain-structured Space, Tree-structured Space, Multi-branch Network, Cell Search Space, …
• Performance estimation: Full Evaluation, Lower Fidelities, Learning Curve Extrapolation, One-shot Architecture Search, …
• Search strategy: Reinforcement Learning, Evolutionary Search, Bayesian Optimization, Monte Carlo Tree Search, …
Neural Architecture Optimization
• Performs Neural Architecture Search by gradient-based optimization in a continuous embedding space of architectures
• Results:
– SOTA-level accuracy on CIFAR (+cutout)
– 2018 PFN
Related Work: NASNet Search Space
• Neural Architecture Search with reinforcement learning was introduced by [’16, Zoph et al.]
• The NASNet search space then achieved ImageNet SOTA [’17, Zoph et al.]
– Searches for a small cell rather than the whole network, and builds the network by stacking that cell
– Analogous to how ResNet stacks ResBlocks (see the sketch below)
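A schematic sketch of that cell-based idea (my own illustration, not the NASNet code): only the wiring inside `Cell` would be searched, while the overall network simply stacks the cell; channel counts and layer choices here are placeholders.

```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    """Stand-in for a searched cell; in NAS only this module's wiring is searched."""
    def __init__(self, channels):
        super().__init__()
        self.op1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.op2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.op2(torch.relu(self.op1(x)))  # residual-style combination

class StackedNet(nn.Module):
    """Whole network = the same cell repeated, analogous to ResNet stacking ResBlocks."""
    def __init__(self, channels=32, num_cells=6, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.cells = nn.Sequential(*[Cell(channels) for _ in range(num_cells)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        h = self.cells(self.stem(x))
        return self.head(h.mean(dim=(2, 3)))          # global average pooling

logits = StackedNet()(torch.randn(2, 3, 32, 32))      # -> shape (2, 10)
```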
Proposed Method: NAONet
• Idea: map architectures into a continuous embedding space and optimize them there
– Can the search then be driven by gradients?
• The search space itself follows NASNet (cell-based)
[Figures, built up over several slides: the NAO model]
• An LSTM encoder maps an architecture, written as a token sequence, to an embedding vector
• FC layers on top of the embedding predict the architecture's accuracy
• An LSTM decoder reconstructs the architecture from the embedding (encoder-decoder)
• Encoder, accuracy predictor, and decoder are trained jointly with a multi-task loss
• At search time, the embedding is moved along the gradient of the predicted accuracy and then decoded back into a new, hopefully better, architecture
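A heavily simplified sketch of those components (my own illustration under stated assumptions, not the authors' implementation); the vocabulary size, dimensions, non-autoregressive decoder, and step size are placeholder choices:

```python
import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 20, 64, 10   # placeholder token vocabulary, embedding size, sequence length

class NAOSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.LSTM(DIM, DIM, batch_first=True)
        self.predictor = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, 1))
        self.decoder = nn.LSTM(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, VOCAB)

    def encode(self, tokens):                      # tokens: (batch, SEQ_LEN) integer ids
        _, (h, _) = self.encoder(self.tok(tokens))
        return h[-1]                               # (batch, DIM) architecture embedding

    def decode(self, emb):                         # simplified stand-in for the real decoder
        dec_out, _ = self.decoder(emb.unsqueeze(1).repeat(1, SEQ_LEN, 1))
        return self.out(dec_out)                   # (batch, SEQ_LEN, VOCAB) reconstruction logits

model = NAOSketch()
tokens = torch.randint(0, VOCAB, (4, SEQ_LEN))     # 4 already-evaluated architectures
measured_acc = torch.rand(4, 1)                    # their measured accuracies (placeholder values)

# Multi-task training loss: accuracy prediction + architecture reconstruction.
emb = model.encode(tokens)
loss = nn.functional.mse_loss(model.predictor(emb), measured_acc) \
     + nn.functional.cross_entropy(model.decode(emb).reshape(-1, VOCAB), tokens.reshape(-1))
loss.backward()

# Search step: move one embedding along the gradient of predicted accuracy, then decode it.
emb = model.encode(tokens[:1]).detach().requires_grad_(True)
grad = torch.autograd.grad(model.predictor(emb).sum(), emb)[0]
new_tokens = model.decode(emb + 0.1 * grad).argmax(dim=-1)   # the "improved" architecture
```

In the actual paper the decoder is an attention-based autoregressive LSTM, and the search loop alternates between training these components on evaluated architectures and generating new candidates.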
NAONet

[Figure: the resulting NAONet architecture]
Experiments: SOTA on CIFAR-10
[Table: NAONet reaches SOTA-level test error on CIFAR-10]
Experiments: Transferring CIFAR-10 to CIFAR-100
[Table: the architecture searched on CIFAR-10 transfers to CIFAR-100 and again reaches SOTA-level accuracy]
Experiments: Transferring PTB to WikiText-2
[Table: the architecture searched on PTB transfers to WikiText-2]
Conclusion
• Neural Architecture Search via gradient-based optimization in a continuous architecture embedding space
• Reaches SOTA-level results on CIFAR
Define-by-run style hyperparameter search framework.
✗ Fat config & poor control syntax
✓ High modularity
✓ High representation power
AutoML in NeurIPS 2018