8. ❖ Datasets
• Penn Treebank
The Penn Treebank dataset, known as the PTB dataset, is widely used in
machine learning research for NLP (Natural Language Processing).
• CIFAR-10
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes,
with 6000 images per class. There are 50000 training images and 10000 test images.
10. NAS is computationally expensive and time-consuming: e.g., Zoph et al. (2018) use 450 GPUs for 3-4 days
(i.e. 32,400-43,200 GPU hours)
• We observe that the computational bottleneck of NAS is the training of each child model to
convergence, only to measure its accuracy whilst throwing away all the trained weights.
• The main contribution of this work is to improve the efficiency of NAS by forcing all child models
to share weights to eschew training each child model from scratch to convergence.
Importantly, in all of our experiments, for which we use a single Nvidia GTX 1080Ti GPU, the search for
architectures takes less than 16 hours. Compared to NAS, this is a reduction of GPU-hours by more than 1000x.
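The GPU-hour figures above can be checked with a few lines of arithmetic (the numbers are the ones quoted on the slide; the 1000x is a lower bound):

```python
# Back-of-the-envelope check of the GPU-hour figures quoted above.
nas_gpu_hours_low = 450 * 3 * 24   # 450 GPUs for 3 days
nas_gpu_hours_high = 450 * 4 * 24  # 450 GPUs for 4 days
enas_gpu_hours = 16                # a single GTX 1080Ti, under 16 hours

print(nas_gpu_hours_low, nas_gpu_hours_high)   # 32400 43200
print(nas_gpu_hours_low / enas_gpu_hours)      # 2025.0, i.e. more than 1000x
```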
11. ❖ Directed Acyclic Graph (DAG)
§ Node
• Local Computation
• Own Parameters (used only when the node is activated)
§ Edge
• Flow of information
• Determined by a controller (red)
Input
Output
→ Parameters are shared across all child models in the search space
→ In the RNN cell, both the placement of nodes and the operations are learned jointly (flexible)
(↔ NAS: the user fixes the placement of the nodes, and only each node's operation is learned)
https://jayhey.github.io/deep%20learning/2018/03/15/ENAS/
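The weight-sharing idea on this slide can be sketched in a few lines. This is a minimal illustrative toy, not the paper's code: every possible edge (i -> j) of the DAG owns one shared weight matrix, and a sampled child model activates only a subset of edges, so different child models reuse the same bank of weights instead of training from scratch (all names and shapes here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8

# Shared weight bank: one matrix per possible DAG edge i -> j (i < j).
shared_w = {(i, j): rng.standard_normal((dim, dim)) * 0.1
            for i in range(num_nodes) for j in range(i + 1, num_nodes)}

def forward(child_edges, x):
    """Run one sampled child model: only the sampled edges are used."""
    h = {0: x}  # node 0 is the input
    for j in range(1, num_nodes):
        # Sum contributions from this node's sampled incoming edges.
        inputs = [h[i] @ shared_w[(i, j)] for i in range(j) if (i, j) in child_edges]
        h[j] = np.tanh(sum(inputs)) if inputs else np.zeros(dim)
    return h[num_nodes - 1]

x = rng.standard_normal(dim)
child_a = {(0, 1), (1, 2), (2, 3)}          # a chain-shaped child model
child_b = {(0, 1), (0, 2), (1, 3), (2, 3)}  # a branchier child, same weight bank
out_a, out_b = forward(child_a, x), forward(child_b, x)
```

Both child models read from the same `shared_w`, which is the property ENAS exploits: training any child updates weights that every other child will reuse.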
12. (1) Macro Search: searches the entire network structure (takes 7 hours)
The 6 available operations (fewer than in NAS)
• Convolution with kernel size 3 × 3 and 5 × 5.
• Depthwise-Separable Convolution with kernel size 3 × 3 and 5 × 5.
• Average Pooling / Max pooling with kernel size 3 × 3.
6^L × 2^(L(L-1)/2) candidates
(L=12: ~1.6×10^29 candidates)
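The size of the macro search space quoted above can be verified directly: 6 operation choices per layer, plus an independent binary skip-connection choice for each of the L(L-1)/2 pairs of layers.

```python
# Macro search space size: 6 ops per layer, one skip-connection bit per layer pair.
L = 12
num_candidates = 6**L * 2**(L * (L - 1) // 2)
print(f"{num_candidates:.1e}")  # 1.6e+29
```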
14. (2) Micro Search
: searches at the cell level, then stacks the cells (takes 11.5 hours)
The 5 available operations (fewer than in NAS)
• Identity
• Separable Convolution with kernel size 3 × 3 and 5 × 5.
• Average Pooling / Max pooling with kernel size 3 × 3.
(5 × (B-2)!)^4 candidates
(B=7: ~1.3×10^11 candidates)
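As with the macro case, the quoted micro search space size follows from the slide's formula (5 × (B-2)!)^4 with B=7:

```python
import math

# Micro search space size, as quoted on the slide: (5 * (B-2)!)^4 with B=7.
B = 7
num_candidates = (5 * math.factorial(B - 2)) ** 4
print(num_candidates)           # 129600000000
print(f"{num_candidates:.1e}")  # 1.3e+11
```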