EXPLORING RANDOMLY WIRED
NEURAL NETWORKS FOR
IMAGE RECOGNITION
2019.5.7
Yongsu Baek
1. Neural Architecture Search (NAS)?
2. Previous Works
3. Randomly Wired Neural Networks
4. Experiments
5. Conclusion
6. Discussion
• Deep learning automates feature engineering
• The focus moves to “architecture engineering”
• Architectures are still developed manually (e.g., AlexNet, DenseNet, …)
DEEP LEARNING
Neural Architecture Search (NAS)?
• Automation of architecture engineering
• Methods
1) Search Space
• Prior knowledge: efficient search vs. human bias
• e.g., cell/block repetition
2) Search Strategy
• e.g., RL, evolutionary methods, random search, …
3) Performance Estimation Strategy
NEURAL ARCHITECTURE SEARCH
Neural Architecture Search (NAS)?
DEEP LEARNING TO NAS
Neural Architecture Search (NAS)?
Design an individual network → Design a network generator
Automation of feature engineering → Automation of architecture engineering
• (NAS) Neural Architecture Search with Reinforcement Learning
• Architecture Generator: LSTM
• Architecture Search: Reinforcement Learning
• 800 GPUs, 28 days
• (NASNet) Learning transferable architectures for scalable image
recognition
• Cell Concept
• Transferable architectures
• 500 GPUs, 4 days
• ENAS, MnasNet, etc.
PREVIOUS WORKS
• Network Generator 𝑔
𝑔 ∶ Θ ↦ 𝒩
where Θ : a parameter space, 𝒩 : a family of related networks
• e.g., for a ResNet generator, 𝒩 is the family of ResNets, and 𝜃 ∈ Θ specifies the number of stages, the number of residual blocks per stage, depth/width/filter sizes, activation types, etc.
• Deterministic
• Stochastic network generator 𝑔
𝑔 ∶ Θ × 𝑆 ↦ 𝒩
where Θ : a parameter space, 𝑆 : seeds of a pseudo-random
number generator, 𝒩 : a family of related networks
• e.g., NAS: 𝜃 is the weight matrices of the LSTM controller, and the output of each LSTM time step is a probability distribution conditioned on 𝜃
STOCHASTIC NETWORK GENERATOR
Randomly Wired Neural Networks
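The generator abstraction can be sketched in a few lines of plain Python. This is a toy illustration of g : Θ × S ↦ 𝒩, not the paper's actual generator; the spec fields (`depth`, `widths`) are made up for the example.

```python
import random

def generator(theta, seed):
    """A toy stochastic network generator g : Theta x S -> N.
    theta parameterizes the family of networks; the seed drives every
    pseudo-random choice, so (theta, seed) fully determines the output."""
    rng = random.Random(seed)  # the seed s fixes all pseudo-random draws
    depth = rng.randint(theta["min_depth"], theta["max_depth"])
    widths = [rng.choice(theta["widths"]) for _ in range(depth)]
    return {"depth": depth, "widths": widths}
```

A deterministic generator (like the ResNet example above) is the special case that ignores the seed entirely.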
• Turing’s unorganized machines, an early form of randomly connected neural networks
• The infant human cortex exhibits small-world properties
• Random graph modeling has been used as a tool to study the neural networks of human brains
• Random graph models are an effective tool for modeling and analyzing real-world graphs, e.g., social networks, the World Wide Web, and citation networks
MOTIVATION
Randomly Wired Neural Networks
1. Generating general graphs (DAGs)
2. Mapping a general graph to neural network operations
• Edge operations
- Data flow
• Node operations
- Aggregation: input data are combined via a weighted sum with learnable, positive weights
- Transformation: a ReLU-convolution-BN triplet
- Distribution: the same copy of the transformed data is sent out along every output edge
3. Attaching input and output nodes
4. Stages
METHODOLOGY
Randomly Wired Neural Networks
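The node operation (aggregate → transform → distribute) can be sketched in plain Python. The `transform` callable stands in for the ReLU-convolution-BN triplet, and the sigmoid gate is one way to keep the aggregation weights positive and learnable; scalars stand in for feature maps.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_forward(inputs, gate_params, transform):
    """One node: aggregate -> transform -> distribute.
    inputs:      values arriving on the node's input edges (floats here,
                 feature maps in the real network)
    gate_params: one learnable scalar per input edge; the sigmoid keeps
                 the aggregation weights positive
    transform:   stands in for the ReLU-convolution-BN triplet
    The returned value is what gets copied to every output edge."""
    aggregated = sum(sigmoid(w) * x for w, x in zip(gate_params, inputs))
    return transform(aggregated)
```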
• Additive aggregation maintains the same number of output channels
as input channels.
• Transformed data can be combined with the data from any other
nodes.
• Fixing the channel count keeps the FLOPs and parameter count
unchanged for each node, regardless of its input and output degrees.
• The overall FLOPs and parameter count of a graph are roughly
proportional to the number of nodes and nearly independent of the
number of edges
• This enables the comparison of different graphs without
inflating/deflating model complexity. Differences in task
performance are therefore reflective of the properties of the
wiring pattern.
NICE PROPERTIES OF NODE OPERATION
Randomly Wired Neural Networks
• Input
• An extra input node sends the same copy of the input data to all original input nodes
• Output
• An extra output node computes the (unweighted) average over all original output nodes
ATTACHING INPUT AND OUTPUT NODES
Randomly Wired Neural Networks
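The attachment step reduces to a degree check over the edge list. A minimal sketch (node ids and the averaging itself are left abstract; only the wiring is shown):

```python
def attach_io_nodes(n, edges):
    """Attach extra input/output nodes to a DAG on nodes 0..n-1:
    the extra input node (id n) feeds every node with no incoming edge,
    and the extra output node (id n+1) collects every node with no
    outgoing edge (their outputs are then averaged, unweighted)."""
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    input_node, output_node = n, n + 1
    new_edges = list(edges)
    new_edges += [(input_node, v) for v in range(n) if v not in has_in]
    new_edges += [(u, output_node) for u in range(n) if u not in has_out]
    return input_node, output_node, new_edges
```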
• An entire network consists of multiple stages.
• One random graph represents one stage.
• For all nodes directly connected to a stage's input node, the transformation is modified to have a stride of 2.
• The channel count is doubled (2×) when going from one stage to the next.
STAGES
Randomly Wired Neural Networks
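The staging scheme above can be written down as a small config generator (the dict fields are this sketch's invention, not the paper's notation):

```python
def stage_configs(n_stages, base_channels):
    """Stage scheme sketch: the channel count doubles from one stage to
    the next, and nodes wired to a stage's input node use stride 2."""
    configs, c = [], base_channels
    for s in range(n_stages):
        configs.append({"stage": s, "channels": c, "input_stride": 2})
        c *= 2
    return configs
```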
RandWire Architecture
RANDOMLY WIRED NEURAL NETWORKS
• Erdős–Rényi (ER), 1959.
• ER(N, P)
• Has N nodes
• Each possible edge between two nodes is included independently with probability P.
• The ER generation model has only a single parameter, P, and is denoted ER(P).
• Any graph with N nodes has a non-zero probability of being generated by the ER model.
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
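A minimal ER sketch: each possible edge is sampled independently, and orienting every edge from lower to higher node index is one simple way to obtain an acyclic graph (this orientation step is an assumption of the sketch):

```python
import random

def er_dag(n, p, seed=0):
    """ER(N, P): each of the C(N, 2) possible edges is included
    independently with probability P; edges point from the lower to
    the higher node index, so the result is a DAG."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]
```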
• Barabási–Albert (BA), 1999.
• BA(N, M), with 1 ≤ M < N

  initialize the graph G as M nodes without any edges
  repeat:
      add a new node v_t:
          for node v in G:
              connect v and v_t with P(v_t and v are connected) ∝ degree(v)
          until v_t has M edges
  until G has N nodes

• The result has exactly M(N − M) edges → BA generates only a subset of all graphs with N nodes
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
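The BA procedure above can be sketched directly. One detail the model leaves unspecified is how the very first added node picks targets when every degree is still 0; this sketch falls back to uniform sampling there.

```python
import random

def ba_graph(n, m, seed=0):
    """BA(N, M) sketch: start with M isolated nodes; each new node t
    attaches to M distinct existing nodes, chosen with probability
    proportional to degree (uniform while all degrees are still 0)."""
    assert 1 <= m < n
    rng = random.Random(seed)
    degree, edges = [0] * m, []
    for t in range(m, n):
        targets = set()
        while len(targets) < m:
            if sum(degree) == 0:
                targets.add(rng.randrange(len(degree)))
            else:
                targets.add(rng.choices(range(len(degree)), weights=degree)[0])
        degree.append(0)
        for v in targets:
            edges.append((v, t))  # edges point old -> new, so the graph is a DAG
            degree[v] += 1
            degree[t] += 1
    return edges
```

Each of the N − M added nodes contributes exactly M edges, which gives the M(N − M) edge count noted above.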
• Watts–Strogatz (WS), 1998.
• WS(N, K, P)
• “Small world” model
- High clustering, small diameter
0. Arrange the N nodes in a ring
1. Connect each node to its K/2 nearest neighbors on each side
2. Going clockwise, rewire each edge with probability P (to a uniformly chosen node)
• Has N·K/2 edges → a smaller subset of all N-node graphs
GENERATING GENERAL GRAPHS
Randomly Wired Neural Networks
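The three WS steps can be sketched as below. The tie-breaking details of rewiring (avoiding self-loops and duplicates, keeping the edge otherwise) are this sketch's choices; rewiring changes endpoints but never the edge count.

```python
import random

def ws_graph(n, k, p, seed=0):
    """WS(N, K, P) sketch: ring lattice (each node linked to its K/2
    nearest neighbours on each side), then each edge is rewired with
    probability P to a uniformly chosen node, avoiding self-loops and
    duplicate edges. The N*K/2 edge count is preserved."""
    assert k % 2 == 0 and 0 < k < n
    rng = random.Random(seed)
    norm = lambda u, v: (min(u, v), max(u, v))   # undirected edge key
    edges = {norm(v, (v + d) % n)
             for v in range(n) for d in range(1, k // 2 + 1)}
    for e in sorted(edges):                       # sweep edges in order
        if rng.random() < p:
            u = e[0]
            candidates = [w for w in range(n)
                          if w != u and norm(u, w) not in edges]
            if candidates:
                edges.remove(e)
                edges.add(norm(u, rng.choice(candidates)))
    return sorted(edges)
```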
• A stochastic network generator 𝑔(𝜃, 𝑠).
• The random graph parameters (P for ER, M for BA, and (K, P) for WS) are part of the parameters 𝜃.
• The “optimization” of such a 1- or 2-parameter space is essentially done by trial and error by human designers (line/grid search).
• The accuracy variation of the networks is small across different seeds 𝑠, so the authors perform no random search and report the mean accuracy of multiple random network instances.
DESIGN AND OPTIMIZATION
Randomly Wired Neural Networks
• ImageNet classification
• A small computation regime (MobileNet & ShuffleNet)
• A regular computation regime (ResNet-50/101)
• N nodes and C channels determine network complexity.
• N = 32, C = 79 for the small regime.
• N = 32, C = 109 or 154 for the regular regime.
• Random seeds
• Randomly sample 5 network instances and train them from scratch.
• Report the classification accuracy as “mean ± std” over all 5 network instances.
• Implementation Details
• Train for 100 epochs
• Half-period-cosine shaped learning rate decay and initial learning rate 0.1
• The weight decay is 5e-5
• Momentum 0.9
• Label smoothing regularization with a coefficient of 0.1
ARCHITECTURE DETAILS
Experiments
• All networks train successfully
• ER, BA, and WS all reach mean accuracy > 73% under certain settings
• The variance in accuracy is small (std: 0.2–0.4%)
• Mean accuracy differs across the random generators
IMAGENET CLASSIFICATION
Experiments
• Node removal
• WS
- The mean degradation in accuracy is larger when the output degree of the removed node is higher.
- “Hub” nodes in WS that send information to many nodes are influential.
GRAPH DAMAGE
Experiments
• Edge removal
• If the input degree of an edge’s target node is smaller, removing that edge tends to change a larger portion of the target node’s inputs.
• ER
- Less sensitive to edge removal, possibly because in ER’s definition the wiring of every edge is independent.
GRAPH DAMAGE
Experiments
• Same convolution in all nodes
• Adjust the factor C to keep the complexity of all alternative networks comparable
• The Pearson correlation between any two series in the figure is 0.91–0.98
NODE OPERATIONS
Experiments
• Small computation regime
COMPARISONS
Experiments
*250 epochs for fair comparisons
• Regular computation regime
• Uses a regularization method inspired by the edge-removal analysis: randomly remove an edge whose target node has input degree > 1, with probability 0.1.
COMPARISONS
Experiments
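The edge-removal regularization can be sketched as below. The slide only states the 0.1 probability and the input-degree > 1 condition; the exact sampling procedure (here, independent per-edge drops that never strip a node's last input) is this sketch's guess.

```python
import random

def drop_edges(edges, p=0.1, rng=random):
    """Edge-removal regularization sketch: during training, each edge
    whose target node still has input degree > 1 is dropped with
    probability p, so no node ever loses all of its inputs."""
    in_deg = {}
    for _, v in edges:
        in_deg[v] = in_deg.get(v, 0) + 1
    kept = []
    for u, v in edges:
        if in_deg[v] > 1 and rng.random() < p:
            in_deg[v] -= 1          # account for the dropped input
        else:
            kept.append((u, v))
    return kept
```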
• Larger computation
• Increase the test image size to 320 x 320 without retraining
COMPARISONS
Experiments
• Object detection
• The features learned by randomly wired networks can also transfer.
COMPARISONS
Experiments
• The mean accuracy of these models is competitive with hand-designed networks and with networks optimized via NAS (e.g., NASNet).
• The authors hope that future work exploring new generator designs may yield new, powerful network designs.
• Contribution
• Defines a good search space by focusing on the wiring pattern rather than the layer types
• Shows that finding a good search space alone can produce strong results
• Introduces the concept of a (stochastic) network generator
CONCLUSION
• Search Space
• Ideas for finding even better search spaces
• Search Methods
• Prior knowledge: the random wiring carries no design intent or interpretation
• Research into the properties of high-performing networks would be valuable (a general problem for AutoML)
DISCUSSION
[1] Xie, S., Kirillov, A., Girshick, R., & He, K. (2019). Exploring Randomly
Wired Neural Networks for Image Recognition. arXiv preprint
arXiv:1904.01569.
[2] Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural Architecture
Search: A Survey. Journal of Machine Learning Research, 20(55), 1-21.
[3] Zoph, B., & Le, Q. V. (2017). Neural architecture search with
reinforcement learning. ICLR 2017.
[4] Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning
transferable architectures for scalable image recognition. In
Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 8697-8710).
[5] Lee, J. PR-155: Exploring Randomly Wired Neural Networks for
Image Recognition. https://www.youtube.com/watch?v=qnGm1h365tc
REFERENCES
ANY QUESTIONS?
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 

Exploring Randomly Wired Neural Networks for Image Recognition

  • 1. EXPLORING RANDOMLY WIRED NEURAL NETWORKS FOR IMAGE RECOGNITION 2019.5.7 Yongsu Baek
  • 2. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 2
  • 3. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 3
  • 4. • Automation of Feature Engineering • Moves to “Architecture Engineering” • Manually developed architectures (e.g., AlexNet, DenseNet, …) DEEP LEARNING Neural Architecture Search (NAS)?
  • 5. • Automation of Architecture Engineering • Methods 1) Search Space • Prior knowledge: efficient search vs. human bias • e.g., Cell/Block repetition 2) Search Strategy • e.g., RL, evolutionary methods, random, … 3) Performance Estimation Strategy NEURAL ARCHITECTURE SEARCH Neural Architecture Search (NAS)?
  • 6. DEEP LEARNING TO NAS Neural Architecture Search(NAS)? Design an Individual Network → Design a Network Generator Automation of Feature Engineering → Automation of Architecture Engineering
  • 7. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 7
  • 8. • (NAS) Neural Architecture Search with Reinforcement Learning • Architecture Generator: LSTM • Architecture Search: Reinforcement Learning • 800 GPUs, 28 days • (NASNet) Learning transferable architectures for scalable image recognition • Cell Concept • Transferable architectures • 500 GPUs, 4 days • ENAS, MnasNet, etc. PREVIOUS WORKS
  • 9. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 9
  • 10. • Network Generator 𝑔 𝑔 ∶ Θ ↦ 𝒩 where Θ : a parameter space, 𝒩 : a family of related networks • e.g., in a ResNet generator, 𝒩 is the set of ResNets, and 𝜃 ∈ Θ specifies the number of stages, the number of residual blocks per stage, depth/width/filter sizes, activation types, etc. • Deterministic • Stochastic network generator 𝑔 𝑔 ∶ Θ × 𝑆 ↦ 𝒩 where Θ : a parameter space, 𝑆 : seeds of a pseudo-random number generator, 𝒩 : a family of related networks • e.g., NAS: 𝜃 is the weight matrices of the LSTM, and the output of each LSTM time-step is a probability distribution conditioned on 𝜃 STOCHASTIC NETWORK GENERATOR Randomly Wired Neural Networks
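The deterministic/stochastic distinction can be sketched in a few lines of Python. This is only a toy illustration of the two signatures g(θ) and g(θ, s); the parameter names (`blocks_per_stage`, `num_stages`, `max_blocks`) are hypothetical, not the paper's.

```python
import random

def deterministic_generator(theta):
    """Deterministic g: Theta -> N. The same theta always yields the
    same network spec (here, a toy list of (stage, block-count) pairs)."""
    return [("stage%d" % i, b) for i, b in enumerate(theta["blocks_per_stage"])]

def stochastic_generator(theta, seed):
    """Stochastic g: Theta x S -> N. theta fixes the family; each seed s
    yields a different member (here, a random block count per stage)."""
    rng = random.Random(seed)
    return [("stage%d" % i, rng.randint(1, theta["max_blocks"]))
            for i in range(theta["num_stages"])]
```

The key property is that the same (θ, s) pair always reproduces the same network, which is what makes the stochastic generator a well-defined map rather than an arbitrary random process.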
  • 11. • Turing’s unorganized machines, an early form of randomly connected neural networks • The infant human cortex exhibits small-world properties • Random graph modeling has been used as a tool to study the neural networks of human brains • Random graph models are an effective tool for modeling and analyzing real-world graphs, e.g., social networks, the world wide web, citation networks MOTIVATION Randomly Wired Neural Networks
  • 12. 1. Generating general graphs (DAGs) 2. Mapping from a general graph to neural network operations • Edge operations - Data flow • Node operations - Aggregation: combines the input data via a weighted sum with learnable, positive weights - Transformation: a ReLU-convolution-BN triplet - Distribution: the same copy of the transformed data is sent out along every out-edge 3. Attaching input and output nodes 4. Stages METHODOLOGY Randomly Wired Neural Networks
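The node operation can be sketched as follows. This is a scalar stand-in: the real transformation is a ReLU-conv-BN triplet on feature tensors, and the sigmoid is one common way (assumed here) to keep the learnable aggregation weights positive.

```python
import math

def sigmoid(w):
    """Squash a raw learnable weight to a positive value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-w))

def node_op(inputs, weights):
    """One node: aggregate in-edges by a weighted sum with positive
    (sigmoid-squashed) learnable weights, transform the result, and
    send the same copy out along every out-edge."""
    aggregated = sum(sigmoid(w) * x for w, x in zip(weights, inputs))
    transformed = max(0.0, aggregated)  # ReLU stand-in for the ReLU-conv-BN triplet
    return transformed  # distribution: an identical copy flows to all out-edges
```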
  • 13. • Additive aggregation maintains the same number of output channels as input channels. • Transformed data can be combined with the data from any other nodes. • Fixing the channel count keeps the FLOPs and parameter count unchanged for each node, regardless of its input and output degrees. • The overall FLOPs and parameter count of a graph are roughly proportional to the number of nodes and nearly independent of the number of edges • This enables the comparison of different graphs without inflating/deflating model complexity. Differences in task performance are therefore reflective of the properties of the wiring pattern. NICE PROPERTIES OF NODE OPERATION Randomly Wired Neural Networks
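A back-of-the-envelope count illustrates why complexity is independent of node degree: the transformation maps C channels to C channels regardless of how many edges enter or leave, and the per-edge aggregation scalars are negligible. A plain 3×3 convolution is assumed here for simplicity; the paper's exact convolution type may differ.

```python
def node_param_count(c, k=3):
    """Parameters of one node's transformation with C input and C output
    channels: a k x k convolution plus BN scale/shift. The count depends
    only on C and k, not on the node's input or output degree."""
    conv = c * c * k * k  # full k x k convolution, C -> C channels
    bn = 2 * c            # batch-norm gamma and beta
    return conv + bn
```

Because every node costs the same, total parameters and FLOPs scale with the node count N, which is what allows fair comparison across differently wired graphs.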
  • 14. • Input • The same copy of the data flow is sent to all original input nodes • Output • The unweighted average of all original output nodes ATTACHING INPUT AND OUTPUT NODES Randomly Wired Neural Networks Extra input node Extra output node
  • 15. • An entire network consists of multiple stages. • One random graph represents one stage. • For all nodes that are directly connected to the input node, their transformations are modified to have a stride of 2. • The channel count in a random graph is increased 2× when going from one stage to the next. STAGES Randomly Wired Neural Networks RandWire Architecture
  • 17. • Erdős–Rényi (ER), 1959. • ER(N, P) • Has N nodes • An edge between two nodes is included with probability P. • The ER generation model has only a single parameter P and is denoted ER(P). • Any graph with N nodes has a non-zero probability of being generated by the ER model. GENERATING GENERAL GRAPHS Randomly Wired Neural Networks
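A minimal ER(N, P) sampler, returning undirected edges as ordered pairs i < j (the sampled graph is later mapped to a DAG and to node operations as described above):

```python
import random

def er_graph(n, p, seed=0):
    """Erdos-Renyi ER(N, P): each of the N*(N-1)/2 possible edges is
    included independently with probability P."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]
```

Since every edge is an independent coin flip, any N-node graph has non-zero probability, but the expected edge count is controlled by the single parameter P.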
  • 18. • Barabási–Albert (BA), 1999. • BA(N, M), 1 ≤ M < N: initialize the graph G as M nodes without any edges; iterate: add a node v_t and, until v_t has M edges, connect v_t to a node v in G with P(v_t and v are connected) ∝ degree(v); repeat until G has N nodes. • Has exactly M(N−M) edges → a subset of all graphs with N nodes GENERATING GENERAL GRAPHS Randomly Wired Neural Networks
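A sketch of BA(N, M) preferential attachment following the procedure above, using a simple roulette-wheel sampler over node degrees (real implementations are faster; this favors clarity):

```python
import random

def ba_graph(n, m, seed=0):
    """Barabasi-Albert BA(N, M): start from M isolated nodes; every new
    node v attaches M edges, preferring high-degree existing nodes."""
    assert 1 <= m < n
    rng = random.Random(seed)
    edges, degree = [], [0] * n
    for v in range(m, n):
        targets = set()
        while len(targets) < m:
            total = sum(degree[:v])
            if total == 0:
                u = rng.randrange(v)       # all degrees zero: pick uniformly
            else:
                r = rng.uniform(0, total)  # roulette wheel over degrees
                acc = 0.0
                for u in range(v):
                    acc += degree[u]
                    if acc >= r:
                        break
            targets.add(u)
        for u in targets:                  # wire v to its M distinct targets
            edges.append((u, v))
            degree[u] += 1
            degree[v] += 1
    return edges
```

Every generated graph has exactly M(N−M) edges, so BA can only ever reach a strict subset of all N-node graphs, one example of a prior baked into the generator despite the randomness.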
  • 19. • Watts–Strogatz (WS), 1998. • WS(N, K, P) • “Small world” model: high clustering, small diameter 0. Arrange N nodes in a ring 1. Connect each node to its K/2 neighbours on each side 2. Traverse clockwise, rewiring each edge with probability P (uniformly) • Has N⋅K/2 edges → a smaller subset of all N-node graphs GENERATING GENERAL GRAPHS Randomly Wired Neural Networks
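The three WS steps translate roughly to the sketch below. Rewiring here replaces an edge (u, v) with (u, w) for a uniformly chosen w that avoids self-loops and duplicate edges, following the rewiring definition in the editor's notes.

```python
import random

def ws_graph(n, k, p, seed=0):
    """Watts-Strogatz WS(N, K, P): ring lattice of N nodes, each linked
    to its K/2 neighbours on either side, then each edge rewired with
    probability P (avoiding self-loops and duplicate edges)."""
    assert k % 2 == 0 and 0 < k < n
    rng = random.Random(seed)
    # step 0-1: ring lattice (each node linked to K/2 clockwise neighbours)
    edges = [(v, (v + off) % n) for v in range(n)
             for off in range(1, k // 2 + 1)]
    current = {frozenset(e) for e in edges}
    # step 2: traverse edges, rewiring each with probability P
    for idx, (u, v) in enumerate(edges):
        if rng.random() < p:
            choices = [w for w in range(n)
                       if w != u and frozenset((u, w)) not in current]
            if choices:
                w = rng.choice(choices)
                current.remove(frozenset((u, v)))
                current.add(frozenset((u, w)))
                edges[idx] = (u, w)
    return edges
```

Regardless of P, the graph keeps exactly N⋅K/2 edges, so WS, like BA, reaches only a constrained subset of all N-node graphs.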
  • 20. • A stochastic network generator 𝑔(𝜃, 𝑠). • The random graph parameters, P, M, and (K, P) in ER, BA, and WS respectively, are part of the parameters 𝜃. • The “optimization” of such a 1- or 2-parameter space is essentially done by trial-and-error by human designers (line/grid search). • The accuracy variation of the networks is small across different seeds 𝑠, so the authors perform no random search and report the mean accuracy of multiple random network instances. DESIGN AND OPTIMIZATION Randomly Wired Neural Networks
  • 21. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 21
  • 22. • ImageNet Classification • A small computation regime: MobileNet & ShuffleNet • A regular computation regime: ResNet-50/101 • N nodes and C channels determine network complexity. • N = 32, C = 79 for the small regime. • N = 32, C = 109 or 154 for the regular regime. • Random Seeds • Randomly sample 5 network instances and train them from scratch. • Report the classification accuracy as “mean±std” over all 5 network instances. • Implementation Details • Train for 100 epochs • Half-period-cosine learning rate decay with initial learning rate 0.1 • Weight decay 5e-5 • Momentum 0.9 • Label smoothing regularization with a coefficient of 0.1 ARCHITECTURE DETAILS Experiments
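The half-period-cosine schedule with initial learning rate 0.1 over 100 epochs can be written as below. This is the standard formulation; whether the paper anneals exactly to zero at the final epoch is an assumption.

```python
import math

def half_cosine_lr(epoch, total_epochs=100, base_lr=0.1):
    """Half-period cosine decay: the learning rate falls smoothly from
    base_lr at epoch 0 to 0 at total_epochs (half a cosine period)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```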
  • 23. • All networks train successfully • ER, BA, and WS all reach mean accuracy > 73% under certain settings • The variance of accuracy is small (std: 0.2–0.4%) • Mean accuracy differs across random generators IMAGENET CLASSIFICATION Experiments
  • 24. • Node removal • WS: the mean degradation of accuracy is larger when the output degree of the removed node is higher → “hub” nodes in WS that send information to many nodes are influential GRAPH DAMAGE Experiments
  • 25. • Edge removal • If the input degree of an edge’s target node is smaller, removing this edge tends to change a larger portion of the target node’s inputs. • ER is less sensitive to edge removal, possibly because in ER’s definition the wiring of every edge is independent. GRAPH DAMAGE Experiments
  • 26. • Same convolution in all nodes • Adjust the factor C to keep the complexity of all alternative networks comparable • The Pearson correlation between any two series in the figure is 0.91–0.98 NODE OPERATIONS Experiments
  • 27. • Small computation regime COMPARISONS Experiments *250 epochs for fair comparisons
  • 28. • Regular computation regime • Uses a regularization method inspired by the edge removal analysis: randomly remove one edge whose target node has input degree > 1, with probability 0.1. COMPARISONS Experiments
  • 29. • Larger computation • Increase the test image size to 320×320 without retraining COMPARISONS Experiments
  • 30. • Object detection • The features learned by randomly wired networks can also transfer. COMPARISONS Experiments
  • 31. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 31
  • 32. • The mean accuracy of these models is competitive with hand-designed networks and those optimized by NAS (NASNet). • The authors hope that future work exploring new generator designs may yield new, powerful network designs. • Contribution • Defines the search space well by focusing on wiring patterns rather than layer types • Shows that finding a good search space alone can produce strong results • Introduces the concept of a (stochastic) network generator CONCLUSION
  • 33. 1.Neural Architecture Search(NAS)? 2.Previous Works 3.Randomly Wired Neural Networks 4.Experiments 5.Conclusion 6.Discussion 33
  • 34. • Search Space • Ideas for finding a better search space • Search Methods • Prior knowledge: no intent or interpretation behind the discovered wiring • Research on the characteristics of high-performing networks would be valuable (a general problem of AutoML) DISCUSSION
  • 35. [1] Xie, S., Kirillov, A., Girshick, R., & He, K. (2019). Exploring Randomly Wired Neural Networks for Image Recognition. arXiv preprint arXiv:1904.01569. [2] Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural Architecture Search: A Survey. Journal of Machine Learning Research, 20(55), 1-21. [3] Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. ICLR 2017. [4] Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8697-8710). [5] Jinwon, L. PR-155: Exploring Randomly Wired Neural Networks for Image Recognition. https://www.youtube.com/watch?v=qnGm1h365tc REFERENCES

Editor's Notes

  1. Deep learning helped automate feature engineering, but this soon turned into “architecture engineering,” where network architectures are designed by hand. Many network architectures have been developed, but this is time-consuming and error-prone work.
  2. The space can be restricted, e.g., to DNNs with 7 layers. Incorporating prior knowledge (convolutions work well, 3x3 convs are good, BN helps, etc.) can shrink the search space and make search more efficient, but it can also become a human bias that hinders the discovery of novel architectures. A good example: learning a Cell/Block and repeating it transfers to other data while reducing the space and maintaining good performance. RL, random picking, evolutionary methods, etc., can be used. The search space is usually exponentially large or unbounded → it must be searched well; exploration–exploitation trade-off. Typically one simply runs training–validation, but there have been recent attempts to make this process more efficient.
  3. “Connectionist” Approach
  4. Activation functions such as Swish and augmentations such as AutoAugment have also been discovered this way. As a result, only one form of convolution and fixed layer sizes end up being used. Prior works mostly adjusted the search space at the layer level or focused on search strategies; the authors are curious about the effect of wiring. (From here on, NAS refers to Neural Architecture Search with Reinforcement Learning.)
  5. If ReLU were placed last, the positive weights would keep adding positive values and activations would keep growing → BN is placed last to keep this in check.
  6. Issue: graphs with a special structure have low probability, so on average similar-looking graphs are generated. If P > ln(N)/N, the graph is a single component (connected) with high probability.
  7. This gives one example of how an underlying prior can be introduced by the graph generator in spite of randomness: nodes with many connections are more likely to gain new ones.
  8. “Rewiring” is defined as uniformly choosing a random node that is not v and that is not a duplicate edge.
  9. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  10. We randomly remove one node (top) or remove one edge (bottom) from a graph after the network is trained, and evaluate the loss (delta) in accuracy on ImageNet. Red circle: mean; gray bar: median; orange box: interquartile range; blue dot: an individual damaged instance.
  11. It would have been nice to also run an experiment comparing the effect of the regularization, but it is too hard for me to reproduce.