© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thomas Delteil, Applied Scientist @ AWS Deep Engine
APJCTech Summit 2018, Macau
Debugging MXNet Gluon models
And other performance tricks
Remote debugging with PyCharm
Visualizing deep learning
Performance tricks and gotchas
Apache MXNet history
• 2015: a CMU project of PhD students and the Distributed Machine Learning Community (DMLC)
• 2017: the MXNet Gluon imperative API is released
Tianqi Chen (UW)
Mu Li (Amazon AI)
Yutian Li (Stanford)
Min Lin (MILA)
Naiyan Wang (TuSimple)
Minjie Wang (NYU CS)
Tianjun Xiao (Tesla)
Bing Xu (Apple AI)
Chiyuan Zhang (Google Brain)
Zheng Zhang (MSR Asia)
Imperative vs. symbolic computational graphs
• Symbolic: define, compile, run
• Imperative: define-by-run in the host language
(Figure: Inception model)
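The difference can be illustrated with a toy in plain Python. This is a sketch, not MXNet's engine: the imperative function computes each value as its line runs, while the symbolic path first describes the graph as data, then compiles it into a runner before any input exists. The op names and the compile_graph helper are hypothetical. In Gluon, net.hybridize() is what moves a HybridBlock from the first style to a cached graph of the second.

```python
import math

# Imperative (define-by-run): each statement executes as soon as it is reached,
# so `h` can be printed or inspected in a debugger before tanh is applied.
def imperative(x):
    h = 2.0 * x
    return math.tanh(h)

# Symbolic (define, compile, run): the graph is first described as data ...
graph = [("mul_const", 2.0), ("tanh", None)]

# ... then compiled into a runner before any input value exists ...
def compile_graph(graph):
    kernels = {
        "mul_const": lambda v, c: v * c,
        "tanh": lambda v, _unused: math.tanh(v),
    }
    steps = [(kernels[name], arg) for name, arg in graph]

    def run(x):
        for kernel, arg in steps:
            x = kernel(x, arg)
        return x

    return run

# ... and only then run on data.
run = compile_graph(graph)
print(imperative(0.3), run(0.3))
```

The symbolic runner cannot be stepped through line by line on real values, which is the debuggability trade-off; in exchange, a framework can optimize the whole graph before running it.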
Imperative > Symbolic
Debuggable
Fast to prototype
Hybridizable
Interactive Debugging
Shapes
Values
Gradients
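One way to catch the value problems listed in the speaker notes (NaN values, a loss that suddenly explodes) is to scan recorded loss values after each batch. The helper below is a hypothetical sketch in plain Python. With MXNet, the value first has to be brought to the host, for example with loss.mean().asscalar(), which also forces a synchronization. The 10x spike threshold is an arbitrary assumption.

```python
import math

def first_anomaly(losses, spike_factor=10.0):
    """Return (batch_index, reason) for the first bad loss value, or None.

    reason is "nan/inf" for non-finite values and "spike" when a loss jumps
    to more than spike_factor times the previous one (assumed threshold)."""
    previous = None
    for i, loss in enumerate(losses):
        if not math.isfinite(loss):
            return i, "nan/inf"
        if previous is not None and previous > 0 and loss > spike_factor * previous:
            return i, "spike"
        previous = loss
    return None

print(first_anomaly([2.3, 1.9, 1.7, float("nan")]))
```

Once the failing batch index is known, a breakpoint on that batch lets the shapes, values, and gradients of the inputs be inspected interactively.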
YouTube tutorial
Visualizing Deep Learning
Network Architecture
MXNet native code (#1) print(net)
MXNet native code (#2) mx.viz.plot_network(sym)
MXNet native code (#3) mx.viz.print_summary(sym)
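The per-layer parameter counts that print_summary reports follow simple formulas. The sketch below reproduces them for 2-D convolutions (assuming groups=1) and dense layers; the helper names and the three-layer network are hypothetical, chosen only to show the arithmetic.

```python
def conv2d_params(in_channels, channels, kernel_size, use_bias=True):
    # One kernel_size x kernel_size filter per (input, output) channel pair
    # (groups=1 assumed), plus one bias per output channel.
    weights = in_channels * channels * kernel_size * kernel_size
    return weights + (channels if use_bias else 0)

def dense_params(in_units, units, use_bias=True):
    # Full weight matrix plus one bias per output unit.
    return in_units * units + (units if use_bias else 0)

# Hypothetical small network: two 3x3 convolutions, then a dense classifier
# on a 64 x 6 x 6 feature map.
layers = [
    ("conv1", conv2d_params(3, 32, 3)),
    ("conv2", conv2d_params(32, 64, 3)),
    ("dense", dense_params(64 * 6 * 6, 10)),
]
for name, count in layers:
    print(f"{name:<8}{count:>10,}")
print(f"{'total':<8}{sum(count for _, count in layers):>10,}")
```

Working the numbers by hand like this is a quick sanity check when a summary shows an unexpectedly large layer, which is often a dense layer sitting on a feature map that was not downsampled enough.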
Netron (online tool)
MXBoard sw.add_graph(net) (sw: an mxboard.SummaryWriter)
System performance
GPU: gpu_monitor (GitHub)
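Without gpu_monitor, a rough substitute is to poll the query mode of nvidia-smi. This is a minimal sketch, assuming the NVIDIA driver and nvidia-smi are installed. The parser is kept separate from the subprocess call so it can be exercised on sample text. Sustained utilization well below 100% during training is a hint that the GPU is being starved.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_stats(csv_text):
    # Each line looks like "0, 45, 1234, 16160": index, util %, used MiB, total MiB.
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = (int(field) for field in line.split(","))
        stats.append({"gpu": index, "util_pct": util,
                      "mem_used_mib": used, "mem_total_mib": total})
    return stats

def read_gpu_stats():
    # Requires an NVIDIA driver with nvidia-smi on the PATH.
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() about once per second while training runs gives a crude utilization trace.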
CPU / RAM: run top, then press i to hide idle processes
Training metrics
MXBoard
MXBoard Scalars
MXBoard Images
Console
Performance
Tips and tricks
Throughput: 130 samples/sec baseline, then 1.25x, 2.41x, 2.46x, 2.53x, 3.84x
(Chart: speedup and GPU utilization)
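Speedups like these are ratios of measured throughput. A minimal samples/sec meter is sketched below. The class name is hypothetical, and the clock is injectable so the meter can be tested. Because MXNet executes operators asynchronously, take the reading after a synchronization point (for example mx.nd.waitall()). Otherwise the meter only measures how fast work is queued.

```python
import time

class Throughput:
    """Counts samples and reports samples/sec since the last reset()."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # injectable so the meter can be tested deterministically
        self.reset()

    def reset(self):
        self.start = self.clock()
        self.samples = 0

    def update(self, batch_size):
        self.samples += batch_size

    def rate(self):
        elapsed = self.clock() - self.start
        return self.samples / elapsed if elapsed > 0 else 0.0
```

Dividing a new reading by the 130 samples/sec baseline gives the speedup factor.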
Environment
mxnet-mkl (Intel MKL-DNN CPU build) vs. plain mxnet: 32x
I/O Bound
→ GPU Starvation
#1 Asynchronously pre-fetch data when CPU usage is low (1.25x)
DataLoader(dataset, batch_size, num_workers=CPU_COUNT-3), where CPU_COUNT = multiprocessing.cpu_count()
#2 Preprocess data offline when the CPU is fully used
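Gluon's DataLoader prefetches with worker processes. The idea can be sketched in plain Python with a background thread and a bounded queue, so that the next batches are loaded while the current one is being consumed. This is a hypothetical helper, not Gluon's implementation. Threads help for I/O-bound loading, but CPU-heavy Python preprocessing is limited by the GIL, which is why DataLoader uses processes.

```python
import os
import queue
import threading

def prefetch(iterable, depth=2):
    """Yield items from iterable while a background thread loads up to depth ahead."""
    q = queue.Queue(maxsize=depth)

    def producer():
        try:
            for item in iterable:
                q.put((False, item))
        except BaseException as exc:  # hand errors to the consumer instead of hanging
            q.put((True, exc))
            return
        q.put((True, None))  # end-of-data marker

    threading.Thread(target=producer, daemon=True).start()
    while True:
        done, payload = q.get()
        if done:
            if payload is not None:
                raise payload
            return
        yield payload

# Same heuristic as the slide: leave 3 cores for the training loop and the OS.
num_workers = max(1, (os.cpu_count() or 1) - 3)
```

The queue bound (depth) caps memory use: the loader can only run depth batches ahead of training.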
GPU → CPU memcopy synchronization leaves the GPU idling
#3 Smart synchronization calls: synchronize only when needed (2.46x)
→ Matters most for small networks
(Diagram: per-batch timeline, Copy to GPU → Forward/Backward → Metric, repeated for each batch)
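The effect of synchronization frequency can be modeled with a toy in plain Python. A single-worker ThreadPoolExecutor stands in for the GPU's work queue, and Future.result() plays the role of a blocking read such as asnumpy() or asscalar(), which metric updates commonly perform. Reading back every batch forces one wait per batch. Reading back every sync_every batches lets work keep queuing in between. The names are hypothetical; the actual gain depends on the network. As the slide notes, small networks benefit most, presumably because their per-batch compute is short relative to the synchronization overhead.

```python
from concurrent.futures import ThreadPoolExecutor

def train(batches, step, sync_every):
    """Run step(batch) on a single worker standing in for the GPU queue.

    Results are read back (a blocking call) only every sync_every batches.
    Returns (total_loss, number_of_sync_points)."""
    pending, total, syncs = [], 0.0, 0
    with ThreadPoolExecutor(max_workers=1) as device:
        for i, batch in enumerate(batches, start=1):
            pending.append(device.submit(step, batch))  # enqueue, do not wait
            if i % sync_every == 0:
                total += sum(f.result() for f in pending)  # blocking read-back
                pending.clear()
                syncs += 1
        if pending:  # flush the last partial group
            total += sum(f.result() for f in pending)
            syncs += 1
    return total, syncs

def toy_step(batch):
    # Hypothetical stand-in for forward/backward returning a loss value.
    return 0.5 * batch

print(train(range(100), toy_step, sync_every=1))   # one sync per batch
print(train(range(100), toy_step, sync_every=50))  # two syncs in total
```

Both runs produce the same total loss; only the number of points where the host waits changes.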
Execution engine
Imperative → Symbolic (2.41x)
net.hybridize()
Hyperparameters
Batch size (2.56x)
Optimizer
Performance metric: time to accuracy
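Time to accuracy can be read directly off a training log of (elapsed seconds, validation accuracy) pairs. This is a minimal sketch with a hypothetical helper. It captures why the metric matters: an optimizer with lower samples/sec can still win if it reaches the target in fewer epochs.

```python
def time_to_accuracy(log, target):
    """Return the first elapsed time at which accuracy reaches target, or None.

    log is a list of (elapsed_seconds, validation_accuracy) pairs in time order."""
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None

# Hypothetical log, for illustration only.
log = [(60.0, 0.50), (120.0, 0.80), (180.0, 0.90)]
print(time_to_accuracy(log, 0.80))
```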
Mixed precision training
float32 → float16 (3.84x)
net.cast("float16")
data = data.astype("float16")  # the input batch must be cast to float16 too
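float16 has only 10 mantissa bits, so small weight updates can disappear entirely. The sketch below uses the struct module's half-precision "e" format to round values the way float16 and float32 storage would. It illustrates why mixed precision training usually keeps a float32 master copy of the weights. In MXNet, the optimizer's multi_precision=True option does this. The update size of 1e-4 is an arbitrary example.

```python
import struct

def as_fp16(x):
    # Round a Python float to the nearest IEEE 754 half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def as_fp32(x):
    # Round a Python float to the nearest IEEE 754 single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

update = 1e-4
w16 = as_fp16(1.0)
w32 = as_fp32(1.0)
for _ in range(10):
    w16 = as_fp16(w16 + update)  # float16 weight: 1.0001 rounds back to 1.0
    w32 = as_fp32(w32 + update)  # float32 master weight keeps each update

print(w16, w32)
```

Near 1.0, consecutive float16 values are about 0.001 apart, so any update smaller than half that gap is rounded away on every step.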
Profiling
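from mxnet import profiler
profiler.set_config(profile_all=True, filename='profile_output.json')
profiler.set_state('run')
…
profiler.set_state('stop')
profiler.dump()  # write the trace; open it in chrome://tracing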
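The MXNet profiler records operator execution in the backend. For Python-side bottlenecks such as data loading or augmentation, the standard library's cProfile offers the same start/stop bracketing. It does not see MXNet's asynchronous C++ operators, so the two tools are complementary. This is a minimal sketch; preprocess is a hypothetical stand-in for real work.

```python
import cProfile
import pstats

def preprocess(n):
    # Hypothetical stand-in for Python-side work such as data augmentation.
    return sum(i * i for i in range(n))

prof = cProfile.Profile()
prof.enable()        # plays the role of profiler.set_state('run')
preprocess(100_000)
prof.disable()       # plays the role of profiler.set_state('stop')

stats = pstats.Stats(prof).sort_stats("cumulative")
stats.print_stats(5)  # five most expensive entries by cumulative time
```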
Conclusion
- Use Gluon to debug and iterate quickly
- Hybridize and optimize for speed
- Know your model: Visualize performance
Thank you!
Follow-up:
tdelteil@
Github.com/thomasdelteil
AWS Deep Engine, Vancouver

Editor's Notes

  • #9 Data loading issue, NaN values, loss exploding suddenly
  • #13 Explain SSH tunnel and TensorBoard
  • #27 22M$ GPU