Debugging and Performance tricks for MXNet Gluon
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thomas Delteil, Applied Scientist @ AWS Deep Engine
APJCTech Summit 2018, Macau
Debugging MXNet Gluon models
And other performance tricks
Remote debugging with PyCharm
Visualizing deep learning
Performance tricks and gotchas
Apache MXNet History
• Started in 2015 as a project of CMU PhD students and the Distributed Machine Learning Community (DMLC)
• 2017: the MXNet Gluon imperative API is released
Tianqi Chen (UW)
Mu Li (Amazon AI)
Yutian Li (Stanford)
Min Lin (MILA)
Naiyan Wang (TuSimple)
Minjie Wang (NYU CS)
Tianjun Xiao (Tesla)
Bing Xu (Apple AI)
Chiyuan Zhang (Google Brain)
Zheng Zhang (MSR Asia)
Imperative vs Symbolic computational graphs
Symbolic
define, compile, run
Imperative
define-by-run in the host language
Inception model
Imperative > Symbolic
Debuggable
Fast to prototype
Hybridizable
Interactive Debugging
Shapes
Values
Gradients
YouTube tutorial
Visualizing Deep Learning
Network Architecture
MXNet native code (#1) print(net)
MXNet native code (#2) mx.viz.plot_network(sym)
MXNet native code (#3) mx.viz.print_summary(sym)
Netron (online tool)
MXBoard sw.add_graph(net)
System performance
GPU: gpu_monitor (GitHub)
CPU / RAM: > top i
Training metrics
MXBoard
MXBoard Scalars
MXBoard Images
Console
Performance
Tips and tricks
Baseline: 130 samples/sec; speedups: 1.25x, 2.41x, 2.46x, 2.53x, 3.84x
GPU utilization
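To put the multipliers in absolute terms, the arithmetic below converts each one to samples/sec. It assumes every factor is measured against the same 130 samples/sec baseline, which the slide does not state explicitly.

```python
BASELINE = 130.0  # samples/sec, from the slide

# Speedup factors as listed on the slide, in order.
speedups = [1.25, 2.41, 2.46, 2.53, 3.84]

# Assumption: each factor is relative to the same baseline run.
throughputs = [round(BASELINE * s, 1) for s in speedups]

for s, t in zip(speedups, throughputs):
    print(f"{s:.2f}x -> {t:.1f} samples/sec")
```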
Environment
mxnet-mkl (32x)
vs
mxnet
I/O Bound
→ GPU Starvation
#1 Asynchronously pre-fetching data (low CPU) (1.25x)
DataLoader(num_workers=CPU_COUNT-3)
#2 Offline preprocessing (full CPU)
GPU → CPU memcopy synchronization idling
#3 Smart synchronization calls (2.46x)
→ Small networks
Figure: timeline of Copy to GPU, Forward/Backward, and Metric steps across successive batches.
Execution engine
Imperative → Symbolic (2.41x)
net.hybridize()
Hyperparameters
Batch size (2.56x)
Optimizer
Performance:
Time to accuracy
Mixed precision training
float32 → float16 (3.84x)
net.cast("float16")
Profiling
profiler.set_state('run')
…
profiler.set_state('stop')
Conclusion
- Use Gluon to debug and iterate quickly
- Hybridize and optimize for speed
- Know your model: Visualize performance
Thank you!
Follow-up:
tdelteil@
Github.com/thomasdelteil
AWS Deep Engine, Vancouver
Editor's Notes
Data loading issue
NaN values
Loss exploding suddenly
Explain SSH tunnel and TensorBoard
22M$ GPU