Glow is a neural-network compiler and execution engine created by Facebook. It takes a high-level graph representation of a neural network and compiles it into efficient machine code for different hardware backends, such as the CPU and OpenCL devices. The key steps in Glow are loading a model, optimizing the graph, lowering it to a low-level IR, scheduling operations to minimize memory usage, generating instructions for the backend, and performing target-specific optimizations. Glow aims to provide a portable way to deploy neural networks across hardware platforms.
11. Glow: Graph Lowering Compiler
Techniques for Neural Networks
May 2, 2018
https://arxiv.org/abs/1805.00907
Facebook
12. Glow: A community-driven approach to
AI infrastructure
Sep 13, 2018
https://code.fb.com/ml-applications/glow-a-community-driven-approach-to-ai
-infrastructure/
Facebook
13. @Scale 2018 Keynote: Glow: A community-driven
approach to AI
September 19, 2018
https://atscaleconference.com/videos/scale-2018-keynote-glow-a-community-driven
-approach-to-ai/
Facebook
27. 1) The graph is either loaded via the graph loader
(from ONNX or Caffe2 format),
or constructed via the C++ interface.
2) The graph is differentiated if needed.
3) The graph is optimized.
4) Linear algebra node lowering takes place.
5) Additional rounds of optimizations occur,
both target independent and target specific.
6) The graph is scheduled into a linear sequence of nodes
that minimizes memory usage.
7) IRGen converts the low-level graph into instructions.
8) Low-level IR optimizations are performed.
9) Backend-specific optimizations
and code generation are performed.
https://github.com/pytorch/glow/blob/master/docs/IR.md
36. void glow::runBatch(ExecutionEngine &EE, size_t iterations,
size_t &sampleCounter, llvm::ArrayRef<Variable *> vars,
llvm::ArrayRef<Tensor *> inputs) {
// Every variable's first dimension is the batch size.
size_t batchSize = vars[0]->getType()->dims()[0];
for (size_t i = 0; i < iterations; i++) {
// Pick the next slice from each input tensor, wrapping around via
// sampleCounter % dims()[0], and copy it (copyConsecutiveSlices)
// into the corresponding variable's backing tensor.
glow::updateVariablesFromBatch(vars, inputs, sampleCounter);
// Run a single pass over the network.
EE.run();
sampleCounter += batchSize;
}
}
glow::runBatch
https://github.com/pytorch/glow/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp
37. void ExecutionEngine::run(Context &ctx) {
assert(function_ && "No function has been compiled");
// Make sure that the context has backing tensors for all placeholders.
ctx.allocate(M_.getPlaceholders());
function_->setupRuns();
function_->beforeRun(ctx);
function_->execute();
function_->afterRun(ctx);
function_->tearDownRuns();
}
ExecutionEngine::run
https://github.com/pytorch/glow/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp
42. void ExecutionEngine::optimizeFunction(CompilationMode mode,
Function *F) {
// Verify the function pre-optimization/lowering.
F->verify();
// Optimize the graph.
::glow::optimize(F, mode);
// Allow the backend to transform the graph prior to lowering.
if (backend_->transformPreLowering(F, mode)) {
// Optimize the graph again after the backend transformation.
// In particular, DCE is very likely to be useful.
::glow::optimize(F, mode);
}
ExecutionEngine::optimizeFunction
https://github.com/pytorch/glow/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp
43. // Lower the graph into a sequence of low-level linear algebra operations.
::glow::lower(F, *backend_);
// Optimize the graph again.
::glow::optimize(F, mode);
// Allow the backend to transform the graph after lowering.
if (backend_->transformPostLowering(F, mode)) {
// Optimize the graph again after the backend transformation.
// In particular, DCE is very likely to be useful.
::glow::optimize(F, mode);
}
}
ExecutionEngine::optimizeFunction
https://github.com/pytorch/glow/blob/master/lib/ExecutionEngine/ExecutionEngine.cpp
70. std::unique_ptr<IRFunction>
glow::generateAndOptimizeIR(Function *F, bool shouldShareBuffers) {
auto IR = llvm::make_unique<IRFunction>(F);
// Generate the IR.
IR->generateIR();
// Optimize it, honoring the backend's buffer-sharing policy.
::glow::optimize(*IR, shouldShareBuffers);
return IR;
}
IR generation and IR optimization using the backend
https://github.com/pytorch/glow/blob/master/lib/Optimizer/IROptimizer.cpp
71. void glow::optimize(IRFunction &M, CompilationMode mode, const Backend &B) {
M.verify();
if (!optimizeIR)
return;
performPeepholeOptimizations(M);
eliminateDeadStores(M);
// Replace applicable InsertTensors and ExtractTensors with TensorViews.
optimizeInserts(M);
optimizeExtracts(M);
if (B.shouldShareBuffers()) // Reuse buffers from previous operations.
shareBuffers(M);
IR optimization
https://github.com/pytorch/glow/blob/master/lib/Optimizer/IROptimizer.cpp#L1602
72. performPeepholeOptimizations(M);
hoistDealloc(M); // Shorten the lifetime of buffers.
sinkAllocas(M);
eliminateDeadStores(M); // Perform Dead Store Elimination.
deleteDeadAllocs(M);
makeWeightsConst(M); // Turn read-only weights into constant weights.
performDebugInstrumentation(M);
if (dumpOptMod) // Print the module to stdout if requested.
M.dump();
M.verify();
}
IR optimization
https://github.com/pytorch/glow/blob/master/lib/Optimizer/IROptimizer.cpp#L1596
75. class InterpreterFunction final : public CompiledFunction {
/// The IR to be executed.
std::unique_ptr<IRFunction> F_;
/// Maps values to Tensors, that are owned by this class.
std::unordered_map<const Value *, Tensor *> tensors_;
/// Maps values to Tensors, that are *not* owned by this class.
std::unordered_map<const Value *, Tensor *> externalTensors_;
public:
InterpreterFunction(std::unique_ptr<IRFunction> F, const Context &ctx);
~InterpreterFunction() override;
void execute() override;
InterpreterFunction
https://github.com/pytorch/glow/blob/master/lib/Backends/Interpreter/InterpreterFunction.h#L43