3. • MLIR is intended to be a hybrid IR which can support multiple different requirements in a unified infrastructure. For
example, this includes:
• The ability to represent all TensorFlow graphs, including dynamic shapes, the user-extensible op ecosystem,
TensorFlow variables, etc.
• Optimizations and transformations typically done on a TensorFlow graph, e.g. in Grappler.
• Quantization and other graph transformations done on a TensorFlow graph or the TF Lite representation.
• Representation of kernels for ML operations in a form suitable for optimization.
• Ability to host high-performance-computing-style loop optimizations across kernels (fusion, loop interchange,
tiling, etc.) and to transform memory layouts of data.
• Code generation "lowering" transformations such as DMA insertion, explicit cache management, memory tiling,
and vectorization for 1D and 2D register architectures.
• Ability to represent target-specific operations, e.g. the MXU on TPUs.
• non-goals:
• low level machine code generation algorithms (like register allocation and instruction scheduling)
• MLIR as a source language that end-users would themselves write kernels in (analogous to CUDA C++)
https://github.com/tensorflow/mlir/blob/master/README.md
4. • Entire TensorFlow graph? Nope, the “tf” dialect isn’t public yet
• Initial MLIR code landed in the TensorFlow repo on June 28th, 2019
• Early TF, TFLite and XLA support: floating-point MobilenetV1 TF .pb -> TFLite flatbuffer conversion works
• No, quantized ones don’t work yet, although many components are there
• Simple quant, fxp, affine, and vector code is there
• So it’s possible to start exploring tiling and other techniques with affine, vector, and other dialects
• More GPU support, including Vulkan SPIR-V
• Low-level code generation
• MLIR relies on LLVM and other existing backends
• Where to start
• MLIR’s git repo has
• links to 3 slide decks, one of which is the tutorial from EuroLLVM 2019
• Docs for the Toy language and the linear algebra (linalg) dialect
• TensorFlow MLIR: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir
5. TF .pb -> TFLite .tflite
• build TensorFlow MLIR related binaries
bazel build --config opt tensorflow/compiler/mlir/...
• get your model and extract the frozen .pb into /tmp, e.g.,
wget http://download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz
• convert it
./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT \
  -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_frozen.pb --tf-input-arrays=input -o /tmp/foo.tflite
• yes, it works like a charm. Nope, not for the quantized one: neither
./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_QUINT8 \
  -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/bar.tflite
nor
./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate -tf-input-shapes=1,224,224,3 -tf-input-data-types=DT_FLOAT \
  -tf-output-arrays=MobilenetV1/Predictions/Reshape_1 /tmp/mobilenet_v1_1.0_224_quant_frozen.pb --tf-input-arrays=input -o /tmp/bar.tflite \
  --tf-inference-type=TF_QUINT8
works
6. How does the converter work? (a rough tf-opt sketch follows the list)
• Import from GraphDef, in .pb or .pbtxt format, into MLIR
• Raise control-flow graph. Converts TF Control Flow dialect to TF dialect.
• The Canonicalization pass iteratively applies canonicalization transformations in a
greedy way until no further changes occur. Canonicalization includes constant
folding.
• The Legalize pass converts TensorFlow operations to TensorFlow Lite ones. The
operations that cannot be mapped to TensorFlow Lite dialect are left as TensorFlow
operations. Unsupported op handling follows the proposed TFLite mechanism.
• Optimizations are performed in both the TF and TFLite dialects, aiming for small size
and high performance (among the core value propositions of TensorFlow Lite models).
• The Export pass writes out TensorFlow Lite FlatBuffer format. This pass operates on
MLIR TensorFlow Lite dialect and is simple/direct translation.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/mlir/lite/README.md
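• Sketch: the individual stages can also be driven by hand with the tf-opt tool (built by the same bazel command as on slide 5). The pass names below are my reading of the PassRegistration strings under tensorflow/compiler/mlir/lite/transforms/ and may change, so check tf-opt -help first; in practice the control-flow raising step from the second bullet may also be needed before legalization.
./bazel-bin/tensorflow/compiler/mlir/tf-opt \
  -canonicalize -tfl-prepare-tf -tfl-legalize-tf -tfl-optimize \
  /tmp/mobilenet_v1.mlir -o /tmp/mobilenet_v1_tfl.mlir
Here /tmp/mobilenet_v1.mlir is assumed to be the output of a --graphdef-to-mlir import (see the next slide).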
7. tf-mlir-translate
• graphdef -> mlir (an example invocation follows the help output below)
$ ./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --help
OVERVIEW: MLIR translation driver
USAGE: tf-mlir-translate [options] <input file>
OPTIONS:
Color Options:
--color - Use colors in output (default=autodetect)
General options:
--mlir-max-pattern-match-iterations=<uint> - Max number of iterations scanning the functions for pattern match
--mlir-pretty-debuginfo - Print pretty debug info in MLIR output
--mlir-print-debuginfo - Print debug info in MLIR output
-o=<filename> - Output filename
--remarks-yaml-string-table -
Translation to perform
--deserialize-spirv - deserialize-spirv
--graphdef-to-mlir - graphdef-to-mlir
--graphdef-to-splatted-mlir - graphdef-to-splatted-mlir
--mlir-to-graphdef - mlir-to-graphdef
--mlir-to-llvmir - mlir-to-llvmir
--mlir-to-nvvmir - mlir-to-nvvmir
--serialize-spirv - serialize-spirv
--test-only-mlir-to-tf-nodedef - test-only-mlir-to-tf-nodedef
--tf-debug-info=<string> - Path to the debug info file of the input graph def.
--tf-inference-type=<string> - Sets the type of real-number arrays in the output file. Only allows float and quantized types
--tf-input-arrays=<string> - Input tensor names, separated by ','
--tf-input-data-types=<string> - Input tensor data types, separated by ','
--tf-input-max-values=<string> - Sets the upper bound of the input data. Separated by ','; Each entry in the list should match
an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type.
--tf-input-min-values=<string> - Sets the lower bound of the input data. Separated by ','; Each entry in the list should match
an entry in -tf-input-arrays. This is used when -tf-inference-type is a quantized type.
--tf-input-shapes=<string> - Input tensor shapes. Shapes for different tensors are separated by ':', and dimension sizes
for the same tensor are separated by ','
--tf-output-arrays=<string> - Output tensor names, separated by ','
--tf-prune-unused-nodes - Prune unused nodes in the input graphdef
--time-trace-granularity=<uint> - Minimum time granularity (in microseconds) traced by time profile
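• Example import, reusing the MobilenetV1 graph from slide 5. All flags are taken from the help above; the input/output array names are the same ones used earlier and will differ for other models:
./bazel-bin/tensorflow/compiler/mlir/tensorflow/tf-mlir-translate --graphdef-to-mlir \
  --tf-input-arrays=input --tf-input-data-types=DT_FLOAT --tf-input-shapes=1,224,224,3 \
  --tf-output-arrays=MobilenetV1/Predictions/Reshape_1 \
  /tmp/mobilenet_v1_1.0_224_frozen.pb -o /tmp/mobilenet_v1.mlir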
10. TensorFlow Dialects
• More on TensorFlow dialects:
• tf: the main dialect, representing the regular operations in a TensorFlow graph (the ones that
don’t have special contract with the executor).
• tf_executor: dialect that represents the execution model of the TensorFlow executor (e.g.,
control dependencies, deadness propagation)
• _tf: the TensorFlow MLIR open-source announcement mail thread (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/xe522DD4ZYA) says that the control-flow dialect “_tf” is temporary.
• "One intent of this design is that TensorFlow 2.x features can choose to target just the tf
dialect, allowing us to phase out the tf_executor dialect in subsequent TensorFlow releases. The
combination of the two dialects allows to represent arbitrary existing TensorFlow graphs." [1]
[1] "https://github.com/tensorflow/community/pull/115
17. TFLite Native Quantization
• Take input min/max information and set the ArrayInfo (which really is
InputOrOutputArrayInfo).
• In LegalizeTF, convert ArrayInfo min/max to tf.Quantize and tf.Dequantize nodes (or tf.FakeQuant). Convert all constant FakeQuants to (tf.FQ -> tfl.Q -> tfl.DQ).
• Hardcoded logic/propagation needs to happen here.
• Run TF constant folding.
• In PrepareTFL, convert all tf.FQ to (tfl.Q -> tfl.DQ).
• Run the quantization pass that takes (tfl.DQ (for both input and weights) -> op -> tfl.Q) and replaces it with (op). Also replace (constant_float -> tfl.Q) with (constant_quant).
https://github.com/tensorflow/mlir/blob/master/g3doc/Quantization.md#tflite-native-quantization
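• Putting the flags together, a post-training-style attempt looks roughly like the command below. This is a sketch only: it assumes tf_tfl_translate accepts the same --tf-input-min-values/--tf-input-max-values flags shown in the tf-mlir-translate help, and per slide 5 it does not yet produce a working quantized model:
./bazel-bin/tensorflow/compiler/mlir/lite/tf_tfl_translate \
  --tf-input-arrays=input --tf-input-shapes=1,224,224,3 --tf-input-data-types=DT_FLOAT \
  --tf-input-min-values=0 --tf-input-max-values=255 --tf-inference-type=TF_QUINT8 \
  --tf-output-arrays=MobilenetV1/Predictions/Reshape_1 \
  /tmp/mobilenet_v1_1.0_224_frozen.pb -o /tmp/mobilenet_v1_quant_attempt.tflite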
18. tfl passes
namespace mlir {
class FunctionPassBase;
class ModulePassBase;
namespace TFL {
// Creates an instance of the TensorFlow Lite dialect LegalizeTF pass.
FunctionPassBase *CreateLegalizeTFPass();
// Creates an instance of the TensorFlow Lite dialect Optimize pass.
FunctionPassBase *CreateOptimizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareTF pass.
FunctionPassBase *CreatePrepareTFPass();
// Creates an instance of the TensorFlow Lite dialect LowerStaticTensorList
// pass.
ModulePassBase *CreateLowerStaticTensorListPass();
// Creates an instance of the TensorFlow Lite dialect Quantize pass.
FunctionPassBase *CreateQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PrepareQuantize pass.
FunctionPassBase *CreatePrepareQuantizePass();
// Creates an instance of the TensorFlow Lite dialect PostQuantize pass.
FunctionPassBase *CreatePostQuantizePass(bool emit_quant_adaptor_ops);
} // namespace TFL
} // namespace mlir
19. quantization passes
• prepare-quantize
• Applies prepare-quantization on the model in the TFL dialect. This pass runs before
the quantization pass and propagates the quantization parameters across ops.
This step is necessary for post-training quantization, and it also makes the
quantization rules for some operations in quantization-aware training simpler.
• quantize
• tensorflow/compiler/mlir/lite/transforms/quantize.cc
• tensorflow/compiler/mlir/lite/transforms/quantize_patterns.td
• post-quantize
• Remove Quantization Adaptor Ops
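• These three passes can also be run by hand with tf-opt; again the flag names are my reading of the PassRegistration strings under lite/transforms/, so verify with tf-opt -help. The input file is assumed to be the legalized TFL-dialect MLIR from the earlier sketch:
./bazel-bin/tensorflow/compiler/mlir/tf-opt \
  -tfl-prepare-quantize -tfl-quantize -tfl-post-quantize \
  /tmp/mobilenet_v1_tfl.mlir -o /tmp/mobilenet_v1_tfl_quant.mlir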
20. TFL optimization
• activation into convolution
• an add op adding a constant value to a convolution op with constant
bias
• a mul op multiplying a convolution op (with constant filter and bias) by a
constant value
• quantize/dequantize
• fully connected with add
tensorflow/compiler/mlir/lite/transforms/optimize.cc
tensorflow/compiler/mlir/lite/transforms/optimize_patterns.td
23. no tfl.if()?
• yes, there is no tfl.if() or equivalent in tensorflow/compiler/mlir/lite/ir/tfl_ops.{cc, h, td}
• however, we can convert the MLIR on the previous slide to a TFLite flatbuffer, because there is
CustomOptionsOffset Translator::CreateIfOpCustomOptions(mlir::TF::IfOp op) {
int then_subgraph_index = subgraph_index_map_.at(op.getThen().str());
int else_subgraph_index = subgraph_index_map_.at(op.getElse().str());
auto flex_builder = absl::make_unique<flexbuffers::Builder>();
flex_builder->Map([&]() {
flex_builder->Int("then_subgraph_index", then_subgraph_index);
flex_builder->Int("else_subgraph_index", else_subgraph_index);
});
flex_builder->Finish();
return builder_.CreateVector(flex_builder->GetBuffer());
}
tensorflow/compiler/mlir/lite/flatbuffer_translate.cc
26. Recap: MLIR for TF and TFLite
• Conversion of Floating point models
• Infrastructure for quantized models is there
• Custom ops, such as the If control-flow op, can be handled in the MLIR ->
flatbuffer translation
• How about LSTM? It seems something like OpHint [1] isn’t there yet
• XLA: some ops work
[1] https://www.tensorflow.org/api_docs/python/tf/lite/OpHint
30. Using other passes?
• GPU: NVVM IR, SPIR-V, ...
• for codegen and other purposes
• linalg, affine, memref:
• tiling, polyhedral etc.
• NO, not yet
• MLIR is incremental. Things won’t happen overnight.