LeFlow
Enabling Flexible FPGA High-Level Synthesis
of Tensorflow Deep Neural Networks
@Vengineer
2018/07/22 (updated for LeFlow)
[TensorFlow XLA logo]
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/xla
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural
Networks
Paper: https://arxiv.org/abs/1807.05317
Submitted on 14 Jul 2018
GitHub: https://github.com/danielholanda/LeFlow
Optimization
 LeFlow feeds the unoptimized LLVM IR emitted by XLA into LegUp, which then performs the optimizations.
LeFlow
LeFlow highlights
・TensorFlow code becomes Verilog HDL!
・No changes to TensorFlow XLA itself
 (although a patch to the source code is still required)
 (2 files)
・LegUp can be used as-is
・Sample code for Intel (Altera) is included
 (reportedly also works with Xilinx)
Structure of the generated HDL
Optimization
Algorithm 1
 %0 = bitcast i8** %params to [2 x float]**
 %arg0 = load [2 x float]** %0, align 8
=> turn @arg0 into a global, initialized to zero
Algorithm 2
 @arg0 = global [2 x float] zeroinitializer, align 8
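The rewrite above can be pictured as a text transformation on the dumped .ll file: drop the load that dereferences the XLA %params buffer, and declare the argument as a zero-initialized global instead. A minimal sketch (the function name and pattern are assumptions, not LeFlow's actual pass; real code would also rename %arg0 uses to @arg0):

```python
import re

def globalize_arg(ir_text, arg, ty):
    # Delete the line that loads the argument out of the %params buffer.
    ir_text = re.sub(r".*%%%s = load .*\n" % re.escape(arg), "", ir_text)
    # Declare the argument as a zero-initialized global memory instead.
    return "@%s = global %s zeroinitializer, align 8\n%s" % (arg, ty, ir_text)
```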
MNIST
with tf.device("device:XLA_CPU:0"):
    y = tf.nn.softmax(tf.add(tf.matmul(input, weights)[0], bias))
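The computation on this slide is a single dense layer followed by softmax. A plain-Python sketch of the same math, for orientation only (LeFlow compiles the TensorFlow graph itself, not code like this):

```python
import math

def dense_softmax(x, weights, bias):
    """softmax(x @ weights + bias) for a 1-D input vector."""
    logits = [sum(xi * wij for xi, wij in zip(x, col)) + b
              for col, b in zip(zip(*weights), bias)]
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```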
What LeFlow does
src/LeFlow
# Create folder and generate Makefile
# Clean folder to erase previously generated files
# Generate IR from TensorFlow
# Remove unused files and name things properly
# Convert to old LLVM syntax (TensorFlow and LegUp use different versions of LLVM)
# Unroll loops
# Restructure the function signature, change variable scope, and reorganize the code
# Rewrite unsupported operations
# Partition the memories
# Convert human-readable .ll file to bitcode (.bc file)
# Start LegUp compilation
# Instrument the Verilog testbench
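The steps above can be kept in view as an ordered checklist (a sketch for orientation only; the real tool in src/LeFlow shells out to TensorFlow, LLVM tools, and LegUp at each stage):

```python
# Ordered stages of the LeFlow flow, mirroring the comments in src/LeFlow.
LEFLOW_STEPS = (
    "create project folder and generate Makefile",
    "clean previously generated files",
    "generate LLVM IR from TensorFlow via XLA",
    "remove unused files and rename outputs",
    "convert IR to the older LLVM syntax LegUp expects",
    "unroll loops",
    "restructure function signature and variable scope",
    "rewrite unsupported operations",
    "partition the memories",
    "assemble human-readable .ll into bitcode (.bc)",
    "run LegUp compilation",
    "instrument the Verilog testbench",
)

def describe_pipeline(steps=LEFLOW_STEPS):
    """Number the stages in the order LeFlow runs them."""
    return ["%2d. %s" % (i, s) for i, s in enumerate(steps, 1)]
```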
XLA-related options
src/LeFlow
--xla_dump_ir_to="+project_folder+"ir "
 Specifies the path where the dumped IR files are written
--xla_llvm_enable_invariant_load_metadata=false
--xla_llvm_enable_noalias_metadata=false
--xla_llvm_enable_alias_scope_metadata=false
--xla_enable_fast_math=false
--xla_backend_optimization_level=0
 These flags keep XLA's LLVM-level optimizations turned off
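The `"+project_folder+"` fragment on this slide is Python string concatenation from src/LeFlow. A sketch of how such code might assemble the flags into one command-line string (the function name and `project_folder` argument are illustrative, not LeFlow's actual code):

```python
def xla_debug_flags(project_folder):
    """Build the XLA flag string that disables optimization and dumps IR."""
    return " ".join([
        "--xla_dump_ir_to=" + project_folder + "ir",
        "--xla_llvm_enable_invariant_load_metadata=false",
        "--xla_llvm_enable_noalias_metadata=false",
        "--xla_llvm_enable_alias_scope_metadata=false",
        "--xla_enable_fast_math=false",
        "--xla_backend_optimization_level=0",
    ])
```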
What LeFlow does
src/LeFlow
# Create folder and generate Makefile
# Clean folder to erase previously generated files
# Generate IR from TensorFlow
# Remove unused files and name things properly
# Convert to old LLVM syntax (TensorFlow and LegUp use different versions of LLVM)
# Unroll loops
# Restructure the function signature, change variable scope, and reorganize the code
# Rewrite unsupported operations
# Partition the memories <= the key point (the memories are split)
# Convert human-readable .ll file to bitcode (.bc file)
# Start LegUp compilation
# Instrument the Verilog testbench
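Memory partitioning here means splitting one large RAM into several smaller banks so the HLS tool can schedule accesses in parallel. A minimal cyclic-partitioning sketch of the idea (illustrative only; this is not LeFlow's actual partitioning pass):

```python
def cyclic_partition(data, banks):
    """Split a flat array into `banks` memories; element i goes to bank i % banks."""
    parts = [[] for _ in range(banks)]
    for i, v in enumerate(data):
        parts[i % banks].append(v)
    return parts
```

With two banks, even- and odd-indexed elements land in separate memories and can be read in the same cycle.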
Structure of the generated HDL
← the memories are partitioned
Current Limitations and Opportunities
1) LeFlow currently uses kernels that were implemented in XLA and were originally meant for CPUs. Although compiler optimizations and scheduling can recover a substantial amount of parallelism from those implementations, LeFlow would benefit heavily from an XLA back-end with kernels targeting FPGAs.
2) The high dimensionality of inputs/weights and the number of parallel accesses typical of machine learning applications are a challenge for modern automatic memory partitioning algorithms. LeFlow would especially benefit from a machine-learning-specific automatic memory partitioning algorithm.
3) One of the key possibilities that make deep learning networks efficient on FPGAs is the opportunity to use a customizable fixed-point bit width. Adding fixed-point support to LeFlow will be an important step in the development of this toolkit. Additionally, techniques to automatically profile the application and choose the appropriate representation could easily be explored in software with TensorFlow and deployed in hardware.
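The fixed-point representation mentioned above can be sketched in a few lines: a real value is stored as a saturating signed integer with a chosen number of fractional bits (the function names and the 16-bit default are illustrative assumptions, not part of LeFlow):

```python
def to_fixed(x, frac_bits, total_bits=16):
    """Quantize a float to a saturating signed fixed-point integer."""
    scale = 1 << frac_bits
    q = int(round(x * scale))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))

def from_fixed(q, frac_bits):
    """Recover the real value represented by a fixed-point integer."""
    return q / (1 << frac_bits)
```

Profiling in TensorFlow would amount to measuring value ranges and picking `frac_bits` so that `to_fixed` rarely saturates.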
4) Although it is straightforward to use TensorFlow to debug the functionality of an implementation, it is currently difficult for software developers to debug the generated hardware in terms of the original Python code. A performance debugging infrastructure suitable for software developers is another interesting avenue for research.
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Thank you
