
What is Intel Nervana Graph?


I dug into the source code of the Graph Compiler from Nervana, the startup acquired by Intel.

Added the information from the blog post below, published 2017/06/22:
Intel® Nervana™ Graph Beta
https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/

Update: 2017/08/12



  1. What is Intel Nervana Graph? @Vengineer, 2017/05/22, updated 2017/07/01 and 08/12. As usual, I dug into the source code.
  2. About me
      Blog: Vengineerの戯言 (Vengineer's ramblings): http://blogs.yahoo.co.jp/verification_engineer
      Twitter: @Vengineer
      FPGA Magazine (No. 16/17), "FPGAコミュニティのススメ" (An invitation to the FPGA community): http://fpga.cqpub.co.jp/
      SlideShare: https://www.slideshare.net/ssuser479fa3
  3. This deck is a compilation of each company's public information, found via Google. Use at your own risk.
  4. On August 9, 2016, Intel acquired Nervana Systems for more than $350 million. Nervana was a two-year-old startup that had raised nearly $25 million from investors, which means the investors exited at roughly 10x in two years: about $300 million in two years. SoftBank Group's acquisition of ARM was £24 billion, so this deal is roughly 1/100 of that.
      Source: http://jp.techcrunch.com/2016/08/10/20160809intel-buys-deep-learning-startup-nervana-systems-for-a-reported-350-million/
  5. Nervana Graph Compiler
      ・Frontends: neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
      ・Nervana Graph
      ・Transformers (lowering): CPU / GPU (CUDA)
      Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
  6. TensorFlow XLA: a TensorFlow graph is converted into an XLA graph, and code is then generated, JIT or AOT, using LLVM. This lowering targets CPU and GPU (CUDA).
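      For reference, here is a minimal sketch of turning on XLA's JIT path with the TensorFlow 1.x API of that era; the ConfigProto options shown are the documented TF 1.x way to enable JIT, and the toy graph is my own example:

      import tensorflow as tf  # TensorFlow 1.x API

      # Ask TensorFlow to JIT-compile the session's graph through XLA.
      config = tf.ConfigProto()
      config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

      x = tf.constant(1.0)
      y = tf.constant(2.0)
      f = x + y

      with tf.Session(config=config) as sess:
          print(sess.run(f))  # 3.0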
  7. Nervana Graph Compiler and TensorFlow XLA: aren't these basically the same thing?
  8. And sure enough, here it is (Intel Nervana Graph Beta, 2017/6/22):
      https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/
      "The connection between the XLA and Intel Nervana Graph APIs was quite straightforward given the similar projects' intent for a compact and explicit intermediate representation. While today the XLA/Intel Nervana Graph integration is at a pre-alpha level, we'd love for people to take it for a spin and kick the tires. We're working on ironing out known performance issues and improving op and backend support."
  9. Intel neon
  10. neon: https://github.com/NervanaSystems/neon. The latest version is v1.9. It shares a name with ARM's NEON, but: "neon is Intel Nervana's reference deep learning framework committed to best performance on all hardware".
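      As a point of reference, getting started with neon meant picking a backend first. A minimal sketch, assuming the neon API of that era (gen_backend and its arguments are from the neon docs of the time; exact defaults may differ by version):

      from neon.backends import gen_backend

      # Select the backend that neon's layers and optimizers will run on.
      be = gen_backend(backend='cpu', batch_size=128)  # 'gpu' for the CUDA kernels
      print(be)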
  11. Datasets
      Images: MNIST, CIFAR-10, ImageNet 1K, PASCAL VOC, Mini-Places2
      Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize
      Video: UCF101
      Others: flickr8k, flickr30k, COCO
  12. neon vs cuDNN 4: "Not so fast, FFT": Winograd (March 3, 2016)
      Source: https://www.nervanasys.com/winograd/
  13. cuDNN 5: Optimizing Recurrent Neural Networks in cuDNN 5 (April 6, 2016)
      https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/
      "Faster forward and backward convolutions using the Winograd convolution algorithm;"
  14. Speeding up with Winograd!
      Fast Algorithms for Convolutional Neural Networks, Andrew Lavin and Scott Gray, https://arxiv.org/abs/1509.09308
      Going beyond full utilization: The inside scoop on Nervana's Winograd kernels (June 29, 2016), https://www.nervanasys.com/winograd-2/
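      To make the trick concrete, here is a small self-contained sketch of the 1-D F(2,3) Winograd transform from the Lavin and Gray paper cited above: two outputs of a 3-tap filter cost 4 elementwise multiplies in the transformed domain instead of the direct method's 6 (the example values are mine):

      import numpy as np

      # F(2,3) transform matrices, as given in Lavin & Gray (arXiv:1509.09308).
      BT = np.array([[1,  0, -1,  0],
                     [0,  1,  1,  0],
                     [0, -1,  1,  0],
                     [0,  1,  0, -1]], dtype=float)
      G = np.array([[1.0,  0.0, 0.0],
                    [0.5,  0.5, 0.5],
                    [0.5, -0.5, 0.5],
                    [0.0,  0.0, 1.0]])
      AT = np.array([[1, 1,  1,  0],
                     [0, 1, -1, -1]], dtype=float)

      d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
      g = np.array([0.5, 1.0, -1.0])       # 3-tap filter

      # Elementwise product of transformed data and transformed filter:
      # 4 multiplies between data and filter values.
      y = AT @ ((G @ g) * (BT @ d))

      # Direct computation (6 data-filter multiplies) for comparison.
      y_direct = np.array([d[0:3] @ g, d[1:4] @ g])
      assert np.allclose(y, y_direct)
      print(y)  # [-0.5  0. ]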
  15. neon v1.3 vs cuDNN v5.1: Still not slowing down: Benchmarking optimized Winograd implementations (July 25, 2016). The post benchmarks against both cuDNN v4 and cuDNN v5.1.
      Source: https://www.nervanasys.com/winograd-3/
  16. Scott Gray, https://twitter.com/scottgray76
      High-performance GPU kernels for deep learning:
      ・Fast matrix multiply for small minibatches
      ・Direct convolution leveraging GEMM advances
      ・Even faster convolution with Winograd
      At Nervana from October 2014 to July 2017; now at OpenAI (since July 2017).
      Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6485-scott-gray-gpu-programming-deep-learning.pdf
  17. Intel Nervana Graph Compiler
  18. Nervana Graph Compiler
      ・Frontends: neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
      ・Nervana Graph
      ・Transformers (lowering): CPU / GPU (CUDA)
      Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
  19. Where the Graph Compiler sits in Intel's stack.
      Source: http://pc.watch.impress.co.jp/docs/news/1034408.html
  20. MKL-DNN support, Mar 23, 2017 (after the Intel acquisition):
      "To install with Intel MKL-DNN support, first download MKL-DNN from https://github.com/01org/mkl-dnn and follow the installation instructions there to install MKL-DNN. Set environment variable MKLDNN_ROOT to point to the installed location and follow the rest of the steps to install Ngraph."
      Source: https://github.com/NervanaSystems/ngraph/commit/f3b7306214f40b4c1b4c40e3e223080797afb382
  21. Transformer API
      ・Supports CPU and GPU
      ・Memory usage optimization passes
      ・"Transformers allow users to register an included set of optional compiler passes for debug and visualization."
      ・GPU automatic kernel fusion/compounding for increased performance
      ・A mechanism much like LLVM's passes (see the sketch below)
      Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
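      A hedged sketch of what wiring in one of the optional passes could look like. It leans on the graph_passes list that CPUTransformer is shown building on slide 53; the VizPass import path comes from the pass tables later in this deck, and treating graph_passes as a user-facing extension point (and VizPass taking no arguments) are assumptions:

      import ngraph.transformers as ngt
      from ngraph.transformers.passes.nviz import VizPass  # path per slide 46's pass table

      transformer = ngt.make_transformer()
      # Assumption: passes live in a plain list on the transformer,
      # as in CPUTransformer.__init__, so one more can be appended.
      transformer.graph_passes.append(VizPass())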
  22. Building graphs
      ・Nervana Graph structure: data dependencies, initializers, non-data control dependencies
      ・General properties of ops
      ・Op hierarchy
      ・Ops influencing evaluation
      ・Derivatives (see the sketch after this list)
      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/building_graphs.rst
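      For the "Derivatives" item, a small sketch of taking a derivative through the graph; ng.deriv is the op named in the building_graphs doc cited above, but the exact call pattern here is an assumption:

      import ngraph as ng
      import ngraph.transformers as ngt

      x = ng.placeholder(())
      y = x * x
      dydx = ng.deriv(y, x)          # derivative of y with respect to x

      transformer = ngt.make_transformer()
      f = transformer.computation([y, dydx], x)
      print(f(3.0))                  # expected: (9.0, 6.0)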
  23. Future work?
      ・Nervana Graph serialization/deserialization
      ・Further improvements/abstractions to graph composability for usability/optimization
      ・Distributed, heterogeneous backend target support
      ・C APIs for interoperability to enable other languages to create/execute graphs
      ・Better debugging
      ・Support for model deployment
      Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
  24. From here on, we dig into the source code of the Intel Nervana Graph Compiler.
      ngraph: https://github.com/NervanaSystems/ngraph
  25. Sample code

      import ngraph as ng
      import ngraph.transformers as ngt

      x = ng.placeholder(())
      x_plus_one = x + 1
      transformer = ngt.make_transformer()
      plus_one = transformer.computation(x_plus_one, x)
      for i in range(5):
          print(plus_one(i))

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/overview.rst
  26. Example with Caffe

      from __future__ import print_function
      import ngraph.transformers as ngt
      from ngraph.frontends.caffe.cf_importer.importer import parse_prototxt

      model = "sum.prototxt"
      op_map = parse_prototxt(model, verbose=True)
      op = op_map.get("D")
      res = ngt.make_transformer().computation(op)()
      print("Result is:", res)

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/caffe.rst
  27. Example with TensorFlow

      import tensorflow as tf
      import ngraph.transformers as ngt
      # The doc snippet omits imports; this TFImporter path is assumed
      # from the repo layout (ngraph/frontends/tensorflow).
      from ngraph.frontends.tensorflow.tf_importer.importer import TFImporter

      x = tf.constant(1.)
      y = tf.constant(2.)
      f = x + y

      importer = TFImporter()
      importer.import_graph_def(tf.Session().graph_def)
      f_ng = importer.get_op_handle(f)

      transformer = ngt.make_transformer()
      f_result = transformer.computation(f_ng)()
      print(f_result)

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/tensorflow.rst
  28. Transformers & Computations
  29. Transformers

      A transformer converts a graph into a backend-specific executable form. From a graph, a transformer produces one or more computations, and the executable objects the transformer generates are driven through those computations. Every transformer must implement a common abstract interface so that users can switch backends.

      Supported backends:
      ・CPUs (via NumPy)
      ・NVIDIA GPUs (via PyCUDA)

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
  30. Creating a transformer

      1) Default:

      from ngraph.transformers import make_transformer
      transformer = make_transformer()

      2) Using a factory:

      import ngraph.transformers as ngt

      available_transformers = ngt.transformer_choices()
      if 'gpu' in available_transformers:
          factory = ngt.make_transformer_factory('gpu')
          ngt.set_transformer_factory(factory)
      transformer = ngt.make_transformer()

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
  31. Computations

      A computation is created by a transformer and provides the interface for evaluating a subset of the graph. The form a generated computation takes depends on the transformer that executes it: for example, NumPy on CPU and PyCUDA on GPU.

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
  32. Creating a computation

      import ngraph as ng

      a = ng.constant(4)
      b = ng.placeholder(())
      c = ng.placeholder(())
      d = ng.multiply(a, b)
      e = ng.add(d, c)
      example_comp = transformer.computation(e, b, c)

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
  33. Executing a computation

      example_comp = transformer.computation(e, b, c)
      # return value: e; first argument: b; second argument: c

      result_e = example_comp(2, 7)   # b = 2, c = 7
      # result_e = (4 * b) + c = (4 * 2) + 7 = 15

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
  34. Executing a computation: multiple return values

      example_comp2 = transformer.computation([d, e], b, c)
      # return values: d and e; first argument: b; second argument: c

      result_d, result_e = example_comp2(2, 7)
      # result_d = 4 * b = 4 * 2 = 8
      # result_e = (4 * b) + c = (4 * 2) + 7 = 15

      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
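      Putting slides 32 through 34 together, a minimal end-to-end version (only the make_transformer call is added to the doc's snippets):

      import ngraph as ng
      import ngraph.transformers as ngt

      a = ng.constant(4)
      b = ng.placeholder(())
      c = ng.placeholder(())
      d = ng.multiply(a, b)
      e = ng.add(d, c)

      transformer = ngt.make_transformer()
      both = transformer.computation([d, e], b, c)
      result_d, result_e = both(2, 7)
      print(result_d, result_e)  # expected: 8 15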
  35. Implementing a transformer
      ・Transformer creation
      ・Computation creation
      ・Transformer initialization: transformer passes, initialization computation, tensor description initialization, computation transformation
      ・Computation execution
      Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_implementation.rst
  36. Transformer implementations
      base.py: Transformer_ABC_Meta
      base.py: Transformer (base class)
      cputransform.py: CPUTransformer
      gputransform.py: GPUTransformer
      hetrtransform.py: HetrTransformer
      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
  37. The Transformer_ABC_Meta class

      class Transformer_ABC_Meta(abc.ABCMeta):
          """
          metaclass for the backend objects
          takes care of registering all the backend subclasses
          """
          def __init__(cls, name, bases, dict_):
              if not hasattr(cls, 'transformers'):
                  # First possible transformer class sets things up
                  cls.transformers = {}

              # If this transformer has a transformer_name, register it
              transformer_name = getattr(cls, 'transformer_name', None)
              if transformer_name is not None:
                  cls.transformers[transformer_name] = cls

              super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
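      The metaclass above is a registry pattern: the first class created sets up a shared dict, and every subclass that declares a transformer_name is recorded in it, which is what transformer_choices() can later enumerate. A standalone sketch of the same pattern (the names here are mine, not ngraph's):

      import abc

      class RegistryMeta(abc.ABCMeta):
          """Record every subclass that declares a backend_name."""
          def __init__(cls, name, bases, dict_):
              if not hasattr(cls, 'registry'):
                  cls.registry = {}  # created once, on the base class
              backend_name = getattr(cls, 'backend_name', None)
              if backend_name is not None:
                  cls.registry[backend_name] = cls
              super(RegistryMeta, cls).__init__(name, bases, dict_)

      class Backend(metaclass=RegistryMeta):
          pass

      class CPUBackend(Backend):
          backend_name = 'cpu'

      print(Backend.registry)  # {'cpu': <class '__main__.CPUBackend'>}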
  38. The Transformer class

      class Transformer(with_metaclass(Transformer_ABC_Meta, object)):
          """
          Produce an executable version of op-graphs.

          Computations are subsets of Ops to compute. The transformer
          determines storage allocation and transforms the computations
          and allocations into functions.

          Arguments:
              fusion (bool): Whether to combine sequences of operations into one operation.
              **kwargs: Args for related classes.

          Attributes:
              computations (:obj:`set` of :class:`Computation`): The set of requested computations.
              all_results (:obj:`set` of :class:`ngraph.op_graph.op_graph.Op`): A root set of Ops
                  that need to be computed.
              finalized (bool): True when transformation has been performed.
              initialized (bool): True when variables have been initialized/restored.
              fusion (bool): True when fusion was enabled.
              device_buffers (set): Set of handles for storage allocations.
          """

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
  39. The Computation class

      class Computation(NameableValue):
          """
          A handle for a computation function.

          Arguments:
              transformer (obj:`Transformer`): The associated transformer.
              returns: If an Op, return the value of the Op; if a sequence of Ops,
                  return the sequence of values; if a set, return a map; if None, return None.
              *args: AllocationOps marked input will be arguments to the function.
              **kwargs: Args for related classes.
          """

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
  40. The Computation class (constructor)

      def __init__(self, transformer, computation, **kwargs):
          super(Computation, self).__init__(**kwargs)
          self.transformer = transformer
          self.computation = computation
          self.computation_name = None
          self.executor = None
          self.send_nodes = []
          self.recv_nodes = []
          self.scatter_send_nodes = []
          self.scatter_recv_nodes = []
          self.gather_send_nodes = []
          self.gather_recv_nodes = []
          self.allreduce_nodes = []

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
  41. Computation implementations
      base.py: Computation (base class)
      cputransform.py: CPUComputation
      base.py: GPUComputation
      hetrtransform.py: HetrComputation

      When make_computation runs, the Computation matching each transformer is created.

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
  42. Computation implementations (continued)

      cputransform.py (CPUComputation):
          def make_computation(self, computation):
              return CPUDeviceComputation(self, computation)

      base.py (GPUComputation):
          def make_computation(self, computation):
              return Computation(self, computation)

      hetrtransform.py (HetrComputation):
          def make_computation(self, computation):
              return HetrComputation(self, computation)
  43. The Computation class

      class Computation(NameableValue):
          def __init__(self, transformer, computation_op, **kwargs):
              super(Computation, self).__init__(**kwargs)
              logging.info("Creating computation with computation_op: %s",
                           computation_op)
              self.transformer = transformer
              self.computation_op = computation_op
              self.computation_name = None
              self.executor = None

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
  44. The CPUDeviceComputation class

      class CPUDeviceComputation(Computation):
          def __init__(self, transformer, computation, **kwargs):
              super(CPUComputation, self).__init__(transformer, computation, **kwargs)
              self.pool_params = dict()
              self.pool_slices = dict()
              self.conv_params = dict()
              self.conv_slices = dict()

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  45. The HetrComputation class

      class HetrComputation(Computation):
          def __init__(self, hetr, computation_op):
              self.child_computations = dict()
              self.transformer = hetr
              self.send_nodes = hetr.send_nodes
              self.computation_op = computation_op

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/hetrtransform.py
  46. Pass implementations (1)
      passes.py: GraphPass (base class)
      passes.py: GraphBuildingPass
      passes.py: GraphRewritePass
      passes.py: PeepholeGraphPass
      passes.py: RequiredTensorShaping
      passes.py: CPUTensorShaping
      passes.py: SimplePrune
      flexpass.py: FlexDtypePass
      flexpass.py: FlexDECPass
      flexpass.py: ClearTensorDescriptions
      nviz.py: JSONPass (GraphPass)
      nviz.py: VizPass (GraphPass)
      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/base.py
  47. Pass implementations (2)
      layout.py: PruneContiguousPass
      layout.py: GenerateLayoutDomains
      layout.py: GenerateLayoutConstraints
      layout.py: AssignLayouts
      layout.py: AddLayoutConversions
      cpufusion.py: FusionPass
      cpulayout.py: CPUTensorLayout
      gpusimplification.py: GPUSubstitution
      hetrpasses.py: DeviceAssignPass
      hetrpasses.py: CommunicationPass
      hetrpasses.py: DistributedPass
      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
  48. Pass implementations (3): passes used with MKL-DNN
      mkldnnpasses.py: MklCreateOpDescriptors
      mkldnnpasses.py: MklAddLayoutConversions
      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
  49. The ComputationGraphTransformer class

      class ComputationGraphTransformer(Transformer):
          def run_registered_graph_passes(self, ops, **kwargs):
              for graph_pass in self.graph_passes:
                  graph_pass.wrapped_do_pass(ops=ops, **kwargs)
              return ops

      gputransform.py:
          class GPUTransformer(ComputationGraphTransformer): ...
      hetrtransform.py:
          class HetrTransformer(ComputationGraphTransformer): ...

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
  50. The ExecutionGraphTransformer class

      extransformer.py:
          class ExecutionGraphTransformer(Transformer):
              def run_registered_graph_passes(self, computation_decl, **kwargs):
                  op_accessor = ExOpGraphOpAccessor()
                  for graph_pass in self.graph_passes:
                      graph_pass.wrapped_do_pass(
                          op_accessor=op_accessor,
                          computation_decl=computation_decl,
                          **kwargs)

      cputransform.py:
          class CPUTransformer(ExecutionGraphTransformer): ...

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/extransformer.py
  51. The GraphPass class

      class GraphPass(with_metaclass(abc.ABCMeta, DelegateOpAccessor)):

          def wrapped_do_pass(self, **kwargs):
              self.begin_pass(**kwargs)
              self.do_pass(**kwargs)
              self.end_pass(**kwargs)

          @abc.abstractmethod
          def do_pass(self, **kwargs):
              pass

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
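      The pass protocol is begin_pass / do_pass / end_pass, driven by wrapped_do_pass. A standalone sketch of that protocol with a toy pass (the DelegateOpAccessor machinery is omitted, and everything except the hook names is mine):

      import abc

      class PassBase(abc.ABC):
          """Same three-phase protocol as ngraph's GraphPass."""
          def begin_pass(self, **kwargs): pass
          def end_pass(self, **kwargs): pass

          def wrapped_do_pass(self, **kwargs):
              self.begin_pass(**kwargs)
              self.do_pass(**kwargs)
              self.end_pass(**kwargs)

          @abc.abstractmethod
          def do_pass(self, **kwargs):
              ...

      class CountOpsPass(PassBase):
          """Toy pass: report how many ops it was handed."""
          def do_pass(self, ops=None, **kwargs):
              print("ops seen:", len(ops) if ops is not None else 0)

      CountOpsPass().wrapped_do_pass(ops=[1, 2, 3])  # ops seen: 3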
  52. The CPUTransformer class

      class CPUTransformer(Transformer):
          def __init__(self, **kwargs):
              super(CPUTransformer, self).__init__(**kwargs)
              self.current_computation = None
              self.conv_engine = CPUConvEngine()
              self.init_code = CPUCodeGenerator(self)
              self.allocate_storage_code = CPUCodeGenerator(self)
              self.allocate_code = CPUCodeGenerator(self)
              self.compute_code = CPUCodeGenerator(self)
              self.code = CPUCodeGenerator(self)
              ...

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  53. The CPUTransformer class: adding passes

      self.graph_passes = []
      if self.mkldnn.enabled:
          self.graph_passes.append(CPUFusion())
      self.graph_passes += [
          # ExVizPass(view=True, filename="initial"),
          CPUTensorLayout(),
          SimplePrune(),
          RequiredTensorShaping(),
          CPUTensorShaping(),
          DeadCodeEliminationPass(),
      ]

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  54. The CPUTransformer class (continued)

      if self.mkldnn.enabled:
          self.graph_passes.append(MklCreateOpDescriptors(mkldnn=self.mkldnn)),
          DeadCodeEliminationPass(),
          self.graph_passes.append(MklAddLayoutConversions(
              mkldnn=self.mkldnn, layoutpass=add_layout_conversion)),
          DeadCodeEliminationPass()
      self.graph_passes += [
          SSAConversion(),
          IndexElision(),
          # DeadCodeEliminationPass(),
          LivenessPass(),
          MemOptimizePass(),
          LivenessPass(),
          MemLayoutPass()
      ]

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  55. The CPUTransformer class: optional debug passes (commented out in the source)

      # from ngraph.transformers.passes.dumpgraphpass import DumpGraphPass
      # self.graph_passes += [DumpGraphPass()]
      # from ngraph.transformers.passes.visualizemem import VisualizeMemPass
      # self.graph_passes += [VisualizeMemPass()]

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  56. The GPUTransformer class

      class GPUTransformer(Transformer):
          def __init__(self, device_id=None, comm=None, **kwargs):
              super(GPUTransformer, self).__init__(**kwargs)
              GPUTransformer.gpu_transformers.add(self)
              ...
              self.graph_passes = [
                  SimplePrune(),
                  PruneContiguousPass(),
                  GPUSubstitution(),
                  layout_domain_pass,
                  layout_constraints_pass,
                  layout_assign_pass,
                  layout_convert_pass
              ]

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/gputransform.py
  57. The HetrTransformer class

      class HetrTransformer(Transformer):
          def __init__(self, device_id=None, comm=None, **kwargs):
              super(HetrTransformer, self).__init__(**kwargs)
              ...
              self.graph_passes = [
                  DeviceAssignPass(hetr=self,
                                   default_device=device,
                                   default_device_id=0),
                  CommunicationPass(self.send_nodes),
                  DistributedPass(self.send_nodes)
              ]

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/hetrtransform.py
  58. Code generation
  59. The CPUCodeGenerator class

      class CPUCodeGenerator(PyGen):
          def __init__(self, transformer, **kwargs):
              super(CPUCodeGenerator, self).__init__(prefix="op", **kwargs)
              self.transformer = transformer

          def name(self, x):
              if isinstance(x, CPUDeviceBufferStorage):
                  return x.ref_str
              if isinstance(x, CPUDeviceTensor):
                  return x.ref_str
              return x

      Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
  60. Thank you!
      Blog: Vengineerの戯言 (Vengineer's ramblings): http://blogs.yahoo.co.jp/verification_engineer
      Twitter: @Vengineer
      Meetups organized:
      ・Xilinx Zynq MPSoC (2016/02/20)
      ・Altera SDK for OpenCL (2016/06/10)
      ・Xilinx SDSoC (2017/01/28)
      ・PYNQ Matsuri (2017/03/04)
      ・FPGA deep learning hands-on meetup (2017/05/20)
