This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
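The pass ordering described above can be sketched as data (a hypothetical summary for illustration only; phase names follow TensorFlow's graph-optimization-pass registration groupings, pass names are from this document):

```python
# Hedged sketch: the nGraph-TF pass pipeline as summarized above.
# Ordering within a phase is illustrative, not authoritative.
NGRAPH_TF_PASSES = [
    ("PRE_PLACEMENT", "NGraphLiberatePass"),
    ("PRE_PLACEMENT", "NGraphConfirmPass"),
    ("POST_REWRITE_FOR_EXEC", "NGraphClusterPass"),
    ("POST_REWRITE_FOR_EXEC", "NGraphEncapsulatePass"),
]

def phases_in_order(passes):
    """Return the distinct phases in the order they first appear."""
    seen, order = set(), []
    for phase, _ in passes:
        if phase not in seen:
            seen.add(phase)
            order.append(phase)
    return order
```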
11. Bridge TensorFlow* to run on Intel®
nGraph™ backends
https://github.com/NervanaSystems/ngraph-tf
https://github.com/NervanaSystems/ngraph-tf/tree/r0.4/
13. First, checking with the CPU
def test_cpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("cpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
18. Next, switching to the GPU
def test_gpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("gpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
31. Next, switching to NGRAPH
def test_ngraph(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("NGRAPH"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
39. src/ngraph_liberate_pass.cc
tf::Status LiberateNGraphPlacement(tf::Graph* graph) {
  int i = 0;
  for (auto node : graph->op_nodes()) {
    if (node->IsOp() && IsNGraphNode(node)) {
      std::vector<std::string> colo;
      if (tf::GetNodeAttr(node->attrs(), tf::kColocationAttrName, &colo) ==
          tf::Status::OK()) {
        for (auto& s : colo) {
          std::stringstream ss;
          ss << s << "/LIBERATED_" << (i++);
          s = ss.str();
        }
        node->ClearAttr(tf::kColocationAttrName);
        node->AddAttr(tf::kColocationAttrName, colo);
      }
    }
  }
  return tf::Status::OK();
}
NGraphLiberatePass
40. src/ngraph_liberate_pass.cc
// At graph construction time, TensorFlow likes to place colocation constraints
// that force variables onto the same device as their initializers. For nGraph
// this doesn't work very well, because we don't yet support RNG ops, and this
// results in randomly-initialized variables being forced onto the host.
//
// The workaround implemented here is to "liberate" nGraph-placed ops from
// colocation constraints. This pass only applies to nodes with a requested
// placement on NGRAPH, meaning that the graph will be unchanged except
// where the user has explicitly requested nGraph.
NGraphLiberatePass
41. src/ngraph_liberate_pass.cc
// General algorithm:
//
// i := 0
// For each node n in the graph:
// If n has been placed on device NGRAPH:
// For each colocation constraint s on n:
// Append the string ("/LIBERATED_" + i) to s
// i++
//
// (Note that simply blanking out the colocation constraints does not work,
// because this causes the placer to act as if the node is subject to an
// eponymous colocation constraint, which happens to be exactly the name that
// the variable construction stuff will assign to it anyway.)
NGraphLiberatePass
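The algorithm in the comment above can be sketched in pure Python (a hypothetical model, not the bridge's actual code: nodes are dicts, and the "_class" key stands in for TensorFlow's colocation attribute, tf::kColocationAttrName):

```python
# Sketch of the "liberation" pass described above, under the assumption
# that a node is nGraph-placed when its requested device is "NGRAPH".
def liberate_ngraph_placement(nodes):
    i = 0
    for node in nodes:
        if node.get("device") != "NGRAPH":
            continue
        colo = node.get("_class")
        if colo is None:
            continue
        # Make each constraint unique instead of blanking it out, so the
        # placer does not fall back to an eponymous colocation constraint.
        new_colo = []
        for s in colo:
            new_colo.append(f"{s}/LIBERATED_{i}")
            i += 1
        node["_class"] = new_colo
    return nodes
```

Only nodes explicitly placed on NGRAPH are touched, matching the pass's promise that the rest of the graph is unchanged.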
44. src/ngraph_confirm_pass.cc
// In some cases, we require more complex placement constraints than
// TensorFlow's native "soft-placement" machinery is capable of handling. To
// handle this, we insert a pass called the "confirmation" pass during the
// pre-placement phase.
// For example, we can only handle Reshape if the "shape" input is a constant,
// so this is okay:
//
// ... Const[2,4,2]
// /
// Reshape (1)
//
// but this is not:
//
// ... Placeholder
// /
// Reshape (2)
NGraphConfirmPass
45. src/ngraph_confirm_pass.cc
// We want to reject placement of Reshape on NGRAPH for the second graph, but
// allow it for the first. We also want to attach some more metadata to the
// Reshape node so that we can remember the requested output shape even if the
// Const node winds up being placed in a different subgraph.
//
// This pass exploits a feature of the placement engine that allows a kernel
// builder registration request to restrict use of the kernel to nodes that
// have a particular value set for the "_kernel" attribute. In this case, we
// will check every node that has a requested placement on NGRAPH, and make
// sure that it conforms to certain (op-dependent) constraints. If the
// constraints are satisfied, we will tag the node with a "_kernel" value of
// "ngraph", along with some op-specific metadata (if applicable). The stub
// kernels, in turn, are registered with the constraint that _kernel="ngraph".
// This means that during the placement pass, our kernels will not be allowed
// for nodes we did not mark during this pass, and placement will fall back on
// CPU.
NGraphConfirmPass
46. src/ngraph_confirm_pass.cc
// Taking Reshape as an example, the pass ensures that the "shape" input is
// constant, and if so, it adds to the Reshape node the "_kernel=ngraph"
// attribute, along with some metadata recording the value of the constant.
// Thus graph (1) is transformed as follows:
//
// ... Const[2,4,2][_kernel="ngraph"]
// /
// Reshape[_kernel="ngraph",
// _ngraph_reshape_static_shape={2,4,2}]
//
// while graph (2) would be left unchanged, meaning that soft placement will
// fall back on non-nGraph implementations.
NGraphConfirmPass
47. src/ngraph_confirm_pass.cc
// Internally, there are two pieces. The first is a type constraint checker,
// which supplants the type checking machinery usually used with
// REGISTER_KERNEL_BUILDER. This ensures that any constraints on the data types
// of input tensors are satisfied---for example, we do not support DT_STRING.
// The second part is a set of finer-grained per-op checks called "confirmation
// functions", implementing more specific checks like the one described for
// Reshape above.
//
// The confirmation functions are implemented as callbacks of the type:
//
// std::function<tf::Status(tf::Node*, bool*)>.
NGraphConfirmPass
48. src/ngraph_confirm_pass.cc
// A confirmation function returns true/false by reference through its second
// parameter: true if placement is "accepted", and false if it is "rejected".
// For example, the confirmation function for "Reshape" will return true
// for (1) above, and false for (2).
//
// A confirmation function can also, as a side effect, add attributes to the
// node being checked, which can be used later in ngraph_builder. (Note that in
// general such attributes will need to start with "_" to mark them as
// "internal" or "system" attributes, as otherwise TensorFlow attempts to
// validate them as against the op schema.)
NGraphConfirmPass
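The shape of such a confirmation function can be sketched in pure Python (a hypothetical model for illustration: nodes and their inputs are dicts, and the Reshape check only asks whether the "shape" input comes from a Const node):

```python
# Sketch of a per-op confirmation function as described above: accept or
# reject placement, and optionally attach internal "_"-prefixed attributes.
def confirm_reshape(node, inputs):
    """Accept Reshape only when its 'shape' input is a constant; as a side
    effect, tag the node and record the static shape for ngraph_builder."""
    shape_src = inputs["shape"]
    if shape_src["op"] != "Const":
        return False  # rejected: soft placement falls back to non-nGraph
    node["_kernel"] = "ngraph"
    node["_ngraph_reshape_static_shape"] = shape_src["value"]
    return True
```

This mirrors the two outcomes described for graphs (1) and (2): a constant shape input is confirmed and annotated, a Placeholder input is rejected.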
54. src/ngraph_cluster.h
class NGraphEncapsulatePass : public tensorflow::GraphOptimizationPass {
 public:
  tf::Status Run(const tf::GraphOptimizationPassOptions& options) {
    if (std::getenv("NGRAPH_TF_SKIP_ENCAPSULATION") != nullptr) {
      NGRAPH_VLOG(0)
          << "NGRAPH_TF_SKIP_ENCAPSULATION is set. Skipping encapsulation "
             "step.";
      return tf::Status::OK();
    }
    return EncapsulateFunctions(options.graph->get());
  }
  // ...
};
NGraphEncapsulatePass
55. src/ngraph_encapsulate_pass.cc
tf::Status EncapsulateFunctions(tf::Graph* graph) {
// Pass 1: Populate the cluster-index-to-device name map for each existing
// cluster.
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// If a cluster has incoming edges, add input nodes for them?
NGraphEncapsulatePass::EncapsulateFunctions
56. src/ngraph_encapsulate_pass.cc
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// If a cluster has incoming/outgoing edges, add input/output nodes for them?
auto new_input_node_def =
NGraphClusterManager::GetClusterGraph(dst_cluster_idx)->add_node();
new_input_node_def->set_name(new_input_name);
new_input_node_def->set_op("_Arg");
SetAttrValue(dt, &((*(new_input_node_def->mutable_attr()))["T"]));
SetAttrValue(arg_index_count[dst_cluster_idx],
&((*(new_input_node_def->mutable_attr()))["index"]));
NGraphEncapsulatePass::EncapsulateFunctions
58. src/ngraph_encapsulate_pass.cc
// Pass 4: Remap all non-clustered inputs that are reading from
// encapsulated edges, and all control edges that cross cluster
// boundaries.
// Pass 5: Make copies of all clustered nodes inside the cluster graphs,
// rewiring the inputs in their NodeDefs as we go.
// Pass 6: Remove clustered nodes from the graph.
NGraphEncapsulatePass::EncapsulateFunctions
59. src/ngraph_encapsulate_pass.cc
// Pass 7 (optional, only run if environment variable
// NGRAPH_TF_VALIDATE_CLUSTER_GRAPHS is set):
// validate the graph def, and make sure we can construct a graph from it.
NGraphEncapsulatePass::EncapsulateFunctions
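The cross-cluster bookkeeping of Passes 1-2 can be sketched as follows (a hypothetical model, not the bridge's code: edges are (src, dst) name pairs and cluster assignment is a dict; every edge entering a cluster from outside becomes one _Arg input of that cluster's FunctionDef):

```python
# Sketch of the _Arg counting implied by Pass 2 above. Unclustered nodes
# map to no cluster (cluster_of.get returns None for them).
def count_args_per_cluster(edges, cluster_of):
    """For each cluster, count edges whose source lies outside it; each
    such edge corresponds to one _Arg node added to the cluster graph."""
    arg_index_count = {}
    for src, dst in edges:
        src_c = cluster_of.get(src)
        dst_c = cluster_of.get(dst)
        if dst_c is not None and src_c != dst_c:
            arg_index_count[dst_c] = arg_index_count.get(dst_c, 0) + 1
    return arg_index_count
```

The running count per destination cluster plays the role of arg_index_count[dst_cluster_idx] in the slide's _Arg construction.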
72. src/ngraph_builder.cc
// Output section
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  } else {
    return tf::errors::InvalidArgument("Cannot find return node: ",
                                       tf_input_node->name());
  }
}
Builder::TranslateGraph
73. src/ngraph_builder.cc
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  }
}
// Pointer to the resulting function (nGraph)
ng_function = make_shared<ng::Function>(ng_result_list, ng_parameter_list);
return tf::Status::OK();
}
Builder::TranslateGraph
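The output wiring above reduces to an index lookup; a pure-Python sketch (hypothetical model: each _Retval node is a dict carrying its "index" attribute and the name of its single input):

```python
# Sketch of the result-list construction in Builder::TranslateGraph as
# shown above: place each translated input at its "index" slot.
def build_result_list(ret_vals, ng_op_map):
    """Map each _Retval's input to its output slot; a missing input is an
    error, mirroring the InvalidArgument return in the C++ code."""
    results = [None] * len(ret_vals)
    for n in ret_vals:
        src = n["input"]  # name of the node feeding this _Retval
        if src not in ng_op_map:
            raise ValueError(f"Cannot find return node: {src}")
        results[n["index"]] = ng_op_map[src]
    return results
```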
74. src/ngraph_builder.cc
// Now create the nGraph ops from TensorFlow ops.
//
for (auto op : tf_ops) {
  NGRAPH_VLOG(2) << "Constructing op " << op->name() << " which is "
                 << op->type_string();
  // NOTE: The following cases should be kept in alphabetical order.
  // ... handles each of the various ops here ...
}
Ops supported by Builder::TranslateGraph
84. Switch to deviceless (#117)
Large PR ("never again", I tell myself) to implement "deviceless" support for
nGraph. To make a long story short:
* The `NGRAPH` device goes away.
* `NGraphEncapsulateOp` now runs on the `CPU` device (no more sends/recvs)
* No more stub kernels or copied implementations of TF core ops like `Enter`/`Exit`
* Clustering, encapsulation, etc. is moved to an all-at-once pass in
`POST_REWRITE_FOR_EXEC` (so the weirdness we've seen where a confirmed
op gets rewritten without required attributes will not happen anymore).
https://github.com/NervanaSystems/ngraph-tf/commit/ddba671ba23dda4e4e0f6e045936e05a624bb962