This document discusses bridging TensorFlow to run on Intel nGraph backends. It summarizes various optimization passes used in the nGraph-TensorFlow integration, including passes to liberate nodes from placement constraints, confirm placement, cluster the graph, and encapsulate clusters. Key points:
- NGraphLiberatePass and NGraphConfirmPass run during the PRE_PLACEMENT phase to handle nGraph placement
- NGraphClusterPass runs during POST_REWRITE_FOR_EXEC to cluster the graph into subgraphs, similar to XLA partitioning
- NGraphEncapsulatePass encapsulates clusters into NGraphEncapsulateOp nodes, analogous to XLA's use of _XlaLaunchOp
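The pass ordering described above can be sketched as data (a hypothetical summary for illustration only; phase names follow TensorFlow's graph-optimization-pass registration groupings, pass names are from this document):

```python
# Hedged sketch: the nGraph-TF pass pipeline as summarized above.
# Ordering within a phase is illustrative, not authoritative.
NGRAPH_TF_PASSES = [
    ("PRE_PLACEMENT", "NGraphLiberatePass"),
    ("PRE_PLACEMENT", "NGraphConfirmPass"),
    ("POST_REWRITE_FOR_EXEC", "NGraphClusterPass"),
    ("POST_REWRITE_FOR_EXEC", "NGraphEncapsulatePass"),
]

def phases_in_order(passes):
    """Return the distinct phases in the order they first appear."""
    seen, order = set(), []
    for phase, _ in passes:
        if phase not in seen:
            seen.add(phase)
            order.append(phase)
    return order
```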
11. Bridge TensorFlow* to run on Intel®
nGraph™ backends
https://github.com/NervanaSystems/ngraph-tf
https://github.com/NervanaSystems/ngraph-tf/tree/r0.4/
13. First, checking with the CPU
def test_cpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("cpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
18. Next, switching to the GPU
def test_gpu(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("gpu"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
31. Next, switching to NGRAPH
def test_ngraph(self):
    with tf.Session() as sess:
        x = tf.placeholder(tf.float32, [2], name="x")
        with tf.device("NGRAPH"):
            y = x * 2
        result = sess.run(y, {x: [1.5, 0.5]})
39. src/ngraph_liberate_pass.cc
tf::Status LiberateNGraphPlacement(tf::Graph* graph) {
  int i = 0;
  for (auto node : graph->op_nodes()) {
    if (node->IsOp() && IsNGraphNode(node)) {
      std::vector<std::string> colo;
      if (tf::GetNodeAttr(node->attrs(), tf::kColocationAttrName, &colo) ==
          tf::Status::OK()) {
        for (auto& s : colo) {
          std::stringstream ss;
          ss << s << "/LIBERATED_" << (i++);
          s = ss.str();
        }
        node->ClearAttr(tf::kColocationAttrName);
        node->AddAttr(tf::kColocationAttrName, colo);
      }
    }
  }
  return tf::Status::OK();
}
NGraphLiberatePass
40. src/ngraph_liberate_pass.cc
// At graph construction time, TensorFlow likes to place colocation constraints
// that force variables onto the same device as their initializers. For nGraph
// this doesn't work very well, because we don't yet support RNG ops, and this
// results in randomly-initialized variables being forced onto the host.
//
// The workaround implemented here is to "liberate" nGraph-placed ops from
// colocation constraints. This pass only applies to nodes with a requested
// placement on NGRAPH, meaning that the graph will be unchanged except
// where the user has explicitly requested nGraph.
NGraphLiberatePass
41. src/ngraph_liberate_pass.cc
// General algorithm:
//
// i := 0
// For each node n in the graph:
// If n has been placed on device NGRAPH:
// For each colocation constraint s on n:
// Append the string ("/LIBERATED_" + i) to s
// i++
//
// (Note that simply blanking out the colocation constraints does not work,
// because this causes the placer to act as if the node is subject to an
// eponymous colocation constraint, which happens to be exactly the name that
// the variable construction stuff will assign to it anyway.)
NGraphLiberatePass
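The algorithm in the comment above can be sketched in pure Python (a hypothetical model, not the bridge's actual code: nodes are dicts, and the "_class" key stands in for TensorFlow's colocation attribute, tf::kColocationAttrName):

```python
# Sketch of the "liberation" pass described above, under the assumption
# that a node is nGraph-placed when its requested device is "NGRAPH".
def liberate_ngraph_placement(nodes):
    i = 0
    for node in nodes:
        if node.get("device") != "NGRAPH":
            continue
        colo = node.get("_class")
        if colo is None:
            continue
        # Make each constraint unique instead of blanking it out, so the
        # placer does not fall back to an eponymous colocation constraint.
        new_colo = []
        for s in colo:
            new_colo.append(f"{s}/LIBERATED_{i}")
            i += 1
        node["_class"] = new_colo
    return nodes
```

Only nodes explicitly placed on NGRAPH are touched, matching the pass's promise that the rest of the graph is unchanged.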
44. src/ngraph_confirm_pass.cc
// In some cases, we require more complex placement constraints than
// TensorFlow's native "soft-placement" machinery is capable of handling. To
// handle this, we insert a pass called the "confirmation" pass during the
// pre-placement phase.
// For example, we can only handle Reshape if the "shape" input is a constant,
// so this is okay:
//
// ... Const[2,4,2]
// /
// Reshape (1)
//
// but this is not:
//
// ... Placeholder
// /
// Reshape (2)
NGraphConfirmPass
45. src/ngraph_confirm_pass.cc
// We want to reject placement of Reshape on NGRAPH for the second graph, but
// allow it for the first. We also want to attach some more metadata to the
// Reshape node so that we can remember the requested output shape even if the
// Const node winds up being placed in a different subgraph.
//
// This pass exploits a feature of the placement engine that allows a kernel
// builder registration request to restrict use of the kernel to nodes that
// have a particular value set for the "_kernel" attribute. In this case, we
// will check every node that has a requested placement on NGRAPH, and make
// sure that it conforms to certain (op-dependent) constraints. If the
// constraints are satisfied, we will tag the node with a "_kernel" value of
// "ngraph", along with some op-specific metadata (if applicable). The stub
// kernels, in turn, are registered with the constraint that _kernel="ngraph".
// This means that during the placement pass, our kernels will not be allowed
// for nodes we did not mark during this pass, and placement will fall back on
// CPU.
NGraphConfirmPass
46. src/ngraph_confirm_pass.cc
// Taking Reshape as an example, the pass ensures that the "shape" input is
// constant, and if so, it adds to the Reshape node the "_kernel=ngraph"
// attribute, along with some metadata recording the value of the constant.
// Thus graph (1) is transformed as follows:
//
// ... Const[2,4,2][_kernel="ngraph"]
// /
// Reshape[_kernel="ngraph",
// _ngraph_reshape_static_shape={2,4,2}]
//
// while graph (2) would be left unchanged, meaning that soft placement will
// fall back on non-nGraph implementations.
NGraphConfirmPass
47. src/ngraph_confirm_pass.cc
// Internally, there are two pieces. The first is a type constraint checker,
// which supplants the type checking machinery usually used with
// REGISTER_KERNEL_BUILDER. This ensures that any constraints on the data types
// of input tensors are satisfied---for example, we do not support DT_STRING.
// The second part is a set of finer-grained per-op checks called "confirmation
// functions", implementing more specific checks like the one described for
// Reshape above.
//
// The confirmation functions are implemented as callbacks of the type:
//
// std::function<tf::Status(tf::Node*, bool*)>.
NGraphConfirmPass
48. src/ngraph_confirm_pass.cc
// A confirmation function returns true/false by reference through its second
// parameter: true if placement is "accepted", and false if it is "rejected".
// For example, the confirmation function for "Reshape" will return true
// for (1) above, and false for (2).
//
// A confirmation function can also, as a side effect, add attributes to the
// node being checked, which can be used later in ngraph_builder. (Note that in
// general such attributes will need to start with "_" to mark them as
// "internal" or "system" attributes, as otherwise TensorFlow attempts to
// validate them as against the op schema.)
NGraphConfirmPass
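The shape of such a confirmation function can be sketched in pure Python (a hypothetical model for illustration: nodes and their inputs are dicts, and the Reshape check only asks whether the "shape" input comes from a Const node):

```python
# Sketch of a per-op confirmation function as described above: accept or
# reject placement, and optionally attach internal "_"-prefixed attributes.
def confirm_reshape(node, inputs):
    """Accept Reshape only when its 'shape' input is a constant; as a side
    effect, tag the node and record the static shape for ngraph_builder."""
    shape_src = inputs["shape"]
    if shape_src["op"] != "Const":
        return False  # rejected: soft placement falls back to non-nGraph
    node["_kernel"] = "ngraph"
    node["_ngraph_reshape_static_shape"] = shape_src["value"]
    return True
```

This mirrors the two outcomes described for graphs (1) and (2): a constant shape input is confirmed and annotated, a Placeholder input is rejected.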
54. src/ngraph_cluster.h
class NGraphEncapsulatePass : public tensorflow::GraphOptimizationPass {
 public:
  tf::Status Run(const tf::GraphOptimizationPassOptions& options) {
    if (std::getenv("NGRAPH_TF_SKIP_ENCAPSULATION") != nullptr) {
      NGRAPH_VLOG(0)
          << "NGRAPH_TF_SKIP_ENCAPSULATION is set. Skipping encapsulation "
             "step.";
      return tf::Status::OK();
    }
    return EncapsulateFunctions(options.graph->get());
  }
  // ...
};
NGraphEncapsulatePass
55. src/ngraph_encapsulate_pass.cc
tf::Status EncapsulateFunctions(tf::Graph* graph) {
// Pass 1: Populate the cluster-index-to-device name map for each existing
// cluster.
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// If a cluster has incoming edges, add input nodes for them?
NGraphEncapsulatePass::EncapsulateFunctions
56. src/ngraph_encapsulate_pass.cc
// Pass 2: Find all nodes that are feeding into/out of each cluster, and
// add inputs for them to the corresponding FunctionDef(s).
// If a cluster has incoming/outgoing edges, add input/output nodes for them?
auto new_input_node_def =
NGraphClusterManager::GetClusterGraph(dst_cluster_idx)->add_node();
new_input_node_def->set_name(new_input_name);
new_input_node_def->set_op("_Arg");
SetAttrValue(dt, &((*(new_input_node_def->mutable_attr()))["T"]));
SetAttrValue(arg_index_count[dst_cluster_idx],
&((*(new_input_node_def->mutable_attr()))["index"]));
NGraphEncapsulatePass::EncapsulateFunctions
58. src/ngraph_encapsulate_pass.cc
// Pass 4: Remap all non-clustered inputs that are reading from
// encapsulated edges, and all control edges that cross cluster
// boundaries.
// Pass 5: Make copies of all clustered nodes inside the cluster graphs,
// rewiring the inputs in their NodeDefs as we go.
// Pass 6: Remove clustered nodes from the graph.
NGraphEncapsulatePass::EncapsulateFunctions
59. src/ngraph_encapsulate_pass.cc
// Pass 7 (optional, only run if environment variable
// NGRAPH_TF_VALIDATE_CLUSTER_GRAPHS is set):
// validate the graph def, and make sure we can construct a graph from it.
NGraphEncapsulatePass::EncapsulateFunctions
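The cross-cluster bookkeeping of Passes 1-2 can be sketched as follows (a hypothetical model, not the bridge's code: edges are (src, dst) name pairs and cluster assignment is a dict; every edge entering a cluster from outside becomes one _Arg input of that cluster's FunctionDef):

```python
# Sketch of the _Arg counting implied by Pass 2 above. Unclustered nodes
# map to no cluster (cluster_of.get returns None for them).
def count_args_per_cluster(edges, cluster_of):
    """For each cluster, count edges whose source lies outside it; each
    such edge corresponds to one _Arg node added to the cluster graph."""
    arg_index_count = {}
    for src, dst in edges:
        src_c = cluster_of.get(src)
        dst_c = cluster_of.get(dst)
        if dst_c is not None and src_c != dst_c:
            arg_index_count[dst_c] = arg_index_count.get(dst_c, 0) + 1
    return arg_index_count
```

The running count per destination cluster plays the role of arg_index_count[dst_cluster_idx] in the slide's _Arg construction.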
72. src/ngraph_builder.cc
// Output section
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  } else {
    return tf::errors::InvalidArgument("Cannot find return node: ",
                                       tf_input_node->name());
  }
}
Builder::TranslateGraph
73. src/ngraph_builder.cc
vector<shared_ptr<ng::Node>> ng_result_list(tf_ret_vals.size());
for (auto n : tf_ret_vals) {
  tf::Node* tf_input_node;
  n->input_node(0, &tf_input_node);
  int index;
  tf::GetNodeAttr(n->attrs(), "index", &index);
  auto item = ng_op_map.find(tf_input_node->name());
  if (item != ng_op_map.end()) {
    ng_result_list[index] = item->second;
  }
}
// Pointer to the resulting function (nGraph)
ng_function = make_shared<ng::Function>(ng_result_list, ng_parameter_list);
return tf::Status::OK();
}
Builder::TranslateGraph
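The output wiring above reduces to an index lookup; a pure-Python sketch (hypothetical model: each _Retval node is a dict carrying its "index" attribute and the name of its single input):

```python
# Sketch of the result-list construction in Builder::TranslateGraph as
# shown above: place each translated input at its "index" slot.
def build_result_list(ret_vals, ng_op_map):
    """Map each _Retval's input to its output slot; a missing input is an
    error, mirroring the InvalidArgument return in the C++ code."""
    results = [None] * len(ret_vals)
    for n in ret_vals:
        src = n["input"]  # name of the node feeding this _Retval
        if src not in ng_op_map:
            raise ValueError(f"Cannot find return node: {src}")
        results[n["index"]] = ng_op_map[src]
    return results
```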
74. src/ngraph_builder.cc
// Now create the nGraph ops from TensorFlow ops.
//
for (auto op : tf_ops) {
  NGRAPH_VLOG(2) << "Constructing op " << op->name() << " which is "
                 << op->type_string();
  // NOTE: The following cases should be kept in alphabetical order.
  // ... handles each of the various ops here ...
}
Ops supported by Builder::TranslateGraph
84. Switch to deviceless (#117)
Large PR ("never again", I tell myself) to implement "deviceless" support for
nGraph. To make a long story short:
* The `NGRAPH` device goes away.
* `NGraphEncapsulateOp` now runs on the `CPU` device (no more sends/recvs)
* No more stub kernels or copied implementations of TF core ops like `Enter`/`Exit`
* Clustering, encapsulation, etc. is moved to an all-at-once pass in
`POST_REWRITE_FOR_EXEC` (so the weirdness we've seen where a confirmed
op gets rewritten without required attributes will not happen anymore).
https://github.com/NervanaSystems/ngraph-tf/commit/ddba671ba23dda4e4e0f6e045936e05a624bb962