// Each execution of an XlaCompile op creates a new XlaExecutableClosure, even
// if it didn't have to compile the cluster because of a compilation-cache
// hit. This is because we at least need new snapshots of the resource
// variables.
XlaExecutableClosureStore::KeyT key =
    XlaExecutableClosureStore::Global()->Produce(XlaExecutableClosure(
        client, executable, kernel, std::move(variables), constants_.size()));
Tensor compilation_key(cpu_allocator, DT_STRING, TensorShape({}));
compilation_key.flat<string>()(0) = key;
Tensor compilation_successful(cpu_allocator, DT_BOOL, TensorShape({}));
compilation_successful.flat<bool>()(0) = true;
ctx->set_output(0, compilation_key);
ctx->set_output(1, compilation_successful);
}
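To make the key/closure handshake in the comment above concrete, here is a toy Python analogue (ClosureStore, produce, and consume are hypothetical names for illustration, not TensorFlow API): XlaCompile stores the closure under a fresh key and emits the key as a string tensor, and the matching XlaRun op later consumes that key exactly once, so each run sees the variable snapshots taken for it.

import itertools
import threading

class ClosureStore(object):
  """Toy analogue of XlaExecutableClosureStore: produce() returns a fresh
  key for a stored closure; consume() removes and returns it exactly once."""

  def __init__(self):
    self._lock = threading.Lock()
    self._counter = itertools.count()
    self._closures = {}

  def produce(self, closure):
    with self._lock:
      key = str(next(self._counter))
      self._closures[key] = closure
      return key

  def consume(self, key):
    with self._lock:
      return self._closures.pop(key)

store = ClosureStore()
key = store.produce({"executable": "<compiled cluster>",
                     "variable_snapshots": ["v0 snapshot", "v1 snapshot"]})
print(store.consume(key))  # the consuming (XlaRun) side removes the closure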
EagerLocalExecute
// If we are running a function on explicitly requested TPU,
// compile it with XLA.
// Note that it is not ideal, but currently ok, to set this
// attribute after computing the kernel cache key above.
bool compile_with_xla = false;
if (op->is_function() && device != nullptr &&
    (device->device_type() == "TPU" || device->device_type() == "XLA_GPU" ||
     device->device_type() == "XLA_CPU")) {
  op->MutableAttrs()->Set(kXlaCompileAttr, true);
  compile_with_xla = true;
}
When kXlaCompileAttr is set to true, MarkForCompilationPass::Run prepares the graph for XLA compilation.
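For the Python side of the same attribute, here is a minimal sketch (assuming a TF 1.x build with XLA and tf.contrib available; not code from the slides). tf.contrib.compiler.jit.experimental_jit_scope attaches _XlaCompile=true (kXlaCompileAttr) to the ops created inside it, which is what lets MarkForCompilationPass::Run pick them up for clustering.

import numpy as np
import tensorflow as tf
from tensorflow.contrib.compiler import jit

x = tf.placeholder(tf.float32, [None, 4])
w = tf.get_variable("w", [4, 4])

# Ops built inside this scope carry the _XlaCompile=true attribute, so
# MarkForCompilationPass clusters them into an XLA-compiled subgraph.
with jit.experimental_jit_scope():
  y = tf.nn.relu(tf.matmul(x, w))

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(y, feed_dict={x: np.ones((2, 4), np.float32)}))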
Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs
https://arxiv.org/abs/1810.09868
Qiita: Trying out XLA.jl
Qiita: A rough summary of the paper on using Cloud TPUs from Julia
Introducing PyTorch across Google Cloud, 2018.10.3
https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud
Today, we’re pleased to announce that engineers on Google’s TPU team
are actively collaborating with core PyTorch developers to connect
PyTorch to Cloud TPUs. The long-term goal is to enable everyone to enjoy
the simplicity and flexibility of PyTorch while benefiting from the
performance, scalability, and cost-efficiency of Cloud TPUs.
As a starting point, the engineers involved have produced a prototype that
connects PyTorch to Cloud TPUs via XLA, an open source linear algebra
compiler.
This prototype has successfully enabled us to train a PyTorch
implementation of ResNet-50 on a Cloud TPU, and we’re planning to open
source the prototype and then expand it in collaboration with the PyTorch
community.
Please email us at pytorch-tpu@googlegroups.com to tell us what types of
PyTorch workloads you would be most interested in accelerating with Cloud
TPUs!
for batch_number, (inputs, targets) in wloader:
  self._step += 1
  optimizer.zero_grad()
  # Forward pass on the XLA devices, then the backward pass driven by the
  # gradients of the outputs, before the optimizer updates the parameters.
  xla_outputs = xla_run_model(self._xla_model, inputs, devices=self._devices)
  xla_run_grad(self._xla_model, self._get_backward_grads(xla_outputs),
               devices=self._devices)
  optimizer.step()
  if (log_fn is not None and log_interval is not None and
      batch_number % log_interval == 0):
    if metrics_debug:
      log_fn(torch_xla._XLAC._xla_metrics_report())
    loss = self._compute_loss(xla_outputs)
    log_fn(
        TrainStepMetrics(self._epoch, self._num_cores, batch_number,
                         len(samples_loader), batch_size, loss,
                         time.time() - start_time, self._step))
return loss