TensorFlow Lite (r1.5)
&
Android 8.1 Neural Networks API
2018/03/07 (Wed)
At LeapMind's new office
@Vengineer
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare :
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
These days: a source-code analysis craftsman
2017:
entered the deep-learning world
with TensorFlow XLA
CQ Publishing
Interface magazine, August and September 2017 issues:
I wrote articles about
TensorFlow XLA AOT (r1.0)
August issue:
A first report on an attention-grabbing technology with the potential for dramatic performance gains:
exploring "XLA", Google's new TensorFlow feature for running AI snappily
September issue:
A challenge for fans of the latest technology:
a tour of the TensorFlow XLA AOT compiler for snappy AI.
My first dive into Google source code!
Exploring the potential of compilers for AI
Source: http://www.kumikomi.net/interface/contents/201708.php
CQ Publishing
Interface magazine, February 2018 issue:
I wrote an article about
TensorFlow XLA JIT (r1.4)
Special feature: "Research on almighty Google's AI & IoT technology"
Part 2: A study of AI development environments
 Chapter 1: Who will conquer the future continent of deep learning?
Exploring the potential of TensorFlow XLA: examining the "Google AI is strongest" thesis
 Chapter 2: The dedicated-device assignment feature for computations is evolving rapidly
How Google's TensorFlow can target all kinds of processors
Source: http://www.kumikomi.net/interface/contents/201802.php
It even reached the Google TensorFlow XLA team!
And for 2018:
TensorFlow Lite
&
Android Neural Networks API
TensorFlow r1.5
Released 2018.1.26
The TensorFlow Lite developer preview is now available.
https://github.com/tensorflow/tensorflow/releases/tag/v1.5.0
TensorFlow Lite
https://github.com/tensorflow/tensorflow/tree/r1.5/tensorflow/contrib/lite
TensorFlow => TensorFlow Lite conversion flow:
 Model (GraphDef, .pb) + trained CheckPoint (.ckpt)
  -> freeze_graph -> frozen GraphDef (.pb)
  -> transforms_graph -> optimized GraphDef (.pb)
  -> TensorFlow Lite Converter -> TensorFlow Lite Model File (.tflite, FP32 or 8-bit quantized)
・freeze_graph turns variables into constants
・The conversion works at the GraphDef level
・Inputs: the model (GraphDef) and the trained checkpoint
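To make the flow concrete, here is a minimal C++ sketch (using the r1.5 contrib/lite headers) that loads a converted .tflite file and runs one inference. It assumes a float model with a single input and output tensor; the file name and tensor contents are placeholders.

#include <memory>

#include "tensorflow/contrib/lite/interpreter.h"
#include "tensorflow/contrib/lite/kernels/register.h"
#include "tensorflow/contrib/lite/model.h"

int main() {
  // Load the converted FlatBuffer model (file name is a placeholder).
  auto model = tflite::FlatBufferModel::BuildFromFile("converted_model.tflite");
  if (!model) return 1;

  // Build an interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) return 1;

  // Allocate tensor buffers, fill the first input, run, read the first output.
  interpreter->AllocateTensors();
  float* input = interpreter->tensor(interpreter->inputs()[0])->data.f;
  input[0] = 1.0f;  // illustrative; a real model needs the whole input tensor filled
  interpreter->Invoke();
  float* output = interpreter->tensor(interpreter->outputs()[0])->data.f;
  (void)output;  // consume the result here
  return 0;
}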
TensorFlow Lite => Android Neural Networks API
Runtime stack (top to bottom):
 Android App
 TensorFlow Lite Model File (.tflite)
 Java API
 C++ API (Interpreter / Kernels)
 Android Neural Networks API
 Hardware: CPU / GPU / DSP / Custom
Default target: CPU
Custom example: Pixel Visual Core (Google)
[Stack diagram, layer in focus: Java API (Interpreter.run()), above the C++ API (Interpreter / Kernels), the Android Neural Networks API, and the Hardware (CPU/GPU/DSP/Custom)]
tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/NativeInterpreterWrapper.java
The .tflite file is read inside the C++ code, through the internal C++ API, when the Java Interpreter class (and its NativeInterpreterWrapper) is instantiated.
private static native long[] run(...);
TensorFlow Lite Android Demo
tensorflow/contrib/lite/java/demo/app/src/main
tensorflow/contrib/lite/java/demo/app/src/main/java/com/example/android/tflitecamerademo/
ImageClassifier.java
public class ImageClassifier {
ImageClassifier(Activity activity) throws IOException {
tflite = new Interpreter(loadModelFile(activity));
labelList = loadLabelList(activity);
imgData =
ByteBuffer.allocateDirect(
DIM_BATCH_SIZE * DIM_IMG_SIZE_X * DIM_IMG_SIZE_Y * DIM_PIXEL_SIZE);
imgData.order(ByteOrder.nativeOrder());
labelProbArray = new byte[1][labelList.size()];
Log.d(TAG, "Created a Tensorflow Lite Image Classifier.");
}
….
Creating the interpreter
String classifyFrame(Bitmap bitmap) {
convertBitmapToByteBuffer(bitmap);
long startTime = SystemClock.uptimeMillis();
tflite.run(imgData, labelProbArray);
long endTime = SystemClock.uptimeMillis();
// Smooth the results across frames.
applyFilter();
// Print the results.
String textToShow = printTopKLabels();
textToShow = Long.toString(endTime - startTime) + "ms" + textToShow;
return textToShow;
}
Running the interpreter
tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/Interpreter.java
public void run(@NotNull Object input, @NotNull Object output) {
Object[] inputs = {input};
Map<Integer, Object> outputs = new HashMap<>();
outputs.put(0, output);
runForMultipleInputsOutputs(inputs, outputs);
}
The Interpreter class's run method
public void runForMultipleInputsOutputs(
@NotNull Object[] inputs, @NotNull Map<Integer, Object> outputs) {
if (wrapper == null) {
throw new IllegalStateException("The Interpreter has already been closed.");
}
Tensor[] tensors = wrapper.run(inputs);
if (outputs == null || tensors == null || outputs.size() > tensors.length) {
throw new IllegalArgumentException("Outputs do not match with model outputs.");
}
final int size = tensors.length;
for (Integer idx : outputs.keySet()) {
if (idx == null || idx < 0 || idx >= size) {
throw new IllegalArgumentException(
String.format("Invalid index of output %d (should be in range [0, %d))", idx, size));
}
tensors[idx].copyTo(outputs.get(idx));
}
}
This calls the NativeInterpreterWrapper class's run method
tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/NativeInterpreterWrapper.java
Tensor[] run(Object[] inputs) {
int[] dataTypes = new int[inputs.length];
Object[] sizes = new Object[inputs.length];
int[] numsOfBytes = new int[inputs.length];
for (int i = 0; i < inputs.length; ++i) {
DataType dataType = dataTypeOf(inputs[i]);
dataTypes[i] = dataType.getNumber();
if (dataType == DataType.BYTEBUFFER) {
ByteBuffer buffer = (ByteBuffer) inputs[i];
numsOfBytes[i] = buffer.limit();
sizes[i] = getInputDims(interpreterHandle, i, numsOfBytes[i]);
} else if (isNonEmptyArray(inputs[i])) {
int[] dims = shapeOf(inputs[i]);
sizes[i] = dims;
numsOfBytes[i] = dataType.elemByteSize() * numElements(dims);
}
The NativeInterpreterWrapper class's run method
tensorflow/contrib/lite/java/src/main/java/org/tensorflow/lite/NativeInterpreterWrapper.java
NativeInterpreterWrapper(String modelPath) {
errorHandle = createErrorReporter(ERROR_BUFFER_SIZE);
modelHandle = createModel(modelPath, errorHandle);
interpreterHandle = createInterpreter(modelHandle);
}
JNI : Java_org_tensorflow_lite_NativeInterpreterWrapper_createModel
JNI : Java_org_tensorflow_lite_NativeInterpreterWrapper_createInterpreter
The NativeInterpreterWrapper class's constructor
// (continuation of NativeInterpreterWrapper.run from the earlier slide)
long[] outputsHandles =
run( interpreterHandle, errorHandle, sizes, dataTypes, numsOfBytes, inputs );
Tensor[] outputs = new Tensor[outputsHandles.length];
for (int i = 0; i < outputsHandles.length; ++i) {
outputs[i] = Tensor.fromHandle(outputsHandles[i]);
}
return outputs;
}
Passes the input data to the interpreter (the C++ side, via JNI) and runs it
Java_org_tensorflow_lite_NativeInterpreterWrapper_run
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
Calls the C++ run implementation via JNI
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_NativeInterpreterWrapper_createModel(
JNIEnv* env, jclass clazz, jstring model_file, jlong error_handle) {
BufferErrorReporter* error_reporter =
convertLongToErrorReporter(env, error_handle);
if (error_reporter == nullptr) return 0;
const char* path = env->GetStringUTFChars(model_file, nullptr);
auto model = tflite::FlatBufferModel::BuildFromFile(path, error_reporter);
env->ReleaseStringUTFChars(model_file, path);
return reinterpret_cast<jlong>(model.release());
}
Rebuilds the model from the file
C++ side: the createModel JNI function
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_NativeInterpreterWrapper_createInterpreter(
JNIEnv* env, jclass clazz, jlong model_handle) {
tflite::FlatBufferModel* model = convertLongToModel(env, model_handle);
if (model == nullptr) return 0;
auto resolver = ::tflite::CreateOpResolver();
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, *(resolver.get()))(&interpreter);
return reinterpret_cast<jlong>(interpreter.release());
}
Builds an interpreter to execute the model
C++ side: the createInterpreter JNI function
[Stack diagram, layer in focus: C++ API (Android NDK API), the JNI entry point Java_org_tensorflow_lite_NativeInterpreterWrapper_run, below the Java API and above the Android Neural Networks API and Hardware (CPU/GPU/DSP/Custom)]
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
interpreter->Invoke()
tensorflow/contrib/lite/java/src/main/native/nativeinterpreterwrapper_jni.cc
JNIEXPORT jlongArray JNICALL
Java_org_tensorflow_lite_NativeInterpreterWrapper_run(
JNIEnv* env, jclass clazz, jlong interpreter_handle, jlong error_handle,
jobjectArray sizes, jintArray data_types, jintArray nums_of_bytes, jobjectArray values) {
tflite::Interpreter* interpreter = convertLongToInterpreter(env, interpreter_handle);
const int input_size = env->GetArrayLength(sizes);
TfLiteStatus status = checkInputs(env, interpreter, input_size, data_types,
nums_of_bytes, values, sizes);
status = resizeInputs(env, interpreter, input_size, sizes);
status = interpreter->AllocateTensors();
C++ side: the run JNI function
status = setInputs(env, interpreter, input_size, data_types, nums_of_bytes, values);
interpreter->Invoke();
const std::vector<int>& results = interpreter->outputs();
jlongArray outputs = env->NewLongArray(results.size());
size_t size = results.size();
for (int i = 0; i < size; ++i) {
TfLiteTensor* source = interpreter->tensor(results[i]);
jlong output = reinterpret_cast<jlong>(source);
env->SetLongArrayRegion(outputs, i, 1, &output);
}
return outputs;
}
Runs the interpreter
C++ side: the run JNI function (continued)
[Stack diagram, layer in focus: Interpreter, interpreter->Invoke(), inside the C++ API, above the Android Neural Networks API and Hardware (CPU/GPU/DSP/Custom)]
tensorflow/contrib/lite/interpreter.{h, cc}
nnapi_delegate_->Invoke(this)
TfLiteStatus Interpreter::Invoke() {
// error checks omitted
TfLiteStatus status = kTfLiteOk;
if (nnapi_delegate_) {
if (AllocateTensorsWhoseSizesAreKnown() == kTfLiteError) {
return kTfLiteError;
}
if (next_allocate_node_id_ == nodes_and_registration_.size()) {
TF_LITE_ENSURE_OK(&context_, nnapi_delegate_->Invoke(this));
return kTfLiteOk;
} else {
…..
}
}
Runs the NNAPI delegate
C++ side: Interpreter class, Invoke method
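The branch above is only taken when nnapi_delegate_ exists; in r1.5 that is controlled through the interpreter's UseNNAPI() switch. Continuing the earlier load-and-run sketch (same model and resolver as before, purely illustrative):

// Sketch: ask the interpreter to route execution through the Android NN API.
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
interpreter->UseNNAPI(true);   // enable the NNAPI delegate path
interpreter->AllocateTensors();
interpreter->Invoke();         // now dispatched to NNAPIDelegate::Invoke() when possible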
[Stack diagram, layer in focus: nnapi_delegate, between the TensorFlow Lite Interpreter/Kernels and the Android Neural Networks API, above the Hardware (CPU/GPU/DSP/Custom)]
tensorflow/contrib/lite/nnapi_delegate.{h, cc}
ANeuralNetworksModel_create(...);
ANeuralNetworksCompilation_create(...);
ANeuralNetworksCompilation_finish(...);
ANeuralNetworksExecution_create(...);
ANeuralNetworksExecution_setInput(...);
ANeuralNetworksExecution_setOutput(...);
ANeuralNetworksExecution_startCompute(...);
ANeuralNetworksEvent_wait(...);
ANeuralNetworksEvent_free(...);
ANeuralNetworksExecution_free(...);
  https://developer.android.com/ndk/guides/neuralnetworks/index.html
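To make this call sequence concrete, here is a small self-contained sketch against the NDK header (android/NeuralNetworks.h) that builds a one-operation model (element-wise ADD of two 2-element float tensors), compiles it, runs it, and waits on the event. Shapes and values are illustrative and error checking is omitted for brevity; the functions used are exactly the ones listed above.

#include <android/NeuralNetworks.h>
#include <cstdint>

int RunAddExample() {
  // Operand types: a 2-element float32 tensor and an int32 scalar
  // (member order: type, dimensionCount, dimensions, scale, zeroPoint).
  const uint32_t dims[1] = {2};
  ANeuralNetworksOperandType tensorType = {ANEURALNETWORKS_TENSOR_FLOAT32, 1, dims, 0.0f, 0};
  ANeuralNetworksOperandType scalarType = {ANEURALNETWORKS_INT32, 0, nullptr, 0.0f, 0};

  // 1. Model: operands 0 and 1 are the inputs, 2 the fused activation, 3 the output.
  ANeuralNetworksModel* model = nullptr;
  ANeuralNetworksModel_create(&model);
  ANeuralNetworksModel_addOperand(model, &tensorType);  // 0: input a
  ANeuralNetworksModel_addOperand(model, &tensorType);  // 1: input b
  ANeuralNetworksModel_addOperand(model, &scalarType);  // 2: fused activation
  ANeuralNetworksModel_addOperand(model, &tensorType);  // 3: output
  int32_t noActivation = ANEURALNETWORKS_FUSED_NONE;
  ANeuralNetworksModel_setOperandValue(model, 2, &noActivation, sizeof(noActivation));
  const uint32_t addInputs[3] = {0, 1, 2};
  const uint32_t addOutputs[1] = {3};
  ANeuralNetworksModel_addOperation(model, ANEURALNETWORKS_ADD, 3, addInputs, 1, addOutputs);
  const uint32_t modelInputs[2] = {0, 1};
  const uint32_t modelOutputs[1] = {3};
  ANeuralNetworksModel_identifyInputsAndOutputs(model, 2, modelInputs, 1, modelOutputs);
  ANeuralNetworksModel_finish(model);

  // 2. Compilation.
  ANeuralNetworksCompilation* compilation = nullptr;
  ANeuralNetworksCompilation_create(model, &compilation);
  ANeuralNetworksCompilation_finish(compilation);

  // 3. Execution: bind user buffers, start the computation, wait on the event.
  float a[2] = {1.0f, 2.0f}, b[2] = {3.0f, 4.0f}, out[2] = {0.0f, 0.0f};
  ANeuralNetworksExecution* execution = nullptr;
  ANeuralNetworksExecution_create(compilation, &execution);
  ANeuralNetworksExecution_setInput(execution, 0, nullptr, a, sizeof(a));
  ANeuralNetworksExecution_setInput(execution, 1, nullptr, b, sizeof(b));
  ANeuralNetworksExecution_setOutput(execution, 0, nullptr, out, sizeof(out));
  ANeuralNetworksEvent* event = nullptr;
  ANeuralNetworksExecution_startCompute(execution, &event);
  ANeuralNetworksEvent_wait(event);  // out now holds {4.0f, 6.0f}

  // 4. Clean up.
  ANeuralNetworksEvent_free(event);
  ANeuralNetworksExecution_free(execution);
  ANeuralNetworksCompilation_free(compilation);
  ANeuralNetworksModel_free(model);
  return 0;
}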
tensorflow/contrib/lite/nnapi_delegate.cc
TfLiteStatus NNAPIDelegate::Invoke(Interpreter* interpreter) {
if (!nn_model_) {
TF_LITE_ENSURE_STATUS(BuildGraph(interpreter));
}
Builds the model
using the NN API
C++ side: NNAPIDelegate class, Invoke method
TfLiteStatus NNAPIDelegate::BuildGraph(Interpreter* interpreter) {
if (nn_model_ && nn_compiled_model_) return kTfLiteOk;
if (!nn_model_) {
CHECK_NN(ANeuralNetworksModel_create(&nn_model_));
uint32_t next_id = addTensorOperands(interpreter, nn_model_);
AddOpsAndParams(interpreter, nn_model_, next_id);
CHECK_NN(ANeuralNetworksModel_identifyInputsAndOutputs(
nn_model_, static_cast<uint32_t>(interpreter->inputs().size()),
reinterpret_cast<const uint32_t*>(interpreter->inputs().data()),
static_cast<uint32_t>(interpreter->outputs().size()),
reinterpret_cast<const uint32_t*>(interpreter->outputs().data())));
CHECK_NN(ANeuralNetworksModel_finish(nn_model_));
}
Model creation: operands, operations, parameters
C++ side: NNAPIDelegate class, BuildGraph method (called from Invoke)
if (!nn_compiled_model_) {
CHECK_NN(ANeuralNetworksCompilation_create(nn_model_, &nn_compiled_model_));
CHECK_NN(ANeuralNetworksCompilation_finish(nn_compiled_model_));
}
return kTfLiteOk;
}
Compilation
C++ side: NNAPIDelegate class, BuildGraph method (continued)
uint32_t addTensorOperands(tflite::Interpreter* interpreter,
ANeuralNetworksModel* nn_model) {
// ... omitted ...
ANeuralNetworksOperandType operand_type{
nn_type, static_cast<uint32_t>(tensor->dims->size),
reinterpret_cast<uint32_t*>(tensor->dims->data), scale, zeroPoint};
CHECK_NN(ANeuralNetworksModel_addOperand(nn_model, &operand_type));
if (tensor->allocation_type == kTfLiteMmapRo) {
if (const NNAPIAllocation* alloc = dynamic_cast<const NNAPIAllocation*>(
static_cast<const Allocation*>(tensor->allocation))) {
CHECK_NN(ANeuralNetworksModel_setOperandValueFromMemory(
nn_model, i, alloc->memory(), alloc->offset(tensor->data.raw),
tensor->bytes));
} else {
CHECK_NN(ANeuralNetworksModel_setOperandValue(
nn_model, i, tensor->data.raw, tensor->bytes));
}
C++ side: the addTensorOperands function
TfLiteStatus NNAPIDelegate::Invoke(Interpreter* interpreter) {
// ... continued after BuildGraph(interpreter): AddOpsAndParams has added the
// operations, and the model has been compiled ...
ANeuralNetworksExecution* execution = nullptr;
CHECK_NN(ANeuralNetworksExecution_create(nn_compiled_model_, &execution));
// Currently perform deep copy of input buffer
for (size_t i = 0; i < interpreter->inputs().size(); i++) {
int input = interpreter->inputs()[i];
// TODO(aselle): Is this what we want or do we want input instead?
// TODO(aselle): This should be called setInputValue maybe to be cons.
TfLiteTensor* tensor = interpreter->tensor(input);
CHECK_NN(ANeuralNetworksExecution_setInput(
execution, i, nullptr, tensor->data.raw, tensor->bytes));
}
C++ side: NNAPIDelegate class, Invoke method (continued)
Input data
for (size_t i = 0; i < interpreter->outputs().size(); i++) {
int output = interpreter->outputs()[i];
TfLiteTensor* tensor = interpreter->tensor(output);
CHECK_NN(ANeuralNetworksExecution_setOutput(
execution, i, nullptr, tensor->data.raw, tensor->bytes));
}
// Currently use blocking compute.
ANeuralNetworksEvent* event = nullptr;
CHECK_NN(ANeuralNetworksExecution_startCompute(execution, &event));
CHECK_NN(ANeuralNetworksEvent_wait(event));
ANeuralNetworksEvent_free(event);
ANeuralNetworksExecution_free(execution);
return kTfLiteOk;
}
Output data
Start inference
Wait for completion
Free the event and execution
C++ side: NNAPIDelegate class, Invoke method (continued)
Android Neural Networks API
(Android 8.1.0 r14)
[Stack diagram, layer in focus: Android Neural Networks API, below the TensorFlow Lite C++ API (Interpreter / Kernels) and above the Hardware (CPU/GPU/DSP/Custom)]
https://android.googlesource.com/platform/frameworks/ml
Android 8.1
Only the CPU implementation has been published
  https://developer.android.com/ndk/guides/neuralnetworks/index.html
Source: https://developer.android.com/ndk/guides/neuralnetworks/index.html
Programming flow for the Android Neural Networks API (figure: model creation -> compilation -> execution -> wait for completion)
nn/runtime/NeuralNetworks.cpp
int ANeuralNetworksExecution_startCompute(
ANeuralNetworksExecution* execution,
ANeuralNetworksEvent** event) {
ExecutionBuilder* r = reinterpret_cast<ExecutionBuilder*>(execution);
std::unique_ptr<sp<ExecutionCallback>> e = std::make_unique<sp<ExecutionCallback>>();
*event = nullptr;
int n = r->startCompute(e.get());
*event = reinterpret_cast<ANeuralNetworksEvent*>(e.release());
return ANEURALNETWORKS_NO_ERROR;
}
ExecutionBuilder
nn/runtime/ExecutionBuilder.cpp
int ExecutionBuilder::startCompute(sp<ExecutionCallback>* synchronizationCallback) {
*synchronizationCallback = nullptr;
// ... omitted ...
// Run on the CPU.
VLOG(EXECUTION) << "ExecutionBuilder::startCompute (without plan) on CPU";
StepExecutor executor(this, mModel,
nullptr /* no VersionedIDevice, so CPU */,
nullptr /* no IPreparedModel */);
executor.mapInputsAndOutputsTrivially();
return executor.startCompute(synchronizationCallback);
}
StepExecutor::startCompute
nn/runtime/ExecutionBuilder.cpp
int StepExecutor::startCompute(sp<ExecutionCallback>* synchronizationCallback) {
if (mDriver == nullptr) {
return startComputeOnCpu(synchronizationCallback);
} else {
return startComputeOnDevice(synchronizationCallback);
}
}
Execute on the CPU
Execute on a device
By default, execution runs on the CPU
StepExecutor::startComputeOnCpu
nn/runtime/ExecutionBuilder.cpp
int StepExecutor::startComputeOnCpu(sp<ExecutionCallback>* synchronizationCallback) {
// TODO: use a thread pool
Model model;
mModel->setHidlModel(&model);
// ... omitted ...
// TODO: should model be moved with a std::cref?
std::thread thread( asyncStartComputeOnCpu, model, std::move(request),
std::move(modelPoolInfos), std::move(requestPoolInfos),
executionCallback);
executionCallback->bind_thread(std::move(thread));
*synchronizationCallback = executionCallback;
return ANEURALNETWORKS_NO_ERROR;
}
asyncStartComputeOnCpu
nn/runtime/ExecutionBuilder.cpp
static void asyncStartComputeOnCpu(
const Model& model, const Request& request,
const std::vector<RunTimePoolInfo>& modelPoolInfos,
const std::vector<RunTimePoolInfo>& requestPoolInfos,
const sp<IExecutionCallback>& executionCallback) {
CpuExecutor executor;
int err = executor.run(model, request, modelPoolInfos, requestPoolInfos);
ErrorStatus status = err == ANEURALNETWORKS_NO_ERROR ?
ErrorStatus::NONE : ErrorStatus::GENERAL_FAILURE;
executionCallback->notify(status);
}
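The ExecutionCallback / wait pattern above is the usual "launch a worker thread, have it notify a callback the caller can block on" idiom. A toy standard-C++ analogue of that pattern (not the Android classes, purely illustrative):

#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for the notify()/wait() pair of an execution callback.
class ToyCallback {
 public:
  void notify(int status) {
    std::lock_guard<std::mutex> lock(mutex_);
    status_ = status;
    done_ = true;
    cv_.notify_all();
  }
  int wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return done_; });
    return status_;
  }
 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool done_ = false;
  int status_ = 0;
};

int main() {
  ToyCallback callback;
  // Analogue of startComputeOnCpu: run the work asynchronously, then notify.
  std::thread worker([&callback] {
    int status = 0;  // pretend the model was executed here
    callback.notify(status);
  });
  int result = callback.wait();  // analogue of ANeuralNetworksEvent_wait
  worker.join();
  return result;
}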
CpuExecutor::run
nn/common/CpuExecutor.cpp
int CpuExecutor::run( const Model& model, const Request& request,
const std::vector<RunTimePoolInfo>& modelPoolInfos,
const std::vector<RunTimePoolInfo>& requestPoolInfos) {
mModel = &model;
mRequest = &request;
initializeRunTimeInfo(modelPoolInfos, requestPoolInfos);
for (const auto& operation : model.operations) {
int n = executeOperation(operation);
if (n != ANEURALNETWORKS_NO_ERROR) {
return n;
}
}
CpuExecutor::executeOperation
nn/common/CpuExecutor.cpp
int CpuExecutor::executeOperation(const Operation& operation) {
const hidl_vec<uint32_t>& ins = operation.inputs;
const hidl_vec<uint32_t>& outs = operation.outputs;
bool success = false;
// ... omitted ...
switch (operation.type) {
case OperationType::OEM_OPERATION:
// ……….
case OperationType::ADD: {
// ……….
// ... omitted ...
}
freeNoLongerUsedOperands(ins);
return ANEURALNETWORKS_NO_ERROR;
}
addFloat32
nn/common/operations/SimpleMath.cpp
#include "tensorflow/contrib/lite/kernels/internal/optimized/optimized_ops.h"
bool addFloat32(const float* in1, const Shape& shape1,
const float* in2, const Shape& shape2,
int32_t activation,
float* out, const Shape& shapeOut) {
bool needBroadcast = !SameShape(shape1, shape2);
if (needBroadcast) {
#define ANDROID_NN_BROADCAST_ADD(activation)                                        \
  tflite::optimized_ops::BroadcastAdd<tflite::FusedActivationFunctionType::activation>( \
      in1, convertShapeToDims(shape1),                                              \
      in2, convertShapeToDims(shape2),                                              \
      out, convertShapeToDims(shapeOut))
addFloat32
nn/common/operations/SimpleMath.cpp
ANDROID_NN_MACRO_DISPATCH(ANDROID_NN_BROADCAST_ADD)
#undef ANDROID_NN_BROADCAST_ADD
} else {
float output_activation_min, output_activation_max;
CalculateActivationRangeFloat(activation, &output_activation_min,
&output_activation_max);
tflite::optimized_ops::Add(
in1, convertShapeToDims(shape1),
in2, convertShapeToDims(shape2),
output_activation_min, output_activation_max,
out, convertShapeToDims(shapeOut));
}
return true;
}
Add
tensorflow/contrib/lite/kernels/internal/optimized/optimized_ops.h
void Add (const float* input1_data, const Dims<4>& input1_dims,
const float* input2_data, const Dims<4>& input2_dims,
float* output_data, const Dims<4>& output_dims) {
gemmlowp::ScopedProfilingLabel label("Add");
/* const int batches = */ MatchingArraySize(input1_dims, 3, input2_dims, 3, output_dims, 3);
/* const int height = */ MatchingArraySize(input1_dims, 2, input2_dims, 2, output_dims, 2);
/* const int width = */ MatchingArraySize(input1_dims, 1, input2_dims, 1, output_dims, 1);
/* const int depth = */ MatchingArraySize(input1_dims, 0, input2_dims, 0, output_dims, 0);
TFLITE_DCHECK(IsPackedWithoutStrides(input1_dims));
TFLITE_DCHECK(IsPackedWithoutStrides(input2_dims));
TFLITE_DCHECK(IsPackedWithoutStrides(output_dims));
int i = 0;
const int size = input1_dims.sizes[3] * input1_dims.strides[3];
#ifdef USE_NEON
Uses the TensorFlow Lite kernel code
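Setting the NEON specialization aside, the per-element work done by this kernel is an add followed by a clamp to the fused-activation range (for example, RELU clamps to [0, +inf) and RELU6 to [0, 6]). A plain scalar sketch of that inner loop (illustrative, not the optimized code):

#include <algorithm>
#include <cstddef>

// Scalar equivalent of the element-wise float Add with a fused activation:
// out[i] = clamp(in1[i] + in2[i], output_activation_min, output_activation_max).
void AddFloatScalar(const float* in1, const float* in2, float* out, size_t size,
                    float output_activation_min, float output_activation_max) {
  for (size_t i = 0; i < size; ++i) {
    out[i] = std::min(std::max(in1[i] + in2[i], output_activation_min),
                      output_activation_max);
  }
}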
Execute on a device
ExecutionBuilder
int ExecutionBuilder::startCompute(sp<ExecutionCallback>* synchronizationCallback) {
// ... omitted ...
{
Model hidlModel;
mModel->setHidlModel(&hidlModel);
const std::vector<std::shared_ptr<Device>>& devices = DeviceManager::get()->getDrivers();
for (const auto& device : devices) {
hidl_vec<bool> supports;
device->getSupportedOperations(hidlModel, &supports);
if (std::find(supports.begin(), supports.end(), false) == supports.end()) {
StepExecutor executor(this, mModel,
device->getInterface(),
nullptr /* no IPreparedModel, so compile */);
executor.mapInputsAndOutputsTrivially();
return executor.startCompute(synchronizationCallback);
}
}
}
}
Execute on a device
StepExecutor::startCompute
nn/runtime/ExecutionBuilder.cpp
int StepExecutor::startCompute(sp<ExecutionCallback>* synchronizationCallback) {
if (mDriver == nullptr) {
return startComputeOnCpu(synchronizationCallback);
} else {
return startComputeOnDevice(synchronizationCallback);
}
}
Execute on the CPU
Execute on a device
int StepExecutor::startComputeOnDevice(sp<ExecutionCallback>* synchronizationCallback) {
*synchronizationCallback = nullptr;
if (mPreparedModel == nullptr) {
Model model;
mModel->setHidlModel(&model);
sp<PreparedModelCallback> preparedModelCallback = new PreparedModelCallback();
ErrorStatus prepareLaunchStatus = mDriver->prepareModel(model, preparedModelCallback);
preparedModelCallback->wait();
ErrorStatus prepareReturnStatus = preparedModelCallback->getStatus();
mPreparedModel = preparedModelCallback->getPreparedModel();
}
Prepare the model
int n = allocatePointerArgumentsToPool(&mInputs, &inputPointerArguments);
n = allocatePointerArgumentsToPool(&mOutputs, &outputPointerArguments);
for (auto& info : mInputs) {
if (info.state == ModelArgumentInfo::POINTER) {
DataLocation& loc = info.locationAndLength;
uint8_t* data = nullptr;
int n = inputPointerArguments.getPointer(&data);
memcpy(data + loc.offset, info.buffer, loc.length);
}
}
Request request;
setRequestArgumentArray(mInputs, &request.inputs);
setRequestArgumentArray(mOutputs, &request.outputs);
uint32_t count = mMemories.size();
request.pools.resize(count);
for (uint32_t i = 0; i < count; i++) {
request.pools[i] = mMemories[i]->getHidlMemory();
}
Input pointers
Output pointers
Input data
sp<ExecutionCallback> executionCallback = new ExecutionCallback();
Return<ErrorStatus> executeStatus = mPreparedModel->execute(request, executionCallback);
executionCallback->wait();
Return<ErrorStatus> executionStatus = executionCallback->getStatus();
for (auto& info : mOutputs) {
if (info.state == ModelArgumentInfo::POINTER) {
DataLocation& loc = info.locationAndLength;
uint8_t* data = nullptr;
int n = outputPointerArguments.getPointer(&data);
memcpy(info.buffer, data + loc.offset, loc.length);
}
}
*synchronizationCallback = executionCallback;
return ANEURALNETWORKS_NO_ERROR;
}
Run the model
Sample Driver
Sample Driver
nn/driver/sample/
// Base class used to create sample drivers for the NN HAL. This class
// provides some implementation of the more common functions.
//
class SampleDriver : public V1_0::IDevice {
public:
SampleDriver(const char* name) : mName(name) {}
~SampleDriver() override {}
Return<ErrorStatus> prepareModel(const Model& model,
const sp<IPreparedModelCallback>& callback) override;
Return<DeviceStatus> getStatus() override;
int run();
protected:
std::string mName;
};
SampleDriver::run
nn/driver/sample/SampleDriver.cpp
int SampleDriver::run() {
android::hardware::configureRpcThreadpool(4, true);
if (registerAsService(mName) != android::OK) {
LOG(ERROR) << "Could not register service";
return 1;
}
android::hardware::joinRpcThreadpool();
LOG(ERROR) << "Service exited!";
return 1;
}
Service registration
Implementing the Service
https://source.android.com/devices/architecture/configstore/service
The registerAsService method is generated by the hidl-gen command
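The pieces above suggest how a vendor service binary is tied together: instantiate a driver object and call run(), which registers the HIDL service and joins the RPC thread pool. A hypothetical sketch along the lines of the sample drivers; MyNnDriver, its header, and the service name are placeholders (assumed to be a concrete subclass of SampleDriver that implements the remaining IDevice methods), not the actual AOSP sources:

// Hypothetical service entry point for an NN HAL driver.
#include "MyNnDriver.h"

int main() {
  android::sp<MyNnDriver> driver(new MyNnDriver("my-nn-driver"));
  return driver->run();  // registerAsService(mName) + joinRpcThreadpool(), as shown above
}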
The various sample services
SampleDriverAll.rc
service neuralnetworks_hal_service_sample_all
/vendor/bin/hw/android.hardware.neuralnetworks@1.0-service-sample-all
SampleDriverFloatFast.rc
service neuralnetworks_hal_service_sample_float_fast
/vendor/bin/hw/android.hardware.neuralnetworks@1.0-service-sample-float-fast
SampleDriverFloatSlow.rc
service neuralnetworks_hal_service_sample_float_slow
/vendor/bin/hw/android.hardware.neuralnetworks@1.0-service-sample-float-slow
SampleDriverMinimal.rc
service neuralnetworks_hal_service_sample_minimal
/vendor/bin/hw/android.hardware.neuralnetworks@1.0-service-sample-minimal
SampleDriverQuant.rc
service neuralnetworks_hal_service_sample_quant
/vendor/bin/hw/android.hardware.neuralnetworks@1.0-service-sample-quant
Supports all operations
Fast float implementation
Slow float implementation
Minimal support
Quantized support
Android NN API support status
Blog post, 2018.02.07:
Android Neural Networks API support
on Arm Mali GPUs
https://blogs.yahoo.co.jp/verification_engineer/71457473.html
Source: https://community.arm.com/android-community/b/android/posts/arm-support-for-android-nnapi-gives-up-to-4x-performance-boost
The Compute Library is open source
OpenCL device driver
What will happen in this part?
This side is published in the Android 8.1 source code
Mobile Machine Learning Hardware at ARM:
A Systems-on-Chip (SoC) Perspective
Yuhao Zhu, Department of Computer Science, University of Rochester
Matthew Mattina, Machine Learning & AI, ARM Research
Paul Whatmough, Machine Learning & AI, ARM Research
The CNN accelerator and the CPU cluster (L3)
are connected via the ACP
https://arxiv.org/abs/1801.06274
The key point of the software stack
The key of such a programming interface is a clear
abstraction that allows applications to execute DNN jobs
efficiently on (one of many) hardware accelerators, or fall
back to execution on a CPU or GPU.
The AndroidNN API provides an example of this principle,
by abstracting common DNN kernels such as convolution,
and scheduling execution through a hardware abstraction
layer (HAL).
Arm NN SDK & Arm ML Processor
Downloads,
resources,
and documentation
Available March 2018.
Source: https://developer.arm.com/products/processors/machine-learning/arm-nn
Blog post, 2018.02.14:
AI in the Kirin 970
https://blogs.yahoo.co.jp/verification_engineer/71467363.html
Source: https://www.anandtech.com/show/12195/hisilicon-kirin-970-power-performance-overview/5
Blog post, 2018.03.03:
Snapdragon 845 supports the
Android Neural Networks API
https://blogs.yahoo.co.jp/verification_engineer/71486808.html
Supported via the Hexagon (DSP) Neural Network
Blog post, 2018.03.04:
MediaTek Helio P60 and the NeuroPilot SDK
https://blogs.yahoo.co.jp/verification_engineer/71486859.html
Supports an Artificial Intelligence Processing Unit (APU)
Summary
TensorFlow => TensorFlow Lite conversion flow:
 Model (GraphDef, .pb) + trained CheckPoint (.ckpt)
  -> freeze_graph -> frozen GraphDef (.pb)
  -> transforms_graph -> optimized GraphDef (.pb)
  -> TensorFlow Lite Converter -> TensorFlow Lite Model File (.tflite, FP32 or 8-bit quantized)
・freeze_graph turns variables into constants
・The conversion works at the GraphDef level
・Inputs: the model (GraphDef) and the trained checkpoint
TensorFlow Lite => Android Neural Networks API
Runtime stack (top to bottom):
 Android App
 TensorFlow Lite Model File (.tflite)
 Java API
 C++ API (Interpreter / Kernels)
 Android Neural Networks API
 Hardware: CPU / GPU / DSP / Custom
Default CPU: ARM Cortex-A (NEON)
GPU: ARM Mali (Compute Library)
Custom: Pixel Visual Core (Google), Kirin 970 (Huawei), Helio P60 (MediaTek), Snapdragon 845 (Qualcomm)
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare :
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Thank you very much

TensorFlow Lite (r1.5) & Android 8.1 Neural Networks API