SlideShare a Scribd company logo
TensorFlow on
Android
“freedom” Koan-Sin Tan

freedom@computer.org

COSCUP 2017, Taipei, Taiwan
Who Am I
• A software engineer working for a SoC company

• An old open source user, learned to use Unix on a
VAX-11/780 running 4.3BSD

• Learned a bit about TensorFlow and how it works on
Android

• Send a couple of PRs, when I was learning to use
TensorFlow to classify image
TensorFlow
• “An open-source software library for Machine
Intelligence”

• An open-source library for deep neural network
learning

• https://www.tensorflow.org/
https://github.com/tensorflow/tensorflow/graphs/contributors
• My first impression of TensorFlow

• Hey, that’s scary. How come you see some many
compiler warnings when building such popular open-
source library

• think about: WebKit, llvm/clang, linux kernel, etc.

• Oh, Google has yet another build system and it’s
written in Java
How TensorFlow Works
• TensorFlow is dataflow programming

• a program modeled as an acyclic
directional graph

• node/vertex: operation

• edge: flow of data (tensor in
TensorFlow)

• operations don’t execute right
away

• operations execute when data
are available to ALL inputs
In [1]: import tensorflow as tf
In [2]: node1 = tf.constant(3.0)
...: node2 = tf.constant(4.0)
...: print(node1, node2)
...:
(<tf.Tensor 'Const:0' shape=()
dtype=float32>, <tf.Tensor 'Const_1:0'
shape=() dtype=float32>)
In [3]: sess = tf.Session()
...: print(sess.run([node1,
node2]))
...:
[3.0, 4.0]
In [4]: a = tf.add(3, 4)
...: print(sess.run(a))
...:
7
TensorFlow on Android
• https://www.tensorflow.org/mobile/

• ongoing effort “to reduce the code footprint,
and supporting quantization and lower
precision arithmetic that reduce model size”

• Looks good

• some questions

• how to build ARMv8 binaries, with latest
NDK?

• --cxxopt="-std=c++11" --cxxopt="-
Wno-c++11-narrowing" --cxxopt=“-
DTENSORFLOW_DISABLE_META”

• Inception models (e.g., V3) are relatively slow
on Android devices

• is there any benchmark or profiling tool?

• it turns out YES
9
• bazel build -c opt --linkopt="-ldl" --cxxopt="-std=c++11" --
cxxopt="-Wno-c++11-narrowing" --cxxopt="-
DTENSORFLOW_DISABLE_META" --crosstool_top=//external:android/
crosstool --cpu=arm64-v8a --host_crosstool_top=@bazel_tools//
tools/cpp:toolchain //tensorflow/examples/
android:tensorflow_demo --fat_apk_cpu=arm64-v8a
• bazel build -c opt --cxxopt="-std=c++11" --cxxopt="-
DTENSORFLOW_DISABLE_META" --crosstool_top=//external:android/
crosstool --cpu=arm64-v8a --host_crosstool_top=@bazel_tools//
tools/cpp:toolchain //tensorflow/examples/
android:tensorflow_demo --fat_apk_cpu=arm64-v8a
• in case you wanna know how to do it for with older NDK
• The TensorFlow benchmark
can benchmark a compute
graph and its individual
options

• both on desktop and
Android
• however, it doesn't deal with
real input(s)
• I saw label_image when reading an article on
quantization

• label_image didn't build for Android

• still image decoders (jpg, png, and gif) are
not included

• So,

• made it run

• added a quick and dirty BMP decider

• To hack more quickly (compiling TensorFlow
on MT8173 board running Debian is slow), I
wrote a Python script to mimic what the C++
program does

[1] https://github.com/tensorflow/tensorflow/
tree/master/tensorflow/examples/label_image
Quantization
https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-
googles-first-tensor-processing-unit-tpu
https://www.tensorflow.org/performance/quantization
https://www.tensorflow.org/performance/quantization
Quantizated nodes
label_image
flounder:/data/local/tmp $ ./label_image --graph=inception_v3_2016_08_28_frozen.pb --
image=grace_hopper.bmp --labels=imagenet_slim_labels.txt
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : main.cc:250 military uniform (653): 0.834119
native : main.cc:250 mortarboard (668): 0.0196274
native : main.cc:250 academic gown (401): 0.00946237
native : main.cc:250 pickelhaube (716): 0.00757228
native : main.cc:250 bulletproof vest (466): 0.0055856
flounder:/data/local/tmp $ ./label_image --graph=quantized_graph.pb --image=grace_hopper.bmp --
labels=imagenet_slim_labels.txt
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : main.cc:250 military uniform (653): 0.930771
native : main.cc:250 mortarboard (668): 0.00730017
native : main.cc:250 bulletproof vest (466): 0.00365008
native : main.cc:250 pickelhaube (716): 0.00365008
native : main.cc:250 academic gown (401): 0.00365008
gemmlowp
• GEMM (GEneral Matrix Multiplication)

• The Basic Linear Algebra Subprograms (BLAS) are routines that provide standard building blocks
for performing basic vector and matrix operations

• The Level 1 BLAS (1979) perform scalar, vector and vector-vector operations,

• the Level 2 BLAS (1988) perform matrix-vector operations, and 

• the Level 3 BLAS (1990) perform matrix-matrix operations: {S,D,C,Z}GEMM and others

• Lowp: low-precision

• less than single precision floating point numbers (< 32-bit), well, actually, "low-precision" in
gemmlowp means that the input and output matrix entries are integers on at most 8 bits

• Why GEMM

• Optimized

• FC, Convolution (im2col, see next page)
https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
Quantization is tricky
• Yes, we see https://www.tensorflow.org/performance/quantization

• tensorflow/tools/quantization/

• There are others utilities

• tensorflow/tools/graph_transforms/

• Inception V3 model,

• Floating point numbers

./benchmark_model --output_layer=InceptionV3/Predictions/Reshape_1 —input_layer_shape=1,299,299,3
• avg: around 840 ms for a 299x299x3 photo 

• Quantized one

./benchmark_model --graph=quantized_graph.pb --output_layer=InceptionV3/Predictions/Reshape_1 --input_layer_shape=1,299,299,3
• If we tried a recent one, oops, > 1.2 seconds
Current status of TF
• Well, public status

• Google does have internal branches, during review process of BMP decoder, I ran into one

• CPU ARMv7 and ARMv8

• Q hexagon DSP, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/hvx

• Eigen and gemmlowp

• Basic XLA works

• Not all operations are supported

• the ‘name = “mobile_srcs”’ in tensorflow/core/BUILD

• “//tensorflow/core/kernels:android_core_ops", “//tensorflow/core/kernels:android_extended_ops" in tensorflow/
core/kernel/BUILD

• C++ and Java API (the TensorFlow site lists Python, C++, Java, and GO now)

• I am far away from Java, don't know how good the API is

• “A word of caution: the APIs in languages other than Python are not yet covered by the API stability promises."

• You may find something like RSTensorFlow and tf-coriander, but AFAICT they are far away from complete
Arch including distributed
training
• The architecture figure of
TensorFlow show important
components, including
distributed stuff
https://www.tensorflow.org/extend/architecture
TensorFlow Architecture
AndroidNN is coming to town
Android Neural Network API
• New API for Neural Network

• Being added to the Android
framework

• Wraps hardware accelerators
(GPU, DSP, ISP, etc.)
from Google I/O 2017 video
• New TensorFlow runtime
• Optimized for mobile and
embedded apps

• Runs TensorFlow models on
device

• Leverage Android NN API

• Soon to be open sourced
from Google I/O 2017 video
Comparing with CoreML
stack
• No GPU/GPGPU support yet.
Hopefully, Android NN will
help.

• Albeit Google is so good at
ML/DL and various
applications, we don’t see
good application framework(s)
on Android yet.
Simple CoreML Exercise
• Simple app to use InceptionV3 to classify image from
Photo roll or camera 

• in Objective-C 

• in Swift

• Work continuously on camera

• in Objective-C

• in Swift
Depthwise Separable Convolution
• CNNs with depthwise separable convolution such as Mobilenet [1]
changed almost everything

• Depthwise separable convolution “factorize” a standard convolution
into a depthwise convolution and a 1 × 1 convolution called a
pointwise convolution. Thus it greatly reduces computation
complexity.

• Depthwise separable convolution is not that that new [2], but pure
depthwise separable convolution-based networks such as Xception
and MobileNet demonstrated its power

[1] https://arxiv.org/abs/1704.04861

[2] L. Sifre. “Rigid-motion scattering for image classification”, PhD thesis, 2014
...M
N
1
1
...
MDK
DK
1
...
M
DK
DK N
depthwise convolution filters
standard convolution filters
1×1 Convolutional Filters (Pointwise Convolution)https://arxiv.org/abs/1704.04861
Depthwise Separable Convolution
MobileNet
• D_K: kernel size

• D_F: input size

• M: input channel size

• N: output channel size
https://arxiv.org/abs/1704.04861
MobileNet on Nexus 9
• “largest” Mobilenet model http://download.tensorflow.org/
models/mobilenet_v1_1.0_224_frozen.tgz

• benchmark_model: ./benchmark_model --
graph=frozen_graph.pb —output_layer=MobilenetV1/
Predictions/Reshape_1

• around 120 ms

• Smallest one

• mobilenet_v1_0.25_128: ~25 ms
flounder:/data/local/tmp $ ./label_image --graph=mobilenet_10_224.pb --image=grace_hopper.bmp --
labels=imagenet_slim_labels.txt --output_layer=MobilenetV1/Predictions/Reshape_1 --input_width=224 --
input_height=224
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : main.cc:250 military uniform (653): 0.871238
native : main.cc:250 bow tie (458): 0.0575709
native : main.cc:250 ping-pong ball (723): 0.0113924
native : main.cc:250 suit (835): 0.0110482
native : main.cc:250 bearskin (440): 0.00586033
flounder:/data/local/tmp $ ./label_image --graph=mobilenet_025_128.pb --image=grace_hopper.bmp --
labels=imagenet_slim_labels.txt --output_layer=MobilenetV1/Predictions/Reshape_1 --input_width=128 --
input_height=128
can't determine number of CPU cores: assuming 4
can't determine number of CPU cores: assuming 4
native : main.cc:250 suit (835): 0.310988
native : main.cc:250 bow tie (458): 0.197784
native : main.cc:250 military uniform (653): 0.121169
native : main.cc:250 academic gown (401): 0.0309299
native : main.cc:250 mortarboard (668): 0.0242411
Recap
• TensorFlow may not be great on Android yet

• New techniques and NN models are changing status quo

• Android NN, XLA, MobileNet

• big.LITTLE and other system software optimization may
still be needed
Questions?
Backup
MobileNet on iPhone
• Find a Caffe model, e.g., the one

• Or, use a converted one

More Related Content

What's hot

A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
Koan-Sin Tan
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
Koan-Sin Tan
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
Andrés Leonardo Martinez Ortiz
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
Koan-Sin Tan
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
Domino Data Lab
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel Programming
Andreas Schreiber
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Chris Fregly
 
Is Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic GascIs Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic Gasc
Pôle Systematic Paris-Region
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Chris Fregly
 
Asynchronous I/O in Python 3
Asynchronous I/O in Python 3Asynchronous I/O in Python 3
Asynchronous I/O in Python 3
Feihong Hsu
 
The Flow of TensorFlow
The Flow of TensorFlowThe Flow of TensorFlow
The Flow of TensorFlow
Jeongkyu Shin
 
Parallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysisParallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysis
Manojit Nandi
 
C++ Actor Model - You’ve Got Mail ...
C++ Actor Model - You’ve Got Mail ...C++ Actor Model - You’ve Got Mail ...
C++ Actor Model - You’ve Got Mail ...
Gianluca Padovani
 
MTaulty_DevWeek_Parallel
MTaulty_DevWeek_ParallelMTaulty_DevWeek_Parallel
MTaulty_DevWeek_Parallel
ukdpe
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Muralidharan Deenathayalan
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System Discussion
CherryBerry2
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Holden Karau
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in Python
Sarah Mount
 
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. MultiprocessingBeating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Guy K. Kloss
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
Holden Karau
 

What's hot (20)

A Peek into TFRT
A Peek into TFRTA Peek into TFRT
A Peek into TFRT
 
TFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU DelegatesTFLite NNAPI and GPU Delegates
TFLite NNAPI and GPU Delegates
 
Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU Tensorflow 2.0 and Coral Edge TPU
Tensorflow 2.0 and Coral Edge TPU
 
Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020Running TFLite on Your Mobile Devices, 2020
Running TFLite on Your Mobile Devices, 2020
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
OpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel ProgrammingOpenCL - The Open Standard for Heterogeneous Parallel Programming
OpenCL - The Open Standard for Heterogeneous Parallel Programming
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
Is Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic GascIs Python still production ready ? Ludovic Gasc
Is Python still production ready ? Ludovic Gasc
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
 
Asynchronous I/O in Python 3
Asynchronous I/O in Python 3Asynchronous I/O in Python 3
Asynchronous I/O in Python 3
 
The Flow of TensorFlow
The Flow of TensorFlowThe Flow of TensorFlow
The Flow of TensorFlow
 
Parallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysisParallel Programming in Python: Speeding up your analysis
Parallel Programming in Python: Speeding up your analysis
 
C++ Actor Model - You’ve Got Mail ...
C++ Actor Model - You’ve Got Mail ...C++ Actor Model - You’ve Got Mail ...
C++ Actor Model - You’ve Got Mail ...
 
MTaulty_DevWeek_Parallel
MTaulty_DevWeek_ParallelMTaulty_DevWeek_Parallel
MTaulty_DevWeek_Parallel
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System Discussion
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
 
Message-passing concurrency in Python
Message-passing concurrency in PythonMessage-passing concurrency in Python
Message-passing concurrency in Python
 
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. MultiprocessingBeating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
 

Similar to Tensorflow on Android

Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptx
Janagi Raman S
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
Darshan Patel
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
Poo Kuan Hoong
 
Ml goes fruitful
Ml goes fruitfulMl goes fruitful
Ml goes fruitful
Preeti Negi
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
Jonas Brømsø
 
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsScalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Indrajit Poddar
 
Docker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore MeetupDocker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore Meetup
sangam biradar
 
Programming using C++ - slides.pptx
Programming using C++ - slides.pptxProgramming using C++ - slides.pptx
Programming using C++ - slides.pptx
HeadoftheDepartment
 
"Project Tye to Tie .NET Microservices", Oleg Karasik
"Project Tye to Tie .NET Microservices", Oleg Karasik"Project Tye to Tie .NET Microservices", Oleg Karasik
"Project Tye to Tie .NET Microservices", Oleg Karasik
Fwdays
 
DotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NETDotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NET
Alberto Diaz Martin
 
All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive ui
Paul van Zyl
 
Python Open CV
Python Open CVPython Open CV
Python Open CV
Tarun Bamba
 
Deep Dive Time Series Anomaly Detection in Azure with dotnet
Deep Dive Time Series Anomaly Detection in Azure with dotnetDeep Dive Time Series Anomaly Detection in Azure with dotnet
Deep Dive Time Series Anomaly Detection in Azure with dotnet
Marco Parenzan
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
Jonas Brømsø
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?
GetInData
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
Edge AI and Vision Alliance
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
Tensor flow
Tensor flowTensor flow
Tensor flow
Nikhil Krishna Nair
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
Ndjido Ardo BAR
 
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
Lean IT Consulting
 

Similar to Tensorflow on Android (20)

Introduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptxIntroduction to Tensor Flow-v1.pptx
Introduction to Tensor Flow-v1.pptx
 
Neural Networks with Google TensorFlow
Neural Networks with Google TensorFlowNeural Networks with Google TensorFlow
Neural Networks with Google TensorFlow
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Ml goes fruitful
Ml goes fruitfulMl goes fruitful
Ml goes fruitful
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
 
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsScalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
 
Docker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore MeetupDocker + Tenserflow + GOlang - Golang singapore Meetup
Docker + Tenserflow + GOlang - Golang singapore Meetup
 
Programming using C++ - slides.pptx
Programming using C++ - slides.pptxProgramming using C++ - slides.pptx
Programming using C++ - slides.pptx
 
"Project Tye to Tie .NET Microservices", Oleg Karasik
"Project Tye to Tie .NET Microservices", Oleg Karasik"Project Tye to Tie .NET Microservices", Oleg Karasik
"Project Tye to Tie .NET Microservices", Oleg Karasik
 
DotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NETDotNet Conf Madrid 2019 - Whats New in ML.NET
DotNet Conf Madrid 2019 - Whats New in ML.NET
 
All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive ui
 
Python Open CV
Python Open CVPython Open CV
Python Open CV
 
Deep Dive Time Series Anomaly Detection in Azure with dotnet
Deep Dive Time Series Anomaly Detection in Azure with dotnetDeep Dive Time Series Anomaly Detection in Azure with dotnet
Deep Dive Time Series Anomaly Detection in Azure with dotnet
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?
 
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f..."Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
"Solving Vision Tasks Using Deep Learning: An Introduction," a Presentation f...
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Tensor flow
Tensor flowTensor flow
Tensor flow
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
 
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
Continuos Integration and Delivery: from Zero to Hero with TeamCity, Docker a...
 

More from Koan-Sin Tan

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
Koan-Sin Tan
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
Koan-Sin Tan
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Koan-Sin Tan
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Koan-Sin Tan
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
Koan-Sin Tan
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
Koan-Sin Tan
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
Koan-Sin Tan
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
Koan-Sin Tan
 

More from Koan-Sin Tan (8)

running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Exploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source ToolsExploring Your Apple M1 devices with Open Source Tools
Exploring Your Apple M1 devices with Open Source Tools
 
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source ToolExploring Thermal Related Stuff in iDevices using Open-Source Tool
Exploring Thermal Related Stuff in iDevices using Open-Source Tool
 
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
Why You Cannot Use Neural Engine to Run Your NN Models on A11 Devices?
 
SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016SoC Idling for unconf COSCUP 2016
SoC Idling for unconf COSCUP 2016
 
A peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk UserA peek into Python's Metaclass and Bytecode from a Smalltalk User
A peek into Python's Metaclass and Bytecode from a Smalltalk User
 
Android Wear and the Future of Smartwatch
Android Wear and the Future of SmartwatchAndroid Wear and the Future of Smartwatch
Android Wear and the Future of Smartwatch
 
Smalltalk and ruby - 2012-12-08
Smalltalk and ruby  - 2012-12-08Smalltalk and ruby  - 2012-12-08
Smalltalk and ruby - 2012-12-08
 

Recently uploaded

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 

Recently uploaded (20)

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 

Tensorflow on Android

  • 1. TensorFlow on Android “freedom” Koan-Sin Tan freedom@computer.org COSCUP 2017, Taipei, Taiwan
  • 2. Who Am I • A software engineer working for a SoC company • An old open source user, learned to use Unix on a VAX-11/780 running 4.3BSD • Learned a bit about TensorFlow and how it works on Android • Send a couple of PRs, when I was learning to use TensorFlow to classify image
  • 3. TensorFlow • “An open-source software library for Machine Intelligence” • An open-source library for deep neural network learning • https://www.tensorflow.org/
  • 5.
  • 6.
  • 7. • My first impression of TensorFlow • Hey, that’s scary. How come you see some many compiler warnings when building such popular open- source library • think about: WebKit, llvm/clang, linux kernel, etc. • Oh, Google has yet another build system and it’s written in Java
  • 8. How TensorFlow Works • TensorFlow is dataflow programming • a program modeled as an acyclic directional graph • node/vertex: operation • edge: flow of data (tensor in TensorFlow) • operations don’t execute right away • operations execute when data are available to ALL inputs In [1]: import tensorflow as tf In [2]: node1 = tf.constant(3.0) ...: node2 = tf.constant(4.0) ...: print(node1, node2) ...: (<tf.Tensor 'Const:0' shape=() dtype=float32>, <tf.Tensor 'Const_1:0' shape=() dtype=float32>) In [3]: sess = tf.Session() ...: print(sess.run([node1, node2])) ...: [3.0, 4.0] In [4]: a = tf.add(3, 4) ...: print(sess.run(a)) ...: 7
  • 9. TensorFlow on Android • https://www.tensorflow.org/mobile/ • ongoing effort “to reduce the code footprint, and supporting quantization and lower precision arithmetic that reduce model size” • Looks good • some questions • how to build ARMv8 binaries, with latest NDK? • --cxxopt="-std=c++11" --cxxopt="- Wno-c++11-narrowing" --cxxopt=“- DTENSORFLOW_DISABLE_META” • Inception models (e.g., V3) are relatively slow on Android devices • is there any benchmark or profiling tool? • it turns out YES 9
  • 10. • bazel build -c opt --linkopt="-ldl" --cxxopt="-std=c++11" -- cxxopt="-Wno-c++11-narrowing" --cxxopt="- DTENSORFLOW_DISABLE_META" --crosstool_top=//external:android/ crosstool --cpu=arm64-v8a --host_crosstool_top=@bazel_tools// tools/cpp:toolchain //tensorflow/examples/ android:tensorflow_demo --fat_apk_cpu=arm64-v8a • bazel build -c opt --cxxopt="-std=c++11" --cxxopt="- DTENSORFLOW_DISABLE_META" --crosstool_top=//external:android/ crosstool --cpu=arm64-v8a --host_crosstool_top=@bazel_tools// tools/cpp:toolchain //tensorflow/examples/ android:tensorflow_demo --fat_apk_cpu=arm64-v8a • in case you wanna know how to do it for with older NDK
  • 11. • The TensorFlow benchmark can benchmark a compute graph and its individual options • both on desktop and Android • however, it doesn't deal with real input(s)
  • 12. • I saw label_image when reading an article on quantization • label_image didn't build for Android • still image decoders (jpg, png, and gif) are not included • So, • made it run • added a quick and dirty BMP decider • To hack more quickly (compiling TensorFlow on MT8173 board running Debian is slow), I wrote a Python script to mimic what the C++ program does [1] https://github.com/tensorflow/tensorflow/ tree/master/tensorflow/examples/label_image
  • 16. label_image flounder:/data/local/tmp $ ./label_image --graph=inception_v3_2016_08_28_frozen.pb -- image=grace_hopper.bmp --labels=imagenet_slim_labels.txt can't determine number of CPU cores: assuming 4 can't determine number of CPU cores: assuming 4 native : main.cc:250 military uniform (653): 0.834119 native : main.cc:250 mortarboard (668): 0.0196274 native : main.cc:250 academic gown (401): 0.00946237 native : main.cc:250 pickelhaube (716): 0.00757228 native : main.cc:250 bulletproof vest (466): 0.0055856 flounder:/data/local/tmp $ ./label_image --graph=quantized_graph.pb --image=grace_hopper.bmp -- labels=imagenet_slim_labels.txt can't determine number of CPU cores: assuming 4 can't determine number of CPU cores: assuming 4 native : main.cc:250 military uniform (653): 0.930771 native : main.cc:250 mortarboard (668): 0.00730017 native : main.cc:250 bulletproof vest (466): 0.00365008 native : main.cc:250 pickelhaube (716): 0.00365008 native : main.cc:250 academic gown (401): 0.00365008
  • 17. gemmlowp • GEMM (GEneral Matrix Multiplication) • The Basic Linear Algebra Subprograms (BLAS) are routines that provide standard building blocks for performing basic vector and matrix operations • The Level 1 BLAS (1979) perform scalar, vector and vector-vector operations, • the Level 2 BLAS (1988) perform matrix-vector operations, and • the Level 3 BLAS (1990) perform matrix-matrix operations: {S,D,C,Z}GEMM and others • Lowp: low-precision • less than single precision floating point numbers (< 32-bit), well, actually, "low-precision" in gemmlowp means that the input and output matrix entries are integers on at most 8 bits • Why GEMM • Optimized • FC, Convolution (im2col, see next page)
  • 19. Quantization is tricky • Yes, we see https://www.tensorflow.org/performance/quantization • tensorflow/tools/quantization/ • There are others utilities • tensorflow/tools/graph_transforms/ • Inception V3 model, • Floating point numbers ./benchmark_model --output_layer=InceptionV3/Predictions/Reshape_1 —input_layer_shape=1,299,299,3 • avg: around 840 ms for a 299x299x3 photo • Quantized one ./benchmark_model --graph=quantized_graph.pb --output_layer=InceptionV3/Predictions/Reshape_1 --input_layer_shape=1,299,299,3 • If we tried a recent one, oops, > 1.2 seconds
  • 20.
  • 21. Current status of TF • Well, public status • Google does have internal branches, during review process of BMP decoder, I ran into one • CPU ARMv7 and ARMv8 • Q hexagon DSP, https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/hvx • Eigen and gemmlowp • Basic XLA works • Not all operations are supported • the ‘name = “mobile_srcs”’ in tensorflow/core/BUILD • “//tensorflow/core/kernels:android_core_ops", “//tensorflow/core/kernels:android_extended_ops" in tensorflow/ core/kernel/BUILD • C++ and Java API (the TensorFlow site lists Python, C++, Java, and GO now) • I am far away from Java, don't know how good the API is • “A word of caution: the APIs in languages other than Python are not yet covered by the API stability promises." • You may find something like RSTensorFlow and tf-coriander, but AFAICT they are far away from complete
  • 22. Arch including distributed training • The architecture figure of TensorFlow show important components, including distributed stuff https://www.tensorflow.org/extend/architecture
  • 25. Android Neural Network API • New API for Neural Network • Being added to the Android framework • Wraps hardware accelerators (GPU, DSP, ISP, etc.) from Google I/O 2017 video
  • 26. • New TensorFlow runtime • Optimized for mobile and embedded apps • Runs TensorFlow models on device • Leverage Android NN API • Soon to be open sourced from Google I/O 2017 video
  • 27. Comparing with CoreML stack • No GPU/GPGPU support yet. Hopefully, Android NN will help. • Albeit Google is so good at ML/DL and various applications, we don’t see good application framework(s) on Android yet.
  • 28. Simple CoreML Exercise • Simple app to use InceptionV3 to classify image from Photo roll or camera • in Objective-C • in Swift • Work continuously on camera • in Objective-C • in Swift
  • 29. Depthwise Separable Convolution • CNNs with depthwise separable convolution such as Mobilenet [1] changed almost everything • Depthwise separable convolution “factorize” a standard convolution into a depthwise convolution and a 1 × 1 convolution called a pointwise convolution. Thus it greatly reduces computation complexity. • Depthwise separable convolution is not that that new [2], but pure depthwise separable convolution-based networks such as Xception and MobileNet demonstrated its power [1] https://arxiv.org/abs/1704.04861 [2] L. Sifre. “Rigid-motion scattering for image classification”, PhD thesis, 2014
  • 30. ...M N 1 1 ... MDK DK 1 ... M DK DK N depthwise convolution filters standard convolution filters 1×1 Convolutional Filters (Pointwise Convolution)https://arxiv.org/abs/1704.04861 Depthwise Separable Convolution
  • 31. MobileNet • D_K: kernel size • D_F: input size • M: input channel size • N: output channel size https://arxiv.org/abs/1704.04861
  • 32. MobileNet on Nexus 9 • “largest” Mobilenet model http://download.tensorflow.org/ models/mobilenet_v1_1.0_224_frozen.tgz • benchmark_model: ./benchmark_model -- graph=frozen_graph.pb —output_layer=MobilenetV1/ Predictions/Reshape_1 • around 120 ms • Smallest one • mobilenet_v1_0.25_128: ~25 ms
  • 33. flounder:/data/local/tmp $ ./label_image --graph=mobilenet_10_224.pb --image=grace_hopper.bmp -- labels=imagenet_slim_labels.txt --output_layer=MobilenetV1/Predictions/Reshape_1 --input_width=224 -- input_height=224 can't determine number of CPU cores: assuming 4 can't determine number of CPU cores: assuming 4 native : main.cc:250 military uniform (653): 0.871238 native : main.cc:250 bow tie (458): 0.0575709 native : main.cc:250 ping-pong ball (723): 0.0113924 native : main.cc:250 suit (835): 0.0110482 native : main.cc:250 bearskin (440): 0.00586033 flounder:/data/local/tmp $ ./label_image --graph=mobilenet_025_128.pb --image=grace_hopper.bmp -- labels=imagenet_slim_labels.txt --output_layer=MobilenetV1/Predictions/Reshape_1 --input_width=128 -- input_height=128 can't determine number of CPU cores: assuming 4 can't determine number of CPU cores: assuming 4 native : main.cc:250 suit (835): 0.310988 native : main.cc:250 bow tie (458): 0.197784 native : main.cc:250 military uniform (653): 0.121169 native : main.cc:250 academic gown (401): 0.0309299 native : main.cc:250 mortarboard (668): 0.0242411
  • 34. Recap • TensorFlow may not be great on Android yet • New techniques and NN models are changing status quo • Android NN, XLA, MobileNet • big.LITTLE and other system software optimization may still be needed
  • 37. MobileNet on iPhone • Find a Caffe model, e.g., the one • Or, use a converted one
  • 38. label_image in Python • I am not familiar with Protocol Buffer