The Flow of TensorFlow
Jeongkyu Shin
Lablup Inc.
2017. 11. 12 / GDG DevFest Nanjing 2017
2017. 11. 19 / GDG DevFest Seoul 2017
Descript.ion
§ CEO / Co-founder, Lablup Inc.
§ Develops Backend.AI
§ Open-source devotee
§ Google Developer Experts (Machine Learning)
§ Principal Researcher, KOSSLab., Korea
§ Textcube open-source project maintainer (10th anniversary!)
§ Physicist / Neuroscientist
§ Adj. professor (Dept. of CSE, Hanyang Univ.)
§ Ph.D. in Statistical Physics (complex systems / computational neuroscience)
Jeongkyu Shin / @inureyes
Machine Learning Era: All came from dust
§ Machine learning
§ "Field of study that gives computers the ability to learn without being explicitly programmed" ‒ Arthur Samuel (1959)
§ "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." ‒ Tom Mitchell (1997)
§ Types of Machine Learning
§ Supervised learning
§ Unsupervised learning
§ Reinforcement learning
§ Recommender system
Artificial Intelligence
§ Definition
§ Alan Turing, "The Imitation Game" (1950) => Turing test
§ John McCarthy, Dartmouth Artificial Intelligence Conference (1956)
§ Information Processing Language (1955)
§ From axiom to theory
§ Heuristics to reduce probing space
§ Birth of the LISP programming language
§ First approach: IF-THEN rules
§ Probe every possible case and choose the pathway with the highest fitness
Artificial Neural Network: Basics
§ Effect of layers
A. K. Jain, J. Mao, K. M. Mohiuddin (1996), "Artificial Neural Networks: A Tutorial", IEEE Computer 29
Winter was coming
§ First winter (1970s)
§ Complex problems: too difficult to construct logic models (by hand)
§ Second winter (1990s)
§ Overfitting problem → pre-training, supervised backpropagation → dropout (2013)
§ Convergence → vanishing gradient problem (1991)
§ Divergence problem → weight decay / sparsity regularization
§ Slow training speed → IT evolution, mini-batches
§ And the spring: Environmental changes open the gate
§ Rise of big-data
§ Phenomenal computation cost reduction
Deep Learning: flower of the golden era
§ What if you have enough money to do (formerly) crazy experiments? Like:
§ Increase the number of hidden layers
§ Pour in an unlimited amount of data
§ Breakthrough of deep learning
§ Geoffrey Hinton (2005)
§ Andrew Ng (2012)
§ Convolutional Neural Network
§ Pooling layers + shared weights
§ Recurrent Neural Network
§ Feedforward routine with (long/short-term) memory
§ Deep Belief Network
§ Multipartite neural network with a generative model
§ Deep Q-Network
§ Using deep learning for reinforcement learning
AlphaGo as a mixture of Machine Learning techniques
§ Reducing search space
§ Breadth reduction
§ And depth reduction
§ Prediction
§ 13-layer convolutional NN
§ Value network
§ Policy network
§ Principal variation
Flow of TensorFlow
Still, less than two years have passed.
TensorFlow
§ Open-source software library for machine learning across a range of tasks
§ Developed by Google (Dec. 2015~)
§ Characteristics
§ Python API (like Theano)
§ Since 1.0, TensorFlow has expanded native API bindings to Java, C, etc.
§ Supports
§ Linux, macOS
§ NVidia GPUs (Pascal and above)
Before TensorFlow
§ User-friendly Deep-learning toolkits
§ Caffe (2012)
§ Brought a generalized programming method to researchers
§ Provides common NN blocks
§ Configuration file + training kernel program
§ Theano (2013~2017)
§ User code / configuration part is written in Python
§ Keras (2015~)
§ Meta-framework for Deep Learning programming
§ Supports various backends:
§ Theano (default) / TensorFlow (2016~) / MXNet (2017~) / CNTK (WIP)
§ ETC
§ Paddle, Chainer, DL4J…
TensorFlow: Summary
§ Statistics
§ More than 24,000 commits since Dec. 2015
§ More than 1,140 committers
§ More than 24,000 forks over the last 12 months
§ Dominates Bootstrap! (15,000 forks)
§ More than 6,400 TensorFlow-related repositories created on GitHub
§ Current
§ Complete ML model prototyping
§ Distributed training
§ CPU / GPU / TPU / Mobile support
§ TensorFlow Serving
§ Enables easier inference / model serving
§ XLA compiler (1.0~)
§ Supports various environments / speedups
§ Keras API Support (1.2~)
§ High-level programming API
§ Keras-compatible API
§ Eager Execution (1.4~)
§ Interactive mode of TensorFlow
§ Treats TensorFlow Python code as real Python code
https://www.infoworld.com/article/3233283/javascript/at-github-javascript-rules-in-usage-tensorflow-leads-in-forks.html
TensorFlow: Summary
§ 2016 → 2017 feature timeline:
⏤ TensorFlow Serving
⏤ Keras API
⏤ Eager Execution
⏤ TensorFlow Lite
⏤ XLA
⏤ OpenCL w/ ComputeCpp
⏤ Distributed TensorFlow
⏤ Multi-GPU support
⏤ Mobile TensorFlow
⏤ TensorFlow Datasets
⏤ SKLearn (contrib)
⏤ TensorFlow Slim
⏤ SyntaxNet
⏤ DRAGNN
⏤ TFLearn (contrib)
⏤ TensorFlow TimeSeries
How TensorFlow works
§ CPU
§ Multiprocessor
§ AVX-based acceleration
§ On-chip GPU
§ OpenMP
§ GPU
§ CUDA (NVidia) ➜ cuDNN
§ OpenCL (AMD) ➜ ComputeCPP / ROCm
§ TPU (1st, 2nd gen.)
§ ASIC for accelerating matrix calculation
§ In-house development by Google
https://www.tensorflow.org/get_started/graph_viz
How TensorFlow works
§ Python but not Python
§ The Python API is the default API for TensorFlow
§ However, the TF core is written in C++, with the cuDNN library (for GPU acceleration)
§ Computation Graph
§ User TF code is not code itself
§ It is a configuration that generates the computation graph
§ Session
§ Creates a computation graph and runs the training using the C++ core
§ Tedious debugging process
Google I/O 2017 / TensorFlow Frontiers
How TensorFlow works
TensorFlow Features
§ Recent TensorFlow core features
§ TensorFlow Estimators
§ Included in 1.4 (Oct. 2017) / high-level API for using and modeling well-known estimators (see the sketch after this list)
§ TensorFlow Serving (independent project)
§ TensorFlow Keras-compatible API (Sep. 2017)
§ Included in 1.3 (Sep. 2017)
§ TensorFlow Datasets
§ Included in 1.4 (Oct. 2017)
§ Upcoming/testing TensorFlow core features
§ TensorFlow eager execution
§ Introduced in 1.4 (Oct. 2017)
§ TensorFlow Lite
§ (Work-in-progress)
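To make the Estimators item above concrete: a minimal sketch of the premade-estimator workflow, assuming TF ≥ 1.3; the feature name and toy data are illustrative, not from the talk.

import numpy as np
import tensorflow as tf

# One numeric feature column feeding a premade linear regressor.
feature_columns = [tf.feature_column.numeric_column('x', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# Hypothetical toy data, fed through the bundled numpy input helper.
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': np.array([1., 2., 3., 4.])},
    y=np.array([0., -1., -2., -3.]),
    batch_size=2, num_epochs=None, shuffle=True)

estimator.train(input_fn=input_fn, steps=100)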
XLA: linear algebra compiler for TensorFlow
Google I/O 2017 / TensorFlow Frontiers
TensorFlow Serving
§ Serving system for inference service
§ Components
§ Servables
§ Loaders
§ Managers
§ Features
§ Model building
§ Model versioning
§ Model saving / loading (sketched below)
§ Online inference support with RPC
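As a hedged illustration of the model saving workflow above: the sketch below exports a trivial graph in the SavedModel format that TensorFlow Serving loads; the stand-in model and export path are assumptions for the example.

import tensorflow as tf

# Stand-in "model": one variable scaling the input.
x = tf.placeholder(tf.float32, shape=[None, 1], name='x')
w = tf.Variable(2.0, name='w')
y = tf.multiply(x, w, name='y')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Export version 1 of the model to a directory TF Serving can watch.
    builder = tf.saved_model.builder.SavedModelBuilder('/tmp/demo_model/1')
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()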
Keras-compatible API for TensorFlow
§ Keras ( https://keras.io )
§ High-level API
§ Focus on user experience
§ “Deep learning accessible to everyone”
§ History
§ Announced in Feb. 2017
§ Bundled as a contrib package since TF 1.2
§ Official core package since 1.4
§ Characteristics
§ “Simplified workflow for TensorFlow users, more powerful features to Keras users”
§ Most Keras code can be used on TensorFlow (by changing keras. to tf.keras.)
§ Can mix Keras code with TensorFlow code (see the sketch below)
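A minimal sketch of that workflow (the layer sizes and toy data are assumptions, not from the talk): the familiar Keras model-building API, addressed through the tf.keras namespace.

import numpy as np
import tensorflow as tf

# The usual Keras workflow, under tf.keras instead of keras.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='sgd', loss='mse')

# Hypothetical toy data, just to make the example runnable.
x = np.random.rand(100, 10).astype('float32')
y = np.random.rand(100, 1).astype('float32')
model.fit(x, y, epochs=1, batch_size=16)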
TensorFlow Datasets
§ New way to generate data pipelines
§ Dataset classes
§ TextLineDataset
§ TFRecordDataset
§ FixedLengthRecordDataset
§ Iterator (consumed in the sketch after the example below)
Example: Decoding and resizing image data
import tensorflow as tf

# Reads an image from a file, decodes it into a dense tensor, and resizes it
# to a fixed shape.
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_image(image_string)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label

# A vector of filenames.
filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...])

# `labels[i]` is the label for the image in `filenames[i]`.
labels = tf.constant([0, 37, ...])

dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
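The Iterator piece from the list above then consumes this pipeline; a minimal sketch, assuming graph mode and real paths in place of the placeholder filenames:

# One-shot iterator over the dataset defined above.
iterator = dataset.make_one_shot_iterator()
next_image, next_label = iterator.get_next()

with tf.Session() as sess:
    image, label = sess.run([next_image, next_label])  # first (image, label) pair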
Eager execution
§ Announced on Oct. 30, 2017
§ Makes TensorFlow execute operations immediately
§ Returns concrete values
§ Provides
§ A NumPy-like library for numerical computation
§ Support for GPU acceleration and automatic differentiation
§ A flexible platform for machine learning research and experiments
§ Advantages
§ Python debugger tools
§ Immediate error reporting
§ Easy control flow
§ Python data structures
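The eager-mode snippets in the examples that follow assume eager execution has been switched on once, at program startup; in TF 1.4 it still lives in contrib. A minimal sketch:

import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Must run before any other TensorFlow operation.
tfe.enable_eager_execution()

print(tf.add(1, 2))  # tf.Tensor(3, shape=(), dtype=int32) -- no Session needed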
Example: Session
# Graph mode: building the op only returns a symbolic Tensor.
x = tf.placeholder(tf.float32, shape=[1, 1])
m = tf.matmul(x, x)
print(m)
# Tensor("MatMul:0", shape=(1, 1), dtype=float32)

with tf.Session() as sess:
    m_out = sess.run(m, feed_dict={x: [[2.]]})
    print(m_out)
# [[4.]]

# Eager mode: the same computation runs immediately.
x = [[2.]]
m = tf.matmul(x, x)
print(m)
# tf.Tensor([[4.]], dtype=float32, shape=(1, 1))
Example: Instant error
x = tf.gather([0, 1, 2], 7)
InvalidArgumentError: indices = 7 is not in [0, 3) [Op:Gather]
Example: removing metaprogramming
# Graph mode: a Session is needed just to inspect values.
x = tf.random_uniform([2, 2])
with tf.Session() as sess:
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            print(sess.run(x[i, j]))

# Eager mode: the same loop, no Session required.
x = tf.random_uniform([2, 2])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print(x[i, j])

Eager execution: Python Control Flow
# Ordinary Python control flow, driven by tensor values (Collatz sequence).
a = tf.constant(6)
while not tf.equal(a, 1):
    if tf.equal(a % 2, 0):
        a = a / 2
    else:
        a = 3 * a + 1
    print(a)
# Outputs
tf.Tensor(3, dtype=int32)
tf.Tensor(10, dtype=int32)
tf.Tensor(5, dtype=int32)
tf.Tensor(16, dtype=int32)
tf.Tensor(8, dtype=int32)
tf.Tensor(4, dtype=int32)
tf.Tensor(2, dtype=int32)
tf.Tensor(1, dtype=int32)
Eager execution: Gradients
def square(x):
    return tf.multiply(x, x)  # Or x * x

grad = tfe.gradients_function(square)
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])

print(square(3.))    # tf.Tensor(9., dtype=tf.float32)
print(grad(3.))      # [tf.Tensor(6., dtype=tf.float32)]
print(gradgrad(3.))  # [tf.Tensor(2., dtype=tf.float32)]
Eager execution: Custom Gradients
def log1pexp(x):
    return tf.log(1 + tf.exp(x))

grad_log1pexp = tfe.gradients_function(log1pexp)
print(grad_log1pexp(0.))    # Works fine, prints [0.5]
print(grad_log1pexp(100.))  # [nan] due to numeric instability

@tfe.custom_gradient
def log1pexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1 / (1 + e))
    return tf.log(1 + e), grad

grad_log1pexp = tfe.gradients_function(log1pexp)
# Gradient at x = 0 works as before.
print(grad_log1pexp(0.))    # [0.5]
# And now gradient computation at x = 100 works as well.
print(grad_log1pexp(100.))  # [1.0]
Eager execution: Using GPUs
tf.device() for manual placement:
with tf.device("/gpu:0"):
    x = tf.random_uniform([10, 10])
    y = tf.matmul(x, x)
    # x and y reside in GPU memory
Eager execution: Building Models
The same APIs as graph building (tf.layers, tf.train.Optimizer, tf.data, etc.):

model = tf.layers.Dense(units=1, use_bias=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Define a loss function
def loss(x, y):
    return tf.reduce_mean(tf.square(y - model(x)))
Eager execution: Training Models
Compute and apply gradients:

grad_fn = tfe.implicit_gradients(loss)
for (x, y) in get_next_batch():
    optimizer.apply_gradients(grad_fn(x, y))
Comparison

                         TensorFlow  TFLearn  TF Slim  TF Eager  Keras(TF)  Keras(MXNet)  PyTorch  CNTK  MXNet
Difficulty               ■■■■        ■■■      ■■       ■■        ■■         ■■■           ■        ■■■■  ■■■■
Extensibility            ■■■■        ■■■■     ■■■■     ■■        ■■         ■■            ■        ■■■■  ■■■■
Interactive mode         X           X        X        O         X          X             O        X     X
Multi-CPU (NUMA)         O           O        X        X         O          O             O        O     O
Multi-CPU (cluster)      O           O        O        X         O          O             X        O     O
Multi-GPU (single node)  O           O        O        X         O          O             ?*       O     O
Multi-GPU (cluster)      O           O        O        X         O          O             X        O     O

TF Eager = TF Eager Execution; Keras(TF) / Keras(MXNet) = Keras with TF / MXNet backend
* manual multi-batch
TensorFlow Lite
§ TensorFlow Lite: Embedded TensorFlow
§ No additional environment installation required
§ OS-level hardware acceleration
§ Leverages Android NN
§ XLA-based optimization support
§ Enables binding to various programming languages
§ Developer Preview (4 days ago)
§ Part of Android O-MR1
Google I/O 2017 / Android meets TensorFlow
TensorFlow Lite
§ Format
§ FlatBuffers instead of ProtocolBuffers
§ Provides converter
§ Models
§ InceptionV3
§ MobileNets: vision-specific model family
§ API
§ Java
§ C++
TensorFlow Lite: Why and How
§ Why? Less traffic / faster response
§ Image / OCR, Speech <-> Text, Translation, NLP
§ Motion, GPS and more
§ ML can extract the meaning from raw data
§ Image recognition: Send raw image vs. send detected label
§ Motion detection: Send raw motion vs. send feature vector
§ How? Model compression
§ Graph freezing (sketched below)
§ Graph conversion tools
§ Quantization
§ Weight
§ Calculation
§ Memory mapping
Google I/O 2017 / Android meets TensorFlow
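The graph-freezing step above can be sketched with the stock TF 1.x utility; the toy graph, output node name, and path here are assumptions for illustration.

import tensorflow as tf

# A toy graph: one variable feeding an output node named 'output'.
w = tf.Variable(3.0)
out = tf.multiply(w, 2.0, name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Freezing bakes variables into constants, so the graph ships as one file.
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output'])

with tf.gfile.GFile('/tmp/frozen.pb', 'wb') as f:
    f.write(frozen.SerializeToString())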
Android Neural Network API
§ New APIs for NeuralNet
§ Part of Android Framework
§ Since next Android release
§ Reduces library duplication across apps
§ Supports Hardware acceleration
§ GPU, DSP, ISP, NeuralNet chips, etc.
Google I/O 2017 / Android meets TensorFlow
Flow goes to: market
What is flowing through the stream?
Market: API-based (personalized) deep learning service
§ Service with pre-baked models via API
§ Focuses on fields that do not require real-time responses
§ e.g. Microsoft Azure Cognitive service
§ Pre-trained ANN + personalized data = personalized NN
§ Easy personalization : server-side training
Market: User-side deep learning services
§ Inference with trained models
§ Does not require heavy calculation
§ e.g. ARMv7 with ~512MB / 1GB RAM
§ Toys / light products
§ Smart toys for kidults (kids + adults): self-driving R/C cars / drones
§ Home appliance and controllers
§ IoT + ML
§ Locality : Home (per room), Car, Office, etc.
§ E.g. Smart home resource management systems
Market: Deep Learning service for everyone
§ The digital assistant war
§ Digital assistants (with speakers): gateway to deep learning based services
§ Context extraction + inference + features
§ Echo (Amazon) / Google Home (Google)
§ Microsoft (Cortana in every MS product) / Apple (HomePod)
§ Korea? Also entering the battlefield
§ Naver: Wave / Friends
§ Kakao: Kakao mini
§ SK: Nugu
Flow goes to: tech.
What is flowing through the stream?
Portability and extensibility
§ Training on
§ Mac / Windows
§ GPU server
§ GPU / TPU on Cloud
§ Prediction / Inference using
§ Android / iOS
§ Raspberry Pi and TPU
§ Android Things
Google I/O 2017 / Android meets TensorFlow
Open-source Machine Learning Framework
§ Machine Learning Framework: (almost) open-source
§ Google: TensorFlow (2015~)
§ Microsoft: CNTK (2016~)
§ Amazon: MXNet (2015~)
§ Facebook: Caffe 2 (2017~) / PyTorch (2016~)
§ Baidu: PaddlePaddle (2016~)
§ Why?
§ 2017
§ General goal of new versions: user-friendly syntax
§ The rise of Keras and PyTorch led to TensorFlow's eager execution
Server-side machine learning
§ Machine learning workload characteristics
§ Training
§ Requires ultra-heavy computation resources
§ Need to feed big, indexed data
§ Or (for reinforcement learning) needs a paired model / training environment to give feedback
§ Serving
§ Requires (relatively) light resources:
§ Low CPU cost
§ Middle memory capacity (to load NeuralNet)
TensorFlow: Multiverse
§ TensorFlow AMD GPU acceleration
§ OpenCL with ComputeCPP (Feb. 2017)
§ Accelerates C++ code (Codeplay)
§ Khronos support / SYCL standard
§ Still in early stage
§ Only supports Linux
§ ROCm (AMD) based TensorFlow (Sep. 2017)
§ First open-source HPC/Hyperscale-class platform for GPU computing
§ LLVM based / HCC C++ / GCN compiler
§ https://github.com/ROCmSoftwarePlatform/hiptensorflow
Hand-held machine learning: Why?
§ Issues from real-time models / apps
§ Autopilot
§ Real-time effect on photos / videos
§ Voice recognition
§ Automators
§ Privacy issues
§ Increasing privacy information
§ ETC
§ Leads to network cost reduction
Hand-held machine learning: How?
§ Apple’s approach
§ Keeping user privacy with Differential Privacy
§ Gathers anonymized user data
§ User-specific machine learning models: kept on the phone
§ e.g. Photo face detection / voice recognition / smart keyboard
§ Core ML (iOS 11)
§ Supports machine learning models as functions (.mlmodel format)
§ Google’s approach
§ Ultra-large-scale server-side training using the TPU (2nd gen.)
§ Mobile: Handles data compression and feature extraction (to reduce traffic)
§ On the mobile:
§ Android NeuralNet API (Android O)
§ TensorFlow Lite on Android (Android O)
https://backchannel.com/an-exclusive-look-at-how-ai-and-machine-learning-work-at-apple-8dbfb131932b
Hand-held machine learning: How?
§ Train on server, Serve on smartphone
§ Enough to serve pre-trained models on smartphones
§ Both train and serve on smartphone
§ Keeps privacy / reduces traffic / enables personalization
§ Uses GPUs on recent smartphones
§ Working together
§ Feature extraction / compression / preprocessing ‒ Mobile side
§ Machine Learning model training / updating / streaming advanced models ‒ Server side
Hand-held machine learning: How?
§ TensorFlow
§ Supports both Android and iOS
§ XCode and Android Studio
§ XLA compiler framework since TensorFlow 1.0:
§ Will support diverse languages / environments
§ Also, optimizing for smartphones and tablets
§ MobileNet (Apr. 2017)
§ Efficient Convolutional Neural Networks for Mobile Vision Applications
§ TensorFlow Lite (Nov. 2017): development focus
§ Built-in operators for both quantized models (8-bit int / fixed point) and floating-point models (FP32, FP16)
§ Support for embedded GPUs / ASICs
Browser-side machine learning
§ Machine Learning without hassle
§ Ingredients for machine learning: Computation, Data, Algorithm
§ XLA: provides binary-code level optimization for various environments
§ Do we have cross-platform computation environment?
§ Java?
§ Browser!
§ Recent improvements in web browsers
§ WebGL
§ Unified programming environment for many GPU-enabled machines
§ WebAssembly
§ Binary-level optimization
§ Shipped in every mainstream browser! (just this week)
Convertible NeuralNet format
§ ONNX (Open Neural Network Exchange)
§ Microsoft / Facebook (Sep. 2017)
§ Caffe 2, PyTorch (by Facebook), CNTK (Microsoft)
§ MLMODEL (Core ML model, Machine Learning Model)
§ Apple (Aug. 2017)
§ Caffe, Keras, scikit-learn, LIBSVM (Open Source)
§ Provides Core ML converter / specification
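As a hedged sketch of what "convertible" means in practice (assuming PyTorch with its ONNX exporter; the model choice and filename are illustrative):

import torch
import torchvision

# Export a torchvision model to the ONNX interchange format.
model = torchvision.models.alexnet(pretrained=True)
dummy_input = torch.autograd.Variable(torch.randn(1, 3, 224, 224))
torch.onnx.export(model, dummy_input, "alexnet.onnx")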
Recap
§ Machine Learning / Artificial Intelligence
§ Flow of TensorFlow
§ TensorFlow Serving Project
§ Keras-compatible API
§ Datasets
§ Eager execution
§ TensorFlow Lite
§ Flow goes to
§ More user-friendly toolkits / frameworks
§ API-based / personalized
§ User-side inference / Hand-held ML
§ Convertible Machine Learning Model formats
End!
Thank you for listening
https://www.lablup.ai ‒ Lablup Inc.
https://backend.ai ‒ Backend.AI
https://cloud.backend.ai ‒ Backend.AI Cloud
https://www.codeonweb.com ‒ CodeOnWeb Service
https://github.com/lablup ‒ GitHub repository
