Introduction to TensorFlow 2.x and the Coral Edge TPU hardware. The presentation covers TensorFlow's main features, such as the Sequential and Functional APIs, mobile support with TensorFlow Lite, web support with TensorFlow.js, and Google Cloud support with TFX.
In addition, the presentation introduces the Edge TPU architecture from Coral, including its main hardware features and a description of the compilation flow.
TensorFlow 2.0 and Coral Edge TPU
1. TensorFlow 2.0 for Edge TPU
Programming
April 23rd, 2020
Andrés L. Martínez
@davilagrau
2. Andrés L. Martínez
a.k.a almo
Google Developer Relations
Ecosystem Europe
Zurich, Switzerland
#GDESummit2019
@davilagrau
3. Attention: this is a public online
event.
It might be recorded for
sharing purposes.
If you prefer not to be recorded, please
switch off your mic & camera
4. Code of Conduct: be friendly, be water.
● Be excellent to each other.
● Speak up if you see or hear
something.
● Harassment is not tolerated.
● Practice saying "Yes and" to
each other.
More information, http://bit.ly/2IhF0l3
“I said empty your mind. Be formless,
shapeless, like water. Now you put water into a
cup, it becomes the cup. You put water into a
bottle, it becomes the bottle. You put it in a
teapot, it becomes the teapot. Now water can
flow or it can crash.
Be water, my friend.”
5. Please let us know your questions
slides.app.goo.gl/Ne8pn
7. TensorFlow
What is TensorFlow?
● An end-to-end open source machine
learning platform
● For research and production
● Distributed training and serving
predictions
● Apache 2.0 license
Current Stable Version 2.x
9. Why TensorFlow
Easy model building
Build and train ML models easily
using intuitive high-level APIs like
Keras with eager execution, which
makes for immediate model iteration
and easy debugging.
Robust ML production anywhere
Easily train and deploy models in the
cloud, on-prem, in the browser, or
on-device no matter what language you
use.
Powerful for research
A simple and flexible architecture to
take new ideas from concept to code,
to state-of-the-art models, and to
publication faster.
10. TensorFlow.js
Library for ML in JavaScript
Run existing models
Use off-the-shelf JavaScript models
or convert Python TensorFlow
models to run in the browser or
under Node.js.
Retrain existing models
Retrain pre-existing ML models using
your own data.
Develop ML with JavaScript
Build and train models directly in
JavaScript using flexible and intuitive
APIs.
11. TensorFlow Lite
ML models on mobile and IoT devices
Pick a model
Pick a new model or retrain
an existing one.
Optimize
Quantize by converting
32-bit floats to more
efficient 8-bit integers or
run on GPU.
Convert
Convert a TensorFlow model
into a compressed flat buffer
with the TensorFlow Lite
Converter.
Deploy
Take the compressed .tflite
file and load it into a
mobile or embedded
device.
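The "Optimize" step above maps 32-bit floats onto 8-bit integers. A minimal sketch of the affine (scale and zero-point) scheme in plain Python follows. It illustrates the arithmetic only and is not TensorFlow Lite's implementation; the function names and the sample range [-1.0, 3.0] are assumptions made for this example.

```python
def quantization_params(min_val, max_val, num_bits=8):
    """Return (scale, zero_point) mapping [min_val, max_val] onto unsigned ints."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The range must contain 0.0 so that zero is represented exactly.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = round(qmin - min_val / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to the nearest integer code, clamped to the valid range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float represented by integer code q."""
    return scale * (q - zero_point)


# Illustrative weights spanning the assumed range.
scale, zero_point = quantization_params(-1.0, 3.0)
weights = [-1.0, -0.25, 0.0, 0.5, 3.0]
codes = [quantize(w, scale, zero_point) for w in weights]
restored = [dequantize(q, scale, zero_point) for q in codes]

for w, q, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> {q:3d} -> {r:+.4f}")
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the precision traded for a 4x smaller model.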
12. Coral Architecture - Edge TPU
AI at the edge
End-to-end AI
infrastructure
High performance in a
small physical and
power footprint.
Co-design of AI
hardware, software
and algorithms
A broad range of
applications
An open, end-to-end infrastructure for deploying AI solutions
13. TensorFlow Extended (TFX)
Deploying production ML pipelines
TensorFlow Data Validation
TensorFlow Data Validation
(TFDV) helps developers
understand, validate, and
monitor their ML data at
scale.
TensorFlow Serving
Machine Learning serving
systems, supporting model
versioning and multiple
models, experimenting via A/B
testing, while ensuring high
throughput with low latency.
TensorFlow Transform
Preprocessing data into a
suitable format, converting
between formats, tokenizing
and stemming text and
forming vocabularies, etc.
TensorFlow Model Analysis
TensorFlow Model Analysis
(TFMA) enables developers
to compute and visualize
evaluation metrics for their
models.
14. Become an expert in machine learning
Coding skills: Building ML models involves much more
than just knowing ML concepts—it requires coding in
order to do the data management, parameter tuning, and
parsing results needed to test and optimize your model.
Math and stats: ML is a math heavy discipline, so if you
plan to modify ML models or build new ones from
scratch, familiarity with the underlying math concepts is
crucial to the process.
ML theory: Knowing the basics of ML theory will give
you a foundation to build on, and help you troubleshoot
when something goes wrong.
Build your own projects: Getting hands-on experience
with ML is the best way to put your knowledge to the
test, so don’t be afraid to dive in early with a simple
colab or tutorial to get some practice.
More: https://www.tensorflow.org/resources/learn-ml
16. TensorFlow: ecosystem support for all developers
Who: Newbies, rookies and other early-entry specimens
What: Sequential API + built-in layers
Who: Padawans, who are able to build their own lightsabers (standard use cases)
What: Functional API + built-in layers
Who: Disciplined and experienced Jedi Knights
What: Functional API + custom layers, metrics and losses
Who: Jedi Masters, among the most accomplished and recognized polymaths in the Star Wars galaxy
What: Subclassing: everything from scratch
17. Eager execution
TensorFlow's eager execution is an imperative
programming environment that evaluates
operations immediately, without building graphs:
● An intuitive interface—Structure your code
naturally and use Python data structures.
Quickly iterate on small models and small
data.
● Easier debugging—Call ops directly to
inspect running models and test changes.
Use standard Python debugging tools for
immediate error reporting.
● Natural control flow—Use Python control
flow instead of graph control flow,
simplifying the specification of dynamic
models.
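A small sketch of what the points above look like in practice, assuming a TensorFlow 2.x installation. The `collatz_steps` helper is a hypothetical example chosen only to show Python control flow driven by tensor values.

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is enabled by default.
print("Eager:", tf.executing_eagerly())

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)   # Evaluated immediately; no session or graph building.
print(y.numpy())      # Inspect concrete values as a NumPy array.


def collatz_steps(n):
    """Count Collatz steps using ordinary Python control flow on a tensor."""
    n = tf.constant(n)
    steps = 0
    while n.numpy() != 1:           # Plain Python loop on a tensor value.
        if n.numpy() % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


steps = collatz_steps(6)
print("Collatz steps for 6:", steps)
```

Because every operation returns a concrete value, a standard debugger or a `print` call is enough to inspect intermediate results.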
18. The Functional API at a glance
● An API to configure the connectivity of
DAGs of layers
● Targeted at users more than developers
● Declarative configuration level: no logic
○ All logic is contained inside of layers
● All “debugging” is done statically at
construction time; any model you can
instantiate will run:
○ You don’t write any Python, so you don’t
write bugs
○ “Debugging” == topology debugging (can
be done visually)
● Models are static data structures
○ Inspectable: you can retrieve intermediate
activations and use them in a new model
○ Plottable: you can directly generate the
graphs via “plot_model”
○ Safely serializable
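The properties listed above can be sketched with a toy model, assuming TensorFlow 2.x with its bundled Keras. The layer names, sizes and the residual connection are illustrative choices, not taken from the presentation.

```python
import tensorflow as tf
from tensorflow import keras

# Declare the graph of layers: each call connects a layer to its inputs.
inputs = keras.Input(shape=(32,), name="features")
x = keras.layers.Dense(64, activation="relu", name="hidden_1")(inputs)
y = keras.layers.Dense(64, activation="relu", name="hidden_2")(x)
x = keras.layers.Add(name="residual")([x, y])   # Non-sequential data flow.
outputs = keras.layers.Dense(10, activation="softmax", name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="toy_residual")

# Shape mismatches would already have failed above, at construction time.
model.summary()

# Inspectable: reuse an intermediate activation as the output of a new model.
feature_extractor = keras.Model(
    inputs=model.input,
    outputs=model.get_layer("hidden_1").output,
)
features = feature_extractor(tf.zeros((4, 32)))
print(features.shape)
```

The same `model` object can be passed to `keras.utils.plot_model` to draw the topology, provided the optional plotting dependencies are installed.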
20. TensorFlow Keras
https://www.tensorflow.org/guide/keras
● TensorFlow's implementation of the Keras
API specification
● Support for TensorFlow-specific
○ Eager execution
○ Data Pipelines
○ Estimator
● Keras functional API
● Build complex model topologies
○ Multi-input models,
○ Multi-output models,
○ Models with shared layers (the same layer
called several times),
○ Models with non-sequential data flows
(e.g. residual connections).
● Training Callbacks
import tensorflow as tf
from tensorflow import keras
21. TensorFlow Datasets
https://www.tensorflow.org/datasets
● Easy-to-use
● High-performance input pipelines
● Compatible with both TensorFlow Eager
mode and Graph mode
● Dictionaries mapping feature names to tensors
● Caching and prefetch
● Integrated with Google Cloud Platform
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train', shuffle_files=True)
https://github.com/tensorflow/datasets
https://www.tensorflow.org/datasets/catalog/overview
Catalogs
22. TensorFlow Hub
https://tfhub.dev
Discover our hub
Find out what you can do in
TensorFlow Hub and how
our platform works.
Meet our community
Get to know other users,
find new collaborators, or
post questions and get
answers.
Intro to Machine Learning
If you’re new to machine
learning, our introductory
resources explain all the ins
and outs.
!pip install "tensorflow_hub>=0.6.0"
import tensorflow_hub as hub
embed = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128/2")
embeddings = embed(["A long sentence.", "single-word","http://example.com"])
print(embeddings.shape) #(3,128)
23. Model Garden for TensorFlow
https://github.com/tensorflow/models/tree/master/official
● State-of-the-art language understanding
models: More members in Transformer
family
● Classification models: EfficientNet,
MnasNet and variants.
● Trainable on:
○ Distributed training on multiple GPUs
○ Distributed training on multiple GPU hosts
○ Distributed training on Cloud TPUs
!pip install tf-models-nightly
!export PYTHONPATH=$PYTHONPATH:/path/to/models
import os
os.environ['PYTHONPATH'] += ":/path/to/models"
24. Distributed training with TensorFlow
https://www.tensorflow.org/guide/distributed_training
Distributed Strategy is a TensorFlow API to
distribute training across multiple GPUs,
multiple machines or TPUs.
Using this API, you can distribute your existing
models and training code with minimal code
changes, eagerly, or in a graph.
API can also be used for distributing evaluation
and prediction on different platforms.
Distribution Strategy integration, by training API:
Strategy                Keras API      Custom training loop   Estimator API
Mirrored                Supported      Experimental           Limited
TPU                     Experimental   Experimental           No Support
Multi Worker Mirrored   Experimental   Post TF 2.0            Limited
Central Storage         Experimental   Post TF 2.0            Limited
Parameter Server        Post TF 2.0    No Support             Limited
One Device              Supported      Supported              Limited
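As a sketch of the "minimal code changes" claim for the Keras API, the example below wraps model creation in a `MirroredStrategy` scope. It assumes TensorFlow 2.x; the model and the random data are placeholders used only to exercise the training loop.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU; with no GPU
# it runs a single replica on the CPU, so the sketch still executes.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across replicas.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Synthetic data standing in for a real dataset.
x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
history = model.fit(x, y, batch_size=16, epochs=1, verbose=0)
print("Final loss:", history.history["loss"][-1])
```

Only the strategy object and the `with strategy.scope():` block differ from single-device Keras code; `fit` is called as usual.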
26. TensorFlow Trusted Partner Pilot Program
Use Cases
● Learn how TensorFlow solves real,
everyday machine learning problems
● An entire ecosystem to help you solve
challenging, real-world problems with
machine learning
● Connect with a TensorFlow Trusted
Partner
https://www.tensorflow.org/about/case-studies
28. Coral Edge TPU intro
Inference accelerator:
● Optimized for vision applications and
convolutional neural networks
● Runs concurrent state-of-the-art
models on high-resolution video, at
real-time (MobileNet V2 at 400 FPS)
● Full support for quantized
TensorFlow Lite models
An individual Edge TPU can perform 4 trillion (fixed-point)
operations per second (4 TOPS), using only 2 watts of power—in
other words, you get 2 TOPS per watt.
29. Coral Portfolio (1)
Dev Board
A single-board computer
with a removable
system-on-module (SOM)
featuring the Edge TPU.
Available
Now
Price
$149.99
USB Accelerator
A USB accessory featuring
the Edge TPU that brings ML
inferencing to existing
systems.
Available
Now
Price
$74.99
PCI-E Accelerator
Integrate the Edge TPU into
legacy and new systems
using a Mini PCIe interface.
Available
Now
Price
$34.99
M.2 Accelerator A+E key
Integrate the Edge TPU into
legacy and new systems using
an M.2 A+E key interface.
Available
Now
Price
$34.99
30. Coral Portfolio (2)
Dev Board Mini
A single-board computer
with a removable
system-on-module (SOM)
featuring the Edge TPU.
Available
Coming soon
Price
TBD
M.2 Accelerator B+M
key
Integrate the Edge TPU into
legacy and new systems using
an M.2 B+M key interface.
Available
Now
Price
$34.99
Accelerator module
A solderable multi-chip
module including the Edge
TPU
Available
Coming soon
Price
TBD
System on Module (SoM)
A fully-integrated system for
accelerated ML applications in a
40mm x 48mm pluggable module.
Available
Now
Price
$114.99
31. Features Dev board
● Edge TPU System-on-Module (SoM)
○ NXP i.MX 8M SoC (Quad-core Arm Cortex-A53, plus
Cortex-M4F)
○ Google Edge TPU ML accelerator coprocessor
○ Cryptographic coprocessor
○ Wi-Fi 2x2 MIMO (802.11b/g/n/ac 2.4/5 GHz)
○ Bluetooth 4.2
○ 8 GB eMMC
○ 1 GB LPDDR4
● USB connections
○ USB Type-C power port (5 V DC)
○ USB 3.0 Type-C OTG port
○ USB 3.0 Type-A host port
○ USB 2.0 Micro-B serial console port
● Audio connections
○ 3.5 mm audio jack (CTIA compliant)
○ Digital PDM microphone (x2)
○ 2.54 mm 4-pin terminal for stereo speakers
● Video connections
○ HDMI 2.0a (full size)
○ 39-pin FFC connector for MIPI DSI display (4-lane)
○ 24-pin FFC connector for MIPI CSI-2 camera (4-lane)
● MicroSD card slot
● Gigabit Ethernet port
● 40-pin GPIO expansion header
● Supports Mendel Linux (derivative of Debian)
32. Coral SoM block diagram
● CPU: Quad symmetric Cortex-A53 processors,
supports 64-bit Armv8-A architecture. Plus Arm
Cortex-M4 core
● GPU: 4 shaders, 267 million triangles/sec, 1.6
Gigapixel/sec, 32 GFLOPs 32-bit or 64 GFLOPs
16-bit.
● Video: 4Kp60 HEVC/H.265 main, 4Kp60 VP9 and
4Kp30 AVC/H.264. 1080p60 MPEG-2, MPEG-4p2,
VC-1, H.263, etc.
● Memory: 1GB LPDDR4 SDRAM, 1600MHz
maximum DDR clock. 8GB NAND eMMC flash
memory, 8-bit MMC mode
● The Edge TPU interfaces with the iMX8MQ SoC via PCIe
and I2C/GPIO
● Microchip ATECC608A cryptographic
coprocessor, with asymmetric (public/private) key
cryptographic signature
33. Software toolchain
Mendel OS
A fork of the Debian OS to power our Intelligence Boards, and a
C++ & Python SDK
APIs to low level connections
Edge TPU Compiler
Converts TF graphs to run on targeted chipsets
Companion Software
Abstracts away traditional board management/coding in a
high-level program
(TFLite)
34. Mendel Development Tool (MDT)
Similar to the Android-standard ADB tool
"Porcelain" wrapper based around industry standard protocols such
as SSH, mDNS, and HTTP
Handles device discovery, shell, and key management
Cross-platform (Mac, Windows, Linux)
Open source, Apache licensed
Available as a Debian package via Google-hosted APT repositories
Also available via the Python standard pip installation tool
$ mdt devices
$ mdt shell
$ mdt push
$ mdt pull
$ mdt install
36. Edge TPU Compiler
● Compiles a TensorFlow Lite model (.tflite file) into a file that's compatible with the Edge TPU.
● Runs on any modern Debian-based Linux system; it does not run on the Coral device itself or on macOS.
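A hedged sketch of driving the compiler from Python on a Linux workstation. The helper names are hypothetical; the sketch relies on the compiler's `--out_dir` option and on its convention of writing the output with an `_edgetpu` suffix, and it does nothing if `edgetpu_compiler` is not installed.

```python
import shutil
import subprocess
from pathlib import Path


def compiled_model_path(model_path, out_dir="."):
    """Path the compiler writes to: the input name with an `_edgetpu` suffix."""
    model_path = Path(model_path)
    return Path(out_dir) / f"{model_path.stem}_edgetpu{model_path.suffix}"


def compile_for_edgetpu(model_path, out_dir="."):
    """Run edgetpu_compiler if available; return the output path or None."""
    if shutil.which("edgetpu_compiler") is None:
        return None  # Compiler not installed on this machine.
    subprocess.run(
        ["edgetpu_compiler", "--out_dir", str(out_dir), str(model_path)],
        check=True,  # Raise if the compiler reports an error.
    )
    return compiled_model_path(model_path, out_dir)


# Hypothetical file name, used only to show the naming convention.
print(compiled_model_path("mobilenet_v2_quant.tflite"))
```

The compiled file is then copied to the Coral device, since the compiler itself does not run there.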
37. Edge TPU Compiler
To run a model on the Coral Edge TPU
one needs two components:
- A model quantized for UINT8
(restricted to operations that support
UINT8)
- The compiled version of the
quantized model
edgetpu_compiler [options] model...
Source: https://coral.ai/docs/edgetpu/models-intro/
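The first component, a UINT8-quantized model, can be produced with post-training full-integer quantization. The sketch below assumes TensorFlow 2.3 or later; the tiny model and the random calibration data are placeholders, and in practice you would use a trained model and real input samples.

```python
import numpy as np
import tensorflow as tf

# Stand-in model; in practice this is your trained Keras model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])


def representative_dataset():
    """Yield sample inputs so the converter can calibrate activation ranges."""
    for _ in range(20):
        # Random placeholder data; use real training samples in practice.
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]


converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Fail the conversion if any op cannot be expressed with 8-bit integers.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

# This file is the input for edgetpu_compiler.
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)

interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_dtype = interpreter.get_input_details()[0]["dtype"]
print("Input dtype:", input_dtype)
```

Restricting `supported_ops` to the INT8 built-ins makes unsupported operations surface at conversion time rather than as a silent CPU fallback later.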
38. DEMO Inception V2 with/without compilation
Inception V2 model with quantization and compiled (optimized for TPU)
versus
Inception V2 model with quantization but not compiled
mendel@fun-calf:~/inception_v2$ ./command.sh
Using Inception V2 model with quantization and compiled (optimized for TPU;
downloaded from https://coral.ai/models)
Detects 1000 type of objects; dataset ImageNet; Input size: 224x224
---------------------------
macaw
Score : 0.9921875
Inference time: 36.04 ms (27.75 fps)
*****************************
Using Inception V2 model with quantization but not compiled
---------------------------
macaw
Score : 0.9921875
Inference time: 612.56 ms (1.63 fps)
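The frame rates in the log follow directly from the latencies, and the ratio of the two latencies gives the speedup from compilation. A quick check in Python, using only the numbers printed above:

```python
def fps(latency_ms):
    """Frames per second for a given per-inference latency in milliseconds."""
    return 1000.0 / latency_ms


compiled_ms = 36.04      # Quantized and compiled for the Edge TPU.
uncompiled_ms = 612.56   # Same quantized model, not compiled.

speedup = uncompiled_ms / compiled_ms

print(f"compiled:   {fps(compiled_ms):.2f} fps")
print(f"uncompiled: {fps(uncompiled_ms):.2f} fps")
print(f"speedup:    {speedup:.1f}x")
```

The compiled model is roughly 17 times faster, with an identical top prediction and score in this run.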
39. Coral Edge TPU pre-trained models
TF Lite models already pre-compiled to
run on the Edge TPU:
image classification, object detection,
semantic segmentation, on-device
retraining
Source: https://coral.ai/models/
40. Co-compiling multiple models
Co-compilation to run multiple models on the same Edge TPU: caches their
parameter data together, eliminating the need to clear the cache each time
you run a different model.
Be careful if using co-compilation in combination with multiple Edge TPUs.
41. Python API
● ClassificationEngine: Performs image classification. Create an instance by
specifying a model, and then pass an image (such as a JPEG) to
ClassifyWithImage() and it returns a list of labels and scores.
● DetectionEngine: Performs object detection. Create an instance by
specifying a model, and then pass an image (such as a JPEG) to
DetectWithImage() and it returns a list of DetectionCandidate objects, each
of which contains a label, a score, and the coordinates of the object.
● ImprintingEngine: This implements a transfer-learning technique called
imprinting that does not require backward propagation, allowing you to
perform model retraining that's accelerated on the Edge TPU
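The idea behind imprinting can be sketched without any Edge TPU library: the weight vector for a new class is set to the normalized mean of the normalized embeddings of a few examples, so no gradient step is needed. This is an illustration of the general technique, not the ImprintingEngine implementation; the function names and the 3-dimensional embeddings are made up for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def imprint(embeddings):
    """Build a class weight vector as the normalized mean of unit embeddings."""
    unit = [l2_normalize(e) for e in embeddings]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return l2_normalize(mean)


def classify(embedding, class_weights):
    """Return the class whose weight vector has the highest cosine similarity."""
    e = l2_normalize(embedding)
    scores = {
        name: sum(a * b for a, b in zip(e, w))
        for name, w in class_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical embeddings from a frozen feature extractor.
class_weights = {
    "cat": imprint([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]),
    "dog": imprint([[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]]),
}
prediction = classify([0.7, 0.3, 0.0], class_weights)
print(prediction)
```

Adding a class only requires forward passes to obtain embeddings, which is why this kind of retraining can be accelerated on an inference-only device.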
42. DEMO Detection Engine (Python API)
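# Imports for the Edge TPU Python API and Pillow.
from edgetpu.detection.engine import DetectionEngine
from PIL import Image, ImageDraw
# `args` comes from argparse; ReadLabelFile is a helper defined in the demo script.
# Initialize engine.
engine = DetectionEngine(args.model)
labels = ReadLabelFile(args.label) if args.label else None
# Open image.
img = Image.open(args.input)
draw = ImageDraw.Draw(img)
# Run inference.
ans = engine.DetectWithImage(img, threshold=0.05,
                             keep_aspect_ratio=True, relative_coord=False, top_k=10)
for obj in ans:
    # Bounding box as [x1, y1, x2, y2] in pixel coordinates.
    box = obj.bounding_box.flatten().tolist()
    # Draw a rectangle.
    draw.rectangle(box, outline='red')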
43. References Coral Edge TPU
[1] https://coral.ai/docs
[2] Source code for the Edge TPU: https://github.com/google-coral/edgetpu
[3] Blog: https://blog.tensorflow.org/2019/03/build-ai-that-works-offline-with-coral.html
[4] Codelab: https://codelabs.developers.google.com/codelabs/edgetpu-classifier/index.html
[5] Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., … Yoon, D. H. (2017).
In-Datacenter Performance Analysis of a Tensor Processing Unit. Retrieved from
https://arxiv.org/abs/1704.04760