Eron Wright
@eronwright
TensorFlow & Apache Flink™
An early look at a community project
Apache®, Apache Flink™, Flink™, and the Apache feather logo are trademarks of The Apache Software Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
https://github.com/cookieai/flink-tensorflow
Background
2
Why TensorFlow?
▪ A powerful & flexible platform for machine intelligence
▪ Reusable machine learning models
▪ C++ core / Java language binding
▪ Ease of integration with Apache Flink
3
TF Scenarios
▪ Language Understanding
• “SyntaxNet”, Google Translate
▪ Image/Video/Audio Recognition
• “Inception”
▪ Creative Arts
• “Magenta”
4
TF Models
▪ Portable using the “SavedModel” format
• Train on a GPU-equipped cluster
• Perform inference anywhere
▪ Well-defined interactions and data types using “signatures”
▪ Moving towards a Model Zoo
5
TF Graphs
6
[Diagram: x and W feed a multiply (*) node, b is added (+), and softmax() produces y]
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
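To make the graph concrete, here is a minimal sketch of what `y = softmax(xW + b)` computes for a single input row, written in plain Python with no TensorFlow dependency. The helper names (`softmax`, `forward`) and the tiny 4-by-3 shape are assumptions for illustration only; the slide's graph uses 784 inputs and 10 classes.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """Compute y = softmax(x W + b) for one input row x."""
    logits = [
        sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        for j in range(len(b))
    ]
    return softmax(logits)

# Hypothetical small shapes: 4 input features, 3 classes.
n_in, n_out = 4, 3
W = [[0.0] * n_out for _ in range(n_in)]  # zero-initialised, like tf.zeros
b = [0.0] * n_out
y = forward([1.0, 2.0, 3.0, 4.0], W, b)
print(y)
```

With zero-initialised weights every class receives the same probability, which is the starting point before any training updates `W` and `b`.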
TF in Flink
7
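[Diagram: Flink pipeline Source → map() → window() → sink, shown with an embedded TF graph: x and W feed a multiply (*) node, b is added (+), and max() produces y]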
Introducing Flink-Tensorflow
8
Project Status
▪ A prototype focused on inference using pre-trained TF models
▪ Scala-only (for now)
▪ A community effort
9
Basic Idea
▪ Use TF functionality in a Flink program; not a TF compatibility layer
▪ “TF graph as a Flink map function”
▪ Support inference today, online learning in the future
10
Demo
11
“Johnny”
▪ A hypothetical security system based on picture passwords
• Present three specific pictures within one minute: Access Granted!
• On timeout: Access Denied!
▪ TF model for image labeling (“Inception”)
▪ Flink CEP library for sequence detection
12
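The picture-password rule can be sketched as a small state machine over a time-ordered stream of `(timestamp, label)` pairs. This is a plain-Python stand-in, not the Flink CEP API: the function name `detect`, the label `"piano"`, and the choice to ignore non-matching pictures between matches are all assumptions made for illustration.

```python
def detect(events, password, window_s=60.0):
    """Return (timestamp, verdict) tuples for a time-ordered list of
    (timestamp_seconds, label) events."""
    results = []
    start = None  # timestamp of the first matched picture, if an attempt is open
    pos = 0       # number of password pictures matched so far
    for ts, label in events:
        # An open attempt that has outlived the window is a timeout.
        if start is not None and ts - start > window_s:
            results.append((start + window_s, "Access Denied!"))
            start, pos = None, 0
        if label == password[pos]:
            if pos == 0:
                start = ts
            pos += 1
            if pos == len(password):
                results.append((ts, "Access Granted!"))
                start, pos = None, 0
    # Note: an attempt still open at end of input is not reported here;
    # a streaming engine would fire a timer for it instead.
    return results

# Hypothetical picture password; "piano" is an invented label.
password = ("burger", "ladybug", "piano")
ok = detect([(0, "burger"), (10, "cat"), (20, "ladybug"), (30, "piano")], password)
late = detect([(0, "burger"), (20, "ladybug"), (90, "piano")], password)
print(ok, late)
```

In the demo the labels come from the Inception model and the sequence and timeout handling is delegated to the CEP library; the sketch only shows the rule being enforced.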
Inception Model
▪ Pretrained with the ImageNet dataset
▪ Supports “retraining” for learning new objects not in the original dataset
13
[Diagram: images (burger.jpg, ladybug.jpg) arrive on a stream (connector), pass through load() and label(), and yield per-class scores, e.g. 1: 0.02 .. 623: 0.97 …]
Inception Model (Cont’d)
14
Inception v3
Using the Library
15
Basic Usage
1. Import a TensorFlow model
2. Write code to convert your domain objects to/from tensors
3. Use the model in a batch or streaming function
16
Importing a Model
1. Define the graph method(s) supported by the model (ref)
2. Specify how to load the model:
   1. “saved model” loader, or
   2. graphdef loader, or
   3. ad-hoc graph builder
17
Importing a Model (Cont’d)
18
Working with Tensors
▪ Tensors are off-heap, AutoCloseable multi-dimensional arrays
▪ You: convert input records to tensors
▪ Use Scala Automatic Resource Management (ARM)
• Supports both imperative and monadic style
19
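Because tensors live off-heap, each one must be closed explicitly once it has been used. The sketch below shows that discipline in Python with `contextlib.closing` as a rough analogue of the imperative ARM style; `FakeTensor`, `to_tensor`, and `total_intensity` are hypothetical stand-ins, not part of the library or of TensorFlow.

```python
from contextlib import closing

class FakeTensor:
    """Hypothetical stand-in for an off-heap tensor that must be released."""
    live = 0  # number of tensors currently allocated

    def __init__(self, values):
        self.values = list(values)
        self.closed = False
        FakeTensor.live += 1

    def close(self):
        # Idempotent release, mirroring an AutoCloseable resource.
        if not self.closed:
            self.closed = True
            FakeTensor.live -= 1

def to_tensor(record):
    """Convert a domain record (here a dict of pixel values) to a tensor."""
    return FakeTensor(record["pixels"])

def total_intensity(record):
    # closing() guarantees close() runs even if the body raises.
    with closing(to_tensor(record)) as t:
        return sum(t.values)

result = total_intensity({"pixels": [1, 2, 3]})
print(result, FakeTensor.live)
```

The counter makes a leak visible: if any code path skipped `close()`, `FakeTensor.live` would stay above zero after the call.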
Writing a Flink Function
▪ Design goal: use TF in any transformation function
• `MapFunction`
• `ProcessFunction` (with event-time timers!)
• `WindowFunction`
▪ Required: model lifecycle support
• `ModelAwareFunction`
20
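Model lifecycle support can be pictured as a function that loads its model once when the sub-task opens, reuses it for every record, and releases it on close. The following is a plain-Python sketch under that assumption; `FakeModel`, `ModelAwareMap`, and `run_subtask` are invented names and do not reflect the actual `ModelAwareFunction` API.

```python
class FakeModel:
    """Hypothetical stand-in for a loaded TF model; counts load calls."""
    loads = 0

    def __init__(self):
        FakeModel.loads += 1
        self.closed = False

    def run(self, x):
        return x * 2  # placeholder for real inference

    def close(self):
        self.closed = True

class ModelAwareMap:
    """Map function with an open/close lifecycle around the model."""

    def open(self):
        # Called once per sub-task, before any record is processed.
        self.model = FakeModel()

    def map(self, record):
        # Called per record; reuses the already-loaded model.
        return self.model.run(record)

    def close(self):
        # Called once per sub-task, after the last record.
        self.model.close()

def run_subtask(fn, records):
    """Drive one sub-task: open, map every record, always close."""
    fn.open()
    try:
        return [fn.map(r) for r in records]
    finally:
        fn.close()

fn = ModelAwareMap()
out = run_subtask(fn, [1, 2, 3])
print(out, FakeModel.loads)
```

Loading in `open()` rather than in `map()` keeps the cost of importing a large model out of the per-record path.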
Runtime
▪ TF embedded within the Flink JobManager / TaskManager
▪ One model instance per sub-task
▪ Large unmanaged memory blocks
▪ No Python needed
21
Future Directions
22
Stateful Models
23
▪ Integrated checkpointing
▪ Support for keyed streams
▪ Emphasize online learning
Graph Builder (DSL)
▪ Construct TensorFlow graphs from scratch (ref)
▪ Code generation for TF operations
▪ Incorporate other libraries, high-level APIs (e.g. TF Keras)
24
Other
▪ Instrumentation
• TF Summaries (for TensorBoard)
• Flink Metrics
▪ Model versioning
• Leverage Flink job versioning methods
• Treat models as side-input
25
Thanks!
27

Flink Forward SF 2017: Eron Wright - Introducing Flink Tensorflow


Editor's Notes

  • #5 Show the models; TensorFlow Summit videos
  • #6 Show the ‘compute loss’ signature; talk about TF Serving
  • #7 Discuss graphs, tensors; discuss “the loop”; discuss TF runtime (various language bindings)
  • #8 Best of both worlds: Flink unified stream and batch programming; machine learning with event time as a first-class aspect; connectors for rich I/O with exactly-once semantics; integration with Flink libraries; TensorFlow GPU!
  • #10 Best of both worlds: Flink unified stream and batch programming; machine learning with event time as a first-class aspect; connectors for rich I/O with exactly-once semantics; integration with Flink libraries; TensorFlow GPU!
  • #14 Show labels
  • #17 Be sure to use a model-aware function.
  • #18 Load from HDFS; restoring state
  • #19 Load from HDFS; restoring state
  • #20 Typed tensors; converters; Scala ARM
  • #21 Model-aware function
  • #22 One model instance per sub-task, not per key. Uses unmanaged memory; suggest tuning Flink memory settings. Has a performance advantage over remote TF, and vastly more flexibility.