This document discusses moving from TensorFlow's graph mode to eager execution mode. Eager execution evaluates operations immediately without first describing the execution graph. This provides an intuitive interface, fast development iterations, easier debugging, and natural control flow. The document covers best practices for data pipelines, building models, custom layers, and text classification in eager mode. Control flow can now be handled using Python control structures rather than TensorFlow control ops like tf.while_loop.
TF GRAPH TO TF EAGER: Moving from TensorFlow Graph Mode to Eager Execution
1. TF GRAPH TO TF EAGER
Guy Hadash
IBM Research AI
2. WHY MOVE TO EAGER?
Eager Execution changes the core idea of TensorFlow.
Instead of describing the execution graph in Python, compiling it and then running it,
the framework is now imperative. This means it creates the graph on the fly and runs
operations immediately.
This brings some significant improvements:
• An intuitive interface
• Fast development iterations
• Easier debugging
• Natural control flow
3. WHY MOVE TO EAGER?
The main ability we gain now is:
tensor.numpy()
This simply gives us the value of the tensor. As we said, this allows much easier
debugging, and we can also control the model flow based on tensor values.
There is no need for a session anymore, and we can stop worrying about graph
dependencies and the like.
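For example, a minimal sketch (the tensor here is illustrative):

import tensorflow as tf

tf.enable_eager_execution()  # call once, at program startup

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

print(y.numpy())          # the concrete value as a NumPy array - no session needed
if y.numpy()[0, 0] > 5:   # plain Python control flow based on the tensor's value
    print("top-left element is large")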
4. SESSION PROGRAM
• Data pipeline
• Classifier
• covering the essentials of building and training a custom model
• Autoencoder
• building a custom layer
• Text classification
• controlling model flow with Python, and working with sequence data
All the code is in the colab: https://goo.gl/q3rHNT
5. BEST PRACTICES
Moving from graph mode to eager mode also makes it much more natural to work in
a more object-oriented way.
We will inherit from tf.keras.Model and tf.keras.layers.Layer.
We will use tf.data for an easy and fast data pipeline.
6. DATA PIPELINE
tf.data is the current best practice for handling the data pipeline.
It gives us an easy and fast pipeline. From my experience, the most commonly
used initializers are:
from_tensor_slices – retrieves one sample at a time from the given tensor. Best for
simple training cases.
from_tensors – returns the full dataset at once. Helpful for testing.
from_generator – a more flexible way, useful in more complicated use cases.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
TIP: All dataset initializers work naturally with tuples and dictionaries (also nested)
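For illustration, a minimal sketch of the other two initializers on the MNIST arrays loaded above (dtypes and shapes here are assumptions):

# from_tensors: the whole test set as a single element - handy for evaluation.
test_ds = tf.data.Dataset.from_tensors((x_test, y_test))

# from_generator: wraps any Python generator, useful for more complex cases.
def sample_generator():
    for image, label in zip(x_train, y_train):
        yield image, label

gen_ds = tf.data.Dataset.from_generator(
    sample_generator,
    output_types=(tf.uint8, tf.uint8),
    output_shapes=([28, 28], []))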
7. DATA PIPELINE
Now that we have a Dataset object, we can quickly build a pipeline:
By creating this pipeline, we allow TF to utilize the CPU in parallel with our training and
keep the next batch ready on the GPU, waiting for the next optimization step.
train_ds = train_ds.map(_normalize, num_parallel_calls=4)
train_ds = train_ds.apply(tf.contrib.data.shuffle_and_repeat(buffer_size, num_epochs))
# train_ds = train_ds.shuffle(buffer_size).repeat(num_epochs)
train_ds = train_ds.batch(batch_size).apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
TIP: Even when running in eager mode, the pipeline still runs as a graph
TIP: Buffer size should be big enough for effective shuffling
TIP: prefetch_to_device must be the last operation
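The _normalize function mapped above is not shown on the slide; a minimal sketch, assuming standard MNIST scaling (the colab version may differ):

def _normalize(image, label):
    # Cast the uint8 pixels to float32 and scale them to [0, 1].
    image = tf.cast(image, tf.float32) / 255.0
    return image, label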
8. THE BASIC BUILDING BLOCKS
tf.keras.layers.Layer
Layer – a group of variables tied together.
tf.keras.Network
Network – a group of layers tied together.
tf.keras.Model
Model – a network with all the training utilities.
Each of them is callable.
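For example, a minimal sketch of what "callable" means in practice (shapes here are illustrative):

layer = tf.keras.layers.Dense(10)
out = layer(tf.zeros([4, 3]))  # calling the layer builds it and runs it
print(layer.variables)         # kernel and bias, created on the first call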
9. BUILDING MODEL
When building a model in eager execution, we derive from tf.keras.Model.
This gives us a few important properties which we will use:
• model.variables – automatically returns all of the model's variables.
• It does so by collecting the variables of all layers (which inherit from tf.layers.Layer) and
models (which inherit from tf.keras.Network).
• model.save_weights – allows us to save (and load) the model weights easily.
• There is also an option to save the model itself and not only the weights. However, it
doesn’t work well when building custom models.
10. MNIST MODEL
We first initialize the layers we will use,
but we do not describe the model flow yet
(unlike graph mode).
Note that the actual variable sizes are
still unknown at this point, so the variables
cannot be initialized yet.
Here we override the call function; it
will be called each time our model is
invoked.
class SimpleClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.fc1 = tf.keras.layers.Dense(100, activation=tf.nn.relu)
        self.fc2 = tf.keras.layers.Dense(50, activation=tf.nn.relu)
        self.fc3 = tf.keras.layers.Dense(FLAGS.classes_amount)
        self.optimizer = tf.train.AdamOptimizer()

    def call(self, inputs, training=None, **kwargs):
        x = tf.reshape(inputs, [inputs.shape[0], -1])
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
TIP: if you want to define a list of layers, use tf.contrib.checkpoint.List so that all APIs work.
11. RUNNING THE MODEL
When we want to run the model, we treat it as a callable.
This will run the call function we wrote on the previous slide, with some extra logic.
model = SimpleClassifier()
results = model(inputs)
12. OPTIMIZATION PROCESS
Now, in eager mode, when we optimize we need to tell the framework which variables
it should calculate gradients with respect to.
This is what the with tf.GradientTape() as tape context is for. All intermediate results needed
to calculate the gradients of the variables are saved. We can also use the watch
command to tell the framework to track any arbitrary tensor.
We can later use tape.gradient(loss, variables) to get the gradients of the loss with
respect to each of the variables. This automatically resets the tape and frees the
memory.
TIP: If you need to call tape.gradient more than once, use tf.GradientTape(persistent=True) – and call del on the tape later
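A minimal sketch of watch and a persistent tape (the tensor is illustrative):

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)            # x is not a variable, so ask the tape to track it
    y = x * x
    z = y * y
dy_dx = tape.gradient(y, x)  # 6.0
dz_dx = tape.gradient(z, x)  # 108.0 - allowed because the tape is persistent
del tape                     # free the tape's resources when done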
13. MNIST MODEL
We define the loss function for the
model – nothing has changed here.
There is no special reason for it to be in
the model class.
Here is the optimization process. As
described, we run the forward pass inside the
tape context, and then we calculate the
gradients and apply them.
Notice we use the model instance as a
callable; it will use the call function we
overrode before.
@staticmethod
def loss(logits, labels):
    return tf.losses.sparse_softmax_cross_entropy(labels, logits)

def optimize(self, inputs, labels):
    with tf.GradientTape() as tape:
        logits = self(inputs)
        batch_loss = self.loss(logits, labels)
    gradients = tape.gradient(batch_loss, self.variables)
    self.optimizer.apply_gradients(zip(gradients, self.variables))
    return batch_loss
14. RUNNING TOGETHER
Now we just need to iterate over the data and optimize for each batch. This is done
very naturally in eager mode:
with tf.device('/gpu:0'):
    model = SimpleClassifier()
    for step, (batch_x, batch_y) in enumerate(train_ds):
        loss = model.optimize(batch_x, batch_y)
        if step % FLAGS.print_freq == 0:
            print("Step {}: loss: {}".format(step, loss))
        if step % FLAGS.validate_freq == 0:
            accuracy = model.accuracy(x_test, y_test)
            print("Step {}: test accuracy: {}".format(step, accuracy))
TIP: since TF 1.8 there is automatic device placement; however, as of TF 1.9 it is still better to state the
device explicitly, performance-wise
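The accuracy method called in the loop above is not shown on the slide; a hypothetical sketch of what it might look like:

def accuracy(self, inputs, labels):
    # Hypothetical helper: fraction of correctly classified test examples.
    logits = self(tf.cast(inputs, tf.float32))
    predictions = tf.argmax(logits, axis=1, output_type=tf.int32)
    correct = tf.equal(predictions, tf.cast(labels, tf.int32))
    return tf.reduce_mean(tf.cast(correct, tf.float32))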
15. BUILDING CUSTOM LAYERS
We will now build an MNIST autoencoder, where the reconstruction uses the transpose of the
encoder layers' weights.
In graph mode, we would have the variable w and could simply reuse it. This approach
is also possible in eager mode, but since we are trying to work in a more object-oriented
way, we will construct a custom layer.
16. BUILDING CUSTOM LAYERS
To create a custom layer, we need to implement:
• __init__ – the constructor, preparing everything we can without the input shape
• build – called the first time the layer runs; here we know the input shape, so we can
initialize all of the layer's weights.
• call – the layer logic; called each time the layer runs on an input.
17. BUILDING CUSTOM LAYERS
In the __init__ function, we store all the
information we need and define whatever
variables we can.
This is where we define the layer logic.
You can see that the if statement
controls the flow and is evaluated for
each batch separately.
class InvDense(tf.keras.layers.Layer):
    def __init__(self, dense_layer, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense_layer = dense_layer
        self.activation_func = activation

    def build(self, input_shape):
        out_hiddens = self.dense_layer.kernel.get_shape()[-2]
        self.bias = self.add_variable("b", [out_hiddens],
                                      initializer=tf.zeros_initializer)
        super().build(input_shape)

    def call(self, inputs, **kwargs):
        x = tf.matmul(inputs, tf.transpose(self.dense_layer.kernel))
        x += self.bias
        if self.activation_func:
            x = self.activation_func(x)
        return x
18. BUILDING CUSTOM LAYERS
Now we initialize and use the custom
layers in the same way as any other
keras.layers.* layer.
class AutoEncoder(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.fc1 = tf.keras.layers.Dense(100, activation=tf.nn.relu)
        self.fc2 = tf.keras.layers.Dense(50, activation=tf.nn.relu)
        self.fc2_t = InvDense(self.fc2, activation=tf.nn.relu)
        self.fc1_t = InvDense(self.fc1)
        self.optimizer = tf.train.AdamOptimizer()

    def call(self, inputs, training=None, **kwargs):
        x = tf.reshape(inputs, [inputs.shape[0], -1])
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc2_t(x)
        x = self.fc1_t(x)
        x = tf.reshape(x, inputs.shape)
        return x
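The optimization step for the autoencoder is not shown here; a minimal sketch, assuming a mean-squared-error reconstruction loss (the loss used in the colab may differ):

def optimize(self, inputs):
    with tf.GradientTape() as tape:
        reconstruction = self(inputs)
        batch_loss = tf.losses.mean_squared_error(inputs, reconstruction)
    gradients = tape.gradient(batch_loss, self.variables)
    self.optimizer.apply_gradients(zip(gradients, self.variables))
    return batch_loss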
19. TEXT CLASSIFICATION
Now we move on to IMDb sentiment classification. We will start by building a suitable data
pipeline.
Notice we use from_generator, since each example has a different length:
def _add_length(x, y):
    x = x[:FLAGS.max_len]
    x_dict = {"seq": x, "seq_len": tf.size(x)}
    return x_dict, y

ds = tf.data.Dataset.from_generator(lambda: zip(x_train, y_train), output_types=(tf.int32, tf.int32),
                                    output_shapes=([None], []))
ds = ds.map(_add_length, num_parallel_calls=4)
ds = ds.apply(tf.contrib.data.shuffle_and_repeat(len(x_train), FLAGS.num_epochs))
ds = ds.padded_batch(FLAGS.batch_size, padded_shapes=({"seq": [None], "seq_len": []}, []))
ds = ds.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
TIP: from_generator accepts any callable object that returns an object supporting __iter__
20. CONTROLLING FLOW
One of the significant advantages eager brings us is the ability to control our flow
using Python and tensor values. No more need for tf.while_loop and tf.cond!
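For example, a minimal sketch (the tensors are illustrative):

x = tf.random_uniform([])
if x > 0.5:        # the comparison yields a value we can branch on directly
    y = x * 2
else:
    y = x / 2

total = tf.constant(0.0)
while total < 10:  # an ordinary Python while loop driven by a tensor's value
    total += tf.random_uniform([])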
21. TEXT CLASSIFICATION
Notice that we now go
through the words using a
Python for loop.
This method wouldn't work
in graph mode, since the
shape along axis=1 differs
from batch to batch.
def call(self, inputs, training=None, **kwargs):
    seqs = self.word_emb(inputs["seq"])
    state = [tf.zeros([seqs.shape[0], self.rnn_cell.state_size])]
    seqs = tf.unstack(seqs, num=int(seqs.shape[1]), axis=1)
    hiddens = []
    for word in seqs:
        h, state = self.rnn_cell(word, state)
        hiddens.append(h)
    hiddens = tf.stack(hiddens, axis=1)
    last_hiddens = self.get_last_relevant(hiddens, inputs["seq_len"])
    x = self.fc1(last_hiddens)
    x = self.fc2(x)
    return x
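The get_last_relevant helper used above is not defined on the slide; a hypothetical sketch of what it might do:

def get_last_relevant(self, hiddens, seq_lens):
    # Pick, for every example in the batch, the hidden state at its last
    # real (non-padded) time step.
    batch_range = tf.range(tf.shape(hiddens)[0])
    indices = tf.stack([batch_range, seq_lens - 1], axis=1)
    return tf.gather_nd(hiddens, indices)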
22. ADDING REGULARIZATION
In order to add the common regularizations, we can use tf.contrib.layers.*_regularizer
just as we did in graph-mode TensorFlow.
Now, in order to get the regularization loss, instead of using
tf.GraphKeys.REGULARIZATION_LOSSES we use the keras.Model losses property:
l2_reg = tf.contrib.layers.l2_regularizer(FLAGS.reg_factor)
self.fc1 = tf.keras.layers.Dense(100, kernel_regularizer=l2_reg, bias_regularizer=l2_reg)
loss += tf.reduce_sum(self.losses)