Debug Tensorflow Code with Specifications and Assertions

Find bugs in the herd with
debuggable Tensorflow code
Yi Wei
yiwei@prowler.io
1

Tensorflow code is difficult to debug and verify
● Tensor values are multi-dimensional arrays
● TensorBoard graph visualization has hundred of nodes and edges
● Many (or not enough) tips from the Internet, but I never know whether my
code is correct with those tipis
2

Specification to the rescue
● Reasoning Tensorflow code is difficult since the debugger does
not know what is correct.
● Specification defines correctness of the code.
● Correctness w.r.t. algorithm definition, not whether the code can
learn a model
3
Ask not what the debugger can do for you,
ask what you can do for the debugger

Three assertion techniques to verify correctness
● Tensor shape assertions to validate data shapes
● Tensor dependency assertions to validate graph structure
● Tensor equation assertions to numerical calculations
4

Technique 1: shape assertions
Write an assert to check the shape of every tensor you introduce.
prediction_tensor = q_function.output_tensor
assert prediction_tensor.shape.to_list() == [batch_size, action_dimension]
target_tensor = reward_tensor + discount * bootstrapped_tesnor
assert target_tensor.shape.to_list() == [batch_size, action_dimension]
loss_tensor = tf.losses.mean_squared_error(target_tensor, prediction_tensor)
assert loss_tensor.shape.to_list() == []
5

Next we need to validate graph structure, but how?
TensorBoard gives complicated visualization, not practical for most of us
6

Technique 2: Tensor group dependency
7
We developed a Python package TensorGroupDependency:
● Visualizes part of the graph involving only your tensors
● Helps you to check tensor dependency correctness
● Automatically generates tensor graph structural assertions.
● This is the key step to make the whole process practical!

Use of TensorGroupDependency
d = TensorGroupDependency()
d.add(q_function, 'q_function')
d.add(q_function.output_tensor, 'q_value_tensor')
d.add(prediction_tensor, 'prediction_tensor')
d.add(target_tensor, 'target_tensor')
d.add(loss_tensor, 'loss_tensor')
dot = d.generate_dot_representation()
print(dot)
8

Visualization from TensorGroupDependency
● Tensors as nodes
● Dependency between tensors as edges
● You must explain why edges exist
● Have assertions automatically generated
9

Automatically generated assertions
d.generate_assertions(target_exp='d')
d.assert_immediate_ancestors('q_function.variables', set())
d.assert_immediate_ancestors('target_tensor', set())
d.assert_immediate_ancestors('q_value_tensor', {'q_function.variables'})
d.assert_immediate_ancestors('loss_tensor', {'target_tensor', 'prediction_tensor'})
d.assert_immediate_ancestors('prediction_tensor', {'q_value_tensor'})
10

Visualization for the TD(λ) critic
11

Visualization for the Generalized Advantage Estimation critic
12

Visualization for the StarCraft2 learner
Graphs become smaller and smaller because of composability.
13

Open source TensorGroupDependency
● TensorGroupDependency is the key component that
make the whole process practical
● We are in preparation of open-sourcing
TensorGroupDependency
● Drop me an email at yiwei@prowler.io if interested
14

Technique 3: tensor equations
Write an assertion to check every equation in your algorithm.
_, prediction, target, loss = sess.run(
[parameter_update_operations, prediction_tensor, target_tensor, loss_tensor],
feed_dict={})
mean_square_error = np.mean(np.power(target - prediction, 2))
np.testing.assert_almost_equal(loss, mean_square_error, decimal=1)
15

Effectiveness of quality assurance techniques
Learning module
Coding
time(h)
Debugging
time(h)
Bugs detected by (and time % spent)
shape
asserts (10%)
tensor graph
(40%)
tensor evals
(50%)
MC critic 5 1 3 2 1
Bootstrapping critic 7 0.5 4 1 2
GAE critic 2 0.5 2 0 0
TD(λ) critic 3 2 5 0 1
StarCraft2 learner 7 1 1 1 0
Total 24 5 15 4 4
16

Reasons for the bug-detecting effectiveness
● Practical specification writing
○ Specification defines correctness, no way around it.
○ Explicitly writing down specification helps you and the debugger
○ Practical is the keyword
● Fault localization
○ Locating where a fault originates in Tensorflow code is difficult
○ At each stage of specification, you only need to focus on places within that stage.
● Clear to-do list style engineering process
○ Each assertion stage has finite steps bounded by the tensors you introduce, usually a dozen
○ You know exactly when the verification process ends -- when you’ve validated your code!
○ When people know the exact steps, they are a lot more efficient.
17

Assertions enable advanced testing
● Ingredients of a test: input construction and oracle
● Machine learning code uses numbers as input
● Since we already have the oracle, generating tests are easy
18

Conclusions yiwei@prowler.io
19

Debug Tensorflow Code with Specifications and Assertions

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to Debug Tensorflow Code with Specifications and Assertions

Similar to Debug Tensorflow Code with Specifications and Assertions (20)

More from Seldon

More from Seldon (20)

Recently uploaded

Recently uploaded (20)

Debug Tensorflow Code with Specifications and Assertions