This session for beginners introduces the tf.data APIs for creating data pipelines by combining various "lazy operators" in tf.data, such as filter(), map(), batch(), zip(), flat_map(), take(), and so forth.
Familiarity with method chaining and TF2 is helpful (but not required). If you are comfortable with functional reactive programming (FRP), the code samples in this session will be very familiar to you.
Working with tf.data (TF 2)
1. Introduction to tf.data (TF2)
H2O Meetup
Galvanize San Francisco
02/19/2020
Oswald Campesato
ocampesato@yahoo.com
2. Highlights/Overview
What is tf.data?
Working with TF 2 tf.data.Dataset
Intermediate operators
Terminal operators
filter() and map()
zip() and batch()
Working with TF 2 generators
3. tf.data: TF Input Pipeline
An input pipeline is useful:
for streaming data
when data is too big to fit in memory
when data requires preprocessing
when you need to shuffle large data
when the pipeline must scale to multiple hosts
=> ETL functionality
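The extract/transform/load idea above can be sketched as a small pipeline. This is only a minimal sketch: a tiny in-memory NumPy array stands in for a real data source, and the names `raw`, `pipeline`, and `batches`, as well as the buffer and batch sizes, are illustrative assumptions rather than part of the original slides.

```python
import numpy as np
import tensorflow as tf

# Extract: a small in-memory array stands in for a real data source (assumption).
raw = np.arange(10, dtype=np.int64)
pipeline = tf.data.Dataset.from_tensor_slices(raw)

# Transform: shuffle the elements, then preprocess each one.
pipeline = pipeline.shuffle(buffer_size=10, seed=0)
pipeline = pipeline.map(lambda v: v * 2)

# Load: group elements into batches and prefetch the next batch
# while the current one is being consumed.
pipeline = pipeline.batch(4).prefetch(1)

batches = [b.numpy() for b in pipeline]
for b in batches:
    print(b)
```

Nothing is read or computed until the `for` loop (or the list comprehension) iterates over the dataset, which is why these operators are called "lazy".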
4. What are tf.data.Datasets
Simple example:
1) define a Numpy array of numbers
2) create a TF Dataset ds
3) iterate through the dataset ds
5. What are TF Datasets
import tensorflow as tf # tf-dataset1.py
import numpy as np
x = np.array([1,2,3,4,5])
ds = tf.data.Dataset.from_tensor_slices(x)
# iterate through the elements:
for value in ds.take(len(x)):
    print(value)
6. What are Lambda Expressions
a lambda expression is an anonymous function
use lambda expressions to define local functions
pass lambda expressions as arguments
return them as the value of function calls
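The three uses listed above can be illustrated in plain Python, without TensorFlow. The names `add_one`, `doubled`, `make_multiplier`, and `triple` are hypothetical and chosen only for this sketch.

```python
# 1) Define a local function with a lambda expression.
add_one = lambda x: x + 1

# 2) Pass a lambda expression as an argument (here, to the built-in map()).
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

# 3) Return a lambda expression as the value of a function call.
def make_multiplier(factor):  # hypothetical helper
    return lambda x: x * factor

triple = make_multiplier(3)

print(add_one(4))  # 5
print(doubled)     # [2, 4, 6]
print(triple(5))   # 15
```

The tf.data operators such as map() and filter() accept lambda expressions in the same way as the built-in map() does here.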
7. Some tf.data “lazy operators”
map()
filter()
flat_map()
batch()
take()
zip()
flatten()
=> Combined via “method chaining”
8. tf.data “lazy operators”
filter():
uses Boolean logic to "filter" the elements in an array to
determine which elements satisfy the Boolean condition
map(): a projection
this operator "applies" a lambda expression to each input
element
flat_map():
maps a single element of the input dataset to a Dataset of
elements
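A minimal sketch of the three operators just described, assuming a five-element dataset. The variable names (`evens`, `squares`, `pairs`) are illustrative only, and `tf.math.equal` is used for the equality test to stay consistent with the filter() examples later in this session.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5])

# filter(): keep only the elements that satisfy the Boolean condition.
evens = ds.filter(lambda x: tf.math.equal(x % 2, 0))

# map(): apply a lambda expression to each element.
squares = ds.map(lambda x: x * x)

# flat_map(): map each element to its own small Dataset, then
# concatenate those Datasets into a single flat Dataset.
pairs = ds.take(2).flat_map(
    lambda x: tf.data.Dataset.from_tensors(x).repeat(2))

evens_list = [int(v.numpy()) for v in evens]
squares_list = [int(v.numpy()) for v in squares]
pairs_list = [int(v.numpy()) for v in pairs]

print(evens_list)    # [2, 4]
print(squares_list)  # [1, 4, 9, 16, 25]
print(pairs_list)    # [1, 1, 2, 2]
```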
9. tf.data “lazy operators”
batch(n):
processes a "batch" of n elements during each
iteration
repeat(n):
repeats its input values n times
take(n):
operator "takes" n input values
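The batch(), repeat(), and take() operators above can be compared side by side. This is a small sketch that assumes a dataset of the integers 0 through 4; the variable names are illustrative.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)  # 0, 1, 2, 3, 4

batched = ds.batch(2)      # groups of two; the last batch is smaller
repeated = ds.repeat(2)    # the whole dataset, twice in a row
first_three = ds.take(3)   # only the first three elements

batched_list = [b.numpy().tolist() for b in batched]
repeated_list = [int(v.numpy()) for v in repeated]
taken_list = [int(v.numpy()) for v in first_three]

print(batched_list)   # [[0, 1], [2, 3], [4]]
print(repeated_list)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(taken_list)     # [0, 1, 2]
```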
10. tf.data.Dataset.from_tensors()
import tensorflow as tf
#combine the input into one element
t1 = tf.constant([[1, 2], [3, 4]])
ds1 = tf.data.Dataset.from_tensors(t1)
# output: [[1, 2], [3, 4]]
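To make the "one element" behavior concrete, it may help to contrast from_tensors() with from_tensor_slices() on the same input. This is a sketch; the names `ds_whole`, `ds_rows`, `whole`, and `rows` are illustrative.

```python
import tensorflow as tf

t1 = tf.constant([[1, 2], [3, 4]])

# from_tensors(): the whole tensor becomes a single dataset element.
ds_whole = tf.data.Dataset.from_tensors(t1)

# from_tensor_slices(): one element per slice along the first axis.
ds_rows = tf.data.Dataset.from_tensor_slices(t1)

whole = [e.numpy().tolist() for e in ds_whole]
rows = [e.numpy().tolist() for e in ds_rows]

print(len(whole), whole)  # 1 [[[1, 2], [3, 4]]]
print(len(rows), rows)    # 2 [[1, 2], [3, 4]]
```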
12. TF2 Datasets: code sample
import tensorflow as tf
import numpy as np
x = np.arange(0, 10)
# create a dataset from a Numpy array
ds = tf.data.Dataset.from_tensor_slices(x)
13. TF filter() operator: ex #1
import tensorflow as tf # tf2_filter1.py
import numpy as np
x = np.array([1,2,3,4,5])
ds = tf.data.Dataset.from_tensor_slices(x)
print("First iteration:")
for value in ds:
    print("value:",value)
15. TF filter() operator: ex #2
import tensorflow as tf # tf2_filter2.py
import numpy as np
x = np.array([1,2,3,4,5])
ds = tf.data.Dataset.from_tensor_slices(x)
print("First iteration:")
for value in ds:
    print("value:",value)
16. TF filter() operator: ex #2
# "tf.math.equal(x, y)" is required
# for equality comparison
def filter_fn(x):
    return tf.math.equal(x, 1)
ds = ds.filter(filter_fn)
print("Second iteration:")
for value in ds:
    print("value:",value)
18. What are Lambda Expressions
a lambda expression takes an input variable
performs an operation on that variable
A "bare bones" lambda expression:
lambda x: x + 1
=> this adds 1 to an input variable x
19. TF filter() operator: ex #3
import tensorflow as tf # tf2_filter3.py
import numpy as np
ds = tf.data.Dataset.from_tensor_slices([1,2,3,4,5])
ds = ds.filter(lambda x: x < 4) # [1,2,3]
print("First iteration:")
for value in ds:
    print("value:",value)
20. TF filter() operator: ex #3
# "tf.math.equal(x, y)" is required
# for equality comparison
def filter_fn(x):
    return tf.math.equal(x, 1)
ds = ds.filter(filter_fn)
print("Second iteration:")
for value in ds:
    print("value:",value)
31. TF map() operator: ex #2
import tensorflow as tf # tf2-map2.py
import numpy as np
x = np.array([[1],[2],[3],[4]])
ds = tf.data.Dataset.from_tensor_slices(x)
# METHOD #1: THE LONG WAY
# a lambda expression to double each value
#ds = ds.map(lambda x: x*2)
# a lambda expression to add one to each value
#ds = ds.map(lambda x: x+1)
# a lambda expression to cube each value
#ds = ds.map(lambda x: x**3)
32. TF map() operator: ex #2
# METHOD #2: A SHORTER WAY
ds = ds.map(lambda x: x*2).map(lambda x: x+1).map(lambda x: x**3)
for value in ds:
    print("value:",value)
# an example of “Method Chaining”
34. TF take() operator: ex #1
import tensorflow as tf # tf2-take.py
import numpy as np
ds = tf.data.Dataset.from_tensor_slices(tf.range(8))
ds = ds.take(5)
for value in ds.take(20):
    print("value:",value)
36. TF take() operator: ex #2
import tensorflow as tf # tf2_take.py
import numpy as np
x = np.array([[1],[2],[3],[4]])
# make a ds from a numpy array
ds = tf.data.Dataset.from_tensor_slices(x)
ds = (ds.map(lambda x: x*2)
        .map(lambda x: x+1).map(lambda x: x**3))
for value in ds.take(4):
    print("value:",value)
38. TF zip() operator: ex #1
import tensorflow as tf # tf2_zip1.py
import numpy as np
dx = tf.data.Dataset.from_tensor_slices([0,1,2,3,4])
dy = tf.data.Dataset.from_tensor_slices([1,1,2,3,5])
# zip the two datasets together
d2 = tf.data.Dataset.zip((dx, dy))
for value in d2:
    print("value:",value)
39. TF zip() operator: ex #1
value:
(<tf.Tensor: id=11, shape=(), dtype=int32, numpy=0>,
<tf.Tensor: id=12, shape=(), dtype=int32, numpy=1>)
value:
(<tf.Tensor: id=13, shape=(), dtype=int32, numpy=1>,
<tf.Tensor: id=14, shape=(), dtype=int32, numpy=1>)
value:
(<tf.Tensor: id=15, shape=(), dtype=int32, numpy=2>,
<tf.Tensor: id=16, shape=(), dtype=int32, numpy=2>)
=> Plus two more rows of output
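Because each zipped element is a tuple of tensors, it can be unpacked directly in the loop. The following sketch reuses the two input lists from the zip() example and collects plain Python integer pairs; the name `pairs` is illustrative.

```python
import tensorflow as tf

dx = tf.data.Dataset.from_tensor_slices([0, 1, 2, 3, 4])
dy = tf.data.Dataset.from_tensor_slices([1, 1, 2, 3, 5])

# Each element of the zipped dataset is a (dx_value, dy_value) tuple.
pairs = [(int(a.numpy()), int(b.numpy()))
         for a, b in tf.data.Dataset.zip((dx, dy))]

print(pairs)  # [(0, 1), (1, 1), (2, 2), (3, 3), (4, 5)]
```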
40. TF zip() operator: ex #2
import tensorflow as tf # tf2_zip_take.py
import numpy as np
x = np.arange(0, 10)
y = np.arange(1, 11)
dx = tf.data.Dataset.from_tensor_slices(x)
dy = tf.data.Dataset.from_tensor_slices(y)
# zip the two datasets together
d2 = tf.data.Dataset.zip((dx, dy)).batch(3)
for value in d2.take(8):
    print("value:",value)
47. Generator Functions (2)
import tensorflow as tf # tf2-timesthree.py
import numpy as np
x = np.arange(0, 5) # 0, 1, 2, 3, 4
def gener():
    for i in x:
        yield (3*i)
ds = tf.data.Dataset.from_generator(gener, (tf.int64))
for value in ds.take(len(x)):
    print("1value:",value)
for value in ds.take(2*len(x)):
    print("2value:",value)
52. Processing Text Files (1)
define a TF Dataset with lines in file.txt
skip lines that start with a “#” character
then display only the first two lines
53. Contents of file.txt
#this is file line #1
#this is file line #2
this is file line #3
#this is file line #4
this is file line #5
#this is file line #6
54. Processing Text Files (2)
import tensorflow as tf # tf2_flatmap_filter.py
filenames = ["file.txt"]
ds = tf.data.Dataset.from_tensor_slices(filenames)
55. Processing Text Files (3)
ds = ds.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)
        .filter(lambda line:
            tf.not_equal(tf.strings.substr(line,0,1),"#"))))
for value in ds.take(2):
    print("value:",value)
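For readers who want to run the text-file example end to end, here is a self-contained sketch. It assumes the six-line file.txt shown earlier and writes it to a temporary directory first; the temporary path and the names `lines`, `path`, and `kept` are assumptions made for this sketch.

```python
import os
import tempfile
import tensorflow as tf

# Recreate the sample file in a temporary directory (assumption for this sketch).
lines = [
    "#this is file line #1",
    "#this is file line #2",
    "this is file line #3",
    "#this is file line #4",
    "this is file line #5",
    "#this is file line #6",
]
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("\n".join(lines) + "\n")

ds = tf.data.Dataset.from_tensor_slices([path])
ds = ds.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)  # skip the first line of the file
        .filter(lambda line:
                tf.not_equal(tf.strings.substr(line, 0, 1), "#"))))

# Keep only the first two lines that do not start with "#".
kept = [v.numpy().decode("utf-8") for v in ds.take(2)]
print(kept)  # ['this is file line #3', 'this is file line #5']
```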
59. tf.data and MNIST
import tensorflow as tf # tf2_mnist.py
train, test = tf.keras.datasets.mnist.load_data()
mnist_x, mnist_y = train
mnist_ds=tf.data.Dataset.from_tensor_slices(mnist_x)
print(mnist_ds)
for value in mnist_ds.take(2):
    print("value:",value)
61. TF2 generator Example
import tensorflow as tf # tf2_generator2.py
import numpy as np
x = np.arange(0, 12)
def gener():
    i = 0
    while i < len(x):
        yield (i, i+1, i+2) # three integers at a time
        i += 3
ds = tf.data.Dataset.from_generator(gener, (tf.int64,tf.int64,tf.int64))
third = len(x) // 3
for value in ds.take(third):
    print("value:",value)
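As a variation on the generator example, the element types and shapes can be declared with the `output_signature` argument of from_generator(). This sketch assumes a TensorFlow release that supports `output_signature` (2.4 or later, to the best of my knowledge); the names `triples`, `spec`, and `result` are hypothetical.

```python
import tensorflow as tf

def triples():  # hypothetical generator: three consecutive integers at a time
    for i in range(0, 12, 3):
        yield (i, i + 1, i + 2)

# Each yielded tuple holds three scalar int64 values.
spec = tf.TensorSpec(shape=(), dtype=tf.int64)
ds = tf.data.Dataset.from_generator(
    triples, output_signature=(spec, spec, spec))

result = [tuple(int(t.numpy()) for t in value) for value in ds]
print(result)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
```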
71. About Me: Recent Books
1) Python3 and Machine Learning (2020)
2) Angular 9 and Deep Learning (2020)
3) Angular 8 & Machine Learning (2020)
4) AI/ML/DL: Concepts and Code (2020)
5) Bash Programming on Mac (2020)
6) TensorFlow 2 Pocket Primer (2019)
7) TensorFlow 1.x Pocket Primer (2019)
8) Python for TensorFlow (2019)
9) C Programming Pocket Primer (2019)
72. About Me: Less Recent Books
10) RegEx Pocket Primer (2018)
11) Data Cleaning Pocket Primer (2018)
12) Angular Pocket Primer (2017)
13) Android Pocket Primer (2017)
14) CSS3 Pocket Primer (2016)
15) SVG Pocket Primer (2016)
16) Python Pocket Primer (2015)
17) D3 Pocket Primer (2015)
18) HTML5 Mobile Pocket Primer (2014)
73. About Me: Older Books
19) jQuery, CSS3, and HTML5 (2013)
20) HTML5 Pocket Primer (2013)
21) jQuery Pocket Primer (2013)
22) HTML5 Canvas (2012)
23) Flash on Android (2011)
24) Web 2.0 Fundamentals (2010)
25) MS Silverlight Graphics (2008)
26) Fundamentals of SVG (2003)
27) Java Graphics Library (2002)