SlideShare a Scribd company logo
1 of 25
Download to read offline
tf.data: TensorFlow Input Pipeline
https://www.tensorflow.org/guide/data
https://www.tensorflow.org/guide/data_performance
1
Jiri Simsa
2
Why input pipeline API?
3
Why input pipeline API?
- data might not fit into memory
- data might require (randomized) pre-processing
- efficiently utilize hardware
- decouple loading + pre-processing from distribution
tf.data: TensorFlow Input Pipeline
4
Extract:
- read data from memory / storage
- parse file format
Transform:
- text vectorization
- image transformations
- video temporal sampling
- shuffling, batching, …
Load:
- transfer data to the accelerator
time
flops
CPU
accelerators
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
55
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
66
reads data from storage
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
77
applies user-defined preprocessing
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
88
batches data for training efficiency
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
99
training APIs natively support tf.data
10
Input Pipeline Performance
Software Pipelining
11
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
model = ...
model.fit(dataset, epochs=10)
121212
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
131313
Parallel Transformation
14
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
151515
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess, num_parallel_calls=Y)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
161616
Parallel Extraction
17
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess, num_parallel_calls=Y)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
181818
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord", num_parallel_readers=Z)
dataset = dataset.map(preprocess, num_parallel_calls=Y)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
191919
import tensorflow as tf
def preprocess(record):
...
dataset = tf.data.TFRecordDataset(".../*.tfrecord", num_parallel_readers=Z)
dataset = dataset.map(preprocess, num_parallel_calls=Y)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)
model = ...
model.fit(dataset, epochs=10)
20
tf.data.experimental.AUTOTUNE
2020
tf.data Options
21
- determinism
- statistics
- optimizations (autotuning, fusion, vectorization, parallelization, ...)
- threading (private thread pool, intra op parallelism)
tf.data Options
22
- determinism
- statistics
- optimizations (autotuning, fusion, vectorization, parallelization, ...)
- threading (private thread pool, intra op parallelism)
dataset = ...
options = tf.data.Options()
options.experimental_optimization.map_parallelization = True
dataset = dataset.with_options(options)
TFDS: TensorFlow Datasets
23
- https://www.tensorflow.org/datasets/datasets
- canned datasets ready to be used
TFDS: TensorFlow Datasets
24
- https://www.tensorflow.org/datasets/datasets
- canned datasets ready to be used
import tensorflow as tf
import tensorflow_datasets as tfds
# See available datasets
print(tfds.list_builders())
# Construct a tf.data.Dataset
dataset = tfds.load(name="mnist", split=tfds.Split.TRAIN)
# Customize your input pipeline
dataset = dataset.shuffle(1024).batch(32)
for features in dataset.take(1):
image, label = features["image"], features["label"]
Thank You!

More Related Content

What's hot

Prometeusについてはじめてみよう / Let's start Prometeus
Prometeusについてはじめてみよう / Let's start PrometeusPrometeusについてはじめてみよう / Let's start Prometeus
Prometeusについてはじめてみよう / Let's start PrometeusTakeo Noda
 
密かに話題のBufferbloat
密かに話題のBufferbloat密かに話題のBufferbloat
密かに話題のBufferbloatKazuhito Ohkawa
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPFShuji Yamada
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化シスコシステムズ合同会社
 
Generalized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN TrainingGeneralized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN TrainingDatabricks
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data PlaneC4Media
 
OpenStackアップストリーム活動実践 中級
OpenStackアップストリーム活動実践 中級OpenStackアップストリーム活動実践 中級
OpenStackアップストリーム活動実践 中級Takashi Natsume
 
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜Preferred Networks
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会Shigeru Hanada
 
Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50Preferred Networks
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceNeo4j
 
HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsDataWorks Summit
 
通信と放送の融合を考えるBoF 5
通信と放送の融合を考えるBoF 5通信と放送の融合を考えるBoF 5
通信と放送の融合を考えるBoF 5Masaaki Nabeshima
 
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2Preferred Networks
 

What's hot (20)

Prometeusについてはじめてみよう / Let's start Prometeus
Prometeusについてはじめてみよう / Let's start PrometeusPrometeusについてはじめてみよう / Let's start Prometeus
Prometeusについてはじめてみよう / Let's start Prometeus
 
密かに話題のBufferbloat
密かに話題のBufferbloat密かに話題のBufferbloat
密かに話題のBufferbloat
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF
 
Apache flink
Apache flinkApache flink
Apache flink
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化
シスコ装置を使い倒す!組込み機能による可視化からセキュリティ強化
 
Generalized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN TrainingGeneralized Pipeline Parallelism for DNN Training
Generalized Pipeline Parallelism for DNN Training
 
Openflow実験
Openflow実験Openflow実験
Openflow実験
 
Programming the Network Data Plane
Programming the Network Data PlaneProgramming the Network Data Plane
Programming the Network Data Plane
 
Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
OpenStackアップストリーム活動実践 中級
OpenStackアップストリーム活動実践 中級OpenStackアップストリーム活動実践 中級
OpenStackアップストリーム活動実践 中級
 
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
 
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
PostgreSQLのパラレル化に向けた取り組み@第30回(仮名)PostgreSQL勉強会
 
Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50Topology Managerについて / Kubernetes Meetup Tokyo 50
Topology Managerについて / Kubernetes Meetup Tokyo 50
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data ScienceGet Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
 
HBase Consistency and Performance Improvements
HBase Consistency and Performance ImprovementsHBase Consistency and Performance Improvements
HBase Consistency and Performance Improvements
 
Machine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlowMachine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlow
 
通信と放送の融合を考えるBoF 5
通信と放送の融合を考えるBoF 5通信と放送の融合を考えるBoF 5
通信と放送の融合を考えるBoF 5
 
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
続・PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜 #2
 

Similar to tf.data: TensorFlow Input Pipeline

TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubJeongkyu Shin
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...gdgsurrey
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...confluent
 
Introduction to TensorFlow 2
Introduction to TensorFlow 2Introduction to TensorFlow 2
Introduction to TensorFlow 2Oswald Campesato
 
Introduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and KerasIntroduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and KerasOswald Campesato
 
TensorFlow example for AI Ukraine2016
TensorFlow example  for AI Ukraine2016TensorFlow example  for AI Ukraine2016
TensorFlow example for AI Ukraine2016Andrii Babii
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?Miklos Christine
 
Theano vs TensorFlow | Edureka
Theano vs TensorFlow | EdurekaTheano vs TensorFlow | Edureka
Theano vs TensorFlow | EdurekaEdureka!
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Anyscale
 
How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)OPNFV
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaDonghwi Cha
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
 
TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약Jin Joong Kim
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...confluent
 
Introduction to TensorFlow 2
Introduction to TensorFlow 2Introduction to TensorFlow 2
Introduction to TensorFlow 2Oswald Campesato
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkDatabricks
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruitymrphilroth
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit
 

Similar to tf.data: TensorFlow Input Pipeline (20)

TensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow HubTensorFlow.Data 및 TensorFlow Hub
TensorFlow.Data 및 TensorFlow Hub
 
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
Certification Study Group -Professional ML Engineer Session 2 (GCP-TensorFlow...
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
 
Introduction to TensorFlow 2
Introduction to TensorFlow 2Introduction to TensorFlow 2
Introduction to TensorFlow 2
 
Introduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and KerasIntroduction to TensorFlow 2 and Keras
Introduction to TensorFlow 2 and Keras
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
TensorFlow example for AI Ukraine2016
TensorFlow example  for AI Ukraine2016TensorFlow example  for AI Ukraine2016
TensorFlow example for AI Ukraine2016
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
Theano vs TensorFlow | Edureka
Theano vs TensorFlow | EdurekaTheano vs TensorFlow | Edureka
Theano vs TensorFlow | Edureka
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)How Many Ohs? (An Integration Guide to Apex & Triple-o)
How Many Ohs? (An Integration Guide to Apex & Triple-o)
 
Tensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi chaTensorflow in practice by Engineer - donghwi cha
Tensorflow in practice by Engineer - donghwi cha
 
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDistributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
Distributed Deep Learning with Apache Spark and TensorFlow with Jim Dowling
 
TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약TensorFlow Dev Summit 2017 요약
TensorFlow Dev Summit 2017 요약
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
 
Introduction to TensorFlow 2
Introduction to TensorFlow 2Introduction to TensorFlow 2
Introduction to TensorFlow 2
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruity
 
Spark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim HunterSpark Summit EU talk by Tim Hunter
Spark Summit EU talk by Tim Hunter
 

More from Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Recently uploaded

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 

Recently uploaded (20)

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 

tf.data: TensorFlow Input Pipeline

  • 1. tf.data: TensorFlow Input Pipeline https://www.tensorflow.org/guide/data https://www.tensorflow.org/guide/data_performance 1 Jiri Simsa
  • 3. 3 Why input pipeline API? - data might not fit into memory - data might require (randomized) pre-processing - efficiently utilize hardware - decouple loading + pre-processing from distribution
  • 4. tf.data: TensorFlow Input Pipeline 4 Extract: - read data from memory / storage - parse file format Transform: - text vectorization - image transformations - video temporal sampling - shuffling, batching, … Load: - transfer data to the accelerator time flops CPU accelerators
  • 5. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 55
  • 6. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 66 reads data from storage
  • 7. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 77 applies user-defined preprocessing
  • 8. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 88 batches data for training efficiency
  • 9. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 99 training APIs natively support tf.data
  • 12. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) model = ... model.fit(dataset, epochs=10) 121212
  • 13. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 131313
  • 15. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 151515
  • 16. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess, num_parallel_calls=Y) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 161616
  • 18. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord") dataset = dataset.map(preprocess, num_parallel_calls=Y) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 181818
  • 19. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord", num_parallel_readers=Z) dataset = dataset.map(preprocess, num_parallel_calls=Y) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 191919
  • 20. import tensorflow as tf def preprocess(record): ... dataset = tf.data.TFRecordDataset(".../*.tfrecord", num_parallel_readers=Z) dataset = dataset.map(preprocess, num_parallel_calls=Y) dataset = dataset.batch(batch_size=32) dataset = dataset.prefetch(buffer_size=X) model = ... model.fit(dataset, epochs=10) 20 tf.data.experimental.AUTOTUNE 2020
  • 21. tf.data Options 21 - determinism - statistics - optimizations (autotuning, fusion, vectorization, parallelization, ...) - threading (private thread pool, intra op parallelism)
  • 22. tf.data Options 22 - determinism - statistics - optimizations (autotuning, fusion, vectorization, parallelization, ...) - threading (private thread pool, intra op parallelism) dataset = ... options = tf.data.Options() options.experimental_optimization.map_parallelization = True dataset = dataset.with_options(options)
  • 23. TFDS: TensorFlow Datasets 23 - https://www.tensorflow.org/datasets/datasets - canned datasets ready to be used
  • 24. TFDS: TensorFlow Datasets 24 - https://www.tensorflow.org/datasets/datasets - canned datasets ready to be used import tensorflow as tf import tensorflow_datasets as tfds # See available datasets print(tfds.list_builders()) # Construct a tf.data.Dataset dataset = tfds.load(name="mnist", split=tfds.Split.TRAIN) # Customize your input pipeline dataset = dataset.shuffle(1024).batch(32) for features in dataset.take(1): image, label = features["image"], features["label"]