Session 2
Professional Machine Learning Engineer
Vasudev
@vasudevmaduri
Agenda
1. Where are we on our journey
2. Session 2 Content Review
3. Sample Question Review
4. Q&A
5. Preview actions for next week
Where are we on our journey
Professional Machine Learning Certification Learning Journey
Organized by Google Developer Groups Surrey, co-hosting with GDG Seattle

Session 1 (Feb 24, 2024, virtual)
  Talk: Lightning talk + Kick-off & Machine Learning Basics + Q&A
  Prep: Review the Professional ML Engineer Exam Guide
  Complete course: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning

Session 2 (Mar 2, 2024, virtual)
  Talk: Lightning talk + GCP - TensorFlow & Feature Engineering + Q&A
  Prep: Review the Professional ML Engineer Sample Questions
  Complete course: TensorFlow on Google Cloud; Feature Engineering

Session 3 (Mar 9, 2024, virtual)
  Talk: Lightning talk + Enterprise Machine Learning + Q&A
  Prep: Go through Google Cloud Platform Big Data and Machine Learning Fundamentals
  Complete course: Machine Learning in the Enterprise

Session 4 (Mar 16, 2024, virtual)
  Talk: Lightning talk + Production Machine Learning with Google Cloud + Q&A
  Prep: Hands On Lab Practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge) - 7hrs
  Hands On Lab Practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud

Session 5 (Mar 23, 2024, virtual)
  Talk: Lightning talk + NLP & Recommendation Systems on GCP + Q&A
  Prep: Build and Deploy ML Solutions on Vertex AI (Skill Badge) - 8hrs
  Complete course: Natural Language Processing on Google Cloud; Recommendation Systems on GCP

Session 6 (Apr 6, 2024, virtual)
  Talk: Lightning talk + MLOps & ML Pipelines on GCP + Q&A
  Prep: Self study (and potential exam)
  Complete course: ML Ops - Getting Started; ML Pipelines on Google Cloud
  Check readiness: Professional ML Engineer Sample Questions
Session 2 Content Review

Session 2 Study Group
Preparation and Processing
- Data ingestion.
- Data exploration (EDA).
- Design data pipelines.
- Build data pipelines.
- Feature engineering.
Solution: TensorFlow Extended (TFX)
An end-to-end tool for deploying production ML systems (tensorflow.org/tfx)

Pipeline components: Data Ingestion > TensorFlow Data Validation > TensorFlow Transform > Estimator or Keras Model > TensorFlow Model Analysis > TensorFlow Serving > Logging

Shared infrastructure: Pipeline Storage; Shared Utilities for Garbage Collection and Data Access Controls; Shared Configuration Framework and Job Orchestration; Integrated Frontend for Job Management, Monitoring, Debugging, and Data/Model/Evaluation Visualization
Data Ingestion Challenges
- Data might not fit into memory
- Data might require (randomized) pre-processing
- Efficiently utilize hardware
- Decouple loading + pre-processing from deployment
tf.data: TensorFlow Input Pipeline

Extract:
- read data from memory / storage
- parse file format

Transform:
- text vectorization
- image transformations
- video temporal sampling
- shuffling, batching, ...

Load:
- transfer data to the accelerator

[Chart: utilization (flops) over time for the CPU and accelerators]
Data Ingestion Example

import tensorflow as tf

def preprocess(record):
    ...

dataset = tf.data.TFRecordDataset(".../*.tfrecord")  # reads data from storage
dataset = dataset.map(preprocess)                    # applies user-defined preprocessing
dataset = dataset.batch(batch_size=32)               # batches data for training efficiency

model = ...
model.fit(dataset, epochs=10)  # training APIs natively support tf.data
Efficient Resource Utilization

Input Pipeline Performance

import tensorflow as tf

def preprocess(record):
    ...

dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)  # keeps the next batches ready while the model trains

model = ...
model.fit(dataset, epochs=10)
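What prefetch buys you can be illustrated without TensorFlow: a background producer keeps a small buffer of preprocessed items ready while the consumer is still busy with the current one. A framework-agnostic sketch (the buffer size and the toy "training" step are illustrative):

```python
import queue
import threading

def prefetch(batches, buffer_size):
    """Yield items from `batches` while a background thread keeps
    up to `buffer_size` items ready ahead of the consumer."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for b in batches:
            q.put(b)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# The "training" loop consumes batches while the producer stays ahead of it.
result = [b * 2 for b in prefetch(range(5), buffer_size=2)]
```

Because the producer only blocks when the buffer is full, input preparation overlaps with consumption, which is exactly the idle-time-filling behavior `dataset.prefetch` provides.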
Parallel Transformation

import tensorflow as tf

def preprocess(record):
    ...

dataset = tf.data.TFRecordDataset(".../*.tfrecord")
dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # runs preprocessing in parallel, tuned automatically
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)

model = ...
model.fit(dataset, epochs=10)
Parallel Extraction

import tensorflow as tf

def preprocess(record):
    ...

dataset = tf.data.TFRecordDataset(".../*.tfrecord", num_parallel_reads=N)  # reads multiple files in parallel
dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(batch_size=32)
dataset = dataset.prefetch(buffer_size=X)

model = ...
model.fit(dataset, epochs=10)
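The effect of num_parallel_calls has a direct analogue in plain Python: mapping the preprocessing function over records with a thread pool instead of one at a time. A minimal sketch (the preprocess body and worker count are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(record):
    # stand-in for an expensive per-record transformation
    return record.strip().lower()

records = ["  Alpha", "BETA ", " Gamma  "]

# Sequential map: one record at a time.
sequential = [preprocess(r) for r in records]

# Parallel map: several records in flight at once, output order preserved,
# analogous to dataset.map(preprocess, num_parallel_calls=4).
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(preprocess, records))
```

Like tf.data, `pool.map` preserves input order even though the calls run concurrently; tf.data's AUTOTUNE additionally picks the parallelism level for you at runtime.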
TensorFlow Data Validation (TFDV)
- Helps developers understand, validate, and monitor their ML data at scale
- Used to analyze and validate petabytes of data at Google every day
- Has a proven track record in maintaining the health of production ML pipelines
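TFDV's core idea, infer a schema from training data and then flag serving data that deviates from it, can be sketched in plain Python. This is a conceptual sketch only, not TFDV's actual API; the schema fields and anomaly messages below are illustrative:

```python
def infer_schema(rows):
    """Record, per column, the observed type and value range."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            s = schema.setdefault(col, {"type": type(val), "min": val, "max": val})
            s["min"] = min(s["min"], val)
            s["max"] = max(s["max"], val)
    return schema

def validate(rows, schema):
    """Return anomaly strings for values that fall outside the schema."""
    anomalies = []
    for row in rows:
        for col, val in row.items():
            s = schema.get(col)
            if s is None:
                anomalies.append(f"unexpected column: {col}")
            elif not isinstance(val, s["type"]):
                anomalies.append(f"{col}: wrong type {type(val).__name__}")
            elif not (s["min"] <= val <= s["max"]):
                anomalies.append(f"{col}: {val} outside [{s['min']}, {s['max']}]")
    return anomalies

train = [{"age": 25}, {"age": 40}]
schema = infer_schema(train)
serving = [{"age": 30}, {"age": 95}]  # 95 falls outside the training range
anomalies = validate(serving, schema)
```

TFDV does this over petabyte-scale data with much richer statistics (distributions, missingness, drift metrics), but the analyze-then-validate loop is the same shape.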
Feature columns

- NUMERIC: a number, passed through as a 1-element vector.
- BUCKETIZED: partition by range; a number is assigned to a bucket (e.g. M / L / XL) and 1-hot encoded, e.g. [0 1 0].
- CATEGORICAL: a fixed vocabulary (e.g. Red, Green, Blue) 1-hot encoded, specified with a vocab list, vocab files, or identity.
- CROSS: capture feature interactions; crossing [Red, Green, Blue] with [M, L, XL] yields N x M categories (Red-M, Red-L, Red-XL, ..., Blue-XL).
- EMBEDDING: learn new representations for many categories; each category's 1-hot index selects a row of trainable params (e.g. the embedding for Cat.3 is [α6 α7 α8]).
- HASHING: limit vocabulary size when there are effectively infinite categories (Cat.1 ... Cat.1K ... Cat.1M) by hashing them into a fixed number of buckets.
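These transformations are easy to sketch without any framework. A minimal illustration (the bucket boundaries, vocabularies, and bucket count are made up for the example):

```python
def one_hot(value, vocab):
    """CATEGORICAL: 1-hot encode against a fixed vocabulary."""
    return [1 if v == value else 0 for v in vocab]

def bucketize(number, boundaries):
    """BUCKETIZED: partition by range, then 1-hot the bucket index."""
    idx = sum(number >= b for b in boundaries)
    return [1 if i == idx else 0 for i in range(len(boundaries) + 1)]

def cross(a, b, vocab_a, vocab_b):
    """CROSS: one category out of the N x M combinations."""
    return one_hot((a, b), [(x, y) for x in vocab_a for y in vocab_b])

def hash_bucket(value, num_buckets):
    """HASHING: map an unbounded vocabulary into a fixed number of buckets.
    (Python's built-in hash is process-seeded; real systems use a stable hash.)"""
    return hash(value) % num_buckets

colors, sizes = ["Red", "Green", "Blue"], ["M", "L", "XL"]
encoded = one_hot("Green", colors)           # [0, 1, 0]
bucket = bucketize(7, boundaries=[5, 10])    # 7 falls in the middle bucket
crossed = cross("Red", "XL", colors, sizes)  # 1 of 9 cross categories
```

An embedding then replaces the long 1-hot cross vector with a short trainable vector looked up by the same index, which is why embeddings are the usual follow-up to high-cardinality crosses and hashes.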
There are three possible places to do feature engineering, each of which has its pros and cons. The pipeline runs: Inputs > Preprocessing > Feature creation > Train model (with hyper-parameter tuning) > Model, with preprocessed features feeding training.

1. In TensorFlow itself (inside the model's input path):
   *efficient  *tf methods only  *this input only
2. With tf.transform:
   *efficient  *tf methods only  *aggregates (full-pass statistics over the training data)
3. With Beam/Dataflow:
   *in a pipeline  *Python/Java code  *time-windows
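The key difference between option 1 (per-example, in-model) and option 2 (tf.transform-style) is that the latter can use full-pass aggregates computed in a separate analyze phase, then apply the same per-example transform at both training and serving time. A framework-agnostic sketch of z-score scaling (the choice of statistic is illustrative):

```python
def analyze(values):
    """Full pass over the training data: compute aggregate statistics.
    tf.transform runs this once (e.g. on Beam/Dataflow) before training."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": var ** 0.5}

def transform(value, stats):
    """Per-example step: identical at training and serving time,
    which avoids training/serving skew."""
    return (value - stats["mean"]) / stats["std"]

train = [2.0, 4.0, 6.0, 8.0]
stats = analyze(train)                        # mean = 5.0, std = sqrt(5)
scaled = [transform(v, stats) for v in train]
```

A purely per-example transform (option 1) could never compute that mean, which is why aggregate features push you toward tf.transform or a Beam/Dataflow pipeline.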
Sample Questions Review
Different cities in California have markedly different housing prices. Suppose you must create a model to predict housing prices. Which of the following sets of features or feature crosses could learn city-specific relationships between roomsPerPerson and housing price?

A. Three separate binned features: [binned latitude], [binned longitude], [roomsPerPerson]
B. Two feature crosses: [binned latitude x roomsPerPerson] and [binned longitude x roomsPerPerson]
C. One feature cross: [latitude x longitude x roomsPerPerson]
D. One feature cross: [binned latitude x binned longitude x binned roomsPerPerson]
Correct answer: D. Crossing binned latitude with binned longitude approximates individual cities, and crossing that with binned roomsPerPerson lets the model learn per-city relationships between roomsPerPerson and price. Binning matters: crossing raw continuous values, as in C, does not produce meaningful categories.
Q&A
Preview actions for next week
By our next meeting
1. Complete
a. Machine Learning in the Enterprise
Link to badge
Redeem your participation badge
Thank you for joining the event
Thank you for tuning in!
For any operational questions about access to Cloud Skills Boost or the Road to Google Developers Certification program, contact: gdg-support@google.com