Session 4
Vikram Tiwari
Professional Machine Learning Engineer
Margaret Maynard-Reid
Professional Machine Learning Certification Learning Journey
Organized by Google Developer Groups Surrey, co-hosting with GDG Seattle
Session 1 (Feb 24, 2024, virtual): Lightning talk + Kick-off & Machine Learning Basics + Q&A
Session 2 (Mar 2, 2024, virtual): Lightning talk + GCP - TensorFlow & Feature Engineering + Q&A
Session 3 (Mar 9, 2024, virtual): Lightning talk + Enterprise Machine Learning + Q&A
Session 4 (Mar 16, 2024, virtual): Production ML Systems and Computer Vision with Google Cloud + Q&A
Session 5 (Mar 23, 2024, virtual): Lightning talk + NLP & Recommendation Systems on GCP + Q&A
Session 6 (Apr 6, 2024, virtual): Lightning talk + MLOps & ML Pipelines on GCP + Q&A
Study materials:
● Review the Professional ML Engineer Exam Guide
● Review the Professional ML Engineer Sample Questions
● Go through: Google Cloud Platform Big Data and Machine Learning Fundamentals
● Hands-on lab practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge, 7 hrs); Build and Deploy ML Solutions on Vertex AI (Skill Badge, 8 hrs)
● Self study (and potential exam)
● Complete courses: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning
● Complete courses: TensorFlow on Google Cloud; Feature Engineering
● Complete course: Machine Learning in the Enterprise
● Hands-on lab practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud
● Complete courses: Natural Language Processing on Google Cloud; Recommendation Systems on GCP
● Complete courses: ML Ops - Getting Started; ML Pipelines on Google Cloud
● Check readiness: Professional ML Engineer Sample Questions
Session 4
Study Group
Computer Vision
● Vision API & AutoML Vision
● Beyond the course
Model Development
● Build a model.
● Train a model.
● Test a model.
● Scale model training and serving.
Computer Vision
on Google Cloud
About me
● ML GDE (Google Developer Expert)
● GDG Seattle organizer
● 3D artist
● Fashion designer
● Instructor at UW
● margaretmz.art
Computer Vision on Google Cloud (Cloud Skills Boost): Course Overview
● Module 1: Introduction to Computer Vision
● Module 2: Vertex AI AutoML Vision
● Module 3: Custom Training
● Module 4: Convolutional Neural Network
● Module 5: Working with Image Data
Module 1: Intro to Computer Vision
● What is computer vision?
● Different types of computer vision use cases
● Various ML tools on Google Cloud
● Experimenting with pre-built APIs
Computer Vision Use Cases (ordered by complexity)
● Image classification (single-label): classify an image into one class. Example: painting style or artist (Van Gogh).
● Image classification (multi-label): classify an image into multiple classes. Example: movie poster genre (action, sci-fi).
● Feature extraction: extract latent features of an image with CNN models. Example: visual search (find similar fashion).
● Object detection: identify one or multiple objects within an image and their locations with bounding boxes. Example: detect UI elements.
● Segmentation: classify whether each pixel of the image belongs to a certain class. Example: segment UI elements.
● Generative models: computer vision (+ NLP). Examples: generate new images, super resolution, image-to-image, text-to-image, etc. (Figure: a generated image.)
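The practical difference between single-label and multi-label classification usually sits in the output layer. A minimal pure-Python sketch, assuming the model has already produced one raw score (logit) per class: single-label applies softmax and picks the top class, while multi-label applies an independent sigmoid per class and keeps every class above a threshold. The genre names, logits, and the 0.5 threshold are illustrative assumptions.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent probability for a single class."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_single_label(logits, classes):
    """Single-label: exactly one class, the one with the highest softmax probability."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]

def predict_multi_label(logits, classes, threshold=0.5):
    """Multi-label: every class whose own sigmoid probability clears the threshold."""
    return [c for c, z in zip(classes, logits) if sigmoid(z) >= threshold]

# Hypothetical logits a model might output for one movie poster
genres = ["action", "sci-fi", "romance"]
logits = [2.0, 1.5, -3.0]
print(predict_single_label(logits, genres))  # action
print(predict_multi_label(logits, genres))   # ['action', 'sci-fi']
```

Because the sigmoids are independent, a poster can be both action and sci-fi; softmax forces the classes to compete for a single answer.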
Module 1 labs: Google Cloud Vision API
Lab 1: Detecting Labels, Faces and Landmarks in Images with the Cloud Vision API
● Create a bucket
● Upload an image (with public access)
● Send a JSON request
● Receive a JSON response
Lab 2: Extracting Text from the Images using the Google Cloud Vision API
● Cloud Functions
● Upload images to Cloud Storage
● Extract, translate, and save text
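In Lab 1 the request sent to the Vision API is a JSON document. A minimal sketch of building that body in Python, assuming the image already sits in a Cloud Storage bucket; the bucket and object names are hypothetical. The body would be POSTed to https://vision.googleapis.com/v1/images:annotate with an API key or OAuth token (authentication and the HTTP call are not shown). For Lab 2's text extraction, the feature type would be TEXT_DETECTION instead.

```python
import json

def build_annotate_request(gcs_uri, feature_types, max_results=10):
    """Build a JSON body for the Cloud Vision images:annotate REST method."""
    return {
        "requests": [
            {
                # Point the API at an image stored in Cloud Storage
                "image": {"source": {"gcsImageUri": gcs_uri}},
                # One entry per kind of analysis requested
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }

body = build_annotate_request(
    "gs://my-bucket/my-image.jpg",  # hypothetical bucket and object name
    ["LABEL_DETECTION", "FACE_DETECTION", "LANDMARK_DETECTION"],
)
print(json.dumps(body, indent=2))
```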
Module 2: Vertex AI & AutoML Vision
● Intro to Vertex AI - Google’s unified AI platform
● Automated ML pipeline with AutoML
● AutoML Vision
● AutoML example
● Options: Vision API vs AutoML Vision vs custom training
Module 3: Custom Training
● Image classification
● Custom image classifier with the 5-flowers dataset
● TensorFlow:
○ Linear network
○ Neural network
○ Deep Neural Networks (DNN)
● Dropout and Batch Normalization
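To make the dropout bullet concrete, here is a minimal pure-Python sketch of inverted dropout, the variant Keras' Dropout layer uses: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged; at inference the input passes through untouched. The helper name and signature are hypothetical, for illustration only.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout over a list of activations (hypothetical helper)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: pass through unchanged
    keep = 1.0 - rate
    # Drop each unit with probability `rate`; scale survivors by 1 / keep
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0] * 10_000, rate=0.5, rng=rng)
print(sum(out) / len(out))  # close to 1.0: the expected activation is preserved
```

Scaling at training time (rather than at inference) is what lets the trained network be served without any dropout-specific adjustment.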
Module 4: Convolutional Neural Networks (CNN)
● How to use CNNs
● What makes CNNs different?
● Key CNN model parameters: filters, number of channels, kernel size, etc.
● Working with pooling layers
● Implement CNNs on Vertex AI with a pre-built TensorFlow container using Vertex AI Workbench
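The key CNN parameters above determine layer shapes and sizes. A minimal pure-Python sketch of the standard arithmetic: the output size of a convolution or pooling window along one axis, the parameter count of a 2D convolution (kernel x kernel x input channels weights per filter, plus one bias per filter), and 2x2 max pooling on a single-channel image. The function names are illustrative, not a library API.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling window."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

def conv_param_count(kernel_size, in_channels, filters):
    """Weights per filter (kernel x kernel x in_channels) plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def max_pool_2d(image, pool=2, stride=2):
    """Max pooling over a single-channel image stored as a 2D list, no padding."""
    out_h = conv_output_size(len(image), pool, stride)
    out_w = conv_output_size(len(image[0]), pool, stride)
    return [
        [
            max(image[r * stride + i][c * stride + j]
                for i in range(pool) for j in range(pool))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

print(conv_output_size(224, kernel_size=3, stride=1, padding=1))  # 224
print(conv_param_count(kernel_size=3, in_channels=3, filters=32))  # 896
print(max_pool_2d([[1, 3, 2, 0],
                   [4, 2, 1, 5],
                   [0, 1, 7, 2],
                   [3, 2, 1, 1]]))  # [[4, 5], [3, 7]]
```

Pooling has no trainable parameters; it only halves the spatial size here, which is why it is cheap to stack between convolution layers.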
Module 5: Working with Image Data
● Preprocessing (with Keras and TensorFlow dataset)
● Data scarcity problem:
○ Image augmentation
○ Transfer learning
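Image augmentation addresses data scarcity by generating label-preserving variants of each training image. In Keras this is typically done with preprocessing layers such as RandomFlip and RandomCrop; the pure-Python sketch below only illustrates the underlying idea on a single-channel image stored as a 2D list. The function names and the 50% flip probability are assumptions for illustration.

```python
import random

def horizontal_flip(image):
    """Mirror an image (2D list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def random_crop(image, height, width, rng=random):
    """Cut a random height x width window out of the image."""
    top = rng.randint(0, len(image) - height)      # randint is inclusive
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, crop_h, crop_w, rng=random):
    """Produce one new training variant: random crop, then a 50% chance of a flip."""
    out = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

rng = random.Random(1)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
variant = augment(image, 3, 3, rng)
print(variant)
```

Each call yields a slightly different image with the same label, so the model sees more variety without any new data collection.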
Transfer learning: Why, What & How
“I know Kungfu…”
● Why transfer learning?
○ Less data and faster training
● What is transfer learning?
● How to use transfer learning
Computer Vision on Google Cloud
Core Services
● Cloud Vision API: image labeling, face detection, landmark detection, text extraction (OCR)
● AutoML Vision: image classification, object detection
Specialized Services
● Video Intelligence API: shot change detection, object tracking, text detection
● Document AI: form parsing, invoice/receipt processing
Vertex AI
● Gemini Pro Vision: visual analysis, multimodal Q&A
● Imagen 2: image generation, image editing, visual captioning, visual Q&A
Production ML Systems
Production ML is hard (@Vikram_Tiwari)
ML Systems
ML lifecycle - People
ML lifecycle - Tasks
ML lifecycle - Infrastructure
● Exploration phase (component-wise testing): in a notebook
● Development phase (testing the ML pipeline as a whole): on a local machine/VM
● Production phase (integrating with other products): on the cloud (AI Platform)
ML lifecycle - Data
Skew and drift are the silent killers of your ML models.
● Training-serving skew: the feature seen at training time (a green banana) differs from the feature seen at serving time (a yellow banana).
● Prediction drift: the feature seen at serving time changes over time (from green to yellow).
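One common way to watch for skew or drift in a categorical feature is to compare its training and serving distributions. A minimal sketch using the Population Stability Index (PSI), a widely used drift metric that the slides do not prescribe; the banana counts are made up, and the 0.25 alert level is only a common rule of thumb.

```python
import math
from collections import Counter

def category_distribution(values, categories):
    """Fraction of observations that fall into each category."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]

def population_stability_index(expected, actual, eps=1e-6):
    """PSI = sum over buckets of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty buckets
        psi += (a - e) * math.log(a / e)
    return psi

colors = ["green", "yellow"]
training = ["green"] * 90 + ["yellow"] * 10  # hypothetical bananas at training time
serving = ["green"] * 30 + ["yellow"] * 70   # hypothetical bananas at serving time
psi = population_stability_index(
    category_distribution(training, colors),
    category_distribution(serving, colors),
)
print(round(psi, 2))  # 1.83, far above the 0.25 rule-of-thumb alert level
```

Run periodically on serving logs, a check like this flags drift before it shows up as a drop in model quality.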
Research vs Production
● Objectives: research focuses on model performance; in production, different stakeholders have different objectives.
● Computational priority: research wants fast training and high throughput; production wants fast inference and low latency.
● Data: static in research; constantly shifting in production.
● Fairness: good to have (sadly) in research; important in production.
● Interpretability: good to have in research; important in production.
@Vikram_Tiwari
@chipro
@hanneshapke
Production ML
- Architecting production ML systems
- Designing adaptable ML systems
- Designing high-performance ML systems
- Building Hybrid ML systems
Three ways Google Cloud can help you benefit from ML
● Pre-trained models (our data + our models): Vision, Translation, Natural Language, Speech-to-Text, Text-to-Speech, Job Discovery, Video Intelligence
● Retrained models (your data + our models): AutoML
● Custom models (your data + your model)
The options range from easy to use, for non-ML engineers, to customizable, for data scientists.
Compute Engine, GPU, Cloud TPU, Cloud Dataproc, Kubernetes Engine, BigQuery
AI Platform Training & Prediction
Translate, NLP, Speech, Vision, Tables, Recommendation, Dialogflow Enterprise
What is AI Platform?
● An end-to-end environment for AI inside the GCP console.
● Offers an integrated tool chain from data engineering to model deployment with “no lock-in”.
● Allows you to run on-premises or on Google Cloud without significant code changes.
● Gives access to cutting-edge Google AI technology like TensorFlow, TPUs, and TFX tools as you deploy your AI applications to production.
What is included?
AI Platform: Notebooks, Deep Learning VM Images, Data Labeling, Training, Predictions, Pre-built Algorithms, Pipelines, AI Hub
Kubeflow (on premises)
Integrated with:
● Google BigQuery: for data warehousing
● Cloud Dataflow: for data transformation
● Cloud Dataprep: for data cleansing
● Cloud Dataproc: for Hadoop and Spark clusters
● Google Data Studio: for BI dashboards
AI Platform Notebooks
A hosted Jupyter notebook solution that makes it easy for data scientists to spin up JupyterLab, and gives DevOps teams the controls they need.
● Centrally managed: DevOps teams can easily manage and secure these environments.
● Get started quickly: the latest data science and machine learning frameworks are pre-configured.
● No learning curve: uses the industry-standard JupyterLab interface.
● Scalable & cost-effective: pick the hardware you need, and scale up and down easily.
● GCP integration: it’s easy to access and use GCP services from within your notebooks.
● Easily build, train, and deploy models: supports the full ML lifecycle through integration with the most popular ML frameworks and tools.
● One-click deployment (get started quickly): spin up a JupyterLab instance, pre-configured with the latest machine learning and data science frameworks, in one click.
● JupyterLab (no learning curve): AI Platform Notebooks uses the latest open-source version of the industry-standard JupyterLab.
● Scale on demand (scalable & cost-effective): you can easily change hardware, including adding and removing GPUs.
AI Platform Training
● Serverless and no-ops ML training
● Distributed training infrastructure that supports CPUs, GPUs, and TPUs
● Hyperparameter tuning
● Train and tune TensorFlow, scikit-learn, and XGBoost models, as well as custom containers
● Multiple runtime versions for different frameworks
● Prebuilt algorithms (TensorFlow linear learner and wide & deep algorithms, XGBoost algorithm)
Package the TensorFlow model trainer
Project root directory layout: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/cloudml-template
python task.py --train-files $TRAIN_DATA --eval-files $EVAL_DATA
task.py:
➔ Accepts command-line arguments
➔ Uploads the model to GCS
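A minimal sketch of the argument-parsing part of task.py, assuming argparse is used to accept the flags shown above. The default job directory is a hypothetical placeholder, and the training, export, and upload-to-GCS steps are left out.

```python
import argparse

def get_args(argv=None):
    """Parse the arguments that `python task.py ...` or gcloud passes to the trainer."""
    parser = argparse.ArgumentParser(description="Model trainer (sketch)")
    parser.add_argument("--train-files", nargs="+", required=True,
                        help="Local or GCS paths to the training data")
    parser.add_argument("--eval-files", nargs="+", required=True,
                        help="Local or GCS paths to the evaluation data")
    parser.add_argument("--job-dir", default="/tmp/trainer-output",  # hypothetical default
                        help="Where checkpoints and the exported model are written")
    return parser.parse_args(argv)

# Example: the same flags as `python task.py --train-files ... --eval-files ...`
args = get_args(["--train-files", "train.csv", "--eval-files", "eval.csv"])
# argparse turns dashes into underscores: --train-files -> args.train_files
print(args.train_files, args.eval_files, args.job_dir)
```

Keeping all configuration in command-line flags is what lets the same package run unchanged with `python task.py`, `gcloud ai-platform local train`, and a cloud training job.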
Training locally
gcloud ai-platform local train \
    --module-name trainer.task --package-path trainer/ \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA --job-dir $MODEL_DIR
$TRAIN_DATA is the training data, $EVAL_DATA the evaluation data, and $MODEL_DIR the output directory (a local path).
Training in the cloud with a single node
gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.13 --module-name trainer.task --package-path trainer --region $REGION \
    --scale-tier BASIC \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA
--scale-tier BASIC runs a single worker (machine types: https://cloud.google.com/ai-platform/training/docs/machine-types).
Training in the cloud at scale with GPUs (K80/P100/V100; availability by region)
gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.13 --module-name trainer.task --package-path trainer --region $REGION \
    --scale-tier BASIC_GPU \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA
--scale-tier BASIC_GPU runs a single worker with a single GPU (https://cloud.google.com/ai-platform/training/docs/machine-types).
Training in the cloud at scale with TPUs
gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.13 --module-name trainer.task --package-path trainer --region $REGION \
    --scale-tier BASIC_TPU \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA
--scale-tier BASIC_TPU trains on a TPU device (https://cloud.google.com/ai-platform/training/docs/using-tpus).
Training in the cloud at scale with custom cluster specs
gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.13 --module-name trainer.task --package-path trainer --region $REGION \
    --scale-tier CUSTOM --config config.yaml \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA
--scale-tier CUSTOM defines a custom cluster in config.yaml (https://cloud.google.com/ai-platform/training/docs/machine-types):
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_l
  workerType: complex_model_l_gpu
  workerCount: 10
  parameterServerType: large_model
Pricing: https://cloud.google.com/ai-platform/training/pricing
Hyperparameter tuning
gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH \
    --runtime-version 1.13 --module-name trainer.task --package-path trainer/ --region $REGION \
    --scale-tier PREMIUM_1 --config config.yaml \
    -- \
    --train-files $TRAIN_DATA --eval-files $EVAL_DATA
The hyperparameter tuning ("hypertuning") settings are read from config.yaml.
Hyperparameter tuning
● Automatic hyperparameter tuning service
● Google-developed “black-box” search (Bayesian optimization) algorithm
● In addition to random search and grid search
● Supports numeric, discrete, and categorical parameters
● Early stopping and resumability
(Figure: an objective function, annotated "We want to find this" and "Not these".)
https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization
https://cloud.google.com/blog/big-data/2018/03/hyperparameter-tuning-on-google-cloud-platform-is-now-faster-and-smarter
Hyperparameter tuning
config.yaml:
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy
    maxTrials: 40
    enableTrialEarlyStopping: True
    maxParallelTrials: 2
    algorithm: ALGORITHM_UNSPECIFIED
    params:
      - parameterName: learning-rate
        type: DOUBLE
        minValue: 0.001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      ...
task.py:
...
# Initialise the optimizer for the DNN
optimizer = tf.train.AdagradOptimizer(
    learning_rate=hparams.learning_rate)
...
parser.add_argument(
    '--learning-rate',
    help='Learning rate used by the DNN optimizer',
    default=0.01,
    type=float
)
...
Each trial passes a value as --learning-rate to task.py; argparse stores it as learning_rate on the parsed arguments (hparams).
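With algorithm left unspecified, AI Platform uses its Bayesian optimization search. As a simpler point of comparison, here is a minimal sketch of random search (also mentioned above) over the learning-rate range from config.yaml, sampled on a log scale as UNIT_LOG_SCALE requests, so 0.001 to 0.01 gets as many trials as 0.01 to 0.1. The objective is a hypothetical stand-in for a real training run; this is not how the service implements tuning.

```python
import math
import random

def sample_log_uniform(min_value, max_value, rng=random):
    """Sample uniformly in log space between min_value and max_value."""
    return math.exp(rng.uniform(math.log(min_value), math.log(max_value)))

def random_search(objective, n_trials, min_value, max_value, rng=random):
    """Evaluate `objective` at random learning rates and keep the best (goal: MAXIMIZE)."""
    best_lr, best_score = None, -math.inf
    for _ in range(n_trials):
        lr = sample_log_uniform(min_value, max_value, rng)
        score = objective(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score

def fake_accuracy(lr):
    """Hypothetical stand-in for 'train the model and report accuracy': peaks at lr = 0.01."""
    return 1.0 - abs(math.log10(lr) - math.log10(0.01)) / 4

rng = random.Random(42)
best_lr, best_score = random_search(fake_accuracy, n_trials=40,
                                    min_value=0.001, max_value=0.1, rng=rng)
print(f"best learning rate: {best_lr:.4f}, accuracy: {best_score:.3f}")
```

Bayesian optimization differs by using the scores of earlier trials to choose the next learning rate, which usually needs fewer trials than this blind sampling.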
Built-in Algorithms
Start an ML Engine training job using the built-in algorithms. No coding required: just use the provided UI.
Training in 4 easy steps:
1. Training algorithm
2. Training data
3. Algorithm arguments
4. Job settings
AI Platform Prediction
● Serverless and no-ops ML serving
● Batch prediction for TensorFlow models on CPUs and GPUs
● Online prediction for scikit-learn models, XGBoost models, and custom prediction routines
● Explainability using different methods (Integrated Gradients (TF), TreeSHAP (XGB), Sampled Shapley, Exact Shapley)
● Data services for data labeling and continuous evaluation
Deploy the trained TF model
With the gcloud command-line tool:
# Create the model
gcloud ai-platform models create $NAME --regions $REGION
# Create a version
gcloud ai-platform versions create $VERSION --model $NAME --origin $MODEL_DIR \
    --runtime-version 1.7
Predicting
Online prediction*:
POST https://ml.googleapis.com/v1/projects/your-project-id/models/${model-name}/versions/${version}:predict
Request body:
{
  "instances": [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
    ...
  ]
}
Batch prediction*:
gcloud ai-platform jobs submit prediction $JOB_NAME \
    --model $NAME \
    --version $VERSION \
    --data-format TEXT \
    --input-paths $GCS_DATA_DIR \
    --output-path $GCS_OUT_DIR \
    --region $REGION
*gcloud commands and APIs exist for both methods.
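A minimal Python sketch of preparing prediction input, assuming the model takes a list of three floats per instance as in the request body above. Online prediction wraps the instances in an {"instances": [...]} body; batch prediction with --data-format TEXT reads newline-delimited JSON, one instance per line. The project, model, and version names are hypothetical, and authentication and the HTTP call itself are omitted.

```python
import json

def online_predict_request(project, model, version, instances):
    """URL and JSON body for an AI Platform online prediction (:predict) call."""
    url = ("https://ml.googleapis.com/v1/"
           f"projects/{project}/models/{model}/versions/{version}:predict")
    return url, json.dumps({"instances": instances})

def batch_input_lines(instances):
    """Newline-delimited JSON, one instance per line, for --data-format TEXT."""
    return "\n".join(json.dumps(instance) for instance in instances)

instances = [[0.0, 1.1, 2.2], [3.3, 4.4, 5.5]]
url, body = online_predict_request("your-project-id", "my_model", "v1", instances)  # hypothetical names
print(url)
print(body)
print(batch_input_lines(instances))
```

The same one-instance-per-line layout is what a file passed to gcloud's --json-instances flag would contain.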
Explainable AI (XAI): Cloud AI Platform provides analysis with every prediction
Simply choose an explanation method when you set up a model, and Cloud AI Platform will tell you, for every prediction, how much each feature affected the final result.
(Diagram: Data, Model, Cloud AI Platform Prediction Service.)
Supported AI Platform explanation methods (method: supported framework; data types; paper)
● Integrated Gradients: TensorFlow; tabular, image, and text data (differentiable models); arxiv.org/abs/1703.01365
● Sampled Shapley: TensorFlow; tabular data; arxiv.org/pdf/1306.4265
● XRAI: TensorFlow; image data; arxiv.org/abs/1906.02825
Deploying explainable models on AI Platform
!gcloud beta ai-platform versions create $VERSION \
    --model $MODEL \
    --origin $export_path \
    --runtime-version 2.1 \
    --framework 'TENSORFLOW' \
    --python-version 3.7 \
    --machine-type n1-standard-4 \
    --explanation-method 'integrated-gradients' \
    --num-integral-steps 25
Deploy model:
gcloud beta ai-platform versions create $VERSION \
    --model $MODEL \
    --origin $export_path \
    --runtime-version 1.15 \
    --framework TENSORFLOW \
    --python-version 3.7 \
    --machine-type n1-standard-4
...with explanations:
gcloud beta ai-platform versions create $VERSION \
    --model $MODEL \
    --origin $export_path \
    --runtime-version 1.15 \
    --framework TENSORFLOW \
    --python-version 3.7 \
    --machine-type n1-standard-4 \
    --explanation-method 'integrated-gradients' \
    --num-integral-steps 25
Request predictions:
gcloud beta ai-platform predict \
    --model $MODEL \
    --version $VERSION \
    --json-instances='data.txt'
...with explanations:
gcloud beta ai-platform explain \
    --model $MODEL \
    --version $VERSION \
    --json-instances='data.txt'
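To make --num-integral-steps concrete: Integrated Gradients attributes a prediction to each feature by averaging the model's gradients along a straight path from a baseline input to the actual input, then multiplying by the input difference. A minimal pure-Python sketch, assuming a toy logistic-regression "model" with hand-picked (hypothetical) weights and an all-zeros baseline. AI Platform computes this for you, and its exact integration scheme may differ from the midpoint rule used here.

```python
import math

# Hypothetical differentiable model: logistic regression with fixed weights
WEIGHTS = [0.8, -1.2, 0.5]
BIAS = 0.1

def model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Analytic gradient of the sigmoid output with respect to each input feature."""
    s = model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=25):
    """Midpoint Riemann-sum approximation; `steps` plays the role of --num-integral-steps."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the baseline -> input path
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    # Average gradient times the distance travelled in each feature
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
print(attributions)
# Completeness: attributions sum (approximately) to model(x) - model(baseline)
print(sum(attributions), model(x) - model(baseline))
```

More integral steps make the sum track model(x) - model(baseline) more closely, at the cost of more gradient evaluations per explained prediction.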
Steps to train a TensorFlow model - Docker user journey
1. Develop a TensorFlow model and training code
2. Create a Dockerfile with your model code
3. Build the image
4. Push it to a container registry (e.g. Google Container Registry)
5. Kick off your AI Platform training job
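First: Create your model and training code
model.py:
import tensorflow as tf
from tensorflow.keras.layers import Dense

# input_dim (the number of input features) is defined elsewhere in model.py
model = tf.keras.Sequential([
    Dense(100, activation='relu', input_shape=(input_dim,)),
    Dense(75, activation='relu'),
    Dense(50, activation='relu'),
    Dense(25, activation='relu'),
    Dense(1, activation='sigmoid'),  # single probability for binary classification
])

task.py:
# Train model (keras_model, the datasets, args, num_train_examples and the
# lr_decay_cb / tensorboard_cb callbacks are created elsewhere in task.py)
keras_model.fit(
    training_dataset,
    steps_per_epoch=int(num_train_examples / args.batch_size),
    epochs=args.num_epochs,
    validation_data=validation_dataset,
    validation_steps=1,
    verbose=1,
    callbacks=[lr_decay_cb, tensorboard_cb])

Sample code showing structure in the cloudml-samples repo.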
Second: Create your Dockerfile
FROM gcr.io/deeplearning-platform-release/tf2-ent-latest-gpu
WORKDIR /root
COPY model.py /root/model.py
COPY task.py /root/task.py
ENTRYPOINT ["python", "task.py"]
Extend DLVM Image
Third: Build, test, and push the image
IMAGE="gcr.io/MY-PROJECT/MY-REPO:MY_IMAGE"
# Build image
docker build -f Dockerfile -t $IMAGE .
# Test locally
docker run $IMAGE --lr 0.1
# Push to container registry
docker push $IMAGE
Run locally to test. You can pass custom model parameters (e.g. learning rate) into the image.
Fourth: Submit the training job
gcloud ai-platform jobs submit training my-job \
    --region us-west1 \
    --master-image-uri gcr.io/my-project/my-repo:my-image \
    -- \
    --lr=0.1
The flags before the -- are standard parameters for the AI Platform command. Everything after the -- is a custom parameter that your training code is designed to accept.
Optional: Add hyperparameter tuning
config.yaml:
trainingInput:
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: "my_loss"
    maxTrials: 20
    maxParallelTrials: 5
    enableTrialEarlyStopping: True
    params:
      - parameterName: lr
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.1
task.py:
...
parser.add_argument(
    '--lr',
    type=float,
    default=0.01,
    metavar='LR',
    help='learning rate (default: 0.01)')
...
Add the parameter --config config.yaml to the training job.
View the job in the console.
Data Labeling Service
● Custom instructions: provide your own custom instructions to labelers
● Human-labeled data: get high-quality human-labeled data to train and evaluate your ML models
● Labeling tasks for unstructured data: tasks focusing on images, videos, and text
● Continuous evaluation: record sample predictions in BigQuery and send them for evaluation

Interactive Powerpoint_How to Master effective communicationnomboosow
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 

Recently uploaded (20)

Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 

Production ML Systems and Computer Vision with Google Cloud

  • 1. Session 4 Vikram Tiwari Professional Machine Learning Engineer Margaret Maynard-Reid
  • 2. Professional Machine Learning Certification Learning Journey, organized by Google Developer Groups Surrey co-hosting with GDG Seattle. All sessions are virtual.
      ● Session 1 (Feb 24, 2024): Lightning talk + Kick-off & Machine Learning Basics + Q&A. Complete course: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning.
      ● Session 2 (Mar 2, 2024): Lightning talk + GCP TensorFlow & Feature Engineering + Q&A. Complete course: TensorFlow on Google Cloud; Feature Engineering.
      ● Session 3 (Mar 9, 2024): Lightning talk + Enterprise Machine Learning + Q&A. Complete course: Machine Learning in the Enterprise.
      ● Session 4 (Mar 16, 2024): Production ML Systems and Computer Vision with Google Cloud + Q&A. Hands-on lab practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud.
      ● Session 5 (Mar 23, 2024): Lightning talk + NLP & Recommendation Systems on GCP + Q&A. Complete course: Natural Language Processing on Google Cloud; Recommendation Systems on GCP.
      ● Session 6 (Apr 6, 2024): Lightning talk + MOPs & ML Pipelines on GCP + Q&A. Complete course: ML Ops - Getting Started; ML Pipelines on Google Cloud.
      ● Additional preparation: review the Professional ML Engineer Exam Guide and the Professional ML Engineer Sample Questions; go through Google Cloud Platform Big Data and Machine Learning Fundamentals.
      ● Hands-on lab practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge, 7 hrs); Build and Deploy ML Solutions on Vertex AI (Skill Badge, 8 hrs).
      ● Self study (and potential exam). Check readiness: Professional ML Engineer Sample Questions.
  • 3. Session 4 Study Group
      ● Computer Vision: Vision API & AutoML Vision; beyond the course.
      ● Model Development: build a model; train a model; test a model; scale model training and serving.
  • 5. About me: ML GDE (Google Developer Expert), GDG Seattle organizer, 3D artist, fashion designer, instructor at UW. margaretmz.art
  • 6. Computer Vision on Google Cloud (Cloud Skills Boost): Course Overview
      ● Module 1: Introduction to Computer Vision
      ● Module 2: Vertex AI AutoML Vision
      ● Module 3: Custom Training
      ● Module 4: Convolutional Neural Networks
      ● Module 5: Working with Image Data
  • 7. Module 1: Intro to Computer Vision
      ● What is computer vision
      ● Different types of computer vision use cases
      ● Various ML tools on Google Cloud
      ● Experiment with pre-built APIs
  • 8. Computer Vision Use Cases (in increasing order of complexity)
      ● Image classification (single-label): classify an image into one class. Example: painting style or artist (Van Gogh).
      ● Image classification (multi-label): classify an image into multiple classes. Example: movie poster genre (action, sci-fi).
      ● Feature extraction: extract latent features of an image with CNN models. Example: visual search to find similar fashion items.
      ● Object detection: identify one or multiple objects within an image and their locations with bounding boxes. Example: detect UI elements.
      ● Segmentation: classify whether each pixel of the image belongs to a certain class. Example: segment UI elements.
      ● Generative models, computer vision (+ NLP). Examples: generate new images, super resolution, image-to-image, text-to-image, etc. (The slide shows a generated image.)
  • 9. Google Cloud Vision API: Module 1 labs
      ● Lab 1: Detecting Labels, Faces and Landmarks in Images with the Cloud Vision API
          ○ Create a bucket
          ○ Upload an image (public access)
          ○ Send a JSON request
          ○ Receive a JSON response
      ● Lab 2: Extracting Text from the Images using the Google Cloud Vision API
          ○ Cloud Functions
          ○ Upload images to Cloud Storage
          ○ Extract, translate and save text
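Lab 1 sends a JSON request to the Vision API and reads back a JSON response. As a minimal sketch, the request body for the `images:annotate` REST method can be built with the standard library alone. The bucket and object name are hypothetical placeholders, and the code only constructs the payload; actually sending it requires an authenticated POST to `https://vision.googleapis.com/v1/images:annotate`.

```python
import json

# Hypothetical image location; replace with your own bucket and object.
IMAGE_URI = "gs://my-bucket/donuts.png"


def build_annotate_request(image_uri, feature_types, max_results=10):
    """Build a Vision API images:annotate request body for a single image."""
    features = [{"type": t, "maxResults": max_results} for t in feature_types]
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": features,
            }
        ]
    }


# Labels, faces and landmarks as in Lab 1; TEXT_DETECTION would cover the OCR of Lab 2.
body = build_annotate_request(
    IMAGE_URI, ["LABEL_DETECTION", "FACE_DETECTION", "LANDMARK_DETECTION"]
)
print(json.dumps(body, indent=2))
```

The response mirrors the requested features, with results under keys such as `labelAnnotations`, `faceAnnotations` and `landmarkAnnotations`.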
  • 10. Module 2: Vertex AI & AutoML Vision
      ● Intro to Vertex AI, Google’s unified AI platform
      ● Automated ML pipeline with AutoML
      ● AutoML Vision
      ● AutoML example
      ● Options: Vision API vs AutoML Vision vs custom training
  • 11. Module 3: Custom Training
      ● Image classification
      ● Custom image classifier with the 5-flowers dataset
      ● TensorFlow:
          ○ Linear network
          ○ Neural network
          ○ Deep Neural Networks (DNN)
      ● Dropout and Batch Normalization
  • 12. Module 4: Convolutional Neural Networks (CNN)
      ● How to use CNNs
      ● What makes CNNs different?
      ● Key CNN model parameters: filters, number of channels, kernel size, etc.
      ● Working with pooling layers
      ● Implement CNNs on Vertex AI with a pre-built TensorFlow container using Vertex AI Workbench
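To make the key CNN parameters concrete, here is a small framework-free sketch that computes a 2D convolution's output size and trainable-parameter count from the number of filters, input channels, kernel size, stride and padding, plus the output size of a pooling layer. It assumes the standard "valid"/"same" padding formulas used by Keras `Conv2D` and `MaxPooling2D`; the layer sizes in the example are hypothetical.

```python
import math


def conv2d_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a 2D convolution along one axis (Keras conventions)."""
    if padding == "valid":
        return (n - kernel) // stride + 1
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding}")


def conv2d_params(kernel, in_channels, filters, use_bias=True):
    """Each filter owns a kernel x kernel x in_channels weight block (plus one bias)."""
    weights = kernel * kernel * in_channels * filters
    return weights + (filters if use_bias else 0)


def pool_output_size(n, pool=2, stride=None):
    """Output size of a 'valid' pooling layer; the stride defaults to the pool size."""
    stride = stride or pool
    return (n - pool) // stride + 1


# Hypothetical layer: 224x224 RGB input (3 channels), 32 filters of size 3x3.
h = conv2d_output_size(224, kernel=3, padding="valid")
params = conv2d_params(kernel=3, in_channels=3, filters=32)
pooled = pool_output_size(h, pool=2)
print(h, params, pooled)
```

The number of filters becomes the number of output channels of the layer, and the pooling layer halves the spatial size without adding any trainable weights.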
  • 13. Module 5: Image Data
      ● Preprocessing (with Keras and TensorFlow dataset)
      ● Data scarcity problem:
          ○ Image augmentation
          ○ Transfer learning
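Image augmentation tackles data scarcity by creating label-preserving variants of the training images you already have. A framework-free sketch of two common transforms (horizontal flip and random crop), assuming an image stored as a nested list of pixel rows; in a TensorFlow pipeline you would normally use built-in ops such as Keras preprocessing layers (for example `tf.keras.layers.RandomFlip`) instead.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a random crop_h x crop_w window out of the image."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng=random):
    """Flip with probability 0.5, then crop: one new training example per call."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)


# A tiny 3x4 "image" of pixel values, for illustration only.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
print(horizontal_flip(img))
print(augment(img, 2, 2, random.Random(0)))
```

Because each call draws new random choices, running `augment` every epoch shows the model a slightly different version of each image.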
  • 14. Transfer learning: Why, What & How (“I know Kungfu…”)
      ● Why transfer learning? Less data and faster training
      ● What is transfer learning?
      ● How to use transfer learning
  • 15. Computer Vision on Google Cloud
      ● Core services
          ○ Cloud Vision API: image labeling, face detection, landmark detection, text extraction (OCR)
          ○ AutoML Vision: image classification, object detection
      ● Specialized services
          ○ Video Intelligence API: shot change detection, object tracking, text detection
          ○ Document AI: form parsing, invoice/receipt processing
      ● Vertex AI
          ○ Gemini Pro Vision: visual analysis, multimodal Q&A
          ○ Imagen 2: image generation, image editing, visual captioning, visual Q&A
  • 20. ML lifecycle - People
  • 22. ML lifecycle - Infrastructure
      ● Exploration phase (component-wise test): in a notebook
      ● Development phase (ML pipeline tested as a whole): local machine/VM
      ● Production phase (integrate with other products): on the cloud, AI Platform
  • 23. ML lifecycle - Data: skew and drift are the silent killers of your ML models
      ● Training-serving skew: the feature at training is a green banana, but the feature at serving is a yellow banana.
      ● Prediction drift: the feature at serving changes from green to yellow over time.
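One common way to watch for skew and drift (not necessarily the method taught in the course) is to compare a feature's distribution at training time with its distribution at serving time. The sketch below computes the Population Stability Index (PSI) over categorical values, reusing the banana-color example; the 0.2 alert threshold is a widely used rule of thumb, assumed here rather than taken from the slides.

```python
import math
from collections import Counter


def distribution(values, categories, eps=1e-6):
    """Share of each category, floored at eps so empty buckets never divide by zero."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts[c] / total, eps) for c in categories}


def psi(train_values, serve_values):
    """Population Stability Index between training and serving feature values."""
    categories = set(train_values) | set(serve_values)
    p = distribution(train_values, categories)
    q = distribution(serve_values, categories)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in categories)


train = ["green"] * 90 + ["yellow"] * 10          # bananas seen during training
serve_stable = ["green"] * 88 + ["yellow"] * 12   # serving traffic close to training
serve_skewed = ["green"] * 20 + ["yellow"] * 80   # serving traffic now mostly yellow

for name, serve in [("stable", serve_stable), ("skewed", serve_skewed)]:
    score = psi(train, serve)
    print(name, round(score, 3), "ALERT" if score > 0.2 else "ok")
```

Run against a fixed training snapshot this flags skew; run on a rolling window of serving data it tracks drift.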
  • 24. Research vs Production
      ● Objectives. Research: model performance. Production: different stakeholders have different objectives.
      ● Computational priority. Research: fast training, high throughput. Production: fast inference, low latency.
      ● Data. Research: static. Production: constantly shifting.
      ● Fairness. Research: good to have (sadly). Production: important.
      ● Interpretability. Research: good to have. Production: important.
  • 26. Session 4: Production ML (@Vikram_Tiwari, @chipro, @hanneshapke)
      ● Architecting production ML systems
      ● Designing adaptable ML systems
      ● Designing high-performance ML systems
      ● Building hybrid ML systems
  • 27. Three ways Google Cloud can help you benefit from ML, ranging from easy-to-use (for non-ML engineers) to customizable (for data scientists):
      ● 1. Pre-trained models (our data + our models): Vision, Translation, Natural Language, Speech-to-Text, Job Discovery, Video Intelligence, Text-to-Speech
      ● 2. Retrained models (your data + our models): AutoML
      ● 3. Custom models (your data + your model): Compute Engine, GPU, Cloud TPU, Cloud Dataproc, Kubernetes Engine, BigQuery, AI Platform Training & Prediction
      ● Related products: Translate, NLP, Speech, Vision, Tables, Recommendation, Dialogflow Enterprise
  • 28. What is AI Platform?
      ● An end-to-end environment for AI inside the GCP console.
      ● Offers an integrated toolchain from data engineering to model deployment, with “no lock-in”.
      ● Lets you run on-premises or on Google Cloud without significant code changes.
      ● Gives access to cutting-edge Google AI technology like TensorFlow, TPUs, and TFX tools as you deploy your AI applications to production.
  • 29. What is included?
      ● AI Platform: Notebooks, Deep Learning VM Images, Data Labeling, Training, Predictions, Pre-built Algorithms
      ● Integrated with: Google BigQuery (data warehousing), Cloud Dataflow (data transformation), Cloud Dataprep (data cleansing), Cloud Dataproc (Hadoop and Spark clusters), Google Data Studio (BI dashboards)
  • 30. What is included?
      ● AI Platform: AI Hub, Pipelines, Notebooks, Data Labeling, Training, Predictions, Pre-built Algorithms
      ● Kubeflow (on premises)
      ● Integrated with: Google BigQuery (data warehousing), Cloud Dataflow (data transformation), Cloud Dataprep (data cleansing), Cloud Dataproc (Hadoop and Spark clusters), Google Data Studio (BI dashboards)
  • 31. AI Platform Notebooks: a hosted Jupyter notebook solution that makes it easy for data scientists to spin up JupyterLab and gives DevOps teams the controls they need.
      ● Centrally managed: DevOps teams can easily manage and secure these environments.
      ● Get started quickly: the latest data science and machine learning frameworks are pre-configured.
      ● No learning curve: uses the industry-standard JupyterLab interface.
      ● Scalable & cost-effective: pick the hardware you need, and scale up and down easily.
      ● GCP integration: it’s easy to access and use GCP services from within your notebooks.
      ● Easily build, train, and deploy models: supports the full ML lifecycle through integration with the most popular ML frameworks and tools.
  • 32. Get started quickly: One-Click Deployment. Spin up a JupyterLab instance, pre-configured with the latest machine learning and data science frameworks, in one click.
  • 33. No learning curve: JupyterLab. AI Platform Notebooks uses the latest open-source version of the industry-standard JupyterLab.
  • 34. Scalable & cost-effective: Scale On Demand. You can easily change hardware, including adding and removing GPUs.
  • 36. AI Platform Training
      ● Serverless and no-ops ML training
      ● Distributed training infrastructure that supports CPUs, GPUs and TPUs
      ● Hyperparameter tuning
      ● Train and tune TensorFlow models, scikit-learn models, XGBoost models and custom containers
      ● Multiple runtime versions for different frameworks
      ● Prebuilt algorithms (TensorFlow linear learner and wide & deep algorithm, XGBoost algorithm)
  • 38. Package the model trainer (template: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/cloudml-template)
      ● From the project root directory: python task.py --train-files $TRAIN_DATA --eval-files $EVAL_DATA
      ● task.py:
          ○ Accepts command-line arguments
          ○ Uploads the model to GCS
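The trainer's `task.py` is the entry point that accepts command-line arguments. A minimal sketch of that argument handling with `argparse`, assuming the flag names used on these slides (`--train-files`, `--eval-files`, `--job-dir`, `--learning-rate`); the bucket paths are hypothetical, and `run` is a hypothetical placeholder that stands in for training and for uploading the model to GCS.

```python
import argparse


def get_args(argv=None):
    """Parse trainer flags; with argv=None, argparse reads sys.argv (the real entry point)."""
    parser = argparse.ArgumentParser(description="Trainer entry point (sketch)")
    parser.add_argument("--train-files", nargs="+", required=True,
                        help="Training data path(s), local or gs://")
    parser.add_argument("--eval-files", nargs="+", required=True,
                        help="Evaluation data path(s), local or gs://")
    parser.add_argument("--job-dir", default="./output",
                        help="Where to write checkpoints and the exported model")
    parser.add_argument("--learning-rate", type=float, default=0.01,
                        help="Learning rate used by the optimizer")
    return parser.parse_args(argv)


def run(args):
    """Hypothetical placeholder: load data, train, then export the model to args.job_dir."""
    print(f"train={args.train_files} eval={args.eval_files} "
          f"lr={args.learning_rate} out={args.job_dir}")


# Explicit argv for demonstration; argparse turns dashes into underscores (train_files).
args = get_args([
    "--train-files", "gs://my-bucket/train.csv",
    "--eval-files", "gs://my-bucket/eval.csv",
    "--job-dir", "gs://my-bucket/model",
])
run(args)
```

In a real `task.py` you would call `get_args()` with no argument, so the same flags work whether they come from `python task.py`, `gcloud ai-platform local train`, or a cloud training job.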
  • 39. Training locally
        gcloud ai-platform local train \
            --module-name trainer.task \
            --package-path trainer/ \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA \
            --job-dir $MODEL_DIR
      Arguments after the standalone -- go to the trainer: training data, evaluation data, and the output directory (all local paths when training locally).
  • 40. Training in the cloud with a single node
        gcloud ai-platform jobs submit training $JOB_NAME \
            --job-dir $OUTPUT_PATH \
            --runtime-version 1.13 \
            --module-name trainer.task \
            --package-path trainer \
            --region $REGION \
            --scale-tier BASIC \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA
      --scale-tier BASIC runs a single worker. Machine types: https://cloud.google.com/ai-platform/training/docs/machine-types
  • 41. Training in the cloud at scale with GPUs
        gcloud ai-platform jobs submit training $JOB_NAME \
            --job-dir $OUTPUT_PATH \
            --runtime-version 1.13 \
            --module-name trainer.task \
            --package-path trainer \
            --region $REGION \
            --scale-tier BASIC_GPU \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA
      --scale-tier BASIC_GPU runs a single worker with a single GPU (K80/P100/V100; availability varies by region). Machine types: https://cloud.google.com/ai-platform/training/docs/machine-types
  • 42. Training in the cloud at scale with TPUs
        gcloud ai-platform jobs submit training $JOB_NAME \
            --job-dir $OUTPUT_PATH \
            --runtime-version 1.13 \
            --module-name trainer.task \
            --package-path trainer \
            --region $REGION \
            --scale-tier BASIC_TPU \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA
      --scale-tier BASIC_TPU runs on a TPU device. See https://cloud.google.com/ai-platform/training/docs/using-tpus
  • 43. Training in the cloud at scale with custom cluster specs
        gcloud ai-platform jobs submit training $JOB_NAME \
            --job-dir $OUTPUT_PATH \
            --runtime-version 1.13 \
            --module-name trainer.task \
            --package-path trainer \
            --region $REGION \
            --scale-tier CUSTOM \
            --config config.yaml \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA
      config.yaml (custom cluster):
        trainingInput:
          scaleTier: CUSTOM
          masterType: complex_model_l
          workerType: complex_model_l_gpu
          workerCount: 10
          parameterServerType: large_model
      Machine types: https://cloud.google.com/ai-platform/training/docs/machine-types
      Pricing: https://cloud.google.com/ai-platform/training/pricing
  • 44. Hyperparameter tuning
        gcloud ai-platform jobs submit training $JOB_NAME \
            --job-dir $OUTPUT_PATH \
            --runtime-version 1.13 \
            --module-name trainer.task \
            --package-path trainer/ \
            --region $REGION \
            --scale-tier PREMIUM_1 \
            --config config.yaml \
            -- \
            --train-files $TRAIN_DATA \
            --eval-files $EVAL_DATA
      Here config.yaml holds the hyperparameter tuning (hypertuning) specification.
  • 45. Hyperparameter tuning
      ● Automatic hyperparameter tuning service
      ● Google-developed “black-box” search algorithm (Bayesian optimisation)
      ● In addition to random search and grid search
      ● Supports numeric, discrete, and categorical parameters
      ● Early stopping & resumability
      (Figure: an objective function, marking the optimum we want to find versus the points we do not.)
      https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization
      https://cloud.google.com/blog/big-data/2018/03/hyperparameter-tuning-on-google-cloud-platform-is-now-faster-and-smarter
  • 46. Hyperparameter tuning
      config.yaml:
        trainingInput:
          hyperparameters:
            goal: MAXIMIZE
            hyperparameterMetricTag: accuracy
            maxTrials: 40
            enableTrialEarlyStopping: True
            maxParallelTrials: 2
            algorithm: UNSPECIFIED
            params:
              - parameterName: learning-rate
                type: DOUBLE
                minValue: 0.001
                maxValue: 0.1
                scaleType: UNIT_LOG_SCALE
              ...
      task.py (excerpts):
        ...
        # Initialise the optimizer for the DNN
        optimizer = tf.train.AdagradOptimizer(
            learning_rate=hparams.learning_rate)
        ...
        parser.add_argument(
            '--learning-rate',
            help='Learning rate used by the DNN optimizer',
            default=0.01,
            type=float
        )
        ...
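`scaleType: UNIT_LOG_SCALE` asks the tuning service to explore the learning rate on a logarithmic scale, so the range 0.001 to 0.01 gets as much attention as 0.01 to 0.1. The sketch below only illustrates what that scaling means, using plain random sampling over the same range as the config above; it is not the service's Bayesian optimization algorithm.

```python
import math
import random


def sample_log_uniform(min_value, max_value, rng=random):
    """Draw a value whose base-10 logarithm is uniform between log(min) and log(max)."""
    lo, hi = math.log10(min_value), math.log10(max_value)
    return 10 ** rng.uniform(lo, hi)


def sample_linear(min_value, max_value, rng=random):
    """Draw a value uniformly on a linear scale, for comparison."""
    return rng.uniform(min_value, max_value)


def fraction_below(draws, threshold=0.01):
    """Share of draws landing in the lowest decade [0.001, 0.01)."""
    return sum(d < threshold for d in draws) / len(draws)


rng = random.Random(42)
n = 10_000
log_draws = [sample_log_uniform(0.001, 0.1, rng) for _ in range(n)]
lin_draws = [sample_linear(0.001, 0.1, rng) for _ in range(n)]

print(f"log scale: {fraction_below(log_draws):.2f}, "
      f"linear scale: {fraction_below(lin_draws):.2f}")
```

On a log scale about half of the samples fall below 0.01; on a linear scale fewer than one in ten do, which is why learning rates are usually searched logarithmically.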
  • 47. What is included?
      ● AI Platform: Pipelines, Data Labeling, Training, Pre-built Algorithms
      ● Integrated with: Google BigQuery (data warehousing), Cloud Dataflow (data transformation), Cloud Dataprep (data cleansing)
  • 48. Built-in Algorithms: start an ML Engine training job using the built-in algorithms. No coding required! Just use the provided UI.
  • 49. Training in 4 easy steps
      1. Training algorithm
      2. Training data
      3. Algorithm arguments
      4. Job settings
  • 50. What is included?
      ● AI Platform: Pipelines, Notebooks, Data Labeling, Training, Predictions, Pre-built Algorithms
      ● Integrated with: Google BigQuery (data warehousing), Cloud Dataflow (data transformation), Cloud Dataprep (data cleansing), Cloud Dataproc (Hadoop and Spark clusters), Google Data Studio (BI dashboards)
  • 51. AI Platform Prediction
      ● Serverless and no-ops ML serving
      ● Batch prediction for TensorFlow models on CPUs and GPUs
      ● Online prediction for scikit-learn models, XGBoost models and custom prediction routines
      ● Explainability using different methods (Integrated Gradients (TF), TreeSHAP (XGB), Sampled Shapley, Exact Shapley)
      ● Data services for data labeling and continuous evaluation
  • 52. Deploy the trained TF model with the gcloud command-line tool:
        # Create the model
        gcloud ai-platform models create $NAME --regions $REGION
        # Create a version
        gcloud ai-platform versions create $VERSION --model $NAME --origin $MODEL_DIR --runtime-version 1.7
  • 53. Predicting (*gcloud commands and APIs exist for both methods)
      ● Online prediction*:
        POST https://ml.googleapis.com/v1/projects/your-project-id/models/${model-name}/versions/${version}:predict
        Request body:
        {
          "instances": [
            [0.0, 1.1, 2.2],
            [3.3, 4.4, 5.5],
            ...
          ]
        }
      ● Batch prediction*:
        gcloud ai-platform jobs submit prediction $JOB_NAME --model $NAME --version $VERSION --data-format TEXT --input-paths $GCS_DATA_DIR --output-path $GCS_OUT_DIR
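Both prediction paths consume the same instances. A standard-library sketch, assuming three numeric features per instance as on the slide: it builds the URL and JSON body for an online `:predict` request, and writes the same instances as newline-delimited JSON (one instance per line), the text format used for batch prediction input (`--data-format TEXT`) and for `--json-instances` files. The project, model and version names are hypothetical; authentication and the HTTP call itself are omitted.

```python
import json
import os
import tempfile

# Hypothetical identifiers.
PROJECT, MODEL, VERSION = "your-project-id", "my_model", "v1"

instances = [
    [0.0, 1.1, 2.2],
    [3.3, 4.4, 5.5],
]

# Online prediction: a single POST carrying all instances in the request body.
url = (f"https://ml.googleapis.com/v1/projects/{PROJECT}"
       f"/models/{MODEL}/versions/{VERSION}:predict")
body = json.dumps({"instances": instances})

# Batch prediction / --json-instances: newline-delimited JSON, one instance per line.
path = os.path.join(tempfile.gettempdir(), "data.txt")
with open(path, "w") as f:
    for instance in instances:
        f.write(json.dumps(instance) + "\n")

print(url)
print(body)
```

For batch prediction the file would then be copied to Cloud Storage and referenced through `--input-paths`.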
  • 54. Explainable AI (XAI): Cloud AI Platform provides analysis with every prediction. Simply choose an explanation method when you set up a model, and Cloud AI Platform will report, for every prediction, how much each feature affected the final result. (Diagram: data, model, Cloud AI Platform Prediction Service.)
  • 55. Supported AI Platform explanation methods
      ● Integrated Gradients. Frameworks: TensorFlow. Data types: tabular, image, text (differentiable models). Paper: arxiv.org/abs/1703.01365
      ● Sampled Shapley. Frameworks: TensorFlow. Data types: tabular. Paper: arxiv.org/pdf/1306.4265
      ● XRAI. Frameworks: TensorFlow. Data types: image. Paper: arxiv.org/abs/1906.02825
  • 56. Deploying explainable models on AI Platform !gcloud beta ai-platform versions create $VERSION --model $MODEL --origin $export_path --runtime-version 2.1 --framework 'TENSORFLOW' --python-version 3.7 --machine-type n1-standard-4 --explanation-method 'integrated-gradients' --num-integral-steps 25
  • 57. Deploy the model and request predictions
      ● Deploy model:
        gcloud beta ai-platform versions create $VERSION --model $MODEL --origin $export_path --runtime-version 1.15 --framework TENSORFLOW --python-version 3.7 --machine-type n1-standard-4
      ● ...with explanations:
        gcloud beta ai-platform versions create $VERSION --model $MODEL --origin $export_path --runtime-version 1.15 --framework TENSORFLOW --python-version 3.7 --machine-type n1-standard-4 --explanation-method 'integrated-gradients' --num-integral-steps 25
      ● Request predictions:
        gcloud beta ai-platform predict --model $MODEL --version $VERSION --json-instances='data.txt'
      ● ...with explanations:
        gcloud beta ai-platform explain --model $MODEL --version $VERSION --json-instances='data.txt'
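Integrated Gradients attributes a prediction to each input feature by averaging the model's gradient along a straight path from a baseline to the input and multiplying by the input-baseline difference; `--num-integral-steps 25` sets how many points approximate that path integral. A framework-free sketch on a toy model (a logistic regression with hypothetical fixed weights and a hand-written gradient) using a midpoint Riemann sum. It also checks the completeness property: the attributions should add up to F(input) - F(baseline). The deployed service computes gradients of your real model; this only shows the math.

```python
import math

# Hypothetical toy model: logistic regression with fixed weights.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = 0.1


def model(x):
    """F(x) = sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """dF/dx_i = F(x) * (1 - F(x)) * w_i for the sigmoid model above."""
    f = model(x)
    return [f * (1.0 - f) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=25):
    """Approximate IG with a midpoint Riemann sum over `steps` points on the path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    # Average gradient along the path, scaled by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, steps=25)
print([round(a, 4) for a in attributions])
print("sum:", round(sum(attributions), 4),
      "F(x) - F(baseline):", round(model(x) - model(baseline), 4))
```

Features with a negative weight receive negative attributions; more integral steps shrink the gap between the sum of attributions and F(x) - F(baseline).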
  • 58. Steps to train a TensorFlow model: Docker user journey
      1. Develop a TensorFlow model and training code
      2. Create a Dockerfile with your model code
      3. Build the image
      4. Push it to a container registry (e.g. Google Container Registry)
      5. Kick off your AI Platform training job
  • 59. First: create your model and training code (sample code showing this structure is in the cloudml-samples repo)
      model.py (excerpt):
        from tensorflow.keras import Sequential
        from tensorflow.keras.layers import Dense

        model = Sequential([
            Dense(100, activation='relu', input_shape=(input_dim,)),
            Dense(75, activation='relu'),
            Dense(50, activation='relu'),
            Dense(25, activation='relu'),
            Dense(1, activation='sigmoid'),
        ])
      task.py (excerpt):
        # Train model
        keras_model.fit(
            training_dataset,
            steps_per_epoch=int(num_train_examples / args.batch_size),
            epochs=args.num_epochs,
            validation_data=validation_dataset,
            validation_steps=1,
            verbose=1,
            callbacks=[lr_decay_cb, tensorboard_cb])
  • 60. Second: create your Dockerfile (extending a DLVM image)
        FROM gcr.io/deeplearning-platform-release/tf2-ent-latest-gpu
        WORKDIR /root
        COPY model.py /root/model.py
        COPY task.py /root/task.py
        ENTRYPOINT ["python", "task.py"]
  • 61. Third: build, test, and push the image
        IMAGE="gcr.io/MY-PROJECT/MY-REPO:MY_IMAGE"
        # Build image
        docker build -f Dockerfile -t $IMAGE .
        # Test locally
        docker run $IMAGE --lr 0.1
        # Push to container registry
        docker push $IMAGE
      Run locally to test. You can pass custom model parameters (e.g. learning rate) into the image.
  • 62. Fourth: submit the training job
        gcloud ai-platform jobs submit training my-job --region us-west1 --master-image-uri gcr.io/my-project/my-repo:my-image -- --lr=0.1
      Flags before the standalone -- are standard parameters for the AI Platform command; everything after the -- is a custom parameter that your training code is designed to accept.
  • 63. Optional: add hyperparameter tuning (add the parameter --config config.yaml to the training job)
      config.yaml:
        trainingInput:
          hyperparameters:
            goal: MINIMIZE
            hyperparameterMetricTag: "my_loss"
            maxTrials: 20
            maxParallelTrials: 5
            enableTrialEarlyStopping: True
            params:
              - parameterName: lr
                type: DOUBLE
                minValue: 0.0001
                maxValue: 0.1
      task.py (excerpt):
        ...
        parser.add_argument(
            '--lr', type=float, default=0.01, metavar='LR',
            help='learning rate (default: 0.01)')
        ...
  • 64. View Job in Console
  • 65. What is included?
      ● AI Platform: Pipelines, Notebooks, Data Labeling, Training, Predictions, Pre-built Algorithms
      ● Integrated with: Google BigQuery (data warehousing), Cloud Dataflow (data transformation), Cloud Dataprep (data cleansing), Cloud Dataproc (Hadoop and Spark clusters)
  • 66. Data Labeling Service
      ● Custom instructions: provide your own custom instructions to labelers.
      ● Human-labeled data: get high-quality human-labeled data to train and evaluate your ML models.
      ● Labeling tasks for unstructured data: tasks focusing on images, videos and text.
      ● Continuous evaluation: record sample predictions in BigQuery and send them for evaluation.