Session 1
Professional Machine Learning Engineer
Sowndarya Venkateswaran
Margaret Maynard-Reid
1. Where are we on our journey
2. Session 1 Content Review
3. Sample Question Review
4. Q&A
5. Preview actions for next week

Where are we on our journey
Professional Machine Learning Certification Learning Journey
Organized by Google Developer Group (GDG) Surrey, co-hosted with GDG Seattle

Session schedule (all virtual):
● Session 1 (Feb 24, 2024): Lightning talk + Kick-off & Machine Learning Basics + Q&A
● Session 2 (Mar 2, 2024): Lightning talk + GCP TensorFlow & Feature Engineering + Q&A
● Session 3 (Mar 9, 2024): Lightning talk + Enterprise Machine Learning + Q&A
● Session 4 (Mar 16, 2024): Lightning talk + Production Machine Learning with Google Cloud + Q&A
● Session 5 (Mar 23, 2024): Lightning talk + NLP & Recommendation Systems on GCP + Q&A
● Session 6 (Apr 6, 2024): Lightning talk + MLOps & ML Pipelines on GCP + Q&A
● After Session 6: self study (and potential exam)

Study track alongside the sessions:
● Review the Professional ML Engineer Exam Guide and Sample Questions
● Go through: Google Cloud Platform Big Data and Machine Learning Fundamentals
● Hands-on lab practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge, 7 hrs); Build and Deploy ML Solutions on Vertex AI (Skill Badge, 8 hrs)
● Complete courses: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning; TensorFlow on Google Cloud; Feature Engineering; Machine Learning in the Enterprise; Natural Language Processing on Google Cloud; Recommendation Systems on GCP; MLOps - Getting Started; ML Pipelines on Google Cloud
● Hands-on lab practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud
● Check readiness: Professional ML Engineer Sample Questions
Session 1 Content Review
Session 1 Topics

Intro to AI and ML on Google Cloud
- Intro
- AI Foundations
- AI Development Options
- AI Development Workflow
- Generative AI

ML Problem Framing
- Translate business challenge into ML use case.
- Define ML problem.
- Define business success criteria.
- Identify risks to feasibility and implementation of ML solution.
- Envision future solution improvements.

MLE Learning Path - Google Cloud Skills Boost
Intro to AI and ML on Google Cloud

About me (Margaret Maynard-Reid)
● ML GDE (Google Developer Expert)
● GDG Seattle organizer
● 3D artist
● Fashion designer
● Instructor at UW
margaretmz.art
0. Intro

1. AI Foundations
1. Why Google
2. AI/ML framework on Google Cloud
3. Google Cloud infrastructure
4. Data and AI products
5. ML model categories
6. BigQuery ML
Lab: Predicting Visitor Purchases with BigQuery ML

2. AI Development Options

3. AI Development Workflow
1. How a machine learns
2. ML workflow
3. Data preparation
4. Model development
5. Model serving
6. MLOps and workflow automation
Lab: Predicting Loan Risk with AutoML
4. Generative AI
What is it? AI that generates content (which can be multimodal) with generative models.

Generative AI on Google Cloud: your options
Model Garden
● Foundation models
● Task-specific models
● Fine-tunable models
Gen AI Studio
● Language
● Vision
● Speech

LLMs are a very important part of Generative AI, but Generative AI ≠ LLMs.
Generative AI: upcoming events
Register here for worldwide events:
● 2024 Duet AI Roadshow - Google Cloud team
Mark your calendar!
● 4/20/2024 Build with AI - GDG Surrey
● 5/4/2024 Build with AI - GDG Seattle
ML Problem Framing, Model Evaluation, Fairness
ML Problem Framing and the ML Journey
Understand the problem
● State the goal for the product you are developing or refactoring.
● Determine how the goal can best be solved: predictive ML, generative AI, or a non-ML solution.
● Verify you have the data required to train a model if you're using a predictive ML approach.
Stating the Goal
● What am I trying to accomplish?
● Example: a weather app's goal might be "calculate precipitation every 6 hours in Surrey."
ML Approach
ML systems can be divided into two broad categories:
● Predictive ML: makes a prediction
● Generative AI: generates output based on the user's intent
Do we have the required data?
● Relevant
● Comprehensive
● Diverse
Framing a Machine Learning Problem
● ML Problem:
○ What is being predicted?
○ What data is needed?
● Software Problem:
○ How will users access predictions from the model?
○ Who will use this service?
○ How are they doing it today?
● Data Problem:
○ How will you collect the data?
○ What analysis needs to be done?
○ How will you react to changes in trends?
Framing a Machine Learning Problem
❑ Define the ideal outcome and the model's goal.
❑ Define the model's output type.
❑ Define the success metrics.
Let’s start with an example
🚴 You’re building a model to predict bike rental duration.
📊 The model’s mean absolute error (MAE) is 1,200 seconds. Great!
🤔 But is that good or bad?
Pitfall: jumping into development without a prototype or heuristic baseline
A machine learning project is an iterative process. You should start with a simple model and continue to refine it until you've reached your goal. A quick prototype can tell you a lot about hidden requirements, implementation challenges, scope, etc.
Heuristic benchmark = simple point of comparison
● Good benchmarks:
○ Constant
○ Rule of thumb
○ Mean / median / mode
○ Human experts
● Not necessarily determined by ML: comparing to a linear regression model isn’t always best
Returning to our bike example
● In our training dataset, what is the average rental duration given the station name and whether or not it is a peak commute hour?
● How does our model performance compare to this benchmark?
Get to Know your Data: Data Split
Get to Know your Data: Improve Data Quality
Get to Know your Data: EDA
Machine Learning in Practice
In regression problems, the goal is to use mathematical functions of different combinations of our features to predict the continuous value of our label. In classification problems, instead of trying to predict a continuous variable, we are trying to create a decision boundary that separates the different classes.
Deep Learning
Responsible AI
Fairness, Explainability, Privacy, Security
Facets (Google’s open-source tool for visualizing and exploring ML datasets)
Unfair vs. naturally occurring bias
Bias is often, but not always, a bad thing.
Unfair bias occurs when data doesn’t accurately reflect the population who will be using a model. The Fairness pattern details how to account for this.
Naturally occurring bias refers to scenarios where data is inherently imbalanced and can’t be improved through further data collection.
🐱 🐱 🐱 🐱 🐶 💳 💳 💳 💳 ⚠️
Most real-world datasets aren’t perfectly balanced
● Anomaly detection
● Multi-class classification
● Predicting occurrence of a rare event
3 techniques for handling imbalanced data
1. Downsampling
2. Upsampling
3. Weighted classes
Downsampling: discard a random sample of the majority class.
Upsampling: generate new data for the minority class.
Weighted classes: weight the minority class more heavily in the loss ("pay more attention to me!").
Choosing the right evaluation metrics
An accuracy of 93% can describe the same model whether or not it handles the minority class well, so accuracy alone is not a reliable metric on imbalanced data.
ROC AUC vs. PR AUC: the PR curve focuses on the minority class, whereas the ROC curve covers both classes.
Sample Questions Review
You work for a maintenance company and have built and trained a deep learning model that
identifies defects based on thermal images of underground electric cables. Your dataset contains
10,000 images, 100 of which contain visible defects. How should you evaluate the performance of
the model on a test dataset?
A. Calculate the Area Under the Curve (AUC) value.
B. Calculate the number of true positive results predicted by the model.
C. Calculate the fraction of images predicted by the model to have a visible defect.
D. Calculate the Cosine Similarity to compare the model’s performance on the test dataset to the
model’s performance on the training dataset.
You are an ML engineer at a media company. You need to build an ML model to
analyze video content frame-by-frame, identify objects, and alert users if there is
inappropriate content. Which Google Cloud products should you use to build this
project?
A. Pub/Sub, Cloud Function, Cloud Vision API
B. Pub/Sub, Cloud IoT, Dataflow, Cloud Vision API, Cloud Logging
C. Pub/Sub, Cloud Function, Video Intelligence API, Cloud Logging
D. Pub/Sub, Cloud Function, AutoML Video Intelligence, Cloud Logging
A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and
40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages:
10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive
prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"). Choose the
correct answers:
a) The 10,000 messages sent by adults
are a class-imbalanced dataset.
b) The model fails to classify
approximately 50% of minors' sarcastic
messages as "sarcastic."
c) The 10,000 messages sent by minors
are a class-imbalanced dataset.
d) Overall, the model performs better on
examples from adults than on examples
from minors.
Q&A
Preview actions for next week
Before our next session (#2, on 3/2/2024), try to complete these courses:
04 TensorFlow on Google Cloud
05 Launching into Machine Learning
As you complete each course, you will get badges on Cloud Skills Boost.
Redeem your participation badge
Thank you for joining the event
Thank you for participating!
For any operational questions about access to Cloud Skills Boost or the Road to Google Developers Certification program, contact: gdg-support@google.com
