Session 1
Professional Machine Learning Engineer
Sowndarya Venkateswaran
Margaret Maynard-Reid
1. Where are we on our journey
2. Session 1 Content Review
3. Sample Question Review
4. Q&A
5. Preview actions for next week

Where are we on our journey
Professional Machine Learning Certification Learning Journey
Organized by Google Developer Group (GDG) Surrey, co-hosted with GDG Seattle

Session schedule (all virtual):
● Session 1 (Feb 24, 2024): Lightning talk + Kick-off & Machine Learning Basics + Q&A
● Session 2 (Mar 2, 2024): Lightning talk + GCP TensorFlow & Feature Engineering + Q&A
● Session 3 (Mar 9, 2024): Lightning talk + Enterprise Machine Learning + Q&A
● Session 4 (Mar 16, 2024): Lightning talk + Production Machine Learning with Google Cloud + Q&A
● Session 5 (Mar 23, 2024): Lightning talk + NLP & Recommendation Systems on GCP + Q&A
● Session 6 (Apr 6, 2024): Lightning talk + MLOps & ML Pipelines on GCP + Q&A
● After Session 6: self study (and potential exam)

Study track alongside the sessions:
● Review the Professional ML Engineer Exam Guide and Sample Questions
● Go through: Google Cloud Platform Big Data and Machine Learning Fundamentals
● Hands-on lab practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge, 7 hrs); Build and Deploy ML Solutions on Vertex AI (Skill Badge, 8 hrs)
● Complete courses: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning; TensorFlow on Google Cloud; Feature Engineering; Machine Learning in the Enterprise; Natural Language Processing on Google Cloud; Recommendation Systems on GCP; MLOps - Getting Started; ML Pipelines on Google Cloud
● Hands-on lab practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud
● Check readiness: Professional ML Engineer Sample Questions
Session 1 Content Review
Session 1 Topics

Intro to AI and ML on Google Cloud
- Intro
- AI Foundations
- AI Development Options
- AI Development Workflow
- Generative AI

ML Problem Framing
- Translate business challenge into ML use case.
- Define ML problem.
- Define business success criteria.
- Identify risks to feasibility and implementation of ML solution.
- Envision future solution improvements.

MLE Learning Path - Google Cloud Skills Boost
Intro to AI and ML on Google Cloud

About me (Margaret Maynard-Reid)
● ML GDE (Google Developer Expert)
● GDG Seattle organizer
● 3D artist
● Fashion designer
● Instructor at UW
margaretmz.art
0. Intro

1. AI Foundations
1. Why Google
2. AI/ML framework on Google Cloud
3. Google Cloud infrastructure
4. Data and AI products
5. ML model categories
6. BigQuery ML
Lab: Predicting Visitor Purchases with BigQuery ML

2. AI Development Options

3. AI Development Workflow
1. How a machine learns
2. ML workflow
3. Data preparation
4. Model development
5. Model serving
6. MLOps and workflow automation
Lab: Predicting Loan Risk with AutoML
4. Generative AI
What is it? AI that generates content (which can be multimodal) with generative models.

Generative AI on Google Cloud: your options
Model Garden
● Foundation models
● Task-specific models
● Fine-tunable models
Gen AI Studio
● Language
● Vision
● Speech

LLMs are a very important part of Generative AI, but Generative AI ≠ LLMs.
Generative AI: upcoming events
Register here for worldwide events:
● 2024 Duet AI Roadshow - Google Cloud team
Mark your calendar!
● 4/20/2024 Build with AI - GDG Surrey
● 5/4/2024 Build with AI - GDG Seattle
ML Problem Framing, Model Evaluation, Fairness
ML Problem Framing and the ML Journey
Understand the problem
● State the goal for the product you are developing or refactoring.
● Determine how the goal can best be solved: predictive ML, generative AI, or a non-ML solution.
● Verify you have the data required to train a model if you're using a predictive ML approach.
Stating the Goal
● What am I trying to accomplish?
● Example: a weather app's goal might be "calculate precipitation every 6 hours in Surrey."
ML Approach
ML systems can be divided into two broad categories:
● Predictive ML: makes a prediction
● Generative AI: generates output based on the user's intent
Do we have the required data?
● Relevant
● Comprehensive
● Diverse
Framing a Machine Learning Problem
● ML Problem:
○ What is being predicted?
○ What data is needed?
● Software Problem:
○ How will users access predictions from the model?
○ Who will use this service?
○ How are they doing it today?
● Data Problem:
○ How will you collect the data?
○ What analysis needs to be done?
○ How will you react to changes in trends?
Framing a Machine Learning Problem
❑ Define the ideal outcome and the model's goal.
❑ Define the model's output type.
❑ Define the success metrics.
Let’s start with an example
🚴 You’re building a model to predict bike rental duration.
📊 The model’s mean absolute error (MAE) is 1,200 seconds. Great!
🤔 But is that good or bad?
Pitfall: jumping into development without a prototype or heuristic baseline
A machine learning project is an iterative process. You should start with a simple model and continue to refine it until you've reached your goal. A quick prototype can tell you a lot about hidden requirements, implementation challenges, scope, etc.
Heuristic benchmark = simple point of comparison
● Good benchmarks:
○ Constant
○ Rule of thumb
○ Mean / median / mode
○ Human experts
● Not necessarily determined by ML: comparing to a linear regression model isn’t always best
Returning to our bike example
● In our training dataset, what is the average rental duration given the station name and whether or not it is a peak commute hour?
● How does our model performance compare to this benchmark?
Get to Know your Data: Data Split
Get to Know your Data: Improve Data Quality
Get to Know your Data: EDA
Machine Learning in Practice
In regression problems, the goal is to use mathematical functions of different combinations of our features to predict the continuous value of our label. In classification problems, instead of trying to predict a continuous variable, we are trying to create a decision boundary that separates the different classes.
Deep Learning
Responsible AI
Fairness, Explainability, Privacy, Security
Facets (Google’s open-source tool for visualizing and exploring ML datasets)
Unfair vs. naturally occurring bias
Bias is often, but not always, a bad thing.
Unfair bias occurs when data doesn’t accurately reflect the population who will be using a model. The Fairness pattern details how to account for this.
Naturally occurring bias refers to scenarios where data is inherently imbalanced and can’t be improved through further data collection.
🐱 🐱 🐱 🐱 🐶 💳 💳 💳 💳 ⚠️
Most real-world datasets aren’t perfectly balanced
● Anomaly detection
● Multi-class classification
● Predicting occurrence of a rare event
3 techniques for handling imbalanced data
1. Downsampling
2. Upsampling
3. Weighted classes
Downsampling: discard a random sample of the majority class.
Upsampling: generate new data for the minority class.
Weighted classes: weight the minority class more heavily in the loss ("pay more attention to me!").
Choosing the right evaluation metrics
An accuracy of 93% can describe the same model whether or not it handles the minority class well, so accuracy alone is not a reliable metric on imbalanced data.
ROC AUC vs. PR AUC: the PR curve focuses on the minority class, whereas the ROC curve covers both classes.
Sample Questions Review
You work for a maintenance company and have built and trained a deep learning model that
identifies defects based on thermal images of underground electric cables. Your dataset contains
10,000 images, 100 of which contain visible defects. How should you evaluate the performance of
the model on a test dataset?
A. Calculate the Area Under the Curve (AUC) value.
B. Calculate the number of true positive results predicted by the model.
C. Calculate the fraction of images predicted by the model to have a visible defect.
D. Calculate the Cosine Similarity to compare the model’s performance on the test dataset to the
model’s performance on the training dataset.
You are an ML engineer at a media company. You need to build an ML model to
analyze video content frame-by-frame, identify objects, and alert users if there is
inappropriate content. Which Google Cloud products should you use to build this
project?
A. Pub/Sub, Cloud Function, Cloud Vision API
B. Pub/Sub, Cloud IoT, Dataflow, Cloud Vision API, Cloud Logging
C. Pub/Sub, Cloud Function, Video Intelligence API, Cloud Logging
D. Pub/Sub, Cloud Function, AutoML Video Intelligence, Cloud Logging
A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and
40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages:
10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive
prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"). Choose the
correct answers:
a) The 10,000 messages sent by adults
are a class-imbalanced dataset.
b) The model fails to classify
approximately 50% of minors' sarcastic
messages as "sarcastic."
c) The 10,000 messages sent by minors
are a class-imbalanced dataset.
d) Overall, the model performs better on
examples from adults than on examples
from minors.
Q&A
Preview actions for next week
Before our next session (#2, on 3/2/2024), try to complete these courses:
04 TensorFlow on Google Cloud
05 Launching into Machine Learning
As you complete each course, you will get badges on Cloud Skills Boost.
Redeem your participation badge
Thank you for joining the event
Thank you for participating!
For any operational questions about access to Cloud Skills Boost or the Road to Google Developers Certification program, contact: gdg-support@google.com
