David Gerster: Hands on Machine Learning


David Gerster

An overview of predictive modeling, including an intuitive explanation of the concepts of "training" and "scoring".


Hands-on Machine Learning
David Gerster
VP Data Science

Agenda
Part 1: What is “machine learning”?
Part 2: Finding patterns in an actual data set

“Machine Learning”: Finding patterns in data
• Famous “Iris” data set has measurements for 150 flowers
• Given a flower’s measurements, can we predict its species?
[Photos: Iris setosa, Iris versicolor, Iris virginica]

[Figure: scatter plot of Petal Width (cm) vs. Petal Length (cm); Iris setosa shown as red dots, Iris versicolor as green dots, Iris virginica as blue dots]

[Figure: scatter plot of Petal Width (cm) vs. Petal Length (cm)]
Congratulations! You just trained a model.
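One way to make "training" concrete is a nearest-centroid model, where training simply computes the average petal length and width for each species. The sketch below is only an illustration: the slides do not say which algorithm was used, the function name `train_centroids` is hypothetical, and the nine rows are assumed example values in the typical range for each species, not the full 150-row data set.

```python
from collections import defaultdict

# Illustrative (petal_length_cm, petal_width_cm, species) rows.
# These are assumed example values, not the full 150-row Iris data set.
TRAINING_ROWS = [
    (1.4, 0.2, "Iris setosa"),
    (1.5, 0.2, "Iris setosa"),
    (1.3, 0.3, "Iris setosa"),
    (4.7, 1.4, "Iris versicolor"),
    (4.5, 1.5, "Iris versicolor"),
    (4.0, 1.3, "Iris versicolor"),
    (6.0, 2.5, "Iris virginica"),
    (5.1, 1.9, "Iris virginica"),
    (5.9, 2.1, "Iris virginica"),
]


def train_centroids(rows):
    """Training: compute the mean (length, width) point for each species."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for length, width, species in rows:
        acc = sums[species]
        acc[0] += length
        acc[1] += width
        acc[2] += 1
    return {sp: (s[0] / s[2], s[1] / s[2]) for sp, s in sums.items()}


model = train_centroids(TRAINING_ROWS)
for species, (length, width) in sorted(model.items()):
    print(f"{species}: mean petal length {length:.2f} cm, width {width:.2f} cm")
```

The resulting `model` is just three points, one per species. That is the "pattern" the training step found in the table.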

[Figure: scatter plot of Petal Width (cm) vs. Petal Length (cm), with four new flowers labeled Prediction: Iris virginica, Prediction: Iris versicolor, Prediction: Iris setosa, and Prediction: Iris virginica]

[Figure: scatter plot of Petal Width (cm) vs. Petal Length (cm), with four new flowers labeled Prediction: Iris virginica, Prediction: Iris versicolor, Prediction: Iris setosa, and Prediction: Iris virginica]
Congratulations! You just scored four previously unseen flowers using your model, and made a prediction about the species of each one.
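Scoring means applying an already-trained model to measurements it has not seen before. The sketch below assumes the model is nothing more than two boundary lines on the plot. The threshold values (2.5 cm and 1.7 cm) and the four flowers are assumptions chosen for illustration and do not come from the slides.

```python
# Hypothetical boundaries "learned" during training (assumed values).
SETOSA_MAX_PETAL_LENGTH_CM = 2.5
VERSICOLOR_MAX_PETAL_WIDTH_CM = 1.7


def score(petal_length_cm, petal_width_cm):
    """Scoring: apply the fixed boundaries to one new flower."""
    if petal_length_cm < SETOSA_MAX_PETAL_LENGTH_CM:
        return "Iris setosa"
    if petal_width_cm < VERSICOLOR_MAX_PETAL_WIDTH_CM:
        return "Iris versicolor"
    return "Iris virginica"


# Four made-up, previously unseen flowers: (petal length, petal width) in cm.
new_flowers = [(5.8, 2.2), (4.3, 1.3), (1.5, 0.3), (6.1, 2.0)]
predictions = [score(length, width) for length, width in new_flowers]
for flower, prediction in zip(new_flowers, predictions):
    print(flower, "-> Prediction:", prediction)
```

Note that `score` never looks at the training rows. Once the boundaries exist, each prediction is a couple of comparisons.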

• Data is just a table of values
• Each row is an “instance”, an example of the concept to be learned
• Each column is an “attribute” or “feature” of the instance
• The column we want to predict is the “label”
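The vocabulary above maps directly onto code. The sketch below reads a tiny table and separates each instance into its features and its label. The column names and the three rows are illustrative assumptions, not the real data set.

```python
import csv
import io

# A tiny table in CSV form; the values are illustrative, not the real data set.
TABLE = """petal_length_cm,petal_width_cm,species
1.4,0.2,Iris setosa
4.7,1.4,Iris versicolor
6.0,2.5,Iris virginica
"""

LABEL_COLUMN = "species"  # the column we want to predict

rows = list(csv.DictReader(io.StringIO(TABLE)))

# Each row is one instance; split it into its features and its label.
features = [
    {name: float(value) for name, value in row.items() if name != LABEL_COLUMN}
    for row in rows
]
labels = [row[LABEL_COLUMN] for row in rows]

for x, y in zip(features, labels):
    print(x, "->", y)
```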

Training versus Scoring
• This process had two steps: training and scoring
• When training on historical data, you’re often looking for patterns
that emerge over weeks, months or even years
• When scoring new data points, you want the answer immediately
(in “real time”)
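The two steps can live in separate programs. The sketch below is one possible arrangement, not something shown in the talk: a batch job trains on historical rows and saves the model as JSON, and an app loads that saved model and scores each new data point. The function names, the nearest-centroid model, and the sample rows are all assumptions made for illustration.

```python
import json
import math


def train(history):
    """Batch step: run occasionally over all historical rows (may be slow)."""
    totals = {}
    for length, width, label in history:
        s = totals.setdefault(label, [0.0, 0.0, 0])
        s[0] += length
        s[1] += width
        s[2] += 1
    return {label: [s[0] / s[2], s[1] / s[2]] for label, s in totals.items()}


def score(model, length, width):
    """Real-time step: return the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], (length, width)))


# Offline: train on (illustrative) historical data and persist the model.
history = [
    (1.4, 0.2, "setosa"),
    (1.5, 0.2, "setosa"),
    (4.7, 1.4, "versicolor"),
    (4.5, 1.5, "versicolor"),
]
saved_model = json.dumps(train(history))

# Online: the app loads the saved model once and scores each request.
live_model = json.loads(saved_model)
print(score(live_model, 1.6, 0.3))
```

Because the saved model is tiny compared with the history it was trained on, scoring stays fast no matter how long training took.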

Do you really need to train in “real time”?
• Many real-world cases rely heavily on historical data
• Credit scores, fraud detection, movie ratings, web search relevance, disease
diagnosis, customer churn, yield on a silicon wafer …
• Extreme example: text recognition!
• You might add fresh training data daily or hourly, but you will still
have lots of historical data in the training set.
• You definitely want to score in real time, because you’re typically
using this model in some sort of app
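A rough sketch of that arrangement: a periodic batch job adds fresh rows to the historical set and refits, while the app keeps scoring against whatever model is current. The class name `ThresholdModelService`, the single numeric feature, and the data are hypothetical. The sketch also assumes label 1 tends to have larger values than label 0.

```python
from statistics import mean


class ThresholdModelService:
    """Hypothetical service: retrains in batches, scores on demand."""

    def __init__(self, history):
        self.history = list(history)  # (value, label) pairs, label is 0 or 1
        self.threshold = None
        self.retrain([])

    def retrain(self, fresh_rows):
        """Batch job (daily or hourly): add fresh rows, refit on everything."""
        self.history.extend(fresh_rows)
        low = mean(v for v, label in self.history if label == 0)
        high = mean(v for v, label in self.history if label == 1)
        # Assumed model: the midpoint between the two class means.
        self.threshold = (low + high) / 2

    def score(self, value):
        """Real-time call: a single comparison against the current threshold."""
        return 1 if value > self.threshold else 0


service = ThresholdModelService([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
before = service.score(5.5)

# Fresh rows arrive; the historical rows still make up most of the training set.
service.retrain([(3.0, 0), (12.0, 1)])
after = service.score(5.5)
print("before retraining:", before, "after retraining:", after)
```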

What “Real Time” Really Means
• The next time you hear someone talk about “real time”
machine learning, make yourself look really smart and ask if
they mean training or scoring

[Cartoon]
“What do you mean, real time training or real time scoring?”
“What? I don’t …”

The StumbleUpon Dataset
• StumbleUpon is an app that recommends web pages
• Dataset of 7,400 web pages is provided, with each page labeled as
either “evergreen” or “ephemeral”
• We want to predict the page’s class using this historical data
While some pages we recommend, such as news
articles or seasonal recipes, are only relevant for a
short period of time, others maintain a timeless
quality and can be recommended to users long after
they are discovered. In other words, pages can
either be classified as "ephemeral" or "evergreen".

Training a model on StumbleUpon data
• Live demo: training a model on StumbleUpon data
• Key concepts:
• “Bag of words” text analysis
• Evaluating the model using a holdout set
• Combining multiple models to improve accuracy
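The three key concepts can be sketched in a few lines of plain Python. This is not the code from the live demo, which the slides do not include. The example pages are made up, the word-overlap model is a deliberately simple stand-in, and majority voting over models trained on random resamples is only one assumed way of combining models.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Represent a page as word counts, ignoring word order."""
    return Counter(text.lower().split())


def train(rows):
    """Sum the word counts seen for each label in the training rows."""
    model = {}
    for text, label in rows:
        model.setdefault(label, Counter()).update(bag_of_words(text))
    return model


def predict(model, text):
    """Pick the label whose training words overlap most with this page."""
    bag = bag_of_words(text)

    def overlap(label):
        return sum(model[label][word] * n for word, n in bag.items())

    return max(sorted(model), key=overlap)


def holdout_split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle the rows, then set aside a fraction that is never trained on."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict_fn, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict_fn(text) == label for text, label in rows) / len(rows)


def majority_vote(predict_fns, text):
    """Combine several models by returning the most common prediction."""
    votes = Counter(fn(text) for fn in predict_fns)
    return votes.most_common(1)[0][0]


# Made-up pages for illustration; the real dataset has about 7,400 rows.
pages = [
    ("classic chocolate cake recipe", "evergreen"),
    ("how to tie a tie", "evergreen"),
    ("beginner guide to growing tomatoes", "evergreen"),
    ("how to change a bike tire", "evergreen"),
    ("election results news tonight", "ephemeral"),
    ("breaking news storm warning today", "ephemeral"),
    ("sports scores from tonight", "ephemeral"),
    ("today only holiday sale news", "ephemeral"),
]
train_rows, holdout_rows = holdout_split(pages)

# Train three models on different random resamples of the training rows.
rng = random.Random(1)
models = [train(rng.choices(train_rows, k=len(train_rows))) for _ in range(3)]
predict_fns = [lambda text, m=m: predict(m, text) for m in models]


def ensemble(text):
    return majority_vote(predict_fns, text)


print("holdout accuracy:", accuracy(ensemble, holdout_rows))
```

With only eight made-up pages the accuracy figure means little. The point is the workflow: the holdout rows are never seen during training, so they give an honest estimate of how the model handles new pages.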

Final Thought
• The two datasets we trained on were not “big”
• Iris dataset: 150 rows, less than 5 KB
• StumbleUpon dataset: 7,400 rows, 21 MB
• Data doesn’t need to be big to be useful
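To put those sizes in perspective, a quick back-of-the-envelope calculation gives the average size per row. Reading "5K" as 5,000 bytes and "21MB" as 21,000,000 bytes is an assumption, and the Iris figure is an upper bound because the slide only says "less than".

```python
# Sizes quoted on the slide; the byte interpretations are assumptions.
datasets = {
    "Iris": {"rows": 150, "bytes": 5_000},  # upper bound ("less than 5K")
    "StumbleUpon": {"rows": 7_400, "bytes": 21_000_000},
}

bytes_per_row = {
    name: info["bytes"] / info["rows"] for name, info in datasets.items()
}
for name, value in bytes_per_row.items():
    print(f"{name}: about {value:,.0f} bytes per row")
```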
