Hands-on Machine Learning 
David Gerster 
VP Data Science
Agenda 
Part 1: What is “machine learning”? 
Part 2: Finding patterns in an actual data set 
“Machine Learning”: Finding patterns in data 
• Famous “Iris” data set has measurements for 150 flowers 
• Given a flower’s measurements, can we predict its species? 
Iris setosa Iris versicolor Iris virginica 
[Figure: scatter plot of petal width (cm) against petal length (cm). Iris setosa: red dots; Iris versicolor: green dots; Iris virginica: blue dots.]
Congratulations! You just trained a model. 
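Separating the three clusters by eye is roughly what a simple classifier does. The block below is a minimal sketch of that training step, assuming a nearest-centroid approach (an illustrative choice, not necessarily the method used in the talk). The rows are hypothetical measurements made up for this example, and the names `TRAINING_ROWS` and `train` are placeholders.

```python
from collections import defaultdict

# Hypothetical (petal_length_cm, petal_width_cm, species) rows, made up for
# illustration; they are not copied from the real Iris data set.
TRAINING_ROWS = [
    (1.4, 0.2, "setosa"),
    (1.5, 0.3, "setosa"),
    (1.3, 0.2, "setosa"),
    (4.5, 1.4, "versicolor"),
    (4.1, 1.3, "versicolor"),
    (4.7, 1.5, "versicolor"),
    (5.8, 2.2, "virginica"),
    (6.1, 2.0, "virginica"),
    (5.6, 2.1, "virginica"),
]


def train(rows):
    """Return one centroid (mean length, mean width) per species."""
    grouped = defaultdict(list)
    for length, width, species in rows:
        grouped[species].append((length, width))
    return {
        species: (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
        for species, points in grouped.items()
    }


# "Training" here is just summarizing each species by its average flower.
model = train(TRAINING_ROWS)
```

The resulting `model` is a small dictionary with one average point per species, which is all this kind of classifier needs to keep from the training data.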
[Figure: petal width (cm) against petal length (cm), with four predictions marked: Iris virginica, Iris versicolor, Iris setosa, Iris virginica.]
Congratulations! You just scored four previously unseen flowers using your model, and made a prediction about the species of each one.
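Scoring can be sketched in a few lines once a model exists. The example below assumes a nearest-centroid model stored as a dictionary; the centroid values and the four new flowers are hypothetical numbers chosen for illustration, and `MODEL` and `score` are placeholder names.

```python
import math

# Hypothetical trained model: one (petal_length_cm, petal_width_cm) centroid
# per species. The numbers are illustrative, not fitted to the real data.
MODEL = {
    "setosa": (1.4, 0.2),
    "versicolor": (4.4, 1.4),
    "virginica": (5.8, 2.1),
}


def score(model, petal_length, petal_width):
    """Predict the species whose centroid is closest to the new flower."""
    return min(
        model,
        key=lambda species: math.dist(model[species], (petal_length, petal_width)),
    )


# Four previously unseen flowers (made-up measurements).
new_flowers = [(1.6, 0.3), (4.2, 1.2), (6.0, 2.3), (5.5, 1.9)]
predictions = [score(MODEL, length, width) for length, width in new_flowers]
print(predictions)
```

Each call to `score` only compares the new flower against three stored points, so a prediction is cheap no matter how much data went into training.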
• Data is just a table of values 
• Each row is an “instance”, an 
example of the concept to be 
learned 
• Each column is an “attribute” or 
“feature” of the instance 
• The column we want to predict is 
the “label” 
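The vocabulary above maps directly onto a small table in code. This is a minimal sketch with hypothetical rows (made up for illustration, not real Iris measurements); the column names and the helper `split_features_and_labels` are assumptions for this example.

```python
# Column names: two features and one label (names chosen for this sketch).
FEATURES = ["petal_length_cm", "petal_width_cm"]
LABEL = "species"

# Each dictionary is one row, i.e. one "instance". Values are hypothetical.
table = [
    {"petal_length_cm": 1.4, "petal_width_cm": 0.2, "species": "setosa"},
    {"petal_length_cm": 4.5, "petal_width_cm": 1.4, "species": "versicolor"},
    {"petal_length_cm": 5.8, "petal_width_cm": 2.2, "species": "virginica"},
]


def split_features_and_labels(rows, feature_names, label_name):
    """Separate the feature columns from the column we want to predict."""
    features = [[row[name] for name in feature_names] for row in rows]
    labels = [row[label_name] for row in rows]
    return features, labels


features, labels = split_features_and_labels(table, FEATURES, LABEL)
```

After the split, `features` holds one list of measurements per instance and `labels` holds the matching species, which is the shape most training code expects.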
Try out the Iris data set at
That was easy! … So What? 
Training versus Scoring 
• This process had two steps: training and scoring 
• When training on historical data, you’re often looking for patterns 
that emerge over weeks, months or even years 
• When scoring new data points, you want the answer immediately 
(in “real time”) 
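The split between the two steps can be sketched as a slow batch job that produces a small saved model, and a fast function that uses it. This is a hypothetical one-threshold model on made-up numbers, with JSON chosen here only as a convenient way to persist it; `train`, `score` and `history` are placeholder names.

```python
import json


def train(history):
    """Batch step: learn a single threshold from labeled historical rows.

    `history` is a list of (value, label) pairs with labels 0 or 1.
    The threshold is the midpoint between the two class means, which
    assumes label 1 rows tend to have larger values than label 0 rows.
    """
    zeros = [value for value, label in history if label == 0]
    ones = [value for value, label in history if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return {"threshold": threshold}


def score(model, value):
    """Real-time step: one comparison per new data point."""
    return 1 if value > model["threshold"] else 0


# Train offline on (hypothetical) history, then persist the model.
history = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
saved = json.dumps(train(history))

# Later, inside an app: load the saved model and score immediately.
model = json.loads(saved)
answers = [score(model, value) for value in (2.5, 6.0)]
```

Training reads every historical row, while scoring touches only the saved threshold, which is why the two steps can run on very different schedules.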
Do you really need to train in “real time”? 
• Many real-world cases rely heavily on historical data 
• Credit scores, fraud detection, movie ratings, web search relevance, disease 
diagnosis, customer churn, yield on a silicon wafer … 
• Extreme example: text recognition! 
• You might add fresh training data daily or hourly, but you will still 
have lots of historical data in the training set. 
• You definitely want to score in real time, because you’re typically 
using this model in some sort of app 
What “Real Time” Really Means 
• The next time you hear someone talk about “real time” 
machine learning, make yourself look really smart and ask if 
they mean training or scoring 
“What do you mean, real time training or real time scoring?”
“What? I don’t …”
The StumbleUpon Dataset 
• StumbleUpon is an app that recommends web pages 
• Dataset of 7,400 web pages is provided, with each page labeled as 
either “evergreen” or “ephemeral” 
• We want to predict the page’s class using this historical data 
While some pages we recommend, such as news 
articles or seasonal recipes, are only relevant for a 
short period of time, others maintain a timeless 
quality and can be recommended to users long after 
they are discovered. In other words, pages can 
either be classified as "ephemeral" or "evergreen".
Training a model on StumbleUpon data 
• Live demo: training a model on StumbleUpon data 
• Key concepts: 
• “Bag of words” text analysis 
• Evaluating the model using a holdout set 
• Combining multiple models to improve accuracy 
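The first two key concepts can be sketched without any libraries. The example below is an assumption-heavy illustration, not the live demo: the page texts are invented, the word-scoring model is a deliberately simple stand-in for a real classifier, and the holdout set is just the last quarter of the rows (a real split would normally shuffle first). Combining multiple models is not shown.

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring word order and letter case."""
    return Counter(text.lower().split())


def holdout_split(rows, holdout_fraction=0.25):
    """Keep the last fraction of rows aside for evaluation only."""
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    """Score each word: +1 per use on evergreen pages, -1 on ephemeral ones."""
    scores = Counter()
    for text, label in rows:
        sign = 1 if label == "evergreen" else -1
        for word, count in bag_of_words(text).items():
            scores[word] += sign * count
    return scores


def predict(scores, text):
    """Sum the word scores of a page; a positive total means evergreen."""
    total = sum(scores[word] * count for word, count in bag_of_words(text).items())
    return "evergreen" if total > 0 else "ephemeral"


# Hypothetical page texts and labels, invented for this sketch.
PAGES = [
    ("classic chocolate cake recipe", "evergreen"),
    ("how to tie a tie", "evergreen"),
    ("election results tonight", "ephemeral"),
    ("breaking news storm tonight", "ephemeral"),
    ("easy bread recipe", "evergreen"),
    ("news election update", "ephemeral"),
    ("simple cake recipe", "evergreen"),
    ("storm news update", "ephemeral"),
]

training_rows, holdout_rows = holdout_split(PAGES)
word_scores = train(training_rows)

# Evaluate only on pages the model never saw during training.
hits = sum(predict(word_scores, text) == label for text, label in holdout_rows)
holdout_accuracy = hits / len(holdout_rows)
```

Measuring accuracy on the held-out rows rather than the training rows gives a fairer picture of how the model might behave on new pages.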
Final Thought 
• The two datasets we trained on were not “big” 
• Iris dataset: 150 rows, less than 5 KB 
• StumbleUpon dataset: 7,400 rows, 21 MB 
• Data doesn’t need to be big to be useful 
