8. Transformation
Transform relational data into vectors
All algos need: matrices of numbers
Some need:
• 0.0 ≤ x ≤ 1.0
• mean = 0, σ = 1
Look out for algos requiring "normalized" or
"standardized" values → feature scaling
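Both scalings fit in a few lines of NumPy; the data here is made up for illustration:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Min-max scaling: every feature ends up in [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: every feature ends up with mean = 0 and σ = 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```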
9. Categories
• Features with no numerical relation
• Category 5 doesn’t have 5x the y of category 1
• Fix: Dummy variables
• cat_1, cat_2, … cat_5 with values 0 or 1
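One way to build the dummy variables is pandas' `get_dummies` (the category values here are invented):

```python
import pandas as pd

df = pd.DataFrame({"category": [1, 5, 2, 5]})

# One 0/1 column per category — no fake numerical ordering
dummies = pd.get_dummies(df["category"], prefix="cat", dtype=int)
# columns: cat_1, cat_2, cat_5
```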
10. Missing Values
• days_since_last_purchase = null
How to deal with this?
0 or 999?
• Often intuitively clear from the data domain
One solution:
max(days_since_last_purchase of other users)
• HAS to be addressed
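That fill strategy, sketched with pandas (the values are invented):

```python
import pandas as pd

df = pd.DataFrame({"days_since_last_purchase": [3, 17, None, 42]})

# Fill missing values with the maximum observed among other users
fill = df["days_since_last_purchase"].max()
df["days_since_last_purchase"] = df["days_since_last_purchase"].fillna(fill)
```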
11. Outliers
• days_since_last_purchase = 2837
for a legacy customer
• If it’s irrelevant, get rid of the whole example
(legacy customer)
• Or cap at a max/min value
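Capping is a one-liner with pandas; the 365-day cap is an arbitrary choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({"days_since_last_purchase": [3, 17, 42, 2837]})

# Cap the outlier at a chosen maximum instead of dropping the example
df["days_since_last_purchase"] = df["days_since_last_purchase"].clip(upper=365)
```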
12. Reduce Features
• Check for correlation between features;
drop one of each highly correlated pair
• Drop intuitively useless features
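A quick correlation check with pandas (the features are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160, 170, 180, 190],
    "height_in": [63, 67, 71, 75],   # essentially a copy of height_cm
    "age":       [25, 40, 31, 58],
})

# Pairwise correlation matrix; values near ±1 flag redundant features
corr = df.corr()
redundant = bool(corr.loc["height_cm", "height_in"] > 0.95)
```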
13. A Better Model
• Uses fewer features, i.e. is simpler
• Trained on more training examples
16. Online vs Offline
OFFLINE
Retrain the whole model from time to time
and upload it
ONLINE
Algorithm runs each time a new example is added
and adapts the model a bit
Examples should be fed in randomized order
18. Build Model
• Collect data
Traffic source, categories looked at prior to signup,
etc. and y = category of purchase after signup
• Analyze
Try to make predictions using e.g. logistic
regression
• Train final model
• Save weights to DB or JSON or file
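The train-and-save steps, sketched with scikit-learn and toy stand-in data (real features would be traffic source, categories viewed, etc.):

```python
import json
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the collected features and purchase labels
X = np.array([[0, 1], [1, 0], [1, 1], [0, 0]], dtype=float)
y = np.array([1, 0, 1, 0])  # y = category purchased after signup

model = LogisticRegression().fit(X, y)

# Save the learned weights to JSON
weights = {"coef": model.coef_.tolist(), "intercept": model.intercept_.tolist()}
with open("weights.json", "w") as f:
    json.dump(weights, f)
```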
19. Predict
• User signs up
• Load weights and predict probabilities of
categories.
• If P(category X) > threshold
classify user as "interested in category X"
• Send out newsletters
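The prediction side can be sketched by applying the logistic function to the saved weights; the weight values, feature vector, and threshold below are all invented for illustration:

```python
import numpy as np

# Assume these weights were saved at training time (illustrative values)
weights = {"coef": [[1.2, -0.4]], "intercept": [0.3]}

coef = np.array(weights["coef"][0])
intercept = weights["intercept"][0]

def predict_proba(x):
    # Logistic function over the saved weights
    return 1.0 / (1.0 + np.exp(-(x @ coef + intercept)))

THRESHOLD = 0.7
user = np.array([1.0, 1.0])  # feature vector of the fresh signup (made up)
interested = predict_proba(user) > THRESHOLD  # → send the newsletter if True
```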
20. Tips
• Use R or Python/Jupyter/Pandas to analyze data
• Test if you need a separate system for predictions
or just for training
• Try not to implement algos yourself
If you do, use numerical computation libraries
(probably wrappers for C or Fortran code)
• Be sure the past predicts the future
21. Ethics
• Your model might turn into a racist, sexist profiler
• Be aware of what your input features mean &
what you actually base your predictions on
• Relatively harmless when predicting product
categories - questionable for credit ratings