PRODUCTIONISING MACHINE LEARNING MODELS
Tash Bickley – FTS Data & AI
linkedin.com/in/tashbickley/
Tash Bickley
Principal Consultant, FTS Data & AI
https://www.linkedin.com/in/tashbickley/
About Me
Background:
- Database administration
- Data engineering (ETL)
- Business intelligence
- Statistics and machine learning
- Data analytics architecture advisory
Productionising Machine Learning Models
• Planning for production
• Selecting the optimal architecture for your solution
• Development and Deployment
• Maintaining the quality of the solution and prediction outcomes in
production
Productionising Machine Learning Models
• Around half of all businesses are developing machine learning solutions
• More than 70% of industry leaders believe AI is important1
• Gartner predicts only 25% of machine learning projects are successful
• Capgemini study found 15% of models made it to full-scale production
• Many models in production quickly become liabilities – degrade within
days or months
1. Dresner Report 2019 https://www.forbes.com/sites/louiscolumbus/2019/09/08/state-of-ai-and-machine-learning-in-2019/
Microsoft Chatbot Tay.ai
• Tay posted 96,000 tweets
• < 24 hours to become offensive
• ‘Repeat after me’
• No filter – jokes, irony, malice
• No coherent personality – hybrid
• Who was the audience?
• Tweeted i love feminism and I f*#!ng hate feminists
and about Bruce Jenner (learned, not repeated):
caitlyn jenner is a hero & is a stunning, beautiful woman and
caitlyn jenner isn't a real woman yet she won woman of the year?
How can we avoid a poorly performing AI solution?
• Focus on desired outcomes
• Build a collaborative team with skills for end-to-end requirements
• Data inputs – volume, relevance and quality
• Modelling – feature engineering/training/testing/evaluating/model selection
• Testing (software-style testing, not ML training)
• Model designed and trained to suit platform for inference
• Monitoring platform, data inputs and model targets/outcomes
• DevOps
• Decision Engine – what actions do we want to take?
• Feedback from outcomes to improve model
• End-to-end platform implementation that refreshes the model
• Where are we in the hype cycle? – possibly near the peak?
How can we avoid a poorly performing AI solution?
Outcomes Focus
• What business problem will this solve?
• What outcomes does the business want?
• How will we measure success?
• Is the solution dependable and cost-effective?
• What actions should the model trigger?
• Define non-functional requirements e.g.
infrastructure requirements in production,
up-time, scalability
[Slide images: Ford Motor Company, Amazon.com]
Outcomes Focus
➢ Define a set of metrics for testing the machine learning model
➢ Include those metrics in the model monitoring process
➢ Actions triggered by model predictions also need to be coded, tested,
deployed, maintained and monitored – most likely by a team with a
different technical skillset
➢ How can we manage the risks?
Outcomes Focus – Ensuring Success
• Business sponsorship
• Realistic budget, timeframe
• Business actively involved in defining requirements and testing
solution
• Digitization journey – most success comes where more data is digitized
• Manage change – how will solutions impact the way people do
their jobs and serve customers? How will outcomes be trusted?
• Infrastructure to suit the solution
• The right technical and business teams
• On-going monitoring and refresh
• Sufficient volume of quality Data!
• Does the solution make sense long-term?
Data Science Architecture
• Analytics pipeline – data flows from source through transformation into the model,
which outputs a prediction that is delivered to a consumer or triggers action(s)
• Machine learning prediction models for different purposes:
• regression and classification models
e.g. score, scale, rank, clusters, anomalies
• Purpose influences:
• choice of model/algorithm
• data and features selected
• delivery mechanism(s) for outcomes from model
• monitoring and refresh requirements
➢ Decisions about how solution is architected
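To make the pipeline shape concrete, here is a minimal Python sketch of source → transform → model → prediction → action, assuming a churn-style classification problem; the column names, the 0.5 threshold and the notify() stub are illustrative assumptions, not part of any specific solution.

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

def load_source() -> pd.DataFrame:
    # Stand-in for the data ingestion layer (database extract, API pull, stream window, ...)
    return pd.DataFrame({"tenure": [1, 24, 60],
                         "monthly_spend": [90.0, 40.0, 20.0],
                         "churned": [1, 0, 0]})

def notify(customer_id, score):
    # Stand-in for the decision engine / delivery mechanism (email, alert, workflow trigger)
    print(f"Customer {customer_id}: churn risk {score:.2f} -> trigger retention offer")

df = load_source()
features, target = df[["tenure", "monthly_spend"]], df["churned"]

# Transformation and model packaged as one deployable unit
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression())]).fit(features, target)

# Prediction results are routed to a consumer or action
for customer_id, score in zip(df.index, model.predict_proba(features)[:, 1]):
    if score > 0.5:  # illustrative business threshold
        notify(customer_id, score)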
Architectures are Solution Specific
• Type of deployment
• Batch?
• Real-time?
• Time-series?
• Delivery mechanism
• Email? (batch)
• Report? (batch)
• Notification/Alert? (real-time,
time-series)
• Web Service? (real-time or
batched results)
• Embedded in App? (real-time)
• Online or offline?
• Action to take?
• ML model refresh frequency?
Data Science Architecture – Hidden Technical Debt
ML Modelling is the small black box in the middle
Data Science Architecture – Possible approaches
Four potential ML system architecture approaches:
How to Deploy Machine Learning Models: A Guide – March 2019 Christopher Samiullah
https://christophergs.github.io/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/
Data Ingestion Layer
• What data do we need?
• What data is available?
• Where does the data reside? – on-premise, external (web, sensors, IoT)
• Is the solution batch or near real-time?
• Data quality
• Data pre-processing
• Data cleansing, missing values, integration, schema drift, filtering noise
• Privacy and regulatory requirements
• Security layer
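A minimal pre-processing sketch for the kind of batch ingestion described above, assuming pandas; the column names and cleansing rules (median imputation, range filter, required fields) are illustrative assumptions rather than recommendations.

import pandas as pd

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # Illustrative cleansing rules; real rules come from data profiling and business input
    df = raw.drop_duplicates().copy()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # coerce bad values to NaN
    df["amount"] = df["amount"].fillna(df["amount"].median())    # impute missing values
    df = df[df["amount"].between(0, 100_000)]                    # filter obvious noise/outliers
    df["event_time"] = pd.to_datetime(df["event_time"], errors="coerce")
    return df.dropna(subset=["event_time", "customer_id"])       # drop rows missing required fields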
Data Ingestion Layer
• Ideally make data available for all data scientists and modelling – business can
also use data for reporting and business intelligence.
• In reality:
• Machine learning modelling often needs different data than reporting
• Trying to integrate everything is a large task – if data is not already managed
centrally, focus initially only on the data needed
• Don’t wait for a perfect data strategy
• Have an enterprise-wide approach to data ingestion and integration
➢ Over time, data ingestion and transformation will take less time as more data is
already cleansed and integrated, leading to better-quality data inputs
Data Storage Layer
• What data from model will be stored?
• Input data – training set data (for version control)
• Model configuration at run-time (training and prediction)
• Model outcomes/targets
• Log files – status, audit, errors, exceptions
• Actions taken
• Where will ML modelling data be stored?
E.g. training sets, configurations
• In-memory, near real-time, batch? – different storage pipelines
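One hedged way to capture the items listed above (inputs, run-time configuration, outcomes) is a per-prediction record; the schema below is an assumption for illustration, and the storage target (table, object store or log stream) is a separate architectural choice.

import json, uuid
from datetime import datetime, timezone

def prediction_record(model_name, model_version, config, features, prediction):
    # Illustrative audit record combining model configuration, inputs and outputs
    return {
        "prediction_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": {"name": model_name, "version": model_version, "config": config},
        "features": features,       # input values used at scoring time
        "prediction": prediction,   # model outcome/target
    }

record = prediction_record("churn", "1.4.0", {"threshold": 0.5},
                           {"tenure": 24, "monthly_spend": 40.0}, 0.12)
print(json.dumps(record))  # in practice, write to the chosen storage pipeline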
Feature Engineering Layer
• Usually generated during model training
• Share high-quality features
• Many have value across the organisation – not just for machine learning
• In some cases can be generated in advance as part of ETL layer
• data cleansing, missing values, indicator variables, interaction features,
transformations e.g. extracting hour from datetime
• Functions can be defined to generate features on retrieval, e.g. via a web service
• Store features for sharing and provide access paths
• Monitor data ETL and feature engineering processes
• Batch vs real-time requirements? – different feature generation pipeline and
storage timing
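A sketch of reusable feature functions of the kind mentioned above (extracting the hour from a datetime, adding an indicator variable), assuming pandas; whether they run in the ETL layer or on retrieval is a deployment decision, not implied by the code.

import pandas as pd

def add_hour_of_day(df: pd.DataFrame, ts_col: str = "event_time") -> pd.DataFrame:
    # Transformation feature: extract the hour from a datetime column
    out = df.copy()
    out["hour_of_day"] = pd.to_datetime(out[ts_col]).dt.hour
    return out

def add_missing_indicator(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Indicator variable flagging missing values before imputation
    out = df.copy()
    out[f"{col}_missing"] = out[col].isna().astype(int)
    return out

# Using the same functions for both batch ETL and on-retrieval generation (e.g. behind a
# web service) helps keep training and prediction features consistent.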
Feature Store – Uber
• Features are defined by data scientists and loaded by ETL
• Features and custom algorithms (models) are shared
between teams
• Features automatically calculated, stored and updated
• Common functions defined e.g. normalizing, datetime
• Easy to consume features
• Used for both training and prediction
• Accelerates machine learning projects & outcomes
• Different processes for real-time and batch
• Additional metadata added to feature
– owner, description, SLA
• 10,000 features in Uber feature store
Modelling Layer
• Model training and testing – exploratory, experimentation
• Evaluation and model selection
• Is the best performing model always the best choice for production?
- compress model for deployment – trade-off performance
- trade-off speed of scoring with accuracy (latency threshold is a business metric)
• Identify bias – can be inherent in underlying input data or collection mechanisms
• Explainability – regulatory requirements, reduces risk, improves testing
• With an ML model, who becomes accountable for outcomes?
• Different hardware and infrastructure needed for model training and inference
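The scoring speed versus accuracy trade-off above can be measured explicitly at model-selection time; the sketch below compares two candidate models against an assumed latency budget (the synthetic data, the candidates and the 5 ms budget are all illustrative).

import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

LATENCY_BUDGET_MS = 5.0  # illustrative business latency threshold for scoring the test batch

for name, candidate in [("logreg", LogisticRegression(max_iter=1000)),
                        ("forest", RandomForestClassifier(n_estimators=300, random_state=0))]:
    candidate.fit(X_tr, y_tr)
    start = time.perf_counter()
    scores = candidate.predict_proba(X_te)[:, 1]
    latency_ms = (time.perf_counter() - start) * 1000
    auc = roc_auc_score(y_te, scores)
    verdict = "within" if latency_ms <= LATENCY_BUDGET_MS else "over"
    print(f"{name}: AUC={auc:.3f}, scoring latency={latency_ms:.1f} ms ({verdict} budget)")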
Testing the model
• Model should be tested using traditional software approaches
– system testing, integration testing, user acceptance testing
• Evaluate against business metrics and outcomes initially scoped
• Test extreme values/possible outliers
• Perturb the inputs
• Test for bias, test explainability
• Verify data quality
• Validate decision engine actions and reports based on data inputs
and predictions
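A small sketch of software-style model tests in the spirit of the list above, covering boundary values and input perturbations; the toy model, feature ranges and tolerances are illustrative assumptions, and in practice these checks would sit in a pytest suite alongside integration and acceptance tests.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative model and ranges; in practice these come from the real training run and data profile
X = np.random.RandomState(0).normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)
FEATURE_RANGES = [(-5.0, 5.0)] * 3

def test_extreme_values():
    # Boundary inputs should not crash the model or produce out-of-range scores
    for point in ([lo for lo, _ in FEATURE_RANGES], [hi for _, hi in FEATURE_RANGES]):
        score = model.predict_proba([point])[0, 1]
        assert 0.0 <= score <= 1.0

def test_perturbation_stability():
    # A tiny perturbation of the input should not move the score dramatically
    x = X[:1]
    base = model.predict_proba(x)[0, 1]
    shifted = model.predict_proba(x + 1e-3)[0, 1]
    assert abs(base - shifted) < 0.05

test_extreme_values()
test_perturbation_stability()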
DevOps and Deployment
• ML model version control – different requirements
• Reproducibility of experiments
• Configuration, parameters, hyperparameters, algorithms
• Data versioning - include data input for training model
• Data Version Control (DVC)
• Where will model be deployed?
• Cloud
• On-premise
• Web service
• Embedded in app or physical equipment
• Within data ingestion pipeline – streaming, near real-time
• How will model be packaged for deployment e.g. PMML, ONNX, PFA, Pickle, Flask, etc
• Solution deployment – containerize e.g. Docker, Kubernetes/Kubeflow
• Continuous Delivery / Continuous Integration / Continuous Deployment (CI/CD)
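Since Pickle and Flask appear among the packaging options above, here is a minimal sketch of that particular route: a pickled model exposed as a web service. The artifact path, request schema and /predict route are assumptions; PMML/ONNX/PFA or a managed serving platform are equally valid choices.

import pickle
from flask import Flask, request, jsonify

with open("model.pkl", "rb") as f:   # model artifact produced and versioned by the training pipeline
    model = pickle.load(f)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()     # e.g. {"features": [[1.0, 24, 40.0]]}
    score = model.predict_proba(payload["features"])[0, 1]
    return jsonify({"score": float(score), "model_version": "1.4.0"})

if __name__ == "__main__":
    # In production this would sit behind a WSGI server, containerised (e.g. Docker) and shipped via CI/CD
    app.run(host="0.0.0.0", port=5000)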
Infrastructure and Security Layers
• Cloud, on-premise or hybrid?
• Batch or real-time – auto-scaling requirements and limitations
• Online or offline
• Requirements for data storage, ingestion, model training, inference, serving
outcomes/actions, scalability, model performance
• What hardware will the solution be deployed to? The model needs to function
effectively on production hardware – e.g. are there GPU resources available in
production for an image classification solution? Can the model be compressed?
• How will data be input to model in production?
Example: for a self-driving car we definitely want an offline model that is secure
and performs with millisecond speed and precision
Prediction Layer
• Goal: deliver the right data and insights to consumers at the right time
• How quickly does model scoring need to occur?
• What are latency requirements for actions and outcomes?
• Decision engine – what action(s) will be taken
based on predictions
• How will predictions be actioned?
• Web service call
• Report
• Batch process or workflow triggered
• Alert or notification
• Action on a piece of equipment (functional AI)
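A minimal decision-engine sketch mapping a prediction to one of the delivery mechanisms above; the thresholds and action stubs are illustrative assumptions that, in practice, would be owned, tested and monitored like any other production code.

def send_alert(entity_id, score):
    # Real-time notification/alert path
    print(f"ALERT {entity_id}: score {score:.2f}")

def queue_batch_followup(entity_id, score):
    # Batch process or workflow trigger path
    print(f"Queued follow-up for {entity_id} (score {score:.2f})")

def decide(entity_id, score):
    # Illustrative thresholds; in practice these are business rules kept under version control
    if score >= 0.9:
        send_alert(entity_id, score)
    elif score >= 0.6:
        queue_batch_followup(entity_id, score)
    # below 0.6: no action taken

decide("cust-42", 0.93)
decide("cust-17", 0.65)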
Feedback Layer
• Capture feedback from model outputs and actions through monitoring
• Include in data input to retrain and improve model
• Optimally include a human in the feedback process – correct tags, identify
anomalies, tuning corrections based on experience
• Feedback can come from consumers’ responses to actions:
• Click-through rate from email received for advertising campaign
• Uplift in sales
• Customer service improvement measures
• Equipment downtime reductions
• Use model to predict errors and correct e.g. Uber Eats ETA time
• Compare predictions with actual behavior e.g. recommended items
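A sketch of closing the loop described above, assuming predictions and outcomes are both logged: join them, compute a simple residual to monitor, and feed the labelled rows into the next training set. Table and column names are illustrative.

import pandas as pd

predictions = pd.DataFrame({"customer_id": [10, 11, 12], "score": [0.8, 0.2, 0.6]})
outcomes = pd.DataFrame({"customer_id": [10, 11, 12], "clicked_offer": [1, 0, 0]})

# Join model outputs with observed behaviour (click-through, sales uplift, downtime, ...)
feedback = predictions.merge(outcomes, on="customer_id", how="left")
feedback["residual"] = feedback["clicked_offer"] - feedback["score"]  # simple error signal to monitor

print(feedback)
# The labelled rows (features + clicked_offer) become training data for the next model refresh.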
Monitoring Layer
• Model performance degrades over time (sometimes in < 24 hours)
• Trends and tastes change over time, competitors change
• Compare model performance with pre-defined business metrics
• Compare with baselines for specific data slices to identify bias
• Evaluate model outcomes in production (AUC-ROC, PR, distribution skews in input data and
features, mean reciprocal rank, etc)
• Monitor infrastructure, ETL, decision engine, model and other components
• Capture and monitor logs – is 24/7 support required? What are the SLAs?
• Model predictions can lead to changes in user behaviour and model performance
e.g. credit card fraud solutions, changes in pricing
• Optimal hyperparameters may change over time – automate processes to test for improved choices,
and even model refresh process
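One common way to catch the input-distribution skew mentioned above is a two-sample test between the training baseline and a recent production window; the sketch below applies a Kolmogorov-Smirnov test to a single numeric feature, with synthetic data and an illustrative alert threshold.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)   # baseline captured at training time
production_feature = rng.normal(loc=0.4, scale=1.0, size=2_000)  # recent scoring-time values (drifted)

result = ks_2samp(training_feature, production_feature)
if result.pvalue < 0.01:  # illustrative alert threshold
    print(f"Input drift detected (KS={result.statistic:.3f}, p={result.pvalue:.2g}) "
          f"-> investigate data sources and consider a model refresh")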
Model Degradation Examples
• 30-day hospital readmissions prediction:
• Changes impacting model outside the control of the business or IT
• Fields in the electronic health record were changed to make documentation easier, leaving some fields blank
• Lab tests were switched to a different lab that used different codes
• An additional type of insurance was accepted by the hospital, changing the distribution of people who went
to the ER
• A new server was provisioned for some source data to improve performance – timestamps
mismatched, causing data integration issues, and the model failed
• An automated camera sensor for detecting defects on a production line failed due to dirt on the lens
and could not scan products that weren’t perfectly centred
• Credit card fraud alerts have been hijacked by fake SMS requesting responses
• Other examples: https://www.oreilly.com/radar/lessons-learned-turning-machine-learning-models-
into-real-products-and-services/
Model Refresh Layer
• How often should model be refreshed? – on-going tuning and redeployment
• Depends on:
• The application of the model
• Changes in data inputs or new data available e.g. latest call centre data, current retail sales and prices
• Likely speed of model degradation and changing trends in market
• Importance of up-to-date data for predictions and use case
• How long it takes to retrain and evaluate new model
• Changed outcomes and behaviour can produce an inferior new model
• Fraud solutions reduce fraudulent transactions in production – fewer anomalies in new production data
=> need a way of saving fraudulent transactions and accuracy indicators to use in the training dataset
• Can steps be taken to accelerate model training and deployment?
• What deployment strategy will be implemented for updates?
• Shadow mode
• Canary deployment
• Update operational database from predictions
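A sketch of the two update strategies named above, assuming two already-loaded model objects with a predict_proba-style interface; the routing logic is illustrative, and in practice it would live in the serving layer with its comparisons fed back to the monitoring layer.

import random

def score_shadow_mode(features, current_model, candidate_model, comparison_log):
    # Shadow mode: the candidate scores every request, but only the current model's result is acted on
    live = current_model.predict_proba(features)[0, 1]
    shadow = candidate_model.predict_proba(features)[0, 1]
    comparison_log.append({"live": live, "shadow": shadow})  # compare offline before promoting
    return live

def score_canary(features, current_model, candidate_model, canary_fraction=0.05):
    # Canary deployment: a small, configurable fraction of traffic is served by the candidate
    chosen = candidate_model if random.random() < canary_fraction else current_model
    return chosen.predict_proba(features)[0, 1]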
Solution Specific Architectures
Back to our Machine Learning solutions – what is the ideal architecture for each example?
• Price matching promises – online
• Bulk customer marketing email
• Voice-assistant contact search
• Customer risk of loan default report
• Patient health monitoring – prognosis, preventive treatments
• Video stream intruder detection
• Smart meter monitoring; IoT device sensors
• Credit card fraud alert
And one last word from Tay’s sibling Zo
Correcting machine learning model degradation can also have adverse effects…
Upcoming Meetups and SlideShare
Introduction to Kubeflow for MLOps:
• https://www.meetup.com/MLOps-Melbourne/events/hskjjryznbfb/
Machine Learning ASAP: The shortest paths to production
• https://www.meetup.com/Enterprise-Data-Science-Architecture/events/264185824/
ML Governance slides (Aug 7)
• https://www.slideshare.net/TerenceSiganakis/enterprise-machine-learning-governance
THANK YOU!
tash.bickley@ftsg.com.au
linkedin.com/in/tashbickley/
