Machine Learning in Dynamic Data Environments:
Trade-offs in Response to Changes
Jungpil Hahn
jungpil@nus.edu.sg
Machine Learning
Data → Model → Prediction
S : Source Data
T : Target Data
The ML paradigm works well when…
• Trained model is accurate
• Target data is similar to source data
• The world doesn’t change
Change is the only
constant
~ Heraclitus
(circa 500BC)
What can we do when change happens?
Data → Model → Prediction
New Data → New Model → Prediction
However …
• New data is often scarce (esp. right after change)
• Unsure when change actually happened
What to do?
• Can / should we enhance our model robustness by increasing the
new training data sample size by leveraging historical data?
• Should we retrain the model immediately when change is detected
or later when more new data has become available?
➢ Augment the new data set!
What to do?
• Bias vs. Variance Trade-off
• Can / should we enhance our model robustness by increasing the
new training data sample size by leveraging historical data?
➢ Transfer learning paradigm
• Exploration vs. Exploitation Trade-off
• Should we retrain the model immediately when change is detected
or later when more new data has become available?
Model Setup
Theoretical Analysis
Theoretical Analysis
• Difference in data environment (pre-change vs. post-change) as
sample selection
– r = 1 : diff-distribution
– r = 0 : same-distribution
• Empirical risk minimization (ERM)
– Minimize
• Weight based on sample selection:
To transfer or not to transfer
• Expected risk in target data
• Empirical risk using same- and diff-distribution data
• Empirical risk using only same-distribution data
S-S : Same-distribution source data (q)
S-D : Diff-distribution source data (p)
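The weighted empirical risk idea above can be sketched for a one-predictor squared-loss model. This is a minimal illustration, not the author's formulation: the weight formula on the slide is not preserved, so the constant `diff_weight`, the toy data, and the function names are all hypothetical.

```python
def weighted_erm_slope(xs, ys, weights):
    """Minimise sum_i w_i * (y_i - b * x_i)**2 over b (no intercept)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    return num / den


# Hypothetical toy data: p = 4 diff-distribution points (true slope 1.0)
# followed by q = 2 same-distribution points (true slope 2.0).
xs = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0]
is_diff = [1, 1, 1, 1, 0, 0]  # r = 1: diff-distribution, r = 0: same-distribution


def fit(diff_weight):
    """Fit with an assumed constant weight on every diff-distribution point."""
    weights = [diff_weight if r == 1 else 1.0 for r in is_diff]
    return weighted_erm_slope(xs, ys, weights)


slope_same_only = fit(0.0)  # no transfer: only same-distribution data counts
slope_equal = fit(1.0)      # transfer with equal weights
slope_down = fit(0.25)      # transfer with down-weighted diff-distribution data

print(slope_same_only, slope_equal, slope_down)
```

Setting the weight to zero recovers the non-transfer fit; raising it pulls the estimate toward the pre-change slope.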
Effectiveness of Transfer Learning
• Dd : difference of upper bounds in loss between non-transfer
learning and transfer learning
• Relative size of diff- vs. same-distribution data examples
• Complexity of the model
• Extent of data change
Effectiveness of Transfer Learning
• Depends on …
• The amount of same-distribution source data (q) relative to the diff-distribution source data (p)
• The number of predictors being used in the prediction model (b)
• The extent of change across the source and the target data sets (a/b)
Numerical Analysis
Simulate Changing Data Pattern
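• Linear model: y = x×β + ε
• β = [value not legible in source]
• k = {10, 20, …, 50, 60}
– x = (x1, x2, x3, …, xk), x ~ N_k(0, Σ), Σ = (σij) with σij = 0.5 for i≠j and 1 for i=j
• ε follows a normal distribution. To keep R² = 0.6, var(ε) equals
– var(x×β) × (1 − R²) / R²
• Selection model: Pr(r = 0 | x, y) is a function of a linear combination of x and y [function and coefficient symbols not legible in source]
• Selection-model parameters: one fixed [value not legible in source]; one varied over {0.3, 0.5, …, 1.5}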
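A minimal sketch of the data-generating step described above, using only the Python standard library. Several details are assumptions rather than the author's setup: β is taken to be a vector of ones (its value is not legible in the source), x has zero mean, and the names `simulate` and `variance` are hypothetical. The selection model is not simulated here because its functional form is not recoverable.

```python
import math
import random


def simulate(n, k, r2=0.6, rho=0.5, seed=0):
    """Draw n rows of (x, y, signal) with y = x·β + ε.

    Assumption: β is a vector of ones (not legible in the source slide).
    """
    rng = random.Random(seed)
    beta = [1.0] * k
    # var(x·β) = β'Σβ for Σ with 1 on the diagonal and rho elsewhere.
    var_signal = sum(
        beta[i] * beta[j] * (1.0 if i == j else rho)
        for i in range(k)
        for j in range(k)
    )
    # Noise variance that keeps the population R² at r2.
    var_eps = var_signal * (1.0 - r2) / r2
    sd_eps = math.sqrt(var_eps)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        # sqrt(rho)*shared + sqrt(1-rho)*own has variance 1 and covariance rho.
        x = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(k)
        ]
        signal = sum(b * xi for b, xi in zip(beta, x))
        y = signal + rng.gauss(0.0, sd_eps)
        rows.append((x, y, signal))
    return rows, var_eps


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rows, var_eps = simulate(n=20000, k=10)
empirical_r2 = variance([s for _, _, s in rows]) / variance([y for _, y, _ in rows])
print(var_eps, empirical_r2)
```

The ratio of signal variance to outcome variance in the simulated sample should land close to the targeted R² of 0.6.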
Detecting Changes in Data Patterns
• ADWIN (ADaptive WINdowing) algorithm:
– monitoring out-of-sample prediction error of a pre-trained model
• Data stream: 1,000 data points with r=1, followed by 1,000 data points with r=0
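The monitoring idea can be sketched with a simplified ADWIN-style test. This is not the full ADWIN algorithm: it checks every split of a growing window instead of using compressed buckets, it stops at the first detection rather than shrinking the window, and it assumes the monitored errors are scaled to [0, 1]. The function name `detect_change`, the value of `delta`, and the toy error stream are hypothetical.

```python
import math


def detect_change(errors, delta=0.002):
    """Return (t, split) at the first detected change, or None.

    Simplified ADWIN-style test: values are assumed to lie in [0, 1] and
    every split of the current window is compared with a Hoeffding-type cut.
    """
    prefix = [0.0]  # prefix[i] = sum of the first i errors
    for t, err in enumerate(errors):
        prefix.append(prefix[-1] + err)
        n = t + 1
        for split in range(1, n):
            n0, n1 = split, n - split
            mean0 = prefix[split] / n0
            mean1 = (prefix[n] - prefix[split]) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            # Threshold sqrt(ln(4 / delta') / (2 m)) with delta' = delta / n.
            eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return t, split
    return None


# Hypothetical error stream: low error before the change, high error after.
stream = [0.1] * 200 + [0.9] * 200
change = detect_change(stream)
print(change)
```

The returned `split` marks where the older and newer sub-windows separate; `t` is the index of the observation at which the difference became detectable, so `t - split` is the detection delay.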
Analysis Strategies Compared
➢ In response to changes …
• Using transfer learning
– Transfer: weighting / equal weight
• Using only same-distribution source data
– Retraining (Dropping)
➢ Performance metrics
• Mean squared error (MSE)
– MSE = Bias² + Variance
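The decomposition MSE = Bias² + Variance can be checked numerically for the three strategies with a toy Monte Carlo experiment. This is a stand-in, not the author's simulation: it estimates a post-change mean rather than a regression model, and the sample sizes, shift, weight of 0.1, and function name `run` are all assumed values.

```python
import random


def run(diff_weight, p=50, q=5, shift=0.5, reps=4000, seed=1):
    """Monte Carlo bias², variance and MSE of a weighted-mean estimator.

    p diff-distribution points have mean 0; q same-distribution points
    have mean `shift`, which is the target quantity.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        old = [rng.gauss(0.0, 1.0) for _ in range(p)]
        new = [rng.gauss(shift, 1.0) for _ in range(q)]
        est = (diff_weight * sum(old) + sum(new)) / (diff_weight * p + q)
        estimates.append(est)
    mean_est = sum(estimates) / reps
    bias2 = (mean_est - shift) ** 2
    var = sum((e - mean_est) ** 2 for e in estimates) / reps
    mse = sum((e - shift) ** 2 for e in estimates) / reps
    return bias2, var, mse


results = {
    "retraining (dropping)": run(0.0),
    "transfer, equal weight": run(1.0),
    "transfer, weighting": run(0.1),
}
for name, (bias2, var, mse) in results.items():
    print(f"{name}: bias2={bias2:.4f} variance={var:.4f} mse={mse:.4f}")
```

In this toy setting, dropping the old data gives low bias and high variance, pooling with equal weights gives the reverse, and the down-weighted pool sits between the two on both components.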
Trade-off #1: Retraining vs. Transfer Learning
[Figure: Retraining vs. Transfer Learning. Panels: Bias², Variance, MSE = Bias² + Variance]
Trade-off #2: Now or Later
[Figure: Retrain; Transfer]
[Figure: Retrain – Transfer]
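The now-or-later question can be made concrete with a worked toy calculation. This is an illustrative sketch under assumed values, not the author's analysis: it compares the closed-form MSE of a mean estimated from new data only against an equal-weight pool with p old observations, where `p`, `shift`, `sigma2`, and the function names are hypothetical.

```python
def mse_retrain(n_new, sigma2=1.0):
    """MSE of the mean of n_new post-change observations (unbiased)."""
    return sigma2 / n_new


def mse_transfer(n_new, p=1000, shift=0.5, sigma2=1.0):
    """MSE of the equal-weight pooled mean of p old and n_new new observations."""
    total = p + n_new
    bias = p * shift / total  # old data pull the estimate away from the new mean
    return bias ** 2 + sigma2 / total


def first_n_where_retraining_wins(max_n=10_000):
    """Smallest number of new observations at which retraining has lower MSE."""
    for n_new in range(1, max_n + 1):
        if mse_retrain(n_new) < mse_transfer(n_new):
            return n_new
    return None


crossover = first_n_where_retraining_wins()
print(crossover)
```

With these assumed values the pooled estimator has the lower MSE until five post-change observations are available, after which retraining on new data alone is better. A larger shift moves the crossover earlier; noisier data moves it later.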
Conclusions
➢ Contributions
• Understand the effectiveness of transfer learning from a sample selection perspective
• Trade-offs in response to changes in data patterns
– Bias-variance trade-off is alleviated by strategic transfer learning
– The tension of the exploration-exploitation trade-off differs between the two alternative strategies (using transfer learning or not)
➢ Implications for data analytics practice
• Consistent monitoring of the prediction performance and reconsidering the fitness of the prediction model
• Development of a model representing the changing environment
• Optimization of waiting time to gain reliable model adjustment
– Value (cost) of prediction error?
– Value of change detection accuracy?
