1) The document discusses machine learning techniques for dynamic data environments where the data distribution may change over time.
2) It analyzes trade-offs between retraining models on new data versus using transfer learning when changes are detected, finding that transfer learning can alleviate the bias-variance trade-off.
3) The effectiveness of transfer learning depends on factors like the relative amounts of same-distribution and different-distribution data and the complexity of the model.
9. [Figure: Data → Model → Prediction; after a change: New Data → New Model → Prediction]
• New data is often scarce (esp. right after change)
• It is unclear when the change actually happened
However …
10. What to do?
• Can / should we enhance our model robustness by increasing the new training data sample size by leveraging historical data?
• Should we retrain the model immediately when change is detected, or later, once more new data has become available?
Ø Augment the new data set!
11. What to do?
• Bias vs. Variance Trade-off
• Can / should we enhance our model robustness by increasing the new training data sample size by leveraging historical data?
Ø Transfer learning paradigm
• Exploration vs. Exploitation Trade-off
• Should we retrain the model immediately when change is detected, or later, once more new data has become available?
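The idea of enlarging the scarce post-change sample with historical data can be sketched as a weighted least-squares fit that pools the new sample with down-weighted historical data. The function name, the linear model, and the weight `w_hist` below are illustrative assumptions, not the method from the slides:

```python
import numpy as np

def fit_weighted_linear(X_new, y_new, X_hist, y_hist, w_hist=0.5):
    """Least-squares fit on new data pooled with down-weighted historical data.

    w_hist is a hypothetical knob: 0 -> retrain on the new data only,
    1 -> equal-weight pooling of historical and new samples.
    Weighted least squares is implemented by scaling rows with sqrt(weight).
    """
    X = np.vstack([X_new, X_hist]).astype(float)
    y = np.concatenate([y_new, y_hist]).astype(float)
    w = np.concatenate([np.ones(len(y_new)),
                        np.full(len(y_hist), float(w_hist))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef
```

With `w_hist = 0` this reduces exactly to retraining on the new sample; intermediate values trade the bias of pre-change data against the variance of the small post-change sample.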
14. Theoretical Analysis
• Difference in data environment (pre-change vs. post-change) modeled as sample selection
– s = 1 : diff-distribution
– s = 0 : same-distribution
• Empirical risk minimization (ERM)
– Minimize the average loss over the training sample
• Weight each example based on its sample-selection status
15. • Expected risk in the target data
• Empirical risk using both same- and diff-distribution data
• Empirical risk using only same-distribution data
To transfer or not transfer
S-S : Same-distribution source data (q)
S-D : Diff-distribution source data (p)
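The weighted empirical risk from the ERM setup above can be written down directly: each source example carries a weight derived from its sample-selection status (same- vs. diff-distribution). A minimal sketch with squared loss and a linear model; the function name and weighting scheme are illustrative assumptions:

```python
import numpy as np

def weighted_empirical_risk(beta, X, y, w):
    """Weighted empirical risk: (sum_i w_i * loss_i) / (sum_i w_i).

    w_i encodes the sample-selection indicator: same-distribution points
    keep w_i = 1, while diff-distribution points can be down-weighted
    (0 < w_i < 1) or dropped entirely (w_i = 0, i.e. non-transfer).
    """
    losses = (np.asarray(X) @ np.asarray(beta) - np.asarray(y)) ** 2
    return float(np.average(losses, weights=w))
```

Setting all diff-distribution weights to zero recovers the "retraining (dropping)" strategy compared later, while equal weights recover naive pooling.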
16. • Δd : difference of the upper bounds on the loss between non-transfer learning and transfer learning
Effectiveness of Transfer Learning
Relative size of diff- vs. same-distribution data examples
Complexity of the model
Extent of data change
17. Effectiveness of Transfer Learning
• Depends on …
• The amount of same-distribution source data (q) relative to the diff-distribution source data (p)
• The number of predictors being used in the prediction model (b)
• The extent of change across the source and the target data sets (a/b)
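These factors can be probed with a small simulation: fit a linear model either on pooled pre- and post-change data (transfer) or on post-change data only (retrain), and compare test MSE after the change. All sample sizes, dimensions, and shift magnitudes below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def compare_strategies(n_same=5, n_diff=300, d=3, shift=0.1, n_reps=50, seed=0):
    """Average post-change test MSE of transfer (pool diff + same data)
    vs. non-transfer (fit on same-distribution data only).

    shift plays the role of the extent of change; n_same vs. n_diff plays
    the role of the q vs. p balance. All numbers are illustrative.
    """
    rng = np.random.default_rng(seed)
    mse_transfer, mse_retrain = 0.0, 0.0
    for _ in range(n_reps):
        beta_old = rng.normal(size=d)
        beta_new = beta_old + shift  # post-change coefficients
        X_diff = rng.normal(size=(n_diff, d))
        y_diff = X_diff @ beta_old + rng.normal(size=n_diff)
        X_same = rng.normal(size=(n_same, d))
        y_same = X_same @ beta_new + rng.normal(size=n_same)
        X_test = rng.normal(size=(1000, d))
        y_test = X_test @ beta_new + rng.normal(size=1000)
        # Transfer: pool both sources; non-transfer: same-distribution only.
        b_t, *_ = np.linalg.lstsq(np.vstack([X_diff, X_same]),
                                  np.concatenate([y_diff, y_same]), rcond=None)
        b_r, *_ = np.linalg.lstsq(X_same, y_same, rcond=None)
        mse_transfer += np.mean((X_test @ b_t - y_test) ** 2) / n_reps
        mse_retrain += np.mean((X_test @ b_r - y_test) ** 2) / n_reps
    return mse_transfer, mse_retrain
```

With scarce post-change data and a small shift, pooling tends to win; with ample post-change data and a large shift, retraining wins, matching the stated dependence on q/p and the extent of change.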
20. • ADWIN algorithm: monitoring the out-of-sample prediction error of a pre-trained model
[Figure: two segments of 1,000 data points each, r = 1 before the change and r = 0 after]
Detecting Changes in Data Patterns
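ADWIN's adaptive windowing with Hoeffding-style cut bounds is more involved than fits on a slide; as a simplified stand-in, here is a fixed-window mean-shift detector over the error stream. The function name, window size, and threshold are illustrative assumptions, not the ADWIN algorithm itself:

```python
import statistics

def detect_change(errors, window=40, threshold=3.0):
    """Return the first index at which a shift in mean error is flagged,
    or None. A simplified stand-in for ADWIN: slide a window over the
    error stream and compare the means of its two halves; flag a change
    when the gap exceeds `threshold` pooled standard errors.
    """
    half = window // 2
    for t in range(window, len(errors) + 1):
        old = errors[t - window:t - half]
        new = errors[t - half:t]
        mu0, mu1 = statistics.fmean(old), statistics.fmean(new)
        s0, s1 = statistics.pstdev(old), statistics.pstdev(new)
        se = ((s0 ** 2 + s1 ** 2) / half) ** 0.5 or 1e-12
        if abs(mu1 - mu0) / se > threshold:
            return t - 1
    return None
```

In the slides' setting, `errors` would be the out-of-sample prediction errors of the pre-trained model; a flagged index is the estimated change point.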
21. Ø In response to changes …
• Using transfer learning
– Transfer – weighting / equal weight
• Using only same-distribution source data
– Retraining (Dropping)
Ø Performance metrics
• Mean squared error (MSE)
– MSE = Bias² + Variance
Analysis Strategies Compared
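The decomposition MSE = Bias² + Variance used as the performance metric above can be checked numerically with a deliberately biased estimator (a shrunken sample mean; all numbers are illustrative):

```python
import random
import statistics

def mse_decomposition(true_mu=2.0, n=10, shrink=0.8, reps=5000, seed=0):
    """Monte-Carlo estimates of MSE, squared bias, and variance for the
    shrunken sample mean shrink * mean(x) of n Gaussian draws around
    true_mu. Over the same set of replications the identity
    MSE = Bias^2 + Variance holds exactly.
    """
    rng = random.Random(seed)
    est = [shrink * statistics.fmean(rng.gauss(true_mu, 1.0) for _ in range(n))
           for _ in range(reps)]
    mean_est = statistics.fmean(est)
    bias_sq = (mean_est - true_mu) ** 2
    variance = statistics.pvariance(est, mu=mean_est)
    mse = statistics.fmean((e - true_mu) ** 2 for e in est)
    return mse, bias_sq, variance
```

Shrinking trades variance (reduced by shrink²) for bias, which is exactly the trade that down-weighting diff-distribution data makes in the transfer strategies above.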
28. Ø Contributions
• Understand the effectiveness of transfer learning from a sample
selection perspective
• Trade-offs in response to changes in data patterns
– Bias-variance trade-off is alleviated by strategic transfer learning
– The tension of the exploration-exploitation trade-off differs between the two alternative strategies (using transfer learning or not).
Ø Implications for data analytics practice
• Consistently monitoring the prediction performance and reconsidering the fitness of the prediction model
• Development of a model that represents the changing environment
• Optimization of the waiting time needed for a reliable model adjustment
– Value (cost) of prediction error?
– Value of change detection accuracy?
Conclusions