# Real-time ranking with concept drift using expert advice

Hila Becker and Marta Arias, "Real-time ranking with concept drift using expert advice", in Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 86–94.

Speaker notes:

• Given an unbounded stream of continuous measurements, how do we model it so as to capture possibly time-evolving trends and patterns, compute the best model, and make time-critical decisions?
• Compute a weighted average: divide the range into bins [i/ε, (i+1)/ε], compute the mean and standard deviation for each bin, and check whether a confident prediction can be made (F(x) − mean − std·t > cost/transaction).
• Here, the rules of the game change.
### Real-time ranking with concept drift using expert advice

1. **Real-time Ranking with Concept Drift Using Expert Advice.** Hila Becker and Marta Arias, Center for Computational Learning Systems, Columbia University.
2. **Dynamic Ranking**
    - Continuous arrival of data over time
    - Set of items to rank
        - Dynamic features
        - Adapt to changes
    - Example task: given a list of electrical grid components, produce a ranking according to failure susceptibility
3. **Problem Setting.** [Diagram: at each time step 1, 2, …, t−1, t, a batch of examples arrives, each with a feature vector x = (x₁, x₂, …, xₙ) and a label y (+ or −); the model M must predict the unknown labels at time t from the data seen so far.]
4. **Challenges**
    - Changes in the underlying distribution
        - Hidden
        - Concept drift
        - Adapt the learning model to improve predictions
    - Finite storage space
        - Sample from the data
        - Discard old or irrelevant information
5. **Concept Drift.** [Diagram: the boundary separating + and − examples shifts over time.]
6. **Ensemble Methods.** [Diagram: experts trained on successive segments of the stream.]
7. **Weighted Expert Ensembles**
    - Associate a weight with each expert
    - The weight is a measure of belief in the expert's performance
    - Weights used in the final prediction
        - Use only the best expert, or
        - Weighted average of predictions
    - Update the weights after every prediction
8. **Weighted Majority Algorithm.** [Diagram: N experts e₁, e₂, e₃, …, e_N each predict 0 or 1; the ensemble computes the weighted vote w₁·1 + w₂·0 + w₃·0 + … + w_N·1 and predicts 1 if the vote exceeds 0.5, else 0.]
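The voting rule on this slide can be sketched in a few lines. This is a minimal sketch of the classic Weighted Majority algorithm of Littlestone and Warmuth; the multiplicative update (halving the weight of each wrong expert via `beta=0.5`) is the standard choice and an assumption here, since the slide only shows the prediction step:

```python
def weighted_majority_predict(weights, predictions):
    """Predict 1 if the weighted vote for 1 exceeds half the total weight."""
    total = sum(weights)
    vote_for_one = sum(w for w, p in zip(weights, predictions) if p == 1)
    return 1 if vote_for_one > 0.5 * total else 0

def weighted_majority_update(weights, predictions, label, beta=0.5):
    """Multiplicatively penalize every expert that predicted incorrectly."""
    return [w * (beta if p != label else 1.0)
            for w, p in zip(weights, predictions)]

# Hypothetical round with four experts, as in the slide's diagram
weights = [1.0, 1.0, 1.0, 1.0]
predictions = [1, 0, 0, 1]
guess = weighted_majority_predict(weights, predictions)
weights = weighted_majority_update(weights, predictions, label=1)
```

Ties (weighted vote exactly 0.5 of the total) fall to 0 here; the tie-breaking convention is arbitrary.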
9. **Modified Weighted Majority**
    - Different constraints for data streams
        - Incorporate new data
        - Static vs. dynamic set of experts
    - Ranking algorithm
        - Loss function: 1 − normalized average rank of positive examples
        - Combine predictions: weighted average rank
10. **Online Ranking Algorithm.** [Diagram: experts e₁, e₂, e₃, …, e_B with weights w₁, w₂, w₃, …, w_B each produce a ranking of the items (e.g., F1–F5); the rankings are combined into the final prediction, and new experts e_{B+1}, e_{B+2} with weights w_{B+1}, w_{B+2} are added over time.]
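The combination step shown here, weighted average rank from slide 9, can be illustrated as follows. The item names F1–F5 and the expert weights are hypothetical values for the sketch:

```python
def combine_rankings(rankings, weights):
    """Combine expert rankings by weighted average rank (lower = better).

    rankings: list of lists; each is an ordering of the same items, best first.
    weights:  one non-negative weight per expert.
    """
    total_w = sum(weights)
    score = {}
    for ranking, w in zip(rankings, weights):
        for position, item in enumerate(ranking, start=1):
            score[item] = score.get(item, 0.0) + w * position
    # sort items by their weighted average rank
    return sorted(score, key=lambda item: score[item] / total_w)

experts = [["F1", "F4", "F3", "F2", "F5"],
           ["F4", "F2", "F1", "F3", "F5"],
           ["F1", "F3", "F5", "F4", "F2"]]
weights = [0.5, 0.3, 0.2]
final = combine_rankings(experts, weights)   # ["F1", "F4", "F3", "F2", "F5"]
```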
11. **Performance – Summer 05.** [Performance plot.]
12. **Performance – Winter 06.** [Performance plot.]
13. **Contributions**
    - Additive weighted ensemble based on the Weighted Majority algorithm
    - Algorithm adapted to ranking
    - Experiments on a real-world data stream
        - Outperforms traditional approaches
        - Explores performance/complexity trade-offs
14. **Future Work**
    - Ensemble diversity control
    - Exploit re-occurring contexts
        - Use knowledge of cyclic patterns
        - Revive old experts
    - Change detection
    - Statistical estimation of predicting ensemble size
15. **Ensemble Methods**
    - Static ensemble with online learners [Hulten '01]
    - Use batch learners as experts
        - Can use many learning algorithms
        - Loses interpretability
    - Additive ensembles
        - Train an expert at constant intervals [Street and Kim '01]
        - Train an expert when performance declines [Kolter '05]
16. **Ensemble Pruning**
    - Additive ensembles can grow infinitely large
    - Criteria for removing experts
        - Age: retire the oldest model [Chu and Zaniolo '04]
        - Performance
            - Worst in the ensemble
            - Below a minimal threshold [Stanley '01]
        - Instance-based pruning [Wang et al. '03]
17. **Dealing with a moving set of experts**
    - Introduce new parameters
        - B: "budget" (maximum number of models), set to 100
        - p: new model's weight percentile, in [0, 100]
        - λ: age penalty, in (0, 1]
    - If there are too many models (more than B), drop the models with the poorest q-score, where
        - qᵢ = wᵢ · λ^ageᵢ
        - i.e., λ is the rate of exponential decay
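The budget-pruning rule above, qᵢ = wᵢ · λ^ageᵢ, is easy to sketch. The expert names, weights, and ages below are made-up values for illustration:

```python
def prune_to_budget(experts, B, decay):
    """Keep at most B experts, ranked by q_i = w_i * decay**age_i.

    experts: list of (name, weight, age) tuples.
    decay:   age penalty lambda in (0, 1]; smaller means faster decay.
    """
    if len(experts) <= B:
        return experts
    scored = sorted(experts,
                    key=lambda e: e[1] * decay ** e[2],
                    reverse=True)
    return scored[:B]

pool = [("e1", 0.9, 10), ("e2", 0.6, 1), ("e3", 0.8, 5), ("e4", 0.4, 0)]
kept = prune_to_budget(pool, B=2, decay=0.8)   # keeps e2 and e4
```

Note how the decay lets a young, moderately weighted expert (e4) outlive an old expert with a higher raw weight (e1).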
18. **Performance Metric.** [Diagram: a ranking of 8 components in which the 3 actual outages appear at ranks 2, 3, and 5; pAUC = 17/24 ≈ 0.71.]
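One way to read the slide's pAUC example: each outage at rank r (out of n components, rank 1 at the top) contributes (n − r + 1), and the sum is normalized by p·n for p outages. This reconstruction reproduces the slide's 17/24, though the paper's exact definition may differ:

```python
def normalized_avg_rank(ranking_labels):
    """pAUC-style score: normalized average 'reversed rank' of positives.

    ranking_labels: labels in ranked order (index 0 = top of the list),
    1 for a positive (actual outage), 0 for a negative.
    """
    n = len(ranking_labels)
    positive_ranks = [i + 1 for i, y in enumerate(ranking_labels) if y == 1]
    # each positive at rank r contributes (n - r + 1); higher is better
    return sum(n - r + 1 for r in positive_ranks) / (len(positive_ranks) * n)

# Slide example: outages at ranks 2, 3, and 5 among 8 components
labels = [0, 1, 1, 0, 1, 0, 0, 0]
score = normalized_avg_rank(labels)   # (7 + 6 + 4) / (3 * 8) = 17/24
```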
19. **Budget Variation.** [Plot.]
20. **Data Streams**
    - Continuous arrival of data over time
    - Real-world applications
        - Consumer shopping patterns
        - Weather prediction
        - Electricity load forecasting
    - Increased attention
        - Companies collect data
        - Traditional approaches do not apply