# Beyond Classification and Ranking: Constrained Optimization of the ROI

Published in: Economy & Finance, Technology

1. 1. Beyond Classification and Ranking: Constrained Optimization of the ROI Lian Yan & Patrick Baldasare KDD’06
2. 2. Outline <ul><li>Introduction </li></ul><ul><ul><li>Example </li></ul></ul><ul><li>Related Work </li></ul><ul><li>Algorithm </li></ul><ul><li>Experiment </li></ul><ul><li>Conclusion </li></ul>
3. 3. Introduction <ul><li>Financial Service Industry </li></ul><ul><li>Data Mining </li></ul><ul><li>Classification </li></ul><ul><li>Prediction </li></ul>
4. 4. Return on Investment (ROI) <ul><li>ROI is the ratio of money gained or lost on an investment relative to the amount of money invested. </li></ul><ul><li>\$50/\$1,000 = 5% ROI </li></ul><ul><li>\$20/\$100 = 20% ROI </li></ul>
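The two figures on this slide follow directly from the definition. A minimal sketch of the arithmetic, where the function name `roi` is an illustrative choice and not something taken from the slides:

```python
def roi(gain: float, invested: float) -> float:
    """Return on investment as a fraction: money gained (or lost) divided by money invested."""
    if invested <= 0:
        raise ValueError("invested must be positive")
    return gain / invested


# The two examples from the slide.
print(f"{roi(50, 1000):.0%}")
print(f"{roi(20, 100):.0%}")
```

The second investment earns less money in absolute terms but has the higher ROI, which is why the rest of the deck optimizes a monetary measure under a budget instead of a raw count.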
5. 5. Example <ul><li>Used a classifier to predict defection of mutual fund accounts for a major US mutual fund company. </li></ul><ul><li>Positive samples are defined as those accounts with a net redemption amount of 35% or more of the account balance within a two-month window. </li></ul><ul><ul><li>net redemption amount = redemption minus purchase </li></ul></ul>
6. 6. Real-world evaluation results <ul><li>two levels of defection risk </li></ul><ul><li>three segments based on account values </li></ul>
7. 7. Fixed budget <ul><li>ROI of the project is determined by the amount of redemptions prevented. </li></ul><ul><li>Simply classifying does not enable the mutual fund company to reach out to those accounts with the highest redemption amount. </li></ul>
8. 8. Example 2 <ul><li>Predict collectability of delinquent accounts receivable for credit card issuers </li></ul><ul><ul><li>credit, demographic, account data </li></ul></ul><ul><ul><li>binary class </li></ul></ul><ul><ul><ul><li>whether payment is received within a certain period </li></ul></ul></ul>
9. 9. Difference in Maximization <ul><li>True positive rate among accounts in the collection process </li></ul><ul><ul><li>-> classification accuracy </li></ul></ul><ul><li>Collectable amount for the collection process </li></ul><ul><ul><li>-> ROI </li></ul></ul>
10. 10. Budget constraint <ul><li>Budget constraint determines </li></ul><ul><ul><li>how many mutual fund accounts the customer service team can reach out to every month </li></ul></ul><ul><ul><li>how many accounts receivable can be placed into a specific collection process </li></ul></ul><ul><li>The pull rate r is the percentage of accounts to pull out for a specific intervention or collection process. </li></ul>
11. 11. Find Objective Function <ul><li>x is the target monetary measure </li></ul><ul><ul><li>E.g. collection amount, which directly determines the ROI. </li></ul></ul><ul><li>Find a function y(e) </li></ul><ul><ul><li>e is the vector of independent variables </li></ul></ul><ul><li>such that accounts in the top r% by y correspond to those in the top r% by the target x </li></ul>
12. 12. Objective Function <ul><li>Maximizing the ROI can be formally defined as </li></ul><ul><li>maximize Σ x_i over all i with y(e_i) in the top r% </li></ul><ul><li>i = 0, 1, . . . , n − 1, and n is the total number of accounts </li></ul>
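To make the objective concrete, the following sketch scores a candidate function y by the total monetary measure x it captures within the pull rate r. The name `top_r_total`, the rounding of r·n down to a whole number of accounts, and the toy data are all assumptions made for illustration.

```python
import math


def top_r_total(scores, amounts, r):
    """Sum the monetary measure x over the accounts ranked in the top r fraction by score y."""
    if not 0 < r <= 1:
        raise ValueError("r must be in (0, 1]")
    n = len(scores)
    k = math.floor(r * n)  # number of accounts the budget allows (assumed rounding rule)
    # Indices of accounts ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(amounts[i] for i in order[:k])


# Toy data: five accounts, pull rate 40% (two accounts).
amounts = [500.0, 20.0, 0.0, 300.0, 10.0]
scores_a = [0.9, 0.8, 0.1, 0.2, 0.3]  # ranks a low-value account second
scores_b = [0.9, 0.2, 0.1, 0.8, 0.3]  # ranks the two highest-value accounts on top

print(top_r_total(scores_a, amounts, 0.4))
print(top_r_total(scores_b, amounts, 0.4))
```

Both score vectors pull the same number of accounts, but the second captures more of the target amount, which is the quantity the objective rewards.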
13. 13. Cost-sensitive learning <ul><li>Minimize the cost </li></ul><ul><li>Cost Matrix </li></ul>
14. 14. Cost-sensitive learning Cont. <ul><li>P = set of positive samples </li></ul><ul><li>N = set of negative samples </li></ul><ul><li>q_i, q_j are both posterior probabilities of belonging to the positive class </li></ul><ul><li>C_00 = 0, C_11 = 0 </li></ul><ul><li>C_01 = x (the target monetary measure) </li></ul><ul><li>C_10 is not constant and is unknown </li></ul>
15. 15. Regression & Ranking Models <ul><li>Regression Model </li></ul><ul><ul><li>y(e_i) = x_i </li></ul></ul><ul><ul><li>i = 0, . . . , n − 1 </li></ul></ul><ul><li>Ranking Model </li></ul><ul><ul><li>y(e_i) > y(e_j) </li></ul></ul><ul><ul><li>(i, j) ∈ {(i, j) | x_i > x_j, i, j = 0, . . . , n − 1} </li></ul></ul><ul><li>Maximization: Σ x_i over all i with y(e_i) ∈ top r% </li></ul>
16. 16. Constrained Optimization <ul><li>0 ≤ y ≤ 1 </li></ul><ul><li>decision threshold β (0 < β < 1) </li></ul><ul><li>I(y_i, β) is nondifferentiable </li></ul>
17. 17. Differentiable Approximation <ul><li>p > 1 </li></ul><ul><li>0 ≤ γ < 1 </li></ul>
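The exact form of the approximation f(y_i, β), with its parameters p and γ, is not reproduced in this transcript. As a stand-in only, the sketch below contrasts a hard threshold indicator with a logistic surrogate, a common swapped-in technique and not the paper's function. The definition of `indicator` (1 when y exceeds β), the name `smooth_indicator`, and the `sharpness` parameter are assumptions.

```python
import math


def indicator(y: float, beta: float) -> float:
    """Hard selection: 1 if the score exceeds the threshold, else 0 (assumed definition)."""
    return 1.0 if y > beta else 0.0


def smooth_indicator(y: float, beta: float, sharpness: float = 20.0) -> float:
    """Logistic surrogate for the indicator; differentiable in both y and beta.

    This is a stand-in, not the approximation used in the paper.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (y - beta)))


for y in (0.2, 0.45, 0.55, 0.8):
    print(y, indicator(y, 0.5), round(smooth_indicator(y, 0.5), 3))
```

A surrogate like this gives nonzero gradients near the threshold, which is what a gradient-based learner needs; the hard indicator has zero gradient everywhere except at β, where it is undefined.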
18. 18. Further Problem <ul><li>f(y_i, β) is often not close to 1 </li></ul><ul><li>so (1/n) Σ f(y_i, β) ≠ (1/n) Σ I(y_i, β) = r </li></ul>
19. 19. Approximate to Related Ratio <ul><li>p > 1 </li></ul>
20. 20. Convert to Unconstrained Optimization <ul><li>Minimizing the Lagrangian </li></ul><ul><li>Improving results </li></ul><ul><ul><li>Mapping x_i to a value between -1 and 1 </li></ul></ul>
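The Lagrangian itself is not shown in this transcript, so the following is only a sketch of the general idea: fold the pull-rate constraint into a single loss that an unconstrained optimizer can minimize. It uses a quadratic penalty and a logistic selection function, both of which are assumptions and may differ from the paper's formulation; `penalized_loss`, `mu`, and `sharpness` are hypothetical names.

```python
import math


def smooth_indicator(y, beta, sharpness=20.0):
    """Logistic stand-in for the selection indicator (assumption, not the paper's f)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (y - beta)))


def penalized_loss(scores, amounts, r, beta=0.5, mu=1.0):
    """Negative captured value plus a penalty for missing the target pull rate r.

    Quadratic-penalty sketch of an unconstrained objective; the paper's
    Lagrangian may take a different form.
    """
    n = len(scores)
    selected = [smooth_indicator(y, beta) for y in scores]
    value = sum(x * s for x, s in zip(amounts, selected)) / n  # soft captured amount
    pull_rate = sum(selected) / n                              # soft fraction pulled
    return -value + mu * (pull_rate - r) ** 2


scores = [0.9, 0.2, 0.1, 0.8, 0.3]
amounts = [1.0, -0.5, -1.0, 0.6, -0.8]  # amounts already mapped into [-1, 1]
print(penalized_loss(scores, amounts, r=0.4))
```

Lower values are better: the first term rewards selecting high-value accounts, and the second term grows as the soft pull rate drifts away from r.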
21. 21. Algorithm <ul><li>Parametric model </li></ul><ul><ul><li>differentiable objective function </li></ul></ul><ul><li>Multilayer perceptron (MLP) network with softmax outputs between 0 and 1 </li></ul><ul><ul><li>single hidden layer </li></ul></ul><ul><li>This paper found that fixing β at 0.5 achieves almost the same results </li></ul>
22. 22. Comparing Methods <ul><li>Classification </li></ul><ul><li>Weighted classification </li></ul><ul><li>Ranking </li></ul><ul><li>Regression </li></ul>
23. 23. Classification <ul><li>Classification </li></ul><ul><ul><li>Trained by mean squared error </li></ul></ul><ul><ul><ul><li>35% of the account balance </li></ul></ul></ul><ul><ul><ul><li>top r% of x </li></ul></ul></ul><ul><ul><li>Imbalanced data sets </li></ul></ul><ul><ul><ul><li>class prior is typically low </li></ul></ul></ul>
24. 24. Weighted Classification & Regression <ul><li>Weighted Classification </li></ul><ul><ul><li>Weighted by x or a function of x </li></ul></ul><ul><ul><ul><li>Use sigmoid function to avoid extreme value of x </li></ul></ul></ul><ul><li>Regression </li></ul><ul><ul><li>Map x to a value between 0 and 1 using the sigmoid function </li></ul></ul>
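The slide does not say how x is scaled before the sigmoid is applied. One plausible reading, sketched here under that assumption, is to standardize the amounts first so that a few very large values do not dominate the weights or regression targets. The function name `squash_amounts` and the z-score scaling are illustrative choices.

```python
import math
from statistics import mean, pstdev


def squash_amounts(amounts):
    """Map monetary amounts into (0, 1) with a sigmoid applied to z-scores.

    The z-score scaling is an assumption; the slides only state that a
    sigmoid is used to avoid extreme values of x.
    """
    center = mean(amounts)
    spread = pstdev(amounts) or 1.0  # fall back to 1.0 when all amounts are equal
    return [1.0 / (1.0 + math.exp(-(x - center) / spread)) for x in amounts]


weights = squash_amounts([0.0, 10.0, 20.0, 10000.0])
print([round(w, 3) for w in weights])
```

The mapping preserves the ordering of the amounts while keeping the largest account from receiving an unbounded weight.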
25. 25. Ranking <ul><li>C. Burges, T. Shaked, et al. Learning to rank using gradient descent. In Proc. of the 22nd Intl. Conf. on Machine Learning, 2005. </li></ul><ul><li>Minimize </li></ul><ul><li>is probability of xi > xj </li></ul><ul><li>Cost function </li></ul>
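The cost function on this slide comes from the cited Burges et al. paper (RankNet), which models the probability that item i should rank above item j as a logistic function of the difference in model outputs and trains with cross entropy. A minimal sketch of that pairwise cost follows; the function and argument names are illustrative, and the numerically stable softplus form is an implementation choice.

```python
import math


def ranknet_pair_cost(o_i: float, o_j: float, target: float = 1.0) -> float:
    """Pairwise cross-entropy cost from RankNet.

    With P_ij = sigmoid(o_i - o_j) and target probability `target` that
    item i ranks above item j, the cross entropy simplifies to
        C = -target * (o_i - o_j) + log(1 + exp(o_i - o_j)).
    """
    diff = o_i - o_j
    # Stable softplus: log(1 + exp(diff)) without overflow for large |diff|.
    softplus = max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return softplus - target * diff


# Item i should rank above item j (target = 1).
print(ranknet_pair_cost(2.0, 0.0))  # correct order, small cost
print(ranknet_pair_cost(0.0, 2.0))  # wrong order, larger cost
```

In the setting of these slides, a pair (i, j) with x_i > x_j would use `target = 1.0`, so the cost pushes y(e_i) above y(e_j).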
26. 26. Predicting Collectibility of Accounts Receivable <ul><li>Accounts receivable </li></ul><ul><ul><li>unpaid customer invoices </li></ul></ul><ul><ul><li>money owed to a company by its customers </li></ul></ul><ul><li>Banks & Federal </li></ul><ul><ul><li>extends credit, offers payment installment plans, or makes assessments </li></ul></ul><ul><li>The collection industry serves an important role in the U.S. economy </li></ul><ul><ul><li>saves American families on average \$331 a year </li></ul></ul>
27. 27. Goal <ul><li>Goal is to develop a generic predictive model which can be used to guide the agents’ collection efforts </li></ul>
28. 28. Problem <ul><li>Identify a high value segment which consists of 11% of the whole </li></ul><ul><ul><li>The 11% is chosen since the payer rate (percentage of paid accounts in the first six months) is 11% </li></ul></ul><ul><li>Data set = 684,600 accounts </li></ul><ul><li>Account history & general demographic info </li></ul>
29. 29. Detail <ul><li>Randomly split into 1:1 training and test set </li></ul><ul><li>Missing values </li></ul><ul><ul><li>continuous variables -> mean + binary column </li></ul></ul><ul><ul><li>categorical variables -> conditional mean + conditional standard deviation </li></ul></ul><ul><li>r = 11% and fix β at 0.5 </li></ul><ul><li>γ = 0.01 and p = 2 </li></ul><ul><li>μ is updated at each iteration by μ_{t+1} = 0.75 μ_t </li></ul><ul><ul><li>t is the iteration index </li></ul></ul>
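The treatment of missing continuous variables ("mean + binary column") can be sketched as follows. Representing a missing value as `None` and the function name `impute_with_indicator` are assumptions for illustration; the slides do not describe the implementation.

```python
from statistics import mean


def impute_with_indicator(values):
    """Fill missing entries with the observed mean and return a 0/1 missingness column.

    Missing entries are assumed to be represented as None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing


balance = [1.0, None, 3.0]
filled, was_missing = impute_with_indicator(balance)
print(filled, was_missing)
```

Keeping the binary column lets the model learn whether missingness itself is predictive, which plain mean imputation would hide.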
30. 30. Pull Rate <ul><li>This figure shows convergence of pull rates achieved by the threshold β during the optimization. Line 1 is for the training set, and Line 2 shows the pull rate change over the test set. </li></ul>
31. 31. Avg. Collection Amount <ul><li>This figure shows the improving average collection amount among the top 11% accounts during the optimization. Line 1 is for the training set, and Line 2 is over the test set. </li></ul>
32. 32. Result <ul><li>Classification model is an ensemble of 25 MLP networks with a modified class prior between 0.02 and 0.5 </li></ul><ul><li>Weighted classification are weighted by </li></ul><ul><li>The average collection amount over the whole portfolio is only \$36 </li></ul>
33. 33. Conclusion <ul><li>This paper proposed a new learning algorithm which focuses on maximizing the monetary measure under a fixed budget constraint. </li></ul>