Beyond Classification and Ranking: Constrained Optimization of the ROI (Lian Yan & Patrick Baldasare, KDD’06)
  1. Beyond Classification and Ranking: Constrained Optimization of the ROI. Lian Yan & Patrick Baldasare, KDD’06
  2. Outline
      • Introduction
          • Example
      • Related Work
      • Algorithm
      • Experiment
      • Conclusion
  3. Introduction
      • Financial Service Industry
      • Data Mining
      • Classification
      • Prediction
  4. Return on Investment (ROI)
      • ROI is the ratio of money gained or lost on an investment relative to the amount of money invested.
      • $50/$1,000 = 5% ROI
      • $20/$100 = 20% ROI
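The two figures on this slide follow directly from the definition. A minimal sketch, where `roi` is a hypothetical helper name that does not come from the slides:

```python
def roi(gain: float, invested: float) -> float:
    """Return on investment: money gained (or lost) divided by money invested."""
    if invested <= 0:
        raise ValueError("invested amount must be positive")
    return gain / invested


print(f"{roi(50, 1000):.0%}")  # $50 gained on $1,000 invested
print(f"{roi(20, 100):.0%}")   # $20 gained on $100 invested
```

A negative `gain` yields a negative ROI, which matches the "gained or lost" wording of the definition.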
  5. Example
      • Used a classifier to predict defection of mutual fund accounts for a major US mutual fund company.
      • Positive samples are defined as those accounts with a net redemption amount of 35% or more of the account balance within a two-month window.
          • net redemption amount = redemption minus purchase
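The labeling rule on this slide can be sketched as a small function. The function name, argument names, and the treatment of zero-balance accounts are assumptions made for illustration; only the 35% threshold and the definition of net redemption come from the slide.

```python
def is_defection(redemption: float, purchase: float, balance: float,
                 threshold: float = 0.35) -> bool:
    """Label an account positive if its net redemption over the window
    is at least `threshold` of the account balance."""
    net_redemption = redemption - purchase  # redemption minus purchase
    # Assumption: accounts with a non-positive balance are never labeled positive.
    return balance > 0 and net_redemption >= threshold * balance


# Amounts are assumed to be totals over the two-month window.
print(is_defection(redemption=5000, purchase=500, balance=10000))
print(is_defection(redemption=3000, purchase=500, balance=10000))
```

The first account redeems a net 45% of its balance and is labeled positive; the second redeems a net 25% and is not.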
  6. Real-world evaluation results
      • Two levels of defection risk
      • Three segments based on account values
  7. Fixed budget
      • The ROI of the project is determined by the amount of redemptions prevented.
      • Classification alone does not enable the mutual fund company to reach out to the accounts with the highest redemption amounts.
  8. Example 2
      • Predict collectability of delinquent accounts receivable for credit card issuers
          • Credit, demographic, and account data
          • Binary class
              • Whether payment is received within a certain period
  9. Difference in Maximization
      • True positive rate among accounts in the collection process
          • → classification accuracy
      • Collectable amount for the collection process
          • → ROI
  10. Budget constraint
      • The budget constraint determines
          • how many mutual fund accounts the customer service team can reach out to every month
          • how many accounts receivable can be placed into a specific collection process
      • The pull rate r is the percentage of accounts to pull out for a specific intervention or collection process.
  11. Finding the Objective Function
      • x is the target monetary measure
          • E.g. collection amount, which directly determines the ROI.
      • Find a function y(e)
          • e is the vector of independent variables
      • Accounts in the top r% by y should correspond to those in the top r% by the target x
  12. Objective Function
      • Maximizing the ROI is formally defined over accounts i = 0, 1, . . . , n − 1, where n is the total number of accounts
  13. Cost-sensitive learning
      • Minimize the cost
      • Cost Matrix
  14. Cost-sensitive learning (cont.)
      • P = set of positive samples
      • N = set of negative samples
      • qi and qj are both posterior probabilities of belonging to the positive class
      • C00 = 0, C11 = 0
      • C01 = x (the target monetary measure)
      • C10 is not a constant and is unknown
  15. Regression & Ranking Models
      • Regression Model
          • y(ei) = xi
          • i = 0, . . . , n − 1
      • Ranking Model
          • y(ei) > y(ej)
          • (i, j) ∈ {(i, j) | xi > xj, i, j = 0, . . . , n − 1}
      • Maximization: Σ xi over the accounts whose y(ei) falls in the top r%
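The quantity being maximized on this slide, the total of x over the accounts ranked in the top r% by y, can be computed as in the sketch below. The function name `top_r_total` and the toy numbers are hypothetical, and r is passed as a fraction rather than a percentage.

```python
import numpy as np


def top_r_total(x, y, r):
    """Sum of the monetary target x over the accounts in the top fraction r by score y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = int(round(r * len(x)))               # number of accounts the budget allows
    top = np.argsort(-y, kind="stable")[:k]  # indices of the k highest scores
    return float(x[top].sum())


x = [100.0, 0.0, 50.0, 10.0]  # toy monetary amounts
y = [0.2, 0.9, 0.8, 0.1]      # toy model scores
print(top_r_total(x, y, r=0.5))  # value captured when ranking by the scores y
print(top_r_total(x, x, r=0.5))  # best achievable value: ranking by x itself
```

In this toy case the scores pick the accounts worth 0 and 50, while the ideal ranking would pick those worth 100 and 50, which is the gap the slides aim to close.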
  16. Constrained Optimization
      • 0 ≤ y ≤ 1
      • Decision threshold β (0 < β < 1)
      • I(yi, β) is nondifferentiable
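The formula on this slide did not survive extraction. The sketch below assumes that I(yi, β) is a hard indicator equal to 1 when yi ≥ β and 0 otherwise, and shows the two quantities such an indicator gives: the selected monetary total and the achieved pull rate. The function name and the choice of ≥ rather than > are assumptions.

```python
import numpy as np


def hard_objective(x, y, beta=0.5):
    """Selected monetary total and pull rate under a hard threshold beta."""
    x = np.asarray(x, dtype=float)
    selected = np.asarray(y, dtype=float) >= beta  # assumed form of I(y_i, beta)
    total = float(x[selected].sum())               # money in the pulled accounts
    pull_rate = float(selected.mean())             # fraction of accounts pulled
    return total, pull_rate


total, pull_rate = hard_objective([100.0, 0.0, 50.0, 10.0], [0.2, 0.9, 0.8, 0.1])
print(total, pull_rate)
```

Because `selected` jumps between 0 and 1, neither quantity has a useful gradient with respect to y, which is the nondifferentiability the slide points out.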
  17. Differentiable Approximation
      • p > 1
      • 0 ≤ γ < 1
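The slide's own approximation, with its parameters p and γ, is not recoverable from the transcript. As a stand-in only, the sketch below uses a generic logistic surrogate for the indicator; `soft_indicator` and `sharpness` are hypothetical names and are not the slide's f, p, or γ.

```python
import numpy as np


def soft_indicator(y, beta=0.5, sharpness=20.0):
    """Smooth stand-in for the hard indicator: close to 1 well above beta,
    close to 0 well below it, and exactly 0.5 at y == beta."""
    y = np.asarray(y, dtype=float)
    return 1.0 / (1.0 + np.exp(-sharpness * (y - beta)))


print(soft_indicator([0.1, 0.45, 0.5, 0.55, 0.9]))
```

Scores near β produce values well away from 0 and 1, so the average of the surrogate generally differs from the true pull rate.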
  18. A Further Problem
      • f(yi, β) is often not close to 1
      • ≠ = r
  19. Approximate to Related Ratio
      • p > 1
  20. Convert to Unconstrained Optimization
      • Minimize the Lagrangian
      • Improve results
          • Map xi to a value between −1 and 1
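The Lagrangian itself is missing from the transcript, so the following is a sketch under stated assumptions rather than the slide's formula. It assumes a linear min-max mapping of x to [−1, 1] and a loss that combines a squared pull-rate violation with a monetary term weighted by μ. Placing μ on the monetary term is a guess, chosen only because the Detail slide shrinks μ over the iterations.

```python
import numpy as np


def scale_to_unit_range(x):
    """Linearly map x to [-1, 1] (assumed mapping; the slide does not name one)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def penalized_loss(x_scaled, f, r, mu):
    """Assumed unconstrained loss: squared pull-rate violation minus
    mu times the average selected (scaled) monetary value."""
    x_scaled = np.asarray(x_scaled, dtype=float)
    f = np.asarray(f, dtype=float)  # soft selection values in [0, 1]
    return float((f.mean() - r) ** 2 - mu * np.mean(x_scaled * f))


print(scale_to_unit_range([0.0, 50.0, 100.0]))
print(penalized_loss([1.0, -1.0], [1.0, 0.0], r=0.5, mu=2.0))
```

Lower values are better: the loss falls when high-value accounts are selected and rises when the selected fraction drifts away from r.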
  21. Algorithm
      • Parametric model
          • Differentiable objective function
      • Multilayer perceptron (MLP) network with softmax outputs between 0 and 1
          • Single hidden layer
      • The paper found that fixing β at 0.5 achieves almost the same results
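A forward pass for a single-hidden-layer network with an output between 0 and 1 might look like the sketch below. The hidden size, the tanh activation, the random weights, and the use of a single logistic output (equivalent to a two-class softmax) are all assumptions; the slides give only the overall architecture.

```python
import numpy as np


def mlp_forward(e, w1, b1, w2, b2):
    """Score accounts with one hidden layer; outputs lie strictly between 0 and 1."""
    hidden = np.tanh(e @ w1 + b1)         # hidden layer (assumed tanh units)
    logit = hidden @ w2 + b2              # one logit per account
    return 1.0 / (1.0 + np.exp(-logit))   # logistic output, i.e. two-class softmax


rng = np.random.default_rng(0)
e = rng.normal(size=(5, 3))               # 5 toy accounts, 3 independent variables
w1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
y = mlp_forward(e, w1, b1, w2, b2)
print(y.shape)
```

Because every step is differentiable, the weights can be trained by gradient descent on any differentiable objective defined on y.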
  22. Comparing Methods
      • Classification
      • Weighted classification
      • Ranking
      • Regression
  23. Classification
      • Classification
          • Trained by mean squared error
              • 35% of the account balance
              • Top r% of x
          • Imbalanced data sets
              • Class prior is typically low
  24. Weighted Classification & Regression
      • Weighted Classification
          • Weighted by x or a function of x
              • Use the sigmoid function to avoid extreme values of x
      • Regression
          • Map x to a value between 0 and 1 using the sigmoid function
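One way to map x into the interval (0, 1) with a sigmoid is sketched below. Standardizing x before applying the sigmoid is an assumption added here so that large dollar amounts do not all saturate at 1; the slides do not say how x is scaled beforehand.

```python
import numpy as np


def squash(x):
    """Map monetary amounts to (0, 1) with a sigmoid, preserving their order."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    # Assumed preprocessing: standardize so the sigmoid is not saturated.
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    return 1.0 / (1.0 + np.exp(-z))


s = squash([0.0, 10.0, 1000.0])
print(s)
```

The extreme value 1000 is pulled toward 1 instead of dominating the scale, so it carries bounded influence as a weight or regression target.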
  25. Ranking
      • C. Burges, T. Shaked, et al. Learning to rank using gradient descent. In Proc. of the 22nd Intl. Conf. on Machine Learning, 2005.
      • Minimize
      • Probability that xi > xj
      • Cost function
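The formulas on this slide are missing, but the cited paper uses a pairwise cross-entropy cost in which the modeled probability that item i ranks above item j is a logistic function of the score difference. A minimal sketch of that cost follows; the function and argument names are hypothetical.

```python
import numpy as np


def ranknet_cost(y_i, y_j, target_prob=1.0):
    """Pairwise cross-entropy cost for one pair of scores.

    With o = y_i - y_j and P = e^o / (1 + e^o), the cross entropy
    -t*log(P) - (1 - t)*log(1 - P) simplifies to -t*o + log(1 + e^o).
    """
    o = y_i - y_j
    return float(-target_prob * o + np.logaddexp(0.0, o))


print(ranknet_cost(2.0, 0.0))  # pair ordered correctly: small cost
print(ranknet_cost(0.0, 2.0))  # pair ordered incorrectly: large cost
```

Setting `target_prob=1.0` corresponds to pairs where xi is known to exceed xj; the cost then decreases smoothly as y_i rises above y_j.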
  26. Predicting Collectibility of Accounts Receivable
      • Accounts receivable
          • Unpaid customer invoices
          • Money owed to a company by its customers
      • Banks & Federal
          • Extends credit, offers payment installment plans, or makes assessments
      • The collection industry serves an important role in the U.S. economy
          • Saves American families on average $331 a year
  27. Goal
      • The goal is to develop a generic predictive model which can be used to guide the agents’ collection efforts
  28. Problem
      • Identify a high-value segment which consists of 11% of the whole
          • The 11% is chosen since the payer rate (percentage of paid accounts in the first six months) is 11%
      • Data set = 684,600 accounts
      • Account history & general demographic info
  29. Detail
      • Randomly split 1:1 into training and test sets
      • Missing values
          • Continuous variables → mean + binary column
          • Categorical variables → conditional mean + conditional standard deviation
      • r = 11% and β fixed at 0.5
      • γ = 0.01 and p = 2
      • μ is updated at each iteration by μ_{t+1} = 0.75 μ_t
          • t is the iteration index
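Two of the details on this slide can be sketched as short functions: filling a continuous variable's missing values with the mean while adding a binary missing-value column, and the geometric decay of μ. The function names and the starting value of μ are hypothetical, since the slides do not state the initial μ.

```python
import numpy as np


def impute_with_flag(col):
    """Replace NaNs with the mean of the observed values and return a 0/1 missing flag."""
    col = np.asarray(col, dtype=float)
    missing = np.isnan(col)
    filled = np.where(missing, np.nanmean(col), col)
    return filled, missing.astype(int)


def mu_schedule(mu0, steps, decay=0.75):
    """Values of mu over the iterations, following mu_{t+1} = decay * mu_t."""
    return [mu0 * decay ** t for t in range(steps)]


filled, flag = impute_with_flag([1.0, np.nan, 3.0])
print(filled, flag)
print(mu_schedule(mu0=1.0, steps=3))  # mu0 = 1.0 is an assumed starting value
```

The binary column lets the model tell imputed values apart from observed ones, which the mean alone would hide.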
  30. Pull Rate
      • This figure shows convergence of the pull rates achieved by the threshold β during the optimization. Line 1 is for the training set, and Line 2 shows the pull rate change over the test set.
  31. Avg. Collection Amount
      • This figure shows the improving average collection amount among the top 11% of accounts during the optimization. Line 1 is for the training set, and Line 2 is for the test set.
  32. Result
      • The classification model is an ensemble of 25 MLP networks with a modified class prior between 0.02 and 0.5
      • Weighted classification samples are weighted by x or a function of x (slide 24)
      • The average collection amount over the whole portfolio is only $36
  33. Conclusion
      • This paper proposed a new learning algorithm which focuses on maximizing the monetary measure under a fixed budget constraint.
