Beyond Classification and Ranking: Constrained Optimization of the ROI


Published in: Economy & Finance, Technology

    1. Beyond Classification and Ranking: Constrained Optimization of the ROI. Lian Yan & Patrick Baldasare, KDD’06
    2. Outline <ul><li>Introduction </li></ul><ul><ul><li>Example </li></ul></ul><ul><li>Related Work </li></ul><ul><li>Algorithm </li></ul><ul><li>Experiment </li></ul><ul><li>Conclusion </li></ul>
    3. Introduction <ul><li>Financial service industry </li></ul><ul><li>Data mining </li></ul><ul><li>Classification </li></ul><ul><li>Prediction </li></ul>
    4. Return on Investment (ROI) <ul><li>ROI is the ratio of the money gained or lost on an investment to the amount of money invested. </li></ul><ul><li>$50 / $1,000 = 5% ROI </li></ul><ul><li>$20 / $100 = 20% ROI </li></ul>
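The ROI ratio is simple enough to check directly. A minimal Python sketch (the helper name `roi` is our own) reproduces the two examples on the slide:

```python
def roi(gain: float, invested: float) -> float:
    """Return on investment: money gained (or lost) divided by the amount invested."""
    if invested <= 0:
        raise ValueError("invested amount must be positive")
    return gain / invested

# The two examples from the slide
print(f"{roi(50, 1_000):.0%}")  # 5%
print(f"{roi(20, 100):.0%}")    # 20%
```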
    5. Example <ul><li>A classifier was used to predict the defection of mutual fund accounts for a major US mutual fund company. </li></ul><ul><li>Positive samples are defined as accounts with a net redemption amount of 35% or more of the account balance within a two-month window. </li></ul><ul><ul><li>net redemption amount = redemption minus purchase </li></ul></ul>
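The labeling rule can be written as a small predicate. This is a sketch assuming the three amounts are already totals over the two-month window; the function name `is_defection` and the example amounts are hypothetical:

```python
def is_defection(redemption: float, purchase: float, balance: float,
                 threshold: float = 0.35) -> bool:
    """Return True when the account counts as a positive (defecting) sample.

    Amounts are assumed to be totals over the two-month window.
    """
    net_redemption = redemption - purchase  # redemption minus purchase
    return net_redemption >= threshold * balance

print(is_defection(redemption=5_000, purchase=500, balance=10_000))    # True: 4,500 >= 3,500
print(is_defection(redemption=4_000, purchase=1_000, balance=10_000))  # False: 3,000 < 3,500
```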
    6. Real-World Evaluation Results <ul><li>Two levels of defection risk </li></ul><ul><li>Three segments based on account value </li></ul>
    7. Fixed Budget <ul><li>The ROI of the project is determined by the amount of redemptions prevented. </li></ul><ul><li>Simply classifying accounts does not enable the mutual fund company to reach out to the accounts with the highest redemption amounts. </li></ul>
    8. Example 2 <ul><li>Predict the collectability of delinquent accounts receivable for credit card issuers </li></ul><ul><ul><li>Credit, demographic, and account data </li></ul></ul><ul><ul><li>Binary class </li></ul></ul><ul><ul><ul><li>whether payment will be received within a certain period </li></ul></ul></ul>
    9. Different Maximization Targets <ul><li>True positive rate among accounts in the collection process </li></ul><ul><ul><li>-> classification accuracy </li></ul></ul><ul><li>Amount collected through the collection process </li></ul><ul><ul><li>-> ROI </li></ul></ul>
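To see why the two targets diverge, the hypothetical sketch below scores two selections of the same size on both criteria. The toy accounts and amounts are invented for illustration and are not taken from the paper:

```python
def selection_metrics(selected, paid, amount):
    """Score one selection of accounts on both maximization targets.

    selected: indices of accounts placed into the collection process
    paid:     1 if the account pays within the period, else 0 (the binary class)
    amount:   amount collected from each account (0 if it never pays)
    """
    hits = sum(paid[i] for i in selected)
    hit_rate = hits / len(selected)               # classification-style target
    collected = sum(amount[i] for i in selected)  # ROI-style target
    return hit_rate, collected

# Hypothetical portfolio of six delinquent accounts
paid = [1, 1, 1, 1, 0, 0]
amount = [50, 40, 900, 600, 0, 0]

print(selection_metrics([0, 1, 4], paid, amount))  # hit rate 2/3, 90 collected
print(selection_metrics([2, 3, 5], paid, amount))  # hit rate 2/3, 1500 collected
```

Both selections contain two paying accounts out of three, so a classification-style metric cannot tell them apart, while the collected amount differs by more than an order of magnitude.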
    10. Budget Constraint <ul><li>The budget constraint determines </li></ul><ul><ul><li>how many mutual fund accounts the customer service team can reach out to every month </li></ul></ul><ul><ul><li>how many accounts receivable can be placed into a specific collection process </li></ul></ul><ul><li>The pull rate r is the percentage of accounts pulled out for a specific intervention/collection process. </li></ul>
    11. Finding the Objective Function <ul><li>Let x be the target monetary measure </li></ul><ul><ul><li>e.g., the collection amount, which directly determines the ROI </li></ul></ul><ul><li>Find a function y(e) </li></ul><ul><ul><li>e is the vector of independent variables </li></ul></ul><ul><li>such that the accounts in the top r% by y correspond to those in the top r% by the target x </li></ul>
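The requirement only concerns which accounts land in the top r%, not the exact values of y. A minimal sketch, assuming the pull count is round(r · n) and using invented toy data, compares a score that reproduces the top set of x with one that does not:

```python
def captured_amount(y, x, r):
    """Total of the monetary target x over the top fraction r of accounts ranked by y."""
    k = round(r * len(y))  # number of accounts the budget allows us to pull
    top = sorted(range(len(y)), key=lambda i: y[i], reverse=True)[:k]
    return sum(x[i] for i in top)

# Hypothetical collection amounts for ten accounts
x = [120, 5, 0, 300, 40, 0, 75, 10, 0, 15]
y_good = [0.6, 0.1, 0.0, 0.9, 0.3, 0.2, 0.7, 0.1, 0.0, 0.2]   # same top 3 as x
y_bad = [0.2, 0.9, 0.0, 0.3, 0.8, 0.0, 0.1, 0.85, 0.0, 0.7]   # picks small payers

r = 0.3
print(captured_amount(x, x, r))       # 495: upper bound, ranking by x itself
print(captured_amount(y_good, x, r))  # 495
print(captured_amount(y_bad, x, r))   # 55
```

`y_good` is nowhere near equal to x, yet it captures the same amount as ranking by x itself, because only membership in the top 3 matters.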
    12. Objective Function <ul><li>Maximizing the ROI can be formally defined as \(\max_{y} \sum_{i:\, y(e_i) \in \text{top } r\%} x_i\) </li></ul><ul><li>where \(i = 0, 1, \ldots, n-1\), and n is the total number of accounts </li></ul>
    13. Cost-Sensitive Learning <ul><li>Minimize the cost </li></ul><ul><li>Cost matrix </li></ul>
    14. Cost-Sensitive Learning (cont.) <ul><li>P = set of positive samples </li></ul><ul><li>N = set of negative samples </li></ul><ul><li>\(q_i\), \(q_j\) are both posterior probabilities of belonging to the positive class </li></ul><ul><li>\(C_{00} = 0\), \(C_{11} = 0\) </li></ul><ul><li>\(C_{01} = x\) (the target monetary measure) </li></ul><ul><li>\(C_{10}\) is unknown and not a constant </li></ul>
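The slide's cost formula did not survive extraction, so this sketch uses the standard expected-cost decision rule for a cost matrix with \(C_{00} = C_{11} = 0\) and \(C_{01} = x_i\); that rule is an assumption, not the paper's exact formulation. Because \(C_{10}\) is unknown, it is passed in as a guess (`c10`, with hypothetical values):

```python
def flag_accounts(q, x, c10):
    """Cost-sensitive decision with C00 = C11 = 0 and C01 = x_i (assumed rule).

    Predicting negative for account i loses x_i with probability q_i
    (expected cost q_i * x_i); predicting positive wastes an intervention
    with probability 1 - q_i (expected cost (1 - q_i) * c10).
    An account is flagged when the first expected cost exceeds the second.
    """
    return [i for i in range(len(q)) if q[i] * x[i] > (1 - q[i]) * c10]

q = [0.10, 0.40, 0.05, 0.70, 0.20]  # hypothetical posterior probabilities
x = [500, 50, 2_000, 30, 300]       # hypothetical monetary targets

for c10 in (10, 50, 200):           # hypothetical guesses for the unknown C10
    print(c10, flag_accounts(q, x, c10))
```

The number of flagged accounts swings from all five to none as the guessed \(C_{10}\) changes, so a cost-sensitive classifier does not by itself respect a fixed pull rate.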
    15. Regression & Ranking Models <ul><li>Regression model </li></ul><ul><ul><li>\(y(e_i) = x_i\) </li></ul></ul><ul><ul><li>\(i = 0, \ldots, n-1\) </li></ul></ul><ul><li>Ranking model </li></ul><ul><ul><li>\(y(e_i) > y(e_j)\) </li></ul></ul><ul><ul><li>for all \((i, j) \in \{(i, j) \mid x_i > x_j,\ i, j = 0, \ldots, n-1\}\) </li></ul></ul>Maximization: \(\max_{y} \sum_{i:\, y(e_i) \in \text{top } r\%} x_i\)
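The ranking condition can be measured as the fraction of correctly ordered pairs. This sketch (the function name is our own, the amounts hypothetical) also illustrates that the ranking model asks for more than the maximization needs:

```python
from itertools import combinations

def pairwise_order_accuracy(y, x):
    """Fraction of pairs with x_i != x_j that the scores y order the same way as x.

    A perfect ranking model, y(e_i) > y(e_j) whenever x_i > x_j, scores 1.0.
    """
    agree = total = 0
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # pairs with equal targets impose no constraint
        total += 1
        if (x[i] - x[j]) * (y[i] - y[j]) > 0:
            agree += 1
    return agree / total if total else 1.0

x = [300, 120, 75, 0]  # hypothetical collection amounts
print(pairwise_order_accuracy([0.9, 0.8, 0.7, 0.1], x))  # 1.0
print(pairwise_order_accuracy([0.9, 0.1, 0.7, 0.8], x))  # 0.5
```

Both score vectors put the $300 account first, so with a pull rate of 25% (one account) they capture the same amount, even though the second orders only half of the pairs correctly.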
    16. Constrained Optimization <ul><li>\(0 \le y \le 1\) </li></ul><ul><li>Decision threshold β (\(0 < \beta < 1\)) </li></ul><ul><li>The indicator \(I(y_i, \beta)\) is nondifferentiable </li></ul>
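Read together with the maximization on slide 15 and the pull-rate condition on slide 18, the constrained problem appears to be \(\max_{y} \sum_i x_i I(y_i, \beta)\) subject to \(\frac{1}{n}\sum_i I(y_i, \beta) = r\), with \(I(y_i, \beta) = 1\) when \(y_i \ge \beta\) and 0 otherwise; this reconstruction is our assumption. The sketch evaluates that objective and the pull rate on invented data, and shows why gradient descent cannot use the hard indicator: its derivative is zero away from β.

```python
def indicator(y_i, beta):
    """Hard decision I(y_i, beta): 1.0 if the account is pulled (assumed y_i >= beta), else 0.0."""
    return 1.0 if y_i >= beta else 0.0

def objective_and_pull_rate(y, x, beta):
    """Return sum_i x_i * I(y_i, beta) and the achieved pull rate (1/n) * sum_i I(y_i, beta)."""
    pulled = [indicator(y_i, beta) for y_i in y]
    objective = sum(x_i * p for x_i, p in zip(x, pulled))
    pull_rate = sum(pulled) / len(y)
    return objective, pull_rate

def central_difference(f, v, h=1e-6):
    """Numerical derivative of f at v."""
    return (f(v + h) - f(v - h)) / (2 * h)

y = [0.9, 0.2, 0.6, 0.4, 0.7]   # hypothetical model outputs in [0, 1]
x = [100, 50, 0, 80, 30]        # hypothetical monetary targets
print(objective_and_pull_rate(y, x, beta=0.5))  # (130.0, 0.6)

# Away from beta the indicator is flat, so gradient-based training gets no signal.
print(central_difference(lambda v: indicator(v, 0.5), 0.3))  # 0.0
```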
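    17. Differentiable Approximation <ul><li>\(I(y_i, \beta)\) is replaced by a differentiable function \(f(y_i, \beta)\) </li></ul><ul><li>\(p > 1\) </li></ul><ul><li>\(0 \le \gamma < 1\) </li></ul>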
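The slide's approximating function \(f(y_i, \beta)\) (with parameters p and γ) is not recoverable here, so this sketch swaps in a generic logistic surrogate with a hypothetical `sharpness` parameter. It is not the paper's f; it only shows the property the approximation needs, a nonzero gradient everywhere:

```python
import math

def soft_indicator(y_i, beta, sharpness=20.0):
    """Logistic stand-in for I(y_i, beta); a larger sharpness gives a steeper step."""
    return 1.0 / (1.0 + math.exp(-sharpness * (y_i - beta)))

def soft_indicator_grad(y_i, beta, sharpness=20.0):
    """Derivative of soft_indicator with respect to y_i."""
    s = soft_indicator(y_i, beta, sharpness)
    return sharpness * s * (1.0 - s)

beta = 0.5
for y_i in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(y_i, round(soft_indicator(y_i, beta), 4), round(soft_indicator_grad(y_i, beta), 4))
```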
    18. A Further Problem <ul><li>\(f(y_i, \beta)\) is often not close to 1 </li></ul><ul><li>\(\frac{1}{n}\sum_i f(y_i, \beta) \neq \frac{1}{n}\sum_i I(y_i, \beta) = r\) </li></ul>
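Any smooth surrogate can show the problem this slide describes. Reusing a logistic stand-in (again not the paper's f, and with invented scores), the accounts above β get values well below 1, so the smooth pull rate drifts away from the hard one:

```python
import math

def soft_indicator(y_i, beta, sharpness=10.0):
    """Logistic stand-in for I(y_i, beta) (not the paper's f)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (y_i - beta)))

beta = 0.5
y = [0.55, 0.6, 0.45, 0.3, 0.52, 0.2, 0.58, 0.1, 0.4, 0.35]  # hypothetical outputs

hard_rate = sum(1 for v in y if v >= beta) / len(y)
soft_rate = sum(soft_indicator(v, beta) for v in y) / len(y)
pulled_soft = [round(soft_indicator(v, beta), 2) for v in y if v >= beta]

print(hard_rate)            # 0.4
print(round(soft_rate, 3))  # noticeably below 0.4
print(pulled_soft)          # pulled accounts get values well below 1
```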
    19. Approximating the Related Ratio <ul><li>\(p > 1\) </li></ul>
    20. Converting to Unconstrained Optimization <ul><li>Minimize the Lagrangian </li></ul><ul><li>Improving the results </li></ul><ul><ul><li>Map \(x_i\) to a value between −1 and 1 </li></ul></ul>
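The slide does not say how \(x_i\) is mapped to [−1, 1]; a linear min-max rescaling is one order-preserving option, assumed here for illustration:

```python
def scale_to_unit_interval(x):
    """Linearly map x onto [-1, 1]: the minimum goes to -1, the maximum to 1."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]  # constant column: no spread to preserve
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in x]

print(scale_to_unit_interval([0, 50, 100, 400]))  # [-1.0, -0.75, -0.5, 1.0]
```

Because this map is increasing and affine, \(\sum_i (a x_i + b) I(y_i, \beta) = a \sum_i x_i I(y_i, \beta) + b\, r n\) whenever the pull-rate constraint holds exactly, so the best selection is unchanged; only the numerical scale seen by the optimizer differs.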
    21. Algorithm <ul><li>Parametric model </li></ul><ul><ul><li>with a differentiable objective function </li></ul></ul><ul><li>Multilayer perceptron (MLP) network with softmax outputs between 0 and 1 </li></ul><ul><ul><li>single hidden layer </li></ul></ul><ul><li>The paper found that fixing β at 0.5 achieves almost the same results </li></ul>
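A forward pass of the network described here fits in a few lines. This sketch assumes tanh hidden units and takes the second softmax output as y; the activation, layer sizes, and weight initialization are our choices, not details from the slides:

```python
import math
import random

def mlp_score(e, W1, b1, W2, b2):
    """Single-hidden-layer MLP with a 2-way softmax output.

    e:  input feature vector (the independent variables)
    W1: hidden weights, one row per hidden unit; b1: hidden biases
    W2: output weights, two rows (one per softmax unit); b2: output biases
    Returns y in (0, 1): the softmax output of the second unit.
    """
    hidden = [math.tanh(sum(w * v for w, v in zip(row, e)) + b)
              for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)

random.seed(0)
n_in, n_hidden = 4, 3  # hypothetical sizes
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)]
b2 = [0.0, 0.0]

y = mlp_score([0.2, -1.0, 0.5, 0.0], W1, b1, W2, b2)
pulled = y >= 0.5  # decision threshold beta fixed at 0.5
print(round(y, 4), pulled)
```

With β fixed at 0.5, an account is pulled when its softmax output reaches 0.5.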
    22. Comparing Methods <ul><li>Classification </li></ul><ul><li>Weighted classification </li></ul><ul><li>Ranking </li></ul><ul><li>Regression </li></ul>
    23. Classification <ul><li>Classification </li></ul><ul><ul><li>Trained by minimizing mean squared error </li></ul></ul><ul><ul><ul><li>Positive class: net redemption of 35% or more of the account balance </li></ul></ul></ul><ul><ul><ul><li>Positive class: top r% of x </li></ul></ul></ul><ul><ul><li>Imbalanced data sets </li></ul></ul><ul><ul><ul><li>The class prior is typically low </li></ul></ul></ul>
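With the top-r% definition, the positive class makes up exactly a fraction r of the data. This sketch (hypothetical data with 11 paying accounts out of 100) shows one consequence of the low prior for squared-error training: predicting 0 everywhere already reaches an MSE equal to the prior.

```python
def top_r_labels(x, r):
    """Label the top fraction r of accounts by x as positive (1), the rest 0."""
    k = round(r * len(x))
    top = set(sorted(range(len(x)), key=lambda i: x[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(x))]

def mean_squared_error(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Hypothetical portfolio: 89 non-payers and 11 payers
x = [0.0] * 89 + [float(v) for v in range(10, 111, 10)]
labels = top_r_labels(x, r=0.11)

print(sum(labels) / len(labels))                   # class prior: 0.11
print(mean_squared_error([0.0] * len(x), labels))  # 0.11 for the all-zero model
```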
    24. Weighted Classification & Regression <ul><li>Weighted classification </li></ul><ul><ul><li>Samples are weighted by x or a function of x </li></ul></ul><ul><ul><ul><li>A sigmoid function is used to avoid extreme values of x </li></ul></ul></ul><ul><li>Regression </li></ul><ul><ul><li>x is mapped to a value between 0 and 1 using the sigmoid function </li></ul></ul>
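A sigmoid bounds the weights or regression targets in (0, 1). Applied to raw dollar amounts it would saturate at 1, so this sketch standardizes x first; that step and the toy amounts are assumptions:

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def squash_amounts(x):
    """Map amounts into (0, 1): standardize (assumed step), then apply the sigmoid."""
    mu = statistics.mean(x)
    sd = statistics.pstdev(x) or 1.0  # guard against a constant column
    return [sigmoid((v - mu) / sd) for v in x]

amounts = [0, 20, 40, 60, 5_000]  # hypothetical amounts with one extreme value
weights = squash_amounts(amounts)
print([round(w, 3) for w in weights])
```

The $5,000 account would outweigh the $20 account 250 to 1 with raw weights; after squashing, the largest weight is less than ten times the smallest, while the order of the accounts is preserved.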
    25. Ranking <ul><li>C. Burges, T. Shaked, et al. Learning to rank using gradient descent. In Proc. of the 22nd Intl. Conf. on Machine Learning, 2005. </li></ul><ul><li>Minimize </li></ul><ul><li>Probability that \(x_i > x_j\) </li></ul><ul><li>Cost function </li></ul>
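The cited RankNet paper defines the pairwise cost as the cross-entropy between a target probability \(\bar{P}_{ij}\) and the model probability \(P_{ij} = 1/(1 + e^{-o_{ij}})\) with \(o_{ij} = y(e_i) - y(e_j)\), which simplifies to \(C_{ij} = -\bar{P}_{ij} o_{ij} + \log(1 + e^{o_{ij}})\). A minimal sketch of that cost, with targets set to 1 or 0 from the order of x (our choice; pairs with equal x are skipped):

```python
import math
from itertools import combinations

def softplus(z):
    """log(1 + e^z), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def ranknet_pair_cost(o_i, o_j, target_p):
    """Cross-entropy cost -P̄ log P - (1 - P̄) log(1 - P) with P = sigmoid(o_i - o_j)."""
    o_ij = o_i - o_j
    return -target_p * o_ij + softplus(o_ij)

def ranknet_cost(y, x):
    """Sum of pair costs over all pairs with different target amounts."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        if x[i] != x[j]:
            target = 1.0 if x[i] > x[j] else 0.0
            total += ranknet_pair_cost(y[i], y[j], target)
    return total

print(ranknet_pair_cost(0.0, 0.0, 1.0))  # log 2 for tied scores
print(ranknet_cost([0.9, 0.5, 0.1], [300, 100, 0]))  # correct order: low cost
print(ranknet_cost([0.1, 0.5, 0.9], [300, 100, 0]))  # reversed order: higher cost
```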
    26. Predicting Collectibility of Accounts Receivable <ul><li>Accounts receivable </li></ul><ul><ul><li>unpaid customer invoices </li></ul></ul><ul><ul><li>money owed to a company by its customers </li></ul></ul><ul><li>Banks and federal agencies </li></ul><ul><ul><li>extend credit, offer payment installment plans, or make assessments </li></ul></ul><ul><li>The collection industry plays an important role in the U.S. economy </li></ul><ul><ul><li>it saves American families $331 a year on average </li></ul></ul>
    27. Goal <ul><li>The goal is to develop a generic predictive model that can guide the agents’ collection efforts </li></ul>
    28. Problem <ul><li>Identify a high-value segment consisting of 11% of the whole portfolio </li></ul><ul><ul><li>11% is chosen because the payer rate (the percentage of accounts paid in the first six months) is 11% </li></ul></ul><ul><li>Data set: 684,600 accounts </li></ul><ul><li>Account history and general demographic information </li></ul>
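To put the 11% segment in concrete terms, integer arithmetic on the slide's figures, together with the 1:1 split on the next slide, gives the number of accounts the model has to pick out. These counts are derived here, not reported in the slides:

```python
total_accounts = 684_600
payer_rate_pct = 11  # r = 11%

segment_size = total_accounts * payer_rate_pct // 100  # top 11% of the whole data set
test_accounts = total_accounts // 2                    # 1:1 random split
test_segment = test_accounts * payer_rate_pct // 100   # accounts pulled on the test half

print(segment_size, test_accounts, test_segment)  # 75306 342300 37653
```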
    29. Details <ul><li>Randomly split 1:1 into training and test sets </li></ul><ul><li>Missing values </li></ul><ul><ul><li>Continuous variables -> mean imputation + a binary missing-value indicator column </li></ul></ul><ul><ul><li>Categorical variables -> conditional mean + conditional standard deviation </li></ul></ul><ul><li>r = 11% and β fixed at 0.5 </li></ul><ul><li>γ = 0.01 and p = 2 </li></ul><ul><li>Across iterations, μ is updated by \(\mu_{t+1} = 0.75\mu_t\) </li></ul><ul><ul><li>where t is the iteration index </li></ul></ul>
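Two of these settings are easy to make concrete. The sketch covers mean imputation with a missing-value indicator for one continuous column and the geometric μ schedule; the starting value μ0 = 1.0 is hypothetical, and the categorical conditional-mean encoding is not shown:

```python
def impute_with_indicator(column):
    """Replace missing values (None) with the column mean and return a 0/1
    indicator column that records where values were missing."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in column]
    missing = [1 if v is None else 0 for v in column]
    return filled, missing

def mu_schedule(mu0, steps, decay=0.75):
    """Values of mu over iterations: mu_{t+1} = decay * mu_t."""
    mus = [mu0]
    for _ in range(steps):
        mus.append(decay * mus[-1])
    return mus

print(impute_with_indicator([1.0, None, 3.0, None]))  # ([1.0, 2.0, 3.0, 2.0], [0, 1, 0, 1])
print(mu_schedule(1.0, 3))                            # [1.0, 0.75, 0.5625, 0.421875]
```

In practice the column mean would be computed on the training half and reused on the test half, so that no test information leaks into training.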
    30. Pull Rate <ul><li>This figure shows the convergence of the pull rate achieved by the threshold β during the optimization. Line 1 is for the training set, and Line 2 shows the change in pull rate over the test set. </li></ul>
    31. Average Collection Amount <ul><li>This figure shows the improvement in the average collection amount among the top 11% of accounts during the optimization. Line 1 is for the training set, and Line 2 is for the test set. </li></ul>
    32. Results <ul><li>The classification model is an ensemble of 25 MLP networks with modified class priors between 0.02 and 0.5 </li></ul><ul><li>The weighted classification models are weighted by </li></ul><ul><li>The average collection amount over the whole portfolio is only $36 </li></ul>
    33. Conclusion <ul><li>This paper proposed a new learning algorithm that focuses on maximizing the monetary measure under a fixed budget constraint. </li></ul>