Consumer credit risk
1.
2.
3. Consumer Credit (Summer 08, MFIN7011, Tang, Consumer Credit Risk)
   Credit products by default risk (low in general):
   Fixed term (lower risk): residential mortgage, retail finance, personal loans
   Revolving (higher risk): overdrafts, credit cards
4.
5.
6.
7. "Never make predictions, especially about the future." (Casey Stengel)
8.
9. Evaluating the credit applicant

   Characteristic                    Judgment    Credit scoring
   Time at present address           +           12
   Time at present job               +           20
   Residential status                -           5
   Debt ratio                        +           21
   Bank reference                    +           28
   Age                               N/A         15
   Income                            -           5
   # of recent inquiries             -           -7
   % of balance to avail. lines      +           10
   # of major derogs.                +           35
   Overall decision                  Accept?     212 -> Accept
   Odds of repayment                             11:1
10.
11.
12. Input Data: FICO Score. Not in the score: demographic data.
13.
14.
15.
16. Statistical Credit Scoring. [Figure: distribution of the number of customers over credit score, for good and bad credits, with the cut-off score marked]
54. Quadratic Programming Problem. Minimize the cost function (the Lagrangian). Here we have introduced non-negative Lagrange multipliers λ_n ≥ 0 that express the constraints.
55.
56.
57.
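The judgmental and credit-scoring columns on slide 9 suggest a simple points-based scorecard. A minimal sketch of the mechanics: the point values are the slide's, but the cut-off of 120 and all names are illustrative assumptions (note the listed points sum to 144, not the slide's overall 212, so the slide's total evidently includes components not reproduced here).

```python
# Point values taken from slide 9; the cut-off is an assumed illustration.
POINTS = {
    "time_at_present_address": 12,
    "time_at_present_job": 20,
    "residential_status": 5,
    "debt_ratio": 21,
    "bank_reference": 28,
    "age": 15,
    "income": 5,
    "recent_inquiries": -7,
    "balance_to_available_lines": 10,
    "major_derogatories": 35,
}

def scorecard_total(points):
    """A scorecard simply sums the points awarded to each characteristic."""
    return sum(points.values())

def decision(total, cutoff=120):
    """Accept the applicant when the total score reaches the cut-off."""
    return "Accept" if total >= cutoff else "Reject"

total = scorecard_total(POINTS)
print(total, decision(total))      # -> 144 Accept
```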
Editor's Notes
will not calculate until another chapter.
Revolving credit: e.g. credit card
Consumer lending exception: Capital One, "a high tech firm that happens to be in the credit card industry" (the founder's own description).
Categorical data. The dataset is usually very big: say, 100,000 individuals.
1. Distributional features of data
The nature of consumer credit lends itself to statistical analysis.
DA (discriminant analysis): e.g. Altman's Z-score model. Normality: the law of large numbers helps for large samples, so the normality assumption matters most for small samples.
Intuitively, the further apart the two distributions are, the better the credit-scoring model can discriminate between good and bad credits. Several measures can gauge the difference between the two distributions, e.g. Wilks' lambda, information value, alpha/beta errors; see pp. 92-99 of the Handbook.
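One simple separation measure in the same spirit (a sketch, not taken from the Handbook) is the Kolmogorov-Smirnov statistic: the largest gap between the cumulative score distributions of good and bad credits. The scores below are made up for illustration.

```python
def ks_statistic(good_scores, bad_scores):
    """Maximum gap between the empirical CDFs of good and bad credits."""
    cuts = sorted(set(good_scores) | set(bad_scores))
    def cdf(scores, c):
        return sum(s <= c for s in scores) / len(scores)
    return max(abs(cdf(good_scores, c) - cdf(bad_scores, c)) for c in cuts)

good = [620, 680, 700, 720, 750]   # illustrative scores of good credits
bad = [480, 520, 560, 590, 700]    # illustrative scores of bad credits
print(ks_statistic(good, bad))     # -> 0.8
```

The further apart the two distributions sit, the closer the statistic gets to 1.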
How are the coefficients derived? See Altman's paper. Altman selects the 5 variables, out of a list of 22, that together do the best overall job of predicting corporate bankruptcy. He uses iterative procedures to evaluate different combinations of eligible variables and selects the profile that does the best job among the alternatives. Two samples of firms, healthy and bankrupt, are used to find the discriminant function.
Heteroscedasticity: use generalized least squares. The error term u takes 2 values: u = y - beta*x with y = 0 or y = 1, so there are two possible values for each i.
Mathematical programming: an objective criterion to optimize, e.g. the proportion of applicants correctly classified, subject to certain discriminating conditions (to discriminate good and bad credits).
Recursive partitioning algorithm: a computerized nonparametric technique based on pattern recognition. The result is a binary classification tree that assigns objects to a priori selected groups; terminal nodes represent the final classification of all objects. From two samples of default and non-default firms, e.g., one can calculate the number misclassified.
Expert systems (also known as artificial intelligence systems): evidence shows their predictive performance is quite poor. These are computer-based decision-support systems with a consultation module (asking users questions until enough evidence has been collected to support a final recommendation), a knowledge base containing static data, algorithms, and rules that tell the system 'what to do if', and a knowledge acquisition and learning module (rules from the lending officer and rules of its own).
Classification methods that are easy to understand (such as regression, nearest-neighbor, and tree-based methods) are much more appealing than those that are essentially black boxes (such as neural networks). But neural networks have advantages too.
Two groups: one defaulted, the other non-defaulted. Logistic regression is hence more robust than linear models. Normality: for binary data, such as 0 or 1, it is hard to justify a normal distribution. This is severe in significance testing.
ML method: we want estimates of the parameters that make the observations most likely to have occurred. This is usually done by specifying the exact distributions of the error terms, i.e. the likelihood function.
The inverse of h is called the logit function: g = log[P/(1 - P)]. Note that the function h guarantees P is between 0 and 1, as required for a probability.
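A minimal sketch of the two functions and the fact that they invert each other (function names h and logit follow the notes; everything else is illustrative):

```python
import math

def h(g):
    """Logistic function: maps any real number g to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-g))

def logit(p):
    """Inverse of h: g = log[P / (1 - P)]."""
    return math.log(p / (1.0 - p))

assert abs(logit(h(1.7)) - 1.7) < 1e-12     # h and logit are inverses
assert 0.0 < h(-30.0) < h(30.0) < 1.0       # h keeps P strictly inside (0, 1)
```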
M is the number of 1's, that is, the number of defaults in the sample. Order the observations like this: first the defaulted ones, then the non-defaulted ones. The probability here is conditional, i.e. conditional on the X's of the sample. The likelihood function serves as the (negative) loss function to be optimized. The last function is the cumulative logistic distribution function.
The simplification is correct: Prod_{i=1..m} P_i * Prod_{i=m+1..n} (1 - P_i) = Prod_{i=1..m} [P_i/(1 - P_i)] * Prod_{i=1..n} (1 - P_i). When doing MLE, use the log version of this formula.
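A quick numerical check of that simplification, a sketch using random probabilities (the first m observations are the defaulters, as in the ordering described above):

```python
import math
import random

random.seed(0)
n, m = 10, 4                                   # m defaults among n observations
P = [random.uniform(0.05, 0.95) for _ in range(n)]

lhs = math.prod(P[:m]) * math.prod(1 - p for p in P[m:])
rhs = math.prod(p / (1 - p) for p in P[:m]) * math.prod(1 - p for p in P)
assert abs(lhs - rhs) < 1e-12                  # both sides of the identity agree
```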
All the observations are either default or non-default (so P is either 1 or 0 for each observation in the sample). Logit = log(odds). Log(odds) makes the odds of default-to-nondefault the negative of the odds of nondefault-to-default (e.g. 2 vs. -2). The logit function is then assumed to be linear. This is not solvable by OLS, because P is either 1 or 0; the coefficients are solved through MLE. If the data set is large enough, we can use the sample relative frequency as an estimate of the true probability at each X level; then we have values of the logits and OLS can be used. This website is a logistic calculator: http://members.aol.com/johnp71/logistic.html
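The relative-frequency route described above can be sketched directly. The grouped counts below are made up for illustration; at each X level the default frequency estimates P, and a one-variable OLS fit is then run on the sample logits:

```python
import math

# Illustrative (made-up) grouped data: (x level, #defaults, #observations).
levels = [(1.0, 10, 100), (2.0, 25, 100), (3.0, 50, 100), (4.0, 75, 100)]

xs, ys = [], []
for x, defaults, total in levels:
    p = defaults / total                  # sample relative frequency of default at x
    xs.append(x)
    ys.append(math.log(p / (1 - p)))      # sample logit

# One-variable OLS fit of logit(P) = a + b*x
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) \
    / sum((xi - xbar) ** 2 for xi in xs)
a = ybar - b * xbar
```

With these particular counts the sample logits happen to lie exactly on a line, so OLS recovers slope b = ln 3 and intercept a = -3 ln 3.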
The regression line would be nonlinear. None of the observations actually fall on the regression line; they all fall on 0 or 1.
1. Explain thin tail implications: extreme events
The probit model uses the probit function, which maps a probability to a numerical value between -inf and +inf. The probit function is the inverse of the normal CDF. Assuming the observations are independent, one solves for the betas using ML estimation. See pp. 100-101 of the Handbook. The other link function often used is the logit function.
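A sketch of the two link functions side by side, using only the standard library (the function names are illustrative):

```python
import math
from statistics import NormalDist

def probit(p):
    """Probit link: inverse of the standard normal CDF, mapping (0, 1) to (-inf, +inf)."""
    return NormalDist().inv_cdf(p)

def logit(p):
    """Logit link: the other common choice."""
    return math.log(p / (1 - p))

# Both links map 0.5 to 0 and are symmetric about it.
assert abs(probit(0.5)) < 1e-9 and abs(logit(0.5)) < 1e-9
assert abs(probit(0.9) + probit(0.1)) < 1e-9
```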
Transfer function: converts the combination of inputs to an output.
Several neural network models: multilayer perceptron (MLP), the best model for credit-scoring purposes; mixture of experts (MOE); radial basis function (RBF); learning vector quantization (LVQ); fuzzy adaptive resonance (FAR). The weighted combination of inputs is called NET.
The overall input g to a neuron is called the potential. The potential g is a linear combination of the inputs X to the neuron, weighted by the weights. The activation function, also called the transfer function, converts the potential to an output f. W0 is called the bias, or the excitation threshold value; it is like the constant in a regression model. One can set x0 = 1 and start the summation at i = 0.
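A single neuron's forward pass can be sketched in a few lines, including the trick of folding the bias into the sum via x0 = 1 (a logistic transfer function is assumed here for illustration):

```python
import math

def neuron_output(x, w, w0):
    """Potential g = w0 + sum(w_i * x_i); output f = transfer(g)."""
    g = w0 + sum(wi * xi for wi, xi in zip(w, x))   # the potential (NET)
    return 1.0 / (1.0 + math.exp(-g))               # logistic transfer function

# Setting x0 = 1 and treating w0 as an ordinary weight gives the same output:
x, w, w0 = [0.2, -0.5], [1.0, 2.0], 0.3
assert abs(neuron_output(x, w, w0) -
           neuron_output([1.0] + x, [w0] + w, 0.0)) < 1e-12
```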
SVMs can perform binary classification (pattern recognition) and real-valued function approximation (regression estimation) tasks.
1. Smiley faces: good credits; stars: bad credits
The largest margin means the strongest differentiating power and the most robust classifier: any additional obligor that falls outside the margin region can be clearly identified; one that falls into the middle is hard to label, but this incurs no labeling error. Support vectors: the X's (the data) associated with the two obligors on the margin are called support vectors.
NW regression: Nadaraya-Watson regression, a nonparametric kernel method. Fan's paper is cited in Atiya's review paper on predicting bankruptcies.
Data get outdated: e.g. income will change, so behavior will change. Application scores: scores computed for applications, i.e. whether to extend a facility based on this score. Behavior scores: computed after the facility has been granted. Probability scores: we want not only the binary result (0 or 1) but also the expected probability. This is important, e.g., for calculating capital and expected returns.
In this example, X has 2 dimensions. W is perpendicular to the line.
Subtract the 1st equation from the 2nd: w'(x1 - x2) = 2, which gives margin = 2/|w|.
Minimizing |w| is equivalent to maximizing the margin. n refers to the number of obligors. The constraints: the two groups must lie on either side of the margin. y identifies default or non-default; note that the labels here differ from those of other methods (say 0 and 1). This is the linear SVM; the data are assumed to be linearly separable (the constraint). The constraint is binding only for support vectors!
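The geometry can be checked on a toy example. Everything below is illustrative (made-up 2-D points, a hyperplane chosen by hand rather than fitted): labels y are +1/-1, every point must satisfy y*(w.x + b) >= 1, and the margin is 2/|w|.

```python
import math

# Toy 2-D data: ((x1, x2), label), with labels +1 (one group) and -1 (the other).
points = [((2.0, 2.0), +1), ((3.0, 3.0), +1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]
w, b = (1.0, 1.0), -3.0       # an assumed separating hyperplane, not fitted

def satisfies_constraints(points, w, b):
    """Check the linear-SVM constraints y * (w.x + b) >= 1 for every obligor."""
    return all(y * (w[0] * x[0] + w[1] * x[1] + b) >= 1 for x, y in points)

margin = 2.0 / math.hypot(*w)  # margin = 2 / |w|
assert satisfies_constraints(points, w, b)
```

For the points where the constraint holds with equality (here (2, 2)), the constraint is binding: those are the support vectors.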
If the constraint is binding (holds with equality, zero slack), then lambda > 0 (economic meaning: the shadow price of the constraint). If it is not binding (positive slack), then lambda = 0.
Using the dual is more convenient. See: http://en.wikipedia.org/wiki/Linear_programming#Duality. Intuition: in the primal problem, push against the constraints to meet the objective; in the dual problem, minimize the objective subject to the constraints (use a graph to show this). Lambda is a price; we want to maximize it so that more constraints are binding (less slack).