Estimating Time To Default

This presentation adapts the methodology of the author's paper, "Survival prediction using gene expression data: a review and comparison", to credit default modelling.

  1. Estimating Time to Default using Survival Analysis tools
     Problems arising due to insufficient samples and censoring
  2. Contents
     • Purpose of the presentation
     • State the problem
     • Formalize the problem and the model
     • Data used and method applied
     • Model assessment
     • Results
     • Conclusions
  3. Purpose of the presentation
     • Adapt the author’s paper “Survival prediction using gene expression data: a review and comparison” to a company default setting
     • Establish the framework in which company defaults can be viewed as a Survival Analysis problem
     • Define the data generated and the model applied
     • Present the findings and the conclusions
  4. Formalizing the task 1: The problem
     • Assume that there is a way to measure many features of a few companies (questionnaire, qualitative research)
     • This results in large quantities of data, but only a few independent samples
     • It would be useful to know which features are relevant before applying statistical methods
     • Standard statistical methods cannot be used with so few samples and so many features
     • Observations might be censored, i.e. the company does not default while observed
  5. Formalizing the task 2: Requirements of a solution
     • Apply a method which can incorporate the censoring of the data
     • It also has to be able to reduce the number of predictors efficiently
     • It should be qualitatively well posed, i.e. it should characterize the relevant features
     • It should be efficient in time and computing power
  6. Formalizing the task 3: Definitions I
     • A company is in default if it fails to meet its obligations
     • A merger or acquisition does not qualify as a default
     • An event occurs if the company defaults or if it gets out of scope for any other reason (including the end of the observation period)
     • Time observed is understood as the time between the beginning of the observation and the occurrence of an event
  7. Formalizing the task 4: Definitions II
     • An event is censored if it is not a default
     • An indicator shows whether an event is censored or not (True/False)
     • For every sample, there is an observation of the (censored) time to default
     • On every sample (i.e. company) the same features are measured (the predictors)
     • A model defines a connection between the predictors and the observations
  8. Data used
     • Since no real-life data is available, this presentation is based on a simulation
     • The simulation assumes 500 features measured on 50 companies
     • Only 50 features are relevant predictors, i.e. the simulated time to default depends only on those 50 features
     • 1/3 of the observed times are censored
     • The simulation was run 1000 times
     (a simulation sketch follows the slide transcript)
  9. Method applied
     • The methodology is the Supervised Principal Component Analysis method
     • It was developed by Bair et al. for similar setups
     • It has the advantage of first finding the relevant predictors (hence the name “supervised”) and then building quick-to-use predictors from them (principal component analysis)
     (a code sketch of the method follows the slide transcript)
  10. Model assessment
     • Assessing the model is based on a measure of success in estimation
     • The most straightforward measures are applied:
        • What proportion of the relevant and of the irrelevant predictors has been characterized as relevant
        • The p-value, measuring the probability of accidental success
     • The results are shown in the following slides
     (an assessment sketch follows the slide transcript)
  11. Selection of each feature (% over the 1000 runs)
     • Relevant features: 90%
     • Irrelevant features with extra noise: 30%
     • Irrelevant features with no extra noise: 15%
  12. Histogram of relevant genes selected (figure)
  13. Histogram of irrelevant genes selected (figure)
  14. P-values of the principal component constructed (figure)
  15. Conclusions
     • The method found the relevant features in most of the cases
     • With a good threshold (selection in over 90% of the runs), the relevant features can be found
     • The p-values show high predictive power
     • The estimates can be used for evaluation
     • Very noisy features mislead the method
     • Human involvement in the evaluation cannot be omitted
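
The slides do not show the simulation code, so the following is only a minimal sketch of one way to generate data matching the setup on slide 8 (500 features on 50 companies, 50 relevant features, roughly a third of the observations censored). The distributions, coefficient sizes and censoring mechanism are assumptions, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_companies, n_features, n_relevant = 50, 500, 50

# 500 features measured on each of the 50 companies
X = rng.normal(size=(n_companies, n_features))

# Only the first 50 features influence the simulated time to default
beta = np.zeros(n_features)
beta[:n_relevant] = rng.uniform(0.1, 0.3, size=n_relevant)

# Exponential default times with a log-linear (Cox-type) hazard
linpred = X @ beta
hazard = np.exp(linpred - linpred.mean())
default_time = rng.exponential(1.0 / hazard)

# Independent censoring, with a scale chosen so that roughly 1/3 of the
# observations end up censored (event: True = default observed, False = censored)
censor_time = rng.exponential(3.0 * np.median(default_time), size=n_companies)
observed_time = np.minimum(default_time, censor_time)
event = default_time <= censor_time

print(f"share of censored observations: {1 - event.mean():.0%}")
```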
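Slide 9's method can be illustrated in a few lines: score each feature with a univariate Cox regression, keep only the features whose score exceeds a threshold, and use the first principal component of the retained features as the predictor. The sketch below reuses `X`, `observed_time` and `event` from the simulation above, relies on the lifelines package for the Cox fits, and uses an arbitrary threshold; it illustrates the idea rather than reproducing the presentation's actual implementation.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def univariate_cox_z(x, time, event):
    """Absolute z-statistic of one feature in a univariate Cox model."""
    df = pd.DataFrame({"x": x, "time": time, "event": event})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    return abs(cph.summary.loc["x", "z"])

# 1. Supervised screening: score every feature against the survival outcome
scores = np.array([univariate_cox_z(X[:, j], observed_time, event)
                   for j in range(X.shape[1])])

# 2. Keep only the features scoring above a threshold (chosen ad hoc here;
#    in practice it would be tuned, e.g. by cross-validation)
selected = scores > 1.5

# 3. First principal component of the selected features
Xs = X[:, selected] - X[:, selected].mean(axis=0)
_, _, vt = np.linalg.svd(Xs, full_matrices=False)
pc1 = Xs @ vt[0]

# 4. Final Cox model with that single component as the predictor
final = CoxPHFitter().fit(
    pd.DataFrame({"pc1": pc1, "time": observed_time, "event": event}),
    duration_col="time", event_col="event")
print(final.summary[["coef", "p"]])
```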
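For the assessment behind slides 10-14, the two measures can be computed per run and aggregated over the repetitions roughly as follows. `simulate` and `supervised_pc` are hypothetical wrappers around the two sketches above (returning the data, the boolean selection mask and the p-value of the final component); the presentation used 1000 runs.

```python
import numpy as np

def assess(selected, p_value, n_relevant=50):
    """Per-run success measures: share of relevant / irrelevant features kept, plus the p-value."""
    return selected[:n_relevant].mean(), selected[n_relevant:].mean(), p_value

results = []
for run in range(1000):
    X, observed_time, event = simulate(seed=run)                   # hypothetical wrapper
    selected, p_value = supervised_pc(X, observed_time, event)     # hypothetical wrapper
    results.append(assess(selected, p_value))

relevant_rate, irrelevant_rate, p_values = map(np.array, zip(*results))
print(f"relevant features selected on average:   {relevant_rate.mean():.0%}")
print(f"irrelevant features selected on average: {irrelevant_rate.mean():.0%}")
print(f"median p-value of the principal component: {np.median(p_values):.3g}")
```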
