# Core deposits sensitivity and survival analysis

This presentation shows financial managers how to predict how long accounts will likely stay open. It is based on a sophisticated statistical probability model.

Published in: Economy & Finance

### Core deposits sensitivity and survival analysis

1. 1. Core Deposits Sensitivity and Survival Analysis. Laura Roberts, Hugh Blaxall, Brian Velligan. Sept. 13, 2010
2. 2. Research question 1a • Question 1a. How can we visually summarize account duration?
3. 3. Research question 1b • Question 1b. How can we predict the length of time a person will keep a core account open (account duration)? We cannot simply compute an average of account durations because we do not know how far into the future current accounts will “survive.” Simple means will produce a negatively biased estimate. • Perhaps we can revise our question to read, “What is the probability a person will keep an account open for a specific period of time?” This new question allows us to use survival analysis, hazard probabilities, and risk functions to get a detailed picture of account duration.
4. 4. Question 1b (continued) • Can we create a model using time and other indicators (e.g., interest rate or change in the interest rate on the account) as predictors of account duration? This is a more sophisticated question for another time … food for thought for now …
5. 5. Question 1c • 1c – How can we summarize typical account duration with a single index? Remember means and other simple average indices will not do the trick because we do not know how long accounts will stay open…
6. 6. What is the best statistical tool for answering each question? • Question 1a – to visually summarize duration use a histogram of the frequency of duration for censored and uncensored accounts. I’ll show you how to do this. • Question 1b - To predict duration, use survival analysis. • Question 1c – for a single index, we can use median lifetime survival probability…more on this…
7. 7. Background for Study • 1. Use a multi-cohort analysis such as accounts opened between 1972 and 1977 and studied until 1984. • 2. Measure duration of each account. • 3. Predict length of time until a given event, in this case, closing of the account. • 4. Some people will not close the account within the time period of observation. These people (accounts) are considered to be censored.
8. 8. S052/II.2(a1): Introducing The Central Concepts In Classical Survival Analysis. Introducing A Dataset On Account Duration. © Willett, Harvard University Graduate School of Education, 03/19/14, S052/II.2(b) – Slide 8

    Let’s examine an example …

    | Dataset | acctdur.txt |
    | --- | --- |
    | Overview | Discrete-time person-level dataset on the duration of accounts opened between 1972 and 1977, and which were followed uninterruptedly until 1984. |
    | Source | Bank records. |
    | Sample size | 3941 accounts. |
    | More Info | Singer & Willett, 2003 |

    Note on the labeling of the discrete time “bins”: We regarded an account’s first year as its zeroth year. If the account was then closed sometime during the following year, it was classified as having a duration of one year and having been closed in “year one.”

    “Multiple Cohort” Sample Design: Be aware that multiple annual cohorts of accounts are pooled together into this single sample: • Cohorts entered the sample sequentially between 1972 and 1977. • All cohorts were followed until the end of 1984.

    Important Distinction You Must Keep In Mind: The two “modern” approaches to survival analysis are distinct in the way that they require duration to be measured: • In discrete-time survival analysis, time is measured in discrete units, such as semesters, years, etc. • In continuous-time survival analysis, time can be measured to any level of precision.

    Research Question: Whether, and if so when, are accounts closed?
9. 9. The Difficult Problem of Censoring!!!

    The dataset is straightforward, containing IDs and length of account, with one small hitch …

    Structure of Dataset

    | Col # | Var Name | Variable Description | Variable Metric/Labels |
    | --- | --- | --- | --- |
    | 1 | ID | Customer identification code. | Integer |
    | 2 | acctopen | Number of years that the account remained open, or until the account was censored in 1984 by the end of the study. | Integer |
    | 3 | CENSOR | Dummy variable to indicate whether an account was censored by the end of data collection in 1984. | Dichotomous variable: 0 = not censored, 1 = censored. |

    There is a problem that is intrinsic to survival data, and is illustrated in this dataset: • The event of importance in the study is “closing an account.” • But not every customer (account) actually experiences this event while being observed by researchers. • We say that they are “censored” by the end of the data collection.

    And, of course, some of the censored accounts will eventually experience the event of interest, but not while the researchers are watching! • Ignoring this can seriously impact estimates of time-to-event. • And, given that time-to-event is the focus of our research question, we need to figure out how to deal with this!
10. 10. Exploring the Account Data: ANSWER TO RESEARCH QUESTION 1a

    One sensible thing you can do is display the frequency with which each account length occurs, in a vertical histogram that includes all the accounts in the sample, both censored and uncensored.

    I created this vertical histogram by typing the frequencies of each account length into an EXCEL spreadsheet. You can create similar vertical histograms in SAS too, but …

    Note the impact of the multi-cohort research design: any account that was opened after 1977 and remained open longer than 6 years is a censored case.
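The frequencies behind such a histogram can be recovered from the EVENT-by-PERIOD life table on slide 14. The sketch below is only an illustration, not the authors' Excel or SAS workflow: it assumes that an account censored after `j` years is one that was in the risk set for period `j`, did not close, and does not appear in the risk set for period `j + 1`. The function name and the text-bar scale are hypothetical.

```python
# Counts transcribed from the life table on slide 14 (periods 1..12).
risk_set = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def censored_counts(risk_set, closed):
    """Accounts that leave the risk set after each period without closing."""
    counts = []
    for j, (n_at_risk, n_closed) in enumerate(zip(risk_set, closed)):
        # Nobody is observed after the last period, so the next risk set is 0.
        next_risk = risk_set[j + 1] if j + 1 < len(risk_set) else 0
        counts.append(n_at_risk - n_closed - next_risk)
    return counts


censored = censored_counts(risk_set, closed)

# Text histogram: '#' marks closed accounts, 'o' marks censored accounts,
# one symbol per 20 accounts (an arbitrary display scale).
for years, (n_closed, n_cens) in enumerate(zip(closed, censored), start=1):
    bar = "#" * round(n_closed / 20) + "o" * round(n_cens / 20)
    print(f"{years:>2} | {bar} ({n_closed} closed, {n_cens} censored)")
```

Under these assumptions no account is censored before year 7, which matches the multi-cohort design described on this slide.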
11. 11. Exploring the Data

    Here are two hopeless strategies for dealing with censoring, while summarizing account duration …

    • If you take the average of the durations of only the uncensored accounts, their mean account duration is 3.73 years, which is a negatively biased estimate of the average population account duration.

    • If we set the durations of the censored accounts to their longest observed account length, the mean account duration for all accounts is 6.31 years. This too is a negatively biased estimate of true duration, even if only one account has lasted longer than the censored duration.
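Both means can be reproduced from the life-table counts on slide 14. This is a minimal sketch under one assumption: every account contributes one person-period record per observed year, so the total observed duration equals the sum of the risk-set sizes. The variable names are illustrative.

```python
# Counts transcribed from the life table on slide 14 (periods 1..12).
risk_set = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]

n_accounts = risk_set[0]
n_closed = sum(closed)

# Strategy 1: average only the accounts that were observed to close.
mean_uncensored = (
    sum(years * n for years, n in enumerate(closed, start=1)) / n_closed
)

# Strategy 2: treat each censored account's observed length as its duration.
# Total observed account-years = total number of person-period records.
mean_all = sum(risk_set) / n_accounts

print(round(mean_uncensored, 2), round(mean_all, 2))  # 3.73 6.31
```

Neither figure accounts for the years that censored accounts stay open after 1984, which is why both understate true duration.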
12. 12. Resolving The Problem Of Censoring By Working In A Person-Period Dataset

    You can resolve these problems by working with your data in a different format: • Re-format the data into a person-period format. • In a person-period dataset, you can estimate a different class of summary statistics that address the “whether” and “when” questions: • Hazard probability. • Survival probability. • Median lifetime.

    | Dataset | Acct dur_PP.txt |
    | --- | --- |
    | Overview | Person-period dataset containing the same information as the Acctdur.txt person dataset, on the duration of accounts that were opened between 1972 and 1977 and followed uninterruptedly until 1984. |
    | Source | Bank records. |
    | Sample size | 24875 annual person-period records. |
    | More Info | Singer & Willett, 2003 |

    Notice that the name of the dataset is different. Here’s a clue to the difference between the person-level and the person-period …
13. 13. Inspecting the Person-Period Dataset

    Person-Level Dataset

    | ID | acctopen | CENSOR |
    | --- | --- | --- |
    | 1 | 1 | Not censored |
    | 2 | 2 | Not censored |
    | 3 | 1 | Not censored |
    | 4 | 1 | Not censored |
    | 5 | 12 | Censored |
    | 6 | 1 | Not censored |
    | 7 | 12 | Censored |
    | 8 | 1 | Not censored |
    | 9 | 2 | Not censored |
    | 10 | 2 | Not censored |
    | 12 | 7 | Not censored |
    | 13 | 12 | Censored |
    | 14 | 1 | Not censored |
    | 15 | 12 | Censored |
    | 16 | 12 | Censored |

    Etc.

    Person-Period Dataset

    | ID | PERIOD | EVENT |
    | --- | --- | --- |
    | 1 | 1 | 1 |
    | 2 | 1 | 0 |
    | 2 | 2 | 1 |
    | 3 | 1 | 1 |
    | 4 | 1 | 1 |
    | 5 | 1 | 0 |
    | 5 | 2 | 0 |
    | 5 | 3 | 0 |
    | 5 | 4 | 0 |
    | 5 | 5 | 0 |
    | 5 | 6 | 0 |
    | 5 | 7 | 0 |
    | 5 | 8 | 0 |
    | 5 | 9 | 0 |
    | 5 | 10 | 0 |
    | 5 | 11 | 0 |
    | 5 | 12 | 0 |
    | 6 | 1 | 1 |
    | 7 | 1 | 0 |
    | 7 | 2 | 0 |
    | 7 | 3 | 0 |
    | 7 | 4 | 0 |
    | 7 | 5 | 0 |
    | 7 | 6 | 0 |
    | 7 | 7 | 0 |
    | 7 | 8 | 0 |
    | 7 | 9 | 0 |
    | 7 | 10 | 0 |
    | 7 | 11 | 0 |
    | 7 | 12 | 0 |

    Etc.

    In a person-period dataset: • Each person has one row of data for each time period. • Their data record continues until the time period in which they either experience the event of interest or are censored.

    Account #2 is not censored, so it experiences the event of interest (i.e., closes) in the 2nd year.

    Account #7 is censored: it never experiences the event of interest (i.e., never closes) in all the 12 years during which accounts are observed.
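The conversion from person-level to person-period format can be sketched in a few lines. This is an illustrative helper, not the authors' code: the function name `to_person_period` is hypothetical, and censoring is passed as a boolean rather than the 0/1 `CENSOR` coding used in the dataset.

```python
def to_person_period(records):
    """Expand (id, acctopen, censored) rows into (id, period, event) rows."""
    rows = []
    for acct_id, acctopen, censored in records:
        for period in range(1, acctopen + 1):
            # EVENT is 1 only in the final period of an account that closed;
            # censored accounts carry EVENT = 0 in every period.
            event = int(period == acctopen and not censored)
            rows.append((acct_id, period, event))
    return rows


# Three accounts taken from the person-level table on this slide.
person_level = [(1, 1, False), (2, 2, False), (5, 12, True)]
person_period = to_person_period(person_level)
for row in person_period:
    print(row)
```

Account 2 yields the rows `(2, 1, 0)` and `(2, 2, 1)`, and account 5 yields twelve rows with `EVENT = 0`, as in the person-period table.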
14. 14. Beginning Of The Life Table Analysis – Estimating The Sample Hazard Probability

    Here’s the Life Table, a Two-Way Contingency Table Analysis of EVENT by PERIOD …

    EVENT (Did Customer close Account in this Time Period?) by PERIOD (Current Time Period), frequency and column percent:

    | EVENT | 1 | 2 | 3 | 4 | 5 | 6 |
    | --- | --- | --- | --- | --- | --- | --- |
    | No close (frequency) | 3485 | 3101 | 2742 | 2447 | 2229 | 2045 |
    | No close (col pct) | 88.43 | 88.98 | 88.42 | 89.24 | 91.09 | 91.75 |
    | Close (frequency) | 456 | 384 | 359 | 295 | 218 | 184 |
    | Close (col pct) | 11.57 | 11.02 | 11.58 | 10.76 | 8.91 | 8.25 |
    | Total | 3941 | 3485 | 3101 | 2742 | 2447 | 2229 |

    | EVENT | 7 | 8 | 9 | 10 | 11 | 12 | Total |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | No close (frequency) | 1922 | 1563 | 1203 | 913 | 632 | 386 | 22668 |
    | No close (col pct) | 93.99 | 95.19 | 95.78 | 96.31 | 97.53 | 98.72 | |
    | Close (frequency) | 123 | 79 | 53 | 35 | 16 | 5 | 2207 |
    | Close (col pct) | 6.01 | 4.81 | 4.22 | 3.69 | 2.47 | 1.28 | |
    | Total | 2045 | 1642 | 1256 | 948 | 648 | 391 | 24875 |

    We can use these frequencies to estimate a hazard probability that describes the “risk of closing” in each time period. Hazard probability is the (conditional) probability that an account will experience the event of importance (i.e., close) in a particular time period, given that it has “survived” up until this period.

    In discrete time period #1, for instance: • There are 3941 accounts “at risk of closing.” • Of this “risk set of accounts,” 456 were observed to close. • Hence, the probability that an account will close in this period, given that it entered it, is (456/3941), or 0.1157. • So, the sample hazard probability in discrete time period #1 is $\hat{h}(t_1) = 0.1157$.
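The same division can be applied to every column of the life table. A minimal sketch, using the counts from this slide (the list names are illustrative):

```python
# Risk-set sizes ("Total" row) and closings ("Close" row), periods 1..12.
risk_set = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
closed = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]

# Sample hazard: closings in a period divided by accounts at risk in it.
hazard = [n_closed / n_at_risk for n_closed, n_at_risk in zip(closed, risk_set)]

for period, h in enumerate(hazard, start=1):
    print(f"period {period:>2}: h = {h:.4f}")
```

The first value is 456/3941, which rounds to the 0.1157 quoted above; the remaining values agree with the "Close" column percentages divided by 100.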
15. 15. Continuing The Life Table Analysis – Estimating Sample Hazard Probability

    EVENT (Was account closed in this Time Period?) by PERIOD (Current Time Period), frequency and column percent:

    | EVENT | 1 | 2 | 3 | 4 | 5 | 6 |
    | --- | --- | --- | --- | --- | --- | --- |
    | No (frequency) | 3485 | 3101 | 2742 | 2447 | 2229 | 2045 |
    | No (col pct) | 88.43 | 88.98 | 88.42 | 89.24 | 91.09 | 91.75 |
    | Yes (frequency) | 456 | 384 | 359 | 295 | 218 | 184 |
    | Yes (col pct) | 11.57 | 11.02 | 11.58 | 10.76 | 8.91 | 8.25 |
    | Total | 3941 | 3485 | 3101 | 2742 | 2447 | 2229 |

    | EVENT | 7 | 8 | 9 | 10 | 11 | 12 | Total |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | No (frequency) | 1922 | 1563 | 1203 | 913 | 632 | 386 | 22668 |
    | No (col pct) | 93.99 | 95.19 | 95.78 | 96.31 | 97.53 | 98.72 | |
    | Yes (frequency) | 123 | 79 | 53 | 35 | 16 | 5 | 2207 |
    | Yes (col pct) | 6.01 | 4.81 | 4.22 | 3.69 | 2.47 | 1.28 | |
    | Total | 2045 | 1642 | 1256 | 948 | 648 | 391 | 24875 |

    And the sample hazard probabilities for discrete time periods #4, #5, #6 and #7 …

    Something different is happening here in the Life Table. What is it? Why is it occurring? Is it a problem?
16. 16. Plotting the Sample Hazard Function

    Collect the sample hazard probabilities together and plot them as a sample hazard function …

    Conclusion? The hazard probability gives the risk of closing in each year after an account is opened.
17. 17. Continuing The Life Table Analysis – Estimating Sample Survival Probability

    Once you have the sample hazard probabilities, you can cumulate them to get sample survival probabilities …

    | Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t) |
    | --- | --- | --- |
    | 0 | | 1.0000 |
    | 1 | 0.1157 | 0.8843 |
    | 2 | 0.1102 | 0.7869 |
    | 3 | 0.1158 | 0.6958 |
    | 4 | 0.1076 | 0.6209 |
    | 5 | 0.0891 | 0.5656 |
    | 6 | 0.0825 | 0.5189 |
    | 7 | 0.0601 | 0.4877 |
    | 8 | 0.0481 | 0.4642 |
    | 9 | 0.0422 | 0.4446 |
    | 10 | 0.0369 | 0.4282 |
    | 11 | 0.0247 | 0.4177 |
    | 12 | 0.0128 | 0.4123 |

    Sample Survival Probability: Survival probability in any time period is the probability of “surviving” beyond that period (i.e., the probability of not experiencing the event of interest until after the period).

    Here, all accounts survived the 0th time period, so the estimated sample survival probability in the 0th period is 1.000.

    The estimated hazard probability suggests that a proportion of 0.1157 of accounts in the 1st period risk set will “die” in the 1st period (i.e., close).

    • Because a proportion of 0.1157 of the risk set will “die” in the 1st period, we know that (1 - 0.1157), or 0.8843, of the 1st period risk set will survive. • In other words, 0.8843 of the entering “1.0000” will remain “alive” beyond the 1st time period (and will therefore be potentially available to close at some later time). • The sample survival probability in the 1st time period is therefore 0.8843 × 1.000, or: $\hat{S}(t_1) = 0.8843$
18. 18. Continuing The Life Table Analysis – Estimating Sample Survival Probability

    And, the estimated survival probability in discrete time period #2 …

    | Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t) |
    | --- | --- | --- |
    | 0 | | 1.0000 |
    | 1 | 0.1157 | 0.8843 |
    | 2 | 0.1102 | 0.7869 |
    | 3 | 0.1158 | 0.6958 |
    | 4 | 0.1076 | 0.6209 |
    | 5 | 0.0891 | 0.5656 |
    | 6 | 0.0825 | 0.5189 |
    | 7 | 0.0601 | 0.4877 |
    | 8 | 0.0481 | 0.4642 |
    | 9 | 0.0422 | 0.4446 |
    | 10 | 0.0369 | 0.4282 |
    | 11 | 0.0247 | 0.4177 |
    | 12 | 0.0128 | 0.4123 |

    Here, according to the estimated sample survival probability, a proportion of 0.8843 of the accounts survived the 1st time period.

    The estimated hazard probability suggests that a proportion of 0.1102 of accounts in the 2nd period risk set will “die” in the 2nd period (i.e., close).

    • Because a proportion of 0.1102 of the risk set will “die” in the 2nd period, we know that (1 - 0.1102), or 0.8898, of the 2nd period risk set will survive. • In other words, a proportion of 0.8898 of the entering “0.8843” will remain “alive” beyond the 2nd time period (and be potentially available to close later). • The sample survival probability in the 2nd time period is therefore 0.8898 × 0.8843, or: $\hat{S}(t_2) = 0.7869$
19. 19. Continuing The Life Table Analysis – Estimating Sample Survival Probability

    And, the estimated survival probability in discrete time period #3 … etc.

    | Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t) |
    | --- | --- | --- |
    | 0 | | 1.0000 |
    | 1 | 0.1157 | 0.8843 |
    | 2 | 0.1102 | 0.7869 |
    | 3 | 0.1158 | 0.6958 |
    | 4 | 0.1076 | 0.6209 |
    | 5 | 0.0891 | 0.5656 |
    | 6 | 0.0825 | 0.5189 |
    | 7 | 0.0601 | 0.4877 |
    | 8 | 0.0481 | 0.4642 |
    | 9 | 0.0422 | 0.4446 |
    | 10 | 0.0369 | 0.4282 |
    | 11 | 0.0247 | 0.4177 |
    | 12 | 0.0128 | 0.4123 |

    Here, according to the estimated sample survival probability, a proportion of 0.7869 of the accounts survived the 2nd time period.

    The estimated hazard probability suggests that a proportion of 0.1158 of accounts in the 3rd period risk set will “die” in the 3rd period (i.e., close).

    • Because a proportion of 0.1158 of the risk set will “die” in the 3rd period, we know that (1 - 0.1158), or 0.8842, of the 3rd period risk set will survive. • In other words, a proportion of 0.8842 of the entering “0.7869” will remain “alive” beyond the 3rd time period (and be potentially available to close later). • The sample survival probability in the 3rd time period is therefore 0.8842 × 0.7869, or: $\hat{S}(t_3) = 0.6958$
20. 20. Simple Rule For Estimating Sample Survival Probability

    As a general principle, the estimated survivor probability in any time period j can be found by substituting into a simple little rule …

    | Time Period | Sample Hazard Probability h(t) | Sample Survival Probability S(t) |
    | --- | --- | --- |
    | $t_{j-1}$ | | $\hat{S}(t_{j-1})$ |
    | $t_j$ | $\hat{h}(t_j)$ | $\hat{S}(t_j)$ |

    So, in general, in any time period j: $\hat{S}(t_j) = [1 - \hat{h}(t_j)]\,\hat{S}(t_{j-1})$
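The rule on this slide is a running product, which the following sketch applies to the rounded hazard probabilities from the table on slide 17. The function name is hypothetical. Because the inputs are already rounded to four decimals, the results can differ from the slide's survival column in the last digit.

```python
def survival_from_hazard(hazard):
    """Apply S(t_j) = [1 - h(t_j)] * S(t_{j-1}), starting from S(t_0) = 1."""
    survival = [1.0]
    for h in hazard:
        survival.append((1.0 - h) * survival[-1])
    return survival


# Sample hazard probabilities for periods 1..12, as printed on slide 17.
hazard = [0.1157, 0.1102, 0.1158, 0.1076, 0.0891, 0.0825,
          0.0601, 0.0481, 0.0422, 0.0369, 0.0247, 0.0128]

survival = survival_from_hazard(hazard)
for period, s in enumerate(survival):
    print(f"period {period:>2}: S = {s:.4f}")
```

Starting the list at 1.0 mirrors the slide's convention that every account survives the 0th period.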
21. 21. Plotting the Sample Survivor Function And Estimating Median Lifetime Survivor Probability

    Plotting the sample survival probabilities against time period provides the sample survivor function.

    Typical monotonically decreasing survivor function … The estimated median lifetime is 6.6 years, the point at which the survival probability reaches 0.5, i.e., half of the accounts are “still alive.”
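One way to arrive at a figure like 6.6 is linear interpolation between the two periods whose survival probabilities bracket 0.5. The slides do not say how the value was obtained, so the method and the function name below are assumptions, shown only as a sketch.

```python
def median_lifetime(survival):
    """Interpolate the time at which S(t) crosses 0.5.

    survival[j] holds S(t_j), with survival[0] == 1.0 for period 0.
    """
    for m in range(len(survival) - 1):
        s_m, s_next = survival[m], survival[m + 1]
        if s_m >= 0.5 > s_next:
            # Fraction of the way from period m to m + 1 where S hits 0.5.
            return m + (s_m - 0.5) / (s_m - s_next)
    return None  # S(t) never falls below 0.5 within the observed periods


# Sample survival probabilities for periods 0..12, as printed on slide 17.
survival = [1.0000, 0.8843, 0.7869, 0.6958, 0.6209, 0.5656, 0.5189,
            0.4877, 0.4642, 0.4446, 0.4282, 0.4177, 0.4123]

print(round(median_lifetime(survival), 1))  # 6.6
```

Here the crossing lies between period 6 (0.5189) and period 7 (0.4877), giving 6 + 0.0189/0.0312, or about 6.6 years.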
22. 22. Research Question 2 for Next Time… • Question 2. How can we predict core deposit interest rates? • A. from prime interest rate? • B. from market interest rate? • 1. Can we predict core deposit interest rate from 3 month LIBOR (one index of market interest rate)? • 2. from lagged LIBOR indices? • 3. Are there other market interest rate indices we want to include to predict core deposit interest rate?