Housing demand in New Zealand is growing steadily, and there is considerable concern about whether the supply of houses is adequate to meet that demand. The aim of the research was to examine which statistical model most accurately estimates the time to completion of a dwelling.
• Time-to-event analysis was adopted, using the Kaplan-Meier curve to trace the trajectory of time to completion of a dwelling.
• The probability of completion at any point in time was estimated from the cumulative probability of completion over the preceding time intervals.
• Kaplan-Meier analysis made it possible to understand time to completion across different lengths of time, accounting for both completed and cancelled dwellings.
• A transition probability matrix was then built from the dwellings consented and the dwellings completed each quarter. This aggregated matrix helped forecast the approximate time to completion for recently consented dwellings.
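The deck does not spell out how the transition probability matrix was built. One possible reading, sketched below under that assumption, is a cohort-by-lag matrix: each row is a consent quarter and each column is the share of that quarter's consents completed a given number of quarters later. Averaging the rows of cohorts that have been observed long enough gives an aggregated lag distribution that can be applied to recent consents. The data, constants and function names are hypothetical.

```python
from collections import Counter

# Hypothetical example data (not from the study): one tuple per dwelling,
# (consent quarter index, completion quarter index, or None if not completed).
records = [
    (0, 2), (0, 3), (0, 3), (0, None),
    (1, 3), (1, 4), (1, 5), (1, 5),
    (2, 4), (2, 5), (2, 6), (2, None),
    (3, 5), (3, None),
]
LAST_OBSERVED_QUARTER = 6  # hypothetical: most recent quarter with data
MAX_LAG = 4                # hypothetical: longest lag (in quarters) tracked


def cohort_lag_matrix(records, max_lag):
    """Row per consent quarter; column j is the share of that quarter's
    consents completed exactly j quarters later (j = 0..max_lag)."""
    consented = Counter(c for c, _ in records)
    completed = Counter(
        (c, done - c) for c, done in records
        if done is not None and done - c <= max_lag
    )
    return {
        q: [completed[(q, j)] / consented[q] for j in range(max_lag + 1)]
        for q in sorted(consented)
    }


def aggregate(matrix, last_observed, max_lag):
    """Average the rows of cohorts observed for the full max_lag quarters,
    so recent cohorts with unfinished dwellings do not bias the estimate."""
    rows = [row for q, row in matrix.items() if q + max_lag <= last_observed]
    return [sum(col) / len(rows) for col in zip(*rows)]


def forecast_completions(n_consented, lag_probs):
    """Expected completions j quarters after consent for a new cohort."""
    return [n_consented * p for p in lag_probs]


matrix = cohort_lag_matrix(records, MAX_LAG)
lag_probs = aggregate(matrix, LAST_OBSERVED_QUARTER, MAX_LAG)
print(forecast_completions(120, lag_probs))
```

Here the quarter-3 cohort is left out of the aggregate because it has not yet been observed for `MAX_LAG` quarters; including it would count its unfinished dwellings as if they would never complete.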
Time to event analysis
1. Time to Event Analysis
Estimating time to complete a dwelling
2.
• People are interested in housing stock
• “You can’t live in a consent” (a common saying)
• People were not aware of the term ‘consent’, and there was an information vacuum and a need for the data
[Images: building plan; dwelling under construction]
3. Knowing the data
Data Source:
• Building consents issued
• Quarterly Building Activity Survey
• Christchurch City Council final inspection and code compliance data
Dataset:
• The data comprises 490,736 rows and 26 variables
• Consents were issued between 1990Q1 and 2017Q2
• The data are of mixed types: integers, factors (categorical variables) and dates
Limitations:
• The Quarterly Building Activity Survey covers only high-value dwellings, so data for lower-value dwelling projects are not included
4. Lag to completion for new dwellings, by calendar year completed
[Chart: lag to completion (y-axis, 0 to 13) by calendar year completed, 1998 to 2017]
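The slide does not say how the lag was computed or which summary statistic is plotted. A minimal sketch, assuming the lag is measured in whole calendar months from consent to completion and summarised by the median for each completion year; the records and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical dwelling records (not from the study): (consent date, completion date).
dwellings = [
    (date(2014, 3, 1), date(2015, 1, 15)),
    (date(2014, 6, 1), date(2015, 9, 30)),
    (date(2015, 2, 1), date(2015, 12, 1)),
    (date(2015, 5, 1), date(2016, 11, 1)),
    (date(2016, 1, 1), date(2016, 10, 1)),
]


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def median_lag_by_completion_year(records):
    """Median consent-to-completion lag (months), grouped by completion year."""
    lags = defaultdict(list)
    for consented, completed in records:
        lags[completed.year].append(months_between(consented, completed))
    return {year: median(values) for year, values in sorted(lags.items())}


print(median_lag_by_completion_year(dwellings))  # {2015: 10, 2016: 13.5}
```

Grouping by completion year matches the slide's "by calendar year completed". This view only includes dwellings that have already been completed, so it says nothing about dwellings still under construction or cancelled.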
Summary
7. Need for a better model?
Why not compare mean time-to-event between groups using a t-test or linear regression?
- Ignores censoring
Why not compare the proportion of events in your groups using logistic regression?
- Ignores time
Solution: time-to-event analysis using the Kaplan-Meier curve, because it describes time to completion across different lengths of time while accounting for both completed and cancelled dwellings.
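The Kaplan-Meier (product-limit) estimator formalises the earlier point that the probability of completion at any time is built up from the preceding intervals. A sketch in standard notation, assuming cancelled dwellings are treated as censored observations (the deck says they are considered but not exactly how), with a small hypothetical worked example:

```latex
\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right),
\qquad
\hat{F}(t) = 1 - \hat{S}(t)

% t_i: distinct lags at which at least one dwelling was completed
% d_i: number of dwellings completed at lag t_i
% n_i: number of dwellings still at risk (not yet completed or cancelled) just before t_i
% \hat{F}(t): estimated probability that a dwelling is completed by lag t

% Hypothetical example: 10 dwellings; 2 completed at 6 months,
% 1 cancelled (censored) at 9 months, 3 completed at 12 months.
\hat{S}(6)  = 1 - \tfrac{2}{10} = 0.8
\hat{S}(12) = 0.8 \times \left(1 - \tfrac{3}{7}\right) = \tfrac{3.2}{7} \approx 0.457
\hat{F}(12) \approx 0.543
```

Counting the cancelled dwelling as never completing would give a naive 5/10 = 0.5 by 12 months. The estimator instead removes it from the risk set after 9 months ($n_i = 7$ at 12 months), which is exactly the censoring that a simple comparison of means ignores.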
8. Probability of completion by type of dwelling using the Kaplan-Meier curve
[Chart: probability of completion (y-axis, 0.00 to 1.00) against lag in months (x-axis, 6 to 45) for houses, apartments, retirement villages and townhouses]
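Curves like these can be produced by running the product-limit estimator separately within each dwelling type. A minimal standard-library sketch, assuming each record holds a dwelling type, a lag in months and a completed flag, with cancellations treated as censored. The data, type labels and function names are hypothetical, and in practice a library such as lifelines (`KaplanMeierFitter`) would normally be used instead.

```python
from collections import defaultdict

# Hypothetical records (not from the study): (dwelling type, lag in months, completed?).
# completed=False marks a cancelled dwelling, treated here as censored.
records = [
    ("House", 9, True), ("House", 12, True), ("House", 12, True), ("House", 15, False),
    ("Apartment", 18, True), ("Apartment", 24, False),
    ("Apartment", 30, True), ("Apartment", 36, True),
]


def km_completion_curve(lags, completed):
    """Kaplan-Meier estimate of P(completed by t) at each distinct completion lag."""
    at_risk = len(lags)
    surv = 1.0
    curve = []
    for t in sorted(set(lags)):
        events = sum(1 for lag, done in zip(lags, completed) if lag == t and done)
        leaving = sum(1 for lag in lags if lag == t)  # completions plus censorings at t
        if events:
            surv *= 1 - events / at_risk
            curve.append((t, 1 - surv))
        at_risk -= leaving
    return curve


def curves_by_type(records):
    """Fit one completion curve per dwelling type."""
    groups = defaultdict(lambda: ([], []))
    for kind, lag, done in records:
        groups[kind][0].append(lag)
        groups[kind][1].append(done)
    return {kind: km_completion_curve(lags, done) for kind, (lags, done) in groups.items()}


def median_completion_time(curve):
    """First lag at which the estimated completion probability reaches 0.5."""
    return next((t for t, p in curve if p >= 0.5), None)


for kind, curve in curves_by_type(records).items():
    print(kind, median_completion_time(curve), curve)
```

With this toy data the estimated median time to completion is 12 months for houses and 30 months for apartments. Dwellings censored at a given lag are still counted as at risk at that lag, the usual convention for ties.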
9. Challenges and the way ahead
• Further analysis is needed to plot survival curves for the recent time period and to estimate the time to completion of a dwelling from them.
• Obtaining and interpreting completion values for lower-value projects will help achieve more accurate predictions.
10. Learning Journey
• Research is the key to solving data science problems.
• Nothing is particularly hard if we divide it into small tasks.
• "Alone we can do so little, together we can do so much." - Helen Keller