3. two hypotheses
1. Changes in the number of issued liquor licenses precede rental price increases
2. Changes in the number of taxi pickups and drop-offs precede rental price increases
4. motivation
1. Identify regions ripe for investment
2. Identify areas that may undergo gentrification
• Give policy makers an early chance to implement rent controls
5. the data
Rental Unit Prices
• Published by Zillow
Liquor Licenses
• NY Gov’t Liquor Authority Database*
Taxi pickups/drop-offs
• Published by NY City Gov’t
• ~30 GB / year
* Databases were, unfortunately, harmed in the creation of this project
10. data pipeline
Figure: Raw Data (roughly oscillatory trend) and Processed Data (the prediction target)
Model Features (lagged 3 to 12 months):
• Monthly changes in number of liquor licenses issued
• Monthly changes in taxi pickups and drop-offs
• Historical changes in price
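The feature construction above can be sketched as follows, assuming each series has already been reduced to one value per month for a single zip code and aligned month by month. The function names `monthly_changes` and `lagged_rows` are hypothetical; the slides do not show the project's code. Each row pairs the rent change at month t with feature changes from months t-3 through t-12.

```python
from typing import Dict, List, Sequence


def monthly_changes(series: Sequence[float]) -> List[float]:
    """Month-over-month changes (first differences); output is one element shorter."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def lagged_rows(target: Sequence[float],
                features: Dict[str, Sequence[float]],
                min_lag: int = 3,
                max_lag: int = 12) -> List[Dict[str, float]]:
    """One row per month t: the target at t plus each feature at lags min_lag..max_lag.

    Assumes every feature series is aligned month by month with `target`.
    The first `max_lag` months are skipped because their lagged values do not exist.
    """
    rows = []
    for t in range(max_lag, len(target)):
        row = {"target": target[t]}
        for name, values in features.items():
            for lag in range(min_lag, max_lag + 1):
                row[f"{name}_lag{lag}"] = values[t - lag]
        rows.append(row)
    return rows
```

Passing the rent changes themselves as one of the features (for example `{"rent": d_rent, "liquor": ..., "taxi": ...}`) covers the "historical changes in price" feature without leaking the target, since the shortest lag is 3 months.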
11. pipeline
Liquor Data, Taxi Data, Rental Price Data → Aggregate → Synchronize → Trend → Train/optimize Models
• Vector Autoregression (VAR)
• Random Forest (Full RF: all features)
• Random Forest without liquor and taxi features (RF w/o L+T)
Goal 1: Test the taxi and liquor license hypotheses
Goal 2: Accurately forecast monthly rent changes
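A minimal sketch of the Aggregate and Synchronize steps, assuming the raw liquor and taxi records can be reduced to (zip code, date) pairs and that rent is available as one value per month per zip code. The record format and the helper names `aggregate_monthly` and `synchronize` are hypothetical. Treating months with no events as zero counts is also an assumption; the slides do not say how gaps were handled.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple

Month = Tuple[int, int]  # (year, month)


def aggregate_monthly(events: Iterable[Tuple[str, date]]) -> Dict[str, Dict[Month, int]]:
    """Count events (e.g. licenses issued, taxi pickups) per zip code per calendar month."""
    counts: Counter = Counter()
    for zip_code, day in events:
        counts[(zip_code, (day.year, day.month))] += 1
    by_zip: Dict[str, Dict[Month, int]] = {}
    for (zip_code, month), n in counts.items():
        by_zip.setdefault(zip_code, {})[month] = n
    return by_zip


def synchronize(rent: Dict[Month, float],
                counts: Dict[str, Dict[Month, int]]
                ) -> Tuple[List[Month], List[float], Dict[str, List[int]]]:
    """Align count series to the months covered by one zip code's rent series.

    Months with no recorded events are treated as zero counts (an assumption).
    """
    months = sorted(rent)
    rent_series = [rent[m] for m in months]
    count_series = {name: [c.get(m, 0) for m in months] for name, c in counts.items()}
    return months, rent_series, count_series
```

Using the rent calendar as the master timeline keeps every model input the same length, which is what the later differencing and lagging steps expect.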
12. are the models accurate?
VAR found no statistically significant relationship between taxi pickups/drop-offs or liquor licenses and rent increases
Figure: Model Forecasts for a Single Zip Code (Random Forests, VAR, and Target; training and forecast periods)
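The slides do not say which test the VAR analysis used. One common way to check whether one series "precedes" another is a Granger-causality F-test: regress y on its own lags, then on its own lags plus lags of x, and ask whether the extra lags cut the residual sum of squares more than chance would. This sketch computes only the F-statistic with NumPy; the helper names are hypothetical. Turning it into a p-value needs the F(p, n - k) distribution (for example `scipy.stats.f.sf`), and statsmodels offers ready-made versions (`grangercausalitytests`, `VARResults.test_causality`).

```python
import numpy as np


def lag_matrix(x: np.ndarray, p: int) -> np.ndarray:
    """Columns x[t-1], ..., x[t-p] for t = p .. len(x)-1."""
    n = len(x)
    return np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])


def granger_f_stat(y, x, p: int) -> float:
    """F-statistic for H0: lags 1..p of x do not help predict y beyond y's own p lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lag_matrix(y, p)])      # y on its own lags
    unrestricted = np.hstack([restricted, lag_matrix(x, p)])   # ... plus lags of x

    def rss(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]   # residual degrees of freedom
    return ((rss_r - rss_u) / p) / (rss_u / dof)
```

Under this test, a small F-statistic (large p-value) for the liquor and taxi lags is what "no statistically significant relationship" would look like.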
13. how far can the models predict?
Figure: Forecast Accuracy for All NY Zip Codes
• Hoped to observe Full RF outperform RF w/o L+T on long-term predictions
• Did not observe this
• Adding taxi and liquor data does not improve predictions
• Confirms the VAR finding
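One way to build an accuracy-by-horizon comparison like the one summarized above, assuming each model produces forecasts 1 to H months ahead for every zip code. Mean absolute error is an assumed metric (the slides do not name the one used), and the helper names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def mae_by_horizon(actuals: Dict[str, Sequence[float]],
                   forecasts: Dict[str, Sequence[float]]) -> List[float]:
    """Mean absolute error at each horizon, averaged over zip codes.

    actuals[z][h] and forecasts[z][h] are the realized and predicted
    rent change h+1 months ahead for zip code z.
    """
    horizons = len(next(iter(actuals.values())))
    return [mean(abs(forecasts[z][h] - actuals[z][h]) for z in actuals)
            for h in range(horizons)]


def horizons_where_better(full: List[float], baseline: List[float]) -> List[int]:
    """1-based horizons at which the full model's error is strictly lower."""
    return [h + 1 for h, (f, b) in enumerate(zip(full, baseline)) if f < b]
```

If `horizons_where_better(full_rf_mae, rf_without_lt_mae)` comes back empty or scattered rather than concentrated at long horizons, that matches the "did not observe" result above.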
14. what can the models tell us about NY?
NYU has identified a number of regions that have been undergoing gentrification
Three categories:
1. Gentrifying
• Low-income in 1990; rent growth above the median between 1990 and 2014
2. Non-Gentrifying
• Low-income in 1990, but rent growth more modest than in gentrifying areas
3. Higher Income
• Already high-income in 1990
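The three categories amount to a simple decision rule. This sketch assumes each area comes with a low-income flag for 1990 and its 1990-2014 rent growth, and takes the median over all areas supplied; NYU's exact income threshold and median definition are not given in the slides, so they are treated as inputs. The function name is hypothetical.

```python
from statistics import median
from typing import Dict, Tuple


def classify_areas(areas: Dict[str, Tuple[bool, float]]) -> Dict[str, str]:
    """Label each area as Gentrifying, Non-Gentrifying, or Higher Income.

    areas[name] = (low_income_in_1990, rent_growth_1990_2014).
    The median rent growth is computed over all areas passed in (an assumption).
    """
    median_growth = median(growth for _, growth in areas.values())
    labels = {}
    for name, (low_income, growth) in areas.items():
        if not low_income:
            labels[name] = "Higher Income"
        elif growth > median_growth:
            labels[name] = "Gentrifying"
        else:
            labels[name] = "Non-Gentrifying"
    return labels
```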
15. what can the models tell us about NY?
Bimodal accuracy distribution by zip code:
For some zip codes, models trained with liquor and taxi data clearly outperform models without them
These regions are almost all gentrifying
• Bed-Stuy and Crown Heights
• Bronx near Yankee Stadium
• Jackson Heights near Citi Field (Mets)
• Not included in the NYU map
• Google search results from 2016 suggest gentrification had only just begun
Map: Regions Where Liquor and Taxi Models Perform Better
Perhaps there is some signal after all…
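Picking out the zip codes in the upper mode of that accuracy distribution could look like this, assuming per-zip forecast errors are available for the model with liquor and taxi data and the model without. The helper names and the 0.5 improvement cutoff are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def relative_improvement(full_err: Dict[str, float],
                         base_err: Dict[str, float]) -> Dict[str, float]:
    """1 - full/baseline error per zip code; positive means the full model is better."""
    return {z: 1.0 - full_err[z] / base_err[z]
            for z in full_err
            if z in base_err and base_err[z] > 0}


def zips_where_full_model_wins(full_err: Dict[str, float],
                               base_err: Dict[str, float],
                               cutoff: float = 0.5) -> List[str]:
    """Zip codes whose relative improvement is at least `cutoff` (hypothetical threshold)."""
    gains = relative_improvement(full_err, base_err)
    return sorted(z for z, g in gains.items() if g >= cutoff)
```

Plotting a histogram of the `relative_improvement` values is a quick way to check for the two modes before choosing a cutoff.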