2. Goal: Predict the hourly rate of a certain kind of
event for each city in the US.
(Think number of arrests for public intoxication)
f(city, date, time of day, weather) -> #events/hour
3. Data:
● 500,000 geolocation data points across the US (2001-2013).
● < 1,000 events/year in major cities.
● Pulled in external weather data to augment the model
6. Poisson Regression
● GLM perfect for count / rate data that is Poisson distributed.
● Trained with MLE (Python, statsmodels, patsy)
● Features: weekday, hour of day, #days from New Year's, weather, year
9. About me! Discrete Math: Random Graphs
Theoretical CS: (related to P = NP)
10. A Bit of Research
CLT for the number of triangles in a random graph (Erdős-Rényi 1960):
Let S_n = #triangles in G(n,p) for 0<p<1 fixed. Then
(S_n - E[S_n]) / sqrt(Var(S_n)) -> N(0,1) in distribution as n -> infinity.
Local Central Limit Theorem for # of triangles (Gilmer-Kopparty 2014):
11. Adding Rain:
Data: (lat, long, timestamp, weather)
Problem: In periods with no events, we don’t know what the weather was.
Solution:
● Pick a few cities where we can fill in the gaps on weather data
● Calculate multiplicative “rain danger coefficient” and apply nationwide
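One way the coefficient could be computed, as a sketch only: in the reference cities, divide the events/hour observed in rainy hours by the events/hour in dry hours, then multiply any city's baseline rate by that ratio when it rains. The table layout, the numbers, and the function names below are hypothetical.

```python
import pandas as pd

# Hypothetical hourly table for two reference cities where the weather is
# known for every hour, including hours with no events.
hours = pd.DataFrame({
    "city": ["A"] * 6 + ["B"] * 4,
    "raining": [True, True, False, False, False, False,
                True, False, False, False],
    "events": [1, 2, 1, 0, 1, 0,
               1, 0, 1, 0],
})


def rain_danger_coefficient(hours: pd.DataFrame) -> float:
    """Ratio of events/hour in rainy hours to events/hour in dry hours."""
    rainy_rate = hours.loc[hours["raining"], "events"].mean()
    dry_rate = hours.loc[~hours["raining"], "events"].mean()
    return float(rainy_rate / dry_rate)


def adjust_rate(base_rate: float, raining: bool, coefficient: float) -> float:
    """Scale a baseline events/hour prediction when it is raining."""
    return base_rate * coefficient if raining else base_rate


coefficient = rain_danger_coefficient(hours)

# Apply the same multiplier to a baseline rate from any other city.
rainy_prediction = adjust_rate(0.9, True, coefficient)
dry_prediction = adjust_rate(0.9, False, coefficient)
```

Pooling the reference cities into one ratio assumes rain affects every city by the same factor, which is what applying the coefficient nationwide implies.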