FutureCast
(Project under NDA)
Justin Gilmer
Goal: Predict the hourly rate of a certain kind of
event for each city in the US.
(Think number of arrests for public intoxication)
f(city, date, time of day, weather) -> #events/hour
Data:
● 500,000 geolocation data points across the US (2001-2013).
● < 1,000 events/year in major cities.
● Pulled in external weather data to augment the model
Signal: Weekly Patterns (Chicago)
Signal
Poisson Regression
● A GLM well suited to count / rate data that is Poisson distributed.
● Trained with MLE (Python, statsmodels, patsy)
● Features: weekday, hour of day, #days from New Year's, weather, year
About me! Discrete Math: Random Graphs
Theoretical CS: (related to P = NP)
Model Comparison (2013 test)
Adding Rain:
● data: (lat, long, timestamp, weather)
Problem: In periods with no events, we don’t know the weather.
Solution:
● Pick a few cities where we can fill in the gaps on weather data
● Calculate a multiplicative “rain danger coefficient” and apply it nationwide
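
One way the coefficient could be computed, shown as a sketch rather than the method actually used: pool the cities with complete weather records, take the ratio of the rainy-hour event rate to the dry-hour event rate, and multiply a baseline prediction by that ratio when it is raining. The `CityHours` type, the helper names, and all numbers are hypothetical, and the baseline rate is assumed to describe dry weather.

```python
from dataclasses import dataclass


@dataclass
class CityHours:
    """Hour and event totals for one city with gap-free weather data."""
    rain_hours: int
    dry_hours: int
    rain_events: int
    dry_events: int


def rain_coefficient(cities):
    """Pooled ratio of the rainy-hour event rate to the dry-hour event rate."""
    rain_rate = sum(c.rain_events for c in cities) / sum(c.rain_hours for c in cities)
    dry_rate = sum(c.dry_events for c in cities) / sum(c.dry_hours for c in cities)
    return rain_rate / dry_rate


def adjust(baseline_rate, is_raining, coeff):
    """Scale a dry-weather baseline (events/hour) when rain is reported."""
    return baseline_rate * coeff if is_raining else baseline_rate


# Made-up totals for two reference cities.
reference = [
    CityHours(rain_hours=800, dry_hours=7960, rain_events=120, dry_events=796),
    CityHours(rain_hours=1200, dry_hours=7560, rain_events=150, dry_events=756),
]
coeff = rain_coefficient(reference)

# Apply the same coefficient to a city that has no weather history of its own.
adjusted = adjust(0.2, is_raining=True, coeff=coeff)
print(round(coeff, 3), round(adjusted, 3))
```

Pooling the totals before dividing weights each city by how many hours it contributes, which is one reasonable choice among several; averaging per-city ratios would weight all cities equally instead.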
