4. Complaint dataset
• Non-emergency complaints: Noise, Sewer, Rodent, Mold, Heating,…
• Around 8 million complaints registered since 2010
2
5. Complaint dataset
• Non-emergency complaints: Noise, Sewer, Rodent, Mold, Heating,…
• Around 8 million complaints registered since 2010
• Location: 5 borough, ~ 60 districts, ~200 zip code, lat-long
2
6. Complaint dataset
• Non-emergency complaints: Noise, Sewer, Rodent, Mold, Heating,…
• Around 8 million complaints registered since 2010
• Location: 5 borough, ~ 60 districts, ~200 zip code, lat-long
• Timestamps: created, resolved, due date
2
7. Complaint dataset
• Non-emergency complaints: Noise, Sewer, Rodent, Mold, Heating,…
• Around 8 million complaints registered since 2010
• Location: 5 borough, ~ 60 districts, ~200 zip code, lat-long
• Timestamps: created, resolved, due date
Sewer General Construction 2
8. 3
Snapshot of the data
(after pre-processing)
NYC Open
Data
~5Gb
csv/json
Python
MySQL (analysis) Flask
jQuery
Front
End
(AWS)
pre -
processing
in bash
12. Who uses this data?
4
Citizens
City Council
Members
City Agencies
(NYPD, etc.)
13. Who uses this data?
4
Citizens
City Council
Members
City Agencies
(NYPD, etc.)
Community
Boards
14. Who uses this data?
4
Citizens
City Council
Members
City Agencies
(NYPD, etc.)
Community
Boards
• Advocacy: “I can do better advocacy, if I had data that is easier to look at.”
• Efficiency: “If we had easier to understand 311 data, we could have more
productive CB meetings.”
• Effectiveness: "I could set better agenda topics”
15. Identifying high-priority problems
5
Jun
10
Jun
11
Jun
12
Jun
13
Jun
14
Dec
10
Dec
10
Dec
13
Dec
12
Brooklyn
Blocked
driveway
Paint
Plumbing
Construction
Heat
How do you quantify ‘high-priority’?
18. Analysis
• Model the time series: Linear regression & Gaussian processes
• Categorical features: for periodicity (quarter, day of the week)
• Linear features: for linear trends
19. Analysis
• Model the time series: Linear regression & Gaussian processes
• Categorical features: for periodicity (quarter, day of the week)
• Linear features: for linear trends
A p r
10
Apr
11
Apr
12
Apr
13
Oct
10
Oct
10
Oct
13
Oct
12
Heat complaints: Blue - actual volumes, Red - predicted
20. Analysis
• Model the time series: Linear regression & Gaussian processes
• Categorical features: for periodicity (quarter, day of the week)
• Linear features: for linear trends
A p r
10
Apr
11
Apr
12
Apr
13
Oct
10
Oct
10
Oct
13
Oct
12
Heat complaints: Blue - actual volumes, Red - predicted
• Predict future complaint volumes: useful to allocate resources
21. Very High
Complaint volume less
than expected
High
Medium
Low
Analysis
• Compare expected & actual volumes
!
• Priority score =
22. CB 04 Bronx
!
(last 6 months)
A ug
10
Aug
11
Aug
12
Aug
13
Feb
11
Feb
12
Feb
13
Feb
14
Aug
14
23. 10
All of Bronx
!
(last 3 months)
A ug
10
Aug
11
Aug
12
Aug
13
Feb
11
Feb
12
Feb
13
Feb
14
Aug
14