2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-based Analysis Tools

  • We fulfill custom map requests, provide data for CompStat, and maintain a website that is updated daily to support the districts' day-to-day functions.

    1. Data Mining and Risk Forecasting in Web-Based Analysis Tools
       340 N 12th St, Suite 402, Philadelphia, PA 19107
       215.925.2600, [email_address], www.azavea.com/hunchlab
    2. Agenda
       • Who we are
       • What we are looking to do with software
       • What we’ve learned building risk forecasting features
         • (forecasting versus prediction)
    3. About Azavea
       • Founded in 2000
       • 30 people
       • Based in Philadelphia
         • Boston office
         • Minneapolis office
       • Geospatial + web + mobile
         • Software development
         • Spatial analysis services
    4. Clients & Industries
       • Public Safety
       • Municipal Services
       • Public Health
       • Human Services
       • Culture
       • Elections & Politics
       • Land Conservation
       • Economic Development
    5. HunchLab was developed, in part, based upon work supported by the National Science Foundation under Grant Nos. IIP-0637589 and IIP-0750507.
    6. The Backstory
    7. How Phila PD uses GIS
       • Customized Map Products
       • Weekly CompStat Meetings
       • Web Crime Analysis
    8. Data flow diagram (labels only; the image is not part of the transcript)
       • People: Complainant; 911 Operator; Radio Dispatcher; Police Officer; District 48 Desk
       • Systems: Verizon 911; CAD; INCT; PARS; District X; District Y; District Z
       • Incident Report Completed by Officer
       • Daily download & Geocoding Routines
       • Maps distributed through Intranet, Printing, CompStat
       • INCT & PARS: main database sources; over 5,000 incidents daily, over 2 million annually
    9. The Context
       • 1,500,000 people
       • 7,000 police officers
       • 1,000 civilian employees
       • 2,000,000 new incidents / year
       • 3 crime analysts
    10. How can software help?
    11. Our goals with software
       • Improve ease of use
         • Increase consumers of analysis
       • Automate time-intensive routines
         • Free up resources for things that can’t be automated
       • Increase sophistication
         • Accomplish things not possible manually
    12. Web-based crime analysis, early warning, and risk forecasting
    13. • Crime Analysis
         • Mapping (spatial / temporal densities)
         • Trending
         • Intelligence Dashboard
       • Early Warning
         • Statistical & Threshold-based Hunches (data mining)
         • Alerting
       • Risk Forecasting
         • Near Repeat Pattern
         • Load Forecasting
    14. Near Repeat Pattern Analysis
    15. Contagious Crime?
       • Near repeat pattern analysis
         • “If one burglary occurs, how does the risk change nearby?”
    16. Near Repeat Pattern Analysis
       • How can you test your own data?
         • Near Repeat Calculator
           • http://www.temple.edu/cj/misc/nr/
       • Papers
         • Near-Repeat Patterns in Philadelphia Shootings (2008)
           • One city block & two weeks after one shooting
             • 33% increase in likelihood of a second event
       Source: Jerry Ratcliffe, Temple University
    17. Near Repeat Pattern Analysis
       • The goal:
         • Quantify short term risk due to near-repeat victimization
           • “If one burglary occurs, how does the risk of burglary for the neighbors change?”
       • What we know:
         • Incident A (place, time) --> Incident B (place, time)
           • Distance between A and B
           • Timeframe between A and B
       • What we need to know:
         • What distances/timeframes are not simply random?
    18. Near Repeat Pattern Analysis
       • The process
         • Observe the pattern in historic data
         • Simulate the pattern in randomized historic data
         • Compare the observed pattern to the simulated patterns
         • Apply the non-random pattern to new incidents
       • An example
         • 180 days of burglaries in Division 6 of Philadelphia
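The first three steps of the process on this slide can be sketched as a small permutation test. This is an illustrative sketch only, not the HunchLab or Near Repeat Calculator implementation: the function names, the single distance/time band, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import random
from itertools import combinations
from math import hypot


def count_close_pairs(points, times, max_dist, max_days):
    """Count incident pairs that are close in both space and time."""
    close = 0
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        near_in_space = hypot(x1 - x2, y1 - y2) <= max_dist
        near_in_time = abs(times[i] - times[j]) <= max_days
        if near_in_space and near_in_time:
            close += 1
    return close


def near_repeat_test(points, times, max_dist, max_days, n_sim=199, seed=0):
    """Compare the observed pair count with counts from shuffled dates."""
    rng = random.Random(seed)
    # Step 1: observe the pattern in the historic data.
    observed = count_close_pairs(points, times, max_dist, max_days)
    # Step 2: simulate by shuffling dates across the fixed locations.
    shuffled = list(times)
    simulated = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        simulated.append(count_close_pairs(points, shuffled, max_dist, max_days))
    # Step 3: compare observed with simulated (ratio and pseudo p-value).
    expected = sum(simulated) / n_sim
    p_value = (1 + sum(s >= observed for s in simulated)) / (n_sim + 1)
    return observed, expected, p_value


# Invented toy data: three tight pairs of incidents (x, y in metres; time in days).
points = [(0, 0), (0, 1), (100, 100), (100, 101), (200, 0), (200, 1)]
times = [0, 1, 50, 51, 100, 101]
observed, expected, p_value = near_repeat_test(points, times, max_dist=5, max_days=7)
print(observed, expected, p_value)
```

An observed count well above the simulated average, with a small pseudo p-value, suggests that the chosen distance/time band is not simply random. A fuller analysis would repeat this over a grid of distance and time bands.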
    19. Near Repeat Pattern Analysis
    20. Near Repeat Pattern Analysis
    21. Near Repeat Pattern Analysis
    22. Near Repeat Pattern Analysis
    23. Near Repeat Pattern Analysis
       • What did we learn?
         • Having a reference implementation was very helpful
           • Aids in translation of research into software
         • Analysts simplify things to make operationalization possible
           • They simplify risk bands to ease map making
         • Academics leave large questions unanswered
           • What happens when risk areas overlap?
    24. Load Forecasting
    25. Improving CompStat
       • Load forecasting
         • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
    26. What Do We Mean By Load Forecasting?
       • Load forecasting
         • Generating aggregate crime counts for a future timeframe using cyclical time series analysis
       Diagram: Measure cyclical patterns + Identify non-cyclical trend, leading to Forecast expected count
       Reference: bit.ly/gorrcrimeforecastingpaper
    27. Load Forecasting
       • Measure cyclical patterns
         • Take historic incidents (for example: last five years)
         • Generate multiplicative seasonal indices
           • For each time cycle:
             • time of year
             • day of week
             • time of day
           • Count incidents within each time unit (for example: Monday)
           • Calculate average per time unit if incidents were evenly distributed
           • Divide counts within each time unit by the calculated average to generate multiplicative indices
             • Index ~ 1 means at the average
             • Index > 1 means above average
             • Index < 1 means below average
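As a rough illustration of the index calculation on this slide, the sketch below computes day-of-week indices. The function name and the sample dates are invented for the example, and it assumes the history covers whole weeks so that every weekday occurs equally often.

```python
from collections import Counter
from datetime import date, timedelta


def day_of_week_indices(incident_dates):
    """Return a multiplicative index for each weekday (0 = Monday ... 6 = Sunday)."""
    # Count incidents within each time unit (here: each day of the week).
    counts = Counter(d.weekday() for d in incident_dates)
    # Average per weekday if incidents were evenly distributed.
    average = len(incident_dates) / 7
    # Divide each count by the average to get the multiplicative index.
    return {dow: counts.get(dow, 0) / average for dow in range(7)}


# Invented sample: 14 incidents, weighted toward the first day of the week shown.
start = date(2011, 1, 3)
per_day = [4, 2, 2, 2, 2, 1, 1]
incidents = [start + timedelta(days=offset)
             for offset, n in enumerate(per_day) for _ in range(n)]

for dow, index in sorted(day_of_week_indices(incidents).items()):
    print(dow, round(index, 2))
```

The same pattern applies to the other cycles (time of year, time of day) by changing the key used for counting.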
    28. Load Forecasting
    29. Load Forecasting
    30. Load Forecasting
    31. Load Forecasting
    32. Load Forecasting
       • Identify non-cyclical trend
         • Take recent daily counts (for example: last year daily counts)
         • Remove cyclical trends by dividing by indices
         • Run a trending function on the new counts
           • Simple average
             • Last X Days
           • Smoothing function
             • Exponential smoothing
             • Holt’s linear exponential smoothing
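The trending step might look like the sketch below, which uses textbook forms of the three trending functions named on the slide. The function names, the smoothing parameters, and the initialisation of Holt's method (first value as level, first difference as trend) are assumptions; the slides do not say which variants were used.

```python
def deseasonalize(daily_counts, indices):
    """Divide each (weekday, count) pair by its index; assumes no index is zero."""
    return [count / indices[dow] for dow, count in daily_counts]


def simple_average(values, last_n):
    """Average of the last `last_n` values (the 'Last X Days' option)."""
    window = values[-last_n:]
    return sum(window) / len(window)


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing; returns the final smoothed level."""
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def holt_linear(values, alpha, beta):
    """Holt's linear exponential smoothing; returns the final (level, trend)."""
    level, trend = values[0], values[1] - values[0]
    for value in values[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend


# Invented example: (weekday, count) pairs and two made-up weekday indices.
indices = {0: 2.0, 1: 0.5}
recent = [(0, 8), (1, 2), (0, 10), (1, 3)]
flat = deseasonalize(recent, indices)
print(flat)
print(simple_average(flat, 2), exponential_smoothing(flat, 0.5), holt_linear(flat, 0.5, 0.3))
```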
    33. Load Forecasting
       • Forecast expected count
         • Project trend into future timeframe
           • Always flat
             • Simple average
             • Exponential smoothing
           • Linear trend
             • Holt’s linear exponential smoothing
         • Multiply by seasonal indices to reseasonalize the data
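A minimal sketch of the projection and reseasonalizing step follows. The function name and inputs are hypothetical: `level` and `trend` stand for the output of whichever trending function was chosen, with `trend` set to 0 for the flat methods, and clamping negative baselines to zero is an assumption added for the example.

```python
def forecast_counts(level, trend, future_weekdays, indices):
    """Project the deseasonalized baseline forward, then reapply the indices."""
    forecasts = []
    for step, dow in enumerate(future_weekdays, start=1):
        # Flat methods use trend == 0; a linear method adds trend per step.
        baseline = level + trend * step
        # Counts cannot be negative, so clamp before reseasonalizing.
        forecasts.append(max(baseline, 0.0) * indices[dow])
    return forecasts


# Invented indices for two weekdays.
indices = {0: 2.0, 1: 0.5}
print(forecast_counts(4.0, 0.0, [0, 1], indices))  # flat projection
print(forecast_counts(4.0, 1.0, [0, 1], indices))  # linear projection
```

Summing the daily forecasts over a future window gives the aggregate count for that timeframe.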
    34. Load Forecasting
       Diagram: Measure cyclical patterns + Identify non-cyclical trend, leading to Forecast expected count
       Reference: bit.ly/gorrcrimeforecastingpaper
    35. How Do We Know It’s Accurate?
       • Testing
         • Generated multiple forecasting techniques (examples)
           • Commonly Used
             • Average of last 30 days
             • Average of last 365 days
             • Last year’s count for the same time period
           • Advanced Combinations
             • Different cyclical indices (example: day of year vs. month of year)
             • Different levels of geographic aggregation for indices
             • Different trending functions
         • Scoring methodologies (examples)
           • Mean absolute percent error (with some enhancements)
           • Mean percent error
           • Mean squared error
         • Run thousands of forecasts through testing framework
         • Choose the right technique in the right situation
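The three scoring methods can be sketched with their standard definitions. The slide mentions unspecified "enhancements" to mean absolute percent error, which are not reproduced here. Skipping periods with a zero actual count is an assumption made so the percentage is defined, and the function names and sample numbers are invented.

```python
def mean_absolute_percent_error(actual, forecast):
    """MAPE in percent, over periods with a non-zero actual count."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def mean_percent_error(actual, forecast):
    """MPE in percent; the sign shows whether forecasts run high or low."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum((a - f) / a for a, f in pairs) / len(pairs)


def mean_squared_error(actual, forecast):
    """MSE; penalises large misses more heavily than small ones."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Invented counts for two forecast periods.
actual = [100, 200]
forecast = [110, 180]
print(mean_absolute_percent_error(actual, forecast))
print(mean_percent_error(actual, forecast))
print(mean_squared_error(actual, forecast))
```

In this example the two misses cancel in the mean percent error but not in the absolute version, which is why the measures are useful together.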
    36. How Do We Know It’s Accurate?
       • Error for 28-31 day forecasts for any Part X Series

       Area                       Last 30 Days   Last Year   Load Forecast   Error Reduction
       Philadelphia - Citywide    6.8%           6.5%        4.1%            39%
       Philadelphia - Divisions   8.1%           8.4%        5.8%            28%
       Philadelphia - Districts   10.9%          11.7%       9.3%            15%
       Lincoln, NE - Citywide     13%            11%         10%             23%
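The slide does not state how the Error Reduction column was computed. The figures appear consistent with a reduction relative to the Last 30 Days error, and the sketch below checks that reading against the rounded values shown. Treat this as an inference, not the presenter's stated method.

```python
def error_reduction(baseline_error, forecast_error):
    """Percent reduction in error relative to a baseline method."""
    return 100 * (baseline_error - forecast_error) / baseline_error


# (Last 30 Days error, Load Forecast error), both in percent, from the slide.
rows = {
    "Philadelphia - Citywide": (6.8, 4.1),
    "Philadelphia - Divisions": (8.1, 5.8),
    "Philadelphia - Districts": (10.9, 9.3),
    "Lincoln, NE - Citywide": (13.0, 10.0),
}

for area, (last_30_days, load_forecast) in rows.items():
    print(f"{area}: {error_reduction(last_30_days, load_forecast):.1f}%")
```

Three of the four rows round to the slide's figure. The citywide Philadelphia row comes out slightly above 39%, which may mean the slide used unrounded errors or truncated the result.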
    37. Improving CompStat
       • Load forecasting
         • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
    38. Improving CompStat
    39. What’s next?
    40. Contact Information
       Jeremy Heffner, HunchLab Product Manager
       [email_address], 215.701.7712, www.azavea.com/hunchlab
