2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-based Analysis Tools

  • We make custom map requests, provide data for CompStat and support a Web site that is updated daily to support daily functions of districts.

Transcript

  • 1. 340 N 12th St, Suite 402, Philadelphia, PA 19107 215.925.2600 [email_address] www.azavea.com/hunchlab Data Mining and Risk Forecasting in Web-Based Analysis Tools
  • 2. Agenda
    • Who we are
    • What we are looking to do with software
    • What we’ve learned building risk forecasting features
      • (forecasting versus prediction)
  • 3. About Azavea
    • Founded in 2000
    • 30 people
    • Based in Philadelphia
      • Boston office
      • Minneapolis office
    • Geospatial + web + mobile
      • Software development
      • Spatial analysis services
  • 4. Clients & Industries
    • Public Safety
    • Municipal Services
    • Public Health
    • Human Services
    • Culture
    • Elections & Politics
    • Land Conservation
    • Economic Development
  • 5. HunchLab was developed, in part, based upon work supported by the National Science Foundation under Grant Nos. IIP-0637589 and IIP-0750507.
  • 6. The Backstory
  • 7. How Phila PD uses GIS
    • Customized Map Products
    • Weekly CompStat Meetings
    • Web Crime Analysis
  • 8. [Data flow diagram; labels: Complainant, 911 Operator, Radio Dispatcher, Police Officer, District 48 Desk, Verizon 911, CAD, INCT, PARS, District X, District Y, District Z]
    • Incident Report Completed by Officer
    • Daily download & Geocoding Routines
    • Maps distributed through Intranet, Printing, CompStat
    • INCT & PARS: main database sources
      • Over 5,000 incidents daily, over 2 million annually
  • 9. The Context
    • 1,500,000 people
    • 7,000 police officers
    • 1,000 civilian employees
    • 2,000,000 new incidents / year
    • 3 crime analysts
  • 10. How can software help?
  • 11. Our goals with software
    • Improve ease of use
      • Increase consumers of analysis
    • Automate time-intensive routines
      • Free up resources for things that can’t be automated
    • Increase sophistication
      • Accomplish things not possible manually
  • 12.
    • web-based crime analysis, early warning, and risk forecasting
  • 13.
    • Crime Analysis
      • Mapping (spatial / temporal densities)
      • Trending
      • Intelligence Dashboard
    • Early Warning
      • Statistical & Threshold-based Hunches (data mining)
      • Alerting
    • Risk Forecasting
      • Near Repeat Pattern
      • Load Forecasting
  • 14. Near Repeat Pattern Analysis
  • 15. Contagious Crime?
    • Near repeat pattern analysis
        • “If one burglary occurs, how does the risk change nearby?”
  • 16. Near Repeat Pattern Analysis
    • How can you test your own data?
      • Near Repeat Calculator
        • http://www.temple.edu/cj/misc/nr/
    • Papers
      • Near-Repeat Patterns in Philadelphia Shootings (2008)
        • One city block & two weeks after one shooting
          • 33% increase in likelihood of a second event
    • Jerry Ratcliffe, Temple University
  • 17. Near Repeat Pattern Analysis
    • The goal:
      • Quantify short term risk due to near-repeat victimization
        • “If one burglary occurs, how does the risk of burglary for the neighbors change?”
    • What we know:
      • Incident A (place, time) --> Incident B (place, time)
        • Distance between A and B
        • Timeframe between A and B
    • What we need to know:
      • What distances/timeframes are not simply random?
  • 18. Near Repeat Pattern Analysis
    • The process
      • Observe the pattern in historic data
      • Simulate the pattern in randomized historic data
      • Compare the observed pattern to the simulated patterns
      • Apply the non-random pattern to new incidents
    • An example
      • 180 days of burglaries in Division 6 of Philadelphia
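The observe / simulate / compare steps above can be sketched as a Monte Carlo permutation test in Python. This is a minimal illustration, not the HunchLab or Near Repeat Calculator implementation: the function names, the single distance/time band, and the choice to randomize by shuffling incident times across locations are all assumptions made for the example.

```python
import math
import random

def count_close_pairs(points, times, max_dist, max_days):
    """Count incident pairs that are close in both space and time."""
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.dist(points[i], points[j])
            gap = abs(times[i] - times[j])
            if dist <= max_dist and gap <= max_days:
                count += 1
    return count

def near_repeat_test(points, times, max_dist, max_days, runs=199, seed=0):
    """Compare the observed close-pair count with counts from shuffled data.

    Assumption: randomization keeps the locations fixed and shuffles the
    times among them, which breaks any space-time link between incidents.
    """
    rng = random.Random(seed)
    observed = count_close_pairs(points, times, max_dist, max_days)
    shuffled = list(times)
    at_least_as_extreme = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        simulated = count_close_pairs(points, shuffled, max_dist, max_days)
        if simulated >= observed:
            at_least_as_extreme += 1
    # Pseudo p-value: share of runs (plus the observed one) at least as extreme.
    p_value = (at_least_as_extreme + 1) / (runs + 1)
    return observed, p_value
```

A small p-value for a given band (for example, one block and two weeks) suggests that pairs in that band occur more often than chance alone would produce. Repeating the test over several distance and time bands gives the kind of risk bands mentioned later in the deck.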
  • 19. Near Repeat Pattern Analysis
  • 20. Near Repeat Pattern Analysis
  • 21. Near Repeat Pattern Analysis
  • 22. Near Repeat Pattern Analysis
  • 23. Near Repeat Pattern Analysis
    • What did we learn?
      • Having a reference implementation was very helpful
        • Aids in translation of research into software
      • Analysts simplify things to make operationalization possible
        • They simplify risk bands to ease map making
      • Academics leave large questions unanswered
        • What happens when risk areas overlap?
  • 24.  
  • 25. Load Forecasting
  • 26. Improving CompStat
    • Load forecasting
        • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
  • 27. What Do We Mean By Load Forecasting?
    • Load forecasting
        • Generating aggregate crime counts for a future timeframe using cyclical time series analysis
    • [Diagram: Measure cyclical patterns + Identify non-cyclical trend, leading to Forecast expected count]
    • bit.ly/gorrcrimeforecastingpaper
  • 28. Load Forecasting
    • Measure cyclical patterns
        • Take historic incidents (for example: last five years)
        • Generate multiplicative seasonal indices
          • For each time cycle:
            • time of year
            • day of week
            • time of day
          • Count incidents within each time unit (for example: Monday)
          • Calculate average per time unit if incidents were evenly distributed
          • Divide counts within each time unit by the calculated average to generate multiplicative indices
            • Index ~ 1 means at the average
            • Index > 1 means above average
            • Index < 1 means below average
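The index calculation above can be sketched in a few lines of Python. The function name and arguments are hypothetical, and the sketch assumes every time unit is equally represented in the history (for example, the same number of Mondays as Tuesdays), so that an even spread means the same count for each unit.

```python
from collections import Counter
from datetime import date, timedelta

def seasonal_indices(incident_dates, key, units):
    """Multiplicative index per time unit.

    key   maps a date to its time unit (e.g. weekday number).
    units lists every possible unit, so empty units get an index of 0.
    """
    counts = Counter(key(d) for d in incident_dates)
    # Count each unit would have if incidents were evenly distributed.
    even_share = len(incident_dates) / len(units)
    return {unit: counts.get(unit, 0) / even_share for unit in units}

# Example: four weeks of history, two incidents every Monday, one on other days.
start = date(2011, 1, 3)  # a Monday
history = []
for offset in range(28):
    day = start + timedelta(days=offset)
    history.extend([day] * (2 if day.weekday() == 0 else 1))

day_of_week = seasonal_indices(history, key=lambda d: d.weekday(), units=range(7))
```

Here Monday's index comes out above 1 and every other day's below 1, and the indices average to 1 across the cycle. The same function can be reused for other cycles by changing `key`, for instance `lambda d: d.month` for time of year.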
  • 29. Load Forecasting
  • 30. Load Forecasting
  • 31. Load Forecasting
  • 32. Load Forecasting
  • 33. Load Forecasting
    • Identify non-cyclical trend
        • Take recent daily counts (for example: last year daily counts)
        • Remove cyclical trends by dividing by indices
        • Run a trending function on the new counts
          • Simple average
            • Last X Days
          • Smoothing function
            • Exponential smoothing
            • Holt’s linear exponential smoothing
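A minimal Python sketch of the deseasonalizing step and the three trending functions listed above. The function names, the initialization choices, and the smoothing parameters `alpha` and `beta` are assumptions for illustration; the slides do not say which values or starting rules were used. The seasonal indices are assumed to be positive.

```python
def deseasonalize(daily_counts, indices):
    """Remove cyclical effects by dividing each count by its seasonal index."""
    return [count / index for count, index in zip(daily_counts, indices)]

def simple_average(values, last_x_days):
    """Average of the most recent X values."""
    window = values[-last_x_days:]
    return sum(window) / len(window)

def exponential_smoothing(values, alpha):
    """Single exponential smoothing; returns the final smoothed level."""
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def holt_linear(values, alpha, beta):
    """Holt's linear exponential smoothing; returns (level, trend).

    Assumption: level starts at the first value and trend at the first
    difference, which is one common initialization among several.
    """
    level = values[0]
    trend = values[1] - values[0]
    for value in values[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend
```

The simple average and single exponential smoothing each produce one level with no slope, while Holt's method also tracks a per-day trend.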
  • 34. Load Forecasting
    • Forecast expected count
        • Project trend into future timeframe
          • Always flat
            • Simple average
            • Exponential smoothing
          • Linear trend
            • Holt’s linear exponential smoothing
        • Multiply by seasonal indices to reseasonalize the data
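The projection and reseasonalizing steps above might look like the following in Python. The function name and arguments are hypothetical; `level` and `trend` stand for the output of whichever trending function was chosen, with `trend` set to 0 for the flat methods.

```python
def forecast_counts(level, trend, future_indices):
    """Project the deseasonalized trend forward, then reseasonalize.

    future_indices holds the seasonal index for each future period,
    in order, starting one step after the last observed value.
    """
    forecasts = []
    for step, index in enumerate(future_indices, start=1):
        baseline = level + trend * step  # trend of 0 gives a flat projection
        forecasts.append(baseline * index)
    return forecasts

# Flat projection (simple average or exponential smoothing).
flat = forecast_counts(level=10, trend=0, future_indices=[1.5, 0.5])

# Linear projection (Holt's method) with a trend of +2 per period.
linear = forecast_counts(level=10, trend=2, future_indices=[1.0, 0.5])
```

Summing the returned values over a future timeframe gives the aggregate expected count for that period.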
  • 35. Load Forecasting
    • [Diagram: Measure cyclical patterns + Identify non-cyclical trend, leading to Forecast expected count]
    • bit.ly/gorrcrimeforecastingpaper
  • 36. How Do We Know It’s Accurate?
    • Testing
        • Generated multiple forecasting techniques (examples)
          • Commonly Used
            • Average of last 30 days
            • Average of last 365 days
            • Last year’s count for the same time period
          • Advanced Combinations
            • Different cyclical indices (example: day of year vs. month of year)
            • Different levels of geographic aggregation for indices
            • Different trending functions
        • Scoring methodologies (examples)
          • Mean absolute percent error (with some enhancements)
          • Mean percent error
          • Mean squared error
        • Run thousands of forecasts through testing framework
        • Choose the right technique in the right situation
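The three scoring methods named above can be written in their plain textbook form as below. The "enhancements" mentioned on the slide are not described, so none are included here. The sign convention for mean percent error (forecast minus actual) and the helper `best_technique` are assumptions for illustration, and actual counts are assumed to be non-zero.

```python
def mean_absolute_percent_error(actual, forecast):
    """Average size of the error as a percentage of the actual count."""
    errors = [abs(f - a) / a for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

def mean_percent_error(actual, forecast):
    """Signed percentage error; values far from 0 indicate bias."""
    errors = [(f - a) / a for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

def mean_squared_error(actual, forecast):
    """Average squared error; penalizes large misses more heavily."""
    errors = [(f - a) ** 2 for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)

def best_technique(actual, forecasts_by_technique):
    """Pick the technique whose forecasts have the lowest MAPE."""
    return min(
        forecasts_by_technique,
        key=lambda name: mean_absolute_percent_error(
            actual, forecasts_by_technique[name]
        ),
    )
```

Running many historical forecasts through functions like these, grouped by crime type or geography, is one way to decide which technique to use in which situation.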
  • 37. How Do We Know It’s Accurate?
    • Error for 28-31 day forecasts for any Part X Series
        Area                         Last 30 Days   Last Year   Load Forecast   Error Reduction
        Philadelphia - Citywide      6.8%           6.5%        4.1%            39%
        Philadelphia - Divisions     8.1%           8.4%        5.8%            28%
        Philadelphia - Districts     10.9%          11.7%       9.3%            15%
        Lincoln, NE - Citywide       13%            11%         10%             23%
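The slide does not say how the Error Reduction figures were computed. One reading that comes close to the published numbers is the relative drop in error from the Last 30 Days baseline to the Load Forecast. The Python check below is only that reading, applied to the rounded figures above, so small differences from the published column are expected.

```python
def error_reduction(baseline_error, forecast_error):
    """Percent reduction in error relative to a baseline technique."""
    return 100 * (baseline_error - forecast_error) / baseline_error

# (Last 30 Days error, Load Forecast error) taken from the figures above.
rows = {
    "Philadelphia - Citywide": (6.8, 4.1),
    "Philadelphia - Divisions": (8.1, 5.8),
    "Philadelphia - Districts": (10.9, 9.3),
    "Lincoln, NE - Citywide": (13.0, 10.0),
}

reductions = {area: error_reduction(base, load) for area, (base, load) in rows.items()}
```

Under this assumption each computed value falls within one percentage point of the published Error Reduction for its row.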
  • 38. Improving CompStat
    • Load forecasting
        • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
  • 39. Improving CompStat
  • 40. What’s next?
  • 41. Contact Information Jeremy Heffner HunchLab Product Manager [email_address] 215.701.7712 www.azavea.com/hunchlab