2011 NIJ Crime Mapping Conference - Data Mining and Risk Forecasting in Web-based Analysis Tools
  • We make custom map requests, provide data for CompStat and support a Web site that is updated daily to support daily functions of districts.

Presentation Transcript

  • Data Mining and Risk Forecasting in Web-Based Analysis Tools
    • 340 N 12th St, Suite 402, Philadelphia, PA 19107
    • 215.925.2600, [email_address], www.azavea.com/hunchlab
  • Agenda
    • Who we are
    • What we are looking to do with software
    • What we’ve learned building risk forecasting features
      • (forecasting versus prediction)
  • About Azavea
    • Founded in 2000
    • 30 people
    • Based in Philadelphia
      • Boston office
      • Minneapolis office
    • Geospatial + web + mobile
      • Software development
      • Spatial analysis services
  • Clients & Industries
    • Public Safety
    • Municipal Services
    • Public Health
    • Human Services
    • Culture
    • Elections & Politics
    • Land Conservation
    • Economic Development
  • HunchLab was developed, in part, based upon work supported by the National Science Foundation under Grant Nos. IIP-0637589 and IIP-0750507.
  • The Backstory
  • How Phila PD uses GIS
    • Customized Map Products
    • Weekly CompStat Meetings
    • Web Crime Analysis
  • Incident data flow (diagram labels)
    • People: Complainant, 911 Operator, Radio Dispatcher, Police Officer, District 48 Desk
    • Systems: Verizon 911, CAD, INCT, PARS (District X, District Y, District Z)
    • Incident Report Completed by Officer
    • Daily Download & Geocoding Routines
    • Maps Distributed Through Intranet, Printing, CompStat
    • INCT & PARS: main database sources; over 5,000 incidents daily, over 2 million annually
  • The Context
    • 1,500,000 people
    • 7,000 police officers
    • 1,000 civilian employees
    • 2,000,000 new incidents / year
    • 3 crime analysts
  • How can software help?
  • Our goals with software
    • Improve ease of use
      • Increase consumers of analysis
    • Automate time-intensive routines
      • Free up resources for things that can’t be automated
    • Increase sophistication
      • Accomplish things not possible manually
    • web-based crime analysis, early warning, and risk forecasting
    • Crime Analysis
      • Mapping (spatial / temporal densities)
      • Trending
      • Intelligence Dashboard
    • Early Warning
      • Statistical & Threshold-based Hunches (data mining)
      • Alerting
    • Risk Forecasting
      • Near Repeat Pattern
      • Load Forecasting
  • Near Repeat Pattern Analysis
  • Contagious Crime?
    • Near repeat pattern analysis
        • “If one burglary occurs, how does the risk change nearby?”
  • Near Repeat Pattern Analysis
    • How can you test your own data?
      • Near Repeat Calculator
        • http://www.temple.edu/cj/misc/nr/
    • Papers
      • Near-Repeat Patterns in Philadelphia Shootings (2008)
        • One city block & two weeks after one shooting
          • 33% increase in likelihood of a second event
    • Jerry Ratcliffe, Temple University
  • Near Repeat Pattern Analysis
    • The goal:
      • Quantify short term risk due to near-repeat victimization
        • “If one burglary occurs, how does the risk of burglary for the neighbors change?”
    • What we know:
      • Incident A (place, time) --> Incident B (place, time)
        • Distance between A and B
        • Timeframe between A and B
    • What we need to know:
      • What distances/timeframes are not simply random?
  • Near Repeat Pattern Analysis
    • The process
      • Observe the pattern in historic data
      • Simulate the pattern in randomized historic data
      • Compare the observed pattern to the simulated patterns
      • Apply the non-random pattern to new incidents
    • An example
      • 180 days of burglaries in Division 6 of Philadelphia
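The observe / simulate / compare steps above can be sketched as a simple Monte Carlo permutation test. This is only an illustration under stated assumptions, not the Near Repeat Calculator's or HunchLab's actual implementation: the function names, the single distance/time band, the straight-line distance, and the default of 199 shuffles are all hypothetical choices made for the example.

```python
import random
from itertools import combinations


def count_close_pairs(points, times, max_dist, max_days):
    """Count incident pairs that are close in both space and time."""
    count = 0
    for i, j in combinations(range(len(points)), 2):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        close_in_space = (dx * dx + dy * dy) ** 0.5 <= max_dist
        close_in_time = abs(times[i] - times[j]) <= max_days
        if close_in_space and close_in_time:
            count += 1
    return count


def near_repeat_test(points, times, max_dist, max_days, runs=199, seed=0):
    """Compare the observed pair count with counts from shuffled dates.

    points: list of (x, y) coordinates in one planar unit (e.g. feet)
    times:  list of incident days (numbers), same order as points
    Returns (observed count, observed / mean simulated count, p-value).
    """
    rng = random.Random(seed)
    # Step 1: observe the pattern in the historic data.
    observed = count_close_pairs(points, times, max_dist, max_days)
    # Step 2: simulate by shuffling dates across the fixed locations.
    shuffled = list(times)
    simulated = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        simulated.append(
            count_close_pairs(points, shuffled, max_dist, max_days)
        )
    # Step 3: compare observed with simulated counts.
    at_least = sum(1 for s in simulated if s >= observed)
    p_value = (at_least + 1) / (runs + 1)
    mean_simulated = sum(simulated) / runs
    ratio = observed / mean_simulated if mean_simulated else float("inf")
    return observed, ratio, p_value
```

A ratio well above 1 with a small p-value suggests that the chosen distance/time band holds more pairs than chance alone would produce. A fuller analysis would repeat this for a grid of distance and time bands.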
  • Near Repeat Pattern Analysis
    • What did we learn?
      • Having a reference implementation was very helpful
        • Aids in translation of research into software
      • Analysts simplify things to make operationalization possible
        • They simplify risk bands to ease map making
      • Academics leave large questions unanswered
        • What happens when risk areas overlap?
  • Load Forecasting
  • Improving CompStat
    • Load forecasting
        • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
  • What Do We Mean By Load Forecasting?
    • Load forecasting
        • Generating aggregate crime counts for a future timeframe using cyclical time series analysis
    • Steps: measure cyclical patterns, identify non-cyclical trend, forecast expected count
    • Reference: bit.ly/gorrcrimeforecastingpaper
  • Load Forecasting
    • Measure cyclical patterns
        • Take historic incidents (for example: last five years)
        • Generate multiplicative seasonal indices
          • For each time cycle:
            • time of year
            • day of week
            • time of day
          • Count incidents within each time unit (for example: Monday)
          • Calculate average per time unit if incidents were evenly distributed
          • Divide counts within each time unit by the calculated average to generate multiplicative indices
            • Index ~ 1 means at the average
            • Index > 1 means above average
            • Index < 1 means below average
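As a rough illustration of the index calculation above, the sketch below computes day-of-week indices from a list of incident dates. The function name is hypothetical, and the example assumes the history window contains each weekday equally often (for instance, a whole number of weeks); the same idea would apply to time of year or time of day.

```python
from collections import Counter
from datetime import date


def day_of_week_indices(incident_dates):
    """Return seven multiplicative indices, Monday first.

    Each index is the count for that weekday divided by the count
    expected if incidents were spread evenly across all seven days.
    """
    counts = Counter(d.weekday() for d in incident_dates)
    average = len(incident_dates) / 7
    return [counts.get(day, 0) / average for day in range(7)]


# Example: 14 incidents in one week, weighted toward Monday.
incidents = (
    [date(2011, 1, 3)] * 4      # Monday
    + [date(2011, 1, 4)] * 2    # Tuesday
    + [date(2011, 1, 5)] * 2    # Wednesday
    + [date(2011, 1, 6)] * 2    # Thursday
    + [date(2011, 1, 7)] * 2    # Friday
    + [date(2011, 1, 8)]        # Saturday
    + [date(2011, 1, 9)]        # Sunday
)
indices = day_of_week_indices(incidents)
```

Here the even-distribution average is 2 incidents per weekday, so Monday's index is 2.0 (above average) and Saturday's is 0.5 (below average).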
  • Load Forecasting
    • Identify non-cyclical trend
        • Take recent daily counts (for example: last year daily counts)
        • Remove cyclical trends by dividing by indices
        • Run a trending function on the new counts
          • Simple average
            • Last X Days
          • Smoothing function
            • Exponential smoothing
            • Holt’s linear exponential smoothing
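The trend step above might look like the following sketch. It uses the textbook forms of simple exponential smoothing and Holt's linear exponential smoothing; the function names, the smoothing parameters (`alpha`, `beta`), and the way the level and trend are initialized are assumptions for illustration, not the settings used in the presentation.

```python
def deseasonalize(daily_counts, indices):
    """Divide each daily count by the seasonal index for that day."""
    return [count / index for count, index in zip(daily_counts, indices)]


def simple_exponential_smoothing(series, alpha=0.3):
    """Return the smoothed level after the last observation."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def holt_linear(series, alpha=0.3, beta=0.1):
    """Return (level, trend) after the last observation.

    The level tracks the deseasonalized count; the trend tracks its
    change per time step. Needs at least two observations.
    """
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend
```

A simple average or simple exponential smoothing yields only a level, so any projection from it is flat; Holt's method also estimates a slope.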
  • Load Forecasting
    • Forecast expected count
        • Project trend into future timeframe
          • Always flat
            • Simple average
            • Exponential smoothing
          • Linear trend
            • Holt’s linear exponential smoothing
        • Multiply by seasonal indices to reseasonalize the data
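A minimal sketch of this projection step, assuming a deseasonalized level and per-step trend are already available. The function name is hypothetical, and clamping the baseline at zero is an assumption added here so that a falling trend cannot produce negative counts.

```python
def forecast_counts(level, trend, future_indices):
    """Project the trend forward and reseasonalize each future step.

    level:          deseasonalized count at the last observed step
    trend:          change per step (0.0 gives a flat projection)
    future_indices: seasonal index for each future step, in order
    """
    forecasts = []
    for step, index in enumerate(future_indices, start=1):
        baseline = level + trend * step
        # Assumption: counts cannot be negative, so clamp at zero.
        forecasts.append(max(baseline, 0.0) * index)
    return forecasts


# Flat projection (as from a simple average or exponential smoothing).
flat = forecast_counts(10.0, 0.0, [2.0, 0.5])
# Linear projection (as from Holt's method) with neutral indices.
linear = forecast_counts(10.0, 1.0, [1.0, 1.0])
```

With these inputs the flat projection gives 20.0 and 5.0, and the linear projection gives 11.0 and 12.0.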
  • Load Forecasting
    • Steps: measure cyclical patterns, identify non-cyclical trend, forecast expected count
    • Reference: bit.ly/gorrcrimeforecastingpaper
  • How Do We Know It’s Accurate?
    • Testing
        • Generated multiple forecasting techniques (examples)
          • Commonly Used
            • Average of last 30 days
            • Average of last 365 days
            • Last year’s count for the same time period
          • Advanced Combinations
            • Different cyclical indices (example: day of year vs. month of year)
            • Different levels of geographic aggregation for indices
            • Different trending functions
        • Scoring methodologies (examples)
          • Mean absolute percent error (with some enhancements)
          • Mean percent error
          • Mean squared error
        • Run thousands of forecasts through testing framework
        • Choose the right technique in the right situation
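The three scoring measures named above have standard textbook definitions, sketched below. The slide mentions "some enhancements" to the mean absolute percent error without describing them, so none are reproduced here. Skipping periods with an actual count of zero and the sign convention (actual minus forecast) are assumptions of this example.

```python
def mean_absolute_percent_error(actual, forecast):
    """Average size of the error as a percent of the actual count."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def mean_percent_error(actual, forecast):
    """Signed percent error; values far from zero indicate bias."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum((a - f) / a for a, f in pairs) / len(pairs)


def mean_squared_error(actual, forecast):
    """Average squared error; penalizes large misses more heavily."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


# Example: two forecast periods, one over-forecast and one under.
actual = [100, 200]
forecast = [110, 180]
mape = mean_absolute_percent_error(actual, forecast)
mpe = mean_percent_error(actual, forecast)
mse = mean_squared_error(actual, forecast)
```

In this example each forecast misses by 10 percent in opposite directions, so the absolute percent error averages 10 while the signed percent error cancels to about zero. That contrast is one reason to report more than one measure.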
  • How Do We Know It’s Accurate?
    • Error for 28-31 day forecasts for any Part X Series
    • Forecast error by method:
      Area                       Last 30 Days   Last Year   Load Forecast   Error Reduction
      Philadelphia - Citywide    6.8%           6.5%        4.1%            39%
      Philadelphia - Divisions   8.1%           8.4%        5.8%            28%
      Philadelphia - Districts   10.9%          11.7%       9.3%            15%
      Lincoln, NE - Citywide     13%            11%         10%             23%
  • Improving CompStat
    • Load forecasting
        • “Given the time of year, day of week, time of day and general trend, what counts of crimes should I expect?”
  • What’s next?
  • Contact Information
    • Jeremy Heffner, HunchLab Product Manager
    • [email_address], 215.701.7712
    • www.azavea.com/hunchlab