HunchLab 2.0 Predictive Missions: Under the Hood
Usage Rights

© All Rights Reserved

    HunchLab 2.0 Predictive Missions: Under the Hood Presentation Transcript

    • Missions: Under the Hood 340 N 12th St, Suite 402 Philadelphia, PA 19107 215.925.2600 info@azavea.com www.hunchlab.com
    • Jeremy Heffner HunchLab Product Manager jheffner@azavea.com 215.701.7712 Amelia Longo Business Development Associate alongo@azavea.com 215.701.7715
    • Places, People, Patterns → Prioritization
    • Predictive Missions
    • It’s the fourth Tuesday in January and school is in session. There were 3 burglaries and 2 robberies yesterday. Six bars, three take-out stores, and a school are in the neighborhood. The forecast is 17° with cloudy skies. Where do you focus your 2 vehicles?
    • How would you do it?
    • Analyst Process •  Identify relevant factors –  Training / Literature –  Experience •  Use heuristics –  high concentration of past crime → higher risk –  near a bar on a Friday night → higher risk –  near the police station → lower risk –  concentration of ex-offenders → higher risk –  near transit stops → higher risk
    • ?  
    • How HunchLab Works
    • term: machine learning A computer system designed to learn how to accomplish a task by using historic data sets. There are different ways (algorithms) to accomplish this training process.
    • term: algorithm The step-by-step procedure to accomplish a given calculation. Different algorithms have different qualities. Algorithms are used to train a machine learning model.
    • Overall Process 1.  Generate training examples of outcomes 2.  Enrich with relevant variables 3.  Build models 4.  Evaluate accuracy 5.  Select best performing model
    • Generate Examples
    • ~ 500 ft cells & 1+ hour time slices
    • Data Volume •  Space –  Lincoln, NE is 90 sq miles –  500 ft cell size creates 12,000 cells •  Time –  3 years of data –  1 hour resolution –  26,000 hour blocks •  Space x Time –  312,000,000 hour block cells (examples) •  Sampling FTW! –  Outcomes are sparse (small % of examples have crimes) –  Sampling strategy preserves crime events
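The arithmetic on this slide, together with one possible way to sample while keeping every crime event, can be sketched as follows. The slide does not describe the sampling strategy in detail, so the approach here (keep all examples with a crime, keep non-crime examples at a fixed rate, and weight them by the inverse of that rate) is an assumption, and `sample_examples` and `negative_rate` are hypothetical names.

```python
import random

# Rounded figures quoted on the slide.
CELLS = 12_000          # ~500 ft cells covering Lincoln, NE
HOUR_BLOCKS = 26_000    # ~3 years at 1 hour resolution
TOTAL_EXAMPLES = CELLS * HOUR_BLOCKS  # 312,000,000 hour block cells


def sample_examples(examples, negative_rate, seed=0):
    """Keep every example with a crime; keep non-crime examples at `negative_rate`.

    Each example is a dict with a "crimes" count. Kept examples carry a
    "weight" so that each sampled non-crime example stands in for the ones
    that were dropped. Assumed strategy, not HunchLab's documented one.
    """
    rng = random.Random(seed)
    kept = []
    for ex in examples:
        if ex["crimes"] > 0:
            kept.append({**ex, "weight": 1.0})
        elif rng.random() < negative_rate:
            kept.append({**ex, "weight": 1.0 / negative_rate})
    return kept


# Toy data: 1% of examples contain a crime.
demo = [{"crimes": 1 if i % 100 == 0 else 0} for i in range(10_000)]
sample = sample_examples(demo, negative_rate=0.05)
crime_examples_kept = sum(1 for ex in sample if ex["crimes"] > 0)
```

Because crime examples are never dropped, the sparse outcomes survive even when the sample is a small fraction of the full space x time grid.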
    • Representing Crime Theories
    • Predictive Missions •  Crime predictions based on: –  Baseline crime levels •  Similar to traditional hotspot maps –  Near repeat patterns •  Event recency (contagion) –  Risk Terrain Modeling •  Proximity and density of geographic features •  Points, Lines, Polygons (bars, bus stops, etc.) –  Collective Efficacy •  Socioeconomic indicators (poverty, unemployment, etc.)
    • Predictive Missions •  Crime predictions based on: –  Routine Activity Theory •  Offender: proximity and concentration of known offenders •  Guardianship: police presence (AVL / GPS) •  Targets: measures of exposure (population, parcels, vehicles) –  Temporal cycles •  Seasonality, time of month, day of week, time of day –  Recurring temporal events •  Holidays, sporting events, etc. –  Weather •  Temperature, precipitation
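As a rough illustration of the temporal cycle variables listed above, the sketch below derives a few of them from a timestamp. The feature names and the choice of features are assumptions made for illustration; the slides do not list HunchLab's actual variables.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def temporal_features(ts):
    """Derive simple temporal-cycle variables from a timestamp."""
    return {
        "month": ts.month,                             # seasonality
        "day_of_month": ts.day,                        # time of month
        "weekday_occurrence": (ts.day - 1) // 7 + 1,   # e.g. 4 = fourth Tuesday
        "dow": DAY_NAMES[ts.weekday()],                # day of week
        "hour": ts.hour,                               # time of day
    }


# The scenario from the opening slide: the fourth Tuesday in January.
features = temporal_features(datetime(2014, 1, 28, 21))
```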
    • Representing Crime Theories Risk Terrain Modeling
    • Gun shootings example Source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm
    • crimes  prior7  prior364  dayssincelast  bardist  dow
      0       0       0         365            >2000ft  Monday
      0       0       1         234            >2000ft  Monday
      1       1       3         3              750ft    Tuesday
      0       0       2         43             500ft    Wednesday
      2       0       2         74             500ft    Friday
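The history columns in this table could be derived from past event dates along the following lines. The window boundaries and the 365-day cap are assumptions inferred from the column names and from the first row (a cell with no prior events shows 365); `history_features` is a hypothetical helper.

```python
from datetime import date, timedelta


def history_features(event_dates, as_of, cap_days=365):
    """Return prior7, prior364 and dayssincelast for one cell as of a given day.

    event_dates: dates of past crimes in the cell. Only events strictly
    before `as_of` are counted.
    """
    ages = [(as_of - d).days for d in event_dates if d < as_of]
    prior7 = sum(1 for a in ages if a <= 7)
    prior364 = sum(1 for a in ages if a <= 364)
    dayssincelast = min(min(ages), cap_days) if ages else cap_days
    return {"prior7": prior7, "prior364": prior364, "dayssincelast": dayssincelast}


as_of = date(2014, 1, 28)
# A cell with crimes 3, 50 and 200 days ago (compare the Tuesday row).
busy_cell = history_features(
    [as_of - timedelta(days=n) for n in (3, 50, 200)], as_of
)
# A cell with no recorded crimes (compare the first Monday row).
quiet_cell = history_features([], as_of)
```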
    • Representing Crime Theories Aoristic Analysis
    • crimes  probability
      0       0
      1       a
      2       b
      3       c
      4       d
    • crimes  weights  prior7  prior364  dayssincelast  bardist  dow
      0       1        0       0         365            >2000ft  Monday
      0       1        0       1         234            >2000ft  Monday
      0       0.5      1       3         3              750ft    Tuesday
      1       0.5      1       3         3              750ft    Tuesday
      0       0        0       2         43             500ft    Wednesday
      0       0.13     0       2         74             500ft    Friday
      1       0.32     0       2         74             500ft    Friday
      2       0.55     0       2         74             500ft    Friday
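One reading of this table is that an example whose crime count is uncertain is repeated once per possible count, with the probability of that count as the row weight. The sketch below shows that expansion under this assumption; `expand_uncertain_example` is a hypothetical helper, and the probabilities are the ones shown for the Friday rows.

```python
def expand_uncertain_example(features, count_probabilities):
    """Turn one example with an uncertain crime count into weighted rows.

    count_probabilities maps each possible crime count to its probability
    (the values must sum to 1). Counts with zero probability are dropped.
    """
    total = sum(count_probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return [
        {"crimes": count, "weight": p, **features}
        for count, p in sorted(count_probabilities.items())
        if p > 0
    ]


friday = {"prior7": 0, "prior364": 2, "dayssincelast": 74,
          "bardist": "500ft", "dow": "Friday"}
rows = expand_uncertain_example(friday, {0: 0.13, 1: 0.32, 2: 0.55})
```

A model that accepts per-example weights can then train on the expanded rows directly.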
    • Building Models
    • Models •  Baseline –  Baseline models (6) •  Counts –  28 day –  56 day –  364 day •  Kernel Densities –  28 day –  56 day –  364 day –  HunchLab models •  Variations of a stacked ensemble: –  examples → gradient boosting machine (gbm) → y/n probabilities –  y/n probabilities → generalized additive model (gam) → counts
    • term: decision tree A machine learning algorithm that recursively partitions a data set based upon variable values forming a tree-like structure.
    • crimes  prior7  prior364  dayssincelast  bardist  dow
      0       0       0         365            >2000ft  Monday
      0       0       1         234            >2000ft  Monday
      1       1       3         3              750ft    Tuesday
      0       0       2         43             500ft    Wednesday
      2       0       2         74             500ft    Friday
    • term: gradient boosting machine (GBM) A machine learning algorithm that uses a series of weaker models (typically decision trees) that are trained upon the residuals of prior iterations (boosting) to form one stronger model. Steps: 1. Build Decision Tree 1, predict with 1, calculate errors; 2. Build Decision Tree 2, predict with 1 & 2, calculate errors; 3. Build Decision Tree 3, predict with 1-3, calculate errors; …
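The build, predict, calculate-errors loop described above can be illustrated with a deliberately small sketch. It is not HunchLab's model: it uses a single input variable, one-split trees (stumps), and squared-error residuals on 0/1 outcomes, all of which are simplifying assumptions. The function names and toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on xs that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all xs identical: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=50, learning_rate=0.1):
    """Fit each new stump to the errors left by the stumps before it."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]  # calculate errors
        stump = fit_stump(xs, residuals)                      # build next tree
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]      # predict with all
    return base, learning_rate, stumps


def predict_gbm(model, x, n_trees=None):
    """Predict using the first n_trees stumps (all of them when None)."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps[:n_trees])


# Toy data: days since the last crime vs. whether a crime occurred.
xs = [1, 2, 3, 10, 20, 40, 80, 160]
ys = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit_gbm(xs, ys)
```

Passing `n_trees` to `predict_gbm` makes it possible to compare accuracy after different numbers of iterations without refitting.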
    • term: generalized additive model (GAM) A regression model that fits smoothed functions to the input variables. Compare to a generalized linear model which fits just a single coefficient to each variable.
    • HunchLab Model Building 1.  Build a GBM –  examples → gradient boosting machine → y/n probabilities
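    • [Diagram] 312 million examples → Sampling → 4 million → 4 folds (1 mil each) → GBM built on folds, evaluated on a held-out 1 mil fold (figure labels: 43, 200)
    • [Diagram] 312 million examples → Sampling → 4 million → GBM (figure label: 43)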
    • HunchLab Model Building 1.  Build a GBM –  examples → gradient boosting machine → y/n probabilities •  Segment examples into several folds –  For each fold build a GBM model on the rest of the data –  For each iteration in the GBMs: »  Randomly sample a portion of the data (stochastic) »  Adjust weights of observations (adaptive boosting) •  Determine how many iterations result in the most accurate model •  Build a GBM on all of the data for that many iterations
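The fold-based choice of iteration count described above can be sketched as follows. The model fitting itself is left out: the sketch assumes the held-out error of each fold has already been recorded after every boosting iteration. `make_folds`, `best_iteration_count`, and the error values are hypothetical and purely illustrative.

```python
import random


def make_folds(n_examples, n_folds=4, seed=0):
    """Randomly assign example indices to n_folds roughly equal folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def best_iteration_count(fold_errors):
    """Pick the iteration count with the lowest mean held-out error.

    fold_errors[k][i] is the error on held-out fold k after i + 1 boosting
    iterations of a model trained on the other folds.
    """
    n_iterations = min(len(errors) for errors in fold_errors)
    mean_errors = [
        sum(errors[i] for errors in fold_errors) / len(fold_errors)
        for i in range(n_iterations)
    ]
    best_index = min(range(n_iterations), key=mean_errors.__getitem__)
    return best_index + 1


folds = make_folds(10, n_folds=4)

# Made-up error curves for four folds: error falls, then rises again.
fold_errors = [
    [0.30, 0.22, 0.20, 0.21, 0.25],
    [0.28, 0.21, 0.19, 0.22, 0.24],
    [0.31, 0.23, 0.21, 0.20, 0.26],
    [0.29, 0.20, 0.18, 0.21, 0.23],
]
chosen = best_iteration_count(fold_errors)
```

The final step on the slide would then refit one GBM on all of the data for `chosen` iterations.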
    • HunchLab Model Building 2.  Build a GAM –  y/n probabilities → generalized additive model → counts •  Transforms (“bends”) GBM output into counts •  Calibrates count levels with other key variables
    • Example
    • Lincoln NE
    • Lincoln Assaults
    • Selecting Models
    • Selecting Models 1.  Build models holding out last 28 days of data 2.  Score each model –  Combine different metrics into a selection score 3.  Select best score 4.  Rebuild the best model (including last 28 days data)
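The selection steps above might look like this in outline. The slides do not say which metrics are combined or how, so the weighted sum, the metric names, and the scores below are placeholders; `split_holdout` and `select_model` are hypothetical helpers.

```python
from datetime import date, timedelta


def split_holdout(examples, last_day, holdout_days=28):
    """Split examples into training rows and the final `holdout_days` of data."""
    cutoff = last_day - timedelta(days=holdout_days)
    train = [ex for ex in examples if ex["day"] <= cutoff]
    holdout = [ex for ex in examples if ex["day"] > cutoff]
    return train, holdout


def select_model(metric_scores, metric_weights):
    """Return the model name with the highest weighted selection score.

    metric_scores: {model_name: {metric_name: value}}, higher is better.
    metric_weights: {metric_name: weight}.
    """
    def selection_score(name):
        return sum(metric_weights[m] * v for m, v in metric_scores[name].items())
    return max(metric_scores, key=selection_score)


last_day = date(2014, 1, 28)
examples = [{"day": last_day - timedelta(days=i)} for i in range(60)]
train, holdout = split_holdout(examples, last_day)

# Illustrative scores only; each model would be scored on the holdout rows.
scores = {
    "model_a": {"prioritization": 0.55, "count_accuracy": 0.60},
    "model_b": {"prioritization": 0.70, "count_accuracy": 0.65},
}
best = select_model(scores, {"prioritization": 0.5, "count_accuracy": 0.5})
```

Once `best` is known, the final step is to rebuild that model on `train + holdout`.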
    • [Figure] A map represented as a grid of cells, with crime locations marked and cells ranked highest to lowest (100% to 0%)
    • [Figure] Metrics on cells ranked highest to lowest: Percent of Patrol Area to Capture All Crimes; Average Crime Rank; Percent of Crimes Captured vs. Percent of Patrol Area
    • [Chart] Percent of Patrol Area to Capture All Crimes, by crime type (Assault, Burglary, MVT, Rape, Robbery); axis 0% to 100%
    • [Chart] Average Crime Rank, by crime type (Assault, Burglary, MVT, Rape, Robbery); axis 0 to 0.6
    • [Chart] Theft of Motor Vehicle: Percent of Crimes Captured (0 to 0.4) vs. Percent of Land Area (0 to 0.16)
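The three prioritization metrics named on these slides can be computed from a ranking of cells. The slides only name the metrics, so the exact definitions below (for example, scaling rank to the range 0 to 1) are assumptions, and the function names and toy values are hypothetical.

```python
def rank_cells(predicted):
    """Return cell ids ordered from highest to lowest predicted risk."""
    return sorted(predicted, key=predicted.get, reverse=True)


def percent_crimes_captured(predicted, crimes, area_fraction):
    """Share of crimes falling in the top `area_fraction` of ranked cells."""
    ranked = rank_cells(predicted)
    n_top = int(round(area_fraction * len(ranked)))
    top = set(ranked[:n_top])
    total = sum(crimes.values())
    return sum(n for cell, n in crimes.items() if cell in top) / total


def percent_area_to_capture_all(predicted, crimes):
    """Smallest share of ranked cells that contains every crime."""
    ranked = rank_cells(predicted)
    deepest = max(ranked.index(cell) for cell, n in crimes.items() if n > 0)
    return (deepest + 1) / len(ranked)


def average_crime_rank(predicted, crimes):
    """Crime-weighted mean rank position, scaled from 0 (top) to 1 (bottom)."""
    ranked = rank_cells(predicted)
    position = {cell: i / (len(ranked) - 1) for i, cell in enumerate(ranked)}
    total = sum(crimes.values())
    return sum(position[cell] * n for cell, n in crimes.items()) / total


# Toy example: five cells with predicted risk, and crimes observed in two.
predicted = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.1}
crimes = {"a": 2, "c": 1}

captured = percent_crimes_captured(predicted, crimes, area_fraction=0.4)
area_needed = percent_area_to_capture_all(predicted, crimes)
avg_rank = average_crime_rank(predicted, crimes)
```

Under these definitions, a better model captures more crimes in a small share of the patrol area, needs less area to capture all crimes, and has a lower average crime rank.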
    • Overall Process 1.  Generate training examples of outcomes 2.  Enrich with relevant variables 3.  Build models 4.  Evaluate accuracy 5.  Select best performing model
    • Our Solution •  Learns from several years of your data •  Automatically determines which theories apply –  more than just crime data •  Prevents over-fitting •  Calibrates predictions •  Selects a model based upon a blind evaluation –  prioritization and count-based metrics •  But it still cannot make your morning coffee
    • Additional Information •  How did HunchLab originate? •  How does HunchLab represent crime theories? •  What data is needed? •  How does the modeling work specifically?
    • Questions