5.
It’s the fourth Tuesday in January and
school is in session. There were 3
burglaries and 2 robberies yesterday.
Six bars, three take-out stores, and a
school are in the neighborhood. The
forecast is 17° with cloudy skies.
Where do you focus your 2 vehicles?
7.
Analyst Process
• Identify relevant factors
– Training / Literature
– Experience
• Use heuristics
– high concentration of past crime → higher risk
– near a bar on a Friday night → higher risk
– near the police station → lower risk
– concentration of ex-offenders → higher risk
– near transit stops → higher risk
10.
term: machine learning
A computer system designed to learn how to
accomplish a task by using historical data sets.
There are different ways (algorithms) to
accomplish this training process.
11.
term: algorithm
The step-by-step procedure to accomplish a
given calculation. Different algorithms have
different qualities. Algorithms are used to train
a machine learning model.
12.
Overall Process
1. Generate training examples of outcomes
2. Enrich with relevant variables
3. Build models
4. Evaluate accuracy
5. Select best performing model
15.
Data Volume
• Space
– Lincoln, NE is 90 sq miles
– 500 ft cell size creates 12,000 cells
• Time
– 3 years of data
– 1 hour resolution
– 26,000 hour blocks
• Space x Time
– 312,000,000 hour block cells (examples)
• Sampling FTW!
– Outcomes are sparse (small % of examples have crimes)
– Sampling strategy preserves crime events
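The volume arithmetic and the sampling idea above can be sketched as follows. This is a rough illustration, not HunchLab's implementation: the function names, the tuple layout, and the `negative_rate` parameter are assumptions. Pure area arithmetic gives roughly 10,000 cells for 90 sq miles at 500 ft, so the slide's 12,000 presumably reflects how the grid is actually laid out over the city; the slide's rounded figures (12,000 × 26,000) do multiply to 312,000,000.

```python
import random

SQ_FT_PER_SQ_MILE = 5280 ** 2


def grid_cell_count(area_sq_miles, cell_size_ft):
    """Approximate number of square cells needed to tile an area."""
    return round(area_sq_miles * SQ_FT_PER_SQ_MILE / cell_size_ft ** 2)


def hour_blocks(years):
    """Number of one-hour blocks in a span of 365-day years."""
    return years * 365 * 24


def sample_examples(examples, negative_rate, seed=0):
    """Keep every example with a crime; keep a random fraction of the rest.

    `examples` holds (cell_id, hour_block, crime_count) tuples
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    kept = []
    for example in examples:
        crime_count = example[2]
        if crime_count > 0 or rng.random() < negative_rate:
            kept.append(example)
    return kept
```

Because every example with a crime is kept and only the empty hour block cells are thinned out, the sparse outcomes survive the reduction in size.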
18.
Predictive Missions
• Crime predictions based on:
– Baseline crime levels
• Similar to traditional hotspot maps
– Near repeat patterns
• Event recency (contagion)
– Risk Terrain Modeling
• Proximity and density of geographic features
• Points, Lines, Polygons (bars, bus stops, etc.)
– Collective Efficacy
• Socioeconomic indicators (poverty, unemployment, etc.)
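As an illustration of the proximity and density variables listed under Risk Terrain Modeling, here is a minimal sketch for point features such as bars or bus stops. The function names, the planar coordinates in feet, and the straight-line distance measure are assumptions made for the example.

```python
import math


def nearest_distance(cell_xy, feature_points):
    """Distance from a cell centroid to the closest feature point."""
    return min(math.dist(cell_xy, point) for point in feature_points)


def density_within(cell_xy, feature_points, radius):
    """Number of feature points within `radius` of the cell centroid."""
    return sum(1 for point in feature_points if math.dist(cell_xy, point) <= radius)


# Hypothetical bar locations in planar feet.
bars = [(0, 0), (300, 400), (1000, 0)]
proximity = nearest_distance((600, 800), bars)
density = density_within((600, 800), bars, radius=900)
```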
19.
Predictive Missions
• Crime predictions based on:
– Routine Activity Theory
• Offender: proximity and concentration of known offenders
• Guardianship: police presence (AVL / GPS)
• Targets: measures of exposure (population, parcels, vehicles)
– Temporal cycles
• Seasonality, time of month, day of week, time of day
– Recurring temporal events
• Holidays, sporting events, etc.
– Weather
• Temperature, precipitation
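The temporal cycle variables above can be derived directly from a timestamp. The sketch below is one possible encoding; the feature names are hypothetical and not taken from HunchLab.

```python
from datetime import datetime


def temporal_features(ts):
    """Derive simple cyclical features from a timestamp."""
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
        "weekday_occurrence": (ts.day - 1) // 7 + 1,  # e.g. 4 = fourth Tuesday
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
    }


# The fourth Tuesday in January 2015, at 10 pm.
features = temporal_features(datetime(2015, 1, 27, 22))
```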
30.
term: gradient boosting machine (GBM)
A machine learning algorithm that uses a series
of weaker models (typically decision trees) that
are trained upon the residuals of prior iterations
(boosting) to form one stronger model.
[Diagram: (1) build decision tree 1, predict with tree 1, calculate errors; (2) build decision tree 2, predict with trees 1 & 2, calculate errors; (3) build decision tree 3, predict with trees 1-3, calculate errors; …]
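The boosting loop described above (build a tree, predict, calculate errors, repeat) can be sketched in plain Python. This is a simplified illustration under stated assumptions: one input variable, single-split trees ("stumps"), squared-error residuals, and a fixed learning rate. It is not the algorithm configuration HunchLab uses.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=3, learning_rate=0.5):
    """Fit each new stump to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Each added stump shrinks the remaining error, which is why a series of weak models can add up to a stronger one.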
31.
term: generalized additive model (GAM)
A regression model that fits smoothed functions to the
input variables. Compare to a generalized linear model
which fits just a single coefficient to each variable.
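Written out, the contrast looks like this. The notation is an assumption for illustration: g is the link function, x_1 … x_p are the input variables, and each f_j is a smooth function estimated from the data.

```latex
% Generalized linear model: a single coefficient per input variable
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% Generalized additive model: a smooth function per input variable
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```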
32.
HunchLab Model Building
1. Build a GBM
– examples → gradient boosting machine → y/n probabilities
33.
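[Diagram: 312 million examples → sampling → 4 million examples → 4 folds of 1 million each; a GBM is built on three folds and evaluated on the remaining 1 million. Other figure labels: 43, 200.]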
35.
HunchLab Model Building
1. Build a GBM
– examples → gradient boosting machine → y/n probabilities
• Segment examples into several folds
– For each fold build a GBM model on the rest of the data
– For each iteration in the GBMs:
» Randomly sample a portion of the data (stochastic)
» Adjust weights of observations (adaptive boosting)
• Determine how many iterations result in the most accurate model
• Build a GBM on all of the data for that many iterations
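The fold logic above can be sketched without committing to a particular GBM library. The helper names are hypothetical, and `error_curves` is assumed to hold the held-out error of each fold's model after every boosting iteration; producing those curves is left to whatever GBM implementation is in use.

```python
def make_folds(n_examples, n_folds):
    """Assign example indices to folds in round-robin order."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_examples):
        folds[i % n_folds].append(i)
    return folds


def train_validation_splits(folds):
    """Yield (training indices, held-out indices), holding out each fold once."""
    for k, held_out in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, held_out


def best_iteration_count(error_curves):
    """Pick the iteration count with the lowest mean held-out error.

    error_curves[k][t] is the held-out error of fold k's model
    after t + 1 iterations.
    """
    n_iterations = min(len(curve) for curve in error_curves)
    mean_errors = [
        sum(curve[t] for curve in error_curves) / len(error_curves)
        for t in range(n_iterations)
    ]
    best_index = min(range(n_iterations), key=mean_errors.__getitem__)
    return best_index + 1
```

In practice the examples would be shuffled before being assigned to folds. The final model is then rebuilt on all of the data for the chosen number of iterations.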
36.
HunchLab Model Building
2. Build a GAM
– y/n probabilities → generalized additive model → counts
• Transforms (“bends”) GBM output into counts
• Calibrates count levels with other key variables
45.
Selecting Models
1. Build models holding out last 28 days of data
2. Score each model
– Combine different metrics into a selection score
3. Select best score
4. Rebuild the best model (including last 28 days data)
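A minimal sketch of the holdout and selection steps follows. The `(day, features, outcome)` tuple layout, the function names, and the convention that a higher selection score is better are all assumptions made for the example.

```python
from datetime import date, timedelta


def split_holdout(examples, holdout_days=28):
    """Split examples so that the final `holdout_days` days are held out.

    `examples` holds (day, features, outcome) tuples.
    """
    last_day = max(day for day, _, _ in examples)
    cutoff = last_day - timedelta(days=holdout_days)
    training = [e for e in examples if e[0] <= cutoff]
    holdout = [e for e in examples if e[0] > cutoff]
    return training, holdout


def select_best(scores):
    """Return the candidate model name with the highest selection score."""
    return max(scores, key=scores.get)


# Hypothetical data: one example per day for 100 days.
start = date(2015, 1, 1)
examples = [(start + timedelta(days=i), None, 0) for i in range(100)]
training, holdout = split_holdout(examples)
winner = select_best({"model_a": 0.6, "model_b": 0.8})
```

Scoring on days the models never saw keeps the evaluation blind; the winning configuration is then rebuilt with the held-out days included.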
46.
A map represented as a grid of cells
[Figure: crime locations plotted on the grid; cells ranked highest to lowest, on a scale from 100% to 0%.]
47.
[Figure: cells ranked highest to lowest (100% to 0%), annotated with "Percent of Patrol Area to Capture All Crimes" and "Average Crime Rank". Chart: Percent of Crimes Captured vs. Percent of Patrol Area, both axes 0% to 100%.]
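The prioritization metrics named in these figures can be sketched as follows. The exact definitions are an interpretation of the slide labels rather than HunchLab's published formulas, and the function names and dictionary inputs are hypothetical.

```python
def rank_cells(predicted_risk):
    """Cell ids ordered from highest to lowest predicted risk."""
    return sorted(predicted_risk, key=predicted_risk.get, reverse=True)


def percent_crimes_captured(predicted_risk, crime_counts, area_fraction):
    """Share of crimes that fall in the top `area_fraction` of ranked cells."""
    ranked = rank_cells(predicted_risk)
    n_patrolled = round(area_fraction * len(ranked))
    total = sum(crime_counts.values())
    captured = sum(crime_counts.get(cell, 0) for cell in ranked[:n_patrolled])
    return captured / total


def area_to_capture_all(predicted_risk, crime_counts):
    """Fraction of ranked cells needed before every crime is covered."""
    ranked = rank_cells(predicted_risk)
    last = max(i for i, cell in enumerate(ranked) if crime_counts.get(cell, 0) > 0)
    return (last + 1) / len(ranked)


def average_crime_rank(predicted_risk, crime_counts):
    """Mean rank position of crimes, as a fraction of all cells (lower is better)."""
    ranked = rank_cells(predicted_risk)
    position = {cell: (i + 1) / len(ranked) for i, cell in enumerate(ranked)}
    total = sum(crime_counts.values())
    return sum(position[cell] * n for cell, n in crime_counts.items()) / total


# Hypothetical five-cell example.
risk = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.7, "e": 0.3}
crimes = {"a": 3, "d": 1, "e": 1}
```

A model that ranks cells well captures most crimes in a small share of the patrol area, so all three numbers reward putting crime-heavy cells near the top of the ranking.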
48.
[Chart: Percent of Patrol Area to Capture All Crimes (0% to 100%) by crime type: Assault, Burglary, MVT, Rape, Robbery.]
50.
[Chart: Theft of Motor Vehicle. Percent of Crimes Captured (0 to 0.4) vs. Percent of Land Area (0 to 0.16).]
51.
Overall Process
1. Generate training examples of outcomes
2. Enrich with relevant variables
3. Build models
4. Evaluate accuracy
5. Select best performing model
52.
Our Solution
• Learns from several years of your data
• Automatically determines which theories apply
– more than just crime data
• Prevents over-fitting
• Calibrates predictions
• Selects a model based upon a blind evaluation
– prioritization and count-based metrics
• But it still cannot make your morning coffee
54.
Additional Information
• How did HunchLab originate?
• How does HunchLab represent crime theories?
• What data is needed?
• How does the modeling work specifically?
55.
Questions
340 N 12th St, Suite 402
Philadelphia, PA 19107
215.925.2600
info@azavea.com
www.hunchlab.com
56.
Jeremy Heffner
HunchLab Product Manager
jheffner@azavea.com
215.701.7712
Amelia Longo
Business Development Associate
alongo@azavea.com
215.701.7715