Missions: Under the Hood

340 N 12th St, Suite 402
Philadelphia, PA 19107
215.925.2600
info@azavea.com
www.hunchlab.com
Jeremy Heffner
HunchLab Product Manager
jheffner@azavea.com
215.701.7712

Amelia Longo
Business Development Associate
alon...
Places, People, Patterns → Prioritization
Predictive Missions
It’s the fourth Tuesday in January and
school is in session. There were 3
burglaries and 2 robberies yesterday.
Six bars, ...
How would you do it?
Analyst Process
•  Identify relevant factors
–  Training / Literature
–  Experience

•  Use heuristics
–  hig...
?	
  
How HunchLab Works
term: machine learning
A computer system designed to learn how to
accomplish a task by using historic data sets.
There are...
term: algorithm
The step-by-step procedure to accomplish a
given calculation. Different algorithms have
different qualitie...
Overall Process
1.  Generate training examples of outcomes
2.  Enrich with relevant variables
3.  Build models
4.  Evaluat...
Generate Examples
~ 500 ft cells & 1+ hour time slices
Data Volume
•  Space
–  Lincoln, NE is 90 sq miles
–  500 ft cell size creates 12,000 cells

•  Time
–  3 years of data
– ...
Representing Crime Theories
Predictive Missions
•  Crime predictions based on:
–  Baseline crime levels
•  Similar to traditional hotspot maps

–  Nea...
Predictive Missions
•  Crime predictions based on:
–  Routine Activity Theory
•  Offender: proximity and concentration of ...
Representing Crime Theories
Risk Terrain Modeling
Gun shootings example
Source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm
  
crimes  prior7  prior364  dayssincelast  bardist  dow
0       0       0         365            >2000ft  Monday
0       0       1         234            >2000ft  Monday
1       1       ...
Representing Crime Theories
Aoristic Analysis
crimes  probability
0       0
1       a
2       b
3       c
4       d
crimes  weights  prior7  prior364  dayssincelast  bardist  dow
0       1        0       0         365            >2000ft  Monday
0       1        0       1         234            >2000f...
Building Models
Models
•  Baseline
–  Baseline models (6)
•  Counts
–  28 day
–  56 day
–  364 day

•  Kernel Densities
–  28 day
–  56 da...
term: decision tree
A machine learning algorithm that recursively
partitions a data set based upon variable
values forming...
crimes  prior7  prior364  dayssincelast  bardist  dow
0       0       0         365            >2000ft  Monday
0       0       1         234            >2000ft  Monday
1       1       ...
term: gradient boosting machine (GBM)
A machine learning algorithm that uses a series
of weaker models (typically decision...
term: generalized additive model (GAM)
A regression model that fits smoothed functions to the
input variables. Compare to ...
HunchLab Model Building
1.  Build a GBM
–  examples → gradient boosting machine → y/n probabilities
[Diagram: 312 million → Sampling → 4 million → 4 folds (1 mil each) → GBM; 1 mil → Evaluate; values shown: 43, 200]
[Diagram: 312 million → Sampling → 4 million → GBM; value shown: 43]
HunchLab Model Building
1.  Build a GBM
–  examples → gradient boosting machine → y/n probabilities
•  Segment examples ...
HunchLab Model Building
2.  Build a GAM
–  y/n probabilities → generalized additive model → counts
•  Transforms (“bends...
Example
Lincoln NE
Lincoln Assaults
Lincoln Assaults
Lincoln Assaults
Lincoln Assaults
Lincoln Assaults
Selecting Models
Selecting Models
1.  Build models holding out last 28 days of data
2.  Score each model
–  Combine different metrics into...
[Figure: a map represented as a grid of cells, with crime locations marked and cells ranked highest to lowest (100% to 0%)]
[Figure: cells ranked highest to lowest (100% to 0%), illustrating Percent of Patrol Area to Capture All Crimes, Average Crime Rank, and Perce...]
[Chart: Percent of Patrol Area to Capture All Crim..., by crime type (Assault, Burglary, MVT, Rape, Robbery)]
[Chart: Average Crime Rank, by crime type (Assault, Burglary, MVT, Rape, Robbery); axis 0 to 0.6]
[Chart: Theft of Motor Vehicle. Percent of Crimes Captured vs. Percent of Land Area]
Overall Process
1.  Generate training examples of outcomes
2.  Enrich with relevant variables
3.  Build models
4.  Evaluat...
Our Solution
•  Learns from several years of your data
•  Automatically determines which theories apply
–  more than just ...
Additional Information
•  How did HunchLab originate?
•  How does HunchLab represent crime theories?
•  What data is neede...
Questions

HunchLab 2.0 Predictive Missions: Under the Hood
Published in: Technology, Education
Transcript of "HunchLab 2.0 Predictive Missions: Under the Hood"

  1. Missions: Under the Hood. 340 N 12th St, Suite 402, Philadelphia, PA 19107; 215.925.2600; info@azavea.com; www.hunchlab.com
  2. Jeremy Heffner, HunchLab Product Manager, jheffner@azavea.com, 215.701.7712. Amelia Longo, Business Development Associate, alongo@azavea.com, 215.701.7715
  3. Places, People, Patterns → Prioritization
  4. Predictive Missions
  5. It’s the fourth Tuesday in January and school is in session. There were 3 burglaries and 2 robberies yesterday. Six bars, three take-out stores, and a school are in the neighborhood. The forecast is 17° with cloudy skies. Where do you focus your 2 vehicles?
  6. How would you do it?
  7. Analyst Process •  Identify relevant factors –  Training / Literature –  Experience •  Use heuristics –  high concentration of past crime → higher risk –  near a bar on a Friday night → higher risk –  near the police station → lower risk –  concentration of ex-offenders → higher risk –  near transit stops → higher risk
  8. ?
  9. How HunchLab Works
  10. term: machine learning. A computer system designed to learn how to accomplish a task by using historic data sets. There are different ways (algorithms) to accomplish this training process.
  11. term: algorithm. The step-by-step procedure to accomplish a given calculation. Different algorithms have different qualities. Algorithms are used to train a machine learning model.
  12. Overall Process 1.  Generate training examples of outcomes 2.  Enrich with relevant variables 3.  Build models 4.  Evaluate accuracy 5.  Select best performing model
  13. Generate Examples
  14. ~ 500 ft cells & 1+ hour time slices
  15. Data Volume •  Space –  Lincoln, NE is 90 sq miles –  500 ft cell size creates 12,000 cells •  Time –  3 years of data –  1 hour resolution –  26,000 hour blocks •  Space x Time –  312,000,000 hour block cells (examples)
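The Data Volume figures can be checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the variable names are illustrative. Dividing 90 sq miles by a 500 ft cell gives roughly 10,000 cells, fewer than the 12,000 quoted. The slides do not explain the difference; one possibility (an assumption) is that the grid covers an extent larger than the city's exact area.

```python
# Figures quoted on the Data Volume slide (Lincoln, NE).
FEET_PER_MILE = 5280
CELL_SIZE_FT = 500
CITY_SQ_MILES = 90
SLIDE_CELLS = 12_000        # cell count as stated on the slide
SLIDE_HOUR_BLOCKS = 26_000  # hour blocks as stated on the slide

# Cells implied by area alone (the slide quotes a larger number).
cells_from_area = CITY_SQ_MILES * FEET_PER_MILE ** 2 / CELL_SIZE_FT ** 2

# Three years at 1 hour resolution, ignoring leap days.
hour_blocks = 3 * 365 * 24

# Space x Time: one example per cell per hour block.
examples_slide = SLIDE_CELLS * SLIDE_HOUR_BLOCKS
examples_unrounded = SLIDE_CELLS * hour_blocks

print(round(cells_from_area), hour_blocks, examples_slide, examples_unrounded)
```

The slide's 312,000,000 follows from the rounded 26,000 hour blocks; the unrounded hour count gives a slightly larger total.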
  16. Data Volume •  Space –  Lincoln, NE is 90 sq miles –  500 ft cell size creates 12,000 cells •  Time –  3 years of data –  1 hour resolution –  26,000 hour blocks •  Space x Time –  312,000,000 hour block cells (examples) •  Sampling FTW! –  Outcomes are sparse (small % of examples have crimes) –  Sampling strategy preserves crime events
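The slide says only that the sampling strategy preserves crime events. One common way to do that, shown here as a minimal sketch and not as HunchLab's actual procedure, is to keep every example that has a crime and keep a random fraction of the zero-crime examples. The function name, the `negative_rate` parameter, and the compensating weights are all assumptions made for illustration.

```python
import random

def sample_examples(examples, negative_rate, seed=0):
    """Keep every crime example; keep zero-crime examples with probability negative_rate.

    Returns (example, weight) pairs. Each kept zero-crime example is weighted by
    1 / negative_rate so that weighted totals still reflect the full data set.
    """
    rng = random.Random(seed)
    sampled = []
    for example in examples:
        if example["crimes"] > 0:
            sampled.append((example, 1.0))
        elif rng.random() < negative_rate:
            sampled.append((example, 1.0 / negative_rate))
    return sampled

# Toy data: 1% of the hour-block cells contain a crime.
examples = [{"cell": i, "crimes": 1 if i % 100 == 0 else 0} for i in range(10_000)]
sampled = sample_examples(examples, negative_rate=0.05)
positives = sum(1 for example, _ in sampled if example["crimes"] > 0)
print(len(examples), len(sampled), positives)
```

All 100 crime examples survive the sampling while the zero-crime examples shrink to a small fraction of their original volume.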
  17. Representing Crime Theories
  18. Predictive Missions •  Crime predictions based on: –  Baseline crime levels •  Similar to traditional hotspot maps –  Near repeat patterns •  Event recency (contagion) –  Risk Terrain Modeling •  Proximity and density of geographic features •  Points, Lines, Polygons (bars, bus stops, etc.) –  Collective Efficacy •  Socioeconomic indicators (poverty, unemployment, etc.)
  19. Predictive Missions •  Crime predictions based on: –  Routine Activity Theory •  Offender: proximity and concentration of known offenders •  Guardianship: police presence (AVL / GPS) •  Targets: measures of exposure (population, parcels, vehicles) –  Temporal cycles •  Seasonality, time of month, day of week, time of day –  Recurring temporal events •  Holidays, sporting events, etc. –  Weather •  Temperature, precipitation
  20. Representing Crime Theories: Risk Terrain Modeling
  21. Gun shootings example. Source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm
  22. crimes  prior7  prior364  dayssincelast  bardist  dow
      0       0       0         365            >2000ft  Monday
      0       0       1         234            >2000ft  Monday
      1       1       3         3              750ft    Tuesday
      0       0       2         43             500ft    Wednesday
      2       0       2         74             500ft    Friday
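The history columns in the table above could be derived from a cell's past events roughly as follows. This is a sketch under assumptions: the slides do not define the columns, so `prior7` and `prior364` are read here as counts of crimes in the preceding 7 and 364 days, and `dayssincelast` as days since the most recent crime, capped at 365 when there is none. The function name and the integer day numbering are illustrative.

```python
def history_features(event_days, today, cap=365):
    """Derive recency features for one cell.

    event_days: day numbers (ints) on which a crime occurred in the cell.
    today: the day being predicted; only earlier events are used.
    """
    past = [day for day in event_days if day < today]
    prior7 = sum(1 for day in past if today - day <= 7)
    prior364 = sum(1 for day in past if today - day <= 364)
    dayssincelast = min(today - max(past), cap) if past else cap
    return {"prior7": prior7, "prior364": prior364, "dayssincelast": dayssincelast}

# Inputs chosen so the outputs match the history columns of three table rows.
print(history_features([], today=400))               # no history
print(history_features([166], today=400))            # one event 234 days ago
print(history_features([100, 200, 397], today=400))  # recent activity
```

Under this reading the counts overlap (an event 3 days ago contributes to both `prior7` and `prior364`), which is consistent with the Tuesday row of the table.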
  23. Representing Crime Theories: Aoristic Analysis
  24. crimes  probability
      0       0
      1       a
      2       b
      3       c
      4       d
  25. crimes  weights  prior7  prior364  dayssincelast  bardist  dow
      0       1        0       0         365            >2000ft  Monday
      0       1        0       1         234            >2000ft  Monday
      0       0.5      1       3         3              750ft    Tuesday
      1       0.5      1       3         3              750ft    Tuesday
      0       0        0       2         43             500ft    Wednesday
      0       0.13     0       2         74             500ft    Friday
      1       0.32     0       2         74             500ft    Friday
      2       0.55     0       2         74             500ft    Friday
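The weighted rows above suggest that an example with uncertain event timing is expanded into one row per possible crime count, weighted by its probability. The slides do not say how the probabilities are computed. The sketch below assumes the textbook aoristic approach: an event known only to fall inside a time window is spread evenly over that window, and events are treated as independent. All function names are illustrative, and hours are plain numbers measured from a shared origin.

```python
def aoristic_weight(window_start, window_end, block_start, block_end):
    """Probability that an event inside [window_start, window_end) fell in the block."""
    overlap = min(window_end, block_end) - max(window_start, block_start)
    return max(0.0, overlap) / (window_end - window_start)

def count_distribution(probabilities):
    """Distribution of the crime count given independent per-event probabilities."""
    dist = [1.0]  # with no events, the count is 0 with certainty
    for p in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for count, mass in enumerate(dist):
            updated[count] += mass * (1.0 - p)  # event fell outside the block
            updated[count + 1] += mass * p      # event fell inside the block
        dist = updated
    return dist

def weighted_rows(features, probabilities):
    """Expand one example into a row per possible count, weighted by probability."""
    dist = count_distribution(probabilities)
    return [dict(features, crimes=count, weight=w) for count, w in enumerate(dist) if w > 0]

# A burglary reported between 22:00 and 02:00 (hour 26), scored for the 23:00 block.
print(aoristic_weight(22, 26, 23, 24))
# One event with an even chance of falling in the block yields two half-weight rows.
print(weighted_rows({"dow": "Tuesday"}, [0.5]))
```

The second call produces the same shape as the two Tuesday rows in the table: one row with 0 crimes and one with 1 crime, each weighted 0.5.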
  26. Building Models
  27. Models •  Baseline –  Baseline models (6) •  Counts –  28 day –  56 day –  364 day •  Kernel Densities –  28 day –  56 day –  364 day –  HunchLab models •  Variations of a stacked ensemble: –  examples → gradient boosting machine (gbm) → y/n probabilities –  y/n probabilities → generalized additive model (gam) → counts
  28. term: decision tree. A machine learning algorithm that recursively partitions a data set based upon variable values forming a tree-like structure.
  29. crimes  prior7  prior364  dayssincelast  bardist  dow
      0       0       0         365            >2000ft  Monday
      0       0       1         234            >2000ft  Monday
      1       1       3         3              750ft    Tuesday
      0       0       2         43             500ft    Wednesday
      2       0       2         74             500ft    Friday
  30. term: gradient boosting machine (GBM). A machine learning algorithm that uses a series of weaker models (typically decision trees) that are trained upon the residuals of prior iterations (boosting) to form one stronger model. Steps: (1) Build Decision Tree 1, predict with 1, calculate errors; (2) Build Decision Tree 2, predict with 1 & 2, calculate errors; (3) Build Decision Tree 3, predict with 1-3, calculate errors; …
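The build, predict, calculate-errors loop in the GBM definition can be sketched in plain Python. This is a simplified illustration, not HunchLab's model: it boosts single-split trees ("stumps") on one variable with a squared-error loss, whereas the slides describe a GBM that outputs y/n probabilities. The function names, the learning rate, and the toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best predicts the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, iterations=20, learning_rate=0.5):
    """Train each new stump on the errors left by the stumps before it."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(ys)
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]  # calculate errors
        tree = fit_stump(xs, residuals)                       # build the next tree
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)            # predict with all trees so far
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

# Toy data: crime counts fall as days since the last crime grow.
xs = [1, 3, 10, 40, 100, 300]
ys = [3, 2, 2, 1, 0, 0]
model = boost(xs, ys)
baseline = sum(ys) / len(ys)
mse_baseline = sum((y - baseline) ** 2 for y in ys) / len(ys)
mse_model = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_baseline, mse_model)
```

Each stump alone is a weak predictor; the combined model fits the training data far better than predicting the mean.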
  31. term: generalized additive model (GAM). A regression model that fits smoothed functions to the input variables. Compare to a generalized linear model which fits just a single coefficient to each variable.
  32. HunchLab Model Building 1.  Build a GBM –  examples → gradient boosting machine → y/n probabilities
  33. [Diagram: 312 million → Sampling → 4 million → 4 folds (1 mil each) → GBM; 1 mil → Evaluate; values shown: 43, 200]
  34. [Diagram: 312 million → Sampling → 4 million → GBM; value shown: 43]
  35. HunchLab Model Building 1.  Build a GBM –  examples → gradient boosting machine → y/n probabilities •  Segment examples into several folds –  For each fold build a GBM model on the rest of the data –  For each iteration in the GBMs: »  Randomly sample a portion of the data (stochastic) »  Adjust weights of observations (adaptive boosting) •  Determine how many iterations result in the most accurate model •  Build a GBM on all of the data for that many iterations
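The fold procedure above, choosing the iteration count by held-out accuracy, might look like the following sketch. It is a simplification under assumptions: it boosts one-variable stumps with squared error, and it omits the stochastic sampling and observation re-weighting that the slide mentions. The function names, fold count, and toy data are illustrative.

```python
import random

def fit_stump(xs, residuals):
    """Single-split tree on one variable, chosen to minimise squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def staged_errors(train, test, iterations, learning_rate=0.3):
    """Boost on train; return the held-out squared error after each iteration."""
    xs, ys = zip(*train)
    base = sum(ys) / len(ys)
    train_pred = [base] * len(train)
    test_pred = [base] * len(test)
    errors = []
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, train_pred)]
        tree = fit_stump(xs, residuals)
        train_pred = [p + learning_rate * tree(x) for x, p in zip(xs, train_pred)]
        test_pred = [p + learning_rate * tree(x) for (x, _), p in zip(test, test_pred)]
        errors.append(sum((y - p) ** 2 for (_, y), p in zip(test, test_pred)))
    return errors

def best_iteration_count(examples, folds=4, iterations=30, seed=0):
    """For each fold, train on the rest and score on the fold; pick the best count."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::folds] for i in range(folds)]
    totals = [0.0] * iterations
    for i, held_out in enumerate(parts):
        train = [ex for j, part in enumerate(parts) if j != i for ex in part]
        for n, error in enumerate(staged_errors(train, held_out, iterations)):
            totals[n] += error
    return totals.index(min(totals)) + 1

# Toy data: a noisy step in the outcome at x = 50.
rng = random.Random(1)
examples = [(x, (1.0 if x >= 50 else 0.0) + rng.gauss(0, 0.3)) for x in range(100)]
best = best_iteration_count(examples)
print(best)
```

The final step on the slide would then train one model on all of the data for `best` iterations. Stopping at the count that scored best on held-out folds is what guards against over-fitting.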
  36. HunchLab Model Building 2.  Build a GAM –  y/n probabilities → generalized additive model → counts •  Transforms (“bends”) GBM output into counts •  Calibrates count levels with other key variables
  37. Example
  38. Lincoln NE
  39. Lincoln Assaults
  40. Lincoln Assaults
  41. Lincoln Assaults
  42. Lincoln Assaults
  43. Lincoln Assaults
  44. Selecting Models
  45. Selecting Models 1.  Build models holding out last 28 days of data 2.  Score each model –  Combine different metrics into a selection score 3.  Select best score 4.  Rebuild the best model (including last 28 days data)
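The four selection steps can be sketched with the count baselines from the Models slide standing in for the candidates. This is an illustration under assumptions: the real selection score combines several metrics, while this sketch uses a single made-up score (the share of held-out crimes that fall in the model's top-ranked cell). The function names and toy data are hypothetical.

```python
def build_count_model(window_days, train):
    """Baseline: crimes per cell over the last window_days days of the training data."""
    last_day = max(ex["day"] for ex in train)
    counts = {}
    for ex in train:
        if ex["day"] > last_day - window_days:
            counts[ex["cell"]] = counts.get(ex["cell"], 0) + ex["crimes"]
    return counts

def top_cell_capture(model, holdout):
    """Share of held-out crimes that occurred in the model's highest-ranked cell."""
    top = max(model, key=model.get)
    total = sum(ex["crimes"] for ex in holdout)
    return sum(ex["crimes"] for ex in holdout if ex["cell"] == top) / total

def select_model(windows, examples, holdout_days=28):
    cutoff = max(ex["day"] for ex in examples) - holdout_days
    train = [ex for ex in examples if ex["day"] <= cutoff]   # 1. hold out the last 28 days
    holdout = [ex for ex in examples if ex["day"] > cutoff]
    scores = {w: top_cell_capture(build_count_model(w, train), holdout)
              for w in windows}                              # 2. score each model
    best = max(scores, key=scores.get)                       # 3. select the best score
    return best, build_count_model(best, examples), scores   # 4. rebuild on all data

# Toy data: cell A has steady recent crime; cell B was a hotspot long ago.
examples = [{"day": d, "cell": "A", "crimes": 1} for d in range(10, 401, 10)]
examples += [{"day": d, "cell": "B", "crimes": 1} for d in range(1, 51)]
best_window, final_model, scores = select_model([28, 364], examples)
print(best_window, scores, final_model)
```

Because the held-out 28 days are never seen while building the candidates, the comparison is blind. The winning model is rebuilt afterwards so that it also learns from the most recent data.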
  46. [Figure: a map represented as a grid of cells, with crime locations marked and cells ranked highest to lowest (100% to 0%)]
  47. [Figure: cells ranked highest to lowest (100% to 0%), illustrating three metrics: Percent of Patrol Area to Capture All Crimes; Average Crime Rank; Percent of Crimes Captured vs. Percent of Patrol Area (0% to 100%)]
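The three metrics named on these slides can be computed from a ranked list of cells. The slides do not give formulas, so the definitions below are one reasonable reading and should be treated as assumptions: cells are sorted by predicted risk, and each cell's position is expressed as the fraction of the area covered when patrolling down to that cell.

```python
def ranking_metrics(predicted, crimes):
    """predicted: {cell: predicted risk}; crimes: {cell: observed crime count}."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    n = len(ranked)
    total = sum(crimes.values())
    # Fraction of cells covered when patrolling from the top down to each cell.
    position = {cell: (i + 1) / n for i, cell in enumerate(ranked)}
    crime_cells = [cell for cell in ranked if crimes.get(cell, 0) > 0]

    # Percent of patrol area needed before every crime is covered.
    area_to_capture_all = max(position[cell] for cell in crime_cells)
    # Average position of a crime in the ranking (lower is better).
    average_crime_rank = sum(position[cell] * crimes[cell] for cell in crime_cells) / total
    # Percent of crimes captured vs. percent of patrol area.
    curve, captured = [], 0
    for cell in ranked:
        captured += crimes.get(cell, 0)
        curve.append((position[cell], captured / total))
    return area_to_capture_all, average_crime_rank, curve

predicted = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.1}
crimes = {"a": 2, "c": 1}
area, avg_rank, curve = ranking_metrics(predicted, crimes)
print(area, avg_rank, curve)
```

In this toy case all crimes are covered after patrolling the top three of five cells, and the top cell alone captures two of the three crimes.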
  48. [Chart: Percent of Patrol Area to Capture All Crimes, by crime type (Assault, Burglary, MVT, Rape, Robbery); axis 0% to 100%]
  49. [Chart: Average Crime Rank, by crime type (Assault, Burglary, MVT, Rape, Robbery); axis 0 to 0.6]
  50. [Chart: Theft of Motor Vehicle. Percent of Crimes Captured (0 to 0.4) vs. Percent of Land Area (0 to 0.16)]
  51. Overall Process 1.  Generate training examples of outcomes 2.  Enrich with relevant variables 3.  Build models 4.  Evaluate accuracy 5.  Select best performing model
  52. Our Solution •  Learns from several years of your data •  Automatically determines which theories apply –  more than just crime data •  Prevents over-fitting •  Calibrates predictions •  Selects a model based upon a blind evaluation –  prioritization and count-based metrics
  53. Our Solution •  Learns from several years of your data •  Automatically determines which theories apply –  more than just crime data •  Prevents over-fitting •  Calibrates predictions •  Selects a model based upon a blind evaluation –  prioritization and count-based metrics •  But it still cannot make your morning coffee
  54. Additional Information •  How did HunchLab originate? •  How does HunchLab represent crime theories? •  What data is needed? •  How does the modeling work specifically?
  55. Questions. 340 N 12th St, Suite 402, Philadelphia, PA 19107; 215.925.2600; info@azavea.com; www.hunchlab.com
  56. Jeremy Heffner, HunchLab Product Manager, jheffner@azavea.com, 215.701.7712. Amelia Longo, Business Development Associate, alongo@azavea.com, 215.701.7715. 340 N 12th St, Suite 402, Philadelphia, PA 19107; 215.925.2600; info@azavea.com; www.hunchlab.com
