
© All Rights Reserved

- 1. Missions: Under the Hood | 340 N 12th St, Suite 402, Philadelphia, PA 19107 | 215.925.2600 | info@azavea.com | www.hunchlab.com
- 2. Jeremy Heffner, HunchLab Product Manager, jheffner@azavea.com, 215.701.7712 | Amelia Longo, Business Development Associate, alongo@azavea.com, 215.701.7715
- 3. Places, People, Patterns → Prioritization
- 4. Predictive Missions
- 5. It’s the fourth Tuesday in January and school is in session. There were 3 burglaries and 2 robberies yesterday. Six bars, three take-out stores, and a school are in the neighborhood. The forecast is 17° with cloudy skies. Where do you focus your 2 vehicles?
- 6. How would you do it?
- 7. Analyst Process • Identify relevant factors – Training / Literature – Experience • Use heuristics – high concentration of past crime → higher risk – near a bar on a Friday night → higher risk – near the police station → lower risk – concentration of ex-offenders → higher risk – near transit stops → higher risk
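
To see why hand-built heuristics are hard to tune, the rules of thumb above can be written as a tiny scoring function. This is a minimal sketch: the `CellContext` fields, the point values, and the cells `A` and `B` are all hypothetical, and picking those point values is exactly the guesswork the analyst has to do by hand.

```python
from dataclasses import dataclass


@dataclass
class CellContext:
    """Hypothetical snapshot of one grid cell at one point in time."""
    past_crimes: int
    near_bar: bool
    is_friday_night: bool
    near_police_station: bool
    ex_offenders_nearby: int
    near_transit_stop: bool


def heuristic_risk(c: CellContext) -> float:
    """Add or subtract points for each rule of thumb; higher means riskier.
    The point values are arbitrary placeholders."""
    score = 1.0 * c.past_crimes              # concentration of past crime -> higher
    if c.near_bar and c.is_friday_night:
        score += 2.0                         # near a bar on a Friday night -> higher
    if c.near_police_station:
        score -= 1.5                         # near the police station -> lower
    score += 0.5 * c.ex_offenders_nearby     # concentration of ex-offenders -> higher
    if c.near_transit_stop:
        score += 1.0                         # near transit stops -> higher
    return score


cells = {
    "A": CellContext(3, True, True, False, 2, True),
    "B": CellContext(1, False, True, True, 0, False),
}
# Rank cells from highest to lowest heuristic risk.
ranked = sorted(cells, key=lambda k: heuristic_risk(cells[k]), reverse=True)
```

Every weight here is a judgment call, and the rules never check themselves against outcomes; the rest of the deck replaces this step with models trained on historical data.
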
- 8. ?
- 9. How HunchLab Works
- 10. term: machine learning. A computer system designed to learn how to accomplish a task by using historical data sets. There are different ways (algorithms) to accomplish this training process.
- 11. term: algorithm. The step-by-step procedure used to carry out a given calculation. Different algorithms have different qualities. Algorithms are used to train a machine learning model.
- 12. Overall Process 1. Generate training examples of outcomes 2. Enrich with relevant variables 3. Build models 4. Evaluate accuracy 5. Select best performing model
- 13. Generate Examples
- 14. ~ 500 ft cells & 1+ hour time slices
- 16. Data Volume • Space – Lincoln, NE is 90 sq miles – 500 ft cell size creates 12,000 cells • Time – 3 years of data – 1 hour resolution – 26,000 hour blocks • Space × Time – 312,000,000 hour-block cells (examples) • Sampling FTW! – Outcomes are sparse (only a small % of examples have crimes) – Sampling strategy preserves crime events
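
The slide's arithmetic, plus one possible sampling scheme, fits in a few lines. The sketch below keeps every crime example and thins the crime-free ones, re-weighting the survivors by the inverse of the keep rate. That weighting rule, the `sample_examples` helper, and the toy data are assumptions for illustration; the slide only says the sampling strategy preserves crime events.

```python
import random

# Space x time, using the slide's rounded figures.
CELLS = 12_000                 # 500 ft cells covering Lincoln, NE
HOURS = 3 * 365 * 24           # 26,280 one-hour blocks; the slide rounds to 26,000
EXAMPLES = CELLS * 26_000      # 312,000,000 cell-hour examples


def sample_examples(examples, keep_fraction, rng):
    """Keep every example that had a crime; keep each crime-free example with
    probability keep_fraction and weight it by 1 / keep_fraction so the sample
    still stands in for the full population (assumed weighting scheme)."""
    sampled = []
    for ex in examples:
        if ex["crimes"] > 0:
            sampled.append({**ex, "weight": 1.0})
        elif rng.random() < keep_fraction:
            sampled.append({**ex, "weight": 1.0 / keep_fraction})
    return sampled


# Toy data: 10,000 examples, 1% of which contain a crime.
rng = random.Random(42)
examples = [{"id": i, "crimes": 1 if i % 100 == 0 else 0} for i in range(10_000)]
sample = sample_examples(examples, keep_fraction=0.125, rng=rng)
```

Because crimes are rare, almost all of the size reduction comes from the crime-free examples, while every positive example survives.
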
- 17. Representing Crime Theories
- 18. Predictive Missions • Crime predictions based on: – Baseline crime levels • Similar to traditional hotspot maps – Near repeat patterns • Event recency (contagion) – Risk Terrain Modeling • Proximity and density of geographic features • Points, Lines, Polygons (bars, bus stops, etc.) – Collective Efficacy • Socioeconomic indicators (poverty, unemployment, etc.)
- 19. Predictive Missions • Crime predictions based on: – Routine Activity Theory • Offender: proximity and concentration of known offenders • Guardianship: police presence (AVL / GPS) • Targets: measures of exposure (population, parcels, vehicles) – Temporal cycles • Seasonality, time of month, day of week, time of day – Recurring temporal events • Holidays, sporting events, etc. – Weather • Temperature, precipitation
- 20. Representing Crime Theories: Risk Terrain Modeling
- 21. Gun shootings example. Source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm
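- 22. Example training data:

| crimes | prior7 | prior364 | dayssincelast | bardist | dow |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 365 | >2000ft | Monday |
| 0 | 0 | 1 | 234 | >2000ft | Monday |
| 1 | 1 | 3 | 3 | 750ft | Tuesday |
| 0 | 0 | 2 | 43 | 500ft | Wednesday |
| 2 | 0 | 2 | 74 | 500ft | Friday |
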
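
The columns in this table can be derived from a cell's crime history and its distance to the nearest bar. A minimal sketch, assuming one example per cell per day, the hypothetical helpers `make_features` and `bar_distance_bucket`, made-up distance bands, and a 365-day cap on `dayssincelast`; the invented inputs below yield the feature values of the first Monday row and the Tuesday row.

```python
from datetime import date
from typing import List

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def bar_distance_bucket(feet: float) -> str:
    """Hypothetical distance bands loosely echoing the table's labels."""
    if feet <= 500:
        return "500ft"
    if feet <= 750:
        return "750ft"
    if feet <= 2000:
        return "2000ft"
    return ">2000ft"


def make_features(day: date, past_crimes: List[date], nearest_bar_ft: float) -> dict:
    """Feature row for one cell on `day`; past_crimes are earlier crime dates in that cell."""
    ages = [(day - d).days for d in past_crimes if d < day]   # days ago, always >= 1
    return {
        "prior7": sum(1 for a in ages if a <= 7),
        "prior364": sum(1 for a in ages if a <= 364),
        "dayssincelast": min(min(ages, default=365), 365),    # capped at 365 (assumption)
        "bardist": bar_distance_bucket(nearest_bar_ft),
        "dow": DAY_NAMES[day.weekday()],
    }


# Invented histories: 2014-01-28 is a Tuesday and 2014-01-27 a Monday.
tuesday = make_features(date(2014, 1, 28),
                        [date(2014, 1, 25), date(2013, 11, 1), date(2013, 6, 15)], 700)
monday = make_features(date(2014, 1, 27), [], 3000)
```

The `crimes` column is the outcome to predict, so it is counted for the example's own time slice and never fed in as a feature.
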
- 23. Representing Crime Theories: Aoristic Analysis
- 24. Crimes vs. probability:

| crimes | probability |
|---|---|
| 0 | 0 |
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |

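- 25. Example training data with weights:

| crimes | weights | prior7 | prior364 | dayssincelast | bardist | dow |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 365 | >2000ft | Monday |
| 0 | 1 | 0 | 1 | 234 | >2000ft | Monday |
| 0 | 0.5 | 1 | 3 | 3 | 750ft | Tuesday |
| 1 | 0.5 | 1 | 3 | 3 | 750ft | Tuesday |
| 0 | 0 | 0 | 2 | 43 | 500ft | Wednesday |
| 0 | 0.13 | 0 | 2 | 74 | 500ft | Friday |
| 1 | 0.32 | 0 | 2 | 74 | 500ft | Friday |
| 2 | 0.55 | 0 | 2 | 74 | 500ft | Friday |
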
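
In the weighted table, an original example can be split into one row per possible crime count; for the Friday example the weights add up to 0.13 + 0.32 + 0.55 = 1.00. The slides do not say how the weights are derived. Aoristic analysis typically apportions a crime across the time slices its reported window overlaps; the sketch below uses that share as the probability that the crime fell in the slice and, as a further assumption, treats crimes as independent when combining them into a count distribution. All helper names are hypothetical.

```python
def overlap_fraction(start_h, end_h, slice_start_h, slice_end_h):
    """Share of a crime's possible time window [start_h, end_h) that lies
    inside the time slice [slice_start_h, slice_end_h)."""
    width = end_h - start_h
    if width <= 0:                                   # exact time known
        return 1.0 if slice_start_h <= start_h < slice_end_h else 0.0
    overlap = max(0.0, min(end_h, slice_end_h) - max(start_h, slice_start_h))
    return overlap / width


def count_distribution(probs):
    """P(k crimes in the slice) for k = 0..n, treating each crime as an
    independent yes/no event with its own probability."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)     # this crime fell outside the slice
            nxt[k + 1] += q * p       # this crime fell inside the slice
        dist = nxt
    return dist


def expand_row(row, probs):
    """Replace one example with weighted copies, one per possible crime count."""
    return [{**row, "crimes": k, "weight": w}
            for k, w in enumerate(count_distribution(probs)) if w > 0]


# A crime reported as occurring between 18:00 and 22:00, scored for the
# 20:00-24:00 slice: half of its window overlaps, giving a 0.5 / 0.5 split.
p = overlap_fraction(18, 22, 20, 24)
tuesday_rows = expand_row({"prior7": 1, "prior364": 3, "dow": "Tuesday"}, [p])
```

The expanded rows are then used as ordinary training examples, with the weights telling the learner how much each one counts.
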
- 26. Building Models
- 27. Models • Baseline models (6) – Counts: 28-day, 56-day, 364-day – Kernel densities: 28-day, 56-day, 364-day • HunchLab models – Variations of a stacked ensemble: examples → gradient boosting machine (GBM) → y/n probabilities; y/n probabilities → generalized additive model (GAM) → counts
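
The six baseline models are simple functions of recent history. A minimal sketch, assuming each crime record carries a cell id, an (x, y) position in feet, and a date; the Gaussian kernel, the 500 ft bandwidth, and the `baseline_models` helper are illustrative choices, not the product's settings.

```python
import math
from datetime import date, timedelta

WINDOWS = (28, 56, 364)   # trailing windows in days, as on the slide


def in_window(d, as_of, days):
    """True if date d falls within the `days` days before as_of."""
    return as_of - timedelta(days=days) <= d < as_of


def gaussian(dx, dy, bandwidth_ft):
    """Unnormalised Gaussian kernel weight for an offset in feet."""
    return math.exp(-(dx * dx + dy * dy) / (2 * bandwidth_ft ** 2))


def baseline_models(cells, crimes, as_of, bandwidth_ft=500.0):
    """Count and kernel-density surfaces for each trailing window.
    cells: cell id -> (x_ft, y_ft) centroid; crimes: dicts with 'cell', 'xy', 'date'."""
    out = {}
    for w in WINDOWS:
        recent = [c for c in crimes if in_window(c["date"], as_of, w)]
        # Count model: crimes that fell inside each cell.
        out[f"count_{w}"] = {
            cell: sum(1 for c in recent if c["cell"] == cell) for cell in cells
        }
        # Density model: nearby crimes also raise a cell's score.
        out[f"kde_{w}"] = {
            cell: sum(gaussian(c["xy"][0] - x, c["xy"][1] - y, bandwidth_ft) for c in recent)
            for cell, (x, y) in cells.items()
        }
    return out


as_of = date(2014, 1, 28)
cells = {"A": (0.0, 0.0), "B": (1000.0, 0.0)}
crimes = [
    {"cell": "A", "xy": (0.0, 0.0), "date": as_of - timedelta(days=10)},
    {"cell": "A", "xy": (0.0, 0.0), "date": as_of - timedelta(days=40)},
    {"cell": "B", "xy": (1000.0, 0.0), "date": as_of - timedelta(days=200)},
]
models = baseline_models(cells, crimes, as_of)
```

Cell B gets a small 28-day density score from the crime in neighbouring cell A even though its own 28-day count is zero, which is the practical difference between the two baseline families.
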
- 28. term: decision tree. A machine learning algorithm that recursively partitions a data set based on variable values, forming a tree-like structure.
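
Applied to the example table from slide 22 (numeric columns only), a small regression tree shows what recursive partitioning means. This is a minimal sketch that picks, at each node, the threshold minimizing squared error, with a hypothetical depth limit of 2; production tree learners add many refinements.

```python
def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys, features):
    """Try every (feature, threshold) pair; return the lowest-error split."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if left and right:
                err = sse(left) + sse(right)
                if best is None or err < best[0]:
                    best = (err, f, t)
    return best


def build_tree(rows, ys, features, depth=0, max_depth=2, min_rows=2):
    """Recursively partition rows until max_depth is reached or no split helps."""
    mean = sum(ys) / len(ys)
    can_split = depth < max_depth and len(rows) >= min_rows
    split = best_split(rows, ys, features) if can_split else None
    if split is None or split[0] >= sse(ys):
        return {"value": mean}
    _, f, t = split
    go_left = [r[f] <= t for r in rows]

    def subset(side):
        idx = [i for i, g in enumerate(go_left) if g == side]
        return [rows[i] for i in idx], [ys[i] for i in idx]

    return {
        "feature": f, "threshold": t,
        "left": build_tree(*subset(True), features, depth + 1, max_depth, min_rows),
        "right": build_tree(*subset(False), features, depth + 1, max_depth, min_rows),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while "value" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


FEATURES = ["prior7", "prior364", "dayssincelast"]
rows = [
    {"prior7": 0, "prior364": 0, "dayssincelast": 365},
    {"prior7": 0, "prior364": 1, "dayssincelast": 234},
    {"prior7": 1, "prior364": 3, "dayssincelast": 3},
    {"prior7": 0, "prior364": 2, "dayssincelast": 43},
    {"prior7": 0, "prior364": 2, "dayssincelast": 74},
]
crimes = [0, 0, 1, 0, 2]
tree = build_tree(rows, crimes, FEATURES)
```

On these five rows the tree first separates cells with at most one crime in the prior year, then splits the rest on days since the last crime.
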
- 30. term: gradient boosting machine (GBM). A machine learning algorithm that trains a series of weaker models (typically decision trees), each on the residuals of the prior iterations (boosting), to form one stronger model. Diagram: (1) build decision tree 1 → predict with 1 → calculate errors; (2) build decision tree 2 → predict with 1 & 2 → calculate errors; (3) build decision tree 3 → predict with 1–3 → calculate errors; …
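
The build, predict, calculate-errors loop in the diagram can be sketched with depth-1 trees (stumps) on the same five example rows from slide 22. This sketch, with hypothetical names `fit_stump` and `fit_gbm`, uses squared-error residuals for simplicity; a model producing y/n probabilities would typically boost a log-loss instead, and the learning rate and tree count are arbitrary.

```python
def fit_stump(rows, residuals, features):
    """Depth-1 regression tree: the single split that best fits the residuals."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[f] <= t]
            right = [e for r, e in zip(rows, residuals) if r[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:                      # no usable split: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda r: mean
    _, f, t, lm, rm = best
    return lambda r: lm if r[f] <= t else rm


def fit_gbm(rows, ys, features, n_trees=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is trained on the errors
    left by all the stumps before it."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]        # calculate errors
        stump = fit_stump(rows, residuals, features)         # build tree k
        stumps.append(stump)
        preds = [p + learning_rate * stump(r) for p, r in zip(preds, rows)]  # predict with 1..k
    return lambda r: base + learning_rate * sum(s(r) for s in stumps)


FEATURES = ["prior7", "prior364", "dayssincelast"]
rows = [
    {"prior7": 0, "prior364": 0, "dayssincelast": 365},
    {"prior7": 0, "prior364": 1, "dayssincelast": 234},
    {"prior7": 1, "prior364": 3, "dayssincelast": 3},
    {"prior7": 0, "prior364": 2, "dayssincelast": 43},
    {"prior7": 0, "prior364": 2, "dayssincelast": 74},
]
crimes = [0, 0, 1, 0, 2]
model = fit_gbm(rows, crimes, FEATURES)
```

The small learning rate means each stump corrects only part of the remaining error, which is why boosting needs many iterations and why the number of iterations is worth tuning.
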
- 31. term: generalized additive model (GAM). A regression model that fits a smooth function to each input variable. Compare with a generalized linear model, which fits just a single coefficient to each variable.
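
The contrast between the two model families can be written out, with $g$ a link function (for example, the log link for counts), $\beta_j$ single coefficients, and $f_j$ smooth functions such as splines:

```latex
\text{GLM:}\quad g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\qquad\qquad
\text{GAM:}\quad g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```

Read against the model-building slides, the GBM's y/n probability would be one of the $x_j$, and its smooth $f_j$ is what can "bend" that probability into an expected count.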
- 32. HunchLab Model Building 1. Build a GBM: examples → gradient boosting machine → y/n probabilities
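- 33. 312 million examples → sampling → 4 million → 4 folds of 1 million each → GBM trained on the other folds, evaluated on the held-out 1 million → 43 iterations (out of up to 200)
- 34. 312 million examples → sampling → 4 million → GBM with 43 iterations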
- 35. HunchLab Model Building 1. Build a GBM: examples → gradient boosting machine → y/n probabilities • Segment the examples into several folds – For each fold, build a GBM model on the rest of the data – For each iteration in the GBMs: » Randomly sample a portion of the data (stochastic) » Adjust the weights of observations (adaptive boosting) • Determine how many iterations result in the most accurate model • Build a GBM on all of the data for that many iterations
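
The fold-based search for the number of iterations can be sketched end to end on toy data. Assumptions: squared-error loss, stumps as the weak learners, random subsampling of half the training rows per iteration for the stochastic step, and 4 folds with at most 200 iterations; the adaptive re-weighting step on the slide is left out. All function names are hypothetical.

```python
import random


def fit_stump(rows, residuals, features):
    """Depth-1 regression tree fitted to the residuals by squared error."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[f] <= t]
            right = [e for r, e in zip(rows, residuals) if r[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:
        mean = sum(residuals) / len(residuals)
        return lambda r: mean
    _, f, t, lm, rm = best
    return lambda r: lm if r[f] <= t else rm


def boost(rows, ys, features, n_iter, lr, bag_fraction, rng):
    """Stochastic boosting: each iteration fits a stump to a random subsample."""
    base = sum(ys) / len(ys)
    preds = [base] * len(rows)
    stumps = []
    size = min(len(rows), max(2, int(bag_fraction * len(rows))))
    for _ in range(n_iter):
        idx = rng.sample(range(len(rows)), size)            # stochastic step
        stump = fit_stump([rows[i] for i in idx], [ys[i] - preds[i] for i in idx], features)
        stumps.append(stump)
        preds = [p + lr * stump(r) for p, r in zip(preds, rows)]
    return base, stumps


def predict(model, row, lr=0.1):
    base, stumps = model
    return base + lr * sum(s(row) for s in stumps)


def choose_iterations(rows, ys, features, k=4, max_iter=200, lr=0.1, bag_fraction=0.5, seed=0):
    """Pick the iteration count with the lowest total held-out error across
    k folds, then refit on all rows for that many iterations."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    total_error = [0.0] * max_iter
    for fold in folds:
        held = set(fold)
        train = [i for i in order if i not in held]         # "the rest of the data"
        base, stumps = boost([rows[i] for i in train], [ys[i] for i in train],
                             features, max_iter, lr, bag_fraction, rng)
        preds = [base] * len(fold)
        for n, s in enumerate(stumps):                      # error after n + 1 iterations
            preds = [p + lr * s(rows[i]) for p, i in zip(preds, fold)]
            total_error[n] += sum((ys[i] - p) ** 2 for p, i in zip(preds, fold))
    best_n = min(range(max_iter), key=lambda n: total_error[n]) + 1
    return best_n, boost(rows, ys, features, best_n, lr, bag_fraction, rng)


rows = [{"x": i} for i in range(40)]
ys = [1.0 if i >= 20 else 0.0 for i in range(40)]
best_n, model = choose_iterations(rows, ys, ["x"])
```

Tracking held-out error after every iteration lets one set of fold models answer "how many iterations?" without retraining for each candidate count.
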
- 36. HunchLab Model Building 2. Build a GAM: y/n probabilities → generalized additive model → counts • Transforms (“bends”) the GBM output into counts • Calibrates count levels with other key variables
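
A full GAM is beyond a short example, but the idea of bending probabilities into counts can be illustrated with a much cruder stand-in: binned calibration with linear interpolation. This is explicitly not a GAM and ignores the other calibration variables; `fit_binned_calibration` and the toy data are hypothetical.

```python
from bisect import bisect_left


def fit_binned_calibration(probs, counts, n_bins=4):
    """Map model probabilities to expected counts. Examples are sorted by
    probability and cut into equal-size bins; each bin contributes a knot
    (mean probability, mean observed count), and predictions interpolate
    linearly between knots, staying flat beyond the first and last knot."""
    pairs = sorted(zip(probs, counts))
    size = max(1, len(pairs) // n_bins)
    xs, ys = [], []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        xs.append(sum(p for p, _ in chunk) / len(chunk))
        ys.append(sum(c for _, c in chunk) / len(chunk))

    def predict(p):
        if p <= xs[0]:
            return ys[0]
        if p >= xs[-1]:
            return ys[-1]
        j = bisect_left(xs, p)            # xs[j - 1] < p <= xs[j]
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (p - x0) / (x1 - x0)

    return predict


probs = [0.1] * 4 + [0.3] * 4 + [0.5] * 4 + [0.7] * 4
counts = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 2, 1, 2]
to_count = fit_binned_calibration(probs, counts)
```

The resulting curve is nonlinear (counts climb faster at high probabilities), which is the kind of shape a smooth GAM term would capture more gracefully.
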
- 37. Example
- 38. Lincoln NE
- 39–43. Lincoln Assaults
- 44. Selecting Models
- 45. Selecting Models 1. Build models holding out the last 28 days of data 2. Score each model – Combine different metrics into a selection score 3. Select the model with the best score 4. Rebuild the best model (including the last 28 days of data)
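
The four selection steps map onto a small driver function. A minimal sketch with hypothetical names; the slides combine several metrics into the selection score, whereas this example uses a single placeholder metric (negative mean absolute error of counts) and two trivial candidate models.

```python
from datetime import date, timedelta


def select_model(candidates, rows, today, score, holdout_days=28):
    """candidates: name -> fit(train_rows) returning a predict(row) function.
    rows: examples with 'date' and 'crimes' keys.
    score(predict, test_rows): higher is better."""
    cutoff = today - timedelta(days=holdout_days)
    train = [r for r in rows if r["date"] < cutoff]      # 1. hold out the last 28 days
    test = [r for r in rows if r["date"] >= cutoff]
    scores = {name: score(fit(train), test) for name, fit in candidates.items()}  # 2. score
    best = max(scores, key=scores.get)                    # 3. select the best score
    return best, candidates[best](rows), scores           # 4. rebuild on all data


def selection_score(predict, rows):
    """Placeholder metric: negative mean absolute error of predicted counts."""
    return -sum(abs(predict(r) - r["crimes"]) for r in rows) / len(rows)


def fit_mean(train):
    m = sum(r["crimes"] for r in train) / len(train)
    return lambda r: m


def fit_zero(train):
    return lambda r: 0.0


today = date(2014, 1, 28)
rows = [{"date": today - timedelta(days=i), "crimes": 1} for i in range(1, 61)]
best, model, scores = select_model({"mean": fit_mean, "zero": fit_zero},
                                   rows, today, selection_score)
```

Scoring on data the candidates never saw is what makes the comparison blind; the final refit then recovers the most recent four weeks for the model that goes into production.
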
- 46. [Figure: a map represented as a grid of cells with crime locations; cells ranked highest to lowest, from 100% to 0%]
- 47. [Chart: Percent of Crimes Captured vs. Percent of Patrol Area (0% to 100%), cells ranked highest to lowest; marked metrics: Percent of Patrol Area to Capture All Crimes, Average Crime Rank]
- 48. [Chart: Percent of Patrol Area to Capture All Crimes (0% to 100%) for Assault, Burglary, MVT, Rape, and Robbery]
- 49. [Chart: Average Crime Rank (0 to 0.6) for Assault, Burglary, MVT, Rape, and Robbery]
- 50. [Chart: Theft of Motor Vehicle, Percent of Crimes Captured (0 to 0.4) vs. Percent of Land Area (0 to 0.16)]
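
The metrics in these charts can be computed from a ranking of cells and the crimes observed during the evaluation period. A minimal sketch; the exact definitions used here (average crime rank as the crime-weighted mean of each cell's rank position divided by the number of cells, where 0 means the top cell, and the patrol-area metric as the share of top-ranked cells needed to contain every crime) are assumptions about how the charts were built.

```python
def evaluate(predicted, observed):
    """predicted: cell -> risk score; observed: cell -> crimes in the test period.
    Returns (average crime rank, share of area to capture all crimes, capture curve)."""
    order = sorted(predicted, key=predicted.get, reverse=True)   # highest risk first
    n = len(order)
    position = {cell: i for i, cell in enumerate(order)}
    total = sum(observed.values())
    # Average crime rank: lower is better; 0.0 means every crime hit the top cell.
    avg_rank = sum(position[c] / n * k for c, k in observed.items()) / total
    # Share of the area, taken from the top of the ranking, that contains every crime.
    deepest = max(position[c] for c, k in observed.items() if k > 0)
    area_all = (deepest + 1) / n
    # Capture curve: (share of area patrolled, share of crimes captured).
    curve, captured = [], 0
    for i, cell in enumerate(order, start=1):
        captured += observed.get(cell, 0)
        curve.append((i / n, captured / total))
    return avg_rank, area_all, curve


avg_rank, area_all, curve = evaluate({"A": 0.9, "B": 0.5, "C": 0.2, "D": 0.1},
                                     {"A": 2, "C": 1})
```

A perfect ranking pushes the capture curve toward the top-left corner, which is what the charts compare across crime types.
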
- 53. Our Solution • Learns from several years of your data • Automatically determines which theories apply – more than just crime data • Prevents over-fitting • Calibrates predictions • Selects a model based upon a blind evaluation – prioritization and count-based metrics • But it still cannot make your morning coffee
- 54. Additional Information • How did HunchLab originate? • How does HunchLab represent crime theories? • What data is needed? • How does the modeling work specifically?
- 55. Questions
