
Crime Analysis & Prediction System


The Crime Analysis & Prediction System (CAPS) analyzes crime data, detects crime hotspots and predicts crime.
It collects data from various sources: crime data from OpenData sites, US census data, social media, and traffic & weather data.
It leverages Microsoft's Azure cloud and on-premise technologies for back-end processing, and desktop-based tools for visualization.

Published in: Data & Analytics


  1. 1. GROW WITH BIG DATA Third Eye Consulting Services & Solutions LLC.
  2. 2. Crime Analysis & Prediction System (CAPS)
  3. 3. ORIGINALLY DEVELOPED FOR the Public Safety & National Security team, led by Sanjay Jacob, Parul Bhandari & Mahesh Punyamurthula
  4. 4. CAPS – Problem Definition. Public governments around the world need to: 1. Do more while spending the least. 2. Better manage existing resources. 3. Be proactive in battling crime. 4. Be at the right place at the right time – to beat crime with the lowest impact. 5. Know what to do, when and why.
  5. 5. CAPS – Problem Definition. Other challenges for public governments: 1. Lack of technical knowledge and resources. 2. Lack of management resources to manage, monitor and operate such systems. 3. Need to analyze disparate data sets spread across various systems and trapped in different formats. 4. Reliance on outdated infrastructure & systems – both stationary & mobile.
  6. 6. CAPS - Solution • Leverages Open Data initiatives by government bodies worldwide. • Based on Microsoft's Big Data technologies stack. • Capable of handling Big Data's Velocity, Volume and Veracity. • Easy to integrate, assemble and develop customized end-to-end solutions. • Analyzes various types of data feeds: real-time streaming & static data. • Provides comprehensive analytical capabilities. • Predicts crime patterns for efficient deployment of public safety resources.
  7. 7. CAPS - Solution • CAPS is a system to analyze & detect crime hotspots & predict crime. • Collects data from various data sources: crime data from OpenData sites, US census data, social media, and traffic & weather data. • Leverages Azure's cloud and on-premise technologies for back-end processing, and desktop-based visualization tools.
  8. 8. BENEFITS FOR THE LOCAL POLICE The police can use the system in two ways: 1. The system can alert that a crime is imminent (in the next 4 hours) based on any new traffic or weather events. 2. The police can run the system once a day and, based on the predictions, decide how to deploy resources (policemen) in each community/district.
  9. 9. TECHNICAL SECTION
  10. 10. TECHNOLOGIES USED • Azure HDInsight • MapReduce • Hive • Stream Analytics • Azure Queue • Azure Storage • SQL Azure • SQL Server • Power BI • PowerQ&A • PowerView • PowerMap
  11. 11. DATA COLLECTION LAYER: • Open data (static) • Census data (static) • Crime data (static) • Weather data (real time) • Traffic data (real time) • Social media data (real time) • Enterprise data (real time & static) • Machine data (real time & static) • Internet of Things (real time & static) • Any other data (static, real time, or both). DATA PROCESSING LAYER: cloud or on-premise. PRESENTATION LAYER.
  12. 12. ADDITIONAL DATA SOURCES The system can be further enhanced to include additional data sources as they become available. For example: • Video data • Image data • Police systems data
  13. 13. DATA COLLECTION – Windows. Data Sources (for Chicago): • Real-time tweet streams ingested from Twitter using Search APIs. • Facebook data ingested using Graph Search APIs. • Traffic data ingested from MapQuest. • Weather data ingested from Forecast.io. • Data feed ingestion is automated and captured using a custom C# code base. Pre-Processor: • Tweets are fed into the Stream Computing Layer for sentiment logic processing. • Facebook, traffic & weather data are parsed from JSON to CSV at run time. • All data is persisted in Azure Storage. • Analyzed & summarized data is persisted in SQL Azure. Storage: • Analyzed Twitter data is pushed to Windows Azure SQL. • Parsed Twitter/Facebook/traffic/weather data is persisted in Azure Storage in different containers.
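The JSON-to-CSV parsing step in the pre-processor can be illustrated with a minimal sketch. The deck says the real ingestion code is custom C#; the Python below is only a stand-in, and the function name `json_records_to_csv`, the sample payload and its field names are assumptions made for illustration, not the actual feed schema.

```python
import csv
import io
import json


def json_records_to_csv(json_text, fields):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells so every row has the same columns.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


# Hypothetical weather payload; the field names are illustrative only.
sample = (
    '[{"time": "2014-07-01T12:00", "summary": "Rain", "temperature": 71.5, "extra": 1},'
    ' {"time": "2014-07-01T13:00", "summary": "Clear"}]'
)
weather_csv = json_records_to_csv(sample, ["time", "summary", "temperature"])
print(weather_csv)
```

Fixing the column list up front keeps the CSV layout stable even when a feed adds or drops fields, which matters if the files are later loaded into tables with a fixed schema.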
  14. 14. DATA PROCESSING LAYER - Windows • Windows Azure • Windows HDInsight • Stream Analytics • Azure Queue • Azure Storage • SQL Azure • SQL Server (Diagram layers: Data Collection Layer, Data Processing Layer, Presentation Layer)
  15. 15. DATA STORAGE & PROCESSING. STORAGE: • Processed & aggregated data is ingested into SQL Azure. • HDInsight blob storage provides a reliable and scalable solution. • All data is partitioned on dates. SCHEDULER: • Calls a script on a pre-set schedule to ingest data into Hive tables. • Checks periodically to ensure normal system operations. • Inserts data incrementally. HIVE: • Contains all data as per the table schemas. • Enables HiveQL execution when requests come in from PowerBI components. HIVE Scheduled Jobs: • Daily scripts to create tables and insert data, scheduled with cron jobs. HIVE Tables: • Hold all data in full detail from all data sources. (Diagram labels: Storage, Sqoop, Scheduler, Hive, SQL Azure)
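The two storage ideas on this slide, partitioning on dates and inserting data incrementally, can be sketched in a few lines. The deck does this with scheduled Hive scripts and Sqoop; the in-memory Python below is only a stand-in, and the names `partition_by_date`, `incremental_load` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_date(rows):
    """Group rows by their 'date' field, mimicking date-partitioned storage."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["date"]].append(row)
    return dict(partitions)


def incremental_load(table, new_rows, last_loaded):
    """Append only rows dated after last_loaded; return the new high-water mark."""
    fresh = [row for row in new_rows if row["date"] > last_loaded]
    for day, rows in partition_by_date(fresh).items():
        table.setdefault(day, []).extend(rows)
    return max((row["date"] for row in fresh), default=last_loaded)


# Hypothetical batch: the first row was already loaded on an earlier run.
table = {}
batch = [
    {"date": date(2015, 1, 1), "crime": "Theft"},
    {"date": date(2015, 1, 2), "crime": "Battery"},
    {"date": date(2015, 1, 2), "crime": "Assault"},
]
mark = incremental_load(table, batch, date(2015, 1, 1))
print(mark, {day: len(rows) for day, rows in table.items()})
```

Tracking a high-water mark makes a daily job safe to re-run: feeding the same batch again with the returned mark loads nothing new.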
  16. 16. PRESENTATION LAYER – Windows • Power BI • PowerQ&A • PowerView • PowerMap • Power Query • PowerPivot • Windows 8 Apps • Mobile Apps (Diagram layers: Data Collection Layer, Data Processing Layer, Presentation Layer)
  17. 17. DATA PRESENTATION LAYER
  18. 18. DATA PRESENTATION LAYER • Excel 2013 is used as the platform and workbench for analyzing and mining data, using functionality that is familiar to most power users. • PowerPivot is the semantic layer that defines the relationships between data and calculated measures. • Data is stored in memory as a columnar database for faster retrieval. • Model data is saved as part of the Excel workbook, which makes sharing these reports very easy. • PowerMap provides an instant, overall picture of the trends happening across geographies over.. • PowerView is a Silverlight add-in that provides powerful, interactive and intuitive dashboards and reports built on top of PowerPivot's data model. It enables slicing/dicing and drilling up/down at any level of data, which is very useful for identifying trends and root causes.
  19. 19. CLOUD MODEL – Windows (cloud-based infrastructure). Components: Real-time data sources; Static data sources; Data Collection Layer (C# custom code); Message Queue Layer (Azure Event Hubs); Data Processing Layer (Stream Computing Platform - Storm); HDFS & Blob Storage (Azure); Analytics (HDInsight Hive); Analytics (Stream Analytics & MapReduce); Machine Learning Algorithms (AzureML); SQL Azure; Presentation Layer (Power BI). • Cloud-based data processing & transformations. • Cloud-based real-time & batch analytics. • Office 365's PowerBI components for ad hoc analytics. • Enabled for Windows 8 based mobile & desktop apps.
  20. 20. HYBRID MODEL – Windows (cloud-based infrastructure plus on-premise infrastructure). Components: Real-time data sources; Static data sources; Data Collection Layer (C# custom code); Message Queue Layer (Azure Event Hubs); Data Processing Layer (Azure Stream Analytics); HDFS & Blob Storage (Azure); Analytics (HDInsight Hive); Analytics (Stream Analytics & MapReduce); Machine Learning Algorithms (AzureML); SQL Server; Presentation Layer (Power BI). • PowerBI components for ad hoc analytics. • SQL Server based. • Cloud-based data processing & transformations. • Cloud-based real-time & batch analytics. • Enabled for Windows 8 based mobile & desktop apps.
  21. 21. DATA SOURCES – For Chicago
      DATA | DESCRIPTION | SOURCE
      Crime data | Historic crime case data from 2001 to present | https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
      Chicago districts | Chicago Police districts address information | https://portal.chicagopolice.org/portal/page/portal/ClearPath/Communities/Districts
      Chicago communities | Chicago community area mapping | http://en.wikipedia.org/wiki/Community_areas_in_Chicago
      Socio-economic factors | Selected socio-economic indicators for each community, such as people below poverty, unemployment and per capita income | https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
      Twitter | Tweets about Chicago | Twitter Streaming API
      Facebook | Posts about Chicago | Facebook Graph Search API
      Weather | Chicago weather data | Forecast.io
      Traffic | Chicago traffic details | MapQuest
  22. 22. ANALYTICS
  23. 23. CRIME ANALYTICS: Analyze Crime Levels • Filters (depending on data): Number of crimes, Crime types, Location, Date & time, Temperature, Residents • Graph type: Line, Bar, Pie chart, Table, Bubble
  24. 24. CRIME ANALYTICS: Analyze Crime Levels • Filters (depending on data): Number of crimes, Crime types, Location, Date & time, Temperature, Residents • Graph type: Line, Bar, Pie chart, Table, Bubble
  25. 25. CRIME ANALYTICS: Analyze Crime Levels • Filters (depending on data): Number of crimes, Crime types, Location, Date & time, Temperature, Residents • Graph type: Line, Bar, Pie chart, Table, Bubble
  26. 26. PREDICTIONS
  27. 27. FACTORS CONSIDERED FOR PREDICTING CRIME
      NAME | VALUES | COMMENTS
      Community | Community ID | This is the key. The prediction is for a specific community for a specific date & time.
      Date | Date |
      Time Period | 1: 12am – 4am; 2: 4am – 8am; 3: 8am – 12pm; 4: 12pm – 4pm; 5: 4pm – 8pm; 6: 8pm – 12am | For convenience, we have broken up a day into 6 time slots. We can change this based on the supporting data.
      Weather | 1 - Normal; 2 - Abnormal; 3 - Extreme | All weather conditions are categorized into these values. We picked suitable values for each of the weather types to get a good distribution.
      Traffic Event | 1 - Normal; 2 - Abnormal; 3 - Extreme | All traffic conditions are categorized into these values. We picked suitable values for each of the traffic types to get a good distribution.
      Traffic Event Distance from Police Station | 1 - Near; 2 - Far; 3 - Very Far | The assumption is that the farther away the event is from a police station, the higher the chances of a crime. We picked suitable values for each to get a good distribution.
      Unemployment Rate | 0 – 100 | This is the unemployment rate in that precinct.
      Number of police stations in District | Number | Assuming that propensity for crime is inversely proportional to the number of police stations.
      Crime | 1 - Theft; 2 - Assault; 3 - Burglary; 4 - Narcotics; 5 - Battery; 6 - None | This is a placeholder category. This list can be anything that is (a) supported by the underlying data and (b) of interest to law enforcement.
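The coding scheme in this table can be made concrete with a small sketch. It maps an hour of the day to one of the six 4-hour slots and turns the weather, traffic and distance categories into their numeric codes. The function names `time_slot` and `encode_row`, the dictionary keys and the sample values are assumptions made for illustration; the deck does not show how the encoding was actually implemented.

```python
# Category codes exactly as listed in the factors table.
SEVERITY = {"normal": 1, "abnormal": 2, "extreme": 3}
DISTANCE = {"near": 1, "far": 2, "very far": 3}


def time_slot(hour):
    """Map an hour of day (0-23) to one of the six 4-hour slots, numbered 1-6."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return hour // 4 + 1


def encode_row(community_id, day, hour, weather, traffic, distance,
               unemployment_rate, police_stations):
    """Build one model input row using the numeric codes from the table."""
    if not 0 <= unemployment_rate <= 100:
        raise ValueError("unemployment_rate must be in 0..100")
    return {
        "community": community_id,
        "date": day,
        "time_period": time_slot(hour),
        "weather": SEVERITY[weather],
        "traffic_event": SEVERITY[traffic],
        "traffic_distance": DISTANCE[distance],
        "unemployment_rate": unemployment_rate,
        "police_stations": police_stations,
    }


# Hypothetical observation: 5pm falls in slot 5 (4pm – 8pm).
row = encode_row(25, "2015-01-02", 17, "normal", "extreme", "far", 12.5, 2)
print(row)
```

Keeping the codes in lookup tables means the slot boundaries or category lists can be changed in one place, which matches the deck's note that the time slots may change based on the supporting data.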
  28. 28. PREDICTION MODEL • An initial prediction model is constructed from the initial dataset. • If any of the fields change value, the model is retrained. Some fields change infrequently and others change daily (e.g. social media, weather & traffic events). The model is continuously updated with new data. • The system periodically and automatically pulls the latest field values from the appropriate sources. • The model then runs against the new data to predict what kind of crime is likely to be committed in each of the communities.
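The train, retrain and predict cycle described on this slide can be sketched as follows. The deck does not name the learning algorithm (it only mentions AzureML), so the model below is a deliberately simple stand-in: it predicts the crime type seen most often for an identical combination of features. The class name `FrequencyModel`, the feature tuple layout and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Stand-in model: predicts the most frequent crime seen for a feature tuple."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, rows):
        """(Re)train from scratch on (features, crime) pairs."""
        self.counts = defaultdict(Counter)
        for features, crime in rows:
            self.counts[features][crime] += 1
        return self

    def predict(self, features, default="None"):
        """Return the most common crime for these features, or default if unseen."""
        seen = self.counts.get(features)
        if not seen:
            return default
        return seen.most_common(1)[0][0]


# Features here are (community, time_period, weather, traffic_event).
history = [
    ((1, 5, 1, 1), "Theft"),
    ((1, 5, 1, 1), "Theft"),
    ((1, 5, 1, 1), "Battery"),
]
model = FrequencyModel().fit(history)
before = model.predict((1, 5, 1, 1))

# New data arrives, so the model is retrained on the combined dataset.
new_rows = [((1, 5, 1, 1), "Battery")] * 3
model.fit(history + new_rows)
after = model.predict((1, 5, 1, 1))
print(before, after)
```

Retraining from scratch on the combined dataset is the simplest way to honor "if any of the fields change value, the model is retrained"; a production system would more likely update incrementally or retrain on a schedule.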
  29. 29. CRIME PREDICTIONS: Predict Crime • Filters (depending on data): Number of crimes, Crime types, Location, Date & time, Temperature, Residents • Graph type: Line, Bar, Pie chart, Table, Bubble
  30. 30. CRIME PREDICTIONS: Predict Crime • Filters (depending on data): Crime types, Location, Date, Time, Temperature, Traffic, Distance to police station, Weather
  31. 31. EXTENSIBILITY The system is fully extensible and future-proof. Lessons learned, patterns detected and observations made for one city can be reused and extended to other cities worldwide. The backend infrastructure will also adjust accordingly.
  32. 32. SUMMARY The Crime Analysis and Prediction System (CAPS): • Detects, analyzes & predicts crime. • Helps public governments battle crime better at lower cost. • Is based on Microsoft's Big Data technologies, both cloud and on-premise. • Is built on the robust Azure platform, which can scale vertically & horizontally. • Is customizable & extensible to meet the needs of specific business use cases.
  33. 33. THANK YOU!
