Analysis of
London’s Crime and
Census data
Pairview BI Developer Project
COLIN BARTRAM
Project Roadmap
Phase 1 Crime Data
1A - Dashboard
1B - Heat Map
Phase 2 Census Data
2A - Clustering
2B - Linear Regression
2C - Data Mining
DASHBOARD & ANALYSIS OF LONDON'S CRIME AND CENSUS DATA 2
Data Sources
• The Metropolitan Police Service (MPS) releases monthly
anonymised crime data.
• The Office for National Statistics (ONS) conducts censuses
every 10 years and releases demographic statistics based
on the responses. The latest available is from 2011.
• Geographic data was imported from ONS. The latest Local
Authority boundaries date from 2015.
Geographic granularity
Hierarchy: MSOAs > LSOAs > Locations (streets)
• MPS Crime location data includes Lower-Layer Super
Output Areas (LSOA).
• LSOA can be aggregated to various local authority units
and to Middle-Layer Super Output Areas (MSOA).
• MSOA tends to be the lowest level of geography at which
Census data is published.
• MSOA is a more consistent geographic level than local
authority. An MSOA is designed to have a population of
between 5,000 and 15,000.
Objectives
My objectives are to:
a. Visualise London crime data.
b. Analyse crime stats to highlight any contributory
demographic factors.
The analysis techniques to be employed will include:
• Linear regression
• K-means clustering
• Data mining using decision trees
High Level Architecture
• SSIS will be used to load data into an MSSQL database.
• T-SQL queries will be used to populate a data warehouse.
• Power BI will be used for visualisations.
• SSAS will be used to develop a data cube.
• Excel will be used for reporting and linear regression.
• Python will be used for clustering.
• SSAS will be used for data mining and decision tree
analysis.
(Architecture diagram components: Cloud, SSIS, MSSQL, SSAS, Python)
Phase 1
Crime Data
Data Sources
The Metropolitan Police Service (MPS) releases monthly anonymised
crime data, alongside a history of the previous two years, at
https://data.police.uk/data/
The MPS uses street snap-points to geomask the locations and does not
identify the day of the month.
The MPS data includes locations at street level, including the LSOA
(Lower Layer Super Output Area) code, which allows the data to be
aggregated at Local Authority District (LAD), Ward or Middle-Layer
Super Output Area (MSOA) level.
Geographic data is sourced from
https://data.gov.uk/dataset/bc0d1720-0275-490d-a7da-d22e69495314/lower-layer-super-output-area-2011-to-ward-2015-lookup-in-england-and-wales
Tools
• SSIS will be used to load data into SQL Server staging tables.
• T-SQL queries will be used to populate a SQL Server data warehouse.
• Staging tables are kept separate from the data warehouse so that
transformations can be handled by SQL queries, for reasons of
scalability and transparency.
• Power BI will be used to develop the dashboard.
(Data flow diagram components: Cloud, SSIS, MSSQL Staging, MSSQL DW, Power BI)
ETL Process: Extract
Download and extract the required crime and geographic data
using SSIS.
Create the PoliceStats database and staging tables on SQL Server.
Use SSIS to iterate over the folder structure to upload the
crime data to SQL Server.
Use SSIS to upload the geographic data to SQL Server.
Number of months (CSVs):     39
Number of locations:         65,699
Number of recorded crimes:   3,677,871
ETL Process: Transform
The data is well formed, and the LSOA codes linking the two data
sources are 100% valid, so no data cleansing was required.
The MPS does not currently use the fields FallsWithin and Reported
By to indicate where other police services are involved.
Crimes recorded by the MPS at geographical locations outside
of the Met Police area (the London Boroughs) are included and could
distort the results.
I added ‘No Location’ records with zero keys in the LAD, Ward,
MSOA, LSOA and Location tables, and updated the crime records
with a missing location to reference the ‘No Location’ record.
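The ‘No Location’ step was done in SQL Server. As a rough pandas sketch of the same idea, the snippet below points crime rows that have no location at a zero key. The column names and sample rows are hypothetical and are not taken from the project schema.

```python
import pandas as pd

# Key of the placeholder 'No Location' record (zero, as described above).
NO_LOCATION_KEY = 0

# Hypothetical staging rows: the second crime has no location recorded.
crimes = pd.DataFrame({
    "crime_id": [1, 2, 3],
    "location_key": [101, None, 205],
})

# Replace missing keys with the placeholder so that every crime row
# can satisfy a foreign key constraint against the Location table.
crimes["location_key"] = (
    crimes["location_key"].fillna(NO_LOCATION_KEY).astype(int)
)

print(crimes)
```

Keeping a real placeholder row, rather than allowing NULL keys, means joins to the dimension tables never drop crimes silently.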
ETL Process: Load
Created Dimension and Fact schemas and the tables for the
SQL Server data warehouse. Added foreign key constraints and
indexes.
Merged data from the staging tables into the data warehouse.
To enable a London-only visualisation and analysis of the data, I
created views that limit the selection to LAD codes beginning
with ‘E09’, which indicates a London Borough.
Crime data is released monthly, so new months are added using the
same methods without affecting data from the initial load.
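The London-only views were written in T-SQL. The same ‘E09’ prefix filter can be sketched in pandas as follows. The table, column names and LSOA codes are illustrative assumptions.

```python
import pandas as pd

# Hypothetical lookup rows: one per LSOA with its Local Authority
# District (LAD) code. The LSOA codes are illustrative only.
lsoa_lookup = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002", "E01015000"],
    "lad_code": ["E09000001", "E09000033", "E06000039"],
})


def london_only(df: pd.DataFrame, lad_column: str = "lad_code") -> pd.DataFrame:
    """Keep rows whose LAD code begins with 'E09' (a London Borough)."""
    mask = df[lad_column].str.startswith("E09")
    return df[mask].reset_index(drop=True)


london = london_only(lsoa_lookup)
print(london)
```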
Data Warehouse Design: Phase 1
Dashboard Visualisation Features
Selection by Ward Name (single select only).
Map display based on Latitude and Longitude. Tooltip on rollover to
include Location Name. Size of marker based on Crime Count. Filter
available on location.
Crime Trend stacked area chart, displaying Crime Type illustrated by
colour, with Crime Count on the y-axis and Month on the x-axis. Allow
selection on Year, Quarter and Month, or on Crime Type.
Month slicer to allow selection of any start point and end point.
Selection by Crime Type, allowing multiple selection. Include Select
All option and ability to deselect.
London Crime Heat Map
I also took a view of London
using a heat map as preparation
for further analysis.
Crime levels are generally higher
towards the centre, but the more
prominent feature is the patchiness
of the distribution. There is a
West End hot spot.
The next phase will attempt to
explain the influencing factors.
Phase 2
Census Analysis
Goals of the Analysis
To understand whether, and to what extent, crime levels
and different types of crime are affected by socio-economic
and demographic factors.
This builds on the crime data by adding MSOA-level
census data.
(Flow diagram: the Phase 1 Crime Data ETL and Phase 2 Census Data ETL load the Data Warehouse, which feeds Location Clustering, Linear Regression and Data Mining.)
Review of Crime Studies
• Crime data has been subject to quite extensive analysis using data
mining and machine learning, with the objective of crime prediction.
• A time series analysis using past crime data clearly has the most
predictive power, but lacks insight or explanatory power.
• London Landscape provides combined crime, demographic and
socio-economic datasets for local authority use.
• A US study at LSOA level from the 1990s identified Poverty,
Residential Instability, Housing and Commuting, Income, Population
and Family Disruption as the ‘themes’ whose measures correlate
well.
• I decided to see what could be accomplished by loading census data
and analysing at the level of MSOA within London.
Census Data Selection
• The first stage of feature selection is choosing which census
data to load.
• The census data includes stats based on combinations of
features. These are not useful here, as we are interested in
separating the impacts of individual variables, so they are ignored.
• I ignored stats which offer little chance of displaying much
variability at the level of MSOA. Factors such as gender and age
may strongly correlate with an individual’s propensity for
criminality, but people do not tend to concentrate sufficiently
based on those factors at the level of MSOA.
• The data was sourced from
https://www.ons.gov.uk/census/2011census/2011censusdata/bulkdata/bulkdatadownloads
Census Data Extract
Census data: Detailed Characteristics 1
  Download: BulkdatadetailedcharacteristicsmsoaE&Wandinfo3.3
  CSVs used: 6
  Fields: 28
  Type of data: Status, Coupled Y/N, Family Type, No of Cars, No of Bedrooms, Occupation
Census data: Labour Market
  Download: BulkdataLabourMarketMSOA3.5aandinfo
  CSVs used: 1
  Fields: 9
  Type of data: NS-SeC employment classifications
Census data: Detailed Characteristics 2
  Download: BulkdataDetailedCharacteristicsMSOAdataforE&WLowerGeographiesandinfo1207
  CSVs used: 5
  Fields: 41
  Type of data: Shared Y/N, Central Heating Y/N, Occupancy, Ethnicity, Religion, Residence Y/N, Dwelling Type
In each case the number of rows populated was the number of MSOAs, which is 7,201.
Only the subset of records that relate to London is used, which numbers 983.
Census Data Processing
Select ONS data of individual stats for MSOAs.
Also download the geographic relationships between LSOA
and MSOA.
Use SSIS to populate the MSSQL staging tables.
Enhance the data warehouse to support census data.
Cleanse the MSOA data where relationships are out of date.
Data Model: Phase 2 (diagram, showing the new data)
Data Cube
A data cube was created in SSAS to bring together crime data,
aggregated to the MSOA level, with the census figures.
A Geography hierarchy was added to relate the dimensions Location,
LSOA and MSOA.
Crime Per Head is calculated as the Crime Count divided by
Population.
Calculations were also created so that each census variable is
measured independently of population size.
Hierarchy: MSOAs > LSOAs > Locations (streets)
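The cube calculation itself lives in SSAS. As a minimal pandas sketch of the same arithmetic, the snippet below rolls crime counts up from LSOA to MSOA and divides by population. All codes, counts and column names are made up for illustration.

```python
import pandas as pd

# Hypothetical crime counts at LSOA level, each LSOA tagged with its MSOA.
crimes_by_lsoa = pd.DataFrame({
    "msoa_code": ["M1", "M1", "M2"],
    "lsoa_code": ["L1", "L2", "L3"],
    "crime_count": [120, 80, 50],
})

# Hypothetical census population per MSOA.
population = pd.DataFrame({
    "msoa_code": ["M1", "M2"],
    "population": [8000, 5000],
})

# Roll crime counts up the geography hierarchy (LSOA -> MSOA),
# then attach the population and divide.
msoa = (
    crimes_by_lsoa.groupby("msoa_code", as_index=False)["crime_count"]
    .sum()
    .merge(population, on="msoa_code")
)
msoa["crime_per_head"] = msoa["crime_count"] / msoa["population"]

print(msoa)
```

Dividing by population is what makes MSOAs of different sizes comparable. It also explains why areas with many visitors but few residents can show very high values.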
Feature Selection
For each feature selected for the model I created calculated
variables in the data cube.
My intention was that these should be representative of the
socio-economic nature of an area and its social assets:
1. Number of Bedrooms Per Property
2. Number of Cars Per Household
3. Percentage of buildings which are Houses
4. Percentage of persons in AB (Professional/managerial)
Occupations
5. Percentage of Houses over-occupied
These measures I then analysed in Excel.
Measure Stats
A quick view of the selected features, to check that they have a
reasonable distribution of data.
Measure Scorecard
The measures provide variety and represent different
aspects of socio-economic asset availability:
1. Bedrooms Per Property - measures the housing stock
including size of properties.
2. Cars Per Household - measures the wealth of the
population and their access to public transport
infrastructure.
3. Percent of houses - measures the nature of the housing
stock including the household living space.
4. Percentage of persons in AB (Professional/managerial)
Occupations – measures the skills and earning potential of
the local population.
5. Percentage of Houses over-occupied – measures the
interior household environment and personal living space.
Structure of Phase 2
The measures selected will be used as inputs to all the
subsequent analyses.
Additionally, the outputs of the clustering and linear
regression exercises will be combined with the
measures for the data mining exercise.
(Flow diagram: measures from the Data Warehouse feed 2A - Cluster, which outputs Location Clusters, and 2B - Linear Regression, which outputs Crime Type Categories. Both feed 2C - Data Mining, leading to the Conclusion.)
2A - Location Clustering
To supplement our quantitative measures, I wanted to derive
some qualitative data from the census data, to facilitate other
approaches for analysis and visualisation.
I decided to try to categorise the geographic areas (MSOAs) in
the census data.
I had no pre-conceived notion as to what these clusters should
represent, so I chose an unsupervised method.
The approach chosen is k-means clustering.
Technical Approach
Scikit-learn provides a Python package with a relevant toolset
for k-means clustering.
As we will be comparing indicators that are measured in
differing units, some pre-processing of the data is required to
counteract that impact. This is done with scikit-learn's
StandardScaler.
The other Python 3 packages used are pandas for data processing
and matplotlib for the visualisations.
I also used Excel and Power BI to visualise and display results.
(Tools diagram components: MSSQL, SSAS, Power BI, Python)
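The approach described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the input values are random placeholders, the column names are assumptions, and only the use of StandardScaler and k-means comes from the slides.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the MSOA measures (random, not real).
rng = np.random.default_rng(0)
measures = pd.DataFrame({
    "overoccupied_pct": rng.uniform(0.0, 0.4, 60),
    "cars_per_household": rng.uniform(0.2, 1.8, 60),
    "occupation_ab_pct": rng.uniform(0.05, 0.6, 60),
})

# Standardise each column to zero mean and unit variance so that a
# measure expressed in larger units does not dominate the distances.
scaled = StandardScaler().fit_transform(measures)

# Fit k-means with three clusters and attach the label to each row.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
measures["cluster"] = kmeans.fit_predict(scaled)

print(measures["cluster"].value_counts())
```

Fixing `random_state` makes the cluster labels reproducible between runs, which helps when the labels are later loaded into other tools.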
Measure Pre-Selection
The relationships between Cars Per Household, Bedrooms Per House and Percentage Houses look quite strong.
Including them all may be counter-productive if they duplicate an existing feature.
I selected Cars Per Household as representative of all three for the clustering exercise.
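One way to check for duplicated features is a pairwise correlation matrix. The sketch below uses synthetic data in which two columns are deliberately built from the first, and the 0.8 cut-off is an arbitrary assumption rather than a value from the project.

```python
import numpy as np
import pandas as pd

# Synthetic features: two columns are constructed from cars_per_household
# plus noise, to mimic measures that largely duplicate each other.
rng = np.random.default_rng(1)
cars = rng.uniform(0.2, 1.8, 200)
features = pd.DataFrame({
    "cars_per_household": cars,
    "bedrooms_per_house": 1.5 + cars + rng.normal(0, 0.1, 200),
    "houses_pct": 0.4 * cars + rng.normal(0, 0.05, 200),
})

# Pearson correlation between every pair of features.
corr = features.corr()

# List pairs whose correlation exceeds a hypothetical cut-off of 0.8.
THRESHOLD = 0.8
redundant = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]

print(corr.round(2))
print(redundant)
```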
Elbow Method
By varying the number of
clusters and rerunning the
clustering algorithm, we can plot
the SSE (sum of squared errors),
a goodness-of-fit measure, against
the number of clusters.
More clusters will give a better
fit, but we can see that the rate
of improvement reduces
significantly after three, so this is
a sensible number of clusters.
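A minimal sketch of the elbow calculation is shown below. It uses synthetic data with three well-separated groups, so the elbow at three is built in by construction. In scikit-learn the SSE of a fitted model is exposed as the `inertia_` attribute.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated groups (an assumption made so
# that the elbow is easy to see; the real MSOA data is less clear-cut).
centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.8,
                  random_state=0)
X = StandardScaler().fit_transform(X)

# Refit k-means for k = 1..7 and record the sum of squared errors.
sse = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse[k] = model.inertia_

for k, value in sse.items():
    print(k, round(value, 1))
```

Plotting `sse` against `k` (for example with matplotlib) gives the elbow chart: the curve drops steeply up to three clusters and flattens afterwards.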
Cluster Results
Having run our three measures through the clustering algorithm using a parameter of 3 clusters, the results can be
viewed using a series of two-dimensional views of the three dimensions.
Categorisation Results
Category             Colour   % Houses over-occupied   Cars Per Household   % Professional or Managerial Occupations
Deprived Urban       Red      High                     Low                  Low
White Collar Urban   Yellow   Low                      Low                  High
Suburban             Blue     Low                      High                 Mixed
As a sanity check on the clustering results, I loaded the categories into Power BI and combined them with
the geographic identifiers (latitude and longitude) to display the result as a map.
London Cluster Visualisation
(Map legend: Deprived Urban = Red, White Collar Urban = Yellow, Suburban = Blue)
Crime Per Head by Cluster
Crime levels are significantly
lower in Suburban areas.
Are there any significant
differences between the Crime
types recorded, that depend on
the nature of the environment?
In Excel I merged the data cube
with the cluster categorisations
and displayed the results as a
series of Pie Charts.
First is a baseline Pie Chart
including all of the locations, with
CrimeTypes ordered by volume.
Crime Type                     Deprived Urban   Suburban   White-collar Urban   All
Anti-social behaviour          0.184            0.097      0.167                0.155
Violence and sexual offences   0.146            0.082      0.111                0.118
Vehicle crime                  0.061            0.051      0.061                0.058
Other theft                    0.050            0.028      0.081                0.054
Burglary                       0.037            0.031      0.046                0.038
Criminal damage and arson      0.033            0.021      0.027                0.028
Public order                   0.030            0.016      0.029                0.026
Drugs                          0.030            0.011      0.022                0.022
Theft from the person          0.018            0.003      0.042                0.021
Shoplifting                    0.021            0.012      0.028                0.021
Robbery                        0.020            0.007      0.021                0.017
Bicycle theft                  0.010            0.002      0.018                0.010
Other crime                    0.006            0.005      0.004                0.005
Possession of weapons          0.004            0.002      0.003                0.003
All Crime Per Head             0.65             0.37       0.66                 0.58
Pie chart: crime broken down by crime type (all locations)
Anti-social behaviour 27%, Violence and sexual offences 20%, Vehicle crime 10%, Other theft 9%, Burglary 7%, Criminal damage and arson 5%, Public order 4%, Drugs 4%, Theft from the person 4%, Shoplifting 4%, Robbery 3%, Bicycle theft 2%, Other crime 1%, Possession of weapons 0%
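The pie chart shares follow from the per-head rates: each crime type's rate is divided by the total across all types. The sketch below reproduces that arithmetic from the "All" column of the table. Because the table values are rounded to three decimals, a recomputed share can differ from the chart by a percentage point.

```python
# Crime per head by type, "All" column of the table (rounded values).
rates_all = {
    "Anti-social behaviour": 0.155,
    "Violence and sexual offences": 0.118,
    "Vehicle crime": 0.058,
    "Other theft": 0.054,
    "Burglary": 0.038,
    "Criminal damage and arson": 0.028,
    "Public order": 0.026,
    "Drugs": 0.022,
    "Theft from the person": 0.021,
    "Shoplifting": 0.021,
    "Robbery": 0.017,
    "Bicycle theft": 0.010,
    "Other crime": 0.005,
    "Possession of weapons": 0.003,
}

# Total crime per head is the sum of the per-type rates.
total = sum(rates_all.values())

# Share of each crime type as a fraction of all recorded crime.
shares = {crime: rate / total for crime, rate in rates_all.items()}

for crime, share in shares.items():
    print(f"{crime}: {share:.1%}")
```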
Pie Charts by Cluster
Then we have the same analysis but using the data from the
separate clusters.
There is a degree of consistency in many of the crime type
proportions regardless of which category of area is being
analysed. Some exceptions are:
In White-collar Urban areas there is more bicycle theft and
theft from the person.
In Suburban areas there is less crime overall but
proportionately more burglary and vehicle crime.
In Deprived Urban areas there are more drugs offences and
more violence and sexual offences.
(Pie charts by cluster: crime type breakdown for White-collar Urban, Suburban and Deprived Urban areas)
2B - Linear Regression
To get a feel for the relationships between pairs of variables
and their strength, I chose to analyse the data using linear
regression.
1. The independent variables (demographics from the
census) are displayed on the x-axis.
2. The dependent variable (recorded crimes per head) is
displayed on the y-axis.
I visualised the data in Excel with a trend line added.
I calculated the Slope and Correlation values for each using
the Excel functions.
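The slides do not name the Excel functions used. As an equivalent sketch in Python, the slope of the least-squares trend line and the Pearson correlation coefficient can be computed with NumPy. The five data points below are invented for illustration.

```python
import numpy as np

# Invented sample: x is a demographic measure (e.g. bedrooms per house),
# y is recorded crimes per head for the same areas.
x = np.array([1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([1.0, 0.8, 0.7, 0.5, 0.3])

# Least-squares straight line y = slope * x + intercept (the trend line).
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

The slope says how much crimes per head changes per unit of the measure, while the correlation coefficient says how tightly the points follow the line. The two are linked by slope = r × (std of y / std of x), so a steep slope does not by itself imply a strong relationship.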
(Scatter plots with trend lines: Crimes Per Head on the y-axis against, in turn, Occupation AB Per Cent, Over-Occupied Per Cent, Cars Per Household, Houses Per Cent and Bedrooms Per House on the x-axis.)
Linear Regression Result
Bedrooms Per House was the clearest socio-economic
indicator of those tested to demonstrate a relationship with
crime levels, with a correlation coefficient of -0.43.
Cars Per Household and Houses Per Cent also showed a
correlation (-0.35 and -0.37 respectively).
These three indicators show a negative trend line and
over-occupancy a positive one, which is as expected.
Over-occupancy had a weak correlation (0.21) and Occupation
AB Per Cent showed no overall correlation.
The outliers were identified as locations such as the West
End and Heathrow Airport, which are crime locations with a
small resident population.
I also analysed each specific Crime Type against each
variable.
Linear Regression of Demographic Indicators vs Crime Per Head
(Corr = Correlation Coefficient)
                              Bedrooms Per House  Cars Per Household  Occupation AB %     Houses %            Overcrowded %
Crime Type                    Slope     Corr      Slope     Corr      Slope     Corr      Slope     Corr      Slope     Corr
All                           -0.567    -0.427    -0.457    -0.352    0.056     0.012     -0.686    -0.371    1.601     0.212
Anti-social behaviour         -0.148    -0.557    -0.125    -0.484    -0.076    -0.080    -0.184    -0.499    0.554     0.368
Violence and sexual offences  -0.089    -0.502    -0.076    -0.439    -0.179    -0.281    -0.088    -0.356    0.411     0.408
Public order                  -0.026    -0.477    -0.021    -0.399    -0.008    -0.043    -0.030    -0.404    0.073     0.238
Criminal damage and arson     -0.017    -0.477    -0.014    -0.402    -0.034    -0.262    -0.016    -0.309    0.067     0.326
Bicycle theft                 -0.019    -0.472    -0.017    -0.446    0.042     0.292     -0.029    -0.528    0.019     0.085
Possession of weapons         -0.004    -0.467    -0.003    -0.432    -0.006    -0.195    -0.004    -0.382    0.017     0.374
Drugs                         -0.027    -0.462    -0.024    -0.416    -0.029    -0.138    -0.032    -0.396    0.130     0.391
Robbery                       -0.024    -0.390    -0.021    -0.360    0.009     0.042     -0.031    -0.363    0.079     0.230
Burglary                      -0.020    -0.382    -0.016    -0.318    0.044     0.236     -0.030    -0.411    0.016     0.052
Other theft                   -0.088    -0.284    -0.062    -0.202    0.152     0.135     -0.114    -0.263    0.083     0.047
Theft from the person         -0.061    -0.242    -0.044    -0.177    0.120     0.132     -0.082    -0.235    0.055     0.038
Shoplifting                   -0.028    -0.238    -0.018    -0.161    0.036     0.085     -0.030    -0.187    0.025     0.038
Vehicle crime                 -0.014    -0.192    -0.013    -0.180    0.000     0.000     -0.014    -0.138    0.054     0.133
Other crime                   -0.003    -0.074    -0.001    -0.033    -0.014    -0.089    -0.001    -0.017    0.018     0.072
Crime Type Categorisation
Each Crime Type can be categorised according to the strength of its (negative) correlation
with the Bedrooms Per House indicator.
A negative correlation with a magnitude greater than 0.4 is categorised as more residential in nature.
A negative correlation with a magnitude less than 0.25 is categorised as no more likely in residential
areas (so I categorise that as Town-Centre).
Category      Crime Types
Residential   ASB, Public order, Criminal damage, Drugs, Possession of weapons, Violence and sexual offences
Neutral       Burglary and Robbery
Town-Centre   Shoplifting, Vehicle crime and Theft from the person
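The thresholds above can be applied mechanically, as in the sketch below, which uses the Bedrooms Per House correlation values from the regression table. Note that a mechanical pass assigns a category to every crime type, including Bicycle theft, Other theft and Other crime, which the category table does not list, so its output is not identical to that table.

```python
# Correlation of each crime type with Bedrooms Per House (from the table).
bedrooms_corr = {
    "Anti-social behaviour": -0.557,
    "Violence and sexual offences": -0.502,
    "Public order": -0.477,
    "Criminal damage and arson": -0.477,
    "Bicycle theft": -0.472,
    "Possession of weapons": -0.467,
    "Drugs": -0.462,
    "Robbery": -0.390,
    "Burglary": -0.382,
    "Other theft": -0.284,
    "Theft from the person": -0.242,
    "Shoplifting": -0.238,
    "Vehicle crime": -0.192,
    "Other crime": -0.074,
}


def categorise(corr: float) -> str:
    """Map a correlation to a category using the 0.4 and 0.25 thresholds."""
    strength = -corr  # size of the negative correlation
    if strength > 0.4:
        return "Residential"
    if strength < 0.25:
        return "Town-Centre"
    return "Neutral"


categories = {crime: categorise(c) for crime, c in bedrooms_corr.items()}

for crime, category in categories.items():
    print(f"{crime}: {category}")
```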
2C – Data Mining
I wanted to investigate further whether the features chosen
so far could provide a predictive capability, using
the data mining features of SSAS.
Data mining enables decision tree and multivariate
analysis, so it should be possible to tease out more subtle
relationships.
Also, by slicing the data cube by Crime Type Category we can
see how the results vary.
The Data Mining Model
I set up a Mining Structure based on the
MSOA dimension, with MSOA Code as the key.
I selected Crimes Per Head as the dependent
variable, to be predicted.
I selected the census measures, including
Cluster Name, as the independent variables
(Input).
I used the default of reserving 30% of records
for the testing set.
Decision Tree Analysis
I ran the Mining Model using the Decision Tree algorithm.
The tree shows initial splits based on Bedrooms Per
House, and substantial variety in subsequent splits and
influencing variables.
Cluster Name does not feature as an influencing variable
(as you may expect since it is derived) but it is used in
many decision points.
The Mining Legends shown are those with the most cases
from the second level.
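The model in this project was built in SSAS. For readers without SSAS, the sketch below shows an analogous setup in scikit-learn: a 30% holdout and a decision tree predicting crimes per head. This is a swapped-in technique on synthetic data, not the SSAS algorithm, and the R² it reports is not necessarily the same metric as the SSAS score.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 983 London MSOAs: crimes per head is made
# to depend on bedrooms per house, while cars per household is noise.
rng = np.random.default_rng(0)
n = 983
bedrooms = rng.uniform(1.5, 3.5, n)
cars = rng.uniform(0.2, 1.8, n)
X = np.column_stack([bedrooms, cars])
y = 1.5 - 0.4 * bedrooms + rng.normal(0, 0.1, n)

# Reserve 30% of the records for testing, as in the mining structure.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A shallow tree keeps the splits easy to read.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# R^2 on the held-out records, and the relative importance of each input.
score = tree.score(X_test, y_test)
importances = dict(zip(["bedrooms", "cars"], tree.feature_importances_))

print(round(score, 2), importances)
```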
Decision Tree
Dependency Network
The Dependency Network
identifies Bedrooms Per House
as the strongest link, followed by
Cluster Category.
The predictive capability is
genuine but is not strong.
Testing the Model
By applying data slices to restrict the data used by Crime Type
category, better predictive capability can be achieved.
This is particularly true for the crime types that we previously
categorised as Residential crime.
Crime Type Category   Score
All                   0.33
Residential           0.78
Neutral               0.51
Town-Centre           0.41
Residential Crime Category (figure)
Residential Crime Type (figure)
Conclusions
1. The clustering of areas by demographics did provide a
useful additional indicator for the model.
2. The data mining confirms the regression analysis
conclusion that Bedrooms Per House is the strongest
demographic indicator.
3. Bedrooms Per House, and the model generally, have
predictive capability for a subset of crime types.
Opportunities for Further Analysis
1. There are other demographic indicators available which I have
not yet explored using this crime dataset.
2. There is an archive of crime data which I have not accessed. It
would be interesting to explore whether there is any substantial
difference by shifting the time frame backwards (closer to the
2011 census date).
3. The Ministry of Housing, Communities & Local Government
publishes its own indices of deprivation at LSOA level. While
they constitute a more limited set of features, it may be that
relationships become more apparent at that level of
granularity.
DASHBOARD & ANALYSIS OF LONDON'S CRIME AND CENSUS DATA 57

More Related Content

What's hot

Crime Dataset Analysis for City of Chicago
Crime Dataset Analysis for City of ChicagoCrime Dataset Analysis for City of Chicago
Crime Dataset Analysis for City of ChicagoStuti Deshpande
 
Crime reduction initiatives cctv
Crime reduction initiatives  cctvCrime reduction initiatives  cctv
Crime reduction initiatives cctvshannon newton
 
Crime prediction-using-data-mining
Crime prediction-using-data-miningCrime prediction-using-data-mining
Crime prediction-using-data-miningmohammed albash
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringReuben George
 
Analysis of digital evidence
Analysis of digital evidenceAnalysis of digital evidence
Analysis of digital evidencerakesh mishra
 
Digital investigation
Digital investigationDigital investigation
Digital investigationunnilala11
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeAung Thu Rha Hein
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionAPNIC
 
Crime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los AngelesCrime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los AngelesHeta Parekh
 
Cloud with Cyber Security
Cloud with Cyber SecurityCloud with Cyber Security
Cloud with Cyber SecurityNiki Upadhyay
 
Logging, monitoring and auditing
Logging, monitoring and auditingLogging, monitoring and auditing
Logging, monitoring and auditingPiyush Jain
 
bsi-cyber-resilience-presentation
bsi-cyber-resilience-presentationbsi-cyber-resilience-presentation
bsi-cyber-resilience-presentationAjai Srivastava
 
Review of national cyber security policy 2013 by chintan pathak
Review of national cyber security policy 2013   by chintan pathakReview of national cyber security policy 2013   by chintan pathak
Review of national cyber security policy 2013 by chintan pathakChintan Pathak
 
Intelligence Led Policing for Police Decision Makers
Intelligence Led Policing for Police Decision MakersIntelligence Led Policing for Police Decision Makers
Intelligence Led Policing for Police Decision MakersDeborah Osborne
 

What's hot (20)

Digital forensics
Digital forensicsDigital forensics
Digital forensics
 
Crime Dataset Analysis for City of Chicago
Crime Dataset Analysis for City of ChicagoCrime Dataset Analysis for City of Chicago
Crime Dataset Analysis for City of Chicago
 
Threat Intelligence
Threat IntelligenceThreat Intelligence
Threat Intelligence
 
Crime reduction initiatives cctv
Crime reduction initiatives  cctvCrime reduction initiatives  cctv
Crime reduction initiatives cctv
 
Crime prediction-using-data-mining
Crime prediction-using-data-miningCrime prediction-using-data-mining
Crime prediction-using-data-mining
 
Crime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means ClusteringCrime Pattern Detection using K-Means Clustering
Crime Pattern Detection using K-Means Clustering
 
Network Forensic
Network ForensicNetwork Forensic
Network Forensic
 
Digital Forensic Case Study
Digital Forensic Case StudyDigital Forensic Case Study
Digital Forensic Case Study
 
Analysis of digital evidence
Analysis of digital evidenceAnalysis of digital evidence
Analysis of digital evidence
 
Digital investigation
Digital investigationDigital investigation
Digital investigation
 
Digital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research ChallengeDigital Forensic: Brief Intro & Research Challenge
Digital Forensic: Brief Intro & Research Challenge
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern Detection
 
Crime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los AngelesCrime Data Analysis and Prediction for city of Los Angeles
Crime Data Analysis and Prediction for city of Los Angeles
 
Cloud with Cyber Security
Cloud with Cyber SecurityCloud with Cyber Security
Cloud with Cyber Security
 
Logging, monitoring and auditing
Logging, monitoring and auditingLogging, monitoring and auditing
Logging, monitoring and auditing
 
bsi-cyber-resilience-presentation
bsi-cyber-resilience-presentationbsi-cyber-resilience-presentation
bsi-cyber-resilience-presentation
 
Review of national cyber security policy 2013 by chintan pathak
Review of national cyber security policy 2013   by chintan pathakReview of national cyber security policy 2013   by chintan pathak
Review of national cyber security policy 2013 by chintan pathak
 
Intelligence Led Policing for Police Decision Makers
Intelligence Led Policing for Police Decision MakersIntelligence Led Policing for Police Decision Makers
Intelligence Led Policing for Police Decision Makers
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
Cyber Forensics & Challenges
Cyber Forensics & ChallengesCyber Forensics & Challenges
Cyber Forensics & Challenges
 

Similar to Analysis of London's Crime and Census Data

Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)Georgetown Data Analytics Project (Team DC)
Georgetown Data Analytics Project (Team DC)Noah Turner
 
Merseyside Crime Analysis
Merseyside Crime AnalysisMerseyside Crime Analysis
Merseyside Crime AnalysisParang Saraf
 
Richard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport PoliceRichard Smith: Addressing the Problems of Addressing at British Transport Police
Richard Smith: Addressing the Problems of Addressing at British Transport PoliceAGI Geocommunity
 
Analysis of Crime Big Data using MapReduce
Analysis of Crime Big Data using MapReduceAnalysis of Crime Big Data using MapReduce
Analysis of Crime Big Data using MapReduceKaushik Rajan
 
An Intelligence Analysis of Crime Data for Law Enforcement Using Data Mining
An Intelligence Analysis of Crime Data for Law Enforcement Using Data MiningAn Intelligence Analysis of Crime Data for Law Enforcement Using Data Mining
An Intelligence Analysis of Crime Data for Law Enforcement Using Data MiningWaqas Tariq
 
Crime prediction based on crime types
Crime prediction based on crime typesCrime prediction based on crime types
Crime prediction based on crime typesIJDKP
 
Density of route frequency for enforcement
Density of route frequency for enforcement Density of route frequency for enforcement
Density of route frequency for enforcement Conference Papers
 
Chicago Crime Dataset Project Proposal
Chicago Crime Dataset Project ProposalChicago Crime Dataset Project Proposal
Chicago Crime Dataset Project ProposalAashri Tandon
 
Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016Stefan Michalak - Portfolio - January 2016
Stefan Michalak - Portfolio - January 2016stefan michalak
 
IRJET- Crime Analysis using Data Mining and Data Analytics
Analysis of London's Crime and Census Data

  • 1. Analysis of London’s Crime and Census data Pairview BI Developer Project COLIN BARTRAM
  • 2. Project Roadmap
      • Phase 1, Crime Data: 1A - Dashboard; 1B - Heat Map
      • Phase 2, Census Data: 2A - Clustering; 2B - Linear Regression; 2C - Data Mining
  • 3. Data Sources
      • The Metropolitan Police Service (MPS) releases monthly anonymised crime data.
      • The Office for National Statistics (ONS) conducts a census every 10 years and releases demographic statistics based on the responses. The latest available is from 2011.
      • Geographic data was imported from ONS. The latest Local Authority boundaries date from 2015.
  • 4. Geographic granularity
      • Hierarchy (diagram): MSOAs > LSOAs > Locations (streets).
      • MPS crime location data includes the Lower-Layer Super Output Area (LSOA).
      • LSOAs can be aggregated to various local authority units and to Middle-Layer Super Output Areas (MSOAs).
      • MSOA tends to be the lowest level of geography at which Census data is published.
      • MSOA is a more consistent geographic level than local authority: an MSOA is targeted to have a population of between 5,000 and 15,000.
  • 5. Objectives
      My objectives are to:
      a. Visualise London crime data.
      b. Analyse crime stats to highlight any contributory demographic factors.
      The analysis techniques to be employed will include:
      • Linear regression
      • K-means clustering
      • Data mining using decision trees
  • 6. High Level Architecture
      • SSIS will be used to load data into an MSSQL database.
      • T-SQL queries will be used to populate a data warehouse.
      • Power BI will be used for visualisations.
      • SSAS will be used to develop a data cube.
      • Excel will be used for reporting and linear regression.
      • Python will be used for clustering.
      • SSAS will be used for data mining and decision tree analysis.
      (Diagram labels: Cloud, SSIS, MSSQL, SSAS, Python.)
  • 8. Data Sources
      • The Metropolitan Police Service (MPS) releases monthly anonymised crime data, alongside a history of the previous two years, at https://data.police.uk/data/
      • Locations are geomasked using street snap-points, and the day of the month is not identified.
      • The MPS data includes locations at street level together with the LSOA (Lower Layer Super Output Area) code, which allows the data to be aggregated at Local Authority District (LAD), Ward or Middle-Layer Super Output Area (MSOA) level.
      • Geographic data is sourced from https://data.gov.uk/dataset/bc0d1720-0275-490d-a7da-d22e69495314/lower-layer-super-output-area-2011-to-ward-2015-lookup-in-england-and-wales
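As an illustration only, the aggregation from LSOA to MSOA described above can be sketched in Python. The lookup rows, crime records and function name are hypothetical and are not taken from the project, which performed this step in SQL Server.

```python
from collections import Counter

# Hypothetical lookup rows: LSOA code -> MSOA code (illustrative values only).
lsoa_to_msoa = {
    "E01000001": "E02000001",
    "E01000002": "E02000001",
    "E01000003": "E02000002",
}

# Hypothetical crime records, one per recorded crime, keyed by LSOA code.
crimes = [
    {"lsoa": "E01000001", "crime_type": "Burglary"},
    {"lsoa": "E01000002", "crime_type": "Robbery"},
    {"lsoa": "E01000003", "crime_type": "Burglary"},
    {"lsoa": "E01000001", "crime_type": "Drugs"},
]


def crimes_per_msoa(records, lookup):
    """Count crimes per MSOA by mapping each record's LSOA through the lookup."""
    counts = Counter()
    for record in records:
        msoa = lookup.get(record["lsoa"])
        if msoa is not None:  # skip records whose LSOA is not in the lookup
            counts[msoa] += 1
    return counts


msoa_counts = crimes_per_msoa(crimes, lsoa_to_msoa)
```

The same pattern would apply to Ward or LAD level by swapping in a different lookup column.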
  • 9. Tools
      • SSIS will be used to load into SQL Server staging tables.
      • T-SQL queries will be used to populate a SQL Server data warehouse.
      • By keeping staging tables separate from the data warehouse, transformation will be handled by SQL queries, for reasons of scalability and transparency.
      • Power BI will be used to develop the dashboard.
      (Diagram labels: Cloud, SSIS, MSSQL Staging, MSSQL DW, Power BI.)
  • 10. ETL Process: Extract
      • Download and extract the required Crime and geographic data using SSIS.
      • Create the PoliceStats database and staging tables on SQL Server.
      • Use SSIS to iterate over the folder structure to upload the Crime data to SQL Server.
      • Use SSIS to upload the geographic data to SQL Server.
      • Volumes loaded: 39 months (CSVs); 65,699 locations; 3,677,871 recorded crimes.
  • 11. ETL Process: Transform
      • The data is well formed, and the LSOA codes linking the two data sources are 100% valid, requiring no data cleansing.
      • MPS do not currently use the fields FallsWithin and Reported By to indicate where other Police Services are involved.
      • Crimes recorded by the MPS at geographical locations outside the Met Police area (London Boroughs) are included and could distort the results.
      • I added 'No Location' records with zero keys in the LAD, Ward, MSOA, LSOA and Location tables, and updated the Crime data with a missing location to reference the 'No Location' record.
  • 12. ETL Process: Load
      • Created Dimension and Fact schemas and the tables for the SQL Server data warehouse. Added foreign key constraints and indexes.
      • Merged data from the staging tables into the data warehouse.
      • To enable a London-only visualisation and analysis of the data, I created views that limit the selection to LAD codes beginning with 'E09', which indicates a London Borough.
      • Crime data is released monthly, so each new month is added using the same methods without affecting data from the initial load.
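The London-only restriction was implemented as SQL Server views in the project. Purely as a sketch of the same rule, the following Python filters rows on the 'E09' prefix; the row layout, counts and helper name are hypothetical.

```python
LONDON_LAD_PREFIX = "E09"  # prefix given on the slide for London Boroughs


def is_london(lad_code):
    """Return True when the LAD code denotes a London Borough."""
    return lad_code is not None and lad_code.startswith(LONDON_LAD_PREFIX)


# Hypothetical fact rows: (LAD code, crime count).
# "0" is a made-up placeholder for a record without a location.
rows = [("E09000001", 120), ("E09000033", 950), ("E07000066", 40), ("0", 15)]

london_rows = [row for row in rows if is_london(row[0])]
london_total = sum(count for _, count in london_rows)
```

Keeping the rule in one helper mirrors the design choice of a view: every downstream query sees the same definition of London.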
  • 14. Dashboard Visualisation Features
      • Selection by Ward Name (single select only).
      • Map display based on Latitude and Longitude. Tooltip on rollover to include Location Name. Size of marker based on Crime Count. Filter available on location.
      • Crime Trend stacked area chart, displaying Crime Type illustrated by colour, with Crime Count on the y-axis and Month on the x-axis. Allow selection on Year, Quarter and Month, or on Crime Type.
      • Month slicer to allow selection of any start point and end point.
      • Selection by Crime Type, allowing multiple selection. Include a Select All option and the ability to deselect.
  • 15.
  • 16.
  • 17. London Crime Heat Map
      I also took a view of London using a heat map as preparation for further analysis. Crime levels are generally higher towards the centre, but a patchy distribution is the more prominent feature. There is a West End hot spot. The next phase will attempt to explain the influencing factors.
  • 19. Phase 2 Census Analysis: Goals of the Analysis
      • To understand whether, and to what extent, crime levels and different types of crime are affected by socio-economic and demographic factors.
      • Building on the Crime data by adding MSOA-level Census data.
      (Diagram labels: Phase 1 Crime Data ETL; Phase 2 Census Data ETL; Data Warehouse; Location Clustering; Linear Regression; Data Mining.)
  • 20. Review of Crime Studies
      • Crime data has been subject to quite extensive analysis using data mining and machine learning, with the objective of crime prediction.
      • A time series analysis using past crime data has the most predictive power, but offers little insight or explanatory power.
      • London Landscape provides combined crime, demographic and socio-economic datasets for local authority use.
      • A US study at LSOA level from the 1990s identified Poverty, Residential Instability, Housing and Commuting, Income, Population and Family Disruption as the 'themes' whose measures correlate well.
      • I decided to see what could be accomplished by loading Census data and analysing at the level of MSOA within London.
  • 21. Census Data Selection
      • The first stage of feature selection involves choosing which Census data to load.
      • The census data presented includes stats that combine several features. These are not useful here, because we are interested in separating the impacts of individual variables, so they are ignored.
      • I ignored stats which offer little chance of displaying much variability at the level of MSOA. Factors such as Gender and Age may strongly correlate with an individual's propensity for criminality, but people do not tend to concentrate sufficiently based on those factors at the level of MSOA.
      • The data was sourced from https://www.ons.gov.uk/census/2011census/2011censusdata/bulkdata/bulkdatadownloads
  • 22. Census Data Extract
      Census Data | Download | CSVs used | Fields | Type of Data
      Detailed Characteristics 1 | BulkdatadetailedcharacteristicsmsoaE&Wandinfo3.3 | 6 | 28 | Status, Coupled Y/N, Family Type, No of Cars, No of Bedrooms, Occupation
      Labour Market | BulkdataLabourMarketMSOA3.5aandinfo | 1 | 9 | NS-SeC employment classifications
      Detailed Characteristics 2 | BulkdataDetailedCharacteristicsMSOAdataforE&WLowerGeographiesandinfo1207 | 5 | 41 | Shared Y/N, Central Heating Y/N, Occupancy, Ethnicity, Religion, Residence Y/N, Dwelling Type
      The number of rows populated in each case was the number of MSOAs, which is 7,201. The subset used is the records that relate to London, which number 983.
  • 23. Census Data Processing
      • Select ONS data of individual stats for MSOAs.
      • Also download the geographic relationships between LSOA and MSOA.
      • Use SSIS to populate the MSSQL staging tables.
      • Enhance the data warehouse to support census data.
      • Cleanse the MSOA data where relationships are out of date.
      (Diagram labels: Cloud, SSIS, MSSQL, SSAS, Python.)
  • 25. Data Cube
      • A data cube was created in SSAS to bring together crime data, aggregated to the MSOA level, with the census figures.
      • A Geography hierarchy was added to relate the dimensions Location, LSOA and MSOA.
      • Crime Per Head is calculated as the Crime Count divided by the Population.
      • Calculations were created so that the variables are measured independently of population.
      (Diagram: MSOAs > LSOAs > Locations (streets).)
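The project defined these calculations inside the SSAS cube. As a rough sketch of the arithmetic only, the Python below computes Crime Per Head and one population-independent ratio; the field names and figures are invented for illustration.

```python
def per_unit(total, units):
    """Divide a total by a number of units; return None when units is not positive."""
    if units <= 0:
        return None
    return total / units


# Hypothetical MSOA row; the field names and figures are illustrative only.
msoa = {"crime_count": 520, "population": 8000, "cars": 2600, "households": 3250}

# Crime Per Head = Crime Count / Population, as stated on the slide.
crime_per_head = per_unit(msoa["crime_count"], msoa["population"])

# A ratio measure that does not scale with the size of the area.
cars_per_household = per_unit(msoa["cars"], msoa["households"])
```

Returning None for a non-positive denominator is one possible convention; it avoids a division error for areas with no resident population.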
  • 26. Feature Selection
      For each feature selected for the model I created calculated variables in the data cube. My intention was that these should be representative of the socio-economic nature of an area and its social assets:
      1. Number of Bedrooms Per Property
      2. Number of Cars Per Household
      3. Percentage of buildings which are Houses
      4. Percentage of persons in AB (Professional/managerial) Occupations
      5. Percentage of Houses over-occupied
      I then analysed these measures in Excel.
  • 27. Measure Stats
      A quick view of the selected features, to check that they have a reasonable distribution of data.
  • 28. Measure Scorecard
      The measures provide variety and represent different aspects of socio-economic asset availability:
      1. Bedrooms Per Property - measures the housing stock, including the size of properties.
      2. Cars Per Household - measures the wealth of the population and their access to public transport infrastructure.
      3. Percentage of Houses - measures the nature of the housing stock, including the household living space.
      4. Percentage of persons in AB (Professional/managerial) Occupations - measures the skills and earning potential of the local population.
      5. Percentage of Houses over-occupied - measures the interior household environment and personal living space.
  • 29. Structure of Phase 2
      The measures selected will be used as inputs to all the subsequent analyses. Additionally, the outputs of the clustering and linear regression exercises will be combined with the measures for the data mining exercise.
      (Diagram labels: Data Warehouse measures; 2A - Cluster, Location Clusters; 2B - Linear Regression, Crime Type Categories; 2C - Data Mining; Conclusion.)
  • 30. 2A - Location Clustering
      To supplement our quantitative measures, I wanted to derive some qualitative data from the census data, to facilitate other approaches for analysis and visualisation. I decided to try to categorise the geographic areas (MSOAs) in the census data. I had no preconceived notion as to what these clusters should represent, so I chose an unsupervised method. The approach chosen is k-means clustering.
  • 31. Technical Approach
      • Scikit-learn provides a Python package with a relevant toolset for k-means clustering.
      • As we will be comparing indicators that are measured in differing units, some pre-processing of the data is required to counteract that impact. This is done with scikit-learn's StandardScaler.
      • The other Python 3 packages used are pandas for set processing and matplotlib for the visualisations.
      • I also used Excel and Power BI to visualise and display results.
      (Diagram labels: MSSQL, SSAS, Power BI, Python.)
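A minimal sketch of this approach, assuming scikit-learn and NumPy are installed. The data is synthetic and merely stands in for the MSOA measures; the column meanings, the cluster centres and the choice of `n_init` and `random_state` are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for MSOA measures. Assumed columns:
# over-occupied share, cars per household, AB occupation share.
centres = np.array([[0.25, 0.5, 0.15], [0.08, 0.6, 0.45], [0.05, 1.3, 0.30]])
features = np.vstack(
    [c + rng.normal(scale=[0.02, 0.05, 0.03], size=(50, 3)) for c in centres]
)

# Standardise each column to zero mean and unit variance so that measures
# in different units contribute comparably to the distance calculation.
scaled = StandardScaler().fit_transform(features)

# Fit k-means with three clusters and assign each row a cluster label.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(scaled)
```

Without the scaling step, the measure with the largest numeric range would dominate the Euclidean distances that k-means relies on.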
  • 33. Elbow Method
      By varying the number of clusters and rerunning the clustering algorithm, we can plot the number of clusters against the SSE (sum of squared errors), a goodness-of-fit measure. More clusters will always give a better fit, but the rate of improvement reduces significantly after three, so this is a sensible number of clusters.
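The elbow calculation can be sketched as follows, again on synthetic data and assuming scikit-learn and NumPy. In scikit-learn the SSE of a fitted model is exposed as `inertia_`; the range of k values tried here is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data with three well-separated groups (illustrative only).
centres = np.array([[0.25, 0.5, 0.15], [0.08, 0.6, 0.45], [0.05, 1.3, 0.30]])
features = np.vstack(
    [c + rng.normal(scale=[0.02, 0.05, 0.03], size=(50, 3)) for c in centres]
)
scaled = StandardScaler().fit_transform(features)

# Refit k-means for each candidate k and record the SSE (inertia_).
sse = {}
for k in range(1, 7):
    fitted = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    sse[k] = fitted.inertia_

# Improvement gained by adding one more cluster, for k = 2..6.
gains = {k: sse[k - 1] - sse[k] for k in range(2, 7)}
```

Plotting `sse` against k (for example with matplotlib) gives the elbow chart; the elbow is where `gains` drops sharply.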
  • 35. Categorisation Results
      Category | Colour | Percentage of Houses over-occupied | Number of Cars Per Household | Percentage of persons in Professional or Managerial Occupations
      Deprived Urban | Red | High | Low | Low
      White Collar Urban | Yellow | Low | Low | High
      Suburban | Blue | Low | High | Mixed
      As a sanity check on the clustering results, I loaded the categories into Power BI, combining them with the geographic identifiers (latitude and longitude) to display the result as a map.
  • 36. London Cluster Visualisation
      (Map legend: Deprived Urban - Red; White Collar Urban - Yellow; Suburban - Blue.)
  • 37. Crime Per Head by Cluster
      Crime levels are significantly lower in Suburban areas. Are there any significant differences between the crime types recorded that depend on the nature of the environment? In Excel I merged the data cube with the cluster categorisations and displayed the results as a series of pie charts. First is a baseline pie chart including all of the locations, with Crime Types ordered by volume.
      Crime Type | Deprived Urban | Suburban | White-collar Urban | All
      Anti-social behaviour | 0.184 | 0.097 | 0.167 | 0.155
      Violence and sexual offences | 0.146 | 0.082 | 0.111 | 0.118
      Vehicle crime | 0.061 | 0.051 | 0.061 | 0.058
      Other theft | 0.050 | 0.028 | 0.081 | 0.054
      Burglary | 0.037 | 0.031 | 0.046 | 0.038
      Criminal damage and arson | 0.033 | 0.021 | 0.027 | 0.028
      Public order | 0.030 | 0.016 | 0.029 | 0.026
      Drugs | 0.030 | 0.011 | 0.022 | 0.022
      Theft from the person | 0.018 | 0.003 | 0.042 | 0.021
      Shoplifting | 0.021 | 0.012 | 0.028 | 0.021
      Robbery | 0.020 | 0.007 | 0.021 | 0.017
      Bicycle theft | 0.010 | 0.002 | 0.018 | 0.010
      Other crime | 0.006 | 0.005 | 0.004 | 0.005
      Possession of weapons | 0.004 | 0.002 | 0.003 | 0.003
      All Crime Per Head | 0.65 | 0.37 | 0.66 | 0.58
  • 38. Pie Chart: Crime broken down by crime type
      Anti-social behaviour 27%; Violence and sexual offences 20%; Vehicle crime 10%; Other theft 9%; Burglary 7%; Criminal damage and arson 5%; Public order 4%; Drugs 4%; Theft from the person 4%; Shoplifting 4%; Robbery 3%; Bicycle theft 2%; Other crime 1%; Possession of weapons 0%.
  • 39. Pie Charts by Cluster
      Then we have the same analysis, but using the data from the separate clusters. There is a degree of consistency in many of the crime type proportions regardless of which category of area is being analysed. Some exceptions are:
      • In White-Collar Urban areas there is more bicycle theft and theft from the person.
      • In Suburban areas there is less crime overall but proportionately more burglary and vehicle crime.
      • In Deprived Urban areas there are more drugs offences, violence and sexual offences.
  • 40. Pie Charts By Cluster
      (Three pie charts by crime type, one each for White-collar Urban, Suburban and Deprived Urban areas, shown with the same observations as the previous slide.)
  • 41. 2B - Linear Regression
      To get a feel for the relationships between pairs of variables and their strength, I chose to analyse the data using linear regression.
      1. The independent variables (demographics from the census) are displayed on the x-axis.
      2. The dependent variable (Recorded Crimes per Head) is displayed on the y-axis.
      I visualised the data in Excel with a trend line added, and calculated the Slope and Correlation values for each pair using the Excel functions.
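The slide does not name the Excel functions used; Excel's SLOPE and CORREL compute the least-squares slope and the Pearson correlation coefficient. A minimal Python sketch of the same two quantities follows, with made-up sample values and a hypothetical function name.

```python
from math import sqrt


def slope_and_correlation(xs, ys):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain some variation")
    slope = sxy / sxx
    correlation = sxy / sqrt(sxx * syy)
    return slope, correlation


# Made-up example: a perfectly linear, decreasing relationship.
example_slope, example_r = slope_and_correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

The slope carries the units of the two variables, whereas the correlation coefficient is unitless and lies between -1 and 1, which is why both are reported.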
  • 42. (Four scatter plots with trend lines. Crimes Per Head on the y-axis against: Occupation AB Per Cent; Over-Occupied Per Cent; Cars Per Household; Houses Per Cent.)
  • 43. (Scatter plot with trend line: Crimes Per Head on the y-axis against Bedrooms Per House.)
  • 44. Linear Regression Result
      • Bedrooms Per House was the clearest socio-economic indicator of those tested to demonstrate a relationship with crime levels, with a Correlation Coefficient of -0.43.
      • Cars Per Household and Houses Per Cent also showed a correlation.
      • All of the indicators show a negative trend line (except over-occupancy), which is as expected.
      • Over-occupancy had a weak correlation and Occupation AB Per Cent showed no overall correlation.
      • The outliers were identified as locations such as the West End and Heathrow Airport, which are crime locations with a small resident population.
      • I also analysed each specific Crime Type within each variable.
  • 45. Linear Regression of Demographic Indicators vs Crime per Head
      Each indicator has two columns: Slope and Correlation Coefficient (Corr.).
      Crime Type | Bedrooms Per House Slope | Corr. | Cars Per Household Slope | Corr. | Occupation AB % Slope | Corr. | Houses % Slope | Corr. | Overcrowded % Slope | Corr.
      All | -0.567 | -0.427 | -0.457 | -0.352 | 0.056 | 0.012 | -0.686 | -0.371 | 1.601 | 0.212
      Anti-social behaviour | -0.148 | -0.557 | -0.125 | -0.484 | -0.076 | -0.080 | -0.184 | -0.499 | 0.554 | 0.368
      Violence and sexual offences | -0.089 | -0.502 | -0.076 | -0.439 | -0.179 | -0.281 | -0.088 | -0.356 | 0.411 | 0.408
      Public order | -0.026 | -0.477 | -0.021 | -0.399 | -0.008 | -0.043 | -0.030 | -0.404 | 0.073 | 0.238
      Criminal damage and arson | -0.017 | -0.477 | -0.014 | -0.402 | -0.034 | -0.262 | -0.016 | -0.309 | 0.067 | 0.326
      Bicycle theft | -0.019 | -0.472 | -0.017 | -0.446 | 0.042 | 0.292 | -0.029 | -0.528 | 0.019 | 0.085
      Possession of weapons | -0.004 | -0.467 | -0.003 | -0.432 | -0.006 | -0.195 | -0.004 | -0.382 | 0.017 | 0.374
      Drugs | -0.027 | -0.462 | -0.024 | -0.416 | -0.029 | -0.138 | -0.032 | -0.396 | 0.130 | 0.391
      Robbery | -0.024 | -0.390 | -0.021 | -0.360 | 0.009 | 0.042 | -0.031 | -0.363 | 0.079 | 0.230
      Burglary | -0.020 | -0.382 | -0.016 | -0.318 | 0.044 | 0.236 | -0.030 | -0.411 | 0.016 | 0.052
      Other theft | -0.088 | -0.284 | -0.062 | -0.202 | 0.152 | 0.135 | -0.114 | -0.263 | 0.083 | 0.047
      Theft from the person | -0.061 | -0.242 | -0.044 | -0.177 | 0.120 | 0.132 | -0.082 | -0.235 | 0.055 | 0.038
      Shoplifting | -0.028 | -0.238 | -0.018 | -0.161 | 0.036 | 0.085 | -0.030 | -0.187 | 0.025 | 0.038
      Vehicle crime | -0.014 | -0.192 | -0.013 | -0.180 | 0.000 | 0.000 | -0.014 | -0.138 | 0.054 | 0.133
      Other crime | -0.003 | -0.074 | -0.001 | -0.033 | -0.014 | -0.089 | -0.001 | -0.017 | 0.018 | 0.072
  • 46. Crime Type Categorisation
      Each Crime Type can be categorised according to the degree of correlation it had with the Bedrooms Per House indicator. A correlation with a magnitude greater than 0.4 is categorised as more residential in nature. A correlation with a magnitude of less than 0.25 is categorised as no more likely in residential areas (so I categorise that as Town-Centre).
      Category | Crime Types
      Residential | ASB, Public order, Criminal damage, Drugs, Possession of weapons, Violence and sexual offences
      Neutral | Burglary and Robbery
      Town-Centre | Shoplifting, Vehicle crime and Theft from the person
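The thresholding rule can be sketched in Python as below. The correlation values are a subset of the Bedrooms Per House column from slide 45; comparing on the magnitude of the coefficient is an assumption, made because the reported coefficients are negative, and the function name is hypothetical.

```python
def categorise(correlation, high=0.4, low=0.25):
    """Map a correlation with Bedrooms Per House to a crime type category."""
    strength = abs(correlation)  # assumption: the rule applies to magnitude
    if strength > high:
        return "Residential"
    if strength < low:
        return "Town-Centre"
    return "Neutral"


# Bedrooms Per House correlation coefficients, taken from slide 45.
bedroom_correlations = {
    "Anti-social behaviour": -0.557,
    "Drugs": -0.462,
    "Robbery": -0.390,
    "Burglary": -0.382,
    "Shoplifting": -0.238,
    "Vehicle crime": -0.192,
}

categories = {
    crime_type: categorise(r) for crime_type, r in bedroom_correlations.items()
}
```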
  • 47. 2C - Data Mining
      I wanted to investigate further whether the features chosen so far could provide a predictive capability, using the SSAS data mining capability. Data mining enables decision tree and multivariate analysis, so it should be possible to tease out more subtle relationships. Also, by slicing the data cube by Crime Type Category we can see how the results vary.
  • 48. The Data Mining Model
      • I set up a Mining Structure based on the MSOA dimension, with MSOA Code as the key.
      • I selected Crimes Per Head as the dependent variable (to be predicted).
      • I selected the Census measures, including Cluster Name, as the independent variables (Input).
      • I used the default of reserving 30% of records for the testing set.
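The project used the SSAS mining tools for this step. As a stand-in only, the sketch below uses scikit-learn's `DecisionTreeRegressor` with a 30% hold-out to show the shape of such a model. The data is synthetic, the feature names are illustrative, and the R-squared value computed here is not claimed to correspond to the scores reported by SSAS.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200

# Synthetic inputs: the target depends on the first feature only.
bedrooms = rng.uniform(1.0, 4.0, size=n)
cars = rng.uniform(0.2, 1.8, size=n)  # unrelated to the target by construction
crimes_per_head = 1.5 - 0.4 * bedrooms + rng.normal(scale=0.05, size=n)

X = np.column_stack([bedrooms, cars])

# Reserve 30% of the records for testing, mirroring the hold-out on the slide.
X_train, X_test, y_train, y_test = train_test_split(
    X, crimes_per_head, test_size=0.3, random_state=0
)

# A shallow regression tree; max_depth=3 is an arbitrary illustrative choice.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

r_squared = tree.score(X_test, y_test)   # R-squared on the held-out records
importances = tree.feature_importances_  # relative influence of each input
```

Scoring on held-out records rather than the training set guards against a tree that has simply memorised its inputs.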
  • 49. Decision Tree Analysis
      • Ran the Mining Model based on the Decision Tree algorithm.
      • The tree shows initial splits based on Bedrooms Per House, and substantial variety in subsequent splits and influencing variables.
      • Cluster Name does not feature as an influencing variable (as you may expect, since it is derived) but it is used in many decision points.
      • The Mining Legends shown are those with the most cases from the second level.
  • 50. Decision Tree
  • 51. Dependency Network
      The Dependency Network identifies Bedrooms Per House as the strongest link, followed by Cluster Category. The predictive capability is genuine but is not strong.
  • 52.
  • 53. Testing the Model
      By applying data slices to restrict the data used by Crime Type category, better predictive capability can be achieved. This is particularly true for the crime types that we previously categorised as Residential crime.
      Crime Type Category | Score
      All | 0.33
      Residential | 0.78
      Neutral | 0.51
      Town-Centre | 0.41
  • 54. Residential Crime Category
  • 55. Residential Crime Type
  • 56. Conclusions
      1. The clustering of areas by demographics did provide a useful additional indicator for the model.
      2. The data mining confirms the regression analysis conclusion that Bedrooms Per House is the strongest demographic indicator.
      3. The Bedrooms Per House indicator, and the model generally, have predictive capability for a subset of crime types.
  • 57. Opportunities for Further Analysis
      1. There are other demographic indicators available which I have not yet explored using this crime dataset.
      2. There is an archive of crime data which I have not accessed. It would be interesting to explore whether there is any substantial difference when shifting the time frame backwards (closer to the 2011 census date).
      3. The Ministry of Housing, Communities & Local Government publishes its own indices of deprivation at LSOA level. While they constitute a more limited set of features, it may be that relationships at that level of granularity become more apparent.