Mumbai University, T.Y.B.Sc.(I.T.), Semester VI, Principles of Geographic Information System, USIT604, Discipline Specific Elective Unit 2: Data Management and Processing System
Mumbai University, T.Y.B.Sc.(I.T.), Semester VI, Principles of Geographic Information System, USIT604, Discipline Specific Elective Unit 2: Data Management and Processing System
A GIS is a particular form of Information System applied to geographical data.
An Information System is a set of processes, executed on raw data to produce information which will be useful when making decisions.
GIS is not only a tool for making maps, it is a system for data analysis.
In this study various techniques for exploratory spatial data analysis are reviewed : spatial autocorrelation, Moran's I statistic, hot spots analysis, spatial lag and spatial error models.
La geología no permanece ajena a la renovación tecnológica, informatización de la cartografía geológica y el uso de la tecnología SIG, siguiendo el concepto de sistema integrado de información han sido pilares fundamentales para para que la información geológica sea mucho más accesible, fácil de procesar y divulgar.
A GIS is a particular form of Information System applied to geographical data.
An Information System is a set of processes, executed on raw data to produce information which will be useful when making decisions.
GIS is not only a tool for making maps, it is a system for data analysis.
In this study various techniques for exploratory spatial data analysis are reviewed : spatial autocorrelation, Moran's I statistic, hot spots analysis, spatial lag and spatial error models.
La geología no permanece ajena a la renovación tecnológica, informatización de la cartografía geológica y el uso de la tecnología SIG, siguiendo el concepto de sistema integrado de información han sido pilares fundamentales para para que la información geológica sea mucho más accesible, fácil de procesar y divulgar.
We chose to address the issue of misinformation with the general public when it comes to the crime and its patterns in the Hartford region. Dataset which we chose for this is from the publicly available Police Incidents registered in Hartford area from 2005 till date with the time stamp. Pattern of the crime can be analyzed like e.g. Robbery incidents mainly happen during holiday season. Using the API we built a live tableau dashboard and also forecasted the drug offenses as per neighborhood.
I downloaded data from from City of Chicago Data Portal and made the analysis of 2014 Crime Data. This is just a simple version. I can do more complicated analysis if needed. I used Excel to do this analysis.
Crime Analysis & Prediction System is a system to analyze & detect crime hotspots & predict crime.
It collects data from various data sources - crime data from OpenData sites, US census data, social media, traffic & weather data etc.
It leverages Microsoft's Azure Cloud and on premise technologies for back-end processing & desktop based visualization tools.
San Francisco Crime Classification with Tableau 10.4RAHUL SINGH
• Analyzed and visualized crime data in San Francisco based on 12 years of reports
• Derived hourly trends for the crimes including period of day with highest and lowest rates
• Explored police departments with higher crime rates and 5 top crime distributions
Student #1 I have chosen to write about the history of data anal.docxjohniemcm5zt
Student #1
I have chosen to write about the history of data analysis for the Los Angeles Police Department. While I currently reside in Colorado Springs, Colorado and work as a deputy sheriff in Denver, Colorado I grew up in the greater Los Angeles area and I know that they should have a large amount of data to draw from.
Currently the Los Angeles Police Department uses COMPSTAT to compile their data. They have a unit, known as the COMPSTAT unit, whose sole job is to compile crime statistics and analyze the data (Los Angeles Police Department, 2016) COMPSTAT is short for computer statistics. COMPSTAT was developed by Police Commissioner William Bratton in 1994 for use by the New York Police Department. According to the University of Maryland by the year 2000 over a third of police agencies with over 100 officers were utilizing some sort of COMPSTAT like program (University of Maryland, 2015). In 2002 William Bratton became the Chief of Police for the Los Angeles Police Department and brought with him the concept of COMPSTAT. During the first six years of his tenure Los Angeles saw a steady decrease in the cities crime rates thanks largely in part to COMPSTAT policing.
Mean, mode and median play a large part in analyzing criminal data. The mean is the average number. An example of this for crime data analysis would be in neighborhood C there was 14 robberies committed on Monday between 1 and 3 AM, 17 robberies on Tuesday at the same time period and 9 on Wednesday during the same time period. The mean would be 13.3 robberies per night for those 3 nights. Knowing this is high for the city the data could be used to justify extra police presence in Neighborhood C. An example of the mode would be if in the same neighborhood in the same week there were 17 robberies on both Friday and Saturday, 12 on Thursday and 11 on Sunday. The mode would be 17 and it would also be a reason to add extra police presence in the neighborhood until a significant decrease was seen in the amount of robberies taking place. Finally we come to the median. This is simply line the numbers up for the week and take the number that falls in the middle. In the case of the robberies occurring in neighborhood C the number would be 14. All of this data can be combined to show watch commanders and captain’s areas where they should be focusing their officer’s time. If there is a neighborhood that has seen only one or two robberies during the week, it is definitely not in as much need of a heavy police presence as Neighborhood C is.
Student #2
Beginning in the mid-1990’s, police in New York began to run statistical analysis of the city’s crime reports, arrests and other police activity known as COMPSTAT. Law enforcement agencies since this analysis began, has implemented their own data-driven approaches to tracking and adapting to crime trends. The LAPD is both heavily armed and thoroughly computerized. The Real-Time Analysis and Critical Response Division is its central processor..
An Intelligence Analysis of Crime Data for Law Enforcement Using Data MiningWaqas Tariq
The concern about national security has increased significantly since the 26/11 attacks at Mumbai, India. However, information and technology overload hinders the effective analysis of criminal and terrorist activities. Data mining applied in the context of law enforcement and intelligence analysis holds the promise of alleviating such problem. In this paper we use a clustering/classify based model to anticipate crime trends. The data mining techniques are used to analyze the city crime data from Tamil Nadu Police Department. The results of this data mining could potentially be used to lessen and even prevent crime for the forth coming years
Building an urban theft map by analyzing newspaper - SMAP 2018Laura Po
Paper presented at the 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP 2018) will take place in Zaragoza (Spain), on 6th and 7th September 2018. http://smap2018.unizar.es
The web app is available at https://www.dbgroup.unimo.it/theftmap/
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
1. San Francisco Crime Analysis – Report
-By Rohit Dandona and Sameer Darekar
Introduction:
From 1934 to 1963, San Francisco was infamous for housing some of the world's most notorious
criminals on the inescapable island of Alcatraz. Today, the city is known more for its tech scene than its
criminal past. But, with rising wealth inequality, housing shortages, and a proliferation of expensive
digital toys riding BART to work, there is no scarcity of crime in the city by the bay. This project examines
the San Francisco Police Department crime records between January 1st 2003 and May 13st 2015. It
visualizes the trends of major crimes and drugs in the city across time and locations. This analysis has
been carried out to facilitate law enforcement officers to enhance their strategies based on time and
location.
R and Tableau have been used for data processing and visualizations.
Dataset:
This dataset contains incidents derived from SFPD Crime Incident Reporting system. The data ranges
from 1/1/2003 to 5/13/2015.
Data fields:
Dates - timestamp of the crime incident
Category - category of the crime incident
Descript - detailed description of the crime incident
DayOfWeek - the day of the week
PdDistrict - name of the Police Department District
Resolution - how the crime incident was resolved
Address - the approximate street address of the crime incident
X – Longitude
Y - Latitude
Data-Preprocessing:
The date field in the dataset is a timestamp present in the “MM/dd/yyyy hh:mm:ss” format and is not of
much utility if used as such.
Tableau provides the facility of parsing through a timestamp through a utility called “DATEPART ” and
extract specific segments of the timestamp. For example:
DATEPART('hour',[Dates])
2. A part of the analysis required representing the data by five separate parts of the day. The following
construct was used to derive a column in Tableau:
IF DAY([Dates]) >= 4 AND DAY([Dates]) <= 8 THEN 'EARLY MORNING(4 AM to 8 AM)'
ELSEIF DAY([Dates]) > 8 AND DAY([Dates]) < 12 THEN 'MORNING(8 AM to 12 PM)'
ELSEIF DAY([Dates]) >= 12 AND DAY([Dates]) < 17 THEN 'AFTERNOON(12 PM to 5 PM)'
ELSEIF DAY([Dates]) >= 17 AND DAY([Dates]) <= 21 THEN 'EVENING (5 PM to 9 PM)'
ELSE 'NIGHT(9 PM to 4 AM)' END
The address data is of the following form
1. 1500 Block of LOMBARD ST
2. OAK ST / LAGUNA ST
The first one is actually a location on the street but the second one is the intersection of two streets, we
extracted the streets from both of them for plotting the bubble chart shown in analysis section. The rest
of the columns are clean and no further processing was required.
Analysis of Crime
Trends:
There are 39 categories of crime report incidents available in the dataset including “Other offenses”
and “Non-criminal”.
Larceny and theft was the most common crime in San Francisco between January 1st 2003 and May
13th 2015, with a total of 174,900 reported incidents. The next two highest crime categories, “Other
offenses” (98,281 reported incidents) and “Non-criminal” (98,172 reported incidents), are omitted
from discussion in the analyses. Instead, the key focus is on the following high crime categories:
larceny and theft, assault (76,876 reported incidents), drugs (53,971 reported incidents), vehicle
theft (53,781 reported incidents), vandalism (44,725 reported incidents), drugs (21,910 reported
incidents) and burglary (19,226 reported incidents).
3. Month wise distribution:
Since the crime incidents recorded in 2015 are only till May, those records have been removed
for this analysis. The month of February is the safest and is observed to have the least crime
incidents reported. October is the least safe with the maximum number of reported crime
incidents.
Region wise distribution:
A contour plot to represent the region wise density of crime leveraging the available
latitude/longitude information of the reported crime, is as follows:
4. A giant hotspot can be observed in the Southern and Tenderloin region with relatively less
dense plots in the surrounding neighborhoods. Majority of the crime seems to be concentrated
in these and surrounding regions.
Plotting a line graph to represent crime in each region on each day of a week uncovers some
interesting trends:
Crime rate by day of the week:
5. Clearly, the Southern region is the most notorious with the maximum number of crimes
reported followed by the Northern region. Richmond is the safest place to be in San Francisco.
The rate of crime is observed to increase over the weekend. The number of reported incidents
seems to be the maximum on Fridays with a total number of 133,743 reported incidents. For
almost all regions, the same trend can be observed. Tenderloin, however, is an exception as the
crime rate in this region decreases over the weekend and is the highest on Wednesdays.
Wednesday, Friday and Saturday are the most crime prominent days. Interestingly, the least
amount of incidents were reported on Mondays (116,707 reported incidents) followed by
Sundays. In the Southern, Northern, Central and Mission districts, the number of crimes
increased sharply on Friday and Saturday, then declined for rest of the week.
Specific crime trends
We examine the crime trends of a hand full of crimes (Kidnapping, Fraud, Robbery, Missing
Person, Burglary, Vandalism, Vehicle Theft, Drug/Narcotics, Assault and Larceny/Theft). The
following trends are observed:
Interestingly, Drug/Narcotic related crimes decrease over the weekend (unlike other
categories). The related reported incidents are the maximum on a Friday.
Among the high crime categories, larceny and theft tend to occur on Friday and Saturday. On
the other hand, the occurrences of assault steadily increased from Thursday to Sunday, while
burglary generally occurred more often during the weekdays than the weekends. Drug crimes
6. were most commonly reported on Wednesday and least reported during the weekends, and
robberies happened by approximately the same frequency every day.
Hourly Crime Analysis
The overall hourly crime trends are as follows:
Hourly trend analysis for specific crimes:
7. The hourly trends unravel some interesting crime facts:
5 am is the safest part of a day with 8,637 reported incidents and 6 pm is the most
dangerous hour with 55,104 reported incidents.
Surprisingly, 12 pm is the second most dangerous hour during a day and in fact is the
hour where incidents reported under some crime categories is maximum.
Kidnapping and stolen property incidents take place uniformly throughout the day
Violent Crimes in San Francisco: Assault, Robbery and Sex Offences (forcible)
Distribution:
Violent crime density (each) by parts of day:
Assault:
Part of Day
Time
Frame
Early
Morning
4am to
8am
Morning
8am to
12pm
Afternoon
12pm to
5 pm
Evening
5pm to
9pm
Night
9pm to
4am
8. Distribution of Assault crimes is more or less the same at different times of the day.
Robbery:
A slight shift in epicenter of the crime can be observed over different parts of the day.
Sex Offences(forcible):
Different Patterns can be observed at different time with increase in density at night.
9. Larceny and Theft:
Crime is spread across multiple regions early in the mornings and at night. It is concentrated on
a single region in the afternoon.
Food for thought
The following shows a bar graph representing the percentage of incidents which have been
investigated by law enforcement agencies vs the ones that haven’t been.
Surprisingly, for 61.02% of the reported incidents no conclusive action has been taken.
10. The following graph shows the top three streets where the maximum crime incidents were
reported [Bryant Street, Market Street, Mission Street]:
The police commission office is located on 850 Bryant Street which is just near the 800 Bryant
Street which has the most crime rate in terms of streets.
Market Street, with the second highest number of crime incidents reported, has a police station
0.5 miles away from it:
11. Conclusion
Tenderloin and Southern are the PD districts needing special attention
Famous Neighborhoods have more crime rate
Violent crime seems to be concentrated around Tenderloin.
Assault is most common violent crime.
Crime rate is highest at 6 pm and lowest at 5 am with a spike at 12 in afternoon that means
whenever there are people on the streets the crime rate is higher.
Pattern observed for each crime should be followed while deploying police personnel
SFPD needs to work more proactively as the 61% of cases are still not resolved
Also the crime rate near the police station needs to be decreased the SFPD needs to create
terror in the minds of criminals.
Need to find out why the months February has the lowest and October has the highest crime
commited
The second part of the Kaggle competetion that is predicting the classes of the crime has also
been completed and in process for further improvement, the current rank is 672 out of
1886[4].
References
[1] https://www.kaggle.com/c/sf-crime/data for Dataset
[2] http://www.r-bloggers.com/google-maps-and-ggmap/ for google maps and using ggmap in
R.
[3] http://blog.dominodatalab.com/geographic-visualization-with-rs-ggmaps/ creating clusters
and heatmaps in R
[4] https://www.kaggle.com/c/sf-crime/leaderboard?submissionId=2963937 Kaggle competition
submission