The document summarizes Margaret Furr's research analyzing crime incident data from 2012-2014 in Washington D.C. in relation to university campus locations. Furr created space-time cubes and conducted emerging hot spot analysis to identify crime patterns near campuses over time. The analysis found increasing trends in theft/other and theft/auto crimes near Howard University, as well as sporadic sexual assault incidents. Theft/other crimes were also increasingly clustered near other D.C. university campuses. Future work could include analyzing additional years of crime data and optimizing space-time cube parameters.
Crime Risk Forecasting and Predictive Analytics - Esri UC (Azavea)
Presentation at the 2011 Esri User Conference that included an overview of HunchLab features related to forecasting, specifically near repeat forecasts and load forecasts.
As we develop our crime analysis software, HunchLab, we are always on the lookout for ways of examining and improving data quality, as well as new academic research that shows promise to enhance crime analysis.
In this one-hour webinar, we first explain some of the ways we examine data quality when we utilize historic incident datasets for research and analysis and how you can use these techniques in your department. Then, we walk through a series of analytic techniques and practices that can help your department improve your crime analysis processes.
Deep Learning for Public Safety in Chicago and San Francisco (Sri Ambati)
Presentation on Deep Learning for Public Safety using open data sets from the cities of San Francisco and Chicago.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Forecasting Space-Time Events - Strata + Hadoop World 2015 San Jose (Azavea)
This presentation uses the speaker’s experience in building a crime forecasting package to outline some tools and techniques useful in modeling space-time event data. While the case study focuses on modeling crime, the techniques and tools presented are applicable to a broad selection of domains.
This presentation was given at Strata + Hadoop World 2015 in San Jose by Jeremy Heffner.
Hello Criminals! Meet Big Data: Preventing Crime in San Francisco by Predicti... (Tarun Amarnath)
Throughout the world, people look to San Francisco as a hub for technology; however, this reputation hides an undercurrent of crime in the City by the Bay. My experiment uses Azure ML and Python to predict, without bias, the category of crime likeliest to occur at a certain time and location in San Francisco.
This presentation covers the requirements to get started with HunchLab 2.0's predictive policing system. It starts discussing technical requirements (security, authentication) and then proceeds to discuss guidelines for configuring meaningful predictive models of crime. The presentation concludes with information about related geographic and temporal datasets that are useful in forecasting crime with recommendations on how to prioritize data sets to use in HunchLab.
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixt... (Nandana Mihindukulasooriya)
The presentation for the paper "A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case" presented at the 1st International Workshop on Completing and Debugging the Semantic Web at the 13th Extended Semantic Web Conference.
DSD-INT 2019 How machine learning will change flood risk and impact assessmen... (Deltares)
Presentation by Dennis Wagenaar, Deltares, at the Data Science Symposium, during Delft Software Days - Edition 2019. Thursday, 14 November 2019, Delft.
Big data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Hadoop Training in Chennai from BigDataTraining.IN, a leading global talent development corporation building a skilled manpower pool for global industry requirements. BigDataTraining.IN has grown to be among the world's leading talent development companies, offering learning solutions to individuals, institutions, and corporate clients.
This presentation gives an overview of geodemographic classifications and why there is a need to use open tools and methods for creating them. The presentation also describes the challenges involved in creating real-time geodemographic classifications and the use of social media data for geodemographic applications.
Revealing spatial and temporal patterns from Flickr photography: a case study... (Sander van der Drift)
An exploratory visual analytics approach was used to identify temporal distributions, spatial clusters and popular routes of tourists in Amsterdam by making use of geotagged photos from the social media platform Flickr. The presented methods combine the analytical strength of humans with the data processing power of computers, using geovisualisations and charts to explore data, find patterns, and draw conclusions from the outcomes. For this research, the metadata of 2,849,261 geotagged photos was harvested from Flickr and stored in a spatial database. From this dataset, 393,828 photos were located in the municipality of Amsterdam. A semi-automatic classification method classified 39.1% of the users as tourists with very high precision and recall. The temporal distribution of tourists and locals is compared for different temporal granularities. A method is presented to assess photo timestamps by making use of photos that contain a real clock. An existing grid-based clustering method was implemented and improved to explore Amsterdam's spatial distribution of tourists in Google Earth. The major tourist hotspots are detected using the density-based clustering algorithm DBSCAN. Finally, the most probable routes of tourists between subsequent photo locations were estimated and aggregated into a route density map. A qualitative approach was used to validate the study outcomes by interviewing eight tourism experts of the municipality of Amsterdam. Their knowledge about the city bears a good resemblance to the detected spatial clusters and route density map of tourists. Despite several imperfections of geosocial data, we conclude that the methods provide meaningful insight into the spatial and temporal patterns of tourists in urban spaces and are a valuable addition to traditional tourism surveys.
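The hotspot-detection step above relies on DBSCAN. A minimal, self-contained sketch of the algorithm on toy 2-D points (the brute-force neighbor search and the example coordinates are illustrative, not the study's actual implementation):

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points; returns one label per point, -1 = noise."""
    n = len(points)

    def neighbors(i):
        # Brute-force epsilon-neighborhood (includes the point itself)
        return [j for j in range(n)
                if hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1                       # i is a core point: start a cluster
        labels[i] = cluster
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:         # j is also core: keep expanding
                seeds.extend(nj)
    return labels

# Two tight groups of points plus one isolated outlier
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=2.0, min_pts=3)   # -> [0, 0, 0, 1, 1, 1, -1]
```

Density-based clustering suits hotspot detection because it needs no preset cluster count and marks sparse points as noise rather than forcing them into a cluster.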
What does it take to create a web of government Linked Data? The UK government is finding out. Our story is one of pioneers. You will hear how we are moving out of existing settlements to the wide plains of government data. How we are starting to build the first railroads across this vast territory to open a new lands of opportunity. All the time, of course, having to avoid both outlaws and the Civil War back east.
From Virtual Museums to Peacebuilding: Creating and Using Linked Knowledge (Craig Knoblock)
Companies, such as Google and Microsoft, are building web-scale linked knowledge bases for the purpose of indexing and searching the Web, but these efforts do not address the problem of building accurate, fine-grained, deep knowledge bases for specific application domains. We are developing an integration framework, called Karma, which supports the rapid, end-to-end construction of such linked knowledge bases. In this talk I will describe machine-learning techniques for mapping new data sources to a domain model and linking the data across sources. I will also present several applications of this technology, including building virtual museums and integrating data sources for peacebuilding.
Safety is a major concern everywhere, and many crimes happen every day. Analyzing crime data can identify the frequency of crimes, the types of crimes, areas with a higher number of crimes, and more. These insights then have the potential to aid proactive preventive measures by police, increasing the level of safety in certain areas. To add a different dimension to the analysis, we considered California State University Los Angeles as our focal point and projected the data based on different parameters such as time and distance, extracting key findings about crimes occurring around California State University Los Angeles and in Los Angeles.
Similar to ArcGIS Space-Time Mining of Crime Data
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration; the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
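The progression the outline describes, from simple retrieval and filtering through aggregation to more advanced queries, can be tried directly with Python's built-in sqlite3 module. The table and values below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 100), ("east", 250), ("west", 80), ("west", 300)])

# Foundations: retrieval with filtering
cur.execute("SELECT region, amount FROM sales WHERE amount > 90")
rows = cur.fetchall()                       # three rows pass the filter

# Aggregation: totals per region
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
totals = cur.fetchall()                     # [('east', 350.0), ('west', 380.0)]

# Advanced: filter on an aggregate with HAVING and a subquery
cur.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING total > 360
""")
big = cur.fetchall()                        # [('west', 380.0)]
```

An in-memory SQLite database is a convenient sandbox for practicing each layer of SQL before pointing the same queries at a production engine.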
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
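The decomposition the abstract describes can be sketched compactly: find strongly connected components, then solve each component's ranks in topological order, so earlier components' ranks are final inputs to later ones. This is an illustrative single-threaded toy (recursive Tarjan, Gauss-Seidel-style updates), not the report's benchmarked code, and it assumes no dead ends:

```python
def tarjan_sccs(adj):
    """Tarjan's algorithm; emits SCCs in reverse topological order."""
    n = len(adj)
    index, low, onstack = [None] * n, [0] * n, [False] * n
    stack, comps, counter = [], [], [0]

    def strong(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); onstack[v] = True
        for w in adj[v]:
            if index[w] is None:
                strong(w)
                low[v] = min(low[v], low[w])
            elif onstack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); onstack[w] = False; comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in range(n):
        if index[v] is None:
            strong(v)
    return comps

def levelwise_pagerank(adj, d=0.85, tol=1e-12, iters=200):
    """Rank one SCC at a time in topological order (graph must have no dead ends)."""
    n = len(adj)
    into = [[] for _ in range(n)]
    for u, outs in enumerate(adj):
        for v in outs:
            into[v].append(u)
    rank = [1.0 / n] * n
    for comp in reversed(tarjan_sccs(adj)):     # topological order
        for _ in range(iters):
            delta = 0.0
            for v in comp:
                nv = (1 - d) / n + d * sum(rank[u] / len(adj[u]) for u in into[v])
                delta = max(delta, abs(nv - rank[v]))
                rank[v] = nv
            if delta < tol:                     # this component has converged
                break
    return rank

# Toy graph: SCC {0,1,2} fed by the single-vertex SCC {3}
r = levelwise_pagerank([[1], [2], [0], [0]])
```

Because vertex 3 has no in-links, its rank is exactly (1-d)/n and is fixed before the cycle component is ever touched, which is precisely what makes per-component processing communication-free.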
Enhanced Enterprise Intelligence with your personal AI Data Copilot (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
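The RAG pattern mentioned above reduces to two steps: retrieve the most relevant documents, then build an augmented prompt around the question. A deliberately tiny sketch with a toy corpus and crude token-overlap retrieval (real systems use embeddings and a vector store; all names below are illustrative):

```python
def retrieve(query, corpus, k=2):
    """Rank documents by naive token overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus,
                  key=lambda doc: len(q & set(doc.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, corpus):
    """RAG-style prompt: stuff retrieved context ahead of the question."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "The sales table lives in the warehouse schema analytics.sales",
    "Deploys run every Friday at noon",
    "Revenue is computed as sum of amount in analytics.sales",
]
prompt = build_prompt("How is revenue computed from the sales table?", corpus)
```

The point of the architecture is that the LLM never needs the whole knowledge base in its context window; only the top-k retrieved snippets travel with each question.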
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
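The core idea, routing reads through a compliance-enforcing SQL view instead of the raw table, can be illustrated generically. This is not ViewShift's code; the table, columns, and masking rule below are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER, name TEXT, email TEXT, consented INTEGER)")
conn.executemany("INSERT INTO members VALUES (?, ?, ?, ?)",
                 [(1, "Ada", "ada@example.com", 1),
                  (2, "Bob", "bob@example.com", 0)])

# A policy-enforcing view: mask the email of members who have not consented.
# In a ViewShift-style setup the catalog would resolve the table name to
# this view automatically; here we query it explicitly.
conn.execute("""
    CREATE VIEW members_masked AS
    SELECT id, name,
           CASE WHEN consented = 1 THEN email ELSE 'REDACTED' END AS email
    FROM members
""")
rows = conn.execute("SELECT id, email FROM members_masked ORDER BY id").fetchall()
# rows -> [(1, 'ada@example.com'), (2, 'REDACTED')]
```

Because the transformation lives in the view definition, every engine that resolves the view gets the same consent-aware behavior without per-query changes, which is the portability property the slides emphasize.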
#SQL #Views #Privacy #Compliance #DataLake
Global Situational Awareness of A.I. and where it's headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank : SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
1. Margaret Furr
MINING CRIME INCIDENT DATA FOR SPACE-TIME PATTERNS AROUND WASHINGTON, DC CAMPUSES
Photo Source: http://www.thecollegefix.com/post/19184/
2. DATA
• 2012 Crime Incidents; 2013 Crime Incidents; 2014 Crime Incidents
• File type: shapefile, points
• Variables of interest: X coordinates, Y coordinates, Offense, Report Date, Start Date, End Date
• Observations: 109,656
• Projection: Spatial reference = 26985
• Source: Open DC Data
• University and College Campuses; Campus Areas Zoning
• File type: shapefile, polygons
• Variables of interest: Campus names, Campus areas, Campus lengths
• Observations: 8 college/university campus zoning areas, 30 colleges/universities
• Projection: Spatial reference = 26985
• Source: Open DC Data
• USStates
• File type: shapefile, polygons
• Observations: 1 for each state + 1 for Washington, DC
• Projection: GCS_WGS_1984
• Source: United States Geodatabase
Photo Source: http://opendata.dc.gov/
3. RESEARCH MOTIVATIONS
• Crime -- less likely to occur on campuses
• Washington, DC -- not a city with the most crime
• 2 DC universities - ranked as the most dangerous campuses in some news reports
• Gallaudet University
• Howard University (7 forcible rapes, 90 robberies, 27 aggravated assaults, 160 burglaries, 43 car thefts)
• In general -- growing concern about students’ safety on campuses since the 1990s
• Understanding crime frequencies and trends across time helps (1) police departments,
(2) administrators, (3) policymakers, (4) journalists and (5) students make decisions
4. RESEARCH MOTIVATIONS
• Researchers have analyzed space-time crime patterns
• None have looked at DC crimes as they relate to campuses’ locations
• I have conducted spatial-regression analyses using Chicago crime data and R spatial packages
• I have not conducted space-time analyses using the time series component of crime data, but I find this component to be an important one
• I have not analyzed crime data from the DC area, but this is where 2 universities are reported to be unsafe
• I have not used ArcGIS tools yet!
5. RESEARCH QUESTIONS
• What is the frequency of each crime type at locations near campuses, and how do these frequencies compare to the frequencies of each type far from campuses?
• Looking at the start datetimes of 2012-2014 crime reports, are there any patterns in when each incident occurs near campuses?
• Do these patterns reveal emerging trends? If so, how do trends differ by crime type?
• What trends are emerging near the two most dangerously ranked DC campuses?
6. RESEARCH APPROACH
• Space-Time Analysis of crime within buffered campus areas
• Space-Time Cube
• Emerging Hot Spot Analysis
7. TOOLS
• Define Projection and Project
• Selection
• Merge
• Buffer
• Frequency Analysis
• Convert Time
• Create Space-Time Cube
• Emerging Hot Spot Analysis
Photo Sources: https://desktop.arcgis.com/en/desktop/latest/tools/space-time-pattern-mining-toolbox/learnmorecreatecube.htm
https://desktop.arcgis.com/en/desktop/latest/tools/space-time-pattern-mining-toolbox/learnmoreemerging.htm
9. PROJECTION
• Crime Incident data and Campus Area data had no projection
• Metadata said this data was 26985
• Defined the data as NAD83 Maryland (State Plane)
• Reprojected the data to NAD83 UTM Zone 17N
• DC data had GCS_WGS_1984 coordinates (WGS_1984 datum)
• Reprojected the data to NAD83 UTM Zone 17N
10. MERGE
• Merged Crime Incidents
• 2012 Crime Incidents (NAD83-17N)
• 2013 Crime Incidents (NAD83-17N)
• 2014 Crime Incidents (NAD83-17N)
• Merged Campus Areas
• University College Campuses (NAD83-17N)
• Campus Areas Zoning (NAD83-17N)
11. DC LAYER: SELECT
• DC Open Data does not have a shapefile for the general DC area
• UnitedStates.gdb has a USStates shapefile, which includes DC as one state
• Selected DC from USStates
• Exported the selection as its own DC layer (DC_NAD8317)
15. CREATE BUFFERS
• Create buffer around campus areas
• 1000 ft. is a standard buffer for schools
• 1000 ft ≈ 305 m, used as the first buffer distance
• 1500 ft (458 m) and 2000 ft (610 m) as the second and third buffer distances
• SELECT crime points within the buffers
• 40,873 incidents fall within 2000 ft (610 m) of campus areas
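The buffer selection on this slide, keeping only incidents within a given distance of campus areas, can be approximated in plain Python once the data is in a projected (meter-based) coordinate system. A toy sketch with a hypothetical campus centroid and made-up incident coordinates (the real workflow buffers polygons, not a single center point):

```python
from math import hypot

def within_buffer(points, center, buffer_ft):
    """Select projected (x, y) points within buffer_ft feet of center.
    Coordinates are assumed to be in meters (e.g., UTM), so plain
    Euclidean distance is meaningful."""
    buffer_m = buffer_ft * 0.3048          # feet -> meters (1000 ft = 304.8 m)
    cx, cy = center
    return [p for p in points if hypot(p[0] - cx, p[1] - cy) <= buffer_m]

# Hypothetical incident coordinates (meters) around a campus centroid at (0, 0)
incidents = [(100, 50), (250, 200), (400, 300), (700, 100)]
near = within_buffer(incidents, (0, 0), 1000)   # 1000 ft buffer -> [(100, 50)]
```

Widening the buffer to 2000 ft sweeps in more incidents, which mirrors the growing counts across the three buffers in the slide's table.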
19. BUFFERED FREQUENCIES BY TYPE

Offense                        All data  Buffer 1  Buffer 2  Buffer 3
Arson                                95        11        15        21
Assault w/ dangerous weapon        7224       783      1174      1533
Burglary                          10151      1470      2247      2991
Homicide                            296        28        39        52
Motor vehicle theft                8633       967      1573      2070
Robbery                           11470      1568      2437      3341
Sex abuse                           862       171       232       269
Theft f/ auto                     31299      6283      9650     13162
Theft / other                     39269      8540     13561     17414

[Bar chart: Frequency Differences — each buffer's type frequencies as a
share of the all-data type frequencies (0–50%), for buffers 1–3]
20. SPACE-TIME PATTERN MINING TOOLBOX:
CREATE SPACE TIME CUBE
• “Summarizes a set of points into a netCDF data structure by aggregating
them into space-time bins.”
• “Within each bin, the points are counted.”
• “For all bin locations, the trend for counts over time are evaluated.”
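The aggregation the tool describes can be illustrated with a minimal pure-Python sketch: each (x, y, date) event is assigned to a (column, row, time-step) bin and the bins are counted. The 134 m cell size and 7-day step echo the cube parameters reported later in these slides; the events themselves are invented.

```python
from collections import Counter
from datetime import date

def space_time_bins(events, distance_interval, time_step_days, origin):
    """Count (x, y, date) events in space-time bins, mimicking the
    aggregation performed by Create Space Time Cube."""
    counts = Counter()
    for x, y, d in events:
        col = int(x // distance_interval)
        row = int(y // distance_interval)
        step = (d - origin).days // time_step_days
        counts[(col, row, step)] += 1
    return counts

# Invented events: two in the same cell and week, one elsewhere
events = [
    (10.0, 20.0, date(2012, 1, 2)),
    (15.0, 25.0, date(2012, 1, 3)),
    (300.0, 20.0, date(2012, 1, 20)),
]
cube = space_time_bins(events, distance_interval=134.0,
                       time_step_days=7, origin=date(2012, 1, 1))
```

The real tool additionally writes the counts to a netCDF structure and evaluates a trend per bin location, which this sketch omits.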
21. SPACE-TIME PATTERN MINING TOOLBOX:
EMERGING HOT SPOT ANALYSIS
• “Identifies trends in the clustering of point densities (counts) or
summary fields in a space time cube created [using the Create Space Time
Cube tool].”
• “Categories: (1) new, (2) consecutive, (3) intensifying, (4) persistent,
(5) diminishing, (6) sporadic, (7) oscillating, and (8) historical hot and
cold spots.”
22. CONVERT TIME
• Original date fields were String type
• Date type is required by the Space-Time Pattern Mining tools
• Converted Report date, Start date, and End date
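In ArcGIS this string-to-date conversion is done with the Convert Time Field tool; the same idea in plain Python looks like the sketch below. The "%m/%d/%Y %H:%M" format string is an assumption for illustration, not the documented DC Open Data field format.

```python
from datetime import datetime

def to_date(field):
    """Parse a report/start/end date string into a real datetime,
    the type required by the Space-Time Pattern Mining tools.
    The format string is an assumed example, not the actual
    DC Open Data layout."""
    return datetime.strptime(field, "%m/%d/%Y %H:%M")

start = to_date("01/15/2013 23:40")
```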
24. CREATE SPACE TIME CUBE
• Time Step Alignment options: start time, end time, reference time
• Chose end time; eliminates the bias of choosing a reference time
• Time Step Interval: 1 week
• Distance Interval:
• Calculated an optimal interval with an algorithm that considers the
spatial distribution (histogram bin-width optimization)
• Template Cube: not used
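Esri does not publish its exact interval-optimization algorithm in these slides, but data-driven histogram bin-width rules of this family are standard. Scott's rule is one such rule, shown here purely as an illustration; the tool's internal algorithm may differ, and the coordinates are invented.

```python
import statistics

def scott_bin_width(values):
    """Scott's histogram bin-width rule: 3.49 * stdev * n^(-1/3).
    An illustrative analogue of the tool's data-driven distance
    interval, not Esri's published algorithm."""
    n = len(values)
    return 3.49 * statistics.stdev(values) * n ** (-1.0 / 3.0)

# Hypothetical 1-D projected coordinates in meters
xs = [0.0, 50.0, 120.0, 200.0, 260.0, 400.0, 520.0, 610.0]
width = scott_bin_width(xs)  # roughly 387 m for this toy sample
```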
25. STATISTICS FROM SPACE TIME CUBE
• Mann-Kendall Statistic
• Statistical question: “are the events represented by the input points increasing or
decreasing over time?”
• Input: “the number of points for all locations in each time-step interval,
analyzed as a time series of count values”
• Rank correlation analysis between the bin counts (or values) and their time sequence
• +1 if an earlier bin < a later bin
• -1 if an earlier bin > a later bin
• 0 if the bins are equal
• Results are summed
• The observed sum is compared to the expected sum
• p-value
• Small p-value: the trend is statistically significant
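The pairwise comparisons above can be written out as a small function. Note that the full Mann-Kendall S statistic compares every earlier/later pair of bins, not just consecutive ones; the sketch below is that standard form, run on toy count series.

```python
def mann_kendall_s(counts):
    """Mann-Kendall S statistic for a time series of bin counts:
    +1 for each later value above an earlier one, -1 for each below,
    0 for ties, summed over all ordered pairs."""
    s = 0
    n = len(counts)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if counts[j] > counts[i]:
                s += 1
            elif counts[j] < counts[i]:
                s -= 1
    return s

rising = mann_kendall_s([1, 2, 3, 5, 8])  # every pair increases: S = 10
flat = mann_kendall_s([4, 4, 4, 4])       # all ties: S = 0
```

In the tool, the observed S is then compared against its expected value and variance under no trend to produce the p-value.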
26. STATISTICS FROM SPACE TIME CUBE

                                     All Data       Buffered Data 3
Total Number of Locations                8989                  8400
Locations with at least one point        2945                  1814
Associated Bins                       5289220               3198082
% Non-zero (Sparseness)                  1.54                  1.03
Time Step Interval                     1 week                1 week
Distance Interval                       201 m                 134 m
Number of Time Steps                     1796                  1796

Cube Extent Across Space
Min X                          836904.0313 m         837838.3256 m
Min Y                         4303610.9737 m        4309021.9271 m
Max X                          854777.2083 m         849063.8549 m
Max Y                         4323770.1568 m        4322418.4232 m
Rows                                      101                   100
Columns                                    89                    84
Total bins                           16144244              14809200

Overall Data Trend
Trend Direction                    Increasing            Increasing
Trend Statistic                       24.3788               23.9058
Trend p-value                               0                     0
27. STATISTICS FROM SPACE TIME CUBES BY TYPE
• Theft/Other: Increasing Trend, Significant
• Theft f/Auto: Increasing Trend, Significant
• Robbery: Decreasing Trend (statistic > -5), Significant
• Burglary: Increasing Trend (statistic < 5, unlike the others’ > 18), Significant
(p-value > 0, unlike the others’ p-values of 0)
• Motor Vehicle Theft: Increasing Trend (statistic < 6), Significant
• Assault with Dangerous Weapon: Decreasing Trend, Not Significant
• Sex Abuse: Increasing Trend, Significant
• Homicide: too few records for analysis
• Arson: too few records for analysis
32. CONCLUSIONS
• Around Howard University there are significant patterns of theft/auto crimes and also sexual
assault
• Theft/auto incidents are emerging consecutively
• Sexual assault incidents are emerging sporadically
• These two types are among those reported in the media
• Around George Washington University and Georgetown University there are also significant
patterns of theft/other crimes
• Theft/other incidents are emerging consecutively on the non-river side of George Washington
University, toward Howard University
• Theft/other incidents are emerging sporadically on the river side of Georgetown University
and George Washington University
• Both types of theft are emerging consecutively around other US colleges’ and universities’ DC
campuses
• Cornell in Washington, University of California Washington Center
33. FUTURE WORK
• Analyze more years of crime data
• Experiment with ArcGIS Pro or ArcGIS’s multidimensional toolbox
to visualize the space-time cube and understand it better
• Develop a Python script to tune (optimize) the time and distance
intervals, as well as the buffer distance, for more accurate results
• Experiment with the template cube for potentially more consistent
results across types
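The parameter-tuning idea could start as a brute-force grid search over the three settings. The scoring function below is a placeholder; in practice it would rebuild the cube (for example via arcpy) and evaluate the result, which is beyond this sketch, and the preferred values in the toy score are simply the ones used in this study.

```python
from itertools import product

def tune_cube_parameters(score, time_steps, distance_intervals, buffers):
    """Grid-search time step (days), distance interval (m), and buffer
    distance (m); returns (best_score, t, d, b). `score` is supplied
    by the caller."""
    best = None
    for t, d, b in product(time_steps, distance_intervals, buffers):
        value = score(t, d, b)
        if best is None or value > best[0]:
            best = (value, t, d, b)
    return best

# Toy score that happens to prefer the settings used in this study
def toy_score(t, d, b):
    return -(abs(t - 7) + abs(d - 134.0) / 10 + abs(b - 305.0) / 10)

best = tune_cube_parameters(toy_score, [7, 14], [100.0, 134.0], [305.0, 610.0])
```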