1. Geolocation Data Analysis for Safe Residence using HiveQL
TEAM: PRIYANKA KALE, PRIYAL MISTRY, HITESH JAGTAP
GUIDE: DR. JONGWOOK WOO
24th Annual Student Symposium, CSULA
26th February 2016
2. Table of Contents
1. Introduction
2. Big Data
3. Flowchart
4. Specifications
5. Implementation
6. Visualization
7. GitHub
8. Business Perspective
9. References
3. Introduction
Goal: determine whether a location is safe by analyzing a large
crime dataset (1.3 GB) for the city of Chicago, IL, covering
2001 to the present (November 2015).
This is a study of a real dataset provided by the government of
the United States of America, carried out with Big Data analytics
and related tools.
Query output is visualized with different graphs and maps for
easier interpretation.
9. Hive and Beeswax
Hive is an infrastructure built on top of Hadoop for data
summarization, query and analysis.
Beeswax is an application to perform Hive queries.
11. Total number and rank of each crime type:
select primary_type, count(iucr), rank() over (ORDER BY
count(iucr) desc) from crime group by primary_type limit 100;
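The query above places an aggregate (count) directly inside the OVER clause. Some Hive versions may reject that form, so a possible alternative is to aggregate in a subquery first and rank the result. This is only a sketch: the aliases total_crimes, crime_rank and t are illustrative names that do not appear in the original slides, and the final ORDER BY is added so the top-ranked types are the ones kept by the LIMIT.

```sql
-- Step 1 (inner query): count crimes per primary_type.
-- Step 2 (outer query): rank the per-type totals, highest count first.
-- Aliases total_crimes, crime_rank and t are hypothetical.
SELECT primary_type,
       total_crimes,
       rank() OVER (ORDER BY total_crimes DESC) AS crime_rank
FROM (
  SELECT primary_type, count(iucr) AS total_crimes
  FROM crime
  GROUP BY primary_type
) t
ORDER BY crime_rank
LIMIT 100;
```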
Queries and Visualization
12. Number of crimes per location type for a given area:
select location_description, count(iucr) from crime where
address = '008XX N MICHIGAN AVE' group by
location_description limit 100;
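As written, the query returns the location types in no particular order. If the result is meant to feed a bar chart, it may help to sort by count so the most frequent location types come first. A minimal variant, assuming the same crime table and columns; the alias total_crimes is an illustrative name:

```sql
-- Same filter and grouping as the query above, with the rows sorted
-- so the location types with the most crimes appear first.
SELECT location_description,
       count(iucr) AS total_crimes
FROM crime
WHERE address = '008XX N MICHIGAN AVE'
GROUP BY location_description
ORDER BY total_crimes DESC
LIMIT 100;
```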
[Chart: "Total", axis range 0 to 1200]
13. Final Outcome of Analysis:
CREATE TABLE UnsafeArea row format delimited fields terminated by ','
STORED AS RCFile
AS select address,count(iucr) AS total_crimes,rank() over (ORDER BY
count(iucr) desc) AS rank from crime GROUP BY address;
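Once the UnsafeArea table exists, checking a candidate residence becomes a simple lookup. The sketch below assumes the table and column names from the statement above; the address is the sample block used in the earlier query, and the cutoff of 10 in the second query is an arbitrary illustrative value. The column name rank is quoted with backticks to avoid any clash with the rank() function name.

```sql
-- How does one block compare? A low rank value means more recorded crimes.
SELECT address, total_crimes, `rank`
FROM UnsafeArea
WHERE address = '008XX N MICHIGAN AVE';

-- The blocks with the most recorded crimes (cutoff of 10 is illustrative).
SELECT address, total_crimes, `rank`
FROM UnsafeArea
WHERE `rank` <= 10
ORDER BY `rank`;
```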
15. Business Perspective
Get better advertisement
Predictive policing for police departments: the future of law
enforcement?
• Reducing Random Gunfire
• Connecting Burglaries and Code Violations