Jongwook Woo
HiPIC
CSULA
CheckMate: Geolocation
Analyzer for Safe Residence
College of Business & Economics
Faculty Research Colloquium
Priyanka Kale, Priyal Mistry, Hitesh Jagtap, Jongwook Woo
California State University Los Angeles
December 1, 2016
High-Performance Information Computing Center (HiPIC)
California State University Los Angeles
Table of Contents
Introduction to Big Data
Implementation
Visualization
Conclusion
References
Data Issues
Large-scale data
Terabytes (10^12 bytes), petabytes (10^15 bytes)
– Because of the web
– Sensor data (IoT), bioinformatics, social computing,
streaming data, smartphones, online games…
Cannot be handled with the legacy approach
Too big
Non-/semi-structured data
Too expensive
Need new systems
Inexpensive
Big Data
Volume
Complexity
Variety
Variability
Two Cores in Big Data
How to store Big Data
How to compute Big Data
Google
How to store Big Data
– GFS (Google File System)
– Distributed systems on inexpensive commodity computers
How to compute Big Data
– MapReduce
– Parallel computing with inexpensive computers
Data-intensive supercomputers
Papers published in 2003 (GFS) and 2004 (MapReduce)
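The MapReduce idea named above can be illustrated with a minimal single-machine sketch. It only mimics the three phases (map, shuffle, reduce) in plain Python and is not how Hadoop distributes the work; the function names and the toy input are hypothetical and not taken from the project.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (key, 1) pair for every input record."""
    for record in records:
        yield record, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce_count(records):
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical toy input, not taken from the project's dataset.
sample = ["THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT"]
counts = map_reduce_count(sample)
```

In a real cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the shuffle between them.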
Definition: Big Data
Inexpensive frameworks that can
store large-scale data and process it
faster in parallel [5, 6]
Hadoop
– Inexpensive supercomputer
– More accessible than traditional supercomputers
• You can store your data and run your applications
– In your university labs, small companies, research
centers
NoSQL DB
– Cassandra, HBase, Couchbase, MongoDB
Motivation
Issues
– A person looking for a place of residence may find it
difficult to get reliable details about the safety of the area.
– It can be confusing to choose a location for a residence.
Solution
– Analyze open data to evaluate locations from a safety
perspective.
Our Approach
– System design with a flowchart
– Big Data: Hadoop Hive on MS Azure cloud computing
Flowchart
1. Download the public dataset
2. Upload the data into HDFS on the Big Data cluster
3. Trigger Hadoop Hive queries
4. Result data tables
5. Output visualization
Specifications in Big Data cluster
Cloud Computing: Microsoft Azure with Hortonworks sandbox
1. Operating system: Linux
2. Number of nodes: 4
3. CPU: 8 cores
4. Memory Size: 14 GB
Implementation: Hadoop Hive
– Hortonworks HDP
– Hadoop Big Data platform
– Ambari: a web GUI for Hadoop
– Explore the HDFS file system
– Process data with Hive queries, which run as Tez or MapReduce jobs
Analysis Result
– Detailed analysis made it easy to see which parts of
Chicago are safe places of residence and which areas
are unsafe.
– This analysis could play a vital role on any
house-finding website: added as a module, it can
help people learn more about the crime
history of a place.
– The following slides show some of the
important outcomes of the analysis.
High Performance Information Computing Center
Jongwook Woo
CSULA
Queries and Visualization
Rank of crime types by number of occurrences
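As a rough illustration of this ranking, the sketch below counts incidents per crime type and orders them from most to least frequent. It is a plain-Python stand-in for a grouped count, not the Hive query used in the project; the `primary_type` field name and the sample rows are assumptions made for the example.

```python
from collections import Counter


def rank_crime_types(rows, type_field="primary_type"):
    """Return (rank, crime_type, occurrences), most frequent first.

    `type_field` is a hypothetical column name; adjust it to the dataset.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(row[type_field] for row in rows)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, ctype, n) for rank, (ctype, n) in enumerate(ranked, start=1)]


# Hypothetical sample rows, not taken from the project's dataset.
rows = [
    {"primary_type": "THEFT"},
    {"primary_type": "BATTERY"},
    {"primary_type": "THEFT"},
    {"primary_type": "ASSAULT"},
    {"primary_type": "BATTERY"},
    {"primary_type": "THEFT"},
]
ranking = rank_crime_types(rows)
```

The same result could be expressed in Hive as a grouped count ordered by the count in descending order.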
Queries and Visualization
Table
Final Outcome of Analysis
– Using a similar approach,
we find out which areas of Chicago are safer than
others.
– Geospatial analysis
– The outcome file is plotted against the map of Chicago
– Visualized using green and red marker points
– Green indicates safe areas
– Red indicates unsafe areas
– The top 50 results are highlighted.
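One way to produce such green and red markers is sketched below. It assumes, purely for illustration, that safety is judged by the raw number of recorded incidents per area and that the top results are taken from both ends of the ranking; the slides do not state the exact criterion, and the function and area names are hypothetical.

```python
def mark_areas(crime_counts, top_n=50):
    """Label the safest areas "green" and the least safe areas "red".

    crime_counts maps an area name to its number of recorded incidents.
    At most `top_n` areas are taken from each end of the ranking, and the
    two groups never overlap.
    """
    ordered = sorted(crime_counts.items(), key=lambda kv: (kv[1], kv[0]))
    n = min(top_n, len(ordered) // 2)
    marks = {}
    for area, _ in ordered[:n]:
        marks[area] = "green"  # fewest incidents
    for area, _ in ordered[len(ordered) - n:]:
        marks[area] = "red"    # most incidents
    return marks


# Hypothetical counts, not taken from the project's results.
example = {"Area A": 5, "Area B": 120, "Area C": 40, "Area D": 300}
marks = mark_areas(example, top_n=1)
```

The resulting labels can then be joined with each area's coordinates and plotted on a map.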
Final Outcome of Analysis
[Figure: map of Chicago with the results plotted]
Conclusion
– An exhaustive analysis of geolocation data
for Chicago was carried out
– A user searching for a place of residence can
easily select a better neighborhood based
on its crime history
Future work
– Further analysis of individual areas can be done based
on other factors affecting residence
– Integrate this analysis with rental or leasing
companies
– Analyze more data for different locations
References
1. https://catalog.data.gov
2. https://cwiki.apache.org/confluence/display/Hive/Tutorial
3. https://hortonworks.com/tutorials
4. GitHub link: https://github.com/priya708/Project-520
5. Jongwook Woo and Yuhang Xu, “Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing”, The 2011 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2011), Las Vegas, July 18-21, 2011
6. Jongwook Woo, “Market Basket Analysis Algorithms with MapReduce”, DMKD-00150, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Oct 28, 2013, Volume 3, Issue 6, pp. 445-452
7. Jongwook Woo, “Big Data Trend and Open Data”, UKC 2016, Dallas, TX, Aug 12, 2016
THANK YOU
Any Questions?
