1. Big Data in GIS Environment
Shivaprakash Yaragal
M.Tech GIS(2015-17)
2. Objective
1. To investigate the existing capabilities of Esri products in handling huge data sets, including processing and analysis of data sets using Esri products.
2. To study the recent Esri architecture for Big Data processing.
3. Objective 1
• To investigate the existing capabilities of Esri products in handling huge data sets, including processing and analysis of data sets using Esri products.
Tasks involved
• Understanding Big Data in GIS
• Identifying Python packages and tools used for data processing with Esri products
• Identifying visualization packages and resources to be used with Esri products
• Working on New York taxi data
4. Spatio-Temporal Big Data
Data Source
Tool | Type | Open Source
Pandas | Python package | Yes
ArcPy | Python package | No
IPython | Python package | Yes
Anaconda | Python distribution | Yes
Tableau Public | Software | Free public version
FME | ArcGIS Interoperability Extension | No
Figure A: New York Taxi Data (Green Taxi)
Data source: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
5. Objective 1
Methodology
[Flowchart: Green Taxi Data -> CSV splitting into pickup data, drop-off data and ancillary data -> spatial join of the pickup data and of the drop-off data with the NY locality polygon -> merge data -> data filtering -> data visualization. Stages: Pre-Processing, Spatial Processing, Visualization. Tools shown for Method 1, Method 2 and Method 3: Python, ArcPy, Interoperability Extension, Tableau Public.]
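The CSV splitting step of the methodology could look roughly like the sketch below. It uses only the Python standard library so that it runs anywhere; pandas would do the same job with less code. The column names, the generated `trip_id` key and the two sample rows are assumptions made for illustration and are not taken from the slides.

```python
import csv
import io

# Hypothetical column names, loosely modelled on taxi trip records.
PICKUP_COLS = ["pickup_datetime", "pickup_longitude", "pickup_latitude"]
DROPOFF_COLS = ["dropoff_datetime", "dropoff_longitude", "dropoff_latitude"]


def split_trips(reader):
    """Split trip rows into pickup, drop-off and ancillary tables.

    Every output row carries a generated trip_id so the three tables
    can be merged again after the spatial joins.
    """
    pickup, dropoff, ancillary = [], [], []
    for trip_id, row in enumerate(reader, start=1):
        pickup.append({"trip_id": trip_id, **{c: row[c] for c in PICKUP_COLS}})
        dropoff.append({"trip_id": trip_id, **{c: row[c] for c in DROPOFF_COLS}})
        rest = {k: v for k, v in row.items() if k not in PICKUP_COLS + DROPOFF_COLS}
        ancillary.append({"trip_id": trip_id, **rest})
    return pickup, dropoff, ancillary


# Tiny in-memory sample standing in for the real CSV file (made-up values).
SAMPLE = """pickup_datetime,pickup_longitude,pickup_latitude,dropoff_datetime,dropoff_longitude,dropoff_latitude,passenger_count,fare_amount
2015-01-01 07:15:00,-73.93,40.69,2015-01-01 07:30:00,-73.95,40.67,1,9.5
2015-01-01 17:40:00,-73.94,40.66,2015-01-01 17:55:00,-73.96,40.69,2,11.0
"""

pickup_rows, dropoff_rows, ancillary_rows = split_trips(
    csv.DictReader(io.StringIO(SAMPLE))
)
print(pickup_rows[0])
```

Carrying a shared key through all three tables is what makes the later "Merge Data" step a simple join.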
6. Method 1
Preprocessing and Spatial Processing
• Data splitting, spatial processing and merging tool
• Post-processing tool
Visualization
• IPython analysis and visualization
Figure E: From Bushwick South to Crown Heights North (BK78-BK61)
Figure F: From Crown Heights South to Clinton Hill (BK63-BK69)
Figure B
Figure F
Figure C
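The merging and post-processing steps of Method 1 can be pictured as a join on a shared trip key followed by an origin-destination filter, for example to isolate trips from BK78 to BK61. The sketch below is a plain-Python illustration under assumed field names (`trip_id`, `nta`) and made-up rows; it is not the tool used in the study.

```python
def merge_on_trip_id(pickups, dropoffs):
    """Inner-join pickup and drop-off rows on trip_id."""
    drop_by_id = {d["trip_id"]: d for d in dropoffs}
    merged = []
    for p in pickups:
        d = drop_by_id.get(p["trip_id"])
        if d is not None:
            merged.append(
                {
                    "trip_id": p["trip_id"],
                    "pickup_nta": p["nta"],
                    "dropoff_nta": d["nta"],
                }
            )
    return merged


def filter_od(trips, origin, destination):
    """Keep only trips that start in origin and end in destination."""
    return [
        t
        for t in trips
        if t["pickup_nta"] == origin and t["dropoff_nta"] == destination
    ]


# Made-up rows: locality codes as they would come out of the spatial joins.
pickups = [
    {"trip_id": 1, "nta": "BK78"},
    {"trip_id": 2, "nta": "BK63"},
    {"trip_id": 3, "nta": "BK78"},
]
dropoffs = [
    {"trip_id": 1, "nta": "BK61"},
    {"trip_id": 2, "nta": "BK69"},
    {"trip_id": 3, "nta": "BK69"},
]

merged_trips = merge_on_trip_id(pickups, dropoffs)
bk78_to_bk61 = filter_od(merged_trips, "BK78", "BK61")
print(bk78_to_bk61)
```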
7. Spatial Processing
Method 2
• Spatial processing: spatial join and data merging
• Visualization: IPython or Tableau Public
Method 3
• Spatial processing: either ArcPy or the Interoperability tool
• Visualization: Tableau Public
Data Analytics and Visualization using Tableau Public
Figure G
Figure I
Figure H
Figure J
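To make the spatial join step concrete: it assigns each pickup or drop-off point the locality polygon that contains it. The study does this with ArcPy or the Interoperability tool; the sketch below swaps in a plain-Python ray-casting point-in-polygon test so that it runs without an ArcGIS license. The square "localities" and the point coordinates are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, localities):
    """Attach the code of the first containing locality to each point."""
    joined = []
    for pt in points:
        code = None
        for loc_code, ring in localities.items():
            if point_in_polygon(pt["x"], pt["y"], ring):
                code = loc_code
                break
        joined.append({**pt, "locality": code})
    return joined


# Hypothetical square localities; real locality polygons are far more complex.
localities = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
pickups = [
    {"trip_id": 1, "x": 1.0, "y": 1.0},
    {"trip_id": 2, "x": 3.0, "y": 0.5},
    {"trip_id": 3, "x": 9.0, "y": 9.0},
]
joined_pickups = spatial_join(pickups, localities)
print(joined_pickups)
```

A brute-force loop like this checks every point against every polygon, which is one reason dedicated GIS tools with spatial indexes matter for a data set of 1.2 million rows.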
8. Can we answer some questions?
Figure K: Peak traffic is from 7 am to 11 am. The fare stays close to the average even during peak hours, so there is no sign of dynamic fare pricing.
Figure L: Points circled in green are anomalies that deviate from the pattern and should be investigated. Points circled in red are outliers.
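The kind of check behind these two figures can be sketched as follows: average the fare per pickup hour to see whether peak hours are priced differently, and flag unusually high fares for inspection. The field names, the sample fares and the "three times the overall mean" threshold are assumptions for illustration only; the slides do not state how anomalies and outliers were identified.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_fare_by_hour(trips):
    """Average fare for each pickup hour (0 to 23)."""
    by_hour = defaultdict(list)
    for t in trips:
        hour = datetime.strptime(t["pickup_datetime"], "%Y-%m-%d %H:%M:%S").hour
        by_hour[hour].append(t["fare_amount"])
    return {h: mean(fares) for h, fares in sorted(by_hour.items())}


def flag_high_fares(trips, factor=3.0):
    """Return trips whose fare exceeds factor times the overall mean fare.

    This threshold rule is an illustrative assumption, not the study's method.
    """
    overall = mean(t["fare_amount"] for t in trips)
    return [t for t in trips if t["fare_amount"] > factor * overall]


# Made-up sample trips.
trips = [
    {"pickup_datetime": "2015-01-01 07:15:00", "fare_amount": 10.0},
    {"pickup_datetime": "2015-01-01 07:45:00", "fare_amount": 12.0},
    {"pickup_datetime": "2015-01-01 14:00:00", "fare_amount": 11.0},
    {"pickup_datetime": "2015-01-01 14:30:00", "fare_amount": 150.0},
]

hourly = mean_fare_by_hour(trips)
flagged = flag_high_fares(trips)
print(hourly)
print(flagged)
```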
9. Objective 1: What can we infer?
• Most passengers travel alone.
• Pickups of single passengers peak between 7 am and 10 am, and again between 4 pm and 10 pm.
• From a sustainability point of view, 2-seater vehicles (1 passenger) could be introduced for a more efficient transport system.
• Carpooling schemes could be devised for the evening peak hours between 3 pm and 9 pm.
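Inferences like these rest on two simple aggregations: the share of trips per passenger count, and the number of single-passenger pickups per hour. A minimal sketch, assuming each trip has already been reduced to a pickup hour and a passenger count (both field names and all values are made up):

```python
from collections import Counter


def passenger_share(trips):
    """Fraction of trips for each passenger count."""
    counts = Counter(t["passengers"] for t in trips)
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}


def single_rider_pickups_by_hour(trips):
    """Number of one-passenger pickups for each hour of the day."""
    return Counter(t["hour"] for t in trips if t["passengers"] == 1)


# Made-up sample trips.
trips = [
    {"hour": 8, "passengers": 1},
    {"hour": 8, "passengers": 1},
    {"hour": 9, "passengers": 2},
    {"hour": 18, "passengers": 1},
]

share = passenger_share(trips)
by_hour = single_rider_pickups_by_hour(trips)
print(share)
print(by_hour.most_common())
```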
Figure M
Figure N: 2-seater vehicle (1 passenger)
https://public.tableau.com/views/NYTaxiDataMultiDimentionalVisualizationRegionWise12468sitterdistribution/RegionWise12468sitterdistribution?:embed=y&:display_count=yes
10. Objective 1: Conclusion
Which is the best of the 3 methods?
• Data used: 235 MB, point data, 1.2 million rows
• System used: Windows 10 64-bit OS, 8 GB RAM, 1-core processor
Criterion | ArcPy (D) | ArcPy (S) | Python | Interoperability Extension | Python visualization | Tableau Public
Pre-Processing | 15 min | 2 min | 2 min | - | - | -
Spatial Processing | 1.5 hr | 20 min | - | 38 min | - | -
Post-Processing | 18 min | 3 min | 3 min | - | - | -
Visualization | - | - | - | - | Basic for data and analytics | Advanced for data analytics
Design Timing | 4 weeks | 4 weeks | 4 weeks | 1 week | 1 week (3 types of graphs) | 3 days
Open/License/Public | License | License | Open | License | Open | Public and free
Dependency | Independent | Independent | Independent | Depends on ArcGIS license | Independent | Independent
ArcPy (D): Desktop
ArcPy (S): Server
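Stage timings such as those in the table could be recorded with a small timing helper wrapped around each stage. The sketch below is one possible way to do it; the helper name `timed`, the stage labels and the placeholder workloads are assumptions, and the slides do not say how the reported timings were measured.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of a stage, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


# Placeholder workloads standing in for the real processing stages.
with timed("pre-processing"):
    rows = [{"trip_id": i} for i in range(1000)]

with timed("post-processing"):
    merged = [r for r in rows if r["trip_id"] % 2 == 0]

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.6f} s")
```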
11. Objective 2
• Conduct a study of Esri capabilities in the Big Data domain: Architecture
[Architecture diagram spanning Machine 1 to Machine 4. Machine 1: base ArcGIS Enterprise. Components shown: Portal for ArcGIS, hosting server, Web Adaptor (Portal), Web Adaptor (hosting server), Web Adaptor (GeoAnalytics Server), GeoAnalytics Server, Big Data File Share (HDFS folder), ArcGIS Relational Data Store, ArcGIS Spatiotemporal Store.]