Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013


(Presented by Esri)
When people analyze a problem, they often include location at the core of the analysis. Location and spatial context, combined with geographical knowledge, can make the biggest difference in understanding a problem and analyzing it in a more meaningful way.

In this session, we show how Amazon EMR can be used with location and geospatial analytics, and how the Amazon EMR API and the Python SDK were used to build tools that integrate Big Data and geospatial analysis. We also show powerful visualization options for displaying your results, using maps which can be shared in reports or distributed online and to mobile apps.

Published in: Technology
Transcript of "Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013"

  1. Adding Location and Geospatial Analytics to Your Big Data. Marwa Mabrouk, Esri. November 15, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Why Big Data and Geospatial?
  3. New Challenges for Organizations • Better decision making • Intelligence • Insight / foresight • Social data analysis • Log files analysis • Fraud detection
  4. Collect Data!!!
  5. Big Data – A New Data Type for Geospatial: Maps, Spreadsheets, Social Media, Big Data, Services, Sensor Networks, DBMS, Imagery
  6. Geospatial in Big Data: (1) Geo Enable & Enrich Big Data (Geo E&E) (2) Run spatial queries and operations on data where it resides (3) Results in Geospatial tools: Visualize results as a map; Include in a report; Publish in a web or mobile app
  7. Questions in Utilities: Smart Meters • Billions of readings • Where are the failures? • What was the weather like here? Did it impact operations in any of the areas? • Patterns of usage in specific areas?
  8. Questions in Agriculture: Tractor Control Box readings • Billions of readings • What was the yield in a field? – Broken down by 2 inch x 2 inch • What was the impact of weather (or other factors) on yield? • What are the other places with conditions like this place?
  9. Questions in Telco: Smart phones • Billions of readings • Where and when do people start using what kind of apps? • Patterns of usage in certain areas at certain times?
  10. Questions in Healthcare: Service Location • Doctor/ patient/ location and time of service – Fraud detection – Quality of service • Health indicator readings related to where the patient has been – Impact of conditions, like weather
  11. Questions in Social Media: Service Quality • Where are the most complaints/ praises about a brand? • Where is it best to start a limited rollout of a new product? • What is the impact of other factors on what people say? • Are there patterns within a certain area in how people react?
  12. Geospatial Analysis • Beyond a point on the map • Simple operations – Geometry relations • High level analysis – Hot spot analysis
  13. Implementing Geospatial Analysis in Big Data
  14. Geometry Relations: select * from cities where near(x,y,84.2,39.4);
  15. Geometry Relations: select * from cities where contains(x,y,'#mypolys');
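The two queries on slides 14 and 15 filter rows by a geometric relation between a point and a reference geometry. As a rough illustration of what such predicates compute, here is a minimal Python sketch. It is not the Esri implementation: the function names simply mirror the slides, the `radius` parameter is an assumption (the slide's `near` call does not show one), and distances are planar rather than geodesic.

```python
import math


def near(x, y, cx, cy, radius):
    """Return True if (x, y) lies within `radius` of (cx, cy).

    Uses plain Euclidean distance; `radius` is a hypothetical parameter
    that the slide's query does not show.
    """
    return math.hypot(x - cx, y - cy) <= radius


def contains(polygon, x, y):
    """Ray-casting point-in-polygon test.

    `polygon` is a list of (x, y) vertices in order. Points exactly on an
    edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rows standing in for the `cities` table on the slides.
cities = [
    {"name": "A", "x": 2.0, "y": 2.0},
    {"name": "B", "x": 5.0, "y": 2.0},
]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
cities_in_square = [c["name"] for c in cities if contains(square, c["x"], c["y"])]
```

In a Hadoop setting the same filter would run inside the query engine next to the data; the sketch only shows the per-row test.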
  16. GIS tools for Hadoop libraries: Esri Spatial UDF, Esri Geometry API
  17. GIS tools for Hadoop libraries • http://esri.github.com/gis-tools-for-hadoop/ • Support running geometry-based spatial queries inside Hadoop • Open Source – Apache 2.0 license
  18. Analysis Tools Integration: ArcGIS
  19. GIS tools for Hadoop libraries (ArcGIS) • Geoprocessing Tools: Connect from ArcGIS to Hadoop using GP • Esri Spatial UDF: Run Hive queries with spatial operators • Esri Geometry API: Build Map/Reduce spatial apps in Java
  20. GIS Tools for Hadoop Walkthrough
  21. Amazon Elastic MapReduce (Amazon EMR) • Easy to use • Elastic • Low cost • Reliable • Secure • Flexible
  22. Amazon EMR Data Stores • Amazon S3 • HDFS • Amazon Redshift • Amazon Glacier • Amazon RDS • Amazon DynamoDB
  23. Amazon EMR for Geospatial Analysis • Flexible platform to get started and grow large • Hosted and managed by Amazon Web Services – No need for large in-house Big Data infrastructure – No need to hire many people to maintain Hadoop • Data ecosystem in the cloud is leveraged – Geospatial data is usually large in size – Access to third-party datasets in the same ecosystem
  24. GIS tools for Hadoop libraries: Geoprocessing Tools (Connect from ArcGIS to Hadoop using GP), Esri Spatial UDF, Esri Geometry API, Amazon Elastic MapReduce (Amazon EMR)
  25. ArcGIS Geoprocessing Tools • Framework – Performing analysis – Managing geographical data • Rich library of analysis tools • Chaining tools to create models – Drag and drop model builder • Developing new custom tools – Python
  26. GP Tools for AWS • https://github.com/Esri/gptools-for-aws • GP tools to use – Amazon EMR – Amazon S3 • Open Source – Apache 2.0 license
  27. GP Tools for AWS Walkthrough
  28. Boto: A Python Interface to AWS • Python package • Supports multiple AWS services – Amazon EMR – Amazon S3 • Complete feature set needed for Amazon EMR • Reliable Amazon S3 implementation
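To make the Boto slide more concrete, the following is a hedged sketch of how a Hive job might be started on Amazon EMR with the legacy `boto` (version 2) package that was current in 2013. It is not taken from the talk or from GP Tools for AWS: the bucket names, job name, and instance settings are hypothetical, and the helper functions are introduced here only for illustration. The `boto` import is deferred so the argument-building part can be read and run without AWS access.

```python
def build_jobflow_args(name, log_uri, num_instances=3, instance_type="m1.large"):
    """Collect keyword arguments for EmrConnection.run_jobflow (boto 2).

    All values are illustrative assumptions, not settings from the talk.
    """
    return {
        "name": name,
        "log_uri": log_uri,
        "num_instances": num_instances,
        "master_instance_type": instance_type,
        "slave_instance_type": instance_type,
        # Keep the cluster running after the steps finish.
        "keep_alive": True,
    }


def launch_hive_cluster(region, hive_script_uri, jobflow_args):
    """Start an EMR cluster, install Hive, and run one HQL script.

    Requires the legacy `boto` package and configured AWS credentials.
    """
    import boto.emr
    from boto.emr.step import HiveStep, InstallHiveStep

    conn = boto.emr.connect_to_region(region)
    steps = [
        InstallHiveStep(),
        HiveStep(name="run-hql", hive_file=hive_script_uri),
    ]
    jobflow_id = conn.run_jobflow(steps=steps, **jobflow_args)
    # Guard against accidental termination of the running cluster.
    conn.set_termination_protection(jobflow_id, True)
    return jobflow_id


# Hypothetical S3 location; replace with a real bucket before use.
args = build_jobflow_args("tile-log-analysis", "s3://example-bucket/logs/")
```

Calling `launch_hive_cluster("us-east-1", "s3://example-bucket/scripts/analysis.hql", args)` would then submit the job; that call is left out of the block because it needs live credentials and incurs cost.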
  29. Boto Walkthrough
  30. A Real World Example
  31. Putting it all together! Geospatial analysis of log files • Using: GP tools for AWS • Goal: Analyze log files of a tile base-map web service – Real life high demand web service – Where is the most demand? • Map visualization
  32. The Architecture (diagram labels): ArcGIS Desktop + GP Tools for AWS; Amazon EMR Master Node; Amazon EMR Slave Node; Availability Zone #1; AWS cloud; Data; Scripts/ Logs/ output
  33. Data Files • Structured CSV files – ~8 GB • Data rows – Represented 1 month – More than 700 million records • Represents all 18 map scales – To know in which areas users are looking for details
  34. HQL Script • External tables for data rows • Calculations run through temp tables – Consolidate tile scales from most detailed to level 13 – Calculate points (x,y) representing each tile – Aggregate results – Format output as csv not tab delimited • Ported from RDBMS operations – Adapted to Hive
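The calculation steps on slide 34 (consolidate detailed tiles to level 13, derive a point per tile, aggregate) can be sketched in Python as follows. This is an illustration under stated assumptions, not the talk's HQL script: it assumes the common Web Mercator tiling scheme in which level z has 2^z by 2^z tiles with row 0 at the top, it assumes each log record carries a (level, row, col) triple, and it simply skips tiles coarser than level 13. All function names are hypothetical.

```python
import math
from collections import Counter

TARGET_LEVEL = 13


def to_target_level(level, row, col):
    """Map a tile at `level` to its ancestor tile at TARGET_LEVEL.

    Returns (row, col) at the target level, or None for coarser tiles.
    Each level doubles the grid, so the ancestor is found by bit shifting.
    """
    if level < TARGET_LEVEL:
        return None
    shift = level - TARGET_LEVEL
    return (row >> shift, col >> shift)


def tile_center(level, row, col):
    """Return (lon, lat) in degrees for the center of a Web Mercator tile."""
    n = 2 ** level
    lon = (col + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (row + 0.5) / n))))
    return lon, lat


def aggregate(records):
    """Count requests per level-13 tile from (level, row, col) records."""
    counts = Counter()
    for level, row, col in records:
        tile = to_target_level(level, row, col)
        if tile is not None:
            counts[tile] += 1
    return counts


# Hypothetical log records: three fall in the same level-13 tile, one is too coarse.
sample = [(15, 8, 12), (14, 4, 6), (13, 2, 3), (10, 0, 0)]
demand = aggregate(sample)
# One CSV-style output row per tile: lon, lat, request count.
rows = [tile_center(TARGET_LEVEL, r, c) + (n,) for (r, c), n in demand.items()]
```

Each output row carries an x/y pair and a count, which matches the kind of table that can be added as a point layer for display.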
  35. Visualization • Download output to local disk • Add as a layer, set x/y for display – Set coordinate system – Use visualization settings to cluster points and categorize • Use base maps
  36. Demo
  37. Lessons Learned • External tables and Amazon S3 • Cluster shutdown protection • Data – Partitioning • Cluster sizes vs. execution time – Standard Large – High Memory, XLarge vs Quadruple XLarge • Costs
  38. Summary • The value of asking Big Data spatial questions • Hadoop is now spatially enabled – GIS Tools for Hadoop • Boto for using Amazon EMR • Geospatial analysts empowered – GP Tools for AWS • Real world scenario using Amazon EMR and GP Tools
  39. Q&A
  40. Please give us your feedback on this presentation (BDT210). As a thank you, we will select prize winners daily for completed surveys!