Spatial Data processing with Hadoop

  1. GeoJinni: Spatial Data Processing with Hadoop. Ahmed Eldawy, University of Minnesota. http://spatialhadoop.cs.umn.edu/ @spatialhadoop
  2. Claudius Ptolemy (AD 90 – AD 168)
  3. Al Idrisi (1099–1165)
  4. Cholera cases in the London epidemic of 1854
  5. "Cool technology! Can I use it in my application?" "My pleasure. Here it is." "Oh! But it is not made for me. I can't make use of it as is."
  6. "Kindly let me get the technology you have." "Kindly let me understand your needs."
  7. "HELP! I have too much data. Your technology is not helping me." "Mmm... Let me check with my good friends there." "Cool DBMS technology! Can I use it in my application?" "My pleasure. Here it is." "Oh! But it is not made for me. I can't make use of it as is."
  8. "Kindly let me understand your needs." "Kindly let me get the technology you have."
  9. "HELP! Again, I have too much data. Your technology is not helping me." "Sorry, it seems the DBMS technology cannot scale any further. Let me check with my other good friends there." "Cool MapReduce technology! Can I use it in my application?" "My pleasure. Here it is." "Oh! But it is not made for me. I can't make use of it as is."
  10. "Kindly let me understand your needs." "Kindly let me get the technology you have."
  11. "Kindly let me understand your needs." "Kindly let me get the technology you have," aka GeoJinni.
  12. Tons of spatial data out there: VGI, sensor networks, smart phones, satellite images, medical data, traffic data, geotagged microblogs, geotagged pictures
  13. GeoJinni
      Website: http://spatialhadoop.cs.umn.edu/ (download source code, binary distribution, and instructions)
      Email us at: shadoop@cs.umn.edu
      ■ Released in March 2013; 75,000 downloads since then
      Components: spatial language, built-in spatial data types, spatial indexes, spatial operations
  14. The Built-in Approach of GeoJinni
      ■ The on-top approach: user programs run on top of an unmodified Hadoop stack (Pig Latin, Hadoop Java APIs, job monitoring and scheduling, MapReduce runtime, storage in HDFS)
      ■ The from-scratch approach: the (spatial) user program must come with its own MapReduce APIs, job monitoring and scheduling, MapReduce runtime, storage, and so on
      ■ The built-in approach (GeoJinni): spatial modules (spatial language, spatial operators, early pruning, spatial indexing) are built into the Hadoop stack beneath the user programs (Pig Latin, Hadoop Java APIs, job monitoring and scheduling, MapReduce runtime, storage in HDFS)
  15. Spatial Data & Hadoop
      With plain Hadoop, points are stored as plain numbers and the range filter is spelled out by hand:
      points = LOAD 'points' AS (id:int, x:int, y:int);
      result = FILTER points BY x < xmax AND x >= xmin AND y < ymax AND y >= ymin;
      This takes 193 seconds. With GeoJinni, points are a built-in type and the filter uses a spatial predicate:
      points = LOAD 'points' AS (id:int, location:point);
      result = FILTER points BY IsOverlap(location, rectangle(xmin, ymin, xmax, ymax));
      This finishes in 2 seconds.
  16. GeoJinni Architecture
      Applications: MNTG [SSTD’13, ICDE’14], SHAHED [ICDE’15], TAREEG [SIGMOD’14, SIGSPATIAL’14], Spatio-temporal Hadoop
      Language: Pigeon [ICDE’14]
      Operations: Basic [VLDB’13], CG_Hadoop [SIGSPATIAL’13], Data Mining, Visualization [Under submission]
      MapReduce: Spatial File Splitter, Spatial Record Reader
      Indexing: Grid File, R-tree, R+-tree [ICDE’15]
  17. Language Layer: Pigeon
      ■ Extends Pig Latin with OGC-compliant primitives: spatial data types (e.g., Polygon), basic operations (e.g., Area), spatial predicates (e.g., Touches), spatial analysis (e.g., Union), and spatial aggregate functions (e.g., Convex Hull)
      cities = LOAD 'cities' AS (city_id: int, city_geom);
      City_area = FOREACH cities GENERATE Area(city_geom) AS area;
      A. Eldawy and M. F. Mokbel. Pigeon: A Spatial MapReduce Language. In ICDE, 2014
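The Pigeon example computes `Area(city_geom)` for each city. As a hedged illustration of what such a basic operation computes for a simple polygon, here is a shoelace-formula sketch in Python. It is not Pigeon's implementation (OGC area semantics also cover holes and multi-polygons, which this ignores), and `polygon_area` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def polygon_area(ring: Sequence[Point]) -> float:
    """Area of a simple polygon given as (x, y) vertices (shoelace formula).

    The ring may repeat its first vertex at the end, as WKT rings do.
    Holes and multi-polygons are not handled in this sketch.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex
    n = len(ring)
    if n < 3:
        return 0.0  # degenerate: fewer than three distinct vertices
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    # The sign depends on orientation; area is the absolute value.
    return abs(twice_area) / 2.0

# A closed 4 x 3 rectangle.
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]))  # 12.0
```

Taking the absolute value makes the result independent of whether the ring is listed clockwise or counter-clockwise.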
  18. Indexing Layer: R+-tree
  19. Indexing Layer: Grid File
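A Grid File index, roughly speaking, divides the input space into cells and keeps each cell's records together as one partition, recording the cell boundary so later queries can skip partitions. The Python sketch below shows only that uniform partitioning step, under stated assumptions: the dataset's bounding rectangle is known and the grid size is fixed up front (how GeoJinni actually chooses the number of cells is not shown). `grid_partition` and `cell_bounds` are hypothetical helpers, not GeoJinni APIs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]
Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def grid_partition(points: List[Point], bounds: Bounds,
                   cols: int, rows: int) -> Dict[Cell, List[Point]]:
    """Assign each point to one cell of a uniform cols x rows grid over bounds."""
    x_min, y_min, x_max, y_max = bounds
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for x, y in points:
        # Clamp so points on the upper boundary land in the last cell.
        col = max(0, min(int((x - x_min) / cell_w), cols - 1))
        row = max(0, min(int((y - y_min) / cell_h), rows - 1))
        cells[(col, row)].append((x, y))
    return dict(cells)

def cell_bounds(cell: Cell, bounds: Bounds, cols: int, rows: int) -> Bounds:
    """Rectangle covered by a cell; kept with the partition to enable pruning."""
    x_min, y_min, x_max, y_max = bounds
    col, row = cell
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows
    return (x_min + col * cell_w, y_min + row * cell_h,
            x_min + (col + 1) * cell_w, y_min + (row + 1) * cell_h)

parts = grid_partition([(1, 1), (9, 9), (5, 5), (10, 10)], (0, 0, 10, 10), 2, 2)
print(parts)  # {(0, 0): [(1, 1)], (1, 1): [(9, 9), (5, 5), (10, 10)]}
print(cell_bounds((1, 1), (0, 0, 10, 10), 2, 2))  # (5.0, 5.0, 10.0, 10.0)
```

A uniform grid is the simplest choice; it works well for evenly spread data, while skewed data tends to favor data-driven partitions such as the R-tree variants on the neighboring slides.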
  20. Non-indexed Heap File
  21. Range Query
      ■ SpatialFileSplitter prunes blocks outside the query range
      ■ SpatialRecordReader passes local indexes to the map function
      ■ The map function selects records in range
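The three range-query steps above can be sketched in plain Python, assuming each block carries its boundary rectangle. This is a simplified model, not GeoJinni code: the list comprehension plays the role of the file splitter's pruning, and a linear scan stands in for the local index lookup that the record reader would hand to the map function. `Rect`, `Block`, and `range_query` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def overlaps(self, other: "Rect") -> bool:
        return (self.x_min <= other.x_max and other.x_min <= self.x_max
                and self.y_min <= other.y_max and other.y_min <= self.y_max)

    def contains(self, p: Point) -> bool:
        # Closed intervals on both axes.
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max

@dataclass
class Block:
    mbr: Rect             # boundary recorded for the block
    records: List[Point]  # records stored in the block

def range_query(blocks: List[Block], query: Rect) -> Tuple[List[Point], int]:
    # Splitter step: only blocks whose boundary overlaps the query survive.
    selected = [b for b in blocks if b.mbr.overlaps(query)]
    # Map step: keep records inside the query (linear scan for simplicity).
    matches = [p for b in selected for p in b.records if query.contains(p)]
    return matches, len(selected)

blocks = [
    Block(Rect(0, 0, 5, 5), [(1, 1), (4, 4)]),
    Block(Rect(5, 0, 10, 5), [(6, 1), (9, 4)]),
    Block(Rect(0, 5, 5, 10), [(2, 7)]),
]
print(range_query(blocks, Rect(3, 3, 7, 4.5)))  # ([(4, 4)], 2)
```

Here the third block is never read because its boundary lies entirely above the query, which is the effect that separates the 193-second and 2-second runs on the earlier slide.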
  22. CG_Hadoop
      ■ Makes use of GeoJinni to speed up computational geometry algorithms: polygon union, skyline, convex hull, farthest/closest pair
      ■ Single-machine implementation: e.g., the skyline of 4 billion points takes three hours
      ■ Straightforward implementation in Hadoop: Hadoop parallel execution
      ■ More efficient implementation in GeoJinni: spatial indexing and early pruning
      ■ Free and open source as part of GeoJinni
      Speedup: single machine 1x, Hadoop 29x, GeoJinni 260x
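Skyline is one of the listed algorithms, and early pruning is one of GeoJinni's listed advantages. The Python sketch below illustrates both under assumptions: it is not CG_Hadoop's code. It computes a max-max skyline (a point survives unless another point is at least as large in both x and y and larger in one), runs it per partition as local map tasks would, then merges the local results. The pruning rule is an assumed example: a partition is skipped when another partition's minimum corner dominates its maximum corner, because then every point in it is dominated. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def skyline(points: List[Point]) -> List[Point]:
    """Max-max skyline: points not dominated by any other point."""
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    result: List[Point] = []
    best_y = float("-inf")
    for x, y in ordered:
        # Every earlier point has x >= this x, so this point survives
        # only if its y is strictly larger than all of theirs.
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result

def mbr(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def box_dominates(q: Box, p: Box) -> bool:
    """True if every point inside box q dominates every point inside box p."""
    return q[0] >= p[2] and q[1] >= p[3] and (q[0] > p[2] or q[1] > p[3])

def distributed_skyline(partitions: List[List[Point]]) -> Tuple[List[Point], int]:
    parts = [p for p in partitions if p]  # ignore empty partitions
    boxes = [mbr(p) for p in parts]
    # Early pruning (assumed rule): drop partitions dominated by another one.
    kept = [part for i, part in enumerate(parts)
            if not any(box_dominates(boxes[j], boxes[i])
                       for j in range(len(parts)) if j != i)]
    # Local skyline per kept partition, then one global merge.
    local = [pt for part in kept for pt in skyline(part)]
    return skyline(local), len(kept)

a = [(1, 1), (2, 2)]
b = [(5, 5), (6, 3)]
c = [(3, 8), (4, 7)]
print(distributed_skyline([a, b, c]))  # ([(6, 3), (5, 5), (4, 7), (3, 8)], 2)
```

Partition `a` is skipped without reading its points, which is the kind of saving that the slide attributes to spatial indexing plus early pruning.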
  23. Convex Hull: find the minimal convex polygon that contains all points (figure: input, output)
  24. Convex Hull in CG_Hadoop (figure: Hadoop vs. CG_Hadoop; partition, pruning, local hull, global hull)
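The local-hull and global-hull steps can be sketched in Python, assuming the points are already split into partitions. Each partition's hull is computed independently (as one map task each would), and the hull of all local hull vertices equals the hull of the whole dataset, because every extreme point of the full set is also extreme within its own partition. The hull itself uses Andrew's monotone chain, a standard algorithm chosen here for illustration; the partition-pruning step on the slide is not reproduced, and `partitioned_hull` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()  # drop points that make a clockwise or straight turn
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def partitioned_hull(partitions: List[List[Point]]) -> List[Point]:
    # Local step: one hull per partition.
    local_vertices = [v for part in partitions for v in convex_hull(part)]
    # Global step: hull of the (much smaller) set of local hull vertices.
    return convex_hull(local_vertices)

parts = [[(0, 0), (1, 1), (0, 2)], [(2, 0), (2, 2), (1, 0.5)]]
print(partitioned_hull(parts))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Only hull vertices travel from the local step to the global step, so the final single-machine step usually handles far fewer points than the input.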
  25. Map rendering
      ■ Map rendering creates an image that represents the data
      ■ Visualization is an international language
      ■ It can reveal patterns that are otherwise hard to spot
      ■ The visual system occupies about one third of the human brain
      Sample input records:
      210 LINESTRING (-2.3634904 51.3845649, -2.3634254 51.3843983, -2.3631927 51.3838436) [highway#primary,ref#A4,name#Gay Street]
      420 LINESTRING (-1.8230973 52.5541131, -1.8230368 52.5540756, -1.8229324 52.5540109, -1.8227961 52.5539014, -1.8227365 52.5538461, -1.8226952 52.5538058, -1.8226204 52.5537103, -1.8223988 52.5534041, -1.8221814 52.5531498, -1.8218478 52.5528188, -1.8215581 52.5525626, -1.8213525 52.5524042) [source#GPS Survey,highway#residential,postal_code#B72,name#Moss Drive,is_in#Sutton Coldfield,maxspeed#30,abutters#residential]
      490 LINESTRING (-0.1896508 51.6456414, -0.1895803 51.6456036, -0.1895245 51.645551, -0.1890055 51.6450801, -0.1887808 51.6448764, -0.1885605 51.6446756, -0.1883084 51.6443753, -0.1875496 51.6433375, -0.1864572 51.6415288, -0.1862165 51.6411939, -0.1859495 51.6406583, -0.1858855 51.6405461) [lit#yes,surface#asphalt,maxspeed#30 mph,highway#residential,abutters#residential,name#Sherrards Way]
      770 LINESTRING (-1.8184653 52.5723683, -1.8182353 52.5723576, -…
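To make "creates an image that represents the data" concrete, the Python sketch below parses the geometry out of a record shaped like the samples above and counts vertices per pixel over a chosen bounding box. It is a deliberately simplified assumption-laden model, not GeoJinni's renderer: it handles only 2D `LINESTRING` text, plots vertices rather than drawing the segments between them, and produces a count grid instead of an image file. `parse_linestring` and `render_counts` are hypothetical names.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def parse_linestring(record: str) -> List[Point]:
    """Extract (x, y) vertices from text containing 'LINESTRING (x y, x y, ...)'."""
    match = re.search(r"LINESTRING\s*\(([^)]*)\)", record)
    if match is None:
        return []
    vertices = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # assumes 2D coordinates only
        vertices.append((float(x), float(y)))
    return vertices

def render_counts(points: List[Point], bounds: Bounds,
                  width: int, height: int) -> List[List[int]]:
    """Count points per pixel; row 0 is the top of the image (largest y)."""
    x_min, y_min, x_max, y_max = bounds
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the rendered area
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y_max - y) / (y_max - y_min) * height), height - 1)
        image[row][col] += 1
    return image

record = ("210 LINESTRING (-2.3634904 51.3845649, -2.3634254 51.3843983, "
          "-2.3631927 51.3838436) [highway#primary,ref#A4,name#Gay Street]")
vertices = parse_linestring(record)
print(len(vertices), vertices[0])  # 3 (-2.3634904, 51.3845649)
print(render_counts(vertices, (-180.0, -90.0, 180.0, 90.0), 4, 2))  # [[0, 3, 0, 0], [0, 0, 0, 0]]
```

In a MapReduce setting, each map task could produce such a grid for its own records and the grids could be summed pixel by pixel, since counting is additive.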
  26. Smoothing (figure panels: input, buffer only, buffer + merge)
  27. Multi-level Image
      ■ Many images at different zoom levels: pan, zoom in/out, fly to
      ■ More details as the zoom level increases
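One common way to organize "many images at different zoom levels" is a tile pyramid, and the Python sketch below assumes that layout for illustration (the slide does not say GeoJinni uses exactly this): level `z` splits the map into 2**z by 2**z tiles, so each zoom step doubles the detail along each axis and every tile has one parent tile covering the same area one level up. `tile_for`, `parent`, and `pyramid_size` are hypothetical names.

```python
from typing import Tuple

Tile = Tuple[int, int, int]                  # (zoom, column, row)
Bounds = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)

def tile_for(x: float, y: float, zoom: int, bounds: Bounds) -> Tile:
    """Tile containing (x, y) when level `zoom` has 2**zoom x 2**zoom tiles."""
    x_min, y_min, x_max, y_max = bounds
    n = 2 ** zoom  # tiles per axis at this level (assumed pyramid layout)
    col = min(int((x - x_min) / (x_max - x_min) * n), n - 1)
    row = min(int((y_max - y) / (y_max - y_min) * n), n - 1)  # row 0 at the top
    return zoom, col, row

def parent(tile: Tile) -> Tile:
    """Coarser tile one level up that covers the same area."""
    zoom, col, row = tile
    return zoom - 1, col // 2, row // 2

def pyramid_size(max_zoom: int) -> int:
    """Total tiles in levels 0..max_zoom (4**z tiles at level z)."""
    return sum(4 ** z for z in range(max_zoom + 1))

world = (-180.0, -90.0, 180.0, 90.0)
print(tile_for(0.0, 0.0, 2, world))          # (2, 2, 2)
print(parent(tile_for(0.0, 0.0, 2, world)))  # (1, 1, 1)
print(pyramid_size(2))                       # 21
```

Because the tile count grows as 4**z, deep levels get large quickly; the parent relation is what lets a viewer zoom out by merging four neighboring tiles into one.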
  28. MNTG: World-wide traffic generator for road networks. http://mntg.cs.umn.edu/ M. F. Mokbel, L. Alarabi, J. Bao, A. Eldawy, A. Magdy, M. Sarwat, E. Waytas, and S. Yackel. MNTG: An Extensible Web-based Traffic Generator. In SSTD, 2013
  29. SHAHED: A tool for querying and visualizing spatio-temporal satellite data. http://shahed.cs.umn.edu/ A. Eldawy et al. SHAHED: A MapReduce-based System for Querying and Visualizing Spatio-temporal Satellite Data. In ICDE, 2015
  30. World Temperature
  31. Smooth World Temperature
  32. World Heat Map on Google Earth
  33. TAREEG: Web-based extractor for OpenStreetMap data using MapReduce. http://tareeg.net/ L. Alarabi, A. Eldawy, R. Alghamdi, and M. F. Mokbel. TAREEG: A MapReduce-Based Web Service for Extracting Spatial Data from OpenStreetMap. In SIGMOD, 2014
  34. Extracted Road Network
  35. GeoJinni: Analyze your spatial data efficiently
      ■ Language: interact with the system and express your queries in a simple high-level language with built-in spatial support
      ■ Data types: have all your spatial datasets ready to load in SpatialHadoop with built-in spatial data types
      ■ Indexes: datasets are organized on large clusters using spatial indexes (Grid or R-tree) that are adapted to MapReduce
      ■ Operations: analyze your data with built-in spatial operations that run efficiently using spatial indexes
      Website: http://spatialhadoop.cs.umn.edu/ (download source code, binary distribution, and instructions)
      Email us at: shadoop@cs.umn.edu
