Analyzing Larger Raster Data in a Jupyter Notebook with GeoPySpark on AWS - FOSS4G 2017 Workshop

Slides from the 2017 FOSS4G Workshop "Analyzing Larger Raster Data in a Jupyter Notebook with GeoPySpark on AWS"

See the repository at https://github.com/lossyrob/foss4g-2017-geopyspark-workshop


  1. Rob Emanuele @lossyrob: ANALYZING LARGE RASTER DATA IN A JUPYTER NOTEBOOK WITH GEOPYSPARK ON AWS
  2. Connect to the WIFI. Network: Harvard University. http://getonline.harvard.edu Click “I am a guest”. Credentials: U: foss4g2017@gmail.com P: 7RFQU3rm. FIRST: Find your Jupyter Notebook URL at https://git.io/v77lh (lowercase L) and visit the URL next to your name. Log in to the Jupyter Hub: U: hadoop P: hadoop
  3. OUTLINE: 8:00 - 8:30 Intro and Background; 8:30 - 9:10 Section 1: Land Cover data; 9:10 - 10:00 Section 2: Landsat 8 data; 10:00 - 10:10 BREAK; 10:10 - 10:30 Deployment and Ingestion; 10:30 - 11:10 Section 3: Combining data layers; 11:10 - 12:00 Section 4: Making Cool Maps
  4. NOW: A MOTIVATING EXAMPLE
  5. BY
  6. rdd.map(lambda x: x + 1) Source: http://silverpond.com.au/2016/10/06/balancing-spark.ht
  7. (1, 1) (2, 1) (0, 1) (0, 0) (1, 0) (2, 0) (1, 2) (2, 2) (0, 2)
  8. (1, 1) (2, 1) (0, 1) (0, 0) (1, 0) (2, 0) (1, 2) (2, 2) (0, 2); Node 1, Node 2, Node 3
  9. (1, 1) (2, 1) (0, 1) (0, 0) (1, 0) (2, 0) (1, 2) (2, 2) (0, 2); Node 1, Node 2, Node 3
  10. (1, 1) (2, 1) (0, 1) (0, 0) (1, 0) (2, 0) (1, 2) (2, 2) (0, 2); Node 1, Node 2, Node 3
  11. (1, 1) (2, 1) (0, 1); Node 1, Node 2, Node 3
  12. (1, 1) (2, 1) (0, 1); Node 1, Node 2, Node 3; rdd.bufferTiles(…)
  13. + + Interactive and Batch Processing of large raster data; Web-Speed Processing of small to medium sized raster data
  14. GeoTrellis Ecosystem: Raster Foundry by; Spark SQL and Spark ML support; Raster Frames by; Spark SQL and Spark ML support; GeoPySpark: Python bindings; Vector Pipes: Vector Tiles on Spark; PDAL integration: Point Clouds on Spark
  15. GeoPySpark
  16. GeoPySpark: Started December 2016. Follows PySpark’s model of communication between the Java Virtual Machine and Python. Accesses GeoTrellis functionality through Python and integrates with your favorite Python raster tools (NumPy + friends). 0.2 is released!
  17. EXERCISE 1: ANALYZING LAND COVER DATA
  18. EXERCISE 2: WORKING WITH LANDSAT IMAGERY AND NDVI THROUGH TIME
  19. (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), …; SpaceTimeKey ≈ (col, row, instant)
  20. (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), … → lambda, lambda, lambda → (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), …
  21. (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), … → (SpatialKey, [(DateTime, Tile), (DateTime, Tile)]), (SpatialKey, [(DateTime, Tile)]), …
  22. (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), (SpatialKey, (DateTime, Tile)), … → (Shuffle) → (SpatialKey, [(DateTime, Tile), (DateTime, Tile)]), (SpatialKey, [(DateTime, Tile)]), …
  23. (SpatialKey, [(DateTime, Tile), (DateTime, Tile)]), (SpatialKey, [(DateTime, Tile)]), … → mosaic, mosaic → (SpatialKey, Tile), (SpatialKey, Tile), …
  24. BREAK!
  25. WHERE AND HOW ARE THESE NOTEBOOKS RUNNING?
  26. WHERE’S THIS DATA COMING FROM?
  27. Supported Backends
  28. EXERCISE 3: COMBINING LAND COVER AND NDVI TO DETECT CROP CYCLES
  29. (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), …
  30. (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), … → map_to_spatial, map_to_spatial, map_to_spatial → (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), …; STK = SpaceTimeKey
  31. ndwi_rdd: (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), …; nlcd_layer.to_numpy_rdd(): (SpatialKey, Tile), (SpatialKey, Tile), … → (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), …
  32. ndwi_rdd: (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), (SpatialKey, (STK, Tile)), …; nlcd_layer.to_numpy_rdd(): (SpatialKey, Tile), (SpatialKey, Tile), … → (Shuffle) → (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), …
  33. (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), (SpatialKey, ((STK, Tile), Tile)), … → mask_ndwi, mask_ndwi, mask_ndwi → (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), (SpaceTimeKey, Tile), …
  34. EXERCISE 4: COMBINING IMAGERY, ELEVATION AND LAND COVER DATA TO MAKE A COOL LOOKING MAP
  35. EXERCISE 4: COMBINING IMAGERY, ELEVATION AND LAND COVER DATA TO MAKE A COOL LOOKING MAP. TWEET YOUR SWEET MAP SCREENSHOTS WITH #GEOPYSPARK #FOSS4G!
  36. FINAL QUESTIONS?
  37. Thank you!

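
Slides 19 to 23 and 30 to 33 share one pattern: re-key each `(SpaceTimeKey, Tile)` record by its `SpatialKey`, let a shuffle bring records with the same key together, then reduce each group (`mosaic`) or join it with a second layer and apply `mask_ndwi`. The sketch below imitates that flow with plain Python lists and dictionaries. It is not the GeoPySpark API. The names `map_to_spatial`, `mosaic` and `mask_ndwi` come from the slides, but their bodies here are assumptions: tiles are flat lists, `None` stands in for NoData, the mosaic rule is "later valid cells win", and `CROP_CLASS = 82` is used only as an illustrative land cover code.

```python
from collections import defaultdict, namedtuple
from datetime import datetime

SpaceTimeKey = namedtuple("SpaceTimeKey", ["col", "row", "instant"])
SpatialKey = namedtuple("SpatialKey", ["col", "row"])

NODATA = None    # placeholder for a missing cell in these toy tiles
CROP_CLASS = 82  # illustrative land cover code used by the mask


def map_to_spatial(record):
    """(SpaceTimeKey, Tile) -> (SpatialKey, (SpaceTimeKey, Tile))."""
    stk, tile = record
    return SpatialKey(stk.col, stk.row), (stk, tile)


def group_by_key(pairs):
    """Stand-in for the shuffle: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def mosaic(keyed_tiles):
    """Assumed rule: walk tiles oldest to newest; later valid cells win."""
    ordered = sorted(keyed_tiles, key=lambda pair: pair[0].instant)
    result = list(ordered[0][1])
    for _, tile in ordered[1:]:
        result = [new if new is not NODATA else old
                  for old, new in zip(result, tile)]
    return result


def join(left, right):
    """Inner join of keyed records with a dict keyed by SpatialKey."""
    return [(key, (value, right[key])) for key, value in left if key in right]


def mask_ndwi(joined):
    """Keep index cells only where the land cover cell equals CROP_CLASS."""
    _, ((stk, index_tile), landcover_tile) = joined
    masked = [v if lc == CROP_CLASS else NODATA
              for v, lc in zip(index_tile, landcover_tile)]
    return stk, masked


t1, t2 = datetime(2017, 6, 1), datetime(2017, 7, 1)
ndwi_records = [
    (SpaceTimeKey(0, 0, t1), [0.1, NODATA, 0.3, 0.4]),
    (SpaceTimeKey(0, 0, t2), [NODATA, 0.6, 0.7, NODATA]),
    (SpaceTimeKey(1, 0, t1), [0.2, 0.2, 0.2, 0.2]),
]
landcover = {
    SpatialKey(0, 0): [82, 82, 11, 11],
    SpatialKey(1, 0): [11, 82, 82, 11],
}

keyed = [map_to_spatial(r) for r in ndwi_records]
mosaics = {key: mosaic(values) for key, values in group_by_key(keyed).items()}
masked = [mask_ndwi(j) for j in join(keyed, landcover)]

print(mosaics)
print(masked)
```

Keeping the full `SpaceTimeKey` inside the value, as slide 30 does, is what lets the masking step return `(SpaceTimeKey, Tile)` records again after the spatial join.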