Fast, Distributed Geoprocessing with Scala, Spark and GeoTrellis

What got you hooked on geospatial? For me it was more than just maps – it was the ability to transform geographic data to see something new or shed light on some aspect of my environment.

Whether with GDAL, ArcGIS, GRASS or IDRISI, we have usually done this type of data transformation using desktop software tools. So why have these capabilities been relatively rare in web and mobile applications? Speed and scalability are two important factors. It has generally taken too long to calculate a viewshed, combine a stack of raster files into a weighted overlay, or generate slope and aspect from elevation data.
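
To make one of these operations concrete, here is a minimal sketch of a weighted overlay in plain Scala. It is not GeoTrellis code: the `Raster` case class and the `WeightedOverlay` object are hypothetical names invented for this illustration, and the sketch assumes all layers share the same dimensions and hold their cells in row-major order.

```scala
// Hypothetical minimal raster type (not a GeoTrellis class):
// row-major cell values plus the grid dimensions.
final case class Raster(cols: Int, rows: Int, cells: Vector[Double]) {
  require(cells.length == cols * rows, "cell count must equal cols * rows")
}

object WeightedOverlay {
  /** Combine equally sized rasters cell by cell as a weighted sum. */
  def apply(layers: Seq[(Raster, Double)]): Raster = {
    require(layers.nonEmpty, "need at least one layer")
    val (first, _) = layers.head
    require(
      layers.forall { case (r, _) => r.cols == first.cols && r.rows == first.rows },
      "all layers must share the same dimensions"
    )
    // For each cell index, multiply every layer's value by its weight and sum.
    val summed = Vector.tabulate(first.cols * first.rows) { i =>
      layers.map { case (r, w) => r.cells(i) * w }.sum
    }
    Raster(first.cols, first.rows, summed)
  }
}
```

Each output cell depends only on the matching input cells, so the work grows with cell count times layer count. That is why large rasters make this slow on a single machine, and also why it divides cleanly across tiles.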

Azavea has been working on this problem (fast, scalable geoprocessing) for several years. In 2012 we released GeoTrellis (http://geotrellis.io/), an open source framework for fast, distributed geoprocessing. GeoTrellis leverages the strong type system and functional programming style of the Scala language, along with the Spark and Akka frameworks.

This talk will give an overview of GeoTrellis and how it can be integrated with web mapping tools to create online geoprocessing applications for stormwater modeling, educational games, infrastructure prioritization, climate change, and transportation.

Published in: Technology

  1. 1. 21st Century Geoprocessing with Scala and GeoTrellis Robert Cheetham cheetham@azavea.com @azavea @rcheetham
  2. 2. B Corporation • Civic/Social impact • Donate share of profits Research-Driven • 10% Research Program • Academic Collaborations • Open Source • Open Data
  3. 3. Use geodata to do stuff that matters
  4. 4. Land Water People
  5. 5. Ian McHarg
  6. 6. Dana Tomlin
  7. 7. Idrisi
  8. 8. GRASS
  9. 9. advanced spatial analysis on the web
  10. 10. advanced spatial analysis on the web
  11. 11. 3 Challenges
  12. 12. 1. Performance & Scalability
  13. 13. 2. Large Data Sets – Digital City (Big Data – Cities)
  14. 14. 2. Large Data Sets – Social Media
  15. 15. 2. Large Data Sets - Science
  16. 16. 3. User Interface
  17. 17. 3. User Interface
  18. 18. 3. User Interface
  19. 19. 3. User Interface
  20. 20. 3. User Interface
  21. 21. We can do better
  22. 22. • IO • Geoprocessing Operations • Distributed Processing • Web Services
  23. 23. Real-time Processing
  24. 24. 6183 x 4992 (118 MB); 4598 x 4867 (86 MB)
  25. 25. Cluster-style Processing
  26. 26. 1770271 x 910139 (5.8 TB)
  27. 27. How does it work
  28. 28. On the shoulders of giants
  29. 29. LocationTech Community
  30. 30. Some changes coming
  31. 31. GeoTrellis v0.9: • Parallel operations across tiles • Parallel execution of operations • Basic cluster capabilities
  32. 32. What's missing? • Sharding raster data across the cluster • Caching operation results across cluster • HDFS support • Advanced fault tolerance • Advanced scheduling • ...
  33. 33. • Caches results in memory • Ideal for iterative algorithms • Significantly outperforms Hadoop • Uses Hadoop's file system (HDFS)
  34. 34. +
  35. 35. What becomes possible?
  36. 36. Urban Forests
  37. 37. Urban Forests
  38. 38. Simulation Modeling
  39. 39. Sea Level Rise
  40. 40. Business Siting
  41. 41. Streaming Data
  42. 42. Counting Carbon
  43. 43. Digital Humanities
  44. 44. GeoTrellis Transit
  45. 45. Travelsheds
  46. 46. Crime Analysis and Forecasting
  47. 47. It’s the second Monday in October and school is in session. There were 2 burglaries and 3 assaults yesterday. The Maple Leafs are not playing this evening. Six bars, three take-out stores, and a high school are in the neighborhood. The forecast is 9°C with a 50% chance of rain this evening. Where do you focus your 3 vehicles?
  48. 48. It’s the second Monday in October and school is in session. There were 2 burglaries and 3 assaults yesterday. The Maple Leafs are not playing this evening. Six bars, three take-out stores, and a high school are in the neighborhood. The forecast is 9°C with a 50% chance of rain. Where do you focus your 3 vehicles?
  49. 49. Data Science + Geography
  50. 50. Data Science + Geography
  51. 51. Faster is different…
  52. 52. Educational Games
  53. 53. New Devices and Displays
  54. 54. I am very excited
  55. 55. advanced spatial analysis on the web
  56. 56. advanced spatial analysis on the web
  57. 57. Land Water People
  58. 58. Simulation Modeling Forecasting
  59. 59. What’s next? • Multi-band • Temporal bands (climate) • More operations • Tile indexes • GeoMesa collab. • Simpler setup • More integration points
  60. 60. Get Involved: GeoTrellis.io
  61. 61. Get Involved: geotrellis-user@googlegroups.com
  62. 62. Get Involved: IRC: #geotrellis on freenode
  63. 63. Use geodata to do stuff that matters
  64. 64. [is hiring] jobs.azavea.com cheetham@azavea.com @rcheetham
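
Slide 31 lists "parallel operations across tiles" as a core capability. The slides do not show how GeoTrellis implements this, so the following is only a rough single-machine sketch of the general idea, using Scala's standard `Future` API. The `TileParallel` object, the `Tile` alias, and both method names are hypothetical, and the 30-second timeout is an arbitrary choice for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TileParallel {
  // Hypothetical tile: a small block of raster cells (not a GeoTrellis type).
  type Tile = Vector[Double]

  /** Split a flat cell vector into tiles of at most `tileSize` cells. */
  def split(cells: Vector[Double], tileSize: Int): Vector[Tile] =
    cells.grouped(tileSize).toVector

  /** Apply a per-cell function to each tile concurrently, then reassemble in order. */
  def mapTiles(tiles: Vector[Tile])(f: Double => Double): Vector[Double] = {
    // One Future per tile; the global execution context schedules them on a thread pool.
    val work: Vector[Future[Tile]] = tiles.map(t => Future(t.map(f)))
    // Future.sequence keeps the input order, so flattening restores the original layout.
    Await.result(Future.sequence(work), 30.seconds).flatten
  }
}
```

This only works directly for local operations, where each output cell depends on the input cell alone. Focal operations such as slope also need neighboring cells from adjacent tiles, which this sketch does not handle.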