GPUs, Cloud and Grids: Distributed Geoprocessing for Speed, Scalability and Better Living


An overview of Azavea's recent work to increase geoprocessing performance through distributed computing, cloud computing, GPUs and other techniques.

Published in: Technology

  • So this story starts with my wife and I looking for a house and being frustrated by the type of information we had access to. We didn't know where to start. Each of the real estate agents we met knew a particular part of the city really well, so they tended to steer us toward those houses.
  • We didn't own a car, so we wanted to find a place that was close to Center City (somewhat important), within walking distance of a grocery store (vital), near some restaurants (very important), within walking distance of a library (nice to have), near a park (yes!), close to fencing, and within biking or walking distance of our work.
  • The factors you consider important are probably not the same as ours. Maybe yours are related to child care or rankings for local schools or being near a farmer's market. Or maybe you want to be close to PhillyCarShare or public transit, but don't want to be too close to downtown.
  • We selected a variety of factors that contribute to sustainability, ranging from location in a state or federal tax incentive zone to environmental amenities like tree canopy to transit considerations like access to bus and regional rail lines. Retail businesses targeting markets may be interested in demographic factors like age and per capita income and proxies for environmental engagement like recycling participation.
  • Many of the ideas here are not new. Actually, they were developed here in Philadelphia by Ian McHarg, who was chair of the Landscape Architecture and Regional Planning department at the University of Pennsylvania.
  • He wrote a book in 1969 called Design with Nature, and focused on sustainable and ecological design. Among other concepts, he described how a series of inputs drawn on transparent acetate sheets could be combined as a set of map overlays to identify the best site for a particular facility, road or whatever.
  • Now fast forward to the 1990s. We’re interested in digital maps. This approach to compositing several digital maps was developed further by Dana Tomlin, who is now also a professor at the University of Pennsylvania. Tomlin developed the computational vocabulary to perform this type of work – he called it Map Algebra. He’s also a really great teacher, so if you ever get a chance to take a class with him, do it. He’s great.
  • So, these days people do this kind of work using desktop GIS software. You are looking at the ArcMap application from ESRI.
  • Export to KML
  • scenarios
  • Heat map
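
The overlay idea in these notes (McHarg's acetate sheets, Tomlin's Map Algebra) amounts to a local operation: multiply each factor grid by its importance weight and add the results cell by cell. Below is a minimal sketch, assuming tiny hypothetical 2x2 grids and the weights 5, 2, 3 and 1 that appear on the heat map slide. The factor names, scores and the `weighted_overlay` function are invented for illustration and are not taken from the talk.

```python
def weighted_overlay(layers, weights):
    """Local Map Algebra op: per-cell weighted sum of aligned grids."""
    if len(layers) != len(weights):
        raise ValueError("one weight per layer is required")
    rows, cols = len(layers[0]), len(layers[0][0])
    for grid in layers:
        # Overlays only make sense when every grid has the same shape.
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError("all layers must share the same dimensions")
    return [
        [sum(w * grid[r][c] for grid, w in zip(layers, weights))
         for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical factor grids, each cell scored 0 (poor) to 10 (good).
grocery = [[10, 4], [2, 0]]
restaurants = [[8, 6], [3, 1]]
park = [[5, 9], [7, 2]]
library = [[1, 1], [6, 0]]

# Weights assumed to mirror the x5, x2, x3, x1 multipliers on the slide.
heat_map = weighted_overlay([grocery, restaurants, park, library], [5, 2, 3, 1])
print(heat_map)
```

Higher totals mark cells that score well on the factors that matter most. Changing a weight and re-running is the "scenario" workflow the notes refer to.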

    1. 1. GPUs, Clouds and Grids: Distributed Geoprocessing for Speed, Scalability and Better Living Robert Cheetham 17 February 2011 NC GIS 2011
    2. 2. About Azavea: Founded in 2000; 27 people (software engineers, spatial analysts, project managers); Web & Mobile apps; Spatial Analysis; R&D; High Performance Computing; User Experience
    3. 3. B Corporation: 10% Research Program; Pro Bono Program; Time-to-Give-Back Program; Employee-focused Culture; Projects with Social Value
    4. 4. High Performance Geoprocessing
    5. 5. Classic GIS Use Case ...
    6. 6. Robert’s Rules of Housing: Close to Center City (somewhat important); Walk to Grocery Store (vital); Nearby Restaurants (very important); Library (nice to have); Near a Park (somewhat important); Biking / walking distance from our work (very important); Biking distance to fencing club (somewhat important)
    7. 7. Your Factors might include… Child Care; Local School Rankings; Farmer's Market; PhillyCarShare; Public Transit
    8. 8. Sustainability Factors: Tax Incentives; Commercial Corridor Health; Public Transit; Car Share; Open Space; Farmers’ Markets; Street Network Density; Recycling Participation; Walkability
    9. 9. Not a new idea… Ian McHarg
    10. 10. Not a new idea … Design with Nature
    11. 11. Not a new idea … Map Algebra
    12. 12. Desktop GIS
    13. 13. Generate Output Heat Map: (factor × 5) + (factor × 2) + (factor × 3) + (factor × 1) = heat map
    14. 14. The Web is different from the Desktop: Lots of simultaneous users; Stateless environment; HTML+JS+CSS; Users are less skilled; Users are less patient
    15. 15. ArcGIS Server: Flex, Silverlight and JS APIs; Publish tasks and models; Caching; Optimized MSD files
    16. 16. But wait … there’s a problem: 10 – 60 second calculation time; Multiple simultaneous users … that are impatient
    17. 17. User Interface version 1
    18. 24. Reports
    19. 25. Reports
    20. 27. Sustainable Business Network
    21. 34. Walkability:
    22. 35. NYC Big Apps Submission
    23. 36. Specific Optimization Goals: New Raster File format; Distributed processing; Binary messaging protocol
    24. 37. Optimization: File Format: Simple - strip out metadata; Limit data type and range; 1D arrays are fast to read/write; Assume (same extent, same cell size, same pixel data type, same cell alignment, same projection); Azavea Raster Grid (ARG)
    25. 38. Optimization: Distributed Processing: Parallelizable - Local Ops and Focal Ops; Support multiple (threads, cores, CPUs, machines); Considered (Hadoop, Amazon Map Reduce, Beowulf)
    26. 39. Distributed Processing
    27. 40. Binary Messaging Protocol: Started with XML; Binary Protocol Buffer is better (simpler, 3 to 10 times smaller, 20 to 100 times faster, less ambiguous, a bit easier to use programmatically); Considered (AMF, Google Protocol Buffer)
    28. 41. Success!! Reduced from 10-60 seconds to <500 milliseconds
    29. 42. Additional [Experimental] Measures: Tiling; Pyramids; EC2 for planned peaks – NYC Big Apps; HTTP file caching - Varnish
    30. 43. Optimizing one process sub-optimizes others: Complex to configure and maintain; One type of operation; No interpolation; No mixing cell sizes; No mixing extents; No mixing projections; No Map Algebra; No ModelBuilder; etc.
    31. 44. High Performance Geoprocessing 2.0: More generic; Cache data – memory is cheaper; New programming technology
    32. 45. OMB Watch: Federal Spending Equity
    33. 46. High Performance Geoprocessing 2.0: Reduced calculation time to ~40ms
    34. 47. But wait, there’s more… GPU geoprocessing research
    35. 48. GPU geoprocessing research
    36. 49. GPU geoprocessing research
    37. 50. Challenges: New languages (CUDA, OpenCL, DirectCompute); Re-write every algorithm; Hardware Diversity
    38. 51. Results: We re-wrote a few Map Algebra operations (Local, Neighborhood, Zonal, Viewshed, etc.); 15 – 120x speed improvement; Large grids; Large neighborhoods
    39. 52. Not for the faint of heart
    40. 53. Sea Level Rise
    41. 54. Crime Analysis, Early Warning and Forecasting
    42. 55. Hunch Helper
    43. 56. Risk Forecasting
    44. 57. Animation
    45. 61. Food, Culture and Sustainability
    46. 62. Quick Demo
    47. 63. The Future: Clouds of Processors - Google App Engine; Faster is different
    48. 64. Summary: Challenges: Geographic data growth rates are exploding; Size of data sets is growing (Lidar, Raster, Social Media); New form factors that are less powerful; Distributed data sets; Larger numbers of less technical users. New Options: Clouds of processors; Clouds of virtual machines; GPUs.
    49. 65. Many Thanks! © Photo used with permission from Alphafish , via
    50. 66. GPUs, Clouds and Grids: Distributed Geoprocessing for Speed, Scalability and Better Living Robert Cheetham 17 February 2011 NC GIS 2011
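
Slides 37 and 38 rest on two ideas. First, store each raster as a flat 1D array, under the assumption that every layer shares the same extent and cell size. Second, split a local operation across workers, since each output cell depends only on the matching cell in each input. Below is a minimal sketch, assuming row-major flattening and Python threads standing in for the cores and machines the slide mentions. It is not the ARG format or Azavea's implementation, and the function names are hypothetical.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def weighted_sum_chunk(layers, weights, start, stop):
    """Local op on cells [start, stop): weighted sum across aligned layers."""
    out = array("i", [0]) * (stop - start)
    for layer, w in zip(layers, weights):
        for i in range(start, stop):
            out[i - start] += w * layer[i]
    return start, out


def parallel_weighted_sum(layers, weights, workers=4):
    """Split the flat cell range into chunks and process them concurrently."""
    n = len(layers[0])
    if any(len(layer) != n for layer in layers):
        raise ValueError("layers must have identical cell counts")
    step = max(1, -(-n // workers))  # ceiling division, at least one cell
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    result = array("i", [0]) * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(weighted_sum_chunk, layers, weights, s, e)
                   for s, e in bounds]
        for future in futures:
            start, chunk = future.result()
            # Chunks never overlap, so each one is copied into its own slice.
            result[start:start + len(chunk)] = chunk
    return result


# Two hypothetical 2x3 rasters, flattened row by row into 1D arrays.
layer_a = array("i", [1, 2, 3, 4, 5, 6])
layer_b = array("i", [6, 5, 4, 3, 2, 1])
total = parallel_weighted_sum([layer_a, layer_b], [2, 1])
print(list(total))
```

Because no chunk reads or writes another chunk's cells, the same split would carry over to processes or separate machines. Threads are used here only to keep the sketch short and will not speed up pure-Python arithmetic.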