Geoprocessing in Web Time (Robert Cheetham)


Published in: Technology
  • So this story starts with my wife and me looking for a house and being frustrated by the kind of information we had access to. We didn't know where to start. Each of the real estate agents we met knew one part of the city really well, so they tended to steer us toward houses there.
  • We didn't own a car, so we wanted to find a place that was close to Center City (somewhat important), within walking distance of a grocery store (vital), near some restaurants (very important), within walking distance of a library (nice to have), near a park (yes!), close to fencing, and within biking or walking distance of our work.
  • The factors you consider important are probably not the same as ours. Maybe yours are related to child care or rankings for local schools or being near a farmer's market. Or maybe you want to be close to PhillyCarShare or public transit, but don't want to be too close to downtown.
  • We selected a variety of factors that contribute to sustainability, ranging from location in a state or federal tax incentive zone, to environmental amenities like tree canopy, to transit considerations like access to bus and regional rail lines. Retail businesses targeting particular markets may be interested in demographic factors like age and per capita income, and in proxies for environmental engagement like recycling participation.
  • Ian McHarg wrote a book in 1969 called Design with Nature, which focused on sustainable and ecological design. Among other concepts, he described how a series of inputs drawn on transparent acetate sheets could be combined as a set of map overlays to identify the best site for a particular facility, road or whatever.
  • Now fast forward to the 1990s. We're interested in digital maps. This approach to compositing several digital maps was developed further by Dana Tomlin, who is now also a professor at the University of Pennsylvania. Tomlin developed the computational vocabulary to perform this type of work, which he called Map Algebra. He's also a really great teacher, so if you ever get a chance to take a class with him, do it.
  • So, these days people do this kind of work using desktop GIS software. You are looking at the ArcMap application from ESRI.
  • Export to KML
  • Geoprocessing in Web Time (Robert Cheetham)

    1. 1. Geoprocessing in Web Time: Distributed Computing for High Performance Geoprocessing Robert Cheetham [email_address] @rcheetham
    2. 2. Site Selection Tools
    3. 3. Buying a Home
    4. 4. Robert's Rules of Housing
        • Close to Center City: somewhat important
        • Walk to Grocery Store: vital
        • Nearby Restaurants: very important
        • Library: nice to have
        • Near a Park: somewhat important
        • Biking / walking distance from our work: very important
        • Biking distance to fencing: somewhat important
    5. 5. Your factors might include…
        • Child Care
        • Local School Rankings
        • Farmer's Market
        • PhillyCarShare
        • Public Transit
    6. 6. Siting Decision Factors
        • Tax Incentives
        • Commercial Corridor Health
        • Public Transit
        • Car Share
        • Open Space
        • Farmers' Markets
        • Street Network Density
        • Recycling Participation
        • Walkability
    7. 7. Not a new idea … Design with Nature
    8. 8. Not a new idea … Dana Tomlin
    9. 9. Desktop GIS
    10. 10. Generate Output Heat Map: (layer 1 × 5) + (layer 2 × 2) + (layer 3 × 3) + (layer 4 × 1) = output heat map
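
The weighted overlay on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the slide shows only the weights 5, 2, 3 and 1, so the layer names, the 2x2 grid and the cell scores below are all assumptions made up for the example.

```python
def weighted_overlay(layers, weights):
    """Return the cell-by-cell weighted sum of equally sized, flattened layers."""
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    size = len(layers[0])
    if any(len(layer) != size for layer in layers):
        raise ValueError("all layers must have the same number of cells")
    return [
        sum(weight * layer[i] for layer, weight in zip(layers, weights))
        for i in range(size)
    ]

# Hypothetical 2x2 factor layers, flattened row by row (0 = poor, 10 = good).
grocery = [10, 5, 0, 2]
parks = [0, 10, 5, 5]
transit = [4, 4, 8, 0]
library = [1, 0, 10, 3]

# The weights come from the slide; which layer gets which weight is assumed.
heat = weighted_overlay([grocery, parks, transit, library], [5, 2, 3, 1])
print(heat)
```

Higher values in `heat` mark cells that score well on the heavily weighted factors; a web client would render that list as the heat map.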
    11. 11. Web is different from the Desktop
        • Lots of simultaneous users
        • Stateless environment
        • HTML+JS+CSS
        • Users are less skilled
        • Users are less patient
    12. 12. ArcGIS Server
        • Flex, Silverlight and JS APIs
        • Publish tasks and models
        • Caching
        • Optimized MSD files
    13. 13. But wait … there's a problem
        • 10 to 60 second calculation time
        • Multiple simultaneous users …
        • … that are impatient
    14. 18. Specific Optimization Goals
        • New raster file format
        • Distributed processing
        • Binary messaging protocol
    15. 19. Optimization: File Format
        • Simple: strip out metadata
        • Limit data type and range
        • 1D arrays are fast to read/write
        • Assume
            • Same extent
            • Same cell size
            • Same pixel data type
            • Same cell alignment
            • Same projection
        • Azavea Raster Grid (ARG)
    16. 20. Optimization: Distributed Processing
        • Parallelizable: Local Ops and Focal Ops
        • Support multiple
            • Threads
            • Cores
            • CPUs
            • Machines
        • Considered
            • Hadoop
            • Amazon Elastic MapReduce
            • Beowulf
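
To make the file format idea concrete, here is a rough sketch of a metadata-free raster stored as a flat 1D array. It is not the actual ARG layout, which the slides do not describe. The grid shape, the one-byte cell type and the function names are assumptions; the point is only that when every layer shares extent, cell size and data type, the file can be nothing but cell values.

```python
import os
import tempfile
from array import array

# Assumed grid shape, agreed on outside the file because all layers share it.
WIDTH, HEIGHT = 4, 3

def write_raster(path, cells):
    """Write cells as raw unsigned bytes: no header, no metadata."""
    if len(cells) != WIDTH * HEIGHT:
        raise ValueError("cell count does not match the shared grid shape")
    with open(path, "wb") as f:
        array("B", cells).tofile(f)  # "B" limits each value to 0..255

def read_raster(path):
    """Read the whole grid back into one flat array in a single call."""
    cells = array("B")
    with open(path, "rb") as f:
        cells.fromfile(f, WIDTH * HEIGHT)
    return cells

def cell_at(cells, col, row):
    """Look up a cell in the flat array using row-major indexing."""
    return cells[row * WIDTH + col]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "layer.bin")  # hypothetical file name
    write_raster(path, list(range(WIDTH * HEIGHT)))
    file_size = os.path.getsize(path)  # one byte per cell, nothing else
    loaded = read_raster(path)

print(file_size, cell_at(loaded, 1, 2))
```

The trade-off is the one the later slides name: a file like this cannot describe a different extent, cell size or projection, so those cases are simply unsupported.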
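
Local operations parallelize well because each output cell depends only on the matching input cell, so a grid can be cut into chunks and processed independently. The sketch below shows that split-and-merge pattern with a thread pool from the Python standard library. It is an illustration under assumptions, not the system described in the talk: the operation, chunking scheme and worker count are made up, and in CPython threads show the structure rather than a real CPU speed-up (the same pattern would apply to processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor

def local_op(chunk):
    """A local operation: each output cell depends only on its own input cell."""
    return [value * 2 + 1 for value in chunk]

def split(cells, parts):
    """Cut a flat cell list into at most `parts` contiguous chunks."""
    if not cells:
        return []
    step = -(-len(cells) // parts)  # ceiling division
    return [cells[i:i + step] for i in range(0, len(cells), step)]

def parallel_local_op(cells, workers=4):
    """Run local_op on each chunk in a worker and stitch the results together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so the grid is reassembled intact.
        results = list(pool.map(local_op, split(cells, workers)))
    return [value for chunk in results for value in chunk]

result = parallel_local_op(list(range(10)))
print(result)
```

Focal operations need a little more care, since a chunk also needs the rows just outside its border, but the same idea applies.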
    17. 21. Distributed Processing
    18. 22. Binary Messaging Protocol
        • Started with XML
        • Binary Protocol Buffer is better
            • simpler
            • 3 to 10 times smaller
            • 20 to 100 times faster
            • less ambiguous
            • a bit easier to use programmatically
        • Considered
            • AMF
            • Google Protocol Buffers
    19. 23. Success!!
        • Reduced from 10 to 60 seconds to <500 milliseconds
    20. 24. Additional [Experimental] Measures
        • Tiling
        • Pyramids
        • EC2 for planned peaks – NYC Big Apps
        • HTTP file caching - Varnish
    21. 25. Optimizing one process sub-optimizes others
        • Complex to configure and maintain
        • One type of operation
        • No interpolation
        • No mixing cell sizes
        • No mixing extents
        • No mixing projections
        • No Map Algebra
        • No ModelBuilder
        • etc.
    22. 26. High Performance Geoprocessing 2.0
        • More generic
        • Cache data – memory is cheaper
        • New programming technology
    23. 27. High Performance Geoprocessing 2.0
        • Reduced calculation time to ~40ms
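
A small comparison shows why a binary message tends to be smaller than XML. The sketch below does not use Protocol Buffers or its wire format; it hand-packs a message with Python's `struct` module purely to illustrate the size difference. The request shape (a weight per numeric layer id) and the field widths are assumptions invented for the example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical request: one weight per factor layer, keyed by numeric layer id.
weights = {1: 5, 2: 2, 3: 3, 4: 1}

def to_xml(weights):
    """Encode the request as a small XML document (bytes)."""
    root = ET.Element("request")
    for layer_id, weight in weights.items():
        ET.SubElement(root, "layer", id=str(layer_id), weight=str(weight))
    return ET.tostring(root)

def to_binary(weights):
    """Encode as a 1-byte count followed by (uint16 id, int8 weight) pairs."""
    payload = struct.pack("<B", len(weights))
    for layer_id, weight in weights.items():
        payload += struct.pack("<Hb", layer_id, weight)
    return payload

def from_binary(payload):
    """Decode the binary form back into a dict of layer id to weight."""
    (count,) = struct.unpack_from("<B", payload, 0)
    return dict(struct.unpack_from("<Hb", payload, 1 + 3 * i) for i in range(count))

xml_msg = to_xml(weights)
binary_msg = to_binary(weights)
print(len(xml_msg), len(binary_msg))
```

The binary form also has no tag names or text-to-number parsing, which is where much of the speed difference usually comes from. A schema-based format such as Protocol Buffers adds field tags and versioning on top of this basic idea.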
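
As a rough picture of what pyramids mean for a raster, the sketch below builds progressively coarser copies of a grid by averaging each 2x2 block. The slides do not say how the pyramids were built, so the averaging rule, the function names and the sample grid are assumptions.

```python
def downsample(grid):
    """Halve a grid's resolution by averaging each 2x2 block of cells.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

def build_pyramid(grid):
    """Return [full resolution, half, quarter, ...] down to a single row or column."""
    levels = [grid]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels

# Hypothetical 4x4 grid where each cell holds 2 * (row + column).
grid = [[2 * (r + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(grid)
print(pyramid[1], pyramid[2])
```

A zoomed-out map request can then be answered from a coarse level instead of touching every full-resolution cell.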
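
The "cache data" point can be illustrated with a simple in-memory cache. This is only a sketch of the general idea using `functools.lru_cache`; the loader function, the layer name and the placeholder values are hypothetical, and the talk does not say how its caching was implemented.

```python
from functools import lru_cache

load_count = 0  # counts simulated disk reads, for demonstration only

@lru_cache(maxsize=32)
def load_layer(name):
    """Pretend to read a raster layer from disk; the result stays in memory."""
    global load_count
    load_count += 1
    return tuple(len(name) for _ in range(4))  # placeholder cell values

first = load_layer("parks")
second = load_layer("parks")  # served from the cache, no second read
print(load_count)
```

Because each request usually touches the same small set of layers, keeping them in memory removes the disk read from almost every calculation.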
    24. 28. GPU Processing Research
    25. 29. GPUs
    26. 30. GPU geoprocessing research
    27. 31. Results
        • We re-wrote a few Map Algebra operations:
            • Local
            • Neighborhood
            • Zonal
            • Viewshed
            • etc.
        • 15 to 120x speed improvement
        • Large grids
        • Large neighborhoods
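
A neighborhood (focal) operation shows why large grids and large neighborhoods benefit from GPUs: every output cell is computed independently from a window of input cells, and the work per cell grows with the window size. The sketch below is plain CPU-side Python, not GPU code and not the code from the research described here; the function name and the edge-clipping rule are assumptions.

```python
def focal_sum(grid, radius=1):
    """Sum each cell's square neighborhood, clipping the window at grid edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            # Each (r, c) iteration is independent of every other one, which is
            # what lets a GPU assign one thread per output cell.
            for nr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for nc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += grid[nr][nc]
            out[r][c] = total
    return out

ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = focal_sum(ones)
print(result)
```

With a radius of 1 each cell reads up to 9 inputs; with a radius of 10 it reads up to 441, so the cost of the serial loop climbs quickly while the per-cell independence stays the same.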
    28. 32.
    29. 33.
    30. 38. Food, Culture and Sustainability
    31. 39. OMB Watch: Federal Spending Equity
    32. 40. Sea Level Rise
    33. 41. GPU Processing Research
    34. 42. GPUs
    35. 43. GPU geoprocessing research
    36. 44. Stormwater Modeling
    37. 45. Stormwater Modeling Game
    38. 46. Stormwater Modeling Game
    39. 47. Stormwater Modeling Game
    40. 48. Summary
        • New technologies are changing what can be done
        • Faster geoprocessing is not just faster, it's different
        • Opportunity and responsibility to re-think the GIS user experience
            • Tablets
            • GPUs
            • Cloud Computing
            • Crowd-sourcing
            • Increased sampling and tracking
            • More corporate and gov transparency
            • Many, many more sensors
    41. 49. Many Thanks! © Photo used with permission from Alphafish , via
    42. 50. Geoprocessing in Web Time: Distributed Computing for High Performance Geoprocessing Robert Cheetham [email_address] @rcheetham