
QGIS plugin for parallel processing in terrain analysis

Art Lembo's presentation on embarrassingly parallel processing with QGIS and pyCUDA for terrain analysis. Given at 6th Scottish QGIS UK user group meeting.


  1. © Arthur J. Lembo, Jr., Salisbury University. QGIS Plug-in for Parallel Processing in Terrain Analysis. Arthur Lembo, Department of Geography and Geoscience, @artlembo
  2. If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens? - Seymour Cray
  3. Overview
     • As part of an NSF REU, we built a QGIS plugin to perform terrain-based parallel processing with Python.
     • This presentation shows the results of our undergraduate student research project:
       – Multicore processors
       – Massively parallel GPGPUs
       – Hardware evolution
       – Our QGIS plug-in
       – The road ahead
  4. NSF REU
     • The NSF REU is a 3-year grant (extended to 5 years) focused on parallel processing.
     • The goal is to expose undergraduates to academic research in computer science.
     • My role has been to mentor students in the use of parallel processing in geography.
  5. Microcomputer revolution
     • 1971 Intel 4004
     • Ted Hoff
     • $60,000
     • 2,300 transistors
     • 582,000,000 Quad
     • Lithography
     • Killed time-sharing
  6. Moore’s Law
     • 2x / 18 months
     • Design shrink
     • Smaller = faster
  7. Trouble in paradise?
  8. Limits of Moore’s Law
     • Heat
       – Limits on power density
       – Package dissipation limit
       – Water-cooled overclocking
     • Subunit complexity
       – Single clock cycle synchronicity
       – AMD translation lookaside buffer bug
     • RC interconnect delay
  9. Parallel microprocessors
     • Parallel dies
     • Parallel packages
     • Core 2 Duo
     • Core 2 Quad
     → Repeating history
  10. Limited uptake
     • 64-bit just getting traction
     • Windows parallelism
     • Multithreading difficult to code
     • Parallel code even harder
     • Scientific computing
     • Who cares what I have to say?
     • Gaming leads the way
     Opening week:
     • Grand Theft Auto: $500M
     • Halo 3: $300M
     • Spiderman 3: $182M
     • Pirates 3: $196M
  13. Options for Large Geographic Computations
     • Use a smaller dataset
     • Generalize the resolution of your dataset
       – Both options compromise the integrity of the data
     • Invest in clusters (groups of ordinary PCs joined together with combined power and parallel processing) or time sharing
       – Require special programming
       – Very costly
  14. Background
     • Parallel processing: a program allows multiple computations to occur concurrently.
     • GPU (graphics processing unit): generally used for video and gaming visuals.
       – Designed for multithreading, contains hundreds of cores, good at simple math
     • CPU (central processing unit): what computation is traditionally done on.
       – Contains a much smaller number of cores, good at complex calculations
  15. Test Environment
     • GPU: Nvidia GTX 670
       – 1344 CUDA cores
       – 2 GB GDDR5 RAM
     • CPU: Intel Xeon E5607 processor
       – 4 cores, 4 threads
       – 2.27 GHz
  16. Why PyCUDA
     • Expose CUDA functions in QGIS
     • Easier to program?
     • Easier to add functionality?
  17. Terrain functions
     • Started with 3 common GIS functions
     • Slope: ~15 calculations; Aspect: ~20 calculations; Hillshade: ~45 calculations
     • All are embarrassingly parallel
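To make "embarrassingly parallel" concrete, here is a minimal pure-Python sketch of a slope calculation over a 3x3 window. It assumes Horn's finite-difference formula, which is common in GIS packages; the slides do not say which formula the plug-in uses, and the function names `slope_degrees` and `slope_raster` are hypothetical. Each output cell depends only on its own 3x3 neighbourhood of input cells, so every cell could be handed to a separate GPU thread.

```python
import math


def slope_degrees(window, cell_size=1.0):
    """Slope of the centre cell of a 3x3 elevation window, in degrees.

    Assumption: Horn's third-order finite difference. The slides do not
    state which slope formula the plug-in implements.
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Weighted elevation change west-to-east and north-to-south.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def slope_raster(dem, cell_size=1.0):
    """Apply slope_degrees to every interior cell; edge cells stay None."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # No iteration reads another iteration's output, which is what
            # makes the loop body safe to run in parallel.
            window = [row[c - 1:c + 2] for row in dem[r - 1:r + 2]]
            out[r][c] = slope_degrees(window, cell_size)
    return out
```

Aspect and hillshade follow the same pattern with a different per-window formula, which is why a single 3x3 framework can serve all three functions.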
  18. Terrain visuals (image panels: Altitude, Hillshade, Slope)
  19. Methods, cont.
  20. Scheduler
     • Overall manager
     • Starts and manages the processes which:
       – load data,
       – perform raster calculations on the GPU,
       – save data back to disk.
  21. GPU Calculator
     • Where the actual calculations are performed
     • Reads data from the input pipe, performs GPU calculations over all the cores, and sends the result through the output pipe to the data saver
     • Designed so any algorithm based on a 3x3 grid of pixels can be used
  22. Saver
     • Takes the results of the calculations done in the GPU manager and saves them to disk
     • Uses the GDAL libraries to write multiple lines at a time to a GeoTIFF
     • Multiple savers can all run in parallel to save the outputs of different functions
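The three-stage design on slides 20 to 22 (load, calculate, save) can be sketched with the standard library. This is an illustration only, not the plug-in's code: it substitutes threads and `queue.Queue` for the separate processes and pipes the slides describe, a plain Python function stands in for the GPU calculation, and all names (`loader`, `calculator`, `saver`, `run_pipeline`) are hypothetical.

```python
import queue
import threading

_DONE = object()  # unique marker that signals the end of the stream


def loader(rows, out_q):
    """Stage 1: feed rows to the calculator (stands in for reading from disk)."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def calculator(in_q, out_q, func):
    """Stage 2: apply func to each item (stands in for the GPU calculation)."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(func(item))


def saver(in_q, sink):
    """Stage 3: collect results (stands in for writing a GeoTIFF with GDAL)."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        sink.append(item)


def run_pipeline(rows, func):
    """Scheduler: start the three stages, wait for them, return the results."""
    to_calc = queue.Queue(maxsize=4)  # bounded, so the loader cannot run far ahead
    to_save = queue.Queue(maxsize=4)
    sink = []
    stages = [
        threading.Thread(target=loader, args=(rows, to_calc)),
        threading.Thread(target=calculator, args=(to_calc, to_save, func)),
        threading.Thread(target=saver, args=(to_save, sink)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return sink
```

Because each stage runs concurrently, loading the next block can overlap with calculating and saving the previous ones, which is the point of the scheduler design.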
  23. Results and Discussion (Stage 1 - out of the box)
     • Python by itself is much slower than C++.
     • PyCUDA is faster both because it uses the GPU and because it is written in C.
     • Takeaway: when given the option, use the PyCUDA libraries.

     | Size   | Python    | C++     | PyCUDA  | QGIS    |
     |--------|-----------|---------|---------|---------|
     | 25 MB  | 50 secs   | 5 secs  | 4 secs  | 5 secs  |
     | 200 MB | 7:30 mins | 40 secs | 28 secs | 15 secs |
  24. Results and Discussion (Stage 2 - threading)
     • Adding CPU-based parallelism increases the gains.
     • It reduces the time spent waiting for data to be given to the GPU.

     | Size   | Threaded Python | Threaded C++ | Threaded PyCUDA | QGIS    |
     |--------|-----------------|--------------|-----------------|---------|
     | 25 MB  | 45 secs         | 5 secs       | 4 secs          | 5 secs  |
     | 200 MB | 7:30 mins       | 40 secs      | 9 secs          | 15 secs |
     | 1.5 GB | 1:30 hrs        | 18:03 mins   | 9:04 mins       | 9 mins  |
  25. Results and Discussion (Stage 2 - added computation)
     • Adding more complex computations allows us to maximize the GPU contribution.
     • Computing hillshade requires about 3x more computations.
     • PyCUDA doesn’t even slow down when switching formulas.
     • This shows that it can do much more before peaking out.

     | Size   | QGIS       | Threaded PyCUDA |
     |--------|------------|-----------------|
     | 25 MB  | 5 secs     | 4 secs          |
     | 1.5 GB | 11:00 mins | 9:04 mins       |
     | 12 GB  | 45:00 mins | 50:00 mins      |
  26. Results and Discussion (Stage 3 - further optimization)
     • The main bottleneck in the computations is disk I/O.
     • The total time the GPU is working for the 1.5 GB file is less than 2 seconds.
     • Increasing the size of reads and writes gains even more time.

     | Phase       | QGIS       | Threaded PyCUDA |
     |-------------|------------|-----------------|
     | Input       | 2:00 mins  | 1:55 mins       |
     | Computation | 9:00 mins  | 1:00 mins       |
     | Output      | 2:00 mins  | 2:20 mins       |
     | Total       | 11:00 mins | 3:35 mins       |
  27. Results and Discussion (Stage 3 - chunking)

     | Lines read | Time taken (12 GB file) | Time taken (1.5 GB file) |
     |------------|-------------------------|--------------------------|
     | 1          | 50:00                   | 5:18                     |
     | 10         | 39:50                   | 3:54                     |
     | 15         | 28:30                   | 3:00                     |
     | 20         | 33:30                   | 3:50                     |
     | 30         | 37:00                   | 4:16                     |
     | 40         | 40:00                   | 4:52                     |
     | 50         | 48:30                   | 5:21                     |

     • Reading too much data in one call causes a slowdown.
     • The optimal number is ~15 lines for all sizes.
     • There is no apparent ratio between the number of lines read from disk and the raster column and row sizes.
       – 1.5 GB is 14400 rows * 28800 cols
       – 12 GB is 51187 rows * 60818 cols
     • The limitation is how fast we can send data to the GPU.
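The "lines read" setting can be pictured as splitting the raster into blocks of rows. The sketch below is an assumption about how such chunking might be organised, not the plug-in's code: because a 3x3 algorithm needs one neighbouring row above and below each block, each read is padded by a one-row `halo`. The name `row_chunks` and its parameters are hypothetical.

```python
def row_chunks(total_rows, lines_per_read=15, halo=1):
    """Yield (read_start, read_stop, out_start, out_stop) row ranges.

    out_start:out_stop is the block of rows whose results are produced;
    read_start:read_stop is the slightly larger block that must be read so
    that a 3x3 window has its neighbouring rows. Stops are exclusive.
    """
    for out_start in range(0, total_rows, lines_per_read):
        out_stop = min(out_start + lines_per_read, total_rows)
        read_start = max(out_start - halo, 0)
        read_stop = min(out_stop + halo, total_rows)
        yield read_start, read_stop, out_start, out_stop


# Number of reads needed at 15 lines per read for the two test rasters.
for rows in (14400, 51187):
    print(rows, "rows ->", sum(1 for _ in row_chunks(rows)), "reads")
```

Smaller blocks mean more read calls and more transfers, while larger blocks mean more data per call, so a tuning parameter like `lines_per_read` is one way to search for the sweet spot the table reports.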
  28. Results
     • The PyCUDA version is consistently faster than QGIS when calculating hillshade for files of various sizes.
     • GPU computations, including CPU-based memory management, took one ninth of the time required to do the same thing in QGIS.
     • The I/O bottleneck can be seen in the input and output sections of the second table.
     • Output takes a much longer time because it has to wait for the GPU to pass data to the saver before it can start saving to disk.
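The ratios quoted here can be checked against the Stage 3 timings on slide 26 (computation 9:00 vs 1:00, total 11:00 vs 3:35). A small sketch of the arithmetic follows; the helper names `to_seconds` and `speedup` are hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '3:35' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(baseline, candidate):
    """How many times faster the candidate is than the baseline."""
    return to_seconds(baseline) / to_seconds(candidate)


# Timings for the 1.5 GB file as reported on slide 26 (QGIS vs threaded PyCUDA).
print(f"computation speedup: {speedup('9:00', '1:00'):.2f}x")
print(f"total speedup:       {speedup('11:00', '3:35'):.2f}x")
```

The computation phase alone is 9 times faster, but the end-to-end figure is close to 3 times because input and output take roughly the same time in both versions.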
  29. Other Takeaways
     • CUDA is very efficient when you have a smaller number of data elements, but massive calculations per element.
     • Terrain-based analyses use massive amounts of data, but few calculations per data element.
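To put numbers on "massive amounts of data, but few calculations per data element", the sketch below combines the raster dimensions from slide 27 with the approximate per-cell calculation counts from slide 17. The counts are the slides' rough estimates, and the function names are hypothetical.

```python
# Raster dimensions (rows, cols) and approximate per-cell calculation
# counts, both taken from the slides.
RASTERS = {"1.5 GB": (14400, 28800), "12 GB": (51187, 60818)}
CALCS_PER_CELL = {"slope": 15, "aspect": 20, "hillshade": 45}


def cell_count(name):
    """Number of cells in the named test raster."""
    rows, cols = RASTERS[name]
    return rows * cols


def total_calculations(name, function):
    """Approximate number of calculations to run one function over the raster."""
    return cell_count(name) * CALCS_PER_CELL[function]


for name in RASTERS:
    print(f"{name}: {cell_count(name):,} cells, "
          f"{total_calculations(name, 'hillshade'):,} hillshade calculations")
```

Even the most expensive of the three functions does only a few dozen operations on each cell it has to load, which fits the observation that moving data, not computing, dominates the run time.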
  30. Earlier work
  31. Next Steps
     • Improve the installation process – it is too arduous at the moment
     • Get the plug-in to work in Windows
  32. Conclusion
     • Early results show the ability to triple terrain analysis speed compared to serial methods
     • Multithreading can significantly improve GIS analysis speed
     • Try it out for yourself: https://github.com/aFuerst/PyCUDA-Raster
     (Chart labels: GPU, C++, QGIS, Serial)
  33. So, what is a good GIS example of massive calculations per data element?
  34. Acknowledgements
     • Salisbury University
     • National Science Foundation (Award # 1460900)
     • Students: William Hoffman, Charlie Kazer, Alex Fuerst
