IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)

See http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009

1. High-Throughput Scientific Computing
   Hanspeter Pfister
   pfister@seas.harvard.edu

2. Themes
   • How is the brain wired?
   • How did the Universe start?

3. How is the brain wired?
   The Connectome Project

4. Connectome Team
   • Harvard Center for Brain Science
     – Jeff Lichtman & Clay Reid
   • Microsoft Research / UW
     – Michael Cohen
   • Kitware Inc.
     – Will Schroeder, Charles Law, Rusty Blue
   • VRVis Vienna
     – Markus Hadwiger, Johanna Beyer
   • IIC
     – Amelio Vazquez, Eric Miller (Tufts)
     – Won-Ki Seung, Hanspeter Pfister

5. The Scientific Challenge
   [Figure: composite from Roe et al. 1989, Sutton and Brunso-Bechtold 1991]

6. Confocal Microscopy: Brainbow
   [Figure adapted from OlympusConfocal.com]

7. Electron Microscopy: ATLUM

8. Serial Sectioning
   [Figure: x, y, z axes and a stack of sections; section i, i ∈ {1, …, N}. Adapted from http://parasol.tamu.edu, Texas A&M University]

9. [EM overview image, labeled 15 µm]
   • 40,000 × 40,000 pixels
   • 1.6 GB
   • 120 × 120 µm (3 nm/pixel)
   • Shown here 40x undersampled
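The numbers on slide 9 are mutually consistent if each pixel is stored as a single byte. That 8-bit assumption is ours, not the slide's; the short Python check below simply redoes the arithmetic.

```python
# Re-derive the slide-9 numbers for one EM section.
# Assumption (not stated on the slide): 1 byte per pixel, i.e. 8-bit grayscale data.
WIDTH_PX = 40_000
HEIGHT_PX = 40_000
NM_PER_PX = 3
BYTES_PER_PX = 1  # assumed

size_gb = WIDTH_PX * HEIGHT_PX * BYTES_PER_PX / 1e9  # decimal gigabytes
extent_um = WIDTH_PX * NM_PER_PX / 1000              # nanometres -> micrometres

print(f"{size_gb:.1f} GB per section, {extent_um:.0f} x {extent_um:.0f} um field of view")
# prints: 1.6 GB per section, 120 x 120 um field of view
```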
10. [EM image, labeled 8 µm]

11. [EM image, labeled 3 µm]

12. [EM image, labeled 1 µm]

13. [EM image, labeled 300 nm]

14. The Data Challenge
    • 1 mm³ ≈ mouse thalamus ≈ 1 petabyte
    • 1 cm³ ≈ mouse brain ≈ 1 exabyte
    • 1000 cm³ ≈ human brain ≈ 1 zettabyte
    All of Google's world-wide storage today ≈ 1 exabyte
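Each step in the list is a factor of 1000 in volume (1 cm³ = 1000 mm³), so at a fixed imaging resolution the data size grows by the same factor. A minimal Python sketch that takes the slide's ~1 PB per mm³ as its only input; `data_size` and `humanize` are illustrative helper names, not part of any project code.

```python
# Scale the slide's "~1 PB per mm^3" estimate to larger tissue volumes.
# At fixed imaging resolution, raw data size is proportional to tissue volume.
BYTES_PER_MM3 = 1e15  # ~1 petabyte per cubic millimetre (from the slide)

def data_size(volume_mm3):
    """Estimated raw data size in bytes for a tissue volume given in mm^3."""
    return volume_mm3 * BYTES_PER_MM3

def humanize(nbytes):
    """Format a byte count using decimal PB / EB / ZB units."""
    for scale, unit in ((1e21, "ZB"), (1e18, "EB"), (1e15, "PB")):
        if nbytes >= scale:
            return f"{nbytes / scale:g} {unit}"
    return f"{nbytes:g} B"

# 1 cm^3 = 1000 mm^3, so each row is 1000x the previous one.
for name, mm3 in (("mouse thalamus", 1), ("mouse brain", 1_000), ("human brain", 1_000_000)):
    print(f"{name:15s} {humanize(data_size(mm3))}")
```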
15. Addressing the Data Challenge
    • Multi-Scale Imaging
    • Hierarchical Data Representation
    • Distributed Heterogeneous Computing
    • Visualization
    • Segmentation
    • Analysis

16. Addressing the Data Challenge
    • Multi-Scale Imaging
    • Hierarchical Data Representation
    • Distributed Heterogeneous Computing
    • Visualization
    • Segmentation
    • Analysis

17. Direct Volume Rendering
    MARKUS HADWIGER, VRVIS RESEARCH CENTER, VIENNA, AUSTRIA

18. Ray Casting
    • Image-order ray shooting
      – Interpolate
      – Assign color & opacity
      – Composite
    • Simple to implement
    • Very flexible (adaptive sampling, …)
    • Correct perspective
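The "Interpolate" step reconstructs a density between voxel centres. On the GPU this is normally done by the texture unit's hardware trilinear filtering; the Python sketch below only spells out the arithmetic. It assumes the volume is a nested list indexed `volume[z][y][x]` with positions in voxel units; `trilinear` is a hypothetical helper name.

```python
import math

def trilinear(volume, x, y, z):
    """Trilinearly interpolate volume[z][y][x] at a continuous position (voxel units).

    Positions outside the volume are clamped to the border voxels.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    z = min(max(z, 0.0), nz - 1.0)

    # Lower corner of the enclosing cell and the fractional offsets inside it.
    x0, y0, z0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(z))
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def v(i, j, k):
        return volume[k][j][i]

    # Interpolate along x, then y, then z.
    c00 = v(x0, y0, z0) * (1 - fx) + v(x1, y0, z0) * fx
    c10 = v(x0, y1, z0) * (1 - fx) + v(x1, y1, z0) * fx
    c01 = v(x0, y0, z1) * (1 - fx) + v(x1, y0, z1) * fx
    c11 = v(x0, y1, z1) * (1 - fx) + v(x1, y1, z1) * fx
    c0 = c00 * (1 - fy) + c10 * fy
    c1 = c01 * (1 - fy) + c11 * fy
    return c0 * (1 - fz) + c1 * fz
```

For a 2×2×2 volume whose values vary linearly (value = x + 10y + 100z), `trilinear(vol, 0.5, 0.5, 0.5)` returns 55.5, the exact value of that linear function at the cell centre.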
19. Transfer Functions
    • Mapping of density to optical properties
    • Simplest: color table with opacity over density
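A 1D transfer function of the "color table with opacity over density" kind can be sketched as a small RGBA lookup table indexed by normalized density, with linear interpolation between entries (on a GPU such a table would typically live in a 1D texture). The table contents and the name `classify` are made up for illustration.

```python
def classify(density, table):
    """Map a density in [0, 1] to (r, g, b, a) by linear interpolation in a 1D RGBA table."""
    d = min(max(density, 0.0), 1.0)
    pos = d * (len(table) - 1)          # continuous index into the table
    i = int(pos)
    j = min(i + 1, len(table) - 1)
    t = pos - i
    return tuple((1 - t) * a + t * b for a, b in zip(table[i], table[j]))

# Hypothetical 3-entry table: transparent black -> semi-transparent red -> opaque white.
TABLE = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.5),
    (1.0, 1.0, 1.0, 1.0),
]
```

For example, `classify(0.25, TABLE)` lies halfway between the first two entries and returns `(0.5, 0.0, 0.0, 0.25)`.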
20. Connectome: EM Data

21. Single-Pass Ray Casting
    • Enabled by conditional loops
    • Replace multiple rendering passes with a single loop and early loop exit
    • Volume rendering example in the NVIDIA CUDA SDK (procedural ray setup)

22. Basic Ray Setup / Termination
    • Two main approaches:
      – Procedural ray/box intersection [Röttger et al., 2003], [Green, 2004]
      – Rasterize bounding box [Krüger and Westermann, 2003]

23. Procedural Ray Setup / Termination
    • Procedural ray/box intersection
      – Everything handled in the fragment shader
      – Ray given by camera position and volume entry position
      – Exit criterion needed
    • Pro: simple and self-contained
    • Con: full load on the fragment shader
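Procedural setup needs the distances at which a ray enters and leaves the volume's bounding box. A common way to compute them is the slab test sketched below in Python; we cannot confirm it is the exact formulation used in the cited papers or in the CUDA SDK sample. The unit box [0, 1]³ and the name `intersect_box` are assumptions. The exit distance `t_far` supplies the exit criterion the slide mentions.

```python
import math

def intersect_box(origin, direction, box_min=(0.0, 0.0, 0.0), box_max=(1.0, 1.0, 1.0)):
    """Slab-method ray/box intersection.

    Returns (t_near, t_far), the ray parameters where origin + t * direction
    enters and leaves the axis-aligned box, or None if the ray misses. t_near is
    clamped to 0 so a camera inside the volume starts sampling at its own position.
    """
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this pair of planes: it misses unless it lies between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    if t_near > t_far or t_far < 0.0:
        return None  # missed, or the box lies entirely behind the camera
    return max(t_near, 0.0), t_far
```

A ray from (-1, 0.5, 0.5) along +x enters the unit box at t = 1 and leaves at t = 2.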
24. “Image-Based” Ray Setup / Termination
    • Rasterize bounding box front faces and back faces
    • Ray start positions: front faces
    • Direction vectors: back faces − front faces
    • Independent of projection (orthographic/perspective)
    [Figure: back-face image − front-face image = direction image]
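In the image-based variant, two rasterization passes leave, for every pixel, the volume-space position of the front face and of the back face. The per-pixel ray then follows directly, and because the positions come out of the rasterizer, the same arithmetic serves orthographic and perspective cameras. A Python sketch of that per-pixel step; `ray_from_faces` is a hypothetical name, and a GPU version would read the two positions from the ray start and direction images instead of taking tuples.

```python
import math

def ray_from_faces(front, back):
    """Per-pixel ray setup from rasterized bounding-box faces.

    front -- volume-space position written by the front-face pass (ray start)
    back  -- volume-space position written by the back-face pass (ray exit)
    Returns (start, unit_direction, length); length is how far to march.
    """
    diff = [b - f for f, b in zip(front, back)]
    length = math.sqrt(sum(c * c for c in diff))
    if length == 0.0:  # zero-length ray: nothing to cast for this pixel
        return tuple(front), (0.0, 0.0, 0.0), 0.0
    return tuple(front), tuple(c / length for c in diff), length
```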
25. Kernel
    • Image-based ray setup
      – Ray start image
      – Direction image
    • Ray-cast loop
      – Sample volume
      – Accumulate color and opacity
      – Terminate
    • Store output
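The kernel outline above can be sketched for a single pixel in Python. This is a CPU illustration, not the VRVis kernel: it assumes nearest-neighbour sampling of a nested-list volume indexed `volume[z][y][x]`, a caller-supplied transfer function, and no opacity correction for the step size; `raycast_pixel` and its parameters are hypothetical names.

```python
def raycast_pixel(volume, start, direction, length, step, transfer):
    """Single-pass, front-to-back ray casting for one pixel.

    volume    -- nested lists indexed volume[z][y][x] (densities)
    start     -- ray entry point in voxel coordinates (from the ray start image)
    direction -- unit ray direction (from the direction image)
    length    -- distance from entry to exit point; the loop terminates there
    step      -- sampling distance along the ray
    transfer  -- callable mapping a density to (r, g, b, a)
    Returns the accumulated (r, g, b, a) that would be stored in the output image.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    acc_r = acc_g = acc_b = acc_a = 0.0
    t = 0.0
    while t <= length:
        # Sample the volume (nearest neighbour here; a GPU kernel would
        # normally use a hardware-filtered 3D texture fetch instead).
        x, y, z = (s + t * d for s, d in zip(start, direction))
        i = min(max(int(round(x)), 0), nx - 1)
        j = min(max(int(round(y)), 0), ny - 1)
        k = min(max(int(round(z)), 0), nz - 1)
        r, g, b, a = transfer(volume[k][j][i])

        # Accumulate color and opacity with the front-to-back "under" operator.
        w = (1.0 - acc_a) * a
        acc_r += w * r
        acc_g += w * g
        acc_b += w * b
        acc_a += w

        t += step
    return acc_r, acc_g, acc_b, acc_a
```

Four samples of a white, 50%-opaque material accumulate to an opacity of 1 − 0.5⁴ = 0.9375.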
26. Standard Ray Casting Optimizations (1)
    Early ray termination
    • Isosurfaces: stop when the surface is hit
    • Direct volume rendering: stop when opacity ≥ threshold
    • Several possibilities
    • Current GPUs: early loop exit works well
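For direct volume rendering, early ray termination changes only the exit condition of the compositing loop. A Python sketch; the threshold of 0.95 is an arbitrary illustrative value and `composite_early_exit` a hypothetical name.

```python
def composite_early_exit(samples, threshold=0.95):
    """Front-to-back compositing of (r, g, b, a) samples with early ray termination.

    Stops as soon as the accumulated opacity reaches `threshold`: with colors in
    [0, 1], everything behind that point can add at most (1 - threshold) per channel.
    Returns ((r, g, b, a), number_of_samples_used).
    """
    acc_r = acc_g = acc_b = acc_a = 0.0
    used = 0
    for r, g, b, a in samples:
        w = (1.0 - acc_a) * a
        acc_r += w * r
        acc_g += w * g
        acc_b += w * b
        acc_a += w
        used += 1
        if acc_a >= threshold:  # early ray termination
            break
    return (acc_r, acc_g, acc_b, acc_a), used
```

With 100 identical samples of opacity 0.5, the loop stops after 5 samples (accumulated opacity 0.96875) instead of visiting all 100.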
27. Standard Ray Casting Optimizations (2)
    Empty space skipping
    • Skip transparent samples
    • Depends on transfer function
    • Start casting close to first hit
    • Several possibilities
      – Per-sample check of opacity (expensive)
      – Hierarchical data structure (e.g., octree with stack-less traversal [Gobbetti et al., 2008])
    • These are image-order: what about object-order?

28. Object-Order Empty Space Skipping (1)
    • Modify initial rasterization step
    [Figure: rasterize bounding box vs. rasterize “tight” bounding geometry]

29. Object-Order Empty Space Skipping (2)
    • Store min-max values of volume blocks
    • Cull blocks against transfer function or isovalue
    • Rasterize front and back faces of active blocks
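Block culling can be sketched as two steps: precompute each block's min-max density once, then mark a block active if the transfer function gives non-zero opacity somewhere in that range (or, for isosurfaces, if the isovalue lies inside it). The Python below assumes integer densities that index an opacity table directly; all names are hypothetical. With trilinear interpolation, a renderer would likely also need to include each block's one-voxel border in its range, which this sketch ignores.

```python
def block_min_max(volume, block):
    """Min/max density of each cubic block of block^3 voxels.

    volume is a nested list indexed volume[z][y][x]; returns {(bz, by, bx): (lo, hi)}.
    """
    ranges = {}
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, v in enumerate(row):
                key = (z // block, y // block, x // block)
                lo, hi = ranges.get(key, (v, v))
                ranges[key] = (min(lo, v), max(hi, v))
    return ranges


def active_blocks_tf(ranges, opacity):
    """Blocks whose density range hits a non-zero entry of an integer-indexed opacity table."""
    return sorted(key for key, (lo, hi) in ranges.items()
                  if any(opacity[d] > 0.0 for d in range(lo, hi + 1)))


def active_blocks_iso(ranges, isovalue):
    """Blocks that can contain the isosurface: lo <= isovalue <= hi."""
    return sorted(key for key, (lo, hi) in ranges.items() if lo <= isovalue <= hi)
```

Only the active blocks would then have their front and back faces rasterized.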
30. Connectome: Fluorescence Data

31. Connectome: Implicit Surfaces

32. Addressing the Data Challenge
    • Multi-Scale Imaging
    • Hierarchical Data Representation
    • Distributed Heterogeneous Computing
    • Visualization
    • Segmentation
    • Analysis

33. Active Ribbons
    • Active Ribbon: a set of two non-intersecting, coupled active contours
    • Active Contour: a deformable closed curve that can be used to segment objects in an image
    [Figure: outer active contour and inner active contour forming an active ribbon]

34. Results (Matlab)

35. Axon Segmentation

36. Interactive Analysis

37. How did the Universe start?
    The MWA Project
    Kevin Dale, Richard Edgar, Daniel Mitchell, Randall Wayth, Lincoln Greenhill, and Hanspeter Pfister

38. MWA CfA / IIC Team
    • Harvard Center for Astrophysics / Smithsonian Astrophysical Observatory
      – Lincoln Greenhill
      – Daniel Mitchell
      – Randall Wayth
      – Stephen Ord
    • IIC / SEAS
      – Richard Edgar
      – Kevin Dale, Hanspeter Pfister

39. The Scientific Goals
    • Epoch of Reionisation (EOR)
    • Heliospheric and Ionospheric
    • Transient detection
    • Pulsars, Surveys, Interstellar Medium, Galactic Magnetic Field, …
    [Figure labels: ionized; neutral (H); the “Gap”; ionized]

40. The Murchison Widefield Array (MWA)
    • Located in the remote Australian outback
    • Extremely wide fields of view for radio astronomy in the 80-300 MHz band
    • 512 tiles, each a 4x4 array of dipoles, scattered over ~1.5 km
    • Data center for real-time processing co-located with the array
    http://www.haystack.mit.edu/ast/arrays/mwa/index.html

41. [Image] © Murchison Wide-field Array Project (MIT/Harvard/Smithsonian/ANU/Curtin U./U.Melb./UWA/CSIRO)

42. [Image, same credit as slide 41]

43. Calibration
    [Figure: ionospheric offsets; ungridded visibilities with bright sources peeled; imaging]

44. The Data Rate Challenge
    [Figure: processing pipeline with stages Calibration Loop, FFT, Averaging (!), Gridding, Vector Rotation, Mapping, and Science; annotated “entangled v. parallel computation”, data rates of (1) GB/s and 16 GB/s, and cadences of 8 s and 0.5 s]
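Gridding resamples the irregularly spaced (u, v) visibility samples onto a regular grid so the FFT stage can turn them into an image. The slides do not say which gridding kernel the MWA pipeline used; the Python sketch below uses the simplest possible one (add each visibility to its nearest grid cell), whereas production imagers usually convolve with a smooth gridding kernel. `grid_visibilities`, the cell size, and the grid-centring convention are all assumptions.

```python
def grid_visibilities(visibilities, grid_size, cell_size):
    """Nearest-cell gridding of complex visibilities.

    visibilities -- iterable of (u, v, value) with u, v in wavelengths
    grid_size    -- number of cells per side; (u, v) = (0, 0) maps to the centre cell
    cell_size    -- grid spacing in wavelengths
    Returns (grid, weights): summed visibilities and sample counts per cell.
    Samples falling outside the grid are dropped.
    """
    grid = [[0j] * grid_size for _ in range(grid_size)]
    weights = [[0] * grid_size for _ in range(grid_size)]
    half = grid_size // 2
    for u, v, value in visibilities:
        iu = int(round(u / cell_size)) + half
        iv = int(round(v / cell_size)) + half
        if 0 <= iu < grid_size and 0 <= iv < grid_size:
            grid[iv][iu] += value
            weights[iv][iu] += 1
    return grid, weights
```

The gridded array (optionally divided by the weights) is what a 2D FFT would then transform into an image.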
45. Implementation
    • Hardware
      – 2.4 GHz dual-core AMD Opteron, 4 GB RAM
      – NVIDIA Quadro FX 5600
    • Software
      – AMD Core Math Library (ACML)
      – NVIDIA CUDA (CUBLAS, CUFFT)
      – OpenGL

46. Single-GPU Speedup
    [Bar chart: CPU-to-GPU speedup (axis 0 to 70) for Image Formation and the Calibration Loop; stages: Imaging (mostly OpenGL), Gridding*, UnpeelTileResponse, PeelTileResponse, ReRotateVisibilities, MeasureTileResponse, MeasureIonosphericOffset, RotateAndAccumulateVisibilities]

47. Example Results
    • Noisy images from test data
    [Figure: GPU result alongside reference]

48. Scaling to a Cluster
    • 1000 frequency channels, 65 sources every 8 seconds, and a 1600² output image
    • 20-40 frequencies / GPU
    • 32-64 GPUs, i.e., 16 Tesla S1070s
    • Need MPI for internal data transfer
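Taking the slide's numbers at face value, 1000 channels at 20-40 channels per GPU needs 25-50 GPUs. The slide's 32-64 looks like that range rounded up to powers of two (our reading, not stated on the slide), and 16 Tesla S1070 units at four GPUs each give the upper figure of 64. A quick check; `gpus_needed` is a hypothetical helper.

```python
import math

GPUS_PER_S1070 = 4  # a Tesla S1070 1U system contains four GPUs

def gpus_needed(channels, channels_per_gpu):
    """GPUs required if each GPU processes `channels_per_gpu` frequency channels."""
    return math.ceil(channels / channels_per_gpu)

for per_gpu in (40, 20):
    n = gpus_needed(1000, per_gpu)
    units = math.ceil(n / GPUS_PER_S1070)
    print(f"{per_gpu} channels/GPU -> {n} GPUs -> {units} S1070 units")
```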
49. Conclusions
    • GPUs enable high-throughput scientific computing
    • Performance gains of 10-100x
    • CUDA makes life easier (but is not perfect)
    • Rasterization / OpenGL still useful
    • Need CUDA MPI for clusters