IAP09 CUDA@MIT 6.963 - Lecture 01: High-Throughput Scientific Computing (Hanspeter Pfister, Harvard)
 


See http://sites.google.com/site/cudaiap2009 and http://pinto.scripts.mit.edu/Classes/CUDAIAP2009

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • High-Throughput Scientific Computing Hanspeter Pfister pfister@seas.harvard.edu
    • Themes • How is the brain wired? • How did the Universe start?
    • How is the brain wired? The Connectome Project
    • Connectome Team • Harvard Center for Brain Science – Jeff Lichtman & Clay Reid • Microsoft Research / UW – Michael Cohen • Kitware Inc. – Will Schroeder, Charles Law, Rusty Blue • VRVis Vienna – Markus Hadwiger, Johanna Beyer • IIC – Amelio Vazquez, Eric Miller (Tufts) – Won-Ki Seung, Hanspeter Pfister
    • The Scientific Challenge composite from Roe et al. 1989, Sutton and Brunso-Bechtold 1991
    • Confocal Microscopy: Brainbow Adapted from OlympusConfocal.com
    • Electron Microscopy: ATLUM
    • Serial Sectioning • Axes x, y, z • Section i, i = 1, …, N • Adapted from http://parasol.tamu.edu (Texas A&M University)
    • 40,000 x 40,000 pixels • 1.6 GB • 120 x 120 µm (3 nm/pixel) • Here shown 40x undersampled
    • [Image sequence: EM zoom-in views; frame labels as extracted: 15mu, 8mu, 3mu, 1mu, 300 nm]
    • The Data Challenge • 1 mm³ ~= mouse thalamus ~= 1 petabyte • 1 cm³ ~= mouse brain ~= 1 exabyte • 1000 cm³ ~= human brain ~= 1 zettabyte • All of Google’s world-wide storage today ~= 1 exabyte
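The figures on this slide and the earlier section-size slide can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: it assumes decimal SI units and 1 byte per pixel (our inference from 40,000 x 40,000 pixels being quoted as 1.6 GB), and it takes "1 mm³ ~= 1 petabyte" as given rather than deriving it. All variable names are illustrative.

```python
# Back-of-envelope check of the slide's numbers (decimal SI units assumed).
PB, EB, ZB = 10**15, 10**18, 10**21

# One EM section: 40,000 x 40,000 pixels at 3 nm/pixel.
pixels_per_side = 40_000
nm_per_pixel = 3
side_um = pixels_per_side * nm_per_pixel / 1000   # side length in micrometres
tile_bytes = pixels_per_side ** 2                 # assumption: 1 byte per pixel

# Volume scaling, taking the slide's "1 mm^3 ~= 1 petabyte" as given.
bytes_per_mm3 = 1 * PB
mm3_per_cm3 = 10 ** 3
mouse_brain_bytes = 1 * mm3_per_cm3 * bytes_per_mm3      # 1 cm^3
human_brain_bytes = 1000 * mm3_per_cm3 * bytes_per_mm3   # 1000 cm^3

print(side_um, "um per side;", tile_bytes, "bytes per section")
print(mouse_brain_bytes // EB, "EB;", human_brain_bytes // ZB, "ZB")
```

Each step from thalamus to mouse brain to human brain is a factor of 1000 in volume, which is why the storage estimate moves one SI prefix at a time.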
    • Addressing the Data Challenge • Multi-Scale Imaging • Hierarchical Data Representation • Distributed Heterogeneous Computing • Visualization • Segmentation • Analysis
    • Direct Volume Rendering MARKUS HADWIGER, VRVIS RESEARCH CENTER, VIENNA, AUSTRIA
    • Ray Casting • Image-order ray shooting • Interpolate • Assign color & opacity • Composite • Simple to implement • Very flexible (adaptive sampling, …) • Correct perspective
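The four steps listed on this slide (shoot a ray, interpolate, assign color and opacity, composite) can be sketched for a single ray as follows. This is a minimal illustration, not the lecture's implementation: it uses a 1D density profile along the ray in place of a 3D volume with trilinear interpolation, and `classify` is a hypothetical stand-in for a real transfer function.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def sample_linear(densities, x):
    """Interpolate a 1D density profile (length >= 2) at continuous position x."""
    i = min(int(x), len(densities) - 2)
    return lerp(densities[i], densities[i + 1], x - i)

def classify(density):
    """Hypothetical classification: grey level = density, opacity = density / 2."""
    return density, 0.5 * density

def cast_ray(densities, step=0.5):
    """Front-to-back compositing along one ray; returns (color, alpha)."""
    color, alpha = 0.0, 0.0
    x = 0.0
    last = len(densities) - 1
    while x <= last:
        c, a = classify(sample_linear(densities, x))  # interpolate, then classify
        color += (1.0 - alpha) * a * c                # composite color
        alpha += (1.0 - alpha) * a                    # accumulate opacity
        x += step
    return color, alpha

color, alpha = cast_ray([1.0, 1.0], step=1.0)
print(color, alpha)
```

On a GPU, one thread (or fragment) would run this loop per pixel, which is what makes the image-order formulation easy to parallelize.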
    • Transfer Functions • Mapping of density to optical properties • Simplest: color table with opacity over density
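The "color table with opacity over density" case can be sketched as a lookup table with linear interpolation between entries. The table contents and the function name `make_transfer_function` are hypothetical, and densities are assumed to be normalized to [0, 1].

```python
def make_transfer_function(table):
    """Build a lookup from density in [0, 1] to an (r, g, b, a) tuple.

    table: list of at least two (r, g, b, a) entries spaced evenly over [0, 1].
    """
    n = len(table)

    def lookup(density):
        d = min(max(density, 0.0), 1.0)   # clamp out-of-range densities
        pos = d * (n - 1)                 # continuous index into the table
        i = min(int(pos), n - 2)
        t = pos - i
        lo, hi = table[i], table[i + 1]
        return tuple(l + (h - l) * t for l, h in zip(lo, hi))

    return lookup

# Hypothetical table: transparent black -> translucent red -> opaque white.
tf = make_transfer_function([
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.5),
    (1.0, 1.0, 1.0, 1.0),
])
print(tf(0.25))
```

On graphics hardware the same table is typically stored as a 1D texture so that the interpolation comes for free from the texture unit.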
    • Connectome: EM Data
    • Single-Pass Ray Casting • Enabled by conditional loops • Substitute multiple passes with single loop and early loop exit • Volume rendering example in NVIDIA CUDA SDK (procedural ray setup)
    • Basic Ray Setup / Termination • Two main approaches: • Procedural ray/box intersection [Röttger et al., 2003], [Green, 2004] • Rasterize bounding box [Krüger and Westermann, 2003]
    • Procedural Ray Setup / Term. • Procedural ray / box intersection • Everything handled in fragment shader • Ray given by camera position and volume entry position • Exit criterion needed • Pro: simple and self-contained • Con: full load on fragment shader
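One common way to do a procedural ray/box intersection is the slab method, sketched below. The slide does not say which variant the cited implementations use, so treat this as an illustration of the idea: the returned `t_far` plays the role of the exit criterion, and the function name is our own.

```python
import math

def intersect_ray_box(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) along the ray, or None on a miss."""
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    if t_near > t_far or t_far < 0.0:
        return None                     # box missed or entirely behind the ray
    return max(t_near, 0.0), t_far      # clamp entry when the camera is inside

hit = intersect_ray_box((0.5, 0.5, -1.0), (0.0, 0.0, 1.0),
                        (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(hit)
```

The ray-cast loop would then step from `t_near` to `t_far`, so no separate rasterization pass is needed.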
    • “Image-Based” Ray Setup / Term. • Rasterize bounding box front faces and back faces • Ray start positions: front faces • Direction vectors: back faces − front faces • Independent of projection (orthogonal/perspective)
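The per-pixel arithmetic behind the image-based setup is small enough to show directly. In this sketch, `front` and `back` stand for the volume-space positions read from the two rasterized images at one pixel; the function name and the choice to return the ray length as well are our own assumptions.

```python
import math

def ray_from_faces(front, back):
    """Derive a ray from front-face and back-face positions at one pixel.

    Returns (start, unit_direction, length), or None when the two positions
    coincide (for example, a pixel the bounding box does not cover).
    """
    diff = tuple(b - f for f, b in zip(front, back))   # back faces - front faces
    length = math.sqrt(sum(c * c for c in diff))
    if length == 0.0:
        return None
    direction = tuple(c / length for c in diff)
    return front, direction, length

print(ray_from_faces((0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```

Because both positions come out of the rasterizer already projected, the same subtraction works for orthogonal and perspective cameras, which is the independence the slide points out.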
    • Kernel • Image-based ray setup • Ray start image • Direction image • Ray-cast loop • Sample volume • Accumulate color and opacity • Terminate • Store output
    • Standard Ray Casting Optim. (1) Early ray termination • Isosurfaces: stop when surface hit • Direct volume rendering: stop when opacity >= threshold • Several possibilities • Current GPUs: early loop exit works well
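Early ray termination for direct volume rendering amounts to one extra test inside the front-to-back compositing loop. The sketch below assumes the samples along the ray have already been classified into (color, opacity) pairs; the threshold of 0.95 is an arbitrary illustrative value, not one given in the lecture.

```python
def composite_with_early_exit(samples, threshold=0.95):
    """Front-to-back compositing that stops once accumulated opacity >= threshold.

    samples: iterable of (color, opacity) pairs in front-to-back order.
    Returns (color, alpha, samples_used).
    """
    color, alpha, used = 0.0, 0.0, 0
    for c, a in samples:
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        used += 1
        if alpha >= threshold:
            break          # remaining samples would contribute almost nothing
    return color, alpha, used

# 100 identical half-opaque samples: the loop exits after a handful of them.
color, alpha, used = composite_with_early_exit([(1.0, 0.5)] * 100)
print(used, alpha)
```

In a single-pass GPU kernel this `break` is the early loop exit the slide refers to: each ray stops independently as soon as it is saturated.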
    • Standard Ray Casting Optim. (2) Empty space skipping • Skip transparent samples • Depends on transfer function • Start casting close to first hit • Several possibilities • Per-sample check of opacity (expensive) • Hierarchical data store (e.g., octree with stack-less traversal [Gobbetti et al., 2008]) • These are image-order: what about object-order?
    • Object-Order Empty Space Skip. (1) • Modify initial rasterization step • Rasterize bounding box vs. rasterize “tight” bounding geometry
    • Object-Order Empty Space Skip. (2) • Store min-max values of volume blocks • Cull blocks against transfer function or isovalue • Rasterize front and back faces of active blocks
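The min-max culling step can be sketched as follows. To keep the example short it uses a 1D list of integer densities as a stand-in for a 3D volume split into bricks; the function names and the opacity table are hypothetical. Blocks that survive the cull are the "active blocks" whose faces would then be rasterized.

```python
def block_min_max(volume, block):
    """Return (min, max) density for each consecutive block of the volume."""
    return [(min(volume[i:i + block]), max(volume[i:i + block]))
            for i in range(0, len(volume), block)]

def active_blocks_iso(min_max, isovalue):
    """Blocks whose density range contains the isovalue."""
    return [i for i, (lo, hi) in enumerate(min_max) if lo <= isovalue <= hi]

def active_blocks_tf(min_max, opacity_table):
    """Blocks where at least one density in [min, max] maps to nonzero opacity.

    opacity_table[d] is the opacity assigned to integer density d.
    """
    return [i for i, (lo, hi) in enumerate(min_max)
            if any(opacity_table[d] > 0.0 for d in range(lo, hi + 1))]

volume = [0, 0, 0, 0, 10, 200, 30, 0, 5, 5, 5, 5]
ranges = block_min_max(volume, block=4)
opacity = [0.0 if d < 8 else 1.0 for d in range(256)]  # hypothetical table
print(ranges)
print(active_blocks_iso(ranges, 100), active_blocks_tf(ranges, opacity))
```

The min-max values depend only on the data, so they are computed once; only the cheap cull needs to be repeated when the transfer function or isovalue changes.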
    • Connectome: Fluorescence Data
    • Connectome: Implicit Surfaces
    • Addressing the Data Challenge • Multi-Scale Imaging • Hierarchical Data Representation • Distributed Heterogeneous Computing • Visualization • Segmentation • Analysis
    • Active Ribbons • Active Ribbon: A set of two non-intersecting and coupled Active Contours • Active Contour: Deformable closed curve that can be used to segment objects in an image • [Figure labels: Outer Active Contour, Inner Active Contour, Active Ribbon]
    • Results (Matlab)
    • Axon Segmentation
    • Interactive Analysis
    • How did the Universe start? The MWA Project Kevin Dale, Richard Edgar, Daniel Mitchell, Randall Wayth, Lincoln Greenhill, and Hanspeter Pfister
    • MWA CfA / IIC Team • Harvard Center for Astrophysics / Smithsonian Astrophysical Observatory – Lincoln Greenhill – Daniel Mitchell – Randall Wayth – Stephen Ord • IIC / SEAS – Richard Edgar – Kevin Dale, Hanspeter Pfister
    • The Scientific Goals • Epoch of Reionisation (EOR) • Heliospheric and Ionospheric • Transient detection • Pulsars, Surveys, Interstellar Medium, Galactic Magnetic Field, … • [Figure labels: ionized, neutral (H), The “Gap”, ionized]
    • The Murchison Widefield Array (MWA) • Located in the remote Australian outback • Extremely wide fields of view for radio astronomy in the 80-300 MHz band • 512 tiles, each a 4x4 array of dipoles, scattered over ~ 1.5 km • Data center for real-time processing co-located with the array http://www.haystack.mit.edu/ast/arrays/mwa/index.html
    • © Murchison Wide-field Array Project (MIT/Harvard/Smithsonian/ANU/Curtin U./U.Melb./UWA/CSIRO)
    • Calibration Ionospheric offsets Ungridded visibilities with bright sources peeled Imaging
    • The Data Rate Challenge • [Pipeline diagram; recoverable labels: entangled v. parallel computation, Calibration Loop, FFT, Averaging (!), Gridding, Vector Rotation, Mapping, Science; data rates: (1) GB/s and 16 GB/s; cadences: 8 s and 0.5 s]
    • Implementation • Hardware • 2.4 GHz dual-core AMD Opteron, 4GB RAM • NVIDIA Quadro FX 5600 • Software • AMD Core Math Library (ACML) • NVIDIA CUDA (CUBLAS, CUFFT) • OpenGL
    • Single-GPU Speedup • [Bar chart of CPU to GPU speedup, axis 0 to 70; group labels: Image Formation, Calibration Loop; bars: Imaging, Gridding *, UnpeelTileResponse, PeelTileResponse, ReRotateVisibilities, MeasureTileResponse, MeasureIonosphericOffset, RotateAndAccumulateVisibilities; annotation: Mostly OpenGL]
    • Example Results • Noisy images from test data GPU Reference
    • Scaling to a Cluster • 1000 frequency channels, 65 sources every 8 seconds, and 1600² output image • 20-40 frequencies / GPU • 32-64 GPUs, i.e., 16 Tesla S1070s • Need MPI for internal data transfer
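The sizing on this slide can be checked with simple division. The sketch below uses only the slide's numbers plus one outside fact, that a Tesla S1070 unit houses four GPUs; the function names are illustrative, and how the authors arrived at their exact range is not stated in the transcript.

```python
import math

CHANNELS = 1000       # frequency channels (from the slide)
GPUS_PER_S1070 = 4    # a Tesla S1070 unit houses four GPUs

def gpus_needed(channels_per_gpu):
    """GPUs required if each one handles the given number of channels."""
    return math.ceil(CHANNELS / channels_per_gpu)

def channels_per_gpu(num_gpus):
    """Average channels per GPU when the channels are spread evenly."""
    return CHANNELS / num_gpus

print(gpus_needed(40), gpus_needed(20))             # 25 50
print(channels_per_gpu(32), channels_per_gpu(64))   # 31.25 15.625
print(16 * GPUS_PER_S1070)                          # 64
```

Plain division of 1000 channels by 20-40 channels per GPU gives 25 to 50 GPUs, while the quoted 32-64 GPUs corresponds to roughly 31 down to 16 channels each; 16 S1070 units supply the upper figure of 64 GPUs.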
    • Conclusions • GPUs enable high-throughput scientific computing • Performance gains of 10-100x • CUDA makes life easier (but not perfect) • Rasterization / OpenGL still useful • Need CUDA MPI for clusters