Streaming Multigrid for Gradient-Domain Operations on Large Images Michael Kazhdan , Johns Hopkins University Hugues Hoppe...
Abstract <ul><li>Develop a  streaming multigrid  solver – 2 passes over out-of-core data </li></ul><ul><ul><li>Solver of  ...
Abstract <ul><li>Develop a streaming multigrid solver – 2 passes over out-of-core data </li></ul><ul><ul><li>2 nd  order f...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Gradient-domain problem as Poisson solution <ul><li>Solve the problem that  </li></ul><ul><li>Find U to minimize  </li></u...
Image processing on  gradient-domain <ul><li>Lighting removal  by zeroing small gradient [Horn 74] </li></ul><ul><li>HDR i...
Painting on gradient-domain <ul><li>Painting  with interactive gradient-domain modeling  </li></ul><ul><li>[McCann & Polla...
Large images on gradient-domain <ul><li>Processing of  large images   [Kopf et al. 07] </li></ul><ul><li>GB pixel images a...
Standard  multigrid   V-cycle
Contributions <ul><li>A general, accurate and efficient solver for  Poisson  eq. over  out-of-core  images </li></ul><ul><...
Contributions <ul><li>On a single CPU core </li></ul><ul><ul><li>Solve a 3-channel, 16-MB gradient-domain problem with an ...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Solvers of Poisson equation <ul><li>Iterative solvers </li></ul><ul><ul><li>Gauss-Seidel  and conjugate gradient  </li></u...
Out-of-core <ul><li>Multigrid  are difficult to schedule out-of-core [Toledo 99] </li></ul><ul><li>Solve out-of-core probl...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Gradient-domain image processing    Poisson Equation
▽  u x 0 x 1 x i x N u 0 u 1 u i u N h u i+1
Δ U x i+1 x i-1 x i x N u i-1 u N h u i+1 u i
Δ U= f
Lu=f U u 1,1 u 1,2 u 1,3 u 2,1 u 2,2 u 2,3 u 3,1 u 3,2 u 3,3
Lu=f <ul><li>Gauss-Seidel  relaxation on  u </li></ul><ul><li>L u  =  f </li></ul><ul><li>( L+D+R )  u  =  f  where L =   ...
Error, residual  <ul><li>Consider   Lu = f </li></ul><ul><li>Error   e k  = u – u (k) </li></ul><ul><li>Residual   r k  = ...
Limits of  Gauss-Seidel <ul><li>e k =  R J k   e 0  </li></ul><ul><li>Gauss-Seidel relaxation is smooth but converges slow...
coarse-grid correction <ul><li>Relax  k  times on  L h u h  = f h   on  fine-grid , initial  u (0)  arbitrary </li></ul><u...
Restriction & Prolongation <ul><li>Restriction operator </li></ul><ul><li>Fine grid    coarse grid </li></ul><ul><li>Typi...
2D stencils of the multigrid  2D Restriction 2D Prolongation 2D Relaxation  (Laplacian)
V-cycle <ul><li>v h     MV h (v h , f h ) </li></ul><ul><li>Relax  k  times on  L h u h  = f h , initial v arbitrary </li...
Standard multigrid V-cycle f l-1 =R l l -r l u l =P l l-1 u P l-1 +u R l L lmin u lmin =f lmin Base solution
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Representing the Poisson equation with 1D B-spline basis <ul><li>Solving Poisson eq . reduces to  solving  Lu = f </li></u...
Fitting forward-difference gradient constrains <ul><li>Gradient t of unknown image v </li></ul><ul><li>Difference of B-spl...
2 nd  order B-spline 1D case
2 nd  order B-spline 2D case
2D stencils of the multigrid operators using B-splines <ul><li>Prolongation </li></ul><ul><li>Restriction </li></ul><ul><l...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
How to pipeline ? <ul><li>phases of each row </li></ul><ul><ul><li>A l     R    A l-1     R    A l-2     P    A l-1 ...
Temporally blocked relaxation  <ul><li>Maintain a  window of rows  [i-1, i+2k+1] to perform a skipping, counter-current re...
Full data pipeline for gradient-domain processing
Memory analysis <ul><li>Implement the active windows on  u l R  and  f l  as circular memory buffer of images rows </li></...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Parameters (n,k,v) <ul><li>n-order of the finite element (n-order B-spline) </li></ul><ul><li>K times Gauss-Seidel relaxat...
(n, v) Plot of the rms and max errors vs. # of multigrid V-cycles 2 nd -order element give the fastest convergence !
(n, k=2, v=1)
Parameter selection <ul><li>Sufficiently accurate solution with a minimum number of V-cycles </li></ul><ul><ul><li>8-bit c...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Implementation <ul><li>Maximizing disk throughput </li></ul><ul><ul><li>Larger block (4MB) transfer to minimize disk laten...
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Environment <ul><li>NB </li></ul><ul><ul><li>2.2 GHz Core 2 Duo processor </li></ul></ul><ul><ul><li>4 GB RAM </li></ul></...
Image stitching 19,588 x 4,457 (87 MB) panorama form  9 photos Copy the image gradients and solving the Poisson eq.
Image stitching SM: streaming multigrid solver QT: quadtree (AT) solver of Agarwala [07]
Tone-mapping  (HDR    normal tone) before after
Tone-mapping  (HDR    normal tone) Stream multigraid is the first one to solve Poisson eq in time that is linear on the #...
GB stitching and tone-mapping May not capture the true scene contrast
GB stitching and tone-mapping
Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li><...
Conclusion <ul><li>Streaming multigrid </li></ul><ul><ul><li>Out-of-core tech. for solving large global linear system </li...
Future work <ul><li>Dirichlet boundary condition </li></ul><ul><ul><li>Modify 2D stencils </li></ul></ul><ul><ul><li>Const...
Future work <ul><li>Reduce disk bandwidth by using compression/decompression of the streamed temporary data </li></ul><ul>...
Upcoming SlideShare
Loading in …5
×

study Streaming Multigrid For Gradient Domain Operations On Large Images

1,193 views
1,042 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,193
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
15
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Johns Hopkins University : 美國醫界著名大學
  • . 本文就是用 multi-streaming 完成 multigrid 演算法 . 特別針對這種 out-of-core , 即演算過程中 , 不僅只在 main memory, 還需要用到 disk 的大影像 , 在 gradient domain 上處理的數值分析方法 .multigrid method 早在 19?? 年由 ? 俄國數學家 ?? 發展於 finite difference ( f(x+b)-f(x+a) ), 用來解偏微分方程式 (Partial differential equation) 的數值分析方法之一 . . 本文設計了的 streaming multigrid solver, 有 2 passes. 1. 對每個 multigrid method 的 V-cycle 計算 , 用二階的 finite elements 去達到逐夠的正確性 . 這個說法很怪 ? Gradient domain 是說 ▽ U=G. 而二階是對 G 作 divergence, ▽ .▽ U =▽. G = f 這樣當然是 Poisson equation. Uxx + Uyy = f , 也就是二階的偏微分 作二階偏微分的當然是希望在 V-cycle (multi-grid 的演算過程 ) 內做 2 nd order finite elements 2. temporally blocked relation: 回憶 CA 課程 , 以 cache 來增加 temporal 和 special 的 locality. ㄧ次處理數行的方法 ( 而非整張 image 計算 ), 保證計算的數行在 L1 內 , 發揮 cache temporal locality 的優點 3. multi-level streaming, 是用來對 multi-grid 中的兩個必要 phase -restriction 和 prolongation – 作 pipeline. restriction 是高解析度到低解析度的運算 . 而 prolongation 相對來說是低解析度到高解析度的運算 . 怎麼 pipeline, 就要分別分割 restriction 和 prolongation. 實作上 , 本文是利用 GPU 多個 streaming 達成 . 能完成上式的 key component – 過去這種 forward-difference gradient 的方法 ( 我想就是指 Gauss-Seidel 或 multigrid 了 ) 而 multigrid 間高低解析度之間的轉換 , 不外乎就是些高解析度是低解析度內插 , 低解析度是高解析度 mean 之類的 . 這裡卻用上了 B-spline finite-element. 將 ⊿ U(x)=sum ( ui * Bi(x)) ( 其中⊿ , 是▽. ▽ ) 也就是說 , x 點上的 , ⊿U(x), 是 B-spline basis 的集合 . 所以我們對 multigrid 的高低解析度轉換對象 , 成為不同解析度間的 B-spline Basis 轉換
  • 以偏微方程式再對 image 寫一遍 minimize |gradient U– G| =: partial derivation (gU-G)=0  d g (U)- d G = 0
  • Lighting removal HDR tone-mapped by attenuating ( 衰減 ) luminance gradient Image stitching – 影像縫合 Shadow removal Refection removal HDR tone management
  • Paint: 大 meet, 小 meet 上次 present Large image:
  • Out-of-core image 沒有效率 , 在於 disk-IO
  • 1. 可在
  • Rms: root mean square error
  • 就是用數值方法逼近 Poisson eq 這類的偏微分方程式
  • 1. Toledo [1999] presents an excellent survey of out-of-core algorithms for large linear systems
  • 本文要處理的對象 , gradient-domain image processing U is the 2D image ▽ U = G. 是 vector function 不過 , 我們要運算的對象是 scalar, ▽ . ▽ U=▽ . G=f 在 image processing 的課題上 , 通常是設定 f, 解 U 提醒自己: 1. Lapalicain eq: Δ 2 u = 0 Δ 2 u = 0 2. Poisson eq: Δ 2 u = f Δ u = f
  • 後註:以這種方式理解數值分析法對 u x 的處理 . u xx = u(x+h)-u(x) 教科書通常是用 Taylor seriers 展開後相減
  • 後註:以這種方式理解數值分析法對 u x 的處理 . 教科書通常是用 Taylor seriers 展開後相減
  • 本文碎碎念 forward difference : h * f’(x) = f(x+h) – f(x) Backward difference : h * f’(x) = f(x) – f(x-h)
  • 詳細內容請參照 , 所以針對這個特例 , 可以得到以下的 Lu=f, Each row of the matrix L cor
  • 這種反覆地求 u, 被稱為 relaxation 他們是這麼說 , the pressure of constraint is relaxed 若本題的 constrain 是 e  0 則逐次逼進 e = 0, 使得 e 的壓力變少 本文的 relaxation 都用 Gauss-seidel method !!
  • 1. Error 和 residual 的定義 2. 目的 : e  0. 這是一種 relaxation problem. 將 error 變小 , 也就是將限制放鬆 ( 開山祖師的奇妙想法…其實和答案越接近 , 和答案越緊更直覺 ) 要注意到兩件事 , e k 是不是 smooth function ? 有沒有好 initial e 0 ?
  • 完整的說法是 , Gauss-Seidel relaxation converges slowly on the low-frequency components of the solution. . 作 R J 的 Fourier transform (or series?). 可以分析出是 R J 的低頻成分的確要多次才收歛
  • L H 和 L h 一樣嗎 ? 不一樣 !
  • 將之前的整合成 V-cycle, 直接改用本 paper 圖文說明 [figure 1] Restriction: 由 residual 求得 coarse-grid 的 f ( 很奇怪吧 .. 應該也只能求 coarse-grid 的 residual) Base solution: 足夠 coarse level 後 , 直接 求 u 於 Lu = f Prolongation: 由 coarse-grid 的 u, 求得 fine-grid u ( 很難接受 , 仍然應該是 u = u + P e) 至於 data flow, 則改為本 multi-grid 的說法
  • Bi(x) is the B-spline basis B(x) of ith index
  • forward difference : h * f’(x) = f(x+h) – f(x)
  • 由 B-spline, 1D P = ¼ (3 1), 1D R = ¼ (1 3)
  • 分別說明 , 為何 relaxation 不能 pipeline, 也不好作平行化 以低解析度的 l-1 level, Relaxation 需要 laplacian stencil 內的 neighborhood. ( 同解析度 ) 雖然 row (n-1) 已經作了 , 但 row (n+1) 還沒有生成 l-1 level 的資料 所以過去是整 level 算完後再算另外一層 改為下方 , 一次讀入 2k 行 . 而執行一次為 整個移動完畢 . 就作完一次 V-pass ( 先不管 R, P) 使 out-of-core image 如同 in-memory 一樣 , 都在 main memory 內執行 . 執行速度不因
  • k 是用在 k times relaxations
  • N=2, k=3, 3 times Gauss-Seidel relaxations update, 2 V-cycles
  • MAX ERROR: dash Rms (root mean square error) : solid line V-cycles : 表示 幾次 v-cycle. 越多當然越精確 神奇的是 , 2nd order B-spline 收斂最快 . 3rd order B-spline 反而收斂慢
  • 很多都是細節 : 像是用半精度存 u, f, 就可以加快一半速度
  • Panorama: 全景圖
  • HDR  normal tone : 求 log-luminance gradient, apply non-linearly attenuation ( 衰減 ) Tone mapping 比 stitching 更難 ! adaptive method can not be applied full resolution over the entire grid of Poisson eq !!!
  • 收集不同光圈下的 photos
  • Dirichlet boundary condition, 像是在邊界加值 如 d 2 y/dx 2 + 3y=1, x = [0,1], y[0]=a, y[0]=b
  • study Streaming Multigrid For Gradient Domain Operations On Large Images

    1. 1. Streaming Multigrid for Gradient-Domain Operations on Large Images Michael Kazhdan , Johns Hopkins University Hugues Hoppe , Microsoft Research SIGGRAPH 08
    2. 2. Abstract <ul><li>Develop a streaming multigrid solver – 2 passes over out-of-core data </li></ul><ul><ul><li>Solver of Poisson eq. with unconstrained boundary conditions </li></ul></ul><ul><ul><li>Construct Relaxation , Restriction and Prolongation operators in multigrid method based on the B-spline basis </li></ul></ul><ul><ul><li>Build up a framework to pipeline multigrid with a window of rows of images </li></ul></ul>
    3. 3. Abstract <ul><li>Develop a streaming multigrid solver – 2 passes over out-of-core data </li></ul><ul><ul><li>2 nd order finite-elements in a single V-cycle </li></ul></ul><ul><ul><li>Temporally blocked relation </li></ul></ul><ul><ul><li>Multi-level streaming to pipeline restriction & prolongation into single streaming passes </li></ul></ul><ul><li>Key contribution </li></ul><ul><ul><li>forward-difference gradient  B-spline finite-element </li></ul></ul>
    4. 4. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    5. 5. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    6. 6. Gradient-domain problem as Poisson solution <ul><li>Solve the problem that </li></ul><ul><li>Find U to minimize </li></ul><ul><li>The Poisson equation </li></ul>
    7. 7. Image processing on gradient-domain <ul><li>Lighting removal by zeroing small gradient [Horn 74] </li></ul><ul><li>HDR image tone-mapped by adaptively attenuating luminance gradients [Weiss 01] </li></ul><ul><li>Overlapping images stitched seamlessly by merging gradients [P´erez et al.03; Agarwala et al.04;Levin et al.04] </li></ul><ul><li>Shadow removal by zeroing large luminance gradients in regions of constant chromaticity [Finlayson et al. 02] </li></ul><ul><li>Undesirable reflections removed in flash and ambient image pairs [Agrawal et al. 05] </li></ul><ul><li>Photographic tone management is improved using gradient constraints [Bae et al. 06] </li></ul>
    8. 8. Painting on gradient-domain <ul><li>Painting with interactive gradient-domain modeling </li></ul><ul><li>[McCann & Pollard 08] </li></ul><ul><li>Diffusion curves [Orzan et al. 08] </li></ul>
    9. 9. Large images on gradient-domain <ul><li>Processing of large images [Kopf et al. 07] </li></ul><ul><li>GB pixel images are too large to fit main memory </li></ul><ul><li>Direct solution (ex: Cholesky factorization) is impractical </li></ul><ul><li>Iteration techniques (ex: conjugate gradients and multigrid method) are in-efficient </li></ul><ul><ul><li>Requiring many iterations over out-of-core data </li></ul></ul>
    10. 10. Standard multigrid V-cycle
    11. 11. Contributions <ul><li>A general, accurate and efficient solver for Poisson eq. over out-of-core images </li></ul><ul><ul><li>Sufficient accuracy in 2 V-cycles on GB images </li></ul></ul><ul><ul><ul><li>2 nd order B-spline finite elements gets more accuracy than traditional finite element in multigrid method </li></ul></ul></ul><ul><ul><li>Efficiency on temporally locality </li></ul></ul><ul><ul><ul><li>Small moving windows of data in memory </li></ul></ul></ul><ul><ul><ul><li>Pipelined V-cycle : 3 Gauss-Seidel relaxations, restriction and prolongation </li></ul></ul></ul>
    12. 12. Contributions <ul><li>On a single CPU core </li></ul><ul><ul><li>Solve a 3-channel, 16-MB gradient-domain problem with an rms error on the order of 10 −5 in 15 seconds. </li></ul></ul><ul><li>High ratio of local computation to memory bandwidth </li></ul><ul><ul><li>Temporal locality in L1 cache </li></ul></ul>
    13. 13. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    14. 14. Solvers of Poisson equation <ul><li>Iterative solvers </li></ul><ul><ul><li>Gauss-Seidel and conjugate gradient </li></ul></ul><ul><ul><ul><li>Memory-efficient </li></ul></ul></ul><ul><ul><ul><li>Require many iterations </li></ul></ul></ul><ul><ul><li>Reduce # of iterations using multi-resolution pre-conditioners [Gortler and Cohen 95; Szeliski 06] </li></ul></ul><ul><ul><li>Reduce # of iterations using multigrid solvers [Brandt 77; Briggs et al. 00] </li></ul></ul><ul><ul><li>GPU + multigrid [Bolz et al. 03; Goodnight et al. 03; G¨oddeke et al. 08] </li></ul></ul>
    15. 15. Out-of-core <ul><li>Multigrid are difficult to schedule out-of-core [Toledo 99] </li></ul><ul><li>Solve out-of-core problem on a coarser-resolution grid and then upsample the resulting approximation [Kopf et al. 07a] </li></ul><ul><ul><li>Can not maintain robust sharp features </li></ul></ul><ul><li>Adaptive partition to solve Poisson eq.For image stitching, </li></ul><ul><ul><li>Poisson eq. in the image seams [Agarwala 07] </li></ul></ul><ul><ul><li>Poisson eq. in the context of fluid flow simulation [Losasso et al. 04] </li></ul></ul><ul><ul><li>Poisson eq. in the surface reconstruction [Kazhdan et al. 06]. </li></ul></ul><ul><li>Our method addresses the general case where </li></ul><ul><ul><li>The Poisson eq. is solved accurately everywhere </li></ul></ul><ul><ul><li>The problem size can not be reduced </li></ul></ul><ul><ul><li>No initial solution guess is available </li></ul></ul>
    16. 16. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    17. 17. Gradient-domain image processing  Poisson Equation
    18. 18. ▽ u x 0 x 1 x i x N u 0 u 1 u i u N h u i+1
    19. 19. Δ U x i+1 x i-1 x i x N u i-1 u N h u i+1 u i
    20. 20. Δ U= f
    21. 21. Lu=f U u 1,1 u 1,2 u 1,3 u 2,1 u 2,2 u 2,3 u 3,1 u 3,2 u 3,3
    22. 22. Lu=f <ul><li>Gauss-Seidel relaxation on u </li></ul><ul><li>L u = f </li></ul><ul><li>( L+D+R ) u = f where L = ( L+D+R ) </li></ul><ul><li>D u = f – ( L+R ) u </li></ul><ul><li>u (k) = D -1 ( f – ( L+R ) u (k-1) ) = D -1 f + R J u (k-1) </li></ul>
    23. 23. Error, residual <ul><li>Consider Lu = f </li></ul><ul><li>Error e k = u – u (k) </li></ul><ul><li>Residual r k = f - L u (k) = L e k </li></ul><ul><li>u (k) = D -1 f + R J u (k-1) </li></ul><ul><li> e k = u – u (k) </li></ul><ul><li>= ( D -1 f + R J u ) – ( D -1 f + R J u (k-1) ) </li></ul><ul><li>= R J ( u - u (k-1) ) </li></ul><ul><li> = R J e k-1 </li></ul><ul><li>= R J k e 0 </li></ul><ul><li>e  0 depends on R J and e o </li></ul><ul><li> </li></ul>
    24. 24. Limits of Gauss-Seidel <ul><li>e k = R J k e 0 </li></ul><ul><li>Gauss-Seidel relaxation is smooth but converges slowly on low-freq. components </li></ul><ul><li>Why use coarse grids ? [Briggs et al. 00] </li></ul><ul><ul><li>Coarse grids can be used to compute an improved initial guess for the fine-grid relaxation </li></ul></ul><ul><ul><li>Relaxation on the coarse-grid is much cheaper </li></ul></ul>
    25. 25. coarse-grid correction <ul><li>Relax k times on L h u h = f h on fine-grid , initial u (0) arbitrary </li></ul><ul><ul><li>u (k) = D -1 f + R J u (k-1) </li></ul></ul><ul><li>Compute the residual of the find-grid </li></ul><ul><ul><li>r = L h u h (k) - f h </li></ul></ul><ul><li>Restrict the residual to the coarse-grid </li></ul><ul><ul><li>r H = R r h </li></ul></ul><ul><ul><li>where H = 2h, R is the restriction </li></ul></ul><ul><li>Compute the error on the coarse-grid </li></ul><ul><ul><li>L H e H = r H , where L H = R L h P </li></ul></ul><ul><li>Prolongate (interpolate) the error to the fine-grid </li></ul><ul><ul><li>e h = P e H , where P is the prolongation </li></ul></ul><ul><li>Correct the fine-grid solution </li></ul><ul><ul><li>u h (k+1) = u h (k) + e h </li></ul></ul>2 3 1 1 1 4 5 6 Fine grid coarse grid
    26. 26. Restriction & Prolongation <ul><li>Restriction operator </li></ul><ul><li>Fine grid  coarse grid </li></ul><ul><li>Typically using local weighted averaging </li></ul><ul><li>Prolongation operator </li></ul><ul><li>Coarse grid  fine grid </li></ul><ul><li>Typically using bilinear interpolation </li></ul>Restriction (1D) prolongation (1D) R = P T
    27. 27. 2D stencils of the multigrid 2D Restriction 2D Prolongation 2D Relaxation (Laplacian)
    28. 28. V-cycle <ul><li>v h  MV h (v h , f h ) </li></ul><ul><li>Relax k times on L h u h = f h , initial v arbitrary </li></ul><ul><li>If Ω h is the coarsest grid, goto 4. </li></ul><ul><ul><li>Else </li></ul></ul><ul><ul><li>f 2h  R(f h – L h v h ) = R r h // restriction </li></ul></ul><ul><ul><li>v 2h  0 </li></ul></ul><ul><ul><li>v 2h  MV 2h (v 2h , f 2h ) </li></ul></ul><ul><li>Correct v h  v h + P v 2h // prolongation </li></ul><ul><li>Relax k times on L h u h = f h , initial guess v h </li></ul>Relaxation on L h u h = f h Restriction f 2h = R r h Relaxation on L 2h u 2h = f 2h Relaxation on L 4h u 4h = f 4h Restriction f 2h = R r h
    29. 29. Standard multigrid V-cycle f l-1 =R l l -r l u l =P l l-1 u P l-1 +u R l L lmin u lmin =f lmin Base solution
    30. 30. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    31. 31. Representing the Poisson equation with 1D B-spline basis <ul><li>Solving Poisson eq . reduces to solving Lu = f </li></ul><ul><ul><li>L as the N x N matrix with L i,j = < Δ B i (x), B j (x)> = <> </li></ul></ul><ul><ul><li>f as the vector with f j = < F(x), B j (x)> </li></ul></ul><ul><li> L i,j = < ▽ B i (x), -▽ B j (x)> </li></ul>
    32. 32. Fitting forward-difference gradient constrains <ul><li>Gradient t of unknown image v </li></ul><ul><li>Difference of B-spline </li></ul>
    33. 33. 2 nd order B-spline 1D case
    34. 34. 2 nd order B-spline 2D case
    35. 35. 2D stencils of the multigrid operators using B-splines <ul><li>Prolongation </li></ul><ul><li>Restriction </li></ul><ul><li>Laplacian </li></ul>
    36. 36. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    37. 37. How to pipeline ? <ul><li>phases of each row </li></ul><ul><ul><li>A l  R  A l-1  R  A l-2  P  A l-1  P  A l </li></ul></ul><ul><ul><li>Does neighborhood in Laplacian stencil exist ? </li></ul></ul><ul><ul><li>Gauss-Seidel Relaxation (A), Restriction (R), Prolongation (P) </li></ul></ul>Time row row Perform 3 times relaxations (A) as a single streaming operations R n-1 Al R Al-1 R Al-2 P Al-1 P Al R n Al R Al-1 R Al-2 P Al-1 P R n+1 Al R Al-1 R Al-2 P Al-1 A l R A l-1 A l-2 R P A l-1 A l P R n-1 Al Al Al R n Al Al Al R n+1 Al Al Al
    38. 38. Temporally blocked relaxation <ul><li>Maintain a window of rows [i-1, i+2k+1] to perform a skipping, counter-current relaxation sweep, updating pixels in rows {i+2k-1, i+2k-3, …, i+1} </li></ul><ul><li>Row 4 {11} </li></ul><ul><ul><li>Relaxed twice: Row 3 {02,06}, row 2{03,08} </li></ul></ul><ul><ul><li>Relaxed once: Row 5 {7}, row {10} </li></ul></ul>Perform k=3 times relaxations (A) as a single streaming operations Row 1 Row 2 Row 3 Row 4 Row 5 Row 6 The pixel row is memory-resident i The i th relaxation globally
    39. 39. Full data pipeline for gradient-domain processing
    40. 40. Memory analysis <ul><li>Implement the active windows on u l R and f l as circular memory buffer of images rows </li></ul><ul><li>Window size (w, h) </li></ul><ul><ul><li>w: the image width at the coarser level </li></ul></ul><ul><ul><li>h: 2k+3 (restriction), 2k+5 (prolongation) </li></ul></ul><ul><li>Memory usage is O(N x ) for an N x x N y image </li></ul>
    41. 41. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    42. 42. Parameters (n,k,v) <ul><li>n-order of the finite element (n-order B-spline) </li></ul><ul><li>K times Gauss-Seidel relaxations </li></ul><ul><li>v passes of V-cycles </li></ul>2-order B-spline basis  (2, 3, 2)
    43. 43. (n, v) Plot of the rms and max errors vs. # of multigrid V-cycles 2 nd -order element give the fastest convergence !
    44. 44. (n, k=2, v=1)
    45. 45. Parameter selection <ul><li>Sufficiently accurate solution with a minimum number of V-cycles </li></ul><ul><ul><li>8-bit channel image processing </li></ul></ul><ul><ul><li>Max error < 1/256 </li></ul></ul><ul><li> (2,5,1): 2 nd order B-spline basis, 5 Gauss-Seidel updates with a single-V cycle in all our applications </li></ul>
    46. 46. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    47. 47. Implementation <ul><li>Maximizing disk throughput </li></ul><ul><ul><li>Larger block (4MB) transfer to minimize disk latency </li></ul></ul><ul><ul><li>2X speed improvement </li></ul></ul><ul><ul><ul><li>Storing the intermediate u,f floating-point value on dist at half precision </li></ul></ul></ul><ul><li>Relaxation optimization </li></ul><ul><ul><li>leverage the vertical and horizontal symmetries of the stencil </li></ul></ul><ul><ul><li>Make use of CPU SSE2- 4-vector instructors </li></ul></ul><ul><li>Non-power-of-two image </li></ul><ul><ul><li>Padding the input image to 2 lmax-lmin times the coarest level </li></ul></ul><ul><li>Multi-channel image </li></ul><ul><ul><li>Stitching requires full-color gradient </li></ul></ul><ul><ul><li>Interleave the per-channel solutions to reduce the total # of passes </li></ul></ul>
    48. 48. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    49. 49. Environment <ul><li>NB </li></ul><ul><ul><li>2.2 GHz Core 2 Duo processor </li></ul></ul><ul><ul><li>4 GB RAM </li></ul></ul><ul><li>Timing </li></ul><ul><ul><li>I/O to read the gradient field from disk </li></ul></ul><ul><ul><li>Write JPEG-compressed ouput to disk </li></ul></ul><ul><li>Initial solution </li></ul><ul><ul><li>u=0 in all experiments </li></ul></ul><ul><li>(2,5,1) 2 nd order B-spline basis, 5 Gauss-Seidel updates with a single-V cycle </li></ul>
    50. 50. Image stitching 19,588 x 4,457 (87 MB) panorama form 9 photos Copy the image gradients and solving the Poisson eq.
    51. 51. Image stitching SM: streaming multigrid solver QT: quadtree (AT) solver of Agarwala [07]
    52. 52. Tone-mapping (HDR  normal tone) before after
    53. 53. Tone-mapping (HDR  normal tone) Stream multigraid is the first one to solve Poisson eq in time that is linear on the # of pixels
    54. 54. GB stitching and tone-mapping May not capture the true scene contrast
    55. 55. GB stitching and tone-mapping
    56. 56. Outline <ul><li>Introduction </li></ul><ul><li>Related work </li></ul><ul><li>Review of finite-difference multigrid </li></ul><ul><li>Our finite-element multigrid approach </li></ul><ul><li>Streaming multigrid solver </li></ul><ul><li>Efficient convergence of 2 nd -order elements </li></ul><ul><li>Implementation </li></ul><ul><li>Application results </li></ul><ul><li>Conclusions & future work </li></ul>
    57. 57. Conclusion <ul><li>Streaming multigrid </li></ul><ul><ul><li>Out-of-core tech. for solving large global linear system </li></ul></ul><ul><ul><ul><li>Local access </li></ul></ul></ul><ul><ul><ul><li>A few passes of sequential I/O </li></ul></ul></ul><ul><ul><li>2 nd order B-spline finite element formulation is compatible with traditional multigrid </li></ul></ul><ul><ul><ul><li>Efficient accurate solution in a single V-cycle </li></ul></ul></ul>
    58. 58. Future work <ul><li>Dirichlet boundary condition </li></ul><ul><ul><li>Modify 2D stencils </li></ul></ul><ul><ul><li>Construct f </li></ul></ul><ul><li>Soft constraint to match some original image u 0 </li></ul><ul><li>Weighted minimization </li></ul><ul><ul><li>where w(x,y) is a spatially varying 2x2 diagonal matrix that weight difference in x and y independently </li></ul></ul><ul><li>Bilaplacian </li></ul>
    59. 59. Future work <ul><li>Reduce disk bandwidth by using compression/decompression of the streamed temporary data </li></ul><ul><li>Parallelization </li></ul><ul><ul><li>Many-core CPUs or GPUs for instance by partitioning image rows </li></ul></ul>

    ×