Slideshow transcript
Slide 1: A Dimension Abstraction Approach to Vectorization in Matlab. Neil Birkbeck, Jonathan Levesque, Jose Nelson Amaral. Computing Science, University of Alberta, Edmonton, Alberta, Canada
Slide 2: Problem Problem Statement: Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.
Slide 3: Motivation
Loop-based code is slower than vector code in Matlab.
Loop version: n=1000; for i=1:n, A(i)=B(i)+C(i); end
Vector version (5x faster!): n=1000; A(1:n)=B(1:n)+C(1:n);
Why? Interpretive overhead (type/shape checking,…) and resizing of arrays in loops.
Vectorization is also useful for compiled Matlab code, where optimized vector routines could be substituted.
Slide 4: Related Work
Data dependence vectorization: Allen & Kennedy's Codegen algorithm. Build the data dependence graph, then visit its strongly connected components in topological order.
Abstract Matrix Form (AMF) [Menon & Pingali]: axioms are used to transform array code and take advantage of matrix multiplication. It is not clear whether AMF is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal).
Slide 5: Incorrect Vectorization
Example 1: for i=1:n, a(i)=b(i)+c(i); end
Pull the statement out of the loop using index variable substitution (i -> 1:n): a(1:n)=b(1:n)+c(1:n)
Vectorization is correct if a, b, and c are row vectors or column vectors. If this is not true, the vectorized code will introduce an error!
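A minimal Matlab/Octave sketch of the failure described on this slide, assuming b is a row vector and c is a column vector (the values and the names a_loop, a_vec, and vectorized_failed are illustrative, not from the talk). The loop works because scalar indexing ignores orientation, while the substituted statement raises an error.

```matlab
n = 4;
b = 1:n;          % row vector (assumed for illustration)
c = (1:n)';       % column vector (assumed for illustration)

% Loop version: b(i) and c(i) are scalars, so orientation does not matter.
a_loop = zeros(1, n);
for i = 1:n
    a_loop(i) = b(i) + c(i);
end

% Naive substitution i -> 1:n. Depending on the Matlab/Octave version,
% either the addition or the assignment raises an error here.
a_vec = zeros(1, n);
vectorized_failed = false;
try
    a_vec(1:n) = b(1:n) + c(1:n);
catch
    vectorized_failed = true;
end
```

If both b and c are rows (or both columns), the same substituted statement runs and matches the loop.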
Slide 6: Incorrect Vectorization
Example 2: for i=1:n, x(i)=y(i,h)*z(h,i); end
Matlab is untyped, so the correct vectorization depends on whether h is a vector or a scalar.
If h is a scalar: x(1:n)=y(1:n,h).*z(h,1:n)';
Otherwise: x(1:n)=sum(y(1:n,h).*z(h,1:n)',2);
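The two cases can be checked numerically. The sketch below uses small integer test matrices (illustrative data, not from the talk) and compares each vectorized form against the original loop.

```matlab
n = 4;
y = reshape(1:24, n, 6);   % n-by-6 test data (illustrative)
z = reshape(1:24, 6, n);   % 6-by-n test data (illustrative)

% Case 1: h is a scalar, so y(i,h)*z(h,i) is a product of two scalars.
h = 2;
x_loop = zeros(1, n);
for i = 1:n
    x_loop(i) = y(i,h) * z(h,i);
end
x_vec = zeros(1, n);
x_vec(1:n) = y(1:n,h) .* z(h,1:n)';
scalar_case_ok = isequal(x_loop, x_vec);

% Case 2: h is a vector, so y(i,h)*z(h,i) is an inner product.
h = [1 3 5];
x_loop = zeros(1, n);
for i = 1:n
    x_loop(i) = y(i,h) * z(h,i);
end
x_vec = zeros(1, n);
x_vec(1:n) = sum(y(1:n,h) .* z(h,1:n)', 2);
vector_case_ok = isequal(x_loop, x_vec);
```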
Slide 7: Overview of Solution
[Flowchart] The data dependence-based vectorizer supplies a vectorizable statement. Using knowledge of the shape of variables, propagate dimensionality up the parse tree. Dimensions agree? Yes: perform transformations and output the vector statement. No: leave the statement in the loop.
Slide 8: More Specifically
Represent the dimensionality of expressions as a list of symbols, each either 1 or "*" (>1).
Examples:
Type          dim
scalar        (1)
1xn vector    (1,*)
nx1 vector    (*,1), (*)
mxn matrix    (*,*)
Assume dimensionality is known for variables. Propagate up the parse tree according to Matlab rules.
Compatibility: dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1's).
Slide 9: Vectorized Dimensionality
Vectorized dimensionality: the representation of dimensions after vectorization of a loop, denoted dim_i for a loop with index variable i. Introduce a new symbol r_i for index variable i.
Example: for i=1:n, a(i)=10+i; end
exp     dim(exp)    vectorized    dim_i(exp)
10      (1)         10            (1)
i       (1)         1:n           (1,r_i)
a       (*)         a             (*)
a(i)    (1)         a(1:n)        (r_i)
Slide 10: Vectorized Dimensionality
Expressions with incompatible vectorized dimensionality should not be vectorized. When do dimensionalities agree?
Assignment expressions e_lhs=e_rhs: dim_i(e_lhs)≈dim_i(e_rhs) || dim_i(e_rhs)≈(1)
Element-wise binary operators e=e_lhs Θ e_rhs, with Θ in {+,-,.*,…}: dim_i(e_lhs)≈(1) || dim_i(e_rhs)≈(1) || dim_i(e_lhs)≈dim_i(e_rhs)
Slide 11: Vectorized Dimensionality
These rules are very restrictive. Assume dim(A)=dim(B)=dim(C)=(*,*).
for i=1:100, for j=1:100 A(i,j)=B(j,i)+C(i,j); end end
dim_{i,j}(B(j,i))=(r_j,r_i)
dim_{i,j}(C(i,j))=(r_i,r_j)
Vectorization fails because (r_i,r_j) is not compatible with (r_j,r_i).
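The agreement rules can be sketched as small predicates over symbol lists. This is only an illustration under stated assumptions: dimensionalities are stored as cell arrays of strings, the helper names (strip1, compat, is_one, assign_ok, pointwise_ok) are hypothetical, and "removal of redundant 1's" is read as dropping every '1' entry, which the slides do not spell out.

```matlab
% Dimensionality lists as cell arrays of symbols, e.g. {'1','r_i'}.
% ASSUMPTION: "removal of redundant 1's" drops every '1' entry.
strip1 = @(d) reshape(d(~strcmp(d, '1')), 1, []);
compat = @(a, b) isequal(strip1(a), strip1(b));
is_one = @(d) isempty(strip1(d));

% Agreement rules for assignments and element-wise binary operators.
assign_ok    = @(lhs, rhs) compat(lhs, rhs) || is_one(rhs);
pointwise_ok = @(lhs, rhs) is_one(lhs) || is_one(rhs) || compat(lhs, rhs);

% a(i)=10+i : 10 is (1), 1:n is (1,r_i), a(1:n) is (r_i).
ok_simple = pointwise_ok({'1'}, {'1','r_i'}) && ...
            assign_ok({'r_i'}, {'1','r_i'});

% A(i,j)=B(j,i)+C(i,j) : (r_j,r_i) against (r_i,r_j) does not agree.
ok_swapped = pointwise_ok({'r_j','r_i'}, {'r_i','r_j'});
```

Under these assumptions the simple statement passes both checks, while the swapped-index statement is rejected and stays in the loop.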
Slide 12: Transpose Transformation
An extension to utilize transpose when necessary is straightforward.
For assignment: if dim_i(A)≈reverse(dim_i(B)) then A=B^T is allowable.
for i=1:m, for j=1:n A(i,j)=B(j,i); end end
dim_{i,j}(A)=reverse(dim_{i,j}(B))=(r_i,r_j)
A(1:m,1:n)=(B(1:n,1:m))'
Slide 13: Transpose Transformation
An extension to utilize transpose when necessary is straightforward. Pointwise operations are handled similarly:
if dim_i(A)≈reverse(dim_i(B)) then A Θ B^T is allowable; propagate dim_i(A Θ B^T)=dim_i(A)
if reverse(dim_i(A))≈dim_i(B) then A^T Θ B is allowable; propagate dim_i(A^T Θ B)=dim_i(B)
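A small numeric check of the transpose transformation, for both the assignment and the pointwise case. The matrices and the result names are illustrative; non-square sizes are used so that a missing transpose would fail visibly.

```matlab
m = 3; n = 4;
B = reshape(1:12, n, m);      % n-by-m, so B(j,i) is valid
C = reshape(101:112, m, n);   % m-by-n

% Assignment: A(i,j)=B(j,i)
A_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        A_loop(i,j) = B(j,i);
    end
end
A_vec = zeros(m, n);
A_vec(1:m,1:n) = (B(1:n,1:m))';
assign_ok = isequal(A_loop, A_vec);

% Pointwise: D(i,j)=B(j,i)+C(i,j), with the transpose applied to B
D_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        D_loop(i,j) = B(j,i) + C(i,j);
    end
end
D_vec = zeros(m, n);
D_vec(1:m,1:n) = B(1:n,1:m)' + C(1:m,1:n);
pointwise_ok = isequal(D_loop, D_vec);
```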
Slide 14: Pattern Database
Dimensionality disagreement at binary operators inhibits vectorization. Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization.
Pattern:
lhs          operation    rhs        output
(r_i,r_j)    Θ            (r_i,1)    (r_i,r_j)
Example: for i=1:m, for j=1:n, A(i,j)=B(i,j)+C(i); end end
Transformed: B(i,j)+C(i);
Result: B(1:m,1:n)+repmat(C(1:m),1,n);
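The repmat pattern can be checked on a small case. The sketch assumes C is a column vector, so that C(1:m) has the (r_i,1) shape the pattern expects; the data are illustrative.

```matlab
m = 3; n = 4;
B = reshape(1:12, m, n);
C = [10; 20; 30];             % column vector, so C(1:m) is m-by-1

A_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        A_loop(i,j) = B(i,j) + C(i);
    end
end

% Replicate the column n times so both operands are m-by-n.
A_vec = zeros(m, n);
A_vec(1:m,1:n) = B(1:m,1:n) + repmat(C(1:m), 1, n);
repmat_ok = isequal(A_loop, A_vec);
```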
Slide 15: Pattern Database
Diagonal access pattern:
lhs          operation    rhs    output
(r_i,r_i)    (index)      nil    (1,r_i)
Example: for i=1:n, a(i)=A(i,i)*b(i); end
Result, using column-major indexing of A: a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n);
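The linear index (1:n)+size(A,1)*((1:n)-1) selects element (i,i) for each i because Matlab stores matrices in column-major order. A short check, assuming b is a row vector to match the (1,r_i) output shape (test data are illustrative):

```matlab
n = 4;
A = reshape(1:16, n, n);
b = [1 2 3 4];                % row vector, matching the (1,r_i) output

a_loop = zeros(1, n);
for i = 1:n
    a_loop(i) = A(i,i) * b(i);
end

% Element (i,i) sits at linear index i + size(A,1)*(i-1).
a_vec = zeros(1, n);
a_vec(1:n) = A((1:n) + size(A,1)*((1:n)-1)) .* b(1:n);
diag_ok = isequal(a_loop, a_vec);

% Cross-check against the built-in diag for the full diagonal.
same_as_diag = isequal(a_vec, diag(A)' .* b);
```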
Slide 16: Additive Reduction Statements
Additive-reduction statements use a loop variable to perform an accumulation. Not all loop nest index variables appear in the output dimensionality.
General form: for i1=…, for i2=…, … for ik=…, A(J)=A(J)+E; … end end end
Loop nest variables: I={i1,i2,…,ik}. J is a subset of I.
Example: for i=1:m, for j=1:n, a(i)=a(i)+B(i,j); end end, where I={i,j} and J={i}.
Slide 17: Additive Reduction (Solution)
Maintain/propagate dimensionality and reduced variables for an expression. ρ(E) denotes the reduced variables for expression E.
When checking statement A(J)=A(J)+E, ensure dim_{i1,i2,…,ik}(A(J))≈dim_{i1,i2,…,ik}(E) and ρ(E)=I-J. Any variable r_i in I-J but not in ρ(E) must be reduced.
Examples, for a loop for i=1:m ... end with I={i}, J={}, I-J={i}:
Statement a=a+b(i): dim_i(b(i))=(r_i,1), ρ(b(i))={}, and r_i is in dim_i(b(i)). Reduce: b(i) -> sum(b(i),1). Vectorize: a=a+sum(b(1:m));
Statement a=a+10: ρ(10)={} and r_i is not in dim_i(10). Reduce: 10 -> m*10, ρ(m*10)={r_i}. Vectorize: a=a+m*10;
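Both reductions can be confirmed on a small case. The sketch assumes b is a column vector, matching dim (r_i,1) on the slide; the variable names and values are illustrative.

```matlab
m = 5;
b = (1:m)';                   % column vector, matching dim (r_i,1)

% a=a+b(i): the loop variable appears in the operand, so reduce with sum.
a_loop = 0;
for i = 1:m
    a_loop = a_loop + b(i);
end
a_vec = 0;
a_vec = a_vec + sum(b(1:m));

% a=a+10: the loop variable does not appear, so multiply by the trip count.
c_loop = 0;
for i = 1:m
    c_loop = c_loop + 10;
end
c_vec = 0;
c_vec = c_vec + m*10;
```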
Slide 18: Additive Reduction via Matrix Multiplication
Matrix multiplication can be used to perform reductions on e=e_lhs*e_rhs, provided:
1. dim_{i1,…,ik}(e_lhs)=(S_l,r_k)
2. dim_{i1,…,ik}(e_rhs)=(r_k,S_r)
3. r_k is a reduction variable.
Implies: dim_{i1,…,ik}(e)=(S_l,S_r) and ρ(e)=union(ρ(e_lhs), ρ(e_rhs), {r_k})
Example: for i=1:m for j=1:n a(i)=a(i)+B(i,j)*x(j); end end
j is used for reduction; dim_{i,j}(B(i,j))=(r_i,r_j); dim_{i,j}(x(j))=(r_j)
Result:
a(1:m)=a(1:m)+...
    B(1:m,1:n)*x(1:n);
Slide 19: Additive Reduction Example
for i=1:m, for j=1:n, d(i)=d(i)+a(i,j)*b(j)+c(i,j); end end
ρ(a(i,j))={}, dim_{i,j}(a(i,j))=(r_i,r_j)
ρ(b(j))={}, dim_{i,j}(b(j))=(r_j)
r_j is a reduction variable; use matrix multiplication to reduce r_j: ρ(a(i,j)*b(j))={r_j}, dim_{i,j}(a(i,j)*b(j))=(r_i)
ρ(c(i,j))={}, dim_{i,j}(c(i,j))=(r_i,r_j)
Need to reduce r_j: c(i,j) -> sum(c(i,j),2)
ρ(a(i,j)*b(j)+sum(c(i,j),2))={r_j}, dim_{i,j}(a(i,j)*b(j)+sum(c(i,j),2))=(r_i)
Dimensionality and reduced variables agree, now replace index variables: d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2);
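The final statement can be compared against the loop nest directly. The sketch assumes b and d are column vectors so that the matrix-vector product and the row sums share the same m-by-1 shape; the data are illustrative integers, so the comparison is exact.

```matlab
m = 3; n = 4;
a = reshape(1:12, m, n);
b = (1:n)';                   % column vector (assumed)
c = reshape(13:24, m, n);

d_loop = zeros(m, 1);
for i = 1:m
    for j = 1:n
        d_loop(i) = d_loop(i) + a(i,j)*b(j) + c(i,j);
    end
end

% j is reduced twice: by the matrix product and by sum(...,2).
d_vec = zeros(m, 1);
d_vec(1:m) = d_vec(1:m) + a(1:m,1:n)*b(1:n) + sum(c(1:m,1:n), 2);
reduction_ok = isequal(d_loop, d_vec);

% The pure matrix-multiplication reduction, without the c term.
p_loop = zeros(m, 1);
for i = 1:m
    for j = 1:n
        p_loop(i) = p_loop(i) + a(i,j)*b(j);
    end
end
matmul_ok = isequal(p_loop, a(1:m,1:n)*b(1:n));
```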
Slide 20: Implementation
[Flowchart of the prototype vectorizer. Labeled elements: Original Loop, Embedded Octave Parser, Control Statements (yes/no), Create DDG, Vectorize Statement, Dimension Check, Success (yes/no), Code Generator, Vectorized Loop.]
The pattern database and the corresponding transformations are specified in a modular, end-user extensible manner.
Slide 21: Results
Source-to-source transformation. Timing results are averaged over 100 runs.
Platform: Matlab 7.2.0.283, 3.0 GHz Pentium D processor
Slide 22: Results
Histogram Equalization
Input source:
h=hist(im(:),[0:255]); %histogram
heq=255*cumsum(h(:))/sum(h(:));
for i=1:size(im,1), for j=1:size(im,2), im2(i,j)=heq(im(i,j)+1); end end
Vectorized result:
h=hist(im(:),[(0:255)]);
heq=255*cumsum(h(:))/sum(h(:));
im2(1:size(im,1),1:size(im,2))=...
    heq(im(1:size(im,1),1:size(im,2))+1);
For a monochrome 8-bit 800x600 image (original/vectorized):
Entire routine: 0.178s/0.114s (speedup: 1.56)
Loop portion only: 0.0814s/0.0176s (speedup: 4.6)
Slide 23: Results (Menon & Pingali Examples)
Example 1:
Input: for k=1:p, for j=1:(i-1), X(i,k)=X(i,k)-L(i,j)*X(j,k); end end
Output: X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p);
Example 2:
Input: for i=1:N, for j=1:N phi(k)=phi(k)+a(i,j)*x_se(i)*f(j); end end
Output: phi(k)=phi(k)+sum(a(1:N,1:N)'*x_se(1:N).*f(1:N),1);
Example 3:
Input: for i=1:n, for j=1:n, for k=1:n, for l=1:n y(i)=y(i)+x(j)*A(i,k)*B(l,k)*C(l,j); end end end end
Output:
y(1:n)=y(1:n)+x(1:n)'*...
    (A(1:n,1:n)*B(1:n,1:n)'*C(1:n,1:n))';
Settings        Input time (s)    Output time (s)    Speedup
i=500,p=5000    0.536             0.030              17
N=1000          0.174             0.012              14
n=40            0.622             0.0001             5000
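The third example collapses four nested loops into a chain of matrix products. A small check of that equivalence, assuming x is a column vector and y a row vector (shapes the slide does not state) and using illustrative integer data:

```matlab
n = 3;
A = reshape(1:9, n, n);
B = reshape(2:10, n, n);
C = reshape(3:11, n, n);
x = (1:n)';                   % column vector (assumed)

y_loop = zeros(1, n);         % row vector (assumed)
for i = 1:n
    for j = 1:n
        for k = 1:n
            for l = 1:n
                y_loop(i) = y_loop(i) + x(j)*A(i,k)*B(l,k)*C(l,j);
            end
        end
    end
end

% Summing over k, l, and j gives (A*B'*C*x)(i); the slide writes this
% as a row vector, x' times the transposed matrix product.
y_vec = zeros(1, n);
y_vec(1:n) = y_vec(1:n) + x(1:n)' * ...
    (A(1:n,1:n) * B(1:n,1:n)' * C(1:n,1:n))';
chain_ok = isequal(y_loop, y_vec);
```

The loop nest performs on the order of n^4 scalar multiplications, while the vectorized form needs a few n-by-n matrix products, which is consistent with the large speedup reported for this example.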
Slide 24: Remaining Issues/Future Work
Each pattern transformation is local; there is no optimization over the entire statement (e.g., we do not optimize and distribute transposes).
Control flow within loops.
Function calls: functions are treated as pointwise operators (correct for many predefined arithmetic functions).
Incorporate our analysis directly with shape analysis.
Slide 25: Summary Contributions: A simple method to prevent incorrect vectorization in Matlab A user extensible operator/dimensionality pattern database can be used to improve vectorization These patterns can make use of higher level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.
Slide 26: Acknowledgements
Funding provided by NSERC. Grateful for the reviewers' comments and suggestions.
Slide 27: Thank You Questions?


