A FRAMEWORK FOR 
PRACTICAL FAST MATRIX 
MULTIPLICATION 
1 
[Figure: Parallel performance of Strassen on <N,N,N> (effective GFLOPS / core vs. dimension N) for MKL, DFS, BFS, and HYBRID on 6 and 24 cores.]
arXiv: 1409.2908 
Austin Benson (arbenson@stanford.edu), ICME, Stanford 
Grey Ballard, Sandia National Laboratories 
BLIS Retreat, September 26, 2014
Fast matrix multiplication: 
bridging theory and practice 
2 
• There are a number of Strassen-like algorithms for matrix 
multiplication that have only been “discovered” recently. 
[Smirnov13], [Benson&Ballard14] 
• We show that they can achieve higher performance than MKL (sequentially, and sometimes in parallel). 
• We use code generation to do extensive prototyping. There 
are several practical issues, and there is plenty of room for 
improvement (lots of expertise at UT to help here!) 
[Figure: the exponent of matrix multiplication, between 2 and 3: 2.81 [Strassen69] and 2.37 [Williams12].]
Strassen’s algorithm 
3
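For reference, Strassen's <2, 2, 2> algorithm written out (the same Sr and Tr appear in the speaker notes at the end): seven products of linear combinations of the blocks, and C assembled from them.
\[
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} =
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \cdot
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
\]
\begin{align*}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & C_{11} &= M_1 + M_4 - M_5 + M_7 \\
M_2 &= (A_{21} + A_{22})\,B_{11}          & C_{12} &= M_3 + M_5 \\
M_3 &= A_{11}\,(B_{12} - B_{22})          & C_{21} &= M_2 + M_4 \\
M_4 &= A_{22}\,(B_{21} - B_{11})          & C_{22} &= M_1 - M_2 + M_3 + M_6 \\
M_5 &= (A_{11} + A_{12})\,B_{22} \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{align*}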
4 
Key ingredients of Strassen’s algorithm 
• 1. Block partitioning of matrices (<2, 2, 2>) 
• 2. Seven linear combinations of sub-blocks of A 
• 3. Seven linear combinations of sub-blocks of B 
• 4. Seven matrix multiplies to form Mr (recursive) 
• 5. Linear combinations of Mr to form Cij
Key ingredients of fast matmul algorithms 
• 1. Block partitioning of matrices (<M, K, N>) 
• 2. R linear combinations of sub-blocks of A 
• 3. R linear combinations of sub-blocks of B 
• 4. R matrix multiplies to form Mr (recursive) 
R < MKN → faster than classical 
• 5. Linear combinations of Mr to form Cij 
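One standard way to quantify "faster than classical" (a summary; the exponents match the table in the speaker notes): if one recursive step replaces the MKN classical block multiplies with R, then
\[
\text{speedup per step} = \frac{MKN}{R},
\qquad
\omega_0 = 3\,\frac{\log R}{\log (MKN)} .
\]
For Strassen's <2, 2, 2> algorithm, R = 7 gives 8/7 ≈ 14% per step and an exponent of 3 log 7 / log 8 ≈ 2.81.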
5
“Outer product” fast algorithm 
• <4, 2, 4> partitioning 
• R = 26 multiplies (< 4 * 2 * 4 = 32) 
 23% speedup per recursive step (if everything else free) 
• Linear combinations of Aij to form Sr: 68 terms 
• Linear combinations of Bij to form Tr: 52 terms 
• Linear combinations of Mr to form Cij: 69 terms 
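A quick arithmetic check against the formulas above (not on the original slide):
\[
\frac{MKN}{R} = \frac{32}{26} \approx 1.23,
\qquad
\omega_0 = 3\,\frac{\log 26}{\log 32} \approx 2.82 .
\]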
6
Discovering fast algorithms is a 
numerical challenge 
7 
• Low-rank tensor decompositions lead to fast algorithms 
• Tensors are small, but we need exact decompositions 
→ NP-hard 
• Use alternating least squares with regularization and 
rounding tricks [Smirnov13], [Benson&Ballard14] 
• We have around 10 fast algorithms for <M, K, N> 
decompositions. Also have permutations, e.g., <K, M, N>.
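The connection being used (standard, sketched here for completeness): the <M, K, N> matrix multiplication map is a 3-way tensor, and an exact rank-R decomposition of that tensor is the same thing as an R-multiplication algorithm:
\[
T_{\langle M,K,N \rangle} = \sum_{r=1}^{R} u_r \otimes v_r \otimes w_r
\quad\Longleftrightarrow\quad
S_r = \sum_{i,j} (u_r)_{ij} A_{ij},\;\;
T_r = \sum_{j,l} (v_r)_{jl} B_{jl},\;\;
M_r = S_r T_r,\;\;
C_{il} = \sum_{r=1}^{R} (w_r)_{il} M_r .
\]
Strassen's algorithm is exactly a rank-7 decomposition of the <2, 2, 2> tensor.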
8 
[Figure: example fast algorithm base cases and their multiplication counts, from [Strassen69] and [Smirnov13]; see the table in the speaker notes at the end.]
Code generation lets us prototype 
algorithms quickly 
9 
• We have a compact representation of many fast algorithms: 
1. dimensions of block partitioning (<M, K, N>) 
2. linear combinations of sub-blocks (Sr, Tr) 
3. linear combinations of Mr to form Cij 
• We use code generation to rapidly prototype fast algorithms 
• Our approach: test all algorithms on a bunch of different 
problem sizes and look for patterns
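As a rough illustration of the kind of routine the generator emits, here is a minimal hand-written sketch of one recursive step for the <2, 2, 2> case (assumptions: square row-major matrices with n divisible by 2 at every level, cutoff >= 1, C zero-initialized by the caller, and a naive triple loop standing in for dgemm at the base case; the real generated code handles general shapes and calls MKL):

#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // row-major, n x n, contiguous

// Stand-in for dgemm below the recursion cutoff: C += A * B.
static void classical_mm(const Mat& A, const Mat& B, Mat& C, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Copy block (bi, bj) of the 2 x 2 partition of X (size n) into an h x h matrix.
static Mat block(const Mat& X, std::size_t n, int bi, int bj) {
  std::size_t h = n / 2;
  Mat Y(h * h);
  for (std::size_t i = 0; i < h; ++i)
    for (std::size_t j = 0; j < h; ++j)
      Y[i * h + j] = X[(bi * h + i) * n + (bj * h + j)];
  return Y;
}

// Z = X + alpha * Y (the "linear combinations of sub-blocks").
static Mat add(const Mat& X, const Mat& Y, double alpha) {
  Mat Z(X.size());
  for (std::size_t i = 0; i < X.size(); ++i) Z[i] = X[i] + alpha * Y[i];
  return Z;
}

// One recursive step of Strassen's <2,2,2> algorithm: C += A * B.
void strassen(const Mat& A, const Mat& B, Mat& C, std::size_t n, std::size_t cutoff) {
  if (n <= cutoff) { classical_mm(A, B, C, n); return; }
  const std::size_t h = n / 2;
  Mat A11 = block(A, n, 0, 0), A12 = block(A, n, 0, 1),
      A21 = block(A, n, 1, 0), A22 = block(A, n, 1, 1);
  Mat B11 = block(B, n, 0, 0), B12 = block(B, n, 0, 1),
      B21 = block(B, n, 1, 0), B22 = block(B, n, 1, 1);
  // Seven S_r and T_r (Strassen's linear combinations), then M_r = S_r * T_r.
  Mat S[7] = {add(A11, A22, 1), add(A21, A22, 1), A11, A22,
              add(A11, A12, 1), add(A21, A11, -1), add(A12, A22, -1)};
  Mat T[7] = {add(B11, B22, 1), B11, add(B12, B22, -1), add(B21, B11, -1),
              B22, add(B11, B12, 1), add(B21, B22, 1)};
  Mat M[7];
  for (int r = 0; r < 7; ++r) {
    M[r] = Mat(h * h, 0.0);
    strassen(S[r], T[r], M[r], h, cutoff);
  }
  // Accumulate block (bi, bj) of C: C_{bi,bj} += Z.
  auto put = [&](int bi, int bj, const Mat& Z) {
    for (std::size_t i = 0; i < h; ++i)
      for (std::size_t j = 0; j < h; ++j)
        C[(bi * h + i) * n + (bj * h + j)] += Z[i * h + j];
  };
  put(0, 0, add(add(M[0], M[3], 1), add(M[6], M[4], -1), 1));  // C11 = M1+M4-M5+M7
  put(0, 1, add(M[2], M[4], 1));                               // C12 = M3+M5
  put(1, 0, add(M[1], M[3], 1));                               // C21 = M2+M4
  put(1, 1, add(add(M[0], M[1], -1), add(M[2], M[5], 1), 1));  // C22 = M1-M2+M3+M6
}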
Practical issues 
10 
• Best way to do matrix additions? (in paper) 
• Can we eliminate redundant linear combinations? (in paper) 
• Different problem shapes other than square (this talk) 
• When to stop recursion? (this talk) 
• How to parallelize? (this talk) 
Recursion cutoff: look at gemm curve 
[Figure: sequential dgemm performance (GFLOPS vs. dimension N) for N x 800 x 800, N x 800 x N, and N x N x N, with peak shown; and parallel dgemm performance on 24 cores (GFLOPS / core vs. dimension N).]
Basic idea: take another 
recursive step if the sub-problems 
will still operate at 
high performance 
11 
<M, K, N> = <4, 2, 3>
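A minimal sketch of this cutoff test (the threshold below is hypothetical; in practice it is read off the dgemm curves above):

// Recurse only while every sub-multiply is still big enough that dgemm
// (or the next recursive level) runs near its peak rate.
bool take_recursive_step(long m, long k, long n, long M, long K, long N) {
  const long min_dim = 1000;  // hypothetical threshold read off the gemm curve
  return (m / M) >= min_dim && (k / K) >= min_dim && (n / N) >= min_dim;
}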
Sequential performance 
[Figure: Sequential performance on N x N x N (effective GFLOPS vs. dimension N) for MKL, STRASSEN, <3,2,2>, <3,2,4>, <4,2,3>, <3,4,2>, <3,3,3>, <4,2,4>, and <2,3,4>; true peak marked for reference.]
Effective GFLOPS for M x K x N multiplies = 1e-9 * 2 * MKN / time in seconds 
(the classical flop count 2MKN is used for every algorithm, so fast algorithms can exceed the machine's true peak) 
12 
Sequential performance 
[Figure: Sequential performance on N x N x N (effective GFLOPS vs. dimension N) for MKL, STRASSEN, <4,4,2>, <4,3,3>, <3,4,3>, <3,3,6>, <3,6,3>, and <6,3,3>.]
• All algorithms beat MKL on large problems 
• Strassen’s algorithm is hard to beat 
13
Sequential performance 
[Figure: Sequential performance on N x 1600 x N (effective GFLOPS vs. dimension N) for MKL, <4,2,4>, <4,3,3>, <3,2,3>, <4,2,3>, and STRASSEN.]
• Almost all algorithms beat MKL 
• <4, 2, 4> and <3, 2, 3> tend to perform the best 
14
Sequential performance 
[Figure: Sequential performance on N x 2400 x 2400 (effective GFLOPS vs. dimension N) for MKL, <4,2,4>, <4,3,3>, <3,2,3>, <4,2,3>, and STRASSEN.]
• Almost all algorithms beat MKL 
• <4, 3, 3> and <4, 2, 3> tend to perform the best 
15 
Parallelization 
[Diagram: the recursion tree. C is assembled from M1, M2, …, M7, and each Mr recursively spawns its own seven subproblems.]
16
DFS Parallelization 
[Diagram: recursion tree; every node is executed in sequence by all threads, using parallel MKL for each Mr at the base case.]
17 
+ Easy to implement 
+ Load balanced 
+ Same memory footprint as sequential 
- Need large base cases for high performance
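A minimal sketch of the DFS strategy (cblas_dgemm is the real CBLAS call; fast_mm_dfs and its elided recursion are hypothetical stand-ins for the generated code):

#include <mkl.h>

// DFS: the recursion itself is sequential; all parallelism comes from
// threaded MKL in the base-case multiply.
void fast_mm_dfs(const double* A, const double* B, double* C,
                 long m, long k, long n, int depth) {
  if (depth == 0) {
    // One large dgemm using all available MKL threads: C += A * B (row-major).
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 1.0, C, n);
    return;
  }
  // ... form the S_r and T_r with matrix additions, then recurse
  //     sequentially on each of the R products, and assemble C ...
}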
BFS Parallelization 
[Diagram: recursion tree; the seven Mr subproblems at each level are spawned as OpenMP tasks, one thread each, and joined with omp taskwait before C is assembled.]
18 
+ High performance for smaller base cases 
- Sometimes harder to load balance: e.g., 24 threads, 49 subproblems (two recursive levels of Strassen) 
- More memory
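A minimal sketch of the BFS strategy with OpenMP tasks (the Subproblem struct and multiply_base_case are hypothetical stand-ins; only the pragmas reflect the mechanism named on the slide):

#include <omp.h>
#include <vector>
#include <cstddef>

// One leaf multiply of the recursion; a naive loop stands in for sequential dgemm.
struct Subproblem { const double* S; const double* T; double* M; long m, k, n; };

void multiply_base_case(const Subproblem& p) {
  for (long i = 0; i < p.m; ++i)
    for (long kk = 0; kk < p.k; ++kk)
      for (long j = 0; j < p.n; ++j)
        p.M[i * p.n + j] += p.S[i * p.k + kk] * p.T[kk * p.n + j];
}

// BFS: each subproblem at this level becomes a task (one thread each);
// taskwait joins them before the linear combinations that form C.
void run_level_bfs(std::vector<Subproblem>& subs) {
  #pragma omp parallel
  #pragma omp single
  {
    for (std::size_t r = 0; r < subs.size(); ++r) {
      #pragma omp task firstprivate(r) shared(subs)
      multiply_base_case(subs[r]);
    }
    #pragma omp taskwait
  }
}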
HYBRID parallelization 
[Diagram: recursion tree; most Mr subproblems run as single-threaded OpenMP tasks (as in BFS), while the leftover subproblem runs with all threads (as in DFS).]
19 
+ Better load balancing 
- Needs explicit synchronization, or else we can over-subscribe threads
20 
[Figure: Parallel performance of <4,2,4> on <N,2800,N> (effective GFLOPS / core vs. dimension N) for MKL, DFS, BFS, and HYBRID on 6 and 24 cores.]
Bandwidth problems 
• We rely on the matrix multiplications being much more expensive than the matrix additions 
• Parallel dgemm on 24 cores: easily get 50-75% of peak 
• STREAM benchmark: < 6x speedup in read/write 
performance on 24 cores 
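A back-of-the-envelope view of why the additions become the bottleneck (simple arithmetic, consistent with the STREAM numbers above):
\[
\text{multiply: } \frac{2mkn \text{ flops}}{(mk + kn + mn) \text{ words}} = O(n) \text{ flops/word}
\qquad\text{vs.}\qquad
\text{add: } \frac{mn \text{ flops}}{3mn \text{ words}} = O(1) \text{ flops/word},
\]
so the multiplies scale with the cores while the additions are limited by memory bandwidth.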
21
Parallel performance 
22 
[Figure: Performance on N x N x N with 6 cores and with 24 cores (effective GFLOPS / core vs. dimension N) for MKL, STRASSEN, <3,2,2>, <3,2,4>, <4,2,3>, <3,4,2>, <3,3,3>, <4,2,4>, and <2,3,4>.]
• 6 cores: similar performance to sequential 
• 24 cores: can sometimes beat MKL, but barely
Parallel performance 
[Figure: Performance on N x 2800 x N with 6 cores and with 24 cores (effective GFLOPS / core vs. dimension N) for MKL, <4,2,4>, <4,3,3>, <3,2,3>, <4,2,3>, and STRASSEN; an annotation marks a region of bad MKL performance.]
• 6 cores: similar performance to sequential 
• 24 cores: MKL best for large problems 
23
Parallel performance 
[Figure: Performance on N x 3000 x 3000 with 6 cores and with 24 cores (effective GFLOPS / core vs. dimension N) for MKL, <4,2,4>, <4,3,3>, <3,2,3>, <4,2,3>, and STRASSEN.]
• 6 cores: similar performance to sequential 
• 24 cores: MKL usually the best 
24 
High-level conclusions 
25 
• For square matrix multiplication, Strassen’s algorithm is 
hard to beat 
• For rectangular matrix multiplication, use a fast algorithm 
that “matches the shape” 
• Bandwidth limits the performance of shared memory 
parallel fast matrix multiplication 
→ should be less of an issue in distributed memory 
Future work: 
• Numerical stability 
• Using fast matmul as a kernel for other algorithms in 
numerical linear algebra
A FRAMEWORK FOR 
PRACTICAL FAST MATRIX 
MULTIPLICATION 
26 
[Figure: Parallel performance of Strassen on <N,N,N> (effective GFLOPS / core vs. dimension N) for MKL, DFS, BFS, and HYBRID on 6 and 24 cores.]
arXiv: 1409.2908 
Austin Benson (arbenson@stanford.edu), ICME, Stanford 
Grey Ballard, Sandia National Laboratories 
BLIS Retreat, September 26, 2014
Matrix additions (linear combinations) 
[Diagram: forming S1, …, S7 from A11, A12, A21, A22. "Pairwise": each Sr is built up two operands at a time with repeated DAXPY calls (2x DAXPY per combination).]
27
Matrix additions (linear combinations) 
[Diagram: forming S1, …, S7 from A11, A12, A21, A22. "Write once": each Sr is computed in a single pass by a custom fused "DAXPY" over all of its operands.]
28
Matrix additions (linear combinations) 
[Diagram: forming S1, …, S7 from A11, A12, A21, A22. "Streaming": a single pass over the Aij performs entry-wise updates to all of the Sr at once.]
29
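A minimal sketch contrasting the first two variants for one combination S = alpha1*A1 + alpha2*A2 (cblas_daxpy is the real CBLAS call; everything else is a hypothetical stand-in for the generated addition code):

#include <mkl.h>
#include <cstring>

// "Pairwise": build S with repeated DAXPY passes over the data (2x DAXPY here).
void form_s_pairwise(const double* A1, const double* A2, double* S,
                     long nelem, double alpha1, double alpha2) {
  std::memset(S, 0, nelem * sizeof(double));
  cblas_daxpy(nelem, alpha1, A1, 1, S, 1);  // S += alpha1 * A1
  cblas_daxpy(nelem, alpha2, A2, 1, S, 1);  // S += alpha2 * A2
}

// "Write once": a custom fused "DAXPY" writes each entry of S exactly once.
void form_s_write_once(const double* A1, const double* A2, double* S,
                       long nelem, double alpha1, double alpha2) {
  for (long i = 0; i < nelem; ++i)
    S[i] = alpha1 * A1[i] + alpha2 * A2[i];
}

The "streaming" variant would go one step further and form all of the Sr in a single pass over the Aij.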
Common subexpression elimination (CSE) 
• Example in the <4, 2, 4> algorithm (R = 26 multiplies): 
T11 = B24 - (B12 + B22) 
T25 = B23 + B12 + B22 
Four additions, six reads, two writes 
30
Common subexpression elimination (CSE) 
• Example in the <4, 2, 4> algorithm (R = 26 multiplies): 
Y = B12 + B22 
T11 = B24 - Y 
T25 = B23 + Y 
Three additions, six reads, three writes 
→ Net increase in communication! 
31
CSE does not really help 
Effective GFLOPS for M x K x N multiplies 
= 1e-9 * 2 * MKN / time in seconds 
32

Editor's Notes

  • #4
    \[
    \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} =
    \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \cdot
    \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
    \]
    \begin{align*}
    S_1 &= A_{11} + A_{22} & T_1 &= B_{11} + B_{22} \\
    S_2 &= A_{21} + A_{22} & T_2 &= B_{11} \\
    S_3 &= A_{11}          & T_3 &= B_{12} - B_{22} \\
    S_4 &= A_{22}          & T_4 &= B_{21} - B_{11} \\
    S_5 &= A_{11} + A_{12} & T_5 &= B_{22} \\
    S_6 &= A_{21} - A_{11} & T_6 &= B_{11} + B_{12} \\
    S_7 &= A_{12} - A_{22} & T_7 &= B_{21} + B_{22}
    \end{align*}
  • #9
    \begin{tabular}{l c c c c}
    Base case & Multiplies (fast) & Multiplies (classical) & Speedup per recursive step & Exponent \\
    $\langle 2,2,3 \rangle$ & 11 & 12 & 9\%  & 2.89 \\
    $\langle 2,2,5 \rangle$ & 18 & 20 & 11\% & 2.89 \\
    $\langle 2,2,2 \rangle$ & 7  & 8  & 14\% & 2.81 \\
    $\langle 2,2,4 \rangle$ & 14 & 16 & 14\% & 2.85 \\
    $\langle 3,3,3 \rangle$ & 23 & 26 & 17\% & 2.85 \\
    $\langle 2,3,3 \rangle$ & 15 & 18 & 20\% & 2.81 \\
    $\langle 2,3,4 \rangle$ & 20 & 24 & 20\% & 2.83 \\
    $\langle 2,4,4 \rangle$ & 26 & 32 & 23\% & 2.82 \\
    $\langle 3,3,4 \rangle$ & 29 & 36 & 24\% & 2.82 \\
    $\langle 3,4,4 \rangle$ & 38 & 48 & 26\% & 2.82 \\
    $\langle 3,3,6 \rangle$ & 40 & 54 & 35\% & 2.77 \\
    \end{tabular}
  • #17–#19
    \begin{align*}
    S_7 &= A_{12} - A_{22} \\
    T_7 &= B_{21} + B_{22} \\
    M_7 &= S_7 \cdot T_7
    \end{align*}
  • #31
    \begin{align*}
    T_{11} &= B_{24} - \left(B_{12} + B_{22}\right) \\
    T_{25} &= B_{23} + B_{12} + B_{22}
    \end{align*}
  • #32
    \begin{align*}
    Y &= B_{12} + B_{22} \\
    T_{11} &= B_{24} - Y \\
    T_{25} &= B_{23} + Y
    \end{align*}