Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system

https://github.com/yszheda/GPU-RSCode

Published in: Technology
  • Comment: I slipped up; I only noticed today that the units of the experiment data sizes were written wrong: "byte" should be "KB".

  1. Accelerate Reed-Solomon coding for Fault-Tolerance in RAID-like system. Shuai YUAN, NTHU LSA lab.
  2. Outline
     ▶ Background
     ▶ How to use CUDA to accelerate
     ▶ Experiment
     ▶ Q&A
  3. Why Reed-Solomon codes? (Why fault-tolerance?)
     In a RAID-like system, storage is distributed among several devices, so the probability that one of these devices fails becomes significant. If the MTTF (mean time to failure) of one device is $P$, the MTTF of a system of $n$ devices is $P/n$. Thus, in such systems, fault tolerance must be taken into account.
  6. Why Reed-Solomon codes? (Why erasure codes?)
     Replication is expensive! For example, to achieve double-fault tolerance we need 2 additional replicas.
  7. Why Reed-Solomon codes? (Erasure codes)
     Given a message/file of $k$ equal-size blocks (chunks, fragments), encode it into $n$ blocks, with $n > k$.
     ▶ Optimal erasure codes: any $k$ of the $n$ fragments are sufficient to recover the original message/file, i.e. the code can tolerate any $(n - k)$ failures. We call this the $(n, k)$ MDS (maximum distance separable) property. Example: Reed-Solomon codes.
     ▶ Near-optimal erasure codes: require $(1 + \varepsilon)k$ code blocks to recover ($\varepsilon > 0$), i.e. they cannot tolerate arbitrary $(n - k)$ failures. Example: Local Reconstruction Codes (used in WAS, Windows Azure Storage).
  8. Why Reed-Solomon codes? (Replication vs. erasure codes)
     Given a file of size $M$, compare two ways to achieve double-fault tolerance:
     ▶ Reed-Solomon codes ($n = 4$, $k = 2$): storage overhead = $M$
     ▶ Replication: storage overhead = $2M$ (2 replicas)
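
The comparison above can be checked with a little arithmetic. This is a minimal sketch, assuming an $(n, k)$ MDS code stores $n - k$ code chunks of size $M/k$ each and that replication stores one full copy per extra replica; the function names are illustrative and do not come from the slides.

```python
def rs_overhead(file_size, n, k):
    """Extra storage of an (n, k) MDS code: n - k code chunks of file_size / k each."""
    return file_size * (n - k) / k


def replication_overhead(file_size, extra_replicas):
    """Extra storage of replication: one full copy per additional replica."""
    return file_size * extra_replicas


M = 1.0  # file size in arbitrary units
print(rs_overhead(M, n=4, k=2))    # 1.0, i.e. M
print(replication_overhead(M, 2))  # 2.0, i.e. 2M
```

With the $k = 4$, $n = 6$ setting used later in the experiments, the same formula gives an overhead of $M/2$.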
  9. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
     Given a file $F$, divide it into $k$ equal-size native chunks: $F = [F_i]_{i=1,2,\dots,k}$. Encode them into $(n - k)$ code chunks: $C = [C_i]_{i=1,2,\dots,n-k}$. Use the encoding matrix $EM_{(n-k) \times k}$ to produce the code chunks: $C^T = EM \times F^T$. Each $C_i$ is a linear combination of $F_1, F_2, \dots, F_k$.
  10. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      In our case, $F = \begin{pmatrix} A & B \end{pmatrix}$, $EM = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$, and
      $C^T = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \times \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} A + B \\ A + 2B \end{pmatrix}$.
  11. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      Let $P = [P_i]_{i=1,2,\dots,n} = [F_1, F_2, \dots, F_k, C_1, C_2, \dots, C_{n-k}]$ be the $n$ chunks in storage, and let $EM' = \begin{pmatrix} I \\ EM \end{pmatrix}$, where $I$ is the $k \times k$ identity matrix:
      $I = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}$.
      Then $P^T = EM' \times F^T = \begin{pmatrix} F^T \\ C^T \end{pmatrix}$.
  12. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      In our case, $EM' = \begin{pmatrix} I \\ EM \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}$ and
      $P^T = EM' \times F^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \times \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} A \\ B \\ A + B \\ A + 2B \end{pmatrix}$.
  13. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      $EM'$ is the key to the MDS property!
  14. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      Theorem (necessary and sufficient condition): every possible $k \times k$ submatrix obtained by removing $(n - k)$ rows from $EM'$ has full rank.
      Equivalent expressions of full rank:
      ▶ rank = $k$
      ▶ non-singular
      Alternative view: consider the linear space of $P = [P_i]_{i=1,2,\dots,n} = [F_1, F_2, \dots, F_k, C_1, C_2, \dots, C_{n-k}]$. Its dimension is $k$, and any $k$ out of the $n$ vectors form a basis of the linear space.
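
The condition in the theorem can be tested directly on the small $n = 4$, $k = 2$ example. The sketch below is only an illustration: it assumes $GF(2^8)$ with the primitive polynomial `0x11D` (not stated in the slides), uses a 2×2 determinant as the full-rank test, and the helper names are hypothetical.

```python
from itertools import combinations

PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with shift-and-XOR."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def det2(m):
    """2x2 determinant; subtraction in GF(2^w) is XOR."""
    return gf_mul(m[0][0], m[1][1]) ^ gf_mul(m[0][1], m[1][0])


def is_mds_k2(em_prime):
    """True if every pair of rows (every 2x2 submatrix) is non-singular."""
    return all(det2([em_prime[i], em_prime[j]]) != 0
               for i, j in combinations(range(len(em_prime)), 2))


EM_PRIME = [[1, 0], [0, 1], [1, 1], [1, 2]]
print(is_mds_k2(EM_PRIME))                          # True
print(is_mds_k2([[1, 0], [0, 1], [1, 1], [1, 1]]))  # False: last two rows coincide
```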
  15. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      Reed-Solomon codes use a Vandermonde matrix $V$ as $EM$:
      $V = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & 2 & 3 & \dots & k \\ 1^2 & 2^2 & 3^2 & \dots & k^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1^{n-k-1} & 2^{n-k-1} & 3^{n-k-1} & \dots & k^{n-k-1} \end{pmatrix}$
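
A Vandermonde encoding matrix of this shape can be generated as follows. This is a sketch under two assumptions that are not spelled out in the slides: arithmetic is in $GF(2^8)$ with primitive polynomial `0x11D`, and the rows use exponents $0$ through $n-k-1$ so that the matrix has $n - k$ rows, matching the $(n-k) \times k$ shape of $EM$ and the 2×2 example. Function names are illustrative.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with shift-and-XOR."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def gf_pow(a, e):
    """Raise a to the e-th power by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def vandermonde(rows, k):
    """Row i (starting at 0) holds 1^i, 2^i, ..., k^i computed in GF(2^8)."""
    return [[gf_pow(j, i) for j in range(1, k + 1)] for i in range(rows)]


print(vandermonde(2, 2))  # [[1, 1], [1, 2]]
print(vandermonde(3, 4))  # [[1, 1, 1, 1], [1, 2, 3, 4], [1, 4, 5, 16]]
```

Note that $3^2 = 5$ here rather than 9, because squaring happens in the Galois field, not over the integers.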
  16. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      $EM' = \begin{pmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 1 & 1 & 1 & \dots & 1 \\ 1 & 2 & 3 & \dots & k \\ 1^2 & 2^2 & 3^2 & \dots & k^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1^{n-k-1} & 2^{n-k-1} & 3^{n-k-1} & \dots & k^{n-k-1} \end{pmatrix}$
  17. Reed-Solomon coding mechanism (Reed-Solomon encoding process)
      Remark:
      ▶ All arithmetic operations are performed in the Galois field $GF(2^w)$, so every number is less than $2^w$.
      ▶ $EM'$ satisfies the MDS property.
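
The encoding step $C^T = EM \times F^T$ can be sketched byte by byte for the $n = 4$, $k = 2$ example. This is a plain CPU illustration, not the deck's CUDA kernel; it assumes $GF(2^8)$ with primitive polynomial `0x11D`, and the chunk contents and function names are made up for the example.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with shift-and-XOR."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def encode(data_chunks, em):
    """Return the n - k code chunks: one GF linear combination per row of em."""
    length = len(data_chunks[0])
    code_chunks = []
    for row in em:
        chunk = bytearray(length)
        for coeff, data in zip(row, data_chunks):
            for pos in range(length):
                chunk[pos] ^= gf_mul(coeff, data[pos])  # GF addition is XOR
        code_chunks.append(bytes(chunk))
    return code_chunks


A = bytes([1, 2, 3])  # hypothetical native chunk contents
B = bytes([4, 5, 6])
C1, C2 = encode([A, B], [[1, 1], [1, 2]])
print(list(C1))  # [5, 7, 5]   (A + B)
print(list(C2))  # [9, 8, 15]  (A + 2B)
```

Every byte position is independent of the others, which is what makes the matrix multiplication a natural fit for one-thread-per-byte parallelism on a GPU.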
  18. Reed-Solomon coding mechanism (Reed-Solomon decoding process)
      We select any $k$ out of the $n$ fragments, say $X = [x_1, x_2, \dots, x_k]^T$. Then we use the coefficients of each fragment $x_i$ (the corresponding row of $EM'$) to construct a $k \times k$ matrix $V'$. The original data can be regenerated by multiplying $X$ by the inverse of $V'$: $F = V'^{-1} X$.
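
As an illustration of $F = V'^{-1} X$, suppose that in the $n = 4$, $k = 2$ example only the fragments $B$ and $A + 2B$ survive, so $V'$ consists of the rows $(0, 1)$ and $(1, 2)$ of $EM'$. The sketch below assumes $GF(2^8)$ with primitive polynomial `0x11D`, uses a closed-form 2×2 inverse instead of general elimination, and all names and byte values are hypothetical.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with shift-and-XOR."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (fine for 256 elements)."""
    for candidate in range(1, 256):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def invert_2x2(m):
    """Closed-form inverse; signs vanish because subtraction is XOR."""
    (a, b), (c, d) = m
    det_inv = gf_inv(gf_mul(a, d) ^ gf_mul(b, c))
    return [[gf_mul(det_inv, d), gf_mul(det_inv, b)],
            [gf_mul(det_inv, c), gf_mul(det_inv, a)]]


def decode(fragments, v_prime):
    """Recover the k = 2 native chunks from 2 surviving fragments."""
    inverse = invert_2x2(v_prime)
    length = len(fragments[0])
    natives = []
    for inv_row in inverse:
        chunk = bytearray(length)
        for coeff, frag in zip(inv_row, fragments):
            for pos in range(length):
                chunk[pos] ^= gf_mul(coeff, frag[pos])
        natives.append(bytes(chunk))
    return natives


B = bytes([4, 5, 6])    # surviving native chunk
C2 = bytes([9, 8, 15])  # surviving code chunk A + 2B for A = [1, 2, 3]
A_rec, B_rec = decode([B, C2], [[0, 1], [1, 2]])
print(list(A_rec), list(B_rec))  # [1, 2, 3] [4, 5, 6]
```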
  19. Optimization motivation
      What discourages replacing replication with Reed-Solomon coding is its high computational cost:
      ▶ Galois field arithmetic
      ▶ matrix operations
  20. Accelerate operations in Galois Field (GF)
      The field $GF(2^w)$ is constructed by finding a primitive polynomial $q(x)$ of degree $w$ over $GF(2)$, and then enumerating the elements (which are polynomials) with the generator $x$. Addition in this field is performed using polynomial addition, and multiplication is performed using polynomial multiplication and taking the result modulo $q(x)$.
      In implementation, we map the elements of $GF(2^w)$ to binary words of size $w$, so that computation in $GF(2^w)$ becomes bitwise operations. Let $r(x)$ be a polynomial in $GF(2^w)$. Then we can map $r(x)$ to a binary word $b$ of size $w$ by setting the $i$th bit of $b$ to the coefficient of $x^i$ in $r(x)$.
      After mapping, addition/subtraction of binary elements of $GF(2^w)$ can be performed by bitwise exclusive-or.
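
The polynomial-to-word mapping and XOR addition described above can be written out in a few lines. The helper names are illustrative only.

```python
def poly_to_word(exponents):
    """Set bit i for every term x^i; coefficients live in GF(2), so repeats cancel."""
    word = 0
    for i in exponents:
        word ^= 1 << i
    return word


def word_to_poly(word):
    """List the exponents i whose bit is set, lowest first."""
    return [i for i in range(word.bit_length()) if (word >> i) & 1]


a = poly_to_word([4, 1, 0])  # x^4 + x + 1
b = poly_to_word([1])        # x
print(a, b)                  # 19 2
print(a ^ b)                 # 17
print(word_to_poly(a ^ b))   # [0, 4], i.e. x^4 + 1: the x terms cancel
```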
  23. Accelerate operations in Galois Field (GF)
      However, multiplication is still costly. Using bitwise operations to emulate polynomial multiplication in $GF(2^w)$ takes a loop of $w$ iterations; e.g. in $GF(2^8)$, multiplying 2 bytes takes 8 iterations.
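
The $w$-iteration loop can be sketched as a shift-and-XOR multiply. The default primitive polynomial `0x11D` for $w = 8$ is an assumption (the slides do not name one), although it does reproduce the values that appear in the matrix inversion example; the function name and signature are illustrative.

```python
def gf_mul(a, b, w=8, prim_poly=0x11D):
    """Multiply a and b in GF(2^w): one shift/XOR step per bit of b."""
    result = 0
    for _ in range(w):
        if b & 1:            # current bit of b set: add (XOR) the shifted a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << w):     # degree reached w: reduce modulo q(x)
            a ^= prim_poly
    return result


print(gf_mul(3, 244))                      # 1, so 244 is the inverse of 3
print(gf_mul(2, 128))                      # 29, x * x^7 reduced modulo q(x)
print(gf_mul(8, 2, w=4, prim_poly=0x13))   # 3 in GF(2^4) with q(x) = x^4 + x + 1
```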
  24. Accelerate operations in Galois Field (GF)
      CPU approach: store all the results in a multiplication table. E.g. in $GF(2^8)$, the table for multiplying 2 bytes has 256 × 256 entries = 64 KB. This consumes a lot of space ⇝ not suitable for GPU.
  25. Accelerate operations in Galois Field (GF)
      Trade-off between computation and memory consumption ([1]). We build two smaller tables:
      ▶ logarithm table: maps a binary element $b$ to the power $j$ such that $x^j$ is equivalent to $b$.
      ▶ exponential table: maps a power $j$ to the binary element $b$ such that $x^j$ is equivalent to $b$.
      Multiplication in $GF(2^w)$:
      ▶ look up each binary number in the logarithm table for its logarithm (this is equivalent to finding the polynomial)
      ▶ add the logarithms modulo $2^w - 1$ (this is equivalent to multiplying the polynomials modulo $q(x)$)
      ▶ look up the exponential table to convert the result back to a binary number
      In total: three table lookups and one modular addition.
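
The two-table scheme can be sketched as follows for $GF(2^8)$. The primitive polynomial `0x11D` and all names are assumptions for illustration; here `log_table` maps an element to its power of $x$ and `exp_table` maps a power back to the element.

```python
PRIM_POLY = 0x11D   # assumption: x^8 + x^4 + x^3 + x^2 + 1
FIELD_SIZE = 256    # 2^w with w = 8

exp_table = [0] * (FIELD_SIZE - 1)  # power j   -> element x^j
log_table = [0] * FIELD_SIZE        # element b -> power j (entry 0 is unused)

value = 1
for power in range(FIELD_SIZE - 1):
    exp_table[power] = value
    log_table[value] = power
    value <<= 1                     # multiply by the generator x
    if value & 0x100:
        value ^= PRIM_POLY          # reduce modulo q(x)


def gf_mul_table(a, b):
    """Two log lookups, one modular addition, one exp lookup."""
    if a == 0 or b == 0:            # 0 has no logarithm
        return 0
    return exp_table[(log_table[a] + log_table[b]) % (FIELD_SIZE - 1)]


print(gf_mul_table(3, 244))              # 1
print(gf_mul_table(2, 128))              # 29
print(len(exp_table) + len(log_table))   # 511 entries instead of 65536
```

The zero check is the one case the tables cannot handle on their own, so it is treated separately before any lookup.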
  26. Accelerate encoding process
      ▶ Generating the Vandermonde matrix
      ▶ Matrix multiplication
  27. Accelerate decoding process
      ▶ Computing the inverse matrix
      ▶ Matrix multiplication
  28. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      $[V' \mid I] = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 3 & 4 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{array}\right)$
  29. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      $[V' \mid I] = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 2 & 3 & 4 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right)$
  30. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      $[V' \mid I] = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 4 & 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \end{array}\right)$
  31. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      $[V' \mid I] = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 247 & 244 & 245 & 244 & 0 \\ 0 & 0 & 0 & 246 & 245 & 244 & 244 & 1 \end{array}\right)$
  32. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      $[V' \mid I] = \left(\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 104 & 187 & 186 & 210 \\ 0 & 0 & 0 & 1 & 105 & 186 & 186 & 211 \end{array}\right)$
  33. Accelerate decoding process (Accelerate matrix inversion): Gaussian/Gauss-Jordan elimination in $GF(2^8)$
      On the GPU, each elimination step consists of:
      ▶ normalize row
      ▶ make column into reduced echelon form
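
The two steps (normalize the pivot row, then clear the pivot column in every other row) can be sketched sequentially as below. This is a CPU illustration rather than the deck's CUDA implementation; it assumes $GF(2^8)$ with primitive polynomial `0x11D`, and the function names are hypothetical. On a GPU, the per-element work inside each step is what would be spread across threads.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with shift-and-XOR."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return result


def gf_inv(a):
    """Multiplicative inverse by exhaustive search."""
    for candidate in range(1, 256):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ZeroDivisionError("0 has no inverse in GF(2^8)")


def gf_invert_matrix(matrix):
    """Gauss-Jordan elimination on the augmented matrix [V' | I]."""
    k = len(matrix)
    aug = [list(row) + [1 if i == j else 0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Step 1: normalize the pivot row so the pivot becomes 1.
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        # Step 2: clear the pivot column in all other rows (subtraction is XOR).
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, p) for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


V_PRIME = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 2, 3, 4], [1, 1, 1, 1]]
for row in gf_invert_matrix(V_PRIME):
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [104, 187, 186, 210]
# [105, 186, 186, 211]
```

With the assumed polynomial, the result matches the right-hand half of the final augmented matrix in the worked example.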
  34. Experiment settings
      ▶ Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
      ▶ nVidia Corporation GF100 [Tesla C2050]
  35. GPU cost breakdown (Experiment result), $k = 4$, $n = 6$
      [Figure: "GPU R-S encoding". Stacked bar chart of time (ms) against file size (752KB, 47MB, 161MB, 679MB, 1.1GB). Stack components: copy data chunks from host memory to GPU memory; generating encoding matrix; copy encoding matrix from GPU memory to host memory; encoding file (matrix multiplication); copy code chunks from GPU memory to host memory. Bar totals (ms), in ascending order: 29.45792, 173.697983, 218.380676, 429.164764, 576.414185.]
  36. GPU cost breakdown (Experiment result), $k = 4$, $n = 6$
      [Figure: "GPU R-S encoding". Total computation time and total communication time (ms) against file size (KB).]
  37. GPU cost breakdown (Experiment result), $k = 4$, $n = 6$
      [Figure: "GPU R-S decoding". Stacked bar chart of time (ms) against file size (752KB, 47MB, 161MB, 679MB, 1.1GB). Stack components: copy code chunks from host memory to GPU memory; copy encoding matrix from host memory to GPU memory; generating decoding matrix (inverse matrix); decoding file (matrix multiplication); copy data chunks from GPU memory to host memory. Bar totals (ms), in ascending order: 31.092863, 178.489853, 262.113556, 614.687073, 751.384766.]
  38. GPU cost breakdown (Experiment result), $k = 4$, $n = 6$
      [Figure: "GPU R-S decoding". Total computation time and total communication time (ms) against file size (KB).]
  39. GPU vs CPU (Experiment result), $k = 4$, $n = 6$
      [Figure: "RS encode: CPU vs GPU". GPU bandwidth and CPU bandwidth (MB/s) against file size (KB). Ratio annotations on the chart, in ascending order: 9.87X, 25.73X, 58.06X, 64.72X.]
  40. GPU vs CPU (Experiment result), $k = 4$, $n = 6$
      [Figure: "RS decode: CPU vs GPU". GPU bandwidth and CPU bandwidth (MB/s) against file size (KB). Ratio annotations on the chart, in ascending order: 1.47X, 18.12X, 39.19X, 75.4X, 89.99X.]
  41. Tuning k (Experiment result). File size: 1.1GB, double fault tolerance
      [Figure: "Total GPU encoding time (n-k=2)". Bar chart of time (ms) against $k$ = 4, 8, 16, 32, 64, 128. Bar labels (ms), in ascending order: 576.414185, 665.971008, 947.589661, 1495.114624, 2702.535156, 5060.778809.]
  42. Tuning k (Experiment result). File size: 1.1GB, double fault tolerance
      [Figure: "Total GPU decoding time (n-k=2)". Bar chart of time (ms) against $k$ = 4, 8, 16, 32, 64, 128. Bar labels (ms), in ascending order: 751.384766, 1028.636963, 1210.730591, 1856.644165, 3167.548828, 5958.092285.]
  43. Tuning k (Experiment result). File size: 1.1GB, triple fault tolerance
      [Figure: "Total GPU encoding time (n-k=3)". Bar chart of time (ms) against $k$ = 4, 8, 16, 32, 64, 128. Bar labels (ms), in ascending order: 695.003662, 766.305176, 934.1427, 1509.532715, 2674.739502, 5053.129883.]
  44. Tuning k (Experiment result). File size: 1.1GB, triple fault tolerance
      [Figure: "Total GPU decoding time (n-k=3)". Bar chart of time (ms) against $k$ = 4, 8, 16, 32, 64, 128. Bar labels (ms), in ascending order: 873.917542, 1026.679443, 1345.650269, 1875.053711, 3167.339355, 5960.175293.]
  45. Q&A: Thank You!
  46. [1] J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software: Practice and Experience, 27(9):995–1012, 1997.
