GPGPU Programming @DroidconNL 2012 by Alten

Published in: Technology

  1. Technology Consulting and Engineering. GPGPU Programming on Android devices. Alten, droidconNL 2012. ALTEN | 11/22/12
  2. Welcome
     ● Alten PTS: leading service provider in the field of technical consultancy and engineering
        ● Eindhoven, Capelle aan de IJssel and Apeldoorn
     ● ir. Arjan Somers
     Alten-droidconNL 2012 Slide 2
  3. Goals
     ● What is GPGPU?
     ● How is it done on current Android devices?
     ● When is GPGPU programming useful?
  4. Contents
     ● What
     ● Why
     ● How
     ● When
     ● Example
  5. What
     ● GPGPU programming is using the GPU to perform general-purpose calculations
     ● Data manipulation using the graphics card
  6. What
     ● A little bit of history
        ● GPUs and OpenGL
        ● Parallel vector-based operations
        ● Programmable (shaders)
     Images: Battlezone (1980), Crysis 3 (2013)
  7. What
     ● A little bit of history
        ● GPUs and OpenGL
        ● Parallel vector-based operations
        ● Programmable (shaders)
  8. What: parallel processing
     ● Parallel vector-based operations
        ● Process vertices / pixels
        ● Independent/parallel processing
  9. What: programmable GPUs
     ● Displacement mapping
        ● Requires control over vertex and fragment processing
  10. What: programmable GPUs
     ● OpenGL pipeline
  11. What: programmable GPUs
     ● OpenGL pipeline
        ● Shaders
  12. What: programmable GPUs
     ● Shaders
     Vertex:
         void main(void) {
             gl_Position = gl_Vertex;
         }
     Fragment:
         void main(void) {
             gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
         }
  13. What: programmable GPUs
     ● A fragment shader
         void main(void) {
             vec2 pos = mod(gl_FragCoord.xy, vec2(50.0)) - vec2(25.0);
             float dist_squared = dot(pos, pos);
             gl_FragColor = (dist_squared < 400.0)
                 ? vec4(.90, .90, .90, 1.0)
                 : vec4(.20, .20, .40, 1.0);
         }
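To make the slide-13 shader easier to reason about, here is a minimal CPU sketch in Java that evaluates the same test for every pixel. The class and method names are hypothetical, and the assumption that pixel centres sit at +0.5 follows the usual `gl_FragCoord` convention rather than anything stated on the slide.

```java
// Hypothetical CPU emulation of the slide-13 fragment shader (illustration only).
final class DotPatternShader {
    // Mirrors GLSL mod(x, y) = x - y * floor(x / y).
    static float mod(float x, float y) {
        return x - y * (float) Math.floor(x / y);
    }

    // Same test as the shader: true where it would output the light colour.
    static boolean isLight(float fragX, float fragY) {
        float px = mod(fragX, 50.0f) - 25.0f;
        float py = mod(fragY, 50.0f) - 25.0f;
        float distSquared = px * px + py * py;
        return distSquared < 400.0f;
    }

    // Evaluates every pixel independently, the way the GPU would.
    // Assumption: pixel centres are at (x + 0.5, y + 0.5).
    static boolean[][] render(int width, int height) {
        boolean[][] image = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image[y][x] = isLight(x + 0.5f, y + 0.5f);
            }
        }
        return image;
    }
}
```

The result is a grid of discs with radius 20 repeating every 50 pixels; no pixel needs to know the result of any other pixel.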
  14. What: General calculations
     ● Simple GPGPU
        ● Upload input data in textures
        ● Render quad
        ● Perform calculations in pixel shader
        ● Read back results
  15. What: General calculations (diagram: Java App)
  16. What: General calculations (diagram: Java App, Upload(), Upload())
  17. What: General calculations (diagram: Java App, Draw())
  18. What: General calculations (diagram: Java App, Read())
  19. What
     ● A CPU program vs a GPU program
     CPU: int[] a; int[] b = F(a);
     GPU: int[] a -> Texture -> draw() -> pixel-buffer -> int[] b
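The upload, draw and read-back cycle from the preceding slides can be mimicked on the CPU to show the shape of the data flow. This is only a sketch: the names `upload`, `draw` and `read` are borrowed from the diagram labels, plain `int[]` arrays stand in for textures and the frame buffer, and no real OpenGL ES call is involved.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical CPU stand-in for the upload -> draw -> read cycle (no OpenGL involved).
final class RenderPassSketch {
    // "Upload": copy the input array into a texture-like buffer.
    static int[] upload(int[] data) {
        return data.clone();
    }

    // "Draw": run the kernel once per output texel. Each texel reads only the
    // input texture and writes only its own slot, so execution order is irrelevant.
    static int[] draw(int[] inputTexture, IntUnaryOperator kernel) {
        int[] frameBuffer = new int[inputTexture.length];
        for (int texel = 0; texel < frameBuffer.length; texel++) {
            frameBuffer[texel] = kernel.applyAsInt(inputTexture[texel]);
        }
        return frameBuffer;
    }

    // "Read": copy the frame buffer back into application memory.
    static int[] read(int[] frameBuffer) {
        return frameBuffer.clone();
    }

    // b = F(a), expressed as upload -> draw -> read.
    static int[] compute(int[] a, IntUnaryOperator f) {
        return read(draw(upload(a), f));
    }
}
```

For example, `RenderPassSketch.compute(new int[]{1, 2, 3}, x -> x * x)` plays the role of `int[] b = F(a)` with `F` being a per-element square. The two copies in `upload` and `read` correspond to the CPU-GPU transfers.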
  20. Why
     ● Additional computational power
        ● Can run in parallel with CPU
     ● Greater computational power
        ● Galaxy S2
           ● CPU 4 GFlops
           ● GPU 10 GFlops
  21. How
     1) Parallelize code
     2) Data packing
     3) Implement OpenGL ES 2.0 shaders
     4) Drawing and input/output
  22. How: Parallelize code
     Pixel-shader parallel code:
         public int[] foo(int[] a, int[] b) {
             int[] c = new int[a.length];
             for (int i = 0; i < a.length; i++) {
                 c[i] = a[i] + 2 * b[i];  // each c[i] depends only on index i
             }
             return c;
         }
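One way to check that a loop like the slide-22 example is "pixel-shader parallel" is to rewrite it as a per-index kernel and run the indices in arbitrary order. The sketch below does that with a parallel `IntStream`; the class and method names are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical per-index formulation of the slide-22 loop.
final class PixelParallelSum {
    // One "pixel" of work: depends only on index i, never on another output element.
    static int kernel(int[] a, int[] b, int i) {
        return a[i] + 2 * b[i];
    }

    static int[] run(int[] a, int[] b) {
        int[] c = new int[a.length];
        // Iterations may run in any order or concurrently because each writes only c[i].
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = kernel(a, b, i));
        return c;
    }
}
```

The kernel body is what would move into the fragment shader, with `i` replaced by the texture coordinate.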
  23. How: Parallelize code
     Not pixel-shader parallel code (mobile GPUs have no geometry shaders):
         public int[] bar(int[] a, int[] b, int[] c) {
             int[] d = new int[a.length];
             for (int i = 0; i < a.length; i++) {
                 d[b[i]] += a[i];  // write position depends on the data (scatter)
                 d[c[i]] += a[i];
             }
             return d;
         }
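The slide does not show a workaround, but a common way to handle a scatter like `bar` is to turn it into a gather: each output element searches the inputs for contributions addressed to it. The sketch below is an assumption about how this could be done, not the approach used in the talk, and it trades the data-dependent writes for O(n²) reads.

```java
// Hypothetical scatter-to-gather rewrite of bar(); not taken from the slides.
final class ScatterToGather {
    // Gather form: output slot j pulls in every a[i] whose b[i] or c[i] points at j.
    static int gatherOne(int[] a, int[] b, int[] c, int j) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (b[i] == j) sum += a[i];
            if (c[i] == j) sum += a[i];
        }
        return sum;
    }

    // Each d[j] is now computed independently, so the outer loop is pixel-parallel.
    static int[] bar(int[] a, int[] b, int[] c) {
        int[] d = new int[a.length];
        for (int j = 0; j < d.length; j++) {
            d[j] = gatherOne(a, b, c, j);
        }
        return d;
    }
}
```

Whether the extra reads are affordable depends on the input size; for large arrays the quadratic cost can outweigh any GPU speed-up.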
  24. How: Parallelize code
     Not pixel-shader parallel code:
     ● Sequence of calculations
  25. How: Data packing
     Current mobile GPUs only have
     ● 8 bits per channel (32-bit RGBA) buffers and textures
     ● single render target
  26. How: Data packing
     ● Packing uses
        ● Floating-point input/output
        ● Physics simulation requiring multiple outputs
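As an illustration of packing a floating-point value into four 8-bit channels, the sketch below splits the IEEE 754 bit pattern of a `float` into R, G, B and A bytes on the Java side. This is one possible encoding chosen for clarity; the slides do not say which encoding the talk used, and a shader reading these bytes would have to rebuild the value arithmetically.

```java
// Hypothetical CPU-side float <-> RGBA packing (one of several possible encodings).
final class FloatPacking {
    // Splits the IEEE 754 bit pattern into four channel values in [0, 255].
    static int[] pack(float value) {
        int bits = Float.floatToIntBits(value);
        return new int[] {
            (bits >>> 24) & 0xFF,  // R: sign and upper exponent bits
            (bits >>> 16) & 0xFF,  // G: lowest exponent bit and upper mantissa bits
            (bits >>> 8) & 0xFF,   // B: middle mantissa bits
            bits & 0xFF            // A: lowest mantissa bits
        };
    }

    // Reassembles the float from the four channel values.
    static float unpack(int[] rgba) {
        int bits = (rgba[0] << 24) | (rgba[1] << 16) | (rgba[2] << 8) | rgba[3];
        return Float.intBitsToFloat(bits);
    }
}
```

The same idea covers the multiple-outputs case: with a single render target, several small results have to share the four channels of one pixel or be produced in separate passes.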
  27. How: Implement Shaders
     ● Will be shown later in detail
        ● Use OpenGL ES 2.0
        ● No CUDA, OpenCL or similar
  28. How: Drawing and Input/Output
     ● Transfer is slow
     int[] a -> Texture -> draw() -> pixel-buffer -> int[] b
  29. When: What works, what not
     ● Parallelism
        ● No geometry shaders
     ● Limited precision / single render target
     ● Limited data transfer
     ● Not yet as fast as desktop
  30. Example
     ● AES encryption on the GPU
     “Hello droiconNL!”
        -> Encryption -> U2FsdGVkX18UAXwN1I7bomP0kuKNXwQ8h2NHb8lZ5sAG6uaLjZxzkn/ik9QPv8Pq
        -> Decryption -> “Hello droiconNL!”
  31. Example (diagram: Encoding, Decoding)
  32. Example (diagram: Encoding, Decoding; Not GPU-Parallel, GPU-Parallel)
  33. Example (diagram only)
  34. Example: Are all parts implementable on GPU?
  35. Example: Are all parts implementable on GPU? Parallelizable? Packing required?
  36. Example: Implementing shader
         Dec   Hex
           0    00
          25    19
         255    FF
  37. Example: Implementing shader (same Dec/Hex table as slide 36)
  38. Example: Implementing shader
     ● Parallelizable?
     ● Packing?
     ● Steps:
        ● Find row/column using hex digits
        ● Find new value in substitution table
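The two steps on slide 38 can be written as a CPU reference in Java, which is handy for checking shader output later. In this sketch the S-box is generated from its standard GF(2^8) definition instead of being copied from the slides, the 16x16 `[row][column]` layout is an assumption about how the `SBox` texture is organised, and all names are hypothetical.

```java
// Hypothetical CPU reference for the S-box lookup; table layout is an assumption.
final class SBoxLookup {
    // Rotates an 8-bit value left by the given number of bits.
    static int rotl8(int x, int shift) {
        return ((x << shift) | (x >>> (8 - shift))) & 0xFF;
    }

    // Builds the AES S-box as a 16x16 table from its GF(2^8) definition.
    static int[][] buildSBox() {
        int[][] table = new int[16][16];
        int p = 1, q = 1;
        do {
            // p walks through all non-zero field elements (multiply by 3).
            p = (p ^ (p << 1) ^ ((p & 0x80) != 0 ? 0x1B : 0)) & 0xFF;
            // q tracks the multiplicative inverse of p (divide by 3).
            q ^= q << 1;
            q ^= q << 2;
            q ^= q << 4;
            q &= 0xFF;
            if ((q & 0x80) != 0) q ^= 0x09;
            // Affine transformation of the inverse.
            int s = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63;
            table[p >> 4][p & 0x0F] = s;
        } while (p != 1);
        table[0][0] = 0x63;  // zero has no inverse and is handled separately
        return table;
    }

    // Slide-38 steps: split the byte into hex digits, then use them as row and column.
    static int substitute(int[][] sBox, int value) {
        int row = value >> 4;       // first hex digit
        int column = value & 0x0F;  // last hex digit
        return sBox[row][column];
    }
}
```

Every input byte is substituted independently of the others, which is why this step maps well onto one lookup per colour channel.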
  39. Example
         uniform sampler2D SBox;
         uniform sampler2D InputTexture;
         varying vec2 vTexCoord;
         void main(void) {
             // four values in range [0, 255]
             vec4 inputValues = texture2D(InputTexture, vTexCoord) * 256.0;
             vec4 firstHexDigits = floor(inputValues / 16.0) / 16.0;
             vec4 lastHexDigits = mod(inputValues, 16.0) / 16.0;
             gl_FragColor = vec4(
                 texture2D(SBox, vec2(firstHexDigits.x, lastHexDigits.x)).x,
                 texture2D(SBox, vec2(firstHexDigits.y, lastHexDigits.y)).x,
                 texture2D(SBox, vec2(firstHexDigits.z, lastHexDigits.z)).x,
                 texture2D(SBox, vec2(firstHexDigits.w, lastHexDigits.w)).x);
         }
  40. Example (shader of slide 39, annotated): Get four bytes from input
  41. Example (shader of slide 39, annotated): Convert from [0, 1) to [0, 255]
  42. Example (shader of slide 39, annotated)
     ● No bit shifting, but some built-in functions
         floor(X / 16) = floor(X / 2^4) = X >> 4
     ● Scale back to [0, 1) range
  43. Example (shader of slide 39, annotated)
     ● No masking, but some built-in functions
         mod(X, 16) = mod(X, 2^4) = X & 0x0F
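The two identities used in the shader, `floor(X / 16)` for a right shift by four and `mod(X, 16)` for masking with `0x0F`, can be checked exhaustively for byte values. The Java sketch below does the float arithmetic the way the shader would and compares it with the integer operators; the class and method names are hypothetical.

```java
// Hypothetical check that float floor/mod reproduce >> 4 and & 0x0F for bytes.
final class FloatBitOps {
    // floor(x / 16.0) plays the role of x >> 4 for values in [0, 255].
    static int highNibble(float x) {
        return (int) Math.floor(x / 16.0f);
    }

    // GLSL mod(x, 16.0) = x - 16.0 * floor(x / 16.0) plays the role of x & 0x0F.
    static int lowNibble(float x) {
        return (int) (x - 16.0f * (float) Math.floor(x / 16.0f));
    }

    // Compares the float versions with the integer operators for every byte value.
    static boolean matchesIntegerOps() {
        for (int v = 0; v <= 255; v++) {
            if (highNibble(v) != (v >> 4) || lowNibble(v) != (v & 0x0F)) {
                return false;
            }
        }
        return true;
    }
}
```

For the value 25 from the Dec/Hex table, `highNibble` gives 1 and `lowNibble` gives 9, matching hex 19. The check assumes exact integer inputs; values read from a texture are only approximately integral after scaling, so rounding may be needed on real hardware.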
  44. Example (shader of slide 39, annotated): Find the new value in the s-box
  45. Example (shader of slide 39, annotated): Pack four values in output pixel
  46. My experiences
     ● OpenGL ES is limited vs desktop
        ● Geometry shaders
        ● Buffer formats / no MRTs
     ● Sometimes difficult to debug
        ● Dithering
        ● NPOT
     ● Complex algorithms are possible
        ● Computer vision implemented
     ● Large speed gains are possible
  47. Conclusion
     ● How is GPGPU programming performed on Android devices?
        ● Through the use of shaders and textures
     ● When is GPGPU a viable option?
        ● Calculations are consuming too much time
        ● Calculations are parallelizable
        ● Can be implemented using 32-bit buffers
        ● Limited transfer GPU-CPU memory required
  48. Conclusion
     ● GPGPU programming has high potential
     ● Mobile GPUs are becoming faster
     ● GPGPU programming is fun
