Fast & Furious: building HPC solutions in a nutshell

Slides from IT Weekend Ukraine conference presentation

Published in: Technology

  1. 1. Victor Haydin, Head of R&D, ELEKS
  2. 2. Agenda: 1. What is HPC? 2. Why does somebody need it? 3. How to do it?
  3. 3. What?
  4. 4. Definition. Wikipedia: “High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Today, computer systems approaching the teraflops-region are counted as HPC-computers.”
  5. 5. Definition: advanced computation problems
  6. 6. Modeling and Simulation
  7. 7. Low-latency processing
  8. 8. Big Data
  9. 9. A.I.
  10. 10. Supercomputers. Computer clusters. Teraflops performance
  11. 11. HPC systems comparison (chart, logarithmic axis from 1 to 100,000,000, with an "HPC" label on the axis): CPU (Intel Ivy Bridge), 100xCPU, GPU (NVIDIA Kepler), 100xGPU, IBM Sequoia
  12. 12. Why?
  13. 13. Finances
  14. 14. Healthcare
  15. 15. Fluid- and Aerodynamics
  16. 16. Genetics
  17. 17. Computer Vision and Image Processing
  18. 18. How?
  19. 19. Disclaimer
  20. 20. Commodity Hardware
  21. 21. vs. Specialized
  22. 22. GPU-based
  23. 23. Example 1: Financial risk analysis using the Monte Carlo method on GPGPU
  24. 24. Distribute
  25. 25. Run
  26. 26. Define
  27. 27. Store
  28. 28. Feed
  29. 29. Present
  30. 30. Survive
  31. 31. High-level architecture
  32. 32. Middleware
  33. 33. Worker
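The slides for Example 1 name the technique (Monte Carlo risk analysis) but not the model or the risk measure. As a minimal sketch, assuming a single position with normally distributed one-period returns and value-at-risk (VaR) as the measure, the per-scenario work could look like the following. The function name `monte_carlo_var` and all of its parameters are hypothetical, and this is plain C++ running on the CPU. On a GPU, each independent scenario would typically be handled by its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the slides): one-period value-at-risk of a
// single position whose return is ASSUMED to be normally distributed.
double monte_carlo_var(double position_value, double mean_return,
                       double volatility, double confidence,
                       std::size_t scenarios, unsigned long long seed) {
    assert(scenarios > 0);
    assert(volatility > 0.0);
    assert(confidence > 0.0 && confidence < 1.0);

    std::mt19937_64 rng(seed);
    std::normal_distribution<double> simulated_return(mean_return, volatility);

    // Each scenario is independent of the others, which is what makes the
    // method easy to spread across many threads or workers.
    std::vector<double> losses(scenarios);
    for (std::size_t i = 0; i < scenarios; ++i) {
        losses[i] = -position_value * simulated_return(rng);
    }

    // VaR is the loss quantile at the requested confidence level.
    std::size_t k = static_cast<std::size_t>(confidence * scenarios);
    if (k >= scenarios) k = scenarios - 1;
    std::nth_element(losses.begin(),
                     losses.begin() + static_cast<std::ptrdiff_t>(k),
                     losses.end());
    return losses[k];
}
```

Calling `monte_carlo_var(1000000.0, 0.0, 0.02, 0.95, 200000, 42ULL)` estimates the loss that is exceeded in roughly 5% of the simulated scenarios. A production system would use a real portfolio model and a parallel random number generator.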
  34. 34. Example 2: Image search platform using local feature detection on GPGPU
  35. 35. High-level architecture
  36. 36. Middleware
  37. 37. Load Balancing
  38. 38. Unicast • Computation time – 1 second • Sending time – 120 seconds! • More workers – slower speed (chart: Unicast, 9 workers vs. 18 workers, axis from 0 to 140)
  39. 39. Multicast • Computation time – 1 second • Sending time – 25 seconds • Almost 5 times faster (chart: Unicast vs. Multicast, axis from 0 to 140)
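The arithmetic behind the two slides above can be checked with a few lines of C++. The constants are the figures quoted on the slides. The function names are hypothetical, and the linear model in `unicast_send_seconds` (one copy sent per worker, one after another) is an assumption used only to illustrate the "more workers, slower speed" bullet. The slides do not state the actual cost model.

```cpp
#include <cassert>

// Figures quoted on the slides.
const double kComputeSeconds = 1.0;
const double kUnicastSendSeconds = 120.0;
const double kMulticastSendSeconds = 25.0;

// How many times faster the quicker transfer is.
double speedup(double slow_seconds, double fast_seconds) {
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

// Share of the total job time spent sending data rather than computing.
double send_share(double send_seconds, double compute_seconds) {
    assert(send_seconds + compute_seconds > 0.0);
    return send_seconds / (send_seconds + compute_seconds);
}

// ASSUMED model (not from the slides): unicast sends one copy per worker in
// sequence, so the sending time grows linearly with the number of workers.
double unicast_send_seconds(unsigned workers, double per_copy_seconds) {
    return workers * per_copy_seconds;
}
```

With the slide figures, `speedup(kUnicastSendSeconds, kMulticastSendSeconds)` is 120 / 25 = 4.8, which matches "almost 5 times faster", and `send_share(kUnicastSendSeconds, kComputeSeconds)` is 120 / 121, so over 99% of the unicast job is spent sending.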
  40. 40. Middleware
  41. 41. Worker
  42. 42. ERROR: CUDA ERROR CODE 30 (“UNKNOWN ERROR”)
  43. 43. Run same code on CPU and GPU
  44. 44. Kernel:
        CUDA_KERNEL foo(…)
        {
            CUDA_DEFINE_PARAMS;
            // your code here
        }
        CUDA_CALL(threads, blocks, foo(…))
  45. 45. Generated code:
        // GPU mode
        __global__ void foo(…)
        {
            // your code here
        }
        foo<<<blocks, threads>>>(…)

        // CPU mode
        void foo(…)
        {
            // same code here
        }
        // LOOP OVER threads and blocks
        { foo(…) }
  46. 46. Pros & Cons. Pros: • Same code for CPU and GPU • Debugging • Range checking • No CUDA ERROR 30. Cons: • Shared memory • __syncthreads()
  47. 47. @victor_haydin, linkedin.com/in/victorhaydin, victor.haydin@gmail.com
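The slides show only the macro names `CUDA_KERNEL`, `CUDA_DEFINE_PARAMS` and `CUDA_CALL`, not their definitions. The following is one possible sketch of how such macros could be written, not the author's implementation. Assumptions: indices are one-dimensional only; `CUDA_CALL` takes the kernel name and its arguments separately (unlike the `foo(…)` form on the slide) so that the GPU branch can place the launch configuration between them; the names `Idx1`, `cpu_threadIdx`, `cpu_blockIdx`, `cpu_blockDim`, `cpu_gridDim`, `scale` and `scale_on_cpu` are hypothetical. The GPU branch relies on `__CUDACC__`, which nvcc defines, and on CUDA's launch order `kernel<<<blocks, threads>>>(…)`.

```cpp
#include <cassert>
#include <vector>

#ifdef __CUDACC__
// GPU mode (compiled by nvcc): the macros map onto real CUDA constructs.
#define CUDA_KERNEL __global__ void
#define CUDA_DEFINE_PARAMS
#define CUDA_CALL(threads, blocks, kernel, ...) \
    kernel<<<(blocks), (threads)>>>(__VA_ARGS__)
#else
// CPU mode: emulate the built-in index variables with ordinary globals.
struct Idx1 { unsigned x; };
static Idx1 cpu_threadIdx = {0}, cpu_blockIdx = {0};
static Idx1 cpu_blockDim = {1}, cpu_gridDim = {1};

#define CUDA_KERNEL void
// Gives the kernel body local names that match the CUDA built-ins.
#define CUDA_DEFINE_PARAMS                                   \
    const Idx1 threadIdx = cpu_threadIdx;                    \
    const Idx1 blockIdx = cpu_blockIdx;                      \
    const Idx1 blockDim = cpu_blockDim;                      \
    const Idx1 gridDim = cpu_gridDim;                        \
    (void)threadIdx; (void)blockIdx; (void)blockDim; (void)gridDim
// Loop over blocks and threads, calling the kernel once per logical thread.
#define CUDA_CALL(threads, blocks, kernel, ...)                              \
    do {                                                                     \
        cpu_blockDim.x = (threads);                                          \
        cpu_gridDim.x = (blocks);                                            \
        for (unsigned cuda_b_ = 0; cuda_b_ < cpu_gridDim.x; ++cuda_b_)       \
            for (unsigned cuda_t_ = 0; cuda_t_ < cpu_blockDim.x; ++cuda_t_) {\
                cpu_blockIdx.x = cuda_b_;                                    \
                cpu_threadIdx.x = cuda_t_;                                   \
                kernel(__VA_ARGS__);                                         \
            }                                                                \
    } while (0)
#endif

// The same kernel source is used in both modes.
CUDA_KERNEL scale(float* data, float factor, unsigned n) {
    CUDA_DEFINE_PARAMS;
    const unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // range check for the last, partial block
}

#ifndef __CUDACC__
// CPU-only driver: host memory is passed straight to the kernel.
std::vector<float> scale_on_cpu(std::vector<float> v, float factor) {
    assert(v.size() <= 0xFFFFFFFFu);
    const unsigned n = static_cast<unsigned>(v.size());
    const unsigned threads = 4;
    const unsigned blocks = (n + threads - 1) / threads;
    CUDA_CALL(threads, blocks, scale, v.data(), factor, n);
    return v;
}
#endif
```

In CPU mode the kernel can be stepped through in an ordinary debugger and checked with ordinary bounds-checking tools. Because this sketch runs the logical threads one after another, it cannot emulate a barrier such as `__syncthreads()` or the cooperative use of shared memory, which is consistent with the cons listed on the slide.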
  48. 48. Got a question? Ask!
