GPUs in Big Data - StampedeCon 2014

At StampedeCon 2014, John Tran of NVIDIA presented "GPUs in Big Data." Modern graphics processing units (GPUs) are massively parallel general-purpose processors that are taking Big Data by storm. In terms of power efficiency, compute density, and scalability, it is clear now that commodity GPUs are the future of parallel computing. In this talk, we will cover diverse examples of how GPUs are revolutionizing Big Data in fields such as machine learning, databases, genomics, and other computational sciences.

  1. BIG DATA IN GPUS
     - John Tran | StampedeCon 2014, May 29, 2014, St. Louis, MO
  2. “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” (Seymour Cray)
  3. Example CPU: Xeon E5-2687W
     - 2.27 B transistors
     - 8 cores, 16 threads @ 3.1 GHz
     - 0.35 SP TFLOPS
     - 0.17 DP TFLOPS
     - 256 GB DDR3 @ 1600 MHz
     - 51.2 GB/s memory BW
     - 150 W
     - 20 MB L3 cache
     - Single-thread performance
       - branch prediction
       - out-of-order execution
  4. Example GPU: Tesla K40
     - 7.1 B transistors
     - 2880 cores, 30720 threads @ 745 MHz
     - 4.29 SP TFLOPS
     - 1.43 DP TFLOPS
     - 12 GB GDDR5 @ 3 GHz
     - 288 GB/s memory BW
     - 235 W
     - PCIe Gen3 x16, 12 GB/s
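The 4.29 SP TFLOPS figure can be reproduced from the core count and clock, assuming each core completes one fused multiply-add (counted as two floating-point operations) per cycle. The same formula gives the DP figure if one assumes the K40 has 960 double-precision units; that unit count is an assumption, not something on the slide. A minimal sketch of the arithmetic, with `peak_gflops` a hypothetical helper:

```c
/* Theoretical peak in GFLOPS, assuming every unit retires one fused
   multiply-add (2 floating-point operations) per clock cycle. */
double peak_gflops(int units, double clock_mhz)
{
    const double flops_per_cycle = 2.0;   /* one FMA = multiply + add */
    return units * clock_mhz * flops_per_cycle / 1000.0; /* MFLOPS -> GFLOPS */
}
```

With the slide's numbers, `peak_gflops(2880, 745.0)` returns about 4291, i.e. 4.29 TFLOPS, and `peak_gflops(960, 745.0)` about 1430, matching 1.43 DP TFLOPS under the 960-unit assumption. These are peaks; sustained application throughput is lower.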
  5. Math and memory peak throughput (chart: Xeon E5-2687W vs. Tesla K40)
     - SP TFLOPS: 0.35 vs. 4.29
     - DP TFLOPS: 0.17 vs. 1.43
     - Memory BW (GB/s): 51.2 vs. 288
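The chart is easier to read as ratios. The sketch below uses only the plotted values; the `Chip` struct and the helper names are hypothetical. The second helper computes each chip's peak single-precision FLOPs per byte of memory traffic (often called machine balance, a notion the slide itself does not use): a kernel doing less arithmetic per byte than this is limited by memory bandwidth rather than by math throughput.

```c
/* Peak values as plotted on the slide (hypothetical struct). */
typedef struct {
    const char *name;
    double sp_tflops;   /* single-precision peak, TFLOPS */
    double dp_tflops;   /* double-precision peak, TFLOPS */
    double mem_bw_gbs;  /* memory bandwidth, GB/s */
} Chip;

static const Chip XEON_E5_2687W = { "Xeon E5-2687W", 0.35, 0.17, 51.2 };
static const Chip TESLA_K40     = { "Tesla K40",     4.29, 1.43, 288.0 };

/* How many times larger the GPU value is than the CPU value. */
double advantage(double cpu_value, double gpu_value)
{
    return gpu_value / cpu_value;
}

/* Peak single-precision FLOPs available per byte of memory traffic. */
double sp_flops_per_byte(const Chip *c)
{
    return (c->sp_tflops * 1e12) / (c->mem_bw_gbs * 1e9);
}
```

Here `advantage(0.35, 4.29)` is about 12.3, `advantage(0.17, 1.43)` about 8.4 and `advantage(51.2, 288.0)` 5.625; `sp_flops_per_byte` gives about 6.8 for the Xeon and 14.9 for the K40. SAXPY performs 2 FLOPs per 12 bytes moved (load `x[i]`, load `y[i]`, store `y[i]`), roughly 0.17 FLOPs per byte, so for arrays too large to stay in cache its speed on either chip is set by memory bandwidth, and the relevant GPU advantage is the 5.6x bandwidth ratio rather than the 12x FLOPS ratio.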
  6. The Chickens are Winning
     - Parallel computing is no longer “the future”
     - If you are not parallel, you are already behind
     - GPUs win in:
       - Performance == $$
       - Power == $$
       - Cost == $$
  7. Where did these GPUs come from?
  8. OK, but what about computing?
  9. All Computing is Parallel Computing
  10. Parallel Computing (diagram: CPU vs. GPU)
  11. The Basic Idea: Accelerated Computing (diagram: application code is split so that compute-intensive functions run on the GPU via CUDA, while the rest of the sequential code runs on the CPU)
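  12. Quick CUDA C example

     Standard C code:

         void saxpy(int n, float a, float *x, float *y)
         {
             for (int i = 0; i < n; ++i)
                 y[i] = a*x[i] + y[i];
         }

         int N = 1<<20;
         // x, y: host arrays of N floats, assumed already allocated and initialized
         // Perform SAXPY on 1M elements
         saxpy(N, 2.0, x, y);

     Parallel C code:

         __global__
         void saxpy(int n, float a, float *x, float *y)
         {
             int i = blockIdx.x*blockDim.x + threadIdx.x;
             if (i < n) y[i] = a*x[i] + y[i];
         }

         int N = 1<<20;
         float *d_x, *d_y;  // device copies of the host arrays x and y
         cudaMalloc(&d_x, N*sizeof(float));
         cudaMalloc(&d_y, N*sizeof(float));
         cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
         cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
         // Perform SAXPY on 1M elements: 4096 blocks x 256 threads = 1<<20 threads
         saxpy<<<4096,256>>>(N, 2.0f, d_x, d_y);
         cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
         cudaFree(d_x);
         cudaFree(d_y);

     http://developer.nvidia.com/cuda-toolkit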
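The index expression `blockIdx.x*blockDim.x + threadIdx.x` gives every thread in the grid a unique global index, and each thread handles one element. The plain-C sketch below emulates that launch serially on the CPU, with nested loops standing in for blocks and threads (this is not how a GPU schedules work, and nothing here runs on a GPU), to show why the `if (i < n)` guard matters once `n` is not a multiple of the block size. `saxpy_emulated` and `blocks_for` are hypothetical names; the rounded-up block count is a common idiom assumed here, since the slide hard-codes 4096 blocks of 256 threads, exactly 1<<20 threads.

```c
/* Serial CPU emulation of saxpy<<<grid_dim, block_dim>>>(n, a, x, y).
   blockIdx.x / threadIdx.x become ordinary loop variables; no GPU is used. */
void saxpy_emulated(int grid_dim, int block_dim,
                    int n, float a, const float *x, float *y)
{
    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            int i = block_idx * block_dim + thread_idx; /* global thread index */
            if (i < n)  /* surplus threads in the last block do nothing */
                y[i] = a * x[i] + y[i];
        }
    }
}

/* Smallest number of blocks whose threads cover all n elements. */
int blocks_for(int n, int block_dim)
{
    return (n + block_dim - 1) / block_dim;
}
```

`blocks_for(1 << 20, 256)` returns 4096, the grid size on the slide. For `n = 1000` it returns 4, so 1024 threads are launched and the guard keeps the 24 surplus threads from writing past the end of `y`.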
  13. How else can you program it?
     - Libraries
       - Thrust, BLAS, SPARSE, FFT, NPP, RAND
     - Directives
       - OpenACC
     - Languages
       - CUDA C, CUDA C++, Thrust, Python, Fortran, C++ proposal, MATLAB, GPU.NET
     - Learn
       - “get cuda,” Udacity, Coursera
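Of the routes listed, directives change the source least: the loop stays ordinary C and a pragma asks the compiler to offload it. A minimal sketch, assuming an OpenACC-capable compiler (PGI's, or GCC with `-fopenacc`); `saxpy_acc` is a hypothetical name. A compiler without OpenACC support enabled ignores the unrecognized pragma and runs the loop serially on the CPU, which keeps the code portable and easy to check.

```c
/* SAXPY offloaded with an OpenACC directive.
   copyin(x[0:n]): copy x to the device before the loop.
   copy(y[0:n]):   copy y to the device, and back to the host afterwards.
   C array sections are written [start:length]. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The data clauses take the place of the explicit `cudaMalloc` and `cudaMemcpy` calls in the CUDA C version.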
  14. How does this matter to Big Data?
  15. Shazam
     - 90 M monthly active users
     - 17 M tracks tagged per day
     - 27 M tracks in DB
     - “GPUs enable us to handle our tremendous processing needs at a substantial cost savings, delivering twice the performance per dollar compared to a CPU-based system.” (Jason Titus, CTO, Shazam)
  16. Deep Neural Networks for image classification
  17. Google Datacenter vs. Stanford AI Lab
     - Google Datacenter: 1000 CPU servers, 600 kW, $5,000,000
     - Stanford AI Lab: 3 GPU-accelerated servers, 3.6 kW, $21,000
     - Source: A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, B. Catanzaro, “Deep learning with COTS HPC systems,” ICML 2013
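Put as ratios, the slide's figures say the GPU-accelerated system needed roughly 1/167 of the power and 1/238 of the cost, even though each GPU server draws more power than each CPU server. A trivial sketch of that arithmetic; the struct and function names are hypothetical:

```c
/* Figures from the slide (hypothetical struct). */
typedef struct {
    int servers;
    double kilowatts;
    double dollars;
} Cluster;

static const Cluster CPU_CLUSTER = { 1000, 600.0, 5000000.0 }; /* Google datacenter */
static const Cluster GPU_CLUSTER = { 3,    3.6,   21000.0 };   /* Stanford AI Lab */

/* Average power drawn by one server, in kW. */
double kw_per_server(const Cluster *c)
{
    return c->kilowatts / c->servers;
}

/* Factor by which the GPU system needs less power. */
double power_reduction(void)
{
    return CPU_CLUSTER.kilowatts / GPU_CLUSTER.kilowatts;
}

/* Factor by which the GPU system costs less. */
double cost_reduction(void)
{
    return CPU_CLUSTER.dollars / GPU_CLUSTER.dollars;
}
```

`power_reduction()` is about 166.7 and `cost_reduction()` about 238.1, while `kw_per_server` gives 0.6 kW per CPU server and 1.2 kW per GPU server.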
  18. Speech Recognition
  19. The DataScope at JHU
     - 5 PB of science data (in 2010)
     - “The Data-Scope will allow us to mine out relationships among data that already exist but that we can’t yet handle and to sift discoveries from what seems like an overwhelming flow of information. New discoveries will definitely emerge this way. There are relationships and patterns that we just cannot fathom buried in that onslaught of data. Data-Scope will tease these out.” (Alex Szalay, JHU)
  20. HIV Capsid
  21. Beating Heart Surgery
     - Patient stands to lose 1 point of IQ every 10 min with the heart stopped
     - Only ~2% of heart surgeons will operate on a beating heart
     - GPU enables real-time motion compensation to virtually stop the beating heart for surgeons
     - Courtesy of Laboratoire d’Informatique de Robotique et de Microelectronique de Montpellier
  22. NVBIO
  23. Final Thoughts
     - Parallel computing is here
       - Re-think parallel or get left behind
     - Scale up before scaling out
       - Several orders of magnitude more parallelism by using a GPU
       - Do you really need a cluster?
     - GPUs are the most efficient solution for parallel problems
       - Perf / $
       - Perf / Watt
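Two of the closing claims can be checked against the spec slides for the Xeon E5-2687W and Tesla K40. For Perf / Watt, this sketch divides peak single-precision throughput by rated power, a crude proxy assumed here, since real workloads reach neither peak throughput nor rated power exactly. For parallelism, it compares the K40's 30720 resident threads with the Xeon's 16 hardware threads. Function names are hypothetical:

```c
/* Peak single-precision GFLOPS per watt of rated power. */
double gflops_per_watt(double sp_tflops, double watts)
{
    return sp_tflops * 1000.0 / watts; /* TFLOPS -> GFLOPS, then per watt */
}

/* Factor by which the GPU's thread count exceeds the CPU's. */
double thread_ratio(double cpu_threads, double gpu_threads)
{
    return gpu_threads / cpu_threads;
}
```

`gflops_per_watt(0.35, 150.0)` is about 2.3 and `gflops_per_watt(4.29, 235.0)` about 18.3, close to an eightfold difference; `thread_ratio(16.0, 30720.0)` is 1920, a little over three orders of magnitude.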
  24. All Computing is Parallel Computing
