gpgpu gpu cuda university lecture programming numerical simulation finite difference method fdm shared memory vector addition parallel processing education cpu gnuplot cuda fortran architecture cusparse python cfd linear simultaneous equations gpu accelerated library cublas euler method time integration double buffering constant memory optimization fortran modern fortran fortran 2003 computational fluid dynamics cavity flow diffusion equation matrix-matrix multiplication parallel reduction naive implementation memory hierarchy global memory box filter image processing cluster openmp object-oriented vorticity streamfunction convection equation thrust monte carlo curand particle laplacian bank conflict opencl jetson tk1 hierarchy mosaic negative thread processor multicore educational material goal orientation curriculum software development vscode computational science visual studio code lagrange polynomial sympy array of characters string power approximation excel scipy best practices generation schematic diagram multiple gpu fortran 95 fortran 90 cylinder fem iso_c_binding incompressible flow project-based learning micro intelligent robot system numazu national college of technology nagaoka university of technology lbm d2q9 model lattice boltzmann method bounceback pycuda stream overlap asynchronous cooperative processing concurrent processing multi-gpu uva gpu direct unified virtual addressing marching porting fluid dynamics cpu implementation vorticity equation taylor-green vortex laplace equation conjugate gradient method poisson equation residual red-black ordering sor method rotating cone fastmath atomic operation flops compute-bound roofline flop/byte memory-bound performance pinned memory zero-copy page-locked memory transpose warp branch divergence branch memory access stride access coalesce access cuda event pi csr library cufft all-pair loop unroll interaction n-body problem order of accuracy runge-kutta method modified euler method grayscale blur 
gaussian blur bitmap uchar4 template occupancy profiler profiling moving average openacc embedded platform tegra opencv memory flip multi-thread university fermi lecture tesla m2050 mpi process co-processor accelerator software hardware open source