CUDA Programming
continued
ITCS 4145/5145 Nov 24, 2010 © Barry Wilkinson
CUDA-3
2
Error Reporting
continued
CUDA SDK toolkit has some “safety check” routines:
• cutilSafeCall( ... );   // check for error return codes
• cutilCheckMsg( ... );   // check for failure messages
Example
cutilSafeCall( cudaMalloc( … ) );             // allocate GPU memory
myKernel<<<nblocks,nthreads>>>( … );          // execute kernel
cutilCheckMsg( "myKernel failed\n" );         // check the kernel launch
cutilSafeCall( cudaMemcpy( … ) );             // copy results back
cutilSafeCall( cudaFree( … ) );               // free memory
Need details of these routines!
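For reference, the same two kinds of checks can be written with just the CUDA runtime API (cudaGetLastError / cudaGetErrorString). A minimal sketch, with a made-up kernel and sizes:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void myKernel(int *a) { a[threadIdx.x] = threadIdx.x; }

int main() {
    int *dev_a;
    cudaError_t err = cudaMalloc( (void **) &dev_a, 16 * sizeof(int) );
    if (err != cudaSuccess) {                 // error code returned by the call itself
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    myKernel<<<1, 16>>>(dev_a);               // a launch returns no error code,
    err = cudaGetLastError();                 // so query the last error instead
    if (err != cudaSuccess) {
        printf("myKernel failed: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
    cudaFree(dev_a);
    return 0;
}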
3
Error Reporting
continued
Book by Sanders and Kandrot* uses a macro called HANDLE_ERROR() to
surround CUDA calls, e.g.:
HANDLE_ERROR( cudaMalloc( … ));
HANDLE_ERROR detects when a call has returned an error code, prints the
associated error message, and exits the application with an EXIT_FAILURE
code:
static void HandleError( cudaError_t err, const char *file, int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ), file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
* “CUDA by Example: An Introduction to General-Purpose GPU Programming” by Jason
Sanders and Edward Kandrot, Addison-Wesley, Upper Saddle River, NJ, 2011
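A short usage sketch (N, dev_a, nblocks, nthreads, and myKernel are placeholders): the macro can wrap any runtime call that returns a cudaError_t, but not the kernel launch itself, which returns nothing:

#define N 256
int a[N], *dev_a;
HANDLE_ERROR( cudaMalloc( (void **) &dev_a, N * sizeof(int) ) );
HANDLE_ERROR( cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice ) );
myKernel<<<nblocks,nthreads>>>( dev_a );     // launch returns no cudaError_t, cannot be wrapped
HANDLE_ERROR( cudaMemcpy( a, dev_a, N * sizeof(int), cudaMemcpyDeviceToHost ) );
HANDLE_ERROR( cudaFree( dev_a ) );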
4
Timing Execution
CUDA SDK timer
unsigned int timer = 0;
cutCreateTimer( &timer );     // create the timer
cutStartTimer( timer );       // start timing
...
cutStopTimer( timer );        // stop timing
cutGetTimerValue( timer );    // elapsed time in milliseconds (float)
cutDeleteTimer( timer );      // release the timer
Avoid including the time of the first kernel launch, which will be more
time consuming than subsequent launches because of one-time initialization.
Use CUDA events instead of the above for asynchronous functions.
Need details of these routines!
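For the event-based timing mentioned above, a minimal sketch using the CUDA runtime API (the kernel and launch configuration are placeholders); cudaEventElapsedTime() reports the time in milliseconds:

cudaEvent_t start, stop;
float elapsed_ms;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);                    // enqueue start marker on stream 0
myKernel<<<nblocks,nthreads>>>( … );          // asynchronous kernel launch
cudaEventRecord(stop, 0);                     // enqueue stop marker after the kernel

cudaEventSynchronize(stop);                   // wait until the stop event has completed
cudaEventElapsedTime(&elapsed_ms, start, stop);
printf("Kernel time: %f ms\n", elapsed_ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);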
5
Timing
If the program uses a synchronous (blocking) cudaMemcpy, clock() can be used:
#include <time.h>
…
clock_t start, stop;
start = clock();
cudaMemcpy( … );        // copy inputs to device
…                       // kernel call
cudaMemcpy( … );        // copy results back (blocks until the kernel has finished)
stop = clock();
printf("GPU pi calculated in %f s.\n",
       (stop - start) / (float) CLOCKS_PER_SEC);
Without a blocking copy after the kernel, the host would need to synchronize
(e.g. cudaThreadSynchronize()) before stop = clock().
6
Monte Carlo Computations
Embarrassingly parallel computations that are attractive for GPUs.
Use random numbers to make random selections that are then
used in the computation.
Many application areas: numerical integration, physical simulations,
business models, finance, …
The principal issue is how to generate (pseudo)random sequences.
Cannot call rand() or any other C library function from within a
CUDA kernel.
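To make the principle concrete, a host-only sketch of Monte Carlo estimation of pi (plain C using rand(); a GPU version would replace rand() with one of the options on the next slide): sample random points in the unit square and count how many fall inside the quarter circle.

#include <stdio.h>
#include <stdlib.h>

int main() {
    const int trials = 1000000;
    int inside = 0;
    for (int i = 0; i < trials; i++) {
        double x = rand() / (double) RAND_MAX;    // random point in the unit square
        double y = rand() / (double) RAND_MAX;
        if (x * x + y * y <= 1.0)                 // inside the quarter circle of radius 1
            inside++;
    }
    printf("pi is approximately %f\n", 4.0 * inside / trials);
    return 0;
}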
7
Generating random numbers
Possible solutions:
1. Call rand() in the CPU code and copy the random numbers across
to the GPU (not the best way)
2. Use the NVIDIA CUDA CURAND library* (see the sketch after this list)
3. Hand-code a rand() function in the kernel
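A minimal sketch of option 2 using the CURAND device API (the kernel name and parameters are illustrative): each thread initializes its own generator state and draws uniform samples.

#include <curand_kernel.h>

__global__ void randKernel(float *out, int n, unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        curandState state;
        curand_init(seed, i, 0, &state);      // same seed, different sequence per thread
        out[i] = curand_uniform(&state);      // uniform float in (0, 1]
    }
}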
A common random number generator formula is:
    x_{i+1} = (a * x_i + c) mod m
Good values for a, c, and m are a = 16807, c = 0, and m = 2^31 − 1 (a prime number).
Will need to use long ints because of the size of the numbers.
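A minimal sketch of option 3, hand-coding the generator above as a device function (the per-thread seeding is an illustrative choice, not from the slides); 64-bit arithmetic is used because a * x_i can exceed 32 bits:

#define A 16807LL                 // multiplier a
#define C 0LL                     // increment c
#define M 2147483647LL            // modulus m = 2^31 - 1

__device__ unsigned int my_rand(unsigned int *state) {   // one LCG step: x_{i+1} = (a*x_i + c) mod m
    *state = (unsigned int) ((A * (long long) *state + C) % M);
    return *state;
}

__global__ void useRandom(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int state = i + 1;               // simple non-zero per-thread seed
        out[i] = my_rand(&state) / (float) M;     // uniform value in (0, 1)
    }
}

Consecutive seeds like this give correlated streams across threads, which is one reason the CURAND library (option 2) is usually preferable.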
* http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CURAND_Library.pdf
Questions
