NVIDIA® CUDA™ 5.0
Sample evaluation result
PART Ⅱ
GPU: GTX 560 Ti
CPU: i5-3450S (TDP65W)
RAM: 16GB
OS: Windows 7 x64 Ultimate
Yukio Saitoh | FXFROG.com
24/Apr/2013
INDEX
Sample binary :
19. concurrentKernels
20. conjugateGradient
21. concurrentKernels
22. conjugateGradient
23. conjugateGradientPrecond
24. convolutionFFT2D
25. convolutionSeparable
26. convolutionTexture
27. cppIntegration
28. cudaDecodeD3D9 (runaway)
29. cudaDecodeGL
30. cudaEncode (runaway)
31. dct8x8
32. deviceQuery
33. deviceQueryDrv
34. dwtHaar1D
35. dxtc
Sample target path and files
• C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release
concurrentKernels.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥concurrentKernels.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> Detected Compute SM 2.1 hardware with 8 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.010s
Test passed
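The expected times in the log above follow from simple arithmetic. The sketch below reproduces them, assuming (this is inferred from the numbers, not stated in the log) that all 8 kernels take equally long and that full concurrency means the whole batch finishes in the time of one kernel.

```python
# Values copied from the concurrentKernels log above.
n_kernels = 8
expected_serial_s = 0.080
expected_concurrent_s = 0.010
measured_s = 0.010

# Assumption: every kernel has the same duration, so one kernel's time
# is the serial estimate divided by the number of kernels.
per_kernel_s = expected_serial_s / n_kernels

# With complete overlap, the batch takes as long as a single kernel.
ideal_concurrent_s = per_kernel_s

# Speedup of the measured run relative to the serial estimate.
speedup = expected_serial_s / measured_s

print(f"per-kernel time   : {per_kernel_s:.3f} s")
print(f"ideal concurrent  : {ideal_concurrent_s:.3f} s")
print(f"speedup vs serial : {speedup:.1f}x")
```

A measured time equal to the concurrent estimate suggests that the kernels did overlap on this device.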
conjugateGradient.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
iteration = 1, residual = 4.451374e+001
iteration = 2, residual = 3.248658e+000
iteration = 3, residual = 2.695777e-001
iteration = 4, residual = 2.314586e-002
iteration = 5, residual = 1.997625e-003
iteration = 6, residual = 1.852079e-004
iteration = 7, residual = 1.705767e-005
iteration = 8, residual = 1.618583e-006
Test Summary: Error amount = 0.000000
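To read the convergence behaviour off the log above, one option is to look at the ratio of successive residuals. The following is a minimal sketch that uses only the eight residual values printed by the sample; the variable names are illustrative.

```python
# Residuals copied from the conjugateGradient log above (iterations 1..8).
residuals = [
    4.451374e+001,
    3.248658e+000,
    2.695777e-001,
    2.314586e-002,
    1.997625e-003,
    1.852079e-004,
    1.705767e-005,
    1.618583e-006,
]

# Reduction factor from one iteration to the next.
ratios = [nxt / cur for cur, nxt in zip(residuals, residuals[1:])]

for iteration, ratio in enumerate(ratios, start=2):
    print(f"iteration {iteration}: residual ratio = {ratio:.4f}")
```

In this run each iteration shrinks the residual by roughly one order of magnitude.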
conjugateGradientPrecond.exe
conjugateGradientPrecond starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
GPU selected Device ID = 0
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
laplace dimension = 128
Convergence of conjugate gradient without preconditioning:
iteration = 542, residual = 8.660636e-013
Convergence Test: OK
Convergence of conjugate gradient using incomplete LU preconditioning:
iteration = 188, residual = 9.056491e-013
Convergence Test: OK
Test Summary:
Counted total of 0 errors
qaerr1 = 0.000004 qaerr2 = 0.000003
convolutionFFT2D.exe 1/2
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionFFT2D.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Testing built-in R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating R2C & C2R FFT plans for 2048 x 2048
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms)
...reading back GPU convolution results
...running reference CPU convolution
...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007)
L2norm Error OK
...shutting down
Testing custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007)
L2norm Error OK
...shutting down
convolutionFFT2D.exe 2/2
Testing updated custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007)
L2norm Error OK
...shutting down
Test Summary: 0 errors
Test passed
convolutionSeparable.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionSeparable.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
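The throughput figure in the log above can be cross-checked from the image size and the reported time. This sketch assumes throughput is simply pixels divided by seconds. The small difference from the logged value is presumably because the log prints the time rounded to five decimal places.

```python
# Values copied from the convolutionSeparable log above.
width, height = 3072, 3072
reported_time_s = 0.00297
reported_mpix_per_s = 3179.0263

pixels = width * height

# Throughput in megapixels per second, recomputed from the rounded time.
recomputed_mpix_per_s = pixels / reported_time_s / 1e6

# Relative difference between the recomputed and the logged value.
rel_diff = abs(recomputed_mpix_per_s - reported_mpix_per_s) / reported_mpix_per_s

print(f"pixels               : {pixels}")
print(f"recomputed throughput: {recomputed_mpix_per_s:.1f} MPixels/sec")
print(f"relative difference  : {rel_diff:.4%}")
```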
convolutionTexture.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionTexture.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Initializing data...
Running GPU rows convolution (10 identical iterations)...
Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s
Copying convolutionRowGPU() output back to the texture...
cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s
Running GPU columns convolution (10 iterations)
Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s
Reading back GPU results...
Checking the results...
...running convolutionRowsCPU()
...running convolutionColumnsCPU()
Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
cppIntegration.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Hello World.
Hello World.
cudaDecodeD3D9.exe (runaway)
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
cudaDecodeGL.exe 1/2
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
[cudaDecodeGL]: input file: <../../../3_Imaging/cudaDecodeGL/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
> Device 0: <GeForce GTX 560 Ti >, Compute SM 2.1 detected
-> GPU 0: < GeForce GTX 560 Ti > driver mode is: WDDM
>> initGL() creating window [1280 x 720]
> Using CUDA/GL Device [0]: GeForce GTX 560 Ti
> Using GPU Device: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6660) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6210) = < Passthru_drvapi >
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeGL.exe 2/2
setTextureFilterMode(GL_NEAREST,GL_NEAREST)
ImageGL::CUcontext = 02047fd0
ImageGL::CUdevice = 00000000
reshape() glViewport(0, 0, 1280, 720)
[cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 98854.47 (ms) ]
[cudaDecodeGL] - [Frame: 0032, 736.9 fps, frame time: 1.36 (ms) ]
[cudaDecodeGL] - [Frame: 0048, 687.3 fps, frame time: 1.45 (ms) ]
[cudaDecodeGL] - [Frame: 0064, 788.9 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0080, 748.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0096, 724.5 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0112, 747.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0128, 738.9 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0144, 749.4 fps, frame time: 1.33 (ms) ]
[cudaDecodeGL] - [Frame: 0160, 764.7 fps, frame time: 1.31 (ms) ]
[cudaDecodeGL] - [Frame: 0176, 802.6 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0192, 766.6 fps, frame time: 1.30 (ms) ]
[cudaDecodeGL] - [Frame: 0208, 827.8 fps, frame time: 1.21 (ms) ]
[cudaDecodeGL] - [Frame: 0224, 774.1 fps, frame time: 1.29 (ms) ]
[cudaDecodeGL] - [Frame: 0240, 793.3 fps, frame time: 1.26 (ms) ]
[cudaDecodeGL] - [Frame: 0256, 742.5 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0272, 789.0 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0288, 803.1 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0304, 723.6 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0320, 728.5 fps, frame time: 1.37 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.440
Frames Presented (inc repeats) = 326
Average Present Rate (fps) = 739.44
Frames Decoded (hardware) = 327
Average Rate of Decoding (fps) = 741.71
cudaDecodeD3D9.exe 1/2
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
[cudaDecodeD3D9]: input file: <../../../3_Imaging/cudaDecodeD3D9/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
> Using GPU Device 0: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx> initialized SUCCESS!
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <NV12ToARGB_drvapi, 0x04439d20>
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <Passthru_drvapi, 0x044398d0>
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeD3D9.exe 2/2
[cudaDecodeD3D9] - [Frame: 0016, 833.6 fps, time: 1.20 (ms) ]
[cudaDecodeD3D9] - [Frame: 0032, 1031.0 fps, time: 0.97 (ms) ]
[cudaDecodeD3D9] - [Frame: 0048, 843.8 fps, time: 1.19 (ms) ]
[cudaDecodeD3D9] - [Frame: 0064, 864.4 fps, time: 1.16 (ms) ]
[cudaDecodeD3D9] - [Frame: 0080, 850.9 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0096, 819.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0112, 844.0 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0128, 815.6 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] - [Frame: 0144, 821.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0160, 874.7 fps, time: 1.14 (ms) ]
[cudaDecodeD3D9] - [Frame: 0176, 960.4 fps, time: 1.04 (ms) ]
[cudaDecodeD3D9] - [Frame: 0192, 947.7 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0208, 896.7 fps, time: 1.12 (ms) ]
[cudaDecodeD3D9] - [Frame: 0224, 872.5 fps, time: 1.15 (ms) ]
[cudaDecodeD3D9] - [Frame: 0240, 922.7 fps, time: 1.08 (ms) ]
[cudaDecodeD3D9] - [Frame: 0256, 943.2 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0272, 936.6 fps, time: 1.07 (ms) ]
[cudaDecodeD3D9] - [Frame: 0288, 899.8 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0304, 901.0 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0320, 813.1 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.375
Frames Presented (inc repeats) = 326
Average Present FPS = 868.73
Frames Decoded (hardware) = 327
Average Decoder FPS = 871.40
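The averages in the two statistics blocks above can be approximated from the frame counts and the video length, and a per-frame fps value from the frame time. The sketch below assumes rate = frames / seconds and fps = 1000 / frame time in ms. The function names are hypothetical, and small deviations from the logged values are expected because the log rounds its timings.

```python
# (video length in s, frames presented, frames decoded,
#  logged present rate, logged decode rate), copied from the logs above.
stats = {
    "cudaDecodeGL":   (0.440, 326, 327, 739.44, 741.71),
    "cudaDecodeD3D9": (0.375, 326, 327, 868.73, 871.40),
}

def rate_fps(frames, seconds):
    """Average rate in frames per second."""
    return frames / seconds

def fps_from_frame_time(frame_time_ms):
    """Instantaneous fps implied by a single frame time in milliseconds."""
    return 1000.0 / frame_time_ms

for name, (length_s, presented, decoded, logged_present, logged_decode) in stats.items():
    present = rate_fps(presented, length_s)
    decode = rate_fps(decoded, length_s)
    print(f"{name}: present {present:.1f} fps (log {logged_present}), "
          f"decode {decode:.1f} fps (log {logged_decode})")

# Example: the cudaDecodeGL line for frame 0032 reports 1.36 ms and 736.9 fps.
print(f"1.36 ms per frame is about {fps_from_frame_time(1.36):.1f} fps")
```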
cudaEncode.exe (runaway)
Starting cudaEncode...
[ CUDA H.264 Encoder ]
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaEncode.exe
dct8x8.exe
dct8x8.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
CUDA sample DCT/IDCT implementation
===================================
Loading test image: barbara.bmp... [512 x 512]... Success
Running Gold 1 (CPU) version... Success
Running Gold 2 (CPU) version... Success
Running CUDA 1 (GPU) version... Success
Running CUDA 2 (GPU) version... 10459.499992 MPix/s //0.025063 ms
Success
Running CUDA short (GPU) version... Success
Dumping result to barbara_gold1.bmp... Success
Dumping result to barbara_gold2.bmp... Success
Dumping result to barbara_cuda1.bmp... Success
Dumping result to barbara_cuda2.bmp... Success
Dumping result to barbara_cuda_short.bmp... Success
Processing time (CUDA 1) : 0.209782 ms
Processing time (CUDA 2) : 0.025063 ms
Processing time (CUDA short): 0.170617 ms
PSNR Original <---> CPU(Gold 1) : 32.777073
PSNR Original <---> CPU(Gold 2) : 32.777046
PSNR Original <---> GPU(CUDA 1) : 32.777092
PSNR Original <---> GPU(CUDA 2) : 32.777077
PSNR Original <---> GPU(CUDA short): 32.749447
PSNR CPU(Gold 1) <---> GPU(CUDA 1) : 64.019310
PSNR CPU(Gold 2) <---> GPU(CUDA 2) : 71.777740
PSNR CPU(Gold 2) <---> GPU(CUDA short): 42.258053
Test Summary...
Test passed
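Two numbers in the dct8x8 log above can be related with short formulas. The sketch below recomputes the CUDA 2 throughput from the image size and processing time, and converts a PSNR value back to a mean squared error. It assumes the usual 8-bit PSNR definition with a peak value of 255; the log itself does not state which definition the sample uses, and the helper names are illustrative.

```python
import math

def psnr_db(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (assumed 8-bit peak)."""
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_from_psnr(psnr, peak=255.0):
    """Inverse of psnr_db: the mean squared error implied by a PSNR value."""
    return peak ** 2 / 10.0 ** (psnr / 10.0)

# Values copied from the dct8x8 log above.
width, height = 512, 512
cuda2_time_ms = 0.025063
reported_mpix_per_s = 10459.499992
psnr_original_vs_cuda2 = 32.777077

pixels = width * height
recomputed_mpix_per_s = pixels / (cuda2_time_ms * 1e-3) / 1e6

print(f"recomputed CUDA 2 throughput: {recomputed_mpix_per_s:.1f} MPix/s")
print(f"implied MSE (Original vs CUDA 2): {mse_from_psnr(psnr_original_vs_cuda2):.2f}")
```

Under this definition, the higher PSNR values between the CPU and GPU versions mean those outputs differ far less from each other than either does from the original image.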
dct8x8.exe / result
Output images (one per slide): barbara_cuda_short.bmp, barbara_cuda1.bmp, barbara_cuda2.bmp, barbara_gold1.bmp, barbara_gold2.bmp
deviceQuery.exe 1/2
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQuery.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
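Several figures in the deviceQuery output above are derived from others, and a short sketch makes the relationships explicit. The core count, memory size and warps per multiprocessor follow directly from the logged values. The bandwidth estimate is an assumption: it applies the common formula memory clock x bus width in bytes x 2, which presumes the reported memory clock is the pre-double-data-rate value. The log does not confirm this.

```python
# Values copied from the deviceQuery log above.
multiprocessors = 8
cores_per_mp = 48
global_mem_bytes = 1073741824
max_threads_per_mp = 1536
warp_size = 32
memory_clock_mhz = 2050
memory_bus_width_bits = 256

total_cores = multiprocessors * cores_per_mp
global_mem_mib = global_mem_bytes // (1024 * 1024)
max_warps_per_mp = max_threads_per_mp // warp_size

# Assumption (not stated in the log): double data rate memory, so two
# transfers per clock. Result in GB/s with 1 GB = 1e9 bytes.
theoretical_bandwidth_gb_s = (
    memory_clock_mhz * 1e6 * (memory_bus_width_bits / 8) * 2 / 1e9
)

print(f"CUDA cores            : {total_cores}")
print(f"global memory         : {global_mem_mib} MBytes")
print(f"max resident warps/MP : {max_warps_per_mp}")
print(f"theoretical bandwidth : {theoretical_bandwidth_gb_s:.1f} GB/s (assumed DDR)")
```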
deviceQueryDrv.exe 1/2
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQueryDrv.exe Starting...
CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version: 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQueryDrv.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
dwtHaar1D.exe
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dwtHaar1D.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
reference file = "result.dat"
gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
Writing result to "result.dat"
Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Test success!
Signal.dat
9.5012929e-001
2.3113851e-001
6.0684258e-001
4.8598247e-001
8.9129897e-001
・
・
・
Regression.gold.dat
Result.dat
dxtc.exe
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dxtc.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
Running DXT Compression on 512 x 512 image...
16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
dxtc.exe 1/4
Checking accuracy...
Deviation at ( 9, 1): 0.791667 rms
Deviation at ( 99, 1): 1.041667 rms
Deviation at ( 12, 2): 0.937500 rms
Deviation at ( 90, 3): 0.166667 rms
Deviation at ( 38, 4): 1.916667 rms
Deviation at ( 34, 7): 1.687500 rms
Deviation at ( 57, 7): 0.458333 rms
Deviation at ( 100, 8): 2.416667 rms
Deviation at ( 30, 9): 2.375000 rms
Deviation at ( 31, 9): 0.770833 rms
Deviation at ( 58, 9): 0.791667 rms
Deviation at ( 29, 10): 0.020833 rms
Deviation at ( 79, 10): 1.833333 rms
Deviation at ( 13, 11): 1.041667 rms
Deviation at ( 4, 13): 8.562500 rms
Deviation at ( 28, 13): 0.562500 rms
Deviation at ( 90, 13): 0.708333 rms
Deviation at ( 25, 14): 0.520833 rms
Deviation at ( 69, 14): 0.770833 rms
Deviation at ( 87, 16): 0.708333 rms
Deviation at ( 90, 17): 1.041667 rms
Deviation at ( 24, 19): 0.916667 rms
Deviation at ( 25, 19): 0.625000 rms
Deviation at ( 26, 19): 1.041667 rms
Deviation at ( 55, 20): 4.791667 rms
Deviation at ( 20, 23): 1.541667 rms
Deviation at ( 99, 23): 3.312500 rms
Deviation at ( 45, 24): 18.104166 rms
Deviation at ( 8, 28): 0.895833 rms
dxtc.exe 2/4
Deviation at ( 21, 30): 1.562500 rms
Deviation at ( 115, 32): 24.104166 rms
Deviation at ( 2, 33): 0.854167 rms
Deviation at ( 102, 33): 2.250000 rms
Deviation at ( 50, 35): 26.958334 rms
Deviation at ( 68, 35): 11.937500 rms
Deviation at ( 115, 36): 0.458333 rms
Deviation at ( 12, 38): 2.166667 rms
Deviation at ( 40, 40): 0.270833 rms
Deviation at ( 86, 43): 0.604167 rms
Deviation at ( 116, 43): 0.125000 rms
Deviation at ( 43, 44): 2.250000 rms
Deviation at ( 54, 44): 4.791667 rms
Deviation at ( 46, 46): 2.875000 rms
Deviation at ( 116, 46): 0.604167 rms
Deviation at ( 4, 47): 0.708333 rms
Deviation at ( 117, 48): 0.937500 rms
Deviation at ( 23, 51): 3.520833 rms
Deviation at ( 11, 52): 0.041667 rms
Deviation at ( 67, 54): 5.687500 rms
Deviation at ( 26, 55): 0.854167 rms
Deviation at ( 21, 56): 5.000000 rms
Deviation at ( 24, 56): 0.562500 rms
Deviation at ( 30, 57): 0.937500 rms
Deviation at ( 21, 59): 2.541667 rms
Deviation at ( 120, 59): 0.104167 rms
Deviation at ( 112, 60): 1.125000 rms
Deviation at ( 77, 61): 1.083333 rms
dxtc.exe 3/4
Deviation at ( 114, 62): 4.958333 rms
Deviation at ( 78, 66): 0.541667 rms
Deviation at ( 106, 68): 0.375000 rms
Deviation at ( 16, 70): 3.104167 rms
Deviation at ( 10, 71): 0.937500 rms
Deviation at ( 108, 71): 0.354167 rms
Deviation at ( 0, 72): 0.854167 rms
Deviation at ( 118, 72): 5.562500 rms
Deviation at ( 11, 73): 0.541667 rms
Deviation at ( 68, 74): 1.937500 rms
Deviation at ( 70, 76): 1.791667 rms
Deviation at ( 124, 76): 3.354167 rms
Deviation at ( 103, 78): 0.375000 rms
Deviation at ( 127, 78): 0.541667 rms
Deviation at ( 108, 79): 0.083333 rms
Deviation at ( 120, 81): 0.541667 rms
Deviation at ( 43, 82): 24.979166 rms
Deviation at ( 67, 82): 3.125000 rms
Deviation at ( 78, 82): 2.437500 rms
Deviation at ( 123, 84): 0.541667 rms
Deviation at ( 127, 85): 0.187500 rms
Deviation at ( 122, 87): 0.083333 rms
Deviation at ( 124, 87): 0.541667 rms
Deviation at ( 127, 88): 0.229167 rms
Deviation at ( 93, 91): 0.666667 rms
Deviation at ( 115, 93): 0.083333 rms
Deviation at ( 69, 95): 1.875000 rms
Deviation at ( 106, 95): 1.125000 rms
dxtc.exe 4/4
Deviation at ( 107, 95): 3.708333 rms
Deviation at ( 13, 96): 1.354167 rms
Deviation at ( 115, 98): 0.187500 rms
Deviation at ( 118, 98): 0.187500 rms
Deviation at ( 116, 101): 0.187500 rms
Deviation at ( 78, 105): 0.541667 rms
Deviation at ( 67, 107): 0.708333 rms
Deviation at ( 74, 107): 0.375000 rms
Deviation at ( 65, 109): 0.770833 rms
Deviation at ( 89, 109): 0.708333 rms
Deviation at ( 118, 109): 3.854167 rms
Deviation at ( 67, 110): 1.083333 rms
Deviation at ( 88, 111): 0.208333 rms
Deviation at ( 64, 113): 0.708333 rms
Deviation at ( 84, 113): 0.333333 rms
Deviation at ( 88, 113): 0.187500 rms
Deviation at ( 84, 114): 1.666667 rms
Deviation at ( 66, 115): 0.770833 rms
Deviation at ( 19, 118): 5.270833 rms
Deviation at ( 76, 121): 0.104167 rms
Deviation at ( 70, 122): 0.708333 rms
Deviation at ( 91, 122): 0.208333 rms
Deviation at ( 71, 123): 0.854167 rms
Deviation at ( 75, 123): 0.854167 rms
Deviation at ( 61, 124): 0.937500 rms
Deviation at ( 91, 124): 0.270833 rms
RMS(reference, result) = 0.015488
Test passed
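The launch configuration and throughput printed at the start of the dxtc run can be reproduced with a few lines of arithmetic. The sketch below assumes that each CUDA block compresses one 4 x 4 pixel DXT block. That mapping is inferred from the logged block count, not stated in the output.

```python
# Values copied from the dxtc log above.
width, height = 512, 512
threads_per_block = 64
reported_time_s = 0.01481
reported_mpix_per_s = 17.7004

# DXT compression works on 4 x 4 pixel blocks.
DXT_BLOCK_EDGE = 4

# Assumption: one CUDA block per 4 x 4 pixel block.
blocks = (width // DXT_BLOCK_EDGE) * (height // DXT_BLOCK_EDGE)
threads_in_grid = blocks * threads_per_block

pixels = width * height
recomputed_mpix_per_s = pixels / reported_time_s / 1e6

print(f"blocks          : {blocks}")
print(f"threads in grid : {threads_in_grid}")
print(f"throughput      : {recomputed_mpix_per_s:.4f} MPixels/s")
```

The same assumption gives a 128 x 128 grid of blocks, which matches the deviation coordinates in the log: none of them exceeds 127.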
Summary
GTX 560 Ti: some samples do not work properly.
→ MUST support CUDA compute capability 3.0.
→ Requires GPU devices with compute SM 3.5 or higher.
This evaluation is to be continued, for future reference.

Nvidia® cuda™ 5 sample evaluationresult_2

  • 1.
    NVIDIA® CUDA™ 5.0 Sampleevaluation result PART Ⅱ GPU: GTX 560 Ti CPU: i5-3450S (TDP65W) RAM: 16GB OS: Windows 7 x64 Ultimate Yukio Saitoh | FXFROG.com 24/Apr/2013
  • 2.
    INDEX Sample binary : 19.concurrentKernels 20. conjugateGradient 21. concurrentKernels 22. conjugateGradient 23. conjugateGradientPrecond 24. convolutionFFT2D 25. convolutionSeparable 26. convolutionTexture 27. cppIntegration 28. cudaDecodeD3D9 (runaway) 29. cudaDecodeGL 30. cudaEncode (runaway) 31. dct8x8 32. deviceQuery 33. deviceQueryDrv 34. dwtHaar1D 35. dxtc
  • 3.
    Sample target pathand files • C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release
  • 4.
    concurrentKernels.exe [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥concurrentKernels.exe]- Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 > Detected Compute SM 2.1 hardware with 8 multi-processors Expected time for serial execution of 8 kernels = 0.080s Expected time for concurrent execution of 8 kernels = 0.010s Measured time for sample = 0.010s Test passed
  • 5.
    conjugateGradient.exe GPU Device 0:"GeForce GTX 560 Ti" with compute capability 2.1 > GPU device has 8 Multi-Processors, SM 2.1 compute capabilities iteration = 1, residual = 4.451374e+001 iteration = 2, residual = 3.248658e+000 iteration = 3, residual = 2.695777e-001 iteration = 4, residual = 2.314586e-002 iteration = 5, residual = 1.997625e-003 iteration = 6, residual = 1.852079e-004 iteration = 7, residual = 1.705767e-005 iteration = 8, residual = 1.618583e-006 Test Summary: Error amount = 0.000000
  • 6.
    conjugateGradientPrecond.exe conjugateGradientPrecond starting... GPU Device0: "GeForce GTX 560 Ti" with compute capability 2.1 GPU selected Device ID = 0 > GPU device has 8 Multi-Processors, SM 2.1 compute capabilities laplace dimension = 128 Convergence of conjugate gradient without preconditioning: iteration = 542, residual = 8.660636e-013 Convergence Test: OK Convergence of conjugate gradient using incomplete LU preconditioning: iteration = 188, residual = 9.056491e-013 Convergence Test: OK Test Summary: Counted total of 0 errors qaerr1 = 0.000004 qaerr2 = 0.000003
  • 7.
    convolutionFFT2D.exe 1/2 [C:¥ProgramData¥NVIDIA Corporation¥CUDASamples¥v5.0¥bin¥win64¥Release¥convolutionFFT2D.exe] - Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Testing built-in R2C / C2R FFT-based convolution ...allocating memory ...generating random input data ...creating R2C & C2R FFT plans for 2048 x 2048 ...uploading to GPU and padding convolution kernel and input data ...transforming convolution kernel ...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms) ...reading back GPU convolution results ...running reference CPU convolution ...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007) L2norm Error OK ...shutting down Testing custom R2C / C2R FFT-based convolution ...allocating memory ...generating random input data ...creating C2C FFT plan for 2048 x 1024 ...uploading to GPU and padding convolution kernel and input data ...transforming convolution kernel ...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms) ...reading back GPU FFT results ...running reference CPU convolution ...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007) L2norm Error OK ...shutting down
  • 8.
    convolutionFFT2D.exe 2/2 Testing updatedcustom R2C / C2R FFT-based convolution ...allocating memory ...generating random input data ...creating C2C FFT plan for 2048 x 1024 ...uploading to GPU and padding convolution kernel and input data ...transforming convolution kernel ...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms) ...reading back GPU FFT results ...running reference CPU convolution ...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007) L2norm Error OK ...shutting down Test Summary: 0 errors Test passed
  • 9.
    convolutionSeparable.exe [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionSeparable.exe]- Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Image Width x Height = 3072 x 3072 Allocating and initializing host arrays... Allocating and initializing CUDA arrays... Running GPU convolution (16 identical iterations)... convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Work group = 0 Reading back GPU results... Checking the results... ...running convolutionRowCPU() ...running convolutionColumnCPU() ...comparing the results ...Relative L2 norm: 0.000000E+000 Shutting down... Test passed
  • 10.
    convolutionTexture.exe [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionTexture.exe]- Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Initializing data... Running GPU rows convolution (10 identical iterations)... Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s Copying convolutionRowGPU() output back to the texture... cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s Running GPU columns convolution (10 iterations) Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s Reading back GPU results... Checking the results... ...running convolutionRowsCPU() ...running convolutionColumnsCPU() Relative L2 norm: 0.000000E+000 Shutting down... Test passed
  • 11.
    cppIntegration.exe GPU Device 0:"GeForce GTX 560 Ti" with compute capability 2.1 Hello World. Hello World.
  • 12.
    cudaDecodeD3D9.exe (runaway) Command LineArguments: argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
  • 13.
    cudaDecodeGL.exe 1/2 [CUDA/OpenGL VideoDecode] Command Line Arguments: argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe [cudaDecodeGL]: input file: <../../../3_Imaging/cudaDecodeGL/data/plush1_720p_10s.m2v> VideoCodec : MPEG-2 Frame rate : 30000/1001fps ~ 29.97fps Sequence format : Progressive Coded frame size: [1280, 720] Display area : [0, 0, 1280, 720] Chroma format : 4:2:0 Bitrate : 14116kBit/s Aspect ratio : 16:9 argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe > Device 0: <GeForce GTX 560 Ti >, Compute SM 2.1 detected -> GPU 0: < GeForce GTX 560 Ti > driver mode is: WDDM >> initGL() creating window [1280 x 720] > Using CUDA/GL Device [0]: GeForce GTX 560 Ti > Using GPU Device: GeForce GTX 560 Ti has SM 2.1 compute capability Total amount of global memory: 1024.0000 MB >> modInitCTX<NV12ToARGB_drvapi_x64.ptx > initialized OK >> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx > CUDA Kernel Function (0x0a4c6660) = < NV12ToARGB_drvapi > >> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx > CUDA Kernel Function (0x0a4c6210) = < Passthru_drvapi > > VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
  • 14.
    cudaDecodeGL.exe 2/2 setTextureFilterMode(GL_NEAREST,GL_NEAREST) ImageGL::CUcontext =02047fd0 ImageGL::CUdevice = 00000000 reshape() glViewport(0, 0, 1280, 720) [cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 98854.47 (ms) ] [cudaDecodeGL] - [Frame: 0032, 736.9 fps, frame time: 1.36 (ms) ] [cudaDecodeGL] - [Frame: 0048, 687.3 fps, frame time: 1.45 (ms) ] [cudaDecodeGL] - [Frame: 0064, 788.9 fps, frame time: 1.27 (ms) ] [cudaDecodeGL] - [Frame: 0080, 748.5 fps, frame time: 1.34 (ms) ] [cudaDecodeGL] - [Frame: 0096, 724.5 fps, frame time: 1.38 (ms) ] [cudaDecodeGL] - [Frame: 0112, 747.5 fps, frame time: 1.34 (ms) ] [cudaDecodeGL] - [Frame: 0128, 738.9 fps, frame time: 1.35 (ms) ] [cudaDecodeGL] - [Frame: 0144, 749.4 fps, frame time: 1.33 (ms) ] [cudaDecodeGL] - [Frame: 0160, 764.7 fps, frame time: 1.31 (ms) ] [cudaDecodeGL] - [Frame: 0176, 802.6 fps, frame time: 1.25 (ms) ] [cudaDecodeGL] - [Frame: 0192, 766.6 fps, frame time: 1.30 (ms) ] [cudaDecodeGL] - [Frame: 0208, 827.8 fps, frame time: 1.21 (ms) ] [cudaDecodeGL] - [Frame: 0224, 774.1 fps, frame time: 1.29 (ms) ] [cudaDecodeGL] - [Frame: 0240, 793.3 fps, frame time: 1.26 (ms) ] [cudaDecodeGL] - [Frame: 0256, 742.5 fps, frame time: 1.35 (ms) ] [cudaDecodeGL] - [Frame: 0272, 789.0 fps, frame time: 1.27 (ms) ] [cudaDecodeGL] - [Frame: 0288, 803.1 fps, frame time: 1.25 (ms) ] [cudaDecodeGL] - [Frame: 0304, 723.6 fps, frame time: 1.38 (ms) ] [cudaDecodeGL] - [Frame: 0320, 728.5 fps, frame time: 1.37 (ms) ] [cudaDecodeGL] statistics Video Length (hh:mm:ss.msec) = 00:00:00.440 Frames Presented (inc repeats) = 326 Average Present Rate (fps) = 739.44 Frames Decoded (hardware) = 327 Average Rate of Decoding (fps) = 741.71
  • 15.
    cudaDecodeD3D9.exe 1/2 Command LineArguments: argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe [cudaDecodeD3D9]: input file: <../../../3_Imaging/cudaDecodeD3D9/data/plush1_720p_10s.m2v> VideoCodec : MPEG-2 Frame rate : 30000/1001fps ~ 29.97fps Sequence format : Progressive Coded frame size: [1280, 720] Display area : [0, 0, 1280, 720] Chroma format : 4:2:0 Bitrate : 14116kBit/s Aspect ratio : 16:9 > Using GPU Device 0: GeForce GTX 560 Ti has SM 2.1 compute capability Total amount of global memory: 1024.0000 MB >> modInitCTX<NV12ToARGB_drvapi_x64.ptx> initialized SUCCESS! >> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx> CUDA Kernel Function = <NV12ToARGB_drvapi, 0x04439d20> >> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx> CUDA Kernel Function = <Passthru_drvapi, 0x044398d0> > VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
  • 16.
    cudaDecodeD3D9.exe 2/2 [cudaDecodeD3D9] -[Frame: 0016, 833.6 fps, time: 1.20 (ms) ] [cudaDecodeD3D9] - [Frame: 0032, 1031.0 fps, time: 0.97 (ms) ] [cudaDecodeD3D9] - [Frame: 0048, 843.8 fps, time: 1.19 (ms) ] [cudaDecodeD3D9] - [Frame: 0064, 864.4 fps, time: 1.16 (ms) ] [cudaDecodeD3D9] - [Frame: 0080, 850.9 fps, time: 1.18 (ms) ] [cudaDecodeD3D9] - [Frame: 0096, 819.0 fps, time: 1.22 (ms) ] [cudaDecodeD3D9] - [Frame: 0112, 844.0 fps, time: 1.18 (ms) ] [cudaDecodeD3D9] - [Frame: 0128, 815.6 fps, time: 1.23 (ms) ] [cudaDecodeD3D9] - [Frame: 0144, 821.0 fps, time: 1.22 (ms) ] [cudaDecodeD3D9] - [Frame: 0160, 874.7 fps, time: 1.14 (ms) ] [cudaDecodeD3D9] - [Frame: 0176, 960.4 fps, time: 1.04 (ms) ] [cudaDecodeD3D9] - [Frame: 0192, 947.7 fps, time: 1.06 (ms) ] [cudaDecodeD3D9] - [Frame: 0208, 896.7 fps, time: 1.12 (ms) ] [cudaDecodeD3D9] - [Frame: 0224, 872.5 fps, time: 1.15 (ms) ] [cudaDecodeD3D9] - [Frame: 0240, 922.7 fps, time: 1.08 (ms) ] [cudaDecodeD3D9] - [Frame: 0256, 943.2 fps, time: 1.06 (ms) ] [cudaDecodeD3D9] - [Frame: 0272, 936.6 fps, time: 1.07 (ms) ] [cudaDecodeD3D9] - [Frame: 0288, 899.8 fps, time: 1.11 (ms) ] [cudaDecodeD3D9] - [Frame: 0304, 901.0 fps, time: 1.11 (ms) ] [cudaDecodeD3D9] - [Frame: 0320, 813.1 fps, time: 1.23 (ms) ] [cudaDecodeD3D9] statistics Video Length (hh:mm:ss.msec) = 00:00:00.375 Frames Presented (inc repeats) = 326 Average Present FPS = 868.73 Frames Decoded (hardware) = 327 Average Decoder FPS = 871.40
  • 17.
    cudaEncode.exe (runaway)
    Starting cudaEncode...
    [CUDA H.264 Encoder ]
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaEncode.exe
  • 18.
    dct8x8.exe
    dct8x8.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    CUDA sample DCT/IDCT implementation
    ===================================
    Loading test image: barbara.bmp... [512 x 512]... Success
    Running Gold 1 (CPU) version... Success
    Running Gold 2 (CPU) version... Success
    Running CUDA 1 (GPU) version... Success
    Running CUDA 2 (GPU) version... 10459.499992 MPix/s //0.025063 ms
    Success
    Running CUDA short (GPU) version... Success
    Dumping result to barbara_gold1.bmp... Success
    Dumping result to barbara_gold2.bmp... Success
    Dumping result to barbara_cuda1.bmp... Success
    Dumping result to barbara_cuda2.bmp... Success
    Dumping result to barbara_cuda_short.bmp... Success
    Processing time (CUDA 1) : 0.209782 ms
    Processing time (CUDA 2) : 0.025063 ms
    Processing time (CUDA short): 0.170617 ms
    PSNR Original <---> CPU(Gold 1) : 32.777073
    PSNR Original <---> CPU(Gold 2) : 32.777046
    PSNR Original <---> GPU(CUDA 1) : 32.777092
    PSNR Original <---> GPU(CUDA 2) : 32.777077
    PSNR Original <---> GPU(CUDA short): 32.749447
    PSNR CPU(Gold 1) <---> GPU(CUDA 1) : 64.019310
    PSNR CPU(Gold 2) <---> GPU(CUDA 2) : 71.777740
    PSNR CPU(Gold 2) <---> GPU(CUDA short): 42.258053
    Test Summary...
    Test passed
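The MPix/s figure for the CUDA 2 kernel can be cross-checked from the image size and the logged processing time. This is a minimal Python sketch, assuming throughput is simply pixels divided by elapsed time, with 1 MPix taken as 10^6 pixels. The function name is hypothetical.

```python
def throughput_mpix_per_s(width, height, time_ms):
    """Pixels processed per second, in units of 10**6 pixels (assumed definition)."""
    pixels = width * height
    seconds = time_ms / 1000.0
    return pixels / seconds / 1e6


if __name__ == "__main__":
    # barbara.bmp is 512 x 512; CUDA 2 processing time is logged as 0.025063 ms.
    print(throughput_mpix_per_s(512, 512, 0.025063))  # log reports 10459.499992 MPix/s
    # Ratio of the two logged kernel times (CUDA 1 / CUDA 2).
    print(0.209782 / 0.025063)
```

The computed value differs from the logged one only in the decimals, which fits the logged time having been rounded to six places.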
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
    deviceQuery.exe 1/2
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQuery.exe Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "GeForce GTX 560 Ti"
    CUDA Driver Version / Runtime Version 5.0 / 5.0
    CUDA Capability Major/Minor version number: 2.1
    Total amount of global memory: 1024 MBytes (1073741824 bytes)
    ( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
    GPU Clock rate: 1800 MHz (1.80 GHz)
    Memory Clock rate: 2050 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
  • 25.
    deviceQuery.exe 2/2
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA): Yes
    Device PCI Bus ID / PCI location ID: 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
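Several of the deviceQuery figures can be derived from one another, which makes for a quick sanity check of the report. This is a small Python sketch of that arithmetic, using only values printed in the log. The variable and function names are hypothetical, and the 1 MByte = 2^20 bytes convention is inferred from the byte count in the log.

```python
def total_cuda_cores(multiprocessors, cores_per_mp):
    """Total core count as multiprocessors times cores per multiprocessor."""
    return multiprocessors * cores_per_mp


def mbytes_to_bytes(mbytes):
    """Convert MBytes to bytes, taking 1 MByte = 2**20 bytes (as the log's byte count implies)."""
    return mbytes * 2**20


if __name__ == "__main__":
    print(total_cuda_cores(8, 48))   # log: 384 CUDA Cores
    print(mbytes_to_bytes(1024))     # log: 1073741824 bytes
    print(524288 // 1024)            # L2 cache size in KiB
    print(49152 // 1024)             # shared memory per block in KiB
    # Full-size blocks (1024 threads) that fit in one multiprocessor's 1536-thread limit.
    print(1536 // 1024)
```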
  • 26.
    deviceQueryDrv.exe 1/2
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQueryDrv.exe Starting...
    CUDA Device Query (Driver API) statically linked version
    Detected 1 CUDA Capable device(s)
    Device 0: "GeForce GTX 560 Ti"
    CUDA Driver Version: 5.0
    CUDA Capability Major/Minor version number: 2.1
    Total amount of global memory: 1024 MBytes (1073741824 bytes)
    ( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
    GPU Clock rate: 1800 MHz (1.80 GHz)
    Memory Clock rate: 2050 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
  • 27.
    deviceQueryDrv.exe 2/2
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Texture alignment: 512 bytes
    Maximum memory pitch: 2147483647 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Concurrent kernel execution: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA): Yes
    Device PCI Bus ID / PCI location ID: 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
  • 28.
    dwtHaar1D.exe
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dwtHaar1D.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
    reference file = "result.dat"
    gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
    Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
    Writing result to "result.dat"
    Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
    Test success!
    Signal.dat
    9.5012929e-001
    2.3113851e-001
    6.0684258e-001
    4.8598247e-001
    8.9129897e-001
    ・
    ・
    ・
    Regression.gold.dat
    Result.dat
  • 29.
    dxtc.exe
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dxtc.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
    Running DXT Compression on 512 x 512 image...
    16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
    dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
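The block, thread and throughput figures in this log follow from the image size. This is a minimal Python sketch of the arithmetic, assuming one CUDA block per 4 x 4 pixel DXT block (DXT compression operates on 4 x 4 texel blocks) and throughput defined as pixels divided by elapsed time, in units of 10^6 pixels. The function names are hypothetical.

```python
def dxt_block_count(width, height, block_dim=4):
    """Number of block_dim x block_dim pixel blocks in an image (dimensions assumed divisible)."""
    return (width // block_dim) * (height // block_dim)


def throughput_mpixels_per_s(pixels, seconds):
    """Pixels per second in units of 10**6 pixels (assumed definition)."""
    return pixels / seconds / 1e6


if __name__ == "__main__":
    blocks = dxt_block_count(512, 512)
    print(blocks)        # log: 16384 Blocks
    print(blocks * 64)   # log: 1048576 Threads in Grid (64 threads per block)
    # log: Throughput = 17.7004 MPixels/s for 262144 pixels in 0.01481 s
    print(throughput_mpixels_per_s(512 * 512, 0.01481))
```

The computed throughput matches the logged value to about three decimal places; the remainder is consistent with the logged time being rounded.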
  • 30.
    dxtc.exe 1/4
    Checking accuracy...
    Deviation at ( 9, 1): 0.791667 rms
    Deviation at ( 99, 1): 1.041667 rms
    Deviation at ( 12, 2): 0.937500 rms
    Deviation at ( 90, 3): 0.166667 rms
    Deviation at ( 38, 4): 1.916667 rms
    Deviation at ( 34, 7): 1.687500 rms
    Deviation at ( 57, 7): 0.458333 rms
    Deviation at ( 100, 8): 2.416667 rms
    Deviation at ( 30, 9): 2.375000 rms
    Deviation at ( 31, 9): 0.770833 rms
    Deviation at ( 58, 9): 0.791667 rms
    Deviation at ( 29, 10): 0.020833 rms
    Deviation at ( 79, 10): 1.833333 rms
    Deviation at ( 13, 11): 1.041667 rms
    Deviation at ( 4, 13): 8.562500 rms
    Deviation at ( 28, 13): 0.562500 rms
    Deviation at ( 90, 13): 0.708333 rms
    Deviation at ( 25, 14): 0.520833 rms
    Deviation at ( 69, 14): 0.770833 rms
    Deviation at ( 87, 16): 0.708333 rms
    Deviation at ( 90, 17): 1.041667 rms
    Deviation at ( 24, 19): 0.916667 rms
    Deviation at ( 25, 19): 0.625000 rms
    Deviation at ( 26, 19): 1.041667 rms
    Deviation at ( 55, 20): 4.791667 rms
    Deviation at ( 20, 23): 1.541667 rms
    Deviation at ( 99, 23): 3.312500 rms
    Deviation at ( 45, 24): 18.104166 rms
    Deviation at ( 8, 28): 0.895833 rms
  • 31.
    dxtc.exe 2/4
    Deviation at ( 21, 30): 1.562500 rms
    Deviation at ( 115, 32): 24.104166 rms
    Deviation at ( 2, 33): 0.854167 rms
    Deviation at ( 102, 33): 2.250000 rms
    Deviation at ( 50, 35): 26.958334 rms
    Deviation at ( 68, 35): 11.937500 rms
    Deviation at ( 115, 36): 0.458333 rms
    Deviation at ( 12, 38): 2.166667 rms
    Deviation at ( 40, 40): 0.270833 rms
    Deviation at ( 86, 43): 0.604167 rms
    Deviation at ( 116, 43): 0.125000 rms
    Deviation at ( 43, 44): 2.250000 rms
    Deviation at ( 54, 44): 4.791667 rms
    Deviation at ( 46, 46): 2.875000 rms
    Deviation at ( 116, 46): 0.604167 rms
    Deviation at ( 4, 47): 0.708333 rms
    Deviation at ( 117, 48): 0.937500 rms
    Deviation at ( 23, 51): 3.520833 rms
    Deviation at ( 11, 52): 0.041667 rms
    Deviation at ( 67, 54): 5.687500 rms
    Deviation at ( 26, 55): 0.854167 rms
    Deviation at ( 21, 56): 5.000000 rms
    Deviation at ( 24, 56): 0.562500 rms
    Deviation at ( 30, 57): 0.937500 rms
    Deviation at ( 21, 59): 2.541667 rms
    Deviation at ( 120, 59): 0.104167 rms
    Deviation at ( 112, 60): 1.125000 rms
    Deviation at ( 77, 61): 1.083333 rms
  • 32.
    dxtc.exe 3/4
    Deviation at ( 114, 62): 4.958333 rms
    Deviation at ( 78, 66): 0.541667 rms
    Deviation at ( 106, 68): 0.375000 rms
    Deviation at ( 16, 70): 3.104167 rms
    Deviation at ( 10, 71): 0.937500 rms
    Deviation at ( 108, 71): 0.354167 rms
    Deviation at ( 0, 72): 0.854167 rms
    Deviation at ( 118, 72): 5.562500 rms
    Deviation at ( 11, 73): 0.541667 rms
    Deviation at ( 68, 74): 1.937500 rms
    Deviation at ( 70, 76): 1.791667 rms
    Deviation at ( 124, 76): 3.354167 rms
    Deviation at ( 103, 78): 0.375000 rms
    Deviation at ( 127, 78): 0.541667 rms
    Deviation at ( 108, 79): 0.083333 rms
    Deviation at ( 120, 81): 0.541667 rms
    Deviation at ( 43, 82): 24.979166 rms
    Deviation at ( 67, 82): 3.125000 rms
    Deviation at ( 78, 82): 2.437500 rms
    Deviation at ( 123, 84): 0.541667 rms
    Deviation at ( 127, 85): 0.187500 rms
    Deviation at ( 122, 87): 0.083333 rms
    Deviation at ( 124, 87): 0.541667 rms
    Deviation at ( 127, 88): 0.229167 rms
    Deviation at ( 93, 91): 0.666667 rms
    Deviation at ( 115, 93): 0.083333 rms
    Deviation at ( 69, 95): 1.875000 rms
    Deviation at ( 106, 95): 1.125000 rms
  • 33.
    dxtc.exe 4/4
    Deviation at ( 107, 95): 3.708333 rms
    Deviation at ( 13, 96): 1.354167 rms
    Deviation at ( 115, 98): 0.187500 rms
    Deviation at ( 118, 98): 0.187500 rms
    Deviation at ( 116, 101): 0.187500 rms
    Deviation at ( 78, 105): 0.541667 rms
    Deviation at ( 67, 107): 0.708333 rms
    Deviation at ( 74, 107): 0.375000 rms
    Deviation at ( 65, 109): 0.770833 rms
    Deviation at ( 89, 109): 0.708333 rms
    Deviation at ( 118, 109): 3.854167 rms
    Deviation at ( 67, 110): 1.083333 rms
    Deviation at ( 88, 111): 0.208333 rms
    Deviation at ( 64, 113): 0.708333 rms
    Deviation at ( 84, 113): 0.333333 rms
    Deviation at ( 88, 113): 0.187500 rms
    Deviation at ( 84, 114): 1.666667 rms
    Deviation at ( 66, 115): 0.770833 rms
    Deviation at ( 19, 118): 5.270833 rms
    Deviation at ( 76, 121): 0.104167 rms
    Deviation at ( 70, 122): 0.708333 rms
    Deviation at ( 91, 122): 0.208333 rms
    Deviation at ( 71, 123): 0.854167 rms
    Deviation at ( 75, 123): 0.854167 rms
    Deviation at ( 61, 124): 0.937500 rms
    Deviation at ( 91, 124): 0.270833 rms
    RMS(reference, result) = 0.015488
    Test passed
  • 34.
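    Summary
    On the GTX 560 Ti (compute capability 2.1), some samples do not run correctly. Requirements this GPU does not meet:
    → MUST support CUDA compute capability 3.0.
    → Requires GPU devices with compute SM 3.5 or higher.
    This evaluation will be continued; the results are recorded here for future reference.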
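The two requirements in the summary can be compared against the compute capability 2.1 reported by deviceQuery. This is a minimal Python sketch of such a check, treating a compute capability as a (major, minor) pair and comparing pairs lexicographically. The function name is hypothetical, and this is not how the samples themselves perform the test.

```python
def meets_requirement(device_cc, required_cc):
    """True if the device's (major, minor) compute capability is at least the required one."""
    return tuple(device_cc) >= tuple(required_cc)


if __name__ == "__main__":
    gtx_560_ti = (2, 1)  # "CUDA Capability Major/Minor version number: 2.1"
    print(meets_requirement(gtx_560_ti, (3, 0)))  # False
    print(meets_requirement(gtx_560_ti, (3, 5)))  # False
    print(meets_requirement(gtx_560_ti, (2, 0)))  # True
```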