NVIDIA® CUDA™ 5.0
Sample evaluation result
PART Ⅱ
GPU: GTX 560 Ti
CPU: i5-3450S (TDP65W)
RAM: 16GB
OS: Windows 7 x64 Ultimate
Yukio Saitoh | FXFROG.com
24/Apr/2013
INDEX
Sample binary :
19. concurrentKernels
20. conjugateGradient
21. concurrentKernels
22. conjugateGradient
23. conjugateGradientPrecond
24. convolutionFFT2D
25. convolutionSeparable
26. convolutionTexture
27. cppIntegration
28. cudaDecodeD3D9 (runaway)
29. cudaDecodeGL
30. cudaEncode (runaway)
31. dct8x8
32. deviceQuery
33. deviceQueryDrv
34. dwtHaar1D
35. dxtc
Sample target path and files
• C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release
concurrentKernels.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥concurrentKernels.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> Detected Compute SM 2.1 hardware with 8 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.010s
Test passed
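The two "Expected time" figures above follow from simple arithmetic. The sketch below assumes each kernel runs for about 10 ms (inferred here from 0.080 s / 8, not stated in the log) and that concurrent kernels overlap completely; the function name is hypothetical.

```python
def expected_times(num_kernels, kernel_seconds):
    """Return (serial, concurrent) expected wall times in seconds.

    Serial: the kernels run back to back.
    Concurrent: all kernels overlap fully, so the total is roughly the
    duration of a single kernel (an idealization).
    """
    serial = num_kernels * kernel_seconds
    concurrent = kernel_seconds
    return serial, concurrent


serial, concurrent = expected_times(8, 0.010)
print(f"serial = {serial:.3f}s, concurrent = {concurrent:.3f}s")
```

The measured 0.010 s matches the concurrent estimate, which suggests that all 8 kernels overlapped on this device.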
conjugateGradient.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
iteration = 1, residual = 4.451374e+001
iteration = 2, residual = 3.248658e+000
iteration = 3, residual = 2.695777e-001
iteration = 4, residual = 2.314586e-002
iteration = 5, residual = 1.997625e-003
iteration = 6, residual = 1.852079e-004
iteration = 7, residual = 1.705767e-005
iteration = 8, residual = 1.618583e-006
Test Summary: Error amount = 0.000000
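The residual sequence above is typical of the conjugate gradient method. As a rough CPU-side illustration only (not the sample's GPU code), the sketch below runs plain conjugate gradient on a small tridiagonal system. The matrix, its size, and the 1e-5 tolerance are assumptions chosen for the example.

```python
import math


def conjugate_gradient(matvec, b, tol=1e-5, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A given as matvec(x)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    history = []                     # residual norm after each iteration
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        history.append(math.sqrt(rs_new))
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, history


def tridiag_matvec(x):
    """Hypothetical test matrix: 4 on the diagonal, -1 on both off-diagonals."""
    n = len(x)
    return [
        4.0 * x[i]
        - (x[i - 1] if i > 0 else 0.0)
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


b = [1.0] * 16
x, history = conjugate_gradient(tridiag_matvec, b)
for k, res in enumerate(history, start=1):
    print(f"iteration = {k:3d}, residual = {res:e}")
```

As in the log, the residual norm shrinks at every iteration until it falls below the tolerance.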
conjugateGradientPrecond.exe
conjugateGradientPrecond starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
GPU selected Device ID = 0
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
laplace dimension = 128
Convergence of conjugate gradient without preconditioning:
iteration = 542, residual = 8.660636e-013
Convergence Test: OK
Convergence of conjugate gradient using incomplete LU preconditioning:
iteration = 188, residual = 9.056491e-013
Convergence Test: OK
Test Summary:
Counted total of 0 errors
qaerr1 = 0.000004 qaerr2 = 0.000003
convolutionFFT2D.exe 1/2
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionFFT2D.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Testing built-in R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating R2C & C2R FFT plans for 2048 x 2048
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms)
...reading back GPU convolution results
...running reference CPU convolution
...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007)
L2norm Error OK
...shutting down
Testing custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007)
L2norm Error OK
...shutting down
convolutionFFT2D.exe 2/2
Testing updated custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007)
L2norm Error OK
...shutting down
Test Summary: 0 errors
Test passed
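The MPix/s figures above can be cross-checked against the reported times. Multiplying each rate by its time gives the number of pixels the sample counted. The three runs all come out close to 4,000,000 pixels, which would correspond to a 2000 x 2000 input padded to the 2048 x 2048 FFT size. That input size is an inference from the numbers, not something the log states, and the helper names are hypothetical.

```python
def mpix_per_second(pixels, milliseconds):
    """Throughput in megapixels per second."""
    return pixels * 1e-6 / (milliseconds * 1e-3)


def pixels_from_report(mpix_s, milliseconds):
    """Invert the throughput formula to recover the pixel count."""
    return mpix_s * 1e6 * milliseconds * 1e-3


# (MPix/s, ms) pairs copied from the three convolutionFFT2D runs above.
reported = [
    (1267.922657, 3.154767),   # built-in R2C / C2R
    (1261.058719, 3.171938),   # custom R2C / C2R
    (1588.813202, 2.517602),   # updated custom R2C / C2R
]

for rate, ms in reported:
    print(f"{rate:12.6f} MPix/s x {ms:.6f} ms -> {pixels_from_report(rate, ms):,.0f} pixels")
```

By the same arithmetic, the updated custom path is about 1.25 times faster than the built-in path (3.154767 ms versus 2.517602 ms).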
convolutionSeparable.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionSeparable.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
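The log above shows a row pass and a column pass being checked against CPU references. The idea behind a separable convolution is that a 2D filter built from one 1D kernel can be applied as a row pass followed by a column pass. The sketch below is a small CPU illustration of that equivalence. The image, the kernel, and the zero-padding at the borders are assumptions for the example, not the sample's data.

```python
def convolve_rows(image, kernel):
    """Convolve every row with a 1-D kernel of odd length (zero outside the image)."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for k in range(-radius, radius + 1):
                xx = x + k
                if 0 <= xx < width:
                    total += image[y][xx] * kernel[radius - k]
            out[y][x] = total
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def convolve_separable(image, kernel):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    row_pass = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(row_pass), kernel))


def convolve_2d_direct(image, kernel):
    """Reference: full 2-D convolution with the outer product of the kernel."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for i in range(-radius, radius + 1):
                for j in range(-radius, radius + 1):
                    yy, xx = y + i, x + j
                    if 0 <= yy < height and 0 <= xx < width:
                        total += image[yy][xx] * kernel[radius - i] * kernel[radius - j]
            out[y][x] = total
    return out


image = [[float((3 * y + 5 * x) % 7) for x in range(6)] for y in range(5)]
kernel = [0.2, 0.5, 0.3]
separable = convolve_separable(image, kernel)
direct = convolve_2d_direct(image, kernel)
max_diff = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print(f"max difference = {max_diff:.3e}")
```

For a kernel of length n, the two passes cost about 2n multiplications per pixel instead of n squared, which is why the separable form is preferred when the filter allows it.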
convolutionTexture.exe
[C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionTexture.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Initializing data...
Running GPU rows convolution (10 identical iterations)...
Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s
Copying convolutionRowGPU() output back to the texture...
cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s
Running GPU columns convolution (10 iterations)
Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s
Reading back GPU results...
Checking the results...
...running convolutionRowsCPU()
...running convolutionColumnsCPU()
Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
cppIntegration.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Hello World.
Hello World.
cudaDecodeD3D9.exe (runaway)
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
cudaDecodeGL.exe 1/2
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
[cudaDecodeGL]: input file: <../../../3_Imaging/cudaDecodeGL/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
> Device 0: <GeForce GTX 560 Ti >, Compute SM 2.1 detected
-> GPU 0: < GeForce GTX 560 Ti > driver mode is: WDDM
>> initGL() creating window [1280 x 720]
> Using CUDA/GL Device [0]: GeForce GTX 560 Ti
> Using GPU Device: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6660) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6210) = < Passthru_drvapi >
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeGL.exe 2/2
setTextureFilterMode(GL_NEAREST,GL_NEAREST)
ImageGL::CUcontext = 02047fd0
ImageGL::CUdevice = 00000000
reshape() glViewport(0, 0, 1280, 720)
[cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 98854.47 (ms) ]
[cudaDecodeGL] - [Frame: 0032, 736.9 fps, frame time: 1.36 (ms) ]
[cudaDecodeGL] - [Frame: 0048, 687.3 fps, frame time: 1.45 (ms) ]
[cudaDecodeGL] - [Frame: 0064, 788.9 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0080, 748.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0096, 724.5 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0112, 747.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0128, 738.9 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0144, 749.4 fps, frame time: 1.33 (ms) ]
[cudaDecodeGL] - [Frame: 0160, 764.7 fps, frame time: 1.31 (ms) ]
[cudaDecodeGL] - [Frame: 0176, 802.6 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0192, 766.6 fps, frame time: 1.30 (ms) ]
[cudaDecodeGL] - [Frame: 0208, 827.8 fps, frame time: 1.21 (ms) ]
[cudaDecodeGL] - [Frame: 0224, 774.1 fps, frame time: 1.29 (ms) ]
[cudaDecodeGL] - [Frame: 0240, 793.3 fps, frame time: 1.26 (ms) ]
[cudaDecodeGL] - [Frame: 0256, 742.5 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0272, 789.0 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0288, 803.1 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0304, 723.6 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0320, 728.5 fps, frame time: 1.37 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.440
Frames Presented (inc repeats) = 326
Average Present Rate (fps) = 739.44
Frames Decoded (hardware) = 327
Average Rate of Decoding (fps) = 741.71
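The per-frame and average figures in the cudaDecodeGL output above are related by simple arithmetic: a frame time in milliseconds converts to frames per second as 1000 / ms, and the average rate is the frame count divided by the elapsed time. The sketch below uses hypothetical helper names. The results only match the log approximately, because the logged times are rounded.

```python
def fps_from_frame_time(milliseconds):
    """Instantaneous frame rate implied by one frame time."""
    return 1000.0 / milliseconds


def average_fps(frames, seconds):
    """Average rate over a whole run."""
    return frames / seconds


# Frame 0032 in the log: 1.36 ms per frame, reported as 736.9 fps.
print(f"{fps_from_frame_time(1.36):.1f} fps")

# Statistics block: 327 frames decoded in a reported 0.440 s.
print(f"{average_fps(327, 0.440):.2f} fps")
```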
cudaDecodeD3D9.exe 1/2
Command Line Arguments:
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
[cudaDecodeD3D9]: input file: <../../../3_Imaging/cudaDecodeD3D9/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
> Using GPU Device 0: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx> initialized SUCCESS!
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <NV12ToARGB_drvapi, 0x04439d20>
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <Passthru_drvapi, 0x044398d0>
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeD3D9.exe 2/2
[cudaDecodeD3D9] - [Frame: 0016, 833.6 fps, time: 1.20 (ms) ]
[cudaDecodeD3D9] - [Frame: 0032, 1031.0 fps, time: 0.97 (ms) ]
[cudaDecodeD3D9] - [Frame: 0048, 843.8 fps, time: 1.19 (ms) ]
[cudaDecodeD3D9] - [Frame: 0064, 864.4 fps, time: 1.16 (ms) ]
[cudaDecodeD3D9] - [Frame: 0080, 850.9 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0096, 819.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0112, 844.0 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0128, 815.6 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] - [Frame: 0144, 821.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0160, 874.7 fps, time: 1.14 (ms) ]
[cudaDecodeD3D9] - [Frame: 0176, 960.4 fps, time: 1.04 (ms) ]
[cudaDecodeD3D9] - [Frame: 0192, 947.7 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0208, 896.7 fps, time: 1.12 (ms) ]
[cudaDecodeD3D9] - [Frame: 0224, 872.5 fps, time: 1.15 (ms) ]
[cudaDecodeD3D9] - [Frame: 0240, 922.7 fps, time: 1.08 (ms) ]
[cudaDecodeD3D9] - [Frame: 0256, 943.2 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0272, 936.6 fps, time: 1.07 (ms) ]
[cudaDecodeD3D9] - [Frame: 0288, 899.8 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0304, 901.0 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0320, 813.1 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.375
Frames Presented (inc repeats) = 326
Average Present FPS = 868.73
Frames Decoded (hardware) = 327
Average Decoder FPS = 871.40
cudaEncode.exe (runaway)
Starting cudaEncode...
[ CUDA H.264 Encoder ]
argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaEncode.exe
dct8x8.exe
dct8x8.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
CUDA sample DCT/IDCT implementation
===================================
Loading test image: barbara.bmp... [512 x 512]... Success
Running Gold 1 (CPU) version... Success
Running Gold 2 (CPU) version... Success
Running CUDA 1 (GPU) version... Success
Running CUDA 2 (GPU) version... 10459.499992 MPix/s //0.025063 ms
Success
Running CUDA short (GPU) version... Success
Dumping result to barbara_gold1.bmp... Success
Dumping result to barbara_gold2.bmp... Success
Dumping result to barbara_cuda1.bmp... Success
Dumping result to barbara_cuda2.bmp... Success
Dumping result to barbara_cuda_short.bmp... Success
Processing time (CUDA 1) : 0.209782 ms
Processing time (CUDA 2) : 0.025063 ms
Processing time (CUDA short): 0.170617 ms
PSNR Original <---> CPU(Gold 1) : 32.777073
PSNR Original <---> CPU(Gold 2) : 32.777046
PSNR Original <---> GPU(CUDA 1) : 32.777092
PSNR Original <---> GPU(CUDA 2) : 32.777077
PSNR Original <---> GPU(CUDA short): 32.749447
PSNR CPU(Gold 1) <---> GPU(CUDA 1) : 64.019310
PSNR CPU(Gold 2) <---> GPU(CUDA 2) : 71.777740
PSNR CPU(Gold 2) <---> GPU(CUDA short): 42.258053
Test Summary...
Test passed
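The PSNR lines above compare pairs of images. As a reminder of what the figure means, the sketch below uses the usual definition for 8-bit images, PSNR = 10 log10(255^2 / MSE). Whether the sample computes it in exactly this way is an assumption, since the log does not show its code. The pixel values are made up for the example.

```python
import math


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)


def mse_from_psnr(psnr_db, peak=255.0):
    """Invert the definition: mean squared error implied by a PSNR value."""
    return peak * peak / (10.0 ** (psnr_db / 10.0))


original = [52, 55, 61, 66]          # hypothetical pixel values
reconstructed = [54, 55, 60, 66]
print(f"PSNR = {psnr(original, reconstructed):.2f} dB")

# Mean squared error implied by the "Original <---> CPU(Gold 1)" line.
print(f"MSE  = {mse_from_psnr(32.777073):.2f}")
```

Read this way, the roughly 32.78 dB between the original and every DCT/IDCT result reflects the loss introduced by the processing itself, while the much higher 64 to 72 dB between matching CPU and GPU versions indicates that the two implementations agree closely. The lower 42.26 dB for the short variant is in line with its reduced precision.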
dct8x8.exe / result
barbara_cuda_short.bmp
dct8x8.exe / result
barbara_cuda1.bmp
dct8x8.exe / result
barbara_cuda2.bmp
dct8x8.exe / result
barbara_gold1.bmp
dct8x8.exe / result
barbara_gold2.bmp
deviceQuery.exe 1/2
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQuery.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
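Several derived figures follow from the deviceQuery values above. The sketch below recomputes the core count and the resident warps per multiprocessor, and estimates a theoretical peak memory bandwidth. The bandwidth formula assumes double-data-rate memory (two transfers per reported clock), which is an assumption about how to read the "Memory Clock rate" line. The function names are hypothetical.

```python
def total_cuda_cores(multiprocessors, cores_per_mp):
    return multiprocessors * cores_per_mp


def max_resident_warps(threads_per_mp, warp_size):
    return threads_per_mp // warp_size


def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 assumes double-data-rate memory.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


print(total_cuda_cores(8, 48), "CUDA cores")
print(max_resident_warps(1536, 32), "resident warps per multiprocessor")
print(f"{peak_bandwidth_gb_s(2050, 256):.1f} GB/s theoretical peak bandwidth")
```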
deviceQueryDrv.exe 1/2
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQueryDrv.exe Starting...
CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version: 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQueryDrv.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
dwtHaar1D.exe
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dwtHaar1D.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
reference file = "result.dat"
gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
Writing result to "result.dat"
Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Test success!
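The sample above applies a 1D Haar wavelet transform to the signal file. As a rough CPU illustration of the technique, the sketch below performs a full orthonormal Haar decomposition of a short signal. The normalization by sqrt(2) and the output ordering (coarsest coefficients first) are assumptions, so the layout may differ from the sample's result.dat. The input uses the first four values listed in Signal.dat.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal):
    """Full decomposition; the signal length must be a power of two."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)    # keep coarser levels in front
    coefficients = list(approx)
    for detail in details:
        coefficients.extend(detail)
    return coefficients


signal = [9.5012929e-001, 2.3113851e-001, 6.0684258e-001, 4.8598247e-001]
coefficients = haar_decompose(signal)
print(coefficients)
```

With this normalization the transform is orthonormal, so the sum of squares of the coefficients equals that of the input signal.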
Signal.dat
9.5012929e-001
2.3113851e-001
6.0684258e-001
4.8598247e-001
8.9129897e-001
・
・
・
Regression.gold.dat
Result.dat
dxtc.exe
C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dxtc.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
Running DXT Compression on 512 x 512 image...
16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
dxtc.exe 1/4
Checking accuracy...
Deviation at ( 9, 1): 0.791667 rms
Deviation at ( 99, 1): 1.041667 rms
Deviation at ( 12, 2): 0.937500 rms
Deviation at ( 90, 3): 0.166667 rms
Deviation at ( 38, 4): 1.916667 rms
Deviation at ( 34, 7): 1.687500 rms
Deviation at ( 57, 7): 0.458333 rms
Deviation at ( 100, 8): 2.416667 rms
Deviation at ( 30, 9): 2.375000 rms
Deviation at ( 31, 9): 0.770833 rms
Deviation at ( 58, 9): 0.791667 rms
Deviation at ( 29, 10): 0.020833 rms
Deviation at ( 79, 10): 1.833333 rms
Deviation at ( 13, 11): 1.041667 rms
Deviation at ( 4, 13): 8.562500 rms
Deviation at ( 28, 13): 0.562500 rms
Deviation at ( 90, 13): 0.708333 rms
Deviation at ( 25, 14): 0.520833 rms
Deviation at ( 69, 14): 0.770833 rms
Deviation at ( 87, 16): 0.708333 rms
Deviation at ( 90, 17): 1.041667 rms
Deviation at ( 24, 19): 0.916667 rms
Deviation at ( 25, 19): 0.625000 rms
Deviation at ( 26, 19): 1.041667 rms
Deviation at ( 55, 20): 4.791667 rms
Deviation at ( 20, 23): 1.541667 rms
Deviation at ( 99, 23): 3.312500 rms
Deviation at ( 45, 24): 18.104166 rms
Deviation at ( 8, 28): 0.895833 rms
dxtc.exe 2/4
Deviation at ( 21, 30): 1.562500 rms
Deviation at ( 115, 32): 24.104166 rms
Deviation at ( 2, 33): 0.854167 rms
Deviation at ( 102, 33): 2.250000 rms
Deviation at ( 50, 35): 26.958334 rms
Deviation at ( 68, 35): 11.937500 rms
Deviation at ( 115, 36): 0.458333 rms
Deviation at ( 12, 38): 2.166667 rms
Deviation at ( 40, 40): 0.270833 rms
Deviation at ( 86, 43): 0.604167 rms
Deviation at ( 116, 43): 0.125000 rms
Deviation at ( 43, 44): 2.250000 rms
Deviation at ( 54, 44): 4.791667 rms
Deviation at ( 46, 46): 2.875000 rms
Deviation at ( 116, 46): 0.604167 rms
Deviation at ( 4, 47): 0.708333 rms
Deviation at ( 117, 48): 0.937500 rms
Deviation at ( 23, 51): 3.520833 rms
Deviation at ( 11, 52): 0.041667 rms
Deviation at ( 67, 54): 5.687500 rms
Deviation at ( 26, 55): 0.854167 rms
Deviation at ( 21, 56): 5.000000 rms
Deviation at ( 24, 56): 0.562500 rms
Deviation at ( 30, 57): 0.937500 rms
Deviation at ( 21, 59): 2.541667 rms
Deviation at ( 120, 59): 0.104167 rms
Deviation at ( 112, 60): 1.125000 rms
Deviation at ( 77, 61): 1.083333 rms
dxtc.exe 3/4
Deviation at ( 114, 62): 4.958333 rms
Deviation at ( 78, 66): 0.541667 rms
Deviation at ( 106, 68): 0.375000 rms
Deviation at ( 16, 70): 3.104167 rms
Deviation at ( 10, 71): 0.937500 rms
Deviation at ( 108, 71): 0.354167 rms
Deviation at ( 0, 72): 0.854167 rms
Deviation at ( 118, 72): 5.562500 rms
Deviation at ( 11, 73): 0.541667 rms
Deviation at ( 68, 74): 1.937500 rms
Deviation at ( 70, 76): 1.791667 rms
Deviation at ( 124, 76): 3.354167 rms
Deviation at ( 103, 78): 0.375000 rms
Deviation at ( 127, 78): 0.541667 rms
Deviation at ( 108, 79): 0.083333 rms
Deviation at ( 120, 81): 0.541667 rms
Deviation at ( 43, 82): 24.979166 rms
Deviation at ( 67, 82): 3.125000 rms
Deviation at ( 78, 82): 2.437500 rms
Deviation at ( 123, 84): 0.541667 rms
Deviation at ( 127, 85): 0.187500 rms
Deviation at ( 122, 87): 0.083333 rms
Deviation at ( 124, 87): 0.541667 rms
Deviation at ( 127, 88): 0.229167 rms
Deviation at ( 93, 91): 0.666667 rms
Deviation at ( 115, 93): 0.083333 rms
Deviation at ( 69, 95): 1.875000 rms
Deviation at ( 106, 95): 1.125000 rms
dxtc.exe 4/4
Deviation at ( 107, 95): 3.708333 rms
Deviation at ( 13, 96): 1.354167 rms
Deviation at ( 115, 98): 0.187500 rms
Deviation at ( 118, 98): 0.187500 rms
Deviation at ( 116, 101): 0.187500 rms
Deviation at ( 78, 105): 0.541667 rms
Deviation at ( 67, 107): 0.708333 rms
Deviation at ( 74, 107): 0.375000 rms
Deviation at ( 65, 109): 0.770833 rms
Deviation at ( 89, 109): 0.708333 rms
Deviation at ( 118, 109): 3.854167 rms
Deviation at ( 67, 110): 1.083333 rms
Deviation at ( 88, 111): 0.208333 rms
Deviation at ( 64, 113): 0.708333 rms
Deviation at ( 84, 113): 0.333333 rms
Deviation at ( 88, 113): 0.187500 rms
Deviation at ( 84, 114): 1.666667 rms
Deviation at ( 66, 115): 0.770833 rms
Deviation at ( 19, 118): 5.270833 rms
Deviation at ( 76, 121): 0.104167 rms
Deviation at ( 70, 122): 0.708333 rms
Deviation at ( 91, 122): 0.208333 rms
Deviation at ( 71, 123): 0.854167 rms
Deviation at ( 75, 123): 0.854167 rms
Deviation at ( 61, 124): 0.937500 rms
Deviation at ( 91, 124): 0.270833 rms
RMS(reference, result) = 0.015488
Test passed
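The launch and throughput figures printed at the start of the dxtc output above can be reproduced by hand. DXT compression works on 4 x 4 pixel blocks. The sketch below assumes one thread block of 64 threads per 4 x 4 pixel block, which is inferred from the "16384 Blocks, 64 Threads per Block" line rather than from the sample's source. The helper names are hypothetical.

```python
def dxt_launch_shape(width, height, block_edge=4, threads_per_block=64):
    """Return (number of blocks, total threads) for one block per 4x4 pixel tile."""
    blocks = (width // block_edge) * (height // block_edge)
    return blocks, blocks * threads_per_block


def mpixels_per_second(pixels, seconds):
    return pixels / seconds / 1e6


blocks, threads = dxt_launch_shape(512, 512)
print(f"{blocks} blocks, {threads} threads in grid")

# 262144 pixels in a reported 0.01481 s (the logged time is rounded).
print(f"{mpixels_per_second(512 * 512, 0.01481):.4f} MPixels/s")
```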
Summary
On the GTX 560 Ti, some samples do not work correctly.
→ Some must run on a GPU that supports CUDA compute capability 3.0.
→ Others require GPU devices with compute SM 3.5 or higher.
This evaluation will be continued, for future reference.

More Related Content

What's hot

GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel Computing
Jun Young Park
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
Martin Peniak
 
Kernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power ManagementKernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power Management
Anne Nicolas
 
Java gpu computing
Java gpu computingJava gpu computing
Java gpu computing
Arjan Lamers
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New World
Can Ozdoruk
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009
Randall Hand
 
Gc crash course (1)
Gc crash course (1)Gc crash course (1)
Gc crash course (1)
Tier1 app
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Ontico
 
CUDA
CUDACUDA
GC Tuning & Troubleshooting Crash Course
GC Tuning & Troubleshooting Crash CourseGC Tuning & Troubleshooting Crash Course
GC Tuning & Troubleshooting Crash Course
Tier1 app
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012DefCamp
 
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATORQGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
NVIDIA Japan
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
Savith Satheesh
 
Optimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTES
Subhajit Sahu
 
Pc De Mis Suenos
Pc De Mis SuenosPc De Mis Suenos
Pc De Mis Suenosslayn123
 

What's hot (18)

GPU-Accelerated Parallel Computing
GPU-Accelerated Parallel ComputingGPU-Accelerated Parallel Computing
GPU-Accelerated Parallel Computing
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
Kernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power ManagementKernel Recipes 2015: Introduction to Kernel Power Management
Kernel Recipes 2015: Introduction to Kernel Power Management
 
Java gpu computing
Java gpu computingJava gpu computing
Java gpu computing
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New World
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009
 
Gc crash course (1)
Gc crash course (1)Gc crash course (1)
Gc crash course (1)
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 
CUDA
CUDACUDA
CUDA
 
GC Tuning & Troubleshooting Crash Course
GC Tuning & Troubleshooting Crash CourseGC Tuning & Troubleshooting Crash Course
GC Tuning & Troubleshooting Crash Course
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
 
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATORQGATE 0.3: QUANTUM CIRCUIT SIMULATOR
QGATE 0.3: QUANTUM CIRCUIT SIMULATOR
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
Optimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTES
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 
Pc De Mis Suenos
Pc De Mis SuenosPc De Mis Suenos
Pc De Mis Suenos
 

Viewers also liked

GESTION DEL CAMBIO
GESTION DEL CAMBIOGESTION DEL CAMBIO
GESTION DEL CAMBIOTania Osita
 
Latarbelangan lahirnya ORBA dan menguatnya ORBA
Latarbelangan lahirnya ORBA dan menguatnya ORBALatarbelangan lahirnya ORBA dan menguatnya ORBA
Latarbelangan lahirnya ORBA dan menguatnya ORBA
aswansetiawan
 
Video Marketing Mastery: YouTube and Google Hangouts
Video Marketing Mastery: YouTube and Google HangoutsVideo Marketing Mastery: YouTube and Google Hangouts
Video Marketing Mastery: YouTube and Google Hangouts
Lou Bortone
 
Enterprise zones do they create or transfer value
Enterprise zones do they create or transfer valueEnterprise zones do they create or transfer value
Enterprise zones do they create or transfer value
Simon Wainwright
 
La dynamique de l’épidémie de vih en tunisie
La  dynamique de l’épidémie de vih en tunisieLa  dynamique de l’épidémie de vih en tunisie
La dynamique de l’épidémie de vih en tunisieclac.cab
 
Attitudes, motivation, and second language learning
Attitudes, motivation, and second language learningAttitudes, motivation, and second language learning
Attitudes, motivation, and second language learning
Alexis Viera
 
Liabilities & assets 2010 11
Liabilities & assets 2010 11Liabilities & assets 2010 11
Liabilities & assets 2010 11karmapath
 

Viewers also liked (9)

GESTION DEL CAMBIO
GESTION DEL CAMBIOGESTION DEL CAMBIO
GESTION DEL CAMBIO
 
Latarbelangan lahirnya ORBA dan menguatnya ORBA
Latarbelangan lahirnya ORBA dan menguatnya ORBALatarbelangan lahirnya ORBA dan menguatnya ORBA
Latarbelangan lahirnya ORBA dan menguatnya ORBA
 
Video Marketing Mastery: YouTube and Google Hangouts
Video Marketing Mastery: YouTube and Google HangoutsVideo Marketing Mastery: YouTube and Google Hangouts
Video Marketing Mastery: YouTube and Google Hangouts
 
Enterprise zones do they create or transfer value
Enterprise zones do they create or transfer valueEnterprise zones do they create or transfer value
Enterprise zones do they create or transfer value
 
La dynamique de l’épidémie de vih en tunisie
La  dynamique de l’épidémie de vih en tunisieLa  dynamique de l’épidémie de vih en tunisie
La dynamique de l’épidémie de vih en tunisie
 
Attitudes, motivation, and second language learning
Attitudes, motivation, and second language learningAttitudes, motivation, and second language learning
Attitudes, motivation, and second language learning
 
Liabilities & assets 2010 11
Liabilities & assets 2010 11Liabilities & assets 2010 11
Liabilities & assets 2010 11
 
2 web-forms
2 web-forms2 web-forms
2 web-forms
 
Burma
Burma Burma
Burma
 

Similar to Nvidia® cuda™ 5 sample evaluationresult_2

Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
Dilum Bandara
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning

NVIDIA® CUDA™ 5 sample evaluation result 2

  • 1. NVIDIA® CUDA™ 5.0 Sample evaluation result PART Ⅱ
    GPU: GTX 560 Ti
    CPU: i5-3450S (TDP65W)
    RAM: 16GB
    OS: Windows 7 x64 Ultimate
    Yukio Saitoh | FXFROG.com
    24/Apr/2013
  • 2. INDEX
    Sample binary:
    19. concurrentKernels
    20. conjugateGradient
    21. concurrentKernels
    22. conjugateGradient
    23. conjugateGradientPrecond
    24. convolutionFFT2D
    25. convolutionSeparable
    26. convolutionTexture
    27. cppIntegration
    28. cudaDecodeD3D9 (runaway)
    29. cudaDecodeGL
    30. cudaEncode (runaway)
    31. dct8x8
    32. deviceQuery
    33. deviceQueryDrv
    34. dwtHaar1D
    35. dxtc
  • 3. Sample target path and files • C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release
  • 4. concurrentKernels.exe
    [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥concurrentKernels.exe] - Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    > Detected Compute SM 2.1 hardware with 8 multi-processors
    Expected time for serial execution of 8 kernels = 0.080s
    Expected time for concurrent execution of 8 kernels = 0.010s
    Measured time for sample = 0.010s
    Test passed
  • 5. conjugateGradient.exe
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    > GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
    iteration = 1, residual = 4.451374e+001
    iteration = 2, residual = 3.248658e+000
    iteration = 3, residual = 2.695777e-001
    iteration = 4, residual = 2.314586e-002
    iteration = 5, residual = 1.997625e-003
    iteration = 6, residual = 1.852079e-004
    iteration = 7, residual = 1.705767e-005
    iteration = 8, residual = 1.618583e-006
    Test Summary: Error amount = 0.000000
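The log above reports the residual norm after each conjugate gradient iteration. As a rough illustration of what such a loop computes, here is a minimal CPU sketch in Python. It is not the sample's code: the test matrix (a symmetric tridiagonal system with 4 on the diagonal and 1 beside it), the right-hand side, the tolerance, and the names `tridiag_matvec` and `conjugate_gradient` are all assumptions made for this example.

```python
import math


def tridiag_matvec(diag, off, x):
    """Multiply a symmetric tridiagonal matrix by x.

    diag holds the main diagonal, off holds the first off-diagonal.
    """
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(diag, off, b, tol=1e-5, max_iter=100):
    """Solve A x = b for a symmetric positive-definite tridiagonal A.

    Returns the solution and a list of (iteration, residual norm) pairs.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    history = []
    for k in range(1, max_iter + 1):
        if math.sqrt(rs_old) <= tol:
            break
        Ap = tridiag_matvec(diag, off, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        history.append((k, math.sqrt(rs_new)))
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, history


# Hypothetical test problem, chosen only so that A is diagonally dominant.
n = 50
diag = [4.0] * n
off = [1.0] * (n - 1)
b = [1.0] * n

x, history = conjugate_gradient(diag, off, b)
for k, res in history:
    print("iteration = %3d, residual = %e" % (k, res))
```

The printed residuals fall from one iteration to the next in the same way as in the log, although the actual values depend on the matrix and are not comparable with the GPU run.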
  • 6. conjugateGradientPrecond.exe
    conjugateGradientPrecond starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    GPU selected Device ID = 0
    > GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
    laplace dimension = 128
    Convergence of conjugate gradient without preconditioning:
    iteration = 542, residual = 8.660636e-013
    Convergence Test: OK
    Convergence of conjugate gradient using incomplete LU preconditioning:
    iteration = 188, residual = 9.056491e-013
    Convergence Test: OK
    Test Summary:
    Counted total of 0 errors
    qaerr1 = 0.000004 qaerr2 = 0.000003
  • 7. convolutionFFT2D.exe 1/2
    [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionFFT2D.exe] - Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Testing built-in R2C / C2R FFT-based convolution
    ...allocating memory
    ...generating random input data
    ...creating R2C & C2R FFT plans for 2048 x 2048
    ...uploading to GPU and padding convolution kernel and input data
    ...transforming convolution kernel
    ...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms)
    ...reading back GPU convolution results
    ...running reference CPU convolution
    ...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007)
    L2norm Error OK
    ...shutting down
    Testing custom R2C / C2R FFT-based convolution
    ...allocating memory
    ...generating random input data
    ...creating C2C FFT plan for 2048 x 1024
    ...uploading to GPU and padding convolution kernel and input data
    ...transforming convolution kernel
    ...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms)
    ...reading back GPU FFT results
    ...running reference CPU convolution
    ...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007)
    L2norm Error OK
    ...shutting down
  • 8. convolutionFFT2D.exe 2/2
    Testing updated custom R2C / C2R FFT-based convolution
    ...allocating memory
    ...generating random input data
    ...creating C2C FFT plan for 2048 x 1024
    ...uploading to GPU and padding convolution kernel and input data
    ...transforming convolution kernel
    ...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms)
    ...reading back GPU FFT results
    ...running reference CPU convolution
    ...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007)
    L2norm Error OK
    ...shutting down
    Test Summary: 0 errors
    Test passed
  • 9. convolutionSeparable.exe
    [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionSeparable.exe] - Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Image Width x Height = 3072 x 3072
    Allocating and initializing host arrays...
    Allocating and initializing CUDA arrays...
    Running GPU convolution (16 identical iterations)...
    convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Work group = 0
    Reading back GPU results...
    Checking the results...
    ...running convolutionRowCPU()
    ...running convolutionColumnCPU()
    ...comparing the results
    ...Relative L2 norm: 0.000000E+000
    Shutting down...
    Test passed
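The throughput line above can be checked by hand from the image size and the reported time. The following Python sketch does that arithmetic; the function name `mpix_per_second` is made up for this example, and the small gap to the logged 3179.0263 MPixels/sec presumably comes from the time being printed with only three significant digits.

```python
def mpix_per_second(pixels, seconds):
    """Throughput in millions of pixels per second."""
    return pixels / seconds / 1e6


# Values copied from the convolutionSeparable log above.
width, height = 3072, 3072
time_s = 0.00297
logged_mpix = 3179.0263

pixels = width * height
throughput = mpix_per_second(pixels, time_s)

print("pixels     =", pixels)
print("throughput = %.1f MPixels/sec (log: %.4f)" % (throughput, logged_mpix))
```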
  • 10. convolutionTexture.exe
    [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥convolutionTexture.exe] - Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Initializing data...
    Running GPU rows convolution (10 identical iterations)...
    Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s
    Copying convolutionRowGPU() output back to the texture...
    cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s
    Running GPU columns convolution (10 iterations)
    Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s
    Reading back GPU results...
    Checking the results...
    ...running convolutionRowsCPU()
    ...running convolutionColumnsCPU()
    Relative L2 norm: 0.000000E+000
    Shutting down...
    Test passed
  • 11. cppIntegration.exe
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Hello World.
    Hello World.
  • 12. cudaDecodeD3D9.exe (runaway)
    Command Line Arguments:
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
  • 13. cudaDecodeGL.exe 1/2
    [CUDA/OpenGL Video Decode]
    Command Line Arguments:
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
    [cudaDecodeGL]: input file: <../../../3_Imaging/cudaDecodeGL/data/plush1_720p_10s.m2v>
    VideoCodec : MPEG-2
    Frame rate : 30000/1001fps ~ 29.97fps
    Sequence format : Progressive
    Coded frame size: [1280, 720]
    Display area : [0, 0, 1280, 720]
    Chroma format : 4:2:0
    Bitrate : 14116kBit/s
    Aspect ratio : 16:9
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeGL.exe
    > Device 0: <GeForce GTX 560 Ti >, Compute SM 2.1 detected
    -> GPU 0: < GeForce GTX 560 Ti > driver mode is: WDDM
    >> initGL() creating window [1280 x 720]
    > Using CUDA/GL Device [0]: GeForce GTX 560 Ti
    > Using GPU Device: GeForce GTX 560 Ti has SM 2.1 compute capability
    Total amount of global memory: 1024.0000 MB
    >> modInitCTX<NV12ToARGB_drvapi_x64.ptx > initialized OK
    >> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
    CUDA Kernel Function (0x0a4c6660) = < NV12ToARGB_drvapi >
    >> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
    CUDA Kernel Function (0x0a4c6210) = < Passthru_drvapi >
    > VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
  • 14. cudaDecodeGL.exe 2/2
    setTextureFilterMode(GL_NEAREST,GL_NEAREST)
    ImageGL::CUcontext = 02047fd0
    ImageGL::CUdevice = 00000000
    reshape() glViewport(0, 0, 1280, 720)
    [cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 98854.47 (ms) ]
    [cudaDecodeGL] - [Frame: 0032, 736.9 fps, frame time: 1.36 (ms) ]
    [cudaDecodeGL] - [Frame: 0048, 687.3 fps, frame time: 1.45 (ms) ]
    [cudaDecodeGL] - [Frame: 0064, 788.9 fps, frame time: 1.27 (ms) ]
    [cudaDecodeGL] - [Frame: 0080, 748.5 fps, frame time: 1.34 (ms) ]
    [cudaDecodeGL] - [Frame: 0096, 724.5 fps, frame time: 1.38 (ms) ]
    [cudaDecodeGL] - [Frame: 0112, 747.5 fps, frame time: 1.34 (ms) ]
    [cudaDecodeGL] - [Frame: 0128, 738.9 fps, frame time: 1.35 (ms) ]
    [cudaDecodeGL] - [Frame: 0144, 749.4 fps, frame time: 1.33 (ms) ]
    [cudaDecodeGL] - [Frame: 0160, 764.7 fps, frame time: 1.31 (ms) ]
    [cudaDecodeGL] - [Frame: 0176, 802.6 fps, frame time: 1.25 (ms) ]
    [cudaDecodeGL] - [Frame: 0192, 766.6 fps, frame time: 1.30 (ms) ]
    [cudaDecodeGL] - [Frame: 0208, 827.8 fps, frame time: 1.21 (ms) ]
    [cudaDecodeGL] - [Frame: 0224, 774.1 fps, frame time: 1.29 (ms) ]
    [cudaDecodeGL] - [Frame: 0240, 793.3 fps, frame time: 1.26 (ms) ]
    [cudaDecodeGL] - [Frame: 0256, 742.5 fps, frame time: 1.35 (ms) ]
    [cudaDecodeGL] - [Frame: 0272, 789.0 fps, frame time: 1.27 (ms) ]
    [cudaDecodeGL] - [Frame: 0288, 803.1 fps, frame time: 1.25 (ms) ]
    [cudaDecodeGL] - [Frame: 0304, 723.6 fps, frame time: 1.38 (ms) ]
    [cudaDecodeGL] - [Frame: 0320, 728.5 fps, frame time: 1.37 (ms) ]
    [cudaDecodeGL] statistics
    Video Length (hh:mm:ss.msec) = 00:00:00.440
    Frames Presented (inc repeats) = 326
    Average Present Rate (fps) = 739.44
    Frames Decoded (hardware) = 327
    Average Rate of Decoding (fps) = 741.71
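The fps figures in this log relate to the frame times and the statistics block by simple division. The Python sketch below reproduces that relationship from the printed numbers; the helper names are invented for this example, and the results only match the log to within a fraction of a percent, presumably because the log prints rounded times.

```python
def fps_from_frame_time(frame_time_ms):
    """Frames per second implied by a per-frame time in milliseconds."""
    return 1000.0 / frame_time_ms


def average_rate(frames, seconds):
    """Average frames per second over a run."""
    return frames / seconds


# Values copied from the cudaDecodeGL log above.
per_frame_fps = fps_from_frame_time(1.36)   # log line for frame 0032: 736.9 fps
present_fps = average_rate(326, 0.440)      # log: 739.44
decode_fps = average_rate(327, 0.440)       # log: 741.71

print("fps from 1.36 ms frame time: %.1f" % per_frame_fps)
print("average present rate:        %.2f" % present_fps)
print("average decode rate:         %.2f" % decode_fps)
```

The first logged frame (98854.47 ms, 00.0 fps) is far outside the range of the others and is best left out of any average computed this way.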
  • 15. cudaDecodeD3D9.exe 1/2
    Command Line Arguments:
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaDecodeD3D9.exe
    [cudaDecodeD3D9]: input file: <../../../3_Imaging/cudaDecodeD3D9/data/plush1_720p_10s.m2v>
    VideoCodec : MPEG-2
    Frame rate : 30000/1001fps ~ 29.97fps
    Sequence format : Progressive
    Coded frame size: [1280, 720]
    Display area : [0, 0, 1280, 720]
    Chroma format : 4:2:0
    Bitrate : 14116kBit/s
    Aspect ratio : 16:9
    > Using GPU Device 0: GeForce GTX 560 Ti has SM 2.1 compute capability
    Total amount of global memory: 1024.0000 MB
    >> modInitCTX<NV12ToARGB_drvapi_x64.ptx> initialized SUCCESS!
    >> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx> CUDA Kernel Function = <NV12ToARGB_drvapi, 0x04439d20>
    >> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx> CUDA Kernel Function = <Passthru_drvapi, 0x044398d0>
    > VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
  • 16. cudaDecodeD3D9.exe 2/2
    [cudaDecodeD3D9] - [Frame: 0016, 833.6 fps, time: 1.20 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0032, 1031.0 fps, time: 0.97 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0048, 843.8 fps, time: 1.19 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0064, 864.4 fps, time: 1.16 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0080, 850.9 fps, time: 1.18 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0096, 819.0 fps, time: 1.22 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0112, 844.0 fps, time: 1.18 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0128, 815.6 fps, time: 1.23 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0144, 821.0 fps, time: 1.22 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0160, 874.7 fps, time: 1.14 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0176, 960.4 fps, time: 1.04 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0192, 947.7 fps, time: 1.06 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0208, 896.7 fps, time: 1.12 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0224, 872.5 fps, time: 1.15 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0240, 922.7 fps, time: 1.08 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0256, 943.2 fps, time: 1.06 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0272, 936.6 fps, time: 1.07 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0288, 899.8 fps, time: 1.11 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0304, 901.0 fps, time: 1.11 (ms) ]
    [cudaDecodeD3D9] - [Frame: 0320, 813.1 fps, time: 1.23 (ms) ]
    [cudaDecodeD3D9] statistics
    Video Length (hh:mm:ss.msec) = 00:00:00.375
    Frames Presented (inc repeats) = 326
    Average Present FPS = 868.73
    Frames Decoded (hardware) = 327
    Average Decoder FPS = 871.40
  • 17. cudaEncode.exe (runaway)
    Starting cudaEncode...
    [ CUDA H.264 Encoder ]
    argv[0] = C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥cudaEncode.exe
  • 18. dct8x8.exe
    dct8x8.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    CUDA sample DCT/IDCT implementation
    ===================================
    Loading test image: barbara.bmp... [512 x 512]... Success
    Running Gold 1 (CPU) version... Success
    Running Gold 2 (CPU) version... Success
    Running CUDA 1 (GPU) version... Success
    Running CUDA 2 (GPU) version... 10459.499992 MPix/s //0.025063 ms Success
    Running CUDA short (GPU) version... Success
    Dumping result to barbara_gold1.bmp... Success
    Dumping result to barbara_gold2.bmp... Success
    Dumping result to barbara_cuda1.bmp... Success
    Dumping result to barbara_cuda2.bmp... Success
    Dumping result to barbara_cuda_short.bmp... Success
    Processing time (CUDA 1) : 0.209782 ms
    Processing time (CUDA 2) : 0.025063 ms
    Processing time (CUDA short): 0.170617 ms
    PSNR Original <---> CPU(Gold 1) : 32.777073
    PSNR Original <---> CPU(Gold 2) : 32.777046
    PSNR Original <---> GPU(CUDA 1) : 32.777092
    PSNR Original <---> GPU(CUDA 2) : 32.777077
    PSNR Original <---> GPU(CUDA short): 32.749447
    PSNR CPU(Gold 1) <---> GPU(CUDA 1) : 64.019310
    PSNR CPU(Gold 2) <---> GPU(CUDA 2) : 71.777740
    PSNR CPU(Gold 2) <---> GPU(CUDA short): 42.258053
    Test Summary...
    Test passed
  • 24. deviceQuery.exe 1/2
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQuery.exe Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "GeForce GTX 560 Ti"
    CUDA Driver Version / Runtime Version 5.0 / 5.0
    CUDA Capability Major/Minor version number: 2.1
    Total amount of global memory: 1024 MBytes (1073741824 bytes)
    ( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
    GPU Clock rate: 1800 MHz (1.80 GHz)
    Memory Clock rate: 2050 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
  • 25. deviceQuery.exe 2/2
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA): Yes
    Device PCI Bus ID / PCI location ID: 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
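Several of the deviceQuery figures above can be derived from one another. The Python sketch below recomputes the core count, the memory size in MBytes, and the maximum number of resident warps per multiprocessor from the reported values. It also estimates a theoretical peak memory bandwidth; that last step is an assumption, not something deviceQuery prints: it uses the common rule of thumb "memory clock x bus width in bytes x 2" for double-data-rate memory, and the variable names are hypothetical.

```python
# Values copied from the deviceQuery output above
multiprocessors = 8
cores_per_mp = 48
global_mem_bytes = 1073741824
max_threads_per_mp = 1536
warp_size = 32
memory_clock_hz = 2050e6
bus_width_bits = 256

# Derived figures that deviceQuery also reports
total_cores = multiprocessors * cores_per_mp
global_mem_mbytes = global_mem_bytes // (1024 * 1024)

# Upper bound on warps resident on one multiprocessor
max_warps_per_mp = max_threads_per_mp // warp_size

# ASSUMPTION: double data rate, i.e. two transfers per memory clock.
# This is an estimate, not a measured or reported value.
DATA_RATE_FACTOR = 2
peak_bandwidth_gb_s = (
    memory_clock_hz * (bus_width_bits / 8) * DATA_RATE_FACTOR / 1e9
)

print(f"CUDA cores: {total_cores}")
print(f"Global memory: {global_mem_mbytes} MBytes")
print(f"Max warps per multiprocessor: {max_warps_per_mp}")
print(f"Estimated peak memory bandwidth: {peak_bandwidth_gb_s:.1f} GB/s")
```

The first two results should match the "384 CUDA Cores" and "1024 MBytes" lines in the log; the bandwidth figure is only as good as the data-rate assumption.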
  • 26. deviceQueryDrv.exe 1/2
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥deviceQueryDrv.exe Starting...
    CUDA Device Query (Driver API) statically linked version
    Detected 1 CUDA Capable device(s)
    Device 0: "GeForce GTX 560 Ti"
    CUDA Driver Version: 5.0
    CUDA Capability Major/Minor version number: 2.1
    Total amount of global memory: 1024 MBytes (1073741824 bytes)
    ( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
    GPU Clock rate: 1800 MHz (1.80 GHz)
    Memory Clock rate: 2050 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
    Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 32768
    Warp size: 32
  • 27. deviceQueryDrv.exe 2/2
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Maximum sizes of each dimension of a block: 1024 x 1024 x 64
    Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
    Texture alignment: 512 bytes
    Maximum memory pitch: 2147483647 bytes
    Concurrent copy and kernel execution: Yes with 1 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Concurrent kernel execution: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
    Device supports Unified Addressing (UVA): Yes
    Device PCI Bus ID / PCI location ID: 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
  • 28. dwtHaar1D.exe
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dwtHaar1D.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
    reference file = "result.dat"
    gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
    Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
    Writing result to "result.dat"
    Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
    Test success!
    Signal.dat
    9.5012929e-001
    2.3113851e-001
    6.0684258e-001
    4.8598247e-001
    8.9129897e-001
    ・
    ・
    ・
    Regression.gold.dat
    Result.dat
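For readers unfamiliar with what dwtHaar1D computes, here is a minimal CPU sketch of a one-dimensional Haar wavelet decomposition in Python. It is the textbook orthonormal form (pairwise sums and differences scaled by 1/sqrt(2)), written under the assumption that the input length is a power of two. It is not the sample's kernel, and the sample's normalization and output ordering may differ. The function names `haar_step` and `haar_1d` are hypothetical; the input is the first four values shown under Signal.dat.

```python
import math


def haar_step(signal):
    """One decomposition level: pairwise averages and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_1d(signal):
    """Full decomposition: repeat haar_step on the approximation part."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []  # coarsest level first
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.insert(0, detail)
    return approx[0], details


# First four values listed under Signal.dat above
signal = [0.95012929, 0.23113851, 0.60684258, 0.48598247]
coarse, details = haar_1d(signal)
print("coarse coefficient:", coarse)
print("detail coefficients (coarsest first):", details)
```

With this normalization the transform preserves the signal's energy (sum of squares), which gives a quick sanity check on any implementation.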
  • 29. dxtc.exe
    C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥dxtc.exe Starting...
    GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
    Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
    Running DXT Compression on 512 x 512 image...
    16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
    dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
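The launch configuration and throughput printed by dxtc follow from the image size. The Python sketch below reproduces the arithmetic. It relies on the fact that DXT compression works on 4 x 4 pixel blocks and assumes one thread block per pixel block, which is consistent with the logged block count; the variable names are hypothetical.

```python
width, height = 512, 512   # lena_std.ppm dimensions from the log
DXT_BLOCK_DIM = 4          # DXT formats compress 4 x 4 pixel blocks
threads_per_block = 64     # from the log
elapsed_s = 0.01481        # from the log

pixels = width * height
# ASSUMPTION: one thread block is launched per 4 x 4 pixel block
blocks = (width // DXT_BLOCK_DIM) * (height // DXT_BLOCK_DIM)
threads_in_grid = blocks * threads_per_block
throughput_mpix_s = pixels / elapsed_s / 1e6

print(f"{blocks} blocks, {threads_in_grid} threads in grid")
print(f"throughput: {throughput_mpix_s:.4f} MPixels/s")
```

The block and thread counts should match the log exactly; the throughput should agree with the logged 17.7004 MPixels/s up to rounding of the printed time.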
  • 30. dxtc.exe 1/4
    Checking accuracy...
    Deviation at ( 9, 1): 0.791667 rms
    Deviation at ( 99, 1): 1.041667 rms
    Deviation at ( 12, 2): 0.937500 rms
    Deviation at ( 90, 3): 0.166667 rms
    Deviation at ( 38, 4): 1.916667 rms
    Deviation at ( 34, 7): 1.687500 rms
    Deviation at ( 57, 7): 0.458333 rms
    Deviation at ( 100, 8): 2.416667 rms
    Deviation at ( 30, 9): 2.375000 rms
    Deviation at ( 31, 9): 0.770833 rms
    Deviation at ( 58, 9): 0.791667 rms
    Deviation at ( 29, 10): 0.020833 rms
    Deviation at ( 79, 10): 1.833333 rms
    Deviation at ( 13, 11): 1.041667 rms
    Deviation at ( 4, 13): 8.562500 rms
    Deviation at ( 28, 13): 0.562500 rms
    Deviation at ( 90, 13): 0.708333 rms
    Deviation at ( 25, 14): 0.520833 rms
    Deviation at ( 69, 14): 0.770833 rms
    Deviation at ( 87, 16): 0.708333 rms
    Deviation at ( 90, 17): 1.041667 rms
    Deviation at ( 24, 19): 0.916667 rms
    Deviation at ( 25, 19): 0.625000 rms
    Deviation at ( 26, 19): 1.041667 rms
    Deviation at ( 55, 20): 4.791667 rms
    Deviation at ( 20, 23): 1.541667 rms
    Deviation at ( 99, 23): 3.312500 rms
    Deviation at ( 45, 24): 18.104166 rms
    Deviation at ( 8, 28): 0.895833 rms
  • 31. dxtc.exe 2/4
    Deviation at ( 21, 30): 1.562500 rms
    Deviation at ( 115, 32): 24.104166 rms
    Deviation at ( 2, 33): 0.854167 rms
    Deviation at ( 102, 33): 2.250000 rms
    Deviation at ( 50, 35): 26.958334 rms
    Deviation at ( 68, 35): 11.937500 rms
    Deviation at ( 115, 36): 0.458333 rms
    Deviation at ( 12, 38): 2.166667 rms
    Deviation at ( 40, 40): 0.270833 rms
    Deviation at ( 86, 43): 0.604167 rms
    Deviation at ( 116, 43): 0.125000 rms
    Deviation at ( 43, 44): 2.250000 rms
    Deviation at ( 54, 44): 4.791667 rms
    Deviation at ( 46, 46): 2.875000 rms
    Deviation at ( 116, 46): 0.604167 rms
    Deviation at ( 4, 47): 0.708333 rms
    Deviation at ( 117, 48): 0.937500 rms
    Deviation at ( 23, 51): 3.520833 rms
    Deviation at ( 11, 52): 0.041667 rms
    Deviation at ( 67, 54): 5.687500 rms
    Deviation at ( 26, 55): 0.854167 rms
    Deviation at ( 21, 56): 5.000000 rms
    Deviation at ( 24, 56): 0.562500 rms
    Deviation at ( 30, 57): 0.937500 rms
    Deviation at ( 21, 59): 2.541667 rms
    Deviation at ( 120, 59): 0.104167 rms
    Deviation at ( 112, 60): 1.125000 rms
    Deviation at ( 77, 61): 1.083333 rms
  • 32. dxtc.exe 3/4
    Deviation at ( 114, 62): 4.958333 rms
    Deviation at ( 78, 66): 0.541667 rms
    Deviation at ( 106, 68): 0.375000 rms
    Deviation at ( 16, 70): 3.104167 rms
    Deviation at ( 10, 71): 0.937500 rms
    Deviation at ( 108, 71): 0.354167 rms
    Deviation at ( 0, 72): 0.854167 rms
    Deviation at ( 118, 72): 5.562500 rms
    Deviation at ( 11, 73): 0.541667 rms
    Deviation at ( 68, 74): 1.937500 rms
    Deviation at ( 70, 76): 1.791667 rms
    Deviation at ( 124, 76): 3.354167 rms
    Deviation at ( 103, 78): 0.375000 rms
    Deviation at ( 127, 78): 0.541667 rms
    Deviation at ( 108, 79): 0.083333 rms
    Deviation at ( 120, 81): 0.541667 rms
    Deviation at ( 43, 82): 24.979166 rms
    Deviation at ( 67, 82): 3.125000 rms
    Deviation at ( 78, 82): 2.437500 rms
    Deviation at ( 123, 84): 0.541667 rms
    Deviation at ( 127, 85): 0.187500 rms
    Deviation at ( 122, 87): 0.083333 rms
    Deviation at ( 124, 87): 0.541667 rms
    Deviation at ( 127, 88): 0.229167 rms
    Deviation at ( 93, 91): 0.666667 rms
    Deviation at ( 115, 93): 0.083333 rms
    Deviation at ( 69, 95): 1.875000 rms
    Deviation at ( 106, 95): 1.125000 rms
  • 33. dxtc.exe 4/4
    Deviation at ( 107, 95): 3.708333 rms
    Deviation at ( 13, 96): 1.354167 rms
    Deviation at ( 115, 98): 0.187500 rms
    Deviation at ( 118, 98): 0.187500 rms
    Deviation at ( 116, 101): 0.187500 rms
    Deviation at ( 78, 105): 0.541667 rms
    Deviation at ( 67, 107): 0.708333 rms
    Deviation at ( 74, 107): 0.375000 rms
    Deviation at ( 65, 109): 0.770833 rms
    Deviation at ( 89, 109): 0.708333 rms
    Deviation at ( 118, 109): 3.854167 rms
    Deviation at ( 67, 110): 1.083333 rms
    Deviation at ( 88, 111): 0.208333 rms
    Deviation at ( 64, 113): 0.708333 rms
    Deviation at ( 84, 113): 0.333333 rms
    Deviation at ( 88, 113): 0.187500 rms
    Deviation at ( 84, 114): 1.666667 rms
    Deviation at ( 66, 115): 0.770833 rms
    Deviation at ( 19, 118): 5.270833 rms
    Deviation at ( 76, 121): 0.104167 rms
    Deviation at ( 70, 122): 0.708333 rms
    Deviation at ( 91, 122): 0.208333 rms
    Deviation at ( 71, 123): 0.854167 rms
    Deviation at ( 75, 123): 0.854167 rms
    Deviation at ( 61, 124): 0.937500 rms
    Deviation at ( 91, 124): 0.270833 rms
    RMS(reference, result) = 0.015488
    Test passed
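The accuracy check prints one "Deviation at (x, y)" entry per reported block, which is hard to scan by eye. As a convenience, the Python sketch below parses entries in that format and picks out the largest deviation. The regular expression and the function name `parse_deviations` are assumptions based only on the log format shown here, and the excerpt contains just four entries copied from the slides above, so the result applies to the excerpt only.

```python
import re

# Matches entries such as "Deviation at ( 45, 24): 18.104166 rms"
DEVIATION_RE = re.compile(
    r"Deviation at \(\s*(\d+),\s*(\d+)\):\s*([0-9.]+) rms"
)


def parse_deviations(log_text):
    """Return a list of (x, y, rms) tuples found in the log text."""
    return [(int(x), int(y), float(rms))
            for x, y, rms in DEVIATION_RE.findall(log_text)]


# A short excerpt copied from the log above (not the full list)
excerpt = """
Deviation at ( 9, 1): 0.791667 rms
Deviation at ( 45, 24): 18.104166 rms
Deviation at ( 50, 35): 26.958334 rms
Deviation at ( 76, 121): 0.104167 rms
"""

deviations = parse_deviations(excerpt)
worst = max(deviations, key=lambda entry: entry[2])
print(f"{len(deviations)} entries parsed; largest in excerpt: {worst}")
```

Feeding the complete log text to `parse_deviations` would give the same summary for the full run.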
  • 34. Summary
    On the GTX 560 Ti (compute capability 2.1), some samples do not work properly.
    → MUST support CUDA compute capability 3.0.
    → Requires GPU devices with compute SM 3.5 or higher.
    This evaluation is to be continued; the results are kept for future reference.
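The two requirements quoted in the summary can be compared against the compute capability 2.1 that deviceQuery reports for this card. The Python sketch below does the comparison with (major, minor) tuples. It is only an illustration of the version check, not how the samples themselves test the device, and the names `meets_requirement` and `requirements` are hypothetical.

```python
def meets_requirement(device_cc, required_cc):
    """True if the device compute capability is at least the required one.

    Both arguments are (major, minor) tuples; Python compares tuples
    element by element, so (2, 1) < (3, 0) < (3, 5).
    """
    return device_cc >= required_cc


DEVICE_CC = (2, 1)  # GeForce GTX 560 Ti, from the deviceQuery output

# Requirements quoted in the summary above
requirements = {
    "compute capability 3.0": (3, 0),
    "compute SM 3.5 or higher": (3, 5),
}

for label, required in requirements.items():
    status = "OK" if meets_requirement(DEVICE_CC, required) else "not supported"
    print(f"{label}: {status}")
```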