NVIDIA® CUDA™ 5.0 Sample Evaluation Result, Part 1

This evaluation will be continued; it is published here for future reference.

    • NVIDIA® CUDA™ 5.0 Sample evaluation result PART Ⅰ
      GPU: GTX 560 Ti
      CPU: i5-3450S (TDP 65W)
      RAM: 16GB
      OS: Windows 7 x64 Ultimate
      Yukio Saitoh | FXFROG.com, 21 April 2013
    • INDEX
      Sample binaries:
       1. alignedTypes.exe
       2. asyncAPI.exe
       3. bandwidthTest.exe
       4. batchCUBLAS.exe
       5. bicubicTexture.exe
       6. bilateralFilter.exe
       7. bindlessTexture.exe / Failure
       8. binomialOptions.exe
       9. BlackScholes.exe
      10. boxFilter.exe
      11. boxFilterNPP.exe
      12. cdpAdvancedQuicksort.exe / Failure
      13. cdpLUDecomposition.exe / Failure
      14. cdpQuadTree.exe / Failure
      15. cdpSimplePrint.exe / Failure
      16. cdpSimpleQuicksort.exe / Failure
      17. clock.exe
    • Sample target path and files
      • C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release
    • alignedTypes.exe 1/2
      [C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\alignedTypes.exe] - Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      [GeForce GTX 560 Ti] has 8 MP(s) x 48 (Cores/MP) = 384 (Cores)
      > Compute scaling value = 1.00
      > Memory Size = 49999872
      Allocating memory...
      Generating host input data array...
      Uploading input data to GPU memory...
      Testing misaligned types...
      uint8... Avg. time: 2.563287 ms / Copy throughput: 18.166525 GB/s. TEST OK
      uint16... Avg. time: 1.429239 ms / Copy throughput: 32.580981 GB/s. TEST OK
      RGBA8_misaligned... Avg. time: 1.766606 ms / Copy throughput: 26.359026 GB/s. TEST OK
      LA32_misaligned... Avg. time: 0.998594 ms / Copy throughput: 46.631585 GB/s. TEST OK
      RGB32_misaligned... Avg. time: 1.273794 ms / Copy throughput: 36.556941 GB/s. TEST OK
      RGBA32_misaligned... Avg. time: 1.703606 ms / Copy throughput: 27.333794 GB/s. TEST OK
    • alignedTypes.exe 2/2
      Testing aligned types...
      RGBA8... Avg. time: 1.131558 ms / Copy throughput: 41.152104 GB/s. TEST OK
      I32... Avg. time: 1.091073 ms / Copy throughput: 42.679095 GB/s. TEST OK
      LA32... Avg. time: 0.952468 ms / Copy throughput: 48.889827 GB/s. TEST OK
      RGB32... Avg. time: 1.431797 ms / Copy throughput: 32.522784 GB/s. TEST OK
      RGBA32... Avg. time: 0.961305 ms / Copy throughput: 48.440401 GB/s. TEST OK
      RGBA32_2... Avg. time: 1.340105 ms / Copy throughput: 34.748032 GB/s. TEST OK
      [alignedTypes] -> Test Results: 0 Failures
    • asyncAPI.exe
      [C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\asyncAPI.exe] - Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      CUDA device [GeForce GTX 560 Ti]
      time spent executing by the GPU: 22.45
      time spent by CPU in CUDA calls: 0.04
      CPU executed 12884 iterations while waiting for GPU to finish
    • bandwidthTest.exe
      [CUDA Bandwidth Test] - Starting...
      Running on...
       Device 0: GeForce GTX 560 Ti
       Quick Mode
      Host to Device Bandwidth, 1 Device(s)
       PINNED Memory Transfers
       Transfer Size (Bytes)   Bandwidth(MB/s)
       33554432                6016.1
      Device to Host Bandwidth, 1 Device(s)
       PINNED Memory Transfers
       Transfer Size (Bytes)   Bandwidth(MB/s)
       33554432                6103.5
      Device to Device Bandwidth, 1 Device(s)
       PINNED Memory Transfers
       Transfer Size (Bytes)   Bandwidth(MB/s)
       33554432                108588.2
    • batchCUBLAS.exe 1/3
      batchCUBLAS Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      ==== Running single kernels ====
      Testing sgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x40000000, 2)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00010011 sec GFLOPS=41.8986
      @@@@ sgemm test OK
      Testing dgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00012166 sec GFLOPS=34.4752
      @@@@ dgemm test OK
      ==== Running N=10 without streams ====
      Testing sgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x00000000, 0)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00030251 sec GFLOPS=138.65
      @@@@ sgemm test OK
      Testing dgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00062913 sec GFLOPS=66.668
      @@@@ dgemm test OK
    • batchCUBLAS.exe 2/3
      ==== Running N=10 with streams ====
      Testing sgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x40000000, 2) beta= (0x40000000, 2)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00030580 sec GFLOPS=137.159
      @@@@ sgemm test OK
      Testing dgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00055826 sec GFLOPS=75.1324
      @@@@ dgemm test OK
    • batchCUBLAS.exe 3/3
      ==== Running N=10 batched ====
      Testing sgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00051843 sec GFLOPS=80.9036
      @@@@ sgemm test OK
      Testing dgemm
      #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
      #### args: lda=128 ldb=128 ldc=128
      ^^^^ elapsed = 0.00065873 sec GFLOPS=63.6729
      @@@@ dgemm test OK
      Test Summary
      0 error(s)
    • bicubicTexture.exe 1/2
      Starting bicubicTexture
      [CUDA BicubicTexture] (OpenGL Mode)
      CUDA device [GeForce GTX 560 Ti] has 8 Multi-Processors
      Loaded lena_bw.pgm, 512 x 512 pixels
      Controls
       =/- : Zoom in/out
       b : Run Benchmark g_FilterMode
       c : Draw Bicubic Spline Curve
       [esc] - Quit
      Press number keys to change filtering g_FilterMode:
       1 : nearest filtering
       2 : bilinear filtering
       3 : bicubic filtering
       4 : fast bicubic filtering
       5 : Catmull-Rom filtering
    • bicubicTexture.exe 2/2
      [CUDA BicubicTexture] (Benchmark Mode)
      time: 0.098 ms, 2673.560320 Mpixels/sec
      > FilterMode[1] = Nearest
      > FilterMode[2] = Bilinear
      > FilterMode[3] = Bicubic
      > FilterMode[4] = Fast Bicubic
      > FilterMode[5] = Catmull-Rom
    • bilateralFilter.exe 1/2
      Loading ../../../3_Imaging/bilateralFilter/data/nature_monte.bmp...
      BMP width: 640
      BMP height: 480
      BMP file loaded successfully!
      Loaded ../../../3_Imaging/bilateralFilter/data/nature_monte.bmp, 640 x 480 pixels
      Found 1 CUDA Capable device(s) supporting CUDA
      Device 0: "GeForce GTX 560 Ti"
       CUDA Runtime Version : 5.0
       CUDA Compute Capability : 2.1
      Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
      Setting active device to 0
      Using device 0: GeForce GTX 560 Ti
      Running Standard Demonstration with GLUT loop...
      Press + and - to change filter width
      Press ] and [ to change number of iterations
      Press e and E to change Euclidean delta
      Press g and G to changle Gaussian delta
      Press a or A to change Animation mode ON/OFF
    • bilateralFilter.exe 2/2
    • bindlessTexture.exe / Failure
      CUDA bindlessTexture Starting...
      No GPU device was found that can support CUDA compute capability 3.0.
    • binomialOptions.exe
      [C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\binomialOptions.exe] - Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      Using single precision...
      Generating input data...
      Running GPU binomial tree...
      Options count : 512
      Time steps : 2048
      binomialOptionsGPU() time: 29.790300 msec
      Options per second : 17186.802203
      Running CPU binomial tree...
      Comparing the results...
      GPU binomial vs. Black-Scholes
      L1 norm: 1.323721E-004
      CPU binomial vs. Black-Scholes
      L1 norm: 1.045245E-004
      CPU binomial vs. GPU binomial
      L1 norm: 3.391858E-005
      Shutting down...
      Test passed
    • BlackScholes.exe 1/2
      [C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\BlackScholes.exe] - Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      Initializing data...
      ...allocating CPU memory for options.
      ...allocating GPU memory for options.
      ...generating input data in CPU mem.
      ...copying input data to GPU mem.
      Data init done.
      Executing Black-Scholes GPU kernel (512 iterations)...
      Options count : 8000000
      BlackScholesGPU() time : 0.806277 msec
      Effective memory bandwidth: 99.221508 GB/s
      Gigaoptions per second : 9.922151
      BlackScholes, Throughput = 9.9222 GOptions/s, Time = 0.00081 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
    • BlackScholes.exe 2/2
      Reading back GPU results...
      Checking the results...
      ...running CPU calculations.
      Comparing the results...
      L1 norm: 1.768024E-007
      Max absolute error: 1.120567E-005
      Shutting down...
      ...releasing GPU memory.
      ...releasing CPU memory.
      Shutdown done.
      [BlackScholes] - Test Summary
      Test passed
    • boxFilter.exe
      C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilter.exe Starting...
      Loaded ../../../3_Imaging/boxFilter/data/lenaRGB.ppm, 1024 x 1024 pixels
      Found 1 CUDA Capable device(s) supporting CUDA
      Device 0: "GeForce GTX 560 Ti"
       CUDA Runtime Version : 5.0
       CUDA Compute Capability : 2.1
      Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
      Setting active device to 0
      Running Standard Demonstration with GLUT loop...
      Press + and - to change filter width
      Press ] and [ to change number of iterations
      Press a or A to change animation ON/OFF
    • boxFilterNPP.exe
      C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe Starting...
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      cudaSetDevice GPU0 = GeForce GTX 560 Ti
      NPP Library Version 5.0.35
      C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe using GPU <GeForce GTX 560 Ti> with 8 SM(s) with Compute 2.1
      boxFilterNPP opened: <../../../common/data/Lena.pgm> successfully!
      Saved image: ../../../common/data/Lena_boxFilter.pgm
    • cdpAdvancedQuicksort.exe / Failure
      GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
      cdpAdvancedQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
    • cdpLUDecomposition.exe / Failure
      Starting LU Decomposition (CUDA Dynamic Parallelism)
      GPU device GeForce GTX 560 Ti has compute capabilities (SM 2.1)
      cdpLUDecomposition requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
    • cdpQuadTree.exe / Failure
      GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
      cdpQuadTree requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
    • cdpSimplePrint.exe / Failure
      starting Simple Print (CUDA Dynamic Parallelism)
      GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
      cdpSimplePrint requires GPU devices with compute SM 3.5 or higher. Exiting...
    • cdpSimpleQuicksort.exe / Failure
      GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
      cdpSimpleQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
    • clock.exe
      CUDA Clock sample
      GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
      Total clocks = 15204
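    • Summary
      On the GTX 560 Ti (compute capability 2.1), some samples do not run:
      → bindlessTexture requires a GPU with CUDA compute capability 3.0 or higher.
      → The CUDA Dynamic Parallelism samples (cdpAdvancedQuicksort, cdpLUDecomposition, cdpQuadTree, cdpSimplePrint, cdpSimpleQuicksort) require compute capability (SM) 3.5 or higher.
      This evaluation will be continued; it is published here for future reference.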