NVIDIA GT520, GT440 SDK MasterLog
- 1. ATTN: yukio.saitoh@gmail.com NVIDIA GT520 SDK MASTER LOG 2011/08/04
FXFROG (C2D E8200, RAM 8GB, Win7 x64)
deviceQuery CUDA Driver = CUDART CUDA Driver Version = 4.0 CUDA Runtime Version = 4.0 NumDevs = 1 Device = GeForce GT 520
deviceQuery CUDA Driver = CUDART CUDA Driver Version = 4.0 CUDA Runtime Version = 4.0 NumDevs = 1 Device = GeForce GT 520
MersenneTwister Throughput = 0.2892 GNumbers/s Time = 0.08299 s Size = 24002560 Numbers NumDevsUsed = 1 Workgroup = 128
quasirandomGenerator Throughput = 0.3590 GNumbers/s Time = 0.00876 s Size = 3145728 Numbers NumDevsUsed = 1 Workgroup = 384
quasirandomGenerator-inverse Throughput = 0.7063 GNumbers/s Time = 0.00445 s Size = 3145728 Numbers NumDevsUsed = 1 Workgroup = 128
radixSort Throughput = 29.1391 MElements/s Time = 0.03599 s Size = 1048576 elements
Reduction Throughput = 7.3134 GB/s Time = 0.00918 s Size = 16777216 Elements NumDevsUsed = 1 Workgroup = 256
scan-Short Throughput = 0.1387 MElements/s Time = 0.00738 s Size = 1024 Elements NumDevsUsed = 1 Workgroup = 256
scan-Large Throughput = 17.5481 MElements/s Time = 0.01494 s Size = 262144 Elements NumDevsUsed = 1 Workgroup = 256
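
A quick consistency check on the GT 520 figures above, as a minimal sketch: it assumes each Throughput value is simply Size divided by Time, scaled to the unit shown (G = 1e9, M = 1e6). The log does not state this relationship; it is inferred from the numbers. The row values are copied from the log, and the helper names are hypothetical. The Reduction row is left out because it is reported in GB/s, and converting it would need an extra assumption about bytes per element.

```python
# Rows copied from the GT 520 log above:
# (name, reported throughput, unit scale, Time in seconds, Size in items).
ROWS = [
    ("MersenneTwister",      0.2892,  1e9, 0.08299, 24002560),
    ("quasirandomGenerator", 0.3590,  1e9, 0.00876, 3145728),
    ("radixSort",            29.1391, 1e6, 0.03599, 1048576),
    ("scan-Short",           0.1387,  1e6, 0.00738, 1024),
    ("scan-Large",           17.5481, 1e6, 0.01494, 262144),
]


def recomputed_throughput(size, seconds, scale):
    """Return size / seconds, expressed in multiples of `scale` items per second."""
    return size / seconds / scale


def relative_error(reported, recomputed):
    """Return the relative difference between the logged and recomputed values."""
    return abs(reported - recomputed) / reported


for name, reported, scale, seconds, size in ROWS:
    value = recomputed_throughput(size, seconds, scale)
    print(f"{name:22s} reported={reported:9.4f} recomputed={value:9.4f} "
          f"rel_err={relative_error(reported, value):.4%}")
```

Small differences are expected, since the Time column is rounded to five decimal places before being logged.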
- 2. ATTN: yukio.saitoh@gmail.com NVIDIA GT440 SDK MASTER LOG 2011/08/04
FXFROG (C2D E8200, RAM 8GB, Win7 x64)
deviceQuery CUDA Driver = CUDART CUDA Driver Version = 4.0 CUDA Runtime Version = 4.0 NumDevs = 1 Device = GeForce GT 440
BlackScholes Throughput = 1.6170 GOptions/s Time = 0.00495 s Size = 8000000 options NumDevsUsed =1 Workgroup = 128
convolutionSeparable Throughput = 278.8831 MPixels/sec Time = 0.03384 s Size = 9437184 Pixels NumDevsUsed =1 Workgroup = 0
deviceQuery CUDA Driver = CUDART CUDA Driver Version = 4.0 CUDA Runtime Version = 4.0 NumDevs = 1 Device = GeForce GT 440
dxtc Throughput = 1.8818 MPixels/s Time = 0.13930 s Size = 262144 Pixels NumDevsUsed =1 Workgroup = 64
histogram64 Throughput = 3087.4561 MB/s Time = 0.02174 s Size = 67108864 Bytes NumDevsUsed =1 Workgroup = 64
histogram256 Throughput = 1720.8238 MB/s Time = 0.03900 s Size = 67108864 Bytes NumDevsUsed =1 Workgroup = 192
> CUBLAS Throughput = 39.4083 GFlop/s Time = 0.00333 s Size = 131072000 Ops
> CUDA matrixMul Throughput = 18.3094 GFlop/s Time = 0.00716 s Size = 131072000 Ops NumDevsUsed = 1 Workgroup = 1024
MersenneTwister Throughput = 0.6063 GNumbers/s Time = 0.03959 s Size = 24002560 Numbers NumDevsUsed = 1 Workgroup = 128
quasirandomGenerator Throughput = 0.0468 GNumbers/s Time = 0.06721 s Size = 3145728 Numbers NumDevsUsed = 1 Workgroup = 384
quasirandomGenerator-inverse Throughput = 0.0998 GNumbers/s Time = 0.03153 s Size = 3145728 Numbers NumDevsUsed = 1 Workgroup = 128
radixSort Throughput = 5.1810 MElements/s Time = 0.20239 s Size = 1048576 elements
Reduction Throughput = 22.4863 GB/s Time = 0.00298 s Size = 16777216 Elements NumDevsUsed = 1 Workgroup = 256
scan-Short Throughput = 0.2270 MElements/s Time = 0.00451 s Size = 1024 Elements NumDevsUsed = 1 Workgroup = 256
scan-Large Throughput = 40.6028 MElements/s Time = 0.00646 s Size = 262144 Elements NumDevsUsed = 1 Workgroup = 256
sortingNetworks-bitonic Throughput = 11.6002 MElements/s Time = 0.09039 s Size = 1048576 elements NumDevsUsed = 1 Workgroup = 512
transpose-Outer-simple copy Throughput = 8.8884 GB/s Time = 0.21974 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-simple copy Throughput = 36.6637 GB/s Time = 0.05327 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-shared memory copy Throughput = 3.8134 GB/s Time = 0.51218 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-shared memory copy Throughput = 13.6798 GB/s Time = 0.14277 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-naive Throughput = 3.2989 GB/s Time = 0.59205 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-naive Throughput = 6.8978 GB/s Time = 0.28315 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-coalesced Throughput = 4.6051 GB/s Time = 0.42412 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-coalesced Throughput = 13.1095 GB/s Time = 0.14899 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-optimized Throughput = 5.9365 GB/s Time = 0.32900 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-optimized Throughput = 18.6504 GB/s Time = 0.10472 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-coarse-grained Throughput = 7.5914 GB/s Time = 0.25728 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-coarse-grained Throughput = 18.4704 GB/s Time = 0.10574 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-fine-grained Throughput = 5.5713 GB/s Time = 0.35057 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-fine-grained Throughput = 18.2539 GB/s Time = 0.10700 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Outer-diagonal Throughput = 4.2356 GB/s Time = 0.46112 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
transpose-Inner-diagonal Throughput = 19.9927 GB/s Time = 0.09769 s Size = 262144 fp32 elements NumDevsUsed = 1 Workgroup = 256
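
A note on reading the transpose rows, as a hedged sketch rather than a statement about how the SDK sample computes its output: dividing Size by Time as seconds does not reproduce the logged GB/s. The figures do line up if (a) each fp32 element is counted as read once and written once, (b) the Time column is interpreted as milliseconds, and (c) GB means 2^30 bytes. All three are assumptions inferred from the numbers, not taken from the log. The rows below are copied from the GT 440 log; the helper name is hypothetical.

```python
# Transpose rows copied from the GT 440 log above: (name, reported GB/s, reported Time).
ROWS = [
    ("Outer-simple copy", 8.8884,  0.21974),
    ("Inner-simple copy", 36.6637, 0.05327),
    ("Outer-naive",       3.2989,  0.59205),
    ("Inner-diagonal",    19.9927, 0.09769),
]

ELEMENTS = 262144        # "Size" column of every transpose row
BYTES_PER_ELEMENT = 4    # fp32
# Assumption: every element is read once and written once.
BYTES_MOVED = 2 * ELEMENTS * BYTES_PER_ELEMENT


def gib_per_second(time_value, time_unit_in_seconds):
    """Return bandwidth in units of 2**30 bytes/s for a given reading of the Time column."""
    seconds = time_value * time_unit_in_seconds
    return BYTES_MOVED / seconds / 2**30


for name, reported, time_value in ROWS:
    as_seconds = gib_per_second(time_value, 1.0)    # Time read literally as seconds
    as_millis = gib_per_second(time_value, 1e-3)    # Time read as milliseconds
    print(f"{name:18s} reported={reported:8.4f} "
          f"if_s={as_seconds:8.4f} if_ms={as_millis:8.4f}")
```

Under these assumptions, only the millisecond reading lands near the logged values, so the "s" label on the transpose rows should be treated with caution when comparing them with the other samples.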