NVIDIA® CUDA™ 5.0
Sample evaluation result
       PART Ⅰ


          GPU: GTX 560 Ti
          CPU: i5-3450S (TDP65W)
          RAM: 16GB
          OS: Windows 7 x64 Ultimate


             Yukio Saitoh | FXFROG.com
                                21st/Apr/2013
INDEX
Sample binaries:
      1. alignedTypes.exe
      2. asyncAPI.exe
      3. bandwidthTest.exe
      4. batchCUBLAS.exe
      5. bicubicTexture.exe
      6. bilateralFilter.exe
      7. bindlessTexture.exe / Failure
      8. binomialOptions.exe
      9. BlackScholes.exe
      10. boxFilter.exe
      11. boxFilterNPP.exe
      12. cdpAdvancedQuicksort.exe / Failure
      13. cdpLUDecomposition.exe / Failure
      14. cdpQuadTree.exe / Failure
      15. cdpSimplePrint.exe / Failure
      16. cdpSimpleQuicksort.exe / Failure
      17. clock.exe
Sample target path and files
• C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release
alignedTypes.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\alignedTypes.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

[GeForce GTX 560 Ti] has 8 MP(s) x 48 (Cores/MP) = 384 (Cores)
> Compute scaling value = 1.00
> Memory Size = 49999872
Allocating memory...
Generating host input data array...
Uploading input data to GPU memory...
Testing misaligned types...
uint8...
Avg. time: 2.563287 ms / Copy throughput: 18.166525 GB/s.
      TEST OK
uint16...
Avg. time: 1.429239 ms / Copy throughput: 32.580981 GB/s.
      TEST OK
RGBA8_misaligned...
Avg. time: 1.766606 ms / Copy throughput: 26.359026 GB/s.
      TEST OK
LA32_misaligned...
Avg. time: 0.998594 ms / Copy throughput: 46.631585 GB/s.
      TEST OK
RGB32_misaligned...
Avg. time: 1.273794 ms / Copy throughput: 36.556941 GB/s.
      TEST OK
RGBA32_misaligned...
Avg. time: 1.703606 ms / Copy throughput: 27.333794 GB/s.
      TEST OK
alignedTypes.exe 2/2
Testing aligned types...
RGBA8...
Avg. time: 1.131558 ms     / Copy throughput: 41.152104 GB/s.
       TEST OK
I32...
Avg. time: 1.091073 ms     / Copy throughput: 42.679095 GB/s.
       TEST OK
LA32...
Avg. time: 0.952468 ms     / Copy throughput: 48.889827 GB/s.
       TEST OK
RGB32...
Avg. time: 1.431797 ms     / Copy throughput: 32.522784 GB/s.
       TEST OK
RGBA32...
Avg. time: 0.961305 ms     / Copy throughput: 48.440401 GB/s.
       TEST OK
RGBA32_2...
Avg. time: 1.340105 ms     / Copy throughput: 34.748032 GB/s.
       TEST OK

[alignedTypes] -> Test Results: 0 Failures
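The copy throughput figures above appear to follow from the logged memory size and the average time per copy. A minimal Python sketch of that arithmetic, assuming throughput is bytes divided by seconds and that "GB" in this log means 2^30 bytes (an assumption inferred from the numbers, not stated in the output). The function name is hypothetical.

```python
# Recompute alignedTypes copy throughput from values in the log.
# Assumption: "GB/s" in the log means 2**30 bytes per second.
MEMORY_SIZE_BYTES = 49999872  # "> Memory Size" line from the log


def copy_throughput(avg_time_ms, size_bytes=MEMORY_SIZE_BYTES):
    """Return throughput in units of 2**30 bytes per second."""
    seconds = avg_time_ms / 1000.0
    return size_bytes / seconds / 2**30


# Average times (ms) copied from the log for two of the tested types.
for label, ms in [("uint8 (misaligned)", 2.563287), ("LA32 (aligned)", 0.952468)]:
    print(f"{label}: {copy_throughput(ms):.2f} GB/s")
```

Under these assumptions the recomputed values agree with the logged 18.17 GB/s and 48.89 GB/s to within rounding of the printed times.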
asyncAPI.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\asyncAPI.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

CUDA device [GeForce GTX 560 Ti]
time spent executing by the GPU: 22.45
time spent by CPU in CUDA calls: 0.04
CPU executed 12884 iterations while waiting for GPU to finish
bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: GeForce GTX 560 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
 Transfer Size (Bytes)     Bandwidth(MB/s)
 33554432                6016.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
 Transfer Size (Bytes)     Bandwidth(MB/s)
 33554432                6103.5

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
 Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                108588.2
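To put the three bandwidth figures in perspective, the sketch below converts each into an approximate time for one 33554432-byte transfer and compares device-to-device against host-to-device. It assumes that "MB" in this output means 2^20 bytes, which the log does not state. The names are hypothetical.

```python
# Rough per-transfer times from the bandwidthTest figures in the log.
# Assumption: "MB/s" in the log means 2**20 bytes per second.
TRANSFER_BYTES = 33554432  # transfer size from the log

BANDWIDTH_MB_S = {
    "host to device": 6016.1,
    "device to host": 6103.5,
    "device to device": 108588.2,
}


def transfer_ms(bandwidth_mb_s, size_bytes=TRANSFER_BYTES):
    """Return the time in milliseconds for one transfer at the given rate."""
    return size_bytes / (bandwidth_mb_s * 2**20) * 1000.0


for direction, rate in BANDWIDTH_MB_S.items():
    print(f"{direction}: {transfer_ms(rate):.3f} ms per transfer")

# The ratio does not depend on how "MB" is defined.
ratio = BANDWIDTH_MB_S["device to device"] / BANDWIDTH_MB_S["host to device"]
print(f"device-to-device is about {ratio:.1f}x host-to-device")
```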
batchCUBLAS.exe 1/3
batchCUBLAS Starting...

GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
==== Running single kernels ====

Testing sgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00010011 sec GFLOPS=41.8986
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00012166 sec GFLOPS=34.4752
@@@@ dgemm test OK

==== Running N=10 without streams ====

Testing sgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00030251 sec GFLOPS=138.65
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00062913 sec GFLOPS=66.668
@@@@ dgemm test OK
batchCUBLAS.exe 2/3
==== Running N=10 with streams ====

Testing sgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00030580 sec GFLOPS=137.159
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00055826 sec GFLOPS=75.1324
@@@@ dgemm test OK
batchCUBLAS.exe 3/3
==== Running N=10 batched ====

Testing sgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00051843 sec GFLOPS=80.9036
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=128 ldb=128 ldc=128
^^^^ elapsed = 0.00065873 sec GFLOPS=63.6729
@@@@ dgemm test OK

Test Summary
0 error(s)
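The GFLOPS values reported by batchCUBLAS can be cross-checked from the logged arguments. The sketch below assumes each GEMM is counted as 2*m*n*k floating-point operations and that the N=10 runs count ten such GEMMs. This is an inference from the numbers rather than something the output states, and the function name is hypothetical.

```python
# Recompute batchCUBLAS GFLOPS from the logged arguments and elapsed times.
# Assumption: one GEMM is counted as 2*m*n*k floating-point operations.
def gemm_gflops(m, n, k, elapsed_sec, batch=1):
    """Return GFLOPS for `batch` GEMMs of size m x n x k."""
    flops = 2.0 * m * n * k * batch
    return flops / elapsed_sec / 1e9


# Elapsed times (seconds) copied from the sgemm lines in the log.
print(f"single kernel:   {gemm_gflops(128, 128, 128, 0.00010011):.2f}")
print(f"N=10 no streams: {gemm_gflops(128, 128, 128, 0.00030251, batch=10):.2f}")
print(f"N=10 streams:    {gemm_gflops(128, 128, 128, 0.00030580, batch=10):.2f}")
print(f"N=10 batched:    {gemm_gflops(128, 128, 128, 0.00051843, batch=10):.2f}")
```

With these assumptions the results match the logged sgemm figures (41.90, 138.65, 137.16 and 80.90 GFLOPS) to within the rounding of the printed elapsed times.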
bicubicTexture.exe 1/2
Starting bicubicTexture
[CUDA BicubicTexture] (OpenGL Mode)
CUDA device [GeForce GTX 560 Ti] has 8 Multi-Processors
Loaded 'lena_bw.pgm', 512 x 512 pixels

     Controls
     =/- : Zoom in/out
     b : Run Benchmark g_FilterMode
     c : Draw Bicubic Spline Curve
     [esc] - Quit

     Press number keys to change filtering g_FilterMode:

     1   :   nearest filtering
     2   :   bilinear filtering
     3   :   bicubic filtering
     4   :   fast bicubic filtering
     5   :   Catmull-Rom filtering
bicubicTexture.exe 2/2
[CUDA BicubicTexture] (Benchmark Mode)
time: 0.098 ms, 2673.560320 Mpixels/sec
> FilterMode[1] = Nearest
> FilterMode[2] = Bilinear
> FilterMode[3] = Bicubic
> FilterMode[4] = Fast Bicubic
> FilterMode[5] = Catmull-Rom
bilateralFilter.exe 1/2
Loading ../../../3_Imaging/bilateralFilter/data/nature_monte.bmp...
BMP width: 640
BMP height: 480
BMP file loaded successfully!
Loaded '../../../3_Imaging/bilateralFilter/data/nature_monte.bmp', 640 x 480 pixels



Found 1 CUDA Capable device(s) supporting CUDA

Device 0: "GeForce GTX 560 Ti"
 CUDA Runtime Version    : 5.0
 CUDA Compute Capability : 2.1

Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
Setting active device to 0
Using device 0: GeForce GTX 560 Ti
Running Standard Demonstration with GLUT loop...

Press   '+' and '-' to change filter width
Press   ']' and '[' to change number of iterations
Press   'e' and 'E' to change Euclidean delta
Press   'g' and 'G' to changle Gaussian delta
Press   'a' or 'A' to change Animation mode ON/OFF
bilateralFilter.exe 2/2
bindlessTexture.exe / Failure
CUDA bindlessTexture Starting...

No GPU device was found that can support CUDA compute capability 3.0.
binomialOptions.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\binomialOptions.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

Using single precision...
Generating input data...
Running GPU binomial tree...
Options count          : 512
Time steps            : 2048
binomialOptionsGPU() time: 29.790300 msec
Options per second        : 17186.802203
Running CPU binomial tree...
Comparing the results...
GPU binomial vs. Black-Scholes
L1 norm: 1.323721E-004
CPU binomial vs. Black-Scholes
L1 norm: 1.045245E-004
CPU binomial vs. GPU binomial
L1 norm: 3.391858E-005
Shutting down...
Test passed
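The "Options per second" figure is simply the options count divided by the GPU time. A short Python check using the values from the log (the function name is hypothetical):

```python
# Recompute "Options per second" for binomialOptions from the log.
OPTIONS_COUNT = 512        # "Options count" from the log
GPU_TIME_MS = 29.790300    # "binomialOptionsGPU() time" from the log


def options_per_second(count, time_ms):
    """Return options priced per second given a time in milliseconds."""
    return count / (time_ms / 1000.0)


print(f"{options_per_second(OPTIONS_COUNT, GPU_TIME_MS):.2f} options/s")
```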
BlackScholes.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\BlackScholes.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count         : 8000000
BlackScholesGPU() time : 0.806277 msec
Effective memory bandwidth: 99.221508 GB/s
Gigaoptions per second : 9.922151

BlackScholes, Throughput = 9.9222 GOptions/s, Time = 0.00081 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
BlackScholes.exe 2/2
Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.768024E-007
Max absolute error: 1.120567E-005

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary
Test passed
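The BlackScholes throughput and bandwidth figures can be reproduced from the options count and kernel time. The sketch below assumes 10 bytes of memory traffic per counted option. That value is inferred from the logged bandwidth (it corresponds to 80,000,000 bytes per kernel run) and is not stated in the output. The constant names are hypothetical.

```python
# Recompute BlackScholes throughput and effective bandwidth from the log.
OPTIONS = 8_000_000       # "Options count" from the log
KERNEL_MS = 0.806277      # "BlackScholesGPU() time" from the log
BYTES_PER_OPTION = 10     # assumption inferred from the logged bandwidth

seconds = KERNEL_MS / 1000.0
goptions_per_s = OPTIONS / seconds / 1e9
bandwidth_gb_s = OPTIONS * BYTES_PER_OPTION / seconds / 1e9  # GB = 1e9 bytes

print(f"Gigaoptions per second:     {goptions_per_s:.4f}")
print(f"Effective memory bandwidth: {bandwidth_gb_s:.3f} GB/s")
```

Both results agree with the logged 9.922151 Gigaoptions per second and 99.221508 GB/s to within rounding of the printed kernel time.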
boxFilter.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilter.exe Starting...

Loaded '../../../3_Imaging/boxFilter/data/lenaRGB.ppm', 1024 x 1024 pixels

Found 1 CUDA Capable device(s) supporting CUDA

Device 0: "GeForce GTX 560 Ti"
 CUDA Runtime Version    : 5.0
 CUDA Compute Capability : 2.1

Found CUDA Capable Device 0: "GeForce GTX 560 Ti"
Setting active device to 0
Running Standard Demonstration with GLUT loop...

Press '+' and '-' to change filter width
Press ']' and '[' to change number of iterations
Press 'a' or 'A' to change animation ON/OFF
boxFilterNPP.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe Starting...

GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

cudaSetDevice GPU0 = GeForce GTX 560 Ti
NPP Library Version 5.0.35
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\boxFilterNPP.exe using GPU <GeForce GTX 560 Ti> with 8 SM(s) with Compute 2.1
boxFilterNPP opened: <../../../common/data/Lena.pgm> successfully!
Saved image: ../../../common/data/Lena_boxFilter.pgm
cdpAdvancedQuicksort.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpAdvancedQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
cdpLUDecomposition.exe / Failure
Starting LU Decomposition (CUDA Dynamic Parallelism)
GPU device GeForce GTX 560 Ti has compute capabilities (SM 2.1)
cdpLUDecomposition requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
cdpQuadTree.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpQuadTree requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
cdpSimplePrint.exe / Failure
starting Simple Print (CUDA Dynamic Parallelism)
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpSimplePrint requires GPU devices with compute SM 3.5 or higher. Exiting...
cdpSimpleQuicksort.exe / Failure
GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism
cdpSimpleQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
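Every failure in this run is a compute capability gate: the sample compares the device's capability with a minimum and exits if the device falls short. The Python sketch below models that comparison using only the capability values printed in the logs. The dictionary and function names are hypothetical, and this illustrates the check rather than reproducing the samples' own code.

```python
# Device capability from the logs: "compute capability 2.1".
DEVICE_CC = (2, 1)

# Minimum capability each failing sample reported in its own output.
REQUIRED_CC = {
    "bindlessTexture": (3, 0),
    "cdpAdvancedQuicksort": (3, 5),
    "cdpLUDecomposition": (3, 5),
    "cdpQuadTree": (3, 5),
    "cdpSimplePrint": (3, 5),
    "cdpSimpleQuicksort": (3, 5),
}


def can_run(device_cc, required_cc):
    """Tuples compare major version first, then minor version."""
    return device_cc >= required_cc


for name, required in REQUIRED_CC.items():
    status = "OK" if can_run(DEVICE_CC, required) else "unsupported"
    print(f"{name}: needs {required[0]}.{required[1]} -> {status}")
```

Comparing (major, minor) tuples avoids the pitfall of comparing capabilities as floating-point numbers.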
clock.exe
CUDA Clock sample
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1

Total clocks = 15204
Summary
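On the GTX 560 Ti (compute capability 2.1), some samples do not run.

→ bindlessTexture requires a GPU that supports CUDA compute capability 3.0.
→ The cdp* samples (CUDA Dynamic Parallelism) require a GPU with compute capability (SM) 3.5 or higher.

This evaluation is to be continued; the results are recorded for future reference.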

More Related Content

What's hot

Gc crash course (1)
Gc crash course (1)Gc crash course (1)
Gc crash course (1)
Tier1 app
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyOptimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Henning Jacobs
 
CUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseCUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" course
Shuai Yuan
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
Brendan Gregg
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Ontico
 
Kernel crashdump
Kernel crashdumpKernel crashdump
Kernel crashdump
Adrien Mahieux
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problems
Tier1 app
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Anne Nicolas
 
Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective
Ceph Community
 
Perintah cmd
Perintah cmdPerintah cmd
Perintah cmd
dody faizal
 
Troubleshooting performanceavailabilityproblems (1)
Troubleshooting performanceavailabilityproblems (1)Troubleshooting performanceavailabilityproblems (1)
Troubleshooting performanceavailabilityproblems (1)
Tier1 app
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAPiyush Mittal
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
Raymond Tay
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New World
Can Ozdoruk
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Heechul Yun
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievVolodymyr Saviak
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012DefCamp
 
CUDA
CUDACUDA
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule
 

What's hot (20)

Gc crash course (1)
Gc crash course (1)Gc crash course (1)
Gc crash course (1)
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and LatencyOptimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency
 
CUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" courseCUDA lab's slides of "parallel programming" course
CUDA lab's slides of "parallel programming" course
 
Kernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPFKernel Recipes 2017: Performance Analysis with BPF
Kernel Recipes 2017: Performance Analysis with BPF
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 
Kernel crashdump
Kernel crashdumpKernel crashdump
Kernel crashdump
 
Jud con presentation_brazil
Jud con presentation_brazilJud con presentation_brazil
Jud con presentation_brazil
 
Troubleshooting real production problems
Troubleshooting real production problemsTroubleshooting real production problems
Troubleshooting real production problems
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective Ceph Day New York 2014: Ceph, a physical perspective
Ceph Day New York 2014: Ceph, a physical perspective
 
Perintah cmd
Perintah cmdPerintah cmd
Perintah cmd
 
Troubleshooting performanceavailabilityproblems (1)
Troubleshooting performanceavailabilityproblems (1)Troubleshooting performanceavailabilityproblems (1)
Troubleshooting performanceavailabilityproblems (1)
 
A beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDAA beginner’s guide to programming GPUs with CUDA
A beginner’s guide to programming GPUs with CUDA
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Molecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New WorldMolecular Shape Searching on GPUs: A Brave New World
Molecular Shape Searching on GPUs: A Brave New World
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 Kiev
 
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012NVidia CUDA for Bruteforce Attacks - DefCamp 2012
NVidia CUDA for Bruteforce Attacks - DefCamp 2012
 
CUDA
CUDACUDA
CUDA
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 

Viewers also liked

Health Academy Portfolio 2010- Kara Kaster
Health Academy Portfolio 2010- Kara Kaster Health Academy Portfolio 2010- Kara Kaster
Health Academy Portfolio 2010- Kara Kaster Health Academy
 
For marathon volunteer
For marathon volunteerFor marathon volunteer
For marathon volunteer
Yukio Saito
 
Ocean City 2010
Ocean City 2010Ocean City 2010
Ocean City 2010
guest45f28e3
 
Beyond Functional Contribution
Beyond Functional ContributionBeyond Functional Contribution
Beyond Functional Contribution
rsibley
 
Good Power Point
Good Power PointGood Power Point
Good Power Point
guest45f28e3
 
Hipercolestiremia
HipercolestiremiaHipercolestiremia
HipercolestiremiaSofía
 
SSHDノートPC高速化 / Let's note CF-S9
SSHDノートPC高速化 / Let's note CF-S9SSHDノートPC高速化 / Let's note CF-S9
SSHDノートPC高速化 / Let's note CF-S9
Yukio Saito
 

Viewers also liked (7)

Health Academy Portfolio 2010- Kara Kaster
Health Academy Portfolio 2010- Kara Kaster Health Academy Portfolio 2010- Kara Kaster
Health Academy Portfolio 2010- Kara Kaster
 
For marathon volunteer
For marathon volunteerFor marathon volunteer
For marathon volunteer
 
Ocean City 2010
Ocean City 2010Ocean City 2010
Ocean City 2010
 
Beyond Functional Contribution
Beyond Functional ContributionBeyond Functional Contribution
Beyond Functional Contribution
 
Good Power Point
Good Power PointGood Power Point
Good Power Point
 
Hipercolestiremia
HipercolestiremiaHipercolestiremia
Hipercolestiremia
 
SSHDノートPC高速化 / Let's note CF-S9
SSHDノートPC高速化 / Let's note CF-S9SSHDノートPC高速化 / Let's note CF-S9
SSHDノートPC高速化 / Let's note CF-S9
 

Similar to Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1

Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
Dilum Bandara
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
Amgad Muhammad
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
Mail.ru Group
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
libfetion
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo Conference
Tier1app
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day Devops
Tier1app
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
Andrey Kudryavtsev
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
fcassier
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time OptimizationKan-Ru Chen
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
Brendan Gregg
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
Anne Nicolas
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
Alcides Fonseca
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
GregSmith458515
 
ELC-E Linux Awareness
ELC-E Linux AwarenessELC-E Linux Awareness
ELC-E Linux AwarenessPeter Griffin
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 

Similar to Nvidia® cuda™ 5.0 Sample Evaluation Result Part 1 (20)

Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
CUDA and Caffe for deep learning
CUDA and Caffe for deep learningCUDA and Caffe for deep learning
CUDA and Caffe for deep learning
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Linux kernel debugging
Linux kernel debuggingLinux kernel debugging
Linux kernel debugging
 
Become a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo ConferenceBecome a Java GC Hero - ConFoo Conference
Become a Java GC Hero - ConFoo Conference
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day Devops
 
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Android Boot Time Optimization
Android Boot Time OptimizationAndroid Boot Time Optimization
Android Boot Time Optimization
 
Linux boot-time
Linux boot-timeLinux boot-time
Linux boot-time
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
ELC-E Linux Awareness
ELC-E Linux AwarenessELC-E Linux Awareness
ELC-E Linux Awareness
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 

More from Yukio Saito

東京2020ボランティア参加メモ(簡易)
東京2020ボランティア参加メモ(簡易)東京2020ボランティア参加メモ(簡易)
東京2020ボランティア参加メモ(簡易)
Yukio Saito
 
Exam prep microsoft_ai900_japanese_210428
Exam prep microsoft_ai900_japanese_210428Exam prep microsoft_ai900_japanese_210428
Exam prep microsoft_ai900_japanese_210428
Yukio Saito
 
Simple know how to creating agenda notes and daily reports
Simple know how to creating agenda notes and daily reportsSimple know how to creating agenda notes and daily reports
Simple know how to creating agenda notes and daily reports
Yukio Saito
 
Aws 転送時間計測(手順付き参考例)
Aws 転送時間計測(手順付き参考例)Aws 転送時間計測(手順付き参考例)
Aws 転送時間計測(手順付き参考例)
Yukio Saito
 
異業種から福祉業界ジョブチェンジして10か月後
異業種から福祉業界ジョブチェンジして10か月後異業種から福祉業界ジョブチェンジして10か月後
異業種から福祉業界ジョブチェンジして10か月後
Yukio Saito
 
異業種から福祉介護ジョブチェンジ検討
異業種から福祉介護ジョブチェンジ検討異業種から福祉介護ジョブチェンジ検討
異業種から福祉介護ジョブチェンジ検討
Yukio Saito
 
オンデマンド学習スタイル例 NFU
オンデマンド学習スタイル例 NFUオンデマンド学習スタイル例 NFU
オンデマンド学習スタイル例 NFU
Yukio Saito
 
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
Yukio Saito
 
Tobii eye x controller で遊ぶ
Tobii eye x controller で遊ぶTobii eye x controller で遊ぶ
Tobii eye x controller で遊ぶ
Yukio Saito
 
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
Yukio Saito
 
Microsoft windows phone_激安購入方法
Microsoft windows phone_激安購入方法Microsoft windows phone_激安購入方法
Microsoft windows phone_激安購入方法
Yukio Saito
 
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜPBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
Yukio Saito
 
CentOS7をインストールして遊ぶのだ
CentOS7をインストールして遊ぶのだCentOS7をインストールして遊ぶのだ
CentOS7をインストールして遊ぶのだ
Yukio Saito
 
Androidエミュレータをちょっと速くするintel haxm(ハッサム)
Androidエミュレータをちょっと速くするintel haxm(ハッサム)Androidエミュレータをちょっと速くするintel haxm(ハッサム)
Androidエミュレータをちょっと速くするintel haxm(ハッサム)
Yukio Saito
 
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
Yukio Saito
 
Astah plugin 実行方法とSysML要求図のサンプル
Astah plugin 実行方法とSysML要求図のサンプルAstah plugin 実行方法とSysML要求図のサンプル
Astah plugin 実行方法とSysML要求図のサンプル
Yukio Saito
 
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したいWindows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
Yukio Saito
 
NTTcom cloud n にサービス追加の適当な手順
NTTcom cloud n にサービス追加の適当な手順NTTcom cloud n にサービス追加の適当な手順
NTTcom cloud n にサービス追加の適当な手順
Yukio Saito
 
Intel xdk導入とhtml5サンプルビルド手順書
Intel xdk導入とhtml5サンプルビルド手順書Intel xdk導入とhtml5サンプルビルド手順書
Intel xdk導入とhtml5サンプルビルド手順書
Yukio Saito
 
圏央道ウォーキング日記
圏央道ウォーキング日記圏央道ウォーキング日記
圏央道ウォーキング日記Yukio Saito
 

More from Yukio Saito (20)

東京2020ボランティア参加メモ(簡易)
東京2020ボランティア参加メモ(簡易)東京2020ボランティア参加メモ(簡易)
東京2020ボランティア参加メモ(簡易)
 
Exam prep microsoft_ai900_japanese_210428
Exam prep microsoft_ai900_japanese_210428Exam prep microsoft_ai900_japanese_210428
Exam prep microsoft_ai900_japanese_210428
 
Simple know how to creating agenda notes and daily reports
Simple know how to creating agenda notes and daily reportsSimple know how to creating agenda notes and daily reports
Simple know how to creating agenda notes and daily reports
 
Aws 転送時間計測(手順付き参考例)
Aws 転送時間計測(手順付き参考例)Aws 転送時間計測(手順付き参考例)
Aws 転送時間計測(手順付き参考例)
 
異業種から福祉業界ジョブチェンジして10か月後
異業種から福祉業界ジョブチェンジして10か月後異業種から福祉業界ジョブチェンジして10か月後
異業種から福祉業界ジョブチェンジして10か月後
 
異業種から福祉介護ジョブチェンジ検討
異業種から福祉介護ジョブチェンジ検討異業種から福祉介護ジョブチェンジ検討
異業種から福祉介護ジョブチェンジ検討
 
オンデマンド学習スタイル例 NFU
オンデマンド学習スタイル例 NFUオンデマンド学習スタイル例 NFU
オンデマンド学習スタイル例 NFU
 
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
Engadget電子工作部 健康ガジェットを作ろう ドS!コーチ発表最終版
 
Tobii eye x controller で遊ぶ
Tobii eye x controller で遊ぶTobii eye x controller で遊ぶ
Tobii eye x controller で遊ぶ
 
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
斉藤之雄 が 公立大学 産業技術大学院大学 で獲得したこと。
 
Microsoft windows phone_激安購入方法
Microsoft windows phone_激安購入方法Microsoft windows phone_激安購入方法
Microsoft windows phone_激安購入方法
 
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜPBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
PBLでは先行学習は大事だぜ、シラバスは参考程度で主体的に楽しもうぜ
 
CentOS7をインストールして遊ぶのだ
CentOS7をインストールして遊ぶのだCentOS7をインストールして遊ぶのだ
CentOS7をインストールして遊ぶのだ
 
Androidエミュレータをちょっと速くするintel haxm(ハッサム)
Androidエミュレータをちょっと速くするintel haxm(ハッサム)Androidエミュレータをちょっと速くするintel haxm(ハッサム)
Androidエミュレータをちょっと速くするintel haxm(ハッサム)
 
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
Winodws7のruby2でrails4を遊ぶ環境を作るのだ。
 
Astah plugin 実行方法とSysML要求図のサンプル
Astah plugin 実行方法とSysML要求図のサンプルAstah plugin 実行方法とSysML要求図のサンプル
Astah plugin 実行方法とSysML要求図のサンプル
 
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したいWindows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
Windows8でOpenCVを使ったAndroid(MOVERIO)開発体験したい
 
NTTcom cloud n にサービス追加の適当な手順
NTTcom cloud n にサービス追加の適当な手順NTTcom cloud n にサービス追加の適当な手順
NTTcom cloud n にサービス追加の適当な手順
 
Intel xdk導入とhtml5サンプルビルド手順書
Intel xdk導入とhtml5サンプルビルド手順書Intel xdk導入とhtml5サンプルビルド手順書
Intel xdk導入とhtml5サンプルビルド手順書
 
圏央道ウォーキング日記
圏央道ウォーキング日記圏央道ウォーキング日記
圏央道ウォーキング日記
 

NVIDIA® CUDA™ 5.0 Sample Evaluation Result Part 1

  • 1. NVIDIA® CUDA™ 5.0 Sample evaluation result PART Ⅰ GPU: GTX 560 Ti CPU: i5-3450S (TDP65W) RAM: 16GB OS: Windows 7 x64 Ultimate Yukio Saitoh | FXFROG.com 21st/Apr/2013
  • 2. INDEX Sample binary : 1. alignedTypes.exe 2. asyncAPI.exe 3. bandwidthTest.exe 4. batchCUBLAS.exe 5. bicubicTexture.exe 6. bilateralFilter.exe 7. bindlessTexture.exe / Failure 8. binomialOptions.exe 9. BlackScholes.exe 1/2 10. boxFilter.exe 11. boxFilterNPP.exe 12. cdpAdvancedQuicksort.exe / Failure 13. cdpLUDecomposition.exe / Failure 14. cdpQuadTree.exe / Failure 15. cdpSimplePrint.exe / Failure 16. cdpSimplePrint.exe / Failure 17. cdpSimpleQuicksort.exe / Failure 18. clock.exe
  • 3. Sample target path and files • C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release
  • 4. alignedTypes.exe 1/2 [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥alignedTypes.exe] - Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 [GeForce GTX 560 Ti] has 8 MP(s) x 48 (Cores/MP) = 384 (Cores) > Compute scaling value = 1.00 > Memory Size = 49999872 Allocating memory... Generating host input data array... Uploading input data to GPU memory... Testing misaligned types... uint8... Avg. time: 2.563287 ms / Copy throughput: 18.166525 GB/s. TEST OK uint16... Avg. time: 1.429239 ms / Copy throughput: 32.580981 GB/s. TEST OK RGBA8_misaligned... Avg. time: 1.766606 ms / Copy throughput: 26.359026 GB/s. TEST OK LA32_misaligned... Avg. time: 0.998594 ms / Copy throughput: 46.631585 GB/s. TEST OK RGB32_misaligned... Avg. time: 1.273794 ms / Copy throughput: 36.556941 GB/s. TEST OK RGBA32_misaligned... Avg. time: 1.703606 ms / Copy throughput: 27.333794 GB/s. TEST OK
  • 5. alignedTypes.exe 2/2 Testing aligned types... RGBA8... Avg. time: 1.131558 ms / Copy throughput: 41.152104 GB/s. TEST OK I32... Avg. time: 1.091073 ms / Copy throughput: 42.679095 GB/s. TEST OK LA32... Avg. time: 0.952468 ms / Copy throughput: 48.889827 GB/s. TEST OK RGB32... Avg. time: 1.431797 ms / Copy throughput: 32.522784 GB/s. TEST OK RGBA32... Avg. time: 0.961305 ms / Copy throughput: 48.440401 GB/s. TEST OK RGBA32_2... Avg. time: 1.340105 ms / Copy throughput: 34.748032 GB/s. TEST OK [alignedTypes] -> Test Results: 0 Failures
  • 6. asyncAPI.exe [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥asyncAPI.exe] - Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 CUDA device [GeForce GTX 560 Ti] time spent executing by the GPU: 22.45 time spent by CPU in CUDA calls: 0.04 CPU executed 12884 iterations while waiting for GPU to finish
  • 7. bandwidthTest.exe [CUDA Bandwidth Test] - Starting... Running on... Device 0: GeForce GTX 560 Ti Quick Mode Host to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 6016.1 Device to Host Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 6103.5 Device to Device Bandwidth, 1 Device(s) PINNED Memory Transfers Transfer Size (Bytes) Bandwidth(MB/s) 33554432 108588.2
  • 8. batchCUBLAS.exe 1/3 batchCUBLAS Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 ==== Running single kernels ==== Testing sgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x40000000, 2) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00010011 sec GFLOPS=41.8986 @@@@ sgemm test OK Testing dgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00012166 sec GFLOPS=34.4752 @@@@ dgemm test OK ==== Running N=10 without streams ==== Testing sgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x00000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00030251 sec GFLOPS=138.65 @@@@ sgemm test OK Testing dgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00062913 sec GFLOPS=66.668 @@@@ dgemm test OK
  • 9. batchCUBLAS.exe 2/3 ==== Running N=10 without streams ==== Testing sgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbf800000, -1) beta= (0x00000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00030251 sec GFLOPS=138.65 @@@@ sgemm test OK Testing dgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00062913 sec GFLOPS=66.668 @@@@ dgemm test OK ==== Running N=10 with streams ==== Testing sgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x40000000, 2) beta= (0x40000000, 2) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00030580 sec GFLOPS=137.159 @@@@ sgemm test OK Testing dgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00055826 sec GFLOPS=75.1324 @@@@ dgemm test OK
  • 10. batchCUBLAS.exe 3/3 ==== Running N=10 batched ==== Testing sgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0x3f800000, 1) beta= (0xbf800000, -1) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00051843 sec GFLOPS=80.9036 @@@@ sgemm test OK Testing dgemm #### args: ta=0 tb=0 m=128 n=128 k=128 alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2) #### args: lda=128 ldb=128 ldc=128 ^^^^ elapsed = 0.00065873 sec GFLOPS=63.6729 @@@@ dgemm test OK Test Summary 0 error(s)
  • 11. bicubicTexture.exe 1/2 Starting bicubicTexture [CUDA BicubicTexture] (OpenGL Mode) CUDA device [GeForce GTX 560 Ti] has 8 Multi-Processors Loaded 'lena_bw.pgm', 512 x 512 pixels Controls =/- : Zoom in/out b : Run Benchmark g_FilterMode c : Draw Bicubic Spline Curve [esc] - Quit Press number keys to change filtering g_FilterMode: 1 : nearest filtering 2 : bilinear filtering 3 : bicubic filtering 4 : fast bicubic filtering 5 : Catmull-Rom filtering
  • 12. bicubicTexture.exe 2/2 [CUDA BicubicTexture] (Benchmark Mode) time: 0.098 ms, 2673.560320 Mpixels/sec > FilterMode[1] = Nearest > FilterMode[2] = Bilinear > FilterMode[3] = Bicubic > FilterMode[4] = Fast Bicubic > FilterMode[5] = Catmull-Rom
  • 13. bilateralFilter.exe 1/2 Loading ../../../3_Imaging/bilateralFilter/data/nature_monte.bmp... BMP width: 640 BMP height: 480 BMP file loaded successfully! Loaded '../../../3_Imaging/bilateralFilter/data/nature_monte.bmp', 640 x 480 pixels Found 1 CUDA Capable device(s) supporting CUDA Device 0: "GeForce GTX 560 Ti" CUDA Runtime Version : 5.0 CUDA Compute Capability : 2.1 Found CUDA Capable Device 0: "GeForce GTX 560 Ti" Setting active device to 0 Using device 0: GeForce GTX 560 Ti Running Standard Demonstration with GLUT loop... Press '+' and '-' to change filter width Press ']' and '[' to change number of iterations Press 'e' and 'E' to change Euclidean delta Press 'g' and 'G' to changle Gaussian delta Press 'a' or 'A' to change Animation mode ON/OFF
  • 15. bindlessTexture.exe / Failure CUDA bindlessTexture Starting... No GPU device was found that can support CUDA compute capability 3.0.
  • 16. binomialOptions.exe [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥binomialOptions.exe] - Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Using single precision... Generating input data... Running GPU binomial tree... Options count : 512 Time steps : 2048 binomialOptionsGPU() time: 29.790300 msec Options per second : 17186.802203 Running CPU binomial tree... Comparing the results... GPU binomial vs. Black-Scholes L1 norm: 1.323721E-004 CPU binomial vs. Black-Scholes L1 norm: 1.045245E-004 CPU binomial vs. GPU binomial L1 norm: 3.391858E-005 Shutting down... Test passed
  • 17. BlackScholes.exe 1/2 [C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥BlackScholes.exe] - Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Initializing data... ...allocating CPU memory for options. ...allocating GPU memory for options. ...generating input data in CPU mem. ...copying input data to GPU mem. Data init done. Executing Black-Scholes GPU kernel (512 iterations)... Options count : 8000000 BlackScholesGPU() time : 0.806277 msec Effective memory bandwidth: 99.221508 GB/s Gigaoptions per second : 9.922151 BlackScholes, Throughput = 9.9222 GOptions/s, Time = 0.00081 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
  • 18. BlackScholes.exe 2/2 Reading back GPU results... Checking the results... ...running CPU calculations. Comparing the results... L1 norm: 1.768024E-007 Max absolute error: 1.120567E-005 Shutting down... ...releasing GPU memory. ...releasing CPU memory. Shutdown done. [BlackScholes] - Test Summary Test passed
  • 19. boxFilter.exe C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥boxFilter.exe Starting... Loaded '../../../3_Imaging/boxFilter/data/lenaRGB.ppm', 1024 x 1024 pixels Found 1 CUDA Capable device(s) supporting CUDA Device 0: "GeForce GTX 560 Ti" CUDA Runtime Version : 5.0 CUDA Compute Capability : 2.1 Found CUDA Capable Device 0: "GeForce GTX 560 Ti" Setting active device to 0 Running Standard Demonstration with GLUT loop... Press '+' and '-' to change filter width Press ']' and '[' to change number of iterations Press 'a' or 'A' to change animation ON/OFF
  • 20. boxFilterNPP.exe C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥boxFilterNPP.exe Starting... GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 cudaSetDevice GPU0 = GeForce GTX 560 Ti NPP Library Version 5.0.35 C:¥ProgramData¥NVIDIA Corporation¥CUDA Samples¥v5.0¥bin¥win64¥Release¥boxFilterNPP.exe using GPU <GeForce GTX 560 Ti> with 8 SM(s) with Compute 2.1 boxFilterNPP opened: <../../../common/data/Lena.pgm> successfully! Saved image: ../../../common/data/Lena_boxFilter.pgm
  • 21. cdpAdvancedQuicksort.exe / Failure GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism cdpAdvancedQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
  • 22. cdpLUDecomposition.exe / Failure Starting LU Decomposition (CUDA Dynamic Parallelism) GPU device GeForce GTX 560 Ti has compute capabilities (SM 2.1) cdpLUDecomposition requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
  • 23. cdpQuadTree.exe / Failure GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism cdpQuadTree requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...
  • 24. cdpSimplePrint.exe / Failure starting Simple Print (CUDA Dynamic Parallelism) GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism cdpSimplePrint requires GPU devices with compute SM 3.5 or higher. Exiting...
  • 25. cdpSimpleQuicksort.exe / Failure GPU 0 (GeForce GTX 560 Ti) does not support CUDA Dynamic Parallelism cdpSimpleQuicksort requires GPU devices with compute SM 3.5 or higher. Exiting...
  • 26. clock.exe CUDA Clock sample GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1 Total clocks = 15204
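  • 27. Summary On the GTX 560 Ti (compute capability 2.1), some samples do not run. → bindlessTexture requires a GPU that supports CUDA compute capability 3.0. → The cdp* (CUDA Dynamic Parallelism) samples require compute capability (SM) 3.5 or higher. This evaluation is to be continued; the results are recorded for future reference.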
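
Several figures in the transcript can be re-derived from the raw values printed beside them, which gives a quick sanity check on the logs. The Python sketch below is not part of the original slides. Its function names are hypothetical, and its formulas are assumptions inferred from the logged numbers rather than taken from the sample sources: bytes per second divided by 2^30 for the alignedTypes "GB/s" figure, and 2·m·n·k floating-point operations per GEMM for batchCUBLAS. The last helper mirrors the compute-capability requirement behind the failed samples.

```python
GIB = 1024 ** 3  # assumption: alignedTypes reports bytes / 2**30 as "GB/s"


def copy_throughput(num_bytes, avg_ms):
    """Copy throughput in GiB/s from a byte count and an average time in ms."""
    return num_bytes / (avg_ms * 1e-3) / GIB


def gemm_gflops(m, n, k, elapsed_s, batch=1):
    """GFLOPS for `batch` GEMMs, assuming 2*m*n*k operations per GEMM."""
    return batch * 2 * m * n * k / elapsed_s / 1e9


def rate_per_second(count, elapsed_ms):
    """Items processed per second from a count and an elapsed time in ms."""
    return count / (elapsed_ms * 1e-3)


def meets_requirement(device_cc, required_cc):
    """Compare (major, minor) compute capabilities; tuples compare element by element."""
    return tuple(device_cc) >= tuple(required_cc)


if __name__ == "__main__":
    gtx_560_ti = (2, 1)  # compute capability reported in the logs

    # alignedTypes, uint8: "Memory Size = 49999872", "Avg. time: 2.563287 ms"
    print(f"uint8 copy: {copy_throughput(49999872, 2.563287):.3f} (log: 18.166525)")

    # batchCUBLAS, sgemm without streams, N=10: elapsed = 0.00030251 sec
    print(f"sgemm N=10: {gemm_gflops(128, 128, 128, 0.00030251, batch=10):.2f} (log: 138.65)")

    # binomialOptions: 512 options in 29.790300 msec
    print(f"binomial:   {rate_per_second(512, 29.790300):.2f} (log: 17186.802203)")

    # BlackScholes: 8000000 options in 0.806277 msec, in units of 1e9 options/s
    print(f"BlackScholes: {rate_per_second(8_000_000, 0.806277) / 1e9:.4f} (log: 9.922151)")

    # bindlessTexture asks for 3.0, the cdp* samples for 3.5
    for name, required in [("bindlessTexture", (3, 0)), ("cdpSimplePrint", (3, 5))]:
        status = "can run" if meets_requirement(gtx_560_ti, required) else "cannot run"
        print(f"{name}: {status} on compute capability 2.1")
```

The recomputed values agree with the logged ones only up to the rounding of the printed times, so any comparison should use a tolerance, not exact equality.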