Tensor Core
"SIMD" for GPU
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
Tensor Cores
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
Tensor Cores
https://www.nvidia.com/en-us/data-center/tensorcore/
Up to 12X higher peak TFLOPS for deep learning training (Tesla V100 with Tensor Cores vs. Pascal P100)
https://www.nvidia.com/en-us/data-center/tensorcore/
Supported Types
• Input : FP16, u8, s8, u4, s4, b1

• Accumulator : FP16, FP32, int

• Also in experimental:
namespace experimental {
    namespace precision {
        struct u4; // 4-bit unsigned
        struct s4; // 4-bit signed
        struct b1; // 1-bit
    }
    enum bmmaBitOp { bmmaBitOpXOR = 1 };
    enum bmmaAccumulateOp { bmmaAccumulateOpPOPC = 1 };
}
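A minimal sketch of how these element types map onto WMMA fragment declarations (assuming CUDA 10+ and <mma.h>; 16x16x16 and 8x8x32 are tile shapes supported for these types):

#include <mma.h>
using namespace nvcuda;

__global__ void fragment_types_demo() {
    // FP16 inputs with an FP32 accumulator (16x16x16 tile)
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_fp16;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_fp16;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_fp32;

    // s8 inputs with an int accumulator (16x16x16 tile, Turing / CUDA 10)
    wmma::fragment<wmma::matrix_a, 16, 16, 16, signed char, wmma::row_major> a_s8;
    wmma::fragment<wmma::accumulator, 16, 16, 16, int> acc_s32;

    // Experimental 4-bit signed input (8x8x32 tile)
    wmma::fragment<wmma::matrix_a, 8, 8, 32, wmma::experimental::precision::s4, wmma::row_major> a_s4;
}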
D (m×n) = A (m×k) × B (k×n) + C (m×n)
Mixed Precision (FP16 multiply inputs, FP32 accumulate)
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
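Host data is usually FP32, so the A and B inputs are converted to FP16 on the device while C stays FP32; a sketch in the spirit of the cited sample's conversion kernel (pointer and size names are placeholders):

#include <cuda_fp16.h>

// Convert an FP32 buffer to FP16 so it can feed the Tensor Core inputs.
__global__ void convertFp32ToFp16(half *out, const float *in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = __float2half(in[idx]);
    }
}

// Example launch (a_fp16 / a_fp32 / n are placeholders):
// convertFp32ToFp16<<<(n + 255) / 256, 256>>>(a_fp16, a_fp32, n);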
Programming
CUDA Library
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
cuBLAS, cuDNN (Tensor Cores are also used in TensorRT 3)
CUDA WMMA API
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
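On the library path, a sketch of a Tensor Core GEMM through cuBLAS (CUDA 9-era API names, following the cited blog post; error checking omitted, pointers and sizes are placeholders):

#include <cublas_v2.h>
#include <cuda_fp16.h>

// Mixed-precision GEMM on Tensor Cores: FP16 A/B, FP32 C, FP32 compute type.
void tensorCoreGemm(cublasHandle_t handle, int M, int N, int K,
                    const half *a_fp16, const half *b_fp16, float *c_fp32) {
    float alpha = 1.0f, beta = 0.0f;
    cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);   // allow Tensor Core kernels
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                 M, N, K,
                 &alpha,
                 a_fp16, CUDA_R_16F, M,    // A : half, lda = M
                 b_fp16, CUDA_R_16F, K,    // B : half, ldb = K
                 &beta,
                 c_fp32, CUDA_R_32F, M,    // C : float, ldc = M
                 CUDA_R_32F,               // accumulate in FP32
                 CUBLAS_GEMM_DFALT_TENSOR_OP);
}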
CPU Level
simpleTensorCoreGEMM.cu
https://github.com/parallel-forall/code-samples/blob/master/posts/tensor-cores/simpleTensorCoreGEMM.cu
call the kernel function (the work inside is organized per warp)
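Host-side launch sketch following simpleTensorCoreGEMM.cu: each block holds 16 warps and each warp produces one 16x16 output tile (MATRIX_M/N/K, WMMA_M/N, alpha, beta and the device pointers are placeholders named after the sample):

// 128x4 threads = 16 warps per block, so a block covers a 64x64 output tile.
dim3 blockDim(128, 4);
dim3 gridDim((MATRIX_M + (WMMA_M * blockDim.x / 32) - 1) / (WMMA_M * blockDim.x / 32),
             (MATRIX_N +  WMMA_N * blockDim.y      - 1) / (WMMA_N * blockDim.y));

wmma_example<<<gridDim, blockDim>>>(a_fp16, b_fp16, c_fp32,
                                    MATRIX_M, MATRIX_N, MATRIX_K, alpha, beta);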
Warp-Level
http://on-demand.gputechconf.com/gtc/2017/presentation/s7132-mark-harris-new-cuda-features-and-beyond.pdf
(In short)
Warp-Level :

Initialization
Values
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
simpleTensorCoreGEMM.cu
Kernel function (work is done per warp)
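In simpleTensorCoreGEMM.cu the kernel first works out which 16x16 output tile the current warp owns (blockDim.x is a multiple of warpSize):

// Map this warp to one output tile of the result matrix.
int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;   // tile row
int warpN = (blockIdx.y * blockDim.y + threadIdx.y);              // tile column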
Warp-Level :

Fragments live in registers
Fragment type : matrix_a, matrix_b, or accumulator
Clear the accumulator
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
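The corresponding declarations from the cited sample: fragments live in the warp's registers, and the accumulator is cleared before the k-loop (16x16x16 tiles, FP16 inputs, FP32 accumulator):

// Fragments are distributed across the registers of the whole warp.
wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::col_major> a_frag;
wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;
wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

// Clear the accumulator before the k-loop.
wmma::fill_fragment(acc_frag, 0.0f);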
Warp-Level :
Tile Calculation (compute one tile of the output matrix per warp)
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
D = A × B + C
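The k-loop for one tile, following the cited sample (lda/ldb are the leading dimensions of A and B; bounds checks omitted):

// March along the shared dimension in steps of 16, accumulating into acc_frag.
for (int i = 0; i < K; i += 16) {
    int aRow = warpM * 16, aCol = i;           // sub-tile of A
    int bRow = i,          bCol = warpN * 16;  // sub-tile of B

    wmma::load_matrix_sync(a_frag, a + aRow + aCol * lda, lda);
    wmma::load_matrix_sync(b_frag, b + bRow + bCol * ldb, ldb);

    // One Tensor Core multiply-accumulate: acc = a * b + acc
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
}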
Warp-Level :
Finishing
Optional Scaling
C = alpha * Acc + beta * C
Store to Memory
https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/
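Finishing sketch from the cited sample: load the existing C tile, apply the optional alpha/beta scaling element-wise, and store the result (ldc is the leading dimension of C):

int cRow = warpM * 16;
int cCol = warpN * 16;

// Load the current C tile, scale it together with the accumulator, write back.
wmma::load_matrix_sync(c_frag, c + cRow + cCol * ldc, ldc, wmma::mem_col_major);

for (int i = 0; i < c_frag.num_elements; i++) {
    c_frag.x[i] = alpha * acc_frag.x[i] + beta * c_frag.x[i];
}

wmma::store_matrix_sync(c + cRow + cCol * ldc, c_frag, ldc, wmma::mem_col_major);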
Availability
• Volta : V100, Titan V

• Turing : RTX 2070, RTX 2080, RTX 2080 Ti, etc.