Graphics Processing Unit
A Graphics Processing Unit (GPU),
also known as a Visual Processing
Unit (VPU), is an electronic circuit
that rapidly manipulates memory to
accelerate the creation and
processing of images for output
to a display device.
The term GPU was popularized by
NVIDIA in 1999 with the release of the
GeForce 256, marketed as “the world’s
first GPU”.
Modern GPUs use parallel
processing, which makes
them more efficient than
CPUs at processing large
blocks of data.
GPUs render images: each pixel of the display screen
(e.g. 1920 x 1080 = 2,073,600 pixels) is assigned values for
texture, lighting and location on the screen. Using these
parameters, the GPU draws 3D scenes on a 2D screen.
GPUs have their own dedicated memory, called Video RAM
(VRAM), which stores information about each pixel (its
color, location and lighting) and can also hold frame
buffers, i.e. full images that are about to be shown on
the screen.
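As a rough illustration of why frame buffers need dedicated memory, the sketch below computes the size of one frame at the resolution mentioned above. The pixel format (4 bytes per pixel, RGBA) and the name `framebuffer_bytes` are assumptions made for this example; they do not come from the slides.

```python
# Illustrative only: the 4-bytes-per-pixel (RGBA, 8 bits per channel)
# format is an assumption, not something stated in the slides.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4


def framebuffer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the memory needed to hold one full frame."""
    return width * height * bytes_per_pixel


pixels = WIDTH * HEIGHT
size = framebuffer_bytes(WIDTH, HEIGHT)

print(pixels)              # 2073600
print(size / 1024 ** 2)    # roughly 7.9 MiB for a single frame
```

Under these assumptions a single frame takes about 8 MB, and a GPU typically keeps more than one frame in flight.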
GPU vs. CPU
Graphics Processing
Unit
• Performs data parallelism.
• Has more cores, each at a
lower clock speed.
• Uses VRAM, which is fast
but small in size.
• Has less cache memory.
Central Processing Unit
• Performs task parallelism.
• Has fewer cores, each at a
higher clock speed.
• Uses system RAM, which is
slower but larger in size.
• Has more cache memory.
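The difference between data parallelism and task parallelism can be sketched on a CPU with Python's standard thread pool. This only illustrates the shape of the two models; Python threads do not give GPU-style speedups, and the functions `brighten`, `load_level` and `play_audio` are hypothetical names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """One operation applied to every element (data parallelism)."""
    return min(pixel + 10, 255)


def load_level():
    """A hypothetical independent task."""
    return "level loaded"


def play_audio():
    """Another hypothetical independent task."""
    return "audio playing"


pixels = [0, 100, 250, 255]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same function is mapped over many data items.
    brightened = list(pool.map(brighten, pixels))

    # Task parallelism: different functions are submitted side by side.
    futures = [pool.submit(load_level), pool.submit(play_audio)]
    results = [f.result() for f in futures]

print(brightened)  # [10, 110, 255, 255]
print(results)     # ['level loaded', 'audio playing']
```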
ARCHITECTURE
Streaming Multiprocessors and
CUDA Cores
A Streaming Multiprocessor (SM) is
the main computation unit of a GPU.
It consists of many smaller processing
units called Stream Processors (SPs)
or CUDA Cores.
CUDA, a term coined by NVIDIA,
stands for Compute Unified Device
Architecture. A CUDA core contains
two units: a Floating Point Unit,
which operates on floating-point
data, and an Integer Unit, which
operates on integer data.
There are 16 Load/Store Units per SM.
There are also four Special Function Units, which execute
transcendental instructions such as sine, cosine, reciprocal
and square root.
Each SM uses two Warp Schedulers and two Instruction
Dispatch Units, allowing two warps to be issued and executed
concurrently.
One of the features of the SM is the Fused Multiply-Add (FMA)
instruction. A frequently used sequence of operations in
computer graphics and in linear algebra is to multiply two
numbers and add the product to a third number.
Prior to FMA, this was done with the Multiply-Add (MAD)
instruction, which performs the multiplication with truncation,
followed by the addition.
The FMA maintains full precision after multiplication for both
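The effect of keeping full precision in the intermediate product can be modelled on a CPU. The sketch below is not GPU code and does not reproduce the MAD truncation described above. It only contrasts an unfused multiply-then-add (two roundings in double precision) with a fused result modelled using exact rational arithmetic and a single final rounding. The input values are chosen for illustration.

```python
from fractions import Fraction

# Inputs chosen so that the exact product a*b = 1 - 2**-60 cannot be
# represented in double precision and rounds to exactly 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Unfused: the product is rounded first, then the sum is rounded again.
unfused = a * b + c

# Fused (modelled): exact product and sum, rounded once at the end.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print(unfused)  # 0.0
print(fused)    # a tiny negative value, -2**-60
```

The unfused version loses the small term entirely, while the single-rounding version preserves it.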
CUDA Processing Flow
1) Data is copied from main
memory (RAM) to GPU memory
(VRAM), where it waits for
further instructions.
2) The CPU then instructs the
GPU to process the data. At
this point, the GPU fetches
the data and distributes it to
the SPs or CUDA Cores.
3) The data is processed so that
the same kind of instruction
is executed on many elements
in parallel, saving processing
time.
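The three steps above can be modelled in plain Python without a GPU. This is a CPU-only sketch: list copies stand in for the RAM-to-VRAM transfer, and a sequential loop stands in for the parallel CUDA cores. The names `kernel`, `host_input` and `device_buffer` are hypothetical, and the final copy back to the host is an assumption that is not listed in the slide.

```python
def kernel(i, data):
    """Work done by one 'thread': square the element at index i."""
    return data[i] * data[i]


host_input = [1.0, 2.0, 3.0, 4.0]

# Step 1: copy from host memory (RAM) to device memory (VRAM).
# Modelled here as an ordinary list copy.
device_buffer = list(host_input)

# Steps 2 and 3: the CPU launches the kernel and every index runs the
# same instruction. A GPU would run these in parallel; this loop runs
# them one after another.
device_output = [kernel(i, device_buffer) for i in range(len(device_buffer))]

# Assumed final step (not in the slide): copy the result back to the host.
host_output = list(device_output)

print(host_output)  # [1.0, 4.0, 9.0, 16.0]
```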
Graphics Pipeline
Vertex Shader: Provides the location of vertices in 3D space.
Generating Primitives: Builds polygons from the vertices in
3D space.
Rasterization: Process of filling triangular geometries with
dots or pixels.
Pixel Shader: Assigns each pixel attributes such as
light and color.
Testing and Mixing: Here the 3D objects are tested for
shadow effects and also the Anti-Aliasing (AA) and
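The rasterization stage can be illustrated with a small software rasterizer. This is a minimal sketch that uses the edge-function test, one common way to decide whether a pixel centre lies inside a triangle. It is not a description of how any particular GPU implements the stage, and the grid size and triangle coordinates are made up for the example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed-area test: the sign tells which side of edge A->B point P is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose centres lie in or on the triangle."""
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Covered when all three tests agree in sign (either winding).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
               (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    return covered


triangle = ((0, 0), (4, 0), (0, 4))
pixels = rasterize(*triangle, width=4, height=4)

# Print the 4x4 grid: '#' marks a covered pixel.
for y in range(4):
    print("".join("#" if (x, y) in pixels else "." for x in range(4)))
```

Each covered pixel would then be handed to the pixel shader to receive its light and color attributes.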
Components of a Modern GPU
Shader Processing Units (CUDA Cores): A CUDA Core is
a parallel processor that performs graphics
calculations.
Texture Mapping Unit (TMU): A TMU resizes, rotates or
distorts a bitmap image so it can be placed on a 3D model
as a texture. The Texture Fill Rate measures how fast a GPU
can map textures onto a 3D model.
Raster Operation Pipeline (ROP): The ROP performs the final
step of rendering. It takes pixel and texel information and
processes it to give each pixel its final texture and depth
value. The measure of how quickly a ROP can do
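Fill rates are usually quoted as theoretical peaks, computed as the number of units multiplied by the core clock. The sketch below applies that rule of thumb to texture fill rate (TMUs) and to the ROP throughput, commonly called pixel fill rate. The unit counts and clock speed are hypothetical values picked for the example, not figures for any real GPU.

```python
def fill_rate(units, clock_mhz):
    """Theoretical peak rate in giga-operations per second: units x clock."""
    return units * clock_mhz / 1000.0


# Hypothetical GPU configuration, for illustration only.
TMUS = 128
ROPS = 64
CLOCK_MHZ = 1500

texture_fill = fill_rate(TMUS, CLOCK_MHZ)  # GTexels/s
pixel_fill = fill_rate(ROPS, CLOCK_MHZ)    # GPixels/s

print(texture_fill)  # 192.0
print(pixel_fill)    # 96.0
```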
Nvidia Pascal
GP100
The Most Advanced
Datacenter Accelerator
Pascal Features
• Fabricated by TSMC on a 16 nm FinFET process.
• New Chip-on-Wafer-on-Substrate (CoWoS) HBM2 high
bandwidth memory.
• NVLink for high bandwidth interconnect between P100
chips.
• Unified Memory, Compute Preemption and New AI
Algorithms.
NVLink
NVLink is a communication
protocol developed by Nvidia.
NVLink specifies a point-to-point
interconnect between a CPU and
a GPU, and also between one GPU
and another.
On the P100, NVLink provides up to
160 GB/s of aggregate bidirectional
bandwidth. The previous Maxwell
architecture had no NVLink and
relied on PCIe.
HBM 2
HBM stands for High Bandwidth Memory, in which
multiple memory dies are stacked vertically on top of each
other, whereas traditionally discrete memory chips were
soldered around the GPU chip.
HBM2 supports up to 8 GB of DRAM per stack, and on the
P100 each stack delivers 180 GB/s. HBM1 was limited to
1 GB per stack (2 Gb per die) at 125 GB/s.
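Per-stack figures add up across the stacks placed next to the GPU. The short calculation below works this out using the per-stack bandwidths quoted above. The stack count of four is an assumption made for the example (it matches the Tesla P100 configuration, but it is not stated in these slides).

```python
# Per-stack bandwidths in GB/s, taken from the text above.
HBM2_PER_STACK = 180
HBM1_PER_STACK = 125

# Assumption for this example: four stacks beside the GPU.
STACKS = 4

total_bandwidth = STACKS * HBM2_PER_STACK       # aggregate GB/s
per_stack_gain = HBM2_PER_STACK / HBM1_PER_STACK

print(total_bandwidth)  # 720
print(per_stack_gain)   # 1.44
```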
Unified Memory, Compute Preemption
and New AI Algorithms
Unified Memory: Unified memory is an essential part of the
CUDA programming model. It greatly simplifies
programming and porting applications to the GPU by
providing a single unified virtual address space for
accessing all CPU and GPU memory in a system.
Compute Preemption: This feature allows compute tasks
running on the GPU to be interrupted at instruction level. It
solves the problem of long-running or ill-behaved applications
that make the system unresponsive while it
waits for the task to complete.
New AI Algorithms: The Pascal features new AI technology
Applications
Mainly used for parallel computing on compute-intensive
tasks.
Used in many Physics Simulations, Statistical Physics,
Fast Fourier Transforms, Fuzzy Logic, Analog Signal
Processing and Digital Image Processing.
The new Pascal Architecture targets Deep Learning, in which
the learning process of the human brain is modeled so that
the system learns continuously, getting smarter
and delivering more accurate results over time.
FANUC, a leading robot manufacturer,
demonstrated an assembly line robot powered by a Pascal GPU.
Questions?

Nvidia (History, GPU Architecture and New Pascal Architecture)
