Nvidia (History, GPU Architecture and New Pascal Architecture)

Graphics Processing Unit
A Graphics Processing Unit (GPU), also known as a Video Processing Unit (VPU), is an electronic circuit that rapidly manipulates memory to accelerate the creation and processing of images to be shown on a display device.
The term GPU was coined by NVIDIA in 1999 with the release of the GeForce 256, marketed as “The world’s first GPU”.
Modern GPUs use parallel processing, which makes them more efficient than CPUs at processing large blocks of data.
GPUs rely on rendering, a process in which each pixel of the display (e.g. 1920×1080 = 2,073,600 pixels) is given texture, lighting, and on-screen location values. Using these parameters, the GPU produces 3D images on a 2D screen.
GPUs have their own dedicated RAM, called Video RAM (VRAM), which stores information about each pixel (its color, location, and lighting) and can also hold frame buffers, i.e. complete images about to be projected on the screen.
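To make the one-value-per-pixel idea concrete, here is a minimal CUDA sketch (the kernel name and the gradient color formula are invented for illustration) in which each thread shades one pixel and writes the result into a frame buffer resident in VRAM:

```cuda
// One CUDA thread per pixel: compute a color, write it to the frame buffer.
__global__ void shade_pixels(uchar4 *framebuffer, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard partially filled blocks
    unsigned char r = (unsigned char)(255.0f * x / width);   // horizontal ramp
    unsigned char g = (unsigned char)(255.0f * y / height);  // vertical ramp
    framebuffer[y * width + x] = make_uchar4(r, g, 64, 255); // RGBA pixel
}
```

Launched over a 2D grid covering 1920×1080, roughly two million threads execute this kernel in parallel, one per pixel.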
GPU vs. CPU

Graphics Processing Unit
• Performs data parallelism.
• Has more cores, but at lower clock speeds.
• Uses VRAM, which is fast but small in size.
• Has a small cache memory.

Central Processing Unit
• Performs task parallelism.
• Has fewer cores, but at higher clock speeds.
• Uses external RAM, which is slower but larger in size.
• Has a large cache memory.
ARCHITECTURE
Streaming Multiprocessors and CUDA Cores
A Streaming Multiprocessor (SM) is the main computation unit of a GPU. It consists of smaller processing units called Stream Processors or CUDA cores.
CUDA is a term coined by NVIDIA that stands for Compute Unified Device Architecture. A CUDA core in turn consists of two units: a Floating Point Unit, which computes floating-point data, and an Integer Unit, which computes integer data.
There are 16 Load/Store Units per SM.
There are also four Special Function Units, which execute transcendental instructions such as sine, cosine, reciprocal, and square root.
Each SM uses two Warp Schedulers and two Instruction Dispatch Units, allowing two warps to be issued and executed concurrently.
One notable feature of the SM is the Fused Multiply-Add (FMA) instruction. A frequently used sequence of operations in computer graphics and in linear algebra is to multiply two numbers and add the product to a third number.
Prior to FMA, this was achieved by the Multiply-Add (MAD) instruction, which performs the multiplication with truncation, followed by the addition.
FMA maintains full precision of the intermediate product and rounds only once at the end, i.e. it computes round(a × b + c) rather than round(round(a × b) + c).
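A minimal device-code sketch of the difference (kernel and variable names are illustrative; fmaf(), __fmul_rn(), and __fadd_rn() are standard CUDA intrinsics):

```cuda
// Contrast MAD-style (two roundings) with FMA (one rounding).
__global__ void mad_vs_fma(const float *a, const float *b, const float *c,
                           float *mad_out, float *fma_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // MAD style: the product is rounded before the addition. The explicit
        // intrinsics keep the compiler from silently fusing the two ops.
        mad_out[i] = __fadd_rn(__fmul_rn(a[i], b[i]), c[i]);
        // FMA: the full-precision product feeds the addition; one rounding.
        fma_out[i] = fmaf(a[i], b[i], c[i]);
    }
}
```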
CUDA Processing Flow
1) The main memory (RAM) copies data onto the GPU memory (VRAM), where the data waits for further instructions.
2) The CPU then instructs the GPU to process the data. At this point, the GPU dispatches the data across its SPs (CUDA cores).
3) Processing occurs in such a way that the same instructions are executed over many data elements in parallel, saving processing time.
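In host code, the whole flow might look like this minimal sketch (names are illustrative; the runtime calls are the standard CUDA API):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *d, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;  // same instruction, executed across many threads
}

int main() {
    const int n = 1 << 20;
    float *host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    // 1) Copy input from main memory (RAM) to GPU memory (VRAM).
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    // 2) The CPU instructs the GPU: launch the kernel on the CUDA cores.
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    // 3) Copy the result back once the parallel execution has finished.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    delete[] host;
    return 0;
}
```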
Graphics Pipelining
Vertex Shader: Provides the locations of vertices in 3D space.
Primitive Generation: Builds polygons from the vertices in 3D space.
Rasterization: Fills the triangular geometry with dots, i.e. pixels.
Pixel Shader: Assigns each pixel attributes such as light and color.
Testing and Mixing: The 3D objects are tested for visibility and shadow effects, and anti-aliasing (AA) and blending are applied to produce the final image.
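To make the rasterization stage concrete, here is a toy CUDA sketch (one thread per pixel and a single triangle passed by value; a real hardware rasterizer is far more sophisticated) that uses signed edge functions to decide which pixels the triangle covers:

```cuda
// Signed area test: which side of edge (a -> b) does point (px, py) lie on?
__device__ float edge(float2 a, float2 b, float px, float py) {
    return (px - a.x) * (b.y - a.y) - (py - a.y) * (b.x - a.x);
}

// One thread per pixel: fill the triangle (v0, v1, v2) with a flat color.
__global__ void rasterize_triangle(unsigned int *fb, int w, int h,
                                   float2 v0, float2 v1, float2 v2,
                                   unsigned int color) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float px = x + 0.5f, py = y + 0.5f;           // sample the pixel center
    float e0 = edge(v0, v1, px, py);
    float e1 = edge(v1, v2, px, py);
    float e2 = edge(v2, v0, px, py);
    // Inside if the pixel center is on the same side of all three edges.
    if ((e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0))
        fb[y * w + x] = color;
}
```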
Components of a Modern GPU
Shader Processing Units (CUDA Cores): A CUDA core is essentially a parallel processor that performs graphics calculations.
Texture Mapping Unit (TMU): A TMU is used to resize, rotate, or distort a bitmap image so it can be placed on a 3D model as a texture. How fast a GPU can map textures onto a 3D model is measured by the Texture Fill Rate.
Raster Operation Pipeline (ROP): The final step of rendering is done by the ROP. The ROP takes pixel and texel information and processes it to give each pixel its final color and depth value. How quickly the ROPs can do this is measured by the Pixel Fill Rate.
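As a rough, first-order illustration of what these fill rates mean (the formulas are the commonly used approximations, and the numbers below are hypothetical, not taken from any specific card):

\[
\text{Texture Fill Rate} \approx N_{\text{TMU}} \times f_{\text{core}},
\qquad
\text{Pixel Fill Rate} \approx N_{\text{ROP}} \times f_{\text{core}}
\]

so a hypothetical GPU with 64 TMUs, 32 ROPs, and a 1.0 GHz core clock would reach roughly 64 GTexels/s and 32 GPixels/s.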
Nvidia Pascal GP100
The Most Advanced Datacenter Accelerator
Pascal Features
• Fabricated using 16nm FinFET technology by TSMC.
• New Chip-on-Wafer-on-Substrate (CoWoS) packaging with HBM2 high-bandwidth memory.
• NVLink for high-bandwidth interconnect between P100 chips.
• Unified Memory, Compute Preemption, and new AI algorithms.
NVLink
NVLink is a communication protocol developed by Nvidia. It specifies a point-to-point interconnect between a CPU and a GPU, and also between one GPU and another.
With NVLink, a GPU can access system memory at a rate of 160 GB/s, up from 120 GB/s in the previous Maxwell architecture.
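On the software side, direct GPU-to-GPU access (over NVLink or PCIe) is exposed through CUDA peer access; a minimal sketch using the standard runtime calls:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;
    // Can GPU 0 directly read/write memory that lives on GPU 1?
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // second argument (flags) must be 0
        // Kernels on GPU 0 may now dereference pointers allocated on GPU 1.
    } else {
        printf("Peer access between GPU 0 and GPU 1 is unavailable.\n");
    }
    return 0;
}
```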
HBM 2
HBM stands for High Bandwidth Memory, in which one or more memory dies are stacked vertically on top of each other, whereas traditionally discrete memory chips were soldered around the GPU chip.
HBM2 provides stacks of 8 GB of DRAM with 180 GB/s data transfer rates, up from the 2 GB stacks at 125 GB/s of HBM1.
Unified Memory, Compute Preemption and New AI Algorithms
Unified Memory: Unified Memory is an essential part of the CUDA programming model. It greatly simplifies programming and porting of applications to the GPU by providing a single unified virtual address space for accessing all CPU and GPU memory in the system.
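A minimal sketch of Unified Memory in practice (cudaMallocManaged() is the standard API; kernel and variable names are illustrative): one pointer is valid on both the CPU and the GPU, with no explicit cudaMemcpy calls.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1024;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));  // visible to CPU and GPU
    for (int i = 0; i < n; ++i) data[i] = i;    // written by the CPU
    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();                    // wait before reading on CPU
    printf("data[0] = %d\n", data[0]);          // prints 1
    cudaFree(data);
    return 0;
}
```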
Compute Preemption: This feature allows compute tasks running on the GPU to be interrupted at instruction-level granularity, solving the problem of long-running or ill-behaved applications that would otherwise make the system unresponsive while it waits for the task to complete.
New AI Algorithms: Pascal features new AI-oriented capabilities, most notably native half-precision (FP16) arithmetic that accelerates deep-learning workloads.
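As a sketch of what that half-precision path looks like in CUDA (assumes compute capability 5.3 or later; the kernel name is illustrative), __hfma2() performs a fused multiply-add on two packed FP16 values at once:

```cuda
#include <cuda_fp16.h>

// y = a*x + y on packed half-precision pairs (two FP16 lanes per register).
__global__ void fp16_axpy(const half2 *x, half2 *y, half2 a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __hfma2(a, x[i], y[i]);
}
```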
Applications
GPUs are mainly used for parallel computing in highly calculation-intensive tasks.
They are used in many physics simulations, statistical physics, fast Fourier transforms, fuzzy logic, analog signal processing, and digital image processing.
The new Pascal architecture supports deep learning, in which the neural learning process of the human brain is modeled so that the system continuously learns, gets smarter, and delivers more accurate results over time.
FANUC, a leading robot manufacturer, demonstrated an assembly-line robot powered by a Pascal GPU.
Questions?
