Ravi Kumar Parmar
Student, BCA V Semester (Roll 42)
School of Computer and Systems Sciences
Jaipur National University
The graphics processing unit (GPU) has become
an integral part of today’s mainstream
computing systems. Over the past six years,
there has been a marked increase in the
performance and capabilities of GPUs.
A GPU is a graphics processing unit that enables you to run high-definition graphics on your PC, a hallmark of modern computing. Like the CPU (Central Processing Unit), it is a single-chip processor. The GPU has hundreds of cores, compared with the 4 or 8 in the latest CPUs. The primary job of the GPU is to compute 3D graphics.
This has already been undertaken by a number of graphics processor vendors, as in NVIDIA's PureVideo technology and ATI's Avivo. Both companies' technologies offload a number of the most computationally intensive aspects of MPEG decoding to the GPU in order to speed up the process over the CPU alone, so we will concentrate our work on the post-decoding phases.
With this paper we aim to fill the existing gap in the literature using the broader perspective of SOA. We do not restrict ourselves to specific problems but give an overview of the multitude of existing application examples and the promising future employment of graphics hardware in SOAs. By classifying the use cases into the layers of a reference architecture, we show which specific advantage of GPUs is most beneficial at each layer.
Control hardware dominates processors:
- Complex, difficult to build and verify
- Takes a substantial fraction of the die; scales poorly
- Pay for maximum throughput, sustain only average
- Quadratic dependency checking
- Control hardware doesn't do any math!
Over the past few years, the GPU has
evolved from a fixed-function special-purpose
processor into a full-fledged parallel
programmable processor with additional
fixed-function special-purpose functionality.
The Graphics Pipeline
- The input to the GPU is a list of geometric primitives, typically triangles, in a 3-D world coordinate system. Through many steps, these primitives are shaded and mapped onto the screen. The first of these steps is Vertex Operations: the input primitives are formed from individual vertices. Each vertex must be transformed into screen space and shaded, typically by computing its interaction with the lights in the scene. Because typical scenes have tens to hundreds of thousands of vertices, and each vertex can be computed independently, this stage is well suited to parallel hardware.
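Because every vertex is independent, the per-vertex work amounts to mapping one function over all vertices. A minimal CPU-side sketch in Python, assuming an invented viewport mapping from a [-1, 1] x [-1, 1] world space (illustrative only; a real GPU runs this across thousands of vertices in hardware):

```python
# Illustrative sketch: each vertex is transformed to screen space
# independently, so the same function can be mapped over all vertices.
# The mapping below is a made-up orthographic viewport transform.

def to_screen(vertex, width=640, height=480):
    """Map a vertex in [-1, 1] x [-1, 1] world space to pixel coordinates."""
    x, y, z = vertex
    sx = int((x + 1.0) * 0.5 * (width - 1))
    sy = int((1.0 - (y + 1.0) * 0.5) * (height - 1))  # flip y for screen space
    return (sx, sy, z)                                # keep depth for later stages

vertices = [(-1.0, -1.0, 0.0), (0.0, 1.0, 0.5), (1.0, -1.0, 1.0)]
# On a GPU, every vertex runs this program concurrently; here we just map.
screen_verts = [to_screen(v) for v in vertices]
```

Nothing in `to_screen` reads another vertex's data, which is exactly the independence property that makes the stage parallel-friendly.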
Evolution of GPU Architecture
- The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control flow.
Architecture of a Modern GPU
- We noted that the GPU is built for different
application demands than the CPU: large parallel
computation requirements with an emphasis on
throughput rather than latency. Consequently, the
architecture of the GPU has progressed in a different
direction than that of the CPU.
- The CPU divides the pipeline in time, applying
all resources in the processor to each stage in turn.
GPUs have historically taken a different approach. The
GPU divides the resources of the processor among the
different stages, such that the pipeline is divided in
space, not time. The part of the processor working on
one stage feeds its output directly into a different part
that works on the next stage.
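The space-division idea can be mimicked on a CPU with ordinary threads and queues: each stage is a permanently assigned worker that feeds its output directly to the next stage, so different elements occupy different stages at the same time. The stage functions below are invented placeholders, not any real pipeline stage:

```python
import queue
import threading

# Sketch of a pipeline divided in *space*: each stage is its own worker,
# dedicated to one kind of work, passing results downstream.

def run_stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut the stage down
            outbox.put(None)      # propagate shutdown to the next stage
            return
        outbox.put(fn(item))

transform = lambda v: v * 2       # stand-in for vertex processing
shade     = lambda v: v + 1       # stand-in for fragment shading

q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=run_stage, args=(transform, q_in, q_mid)).start()
threading.Thread(target=run_stage, args=(shade, q_mid, q_out)).start()

for v in [1, 2, 3]:               # stream elements through the pipeline
    q_in.put(v)
q_in.put(None)

results = []
while (r := q_out.get()) is not None:
    results.append(r)
```

While one element is being shaded, the next can already be in the transform stage; neither worker ever switches tasks, mirroring how a classic GPU dedicates hardware to each pipeline stage.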
– Tesla, the HPC-specific GPU line, evolved from the GeForce series
– FireStream, the HPC-specific GPU line, evolved from the (ATI) Radeon series
– Knights Corner, a many-core x86 chip, is like a hybrid between a GPU and a many-core CPU
The graphics pipeline was designed for:
- Large computational requirements
- Tolerance of long latencies
- Deep, feed-forward pipelines
- Hacks are OK (can tolerate a lack of …)

GPUs are good at parallel, arithmetically intense, streaming-memory problems.
In addition to query processing, large web search
engines need to perform many other operations
including web crawling, index building, and
data mining steps for tasks such as link analysis
and spam and duplicate detection. We focus
here on query processing and in particular on
one phase of this step as explained further
below. We believe that this part is suitable for
implementation on GPUs as it is fairly simple in
structure but nonetheless consumes a
disproportionate amount of the overall system
resources. In contrast, we do not think that
implementation of a complete search engine on
a GPU is currently realistic.
The GPU Programming Model
- The programmable units of the GPU follow a single-program, multiple-data (SPMD) programming model. For efficiency, the GPU
processes many elements (vertices or
fragments) in parallel using the same program.
Each element is independent from the other
elements, and in the base programming model,
elements cannot communicate with each
other. All GPU programs must be structured in
this way: many parallel elements, each
processed in parallel by a single program.
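The SPMD model can be approximated on a CPU with a worker pool: one program (the kernel), many independent elements, and no communication between them. The kernel below is an invented placeholder, not a real shader:

```python
from multiprocessing.dummy import Pool  # a thread pool stands in for GPU lanes

# SPMD sketch: the *same* program runs over many independent elements.
# Each result depends only on its own input; elements never communicate,
# which is what makes the parallel execution safe.

def kernel(element):
    """Placeholder per-element program (a real one would shade a vertex/fragment)."""
    return element * element + 1

elements = list(range(8))
with Pool(4) as pool:                   # 4 workers process elements in parallel
    results = pool.map(kernel, elements)
```

The map is order-preserving even though elements are processed concurrently; any kernel that tried to read a neighbor's state would fall outside this base model.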
General-Purpose Computing on the GPU
- Steps that show the simpler and more direct way that today's GPU computing applications are written:
1. Programming a GPU for graphics: we begin with the same GPU pipeline that we described in Section II, concentrating on its programmable aspects.
2. The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
3. Each fragment is shaded by the fragment program.
4. The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
5. The resulting image can then be used as a texture on future passes through the graphics pipeline.
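Steps 2-5 boil down to running one small function per output pixel, mixing arithmetic with reads from an input array. A rough CPU analogue, with the texture contents and the per-fragment function invented for illustration:

```python
# CPU analogue of the fragment-program pattern: one invocation per output
# pixel, combining math with reads from an input "texture" (a 2-D array).

W, H = 4, 3
texture = [[x + y for x in range(W)] for y in range(H)]  # input "texture"

def fragment_program(x, y):
    # Math plus a global texture read, as in step 4 above.
    return texture[y][x] * 2 + 1

# The rasterizer would generate one fragment per covered pixel (steps 2-3);
# here we simply loop over the full "screen".
output = [[fragment_program(x, y) for x in range(W)] for y in range(H)]
# `output` could now feed a future pass as its input texture (step 5).
```

Chaining passes this way, with each pass's output becoming the next pass's texture, is how early GPGPU programs expressed multi-step computations through the graphics API.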
We observe that different use cases weight the criteria differently. For example, a VDI deployment values high VM-to-GPU consolidation ratios (e.g., multiplexing), while a consumer running a VM to access a game or CAD application unavailable on their host values performance and likely fidelity. A tech-support person maintaining a library of different configurations and an IT administrator running server VMs are both likely to value portability and secure isolation.
- Front-end virtualization introduces a virtualization boundary at a relatively high level in the stack and runs the graphics driver in the host/hypervisor. This approach does not rely on any GPU vendor- or model-specific details.
- The most obvious back-end virtualization technique is fixed pass-through: the permanent association of a virtual machine with full, exclusive access to a physical GPU. Recent chipset features, such as Intel's VT-d, make fixed pass-through practical without requiring any special knowledge of a GPU's programming interfaces. However, fixed pass-through is not a general solution. It completely forgoes multiplexing, and packing machines with one GPU per virtual machine (plus one for the host) is not feasible.
This paper presents our evaluation and analysis of the efficiency of GPU computing for data-parallel scientific applications. Starting with a biomolecular code that calculates electrostatic properties in a data-parallel manner (i.e., GEM), we evaluate our different implementations of GEM across three metrics: performance, energy consumption, and energy efficiency.
In the future, we will continue this work by investigating the effects of memory layout (global, constant, texture) on GPU performance and efficiency. In addition, we will delve further into potential techniques for proactively reducing power and conserving energy on the GPU.
There is much future work in developing reliable benchmarks that specifically stress the performance weaknesses of a virtualization layer. Our tests show API overheads of about 2 to 120 times that of a native GPU. As a result, the performance of a virtualized GPU can be highly dependent on subtle implementation details of the application under test.
Back-end virtualization holds much promise for performance, breadth of GPU feature support, and ease of driver maintenance. While fixed pass-through is easy, none of the more advanced techniques has yet been demonstrated in practice.