Stream processing is a computer programming paradigm, related to SIMD.



It allows some applications to more easily exploit a limited form of parallel processing.
A stream is simply a set of records that require similar computation. Streams provide data parallelism.




Kernels are the functions that are applied to each element in the stream.


For each element we can only read from the input, perform operations on it, and write to the output.
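As an illustrative sketch that is not part of the original slides, the CUDA fragment below applies a hypothetical kernel, scale_offset, to every record of an input stream; each invocation reads only its own input element, operates on it, and writes only its own output element.

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: applied once per stream element.
// Each thread reads one input record, computes, and writes one output record.
__global__ void scale_offset(const float* in, float* out,
                             float scale, float offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * scale + offset;   // read input, operate, write output
    }
}

int main() {
    const int n = 1 << 20;                 // one million records in the stream
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    // Launch the kernel over the whole stream: one thread per record.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scale_offset<<<blocks, threads>>>(in, out, 2.0f, 1.0f, n);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);     // expect 2*42 + 1 = 85
    cudaFree(in);
    cudaFree(out);
    return 0;
}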
Stream processing is especially suitable for applications that exhibit three characteristics: compute intensity, data parallelism, and data locality.
Flynn’s Taxonomy: SISD

Single Instruction: Only one instruction stream is being acted on by the CPU during any one clock cycle.

Single Data: Only one data stream is being used as input during any one clock cycle.
Flynn’s Taxonomy: SIMD

Single Instruction: All processing units execute the same instruction at any given clock cycle.

Multiple Data: Each processing unit can operate on a different data element.
Flynn’s Taxonomy: MISD

Multiple Instruction: Each processing unit operates on the data independently via separate instruction streams.

Single Data: A single data stream is fed into multiple processing units.
Flynn’s Taxonomy: MIMD

Multiple Instruction: Every processor may be executing a different instruction stream.

Multiple Data: Every processor may be working with a different data stream.
Stream Processors




Stream processing makes use of locality of reference by explicitly grouping related code and data together for easy fetching into the cache.
StreamIt: a stream processing language for programs based on streams of data.

Examples: audio, video, DSP, networking, and cryptographic processing kernels; HDTV editing, radar tracking, microphone arrays, cellphone base stations, graphics.

[Thies 2002]
A high-level, architecture-independent language
for streaming applications

1. Improves programmer productivity (vs.
   Java, C)

2. Offers scalable performance on multicores



                                           [Thies 2002]
GPU

A GPU is a single-chip processor that creates lighting effects and transforms objects every time a 3D scene is redrawn. It is used primarily for 3D applications.

A GPU can be present on a video card, on the motherboard, or, in certain CPUs, on the CPU die.
World’s First GPU

Nvidia in 1999 marketed the GeForce 256 as "the world's first GPU, a single-chip processor that is capable of processing a minimum of 10 million polygons per second".


Rival ATI Technologies coined the term visual processing
unit or VPU with the release of the Radeon 9700 in 2002.
GPUs have a very high compute capacity.

To the hardware, the accelerator looks like another IO unit; it communicates with the CPU using IO commands and DMA memory transfers.

To the software, the accelerator is another computer to which your program sends data and routines to execute.
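A hedged sketch of that software view in CUDA (the kernel name square and the sizes are illustrative assumptions, not from the slides): the host allocates memory on the accelerator, pushes the data across (a DMA transfer underneath), launches a routine on it, and pulls the result back.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The "routine" the host sends to the accelerator.
__global__ void square(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

int main() {
    const int n = 1024;
    std::vector<float> host(n, 3.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));                   // space on the accelerator

    // Send the data: under the hood this is a DMA transfer over the IO bus.
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Send the routine to execute.
    square<<<(n + 255) / 256, 256>>>(dev, n);

    // Pull the results back (again a DMA transfer).
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);                     // expect 9.0
    cudaFree(dev);
    return 0;
}

The explicit cudaMemcpy calls are what make the IO/DMA nature of the transfers visible; managed memory would hide the same traffic behind page migration.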
GPGPU

This concept turns the massive floating-point computational power of a modern graphics accelerator into general-purpose computing power.

GPUs are stream processors: processors that can operate in parallel by running a single kernel on many records in a stream at once.

Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
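To make the "minimal dependency" point concrete, here is an illustrative contrast of my own (not from the slides): the saxpy kernel is a good fit because every output element depends only on its own inputs, while the running-sum loop is a poor fit as written because each step depends on the previous one, so it is left sequential on the host.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Good fit: large data set, no dependency between elements -> one thread each.
__global__ void saxpy(const float* x, const float* y, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y, *out;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(x, y, out, 3.0f, n);
    cudaDeviceSynchronize();
    printf("saxpy out[0] = %f\n", out[0]);          // expect 5.0

    // Poor fit (as written): a running sum where element i depends on element
    // i-1. Expressing this on a GPU needs a parallel scan; naively it stays
    // sequential, so it is computed on the host here.
    std::vector<float> prefix(n);
    float running = 0.0f;
    for (int i = 0; i < n; ++i) { running += x[i]; prefix[i] = running; }
    printf("prefix[n-1] = %f\n", prefix[n - 1]);    // expect n * 1.0

    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}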
In certain circumstances the GPU calculates forty times faster than conventional CPUs.

Transistor counts (millions):

  AMD Athlon 64 X2 (CPU):    154 M        ATI X1950 XTX (GPU):      384 M
  Intel Core 2 Quad (CPU):   582 M        NVIDIA G8800 GTX (GPU):   680 M
“The processing power of just 5,000 ATI processors is also enough to rival that of the existing 200,000 computers currently involved in the Folding@home project.” [Ref 1]

“...it is estimated that if a mere 10,000 computers were to each use an ATI processor to conduct folding research, that the Folding@home program would effectively perform faster than the fastest supercomputer in existence today, surpassing the 1 petaFLOP level.” (2007)

By November 10, 2011, Folding@home had reached 6.0 petaFLOPS, against 8.162 petaFLOPS for the K computer. [Ref 1]
Comparing GPUs to CPUs isn't an apples-to-apples comparison:

1. The clock rates are lower

2. The architectures are radically different

3. The problems they're trying to solve are almost completely unrelated
Imagine Stream Processor (Stanford)

Application Processor:

Executes application code like MPEG decoding.

Sequences the instructions and issues them to the stream clients, e.g. the KEU and the DRAM interface.

[Kapasi 2003]
Two Stream Clients:

KEU: the programmable Kernel Execution Unit.

DRAM interface: provides access to global data storage.

[Kapasi 2003]
KEU:

It has two stream-level instructions:

1. load_kernel – loads a compiled kernel function into the local instruction storage inside the KEU

2. run_kernel – executes the kernel

[Kapasi 2003]
DRAM interface:

It also has two stream-level instructions:

1. load_stream – loads an entire stream from memory into the SRF

2. store_stream – stores a stream from the SRF back to memory

[Kapasi 2003]
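To make the division of labour concrete, the following is a hypothetical host-side sketch, written as plain C++ stubs rather than any real Imagine toolchain API, of how the application processor might sequence these four stream-level instructions for one kernel invocation (the kernel name convolve_7x7 and the SRF slot numbers are invented for illustration).

#include <cstdio>

// Hypothetical stubs standing in for the four stream-level instructions
// described above; this is illustrative pseudocode, not a real Imagine API.
// Each stub just prints the step it represents.
void load_stream(int srf_slot, int n)  { printf("DRAM -> SRF slot %d: load %d records\n", srf_slot, n); }
void store_stream(int srf_slot, int n) { printf("SRF slot %d -> DRAM: store %d records\n", srf_slot, n); }
void load_kernel(const char* name)     { printf("KEU: load compiled kernel '%s'\n", name); }
void run_kernel(int in_slot, int out_slot) {
    printf("KEU: run kernel on SRF slot %d, writing SRF slot %d\n", in_slot, out_slot);
}

int main() {
    // One kernel invocation as the application processor might sequence it.
    load_stream(/*srf_slot=*/0, /*n=*/4096);    // DRAM interface client
    load_kernel("convolve_7x7");                // KEU client
    run_kernel(/*in_slot=*/0, /*out_slot=*/1);  // KEU client
    store_stream(/*srf_slot=*/1, /*n=*/4096);   // DRAM interface client
    return 0;
}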
Local register files (LRFs)

1. Used to hold operands for arithmetic operations (similar to caches on CPUs)

2. Exploit fine-grain locality

[Kapasi 2003]
Stream register files (SRFs)

1. Capture coarse-grain locality

2. Efficiently transfer data to and from the LRFs

[Kapasi 2003]
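As a loose analogy of my own rather than anything from [Kapasi 2003]: on a modern GPU, on-chip shared memory plays roughly the SRF role (staging a block of the stream close to the ALUs) and per-thread registers play roughly the LRF role (holding the operands of each arithmetic operation). The CUDA sketch below stages a tile in shared memory before computing a 3-point average from registers.

#include <cstdio>
#include <cuda_runtime.h>

// Analogy sketch: shared memory ~ SRF (coarse-grain staging of a tile),
// registers ~ LRF (operands held right next to the ALUs).
__global__ void blur3(const float* in, float* out, int n) {
    __shared__ float tile[256 + 2];               // staged block of the stream
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int t = threadIdx.x + 1;

    if (i < n) tile[t] = in[i];                   // bulk transfer into the "SRF"
    if (threadIdx.x == 0)              tile[0]     = (i > 0)     ? in[i - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1) tile[t + 1] = (i + 1 < n) ? in[i + 1] : 0.0f;
    __syncthreads();

    if (i < n) {
        float a = tile[t - 1], b = tile[t], c = tile[t + 1];  // operands in registers ("LRF")
        out[i] = (a + b + c) / 3.0f;
    }
}

int main() {
    const int n = 1024;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 3.0f;

    blur3<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[10] = %f\n", out[10]);            // interior elements -> 3.0
    cudaFree(in); cudaFree(out);
    return 0;
}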
Topics learnt today:

1. Stream Processing
2. StreamIt language from MIT
3. How modern GPUs use stream processing
4. Imagine Stream Processor from Stanford