Iecon slides

Address generation unit for multimedia applications on application specific instruction set processors
Marc Moreno-Berengue, Guillermo Talavera Velilla, Aitor Rodriguez-Alsina,
Jordi Carrabina
Universitat Autònoma de Barcelona (Spain)

IECON 2010
7–10 November – Phoenix, AZ, USA

Motivation

➢ Design a custom Address Generation Unit (AGU)
➢ Connected to an ASIP datapath

➢ Benefits of custom AGU design
➢ Previous software optimizations.
➢ Multimedia applications


Structure
➢ Introduction
➢ Design
➢ Work Flow
➢ Results
➢ Conclusions


Introduction

Multimedia applications features
➢ Multimedia applications
➢ Complex index manipulation
➢ Large number of data accesses
➢ Require
➢ High performance
➢ Low energy consumption

It is crucial to reduce these data accesses and the related address
computations in an effective way

SW optimizations
The Data Transfer and Storage Exploration (DTSE)* methodology
aims to:
➢ Reduce data transfers between memories and processor
➢ Improve the energy efficiency
➢ Reduce the execution time

These SW transformations add a high overhead in address
generation and control flow

*Methodology developed at the IMEC research center

SW optimizations

Before:

...
for (x=1; x<=N-2; ++x)
  for (y=1; y<=N-2; ++y)
    for (k=-1; k<=1; ++k){
      A[x][y] += B[x+k][y]*C[abs(k)];
      A[x][y] /= tot;
    }
...

After:

...
for (y=0; y<=M+2; ++y){
  for (x=0; x<=N+2; ++x) {
    if (x>=0&&x<N&&y>=1&&y<=M-2)
      D[x%3] = B[(y*N+x)%8704+(y*N+x)/8704*16384+7680];
    if (x-1>=1&&x-1<=N-2&&y>=1&&y<=M-2) {
      for (k=-1; k<=1; ++k)
        acc += D[(x-1+k)%3]*C[abs(k)];
      acc /= tot;
    }
  }
}
...

SW optimizations

The address computations in the transformed code, for example
B[(y*N+x)%8704+(y*N+x)/8704*16384+7680], need to be optimized.

Address Generation Unit
The Address Generation Unit (AGU) is a coprocessor that uses
the address equation (AE) to generate the address sequence (AS).

&X[AE]=AS

Example:
B[(y*N+x)%8704+(y*N+x)/8704*16384+7680]
AE = (y*N+x) % 8704 + (y*N+x) / 8704*16384+7680
AS = 7680,7681,7682,7683, ...
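
As a rough illustration of the example above, the AE can be evaluated in plain C for successive values of y*N+x. The function names `address_equation` and `address_sequence`, and the parameter `linear` (standing for y*N+x), are assumptions made for this sketch; the AGU would produce the same sequence in hardware.

```c
/* Sketch only: software evaluation of the example AE.
   `linear` stands for y*N+x; the names are assumptions, not from the slides. */
static int address_equation(int linear)
{
    return linear % 8704 + linear / 8704 * 16384 + 7680;
}

/* Fill `as` with the first `n` values of the address sequence (AS). */
static void address_sequence(int *as, int n)
{
    for (int i = 0; i < n; ++i)
        as[i] = address_equation(i);
}
```

For linear = 0, 1, 2, 3 this yields 7680, 7681, 7682, 7683, matching the AS shown above. At linear = 8704 the sequence jumps to 24064 (16384 + 7680).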

Design

Application specific instruction set processor (ASIP)
➢ Its instruction set can be extended
➢ Fast interface to read/write data from/to specific hardware
➢ 1 instruction
➢ 1 cycle

AGU design

➢ An AGU attached to the ASIP datapath saves execution time
● 1 instruction
● 1 cycle

AGU skeleton
The AGU has one control unit, one process unit and one FIFO

[Block diagram: Custom Instruction interface; CI unit (change AE values, read AS values); CO unit (AS generation)]


AGU skeleton

➢ CI (custom instruction) unit
• AE configuration & read FIFO

➢ CO (coprocessor) unit
• Calculates the AE to generate the AS and stores all values in the FIFO

[Block diagram: CI unit (change AE values, read AS values); CO unit (AS generation)]
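
The split between the CI unit, the CO unit and the FIFO can be mimicked with a small software model. This is only a sketch under stated assumptions: the type `agu_model`, the functions `agu_config`, `agu_co_step` and `agu_read`, the FIFO depth of 8, and the AE form (i + offset) % mod are all hypothetical and are not taken from the slides or from the actual hardware design.

```c
#define FIFO_DEPTH 8  /* assumed depth; the slides do not state it */

/* Hypothetical software model of the AGU skeleton. */
typedef struct {
    int fifo[FIFO_DEPTH]; /* AS values waiting to be read */
    int head, count;      /* ring-buffer state */
    int i;                /* loop index advanced by the CO unit */
    int offset, mod;      /* AE parameters, set through the CI unit */
} agu_model;

/* CI unit, configuration path: change the AE values and reset the FIFO. */
static void agu_config(agu_model *a, int start, int offset, int mod)
{
    a->head = 0;
    a->count = 0;
    a->i = start;
    a->offset = offset;
    a->mod = mod;
}

/* CO unit: evaluate the assumed AE, (i + offset) % mod, and store the
   resulting AS values until the FIFO is full. */
static void agu_co_step(agu_model *a)
{
    while (a->count < FIFO_DEPTH) {
        int tail = (a->head + a->count) % FIFO_DEPTH;
        a->fifo[tail] = (a->i + a->offset) % a->mod;
        a->i++;
        a->count++;
    }
}

/* CI unit, read path: pop the next AS value from the FIFO.
   The model refills on demand; real hardware would refill in parallel. */
static int agu_read(agu_model *a)
{
    if (a->count == 0)
        agu_co_step(a);
    int v = a->fifo[a->head];
    a->head = (a->head + 1) % FIFO_DEPTH;
    a->count--;
    return v;
}
```

After `agu_config(&a, 7, 0, 7)`, successive calls to `agu_read(&a)` return 0, 1, 2, 3, 4, 5, 6, 0, 1, ... In this model the processor only ever pays for the read, while the AE evaluation happens inside `agu_co_step`.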


AGU Creator

Web-based application


Work Flow

Init.c --SW Opt. (DTSE)--> Opt.c --AGUs--> CI_code.c

Init.c:

int A[70],B[70],C=0;
...
for (i=7; i<70; i++)
{
  B[i]=A[i-7]+B[i-7];
  A[i]=i;
  C+=B[i];
}
...

Opt.c:

int A[7],B[7],C=0;
...
for (i=7; i<70; i++)
{
  B[i%7]=A[(i-7)%7]+B[(i-7)%7];
  A[i%7]=i;
  C+=B[i%7];
}
...

CI_code.c:

int A[7],B[7],C=0,ix,x;
initAGU(); initAGU2();
...
for (i=7; i<70; i++)
{
  x=readAGU();
  ix=readAGU2();
  B[x]=A[ix]+B[ix];
  A[x]=i;
  C+=B[x];
}
...
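
One way to sanity-check the three versions is to run them on a host machine with software stand-ins for the AGUs. The sketch below makes several assumptions: the bodies of `initAGU`, `initAGU2`, `readAGU` and `readAGU2` are hypothetical replacements for the custom instructions, the wrapper functions `run_init`, `run_opt` and `run_ci` are invented for the comparison, and the start values of the first seven array entries are arbitrary because the slides elide the initialization.

```c
/* Software stand-ins for the two AGUs (assumption: in the real design
   readAGU()/readAGU2() are custom instructions that read the AGU FIFO). */
static int agu_i, agu2_i;
static void initAGU(void)  { agu_i = 7; }
static void initAGU2(void) { agu2_i = 7; }
static int readAGU(void)   { return agu_i++ % 7; }         /* AE: i % 7     */
static int readAGU2(void)  { return (agu2_i++ - 7) % 7; }  /* AE: (i-7) % 7 */

/* Init.c: full-size arrays. */
static int run_init(void)
{
    int A[70] = {0}, B[70] = {0}, C = 0, i;
    for (i = 0; i < 7; i++) { A[i] = i; B[i] = 2 * i; }  /* assumed start values */
    for (i = 7; i < 70; i++) {
        B[i] = A[i-7] + B[i-7];
        A[i] = i;
        C += B[i];
    }
    return C;
}

/* Opt.c: arrays shrunk to 7 entries, indices computed with modulo. */
static int run_opt(void)
{
    int A[7], B[7], C = 0, i;
    for (i = 0; i < 7; i++) { A[i] = i; B[i] = 2 * i; }
    for (i = 7; i < 70; i++) {
        B[i%7] = A[(i-7)%7] + B[(i-7)%7];
        A[i%7] = i;
        C += B[i%7];
    }
    return C;
}

/* CI_code.c: indices come from the AGUs instead of modulo arithmetic. */
static int run_ci(void)
{
    int A[7], B[7], C = 0, ix, x, i;
    for (i = 0; i < 7; i++) { A[i] = i; B[i] = 2 * i; }
    initAGU(); initAGU2();
    for (i = 7; i < 70; i++) {
        x = readAGU();
        ix = readAGU2();
        B[x] = A[ix] + B[ix];
        A[x] = i;
        C += B[x];
    }
    return C;
}
```

With these stand-ins, `run_init()`, `run_opt()` and `run_ci()` return the same value of C, which suggests that moving the index arithmetic into the AGUs leaves the loop's result unchanged.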

Results

Test environment
➢ NIOS II soft-core processor (Altera)
● 32-bit RISC processor
● Harvard memory architecture
● Data and instruction caches
● 256 custom instructions (fast datapath interface)

➢ Altera Cyclone II EP2C35 FPGA

Test Applications

➢ Cavity Detector
Medical imaging application that detects cavities in tomography scans

➢ Quadtree Structured Difference Pulse Code Modulation (QSDPCM)
An interframe compression technique for video imaging.

Speedup
[Figure: bar charts "Speedup (Cavity)" and "Speedup (QSDPCM)"; bar labels: Init, DTSE, AGU inclusion, HW AGU inclusion]

Speedup: 1.26 (Cavity), 1.19 (QSDPCM)

Energy improvements
[Figure: bar charts "Energy (Cavity)" and "Energy (QSDPCM)"; bar labels: Init, DTSE, AGU inclusion, HW AGU inclusion]

Energy reduction: 27% (Cavity), 21% (QSDPCM)

Area penalties

               Cavity (LEs)   QSDPCM (LEs)
NIOS-F         2644           2644
NIOS-F + AGU   3596           3592

Including the AGU in the NIOS II architecture uses
2.9% of the total FPGA resources (33216 LEs)
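
The 2.9% figure can be reproduced from the table with a short calculation. The helper name `agu_overhead_percent` and the macro `TOTAL_LES` are assumptions made for this sketch; the LE counts are the ones reported above.

```c
/* Total logic elements (LEs) of the FPGA, as stated on the slide. */
#define TOTAL_LES 33216

/* Percentage of the FPGA's LEs added by the AGU (sketch; name is hypothetical). */
static double agu_overhead_percent(int les_with_agu, int les_without_agu)
{
    return 100.0 * (les_with_agu - les_without_agu) / TOTAL_LES;
}
```

For Cavity, `agu_overhead_percent(3596, 2644)` gives about 2.87% (952 extra LEs). For QSDPCM, `agu_overhead_percent(3592, 2644)` gives about 2.85% (948 extra LEs). Both round to the reported 2.9%.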



Conclusions
➢ Extending an ASIP with AGUs is an efficient way to meet the
performance/energy requirements of multimedia applications
after some SW optimizations

➢ The innovation of connecting the AGU to the processor datapath
and running it in parallel with the main processor allows
a wide range of values to be calculated before the processor needs them

➢ Using an AGU skeleton and a wizard decreases the design and
implementation time.

Future Work
➢ Improve the AGU wizard in order to:

● Automatically detect AEs and show relevant information
about each AE for a given C file.
● Generate the appropriate AGU for a specific set of AEs
● Generate AGUs for more than one ASIP

➢ Extend the set of applications used in this work
