3. 9/5/2012 Embedded Systems 3
Introduction to DSP
DSP is application of mathematical operations to digitally
represented signals
Signals represented digitally as sequences of samples.
Digital signals obtained from physical signals via tranducers
(e.g., microphones) and analog to digital converters (ADC).
Digital signals converted back to physical signals via digital-to-
analog converters (DAC)
4. 9/5/2012 Embedded Systems 4
Common DSP applications & algorithms
Applications
Communications
Audio and video processing
Graphics, image enhancement, 3- D rendering
Navigation, Radar, GPS
Control - robotics, machine vision, guidance
Algorithms
Frequency domain filtering - FIR and IIR
Frequency- time transformations - FFT
Correlation, convolution
Estimation /Prediction.
5. 9/5/2012 Embedded Systems 5
What Do DSPs Need to Do Well?
Most DSP tasks require:
Repetitive numeric computations
Attention to numeric fidelity
High memory bandwidth, mostly via array accesses
Real-time processing
DSPs must perform these tasks efficiently
while minimizing:
Cost
Power
Memory use
Development time
6. 9/5/2012 Embedded Systems 6
DSP Algorithms Mold DSP Architectures
FIR Filtering: A Motivating Problem
h(k)=filter coefficients
x(n)=input samples
y(n)=output samples
y[n]= ∑ h[k] * x[n-k]
7. 9/5/2012 Embedded Systems 7
loop: lw x0, (r0)
lw y0, (r1)
mul a, x0,y0
add b,a,b
inc r0
inc r1
dec ctr
tst ctr
jnz loop
sw b,(r2)
inc r2
FIR Filtering: on GPP
Problems:
•Memory bandwidth bottleneck
•Control code and addressing overhead
•Possibly slow multiply
8. 9/5/2012 Embedded Systems 8
16-bit fixed-point
“Harvard architecture”– separate
instruction, data memories
Specialized instruction set– Load
and Accumulate
Accumulator
Highly parallel Multiple-
Accumulate (MAC)
FIR Filtering: on an early DSP (TMS32010)
Processor
Instruction
Memory
Data
Memory
Early Improvements:
•Memory Structure
•Datapath structure
9. 9/5/2012 Embedded Systems 9
DSP Architecture Characteristics
Data path configured for DSP
Multiple memory banks and buses
Specialized instruction set
Specialized addressing modes
Specialized execution control
Specialized peripherals for DSP
10. 9/5/2012 Embedded Systems 10
DSP Data Path-I
Arithmetic: DSPs use Fixed Point Arithmetic instead of Floating
Point data type.
Fixed pt data use fixed number of bits.
Floating pt allow wider range of values eliminating the hazard of
numeric overflow.
Fixed pt implementation is simpler thus making it cheaper and less
power hungry.
Word size affects precision of fixed point numbers
DSPs have 16-bit, 20-bit, or 24-bit data words for audio processing.
Floating Point DSPs cost 2x – 4x vs fixed point DSPs and also slower
than fixed point
11. 9/5/2012 Embedded Systems 11
DSP Data Path-II
Fast Multiplier :
DSP processors integrate MAC H/W in to
main datapath
Specialized hardware performs all key arithmetic operations in
1 cycle
≥50% of instructions can involve multiplier
single cycle latency multiplier
Parallel multiply-accumulate (MAC)
n-bit multiplier => 2n-bit product.
12. 9/5/2012 Embedded Systems 12
DSP Data Path-III
Accumulator : Don’t want overflow or have to scale
accumulator
Option 1: accumulator wider than product:" guard bits”
Ex: Motorola DSP: 24b x 24b => 48b product, 56b Accumulator
Option 2: shift right and round product before adder
13. 9/5/2012 Embedded Systems 13
DSP Memory
Harvard Architecture:
DSPs want multiple data ports
Instruction repeat buffer: do 1
instruction 256 times.
Some recent DSPs have instruction
caches
Even then may allow programmer
to “lock in” instructions into cache
Option to turn cache into fast
program memory
No DSPs have data caches
May have multiple data memories
Modified Harvard Architecture.
14. 9/5/2012 Embedded Systems 14
DSP Addressing Modes
Want to keep MAC datapth busy
Complex addressing is good
Don’t use datapath to calculate fancy address
Absolute Addressing Modes commonly used
Mem.Access in DSPs have predictable pattern
Circular Buffers to support FIR
Bit Reversal Addressing to support FFT.
15. 9/5/2012 Embedded Systems 15
DSP specialized instruction set
May specify multiple operations in a single instruction
Must support Multiply-Accumulate (MAC)
Need parallel move support
Usually have special loop support to reduce branch
overhead
May have saturating arithmetic.
May have conditional execution to reduce branches.