Introduction to Digital Signal Processors (DSPs)
Dr. Konstantinos Tatas
ACOE343 - Embedded Real-Time Processor Systems -
Frederick University
Outline/objectives
• Identify the most important DSP processor
architecture features and how they relate
to DSP applications.
What is a DSP?
• A specialized microprocessor for real-time DSP applications
– Digital filtering (FIR and IIR)
– FFT
– Convolution, matrix multiplication, etc.
[Figure: analog input → ADC → digital input → DSP → digital output → DAC → analog output]
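The IIR filtering listed above can be illustrated with a first-order low-pass filter, whose previous output feeds back into the next computation: $y[n] = \alpha x[n] + (1-\alpha) y[n-1]$. This is a minimal C sketch, assuming floating-point samples that the ADC has already produced; the type iir1_t, the functions iir1_init, iir1_step and iir1_block, and the smoothing factor alpha are hypothetical names, not part of any TI library.

```c
#include <assert.h>
#include <stddef.h>

/* First-order IIR low-pass filter (exponential smoothing):
 *   y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
 * The feedback from y[n-1] is what makes the impulse response infinite. */
typedef struct {
    double alpha;   /* smoothing factor, 0 < alpha <= 1 */
    double y_prev;  /* filter state: the previous output y[n-1] */
} iir1_t;

void iir1_init(iir1_t *f, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    f->alpha = alpha;
    f->y_prev = 0.0;  /* filter starts at rest */
}

/* Process one sample, as a real-time loop would after each ADC conversion. */
double iir1_step(iir1_t *f, double x) {
    double y = f->alpha * x + (1.0 - f->alpha) * f->y_prev;
    f->y_prev = y;
    return y;
}

/* Filter a whole buffer of samples in place. */
void iir1_block(iir1_t *f, double *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = iir1_step(f, buf[i]);
}
```

With alpha = 0.5 and a constant input of 1.0, the outputs are 0.5, 0.75, 0.875 and so on, approaching 1.0. On a fixed-point DSP the same recurrence would typically be written with scaled integers instead of doubles.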
Hardware used in DSP
                    ASIC       FPGA     GPP      DSP
Performance         Very high  High     Medium   Medium-high
Flexibility         Very low   High     High     High
Power consumption   Very low   Low      Medium   Low-medium
Development time    Long       Medium   Short    Short
Common DSP features
• Harvard architecture
• Dedicated single-cycle Multiply-Accumulate
(MAC) instruction (hardware MAC units)
• Single-Instruction Multiple-Data (SIMD) and Very Long Instruction Word (VLIW) architectures
• Pipelining
• Cache
• DMA
Harvard Architecture
• Physically separate
memories and paths
for instruction and
data
[Figure: CPU connected to PROGRAM MEMORY and to DATA MEMORY over separate paths]
Single-Cycle MAC unit
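[Figure: a_i and x_i feed a Multiplier; the product a_i x_i enters an Adder together with the running sum held in a Register (e.g. a_{i-1} x_{i-1} + ...), and the Adder output is written back to the Register]
Can compute a sum of n products, $\sum_{i=0}^{n-1} a_i x_i$, in n cycles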
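A MAC unit is what makes an FIR filter, $y[k] = \sum_{i=0}^{N-1} h_i x[k-i]$, cheap: each tap costs one multiply-accumulate. The sketch below assumes 16-bit fixed-point coefficients and samples and a 64-bit accumulator, standing in for the extended-precision accumulators many DSPs provide; mac_dot and fir_step are hypothetical names, and whether a given compiler maps the loop onto one MAC per cycle depends on the target and optimization settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products a[0]*x[0] + ... + a[n-1]*x[n-1], written as one
 * multiply-accumulate per iteration: n products, n MAC steps. */
int64_t mac_dot(const int16_t *a, const int16_t *x, size_t n) {
    int64_t acc = 0;                   /* wide accumulator avoids overflow */
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * x[i];   /* multiply, then accumulate */
    return acc;
}

/* One step of an N-tap FIR filter. hist[] is the delay line:
 * hist[0] holds the newest sample x[k], hist[i] holds x[k-i]. */
int64_t fir_step(const int16_t *h, int16_t *hist, size_t taps, int16_t x) {
    assert(taps > 0);
    for (size_t i = taps - 1; i > 0; i--)
        hist[i] = hist[i - 1];         /* age every stored sample by one */
    hist[0] = x;                       /* insert the new sample */
    return mac_dot(h, hist, taps);     /* taps MAC operations */
}
```

With h = {1, 1, 1} the filter is a 3-point running sum: feeding 10, 20, 30, 40 into an all-zero delay line yields 10, 30, 60, 90.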
Single Instruction - Multiple Data
(SIMD)
• A technique for data-level parallelism in which a single instruction drives a number of processing elements that apply the same operation to different data in parallel
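The SIMD idea can be sketched in portable C with a "SIMD within a register" trick: two 16-bit values packed into one 32-bit word are added with a single 32-bit addition, so one operation processes two data elements. This only illustrates the principle; real DSPs provide dedicated packed-data instructions for this, and add2_packed and vadd16 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 16-bit lanes packed in 32-bit words (lane 0 = bits 0..15,
 * lane 1 = bits 16..31) with ONE 32-bit addition. Each lane wraps
 * modulo 2^16 and no carry crosses from lane 0 into lane 1. */
uint32_t add2_packed(uint32_t a, uint32_t b) {
    /* Clear each lane's top bit so the add cannot carry across lanes. */
    uint32_t sum = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    /* Restore each lane's top bit: top = a_top XOR b_top XOR carry-in. */
    return sum ^ ((a ^ b) & 0x80008000u);
}

/* Add two arrays of packed 16-bit pairs: two elements per operation. */
void vadd16(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t nwords) {
    for (size_t i = 0; i < nwords; i++)
        out[i] = add2_packed(a[i], b[i]);
}
```

For example, 0x0000FFFF + 0x00000001 gives 0x00000000: lane 0 wraps to zero and, unlike a plain 32-bit addition, nothing carries into lane 1.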
Very Long Instruction Word (VLIW)
• A technique for
instruction-level
parallelism by executing
instructions without
dependencies (known at
compile-time) in parallel
• Example of a single
VLIW instruction:
F=a+b; c=e/g; d=x&y; w=z*h;
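[Figure: the VLIW instruction F=a+b | c=e/g | d=x&y | w=z*h is split across four processing units (PU); each PU reads its own operands (a, b; e, g; x, y; z, h) and produces its own result (F; c; d; w)]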
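Because VLIW parallelism is found at compile time, the compiler must check that the operations it packs into one instruction are independent. A minimal sketch of such a check, assuming three-operand operations on single-letter variables (op_t and can_bundle are hypothetical names): it conservatively rejects a bundle if any operation reads a variable that another operation in the bundle writes, or if two operations write the same variable.

```c
#include <stdbool.h>
#include <stddef.h>

/* One three-operand operation: dst = src1 OP src2 (the operator itself
 * does not matter for the dependency check). */
typedef struct {
    char dst, src1, src2;
} op_t;

/* True if the n operations can safely share one VLIW instruction. */
bool can_bundle(const op_t *ops, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i == j)
                continue;
            /* Data dependency: op j reads the result op i writes. */
            if (ops[j].src1 == ops[i].dst || ops[j].src2 == ops[i].dst)
                return false;
            /* Output conflict: both ops write the same variable. */
            if (ops[j].dst == ops[i].dst)
                return false;
        }
    }
    return true;
}
```

For the slide's four operations, can_bundle returns true. Changing the second operation to c=F*g makes it depend on F, the check fails, and a compiler would have to place the two operations in separate instructions.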
CISC vs. RISC vs. VLIW
Pipelining
• DSPs commonly feature deep pipelines
• TMS320C6x processors have three pipeline stages, each divided into a number of phases (cycles):
– Fetch
• Program Address Generate (PG)
• Program Address Send (PS)
• Program Ready Wait (PW)
• Program Receive (PR)
– Decode
• Dispatch (DP)
• Decode (DC)
– Execute
• 6 to 10 phases
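The benefit of a deep pipeline can be estimated with the standard idealized formula: with k phases and one instruction entering per cycle, n instructions finish in k + n - 1 cycles instead of k * n. A sketch in C, assuming no stalls, branches or multi-cycle instructions, and ignoring that the C6x fetches and dispatches packets of several instructions at once; c6x_phases, pipelined_cycles and unpipelined_cycles are hypothetical helpers that use the phase counts from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Total phases per the list above: 4 fetch (PG, PS, PW, PR)
 * + 2 decode (DP, DC) + the execute phases (6 to 10). */
unsigned c6x_phases(unsigned execute_phases) {
    return 4u + 2u + execute_phases;
}

/* Ideal pipeline: the first instruction needs k cycles to pass through,
 * after which one instruction completes every cycle. */
uint64_t pipelined_cycles(unsigned k, uint64_t n) {
    assert(k > 0);
    return n == 0 ? 0 : (uint64_t)k + n - 1;
}

/* No pipelining: each instruction uses all k phases before the next starts. */
uint64_t unpipelined_cycles(unsigned k, uint64_t n) {
    return (uint64_t)k * n;
}
```

With 6 execute phases (12 phases in total), 100 instructions take 111 cycles in the idealized pipeline instead of 1200 without it, which is why deep pipelines pay off for long, predictable DSP loops.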
Direct Memory Access (DMA)
• The feature that allows peripherals to access
main memory without the intervention of the
CPU
• Typically, the CPU initiates a DMA transfer, performs other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the transfer is complete.
• Can create cache coherency problems (the data
in the cache may be different from the data in
the external memory after DMA)
• Requires a DMA controller
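The start, work and interrupt sequence described above can be modelled in software. This is a hypothetical simulation (dma_channel_t, dma_start, dma_tick and dma_done_isr are illustrative names, not a TI API): the CPU programs a channel and continues with its own work while the controller moves one word per tick, and a completion callback plays the role of the interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one DMA channel. */
typedef struct dma_channel {
    const uint32_t *src;                        /* next word to read */
    uint32_t *dst;                              /* next word to write */
    size_t count;                               /* words still to move */
    bool busy;
    void (*on_complete)(struct dma_channel *);  /* "interrupt" handler */
} dma_channel_t;

/* CPU side: program the channel, start it, and return immediately. */
void dma_start(dma_channel_t *ch, const uint32_t *src, uint32_t *dst,
               size_t count, void (*on_complete)(dma_channel_t *)) {
    assert(count > 0);
    ch->src = src;
    ch->dst = dst;
    ch->count = count;
    ch->on_complete = on_complete;
    ch->busy = true;
}

/* Controller side: move one word. In hardware this happens without the
 * CPU; here it is simulated by calling dma_tick. */
void dma_tick(dma_channel_t *ch) {
    if (!ch->busy)
        return;
    *ch->dst++ = *ch->src++;
    if (--ch->count == 0) {
        ch->busy = false;
        if (ch->on_complete)
            ch->on_complete(ch);  /* raise the completion "interrupt" */
    }
}

/* Example handler: records completion so the CPU's main loop can see it. */
volatile bool transfer_done = false;

void dma_done_isr(dma_channel_t *ch) {
    (void)ch;
    transfer_done = true;
}
```

On real hardware the controller runs concurrently with the CPU rather than being stepped by it, and before reading the transferred data the CPU would also have to keep its caches coherent with dst, for example by invalidating the affected cache lines.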
Cache memory
• Separate instruction and data L1 caches (Harvard architecture)
• Most systems also use DMA
DSP vs. Microcontroller
• DSP
– Harvard Architecture
– VLIW/SIMD (parallel
execution units)
– Limited bit-level operations
– Hardware MACs
– DSP applications
• Microcontroller
– Mostly von Neumann
Architecture
– Single execution unit
– Flexible bit-level
operations
– No hardware MACs
– Control applications
The TMS320C6713’s high-performance CPU and rich peripheral set are
tailored for multichannel audio applications such as broadcast and
recording mixing, home and large-venue audio decoders, and multi-zone
audio distribution. The TMS320C6713 device is based on the
high-performance, advanced VelociTI very-long-instruction-word (VLIW)
architecture developed by Texas Instruments (TI). The VelociTI
architecture provides ample performance to decode a variety of existing
digital audio formats and the flexibility to add future formats.
Architecture of TMS320C67xx
[Figure: TMS320C6713 DSP Starter Kit (DSK) block diagram]
• A TMS320C6713 DSP operating at 225 MHz
• 16 Mbytes of synchronous DRAM
• 512 Kbytes of non-volatile Flash memory (256 Kbytes usable in default configuration)
• 4 user-accessible LEDs and DIP switches
• Software board configuration through registers implemented in CPLD
• JTAG emulation through an on-board JTAG emulator with USB host interface, or an external emulator
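The 225 MHz clock listed above sets a hard per-sample budget for real-time work such as the multichannel audio applications mentioned for the TMS320C6713. A minimal sketch of that arithmetic, assuming a 48 kHz sample rate and a hypothetical per-channel cost in cycles (both illustrative, not TI figures); cycles_per_sample and fits_realtime are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cycles available per sample period: clock rate / sample rate,
 * rounded down so the budget is never overstated. */
uint32_t cycles_per_sample(uint32_t clock_hz, uint32_t sample_rate_hz) {
    assert(sample_rate_hz > 0);
    return clock_hz / sample_rate_hz;
}

/* True if `channels` channels, each needing `cycles_needed` cycles per
 * sample, fit in the budget. Ignores interrupt, DMA and memory overheads. */
bool fits_realtime(uint32_t clock_hz, uint32_t sample_rate_hz,
                   uint32_t channels, uint32_t cycles_needed) {
    uint64_t total = (uint64_t)cycles_needed * channels;
    return total <= cycles_per_sample(clock_hz, sample_rate_hz);
}
```

At 225 MHz and 48 kHz, cycles_per_sample returns 4687, so eight channels at 500 cycles each (4000 cycles) fit, while eight channels at 600 cycles each (4800 cycles) do not.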
Review Questions
• Which of the following is not a typical DSP
feature?
– Dedicated multiplier/MAC
– Von Neumann memory architecture
– Pipelining
– Saturation arithmetic
• Which implementation would you choose for the lowest power consumption?
– ASIC
– FPGA
– General-Purpose Processor
– DSP