Introduction to Digital Signal processors

Dr. S. Periyanayagi
Professor & Head/ECE
Ramco Institute of Technology

DSP Processors
• A digital signal processor (DSP) is a specialized
microprocessor targeted at digital signal processing.
• DSP processors are designed to meet the needs of specific
digital signal processing applications.
• Advanced microprocessors are either RISC (Reduced
Instruction Set Computer) or CISC (Complex
Instruction Set Computer) processors.
• For real-time signal processing, DSP processors are
rated best among the programmable processors.

Salient features
• For Efficient performance of DSP Operations
Multiplier and Multiplier Accumulator
Modified Bus Structure and Memory Access
Schemes
Multiple Access Memory
Very Long Instruction Word VLIW Architecture
Pipelining
Special Addressing Modes
On Chip Peripherals

Categories of DSPs
• General Purpose digital Signal processors
 Fixed Point Processors
TMS320C5x, TMS320C54x and Motorola
DSP563x, DSP56156/166 (16 bit)
 Floating Point Processors
TMS320C4x, TMS320C67xx, Motorola
DSP96002 (32 bit)
• Special Purpose Processors
 Hardware designed for specific DSP algorithms
such as the FFT
 Hardware designed for specific applications such as PCM

• Analog Devices
– ADSP-2100 Family (16-bit fixed point)
– ADSP-21020 (32-bit floating point)
– ADSP-2106x (32-bit floating point)
• Texas Instruments
– TMS320C1x (16-bit fixed point)
– TMS320C2x (16-bit fixed point)
– TMS320C3x (32-bit floating point)
– TMS320C5x (16-bit fixed point)

• The TMS320 family of processors includes four
basic types of processors.
 Fixed-point processors – low-power, low-cost
devices that operate at high speed.
 32-bit floating-point processors – large
dynamic range, wider instruction word size and
more addressing modes.
 VLIW architecture processors – execute several
instructions in parallel using multiple
execution units.
 Multiprocessor DSPs – provide parallel
processing.

Architecture of Microprocessor
Diagram taken from ‘Digital signal Processing’- Emmanuel Ifeachor & Barrie W.Jervis, Second
edition book

Basic General Hardware Architecture for Signal
Processing

Techniques in DSP Processor
• Harvard architecture
• Pipelining
• Fast, dedicated hardware multiplier/
accumulator
• Special instruction dedicated to DSP
• Replication
• On chip memory/Cache
• Extended parallelism – SIMD, VLIW and
static superscalar processing

Special Features of Digital Signal
Processing
• Fast Data Access
• Fast Computation
• Numerical fidelity
• Fast execution control

Fast Data Access
• High-bandwidth Memory Architectures
 Von Neumann Architecture
 Harvard Architecture
 Modified Harvard Architecture
 Architecture of Advanced digital Signal
processors.
• Specialized Addressing Modes
 Circular Addressing
 Bit reversed addressing
• Direct Memory Access (DMA)

Fast Computation
• MAC (Multiply/Accumulate) Unit
• Pipelining of Instruction Execution
 Phase 1 : Fetch the opcode (or instruction code)
from program memory.
 Phase 2 : Decode the instruction code.
 Phase 3 : Read the operands (or data) from
data/program memory.
 Phase 4 : Execute the task specified by the
instruction and store the result.

Von Neumann Architecture
Diagram taken from ‘Digital signal Processors-Architecture, Programming and Applications’-
B.Venkataramani & M Bhaskar, Second Edition book

• MAC operation with data move (MACD
instruction) – requires 4 memory accesses per
instruction cycle.
 Fetch the MACD instruction from program
memory
 Fetch one of the operands from program
memory
 Fetch the second operand from data
memory
 Write the content of the data memory location with
address DMA into the location with address
DMA+1
• In a Von Neumann architecture these 4 accesses take
4 clock cycles.
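
The MACD steps listed above can be sketched in Python as one FIR output computation. This is only an illustrative model: the function name `macd_fir` and the use of Python lists for program memory (coefficients) and data memory (delay line) are assumptions made for this sketch, not TI code.

```python
def macd_fir(coeffs, delay_line, new_sample):
    """Compute one FIR output with a MACD-style loop (illustrative model).

    coeffs     -- filter coefficients, standing in for program memory
    delay_line -- stored input samples, standing in for data memory;
                  must have the same length as coeffs
    new_sample -- newest input sample x[n]
    """
    delay_line[0] = new_sample
    acc = 0
    # Start at the oldest sample so each data move (DMA -> DMA+1)
    # never overwrites a sample that has not been used yet.
    for k in range(len(coeffs) - 1, -1, -1):
        acc += coeffs[k] * delay_line[k]        # multiply and accumulate
        if k + 1 < len(delay_line):
            delay_line[k + 1] = delay_line[k]   # data move to the next address
    return acc


if __name__ == "__main__":
    line = [0, 0, 0]
    for x in [1, 2, 3, 0]:
        print(macd_fir([1, 2, 3], line, x))
```

Each pass of the loop body corresponds to one MACD instruction: one coefficient read, one sample read, one accumulate, and one write that ages the sample by one position.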

Von Neumann Architecture
• Consist of three buses
 Data bus
 Address bus
 Control bus

Non-Harvard architecture with single
memory space

Phases of an instruction cycle
• Instruction Fetch
• Instruction decode
• Instruction execute

Class Poll
• A microprocessor and a DSP processor
differ in
a. Speed of operation
b. Real time signal processing
c. Multiple buses
d. All the above

Harvard Architecture

Basic Harvard Architecture

Instruction overlap made in Harvard
architecture

• The number of clock cycles is reduced by
using two separate buses for the program
and data memory.
• The contents of program and data memory can
be accessed in parallel.
• The instruction code can be fed from the
program memory to the control unit while
the operand is fed to the processing unit
from the data memory.
• The processing unit consists of
– Registers
– Processing elements: MAC units, multiplier,
ALU, shifter

• The number of memory accesses per clock
cycle can be increased by using more
buses.
 Motorola DSP5600X, DSP96002 – three
separate buses
 TMS320C54X – 4 address buses
• The cost of the IC increases with the number of pins
on the IC.
• Extending the number of buses unduly
increases the price.
• P-DSPs use multiple buses for connecting
on-chip memory to the control unit and
data path.

Modified Harvard Architecture

• One set of buses is used to access both program and
data memory.
• The other set is used to access data memory alone.
• Data can be transferred from one memory
to the other.
• Used by Texas Instruments and Analog Devices.

Special Purpose DSPs Examples
FFT processor
PDSP 16515A,TM-44,TM-66
Programmable FIR filter
UPDSP 16256, Model13092

Architecture of Advanced Digital
Signal Processors

VLIW Architecture

Numerical Fidelity
• Guard Bits
• Dynamic range
\[
\text{Dynamic range} = 20\log_{10}\!\left(\frac{\text{Largest value}}{\text{Smallest value}}\right)\ \text{dB} = 20\log_{10}\!\left(\frac{1}{1/2^{31}}\right)\ \text{dB} = 186.6\ \text{dB}
\]

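The dynamic range of a fixed-point fractional format can be checked numerically. The sketch below assumes a largest magnitude of 1 and a smallest non-zero magnitude of 2^-(fraction bits); the function name `dynamic_range_db` is chosen here for illustration.

```python
import math


def dynamic_range_db(fraction_bits):
    """Return 20*log10(largest/smallest) for a fractional fixed-point format.

    Assumes the largest magnitude is 1 and the smallest non-zero
    magnitude is 2**-fraction_bits.
    """
    largest = 1.0
    smallest = 2.0 ** -fraction_bits
    return 20 * math.log10(largest / smallest)


if __name__ == "__main__":
    print(round(dynamic_range_db(31), 1))  # 32-bit word: 31 fraction bits
    print(round(dynamic_range_db(15), 1))  # 16-bit word: 15 fraction bits
```

Each additional bit adds about 6.02 dB, which is why a 32-bit word gives roughly twice the dynamic range in dB of a 16-bit word.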

Fast Execution control
• Zero-overhead Hardware loop
• Very fast interrupt handling by employing
shadow registers.

Applications of TI DSPs
• C1X,C2X,C2XX,C5X,C54X : toys, Hard disk drives,
Modems, Cellular phones and active car suspensions.
• C3X : Filters, analysers, hi-fi systems, voice mail,
imaging, barcode readers, motor control, 3D graphics or
scientific processing.
• C4X: parallel-processing clusters in virtual reality,
image recognition, telecom routing, and parallel
processing systems.
• C6X: Wireless base stations, pooled modems, remote-
access servers, digital subscriber loop systems, cable
modems and multichannel telephone systems.
• C8X: video telephony, 3D computer graphics, virtual
reality and a number of multimedia applications.

IC Number
• TI DSP chips have IC numbers with the
prefix TMS320.
• Next letter C – CMOS technology (TMS320Cxx)
• Next letter E – CMOS technology and on-chip non-volatile
memory EPROM (TMS320E5x)
• No letter (TMS3205x) – NMOS technology and on-chip
non-volatile memory ROM
• Under C5X – C50, C51, and C5X are identical in
instruction set but differ in the capacity of on-chip
ROM and RAM.

Characteristics of some TMS320 family DSP
chips
                  'C15   'C25   'C30   'C50   'C541
Cycle Time (ns)   200    100    60     50     25
On chip RAM       4K     4K     4K     2K     5K
Total Memory      4K     128K   16M    128K   128K
Parallel ports    8      16     16M    64K    64K

• Bus Structure
 Program Bus (PB) – carries instruction code
and immediate operands from program
memory to CPU.
 Program address bus (PAB) – It provides
addresses to program memory space for both
reads and writes.
 Data read bus (DB) – It interconnects various
elements of the CPU to data memory space.
 Data read address bus(DAB) – It provides
address to access the data memory space.

Features of TMS320C5x Family
• Central Arithmetic Logic Unit (CALU)
– 16-bit CPU
– 20 to 50 ns single cycle instruction execution time
– Single cycle 16 x 16-bit MAC (Multiply/
Accumulate) unit
– 64k x 16-bit external Program memory address
space
– 64k x 16-bit external data memory address space

• 64k x 16-bit external IO address space
• 32k x 16-bit external global memory address
space
• 2k to 32k x 16-bit single-access on-chip
program ROM
• 1k to 9k x 16-bit single-access On-chip
Program/data RAM
• 1k x 16-bit dual-access On-chip program/ data
RAM

• Synchronous, TDM and buffered serial ports
• Programmable timer and PLL (Phase Locked
Loops)
• IEEE standard JTAG ports
• 5 V/3 V operation with low power dissipation
and power down modes
• DMA interface
• 100/128/132/144 pins in plastic QFP and
TQFP

• Central Processing Unit
 Central arithmetic logic unit (CALU)
 Parallel logic unit(PLU)
 Auxiliary register arithmetic unit(ARAU)
 Memory mapped registers
 Program controller

Central arithmetic logic unit (CALU)
• 16X16 Bit Parallel Multiplier
• 32 bit Accumulator(ACC)
• 32 bit Accumulator Buffer (ACCB)
• Product register (PREG)
• 0-16 bit barrel shifters(right and left)
• 32 bit ALU

• One of the operands for ALU operation comes from ACC.
• Result is stored in ACC
• A 32 bit ACCB is used for temporary storage of ACC.
• The hardware multiplier – 16x16 multiplication of
number represented in 2’s complement form.
• 32 bit PREG – result of multiplication
• 0-16 bit left and right barrel shifters in CALU - permit
the contents of memory to be left shifted by 0-16 bits
before they are fed to ALU or stored from ALU to
memory.

• Auxiliary register arithmetic unit(ARAU):
Eight auxiliary registers (AR0-AR7), each 16 bits
long,
a 3-bit ARP (auxiliary register pointer) and an unsigned
16-bit ALU
Used as address pointers and general-purpose registers
Associated registers and units: index register (INDX),
ARCR, BMAR, BRR (RPTC, BRCR, PASR, PAER), PLU
– Index register:
 Used by ARAU as step value to modify the address in AR's during
indirect addressing.
– Auxiliary Register compare register:
 Used for Address boundary comparison

– Block Move Address Register (BMAR)
 16 bit holds an address value to be used with block
moves and multiply/accumulate operations.
 provides 16 bit address for indirect addressed second
operand
– Block Repeat Register (BRR)
 16 bit wide
 Repeat counter register (RPTC)
 Block repeat counter register (BRCR)
 Block repeat program address start register (PASR)
 Block repeat program address end register (PAER)
– Parallel Logic Unit (PLU)
Performs Boolean operations or bit manipulations
The logic unit executes logic operations – set, clear, test
or toggle multiple bits in a control register or any
data memory location.

• Memory mapped registers
 96 registers
 Used for indirect data pointer, temporary
storage
• Instruction Registers
• Interrupt Registers
• Status Registers
• Program Controller
Program Counter
Hardware Stack
Program Memory Address Generation

Status and Control Registers
Circular Buffer Registers
Process Mode Status Register
Status Register (ST0 and ST1)

Status Registers
• ST0 bit assignment
 ARP: Auxiliary Register Pointer- Select AR in indirect
addressing
 OV: Overflow flag bit – Arithmetic operation overflow in ALU
 OVM: overflow Mode bit –Accumulator overflow saturation
mode
 INTM: Interrupt Mode bit – Globally masks or enables all
interrupts.
 DP: Data memory page pointer bit – Address of current
data memory page
Bits:   15-13  12   11   10   9     8-0
Field:  ARP    OV   OVM  1    INTM  DP
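
The ST0 bit assignment can be illustrated by unpacking a 16-bit status word in Python. This is a sketch only: it assumes ARP occupies bits 15-13 (three bits, matching the 3-bit ARP), and the function name `decode_st0` is hypothetical.

```python
def decode_st0(word):
    """Split a 16-bit ST0 value into its named fields (illustrative).

    Assumed layout: ARP = bits 15-13, OV = bit 12, OVM = bit 11,
    bit 10 reserved (reads as 1), INTM = bit 9, DP = bits 8-0.
    """
    return {
        "ARP": (word >> 13) & 0x7,   # selects one of AR0-AR7
        "OV": (word >> 12) & 0x1,    # overflow flag
        "OVM": (word >> 11) & 0x1,   # overflow (saturation) mode
        "INTM": (word >> 9) & 0x1,   # global interrupt mask
        "DP": word & 0x1FF,          # 9-bit data memory page pointer
    }


if __name__ == "__main__":
    st0 = (5 << 13) | (1 << 12) | (1 << 10) | (1 << 9) | 0x123
    print(decode_st0(st0))
```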

• ST1 bit assignment
 ARB: Auxiliary Register Buffer – holds the previous value of
ARP
 CNF: On-chip RAM configuration control bit
CNF = 0 – on-chip DARAM B0 is mapped to data memory
CNF = 1 – on-chip DARAM B0 is mapped to program memory
 TC: Test/control flag bit – stores the result of ALU or PLU test
bit operations
 SXM: Sign extension mode bit – enables/disables sign
extension of an arithmetic operation
Bits:   15-13  12   11   10   9   8-7  6   5   4   3-2  1-0
Field:  ARB    CNF  TC   SXM  C   11   HM  1   XF  11   PM

ST1 Bit assignment (continued)
• C: Carry bit – indicates an arithmetic carry or borrow
• HM: Hold mode bit – indicates whether the CPU stops or continues
execution
• XF: XF pin status bit – determines the level of the external flag
output pin
• PM: Product shift mode bits
00 – No shift
01 – Left shifted 1 bit; LSB zero filled
10 – Left shifted 4 bits; LSBs zero filled
11 – Right shifted 6 bits; 6 LSBs lost
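
The four PM settings can be modelled as a shift applied to the 32-bit product. The sketch below is an assumption-laden illustration: it treats the product as a signed 32-bit value, assumes the right shift is sign-extended, and uses the hypothetical name `apply_pm_shift`.

```python
def to_signed32(value):
    """Wrap an integer into the signed 32-bit range."""
    return ((value + 2**31) % 2**32) - 2**31


def apply_pm_shift(product, pm):
    """Apply the product shift selected by the 2-bit PM field (illustrative)."""
    if pm == 0b00:
        shifted = product          # no shift
    elif pm == 0b01:
        shifted = product << 1     # left 1 bit, LSB zero filled
    elif pm == 0b10:
        shifted = product << 4     # left 4 bits, LSBs zero filled
    elif pm == 0b11:
        shifted = product >> 6     # right 6 bits, 6 LSBs lost (sign kept)
    else:
        raise ValueError("PM must be in the range 0..3")
    return to_signed32(shifted)


if __name__ == "__main__":
    for pm in range(4):
        print(pm, apply_pm_shift(64, pm))
```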

• On-chip Memory
Program Memory
Data/Program Dual access RAM
Data/Program single Access RAM
On Chip Memory Protection
• On-chip Peripherals
Clock Generator
Hardware Timer
Software Programmable wait state generators
General Purpose I/O Pins

Parallel I/O Ports
Serial Port Interface
Buffered Serial Port
TDM Serial Port
Host Port Interface
User maskable interrupts

Addressing Modes
The method of specifying the data to be operated
on by an instruction is called its addressing mode.
Direct addressing
Memory mapped register addressing
Indirect addressing
Immediate addressing
Register addressing
Circular addressing mode

Direct Addressing Mode
• The address of the data is directly specified in the
instruction itself.
• The 16-bit address is placed on the data memory address bus (DAB).
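
A minimal sketch of how a 16-bit direct address might be formed, assuming the usual C5x arrangement in which the 9-bit data page pointer (DP) supplies the upper bits and the instruction supplies a 7-bit offset. The 7-bit offset width and the name `direct_address` are assumptions of this sketch, not statements from the slides.

```python
def direct_address(dp, offset7):
    """Combine a 9-bit page pointer and a 7-bit offset into a 16-bit address."""
    if not 0 <= dp <= 0x1FF:
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset7 <= 0x7F:
        raise ValueError("offset must fit in 7 bits")
    return (dp << 7) | offset7


if __name__ == "__main__":
    print(hex(direct_address(4, 3)))
```

Under this assumption each data page covers 128 words, and changing DP moves the window without changing the instruction.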

Memory Mapped Register
Addressing
• LAMM- Load accumulator with memory
mapped register
• LMMR – Load memory mapped register
• SAMM – Store accumulator in memory
mapped register
• SMMR – Store memory mapped register

Bit Reversed Addressing Mode
• Bit Reversed operation – FFT

Immediate Addressing
• The immediate addressing mode can be used
to load either a 16-bit constant or a constant of
length 13,9 or 7
• Accordingly it is referred to as long immediate
or short immediate addressing mode.
• This mode is indicated by the symbol#.

Indirect Addressing
Symbol   Value of AR pointed to by ARP after instruction execution
*        AR unaltered
*+       AR incremented by 1
*-       AR decremented by 1
*0+      AR incremented by the content of INDX
*0-      AR decremented by the content of INDX
*BR0+    AR incremented by the content of INDX with reverse carry propagation
*BR0-    AR decremented by the content of INDX with reverse carry propagation
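
Reverse carry propagation (the *BR0+ update) can be modelled in Python to show why it yields bit-reversed order for an FFT. This is a sketch under assumptions: AR starts at 0, INDX holds half the FFT length, and only the low log2(N) address bits are modelled. The function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(ar, indx, bits):
    """Add indx to ar with the carry propagating from MSB towards LSB."""
    mask = (1 << bits) - 1
    total = (bit_reverse(ar, bits) + bit_reverse(indx, bits)) & mask
    return bit_reverse(total, bits)


def br0_plus_sequence(n_points):
    """Addresses visited by repeated *BR0+ updates for an n_points FFT."""
    bits = n_points.bit_length() - 1   # n_points is assumed a power of two
    indx = n_points // 2
    ar, sequence = 0, []
    for _ in range(n_points):
        sequence.append(ar)
        ar = reverse_carry_add(ar, indx, bits)
    return sequence


if __name__ == "__main__":
    print(br0_plus_sequence(8))
```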

Dedicated - Register Addressing
• The advantage of this addressing mode is that
the address of the block of memory to be acted
upon can be changed during execution of the
program

Circular Addressing Mode
• CBSR1 – Circular buffer 1 start register
• CBSR2 – Circular buffer 2 start register
• CBER1 – Circular buffer 1 end register
• CBER2 – Circular buffer 2 end register
• CBCR – Circular buffer control register
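
The role of the start and end registers can be shown with a simplified pointer update. The sketch assumes the pointer steps by 1 and is reloaded with the start address after it has reached the end address; `circular_increment` is a hypothetical name, and real hardware behaviour may differ in detail.

```python
def circular_increment(ar, start, end):
    """Advance a buffer pointer by 1, wrapping from end back to start."""
    return start if ar == end else ar + 1


def walk(start, end, steps):
    """Return the addresses visited over `steps` accesses."""
    ar, visited = start, []
    for _ in range(steps):
        visited.append(ar)
        ar = circular_increment(ar, start, end)
    return visited


if __name__ == "__main__":
    print([hex(a) for a in walk(0x100, 0x103, 6)])
```

No compare-and-branch code is needed in the loop body, which is what makes circular addressing attractive for delay lines in filters.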

Pipelining
• Pipelining a processor means breaking down
each instruction into a series of discrete pipeline
stages which are completed in sequence.
• Phases of Pipelining
Fetch(F)
Decode(D)
Read(R)
Execute(E)
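
The benefit of overlapping the four phases can be estimated with a small model. The sketch assumes every stage takes one cycle and that there are no stalls or branches; the function names are chosen for illustration.

```python
def pipeline_schedule(n_instructions, stages=("F", "D", "R", "E")):
    """Return, per instruction, the cycle in which each stage runs."""
    return [
        {stage: i + s + 1 for s, stage in enumerate(stages)}
        for i in range(n_instructions)
    ]


def total_cycles(n_instructions, n_stages=4, pipelined=True):
    """Cycles needed to finish all instructions (ideal, stall-free case)."""
    if n_instructions == 0:
        return 0
    if pipelined:
        return n_stages + n_instructions - 1
    return n_stages * n_instructions


if __name__ == "__main__":
    print(total_cycles(5), total_cycles(5, pipelined=False))
    for row in pipeline_schedule(3):
        print(row)
```

Once the pipeline is full, one instruction completes every cycle, so long instruction sequences approach a fourfold speed-up in this idealised model.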

Pipelining

Advantages
• Improves system Performance
• Increases the speed of operation

MAC Operation
• Numerical operations in DSP are mainly multiplication
and addition.
• For real-time DSP to be fast, a MAC unit is
mandatory.
• A fixed- or floating-point hardware MAC is standard
in DSPs.
• In fixed point, it multiplies two 16-bit 2’s
complement fractional numbers and computes
a 32-bit product in a single cycle (25 ns).
• The DSP hardware MAC configuration is depicted in the figure.

MAC Configuration in DSPs

• The multiplier has a pair of input registers that
hold the inputs to the multiplier and a 32 bit
product register which holds the result of a
multiplication.
• The output of the P (product) register is connected
to a double precision accumulator.
• The principle is very much the same for hardware
floating point multiplier accumulators.
• Floating-point MACs allow fast computation of
DSP results with minimal errors.
• Floating point offers a wide dynamic range and
reduced arithmetic errors, but in many applications the
dynamic range provided by the fixed-point
representation is adequate.
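
A fixed-point MAC on 16-bit fractional operands can be sketched in Python. The Q15 scaling (value = integer / 2^15), the Q30 product format and the helper names below are assumptions made for this illustration; Python integers are unbounded, so the wider accumulator is modelled implicitly.

```python
def float_to_q15(x):
    """Convert a value in [-1, 1) to a signed 16-bit Q15 integer (saturating)."""
    return max(-32768, min(32767, int(round(x * 32768))))


def q15_mac(acc, a, b):
    """Multiply two Q15 operands and add the Q30 product to the accumulator."""
    product = a * b        # Q15 x Q15 gives Q30, which fits in 32 bits
    return acc + product


def q30_to_float(value):
    """Interpret a Q30 accumulator value as a real number."""
    return value / (1 << 30)


if __name__ == "__main__":
    acc = 0
    acc = q15_mac(acc, float_to_q15(0.5), float_to_q15(0.25))
    acc = q15_mac(acc, float_to_q15(0.5), float_to_q15(-0.5))
    print(q30_to_float(acc))
```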

First generation – Fixed point processor

Fixed Point digital signal processors
• The key features of four generations of fixed-point
DSP processors from 4 leading semiconductor
manufacturers.
• Basic architecture of the first generation fixed point
DSP processor TMS320C1x by Texas Instruments
• Dedicated arithmetic units – multiplier and an
accumulator
• The processor family – modified Harvard architecture
with two separate memory spaces for programs and
data.
• On-chip memory and special instruction for execution

• Has three separate address spaces for program
memory, data memory and I/O.
• Two 16-bit auxiliary registers (AR0-AR1)
• The contents of the auxiliary registers can be saved in
and loaded from data memory with SAR and
LAR
• Provides 144/256 words of 16-bit on-chip data
RAM
• 1.5K/4K words of program ROM/EPROM

Second generation – Fixed point processor

• Second generation fixed point DSPs – enhanced
features compared to the first generation.
• Much larger on chip memories and more special
instruction to support efficient execution of DSP
algorithms
• Computational performance – 4 to 6 times that of the
first generation.
• Special instructions for DSP
operations include a multiply and accumulate with
data move instruction and a repeat instruction, used to execute
an FIR filter with a saving in time.
• Second generation – provides more on chip memory

• TMS320C2X – Modified Harvard architecture for
speed and flexibility
• 32 bit ALU and accumulator perform a wide range of
arithmetic and logical instructions.
• Separate Program and Data memory spaces – each
with 16 bit address and on chip data buses.
• 16x16-bit hardware multiplier capable of computing a
signed or unsigned 32-bit product in a single machine
cycle.

• Six registers:
– A serial port receive register
– A serial port transmit register
– A timer register
– A period register
– An interrupt mask register
– A memory allocation register
• TMS320C2X allows flexible configurations
– A stand-alone processor
– A multiprocessor with devices in parallel
– A slave/host multiprocessor with global memory space
– A peripheral processor interfaced via processor-controlled
signals to another device

Second generation – Motorola DSP56002

Third Generation – Fixed Point processor

• Third Generation fixed point DSPs –
enhancement of second generation DSPs
• Performance Enhancement – Achieved by
increasing and/or making more effective use of
available on chip resources.
• More data paths, wider data paths, Larger on
chip memory and instruction cache and dual
MAC.
• Third generation DSPs - 2 or 3 times superior
to second generation
• Texas Instruments TMS320C3x,
TMS320C54X

• Third generation TMS320C3X – executes 60
million floating-point operations per second.
• On-chip parallelism in the processor – 11
operations in a single instruction
• High performance
– Performs parallel multiply and arithmetic unit
operations on integer or floating-point data in a single cycle
– General purpose register file
– Large on chip memory
– High degree of parallelism
– Direct memory access controller

Fourth Generation-Fixed point processors

• Fourth Generation fixed point DSP processors
– Multi channel Applications
• Digital Subscriber loop
• Remote Access server modem
• Wireless base station
• 3 G Mobile systems
• Medical Imaging
 Uses VLIW Architecture
 Wider Instruction word
 Wider data paths
 More registers
 Larger Instruction Cache
 Multiple Arithmetic unit

• Core processor has – two independent arithmetic
paths, each with four execution units
– Logic Unit (Li)
– Shifter/ Logic Unit (Si)
– Multiplier (Mi)
– Data Address Unit (Di)
• Core processor fetches eight 32-bit instructions at a time,
giving an instruction (fetch packet) width of 256 bits
• Executes 8 instructions in parallel in one cycle
• Large Program and Data memory
• Advantages of VLIW Architecture – High
computational performance.

Floating Point Processors
• First generation – TMS320 C3x
• Second generation – TMS320C4x
• Third generation – TMS320C67X

• TMS320C6x – VelociTI architecture – first DSP
to use an advanced VLIW architecture
• Excellent choice for multichannel and
multifunction applications
• VelociTI architecture – reduced code size,
flexibility of code and data types, and zero overhead in
branching.
• TMS320C62X, C64X – fixed-point processors
• TMS320C67X – floating-point processor with 32
general-purpose registers, each 32 bits wide.

Features of TMS320C6x processors
• Advanced VLIW CPU – 8 functional units (2 multiplier and
6 ALUs)
• Executes 8 instructions per cycle
• Instruction Packing reduces code size, program fetch and
power consumption
• Conditional execution of all instructions
• Efficient code execution on independent functional units
• Supports 8/16/32 bit data format
• 40 bit arithmetic, saturation and normalization operations
• Bit-field extract, set and clear instructions, and bit
counting operations.
• Supports Single precision (32 bit) , double precision (64 bit)
IEEE floating point operations
• 32 x 32 bit integer multiplication with 32 or 64 bit results.

Internal Architecture

• C6x contains a 32-bit CPU
• On-chip program memory
• On-chip data memory and on-chip peripherals
• Peripherals such as
– External memory interface (EMIF)
– Direct memory access (DMA) controller
– Timers
– Multichannel buffered serial ports (McBSP)
– Host port interface (HPI)
– Power-down logic

CPU unit of TMS320C6X

• CPU contains
– Program fetch unit
– Instruction dispatch unit
– Instruction decode unit
– Two data paths, each with 4 functional units
– Register file for each data path
– Control register
– Control Logic
– Test, Emulator and interrupt logic
• Each functional unit accepts one 32-bit instruction
at a time (the fetch packet size is 256 bits)
• The program fetch unit generates the address of eight
instructions and sends it to program memory for
each fetch packet; once fetched, the CPU receives
the packet.

• The instruction dispatch unit receives the fetch
packet and splits it into execute packets.
• Instructions in execute packets are assigned to
the appropriate units among the eight functional units in the data paths.
• In instruction decode, the source registers, the
destination register and the associated paths are
decoded for execution of the instruction in the
functional unit.
• Instructions are executed by the functional units.
• The register files A and B hold 32 registers of 32 bits
(16 registers for each data path)
• 8 functional units – 6 ALUs and 2 multipliers
(.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2)


Functional units of C6x
Name of the unit   Type of floating point operation
.L unit            Arithmetic operations
.S unit            Compare, square root and absolute value operations
.M unit            32x32 bit fixed point multiply operations and floating point multiply operations
.D unit            Load double word with 5 bit constant offset

• Data Path
– Register file data path
– Register file cross path
– Register file Memory access path
• Control Register File –
– 10 control registers,
– .S2 can read and write to control register
– Accessed by MVC (Move between Control and register
file)
– Addressing mode register
– Control Status register
– Program counter
– Interrupt flag , set, clear, enable register
– Interrupt return pointer
– Non-maskable interrupt return pointer

Visual Quiz
• The feature in which a PDSP is superior to an advanced
microprocessor is
A. Low cost
B. Low power
C. Computational speed
D. Real time I/O Capability
• The addressing mode convenient for FFT computation
is
A. Indirect addressing
B. Circular mode
C. Bit reversed addressing
D. Memory mapped addressing

• The Result of operations performed in CALU are
stored in
A. ACC
B. ACCB
C. TREG0
D. PREG
• The ........ Permits execution of logical operation
on data without affecting the contents of ALU
A. PLU
B. Auxiliary ALU
C. CALU
D. Memory mapped addressing

• The Register used for indirect addressing of
memory
A. ARs
B. Block move address register
C. TREG
D. Index register
• The indirect addressing mode symbol that leaves the
content of AR unaltered is
A. *
B. *0+
C. #
D. *+

• The C6X processor is based on ..........
architecture
A. Modified Harvard
B. VelociTI
C. Advanced Harvard
D. Davinci
• The functional unit used for 32/40 bit shift
operation
A. .L
B. .S
C. .M
D. .D

• The number of registers in the C62x and C67x CPU
register file is
A. 16
B. 32
C. 40
D. 64
• The floating point devices in C6x processors
are
A. C62x
B. C67x
C. C64x
D. C62x and C74x
