The document discusses the architecture of a digital signal processor (DSP). It describes key components like the central processing unit, memory architecture, instruction set, and on-chip peripherals. The CPU contains an ALU, accumulators, barrel shifter, multiplier, and other functional units. It uses a Harvard architecture with separate program and data memories and multiple buses. Pipelining allows overlapping of instruction execution. On-chip peripherals include timers, serial ports, and a DMA controller.
SIMD (single instruction, multiple data) is a technique used to achieve data-level parallelism.
The data is distributed across different parallel computing units. These computing units are called processing units (PUs).
Each PU is a functional unit that performs a task on a different piece of the distributed data.
A single execution thread controls the operation on all the pieces of data.
SIMD handles data manipulation.
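The idea above can be sketched in a few lines of Python. This is only an illustration: the names `simd_apply` and `add16` are hypothetical, and a real SIMD unit performs the lane operations in hardware at the same time rather than in a loop.

```python
def simd_apply(op, lanes_a, lanes_b):
    """Apply one operation to every pair of lanes, as a single SIMD instruction would."""
    if len(lanes_a) != len(lanes_b):
        raise ValueError("both operands must have the same number of lanes")
    # One "instruction" (op) is issued once; each lane plays the role of one PU.
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]


def add16(a, b):
    """16-bit wraparound addition, the work done by one processing unit (PU)."""
    return (a + b) & 0xFFFF


a = [1, 2, 3, 0xFFFF]
b = [10, 20, 30, 1]
result = simd_apply(add16, a, b)
```

Here the same `add16` operation is applied to four data elements under one control flow; the last lane wraps around to 0 because each lane is assumed to be 16 bits wide.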
Not all algorithms can take advantage of SIMD.
Many compilers do not generate SIMD instructions automatically.
Programming with SIMD instructions involves various low-level challenges because of mismatches in architecture, data layout and number representation.
Superscalar processing exploits instruction-level parallelism, so multiple instructions can be executed in one cycle.
The PowerPC and Pentium processors use superscalar architectures.
There are multiple memories with separate data buses.
Multiple instructions can be executed simultaneously using separate sets of execution units, each consisting of an ALU, a multiplier and shifters.
The execution units take their inputs from the register file and return their results to it.
Bus Structure :-
There are separate program and data buses.
The program address bus (PAB) supplies addresses to the program memory.
The data read address bus (DAB) supplies addresses to the program memory as well as the data memory.
The program bus (PB) carries instructions from the program memory.
These instructions are then passed to the CPU for execution.
The data read bus (DB) carries the data required for execution.
It gets the data from the I/O ports, the CPU or the data memory.
A high degree of parallelism is obtained because there are four separate buses.
Central Processing Unit :-
The central processing unit consists of a 32-bit ALU/accumulator, a scaling shifter, a parallel logic unit (PLU), a parallel multiplier and an auxiliary register arithmetic unit (ARAU).
32-bit ALU / Accumulator:-
The 32-bit ALU and accumulator perform arithmetic and logical functions.
Almost all of these functions are executed in a single cycle.
The ALU can also perform Boolean operations.
The ALU takes its operands from the accumulator, the shifter and the multiplier.
Scaling Shifter:-
It has a 16-bit input connected to the data bus.
It has a 32-bit output connected to the ALU.
It produces a left shift of 0 to 16 bits on the input data.
Other shifters perform numerical scaling, bit extraction, extended-precision arithmetic and overflow prevention.
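The behaviour of the scaling shifter can be modelled with a short Python sketch. It assumes that the 16-bit input is sign-extended before shifting (real devices usually make this selectable through a status bit), and the function name `scaling_shift` is hypothetical.

```python
def scaling_shift(word16, shift):
    """Sign-extend a 16-bit input to 32 bits and shift it left by 0 to 16 bits."""
    if not 0 <= shift <= 16:
        raise ValueError("shift must be between 0 and 16")
    value = word16 & 0xFFFF
    if value & 0x8000:
        # Assumption: the input is interpreted as a 2's complement number.
        value -= 0x10000
    # Return the 32-bit output as a raw bit pattern.
    return (value << shift) & 0xFFFFFFFF


q15_sample = 0x4000                             # 0.5 in Q15 format
aligned = scaling_shift(q15_sample, 16)         # placed in the upper 16 bits
negative = scaling_shift(0xFFFF, 4)             # -1 shifted left by 4 is -16
```

A shift of 16 places a 16-bit operand in the upper half of the 32-bit ALU input, which is a common way to align fractional data before accumulation.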
Parallel Logic Unit (PLU) :-
The parallel logic unit (PLU) is the second logic unit.
It executes logic operations on data without affecting the contents of the accumulator (ACC).
The PLU provides bit manipulation, which can be used to set, clear, test or toggle bits in data memory, control registers or status registers.
16 x 16-bit parallel multiplier :-
This 16 x 16-bit hardware multiplier can compute a signed or unsigned 32-bit product in a single machine cycle.
The two numbers being multiplied are treated as 2’s complement numbers, and the result is also a 32-bit 2’s complement number.
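As a minimal sketch of the signed case, the following Python code multiplies two 16-bit 2's complement words and returns the 32-bit 2's complement product. The helper names `to_signed` and `multiply16` are hypothetical, and the single-cycle timing of the hardware is of course not modelled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply16(a, b):
    """Signed 16 x 16-bit multiply giving a 32-bit 2's complement product."""
    product = to_signed(a, 16) * to_signed(b, 16)
    # The product of two 16-bit signed numbers always fits in 32 bits.
    return to_signed(product, 32)


small = multiply16(0xFFFF, 0x0002)      # (-1) * 2
largest = multiply16(0x8000, 0x8000)    # (-32768) * (-32768)
```

The largest possible magnitude, (-32768) x (-32768) = 2^30, still fits in a 32-bit signed result, which is why a 32-bit product register is sufficient for 16-bit operands.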
Auxiliary registers and auxiliary register arithmetic unit (ARAU) :-
There is a register file of eight auxiliary registers.
These registers are used for temporary data storage.
The auxiliary register file (AR0-AR7) is connected to the auxiliary register arithmetic unit (ARAU).
The contents of the auxiliary registers can be stored in data memory or used as inputs to the central arithmetic logic unit (CALU).
The ARAU helps to speed up the operation of the CALU.
Program Controller:-
The program controller decodes instructions, manages the CPU pipeline, stores the status of CPU operations and decodes conditional operations.
The program controller consists of the program sequencer, address generator, program counter, instruction register, status and control registers and hardware stack.
I/O Ports:-
There are a total of 64 I/O ports.
Of these, 16 ports are memory mapped in data memory space.
It uses an advanced modified Harvard architecture.
It is a 16-bit fixed-point DSP processor family.
Advantages of ‘C54X Devices’
Enhanced Harvard architecture, which includes one program bus, three data buses and four address buses.
The CPU has a high degree of parallelism and application-specific hardware logic.
It has a highly specialized instruction set for faster algorithms.
Modular architecture design for fast development of spinoff devices.
It has increased performance and low power consumption.
Features of ‘C54X’
A. CPU
One program bus, three data buses and four address buses.
40-bit ALU, including a 40-bit barrel shifter and two independent 40-bit accumulators.
17 x 17-bit parallel multiplier coupled to a dedicated 40-bit adder for nonpipelined single-cycle multiply/accumulate (MAC) operation.
Compare, select and store unit (CSSU) for the add/compare selection of the Viterbi operator.
Exponent encoder to compute the exponent of a 40-bit accumulator value in a single cycle.
Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units.
Multiple CPU / core architecture on some devices.
B. Memory
K Words x 16 bit addressable memory space.
Extended program memory in some devices.
C. Instruction Set
Single instruction repeat and block repeat operations.
Block memory move operations.
32-bit long operand instructions.
Instructions with two or three simultaneous operand reads.
Parallel load and parallel store instructions.
Conditional store instructions.
Fast return from interrupt.
D. On-chip peripherals
Software programmable wait state generator.
Programmable bank switching logic.
On-chip PLL generator with internal oscillator.
External bus-off control to disable the external data bus,
address bus and control signals.
Programmable timer
Bus hold feature for data bus.
Bus Architecture
Eight major 16-bit buses (four program/data buses and four address buses).
The program bus (PB) carries instruction code and immediate operands from program memory.
Three data buses (CB, DB and EB) interconnect the CPU, data address generation logic, program address generation logic, on-chip peripherals and data memory.
Four address buses (PAB, CAB, DAB and EAB) carry the addresses needed for instruction execution.
Internal Memory Organization
There are three individually selectable spaces :- program, data and I/O.
There are 26 CPU registers plus peripheral registers that are mapped in data memory space.
The C54X devices can contain RAM as well as ROM.
On-chip ROM is part of the program memory space and, in some cases, part of the data memory space.
There can be DARAM, SARAM and two-way shared RAM on the chip.
On-chip memory can be protected from being manipulated externally.
CPU
The CPU contains:
40-bit ALU
It performs 2’s complement arithmetic together with the two 40-bit accumulators. The ALU can also perform Boolean operations.
Accumulators
There are two accumulators, A and B. They store the output from the ALU or the multiplier/adder block.
Barrel shifter
Its 40-bit input can come from the accumulators or data memory.
Its 40-bit output is connected to the ALU or data memory.
It can produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits.
Multiplier / Adder unit
It performs a 17 x 17-bit 2’s complement multiplication with a 40-bit addition in a single instruction cycle.
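The multiply/accumulate step can be sketched as follows. This is a simplified model, not the device's exact datapath: operands are assumed to be signed 16-bit values, the accumulator is assumed to wrap at 40 bits, and the names `wrap`, `mac` and `fir_output` are hypothetical.

```python
ACC_BITS = 40  # accumulator width, including guard bits above bit 31


def wrap(value, bits):
    """Wrap an integer into the signed 2's complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, x, h):
    """One multiply/accumulate step: acc + x * h, kept to 40 bits."""
    return wrap(acc + wrap(x, 16) * wrap(h, 16), ACC_BITS)


def fir_output(samples, coeffs):
    """Sum of products for one FIR output sample, one MAC per tap."""
    acc = 0
    for x, h in zip(samples, coeffs):
        acc = mac(acc, x, h)
    return acc


# Four taps at full scale: the sum exceeds 32 bits but fits in 40 bits.
total = fir_output([0x7FFF] * 4, [0x7FFF] * 4)
```

In this example the running sum grows past the signed 32-bit limit of 2147483647, yet the 40-bit accumulator still holds it exactly, which illustrates why the adder and accumulators are wider than the 32-bit product.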
Compare, Select and Store unit (CSSU)
It performs comparisons between the accumulator’s high and low words.
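A compare-select step of this kind, as used in Viterbi decoding, might look like the sketch below. The decision-bit convention (0 when the high word wins, 1 otherwise) and the 16-bit transition register `trn` are assumptions made for illustration, and the function names are hypothetical.

```python
def to_signed16(word):
    """Interpret the low 16 bits of word as a 2's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def compare_select(acc, trn):
    """Compare the accumulator's high and low words and keep the larger one.

    Returns (selected_word, new_trn). The decision bit shifted into trn
    records which path metric survived (assumed convention: 0 = high word).
    """
    high = to_signed16(acc >> 16)
    low = to_signed16(acc)
    if high > low:
        selected, decision = high, 0
    else:
        selected, decision = low, 1
    new_trn = ((trn << 1) | decision) & 0xFFFF
    return selected, new_trn


trn = 0
survivors = []
for acc in [(5 << 16) | 3, (2 << 16) | 7]:
    word, trn = compare_select(acc, trn)
    survivors.append(word)
```

Each accumulator value holds two candidate path metrics; the larger one is kept and the decision history accumulates in `trn` for the later traceback.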
Data Addressing
It has seven basic addressing modes:
Immediate addressing
Absolute addressing
Accumulator addressing
Direct addressing
Indirect addressing
Memory-mapped register addressing
Stack addressing
Program Memory Addressing
The program memory is addressed with the program counter (PC).
The PC is used to fetch individual instructions.
The PC is loaded by the program address generator (PAGEN).
PAGEN increments the PC.
Pipeline operation
The ‘C54X DSP’ pipeline has six levels :- prefetch, fetch, decode, access, read and execute.
One to six instructions can be active in a single cycle.
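The overlap of instructions in the six pipeline levels can be illustrated with a small Python model. It assumes an ideal pipeline with no stalls or conflicts, where a new instruction enters the prefetch level every cycle; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["prefetch", "fetch", "decode", "access", "read", "execute"]


def pipeline_schedule(num_instructions):
    """Return, for each cycle, a dict mapping stage name to instruction index."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i enters stage s in cycle i + s (ideal, no stalls).
            instr = cycle - stage_index
            if 0 <= instr < num_instructions:
                active[stage] = instr
        schedule.append(active)
    return schedule


schedule = pipeline_schedule(8)
active_counts = [len(cycle) for cycle in schedule]
```

For eight instructions the number of active instructions per cycle ramps up from one to six while the pipeline fills, stays at six while it is full, and ramps down again as it drains.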
On-chip Peripherals:-
General purpose I/O pins:-
These pins can be read or written under software control.
These pins are BIO and XF.
Software programmable wait-state generator :-
It extends external bus cycles by up to seven machine cycles to interface with slower off-chip memory and I/O devices.
Programmable Bank-Switching Logic:-
It can automatically insert one cycle when an access crosses memory bank boundaries inside program memory or data memory space.
Hardware Timer:-
It provides a 16-bit timing circuit with a 4-bit prescaler.
The timer can be stopped, restarted, reset or disabled by specific status bits.
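The effect of the prescaler on the timer rate can be estimated with the sketch below. It assumes that both the 4-bit prescaler and the 16-bit counter count down to zero and then reload, so that the input clock is divided by (prescale + 1) x (period + 1); the function and parameter names are hypothetical.

```python
def timer_interrupt_rate(clock_hz, period, prescale):
    """Estimate the timer interrupt rate in Hz.

    Assumption: both counters count down to zero and reload, so the clock
    is divided by (prescale + 1) * (period + 1).
    """
    if not 0 <= period <= 0xFFFF:
        raise ValueError("period must fit in 16 bits")
    if not 0 <= prescale <= 0xF:
        raise ValueError("prescale must fit in 4 bits")
    return clock_hz / ((prescale + 1) * (period + 1))


# Example: a 40 MHz clock divided down to a 1 kHz tick.
tick_rate = timer_interrupt_rate(40_000_000, period=39_999, prescale=0)

# Slowest rate: both fields at their maximum values.
slowest = timer_interrupt_rate(40_000_000, period=0xFFFF, prescale=0xF)
```

Under these assumptions the prescaler extends the longest achievable timer period by a factor of 16 compared with the 16-bit counter alone.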
Clock Generator:-
The clock can be generated by one of two options:
(a) Internal oscillator (or)
(b) PLL circuit
DMA Controller:-
It transfers data between points in the memory map without intervention by the CPU.
The data can be moved to and from program/data memory, on-chip peripherals or external memory devices.
Host Port Interface (HPI)
It is a parallel port.
It provides an interface to a host processor.
Information is exchanged between the ‘C54X’ and the host processor through on-chip memory.
Serial ports:-
There are four types of serial ports
(i) Synchronous
(ii) Buffered
| S No | Parameter | DSP Processors | General Purpose Processors |
|------|-----------|----------------|----------------------------|
| 1 | Instruction cycle | Instructions are executed in a single clock cycle, i.e. a true instruction cycle | Multiple clock cycles are required to execute one instruction |
| 2 | Instruction execution | Parallel execution is possible | Execution of instructions is always sequential |
| 3 | Operand fetch from memory | Multiple operands are fetched simultaneously | Operands are fetched sequentially |
| 4 | Memories | Separate program and data memories | No separate memories |
| 5 | On-chip / off-chip memories | Program and data memories are present on-chip and extendable off-chip | Normally on-chip cache memory is present. Main memory is off-chip |
| 6 | Program flow control | Program sequencer and instruction register take care of program flow | Program counter maintains the flow of execution |
| 7 | Queuing / pipelining | Queuing is implicit through the instruction register and instruction cache | Queuing is performed explicitly by queue registers for pipelining of instructions |
| 8 | Address generation | Addresses are generated jointly by the DAGs and the program sequencer | Program counter is incremented sequentially to generate addresses |
| 9 | Address/data bus multiplexing | Address and data buses are not multiplexed. They are separate on-chip as well as off-chip | Address/data buses can be separate on the chip but are usually multiplexed off-chip |
| 10 | Computational units | Three separate computational units: ALU, MAC and shifter | ALU is the main computational unit |
| 11 | On-chip address and data buses | Separate address and data buses for program memory and data memory, plus a result bus, i.e. PMA, DMA, PMD, DMD and R-bus | Address and data buses are the two buses on the chip |
| 12 | Addressing modes | Direct and indirect addressing are supported | Direct, indirect, register, register indirect, immediate etc. addressing modes are supported |
| 13 | Suitable for | Array processing operations | General purpose processing |
Architectural Features
  Size of on-chip memory
  DMA capability and multiprocessor support
  Special instructions to support DSP operations
  I/O capabilities
Execution Speed
  Clock speed of the processor
  Million instructions per second (MIPS)
  Speed of benchmark algorithms such as FFT, FIR and IIR filters
Type of Arithmetic
  Fixed point
  Floating point
Wordlength
  Noise and errors
  Precision directly related to wordlength
  Signal quality
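The link between wordlength and precision can be made concrete with a short Python sketch. It assumes a signed fixed-point fractional format covering the range [-1, 1) with rounding and saturation, which is one common convention rather than the only one; the names `step_size` and `quantize` are hypothetical.

```python
def step_size(bits):
    """Smallest representable increment for a signed fractional word."""
    return 2.0 ** -(bits - 1)


def quantize(x, bits):
    """Round x to the nearest value representable in `bits` bits.

    Assumption: signed fractional format covering [-1, 1) with saturation.
    """
    scale = 1 << (bits - 1)
    code = round(x * scale)
    code = max(-scale, min(scale - 1, code))  # saturate at the range limits
    return code / scale


error_8 = abs(0.3 - quantize(0.3, 8))
error_16 = abs(0.3 - quantize(0.3, 16))
```

For values inside the range the rounding error is at most half a step, so each extra bit of wordlength halves the worst-case quantization error. In this example the 16-bit error is far smaller than the 8-bit error for the same input.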