3. INTRODUCTION
This unit provides an architectural overview of the TMS320C54XX,
covering:
• CPU
• On Chip Memory
• On Chip Peripherals
• Addressing Modes
• Interrupts
• Program Control
• Internal Memory Bus Organization
• Buses
• Pipelining
9/14/2017 Dr. Sudhir N Shelke
4. INTRODUCTION
• The C54XX DSP uses a modified Harvard architecture that
maximizes processing power with eight buses.
• Separate Program & Data buses allow simultaneous access to
program & data providing high degree of parallelism.
• Data can be transferred between program & data memory.
5. Efficient data/program flow
#1: CPU designed for efficient DSP processing
MAC unit, 2 Accumulators, Additional Adder,
Barrel Shifter
#2: Multiple buses for efficient data
and program flow
Four bus pairs (address plus data) and large on-chip memory that
result in sustained performance near peak
#3: Highly tuned instruction set for
powerful DSP computing
Sophisticated instructions that execute in fewer
cycles, with less code and low power demands
8. Buses in C54XX
The C54XX architecture is built around eight major 16-bit buses.
The program bus (PB) carries the instruction code & immediate operands
from program memory.
Three data buses (CB, DB, EB) interconnect various elements such
as the CPU, data address generation logic, on-chip peripherals & data
memory.
The CB & DB carry the data operands that are read from memory.
The EB carries the data to be written to memory.
Four address buses (PAB, CAB, DAB, and EAB) carry the addresses
needed for instruction execution.
12. Buses
• The C54x DSP can generate up to two data-memory addresses
per cycle using the two auxiliary register arithmetic units
(ARAU0 and ARAU1).
• The PB can carry data operands stored in program space to
the multiplier and adder for multiply/accumulate operations
or to a destination in data space for data move instructions.
• The C54x DSP also has an on-chip bidirectional bus for
accessing on-chip peripherals. This bus is connected to DB
and EB through the bus exchanger in the CPU interface.
15. Internal Memory Organization
The C54XX DSP memory is organized into three individually
selectable spaces: program, data, and I/O space.
The C54x devices can contain random access memory (RAM)
and read-only memory (ROM).
Among the devices, the following types of RAM are
represented: dual-access RAM (DARAM), single-access RAM
(SARAM), and two-way shared RAM.
The DARAM or SARAM can be shared within subsystems of
a multiple-CPU core device.
We can configure the DARAM and SARAM as data memory
or program/data memory.
The C54x DSP also has 26 CPU registers plus peripheral
registers that are mapped in data-memory space.
17. On Chip ROM
The on-chip ROM is part of the program memory space and,
in some cases, part of the data memory space.
The amount of on-chip ROM available on each device varies.
On most devices, the ROM contains a boot loader that is
useful for booting to faster on-chip or external RAM.
On devices with large amounts of ROM, a portion of the
ROM may be mapped into both data and program space.
18. On Chip DARAM
The amount of on-chip DARAM available on each device
varies.
The DARAM is composed of several blocks. Because each
DARAM block can be accessed twice per machine cycle, the
CPU and peripherals, such as a buffered serial port
(BSP) and host-port interface (HPI), can read from and write
to a DARAM memory address in the same cycle.
The DARAM is always mapped in data space and is primarily
intended to store data values. It can also be mapped into
program space and used to store program code.
19. On Chip SARAM
The amount of on-chip SARAM available on each device
varies.
The SARAM is composed of several blocks. Each block is
accessible once per machine cycle for either a read or a write.
The SARAM is always mapped in data space and is primarily
intended to store data values.
It can also be mapped into program space and used to store
program code.
20. MEMORY MAPPED REGISTERS (MMR)
• The data memory space contains memory-mapped registers for
the CPU and the on-chip peripherals.
• These registers are located on data page 0, simplifying
access to them.
• The memory-mapped access provides a convenient way to
save and restore the registers for context switches and to
transfer information between the accumulators and the other
registers.
21. Central Processing Unit (CPU)
40 Bit ALU
Two 40 bit Accumulators
Barrel Shifter
17 X 17 bit Multiplier
40 Bit Adder
16 Bit Temp Register
CSSU
22. STATUS REGISTERS
ST0: Contains the status of flags (OVA, OVB, C, TC)
produced by arithmetic operations & bit
manipulations.
ST1: Contains the status of various conditions &
modes. Bits of the ST0 & ST1 registers can be set or cleared
with the SSBX & RSBX instructions.
23. ST0
• DP: Data memory page pointer, concatenated with the 7 LSBs of an
instruction word to form a 16-bit direct memory address, if CPL = 0.
• OVB: Overflow flag for accumulator B.
• OVA: Overflow flag for accumulator A.
• C: Carry. Set to 1 when an addition generates a carry and cleared
to 0 when a subtraction generates a borrow. Otherwise, it is 0
after an addition and 1 after a subtraction.
• TC: Test/Control flag, Stores the result of ALU test bit
operations.
• ARP: Auxiliary Register Pointer, Selects AR0 –AR7 for
indirect single-operand addressing.
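The DP-relative address formation described above can be sketched in a few lines. This is only an illustrative model, assuming CPL = 0 and a 9-bit DP (implied by a 16-bit address with a 7-bit offset); the function name `direct_address` is hypothetical and not part of any TI tool.

```python
def direct_address(dp: int, offset: int) -> int:
    """Model of DP-relative direct addressing (CPL = 0).

    dp     -- 9-bit data page pointer from ST0 (assumed width)
    offset -- 7 least significant bits of the instruction word
    """
    if not 0 <= dp < 512:
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset < 128:
        raise ValueError("offset must fit in 7 bits")
    # DP supplies the upper 9 address bits, the instruction the lower 7.
    return (dp << 7) | offset


print(hex(direct_address(0, 0x20)))  # an address on data page 0
print(hex(direct_address(3, 0x7F)))  # last word of data page 3
```

Each data page therefore spans 128 words, which is why registers placed on data page 0 are easy to reach with direct addressing.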
24. ST1
• 15. BRAF: Block-repeat active flag.
BRAF = 0 when BRC decrements below zero; BRAF = 1 when RPTB executes.
• 14. CPL: Compiler mode.
CPL = 0, DP is used for direct addressing; CPL = 1, SP is used.
• 13. XF: External flag, a general-purpose output pin for multiprocessor configurations.
Set: SSBX; Reset: RSBX
• 12.HM: Hold Mode, determines whether the CPU stops or continues
execution when acknowledging an active HOLD signal.
25. • 11. INTM: Interrupt mode.
0, all unmasked interrupts are enabled
1, all maskable interrupts are disabled
• 10. 0: Reserved; this bit always reads as 0.
• 09. OVM: Overflow mode, enables (1) / disables (0) saturation of the
accumulator on overflow.
• 08. SXM: Sign extension mode, enables (1) / disables (0) sign
extension in arithmetic operations.
26. • 07. C16: Dual 16-bit / double-precision arithmetic mode.
C16 = 0, ALU operates in double-precision mode
C16 = 1, ALU operates in dual 16-bit arithmetic mode
• 06. FRCT: Fractional mode (multiplication).
If 1, the multiplier output is left-shifted by 1 bit to compensate for the extra
sign bit.
• 05. CMPT: Compatibility mode for ARP.
(ARP not updated (0), ARP updated (1))
• 04–00. ASM: Accumulator shift mode.
A 5-bit field that specifies a shift value in the range -16 to +15, coded as a 2’s
complement value.
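To make the ST1 layout concrete, the following sketch unpacks a 16-bit ST1 value into the fields listed on these slides. It is a host-side illustration only, not device code; the names `ST1_FLAGS` and `decode_st1` are hypothetical, and ASM is assumed to occupy the five least significant bits, as its -16 to +15 range implies.

```python
# Bit positions of the single-bit ST1 fields, as listed on the slides.
ST1_FLAGS = {
    "BRAF": 15, "CPL": 14, "XF": 13, "HM": 12, "INTM": 11,
    "OVM": 9, "SXM": 8, "C16": 7, "FRCT": 6, "CMPT": 5,
}


def decode_st1(st1: int) -> dict:
    """Split a 16-bit ST1 value into named fields."""
    fields = {name: (st1 >> bit) & 1 for name, bit in ST1_FLAGS.items()}
    asm = st1 & 0x1F                 # assumed: ASM is the low 5 bits
    # Interpret ASM as a 5-bit two's complement number (-16..+15).
    fields["ASM"] = asm - 32 if asm & 0x10 else asm
    return fields


print(decode_st1(0x2900))
```

For example, an ASM field of 11111b decodes to -1, that is, a right shift by one bit.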
27. ALU
• The 40-bit ALU implements a wide range of arithmetic and
logical functions, most of which execute in a single clock
cycle.
• After an operation is performed in the ALU, the result is
usually transferred to a destination accumulator (accumulator
A or B).
• The ALU can also function as two 16-bit ALUs and perform
two 16-bit operations simultaneously.
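The difference between one 32-bit addition and two independent 16-bit additions can be illustrated with a small model. This is a sketch of the idea only, assuming unsigned 32-bit inputs and ignoring flags and saturation; `dual_add16` is a hypothetical helper name.

```python
def dual_add16(x: int, y: int) -> int:
    """Add the high halves and the low halves of two 32-bit words
    independently, with no carry passing from the low to the high half."""
    high = (((x >> 16) & 0xFFFF) + ((y >> 16) & 0xFFFF)) & 0xFFFF
    low = ((x & 0xFFFF) + (y & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


x, y = 0x0001FFFF, 0x00010001
print(hex((x + y) & 0xFFFFFFFF))  # single 32-bit addition
print(hex(dual_add16(x, y)))      # two separate 16-bit additions
```

In the dual case the carry out of the low half is discarded instead of rippling into the high half, so the two results differ whenever the low halves overflow.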
28. ACCUMULATORS
The C54XX devices have two 40-bit accumulators, A & B.
Accumulators A and B store the output from the ALU or the
multiplier/adder block. They can also provide a second input to the ALU.
Accumulator A can be an input to the multiplier/adder.
Each accumulator is divided into three parts:
Guard bits (bits 39–32)
High-order word (bits 31–16)
Low-order word (bits 15–0)
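The three-part layout can be shown by slicing a 40-bit value on the host. This is an illustrative sketch; `split_accumulator` is a hypothetical name, and the value is simply treated as a 40-bit pattern.

```python
def split_accumulator(acc: int) -> tuple:
    """Return (guard bits, high-order word, low-order word) of a 40-bit value."""
    acc &= (1 << 40) - 1           # keep only bits 39-0
    guard = (acc >> 32) & 0xFF     # bits 39-32
    high = (acc >> 16) & 0xFFFF    # bits 31-16
    low = acc & 0xFFFF             # bits 15-0
    return guard, high, low


guard, high, low = split_accumulator(0x01_1234_5678)
print(hex(guard), hex(high), hex(low))
```

The guard bits give headroom for intermediate sums that temporarily exceed 32 bits.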
29. BARREL SHIFTER
• The C54x DSP barrel shifter has a 40-bit input connected to the accumulators
or to data memory (using CB or DB), and a 40-bit output connected to the ALU
or to data memory (using EB).
• The barrel shifter can produce a left shift of 0 to 31 bits and a right shift of 0
to 16 bits on the input data.
• The shift requirements are defined in the shift count field of the instruction,
the shift count field (ASM = Accu shift mode) of status register ST1, or in
temporary register T (when it is designated as a shift count register).
• The shift count determines how many bits to shift. Positive shift values
correspond to left shifts, whereas negative values correspond to right shifts.
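The sign convention for the shift count can be modelled as follows. This is a simplified sketch, assuming a sign-extended (arithmetic) right shift and a result truncated to 40 bits; overflow and saturation behaviour are not modelled, and `barrel_shift` is a hypothetical name.

```python
MASK40 = (1 << 40) - 1


def to_signed40(value: int) -> int:
    """Interpret the low 40 bits of value as a two's complement number."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def barrel_shift(value: int, count: int) -> int:
    """Shift a 40-bit value left (count > 0) or right (count < 0)."""
    if not -16 <= count <= 31:
        raise ValueError("shift count must be in the range -16..31")
    signed = to_signed40(value)
    shifted = signed << count if count >= 0 else signed >> -count
    return shifted & MASK40        # result as a 40-bit pattern


print(hex(barrel_shift(0x1, 4)))     # left shift by 4
print(hex(barrel_shift(0x100, -4)))  # right shift by 4
```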
30. MAC UNIT
The C54xx CPU has a 17-bit × 17-bit hardware multiplier coupled to a
40-bit dedicated adder.
This multiplier/adder unit provides multiply and accumulate (MAC)
capability in one pipeline phase cycle.
It supports signed and unsigned multiplication.
First input to the multiplier:-
Temporary register (T)
Data memory operand from DB
Accumulator A (bits 32–16)
Second input to the multiplier:-
Data memory operand from CB
Data memory operand from DB
Program memory operand from PB
Accumulator A (bits 32–16)
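A multiply-accumulate step, including the optional one-bit left shift used in fractional mode (FRCT = 1), might be modelled like this. It is a host-side sketch under simplifying assumptions: operands are signed 16-bit integers, the accumulator wraps at 40 bits, and saturation and rounding are ignored. The names `wrap40`, `mac` and `fir_sample` are hypothetical.

```python
MASK40 = (1 << 40) - 1


def wrap40(value: int) -> int:
    """Wrap an integer to a signed 40-bit accumulator value."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def mac(acc: int, x: int, y: int, frct: bool = False) -> int:
    """Return acc + x * y, shifting the product left by 1 if frct is set."""
    product = x * y
    if frct:
        product <<= 1              # removes the extra sign bit of Q15 x Q15
    return wrap40(acc + product)


def fir_sample(coeffs, samples, frct=False):
    """Sum of products, one MAC step per tap."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc = mac(acc, c, s, frct)
    return acc


half = 0x4000                                    # 0.5 in Q15 format
print(hex(mac(0, half, half, frct=True) >> 16))  # high word of 0.5 * 0.5
```

With FRCT set, the high word of the product of two Q15 numbers is again a Q15 number, which is why the shift is useful for fractional arithmetic.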
31. REGISTERS
Temporary Register (T):-
• It may hold one of the multiplicands for multiplication
instructions.
• It may hold a dynamic shift count for instructions with a shift operation,
such as the ADD & SUB instructions.
• It may hold branch metrics for Viterbi decoding.
• In addition, the EXP instruction stores the computed exponent
value in T & the NORM instruction uses the
T value to normalize the number.
32. REGISTERS
Transition Register (TRN):-
• The 16-bit transition register holds the transition decisions
for the path to new metrics when performing the Viterbi algorithm.
• The CMPS instruction updates the contents of
TRN on the basis of a comparison between the accumulator high
word & the accumulator low word.
33. REGISTERS
Auxiliary Registers:-
• The eight 16-bit ARs (AR0 – AR7) can be accessed by the CPU &
modified by the ARAUs.
• The primary function of the ARs is to generate 16-bit addresses
for data space.
• However these registers can also act as general purpose
registers.
34. REGISTERS
Stack Pointer:-
• The 16-bit stack pointer (SP) register contains the
address of the top of the stack.
• The SP always points to the last element pushed onto the stack.
• The stack is manipulated by interrupts, traps, calls,
returns & the PSHD, PSHM, POPD, POPM instructions.
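Because the SP points to the last element pushed, a push must move the pointer before storing and a pop must read before moving it back. The sketch below models that rule, assuming the stack grows toward lower addresses; the `Stack` class and the starting address 0100h are hypothetical choices for illustration.

```python
class Stack:
    """Software stack where SP points at the last word pushed."""

    def __init__(self, sp: int = 0x0100):
        self.sp = sp
        self.memory = {}           # sparse model of data memory

    def push(self, value: int) -> None:
        # Assumed: pre-decrement, then store, so SP ends on the new word.
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        # Read the word SP points at, then post-increment.
        value = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return value


stack = Stack()
stack.push(0x1234)   # e.g. a return address saved by a call
stack.push(0xABCD)
print(hex(stack.sp), hex(stack.pop()), hex(stack.pop()), hex(stack.sp))
```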
35. COMPARE SELECT STORE UNIT (CSSU)
The compare, select, and store unit (CSSU) is an
application-specific hardware unit dedicated to
add/compare/select (ACS) operations of the Viterbi operator.
36. CSSU
The CSSU allows the C54x device to support various Viterbi
butterfly algorithms used in equalizers and channel
decoders.
The add function of the Viterbi operator is performed by the
ALU. This function consists of a double addition function
(Met1 ± D1 and Met2 ± D2).
Double addition is completed in one machine cycle if the
ALU is configured for dual 16-bit mode by setting the C16
bit in ST1.
With the ALU configured in dual 16-bit mode, all the long-
word (32-bit) instructions become dual 16-bit arithmetic
instructions.
38. Working of CSSU
1. The CSSU implements the compare and select operation via
the CMPS instruction, a comparator, and the 16-bit transition
register (TRN).
2. This operation compares two 16-bit parts of the specified
accumulator and shifts the decision into bit 0 of TRN.
3. This decision is also stored in the TC bit of ST0.
4. Based on the decision, the corresponding 16-bit part of the
accumulator is stored in data memory.
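The four steps above can be sketched as a small function. This is a behavioural model based on one reading of the CMPS description, assuming a signed comparison in which the high part wins only when it is strictly greater, with decision bit 0 for the high part and 1 for the low part. The names `cmps` and `s16` are hypothetical.

```python
def s16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def cmps(acc: int, trn: int) -> tuple:
    """Return (word to store, new TRN, TC) for a compare-select step."""
    high = (acc >> 16) & 0xFFFF
    low = acc & 0xFFFF
    if s16(high) > s16(low):
        stored, decision = high, 0   # high part selected
    else:
        stored, decision = low, 1    # low part selected
    trn = ((trn << 1) | decision) & 0xFFFF   # decision enters at bit 0
    return stored, trn, decision


print(cmps(0x00050003, 0x0000))
print(cmps(0x00030005, 0x0001))
```

After 16 such steps TRN holds a full history of decisions, which a Viterbi traceback can later read.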
39. PROGRAM CONTROL
The program control unit of the TMS320C54XX processors
contains:-
Program Counter
Hardware Stack
Repeat Counters
Status Registers
40. Program Counter
The PC addresses the program memory, either on chip or off chip,
& is loaded in one of several ways:-
Code operation | Address loaded into PC
Reset | PC is loaded with FF80h.
Sequential execution | PC is loaded with PC + 1.
Branch | PC is loaded with the 16-bit immediate value directly following the branch instruction.
Branch from accumulator | PC is loaded with the lower 16-bit word of accumulator A or B.
Block repeat loop | PC is loaded with the repeat start address (RSA) when PC + 1 equals the repeat end address (REA) + 1, provided that BRAF = 1.
Subroutine call | PC + 2 is pushed onto the stack & PC is loaded with the 16-bit immediate value following the CALL instruction. The return instruction pops the top of the stack into PC to return.
Interrupts | PC is pushed onto the stack & PC is loaded with the address of the appropriate interrupt vector. The return instruction pops the top of the stack into PC to return.
41. PROGRAM CONTROL
• The program address generation logic (PAGEN) provides the above
options.
• Hardware Stack: The stack is used to save & restore the PC value
during subroutine calls & interrupts.
• Repeat Counter: A single instruction can be repeated N+1 times by
loading the value N into the repeat counter register; likewise, a block of
instructions can be repeated N+1 times by loading N into the block
repeat counter register.
• Status Register :The TMS320C54XX contains
• ST0
• ST1
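The "N+1 times" rule for a repeated block can be checked with a short trace model. This is a sketch only, assuming every instruction in the block is one word long and that the loop ends when the counter is already zero at the end of a pass; `block_repeat_trace` and its parameter names are hypothetical.

```python
def block_repeat_trace(start: int, end: int, count: int) -> list:
    """Return the sequence of PC values for a block [start, end]
    executed with the block repeat counter loaded with count."""
    trace = []
    pc = start
    while True:
        trace.append(pc)
        if pc == end:
            if count == 0:
                break              # counter exhausted: leave the loop
            count -= 1
            pc = start             # reload PC with the block start address
        else:
            pc += 1                # sequential execution
    return trace


trace = block_repeat_trace(0x100, 0x102, count=1)
print([hex(pc) for pc in trace])   # the 3-word block runs twice
```

Loading the counter with 0 therefore still executes the block once.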
42. INTERRUPTS
• Often, while the CPU is in the midst of executing a
program, a peripheral device may require service from the CPU.
• In such a situation, the main program may be interrupted by a signal
generated by the peripheral device.
• This results in the processor suspending the main program in order
to execute another program, called an interrupt service routine (ISR), to
service the peripheral.
• On completion of the ISR, the processor returns to the main
program & continues from where it left off.
• An interrupt may be generated by an internal or external device.
43. INTERRUPTS
• An interrupt may also be generated by software.
• Not all interrupts are serviced when they occur; only
those interrupts that are called non-maskable are always serviced
when they occur.
• Other interrupts, called maskable interrupts, are
serviced only if they are enabled.
• A priority scheme determines which interrupt gets
serviced first if more than one interrupt occurs simultaneously.
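The selection rule described above (non-maskable always, maskable only when enabled, highest priority first) can be sketched as follows. This is a simplified model with several assumptions: interrupts are identified by a priority number where a lower number means higher priority, a global mask bit of 0 allows maskable interrupts, and `next_interrupt` with its parameters is a hypothetical name, not a device register interface.

```python
def next_interrupt(pending, enabled, global_mask, nonmaskable=frozenset()):
    """Pick the interrupt to service next, or None if none qualifies.

    pending      -- set of priority numbers currently requesting service
    enabled      -- set of maskable interrupts individually enabled
    global_mask  -- 0 allows maskable interrupts, 1 blocks them (assumed)
    nonmaskable  -- set of priority numbers that can never be blocked
    """
    candidates = [
        n for n in pending
        if n in nonmaskable or (global_mask == 0 and n in enabled)
    ]
    return min(candidates) if candidates else None


print(next_interrupt({3, 5}, {3, 5}, global_mask=0))   # both enabled
print(next_interrupt({3, 5}, {5}, global_mask=0))      # only 5 enabled
print(next_interrupt({3, 5}, {3, 5}, global_mask=1))   # all blocked
```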
44. PIPELINE OPERATION of TMS320C54XX
• The C54xx DSP has a six-level deep instruction pipeline.
• The six stages of the pipeline are independent of each other,
which allows overlapping execution of instructions.
• During any given cycle, from one to six different instructions
can be active, each at a different stage of completion.
45. PIPELINE OPERATION of TMS320C54XX
• The six levels and functions of the pipeline structure are:
• Program prefetch: The program address bus (PAB) is loaded
with the address of the next instruction
to be fetched.
• Program fetch: An instruction word is fetched from the
program bus (PB) and loaded into the
instruction register (IR). This completes
an instruction fetch sequence that
consists of this and the previous cycle.
46. • Decode: The contents of the instruction
register (IR) are decoded to determine
the type of memory access operation
and the control sequence at the data-
address generation unit (DAGEN) and
the CPU.
• Access: DAGEN outputs the read operand’s
address on the data address bus, DAB.
If a second operand is required, the
other data address bus, CAB, is also
loaded with an appropriate address.
Auxiliary registers in indirect
addressing mode and the stack
pointer (SP) are also updated.
47. • Read: The read data operand(s), if any, are
read from the data buses, DB and CB.
This completes the two-stage operand
read sequence. At the same time, the
two-stage operand write sequence
begins. The data address of the write
operand, if any, is loaded into the data
write address bus (EAB).
• Execute: The operand write sequence is
completed by writing the data using the
data write bus (EB). The instruction is
executed in this phase.
49. Show the pipeline operation of the following sequence of
instructions if the initial value of AR3 is 80 & the values stored in
memory locations 80, 81, 82 are 1, 2, 3.
LD *AR3+,A
ADD #1000h,A
STL A,*AR3+
-----------
-----------
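50. Cycle | Prefetch | Fetch | Decode | Access | Read   | Execute & Write | AR3 | A
1         | LD       |       |        |        |        |                 | 80  | X
2         | ADD      | LD    |        |        |        |                 | 80  | X
3         | STL      | ADD   | LD     |        |        |                 | 80  | X
4         |          | STL   | ADD    | LD     |        |                 | 81  | X
5         |          |       | STL    | ADD    | LD     |                 | 82  | 1
6         |          |       |        | STL    | ------ | LD              | 82  | 0001h
7         |          |       |        |        | STL    | ADD             | 82  | 1001h
8         |          |       |        |        |        | STL             | 82  | 1001h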
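The stage occupancy in a table like the one above can be generated mechanically. The sketch below is an idealised model, assuming every instruction advances one stage per cycle with no stalls or multi-word instructions; it does not track the AR3 or accumulator values. The names `STAGES` and `pipeline_table` are hypothetical.

```python
STAGES = ["Prefetch", "Fetch", "Decode", "Access", "Read", "Execute"]


def pipeline_table(instructions):
    """Return one row per cycle; each row lists the instruction
    occupying each stage (empty string if the stage is idle)."""
    depth = len(STAGES)
    cycles = len(instructions) + depth - 1
    table = []
    for cycle in range(cycles):
        row = []
        for stage in range(depth):
            index = cycle - stage   # instruction that entered `stage` cycles ago
            in_range = 0 <= index < len(instructions)
            row.append(instructions[index] if in_range else "")
        table.append(row)
    return table


for number, row in enumerate(pipeline_table(["LD", "ADD", "STL"]), start=1):
    print(number, row)
```

Three instructions in a six-stage pipeline need 3 + 6 - 1 = 8 cycles to drain completely, which matches the eight rows of the worked example.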
51. ON CHIP PERIPHERALS
General-purpose I/O pins: XF and BIO
Timer
Host port interface (HPI)
Synchronous serial port
Buffered serial port (BSP)
Multichannel buffered serial port (McBSP)
Time-division multiplexed (TDM) serial port
Software-programmable wait-state generator
Programmable bank-switching module
52. GENERAL-PURPOSE I/O
• The C54xx DSP offers general-purpose I/O through two dedicated pins that
are software controlled. The two dedicated pins are the branch control input
pin (BIO) and the external flag output pin (XF).
• BIO can be used to monitor the status of peripheral devices.
• XF can be used to signal external devices. The XF pin is controlled using
software.
• It is driven high by setting the XF bit (in ST1) and is driven low by clearing
the XF bit. The set status register bit (SSBX) and reset status register bit
(RSBX) instructions can be used to set and clear XF, respectively.
53. SOFTWARE PROGRAMMABLE WAIT STATE
GENERATOR
• The software-programmable wait-state generator extends external bus cycles by up
to seven machine cycles to interface with slower off-chip memory &
devices.
• The software wait-state generator is incorporated without any external
hardware.
• For off-chip memory accesses, from zero to seven wait states can be specified
within the software wait-state register.
54. HOST PORT INTERFACE
• The host port interface (HPI) is an 8-bit parallel port that provides an
interface to a host processor.
• Information is exchanged between the C54xx & the host processor through
C54xx on-chip memory that is accessible to both the C54xx &
the host processor.
55. HARDWARE TIMER
• The on-chip timer is a software-programmable timer that consists of three
registers and can be used to periodically generate interrupts.
• The timer resolution is the CPU clock rate of the processor.
• The high dynamic range of the timer is achieved with a 16-bit counter with
a 4-bit prescaler.
• Timer Registers:-
The on-chip timer consists of three memory-mapped registers (TIM, PRD,
and TCR).
56. • Timer register (TIM): The 16-bit memory-mapped timer register (TIM)
is loaded with the period register (PRD) value and decremented.
• Timer period register (PRD): The 16-bit memory-mapped timer period
register (PRD) is used to reload the timer register (TIM).
• Timer control register (TCR): The 16-bit memory-mapped timer control
register (TCR) contains the control and status bits of the timer.
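As a rough guide to choosing register values, the interrupt rate of a down-counter with a prescaler can be estimated as below. This is a sketch under stated assumptions: the 4-bit prescaler reload value is called `tddr` here (a field assumed to live in TCR, not named on these slides), both counters count down through zero, and the timer is clocked at the CPU clock rate. `timer_interrupt_rate` is a hypothetical helper.

```python
def timer_interrupt_rate(cpu_clock_hz: float, prd: int, tddr: int) -> float:
    """Estimate timer interrupts per second.

    prd  -- 16-bit period register value reloaded into TIM
    tddr -- 4-bit prescaler reload value (assumed name)
    """
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("PRD must fit in 16 bits")
    if not 0 <= tddr <= 0xF:
        raise ValueError("prescaler value must fit in 4 bits")
    # Each counter takes (reload value + 1) ticks to count down through zero.
    return cpu_clock_hz / ((tddr + 1) * (prd + 1))


# Example: a 100 MHz clock divided by 10 and then by 10000.
print(timer_interrupt_rate(100e6, prd=9999, tddr=9))
```

The 16-bit counter combined with the 4-bit prescaler gives a total divide range of up to 65536 × 16 CPU clock cycles.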