TMS320C6X architecture - processor, peripherals, 3 level memory, various internal buses
32 bit program address bus
256 bit program data bus
2, 32 bit data address bus
2, 64bit load data bus
2,64 bit store data bus
• The TMS320C6711 is a ﬂoating-point processor
based on the
• VLIW architecture .
• Internal memory includes a two-level cache
architecture with 4kB of level 1 program cache
(L1P), 4kB of level 1 data cache (L1D), and 64kB of
RAM or level 2 cache for data/program allocation
• It has a direct interface to both synchronous
memories and asynchronous memories
• On-chip peripherals include two multichannel buffered serial ports
(McBSPs),two timers, a 16-bit host port interface (HPI), and a 32-bit
external memory interface (EMIF).
• It requires 3.3V for I/O and 1.8V for the core (internal).
• Internal buses
– 32-bit program address bus
– 256-bit program data bus (eight 32-bit instructions),
– two 32-bit data address buses,
– two 64-bit data buses
– two 64-bit store data buses.
• With a 32-bit address bus, the total memory space is 2^32
• = 4GB, including four external memory spaces: CE0, CE1, CE2, and
3-Access level of Memory Map
1. L1 Memory
-Program Cache & Data Cache
-Size : PC(4Kbyte), DC(4Kbyte)
2. L2 Memory
- Size : 64Kbyte
- Program & Data
3. L3 Memory
• Independent memory banks on the C6x allow for two
memory accesses within one instruction cycle.
• Two independent memory banks can be accessed using
two independent buses.
• Two loads or two stores instructions can be performed
• No conﬂict results if the data accessed are in different
• Separate buses for program, data, and direct memory
access (DMA) allow the C6x to perform concurrent
program fetches, data read and write, and DMA
• C6x has a byte-addressable memory space.
• Internal memory is organized as separate
program and data memory spaces, with two 32-
bit internal ports (two 64-bit ports with the C64x)
to access internal memory.
• With a clock of 150MHz onboard the DSK, one
can ideally achieve two multiplies and
accumulates per cycle, for a total of 300 million
multiplies and accumulates (MACs) per second.
• With six of the eight functional units capable
of handling ﬂoating-point operations, it is
possible to perform 900 million ﬂoating-point
operations per second (MFLOPS).
• 1200 million instructions per second (MIPS)
• The CPU consists of eight independent functional
units divided into two data paths
• Each path has a unit for
– multiply operations (.M),
– logical and arithmetic operations (.L),
– branch, bit manipulation, and arithmetic operations
– loading/storing and arithmetic operations (.D).
• The .S and .L units are for arithmetic, logical, and
• All data transfers make use of the .D units.