Advanced RISC Machine
           ARM7TDMI
           Prof. Anish Goel
µP Systems Overview




µP Systems Overview


 Complexity growth
 Like no other industry




µP Systems Overview

    Embedded Systems and Applications
     Embedded microprocessors account for about 94% of all
     microprocessor sales.
     Embedded microprocessors span a much larger
     performance range than PCs.
    Terminology
    GP Systems vs. Embedded Systems
     What are the key design parameters?




µP Systems Overview

    Basic microprocessor system structure
      Central processing unit (CPU)
      Memory
      Input/Output (I/O)
      System bus
    A microcontroller or SoC will include some or all
    components on the same chip as the CPU.




Why the ARM?

    Many possible devices to study (or use!)…
     Intel, Motorola, Microchip, Atmel, TI, Zilog, Philips, Rabbit,
     Siemens, Hitachi, AMD, etc.
    Considerations
     Installed base and software compatibility
     Development tool availability
     Complexity and architectural issues
     Computational capabilities
    Why not use the Pentium 4 instead?


System Design
                       User needs
                  1 Requirements Analysis
                         2 Specification
                      3 System Architecture
       4 HW Design                             4 SW Design
    5 HW Implementation                   5 SW Implementation
       6 HW Testing                            6 SW Testing
                      7 System Integration
                       8 System Validation
                       9 O & M, Evolution
Microprocessor System Design
Options
    Discrete microprocessor/microcontroller
    System-on-Chip (SoC)
     ASIC
    Programmable logic
     Soft cores
     Hard cores
    Specialized microprocessors
     Digital signal processors
     Network processors



Simplified Pentium 4 Architecture




Caches: CPU-Memory Performance Gap




     [Figure labels: StrongARM SA-110, 350 nm (94%);
      Itanium® 2 processors, 180 nm (86%) and 130 nm (93%).
      From Dileep Bhandarkar, Intel]
Topics

     Microprocessor Organization
     Organization of Microprocessor Systems
     Endian-ness
     ARM History and Characteristics
     ARM7TDMI Implementation
     NXP LPC 2148 Overview




Microprocessor Components

     Register file
       Program counter
       General purpose registers
       Hidden registers
     ALU
     Buses
     Memory interface
       Signal conventions
     Control and timing unit


A Simple µP Architecture

     [Block diagram]
       External buses: 16-bit address (ADDR), 8-bit data (DATA)
       Control signals: /RD, /WR, /RESET
       Blocks on the internal data bus:
         TR0 (temporary register), PC (program counter), AR (address register)
         A (accumulator), temp reg, F (flags), ALU (arithmetic and logic unit)
         R0-R3 (general registers 0-3)
         IR (instruction register), instruction decoder, timing and control
       Clock generator

            A less simple architecture
Instruction Set Architecture (ISA)
     Complex Instruction Set (CISC)
       Single instructions for complex tasks (string search, block
       move, read-modify-write, etc.)
       Usually have variable length instructions
       Registers have specialized functions
     Reduced Instruction Set (RISC) (load/store)
       Instructions for simple operations only
       Usually fixed length instructions
       Large register sets (Register File based)




Register Architectures
     Accumulator
      One instruction operand comes from a dedicated register
      (the accumulator) closely coupled to the ALU.
     Register-Memory
      Instruction operands can be obtained from both registers and
      memory
      Commonly used in CISC machines
     Load-Store
      All operands must be in general-purpose registers
      Only a very limited number of instructions (loads/stores) can
      “touch” memory
      Commonly used in RISC machines

Microprocessor System Organization

     Memory Architectures
      Von Neumann architecture
      Harvard architecture
     Input/Output (I/O)
      Memory-mapped I/O
      Isolated I/O
     Programmer’s Model
      aka Register View
     Memory Maps



Endian-ness
     Byte Ordering for Little Endian vs. Big Endian

      32-bit word        Byte 3 (MSB)   Byte 2   Byte 1   Byte 0 (LSB)

      Memory address     +0       +1       +2       +3
      Big endian         Byte 3   Byte 2   Byte 1   Byte 0   MSB in the lowest (first) memory address
      Little endian      Byte 0   Byte 1   Byte 2   Byte 3   LSB in the lowest (first) memory address
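
The byte ordering above can be checked with a short sketch. It uses Python's built-in int.to_bytes; the helper name byte_layout and the sample word 0x0A0B0C0D are illustrative choices, not part of the original slides.

```python
def byte_layout(word: int) -> dict:
    """Return the bytes of a 32-bit word as stored at addresses +0..+3."""
    return {
        "big": list(word.to_bytes(4, byteorder="big")),
        "little": list(word.to_bytes(4, byteorder="little")),
    }


# Sample word: byte 3 (MSB) is 0x0A, byte 0 (LSB) is 0x0D.
layout = byte_layout(0x0A0B0C0D)
for order, stored in layout.items():
    # Index 0 of each list is the byte at the lowest memory address (+0).
    print(order, [f"{b:02X}" for b in stored])
```

In the big-endian list the MSB comes first; in the little-endian list the LSB comes first.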


ARM Ltd

Founded in November 1990
  Spun out of Acorn Computers

Designs the ARM range of RISC processor cores
Licenses ARM core designs to semiconductor partners
who fabricate and sell to their customers.
   ARM does not fabricate silicon itself

Also develops technologies to assist with the design-in of
the ARM architecture
   Software tools, boards, debug hardware, application
   software, bus architectures, peripherals, etc.




ARM Partnership Model




ARM Powered Products




ARM Characteristics
     Designed to be a simple, efficient RISC core
      Small die area
      Low power
      Low interrupt latency
     These characteristics enabled ARM to become
     dominant in the cell phone market.
      Most cell phones contain a heterogeneous multiprocessor
      SoC with an ARM and a DSP.
     Advanced ARM designs (ARM9, ARM10, ARM11) have
     become much more sophisticated (e.g., Intel XScale
     in PDAs), but have had less success in penetrating
     other markets where power consumption issues
     are not as severe.

Case Study: An ARM-Powered Product
     Nintendo DS Lite
 Features:
     Touch screen: same specs as the top screen, but with a
     transparent analog touch screen
     Wireless communication: IEEE 802.11
     Embedded microphone for voice recognition
     Embedded real-time clock: date, time and alarm
     CPUs: one ARM9 and one ARM7
     Power-saving sleep mode

 http://www.arm.com/markets/home_solutions/app.html

ARM7TDMI Implementation

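     The ARM7TDMI implements the ARMv4T ISA.
      All ARM-state instructions can be executed conditionally
     The ARM7TDMI is a basic load-store RISC
      Sixteen visible registers (R0-R15, with R15 as the program
      counter), some of them banked by processor mode
      Three-stage pipeline (fetch, decode, execute)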
      No caches
      Support for ARM (32-bit) and Thumb (16-bit) instruction sets
      Multiply-accumulate (MAC) unit
      On-chip hardware debug support
        Test Access Port controller


ARM7TDMI Processor Block Diagram




ARM7TDMI Processor Core




CPU Performance: Pipelining
 Several instructions are executed simultaneously at
 different stages of completion.

 Various conditions can cause pipeline bubbles that reduce
 utilization:
     branches;
     memory system delays;
     etc.

 Both the ARM7 and the SHARC have 3-stage pipelines:
     fetch instruction from memory;
     decode opcode and operands;
     execute.

ARM Pipeline Execution



time (cycle)        1         2         3         4         5

add r0,r1,#5        fetch     decode    execute
sub r2,r3,r6                  fetch     decode    execute
cmp r2,#3                               fetch     decode    execute
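
The overlap shown above can be reproduced with a minimal sketch. It assumes an ideal 3-stage pipeline in which every stage takes exactly one cycle and nothing stalls; the function name ideal_schedule is illustrative.

```python
STAGES = ("fetch", "decode", "execute")


def ideal_schedule(instructions):
    """Map each instruction to {cycle: stage}, issuing one instruction per cycle."""
    schedule = {}
    for index, text in enumerate(instructions):
        # Instruction i enters fetch in cycle i + 1 and advances one stage per cycle.
        schedule[text] = {index + 1 + s: STAGES[s] for s in range(len(STAGES))}
    return schedule


program = ["add r0,r1,#5", "sub r2,r3,r6", "cmp r2,#3"]
sched = ideal_schedule(program)
last_cycle = max(max(cycles) for cycles in sched.values())

# Print one row per instruction and one column per clock cycle.
for text, cycles in sched.items():
    row = "".join(f"{cycles.get(c, ''):<9}" for c in range(1, last_cycle + 1))
    print(f"{text:<14}{row}")
```

From cycle 3 onward all three stages are busy, each with a different instruction.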


Performance Measures
 Latency: time it takes for an instruction to get through
 the pipeline.

 Throughput: number of instructions executed per time
 period.

 Pipelining increases throughput without reducing latency.
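
A rough worked example of the last point, assuming an ideal pipeline with one clock cycle per stage, no stalls, and the same clock period with and without pipelining. The function names are illustrative.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Cycles to finish n instructions on an ideal, stall-free pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes
    # one cycle after its predecessor.
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Cycles when each instruction must finish before the next one starts."""
    return n_stages * n_instructions


for n in (1, 10, 1000):
    piped = pipeline_cycles(n)
    # Latency stays at 3 cycles per instruction in both cases; only the
    # number of instructions completed per cycle changes.
    print(n, piped, unpipelined_cycles(n), round(n / piped, 3))
```

For long instruction streams the pipelined throughput approaches one instruction per cycle, while each individual instruction still takes three cycles from fetch to completion.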




Pipeline Stalls
 If a step cannot be completed in the same amount of time
 as the others, the pipeline stalls.

 Bubbles introduced by a stall increase latency and reduce
 throughput.




ARM Multi-cycle LDMIA Instruction


ldmia r0,{r2,r3}    fetch     decode    ex ld r2  ex ld r3
sub r2,r3,r6                  fetch     decode              ex sub
cmp r2,#3                               fetch               decode    ex cmp

                                                                      time
   http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/xscinstruct_hh/LDMIA_(Thumb).htm
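
The stall in this diagram can be modelled with a small in-order scheduler. This is a sketch under stated assumptions: each stage holds one instruction, an instruction waits in its stage until the next stage is free, and the ldmia occupies execute for two cycles because the diagram shows it that way. Timing on a real ARM7TDMI may differ. The function name schedule is illustrative.

```python
def schedule(program):
    """program: list of (text, execute_cycles) pairs.

    Returns one dict per instruction with the cycle in which it enters
    fetch, decode and execute, plus the cycle in which it completes.
    """
    rows = []
    prev = None
    for text, exec_cycles in program:
        if prev is None:
            fetch, decode, execute = 1, 2, 3
        else:
            # Fetch frees up when the previous instruction moves to decode.
            fetch = prev["decode"]
            # Decode frees up when the previous instruction moves to execute.
            decode = max(fetch + 1, prev["execute"])
            # Execute frees up only after the previous instruction completes.
            execute = max(decode + 1, prev["done"] + 1)
        prev = {"text": text, "fetch": fetch, "decode": decode,
                "execute": execute, "done": execute + exec_cycles - 1}
        rows.append(prev)
    return rows


rows = schedule([("ldmia r0,{r2,r3}", 2), ("sub r2,r3,r6", 1), ("cmp r2,#3", 1)])
for r in rows:
    print(f'{r["text"]:<18}F={r["fetch"]} D={r["decode"]} '
          f'E={r["execute"]} done={r["done"]}')
```

With these assumptions sub waits one extra cycle in decode and cmp waits one extra cycle in fetch, matching the gaps in the diagram.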
Control Stalls
 Branches often introduce stalls (branch penalty).
     Stall time may depend on whether branch is taken.


 The processor may have to squash instructions that have
 already entered the pipeline.

 It does not know what to fetch until the branch condition is
 evaluated.




ARM Pipelined Branch


bne foo             fetch     decode    ex bne    ex bne    ex bne
sub r2,r3,r6                  fetch     decode
foo add r0,r1,r2                                  fetch     decode    ex add

2 cycle penalty                                                       time
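
The 2-cycle penalty can be folded into a rough cycle count. A minimal sketch, assuming a 3-stage pipeline where every instruction otherwise spends one cycle in execute and every taken branch costs exactly the 2 cycles shown in the diagram. The function names and the 20% branch fraction are illustrative.

```python
BRANCH_PENALTY = 2  # extra cycles per taken branch, as in the diagram


def total_cycles(n_instructions, taken_branches, n_stages=3,
                 penalty=BRANCH_PENALTY):
    """Pipeline fill + one cycle per executed instruction + branch penalties."""
    if n_instructions == 0:
        return 0
    return (n_stages - 1) + n_instructions + penalty * taken_branches


def effective_cpi(taken_branch_fraction, penalty=BRANCH_PENALTY):
    """Average cycles per instruction once the pipeline is full."""
    return 1 + penalty * taken_branch_fraction


# The diagram executes two instructions (bne, add) with one taken branch;
# the squashed sub is not counted as an executed instruction.
print(total_cycles(2, 1))

# Hypothetical workload in which 20% of instructions are taken branches.
print(effective_cpi(0.2))
```

The first call reproduces the diagram: the add completes in cycle 6 instead of cycle 4.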



