1. RISC DESIGN PHILOSOPHY
• RISC is design philosophy
• Aimed at delivering
Simple but powerful instructions
Instructions execute in single cycle at high clock
cycle
Reduced complexity of instructions performed by
H/w.
• Easier to provide greater flexibility and
intelligence in s/w rather than h/w
2. RISC DESIGN PHILOSOPHY
• CISC relies more on the h/w for instruction
functionality, Hence CISC instructions are
more complicated
3. RISC DESIGN PHILOSOPHY
RISC philosophy is implemented with 4 major
design rules.
1. INSTRUCTIONS
• Reduced number of instructions
• Each instruction executes in a single cycle.
• Fixed length instructions facilitate pipelining
4. RISC DESIGN PHILOSOPHY
2. PIPELINES
F1 D1 E1
F2 D2
D3
F3
E2
E3
I1
I2
I3
TIME
Execution of instructions I1, I2 AND I3 Without PIPELINE
6. RISC DESIGN PHILOSOPHY
3. REGISTERS
• RISC machines have large general purpose
registers.
• Registers act as the fast local memory store
for all data processing operations.
• CISC processors have less general purpose
registers. 8086 has only 4, 16-bit registers:
AX, BX, CX & DX.
• ARM has 12 active data registers.
7. RISC DESIGN PHILOSOPHY
4. LOAD-STORE ARCHITECTURE
• Processor operates on data held in registers.
• Separate Load-Store instructions transfer
data between the register bank and external
memory.
• Memory access are costly so separating
memory access from data processing
provides advantage.
• In CISC design, data processing operations
can act on memory directly.
8. Physical Features Driving ARM Design
Portable Embedded Systems
• Require battery power : smaller in size, extended
battery power
• High Code Density : ES have limited memory
• Price Sensitive : Use slow and low cost memory
• Smaller size : More space for specialized peripherals.
9. ARM Core is not a pure RISC architecture
Some modifications to make ARM suitable
for Embedded systems
10. ARM-BASED EMBEDDED SYSTEM
• Processor
Core : Execution Engine- process instructions
Surrounding components that interface with a bus
• Controllers
Coordinate important functional blocks of the system
2 Commonly found- Interrupt & Memory Controllers
• Peripherals
Provide input output capability external to the chip
• Bus
Used to Communicate between different parts of the device
12. ARM BUS TECHNOLOGY
• Microprocessors use off chip bus to connect to the
devices such as video cards, hard disk controllers.
• Most commonly used one is PCI (Peripheral Component
Interface)
PCI is designed to
connect mechanically
and electrically to
devices external to
the chip
13. ARM BUS TECHNOLOGY
• 2 different types of devices attached to the bus.
Bus Master: Logical device capable of initiating a data transfer
with another device across the bus
Bus Slave: Logical device capable of only responding to a
transfer request from a bus master device
• Bus has 2 architecture levels
Physical Level: electrical characteristics and bus width
Protocol: Logical rules govern the communication between the
processor and peripheral
14. AMBA BUS PROTOCOL
• Advanced Microcontroller Bus Architecture : widely
adopted on-chip bus architecture used for ARM
processor.
• ASB: ARM System Bus
Very first introduced AMBA buses, bidirectional bus
Multiple bus masters
High Performance
Pipelined Operation
15. AMBA BUS PROTOCOL
• APB: ARM Peripheral Bus
Low power
Simple interface
Suitable for many peripherals
• AHB: ARM High Performance Bus
Provides higher data throughput than ASB
Pipelined Operation
Multiple bus transfer
Burst transfer: 2 to 5 times faster than series of single data transfer
Bus width up to 128 bits
17. MEMORY
• Memory Constraints : Price, Performance & Power
consumption
• Characteristics: Hierarchy, width and type
• Hierarchy: Fastest memory Cache is located physically
nearer to the ARM processor and slowest secondary
memory is set further away. Closer memory to the
processor core, more it costs and smaller in capacity.
18. • Cache:
– Placed in between main memory and core.
– Speed up data transfer between the processor
and main memory.
– Does not support Real-Time Systems.
• Main Memory:
– Large memory space.
– Generally stored in separate chips
– Load and store instructions access the main
memory
20. MEMORY
• Width:
• Number of bits the memory returns on each access : 8,
16, 32 or 64 bits.
• 32 ARM instructions and 16-bit wide memory chip:
processor will make two memory fetches per instruction
21. MEMORY
• Types:
• ROM: Least flexible – Set at production time – Can not
reprogrammed
• FLASH ROM: Can read and write. Slow to write-not used for
holding dynamic data
• DRAM: Commonly used-Lowest cost
• SRAM: Faster than DRAM-but requires more silicon area-Cost
high
• SDRAM:Can run at Higher clock speeds- Synchronized itself with
the processor bus, because it is clocked.
22. PERIPHERALS
• Embedded systems interact with out side world through
peripherals.
• Each Peripheral performs single function and may reside
on-chip.
• All ARM peripherals are memory mapped- the
programming interface is set of memory-addressed
registers. The address of these registers is an offset from
specific peripheral base address.
23. CONTROLLERS
• Memory Controllers:
• These connect different types of memory to the processor bus.
• These controller allows certain memory to be active on power-up
which allows the initialization code to be executed.
• Some other devices must be set up by software.
Two important types of controllers are
1. MEMORY CONTROLLER
2. INTERRUPT CONTROLLER
24. PERIPHERALS
• Interrupt Controller:
• When a peripheral or device requires attention, it raises an
interrupt to the processor.
• An interrupt controller provides a programmable governing policy
to deal with external interrupt by setting the appropriate bits in the
interrupt control registers.
• Two types of interrupt controller available for the ARM-
Standard interrupt controller and the Vector interrupt
Controller
25. CONTROLLERS
• Standard Interrupt Controller:
• It can be programmed to ignore or mask an individual
device or set of devices.
• The interrupt handler determines which device requires
servicing by reading a device bitmap register in the
interrupt controller.
26. CONTROLLERS
• Vector Interrupt Controller: VIC
• It prioritizes interrupts and simplifies the determination of
which device caused the interrupt.
• VIC allows the interrupt signal to the core if the priority of a
new interrupt is higher than the currently executing interrupt
handler.
• VIC either call the standard interrupt exception handler, which
can load the address of the handler for the device from VIC or
cause the core to jump to the handler of the device directly.
28. EMBEDDED SYSTEM SOFTWARE
• Embedded System needs software to drive it.
• Initialization code:
It is the first code executed on the board. It sets the minimum
parts the board before handing control to the Operating system.
• Operating System:
It controls the applications and manage hardware system
resources. Many embedded systems do not require a full
operating system , but simple task scheduler that is either
event of poll driven.
29. EMBEDDED SYSTEM SOFTWARE
• Device Drivers:
They provide a consistent software interface to
the peripherals on the hardware device.
• Application:
It performs one of the tasks required for a
device.
30. EMBEDDED SYSTEM SOFTWARE
• Initialization (Boot) Code:
• It takes the processor from the reset state to a state where the
OS can run. It performs number of tasks before handing control
to OS. Those tasks can be grouped into 3 phases.
1. Initial hardware configuration: Setting up the target
platform
2. Diagnostics: To check if the target is in working order
3. Booting: Loading an image and handing control over to
that image (to operating system).
31. EMBEDDED SYSTEM SOFTWARE
• Initial hardware configuration
• Setting up the target platform so it can boot
image.
• This requires modification to satisfy the
requirements of the booted image.
• Ex. Memory system requires reorganization
of the memory map
33. EMBEDDED SYSTEM SOFTWARE
• Diagnostics
• The primary purpose is fault identification and
isolation
• Diagnostic code tests the system by exercising the
hardware target to check if the target h/w is in
working order.
• Booting
• Loading an image and handing control over to the
image.
• Copying entire program including code and data
into RAM
34. OPERATING SYSTEM
• ARM processor supports around 50 operating systems.
• Two categories of OS: RTOS & platform OS
• RTOS
• Guaranteed response times to event
• Different OS have different amounts of control over the
response time.
• Hard real time applications requires a guaranteed response.
• Soft real-time application requires good response time.
• Platform OS
• Requires a memory management unit to manage large, non
real time applications and tend to have secondary storage
• Linux OS is typical ex
35. APPLICATIONS
• OS schedules applications – code dedicated to
handling a particular task
• Automotive, Mobile and consumer devices, mass
storage, imaging
• Within each segment ARM processor can be found in
multiple applications.
• Ex: ARM can found in networking applications like
gateways, DSL modems for high speed internet
communication, 802.11 wireless communications.
36. ARM CORE DATAFLOW MODEL
• Programmer can think an ARM core as functional units
connected by data buses.
• Arrows represents the flow of the data and lines
represent the buses and the boxes represent either an
operation unit or storage area.
• Data enters processor core through data bus, data may
be an instruction to execute or a data item
• Von Neumann architecture- instructions and data uses
the same bus.(Harvard-separate)
38. ARM CORE DATAFLOW MODEL
• Decoder translates instructions before they are
executed.
• ARM uses load-store architecture:
– Load: Copy data from memory to register in the core
– Store: Copy data from registers to memory
– No data processing instructions directly manipulate
data in memory.
– Data processing is carried out solely in registers.
39. ARM CORE DATAFLOW MODEL
• Data items are placed in the register file- a
storage bank made up of 32-bit registers.
• ARM instructions typically have 2 registers:
Rn and Rm and destination register Rd.
• Source registers are read from register bank
using internal buses A and B respectively.
40. ARM CORE DATAFLOW MODEL
• One important feature of the ARM is register Rm can be
preprocessed in the barrel shifter before it enters the ALU
• Rd is written back to registers using result bus.
• For load-store instructions the incrementer updates the
address register before the core reads or writes the next
register value from or to the next sequential memory
location.
• Processor continues the executing instructions until an
exception or interrupt changes the normal execution flow.
41. REGISTERS
• General purpose register hold either data or an
address.
• They are identified with letter r prefixed to the register
number.
• ARM has 37 registers, all are 32-bits long
• The processor can operate in 7 different modes.
• There are up to 18 active registers: 16 data registers
(r0 to r15) and 2 processor status registers.
• r13, r14 and r15 special function registers.
43. REGISTERS
• r13 : Stack Pointer (sp)- stores the head of the stack in
the current processor mode
• r14: Link Register(lr) – core puts the return address
whenever it calls a subroutine.
• r15: Program Counter (pc) – contains the address of
the next instruction to be fetched by the processor.
• There are up to 18 active registers: 16 data registers
(r0 to r15) and 2 processor status registers.
44. Current Program Status Register
• ARM core uses the CPSR to monitor and control
internal operation.
• CPSR is 32 bit register resides in register file.
• It is divided into 4 fields, each 8 bits wide: flags,
status, extension and control.
45. PROCESSOR MODES
• Processor mode determines which registers are active
and the access rights to the CPSR register itself.
• Each processor mode is either privileged or non-
privileged
• Privileged mode allows full read-write access to the
CPSR.
• Non-privileged mode allows only read access to the
control field of the CPSR, but still allows read-write
access to the condition flags.
46. PROCESSOR MODES
• There are 7 processor modes in total:
• 6 Privileged modes and 1 non-privileged mode
49. Current Program Status Register
Q bit: Indicates if an overflow or saturation has occurred in
enhanced DSP instruction. Sticky word refers that flag is set by
hardware
51. PIPELINE
• Pipeline is the mechanism a RISC processor uses to
execute instructions.
• Pipeline speeds up the execution.
• Next instruction will be fetched when while other
instructions are being decoded and executed.
• Three-stage pipeline
• Fetch loads an instruction from memory
• Decode identifies the instruction to be executed
• Execute process the instruction and writes the result
back to a register
52. PIPELINE
• Filling the pipeline:
• Pipeline allows the core to execute an instruction
every cycle.
• As the length of the pipeline increases the amount of
work done at each stage is reduced, which allows the
processor to attain a higher operating frequency
54. PIPELINE EXECUTING CHARACTERSTICS
• ARM pipeline has not processed an instruction until it
passes completely through the execute stage. Ex. First
instruction get executed only when 4th instruction is
fetched.
55. PIPELINE EXECUTING CHARACTERSTICS
• During execution PC always points to the address of the
instruction being executed plus two instruction ahead.
• The execution of a branch instructions or branching by the
direct modification of the pc causes the ARM core to flush
its pipeline.
• ARM 10 uses branch prediction – reduces the effect of
pipeline flush.
• Instruction in the execute stage will complete even though
an interrupt has been raised. Other instruction in the
pipeline will be abandoned.
56. EXCEPTIONS, INTERRUPTS & VECTOR TABLE
• When an exception or interrupt occurs the processor sets the pc to a
specific memory address.
• The address is within a special address range called the vector table.
• The memory map address 0x00000000 is reserved for the vector
table.
• When exception or interrupts occurs, the processor suspends normal
execution and starts loading instructions from the exception vector
table.
• Each vector table entry contains a form of branch instruction
pointing to the start of a specific routine.
57. EXCEPTIONS, INTERRUPTS & VECTOR TABLE
• Reset Vector is the location of the first instruction executed by
the processor when power is applied. This instruction branches
to the initialization code.
• Undefined Instruction Vector is used when the processor cannot
decode an instruction.
• Software Interrupt Vector is called when you execute a SWI
instruction. SWI instruction is used to invoke OS routine.
• Prefetch Abort Vector occurs when the processor attempts to
fetch an instruction from an address without the correct access
permission. The actual abort occurs during the decode stage.
58. EXCEPTIONS, INTERRUPTS & VECTOR TABLE
• Data Abort Vector is raised when an instruction attempts
to access data memory without correct access permission.
• Interrupt Request Vector is used by external hardware to
interrupt the normal execution flow of the processor. It can
be only be raised if IRQs are not masked in the CPSR.
• Fast Interrupt Request Vector is reserved for hardware
requiring faster response times. It can only be raised if
FIQs are not masked in the CPSR.
60. CORE EXTENSIONS
• Standard devices placed next to the ARM core.
• They improve performance, manage resources
and provide extra functionality.
• Each ARM family has different extensions
• Three hardware extensions of ARM core :
Cache & Tightly Coupled Memory,
Memory management
Coprocessor interface.
61. CACHE & TIGHTLY COUPLE MEMORY
• Cache is a block of memory placed between main
memory and the core.
• Cache memory is an extremely fast memory type that
acts as a buffer between RAM and the CPU.
• It holds frequently requested data and instructions so
that they are immediately available to the CPU when
needed.
• Cache memory is used to reduce the average time to
access data from the Main memory
62. CACHE & TIGHTLY COUPLE MEMORY
• Most of the ARM-based embedded systems use a single-
level(L1) cache internal to the processor.
• Many small embedded systems do not require the
performance gains that a cache brings.
• It holds frequently requested data and instructions so that
they are immediately available to the CPU when needed.
• Cache memory is used to reduce the average time to
access data from the Main memory.
• ARM has two forms of cache
63. CACHE MEMORY
• First is found attached to the Von Neuman-style cores.
• Single unified cache for both data and instructions.
64. TIGHTLY COUPLED MEMORY
• Second form is found attached to the Harvard-style
cores.
• For real time systems code execution is deterministic
– the time taken for loading and storing instructions or
data must be predictable.
• This is achieved using a form of memory called
Tightly Coupled Memory (TCM).
• TCM is fast SRAM located close to the core
• It guarantees the clock cycles required to fetch
instructions or data- critical for real time algorithms
requiring deterministic behavior.
66. By combining both technologies, ARM can have both improved
performance and predictable real-time response
67. MEMORY MANAGEMENT
• Embedded system often uses multiple memory
devices.
• It is necessary to organize these devices and protect
the system from applications trying to make
inappropriate access to hardware.
• This is achieved with assistance of memory
management hardware.
• ARM core have 3 types of memory management
hardware:
– No extensions providing no protection
– Memory Protection Unit(MPU) providing ltd protection
– Memory Management Unit (MMU) providing full
protection
68. MEMORY MANAGEMENT
• Non-protected memory is fixed and provides very
little flexibility. Its is used for small, simple embedded
systems that require no protection.
• MPU: This type memory management is used for
system that require memory protection but don’t have
a complex memory map.
• MMUs are designed for more sophisticated platform
operating systems that support multitasking.
69. MEMORY MANAGEMENT
• MPU:
Uses a limited number of memory regions. These
regions are controlled with set of special
coprocessor registers, each region is defined with
specific access permissions.
This type memory management is used for system
that require memory protection but don’t have a
complex memory map.
70. MEMORY MANAGEMENT
• MMU
MMUs are designed for more sophisticated platform
operating systems that support multitasking.
MMU uses a set of translation tables to provide
fine-grained control over memory.
These tables are stored in main memory and provide
a virtual-to-physical address map as well as access
permission.
71. COPROCESSORS
• Coprocessors can be attached to the ARM processor.
• Coprocessor extends the processing feature of a core
by extending the instruction set or providing
configuration registers.
• More than one coprocessor can be added to the ARM
core via coprocessor interface.
• Coprocessor can be accessed through a group of
dedicated ARM instructions.
• Coprocessor can also extend the instruction set by
providing a specialized group of new instruction
72. COPROCESSORS
• These new instructions are processed at the decode
stage of the ARM pipeline.
• If the decode stage sees a coprocessor instructions
then it offers it to the relevant coprocessor.
• If the coprocessor is not present or doesn’t recognize
the instruction, then the ARM takes an undefined
instruction exception.