Hossam Adel
Hossam Adel Mohammed
eng.hossam.ha@gmail.com
eg.linkedin.com/in/enghossam
Positions :
 Electronics Engineer | Helwan Company for Engineering
Industries (Military Factory 99).
 Electronics Engineer |Ministry of Industry and Foreign Trade.
 Embedded SW/HW Developer | IST Industries.
 Embedded system Instructor at AMIT
Lecture Contents:
-Embedded Introduction.
1- ARM Overview.
2- ARM architecture
a- Pipeline
b- Memory management
- Cache, TCM, MMU, TLB and MPU
c- DMA (direct memory access controller)
d- NEON (SIMD) unit and Multi-Tasks.
e- AMBA specifications.
f- Interworking with Thumb.
3- Security in ARM.
4- Semihosting.
What does “Embedded” mean?
 Any system built around a microcontroller (MC) or microprocessor (MP) is called an Embedded System.
 An Embedded System is a computer system with a dedicated function.
 Embedded systems are designed to do some specific task, rather than be a
general-purpose computer for multiple tasks.
 SoC (System on Chip)?
 Arduino & Raspberry Pi.
How to build a suitable Embedded System (ES)
1 - Select a suitable MC.
2 - Select peripherals.
3 - Select an OS (RTOS).
4 - Select the programming language.
a- Assembly
b- C/C++
ARM Ltd
 ARM founded in November 1990
 Advanced RISC Machines
 Company headquarters in Cambridge, UK
 Processor design centres in Cambridge, Austin, and Sophia Antipolis
 Sales, support, and engineering offices all over the world
 Best known for its range of RISC processor core designs
 Other products – fabric IP, software tools, models, cell libraries - to help partners
develop and ship ARM-based SoCs
 More information about ARM and our offices on our web site:
 http://www.arm.com/aboutarm/
Why ARM ?!!
 The ARM CPU core is a 32-bit RISC processor macrocell upon which the current generation of ARM
processors is based. It has 32-bit data and address buses, and a single 32-bit external data interface
through which both instructions and data pass during execution. It includes 15 general-purpose registers. A
multi-stage pipeline is employed to speed up the execution of instructions. Because branches interrupt the
sequential flow of instructions, it is usual to employ the ARM’s conditional execution
facility when possible. The ability of every instruction to be conditionally executed increases the chance
that program address references will run sequentially, thereby allowing the memory subsystem to
make predictions about the next address required. Non-sequential addresses are held for two cycles.
ARM Offices Worldwide
ARM Connected Community – 900+
Connect, Collaborate, Create – accelerating innovation
Embedded Processors
Application Processors
Applications of ARM Processor
 ARM Processor Family
Processor family | # of pipeline stages | Memory organization  | Clock rate | MIPS/MHz
ARM6             | 3                    | Von Neumann          | 25 MHz     | -
ARM7             | 3                    | Von Neumann          | 66 MHz     | 0.9
ARM8             | 5                    | Von Neumann          | 72 MHz     | 1.2
ARM9             | 5                    | Harvard              | 200 MHz    | 1.1
ARM10            | 6                    | Harvard              | 400 MHz    | 1.25
StrongARM        | 5                    | Harvard              | 233 MHz    | 1.15
ARM11            | 8                    | Von Neumann/Harvard  | 550 MHz    | 1.2
Pipeline :
Pipeline in ARM:
System Control Coprocessor
The functions of the system control coprocessor are:
• overall system control and configuration
• optional cache configuration and management
• optional Tightly-Coupled Memory (TCM) configuration and management
• optional Memory Protection Unit (MPU) configuration and management
• debug accesses to the caches
• system performance monitoring.
Memory System of ARM1176JZ
 Cache Organization
 TCM - Tightly-coupled memory
 MMU
 TLB
 MPU
 DMA - Direct Memory Access
Memory Size and Speed
[Figure: memory hierarchy, from fast, small and expensive to slow, large and cheap: registers, on-chip cache memory, 2nd-level off-chip cache, main memory, hard disk]
Cache Organization
 Separate Instruction and Data Caches in a Harvard arrangement, so
both load and store instructions can cost one cycle.
 Four-way set associative cache of configurable size.
 The caches are virtually indexed and physically tagged.
 Both the Instruction Cache and the Data Cache can provide two
words per cycle for all requesting sources.
Features of the cache system
 The cache is a Harvard implementation (Separate Instruction and Data Caches ).
 The other main operation performed by the cache is cache line Write-Back.
 Cache replacement policies are Pseudo-Random or Round-Robin. Round-Robin
uses a single counter for all sets that selects the way used for replacement.
 Cache lines can be either Write-Back or Write-Through.
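The lookup and Round-Robin replacement described above can be sketched in C as follows. This is only a simplified model, not the ARM1176 implementation: the cache size (16 KB), the 32-byte line length, and the names `cache_t` and `cache_access` are assumptions made for illustration, and a single address is used for both index and tag instead of modelling the virtual-index/physical-tag split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WAYS   4u    /* four-way set associative, as stated above */
#define NUM_SETS   128u  /* assumed: 16 KB / (4 ways x 32-byte lines) */
#define LINE_BYTES 32u   /* assumed cache line length */

typedef struct {
    bool     valid;
    uint32_t tag;
} cache_line_t;

typedef struct {
    cache_line_t line[NUM_SETS][NUM_WAYS];
    unsigned     rr_counter;  /* single Round-Robin counter shared by all sets */
} cache_t;

/* Returns true on a hit. On a miss the line is allocated in the way chosen
 * by the shared Round-Robin counter, which then advances. */
bool cache_access(cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    const uint32_t set = (addr / LINE_BYTES) % NUM_SETS;
    const uint32_t tag = addr / (LINE_BYTES * NUM_SETS);

    /* Compare the tag against all four ways of the selected set. */
    for (unsigned way = 0; way < NUM_WAYS; ++way) {
        if (c->line[set][way].valid && c->line[set][way].tag == tag)
            return true;
    }

    /* Miss: replace the way selected by the shared counter. */
    cache_line_t *victim = &c->line[set][c->rr_counter];
    victim->valid = true;
    victim->tag   = tag;
    c->rr_counter = (c->rr_counter + 1u) % NUM_WAYS;
    return false;
}
```

Starting from a zero-initialized `cache_t`, five accesses to different tags that map to the same set are enough to evict the first line, because only four ways are available per set.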
TCM (Tightly-coupled memory)
 Instruction TCM (ITCM)
 Data TCM (DTCM)
 The TCM is designed to provide low-latency memory that can be used by
the processor without the unpredictability that is a feature of caches.
 You can use such memory to hold critical routines, such as interrupt
handling routines or real-time tasks, where the indeterminacy of a cache is
highly undesirable.
TCM Behavior
 TCM forms a continuous area of memory that is always valid if the TCM is enabled. The TCM is used
as part of the physical memory map of the system, and is not backed by a level of external memory
with the same physical addresses. For this reason, the TCM behaves differently from the caches for
regions of memory that are marked as being Write-Through Cachable. In such regions, no external
writes occur in the event of a write to memory locations contained in the TCM.
MMU
 Memory Management Unit
 The processor MMU works with the cache memory system to control accesses to and
from external memory.
 The MMU also controls the translation of virtual addresses to physical addresses.
 The processor implements an ARMv6 MMU enhanced with features to provide address
translation and access permission checks for all ports of the processor.
 While controlling the translation of virtual addresses to physical addresses, the MMU in
ARM11 makes use of TLBs (Translation Lookaside Buffers).
TLB
 Translation Lookaside Buffers
 A single set of two-level page tables stored in main memory controls the
contents of the instruction and data side TLBs.
 Micro TLB
 Main TLB
Micro TLB & Main TLB
 Micro TLB -- The first level of caching for the page table information is a small
Micro TLB of ten entries that is implemented on each of the instruction and
data sides.
 You can configure Micro TLB replacement to be round-robin or random. By
default the round-robin replacement algorithm is used.
 Main TLB -- The second layer in the TLB structure, which catches the misses
from the Micro TLBs. It provides a centralized source for translation entries.
Misses from the instruction and data Micro TLBs are handled by a unified main TLB.
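The two-level lookup described above can be sketched in C for one side (instruction or data). This is a simplified model under stated assumptions: 4 KB pages, a 64-entry direct-mapped main TLB, and the names `tlb_t`, `tlb_install` and `tlb_translate` are all hypothetical. Only the ten-entry Micro TLB with round-robin replacement comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MICRO_ENTRIES 10u  /* ten entries per side, as stated above */
#define MAIN_ENTRIES  64u  /* assumed size, for illustration only */
#define PAGE_SHIFT    12u  /* assumed 4 KB pages */

typedef struct {
    bool     valid;
    uint32_t vpn;  /* virtual page number */
    uint32_t pfn;  /* physical frame number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t micro[MICRO_ENTRIES];
    unsigned    micro_next;             /* round-robin replacement pointer */
    tlb_entry_t main_tlb[MAIN_ENTRIES];
} tlb_t;

typedef enum { TLB_HIT_MICRO, TLB_HIT_MAIN, TLB_MISS } tlb_result_t;

/* Hypothetical helper: load one translation into the main TLB, standing in
 * for the hardware page table walk. Direct-mapped here for brevity. */
void tlb_install(tlb_t *t, uint32_t vpn, uint32_t pfn)
{
    assert(t != NULL);
    tlb_entry_t *e = &t->main_tlb[vpn % MAIN_ENTRIES];
    e->valid = true;
    e->vpn   = vpn;
    e->pfn   = pfn;
}

tlb_result_t tlb_translate(tlb_t *t, uint32_t va, uint32_t *pa)
{
    assert(t != NULL && pa != NULL);
    const uint32_t vpn    = va >> PAGE_SHIFT;
    const uint32_t offset = va & ((1u << PAGE_SHIFT) - 1u);

    /* 1. Micro TLB: small and searched first. */
    for (unsigned i = 0; i < MICRO_ENTRIES; ++i) {
        if (t->micro[i].valid && t->micro[i].vpn == vpn) {
            *pa = (t->micro[i].pfn << PAGE_SHIFT) | offset;
            return TLB_HIT_MICRO;
        }
    }

    /* 2. Main TLB catches the Micro TLB miss and refills the Micro TLB. */
    const tlb_entry_t *m = &t->main_tlb[vpn % MAIN_ENTRIES];
    if (m->valid && m->vpn == vpn) {
        t->micro[t->micro_next] = *m;
        t->micro_next = (t->micro_next + 1u) % MICRO_ENTRIES;
        *pa = (m->pfn << PAGE_SHIFT) | offset;
        return TLB_HIT_MAIN;
    }

    /* 3. Miss in both levels: a page table walk would be needed. */
    return TLB_MISS;
}
```

With a zero-initialized `tlb_t`, the first access to a page misses, the access after `tlb_install` hits in the main TLB, and any further access to the same page hits in the Micro TLB.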
MPU
 Memory protection unit
The MPU supports 16 memory regions. Each region is programmed with a base address
and size, and regions can be overlaid to enable efficient programming of the memory map. To
support overlaying, the regions are assigned priorities, with region 0 having the lowest
priority and region 15 having the highest. The MPU returns access permissions and
attributes for the highest priority region in which the address hits.
 Enabling the MPU
When enabled, the behavior of the MPU is as follows:
• When the Load Store Unit or Pre-fetch Unit generates a memory access, the MPU
compares the memory address with the programmed memory regions.
• If the address does not exist in a memory region, a background fault is signalled
to the requesting block, along with status information to enable the Fault Status
Register to be correctly formed.
• If a matching memory region is found, then the region information is used:
1. The access permission bits are used to determine if the access is permitted.
If the access is not permitted the MPU signals a memory abort, otherwise
the access is permitted to proceed.
2. The memory region attributes are used to determine if the access is cached,
uncached or device and if it is shared.
• If the address matches in multiple memory regions, then a fixed priority scheme
selects the attributes for the highest numbered region.
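The region matching and fixed priority scheme above can be sketched in C. This is a minimal model, not the hardware register layout: the `mpu_region_t` fields, the simple read/write permission flags, and the function name `mpu_check` are assumptions made for illustration, and memory attributes (cached, device, shared) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_REGIONS 16  /* 16 regions, region 15 has the highest priority */

typedef struct {
    bool     enabled;
    uint32_t base;      /* region base address */
    uint32_t size;      /* region size in bytes */
    bool     read_ok;   /* simplified access permissions */
    bool     write_ok;
} mpu_region_t;

typedef enum {
    MPU_OK,                /* access permitted to proceed */
    MPU_BACKGROUND_FAULT,  /* address hits no region */
    MPU_PERMISSION_ABORT   /* region found, access not permitted */
} mpu_result_t;

mpu_result_t mpu_check(const mpu_region_t regions[MPU_REGIONS],
                       uint32_t addr, bool is_write)
{
    assert(regions != NULL);

    /* Scan from the highest-numbered region down, so that the first hit
     * is the highest priority region containing the address. */
    for (int i = MPU_REGIONS - 1; i >= 0; --i) {
        const mpu_region_t *r = &regions[i];
        if (r->enabled && addr >= r->base && (addr - r->base) < r->size) {
            bool allowed = is_write ? r->write_ok : r->read_ok;
            return allowed ? MPU_OK : MPU_PERMISSION_ABORT;
        }
    }
    return MPU_BACKGROUND_FAULT;
}
```

For example, a large read/write region 0 with a small read-only region 3 overlaid on part of it causes writes inside the overlay to abort while writes elsewhere in region 0 still proceed.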
Direct memory access (DMA)
 Direct memory access (DMA) is a feature of computerized systems that allows certain
hardware subsystems to access main system memory independently of the central
processing unit (CPU).
 Without DMA, when the CPU is using programmed input/output, it is typically fully
occupied for the entire duration of the read or write operation, and is thus unavailable
to perform other work. With DMA, the CPU initiates the transfer, does other operations
while the transfer is in progress, and receives an interrupt from the DMA controller
when the operation is done. This feature is useful any time the CPU cannot keep up
with the rate of data transfer, or where the CPU needs to perform useful work while
waiting for a relatively slow I/O data transfer.
 Many hardware systems use DMA, including disk drive controllers, graphics
cards, network cards and sound cards. DMA is also used for intra-chip data
transfer in multi-core processors. Computers that have DMA channels can
transfer data to and from devices with much less CPU overhead than computers
without DMA channels. Similarly, a processing element inside a multi-core
processor can transfer data to and from its local memory without occupying its
processor time, allowing computation and data transfer to proceed in parallel.
 DMA can also be used for "memory to memory" copying or moving of data
within memory. DMA can offload expensive memory operations, such as large
copies or scatter-gather operations, from the CPU to a dedicated DMA engine.
An implementation example is the I/O Acceleration Technology.
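The flow described above (the CPU starts a transfer, continues with other work, and is notified on completion) can be sketched as a small software model of a scatter-gather DMA engine. Everything here is hypothetical and for illustration only: the descriptor layout, `dma_start`, `dma_step` and `dma_set_flag` do not correspond to any real controller, and `memcpy` stands in for the bus transfers the hardware would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One element of a scatter-gather chain. */
typedef struct dma_desc {
    const void      *src;
    void            *dst;
    size_t           len;
    struct dma_desc *next;  /* NULL terminates the chain */
} dma_desc_t;

typedef struct {
    dma_desc_t *current;
    void      (*on_done)(void *ctx);  /* models the completion interrupt */
    void       *ctx;
    bool        busy;
} dma_engine_t;

/* The CPU initiates the transfer and returns immediately. */
void dma_start(dma_engine_t *e, dma_desc_t *chain,
               void (*on_done)(void *ctx), void *ctx)
{
    assert(e != NULL && chain != NULL && !e->busy);
    e->current = chain;
    e->on_done = on_done;
    e->ctx     = ctx;
    e->busy    = true;
}

/* One unit of progress by the engine: copy one descriptor.
 * Returns true while more descriptors remain. */
bool dma_step(dma_engine_t *e)
{
    assert(e != NULL);
    if (!e->busy)
        return false;

    memcpy(e->current->dst, e->current->src, e->current->len);
    e->current = e->current->next;

    if (e->current == NULL) {
        e->busy = false;
        if (e->on_done != NULL)
            e->on_done(e->ctx);  /* signal completion to the CPU */
    }
    return e->busy;
}

/* Example completion handler: sets a bool flag passed as context. */
void dma_set_flag(void *ctx)
{
    *(bool *)ctx = true;
}
```

In a real system the calls to `dma_step` would happen in hardware, in parallel with the CPU; here the caller interleaves them with its own work to imitate that overlap.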
The NEON unit (SIMD)
 The NEON unit provides 16 quadword vector registers that each hold 16 bytes of
information (the same register file can also be viewed as 32 doubleword registers of 8 bytes each).
 These 16-byte registers can then be operated on in parallel in the NEON unit.
For example, in one vector add instruction you can add eight 16-bit integers
to eight other 16-bit integers to produce eight 16-bit results.
 The NEON unit supports 8-bit, 16-bit and 32-bit integer operations, and
some 64-bit operations, in addition to 32-bit floating point operations.
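To make the "eight 16-bit additions in one instruction" example concrete, here is a portable C model of what a single vector add does to a 16-byte register treated as eight 16-bit lanes. It is a sketch of the semantics only: the function name `vadd_i16` is made up, the loop stands in for hardware that handles all lanes at once, and wraparound on overflow is assumed (a non-saturating add).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* 16 bytes / 2 bytes per 16-bit element */

/* Adds eight 16-bit integers to eight others, lane by lane.
 * On NEON hardware the eight additions happen in one instruction. */
void vadd_i16(const int16_t a[LANES], const int16_t b[LANES],
              int16_t out[LANES])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int i = 0; i < LANES; ++i) {
        /* Add as unsigned so that overflow wraps within the lane. */
        uint16_t sum = (uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
        out[i] = (int16_t)sum;
    }
}
```

Each lane is independent: a carry out of one 16-bit element never affects its neighbour, which is what distinguishes a packed SIMD add from one wide 128-bit addition.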
What is NEON?
 NEON is a wide SIMD data processing architecture
 Extension of the ARM instruction set (v7-A)
 32 x 64-bit wide registers (can also be used as 16 x 128-bit wide registers)
 NEON instructions perform “Packed SIMD” processing
 Registers are considered as vectors of elements of the same data type
 Data types available: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single prec. float
 Instructions usually perform the same operation in all lanes
[Figure: source registers Dn and Dm, the operation applied to each pair of elements, and destination register Dd]
NEON Coprocessor registers
 NEON has a 256-byte register file
 Separate from the core registers (r0-r15)
 Extension to the VFPv2 register file (VFPv3)
 Two different views of the NEON registers
 32 x 64-bit registers (D0-D31)
 16 x 128-bit registers (Q0-Q15)
 Enables register trade-offs
 Vector length can be variable
 Different registers available
[Figure: the two views of the NEON register file, Q0 to Q15 and D0 to D31]
NEON vectorizing example
 How does the compiler perform vectorization?
void add_int(int * __restrict pa,
             int * __restrict pb,
             unsigned int n, int x)
{
    unsigned int i;
    for (i = 0; i < (n & ~3); i++)
        pa[i] = pb[i] + x;
}
1. Analyze each loop:
 Are pointer accesses safe for vectorization?
 What data types are being used? How do they map onto NEON vector registers?
 Number of loop iterations
void add_int(int *pa, int *pb,
             unsigned n, int x)
{
    unsigned int i;
    for (i = ((n & ~3) >> 2); i; i--)
    {
        *(pa + 0) = *(pb + 0) + x;
        *(pa + 1) = *(pb + 1) + x;
        *(pa + 2) = *(pb + 2) + x;
        *(pa + 3) = *(pb + 3) + x;
        pa += 4; pb += 4;
    }
}
2. Unroll the loop to the appropriate number of iterations,
and perform other transformations like pointerization
3. Map each unrolled operation onto a NEON vector lane, and generate the corresponding NEON
instructions
[Figure: pb + x = pa, computed lane by lane across a 128-bit vector (bits 127 to 0)]
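The two versions of `add_int` on this slide can be compiled side by side to check that the unrolling step preserves behaviour. The sketch below renames them `add_int_scalar` and `add_int_unrolled` (names chosen here, not from the slides), uses standard C99 `restrict` in place of `__restrict`, and writes `~3u` so the mask is unsigned. It is plain C: the mapping onto NEON lanes is left to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: processes the first (n rounded down to a multiple of 4)
 * elements, one per iteration. */
void add_int_scalar(int *restrict pa, const int *restrict pb,
                    unsigned n, int x)
{
    assert(pa != NULL && pb != NULL);
    for (unsigned i = 0; i < (n & ~3u); i++)
        pa[i] = pb[i] + x;
}

/* Unrolled by four with pointer increments: each iteration performs the
 * four additions that would map onto the four 32-bit lanes of one vector. */
void add_int_unrolled(int *restrict pa, const int *restrict pb,
                      unsigned n, int x)
{
    assert(pa != NULL && pb != NULL);
    for (unsigned i = (n & ~3u) >> 2; i; i--) {
        pa[0] = pb[0] + x;
        pa[1] = pb[1] + x;
        pa[2] = pb[2] + x;
        pa[3] = pb[3] + x;
        pa += 4;
        pb += 4;
    }
}
```

With `n = 10`, both functions update elements 0 to 7 and leave elements 8 and 9 untouched, because `n & ~3u` rounds the trip count down to 8.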
Performance goals
Most applications require tuning to gain the best performance from vectorization. There is
always some overhead, so the theoretical maximum performance cannot be reached. For
example, the NEON unit can process four single-precision floats at one time. This
means that the theoretical maximum performance for a floating-point application is a factor
of four over the original scalar non-vectorized code. Given typical overheads, a reasonable
goal for a whole floating-point application is to aim for a 50% improvement
in performance over the scalar code. For large applications that are not completely
vectorizable, achieving a 25% improvement in performance over the scalar code is a
reasonable goal, although this is highly application dependent.
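One way to relate these goals to the factor-of-four maximum is Amdahl's law, which the slides do not name but which fits the argument: if a fraction f of the run time is vectorized across four lanes and the rest stays scalar, the overall speedup is 1 / ((1 - f) + f / 4). The helper below is a sketch under that assumption. The name `vector_speedup` is hypothetical, and "50% improvement" is read here as a 1.5x speedup.

```c
#include <assert.h>

/* Overall speedup when a fraction f (0..1) of the scalar run time is
 * vectorized across `lanes` parallel lanes and the rest is unchanged. */
double vector_speedup(double f, double lanes)
{
    assert(f >= 0.0 && f <= 1.0 && lanes >= 1.0);
    return 1.0 / ((1.0 - f) + f / lanes);
}
```

Under this model, `vector_speedup(1.0, 4.0)` gives the theoretical factor of four, a 1.5x speedup needs roughly 44% of the run time to be vectorized, and a 1.25x speedup needs roughly 27%.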
AMBA specifications :
 The Advanced Microcontroller Bus Architecture.
 Three distinct buses are defined within the AMBA specification:
 the Advanced High-performance Bus (AHB)
 the Advanced System Bus (ASB)
 the Advanced Peripheral Bus (APB).
Advanced High-performance Bus (AHB)
 The AMBA AHB is for high-performance, high clock frequency system modules.
 The AHB acts as the high-performance system backbone bus. AHB supports the
efficient connection of processors, on-chip memories and off-chip external memory
interfaces with low-power peripheral macrocell functions. AHB is also specified to
ensure ease of use in an efficient design flow using synthesis and automated test
techniques.
Advanced System Bus (ASB)
 The AMBA ASB is for high-performance system modules.
 AMBA ASB is an alternative system bus suitable for use where the high-performance
features of AHB are not required. ASB also supports the efficient connection of
processors, on-chip memories and off-chip external memory interfaces with low-power
peripheral macrocell functions.
Advanced Peripheral Bus (APB)
 The AMBA APB is for low-power peripherals.
 AMBA APB is optimized for minimal power consumption and reduced interface
complexity to support peripheral functions. APB can be used in conjunction with either
version of the system bus.
Objectives of the AMBA specification
 The AMBA specification has been derived to satisfy four key requirements:
• to facilitate the right-first-time development of embedded microcontroller
products with one or more CPUs or signal processors
• to be technology-independent and ensure that highly reusable peripheral and
system macrocells can be migrated across a diverse range of IC processes and be
appropriate for full-custom, standard cell and gate array technologies
• to encourage modular system design to improve processor independence,
providing a development road-map for advanced cached CPU cores and the
development of peripheral libraries
• to minimize the silicon infrastructure required to support efficient on-chip and
off-chip communication for both operation and manufacturing test.
Interworking with Thumb.
 What does Thumb mean?
 Why Thumb code?
When you write code for an ARM processor that supports Thumb instructions, you
probably build most of your application to run in Thumb state. This gives the best code
density. With 8-bit or 16-bit wide memory, it also gives the best performance. However,
you might want parts of your application to run in ARM state for reasons such as:
 Speed
Some parts of an application might be speed critical. These sections
might be more efficient running in ARM state than in Thumb state.
Some systems include a small amount of fast 32-bit memory. ARM code
can be run from this without the overhead of fetching each instruction
from 8-bit or 16-bit memory.
 Functionality
Thumb instructions are less flexible than their equivalent ARM
instructions. Some operations are not possible in Thumb state. A state
change to ARM is required to carry out the following operations:
• accesses to CPSR to enable or disable interrupts, and to change mode
• accesses to coprocessors
• execution of Digital Signal Processor (DSP) math instructions that
cannot be performed in C language.
 Exception handling
The processor automatically enters ARM state when a processor
exception occurs. This means that the first part of an exception handler
must be coded with ARM instructions, even if it reenters Thumb state to
carry out the main processing of the exception. At the end of such
processing, the processor must be returned to ARM state to return from
the handler to the main application.
 Standalone Thumb programs
An ARM processor that supports Thumb instructions always starts in
ARM state. To run simple Thumb assembly language programs, add an
ARM header that carries out a state change to Thumb state and then calls
the main Thumb routine.
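The slides do not spell out how the state change itself is performed. On ARM cores that support Thumb it is normally done with a branch-and-exchange (BX) instruction, where bit 0 of the target address selects the instruction set state. The following C model sketches that rule. The types `cpu_t` and `isa_state_t` and the function `branch_exchange` are hypothetical, and the model ignores everything about the core except the program counter and the state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { STATE_ARM, STATE_THUMB } isa_state_t;

typedef struct {
    uint32_t    pc;
    isa_state_t state;
} cpu_t;

/* Model of a branch-and-exchange: bit 0 of the target selects the state.
 * Bit 0 set   -> enter Thumb state, branch to target with bit 0 cleared.
 * Bit 0 clear -> enter ARM state, target expected to be word aligned. */
void branch_exchange(cpu_t *cpu, uint32_t target)
{
    assert(cpu != NULL);
    if (target & 1u) {
        cpu->state = STATE_THUMB;
        cpu->pc    = target & ~1u;
    } else {
        assert((target & 2u) == 0u);  /* ARM code is word aligned */
        cpu->state = STATE_ARM;
        cpu->pc    = target;
    }
}
```

In this model, the ARM header of a standalone Thumb program corresponds to one call with the Thumb routine's address plus one, and returning to ARM state corresponds to a call with a word-aligned address.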
Security in ARM
 Global Platform based Trusted Execution
Environment and TrustZone® Ready
 The foundations for trusted services
 An ARM® TrustZone® technology based Trusted
Execution Environment (TEE) is an important
component of delivering secure services (including
premium content) and applications. A correctly
implemented TrustZone® based TEE system can be
“Secure by default”, and robust against software attack.
The Four Compartment Security Model
 To protect against these security threats, ARM offers a four compartment security model that
provides increasing levels of security through a combination of reducing the attack surface and
increasing isolation. A system designer will typically use a combination of the following to
provide an appropriate level of protection of user and system assets:
 Normal World - User mode/System mode - PL0/PL1 in ARMv7 or EL0/EL1 in ARMv8
Running processes or applications are isolated from each other by the operating system
and the MMU. Each executing process has its own addressable memory, isolated from
other processes, along with a set of capabilities and permissions that are administered by the
operating system kernel, which executes with System level privilege.
 This is the normal operating state for application software together with the operating system
and is often referred to as the ‘Normal World’, in contrast to the ‘Trusted World’ described below.
 Hypervisor Mode - PL2 in ARMv7 or EL2 in ARMv8
The hypervisor allows multiple instances of the same or different operating systems to
execute on the same processor as virtual machines. Virtual machines can be isolated from
each other, and through use of a System MMU, such as the MMU-400, other bus masters can
also be virtualized. This separation can be used to protect and secure resources and assets
in one virtual machine from other virtual machines.
 Trusted World - Secure state or EL3 in ARMv8
Using the TrustZone® security extensions allows the system to be physically partitioned
into secure and non-secure components. This provides further isolation of assets and can be
used to ensure that software operating within the normal operating system cannot directly
access secure memory or secure peripherals.
 SecurCore processors enable physically separate, tamper proof ICs offering secured processing
and storage protected against both physical and software attack.
Semihosting
 Semihosting enables code running on an ARM® target to use the I/O
facilities on a host computer that is running RealView® Debugger. Examples
of these facilities include keyboard input, screen output, and disk I/O.
 Semihosting is a mechanism for ARM targets to communicate input/output
requests from application code to a host computer running a debugger
Example ARM-based system
 ARM core deeply embedded within an SoC
 External debug and trace via JTAG or CoreSight
interface
 Design can have both external and internal
memories
 Varying width, speed and size – depending on system
requirements
 Can include ARM licensed CoreLink peripherals
 Interrupt controller, since core only has two interrupt
sources
 Other peripherals and interfaces
 Can include on-chip memory from ARM Artisan
Physical IP Libraries
 Elements connected using AMBA (Advanced
Microcontroller Bus Architecture)
[Figure: ARM based SoC block diagram showing the ARM processor core, debug, nIRQ and nFIQ, AMBA AXI, external memory interface (FLASH, SDRAM), on-chip memory, DMA port, clocks and reset controller, APB bridge, AMBA APB, CoreLink interrupt controller, other CoreLink peripherals, and custom peripherals]
ARM architecture

11. Computer Systems Hardware 1
 
301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog
 
Computer System Architecture
Computer System ArchitectureComputer System Architecture
Computer System Architecture
 
COMPUTER BASIC AND FUNDAMENTAL AND ITS ORGANISATION.pptx
COMPUTER BASIC AND FUNDAMENTAL AND ITS ORGANISATION.pptxCOMPUTER BASIC AND FUNDAMENTAL AND ITS ORGANISATION.pptx
COMPUTER BASIC AND FUNDAMENTAL AND ITS ORGANISATION.pptx
 
Hcs Topic 2 Computer Structure V2
Hcs Topic 2  Computer Structure V2Hcs Topic 2  Computer Structure V2
Hcs Topic 2 Computer Structure V2
 
Hcs Topic 2 Computer Structure V2
Hcs Topic 2  Computer Structure V2Hcs Topic 2  Computer Structure V2
Hcs Topic 2 Computer Structure V2
 
Hcs Topic 2 Computer Structure V2
Hcs Topic 2  Computer Structure V2Hcs Topic 2  Computer Structure V2
Hcs Topic 2 Computer Structure V2
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
E.s unit 4 and 5
E.s unit 4 and 5E.s unit 4 and 5
E.s unit 4 and 5
 
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexvUNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
UNIT 1 ERTS-1.pptusbce18ugxy8vsxysqvyexv
 
L010236974
L010236974L010236974
L010236974
 
Topic 1 Data Representation
Topic 1 Data RepresentationTopic 1 Data Representation
Topic 1 Data Representation
 
Unit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisationUnit 2 processor&amp;memory-organisation
Unit 2 processor&amp;memory-organisation
 
Unit 1 processormemoryorganisation
Unit 1 processormemoryorganisationUnit 1 processormemoryorganisation
Unit 1 processormemoryorganisation
 
2. the memory systems (module2)
2. the memory systems (module2)2. the memory systems (module2)
2. the memory systems (module2)
 
Co notes
Co notesCo notes
Co notes
 

Recently uploaded

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
PrashantGoswami42
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
Kamal Acharya
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
abh.arya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 

Recently uploaded (20)

Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 

ARM architecture

  • 2. Hossam Adel Mohammed eng.hossam.ha@gmail.com eg.linkedin.com/in/enghossam Positions :  Electronics Engineer | Helwan Company for Engineering Industries (Military Factory 99).  Electronics Engineer |Ministry of Industry and Foreign Trade.  Embedded SW/HW Developer | IST Industries.  Embedded system Instructor at AMIT
  • 3. Lecture Contents: - Embedded introduction. 1- ARM overview. 2- ARM architecture: a- Pipeline b- Memory management (cache, TCM, MMU, TLB and MPU) c- DMA (direct memory access controller) d- NEON (SIMD) unit and multitasking e- AMBA specifications f- Interworking with Thumb. 3- Security in ARM. 4- Semihosting.
  • 4. What does “embedded” mean?  Any system built around a microcontroller (MC) or microprocessor (MP) is called an embedded system.  An embedded system is a computer system with a dedicated function.  Embedded systems are designed to do a specific task, rather than act as a general-purpose computer for multiple tasks.  SoC (system on chip)?  Arduino & Raspberry Pi.
  • 5. How to build a suitable embedded system (ES): 1 - Select a suitable MC. 2 - Select the peripherals. 3 - Select an OS (RTOS). 4 - Select the programming language: a- Assembly b- C/C++
  • 8. ARM Ltd  ARM was founded in November 1990  Advanced RISC Machines  Company headquarters in Cambridge, UK  Processor design centres in Cambridge, Austin, and Sophia Antipolis  Sales, support, and engineering offices all over the world  Best known for its range of RISC processor core designs  Other products – fabric IP, software tools, models, cell libraries - to help partners develop and ship ARM-based SoCs  More information about ARM and our offices on our web site:  http://www.arm.com/aboutarm/
  • 9. Why ARM?  The ARM CPU core is a 32-bit RISC processor macro-cell upon which the current generation of ARM processors is based. It has 32-bit data and address buses. It has a single 32-bit external data interface through which both instructions and data pass during execution. It includes 15 general purpose registers. A multi-stage pipeline is employed to speed the execution of instructions. Because branches interrupt the sequential flow of instructions, it is usual to employ the ARM’s conditional execution facility when possible. The ability of every instruction to be conditionally executed increases the chance that the program's address references will run sequentially, thereby allowing the memory sub-system to make predictions about the next address required. Non-sequential addresses are held for two cycles.
  • 11. ARM Connected Community – 900+ Connect, Collaborate, Create – accelerating innovation
  • 14. Applications of ARM Processor
  • 16. ARM Processor Family
      Processor family | # of pipeline stages | Memory organization | Clock rate | MIPS/MHz
      ARM6             | 3                    | Von Neumann         | 25 MHz     |
      ARM7             | 3                    | Von Neumann         | 66 MHz     | 0.9
      ARM8             | 5                    | Von Neumann         | 72 MHz     | 1.2
      ARM9             | 5                    | Harvard             | 200 MHz    | 1.1
      ARM10            | 6                    | Harvard             | 400 MHz    | 1.25
      StrongARM        | 5                    | Harvard             | 233 MHz    | 1.15
      ARM11            | 8                    | Von Neumann/Harvard | 550 MHz    | 1.2
  • 23. System Control Coprocessor  The functions of the system control coprocessor are: • overall system control and configuration • optional cache configuration and management • optional Tightly-Coupled Memory (TCM) configuration and management • optional Memory Protection Unit (MPU) configuration and management • debug accesses to the caches • system performance monitoring.
  • 24. Memory System of ARM1176JZ  Cache Organization  TCM - Tightly-coupled memory  MMU  TLB  MPU  DMA - Direct Memory Access
  • 25. Memory Size and Speed (Figure: memory hierarchy running from registers, on-chip cache memory, 2nd-level off-chip cache and main memory down to hard disk. Access time goes from fast to slow, capacity from small to large, and cost from expensive to cheap.)
  • 26. Cache Organization  Separate Instruction and Data Caches in a Harvard arrangement, so both load and store instructions can cost one cycle.  Four-way set associative cache of configurable size.  The caches are virtually indexed and physically tagged.  Both the Instruction Cache and the Data Cache can provide two words per cycle for all requesting sources.
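To make the four-way set associative, virtually indexed and physically tagged arrangement concrete, here is a minimal sketch of how an address could be split into line offset, set index and tag. The geometry (16 KB cache, 32-byte lines) and all function names are assumptions chosen for illustration; the slides only say the size is configurable.

```c
#include <stdint.h>

/* Assumed geometry, not taken from the slides: 16 KB, 4 ways, 32-byte lines. */
enum {
    CACHE_SIZE_BYTES = 16 * 1024,
    CACHE_WAYS       = 4,
    LINE_BYTES       = 32,
    NUM_SETS         = CACHE_SIZE_BYTES / (CACHE_WAYS * LINE_BYTES) /* 128 sets */
};

/* Byte offset inside a cache line: address bits [4:0] for this geometry. */
static uint32_t cache_line_offset(uint32_t addr)
{
    return addr % LINE_BYTES;
}

/* Set index, taken from the VIRTUAL address: bits [11:5] for this geometry. */
static uint32_t cache_set_index(uint32_t virt_addr)
{
    return (virt_addr / LINE_BYTES) % NUM_SETS;
}

/* Tag, taken from the PHYSICAL address: the bits above offset and index. */
static uint32_t cache_tag(uint32_t phys_addr)
{
    return phys_addr / (LINE_BYTES * NUM_SETS);
}
```

With this assumed geometry each way covers 4 KB, so the index bits lie inside the 4 KB page offset and are the same in the virtual and physical address. A lookup would compare `cache_tag(pa)` against the four tags stored in set `cache_set_index(va)`.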
  • 27. Features of the cache system  The cache is a Harvard implementation (separate Instruction and Data Caches).  The other main operation performed by the cache is cache line Write-Back.  Cache replacement policies are Pseudo-Random or Round-Robin. Round-Robin uses a single counter for all sets that selects the way used for replacement.  Cache lines can be either Write-Back or Write-Through.
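The Round-Robin policy described above can be sketched as follows. This is only a model of the idea of one counter shared by all sets; the type and function names are hypothetical, and when exactly the hardware advances its counter is an assumption here.

```c
#include <stdint.h>

#define RR_WAYS 4u  /* four-way set associative, as stated on the cache slide */

/* One counter shared by every set (hypothetical model of the policy). */
typedef struct {
    uint32_t next_way;
} rr_policy;

/* Return the way to evict and advance the shared counter, wrapping at RR_WAYS. */
static uint32_t rr_pick_victim(rr_policy *p)
{
    uint32_t victim = p->next_way;
    p->next_way = (p->next_way + 1u) % RR_WAYS;
    return victim;
}
```

Because the counter is shared, two consecutive misses in different sets still evict different way numbers, which keeps the hardware small at the cost of a less precise replacement choice than per-set tracking.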
  • 28. TCM (Tightly-coupled memory)  Instruction TCM (ITCM)  Data TCM (DTCM)  The TCM is designed to provide low-latency memory that can be used by the processor without the unpredictability that is a feature of caches.  You can use such memory to hold critical routines, such as interrupt handling routines or real-time tasks, where the indeterminacy of a cache is highly undesirable.
  • 29. TCM Behavior  TCM forms a continuous area of memory that is always valid if the TCM is enabled. The TCM is used as part of the physical memory map of the system, and is not backed by a level of external memory with the same physical addresses. For this reason, the TCM behaves differently from the caches for regions of memory that are marked as being Write-Through Cachable. In such regions, no external writes occur in the event of a write to memory locations contained in the TCM.
  • 30. MMU  Memory Management Unit  The processor MMU works with the cache memory system to control accesses to and from external memory.  The MMU also controls the translation of virtual addresses to physical addresses.  The processor implements an ARMv6 MMU enhanced with features to provide address translation and access permission checks for all ports of the processor.  While controlling the translation of virtual addresses to physical addresses, the MMU in ARM11 makes use of TLBs (Translation Lookaside Buffers).
  • 31. TLB  Translation Lookaside Buffers  A single set of two-level page tables stored in main memory controls the contents of the instruction and data side TLBs.  Micro TLB  Main TLB
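A two-level page table walk starts by splitting the virtual address into two table indices and a page offset. The sketch below assumes 4 KB small pages in the ARMv6 page table format (first-level index in bits [31:20], second-level index in bits [19:12]); the struct and function names are made up for illustration, and larger page and section sizes are not covered.

```c
#include <stdint.h>

/* Fields of a 32-bit virtual address, assuming 4 KB small pages. */
typedef struct {
    uint32_t l1_index;    /* VA[31:20]: selects 1 of 4096 first-level entries */
    uint32_t l2_index;    /* VA[19:12]: selects 1 of 256 second-level entries */
    uint32_t page_offset; /* VA[11:0]:  byte offset inside the 4 KB page */
} va_fields;

static va_fields split_va(uint32_t va)
{
    va_fields f;
    f.l1_index    = va >> 20;
    f.l2_index    = (va >> 12) & 0xFFu;
    f.page_offset = va & 0xFFFu;
    return f;
}

/* Combine the page base found in the second-level entry with the offset. */
static uint32_t make_pa(uint32_t page_base, uint32_t va)
{
    return (page_base & 0xFFFFF000u) | (va & 0xFFFu);
}
```

The TLBs exist so that this walk, which costs two memory reads, is only needed when a translation is not already cached.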
  • 32. Micro TLB & Main TLB  Micro TLB -- The first level of caching for the page table information is a small Micro TLB of ten entries that is implemented on each of the instruction and data sides.  You can configure Micro TLB replacement to be round-robin or random. By default the round-robin replacement algorithm is used.  Main TLB -- the second layer in the TLB structure, which catches the misses from the Micro TLBs. It provides a centralized source for translation entries. Misses from the instruction and data Micro TLBs are handled by a unified main TLB.
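A ten-entry Micro TLB with round-robin replacement can be modelled roughly as below. The entry layout and function names are hypothetical; a real Micro TLB also stores permissions and memory attributes, and a miss would be forwarded to the main TLB rather than handled by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define MICRO_TLB_ENTRIES 10u  /* ten entries per side, as stated above */

typedef struct {
    bool     valid;
    uint32_t vpn;  /* virtual page number */
    uint32_t pfn;  /* physical frame number */
} tlb_entry;

typedef struct {
    tlb_entry entry[MICRO_TLB_ENTRIES];
    uint32_t  next;  /* round-robin replacement pointer */
} micro_tlb;

/* Search all entries; on a hit store the frame number in *pfn and return true. */
static bool micro_tlb_lookup(const micro_tlb *t, uint32_t vpn, uint32_t *pfn)
{
    for (uint32_t i = 0; i < MICRO_TLB_ENTRIES; i++) {
        if (t->entry[i].valid && t->entry[i].vpn == vpn) {
            *pfn = t->entry[i].pfn;
            return true;
        }
    }
    return false;  /* miss: the main TLB would be consulted next */
}

/* Install a translation, overwriting the entry chosen by round-robin. */
static void micro_tlb_fill(micro_tlb *t, uint32_t vpn, uint32_t pfn)
{
    t->entry[t->next] = (tlb_entry){ true, vpn, pfn };
    t->next = (t->next + 1u) % MICRO_TLB_ENTRIES;
}
```

After ten further fills the oldest translation is overwritten, which is the behavior round-robin replacement implies.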
  • 33. MPU  Memory Protection Unit  The MPU supports 16 memory regions. Each region is programmed with a base address and size, and regions can be overlaid to enable efficient programming of the memory map. To support overlaying, the regions are assigned priorities, with region 0 having the lowest priority and region 15 having the highest. The MPU returns access permissions and attributes for the highest priority region in which the address hits.
  • 34.  Enabling the MPU When enabled, the behavior of the MPU is as follows: • When the Load Store Unit or Pre-fetch Unit generates a memory access, the MPU compares the memory address with the programmed memory regions. • If the address does not exist in a memory region, a background fault is signalled to the requesting block, along with status information to enable the Fault Status Register to be correctly formed. • If a matching memory region is found, then the region information is used: 1. The access permission bits are used to determine if the access is permitted. If the access is not permitted the MPU signals a memory abort, otherwise the access is permitted to proceed. 2. The memory region attributes are used to determine if the access is cached, uncached or device and if it is shared. • If the address matches in multiple memory regions, then a fixed priority scheme selects the attributes for the highest numbered region.
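The lookup rules above (background fault when no region matches, highest-numbered region wins on overlap, then a permission check) can be sketched as follows. The region layout is a simplification invented for this example: a real MPU encodes size as a power of two and carries more permission and attribute bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPU_REGIONS 16

/* Simplified, hypothetical region descriptor. */
typedef struct {
    bool     enabled;
    uint32_t base;
    uint32_t size;      /* in bytes */
    bool     writable;  /* stands in for the access permission bits */
} mpu_region;

typedef enum { MPU_OK, MPU_BACKGROUND_FAULT, MPU_PERMISSION_ABORT } mpu_result;

/* Return the highest-numbered region containing addr, or -1 if none matches. */
static int mpu_match(const mpu_region r[MPU_REGIONS], uint32_t addr)
{
    for (int i = MPU_REGIONS - 1; i >= 0; i--) {
        if (r[i].enabled && addr >= r[i].base && addr - r[i].base < r[i].size) {
            return i;
        }
    }
    return -1;
}

/* Check a write access against the matching region's permissions. */
static mpu_result mpu_check_write(const mpu_region r[MPU_REGIONS], uint32_t addr)
{
    int i = mpu_match(r, addr);
    if (i < 0) {
        return MPU_BACKGROUND_FAULT;
    }
    return r[i].writable ? MPU_OK : MPU_PERMISSION_ABORT;
}
```

Scanning from region 15 downwards implements the fixed priority scheme directly: a small read-only region can be overlaid on a large writable one simply by giving it a higher number.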
  • 35. Direct memory access (DMA)  Direct memory access (DMA) is a feature of computerized systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).  Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the operation is done. This feature is useful any time the CPU cannot keep up with the rate of data transfer, or where the CPU needs to perform useful work while waiting for a relatively slow I/O data transfer.
  • 37.  Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards and sound cards. DMA is also used for intra-chip data transfer in multi-core processors. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without DMA channels. Similarly, a processing element inside a multi-core processor can transfer data to and from its local memory without occupying its processor time, allowing computation and data transfer to proceed in parallel.  DMA can also be used for "memory to memory" copying or moving of data within memory. DMA can offload expensive memory operations, such as large copies or scatter-gather operations, from the CPU to a dedicated DMA engine. An implementation example is the I/O Acceleration Technology.
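The scatter-gather idea mentioned above can be illustrated with a software model: the CPU prepares a list of descriptors, hands it to the engine, and is notified on completion. Everything here is hypothetical (descriptor layout, `dma_run`, the callback type), and the "engine" is just `memcpy` running synchronously, whereas real DMA hardware runs in parallel with the CPU and signals completion with an interrupt.

```c
#include <stddef.h>
#include <string.h>

/* One transfer: copy len bytes from src to dst (hypothetical layout). */
typedef struct {
    const void *src;
    void       *dst;
    size_t      len;
} dma_desc;

/* Stand-in for the completion interrupt handler. */
typedef void (*dma_done_fn)(void *ctx);

/* Process a descriptor list in order, then signal completion.
 * Returns the total number of bytes moved. */
static size_t dma_run(const dma_desc *list, size_t count,
                      dma_done_fn done, void *ctx)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(list[i].dst, list[i].src, list[i].len);
        total += list[i].len;
    }
    if (done != NULL) {
        done(ctx);  /* models the "transfer finished" interrupt */
    }
    return total;
}
```

Gathering two separate source buffers into one contiguous destination takes two descriptors and one call, which is the work a dedicated engine would take off the CPU.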
  • 38. The NEON unit (SIMD)  The NEON unit provides 32 vector registers that each hold 16 bytes of information.  These 16-byte registers can then be operated on in parallel in the NEON unit. For example, in one vector add instruction you can add eight 16-bit integers to eight other 16-bit integers to produce eight 16-bit results.  The NEON unit supports 8-bit, 16-bit and 32-bit integer operations, and some 64-bit operations, in addition to 32-bit floating point operations.
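The eight-lane add described above can be modelled in portable C. This is a scalar model of what a single vector add does, not NEON code: the `vec8x16` type and `vadd_i16x8` name are invented for the sketch, and on real hardware the eight additions happen in one instruction.

```c
#include <stdint.h>

/* A 16-byte vector viewed as eight 16-bit lanes (hypothetical model type). */
typedef struct {
    int16_t lane[8];
} vec8x16;

/* Lane-wise add: each result lane depends only on the same lane of a and b.
 * The sum is reduced modulo 2^16; the final conversion back to int16_t wraps
 * on common compilers for results outside the int16_t range. */
static vec8x16 vadd_i16x8(vec8x16 a, vec8x16 b)
{
    vec8x16 r;
    for (int i = 0; i < 8; i++) {
        r.lane[i] = (int16_t)(uint16_t)((uint16_t)a.lane[i] + (uint16_t)b.lane[i]);
    }
    return r;
}
```

No lane reads or writes another lane, and that independence is what lets the hardware perform all eight additions at once.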
  • 39. What is NEON?  NEON is a wide SIMD data processing architecture  Extension of the ARM instruction set (v7-A)  32 x 64-bit wide registers (can also be used as 16 x 128-bit wide registers)  NEON instructions perform “Packed SIMD” processing  Registers are considered as vectors of elements of the same data type  Data types available: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single prec. float  Instructions usually perform the same operation in all lanes (Figure: the elements of source registers Dn and Dm pass through the operation, lane by lane, into destination register Dd.)
  • 40. NEON Coprocessor registers  NEON has a 256-byte register file  Separate from the core registers (r0-r15)  Extension to the VFPv2 register file (VFPv3)  Two different views of the NEON registers  32 x 64-bit registers (D0-D31)  16 x 128-bit registers (Q0-Q15)  Enables register trade-offs  Vector length can be variable  Different registers available (Figure: registers Q0 to Q15 shown overlaid on registers D0 to D31.)
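The two views of the same 256-byte register file can be pictured with a C union. This is only a mental model with invented names; it assumes the ARMv7 mapping in which Qn occupies the same storage as D(2n) (low half) and D(2n+1) (high half).

```c
#include <stdint.h>

/* Two views of one 256-byte register file (model only, names are made up). */
typedef union {
    uint64_t d[32];     /* D0-D31: 32 x 64-bit view */
    uint64_t q[16][2];  /* Q0-Q15: q[n][0] aliases D(2n), q[n][1] aliases D(2n+1) */
} neon_regfile;

/* Index of the D register holding the low half of Qn. */
static unsigned q_low_d(unsigned n)
{
    return 2u * n;
}

/* Index of the D register holding the high half of Qn. */
static unsigned q_high_d(unsigned n)
{
    return 2u * n + 1u;
}
```

Writing D2 and D3 therefore changes Q1, which is the register trade-off the slide refers to: code can use many short vectors, fewer long ones, or a mix.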
  • 41. NEON vectorizing example  How does the compiler perform vectorization?
      1. Analyze each loop:  Are pointer accesses safe for vectorization?  What data types are being used? How do they map onto NEON vector registers?  Number of loop iterations
          void add_int(int * __restrict pa, int * __restrict pb, unsigned int n, int x)
          {
              unsigned int i;
              for (i = 0; i < (n & ~3); i++)
                  pa[i] = pb[i] + x;
          }
      2. Unroll the loop to the appropriate number of iterations, and perform other transformations like pointerization
          void add_int(int *pa, int *pb, unsigned n, int x)
          {
              unsigned int i;
              for (i = ((n & ~3) >> 2); i; i--) {
                  *(pa + 0) = *(pb + 0) + x;
                  *(pa + 1) = *(pb + 1) + x;
                  *(pa + 2) = *(pb + 2) + x;
                  *(pa + 3) = *(pb + 3) + x;
                  pa += 4;
                  pb += 4;
              }
          }
      3. Map each unrolled operation onto a NEON vector lane, and generate corresponding NEON instructions (Figure: pb + x = pa computed across the four lanes of a 128-bit vector, bits 127 to 0.)
  • 42. Performance goals Most applications require tuning to gain the best performance from vectorization. There is always some overhead, so the theoretical maximum performance cannot be reached. For example, the NEON unit can process four single-precision floats at one time. This means that the theoretical maximum performance for a floating-point application is a factor of four over the original scalar non-vectorized code. Given typical overheads, a reasonable goal for a whole floating-point application is to aim for a 50% improvement in performance over the scalar code. For large applications that are not completely vectorizable, achieving a 25% improvement in performance over the scalar code is a reasonable goal, although this is highly application dependent.
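One way to see why whole-application gains fall well short of the factor of four is Amdahl's law, which is not mentioned in the slides and is brought in here only as an illustration. The function name is made up, and the model ignores the vectorization overheads the slide refers to, so it gives an upper bound rather than a prediction.

```c
/* Amdahl's law: overall speedup when a fraction p (0..1) of the run time
 * is accelerated by a factor s and the rest is unchanged. */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, `amdahl_speedup(0.5, 4.0)` is 1.6: even with perfect four-lane vectorization of half the run time, the whole program gets only 60% faster before any overhead is counted.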
  • 44. AMBA specifications :  The Advanced Microcontroller Bus Architecture.  Three distinct buses are defined within the AMBA specification:  the Advanced High-performance Bus (AHB)  the Advanced System Bus (ASB)  the Advanced Peripheral Bus (APB).
  • 46. Advanced High-performance Bus (AHB)  The AMBA AHB is for high-performance, high clock frequency system modules.  The AHB acts as the high-performance system backbone bus. AHB supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macrocell functions. AHB is also specified to ensure ease of use in an efficient design flow using synthesis and automated test techniques. Advanced System Bus (ASB)  The AMBA ASB is for high-performance system modules.  AMBA ASB is an alternative system bus suitable for use where the high-performance features of AHB are not required. ASB also supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral macrocell functions. Advanced Peripheral Bus (APB)  The AMBA APB is for low-power peripherals.  AMBA APB is optimized for minimal power consumption and reduced interface complexity to support peripheral functions. APB can be used in conjunction with either version of the system bus.
  • 47. Objectives of the AMBA specification  The AMBA specification has been derived to satisfy four key requirements: • to facilitate the right-first-time development of embedded microcontroller products with one or more CPUs or signal processors • to be technology-independent and ensure that highly reusable peripheral and system macrocells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies • to encourage modular system design to improve processor independence, providing a development road-map for advanced cached CPU cores and the development of peripheral libraries • to minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.
  • 48. Interworking with Thumb  What does Thumb mean?  Why Thumb code? When you write code for an ARM processor that supports Thumb instructions, you probably build most of your application to run in Thumb state. This gives the best code density. With 8-bit or 16-bit wide memory, it also gives the best performance. However, you might want parts of your application to run in ARM state for reasons such as: Speed: Some parts of an application might be speed critical. These sections might be more efficient running in ARM state than in Thumb state. Some systems include a small amount of fast 32-bit memory. ARM code can be run from this without the overhead of fetching each instruction from 8-bit or 16-bit memory.
  • 49. Functionality: Thumb instructions are less flexible than their equivalent ARM instructions. Some operations are not possible in Thumb state. A state change to ARM is required to carry out the following operations: • accesses to CPSR to enable or disable interrupts, and to change mode • accesses to coprocessors • execution of Digital Signal Processor (DSP) math instructions that cannot be performed in the C language.
  • 50.  Exception handling The processor automatically enters ARM state when a processor exception occurs. This means that the first part of an exception handler must be coded with ARM instructions, even if it reenters Thumb state to carry out the main processing of the exception. At the end of such processing, the processor must be returned to ARM state to return from the handler to the main application.  Standalone Thumb programs An ARM processor that supports Thumb instructions always starts in ARM state. To run simple Thumb assembly language programs, add an ARM header that carries out a state change to Thumb state and then calls the main Thumb routine.
  • 51. Security in ARM  Global Platform based Trusted Execution Environment and TrustZone® Ready  The foundations for trusted services  ARM® TrustZone® technology based Trusted Execution Environment (TEE) as an important component of delivering secure services (including premium content) and applications. The paper then describes how a correctly implemented TrustZone® based TEE system can be “Secure by default”, and robust against software attack.
  • 52. The Four Compartment Security Model  To protect against these security threats, ARM offers a four compartment security model that provides increasing levels of security through a combination of reducing the attack surface and increasing isolation. A system designer will typically use a combination of the following to provide an appropriate level of protection of user and system assets:  Normal World - User mode/System mode - PL0/PL1 in ARMv7 or EL0/EL1 in ARMv8 Running processes or applications are isolated from each other by the operating system and the MMU. Each executing process has its own addressable memory, isolated from other processes, along with a set of capabilities and permissions that are administered by the operating system kernel, which executes with System level privilege.  This is the normal operating state for application software together with the operating system and is often referred to as the ‘Normal World’, in contrast to the ‘Trusted World’ described below.
  • 53.  Hypervisor Mode - PL2 in ARMv7 or EL2 in ARMv8 The hypervisor allows multiple instances of the same or different operating systems to execute on the same processor as virtual machines. Each virtual machine can be isolated from the others, and through use of a System MMU, such as the MMU-400, other bus masters can also be virtualized. This separation can be used to protect and secure resources and assets in one virtual machine from other virtual machines.  Trusted World - Secure state or EL3 in ARMv8 Using the TrustZone® security extensions allows the system to be physically partitioned into secure and non-secure components. This provides further isolation of assets and can be used to ensure that software operating within the normal operating system cannot directly access secure memory or secure peripherals.  SecurCore processors enable physically separate, tamper-proof ICs offering secured processing and storage protected against both physical and software attack.
  • 54. Semihosting  Semihosting enables code running on an ARM® target to use the I/O facilities on a host computer that is running RealView® Debugger. Examples of these facilities include keyboard input, screen output, and disk I/O.  Semihosting is a mechanism for ARM targets to communicate input/output requests from application code to a host computer running a debugger.
  • 56. Example ARM-based system  ARM core deeply embedded within an SoC  External debug and trace via JTAG or CoreSight interface  Design can have both external and internal memories  Varying width, speed and size – depending on system requirements  Can include ARM licensed CoreLink peripherals  Interrupt controller, since core only has two interrupt sources  Other peripherals and interfaces  Can include on-chip memory from ARM Artisan Physical IP Libraries  Elements connected using AMBA (Advanced Microcontroller Bus Architecture) (Figure: ARM-based SoC block diagram. The ARM processor core, with DEBUG, nIRQ and nFIQ signals, sits on an AMBA AXI bus together with on-chip memory, a DMA port, the clocks and reset controller, and an external memory interface to FLASH and SDRAM. An APB bridge connects to an AMBA APB bus carrying the CoreLink interrupt controller, other CoreLink peripherals and custom peripherals.)