SlideShare a Scribd company logo
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

1 Introduction
The ARM® Cortex™-A8 microprocessor is the first applications microprocessor in ARM’s new Cortex
family. With high performance and power efficiency, it targets a wide variety of mobile and consumer
applications including mobile phones, set-top boxes, gaming consoles and automotive
navigation/entertainment systems. The Cortex-A8 processor spans a range of performance points
depending on the implementation, delivering over to 2000 Dhrystone MIPS (DMIPS) of performance for
demanding consumer applications and consuming less than 300mW for low-power mobile devices. This
translates into a large increase in processing capability while staying with the power levels of previous
generations of mobile devices. Consumer applications will benefit from the reduced heat dissipation
and resulting lower packaging and integration costs.
It is the first ARM processor to incorporate all of the new technologies available in the ARMv7
architecture. New technologies seen for the first time include NEON™ for media and signal processing
and Jazelle® RCT for acceleration of real-time compilers. Other technologies recently introduced that
are now standard on the ARMv7 architecture include TrustZone® technology for security, Thumb®-2
technology for code density and the VFPv3 floating point architecture.

2 Overview of the Cortex Architecture
The unifying technology of Cortex processors is Thumb-2 technology. The Thumb-2 instruction set
combines 16- and 32-bit instructions to improve code density and performance. The original ARM
instruction set consists of fixed-length 32-bit instructions, while the Thumb instruction set employs 16bit instructions. Because not all operations mapped into the original Thumb instruction set, multiple
instructions were sometimes needed to emulate the task of one 32-bit instruction.
Thumb-2 technology adds about 130 additional instructions to Thumb. The added functionality removes
the need to switch between ARM and Thumb modes in order to service interrupts, and gives access to
the full set of processor registers. The resulting code maintains the traditional code density of Thumb
instructions while running at the performance levels of 32-bit ARM code. Entire applications can now be
written in Thumb-2 technology, removing the original architecture required for mode switching.

An entire application can be written using space-saving Thumb-2 instructions, whereas with the original
Thumb mode the processor had to switch between ARM and Thumb modes.
Making its first appearance in an ARM processor is the NEON media and signal processing technology
targeted at audio, video and 3D graphics. It is a 64/128-bit hybrid SIMD architecture. NEON technology
has its own register file and execution pipeline which are separate from the main ARM integer
pipeline. It can handle both integer and single precision floating-point values, and includes support for
unaligned data accesses and easy loading of interleaved data stored in structure form. Using NEON
technology to perform typical multimedia functions, the Cortex-A8 processor can decode MPEG-4 VGA
video (including dering, deblock filters and yuv2rgb) at 30 frames/sec at 275 MHz, and H.264 Video at
350 MHz
aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com

Also new is Jazelle RCT technology, an architecture extension that cuts the memory footprint of just-intime (JIT) bytecode applications to a third of their original size. The smaller code size results in a boost
performance and a reduction of power.
TrustZone technology is included in the Cortex-A8 to ensure data privacy and DRM protection in
consumer products like mobile phones, personal digital assistants and set-top boxes that run open
operating systems. Implemented within the processor core, TrustZone technology protects peripherals
and memory against a security attack. A secure monitor within the core serves as a gatekeeper
switching the system between secure and non-secure states. In the secure state, the processor runs
“trusted” code from a secure code block to handle security-sensitive tasks such as authentication and
signature manipulation.
Besides contributing to the processor's signal processing performance, NEON technology enables
software solutions to data processing applications. The result is a flexible platform which can
accommodate new algorithms and new applications as they emerge with simply the download of new
software or a driver.
The VFPv3 technology is an enhancement to the VFPv2 technology. New features include a doubling of
the number of double-precision registers to 32, and the introduction of instructions that perform
conversions between fixed-point and floating-point numbers.

3 Exploring Features of the Cortex-A8 Microarchitecture
The Cortex-A8 processor is the most sophisticated low-power design yet produced by ARM. To achieve
its high levels of performance, new microarchitecture features were added which are not traditionally
found in the ARM architecture, including a dual in-order issue ARM integer pipeline, an integrated L2
cache and a deep 13-stage pipe.

Fig: ARM Cortex-A8 architecture

aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com

3.1 Super scalar pipeline
Perhaps the most significant of these new features is the dual-issue, in-order, statically scheduled ARM
integer pipeline. Previous ARM processors have only a single integer execution pipeline. The ability to
issue two data processing instructions at the same time significantly increases the maximum potential
instructions executed per cycle. It was decided to stay with in-order issue to keep additional power
required to a minimum. Out-of-order issue and retire can require extensive amounts of logic consuming
extra power. The choice to go with in-order also allows for fire-and-forget instruction issue, thus
removing critical paths from the design and reducing the need for custom design in the pipeline. Static
scheduling allows for extensive clock gating for reduced power during execution.
The dual ALU (arithmetic logic unit) pipelines (ALU 0 and ALU 1) are symmetric and both can handle
most arithmetic instructions. ALU pipe 0 always carries the older of a pair of issued instructions. The
Cortex-A8 processor also has multiplier and load-store pipelines, but these do not carry additional
instructions to the two ALU pipelines. These can be thought of as “dependent” pipelines. Their use
requires simultaneous use of one of the ALU pipelines. The multiplier pipeline can only be coupled with
instructions that are in ALU 0 pipelines, whereas the load-store pipeline can be coupled with instructions
in either ALU.

3.2 Branch prediction
The 13-stage pipeline was selected to enable significantly higher operating frequencies than previous
generations of ARM microarchitectures. Note that stage F0 is not counted because it is only address
generation. To minimize the branch penalties typically associated with a deeper pipeline, the Cortex-A8
processor implements a two-level global history branch predictor. It consists of two different structures:
the Branch Target Buffer (BTB) and the Global History Buffer (GHB) which are accessed in parallel with
instruction fetches.
The BTB indicates whether or not the current fetch address will return a branch instruction and its
branch target address. It contains 512 entries. On a hit in the BTB a branch is predicted and the GHB is
accessed. The GHB consists of 4096 2-bit saturating counters that encode the strength and direction
information of branches. The GHB is indexed by 10-bit history of the direction of the last ten branches
encountered and 4 bits of the PC.
In addition to the dynamic branch predictor, a return stack is used to predict subroutine return
addresses. The return stack has eight 32-bit entries that store the link register value in r14 (register 14)
and the ARM or Thumb state of the calling function. When a return-type instruction is predicted taken,
the return stack provides the last pushed address and state.

3.3 Level-1 cache
The Cortex-A8 processor has a single-cycle load-use penalty for fast access to the Level-1 caches. The
data and instruction caches are configurable to 16k or 32k. Each is 4-way set associative and uses a
Hash Virtual Address Buffer (HVAB) way prediction scheme to improve timing and reduce power
consumption. The caches are physically addressed (virtual index, physical tag) and have hardware
support for avoiding aliased entries. Parity is supported with one parity bit per byte.

aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com

The replacement policy for the data cache is write-back with no write allocates. Also included is a store
buffer for data merging before writing to main memory.
The HVAB is a novel approach to reducing the power required for accessing the caches. It uses a
prediction scheme to determine which way of the RAM to enable before an access.

3.4

Level-2 cache

The Cortex-A8 processor includes an integrated Level-2 cache. This gives the Level-2 cache a dedicated
low latency, high bandwidth interface to the Level-l cache. This minimizes the latency of Level-1 cache
linefills and does not conflict with traffic on the main system bus. It can be configured in sizes from 64k
to 2M.
The Level-2 cache is physically addressed and 8-way set associative. It is a unified data and instruction
cache, and supports optional ECC and Parity. Write back, write through, and write-allocate policies are
followed according to page table settings. A pseudo-random allocation policy is used. The contents of
the Level-1 data cache are exclusive with the Level-2 cache, whereas the contents of the Level-1
instruction cache are a subset of the Level-2 cache. The tag and data RAMs of the Level-2 cache are
accessed serially for power savings.

3.5 Neon-media engine
The Cortex-A8 processor’s NEON media processing engine pipeline starts at the end of the main integer
pipeline. As a result, all exceptions and branch mispredictions are resolved before instructions reach
it. More importantly, there is a zero load-use penalty for data in the Level-1 cache. The ARM integer
unit generates the addresses for NEON loads and stores as they pass through the pipeline, thus allowing
data to be fetched from the Level-1 cache before it is required by a NEON data processing
operation. Deep instruction and load-data buffering between the NEON engine, the ARM integer unit
and the memory system allow the latency of Level-2 accesses to be hidden for streamed data. A store
buffer prevents NEON stores from blocking the pipeline and detects address collisions with the ARM
integer unit accesses and NEON loads.

aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com
The NEON unit is decoupled from the main ARM integer pipeline by the NEON instruction queue
(NIQ). The ARM Instruction Execute Unit can issue up to two valid instructions to the NEON unit each
clock cycle. NEON has 128-bit wide load and store paths to the Level-1 and Level-2 cache, and supports
streaming from both.
The NEON media engine has its own 10 stage pipeline that begins at the end ARM integer
pipeline. Since all mispredicts and exceptions have been resolved in the ARM integer unit, once an
instruction has been issued to the NEON media engine it must be completed as it cannot generate
exceptions. NEON has three SIMD integer pipelines, a load-store/permute pipeline, two SIMD singleprecision floating-point pipelines, and a non-pipelined Vector Floating-Point unit (VFPLite).
NEON instructions are issued and retired in-order. A data processing instruction is either a NEON
integer instruction or a NEON floating-point instruction. The Cortex-A8 NEON unit does not parallel
issue two data-processing instructions to avoid the area overhead with duplicating the data-processing
functional blocks, and to avoid timing critical paths and complexity overhead associated with the muxing
of the read and write register ports.
The NEON integer datapath consists of three pipelines: an integer multiply/accumulate pipeline (MAC),
an integer Shift pipeline, and an integer ALU pipeline. A load-store/permute pipeline is responsible for
all NEON load/stores, data transfers to/from the integer unit, and data permute operations such as
interleave and de-interleave. The NEON floating-point (NFP) datapath has two main pipelines: a
multiply pipeline and an add pipeline. The separate VFPLite unit is a non-pipelined implementation of
the ARM
VFPv3 Floating Point Specification targeted for medium performance IEEE 754 compliant floating point
support. VFPLite is used to provide backwards compatibility with existing ARM floating point code and
to provide IEEE 754 compliant single and double precision arithmetic. The “Lite” refers to area and
performance, not functionality.

aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com

4 Implementation
Because of the aggressive performance, power, and area targets (PPA) of the Cortex-A8 processor, new
implementation flows have been developed in order to meet goals without resorting to a full-custom
implementation. The resulting flows enable fine tuning of the design to the desired application. The
result is fundamentally a cell-based flow, but under it lies semi-custom techniques that have been used
where necessary to meet performance.
The Cortex-A8 processor uses a combination of synthesized, structured, and custom circuits. The design
was divided into seven functional units and then subdivided into blocks, and the appropriate
implementation technique chosen for each. Since the entire design is synthesizeable, blocks that can
easily meet their PPA goals can stick with a standard synthesis flow.
A structured flow is used for blocks which contain logic that can take advantage of controlled placement
and routing approach to meet timing or area goals. This approach is a semi-custom flow that manually
maps the block into a gate-level netlist and specifies a relative placement for all the cells in the
block. The relative placement does not specify the exact locations of the cells but how each cell is
placed with respect to the other cells in the block.
The structured approach is typically used for data blocks that have regular structure. The logic
implementation and technology mapping of the block is done manually to maintain the regular dataoriented bus structure of the block instead of generating a random gate structure through
synthesis. The logic gates of the block are placed according to the flow of data through the block. This
approach offers more control over the design than an automated synthesis approach and leads to a
more predictable timing closure. It is also possible to get better performance and area on complex,
high-performance designs than traditional techniques. The resulting netlists may be interpreted with
traditional tiling techniques using the ARM Artisan® Advantage-CE™ or compatible library.
The Artisan Advantage-CE library contains more than a thousand cells. Besides the standard cells used
typical synthesis libraries, many tactical cells are included more in line with custom implementation
techniques. These are used in an automated fashion in the structured flow. The library is specifically
designed to deal with the high-density routing requirements of high-performance processors with a
focus on both high speed operation and low static and dynamic power. Leakage reduction is achieved
through power gating MT-CMOS cells and retention flip-fops to support sleep and standby modes. ARM
has worked with tool vendors to ensure support for this critical new flow.
Finally, a few of the most critical timing and area sensitive blocks of the design are reserved for full
custom techniques. This includes memory arrays, register files and scoreboards. These blocks contain a
mix of static and dynamic logic. No self-timed circuits are used.

5 Conclusion
The Cortex-A8 processor is the fastest, most power-efficient microprocessor yet developed by
ARM. With the ability to decode VGA H.264 video in under 350MHz, it provides the media processing
power required for next generation wireless and consumer products while consuming less than 300mW
in 65nm technologies. Its new NEON technology provides a platform for flexible software-based
solutions for media processing. Thumb-2 instructions provide code density while maintaining the
performance of standard ARM code; Jazelle RCT technology does likewise for realtime
compilers. TrustZone technology provides security for sensitive data and DRM.
Many significant new microarchitecture features make their first appearance on the Cortex-A8
processor. These include a dual issue, in-order superscalar pipeline, an integrated Level-2 cache and a
aneeshr2020@gmail.com

Aneesh R
Architecture and Implementation of the ARM Cortex-A8 Microprocessor

ANEESH R

aneeshr2020@gmail.com
significantly deeper pipeline than precious ARM processors. To meet its aggressive performance targets
while maintaining ARM’s traditional small power budget, new flows have been developed which
approach the efficiency of custom techniques while keeping the flexibility of an automated flow. The
Cortex-A8 processor is a quantum jump in flexible low power, high-performance processing.

aneeshr2020@gmail.com

Aneesh R

More Related Content

What's hot

Features of modern intel microprocessors
Features of modern intel microprocessorsFeatures of modern intel microprocessors
Features of modern intel microprocessors
Krunal Siddhapathak
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
ananth
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
GautamDhargalkar1
 
ความหมายของโครงงานคอมพิวเตอร์
ความหมายของโครงงานคอมพิวเตอร์ความหมายของโครงงานคอมพิวเตอร์
ความหมายของโครงงานคอมพิวเตอร์
kanlayaann
 
ใบงานที่ 3 การตกแต่งข้อมูลและตาราง
ใบงานที่ 3 การตกแต่งข้อมูลและตารางใบงานที่ 3 การตกแต่งข้อมูลและตาราง
ใบงานที่ 3 การตกแต่งข้อมูลและตารางMeaw Sukee
 
Sobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGASobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGA
ghanshyam zambare
 
Chroma from Luma Intra Prediction for AV1
Chroma from Luma Intra Prediction for AV1Chroma from Luma Intra Prediction for AV1
Chroma from Luma Intra Prediction for AV1
Luc Trudeau
 
02ใบความรู้ที่2.1 อีเมล
02ใบความรู้ที่2.1 อีเมล02ใบความรู้ที่2.1 อีเมล
02ใบความรู้ที่2.1 อีเมลJeeraJaree Srithai
 
แบบทดสอบ คอม
แบบทดสอบ คอมแบบทดสอบ คอม
แบบทดสอบ คอมSarinya Samkaew
 
แนวข้อสอบ คอมพิวเตอร์
แนวข้อสอบ คอมพิวเตอร์แนวข้อสอบ คอมพิวเตอร์
แนวข้อสอบ คอมพิวเตอร์Jinwara Sriwichai
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
Saurabh Kumar
 
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์jansaowapa
 
ใบความรู้เรื่อง การนำเสนอผลงาน
ใบความรู้เรื่อง การนำเสนอผลงานใบความรู้เรื่อง การนำเสนอผลงาน
ใบความรู้เรื่อง การนำเสนอผลงานPraphaphun Kaewmuan
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
Vishal Singh
 
CXL chapter1 and chapter 2 presentation.pptx
CXL chapter1 and chapter 2 presentation.pptxCXL chapter1 and chapter 2 presentation.pptx
CXL chapter1 and chapter 2 presentation.pptx
kirankumarpalakurthi
 
CPU vs GPU Comparison
CPU  vs GPU ComparisonCPU  vs GPU Comparison
CPU vs GPU Comparison
jeetendra mandal
 
Performance Comparison Between x86 and ARM Assembly
Performance Comparison Between x86 and ARM AssemblyPerformance Comparison Between x86 and ARM Assembly
Performance Comparison Between x86 and ARM Assembly
Manasa K
 
Optical character recognition (ocr) ppt
Optical character recognition (ocr) pptOptical character recognition (ocr) ppt
Optical character recognition (ocr) ppt
Deijee Kalita
 

What's hot (20)

Features of modern intel microprocessors
Features of modern intel microprocessorsFeatures of modern intel microprocessors
Features of modern intel microprocessors
 
Iqa iso9001 dark style
Iqa iso9001 dark styleIqa iso9001 dark style
Iqa iso9001 dark style
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
ความหมายของโครงงานคอมพิวเตอร์
ความหมายของโครงงานคอมพิวเตอร์ความหมายของโครงงานคอมพิวเตอร์
ความหมายของโครงงานคอมพิวเตอร์
 
ใบงานที่ 3 การตกแต่งข้อมูลและตาราง
ใบงานที่ 3 การตกแต่งข้อมูลและตารางใบงานที่ 3 การตกแต่งข้อมูลและตาราง
ใบงานที่ 3 การตกแต่งข้อมูลและตาราง
 
Sobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGASobel Edge Detection Using FPGA
Sobel Edge Detection Using FPGA
 
Strain
StrainStrain
Strain
 
Chroma from Luma Intra Prediction for AV1
Chroma from Luma Intra Prediction for AV1Chroma from Luma Intra Prediction for AV1
Chroma from Luma Intra Prediction for AV1
 
02ใบความรู้ที่2.1 อีเมล
02ใบความรู้ที่2.1 อีเมล02ใบความรู้ที่2.1 อีเมล
02ใบความรู้ที่2.1 อีเมล
 
แบบทดสอบ คอม
แบบทดสอบ คอมแบบทดสอบ คอม
แบบทดสอบ คอม
 
แนวข้อสอบ คอมพิวเตอร์
แนวข้อสอบ คอมพิวเตอร์แนวข้อสอบ คอมพิวเตอร์
แนวข้อสอบ คอมพิวเตอร์
 
Graphics Processing Unit by Saurabh
Graphics Processing Unit by SaurabhGraphics Processing Unit by Saurabh
Graphics Processing Unit by Saurabh
 
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์
ใบความรุ้ 2 เรื่อง เครือข่ายคอมพิวเตอร์
 
ใบความรู้เรื่อง การนำเสนอผลงาน
ใบความรู้เรื่อง การนำเสนอผลงานใบความรู้เรื่อง การนำเสนอผลงาน
ใบความรู้เรื่อง การนำเสนอผลงาน
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
 
CXL chapter1 and chapter 2 presentation.pptx
CXL chapter1 and chapter 2 presentation.pptxCXL chapter1 and chapter 2 presentation.pptx
CXL chapter1 and chapter 2 presentation.pptx
 
CPU vs GPU Comparison
CPU  vs GPU ComparisonCPU  vs GPU Comparison
CPU vs GPU Comparison
 
Performance Comparison Between x86 and ARM Assembly
Performance Comparison Between x86 and ARM AssemblyPerformance Comparison Between x86 and ARM Assembly
Performance Comparison Between x86 and ARM Assembly
 
Optical character recognition (ocr) ppt
Optical character recognition (ocr) pptOptical character recognition (ocr) ppt
Optical character recognition (ocr) ppt
 

Viewers also liked

ARM cortex A15
ARM cortex A15ARM cortex A15
ARM cortex A15
KOMAL YAMGAR
 
ISE FundHub product guide
ISE FundHub product guideISE FundHub product guide
ISE FundHub product guide
gerrysugrue
 
Evaluation Question 1
Evaluation Question 1Evaluation Question 1
Evaluation Question 1
LydiaGreenwood
 
25 Important Content Marketing Statistics
25 Important Content Marketing Statistics25 Important Content Marketing Statistics
25 Important Content Marketing Statistics
The Content Strategist
 
Unalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory accessUnalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory access
Aneesh Raveendran
 
Singapore
Singapore Singapore
Singapore
zoolainsudan
 
Evaluation Question 1: In what ways does your media product use, develop or c...
Evaluation Question 1: In what ways does your media product use, develop or c...Evaluation Question 1: In what ways does your media product use, develop or c...
Evaluation Question 1: In what ways does your media product use, develop or c...
LydiaGreenwood
 
Transforming learning with new technologies
Transforming learning with new technologiesTransforming learning with new technologies
Transforming learning with new technologies
tonikaperry
 
Resources Valley
Resources ValleyResources Valley
Resources Valley
Resources Valley
 
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
Linaro
 
Arm cortex R(real time)processor series
Arm cortex R(real time)processor series Arm cortex R(real time)processor series
Arm cortex R(real time)processor series
Ronak047
 

Viewers also liked (11)

ARM cortex A15
ARM cortex A15ARM cortex A15
ARM cortex A15
 
ISE FundHub product guide
ISE FundHub product guideISE FundHub product guide
ISE FundHub product guide
 
Evaluation Question 1
Evaluation Question 1Evaluation Question 1
Evaluation Question 1
 
25 Important Content Marketing Statistics
25 Important Content Marketing Statistics25 Important Content Marketing Statistics
25 Important Content Marketing Statistics
 
Unalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory accessUnalligned versus natureally alligned memory access
Unalligned versus natureally alligned memory access
 
Singapore
Singapore Singapore
Singapore
 
Evaluation Question 1: In what ways does your media product use, develop or c...
Evaluation Question 1: In what ways does your media product use, develop or c...Evaluation Question 1: In what ways does your media product use, develop or c...
Evaluation Question 1: In what ways does your media product use, develop or c...
 
Transforming learning with new technologies
Transforming learning with new technologiesTransforming learning with new technologies
Transforming learning with new technologies
 
Resources Valley
Resources ValleyResources Valley
Resources Valley
 
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmul...
 
Arm cortex R(real time)processor series
Arm cortex R(real time)processor series Arm cortex R(real time)processor series
Arm cortex R(real time)processor series
 

Similar to Architecture and Implementation of the ARM Cortex-A8 Microprocessor

Ec8791 arm 9 processor
Ec8791 arm 9 processorEc8791 arm 9 processor
Ec8791 arm 9 processor
RajalakshmiSermadurai
 
Digital electronics
Digital electronicsDigital electronics
Digital electronics
Sudipta Chatterjee
 
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark ComparisonsArm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
napoleaninlondon
 
Arm
ArmArm
Module-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdfModule-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdf
Sitamarhi Institute of Technology
 
A42060105
A42060105A42060105
A42060105
IJERA Editor
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - Review
Abdelrahman Hosny
 
18CS44-MODULE1-PPT.pptx
18CS44-MODULE1-PPT.pptx18CS44-MODULE1-PPT.pptx
18CS44-MODULE1-PPT.pptx
KokilaK25
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?
Hannes Tschofenig
 
Iaetsd near field
Iaetsd near fieldIaetsd near field
Iaetsd near field
Iaetsd Iaetsd
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introduction
anand hd
 
Design of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsDesign of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applications
ROHIT89352
 
Unit ii arm7 thumb
Unit ii arm7 thumbUnit ii arm7 thumb
Unit ii arm7 thumb
Dr. Pankaj Zope
 
Unit II Arm7 Thumb Instruction
Unit II Arm7 Thumb InstructionUnit II Arm7 Thumb Instruction
Unit II Arm7 Thumb Instruction
Dr. Pankaj Zope
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Eric Van Hensbergen
 
embedded system and microcontroller
 embedded system and microcontroller embedded system and microcontroller
embedded system and microcontroller
SHILPA Sillobhargav
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
arm 7 microprocessor architecture ans pin diagram.ppt
arm 7 microprocessor architecture ans pin diagram.pptarm 7 microprocessor architecture ans pin diagram.ppt
arm 7 microprocessor architecture ans pin diagram.ppt
manikandan970975
 
Arm7 document
Arm7  documentArm7  document
Arm7 document
N Harisha
 
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral IntegrationA 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
Talal Khaliq
 

Similar to Architecture and Implementation of the ARM Cortex-A8 Microprocessor (20)

Ec8791 arm 9 processor
Ec8791 arm 9 processorEc8791 arm 9 processor
Ec8791 arm 9 processor
 
Digital electronics
Digital electronicsDigital electronics
Digital electronics
 
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark ComparisonsArm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
 
Arm
ArmArm
Arm
 
Module-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdfModule-2 Instruction Set Cpus.pdf
Module-2 Instruction Set Cpus.pdf
 
A42060105
A42060105A42060105
A42060105
 
iPhone Architecture - Review
iPhone Architecture - ReviewiPhone Architecture - Review
iPhone Architecture - Review
 
18CS44-MODULE1-PPT.pptx
18CS44-MODULE1-PPT.pptx18CS44-MODULE1-PPT.pptx
18CS44-MODULE1-PPT.pptx
 
How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?How to Select Hardware for Internet of Things Systems?
How to Select Hardware for Internet of Things Systems?
 
Iaetsd near field
Iaetsd near fieldIaetsd near field
Iaetsd near field
 
ARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introductionARM 32-bit Microcontroller Cortex-M3 introduction
ARM 32-bit Microcontroller Cortex-M3 introduction
 
Design of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applicationsDesign of a low power processor for Embedded system applications
Design of a low power processor for Embedded system applications
 
Unit ii arm7 thumb
Unit ii arm7 thumbUnit ii arm7 thumb
Unit ii arm7 thumb
 
Unit II Arm7 Thumb Instruction
Unit II Arm7 Thumb InstructionUnit II Arm7 Thumb Instruction
Unit II Arm7 Thumb Instruction
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
embedded system and microcontroller
 embedded system and microcontroller embedded system and microcontroller
embedded system and microcontroller
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
arm 7 microprocessor architecture ans pin diagram.ppt
arm 7 microprocessor architecture ans pin diagram.pptarm 7 microprocessor architecture ans pin diagram.ppt
arm 7 microprocessor architecture ans pin diagram.ppt
 
Arm7 document
Arm7  documentArm7  document
Arm7 document
 
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral IntegrationA 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
A 32-Bit Parameterized Leon-3 Processor with Custom Peripheral Integration
 

More from Aneesh Raveendran

Single_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_RaveendranSingle_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_Raveendran
Aneesh Raveendran
 
Universal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP coreUniversal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP core
Aneesh Raveendran
 
Branch prediction
Branch predictionBranch prediction
Branch prediction
Aneesh Raveendran
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
Aneesh Raveendran
 
Reversible Logic Gate
Reversible Logic GateReversible Logic Gate
Reversible Logic Gate
Aneesh Raveendran
 
Pipelineing idealisam
Pipelineing idealisamPipelineing idealisam
Pipelineing idealisam
Aneesh Raveendran
 
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGADesign and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Aneesh Raveendran
 
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDLDesign of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Aneesh Raveendran
 

More from Aneesh Raveendran (8)

Single_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_RaveendranSingle_Electron_Transistor_Aneesh_Raveendran
Single_Electron_Transistor_Aneesh_Raveendran
 
Universal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP coreUniversal Asynchronous Receive and transmit IP core
Universal Asynchronous Receive and transmit IP core
 
Branch prediction
Branch predictionBranch prediction
Branch prediction
 
Performance Enhancement with Pipelining
Performance Enhancement with PipeliningPerformance Enhancement with Pipelining
Performance Enhancement with Pipelining
 
Reversible Logic Gate
Reversible Logic GateReversible Logic Gate
Reversible Logic Gate
 
Pipelineing idealisam
Pipelineing idealisamPipelineing idealisam
Pipelineing idealisam
 
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGADesign and Implementation of Bluetooth MAC core with RFCOMM on FPGA
Design and Implementation of Bluetooth MAC core with RFCOMM on FPGA
 
Design of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDLDesign of FPGA based 8-bit RISC Controller IP core using VHDL
Design of FPGA based 8-bit RISC Controller IP core using VHDL
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 

Architecture and Implementation of the ARM Cortex-A8 Microprocessor

  • 1. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com Architecture and Implementation of the ARM Cortex-A8 Microprocessor 1 Introduction The ARM® Cortex™-A8 microprocessor is the first applications microprocessor in ARM’s new Cortex family. With high performance and power efficiency, it targets a wide variety of mobile and consumer applications including mobile phones, set-top boxes, gaming consoles and automotive navigation/entertainment systems. The Cortex-A8 processor spans a range of performance points depending on the implementation, delivering over to 2000 Dhrystone MIPS (DMIPS) of performance for demanding consumer applications and consuming less than 300mW for low-power mobile devices. This translates into a large increase in processing capability while staying with the power levels of previous generations of mobile devices. Consumer applications will benefit from the reduced heat dissipation and resulting lower packaging and integration costs. It is the first ARM processor to incorporate all of the new technologies available in the ARMv7 architecture. New technologies seen for the first time include NEON™ for media and signal processing and Jazelle® RCT for acceleration of real-time compilers. Other technologies recently introduced that are now standard on the ARMv7 architecture include TrustZone® technology for security, Thumb®-2 technology for code density and the VFPv3 floating point architecture. 2 Overview of the Cortex Architecture The unifying technology of Cortex processors is Thumb-2 technology. The Thumb-2 instruction set combines 16- and 32-bit instructions to improve code density and performance. The original ARM instruction set consists of fixed-length 32-bit instructions, while the Thumb instruction set employs 16bit instructions. Because not all operations mapped into the original Thumb instruction set, multiple instructions were sometimes needed to emulate the task of one 32-bit instruction. Thumb-2 technology adds about 130 additional instructions to Thumb. The added functionality removes the need to switch between ARM and Thumb modes in order to service interrupts, and gives access to the full set of processor registers. The resulting code maintains the traditional code density of Thumb instructions while running at the performance levels of 32-bit ARM code. Entire applications can now be written in Thumb-2 technology, removing the original architecture required for mode switching. An entire application can be written using space-saving Thumb-2 instructions, whereas with the original Thumb mode the processor had to switch between ARM and Thumb modes. Making its first appearance in an ARM processor is the NEON media and signal processing technology targeted at audio, video and 3D graphics. It is a 64/128-bit hybrid SIMD architecture. NEON technology has its own register file and execution pipeline which are separate from the main ARM integer pipeline. It can handle both integer and single precision floating-point values, and includes support for unaligned data accesses and easy loading of interleaved data stored in structure form. Using NEON technology to perform typical multimedia functions, the Cortex-A8 processor can decode MPEG-4 VGA video (including dering, deblock filters and yuv2rgb) at 30 frames/sec at 275 MHz, and H.264 Video at 350 MHz aneeshr2020@gmail.com Aneesh R
  • 2. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com Also new is Jazelle RCT technology, an architecture extension that cuts the memory footprint of just-intime (JIT) bytecode applications to a third of their original size. The smaller code size results in a boost performance and a reduction of power. TrustZone technology is included in the Cortex-A8 to ensure data privacy and DRM protection in consumer products like mobile phones, personal digital assistants and set-top boxes that run open operating systems. Implemented within the processor core, TrustZone technology protects peripherals and memory against a security attack. A secure monitor within the core serves as a gatekeeper switching the system between secure and non-secure states. In the secure state, the processor runs “trusted” code from a secure code block to handle security-sensitive tasks such as authentication and signature manipulation. Besides contributing to the processor's signal processing performance, NEON technology enables software solutions to data processing applications. The result is a flexible platform which can accommodate new algorithms and new applications as they emerge with simply the download of new software or a driver. The VFPv3 technology is an enhancement to the VFPv2 technology. New features include a doubling of the number of double-precision registers to 32, and the introduction of instructions that perform conversions between fixed-point and floating-point numbers. 3 Exploring Features of the Cortex-A8 Microarchitecture The Cortex-A8 processor is the most sophisticated low-power design yet produced by ARM. To achieve its high levels of performance, new microarchitecture features were added which are not traditionally found in the ARM architecture, including a dual in-order issue ARM integer pipeline, an integrated L2 cache and a deep 13-stage pipe. Fig: ARM Cortex-A8 architecture aneeshr2020@gmail.com Aneesh R
  • 3. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com 3.1 Super scalar pipeline Perhaps the most significant of these new features is the dual-issue, in-order, statically scheduled ARM integer pipeline. Previous ARM processors have only a single integer execution pipeline. The ability to issue two data processing instructions at the same time significantly increases the maximum potential instructions executed per cycle. It was decided to stay with in-order issue to keep additional power required to a minimum. Out-of-order issue and retire can require extensive amounts of logic consuming extra power. The choice to go with in-order also allows for fire-and-forget instruction issue, thus removing critical paths from the design and reducing the need for custom design in the pipeline. Static scheduling allows for extensive clock gating for reduced power during execution. The dual ALU (arithmetic logic unit) pipelines (ALU 0 and ALU 1) are symmetric and both can handle most arithmetic instructions. ALU pipe 0 always carries the older of a pair of issued instructions. The Cortex-A8 processor also has multiplier and load-store pipelines, but these do not carry additional instructions to the two ALU pipelines. These can be thought of as “dependent” pipelines. Their use requires simultaneous use of one of the ALU pipelines. The multiplier pipeline can only be coupled with instructions that are in ALU 0 pipelines, whereas the load-store pipeline can be coupled with instructions in either ALU. 3.2 Branch prediction The 13-stage pipeline was selected to enable significantly higher operating frequencies than previous generations of ARM microarchitectures. Note that stage F0 is not counted because it is only address generation. To minimize the branch penalties typically associated with a deeper pipeline, the Cortex-A8 processor implements a two-level global history branch predictor. It consists of two different structures: the Branch Target Buffer (BTB) and the Global History Buffer (GHB) which are accessed in parallel with instruction fetches. The BTB indicates whether or not the current fetch address will return a branch instruction and its branch target address. It contains 512 entries. On a hit in the BTB a branch is predicted and the GHB is accessed. The GHB consists of 4096 2-bit saturating counters that encode the strength and direction information of branches. The GHB is indexed by 10-bit history of the direction of the last ten branches encountered and 4 bits of the PC. In addition to the dynamic branch predictor, a return stack is used to predict subroutine return addresses. The return stack has eight 32-bit entries that store the link register value in r14 (register 14) and the ARM or Thumb state of the calling function. When a return-type instruction is predicted taken, the return stack provides the last pushed address and state. 3.3 Level-1 cache The Cortex-A8 processor has a single-cycle load-use penalty for fast access to the Level-1 caches. The data and instruction caches are configurable to 16k or 32k. Each is 4-way set associative and uses a Hash Virtual Address Buffer (HVAB) way prediction scheme to improve timing and reduce power consumption. The caches are physically addressed (virtual index, physical tag) and have hardware support for avoiding aliased entries. Parity is supported with one parity bit per byte. aneeshr2020@gmail.com Aneesh R
  • 4. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com The replacement policy for the data cache is write-back with no write allocates. Also included is a store buffer for data merging before writing to main memory. The HVAB is a novel approach to reducing the power required for accessing the caches. It uses a prediction scheme to determine which way of the RAM to enable before an access. 3.4 Level-2 cache The Cortex-A8 processor includes an integrated Level-2 cache. This gives the Level-2 cache a dedicated low latency, high bandwidth interface to the Level-l cache. This minimizes the latency of Level-1 cache linefills and does not conflict with traffic on the main system bus. It can be configured in sizes from 64k to 2M. The Level-2 cache is physically addressed and 8-way set associative. It is a unified data and instruction cache, and supports optional ECC and Parity. Write back, write through, and write-allocate policies are followed according to page table settings. A pseudo-random allocation policy is used. The contents of the Level-1 data cache are exclusive with the Level-2 cache, whereas the contents of the Level-1 instruction cache are a subset of the Level-2 cache. The tag and data RAMs of the Level-2 cache are accessed serially for power savings. 3.5 Neon-media engine The Cortex-A8 processor’s NEON media processing engine pipeline starts at the end of the main integer pipeline. As a result, all exceptions and branch mispredictions are resolved before instructions reach it. More importantly, there is a zero load-use penalty for data in the Level-1 cache. The ARM integer unit generates the addresses for NEON loads and stores as they pass through the pipeline, thus allowing data to be fetched from the Level-1 cache before it is required by a NEON data processing operation. Deep instruction and load-data buffering between the NEON engine, the ARM integer unit and the memory system allow the latency of Level-2 accesses to be hidden for streamed data. A store buffer prevents NEON stores from blocking the pipeline and detects address collisions with the ARM integer unit accesses and NEON loads. aneeshr2020@gmail.com Aneesh R
  • 5. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com The NEON unit is decoupled from the main ARM integer pipeline by the NEON instruction queue (NIQ). The ARM Instruction Execute Unit can issue up to two valid instructions to the NEON unit each clock cycle. NEON has 128-bit wide load and store paths to the Level-1 and Level-2 cache, and supports streaming from both. The NEON media engine has its own 10 stage pipeline that begins at the end ARM integer pipeline. Since all mispredicts and exceptions have been resolved in the ARM integer unit, once an instruction has been issued to the NEON media engine it must be completed as it cannot generate exceptions. NEON has three SIMD integer pipelines, a load-store/permute pipeline, two SIMD singleprecision floating-point pipelines, and a non-pipelined Vector Floating-Point unit (VFPLite). NEON instructions are issued and retired in-order. A data processing instruction is either a NEON integer instruction or a NEON floating-point instruction. The Cortex-A8 NEON unit does not parallel issue two data-processing instructions to avoid the area overhead with duplicating the data-processing functional blocks, and to avoid timing critical paths and complexity overhead associated with the muxing of the read and write register ports. The NEON integer datapath consists of three pipelines: an integer multiply/accumulate pipeline (MAC), an integer Shift pipeline, and an integer ALU pipeline. A load-store/permute pipeline is responsible for all NEON load/stores, data transfers to/from the integer unit, and data permute operations such as interleave and de-interleave. The NEON floating-point (NFP) datapath has two main pipelines: a multiply pipeline and an add pipeline. The separate VFPLite unit is a non-pipelined implementation of the ARM VFPv3 Floating Point Specification targeted for medium performance IEEE 754 compliant floating point support. VFPLite is used to provide backwards compatibility with existing ARM floating point code and to provide IEEE 754 compliant single and double precision arithmetic. The “Lite” refers to area and performance, not functionality. aneeshr2020@gmail.com Aneesh R
  • 6. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com 4 Implementation Because of the aggressive performance, power, and area targets (PPA) of the Cortex-A8 processor, new implementation flows have been developed in order to meet goals without resorting to a full-custom implementation. The resulting flows enable fine tuning of the design to the desired application. The result is fundamentally a cell-based flow, but under it lies semi-custom techniques that have been used where necessary to meet performance. The Cortex-A8 processor uses a combination of synthesized, structured, and custom circuits. The design was divided into seven functional units and then subdivided into blocks, and the appropriate implementation technique chosen for each. Since the entire design is synthesizeable, blocks that can easily meet their PPA goals can stick with a standard synthesis flow. A structured flow is used for blocks which contain logic that can take advantage of controlled placement and routing approach to meet timing or area goals. This approach is a semi-custom flow that manually maps the block into a gate-level netlist and specifies a relative placement for all the cells in the block. The relative placement does not specify the exact locations of the cells but how each cell is placed with respect to the other cells in the block. The structured approach is typically used for data blocks that have regular structure. The logic implementation and technology mapping of the block is done manually to maintain the regular dataoriented bus structure of the block instead of generating a random gate structure through synthesis. The logic gates of the block are placed according to the flow of data through the block. This approach offers more control over the design than an automated synthesis approach and leads to a more predictable timing closure. It is also possible to get better performance and area on complex, high-performance designs than traditional techniques. The resulting netlists may be interpreted with traditional tiling techniques using the ARM Artisan® Advantage-CE™ or compatible library. The Artisan Advantage-CE library contains more than a thousand cells. Besides the standard cells used typical synthesis libraries, many tactical cells are included more in line with custom implementation techniques. These are used in an automated fashion in the structured flow. The library is specifically designed to deal with the high-density routing requirements of high-performance processors with a focus on both high speed operation and low static and dynamic power. Leakage reduction is achieved through power gating MT-CMOS cells and retention flip-fops to support sleep and standby modes. ARM has worked with tool vendors to ensure support for this critical new flow. Finally, a few of the most critical timing and area sensitive blocks of the design are reserved for full custom techniques. This includes memory arrays, register files and scoreboards. These blocks contain a mix of static and dynamic logic. No self-timed circuits are used. 5 Conclusion The Cortex-A8 processor is the fastest, most power-efficient microprocessor yet developed by ARM. With the ability to decode VGA H.264 video in under 350MHz, it provides the media processing power required for next generation wireless and consumer products while consuming less than 300mW in 65nm technologies. Its new NEON technology provides a platform for flexible software-based solutions for media processing. Thumb-2 instructions provide code density while maintaining the performance of standard ARM code; Jazelle RCT technology does likewise for realtime compilers. TrustZone technology provides security for sensitive data and DRM. Many significant new microarchitecture features make their first appearance on the Cortex-A8 processor. These include a dual issue, in-order superscalar pipeline, an integrated Level-2 cache and a aneeshr2020@gmail.com Aneesh R
  • 7. Architecture and Implementation of the ARM Cortex-A8 Microprocessor ANEESH R aneeshr2020@gmail.com significantly deeper pipeline than precious ARM processors. To meet its aggressive performance targets while maintaining ARM’s traditional small power budget, new flows have been developed which approach the efficiency of custom techniques while keeping the flexibility of an automated flow. The Cortex-A8 processor is a quantum jump in flexible low power, high-performance processing. aneeshr2020@gmail.com Aneesh R