This document compares the ARM Cortex-A8 and Intel Atom processor architectures through a review of their specifications and benchmark performance tests. Specifically, it analyzes the architectural details of the Cortex-A8 and Atom N330, and benchmarks a Texas Instruments OMAP3530-based Cortex-A8 system (BeagleBoard) against an Intel Atom N330 desktop PC. The benchmarks show the Atom has higher raw performance, while the Cortex-A8 has significantly greater power efficiency. Overall, the document provides a concise overview and performance comparison of the two architectures for mobile and low-power applications.
Smartphones architecture is generally different from
common desktop architectures. It is limited by power, size and
cost of manufacturing with the goal to provide the best
experience for users in a minimum cost. Stemming from this
fact, modern micro-processors are designed with an
architecture that has three main components: an application
processor that executes the end user’s applications, a modem
responding to baseband radio activities, and peripheral devices
for interacting with the end user.
Parallelism
Multicores:
The Cortex A7 MPCore processor implements the ARMv7-A
architecture. The Cortex A7 MPCore processor has one to
four processors in a single multi-processor device. The
following figure shows an example configuration with four
processors [3].
In this paper, we are discussing the architecture of the
application processor of Apple iPhone. Specifically, Apple
iPhone uses ARM Cortex generation of processors as their
core. The following sections discusses this architecture in terms
of Instruction Set Architecture, Memory Hierarchy and
Parallelism.
Smartphones architecture is generally different from
common desktop architectures. It is limited by power, size and
cost of manufacturing with the goal to provide the best
experience for users in a minimum cost. Stemming from this
fact, modern micro-processors are designed with an
architecture that has three main components: an application
processor that executes the end user’s applications, a modem
responding to baseband radio activities, and peripheral devices
for interacting with the end user.
Parallelism
Multicores:
The Cortex A7 MPCore processor implements the ARMv7-A
architecture. The Cortex A7 MPCore processor has one to
four processors in a single multi-processor device. The
following figure shows an example configuration with four
processors [3].
In this paper, we are discussing the architecture of the
application processor of Apple iPhone. Specifically, Apple
iPhone uses ARM Cortex generation of processors as their
core. The following sections discusses this architecture in terms
of Instruction Set Architecture, Memory Hierarchy and
Parallelism.
ARM 32-bit Microcontroller Cortex-M3 introductionanand hd
What is the ARM Cortex-M3 processor?
Architecture Versions,Processor naming, Instruction Set Development, The Thumb-2 Technology and Instruction Set Architecture, Cortex-M3 Processor Applications
Sybsc cs sem 3 physical computing and iot programming unit 1WE-IT TUTORIALS
SoC and Raspberry Pi
System on Chip: What is System on chip? Structure of System on Chip.
SoC products: FPGA, GPU, APU, Compute Units.
ARM 8 Architecture: SoC on ARM 8. ARM 8 Architecture Introduction
Introduction to Raspberry Pi: Introduction to Raspberry Pi, Raspberry Pi Hardware, Preparing your raspberry Pi.
Raspberry Pi Boot: Learn how this small SoC boots without BIOS. Configuring boot sequences and hardware.
Semiconductor Memory Fundamentals
Memory Types
Memory Structure and its requirements
Memory Decoding
Examples
Input - Output Interfacing
Types of Parallel Data Transfer or I/O Techniques
ARM 32-bit Microcontroller Cortex-M3 introductionanand hd
What is the ARM Cortex-M3 processor?
Architecture Versions,Processor naming, Instruction Set Development, The Thumb-2 Technology and Instruction Set Architecture, Cortex-M3 Processor Applications
Sybsc cs sem 3 physical computing and iot programming unit 1WE-IT TUTORIALS
SoC and Raspberry Pi
System on Chip: What is System on chip? Structure of System on Chip.
SoC products: FPGA, GPU, APU, Compute Units.
ARM 8 Architecture: SoC on ARM 8. ARM 8 Architecture Introduction
Introduction to Raspberry Pi: Introduction to Raspberry Pi, Raspberry Pi Hardware, Preparing your raspberry Pi.
Raspberry Pi Boot: Learn how this small SoC boots without BIOS. Configuring boot sequences and hardware.
Semiconductor Memory Fundamentals
Memory Types
Memory Structure and its requirements
Memory Decoding
Examples
Input - Output Interfacing
Types of Parallel Data Transfer or I/O Techniques
Intel Microprocessors- a Top down ApproachEditor IJCATR
IBM is the world's largest manufacturer of computer chips. Although it has been challenged in recent years by
newcomers AMD and Cyrix, Intel still Predominate the market for PC microprocessors. Nearly all PCs are based on Intel's x86
architecture. IBM (International Business Machines)IBM (International Business Machines) is by far the world's largest information
technology company in terms of Gross ($88 billion in 2000) and by most other measures, a position it has held for about the past
50 years. IBM products include hardware and software for a line of business servers, storage products, custom-designed microchips,
and application software. Increasingly, IBM derives revenue from a range of consulting and outsourcing services. In this paper we
will compare different technologies of computer system, its processor and chips
How to Select Hardware for Internet of Things Systems?Hannes Tschofenig
With the increasing commercial interest in Internet of Things (IoT) the question about a reasonable hardware configuration surfaces again and again.
Peter Aldworth, a hardware engineer with more than 19 years of experience, discusses this topic in a presentation given to the IETF community.
This PPT is about the ARM processors, family of processors,significance,applications and architectural features and Instruction Set Architecture useful for beginners
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Enhancing Performance with Globus and the Science DMZGlobus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Accelerate your Kubernetes clusters with Varnish Caching
Arm Cortex A8 Vs Intel Atom:Architectural And Benchmark Comparisons
1. ARM Cortex-A8 vs. Intel Atom:
Architectural and Benchmark Comparisons
University of Texas at Dallas
EE6304 Computer Architecture Course Project – Fall 2009
Katie Roberts-Hoffman, Pawankumar Hegde
Abstract—Mobile Internet Devices (MIDs) are increasingly gaming systems, e-books, point-of-sale systems,
being found in the consumer electronic markets that require and even occasionally netbooks. As ARM‘s
low power and high performance processors. ARM and Intel
have become major competitors in this domain. In order to processor performance increases to Desktop PC
meet the architectural requirements of MIDs, both companies levels, they are starting to compete for traditional
have developed the following processor architectures to address x86 sockets alongside the Intel Atom, which was
market needs: the Cortex-A series from ARM and the Atom-N
series by Intel. In this paper, the Cortex-A8 and Atom N330 are originally designed for netbooks. As Intel
compared with respect to their architectures and their real- continues to decrease the Atom‘s power
world performance in specific implementations. For requirements, it may become quickly adopted as
performance comparison, a Texas Instruments OMAP3530-
based Cortex-A8 was selected along with an Intel Atom N330- the defacto embedded processor in handheld
based Desktop PC. Four benchmarks are used to compare the devices as well. Both of these companies have
performance with respect to clock rates, power, price, and die modified their respective processor architectural
size. Results show that the Atom has better raw performance
while the Cortex-A8 has significantly greater power efficiency. designs to meet the requirements for Mobile
Internet Devices (MIDs). In this paper, a
I. INTRODUCTION comparison of the ARM Cortex-A8 and the Intel
Over the past two years, two key devices have Atom N330 processor architectures is presented
drastically changed the consumer electronics with a focus on performance and power. In
marketplace. The first is Apple‘s iPhone, which addition, specific implementations of both
has single-handedly redefined the concept of ―app architectures are used for a real-world
stores,‖ increased mobile web browsing performance comparison including an
expectations, and created a demand for all-in-one OMAP3530-based Cortex-A8 and an Intel Atom
devices containing GPS, Bluetooth, media players, N330-based Desktop PC. Performance results
e-mail, instant messenger clients, and more. This using four key processor benchmarks are
list of applications is actually the same as those presented which can be used as an aid in showing
found on the second market-changing device: the how the architectures ultimately compare in
netbook. Netbooks are synonymous with low cost, regards to clock rates, power, price, and die size.
low power, and small size. Thus it seems that the
II. ARCHITECTURE OVERVIEW
iPhone (and similar type smartphones, such as the
Motorola Droid) and netbooks share a number of A. Cortex-A8
traits and purposes although they are based on two ARM processors range in performance, power and
very different microprocessor architectures: ARM price from ARM7 processors at 15MHz to the
Cortex-A8 series and the Intel Atom. Cortex-A8 running at more than 1 GHz, and the
ARM processors now dominate the mobile recently announced symmetric multi-core Cortex-
phone space due to their lower power A9. For practical purposes, even the fastest ARM9
requirements—with 98% of mobile phones having and ARM11 processors at clock speeds of
an ARM processor. Last year, ARM licensed over 500MHz cannot compete with modern x86
1.5 billion processors to companies such as Texas architectures used in typical Desktop PCs. Thus,
Instruments, Qualcomm, and Freescale [1]. ARM this paper only focuses on the Cortex-A8
Cortex-A8 processors can now also be found architecture. Limited comments on the Cortex-A9
outside of the handset market in set-top-boxes, are also provided for perspective, but as the device
is not widely available, benchmarking results are
2. not provided. caches with the main pipeline. NEON support has
The Cortex-A8 is the highest performance and been added to compilers so that programmers need
most power efficient ARM processor currently in not hand code NEON assembly or use intrinsic to
production today. Like other ARM processors, it take advantage of the extra pipeline, although
is a 32-bit RISC architecture with 16 registers and specific compiler flags are needed to invoke this
a Harvard memory architecture. It can run at feature. ARM has stated that 4.5x performance
speeds between 600MHz to over 1 GHz while increase is expected for MPEG4 decoding using
averaging as little as 300mW of power. The NEON versus older ARMv5 technology [3].
Cortex-A8 uses the ARMv7 architecture, which is B. Intel’s Atom
ARM‘s first superscalar architecture. The ARMv7
architecture includes two pipelines: a 13-stage Intel has released two variants of the Atom for
integer pipeline and a 10-stage NEON pipeline the mobile processor market: the Atom-N and
(useful for accelerating multimedia and signal Atom-Z series. While the Atom-Z was designed to
processing applications). In addition, the ARMv7 cater to the low-cost desktop and notebook
architecture includes the Jazelle-RCT technology market, the Atom-N is targeted at MIDs.
that allows for just-in-time compilation of Compared to the Intel Celeron architecture, the
bytecode languages (such as Java), and Thumb-2 performance has been reduced by nearly half. All
technology that reduces code size while of the processors in the Atom series are clocked
maintaining equivalent performance by allowing between 0.8-2.0GHz and built with 45nm
16-bit instructions to coexist with 32-bit technology. The Atom N330, used for
instructions [2]. benchmarking in this paper, is a dual core
Since the Cortex-A8 was designed for the processor whereas other Atom processors in the
mobile/embedded space, the previous ARM family that is single-core. The Atom N330 runs at
architectures needed to be modified to provide the 1.6 GHz with a FSB running at 533MHz and an
best performance at the lowest power. To achieve average consumption of 8W [5].
this, the ARMv7 architecture uses a super-scalar The Atom family is an x86-based architecture
architecture with in-order instruction issue and with dual in-order instructions. The Atom N330 is
support for forwarding paths. An in-order the only 64-bit architecture in the family apart
instruction is less complex than an out-of-order from N230, with the remainder using 32-bit. All
one and thus yields lower power consumption. instructions in the CPU are translated into micro-
With multiple ALU pipelines, the Cortex-A8 has operations (μ-ops) that contain a memory load and
an average Instructions Per Cycle (IPC) of 0.9. a store operation for each ALU operation. This
Branch penalties are kept at a minimum by using kind of design is similar to that of traditional x86
dynamic branch predictors which were designed architectures but makes use of even more
with a 95% level of prediction accuracy and by powerful μ-ops than those used in previous
resolving most branches in a single stage [3]. designs. Such architecture enables significant
One of the most interesting performance- performance per cycle for each core without any
increasing features of ARMv7 architecture is its instruction re-ordering, speculative execution or
NEON technology that can be used as a SIMD register re-naming. However, this μ-op
accelerator. This allows the Cortex-A8 to perform architecture increases the branch miss-prediction
four multiply-accumulates instructions per cycle penalty. Intel's Atom has a 16-stage pipeline,
via dual-issue instructions to two pipelines [4]. including three instruction phase stages: decode (3
NEON is a hybrid 64/128-bit architecture that is stages), instruction dispatch (2 stages), and data
capable of both integer and floating-point cache access (3 stages). Each Atom core is also
operations. Geared towards accelerating equipped with two ALUs and two FPUs. The first
multimedia and graphics, it provides native ALU manages shift operations, and the second
support for complex numbers, and coordinates to controls jumps. All arithmetic operations,
avoid ―data shuffling‖ overhead. The NEON including integer operations, are automatically
pipeline has its own register file, but shares its sent to the FPUs. The first FPU is simple and
3. limited to addition, while the second manages Instrument‘s OMAP3530 processor, is compared
SIMD and multiply/divide operations. While basic to an Intel Atom N330 Desktop computer.
instructions are executed in one cycle, it can take The OMAP3530 contains a Cortex-A8 as the
up to 31 cycles (for floating point division) for GPP, a SGX510 GPU and C64x+ DSP for
complex instructions [6]. multimedia. The Beagle Board‘s Cortex-A8 is
To increase performance, Atom also supports clocked at 600MHz with 256MB DRAM and 256
Hyper-Threading. Hyper-Threading is Intel's NAND. The OMAP3530 uses a number of power
proprietary technology that implements management techniques to reduce power,
simultaneous multi-threading. For each processor including dynamic voltage and frequency scaling,
core that is physically present, the operating dynamic power switching, and exceptionally low
system can create two virtual processors and stand-by power. The OMAP3530 supports
shares the workload between them. Hence a dual- multiple modes of operation with varying power
core Atom, such as the N330, appears to have four requirements, including: device-off mode
cores to the operating system [7]. Finally, in order (consumes 0.59mW), standby mode (7mW), audio
to support the multimedia operations, the Atom is and video decode mode (with ARM at 500MHz
commonly paired with a 945G chipset that further and DSP at 360MHz consumes 540mW), and full-
increases both its performance and power on (with ARM at 600MHz, DSP at 430MHz, and
consumption. SGX510 at 110MHz consumes 1.5W)[9]. The 1k-
unit price for the OMAP3530 is $32[10].
The Intel Atom N330 Desktop used in this
experiment has two Atom cores that are each
clocked at 1.6 GHz. The N330 dual-core processor
actually looks as if Intel simply combined two
Atom N230 single-core processors. The added
core unfortunately doubles the processor‘s TDP
from N230‘s 4W to 8W. The N330 is available as
part of Intel's D945GCLF2 desktop motherboard,
consisting of an Intel 945GC Express Chipset, an
Intel GMA 950 for integrated graphics, and a 533
MHz system bus. The Intel D945GCLF2
motherboard also has an S-Video connector,
gigabit Ethernet, 6-channel high definition audio,
Figure 1. Intel Atom motherboard includes a fan for heat dissipation [8] and a single DIMM socket that can support up to 2
GB of DDR2 667/533 memory. The 1k-unit price
Intel has put significant efforts in reducing the power
for the Atom 330 is $43[5].
consumption for the Atom processor family. It consists of
six low-power modes for the bus and the cache. The Atom For the following set of experiments, each
N330‘s Thermal Density Power (TDP) rate is measured at processor is running Linux 2.6.28, using the
2.5W. The Atom N330 is expected to consume an average Angstrom distribution for the Beagle Board and
8W of power when paired with the 945GSE chipset, and has Ubuntu 9.04 for the Intel Desktop. Linux was
a specified maximum TDP of 11.8W[5].
chosen due to its widespread availability and since
it is the predominant operating system for high
C. Comparison Overview performance consumer electronics, whether that is
While there are a number of whitepapers and mobile phones, set-top boxes or GPS systems. In
architecture guides for the Cortex-A8 and Atom, addition, there are a number of similar tools
these can‘t provide a direct comparison of available in the open source community. The
performance between the two architectures. For GNU C Compiler (GCC) is available for both the
this, the actual implementations of both ARM Cortex-A8 platform and for x86-based
architectures are needed. Due to cost and processors.
availability, the Beagle Board, based on Texas
4. Here it is assumed the reader is familiar with the computing and audio processing. The following
standard process of compiling code and booting a benchmarks were selected with summaries
Linux kernel for an x86 target, but a brief detour provided for each below:
for the ARM processor is given. The Cortex-A8
itself can compile code (known as native 1. Fhourstone: An integer benchmark that
compilation), however it is frequently faster to effectively solves the ―Connect-4‖ game. The
cross-compile code on an x86 target (assuming benchmark calculates the number of positions
that a modern x86 processor can run at searched per second.
significantly greater speeds than an ARM 2. Dhrystone: A traditional synthetic integer
processor). The Code Sourcery 2007q3 cross- benchmark designed to test a spread of commonly
compiler was chosen for the Beagle Board. The used functions. It outputs the number of Dhrystone
cross-compiler compiles the specified source on loop iterations run per second, also known as
an x86 system. Following compilation, the Dhrystone MIPS (DMIPS).
compiler out file is copied to an SD card that 3. Whetstone: A common synthetic floating-
contains the Beagle Board‘s file system. The same point benchmark used to measure typical scientific
SD card also contains the Linux uImage and computing performance. It outputs the number of
bootloaders needed for booting the device. This Whetstone Instructions per Second (WIPS).
SD card is inserted into the Beagle Board and the 4. Linpack: A floating-point benchmark used to
operator can interact with the Beagle Board via measure linear algebra performance by calculating
RS232 serial port or through the DVI connector the time spent solving N by N linear equations. It
with USB mouse and keyboard. For further outputs the number of floating-point operations
instructions on how to cross-compile and boot the per second performed (FLOPS).
Linux kernel on the Beagle Board, see
http://www.beagleboard.org.
III. BENCHMARK RESULTS
Table I displays the results obtained from the four
benchmarks run. Clearly each benchmark can only
be compared to itself, as the resulting values are
meaningless outside of that benchmark‘s context.
The same gcc compiler options were used on the
Beagle Board and Intel Desktop for the same
benchmark. By varying the compiler options,
better performance may be possible than that
reported below. ARM research indicates an
expected 1200 DMIPS on the Beagle Board while
only 883 were found in this experiment [2]. It is
hypothesized that sub-optimal compiler flags led
to this disparity. Below, the raw performance
Figure 2. Beagle Board shown with SD card, Serial cable and USB cable
measurements are given, followed by performance
[11] per MHz, performance per Watt, and performance
per price. The latter three should give insights into
Once both platforms were functional, a set of how strongly or weakly each device performs for a
benchmarks had to be selected. For targeted market. For example, an e-book may be
reproducibility, only open source benchmarks very sensitive to power but due to the novelty of
were selected. Four benchmarks were chosen— the item, may be less sensitive to price. Note that
two integer-based and two floating-point-based. the power comparison uses the datasheet power
The former gives insight into typical text-based consumption values for both processors which are
applications, such as word-processing, whereas expected to be accurate as each CPU is fully
the latter may be indicative of scientific loaded for each of these benchmarks (verified via
5. ‗top‘ during execution). conditions were not the same for each simulation.
From these benchmarks, it appears that the However, it does demonstrate that a system-on-
Atom Desktop consistently outperforms the chip device like OMAP3530 can achieve
Beagle Board with respect to raw processing multimedia playback at significantly less power
power. It‘s clear here that the Beagle Board does (approximately 1W for the Beagle Board with
significantly better on integer benchmarks DSP decoding versus approximately 8W for the
(Fhourstone and Dhrystone) versus the floating- Atom 330 Desktop solution). Not only does the
point benchmarks (Whetstone and Linpack). The OMAP3530 exhibit power savings, but also
NEON pipeline is only used when specific allows the ARM to service other tasks during the
compiler flags are used, specifically, ―– decoding.
mtune=cortex-a8 –mfpu=neon –free-vectorize –
mfloat-abi=softfp‖. If these compiler flags are not Table II
used, the Whestone benchmark performance Media Benchmark Performance Results
Media Benchmarks - D1 MPEG4 Decode
decreases by a factor of 2, and the Linpack
Load
performance by 4.5x. The Intel Atom does (Million cycles
particularly well on floating-point operations since per sec)
Beagle Board (ARM) 500
all arithmetic operations (both integer and Beagle Board (DSP) 90
floating-point) use the Atom‘s FPUs and it is a fair [4][12] Atom 330 Desktop 880
assumption that Intel has optimized these units.
However, all is not lost for the Beagle Board as it Similarly, the OMAP3530 shines in graphics
appears significantly more power efficient and performance at low power. The SGX510
provides more performance per dollar than the accelerator is capable of delivering 10 million
Atom Desktop. polygons per second without adding a significant
power penalty. This type of performance is
comparable to graphics found on PlayStation2.
TABLE I
Benchmark Performance Results for The Intel Atom currently requires an external
BeagleBoard and an Atom 330 Desktop graphics chip to achieve this performance that
Fhourstone Dhrystone* Whetstone Linpack increases the power consumption by 6 or more
(Kpos/s) (DMIPS) (MIPS) (KFLOPS)
Raw Benchmark Performance
Watts as previously discussed.
Beagle Board 731 883 100 23376
Atom 330 Desktop 2054 1822 1667 933638
Benchmark Performance per MHz (Beagle at 600MHz, Atom at 1.6GHz) IV. CONCLUSIONS
Beagle Board 1.22 1.47 0.17 38.96
Atom 330 Desktop 1.28 1.14 1.04 583.52 The basic benchmarking results show nothing
Benchmark Performance per W (Beagle at 0.5W and Atom at 2W) surprising: the Cortex-A8 provides significant
Beagle Board 1462 1766 200 46752
Atom 330 Desktop 256.75 227.75 208.38 116704.75 power savings while the Intel Atom‘s pure
Benchmark Performance per Dollar(Beagle at $32, Atom at $43)
Beagle Board 22.84 27.59 3.13 730.5
processing power is unmatched. The main culprit
Atom 330 Desktop
Benchmark Performance per mm2 of the final die size
47.77 42.37 38.77 21712.51 to the Atom‘s power efficiency is the external
(OMAP at 60mm2, Atom at 52mm2)
Beagle Board 12.18 14.72 1.67 389.6
945GC Express Chipset as the Atom processor
Atom 330 Desktop 39.5 35.03 32.06 17954.58 core alone is relatively low power at 2W.
Unfortunately this chipset is usually used in
There is a variety of other benchmark data multimedia applications indicating that the
available, usually targeting specific end Cortex-A8 has considerable advantages in
equipments. For example, a gaming unit needs multimedia areas if it is integrated on a system-on-
graphics performance whereas a personal media chip device like OMAP3530 with a DSP.
player needs strong video decoder performance. Intel‘s roadmap includes two power saving
Below are other benchmarks that were not features: moving to 28nm technology and
reproduced, but provide insight into the two integrating the 945GC chipset onto the main
devices. The reader is cautioned in advance that processor chip. If Intel is able to reduce the
the following multimedia benchmark contains processor (with 945GC chipset integrated) to less
references from three different sources thus the than 1.5W, it will achieve approximately the same
6. performance-per-power as the OMAP3530. On the Dhrystone benchmark:
other hand, if the Cortex-A8 increases its clock Available from
speed to 1.5MHz, it can achieve similar http://www.xanthos.se/~joachim/dhrystone-
performance to the Intel Atom as both chips src.tar.gz
provide approximately the same performance-per- On the Linux host, to compile and run on an x86:
MHz In an Intel like fashion, ARM‘s roadmap $ wget
includes symmetric multi-core Cortex-A9 http://www.xanthos.se/~joachim/dhrystone-
processors. The Cortex-A9 is the next generation src.tar.gz
Cortex-A8 and is expected to achieve clock speeds $ tar –xvf Dhrystone-src.tar.gz
in excess of 2GHz. Assuming that both of these $ gcc -O2 -o dhrystone dhry21a.c dhry21b.c
companies achieve their roadmap goals, then their timers.c
processors will remain competitive in terms of Similarly, to run on Beagle Board, simply cross-
performance, power, and price. compile using the Code Sourcery cross compiler.
Thus the feature that may lead to marketplace
dominance may not be limited to the hardware but Whetstone benchmark:
in software. While there are many more ARM Available from
processors in production than x86-based http://netlib.org/benchmark/whetstone.c
processors, more consumers are familiar with x86- $ wget http://netlib.org/benchmark/whetstone.c
based software. For example, Linux netbooks have $ gcc -O -o whetstone whetstone.c -lm
been relatively unsuccessful in the market despite $ ./whetstone -c 100000
their lower cost, as the majority of consumers Similarly, to run on Beagle Board, simple cross-
prefer to run Windows. In this sense, ARM must compile using the Code Sourcery cross compiler
ensure adequate consumer software is available. with CCFLAGS=–mtune=cortex-a8 –mfpu=neon
With the introduction of the Apple iTunes app –free-vectorize –mfloat-abi=softfp.
store and Google‘s Android marketplace, software
is become more readily available for ARM Linpack benchmark:
processors to the end consumer. Supposing ARM Available from
software continues to become more abundant, the http://www.netlib.org/benchmark/linpackc.new
ARM and Intel rivalry can be expected to On the Linux host, to compile and run on an x86:
continue. $ wget
http://www.netlib.org/benchmark/linpackc.new
APPENDIX $ gcc -O -o linpack linpack.c –lm
Steps for benchmark reproduction $ ./linpack
Similarly, to run on Beagle Board, simply cross-
Fhourstones benchmark: compile using the Code Sourcery cross compiler
Available from with CCFLAGS=–mtune=cortex-a8 –mfpu=neon
http://homepages.cwi.nl/~tromp/c4/fhour.html –free-vectorize –mfloat-abi=softfp.
On the Linux host, to compile and run on an x86:
$ wget
http://homepages.cwi.nl/~tromp/c4/Fhourstones.ta REFERENCES
r.gz [1] Krazit,Tom, Apri 3, 2006. ARMed for the Living Room.
$ tar -xvf Fhourstones.tar.gz Available: http://news.cnet.com/ARMed-for-the-living-
room/2100-1006_3-6056729.html
$ make
[2] ARM, March 2008. ARM Cortex-A8
$ ./SearchGame < inputs
Microprocessor Overview. Available:
To run on Beagle Board, simple change the
http://www.arm.com/pdfs/Cortex-
Makefile to use arm-none-linux-gnueabi-gcc (or
A8_data_brief_FINAL_2_.pdf
other ARM cross compiler) instead of the host [3] Grisenthwaite, Richard, Dec 5, 2005. IEEE Cambridge
gcc. Branch Seminar: Cortex A8 Processor. Available:
www.iet-
7. cambridge.org.uk/arc/seminar05/slides/RichardGrisenth
waite.pdf
[4] BDTI, Feb. 26, 2008. ARM Cortex-A8
Benchmark Results. Available:
http://www.bdti.com/bdtimark/cortex_a8.htm
[5] Intel, 2008. Intel Atom Processor 330.
Available:
http://ark.intel.com/Product.aspx?id=35641
[6] Dandumont, Pierre, June 5, 2008. Intel Atom CPU
Review. Available:
http://www.tomshardware.com/reviews/intel-atom-
cpu,1947-4.html
[7] Intel, 2008. Intel Hyper-Threading Technology.
Available: http://www.intel.com/technology/platform-
technology/hyper-threading/index.htm
[8] Kucharik, Eliot, Oct. 6, 2008. Dual Core Atom 330
image. Available:
http://www.fudzilla.com/images/stories/2008/October/R
eviews/Atom_330/atom_330_d945gclf2_front.jpg
[9] Musah, Arthur, Dykstra. Andy, 2008. Power
Management Techniques for OMAP35x Application
processors. Available:
http://www.ti.com/litv/pdf/sprt495
[10]Texas Instruments, 2009. OMAP3530
Application Processor. Available:
http://focus.ti.com/docs/prod/folders/print/oma
p3530.html
[11] Benson. Duane, Aug. 28, 2009. Open
Source Hardware. Available:
http://blog.screamingcircuits.com/2009/08/ind
ex.html
[12]Felipe, C, Mar. 26, 2009. OMAP3 Public DSP
binaries now work. Available:
http://felipec.wordpress.com/2009/03/26/omap
3-public-dsp-binaries-now-work/