EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 1, LECTURE 1
LECTURE 1
CHAPTER 1 – BASIC CONCEPTS AND
COMPUTER EVOLUTION
2
TOPICS TO BE COVERED
Ø Organization and Architecture
Ø Structure and Function
Ø A Brief History of Computers
Ø The Evolution of the Intel x86 Architecture
Ø Embedded Systems
Ø ARM Architecture
Ø Cloud Computing
LECTURE 1
ORGANIZATION AND ARCHITECTURE
3
COMPUTER ARCHITECTURE vs. COMPUTER ORGANIZATION
Computer architecture refers to those attributes of a system visible to a programmer; computer organization refers to the operational units and their interconnections that realize the architectural specifications.
Architectural attributes have a direct impact on the logical execution of a program; organizational attributes include those hardware details transparent to the programmer.
Architecture examples: the instruction set, the number of bits used to represent different data types, I/O mechanisms and memory addressing techniques. Organization examples: control signals, interfaces between the computer and peripherals, and the memory technology used.
Whether a computer will have a multiply instruction is an architectural design issue; whether that instruction will be implemented by a special multiply unit or by a mechanism that makes use of repeated addition is an organizational issue.
A particular architecture generally lasts for many years, whereas the organization changes with changing technology.
LECTURE 1
STRUCTURE AND FUNCTION
4
 STRUCTURE defines the way in which the components are interrelated.
 FUNCTION defines the operation of each individual component as part of the
structure.
 There are basically two types of computer structures:
1. Single-processor computer
2. Multicore computer
 There are four basic functions of a computer:
1. Data processing
2. Data storage
3. Data movement
4. Control
LECTURE 1
SINGLE PROCESSOR COMPUTER
5
 There are four main structural components
1. CPU
2. Main memory
3. I/O
4. System interconnections
LECTURE 1
Contd.
6
Fig.: The Computer: Top-Level Structure [Source: Computer Organization and Architecture by William Stallings]
LECTURE 1
MULTICORE COMPUTER STRUCTURE
7
 A computer with multiple processors on a single chip is called a multicore computer, and each processing unit, consisting of a control unit, ALU, registers and cache, is called a core.
 An important feature of such computers is the use of multiple layers of memory, called cache memory, between the processors and main memory.
LECTURE 1
Contd.
8
Fig.: Simplified View of Major Elements of a Multicore Computer [Source: Computer Organization and Architecture by William Stallings]
LECTURE 1
Contd.
9
Fig.: Motherboard with Two Intel Quad-Core Xeon Processors [Source: Computer Organization and Architecture by William Stallings]
LECTURE 1
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 1, LECTURE 2
LECTURE 2
A BRIEF HISTORY OF COMPUTERS
6/2/2021
LECTURE 2
2
ü The computer generations are classified based on the
fundamental hardware technology employed.
ü Each new generation is characterized by greater processing
performance, larger memory capacity, smaller size and lower
cost than the previous one.
COMPUTER GENERATIONS
6/2/2021
LECTURE 2
3
GENERATION | APPROXIMATE DATES | TECHNOLOGY | TYPICAL SPEED (operations per second)
1 | 1946-1957 | Vacuum tubes | 40,000
2 | 1957-1964 | Transistors | 200,000
3 | 1965-1971 | Small and medium scale integration | 1,000,000
4 | 1972-1977 | Large scale integration | 10,000,000
5 | 1978-1991 | Very large scale integration | 100,000,000
6 | 1991- | Ultra large scale integration | >100,000,000
FIRST GENERATION : VACUUM TUBES
6/2/2021
LECTURE 2
4
ü The first generation of computers used vacuum tubes for digital
logic elements and memory.
ü A famous first-generation computer is the IAS computer, which is the basic prototype for all subsequent general-purpose computers.
ü Basic design approach is the stored-program concept.
ü The idea was proposed by von Neumann.
ü It consists of (i) a main memory (which stores both data and
instructions), (ii) an arithmetic and logic unit (ALU) (capable of
operating on binary data), (iii) a control unit (which interprets the
instructions in memory and causes them to be executed), and (iv)
Input-Output (I/O) (equipment operated by the control unit).
Contd.
6/2/2021
LECTURE 2
5
Fig.: IAS Computer Structure [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 2
6
VON NEUMANN’S PROPOSAL
1) As the device is primarily a computer, it has to perform the
elementary arithmetic operations.
2) The logical control of the device, i.e. the proper sequencing of
operations can be most efficiently carried out by the central
control unit.
3) Any device that is to carry out long and complicated sequences
of operations must have a memory unit.
4) The device must have interconnections to transfer information from R (the outside recording medium of the device) into its specific parts C (CA + CC) and M (main memory); these interconnections form the specific part I (input).
5) The device must have interconnections to transfer information from its specific parts C and M into R; these interconnections form the specific part O (output).
Contd.
6/2/2021
LECTURE 2
7
ü The memory of IAS consists of 4096 storage locations (words of 40
binary digits/bits each).
ü It stores both data and instructions.
ü Numbers are represented in binary form and instructions are in binary
codes.
ü Each number is represented by a sign bit and a 39-bit value.
ü A word may alternatively contain 20-bit instructions.
ü Each instruction consists of an 8-bit operation code (opcode) specifying the operation to be performed and a 12-bit address designating one of the words in memory (0–4095).
Fig.: IAS Memory Format [Source: Computer Organization and Architecture by William Stallings]
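To make the word layout concrete, here is a minimal Python sketch (not from the slides; the field widths follow the format described above) that unpacks the two 20-bit instructions packed into one 40-bit IAS word:

```python
def unpack_ias_word(word):
    """Split a 40-bit IAS word into its left and right 20-bit instructions.

    Each 20-bit instruction holds an 8-bit opcode followed by a 12-bit
    address (0-4095), matching the instruction format described above.
    """
    left = (word >> 20) & 0xFFFFF      # left instruction: high-order 20 bits
    right = word & 0xFFFFF             # right instruction: low-order 20 bits

    def fields(instr):
        opcode = (instr >> 12) & 0xFF  # 8-bit operation code
        address = instr & 0xFFF        # 12-bit memory address
        return opcode, address

    return fields(left), fields(right)

# Example: a word holding opcode 0x01 / address 500 and opcode 0x0D / address 501.
word = (0x01 << 32) | (500 << 20) | (0x0D << 12) | 501
print(unpack_ias_word(word))           # ((1, 500), (13, 501))
```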
Contd.
6/2/2021
LECTURE 2
8
Table: IAS Instruction Set [Source: Computer Organization and Architecture by William Stallings]
SECOND GENERATION : TRANSISTORS
6/2/2021
LECTURE 2
9
ü Vacuum tubes were replaced by transistors.
ü A transistor is a solid-state device made from silicon; it is smaller, cheaper and generates less heat than a vacuum tube.
ü Complex arithmetic & logic units, control units, high level
programming language, and the provision of system software were
introduced.
ü E.g. IBM 7094 where data channels or independent I/O modules
were used with their own processor and instruction sets.
ü Multiplexers were used which are the central termination point
for data channels, CPU and memory.
THIRD GENERATION : INTEGRATED CIRCUITS
6/2/2021
LECTURE 2
10
ü An integrated circuit fabricates circuit components such as transistors, resistors and capacitors, together with their interconnections, on a single chip of semiconductor.
ü Two fundamental components required are gates and memory cells.
ü Gates control the flow of data.
ü A memory cell stores 1 bit of data.
ü Growth is governed by Moore's Law, which observes that the number of transistors on a single chip doubles roughly every 18 months.
Fig.: Fundamental Computer Elements [Source: Computer Organization and Architecture by William Stallings]
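As a rough numeric illustration of the doubling rule quoted above (assuming one doubling every 18 months; the starting count of about 2,300 transistors is the Intel 4004's), a few lines of Python project the transistor count over time:

```python
def projected_transistors(initial_count, months, doubling_period_months=18):
    """Project a transistor count assuming one doubling every 18 months."""
    return initial_count * 2 ** (months / doubling_period_months)

# Starting from roughly 2,300 transistors (Intel 4004, 1971), project 10 years ahead.
print(round(projected_transistors(2300, months=120)))   # roughly 234,000
```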
LATER GENERATIONS
6/2/2021
LECTURE 2
11
Two important developments of later generations are:
1. SEMICONDUCTOR MEMORY: The first application of integrated circuit technology to computers was the processor; the same technology was then applied to memory. Semiconductor memory is faster and smaller, and its cost has decreased with the corresponding increase in physical memory density.
2. MICROPROCESSORS: The microprocessor era started in 1971 with the development of the Intel 4004, the first chip to contain all the components of a CPU on a single chip.
Contd.
6/2/2021
LECTURE 2
12
Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 2
13
Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 1, LECTURE 3
COMPUTER ORGANIZATION AND
ARCHITECTURE (COA)
EET- 2211
CHAPTER 1 – BASIC CONCEPTS AND
COMPUTER EVOLUTION
6/2/2021
LECTURE 3
2
TOPICS TO BE COVERED
Ø The Evolution of the Intel x86 Architecture
Ø Embedded Systems
LEARNING OBJECTIVES
Ø Present an overview of the evolution of the x86 architecture.
Ø Define embedded systems.
Ø List some of the requirements and constraints that various embedded systems meet.
ALREADY COVERED
Ø Organization and architecture
Ø Structure and Function
Ø A Brief History of Computers.
THE EVOLUTION OF THE INTEL x86
ARCHITECTURE
6/2/2021
LECTURE 2
3
GENERATION | APPROXIMATE DATES | TECHNOLOGY | TYPICAL SPEED (operations per second)
1 | 1946-1957 | Vacuum tubes | 40,000
2 | 1957-1964 | Transistors | 200,000
3 | 1965-1971 | Small and medium scale integration | 1,000,000
4 | 1972-1977 | Large scale integration | 10,000,000
5 | 1978-1991 | Very large scale integration | 100,000,000
6 | 1991- | Ultra large scale integration | >100,000,000
ü Microprocessors have grown faster and more complex.
ü Intel has introduced a new microprocessor generation roughly every four years.
Contd.
6/2/2021
LECTURE 2
4
Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 2
5
Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 3
6
MICROPROCESSOR DESCRIPTION
8080
Ø The world's first general-purpose microprocessor.
Ø This was an 8-bit machine, with an 8-bit data path to memory.
Ø It was used in the first personal computer, the Altair.
8086
Ø A more powerful 16-bit machine.
Ø It has a wider data path, larger registers and an instruction cache/queue that prefetches a few instructions before they are executed.
Ø A variant of this processor, the 8088, was used in IBM's first personal computer.
Ø This is the first appearance of the x86 architecture.
80286
Ø An extension of the 8086.
Ø It enabled addressing a 16-MB memory instead of just 1 MB.
80386
Ø Intel's first 32-bit machine.
Ø It introduced the complexity and power of minicomputers and mainframes.
Ø It was the first Intel processor to support multitasking.
Contd.
6/2/2021
LECTURE 3
7
MICROPROCESSOR DESCRIPTION
80486
Ø It introduced the use of sophisticated and powerful cache technology and instruction pipelining.
Ø It also included a built-in math coprocessor, offloading complex math operations from the main CPU.
Pentium
Ø Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
Pentium Pro
Ø It continued the superscalar approach with the use of register renaming, branch prediction, data flow analysis and speculative execution.
Pentium II
Ø It incorporated Intel MMX technology, which is designed specifically to process video, audio and graphics data efficiently.
Contd.
6/2/2021
LECTURE 3
8
MICROPROCESSOR DESCRIPTION
Pentium III
Ø It incorporates additional floating-point instructions: the Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions designed to increase performance.
Ø Example applications include digital signal processing and graphics processing.
Pentium 4
Ø It includes additional floating-point and other enhancements for multimedia.
Core
Ø The first Intel x86 microprocessor with a dual core, referring to the implementation of two cores on a single chip.
Core 2
Ø It extends the Core architecture to 64 bits.
Ø The Core 2 Quad provides four cores on a single chip.
Ø An important later addition to the architecture was the Advanced Vector Extensions instruction set, which provides a set of 256-bit and then 512-bit instructions for efficient processing of vector data.
EMBEDDED SYSTEMS
6/2/2021
LECTURE 3
9
ü The term embedded system refers to the use of electronics and software within a product.
ü E.g. cell phones, digital cameras, video cameras, calculators, microwave ovens, home security systems, washing machines, lighting systems, thermostats, printers, various automotive systems, toothbrushes, and numerous types of sensors and actuators in automated systems.
ü Embedded systems are generally tightly coupled to their environment.
Contd.
6/2/2021
LECTURE 3
10
Fig.1: Organization of an Embedded System [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 3
11
ELEMENTS THAT DIFFER IN AN EMBEDDED SYSTEM FROM A TYPICAL DESKTOP/LAPTOP
1. There may be a variety of interfaces that enable the system to measure, manipulate and interact with the external environment.
2. The human interface can be either very simple or complicated.
3. The diagnostic port may be used for diagnosing the system.
4. Special-purpose FPGAs, ASICs or non-digital hardware can be used to increase performance.
5. Software often has a fixed function and is specific to the application.
6. They are optimized for energy, code size, execution time, weight, dimensions and cost in order to increase efficiency.
Contd.
6/2/2021
LECTURE 3
12
SIMILARITY BETWEEN EMBEDDED SYSTEMS AND
GENERAL PURPOSE COMPUTER
1. Even with nominally fixed function software, the ability to
upgrade to fix bugs, to improve security and to add
functionality is very important for both.
2. Both support a wide variety of applications.
Contd.
6/2/2021
LECTURE 3
13
INTERNET OF THINGS
ü IoT is a system of interrelated computing devices and mechanical and digital machines provided with unique identifiers (UIDs) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.
ü The dominant theme is the embedding of short-range mobile transceivers into a wide array of gadgets and everyday items, enabling a form of communication between people and things.
ü E.g. embedded systems, wireless sensor networks, control systems, automation (home and building), smart homes (lighting fixtures, thermostats, home security systems, appliances).
ü It refers to the expanding interconnection of smart devices (ranging from appliances to tiny sensors).
Contd.
6/2/2021
LECTURE 3
14
Fig.2: IoT applications
6/2/2021
LECTURE 3
15
Contd.
6/2/2021
LECTURE 3
16
üThe objects deliver sensor information, act on the
environment, and modify themselves to create overall
management of a larger system.
üThese devices are low-bandwidth, low-repetition data-
capture and low-bandwidth data-usage appliances that
communicate with each other and provide data through
user interface.
üWith reference to end systems supported, the internet
has gone through roughly four generations of
deployment culminating in the IoT:
1. Information technology (IT) : PCs, servers, routers,
firewalls, IT devices bought by enterprise IT people and
primarily using wired connectivity.
Contd.
6/2/2021
LECTURE 3
17
2. Operational technology (OT): machines/appliances with embedded IT built by non-IT companies, such as medical machinery, SCADA (Supervisory Control and Data Acquisition) systems, process control and kiosks, bought as appliances by enterprise OT people and primarily using wired connectivity.
3. Personal technology : Smartphones, tablets and eBook
readers bought as IT devices by consumers exclusively
using wireless connectivity.
4. Sensor/Actuator technology : Single-purpose devices
bought by consumers, IT and OT people exclusively
using wireless connectivity generally of a single form.
Contd.
6/2/2021
LECTURE 3
18
EMBEDDED OPERATING SYSTEMS
1. The first approach is to take an existing OS and adapt it for the embedded application. E.g. there are embedded versions of Linux, Windows, macOS and other commercial operating systems specialized for embedded systems.
2. The second approach is to design and implement an OS intended solely for embedded use. E.g. TinyOS (widely used in wireless sensor networks).
Contd.
6/2/2021
LECTURE 3
19
APPLICATION PROCESSORSVERSUS DEDICATED
PROCESSORS
ü Application processors are defined by the processor's ability to execute complex operating systems such as Linux, Android and Chrome.
ü They are general-purpose in nature.
ü E.g. the smartphone uses an embedded application processor.
ü Dedicated processors are dedicated to one or a small number of specific tasks required by the host device.
ü Because the associated components are dedicated to a specific task, they can be engineered to reduce size and cost.
Contd.
6/2/2021
LECTURE 3
20
MICROPROCESSORS vs MICROCONTROLLERS
Contd.
6/2/2021
LECTURE 3
21
Fig.3: Typical Microcontroller Chip Elements [Source: Computer Organization and Architecture by William Stallings]
Contd.
6/2/2021
LECTURE 3
22
EMBEDDED vs. DEEPLY EMBEDDED SYSTEMS
ü Deeply embedded systems are dedicated, single-purpose devices.
ü They often have wireless capability and appear in networked configurations (e.g., networks of sensors deployed over a large area such as a factory or an agricultural field).
ü They have extreme resource constraints in terms of memory, processor size, time and power consumption.
QUESTIONS
6/2/2021
LECTURE 3
23
For each of the following examples determine whether this is an embedded system, explaining
why or why not?
a) Are programs that understand physics and /or hardware embedded? For example, one that
uses finite-element methods to predict fluid flow over airplane wings?
b) Is the internal microprocessor controlling a disk drive an example of an embedded system?
c) I/O drivers control hardware, so does the presence of an I/O driver imply that the
computer executing the driver is embedded?
d) Is a PDA (Personal Digital Assistant) an embedded system?
e) Is the microprocessor controlling a cell phone an embedded system?
f) Are the computers in a big phased-array radar considered embedded?These radars are 10-
storey buildings with one to three 100-foot diameter radiating patches on sloped sides of
the building.
g) Is a traditional flight management system (FMS) built into an airplane cockpit considered
embedded?
h) Are the computers in a hardware-in-the-loop (HIL) simulator embedded?
i) Is the computer controlling a pacemaker in a person’s chest an embedded computer?
j) Is the computer controlling fuel injection in an automobile engine embedded?
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 1, LECTURE 4
COMPUTER ORGANIZATION AND
ARCHITECTURE (COA)
CHAPTER 1 – BASIC CONCEPTS AND
COMPUTER EVOLUTION
6/2/2021
LECTURE 4
2
TOPICS TO BE COVERED
Ø ARM Architecture
Ø Cloud Computing
LEARNING OBJECTIVES
Ø Define embedded systems.
Ø List some of the requirements and constraints that various embedded systems meet.
Ø List the importance of cloud computing.
ALREADY COVERED
Ø The Evolution of the Intel x86 Architecture
Ø Embedded Systems
CISC vs. RISC
6/2/2021
LECTURE 4
3
ü Two important processor families are the Intel x86 and the ARM architectures.
ü The x86 represents the complex instruction set computer (CISC) approach.
ü The x86 incorporates the sophisticated design principles once found only on mainframes and supercomputers.
ü The ARM architecture, used in a wide variety of embedded systems, is one of the most powerful and best-designed reduced instruction set computers (RISC).
ü The x86 is an excellent example of the advances in computer hardware over the past 35 years.
Contd.
6/2/2021
LECTURE 4
4
CISC: Stands for Complex Instruction Set Computer.
RISC: Stands for Reduced Instruction Set Computer.
CISC: A full set of computer instructions intended to provide the necessary capabilities in an efficient way.
RISC: An instruction set architecture designed around a smaller number of instructions so that it can operate at a higher speed.
CISC: The original microprocessor ISA.
RISC: A redesigned ISA that emerged in the early 1980s.
CISC: Hardware-centric design (the ISA does as much as possible using hardware circuitry).
RISC: Software-centric design (high-level compilers take on most of the burden of coding, removing many software steps from the programmer).
CISC: Instructions can take several clock cycles to execute.
RISC: Single-cycle instruction execution.
Contd.
6/2/2021
LECTURE 4
5
CISC: Pipelining is difficult.
RISC: Pipelining is easy.
CISC: Extensive use of microprogramming (instructions are treated like small programs).
RISC: Complexity is in the compiler; there is only one layer of instructions.
CISC: Complex, variable-length instructions.
RISC: Simple, standardized, fixed-length instructions.
CISC: Large number of instructions.
RISC: Small number of instructions.
CISC: Compound addressing modes.
RISC: Limited addressing modes.
CISC: Fewer registers.
RISC: More registers.
CISC: Requires a minimum amount of RAM.
RISC: Requires more RAM.
CISC: Typically uses a microprogrammed control unit; used in applications such as desktop computers and laptops.
RISC: Typically uses a hardwired control unit; used in applications such as mobile phones and tablets.
ARM ARCHITECTURE
6/2/2021
LECTURE 4
6
ü It has evolved from RISC design principles.
ü It is used in embedded systems.
ARM EVOLUTION
ü ARM is a family of RISC-based microcontrollers and microprocessors designed by ARM Holdings.
ü ARM chips are high-speed processors.
ü They have a small die size and require very little power.
ü They are widely used in smartphones and other handheld devices, including game consoles and consumer products.
Contd.
6/2/2021
LECTURE 4
7
ü ARM chips are the processors in Apple's popular iPod and iPhone devices.
ü It is the most widely used embedded processor architecture.
ü Acorn RISC Machine (the original expansion of ARM) developed one of the first commercial RISC processors.
ü The ARM design matched the growing commercial need for a high-performance, low-power-consumption, small-size and low-cost processor for embedded applications.
Contd.
6/2/2021
LECTURE 4
8
INSTRUCTION SET ARCHITECTURE
ü The ARM instruction set is highly regular, designed for efficient implementation of the processor and efficient execution.
ü All instructions are 32 bits long and follow a regular format.
ü A related ISA is the Thumb instruction set, which is a re-encoded subset of the ARM instruction set.
ü Thumb is designed to increase the performance of ARM implementations that use a 16-bit or narrower memory data bus and to allow better code density than provided by the ARM instruction set.
ü The Thumb instruction set contains a subset of the ARM 32-bit instructions recoded into 16-bit instructions.
Contd.
6/2/2021
LECTURE 4
9
ARM PRODUCTS
ü ARM Holdings licenses a number of specialized
microprocessors and related technologies, but the bulk of
their product line is the Cortex family of microprocessor
architectures.
ü There are 3 Cortex architectures, conveniently labeled with the initials A, R and M.
1. CORTEX-A/CORTEX-A50
2. CORTEX-R
3. CORTEX-M
Contd.
6/2/2021
LECTURE 4
10
ü CORTEX-A/CORTEX-A50
i. They are application processors.
ii. They are intended for mobile devices such as smartphones and eBook readers, as well as consumer devices such as digital TVs and home gateways.
iii. These processors run at higher clock frequencies.
iv. They support an MMU, which is required for full-featured operating systems such as Linux, Android, MS Windows and mobile OSs.
v. The two architectures use both the ARM and Thumb-2 instruction sets.
vi. Cortex-A is a 32-bit machine and Cortex-A50 is a 64-bit machine.
Contd.
6/2/2021
LECTURE 4
11
ü CORTEX-R
i. It is designed to support real-time applications, in which the timing of events needs to be controlled with rapid response to events.
ii. They run at a higher clock frequency and have a very low response latency.
iii. It includes enhancements both to the instruction set and to the processor organization to support deeply embedded real-time devices.
iv. Most of these processors do not have an MMU (memory management unit); the limited data requirements and limited number of simultaneous processes eliminate the need for elaborate hardware and software support for virtual memory.
v. They do have an MPU (memory protection unit), cache, and other memory features designed for industrial applications.
vi. E.g. automotive braking systems, mass storage controllers, and networking and printing devices.
Contd.
6/2/2021
LECTURE 4
12
ü CORTEX-M
i. They have been developed primarily for the microcontroller
domain where the need for fast, highly deterministic interrupt
management is coupled with the desire for extremely low gate
count and lowest possible power consumption.
ii. They have MPU but no MMU.
iii. It uses the Thumb-2 instruction set.
iv. E.g. IoT devices, wireless sensor/actuator networks used in
factories and other enterprises, automotive body electronics.
v. There are currently 4 versions viz. Cortex-M0, Cortex-M0+,
Cortex-M3 and Cortex-M4.
CLOUD COMPUTING
6/2/2021
LECTURE 4
13
ü General concepts for cloud computing date back to the 1950s.
ü Cloud services first became available in the early 2000s and were particularly targeted at large enterprises.
ü Since then, cloud computing has spread to small and medium-sized businesses, and recently to consumers.
ü Evernote, the cloud-based note-taking and archiving service, was launched in 2008.
ü Apple's iCloud was launched in 2011.
Contd.
6/2/2021
LECTURE 4
14
BASIC CONCEPTS
ü CLOUD COMPUTING : A model
for enabling ubiquitous, convenient,
on-demand network access to a
shared pool of configurable
computing resources that can be
rapidly provisioned and released with
minimal management effort or
service provider interaction.
ü When an organization moves all of its information technology (IT) operations to an Internet-connected infrastructure, this is known as enterprise cloud computing.
Fig.1 : Cloud Computing
Contd.
6/2/2021
LECTURE 4
15
ü We get economies of scale, professional network management and professional security management with cloud computing.
ü These features are attractive to companies, government agencies, and individual PC and mobile users.
ü The individual or company only needs to pay for the storage capacity and services they need.
ü Setting up the database system, acquiring the needed hardware, doing maintenance, and backing up the data are all part of the cloud service.
ü The cloud also takes care of data security.
Contd.
6/2/2021
LECTURE 4
16
CLOUD NETWORKING
ü It refers to the networks and network management
functionality that must be in place to enable cloud computing.
ü It also refers to the collection of network capabilities required
to access a cloud, including making use of specialized services
over the internet, linking enterprise data centers to a cloud, and
using firewalls and other network security devices at critical
points to enforce access security policies.
ü Cloud storage can be thought of as a subset of cloud computing.
ü Cloud storage consists of database storage and database
applications hosted remotely on cloud servers.
Contd.
6/2/2021
LECTURE 4
17
TYPES OF CLOUD NETWORKS
Fig.2: Cloud Networks
Contd.
6/2/2021
LECTURE 4
18
CLOUD SERVICES
ü A cloud service provider (CSP) maintains computing and data storage resources that are available over the internet or private networks.
ü Customers can rent a portion of these resources as needed.
ü All cloud services are provided using one of three models: (i) SaaS, (ii) PaaS, (iii) IaaS.
Fig.3: Cloud Services
Contd.
6/2/2021
LECTURE 4
19
Fig.4: Alternative Information Technology Architectures [Source: Computer Organization and Architecture by William Stallings]
Contd.
SaaS – Software as a Service
 In simple terms, SaaS is a service that lets a business run software over the internet. SaaS is also called "on-demand software" and is priced on a pay-per-use basis. SaaS allows a business to reduce IT operational costs by outsourcing hardware and software maintenance and support to the cloud provider. SaaS is a rapidly growing market, as indicated by recent reports that predict ongoing double-digit growth.
PaaS – Platform as a Service
 PaaS is similar to SaaS, but rather than delivering finished software over the web, PaaS provides a platform on which software is created and then delivered over the web.
 PaaS provides a computing platform and solution stack as a service. In this model the consumer creates software using tools or libraries from the provider, and also controls software deployment and configuration settings. The provider's role is to supply the networks, servers, storage and other services.
Computer Organization andArchitecture
20
Contd.
IaaS – Infrastructure as a Service
 Infrastructure is the foundation of cloud computing.
 It provides delivery of computing as a shared service reducing
the investment cost, operational and maintenance of hardware.
 Infrastructure as a Service (IaaS) is a way of delivering Cloud
Computing infrastructure – servers, storage, network and
operating systems – as an on-demand service.
 Rather than purchasing servers, software, datacenter space or
network equipment, clients instead buy those resources as a
fully outsourced service on demand.
Computer Organization andArchitecture
21
OTHER SERVICES OF CLOUD COMPUTING
Here "components" refers to additional service models which, together with the cloud delivery platforms and the network's front end and back end, form the cloud computing architecture.
1. Storage-as-a-service: Storage is made available at a remote site. It is the most basic component and is often described as disk space on demand.
2. Database-as-a-service: This acts as a hosted, live database; its main aim is to reduce the cost of a database by sharing software and hardware across many users.
3. Information-as-a-service: Data that can be accessed from anywhere is known as information-as-a-service; internet banking, online news and much more are included in it.
4. Process-as-a-service: Combining different resources such as information and services is done in process-as-a-service; it is mainly helpful for mobile networks.
Computer Organization andArchitecture
22
Contd.
5. Application-as-a-service: A complete application that is ready to use; it is the final front end for users. Sample applications include Gmail, Google Calendar and many more.
7. Integration-as-a-service: This deals with components of an application that are built separately and need to be integrated with other applications.
8. Security-as-a-service: This component is required by many customers because security is a top priority.
9. Management-as-a-service: This component is useful for the management of clouds.
10. Testing-as-a-service: This component refers to the testing of applications that are hosted remotely.
Computer Organization andArchitecture
23
ADVANTAGES
1. Say 'goodbye' to costly systems: Cloud hosting enables businesses to keep expenditure minimal. As everything can be done in the cloud, the employees' local systems have very little to do, which saves money otherwise spent on costly devices.
2. Access from many options: Another advantage of cloud computing is that the cloud environment can be accessed not only from a desktop system but from tablets, iPads, netbooks and even mobile phones. This not only increases efficiency but enhances the services provided to consumers.
3. Software expense: Cloud infrastructure reduces the high software costs of businesses. The software is already hosted on the cloud servers, which removes the need to buy expensive software and pay licensing costs.
4. The cooked food: The expense of adding new employees is not inflated by application setup, installation and the arrangement of a new device. Cloud applications are right at the employees' desks, ready to let them perform all their work – like already-cooked food.
5. Cloud storage: The cloud is a convenient platform to store valuable information; many providers offer generous (often free) storage that is managed and secured for you, unlike a local system.
Computer Organization andArchitecture
24
Contd.
6. Lowers traditional server costs: Cloud for business removes the large upfront costs of enterprise servers. The extra costs associated with increasing memory, hard drive space and processing power are all avoided.
7. Data centralization: Another key benefit of cloud services is centralized data. The information for multiple projects and different branch offices is stored in one location that can be accessed from remote places.
8. Data recovery: Cloud computing providers enable automatic data backup on the cloud system. Recovering data after a hard drive crash is otherwise either not possible or may cost a large amount of money or valuable time.
9. Sharing capabilities: Beyond document accessibility, all your documents and files can be emailed and shared whenever required, wherever you are.
10. Cloud security: A cloud service vendor chooses highly secure data centers for your information. Moreover, sensitive information in the cloud is protected with auditing, passwords and encryption.
11. Instant testing: Various tools employed in cloud computing let you test a new product, application, feature, upgrade or load instantly. The infrastructure is quickly available, with the flexibility and scalability of a distributed testing environment.
Computer Organization andArchitecture
25
DISADVANTAGES
1. Net Connection: For cloud computing, an internet connection is a must to access your
precious data.
2. Low Bandwidth: With a low bandwidth net, the benefits of Cloud computing cannot be
utilized. Sometimes even a high bandwidth satellite connection can lead to poor quality
performance due to high latency.
3. Affected Quality: The internet is used for various reasons such as listening to audio, watching videos online, downloading and uploading heavy files, printing from the cloud, and so on. The quality of a cloud computing connection can get affected when a lot of people utilize the net at the same time.
4. Security Issues: Of course, cloud computing keeps your data secure. But for maintaining
complete security, an IT consulting firm’s assistance and advice is important. Else, the
business can become vulnerable to hackers and threats.
5. Non-negotiable Agreements: Some cloud computing vendors have non-negotiable
contracts for the companies. It can be disadvantageous for a lot of businesses.
Computer Organization andArchitecture
26
Contd.
6. Cost Comparison: Cloud software may look like an affordable option when compared to an in-house installation of software. But it is important to compare the features of the installed software and the cloud software, as some specific features of the cloud software may be missing that are essential for your business. Sometimes you are charged extra for unrequired additional features.
7. No Hard Drive: As Steve Jobs, the late chairman of Apple, exclaimed: "I don't need a hard disk on my computer if I can get to the server faster… carrying around these non-connected computers is byzantine by comparison." But some people who use programs cannot do without an attached hard drive.
8. Lack of full support: Cloud-based services do not always provide proper support to customers. Some vendors are not available by e-mail or phone and expect consumers to depend on FAQs and online communities for support, so complete transparency is never offered.
9. Incompatibility: Sometimes there are problems of software incompatibility, as some applications, tools and software connect only to a particular personal computer.
10. Fewer insights into your network: It's true that cloud computing companies provide you access to data like CPU, RAM and disk utilization, but your insight into your own network becomes minimal. So if there is a bug in your code, a hardware problem or anything else, it is impossible to fix without being able to recognize the issue.
11. Minimal flexibility: The applications and services run on a remote server. Because of this, enterprises using cloud computing have minimal control over the functions of the software as well as the hardware, and the applications can never be run locally.
Computer Organization andArchitecture
27
REVIEW QUESTIONS
6/2/2021
LECTURE 4
28
1. What in general is the distinction between computer organization and
computer architecture?
2. What is the distinction between computer structure and computer
functions?
3. What are the four main functions of a computer?
4. List and briefly define the main structural components of a computer.
5. List and briefly define the main structural components of a processor.
6. What is a stored program computer?
7. Explain Moore’s Law.
8. List and explain the key characteristics of a computer family.
9. What is the key distinguishing feature of a microprocessor?
10. On the IAS, describe the process that the CPU must undertake to read a
value from memory and write a value to memory in terms of what is put
into MAR, MBR, address bus, data bus and control bus.
Computer Organization
and Architecture
(EET 2211)
Chapter-2
Lecture 01
Chapter 2
Performance
Issues
Computer Organization &
Architecture(EET2211)
Computer Organization &
Architecture(EET2211)
Designing for Performance
• Year by year, the cost of computer systems continues to drop
dramatically, while the performance and capacity of those systems
continue to rise equally dramatically.
• What is fascinating about all this from the perspective of computer
organization and architecture is that, on the one hand, the basic
building blocks for today’s computer miracles are virtually the same
as those of the IAS computer from over 50 years ago, while on the
other hand, the techniques for squeezing the maximum
performance out of the materials at hand have become increasingly
sophisticated.
Computer Organization &
Architecture(EET2211)
• Here in this section, we highlight some of the driving factors behind
the need to design for performance.
• Microprocessor Speed: The evolution of these machines continues to bear out Moore's law, as described in Chapter 1. Among the techniques built into contemporary processors are:
Pipelining: the processor works on multiple instructions at the same time, each at a different stage of execution, like an assembly line.
Branch prediction: the processor looks ahead in the fetched instruction stream and predicts which branches are likely to be taken, prefetching the appropriate instructions.
Superscalar execution: more than one instruction is issued per clock cycle, using multiple parallel pipelines.
Data flow analysis: the processor analyzes which instructions depend on each other's results in order to create an optimized schedule of instructions.
Speculative execution: the processor executes instructions ahead of their actual appearance in program execution, holding the results in temporary locations.
Computer Organization &
Architecture(EET2211)
• Performance Balance:
While processor power has raced ahead at breakneck speed, other critical components of the computer have not kept up. The result is
a need to look for performance balance: an adjustment/tuning of
the organization and architecture to compensate for the mismatch
among the capabilities of the various components.
• The problem created by such mismatches is particularly critical at
the interface between processor and main memory.
• If memory or the pathway fails to keep pace with the processor’s
insistent demands, the processor stalls in a wait state, and valuable
processing time is lost.
Computer Organization &
Architecture(EET2211)
A system architect can attack this problem in a number of ways, all of
which are reflected in contemporary computer designs. Consider
the following examples:
• Increase the number of bits that are retrieved at one time by
making DRAMs “wider” rather than “deeper” and by using wide bus
data paths.
• Change the DRAM interface to make it more efficient by including a
cache or other buffering scheme on the DRAM chip.
• Reduce the frequency of memory access by incorporating
increasingly complex and efficient cache structures between the
processor and main memory.
• Increase the interconnect bandwidth between processors and
memory by using higher-speed buses and a hierarchy of buses to
buffer and structure data flow.
Computer Organization &
Architecture(EET2211)
Another area of design focus is the handling of I/O devices. As
computers become faster and more capable, more sophisticated
applications are developed that support the use of peripherals with
intensive I/O demands.
Typical I/O Device Data Rates
Computer Organization &
Architecture(EET2211)
• The key in all this is balance. This design must constantly be
rethought to cope with two constantly evolving factors:
(i) The rate at which performance is changing in the various
technology areas (processor, buses, memory, peripherals) differs
greatly from one type of element to another.
(ii) New applications and new peripheral devices constantly
change the nature of the demand on the system in terms of typical
instruction profile and the data access patterns.
Computer Organization &
Architecture(EET2211)
• Improvements in Chip Organization and
Architecture:
As designers wrestle with the challenge of balancing processor
performance with that of main memory and other computer
components, the need to increase processor speed remains. There
are three approaches to achieving increased processor speed:
(i) Increase the hardware speed of the processor
(ii) Increase the size and speed of caches
(iii) Increase the effective speed of instruction execution
Computer Organization &
Architecture(EET2211)
• Traditionally, the dominant factor in performance gains has been increases in clock speed and logic density. However, as clock speed and logic density increase, a number of obstacles become more significant [INTE04]:
Power: As the density of logic and the clock speed on a chip increase, so does the power density (W/cm²).
RC delay: The speed at which electrons can flow on a chip
between transistors is limited by the resistance and capacitance of
the metal wires connecting them; specifically, delay increases as
the RC product increases.
Memory latency and throughput: Memory access speed
(latency) and transfer speed (throughput) lag processor speeds, as
previously discussed.
Computer Organization &
Architecture(EET2211)
MULTICORE, MICs AND GPGPUs
• With all of the difficulties cited in the preceding section in mind,
designers have turned to a fundamentally new approach to
improving performance: placing multiple processors on the same
chip, with a large shared cache. The use of multiple processors on
the same chip, also referred to as multiple cores or multicore,
provides the potential to increase performance without increasing
the clock rate.
• Chip manufacturers are now in the process of making a huge leap
forward in the number of cores per chip, with more than 50 cores
per chip. The leap in performance as well as the challenges in
developing software to exploit such a large number of cores has led
to the introduction of a new term: many integrated core (MIC).
Computer Organization &
Architecture(EET2211)
• The multicore and MIC strategy involves a homogeneous collection
of general purpose processors on a single chip. At the same time,
chip manufacturers are pursuing another design option: a chip with
multiple general-purpose processors plus graphics processing units
(GPUs) and specialized cores for video processing and other tasks.
• The line between the GPU and the CPU is blurring [AROR12, FATA08, PROP11]. When a broad range of applications is supported by such a processor, the term general-purpose computing on GPUs (GPGPU) is used.
Computer Organization &
Architecture(EET2211)
Amdahl’s Law & Little’s Law
• Amdahl’s Law
• Amdahl’s law was first proposed by Gene Amdahl in 1967
([AMDA67], [AMDA13]) and deals with the potential speedup of a
program using multiple processors compared to a single processor.
Illustration of Amdahl’s Law
Computer Organization &
Architecture(EET2211)
The speedup when a fraction f of a program is parallelizable and N processors are used is Speedup = 1 / [(1 - f) + f/N]. From this equation two important conclusions can be drawn:
1. When f is small, the use of parallel processors has little effect.
2. As N approaches infinity, speedup is bound by 1/(1 - f), so that there are diminishing returns for using more processors.
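The following Python sketch (an illustration, not from the text) evaluates the speedup formula and shows both conclusions numerically; f is the fraction that benefits, and the second argument is N (or, in the generalized form discussed next, the enhancement speedup SUf):

```python
def amdahl_speedup(f, speedup_factor):
    """Overall speedup when a fraction f of the work is sped up by speedup_factor.

    With N parallel processors the enhanced fraction runs N times faster,
    so speedup = 1 / ((1 - f) + f / N).
    """
    return 1.0 / ((1.0 - f) + f / speedup_factor)

# Conclusion 1: a small parallel fraction gains little, even with many processors.
print(amdahl_speedup(f=0.10, speedup_factor=1000))        # ~1.11
# Conclusion 2: as N grows, speedup approaches 1 / (1 - f).
print(amdahl_speedup(f=0.90, speedup_factor=1_000_000))   # ~10, the 1/(1-f) limit
```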
Computer Organization &
Architecture(EET2211)
• Amdahl's law can be generalized to evaluate any design or technical improvement in a computer system. Consider any enhancement to a feature of a system that results in a speedup. The speedup can be expressed as:
Speedup = (Performance after enhancement) / (Performance before enhancement) = (Execution time before enhancement) / (Execution time after enhancement)
Computer Organization &
Architecture(EET2211)
Amdahl’s Law for Multiprocessors
Computer Organization &
Architecture(EET2211)
Suppose that a feature of the system is used during execution a fraction of the time f before enhancement, and that the speedup of that feature after enhancement is SUf. Then the overall speedup of the system is:
Speedup = 1 / [(1 - f) + f / SUf]
Computer Organization &
Architecture(EET2211)
Computer Organization &
Architecture(EET2211)
• Little’s Law
• A fundamental and simple relation with broad applications is
Little’s Law [LITT61,LITT11]. We can apply it to almost any
system that is statistically in steady state, and in which there is
no leakage.
• we have a steady state system to which items arrive at an
average rate of λ items per unit time. The items stay in the
system an average of W units of time. Finally, there is an
average of L units in the system at any one time.
Little’s Law relates these three variables as L = λ W
Computer Organization &
Architecture(EET2211)
• To summarize, under steady state conditions, the average number
of items in a queuing system equals the average rate at which items
arrive multiplied by the average time that an item spends in the
system.
• Consider a multicore system, with each core supporting multiple
threads of execution. At some level, the cores share a common
memory. The cores share a common main memory and typically
share a common cache memory as well.
• For this purpose, each user request is broken down into subtasks
that are implemented as threads. We then have λ = the average
rate of total thread processing required after all members’ requests
have been broken down into whatever detailed subtasks are
required. Define L as the average number of stopped threads
waiting during some relevant time. Then W= average response time.
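A brief numeric sketch of L = λW, using made-up numbers for the multicore thread scenario just described:

```python
def littles_law_items_in_system(arrival_rate, avg_time_in_system):
    """Little's Law: average number of items in the system, L = lambda * W."""
    return arrival_rate * avg_time_in_system

# Hypothetical multicore server: 400 thread subtasks arrive per second and the
# average response time per subtask is 0.05 s, so about 20 are in the system.
print(littles_law_items_in_system(arrival_rate=400, avg_time_in_system=0.05))  # 20.0
```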
Computer Organization &
Architecture(EET2211)
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 2, LECTURE 6
COMPUTER ORGANIZATION AND
ARCHITECTURE (COA)
CHAPTER 2 – PERFORMANCE ISSUES
6/2/2021
LECTURE 4
2
TOPICS TO BE COVERED
Ø Designing for performance
Ø Multicore, MICs and GPGPUs
Ø Amdahl’s & Little’s Law
Ø Basic measures of Computer performance
Ø Calculating the mean
LEARNING OBJECTIVES
6/2/2021
LECTURE 4
3
After studying this chapter, you should be able to:
v Understand the key performance issues that relate to computer
design.
v Explain the reasons for the move to multicore organization, and
understand the trade-off between cache and processor resources
on a single chip.
v Distinguish among multicore, MIC and GPGPU organizations.
v Summarize some of the issues in computer performance
assessment.
v Explain the differences among arithmetic, harmonic and
geometric means.
Overview of Previous Lecture
Designing for Performance:
Microprocessor Speed :
Pipelining:
Branch prediction:
Superscalar execution:
Data flow analysis:
Speculative execution:
Performance Balance:
Improvements in Chip Organization and Architecture:
MULTICORE, MICs, GPGPUs:
Amdahl's Law & Little's Law:
Computer Organization &Architecture(EET2211)
Contd.
6/2/2021
LECTURE 4
5
 AMDAHL'S LAW:
Speedup = (Time to execute program on a single processor) / (Time to execute program on N parallel processors)
        = [T(1 - f) + T·f] / [T(1 - f) + T·f/N]
        = 1 / [(1 - f) + f/N]
 LITTLE’S LAW:
We have a steady-state system to which items arrive at an average rate of λ items per unit time. The items stay in the system an average of W units of time. Finally, there is an average of L units in the system at any one time. Little's Law relates these three variables as L = λW.
Basic Measures of Computer Performance
6/2/2021
LECTURE 4
6
ØIn evaluating processor hardware and setting
requirements for new systems, performance is one of the
key parameters to consider, along with cost, size,
security, reliability and in some cases, power
consumption.
ØIn this section, we look at some traditional measures of
processor speed. In the next section, we examine
benchmarking, which is the most common approach to
assessing processor and computer system performance.
1. Clock Speed
ü Operations performed by a processor, such as fetching an instruction,
decoding the instruction, performing an arithmetic operation, and so on, are
governed by a system clock.
ü The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).
ü The rate of pulses is known as the clock rate, or clock speed. One
increment, or pulse, of the clock is referred to as a clock cycle, or a clock
tick. The time between pulses is the cycle time.
Computer Organization &Architecture(EET2211)
ü The clock rate is not arbitrary, but must be appropriate for the physical
layout of the processor.Actions in the processor require signals to be sent
from one processor element to another.
ü Most instructions on most processors require multiple clock cycles to
complete. Some instructions may take only a few cycles, while others require
dozens.
ü In addition, when pipelining is used, multiple instructions are being executed
simultaneously.
ü Thus, a straight comparison of clock speeds on different processors does not
tell the whole story about performance.
Computer Organization &Architecture(EET2211)
System Clock
Computer Organization &Architecture(EET2211)
2. Instruction Execution Rate
 A processor is driven by a clock with a constant frequency f or, equivalently,
a constant cycle time τ, where τ = 1/ f.
 The instruction count, Ic , for a program is the number of machine
instructions executed for that program until it runs to completion or for
some defined time interval.
 An important parameter is the average cycles per instruction (CPI) for a
program.
 The overall CPI is:
CPI = [ Σi (CPIi × Ii) ] / Ic
where CPIi is the number of cycles required for instructions of type i and Ii is the number of executed instructions of type i.
Computer Organization &Architecture(EET2211)
 The processor timeT needed to execute a given program can be expressed
as:
T= Ic × CPI × τ
 The above formula can be refined by recognizing that during the execution of an instruction, part of the time the work is done by the processor, and part of the time a word is being transferred to or from memory. The time to transfer depends on the memory cycle time, which may be greater than the processor cycle time.
T= Ic × [p + (m×k)] × τ
 Here p = number of processor cycles needed to decode and execute the
instruction, m= number of memory references needed and k = ratio
between memory cycle time and processor cycle time.
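A short Python sketch of the CPI and T equations above; the instruction mix, cycle counts and clock rate are hypothetical values chosen only for illustration:

```python
def overall_cpi(mix):
    """Overall CPI = sum(CPI_i * I_i) / Ic for an instruction mix.

    `mix` maps instruction type -> (cpi_i, count_i).
    """
    total_cycles = sum(cpi * count for cpi, count in mix.values())
    total_instructions = sum(count for _, count in mix.values())
    return total_cycles / total_instructions

def execution_time(ic, cpi, clock_hz):
    """T = Ic * CPI * tau, with tau = 1 / f."""
    return ic * cpi / clock_hz

# Hypothetical mix on a 400 MHz processor (absolute instruction counts).
mix = {"alu": (1, 8e6), "load/store": (3, 4e6), "branch": (2, 2e6), "other": (5, 1e6)}
cpi = overall_cpi(mix)                         # (8 + 12 + 4 + 5) / 15 ≈ 1.93
ic = sum(count for _, count in mix.values())   # 15 million instructions
print(cpi, execution_time(ic, cpi, clock_hz=400e6))   # ≈ 1.93, ≈ 0.0725 s
```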
Computer Organization &Architecture(EET2211)
 The five performance factors in the preceding equation (Ic, p, m, k, τ) are influenced by four system attributes: the design of the instruction set (known as the instruction set architecture); compiler technology (how effective the compiler is in producing an efficient machine language program from a high-level language program); processor implementation; and cache and memory hierarchy.
TABLE: Performance Factors and System Attributes
Computer Organization &Architecture(EET2211)
 A common measure of performance for a processor is the rate at which
instructions are executed, expressed as millions of instructions per second
(MIPS), referred to as the MIPS rate. We can express the MIPS rate in
terms of the clock rate and CPI as follows:
MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
 Another common performance measure deals only with floating-point instructions. These are common in many scientific and game applications. Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows:
MFLOPS rate = (Number of executed floating-point operations in a program) / (Execution time × 10^6)
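Continuing with hypothetical numbers, the two forms of the MIPS-rate equation can be checked against each other:

```python
def mips_from_time(ic, exec_time_s):
    """MIPS rate = Ic / (T * 10^6)."""
    return ic / (exec_time_s * 1e6)

def mips_from_clock(clock_hz, cpi):
    """MIPS rate = f / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)

# 2 million instructions at CPI 1.25 on a 500 MHz processor (made-up values).
ic, cpi, clock = 2e6, 1.25, 500e6
t = ic * cpi / clock                  # execution time T = Ic * CPI * tau = 0.005 s
print(mips_from_time(ic, t))          # 400.0 MIPS
print(mips_from_clock(clock, cpi))    # 400.0 MIPS (same result, as expected)
```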
Computer Organization &Architecture(EET2211)
Equation summary
6/2/2021
LECTURE 4
14
1. CPI = [ Σi (CPIi × Ii) ] / Ic
2. T = Ic × CPI × τ
3. T = Ic × [p + (m × k)] × τ
4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
5. MFLOPS rate = (Number of executed floating-point operations in a program) / (Execution time × 10^6)
Computer Organization &Architecture(EET2211)
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 2, LECTURE 7
COMPUTER ORGANIZATION AND
ARCHITECTURE (COA)
CHAPTER 2 – PERFORMANCE ISSUES
6/2/2021
Computer Organization &Architecture(EET2211)
2
TOPICS TO BE COVERED
Ø Designing for performance
Ø Multicore, MICs and GPGPUs
Ø Amdahl’s & Little’s Law
Ø Basic measures of Computer performance
Ø Calculating the mean
LEARNING OBJECTIVES
6/2/2021
Computer Organization &Architecture(EET2211)
3
After studying this chapter, you should be able to:
v Understand the key performance issues that relate to computer
design.
v Explain the reasons for the move to multicore organization, and
understand the trade-off between cache and processor resources
on a single chip.
v Distinguish among multicore, MIC and GPGPU organizations.
v Summarize some of the issues in computer performance
assessment.
v Explain the differences among arithmetic, harmonic and
geometric means.
Overview of Previous Lecture
1. CPI = [ Σi (CPIi × Ii) ] / Ic
2. T = Ic × CPI × τ
3. T = Ic × [p + (m × k)] × τ
4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
5. MFLOPS rate = (Number of executed floating-point operations in a program) / (Execution time × 10^6)
Computer Organization &Architecture(EET2211) 6/2/2021
4
CALCULATING THE MEAN
 In evaluating some aspect of computer system performance, it is often the
case that a single number, such as execution time or memory consumed, is
used to characterize performance and to compare systems.
 Especially in the field of benchmarking, single numbers are typically used for
performance comparison and this involves calculating the mean value of a set
of data points related to execution time.
 It turns out that there are multiple alternative algorithms that can be used for
calculating a mean value, and this has been the source of controversy in the
benchmarking field.
Computer Organization &Architecture(EET2211) 6/2/2021
5
 In this section, we define these alternative algorithms and comment on some
of their properties.
 The three common formulas used for calculating a mean are:
Arithmetic Mean
Geometric Mean
Harmonic Mean
Computer Organization &Architecture(EET2211) 6/2/2021
6
v Given a set of n real numbers (x1, x2, …,xn ), the three means are
defined as follows:
1.Arithmetic Mean
 An AM is an appropriate measure if the sum of all the measurements is a
meaningful and interesting value.The AM is a good candidate for comparing
the execution time performance of several systems.
 The AM used for a time-based variable (e.g., seconds), such as program
execution time, has the important property that it is directly proportional to
the total time.
AM = (x1 + x2 + … + xn) / n = (1/n) Σi xi
 We can conclude that the AM execution rate is proportional to the sum of
the inverse execution time.
Computer Organization &Architecture(EET2211) 6/2/2021
7
Computer Organization &Architecture(EET2211)
2. Harmonic Mean
 For some situations, a system’s execution rate may be viewed as a more
useful measure of the value of the system. This could be either the
instruction execution rate, measured in MIPS or MFLOPS, or a program
execution rate, which measures the rate at which a given type of program
can be executed.
 The HM is inversely proportional to the total execution time, which is the
desired property.
6/2/2021
8
Computer Organization &Architecture(EET2211)
 Let us look at a basic example and first examine how the AM performs.
Suppose we have a set of n benchmark programs and record the execution
times of each program on a given system as t1 , t2, …, tn.
 For simplicity, let us assume that each program executes the same number of
operations Z; we could weight the individual programs and calculate
accordingly but this would not change the conclusion of our argument.
 The execution rate for each individual program is Ri = Z/ti. We use the AM to calculate the average execution rate.
6/2/2021
9
 If we use the AM to calculate the average execution rate, we get:
AM of rates = (1/n) Σi Ri = (1/n) Σi (Z/ti) = (Z/n) Σi (1/ti)
We see that the AM execution rate is proportional to the sum of the inverse execution times, which is not the same as being inversely proportional to the sum of the execution times. Thus, the AM does not have the desired property.
 The HM yields the following result:
HM = n / Σi (1/Ri) = n / Σi (ti/Z) = nZ / Σi ti
The HM is inversely proportional to the total execution time, which is the desired property.
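A small Python check of this argument (execution times are hypothetical): the HM of the rates Ri = Z/ti equals nZ divided by the total execution time, while the AM of the rates does not track total time:

```python
def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)

# Hypothetical benchmark: each program executes Z = 1e8 operations.
Z = 1e8
times = [2.0, 0.75, 0.25]                    # execution times t_i in seconds
rates = [Z / t for t in times]               # execution rates R_i = Z / t_i

print(arithmetic_mean(rates) / 1e6)          # AM of the MFLOPS-style rates ≈ 194.4
print(harmonic_mean(rates) / 1e6)            # HM of the rates = 100.0
print((len(times) * Z / sum(times)) / 1e6)   # n*Z / total time = 100.0, same as HM
```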
 A simple numerical example will illustrate the difference between the two means in calculating a mean value of the rates, shown in the table below. The table compares the performance of three computers on the execution of two programs. For simplicity, we assume that the execution of each program results in the execution of 10^8 floating-point operations.
 The left half of the table shows the execution times for each computer running each program, the total execution time, and the AM of the execution times. Computer A executes in less total time than B, which executes in less total time than C, and this is also reflected by the AM.
 The right half of the table shows a comparison in terms of MFLOPS rate.
Computer Organization &Architecture(EET2211) 6/2/2021
11
Table 2.1 A Comparison of Arithmetic and Harmonic Means for Rates
6/2/2021
Computer Organization &Architecture(EET2211)
12
ü Based on the AM of the MFLOPS rates, computer A appears to be the fastest computer, and B appears slower than C, whereas B is in fact faster than C.
ü In terms of total execution time, A has the minimum time, so it is the fastest computer of the three.
ü The HM values correctly reflect the speed ordering of the computers. This confirms that the HM is preferred when calculating mean rates.
 There are two reasons for doing the individual calculations rather than only
looking at the aggregate numbers:
❶ A customer or researcher may be interested not only in the overall average
performance but also performance against different types of benchmark
programs, such as business applications, scientific modelling, multimedia
applications and system programs.
❷ Usually, the different programs used for evaluation are weighted differently.
In Table 2.1 it is assumed that the two test programs execute the same
number of operations. If that is not the case, we may want to weight
accordingly. Or different programs could be weighted differently to reflect
importance or priority.
Computer Organization &Architecture(EET2211) 6/2/2021
13
 Let us see what the result is if the test programs are weighted in proportion to the
number of operations executed. The weighted HM is therefore:
 We can see that the weighted HM is the quotient of the sum of the operation
count divided by the sum of the execution times.
Computer Organization &Architecture(EET2211) 6/2/2021
14
WHM = 1 / [ Σ_{i=1}^{n} (Zi / Σ_{j=1}^{n} Zj) × (1/Ri) ] = 1 / [ Σ_{i=1}^{n} (Zi / Σ_{j=1}^{n} Zj) × (ti/Zi) ] = Σ_{i=1}^{n} Zi / Σ_{i=1}^{n} ti
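A minimal sketch (the operation counts and execution times below are made-up values) showing that the weighted HM, with weights proportional to operation counts, reduces to total operations divided by total execution time:

# Weighted harmonic mean of rates, weights proportional to operation counts
ops = [2e8, 1e8]              # assumed operation counts Zi per benchmark
times = [4.0, 2.5]            # assumed execution times ti in seconds

rates = [z / t for z, t in zip(ops, times)]
weights = [z / sum(ops) for z in ops]

whm = 1.0 / sum(w / r for w, r in zip(weights, rates))
print(whm)                     # weighted HM of the rates
print(sum(ops) / sum(times))   # total operations / total time -> same value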
3. Geometric Mean
 Here we note that:
i. with respect to changes in values, the GM gives equal weight to all of the
values in the data set, and
ii. the GM of the ratios equals the ratio of the GMs (the equation is given
below).
GM = [ Π_{i=1}^{n} (xi/yi) ]^(1/n) = [ Π_{i=1}^{n} xi ]^(1/n) / [ Π_{i=1}^{n} yi ]^(1/n)
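This property is easy to check numerically. A minimal sketch with arbitrary positive values, chosen only for illustration:

from math import prod

def gm(values):
    return prod(values) ** (1.0 / len(values))

x = [2.0, 9.0, 4.0]    # arbitrary positive values
y = [1.0, 3.0, 8.0]

print(gm([xi / yi for xi, yi in zip(x, y)]))   # GM of the ratios
print(gm(x) / gm(y))                           # ratio of the GMs -> same value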
Computer Organization &Architecture(EET2211) 6/2/2021
15
 For use with execution times, as opposed to rates, one drawback of the
GM is that it may be non-monotonic relative to the AM.
 One property of the GM that has made it appealing for benchmark analysis is
that it provides consistent results when measuring the relative performance
of machines.
 This is in fact what benchmarks are used for, i.e., to compare one machine
with another in terms of performance metrics. The results are expressed as
values normalized to a reference machine.
 A simple example will illustrate the way in which the GM exhibits
consistency for normalized results. In Table 2.2, we use the same
performance results as were used in Table 2.1.
Computer Organization &Architecture(EET2211) 6/2/2021
16
Computer Organization &Architecture(EET2211)
Table 2.2 A Comparison of Arithmetic and Geometric Means for
Normalized Results
6/2/2021
17
Computer Organization &Architecture(EET2211)
Table 2.3 Another Comparison of Arithmetic and Geometric Means for
Normalized Results
6/2/2021
18
Why choose the GM?
1. As mentioned, the GM gives consistent results regardless of which system
is used as a reference. Because benchmarking is primarily a comparison
analysis, this is an important feature.
2. The GM is less biased by outliers than the HM or AM.
3. Distributions of performance ratios are better modelled by lognormal
distributions than by normal ones, because of the generally skewed
distribution of the normalized numbers. The GM can be described as the
back-transformed average of a lognormal distribution.
Computer Organization &Architecture(EET2211) 6/2/2021
19
 It can be shown that the following inequality holds:
AM ≥ GM ≥ HM
The values are equal only if x1 = x2 = … = xn.
 We can get a useful insight into these alternative calculations by defining the
Functional mean (FM).
Computer Organization &Architecture(EET2211) 6/2/2021
20
 Let f(x) be a continuous monotonic function defined on the interval
0 ≤ x < ∞. The functional mean with respect to the function f(x) for n positive
real numbers (x1, x2, …, xn) is defined as:
FM = f⁻¹( [f(x1) + … + f(xn)] / n ) = f⁻¹( (1/n) Σ_{i=1}^{n} f(xi) )
where f⁻¹(x) is the inverse of f(x).
 The mean values discussed above are special cases of the functional mean, defined as
follows:
i. AM is the FM with respect to f(x) = x
ii. GM is the FM with respect to f(x) = ln x
iii. HM is the FM with respect to f(x) = 1/ x
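A minimal numeric check of these three special cases (the values are arbitrary and chosen only for illustration); the output also illustrates the inequality AM ≥ GM ≥ HM:

from math import log, exp

def fm(values, f, f_inv):
    # Functional mean: apply f, average, then apply the inverse of f
    return f_inv(sum(f(x) for x in values) / len(values))

xs = [2.0, 8.0, 4.0]   # arbitrary positive values

am = fm(xs, lambda x: x,       lambda y: y)       # f(x) = x
gm = fm(xs, log,               exp)               # f(x) = ln x
hm = fm(xs, lambda x: 1.0/x,   lambda y: 1.0/y)   # f(x) = 1/x

print(am, gm, hm)      # approximately 4.67, 4.0, 3.43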
Computer Organization &Architecture(EET2211) 6/2/2021
21
REVIEW QUESTIONS
6/2/2021
Computer Organization &Architecture(EET2211)
22
1. List and briefly discuss the obstacles that arise when clock speed
and logic density increase.
2. What are the advantages of using a cache?
3. Briefly describe some of the methods used to increase processor
speed.
4. Briefly characterize Little’s law.
5. How can we determine the speed of a processor?
6. With respect to the system clock, define the terms clock rate,
clock cycle, and cycle time.
7. Define MIPS and MFLOPS.
8. When is harmonic mean an appropriate measure of the value of a
system?
9. Explain each variable that is related to Little’s law.
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 2, LECTURE 8
COMPUTER ORGANIZATION AND
ARCHITECTURE (COA)
CHAPTER 2 – PERFORMANCE ISSUES
6/2/2021
Computer Organization &Architecture(EET2211)
2
TOPICS TO BE COVERED
Ø Designing for performance
Ø Multicore, MICs and GPGPUs
Ø Amdahl’s & Little’s Law
Ø Basic measures of Computer performance
Ø Calculating the mean
LEARNING OBJECTIVES
6/2/2021
Computer Organization &Architecture(EET2211)
3
After studying this chapter, you should be able to:
v Understand the key performance issues that relate to computer
design.
v Explain the reasons for the move to multicore organization, and
understand the trade-off between cache and processor resources
on a single chip.
v Distinguish among multicore, MIC and GPGPU organizations.
v Summarize some of the issues in computer performance
assessment.
v Explain the differences among arithmetic, harmonic and
geometric means.
Overview of Previous Lecture
1. CPI = [ Σ_{i=1}^{n} (CPIi × Ii) ] / Ic
2. T = Ic × CPI × τ
3. T = Ic × [p + (m × k)] × τ
4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
5. MFLOPS rate = (Number of executed floating-point operations in a program) / (Execution time × 10^6)
Computer Organization &Architecture(EET2211) 6/2/2021
4
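These relations can be exercised with a small calculation. The following sketch uses made-up values (the instruction count, p, m, k and clock rate below are assumptions for illustration only) to evaluate formula 3 and then the MIPS rate from formula 4:

# Assumed values, for illustration only
Ic = 2_000_000    # instruction count
p = 1.5           # average processor cycles per instruction
m = 0.6           # average memory references per instruction
k = 4             # ratio of memory cycle time to processor cycle time
f = 500e6         # clock frequency in Hz
tau = 1.0 / f     # processor cycle time

CPI = p + m * k                # effective cycles per instruction
T = Ic * CPI * tau             # formula 3: T = Ic * [p + (m * k)] * tau
mips = Ic / (T * 1e6)          # formula 4; equivalently f / (CPI * 1e6)
print(CPI, T, mips)            # 3.9, about 0.0156 s, about 128 MIPS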
6/2/2021
Computer Organization &Architecture(EET2211)
5
FM = f⁻¹( [f(x1) + … + f(xn)] / n ) = f⁻¹( (1/n) Σ_{i=1}^{n} f(xi) )
v Benchmark Principles
Measures such as MIPS and MFLOPS have proven inadequate for evaluating
the performance of processors. Because of differences in instruction sets, the
instruction execution rate is not a valid means of comparing the performance
of different architectures.
v Characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across different
machines.
2. It is representative of a particular kind of programming domain or
paradigm, such as systems programming, numerical programming, or
commercial programming.
3. It can be measured easily.
4. It has wide distribution.
Computer Organization &Architecture
(EET 2211)
SPEC Benchmarks
 The common need in industry and academic and research communities for
generally accepted computer performance measurements has led to the
development of standardized
benchmark suites.
 A benchmark suite is a collection of programs, defined in a high-level
language, that together attempt to provide a representative test of a
computer in a particular application or system programming area.
 The best known such collection of benchmark suites is defined and
maintained by the Standard Performance Evaluation Corporation
(SPEC), an industry consortium.
Computer Organization &Architecture
(EET 2211)
Review Questions
2.1 List and briefly discuss the obstacles that arise when clock speed and logic density increase.
2.2 What are the advantages of using a cache?
2.3 Briefly describe some of the methods used to increase processor speed.
2.4 Briefly characterizeAmdahl’s law.
2.5 Define clock rate. Is it similar to clock speed?
2.6 Define MIPS and FLOPS.
2.7When is the Harmonic mean an appropriate measure of the value of a system?
2.8 Explain each variable that is related to Little’s Law.
Computer Organization &Architecture
(EET 2211)
Computer Organization &Architecture
(EET 2211)
PROBLEMS
2.1 What will be the overall speedup if N = 10 and f = 0.9?
Speedup = 1 / ((1 − f) + f/N) = 1/0.19 = 100/19 = 5.2632
Computer Organization &Architecture (EET 2211)
2.2 What fraction of the execution time must involve code that can be
executed in parallel to achieve an overall speedup of 2.25? Assume 15
parallel processors.
Here N = 15 and speedup = 2.25
Hence f = (1 − 1/2.25) / (1 − 1/15) ≈ 0.59
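Both answers follow directly from Amdahl's law, speedup = 1 / ((1 − f) + f/N). A minimal Python check:

def amdahl_speedup(f, N):
    # Overall speedup when a fraction f of the code runs on N parallel processors
    return 1.0 / ((1.0 - f) + f / N)

def fraction_for_speedup(target, N):
    # Solve Amdahl's law for f given a target overall speedup
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / N)

print(amdahl_speedup(0.9, 10))         # Problem 2.1 -> about 5.26
print(fraction_for_speedup(2.25, 15))  # Problem 2.2 -> about 0.595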
2.3 A doctor in a hospital observes that on average 6 patients arrive per
hour and that there are typically 3 patients in the hospital. What is the
average time each patient spends in the hospital?
Here λ = 6 and L= 3
According to Little’s Law i.e. L = λ W
Therefore,W= L/ λ = 0.5 hrs = 30mins
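The same check in code, using Little's law L = λW:

lam = 6.0    # arrival rate: 6 patients per hour
L = 3.0      # average number of patients in the hospital

W = L / lam                               # Little's law: L = lam * W
print(W, "hours =", W * 60, "minutes")    # 0.5 hours = 30 minutes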
Computer Organization &Architecture
(EET 2211)
2.4 Two benchmark programs are executed on three computers with the
following results:
Computer Organization &Architecture
(EET 2211)
Computer A Computer B Computer C
Program 1 50 20 10
Program 2 100 200 40
The table shows the execution time in seconds, with 10,000,000 instructions
executed in each of the two programs. Calculate the MIPS values for each
computer for each program. Then calculate the arithmetic and harmonic
means assuming equal weights for the two programs, and rank the computers
based on the arithmetic mean and the harmonic mean.
Computer Organization &Architecture
(EET 2211)
Computer A Computer B Computer C
Program 1 .2 .5 1
Program 2 .1 .05 .25
MIPS rate:
Computer A Computer B Computer C
AM rate .15 .275 .625
HM rate .133 .09 0.4
Mean calculation
Computer Organization &Architecture
(EET 2211)
Computer A Computer B Computer C
AM rate 3rd 2nd 1st
HM rate 2nd 3rd 1st
Rank
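The MIPS rates, means, and ranks above can be reproduced with the following sketch, using the execution times given in the problem:

# Execution times in seconds; 10,000,000 instructions per program
Ic = 10_000_000
times = {"A": [50, 100], "B": [20, 200], "C": [10, 40]}

for name, ts in times.items():
    mips = [Ic / (t * 1e6) for t in ts]           # MIPS per program
    am = sum(mips) / len(mips)                    # arithmetic mean
    hm = len(mips) / sum(1.0 / r for r in mips)   # harmonic mean
    print(name, "MIPS:", mips, "AM:", round(am, 3), "HM:", round(hm, 3))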
2.5Two benchmark programs are executed on three computers with
the following result:
a. Compute the arithmetic mean value for each system using
X as the reference machine and then using Y as the reference
machine. Argue that intuitively the three machines have roughly
equivalent performance and that the arithmetic mean gives
misleading results.
b. Compute the geometric mean value for each system
using X as the reference machine and then using Y as the
reference machine. Argue that the results are more realistic than
with the arithmetic mean.
Computer Organization &Architecture (EET 2211)
Computer Organization &Architecture
(EET 2211)
Benchmark
Processor
X Y Z
1 20 10 40
2 40 80 20
Normalized w.r.t X
Benchmark
Processor
X Y Z
1 1 .5 2
2 1 2 .5
AM 1 1.25 1.25
GM 1 1 1
Benchmark Processor
X Y Z
1 2 1 4
2 .5 1 .25
AM 1.25 1 2.125
GM 1 1 1
Computer Organization &Architecture
(EET 2211)
Normalized w.r.t Y
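The normalized tables above can be reproduced with the following sketch:

from math import prod

times = {"X": [20, 40], "Y": [10, 80], "Z": [40, 20]}   # benchmark times

def normalized_means(ref):
    out = {}
    for m, ts in times.items():
        ratios = [t / r for t, r in zip(ts, times[ref])]   # normalize to ref
        am = sum(ratios) / len(ratios)
        gm = prod(ratios) ** (1.0 / len(ratios))
        out[m] = (round(am, 3), round(gm, 3))
    return out

print("Normalized w.r.t X:", normalized_means("X"))
print("Normalized w.r.t Y:", normalized_means("Y"))
# The AM ranking changes with the reference machine, while the GM is 1 for
# all three machines in both cases, matching the tables above.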
Computer Organization &Architecture
(EET 2211)
PRACTICE QUESTIONS:
1. Let a program have 40% of its code enhanced to yield a system
speed of 4.3 times faster. What is the factor of improvement?
2. The following table, based on data reported in the literature
[HEAT84], shows the execution times, in seconds, for five different
benchmark programs on three machines.
a. Compute the speed metric for each processor for each benchmark,
normalized to machine R. Then compute the arithmetic mean value
for each system.
b. Repeat part (a) using M as the reference machine.
c. Which machine is the slowest based on each of the preceding two
calculations?
d. Repeat the calculations of parts (a) and (b) using the geometric
mean. Which machine is the slowest based on the two calculations?
Computer Organization &Architecture
(EET 2211)
3. Early examples of CISC and RISC design are the VAX 11/780
and the IBM RS/6000, respectively. Using a typical benchmark
program, the following machine characteristics result:
Computer Organization &Architecture
(EET 2211)
The final column shows that the VAX required 12 times longer
than the IBM, measured in CPU time.
a.What is the relative size of the instruction count of the
machine code for this benchmark program running on the
two machines?
b. What are the CPI values for the two machines?
4. A benchmark program is run on a 200 MHz processor. The executed
program consists of 1,000,000 instruction executions, with the
following instruction mix and clock cycle counts:
Computer Organization &Architecture
(EET 2211)
Determine the effective CPI and MIPS rate.
Instruction Type Instruction Count Cycles per Instruction
Integer arithmetic 400000 1
Data transfer 350000 2
Floating point 200000 3
Control transfer 50000 2
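A minimal sketch computing the answer to question 4 from the instruction mix above:

mix = {"Integer arithmetic": (400_000, 1),
       "Data transfer":      (350_000, 2),
       "Floating point":     (200_000, 3),
       "Control transfer":   (50_000, 2)}
f = 200e6   # 200 MHz clock, as stated in the question

Ic = sum(count for count, _ in mix.values())
CPI = sum(count * cpi for count, cpi in mix.values()) / Ic   # effective CPI
mips = f / (CPI * 1e6)                                       # MIPS rate
print(CPI, mips)    # effective CPI = 1.8, MIPS rate ≈ 111.1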
Computer Organization
and Architecture
(EET 2211)
Chapter 3
A Top-Level View of Computer Function and
Interconnection
6/2/2021 Computer Organization and Architecture 2
Learning Objectives:
After studying this chapter, you should be able to:
• Understand the basic elements of an instruction cycle and the role of
interrupts.
• Describe the concept of interconnection within a computer system.
• Assess the relative advantages of point-to-point interconnection
compared to bus interconnection.
• Present an overview of QPI.
• Present an overview of PCIe.
6/2/2021 Computer Organization and Architecture 3
Introduction:
• At a top level, a computer consists of CPU (central processing unit),
memory, and I/O components.
• At a top level, we can characterize a computer system by describing :
(1) the external behavior of each component, that is, the data and
control signals that it exchanges with other components, and
(2) the interconnection structure and the controls required to manage
the use of the interconnection structure.
6/2/2021 Computer Organization and Architecture 4
Contd.
• Top-level view of structure and function is important because it explains the
nature of a computer and also provides understanding about the
increasingly complex issues of performance evaluation.
• This chapter focuses on the basic structures used for computer component
interconnection.
• The chapter begins with a brief examination of the basic components and
their interface requirements.
• Then a functional overview is provided.
• Then the use of buses to interconnect system components has been
explained.
6/2/2021 Computer Organization and Architecture 5
3.1. Computer Components
All contemporary computer designs are based on the concepts of von
Neumann architecture. It is based on three key concepts:
• Data and instructions are stored in a single read–write memory.
• The contents of this memory are addressable by location, without
regard to the type of data contained there.
• Execution occurs in a sequential fashion (unless explicitly modified)
from one instruction to the next.
6/2/2021 Computer Organization and Architecture 6
Programming in hardware
ØThe fig.1 shows a customized hardware.
ØThe system accepts data and produces
results.
ØIf there is a particular computation to be
performed, a configuration of logic
components designed specifically for that
computation could be constructed.
Ø However, the hardware must be rewired every time a different computation is
needed.
Fig.1. Programming in H/W.
6/2/2021 Computer Organization and Architecture 7
Programming in software
• Instead of rewiring the hardware for each new
program, the programmer merely needs to
supply a new set of control signals.
• The fig.2 shows a general purpose hardware,
that will perform various functions on data
depending on control signals applied to the
hardware.
• The system accepts data and control signals
and produces results.
Fig.2. Programming in S/W.
6/2/2021 Computer Organization and Architecture 8
How to supply the control signals?
• The entire program is actually a sequence of steps. At each step,
some arithmetic or logical operation is performed on some data.
• For each step, a new set of control signals is needed. Provide a unique
code for each possible set of control signals and add to the general-
purpose hardware a segment that can accept a code and generate
control signals as shown in fig.2.
• Instead of rewiring the hardware for each new program, provide a
new sequence of codes.
• Each code is an instruction, and part of the hardware interprets each
instruction and generates control signals. To distinguish this new
method of programming, a sequence of codes or instructions is called
software.
6/2/2021 Computer Organization and Architecture 9
3.2 Computer Function
• The basic function performed by a computer is execution of a
program, which consists of a set of instructions stored in memory.
• The processor does the actual work by executing instructions
specified in the program.
• Instruction processing consists of two steps:
1. The processor reads (fetches) instructions from memory one at a
time and
2. Executes each instruction.
• Program execution consists of repeating the process of instruction
fetch and instruction execution. The instruction execution may involve
several operations and depends on the nature of the instruction
6/2/2021 Computer Organization and Architecture 10
Computer Components: Top-Level View
Fig.3. Computer Components :
Top-Level View
6/2/2021 Computer Organization and Architecture 11
Main Memory:
• Figure 3 illustrates these top-level components and suggests the
interactions among them.
ØMemory, or main memory:
• Instructions and data could be read in sequentially from an input device. But the
execution of a program may not always be sequential; it may jump around.
• Similarly, operations on data may require access to more than just one
element at a time in a predetermined sequence. Thus, there must be a
place to temporarily store both instructions and data. That module is called
memory, or main memory.
• The term ‘main memory’ has been used to distinguish it from external
storage or peripheral devices.
• Von Neumann stated that the same memory could be used to store both
instructions and data.
6/2/2021 Computer Organization and Architecture 12
Central Processing Unit (CPU):
• The CPU exchanges data with memory by using two internal (to the
CPU) registers:
1. Memory Address Register (MAR): It specifies the address in
memory for the next read or write.
2. Memory Buffer Register (MBR): It contains the data to be written
into memory or receives the data read from memory.
• The CPU also contains:
• (I/O AR): It is an I/O address register which specifies a particular I/O
device.
• (I/OBR): It is an I/O buffer register which is used for the exchange of
data between an I/O module and the CPU.
6/2/2021 Computer Organization and Architecture 13
Memory and I/O Module:
• Memory Module:
• It consists of a set of locations, defined by sequentially numbered
addresses.
• Each location contains a binary number that can be interpreted as
either an instruction or data.
• I/O module:
• It transfers data from external devices to CPU and memory, and vice
versa.
• It contains internal buffers for temporarily holding these data until
they can be sent on.
6/2/2021 Computer Organization and Architecture 14
Instruction Fetch and Execute:
• The processing required for a single instruction is called an
instruction cycle.
• There are two steps referred to as the fetch cycle and the execute
cycle as shown in the fig.4.
Fig.4. Basic Instruction Cycle
6/2/2021 Computer Organization and Architecture 15
Contd.
• At the beginning of each instruction cycle, the processor fetches an
instruction from memory.
• In a typical processor, a register called the program counter (PC)
holds the address of the instruction to be fetched next.
• Unless instructed otherwise, the processor always increments the PC after each
instruction fetch so that it will fetch the next instruction in sequence
(i.e., the instruction located at the next higher memory address).
6/2/2021 Computer Organization and Architecture 16
Contd.
• The fetched instruction is loaded into a register in the processor
known as the instruction register (IR).
• The instruction contains bits that specify the action the processor is
to take.
• The processor interprets the instruction and performs the required
action.
6/2/2021 Computer Organization and Architecture 17
Contd.
• The processor performs the following four actions:
• Processor-memory: Data may be transferred from processor to
memory or from memory to processor.
• Processor-I/O: Data may be transferred to or from a peripheral device
by transferring between the processor and an I/O module.
• Data processing: The processor may perform some arithmetic or logic
operation on data.
• Control: An instruction may specify that the sequence of execution be
altered.
6/2/2021 Computer Organization and Architecture 18
Characteristics of a Hypothetical Machine
Fig.5. Characteristics of a Hypothetical Machine
6/2/2021 Computer Organization and Architecture 19
6/2/2021 Computer Organization and Architecture 20
Contd.
• An instruction’s execution may involve a combination of these
actions:
• Let us consider an example using a hypothetical machine that
includes the characteristics listed in fig.5.
• The processor contains a single data register, called an accumulator
(AC).
• Both instructions and data are 16 bits long. Thus, it is convenient to
organize memory using 16-bit words.
• The instruction format provides 4 bits for the opcode, so that there
can be as many as 2^4 = 16 different opcodes, and
• Up to 2^12 = 4096 (4K) words of memory can be directly addressed.
6/2/2021 Computer Organization and Architecture 21
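The fetch-execute behaviour of this hypothetical machine can be sketched in a few lines of Python. The opcode assignments below (1 = load AC from memory, 2 = store AC to memory, 5 = add to AC from memory) and the sample program follow the textbook's worked example; they are reproduced here only for illustration:

# 16-bit words, a single accumulator (AC), 4-bit opcode, 12-bit address
memory = {0x300: 0x1940,   # LOAD AC from address 940
          0x301: 0x5941,   # ADD to AC from address 941
          0x302: 0x2941,   # STORE AC to address 941
          0x940: 0x0003,
          0x941: 0x0002}

pc, ac = 0x300, 0
for _ in range(3):                               # three instruction cycles
    instr = memory[pc]                           # fetch
    pc += 1                                      # increment program counter
    opcode, addr = instr >> 12, instr & 0x0FFF   # decode
    if opcode == 0x1:                            # load AC from memory
        ac = memory[addr]
    elif opcode == 0x5:                          # add to AC from memory
        ac = (ac + memory[addr]) & 0xFFFF
    elif opcode == 0x2:                          # store AC to memory
        memory[addr] = ac

print(hex(ac), hex(memory[0x941]))               # 0x5 is stored at address 941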
Basic Instruction Cycle:
Fig.6. Instruction Cycle State Diagram
6/2/2021 Computer Organization and Architecture 22
Contd.
• Fig.6. shows the state diagram of basic instruction cycle. The states
can be described as follows:
• Instruction address calculation (iac): Determine the address of the
next instruction to be executed. Usually, this involves adding a fixed
number to the address of the previous instruction.
• For example, if each instruction is 16 bits long and memory is
organized into 16-bit words, then add 1 to the previous address. If,
instead, memory is organized as individually addressable 8-bit bytes,
then add 2 to the previous address.
• Instruction fetch (if): Read instruction from its memory location into
the processor.
6/2/2021 Computer Organization and Architecture 23
Contd.
• Instruction operation decoding (iod): Analyze instruction to
determine type of operation to be performed and operand(s) to be
used.
• Operand address calculation (oac): If the operation involves
reference to an operand in memory or available via I/O, then
determine the address of the operand.
• Operand fetch (of): Fetch the operand from memory or read it in
from I/O.
• Data operation (do): Perform the operation indicated in the
instruction.
• Operand store (os): Write the result into memory or out to I/O.
6/2/2021 Computer Organization and Architecture 24
Contd.
• States in the upper part of fig.6. involve an exchange between the
processor and either memory or an I/O module.
• States in the lower part of the diagram involve only internal processor
operations.
• The oac state appears twice, because an instruction may involve a
read, a write, or both.
• However, the action performed during that state is fundamentally the
same in both cases, and so only a single state identifier is needed.
6/2/2021 Computer Organization and Architecture 25
Thank You !
6/2/2021 Computer Organization and Architecture 26
Computer Organization
and Architecture
(EET 2211)
Chapter 3
A Top-Level View of Computer Function and
Interconnection
6/2/2021 Computer Organization and Architecture 2
Interrupts
• Interrupt is a mechanism by which other modules (I/O, memory) may
interrupt the normal processing of the processor.
• Interrupts are provided primarily as a way to improve processing
efficiency.
• For example, most external devices are much slower than the
processor. Suppose that the processor is transferring data to a printer
using the instruction cycle scheme. After each write operation, the
processor must pause and remain idle until the printer catches up.
The length of this pause may be on the order of many hundreds or
even thousands of instruction cycles that do not involve memory.
Clearly, this is a very wasteful use of the processor.
6/2/2021 Computer Organization and Architecture 3
Classes of Interrupts
Fig.1. Classes of Interrupts
6/2/2021 Computer Organization and Architecture 4
Instruction Cycle with Interrupts
Fig.2. Instruction Cycle with Interrupts
6/2/2021 Computer Organization and Architecture 5
• To accommodate interrupts, an interrupt cycle is added to the
instruction cycle, as shown in fig.2.
• In the interrupt cycle, the processor checks to see if any interrupts
have occurred, indicated by the presence of an interrupt signal.
• If no interrupts are pending, the processor proceeds to the fetch cycle
and fetches the next instruction of the current program.
6/2/2021 Computer Organization and Architecture 6
• If an interrupt is pending, the processor does the following:
• It suspends execution of the current program being executed and
saves its context. This means saving the address of the next
instruction to be executed (current contents of the program counter)
and any other data relevant to the processor’s current activity.
• It sets the program counter to the starting address of an interrupt
handler routine.
6/2/2021 Computer Organization and Architecture 7
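The interrupt cycle described above can be sketched as a small Python simulation. The class, attribute names and handler address below are assumptions made only for illustration, not definitions from the text:

class CPU:
    def __init__(self):
        self.pc = 0                       # program counter
        self.interrupt_pending = False    # interrupt signal from an I/O module
        self.handler_address = 100        # start of the interrupt handler routine
        self.saved_pc = None              # saved context (return address)

def step(cpu, program):
    instr = program[cpu.pc]               # fetch cycle
    cpu.pc += 1
    print("executed", instr)              # execute cycle (trivial here)
    # Interrupt cycle: if an interrupt is pending, save the context and
    # set the PC to the starting address of the interrupt handler routine.
    if cpu.interrupt_pending:
        cpu.saved_pc = cpu.pc
        cpu.pc = cpu.handler_address
        cpu.interrupt_pending = False

cpu = CPU()
program = {0: "code-1", 1: "code-2", 100: "handler-1"}
step(cpu, program)                        # no interrupt: continue in sequence
cpu.interrupt_pending = True              # an I/O module raises an interrupt
step(cpu, program)                        # after executing, control transfers
print("next fetch from address", cpu.pc)  # 100 -> first handler instruction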
Interrupt handler
Fig.3. Transfer of Control via Interrupts
6/2/2021 Computer Organization and Architecture 8
• From the user program’s point of view, an interrupt is an interruption
of the normal sequence of execution.
• When the interrupt processing is completed, execution resumes as
shown in fig.3.
• The user program does not contain any special code to accommodate
interrupts; the processor and the operating system are responsible for
suspending the user program and then resuming it at the same point.
6/2/2021 Computer Organization and Architecture 9
• When the processor proceeds to the fetch cycle, it fetches the first
instruction in the interrupt handler program, which will service the
interrupt.
• The interrupt handler program is generally part of the operating
system which determines the nature of the interrupt and performs
whatever actions are needed.
• In fig.3. the handler determines which I/O module generated the
interrupt and may branch to a program that will write more data out
to that I/O module.
• When the interrupt handler routine is completed, the processor can
resume execution of the user program at the point of interruption.
6/2/2021 Computer Organization and Architecture 10
Program Flow of Control and program timing
without Interrupts
Fig.4(a) Flow control
• Fig.4. shows the program flow
of control with no interrupts.
• The user program performs a
series of WRITE calls interleaved
with processing.
• Code segments 1, 2, and 3 refer
to sequences of instructions
that do not involve I/O.
• The WRITE calls are to an I/O
program that is a system utility
and that will perform the actual
I/O operation.
Fig.4(b) Program Timing
6/2/2021 Computer Organization and Architecture 11
• The I/O program consists of three sections:
• A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O
operation. This may include copying the data to be output into a special buffer
and preparing the parameters for a device command.
• The actual I/O command. Without the use of interrupts, once this command is
issued, the program must wait for the I/O device to perform the requested
function (or periodically poll the device). The program might wait by simply
repeatedly performing a test operation to determine if the I/O operation is done.
• A sequence of instructions, labeled 5 in the figure, to complete the operation.
This may include setting a flag indicating the success or failure of the operation.
*Because the I/O operation may take a relatively long time to complete, the I/O
program is hung up waiting for the operation to complete; hence, the user program
is stopped at the point of the WRITE call for some considerable period of time.
6/2/2021 Computer Organization and Architecture 12
Program Flow of Control and Program Timing with
Interrupts: Short I/O wait
Fig.5(a) Flow Control
• With interrupts, the processor can be engaged
in executing other instructions while an I/O
operation is in progress.
• The I/O program that is invoked in this case
consists only of the preparation code and the
actual I/O command.
• After these few instructions have been
executed, control returns to the user program.
• Meanwhile, the external device is busy
accepting data from computer memory and
printing it.
• This I/O operation is conducted concurrently
with the execution of instructions in the user
program. Fig.5(b) Program
Timing
6/2/2021 Computer Organization and Architecture 13
• When the external device becomes ready to be serviced—that is,
when it is ready to accept more data from the processor—the I/O
module for that external device sends an interrupt request signal to
the processor.
• The processor responds by suspending operation of the current
program, branching off to a program to service that particular I/O
device, known as an interrupt handler, and resuming the original
execution after the device is serviced.
• The points at which such interrupts occur are indicated by an
asterisk (*) in fig.5.
6/2/2021 Computer Organization and Architecture 14
• Fig- 5(a) and 5(b) assume that the time required for the I/O operation is
relatively short: less than the time to complete the execution of
instructions between write operations in the user program.
• In this case, the segment of code labeled code segment 2 is interrupted.
• A portion of the code (2a) executes (while the I/O operation is performed)
and then the interrupt occurs (upon the completion of the I/O operation).
• After the interrupt is serviced, execution resumes with the remainder of
code segment 2 (2b).
6/2/2021 Computer Organization and Architecture 15
Program Flow of Control and Program Timing with
Interrupts: Long I/O wait
• Let us consider a typical case where the
I/O operation will take much more time
than executing a sequence of user
instructions (especially for a slow device
such as a printer) as shown in fig.6(a).
• In this case, the user program reaches
the second WRITE call before the I/O
operation spawned by the first call is
complete.
• The result is that the user program is
hung up at that point.
Fig.6(a) Flow Control Fig.6(b) Program
Timing
6/2/2021 Computer Organization and Architecture 16
• When the preceding I/O operation is completed, this new WRITE call
may be processed, and a new I/O operation may be started.
• Fig.6(b) shows the timing for this situation with the use of interrupts.
• We can see that there is still a gain in efficiency because part of the
time during which the I/O operation is under way overlaps with the
execution of user instructions.
6/2/2021 Computer Organization and Architecture 17
6/2/2021 Computer Organization and Architecture 18
Instruction Cycle State Diagram with Interrupts
Fig.7. Instruction Cycle State Diagram with Interrupts
Fig.7 shows a revised instruction cycle state diagram that includes interrupt cycle processing.
6/2/2021 Computer Organization and Architecture 19
Thank you !
6/2/2021 Computer Organization and Architecture 20
Computer Organization
and Architecture
(EET 2211)
Chapter 3
A Top-Level View of Computer Function and
Interconnection
6/2/2021 Computer Organization and Architecture 2
Interconnection Structures
• A computer consists of a set of components or modules of three basic
types (processor, memory, I/O) that communicate with each other.
• In effect, a computer is a network of basic modules.
• Thus, there must be paths for connecting the modules.
• The collection of paths connecting the various modules is called the
interconnection structure.
• The design of this structure will depend on the exchanges that must
be made among modules.
6/2/2021 Computer Organization and Architecture 3
Computer Modules
Fig.1. Computer Modules
• Fig.1. shows the types of
exchanges that are needed
by indicating the major forms
of input and output for each
module type.
• The wide arrows represent
multiple signal lines carrying
multiple bits of information
in parallel.
• Each narrow arrow represents a single signal line.
6/2/2021 Computer Organization and Architecture 4
Contd..
• Memory:
• Typically, a memory module will consist of N words of equal length.
• Each word is assigned a unique numerical address (0, 1,……., N-1).
• A word of data can be read from or written into the memory.
• The nature of the operation is indicated by read and write control
signals.
• The location for the operation is specified by an address.
6/2/2021 Computer Organization and Architecture 5
Contd..
• I/O module:
• From an internal (to the computer system) point of view, I/O is
functionally similar to memory.
• There are two operations; read and write.
• An I/O module may control more than one external device.
• Each of the interfaces to an external device is referred to as a port, and each
port is assigned a unique address (e.g., 0, 1, …, M-1).
• Also, there are external data paths for the input and output of data
with an external device.
• An I/O module may be able to send interrupt signals to the processor.
6/2/2021 Computer Organization and Architecture 6
Contd..
• Processor:
• The processor reads in instructions and data, writes out data after
processing, and uses control signals to control the overall operation of
the system.
• It also receives interrupt signals.
6/2/2021 Computer Organization and Architecture 7
Types of transfers
• The interconnection structure must support the following types of
transfers:
• Memory to processor: The processor reads an instruction or a unit of
data from memory.
• Processor to memory: The processor writes a unit of data to memory.
• I/O to processor: The processor reads data from an I/O device via an
I/O module.
• Processor to I/O: The processor sends data to the I/O device.
• I/O to or from memory: For these two cases, an I/O module is
allowed to exchange data directly with memory, without going
through the processor, using direct memory access.
6/2/2021 Computer Organization and Architecture 8
Bus Interconnection
• A bus is a communication pathway connecting two or more devices.
• The bus is a shared transmission medium.
• Multiple devices connect to the bus, and a signal transmitted by any
one device is available for reception by all other devices attached to
the bus.
• If two devices transmit during the same time period, their signals will
overlap and become garbled.
• Thus, only one device at a time can successfully transmit.
6/2/2021 Computer Organization and Architecture 9
Contd..
• A bus consists of multiple communication pathways, or lines.
• Each line is capable of transmitting signals representing binary 1 and binary
0.
• Hence a sequence of binary digits can be transmitted across a single line.
• Taken together, several lines of a bus can be used to transmit binary digits
simultaneously (in parallel).
• For example, an 8-bit unit of data can be transmitted over eight bus lines.
6/2/2021 Computer Organization and Architecture 10
System Bus:
• Computer systems contain a number of different buses that provide
pathways between components at various levels of the computer
system hierarchy.
• A bus that connects major computer components (processor, memory,
I/O) is called a system bus.
• The most common computer interconnection structures are based on
the use of one or more system buses.
• A system bus typically consists of about fifty to several hundred separate lines.
Each line is assigned a particular meaning or function.
6/2/2021 Computer Organization and Architecture 11
Types of System Bus:
• Although there are many different bus designs, on any bus the
lines can be classified into three functional groups:
• Data lines
• Address lines, and
• Control lines.
6/2/2021 Computer Organization and Architecture 12
Bus Interconnection Scheme
Fig.2. Bus Interconnection Scheme
6/2/2021 Computer Organization and Architecture 13
Data lines:
• The data lines provide a path for moving data among system modules.
• These lines, collectively, are called the data bus.
• The data bus may consist of 32, 64, 128, or even more separate lines.
• The number of lines is referred to as the width of the data bus.
• Because each line can carry only one bit at a time, the number of
lines determines how many bits can be transferred at a time.
• The width of the data bus is a key factor in determining overall system
performance.
• For example, if the data bus is 32 bits wide and each instruction is 64
bits long, then the processor must access the memory module twice
during each instruction cycle.
6/2/2021 Computer Organization and Architecture 14
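A one-line check of the example above; the helper function is hypothetical and is shown only to make the arithmetic explicit:

from math import ceil

def memory_accesses(instruction_bits, bus_width_bits):
    # Number of bus transfers needed to fetch one instruction
    return ceil(instruction_bits / bus_width_bits)

print(memory_accesses(64, 32))    # 2 accesses per instruction, as in the example
print(memory_accesses(64, 128))   # 1 access with a wider data bus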
 

Similar to COA Complete Notes.pdf (20)

Computer Evolution
Computer EvolutionComputer Evolution
Computer Evolution
 
ES UNIT-I.pptx
ES UNIT-I.pptxES UNIT-I.pptx
ES UNIT-I.pptx
 
Lecturer1 introduction to computer architecture (ca)
Lecturer1   introduction to computer architecture (ca)Lecturer1   introduction to computer architecture (ca)
Lecturer1 introduction to computer architecture (ca)
 
UNIT 1.docx
UNIT 1.docxUNIT 1.docx
UNIT 1.docx
 
CH02-COA9e.pptx
CH02-COA9e.pptxCH02-COA9e.pptx
CH02-COA9e.pptx
 
unit 1vlsi notes.pdf
unit 1vlsi notes.pdfunit 1vlsi notes.pdf
unit 1vlsi notes.pdf
 
HISTORY AND FUTURE TRENDS OF MULTICORE COMPUTER ARCHITECTURE
HISTORY AND FUTURE TRENDS OF MULTICORE COMPUTER ARCHITECTUREHISTORY AND FUTURE TRENDS OF MULTICORE COMPUTER ARCHITECTURE
HISTORY AND FUTURE TRENDS OF MULTICORE COMPUTER ARCHITECTURE
 
COA.pptx
COA.pptxCOA.pptx
COA.pptx
 
AN OVERVIEW OF MICROPROCESSORS AND ASSEMBLY LANGUAGE PROGRAMMING
AN OVERVIEW OF MICROPROCESSORS AND ASSEMBLY LANGUAGE PROGRAMMINGAN OVERVIEW OF MICROPROCESSORS AND ASSEMBLY LANGUAGE PROGRAMMING
AN OVERVIEW OF MICROPROCESSORS AND ASSEMBLY LANGUAGE PROGRAMMING
 
Introduction to Computers
Introduction to ComputersIntroduction to Computers
Introduction to Computers
 
Lect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdfLect-3 Evaluation of computer architecture.pptx.pdf
Lect-3 Evaluation of computer architecture.pptx.pdf
 
lect1.pdf
lect1.pdflect1.pdf
lect1.pdf
 
Embedded system Design
Embedded system DesignEmbedded system Design
Embedded system Design
 
Cpu
CpuCpu
Cpu
 
Fundamentals of Computer Design including performance measurements & quantita...
Fundamentals of Computer Design including performance measurements & quantita...Fundamentals of Computer Design including performance measurements & quantita...
Fundamentals of Computer Design including performance measurements & quantita...
 
Bharath technical seminar.pptx
Bharath technical seminar.pptxBharath technical seminar.pptx
Bharath technical seminar.pptx
 
Microcontroller pic 16f877 architecture and basics
Microcontroller pic 16f877 architecture and basicsMicrocontroller pic 16f877 architecture and basics
Microcontroller pic 16f877 architecture and basics
 
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
Design development-and-implementation-of-a temperature-sensor-using-zigbee-co...
 
Computer_Architecture&O_ECEG_3163_02_computer_evolution_performance.pptx
Computer_Architecture&O_ECEG_3163_02_computer_evolution_performance.pptxComputer_Architecture&O_ECEG_3163_02_computer_evolution_performance.pptx
Computer_Architecture&O_ECEG_3163_02_computer_evolution_performance.pptx
 
what is core-i
what is core-i what is core-i
what is core-i
 

Recently uploaded

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 

Recently uploaded (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 

COA Complete Notes.pdf

  • 13. FIRST GENERATION : VACUUM TUBES 6/2/2021 LECTURE 2 4 ü The first generation of computers used vacuum tubes for digital logic elements and memory. ü The most famous first-generation machine is the IAS computer, which is the basic prototype for all subsequent general-purpose computers. ü Its basic design approach is the stored-program concept, an idea proposed by von Neumann. ü It consists of (i) a main memory (which stores both data and instructions), (ii) an arithmetic and logic unit (ALU) capable of operating on binary data, (iii) a control unit (which interprets the instructions in memory and causes them to be executed), and (iv) input-output (I/O) equipment operated by the control unit.
  • 14. Contd. 6/2/2021 LECTURE 2 5 Fig.: IAS computer structure [Source: Computer Organization and Architecture by William Stallings]
  • 15. Contd. 6/2/2021 LECTURE 2 6 VON NEUMANN’S PROPOSAL 1) As the device is primarily a computer, it has to perform the elementary arithmetic operations. 2) The logical control of the device, i.e., the proper sequencing of its operations, can be most efficiently carried out by a central control unit. 3) Any device that is to carry out long and complicated sequences of operations must have a memory unit. 4) The device must have interconnections to transfer information from R (the outside recording medium of the device) into its specific parts C (CA + CC) and M (main memory); these interconnections form the specific part I (input). 5) The device must have interconnections to transfer information from its specific parts C and M into R; these form the specific part O (output).
  • 16. Contd. 6/2/2021 LECTURE 2 7 ü The memory of the IAS consists of 4096 storage locations (words of 40 binary digits/bits each). ü It stores both data and instructions. ü Numbers are represented in binary form and instructions are in binary codes. ü Each number is represented by a sign bit and a 39-bit value. ü A word may alternatively contain two 20-bit instructions. ü Each instruction consists of an 8-bit operation code (opcode), specifying the operation to be performed, and a 12-bit address designating one of the words in memory (numbered 0 through 4095). Fig.: IAS memory format [Source: Computer Organization and Architecture by William Stallings]
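To make the word layout concrete, here is a minimal sketch (illustrative Python only; the helper names and the left-instruction-first ordering are assumptions, not part of the lecture notes) that packs two opcode/address pairs into one 40-bit IAS word and unpacks them again:

    # One IAS word = two 20-bit instructions; each instruction = 8-bit opcode + 12-bit address.
    def pack_word(left, right):
        def pack(opcode, address):
            assert 0 <= opcode < 2**8 and 0 <= address < 2**12
            return (opcode << 12) | address            # 20-bit instruction
        return (pack(*left) << 20) | pack(*right)      # 40-bit word, left half first

    def unpack_word(word):
        def unpack(instr):
            return (instr >> 12) & 0xFF, instr & 0xFFF # (opcode, address)
        return unpack(word >> 20), unpack(word & 0xFFFFF)

    word = pack_word((0x01, 500), (0x21, 501))         # opcodes chosen arbitrarily
    print(f"{word:010X}")                              # the 40-bit word as 10 hex digits
    print(unpack_word(word))                           # ((1, 500), (33, 501))

Running the sketch shows how one 40-bit word can carry either a number or a pair of instructions, which is all that the memory format in the figure above conveys.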
  • 17. Contd. 6/2/2021 LECTURE 2 8 Table: IAS instruction set [Source: Computer Organization and Architecture by William Stallings]
  • 18. SECOND GENERATION : TRANSISTORS 6/2/2021 LECTURE 2 9 ü Vacuum tubes were replaced by transistors. ü The transistor is a solid-state device made from silicon; it is smaller, cheaper and generates less heat than a vacuum tube. ü Complex arithmetic and logic units, control units, high-level programming languages, and the provision of system software were introduced. ü E.g. the IBM 7094, in which data channels (independent I/O modules with their own processor and instruction set) were used. ü A multiplexor served as the central termination point for the data channels, the CPU and memory.
  • 19. THIRD GENERATION : INTEGRATED CIRCUITS 6/2/2021 LECTURE 2 10 ü An integrated circuit fabricates components such as transistors, resistors and capacitors on a single chip of silicon, replacing the discrete components of earlier generations. ü The two fundamental components required are gates and memory cells. ü Gates control the flow of data. ü A memory cell stores 1 bit of data. ü Growth is governed by Moore’s Law, which (in the form used here) states that the number of transistors on a chip doubles roughly every 18 months. Fig.: Fundamental computer elements [Source: Computer Organization and Architecture by William Stallings]
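As a rough worked example of the doubling rule quoted above (the starting figure is illustrative, not taken from the lecture notes): a 1971-era chip with about 2,300 transistors, doubling every 18 months, would grow to roughly 2,300 × 2^(10/1.5) ≈ 2,300 × 100 ≈ 230,000 transistors after ten years, which is the right order of magnitude for microprocessors of the early 1980s.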
  • 20. LATER GENERATIONS 6/2/2021 LECTURE 2 11 Two important developments of the later generations are: 1. SEMICONDUCTOR MEMORY : Integrated-circuit technology, first applied to the processor, was soon used to build memory as well; semiconductor memory is faster and smaller, and its cost has fallen steadily as physical memory density has increased. 2. MICROPROCESSORS : The microprocessor era began in 1971 with the Intel 4004, the first chip to contain all the components of a CPU on a single chip.
  • 21. Contd. 6/2/2021 LECTURE 2 12 Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
  • 22. Contd. 6/2/2021 LECTURE 2 13 Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
  • 24. EET 2211 4TH SEMESTER – CSE & CSIT CHAPTER 1, LECTURE 3 COMPUTER ORGANIZATION AND ARCHITECTURE (COA) EET- 2211
  • 25. CHAPTER 1 – BASIC CONCEPTS AND COMPUTER EVOLUTION 6/2/2021 LECTURE 3 2 TOPICS TO BE COVERED Ø The Evolution of the Intel x86 Architecture Ø Embedded Systems LEARNING OBJECTIVES Ø Present an overview of the evolution of the x86 architecture. Ø Define embedded systems. Ø List some of the requirements and constraints that various embedded systems meet. ALREADY COVERED Ø Organization and architecture Ø Structure and Function Ø A Brief History of Computers.
  • 26. THE EVOLUTION OF THE INTEL x86 ARCHITECTURE 6/2/2021 LECTURE 2 3 GENERATION APPROXIMATE DATES TECHNOLOGY TYPICAL SPEED (operations per second) 1 1946-1957 Vacuum tubes 40,000 2 1957-1964 Transistors 2,00,000 3 1965-1971 Small and medium scale integration 10,00,000 4 1972-1977 Large scale integration 1,00,00,000 5 1978-1991 Very large scale integration 10,00,00,000 6 1991- Ultra large scale integration >10,00,00,000 ü Microprocessors have grown steadily faster and more complex. ü For many years, Intel brought out a new generation of microprocessor about every four years.
  • 27. Contd. 6/2/2021 LECTURE 2 4 Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
  • 28. Contd. 6/2/2021 LECTURE 2 5 Table: Evolution of Intel Microprocessors [Source: Computer Organization and Architecture by William Stallings]
  • 29. Contd. 6/2/2021 LECTURE 3 6 MICROPROCESSOR DESCRIPTION 8080 ØThe world’s first general-purpose microprocessor. ØThis was an 8-bit machine, with an 8-bit data path to memory. ØIt was used in the first personal computer, the Altair. 8086 ØA more powerful 16-bit machine. ØIt had a wider data path, larger registers and an instruction cache/queue that prefetches a few instructions before they are executed. ØA variant of this processor, the 8088, was used in IBM’s first personal computer. ØIt marked the first appearance of the x86 architecture. 80286 ØAn extension of the 8086. ØIt enabled addressing a 16-MB memory instead of just 1 MB. 80386 ØIntel’s first 32-bit machine. ØIt rivaled the complexity and power of minicomputers and mainframes. ØIt was the first Intel processor to support multitasking.
  • 30. Contd. 6/2/2021 LECTURE 3 7 MICROPROCESSOR DESCRIPTION 80486 ØIt introduced the use of sophisticated and powerful cache technology and instruction pipelining. ØIt also used a built-in math co-processor helpful in offloading complex maths operations from the main CPU. Pentium ØIntel introduced the use of superscalar techniques. ØIt allows multiple instructions to execute in parallel. Pentium Pro ØIt followed the superscalar architecture with use of register renaming, branch prediction, data flow analysis and speculative execution. Pentium II ØIt incorporated Intel MMX technology which is designed specifically to process video, audio and graphics data efficiently.
  • 31. Contd. 6/2/2021 LECTURE 3 8 MICROPROCESSOR DESCRIPTION Pentium III ØIt incorporates additional floating-point instructions. ØThe Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions designed to increase performance, e.g., in digital signal processing and graphics processing. Pentium 4 ØIt includes additional floating-point and other enhancements for multimedia. Core ØIt is the first Intel x86 microprocessor with a dual core, referring to the implementation of two cores on a single chip. Core 2 ØIt extends the Core architecture to 64 bits. ØThe Core 2 Quad provides four cores on a single chip. ØAn important addition to the architecture was the Advanced Vector Extensions instruction set, which provided a set of 256-bit and then 512-bit instructions for efficient processing of vector data.
  • 32. EMBEDDED SYSTEMS 6/2/2021 LECTURE 3 9 ü The term refers to the use of electronics and software within a product. ü E.g. cell phones, digital cameras, video cameras, calculators, microwave ovens, home security systems, washing machines, lighting systems, thermostats, printers, various automotive systems, toothbrushes, and numerous types of sensors and actuators in automated systems. ü Embedded systems are generally tightly coupled to their environments.
  • 33. Contd. 6/2/2021 LECTURE 3 10 Fig.1: Organization of an Embedded System [Source: Computer Organization and Architecture by William Stallings]
  • 34. Contd. 6/2/2021 LECTURE 3 11 ELEMENTS THAT ARE DIFFERENT IN AN EMBEDDED SYSTEM FROM A TYPICAL DESKTOP/LAPTOP 1. There may be a variety of interfaces that enable the system to measure, manipulate and interact with the external environment. 2. The human interface can be either very simple or complicated. 3. The diagnostic port may be used for diagnosing the system. 4. Special-purpose FPGAs, ASICs or non-digital hardware can be used to increase performance. 5. Software often has a fixed function and is specific to the application. 6. They are optimized for energy use, code size, execution time, weight, dimensions and cost in order to increase efficiency.
  • 35. Contd. 6/2/2021 LECTURE 3 12 SIMILARITIES BETWEEN EMBEDDED SYSTEMS AND GENERAL-PURPOSE COMPUTERS 1. Even with nominally fixed-function software, the ability to upgrade to fix bugs, to improve security and to add functionality is very important for both. 2. Both support a wide variety of applications.
  • 36. Contd. 6/2/2021 LECTURE 3 13 INTERNET OF THINGS ü IoT is a system of interrelated computing devices, mechanical and digital machines provided with unique identifiers (UIDs) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction. ü The dominant theme is embedding short-range mobile transceivers into a wide array of gadgets and everyday items, enabling a form of communication between people and things. ü E.g. embedded systems, wireless sensor networks, control systems, automation (home and building), smart home (lighting fixtures, thermostats, home security systems, appliances). ü It refers to the expanding interconnection of smart devices (ranging from appliances to tiny sensors).
  • 39. Contd. 6/2/2021 LECTURE 3 16 üThe objects deliver sensor information, act on their environment, and modify themselves, creating overall management of a larger system. üThese devices are typically low-bandwidth, low-repetition data-capture and low-bandwidth data-usage appliances that communicate with each other and provide data through user interfaces. üWith reference to the end systems supported, the internet has gone through roughly four generations of deployment culminating in the IoT: 1. Information technology (IT) : PCs, servers, routers, firewalls and other IT devices bought by enterprise IT people, primarily using wired connectivity.
  • 40. Contd. 6/2/2021 LECTURE 3 17 2. Operational technology (OT): machines/appliances with embedded IT built by non-IT companies, such as medical machinery, SCADA (Supervisory Control and Data Acquisition) systems, process control and kiosks, bought as appliances by enterprise OT people and primarily using wired connectivity. 3. Personal technology : Smartphones, tablets and eBook readers bought as IT devices by consumers, exclusively using wireless connectivity. 4. Sensor/Actuator technology : Single-purpose devices bought by consumers, IT and OT people, exclusively using wireless connectivity, generally of a single form.
  • 41. Contd. 6/2/2021 LECTURE 3 18 EMBEDDED OPERATING SYSTEMS 1. The first approach is to take an existing OS and adapt it for the embedded application. E.g. there are embedded versions of Linux, Windows, Mac OS and other commercial operating systems specialized for embedded use. 2. The second approach is to design and implement an OS intended solely for embedded use. E.g. TinyOS (widely used in wireless sensor networks).
  • 42. Contd. 6/2/2021 LECTURE 3 19 APPLICATION PROCESSORS VERSUS DEDICATED PROCESSORS ü Application processors are defined by the processor’s ability to execute complex operating systems such as Linux, Android and Chrome. ü They are general-purpose in nature. ü A typical example of an embedded application processor is the one in a smartphone. ü Dedicated processors are dedicated to one or a small number of specific tasks required by the host device. ü Because the associated components are dedicated to a specific task, they can be engineered to reduce size and cost.
  • 44. Contd. 6/2/2021 LECTURE 3 21 Fig.3: Typical Microcontroller Chip Elements [Source: Computer Organization and Architecture by William Stallings]
  • 45. Contd. 6/2/2021 LECTURE 3 22 EMBEDDED vs. DEEPLY EMBEDDED SYSTEMS üDeeply embedded systems are dedicated, single purpose devices. üThey have wireless capability and appear in networked configurations (network of sensors over a large area like factory or agricultural field). üThey have extreme resource constraints in terms of memory, processor size, time and power consumption.
  • 46. QUESTIONS 6/2/2021 LECTURE 3 23 For each of the following examples, determine whether it is an embedded system, explaining why or why not. a) Are programs that understand physics and/or hardware embedded? For example, one that uses finite-element methods to predict fluid flow over airplane wings? b) Is the internal microprocessor controlling a disk drive an example of an embedded system? c) I/O drivers control hardware, so does the presence of an I/O driver imply that the computer executing the driver is embedded? d) Is a PDA (Personal Digital Assistant) an embedded system? e) Is the microprocessor controlling a cell phone an embedded system? f) Are the computers in a big phased-array radar considered embedded? These radars are 10-storey buildings with one to three 100-foot diameter radiating patches on sloped sides of the building. g) Is a traditional flight management system (FMS) built into an airplane cockpit considered embedded? h) Are the computers in a hardware-in-the-loop (HIL) simulator embedded? i) Is the computer controlling a pacemaker in a person’s chest an embedded computer? j) Is the computer controlling fuel injection in an automobile engine embedded?
  • 48. EET 2211 4TH SEMESTER – CSE & CSIT CHAPTER 1, LECTURE 4 COMPUTER ORGANIZATION AND ARCHITECTURE (COA)
  • 49. CHAPTER 1 – BASIC CONCEPTS AND COMPUTER EVOLUTION 6/2/2021 LECTURE 4 2 TOPICS TO BE COVERED Ø ARM Architecture Ø Cloud Computing LEARNING OBJECTIVES Ø Define embedded systems. Ø List some of the requirements and constraints that various embedded systems meet. Ø List the importance of cloud computing. ALREADY COVERED Ø The Evolution of the Intel x86 Architecture Ø Embedded Systems
  • 50. CISC vs. RISC 6/2/2021 LECTURE 4 3 ü Two important processor families are the Intel x86 and the ARM architectures. ü The x86 represents the complex instruction set computer (CISC) approach. ü The x86 incorporates the sophisticated design principles once found only on mainframes and supercomputers. ü The ARM architecture, used in a wide variety of embedded systems, is one of the most powerful and best-designed reduced instruction set computer (RISC) architectures. ü The x86 provides an excellent illustration of the advances in computer hardware over the past 35 years.
  • 51. Contd. 6/2/2021 LECTURE 4 4 CISC RISC Stands for Complex Instruction Set Computers. Stands for Reduced Instruction Set Computers. A full set of computer instructions that intends to provide the necessary capabilities in an efficient way. An instruction set architecture designed around a smaller number of computer instructions so that it can operate at a higher speed. The original microprocessor ISA. A redesigned ISA that emerged in the early 1980s. Hardware-centric design (the ISA does as much as possible using hardware circuitry). Software-centric design (high-level compilers take on most of the burden of coding and many software steps from the programmer). Instructions can take several clock cycles to execute. Most instructions execute in a single clock cycle.
  • 52. Contd. 6/2/2021 LECTURE 4 5 CISC RISC Pipelining is difficult. Pipelining is easy. Extensive use of microprogramming (where instructions are treated like small programs). Complexity resides in the compiler; there is only one layer of instructions. Complex and variable-length instructions. Simple, standardized instructions. Large number of instructions. Small number of fixed-length instructions. Compound addressing modes. Limited addressing modes. Fewer registers. Uses more registers. Requires a minimum amount of RAM. Requires more RAM. Uses a microprogrammed control unit; used in applications such as desktop computers and laptops. Uses a hardwired control unit; used in applications such as mobile phones and tablets.
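As a generic illustration of the contrast summarized in the two slides above (the mnemonics below are invented for illustration and do not belong to any particular instruction set): a CISC machine might multiply two memory operands with a single compound instruction such as MULT M1, M2, whereas a RISC machine would express the same work as a short sequence of simple, fixed-length instructions, e.g. LOAD R1, M1; LOAD R2, M2; MUL R1, R1, R2; STORE M1, R1 — leaving the compiler, rather than microcode, to manage the extra steps.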
  • 53. ARM ARCHITECTURE 6/2/2021 LECTURE 4 6 ü The ARM architecture evolved from RISC design principles. ü It is widely used in embedded systems. ARM EVOLUTION ü ARM is a family of RISC-based microcontrollers and microprocessors designed by ARM Holdings. ü ARM chips are high-speed processors. ü They have a small die size and require very little power. ü They are widely used in smartphones and other handheld devices, including game consoles and consumer products.
  • 54. Contd. 6/2/2021 LECTURE 4 7 üARM chips are the processors in Apple’s popular iPod and iPhone devices. üARM is the most widely used embedded processor architecture. üAcorn Computers developed the first commercial RISC processor, the Acorn RISC Machine (ARM). üThe ARM design matched the growing commercial need for a high-performance, low-power-consumption, small-size and low-cost processor for embedded applications.
  • 55. Contd. 6/2/2021 LECTURE 4 8 INSTRUCTION SET ARCHITECTURE ü The ARM instruction set is highly regular, designed for efficient implementation of the processor and efficient execution. ü All instructions are 32 bits long and follow a regular format. ü Related to the ARM ISA is the Thumb instruction set, which is a re-encoded subset of the ARM instruction set. ü Thumb is designed to increase the performance of ARM implementations that use a 16-bit or narrower memory data bus, and to allow better code density than the ARM instruction set provides. ü The Thumb instruction set contains a subset of the ARM 32-bit instructions recoded into 16-bit instructions.
  • 56. Contd. 6/2/2021 LECTURE 4 9 ARM PRODUCTS ü ARM Holdings licenses a number of specialized microprocessors and related technologies, but the bulk of their product line is the Cortex family of microprocessor architectures. ü There are 3 Cortex architectures, conveniently labeled with the initials A, R and M. 1. CORTEX-A/CORTEX-A50 2. CORTEX-R 3. CORTEX-M
  • 57. Contd. 6/2/2021 LECTURE 4 10 üCORTEX-A/CORTEX-A50 i. These are application processors. ii. They are intended for mobile devices such as smartphones and eBook readers, as well as consumer devices such as digital TVs and home gateways. iii. These processors run at higher clock frequencies. iv. They support an MMU (memory management unit), which is required for full-featured operating systems such as Linux, Android and the mobile versions of Windows. v. The two architectures use both the ARM and Thumb-2 instruction sets. vi. Cortex-A is a 32-bit machine and Cortex-A50 is a 64-bit machine.
  • 58. Contd. 6/2/2021 LECTURE 4 11 ü CORTEX-R i. It is designed to support real-time applications, in which the timing of events needs to be controlled with rapid response to events. ii. These processors run at a higher clock frequency and have a very low response latency. iii. The family includes enhancements both to the instruction set and to the processor organization to support deeply embedded real-time devices. iv. Most of these processors do not have an MMU (memory management unit); the limited data requirements and the limited number of simultaneous processes eliminate the need for elaborate hardware and software support for virtual memory. v. They do, however, typically provide an MPU (memory protection unit), cache and other memory features designed for industrial applications. vi. E.g. automotive braking systems, mass storage controllers, and networking and printing devices.
  • 59. Contd. 6/2/2021 LECTURE 4 12 ü CORTEX-M i. They have been developed primarily for the microcontroller domain, where the need for fast, highly deterministic interrupt management is coupled with the desire for extremely low gate count and the lowest possible power consumption. ii. They have an MPU but no MMU. iii. The family uses the Thumb-2 instruction set. iv. E.g. IoT devices, wireless sensor/actuator networks used in factories and other enterprises, automotive body electronics. v. There are currently 4 versions, viz. Cortex-M0, Cortex-M0+, Cortex-M3 and Cortex-M4.
  • 60. CLOUD COMPUTING 6/2/2021 LECTURE 4 13 üThe general concepts behind cloud computing were developed in the 1950s. üCloud services first became available in the early 2000s and were particularly targeted at large enterprises. üSince then, cloud computing has spread to small and medium-sized businesses and, more recently, to consumers. üEvernote, the cloud-based note-taking and archiving service, was launched in 2008. üApple’s iCloud was launched in 2012.
  • 61. Contd. 6/2/2021 LECTURE 4 14 BASIC CONCEPTS ü CLOUD COMPUTING : A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. ü Moving all information technology (IT) operations to an Internet-connected infrastructure is known as enterprise cloud computing. Fig.1 : Cloud Computing
  • 62. Contd. 6/2/2021 LECTURE 4 15 ü With cloud computing we get economies of scale, professional network management and professional security management. ü These features are attractive to companies, government agencies and individual PC and mobile users. ü The individual or company only needs to pay for the storage capacity and services they actually use. ü Setting up the database system, acquiring the needed hardware, doing maintenance and backing up the data are all part of the cloud service. ü The cloud provider also takes care of data security.
  • 63. Contd. 6/2/2021 LECTURE 4 16 CLOUD NETWORKING ü It refers to the networks and network management functionality that must be in place to enable cloud computing. ü It also refers to the collection of network capabilities required to access a cloud, including making use of specialized services over the internet, linking enterprise data centers to a cloud, and using firewalls and other network security devices at critical points to enforce access security policies. ü Cloud storage can be thought of as a subset of cloud computing. ü Cloud storage consists of database storage and database applications hosted remotely on cloud servers.
  • 64. Contd. 6/2/2021 LECTURE 4 17 TYPES OF CLOUD NETWORKS Fig.2: Cloud Networks
  • 65. Contd. 6/2/2021 LECTURE 4 18 CLOUD SERVICES ü A cloud service provider (CSP) maintains computing and data storage resources that are available over the internet or private networks. ü Customers can rent a portion of these resources as needed. ü All cloud services are provided using one of the three models: (i) SaaS (ii) PaaS (iii) IaaS. Fig.3: Cloud Services
  • 66. Contd. 6/2/2021 LECTURE 4 19 Fig.4: Alternative Information Technology Architectures [Source: Computer Organization and Architecture by William Stallings]
  • 67. Contd. SaaS – Software as a Service  Simply put, SaaS delivers application software to businesses over the Internet. SaaS is also called “on-demand software” and is priced on a pay-per-use basis. SaaS allows a business to reduce IT operational costs by outsourcing hardware and software maintenance and support to the cloud provider. SaaS is a rapidly growing market, as indicated by recent reports that predict ongoing double-digit growth. PaaS – Platform as a Service  PaaS is quite similar to SaaS, but rather than delivering finished software over the web, PaaS delivers a platform for creating software, which is then delivered over the web.  PaaS provides a computing platform and solution stack as a service. In this model, consumers create software using tools or libraries from the provider; the consumer also controls software deployment and configuration settings, while the provider supplies the networks, servers, storage and other services. Computer Organization and Architecture 20
  • 68. Contd. IaaS – Infrastructure as a Service  Infrastructure is the foundation of cloud computing.  It delivers computing as a shared service, reducing the investment in hardware as well as its operational and maintenance costs.  Infrastructure as a Service (IaaS) is a way of delivering cloud computing infrastructure – servers, storage, network and operating systems – as an on-demand service.  Rather than purchasing servers, software, datacenter space or network equipment, clients buy those resources as a fully outsourced service on demand. Computer Organization and Architecture 21
  • 69. OTHER SERVICES OF CLOUD COMPUTING Here, the components refer to the building blocks – cloud delivery platforms and the network’s front end and back end – which together form the cloud computing architecture. 1. Storage-as-a-service: Storage is made available at a remote site as if it were local; it is the most basic component and is often described as disk space on demand. 2. Database-as-a-service: This acts as a live, hosted database; the main aim of this component is to reduce the cost of the database by sharing software and hardware among many users. 3. Information-as-a-service: Data that can be accessed from anywhere is known as information-as-a-service; Internet banking, online news and much more are included in it. 4. Process-as-a-service: Process-as-a-service combines different resources, such as information and services; it is mainly helpful for mobile networks. Computer Organization and Architecture 22
  • 70. Contd. 5. Application-as-a-service: A complete application that is ready to use; it is the final front end for the users. Sample applications include Gmail, Google Calendar and many more. 6. Integration-as-a-service: This deals with components of an application that are built separately and need to integrate with other applications. 7. Security-as-a-service: This component is required by many customers because security is their first priority. 8. Management-as-a-service: This component is useful for the management of clouds. 9. Testing-as-a-service: This component refers to the testing of applications that are hosted remotely. Computer Organization and Architecture 23
  • 71. ADVANTAGES 1. Say ‘goodbye’ to costly systems: Cloud hosting enables businesses to keep expenditure to a minimum. As everything can be done in the cloud, the employees’ local systems have very little to do, saving the money that would otherwise be spent on costly devices. 2. Access from many options: Another advantage of cloud computing is that the cloud environment can be accessed not only from a desktop system but also from tablets, iPads, netbooks and even mobile phones. This not only increases efficiency but also enhances the services provided to consumers. 3. Software expense: Cloud infrastructure eliminates high software costs for businesses; much of the software is already installed on the cloud servers, removing the need to buy expensive software and pay licensing costs. 4. The cooked food: The cost of adding new employees is not inflated by setting up, installing and configuring applications on a new device; cloud applications are right at the employee’s desk, ready for work – like food that arrives already cooked. 5. Cloud storage: The cloud is a convenient platform for storing valuable information; storage can be inexpensive, ample and well protected, unlike a single local system. Computer Organization and Architecture 24
  • 72. Contd. 6. Lower traditional server costs: The cloud removes the large up-front costs of enterprise servers; the extra costs of adding memory, hard drive space and processing power are likewise avoided. 7. Data centralization: Another key benefit of cloud services is centralized data; information for multiple projects and different branch offices is stored in one location that can be accessed from remote places. 8. Data recovery: Cloud computing providers enable automatic data backup on the cloud system; recovering data after a hard drive crash is otherwise either impossible or very costly in money and time. 9. Sharing capabilities: Besides being accessible, documents and files stored in the cloud can be emailed and shared whenever required, so you can contribute from wherever you happen to be. 10. Cloud security: A cloud service vendor typically chooses highly secure data centers for your information; moreover, sensitive information in the cloud is protected with proper auditing, passwords and encryption. 11. Instant testing: The tools employed in cloud computing let you test a new product, application, feature, upgrade or load quickly; the infrastructure is available immediately, with the flexibility and scalability of a distributed testing environment. Computer Organization and Architecture 25
  • 73. DISADVANTAGES 1. Net connection: For cloud computing, an internet connection is a must in order to access your data. 2. Low bandwidth: With a low-bandwidth connection, the benefits of cloud computing cannot be fully utilized; even a high-bandwidth satellite connection can give poor performance because of high latency. 3. Affected quality: The internet is used for many purposes – streaming audio and video, downloading and uploading large files, printing from the cloud and so on – so the quality of a cloud computing connection can suffer when many people use the network at the same time. 4. Security issues: Cloud computing can keep data secure, but maintaining complete security usually requires the assistance and advice of an IT consulting firm; otherwise the business can become vulnerable to hackers and other threats. 5. Non-negotiable agreements: Some cloud computing vendors offer only non-negotiable contracts, which can be disadvantageous for many businesses. Computer Organization and Architecture 26
  • 74. Contd. 6. Cost comparison: Cloud software may look like the more affordable option compared with an in-house installation, but it is important to compare features; some features essential to your business may be missing from the cloud software, and sometimes you are charged extra for additional features you do not need. 7. No hard drive: As Steve Jobs, the late chairman of Apple, exclaimed, “I don’t need a hard disk on my computer if I can get to the server faster… carrying around these non-connected computers is byzantine by comparison.” But some people who use certain programs cannot do without an attached hard drive. 8. Lack of full support: Cloud-based services do not always provide proper customer support; some vendors are not reachable by e-mail or phone and expect consumers to rely on FAQs and online communities, so complete transparency is never offered. 9. Incompatibility: Sometimes there are problems of software incompatibility, as some applications, tools and software connect only to a particular personal computer. 10. Fewer insights into your network: Cloud computing companies do provide access to data such as CPU, RAM and disk utilization, but your insight into your own network becomes minimal; if there is a bug in your code, a hardware problem or anything else, it is impossible to fix the issue without being able to recognize it. 11. Minimal flexibility: The applications and services run on remote servers, so enterprises using cloud computing have minimal control over the functions of the software and hardware; the applications can never be run locally because the software is remote. Computer Organization and Architecture 27
  • 75. REVIEW QUESTIONS 6/2/2021 LECTURE 4 28 1. What in general is the distinction between computer organization and computer architecture? 2. What is the distinction between computer structure and computer functions? 3. What are the four main functions of a computer? 4. List and briefly define the main structural components of a computer. 5. List and briefly define the main structural components of a processor. 6. What is a stored program computer? 7. Explain Moore’s Law. 8. List and explain the key characteristics of computer family. 9. What is the key distinguishing feature of a microprocessor? 10. On the IAS, describe the process that the CPU must undertake to read a value from memory and write a value to memory in terms of what is put into MAR, MBR, address bus, data bus and control bus.
  • 77. Computer Organization and Architecture (EET 2211) Chapter-2 Lecture 01
  • 80. Designing for Performance • Year by year, the cost of computer systems continues to drop dramatically, while the performance and capacity of those systems continue to rise equally dramatically. • What is fascinating about all this from the perspective of computer organization and architecture is that, on the one hand, the basic building blocks for today’s computer miracles are virtually the same as those of the IAS computer from over 50 years ago, while on the other hand, the techniques for squeezing the maximum performance out of the materials at hand have become increasingly sophisticated. Computer Organization & Architecture(EET2211)
  • 81. • Here in this section, we highlight some of the driving factors behind the need to design for performance. • Microprocessor Speed: The evolution of these machines continues to bear out Moore’s law, as described in Chapter 1. Processor designers exploit that raw speed with techniques such as: Pipelining: the processor works on multiple instructions at the same time, with different instructions at different stages of execution. Branch prediction: the processor looks ahead and predicts which branches are likely to be taken, so the expected instructions can be fetched in advance. Superscalar execution: multiple instructions are issued per clock cycle, using several parallel pipelines. Data flow analysis: the processor analyzes which instructions depend on each other’s results in order to build an optimized schedule of instructions. Speculative execution: the processor executes instructions ahead of their actual appearance in the program, holding the results in temporary locations until they are known to be needed. Computer Organization & Architecture (EET2211)
  • 82. • Performance Balance: While processor power has raced ahead at breakneck speed, other critical components of the Computer have not kept up. The result is a need to look for performance balance: an adjustment/tuning of the organization and architecture to compensate for the mismatch among the capabilities of the various components. • The problem created by such mismatches is particularly critical at the interface between processor and main memory. • If memory or the pathway fails to keep pace with the processor’s insistent demands, the processor stalls in a wait state, and valuable processing time is lost. Computer Organization & Architecture(EET2211)
  • 83. A system architect can attack this problem in a number of ways, all of which are reflected in contemporary computer designs. Consider the following examples: • Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths. • Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip. • Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory. • Increase the interconnect bandwidth between processors and memory by using higher-speed buses and a hierarchy of buses to buffer and structure data flow. Computer Organization & Architecture(EET2211)
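A quick illustrative calculation (the numbers are made up for the example, not taken from the text) shows why wider paths and faster buses matter: a 64-bit (8-byte) memory bus clocked at 200 MHz can deliver at most 8 bytes × 200 × 10^6 transfers/s = 1.6 GB/s, so doubling the bus width to 128 bits doubles the peak transfer rate to 3.2 GB/s without changing the memory devices themselves.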
  • 84. Another area of design focus is the handling of I/O devices. As computers become faster and more capable, more sophisticated applications are developed that support the use of peripherals with intensive I/O demands. Typical I/O Device Data Rates Computer Organization & Architecture(EET2211)
  • 85. • The key in all this is balance. This design must constantly be rethought to cope with two constantly evolving factors: (i) The rate at which performance is changing in the various technology areas (processor, buses, memory, peripherals) differs greatly from one type of element to another. (ii) New applications and new peripheral devices constantly change the nature of the demand on the system in terms of typical instruction profile and the data access patterns. Computer Organization & Architecture(EET2211)
  • 86. • Improvements in Chip Organization and Architecture: As designers wrestle with the challenge of balancing processor performance with that of main memory and other computer components, the need to increase processor speed remains. There are three approaches to achieving increased processor speed: (i) Increase the hardware speed of the processor (ii) Increase the size and speed of caches (iii) Increase the effective speed of instruction execution Computer Organization & Architecture(EET2211)
  • 87. • Traditionally, the dominant factor in performance gains has been the increase in clock speed and logic density. However, as clock speed and logic density increase, a number of obstacles become more significant [INTE04]: Power: As the density of logic and the clock speed on a chip increase, so does the power density (watts/cm²). RC delay: The speed at which electrons can flow on a chip between transistors is limited by the resistance and capacitance of the metal wires connecting them; specifically, delay increases as the RC product increases. Memory latency and throughput: Memory access speed (latency) and transfer speed (throughput) lag processor speeds, as previously discussed. Computer Organization & Architecture(EET2211)
  • 88. MULTICORE, MICs, GPGPUs • With all of the difficulties cited in the preceding section in mind, designers have turned to a fundamentally new approach to improving performance: placing multiple processors on the same chip, with a large shared cache. The use of multiple processors on the same chip, also referred to as multiple cores or multicore, provides the potential to increase performance without increasing the clock rate. • Chip manufacturers are now in the process of making a huge leap forward in the number of cores per chip, with more than 50 cores per chip. The leap in performance as well as the challenges in developing software to exploit such a large number of cores has led to the introduction of a new term: many integrated core (MIC). Computer Organization & Architecture(EET2211)
  • 89. • The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip. At the same time, chip manufacturers are pursuing another design option: a chip with multiple general-purpose processors plus graphics processing units (GPUs) and specialized cores for video processing and other tasks. • The line between the GPU and the CPU continues to blur [AROR12, FATA08, PROP11]. When a broad range of applications are supported by such a processor, the term general-purpose computing on GPUs (GPGPU) is used. Computer Organization & Architecture(EET2211)
  • 90. Amdahl’s Law & Little’s Law • Amdahl’s Law • Amdahl’s law was first proposed by Gene Amdahl in 1967 ([AMDA67], [AMDA13]) and deals with the potential speedup of a program using multiple processors compared to a single processor. Illustration of Amdahl’s Law Computer Organization & Architecture(EET2211)
  • 91. From this equation two important conclusions can be drawn: 1. When f is small, the use of parallel processors has little effect. 2. As N approaches infinity, speedup is bound by 1/ (1 - f ), so that there are diminishing returns for using more processors. Computer Organization & Architecture(EET2211)
  • 92. • Amdahl’s law can be generalized to evaluate any design or technical improvement in a computer system. Consider any enhancement to a feature of a system that results in a speedup. The speedup can be expressed as: Speedup = (performance after enhancement) / (performance before enhancement) = (execution time before enhancement) / (execution time after enhancement). Computer Organization & Architecture(EET2211)
  • 93. Amdahl’s Law for Multiprocessors: Speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors) = [T(1 - f) + Tf] / [T(1 - f) + Tf/N] = 1 / [(1 - f) + f/N], where T is the execution time on a single processor and f is the fraction of the code that can be executed in parallel. Computer Organization & Architecture(EET2211)
  • 94. Suppose that a feature of the system is used during execution for a fraction f of the time, before enhancement, and that the speedup of that feature after enhancement is SUf. Then the overall speedup of the system is: Speedup = 1 / [(1 - f) + f/SUf]. Computer Organization & Architecture(EET2211)
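As a quick illustration of both forms of the law, here is a minimal Python sketch; it is not from the slides, and the function names and example values are purely illustrative:

```python
def speedup_parallel(f, n):
    """Amdahl's law for N parallel processors; f is the parallelizable fraction."""
    return 1.0 / ((1.0 - f) + f / n)

def speedup_enhanced(f, su_f):
    """Generalized form: f is the fraction of time the enhanced feature is used,
    su_f is the speedup of that feature after enhancement."""
    return 1.0 / ((1.0 - f) + f / su_f)

# With 90% of the code parallelizable and 10 processors:
# 1 / (0.1 + 0.09) = 1/0.19, about 5.26, well short of the ideal factor of 10.
print(speedup_parallel(0.9, 10))       # ~5.26
# As N grows without bound, the speedup approaches 1/(1 - f) = 10.
print(speedup_parallel(0.9, 10**9))    # ~10.0
# An enhancement used 40% of the time that runs 8 times faster:
print(speedup_enhanced(0.4, 8))        # ~1.54
```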
  • 96. • Little’s Law • A fundamental and simple relation with broad applications is Little’s Law [LITT61, LITT11]. We can apply it to almost any system that is statistically in steady state, and in which there is no leakage. • We have a steady-state system to which items arrive at an average rate of λ items per unit time. The items stay in the system an average of W units of time. Finally, there is an average of L units in the system at any one time. Little’s Law relates these three variables as L = λW Computer Organization & Architecture(EET2211)
  • 97. • To summarize, under steady state conditions, the average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time that an item spends in the system. • Consider a multicore system, with each core supporting multiple threads of execution. The cores share a common main memory and typically share a common cache memory as well. • For this purpose, each user request is broken down into subtasks that are implemented as threads. We then have λ = the average rate of total thread processing required after all users’ requests have been broken down into whatever detailed subtasks are required. Define L as the average number of stopped threads waiting during some relevant time period. Then W = the average response time. Computer Organization & Architecture(EET2211)
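Little's Law is easy to check numerically; the following Python sketch (illustrative only, not from the slides) solves L = λW for whichever variable is unknown:

```python
def avg_items(arrival_rate, avg_time_in_system):
    """Little's Law: L = lambda * W (steady state, no leakage)."""
    return arrival_rate * avg_time_in_system

def avg_time(arrival_rate, avg_items_in_system):
    """Rearranged for W: W = L / lambda."""
    return avg_items_in_system / arrival_rate

# Requests arrive at 6 per hour and on average 3 are in the system,
# so each request spends W = 3/6 = 0.5 hour (30 minutes) in the system.
print(avg_time(6, 3))      # 0.5
print(avg_items(6, 0.5))   # 3.0
```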
  • 98. EET 2211 4TH SEMESTER – CSE & CSIT CHAPTER 2, LECTURE 6 COMPUTER ORGANIZATION AND ARCHITECTURE (COA)
  • 99. CHAPTER 2 – PERFORMANCE ISSUES 6/2/2021 LECTURE 4 2 TOPICS TO BE COVERED Ø Designing for performance Ø Multicore, MICs and GPGPUs Ø Amdahl’s & Little’s Law Ø Basic measures of Computer performance Ø Calculating the mean
  • 100. LEARNING OBJECTIVES 6/2/2021 LECTURE 4 3 After studying this chapter, you should be able to: v Understand the key performance issues that relate to computer design. v Explain the reasons for the move to multicore organization, and understand the trade-off between cache and processor resources on a single chip. v Distinguish among multicore, MIC and GPGPU organizations. v Summarize some of the issues in computer performance assessment. v Explain the differences among arithmetic, harmonic and geometric means.
  • 101. Overview of Previous Lecture Designing for Performance: Microprocessor Speed: Pipelining: Branch prediction: Superscalar execution: Data flow analysis: Speculative execution: Performance Balance: Improvements in Chip Organization and Architecture: MULTICORE, MICs, GPGPUs: Amdahl’s Law & Little’s Law: Computer Organization &Architecture(EET2211)
  • 102. Contd. 6/2/2021 LECTURE 4 5  AMDAHL’S LAW: Speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors) = [T(1 - f) + Tf] / [T(1 - f) + Tf/N] = 1 / [(1 - f) + f/N]  LITTLE’S LAW: We have a steady state system to which items arrive at an average rate of λ items per unit time. The items stay in the system an average of W units of time. Finally, there is an average of L units in the system at any one time. Little’s Law relates these three variables as L = λW
  • 103. Basic Measures of Computer Performance 6/2/2021 LECTURE 4 6 ØIn evaluating processor hardware and setting requirements for new systems, performance is one of the key parameters to consider, along with cost, size, security, reliability and, in some cases, power consumption. ØIn this section, we look at some traditional measures of processor speed. In the next section, we examine benchmarking, which is the most common approach to assessing processor and computer system performance.
  • 104. 1. Clock Speed ü Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock. ü The speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz). ü The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time. Computer Organization &Architecture(EET2211)
  • 105. ü The clock rate is not arbitrary, but must be appropriate for the physical layout of the processor. Actions in the processor require signals to be sent from one processor element to another. ü Most instructions on most processors require multiple clock cycles to complete. Some instructions may take only a few cycles, while others require dozens. ü In addition, when pipelining is used, multiple instructions are being executed simultaneously. ü Thus, a straight comparison of clock speeds on different processors does not tell the whole story about performance. Computer Organization &Architecture(EET2211)
  • 106. System Clock Computer Organization &Architecture(EET2211)
  • 107. 2. Instruction Execution Rate  A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time τ, where τ = 1/f.  The instruction count, Ic, for a program is the number of machine instructions executed for that program until it runs to completion or for some defined time interval.  An important parameter is the average cycles per instruction (CPI) for a program.  The overall CPI: CPI = [ Σi=1..n (CPIi × Ii) ] / Ic, where CPIi is the number of cycles required for instruction type i and Ii is the number of executed instructions of type i. Computer Organization &Architecture(EET2211)
  • 108.  The processor time T needed to execute a given program can be expressed as: T = Ic × CPI × τ  The above formula can be refined by recognizing that during the execution of an instruction, part of the work is done by the processor, and part of the time a word is being transferred to or from memory. The time to transfer depends on the memory cycle time, which may be greater than the processor cycle time. T = Ic × [p + (m × k)] × τ  Here p = number of processor cycles needed to decode and execute the instruction, m = number of memory references needed, and k = ratio between memory cycle time and processor cycle time. Computer Organization &Architecture(EET2211)
  • 109.  The five performance factors in the preceding equation (Ic, p, m, k, τ) are influenced by four system attributes: the design of the instruction set (known as instruction set architecture); compiler technology (how effective the compiler is in producing an efficient machine language program from a high-level language program); processor implementation; and cache and memory hierarchy. TABLE: Performance Factors and System Attributes Computer Organization &Architecture(EET2211)
  • 110.  A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS), referred to as the MIPS rate. We can express the MIPS rate in terms of the clock rate and CPI as follows: MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)  Another common performance measure deals only with floating-point instructions. These are common in many scientific and game applications. Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows: MFLOPS rate = (number of executed floating-point operations in a program) / (execution time × 10^6) Computer Organization &Architecture(EET2211)
  • 111. Equation summary 6/2/2021 LECTURE 4 14 1. CPI = [ Σi=1..n (CPIi × Ii) ] / Ic 2. T = Ic × CPI × τ 3. T = Ic × [p + (m × k)] × τ 4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) 5. MFLOPS rate = (number of executed floating-point operations in a program) / (execution time × 10^6)
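The five equations chain together naturally; below is a small Python sketch in which the instruction mix and clock rate are invented purely for illustration:

```python
def overall_cpi(counts, cpis):
    """CPI = sum(CPI_i * I_i) / Ic, where Ic = sum(I_i)."""
    ic = sum(counts)
    return sum(c * p for c, p in zip(counts, cpis)) / ic

def exec_time(ic, cpi, tau):
    """T = Ic * CPI * tau, with tau = 1/f the cycle time in seconds."""
    return ic * cpi * tau

def mips_rate(ic, t):
    """MIPS = Ic / (T * 10^6), equivalently f / (CPI * 10^6)."""
    return ic / (t * 1e6)

counts = [45_000, 32_000, 15_000, 8_000]   # hypothetical instruction mix
cpis   = [1, 2, 2, 3]                      # cycles per instruction for each type
f_clock = 400e6                            # assumed 400 MHz clock
ic  = sum(counts)
cpi = overall_cpi(counts, cpis)            # weighted average cycles per instruction
t   = exec_time(ic, cpi, 1.0 / f_clock)    # program execution time in seconds
print(cpi, t, mips_rate(ic, t))            # MIPS also equals f_clock / (cpi * 1e6)
```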
  • 113.
  • 114. EET 2211 4TH SEMESTER – CSE & CSIT CHAPTER 2, LECTURE 7 COMPUTER ORGANIZATION AND ARCHITECTURE (COA)
  • 115. CHAPTER 2 – PERFORMANCE ISSUES 6/2/2021 Computer Organization &Architecture(EET2211) 2 TOPICS TO BE COVERED Ø Designing for performance Ø Multicore, MICs and GPGPUs Ø Amdahl’s & Little’s Law Ø Basic measures of Computer performance Ø Calculating the mean
  • 116. LEARNING OBJECTIVES 6/2/2021 Computer Organization &Architecture(EET2211) 3 After studying this chapter, you should be able to: v Understand the key performance issues that relate to computer design. v Explain the reasons for the move to multicore organization, and understand the trade-off between cache and processor resources on a single chip. v Distinguish among multicore, MIC and GPGPU organizations. v Summarize some of the issues in computer performance assessment. v Explain the differences among arithmetic, harmonic and geometric means.
  • 117. Overview of Previous Lecture 1. CPI = [ Σi=1..n (CPIi × Ii) ] / Ic 2. T = Ic × CPI × τ 3. T = Ic × [p + (m × k)] × τ 4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) 5. MFLOPS rate = (number of executed floating-point operations in a program) / (execution time × 10^6) Computer Organization &Architecture(EET2211) 6/2/2021 4
  • 118. CALCULATING THE MEAN  In evaluating some aspect of computer system performance, it is often the case that a single number, such as execution time or memory consumed, is used to characterize performance and to compare systems.  Especially in the field of benchmarking, single numbers are typically used for performance comparison and this involves calculating the mean value of a set of data points related to execution time.  It turns out that there are multiple alternative algorithms that can be used for calculating a mean value, and this has been the source of controversy in the benchmarking field. Computer Organization &Architecture(EET2211) 6/2/2021 5
  • 119.  In this section, we define these alternative algorithms and comment on some of their properties.  The three common formulas used for calculating a mean are: Arithmetic Mean Geometric Mean Harmonic Mean Computer Organization &Architecture(EET2211) 6/2/2021 6
  • 120. v Given a set of n real numbers (x1, x2, …, xn), the three means are defined as follows: 1. Arithmetic Mean  An AM is an appropriate measure if the sum of all the measurements is a meaningful and interesting value. The AM is a good candidate for comparing the execution time performance of several systems.  The AM used for a time-based variable (e.g., seconds), such as program execution time, has the important property that it is directly proportional to the total time. AM = (x1 + .... + xn) / n = (1/n) Σi=1..n xi  We can conclude that the AM execution rate is proportional to the sum of the inverse execution time. Computer Organization &Architecture(EET2211) 6/2/2021 7
  • 121. Computer Organization &Architecture(EET2211) 2. Harmonic Mean  For some situations, a system’s execution rate may be viewed as a more useful measure of the value of the system. This could be either the instruction execution rate, measured in MIPS or MFLOPS, or a program execution rate, which measures the rate at which a given type of program can be executed.  The HM is inversely proportional to the total execution time, which is the desired property. 6/2/2021 8
  • 122. Computer Organization &Architecture(EET2211)  Let us look at a basic example and first examine how the AM performs. Suppose we have a set of n benchmark programs and record the execution times of each program on a given system as t1, t2, …, tn.  For simplicity, let us assume that each program executes the same number of operations Z; we could weight the individual programs and calculate accordingly, but this would not change the conclusion of our argument.  The execution rate for each individual program is Ri = Z/ti. We use the AM to calculate the average execution rate. 6/2/2021 9
  • 123.  If we use the AM to calculate the average execution rate, we see that the AM execution rate is proportional to the sum of the inverse execution times, which is not the same as being inversely proportional to the sum of the execution times. Thus, the AM does not have the desired property.  The HM yields the following result: HM = n / [ Σi=1..n (1/Ri) ] = n / [ Σi=1..n (1/(Z/ti)) ] = nZ / [ Σi=1..n ti ] The HM is inversely proportional to the total execution time, which is the desired property. Computer Organization &Architecture(EET2211) 6/2/2021 10
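A tiny numeric check (values invented for illustration) makes the contrast concrete: the HM of the rates equals nZ divided by the total execution time, while the AM of the rates does not track total time.

```python
def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)

z = 1e8                          # operations executed by each benchmark program
times = [2.0, 0.75]              # execution times in seconds (illustrative)
rates = [z / t for t in times]   # per-program execution rates

print(arithmetic_mean(rates))        # tracks sum of 1/t_i, not the total time
print(harmonic_mean(rates))          # ~7.27e7
print(len(times) * z / sum(times))   # n*Z / sum(t_i): same value as the HM
```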
  • 124.  A simple numerical example will illustrate the difference between the two means in calculating a mean value of the rates, shown in the table below. The table compares the performance of three computers on the execution of two programs. For simplicity, we assume that the execution of each program results in the execution of 10^8 floating-point operations.  The left half of the table shows the execution times for each computer running each program, the total execution time, and the AM of the execution times. Computer A executes in less total time than B, which executes in less total time than C, and this is also reflected by the AM.  The right half of the table shows a comparison in terms of the MFLOPS rate. Computer Organization &Architecture(EET2211) 6/2/2021 11
  • 125. Table 2.1 A Comparison of Arithmetic and Harmonic Means for Rates 6/2/2021 Computer Organization &Architecture(EET2211) 12 ü Looking at the AM of the MFLOPS rates, computer C has the greatest value, which would suggest that C is the fastest computer; likewise, B appears slower than C, whereas B is in fact faster than C. ü In terms of total execution time, A has the minimum time, so it is the fastest computer of the three; the AM of the rates is therefore misleading. ü The HM values correctly reflect the speed ordering of the computers. This confirms that the HM is preferred when calculating a mean of rates.
  • 126.  There are two reasons for doing the individual calculations rather than only looking at the aggregate numbers: ❶ A customer or researcher may be interested not only in the overall average performance but also performance against different types of benchmark programs, such as business applications, scientific modelling, multimedia applications and system programs. ❷ Usually, the different programs used for evaluation are weighted differently. In Table 2.1 it is assumed that the two test programs execute the same number of operations. If that is not the case, we may want to weight accordingly. Or different programs could be weighted differently to reflect importance or priority. Computer Organization &Architecture(EET2211) 6/2/2021 13
  • 127.  Let us see what the result is if test programs are weighted proportional to the number of operations. The weighted HM is therefore: WHM = 1 / [ Σi=1..n (Zi / Σj=1..n Zj) × (1/Ri) ] = 1 / [ Σi=1..n (Zi / Σj=1..n Zj) × (ti/Zi) ] = ( Σj=1..n Zj ) / ( Σi=1..n ti )  We can see that the weighted HM is the quotient of the sum of the operation count divided by the sum of the execution times. Computer Organization &Architecture(EET2211) 6/2/2021 14
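A short check of the weighted-HM identity (illustrative numbers; the weights are the per-program operation counts divided by the total):

```python
def weighted_harmonic_mean(rates, weights):
    """WHM = 1 / sum(w_i / r_i), where the weights sum to 1."""
    return 1.0 / sum(w / r for w, r in zip(weights, rates))

ops   = [1e8, 5e7]                  # operations per program (illustrative)
times = [2.0, 0.5]                  # execution times in seconds
rates = [z / t for z, t in zip(ops, times)]
weights = [z / sum(ops) for z in ops]

print(weighted_harmonic_mean(rates, weights))  # 6.0e7
print(sum(ops) / sum(times))                   # same: total ops / total time
```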
  • 128. 3. Geometric Mean  Here we note that i. with respect to changes in values, the GM gives equal weight to all of the values in the data set ii. and the GM of the ratios equals the ratio of the GMs (equation is given below): GM = [ Πi=1..n (xi/yi) ]^(1/n) = [ Πi=1..n xi ]^(1/n) / [ Πi=1..n yi ]^(1/n) Computer Organization &Architecture(EET2211) 6/2/2021 15
  • 129.  For use with execution times, as opposed to rates, one drawback of the GM is that it may be non-monotonic relative to the AM.  One property of the GM that has made it appealing for benchmark analysis is that it provides consistent results when measuring the relative performance of machines.  This is in fact what benchmarks are used for, i.e., to compare one machine with another in terms of performance metrics. The results are expressed in terms of values normalized to a reference machine.  A simple example will illustrate the way in which the GM exhibits consistency for normalized results. In Table 2.2, we use the same performance results as were used in Table 2.1. Computer Organization &Architecture(EET2211) 6/2/2021 16
  • 130. Computer Organization &Architecture(EET2211) Table 2.2 A Comparison of Arithmetic and Geometric Means for Normalized Results 6/2/2021 17
  • 131. Computer Organization &Architecture(EET2211) Table 2.3 Another Comparison of Arithmetic and Geometric Means for Normalized Results 6/2/2021 18
  • 132. Why choose the GM? 1. As mentioned, the GM gives consistent results regardless of which system is used as a reference. Because benchmarking is primarily a comparison analysis, this is an important feature. 2. The GM is less biased by outliers than the HM or AM. 3. Distributions of performance ratios are better modelled by lognormal distributions than by normal ones, because of the generally skewed distribution of the normalized numbers. The GM can be described as the back-transformed average of a lognormal distribution. Computer Organization &Architecture(EET2211) 6/2/2021 19
  • 133.  It can be shown that the following inequality holds: AM ≥ GM ≥ HM The values are equal only if x1 = x2 = … = xn.  We can get a useful insight into these alternative calculations by defining the functional mean (FM). Computer Organization &Architecture(EET2211) 6/2/2021 20
  • 134.  Let f(x) be a continuous monotonic function defined in the interval 0 ≤ x < ∞. The functional mean with respect to the function f(x) for n positive real numbers (x1, x2, …, xn) is defined as: FM = f⁻¹[ (f(x1) + .... + f(xn)) / n ] = f⁻¹[ (1/n) Σi=1..n f(xi) ] where f⁻¹(x) is the inverse of f(x).  The mean values are also special cases of the functional mean, defined as follows: i. AM is the FM with respect to f(x) = x ii. GM is the FM with respect to f(x) = ln x iii. HM is the FM with respect to f(x) = 1/x Computer Organization &Architecture(EET2211) 6/2/2021 21
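The functional-mean definition can be exercised directly; the sketch below (illustrative values) recovers the AM, GM and HM as the f(x) = x, ln x and 1/x cases and shows AM ≥ GM ≥ HM:

```python
import math

def functional_mean(xs, f, f_inv):
    """FM with respect to f: f_inv of the average of f(x_i)."""
    return f_inv(sum(f(x) for x in xs) / len(xs))

xs = [2.0, 8.0]
am = functional_mean(xs, lambda x: x, lambda y: y)          # f(x) = x    -> 5.0
gm = functional_mean(xs, math.log, math.exp)                # f(x) = ln x -> 4.0
hm = functional_mean(xs, lambda x: 1 / x, lambda y: 1 / y)  # f(x) = 1/x  -> 3.2
print(am, gm, hm)                                           # 5.0 4.0 3.2
```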
  • 135. REVIEW QUESTIONS 6/2/2021 Computer Organization &Architecture(EET2211) 22 1. List and briefly discuss the obstacles that arise when clock speed and logic density increases. 2. What are the advantages of using a cache? 3. Briefly describe some of the methods used to increase processor speed. 4. Briefly characterize Little’s law. 5. How can we determine the speed of a processor? 6. With respect to the system clock define the terms of clock rate, clock cycle and cycle time. 7. Define MIPS and MFLOPS. 8. When is harmonic mean an appropriate measure of the value of a system? 9. Explain each variable that is related to Little’s law.
  • 136.
  • 137. EET 2211 4TH SEMESTER – CSE & CSIT CHAPTER 2, LECTURE 8 COMPUTER ORGANIZATION AND ARCHITECTURE (COA)
  • 138. CHAPTER 2 – PERFORMANCE ISSUES 6/2/2021 Computer Organization &Architecture(EET2211) 2 TOPICS TO BE COVERED Ø Designing for performance Ø Multicore, MICs and GPGPUs Ø Amdahl’s & Little’s Law Ø Basic measures of Computer performance Ø Calculating the mean
  • 139. LEARNING OBJECTIVES 6/2/2021 Computer Organization &Architecture(EET2211) 3 After studying this chapter, you should be able to: v Understand the key performance issues that relate to computer design. v Explain the reasons for the move to multicore organization, and understand the trade-off between cache and processor resources on a single chip. v Distinguish among multicore, MIC and GPGPU organizations. v Summarize some of the issues in computer performance assessment. v Explain the differences among arithmetic, harmonic and geometric means.
  • 140. Overview of Previous Lecture 1. CPI = [ Σi=1..n (CPIi × Ii) ] / Ic 2. T = Ic × CPI × τ 3. T = Ic × [p + (m × k)] × τ 4. MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) 5. MFLOPS rate = (number of executed floating-point operations in a program) / (execution time × 10^6) Computer Organization &Architecture(EET2211) 6/2/2021 4
  • 141. 6/2/2021 Computer Organization &Architecture(EET2211) 5 FM = f⁻¹[ (f(x1) + .... + f(xn)) / n ] = f⁻¹[ (1/n) Σi=1..n f(xi) ]
  • 142. v Benchmark Principles Measures such as MIPS and MFLOPS have proven inadequate for evaluating the performance of processors. Because of differences in instruction sets, the instruction execution rate is not a valid means of comparing the performance of different architectures. v Characteristics of a benchmark program: 1. It is written in a high-level language, making it portable across different machines. 2. It is representative of a particular kind of programming domain or paradigm, such as systems programming, numerical programming, or commercial programming. 3. It can be measured easily. 4. It has wide distribution. Computer Organization &Architecture (EET 2211)
  • 143. SPEC Benchmarks  The common need in industry and academic and research communities for generally accepted computer performance measurements has led to the development of standardized benchmark suites.  A benchmark suite is a collection of programs, defined in a high-level language, that together attempt to provide a representative test of a computer in a particular application or system programming area.  The best known such collection of benchmark suites is defined and maintained by the Standard Performance Evaluation Corporation (SPEC), an industry consortium. Computer Organization &Architecture (EET 2211)
  • 144. Review Questions 2.1 List and briefly discuss the obstacles that arise when clock speed and logic density increase. 2.2 What are the advantages of using a cache? 2.3 Briefly describe some of the methods used to increase processor speed. 2.4 Briefly characterize Amdahl’s law. 2.5 Define clock rate. Is it similar to clock speed? 2.6 Define MIPS and MFLOPS. 2.7 When is the harmonic mean an appropriate measure of the value of a system? 2.8 Explain each variable that is related to Little’s Law. Computer Organization &Architecture (EET 2211)
  • 145. Computer Organization &Architecture (EET 2211) PROBLEMS 2.1 What will be the overall speedup if N = 10 and f = 0.9? Speedup = 1 / [(1 - 0.9) + 0.9/10] = 1/0.19 = 100/19 = 5.2632
  • 146. Computer Organization &Architecture (EET 2211) 2.2 What fraction of the execution time must involve code that can be executed in parallel to achieve an overall speedup of 2.25? Assume 15 parallel processors. Here N = 15 and speedup = 2.25. Solving 2.25 = 1 / [(1 - f) + f/15] gives f ≈ 0.595
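The value of f can also be obtained by inverting Amdahl's law directly; a one-line check (illustrative sketch):

```python
def parallel_fraction(speedup, n):
    """Invert Amdahl's law: f = (1 - 1/S) / (1 - 1/N)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

print(parallel_fraction(2.25, 15))   # ~0.595
```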
  • 147. 2.3 A doctor in a hospital observes that on average 6 patients per hour arrive and there are typically 3 patients in the hospital. What is the average amount of time each patient spends in the hospital? Here λ = 6 per hour and L = 3. According to Little’s Law, L = λW. Therefore, W = L/λ = 0.5 hr = 30 min Computer Organization &Architecture (EET 2211)
  • 148. 2.4 Two benchmark programs are executed on three computers with the following results: Computer Organization &Architecture (EET 2211)
Program 1: Computer A = 50, Computer B = 20, Computer C = 10
Program 2: Computer A = 100, Computer B = 200, Computer C = 40
The table shows the execution time in seconds, with 10,000,000 instructions executed in each of the two programs. Calculate the MIPS values for each computer for each program. Then calculate the arithmetic and harmonic means assuming equal weights for the two programs, and rank the computers based on arithmetic mean and harmonic mean.
  • 149. Computer Organization &Architecture (EET 2211) MIPS rates:
Program 1: Computer A = 0.2, Computer B = 0.5, Computer C = 1.0
Program 2: Computer A = 0.1, Computer B = 0.05, Computer C = 0.25
Mean calculation:
AM rate: Computer A = 0.15, Computer B = 0.275, Computer C = 0.625
HM rate: Computer A = 0.133, Computer B = 0.09, Computer C = 0.4
  • 150. Computer Organization &Architecture (EET 2211) Rank:
AM rate: Computer A = 3rd, Computer B = 2nd, Computer C = 1st
HM rate: Computer A = 2nd, Computer B = 3rd, Computer C = 1st
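The MIPS values and both means above can be reproduced with a few lines of Python (an illustrative check, not part of the original solution):

```python
instructions = 10_000_000
times = {"A": [50, 100], "B": [20, 200], "C": [10, 40]}   # seconds, from the problem

for name, ts in times.items():
    mips = [instructions / (t * 1e6) for t in ts]         # MIPS for each program
    am = sum(mips) / len(mips)
    hm = len(mips) / sum(1 / m for m in mips)
    print(name, mips, round(am, 3), round(hm, 3))
# A [0.2, 0.1]  0.15  0.133
# B [0.5, 0.05] 0.275 0.091
# C [1.0, 0.25] 0.625 0.4
```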
  • 151. 2.5 Two benchmark programs are executed on three computers with the following results: a. Compute the arithmetic mean value for each system using X as the reference machine and then using Y as the reference machine. Argue that intuitively the three machines have roughly equivalent performance and that the arithmetic mean gives misleading results. b. Compute the geometric mean value for each system using X as the reference machine and then using Y as the reference machine. Argue that the results are more realistic than with the arithmetic mean. Computer Organization &Architecture (EET 2211)
  • 152. Computer Organization &Architecture (EET 2211) Execution times (seconds):
Benchmark 1: X = 20, Y = 10, Z = 40
Benchmark 2: X = 40, Y = 80, Z = 20
Normalized w.r.t. X:
Benchmark 1: X = 1, Y = 0.5, Z = 2
Benchmark 2: X = 1, Y = 2, Z = 0.5
AM: X = 1, Y = 1.25, Z = 1.25
GM: X = 1, Y = 1, Z = 1
  • 153. Normalized w.r.t. Y:
Benchmark 1: X = 2, Y = 1, Z = 4
Benchmark 2: X = 0.5, Y = 1, Z = 0.25
AM: X = 1.25, Y = 1, Z = 2.125
GM: X = 1, Y = 1, Z = 1
Computer Organization &Architecture (EET 2211)
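The normalized tables, and the GM's consistency across reference machines, can be regenerated with a short script; this is an illustrative sketch only:

```python
from math import prod

times = {"X": [20, 40], "Y": [10, 80], "Z": [40, 20]}   # benchmark times in seconds

def normalized_means(ref):
    for machine, ts in times.items():
        norm = [t / r for t, r in zip(ts, times[ref])]  # normalized execution times
        am = sum(norm) / len(norm)
        gm = prod(norm) ** (1 / len(norm))
        print(ref, machine, norm, round(am, 3), round(gm, 3))

normalized_means("X")   # AM: X=1, Y=1.25, Z=1.25; GM = 1 for all three machines
normalized_means("Y")   # AM: X=1.25, Y=1, Z=2.125; GM = 1 for all three machines
```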
  • 154. Computer Organization &Architecture (EET 2211) PRACTICE QUESTIONS: 1. Let a program have 40% of its code enhanced to yield a system speed of 4.3 times faster. What is the factor of improvement? 2. The following table, based on data reported in the literature [HEAT84], shows the execution times, in seconds, for five different benchmark programs on three machines. a. Compute the speed metric for each processor for each benchmark, normalized to machine R. Then compute the arithmetic mean value for each system. b. Repeat part (a) using M as the reference machine. c. Which machine is the slowest based on each of the preceding two calculations? d. Repeat the calculations of parts (a) and (b) using the geometric mean. Which machine is the slowest based on the two calculations?
  • 155. Computer Organization &Architecture (EET 2211) 3. Early examples of CISC and RISC design are the VAX 11/780 and the IBM RS/6000, respectively. Using a typical benchmark program, the following machine characteristics result:
  • 156. Computer Organization &Architecture (EET 2211) The final column shows that the VAX required 12 times longer than the IBM measured in CPU time. a. What is the relative size of the instruction count of the machine code for this benchmark program running on the two machines? b. What are the CPI values for the two machines?
  • 157. 4. A benchmark program is run first on a 200-MHz processor. The executed program consists of 1,000,000 instruction executions, with the following instruction mix and clock cycle count: Computer Organization &Architecture (EET 2211)
Integer arithmetic: instruction count = 400,000, cycles per instruction = 1
Data transfer: instruction count = 350,000, cycles per instruction = 2
Floating point: instruction count = 200,000, cycles per instruction = 3
Control transfer: instruction count = 50,000, cycles per instruction = 2
Determine the effective CPI and MIPS rate.
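One way to set up the arithmetic for question 4 (an illustrative sketch, not an official solution):

```python
clock_hz = 200e6                                                 # 200 MHz
mix = [(400_000, 1), (350_000, 2), (200_000, 3), (50_000, 2)]    # (count, CPI) pairs

ic = sum(count for count, _ in mix)                       # 1,000,000 instructions
cpi = sum(count * cycles for count, cycles in mix) / ic   # effective CPI
mips = clock_hz / (cpi * 1e6)                             # MIPS = f / (CPI * 10^6)
print(ic, cpi, round(mips, 1))                            # 1000000 1.8 111.1
```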
  • 158.
  • 160. Chapter 3 A Top-Level View of Computer Function and Interconnection 6/2/2021 Computer Organization and Architecture 2
  • 161. Learning Objectives: After studying this chapter, you should be able to: • Understand the basic elements of an instruction cycle and the role of interrupts. • Describe the concept of interconnection within a computer system. • Assess the relative advantages of point-to-point interconnection compared to bus interconnection. • Present an overview of QPI. • Present an overview of PCIe. 6/2/2021 Computer Organization and Architecture 3
  • 162. Introduction: • At a top level, a computer consists of CPU (central processing unit), memory, and I/O components. • At a top level, we can characterize a computer system by describing : (1) the external behavior of each component, that is, the data and control signals that it exchanges with other components, and (2) the interconnection structure and the controls required to manage the use of the interconnection structure. 6/2/2021 Computer Organization and Architecture 4
  • 163. Contd. • Top-level view of structure and function is important because it explains the nature of a computer and also provides understanding about the increasingly complex issues of performance evaluation. • This chapter focuses on the basic structures used for computer component interconnection. • The chapter begins with a brief examination of the basic components and their interface requirements. • Then a functional overview is provided. • Then the use of buses to interconnect system components has been explained. 6/2/2021 Computer Organization and Architecture 5
  • 164. 3.1. Computer Components All contemporary computer designs are based on the concepts of von Neumann architecture. It is based on three key concepts: • Data and instructions are stored in a single read–write memory. • The contents of this memory are addressable by location, without regard to the type of data contained there. • Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next. 6/2/2021 Computer Organization and Architecture 6
  • 165. Programming in hardware ØFig.1 shows customized hardware. ØThe system accepts data and produces results. ØIf there is a particular computation to be performed, a configuration of logic components designed specifically for that computation could be constructed. Ø However, the hardware must be rewired every time a different computation is needed. Fig.1. Programming in H/W. 6/2/2021 Computer Organization and Architecture 7
  • 166. Programming in software • Instead of rewiring the hardware for each new program, the programmer merely needs to supply a new set of control signals. • Fig.2 shows general-purpose hardware that will perform various functions on data depending on the control signals applied to the hardware. • The system accepts data and control signals and produces results. Fig.2. Programming in S/W. 6/2/2021 Computer Organization and Architecture 8
  • 167. How to supply the control signals? • The entire program is actually a sequence of steps. At each step, some arithmetic or logical operation is performed on some data. • For each step, a new set of control signals is needed. Provide a unique code for each possible set of control signals and add to the general-purpose hardware a segment that can accept a code and generate control signals as shown in fig.2. • Instead of rewiring the hardware for each new program, provide a new sequence of codes. • Each code is an instruction, and part of the hardware interprets each instruction and generates control signals. To distinguish this new method of programming, a sequence of codes or instructions is called software. 6/2/2021 Computer Organization and Architecture 9
  • 168. 3.2 Computer Function • The basic function performed by a computer is execution of a program, which consists of a set of instructions stored in memory. • The processor does the actual work by executing instructions specified in the program. • Instruction processing consists of two steps: 1. The processor reads (fetches) instructions from memory one at a time and 2. Executes each instruction. • Program execution consists of repeating the process of instruction fetch and instruction execution. The instruction execution may involve several operations and depends on the nature of the instruction 6/2/2021 Computer Organization and Architecture 10
  • 169. Computer Components: Top-Level View Fig.3. Computer Components : Top-Level View 6/2/2021 Computer Organization and Architecture 11
  • 170. Main Memory: • Figure 3 illustrates these top-level components and suggests the interactions among them. ØMemory, or main memory: • Instructions and data may be entered sequentially through an input device, but the execution of a program may not always be sequential; it may jump around. • Similarly, operations on data may require access to more than just one element at a time in a predetermined sequence. Thus, there must be a place to temporarily store both instructions and data. That module is called memory, or main memory. • The term ‘main memory’ has been used to distinguish it from external storage or peripheral devices. • Von Neumann stated that the same memory could be used to store both instructions and data. 6/2/2021 Computer Organization and Architecture 12
  • 171. Central Processing Unit (CPU): • The CPU exchanges data with memory by using two internal (to the CPU) registers: 1. Memory Address Register (MAR): It specifies the address in memory for the next read or write. 2. Memory Buffer Register (MBR): It contains the data to be written into memory or receives the data read from memory. • The CPU also contains: • I/O Address Register (I/OAR): It specifies a particular I/O device. • I/O Buffer Register (I/OBR): It is used for the exchange of data between an I/O module and the CPU. 6/2/2021 Computer Organization and Architecture 13
  • 172. Memory and I/O Module: • Memory Module: • It consists of a set of locations, defined by sequentially numbered addresses. • Each location contains a binary number that can be interpreted as either an instruction or data. • I/O module: • It transfers data from external devices to CPU and memory, and vice versa. • It contains internal buffers for temporarily holding these data until they can be sent on. 6/2/2021 Computer Organization and Architecture 14
  • 173. Instruction Fetch and Execute: • The processing required for a single instruction is called an instruction cycle. • There are two steps referred to as the fetch cycle and the execute cycle as shown in the fig.4. Fig.4. Basic Instruction Cycle 6/2/2021 Computer Organization and Architecture 15
  • 174. Contd. • At the beginning of each instruction cycle, the processor fetches an instruction from memory. • In a typical processor, a register called the program counter (PC) holds the address of the instruction to be fetched next. • Unless told otherwise, the processor always increments the PC after each instruction fetch so that it will fetch the next instruction in sequence (i.e., the instruction located at the next higher memory address). 6/2/2021 Computer Organization and Architecture 16
  • 175. Contd. • The fetched instruction is loaded into a register in the processor known as the instruction register (IR). • The instruction contains bits that specify the action the processor is to take. • The processor interprets the instruction and performs the required action. 6/2/2021 Computer Organization and Architecture 17
  • 176. Contd. • The processor performs the following four actions: • Processor-memory: Data may be transferred from processor to memory or from memory to processor. • Processor-I/O: Data may be transferred to or from a peripheral device by transferring between the processor and an I/O module. • Data processing: The processor may perform some arithmetic or logic operation on data. • Control: An instruction may specify that the sequence of execution be altered. 6/2/2021 Computer Organization and Architecture 18
  • 177. Characteristics of a Hypothetical Machine Fig.5. Characteristics of a Hypothetical Machine 6/2/2021 Computer Organization and Architecture 19
  • 178. 6/2/2021 Computer Organization and Architecture 20
  • 179. Contd. • An instruction’s execution may involve a combination of these actions: • Let us consider an example using a hypothetical machine that includes the characteristics listed in fig.5. • The processor contains a single data register, called an accumulator (AC). • Both instructions and data are 16 bits long. Thus, it is convenient to organize memory using 16-bit words. • The instruction format provides 4 bits for the opcode, so that there can be as many as 2^4 = 16 different opcodes, and • Up to 2^12 = 4096 (4K) words of memory can be directly addressed. 6/2/2021 Computer Organization and Architecture 21
  • 180. Basic Instruction Cycle: Fig.6. Instruction Cycle State Diagram 6/2/2021 Computer Organization and Architecture 22
  • 181. Contd. • Fig.6. shows the state diagram of basic instruction cycle. The states can be described as follows: • Instruction address calculation (iac): Determine the address of the next instruction to be executed. Usually, this involves adding a fixed number to the address of the previous instruction. • For example, if each instruction is 16 bits long and memory is organized into 16-bit words, then add 1 to the previous address. If, instead, memory is organized as individually addressable 8-bit bytes, then add 2 to the previous address. • Instruction fetch (if): Read instruction from its memory location into the processor. 6/2/2021 Computer Organization and Architecture 23
  • 182. Contd. • Instruction operation decoding (iod): Analyze instruction to determine type of operation to be performed and operand(s) to be used. • Operand address calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand. • Operand fetch (of): Fetch the operand from memory or read it in from I/O. • Data operation (do): Perform the operation indicated in the instruction. • Operand store (os): Write the result into memory or out to I/O. 6/2/2021 Computer Organization and Architecture 24
  • 183. Contd. • States in the upper part of fig.6. involve an exchange between the processor and either memory or an I/O module. • States in the lower part of the diagram involve only internal processor operations. • The oac state appears twice, because an instruction may involve a read, a write, or both. • However, the action performed during that state is fundamentally the same in both cases, and so only a single state identifier is needed. 6/2/2021 Computer Organization and Architecture 25
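To make the fetch-execute cycle concrete, here is a toy Python sketch of an accumulator machine with 16-bit words, a 4-bit opcode and a 12-bit address. The three opcode encodings (load, add, store) and the little three-instruction program are assumptions chosen for illustration, not the exact table from fig.5:

```python
def run(memory, pc):
    """Toy fetch-execute loop for a 16-bit accumulator machine."""
    ac = 0                                # accumulator
    while True:
        ir = memory[pc]                   # fetch: instruction goes into the IR
        pc += 1                           # increment the program counter
        opcode = ir >> 12                 # top 4 bits
        addr = ir & 0x0FFF                # low 12 bits
        if opcode == 0x1:                 # load AC from memory (assumed encoding)
            ac = memory[addr]
        elif opcode == 0x5:               # add memory word into AC (assumed encoding)
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == 0x2:               # store AC to memory (assumed encoding)
            memory[addr] = ac
        else:                             # any other opcode halts the toy machine
            return ac

# Three-instruction program: memory[0x941] = memory[0x940] + memory[0x941]
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x303: 0x0000,
          0x940: 0x0003, 0x941: 0x0002}
run(memory, pc=0x300)
print(memory[0x941])   # 5
```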
  • 184. Thank You ! 6/2/2021 Computer Organization and Architecture 26
  • 186. Chapter 3 A Top-Level View of Computer Function and Interconnection 6/2/2021 Computer Organization and Architecture 2
  • 187. Interrupts • Interrupt is a mechanism by which other modules (I/O, memory) may interrupt the normal processing of the processor. • Interrupts are provided primarily as a way to improve processing efficiency. • For example, most external devices are much slower than the processor. Suppose that the processor is transferring data to a printer using the instruction cycle scheme. After each write operation, the processor must pause and remain idle until the printer catches up. The length of this pause may be on the order of many hundreds or even thousands of instruction cycles that do not involve memory. Clearly, this is a very wasteful use of the processor. 6/2/2021 Computer Organization and Architecture 3
  • 188. Classes of Interrupts Fig.1. Classes of Interrupts 6/2/2021 Computer Organization and Architecture 4
  • 189. Instruction Cycle with Interrupts Fig.2. Instruction Cycle with Interrupts 6/2/2021 Computer Organization and Architecture 5
  • 190. • To accommodate interrupts, an interrupt cycle is added to the instruction cycle, as shown in fig.2. • In the interrupt cycle, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. • If no interrupts are pending, the processor proceeds to the fetch cycle and fetches the next instruction of the current program. 6/2/2021 Computer Organization and Architecture 6
  • 191. • If an interrupt is pending, the processor does the following: • It suspends execution of the current program being executed and saves its context. This means saving the address of the next instruction to be executed (current contents of the program counter) and any other data relevant to the processor’s current activity. • It sets the program counter to the starting address of an interrupt handler routine. 6/2/2021 Computer Organization and Architecture 7
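A schematic Python sketch of how the interrupt cycle bolts onto the basic fetch-execute loop; the class and helper names are invented for illustration, and only the program counter is saved as "context":

```python
class ToyCPU:
    """Just enough state to illustrate the interrupt-augmented instruction cycle."""
    def __init__(self, handler_addr):
        self.pc = 0
        self.saved_pc = None
        self.pending = False              # set when an I/O module raises an interrupt
        self.handler_addr = handler_addr

def instruction_cycle(cpu, memory, execute, interrupts_enabled=True):
    instr = memory[cpu.pc]                # fetch cycle
    cpu.pc += 1
    execute(instr)                        # execute cycle
    if interrupts_enabled and cpu.pending:   # interrupt cycle: check for pending interrupt
        cpu.saved_pc = cpu.pc             # save context (here, just the PC)
        cpu.pc = cpu.handler_addr         # transfer control to the interrupt handler
        cpu.pending = False

cpu = ToyCPU(handler_addr=0x100)
memory = {0: "instr0", 1: "instr1"}
cpu.pending = True                        # pretend a device signalled an interrupt
instruction_cycle(cpu, memory, execute=lambda instr: None)
print(hex(cpu.pc), cpu.saved_pc)          # 0x100 1: the PC now points at the handler
```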
  • 192. Interrupt handler Fig.3. Transfer of Control via Interrupts 6/2/2021 Computer Organization and Architecture 8
  • 193. • From the user program’s point of view, an interrupt is an interruption of the normal sequence of execution. • When the interrupt processing is completed, execution resumes as shown in fig.3. • The user program does not contain any special code to accommodate interrupts; the processor and the operating system are responsible for suspending the user program and then resuming it at the same point. 6/2/2021 Computer Organization and Architecture 9
  • 194. • When the processor proceeds to the fetch cycle, it fetches the first instruction in the interrupt handler program, which will service the interrupt. • The interrupt handler program is generally part of the operating system which determines the nature of the interrupt and performs whatever actions are needed. • In fig.3. the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. • When the interrupt handler routine is completed, the processor can resume execution of the user program at the point of interruption. 6/2/2021 Computer Organization and Architecture 10
  • 195. Program Flow of Control and program timing without Interrupts Fig.4(a) Flow control • Fig.4. shows the program flow of control with no interrupts. • The user program performs a series of WRITE calls interleaved with processing. • Code segments 1, 2, and 3 refer to sequences of instructions that do not involve I/O. • The WRITE calls are to an I/O program that is a system utility and that will perform the actual I/O operation. Fig.4(b) Program Timing 6/2/2021 Computer Organization and Architecture 11
  • 196. • The I/O program consists of three sections: • A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O operation. This may include copying the data to be output into a special buffer and preparing the parameters for a device command. • The actual I/O command. Without the use of interrupts, once this command is issued, the program must wait for the I/O device to perform the requested function (or periodically poll the device). The program might wait by simply repeatedly performing a test operation to determine if the I/O operation is done. • A sequence of instructions, labeled 5 in the figure, to complete the operation. This may include setting a flag indicating the success or failure of the operation. *Because the I/O operation may take a relatively long time to complete, the I/O program is hung up waiting for the operation to complete; hence, the user program is stopped at the point of the WRITE call for some considerable period of time. 6/2/2021 Computer Organization and Architecture 12
  • 197. Program Flow of Control and Program Timing with Interrupts: Short I/O wait Fig.5(a) Flow Control • With interrupts, the processor can be engaged in executing other instructions while an I/O operation is in progress. • The I/O program that is invoked in this case consists only of the preparation code and the actual I/O command. • After these few instructions have been executed, control returns to the user program. • Meanwhile, the external device is busy accepting data from computer memory and printing it. • This I/O operation is conducted concurrently with the execution of instructions in the user program. Fig.5(b) Program Timing 6/2/2021 Computer Organization and Architecture 13
  • 198. • When the external device becomes ready to be serviced—that is, when it is ready to accept more data from the processor—the I/O module for that external device sends an interrupt request signal to the processor. • The processor responds by suspending operation of the current program, branching off to a program to service that particular I/O device, known as an interrupt handler, and resuming the original execution after the device is serviced. • The points at which such interrupts occur are indicated by an asterisk (*) in fig.5. 6/2/2021 Computer Organization and Architecture 14
  • 199. • Fig- 5(a) and 5(b) assume that the time required for the I/O operation is relatively short: less than the time to complete the execution of instructions between write operations in the user program. • In this case, the segment of code labeled code segment 2 is interrupted. • A portion of the code (2a) executes (while the I/O operation is performed) and then the interrupt occurs (upon the completion of the I/O operation). • After the interrupt is serviced, execution resumes with the remainder of code segment 2 (2b). 6/2/2021 Computer Organization and Architecture 15
  • 200. Program Flow of Control and Program Timing with Interrupts: Long I/O wait • Let us consider a typical case where the I/O operation will take much more time than executing a sequence of user instructions (especially for a slow device such as a printer) as shown in fig.6(a). • In this case, the user program reaches the second WRITE call before the I/O operation spawned by the first call is complete. • The result is that the user program is hung up at that point. Fig.6(a) Flow Control Fig.6(b) Program Timing 6/2/2021 Computer Organization and Architecture 16
  • 201. • When the preceding I/O operation is completed, this new WRITE call may be processed, and a new I/O operation may be started. • Fig.6(b) shows the timing for this situation with the use of interrupts. • We can see that there is still a gain in efficiency because part of the time during which the I/O operation is under way overlaps with the execution of user instructions. 6/2/2021 Computer Organization and Architecture 17
  • 202. 6/2/2021 Computer Organization and Architecture 18
  • 203. Instruction Cycle State Diagram with Interrupts Fig.7. Instruction Cycle State Diagram with Interrupts Fig.7 shows a revised instruction cycle state diagram that includes interrupt cycle processing. 6/2/2021 Computer Organization and Architecture 19
  • 204. Thank you ! 6/2/2021 Computer Organization and Architecture 20
  • 206. Chapter 3 A Top-Level View of Computer Function and Interconnection 6/2/2021 Computer Organization and Architecture 2
  • 207. Interconnection Structures • A computer consists of a set of components or modules of three basic types (processor, memory, I/O) that communicate with each other. • In effect, a computer is a network of basic modules. • Thus, there must be paths for connecting the modules. • The collection of paths connecting the various modules is called the interconnection structure. • The design of this structure will depend on the exchanges that must be made among modules. 6/2/2021 Computer Organization and Architecture 3
  • 208. Computer Modules Fig.1. Computer Modules • Fig.1. shows the types of exchanges that are needed by indicating the major forms of input and output for each module type. • The wide arrows represent multiple signal lines carrying multiple bits of information in parallel. • Each narrow arrow represents a single signal line. 6/2/2021 Computer Organization and Architecture 4
  • 209. Contd.. • Memory: • Typically, a memory module will consist of N words of equal length. • Each word is assigned a unique numerical address (0, 1,……., N-1). • A word of data can be read from or written into the memory. • The nature of the operation is indicated by read and write control signals. • The location for the operation is specified by an address. 6/2/2021 Computer Organization and Architecture 5
  • 210. Contd.. • I/O module: • From an internal (to the computer system) point of view, I/O is functionally similar to memory. • There are two operations; read and write. • An I/O module may control more than one external device. • Each of the interfaces to an external device is referred to as a port, which is assigned a unique address (e.g., 0, 1,……….,M-1). • Also, there are external data paths for the input and output of data with an external device. • An I/O module may be able to send interrupt signals to the processor. 6/2/2021 Computer Organization and Architecture 6
  • 211. Contd.. • Processor: • The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system. • It also receives interrupt signals. 6/2/2021 Computer Organization and Architecture 7
  • 212. Types of transfers • The interconnection structure must support the following types of transfers: • Memory to processor: The processor reads an instruction or a unit of data from memory. • Processor to memory: The processor writes a unit of data to memory. • I/O to processor: The processor reads data from an I/O device via an I/O module. • Processor to I/O: The processor sends data to the I/O device. • I/O to or from memory: For these two cases, an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access. 6/2/2021 Computer Organization and Architecture 8
  • 213. Bus Interconnection • A bus is a communication pathway connecting two or more devices. • The bus is a shared transmission medium. • Multiple devices connect to the bus, and a signal transmitted by any one device is available for reception by all other devices attached to the bus. • If two devices transmit during the same time period, their signals will overlap and become garbled. • Thus, only one device at a time can successfully transmit. 6/2/2021 Computer Organization and Architecture 9
  • 214. Contd.. • A bus consists of multiple communication pathways, or lines. • Each line is capable of transmitting signals representing binary 1 and binary 0. • Hence a sequence of binary digits can be transmitted across a single line. • Taken together, several lines of a bus can be used to transmit binary digits simultaneously (in parallel). • For example, an 8-bit unit of data can be transmitted over eight bus lines. 6/2/2021 Computer Organization and Architecture 10
  • 215. System Bus: • Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy. • A bus that connects major computer components (processor, memory, I/O) is called a system bus. • The most common computer interconnection structures are based on the use of one or more system buses. • A system bus typically consists of about fifty to hundreds of separate lines. Each line is assigned a particular meaning or function. 6/2/2021 Computer Organization and Architecture 11
  • 216. Types of System Bus: • Although there are many different bus designs, but on any bus the lines can be classified into three functional groups: • Data lines • Address lines, and • Control lines. 6/2/2021 Computer Organization and Architecture 12
  • 217. Bus Interconnection Scheme Fig.2. Bus Interconnection Scheme 6/2/2021 Computer Organization and Architecture 13
  • 218. Data lines: • The data lines provide a path for moving data among system modules. • These lines, collectively, are called the data bus. • The data bus may consist of 32, 64, 128, or even more separate lines. • The number of lines is referred to as the width of the data bus. • Because each line can carry only one bit at a time, the number of lines determines how many bits can be transferred at a time. • The width of the data bus is a key factor in determining overall system performance. • For example, if the data bus is 32 bits wide and each instruction is 64 bits long, then the processor must access the memory module twice during each instruction cycle. 6/2/2021 Computer Organization and Architecture 14
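The 32-bit-bus / 64-bit-instruction example is just ceiling arithmetic; a tiny illustrative sketch:

```python
import math

def accesses_per_fetch(instruction_bits, data_bus_bits):
    """Bus transfers needed to fetch one instruction."""
    return math.ceil(instruction_bits / data_bus_bits)

print(accesses_per_fetch(64, 32))   # 2 memory accesses per instruction fetch
print(accesses_per_fetch(64, 64))   # 1 access with a 64-bit data bus
```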