CE412 -advanced computer Architecture lecture 1.pdf

Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Slide Sources: Patterson & Hennessy COD book
website (copyright Morgan Kaufmann) adapted
and supplemented
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
Asst. Prof,
CSE,
Velammal Engineering College

Course Objectives
• This course aims to teach the basic structure and operations of
a computer.
• The course covers the ALU, pipelined execution,
parallelism and multi-core processors.
• The course will enable students to understand memory
hierarchies, cache memories and virtual memories.

Course Outcomes
CO 1 Discuss the basic structure of computers, operations and
instructions.
CO 2 Design arithmetic and logic unit.
CO 3 Analyze pipelined execution and design control unit.
CO 4 Analyze parallel processing architectures.
CO 5 Examine the performance of various memory systems
CO 6 Organize the various I/O communications.

Syllabus
Unit Titles:
• Unit I Basic Structure of a Computer System
• Unit II Arithmetic for Computers
• Unit III Processor and Control Unit
• Unit IV Parallelism
• Unit V Memory & I/O Systems

Syllabus – Unit I
UNIT-I BASIC STRUCTURE OF A COMPUTER
SYSTEM
Functional Units – Basic operational concepts – Instructions:
Operations, Operands – Instruction representation – Instruction
Types – MIPS addressing, Performance

Syllabus – Unit II
UNIT-II ARITHMETIC FOR COMPUTERS
Addition and Subtraction – Multiplication – Division – Floating
Point Representation – Floating Point Addition and Subtraction.

Syllabus – Unit III
UNIT-III PROCESSOR AND CONTROL UNIT
A Basic MIPS implementation – Building a Datapath – Control
Implementation Scheme – Pipelining – Pipelined datapath and
control – Handling Data Hazards & Control Hazards.

Syllabus – Unit IV
UNIT-IV PARALLELISM
Introduction to Multicore processors and other shared memory
multiprocessors – Flynn’s classification: SISD, MIMD, SIMD,
SPMD and Vector – Hardware multithreading – GPU
architecture.

Syllabus – Unit V
• UNIT-V MEMORY & I/O SYSTEMS
Memory Hierarchy – memory technologies – Cache Memory –
Performance Considerations, Virtual Memory, TLBs – Accessing
I/O devices – Interrupts – Direct Memory Access – Bus Structure
– Bus operation.

Text Books
• Book 1:
o Name: Computer Organization and Design: The
Hardware/Software Interface
o Authors: David A. Patterson and John L. Hennessy
o Publisher: Morgan Kaufmann / Elsevier
o Edition: Fifth Edition, 2014
• Book 2:
o Name: Computer Organization and Embedded Systems
o Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and
Naraig Manjikian
o Publisher: Tata McGraw Hill
o Edition: Sixth Edition, 2012

Introduction
• What is meant by Computer Architecture?
Hardware parts
Instruction set
Interface between hardware &
software

Introduction
ISA: a+b -> add a,b ->000100110101010

Instruction Set Architecture
(ISA)
ISA: The interface or contract between the hardware and
the software
Rules about how to code and interpret machine
instructions:
Execution model (program counter)
Operations (instructions)
Data formats (sizes, addressing modes)
Processor state (registers)
Input and Output (memory, etc.)

Introduction
• What is meant by Computer
Architecture?
Computer architecture encompasses
the specification of an instruction set
and the functional behavior of the
hardware units that implement the
instructions.

UNIT-I
BASIC STRUCTURE OF A
COMPUTER SYSTEM
Topics:
• Functional Units
• Basic operational concepts
• Instructions: Operations, Operands
• Instruction representation
• Instruction Types
• MIPS addressing mode
• Performance

Functional Units
Also called
as Datapath

Functional Units
• Input unit
• Output unit
• Memory unit
• Arithmetic Logic unit
• Control unit

Functional Units
• Input unit

Functional Units
• Output unit

Functional Units
• Memory unit

Functional Units
Arithmetic & Logic unit and Control unit

Basic Operational Concepts
Unit I

Connection between the processor and the main memory
Code snippet:
Load R2, LOC
Add R4, R3, R2
Store LOC, R4

IR & PC
• Instruction Register:
The instruction register (IR) holds the
instruction that is currently being executed.
• Program Counter:
The program counter (PC) contains the
memory address of the next instruction to be
fetched and executed.

Memory Locations and Addresses

Examples of encoded information in a
32-bit word.

Machine vs Assembly
Language
Machine Language:
• A particular set of instructions that the CPU can directly
execute, encoded as ones and zeros
• Ex: 0100001010101
Assembly Language:
• A symbolic version of the equivalent machine language
• Ex: add a,b

Instructions
• Instruction Set:
o The vocabulary of commands understood by a
given architecture.
• Some ISA:
o ARM
o Intel x86
o IBM Power
o MIPS
o SPARC
• Different CPUs implement different sets of
instructions.

MIPS
MIPS - Microprocessor without Interlocked Pipeline Stages
Features:
• five-stage execution pipeline: fetch, decode, execute,
memory-access, write-result
• regular instruction set, all instructions are 32-bit
• three-operand arithmetical and logical instructions
• 32 general-purpose registers of 32-bits each
• only the load and store instructions access memory
• flat address space of 4 GBytes of main memory (2^32
bytes)

MIPS Assembly Language
• Categories:
o Arithmetic – Only processor and registers
involved (sum of two registers)
o Data transfer – Interacts with memory
(load and store)
o Logical – Only processor and registers
involved (and, sll)
o Conditional branch – Change flow of
execution (branch instructions)
o Unconditional jump – Change flow of
execution (jump to a subroutine)

Load & Store Instructions
• Load:
o Transfer data from memory to a register
• Store:
o Transfer data from a register to memory
• Memory address must be specified by
load and store
[Figure: LOAD moves data from Memory to the Processor; STORE moves data from the Processor to Memory]

MIPS Arithmetic
• All MIPS arithmetic instructions have 3 operands
• Operand order is fixed (e.g., destination first)
• Example:
C code: A = B + C
MIPS code: add $s0, $s1, $s2
compiler’s job to associate
variables with registers

MIPS Arithmetic
• Design Principle 1: simplicity favors regularity.
Translation: Regular instructions make for simple hardware!
• Simpler hardware reduces design time and manufacturing cost.
• Of course this complicates some things...
C code: A = B + C + D;
        E = F - A;
MIPS code (arithmetic):
        add $t0, $s1, $s2
        add $s0, $t0, $s3
        sub $s4, $s5, $s0
• Performance penalty: one high-level statement may translate to several
machine instructions.
• Allowing a variable number of operands would simplify the assembly
code but complicate the hardware.

MIPS Arithmetic
Variables: a    b    c    f    g    h    i    j
Registers: $s0  $s1  $s2  $s3  $s4  $s5  $s6  $s7
a = b - c;
f = (g + h) - (i + j);
sub $s0, $s1, $s2
add $t0, $s4, $s5
add $t1, $s6, $s7
sub $s3, $t0, $t1
Try:
1. f = g + (h - 5)
2. f = (i + j) - (k - 20)

Registers vs. Memory
• Arithmetic instruction operands must be in registers
o MIPS has 32 registers
• Compiler associates variables with registers
• What about programs with lots of variables (arrays, etc.)? Use
memory: load/store operations transfer data between memory and
registers, and if there are not enough registers, some are spilled to memory
• MIPS is a load/store architecture
[Figure: Processor (Control, Datapath), Memory, and I/O (Input, Output)]

Memory Organization
• Viewed as a large single-dimension array with access by
address
• A memory address is an index into the memory array
• Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
Address   Contents
0         8 bits of data
1         8 bits of data
2         8 bits of data
3         8 bits of data
4         8 bits of data
5         8 bits of data
6         8 bits of data
...

Memory Organization
• Bytes are load/store units, but most data items use larger words
• For MIPS, a word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
o i.e., words are aligned
o what are the 2 least significant bits of a word address?
Address   Contents
0         32 bits of data
4         32 bits of data
8         32 bits of data
12        32 bits of data
...
Registers correspondingly hold 32 bits of data
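The alignment question above can be checked with a few lines of Python. This is only an illustrative sketch; the helper names `is_word_aligned` and `low_two_bits` are made up for this example and are not part of any MIPS tool.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits = 4 bytes


def is_word_aligned(address):
    """A byte address is word-aligned when it is a multiple of 4."""
    return address % WORD_BYTES == 0


def low_two_bits(address):
    """Return the 2 least significant bits of a byte address as a bit string."""
    return format(address & 0b11, "02b")


# Byte addresses of the first few words and of the last word in a 2^32-byte memory.
for address in (0, 4, 8, 12, 2**32 - 4):
    print(address, is_word_aligned(address), low_two_bits(address))
```

Every aligned word address is a multiple of 4, so its two low bits are always 00.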

The Endian Question
MIPS can also load and store 4-byte words and 2-byte halfwords.
The endian question: when you read a word, in what order do the bytes appear?
• Little Endian: Intel, DEC, et al.
• Big Endian: Motorola, IBM, Sun, et al.
• MIPS can do either
• SPIM adopts its host's convention
Big Endian    (bit 31 ... bit 0): byte 0 | byte 1 | byte 2 | byte 3
Little Endian (bit 31 ... bit 0): byte 3 | byte 2 | byte 1 | byte 0

The Endian Question
x = 0x01234567
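The figure for this slide did not survive extraction, so here is a small sketch of what the two byte orders look like for x = 0x01234567. It uses Python's standard `struct` module; the `show` helper is a hypothetical name used only for printing.

```python
import struct

x = 0x01234567

# Pack the same 32-bit value with each byte order.
big = struct.pack(">I", x)     # most significant byte at the lowest address
little = struct.pack("<I", x)  # least significant byte at the lowest address


def show(label, data):
    """Print each byte next to its offset from the start of the word."""
    cells = ", ".join(f"+{i}: 0x{b:02x}" for i, b in enumerate(data))
    print(f"{label:<13} {cells}")


show("big-endian", big)
show("little-endian", little)
```

The value is the same in both cases; only the order in which its four bytes sit in memory differs.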

Load/Store Instructions
• Load and store instructions
• Example:
C code: A[8] = h + A[8];
MIPS code:
lw $t0, 32($s3)    # load
add $t0, $s2, $t0  # arithmetic
sw $t0, 32($s3)    # store
• Load word has destination first, store has destination last
• Remember MIPS arithmetic operands are registers, not memory
locations
o therefore, words must first be moved from memory to registers using
loads before they can be operated on; then result can be stored back to
memory
In 32($s3), 32 is the offset and $s3 holds the base address; $t0 holds the value.

So far we’ve learned:
• MIPS
o loading words but addressing bytes
o arithmetic on registers only
• Instruction          Meaning
add $s1, $s2, $s3      $s1 = $s2 + $s3
sub $s1, $s2, $s3      $s1 = $s2 - $s3
lw $s1, 100($s2)       $s1 = Memory[$s2+100]
sw $s1, 100($s2)       Memory[$s2+100] = $s1
• Try: Find the assembly code for B[8] = A[i] + A[j];
The base addresses of A and B are in $s6 and $s7 respectively
$s0-$s4 hold the values of f-j

Exercise
Q: For the following C statement, what is the corresponding
MIPS assembly code? Assume that the variables f, g, h,
and i are given and could be considered 32-bit integers as
declared in a C program. Use a minimal number of MIPS
assembly instructions. f = g + (h − 5);
Solution:
f -> $s1, g -> $s2, h -> $s3
addi $t0, $s3,-5
add $s1, $s2, $t0

Representing Instructions
in the Computer
• Instruction format:
o A form of representation of an instruction
composed of fields of binary numbers.
• All MIPS instructions are 32 bits long.
• Three types of instruction formats:
o R-type (for register) or R-format
o I-type (for immediate) or I-format
o J-type (for jump) or J-format

R-type (for register)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• rs: The first register source operand
• rt: The second register source operand
• rd: The register destination operand
• shamt: Shift amount
• funct: Function. It selects the specific variant of the
operation in the op field. (function code)
Ex: add $t0, $s1, $s2

I-type (for immediate)
• MIPS fields:
• rs: The register source operand
• rt: destination register, which receives the result of the
load
• constant or address: It contains a 16-bit constant or
address value.

I-type (for immediate)
• MIPS fields:
Ex: addi $t1, $s0, 10
lw $t0, 40($s4)
bne $s5,$s6, 100

J-type (for jump)
• MIPS fields:
• address: It contains a 26-bit address value.
• Ex:
j 10000

Instruction formats for
MIPS architecture

Mapping register names
to register numbers
Name:    $t0  $t1  $t2  $t3  $t4  $t5  $t6  $t7
Number:  8    9    10   11   12   13   14   15
Name:    $s0  $s1  $s2  $s3  $s4  $s5  $s6  $s7
Number:  16   17   18   19   20   21   22   23

Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: add $t0,$s1,$s2
• Solution:
• Identify the instruction format: R-type
• Format: Operation rd, rs, rt
• rs -> $s1, rt -> $s2, rd -> $t0, shamt -> not used (0)
• op -> 0, funct -> 32
• Decimal and binary representation:
Field:    op      rs     rt     rd     shamt  funct
Decimal:  0       17     18     8      0      32
Binary:   000000  10001  10010  01000  00000  100000
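The field-by-field translation above can be sketched in Python. This is a minimal illustration, not an assembler: `encode_r_type` is a hypothetical helper that simply packs the six R-type fields (6, 5, 5, 5, 5 and 6 bits) into one 32-bit word.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields into a 32-bit word, most significant field first."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# add $t0,$s1,$s2 -> op=0, rs=17 ($s1), rt=18 ($s2), rd=8 ($t0), shamt=0, funct=32
word = encode_r_type(0, 17, 18, 8, 0, 32)
print(f"{word:032b}")
print(f"0x{word:08x}")
```

The same helper can be used to check the sub exercise that follows by passing rs=20, rt=21, rd=11 and funct=34.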

Exercise
Q: Translate the following MIPS Assembly code
into binary code.
sub $t3,$s4,$s5
Field:    op      rs     rt     rd     shamt  funct
Decimal:  0       20     21     11     0      34
Binary:   000000  10100  10101  01011  00000  100010

Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: lw $t0,32($s3)
• Solution:
• Identify the instruction format: I-type
• Format: Operation rt, offset(rs)
• rs -> $s3, rt -> $t0, immediate -> 32
• Decimal and binary representation:
Field:    op      rs     rt     address
Decimal:  35      19     8      32
Binary:   100011  10011  01000  0000 0000 0010 0000

Exercise
Q: Translate the following MIPS assembly code
into binary code.
sw $t2,58($s5)
Field:    op      rs     rt     address
Decimal:  43      21     10     58
Binary:   101011  10101  01010  0000 0000 0011 1010

Translating High level Language
into Machine Language
Q: Consider the following high level statement
A[300] = h + A[300];
If $t1 has the base of the array A and $s2 corresponds to
h, What is the MIPS machine language code?
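One possible solution is sketched below, assuming $t0 is used as the temporary register (the slide does not name one). The statement compiles to lw, add and sw, and since A[300] lies 300 × 4 = 1200 bytes past the base, the offset is 1200. The helper `pack_fields` is a hypothetical name for this example.

```python
def pack_fields(*fields):
    """Concatenate (value, bit_width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        word = (word << width) | (value & ((1 << width) - 1))
    return word


T0, T1, S2 = 8, 9, 18  # register numbers for $t0, $t1, $s2
OFFSET = 300 * 4       # byte offset of A[300] from the base in $t1

program = [
    # I-type: op, rs (base), rt, 16-bit offset
    ("lw  $t0,1200($t1)", pack_fields((35, 6), (T1, 5), (T0, 5), (OFFSET, 16))),
    # R-type: op, rs, rt, rd, shamt, funct
    ("add $t0,$s2,$t0", pack_fields((0, 6), (S2, 5), (T0, 5), (T0, 5), (0, 5), (32, 6))),
    ("sw  $t0,1200($t1)", pack_fields((43, 6), (T1, 5), (T0, 5), (OFFSET, 16))),
]

for text, word in program:
    print(f"{text:<20} {word:032b}")
```

Note that lw and sw differ only in the opcode field, which is one payoff of keeping the formats regular.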

Shift operations
• Shifts allow bits to be moved around inside a register.
• Shift left logical
Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
Machine Code:
000000 00000 10000 01010 00100 000000

Shift Left Logical(sll)
• Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits
• If $s0=10
• Value of $t2=???

Shift operations
• Shift right logical
Example: srl $t5,$s3,2 # reg $t5 = reg $s3 >> 2 bits
Machine Code:
Field:    op      rs     rt     rd     shamt  funct
Binary:   000000  00000  10011  01101  00010  000010
Decimal:  0       0      19     13     2      2

Shift Right Logical(srl)
Example: srl $t5,$s3,2 # reg $t5 = reg $s3 >> 2 bits
• If $s3=12
• Value of $t5=???
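The two "???" questions (sll with $s0 = 10 and srl with $s3 = 12) can be worked out with a short sketch. The functions `sll` and `srl` below are illustrative Python stand-ins for the MIPS instructions, with a mask to model 32-bit registers.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits


def sll(value, shamt):
    """Shift left logical: vacated low bits are filled with zeros."""
    return (value << shamt) & MASK32


def srl(value, shamt):
    """Shift right logical: vacated high bits are filled with zeros."""
    return (value & MASK32) >> shamt


print(sll(10, 4))  # sll $t2,$s0,4 with $s0 = 10
print(srl(12, 2))  # srl $t5,$s3,2 with $s3 = 12
```

Shifting left by 4 multiplies by 2^4 (10 × 16 = 160), and shifting right by 2 divides an unsigned value by 2^2 (12 / 4 = 3).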

Logical Operations –
AND, OR & NOT
• A logical bit-by-bit operation with two operands.
• EX:
and $t0,$t1,$t2 # reg $t0 = reg $t1 & reg $t2
or $t0,$t1,$t2 # reg $t0 = reg $t1 | reg $t2
nor $t0,$t1,$t3 # reg $t0 = ~ (reg $t1 | reg $t3)
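As a rough illustration of the three operations, the sketch below applies them to two sample register values (the values are chosen for this example, not taken from the slides). It also shows how NOT is obtained: nor with a zero operand inverts every bit.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def mips_and(a, b):
    return (a & b) & MASK32


def mips_or(a, b):
    return (a | b) & MASK32


def mips_nor(a, b):
    # Python integers are unbounded, so mask after inverting.
    return ~(a | b) & MASK32


t1 = 0x00003C00  # sample value for $t1
t2 = 0x00000DC0  # sample value for $t2

print(hex(mips_and(t1, t2)))
print(hex(mips_or(t1, t2)))
print(hex(mips_nor(t1, 0)))  # NOT $t1, written as nor with a zero operand
```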

Instructions for Making
Decisions
• Sequences allow programs to execute statements in order
one after another.
• Branches allow programs to jump to other points in a
program.
• Loops allow a program to execute a fragment of code
multiple times.
• MIPS Instructions:
beq register1, register2, L1
bne register1, register2, L1
• beq and bne are mnemonics
• Conditional branches

Decisions
Q: In the following code segment, f, g, h, i, and j are
variables. If the five variables f through j correspond to the
five registers $s0 through $s4, what is the compiled MIPS
code for this C if statement?
if (i == j) f = g + h; else f = g - h;

Decisions
• Solution:

Decisions
High level code:
if (i == j)
f = g + h;
else
f = g - h;
MIPS code:
bne $s3,$s4,Else # go to Else if i ≠ j
add $s0,$s1,$s2 # f = g + h (skipped if i ≠ j)
j Exit # go to Exit
Else: sub $s0,$s1,$s2 # f = g - h (skipped if i = j)
Exit:

Compiling a while Loop
in C
while (save[i] == k)
i += 1;
Assume that i and k correspond to registers $s3 and $s5
and the base of the array save is in $s6. What is the MIPS
assembly code corresponding to this C segment?

Compiling a while Loop
in C
while (save[i] == k)
i += 1;
1. Load save[i] into a temporary register
   a. multiply i by 4 and add it to the base of array save to form the address
2. Perform the loop test
   a. go to Exit if save[i] ≠ k
3. Add 1 to i
4. Jump back to the while test at the top of the loop
5. Exit

while (save[i] == k)
i += 1;
Assume that i and k correspond to registers $s3 and $s5
and the base of the array save is in $s6. What is the MIPS
assembly code corresponding to this C segment?
Solution:
Loop: sll $t1,$s3,2      # Temp reg $t1 = i * 4
      add $t1,$t1,$s6    # $t1 = address of save[i]
      lw $t0,0($t1)      # Temp reg $t0 = save[i]
      bne $t0,$s5,Exit   # go to Exit if save[i] ≠ k
      addi $s3,$s3,1     # i = i + 1
      j Loop             # go to Loop
Exit:
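To see why the index is shifted left by 2 before the load, the loop can be stepped through in Python with memory modelled as a dictionary of byte addresses. This is only a sketch: `run_loop`, the base address 1000 and the array contents are all made up for illustration.

```python
def run_loop(memory, save_base, i, k):
    """Step through the loop the way the MIPS code does; return the final i."""
    while True:                  # Loop:
        t1 = i << 2              # sll  $t1,$s3,2    (byte offset i * 4)
        t1 = t1 + save_base      # add  $t1,$t1,$s6  (address of save[i])
        t0 = memory[t1]          # lw   $t0,0($t1)
        if t0 != k:              # bne  $t0,$s5,Exit
            return i             # Exit:
        i = i + 1                # addi $s3,$s3,1
        # j Loop corresponds to the next pass through "while True"


SAVE_BASE = 1000                 # hypothetical base address of save
values = [7, 7, 7, 3]            # hypothetical contents of save[0..3]
memory = {SAVE_BASE + 4 * n: v for n, v in enumerate(values)}

final_i = run_loop(memory, SAVE_BASE, i=0, k=7)
print(final_i)
```

Because each element is a 4-byte word, save[i] lives at byte address base + 4 × i, which is exactly what the sll and add pair computes.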

MIPS Addressing Mode
• The different ways of specifying the locations
of instruction operands are known as
addressing modes.
• The MIPS addressing modes are the following:
1. Immediate addressing mode
2. Register addressing mode
3. Base or displacement addressing mode
4. PC-relative addressing mode
5. Pseudodirect addressing mode

Immediate addressing mode
• Def:
o the operand is a constant within the instruction itself
• Ex:
o addi $s1, $s2, 20 #$s1=$s2+20
• Illustration:

Register addressing mode
• Def:
o source and destination operands are registers which are
available in processor registers.
o Direct addressing mode
• Ex:
o add $s1, $s2, $s3 #$s1=$s2+$s3
• Illustration:

Base or displacement
addressing mode
• Def:
o the operand is at the memory location whose address is the
sum of a register and a constant in the instruction
o Indirect addressing mode
• Ex:
o lw $s1, 20 ($s3) #$s1= Memory[$s3+20]
• Illustration:

PC-relative addressing mode
• Def:
o the branch address is the sum of the PC and a constant in
the instruction
• Ex:
o bne $s4, $s5, 25 # if ($s4 != $s5), go to PC + 4 + (25 × 4);
with the branch at address 12, the target is 12 + 4 + 100 = 116
• Illustration:
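The branch-target arithmetic in the example (12 + 4 + 100) can be written out as a small sketch. The function names are hypothetical; the 16-bit field is treated as a signed count of words, so it is sign-extended and multiplied by 4 before being added to PC + 4.

```python
def sign_extend16(field):
    """Interpret a 16-bit field as a signed (two's complement) number."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def branch_target(pc, field):
    """PC-relative target: (PC + 4) plus the word offset scaled to bytes."""
    return (pc + 4) + (sign_extend16(field) << 2)


# bne $s4,$s5,25 located at address 12, as in the example above
print(branch_target(12, 25))
```

A negative offset (for instance a field of 0xFFFF, which is -1) branches backwards, which is how loops are formed.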

Pseudodirect addressing
mode
• Def:
o the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC
• Ex:
o j 1000
• Illustration:
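A minimal sketch of the concatenation described in the definition, assuming the number in the example is the raw 26-bit address field. The field is a word address, so it is shifted left by 2, and the top 4 bits come from PC + 4. The function name and the sample PC values are made up for illustration.

```python
def jump_target(pc, address_field):
    """Pseudodirect target: upper 4 bits of PC + 4, then the 26-bit field, then 00."""
    upper = (pc + 4) & 0xF0000000
    return upper | ((address_field & 0x03FFFFFF) << 2)


print(jump_target(0x00400000, 1000))       # low region: upper bits are 0000
print(hex(jump_target(0x10000000, 1000)))  # upper bits 0001 are preserved
```

Since only 28 bits come from the instruction, a jump can only reach targets within the same 256 MB region as the jump itself.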

Decoding Machine Code
• Q: What is the assembly language statement
corresponding to this machine instruction?
00af8020hex
Solution:
converting hexadecimal to binary
Binary instruction format
Assembly instruction

Translating Machine Language
to Assembly Language
• Translate the following machine language code into
assembly language.
0x02F34022
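Both decoding questions (00af8020 hex and 0x02F34022) follow the same three steps: convert to binary, split into fields, then map the fields to names. The sketch below does this for R-type add and sub only; `decode_r_type` and the two lookup tables are hypothetical names for this example, with register names in the standard MIPS numbering.

```python
REGISTER_NAMES = (
    "$zero $at $v0 $v1 $a0 $a1 $a2 $a3 "
    "$t0 $t1 $t2 $t3 $t4 $t5 $t6 $t7 "
    "$s0 $s1 $s2 $s3 $s4 $s5 $s6 $s7 "
    "$t8 $t9 $k0 $k1 $gp $sp $fp $ra"
).split()

FUNCT_NAMES = {32: "add", 34: "sub"}  # only the two functions needed here


def decode_r_type(word):
    """Split a 32-bit word into R-type fields and return assembly text."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    if op != 0 or funct not in FUNCT_NAMES:
        raise ValueError("not an add/sub R-type instruction")
    return f"{FUNCT_NAMES[funct]} {REGISTER_NAMES[rd]},{REGISTER_NAMES[rs]},{REGISTER_NAMES[rt]}"


for word in (0x00AF8020, 0x02F34022):
    print(f"{word:032b} -> {decode_r_type(word)}")
```

Note the operand order: the destination rd is printed first even though it is the fourth field in the machine word.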

Performance
• Performance is the key to understanding the underlying motivation for
the hardware and its organization
• Measure, report, and summarize performance to enable users to
o make intelligent choices
o see through the marketing hype!
• Why is some hardware better than others for different programs?
• What factors of system performance are hardware related?
(e.g., do we need a new machine, or a new operating system?)
• How does the machine's instruction set affect performance?

Computer Performance:
TIME, TIME, TIME!!!
• Response Time (elapsed time, latency): individual user concerns
o how long does it take for my job to run?
o how long does it take to execute (start to
finish) my job?
o how long must I wait for the database query?
• Throughput: systems manager concerns
o how many jobs can the machine run at once?
o what is the average execution rate?
o how much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?

Execution Time
• Elapsed Time
o counts everything (disk and memory accesses, waiting for I/O, running
other programs, etc.) from start to finish
o a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
o doesn't count waiting for I/O or time spent running other programs
o can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
Therefore, elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply, execution
time)
o time spent executing the lines of code that are in our program

Definition of Performance
• For some program running on machine X:
$\text{Performance}_X = 1 / \text{Execution time}_X$
• If there are two machines X and Y and the performance of X is greater than the performance of Y:
$\text{Performance}_X > \text{Performance}_Y$
i.e., $1 / \text{Execution time}_X > 1 / \text{Execution time}_Y$
• X is n times faster than Y means:
$\text{Performance}_X / \text{Performance}_Y = \text{Execution time}_Y / \text{Execution time}_X = n$

Q: If computer A runs a program in 10 sec
and computer B runs the same program in
15 sec, how much faster is A than B?
• We know that
$\text{Performance}_A / \text{Performance}_B = \text{Execution time}_B / \text{Execution time}_A = n$
Thus the performance ratio is
$\text{Execution time}_B / \text{Execution time}_A = 15 / 10 = 1.5$
i.e., $\text{Performance}_A / \text{Performance}_B = 1.5$
Therefore, A is 1.5 times faster than B.

Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles.
In modern computers hardware events progress cycle by cycle: in
other words, each event, e.g., multiplication, addition, etc., is a
sequence of cycles
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
• Clock ticks indicate start and end of cycles:
[Figure: time axis marked with clock ticks; one cycle is the interval between two consecutive ticks]
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = clock cycles per second (1 Hz = 1
cycle/sec, 1 MHz = 10^6 cycles/sec)
• Example: A 200 MHz clock has a cycle time of ????
Performance Equation I
• So, to improve performance one can either:
o reduce the number of cycles for a program, or
o reduce the clock cycle time, or, equivalently,
o increase the clock rate
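$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
$$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
Equivalently,
$$\text{CPU execution time for a program} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$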

Our favorite program runs in 10 seconds on computer A, which has a 2
GHz clock. We are trying to help a computer designer build a computer,
B, which will run this program in 6 seconds. The designer has determined
that a substantial increase in the clock rate is possible, but this increase
will affect the rest of the CPU design, causing computer B to require 1.2
times as many clock cycles as computer A for this program. What clock
rate should we tell the designer to target?
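$\text{CPU time}_A = \text{CPU clock cycles}_A / \text{clock rate}_A$
$10\ \text{sec} = \text{CPU clock cycles}_A / (2 \times 10^9\ \text{cycles/sec})$
$\text{CPU clock cycles}_A = 10\ \text{sec} \times 2 \times 10^9\ \text{cycles/sec} = 20 \times 10^9\ \text{cycles}$
$\text{CPU time}_B = 1.2 \times \text{CPU clock cycles}_A / \text{clock rate}_B$
$6\ \text{sec} = 1.2 \times 20 \times 10^9\ \text{cycles} / \text{clock rate}_B$
$\text{clock rate}_B = 1.2 \times 20 \times 10^9\ \text{cycles} / 6\ \text{sec} = 4 \times 10^9\ \text{Hz}$
To run the program in 6 seconds, B must have a clock rate of 4 × 10^9 Hz (4 GHz), twice that of A.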
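The same derivation can be checked numerically. This is a sketch under the problem's stated numbers; the function and parameter names are made up for this example.

```python
def required_clock_rate(time_a, rate_a, cycle_factor, time_b):
    """Clock rate B needs, given A's time and rate and B's extra cycles."""
    cycles_a = time_a * rate_a          # CPU clock cycles on A
    cycles_b = cycle_factor * cycles_a  # B needs 1.2 times as many cycles
    return cycles_b / time_b            # clock rate = cycles / CPU time


rate_b = required_clock_rate(time_a=10, rate_a=2e9, cycle_factor=1.2, time_b=6)
print(f"{rate_b / 1e9:.1f} GHz")
```

The result shows the trade-off in the problem: B runs 1.2 times as many cycles yet finishes sooner, which is only possible because its clock is twice as fast.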

Instruction Performance
• The previous equation makes no reference to the number of instructions
• The execution time depends on the number of
instructions in the program
Clock cycles per instruction (CPI)
• Average number of clock cycles per instruction for a
program or program fragment

Suppose we have two implementations of the same instruction
set architecture. Computer A has a clock cycle time of 250 ps
and a CPI of 2.0 for some program, and computer B has a
clock cycle time of 500 ps and a CPI of 1.2 for the same
program. Which computer is faster for this program and by
how much?
• The same number of instructions is executed on both computers,
since they implement the same instruction set architecture
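The slide gives no worked answer, so here is one way to reach it. Because the instruction count I is the same on both machines, it cancels in the ratio, and it is enough to compare CPI × clock cycle time. The function name is illustrative.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI * clock cycle time (in picoseconds)."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)  # computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)  # computer B

# CPU time = I * time per instruction, and I is identical, so it cancels.
ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(faster, ratio)
```

Computer A takes 500 ps per instruction against 600 ps for B, so A is 1.2 times faster for this program even though its CPI is higher.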

Instruction Performance
CPU execution time = Instruction count * average CPI * Clock cycle time
Or
CPU execution time = Instruction count * average CPI / Clock rate

Which code sequence
executes the most instructions?
• Sequence 1 executes
2 + 1 + 2 = 5 instructions
• Sequence 2 executes
4 + 1 + 1 = 6 instructions
Sequence 2 executes more instructions

Which will be faster?
• So code sequence 2 is faster

What is the CPI for each
sequence?
• Sequence 2 has the lower CPI: it takes fewer clock cycles
even though it executes more instructions
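The table of per-class CPIs for this example did not survive extraction. The sketch below assumes the values from the Patterson and Hennessy textbook example that these slides appear to follow: three instruction classes A, B and C with CPIs of 1, 2 and 3. Only the instruction counts (2, 1, 2 and 4, 1, 1) come from the slides; treat the CPI table and the class labels as assumptions.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # assumed, not present in the extracted slides

SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},  # instruction counts from the slide: 2 + 1 + 2
    2: {"A": 4, "B": 1, "C": 1},  # instruction counts from the slide: 4 + 1 + 1
}


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, counts in SEQUENCES.items():
    instructions, cycles, cpi = summarize(counts)
    print(f"sequence {name}: {instructions} instructions, {cycles} cycles, CPI {cpi}")
```

Under these assumed CPIs, sequence 1 needs 10 cycles (CPI 2.0) and sequence 2 needs 9 cycles (CPI 1.5), which matches the conclusions on the slides: sequence 2 is faster despite executing more instructions.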

Basic components of
Performance
