1. Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Slide Sources: Patterson & Hennessy COD book
website (copyright Morgan Kaufmann) adapted
and supplemented
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
Asst. Prof,
CSE,
Velammal Engineering College
2. Course Objectives
• This course aims to teach the basic structure and operations of
a computer.
• The course covers the ALU, pipelined execution,
parallelism and multi-core processors.
• The course will enable students to understand memory
hierarchies, cache memories and virtual memories.
3. Course Outcomes
CO 1 Discuss the basic structure of computers, operations and
instructions.
CO 2 Design arithmetic and logic unit.
CO 3 Analyze pipelined execution and design control unit.
CO 4 Analyze parallel processing architectures.
CO 5 Examine the performance of various memory systems
CO 6 Organize the various I/O communications.
4. Syllabus
Unit Titles:
• Unit I Basic Structure of a Computer System
• Unit II Arithmetic for Computers
• Unit III Processor and Control Unit
• Unit IV Parallelism
• Unit V Memory & I/O Systems
5. Syllabus – Unit I
UNIT-I BASIC STRUCTURE OF A COMPUTER
SYSTEM
Functional Units – Basic operational concepts – Instructions:
Operations, Operands – Instruction representation – Instruction
Types – MIPS addressing, Performance
6. Syllabus – Unit II
UNIT-II ARITHMETIC FOR COMPUTERS
Addition and Subtraction – Multiplication – Division – Floating
Point Representation – Floating Point Addition and Subtraction.
7. Syllabus – Unit III
UNIT-III PROCESSOR AND CONTROL UNIT
A Basic MIPS implementation – Building a Datapath – Control
Implementation Scheme – Pipelining – Pipelined datapath and
control – Handling Data Hazards & Control Hazards.
8. Syllabus – Unit IV
UNIT-IV PARALLELISM
Introduction to Multicore processors and other shared memory
multiprocessors – Flynn’s classification: SISD, MIMD, SIMD,
SPMD and Vector – Hardware multithreading – GPU
architecture.
9. Syllabus – Unit V
UNIT-V MEMORY & I/O SYSTEMS
Memory Hierarchy – memory technologies – Cache Memory –
Performance Considerations – Virtual Memory, TLBs – Accessing
I/O devices – Interrupts – Direct Memory Access – Bus Structure
– Bus operation.
10. Text Books
• Book 1:
o Name: Computer Organization and Design: The
Hardware/Software Interface
o Authors: David A. Patterson and John L. Hennessy
o Publisher: Morgan Kaufmann / Elsevier
o Edition: Fifth Edition, 2014
• Book 2:
o Name: Computer Organization and Embedded Systems
o Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and
Naraig Manjikian
o Publisher: Tata McGraw Hill
o Edition: Sixth Edition, 2012
11. Introduction
• What is meant by Computer Architecture?
Hardware parts
Instruction set
Interface between hardware &
software
13. Instruction Set Architecture
(ISA)
ISA: The interface or contract between the hardware and
the software
Rules about how to code and interpret machine
instructions:
Execution model (program counter)
Operations (instructions)
Data formats (sizes, addressing modes)
Processor state (registers)
Input and Output (memory, etc.)
14. Introduction
• What is meant by Computer
Architecture?
Computer architecture encompasses
the specification of an instruction set
and the functional behavior of the
hardware units that implement the
instructions.
28. Connection between the processor and the main
memory
Code Snippet:
Load R2, LOC
Add R4, R3, R2
Store LOC, R4
29. IR & PC
• Instruction Register:
The instruction register (IR) holds the
instruction that is currently being executed.
• Program Counter:
The program counter (PC) contains the
memory address of the next instruction to be
fetched and executed.
35. Machine vs Assembly
Language
Machine Language:
• A particular set of instructions that the CPU can directly
execute, written as ones and zeros
• Ex: 0100001010101
Assembly Language:
• A symbolic version of the equivalent machine language
• Ex: add a,b
37. Instructions
• Instruction Set:
o The vocabulary of commands understood by a
given architecture.
• Some ISAs:
o ARM
o Intel x86
o IBM Power
o MIPS
o SPARC
• Different CPUs implement different sets of
instructions.
38. MIPS
MIPS - Microprocessor without Interlocked Pipeline Stages
Features:
• five-stage execution pipeline: fetch, decode, execute,
memory-access, write-result
• regular instruction set, all instructions are 32-bit
• three-operand arithmetical and logical instructions
• 32 general-purpose registers of 32-bits each
• only the load and store instructions access memory
• flat address space of 4 GBytes of main memory (2^32
bytes)
39. MIPS Assembly Language
• Categories:
o Arithmetic – Only processor and registers
involved (sum of two registers)
o Data transfer – Interacts with memory
(load and store)
o Logical – Only processor and registers
involved (and, sll)
o Conditional branch – Change flow of
execution (branch instructions)
o Unconditional jump – Change flow of
execution (jump to a subroutine)
43. Load & Store Instructions
• Load:
o Transfer data from memory to a register
• Store:
o Transfer data from a register to memory
• The memory address must be specified by
load and store
[Figure: LOAD moves data from Memory to the Processor; STORE moves data from the Processor to Memory]
48. MIPS Arithmetic
• All MIPS arithmetic instructions have 3 operands
• Operand order is fixed (e.g., destination first)
• Example:
C code: A = B + C
MIPS code: add $s0, $s1, $s2
(It is the compiler’s job to associate
variables with registers.)
49. MIPS Arithmetic
• Design Principle 1: simplicity favors regularity.
Translation: Regular instructions make for simple hardware!
• Simpler hardware reduces design time and manufacturing cost.
• Of course this complicates some things...
C code: A = B + C + D;
E = F - A;
MIPS code (arithmetic):
add $t0, $s1, $s2
add $s0, $t0, $s3
sub $s4, $s5, $s0
• Performance penalty: one high-level statement translates into several
machine instructions.
• Allowing a variable number of operands would simplify the assembly
code but complicate the hardware.
50. MIPS Arithmetic
Variables a, b, c, f, g, h, i, j are held in registers
$s0, $s1, $s2, $s3, $s4, $s5, $s6, $s7.
a = b - c;
sub $s0, $s1, $s2
f = (g + h) - (i + j);
add $t0, $s4, $s5
add $t1, $s6, $s7
sub $s3, $t0, $t1
Try:
1. f = g + (h - 5)
2. f = (i + j) - (k - 20)
51. Registers vs. Memory
• Arithmetic instruction operands must be in registers
o MIPS has 32 registers
• Compiler associates variables with registers
• What about programs with lots of variables (arrays, etc.)? Use
memory: load/store operations transfer data between memory and
registers, and if there are not enough registers, some are spilled to memory
• MIPS is a load/store architecture
[Figure: Processor (Control, Datapath), Memory, and I/O (Input, Output)]
52. Memory Organization
• Viewed as a large single-dimension array with access by
address
• A memory address is an index into the memory array
• Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
[Figure: memory drawn as an array; byte addresses 0, 1, 2, 3, 4, 5, 6, ... each hold 8 bits of data]
53. Memory Organization
• Bytes are load/store units, but most data items use larger words
• For MIPS, a word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32 - 4
o i.e., words are aligned
o what are the 2 least significant bits of a word address?
[Figure: memory drawn as an array of words; byte addresses 0, 4, 8, 12, ... each hold 32 bits of data]
Registers correspondingly hold 32 bits of data
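Word alignment can be checked with a few lines of code. The following is a minimal Python sketch added for illustration, not part of the original slides; the helper names `word_address` and `is_word_aligned` are invented for this example.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits = 4 bytes

def word_address(index):
    """Byte address of the word with the given index (0, 1, 2, ...)."""
    return index * WORD_BYTES

def is_word_aligned(byte_address):
    """A byte address is word-aligned when its two least significant bits are 00."""
    return (byte_address & 0b11) == 0

print([word_address(n) for n in range(4)])       # [0, 4, 8, 12]
print(is_word_aligned(12), is_word_aligned(6))   # True False
print(word_address(2**30 - 1) == 2**32 - 4)      # True
```

Because every word address is a multiple of 4, its two least significant bits are always 00.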
54. The Endian Question
MIPS can also load and store 4-byte words and 2-byte halfwords.
The endian question: when you read a word, in what order do the bytes
appear?
• Little Endian: Intel, DEC, et al.
• Big Endian: Motorola, IBM, Sun, et al.
• MIPS can do either
• SPIM adopts its host’s convention
Big Endian (bit 31 ... bit 0): byte 0 | byte 1 | byte 2 | byte 3
Little Endian (bit 31 ... bit 0): byte 3 | byte 2 | byte 1 | byte 0
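The two byte orders can be seen by storing one word both ways. This is a small Python sketch added for illustration; the value 0x0A0B0C0D is an arbitrary choice, and each list shows the bytes in order of increasing memory address.

```python
word = 0x0A0B0C0D  # arbitrary 32-bit value chosen for this example

# Bytes as they would sit in memory, lowest address first.
big = word.to_bytes(4, byteorder="big")
little = word.to_bytes(4, byteorder="little")

print([f"{b:02x}" for b in big])     # ['0a', '0b', '0c', '0d']
print([f"{b:02x}" for b in little])  # ['0d', '0c', '0b', '0a']

# Reading the bytes back with the matching convention recovers the word.
print(int.from_bytes(little, byteorder="little") == word)  # True
```

Big endian places the most significant byte at the lowest address; little endian places the least significant byte there.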
56. Load/Store Instructions
• Load and store instructions
• Example:
C code: A[8] = h + A[8];
MIPS code:
(load): lw $t0, 32($s3)
(arithmetic): add $t0, $s2, $t0
(store): sw $t0, 32($s3)
• Load word has destination first, store has destination last
• Remember MIPS arithmetic operands are registers, not memory
locations
o therefore, words must first be moved from memory to registers using
loads before they can be operated on; then result can be stored back to
memory
In lw $t0, 32($s3): 32 is the offset, $s3 holds the base address, and
$t0 receives the value.
57. So far we’ve learned:
• MIPS
o loading words but addressing bytes
o arithmetic on registers only
• Instruction: Meaning
add $s1, $s2, $s3: $s1 = $s2 + $s3
sub $s1, $s2, $s3: $s1 = $s2 – $s3
lw $s1, 100($s2): $s1 = Memory[$s2+100]
sw $s1, 100($s2): Memory[$s2+100] = $s1
• Try: Find the assembly code for B[8] = A[i] + A[j];
The base addresses of A and B are in $s6 and $s7 respectively
$s0–$s4 hold the values f–j
58. Exercise
Q: For the following C statement, what is the corresponding
MIPS assembly code? Assume that the variables f, g, h,
and i are given and could be considered 32-bit integers as
declared in a C program. Use a minimal number of MIPS
assembly instructions. f = g + (h − 5);
Solution:
f -> $s1, g -> $s2, h -> $s3
addi $t0, $s3,-5
add $s1, $s2, $t0
59. Representing Instructions
in the Computer
• Instruction format:
o A form of representation of an instruction
composed of fields of binary numbers.
• All MIPS instructions are 32 bits long.
• Three types of instruction formats:
o R-type (for register) or R-format
o I-type (for immediate) or I-format
o J-type (for jump) or J-format
60. R-type (for register)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• rs: The first register source operand
• rt: The second register source operand
• rd: The register destination operand
• shamt: Shift amount
• funct: Function. It selects the specific variant of the
operation in the op field. (function code)
Ex: add $t0, $s1, $s2
61. I-type (for immediate)
• MIPS fields:
• op: Basic operation of the instruction (opcode)
• rs: The register source operand
• rt: destination register, which receives the result of the
load
• constant or address: It contains 16 bit constant or
address value.
68. Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: add $t0,$s1,$s2
• Solution:
• Identify the instruction format: R-type
• Format: operation rd, rs, rt
• rs -> $s1, rt -> $s2, rd -> $t0, shamt -> not used (0)
• op -> 0, funct -> 32
• Decimal representation:
op rs rt rd shamt funct
0 17 18 8 0 32
• Binary representation:
op rs rt rd shamt funct
000000 10001 10010 01000 00000 100000
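The field packing above can be reproduced in code. The following Python sketch is an illustration added here, not part of the slides; the function name `encode_r_type` is hypothetical, and the field widths (6, 5, 5, 5, 5, 6 bits) are those of the standard MIPS R-format.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2  ->  rs = $s1 = 17, rt = $s2 = 18, rd = $t0 = 8, funct = 32
word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"{word:#010x}")  # 0x02324020
```

Reading the printed bit string in groups of 6, 5, 5, 5, 5 and 6 bits gives back the binary row of the table.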
69. Exercise
Q: Translate the following MIPS Assembly code
into binary code.
sub $t3,$s4,$s5
Decimal representation:
op rs rt rd shamt funct
0 20 21 11 0 34
Binary representation:
000000 10100 10101 01011 00000 100010
71. Translating a MIPS Assembly
Instruction into a Machine Instruction
Given instruction: lw $t0,32($s3)
• Solution:
• Identify the instruction format: I-type
• Format: operation rt, address(rs)
• rs -> $s3, rt -> $t0, immediate -> 32
• Decimal representation:
op rs rt address
35 19 8 32
• Binary representation:
op rs rt address
100011 10011 01000 0000 0000 0010 0000
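The same packing idea applies to the I-format. This is a minimal Python sketch for illustration; `encode_i_type` is a hypothetical helper name, and masking the immediate to 16 bits is an assumption made here so that negative constants are stored in two's complement.

```python
def encode_i_type(op, rs, rt, immediate):
    """Pack the I-type fields (6, 5, 5, 16 bits); the immediate is kept to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

# lw $t0, 32($s3)  ->  op = 35, rs = $s3 = 19, rt = $t0 = 8, immediate = 32
word = encode_i_type(op=35, rs=19, rt=8, immediate=32)
print(f"{word:032b}")   # 10001110011010000000000000100000
print(f"{word:#010x}")  # 0x8e680020
```

A store uses the same layout with op = 43, so only the opcode and register numbers change.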
72. Exercise
Q: Translate the following MIPS Assembly code
into binary code.
sw $t2,58($s5)
op rs rt address
101011 10101 01010 0000 0000 0011 1010
73. Translating High level Language
into Machine Language
Q: Consider the following high level statement
A[300] = h + A[300];
If $t1 has the base of the array A and $s2 corresponds to
h, what is the MIPS machine language code?
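One possible answer, sketched in Python for illustration: the statement compiles to lw $t0,1200($t1), add $t0,$s2,$t0 and sw $t0,1200($t1), where 1200 = 300 × 4 is the byte offset of A[300]. The use of $t0 as the temporary register and the helper name `pack` are assumptions of this sketch.

```python
def pack(fields):
    """Concatenate (value, bit_width) pairs, most significant field first."""
    word = 0
    for value, width in fields:
        word = (word << width) | (value & ((1 << width) - 1))
    return word

T0, T1, S2 = 8, 9, 18   # register numbers of $t0, $t1, $s2
OFFSET = 300 * 4        # A[300] lies 1200 bytes past the base of A

program = [
    ("lw  $t0,1200($t1)", pack([(35, 6), (T1, 5), (T0, 5), (OFFSET, 16)])),
    ("add $t0,$s2,$t0", pack([(0, 6), (S2, 5), (T0, 5), (T0, 5), (0, 5), (32, 6)])),
    ("sw  $t0,1200($t1)", pack([(43, 6), (T1, 5), (T0, 5), (OFFSET, 16)])),
]

for text, word in program:
    print(f"{text:<20} {word:032b}")
```

The lw and sw words differ only in the opcode field (35 versus 43), which shows how regular the I-format is.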
81. Instructions for Making
Decisions
• Sequences that allow programs to execute statements in order
one after another.
• Branches that allow programs to jump to other points in a
program.
• Loops that allow a program to execute a fragment of code
multiple times.
• MIPS Instructions:
beq register1, register2, L1
bne register1, register2, L1
• beq and bne are mnemonics
• Conditional branches
82. Instructions for Making
Decisions
Q: In the following code segment, f, g, h, i, and j are
variables. If the five variables f through j correspond to the
five registers $s0 through $s4, what is the compiled MIPS
code for this C if statement?
if (i == j) f = g + h; else f = g - h;
84. Instructions for Making
Decisions
High level code:
if (i == j)
f = g + h;
else
f = g - h;
MIPS code:
bne $s3,$s4,Else # go to Else if i ≠ j
add $s0,$s1,$s2 # f = g + h (skipped if i ≠ j)
j Exit # go to Exit
Else: sub $s0,$s1,$s2 # f = g - h (skipped if i = j)
Exit:
85. Compiling a while Loop
in C
while (save[i] == k)
i += 1;
Assume that i and k correspond to registers $s3 and $s5
and the base of the array save is in $s6. What is the MIPS
assembly code corresponding to this C segment?
86. Compiling a while Loop
in C
while (save[i] == k)
i += 1;
1. Load save[i] into a temporary register
o first add i (multiplied by 4) to the base of array save to form the address
2. Perform the loop test
o go to Exit if save[i] ≠ k
3. Add 1 to i
4. Jump back to the while test at the top of the loop
5. Exit
87. Compiling a while Loop
in C
while (save[i] == k)
i += 1;
Assume that i and k correspond to registers $s3 and $s5
and the base of the array save is in $s6. What is the MIPS
assembly code corresponding to this C segment?
Solution:
Loop: sll $t1,$s3,2 # Temp reg $t1 = i * 4
add $t1,$t1,$s6 # $t1 = address of save[i]
lw $t0,0($t1) # Temp reg $t0 = save[i]
bne $t0,$s5, Exit # go to Exit if save[i] ≠ k
addi $s3,$s3,1 # i = i + 1
j Loop # go to Loop
Exit:
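To trace what the six instructions do, here is a small Python sketch that mirrors them one by one. It is an illustration only: memory is modelled as a dictionary from byte address to word, the function name `run_while_loop` is invented, and the array is assumed to contain at least one element different from k so that the loop terminates.

```python
def run_while_loop(save, i, k, base=0):
    """Mirror the MIPS loop; returns the final value of i ($s3)."""
    # Lay the array out in byte-addressed memory, one word every 4 bytes.
    memory = {base + 4 * n: word for n, word in enumerate(save)}
    s3, s5, s6 = i, k, base
    while True:
        t1 = s3 << 2        # sll  $t1,$s3,2    -> i * 4
        t1 = t1 + s6        # add  $t1,$t1,$s6  -> address of save[i]
        t0 = memory[t1]     # lw   $t0,0($t1)   -> save[i]
        if t0 != s5:        # bne  $t0,$s5,Exit
            break
        s3 = s3 + 1         # addi $s3,$s3,1
        # j Loop: the Python while statement jumps back to the top
    return s3

print(run_while_loop([7, 7, 7, 3], i=0, k=7))  # 3
```

The shift by 2 is what turns the word index i into a byte offset, matching byte addressing.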
88. MIPS Addressing Mode
• The different ways for specifying the locations
of instruction operands are known as
addressing mode.
• The MIPS addressing modes are the following:
1. Immediate addressing mode
2. Register addressing mode
3. Base or displacement addressing mode
4. PC-relative addressing mode
5. Pseudodirect addressing mode
89. Immediate addressing mode
• Def:
o the operand is a constant within the instruction itself
• Ex:
o addi $s1, $s2, 20 #$s1=$s2+20
• Illustration:
90. Register addressing mode
• Def:
o the source and destination operands are all held in
processor registers
o Direct addressing mode
• Ex:
o add $s1, $s2, $s3 #$s1=$s2+$s3
• Illustration:
91. Base or displacement
addressing mode
• Def:
o the operand is at the memory location whose address is the
sum of a register and a constant in the instruction
o Indirect addressing mode
• Ex:
o lw $s1, 20 ($s3) #$s1= Memory[$s3+20]
• Illustration:
92. PC-relative addressing mode
• Def:
o the branch address is the sum of the PC and a constant in
the instruction
• Ex:
o bne $s4, $s5, 25 # if ($s4 != $s5), go to PC + 4 + 25 × 4;
with the branch at address 12: PC = 12 + 4 + 100 = 116
• Illustration:
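The branch target arithmetic can be written as a short function. This Python sketch is an illustration added here; `branch_target` is a hypothetical name, and the branch is assumed to sit at address 12 as in the example.

```python
def branch_target(branch_pc, word_offset):
    """Target of a taken branch: (PC + 4) plus the offset, which counts words."""
    return branch_pc + 4 + word_offset * 4

print(branch_target(12, 25))  # 116
```

The offset is counted in words rather than bytes, so the 16-bit field reaches four times farther than a byte offset would.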
93. Pseudodirect addressing
mode
• Def:
o the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC
• Ex:
o j 1000
• Illustration:
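The concatenation can be sketched as follows. This Python example is an illustration, not part of the slides; `jump_target` is a hypothetical name, and the sample PC and field values are arbitrary. It assumes the usual MIPS rule: the upper 4 bits of PC + 4, then the 26-bit field, then two zero bits.

```python
def jump_target(jump_pc, address_field):
    """Form the 32-bit jump address from PC + 4 and the 26-bit instruction field."""
    upper = (jump_pc + 4) & 0xF0000000                  # upper 4 bits of PC + 4
    return upper | ((address_field & 0x03FFFFFF) << 2)  # field is a word address

print(hex(jump_target(0x00400018, 0x0100000)))  # 0x400000
```

Shifting the field left by 2 converts the word address stored in the instruction into a byte address.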
94. Decoding Machine Code
• Q: What is the assembly language statement
corresponding to this machine instruction?
00af8020 (hexadecimal)
Solution:
converting hexadecimal to binary
Binary instruction format
Assembly instruction
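The three steps can be carried out mechanically. The Python sketch below is an illustration only: `decode_r_type` is a hypothetical helper that handles just R-type add and sub, the two operations used in these slides, and the register name table follows the standard MIPS numbering.

```python
REGISTER_NAMES = [
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra",
]
FUNCT_NAMES = {32: "add", 34: "sub"}  # only the operations used in these slides

def decode_r_type(word):
    """Split a 32-bit word into R-type fields and format it as assembly text."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    if op != 0 or funct not in FUNCT_NAMES:
        raise ValueError("not an R-type add/sub instruction")
    return f"{FUNCT_NAMES[funct]} {REGISTER_NAMES[rd]},{REGISTER_NAMES[rs]},{REGISTER_NAMES[rt]}"

print(f"{0x00AF8020:032b}")       # 00000000101011111000000000100000
print(decode_r_type(0x00AF8020))  # add $s0,$a1,$t7
```

The op field is 000000 and funct is 100000 (32), so the word is an R-type add with rs = 5 ($a1), rt = 15 ($t7) and rd = 16 ($s0).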
95. Translating Machine Language
to Assembly Language
• Translate the following machine language code into
assembly language.
0x02F34022
96. Performance
• Performance is the key to understanding the underlying motivation for
the hardware and its organization
• Measure, report, and summarize performance to enable users to
o make intelligent choices
o see through the marketing hype!
• Why is some hardware better than others for different programs?
• What factors of system performance are hardware related?
(e.g., do we need a new machine, or a new operating system?)
• How does the machine's instruction set affect performance?
97. Computer Performance:
TIME, TIME, TIME!!!
• Response Time (elapsed time, latency): an individual user's concern
o how long does it take for my job to run?
o how long does it take to execute (start to
finish) my job?
o how long must I wait for the database query?
• Throughput: a systems manager's concern
o how many jobs can the machine run at once?
o what is the average execution rate?
o how much work is getting done?
• If we upgrade a machine with a new processor what do we increase?
• If we add a new machine to the lab what do we increase?
98. Execution Time
• Elapsed Time
o counts everything (disk and memory accesses, waiting for I/O, running
other programs, etc.) from start to finish
o a useful number, but often not good for comparison purposes
elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
o doesn't count waiting for I/O or time spent running other programs
o can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
elapsed time = user CPU time + system CPU time + wait time
• Our focus: user CPU time (CPU execution time or, simply, execution
time)
o time spent executing the lines of code that are in our program
99. Definition of Performance
• For some program running on machine X:
Performance_X = 1 / Execution time_X
• If there are two machines X and Y and the performance of X is greater than the performance of
Y, then
Performance_X > Performance_Y
i.e., 1 / Execution time_X > 1 / Execution time_Y, so Execution time_Y > Execution time_X
• X is n times faster than Y means:
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
100. Q: If computer A runs a program in 10 sec
and computer B runs the same program in
15 sec, how much faster is A than B?
• We know that
Performance_A / Performance_B
= Execution time_B / Execution time_A = n
Thus the performance ratio is
Execution time_B / Execution time_A = 15 / 10 = 1.5
i.e., Performance_A / Performance_B = 1.5
Therefore, A is 1.5 times faster than B.
101. Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles.
In modern computers hardware events progress cycle by cycle: in
other words, each event, e.g., multiplication, addition, etc., is a
sequence of cycles
• Clock ticks indicate start and end of cycles:
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = clock cycles per second (1 Hz = 1
cycle/sec, 1 MHz = 10^6 cycles/sec)
• Example: A 200 MHz clock has a cycle time of ????
seconds/program = (cycles/program) × (seconds/cycle)
[Figure: a time axis marked with clock ticks; one cycle is the interval between two consecutive ticks]
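Cycle time is the reciprocal of clock rate, which answers the 200 MHz example. A minimal Python sketch, added for illustration; the function name `cycle_time_seconds` is hypothetical.

```python
def cycle_time_seconds(clock_rate_hz):
    """Clock cycle time is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz

t = cycle_time_seconds(200e6)  # 200 MHz = 200 × 10^6 cycles/sec
print(f"{t * 1e9:.1f} ns")     # 5.0 ns
```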
102. Performance Equation I
• So, to improve performance one can either:
o reduce the number of cycles for a program, or
o reduce the clock cycle time, or, equivalently,
o increase the clock rate
seconds/program = (cycles/program) × (seconds/cycle)
CPU execution time for a program = CPU clock cycles for a program × Clock cycle time
or, equivalently,
CPU execution time for a program = CPU clock cycles for a program / Clock rate
103. Our favorite program runs in 10 seconds on computer A, which has a 2
GHz clock. We are trying to help a computer designer build a computer,
B, which will run this program in 6 seconds. The designer has determined
that a substantial increase in the clock rate is possible, but this increase
will affect the rest of the CPU design, causing computer B to require 1.2
times as many clock cycles as computer A for this program. What clock
rate should we tell the designer to target?
CPU time_A = CPU clock cycles_A / clock rate_A
10 sec = CPU clock cycles_A / (2 × 10^9 cycles/sec)
CPU clock cycles_A = 10 sec × 2 × 10^9 cycles/sec
= 20 × 10^9 cycles
CPU time_B = 1.2 × CPU clock cycles_A / clock rate_B
6 sec = 1.2 × 20 × 10^9 cycles / clock rate_B
clock rate_B = 1.2 × 20 × 10^9 cycles / 6 sec = 4 × 10^9 Hz
To run the program in 6 sec, B must have a clock rate of 4 × 10^9 Hz (4 GHz), twice that of A.
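The same calculation as a short Python sketch, added for illustration; the function and parameter names are invented for this example.

```python
def required_clock_rate(time_a, rate_a, cycle_factor, time_b):
    """Clock rate B needs so that cycle_factor × cycles_A finishes in time_b seconds."""
    cycles_a = time_a * rate_a                # CPU clock cycles on computer A
    return cycle_factor * cycles_a / time_b   # cycles on B divided by target time

rate_b = required_clock_rate(time_a=10, rate_a=2e9, cycle_factor=1.2, time_b=6)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```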
104. Instruction Performance
• The previous equation makes no reference to the number of instructions
• The execution time also depends on the number of
instructions in the program
Clock cycles per instruction (CPI)
• Average number of clock cycles per instruction for a
program or program fragment
• CPU clock cycles = Instruction count × CPI
105. Suppose we have two implementations of the same instruction
set architecture. Computer A has a clock cycle time of 250 ps
and a CPI of 2.0 for some program, and computer B has a
clock cycle time of 500 ps and a CPI of 1.2 for the same
program. Which computer is faster for this program and by
how much?
• The same number of instructions is executed
on both computers
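A worked answer, sketched in Python for illustration. It uses CPU time = instruction count × CPI × clock cycle time; the instruction count of 1,000,000 is a hypothetical value that cancels in the ratio, and `cpu_time_ps` is an invented helper name.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count × CPI × clock cycle time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps

I = 1_000_000                      # hypothetical instruction count; cancels in the ratio
time_a = cpu_time_ps(I, 2.0, 250)  # 500 ps per instruction on average
time_b = cpu_time_ps(I, 1.2, 500)  # 600 ps per instruction on average
print(f"A is {time_b / time_a:.1f} times faster than B")  # A is 1.2 times faster than B
```

Computer A has the higher CPI but the shorter cycle time, and the product of the two decides the result.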
106. Instruction Performance
CPU execution time for a program = Instruction count × average CPI × Clock cycle time
or
CPU execution time for a program = Instruction count × average CPI / Clock rate