3. Performance Metrics
Purchasing perspective
◦ given a collection of machines, which has the
◦ best performance?
◦ least cost?
◦ best cost/performance?
Design perspective
◦ faced with design options, which has the
◦ best performance improvement?
◦ least cost?
◦ best cost/performance?
Both require
◦ basis for comparison
◦ metric for evaluation
Our goal is to understand what factors in the architecture contribute to overall
system performance and the relative importance (and cost) of these factors
5. Response Time and Throughput
Response time
◦ How long it takes to do a task
Throughput
◦ Total work done per unit time
◦ e.g., tasks/transactions/… per hour
How are response time and throughput affected by
◦ Replacing the processor with a faster version?
◦ Adding more processors?
We’ll focus on response time for now…
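One way to explore the two questions above is a deliberately simplified model. The sketch below assumes independent tasks, no queueing, and no communication overhead; the function names and the 2-second task are hypothetical, chosen only for illustration.

```python
def response_time(task_time_s, speedup=1.0):
    """Time to finish one task on one processor that is `speedup` times faster."""
    return task_time_s / speedup


def throughput(task_time_s, processors=1, speedup=1.0):
    """Tasks completed per second, assuming independent tasks and no queueing."""
    return processors * speedup / task_time_s


base_task = 2.0  # hypothetical: one task takes 2 s on the original processor

# Faster processor: response time drops and throughput rises.
print(response_time(base_task, speedup=2.0), throughput(base_task, speedup=2.0))

# More processors: throughput rises, per-task response time is unchanged
# in this model (real systems with queueing can behave differently).
print(response_time(base_task), throughput(base_task, processors=4))
```

In this model a faster processor helps both metrics, while extra processors help only throughput; once tasks wait in a queue, extra processors can shorten response time as well.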
6. Relative Performance
Performance = 1/Execution Time
“X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Example: time taken to run a program
10s on A, 15s on B
$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$
So A is 1.5 times faster than B
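The same ratio can be computed directly. This is a minimal sketch; the function name `relative_performance` is an assumption made for illustration.

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y.

    Performance = 1 / execution time, so the performance ratio X/Y
    equals the execution-time ratio Y/X.
    """
    return time_y / time_x


# Program takes 10 s on A and 15 s on B.
n = relative_performance(10.0, 15.0)
print(n)  # 1.5
```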
7. Measuring Execution Time
Elapsed time
◦ Total response time, including all aspects
◦ Processing, I/O, OS overhead, idle time
◦ Determines system performance
CPU time
◦ Time spent processing a given job
◦ Discounts I/O time, other jobs’ shares
◦ Comprises user CPU time and system CPU time
◦ Different programs are affected differently by CPU and system
performance
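The difference between elapsed time and CPU time can be observed with Python's standard `time` module. The sketch below is illustrative only; the helper name `measure` is an assumption, and a short sleep stands in for I/O or idle time.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) time
    cpu_start = time.process_time()   # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


# Sleeping counts toward elapsed time but uses almost no CPU time.
elapsed, cpu = measure(lambda: time.sleep(0.05))
print(f"elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

`time.process_time()` reports the sum of user and system CPU time; `os.times()` reports the two separately if the split is needed.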
8. CPU Clocking
Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram. Labels: Clock (cycles), Data transfer and computation, Update state, Clock period]
Clock period: duration of a clock cycle
e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
9. Review: Machine Clock Rate
Clock rate (clock cycles per second in MHz or GHz) is inverse of clock cycle
time (clock period)
[Figure: clock waveform with one clock period marked]
1 ns ($10^{-9}$ s) clock cycle => 1 GHz ($10^{9}$ Hz) clock rate
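Because clock rate and clock period are reciprocals, converting between them is a one-line calculation. A small sketch, with hypothetical helper names:

```python
def clock_rate_hz(period_s):
    """Clock rate in Hz for a clock period given in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz):
    """Clock period in seconds for a clock rate given in Hz."""
    return 1.0 / rate_hz


# 250 ps period corresponds to roughly 4 GHz; 1 GHz corresponds to roughly 1 ns.
print(f"{clock_rate_hz(250e-12) / 1e9:.1f} GHz")
print(f"{clock_period_s(1e9) * 1e9:.1f} ns")
```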
10. CPU Time
Performance improved by
◦ Reducing number of clock cycles
◦ Increasing clock rate
◦ Hardware designer must often trade off clock rate against cycle count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
11. CPU Time Example
Computer A: 2GHz clock, 10s CPU time
Designing Computer B
◦ Aim for 6s CPU time
◦ Can do faster clock, but causes 1.2 × clock cycles of A
How fast must Computer B clock be?
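$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$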
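The calculation for Computer B can be checked numerically. This sketch uses the relation CPU Time = Clock Cycles / Clock Rate; the function names are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate needed to run `cycles` cycles in `target_time_s` seconds."""
    return cycles / target_time_s


cycles_a = clock_cycles(10, 2e9)            # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                   # B needs 1.2x the cycles of A
rate_b = required_clock_rate(cycles_b, 6)   # B must finish in 6 s

print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```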
12. Instruction Count and CPI
Instruction Count for a program
◦ Determined by program, ISA and compiler
Average cycles per instruction
◦ Determined by CPU hardware
◦ If different instructions have different CPI
◦ Average CPI affected by instruction mix
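$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$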
13. CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
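$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
A is faster, by this much:
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$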
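The comparison can be reproduced with CPU Time = Instruction Count × CPI × Clock Cycle Time. In this sketch the instruction count of one million is a hypothetical value; it cancels in the ratio because both computers share the same ISA and program.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # hypothetical instruction count; cancels out in the ratio

time_a = cpu_time(I, 2.0, 250e-12)  # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time(I, 1.2, 500e-12)  # Computer B: CPI 1.2, 500 ps cycle

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")  # A is 1.2 times faster than B
```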
14. CPI in More Detail
If different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$
Weighted average CPI
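$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$
Relative frequency: $\text{Instruction Count}_i / \text{Instruction Count}$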
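A weighted average CPI can be computed by summing cycles per class and dividing by the total instruction count. The instruction mixes below are hypothetical, chosen only to illustrate the formula; the function name is likewise an assumption.

```python
def weighted_cpi(mix):
    """Average CPI for a list of (cpi_i, instruction_count_i) pairs."""
    total_cycles = sum(cpi * count for cpi, count in mix)
    total_instructions = sum(count for _, count in mix)
    return total_cycles / total_instructions


# Hypothetical mixes over three instruction classes with CPI 1, 2 and 3.
mix_1 = [(1, 2), (2, 1), (3, 2)]  # 10 cycles over 5 instructions
mix_2 = [(1, 4), (2, 1), (3, 1)]  # 9 cycles over 6 instructions

print(weighted_cpi(mix_1))  # 2.0
print(weighted_cpi(mix_2))  # 1.5
```

The second mix executes more instructions but fewer cycles, which shows how the instruction mix changes the average CPI.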
15. Power Trends
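In CMOS IC technology (§1.7 The Power Wall)
$$\text{Power} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
Annotations on the slide: Power ×40, Voltage 5V → 1V, Frequency ×1000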
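Since dynamic power scales with capacitive load × voltage² × frequency, the power of one design relative to another depends only on the scaling factors; any constant multiplier cancels. The sketch below uses hypothetical scaling factors of 0.85 for all three quantities, and the function name is an assumption.

```python
def relative_power(cap_scale, voltage_scale, freq_scale):
    """Ratio of new to old dynamic power for the given scaling factors."""
    return cap_scale * voltage_scale ** 2 * freq_scale


# Hypothetical new design: 85% of the capacitive load, voltage and frequency.
ratio = relative_power(0.85, 0.85, 0.85)
print(f"{ratio:.2f}")  # 0.52

# Dropping the supply voltage from 5 V to 1 V alone cuts power by 25x.
print(f"{relative_power(1.0, 1.0 / 5.0, 1.0):.2f}")  # 0.04
```

Voltage enters squared, which is why lowering the supply voltage has been the main lever for holding power down as frequency rose.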
17. Multiprocessors
Multicore microprocessors
◦ More than one processor per chip
Requires explicitly parallel programming
◦ Compare with instruction level parallelism
◦ Hardware executes multiple instructions at once
◦ Hidden from the programmer
◦ Hard to do
◦ Programming for performance
◦ Load balancing
◦ Optimizing communication and synchronization
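The effect of load balancing can be illustrated with a toy model in which the parallel run finishes only when the most heavily loaded processor is done. This sketch assumes zero communication and synchronization cost, and the work splits are hypothetical.

```python
def parallel_speedup(loads):
    """Speedup over one processor, given the work assigned to each processor.

    Assumes no communication or synchronization overhead, so the
    parallel time is set by the most heavily loaded processor.
    """
    serial_time = sum(loads)
    parallel_time = max(loads)
    return serial_time / parallel_time


print(parallel_speedup([25, 25, 25, 25]))  # 4.0 (balanced)
print(parallel_speedup([40, 20, 20, 20]))  # 2.5 (one processor overloaded)
```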