Fundamentals of Computer Design
Unit 1- Chapter 1
1
Reference:
Computer Architecture: A
Quantitative Approach, John
L. Hennessy and David A.
Patterson, 4th edition
2
Outline
 Introduction
 Computing Classes
 Task of Computer Designer
 Measuring, Reporting,
Summarizing Performance
 Quantitative Principles of Design
3
Introduction
ENIAC, 1946
(first general-purpose electronic computer)
Occupied a 50 x 30 feet room,
weighed 30 tonnes,
contained 18,000 electronic valves,
consumed 25 kW of electrical power;
capable of performing 100K calculations per second
CHANGE!
PlayStation Portable (PSP)
Approx. 170 mm (L) x 74 mm (W) x 23 mm (D)
Weight: Approx. 260 g (including battery)
CPU: PSP CPU (clock frequency 1~333MHz)
Main Memory: 32MB
Embedded DRAM: 4MB
Profile: PSP Game, UMD Audio, UMD Video
4
A short history of computing
 Continuous growth in performance due to advances in technology and
innovations in computer design
 First 25 years (1945 – 1970)
 25% yearly growth in performance
 Both forces contributed to performance improvement
 Mainframes and minicomputers dominated the industry
 Late 70s, emergence of the microprocessor
 35% yearly growth in performance thanks to integrated circuit technology
 Changes in computer marketplace:
elimination of assembly language programming and
emergence of Unix made it easier to develop new architectures
 Mid 80s, emergence of RISCs (Reduced Instruction Set Computers)
 52% yearly growth in performance
 Performance improvements through instruction level parallelism
(pipelining, multiple instruction issue), caches
 Since 2002, end of 16 years of renaissance
 20% yearly growth in performance
 Limited by 3 hurdles: maximum power dissipation,
limited instruction-level parallelism, and the so-called “memory wall”
 Switch from ILP to TLP and DLP (thread- and data-level parallelism)
5
Effect of this Dramatic Growth
 Significant enhancement of the capability
available to computer user
 Example: today’s $500 PC has more performance,
more main memory, and more disk storage than a $1
million computer in 1985
 Microprocessor-based computers dominate
 Workstations and PCs
have emerged as major products
 Minicomputers - replaced by servers
 Mainframes - replaced by multiprocessors
 Supercomputers - replaced by
large arrays of microprocessors
6
Changing Face of Computing
 In the 1960s mainframes roamed the planet
 Very expensive, operators oversaw operations
 Applications: business data processing,
large scale scientific computing
 In the 1970s, minicomputers emerged
 Less expensive, time sharing
 In the 1990s, Internet and WWW, handheld devices
(PDA), high-performance consumer electronics for
video games and set-top boxes have emerged
 Dramatic changes have led to
3 different computing markets
 Desktop computing, Servers, Embedded Computers
7
Computing Classes: A Summary
Feature | Desktop | Server | Embedded
Price of the system | $500-$5K | $5K-$5M | $10-$100K (including network routers at high end)
Price of the processor | $50-$500 | $200-$10K | $0.01-$100
Sold per year (estimates for 2000) | 150M | 4M | 300M (only 32-bit and 64-bit)
Critical system design issues | Price and performance, graphics, compute performance | Throughput, availability, scalability | Price, power consumption, application-specific performance
8
Desktop Computers
 Largest market in dollar terms
 Spans low-end (<$500) to high-end ($5K) systems
 Optimize price-performance
 Performance measured in the number of calculations
and graphic operations
 Price is what matters to customers
 Arena where the newest, highest-performance and
cost-reduced microprocessors appear
 Reasonably well characterized in terms of
applications and benchmarking
9
Servers
 Provide more reliable file and computing
services (Web servers)
 Key requirements
 Availability – effectively provide service
(Yahoo!, Google, eBay)
 Reliability – never fails
 Scalability – server systems grow over time,
so the ability to scale up the computing
capacity is crucial
 Performance – transactions per minute
 Related category: clusters / supercomputers
10
Embedded Computers
 Fastest growing portion of the market
 Computers as parts of other devices where their presence is not
obviously visible
 E.g., home appliances, printers, smart cards,
cell phones, palmtops, gaming consoles, network routers
 Wide range of processing power and cost
 $0.1 (8-bit, 16-bit processors), $10 (32-bit processors capable of
executing 50M instructions per second), $100-$200 (high-end video
gaming consoles and network switches)
 Requirements
 Real-time performance requirement
(e.g., time to process a video frame is limited)
 Minimize memory requirements, power
 SOCs (System-on-a-chip) combine processor cores and
application-specific circuitry, DSP processors…
11
Task of Computer Designer
 “Determine what attributes are important for
a new machine; then design a machine to
maximize performance while staying within
cost, power, and availability constraints.”
 Aspects of this task
 Instruction set design
 Functional organization
 Logic design and implementation
(IC design, packaging, power, cooling...)
12
What is Computer Architecture?
 Instruction Set Architecture
 the computer as visible to the assembly language
programmer or compiler writer (registers, data types,
instruction set, instruction formats, addressing modes)
 Organization
 high level aspects of computer’s design such as
the memory system, the bus structure, and
the internal CPU (datapath + control) design
 Hardware
 detailed logic design, interconnection and packaging
technology, external connections
Computer Architecture covers
all three aspects of computer design:
Instruction Set Architecture, Organization, and
Hardware
13
14
Instruction Set Architecture:
Critical Interface
[Figure: the instruction set as the interface between software and hardware]
 Properties of a good abstraction
 Lasts through many generations (portability)
 Used in many different ways (generality)
 Provides convenient functionality to higher levels
 Permits an efficient implementation at lower levels
Measuring, Reporting,
Summarizing Performance
15
16
Cost-Performance
 Purchasing perspective: from a collection of
machines, choose one which has
 best performance?
 least cost?
 best performance/cost?
 Computer designer perspective:
faced with design options, select one which has
 best performance improvement?
 least cost?
 best performance/cost?
 Both require: basis for comparison and
metric for evaluation
17
Two “notions” of performance
 Which computer has better performance?
 User: one which runs a program in less time
 Computer centre manager:
one which completes more jobs in a given time
 Users are interested in reducing
Response time or Execution time
 the time between the start and
the completion of an event
 Managers are interested in increasing
Throughput or Bandwidth
 total amount of work done in a given time
18
An Example
 Which has higher performance?
 Time to deliver 1 passenger?
 B is 6.5/3 = 2.2 times faster
 No. of passengers delivered per hour?
 A is 72/44 = 1.6 times faster
Plane | DC to Paris [hours] | Top speed [mph] | Passengers | Throughput [passengers/hour]
A | 6.5 | 610 | 470 | 72 (= 470/6.5)
B | 3 | 1350 | 132 | 44 (= 132/3)
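The two rankings in the plane example can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides; the `Plane` class and its field names are illustrative assumptions, and the figures are the ones from the table.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    """One row of the plane table (names here are illustrative)."""
    name: str
    trip_hours: float   # DC to Paris flight time
    passengers: int

    def throughput(self) -> float:
        """Passengers delivered per hour of flight."""
        return self.passengers / self.trip_hours


plane_a = Plane("A", 6.5, 470)
plane_b = Plane("B", 3.0, 132)

# Response-time view: how many times faster is B for one passenger?
latency_ratio = plane_a.trip_hours / plane_b.trip_hours

# Throughput view: how many times more passengers per hour does A deliver?
throughput_ratio = plane_a.throughput() / plane_b.throughput()

print(f"B is {latency_ratio:.1f} times faster per trip")
print(f"A delivers {throughput_ratio:.1f} times more passengers per hour")
```

Which plane "wins" depends entirely on whether the metric is response time or throughput, which is the point of the example.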
19
Definition of Performance
 We are primarily concerned with Response Time
 Performance :
 “X is n times faster than Y”
 Since “faster” means both increased performance and decreased execution
time, to reduce confusion we will use “improve performance” or
“improve execution time”
$$\text{Performance}(x) = \frac{1}{\text{Execution\_time}(x)}$$
$$n = \frac{\text{Execution\_time}(y)}{\text{Execution\_time}(x)} = \frac{\text{Performance}(x)}{\text{Performance}(y)}$$
20
Execution Time and Its Components
 Execution Time - response time, elapsed time
 the latency to complete a task, including disk
accesses, memory accesses, input/output activities,
operating system overhead,...
 CPU time
 the time the CPU is computing, excluding time spent waiting for I/O or
running other programs under multiprogramming
 often further divided into user and system CPU times
 User CPU time
 the CPU time spent in the program
 System CPU time
 the CPU time spent in the operating system
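The distinction between elapsed time and user/system CPU time can be observed directly. The sketch below is an illustration added here, not taken from the slides; `measure`, `compute_bound` and `mostly_waiting` are hypothetical names, and it relies only on Python's standard `time.perf_counter` and `os.times`.

```python
import os
import time


def measure(task):
    """Run task() once; return elapsed, user CPU and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
    }


def compute_bound():
    """Keeps the CPU busy, so elapsed time is mostly user CPU time."""
    return sum(i * i for i in range(300_000))


def mostly_waiting():
    """Sleeps instead of computing, so elapsed time far exceeds CPU time."""
    time.sleep(0.05)


if __name__ == "__main__":
    for task in (compute_bound, mostly_waiting):
        t = measure(task)
        print(f"{task.__name__}: elapsed={t['elapsed']:.3f}s "
              f"user={t['user']:.3f}s system={t['system']:.3f}s")
```

For the sleeping task the elapsed time is dominated by waiting, so its CPU time stays close to zero even though its response time does not.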
21
CPU Time for a program
 Instruction count (IC) = Number of
instructions executed
 Clock cycles per instruction (CPI)
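$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$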
CPI - one way to compare two machines with same instruction set,
since Instruction Count would be the same
22
CPU Execution Time (cont’d)
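$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$
Substituting the units of measurement:
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}}$$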
 Basic technologies involved in changing each
characteristic are interdependent:
 Hence it is difficult to change one parameter without
affecting others
23
CPU Execution Time (cont’d)
Factor | IC | CPI | Clock rate
Program | X | |
Compiler | X | |
ISA | X | X |
Organisation | | X | X
Technology | | | X
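The processor performance equation, CPU time = IC x CPI x clock cycle time, is easy to evaluate numerically. The sketch below uses made-up figures for two hypothetical machines that share an instruction set (and therefore the same instruction count); the function name `cpu_time` is an assumption introduced for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: IC x CPI x clock cycle time (cycle time = 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical figures: one program (same ISA, hence same IC) on two machines.
IC = 2_000_000_000
time_m1 = cpu_time(IC, cpi=2.0, clock_rate_hz=2.0e9)
time_m2 = cpu_time(IC, cpi=1.2, clock_rate_hz=1.5e9)

print(f"M1: {time_m1:.2f} s, M2: {time_m2:.2f} s")
print(f"M2 is {time_m1 / time_m2:.2f} times faster than M1")
```

In this made-up case the machine with the lower clock rate still finishes first because its CPI is lower, which is why no single factor can be judged in isolation.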
24
How to Calculate 3 Components?
 Clock Cycle Time
 in specification of computer
(Clock Rate in advertisements)
 Instruction count
 Count instructions in loop of small program
 Use simulator to count instructions
 Hardware counter in special register (Pentium II)
 CPI
 Calculate:
Execution Time / Clock cycle time / Instruction Count
 Hardware counter in special register (Pentium II)
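The CPI calculation listed above (execution time divided by clock cycle time, divided by instruction count) can be sketched as follows. The measurement values are invented for illustration, and `cpi_from_measurements` is a hypothetical helper, not a tool mentioned in the slides.

```python
def cpi_from_measurements(execution_time_s: float,
                          clock_rate_hz: float,
                          instruction_count: float) -> float:
    """CPI = execution time / clock cycle time / instruction count."""
    clock_cycle_time_s = 1.0 / clock_rate_hz
    total_cycles = execution_time_s / clock_cycle_time_s
    return total_cycles / instruction_count


# Invented measurements: 1.5 s of CPU time on a 2 GHz clock,
# with 1.2 billion instructions reported by a counter or simulator.
cpi = cpi_from_measurements(1.5, 2.0e9, 1_200_000_000)
print(f"CPI = {cpi:.2f}")
```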
25
Metrics
 For desktop and workstations in most cases it is execution
time
 Also known as response time
 Reciprocal of performance
 For servers it is normally throughput
 How many jobs can get done in unit time
 Almost always a response time limit (per job) is also
imposed
 For embedded processors it is also execution time
 In many situations a hard or soft real time deadline is
imposed
 Hard deadlines cannot be missed; soft deadlines can be
missed in a limited number of cases
 Goal: maximize performance within power budget
26
SPEC- Benchmarks
 Want to compare two processors by measuring
their performance
 Need some standardized set of programs
 These are called benchmark programs
 Each market sector has a different focus and needs
a different set of benchmarks
27
Choosing Programs to Evaluate Performance
 Ideally, run typical programs with typical input before
purchase, or even before the machine is built
 Engineer uses compiler, spreadsheet
 Author uses word processor, drawing program,
compression software
 Workload – mixture of programs and OS commands
that users run on a machine
28
Benchmarks
 Different types of benchmarks
 Real programs (Ex. MSWord, Excel, Photoshop,...)
 Kernels - small pieces from real programs (Linpack,...)
 Toy Benchmarks - short, easy to type and run
( Quicksort, Puzzle,...)
 Synthetic benchmarks - code that matches frequency
of key instructions and operations to real programs
(Whetstone, Dhrystone)
 Need industry standards so that different processors
can be fairly compared
 Companies exist that create these benchmarks:
“typical” code used to evaluate systems
29
Desktop benchmarks
 SPEC CPU is the standard
 Provided by Standard Performance Evaluation Corporation (SPEC)
http://www.spec.org/
 Currently in the 5th generation: SPEC89, SPEC92, SPEC95,
SPEC2000, SPEC2006
 Has two types of programs: integer and floating-point to stress
different CPU units
 SPEC2000 has 12 integer and 14 floating-point programs
 If you are introducing a new processor for desktop or uniprocessor
workstation/server, you are supposed to report performance of all 26
programs
 For graphics performance two available benchmarks: SPECviewperf
and SPECapc
30
Server benchmarks
 SPECrate
 Not the most reliable measure: run a copy of the same
SPEC application on each processor and report average
number of jobs finished per unit time
 SPECSFS
 Tests the performance of NFS using a series of file
server requests; measures CPU, disk and network
throughput; also puts response time limit on each request
 SPECWeb
 Web server benchmark; simulates multiple clients
requesting both static and dynamic pages from a server;
also clients can upload data on the server
31
Server benchmarks
 Transaction processing (TP)
 Probably the most widely used benchmark in server community
 Measures database access and update throughput of a server
 Simple examples: airline reservation, bank ATM
 Complex systems: complicated query processing e.g. online book
shops (amazon)
 Provided by the Transaction Processing Performance Council (TPC)
 TPC-A was the first one; now we have TPC-C (complex queries),
TPC-H (unrelated queries), TPC-R (DBMS is optimized based on
past query pattern), TPC-W (business-oriented transactional web
server)
 The measured metric is transactions per second along with response
time limit for individual transactions
32
Embedded sector
 Quite difficult to come up with a good benchmark
 Wide range of applications
 Requirements vary widely from one system to another
 Reality: in many cases an embedded system designer
designs his/her own benchmarks which are either the
target applications or kernels extracted from them
 Today the best known benchmark set is offered by
EEMBC (Embedded Microprocessor Benchmark
Consortium)
 EEMBC has five types of applications:
automotive/industrial (16 kernels), consumer (5
multimedia kernels), networking (3 kernels), office
automation (4 graphics and text processing kernels),
telecommunications (6 filtering and DSP kernels)
33
Comparing and Summarising Performance
 An Example
Program | Com. A | Com. B | Com. C
P1 (sec) | 1 | 10 | 20
P2 (sec) | 1000 | 100 | 20
Total (sec) | 1001 | 110 | 40
– A is 20 times faster than C for program P1
– C is 50 times faster than A for program P2
– B is 2 times faster than C for program P1
– C is 5 times faster than B for program P2
 What can we learn from these statements?
 Taken one at a time, they tell us nothing about the overall
relative performance of computers A, B, C!
 One approach to summarise relative performance:
use total execution times of programs
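The per-program statements and the total-time summary can both be derived from the same table. This is an illustrative sketch only; the dictionary layout and the helper `times_faster` are assumptions, and the numbers are the execution times given for computers A, B and C.

```python
# Execution times in seconds, as given in the example.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def times_faster(fast: str, slow: str, program: str) -> float:
    """How many times faster `fast` is than `slow` on one program."""
    return times[slow][program] / times[fast][program]


# Per-program comparisons point in different directions.
print(times_faster("A", "C", "P1"))
print(times_faster("C", "A", "P2"))

# Total execution time gives a single, consistent ranking.
totals = {name: sum(per_program.values()) for name, per_program in times.items()}
ranking = sorted(totals, key=totals.get)
print(totals, ranking)
```

Using totals, C comes out ahead of B and A for this particular workload, where each program is run exactly once.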
Comparing and Summarising Performance (cont’d)
 A benchmark program is compiled and run
on the computer under test; note the running time
 The same program is compiled and run on a
reference computer
 Compute the SPEC ratio as
$$\text{SPEC ratio} = \frac{\text{running time on the reference computer}}{\text{running time on the computer under test}}$$
A SPEC ratio of 50 means the computer under test is 50 times faster than the reference computer.
Comparing and Summarising Performance (cont’d)
 The above process is repeated for all programs in
the SPEC suite
 The overall SPEC ratio for the computer is the geometric mean
of the individual ratios:
$$\text{SPEC}_{\text{ratio}} = \sqrt[n]{\prod_{i=1}^{n} \text{SPEC}_i}$$
Where,
n: number of programs in the suite
SPEC_i: ratio for program i in the suite
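A small sketch of the overall SPEC ratio as a geometric mean of per-program ratios follows. The running times are invented for illustration (they are not SPEC data), and the function names are assumptions; only the standard library's `math.prod` is used.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Running time on the reference computer divided by time on the computer under test."""
    return reference_time_s / measured_time_s


def overall_spec_ratio(ratios) -> float:
    """Geometric mean: the n-th root of the product of the n ratios."""
    ratios = list(ratios)
    return math.prod(ratios) ** (1.0 / len(ratios))


# Invented running times (seconds) for a three-program suite.
reference_times = [100.0, 400.0, 900.0]
measured_times = [50.0, 50.0, 225.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
print(ratios)
print(f"overall SPEC ratio = {overall_spec_ratio(ratios):.2f}")
```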
Comparing and Summarising Performance (cont’d)
 Suppose the SPECRatio of computer A on a benchmark
is 1.25 times higher than that of computer B
 This means the performance of computer A is
1.25 times higher than that of computer B
 Note: the choice of the reference computer is
irrelevant
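Why the reference computer does not matter can be written out in one line. The derivation below is a sketch based on the SPEC ratio definition (reference running time divided by running time under test); the subscripts ref, A and B are notation introduced here for illustration.

$$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{ExTime}_{ref} / \text{ExTime}_A}{\text{ExTime}_{ref} / \text{ExTime}_B} = \frac{\text{ExTime}_B}{\text{ExTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$

The reference time appears in both numerator and denominator and cancels, so only the two measured execution times remain.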
37
Quantitative Principles of Design
 Where to spend time making improvements?
 Make the Common Case Fast
 Most important principle of computer design:
Spend your time on improvements where those
improvements will do the most good
 Example
 Instruction A represents 5% of execution
 Instruction B represents 20% of execution
 Even if you can drive the time for A to 0, the program will run only
about 5% faster, so B is the more rewarding target
 Key questions
 What is the frequent case?
 How much can performance be improved by making
that case faster?
38
1. Focus on the Common Case -
Amdahl’s Law
 Suppose that we make an enhancement to a
machine that will improve its performance; Speedup
is the ratio:
$$\text{Speedup} = \frac{\text{ExTime for entire task without enhancement}}{\text{ExTime for entire task using enhancement}}$$
$$\text{Speedup} = \frac{\text{Performance for entire task using enhancement}}{\text{Performance for entire task without enhancement}}$$
 Amdahl’s Law states that the performance
improvement that can be gained by a particular
enhancement is limited by the fraction of time that the
enhancement can be used
39
Computing Speedup
 Fraction_enhanced = fraction of execution time in
the original machine that can be converted to take
advantage of the enhancement (e.g., 10/30)
 Speedup_enhanced = how much faster the
enhanced code will run (e.g., 10/2 = 5)
 Execution_time_new = execution time of the unenhanced part +
new execution time of the enhanced part
$$\text{ExTime}_{new} = \text{ExTime}_{unenhanced} + \frac{\text{ExTime}_{enhanced}}{\text{Speedup}_{enhanced}}$$
[Figure: a 30-unit run split into 20 (unenhanced) + 10 (enhanced) becomes 20 + 2 after the enhancement]
40
Computing Speedup (cont’d)
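 so times are:
$$\text{ExTime}_{unenhanced} = \text{ExTime}_{old} \times (1 - \text{Fraction}_{enhanced})$$
$$\text{ExTime}_{enhanced} = \text{ExTime}_{old} \times \text{Fraction}_{enhanced}$$
 Factor out ExTime_old and divide by
Speedup_enhanced:
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right]$$
 Overall speedup is the ratio of ExTime_old to ExTime_new:
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$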
41
Example 1
 Enhancement runs 10 times faster and it
affects 40% of the execution time; compute
the overall speedup :
 Fraction_enhanced = 0.40
 Speedup_enhanced = 10
 Speedup_overall = ?
$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
Example 2
 A common transformation required in graphics processors is square
root. Implementations of floating-point (FP) square root vary
significantly in performance, especially among processors designed
for graphics. Suppose FP square root (FPSQR) is responsible for
20% of the execution time of a critical graphics benchmark. One
proposal is to enhance the FPSQR hardware and speed up this
operation by a factor of 10.
 The other alternative is just to try to make all FP instructions in the
graphics processor run faster by a factor of 1.6; FP instructions are
responsible for half of the execution time for the application. The
design team believes that they can make all FP instructions run 1.6
times faster with the same effort as required for the fast square root.
Compare these two design alternatives.
42
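Applying Amdahl’s Law to each alternative:
$$\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
$$\text{Speedup}_{FPSQR} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$
$$\text{Speedup}_{FP} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23$$
 Hence improving the performance of the FP operations
overall is slightly better because of the higher frequency.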
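Amdahl's Law is short enough to check numerically. The sketch below is an illustration, not part of the slides; `amdahl_speedup` is a hypothetical helper, and the inputs are the fractions and speedups stated in Example 1 and Example 2.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S) for enhanced fraction F and local speedup S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Example 1: 40% of the execution time runs 10 times faster.
example_1 = amdahl_speedup(0.40, 10)

# Example 2: FPSQR (20% of time, 10x) versus all FP instructions (50% of time, 1.6x).
fpsqr_only = amdahl_speedup(0.20, 10)
all_fp = amdahl_speedup(0.50, 1.6)

print(f"Example 1: {example_1:.2f}")
print(f"Example 2: FPSQR {fpsqr_only:.2f}, all FP {all_fp:.2f}")
```

Even an arbitrarily large local speedup cannot push the overall result past 1 / (1 - Fraction_enhanced), which is the limit the law expresses.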
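Another method of calculating CPU time
The CPU clock cycles taken by a program is given by:
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$$
where
IC_i represents the number of times instruction i is executed in a program
CPI_i represents the average number of clocks per instruction for instruction i.
The CPU time taken by a program is given by:
$$\text{CPU time} = \left(\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i\right) \times \text{Clock cycle time}$$
The overall CPI is expressed by:
$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i$$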
Example 3
 Suppose we have made the following measurements:
 Frequency of FP operations = 25%
 Average CPI of FP operations = 4.0
 Average CPI of other instructions = 1.33
 Frequency of FPSQR = 2%
 CPI of FPSQR = 20
 Assume that the two design alternatives are to decrease the CPI of
FPSQR to 2 or to decrease the average CPI of all FP operations to
2.5. Compare these two design alternatives using the processor
performance equation.
46
 Find the original CPI with neither enhancement:
CPI_original = (4 x 25%) + (1.33 x 75%) = 2.0
 Compute the CPI for the enhanced FPSQR by
subtracting the cycles saved from the original CPI:
CPI_newFPSQR = CPI_original - 2% x (CPI_oldFPSQR - CPI_newFPSQRonly)
= 2.0 - 0.02 x (20 - 2)
= 1.64
 Compute the CPI for the enhancement of all FP
instructions the same way:
CPI_newFP = CPI_original - 25% x (CPI_oldFP - CPI_newFPonly)
= 2.0 - 0.25 x (4 - 2.5)
= 2.0 - (0.25 x 1.5)
= 1.625
 Performance is marginally better with the FP enhancement, since it gives the lower CPI
 Speedup with new FP = CPI_original / CPI_newFP = 2.0 / 1.625 = 1.23
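The CPI comparison in Example 3 can be recomputed from the stated measurements. This is a minimal sketch with illustrative variable names; note that it keeps the unrounded original CPI (1.9975) rather than the rounded 2.0, so intermediate values differ slightly in the third decimal place.

```python
# Measurements from Example 3.
freq_fp, cpi_fp = 0.25, 4.0          # all FP operations
freq_other, cpi_other = 0.75, 1.33   # everything else
freq_fpsqr, cpi_fpsqr = 0.02, 20.0   # FP square root only

# Overall CPI is the frequency-weighted sum of the per-class CPIs.
cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: reduce the CPI of FPSQR from 20 to 2.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: reduce the average CPI of all FP operations from 4.0 to 2.5.
cpi_new_fp = cpi_original - freq_fp * (cpi_fp - 2.5)

# IC and clock rate are unchanged, so speedup is the ratio of CPIs.
speedup_fpsqr = cpi_original / cpi_new_fpsqr
speedup_fp = cpi_original / cpi_new_fp

print(f"CPI original {cpi_original:.4f}")
print(f"FPSQR only: CPI {cpi_new_fpsqr:.4f}, speedup {speedup_fpsqr:.2f}")
print(f"All FP:     CPI {cpi_new_fp:.4f}, speedup {speedup_fp:.2f}")
```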
48
2. Take Advantage of Parallelism
 One of the most important methods for
improving performance
In a server:
 multiple processors and multiple disks can be
used - improves throughput.
 Scalability- Being able to expand memory,
number of processors, disks - is a valuable
asset for servers.
49
2. Take Advantage of Parallelism (cont’d)
Individual processor
 Parallelism among instructions: pipelining
 Set-associative caches use multiple banks of
memory that are searched in parallel
3. Principle of Locality
 Programs tend to reuse data and instructions
they have used recently.
 Program spends 90% of its execution time in
only 10% of the code
 Two different types of locality
 Temporal locality- recently accessed items
are likely to be accessed in the near future.
 Spatial locality- items whose addresses are
near one another tend to be referenced close
together in time.
51
Principle of Locality
 We can predict which instructions and data a
program will use in the near future based on
its accesses in the recent past
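One way to see why locality matters is to count hits in a toy cache model. The sketch below is an assumption-laden illustration, not something from the slides: it simulates a direct-mapped cache with arbitrarily chosen sizes (64 lines, 16-byte blocks) and compares three synthetic address streams.

```python
def hit_rate(addresses, num_lines: int = 64, block_size: int = 16) -> float:
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    lines = [None] * num_lines      # block number currently held by each line
    hits = 0
    total = 0
    for address in addresses:
        block = address // block_size
        index = block % num_lines
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block    # miss: fetch the block into its line
        total += 1
    return hits / total if total else 0.0


# Spatial locality: a sequential sweep, so neighbours share a block.
sequential = list(range(4096))

# Temporal locality: the same 32 addresses (one per block) reused 100 times.
reused = [k * 16 for k in range(32)] * 100

# Little locality: a large stride, so every access needs a new block
# and all of them compete for the same cache line.
strided = [i * 1024 for i in range(4096)]

for name, stream in (("sequential", sequential), ("reused", reused), ("strided", strided)):
    print(f"{name}: hit rate {hit_rate(stream):.4f}")
```

The sequential and reused streams hit almost every time, while the strided stream never does; real caches are more elaborate, but the same effect underlies the 90/10 observation above.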
52

Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 

Unit i-introduction

  • 1. Fundamentals of Computer Design, Unit 1 - Chapter 1. Reference: Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, 4th edition
  • 2. 2 Outline  Introduction  Computing Classes  Task of Computer Designer  Measuring, Reporting, Summarizing Performance  Quantitative Principles of Design
  • 3. Introduction ENIAC, 1946 (first general-purpose electronic computer): occupied a 50x30 feet room, weighed 30 tonnes, contained 18,000 electronic valves, consumed 25 kW of electrical power, and could perform 100K calculations per second. CHANGE! PlayStation Portable (PSP): approx. 170 mm (L) x 74 mm (W) x 23 mm (D); weight approx. 260 g (including battery); CPU: PSP CPU (clock frequency 1~333 MHz); main memory: 32 MB; embedded DRAM: 4 MB; profile: PSP Game, UMD Audio, UMD Video
  • 4. A short history of computing  Continuous growth in performance due to advances in technology and innovations in computer design  First 25 years (1945 – 1970)  25% yearly growth in performance  Both forces contributed to performance improvement  Mainframes and minicomputers dominated the industry  Late 70s, emergence of the microprocessor  35% yearly growth in performance thanks to integrated circuit technology  Changes in computer marketplace: elimination of assembly language programming, emergence of Unix → easier to develop new architectures  Mid 80s, emergence of RISCs (Reduced Instruction Set Computers)  52% yearly growth in performance  Performance improvements through instruction-level parallelism (pipelining, multiple instruction issue), caches  Since '02, end of 16 years of renaissance  20% yearly growth in performance  Limited by 3 hurdles: maximum power dissipation, instruction-level parallelism, and the so-called “memory wall”  Switch from ILP to TLP and DLP (thread- and data-level parallelism)
  • 5. Effect of this Dramatic Growth  Significant enhancement of the capability available to the computer user  Example: today’s $500 PC has more performance, more main memory, and more disk storage than a $1 million computer in 1985  Microprocessor-based computers dominate  Workstations and PCs have emerged as major products  Minicomputers - replaced by servers  Mainframes - replaced by multiprocessors  Supercomputers - replaced by large arrays of microprocessors
  • 6. 6 Changing Face of Computing  In the 1960s mainframes roamed the planet  Very expensive, operators oversaw operations  Applications: business data processing, large scale scientific computing  In the 1970s, minicomputers emerged  Less expensive, time sharing  In the 1990s, Internet and WWW, handheld devices (PDA), high-performance consumer electronics for video games and set-top boxes have emerged  Dramatic changes have led to 3 different computing markets  Desktop computing, Servers, Embedded Computers
  • 7. Computing Classes: A Summary
    Feature | Desktop | Server | Embedded
    Price of the system | $500-$5K | $5K-$5M | $10-$100K (including network routers at high end)
    Price of the processor | $50-$500 | $200-$10K | $0.01-$100
    Sold per year (estimates for 2000) | 150M | 4M | 300M (only 32-bit and 64-bit)
    Critical system design issues | Price and performance, graphics, compute performance | Throughput, availability, scalability | Price, power consumption, application-specific performance
  • 8. 8 Desktop Computers  Largest market in dollar terms  Spans low-end (<$500) to high-end ($5K) systems  Optimize price-performance  Performance measured in the number of calculations and graphic operations  Price is what matters to customers  Arena where the newest, highest-performance and cost-reduced microprocessors appear  Reasonably well characterized in terms of applications and benchmarking
  • 9. 9 Servers  Provide more reliable file and computing services (Web servers)  Key requirements  Availability – effectively provide service (Yahoo!, Google, eBay)  Reliability – never fails  Scalability – server systems grow over time, so the ability to scale up the computing capacity is crucial  Performance – transactions per minute  Related category: clusters / supercomputers
  • 10. Embedded Computers  Fastest growing portion of the market  Computers as parts of other devices where their presence is not obviously visible  E.g., home appliances, printers, smart cards, cell phones, palmtops, gaming consoles, network routers  Wide range of processing power and cost  $0.1 (8-bit, 16-bit processors), $10 (32-bit, capable of executing 50M instructions per second), $100-$200 (high-end video gaming consoles and network switches)  Requirements  Real-time performance requirement (e.g., time to process a video frame is limited)  Minimize memory requirements, power  SOCs (System-on-a-chip) combine processor cores and application-specific circuitry, DSP processors…
  • 11. 11 Task of Computer Designer  “Determine what attributes are important for a new machine; then design a machine to maximize performance while staying within cost, power, and availability constraints.”  Aspects of this task  Instruction set design  Functional organization  Logic design and implementation (IC design, packaging, power, cooling...)
  • 12. What is Computer Architecture?  Instruction Set Architecture  the computer visible to the assembly language programmer or compiler writer (registers, data types, instruction set, instruction formats, addressing modes)  Organization  high-level aspects of a computer’s design such as the memory system, the bus structure, and the internal CPU (datapath + control) design  Hardware  detailed logic design, interconnection and packaging technology, external connections  Computer Architecture covers all three aspects of computer design
  • 13. Instruction Set Architecture, Organization, and Hardware 13
  • 14. Instruction Set Architecture: Critical Interface (Diagram: the instruction set sits between software and hardware.)  Properties of a good abstraction  Lasts through many generations (portability)  Used in many different ways (generality)  Provides convenient functionality to higher levels  Permits an efficient implementation at lower levels
  • 16. 16 Cost-Performance  Purchasing perspective: from a collection of machines, choose one which has  best performance?  least cost?  best performance/cost?  Computer designer perspective: faced with design options, select one which has  best performance improvement?  least cost?  best performance/cost?  Both require: basis for comparison and metric for evaluation
  • 17. 17 Two “notions” of performance  Which computer has better performance?  User: one which runs a program in less time  Computer centre manager: one which completes more jobs in a given time  Users are interested in reducing Response time or Execution time  the time between the start and the completion of an event  Managers are interested in increasing Throughput or Bandwidth  total amount of work done in a given time
  • 18. An Example  Which has higher performance?
    Plane | DC to Paris [hour] | Top Speed [mph] | Passengers | Throughput [p/h]
    A | 6.5 | 610 | 470 | 72 (=470/6.5)
    B | 3 | 1350 | 132 | 44 (=132/3)
     Time to deliver 1 passenger?  B is 6.5/3 = 2.2 times faster  No. of passengers delivered per hour?  A is 72/44 = 1.6 times faster
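The two answers in the plane example come from two different metrics. A minimal Python sketch using only the figures from that table; the dictionary layout and the function name `throughput` are illustrative choices, not part of the slides:

```python
# Figures from the plane table: flight time DC to Paris (hours) and passengers.
planes = {
    "A": {"hours": 6.5, "passengers": 470},
    "B": {"hours": 3.0, "passengers": 132},
}

def throughput(plane):
    """Passengers delivered per hour of flight."""
    return plane["passengers"] / plane["hours"]

# Response-time view: how many times faster B delivers a single passenger.
latency_ratio = planes["A"]["hours"] / planes["B"]["hours"]

# Throughput view: how many times more passengers per hour A delivers.
throughput_ratio = throughput(planes["A"]) / throughput(planes["B"])

print(f"B is {latency_ratio:.1f} times faster by response time")
print(f"A is {throughput_ratio:.1f} times faster by throughput")
```

Which plane "wins" depends entirely on whether the question is about one passenger's trip or about the number of passengers moved per hour.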
  • 19. Definition of Performance  We are primarily concerned with Response Time  Performance:
    $$\text{Performance}(x) = \frac{1}{\text{Execution\_time}(x)}$$
     “X is n times faster than Y”:
    $$n = \frac{\text{Execution\_time}(y)}{\text{Execution\_time}(x)} = \frac{\text{Performance}(x)}{\text{Performance}(y)}$$
     As faster means both increased performance and decreased execution time, to reduce confusion we will use “improve performance” or “improve execution time”
  • 20. 20 Execution Time and Its Components  Execution Time - response time, elapsed time  the latency to complete a task, including disk accesses, memory accesses, input/output activities, operating system overhead,...  CPU time  the time the CPU is computing, excluding I/O or running other programs with multiprogramming  often further divided into user and system CPU times  User CPU time  the CPU time spent in the program  System CPU time  the CPU time spent in the operating system
  • 21. CPU Time for a program  Instruction count (IC) = Number of instructions executed  Clock cycles per instruction (CPI)
    $$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$$
    $$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
    $$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{IC}}$$
     CPI - one way to compare two machines with the same instruction set, since Instruction Count would be the same
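  • 22. CPU Execution Time (cont’d)
    $$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$
     Substituting the units of measurement:
    $$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}}$$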
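The CPU time equation (CPU time = IC x CPI x clock cycle time) can be sketched in a few lines of Python. The function name `cpu_time` and the sample numbers are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC x CPI x clock cycle time (1 / clock rate)."""
    clock_cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_time

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 1 GHz clock.
t = cpu_time(2_000_000_000, 1.5, 1_000_000_000)
print(f"{t:.2f} s")
```

Halving any one of the three factors halves the CPU time, provided the other two really stay fixed.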
  • 23. CPU Execution Time (cont’d)  Basic technologies involved in changing each characteristic are interdependent  Hence it is difficult to change one parameter without affecting others
    Factor       | IC | CPI | Clock rate
    Program      | X  |     |
    Compiler     | X  |     |
    ISA          | X  | X   |
    Organisation |    | X   | X
    Technology   |    |     | X
  • 24. 24 How to Calculate 3 Components?  Clock Cycle Time  in specification of computer (Clock Rate in advertisements)  Instruction count  Count instructions in loop of small program  Use simulator to count instructions  Hardware counter in special register (Pentium II)  CPI  Calculate: Execution Time / Clock cycle time / Instruction Count  Hardware counter in special register (Pentium II)
  • 25. Metrics  For desktops and workstations, in most cases it is execution time  Also known as response time  Reciprocal of performance  For servers it is normally throughput  How many jobs can get done in unit time  Almost always a response time limit (per job) is also imposed  For embedded processors it is also execution time  In many situations a hard or soft real-time deadline is imposed  Hard deadlines cannot be missed; soft deadlines can be missed in a limited number of cases  Goal: maximize performance within the power budget
  • 26. SPEC Benchmarks  Want to compare two processors by measuring their performance  Need some standardized set of programs  These are called benchmark programs  Each market sector has a different focus and needs a different set of benchmarks
  • 27. Choosing Programs to Evaluate Performance  Ideally, run typical programs with typical input before purchase, or even before building the machine  Engineer uses compiler, spreadsheet  Author uses word processor, drawing program, compression software  Workload – mixture of programs and OS commands that users run on a machine
  • 28. 28 Benchmarks  Different types of benchmarks  Real programs (Ex. MSWord, Excel, Photoshop,...)  Kernels - small pieces from real programs (Linpack,...)  Toy Benchmarks - short, easy to type and run ( Quicksort, Puzzle,...)  Synthetic benchmarks - code that matches frequency of key instructions and operations to real programs (Whetstone, Dhrystone)  Need industry standards so that different processors can be fairly compared  Companies exist that create these benchmarks: “typical” code used to evaluate systems
  • 29. 29 Desktop benchmarks  SPEC CPU is the standard  Provided by Standard Performance Evaluation Corporation (SPEC) http://www.spec.org/  Currently in the 5th generation: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006  Has two types of programs: integer and floating-point to stress different CPU units  SPEC2000 has 12 integer and 14 floating-point programs  If you are introducing a new processor for desktop or uniprocessor workstation/server, you are supposed to report performance of all 26 programs  For graphics performance two available benchmarks: SPECviewperf and SPECapc
  • 30. 30 Server benchmarks  SPECrate  Not the most reliable measure: run a copy of the same SPEC application on each processor and report average number of jobs finished per unit time  SPECSFS  Tests the performance of NFS using a series of file server requests; measures CPU, disk and network throughput; also puts response time limit on each request  SPECWeb  Web server benchmark; simulates multiple clients requesting both static and dynamic pages from a server; also clients can upload data on the server
  • 31. 31 Server benchmarks  Transaction processing (TP)  Probably the most widely used benchmark in server community  Measures database access and update throughput of a server  Simple examples: airline reservation, bank ATM  Complex systems: complicated query processing e.g. online book shops (amazon)  Provided by Transaction Processing Council (TPC)  TPC-A was the first one; now we have TPC-C (complex queries), TPC-H (unrelated queries), TPC-R (DBMS is optimized based on past query pattern), TPC-W (business-oriented transactional web server)  The measured metric is transactions per second along with response time limit for individual transactions
  • 32. 32 Embedded sector  Quite difficult to come up with a good benchmark  Wide range of applications  Requirements vary widely from one system to another  Reality: in many cases an embedded system designer designs his/her own benchmarks which are either the target applications or kernels extracted from them  Today the best known benchmark set is offered by EEMBC (Embedded Microprocessor Benchmark Consortium)  EEMBC has five types of applications: automotive/industrial (16 kernels), consumer (5 multimedia kernels), networking (3 kernels), office automation (4 graphics and text processing kernels), telecommunications (6 filtering and DSP kernels)
  • 33. Comparing and Summarising Performance  An Example
    Program | Com. A | Com. B | Com. C
    P1 (sec) | 1 | 10 | 20
    P2 (sec) | 1000 | 100 | 20
    Total (sec) | 1001 | 110 | 40
     A is 20 times faster than C for program P1  C is 50 times faster than A for program P2  B is 2 times faster than C for program P1  C is 5 times faster than B for program P2  What can we learn from these statements?  We know nothing about the relative performance of computers A, B, C!  One approach to summarise relative performance: use total execution times of programs
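The total-execution-time approach from the A/B/C example can be checked with a short Python sketch. It uses only the times from that example; the helper name `times_faster` is an illustrative choice:

```python
# Execution times in seconds for programs P1 and P2 on computers A, B, C.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

# Total execution time per computer across both programs.
totals = {name: sum(programs.values()) for name, programs in times.items()}

def times_faster(fast, slow):
    """How many times faster `fast` is than `slow` on the whole workload."""
    return totals[slow] / totals[fast]

for name, total in totals.items():
    print(f"{name}: {total} s total")
print(f"C is {times_faster('C', 'A'):.1f} times faster than A on P1 + P2")
print(f"C is {times_faster('C', 'B'):.2f} times faster than B on P1 + P2")
```

Unlike the per-program statements, the totals give a single consistent ordering, but only for a workload that runs P1 and P2 equally often.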
  • 34. Comparing and Summarising Performance (cont’d)  A benchmark program is compiled and run on the computer under test; note the running time  The same program is compiled and run on a reference computer  Compute the SPEC ratio as
    $$\text{SPEC ratio} = \frac{\text{running time on the reference computer}}{\text{running time on the computer under test}}$$
     A SPEC ratio of 50 means the computer under test is 50 times faster than the reference computer
  • 35. Comparing and Summarising Performance (cont’d)  The above process is repeated for all programs in the SPEC suite  The overall SPEC ratio for the computer is the geometric mean of the individual ratios:
    $$\text{SPEC ratio} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$$
    where n is the number of programs in the suite and SPEC_i is the ratio for program i in the suite
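  • 36. Comparing and Summarising Performance (cont’d)  Suppose the SPECRatio of computer A on a benchmark is 1.25 times higher than that of computer B  This means the performance of computer A is 1.25 times higher than that of computer B, because the reference time cancels in the ratio:
    $$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{ExTime}_{\text{ref}} / \text{ExTime}_A}{\text{ExTime}_{\text{ref}} / \text{ExTime}_B} = \frac{\text{ExTime}_B}{\text{ExTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$
     Note - the choice of the reference computer is irrelevant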
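The claim that the reference computer does not matter can be checked numerically. A minimal sketch, assuming the overall SPEC ratio is the geometric mean of the per-program ratios; all running times below are hypothetical, and `spec_ratio`, `geometric_mean`, and `overall` are illustrative names:

```python
import math

def spec_ratio(reference_time, test_time):
    """Running time on the reference computer over time on the computer under test."""
    return reference_time / test_time

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def overall(reference_times, test_times):
    """Overall SPEC ratio of one computer against one reference machine."""
    return geometric_mean(
        [spec_ratio(r, t) for r, t in zip(reference_times, test_times)]
    )

# Hypothetical running times (seconds) on three benchmark programs.
ref_1 = [100.0, 200.0, 400.0]   # one reference machine
ref_2 = [50.0, 800.0, 100.0]    # a completely different reference machine
comp_a = [10.0, 40.0, 50.0]
comp_b = [20.0, 20.0, 100.0]

# Relative performance of A over B, measured against each reference.
relative_1 = overall(ref_1, comp_a) / overall(ref_1, comp_b)
relative_2 = overall(ref_2, comp_a) / overall(ref_2, comp_b)
print(relative_1, relative_2)
```

Both results agree up to floating-point rounding: with a geometric mean, the reference times cancel when two computers are compared.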
  • 37. Quantitative Principles of Design  Where to spend time making improvements?  Make the Common Case Fast  Most important principle of computer design: spend your time on improvements where those improvements will do the most good  Example  Instruction A represents 5% of execution  Instruction B represents 20% of execution  Even if you can drive the time for A to 0, execution time shrinks by only 5%  Key questions  What is the frequent case?  How much can performance be improved by making that case faster?
  • 38. 1. Focus on the Common Case - Amdahl’s Law  Amdahl’s Law states that the performance improvement that can be gained by a particular enhancement is limited by the amount of time that enhancement can be used  Suppose that we make an enhancement to a machine that will improve its performance; Speedup is the ratio:
    $$\text{Speedup} = \frac{\text{ExTime for entire task without enhancement}}{\text{ExTime for entire task using enhancement}}$$
    $$\text{Speedup} = \frac{\text{Performance for entire task using enhancement}}{\text{Performance for entire task without enhancement}}$$
  • 39. Computing Speedup  Fraction_enhanced = fraction of execution time in the original machine that can be converted to take advantage of the enhancement (e.g., 10/30)  Speedup_enhanced = how much faster the enhanced code will run (e.g., 10/2 = 5)  ExTime_new = execution time of the unenhanced part + new execution time of the enhanced part:
    $$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{unenhanced}} + \frac{\text{ExTime}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}$$
    (Figure: a task of 20 + 10 time units becomes 20 + 2 time units with the enhancement.)
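  • 40. Computing Speedup (cont’d)  So the times are:
    $$\text{ExTime}_{\text{unenhanced}} = \text{ExTime}_{\text{old}} \times (1 - \text{Fraction}_{\text{enhanced}})$$
    $$\text{ExTime}_{\text{enhanced}} = \text{ExTime}_{\text{old}} \times \text{Fraction}_{\text{enhanced}}$$
     Factor out ExTime_old and divide the enhanced part by Speedup_enhanced:
    $$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$
     Overall speedup is the ratio of ExTime_old to ExTime_new:
    $$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$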
  • 41. Example 1  An enhancement runs 10 times faster and affects 40% of the execution time; compute the overall speedup:  Fraction_enhanced = 0.40  Speedup_enhanced = 10  Speedup_overall = ?
    $$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.40) + \dfrac{0.40}{10}} = \frac{1}{0.64} = 1.56$$
  • 42. Example 2  A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10.  The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives. 42
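  • 43. Example 2 (solution)  Apply Amdahl’s Law to each alternative:
    $$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
    $$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \dfrac{0.2}{10}} = \frac{1}{0.82} = 1.22$$
    $$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} = 1.23$$
     Hence improving the performance of the FP operations overall is slightly better because of the higher frequency.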
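Amdahl’s Law is a one-line function, which makes Examples 1 and 2 easy to check. A minimal Python sketch; the function name `amdahl_speedup` is an illustrative choice, and the inputs are the fractions and speedups stated in the two examples:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is made `speedup_enhanced` times faster (Amdahl's Law)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Example 1: 40% of the time is sped up by a factor of 10.
example_1 = amdahl_speedup(0.40, 10)

# Example 2: FPSQR is 20% of the time, sped up 10 times;
# alternatively all FP instructions are 50% of the time, sped up 1.6 times.
fpsqr = amdahl_speedup(0.20, 10)
all_fp = amdahl_speedup(0.50, 1.6)

print(f"Example 1: {example_1:.2f}")
print(f"FPSQR only: {fpsqr:.2f}, all FP: {all_fp:.2f}")

# Upper bound: even an infinitely fast enhancement on 40% of the time
# cannot exceed 1 / (1 - 0.40).
print(f"Limit for a 40% fraction: {1 / (1 - 0.40):.2f}")
```

The last line illustrates the law’s main consequence: the untouched fraction of execution time caps the achievable speedup, however good the enhancement is.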
  • 44. Another method of calculating CPU time  The CPU clock cycles taken by a program are given by:
    $$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$
    where IC_i is the number of times instruction i is executed in a program and CPI_i is the average number of clocks per instruction for instruction i.
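  • 45. The CPU time taken by a program is given by:
    $$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$
     The overall CPI is expressed by:
    $$CPI = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i$$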
  • 46. Example 3  Suppose we have made the following measurements:  Frequency of FP operations = 25%  Average CPI of FP operations = 4.0  Average CPI of other instructions = 1.33  Frequency of FPSQR = 2%  CPI of FPSQR = 20  Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the processor performance equation.
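  • 47. Example 3 (solution)  Find the original CPI with neither enhancement:
    $$CPI_{\text{original}} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{\text{Instruction count}} = (4 \times 25\%) + (1.33 \times 75\%) = 2.0$$
     Compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
    $$CPI_{\text{with new FPSQR}} = CPI_{\text{original}} - 2\% \times (CPI_{\text{old FPSQR}} - CPI_{\text{new FPSQR only}}) = 2.0 - 0.02 \times (20 - 2) = 1.64$$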
  • 48. Example 3 (solution, cont’d)  Compute the CPI for the enhancement of all FP instructions the same way:
    $$CPI_{\text{new FP}} = CPI_{\text{original}} - 25\% \times (CPI_{\text{old FP}} - CPI_{\text{new FP only}}) = 2 - 0.25 \times (4 - 2.5) = 2 - 0.375 = 1.625$$
     Performance is marginally better with the FP enhancement  Speedup with new FP = 2 / 1.625 = 1.23
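Example 3 can be reproduced with the processor performance equation. A minimal Python sketch using the measurements stated in the example; it assumes that "other instructions" make up the remaining 75% of the mix and, like the slides, rounds the original CPI to 2.0. Variable names are illustrative:

```python
# Measurements from Example 3.
freq_fp, cpi_fp = 0.25, 4.0
freq_other, cpi_other = 1.0 - freq_fp, 1.33   # assumption: everything else is 75%
freq_fpsqr, cpi_fpsqr = 0.02, 20.0

# Overall CPI is the frequency-weighted sum of per-class CPIs.
# The exact value is 1.9975; the slides round it to 2.0, so this sketch does too.
cpi_original = round(freq_fp * cpi_fp + freq_other * cpi_other, 1)

# Alternative 1: reduce the CPI of FPSQR from 20 to 2.
cpi_new_fpsqr = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: reduce the average CPI of all FP operations from 4.0 to 2.5.
cpi_new_fp = cpi_original - freq_fp * (cpi_fp - 2.5)

# Instruction count and clock rate are unchanged, so speedup is the CPI ratio.
print(f"CPI original: {cpi_original}")
print(f"New FPSQR: CPI {cpi_new_fpsqr:.3f}, speedup {cpi_original / cpi_new_fpsqr:.2f}")
print(f"New FP:    CPI {cpi_new_fp:.3f}, speedup {cpi_original / cpi_new_fp:.2f}")
```

The speedups agree with the Amdahl’s Law result for the same design choice, which is expected: both methods describe the same change in execution time.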
  • 49. 2. Take Advantage of Parallelism  One of the most important methods for improving performance In a server:  multiple processors and multiple disks can be used - improves throughput.  Scalability- Being able to expand memory, number of processors, disks - is a valuable asset for servers. 49
  • 50. 2. Take Advantage of Parallelism (cont’d)  In an individual processor:  Parallelism among instructions - pipelining  Set-associative caches use multiple banks of memory
  • 51. 3. Principle of Locality  Programs tend to reuse data and instructions they have used recently.  Program spends 90% of its execution time in only 10% of the code  Two different types of locality  Temporal locality- recently accessed items are likely to be accessed in the near future.  Spatial locality- items whose addresses are near one another tend to be referenced close together in time. 51
  • 52. Principle of Locality  We can predict which instructions and data a program will use in the near future based on its accesses in the recent past 52
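Temporal locality is what makes small caches effective. A minimal simulation sketch in Python, assuming a tiny fully associative cache with least-recently-used replacement; the cache model, the two access traces, and the name `hit_rate` are all hypothetical and only illustrate the principle:

```python
from collections import OrderedDict

def hit_rate(trace, capacity):
    """Fraction of accesses served by a small LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    for address in trace:
        if address in cache:
            hits += 1
            cache.move_to_end(address)      # mark as most recently used
        else:
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / len(trace)

# Hypothetical traces of 1000 accesses each.
loop_trace = [i % 8 for i in range(1000)]   # a small loop reusing 8 addresses
streaming_trace = list(range(1000))         # every address touched exactly once

print(hit_rate(loop_trace, capacity=16))
print(hit_rate(streaming_trace, capacity=16))
```

The loop trace misses only on its first pass over the 8 addresses, while the streaming trace never hits: recent accesses predict future ones only when the program actually reuses them.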