Copyright © 2012, Elsevier Inc. All rights reserved.
Module 1
Fundamentals of Quantitative Design and Analysis
Computer Architecture: A Quantitative Approach, Sixth Edition
1970: The IBM 7044 Computer
Introduction
Copyright © 2013, Daniel A. Menasce. All rights reserved. 2
1970: Magnetic Core Memory
1 core = 1 bit
How Does a Computer Work?
 What is the Von Neumann computer
architecture?
 What are the elements of a Von Neumann
computer?
 How do these elements interact to execute
programs?
Von Neumann Architecture
[Figure: stored-program computer. Memory holds the program and data; the ALU and registers connect to it over the system bus.]
 Stored-program computer
 Program: a sequence of instructions:
 (1) Load/store between memory and registers
 (2) Arithmetic/logical operations
 (3) Conditional and unconditional branches
 Registers: general registers, FP registers, program counter, etc.
Example of Instruction Formats
 opcode RX RY RZ: RZ ← RX op RY
 opcode RX Address: RX ← Mem(address) (load) or Mem(address) ← RX (store)
 opcode Address: unconditional or conditional jump to Address
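The three instruction formats can be exercised with a toy interpreter. What follows is a minimal Python sketch, not the encoding of any real machine: the tuple layout, the opcode names (`LOAD`, `STO`, `ADD`, `MULT`, `JMP`, `JZ`, `HALT`), the zero flag used by the conditional jump, and the `run` function are all assumptions made for illustration.

```python
def run(program, memory):
    """Execute a list of instruction tuples and return the final memory."""
    regs = {}            # register file, e.g. {"R1": 5}
    zero_flag = False    # assumed condition flag, set by arithmetic instructions
    pc = 0               # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]      # fetch and decode
        pc += 1                      # by default, fall through to the next one
        if op == "LOAD":             # opcode RX Address: RX <- Mem(address)
            rx, address = args
            regs[rx] = memory[address]
        elif op == "STO":            # opcode RX Address: Mem(address) <- RX
            rx, address = args
            memory[address] = regs[rx]
        elif op in ("ADD", "MULT"):  # opcode RX RY RZ: RZ <- RX op RY
            rx, ry, rz = args
            regs[rz] = regs[rx] + regs[ry] if op == "ADD" else regs[rx] * regs[ry]
            zero_flag = regs[rz] == 0
        elif op == "JMP":            # opcode Address: unconditional jump
            pc = args[0]
        elif op == "JZ":             # opcode Address: jump if last result was zero
            if zero_flag:
                pc = args[0]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# e = a + b, with a = 2 and b = 3 (illustrative values)
result = run(
    [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
     ("ADD", "R1", "R2", "R3"), ("STO", "R3", "E"), ("HALT",)],
    {"A": 2, "B": 3, "E": 0},
)
```

The loop is the fetch, decode, execute cycle: the program counter selects an instruction, the opcode selects the action, and only branches change the program counter to something other than the next instruction.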
Computer Technology
 Performance improvements:
 Improvements in semiconductor technology
 Feature size, clock speed
 Improvements in computer architectures
 Enabled by HLL compilers, UNIX
 Led to RISC architectures
 Together have enabled:
 Lightweight computers
 Productivity-based managed/interpreted
programming languages
Single Processor Performance
[Figure: single-processor performance over time, annotated with the RISC era and the move to multiprocessors (from ILP to DLP and TLP)]
Moore’s Law
[Figure: transistor counts over time. Source: Wikimedia Commons]
The number of transistors on integrated circuits doubles approximately every two years (Gordon Moore, 1965; the two-year period is Moore’s 1975 revision of his original yearly estimate).
Current Trends in Architecture
 Cannot continue to leverage Instruction-Level
parallelism (ILP)
 Single processor performance improvement ended in
2003
 New models for performance:
 Data-level parallelism (DLP)
 Thread-level parallelism (TLP)
 Request-level parallelism (RLP)
 These require explicit restructuring of the
application
Instruction Level Parallelism (ILP)
In high-level language:
e = a + b;
f = c + d;
g = e * f;
[Figure: dataflow graph. a + b gives e, c + d gives f, and e * f gives g.]
In assembly language:
* e = a + b
001 LOAD A,R1
002 LOAD B,R2
003 ADD R1,R2,R3
004 STO R3,E
* f = c + d
005 LOAD C,R4
006 LOAD D,R5
007 ADD R4,R5,R6
008 STO R6,F
* g = e * f
009 MULT R3,R6,R7
010 STO R7,G
Instructions that can potentially be executed in parallel:
001, 002, 005, 006
003, 007
004, 008, 009
010
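The grouping of instructions that can run in parallel follows from the data dependences alone. The Python sketch below recomputes it under simplifying assumptions that are not stated on the slide: only true (read-after-write) dependences are tracked, every register is written once, and there is no limit on functional units. The tuple encoding and the name `parallel_groups` are illustrative.

```python
# (number, opcode, source operands, destination operand)
PROGRAM = [
    (1, "LOAD", ["A"], "R1"),
    (2, "LOAD", ["B"], "R2"),
    (3, "ADD", ["R1", "R2"], "R3"),
    (4, "STO", ["R3"], "E"),
    (5, "LOAD", ["C"], "R4"),
    (6, "LOAD", ["D"], "R5"),
    (7, "ADD", ["R4", "R5"], "R6"),
    (8, "STO", ["R6"], "F"),
    (9, "MULT", ["R3", "R6"], "R7"),
    (10, "STO", ["R7"], "G"),
]


def parallel_groups(program):
    """Group instructions by the earliest step at which their inputs are ready."""
    ready_step = {}  # operand name -> step in which it is produced
    groups = {}      # step -> instruction numbers that can issue in that step
    for number, _opcode, sources, dest in program:
        # An instruction can issue one step after its latest producer;
        # operands never written by the program (A, B, C, D) are ready at step 0.
        step = 1 + max((ready_step.get(s, 0) for s in sources), default=0)
        ready_step[dest] = step
        groups.setdefault(step, []).append(number)
    return [groups[step] for step in sorted(groups)]


groups = parallel_groups(PROGRAM)
# groups == [[1, 2, 5, 6], [3, 7], [4, 8, 9], [10]]
```

Ten instructions complete in four dependent steps, which is the parallelism an ILP-exploiting processor tries to find in hardware.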
Data Level Parallelism (DLP)
[Figure: tasks and the data items they operate on]
Task Level Parallelism (TLP)
[Figure: a task and its component tasks]
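The difference between the two forms can be sketched in a few lines of Python. This is an illustration of program structure only, under stated assumptions: the functions `square`, `data_parallel` and `task_parallel` are hypothetical, and CPython threads do not run CPU-bound work truly in parallel, so the example shows how the work is divided rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def data_parallel(values):
    """DLP: the same operation is applied to many data items."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


def task_parallel(values):
    """TLP: different, independent tasks run at the same time."""
    with ThreadPoolExecutor() as pool:
        total = pool.submit(sum, values)    # task 1
        largest = pool.submit(max, values)  # task 2
        return total.result(), largest.result()


squares = data_parallel([1, 2, 3])
total, largest = task_parallel([1, 2, 3])
```

In the first function the parallelism comes from the number of data items; in the second it comes from the number of independent tasks.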
Classes of Computers
 Personal Mobile Device (PMD)
 e.g. smart phones, tablet computers
 Emphasis on energy efficiency and real-time
 Desktop Computing
 Emphasis on price-performance
 Servers
 Emphasis on availability, scalability, throughput
 Clusters / Warehouse Scale Computers
 Used for “Software as a Service (SaaS)”
 Emphasis on availability and price-performance
 Sub-class: Supercomputers, emphasis: floating-point
performance and fast internal networks
 Embedded Computers
 Emphasis: price
Parallelism
 Classes of parallelism in applications:
 Data-Level Parallelism (DLP)
 Task-Level Parallelism (TLP)
 Classes of architectural parallelism:
 Instruction-Level Parallelism (ILP)
 Vector architectures/Graphics Processing Units (GPUs)
 Thread-Level Parallelism
 Request-Level Parallelism
Flynn’s Taxonomy
 Single instruction stream, single data stream (SISD)
 Single instruction stream, multiple data streams (SIMD)
 Vector architectures
 Multimedia extensions
 Graphics processor units
 Multiple instruction streams, single data stream (MISD)
 No commercial implementation
 Multiple instruction streams, multiple data streams
(MIMD)
 Tightly-coupled MIMD (thread-level parallelism)
 Loosely-coupled MIMD (cluster or warehouse-scale computing)
Defining Computer Architecture
 “Old” view of computer architecture:
 Instruction Set Architecture (ISA) design
 i.e. decisions regarding:
 Registers (register-to-memory or load-store), memory
addressing (e.g., byte addressing, alignment), addressing
modes (e.g., register, immediate, displacement), instruction
operands (type and size), available operations (CISC or
RISC), control flow instructions, instruction encoding (fixed
vs. variable length)
 “Real” computer architecture:
 Specific requirements of the target machine
 Design to maximize performance within constraints:
cost, power, and availability
 Includes ISA, microarchitecture, hardware
Trends in Technology
 Integrated circuit technology
 Transistor density: 35%/year
 Die size: 10-20%/year
 Integration overall (Moore’s Law): 40-55%/year
 DRAM capacity: 25-40%/year (slowing)
 Flash capacity (non-volatile semiconductor memory):
50-60%/year
 15-20X cheaper/bit than DRAM
 Magnetic disk technology: density increase: 40%/year
 15-25X cheaper/bit than Flash
 300-500X cheaper/bit than DRAM
Bandwidth and Latency
 Bandwidth or throughput
 Total work done in a given time (e.g., MB/sec, MIPS)
 10,000-25,000X improvement for processors
 300-1200X improvement for memory and disks
 Latency or response time
 Time between start and completion of an event
 30-80X improvement for processors
 6-8X improvement for memory and disks
[Figure: log-log plot of bandwidth and latency milestones]
Transistors and Wires
 Feature size
 Minimum size of transistor or wire in x or y
dimension
 10 microns in 1971 to 0.032 microns in 2011
 Transistor performance scales linearly
 Wire delay does not improve with feature size!
 Integration density scales quadratically
Power (1 Watt = 1 Joule/sec) and
Energy (Joules)
 Problem: Get power in, get power out
 Thermal Design Power (TDP)
 Characterizes sustained power consumption
 Used as target for power supply and cooling system
 Lower than peak power, higher than average power
consumption
 Clock rate can be reduced dynamically to limit
power consumption
 Energy per task is often a better measurement
Energy and Power Example
 Processor A
 20% higher average power consumption than B: 1.2 × P
 Executes a task in 70% of the time needed by B: 0.7 × T
 Energy consumption: 1.2 × P × 0.7 × T = 0.84 × P × T
 More energy efficient!
 Processor B
 Average power consumption: P
 Task execution time: T
 Energy consumption: P × T
 It is better to use energy instead of power for comparing a fixed workload.
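The comparison can be checked numerically. In this short Python sketch the baseline values P = 1 and T = 1 are arbitrary units chosen for illustration; only the ratio between the two processors matters.

```python
def energy(power, time):
    """Energy for a task = average power x execution time."""
    return power * time


P, T = 1.0, 1.0                      # processor B, arbitrary units
energy_b = energy(P, T)
energy_a = energy(1.2 * P, 0.7 * T)  # 20% more power, 70% of the time
ratio = energy_a / energy_b          # about 0.84
```

Processor A draws more power but finishes sooner, so it uses about 16% less energy for the same task.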
Dynamic Energy and Power
 Dynamic energy
 Transistor switch from 0 -> 1 or 1 -> 0
 ½ × Capacitive load × Voltage²
 Dynamic power
 ½ × Capacitive load × Voltage² × Frequency switched
 Reducing clock rate reduces power, not energy
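A small Python sketch makes the last point concrete. The unit values for capacitive load, voltage and frequency are illustrative assumptions, as is the 15% voltage reduction used to show the quadratic effect.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of one 0 -> 1 or 1 -> 0 transition."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Power = energy per transition x switching frequency."""
    return dynamic_energy(capacitive_load, voltage) * frequency


base_power = dynamic_power(1.0, 1.0, 1.0)
half_clock_power = dynamic_power(1.0, 1.0, 0.5)  # half the power ...
base_energy = dynamic_energy(1.0, 1.0)           # ... same energy per transition
low_voltage_energy = dynamic_energy(1.0, 0.85)   # 0.85^2 of the base energy
```

Halving the frequency halves the power while the energy per transition is unchanged; lowering the voltage is what reduces energy, and it does so quadratically.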
Power
 Intel 80386
consumed ~ 2 W
 3.3 GHz Intel
Core i7 consumes
130 W
 Heat must be
dissipated from
1.5 x 1.5 cm chip
 This is the limit of
what can be
cooled by air
Reducing Power
 Techniques for reducing power:
 Do nothing well: turn off the clock of inactive
modules.
 Dynamic Voltage-Frequency Scaling
 Low power state for DRAM, disks (spin-down)
 Overclocking some cores and turning off other
cores
Static Power
 Static power consumption
 Leakage current flows even when a transistor
is off
 Leakage can be as high as 50% of total power, due to large
SRAM caches that need power to maintain
values.
 Power_static = Current_static × Voltage
 Scales with number of transistors
 To reduce: power gating (i.e., turn off the
power supply).
Trends in Cost
 Cost driven down by learning curve
 Yield (% of manufactured devices that survive
testing).
 DRAM: price closely tracks cost
 Microprocessors: price depends on
volume (less time to get down the learning
curve and increase in manufacturing
efficiency)
 10% less for each doubling of volume
Integrated Circuit Cost
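 Integrated circuit
$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
(the second term estimates the number of dies along the wafer’s perimeter)
 Bose-Einstein formula (empirical):
$$\text{Die yield} = \text{Wafer yield} \times \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}$$
 Defects per unit area = 0.016-0.057 defects per square cm (2010)
 N = process-complexity factor = 11.5-15.5 (40 nm, 2010)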
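The die-cost calculation can be sketched in Python, assuming the usual textbook forms: dies per wafer is the wafer area divided by the die area minus a correction for dies along the perimeter, and die yield is wafer yield × 1 / (1 + defects per unit area × die area)^N. The 30 cm wafer, the 2.25 cm² die and the wafer cost of 5,000 are illustrative assumptions; the defect density and N are picked from the 2010 ranges quoted on the slide.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    # Correction for partial dies lost along the wafer's perimeter.
    perimeter_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - perimeter_loss


def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, n):
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies


dies = dies_per_wafer(30, 2.25)              # roughly 270 dies
yield_fraction = die_yield(0.031, 2.25, 13.5)  # roughly 0.40
die_cost = cost_of_die(5000, 30, 2.25, 0.031, 13.5)
```

Because yield falls steeply as die area grows, the cost per good die rises much faster than linearly with die area.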
Dependability
 Service Level Agreement (SLA) or Service Level
Objectives (SLO)
 Module reliability
 Mean time to failure (MTTF) – reliability measure
 Mean time to repair (MTTR) – service interruption
 Mean time between failures (MTBF) = MTTF + MTTR
 Availability = MTTF / MTBF
MTTR, MTTF, and MTBF
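[Figure: timeline between two failures. MTTR spans from a failure until the machine is fixed, MTTF spans from the fix until the next failure, and MTBF spans the whole interval.]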
© 2004 D. A. Menascé. All Rights Reserved. 37
Dependability Example
 A disk subsystem has the following components
and MTTF values:
Component            MTTF (hours)
10 disks             1,000,000 each
1 ATA controller     500,000
1 power supply       200,000
1 fan                200,000
1 ATA cable          1,000,000
 Assuming that lifetimes are exponentially
distributed and independent, what is the
system’s MTTF?
Dependability Example Cont’d
$$\text{Failure rate}_{syst} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{23}{1{,}000{,}000} = 23{,}000 \text{ per billion hours}$$
$$\text{MTTF}_{syst} = \frac{1}{\text{Failure rate}_{syst}} \approx 43{,}500 \text{ hours} \approx 4.96 \text{ years}$$
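The same calculation in Python, as a minimal sketch. With independent, exponentially distributed lifetimes the component failure rates add, and the system MTTF is the reciprocal of the sum. The fan is taken as 200,000 hours, the value used in the failure-rate sum; the names `COMPONENTS` and `system_mttf` and the 8,760 hours-per-year conversion are choices made for this example.

```python
# (count, MTTF in hours) for each component type
COMPONENTS = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]


def system_mttf(components):
    """MTTF of a system that fails when any one component fails."""
    failure_rate = sum(count / mttf for count, mttf in components)
    return 1 / failure_rate


mttf_hours = system_mttf(COMPONENTS)
mttf_years = mttf_hours / (24 * 365)
```

The system MTTF is far lower than that of any single component, because every added component contributes its own failure rate.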
Measuring Performance
 Typical performance metrics:
 Response time (of interest to users)
 Throughput (of interest to operators)
 Speedup of X relative to Y
 Execution time_Y / Execution time_X
 Execution time
 Wall clock time: includes all system overheads
 CPU time: only computation time
 Benchmarks
 Kernels (e.g., matrix multiply)
 Toy programs (e.g., sorting)
 Synthetic benchmarks (e.g., Dhrystone)
 Benchmark suites (e.g., SPEC, TPC)
SPECrate and SPECratio
 SPECrate: a throughput metric. Measures the number of jobs of a
given type that can be processed in a given time interval.
 SPECratio: the ratio between the elapsed time for a given job on a
reference machine and the elapsed time of the same job on a given
machine.
 Example: Execution time_reference = 10.5 sec, Execution time_machine A = 5.25 sec
$$\text{SPECratio}_{\text{machine A}} = \frac{\text{Execution time}_{\text{reference}}}{\text{Execution time}_{\text{machine A}}} = \frac{10.5}{5.25} = 2$$
 Comparing machines A and B, with Execution time_machine B = 21 sec and Execution time_machine A = 5.25 sec:
$$\frac{\text{SPECratio}_{\text{machine A}}}{\text{SPECratio}_{\text{machine B}}} = \frac{\text{Execution time}_{\text{reference}} / \text{Execution time}_{\text{machine A}}}{\text{Execution time}_{\text{reference}} / \text{Execution time}_{\text{machine B}}} = \frac{\text{Execution time}_{\text{machine B}}}{\text{Execution time}_{\text{machine A}}} = \frac{21}{5.25} = 4$$
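A short Python sketch of the same arithmetic, using the execution times from the example (reference 10.5 sec, machine A 5.25 sec, machine B 21 sec). The function name `spec_ratio` is illustrative.

```python
def spec_ratio(reference_time, machine_time):
    """Elapsed time on the reference machine over elapsed time on this machine."""
    return reference_time / machine_time


REFERENCE = 10.5
ratio_a = spec_ratio(REFERENCE, 5.25)
ratio_b = spec_ratio(REFERENCE, 21.0)

# The reference time cancels: the ratio of SPECratios equals time_B / time_A.
relative = ratio_a / ratio_b
```

Because the reference time cancels, the choice of reference machine does not affect how two machines compare on a given job.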
Geometric Mean of SPECratios
 When computing the average of SPECratios one should use the
geometric mean and not the arithmetic mean:
$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
Program           SPECratio
A                 10.2
B                 21.5
C                 15.2
Geometric mean    14.94
$$\text{Geometric mean} = \sqrt[3]{10.2 \times 21.5 \times 15.2} = \sqrt[3]{3{,}333.36} \approx 14.94$$
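The geometric mean of the three SPECratios in the table (10.2, 21.5, 15.2) can be computed as follows. This is a minimal Python sketch; `math.prod` requires Python 3.8 or later, and the arithmetic mean is included only for contrast.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


spec_ratios = [10.2, 21.5, 15.2]
geo_mean = geometric_mean(spec_ratios)              # about 14.94
arith_mean = sum(spec_ratios) / len(spec_ratios)    # larger than geo_mean
```

The arithmetic mean comes out higher than the geometric mean here, so the two averages are not interchangeable when summarizing ratios.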
Principles of Computer Design
 Take Advantage of Parallelism
 e.g., multiple processors, disks, memory banks,
pipelining, multiple functional units
 Principle of Locality
 Reuse of data and instructions
 Focus on the Common Case
 Amdahl’s Law
Principles of Computer Design
 The Processor Performance Equation
 CPU time = Instruction count × Cycles per instruction × Clock cycle time
 Cycles per instruction (CPI): the (average) number of clock cycles per instruction
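A minimal Python sketch of the processor performance equation, assuming its standard form: CPU time = instruction count × average clock cycles per instruction × clock cycle time, with the cycle time written as 1 / clock rate. The instruction count, CPI values and 2 GHz clock are hypothetical numbers chosen for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC x CPI x clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1 billion instructions on a 2 GHz clock.
baseline = cpu_time(1_000_000_000, 1.5, 2_000_000_000)   # 0.75 s
lower_cpi = cpu_time(1_000_000_000, 1.2, 2_000_000_000)  # 0.60 s
speedup = baseline / lower_cpi
```

The three factors are set by different parts of the design: instruction count by the ISA and compiler, CPI by the organization and ISA, and clock cycle time by the hardware technology and organization.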

  • 1. 1 Copyright © 2012, Elsevier Inc. All rights reserved. Module 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative Approach, sixth Edition
  • 2. 1970: The IBM 7044 Computer Introduction Copyright © 2013, Daniel A. Menasce. All rights reserved. 2
  • 3. 1970: Magnetic Core Memory Introduction 1core = 1 bit Copyright © 2013, Daniel A. Menasce. All rights reserved. 3
  • 4. How Does a Computer Work?  What is the Von Neumann computer architecture?  What are the elements of a Von Neumann computer?  How do these elements interact to execute programs? Introduction Copyright © 2013, Daniel A. Menasce. All rights reserved. 4
  • 5. Von Neumann Architecture  What is the Von Neumann computer architecture?  What are the elements of a Von Neumann computer?  How do these elements interact to execute programs? Introduction Copyright © 2013, Daniel A. Menasce. All rights reserved. 5
  • 6. Von Neumann Architecture Introduction Stored-program computer Program ALU Data Registers system bus Copyright © 2013, Daniel A. Menasce. All rights reserved. 6
  • 7. Von Neumann Architecture Introduction Program ALU Data Registers Stored-program computer Sequence of instructions: (1) Load/Store from memory to/from registers, (2) Arithmetic/Logical Ops., (3) Conditional and unconditional branches. General Copyright © 2013, Daniel A. Menasce. All rights reserved. 7 registers, FP registers, Program counter, etc.
  • 8. Example of Instruction Formats Introduction Copyright © 2013, Daniel A. Menasce. All rights reserved. 8 opcode RX RY RZ opcode RX Address RZ RX op RY RX  Mem(address) Mem(address) RX opcode Address Unconditional or Conditional Jump to Address
  • 9. Computer Technology  Performance improvements:  Improvements in semiconductor technology  Feature size, clock speed  Improvements in computer architectures  Enabled by HLL compilers, UNIX  Lead to RISC architectures  Together have enabled:  Lightweight computers  Productivity-based managed/interpreted programming languages Introduction Copyright © 2012, Elsevier Inc. All rights reserved. 9
  • 10. Single Processor Performance Introduction RISC Move to multi-processor. From ILP to DLP and TLP Copyright © 2012, Elsevier Inc. All rights reserved. 10
  • 11. 11 Moore’s Law Copyright © 2013, Daniel A. Menasce. All rights reserved. Introduction Source: Wikimedia Commons The number of transistors on integrated circuits doubles approximately every two years. (Gordon Moore, 1965)
  • 12. 12 Copyright © 2012, Elsevier Inc. All rights reserved. Current Trends in Architecture  Cannot continue to leverage Instruction-Level parallelism (ILP)  Single processor performance improvement ended in 2003  New models for performance:  Data-level parallelism (DLP)  Thread-level parallelism (TLP)  Request-level parallelism (RLP)  These require explicit restructuring of the application Introduction
  • 13. Instruction Level Parallelism (ILP) Introduction In high-level language: e= a + b; f = c + d; g = e * f; c d a b + e + f * g 3. ADD 4. STO Copyright © 2013, Daniel A. Menasce. All rights reserved. 13 * f = c + d 7. ADD 8. STO In assembly language: * e = a+b 001 LOAD A,R1 002 LOAD B,R2 R1,R2,R3 R3,E 5. LOAD C,R4 6. LOAD D,R5 R4,R5,R6 R6,F * g = e * f 9. MULT R3,R6,R7 10. STO R7,G
  • 14. Instruction Level Parallelism
      In assembly language:
          * e = a + b
          001 LOAD A,R1
          002 LOAD B,R2
          003 ADD  R1,R2,R3
          004 STO  R3,E
          * f = c + d
          005 LOAD C,R4
          006 LOAD D,R5
          007 ADD  R4,R5,R6
          008 STO  R6,F
          * g = e * f
          009 MULT R3,R6,R7
          010 STO  R7,G
      Instructions that can potentially be executed in parallel:
          - 001, 002, 005, 006
          - 003, 007
          - 004, 008, 009
          - 010
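The grouping on this slide can be reproduced with a short dependency analysis. The sketch below is illustrative only: the tuple encoding of the instructions and the `parallel_groups` helper are assumptions made for this example, and it tracks only true (read-after-write) dependences, which is sufficient here because every register and memory location is written once.

```python
# Each instruction: (label, destination, sources). Registers and memory
# locations are treated alike; this encoding is an assumption of the sketch.
PROGRAM = [
    ("001", "R1", ["A"]),
    ("002", "R2", ["B"]),
    ("003", "R3", ["R1", "R2"]),
    ("004", "E", ["R3"]),
    ("005", "R4", ["C"]),
    ("006", "R5", ["D"]),
    ("007", "R6", ["R4", "R5"]),
    ("008", "F", ["R6"]),
    ("009", "R7", ["R3", "R6"]),
    ("010", "G", ["R7"]),
]


def parallel_groups(program):
    """Group instructions by the earliest step at which their inputs are ready."""
    ready_at = {}  # location -> step in which its value is produced
    groups = {}    # step -> labels of instructions that can run in that step
    for label, dest, sources in program:
        # Inputs never written by the program (A, B, C, D) are ready at step 0.
        step = 1 + max((ready_at.get(src, 0) for src in sources), default=0)
        ready_at[dest] = step
        groups.setdefault(step, []).append(label)
    return [groups[step] for step in sorted(groups)]


groups = parallel_groups(PROGRAM)
for step, labels in enumerate(groups, start=1):
    print(f"step {step}: {', '.join(labels)}")
```

A real processor is further limited by the number of functional units and memory ports, so these groups are an upper bound on the available parallelism rather than a schedule.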
  • 15. Data Level Parallelism (DLP)
      [Figure: diagram labeled Task, Data, Task, Data]
  • 16. Task Level Parallelism (TLP)
      [Figure: diagram labeled Tasks, Task]
  • 17. Classes of Computers
      - Personal Mobile Device (PMD)
          - e.g., smartphones, tablet computers
          - Emphasis on energy efficiency and real-time performance
      - Desktop Computing
          - Emphasis on price-performance
      - Servers
          - Emphasis on availability, scalability, throughput
      - Clusters / Warehouse-Scale Computers
          - Used for “Software as a Service (SaaS)”
          - Emphasis on availability and price-performance
          - Sub-class: supercomputers, with emphasis on floating-point performance and fast internal networks
      - Embedded Computers
          - Emphasis on price
  • 18. Parallelism
      - Classes of parallelism in applications:
          - Data-Level Parallelism (DLP)
          - Task-Level Parallelism (TLP)
      - Classes of architectural parallelism:
          - Instruction-Level Parallelism (ILP)
          - Vector architectures / Graphics Processing Units (GPUs)
          - Thread-Level Parallelism
          - Request-Level Parallelism
  • 19. Flynn’s Taxonomy
      - Single instruction stream, single data stream (SISD)
      - Single instruction stream, multiple data streams (SIMD)
          - Vector architectures
          - Multimedia extensions
          - Graphics processor units
      - Multiple instruction streams, single data stream (MISD)
          - No commercial implementation
      - Multiple instruction streams, multiple data streams (MIMD)
          - Tightly-coupled MIMD (thread-level parallelism)
          - Loosely-coupled MIMD (cluster or warehouse-scale computing)
  • 20. Defining Computer Architecture
      - “Old” view of computer architecture:
          - Instruction Set Architecture (ISA) design, i.e., decisions regarding:
              - Registers (register-to-memory or load-store)
              - Memory addressing (e.g., byte addressing, alignment)
              - Addressing modes (e.g., register, immediate, displacement)
              - Instruction operands (type and size)
              - Available operations (CISC or RISC)
              - Control flow instructions
              - Instruction encoding (fixed vs. variable length)
      - “Real” computer architecture:
          - Specific requirements of the target machine
          - Design to maximize performance within constraints: cost, power, and availability
          - Includes ISA, microarchitecture, hardware
  • 21. Trends in Technology
      - Integrated circuit technology
          - Transistor density: 35%/year
          - Die size: 10-20%/year
          - Integration overall (Moore’s Law): 40-55%/year
      - DRAM capacity: 25-40%/year (slowing)
      - Flash capacity (non-volatile semiconductor memory): 50-60%/year
          - 15-20X cheaper/bit than DRAM
      - Magnetic disk technology: density increase of 40%/year
          - 15-25X cheaper/bit than Flash
          - 300-500X cheaper/bit than DRAM
  • 22. Bandwidth and Latency
      - Bandwidth or throughput
          - Total work done in a given time (e.g., MB/sec, MIPS)
          - 10,000-25,000X improvement for processors
          - 300-1200X improvement for memory and disks
      - Latency or response time
          - Time between start and completion of an event
          - 30-80X improvement for processors
          - 6-8X improvement for memory and disks
  • 23. Bandwidth and Latency
      [Figure: log-log plot of bandwidth and latency milestones]
  • 24. Transistors and Wires
      - Feature size
          - Minimum size of transistor or wire in x or y dimension
          - 10 microns in 1971 to 0.032 microns in 2011
          - Transistor performance scales linearly
              - Wire delay does not improve with feature size!
          - Integration density scales quadratically
  • 25. Power (1 Watt = 1 Joule/sec) and Energy (Joules) (section: Trends in Power and Energy)
      - Problem: get power in, get power out
      - Thermal Design Power (TDP)
          - Characterizes sustained power consumption
          - Used as target for power supply and cooling system
          - Lower than peak power, higher than average power consumption
      - Clock rate can be reduced dynamically to limit power consumption
      - Energy per task is often a better measurement
  • 26. Energy and Power Example
                               Processor A                   Processor B
      Average power            1.2 P (20% higher than B)     P
      Time to execute task     0.7 T (70% of B's time)       T
      Energy consumption       1.2 P * 0.7 T = 0.84 P * T    P * T
      - Processor A is more energy efficient.
      - For a fixed workload, it is better to compare energy than power.
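The comparison above is a one-line calculation, since energy is average power multiplied by execution time. A minimal sketch follows; the function name `relative_energy` is an assumption made for this example.

```python
def relative_energy(power_ratio, time_ratio):
    """Energy of processor A relative to B.

    Both arguments are A's values expressed as multiples of B's,
    so energy = power * time gives the relative energy directly.
    """
    return power_ratio * time_ratio


# Processor A: 20% more power, 70% of the execution time.
ratio = relative_energy(power_ratio=1.2, time_ratio=0.7)
print(f"A uses {ratio:.2f} times the energy of B for the same task")
```

Any ratio below 1 means that A finishes the fixed workload with less energy, even though it draws more power while running.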
  • 27. Dynamic Energy and Power
      - Dynamic energy
          - Transistor switch from 0 -> 1 or 1 -> 0
          - $\frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
      - Dynamic power
          - $\frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
      - Reducing clock rate reduces power, not energy
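The two expressions can be evaluated directly to see why lowering the clock rate alone does not save energy. In the sketch below the function names and the 15% reduction are assumptions chosen for illustration, not values from the slide.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of one transistor transition: 1/2 * C * V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power: energy per transition times switching frequency."""
    return dynamic_energy(capacitive_load, voltage) * frequency


# Normalized baseline (C = V = f = 1) and an assumed 15% reduction.
base_energy = dynamic_energy(1.0, 1.0)
base_power = dynamic_power(1.0, 1.0, 1.0)

# Lower frequency only: power drops, energy per transition does not.
power_freq_only = dynamic_power(1.0, 1.0, 0.85) / base_power

# Lower voltage and frequency together: energy and power both drop.
energy_scaled = dynamic_energy(1.0, 0.85) / base_energy
power_scaled = dynamic_power(1.0, 0.85, 0.85) / base_power

print(f"frequency only: power x{power_freq_only:.2f}, energy x1.00")
print(f"voltage + frequency: energy x{energy_scaled:.2f}, power x{power_scaled:.2f}")
```

Because voltage enters squared, reducing it is the more effective lever for energy, which is the motivation for dynamic voltage-frequency scaling.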
  • 28. Power
      - Intel 80386 consumed ~2 W
      - 3.3 GHz Intel Core i7 consumes 130 W
      - Heat must be dissipated from a 1.5 x 1.5 cm chip
      - This is the limit of what can be cooled by air
  • 29. Reducing Power
      - Techniques for reducing power:
          - Do nothing well: turn off the clock of inactive modules
          - Dynamic Voltage-Frequency Scaling
          - Low-power states for DRAM and disks (spin-down)
          - Overclocking some cores and turning off other cores
  • 30. Static Power
      - Static power consumption
          - Leakage current flows even when a transistor is off
          - Leakage can be as high as 50% of total power consumption, due to large SRAM caches that need power to maintain their values
          - $\text{Power}_{static} = \text{Current}_{static} \times \text{Voltage}$
          - Scales with the number of transistors
          - To reduce: power gating (i.e., turn off the power supply)
  • 31. Trends in Cost
      - Cost driven down by the learning curve
          - Yield (% of manufactured devices that survive testing)
      - DRAM: price closely tracks cost
      - Microprocessors: price depends on volume (less time to get down the learning curve and increase in manufacturing efficiency)
          - 10% less for each doubling of volume
  • 32. Integrated Circuit Cost
      - Integrated circuit
          [Formula images not preserved. Surviving annotations: "# dies along the wafer’s perimeter", "empirical formula"]
      - Bose-Einstein formula:
          - Defects per unit area = 0.016-0.057 defects per square cm (2010)
          - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
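The formulas on this slide did not survive extraction. The block below is a reconstruction of the integrated circuit cost model as it is commonly stated in the textbook this module follows; it should be read as an assumption about what the slide showed, not a transcription. The subtracted term in the dies-per-wafer formula appears to match the "# dies along the wafer's perimeter" annotation, and the die yield expression is the one the slide calls the Bose-Einstein formula.

```latex
\begin{align*}
\text{Cost of integrated circuit} &=
  \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}
       {\text{Final test yield}} \\[4pt]
\text{Cost of die} &=
  \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}} \\[4pt]
\text{Dies per wafer} &=
  \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}}
  - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} \\[4pt]
\text{Die yield} &=
  \text{Wafer yield} \times
  \frac{1}{(1 + \text{Defects per unit area} \times \text{Die area})^{N}}
\end{align*}
```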
  • 33. Dependability
      - Service Level Agreement (SLA) or Service Level Objectives (SLO)
      - Module reliability
          - Mean time to failure (MTTF): reliability measure
          - Mean time to repair (MTTR): service interruption
          - Mean time between failures (MTBF) = MTTF + MTTR
          - Availability = MTTF / MTBF
  • 34. MTTR, MTTF, and MTBF
      [Figure: timeline with two failures and the point where the machine is fixed, marking MTTR, MTTF, and MTBF]
      © 2004 D. A. Menascé. All Rights Reserved.
  • 35. Dependability Example
      - A disk subsystem has the following components and MTTF values:
          Component            MTTF (hours)
          10 disks             1,000,000 each
          1 ATA controller     500,000
          1 power supply       200,000
          1 fan                200,000
          1 ATA cable          1,000,000
      - Assuming that lifetimes are exponentially distributed and independent, what is the system’s MTTF?
  • 36. Dependability Example Cont’d
      $\text{Failure rate}_{syst} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{23}{1{,}000{,}000} = 23{,}000 \text{ per billion hours}$
      $\text{MTTF}_{syst} = \frac{1}{\text{Failure rate}_{syst}} \approx 43{,}500 \text{ hours} \approx 4.96 \text{ years}$
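With exponentially distributed, independent lifetimes, the failure rates of the components add up, and the system MTTF is the reciprocal of the total. The sketch below repeats the calculation; the `(count, MTTF)` list layout and the `system_mttf` helper are assumptions made for this example.

```python
HOURS_PER_YEAR = 24 * 365

# (number of components, MTTF of each in hours), taken from the example table.
COMPONENTS = [
    (10, 1_000_000),  # disks
    (1, 500_000),     # ATA controller
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
    (1, 1_000_000),   # ATA cable
]


def system_mttf(components):
    """System MTTF in hours, assuming independent exponential lifetimes."""
    failure_rate = sum(count / mttf for count, mttf in components)
    return 1.0 / failure_rate


mttf_hours = system_mttf(COMPONENTS)
print(f"system MTTF: {mttf_hours:,.0f} hours "
      f"({mttf_hours / HOURS_PER_YEAR:.2f} years)")
```

The unrounded result is slightly below the 43,500 hours quoted on the slide, which rounds to the nearest hundred.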
  • 37. Measuring Performance
      - Typical performance metrics:
          - Response time (of interest to users)
          - Throughput (of interest to operators)
      - Speedup of X relative to Y
          - $\text{Execution time}_Y / \text{Execution time}_X$
      - Execution time
          - Wall clock time: includes all system overheads
          - CPU time: only computation time
      - Benchmarks
          - Kernels (e.g., matrix multiply)
          - Toy programs (e.g., sorting)
          - Synthetic benchmarks (e.g., Dhrystone)
          - Benchmark suites (e.g., SPEC, TPC)
  • 38. SPECrate and SPECratio
      - SPECrate: a throughput metric. It measures the number of jobs of a given type that can be processed in a given time interval.
      - SPECratio: the ratio of the elapsed time of a given job on a reference machine to the elapsed time of the same job on a given machine.
      - Example: $\text{Execution time}_{reference} = 10.5$ sec and $\text{Execution time}_{A} = 5.25$ sec, so $\text{SPECratio}_{A} = 10.5 / 5.25 = 2$
      - Comparing machines A and B, with $\text{Execution time}_{B} = 21$ sec and $\text{Execution time}_{A} = 5.25$ sec:
        $\frac{\text{SPECratio}_{A}}{\text{SPECratio}_{B}} = \frac{\text{Execution time}_{reference} / \text{Execution time}_{A}}{\text{Execution time}_{reference} / \text{Execution time}_{B}} = \frac{\text{Execution time}_{B}}{\text{Execution time}_{A}} = 21 / 5.25 = 4$
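The example shows that the reference time cancels when two machines are compared. A short sketch, with the hypothetical helper `spec_ratio`, makes this concrete using the slide's numbers.

```python
def spec_ratio(reference_time, machine_time):
    """Elapsed time on the reference machine divided by time on the machine."""
    return reference_time / machine_time


REFERENCE = 10.5  # seconds on the reference machine
TIME_A = 5.25     # seconds on machine A
TIME_B = 21.0     # seconds on machine B

ratio_a = spec_ratio(REFERENCE, TIME_A)
ratio_b = spec_ratio(REFERENCE, TIME_B)

# The reference time cancels: the quotient equals TIME_B / TIME_A.
relative = ratio_a / ratio_b
print(f"SPECratio A = {ratio_a}, SPECratio B = {ratio_b}, A vs. B = {relative}")
```

Choosing a different reference machine would change both SPECratios but leave their quotient unchanged.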
  • 39. Geometric Mean of SPECratios
      - When computing the average of SPECratios, use the geometric mean and not the arithmetic mean:
        $\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} x_i}$
          Program            SPECratio
          A                  10.2
          B                  21.5
          C                  15.2
          Geometric mean     14.94
        $\text{Geometric mean} = \sqrt[3]{10.2 \times 21.5 \times 15.2} = \sqrt[3]{3{,}333.36} \approx 14.94$
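The table can be checked with a few lines of code. This is a minimal sketch; `geometric_mean` is written out by hand here to mirror the formula, although Python's `statistics` module offers a function of the same name.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))


SPEC_RATIOS = {"A": 10.2, "B": 21.5, "C": 15.2}

gm = geometric_mean(list(SPEC_RATIOS.values()))
am = sum(SPEC_RATIOS.values()) / len(SPEC_RATIOS)
print(f"geometric mean = {gm:.2f}, arithmetic mean = {am:.2f}")
```

The geometric mean is preferred for ratios because the comparison between two machines then does not depend on which reference machine was used.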
  • 40. Principles of Computer Design
      - Take Advantage of Parallelism
          - e.g., multiple processors, disks, memory banks, pipelining, multiple functional units
      - Principle of Locality
          - Reuse of data and instructions
      - Focus on the Common Case
          - Amdahl’s Law
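The slide names Amdahl's Law without stating it. In its standard form, the overall speedup is limited by the fraction of execution time that the enhancement cannot touch. The sketch below uses that standard form; the function name and the example numbers (40% of the time sped up 10 times) are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original execution time that benefits.
    speedup_enhanced: speedup of that share alone.
    """
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10)
limit = 1.0 / (1.0 - 0.4)  # bound as speedup_enhanced grows without limit
print(f"overall speedup = {overall:.4f}, upper bound = {limit:.4f}")
```

The upper bound is the reason for focusing on the common case: an enhancement that applies to a small fraction of the time yields little, however large its local speedup.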
  • 41. Principles of Computer Design
      - The Processor Performance Equation
          [Formula image not preserved. Surviving annotation: (average) clock cycles per instruction]
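The equation itself was lost in extraction. In its standard textbook form, which is assumed here to be what the slide showed, CPU time is the product of three factors, with CPI being the (average) clock cycles per instruction mentioned in the surviving annotation:

```latex
\begin{align*}
\text{CPU time}
  &= \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \\
  &= \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}},
\qquad
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
\end{align*}
```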