Introduction to computer architecture .pptx

Introduction to Computer Architecture
Embedded computing system:
any device that includes a programmable computer
but is not itself a general-purpose computer.
Take advantage of application characteristics to
optimize the design:
an embedded system doesn’t need all the features of a general-purpose computer.

Embedding a computer
[Figure: an embedded computer, made up of a CPU and memory (mem), with an analog input and an analog output.]

Examples
 Cell phone.
 Printer.
 Automobile: engine, brakes, dash, etc.
 Airplane: engine, flight controls.
 Digital television.
 Household appliances.

Early history
 The first microprocessor was the Intel 4004, introduced in 1971.

Microprocessor varieties
 Microcontroller: includes I/O devices, on-board memory.
 Digital signal processor (DSP): microprocessor optimized
for digital signal processing.
 Typical embedded word sizes: 8-bit, 16-bit, 32-bit.

Characteristics of embedded systems
 Sophisticated functionality.
 Real-time operation.
 Low manufacturing cost.
 Low power.
 Designed to tight deadlines by small teams.

Functional complexity
 Often have to run sophisticated algorithms or multiple
algorithms.
 Cell phone, laser printer.
 Often provide sophisticated user interfaces.

Real-time operation
 Must finish operations by deadlines.
 Hard real time: missing deadline causes
failure.
 Soft real time: missing deadline results in
degraded performance.
 Many systems are multi-rate: must handle
operations at widely varying rates.
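
The hard/soft distinction above can be sketched as a check on completion time. This is a minimal illustration only; the function name `check_deadline` and its return strings are assumptions made for the example, not terms from the slides.

```python
def check_deadline(finish_ms: float, deadline_ms: float, hard: bool) -> str:
    """Classify one operation by whether it finished before its deadline."""
    if finish_ms <= deadline_ms:
        return "ok"
    # A missed hard deadline counts as a failure;
    # a missed soft deadline only degrades performance.
    return "failure" if hard else "degraded"

print(check_deadline(8.0, 10.0, hard=True))    # ok
print(check_deadline(12.0, 10.0, hard=True))   # failure
print(check_deadline(12.0, 10.0, hard=False))  # degraded
```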

Non-functional requirements
 Many embedded systems are mass-market items that
must have low manufacturing costs.
 Limited memory, microprocessor power, etc.
 Power consumption is critical in battery-powered
devices.
 Excessive power consumption increases system cost even
in wall-powered devices.

Why use microprocessors?
 Alternatives: field-programmable gate arrays (FPGAs),
custom logic, etc.
 Microprocessors are often very efficient:
can use same logic to perform many different functions.
 Microprocessors simplify the design of families of
products.

The performance paradox
 Microprocessors use much more logic to
implement a function than does custom logic.
 But microprocessors are often at least as fast:
 heavily pipelined;
 large design teams;
 aggressive VLSI technology.

Platforms
 Embedded computing platform: hardware
architecture + associated software.
 Many platforms are multiprocessors.
 Examples:
 Single-chip multiprocessors for cell phone baseband.
 Automotive network + processors.

The physics of software
 Computing is a physical act.
 Software doesn’t do anything without hardware.
 Executing software consumes energy,
requires time.
 To understand the dynamics of software
(time, energy), we need to characterize the
platform on which the software runs.

Characterizing performance
We need to analyze the system at several
levels of abstraction to understand
performance:
CPU.
Platform.
Program.
Task.
Multiprocessor.

Challenges in embedded system design
 How much hardware do we need?
 How big is the CPU? Memory?
 How do we meet our deadlines?
 Faster hardware or cleverer software?
 How do we minimize power?
 Turn off unnecessary logic? Reduce memory
accesses?

Design goals
 Performance.
 Overall speed, deadlines.
 Functionality and user interface.
 Manufacturing cost.
 Power consumption.
 Other requirements (physical size, etc.)

Levels of abstraction
 requirements
 specification
 architecture
 component design
 system integration

Top-down vs. bottom-up
 Top-down design:
 start from most abstract description;
 work to most detailed.
 Bottom-up design:
 work from small components to big system.
 Real design uses both techniques.

Requirements
 Plain language description of what the user
wants and expects to get.
 May be developed in several ways:
 talking directly to customers;
 talking to marketing representatives;
 providing prototypes to users for comment.

Functional vs. non-functional requirements
 Functional requirements:
 output as a function of input.
 Non-functional requirements:
 time required to compute output;
 size, weight, etc.;
 power consumption;
 reliability;
 etc.

UML
 Object-oriented design.
 Unified Modeling Language (UML).
 Object-oriented (OO) design: A generalization of object-
oriented programming.
 Object = state + methods.
 State provides each object with its own identity.
 Methods provide an abstract interface to the object.

UML object
[Figure: UML object d1 of class Display. Annotations mark the object name (d1), the class name (Display), and the attributes (pixels: array[] of pixels, elements, menu_items); a comment notes that pixels is a 2-D array.]

[Figure: UML class diagram in which the derived class Multimedia_display inherits from the base classes Speaker and Display.]
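
The object and class relationships above can be sketched in code. This is a minimal sketch: the class and attribute names come from the slides, while the methods `set_pixel` and `set_volume`, the `volume` attribute, and the constructor arguments are hypothetical additions for illustration.

```python
class Display:
    def __init__(self, width, height):
        # State: pixels is a 2-D array, as the comment on the UML object says.
        self.pixels = [[0] * width for _ in range(height)]
        self.elements = []
        self.menu_items = []

    def set_pixel(self, x, y, value):
        """Hypothetical method: the abstract interface to the pixel state."""
        self.pixels[y][x] = value


class Speaker:
    def __init__(self):
        self.volume = 0  # hypothetical attribute

    def set_volume(self, level):
        """Hypothetical method."""
        self.volume = level


class Multimedia_display(Speaker, Display):
    """Derived class: inherits state and methods from both base classes."""

    def __init__(self, width, height):
        Speaker.__init__(self)
        Display.__init__(self, width, height)


d1 = Display(4, 3)  # object d1 of class Display
d1.set_pixel(1, 2, 255)
m = Multimedia_display(2, 2)
m.set_volume(7)
print(d1.pixels[2][1], m.volume)  # 255 7
```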

Pipeline
 A pipeline is a set of data processing elements connected in series.
 The output of one element is the input of the next one.
 The elements of a pipeline are often executed in parallel or in
time-sliced fashion.
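
A software pipeline with the same shape can be sketched with generators. The three stages (`scale`, `clip`, `quantize`) and their parameters are invented for the example. Because generators are lazy, each sample passes through all stages in turn, which loosely resembles the time-sliced case.

```python
def scale(samples, gain):
    """Stage 1: multiply each sample by a gain."""
    for s in samples:
        yield s * gain


def clip(samples, limit):
    """Stage 2: clamp each sample to the range [-limit, limit]."""
    for s in samples:
        yield max(-limit, min(limit, s))


def quantize(samples):
    """Stage 3: round each sample to the nearest integer."""
    for s in samples:
        yield round(s)


def run_pipeline(samples):
    # Stages are connected in series: each consumes the previous stage's output.
    return list(quantize(clip(scale(samples, 2.0), 5.0)))


print(run_pipeline([0.4, 1.2, -3.0, 2.6]))  # [1, 2, -5, 5]
```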

CPU
 Example CPUs: ARM, C55x.

Computer architecture taxonomy
 By memory organization: von Neumann architecture, Harvard architecture.
 By instruction set: CISC, RISC.

von Neumann architecture
 Memory holds data, instructions.
 Central processing unit (CPU) fetches instructions from
memory.
 Separation of CPU and memory distinguishes a programmable
computer.
 CPU registers help out: program counter (PC), instruction
register (IR), general-purpose registers, etc.
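
The fetch step described above can be sketched as a small loop. This is a toy model under stated assumptions: instructions are stored as Python tuples rather than encoded words, the PC counts words rather than bytes, and the `LOAD` opcode and the addresses are invented for the example.

```python
def run(memory, pc, steps):
    """Fetch-execute loop: instructions and data share one memory."""
    regs = {f"r{i}": 0 for i in range(16)}
    for _ in range(steps):
        ir = memory[pc]        # fetch: the IR receives the word addressed by the PC
        pc += 1                # advance the PC (word-addressed for simplicity)
        op, dst, a, b = ir     # decode
        if op == "ADD":        # dst <- a + b
            regs[dst] = regs[a] + regs[b]
        elif op == "LOAD":     # dst <- memory[a]; b is unused
            regs[dst] = memory[a]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, pc


memory = {
    200: ("LOAD", "r1", 300, None),
    201: ("LOAD", "r3", 301, None),
    202: ("ADD", "r5", "r1", "r3"),
    300: 7,    # data lives in the same memory as the program
    301: 35,
}
regs, pc = run(memory, pc=200, steps=3)
print(regs["r5"], pc)  # 42 203
```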

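von Neumann architecture
[Figure: the CPU, holding the PC and IR, is connected to memory by address and data lines. The PC holds 200; memory location 200 holds ADD r5,r1,r3, which is fetched into the IR.]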

Harvard architecture
[Figure: the CPU, holding the PC, is connected to a data memory by address and data lines and to a separate program memory by the PC and a data line.]

von Neumann vs. Harvard
 Harvard can’t use self-modifying code.
 Harvard allows two simultaneous memory
fetches.
 Most DSPs use Harvard architecture for
streaming data:
 greater memory bandwidth;
 more predictable bandwidth.

RISC vs. CISC
 Complex instruction set computer (CISC):
 many addressing modes;
 many operations.
 Reduced instruction set computer (RISC):
 load/store;
 pipelinable instructions.

Instruction set characteristics
 Fixed vs. variable length.
 Addressing modes.
 Number of operands.
 Types of operands.

Programming model
 Programming model: registers visible to the
programmer.
 Some registers are not visible (IR).

Assembly language
 One-to-one with instructions (more or less).
 Basic features:
 One instruction per line.
 Labels provide names for addresses (usually in first column).
 Instructions often start in later columns.
 Columns run to end of line.

ARM assembly language example
label1  ADR r4,c
        LDR r0,[r4] ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1 ; comment

Pseudo-ops
 Some assembler directives don’t correspond directly to instructions:
 Define current address.
 Reserve storage.
 Constants.

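ARM programming model
 General-purpose registers r0 through r15; r15 is the program counter (PC).
 Current program status register (CPSR): 32 bits (bit 31 down to bit 0),
with the N, Z, C, V flags in the top four bits.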

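Endianness
 Relationship between bit and byte/word ordering defines
endianness.
 Little-endian: byte 0 (the lowest-addressed byte) holds the least-significant
bits; from bit 31 down to bit 0 the word reads byte 3, byte 2, byte 1, byte 0.
 Big-endian: byte 0 holds the most-significant bits; from bit 31 down to
bit 0 the word reads byte 0, byte 1, byte 2, byte 3.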
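
The two byte orders can be seen by packing one 32-bit word both ways. This sketch uses Python's standard `struct` module; the value `0x0A0B0C0D` is an arbitrary example.

```python
import struct

word = 0x0A0B0C0D
little = struct.pack("<I", word)  # least-significant byte at the lowest address
big = struct.pack(">I", word)     # most-significant byte at the lowest address

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d

# Reading little-endian bytes as if they were big-endian yields a different word.
swapped = struct.unpack(">I", little)[0]
print(hex(swapped))  # 0xd0c0b0a
```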

ARM data types
 Word is 32 bits long.
 Word can be divided into four 8-bit bytes.
 ARM addresses can be 32 bits long.
 Address refers to byte.
 Address 4 starts at byte 4.
 Can be configured at power-up as either little-
or big-endian mode.

ARM versions
 ARM architecture has been extended over several
versions.
 We will concentrate on ARM7.

ARM status bits
 Arithmetic, logical, and shifting operations can set
the CPSR bits (on ARM, when the instruction carries the S suffix, e.g. ADDS):
 N (negative), Z (zero), C (carry), V (overflow).
 Examples:
 -1 + 1 = 0: NZCV = 0110.
 0 - 1 = -1: NZCV = 1000.
 15 + 10 = 25: NZCV = 0000.

ARM data instructions
 Basic format:
ADD r0,r1,r2
 Computes r1+r2, stores in r0.
 Immediate operand:
ADD r0,r1,#2
 Computes r1+2, stores in r0.

ARM data instructions
 ADD, ADC : add (w. carry)
 SUB, SBC : subtract (w. carry)
 RSB, RSC : reverse subtract (w. carry)
 MUL, MLA : multiply (and accumulate)
 AND, ORR, EOR
 BIC : bit clear
 LSL, LSR : logical shift left/right
 ASL, ASR : arithmetic shift left/right
 ROR : rotate right
 RRX : rotate right extended with C
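
The less familiar operations in this list can be modeled on 32-bit words as follows. This is an illustrative sketch: the Python function names mirror the mnemonics but are not an ARM API, and the functions model only the result value, not the status flags. ASL gives the same result as LSL, so it is not listed separately.

```python
MASK = 0xFFFFFFFF  # keep every result within a 32-bit word


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret the word as a negative integer
    return (x >> n) & MASK


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """Rotate right one bit through the carry; returns (result, new_carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1


def bic(x, mask):
    """Bit clear: clear every bit of x that is set in mask."""
    return x & ~mask & MASK


def rsb(a, b):
    """Reverse subtract: RSB rd,ra,rb computes rb - ra."""
    return (b - a) & MASK


print(hex(asr(0x80000000, 4)))  # 0xf8000000
print(hex(ror(0x000000F1, 4)))  # 0x1000000f
```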

© 2008 Wayne Wolf. Overheads for Computers as Components 2nd ed.

ARM move instructions
 MOV, MVN : move (negated)
MOV r0, r1 ; sets r0 to r1

ARM load/store instructions
 LDR, LDRH, LDRB : load (half-word, byte)
 STR, STRH, STRB : store (half-word, byte)
 Addressing modes:
 register indirect : LDR r0,[r1]
 with second register : LDR r0,[r1,-r2]
 with constant : LDR r0,[r1,#4]

Example: C assignments
 C:
x = (a + b) - c;
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c

C assignment, cont’d.
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
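
To check that the sequence really computes x = (a + b) - c, it can be run through a toy interpreter. This is a sketch under assumptions: only the five opcodes used above are modeled, and the addresses and values of a, b, c, and x are invented for the example.

```python
def run(program, memory, symbols):
    """Execute a list of (op, operand, ...) tuples; return the registers."""
    regs = {}
    for op, *args in program:
        if op == "ADR":    # ADR rd,label : rd <- address of label
            regs[args[0]] = symbols[args[1]]
        elif op == "LDR":  # LDR rd,[rn] : rd <- memory[rn]
            regs[args[0]] = memory[regs[args[1]]]
        elif op == "STR":  # STR rd,[rn] : memory[rn] <- rd
            memory[regs[args[1]]] = regs[args[0]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "SUB":
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        else:
            raise ValueError(f"unsupported opcode {op}")
    return regs


symbols = {"a": 0x100, "b": 0x104, "c": 0x108, "x": 0x10C}  # assumed addresses
memory = {0x100: 5, 0x104: 9, 0x108: 4, 0x10C: 0}           # assumed values
program = [
    ("ADR", "r4", "a"), ("LDR", "r0", "r4"),
    ("ADR", "r4", "b"), ("LDR", "r1", "r4"),
    ("ADD", "r3", "r0", "r1"),
    ("ADR", "r4", "c"), ("LDR", "r2", "r4"),
    ("SUB", "r3", "r3", "r2"),
    ("ADR", "r4", "x"), ("STR", "r3", "r4"),
]
regs = run(program, memory, symbols)
print(memory[0x10C])  # 10, i.e. (5 + 9) - 4
```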

Example: C assignment
 C:
y = a*(b+c);
 Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a

C assignment, cont’d.
MUL r2,r0,r2 ; compute final value for y (on ARM7, the destination and first operand registers must differ)
ADR r4,y ; get address for y
STR r2,[r4] ; store y

Bus-Based Computer Systems
 Busses.
 Memory devices.
 I/O devices:
 serial links
 timers and counters
 keyboards
 displays
 analog I/O

DMA
 Direct memory access (DMA) performs data
transfers without executing instructions.
 CPU sets up transfer.
 DMA engine fetches, writes.
 DMA controller is a separate unit.

Bus mastership
 By default, CPU is bus master and
initiates transfers.
 DMA must become bus master to perform
its work.
 CPU can’t use bus while DMA operates.
 Bus mastership protocol:
 Bus request.
 Bus grant.
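
The request/grant exchange can be sketched with a toy bus model. All class, method, and field names here (`Bus`, `request`, `release`, `transfer`, `dma_copy`) are hypothetical. The model grants every request immediately, whereas real arbitration may make the requester wait.

```python
class Bus:
    """Single shared bus; the CPU is master unless it has granted the bus away."""

    def __init__(self):
        self.master = "CPU"
        self.log = []

    def request(self, device):
        # Bus request from the device, then bus grant from the CPU.
        self.log.append(f"{device}: bus request")
        self.master = device
        self.log.append(f"CPU: bus grant to {device}")

    def release(self, device):
        if self.master != device:
            raise PermissionError(f"{device} does not hold the bus")
        self.master = "CPU"
        self.log.append(f"{device}: bus released")

    def transfer(self, who, src, dst, memory):
        # Only the current bus master may move data.
        if self.master != who:
            raise PermissionError(f"{who} is not bus master")
        memory[dst] = memory[src]


def dma_copy(bus, memory, src, dst, count):
    """Copy `count` words as the DMA engine, without CPU instructions."""
    bus.request("DMA")
    for i in range(count):
        bus.transfer("DMA", src + i, dst + i, memory)
    bus.release("DMA")


bus = Bus()
memory = [1, 2, 3, 4, 0, 0, 0, 0]
dma_copy(bus, memory, src=0, dst=4, count=4)
print(memory[4:], bus.master)  # [1, 2, 3, 4] CPU
```

In this model a call such as `bus.transfer("CPU", ...)` raises `PermissionError` while the DMA engine holds the bus, which mirrors the point that the CPU can’t use the bus while DMA operates.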

System bus configurations
 Multiple busses allow parallelism:
 Slow devices on one bus.
 Fast devices on separate bus.
 A bridge connects two busses.
[Figure: the CPU, memory, and a high-speed device share one bus; a bridge connects it to a second bus that carries the slow devices.]

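State diagrams for bus read
[Figure: state diagrams for the CPU and the device during a bus read. CPU diagram labels: Get data, Done, Adrs, Wait, See ack. Device diagram labels: Send data, Release ack, Adrs, Wait, Ack. A start transition is also marked.]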
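
One plausible reading of these two state diagrams is a handshake between a pair of small state machines. The sketch below is an assumption-laden reconstruction: the state names are taken from the figure labels, but the transition order, the signal names (`adrs`, `data`, `ack`), and the tick-based stepping are guesses made for illustration.

```python
def cpu_step(state, sig, address, result):
    """CPU side: drive the address, wait for ack, then take the data."""
    if state == "Adrs":
        sig["adrs"] = address
        return "Wait"
    if state == "Wait":
        return "Get data" if sig["ack"] else "Wait"
    if state == "Get data":
        result.append(sig["data"])
        sig["adrs"] = None  # withdrawing the address ends the request
        return "Done"
    return "Done"


def device_step(state, sig, memory):
    """Device side: wait for an address, send data with ack, release ack."""
    if state == "Wait":
        return "Send data" if sig["adrs"] is not None else "Wait"
    if state == "Send data":
        sig["data"] = memory[sig["adrs"]]
        sig["ack"] = True
        return "Release ack"
    # state == "Release ack": hold ack until the CPU withdraws the address
    if sig["adrs"] is None:
        sig["ack"] = False
        return "Wait"
    return "Release ack"


def bus_read(memory, address, max_ticks=100):
    """Run both machines in lockstep; return (data, ticks used)."""
    sig = {"adrs": None, "data": None, "ack": False}
    cpu, dev, result = "Adrs", "Wait", []
    for tick in range(1, max_ticks + 1):
        cpu = cpu_step(cpu, sig, address, result)
        dev = device_step(dev, sig, memory)
        if cpu == "Done" and dev == "Wait":
            return result[0], tick
    raise TimeoutError("handshake did not complete")


print(bus_read({0x40: 99}, 0x40))  # (99, 4)
```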

GPU
 CUDA
 Hardware architecture
 Programming model
 Convolution on GPU

CUDA
 ‘Compute Unified Device Architecture’
– Hardware and software architecture for issuing and managing computations on GPU
• Massively parallel architecture
– over 8000 threads is common
• C for CUDA (C++ for CUDA)
– C/C++ language with some additions and restrictions
• Enables GPGPU – ‘General Purpose Computing on GPUs’

GPU: a multithreaded coprocessor
• SM: streaming multiprocessor
– 32xSP (or 16, 48 or more)
– Fast local ‘shared memory’ (shared between SPs), 16 KiB (or 64 KiB)
• SP: scalar processor, ‘CUDA core’
– Executes one thread
[Figure: an SM containing a 4x4 grid of SPs and a shared memory, attached to global memory (on device).]
• GPU: a set of SMs
o 30xSM on GT200,
o 14xSM on Fermi
• For example, GTX 470: 14 SMs x 32 cores = 448 cores on a GPU
• Global memory (on device): GDDR memory, 512 MiB - 6 GiB

• Parallelization
– Decomposition to threads
• Memory
– shared memory, global memory

Important Things To Keep In Mind
• Avoid divergent branches
– Threads of a single SM (more precisely, of a single warp) must be executing the same code
– Code that branches heavily and unpredictably will execute slowly
• Threads should be independent as much as possible
– Synchronization and communication can be done efficiently only for threads of a single multiprocessor
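
The cost of a divergent branch can be illustrated with a deliberately simplified model. It assumes a group of threads that execute in lockstep, so that if they disagree on a branch, both paths run one after the other. The function name `lockstep_cycles` and the cycle costs are invented for the example and are not hardware figures.

```python
def lockstep_cycles(conditions, then_cost, else_cost):
    """Cycles for one lockstep group of threads to execute an if/else.

    If every thread agrees, only one path runs. If the threads diverge,
    both paths run in sequence with the inactive threads masked off.
    """
    runs_then = any(conditions)      # at least one thread takes the 'then' path
    runs_else = not all(conditions)  # at least one thread takes the 'else' path
    return runs_then * then_cost + runs_else * else_cost


uniform = [True] * 32
divergent = [i % 2 == 0 for i in range(32)]
print(lockstep_cycles(uniform, 10, 20))    # 10
print(lockstep_cycles(divergent, 10, 20))  # 30
```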

• Parallelization
– Decomposition to threads
• Memory
– shared memory, global memory
• Enormous processing power
– Avoid divergence
• Thread communication
– Synchronization, no interdependencies
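
Decomposition to threads can be sketched for a 1-D convolution by giving each thread one output element. The code below emulates the scheme sequentially in Python rather than in CUDA. The index formula mirrors the usual block index times block size plus thread index; the function names, the block size, and the "valid" output range are assumptions for the example.

```python
def conv1d_thread(block_idx, thread_idx, block_dim, signal, kernel, out):
    """Work done by one thread: compute a single output element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i >= len(out):                       # guard for threads past the end
        return
    acc = 0.0
    last = len(kernel) - 1
    for k, w in enumerate(kernel):
        acc += w * signal[i + last - k]     # kernel is flipped: true convolution
    out[i] = acc                            # each thread writes only its own slot


def conv1d(signal, kernel, block_dim=4):
    """'Valid' 1-D convolution, decomposed into blocks of threads."""
    n_out = len(signal) - len(kernel) + 1
    out = [0.0] * n_out
    n_blocks = (n_out + block_dim - 1) // block_dim  # ceiling division
    # On a GPU these iterations would run in parallel; here they run in order.
    for b in range(n_blocks):
        for t in range(block_dim):
            conv1d_thread(b, t, block_dim, signal, kernel, out)
    return out


print(conv1d([1, 2, 3, 4, 5, 6], [1, 0, -1]))  # [2.0, 2.0, 2.0, 2.0]
```

No thread reads another thread's output, so the threads stay independent and need no synchronization.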
