Introduction to Computer Architecture
Embedded computing system:
any device that includes a programmable computer
but is not itself a general-purpose computer.
Take advantage of application characteristics to
optimize the design:
an embedded system doesn’t need every feature of a general-purpose computer.
Embedding a computer
[Figure: an embedded computer, containing a CPU and memory (mem), sits between analog input and analog output.]
Examples
 Cell phone.
 Printer.
 Automobile: engine, brakes, dash, etc.
 Airplane: engine, flight controls.
 Digital television.
 Household appliances.
Early history
 The first microprocessor was the Intel 4004, introduced in the early 1970s.
Microprocessor varieties
 Microcontroller: includes I/O devices, on-board memory.
 Digital signal processor (DSP): microprocessor optimized
for digital signal processing.
 Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
Characteristics of embedded systems
 Sophisticated functionality.
 Real-time operation.
 Low manufacturing cost.
 Low power.
 Designed to tight deadlines by small teams.
Functional complexity
 Often have to run sophisticated algorithms or multiple
algorithms.
 Cell phone, laser printer.
 Often provide sophisticated user interfaces.
Real-time operation
 Must finish operations by deadlines.
 Hard real time: missing deadline causes
failure.
 Soft real time: missing deadline results in
degraded performance.
 Many systems are multi-rate: must handle
operations at widely varying rates.
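As a minimal sketch of the hard/soft distinction above, the C fragment below classifies one completed operation against its deadline. The names (`rt_kind`, `rt_outcome`, `check_deadline`) and the microsecond time unit are assumptions made for illustration; they do not come from the slides.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { RT_HARD, RT_SOFT } rt_kind;
typedef enum { RT_OK, RT_DEGRADED, RT_FAILURE } rt_outcome;

/* Compare a finish time against a deadline (both in microseconds).
 * Meeting the deadline is always RT_OK. A miss counts as a failure for a
 * hard real-time operation and as degraded performance for a soft one. */
rt_outcome check_deadline(rt_kind kind, uint32_t finish_us, uint32_t deadline_us)
{
    if (finish_us <= deadline_us)
        return RT_OK;
    return (kind == RT_HARD) ? RT_FAILURE : RT_DEGRADED;
}
```

A multi-rate system would apply a check like this per operation, each with its own period and deadline.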
Non-functional requirements
 Many embedded systems are mass-market items that
must have low manufacturing costs.
 Limited memory, microprocessor power, etc.
 Power consumption is critical in battery-powered
devices.
 Excessive power consumption increases system cost even
in wall-powered devices.
Why use microprocessors?
 Alternatives: field-programmable gate arrays (FPGAs),
custom logic, etc.
 Microprocessors are often very efficient:
can use same logic to perform many different functions.
 Microprocessors simplify the design of families of
products.
The performance paradox
 Microprocessors use much more logic to
implement a function than does custom logic.
 But microprocessors are often at least as fast:
 heavily pipelined;
 large design teams;
 aggressive VLSI technology.
Platforms
 Embedded computing platform: hardware
architecture + associated software.
 Many platforms are multiprocessors.
 Examples:
 Single-chip multiprocessors for cell phone baseband.
 Automotive network + processors.
The physics of software
 Computing is a physical act.
 Software doesn’t do anything without hardware.
 Executing software consumes energy,
requires time.
 To understand the dynamics of software
(time, energy), we need to characterize the
platform on which the software runs.
Characterizing performance
We need to analyze the system at several
levels of abstraction to understand
performance:
CPU.
Platform.
Program.
Task.
Multiprocessor.
Challenges in embedded system design
 How much hardware do we need?
 How big is the CPU? Memory?
 How do we meet our deadlines?
 Faster hardware or cleverer software?
 How do we minimize power?
 Turn off unnecessary logic? Reduce memory
accesses?
Design goals
 Performance.
 Overall speed, deadlines.
 Functionality and user interface.
 Manufacturing cost.
 Power consumption.
 Other requirements (physical size, etc.)
Levels of abstraction
[Figure: design flow from most to least abstract: requirements, specification, architecture, component design, system integration.]
Top-down vs. bottom-up
 Top-down design:
 start from most abstract description;
 work to most detailed.
 Bottom-up design:
 work from small components to big system.
 Real design uses both techniques.
Requirements
 Plain language description of what the user
wants and expects to get.
 May be developed in several ways:
 talking directly to customers;
 talking to marketing representatives;
 providing prototypes to users for comment.
Functional vs. non-functional requirements
 Functional requirements:
 output as a function of input.
 Non-functional requirements:
 time required to compute output;
 size, weight, etc.;
 power consumption;
 reliability;
 etc.
UML
 Object-oriented design.
 Unified Modeling Language (UML).
 Object-oriented (OO) design: a generalization of object-oriented programming.
 Object = state + methods.
 State provides each object with its own identity.
 Methods provide an abstract interface to the object.
UML object
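[Figure: UML object d1 of class Display with attributes pixels (array[] of pixels), elements, and menu_items. A comment notes that pixels is a 2-D array. Callouts mark the object name, class name, attributes, and comment.]
[Figure: derived class Multimedia_display inherits from base classes Speaker and Display.]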
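The "object = state + methods" idea can be sketched even in C, which has no classes. The fragment below models the Display object as a struct (its state) plus functions that take the object as their first argument (its methods). The array dimensions, the `pixel` type, the integer types of `elements` and `menu_items`, and all function names are assumptions for illustration; the slides give only the attribute names.

```c
#include <stdint.h>
#include <string.h>

#define DISPLAY_ROWS 4   /* assumed size, not from the slides */
#define DISPLAY_COLS 8   /* assumed size, not from the slides */

typedef uint8_t pixel;   /* assumed pixel representation */

/* State: every Display object carries its own copy of these attributes. */
typedef struct {
    pixel pixels[DISPLAY_ROWS][DISPLAY_COLS];  /* "pixels is a 2-D array" */
    int   elements;
    int   menu_items;
} Display;

/* Methods: the abstract interface through which callers touch the state. */
void display_init(Display *d)
{
    memset(d, 0, sizeof *d);
}

/* Returns 0 on success, -1 if the coordinates are out of range. */
int display_set_pixel(Display *d, int row, int col, pixel value)
{
    if (row < 0 || row >= DISPLAY_ROWS || col < 0 || col >= DISPLAY_COLS)
        return -1;
    d->pixels[row][col] = value;
    return 0;
}

/* Caller must pass in-range coordinates. */
pixel display_get_pixel(const Display *d, int row, int col)
{
    return d->pixels[row][col];
}
```

Declaring `Display d1;` and calling `display_init(&d1)` corresponds to creating the object d1 of class Display. C has no direct equivalent of the Multimedia_display inheritance relationship, so that part is left out of the sketch.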
A pipeline is a set of data-processing elements connected in series:
the output of one element is the input of the next.
The elements of a pipeline are often executed in parallel or in
time-sliced fashion.
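The series connection can be sketched in C as an array of stage functions, where each stage consumes the previous stage's output. The three stages and all names here are hypothetical, and this sketch pushes one item through the stages sequentially; it does not model the parallel or time-sliced execution mentioned above.

```c
#include <stddef.h>

/* A stage maps one input value to one output value. */
typedef int (*stage_fn)(int);

/* Hypothetical example stages. */
int stage_scale(int x)  { return x * 2; }
int stage_offset(int x) { return x + 3; }
int stage_clamp(int x)  { return x > 100 ? 100 : x; }

/* Run one input through every stage in order: the output of stage i
 * becomes the input of stage i + 1. With zero stages the input is
 * returned unchanged. */
int pipeline_run(const stage_fn *stages, size_t n_stages, int input)
{
    int value = input;
    for (size_t i = 0; i < n_stages; i++)
        value = stages[i](value);
    return value;
}
```

In a hardware pipeline, each stage would work on a different item during the same clock cycle, which is where the throughput gain comes from.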
CPU
 Example CPUs: ARM, C55x.
Computer architecture taxonomy
 By memory organization: von Neumann architecture, Harvard architecture.
 By instruction set: CISC, RISC.
von Neumann architecture
 Memory holds data, instructions.
 Central processing unit (CPU) fetches instructions from
memory.
 Separate CPU and memory distinguishes programmable
computer.
 CPU registers help out: program counter (PC), instruction
register (IR), general-purpose registers, etc.
von Neumann architecture
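[Figure: the CPU places the PC value (200) on the address lines; memory returns the instruction stored at address 200 (ADD r5,r1,r3) on the data lines, and the CPU loads it into the IR.]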
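The fetch step of the von Neumann organization can be sketched as a small C model. The struct layout, the memory size, and the 4-byte PC increment (which assumes fixed 32-bit instructions) are assumptions made for illustration.

```c
#include <stdint.h>

#define MEM_WORDS 256   /* assumed size of the simulated memory */

typedef struct {
    uint32_t pc;   /* program counter: byte address of the next instruction */
    uint32_t ir;   /* instruction register: the word most recently fetched  */
} Cpu;

/* A single memory holds both instructions and data. */
typedef struct {
    uint32_t words[MEM_WORDS];
} Memory;

/* Fetch: use the PC as the address, load the word that memory returns
 * into the IR, then advance the PC to the next instruction word. */
void cpu_fetch(Cpu *cpu, const Memory *mem)
{
    cpu->ir = mem->words[(cpu->pc / 4) % MEM_WORDS];
    cpu->pc += 4;
}
```

With the PC set to 200, `cpu_fetch` reads word 50 of the simulated memory into the IR and leaves the PC at 204. A Harvard model would differ by giving `cpu_fetch` a program memory that is separate from the data memory.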
Harvard architecture
[Figure: the CPU (with its PC) connects to a program memory and to a separate data memory, each over its own address and data lines.]
von Neumann vs. Harvard
 Harvard can’t use self-modifying code.
 Harvard allows two simultaneous memory
fetches.
 Most DSPs use Harvard architecture for
streaming data:
 greater memory bandwidth;
 more predictable bandwidth.
RISC vs. CISC
 Complex instruction set computer (CISC):
 many addressing modes;
 many operations.
 Reduced instruction set computer (RISC):
 load/store;
 pipelinable instructions.
Instruction set characteristics
 Fixed vs. variable length.
 Addressing modes.
 Number of operands.
 Types of operands.
Programming model
 Programming model: registers visible to the
programmer.
 Some registers are not visible (IR).
Assembly language
 One-to-one with instructions (more or less).
 Basic features:
 One instruction per line.
 Labels provide names for addresses (usually in first column).
 Instructions often start in later columns.
 Comments run to end of line.
ARM assembly language example
label1  ADR r4,c
        LDR r0,[r4]   ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1  ; comment
Pseudo-ops
Some assembler directives don’t correspond directly to instructions:
Define current address.
Reserve storage.
Constants.
ARM programming model
[Figure: registers r0 through r15, where r15 is the PC, plus the CPSR (bits 31 to 0), which holds the N, Z, C, and V flags.]
Endianness
 Relationship between bit and byte/word ordering defines
endianness:
little-endian: bit 31 | byte 3 | byte 2 | byte 1 | byte 0 | bit 0
big-endian: bit 0 | byte 0 | byte 1 | byte 2 | byte 3 | bit 31
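To make the two byte orders concrete, the C sketch below stores the same 32-bit word into a 4-byte buffer both ways and also probes the host's own byte order. The function names are hypothetical. The sketch uses the common byte-address view, in which byte 0 is the byte at the lowest address.

```c
#include <stdint.h>

/* Little-endian: the least significant byte goes to the lowest address. */
void store_le32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

/* Big-endian: the most significant byte goes to the lowest address. */
void store_be32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)(w);
}

/* Returns 1 if the machine running this code stores words little-endian. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

For the word 0x11223344, `store_le32` produces the bytes 44 33 22 11 and `store_be32` produces 11 22 33 44.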
ARM data types
 Word is 32 bits long.
 Word can be divided into four 8-bit bytes.
 ARM addresses can be 32 bits long.
 Address refers to byte.
 Address 4 starts at byte 4.
 Can be configured at power-up as either little-
or big-endian mode.
ARM versions
 ARM architecture has been extended over several
versions.
 We will concentrate on ARM7.
ARM status bits
 Arithmetic, logical, and shifting operations can set the CPSR bits
(in ARM assembly, when the instruction carries the S suffix, e.g. ADDS):
 N (negative), Z (zero), C (carry), V (overflow).
 Examples (32-bit operands):
 -1 + 1 = 0: NZCV = 0110.
 0 - 1 = -1: NZCV = 1000.
 15 + 10 = 25: NZCV = 0000.
ARM data instructions
 Basic format:
ADD r0,r1,r2
 Computes r1+r2, stores in r0.
 Immediate operand:
ADD r0,r1,#2
 Computes r1+2, stores in r0.
ARM data instructions
 ADD, ADC : add (w. carry)
 SUB, SBC : subtract (w. carry)
 RSB, RSC : reverse subtract (w. carry)
 MUL, MLA : multiply (and accumulate)
 AND, ORR, EOR
 BIC : bit clear
 LSL, LSR : logical shift left/right
 ASL, ASR : arithmetic shift left/right
 ROR : rotate right
 RRX : rotate right extended with C
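The less familiar operations in this list can be pinned down with C equivalents. The function names below are hypothetical, the sketch ignores condition flags, and shift amounts are assumed to lie in the range 0 to 31.

```c
#include <stdint.h>

/* RSB rd,rn,op2 : reverse subtract, rd = op2 - rn */
uint32_t arm_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* BIC rd,rn,op2 : bit clear, rd = rn AND NOT op2 */
uint32_t arm_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* MLA rd,rm,rs,rn : multiply and accumulate, rd = rm * rs + rn (low 32 bits) */
uint32_t arm_mla(uint32_t rm, uint32_t rs, uint32_t rn) { return rm * rs + rn; }

/* ASR : arithmetic shift right, the sign bit is copied into vacated bits */
uint32_t arm_asr(uint32_t x, unsigned n)
{
    n &= 31u;
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0)
        shifted |= ~(0xFFFFFFFFu >> n);
    return shifted;
}

/* ROR : rotate right, bits shifted out on the right re-enter on the left */
uint32_t arm_ror(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* RRX : rotate right by one bit through the carry flag.
 * The old carry enters at bit 31; the old bit 0 becomes the new carry. */
uint32_t arm_rrx(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

LSL and LSR correspond directly to the C operators `<<` and `>>` on unsigned 32-bit values, so they are omitted.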
© 2008 Wayne Wolf. Overheads for Computers as Components, 2nd ed.
ARM move instructions
 MOV, MVN : move (negated)
MOV r0, r1 ; sets r0 to r1
ARM load/store instructions
 LDR, LDRH, LDRB : load (half-word, byte)
 STR, STRH, STRB : store (half-word, byte)
 Addressing modes:
 register indirect : LDR r0,[r1]
 with second register : LDR r0,[r1,-r2]
 with constant : LDR r0,[r1,#4]
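The three addressing modes differ only in how the effective address is formed. The C sketch below computes each effective address against a small simulated memory; the function names and the byte-array memory model are assumptions for illustration, and callers must keep addresses inside the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit word from byte address addr of a simulated memory.
 * The word is read in the host's byte order. */
uint32_t load_word(const uint8_t *mem, uint32_t addr)
{
    uint32_t w;
    memcpy(&w, mem + addr, sizeof w);
    return w;
}

/* LDR r0,[r1]      register indirect: address = r1 */
uint32_t ldr_indirect(const uint8_t *mem, uint32_t r1)
{
    return load_word(mem, r1);
}

/* LDR r0,[r1,-r2]  with second register: address = r1 - r2 */
uint32_t ldr_reg_offset_sub(const uint8_t *mem, uint32_t r1, uint32_t r2)
{
    return load_word(mem, r1 - r2);
}

/* LDR r0,[r1,#4]   with constant: address = r1 + constant */
uint32_t ldr_const_offset(const uint8_t *mem, uint32_t r1, uint32_t imm)
{
    return load_word(mem, r1 + imm);
}
```

Because addresses refer to bytes, consecutive 32-bit words sit 4 bytes apart, which is why the constant-offset example on the slide uses #4 to reach the next word.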
Example: C assignments
 C:
x = (a + b) - c;
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c
C assignment, cont’d.
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
Example: C assignment
 C:
y = a*(b+c);
 Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
C assignment, cont’d.
MUL r2,r0,r2 ; compute final value for y (on ARM7, Rd and Rm must be different registers)
ADR r4,y ; get address for y
STR r2,[r4] ; store y
Bus-Based Computer Systems
 Busses.
 Memory devices.
 I/O devices:
 serial links
 timers and counters
 keyboards
 displays
 analog I/O
DMA
 Direct memory access (DMA) performs data
transfers without executing instructions.
 CPU sets up transfer.
 DMA engine fetches, writes.
 DMA controller is a separate unit.
Bus mastership
 By default, CPU is bus master and
initiates transfers.
 DMA must become bus master to perform
its work.
 CPU can’t use bus while DMA operates.
 Bus mastership protocol:
 Bus request.
 Bus grant.
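The DMA and bus mastership steps can be put together in one small C simulation. Everything here is a simplified sketch: the struct fields, the function names, and the immediate grant are assumptions, whereas a real controller would be programmed through device-specific registers and the CPU might delay the grant.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MASTER_CPU, MASTER_DMA } bus_master;

typedef struct {
    bus_master master;      /* who currently drives the bus          */
    int        bus_request; /* asserted by the DMA controller        */
    int        bus_grant;   /* asserted by the CPU in response       */
} Bus;

typedef struct {
    const uint8_t *src;     /* source address                        */
    uint8_t       *dst;     /* destination address                   */
    size_t         length;  /* number of bytes to move               */
} DmaTransfer;

/* The CPU sets up the transfer; no data moves at this point. */
void cpu_setup_dma(DmaTransfer *t, const uint8_t *src, uint8_t *dst, size_t n)
{
    t->src = src;
    t->dst = dst;
    t->length = n;
}

/* The DMA controller requests the bus, receives the grant, moves the data
 * without the CPU executing any instructions, then hands the bus back.
 * Returns the number of bytes moved. */
size_t dma_run(Bus *bus, const DmaTransfer *t)
{
    bus->bus_request = 1;            /* bus request                          */
    bus->bus_grant = 1;              /* bus grant (immediate in this sketch) */
    bus->master = MASTER_DMA;        /* CPU cannot use the bus from here     */

    for (size_t i = 0; i < t->length; i++)
        t->dst[i] = t->src[i];       /* DMA engine fetches, writes           */

    bus->bus_request = 0;            /* release the bus                      */
    bus->bus_grant = 0;
    bus->master = MASTER_CPU;        /* CPU is bus master again              */
    return t->length;
}
```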
System bus configurations
 Multiple busses allow parallelism:
 Slow devices on one bus.
 Fast devices on separate bus.
 A bridge connects two busses.
[Figure: a CPU, memory, and a high-speed device share one bus; a bridge connects it to a second bus that carries the slow devices.]
© 2000 Morgan Kaufmann. Overheads for Computers as Components.
State diagrams for bus read
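[Figure: state diagrams for the CPU and the device during a bus read. State and transition labels: Get data, Done, Adrs, Wait, See ack, Send data, Release ack, Adrs, Wait, Ack, start.]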
Bus transfer sequence diagram
GPU
 CUDA
 Hardware architecture
 Programming model
 Convolution on GPU
CUDA
 ‘Compute Unified Device Architecture’
– Hardware and software architecture for issuing and managing computations on GPU
• Massively parallel architecture
– over 8000 threads is common
• C for CUDA (C++ for CUDA)
– C/C++ language with some additions and restrictions
• Enables GPGPU – ‘General Purpose Computing on GPUs’
GPU: a multithreaded coprocessor
• SM: streaming multiprocessor
– 32 SPs (or 16, 48, or more)
– Fast local ‘shared memory’ (shared between SPs), 16 KiB (or 64 KiB)
• SP: scalar processor (‘CUDA core’)
– Executes one thread
• Global memory (on device)
– GDDR memory, 512 MiB - 6 GiB
[Figure: an SM containing a 4 x 4 grid of SPs and a shared memory, attached to the on-device global memory.]
• GPU: multiple SMs
o 30 SMs on GT200
o 14 SMs on Fermi
For example, GTX 470:
14 SMs x 32 cores
= 448 cores on a GPU
• Parallelization
– Decomposition to threads
• Memory
– Shared memory, global memory
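Decomposition to threads can be illustrated with the convolution named in the outline. The plain C sketch below is not CUDA code: it emulates a grid of blocks and threads with two nested loops, and each loop iteration stands for one logical thread that computes exactly one output element. The function names, the 1-D "valid" convolution, and the block size parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Work of one logical thread: one output element of a 1-D convolution.
 * The kernel is flipped, as in the mathematical definition. */
float conv_one_output(const float *in, const float *kernel,
                      size_t k_len, size_t out_idx)
{
    float acc = 0.0f;
    for (size_t j = 0; j < k_len; j++)
        acc += in[out_idx + j] * kernel[k_len - 1 - j];
    return acc;
}

/* Sequential emulation of a grid of blocks x threads. On a GPU, each
 * (block, thread) pair would run concurrently instead of in a loop.
 * out must have room for in_len - k_len + 1 elements. */
void conv_all_outputs(const float *in, size_t in_len,
                      const float *kernel, size_t k_len,
                      float *out, size_t threads_per_block)
{
    assert(k_len > 0 && in_len >= k_len && threads_per_block > 0);

    size_t out_len  = in_len - k_len + 1;
    size_t n_blocks = (out_len + threads_per_block - 1) / threads_per_block;

    for (size_t block = 0; block < n_blocks; block++) {
        for (size_t thread = 0; thread < threads_per_block; thread++) {
            size_t idx = block * threads_per_block + thread; /* global index */
            if (idx < out_len)       /* the last block may be partly idle    */
                out[idx] = conv_one_output(in, kernel, k_len, idx);
        }
    }
}
```

Each logical thread reads only the input and writes only its own output element, so the threads are independent. The input samples that neighbouring threads share are the natural candidates for staging in shared memory rather than rereading from global memory.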
Important Things To Keep In Mind
• Avoid divergent branches
– Threads of a single SM must execute the same code
– Code that branches heavily and unpredictably will execute slowly
• Threads should be as independent as possible
– Synchronization and communication can be done efficiently only among threads of a single multiprocessor
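Why divergent branches are slow can be seen in a deliberately simplified cost model, sketched in C below. It assumes that a group of threads executes in lockstep, so when the threads disagree on a branch the group runs both sides one after the other. The function name and the abstract cost units are hypothetical, and the size of the lockstep group on real hardware is device-specific.

```c
#include <stddef.h>

/* Cost, in abstract time units, for a lockstep group of threads to execute
 * an if/else. takes_then[i] is nonzero if thread i takes the "then" side.
 * Each side is executed once if at least one thread needs it. */
unsigned lockstep_branch_cost(const int *takes_then, size_t n_threads,
                              unsigned then_cost, unsigned else_cost)
{
    int any_then = 0;
    int any_else = 0;

    for (size_t i = 0; i < n_threads; i++) {
        if (takes_then[i])
            any_then = 1;
        else
            any_else = 1;
    }
    return (any_then ? then_cost : 0u) + (any_else ? else_cost : 0u);
}
```

If all threads agree, the group pays for one side only. If even one thread differs, the group pays for both sides, and the threads not on the active side sit idle during it.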
• Parallelization
– Decomposition to threads
• Memory
– Shared memory, global memory
• Enormous processing power
– Avoid divergence
• Thread communication
– Synchronization, no interdependencies
Introduction to computer architecture .pptx

  • 2. Introduction to Computer Architecture Embedded computing system: any device that includes a programmable computer but is not itself a general-purpose computer. Take advantage of application characteristics to optimize the design: don’t need all the general-purpose additional features
  • 3. Embedding a computer CPU mem input output analog analog embedded computer
  • 4. Examples  Cell phone.  Printer.  Automobile: engine, brakes, dash, etc.  Airplane: engine, flight controls.  Digital television.  Household appliances.
  • 5. Early history  First microprocessor was Intel 4004 in early 1970’s.
  • 6. Microprocessor varieties  Microcontroller: includes I/O devices, on-board memory.  Digital signal processor (DSP): microprocessor optimized for digital signal processing.  Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
  • 7. Characteristics of embedded systems  Sophisticated functionality.  Real-time operation.  Low manufacturing cost.  Low power.  Designed to tight deadlines by small teams.
  • 8. Functional complexity  Often have to run sophisticated algorithms or multiple algorithms.  Cell phone, laser printer.  Often provide sophisticated user interfaces.
  • 9. Real-time operation  Must finish operations by deadlines.  Hard real time: missing deadline causes failure.  Soft real time: missing deadline results in degraded performance.  Many systems are multi-rate: must handle operations at widely varying rates.
  • 10. Non-functional requirements: Many embedded systems are mass-market items that must have low manufacturing costs (limited memory, microprocessor power, etc.). Power consumption is critical in battery-powered devices. Excessive power consumption increases system cost even in wall-powered devices.
  • 11. Why use microprocessors? Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. Microprocessors are often very efficient: they can use the same logic to perform many different functions. Microprocessors simplify the design of families of products.
  • 12. The performance paradox: Microprocessors use much more logic to implement a function than does custom logic. But microprocessors are often at least as fast: heavily pipelined; large design teams; aggressive VLSI technology.
  • 13. Platforms: Embedded computing platform: hardware architecture + associated software. Many platforms are multiprocessors. Examples: single-chip multiprocessors for cell phone baseband; automotive network + processors.
  • 14. The physics of software: Computing is a physical act. Software doesn’t do anything without hardware. Executing software consumes energy and requires time. To understand the dynamics of software (time, energy), we need to characterize the platform on which the software runs.
  • 15. Characterizing performance: We need to analyze the system at several levels of abstraction to understand performance: CPU, platform, program, task, multiprocessor.
  • 16. Challenges in embedded system design: How much hardware do we need? How big is the CPU? Memory? How do we meet our deadlines? Faster hardware or cleverer software? How do we minimize power? Turn off unnecessary logic? Reduce memory accesses?
  • 17. Design goals: Performance (overall speed, deadlines). Functionality and user interface. Manufacturing cost. Power consumption. Other requirements (physical size, etc.).
  • 19. Top-down vs. bottom-up: Top-down design: start from the most abstract description; work to the most detailed. Bottom-up design: work from small components to the big system. Real design uses both techniques.
  • 20. Requirements: Plain-language description of what the user wants and expects to get. May be developed in several ways: talking directly to customers; talking to marketing representatives; providing prototypes to users for comment.
  • 21. Functional vs. non-functional requirements: Functional requirements: output as a function of input. Non-functional requirements: time required to compute output; size, weight, etc.; power consumption; reliability; etc.
  • 22. UML: Object-oriented design. Unified Modeling Language (UML). Object-oriented (OO) design: a generalization of object-oriented programming. Object = state + methods. State provides each object with its own identity. Methods provide an abstract interface to the object.
  • 23. UML object: [Figure: UML object d1: Display. Annotations: object name (d1), class name (Display), attributes (pixels: array[] of pixels, elements, menu_items), comment (pixels is a 2-D array).]
  • 25. A pipeline is a set of data processing elements connected in series: the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.
  • 26. CPU: ARM, C55x. Computer architecture taxonomy: von Neumann architecture, Harvard architecture. Computer architectures classified by instruction set: CISC, RISC.
  • 27. von Neumann architecture: Memory holds data and instructions. Central processing unit (CPU) fetches instructions from memory. Separate CPU and memory distinguishes programmable computer. CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
  • 30. von Neumann vs. Harvard: Harvard can’t use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth.
  • 31. RISC vs. CISC: Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions.
  • 32. Instruction set characteristics: Fixed vs. variable length. Addressing modes. Number of operands. Types of operands.
  • 33. Programming model: the registers visible to the programmer. Some registers are not visible (IR).
  • 34. Assembly language: One-to-one with instructions (more or less). Basic features: One instruction per line. Labels provide names for addresses (usually in first column). Instructions often start in later columns. Comments run to end of line.
  • 35. ARM assembly language example:
        label1  ADR r4,c
                LDR r0,[r4]   ; a comment
                ADR r4,d
                LDR r1,[r4]
                SUB r0,r0,r1  ; comment
    Pseudo-ops: Some assembler directives don’t correspond directly to instructions: define current address; reserve storage; constants.
  • 37. Endianness: The relationship between bit and byte/word ordering defines endianness. [Figure: a 32-bit word drawn twice. Little-endian: byte 3, byte 2, byte 1, byte 0 (bit 31 ... bit 0). Big-endian: byte 0, byte 1, byte 2, byte 3 (bit 0 ... bit 31).]
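
A minimal C sketch of the byte-ordering idea, assuming the usual convention that little-endian places the least significant byte at the lowest address and big-endian places the most significant byte there. The function names `is_little_endian`, `load_le32`, and `load_be32` are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte of a word
   at the lowest address (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t word = 0x03020100u;  /* byte 3 = 0x03 ... byte 0 = 0x00 */
    uint8_t first;
    memcpy(&first, &word, 1);     /* read the byte at the lowest address */
    return first == 0x00;
}

/* Assemble a 32-bit word from four bytes given in memory order,
   treating b[0] as the least significant byte. */
uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Same bytes, treating b[0] as the most significant byte. */
uint32_t load_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 stored at increasing addresses, `load_le32` yields 0x12345678 and `load_be32` yields 0x78563412.
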
  • 38. ARM data types: Word is 32 bits long. Word can be divided into four 8-bit bytes. ARM addresses can be 32 bits long. Address refers to byte. Address 4 starts at byte 4. Can be configured at power-up in either little- or big-endian mode.
  • 40. ARM versions: ARM architecture has been extended over several versions. We will concentrate on ARM7.
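  • 41. ARM status bits: Arithmetic, logical, and shifting operations can set the CPSR bits (on ARM, when the instruction carries the S suffix): N (negative), Z (zero), C (carry), V (overflow). Examples: -1 + 1 = 0: NZCV = 0110. 0 - 1 = -1: NZCV = 1000. 15 + 10 = 25: NZCV = 0000 (a positive, nonzero 32-bit result with no carry and no overflow).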
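
The flag rules can be checked with a short C model. This is a sketch under stated assumptions: 32-bit operands, flags packed into a 4-bit value in N Z C V order, and the ARM convention that C after a subtraction means "no borrow". The function names are invented for the example.

```c
#include <stdint.h>

/* Pack the four flags as a 4-bit value in N Z C V order, e.g. 0x6 = 0110. */
static unsigned pack_nzcv(uint32_t result, int carry, int overflow)
{
    unsigned n = (result >> 31) & 1u;   /* sign bit of the result */
    unsigned z = (result == 0);         /* result is zero */
    return (n << 3) | (z << 2) | ((unsigned)carry << 1) | (unsigned)overflow;
}

/* Flags after a 32-bit addition that updates the status bits. */
unsigned nzcv_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    int carry = r < a;                                  /* unsigned wrap-around */
    /* Overflow: operands share a sign, result has the other sign. */
    int overflow = (int)((~(a ^ b) & (a ^ r)) >> 31);
    return pack_nzcv(r, carry, overflow);
}

/* Flags after a 32-bit subtraction a - b; C = 1 means no borrow occurred. */
unsigned nzcv_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    int carry = a >= b;
    /* Overflow: operands differ in sign, result sign differs from a. */
    int overflow = (int)(((a ^ b) & (a ^ r)) >> 31);
    return pack_nzcv(r, carry, overflow);
}
```

`nzcv_add(0xFFFFFFFFu, 1u)` returns 0x6 (0110) and `nzcv_sub(0u, 1u)` returns 0x8 (1000), which reproduces the first two examples on the slide. Signed overflow shows up in a case such as `nzcv_add(0x7FFFFFFFu, 1u)`, which returns 0x9 (1001).
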
  • 42. ARM data instructions: Basic format: ADD r0,r1,r2 computes r1+r2 and stores the result in r0. Immediate operand: ADD r0,r1,#2 computes r1+2 and stores the result in r0.
  • 43. ARM data instructions: ADD, ADC: add (w. carry). SUB, SBC: subtract (w. carry). RSB, RSC: reverse subtract (w. carry). MUL, MLA: multiply (and accumulate). AND, ORR, EOR. BIC: bit clear. LSL, LSR: logical shift left/right. ASL, ASR: arithmetic shift left/right. ROR: rotate right. RRX: rotate right extended with C.
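
Some of the less familiar operations in the list can be written out as C expressions on 32-bit values. This is a sketch, not ARM's definition: the `arm_*` helper names are invented here, shift amounts are assumed to lie in 0..31, and the carry input to RRX is passed as an explicit argument.

```c
#include <stdint.h>

/* RSB: reverse subtract, i.e. operand2 - rn instead of rn - operand2. */
uint32_t arm_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }

/* BIC: clear in rn every bit that is set in op2. */
uint32_t arm_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* MLA: multiply and accumulate, rm * rs + rn (low 32 bits). */
uint32_t arm_mla(uint32_t rm, uint32_t rs, uint32_t rn) { return rm * rs + rn; }

/* LSR: logical shift right, zeros enter from the left. */
uint32_t arm_lsr(uint32_t x, unsigned n) { return x >> n; }

/* ASR: arithmetic shift right, copies of the sign bit enter from the left. */
uint32_t arm_asr(uint32_t x, unsigned n)
{
    uint32_t fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | fill;
}

/* ROR: rotate right, bits shifted out on the right re-enter on the left. */
uint32_t arm_ror(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* RRX: rotate right by one bit through the carry flag. */
uint32_t arm_rrx(uint32_t x, unsigned carry_in)
{
    return ((uint32_t)(carry_in & 1u) << 31) | (x >> 1);
}
```

For example, `arm_bic(0xFF, 0x0F)` gives 0xF0, and `arm_asr(0x80000000u, 4)` gives 0xF8000000 where `arm_lsr` with the same arguments gives 0x08000000.
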
  • 44. © 2008 Wayne Wolf, Overheads for Computers as Components, 2nd ed. ARM move instructions: MOV, MVN: move (negated). MOV r0, r1 ; sets r0 to r1
  • 45. ARM load/store instructions: LDR, LDRH, LDRB: load (half-word, byte). STR, STRH, STRB: store (half-word, byte). Addressing modes: register indirect: LDR r0,[r1]. With second register: LDR r0,[r1,-r2]. With constant: LDR r0,[r1,#4].
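
The three addressing modes differ only in how the effective address is formed. The sketch below models that in C over a small byte-addressed array; the array `mem`, its size, and all function names are assumptions for illustration, and the word layout in `mem` follows the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* A small byte-addressed memory standing in for the address space.
   No bounds checking: word addresses must stay below sizeof mem - 3. */
static uint8_t mem[64];

/* LDR / STR: 32-bit word access at a byte address. */
uint32_t ldr(uint32_t addr)
{
    uint32_t w;
    memcpy(&w, &mem[addr], sizeof w);
    return w;
}

void str(uint32_t addr, uint32_t w)
{
    memcpy(&mem[addr], &w, sizeof w);
}

/* LDRB: byte load, zero-extended to 32 bits. */
uint32_t ldrb(uint32_t addr) { return mem[addr]; }

/* Effective address for [r1]: the register value itself. */
uint32_t ea_register_indirect(uint32_t r1) { return r1; }

/* Effective address for [r1,-r2]: base minus a second register. */
uint32_t ea_register_offset(uint32_t r1, uint32_t r2) { return r1 - r2; }

/* Effective address for [r1,#imm]: base plus a constant. */
uint32_t ea_immediate_offset(uint32_t r1, int32_t imm) { return r1 + (uint32_t)imm; }
```

After `str(8, 0xDEADBEEF)`, the same word is reached by `ldr(ea_register_indirect(8))`, `ldr(ea_register_offset(12, 4))`, and `ldr(ea_immediate_offset(4, 4))`.
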
  • 46. Example: C assignments. C: x = (a + b) - c; Assembler:
        ADR r4,a      ; get address for a
        LDR r0,[r4]   ; get value of a
        ADR r4,b      ; get address for b, reusing r4
        LDR r1,[r4]   ; get value of b
        ADD r3,r0,r1  ; compute a+b
        ADR r4,c      ; get address for c
        LDR r2,[r4]   ; get value of c
  • 47. C assignment, cont’d.
        SUB r3,r3,r2  ; complete computation of x
        ADR r4,x      ; get address for x
        STR r3,[r4]   ; store value of x
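
To make the load/compute/store pattern of this sequence explicit, here is a C sketch with one statement per instruction. The local variables `r0` to `r4` merely stand in for registers, and the initial values of `a`, `b`, and `c` are arbitrary choices for the example.

```c
#include <stdint.h>

/* Variables that live in memory; the initial values are arbitrary. */
static int32_t a = 7, b = 5, c = 3, x;

/* Mirrors the instruction sequence for x = (a + b) - c. */
int32_t assign_x(void)
{
    int32_t r0, r1, r2, r3;   /* stand-ins for data registers */
    int32_t *r4;              /* stand-in for the address register */

    r4 = &a;        /* ADR r4,a     */
    r0 = *r4;       /* LDR r0,[r4]  */
    r4 = &b;        /* ADR r4,b     */
    r1 = *r4;       /* LDR r1,[r4]  */
    r3 = r0 + r1;   /* ADD r3,r0,r1 */
    r4 = &c;        /* ADR r4,c     */
    r2 = *r4;       /* LDR r2,[r4]  */
    r3 = r3 - r2;   /* SUB r3,r3,r2 */
    r4 = &x;        /* ADR r4,x     */
    *r4 = r3;       /* STR r3,[r4]  */
    return x;
}
```

Every operand is loaded into a register before it is used and the result is stored back explicitly, which is the load/store style the slides associate with RISC. With the values above, `assign_x()` returns 9.
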
  • 48. Example: C assignment. C: y = a*(b+c); Assembler:
        ADR r4,b      ; get address for b
        LDR r0,[r4]   ; get value of b
        ADR r4,c      ; get address for c
        LDR r1,[r4]   ; get value of c
        ADD r2,r0,r1  ; compute partial result
        ADR r4,a      ; get address for a
        LDR r0,[r4]   ; get value of a
  • 49. C assignment, cont’d.
        MUL r2,r2,r0  ; compute final value for y
        ADR r4,y      ; get address for y
        STR r2,[r4]   ; store y
  • 50. Bus-Based Computer Systems: Busses. Memory devices. I/O devices: serial links; timers and counters; keyboards; displays; analog I/O.
  • 51. DMA: Direct memory access (DMA) performs data transfers without executing instructions. CPU sets up transfer. DMA engine fetches, writes. DMA controller is a separate unit.
  • 53. Bus mastership: By default, CPU is bus master and initiates transfers. DMA must become bus master to perform its work. CPU can’t use bus while DMA operates. Bus mastership protocol: bus request; bus grant.
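
The DMA and bus mastership slides can be tied together in a small software model. Everything here is hypothetical: the `dma_t` fields do not correspond to any real controller's registers, and the arbiter simply grants the bus whenever it is requested. The sketch only shows the order of events: the CPU sets up the transfer, the DMA requests and is granted the bus, the engine moves words, and the bus returns to the CPU.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MASTER_CPU, MASTER_DMA } bus_master_t;

typedef struct { bus_master_t master; } bus_t;

/* Hypothetical DMA controller state; real controllers differ. */
typedef struct {
    const uint32_t *src;   /* next word to fetch */
    uint32_t *dst;         /* next word to write */
    size_t count;          /* words left to transfer */
    int bus_request;       /* raised by the DMA controller */
    int bus_grant;         /* raised by the arbiter */
} dma_t;

/* CPU side: program the transfer; the controller then requests the bus. */
void dma_setup(dma_t *d, const uint32_t *src, uint32_t *dst, size_t count)
{
    d->src = src;
    d->dst = dst;
    d->count = count;
    d->bus_request = (count > 0);
    d->bus_grant = 0;
}

/* Arbiter: the CPU is master by default; a pending request is granted. */
void bus_arbitrate(bus_t *bus, dma_t *d)
{
    d->bus_grant = d->bus_request;
    bus->master = d->bus_grant ? MASTER_DMA : MASTER_CPU;
}

/* DMA engine: move one word while it owns the bus.
   Returns 1 if a word was moved, 0 otherwise. */
int dma_step(bus_t *bus, dma_t *d)
{
    if (bus->master != MASTER_DMA || d->count == 0)
        return 0;
    *d->dst++ = *d->src++;        /* fetch and write, no CPU instruction involved */
    if (--d->count == 0)
        d->bus_request = 0;       /* transfer finished: drop the request */
    return 1;
}

/* The CPU may use the bus only while it is the master. */
int cpu_can_use_bus(const bus_t *bus) { return bus->master == MASTER_CPU; }
```

A caller would run `dma_setup`, then `bus_arbitrate`, then call `dma_step` until it returns 0, and finally `bus_arbitrate` again to hand the bus back to the CPU.
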
  • 54. System bus configurations: Multiple busses allow parallelism: slow devices on one bus; fast devices on a separate bus. A bridge connects two busses. [Figure: CPU, memory, and a high-speed device on one bus; a bridge connects it to a second bus with slow devices.]
  • 55. © 2000 Morgan Kaufmann, Overheads for Computers as Components
  • 56. State diagrams for bus read: [Figure: state diagrams for the CPU and the device. Labels in the figure: Get data, Done, Adrs, Wait, See ack, Send data, Release ack, Adrs, Wait, Ack, start.]
  • 59. GPU: CUDA. Hardware architecture. Programming model. Convolution on GPU.
  • 60. CUDA: ‘Compute Unified Device Architecture’: hardware and software architecture for issuing and managing computations on the GPU. Massively parallel architecture: over 8000 threads is common. C for CUDA (C++ for CUDA): C/C++ language with some additions and restrictions. Enables GPGPU: ‘General Purpose Computing on GPUs’.
  • 61. GPU: a multithreaded coprocessor. SM: streaming multiprocessor with 32 SPs (or 16, 48 or more). Fast local ‘shared memory’ (shared between SPs): 16 KiB (or 64 KiB). SP: scalar processor (‘CUDA core’); executes one thread. [Figure: an SM containing 16 SPs and shared memory, connected to global memory (on device).]
  • 62. Global memory (on device): GDDR memory, 512 MiB to 6 GiB. GPU: SMs: 30 SMs on GT200, 14 SMs on Fermi. For example, GTX 470: 14 SMs x 32 cores = 448 cores on a GPU.
  • 63. Parallelization: decomposition to threads. Memory: shared memory, global memory.
  • 64. Important Things To Keep In Mind: Avoid divergent branches: threads of a single SM must be executing the same code, so code that branches heavily and unpredictably will execute slowly. Threads should be as independent as possible: synchronization and communication can be done efficiently only for threads of a single multiprocessor.
  • 65. Parallelization: decomposition to threads. Memory: shared memory, global memory. Enormous processing power: avoid divergence. Thread communication: synchronization, no interdependencies.
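
Decomposition to threads can be illustrated in plain C by emulating a grid of blocks and threads with two nested loops. This is not CUDA code and says nothing about how the slides' convolution was implemented: it is a serial sketch in which each "thread" computes one output element of a 1-D convolution, using the common global-index formula block * threads_per_block + thread. All names are invented for the example.

```c
#include <stddef.h>

/* Work of one "thread": output element i of a 1-D convolution
   (only positions where the kernel fully overlaps the input). */
static void conv_thread(size_t i, const float *in, const float *kernel,
                        size_t klen, float *out)
{
    float acc = 0.0f;
    for (size_t k = 0; k < klen; ++k)
        acc += in[i + k] * kernel[klen - 1 - k];   /* kernel is flipped */
    out[i] = acc;
}

/* Serially emulates launching enough blocks of threads_per_block
   threads to cover all n - klen + 1 output elements. */
void conv1d_grid(const float *in, size_t n, const float *kernel, size_t klen,
                 float *out, size_t threads_per_block)
{
    if (klen == 0 || klen > n || threads_per_block == 0)
        return;
    size_t n_out = n - klen + 1;
    size_t blocks = (n_out + threads_per_block - 1) / threads_per_block;
    for (size_t block = 0; block < blocks; ++block) {
        for (size_t thread = 0; thread < threads_per_block; ++thread) {
            size_t i = block * threads_per_block + thread;  /* global thread index */
            if (i < n_out)             /* threads past the end do nothing */
                conv_thread(i, in, kernel, klen, out);
        }
    }
}
```

Each output element depends only on the read-only input and kernel, so the threads need no synchronization or communication, which is the property the slides ask for. With input {1, 2, 3, 4, 5} and kernel {1, 0, -1}, every one of the three outputs is 2.
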

Editor's Notes

  1. Word size: the number of bits the computer’s CPU processes in one operation.
  2. Mass market: the market for goods that are produced in large quantities. A non-functional requirement specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. Non-functional requirements are contrasted with functional requirements, which define specific behavior or functions.
  3. VLSI = very-large-scale integration: the process of integrating hundreds of thousands of components on a single silicon chip.
  4. Abstraction = a vague idea; an abstract expression.
  5. Levels of abstraction.
  6. Plain = simple.
  7. Describe a number of interacting objects rather than one single large (monolithic) block.
  8. Data sets that arrive continuously and periodically are called streaming data.
  9. Bandwidth is also defined as the amount of data that can be transmitted in a fixed amount of time. For digital devices, the bandwidth is usually expressed in bits per second (bps) or bytes per second.
  10. The set of registers available for use by programs is called the programming model, also known as the programmer’s model.
  11. The overflow flag indicates that arithmetic overflow has occurred in an operation: the signed two's-complement result would not fit in the number of bits used for the operation (the ALU width). Some architectures may be configured to automatically generate an exception on an operation resulting in overflow.