PRAGMATIC
OPTIMIZATION
IN MODERN PROGRAMMING
MODERN COMPUTER ARCHITECTURE CONCEPTS
Created by Marina (geek) Kolpakova for UNN / 2015-2016
COURSE TOPICS
Ordering optimization approaches
Demystifying a compiler
Mastering compiler optimizations
Modern computer architecture concepts
OUTLINE
Three aspects of the computer architecture
Latency vs Throughput architectures
Architecture families
CISC
RISC
VLIW
Vector
Why is it going to be load/store?
Latest trends
Summary
1ST ASPECT OF COMPUTER ARCHITECTURE
Instruction Set Architecture or ISA (interface)
is a contract between HW and SW,
which specifies rights, possibilities & limitations.
Class of ISA (load-store, register-memory)
Memory addressing modes & rules (base-immediate,
alignment requirements)
Types & sizes of operands (size of byte, short)
Operations (general arithmetic, control, logical)
Control flow instructions (branches, jumps, calls, returns)
Encoding of an ISA (fixed or variable length)
All the conceptual aspects of the architecture
2ND ASPECT OF COMPUTER ARCHITECTURE
Microarchitecture (organization) is a concrete
implementation of the ISA, the high-level aspects of a
processor design (memory system, memory interconnect,
design of the processor internals).
Pipeline width
Instruction latencies
Issue width and scheduling
Speculation capabilities
All the concrete aspects of the architecture
3RD ASPECT OF COMPUTER ARCHITECTURE
Hardware or chip (design) is the specifics of a computer,
including the logic design and packaging. This is a concrete
implementation of the microarchitecture.
Process technology
Clock rates
On-die placement
All the properties of the chip
ARCHITECTURE: ARMV8-A
μarch       IP        Hardware
Cortex-A53  ARM       Octa Exynos 7 (7580), 1.6 GHz, 28nm HKMG
Cortex-A57  ARM       Octa Exynos 7 (7420), big.LITTLE 2.1/1.5 GHz, 14FF (LPE) (Samsung)
Cortex-A72  ARM       Deca MediaTek Helio X20, big.LITTLE 2.5/2.0/1.4 GHz, 20nm HKMG (TSMC)
Cortex-A35  ARM       -
Denver      NVIDIA    Dual Tegra K1, 2.3 GHz, 28nm HPM
Kryo        Qualcomm  Tetra S820, big.LITTLE 2.2/1.6 GHz, 14FF (LPP) (Samsung)
Exynos M1   Samsung   Quad Exynos 8890, big.LITTLE 2.6/2.29 GHz, 14FF (LPP) (Samsung)
Cyclone     Apple     Dual A7 (APL0698), 1.4 GHz, 28nm HKMG (Samsung)
Typhoon     Apple     Dual A8 (APL1012), 1.3 GHz, 20nm HKMG (TSMC)
Twister     Apple     Dual A9 (APL0898), 1.85 GHz, 14nm FF (LPP) (Samsung)
                      Dual A9 (APL1022), 1.85 GHz, 16nm FF+ (TSMC)
LATENCY VS THROUGHPUT ARCHITECTURES
Latency-oriented architecture
addresses latency hiding issues;
features sophisticated pipelining;
executes instructions out of order;
employs advanced cache hierarchies;
widely uses speculation.
Compute cores occupy only a small part of the die.
Throughput-oriented architecture
keeps many operations in flight;
features many simple compute units/cores;
employs simple pipelines and a large register file to
provide low-cost thread scheduling;
uses wide buses, tiling, programmable local memory.
Compute cores occupy most of the die.
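One way to make the contrast concrete is Little's law: the number of independent operations that must be in flight to keep a unit busy is latency times throughput. The sketch below is only an illustration; the function name and the cycle counts are assumptions, not figures from the slides.

```python
def ops_in_flight(latency_cycles, issue_per_cycle):
    """Little's law: independent operations needed to keep a pipeline full."""
    return latency_cycles * issue_per_cycle


# Hypothetical numbers, chosen only to show the scale of the difference.
# A short-latency arithmetic pipe needs a handful of independent operations,
# which out-of-order logic can usually find within a single thread.
print(ops_in_flight(latency_cycles=4, issue_per_cycle=2))

# A long-latency memory access needs hundreds, which a throughput-oriented
# design supplies by switching among many cheap hardware threads.
print(ops_in_flight(latency_cycles=400, issue_per_cycle=1))
```

A latency-oriented core spends transistors on shrinking the first factor (caches, speculation), while a throughput-oriented one spends them on supplying enough independent work to cover it.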
KEY ARCHITECTURE FAMILIES
RISC
Reduced Instruction Set Computer
CISC
Complex Instruction Set Computer
VLIW
Very Long Instruction Word
Vector architecture
CISC
Complex Instruction Set Computer
Designed in the 1970s, a time when transistors
were expensive and compilers were naive. Additionally,
compact instruction encoding was the main concern due to
the shortage of memory. Memory latency was only a
bit higher than register latency.
The goal was to define an instruction set that allows
high-level language constructs to be translated into as few
assembly language instructions as possible, improving
performance as well as code density.
Examples are VAX, x86, AMD64.
Latency-oriented architecture.
Heritage
Instructions access memory, plenty of addressing modes,
many instruction families and a very rich variable-length
ISA (alignment counts!);
consequently, complicated instruction decoding logic.
Moreover, only a few registers are available to programmers.
Nowadays
1. Instructions are broken down into μops, which are
much easier to pipeline and to process power-efficiently.
2. Transistors are spent on cache hierarchies, out-of-order
execution, large reorder buffers (RB) and speculation to eliminate stalls.
3. Symmetric multiprocessing.
RISC
Reduced Instruction Set Computer
Designed in the 1980s, a time when instruction-level
parallelism (ILP) was the great concern. The memory-processor
gap had already begun to emerge.
The goal was to decrease the number of clocks per
instruction (CPI) while pipelining instructions as much as
possible, employing hardware to help with it. A uniform ISA,
pipelining and a large register file are must-haves.
Examples are MIPS, ARM, PowerPC.
Latency-oriented architecture.
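The CPI goal can be read through the classic performance equation, time = instruction count × CPI / clock rate. The sketch below only illustrates the trade-off; the instruction counts and CPI values are made-up assumptions, not measurements of any real processor.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Classic performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_hz


# Hypothetical workloads at the same 1 GHz clock (all numbers are invented).
# Fewer, more complex instructions with a high CPI ...
few_complex = execution_time(1_000_000, cpi=4.0, clock_hz=1e9)
# ... versus more, simpler instructions that pipeline well.
many_simple = execution_time(1_300_000, cpi=1.5, clock_hz=1e9)

print(f"few complex instructions: {few_complex * 1e3:.2f} ms")
print(f"many simple instructions: {many_simple * 1e3:.2f} ms")
```

Under these assumed numbers a 30% larger instruction count still wins, because CPI falls by more than the count grows.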
Heritage
Relatively few instructions, all of the same length.
Only load and store instructions access memory.
Larger register file than typical CISC processors have.
No μcode.
Nowadays
Most architectures that come from RISC are
called load-store architectures, and they may employ μops.
They combine the concepts of classic RISC with
modern hardware enhancements:
1. deep pipelines, multi-cycle instructions,
2. out-of-order execution,
3. speculation.
THE HARDWARE/SOFTWARE GAP
Compiler
analyzes control flow, analyzes dependencies
schedules instructions
maps variables to a limited register set
Hardware
analyzes control flow, analyzes dependencies
schedules instructions
remaps ISA registers to a large internal register set
A WORD ON REGISTERS
Indeed, registers are temporary storage locations inside the
processor that hold data and addresses.
Local variables are not the same as ISA registers, since
the compiler uses an IR internally and does register allocation
close to the end of the optimization process.
Registers provided by the ISA are not the same as the actual
registers on the processor. Internal reorder buffers, which
hold decoded instruction parameters and intermediate
results, are closer to the classic definition of a register file.
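The gap between ISA registers and physical registers can be sketched as register renaming. This is a deliberately simplified toy model: the `rename` function, the `rN`/`pN` names and the three-operand tuple format are assumptions for illustration, and real hardware also recycles physical registers at retirement, which is omitted here.

```python
def rename(instructions, num_arch_regs):
    """Give every register write a fresh physical register.

    instructions: list of (dst, src1, src2) ISA register names like "r1".
    Returns the same instructions expressed in physical registers "pN".
    """
    # Initially ISA register rN lives in physical register pN.
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dst, src1, src2 in instructions:
        # Sources read the current mapping before the destination is remapped.
        s1, s2 = mapping[src1], mapping[src2]
        mapping[dst] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dst], s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r2"),  # r4 = r1 op r2   (true dependency on r1)
    ("r1", "r5", "r6"),  # r1 = r5 op r6   (reuses the name r1)
]
for before, after in zip(program, rename(program, num_arch_regs=8)):
    print(before, "->", after)
```

The third instruction reuses the name r1 but receives its own physical register, so it no longer has to wait for the second instruction to read the old value. Only the true dependency between the first two instructions remains.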
VLIW
Very Long Instruction Word
Designed in the 1980s, a time when instruction-level
parallelism (ILP) was the great concern.
The goal was to pipeline instructions as much as possible,
employing software to help with it, reducing the complexity of
the hardware and mitigating the hardware/software
gap, and to boost the processor clock by simplifying the work done per cycle.
An example is Intel/HP Itanium.
Throughput-oriented architecture
Heritage
The compiler determines which instructions can be
performed in parallel,
bundles this information with the instructions,
and passes the bundle (word) to the hardware.
No data dependencies between instructions in a word.
Each operation in a word is assigned to a specific issue slot
(dedicated FU).
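The bundling step can be sketched as a greedy, in-order packer that closes a word as soon as the next operation depends on something already in it. This is a toy model under stated assumptions: the `bundle` function and the `(name, dst, srcs)` format are hypothetical, functional-unit types are ignored, and a production compiler would use a real list scheduler that also reorders operations.

```python
def bundle(ops, width):
    """Pack operations into words of at most `width` independent operations.

    ops: list of (name, dst, srcs) where srcs is a tuple of register names.
    Returns a list of words, each a list of operation names.
    """
    words, current = [], []
    for name, dst, srcs in ops:
        # Conservatively treat RAW, WAW and WAR as conflicts within a word.
        conflict = any(d in srcs or d == dst or dst in s for _, d, s in current)
        if conflict or len(current) == width:
            words.append(current)
            current = []
        current.append((name, dst, srcs))
    if current:
        words.append(current)
    return [[name for name, _, _ in word] for word in words]


ops = [
    ("load_a", "a", ("p",)),
    ("load_b", "b", ("q",)),
    ("add", "c", ("a", "b")),
    ("mul", "d", ("p", "q")),
    ("store", "m", ("c",)),
]
for word in bundle(ops, width=3):
    print(word)
```

With three issue slots the five operations fit into three words. Because the packer never reorders, `mul` is not hoisted into the first word even though it could be; closing that gap is exactly the scheduling work VLIW pushes onto the compiler.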
Nowadays
Hardly any general-purpose processor implements VLIW, because of
the branchy nature of production code (in contrast to
HPC or scientific code),
the need to preserve binary compatibility across
μarchitecture families.
However, the architecture is widely adopted for
programmable co-processors, where reducing power
consumption without loss of performance is crucial
(DSP, GPU).
VECTOR PROCESSORS
First introduced in 1976, they dominated HPC in the
1980s because of high instruction throughput.
The goal was to perform operations on vectors of data,
exposing data-level parallelism (DLP) to increase
instruction throughput. Vector pipelining is also called
chaining.
An example is Cray.
Throughput-oriented architecture.
Heritage
Process data in vectors; each element of a vector
(lane) is independent of the others.
Deep pipelines and wide execution units, not necessarily of the
same width (batch length) as the vector size in elements
(vector length).
Most efficient for simple memory patterns, but
gather/scatter is usually possible too.
Wide memory interfaces to saturate the execution units.
Large vector register file; a cache is not a strict
requirement and is absent from classical vector processors.
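The difference between vector length and execution width can be sketched in a few lines. This is only a functional model under assumptions: `vector_add` and `issue_cycles` are hypothetical names, plain Python lists stand in for vector registers, and the cycle count ignores pipeline start-up.

```python
import math


def vector_add(a, b, vector_length):
    """Strip-mine a long array into chunks of at most `vector_length` elements."""
    out = []
    for start in range(0, len(a), vector_length):
        va = a[start:start + vector_length]   # one "vector register" load
        vb = b[start:start + vector_length]
        # Every lane is independent of the others, so order does not matter.
        out.extend(x + y for x, y in zip(va, vb))
    return out


def issue_cycles(vector_length, lanes):
    """Cycles one vector instruction occupies a unit that is `lanes` wide."""
    return math.ceil(vector_length / lanes)


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], vector_length=4))
print(issue_cycles(vector_length=64, lanes=16))
```

A single instruction covers the whole vector length, while the hardware works through it one batch of lanes at a time, which is why the two widths need not match.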
Nowadays
They aren't used in general-purpose processor designs, but are used
as co-processors for specific workloads: HPC,
multimedia.
Precursors of most modern GPU designs.
Vector pipes with a short vector length (8-16 bytes), called
SIMD units, are widely integrated into modern
general-purpose processors to accelerate the most demanding
loops.
WHY IS IT GOING TO BE RISC LOAD/STORE?
1. Simple fixed-width instructions & few addressing modes
Cache-efficient instruction fetch, branches are aligned.
Simple hardware logic → power-efficient chips.
Allows a higher clock rate.
2. Concise ISA with orthogonal functionality
Complex instructions are ignored by compilers due to the
semantic gap → simple instructions simplify scheduling.
Complex addressing leads either to variable-length
instructions or to a large instruction size → inefficient
decoding and scheduling as well as alignment issues.
3. Large register set
Exposes possible instruction parallelism to the compiler.
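The decoding argument in point 1 can be illustrated by how instruction boundaries are found. The sketch assumes a toy variable-length encoding in which the low two bits of the first byte give the length; that encoding and both function names are invented for illustration and do not correspond to any real ISA.

```python
def fixed_boundaries(code_size, width=4):
    """Fixed-width ISA: every boundary is known without inspecting any bytes."""
    return list(range(0, code_size, width))


def variable_boundaries(code, length_of):
    """Variable-length ISA: each boundary depends on decoding the previous one."""
    offsets, pc = [], 0
    while pc < len(code):
        offsets.append(pc)
        pc += length_of(code[pc])   # must look at the bytes to find the next start
    return offsets


def toy_length(first_byte):
    """Hypothetical encoding: low two bits of the first byte, plus one."""
    return (first_byte & 0b11) + 1


code = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC])
print(fixed_boundaries(12))
print(variable_boundaries(code, toy_length))
```

The fixed-width boundaries can all be computed independently, so several decoders can start in parallel. The variable-length scan is inherently sequential unless extra hardware predicts or caches the boundaries.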
LATEST TRENDS
The architecture is seen as load-store, RISC-inherited.
Internally, instructions are broken down into single-pipe
μops.
μops are reordered and optionally organized into words.
μops or words are scheduled for execution; caching at the
highest level is usually performed on this preprocessed
view.
The latest generations of Intel processors, the NVIDIA Denver
architecture and 64-bit ARM Cortex-A processors already
employ this approach.
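Breaking an instruction into μops can be sketched for a two-operand instruction with an optional memory operand. This is a toy model: the `crack` function, the bracket syntax for memory operands and the temporary names are assumptions, and it does not reproduce how any particular processor actually decomposes its instructions.

```python
def crack(instruction):
    """Split (op, dst, src) into load/compute/store micro-operations.

    Operands written as "[reg]" are treated as memory references.
    """
    op, dst, src = instruction
    uops = []
    if src.startswith("["):                      # memory source: load it first
        uops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if dst.startswith("["):                      # read-modify-write destination
        uops.append(("load", "tmp_dst", dst))
        uops.append((op, "tmp_dst", src))
        uops.append(("store", dst, "tmp_dst"))
    else:                                        # register destination
        uops.append((op, dst, src))
    return uops


for instruction in [("add", "rax", "rcx"),
                    ("add", "rax", "[rbx]"),
                    ("add", "[rbx]", "rax")]:
    print(instruction, "->", crack(instruction))
```

Each resulting μop touches either memory or the arithmetic unit, never both, which is what makes the internal view look like a load-store machine regardless of the ISA on top.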
SUMMARY
There are three key aspects of computer architecture:
instruction set architecture, μarchitecture and design.
Some architectures aim to hide latency while others aim to
maximize instruction throughput.
CISC was created for compact code size and exact instruction
encoding, and nowadays is used only at the ISA level.
RISC leads to less complicated decoding, and its pipeline stages
allow boosting the clock within an affordable power budget.
VLIW targets power-efficient high-performance devices for
specific tasks or is used internally at the μarchitecture level.
Vector processors were transformed into SIMD extensions and
SIMT-like GPU designs.
Load-store architecture, with its simple fixed-width
instructions, few addressing modes, concise ISA
and optimally sized register set, is the winning solution.
An architecture can expose different properties
at its different levels (ISA, μarchitecture).
THE END
MARINA KOLPAKOVA / 2015-2016

Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

Modern Programming Optimization Concepts

  • 1. PRAGMATIC OPTIMIZATION IN MODERN PROGRAMMING: MODERN COMPUTER ARCHITECTURE CONCEPTS. Created by Marina (geek) Kolpakova for UNN / 2015-2016
  • 2. COURSE TOPICS: Ordering optimization approaches; Demystifying a compiler; Mastering compiler optimizations; Modern computer architecture concepts
  • 3. OUTLINE: Three aspects of computer architecture; Latency vs throughput architectures; Architecture families (CISC, RISC, VLIW, Vector); Why is it going to be load/store?; Latest trends; Summary
  • 4. 1ST ASPECT OF COMPUTER ARCHITECTURE: Instruction Set Architecture or ISA (interface) is a contract between HW and SW that specifies rights, possibilities & limitations. It covers: class of ISA (load-store, register-memory); memory addressing modes & rules (base-immediate, alignment requirements); types & sizes of operands (size of byte, short); operations (general arithmetic, control, logical); control flow instructions (branches, jumps, calls, returns); encoding of the ISA (fixed or variable length). All the conceptual aspects of the architecture.
  • 5. 2ND ASPECT OF COMPUTER ARCHITECTURE: Microarchitecture (organization) is a concrete implementation of the ISA: the high-level aspects of a processor design (memory system, memory interconnect, design of the processor internals). It covers: pipeline width; instruction latencies; issue width and scheduling; speculation capabilities. All the concrete aspects of the architecture.
  • 6. 3RD ASPECT OF COMPUTER ARCHITECTURE: Hardware or chip (design) is the specifics of a computer, including the logic design and packaging. This is a concrete implementation of the microarchitecture. It covers: tech process; clock rates; on-die placement. All the properties of the chip.
  • 7. ARCHITECTURE: ARMV8-A (μarch / IP / Hardware): Cortex-A53 / ARM / Octa Exynos 7 (7580), 1.6GHz, 28nm HKMG; Cortex-A57 / ARM / Octa Exynos 7 (7420), big.LITTLE 2.1/1.5, 14FF (LPE) (Samsung); Cortex-A72 / ARM / Deca MediaTek Helio X20, big.LITTLE 2.5/2.0/1.4, 20nm HKMG (TSMC); Cortex-A35 / ARM / -
  • 8. ARCHITECTURE: ARMV8-A (μarch / IP / Hardware): Denver / NVIDIA / Dual Tegra K1, 2.3GHz, 28nm HPM; Kryo / Qualcomm / Tetra S820, big.LITTLE 2.2/1.6GHz, 14FF (LPP) (Samsung); Exynos M1 / Samsung / Quad Exynos 8890, big.LITTLE 2.6/2.29GHz, 14FF (LPP) (Samsung)
  • 9. ARCHITECTURE: ARMV8-A (μarch / IP / Hardware): Cyclone / Apple / Dual A7 (APL0698), 1.4GHz, 28nm HKMG (Samsung); Typhoon / Apple / Dual A8 (APL1012), 1.3GHz, 20nm HKMG (TSMC); Twister / Apple / Dual A9 (APL1022), 1.85GHz, 16nm FF+ (TSMC), and Dual A9 (APL0898), 1.85GHz, 14nm FF (LPP) (Samsung)
  • 10. LATENCY VS THROUGHPUT ARCHITECTURES: A latency-oriented architecture addresses latency hiding issues; features sophisticated pipelining and out-of-order execution; employs advanced cache hierarchies; widely uses speculation. Compute cores occupy only a small part of the die.
  • 11. LATENCY VS THROUGHPUT ARCHITECTURES: A throughput-oriented architecture keeps many operations in flight; features many simple compute units/cores; employs simple pipelines and a large register file to provide low-cost thread scheduling; uses wide buses, tiling, and programmable local memory. Compute cores occupy most of the die.
  • 12. KEY ARCHITECTURE FAMILIES: RISC (Reduced Instruction Set Computer); CISC (Complex Instruction Set Computer); VLIW (Very Long Instruction Word); Vector architecture
  • 13. CISC (Complex Instruction Set Computer): Designed in the 1970s, a time when transistors were expensive and compilers were naive. Additionally, instruction packing was the main concern due to the shortage of memory, and memory latency was only a bit higher than that of registers. The goal was to define an instruction set that allows high-level language constructs to be translated into as few assembly language instructions as possible, improving performance as well as code density. Examples are VAX, x86, AMD64. Latency-oriented architecture.
  • 14. CISC. Heritage: instructions access memory, plenty of addressing modes, many instruction families and a very rich variable-length ISA (alignment counts!); consequently, complicated instruction decoding logic. Moreover, only a few registers are available to programmers. Nowadays: 1. Instructions are broken down into μcode, which is much easier to pipeline and to process power-efficiently. 2. Transistors are spent on cache hierarchies, out-of-order execution, large reorder buffers (RB) and speculation to eliminate stalls. 3. Symmetric multi-processing.
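The "Nowadays" point on the CISC slide, that instructions are broken down into simpler micro-operations, can be illustrated with a toy model. This is only a sketch under stated assumptions, not how any real x86 front end works: the instruction tuple format, the `"[addr]"` memory syntax, the `decode_to_uops` name and the `tmp_src`/`tmp_dst` temporaries are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical instruction format: (opcode, destination, source).
# An operand written as "[name]" denotes memory; anything else is a register.
Instr = Tuple[str, str, str]


def is_mem(operand: str) -> bool:
    """Return True when the operand uses the toy memory syntax "[addr]"."""
    return operand.startswith("[") and operand.endswith("]")


def decode_to_uops(instr: Instr) -> List[Instr]:
    """Split one register-memory instruction into load / compute / store micro-ops."""
    op, dst, src = instr
    uops: List[Instr] = []
    # A memory source is first loaded into a temporary register.
    if is_mem(src):
        uops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if is_mem(dst):
        # Read-modify-write on memory: load, compute in a register, store back.
        uops.append(("load", "tmp_dst", dst))
        uops.append((op, "tmp_dst", src))
        uops.append(("store", dst, "tmp_dst"))
    else:
        # Register destination: the operation itself is already a single micro-op.
        uops.append((op, dst, src))
    return uops


if __name__ == "__main__":
    for uop in decode_to_uops(("add", "[rbx]", "rax")):
        print(uop)
```

In this model a register-register `add` stays a single micro-op, while the memory-destination form expands into three, each of which touches either memory or the ALU but not both. That separation is what makes the pieces easier to pipeline.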
  • 15. RISC (Reduced Instruction Set Computer): Designed in the 1980s, a time when ILP (instruction-level parallelism) was the major concern. The memory-processor gap had already begun to emerge. The goal was to decrease the number of clocks per instruction (CPI) while pipelining instructions as much as possible, employing hardware to help with it. A uniform ISA, pipelining and a large register file are must-haves. Examples are MIPS, ARM, PowerPC. Latency-oriented architecture.
  • 16. RISC. Heritage: relatively few instructions, all of the same length. Only load and store instructions access memory. Larger register file than typical CISC processors have. No μcode. Nowadays: most architectures that come from RISC are called load-store architectures, though they may employ μops. They combine the concepts of a classic RISC with modern hardware enhancements: 1. deep pipelines, multi-cycle instructions, 2. out-of-order execution, 3. speculation.
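The RISC goal of lowering CPI can be made concrete with the classic performance equation, time = instruction count × CPI / clock frequency. The sketch below only shows the arithmetic. The function name and every number in it are hypothetical, chosen to illustrate the trade-off rather than to describe any real processor.

```python
def execution_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """Classic performance equation: instructions * cycles-per-instruction / frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return instruction_count * cpi / clock_hz


# Hypothetical workloads: the RISC-like variant needs more instructions,
# but each one takes fewer cycles on average (assumed values).
cisc_like = execution_time(instruction_count=1_000_000, cpi=4.0, clock_hz=1e9)
risc_like = execution_time(instruction_count=1_300_000, cpi=1.25, clock_hz=1e9)

if __name__ == "__main__":
    print(f"CISC-like: {cisc_like * 1e3:.3f} ms")
    print(f"RISC-like: {risc_like * 1e3:.3f} ms")
```

With these assumed inputs, the extra instructions are more than paid for by the lower CPI. Whether that holds for a real program depends on the measured instruction count and CPI.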
  • 17. THE HARDWARE/SOFTWARE GAP. Compiler: analyzes control flow; analyzes dependencies; schedules instructions; maps variables to a limited register set. Hardware: analyzes control flow; analyzes dependencies; schedules instructions; remaps ISA registers to a large internal register set.
  • 18. A WORD TOWARDS REGISTERS: Indeed, registers are temporary storage locations inside the processor that hold data and addresses. Local variables are not the same as ISA registers, since the compiler uses IR internally and does register allocation close to the end of the optimization process. Registers provided by the ISA are not the same as the actual registers on the processor. Internal reorder buffers, which hold decoded instruction parameters and intermediate results, are closer to the classic definition of a register file.
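The remapping of ISA registers onto a larger internal register set can be sketched as a simple renaming pass. This is a minimal model under stated assumptions: the four-field instruction tuple, the `rename` function and the `p0, p1, ...` physical register names are hypothetical, and the toy never frees physical registers, which real hardware must do.

```python
from typing import Dict, List, Tuple

# Hypothetical three-operand instruction: (opcode, dest, src1, src2), ISA register names.
Instr = Tuple[str, str, str, str]


def rename(program: List[Instr], num_physical: int) -> List[Instr]:
    """Map ISA registers onto a larger physical register set.

    Every write gets a fresh physical register, which removes
    write-after-write and write-after-read (false) dependencies.
    This toy never frees registers, so it raises when it runs out.
    """
    mapping: Dict[str, str] = {}  # ISA register -> current physical register
    next_free = 0

    def fresh() -> str:
        nonlocal next_free
        if next_free >= num_physical:
            raise RuntimeError("out of physical registers")
        name = f"p{next_free}"
        next_free += 1
        return name

    def lookup(reg: str) -> str:
        # A register read before any write gets an initial mapping.
        if reg not in mapping:
            mapping[reg] = fresh()
        return mapping[reg]

    renamed: List[Instr] = []
    for op, dst, src1, src2 in program:
        # Sources are resolved before the destination is remapped.
        s1, s2 = lookup(src1), lookup(src2)
        mapping[dst] = fresh()
        renamed.append((op, mapping[dst], s1, s2))
    return renamed


if __name__ == "__main__":
    program = [
        ("add", "r1", "r2", "r3"),
        ("mul", "r4", "r1", "r1"),
        ("sub", "r1", "r5", "r6"),  # reuses r1, but does not depend on the add
    ]
    for instr in rename(program, num_physical=8):
        print(instr)
```

In the example, the third instruction reuses `r1` only as a name. After renaming it writes a different physical register than the `add`, so it no longer has to wait for the `mul` to finish reading the old value.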
  • 19. VLIW (Very Long Instruction Word): Designed in the 1980s, a time when ILP (instruction-level parallelism) was the major concern. The goal was to pipeline instructions as much as possible, employing software to help with it, which reduces the complexity of the hardware and mitigates the hardware/software gap. It also boosts the processor clock by simplifying the work per cycle. Example is the Intel/HP Itanium. Throughput-oriented architecture.
  • 20. VLIW. Heritage: the compiler determines which instructions can be performed in parallel, bundles this information with the instructions, and passes the bundle (word) to the hardware. There are no data dependencies between instructions in a word. Each operation in a word is assigned to a specific issue slot (dedicated FU).
  • 21. VLIW. Nowadays: hardly any general-purpose processor implements VLIW, because of the branchy nature of production code (in contrast to HPC or scientific code) and the need to maintain binary compatibility across μarchitecture families. However, the architecture is widely adopted for programmable co-processors where reducing power consumption without loss of performance is crucial (DSP, GPU).
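The bundling rule from the VLIW slides (no data dependencies inside a word, one operation per dedicated issue slot) can be sketched as a greedy in-order packer. This is an illustrative simplification, not a production scheduler: the `Op` fields, the unit names and the `bundle` function are assumptions, and write-after-read hazards are treated conservatively as conflicts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    name: str              # label, for display only
    unit: str              # functional unit / issue slot, e.g. "alu", "mem"
    dst: str               # register written
    srcs: Tuple[str, ...]  # registers read


def conflicts(a: Op, b: Op) -> bool:
    """True if b (later in program order) must not share a word with a."""
    raw = a.dst in b.srcs  # b reads what a writes
    waw = a.dst == b.dst   # both write the same register
    war = b.dst in a.srcs  # b overwrites what a reads (treated conservatively)
    return raw or waw or war


def bundle(ops: List[Op]) -> List[Dict[str, Op]]:
    """Pack ops, in program order, into words with one slot per functional unit."""
    words: List[Dict[str, Op]] = []
    current: Dict[str, Op] = {}
    for op in ops:
        slot_taken = op.unit in current
        dependent = any(conflicts(prev, op) for prev in current.values())
        if slot_taken or dependent:
            # Close the current word and start a new one.
            words.append(current)
            current = {}
        current[op.unit] = op
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    ops = [
        Op("ld_a", "mem", "r1", ("r10",)),
        Op("add", "alu", "r2", ("r3", "r4")),
        Op("mul", "mul", "r5", ("r1", "r2")),
        Op("ld_b", "mem", "r6", ("r11",)),
    ]
    for i, word in enumerate(bundle(ops)):
        print(i, {slot: op.name for slot, op in word.items()})
```

Because operations are never reordered across words, program order is preserved and the only question is where each word ends. A real VLIW compiler would also reorder independent operations to fill empty slots.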
  • 22. VECTOR PROCESSORS: First introduced in 1976, they dominated HPC in the 1980s because of their high instruction throughput. The goal was to perform operations on vectors of data, exposing data-level parallelism (DLP) to increase instruction throughput. Vector pipelining is also called chaining. Example is Cray. Throughput-oriented architecture.
  • 23. VECTOR PROCESSORS. Heritage: data is processed in vectors; each element in a vector (lane) is independent of any other. Deep pipelines and wide execution units, whose width (batch length) is not necessarily the same as the size of the vector in elements (vector length). Most efficient for simple memory patterns, but gather/scatter is usually possible too. Wide memory interfaces to saturate the execution units. Large vector register file; cache is not a strict requirement and is absent in classical vector processors.
  • 24. VECTOR PROCESSORS. Nowadays: they aren't used in general-purpose processor designs, but are used as co-processors for specific workloads: HPC, multimedia. They are precursors of most modern GPU designs. Vector pipes with a short vector length (8-16 bytes), called SIMD units, are widely integrated into modern general-purpose processors to accelerate the most demanding loops.
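The distinction between vector length and batch length, and the gather/scatter access pattern, can be modelled in a few lines. This is a plain-Python sketch under stated assumptions: the function names and the list-based "memory" are hypothetical, and nothing here executes in parallel. The code only shows which elements a hardware unit could process together.

```python
from typing import List


def vector_add(a: List[float], b: List[float], batch_length: int) -> List[float]:
    """Add two vectors element-wise, `batch_length` lanes per step.

    The vector length is len(a); the execution unit is only `batch_length`
    lanes wide, so one vector operation takes several passes.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    out: List[float] = []
    for start in range(0, len(a), batch_length):
        stop = start + batch_length
        # Lanes inside a batch are independent of each other.
        out.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
    return out


def gather(memory: List[float], indices: List[int]) -> List[float]:
    """Load a vector from arbitrary (non-contiguous) memory positions."""
    return [memory[i] for i in indices]


def scatter(memory: List[float], indices: List[int], values: List[float]) -> None:
    """Store vector elements to arbitrary memory positions, in lane order."""
    for i, v in zip(indices, values):
        memory[i] = v


if __name__ == "__main__":
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_length=2))
    print(gather([0, 10, 20, 30], [3, 1]))
```

A vector of five elements on a two-lane unit takes three passes here, the last one only partly filled. That is the sense in which batch length and vector length need not match.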
  • 25. WHY IS IT GOING TO BE RISC LOAD/STORE? 1. Simple fixed-width instructions & few addressing modes: cache-efficient instruction fetch, branches are aligned; simple hardware logic → power-efficient chips; drives a higher clock rate. 2. Concise ISA with orthogonal functionality: complex instructions are ignored by compilers due to the semantic gap → simple instructions simplify scheduling; complex addressing leads either to variable-length instructions or to a big instruction size → inefficient decoding and scheduling as well as alignment issues. 3. Large register set: exposes possible instruction parallelism to the compiler.
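The decoding argument in point 1 can be illustrated by comparing how instruction boundaries are found in each scheme. This is a toy sketch, not a real ISA: the 4-byte width, the convention that the first byte of a variable-length instruction stores its total length, and both function names are assumptions made for illustration.

```python
from typing import List


def fixed_boundaries(code: bytes, width: int = 4) -> List[int]:
    """Start offsets of fixed-width instructions: known without reading the bytes."""
    if width <= 0 or len(code) % width != 0:
        raise ValueError("code size must be a multiple of a positive width")
    return list(range(0, len(code), width))


def variable_boundaries(code: bytes) -> List[int]:
    """Start offsets in a toy variable-length encoding.

    Assumption: the first byte of every instruction stores its total length
    in bytes. Each offset depends on all previous lengths, so the scan is
    inherently sequential.
    """
    offsets: List[int] = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length == 0 or pos + length > len(code):
            raise ValueError(f"malformed instruction at offset {pos}")
        offsets.append(pos)
        pos += length
    return offsets


if __name__ == "__main__":
    print(fixed_boundaries(bytes(12)))
    print(variable_boundaries(bytes([1, 3, 0, 0, 2, 0])))
```

With a fixed width, the offset of the n-th instruction is just `n * width`, so several decoders can start at once. In the variable-length model, the position of each instruction is known only after the previous one has been examined.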
  • 26. LATEST TRENDS: The architecture is seen as load-store, RISC-inherited. Internally: instructions are broken down into single-pipe μops; μops are reordered and optionally organized into words; μops or words are scheduled for execution, and caching at the highest level is usually performed on this preprocessed view. The latest generations of Intel processors, the NVIDIA Denver architecture and 64-bit ARM Cortex-A processors already employ this approach.
  • 27. SUMMARY: There are three key aspects of computer architecture: Instruction Set Architecture, μarchitecture and design. Some architectures aim to hide latency while others aim to maximize instruction throughput. CISC was created for compact code size and exact instruction encoding, and is used only at the ISA level nowadays. RISC leads to less complicated decoding, and its pipeline stages allow boosting the clock within an affordable power budget. VLIW targets power-efficient high-performance devices for specific tasks, or is used internally at the μarchitecture level. Vector processors have transformed into SIMD extensions and SIMT-like GPU designs.
  • 28. SUMMARY: Load-store architectures, with their simple fixed-width instructions, few addressing modes, concise ISA and optimally sized register set, are the winning solution. An architecture can expose different properties at its different levels (ISA, μarchitecture).