The document discusses instruction set architecture (ISA), describing it as the interface between software and hardware that defines the programming model and machine language instructions. It provides details on RISC ISAs like MIPS and how they aim to have simpler instructions, more registers, load/store architectures, and pipelining to improve performance compared to CISC ISAs. The document also discusses different types of ISA designs including stack-based, accumulator-based, and register-to-register architectures.
2. Instruction Set Architecture
• Instruction set architecture is the structure of a
computer that a machine language programmer must
understand to write a correct (timing independent)
program for that machine.
• The instruction set architecture is also the machine
description that a hardware designer must
understand to design a correct implementation of
the computer.
• In VLIW designs, a fixed number of operations are formatted as
one big instruction (called a bundle)
[Figure: a bundle made of several op fields plus bundling info]
3. Instruction Set Architecture
Computer Architecture = Instruction Set Architecture + Machine Organization
• “... the attributes of a [computing] system as seen by
the programmer, i.e. the conceptual structure and
functional behavior …”
4. Evolution of Instruction Sets
• Single Accumulator (EDSAC 1950)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation
– High-level Language Based (B5000 1963)
– Concept of a Family (IBM 360 1964)
• General Purpose Register Machines
– Complex Instruction Sets (Vax, Intel 432 1977-80)
– Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• RISC (Mips, Sparc, HP-PA, IBM RS6000, PowerPC . . . 1987)
• LIW/"EPIC"? (IA-64 . . . 1999), VLIW
5. Instruction Set Architecture
– Interface between all the software that runs on the
machine and the hardware that executes it
• Computer Architecture = Hardware + ISA
6. Instruction Set, or Instruction Set Architecture (ISA)
• An instruction set, or instruction set
architecture (ISA), is the part of the computer
architecture related to programming, including the
native data types, instructions, registers, addressing
modes, memory architecture, interrupt and exception
handling, and external I/O. An ISA includes a
specification of the set of opcodes (machine language)
and the native commands implemented by a particular
processor.
7. Microarchitecture
• Instruction set architecture is distinguished from
the microarchitecture, which is the set of processor
design techniques used to implement the
instruction set. Computers with different micro
architectures can share a common instruction set.
• For example, the Intel Pentium and
the AMD Athlon implement nearly identical
versions of the x86 instruction set, but have
radically different internal designs.
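The ISA/microarchitecture split can be illustrated with a small Python sketch. It assumes a hypothetical 32-bit multiply instruction whose architecturally visible result is the low 32 bits of the product. `mul_shift_add` and `mul_single_cycle` are two hypothetical implementations that differ in internal design and cycle count but return the same result, which is all a program written to the ISA can observe.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # architectural result: low 32 bits of the product

def mul_shift_add(a: int, b: int) -> Tuple[int, int]:
    """Hypothetical iterative multiplier: one shift-and-add step per cycle."""
    a, b = a & MASK32, b & MASK32
    product, cycles = 0, 0
    for bit in range(32):
        if (b >> bit) & 1:
            product = (product + (a << bit)) & MASK32
        cycles += 1  # one step per multiplier bit
    return product, cycles

def mul_single_cycle(a: int, b: int) -> Tuple[int, int]:
    """Hypothetical array multiplier: the whole product in one cycle."""
    return (a * b) & MASK32, 1

for x, y in [(3, 7), (123456, 789), (0xFFFFFFFF, 2)]:
    r1, c1 = mul_shift_add(x, y)
    r2, c2 = mul_single_cycle(x, y)
    assert r1 == r2  # same ISA-visible result, different internals
    print(f"{x} * {y} -> {r1:#010x} (iterative: {c1} cycles, array: {c2} cycle)")
```

Only the cycle counts differ, which is the sense in which two processors can implement "nearly identical versions" of one instruction set.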
9. NUAL vs. UAL
• Unit Assumed Latency (UAL)
– Semantics of the program are that each
instruction is completed before the next one is
issued
– This is the conventional sequential model
• Non-Unit Assumed Latency (NUAL):
– At least 1 operation has a non-unit assumed
latency, L, which is greater than 1
– The semantics of the program are correctly
understood if exactly the next L-1 instructions are
understood to have issued before this operation
completes
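A minimal Python sketch of the two models, assuming one instruction issues per cycle and that a write becomes visible exactly `latency` cycles after it issues. The `Op` record and `run` function are hypothetical. Under NUAL with L = 3, exactly the next L-1 = 2 instructions still read the old register value.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Op:
    text: str
    dest: Optional[str] = None  # register written (a constant, to keep it tiny)
    value: int = 0
    src: Optional[str] = None   # register read
    latency: int = 1            # assumed latency, honored only under NUAL

def run(program: List[Op], regs: dict, unit_latency: bool) -> List[int]:
    """Issue one op per cycle; return the value each reading op observes."""
    regs = dict(regs)
    pending = []   # (completion_cycle, dest, value)
    observed = []
    for cycle, op in enumerate(program):
        # Writes whose latency has elapsed become visible at this cycle.
        for _done, dest, value in [p for p in pending if p[0] <= cycle]:
            regs[dest] = value
        pending = [p for p in pending if p[0] > cycle]
        if op.src is not None:
            observed.append(regs[op.src])
        if op.dest is not None:
            lat = 1 if unit_latency else op.latency
            pending.append((cycle + lat, op.dest, op.value))
    return observed

program = [
    Op("load r1 <- 42", dest="r1", value=42, latency=3),
    Op("use r1", src="r1"),
    Op("use r1", src="r1"),
    Op("use r1", src="r1"),
]
print("UAL :", run(program, {"r1": 0}, unit_latency=True))   # [42, 42, 42]
print("NUAL:", run(program, {"r1": 0}, unit_latency=False))  # [0, 0, 42]
```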
23. Instruction Set Architectures
Reduced Instruction Set Computers (RISCs)
Simple instruction
Flexibility
Higher throughput
Faster execution
Complex Instruction Set Computers (CISCs)
Hardware support for high-level language
Compact program
24. MIPS: A RISC example
Smaller and simpler instruction set
111 instructions
One cycle execution time
Pipelining
32 registers
32 bits for each register
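With 32 registers, each register number fits in a 5-bit field. The sketch below packs a MIPS R-type instruction using the standard MIPS32 field layout (opcode, rs, rt, rd, shamt, funct of 6, 5, 5, 5, 5 and 6 bits). The `REG` table covers only the registers this example needs, and `encode_rtype` is a hypothetical helper, not part of any assembler.

```python
# Register numbers for the few MIPS registers used here ($t0-$t2 are 8-10).
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10}

def encode_rtype(rd: str, rs: str, rt: str, funct: int, shamt: int = 0) -> int:
    """Pack opcode|rs|rt|rd|shamt|funct (6|5|5|5|5|6 bits) into one 32-bit word."""
    for field in (REG[rd], REG[rs], REG[rt], shamt):
        # 32 registers -> every register number fits in a 5-bit field
        if not 0 <= field < 32:
            raise ValueError("field does not fit in 5 bits")
    opcode = 0  # R-type instructions share opcode 0; funct selects the operation
    return ((opcode << 26) | (REG[rs] << 21) | (REG[rt] << 16)
            | (REG[rd] << 11) | (shamt << 6) | funct)

ADD_FUNCT = 0x20  # funct code of the MIPS add instruction
word = encode_rtype("$t0", "$t1", "$t2", ADD_FUNCT)  # add $t0, $t1, $t2
print(f"{word:#010x}")  # 0x012a4020
```

Every instruction occupies exactly 32 bits, which is part of what makes the fixed-length, pipelined RISC design described here practical.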
26. Overview of the MIPS Processor
• Memory: up to 2^32 bytes = 2^30 words, 4 bytes per word
• Execution & Integer Unit (EIU, main processor): 32 general-purpose registers ($0 to $31), Hi and Lo registers, an arithmetic & logic unit (ALU) and an integer multiplier/divider
• Floating-Point Unit (FPU, coprocessor 1): 32 floating-point registers (F0 to F31) and a floating-point arithmetic unit
• Trap & Memory Unit (TMU, coprocessor 0): EPC, Cause, BadVaddr and Status registers
28. Definitions (ECE 361)
Performance is typically in units-per-second
• bigger is better
If we are primarily concerned with response time
$$\text{Performance} = \frac{1}{\text{ExecutionTime}}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X}$$
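A direct transcription of the "n times faster" definition into Python, using hypothetical timings of 10 s for X and 15 s for Y on the same program.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time_s

def times_faster(time_x: float, time_y: float) -> float:
    """n in 'X is n times faster than Y' = Perf_X / Perf_Y = Time_Y / Time_X."""
    return performance(time_x) / performance(time_y)

n = times_faster(10.0, 15.0)  # hypothetical: X takes 10 s, Y takes 15 s
print(f"X is {n:.2f} times faster than Y")  # X is 1.50 times faster than Y
```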
29. Organizational Trade-offs
[Figure: design levels from Application, Programming Language and Compiler, through the ISA, down to Datapath, Control, Function Units and Transistors, Wires, Pins, annotated with Instruction Mix, CPI and Cycle Time]
CPI (cycles per instruction) is a useful design measure relating the instruction set
architecture, the implementation of that architecture, and the
program being measured.
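Because CPI depends on both the program's instruction mix and the implementation's cycles per instruction class, it is often computed as a weighted average. The mix and per-class cycle counts below are made-up numbers for illustration, not measurements.

```python
def average_cpi(mix: dict) -> float:
    """mix maps instruction class -> (instructions executed, cycles per instruction)."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    return total_cycles / total_instructions

# Hypothetical mix for one program on one implementation
mix = {
    "alu":    (500, 1),
    "load":   (200, 2),
    "store":  (100, 2),
    "branch": (200, 2),
}
print(average_cpi(mix))  # 1.5
```

Changing the program changes the counts, and changing the implementation changes the per-class cycles; either one moves CPI.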
30. Principal Design Metrics: CPI and Cycle Time
$$\text{Performance} = \frac{1}{\text{ExecutionTime}} = \frac{1}{\text{CPI} \times \text{CycleTime}} = \frac{1}{\frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}} = \frac{\text{Instructions}}{\text{Seconds}}$$
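The relation between CPI, cycle time and performance can be checked numerically. The figures below (10^9 instructions, a CPI of 1.5, and a 2 GHz clock, i.e. a 0.5 ns cycle) are assumptions chosen for illustration.

```python
def execution_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """Seconds = Instructions x (Cycles / Instruction) x (Seconds / Cycle)."""
    return instruction_count * cpi * cycle_time_s

def performance_ips(cpi: float, cycle_time_s: float) -> float:
    """Per-instruction performance: 1 / (CPI x CycleTime) = Instructions / Second."""
    return 1.0 / (cpi * cycle_time_s)

cycle = 1 / 2e9  # assumed 2 GHz clock -> 0.5 ns per cycle
print(f"{execution_time(1_000_000_000, 1.5, cycle):.3f} s")  # 0.750 s
print(f"{performance_ips(1.5, cycle) / 1e6:.0f} MIPS")       # 1333 MIPS
```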
31. Amdahl's “Law”: Make the Common Case Fast
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task
by a factor S and the remainder of the task is unaffected; then
$$\text{ExTime(with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
$$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1-F) + \frac{F}{S}}$$
Performance improvement
is limited by how much the
improved feature is used
Invest resources where
time is spent.
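Amdahl's formula as a small Python function, with validation of F and S. The example values (75% of execution time sped up 3 times) are hypothetical; passing an infinite S shows the upper bound 1/(1-F) that no enhancement of that fraction can exceed.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when a fraction F of execution time is made S times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("F must lie in [0, 1]")
    if factor <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Hypothetical: 75% of the time is spent in the part being enhanced
print(f"{amdahl_speedup(0.75, 3):.2f}")             # 2.00
print(f"{amdahl_speedup(0.75, float('inf')):.2f}")  # 4.00, the 1/(1-F) ceiling
```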
34. The steps for executing an instruction:
1. Fetch the instruction
2. Decode the instruction
3. Locate the operand
4. Fetch the operand (if necessary)
5. Execute the operation in processor registers
6. Store the results
7. Go back to step 1
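The instruction cycle can be sketched as an interpreter loop for a hypothetical accumulator machine with LOAD, ADD, STORE and HALT instructions. The opcodes, the tuple encoding and `run_accumulator` are illustrative assumptions, not a real ISA; the comments map each part of the loop to the seven steps listed.

```python
def run_accumulator(program, data):
    """Hypothetical accumulator machine following the seven execution steps."""
    data = dict(data)
    acc, pc = 0, 0
    while True:
        instr = program[pc]          # 1. fetch the instruction
        pc += 1                      #    and advance to the next one
        opcode, *operands = instr    # 2. decode
        if opcode == "HALT":
            return acc, data
        addr = operands[0]           # 3. locate the operand (a memory name)
        if opcode == "LOAD":
            acc = data[addr]         # 4-5. fetch operand, execute
        elif opcode == "ADD":
            acc = acc + data[addr]   # 4-5. fetch operand, execute
        elif opcode == "STORE":
            data[addr] = acc         # 6. store the result
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        # 7. go back to step 1

program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C"), ("HALT",)]
acc, memory = run_accumulator(program, {"A": 2, "B": 3})
print(acc, memory["C"])  # 5 5
```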
35. Typical Processor Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in register or storage for later use
• Next Instruction: determine successor instruction
36. Instruction and Data Memory: Unified or Separate
[Figure: programmer's view of a program (ADD, SUBTRACT, AND, OR, COMPARE, ...) versus the computer's view of the same instructions as binary codes (01010, 01110, 10011, 10001, 11010, ...), held in the memory of a computer made of CPU, memory and I/O]
Princeton (Von Neumann) Architecture
– Data and instructions mixed in the same unified memory
– Program as data
– Storage utilization
– Single memory interface
Harvard Architecture
– Data and instructions in separate memories
– Has advantages in certain high-performance implementations
– Can optimize each memory
43. Stack architecture
• The high frequency of memory accesses has made it
unattractive
• It is useful for rapid interpretation of high-level
language programs
• Infix expression: (A+B)×C+(D×E)
• Postfix expression: AB+C×DE×+
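A stack (zero-address) machine evaluates the postfix form directly: operands are pushed, and each operator pops two values and pushes the result. This Python sketch generates the corresponding PUSH/ADD/MUL sequence (mnemonics assumed for illustration) and evaluates the expression with hypothetical values A=1, B=2, C=3, D=4, E=5, so the result should equal (1+2)×3+(4×5) = 29.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "×": operator.mul, "*": operator.mul}
MNEMONIC = {"+": "ADD", "-": "SUB", "×": "MUL", "*": "MUL"}

def stack_code(postfix: str) -> list:
    """Zero-address code: PUSH for each operand, one opcode per operator."""
    return [MNEMONIC[t] if t in MNEMONIC else f"PUSH {t}" for t in postfix]

def eval_postfix(postfix: str, env: dict) -> int:
    stack = []
    for tok in postfix:
        if tok in OPS:
            right = stack.pop()  # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])  # PUSH operand
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

expr = "AB+C×DE×+"  # postfix form of (A+B)×C+(D×E)
print(stack_code(expr))
print(eval_postfix(expr, {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}))  # 29
```

No instruction names a register or address for its operator operands, which is why the stack form suits rapid interpretation but generates many memory accesses when the stack lives in memory.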
50. Instruction Set Design Metrics
Static Metrics
• How many bytes does the program
occupy in memory?
Dynamic Metrics
• How many instructions are executed?
• How many bytes does the processor fetch to execute the
program?
• How many clocks are required per instruction?
• How "lean" a clock is practical?
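$$\text{ExecutionTime} = \frac{1}{\text{Performance}} = \underbrace{\text{Instructions}}_{\text{Instruction Count}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Seconds}}{\text{Cycle}}}_{\text{Cycle Time}}$$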
51. Types of ISA and examples:
1. RISC -> PlayStation
2. CISC -> Intel x86
3. MISC -> INMOS Transputer
4. ZISC -> ZISC36
5. SIMD -> many GPUs
6. EPIC -> IA-64 Itanium
7. VLIW -> C6000 (Texas Instruments)
52. Problems of the Past
• In the past, it was believed that hardware
design was easier than compiler design
– Most programs were written in assembly
language
• Hardware concerns of the past:
– Limited and slower memory
– Few registers
53. The Solution
• Have instructions do more work, thereby
minimizing the number of instructions called
in a program
• Allow for variations of each instruction
– Usually variations in memory access
• Minimize the number of memory accesses
54. The Search for RISC
• Compilers became more prevalent
• The majority of CISC instructions were rarely
used
• Some complex instructions were slower than
a group of simple instructions performing an
equivalent task
– Too many instructions for designers to optimize
each one
55. RISC Architecture
• Small, highly optimized set of instructions
• Uses a load-store architecture
• Short execution time
• Pipelining
• Many registers
56. Pipelining
• Break instructions into steps
• Work on instructions like in an assembly line
• Allows for more instructions to be executed
in less time
• An n-stage pipeline is n times faster than a non-pipelined processor (in theory)
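Assuming k stages of one clock cycle each and no stalls, n instructions take k·n cycles without pipelining but only k + n - 1 cycles with it, so the speedup k·n/(k + n - 1) approaches k only as n grows. With k = 5 this sketch reproduces the cycle counts of the "Without Pipelining" and "With Pipelining" slides: 10 cycles for 2 unpipelined instructions and 9 cycles for 5 pipelined ones.

```python
def cycles_unpipelined(stages: int, n: int) -> int:
    return stages * n  # each instruction runs all stages before the next starts

def cycles_pipelined(stages: int, n: int) -> int:
    # The first instruction fills the pipeline; each later one finishes a cycle after it.
    return stages + n - 1 if n > 0 else 0

def pipeline_speedup(stages: int, n: int) -> float:
    return cycles_unpipelined(stages, n) / cycles_pipelined(stages, n)

print(cycles_unpipelined(5, 2), cycles_pipelined(5, 5))  # 10 9
for n in (5, 100, 10_000):
    # prints "5 2.78", "100 4.81", "10000 5.00": approaching k = 5
    print(n, f"{pipeline_speedup(5, n):.2f}")
```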
58. Without Pipelining
[Figure: Instr 1 and Instr 2 executed one after the other over clock cycles 1 to 10]
• Normally, you would perform the fetch, decode,
execute, operate, and write steps of an instruction
and then move on to the next instruction
59. With Pipelining
[Figure: Instr 1 to Instr 5 overlapped in the pipeline over clock cycles 1 to 9]
• The processor is able to perform each stage
simultaneously.
• If the processor is decoding an instruction, it may
also fetch another instruction at the same time.
60. Pipeline (cont.)
• The length of each pipeline stage (the clock cycle) depends on the
longest step
• Thus in RISC, all instructions were made to
be the same length
• Each stage takes 1 clock cycle
• In theory, an instruction should be finished
each clock cycle
62. Pipeline Solution
• Solution: The compiler may recognize which
instructions are dependent on or independent of
the current instruction, and rearrange them to
run the independent ones first
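A simplified sketch of this reordering, assuming a one-cycle load-use delay and tracking register dependences only. After a load whose result is used by the very next instruction, it looks for a later ALU instruction that can move up without creating a RAW, WAR or WAW conflict. Memory instructions are never moved, since the sketch does not analyze memory aliasing; `Instr`, `can_hoist` and `fill_load_delay_slots` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    kind: str = "alu"  # "alu", "load" or "store"

def can_hoist(cand: Instr, skipped: list) -> bool:
    """True if cand can move above every instruction in skipped (register deps only)."""
    for other in skipped:
        if other.dest is not None and other.dest in cand.srcs:  # RAW
            return False
        if cand.dest is not None and cand.dest in other.srcs:   # WAR
            return False
        if cand.dest is not None and cand.dest == other.dest:   # WAW
            return False
    return True

def fill_load_delay_slots(code: list) -> list:
    code = list(code)
    for i in range(len(code) - 1):
        load, nxt = code[i], code[i + 1]
        if load.kind == "load" and load.dest in nxt.srcs:  # load-use stall ahead
            for j in range(i + 2, len(code)):
                cand = code[j]
                if (cand.kind == "alu" and load.dest not in cand.srcs
                        and can_hoist(cand, code[i + 1:j])):
                    code.insert(i + 1, code.pop(j))  # move it into the stall slot
                    break
    return code

code = [
    Instr("lw  $t1, 0($a0)", "$t1", ("$a0",), "load"),
    Instr("add $t2, $t1, $t3", "$t2", ("$t1", "$t3")),
    Instr("sub $t4, $t5, $t6", "$t4", ("$t5", "$t6")),
]
for ins in fill_load_delay_slots(code):
    print(ins.text)  # lw, then sub, then add
```

If the `sub` wrote `$t1` instead, the WAR check would keep it in place, because the `add` must still read the loaded value.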
63. How to make pipelines faster
• Superpipelining
– Divide the stages of pipelining into more stages
• Ex: Split “fetch instruction” stage into two
stages
• Super duper pipelining
• Superscalar pipelining
– Run multiple pipelines in parallel
64. Dynamic pipeline
• Dynamic pipeline: Uses buffers to hold
instruction bits in case a dependent
instruction stalls
65. Why CISC Persists?
• Most Intel and AMD chips are CISC x86
• Most PC applications are written for x86
• Intel spent more money improving the
performance of their chips
• Modern Intel and AMD chips incorporate
elements of pipelining
– During decoding, x86 instructions are split into
smaller pieces
67. Outline
• Types of architectures
• Superscalar
• Differences between CISC, RISC and VLIW
• VLIW ( very long instruction word )
68. VLIW Goals:
Flexible enough
Match technology well
Very Long Instruction Word
o Very long instruction word or VLIW refers to a processor architecture designed to
take advantage of instruction level parallelism
VLIW philosophy:
– “dumb” hardware
– “intelligent” compiler
69. VLIW - History
• Floating Point Systems Array Processor
– very successful in 70’s
– all latencies fixed; fast memory
• Multiflow
– Josh Fisher (now at HP)
– 1980’s Mini-Supercomputer
• Cydrome
– Bob Rau (now at HP)
– 1980’s Mini-Supercomputer
• Tera
– Burton Smith
– 1990’s Supercomputer
– Multithreading
• Intel IA-64 (Intel & HP)
70. VLIW Processors
Goal of the hardware design:
• reduce hardware complexity
• to shorten the cycle time for better performance
• to reduce power requirements
How do VLIW designs reduce hardware complexity?
1. less multiple-issue hardware
– no dependence checking for instructions within a bundle
– can be fewer paths between instruction issue slots & FUs
2. simpler instruction dispatch
– no out-of-order execution, no instruction grouping
3. ideally no structural hazard checking logic
• Reduction in hardware complexity affects cycle time & power
consumption
71. VLIW Processors
More compiler support to increase ILP
detects hazards & hides
latencies
• structural hazards
• no 2 operations to the same functional unit
• no 2 operations to the same memory bank
• hiding latencies
• data prefetching
• hoisting loads above stores
• data hazards
• no data hazards among instructions in a
bundle
• control hazards
• predicated execution
72. VLIW: Definition
• Multiple independent Functional Units
• Instruction consists of multiple independent instructions
• Each of them is aligned to a functional unit
• Latencies are fixed
– Architecturally visible
• The compiler packs instructions into a VLIW and also schedules all
hardware resources
• Entire VLIW issues as a single unit
• Result: ILP with simple hardware
– compact, fast hardware control
– fast clock
– At least, this is the goal
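A minimal in-order bundling sketch in Python. It assumes a hypothetical machine with two integer ALUs, one memory unit and one FP unit per bundle, and unit latencies. An operation joins the current bundle only if a unit of its type is still free and it has no data dependence on an operation already in the bundle (WAR is excluded too, conservatively). A real VLIW compiler would also reorder operations across the whole schedule, which this sketch does not do.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple

class Op(NamedTuple):
    text: str
    unit: str               # functional-unit type the op needs
    dest: Optional[str]
    srcs: Tuple[str, ...]

# Hypothetical machine: slots available per bundle for each unit type
CAPACITY = {"alu": 2, "mem": 1, "fp": 1}

def conflicts(op: Op, bundle: List[Op]) -> bool:
    """Data dependence between op and any op already in the bundle."""
    for other in bundle:
        if other.dest is not None and other.dest in op.srcs:  # RAW
            return True
        if op.dest is not None and op.dest == other.dest:     # WAW
            return True
        if op.dest is not None and op.dest in other.srcs:     # WAR (conservative)
            return True
    return False

def pack_bundles(ops: List[Op], capacity: dict = CAPACITY) -> List[List[Op]]:
    """Greedy in-order packing: start a new bundle when op cannot join the current one."""
    bundles, current, used = [], [], Counter()
    for op in ops:
        unit_free = used[op.unit] < capacity.get(op.unit, 0)
        if current and (not unit_free or conflicts(op, current)):
            bundles.append(current)
            current, used = [], Counter()
        if used[op.unit] >= capacity.get(op.unit, 0):
            raise ValueError(f"machine has no {op.unit!r} unit")
        current.append(op)
        used[op.unit] += 1
    if current:
        bundles.append(current)
    return bundles

ops = [
    Op("lw $t0, 0($a0)", "mem", "$t0", ("$a0",)),
    Op("lw $t1, 4($a0)", "mem", "$t1", ("$a0",)),
    Op("add $t2, $t0, $t1", "alu", "$t2", ("$t0", "$t1")),
    Op("addi $a0, $a0, 8", "alu", "$a0", ("$a0",)),
    Op("add.s $f0, $f1, $f2", "fp", "$f0", ("$f1", "$f2")),
]
for i, bundle in enumerate(pack_bundles(ops)):
    print(i, " | ".join(op.text for op in bundle))
```

The two loads land in separate bundles because the machine has one memory unit, and the dependent `add` waits for the second load; the hardware never has to check any of this, because the compiler already did.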
73. Introduction
o Instruction of a VLIW processor consists of multiple independent
operations grouped together.
o There are multiple independent Functional Units in VLIW processor
architecture.
o Each operation in the instruction is aligned to a functional unit.
o All functional units share the use of a common large register file.
o This type of processor architecture is intended to allow higher
performance without the inherent complexity of some other
approaches.
74. VLIW History
• The term was coined by J.A. Fisher (Yale) in 1983
– ELI-512 (prototype), Trace (commercial)
• Origin lies in horizontal microcode optimization
• Another pioneering work by B. Ramakrishna Rau in 1982
– Polycyclic (prototype), Cydra-5 (commercial)
• Recent developments
– Trimedia (Philips), TMS320C6X (Texas Instruments)
75. "Bob" Rau
• Bantwal Ramakrishna "Bob" Rau (1951
– December 10, 2002) was a computer
engineer and HP Fellow. Rau was a founder
and chief architect of Cydrome, where he
helped develop the very long instruction
word (VLIW) technology that is now standard in
modern computer processors. Rau was the
recipient of the 2002 Eckert–Mauchly
Award.
76.
1984: Co-founded Cydrome Inc. and was the chief
architect of the Cydra 5 mini-supercomputer.
1989: Joined Hewlett-Packard and started HP Labs' research
program in VLIW and instruction-level parallel processing.
Director of the Compiler and Architecture Research (CAR)
program, which, during the 1990s, developed advanced
compiler technology for Hewlett-Packard and Intel computers.
At HP, also worked on the PICO (Program In, Chip Out) project,
which takes an embedded application and automatically designs
highly customized computing hardware specific to that
application, as well as any compiler that might be needed.
2002: Passed away after a long battle with cancer.
77. The VLIW Architecture
• A typical VLIW (very long instruction word) machine
has instruction words hundreds of bits in length.
• Multiple functional units are used concurrently in a
VLIW processor.
• All functional units share the use of a
common large register file.
78. Parallel Operating Environment (POE)
• Compiler creates complete plan of run-time execution
– At what time and using what resource
– POE communicated to hardware via the ISA
– Processor obediently follows POE
– No dynamic scheduling or out-of-order execution
• These would second-guess the compiler’s plan
• Compiler allowed to play the statistics
– Many types of info only available at run-time
• branch directions, pointer values
– Traditionally, compilers behave conservatively and handle the worst-case
possibility
– Allow the compiler to gamble when it believes the odds are in its favor
• Profiling
• Expose micro-architecture to the compiler
– memory system, branch execution
79. VLIW Processors
Compiler support to increase ILP
• compiler creates each VLIW word
• need for good code scheduling is greater than with in-order-issue superscalars
• a VLIW instruction doesn’t issue if any one of its operations can’t (compare a
string of series-wired "maala" festive bulbs: if one bulb fails, none light)
• techniques for increasing ILP
1. loop unrolling
2. software pipelining (schedules instructions from
different iterations together)
3. aggressive inlining (function becomes part of the
caller code)
4. trace scheduling (schedule beyond basic block
boundaries)
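Loop unrolling, the first technique above, can be illustrated on a simple summation. In the rolled loop every addition depends on the previous one, so there is nothing to pack into a long word. After unrolling by 4 with four separate accumulators, the four additions in each iteration are independent and could share one VLIW word. This sketch assumes integer data: splitting the sum into partial sums reassociates the additions, which is exact for integers but can change rounding for floating point. The function names are hypothetical.

```python
def sum_rolled(xs):
    """Original loop: every add depends on the previous one (a serial chain)."""
    s = 0
    for x in xs:
        s = s + x
    return s


def sum_unrolled4(xs):
    """Loop unrolled by 4 with four independent partial sums.

    The four adds in each iteration have no data dependences on one another,
    so a VLIW compiler could schedule them in the same long instruction word.
    """
    s0 = s1 = s2 = s3 = 0
    n = len(xs) - len(xs) % 4          # largest multiple of 4 not above len(xs)
    for i in range(0, n, 4):
        s0 += xs[i]
        s1 += xs[i + 1]
        s2 += xs[i + 2]
        s3 += xs[i + 3]
    # Cleanup loop for the 0-3 leftover elements.
    tail = 0
    for x in xs[n:]:
        tail += x
    return (s0 + s1) + (s2 + s3) + tail


print(sum_unrolled4([1, 2, 3, 4, 5, 6, 7]))  # 28
```

The cleanup loop handles lengths that are not a multiple of the unroll factor, which is the usual price of unrolling.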
80. Different Approaches
Other approaches to improving performance in processor architectures :
o Pipelining
Breaking up instructions into sub-steps so that the execution of
successive instructions overlaps
o Superscalar architectures
Dispatching individual instructions to be executed completely
independently in different parts of the processor
o Out-of-order execution
Executing instructions in an order different from program order
81. Parallel processing
Processing instructions in parallel requires three
major tasks:
1. checking dependencies between instructions to
determine which instructions can be grouped
together for parallel execution;
2. assigning instructions to the functional units on
the hardware;
3. determining when instructions are initiated.
In a VLIW processor the compiler performs all three tasks, and the
operations it selects are placed together into a single word.
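Task 1, checking dependencies, can be sketched as a pairwise classification of read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) dependences. The `(dest, srcs)` instruction format and the function names are hypothetical, and `can_group` is deliberately conservative: it refuses to group any dependent pair, even though some VLIW designs tolerate WAR inside a word because all operands are read before results are written.

```python
def dependences(first, second):
    """Classify how `second` depends on an earlier instruction `first`.

    Each instruction is a (dest, srcs) pair of register names (hypothetical format).
    """
    d1, s1 = first
    d2, s2 = second
    kinds = set()
    if d1 is not None and d1 in s2:
        kinds.add("RAW")  # true dependence: second reads what first writes
    if d2 is not None and d2 in s1:
        kinds.add("WAR")  # anti-dependence: second overwrites a source of first
    if d1 is not None and d1 == d2:
        kinds.add("WAW")  # output dependence: both write the same register
    return kinds


def can_group(instrs):
    """Conservatively True only if no pair in `instrs` has any dependence."""
    return all(
        not dependences(instrs[i], instrs[j])
        for i in range(len(instrs))
        for j in range(i + 1, len(instrs))
    )


op1 = ("e", ("a", "b"))
op2 = ("f", ("c", "d"))
op3 = ("m", ("e", "f"))
print(can_group([op1, op2]))       # True
print(can_group([op1, op2, op3]))  # False: op3 reads e and f
```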
82. ILP
Consider the following program:
op1: e = a + b
op2: f = c + d
op3: m = e * f
o Operation 3 depends on the results of operations 1 and 2, so it
cannot be calculated until both of them are completed
o However, operations 1 and 2 do not depend on any other
operation, so they can be calculated simultaneously
o If we assume that each operation can be completed in one unit of
time then these three instructions can be completed in a total of
two units of time giving an ILP of 3/2.
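The ILP of 3/2 can be reproduced by computing the earliest ("as soon as possible") cycle of each operation and dividing the operation count by the length of the resulting schedule. This sketch uses the same assumptions as the example above: unit latency and unlimited functional units. It also assumes each register is written only once, so only RAW dependences matter. The helper names are hypothetical.

```python
from fractions import Fraction


def asap_cycles(ops):
    """Earliest cycle (1-based) for each op, assuming unit latency and
    unlimited functional units. `ops` maps op name -> (dest, srcs), in program order."""
    produced_in = {}   # register -> cycle in which its value is computed
    cycle_of = {}
    for name, (dest, srcs) in ops.items():
        # An op runs one cycle after the latest of its inputs is produced;
        # inputs not produced by any op (a, b, c, d) are available from the start.
        start = 1 + max((produced_in[s] for s in srcs if s in produced_in), default=0)
        cycle_of[name] = start
        produced_in[dest] = start
    return cycle_of


def ilp(ops):
    """ILP = number of operations / number of cycles on the critical path."""
    cycles = asap_cycles(ops)
    return Fraction(len(ops), max(cycles.values()))


program = {
    "op1": ("e", ("a", "b")),
    "op2": ("f", ("c", "d")),
    "op3": ("m", ("e", "f")),
}
print(asap_cycles(program))  # {'op1': 1, 'op2': 1, 'op3': 2}
print(ilp(program))          # 3/2
```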
83. Two approaches to ILP
o Hardware approach:
Exploits dynamic parallelism, where instructions are
scheduled at run time by the hardware
o Software approach:
Exploits static parallelism, where instructions are
scheduled by the compiler
84. VLIW COMPILER
o Compiler is responsible for static scheduling of instructions in VLIW
processor.
o Compiler determines which operations can be
executed in parallel in the program.
o It groups these operations into a single instruction, the
very long instruction word.
o Compiler ensures that an operation is not issued before its operands
are ready.
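The compiler's job described above can be sketched as greedy list scheduling. Each operation is placed in the first word where all its operands are ready, given fixed, architecturally visible latencies, and where a slot of the right unit type is free. Unfilled slots become NOPs. This is a simplified sketch, not any particular compiler's algorithm. It assumes each register is written only once, so WAR/WAW hazards cannot arise. The unit types and latencies in the example are hypothetical.

```python
def schedule(ops, slots, latency):
    """Greedy list scheduling of ops into VLIW words (simplified sketch).

    ops     : list of (name, unit_type, dest, srcs) in program order;
              each register is assumed to be written only once
    slots   : unit_type -> number of slots of that type in one word
    latency : unit_type -> fixed, architecturally visible latency in cycles
    Returns a list of words; each word maps unit_type -> op names in that word.
    Unfilled slots are NOPs.
    """
    ready_at = {}   # register -> first cycle in which it may be read
    words = []
    for name, unit, dest, srcs in ops:
        # Never issue an op before its operands are ready.
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Move later until a word has a free slot of the required unit type.
        while True:
            while len(words) <= cycle:
                words.append({u: [] for u in slots})
            if len(words[cycle][unit]) < slots[unit]:
                break
            cycle += 1
        words[cycle][unit].append(name)
        ready_at[dest] = cycle + latency[unit]
    return words


ops = [
    ("op1", "alu", "e", ("a", "b")),
    ("op2", "alu", "f", ("c", "d")),
    ("op3", "mul", "m", ("e", "f")),
]
for i, w in enumerate(schedule(ops, {"alu": 2, "mul": 1}, {"alu": 1, "mul": 2})):
    print(i, w)
# 0 {'alu': ['op1', 'op2'], 'mul': []}
# 1 {'alu': [], 'mul': ['op3']}
```

With only one ALU slot per word, op2 moves to word 1 and op3 must wait until word 2, which shows how the word format and the latencies together fix the schedule.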
87. Working
o Long instruction words are fetched from memory.
o A common multi-ported register file is used for fetching operands and
storing results.
o Parallel random access to the register file is possible through the
read/write crossbar.
o Execution in the functional units is carried out concurrently with the
load/store operations that move data between RAM and the register file.
o There may be one register file or separate register files for
fixed-point (FX) and floating-point (FP) data.
o The processor relies on the compiler to find parallelism and to schedule
dependency-free program code.
88. Major categories
VLIW – Very Long Instruction Word
EPIC – Explicitly Parallel Instruction Computing
89. IA-64 EPIC
Explicitly Parallel Instruction Computing, a VLIW-style architecture
2001: 800 MHz Itanium, an IA-64 implementation
Bundle of instructions
• 128-bit bundles
• three 41-bit instructions per bundle
• 2 bundles can be issued at once
• when one bundle issues, another is fetched to take its place
• less delay in bundle issue
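The bundle arithmetic works out as 3 × 41 = 123 bits of instructions, which leaves 5 bits of a 128-bit bundle for the IA-64 template field. The template encodes which unit type each slot targets and where instruction groups stop. The sketch below packs and unpacks a bundle assuming the layout template = bits 0-4, slot 0 = bits 5-45, slot 1 = bits 46-86, slot 2 = bits 87-127. It treats slot contents as opaque 41-bit integers and does not model real IA-64 opcodes. The function names are hypothetical.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
NUM_SLOTS = 3
assert TEMPLATE_BITS + NUM_SLOTS * SLOT_BITS == 128  # 5 + 123 = 128


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != NUM_SLOTS:
        raise ValueError("a bundle holds exactly three instruction slots")
    bundle = template                      # template occupies bits 0-4
    for i, s in enumerate(slots):
        if not 0 <= s < (1 << SLOT_BITS):
            raise ValueError(f"slot {i} must fit in 41 bits")
        bundle |= s << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask for i in range(NUM_SLOTS)]
    return template, slots


t, s = unpack_bundle(pack_bundle(0x10, [1, 2, 3]))
print(t, s)  # 16 [1, 2, 3]
```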
90. Data path: A simple VLIW Architecture
[Figure: several FUs sharing a single register file]
Scalability?
Access time, area and power consumption increase sharply with the
number of register ports
91. Data path: Clustered VLIW Architecture (distributed register file)
[Figure: clusters of two FUs, each cluster with its own register file, linked by an interconnection network]
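The motivation for clustering can be made concrete by counting register-file ports. This sketch rests on two labeled assumptions. First, each FU needs two read ports and one write port. Second, as a common first-order model, one register file's area grows with the square of its port count, because each port adds a word line and a bit line to every cell. The model ignores register count and the cost of the inter-cluster network, so the numbers are illustrative units, not measurements.

```python
READS_PER_FU = 2   # assumption: each FU reads two operands per cycle
WRITES_PER_FU = 1  # assumption: each FU writes one result per cycle


def ports(fus_per_file):
    """Ports needed on one register file shared by `fus_per_file` FUs."""
    return fus_per_file * (READS_PER_FU + WRITES_PER_FU)


def relative_area(num_fus, clusters):
    """Total register-file area in arbitrary units, assuming area per file
    grows with the square of its port count (first-order model)."""
    if num_fus % clusters:
        raise ValueError("FUs must divide evenly among clusters")
    per_file = ports(num_fus // clusters)
    return clusters * per_file ** 2


# 6 FUs: one shared 18-port file vs. three 6-port files (2 FUs per cluster).
print(relative_area(6, 1), relative_area(6, 3))  # 324 108
```

Under this model, splitting six FUs into three clusters cuts register-file area by a factor of three. The saving is paid for by the interconnection network and by extra copy operations between clusters.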
92. Coarse-grain FUs with VLIW core
[Figure: VLIW core (program counter, IR, microcode, logic) driving a multiplexer network that connects register files (Reg1, Reg2) to MULT, RAM, ALU and a coarse-grain FU]
Embedded (co)-processors as FUs in a VLIW architecture
94. Superscalar Processors
• Superscalar processors are designed to exploit more
instruction-level parallelism in user programs.
• Only independent instructions can be executed in parallel
without causing a wait state.
• The amount of instruction-level parallelism varies widely
depending on the type of code being executed.
• Superscalar
– Program is a sequential stream of operations
– Hardware figures out resource assignment and time of execution
95. Pipelining in Superscalar Processors
• In order to fully utilise a superscalar processor of
degree m, m instructions must be executable in
parallel. This situation may not be true in all clock
cycles. In that case, some of the pipelines may
stall in a wait state.
• In a superscalar processor, simple operations
should have a latency of only one cycle, as in the base
scalar processor.
98. Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies involving
register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple instructions in
parallel
• Resources for parallel execution of multiple
instructions
• Mechanisms for committing process state in
correct order
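The "logic to determine true dependencies" is the hardware that VLIW removes. This sketch models in-order grouping in a superscalar of width m. The hardware scans the fetch window and issues the leading instructions that do not read (RAW) or overwrite (WAW) a register written by an earlier instruction in the same group. It also counts the register comparisons made. Two things are assumed for illustration: the `(dest, srcs)` format, and that WAR can be ignored because all instructions in a group read their operands before any result is written. With two sources per instruction and no conflicts, the comparison count is 3 · m(m − 1)/2, so it grows quadratically with issue width.

```python
def issue_group(window, width):
    """Select, in order, the leading instructions of `window` that can issue together.

    window: list of (dest, srcs) tuples in program order (hypothetical format).
    Stops at the first instruction with a RAW or WAW dependence on an earlier
    instruction already selected this cycle, or after `width` instructions.
    Returns (selected_instructions, comparisons_made).
    """
    selected, comparisons = [], 0
    for dest, srcs in window[:width]:
        conflict = False
        for prev_dest, _ in selected:
            # Compare every source and the destination against an earlier destination.
            comparisons += len(srcs) + 1
            if prev_dest in srcs or prev_dest == dest:
                conflict = True
        if conflict:
            break
        selected.append((dest, srcs))
    return selected, comparisons


window = [("r1", ("r2", "r3")), ("r4", ("r5", "r6")), ("r7", ("r1", "r8")), ("r9", ("r10", "r11"))]
selected, n = issue_group(window, 4)
print(len(selected), n)  # 2 9
```

The third instruction reads `r1`, written by the first, so only two instructions issue this cycle.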
100. Why are superscalar processors commercially more popular
than VLIW processors?
• Binary code compatibility among scalar &
superscalar processors of the same family
• Same compiler works for all processors (scalars
and superscalars) of the same family
• Assembly programming of VLIWs is tedious
• Code density in VLIWs is very poor
– Instruction encoding schemes
Area Performance
101. Superscalars vs. VLIW
• VLIW requires a more complex compiler
• Superscalars can more efficiently execute
pipeline-independent code
– consequence: no need to recompile if the
implementation changes
102. Comparison: CISC, RISC, VLIW
(VLSI DESIGN GROUP – METS SCHOOL OF ENGINEERING, MALA)
104. Advantages of VLIW
Compiler prepares fixed packets of multiple
operations that give the full "plan of execution"
– dependencies are determined by compiler and used to
schedule according to function unit latencies
– function units are assigned by compiler and correspond to
the position within the instruction packet ("slotting")
– compiler produces fully-scheduled, hazard-free code =>
hardware doesn't have to "rediscover" dependencies or
schedule
105. Disadvantages of VLIW
Compatibility across implementations is a major
problem
– VLIW code won't run properly with different number
of function units or different latencies
– unscheduled events (e.g., cache miss) stall entire
processor
Code density is another problem
– low slot utilization (mostly nops)
– reduce nops by compression ("flexible VLIW",
"variable-length VLIW")
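The code-density problem and the compression remedy can be sketched together. Slot utilization is the fraction of slots holding real operations. A simple "variable-length VLIW" style encoding stores, per word, a bitmask of occupied slots followed by only the non-NOP operations. Because slot position identifies the functional unit ("slotting"), the mask is enough to restore every operation to its slot. The encoding below is hypothetical and only illustrates the idea. It is not any real processor's format.

```python
NOP = None  # an empty slot


def utilization(words):
    """Fraction of slots that hold real operations rather than NOPs."""
    total = sum(len(w) for w in words)
    used = sum(op is not NOP for w in words for op in w)
    return used / total


def compress(words):
    """Hypothetical encoding: per word, a bitmask of occupied slots plus only
    the real operations, in slot order."""
    out = []
    for w in words:
        mask = sum(1 << i for i, op in enumerate(w) if op is not NOP)
        out.append((mask, [op for op in w if op is not NOP]))
    return out


def decompress(encoded, width):
    """Rebuild fixed-width words; bit i of the mask says slot i is occupied."""
    words = []
    for mask, ops in encoded:
        it = iter(ops)
        words.append([next(it) if (mask >> i) & 1 else NOP for i in range(width)])
    return words


words = [["add", "mul", NOP, NOP], [NOP, NOP, "ld", NOP], [NOP, NOP, NOP, NOP]]
print(utilization(words))   # 0.25
print(compress(words)[0])   # (3, ['add', 'mul'])
```

Here 9 of 12 slots are NOPs. The compressed form stores 3 operations plus one small mask per word, at the cost of a decompression step in instruction fetch.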