Introduction to
ILP-Processors
Instruction-Level Parallel Processors
• Evolution and overview of ILP-processors
• Dependencies between instructions
• Instruction scheduling
• Preserving sequential consistency
• The speed-up potential of ILP-processing
CH04
Improve CPU performance by
• Increasing clock rates
  - e.g. a CPU running at 4 GHz
• Increasing the number of instructions executed in parallel
  - e.g. executing 6 instructions at the same time
How do we increase the number of instructions executed in parallel?
Time and Space parallelism
Pipeline (assembly line)
Result of pipeline (e.g.)
Pipeline of PowerPC 601
Data fetch from register file
VLIW (Very Long Instruction Word, 1024 bits!)
Superscalar (Sequential stream of instructions)
From sequential instructions to parallel execution
• Dependencies between instructions
• Instruction scheduling
• Preserving sequential consistency
Dependencies between instructions
Instructions often depend on each other in such a way
that a particular instruction cannot be executed until a
preceding instruction or even two or three preceding
instructions have been executed.
1. Data dependencies
2. Control dependencies
3. Resource dependencies
Data dependencies
• Read after Write
• Write after Read
• Write after Write
• Recurrences
Data dependency
Data dependencies in straight-line code (RAW)
• RAW (read-after-write) dependencies
  i1: load r1, a
  i2: add r2, r1, r1
• flow dependencies
• true dependencies
• cannot be eliminated (i2 needs the value that i1 loads into r1)
Data dependencies in straight-line code (WAR)
• WAR dependencies
i1: mul r1, r2, r3
i2: add r2, r4, r5
• anti-dependencies
• false dependencies
• can be eliminated through register renaming
i1: mul r1, r2, r3
i2: add r6, r4, r5
(renaming can be done by the compiler or by the ILP-processor)
Data dependencies in straight-line code (WAW)
• WAW dependencies
i1: mul r1, r2, r3
i2: add r1, r4, r5
• output dependencies
• false dependencies
• can be eliminated through register renaming
i1: mul r1, r2, r3
i2: add r6, r4, r5
(renaming can be done by the compiler or by the ILP-processor)
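The renaming step shown for WAR and WAW can be sketched in a few lines. This is only an illustration, not how any particular compiler or processor does it: the tuple format `(op, dest, sources)`, the function name `rename`, and the physical register names `p0`, `p1`, ... are all assumptions made for the example.

```python
from itertools import count

def rename(instrs):
    """Give every destination a fresh (hypothetical) physical register p0, p1, ..."""
    table = {}        # architectural register -> current physical register
    fresh = count()   # source of unused physical register numbers

    def phys(reg):
        # A register read before any write gets its own physical register.
        if reg not in table:
            table[reg] = f"p{next(fresh)}"
        return table[reg]

    out = []
    for op, dest, srcs in instrs:
        new_srcs = tuple(phys(s) for s in srcs)  # sources use the mapping valid so far
        table[dest] = f"p{next(fresh)}"          # destination always gets a new register
        out.append((op, table[dest], new_srcs))
    return out

# The WAR and WAW examples from the slides, plus a RAW pair for comparison.
war = rename([("mul", "r1", ("r2", "r3")), ("add", "r2", ("r4", "r5"))])
waw = rename([("mul", "r1", ("r2", "r3")), ("add", "r1", ("r4", "r5"))])
raw = rename([("mul", "r1", ("r2", "r3")), ("add", "r4", ("r1", "r1"))])
```

After renaming, the second instruction no longer writes a register that the first one reads (WAR) or writes (WAW), while the RAW pair still shares a register, because a true dependency cannot be renamed away.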
Data dependencies in loops
for (int i=2; i<10; i++) {
  x[i] = a*x[i-1] + b;
}
• the iterations cannot be executed in parallel: each iteration reads the x[i-1] written by the previous one (a recurrence)
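A small experiment makes the recurrence visible. The sketch below compares the sequential loop with a naive "all iterations at once" version that reads every input before any result is written. The function names and the sample values (`a=2`, `b=1`, an array of ones) are assumptions chosen for illustration.

```python
def sequential(x, a, b):
    """Reference semantics: iteration i sees the x[i-1] written by iteration i-1."""
    x = list(x)
    for i in range(2, 10):
        x[i] = a * x[i - 1] + b
    return x

def naive_parallel(x, a, b):
    """All iterations read their input before any iteration writes its result."""
    old = list(x)
    new = list(x)
    for i in range(2, 10):
        new[i] = a * old[i - 1] + b
    return new

x0 = [1] * 10
seq = sequential(x0, a=2, b=1)      # values grow from element to element
par = naive_parallel(x0, a=2, b=1)  # every element only sees the original x[i-1]
```

Only `x[2]` agrees between the two results; from `x[3]` on, the naive version uses stale inputs, which is why the loop-carried dependency forces sequential execution.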
Data dependencies
Data dependency graphs
i1: load r1, a
i2: load r2, b
i3: add r3, r1, r2
i4: mul r1, r2, r4
i5: div r1, r2, r4
Instruction interpretation: r3 ← (r1) + (r2) etc.
Figure 4.11 A straight-line assembly code sequence and its corresponding DDG
[Figure: DDG with nodes i1 to i5; edges labelled δt, δt, δo, δa, presumably true, output and anti dependencies]
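The dependency edges of such a sequence can be derived mechanically. The following is a minimal sketch, assuming each instruction is encoded as `(name, dest, source registers)` and that the memory operands of `load` are ignored. It lists every pairwise register conflict, so it may report more edges than the figure draws (for example, edges already covered by an intervening write).

```python
def build_ddg(instrs):
    """Return (earlier, later, kind, register) for every pairwise register conflict."""
    edges = []
    for j, (name_j, dest_j, srcs_j) in enumerate(instrs):
        for name_i, dest_i, srcs_i in instrs[:j]:
            if dest_i in srcs_j:      # later instruction reads what the earlier one wrote
                edges.append((name_i, name_j, "RAW", dest_i))
            if dest_j in srcs_i:      # later instruction overwrites what the earlier one read
                edges.append((name_i, name_j, "WAR", dest_j))
            if dest_i == dest_j:      # both write the same register
                edges.append((name_i, name_j, "WAW", dest_i))
    return edges

# The sequence from the slide; load sources are memory operands, so no register sources.
code = [
    ("i1", "r1", ()),            # load r1, a
    ("i2", "r2", ()),            # load r2, b
    ("i3", "r3", ("r1", "r2")),  # add  r3, r1, r2
    ("i4", "r1", ("r2", "r4")),  # mul  r1, r2, r4
    ("i5", "r1", ("r2", "r4")),  # div  r1, r2, r4
]
ddg = build_ddg(code)
```

Pruning redundant edges and handling memory dependencies are left out to keep the sketch short.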
Control dependencies
mul r1, r2, r3
jz zproc
sub r4, r7, r1
:
zproc: load r1, x
:
• the actual path of execution depends on the outcome of the multiplication
• the branch imposes dependencies on the logically subsequent instructions
Control Dependency Graph
Branches: frequency and branch distance
• Expected frequency of (all) branches
  - general-purpose (non-scientific) programs: 20-30%
  - scientific programs: 5-10%
• Expected frequency of conditional branches
  - general-purpose programs: 20%
  - scientific programs: 5-10%
• Expected branch distance (between two branches)
  - general-purpose programs: on average, every 3rd-5th instruction is a conditional branch
  - scientific programs: on average, every 10th-20th instruction is a conditional branch
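The branch distances follow roughly from the frequencies: if a fraction f of all instructions are branches, a branch occurs about every 1/f instructions. The short check below uses the 20-30% and 5-10% figures from the slide; the helper name `branch_distance` is an assumption for illustration.

```python
def branch_distance(frequency):
    """Average number of instructions from one branch to the next."""
    return 1 / frequency

# Frequencies taken from the slide: 20-30% (general purpose), 5-10% (scientific).
general = (branch_distance(0.30), branch_distance(0.20))
scientific = (branch_distance(0.10), branch_distance(0.05))
```

This gives roughly every 3rd to 5th instruction for general-purpose programs and every 10th to 20th for scientific programs, in line with the distances quoted above.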
Impact of Branch on instruction issue
Resource dependencies
• An instruction is resource-dependent on a
previously issued instruction if it requires a
hardware resource which is still being used by a
previously issued instruction.
• e.g. two divisions that compete for the same divide unit:
div r1, r2, r3
div r4, r2, r5
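The effect of a resource dependency can be shown with a toy issue model. This is a sketch under stated assumptions: units are not pipelined, instructions issue in order, data dependencies are ignored, and the divide latency of 10 cycles is an invented figure, not one from the slides.

```python
def issue_cycles(ops, units, latency):
    """Earliest start cycle of each operation when it needs a free unit of its kind."""
    free_at = {op: [0] * n for op, n in units.items()}  # cycle at which each unit is free
    starts = []
    for op in ops:
        slots = free_at[op]
        k = min(range(len(slots)), key=slots.__getitem__)  # unit that becomes free first
        starts.append(slots[k])
        slots[k] += latency[op]   # unit stays busy for the whole operation
    return starts

one_divider = issue_cycles(["div", "div"], {"div": 1}, {"div": 10})
two_dividers = issue_cycles(["div", "div"], {"div": 2}, {"div": 10})
```

With one divider the second `div` has to wait until the first one finishes; with two dividers both can start in the same cycle, even though the instructions are data-independent in both cases.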
Instruction scheduling
• scheduling, or arranging, two or more instructions to be executed in parallel
  - need to detect code dependencies (detection)
  - need to remove false dependencies (resolution)
• a means to extract parallelism
  - instruction-level parallelism, which is implicit, is made explicit
• Two basic approaches
  - Static: done by the compiler
  - Dynamic: done by the processor
Instruction scheduling:
Figure 4.18 Interpretation of the concept of instruction scheduling in ILP-processors
[Figure: ILP-instruction scheduling consists of detection and resolution of dependencies, and parallel optimization]
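The two steps of scheduling, dependency detection and parallel optimization, can be sketched as a greedy in-order list scheduler. Assumptions in this sketch: every instruction takes one cycle, false dependencies have already been removed by renaming (so only RAW dependencies are checked), and the three-instruction input, including the renamed destination `r5`, is made up for the example.

```python
def schedule(instrs, width):
    """Place each instruction in the earliest cycle after its producers,
    with at most `width` instructions issued per cycle."""
    ready = {}   # register -> first cycle in which its new value can be read
    used = {}    # cycle -> number of instructions already placed in it
    plan = []
    for name, dest, srcs in instrs:
        cycle = max((ready.get(s, 0) for s in srcs), default=0)  # wait for RAW producers
        while used.get(cycle, 0) >= width:                       # respect the issue width
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        ready[dest] = cycle + 1   # unit latency assumed
        plan.append((name, cycle))
    return plan

block = [
    ("i1", "r3", ("r1", "r2")),  # add r3, r1, r2
    ("i2", "r4", ("r1", "r2")),  # sub r4, r1, r2
    ("i3", "r5", ("r3", "r4")),  # mul r5, r3, r4 (destination renamed)
]
two_wide = schedule(block, width=2)
one_wide = schedule(block, width=1)
```

With an issue width of 2, `i1` and `i2` share a cycle and `i3` follows, so three instructions take two cycles instead of three. A static scheduler would do this in the compiler, a dynamic one in hardware.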
Basic approaches to ILP-instruction scheduling
Preserving sequential consistency
• care must be taken to maintain the logical integrity of the program execution
• parallel execution must produce the same results as sequential execution
• e.g.
  add r5, r6, r7
  div r1, r2, r3
  jz somewhere
• here jz must test the result of div, the instruction that precedes it in program order, even if the add completes earlier
Concept of sequential consistency
The speed-up potential of ILP-processing
• Parallel instruction execution may be restricted by data, control and resource dependencies.
• Potential speed-up when parallel instruction execution is restricted by true data and control dependencies:
  - general-purpose programs: about 2
  - scientific programs: about 2-4
• Why are the speed-up figures so low?
  - parallelism is extracted only within a basic block, which is a low-efficiency method
Basic Block
• A basic block is a straight-line code sequence that can only be entered at the beginning and left at its end.
  i1: calc: add r3, r1, r2
  i2: sub r4, r1, r2
  i3: mul r4, r3, r4
  i4: jn negproc
• Basic block lengths of 3.5-6.3 instructions, with an overall average of 4.9
  (RISC: general 7.8 and scientific 31.6)
• Conditional branch → control dependencies
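Basic blocks can be found by cutting the instruction stream at branch targets and after branches. The sketch below applies this to the control-dependency example from the earlier slide. The `(label, text)` encoding and the set of branch mnemonics (`jz`, `jn`, plus an assumed unconditional `jmp`) are choices made for illustration.

```python
def split_basic_blocks(instrs, branch_ops=("jz", "jn", "jmp")):
    """Split (label, text) pairs into basic blocks.
    A labelled instruction starts a block; a branch ends one."""
    blocks, current = [], []
    for label, text in instrs:
        if label and current:                  # possible branch target: close the open block
            blocks.append(current)
            current = []
        current.append(text)
        if text.split()[0] in branch_ops:      # control leaves the block after a branch
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

code = [
    (None, "mul r1, r2, r3"),
    (None, "jz zproc"),
    (None, "sub r4, r7, r1"),
    ("zproc", "load r1, x"),
]
blocks = split_basic_blocks(code)
average_length = sum(len(b) for b in blocks) / len(blocks)
```

Short blocks like these leave a scheduler very few instructions to choose from, which is one way to read the low speed-up figures quoted for basic-block scheduling.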
Two other methods for speed up
• Potential speed-up embodied in loops
  - amounts to an average speed-up of about 50
  - assumes unlimited resources (about 50 processors and about 400 registers)
  - assumes an ideal schedule
• Appropriate handling of control dependencies
  - amounts to a speed-up of 10 to 100 times
  - assumes perfect oracles that always pick the right path for a conditional branch
→ control dependencies are the real obstacle in utilizing instruction-level parallelism!
What do we do without a perfect oracle?
• Execute all possible paths of the conditional branches
  - there are 2^N paths for N conditional branches
  - pursuing an exponentially increasing number of paths would be an unrealistic approach
• Make your best guess
  - branch prediction
  - pursuing both possible paths, but restricting the number of subsequent conditional branches
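One common way to "make a best guess" is a 2-bit saturating counter per branch. The slides do not name a specific prediction scheme, so this is offered only as a familiar example; the class name, the initial state and the branch history below are assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

def accuracy(outcomes):
    """Fraction of correctly predicted outcomes for a single branch."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)

# Hypothetical loop-closing branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
loop_accuracy = accuracy(loop_branch)
```

For this history the predictor misses only the first iteration and the loop exit. Compare this with the 2^N paths needed to follow every alternative.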
How close can real systems come to the upper limits of speed-up?
• an ambitious processor can expect to achieve speed-up figures of about
  - 4 for general-purpose programs
  - 10-20 for scientific programs
• an ambitious processor:
  - predicts conditional branches
  - has 256 integer and 256 FP registers
  - eliminates false data dependencies through register renaming
  - performs perfect memory disambiguation
  - maintains a gliding instruction window of 64 items