SlideShare a Scribd company logo
A FINE-GRAIN MULTITHREADING
SUPERSCALAR ARCHITECTURE
ANKIK JOSHI(13014081004)
DHARMESH TANK(13014081024)
MTECHS-CS (SEM II)
POINTS TO BE COVERED
• Introduction
• Need of Multithread Architecture
• Superscalar Architecture
• Thread & Its Characteristics
• RRM(Register Relocation Map)
• Multithreading Architecture
• Results and Analysis
• Conclusion
INTRODUCTION
• To increase instruction-level parallelism and hide the latencies of long-
latency operations in a superscalar processor.
• The effects of long-latency operations are detrimental to performance
since such operations typically cause a stall.
• Even superscalar processors, that are capable of performing various
operations in parallel, are vulnerable.
NEED OF MULTITHREAD ARCHITECTURE
• Multithreading is the logical separation of a task into independent threads whose
work may be done separate from one another, with limited interaction or
synchronization between threads.
• A thread may be anything from a light-weight process in the operating system
domain to a short sequence of machine instructions in a program.
• The execution of multiple threads, or contexts, on a multithreading processor A
dynamically-scheduled, functional-thread execution model and processor
architecture with on-chip thread management is presented in this paper
• Though the multithreading concepts and general thread presented in this paper
are not necessarily new, We present a unique multithreading superscalar
processor architecture.
SUPERSCALAR ARCHITECTURE
• The base superscalar architecture for this research is derived from and is
very similar named the Superscalar Digital Signal Processor (SDSP)
• It employs a central instruction window into which instructions are fetched,
and from which they are scheduled to multiple, independent functional
units, potentially out-of-order.
• Stalls prevent instruction results at the bottom of the buffer from being
committed to the register file, and new instructions from being fetched
BASE SUPERSCALAR (SDSP) BLOCK DIAGRAM
THREAD & ITS CHARACTERISTICS
• A thread is a machine code sequence that is explicitly specified as distinct
from other threads of the same program; thread creation and synchronizing
instructions are used.
• These specify parallelism and serve to direct multithreaded control flow and
synchronization.
• No data dependencies
• The fork instruction is used to spawn a new thread of control in the program.
• The join instruction is used to synchronize or merge multiple threads into a
single continuing thread
THREAD REGISTER ALLOCATION AND REMAPPING
• A register usage and allocation scheme that dynamically partitions the
register file and uses runtime register remapping was devised
• This partitioning allows threads to execute the same code, and assumes a
general purpose register set of 32 registers, but may be scaled to other sizes.
• Register numbers specified in machine instructions are logical registers which
are mapped, during decode, to physical registers according to a thread’s
Register Relocation Map (RRM)
• The user-visible register set consists of thirty-two 32- bit general purpose
registers; the processor actually has 64 physical registers.
• Below figure shows that Actually only a subset of the registers are
remapped, as others serve as registers common between threads (e.g. for
synchronization).
• Above lists the possible register set partitionings and associated RRM values.
MULTITHREADING ARCHITECTURE
• Latency hiding without stalling is accomplished by suspending a thread.
• The instruction is removed from the main pipeline of the processor (central
instruction window) and placed into a buffer associated with the functional unit
doing the computation.
• The block diagram illustrates the
flow of control and data as they
relate to thread scheduling.
• TAS keeps information about thread
• Thread Suspending Instruction
Buffers (TSIB).
• Scheduling Unit and Pipeline.
• Instruction Fetch
• Instruction Decode.
• Instruction Issue.
• Instruction Execution.
• Data Cache-Miss
• When a load instruction misses the data cache (Dcache), it effectively
becomes a long-latency instruction even though it was not initially recognized
as such in the Decode phase.
• Dcache-miss is not recognized until the execute phase of the pipeline.
• At most, only a few instructions must be discarded in a scalar processor ,
equal to the number of pipeline phases that precede the phase in which the
Dcache-miss is recognized.
• It is not feasible to invalidate all instructions of the thread that came after the
Dcachemissing load instruction
• There is no reason to suspend a thread upon a Dcache-miss.
• Remote Loads and Stores
Remote loads are issued into a Load Queue of TSIBs of the Load Unit
• Instruction Commit and Write-Back
The completion of a TSI is relatively complex
• Branch Handling and Mispredict Recovery
Branch prediction occurs normally except that the PC affected is that
of the branch instruction’s thread
RESULTS & ANALYSIS
• Figure shows the execution cycle counts
of the benchmarks, comparing the
single-threaded (ST) run against the
multithreaded (MT) run. The cycle counts
were normalized to the common
denominator of one million cycles for
the single-threaded runs. The results
show an average multithreading
speedup of 1.51.
• Figure 5 shows the difference between
multithreading results obtained with and without
suspending. The point of reference is the
normalized result of running multithreaded
programs with the processor in no-suspend mode.
• Overall, it is clear that suspension functionality of
the processor is worth the added cost as
multithreading would otherwise not offer nearly
as much benefit.
• Figure shows the improved performance of
multithreading with an cache large enough to
contain the entire program once initially
loaded. Appreciable improvement is seen even
with this cold-start simulation test.
• Data Cache Miss Latencies
The latency of a Dcache miss is not hidden
but rather its effects are mitigated with
multithreading. Figure 7 shows results.
• Full Model
Full model simulations were done to see if
the modeling of the various latency hiding features
would conflict with one another in a way as to
decrease multithreading performance.
HOW IT WORKS
CONCLUSION
• The multithreading superscalar processor architecture presented here can say
decisively to significantly improve performance on a single thread superscalar
execution. The simulation results show that increased efficiency through
multithreading as an opportunity to hide the latency increases.
• Performance advantage of multithreading, however, closely related to
availability of different independent instruction. When a thread is suspended,
it is necessary to have a sufficient instructions from other threads to fill the
void. It is also found that if a program has a very good single threaded
performance ,multithreading offers less of an advantage.
FIne Grain Multithreading

More Related Content

What's hot

Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
Smruti Sarangi
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processorMuhammad Ishaq
 
Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Marcelo Arbore
 
Parallel computing
Parallel computingParallel computing
Parallel computingvirend111
 
message passing vs shared memory
message passing vs shared memorymessage passing vs shared memory
message passing vs shared memory
Hamza Zahid
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
Siddhartha Anand
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
A B Shinde
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platforms
Syed Zaid Irshad
 
Parallelism
ParallelismParallelism
Parallelism
Md Raseduzzaman
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
Sarwan ali
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
Sudhanshu Janwadkar
 
Parallel processing
Parallel processingParallel processing
Parallel processing
Syed Zaid Irshad
 
Lecture 6
Lecture  6Lecture  6
Lecture 6Mr SMAK
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
Dr Shashikant Athawale
 
Parallel processing
Parallel processingParallel processing
Parallel processing
Shivalik college of engineering
 
Parallel Processing Presentation2
Parallel Processing Presentation2Parallel Processing Presentation2
Parallel Processing Presentation2daniyalqureshi712
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
Majid Saleem
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
Haris456
 

What's hot (20)

Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Superscalar & superpipeline processor
Superscalar & superpipeline processorSuperscalar & superpipeline processor
Superscalar & superpipeline processor
 
Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1Computer Architecture: A quantitative approach - Cap4 - Section 1
Computer Architecture: A quantitative approach - Cap4 - Section 1
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
message passing vs shared memory
message passing vs shared memorymessage passing vs shared memory
message passing vs shared memory
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platforms
 
Parallelism
ParallelismParallelism
Parallelism
 
Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler Chip Multithreading Systems Need a New Operating System Scheduler
Chip Multithreading Systems Need a New Operating System Scheduler
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Lecture 6
Lecture  6Lecture  6
Lecture 6
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Parallel Processing Presentation2
Parallel Processing Presentation2Parallel Processing Presentation2
Parallel Processing Presentation2
 
Cache coherence problem and its solutions
Cache coherence problem and its solutionsCache coherence problem and its solutions
Cache coherence problem and its solutions
 
Parallel processing Concepts
Parallel processing ConceptsParallel processing Concepts
Parallel processing Concepts
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 

Viewers also liked

Architecture Challenges In Cloud Computing
Architecture Challenges In Cloud ComputingArchitecture Challenges In Cloud Computing
Architecture Challenges In Cloud Computing
IndicThreads
 
Network Functions Virtualization – Our Strategy
Network Functions Virtualization – Our StrategyNetwork Functions Virtualization – Our Strategy
Network Functions Virtualization – Our Strategy
ADVA
 
MULTITHREADING CONCEPT
MULTITHREADING CONCEPTMULTITHREADING CONCEPT
MULTITHREADING CONCEPT
RAVI MAURYA
 
Multithreading
MultithreadingMultithreading
Multithreadingsagsharma
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
Yasir Khan
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
Adithya Bhat
 
Multithreading models.ppt
Multithreading models.pptMultithreading models.ppt
Multithreading models.ppt
Luis Goldster
 
Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...
XHANI TRUNGU
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
guest0edcaf
 
Multithreading
Multithreading Multithreading
Multithreading
WafaQKhan
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data Mining
Pritesh Ranjan
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
Ankit Raj
 
Multicore Processsors
Multicore ProcesssorsMulticore Processsors
Multicore Processsors
Aveen Meena
 
Junctionless transistors
Junctionless transistorsJunctionless transistors
Junctionless transistors
Pratishtha Agnihotri
 
Recursos 2.0 para docentes.
Recursos 2.0 para docentes.Recursos 2.0 para docentes.
Recursos 2.0 para docentes.
rociomartinez88
 
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IPСтажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
SmartTools
 
Consulta datos del paciente
Consulta datos del pacienteConsulta datos del paciente
Consulta datos del pacienteangiedaiana
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
Fraboni Ec
 
COMO INSTALAR MySQL EN LINUX
COMO INSTALAR  MySQL EN LINUXCOMO INSTALAR  MySQL EN LINUX
COMO INSTALAR MySQL EN LINUX
Ing-D-SW-TorresKhano--ME
 

Viewers also liked (20)

Architecture Challenges In Cloud Computing
Architecture Challenges In Cloud ComputingArchitecture Challenges In Cloud Computing
Architecture Challenges In Cloud Computing
 
Network Functions Virtualization – Our Strategy
Network Functions Virtualization – Our StrategyNetwork Functions Virtualization – Our Strategy
Network Functions Virtualization – Our Strategy
 
MULTITHREADING CONCEPT
MULTITHREADING CONCEPTMULTITHREADING CONCEPT
MULTITHREADING CONCEPT
 
Multithreading
MultithreadingMultithreading
Multithreading
 
Flynns classification
Flynns classificationFlynns classification
Flynns classification
 
Multi core processors
Multi core processorsMulti core processors
Multi core processors
 
Multithreading models.ppt
Multithreading models.pptMultithreading models.ppt
Multithreading models.ppt
 
Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...Open Source Private Cloud Management with OpenStack and Security Evaluation w...
Open Source Private Cloud Management with OpenStack and Security Evaluation w...
 
Anomaly Detection
Anomaly DetectionAnomaly Detection
Anomaly Detection
 
Multithreading
Multithreading Multithreading
Multithreading
 
Multi core processor
Multi core processorMulti core processor
Multi core processor
 
Analysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data MiningAnalysis and Design for Intrusion Detection System Based on Data Mining
Analysis and Design for Intrusion Detection System Based on Data Mining
 
Multicore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash PrajapatiMulticore processor by Ankit Raj and Akash Prajapati
Multicore processor by Ankit Raj and Akash Prajapati
 
Multicore Processsors
Multicore ProcesssorsMulticore Processsors
Multicore Processsors
 
Junctionless transistors
Junctionless transistorsJunctionless transistors
Junctionless transistors
 
Recursos 2.0 para docentes.
Recursos 2.0 para docentes.Recursos 2.0 para docentes.
Recursos 2.0 para docentes.
 
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IPСтажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
Стажировка 2016-07-12 02 Денис Нелюбин. Web, HTTP, TCP/IP
 
Consulta datos del paciente
Consulta datos del pacienteConsulta datos del paciente
Consulta datos del paciente
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
 
COMO INSTALAR MySQL EN LINUX
COMO INSTALAR  MySQL EN LINUXCOMO INSTALAR  MySQL EN LINUX
COMO INSTALAR MySQL EN LINUX
 

Similar to FIne Grain Multithreading

SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design Approach
A B Shinde
 
Advanced processor principles
Advanced processor principlesAdvanced processor principles
Advanced processor principles
Dhaval Bagal
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
Young Alista
 
Vector computing
Vector computingVector computing
Vector computing
Safayet Hossain
 
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
IDES Editor
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
Mohsin Bhat
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
Murtadha Alsabbagh
 
Digital signal processor architecture
Digital signal processor architectureDigital signal processor architecture
Digital signal processor architecture
komal mistry
 
Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6
Ismail Mukiibi
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
Mani Goswami
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUB
Nusrat Mary
 
VLSI design Dr B.jagadeesh UNIT-5.pptx
VLSI design Dr B.jagadeesh   UNIT-5.pptxVLSI design Dr B.jagadeesh   UNIT-5.pptx
VLSI design Dr B.jagadeesh UNIT-5.pptx
jagadeesh276791
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
Balaji Vignesh
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
Bishnu Rawal
 
Cc module 3.pptx
Cc module 3.pptxCc module 3.pptx
Cc module 3.pptx
ssuserbead51
 
BIL406-Chapter-6-Basic Parallelism and CPU.ppt
BIL406-Chapter-6-Basic Parallelism and CPU.pptBIL406-Chapter-6-Basic Parallelism and CPU.ppt
BIL406-Chapter-6-Basic Parallelism and CPU.ppt
Kadri20
 
ASIC design verification
ASIC design verificationASIC design verification
ASIC design verification
Gireesh Kallihal
 
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
ijceronline
 

Similar to FIne Grain Multithreading (20)

SOC System Design Approach
SOC System Design ApproachSOC System Design Approach
SOC System Design Approach
 
Advanced processor principles
Advanced processor principlesAdvanced processor principles
Advanced processor principles
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
Vector computing
Vector computingVector computing
Vector computing
 
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
Implementing True Zero Cycle Branching in Scalar and Superscalar Pipelined Pr...
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Digital signal processor architecture
Digital signal processor architectureDigital signal processor architecture
Digital signal processor architecture
 
Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6Advanced computer architecture lesson 5 and 6
Advanced computer architecture lesson 5 and 6
 
An Introduction to TensorFlow architecture
An Introduction to TensorFlow architectureAn Introduction to TensorFlow architecture
An Introduction to TensorFlow architecture
 
Superscalar Architecture_AIUB
Superscalar Architecture_AIUBSuperscalar Architecture_AIUB
Superscalar Architecture_AIUB
 
Embedded C
Embedded CEmbedded C
Embedded C
 
VLSI design Dr B.jagadeesh UNIT-5.pptx
VLSI design Dr B.jagadeesh   UNIT-5.pptxVLSI design Dr B.jagadeesh   UNIT-5.pptx
VLSI design Dr B.jagadeesh UNIT-5.pptx
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 
Coding For Cores - C# Way
Coding For Cores - C# WayCoding For Cores - C# Way
Coding For Cores - C# Way
 
Lec1 final
Lec1 finalLec1 final
Lec1 final
 
Cc module 3.pptx
Cc module 3.pptxCc module 3.pptx
Cc module 3.pptx
 
BIL406-Chapter-6-Basic Parallelism and CPU.ppt
BIL406-Chapter-6-Basic Parallelism and CPU.pptBIL406-Chapter-6-Basic Parallelism and CPU.ppt
BIL406-Chapter-6-Basic Parallelism and CPU.ppt
 
ASIC design verification
ASIC design verificationASIC design verification
ASIC design verification
 
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
Matlab Based High Level Synthesis Engine for Area And Power Efficient Arithme...
 

More from Dharmesh Tank

Basic of Python- Hands on Session
Basic of Python- Hands on SessionBasic of Python- Hands on Session
Basic of Python- Hands on Session
Dharmesh Tank
 
Seminar on MATLAB
Seminar on MATLABSeminar on MATLAB
Seminar on MATLAB
Dharmesh Tank
 
Goal Recognition in Soccer Match
Goal Recognition in Soccer MatchGoal Recognition in Soccer Match
Goal Recognition in Soccer Match
Dharmesh Tank
 
Face recognization using artificial nerual network
Face recognization using artificial nerual networkFace recognization using artificial nerual network
Face recognization using artificial nerual network
Dharmesh Tank
 
Graph problem & lp formulation
Graph problem & lp formulationGraph problem & lp formulation
Graph problem & lp formulation
Dharmesh Tank
 

More from Dharmesh Tank (6)

Basic of Python- Hands on Session
Basic of Python- Hands on SessionBasic of Python- Hands on Session
Basic of Python- Hands on Session
 
Seminar on MATLAB
Seminar on MATLABSeminar on MATLAB
Seminar on MATLAB
 
Goal Recognition in Soccer Match
Goal Recognition in Soccer MatchGoal Recognition in Soccer Match
Goal Recognition in Soccer Match
 
Face recognization using artificial nerual network
Face recognization using artificial nerual networkFace recognization using artificial nerual network
Face recognization using artificial nerual network
 
Graph problem & lp formulation
Graph problem & lp formulationGraph problem & lp formulation
Graph problem & lp formulation
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 

Recently uploaded

ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 

Recently uploaded (20)

ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 

FIne Grain Multithreading

  • 1. A FINE-GRAIN MULTITHREADING SUPERSCALAR ARCHITECTURE ANKIK JOSHI(13014081004) DHARMESH TANK(13014081024) MTECHS-CS (SEM II)
  • 2. POINTS TO BE COVERED • Introduction • Need of Multithread Architecture • Superscalar Architecture • Thread & Its Characteristics • RRM(Register Relocation Map) • Multithreading Architecture • Results and Analysis • Conclusion
  • 3. INTRODUCTION • To increase instruction-level parallelism and hide the latencies of long- latency operations in a superscalar processor. • The effects of long-latency operations are detrimental to performance since such operations typically cause a stall. • Even superscalar processors, that are capable of performing various operations in parallel, are vulnerable.
  • 4. NEED OF MULTITHREAD ARCHITECTURE • Multithreading is the logical separation of a task into independent threads whose work may be done separate from one another, with limited interaction or synchronization between threads. • A thread may be anything from a light-weight process in the operating system domain to a short sequence of machine instructions in a program. • The execution of multiple threads, or contexts, on a multithreading processor A dynamically-scheduled, functional-thread execution model and processor architecture with on-chip thread management is presented in this paper • Though the multithreading concepts and general thread presented in this paper are not necessarily new, We present a unique multithreading superscalar processor architecture.
  • 5. SUPERSCALAR ARCHITECTURE • The base superscalar architecture for this research is derived from and is very similar named the Superscalar Digital Signal Processor (SDSP) • It employs a central instruction window into which instructions are fetched, and from which they are scheduled to multiple, independent functional units, potentially out-of-order. • Stalls prevent instruction results at the bottom of the buffer from being committed to the register file, and new instructions from being fetched
  • 7. THREAD & ITS CHARACTERISTICS • A thread is a machine code sequence that is explicitly specified as distinct from other threads of the same program; thread creation and synchronizing instructions are used. • These specify parallelism and serve to direct multithreaded control flow and synchronization. • No data dependencies • The fork instruction is used to spawn a new thread of control in the program. • The join instruction is used to synchronize or merge multiple threads into a single continuing thread
  • 8. THREAD REGISTER ALLOCATION AND REMAPPING • A register usage and allocation scheme that dynamically partitions the register file and uses runtime register remapping was devised • This partitioning allows threads to execute the same code, and assumes a general purpose register set of 32 registers, but may be scaled to other sizes. • Register numbers specified in machine instructions are logical registers which are mapped, during decode, to physical registers according to a thread’s Register Relocation Map (RRM) • The user-visible register set consists of thirty-two 32- bit general purpose registers; the processor actually has 64 physical registers.
  • 9. • Below figure shows that Actually only a subset of the registers are remapped, as others serve as registers common between threads (e.g. for synchronization). • Above lists the possible register set partitionings and associated RRM values.
  • 10. MULTITHREADING ARCHITECTURE • Latency hiding without stalling is accomplished by suspending a thread. • The instruction is removed from the main pipeline of the processor (central instruction window) and placed into a buffer associated with the functional unit doing the computation.
  • 11. • The block diagram illustrates the flow of control and data as they relate to thread scheduling. • TAS keeps information about thread • Thread Suspending Instruction Buffers (TSIB). • Scheduling Unit and Pipeline. • Instruction Fetch • Instruction Decode. • Instruction Issue. • Instruction Execution.
  • 12. • Data Cache-Miss • When a load instruction misses the data cache (Dcache), it effectively becomes a long-latency instruction even though it was not initially recognized as such in the Decode phase. • Dcache-miss is not recognized until the execute phase of the pipeline. • At most, only a few instructions must be discarded in a scalar processor , equal to the number of pipeline phases that precede the phase in which the Dcache-miss is recognized. • It is not feasible to invalidate all instructions of the thread that came after the Dcachemissing load instruction • There is no reason to suspend a thread upon a Dcache-miss.
  • 13. • Remote Loads and Stores Remote loads are issued into a Load Queue of TSIBs of the Load Unit • Instruction Commit and Write-Back The completion of a TSI is relatively complex • Branch Handling and Mispredict Recovery Branch prediction occurs normally except that the PC affected is that of the branch instruction’s thread
  • 14. RESULTS & ANALYSIS • Figure shows the execution cycle counts of the benchmarks, comparing the single-threaded (ST) run against the multithreaded (MT) run. The cycle counts were normalized to the common denominator of one million cycles for the single-threaded runs. The results show an average multithreading speedup of 1.51.
  • 15. • Figure 5 shows the difference between multithreading results obtained with and without suspending. The point of reference is the normalized result of running multithreaded programs with the processor in no-suspend mode. • Overall, it is clear that suspension functionality of the processor is worth the added cost as multithreading would otherwise not offer nearly as much benefit.
  • 16. • Figure shows the improved performance of multithreading with an cache large enough to contain the entire program once initially loaded. Appreciable improvement is seen even with this cold-start simulation test.
  • 17. • Data Cache Miss Latencies The latency of a Dcache miss is not hidden but rather its effects are mitigated with multithreading. Figure 7 shows results. • Full Model Full model simulations were done to see if the modeling of the various latency hiding features would conflict with one another in a way as to decrease multithreading performance.
  • 19.
  • 20.
  • 21. CONCLUSION • The multithreading superscalar processor architecture presented here can say decisively to significantly improve performance on a single thread superscalar execution. The simulation results show that increased efficiency through multithreading as an opportunity to hide the latency increases. • Performance advantage of multithreading, however, closely related to availability of different independent instruction. When a thread is suspended, it is necessary to have a sufficient instructions from other threads to fill the void. It is also found that if a program has a very good single threaded performance ,multithreading offers less of an advantage.