ECWAY TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE
CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111
VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com

MICROARCHITECTURE OF A COARSE-GRAIN OUT-OF-ORDER
SUPERSCALAR PROCESSOR
ABSTRACT:

We explore the design, implementation, and evaluation of a coarse-grain superscalar processor in
the context of the microarchitecture of the Control Processor (CP) of the Multilevel Computing
Architecture (MLCA), a novel architecture targeted for multimedia multicore systems. The
MLCA augments a traditional multicore architecture (called the lower level) with a CP (called
the top level), which automatically extracts parallelism among coarse-grain units of computation
(tasks), synchronizes these tasks and schedules them for execution on processors. It does so in a
fashion similar to how instruction-level parallelism is extracted by superscalar processors, i.e.,
using register renaming, Out-of-Order Execution (OoOE), and scheduling. The coarse-grain
nature of tasks imposes challenging constraints on the direct use of these techniques, but also
offers opportunities for simpler designs.
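
As a rough illustration of how register renaming can expose parallelism among tasks, the sketch below renames the registers written by a short task sequence and then keeps only the true (read-after-write) dependences. It is a software analogy under stated assumptions, not the CP's hardware mechanism: the Task fields, the register names r1 to r3, the physical names p0, p1, ... and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reads: list   # register names the task reads (hypothetical)
    writes: list  # register names the task writes (hypothetical)


def rename(tasks):
    """Give every write a fresh physical name; reads use the latest mapping."""
    mapping = {}   # architectural name -> current physical name
    counter = 0
    renamed = []
    for t in tasks:
        # Sources are looked up before this task's own writes are renamed.
        srcs = [mapping.get(r, r) for r in t.reads]
        dsts = []
        for w in t.writes:
            phys = f"p{counter}"
            counter += 1
            mapping[w] = phys
            dsts.append(phys)
        renamed.append(Task(t.name, srcs, dsts))
    return renamed


def true_dependences(renamed):
    """Return {task: set of producer tasks}, using read-after-write edges only."""
    producer = {}  # physical name -> task that writes it
    deps = {}
    for t in renamed:
        deps[t.name] = {producer[r] for r in t.reads if r in producer}
        for w in t.writes:
            producer[w] = t.name
    return deps


# Task C rewrites r1, which A wrote and B reads. Without renaming, C would
# have to wait for A and B; after renaming, C depends on no earlier task.
program = [
    Task("A", [], ["r1"]),
    Task("B", ["r1"], ["r2"]),
    Task("C", [], ["r1"]),
    Task("D", ["r1"], ["r3"]),
]
deps = true_dependences(rename(program))
print(deps)  # A and C have no producers; B waits on A; D waits on C
```

In this toy sequence, renaming leaves two independent chains (A then B, C then D) that could run on different processors.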

We analyze the impact of these constraints and opportunities and present novel
microarchitectural mechanisms for coarse-grain superscalar execution, including register
renaming, task queue, dynamic out-of-order scheduling and task-issue. We design an MLCA
system around our CP microarchitecture and implement it on an FPGA. We evaluate the system
using multimedia applications and show good scalability for eight processors, limited by the
memory bandwidth of the FPGA platform. Furthermore, we show that the CP introduces little
overhead in terms of resource usage. Finally, we show scalability beyond eight processors using
cycle-accurate RTL simulation with an idealized memory subsystem. We demonstrate that
the CP poses no performance bottlenecks and is scalable up to 32 processors.
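
The interplay of a task queue, dynamic out-of-order scheduling, and task issue can be sketched in software as follows. This is a minimal model under assumptions that do not come from the paper: the task names, dependences, and durations are invented for illustration, each task's dependences refer only to earlier tasks in program order, and issue picks the oldest ready task first.

```python
import heapq


def schedule(tasks, deps, durations, num_procs):
    """Simulate out-of-order issue from a task queue.

    Returns (makespan, issue_order). Assumes deps only name earlier tasks.
    """
    queue = list(tasks)   # waiting tasks, kept in program order
    done = set()          # names of completed tasks
    running = []          # min-heap of (finish_time, name)
    now = 0
    order = []
    while queue or running:
        # Issue ready tasks, oldest first, while a processor is free.
        for name in list(queue):
            if len(running) == num_procs:
                break
            if deps[name] <= done:
                queue.remove(name)
                heapq.heappush(running, (now + durations[name], name))
                order.append(name)
        # Advance time to the next task completion.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return now, order


# Hypothetical workload: B needs A's result, D needs C's result.
tasks = ["A", "B", "C", "D"]
deps = {"A": set(), "B": {"A"}, "C": set(), "D": {"C"}}
durations = {"A": 4, "B": 1, "C": 2, "D": 1}

print(schedule(tasks, deps, durations, 1))  # (8, ['A', 'B', 'C', 'D'])
print(schedule(tasks, deps, durations, 2))  # (5, ['A', 'C', 'D', 'B'])
```

With two processors, C and D are issued ahead of B because B is still waiting on A, which is the out-of-order behavior the abstract describes at task granularity.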
