ECWAY TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
OUR OFFICES @ CHENNAI / TRICHY / KARUR / ERODE / MADURAI / SALEM / COIMBATORE
CELL: +91 98949 17187, +91 875487 2111 / 3111 / 4111 / 5111 / 6111
VISIT: www.ecwayprojects.com MAIL TO: ecwaytechnologies@gmail.com

MICROARCHITECTURE OF A COARSE-GRAIN OUT-OF-ORDER
SUPERSCALAR PROCESSOR
ABSTRACT:

We explore the design, implementation, and evaluation of a coarse-grain superscalar processor in
the context of the microarchitecture of the Control Processor (CP) of the Multilevel Computing
Architecture (MLCA), a novel architecture targeted at multimedia multicore systems. The
MLCA augments a traditional multicore architecture (called the lower level) with a CP (called
the top-level), which automatically extracts parallelism among coarse-grain units of computation
(tasks), synchronizes these tasks, and schedules them for execution on processors. It does so in a
fashion similar to how superscalar processors extract instruction-level parallelism, i.e.,
using register renaming, Out-of-Order Execution (OoOE) and scheduling. The coarse-grain
nature of tasks imposes challenging constraints on the direct use of these techniques, but also
offers opportunities for simpler designs.
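
As a rough illustration of how register renaming applies to tasks rather than instructions, the following Python sketch renames the outputs of a short task sequence and compares the dependences before and after. It is not the CP's actual mechanism: the Task record, the register names (r0, p0, ...) and the task names are hypothetical, and the pairwise name comparison is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reads: list   # register names the task reads (hypothetical)
    writes: list  # register names the task writes (hypothetical)


def rename(tasks):
    """Give every task output a fresh physical register name."""
    mapping = {}   # architectural name -> current physical name
    next_id = 0
    renamed = []
    for t in tasks:
        # Sources use the mapping in effect before this task's own writes.
        srcs = [mapping.get(r, r) for r in t.reads]
        dsts = []
        for w in t.writes:
            phys = f"p{next_id}"
            next_id += 1
            mapping[w] = phys
            dsts.append(phys)
        renamed.append(Task(t.name, srcs, dsts))
    return renamed


def dependences(tasks):
    """Return (earlier, later, kind) triples found by comparing names.

    The check is pairwise and conservative: it does not notice when an
    intervening task overwrites a register.
    """
    deps = []
    for j, later in enumerate(tasks):
        for earlier in tasks[:j]:
            if set(earlier.writes) & set(later.reads):
                deps.append((earlier.name, later.name, "RAW"))
            if set(earlier.reads) & set(later.writes):
                deps.append((earlier.name, later.name, "WAR"))
            if set(earlier.writes) & set(later.writes):
                deps.append((earlier.name, later.name, "WAW"))
    return deps


# Two independent decode/filter pairs that reuse the same register names.
program = [
    Task("decode1", ["r0"], ["r1"]),
    Task("filter1", ["r1"], ["r2"]),
    Task("decode2", ["r0"], ["r1"]),
    Task("filter2", ["r1"], ["r2"]),
]

before = dependences(program)
after = dependences(rename(program))
```

In this sketch, only the two true producer-consumer (RAW) dependences remain after renaming, so the second decode/filter pair no longer has to wait for the first.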

We analyze the impact of these constraints and opportunities and present novel
microarchitectural mechanisms for coarse-grain superscalar execution, including register
renaming, task queue, dynamic out-of-order scheduling and task-issue. We design an MLCA
system around our CP microarchitecture and implement it on an FPGA. We evaluate the system
using multimedia applications and show good scalability for eight processors, limited by the
memory bandwidth of the FPGA platform. Furthermore, we show that the CP introduces little
overhead in terms of resource usage. Finally, we show scalability beyond eight processors using
cycle-accurate RTL simulation with an idealized memory subsystem. We demonstrate that
the CP poses no performance bottlenecks and is scalable up to 32 processors.
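
The task queue and dynamic out-of-order scheduling named above can be pictured with a small software model. The sketch below is only an assumption-laden toy, not the CP microarchitecture: the schedule function, the task names, and the durations (in arbitrary time units) are all hypothetical. It issues the oldest ready task whenever a processor is free, which lets a younger task overtake an older one that is still waiting on its inputs.

```python
import heapq


def schedule(tasks, deps, num_procs):
    """Simulate out-of-order task issue on num_procs processors.

    tasks: dict name -> duration, listed in program order (hypothetical units)
    deps:  dict name -> set of task names that must finish first
    Returns dict name -> (start_time, finish_time).
    """
    pending = list(tasks)   # task queue, oldest first
    done = set()
    running = []            # min-heap of (finish_time, name)
    times = {}
    now = 0
    while pending or running:
        # Issue ready tasks, oldest first, while a processor is free.
        for name in list(pending):
            if len(running) == num_procs:
                break
            if deps.get(name, set()) <= done:
                pending.remove(name)
                times[name] = (now, now + tasks[name])
                heapq.heappush(running, (now + tasks[name], name))
        if not running:
            raise ValueError("unsatisfiable dependences")
        # Advance time to the next completion and retire that task.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return times


tasks = {"A": 4, "B": 1, "C": 2, "D": 1}
deps = {"B": {"A"}, "D": {"C"}}

two_procs = schedule(tasks, deps, num_procs=2)
one_proc = schedule(tasks, deps, num_procs=1)
```

With two processors in this toy model, task C starts at time 0 alongside A even though B precedes it in program order, and the whole sequence finishes sooner than on one processor. The numbers say nothing about the measured scalability reported above.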
