Ateeq Ateeq
Super-Scalar Processors
• A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor.
Figure: Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
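As a rough illustration of the "maximum of two instructions per cycle" bound, the sketch below computes the best-case cycle count for n independent instructions on a k-stage pipeline that fetches and dispatches w instructions per cycle. It assumes no hazards, stalls, or cache misses, so it is an upper bound on throughput rather than a model of any real CPU; the function name `ideal_cycles` is hypothetical.

```python
import math


def ideal_cycles(n_instructions: int, stages: int, width: int) -> int:
    """Best-case cycles for n independent instructions on a `stages`-deep
    pipeline that fetches/dispatches `width` instructions per cycle.
    The first group needs `stages` cycles to fill the pipeline; each
    later group completes one cycle after the previous one."""
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / width)
    return stages + groups - 1


for width in (1, 2):
    cycles = ideal_cycles(1000, stages=5, width=width)
    print(f"width={width}: {cycles} cycles, IPC={1000 / cycles:.2f}")
```

Instructions per cycle (IPC) approaches the issue width as n grows, but it can never exceed it, which is what the figure's "maximum of two" expresses.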
Super-Scalar Processors
• Superscalar execution allows higher CPU throughput by completing more than one instruction per clock cycle, which it achieves by simultaneously dispatching multiple instructions to different execution units on the processor.
• An execution unit is an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.
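The sketch below loosely models in-order dual issue to separate execution units: each cycle it dispatches up to two instructions from the head of the queue, but only if they need different units and the second does not read a register written by the first. The instruction format, the unit names, the two-wide limit, and the assumption that a result is usable in the next cycle are all illustrative; real issue logic also handles latencies, renaming, and write-after-write hazards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str    # register written
    srcs: tuple  # registers read
    unit: str    # unit needed: "alu", "shift" or "mul" (hypothetical names)


def dual_issue_schedule(program, width=2):
    """Group instructions into issue cycles, in program order."""
    cycles, i = [], 0
    while i < len(program):
        group, used_units, written = [], set(), set()
        while i < len(program) and len(group) < width:
            ins = program[i]
            # Unit already taken this cycle, or reads a result produced in
            # this same cycle: stop filling the group (in-order issue).
            if ins.unit in used_units or written.intersection(ins.srcs):
                break
            group.append(ins)
            used_units.add(ins.unit)
            written.add(ins.dest)
            i += 1
        cycles.append(group)
    return cycles


prog = [
    Instr("r1", ("r2", "r3"), "alu"),
    Instr("r4", ("r5",), "mul"),    # different unit, independent: pairs with r1
    Instr("r6", ("r1",), "alu"),
    Instr("r7", ("r6",), "shift"),  # reads r6 from the same cycle: must wait
]
for n, group in enumerate(dual_issue_schedule(prog)):
    print(f"cycle {n}:", [ins.dest for ins in group])
```

Here four instructions take three cycles: only the first pair finds two free units and no dependence between them.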
Super-Scalar Processor
• Superscalar design means the processor can issue multiple instructions in a single clock cycle, with redundant functional units available to execute them. This all happens within a single core; multicore processing is a different technique.
• Pipelining divides the execution of an instruction into steps. Because each step is executed in a different part of the processor, multiple instructions can be in different "phases" in each clock cycle.
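To make the "different phases each clock" idea concrete, the sketch below reports which instruction occupies which stage in each clock of an ideal pipeline that accepts one new instruction per clock with no stalls. The five stage names (IF, ID, EX, MEM, WB) are the classic textbook RISC stages and are an assumption; the slide does not name the stages.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed textbook stages


def stage_of(instr: int, cycle: int):
    """Stage of instruction `instr` (0-based) during clock `cycle`,
    or None if it has not entered yet or has already left the pipeline."""
    k = cycle - instr
    return STAGES[k] if 0 <= k < len(STAGES) else None


def occupancy(cycle: int, n_instr: int):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    return {i: stage_of(i, cycle) for i in range(n_instr) if stage_of(i, cycle)}


for c in range(7):
    print(f"clock {c}:", occupancy(c, n_instr=3))
```

In clock 4, for example, three instructions are in flight at once, each in a different stage.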
Super-Scalar Processors
What is a Multi-Core Processor?
• A multi-core processor is a single computing component with two or more independent processing units, called cores.
Figure: A core is the processing unit that receives instructions and performs calculations, or actions, based on those instructions.
Multicore vs. Super-Scalar
• A superscalar processor has only one instruction counter (program counter).
• Even though a superscalar processor keeps track of multiple in-flight instructions, all of them come from a single program, because it is still just one processor.
• In a multi-core processor, multiple instruction streams execute simultaneously. Importantly, each core (executing with its own instruction counter) can also be superscalar, so that it executes its own single process more quickly.
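A toy model of the distinction: a superscalar core has one program counter but may advance it by several instructions per step, all from one stream, while a multicore chip has one program counter per core, each walking its own stream. Everything here (the hypothetical `Core` class, the width, the instruction lists) is illustrative and ignores dependences entirely.

```python
class Core:
    """A core with its own program counter over its own instruction stream."""

    def __init__(self, program, width=1):
        self.program = program
        self.pc = 0         # the single program counter of this core
        self.width = width  # width > 1 models a superscalar core

    def step(self):
        """Advance past up to `width` consecutive instructions of this stream."""
        done = self.program[self.pc:self.pc + self.width]
        self.pc += len(done)
        return done


# One superscalar core: one PC, two instructions per step, one stream.
ss = Core(["a0", "a1", "a2", "a3"], width=2)
print(ss.step(), ss.step())

# Two cores: two PCs, two independent streams advancing at the same time.
cores = [Core(["a0", "a1"], width=2), Core(["b0", "b1"], width=2)]
print([c.step() for c in cores])
```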
What is a Chip Multiprocessor?
• A chip multiprocessor (CMP) integrates multiple cores onto a single integrated circuit.
Figure callouts:
Memory controller: a digital circuit that manages the flow of data going to and from the computer's main memory.
PCIe (Peripheral Component Interconnect Express): a serial expansion bus standard for connecting a computer to one or more peripheral devices. PCIe provides lower latency and higher data transfer rates than parallel buses such as PCI.
MISC (Minimal Instruction Set Computer): a processor architecture with a very small number of basic operations and corresponding opcodes.
Why CMP?
• Enables sharing of computation resources.
Single Chip Multiprocessor
• Integration of resources on a single chip.
• Why?
  • Commercial: the dependence on multi-threaded, throughput-oriented programs.
  • Long off-chip delays: traditional symmetric multiprocessors suffer a performance penalty from memory stalls caused by cache misses and cache-to-cache transfers.
Single Chip Multiprocessor
• Benefits
  • Reduced cost of processing power.
  • Low per-unit cost.
  • Increased reliability, as there are far fewer electrical connections to fail.
  • Increased throughput, as required by multi-threaded applications.
  • Reduced overhead from sharing misses compared with traditional shared-memory multiprocessors.
Single Chip Multiprocessor
Figure: The core cache is divided into two parts, one for data and one for instructions.
Multiple-Chip Multiprocessor
• Known as M-CMP.
• An M-CMP is a combination of multiple CMPs.
Figure labels: Processor; Data and Inst. cache.
Multiple-Chip Multiprocessor
• These systems all use shared memory to preserve existing investment in operating systems and applications.
• The key challenge for M-CMP systems is implementing correct, high-performance cache coherence protocols.
• These protocols keep caches transparent to software, usually by maintaining the coherence invariant that each block may have either a single writer or multiple readers.
• M-CMPs present a greater challenge, because they must maintain both intra-CMP coherence and inter-CMP coherence.
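The single-writer/multiple-reader invariant can be sketched as a directory entry that records, per block, either one writer or a set of readers. A write invalidates every other copy first; a read downgrades an existing writer to a reader. This is a simplified MSI-style sketch with hypothetical names (`DirectoryEntry`, `read`, `write`); it ignores transient states, message ordering, and the intra-/inter-CMP hierarchy that real M-CMP protocols must handle.

```python
class DirectoryEntry:
    """Directory state for one cache block, enforcing single-writer/multiple-reader."""

    def __init__(self):
        self.owner = None     # core holding a writable copy, if any
        self.sharers = set()  # cores holding read-only copies

    def read(self, core):
        """Grant `core` a read-only copy."""
        if self.owner == core:
            return  # a writer can already read its own copy
        if self.owner is not None:
            # Downgrade the writer: it keeps a read-only copy (MSI-style M -> S).
            self.sharers.add(self.owner)
            self.owner = None
        self.sharers.add(core)

    def write(self, core):
        """Grant `core` exclusive write permission; return the cores invalidated."""
        holders = set(self.sharers)
        if self.owner is not None:
            holders.add(self.owner)
        victims = holders - {core}
        self.sharers.clear()
        self.owner = core
        return victims

    def invariant_holds(self):
        # Either one writer and no readers, or no writer and any number of readers.
        return self.owner is None or not self.sharers


entry = DirectoryEntry()
entry.read(0)
entry.read(1)
print(entry.write(2))  # cores 0 and 1 lose their read-only copies
entry.read(0)          # core 2 is downgraded to a reader
print(entry.owner, sorted(entry.sharers), entry.invariant_holds())
```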
Simulation
• Goal: evaluating the performance of novel CMP or M-CMP microarchitectures requires a way of simulating the environment in which these architectures would be expected to run in real systems.
• Software:
  • gem5: a modular platform for computer-system architecture research.
  • Ruby: a memory simulator that implements a detailed simulation model of the memory subsystem.
• Execution time: measured in Ruby cycles.
• L1 cache misses: the L1 miss rate is the number of missed requests divided by the total number of requests (instruction + data).
• L2 misses / miss rate: calculated from the number of requests issued to the L2 and the misses across all L2 banks.
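The two miss-rate definitions above reduce to simple ratios over simulator counters. The sketch computes them from plain integers; the argument names are hypothetical and do not correspond to the statistic names gem5/Ruby actually prints, and the example numbers are arbitrary inputs, not results from the study.

```python
def l1_miss_rate(inst_misses, data_misses, inst_requests, data_requests):
    """L1 miss rate = missed requests / all requests (instruction + data)."""
    total = inst_requests + data_requests
    return (inst_misses + data_misses) / total if total else 0.0


def l2_miss_rate(requests_to_l2, misses_per_bank):
    """L2 miss rate = misses summed over all L2 banks / requests issued to the L2."""
    return sum(misses_per_bank) / requests_to_l2 if requests_to_l2 else 0.0


print(l1_miss_rate(inst_misses=50, data_misses=150, inst_requests=6000, data_requests=4000))
print(l2_miss_rate(requests_to_l2=200, misses_per_bank=[20, 30, 25, 25]))
```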
Simulation
• L2/Dir replacements: the number of replacements of L2/directory entries, caused by capacity misses and conflict misses.
• Average miss latency: the average L1 miss latency in Ruby cycles, measured from the moment a memory request is issued to the moment the data is retrieved.
• Memory requests: the number of reads and writes issued to main memory.
• Cache sizes: 32 KB L1I + 32 KB L1D, and 512 KB of L2 cache per core.
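Because the cache sizes are given per core, total on-chip capacity grows linearly with the core count. The arithmetic below tabulates it for the 4-, 16- and 64-core configurations mentioned in the results, and adds the average miss latency as defined above (mean of per-miss latencies, issue to data return). The helper names are hypothetical.

```python
KB = 1024
L1I, L1D, L2 = 32 * KB, 32 * KB, 512 * KB  # per-core sizes from the slide


def total_cache_bytes(cores):
    """Aggregate on-chip capacity as (total L1I + L1D, total L2) over all cores."""
    return cores * (L1I + L1D), cores * L2


def average_miss_latency(latencies):
    """Mean L1 miss latency in Ruby cycles; each sample spans request issue to data return."""
    return sum(latencies) / len(latencies) if latencies else 0.0


for n in (4, 16, 64):
    l1, l2 = total_cache_bytes(n)
    print(f"{n:2d} cores: L1 total {l1 // KB} KB, L2 total {l2 // KB} KB")
```

A 64-core system therefore has 32 MB of L2 in total versus 2 MB for 4 cores, which is the "larger total L2 cache on chip" the results refer to.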
Results
Figure axes: directory size; processors.
Observation: with more cores, the directory must be enlarged to keep cache misses low.
Results
• The number of L2 misses increases as the number of cores in the system increases.
• The miss rate decreases to 40%-80% because of the larger total L2 cache on chip.
Results
• Observed: the L2 miss rate decreases as directory size increases.
• L2 misses are mainly determined by the L2 cache size and, more importantly, by the application's working set.
Results
• Replacement policy: Least Recently Used (LRU).
• An L2 replacement occurs when the L2 cache is full and another allocation is required.
• Replacements depend on the directory and L2 sizes and on the applications.
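A minimal sketch of LRU replacement within one cache set: a replacement happens only when the set is full and a new block must be allocated, and the victim is the least recently used block. The set size and block tags are illustrative; `collections.OrderedDict` keeps the blocks in recency order.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement; counts hits, misses and replacements."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # least recently used first
        self.hits = self.misses = self.replacements = 0

    def access(self, tag):
        if tag in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(tag)     # now the most recently used
            return
        self.misses += 1
        if len(self.blocks) == self.ways:    # full: must evict to allocate
            self.blocks.popitem(last=False)  # evict the LRU block
            self.replacements += 1
        self.blocks[tag] = True


s = LRUSet(ways=2)
for tag in ["A", "B", "A", "C", "B"]:
    s.access(tag)
print(s.hits, s.misses, s.replacements, list(s.blocks))
```

Accessing C evicts B (not A, which was just reused), and the later access to B then evicts A.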
Results
• Observation: a larger directory does not reduce directory replacements. The reason is that many blocks map to the same directory location, resulting in many conflicts.
• Increasing the set associativity of the directory can avoid these conflicts.
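The conflict explanation can be illustrated with a small directory simulation: block addresses that share the same low index bits are fed to a direct-mapped directory, a direct-mapped directory twice as large, and a 4-way set-associative directory with the original number of entries. The entry counts, address stream, and LRU policy within each set are assumptions for illustration, not the study's configuration.

```python
from collections import OrderedDict


def count_replacements(addresses, entries, ways):
    """Replacements in a directory with `entries` entries grouped into sets of
    `ways`, indexed by block address modulo the number of sets, LRU within a set."""
    num_sets = entries // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    replacements = 0
    for addr in addresses:
        s = sets[addr % num_sets]
        if addr in s:
            s.move_to_end(addr)
            continue
        if len(s) == ways:        # set full: evict its LRU entry
            s.popitem(last=False)
            replacements += 1
        s[addr] = True
    return replacements


stream = [0, 16, 32, 48] * 3  # blocks that collide in the low index bits
print("16 entries, direct-mapped:", count_replacements(stream, 16, 1))
print("32 entries, direct-mapped:", count_replacements(stream, 32, 1))
print("16 entries, 4-way:        ", count_replacements(stream, 16, 4))
```

Doubling the direct-mapped directory barely helps, because the colliding blocks still share sets, whereas adding associativity with the same capacity removes the replacements for this stream.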
Results
• Memory read requests caused by L2 misses are the dominant fraction of total memory requests.
Results
• Miss latency increases by 50% from 4 cores to 16 cores, and by 150%-230% from 16 cores to 64 cores.
• On an L1 miss, up to three nodes may be involved in fulfilling the miss: the local node, the home node, and the remote node.
  • Home node: the node given by the address-mapping function.
  • Remote node: the node holding the cache line requested by one of the cores.
  • Local node: the cache line is found in the local private or shared partition.
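The three node roles can be sketched as follows: the home node is computed from the block address by an address-mapping function, here assumed to be simple interleaving of cache lines across nodes, and the number of participants depends on whether the line is found locally, at the home, or in another node's cache. The interleaving function, the 64-byte line size, and the `owner` argument are hypothetical, not the mapping used in the study.

```python
BLOCK_BYTES = 64  # assumed cache-line size


def home_node(address, num_nodes):
    """Hypothetical address-mapping function: interleave cache lines across nodes."""
    return (address // BLOCK_BYTES) % num_nodes


def nodes_involved(requester, address, num_nodes, owner=None, local_hit=False):
    """Nodes taking part in servicing an L1 miss from `requester`.
    local_hit: the line is found in the requester's local private or shared partition.
    owner:     another node whose cache holds the requested line (the remote node)."""
    if local_hit:
        return [requester]
    home = home_node(address, num_nodes)
    involved = [requester] if requester == home else [requester, home]
    if owner is not None and owner not in involved:
        involved.append(owner)  # three-party case: local -> home -> remote
    return involved


print(nodes_involved(0, 0x1040, 16, owner=5))
```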
Results
• For all structures, the latency looks almost the same; it depends on the network topology and the on-chip link latency.
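As a first-order illustration of why latency depends on topology and link latency: on a 2D mesh (an assumed topology; the slides do not name one), zero-load message latency is roughly hop count times per-hop delay. The sketch computes the average hop count between distinct nodes of a k x k mesh under uniform traffic; the per-hop cycle count is a placeholder, not a value from the study.

```python
from itertools import product

CYCLES_PER_HOP = 3  # placeholder router + link delay, not from the study


def avg_mesh_hops(k):
    """Mean Manhattan distance between distinct nodes of a k x k 2D mesh."""
    nodes = list(product(range(k), repeat=2))
    total = pairs = 0
    for (x1, y1), (x2, y2) in product(nodes, repeat=2):
        if (x1, y1) != (x2, y2):
            total += abs(x1 - x2) + abs(y1 - y2)
            pairs += 1
    return total / pairs


for cores, k in ((4, 2), (16, 4), (64, 8)):
    hops = avg_mesh_hops(k)
    print(f"{cores} cores: {hops:.2f} avg hops, ~{hops * CYCLES_PER_HOP:.1f} cycles one-way")
```

For a k x k mesh the average works out to 2k/3 hops, so under this model the network portion of a miss grows with the core count unless the per-hop delay shrinks.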

Study of various factors affecting performance of multi core processors
