SlideShare a Scribd company logo
1 of 21
Core 2 Duo die
“Just a few years ago, the idea of putting multiple processors
on a chip was farfetched. Now it is accepted and
commonplace, and virtually every new high performance
processor is a chip multiprocessor of some sort…”
Center for Electronic System Design
Univ. of California Berkeley
Chip Multiprocessors??
“Mowry is working on the development
of single-chip multiprocessors: one large
chip capable of performing multiple
operations at once, using similar
techniques to maximize performance”
-- Technology Review, 1999
Sony's Playstation 3, 2006
CMP Caches: Design Space
• Architecture
– Placement of Cache/Processors
– Interconnects/Routing
• Cache Organization & Management
– Private/Shared/Hybrid
– Fully Hardware/OS Interface
“L2 is the last line of defense before hitting the
memory wall, and is the focus of our talk”
Private L2 Cache
I$ D$ I$ D$
L2 $ L2 $ L2 $ L2 $ L2 $ L2 $
I N T E R C O N N E C T
Coherence Protocol
Offchip Memory
+ Less interconnect traffic
+ Insulates L2 units
+ Hit latency
– Duplication
– Load imbalance
– Complexity of coherence
– Higher miss rate
L1 L1
Proc
Shared-Interleaved L2 Cache
– Interconnect traffic
– Interference between cores
– Hit latency is higher
+ No duplication
+ Balance the load
+ Lower miss rate
+ Simplicity of coherence
I$ D$ I$ D$
I N T E R C O N N E C T
Coherence ProtocolL1
L2
Take Home Message
• Leverage on-chip access time
Take Home Messages
• Leverage on-chip access time
• Better sharing of cache resources
• Isolating performance of processors
• Place data on the chip close to where it is used
• Minimize inter-processor misses (in shared cache)
• Fairness towards processors
On to some solutions…
Jichuan Chang and Gurindar S. Sohi
Cooperative Caching for Chip Multiprocessors
International Symposium on Computer Architecture, 2006.
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
International Symposium on Computer Architecture, 2009.
Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin
Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors
Architectural Support for Programming Languages and Operating, Systems 2008.
each handles this problem in a different way
Co-operative Caching
(Chang & Sohi)
• Private L2 caches
• Attract data locally to reduce remote on chip access.
Lowers average on-chip misses.
• Co-operation among the private caches for efficient
use of resources on the chip.
• Controlling the extent of co-operation to suit the
dynamic workload behavior
CC Techniques
• Cache to cache transfer of clean data
– In case of miss transfer “clean” blocks from another L2 cache.
– This is useful in the case of “read only” data (instructions) .
• Replication aware data replacement
– Singlet/Replicate.
– Evict singlet only when no replicates exist.
– Singlets can be “spilled” to other cache banks.
• Global replacement of inactive data
– Global management needed for managing “spilling”.
– N-Chance Forwarding.
– Set recirculation count to N when spilled.
– Decrease N by 1 when spilled again, unless N becomes 0.
Set “Pinning” -- Setup
P1
P2
P3
P4
Set 0
Set 1
:
:
Set (S-1)
L1
cache
Processors Shared
L2 cache
I
n
t
e
r
c
o
n
n
e
c
t
Main
Memory
Set “Pinning” -- Problem
P1
P2
P3
P4
Set 0
Set 1
:
:
Set (S-1)
Main
Memory
Set “Pinning”
-- Types of Cache Misses
• Compulsory
(aka Cold)
• Capacity
• Conflict
• Coherence
• Compulsory
• Inter-processor
• Intra-processor
versus
P1
P2
P3
P4
Main
Memory
POP 1
POP 2
POP 3
POP 4
Set
:
:
Set
Owner Other bits Data
R-NUCA: Use Class-Based Strategies
Solve for the common case!
Most current (and future) programs have the following types of accesses
1. Instruction Access – Shared, but Read-Only
2. Private Data Access – Read-Write, but not Shared
3. Shared Data Access – Read-Write (or) Read-Only, but Shared.
R-NUCA: Can do this online!
• We have information from the OS and TLB
• For each memory block, classify it as
– Instruction
– Private Data
– Shared Data
• Handle them differently
– Replicate instructions
– Keep private data locally
– Keep shared data globally
R-NUCA: Reactive Clustering
• Assign clusters based on level of sharing
– Private Data given level-1 clusters (local cache)
– Shared Data given level-16 clusters (16 neighboring machines), etc.
Clusters ≈ Overlapping Sets in Set-Associative Mapping
• Within a cluster, “Rotational Interleaving”
– Load-Balancing to minimize contention on bus and controller
Future Directions
Area has been closed.
Just Kidding…
• Optimize for Power Consumption
• Assess trade-offs between more caches and more cores
• Minimize usage of OS, but still retain flexibility
• Application adaptation to allocated cache quotas
• Adding hardware directed thread level speculation
Questions?
THANK YOU!
Backup
• Commercial and research prototypes
– Sun MAJC
– Piranha
– IBM Power 4/5
– Stanford Hydra
Backup

More Related Content

What's hot

What's hot (15)

Introduction to Microkernels
Introduction to MicrokernelsIntroduction to Microkernels
Introduction to Microkernels
 
Plan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIXPlan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIX
 
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernelsFrom L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
 
Linux kernel Architecture and Properties
Linux kernel Architecture and PropertiesLinux kernel Architecture and Properties
Linux kernel Architecture and Properties
 
μ-Kernel Evolution
μ-Kernel Evolutionμ-Kernel Evolution
μ-Kernel Evolution
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernel
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
 
[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. Microkernel[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. Microkernel
 
Hints for L4 Microkernel
Hints for L4 MicrokernelHints for L4 Microkernel
Hints for L4 Microkernel
 
Embedded Hypervisor for ARM
Embedded Hypervisor for ARMEmbedded Hypervisor for ARM
Embedded Hypervisor for ARM
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
 
Ubuntu
UbuntuUbuntu
Ubuntu
 

Viewers also liked

creativity at work fall2015
creativity at work fall2015creativity at work fall2015
creativity at work fall2015Terry Chong
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceHarry Potter
 
Búsqueda no informada - Backtracking/Hacia atrás
Búsqueda no informada - Backtracking/Hacia atrásBúsqueda no informada - Backtracking/Hacia atrás
Búsqueda no informada - Backtracking/Hacia atrásLaura Del Pino Díaz
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsTony Nguyen
 
Rafael orsini 24118492
Rafael orsini 24118492Rafael orsini 24118492
Rafael orsini 24118492orsini07
 
Prolog Visualizer
Prolog VisualizerProlog Visualizer
Prolog VisualizerZhixuan Lai
 
Power point 1 η μικροδιδασκαλία
Power point 1 η μικροδιδασκαλίαPower point 1 η μικροδιδασκαλία
Power point 1 η μικροδιδασκαλίαMaria Antorka
 
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...John Tzortzakis
 
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16John Tzortzakis
 

Viewers also liked (15)

ΛΟΙΜΩΞΕΙΣ
ΛΟΙΜΩΞΕΙΣΛΟΙΜΩΞΕΙΣ
ΛΟΙΜΩΞΕΙΣ
 
Poo java
Poo javaPoo java
Poo java
 
Abstract class
Abstract classAbstract class
Abstract class
 
creativity at work fall2015
creativity at work fall2015creativity at work fall2015
creativity at work fall2015
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Búsqueda no informada - Backtracking/Hacia atrás
Búsqueda no informada - Backtracking/Hacia atrásBúsqueda no informada - Backtracking/Hacia atrás
Búsqueda no informada - Backtracking/Hacia atrás
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Internet de las cosas
Internet de las cosasInternet de las cosas
Internet de las cosas
 
αγαθα διακρισεις αγαθων
αγαθα διακρισεις αγαθωναγαθα διακρισεις αγαθων
αγαθα διακρισεις αγαθων
 
Rafael orsini 24118492
Rafael orsini 24118492Rafael orsini 24118492
Rafael orsini 24118492
 
Busqueda informada y explorada
Busqueda informada y exploradaBusqueda informada y explorada
Busqueda informada y explorada
 
Prolog Visualizer
Prolog VisualizerProlog Visualizer
Prolog Visualizer
 
Power point 1 η μικροδιδασκαλία
Power point 1 η μικροδιδασκαλίαPower point 1 η μικροδιδασκαλία
Power point 1 η μικροδιδασκαλία
 
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...
Αναλυτικά Προγράμματα Σπουδών Β ́ και Γ ́ τάξεων Τομέα Δομικών Έργων ΦΕΚ 770 ...
 
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16
Εξεταστέα ύλη μαθημάτων της Γ΄τάξης του Τομέα Δομικών Εργων σχ.έτους 2015-16
 

Similar to Optimizing shared caches in chip multiprocessors

[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)npinto
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi coremukul bhardwaj
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthPerforce
 
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...Dell EMC World
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 
Chap1
Chap1Chap1
Chap1adisi
 
Preparing OpenSHMEM for Exascale
Preparing OpenSHMEM for ExascalePreparing OpenSHMEM for Exascale
Preparing OpenSHMEM for Exascaleinside-BigData.com
 
Ceg4131 models
Ceg4131 modelsCeg4131 models
Ceg4131 modelsanandme07
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Akhil Nadh PC
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Nikolay Savvinov
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersSangjin Han
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsYasin Memari
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptxMuhammad54342
 

Similar to Optimizing shared caches in chip multiprocessors (20)

[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
 
Massively Parallel Architectures
Massively Parallel ArchitecturesMassively Parallel Architectures
Massively Parallel Architectures
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi core
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
 
Pdc lecture1
Pdc lecture1Pdc lecture1
Pdc lecture1
 
12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
 
PyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc AltedPyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc Alted
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Chap1
Chap1Chap1
Chap1
 
Preparing OpenSHMEM for Exascale
Preparing OpenSHMEM for ExascalePreparing OpenSHMEM for Exascale
Preparing OpenSHMEM for Exascale
 
Ceg4131 models
Ceg4131 modelsCeg4131 models
Ceg4131 models
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
 
module4.ppt
module4.pptmodule4.ppt
module4.ppt
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
 
Network support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacentersNetwork support for resource disaggregation in next-generation datacenters
Network support for resource disaggregation in next-generation datacenters
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 

More from James Wong

Multi threaded rtos
Multi threaded rtosMulti threaded rtos
Multi threaded rtosJames Wong
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningJames Wong
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryJames Wong
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data miningJames Wong
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching worksJames Wong
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceJames Wong
 
Abstract data types
Abstract data typesAbstract data types
Abstract data typesJames Wong
 
Abstraction file
Abstraction fileAbstraction file
Abstraction fileJames Wong
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cacheJames Wong
 
Abstract class
Abstract classAbstract class
Abstract classJames Wong
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysisJames Wong
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaJames Wong
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsJames Wong
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and pythonJames Wong
 

More from James Wong (20)

Data race
Data raceData race
Data race
 
Multi threaded rtos
Multi threaded rtosMulti threaded rtos
Multi threaded rtos
 
Recursion
RecursionRecursion
Recursion
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Cache recap
Cache recapCache recap
Cache recap
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Object model
Object modelObject model
Object model
 
Abstract class
Abstract classAbstract class
Abstract class
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
 
Inheritance
InheritanceInheritance
Inheritance
 
Api crash
Api crashApi crash
Api crash
 

Recently uploaded

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Optimizing shared caches in chip multiprocessors

  • 1. Core 2 Duo die “Just a few years ago, the idea of putting multiple processors on a chip was farfetched. Now it is accepted and commonplace, and virtually every new high performance processor is a chip multiprocessor of some sort…” Center for Electronic System Design Univ. of California Berkeley Chip Multiprocessors?? “Mowry is working on the development of single-chip multiprocessors: one large chip capable of performing multiple operations at once, using similar techniques to maximize performance” -- Technology Review, 1999 Sony's Playstation 3, 2006
  • 2. CMP Caches: Design Space • Architecture – Placement of Cache/Processors – Interconnects/Routing • Cache Organization & Management – Private/Shared/Hybrid – Fully Hardware/OS Interface “L2 is the last line of defense before hitting the memory wall, and is the focus of our talk”
  • 3. Private L2 Cache I$ D$ I$ D$ L2 $ L2 $ L2 $ L2 $ L2 $ L2 $ I N T E R C O N N E C T Coherence Protocol Offchip Memory + Less interconnect traffic + Insulates L2 units + Hit latency – Duplication – Load imbalance – Complexity of coherence – Higher miss rate L1 L1 Proc
  • 4. Shared-Interleaved L2 Cache – Interconnect traffic – Interference between cores – Hit latency is higher + No duplication + Balance the load + Lower miss rate + Simplicity of coherence I$ D$ I$ D$ I N T E R C O N N E C T Coherence ProtocolL1 L2
  • 5. Take Home Message • Leverage on-chip access time
  • 6. Take Home Messages • Leverage on-chip access time • Better sharing of cache resources • Isolating performance of processors • Place data on the chip close to where it is used • Minimize inter-processor misses (in shared cache) • Fairness towards processors
  • 7. On to some solutions… Jichuan Chang and Gurindar S. Sohi Cooperative Caching for Chip Multiprocessors International Symposium on Computer Architecture, 2006. Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches International Symposium on Computer Architecture, 2009. Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors Architectural Support for Programming Languages and Operating, Systems 2008. each handles this problem in a different way
  • 8. Co-operative Caching (Chang & Sohi) • Private L2 caches • Attract data locally to reduce remote on chip access. Lowers average on-chip misses. • Co-operation among the private caches for efficient use of resources on the chip. • Controlling the extent of co-operation to suit the dynamic workload behavior
  • 9. CC Techniques • Cache to cache transfer of clean data – In case of miss transfer “clean” blocks from another L2 cache. – This is useful in the case of “read only” data (instructions) . • Replication aware data replacement – Singlet/Replicate. – Evict singlet only when no replicates exist. – Singlets can be “spilled” to other cache banks. • Global replacement of inactive data – Global management needed for managing “spilling”. – N-Chance Forwarding. – Set recirculation count to N when spilled. – Decrease N by 1 when spilled again, unless N becomes 0.
  • 10. Set “Pinning” -- Setup P1 P2 P3 P4 Set 0 Set 1 : : Set (S-1) L1 cache Processors Shared L2 cache I n t e r c o n n e c t Main Memory
  • 11. Set “Pinning” -- Problem P1 P2 P3 P4 Set 0 Set 1 : : Set (S-1) Main Memory
  • 12. Set “Pinning” -- Types of Cache Misses • Compulsory (aka Cold) • Capacity • Conflict • Coherence • Compulsory • Inter-processor • Intra-processor versus
  • 13. P1 P2 P3 P4 Main Memory POP 1 POP 2 POP 3 POP 4 Set : : Set Owner Other bits Data
  • 14. R-NUCA: Use Class-Based Strategies Solve for the common case! Most current (and future) programs have the following types of accesses 1. Instruction Access – Shared, but Read-Only 2. Private Data Access – Read-Write, but not Shared 3. Shared Data Access – Read-Write (or) Read-Only, but Shared.
  • 15. R-NUCA: Can do this online! • We have information from the OS and TLB • For each memory block, classify it as – Instruction – Private Data – Shared Data • Handle them differently – Replicate instructions – Keep private data locally – Keep shared data globally
  • 16. R-NUCA: Reactive Clustering • Assign clusters based on level of sharing – Private Data given level-1 clusters (local cache) – Shared Data given level-16 clusters (16 neighboring machines), etc. Clusters ≈ Overlapping Sets in Set-Associative Mapping • Within a cluster, “Rotational Interleaving” – Load-Balancing to minimize contention on bus and controller
  • 18. Just Kidding… • Optimize for Power Consumption • Assess trade-offs between more caches and more cores • Minimize usage of OS, but still retain flexibility • Application adaptation to allocated cache quotas • Adding hardware directed thread level speculation
  • 20. Backup • Commercial and research prototypes – Sun MAJC – Piranha – IBM Power 4/5 – Stanford Hydra