Core 2 Duo die
“Just a few years ago, the idea of putting multiple processors
on a chip was farfetched. Now it is accepted and
commonplace, and virtually every new high performance
processor is a chip multiprocessor of some sort…”
Center for Electronic System Design
Univ. of California Berkeley
Chip Multiprocessors??
“Mowry is working on the development
of single-chip multiprocessors: one large
chip capable of performing multiple
operations at once, using similar
techniques to maximize performance”
-- Technology Review, 1999
Sony's PlayStation 3, 2006
CMP Caches: Design Space
• Architecture
– Placement of Cache/Processors
– Interconnects/Routing
• Cache Organization & Management
– Private/Shared/Hybrid
– Fully Hardware/OS Interface
“L2 is the last line of defense before hitting the
memory wall, and is the focus of our talk”
Private L2 Cache
[Figure: each processor has L1 I$ and D$ caches and a private L2 $; the L2 caches connect through an interconnect with a coherence protocol to off-chip memory]
+ Less interconnect traffic
+ Insulates L2 units
+ Lower hit latency
– Duplication
– Load imbalance
– Complexity of coherence
– Higher miss rate
Shared-Interleaved L2 Cache
– Interconnect traffic
– Interference between cores
– Hit latency is higher
+ No duplication
+ Balance the load
+ Lower miss rate
+ Simplicity of coherence
[Figure: processors with L1 I$ and D$ caches connect through an interconnect with a coherence protocol to a shared, interleaved L2]
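The two organizations differ mainly in where a block is allowed to live. A minimal sketch of that difference, assuming 64-byte blocks and four L2 banks (illustrative values, not taken from the slides) and hypothetical function names:

```python
BLOCK_BYTES = 64   # assumed block size
N_BANKS = 4        # assumed number of L2 banks (one per core)

def shared_home_bank(addr: int) -> int:
    """Shared-interleaved L2: consecutive blocks map to consecutive banks."""
    return (addr // BLOCK_BYTES) % N_BANKS

def private_home_bank(addr: int, core: int) -> int:
    """Private L2: a core always allocates in its own bank."""
    return core

# Four consecutive blocks touched by core 2.
addrs = [0x1000 + i * BLOCK_BYTES for i in range(4)]
shared_banks = [shared_home_bank(a) for a in addrs]
private_banks = [private_home_bank(a, core=2) for a in addrs]
```

With interleaving, each block has exactly one home (no duplication, balanced load), but only one of the four accesses from core 2 stays in its local bank, which is where the extra interconnect traffic and hit latency come from.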
Take Home Messages
• Leverage on-chip access time
• Better sharing of cache resources
• Isolating performance of processors
• Place data on the chip close to where it is used
• Minimize inter-processor misses (in shared cache)
• Fairness towards processors
On to some solutions…
Jichuan Chang and Gurindar S. Sohi
Cooperative Caching for Chip Multiprocessors
International Symposium on Computer Architecture, 2006.
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
International Symposium on Computer Architecture, 2009.
Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin
Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors
Architectural Support for Programming Languages and Operating Systems, 2008.
Each handles this problem in a different way.
Co-operative Caching
(Chang & Sohi)
• Private L2 caches
• Attract data locally to reduce remote on-chip accesses.
Lowers average on-chip misses.
• Co-operation among the private caches for efficient
use of resources on the chip.
• Controlling the extent of co-operation to suit the
dynamic workload behavior
CC Techniques
• Cache-to-cache transfer of clean data
– On a miss, transfer “clean” blocks from another L2 cache.
– This is useful for “read-only” data (instructions).
• Replication-aware data replacement
– Singlet/Replicate.
– Evict singlet only when no replicates exist.
– Singlets can be “spilled” to other cache banks.
• Global replacement of inactive data
– Global management needed for managing “spilling”.
– N-Chance Forwarding.
– Set the recirculation count to N when a block is first spilled.
– Decrement the count each time the block is spilled again; once it reaches 0, evict the block instead of spilling it.
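The replacement and spilling rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `Block`, `pick_victim`, `on_evict`, the `-1` "never spilled" encoding, and `N = 2` are all hypothetical, and the cache set is assumed to be ordered least-recently-used first.

```python
from dataclasses import dataclass
from typing import List

N = 2  # assumed recirculation limit

@dataclass
class Block:
    tag: int
    copies_on_chip: int       # 1 means singlet, >1 means replicate
    recirculation: int = -1   # -1: never spilled (hypothetical encoding)

def pick_victim(cache_set: List[Block]) -> Block:
    """Prefer a replicate; evict a singlet only when no replicate exists."""
    for block in cache_set:           # cache_set is ordered LRU-first
        if block.copies_on_chip > 1:
            return block
    return cache_set[0]

def on_evict(block: Block) -> str:
    """Return 'spill' if the block gets another chance in a peer cache, else 'drop'."""
    if block.copies_on_chip > 1:
        return "drop"                 # another copy survives on chip
    if block.recirculation == -1:
        block.recirculation = N       # first spill: set the count to N
    else:
        block.recirculation -= 1      # spilled again: use up one chance
    return "spill" if block.recirculation > 0 else "drop"
```

With `N = 2`, a singlet is forwarded to a peer cache twice and leaves the chip on its third eviction.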
Set “Pinning” -- Setup
[Figure: processors P1 to P4, each with an L1 cache, connect through an interconnect to a shared L2 cache (Set 0 to Set S-1) backed by main memory]
Set “Pinning” -- Problem
[Figure: processors P1 to P4, shared L2 sets Set 0 to Set (S-1), main memory]
Set “Pinning” -- Types of Cache Misses
Traditional classification:
• Compulsory (aka Cold)
• Capacity
• Conflict
• Coherence
versus
Per-processor classification:
• Compulsory
• Inter-processor
• Intra-processor
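One way to read the second classification is by who evicted the block that is now missing. The sketch below assumes that a miss is inter-processor when the block was last evicted by a different processor and intra-processor when it was evicted by the same one; `classify_miss` and the `last_evictor` bookkeeping are hypothetical names used only for illustration.

```python
def classify_miss(block, core, last_evictor):
    """Classify a miss on `block` by `core`.

    last_evictor maps a block to the core whose reference most recently
    evicted it (hypothetical bookkeeping, not a hardware structure from the slides).
    """
    if block not in last_evictor:
        return "compulsory"           # never cached before: first reference
    if last_evictor[block] == core:
        return "intra-processor"      # the core displaced its own data
    return "inter-processor"          # another core displaced it
```

For example, with `last_evictor = {0xA: 1}`, a miss on block `0xA` by core 0 is inter-processor, while the same miss by core 1 is intra-processor.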
[Figure: processors P1 to P4, each with a POP cache (POP 1 to POP 4), a shared L2 whose set entries hold Owner, Other bits, and Data fields, and main memory]
R-NUCA: Use Class-Based Strategies
Solve for the common case!
Most current (and future) programs have the following types of accesses
1. Instruction Access – Shared, but Read-Only
2. Private Data Access – Read-Write, but not Shared
3. Shared Data Access – Read-Write (or) Read-Only, but Shared.
R-NUCA: Can do this online!
• We have information from the OS and TLB
• For each memory block, classify it as
– Instruction
– Private Data
– Shared Data
• Handle them differently
– Replicate instructions
– Keep private data locally
– Keep shared data globally
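The online classification can be sketched as below. The slides only say that the information comes from the OS and TLB; the page granularity, the first-touch `owner` field, and the names `PageInfo`, `pages`, and `classify` are assumptions made for this illustration, and thread migration is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PageInfo:
    owner: Optional[int] = None   # first core to touch the page as data (assumed)
    shared: bool = False          # set once a second core touches the page

pages: Dict[int, PageInfo] = {}   # stands in for per-page OS state

def classify(page: int, core: int, is_ifetch: bool) -> str:
    """Return 'instruction', 'private', or 'shared' for an access."""
    if is_ifetch:
        return "instruction"
    info = pages.setdefault(page, PageInfo())
    if info.owner is None:
        info.owner = core             # first touch: treat the page as private
    elif info.owner != core:
        info.shared = True            # a different core arrived: reclassify
    return "shared" if info.shared else "private"
```

A page starts out private to the first core that touches it and becomes shared, permanently in this sketch, as soon as another core accesses it.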
R-NUCA: Reactive Clustering
• Assign clusters based on level of sharing
– Private Data given level-1 clusters (local cache)
– Shared Data given level-16 clusters (16 neighboring tiles), etc.
Clusters ≈ Overlapping Sets in Set-Associative Mapping
• Within a cluster, “Rotational Interleaving”
– Load-Balancing to minimize contention on bus and controller
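How the cluster size steers placement can be sketched as follows. This is a deliberate simplification: it uses fixed, non-overlapping clusters with plain modulo interleaving rather than the overlapping clusters and rotational interleaving named on the slide. `target_slice` is a hypothetical helper, and `block_addr` is assumed to be a block number rather than a byte address.

```python
N_TILES = 16  # assumed tile count, matching the level-16 cluster on the slide

def target_slice(block_addr: int, core: int, cluster_size: int) -> int:
    """Pick the L2 slice for a block inside the requesting core's cluster.

    Simplification: clusters are fixed and non-overlapping, so this is a
    stand-in for rotational interleaving, not an implementation of it.
    """
    base = (core // cluster_size) * cluster_size   # first tile of the cluster
    return base + block_addr % cluster_size        # interleave within the cluster
```

With `cluster_size=1` the block always lands in the requesting core's own slice (private data); with `cluster_size=16` every core computes the same single home slice (shared data); intermediate sizes trade capacity for distance.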
Future Directions
Area has been closed.
Just Kidding…
• Optimize for Power Consumption
• Assess trade-offs between more caches and more cores
• Minimize usage of OS, but still retain flexibility
• Application adaptation to allocated cache quotas
• Adding hardware-directed thread-level speculation
Questions?
THANK YOU!
Backup
• Commercial and research prototypes
– Sun MAJC
– Piranha
– IBM Power 4/5
– Stanford Hydra

Optimizing shared caches in chip multiprocessors
