An Application Classification Guided Cache
Tuning Heuristic for Multi-core Architectures
PRESENTED BY: DEBABRATA PAUL CHOWDHURY (14014081002), KHYATI RAJPUT (14014081007)
M.TECH-CE (SEM-II)
GUIDED BY: PROF. PRASHANT MODI (UVPCE)
Contents
• Introduction
• Multi core System Optimization
• Cache Tuning
• Cache Tuning Process
• Multi-core Architectural Layout
• Application Classification Guided Cache Tuning Heuristic
• Experimental Work
• Conclusion
Introduction
Basic Concepts
• Single Core: In a single-core architecture, the computing component has one
independent processing unit.
Introduction(cont.)
• Multi Core:
• In a multi-core architecture, a single computing component has two or more independent
processing units (called "cores").
• The cores run multiple instructions of a program at the same time, increasing overall speed
for programs ("parallel computing").
Multi-Core System Optimization
• Previous multi-core cache optimizations focused only on improving performance (measured
by statistics such as the number of hits, misses, and write backs).
• Current multi-core optimizations also focus on reducing energy consumption by tuning
individual cores.
• Definition of multi-core system optimization:
• Multi-core system optimization improves system performance and energy
consumption by tuning the system to the application's runtime behavior and
resource requirements.
What is Cache Tuning?
• Cache tuning is the task of choosing the best configuration of cache design parameters
for a particular application, or for a particular phase of an application, such that
performance, power, and/or energy are optimized.
Cache Tuning Process
Step 1:- Execute the application for one tuning interval in each potential
configuration (tuning intervals must be long enough for the cache behavior to
stabilize).
Step 2:- Gather cache statistics, such as the number of accesses, misses, and
write backs, for each explored configuration.
Step 3:- Combine the cache statistics with an energy model to determine the
optimal cache configuration.
Step 4:- Fix the cache parameter values to the optimal cache configuration’s
parameter values.
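The four steps above can be sketched as a simple exhaustive loop. This is only an illustrative sketch: the names `tune_cache`, `run_interval`, and `energy_model` are hypothetical, and the two callables stand in for the simulator or hardware counters and for the energy model.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Stats = Dict[str, int]  # e.g. {"accesses": ..., "misses": ..., "writebacks": ...}


def tune_cache(
    configs: Iterable[Hashable],
    run_interval: Callable[[Hashable], Stats],
    energy_model: Callable[[Stats], float],
) -> Tuple[Hashable, float]:
    """Explore every configuration and return the lowest-energy one.

    run_interval (hypothetical) executes one tuning interval in the given
    configuration and returns the gathered cache statistics (steps 1-2).
    energy_model (hypothetical) turns those statistics into energy (step 3).
    """
    best_config, best_energy = None, float("inf")
    for config in configs:
        stats = run_interval(config)      # steps 1 and 2
        energy = energy_model(stats)      # step 3
        if energy < best_energy:
            best_config, best_energy = config, energy
    # Step 4 is left to the caller: fix the cache to best_config.
    return best_config, best_energy
```

Exhaustive exploration like this grows quickly with the number of configurations, which is the cost a tuning heuristic tries to avoid.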
Multi-core Architectural Layout
Multi-core Architectural Layout(cont.)
• The multi-core architecture consists of:
1. An arbitrary number of cores
2. A cache tuner
• Each core has a private L1 data cache.
• A global cache tuner is connected to each core's private L1 data cache.
• The cache tuner runs the cache tuning heuristic: it gathers cache statistics, coordinates
cache tuning among the cores, and calculates the caches' energy consumption.
Multi-core Architectural Layout(cont.)
Overheads in this Multi-core Architecture Layout
• During tuning, applications incur stall cycles while the tuner gathers cache statistics,
calculates energy consumption, and changes the cache configuration.
• These tuning stall cycles introduce Energy and Performance overhead.
• Our tuning heuristic considers these overheads incurred during the tuning stall cycles,
and thus minimizes the number of simultaneously tuned cores and the tuning energy
and performance overheads.
Multi-core Architectural Layout(cont.)
• The figures illustrate the similarities using actual data cache miss rates for an 8-core system
(the cores are denoted P0 to P7).
• We evaluate cache miss rate similarity by normalizing each cache's miss rate to that of the
core with the lowest miss rate.
• In the first figure, the normalized miss rates are nearly 1.0 for all cores, so all caches are
classified as having similar behavior.
• In the second figure, the normalized miss rates show that P1 to P7 have similar cache
behavior (their normalized miss rates are all nearly 3.5), but P0 behaves differently from
P1 to P7.
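The normalization described above can be sketched as follows. The grouping step and its `tolerance` value are assumptions made for illustration; the slides do not state how close two normalized miss rates must be to count as similar.

```python
from typing import List


def normalize_miss_rates(miss_rates: List[float]) -> List[float]:
    """Normalize each core's miss rate to the lowest miss rate."""
    lowest = min(miss_rates)
    if lowest <= 0:
        raise ValueError("lowest miss rate must be positive to normalize")
    return [rate / lowest for rate in miss_rates]


def group_similar(normalized: List[float], tolerance: float = 0.25) -> List[List[int]]:
    """Group core indices whose normalized miss rates are close.

    tolerance is a hypothetical threshold, not a value from the source.
    """
    order = sorted(range(len(normalized)), key=lambda i: normalized[i])
    groups: List[List[int]] = []
    for core in order:
        # Compare against the first (lowest) member of the current group.
        if groups and normalized[core] - normalized[groups[-1][0]] <= tolerance:
            groups[-1].append(core)
        else:
            groups.append([core])
    return [sorted(group) for group in groups]
```

With one core at a normalized miss rate of 1.0 and the rest near 3.5, this yields two groups, matching the second figure's description.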
Application Classification Guided Cache
Tuning Heuristic
• Application classification is based on two things:
1. Cache behavior
2. Whether the application is data sharing or non-data-sharing
• Cache accesses and misses are used to determine whether data sets have similar cache
behavior.
• If coherence misses account for more than 5% of the total cache misses, the application
is classified as data sharing; otherwise, the application is non-data-sharing.
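The 5% rule above can be written as a small check. The function name and the handling of zero total misses are assumptions for illustration.

```python
def is_data_sharing(coherence_misses: int, total_misses: int,
                    threshold: float = 0.05) -> bool:
    """Return True when coherence misses exceed 5% of all cache misses."""
    if total_misses == 0:
        # Assumption: with no misses at all, treat the application as non-data-sharing.
        return False
    return coherence_misses / total_misses > threshold
```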
Application Classification Guided Cache
Tuning Heuristic(cont.)
• The application classification guided cache tuning heuristic consists of three
main steps:
1) Application profiling and initial tuning
2) Application classification
3) Final tuning actions
Application Classification Guided Cache
Tuning Heuristic(cont.)
• Step 1 profiles the application to gather the cache statistics, which are used to determine
cache behavior and data sharing in Step 2.
• Step 1 is critical for avoiding redundant cache tuning in situations where the data sets have
similar cache behavior and similar optimal configurations.
• Condition 1 and Condition 2 classify the applications based on whether the cores have
similar cache behavior and whether they exhibit data sharing, respectively.
• Evaluating these conditions determines the necessary cache tuning effort in Step 3.
• If Condition 1 evaluates as true, only a single cache needs to be tuned.
• Once the final configuration is obtained, it is applied to all other cores.
Application Classification Guided Cache
Tuning Heuristic(cont.)
• If the data sets have different cache behavior (Condition 1 is false), tuning is
more complex and several cores must be tuned.
• If the application does not share data (Condition 2 is false), the heuristic tunes
only one core from each group, and cores can be tuned independently without
affecting the behavior of the other cores.
• If the application shares data (Condition 2 is true), the heuristic still tunes only
one core from each group, but the tuning must be coordinated among the cores.
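The decision logic for the final tuning actions can be sketched roughly as below. `TuningPlan` and `plan_final_tuning` are hypothetical names, and choosing the first core of each group as its representative is an assumption; the slides do not say which core is picked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TuningPlan:
    cores_to_tune: List[int]   # one representative core per group
    coordinated: bool          # whether tuning must be coordinated among cores


def plan_final_tuning(groups: List[List[int]], data_sharing: bool) -> TuningPlan:
    """Map Condition 1 (one group of similar cores) and Condition 2
    (data sharing) to the tuning effort needed in Step 3."""
    representatives = [group[0] for group in groups]  # assumption: first core
    if len(groups) == 1:
        # Condition 1 true: tune a single cache, then copy its final
        # configuration to the other cores.
        return TuningPlan(representatives, coordinated=False)
    # Condition 1 false: one core per group; coordinate only if data is shared.
    return TuningPlan(representatives, coordinated=data_sharing)
```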
Experimental Results
• We quantified the energy savings and performance of our heuristic using SPLASH-2
multithreaded applications.
• The SPLASH-2 suite is one of the most widely used collections of multithreaded
workloads.
• We ran the applications on the SESC simulator for 1-, 2-, 4-, 8-, and 16-core systems. In
SESC, we modeled a heterogeneous system with tunable L1 data cache parameters.
• Since the L1 data cache has 36 possible configurations, our design space is 36^n, where
n is the number of cores in the system.
• The L1 instruction cache was fixed at the base configuration, and the L2 unified cache was
fixed at 256 KB, 4-way set associative, with a 64-byte line size. We modified SESC to
identify coherence misses.
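To give a feel for how quickly the 36^n design space grows, here is a small calculation. The function name is hypothetical; the only inputs taken from the slides are the 36 configurations per L1 data cache and the core counts.

```python
def design_space_size(num_cores: int, configs_per_cache: int = 36) -> int:
    """Number of system-wide configurations when each core's L1 data cache
    can independently take any of configs_per_cache configurations."""
    return configs_per_cache ** num_cores


for cores in (1, 2, 4, 8, 16):
    print(cores, design_space_size(cores))
```

For example, a 2-core system already has 1,296 combinations and a 4-core system has 1,679,616.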
Experimental Results(cont.)
Energy Model for the multi-core system
• total energy = ∑(energy consumed by each core)
• energy consumed by each core:
energy = dynamic_energy + static_energy + fill_energy + writeback_energy + CPU_stall_energy
• dynamic_energy: the energy consumed by switching activity each time the L1 data cache
is accessed.
dynamic_energy = dL1_accesses * dL1_access_energy
• static_energy: the leakage energy the cache consumes on every cycle it is powered,
whether or not it is being accessed.
static_energy = ((dL1_misses * miss_latency_cycles) + (dL1_hits * hit_latency_cycles) +
(dL1_writebacks * writeback_latency_cycles)) * dL1_static_energy
Experimental Results(cont.)
• fill_energy: the energy needed to fill the cache on a miss.
fill_energy = dL1_misses * (linesize / wordsize) * mem_read_energy_perword
• writeback_energy: the energy consumed when a cache line is written back to memory.
writeback_energy = dL1_writebacks * (linesize / wordsize) * mem_write_energy_perword
• CPU_stall_energy: the energy consumed while the processor is stalled during cache fills
and write backs.
CPU_stall_energy = ((dL1_misses * miss_latency_cycles) +
(dL1_writebacks * writeback_latency_cycles)) * CPU_idle_energy
• Our model calculates the dynamic and static energy of each data cache, the energy needed to
fill the cache on a miss, the energy consumed on a cache write back, and the energy consumed
when the processor is stalled during cache fills and write backs.
• We gathered dL1_misses, dL1_hits, and dL1_writebacks cache statistics using SESC.
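The energy model above translates directly into code. The sketch below follows the formulas term by term; the class and field names are illustrative, and any parameter values passed in would be assumptions rather than figures from the slides.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CacheStats:
    accesses: int
    hits: int
    misses: int
    writebacks: int


@dataclass
class EnergyParams:
    access_energy: float               # dL1_access_energy
    static_energy_per_cycle: float     # dL1_static_energy
    mem_read_energy_per_word: float
    mem_write_energy_per_word: float
    cpu_idle_energy: float             # CPU_idle_energy
    hit_latency_cycles: int
    miss_latency_cycles: int
    writeback_latency_cycles: int
    line_size: int                     # bytes
    word_size: int                     # bytes


def core_energy(s: CacheStats, p: EnergyParams) -> float:
    """Energy consumed by one core's L1 data cache activity."""
    words_per_line = p.line_size / p.word_size
    stall_cycles = (s.misses * p.miss_latency_cycles
                    + s.writebacks * p.writeback_latency_cycles)
    dynamic = s.accesses * p.access_energy
    static = (stall_cycles + s.hits * p.hit_latency_cycles) * p.static_energy_per_cycle
    fill = s.misses * words_per_line * p.mem_read_energy_per_word
    writeback = s.writebacks * words_per_line * p.mem_write_energy_per_word
    cpu_stall = stall_cycles * p.cpu_idle_energy
    return dynamic + static + fill + writeback + cpu_stall


def total_energy(per_core: Iterable[CacheStats], p: EnergyParams) -> float:
    """Total energy is the sum of the energy consumed by each core."""
    return sum(core_energy(s, p) for s in per_core)
```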
Experimental Results(cont.)
• We assumed the core’s idle energy (CPU_idle_energy) to be 25% and the static energy
per cycle to be 25% of the cache’s dynamic energy.
• We set the tuning interval to 50,000 cycles.
• We used configuration_energy_per_cycle to determine the energy consumed during each
500,000 cycle tuning interval and the energy consumed in the final configuration.
• Energy savings were calculated by normalizing the energy to the energy consumed
executing the application in the base configuration.
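One way to express this normalization is shown below. The function names are hypothetical, and defining savings as one minus the normalized energy is an assumption about how the reported percentages were derived.

```python
def normalized_energy(config_energy: float, base_energy: float) -> float:
    """Energy of a configuration relative to the base configuration."""
    if base_energy <= 0:
        raise ValueError("base energy must be positive")
    return config_energy / base_energy


def energy_savings(config_energy: float, base_energy: float) -> float:
    """Fraction of the base configuration's energy that is saved (assumed definition)."""
    return 1.0 - normalized_energy(config_energy, base_energy)
```

Under this definition, a configuration that consumes 75 units where the base configuration consumes 100 has a normalized energy of 0.75 and savings of 0.25.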
Results and Analysis
• The figures depict the energy savings and performance, respectively, for the
optimal configuration determined via exhaustive design space exploration (optimal) for 2-
and 4-core systems and for the final configuration found by our application classification
cache tuning heuristic (heuristic) for 2-, 4-, 8-, and 16-core systems, for each application
and averaged across all applications (Avg).
•Our heuristic achieved 26% and 25% energy savings, incurred 9% and 6% performance
penalties, and achieved average speedups for the 8- and 16-core systems, respectively.
Results and Analysis(Cont..)
• Normalized performance for the optimal cache configuration (optimal) for 2- and 4-core
systems and for the final configuration found by the application classification cache tuning
heuristic for 2-, 4-, 8-, and 16-core systems, compared to the systems' respective base
configurations.
Results and Analysis(Cont..)
• Energy savings: the figure shows the energy savings achieved.
Conclusion
•Our heuristic classified applications based on data sharing and cache behavior, and used
this classification to identify which cores needed to be tuned and to reduce the number
of cores being tuned simultaneously.
Future Work
•Our heuristic searched at most 1% of the design space, yielded configurations within 2%
of the optimal, and achieved an average of 25% energy savings.
•In future work we plan to investigate how our heuristic will be applicable to a larger
system with hundreds of cores.
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.pptPROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
PROJECT FORMAT FOR EVS AMITY UNIVERSITY GWALIOR.ppt
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 

An application classification guided cache tuning heuristic for

  • 1. An Application Classification Guided Cache Tuning Heuristic for Multi-core Architectures. Presented by: Debabrata Paul Chowdhury (14014081002) and Khyati Rajput (14014081007), M.Tech-CE (Sem-II). Guided by: Prof. Prashant Modi (UVPCE).
  • 2. Contents • Introduction • Multi core System Optimization • Cache Tuning • Cache Tuning Process • Multi-core Architectural Layout • Application Classification Guided Cache Tuning Heuristic • Experimental Work • Conclusion
  • 3. Introduction Basic Concepts • Single core: In a single-core architecture, the computing component has one independent processing unit.
  • 4. Introduction(cont.) • Multi core: • In a multi-core architecture, a single computing component has two or more independent processing units (called "cores"). • The cores run multiple instructions of a program at the same time, increasing overall program speed ("parallel computing").
  • 5. Multi-Core System Optimization • Previous multi-core cache optimizations focused only on improving performance (measured by statistics such as the number of hits, misses, and write backs). • Current multi-core optimizations also focus on reducing energy consumption by tuning individual cores. • Definition of multi-core system optimization: • Multi-core system optimization improves system performance and energy consumption by tuning the system to the application’s runtime behavior and resource requirements.
  • 6. What is Cache Tuning? • Cache tuning is the task of choosing the best configuration of cache design parameters for a particular application, or for a particular phase of an application, such that performance, power and/or energy are optimized.
  • 7. Cache Tuning Process Step 1:- Execute the application for one tuning interval in each potential configuration (tuning intervals must be long enough for the cache behavior to stabilize). Step 2:- Gather cache statistics, such as the number of accesses, misses, and write backs, for each explored configuration. Step 3:- Combine the cache statistics with an energy model to determine the optimal cache configuration. Step 4:- Fix the cache parameter values to the optimal cache configuration’s parameter values.
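The four steps above can be sketched as a simple search loop. This is a minimal illustration, not the tuner described in the slides: the `Config` tuple encoding and the `run_interval` and `energy_model` callables are hypothetical stand-ins for the hardware tuning interval and the energy model.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical encoding of one cache configuration:
# (size in KB, associativity, line size in bytes).
Config = Tuple[int, int, int]
Stats = Dict[str, int]  # e.g. {"accesses": ..., "misses": ..., "writebacks": ...}


def tune_cache(configs: Iterable[Config],
               run_interval: Callable[[Config], Stats],
               energy_model: Callable[[Config, Stats], float]) -> Config:
    """Return the lowest-energy configuration among those explored."""
    best_config: Optional[Config] = None
    best_energy = float("inf")
    for config in configs:
        # Steps 1-2: run one tuning interval and gather cache statistics.
        stats = run_interval(config)
        # Step 3: combine the statistics with an energy model.
        energy = energy_model(config, stats)
        if energy < best_energy:
            best_config, best_energy = config, energy
    if best_config is None:
        raise ValueError("no configurations to explore")
    # Step 4: the caller fixes the cache parameters to this configuration.
    return best_config
```

Exploring every configuration this way is exhaustive; a heuristic reduces how many configurations, and how many cores, go through the loop.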
  • 9. Multi-core Architectural Layout(cont.) • The multi-core architecture consists of: 1. An arbitrary number of cores 2. A cache tuner • Each core has a private L1 data cache. • A global cache tuner is connected to each core’s private L1 data cache. • The tuner runs the cache tuning heuristic: it gathers cache statistics, coordinates cache tuning among the cores, and calculates the caches’ energy consumption.
  • 10. Multi-core Architectural Layout(cont.) Overheads in this Multi-core Architecture Layout • During tuning, applications incur stall cycles while the tuner gathers cache statistics, calculates energy consumption, and changes the cache configuration. • These tuning stall cycles introduce energy and performance overheads. • Our tuning heuristic accounts for the overheads incurred during the tuning stall cycles, and therefore minimizes the number of simultaneously tuned cores along with the tuning energy and performance overheads.
  • 12. Multi-core Architectural Layout(cont.) • The figures illustrate the similarities using actual data cache miss rates for an 8-core system (the cores are denoted P0 to P7). • We evaluate cache miss rate similarity by normalizing the caches’ miss rates to the miss rate of the core with the lowest miss rate. • In the first figure, the normalized miss rates are nearly 1.0 for all cores, so all caches are classified as having similar behavior. • In the second figure, the normalized miss rates show that P1 to P7 have similar cache behavior (their normalized miss rates are all nearly 3.5), but P0 behaves differently from P1 to P7.
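The normalization described above can be sketched as follows. The grouping rule and its `tolerance` value are assumptions made for illustration; the slides only say that "nearly" equal normalized miss rates count as similar and do not give a threshold.

```python
from typing import List, Sequence


def normalized_miss_rates(miss_rates: Sequence[float]) -> List[float]:
    """Normalize each core's miss rate to the lowest miss rate."""
    lowest = min(miss_rates)
    if lowest <= 0:
        raise ValueError("lowest miss rate must be positive to normalize")
    return [rate / lowest for rate in miss_rates]


def group_similar_cores(miss_rates: Sequence[float],
                        tolerance: float = 0.25) -> List[List[int]]:
    """Group core indices whose normalized miss rates are close.

    `tolerance` is a hypothetical similarity threshold, not a value
    taken from the slides.
    """
    norm = normalized_miss_rates(miss_rates)
    order = sorted(range(len(norm)), key=lambda core: norm[core])
    groups: List[List[int]] = []
    for core in order:
        # Compare against the first (lowest) member of the current group.
        if groups and norm[core] - norm[groups[-1][0]] <= tolerance:
            groups[-1].append(core)
        else:
            groups.append([core])
    return groups
```

With miss rates shaped like the second figure (P0 low, P1 to P7 about 3.5 times higher), this produces two groups: one containing only P0 and one containing P1 to P7.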
  • 13. Application Classification Guided Cache Tuning Heuristic • Application classification is based on two things: 1. Cache behavior 2. Whether or not the application shares data • Cache accesses and misses are used to determine whether data sets have similar cache behavior. • If coherence misses account for more than 5% of the total cache misses, the application is classified as data-sharing; otherwise it is classified as non-data-sharing.
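The data-sharing test above is a single ratio check. A minimal sketch, where the function name and the handling of zero total misses are assumptions:

```python
def is_data_sharing(coherence_misses: int,
                    total_misses: int,
                    threshold: float = 0.05) -> bool:
    """Classify an application as data-sharing when coherence misses
    exceed 5% of all cache misses (the threshold stated in the slides)."""
    if total_misses == 0:
        # Assumption: with no misses at all there is no evidence of sharing.
        return False
    return coherence_misses / total_misses > threshold
```

For example, 6 coherence misses out of 100 total misses classifies the application as data-sharing, while 4 out of 100 does not.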
  • 14. Application Classification Guided Cache Tuning Heuristic(cont.)
  • 15. Application Classification Guided Cache Tuning Heuristic(cont.) • The application classification guided cache tuning heuristic consists of three main steps: 1) Application profiling and initial tuning 2) Application classification 3) Final tuning actions
  • 16. Application Classification Guided Cache Tuning Heuristic(cont.) • Step 1 profiles the application to gather the caches’ statistics, which are used to determine cache behavior and data sharing in Step 2. • Step 1 is critical for avoiding redundant cache tuning in situations where the data sets have similar cache behavior and similar optimal configurations. • Condition 1 and Condition 2 classify the applications based on whether or not the cores have similar cache behavior and whether or not they exhibit data sharing, respectively. • Evaluating these conditions determines the necessary cache tuning effort in Step 3. • If Condition 1 evaluates to true, only a single cache needs to be tuned. • Once the final configuration is obtained, it is applied to all other cores.
  • 17. Application Classification Guided Cache Tuning Heuristic(cont.) • If the data sets have different cache behavior (Condition 1 is false), tuning is more complex and several cores must be tuned. • If the application does not share data (Condition 2 is false), the heuristic tunes only one core from each group, and cores can be tuned independently without affecting the behavior of the other cores. • If the application shares data (Condition 2 is true), the heuristic still tunes only one core from each group, but the tuning must be coordinated among the cores.
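The way Condition 1 and Condition 2 select the final tuning action can be sketched as a small decision function. The return structure and the mode names ("single", "independent", "coordinated") are illustrative assumptions, as is the choice of the first core in each group as the one to tune.

```python
from typing import Dict, List


def plan_tuning(groups: List[List[int]], shares_data: bool) -> Dict[str, object]:
    """Decide which cores to tune and how, given groups of cores with
    similar cache behavior and the data-sharing classification."""
    if not groups or any(not group for group in groups):
        raise ValueError("every group must contain at least one core")
    # One representative core is tuned per group (first member, by assumption).
    representatives = [group[0] for group in groups]
    if len(groups) == 1:
        mode = "single"        # Condition 1 true: tune one cache only
    elif shares_data:
        mode = "coordinated"   # Condition 2 true: tuning must be coordinated
    else:
        mode = "independent"   # Condition 2 false: cores tuned independently
    # Each representative's final configuration is copied to its group.
    copy_to = {group[0]: group[1:] for group in groups}
    return {"tune": representatives, "mode": mode, "copy_to": copy_to}
```

In every case only one core per group goes through tuning, which is what keeps the number of simultaneously tuned cores low.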
  • 18. Experimental Results • We quantified the energy savings and performance of our heuristic using the SPLASH-2 multithreaded applications. • The SPLASH-2 suite is one of the most widely used collections of multithreaded workloads. • We ran the experiments on the SESC simulator for 1-, 2-, 4-, 8- and 16-core systems. In SESC, we modeled a heterogeneous system with configurable L1 data cache parameters. • Since the L1 data cache has 36 possible configurations, our design space is 36^n, where n is the number of cores in the system. • The L1 instruction cache was fixed at the base configuration, and the L2 unified cache was fixed at 256 KB, 4-way set associative, with a 64-byte line size. We modified SESC to identify coherence misses.
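A quick calculation shows how fast the 36^n design space grows with the number of cores. The function name is an assumption; the figure of 36 configurations per L1 data cache comes from the slide above.

```python
def design_space_size(n_cores: int, configs_per_cache: int = 36) -> int:
    """Number of system-wide configurations when each core's L1 data
    cache can independently take any of `configs_per_cache` settings."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return configs_per_cache ** n_cores


for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} cores: {design_space_size(n):,} configurations")
```

Two cores already give 1,296 combinations and four cores give 1,679,616, which is why exhaustive exploration quickly becomes impractical as the core count rises.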
  • 19. Experimental Results(cont.) Energy Model for the multi-core system • total energy = ∑(energy consumed by each core) • Energy consumed by each core: energy = dynamic_energy + static_energy + fill_energy + writeback_energy + CPU_stall_energy • dynamic_energy: Dynamic energy originates from switching activity and is charged per L1 data cache access. dynamic_energy = dL1_accesses * dL1_access_energy • static_energy: Static energy is consumed whenever the cache is powered, regardless of access activity, so it is charged for every cycle spent on misses, hits, and write backs. static_energy = ((dL1_misses * miss_latency_cycles) + (dL1_hits * hit_latency_cycles) + (dL1_writebacks * writeback_latency_cycles)) * dL1_static_energy
  • 20. Experimental Results(cont.) • fill_energy: fill_energy = dL1_misses * (linesize / wordsize) * mem_read_energy_perword • writeback_energy: Write back is a storage method in which data is written into the cache and copied to memory only later, when the modified line is written back. writeback_energy = dL1_writebacks * (linesize / wordsize) * mem_write_energy_perword • CPU_stall_energy: CPU_stall_energy = ((dL1_misses * miss_latency_cycles) + (dL1_writebacks * writeback_latency_cycles)) * CPU_idle_energy • Our model calculates the dynamic and static energy of each data cache, the energy needed to fill the cache on a miss, the energy consumed on a cache write back, and the energy consumed when the processor is stalled during cache fills and write backs. • We gathered the dL1_misses, dL1_hits, and dL1_writebacks cache statistics using SESC.
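The energy model on the two slides above translates directly into code. The sketch below follows the slide formulas term by term; the class names, field names, and any numeric parameter values a caller supplies are assumptions for illustration, not figures from the experiments.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CacheStats:
    accesses: int    # dL1_accesses
    hits: int        # dL1_hits
    misses: int      # dL1_misses
    writebacks: int  # dL1_writebacks


@dataclass
class EnergyParams:
    access_energy: float              # dL1_access_energy
    static_energy_per_cycle: float    # dL1_static_energy
    mem_read_energy_per_word: float
    mem_write_energy_per_word: float
    cpu_idle_energy: float            # CPU_idle_energy (per stall cycle)
    miss_latency_cycles: int
    hit_latency_cycles: int
    writeback_latency_cycles: int
    line_size: int                    # bytes
    word_size: int                    # bytes


def core_energy(s: CacheStats, p: EnergyParams) -> float:
    """Energy of one core: dynamic + static + fill + writeback + stall."""
    words_per_line = p.line_size / p.word_size
    miss_cycles = s.misses * p.miss_latency_cycles
    writeback_cycles = s.writebacks * p.writeback_latency_cycles
    dynamic = s.accesses * p.access_energy
    static = (miss_cycles + s.hits * p.hit_latency_cycles
              + writeback_cycles) * p.static_energy_per_cycle
    fill = s.misses * words_per_line * p.mem_read_energy_per_word
    writeback = s.writebacks * words_per_line * p.mem_write_energy_per_word
    stall = (miss_cycles + writeback_cycles) * p.cpu_idle_energy
    return dynamic + static + fill + writeback + stall


def total_energy(per_core: Iterable[CacheStats], p: EnergyParams) -> float:
    """Total system energy: the sum of the energy consumed by each core."""
    return sum(core_energy(s, p) for s in per_core)
```

Note that a miss is charged three times in this model: once for the memory read that fills the line, once for the static energy over its latency, and once for the idle processor during the stall.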
  • 21. Experimental Results(cont.) • We assumed the core’s idle energy (CPU_idle_energy) to be 25% and the static energy per cycle to be 25% of the cache’s dynamic energy. • We set the tuning interval to 50,000 cycles. • We used configuration_energy_per_cycle to determine the energy consumed during each 500,000 cycle tuning interval and the energy consumed in the final configuration. • Energy savings were calculated by normalizing the energy to the energy consumed executing the application in the base configuration.
  • 22. Results and Analysis • The figures below depict the energy savings and performance, respectively, for the optimal configuration determined via exhaustive design space exploration (optimal) for 2- and 4-core systems and for the final configuration found by our application classification cache tuning heuristic (heuristic) for 2-, 4-, 8-, and 16-core systems, for each application and averaged across all applications (Avg). • Our heuristic achieved 26% and 25% energy savings, incurred 9% and 6% performance penalties, and achieved average speedups for the 8- and 16-core systems, respectively.
  • 23. Results and Analysis(Cont..) • Normalized performance for the optimal cache configuration (optimal) for 2- and 4-core systems and for the final configuration found by the application classification cache tuning heuristic for 2-, 4-, 8- and 16-core systems, compared to the systems’ respective base configurations.
  • 24. Results and Analysis(Cont..) • Energy Savings • The figure shows the energy savings achieved.
  • 25. Conclusion •Our heuristic classified applications based on data sharing and cache behavior, and used this classification to identify which cores needed to be tuned and to reduce the number of cores being tuned simultaneously.
  • 26. Future Work •Our heuristic searched at most 1% of the design space, yielded configurations within 2% of the optimal, and achieved an average of 25% energy savings. •In future work we plan to investigate how our heuristic will be applicable to a larger system with hundreds of cores.