Process Coordinator in NUMA Environment
Song Chi Young
stchuck@dankook.ac.kr
Dankook UNIV.
Jongmoo Choi
choijm@dankook.ac.kr
Dankook UNIV.
Contents
• Introduction
• Design
• Result
• Conclusion
Introduction
What is NUMA?
Motivation
What is NUMA?
1. Introduction
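1. Core 0 sends a request to the GQ (Global Queue).
2. The GQ looks for the data in the LLC (last-level cache).
3. If the data is not found in the LLC:
   1. Look for it in the node's own DRAM.
   2. Send the request to the next node's GQ.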
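The lookup order above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `Node` class, the `resolve` function, and the example addresses are hypothetical names introduced here, and real hardware routes requests by home node rather than by searching sets.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One NUMA node in the toy model: addresses held in its LLC and DRAM."""
    llc: set = field(default_factory=set)
    dram: set = field(default_factory=set)


def resolve(nodes, start, address):
    """Follow the slide's order: LLC first, then own DRAM, then the next node.

    Returns (node_index, level), where level is "LLC", "DRAM", or "miss".
    """
    count = len(nodes)
    for hop in range(count):
        index = (start + hop) % count
        node = nodes[index]
        if address in node.llc:      # step 2: the GQ checks the LLC
            return index, "LLC"
        if address in node.dram:     # step 3.1: the node's own DRAM
            return index, "DRAM"
        # step 3.2: fall through to the next node's GQ
    return None, "miss"


# Hypothetical two-node system.
nodes = [Node(llc={0x10}, dram={0x10, 0x20}), Node(dram={0x30})]
for addr in (0x10, 0x20, 0x30):
    print(hex(addr), resolve(nodes, 0, addr))
```

A request that ends on a node other than the starting one corresponds to a remote access in this model.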
Motivation
1. Introduction
[Figure: bar chart, y-axis 0 to 250, for streamcluster, canneal, and facesim (each at x = 4) under four configurations]
① 1 Processor Local
② 1 Processor Interleaving
③ 2 Processor Interleaving
④ 1 Processor Remote
Motivation
1. Introduction
[Figure: bar chart, y-axis 0 to 250, for facesim only (x = 4) under ① 1 Processor Local, ② 1 Processor Interleaving, ③ 2 Processor Interleaving, ④ 1 Processor Remote]
Why does the memory interleaving policy perform better than the local memory policy?
Design
Effective Factor
How to measure
Effective Factor
2. Design
[Figure: six plots, each with a y-axis from 0 to 250, one metric per panel on the x-axis: LLC Hit Ratio, GQ_data_from_LLC, GQ_data_from_QPI, GQ_DATA.FROM_IMC, QHL_REQUEST.REMOTE, MBW]
Effective factors: LLC, Memory Interleaving, Local/Remote
Effective Factor
2. Design
Shared Data (SD)
Memory usage (MM)
Working Set size (WS)
Read/Write ratio (RW)
NUMA weight (NM)
(Source: Bienia, Christian, et al. The PARSEC benchmark suite: Characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2008, pp. 72-81.)
PMU data
A: UNC_IMC_NORMAL_READS.ANY
B: UNC_IMC_WRITES.FULL.ANY
Memory usage = A + B
Read/Write ratio = A / B
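As a minimal sketch, the two derived quantities can be computed from the counter readings like this. The function names and the sample readings are hypothetical; only the two event names and the two formulas come from the slide.

```python
def memory_usage(reads, writes):
    """Memory usage = A + B (read events plus write events)."""
    return reads + writes


def read_write_ratio(reads, writes):
    """Read/Write ratio = A / B; undefined when no writes were counted."""
    if writes == 0:
        raise ValueError("no write events counted, ratio is undefined")
    return reads / writes


# Hypothetical counter readings, not measured values:
# A = UNC_IMC_NORMAL_READS.ANY, B = UNC_IMC_WRITES.FULL.ANY
a = 6_000_000
b = 2_000_000
print("usage:", memory_usage(a, b))
print("read/write ratio:", read_write_ratio(a, b))
```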
How to Measure
2. Design
$P_v^{t} \times M \times S_v \propto \mathrm{Performance}$
Pv: performance vector table for workloads
Sv: system performance vector table
M: transformation matrix
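Assuming the superscript t denotes a transpose, the product is a single number per workload and configuration pair and can be written out as a double sum. The index ranges here (five workload factors, three system factors) are an assumption taken from the tables elsewhere in the deck.

```latex
P_v^{t} \times M \times S_v
  = \sum_{i=1}^{5} \sum_{j=1}^{3} (P_v)_i \, M_{ij} \, (S_v)_j
```

Under this reading, an entry $M_{ij} = 1$ lets workload factor $i$ contribute through system factor $j$, and an entry of 0 removes that pairing from the sum.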
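How to Measure
2. Design
$$
\begin{pmatrix} SD & MM & WS & RW & NM \end{pmatrix}
\times
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\times
\begin{pmatrix} LC \\ MI \\ LR \end{pmatrix}
$$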
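How to Measure
2. Design
In $P_v^{t} \times M \times S_v \propto \mathrm{Performance}$, the transformation matrix M is:
      LC  MI  LR
SD     0   0   0
MM     1   1   0
WS     0   0   1
RW     0   0   1
NM     1   0   0
Heuristic method: if one factor affects the other factor, the corresponding entry in this table is set to 1.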
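The heuristic can be sketched as building the 0/1 table from a list of "affects" pairs. The `AFFECTS` mapping below is read off the matrix as transcribed in this deck, and `build_matrix` is a hypothetical helper, not the authors' code.

```python
WORKLOAD_FACTORS = ["SD", "MM", "WS", "RW", "NM"]  # rows
SYSTEM_FACTORS = ["LC", "MI", "LR"]                # columns

# Which system factors each workload factor is judged to affect
# (taken from the transcribed matrix; SD affects none there).
AFFECTS = {
    "SD": [],
    "MM": ["LC", "MI"],
    "WS": ["LR"],
    "RW": ["LR"],
    "NM": ["LC"],
}


def build_matrix(affects):
    """Set an entry to 1 when the row factor affects the column factor."""
    return [
        [1 if system in affects[workload] else 0 for system in SYSTEM_FACTORS]
        for workload in WORKLOAD_FACTORS
    ]


M = build_matrix(AFFECTS)
for name, row in zip(WORKLOAD_FACTORS, M):
    print(name, row)
```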
Result
Performance Vector for Workloads
3. Result
      Canneal  StreamCluster  Facesim
SD       1          2            1
MM       1          2            3
WS       1          0            1
RW       1          2            1
NM       2          2            2
SD: Shared Data, MM: Memory usage, WS: Working Set size, RW: Read/Write ratio, NM: NUMA weight
Performance Vector for NUMA Configuration
3. Result
Configurations: 1 = 1 Processor Local, 2 = 1 Processor Interleaving, 3 = 2 Processor Interleaving, 4 = 1 Processor Remote
      1   2   3   4
LC    1   1   2   1
MI    1   2   2   1
LR    2   1   1   0
LC: LLC size, MI: Memory Interleaving, LR: Local/Remote ratio
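A minimal sketch of evaluating the product of the workload vector, the transformation matrix, and the configuration vector with the values transcribed in this deck. The `score` helper is a hypothetical name, and the superscript t is assumed to mean transpose. The numbers it prints need not coincide with the calculated weights listed on the result slide, since the transcript may not capture every detail of the authors' heuristic.

```python
# Workload vectors in the order SD, MM, WS, RW, NM (as transcribed).
WORKLOADS = {
    "canneal": [1, 1, 1, 1, 2],
    "streamcluster": [2, 2, 0, 2, 2],
    "facesim": [1, 3, 1, 1, 2],
}

# Configuration vectors in the order LC, MI, LR (as transcribed).
CONFIGS = {
    1: [1, 1, 2],  # 1 Processor Local
    2: [1, 2, 1],  # 1 Processor Interleaving
    3: [2, 2, 1],  # 2 Processor Interleaving
    4: [1, 1, 0],  # 1 Processor Remote
}

# Transformation matrix: rows SD, MM, WS, RW, NM; columns LC, MI, LR.
M = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]


def score(pv, m, sv):
    """Evaluate the double sum over pv[i] * m[i][j] * sv[j]."""
    return sum(
        pv[i] * m[i][j] * sv[j]
        for i in range(len(pv))
        for j in range(len(sv))
    )


for name, pv in WORKLOADS.items():
    print(name, [score(pv, M, sv) for sv in CONFIGS.values()])
```

A higher score for one configuration than another is read as a prediction that the workload performs better under the former.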
Result
3. Result
Actual performance:
           1   2   3   4
canneal   10   8   9   6
stream     6   5   6   4
facesim    5   5   6   4
Calculated performance weight:
           1   2   3   4
canneal    9   7   9   3
stream    12  10  12   4
facesim   13  12  14   5
Result
3. Result
Correlation coefficient per workload:
canneal    streamcluster  facesim
0.966092   0.965581       0.9
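The per-workload coefficients can be checked with a short Pearson correlation over the two result tables. This is a sketch: the `pearson` helper is written here for illustration, and the data are the actual-performance and calculated-weight rows from the result slide.

```python
import math

# Rows from the result tables, configurations 1 to 4.
ACTUAL = {
    "canneal": [10, 8, 9, 6],
    "streamcluster": [6, 5, 6, 4],
    "facesim": [5, 5, 6, 4],
}
CALCULATED = {
    "canneal": [9, 7, 9, 3],
    "streamcluster": [12, 10, 12, 4],
    "facesim": [13, 12, 14, 5],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


for name in ACTUAL:
    print(name, round(pearson(ACTUAL[name], CALCULATED[name]), 6))
# canneal 0.966092
# streamcluster 0.965581
# facesim 0.9
```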
Conclusion
Conclusion
4. Conclusion
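Why does the memory interleaving policy perform better than the local memory policy?
Effective factors: LLC, Memory Interleaving, Local/Remote
$P_v^{t} \times M \times S_v \propto \mathrm{Performance}$
$$
\begin{pmatrix} SD & MM & WS & RW & NM \end{pmatrix}
\times
\begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\times
\begin{pmatrix} LC \\ MI \\ LR \end{pmatrix}
$$
Pearson correlation coefficient: canneal 0.966092, streamcluster 0.965581, facesim 0.9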
Q :
