Process Coordinator in NUMA Environment
Song Chi Young
stchuck@dankook.ac.kr
Dankook UNIV.
Jongmoo Choi
choijm@dankook.ac.kr
Dankook UNIV.
Contents
- Introduction
- Design
- Result
- Conclusion
Introduction
What is NUMA?
Motivation
What is NUMA?
1. Introduction
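1. Core 0 sends a request to the global queue (GQ).
2. The GQ looks for the data in the LLC.
3. If the data is not found there:
   1. look it up in the node's own DRAM;
   2. otherwise, send the request to the next node's GQ.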
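The lookup order above can be sketched as a toy model. Everything here is a hypothetical illustration rather than the authors' implementation. The `Node` class and the `serve_request` function are invented for this sketch, and it models the LLC and DRAM as plain sets of addresses. The returned labels "LLC", "IMC", and "QPI" are meant to mirror the GQ_data_from_LLC, GQ_DATA.FROM_IMC, and GQ_data_from_QPI counters used in the Design section.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy NUMA node (hypothetical model): an LLC plus locally homed DRAM."""
    node_id: int
    llc: set = field(default_factory=set)   # cache-line addresses currently in this node's LLC
    dram: set = field(default_factory=set)  # addresses whose home memory is this node's DRAM


def serve_request(nodes: list, requester: int, addr: int) -> str:
    """Follow the lookup order for a core on node `requester`; report which source served `addr`."""
    local = nodes[requester]
    # Step 2: the GQ first checks the local LLC.
    if addr in local.llc:
        return "LLC"
    # Step 3.1: on an LLC miss, try the node's own DRAM (through its memory controller).
    if addr in local.dram:
        local.llc.add(addr)  # fill the LLC so the next access hits
        return "IMC"
    # Step 3.2: otherwise forward the request to the next node's GQ (over the QPI link).
    for hop in range(1, len(nodes)):
        remote = nodes[(requester + hop) % len(nodes)]
        if addr in remote.llc or addr in remote.dram:
            local.llc.add(addr)
            return "QPI"
    raise KeyError(f"address {addr:#x} is not homed on any node")
```

For example, with `nodes = [Node(0, dram={0x1000}), Node(1, dram={0x2000})]`, a core on node 0 is served from "IMC" on its first access to `0x1000`. Its second access to that address hits in the "LLC". An access to `0x2000` goes over "QPI".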
Motivation
[Figure: streamcluster, canneal, and facesim (each labeled "4") under four configurations, y-axis 0 to 250: ① 1 Processor Local, ② 1 Processor Interleaving, ③ 2 Processor Interleaving, ④ 1 Processor Remote.]
[Figure: facesim alone (labeled "4") under the same four configurations, ① 1 Processor Local, ② 1 Processor Interleaving, ③ 2 Processor Interleaving, ④ 1 Processor Remote; y-axis 0 to 250.]
Why does the memory interleaving policy perform better than the local memory policy?
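A small sketch can make the trade-off behind this question concrete. It assumes a two-node machine and the following reading of the four configurations: threads run on node 0, or on both nodes for ③, and memory is placed locally, round-robin across all nodes, or entirely on the other node. It also assumes every thread touches every page equally. The names `CONFIGS`, `home_node`, `local_fraction`, and `controllers_used` are hypothetical. The sketch counts two quantities for each configuration: how many accesses stay local, and how many memory controllers share the traffic.

```python
from fractions import Fraction

NODES = (0, 1)  # assumption: two sockets, one NUMA node each

# Hypothetical reading of the four configurations: (nodes that run threads, page-placement policy)
CONFIGS = {
    1: ((0,), "local"),         # 1 Processor Local
    2: ((0,), "interleave"),    # 1 Processor Interleaving
    3: ((0, 1), "interleave"),  # 2 Processor Interleaving
    4: ((0,), "remote"),        # 1 Processor Remote
}


def home_node(page: int, cpu_nodes: tuple, policy: str) -> int:
    """Return the NUMA node that holds `page` under the given placement policy."""
    if policy == "local":
        return cpu_nodes[0]  # every page on the node the threads run on
    if policy == "interleave":
        return NODES[page % len(NODES)]  # round-robin across all nodes
    if policy == "remote":
        return next(n for n in NODES if n not in cpu_nodes)  # every page on another node
    raise ValueError(f"unknown policy: {policy}")


def local_fraction(config: int, num_pages: int = 1024) -> Fraction:
    """Fraction of accesses served locally, assuming each thread touches every page equally."""
    cpu_nodes, policy = CONFIGS[config]
    local = sum(
        home_node(page, cpu_nodes, policy) == node
        for node in cpu_nodes
        for page in range(num_pages)
    )
    return Fraction(local, len(cpu_nodes) * num_pages)


def controllers_used(config: int, num_pages: int = 1024) -> int:
    """Number of distinct memory controllers (nodes) holding at least one page."""
    cpu_nodes, policy = CONFIGS[config]
    return len({home_node(page, cpu_nodes, policy) for page in range(num_pages)})
```

Under these assumptions, configurations 1 to 4 give local fractions of 1, 1/2, 1/2, and 0. They use 1, 2, 2, and 1 memory controllers. Interleaving gives up some locality in exchange for spreading traffic over both controllers. These counts line up with the MI row (1, 2, 2, 1) of the configuration table in the Result section. Scaled by 2, they also match its LR row (2, 1, 1, 0).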
Design
Effective Factor
How to measure
Effective Factor
2. Design
[Figure: six panels sharing a y-axis of 0 to 250, each plotted against one PMU-derived metric: LLC Hit Ratio (0 to 0.5), GQ_data_from_LLC (0 to 3×10^8), GQ_data_from_QPI (0 to 2×10^8), GQ_DATA.FROM_IMC (0 to 6×10^10), QHL_REQUEST.REMOTE (0 to 2×10^10), and MBW (0 to 70,000).]
Effective factors: LLC, memory interleaving, and the local/remote ratio.
Effective Factor (workload characteristics):
- SD: shared data
- MM: memory usage
- WS: working set size
- RW: read/write ratio
- NM: NUMA weight
(Source: Bienia, Christian, et al. The PARSEC benchmark suite: Characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. ACM, 2008, pp. 72-81.)
PMU events:
- A: UNC_IMC_NORMAL_READS.ANY
- B: UNC_IMC_WRITES.FULL.ANY
- Memory usage = A + B
- Read/write ratio = A / B
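The two formulas above map directly onto a pair of helper functions. This sketch assumes the raw event counts have already been collected by some PMU tool. It does not read counters itself. The names `ImcCounts`, `memory_usage`, and `read_write_ratio` are hypothetical. The zero-write handling is also a choice made here, not something specified on the slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImcCounts:
    """Raw uncore IMC event counts for one run (collected elsewhere; hypothetical container)."""
    normal_reads: int  # A: UNC_IMC_NORMAL_READS.ANY
    full_writes: int   # B: UNC_IMC_WRITES.FULL.ANY


def memory_usage(c: ImcCounts) -> int:
    """Usage = A + B: total read and write events seen by the memory controller."""
    return c.normal_reads + c.full_writes


def read_write_ratio(c: ImcCounts) -> float:
    """Read/write ratio = A / B; inf when there are reads but no writes, 0.0 for an idle run."""
    if c.full_writes == 0:
        return math.inf if c.normal_reads else 0.0
    return c.normal_reads / c.full_writes
```

For example, `ImcCounts(normal_reads=300, full_writes=100)` gives a usage of 400 and a read/write ratio of 3.0.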
How to Measure
$$P_v^{\top} \times M \times S_v \propto \text{Performance}$$
- P_v: performance vector table for workloads
- S_v: system performance vector table
- M: transformation matrix
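The table sizes in the Result section give the dimensions: five workload factors, three system factors, three workloads, and four NUMA configurations. With these sizes the product is dimensionally consistent and yields one score per workload and configuration. The symbol W for the result is introduced here only for convenience:

```latex
\[
\underbrace{P_v^{\top}}_{3 \times 5}\;
\underbrace{M}_{5 \times 3}\;
\underbrace{S_v}_{3 \times 4}
= \underbrace{W}_{3 \times 4},
\qquad
W_{wc} = \sum_{i \in \{\mathrm{SD},\mathrm{MM},\mathrm{WS},\mathrm{RW},\mathrm{NM}\}}
         \sum_{j \in \{\mathrm{LC},\mathrm{MI},\mathrm{LR}\}}
         (P_v)_{iw}\, M_{ij}\, (S_v)_{jc}
\]
```

Each score W_{wc} sums a workload factor's weight, times its influence on a system factor, times that system factor's value in configuration c.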
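$$P_v^{\top} \times \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \times S_v$$
The columns of P_v^⊤ (and the rows of M) correspond to SD, MM, WS, RW, NM. The columns of M (and the rows of S_v) correspond to LC, MI, LR.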
Transformation matrix M (rows: workload factors; columns: system factors):
Factor  LC  MI  LR
SD       0   0   0
MM       1   1   0
WS       0   0   1
RW       0   0   1
NM       1   0   0
Heuristic method: if a workload factor affects a system factor, the corresponding entry in this table is set to 1; otherwise it is 0.
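The heuristic amounts to turning a set of "factor X affects factor Y" judgments into a 0/1 matrix. In this sketch the influence pairs are simply read back off the table above. The names `INFLUENCES` and `build_transformation_matrix` are hypothetical.

```python
WORKLOAD_FACTORS = ("SD", "MM", "WS", "RW", "NM")  # rows of M
SYSTEM_FACTORS = ("LC", "MI", "LR")                # columns of M

# Influence pairs read off the table: (workload factor, system factor it affects)
INFLUENCES = {("MM", "LC"), ("MM", "MI"), ("WS", "LR"), ("RW", "LR"), ("NM", "LC")}


def build_transformation_matrix(influences: set) -> list:
    """Heuristic: M[i][j] = 1 if workload factor i affects system factor j, else 0."""
    return [
        [1 if (w, s) in influences else 0 for s in SYSTEM_FACTORS]
        for w in WORKLOAD_FACTORS
    ]
```

Storing the judgments as pairs rather than as a literal matrix keeps each assumption visible. Changing one pair changes exactly one entry of M.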
Result
Performance Vector for Workloads
3. Result
Factor  Canneal  StreamCluster  Facesim
SD      1        2              1
MM      1        2              3
WS      1        0              1
RW      1        2              1
NM      2        2              2
SD: shared data, MM: memory usage, WS: working set size, RW: read/write ratio, NM: NUMA weight
Performance Vector for NUMA Configuration
Configurations: 1 = 1 Processor Local, 2 = 1 Processor Interleaving, 3 = 2 Processor Interleaving, 4 = 1 Processor Remote
Factor  1  2  3  4
LC      1  1  2  1
MI      1  2  2  1
LR      2  1  1  0
LC: LLC size, MI: memory interleaving, LR: local/remote ratio
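The measurement formula can be evaluated mechanically from the two tables above and the matrix M. This is a plain-Python sketch of the matrix product P_v^⊤ × M × S_v, using the values as transcribed here. The helper names are hypothetical. The result does not reproduce the slide's computed-weight table. For canneal, the direct product gives 8, 7, 10, 4, while the slide lists 9, 7, 9, 3. The slide's values may therefore include the heuristic adjustment described with M. Treat this only as an illustration of the formula's mechanics.

```python
# P_v: rows SD, MM, WS, RW, NM; columns canneal, streamcluster, facesim
P_V = [
    [1, 2, 1],
    [1, 2, 3],
    [1, 0, 1],
    [1, 2, 1],
    [2, 2, 2],
]
# M: rows SD, MM, WS, RW, NM; columns LC, MI, LR
M = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
# S_v: rows LC, MI, LR; columns configurations 1 to 4
S_V = [
    [1, 1, 2, 1],
    [1, 2, 2, 1],
    [2, 1, 1, 0],
]


def transpose(a: list) -> list:
    return [list(row) for row in zip(*a)]


def matmul(a: list, b: list) -> list:
    """Plain matrix product; raises if the inner dimensions disagree."""
    if len(a[0]) != len(b):
        raise ValueError(f"shape mismatch: {len(a[0])} columns vs {len(b)} rows")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def performance_weights(p_v: list, m: list, s_v: list) -> list:
    """Rows: workloads; columns: configurations (P_v^T x M x S_v)."""
    return matmul(matmul(transpose(p_v), m), s_v)
```

Applied to the transcribed tables, `performance_weights(P_V, M, S_V)` returns `[[8, 7, 10, 4], [10, 10, 14, 6], [12, 13, 18, 8]]` for canneal, streamcluster, and facesim.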
Result
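Actual performance (columns: configurations 1 to 4):
Workload  1   2   3   4
canneal   10  8   9   6
stream    6   5   6   4
facesim   5   5   6   4

Computed performance weight (columns: configurations 1 to 4):
Workload  1   2   3   4
canneal   9   7   9   3
stream    12  10  12  4
facesim   13  12  14  5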
Pearson correlation coefficient between actual performance and computed weight, per workload:
canneal 0.966092, streamcluster 0.965581, facesim 0.9
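The reported coefficients can be re-derived from the two result tables. Each workload's row of actual performance is paired with its row of computed weights across the four configurations. This sketch implements the standard Pearson formula by hand, so it needs nothing beyond the standard library. The dictionary names are hypothetical, and the keys spell out "streamcluster" for the "stream" rows.

```python
import math

# Rows transcribed from the result tables (configurations 1 to 4)
ACTUAL = {
    "canneal": [10, 8, 9, 6],
    "streamcluster": [6, 5, 6, 4],
    "facesim": [5, 5, 6, 4],
}
COMPUTED = {
    "canneal": [9, 7, 9, 3],
    "streamcluster": [12, 10, 12, 4],
    "facesim": [13, 12, 14, 5],
}


def pearson(x: list, y: list) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    for name in ACTUAL:
        print(f"{name}: {pearson(ACTUAL[name], COMPUTED[name]):.6f}")
```

Running it prints `canneal: 0.966092`, `streamcluster: 0.965581`, and `facesim: 0.900000`, matching the coefficients above.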
Conclusion
4. Conclusion
Why does the memory interleaving policy perform better than the local memory policy?
Effective factors: LLC, memory interleaving, and the local/remote ratio
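$$P_v^{\top} \times \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \times S_v \propto \text{Performance}$$
(rows of M: SD, MM, WS, RW, NM; columns of M: LC, MI, LR)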
Pearson correlation coefficient: canneal 0.966092, streamcluster 0.965581, facesim 0.9
Q&A
