GPU submit-time frequency boosting

Somdutta Roy

Improving frequency scaling for GPU-bound workloads to get better responsiveness than traditional DVFS scaling

IMPROVING GPU FREQUENCY SCALING FOR GPU WORKLOADS

TYPICAL DVFS-BASED GPU BOOST MECHANISM
• GPU frequency boosting is wired through the devfreq governor
• The governor monitors GPU busyness and tries to keep the current load under a given target load by adjusting the GPU frequency, using tunables such as settling time, bias, damp and rampdown_delay
• Roughly: boost_freq = bias * freq * (load - target) / target
• Well suited to sustained loads and to burstiness within a high-load window
• More aggressive tunings make the governor more reactive
• However, they also keep the GPU constantly overpowered
• For example, a target_load that is too low or a rampdown_delay that is too high
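
The load-tracking rule above can be sketched in a few lines. This is a minimal illustration, not the actual governor code: the class and function names, the frequency limits and the choice to add boost_freq to the current frequency are all assumptions, and the settling time, damp and rampdown_delay tunables are left out.

```python
from dataclasses import dataclass


@dataclass
class GovernorTunables:
    """Hypothetical tunables; names follow the slide, values are made up."""
    target_load: float = 0.6  # desired busy fraction of the GPU (0..1)
    bias: float = 1.0         # gain applied to the load error


def next_frequency(freq_mhz, load, tun, f_min=100.0, f_max=1000.0):
    """Return the frequency for the next governor sample.

    Computes boost_freq = bias * freq * (load - target) / target, adds it to
    the current frequency (an assumption), and clamps to [f_min, f_max].
    A load below target gives a negative boost, i.e. a ramp down.
    """
    boost = tun.bias * freq_mhz * (load - tun.target_load) / tun.target_load
    return max(f_min, min(f_max, freq_mhz + boost))
```

With target_load = 0.6 and bias = 1.0, a sample of 90% load at 400 MHz asks for roughly 600 MHz, while 30% load at the same frequency asks for roughly 200 MHz. Lowering target_load makes the boost term larger for the same load, which is the overpowering trade-off the slide describes.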

PROBLEM
• Low-latency VR use cases typically present repetitive and bursty GPU workloads
• What is needed is guaranteed GPU horsepower exactly when the workload gets scheduled
• The load dies down quickly (but is very likely to repeat), so the frequency needs to fall quickly and ramp back up just as quickly
• Typical use cases exhibiting this kind of burstiness are camera post-processing, edge detection, ATW...
• The current governor's slow response in ramping up frequency shows up clearly as low overall perf/watt

JUST IN (SUBMIT) TIME FREQ SCALING
• Density of work submission (per unit time) forms the basis of GPU load
• There is a delay (on the order of ms) between a submit and the governor seeing the resulting load
• This translates into latency in the effective GPU frequency boost
• A short boost pulse in the submit code path takes care of the ramp-up latency
• This inherently makes the frequency follow the workload
• The governor is now more likely to see a lower load and pull the frequency down
• Effective GPU frequency comes down to fmax@vmin for the profiled use cases (giving better perf/watt)
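
One way to picture the submit-time boost pulse is as a temporary frequency floor that is armed on every submit and expires on its own. The sketch below is only an illustration under that assumption: SubmitBoost, on_submit, effective_freq, the 600 MHz floor and the 2 ms pulse length are hypothetical and do not come from the slides or from any driver.

```python
class SubmitBoost:
    """Hypothetical helper: holds a frequency floor briefly after each submit."""

    def __init__(self, boost_mhz=600.0, pulse_ms=2.0):
        self.boost_mhz = boost_mhz  # assumed floor while the pulse is active
        self.pulse_ms = pulse_ms    # assumed pulse length
        self._boost_until_ms = float("-inf")

    def on_submit(self, now_ms):
        # Called from the (hypothetical) work-submission path: (re)arm the pulse.
        self._boost_until_ms = now_ms + self.pulse_ms

    def effective_freq(self, governor_mhz, now_ms):
        # While the pulse is active, never run below the boost floor;
        # once it expires, the governor's choice applies unchanged.
        if now_ms < self._boost_until_ms:
            return max(governor_mhz, self.boost_mhz)
        return governor_mhz
```

Because the floor is applied at submit time, the frequency rises before the governor has sampled any load, and because the pulse expires by itself, the governor remains free to pull the frequency down between bursts.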

PERF/POWER DATA ACROSS USE CASES
Use case                                   GPU-intensive section    Section time (ms)  Avg GPU busyness  Avg GPU freq (MHz)  Avg GPU power (mW)  Avg total power, VDD_IN (mW)  % perf/watt increase
Pupil Detection (with JIT scaling)         Edge Detection           11.004             34                497                 471                 5488                          99.623182
Pupil Detection (with default scaling)     Edge Detection           21.158             182               293                 421                 5286
Passthrough camera (with JIT scaling)      Camera to Display (e2e)  40.599             219               596                 856                 7591                          4.5763017
Passthrough camera (with default scaling)  Camera to Display (e2e)  45.466             590               283                 837                 8129
Passthrough camera (with max GPU)          Camera to Display (e2e)  40.025             153               1331                1377                8677
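
The headline comparisons can be recomputed from the section times and total power figures in the slide's data. The sketch below is a hedged illustration: the dictionary keys and the compare helper are made-up names, and since the slide does not state how its % perf/watt column was derived, the code does not try to reproduce that column.

```python
# (section time in ms, average total power in mW), copied from the slide's data
RUNS = {
    "pupil_jit": (11.004, 5488),
    "pupil_default": (21.158, 5286),
    "passthrough_jit": (40.599, 7591),
    "passthrough_default": (45.466, 8129),
}


def compare(jit, default):
    """Return (latency_speedup, total_power_ratio) of `jit` relative to `default`."""
    (t_jit, p_jit), (t_def, p_def) = RUNS[jit], RUNS[default]
    return t_def / t_jit, p_jit / p_def


pupil_speedup, pupil_power = compare("pupil_jit", "pupil_default")
pass_speedup, pass_power = compare("passthrough_jit", "passthrough_default")
```

For pupil detection this gives a speedup of about 1.92x for about 3.8% more total power; for the passthrough camera it gives about 1.12x for about 6.6% less total power.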

PUPIL DETECTION WITH CURRENT FREQ SCALING
Metric                            Avg      Max      Min
GPU-intensive code latency (ms)   21.158   843.41   7.289
GPU busyness                      182      401      57
GPU frequency (MHz)               293      595      109
GPU power (mW)                    421      534      152

PUPIL DETECTION WITH JIT FREQ SCALING
Metric                            Avg      Max      Min
GPU-intensive code latency (ms)   11.004   957.52   5890
GPU busyness                      34       504      10
GPU frequency (MHz)               497      790      109
GPU power (mW)                    471      610      152

PASSTHROUGH WITH DEFAULT FREQ SCALING
Metric                            Avg      Max      Min
GPU-intensive code latency (ms)   45.466   82.425   33.461
GPU busyness                      590      946      173
GPU frequency (MHz)               283      693      109
GPU power (mW)                    837      838      761

PASSTHROUGH WITH JIT FREQ SCALING
Metric                            Avg      Max      Min
GPU-intensive code latency (ms)   40.599   63.626   31.690
GPU busyness                      219      390      67
GPU frequency (MHz)               596      790      303
GPU power (mW)                    856      914      762
