David Lo, Dragos Sbirlea, Rohit Jnagal
Managing Memory Bandwidth Antagonism @ Scale
David, Dragos & Rohit
Borg Model
● Large clusters with multi-tenant hosts.
● Run a mix of :
○ high and low priority workloads.
○ latency-sensitive and batch workloads.
● Isolation through bare-metal containers
(cgroups/namespaces)
○ Cgroups and perf to monitor host and job
performance.
○ Cgroups and h/w controls to manage
on-node performance.
○ Cluster scheduling and balancing manages
service performance.
[Diagram: Efficiency / Availability / Performance]
3
The Memory Bandwidth Problem
● Large variation in performance
on multi-tenant hosts.
● On average, saturation events are few, but:
○ they periodically cause significant cluster-wide performance degradation.
● Some workloads are much more
seriously affected than others.
○ Does not necessarily correlate
with victim’s memory bandwidth
use.
[Chart: latency over time, spiking when an antagonist task starts.]
4
Note : This talk is focused on the membw problem for general servers and does not cover GPUs and other special devices. Similar techniques apply there too.
Memory BW Saturation is Increasing Over Time
[Chart: fraction of machines that experienced mem BW saturation, Jan 2018 – Nov 2018, increasing over time.]
5
Why It Is a (Bigger) Problem Now
● Large machines need to pack more jobs to maintain utilization, resulting in more “noisy neighbor” problems.
● ML workloads are memory BW intensive
6
● Track per-socket local and remote memory bandwidth use
● Identify per-platform thresholds for performance dips (saturation)
● Characterize saturation by platform and clusters
Understanding the Scope : Socket-Level Monitoring
[Diagram: Socket 0 and Socket 1, each with local and remote memory read/write traffic; see the perf sketch after this slide.]
7
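Not the production pipeline described in the talk, but a rough stand-in using stock Linux perf: on many Intel servers the uncore IMC PMUs expose CAS-count events that approximate per-socket DRAM traffic. PMU and event names vary by CPU generation, so treat the names below as placeholders and verify them with `perf list`.
$ # Per-socket DRAM traffic for 1 second via the integrated memory controller PMUs.
$ # Repeat for each uncore_imc_N present on the machine. perf usually scales CAS
$ # counts to MiB; otherwise each CAS transfers one 64-byte cache line.
$ perf stat -a --per-socket \
      -e uncore_imc_0/cas_count_read/,uncore_imc_0/cas_count_write/ \
      -- sleep 1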
Platform and Cluster Variation
Saturation behavior varies with platform and cluster, due to
● hardware differences (membw/core ratio)
● workload (large CPU consumers run on bigger platforms)
[Charts: saturation broken down by platform and by cluster.]
8
● Socket-level information gives the magnitude of the
problem and hot-spots
● Need task-level information to identify:
○ Abusers : tasks using a disproportionate amount of bandwidth
○ Victims : tasks seeing performance drop
● New platforms provide task-level memory bandwidth
monitoring, but:
○ RDT cgroup was on its way out
○ Have no data on older platforms
For our purposes, a rough attribution of memory bandwidth
was good enough
Monitoring Sockets ↣ Monitoring Tasks
[Charts: total memory bandwidth against the saturation threshold, and the per-task memory BW breakdown.]
9
● Summary of requirements:
○ Local and remote bandwidth breakdown
○ Compatible with the cgroup model
● What's available in hardware?
○ Uncore counters (IMC, CHA)
■ Difficult to attribute to HyperThread => cgroup
○ CPU PMU counters
■ Counters are HyperThread local
■ Works with cgroup profiling mode
[Diagram: DDR memory behind the IMC; two CPU cores, each with a CHA and hyperthreads HT0/HT1.]
Per-task Memory Bandwidth Estimation
10
● OFFCORE_RESPONSE for Intel CPUs
● Programmable filter to specify events of interest (e.g., local DRAM and remote DRAM)
● Captures both demand load and HW prefetcher traffic
● Online documentation of the meaning of bits, per CPU (download.01.org)
● How to interpret: cache lines/sec × 64 bytes/cache line = BW (see the sketch after this slide)
Intel SDM Vol 3
Which CPU Perfmon to Use?
11
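A hedged illustration using stock perf rather than the internal tooling: perf's cgroup mode (-G) can attribute OFFCORE_RESPONSE counts to a container's cgroup. The event names below are placeholders that differ per CPU model; look up the right ones with `perf list offcore*` or on download.01.org. The cgroup name is resolved relative to the perf_event cgroup hierarchy.
$ # Count local- and remote-DRAM cache-line fills for cgroup "myjob" over 1 second.
$ # -G takes one cgroup per event, hence "myjob,myjob".
$ perf stat -a -G myjob,myjob \
      -e offcore_response.all_data_rd.l3_miss.local_dram \
      -e offcore_response.all_data_rd.l3_miss.remote_dram \
      -- sleep 1
$ # Estimated BW = (event count / 1 s) x 64 bytes per cache line.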
Abuser insights
● A large percentage of the time, a single consumer uses up most of the bandwidth.
● That consumer’s share of CPU is much lower than its share of membw.
Victim insight
● Many jobs are sensitive to membw saturation.
● Jobs are sensitive even though they are not big users of membw.
Guidance on enforcement options
● How much saturation would we avoid if we do X?
● Which jobs would get caught in the crossfire?
Insights from Task Measurement
[Charts: number of jobs vs. CPI degradation on saturation (as a fraction); combinations of jobs (by CPU requirements) during saturation.]
12
Enforcement : Controlling Different Workloads
[Matrix: action by priority (Low / Medium / High) and memory BW usage (Moderate / Heavy): Isolate, Throttle, Reactive rescheduling, Disable. Low-priority heavy users are throttled; heavy antagonists that cannot be throttled or redistributed are disabled.]
13
What Can We Do ? Node and Cluster Level Actuators
Node
Memory Bandwidth Allocation in hardware
Use HW QoS to apply max limits to tasks
overusing memory bandwidth.
CPU throttling for indirect control
Limit CPU access of over-using tasks to
indirectly limit the memory bandwidth used.
Cluster
Reactive evictions & re-scheduling
Hosts experiencing memory BW saturation signal the scheduler to redistribute bigger memory bandwidth users to lightly-loaded machines.
Disabling heavy antagonist workloads
Tasks that saturate a socket by themselves cannot be effectively redistributed. If slowing them down is not an option, de-schedule them.
14
Node : CPU Throttling
[Diagram: Socket 0 (saturated) vs. Socket 1, with the CPUs running memBW over-users highlighted.]
+ Very effective in reducing saturation
+ Works on all platforms
- Too coarse in granularity
- Interacts poorly with Autoscaling & Load-balancing
15
Throttling - Enforcement Algorithm
1. Every x seconds, the socket memory BW saturation detector reads the socket perf counters.
2. If socket BW > saturation threshold, the cgroup memory BW estimator profiles potentially eligible tasks from socket and cgroup perf counters.
3. A policy filter selects eligible tasks for throttling, and the memory BW enforcer restricts their CPU runnable mask.
4. If socket BW < unthrottle threshold, unthrottle tasks.
(A minimal sketch of this loop follows.)
16
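This is not the production enforcer, only a shell sketch of the loop: socket_bw and throttle_candidates are hypothetical helpers standing in for the perf-counter-based detector and the policy filter, and "throttling" here shrinks the runnable-CPU mask via cgroup v2 cpuset. Thresholds, intervals, and CPU ranges are illustrative.
#!/bin/bash
SAT=$((80 * 1024**3))        # assumed saturation threshold, bytes/s
CLEAR=$((60 * 1024**3))      # assumed unthrottle threshold, bytes/s
THROTTLED=""                 # cgroups currently throttled
while sleep 10; do                                     # "every x seconds"
  bw=$(socket_bw 0)                                    # hypothetical: bytes/s on socket 0
  if (( bw > SAT )); then
    for cg in $(throttle_candidates 0); do             # hypothetical policy filter
      echo "0-3" > "/sys/fs/cgroup/${cg}/cpuset.cpus"  # shrink the runnable mask
      THROTTLED="$THROTTLED $cg"
    done
  elif (( bw < CLEAR )); then
    for cg in $THROTTLED; do
      echo "0-63" > "/sys/fs/cgroup/${cg}/cpuset.cpus" # restore the full mask
    done
    THROTTLED=""
  fi
done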
Node : Memory Bandwidth Allocation
Intel RDT
Memory Bandwidth Allocation
+ Reduces bandwidth without lowering CPU utilization.
+ Somewhat finer-grained than CPU-level controls.
- Newer platforms only.
- Can’t isolate well between hyperthreads.
Supported through resctrl in the kernel (example below; more on resctrl later)
17
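A hedged example of driving MBA through resctrl; the group name and percentages are illustrative, and the info files come from the upstream resctrl documentation:
$ cat /sys/fs/resctrl/info/MB/min_bandwidth       # smallest throttle value the HW supports
$ cat /sys/fs/resctrl/info/MB/bandwidth_gran      # throttling granularity
$ mkdir /sys/fs/resctrl/bw_throttled
$ echo "MB:0=20;1=20" > /sys/fs/resctrl/bw_throttled/schemata   # cap the group to ~20% on both sockets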
In many cases, there are:
● A low-percentage of saturated sockets in cluster, and
● Multiple tasks contributing to saturation.
Re-scheduling the tasks to less loaded machines can avoid
slow-downs.
Does not help with large antagonists that can saturate any socket they run on.
Cluster : Reactive Re-Scheduling
[Diagram: the observer on saturated host A calls the scheduler for help (1. Call for help); the scheduler evicts tasks (2. Evict) and reschedules them (3. Reschedule) onto hosts B, C, or D.]
18
Low priority jobs can be dealt with at node level through throttling.
If SLOs do not permit throttling and the antagonists cannot be redistributed :
● Disable (kick out of the cluster)
● Users can then reconfigure their service to use a different product.
● Area of continual work.
Alternative :
● Colocate multiple antagonists (that’s just working around SLOs)
Handling Cluster-Wide Saturation
[Charts: a cluster membw distribution amenable to rescheduling vs. one amenable to job disabling, each with the saturation threshold marked.]
19
Results : CPU Throttling + Rescheduling
20
Results : Rebalancing
21
● New, unified interface: resctrl
● resctrl is a big improvement over the previous non-standard cgroup interface
● Uniform way of monitoring/controlling HW QoS across vendors/architectures
○ AMD, ARM, Intel
● (Non-exhaustive) list of HW features supported:
○ Memory BW monitoring
○ Memory BW throttling
○ L3 cache usage monitoring
○ L3 cache partitioning
resctrl : HW QoS Support in Kernel
22
● The terminology below is x86-specific
● CLass of Service ID (CLOSID): maps to a QoS configuration. Typically O(10) unique
ones in HW.
● Resource Monitoring ID (RMID): used to tag workloads and their used resources to
aggregate their resource usage. Typically O(100) unique ones in HW.
Intro to HW QoS Terms and Concepts
[Diagram: Hi priority (CLOSID 0): 100% L3 cache, 100% mem BW; Low priority (CLOSID 1): 50% L3 cache, 20% mem BW. RMID0–RMID4 tag Workloads A, B, and C.]
(The CLOSID/RMID counts can be queried as shown after this slide.)
23
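On a machine with resctrl mounted, the hardware limits can be read directly; the file paths come from the upstream resctrl documentation, and the reported numbers vary by CPU:
$ cat /sys/fs/resctrl/info/L3/num_closids       # unique QoS configurations (CLOSIDs)
$ cat /sys/fs/resctrl/info/L3_MON/num_rmids     # unique monitoring tags (RMIDs)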
resctrl/
|- groupA/
| |- mon_groups/
| | |- monA/
| | | |- mon_data/
| | | |- tasks
| | | |- ...
| | |- monB/
| | |- mon_data/
| | |- ...
| |- schemata
| |- tasks
| |- ...
|- groupB/
|- ...
Overview of resctrl Filesystem
Documentation: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
groupA/, groupB/ : resource control groups; each represents one unique HW CLOSID.
mon_groups/monA/, monB/ : monitoring groups; each represents one unique HW RMID.
schemata : QoS configuration for the resource control group.
tasks (in a resource control group) : TIDs in that resource control group.
tasks (in a monitoring group) : TIDs in that monitoring group.
mon_data (under a resource control group) : resource usage data for the entire resource control group.
mon_data (under a monitoring group) : resource usage data for the monitoring group.
(A hands-on example of building this hierarchy follows this slide.)
24
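Building the hierarchy above by hand looks roughly like this; group names are illustrative and $TID stands for a thread of the workload to track:
$ mount -t resctrl resctrl /sys/fs/resctrl
$ mkdir /sys/fs/resctrl/groupA                               # new control group => allocates a CLOSID
$ mkdir /sys/fs/resctrl/groupA/mon_groups/monA               # new monitoring group => allocates an RMID
$ echo $TID > /sys/fs/resctrl/groupA/tasks                   # bind the thread to the control group
$ echo $TID > /sys/fs/resctrl/groupA/mon_groups/monA/tasks   # and to the monitoring group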
Example Usage of resctrl Interfaces
$ cat groupA/schemata
L3:0=ff;1=ff
MB:0=90;1=90
$ READING0=$(cat groupA/mon_data/mon_L3_00/mbm_total_bytes)
$ sleep 1
$ READING1=$(cat groupA/mon_data/mon_L3_00/mbm_total_bytes)
$ echo $((READING1-READING0))
1816234126
Allowed to use 8 cache ways for L3 on both sockets.
Per-core memory BW constrained to 90% on both sockets.
Compute memory BW by taking a rate; in this case, BW ≈ 1.8 GB/s.
25
Reconciling resctrl and cgroups: First Try
resctrl/
|- no_throttle/
| |- mon_groups/
| | |- cgroupX/
| | | |- mon_data/
| | | |- tasks
| | | |- ...
| | |- monB/
| | |- mon_data/
| | |- ...
| |- schemata
| |- tasks
| |- ...
|- bw_throttled/
|- ...
<< #1
<< #1
<< #1
<< #3
<< #5 ↻
<< #6 ↻
Use case: dynamically apply memory BW throttling if
machine is in trouble
1. Node SW creates 2 resctrl groups: no_throttle
and bw_throttled
2. On cgroup creation, logically assign cgroupX to
no_throttle
3. Create a mongroup for cgroupX in
no_throttle
4. Start cgroupX
5. Move TIDs into no_throttle/tasks
6. Move TIDs into
no_throttle/mon_groups/cgroupX/tasks
7. Move TIDs of high BW user into bw_throttled (shell sketch below)
26
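A rough shell rendering of steps 5–7; cgroup paths are illustrative, cgroup v2 lists threads in cgroup.threads (v1 in tasks), and resctrl accepts one TID per write:
# Steps 5-6: move every thread of cgroupX into the resctrl group and its mon group.
for tid in $(cat /sys/fs/cgroup/cgroupX/cgroup.threads); do
  echo "$tid" > /sys/fs/resctrl/no_throttle/tasks
  echo "$tid" > /sys/fs/resctrl/no_throttle/mon_groups/cgroupX/tasks
done
# Step 7: later, if cgroupX turns out to be a heavy BW user, move it again.
for tid in $(cat /sys/fs/cgroup/cgroupX/cgroup.threads); do
  echo "$tid" > /sys/fs/resctrl/bw_throttled/tasks
done
# Any thread created while these loops run is missed -- the race called out on the next slide.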
Use case: dynamically apply memory BW throttling if
machine is in trouble
1. Node SW creates 2 resctrl groups: no_throttle
and bw_throttled
2. On cgroup creation, logically assign cgroupX to
no_throttle
3. Create a mongroup for cgroupX in
no_throttle
4. Start cgroupX
5. Move TIDs into no_throttle/tasks
6. Move TIDs into
no_throttle/mon_groups/cgroupX/tasks
7. Move TIDs of high BW user into bw_throttled
Challenges with Naive Approach
Racy if the cgroup is creating threads while TIDs are moved; expensive when there are many TIDs and when handling the race.
Desynchronization of L3 cache
occupancy data, since existing
data is tagged with an old RMID.
27
● What if we could map cgroups 1:1 to resctrl groups?
○ To change QoS configs, just rewrite schemata
○ More efficient: removes the need to move TIDs around
○ Keeps the existing RMID, preventing the L3 occupancy desynchronization issue
○ 100% compatible with the existing resctrl abstraction
● CHALLENGE: with the existing system, we would run out of CLOSIDs very quickly
● SOLUTION: share CLOSIDs between resource control groups that have the same schemata
● Google-developed kernel patch for this functionality to be released soon
● Demonstrates need to make cgroup model a first class consideration for QoS
interfaces
A Better Approach for resctrl and cgroups
28
cgroups and resctrl After the Change
resctrl/
|- cgroupX/
| |- mon_groups/
| | |- mon_data/
| | |- ...
| |- schemata
| |- tasks
| |- ...
|- high_bw_cgroup/
| |- schemata
| |- ...
|- ...
<< #1
<< #4 ↻
Use case: dynamically apply memory BW throttling if
machine is in trouble
1. Create a resctrl group cgroupX
2. Write no throttling configuration to
cgroupX/schemata
3. Start cgroupX
4. Move TIDs into cgroupX/tasks
5. Rewrite the schemata of the high-BW cgroup to throttle it (sketch below)
<< #2
<< #5
29
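A hedged sketch of the same flow once a cgroup maps 1:1 to a resctrl group (this assumes the CLOSID-sharing patch described earlier; names, the TID variable, and percentages are illustrative):
mkdir /sys/fs/resctrl/cgroupX                              # one resctrl group per cgroup
echo "MB:0=100;1=100" > /sys/fs/resctrl/cgroupX/schemata   # start unthrottled
echo "$TID0" > /sys/fs/resctrl/cgroupX/tasks               # one-time move at start; no TID churn later
# ...later, if cgroupX turns out to be a heavy BW user:
echo "MB:0=20;1=20" > /sys/fs/resctrl/cgroupX/schemata     # throttle by rewriting schemata only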
● Measuring µArch impact is not a first class component of
most container runtimes.
○ Can’t manage what we can’t see...
● Most container runtimes expose isolation knobs per
container.
● Managing µArch isolation requires node and cluster level
feedback-loops.
○ Dual operating mode : admins & users.
○ Performance isolation not necessarily controllable by
end-users.
We would love to contribute to a standard framework around
performance management for container runtimes.
µArch Features & Container Runtimes
[Diagram: Efficiency / Availability / Performance]
30
Takeaways and Future work
● Memory bandwidth and low-level isolation issues are becoming more significant.
● Continuous monitoring is critical to run successful multi-tenant hosts.
● Defining requirements for h/w providers and s/w interfaces on QoS knobs.
○ Critical to have these solutions work for containers / process-groups.
● Increasing success rate with current approach:
○ Handling of minimum guaranteed membw usage
○ Handling logically related jobs - Borg allocs
● A general framework would help collaboration.
● Future : Memory BW scheduling (based on hints)
○ Based on membw usage
○ Based on membw sensitivity
31
Find us at the conf or reach out at :
davidlo@
dragoss@
google.com
jnagal@
eranian@
Thanks !
32
