Closed-Loop Network Automation for Optimal Resource Allocation via Reinforcement Learning by Bin Li

Bin Li – Intel Labs
The Intersection of Networking and AI/ML Meetup – August 2019

2
Contributors
• Yipeng Wang (yipeng1.wang@intel.com)
• Ren Wang (ren.wang@intel.com)
• Charlie Tai (charlie.tai@intel.com)
• Andrew Herdrich (andrew.j.herdrich@intel.com)
• Tong Zhang (tong2.zhang@intel.com)
• Zhu Zhou (zhu.zhou@intel.com)

3
Agenda
• Intel® Resource Director Technology (Intel® RDT) Introduction
• Closed-loop Network Automation for Intel® RDT via Reinforcement Learning
• Research Proof of Concept

4
Introduction
• Network Function Virtualization (NFV) technology makes it possible for service providers to allocate resources to virtual network functions (VNFs) on demand
• Customer trend: reduce TCO by improving server utilization while maintaining workload performance
  • Co-locate best effort (BE) workloads with high priority workloads
  • Maintain Service Level Agreement (SLA) performance for the high priority jobs
• Running high priority and best effort workloads together can cause contention on shared resources

Intel® Resource Director Technology (Intel® RDT)
5
Model & Approach: Enable Monitoring and Control (Enforcement)
[Figure: the software domain (software policies) and the hardware domain (hardware policies, QoS hardware features) are connected through RDT exposure; QoS hints drive resource enforcement in hardware, and resource monitoring provides feedback to software]
Simple architectural CPUID/MSR interfaces: value across bare-metal OS, containers, cloud, consolidation, communications, SDI, NFV, ...

Intel® Resource Director Technology (Intel® RDT)
6
• Cache Monitoring Technology (CMT)
  • Per-thread L3 occupancy monitoring
  • Introduced in the Xeon E5 v3 series
• Cache Allocation Technology (CAT)
  • Per-thread L3 occupancy control
• Memory Bandwidth Monitoring (MBM)
  • Per-thread bandwidth monitoring
• Memory Bandwidth Allocation (MBA)
  • Per-core bandwidth control
  • New on the Xeon-SP family
[Figure: cache and memory monitoring and allocation for low priority (LP) and high priority (HP) cores sharing the L3 cache and the integrated memory controller (IMC)]
A Full-Featured Set of Technologies for Cache + Memory Monitoring and Control

7
Intel® RDT Usage in Networking
• NFV enables service providers to consolidate NFs on shared servers
• Use Intel® RDT to enforce a high degree of performance isolation between high priority VNFs and best effort tasks
[Figure: workload co-location. High priority VNFs and best effort tasks share the LLC, which is split into high priority and low priority portions]
• Classify VNFs with strict SLAs as high priority applications
• Launch best effort (BE) tasks on the same server to exploit any resources left unused by high priority VNFs

8
Current Approach for Intel® RDT Allocation
• RDT resource allocation:
  • Static resource allocation:
    • Run VNFs offline and find the RDT configuration that satisfies the SLA requirements for the worst case
    • Allocate the remaining RDT resources to BE
  • Static resource allocation based on 24-hour traffic behavior:
    • Network traffic exhibits daily busy and idle periods (diurnal patterns)
    • Allocate RDT based on the day or night worst case
    • Still static
[Figure: RDT resource allocation over time for VNF and BE workloads, contrasting the RDT resources reserved for VNFs with the resources they actually utilize]
The current approach protects high priority VNFs, but loses the opportunity to achieve higher system utilization.
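The static procedure above can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes an offline profile that maps each traffic load level to the HP VNF's measured packet loss for a given number of LLC ways, and it assumes an 11-way LLC (matching the 9 + 2 split used in the experiments). The names `min_ways_for_sla`, `static_allocation`, and `TOY_PROFILE` (with its made-up numbers) are hypothetical. The HP VNF gets the smallest way count that meets the SLA at the worst load in the window and BE gets the remainder; calling it once per day window and once per night window gives the diurnal variant.

```python
# Sketch of worst-case static RDT (CAT) allocation from an offline profile.
# All names and numbers here are hypothetical illustrations.

# TOY_PROFILE[load][ways] = packet-loss ratio of the HP VNF at a traffic load
# (fraction of line rate) when it owns `ways` LLC ways. Made-up values.
TOY_PROFILE = {
    0.3: {2: 0.001, 4: 0.0, 6: 0.0, 9: 0.0},
    0.6: {2: 0.05, 4: 0.002, 6: 0.0, 9: 0.0},
    1.0: {2: 0.20, 4: 0.08, 6: 0.01, 9: 0.0},
}


def min_ways_for_sla(loss_by_ways, sla_loss):
    """Smallest profiled way count whose packet loss meets the SLA, else None."""
    for ways in sorted(loss_by_ways):
        if loss_by_ways[ways] <= sla_loss:
            return ways
    return None


def static_allocation(profile, loads, sla_loss, total_ways):
    """Reserve enough HP ways for the worst load in `loads`; BE gets the rest."""
    needed = []
    for load in loads:
        ways = min_ways_for_sla(profile[load], sla_loss)
        if ways is None:
            raise ValueError(f"no profiled allocation meets the SLA at load {load}")
        needed.append(ways)
    hp_ways = max(needed)
    return hp_ways, total_ways - hp_ways


if __name__ == "__main__":
    # Whole-day worst case vs. a separate (hypothetical) night window.
    print(static_allocation(TOY_PROFILE, [0.3, 0.6, 1.0], 0.001, 11))  # (9, 2)
    print(static_allocation(TOY_PROFILE, [0.3, 0.6], 0.001, 11))       # (6, 5)
```

Either way the reservation is fixed for the whole window, which is exactly the utilization gap the slide points out.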

9
How Can We Do Better?
• Goal: ensure VNFs meet their SLAs while maximizing performance for BE tasks
  • Dynamically allocate the RDT resources left unused by high priority VNFs to BE tasks
• Why reinforcement learning?
  • Acts in an environment to maximize long-term rewards
  • Self-learns the optimal policy to achieve defined goals
  • Capable of adapting to a changing environment
  • Explores without requiring prior knowledge
Our Research: Dynamic RDT Resource Allocation Using Reinforcement Learning
[Figure: RDT resource allocation over time for VNF and BE workloads, highlighting the potential RDT resources that can be allocated to BE workloads]
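The headroom highlighted in the figure can be quantified with a small sketch. It assumes a hypothetical per-time-slot list of the minimum LLC ways the HP VNF actually needs (for example, from an oracle) and an 11-way LLC; the helper names are illustrative, not from the slides. BE receives every way the HP VNF does not need in each slot, and the second helper counts the extra way-slots BE gains over a static reservation.

```python
# Sketch: how many extra LLC way-slots BE could gain from dynamic allocation.
# `hp_needed` is a hypothetical per-slot minimum for the HP VNF.

def dynamic_split(hp_needed, total_ways):
    """Per-slot (HP, BE) ways when HP gets only what it needs in each slot."""
    # Keep at least one way for BE, because a CAT bitmask cannot be empty.
    hp = [min(need, total_ways - 1) for need in hp_needed]
    be = [total_ways - h for h in hp]
    return hp, be


def extra_be_way_slots(hp_needed, static_hp_ways, total_ways):
    """Sum over slots of BE ways gained relative to the static reservation."""
    _, be_dynamic = dynamic_split(hp_needed, total_ways)
    static_be = total_ways - static_hp_ways
    return sum(ways - static_be for ways in be_dynamic)


if __name__ == "__main__":
    # Six hypothetical 4-hour slots over a day, 11-way LLC, static HP = 9 ways.
    needs = [2, 4, 9, 9, 6, 4]
    print(dynamic_split(needs, 11)[1])       # [9, 7, 2, 2, 5, 7]
    print(extra_be_way_slots(needs, 9, 11))  # 20
```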

10
Agenda

11
Networking Closed-loop Automation for Intel® RDT
Objective: automatic, dynamic RDT allocation between HP and BE workloads while ensuring the SLA for the HP VNF
[Figure: closed-loop architecture. The platform runs the high priority VNF and a best effort workload; platform telemetry flows to monitoring & storage and to the analytics agent, whose dynamic resource controller (AI/ML/RL) sends RDT actions back to the platform for HW RDT optimization]

12
Analytics: Reinforcement Learning
[Figure: reinforcement learning loop. The agent, following a policy S->A, applies an action to the environment (the server platform) and receives the resulting state and reward]

13
RL Algorithm: Dueling Double Deep Q-Network
• Deep Q-Network (DQN)
  • Learns a Q-function that estimates the long-term reward of taking an action in a given state
  • Widely used in computer games
  • + Sample efficient
  • - Overestimation of Q values
  • - Oscillation in training
• Dueling Double Deep Q-Network (DDDQN)
  • + Alleviates potential overestimation
  • + Higher stability in learning
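The two DDDQN ingredients can be illustrated on a single transition, with plain Python lists standing in for the networks' per-action outputs. This is a sketch of the standard dueling aggregation (Q = V + A - mean A) and the Double DQN target, not the authors' implementation; the function names and the numbers in the usage example are illustrative. Double DQN lets the online network choose the next action and the target network evaluate it, which damps the max-operator overestimation listed above.

```python
# Sketch of the dueling aggregation and the (double) DQN bootstrap targets
# for one transition; lists stand in for the networks' per-action outputs.

def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def dqn_target(reward, next_q_target, gamma, done):
    """Vanilla DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    print(dueling_q(2.0, [1.0, -1.0, 0.0]))  # [3.0, 1.0, 2.0]
    online, target = [1.0, 3.0, 2.0], [2.5, 1.5, 0.5]
    print(dqn_target(1.0, target, 0.9, False))                 # about 3.25
    print(double_dqn_target(1.0, online, target, 0.9, False))  # about 2.35
```

In the example the vanilla target takes the target network's own maximum, while the double target evaluates the online network's choice, giving a smaller bootstrap value.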

14
Reinforcement Learning Design for RDT Allocation
[Figure: the reinforcement learning agent (DDDQN) acts as the dynamic resource controller (policy S->A). It measures the platform state, sets the RDT allocation for the HP and BE workloads, then measures packet loss and calculates the reward]
State: list of inputs fed to the RL agent, e.g., telemetry
Action: Intel® RDT CAT allocation
Reward: favors the minimal resources that achieve the SLA for the HP workload
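One way to encode the state and action described above is sketched below. The slides do not specify the action space or the state features, so everything here is an assumption: a discrete action that shrinks, keeps, or grows the HP VNF's LLC ways by one, an 11-way LLC (matching the 9 + 2 baseline split used in the experiments), and a hypothetical three-feature state. `Allocation`, `ACTIONS`, and `make_state` are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical discrete action space: change the HP VNF's LLC ways by -1, 0, or +1.
ACTIONS = (-1, 0, +1)


@dataclass(frozen=True)
class Allocation:
    hp_ways: int
    total_ways: int = 11  # assumption: 11-way LLC (9 HP + 2 BE in the baseline)

    @property
    def be_ways(self) -> int:
        return self.total_ways - self.hp_ways

    def apply(self, action_index: int) -> "Allocation":
        """Return the allocation after the agent's action, clamped to valid masks."""
        delta = ACTIONS[action_index]
        # Keep at least one way for each class, since a CAT bitmask cannot be empty.
        hp = min(max(self.hp_ways + delta, 1), self.total_ways - 1)
        return Allocation(hp, self.total_ways)


def make_state(traffic_rate, pkt_loss, alloc):
    """Hypothetical state vector: telemetry readings plus the current HP share."""
    return [traffic_rate, pkt_loss, alloc.hp_ways / alloc.total_ways]


if __name__ == "__main__":
    alloc = Allocation(hp_ways=9)
    alloc = alloc.apply(0)  # shrink HP by one way
    print(alloc.hp_ways, alloc.be_ways)  # 8 3
```

Small relative steps keep each action cheap to undo if packet loss rises; an absolute "set N ways" action space would be an equally valid choice.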

15
Reinforcement Learning Agent - Reward
• Rewards reflect the goal: allocate the minimal number of LLC ways to the HP workload with the lowest possible packet loss
• Reward for packet loss:
  $$R_{\text{pkt\_loss}} = \begin{cases} -m_1 & \text{if } \text{pkt\_loss} > \mathit{th} \\ +m_2 & \text{if } \text{pkt\_loss} \le \mathit{th} \end{cases}$$
• Reward for RDT, $R_{\text{rdt}}$:
  • Packet drop <= th: higher reward for using fewer LLC ways for the HP workload
  • Packet drop > th: higher reward for using more LLC ways for the HP workload
• Total reward:
  $$R_{\text{total}} = R_{\text{pkt\_loss}} + R_{\text{rdt}}$$
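The reward can be sketched in code as follows. R_pkt_loss follows the slide directly; the slides give only the direction of R_rdt, so the linear form below (the HP share of an assumed 11-way LLC, scaled by a hypothetical weight `k`) is one possible choice, not the authors' formula. `m1`, `m2`, and `th` are the symbols from the slide; their default values here are illustrative.

```python
def reward_pkt_loss(pkt_loss, th, m1, m2):
    """R_pkt_loss: -m1 if packet loss exceeds the threshold, +m2 otherwise."""
    return -m1 if pkt_loss > th else m2


def reward_rdt(pkt_loss, th, hp_ways, total_ways, k):
    """Hypothetical linear R_rdt; the slides only specify its direction."""
    hp_share = hp_ways / total_ways
    if pkt_loss <= th:
        return k * (1.0 - hp_share)  # SLA met: fewer HP ways earn more
    return k * hp_share              # SLA missed: more HP ways earn more


def reward_total(pkt_loss, th, hp_ways, total_ways, m1=10.0, m2=1.0, k=1.0):
    """R_total = R_pkt_loss + R_rdt (default weights are illustrative)."""
    return (reward_pkt_loss(pkt_loss, th, m1, m2)
            + reward_rdt(pkt_loss, th, hp_ways, total_ways, k))


if __name__ == "__main__":
    th = 0.001
    print(reward_total(0.0, th, hp_ways=2, total_ways=11))   # SLA met, few ways
    print(reward_total(0.0, th, hp_ways=9, total_ways=11))   # SLA met, many ways
    print(reward_total(0.01, th, hp_ways=9, total_ways=11))  # SLA missed
```

Choosing m1 much larger than k keeps any SLA violation worse than any SLA-compliant allocation, so the agent cannot trade packet loss for cache savings.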

16
Agenda

17
Experiment Setup
• Traffic: vary the traffic injection rate to mimic a 24-hour traffic pattern
• High priority VNF workload: IPv4 forwarding
• Best effort workload: SPEC omnetpp
• Baseline static RDT allocation:
  • IPv4 forwarding (CoS1): 9 cache ways
  • omnetpp (CoS0, BE): 2 cache ways
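The baseline split maps naturally onto CAT capacity bitmasks. The sketch below assumes an 11-way LLC (so the 9 + 2 ways cover the whole cache) and places BE (CoS0) in the low-order ways with IPv4 forwarding (CoS1) directly above; the helper names are hypothetical. CAT capacity bitmasks generally must be contiguous runs of set bits, which is why each class gets a single block. The last helper formats the masks as an allocation-class string in the style accepted by the `pqos` utility's `-e` option (intel-cmt-cat); treat that format as an assumption to check against the tool's documentation.

```python
# Sketch: turn a (HP ways, BE ways) split into contiguous CAT capacity bitmasks.

def contiguous_mask(first_way, n_ways):
    """Bitmask with n_ways consecutive bits set, starting at bit first_way."""
    if n_ways < 1:
        raise ValueError("a CAT capacity bitmask needs at least one way")
    return ((1 << n_ways) - 1) << first_way


def split_masks(hp_ways, be_ways, total_ways=11):
    """Non-overlapping masks: BE (CoS0) in the low ways, HP (CoS1) just above."""
    if hp_ways + be_ways > total_ways:
        raise ValueError("requested ways exceed the LLC associativity")
    be_mask = contiguous_mask(0, be_ways)
    hp_mask = contiguous_mask(be_ways, hp_ways)
    return hp_mask, be_mask


def pqos_alloc_arg(hp_mask, be_mask):
    """Allocation-class definition string, e.g. for `pqos -e` (assumed format)."""
    return f"llc:0={be_mask:#x};llc:1={hp_mask:#x}"


if __name__ == "__main__":
    hp, be = split_masks(9, 2)
    print(pqos_alloc_arg(hp, be))  # llc:0=0x3;llc:1=0x7fc
```

A dynamic controller would recompute these masks after each action instead of fixing them once.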

18
Test System: Closed-loop Automation for Intel® RDT
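[Figure: test system for closed-loop automation. The platform runs the high priority VNF (IPv4 forwarding) and the best effort workload (SPEC); telemetry is collected with collectd, monitoring & storage uses InfluxDB and Grafana, and the analytics agent's dynamic resource controller (AI/ML/RL) sends RDT actions back to the platform for HW RDT optimization]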

19
Results
• Dynamic RDT allocation with RL closely follows the traffic shape
• Improves BE performance by 37% compared with the baseline static allocation
• Packet drop for the HP VNF remains similar to the baseline static allocation
• The allocation policy is very close to the oracle policy
[Figure: BE workload runtime performance and injection traffic for (a) baseline static, (b) dynamic RDT allocation with RL, and (c) oracle dynamic RDT allocation]

20
Summary and Future Work
• Closed-loop automation with RL demonstrated for dynamic RDT allocation
• Future work
  • Explore reinforcement learning algorithms that balance sample efficiency and computational complexity in more complex systems
  • Update the model online to track changing traffic patterns and operating environments
  • Integrate with the orchestrator to handle dynamic resource allocation and further improve server utilization

21
Legal Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here
is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications
and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-
548-4725 or by visiting www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel® RDT, Cache Allocation Technology (CAT) are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as the property of others
Copyright © 2017 Intel Corporation. All rights reserved.
