Bin Li – Intel Labs
The Intersection of Networking and AI/ML Meetup – August 2019
2
Contributors
 Yipeng Wang yipeng1.wang@intel.com
 Ren Wang ren.wang@intel.com
 Charlie Tai charlie.tai@intel.com
 Andrew Herdrich andrew.j.herdrich@intel.com
 Tong Zhang tong2.zhang@intel.com
 Zhu Zhou zhu.zhou@intel.com
3
Agenda
 Intel® Resource Director Technology (Intel® RDT) Introduction
 Closed-loop Network Automation for Intel® RDT via Reinforcement Learning
 Research Proof of Concept
4
Introduction
 Network Function Virtualization (NFV) makes it possible for service
providers to allocate resources to virtual network functions on demand
 Customer trend: reduce TCO by improving server utilization while maintaining
workload performance
 Co-locate best effort (BE) workloads with high priority workloads
 Maintain Service Level Agreement (SLA) performance for the high priority
jobs
 Running high priority and best effort workloads on the same server can cause
contention on shared resources
Intel® Resource Director Technology (Intel® RDT)
5
Model & Approach: Enable Monitoring and Control (enforcement)
[Figure: Software Domain and Hardware Domain. Labels: Software Policies, Hardware Policies, Hardware Features, QoS Hints, Resource Enforcement, Resource Monitoring, Feedback, RDT Exposure]
Simple Architectural CPUID/MSR Interfaces – Value across bare metal OS,
Containers, Cloud, Consolidation, Communications, SDI, NFV…
Intel® Resource Director Technology (Intel® RDT)
Cache Monitoring Tech (CMT)
 Per-thread L3 Occupancy Monitoring
 Introduced in the Xeon E5 v3 Series
Cache Allocation Tech (CAT)
 Per-thread L3 Occupancy Control
 Introduced in the Xeon E5 v4 Series
Memory BW Monitoring (MBM)
 Per-thread BW Monitoring
 Introduced in the Xeon E5 v4 Series
Memory Bandwidth Allocation (MBA)
 Per-core BW Control
 New on the Xeon-SP Family
[Figure: one panel per technology, arranged as Monitoring / Allocation by Cache / Memory; panel labels include HP, LP, core, IMC and credits]
A Full-Featured Set of Technologies for Cache + Memory Monitoring and Control
6
7
Intel® RDT Usage in Networking
 NFV enables service providers to consolidate NFs on shared servers
 Use Intel® RDT to enforce a high degree of performance isolation between high
priority VNFs and best effort tasks
[Figure: High Priority VNF and Best Effort Tasks sharing the LLC, divided into high priority and low priority portions]
Workload co-location
 Classify VNFs with strict SLAs as
high priority applications
 Launch best effort (BE) tasks on
the same server to exploit any
resources left unused by high
priority VNFs
8
Current Approach for Intel® RDT Allocation
 RDT resource allocation:
 Static resource allocation:
 Run VNFs offline and find the RDT configuration
that satisfies the SLA requirements in the worst case
 Allocate the remaining RDT resources to BE
 Static resource allocation based on 24-hour
traffic behavior
 Network traffic exhibits daily busy and idle
periods (diurnal patterns)
 Allocate RDT based on the day or night worst case
 Still static
[Figure: RDT resource allocation over time for VNF and BE workloads, comparing the RDT resource reserved for VNFs with the actual resource utilized]
The current approach protects high priority VNFs, but loses the
opportunity to achieve higher system utilization
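The static approach described on this slide can be sketched as follows. This is only an illustration: the function name `static_split`, the profile values, the 11-way LLC and the loss threshold are all hypothetical, not figures from the talk.

```python
from typing import Dict


def static_split(worst_case_loss: Dict[int, float],
                 total_ways: int,
                 loss_threshold: float) -> Dict[str, int]:
    """Pick the fewest HP cache ways whose worst-case packet loss meets the SLA.

    worst_case_loss maps "LLC ways given to the HP VNF" to the packet loss
    ratio measured offline under worst-case traffic (hypothetical data).
    Everything that is not reserved for HP goes to the BE workload.
    """
    for hp_ways in sorted(worst_case_loss):
        if worst_case_loss[hp_ways] <= loss_threshold:
            return {"hp_ways": hp_ways, "be_ways": total_ways - hp_ways}
    raise ValueError("no profiled configuration satisfies the SLA")


# Hypothetical offline profile: HP ways -> packet loss ratio at peak traffic.
profile = {7: 0.02, 8: 0.004, 9: 0.0005, 10: 0.0004}
split = static_split(profile, total_ways=11, loss_threshold=0.001)
print(split)  # {'hp_ways': 9, 'be_ways': 2}
```

Because the split is chosen for the worst case, it stays fixed even when traffic is light, which is the lost opportunity the slide points out.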
9
How Can We Do Better?
 Goal: ensure VNFs meet their SLA while
maximizing performance for BE tasks
 Dynamically reallocate unused RDT resources
from high priority VNFs to BE tasks
 Why Reinforcement Learning
 Acts in an environment to maximize long-term
rewards
 Learns the optimal policy for defined
goals on its own
 Adapts to a changing environment
 Explores without requiring prior knowledge
Our Research: Dynamic RDT Resource Allocation Using Reinforcement Learning
[Figure: RDT resource allocation over time for VNF and BE workloads, highlighting potential RDT resources that can be allocated to BE workloads]
11
Networking Closed-loop Automation for Intel® RDT
Objective – Automatic Dynamic RDT allocation between HP and
BE workloads while ensuring SLA for HP VNF
[Figure: closed loop connecting the Platform (High Priority VNF, Best Effort Workload, HW RDT Optimization), Telemetry, Monitoring & Storage, and the Analytics Agent (Dynamic Resource Controller, AI/ML/RL), which issues the RDT Action]
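One way to picture the closed loop on this slide is the skeleton below. It is a sketch under stated assumptions: `control_loop` and its three callbacks are hypothetical names, and the threshold rule in the usage example is only a placeholder standing in for the learned policy, not the controller used in the research.

```python
from typing import Callable, Dict, List


def control_loop(read_telemetry: Callable[[], Dict[str, float]],
                 choose_action: Callable[[Dict[str, float]], int],
                 apply_rdt: Callable[[int], None],
                 steps: int) -> List[int]:
    """Run the telemetry -> analytics -> RDT action loop for a fixed number of steps."""
    history: List[int] = []
    for _ in range(steps):
        state = read_telemetry()        # Telemetry: observe the platform
        hp_ways = choose_action(state)  # Analytics agent: decide the allocation
        apply_rdt(hp_ways)              # RDT action: enforce it on the platform
        history.append(hp_ways)
    return history


# Usage with stand-in callbacks (all values are made up for illustration).
rates = iter([0.2, 0.9, 0.5])
applied: List[int] = []
history = control_loop(
    read_telemetry=lambda: {"traffic_rate": next(rates)},
    choose_action=lambda s: 9 if s["traffic_rate"] > 0.8 else 4,  # placeholder policy
    apply_rdt=applied.append,
    steps=3,
)
print(history)  # [4, 9, 4]
```

Passing the three stages in as callables keeps the loop independent of the telemetry source and of how the allocation is actually written to hardware.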
12
Analytics: Reinforcement Learning
[Figure: the Reinforcement Learning Agent (Policy: S->A) sends an Action to the Environment (Server Platform), which returns State and Reward]
13
RL Algorithm: Dueling Double Deep Q-Network
 Deep Q-Network (DQN)
 Learns a Q-function that estimates the long-term
reward of taking an action
 Widely used in computer games
 + sample efficient
 - overestimation of Q values
 - oscillation in training
 Dueling Double Deep Q-Network (DDDQN)
 + alleviates potential overestimation
 + higher stability in learning
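The two ideas behind DDDQN can be illustrated without a neural network. The sketch below uses the standard textbook formulations (dueling aggregation Q = V + A - mean(A), and a Double DQN target where the online network selects the action and the target network evaluates it); the function names and numbers are hypothetical, and this is not the network from the talk.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling head: combine a state value and per-action advantages into Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool = False) -> float:
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


print(dueling_q(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]

# The online net prefers action 1, so the target net's value for action 1 is used
# (2.0), not the target net's own maximum (3.0) as plain DQN would.
print(double_dqn_target(1.0, 0.5, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0]))  # 2.0
```

Decoupling action selection from evaluation is what reduces the overestimation listed as a DQN drawback above.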
14
Reinforcement Learning Design for RDT Allocation
[Figure: the Reinforcement Learning Agent (DDDQN), acting as the dynamic resource controller (policy S->A), receives the measured state and the reward, calculated from measured packet loss, from the Platform (HP, BE) and returns an action that sets the RDT allocation]
State: List of inputs that can be fed to RL agent, e.g.: telemetry
Action: Intel® RDT – CAT
Reward: Minimal resource to achieve SLA for HP
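As an illustration of how an agent's Q-values could be turned into a CAT action, the sketch below maps an action index to a number of LLC ways for the HP workload. The epsilon-greedy exploration rule is an assumption (it is commonly paired with DQN-style agents, but the slides do not say which exploration scheme was used), and `select_hp_ways` and `min_ways` are hypothetical names.

```python
import random
from typing import Sequence


def select_hp_ways(q_values: Sequence[float], epsilon: float,
                   rng: random.Random, min_ways: int = 1) -> int:
    """Return the number of LLC ways to give the HP workload.

    Action index i is interpreted as "allocate min_ways + i ways to HP"
    (an assumed encoding). With probability epsilon a random action is
    explored; otherwise the action with the highest Q-value is taken.
    """
    if rng.random() < epsilon:
        action = rng.randrange(len(q_values))
    else:
        action = max(range(len(q_values)), key=lambda a: q_values[a])
    return min_ways + action


rng = random.Random(0)
print(select_hp_ways([0.1, 0.7, 0.3], epsilon=0.0, rng=rng))  # 2 (greedy choice)
```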
15
Reinforcement Learning Agent - Reward
 Rewards reflect the goal – allocate minimal LLC ways for the HP workload with
lowest possible packet loss
 Reward for packet loss
$$R_{pkt\_loss} = \begin{cases} -m_1 & \text{if } \mathit{pkt\_loss} > th \\ +m_2 & \text{if } \mathit{pkt\_loss} \le th \end{cases}$$
 Reward for RDT - $R_{rdt}$
 packet drop <= th : higher reward for using fewer LLC ways for the HP
workload
 packet drop > th : higher reward for using more LLC ways for the HP
workload
 Total reward:
$$R_{total} = R_{pkt\_loss} + R_{rdt}$$
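The reward structure above could be coded roughly as follows. The packet loss term follows the slide directly; the slide gives only the direction of the RDT term, so the linear form used in `rdt_reward` is an assumption, as are the function names and the example constants.

```python
def packet_loss_reward(pkt_loss: float, th: float, m1: float, m2: float) -> float:
    """R_pkt_loss: penalty -m1 above the threshold, bonus +m2 at or below it."""
    return -m1 if pkt_loss > th else m2


def rdt_reward(pkt_loss: float, th: float, hp_ways: int, total_ways: int) -> float:
    """R_rdt with an assumed linear shape (the slide only states the direction)."""
    if pkt_loss <= th:
        # SLA met: the fewer ways HP holds, the higher the reward.
        return float(total_ways - hp_ways)
    # SLA violated: the more ways HP holds, the higher the reward.
    return float(hp_ways)


def total_reward(pkt_loss: float, th: float, hp_ways: int, total_ways: int,
                 m1: float, m2: float) -> float:
    """R_total = R_pkt_loss + R_rdt."""
    return (packet_loss_reward(pkt_loss, th, m1, m2)
            + rdt_reward(pkt_loss, th, hp_ways, total_ways))


# Hypothetical constants: th=0.001, m1=10, m2=5, 11 LLC ways in total.
print(total_reward(0.0, 0.001, 4, 11, 10.0, 5.0))   # 12.0  (SLA met with few ways)
print(total_reward(0.0, 0.001, 9, 11, 10.0, 5.0))   # 7.0   (SLA met, more ways held)
print(total_reward(0.01, 0.001, 4, 11, 10.0, 5.0))  # -6.0  (SLA violated, few ways)
```

With these made-up constants, meeting the SLA with fewer HP ways scores highest, which matches the stated goal.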
17
Experiment Setup
 Traffic: vary the traffic injection rate to mimic 24-hour traffic pattern
 High priority VNF workload: IPv4 forwarding
 Best effort workload: SPEC omnetpp
 Baseline Static RDT allocation:
 IPv4 forwarding: 9 cache ways
 Omnetpp: 2 cache ways
CoS1: IPv4
CoS0: BE
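The baseline split can be expressed as CAT capacity bitmasks, which must consist of contiguous set bits. The sketch below only builds the masks and strings in the Linux resctrl `schemata` format (`L3:<cache id>=<hex mask>`); it does not write to hardware. The 11-way LLC (9 + 2), the cache id 0, the choice to place BE in the low-order ways, and the helper names are assumptions made for illustration.

```python
from typing import Tuple


def contiguous_cbm(first_way: int, num_ways: int) -> int:
    """Capacity bitmask with num_ways contiguous bits set, starting at first_way."""
    if first_way < 0 or num_ways < 1:
        raise ValueError("first_way must be >= 0 and num_ways >= 1")
    return ((1 << num_ways) - 1) << first_way


def split_llc(total_ways: int, hp_ways: int) -> Tuple[int, int]:
    """Return non-overlapping (hp_mask, be_mask) covering all LLC ways."""
    if not 0 < hp_ways < total_ways:
        raise ValueError("hp_ways must leave at least one way for BE")
    be_ways = total_ways - hp_ways
    be_mask = contiguous_cbm(0, be_ways)        # BE in the low-order ways (assumed)
    hp_mask = contiguous_cbm(be_ways, hp_ways)  # HP in the remaining ways
    return hp_mask, be_mask


def schemata_line(cache_id: int, mask: int) -> str:
    """Format one L3 entry as it appears in a resctrl schemata file."""
    return f"L3:{cache_id}={mask:x}"


hp_mask, be_mask = split_llc(total_ways=11, hp_ways=9)
print(schemata_line(0, hp_mask))  # L3:0=7fc  (CoS1: IPv4 forwarding, 9 ways)
print(schemata_line(0, be_mask))  # L3:0=3    (CoS0: BE, 2 ways)
```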
18
Test System: Closed-loop Automation for Intel® RDT
[Figure: closed loop connecting the Platform (High Priority VNF: IPv4 forwarding, Best Effort Workload: SPEC, HW RDT Optimization), Telemetry (collectd), Monitoring & Storage (InfluxDB & Grafana), and the Analytics Agent (Dynamic Resource Controller, AI/ML/RL), which issues the RDT Action]
19
Results
 Dynamic RDT allocation with RL closely tracks the traffic
shape
 Improves BE performance by 37%
 Packet drop for the HP VNF remains similar to the static baseline
 Allocation policy is very close to the oracle policy
[Figure: BE workload runtime performance and injection traffic for (a) baseline static, (b) dynamic RDT allocation with RL, (c) oracle dynamic RDT allocation]
20
Summary and Future Work
 Closed-loop automation with RL demonstrated for dynamic RDT allocation
 Future work
 Explore reinforcement learning algorithms that balance sample
efficiency and computational complexity on more complex systems
 Update the model online to track changing traffic patterns and operating
environments
 Integrate with an orchestrator to handle dynamic resource allocation and further
improve server utilization
21
Legal Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability,
fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course
of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided here
is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications
and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling
1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel® RDT, Cache Allocation Technology (CAT) are trademarks of Intel Corporation in the U.S. and/or
other countries.
*Other names and brands may be claimed as the property of others
Copyright © 2017 Intel Corporation. All rights reserved.
Bin Li
bin.li@intel.com
Closed-Loop Network Automation for Optimal Resource Allocation via Reinforcement Learning by Bin Li
