State of the Union: Open Source Network Function Virtualization

Mario Smarduch
Senior Virtualization Architect
Open Source Group
Samsung Research America (Silicon Valley)
m.smarduch@samsung.com

© 2013 SAMSUNG Electronics Co.
Talk Description

One of the hottest developments today for fixed and mobile networks is Network Function Virtualization (NFV), headed by an ETSI (European Telecommunications Standards Institute) ISG, which became the largest ISG in a matter of six months, with close to 70 members and 90 participants. The goals of NFV are to eliminate proprietary hardware appliances; to reduce energy, space, and hardware turnover costs; to leverage IT virtualization benefits such as consolidation, time to market, multi-tenancy of heterogeneous applications, and scaling out and in; and to encourage an open ecosystem not tied to any specific hardware. However, IT virtualization is currently not fit for some NFV scenarios, network elements, and user equipment. Proprietary vendors and chip manufacturers are rushing to close this gap.

This presentation focuses on open source virtualization technology, primarily KVM-ARM, to contrast these gaps and identify the required low-level enhancements in the hypervisor and guest; ongoing community development to address these gaps is also presented. Real use cases illustrate why IT virtualization is not always a fit for many NFV scenarios. A brief overview of ARM-KVM virtualization and its hardware extensions is also covered.

Agenda
General Public Clouds
NFV Introduction, Status
Cloud RAN NFV use case
KVM (ARM) – limitations/required enhancements

Public Cloud Control

Focus on IaaS – PaaS and SaaS layers build on top of it
- NFV has powerful PaaS and SaaS use cases as well (see the NFV use case document)
- IaaS expected to grow from $4.2B in 2011 to $24B in 2016 (Source: Gartner)
IaaS owner issues a new VM request via the portal server, with parameters: number of cores, memory, storage, image to load/install
Scheduler – consults the physical server/storage/network DB, selects the optimal server, loads the image, creates the RAID, and creates the VM in the Compute cloud
o May need to migrate load, create NAT entries
o For KVM, issues libvirt/QEMU commands
Updates the DB to maintain availability
IaaS owner – unaware of physical topology, migration, and other management, i.e. the cloud infrastructure control plane
OpenStack equivalent components – Dashboard, Network, Compute, Image, Block Storage, …

[Figure: public cloud control plane. Labels: IaaS Owners; Portal; Agent; SSH, VNC; Scheduler; Cloud Agent; Database; Storage & Image; Compute Cloud; VM (x4); vSW; QEMU vCPU/IO threads; virtio-net; virtio-blk; RAID; QEMU Qcow2 image; OpenFlow – SDN control (also VLANs, GRE, ..)]

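The scheduler step on this slide (consult the resource DB, pick a server, create the VM, update the DB) can be sketched as follows. This is a minimal illustration and not OpenStack's scheduler: the `Server` and `VmRequest` types, their field names, and the best-fit policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:            # hypothetical row of the physical-server DB
    name: str
    free_cores: int
    free_mem_gb: int
    free_disk_gb: int


@dataclass
class VmRequest:         # hypothetical portal request parameters
    cores: int
    mem_gb: int
    disk_gb: int
    image: str


def fits(server: Server, req: VmRequest) -> bool:
    """True if the server has enough free resources for the request."""
    return (server.free_cores >= req.cores
            and server.free_mem_gb >= req.mem_gb
            and server.free_disk_gb >= req.disk_gb)


def schedule(servers: List[Server], req: VmRequest) -> Optional[Server]:
    """Best fit: choose the fitting server left with the fewest spare cores."""
    candidates = [s for s in servers if fits(s, req)]
    if not candidates:
        return None  # a real scheduler might migrate load to make room
    chosen = min(candidates, key=lambda s: (s.free_cores - req.cores, s.name))
    # Update the "DB" so later requests see the reduced availability.
    chosen.free_cores -= req.cores
    chosen.free_mem_gb -= req.mem_gb
    chosen.free_disk_gb -= req.disk_gb
    return chosen
```

Best fit is only one possible policy; it packs servers tightly, which suits the consolidation goal but is not necessarily what a production cloud would choose.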
Public Cloud Network

L3 & L2 in the public cloud – scaling the cloud
- Public clouds: 40,000 physical machines, possibly up to 1,000,000 VMs
∙ Gartner (2011): 8 VMs per server; a 30:1 ratio is probable
- IaaS – typically doesn't require an L2 broadcast domain; scales through multiple VMs
- VMs placed on unique subnets – isolated for security
- Very few apps require L2 in the cloud (broadcast, multicast – discovery of services)
∙ Large cloud providers support L2 subnets
∙ Some client/server architectures, e.g. front-end/back-end processing
- Large cloud scaling is achieved through hierarchical, aggregated L3 routes

Public Cloud Network

Scaling via L3
- Aggregate: 10.20.0.0/16 – bits [17-24].xxxx_xxxx – 256 subnets * 256
∙ 10.20.0.0/24 …… 10.20.255.0/24
- 10.20.254.0/27 – 10.20.254.224/27: bits [25-27].x_xxxx aggregate – 8 subnets * 32
- 10.20.254.0/30 – 10.20.254.31/30: bits [28-30].xx aggregate – 8 subnets * 4
∙ Each: 1 IP, 1 GW
- DNS load balance – for example, A record to multiple IPs

L2 overlay of L3, supports isolated L2 subnets for VMs (IaaS)
- SDN orchestration – OpenFlow switch/route on any fields
- vSW 192.168.x.x
- SDN – Open vSwitch with KVM
# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 <phys-intfc>
qemu.ifup – ovs-vsctl add tap to br0
ovs-ofctl – control flows

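The hierarchical aggregation on this slide can be checked with Python's standard `ipaddress` module. The sketch below only reproduces the slide's prefixes. Reading "10.20.254.0 – 10.20.254.31" as the address range covered by eight /30 subnets is an interpretation, and the `aggregate` helper name is made up for the example.

```python
import ipaddress

cloud = ipaddress.ip_network("10.20.0.0/16")

# 256 /24 subnets sit behind the single /16 aggregate route.
slash24 = list(cloud.subnets(new_prefix=24))

# One /24 (10.20.254.0/24) split into 8 /27 subnets of 32 addresses each.
block = ipaddress.ip_network("10.20.254.0/24")
slash27 = list(block.subnets(new_prefix=27))

# The first /27 split into 8 /30 subnets of 4 addresses each.
slash30 = list(slash27[0].subnets(new_prefix=30))


def aggregate(networks):
    """Collapse contiguous subnets into the fewest covering routes."""
    return list(ipaddress.collapse_addresses(networks))


print(len(slash24), slash24[0], slash24[-1])
print(len(slash27), slash27[-1], slash27[-1].num_addresses)
print(len(slash30), slash30[0], slash30[-1].broadcast_address)
print(aggregate(slash30), aggregate(slash27))
```

Each level advertises one route upward no matter how many subnets it contains, which is the property that lets L3 scale where a flat L2 domain would not.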
Public Cloud Characteristics (IaaS)

Workloads
• Web front end, SQL database back end – eCommerce
• Social networking
• SaaS apps like email, content backup
• High Performance Computing in the cloud

Characteristics
• Resources – traditional compute: CPU, RAM, storage, network
• Response – not real-time; response driven by user perception (web interface)
• Scalability – out, in – add/remove VMs
- Front-end server or load balancer distributes load
• I/O – primarily virtualized – storage, network
• Overcommit – currently averaging as much as 8:1, in future 30:1 per server [Source: cloudscaling]
• Orchestration – spans few VM types, small geographic area – same Pod

Introduction to NFV
Mobile Network – LTE EUTRAN/EPC


Introduction to NFV
• EUTRAN – eNodeB, UE
- Radio – bearer control, admission control, mobility, scheduling, dynamic radio resource allocation for uplink and downlink

• EPC
- PCRF = Policy and Charging Rules Function
∙ Determines the QoS Class Identifier (QCI) for a data flow
∙ QoS – GBR/non-GBR, priority, delay, packet error loss rate – RT gaming, voice, and live streaming are the most demanding
- HSS = Home Subscriber Server
∙ Subscriber profile – QoS, APN (PDN), current MME of the user
- P-GW = Packet Data Network Gateway
∙ UE IP allocation; enforces the PCRF QCI mapping to DL bearers
- S-GW = Serving Gateway
∙ UE anchor for all IP traffic as the UE roams through eNodeBs; retains bearer info while the UE is idle
- MME = Mobility Management Entity
∙ Control node: UE attachment, bearer setup, UE context management from HSS, Tracking Area Update processing, paging, UE IDLE to CONNECTED transition

Establishing Bearers

Data Plane – three traffic pipes
- Radio Bearer (UE – eNodeB), identified by RB-ID
- S1-U Bearer (eNodeB – S-GW), identified by S1 TEID
- S5-U Bearer (S-GW – P-GW), identified by S5 TEID
- UE stack: IP, PDCP, RLC, MAC, PHY; UL-TFT
- P-GW stack: IP, GTP-U, UDP/IP, L2, L1; QCI DL-TFT
- QCI examples: QCI1 GBR, delay 100 ms, voice; QCI2 GBR, delay 100 ms, video; QCI4 GBR, delay 50 ms, RT gaming

Control Plane – idea of the messaging in bearer setup – mobile initiated – call setup
Nodes: UE, eNodeB, MME, S-GW, P-GW, HSS
Messages labelled in the diagram:
- RRC Conn Establish (several msgs)
- Bearer Resource Allocation Req
- Identification & Authorization Request (long procedure)
- Modify Bearer Request / Modify Bearer Response
- eNodeB tells UE on RRC allocation
- Bearer Resource Cmd
- Determine E2E resources for bearer
- Allocate TFT, map to QCI and GTP-U TEID
- Create Bearer Rqst
- Activate bearer @ eNodeB, piggybacks UE message

- Example call setup – ranges from 2-3 sec
- Paging – MME 500-800 UE msgs/hr; heavy load – 1500 msgs/hr
- LTE supports Public Safety – call setup time < 300 ms – supports group calls
- Other procedures – Sys Info Bcast, UE Rand Access Proc., UE Attach/Detach, TAU, Call Term., ……..
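The mapping from a traffic flow template (TFT) to a QCI-qualified bearer can be sketched as a simple lookup. This is an illustration only: the QCI rows copy the values printed on the slide's diagram and have not been checked against the 3GPP tables, and `PacketFilter`, `classify`, and the port numbers in the usage lines are hypothetical simplifications of real TFT matching.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QciProfile:
    qci: int
    gbr: bool
    delay_budget_ms: int
    example: str


# Values as shown on the slide (assumption: not verified against 3GPP).
QCI_TABLE = {
    1: QciProfile(1, True, 100, "voice"),
    2: QciProfile(2, True, 100, "video"),
    4: QciProfile(4, True, 50, "RT gaming"),
}


@dataclass(frozen=True)
class PacketFilter:
    """One simplified TFT entry: match on protocol and destination port."""
    protocol: str
    dst_port: int
    qci: int


def classify(filters: Iterable[PacketFilter],
             protocol: str, dst_port: int) -> Optional[QciProfile]:
    """Return the QCI profile of the first matching filter, else None."""
    for f in filters:
        if f.protocol == protocol and f.dst_port == dst_port:
            return QCI_TABLE[f.qci]
    return None  # unmatched traffic would ride the default bearer


tft = [PacketFilter("udp", 5004, 1), PacketFilter("udp", 3074, 4)]
print(classify(tft, "udp", 3074))
print(classify(tft, "tcp", 80))
```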

LTE EUTRAN/EPC Load Characteristics
Resources
- Radio BW, network (CN), CPU, memory, storage (varies by NE, e.g. HSS)

Response – state machine driven
- Attachment, idle-connect, bearer setup – associated with timers/states
- Real-time sensitive – various parameters can be tuned, but user experience suffers
- User perception still all-important, but hard deadlines exist
- Near-native scheduling

Scalability & Orchestration
- Network tightly coupled – scaling out ripples through NEs
- Unlike the public cloud, just adding new VMs will not do it
- Orchestration for scale out/in is extremely complex

I/O – need near native
- RAN – massive device pass-through of BBU accelerators; EPC – NIC device pass-through

Overcommit
- Delicate load calculation required for the PLMN to scale on demand where needed
- Can't apply cloud 8:1, 30:1 ratios

Current State of NFV

NFV ETSI ISG
- Initial white paper published Oct 2012
- Spans mobile and fixed networks
- First serious attempt to virtualize mobile/fixed networks
∙ Members: service providers and all ecosystem players

Proofs of concept – Cloud RAN, migration with device pass-through, Cloud rGW
- Network Function Virtualization Infrastructure as a Service (NFVIaaS)
∙ Targets big telco/small telco – lease NFVI as IaaS for VNFs and cloud
- VNFaaS – move enterprise CPE into the SP cloud, and later PE; simplifies Opex/Capex
∙ AR, NG-FW, QoS/DPI owned/provisioned by the SP
- VNPaaS – Platform as a Service, for example DNS, DHCP, email, FW
∙ Brings services closer to the APN – no tunneling back to central IT infrastructure – total control
∙ SP provides bare services and gives the enterprise config tools to manage the service
- VNF Forwarding Graphs
∙ Essentially SDN – in a multi-tenant environment, OpenFlow-capable config is required to host, e.g., a small telco in the NFVI
∙ Needs SDN orchestration – OpenStack is enhancing Quantum for SDN – to span VNFs and physical network functions
- Mobile core virtualization – goes along with NFVIaaS (to some extent)
∙ Improves Self-Optimizing Networks – delivers performance where needed
- Cloud-RAN – key features for SON, on-demand radio BW, Opex/Capex savings
- Virtualizing the home – vSTB, vGW – fixed network video/internet delivery to the home

NFV Cloud-RAN Use Case

Evolution of Radio Access Network

• Single mode – 2G, 3G – combined BBU & RRU
• Scaled to maximum peak – waste of resources
• Base band processing co-located with Remote Radio Unit
• Hard access, power an issue in some locations

• Remote Radio Units distributed via fiber links
• Base band processing supports multiple technologies
• BBU can be housed indoors, RRUs strategically distributed

• Pooling of radio Base Band Unit processing
• Capacity dynamically adjusted – for example, a sports event
• Resources maximized – delivered on demand
• Several technologies supported

[Figure labels: RRUs; BBU; PHY Accelerators; vBBU LTE; vBBU UMTS; MME/SGW; SGSN]

New Virtualization HYP Mode


Virtualization MMU Extensions


Interrupt Virtualization Extensions

Device Pass-through

Architecture/cost of interrupts
- BBU cloud has hundreds of devices passed through – small cells, many RRUs and fiber links
- libvirt and QEMU are not ready for such massive pass-through; handling faults is another issue
- RRU to/from BBU: PHY OFDMA (channels, framing, FEC) to MAC – L2 logical channels; L3 – RRC, NAS, IP

• MMU pass-through – to user
o Devices emulated – trap to QEMU – not this type
o GVA → IPA → HPA – direct access to HW regs
o No performance penalty for MMU pass-through
o PCI – looks up target BARs for HPA, QEMU selects IPA
o DT – device node with HPA, QEMU selects IPA

• Cost of exit/enter – executed in HYP – optimized assembler
o Similar to a process switch; guest switch very costly
o No concept of a light-weight context switch like threads
o Goal: avoid at all costs
o All banked regs (dabt, iabt, irq, …)
o More so – OS system registers saved/restored
o World switch: Save Guest State, Load Host State, Save Host State, Load Guest State – each ~80 regs (GP regs, VFP/SIMD, CP15 regs)

[Figure: MMIO Device Passthrough (PHY to UE/NIC). Labels: Guest – User (Task, GVA), Kernel (Drivers, IPA); QEMU; Host/KVM; HYP Mode (PL2), HPA; T&E Device; Memory; Hardware (PHY to UE, NIC to CN); flow: IRQ To Host, Guest Exit, Return To Host, Inject To Guest, vGICD EOI update Exit]

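The GVA → IPA → HPA chain on this slide is a two-stage lookup: the guest's own tables map GVA to IPA, and hypervisor-controlled tables map IPA to HPA. The sketch below models each stage as a flat dictionary keyed by page number. Real ARM tables are multi-level and hardware-walked, and the 4 KiB page size and the function names are assumptions for illustration.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(table, addr):
    """Map the page number through a dict-based table, keeping the offset."""
    page = addr >> PAGE_SHIFT
    if page not in table:
        raise LookupError(f"translation fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | (addr & PAGE_MASK)


def gva_to_hpa(stage1, stage2, gva):
    """Stage 1 (guest-owned): GVA -> IPA. Stage 2 (hypervisor-owned): IPA -> HPA."""
    ipa = translate(stage1, gva)
    return translate(stage2, ipa)


# Hypothetical mapping of one device register page into a guest.
stage1 = {0x10: 0x80}        # guest page 0x10 -> intermediate page 0x80
stage2 = {0x80: 0x12345}     # intermediate page 0x80 -> device page 0x12345
print(hex(gva_to_hpa(stage1, stage2, 0x10ABC)))
```

Once both mappings exist, a guest access to the device registers completes without leaving the guest; only a missing stage-2 entry would fault to the hypervisor.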
Device Pass-through

IRQ overhead and optimizations

1. Guest executes – exit to HYP mode – save guest/restore host
2. Host enables interrupts – IRQ delivered to host – 1st complete IRQ OS path
3. Inject to guest – save host/restore guest – 2nd complete IRQ OS path
4. Guest EOI – exit – save guest/restore host
5. Update virtual distributor
6. Resume guest – save host/restore guest
Note: this assumes the most direct injection – no irqfd or additional threads

Testing by Virtual Open Systems reveals at least a 5x delay

Optimization 1
• ARM supports priority drop/deactivation – after the IRQ is acked its priority drops, and the guest can deactivate it during EOI with no exit
• ARM can inject hwirq
• Eliminates steps 4-6 (experimenting)

Optimization 2
• Process interrupts directly from HYP mode
• Build hwirq injection to guest
• Eliminates steps 2-6; HOWEVER, requires C code and more overhead in HYP mode
o Currently HYP mode is limited to low level

In Addition
• IRQ CPU affinity must match vCPU affinity – either bind or follow the vCPU – otherwise you need IPIs – very slow
• Preferable for an idle vCPU not to exit – wait for event, not exit
• Future GIC versions – per-IRQ direct delivery – must still handle sleeping guests

[Figure: MMIO Device Passthrough (PHY to UE/NIC), repeated from the previous slide]

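The six-step path and the two optimizations can be compared with a small counting model. It tallies only the save/restore "world switches" that the steps above mention, under the assumption that each costs one switch. It is not a measurement, and under Optimization 2 the remaining step would be handled inside HYP mode rather than as a full switch to the host.

```python
STEPS = [
    # (step number, description, world switches in that step)
    (1, "guest exits to HYP: save guest, restore host", 1),
    (2, "host takes the IRQ through its complete IRQ path", 0),
    (3, "inject to guest: save host, restore guest", 1),
    (4, "guest EOI traps: save guest, restore host", 1),
    (5, "update virtual distributor", 0),
    (6, "resume guest: save host, restore guest", 1),
]


def world_switches(eliminated=()):
    """Count world switches for one device IRQ, skipping eliminated steps."""
    return sum(cost for step, _, cost in STEPS if step not in eliminated)


baseline = world_switches()
opt1 = world_switches(eliminated=range(4, 7))   # Optimization 1 removes steps 4-6
opt2 = world_switches(eliminated=range(2, 7))   # Optimization 2 removes steps 2-6

print(baseline, opt1, opt2)
```

With roughly 80 registers saved and another 80 restored per switch (the figure quoted on the previous slide), halving the switch count per interrupt is where most of the saving would come from.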
Dynamic Load Balancing
Cloud RAN – dynamic load balancing between VMs
- Cell sites exhibit various loads throughout the day
- vCPU hotplug
∙ Unplug/plug vCPUs – dynamically scale to demand

[Figure: multicore platform with 16 cores shared by three vBBU VMs. Labels: vCPUs unplug/plug; Power Mgmt – vCPUs idling]

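A planner for the vCPU plug/unplug decisions could look like the sketch below. It only computes how many vCPUs each vBBU should have for a given load and what to plug or unplug. The load units, the `per_vcpu_capacity` parameter, and both function names are assumptions, and actually applying the plan (for example through a management API) is out of scope here.

```python
import math


def plan_vcpus(loads, total_cores, per_vcpu_capacity=1.0, min_vcpus=1):
    """Return {vm: vcpus} sized to load; raise if the platform is too small.

    Cores not covered by the plan can be idled for power management.
    """
    plan = {vm: max(min_vcpus, math.ceil(load / per_vcpu_capacity))
            for vm, load in loads.items()}
    if sum(plan.values()) > total_cores:
        raise ValueError("demand exceeds available cores")
    return plan


def hotplug_actions(current, target):
    """Positive delta means plug vCPUs, negative means unplug."""
    return {vm: target[vm] - current.get(vm, 0)
            for vm in target if target[vm] != current.get(vm, 0)}


# Hypothetical daytime load, in units of one fully busy vCPU.
loads = {"vBBU-1": 5.2, "vBBU-2": 1.1, "vBBU-3": 0.4}
current = {"vBBU-1": 2, "vBBU-2": 2, "vBBU-3": 4}
target = plan_vcpus(loads, total_cores=16)
print(target, hotplug_actions(current, target))
```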
Fast Path between Radio/CN

• BBU needs fast switching – radio <-> core network
• Can't have full stack with expensive IPC
• Want to separate Radio and Core functions
• Zero-copy message passing – Guest/Guest, Guest to Host
- ivshmem is one example; add enhancements

vBBU Instance
• Radio
o PHY device pass-through
o Pull off wire directly to user space + MAC + L3 packet
o Zero copy to inter-guest shared memory
• Core Network
o Pull packet from shared memory ring buffer
o Tx/Rx to core network – SCTP or GTP-U

Issues:
o Signaling – signaling/interrupt path too long (red lines)
∙ Guest via UIO writes to IRQ reg, exit, MMIO to QEMU
∙ QEMU 'event' to peer QEMU
∙ Emulated device on peer injects interrupt to Guest
∙ Solution: interrupt in HYP mode, coalesce
o Dedicate CPUs – poll – or – optimized dev pass-through; dedicating is never good
o Discovery – to pair, a Guest must discover shared memory segments dynamically
∙ Many vBBU clouds on demand create/destroy
∙ Solution: shared memory discovery protocol via emulated device through QEMU (green line)

[Figure labels: Guest (Task, SHM DRV) x2; QEMU x2 (SHM DEV, T&E Device, Pass-through); HOST/KVM; HYP Mode; Shared Memory; RAN PHY; NIC; Hardware (PHY to UE, NIC to CN); Optimized Interrupt Signaling]

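The shared-memory ring between the Radio and Core Network guests can be sketched as a single-producer/single-consumer ring laid out in one buffer. Here a `bytearray` stands in for the shared segment (with ivshmem it would be a mapped device region), and the layout, class, and method names are assumptions. Python gives no memory-ordering guarantees, so a real implementation would need barriers between writing the payload and publishing the index. The consumer reads in place; the producer in this sketch still copies its payload into the slot.

```python
import struct

U32 = struct.Struct("<I")   # head at offset 0 (producer-owned), tail at offset 4 (consumer-owned)
LEN = struct.Struct("<H")   # per-slot payload length prefix
HEADER_SIZE = 2 * U32.size
MASK = 0xFFFFFFFF


class Ring:
    """Single-producer/single-consumer ring over a caller-supplied buffer."""

    def __init__(self, buf, slots, slot_size):
        if slots & (slots - 1):
            raise ValueError("slots must be a power of two")
        self.mem = memoryview(buf)
        self.slots = slots
        self.slot_size = slot_size
        self.stride = LEN.size + slot_size
        if len(self.mem) < HEADER_SIZE + slots * self.stride:
            raise ValueError("buffer too small")

    def _head(self):
        return U32.unpack_from(self.mem, 0)[0]

    def _tail(self):
        return U32.unpack_from(self.mem, U32.size)[0]

    def _offset(self, index):
        return HEADER_SIZE + (index % self.slots) * self.stride

    def push(self, payload):
        """Producer: write one message; False if the ring is full or it is oversized."""
        head = self._head()
        if ((head - self._tail()) & MASK) == self.slots or len(payload) > self.slot_size:
            return False
        start = self._offset(head) + LEN.size
        self.mem[start:start + len(payload)] = payload
        LEN.pack_into(self.mem, self._offset(head), len(payload))
        U32.pack_into(self.mem, 0, (head + 1) & MASK)   # publish after the data
        return True

    def peek(self):
        """Consumer: zero-copy view of the oldest message, or None if empty."""
        tail = self._tail()
        if tail == self._head():
            return None
        start = self._offset(tail) + LEN.size
        (length,) = LEN.unpack_from(self.mem, self._offset(tail))
        return self.mem[start:start + length]

    def release(self):
        """Consumer: return the slot obtained from peek() to the producer."""
        U32.pack_into(self.mem, U32.size, (self._tail() + 1) & MASK)
```

Because the producer only ever writes `head` and the consumer only ever writes `tail`, neither side needs a lock. Polling `peek()` corresponds to the "dedicate CPUs – poll" option, while an interrupt-driven variant would signal the peer after `push()`.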
RT-Scheduling

Network stack time sensitive – requirements
- High-res timers a must
- Preemptibility – even PREEMPT_RT a must – prevents interrupt inversion
- Scheduling at several levels – host and guest threads

Timers
- Arch timers are an improvement – no exit on register updates
- But still exit on timer fire – need injection
- Issue for high-res timers in the guest
- Again, near-native IRQ pass-through is important

IRQs & page faults
- Any host IRQ can prevent the guest from running
- PFRA as well (if so, most likely not tuned for RT)

PREEMPT_RT
- Not really tested with virtualization

[Figure: Hardware IRQ Sources and Timer Events feed the Linux Host & KVM (PREEMPT_RT) and the Guest (PREEMPT_RT) running vMME and vS-GW. Labels: Interrupt; Higher Prio Thread Executes, no Int/Prio Inver.; Latency; Spin Lock = mutex; VFIO, Protocol FSM, …]

RT-Scheduling

Possible optimizations – area of research
Host PREEMPT_RT
- Eliminate spinlocks, replace with mutexes
- Prioritize interrupts – non-VM targeted IRQs
- vCPUs – prioritize at higher priority
- VM IRQs don't run as threads – timers, dev-passthrough IRQs
- Use Priority Drop/Deactivation to schedule highest priority interrupts for VMs

Guests most likely PREEMPT only
Challenges
- Multiple VMs sharing a CPU
∙ Priority between them
∙ Priority of their IRQs
∙ Context switching an issue, depends on load
- OS periodic tick work – CONFIG_NO_HZ_FULL
∙ Promising: for a vCPU dedicated to a core, reduces tick overhead
∙ Helps with multiple vCPUs as well – tick rate

[Figure: same layout as on the previous RT-Scheduling slide, with the Guest marked PREEMPT and the Linux Host & KVM marked PREEMPT_RT]

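Dedicating a vCPU to a core pairs naturally with the earlier requirement that IRQ affinity match vCPU affinity. The sketch below computes the hex CPU masks one would write to `/proc/irq/<n>/smp_affinity` for each pass-through IRQ. The IRQ numbers and mappings are made up, the function names are assumptions, and the single-word mask format is assumed to be sufficient (masks wider than 32 CPUs use comma-separated groups, which this sketch does not emit). It does not write to `/proc`.

```python
def cpumask_hex(cpus):
    """Format CPU numbers as a hex bitmask (bit n set means CPU n allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pinning_plan(vcpu_to_core, irq_to_vcpu):
    """Give each pass-through IRQ the mask of the core hosting its vCPU."""
    return {irq: cpumask_hex([vcpu_to_core[vcpu]])
            for irq, vcpu in irq_to_vcpu.items()}


# Hypothetical layout: vCPU 0 on core 4, vCPU 1 on core 5.
plan = pinning_plan({0: 4, 1: 5}, {77: 0, 78: 1})
print(plan)
```

If the vCPU thread is later moved, the plan has to be recomputed and reapplied; that is the "follow the vCPU" alternative to static binding.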
Thank you.
Mario Smarduch
Senior Virtualization Architect
Open Source Group
Samsung Research America (Silicon Valley)
m.smarduch@samsung.com
