ACRN Real Time Management and
Performance Optimization
Minggui Cao (minggui.cao@intel.com)
ACRN vMeet-Up Europe 2021
Agenda
• Challenges in RTVM (Real Time VM) in ACRN
• ACRN RT main features supported:
• CPU partition / LAPIC / PCI Device passthrough
• CAT configuration
• RT profiling/tuning on ACRN
• Tuning steps for RTVM
• BIOS setting for RT
• Configurations of RTVM example: Preempt-RT Linux
• VMEXIT data profiling
• PMU data sampling example: cyclictest
• Summary
Challenges in RTVM
• Resource contention
o CPUs used by other VMs
o GPU
o DMA/MMIO operations on devices
• Cost of virtualization
o Device virtualization
o VM exits:
▪ CPUID, specific MSRs, port I/O...
• Longer path to optimize for RT
o BIOS -> Hypervisor -> Guest OS -> RT task
[Diagram labels: Hardware (CPU/Device ...), Linux, Service OS]
RT work on ACRN has two targets:
1. Reduce the impact of virtualization on real-time performance.
2. Reduce interference from other VMs on the RTVM.
ACRN RTVM - CPU partition / LAPIC passthrough
[Diagram: a PCI device assigned to the RTVM on top of the ACRN Hypervisor. Labels: PCI Device Driver, CFG space, BAR space, MSI-X table, PBA space, LAPIC, MSI / MSI-X / INTx IRQ. Legend: direct, interrupt delivery, trap & emulation, pass-thru.]
• RTVM CPU partition
• LAPIC PT: avoids VM exits caused by interrupts
• Device passthrough
• Some limitations of the RTVM:
• Only x2APIC mode is used in the RTVM (LAPIC registers are accessed through MSRs)
• PCI device passthrough
o Only MSI is supported; INTx is not supported
o After device passthrough is configured, the guest OS should avoid accessing the configuration space and the MSI-X table.
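The x2APIC limitation above refers to the fact that, in x2APIC mode, LAPIC registers are read and written as MSRs instead of memory-mapped registers. A minimal sketch of the architectural mapping from a legacy xAPIC register offset to its x2APIC MSR index follows. The function name and the example offsets chosen are for illustration only and do not come from the slides or from ACRN code.

```python
# Sketch: map an xAPIC MMIO register offset to the x2APIC MSR index.
# In x2APIC mode the MSR index is 0x800 + (offset >> 4).
# Not every xAPIC register has an x2APIC counterpart (for example, the ICR
# becomes a single 64-bit MSR), so treat this as the general rule only.

X2APIC_MSR_BASE = 0x800


def x2apic_msr(xapic_offset: int) -> int:
    """Return the x2APIC MSR index for a 16-byte aligned xAPIC offset."""
    if xapic_offset < 0 or xapic_offset > 0xFF0 or xapic_offset % 0x10:
        raise ValueError("xAPIC offsets are 16-byte aligned within a 4 KiB page")
    return X2APIC_MSR_BASE + (xapic_offset >> 4)


if __name__ == "__main__":
    for name, offset in (("TPR", 0x080), ("EOI", 0x0B0), ("ICR", 0x300)):
        print(f"{name}: offset {offset:#05x} -> MSR {x2apic_msr(offset):#05x}")
```

With LAPIC passthrough these MSR accesses can be left to the guest, which is one reason the MSR-based mode suits the RTVM.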
ACRN RTVM – Cache partition based on CAT
[Diagram: CPU0 to CPU3, each with a CAT PQR_ASSOC register that selects a CAT way mask (e.g. QOS_MASK0 for COS0); the LLC divided into a globally shared part and an RT partition; the GPU with a GT CLOS setting and a GT part of the LLC.]
• CAT: Cache Allocation Technology
• CPU LLC partition:
o Some cache ways are isolated for the RTVM cores
o The remaining cache ways are shared by the other cores
• GPU LLC
o The GPU's LLC share is allocated within the shared LLC
• Prevent other VMs from invalidating the whole cache
o “WBINVD” is trapped by ACRN and emulated.
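To make the way-based partition concrete, here is a small sketch that splits an LLC into an isolated RT partition and a shared partition and returns the two capacity bitmasks. CAT capacity masks must consist of contiguous set bits. The 12-way cache, the 4 ways reserved for RT, and placing the RT ways at the top of the mask are assumptions for illustration; real values depend on the platform and on the ACRN configuration.

```python
# Sketch: compute non-overlapping CAT capacity bitmasks for an RT partition
# and a shared partition. Way counts are hypothetical example values.


def contiguous_mask(first_way: int, num_ways: int) -> int:
    """Bitmask with num_ways contiguous bits set, starting at first_way."""
    return ((1 << num_ways) - 1) << first_way


def split_llc_ways(total_ways: int, rt_ways: int) -> tuple[int, int]:
    """Return (rt_mask, shared_mask); the RT ways take the upper bits."""
    if not 0 < rt_ways < total_ways:
        raise ValueError("rt_ways must leave at least one shared way")
    shared_ways = total_ways - rt_ways
    shared_mask = contiguous_mask(0, shared_ways)
    rt_mask = contiguous_mask(shared_ways, rt_ways)
    return rt_mask, shared_mask


if __name__ == "__main__":
    rt, shared = split_llc_ways(total_ways=12, rt_ways=4)
    # The RT mask would go into the mask register of the class of service
    # used by the RTVM cores, the shared mask into the class used elsewhere.
    print(f"RT mask:     {rt:#05x}")
    print(f"shared mask: {shared:#05x}")
```

Each core then selects its class of service through its PQR_ASSOC register, as the diagram labels suggest.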
RT tuning steps on ACRN:
• Setup #1: Bare-metal
• Setup #2: RTVM w/o Noise
• Setup #3: RTVM w/ Noise
[Diagram labels: Bare-metal Linux, ACRN Hypervisor, Service OS (SOS) with ACRN DM, RTVM, HMI]
BIOS setting - main items for RT
Item               Status               Note
Hyper-threading    Disabled
CPU turbo mode     Disabled
P-state/C-State    Disabled
GPU frequency      Set to 1/3 of MAX
…                  …
Preempt-RT Linux kernel configurations
Configurations of Preempt-RT Linux as Native:
https://github.com/projectacrn/acrn-kernel/blob/4.19/preempt-rt/x86-64_defconfig
RT kernel: linux-4.19.72-rt25+
▪ CONFIG_PREEMPT_RT_FULL=y
▪ CONFIG_NO_HZ=y
▪ CONFIG_NO_HZ_FULL=y
▪ CONFIG_RCU_NOCB_CPU=y
▪ CONFIG_CPU_ISOLATION=y
Kernel parameters:
▪ processor.max_cstate=0 intel_idle.max_cstate=0 idle=poll
▪ intel_pstate=disable
▪ isolcpus=nohz,domain,1
▪ nohz_full=1
▪ rcu_nocbs=1 rcu_nocb_poll
▪ irqaffinity=0
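Before running a benchmark it can help to confirm that the kernel actually booted with the parameters listed above. The sketch below parses a kernel command line and reports which of those parameters are missing or set differently. The helper names are hypothetical, and the expected values simply mirror the slide (RT core 1, housekeeping core 0).

```python
# Sketch: verify that a kernel command line carries the RT parameters
# listed on the slide. Flag-style parameters are stored with value None.

EXPECTED_RT_PARAMS = {
    "processor.max_cstate": "0",
    "intel_idle.max_cstate": "0",
    "idle": "poll",
    "intel_pstate": "disable",
    "isolcpus": "nohz,domain,1",
    "nohz_full": "1",
    "rcu_nocbs": "1",
    "rcu_nocb_poll": None,
    "irqaffinity": "0",
}


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value or None}."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_rt_params(cmdline: str) -> list:
    """Return the sorted names of parameters that are absent or differ."""
    params = parse_cmdline(cmdline)
    return sorted(
        name
        for name, value in EXPECTED_RT_PARAMS.items()
        if name not in params or params[name] != value
    )


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(missing_rt_params(f.read()) or "all RT parameters present")
```

An empty result means every listed parameter is present with the expected value.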
VMEXIT data profiling-I
• ACRNTrace: https://projectacrn.github.io/latest/tutorials/debug.html?highlight=acrntrace
VMEXIT data profiling-II
• VMEXIT command in the hypervisor console (integrated in the ACRN 2.0 branch)
PMU data sampling example: Cyclictest
• Cyclictest accurately and repeatedly measures the difference between a thread's
intended wake-up time and the time at which it actually wakes up in order to provide
statistics about the system's latencies. It can measure latencies in real-time systems
caused by the hardware, the firmware, and the operating system.
[Figure: timeline of one cyclictest cycle. Labels: nanosleep, expected return, nanosleep returns, latency, 1ms to set, actual total time.]
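The measurement principle described above can be sketched in a few lines: compute an absolute wake-up deadline, sleep until it, and record how late the thread actually woke up. This is only an illustration of the idea, not cyclictest itself. It runs without real-time priority or memory locking, and the function name and default values are assumptions.

```python
# Sketch: measure wake-up latency as (actual wake-up time - intended time).
import time


def measure_wakeup_latencies(interval_ns: int = 1_000_000, cycles: int = 100) -> list:
    """Return one latency sample in nanoseconds per cycle."""
    latencies = []
    next_wakeup = time.monotonic_ns() + interval_ns
    for _ in range(cycles):
        remaining = next_wakeup - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)  # sleep until the intended wake-up
        now = time.monotonic_ns()
        latencies.append(now - next_wakeup)  # how late we actually woke up
        next_wakeup += interval_ns  # deadlines stay on a fixed grid
    return latencies


if __name__ == "__main__":
    samples = measure_wakeup_latencies()
    print(f"min {min(samples)} ns, max {max(samples)} ns, "
          f"avg {sum(samples) / len(samples):.0f} ns")
```

Keeping the deadlines on a fixed grid means a late wake-up does not shift the following intended wake-up times.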
Cyclictest test setting:
• RTVM with 2 isolated CPU cores:
o Housekeeping core: core 0
o RT core: core 1
• RT Linux setting:
o Bind all IRQs to core 0, except the local timer
o Bind all RCU tasks to core 0
o Set all non-RT tasks to SCHED_OTHER with priority 0
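Binding IRQs to core 0 at run time is typically done by writing a hexadecimal CPU mask to /proc/irq/<N>/smp_affinity. The helper below, whose name is hypothetical, converts a list of CPU numbers into such a mask. It assumes fewer than 32 CPUs, so the comma-separated multi-word form used on larger systems is not handled.

```python
# Sketch: convert a list of CPU numbers into the hex mask format used by
# /proc/irq/<N>/smp_affinity. Assumes fewer than 32 CPUs.


def cpu_list_to_hex_mask(cpus) -> str:
    """Return the hex affinity mask (no 0x prefix) for the given CPUs."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    print("housekeeping core 0 ->", cpu_list_to_hex_mask([0]))
    print("RT core 1           ->", cpu_list_to_hex_mask([1]))
```

For the setup on this slide, the housekeeping mask is what would be written for every IRQ that should stay off the RT core.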
## example CMD to run cyclictest
cyclictest -a 1 -p 80 -m -N -D 24h -q -H 30000 --histfile=test.log
An example of a profiling data chart:
[Chart: latency histogram; axes: latency, numbers]
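A chart like this one can be produced from the histogram file written by the command above. The sketch below parses such a file and computes a few summary values. It assumes the usual layout of one bucket per line (latency value followed by one count column per thread) with `#` comment lines. The sample text is made up for illustration and is not measured data.

```python
# Sketch: parse a cyclictest histogram file and summarize it.
# Assumed layout: "<latency> <count> [<count> ...]" per line, '#' comments.

SAMPLE_HISTFILE = """\
# Histogram
000000 000000
000001 000002
000002 000005
000003 000002
000004 000000
000005 000001
# Total: 000000010
"""


def parse_histogram(text: str) -> dict:
    """Return {latency: total count across all thread columns}."""
    hist = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        hist[int(fields[0])] = sum(int(count) for count in fields[1:])
    return hist


def max_latency(hist: dict):
    """Largest latency bucket that holds at least one sample."""
    return max((lat for lat, n in hist.items() if n > 0), default=None)


def fraction_at_or_below(hist: dict, limit: int) -> float:
    """Share of samples whose latency is <= limit."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(n for lat, n in hist.items() if lat <= limit) / total


if __name__ == "__main__":
    hist = parse_histogram(SAMPLE_HISTFILE)
    print("max latency bucket:", max_latency(hist))
    print("share <= 3:", fraction_at_or_below(hist, 3))
```

To analyze a real run, read `test.log` and pass its contents to `parse_histogram`. The latency values are in whatever unit cyclictest was configured to report.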
Enhanced PMU data profiling in benchmark
• When outliers occur in a benchmark, we would like to know the root cause.
• PMU data, when available, helps to find that root cause.
So we added an enhancement (patch) that profiles PMU data in the benchmark.
See:
https://github.com/mgcao/rt-tests/tree/v1.0-post-pmc/src/cyclictest
Example:
1. Set PMU registers first in RTOS: cyclictestpmc_setting_shcache_misses.sh
2. cyclictest -a 1 -p 80 -m -N -n -D 900 -q -H 10000 --extra_sample --random_sample 500 --max_check 5000
3. Detailed PMU data can be sampled for analysis.
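One simple way to use the sampled data is to compare the PMU counts of outlier cycles with those of normal cycles. The sketch below does this for a list of (latency, PMU count) pairs. The slides do not show the output format of the patched cyclictest, so the tuple layout, the threshold, and the sample values here are all hypothetical and only illustrate the comparison.

```python
# Sketch: compare PMU event counts between normal cycles and outliers.
# Each sample is (latency, pmu_count); the values below are synthetic.
from statistics import mean


def split_outliers(samples, threshold):
    """Split samples into (normal, outliers) by latency threshold."""
    normal = [s for s in samples if s[0] <= threshold]
    outliers = [s for s in samples if s[0] > threshold]
    return normal, outliers


def mean_pmu_count(samples):
    """Average PMU count of the given samples, or None if there are none."""
    if not samples:
        return None
    return mean(count for _, count in samples)


if __name__ == "__main__":
    synthetic = [(1200, 3), (1300, 5), (6100, 42), (1250, 4), (7000, 38)]
    normal, outliers = split_outliers(synthetic, threshold=5000)
    print("mean PMU count, normal cycles:", mean_pmu_count(normal))
    print("mean PMU count, outliers:     ", mean_pmu_count(outliers))
```

A clearly higher count in the outlier group would point at the monitored event as a likely contributor. Similar counts suggest looking at a different event.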
Example for profiling PMU data
[Chart: PMU data plotted together with latency]
Summary:
▪ For real time on ACRN, resource partitioning for the RTVM is the main point: CPU, cache (CAT), device pass-through…
▪ RT tuning/profiling covers BIOS settings, RTOS settings, and tools to profile VMEXIT data, PMU data…
▪ Reference resources:
1. Intel® 64 and IA-32 Architectures Software Developer’s Manual
2. Intel® 64 and IA-32 Architectures Optimization Reference Manual
3. cache-allocation-technology-white-paper.pdf
Call for Participation
https://projectacrn.github.io/index.html
Join the ACRN Community Today!
