Hwanju Kim, Sangwook Kim, Jinkyu Jeong, Joonwon Lee, and Seungryoul Maeng, “Demand-Based Coordinated Scheduling for SMP VMs”, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Houston, Texas, USA, Mar. 2013.
Live VM migration allows virtual machines to be relocated between physical hosts with little to no downtime. There are two main approaches: pre-copy migration copies memory contents iteratively with little downtime, while post-copy migration copies CPU states first and then memory pages on demand to reduce total migration time. Several research projects use live migration techniques to improve data center efficiency: LiteGreen saves energy by consolidating idle desktop VMs, Jettison uses partial VM migration for quick consolidation, and Kaleidoscope proposes VM state coloring to enable fast micro-elasticity through live cloning of warm VMs.
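To make the pre-copy approach concrete, here is a hedged toy sketch in Python: it iterates over dirtied pages until the working set converges, then performs the brief stop-and-copy. All names (`precopy_migrate`, `send`, the page dictionaries) are illustrative assumptions, not any hypervisor's real API.

```python
def precopy_migrate(pages, dirty_sets_per_round, send, stop_at=8):
    """pages: {pfn: contents}; dirty_sets_per_round: dirtied pfns per round."""
    pending = set(pages)                     # round 1: every page is "dirty"
    for round_dirty in dirty_sets_per_round:
        for pfn in sorted(pending):
            send(pfn, pages[pfn])            # copy while the guest keeps running
        pending = set(round_dirty)           # pages re-dirtied during this round
        if len(pending) <= stop_at:          # writable working set has converged
            break
    # Stop-and-copy: the guest would be paused here; downtime ~ len(pending).
    for pfn in sorted(pending):
        send(pfn, pages[pfn])

sent = []
precopy_migrate(
    pages={i: f"page-{i}" for i in range(100)},
    dirty_sets_per_round=[{1, 2, 3, 40}, {1, 2}],
    send=lambda pfn, data: sent.append(pfn),
)
print(f"pages transferred: {len(sent)}")     # 100 in round 1 + 4 residual
```

The `stop_at` threshold models the usual trade-off: stopping earlier shortens total migration time but lengthens the final downtime window.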
CPU Scheduling for Virtual Desktop Infrastructure (Hwanju Kim)
This document discusses CPU scheduling techniques for virtual desktop infrastructure (VDI). It proposes a demand-based coordinated scheduling approach for scheduling multithreaded workloads on multiprocessor virtual machines (VMs). The key points are:
1. Coordinated scheduling of sibling virtual CPUs (vCPUs) in a VM is needed to effectively schedule multithreaded workloads, as uncoordinated scheduling can reduce inter-thread communication performance.
2. A coordination space consisting of space (physical CPU assignment) and time (preemption policy) domains is defined to coordinate vCPU scheduling.
3. In the space domain, a load-conscious balance scheduling approach assigns sibling vCPUs across physical CPUs based
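The summary above breaks off mid-sentence, but the load-conscious balance scheduling of point 3 can still be illustrated. The hedged Python toy below places sibling vCPUs on distinct physical CPUs when lightly loaded ones are available and relaxes that constraint under load; the threshold, names, and fallback policy are assumptions, not the paper's implementation.

```python
def place_siblings(sibling_vcpus, pcpu_loads, load_threshold=1.0):
    """Map each sibling vCPU to a pCPU, preferring distinct, lightly loaded ones."""
    placement, used = {}, set()
    for vcpu in sibling_vcpus:
        candidates = [p for p, load in pcpu_loads.items()
                      if p not in used and load < load_threshold]
        if not candidates:                 # no lightly loaded pCPU left:
            candidates = list(pcpu_loads)  # allow stacking instead of overload
        target = min(candidates, key=lambda p: pcpu_loads[p])
        placement[vcpu] = target
        used.add(target)
        pcpu_loads[target] += 1.0          # account for the newly placed vCPU
    return placement

# "v2" stacks on the least-loaded pCPU rather than the overloaded "p2".
print(place_siblings(["v0", "v1", "v2"], {"p0": 0.2, "p1": 0.9, "p2": 1.5}))
```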
This document discusses CPU virtualization and scheduling techniques. It covers topics such as deprivileging the operating system, virtualization-unfriendly architectures like x86, hardware-assisted virtualization using VMX mode, and proportional-share scheduling. It also summarizes research on improving VM scheduling by making it task-aware to prioritize I/O-bound tasks and correlate I/O events with tasks to boost their performance while maintaining inter-VM fairness. The document provides historical context on the evolution of virtualization technologies and research challenges in building lightweight and intelligent VMM schedulers.
Hyper-V High Availability and Live Migration (Paulo Freitas)
This document provides an overview of a Microsoft Virtual Academy training program on Hyper-V virtualization. The program is split into two halves, with the first half covering topics like Hyper-V infrastructure, networking, storage, and management. The second half focuses on high availability, disaster recovery, and integrating Hyper-V with System Center. It also discusses capabilities like live migration, replication, clustering and improving application availability and redundancy through virtualization.
OS vs. VMM provides an overview of the similarities and differences between operating systems (OS) and virtual machine monitors (VMM). Both OS and VMM abstract hardware resources, but VMM provides virtualization while OS provides abstraction. Nested virtualization further complicates resource management by adding additional layers of indirection. Key issues in virtualization include trapping privileged OS operations, scheduling virtual CPUs, managing virtual memory translations, and achieving high performance I/O.
Yabusame: postcopy live migration for qemu/kvm (Isaku Yamahata)
Yabusame is a postcopy live migration technique for QEMU/KVM. It was developed by Isaku Yamahata of VALinux Systems Japan K.K. and Takahiro Hirofuchi of AIST. The project aims to improve live migration performance by allowing the guest VM to resume execution at the destination host before memory pages have been fully copied. This is achieved through asynchronous page fault handling during the postcopy phase. Evaluation shows the technique can improve CPU utilization and reduce total migration times compared to traditional precopy approaches. Future work includes upstream integration, support for KSM/THP, multithreading optimizations, and integration with management platforms like libvirt and OpenStack.
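As a hedged illustration of the postcopy idea, the Python toy below shows the destination side: the guest resumes immediately, reads of not-yet-present pages trigger demand fetches from the source, and a background prefetch hides later faults. The class and function names are invented and do not reflect the Yabusame code.

```python
class PostcopyDestination:
    """Destination-side toy: pages arrive on demand after the guest resumes."""
    def __init__(self, fetch_from_source):
        self.present = {}                 # pfn -> contents already received
        self.fetch = fetch_from_source    # models a round-trip to the source

    def read(self, pfn):
        if pfn not in self.present:       # fault on a not-yet-present page
            self.present[pfn] = self.fetch(pfn)   # demand fetch, guest stalls
        return self.present[pfn]

    def background_prefetch(self, order):
        for pfn in order:                 # proactive push hides later faults
            self.present.setdefault(pfn, self.fetch(pfn))

source_memory = {i: f"page-{i}" for i in range(8)}
dest = PostcopyDestination(fetch_from_source=source_memory.__getitem__)
print(dest.read(3))                       # faults once, fetched on demand
dest.background_prefetch(range(8))
print(dest.read(5))                       # already present, no stall
```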
Introduction to virtualization, with a timing analysis of video playback while the virtual machine hosting the server is live-migrated.
OS: Ubuntu
Hypervisor: KVM
Scheduler Support for Video-oriented Multimedia on Client-side Virtualization (Hwanju Kim)
Hwanju Kim, Jinkyu Jeong, Jaeho Hwang, Joonwon Lee, and Seungryoul Maeng, “Scheduler Support for Video-oriented Multimedia on Client-side Virtualization”, ACM Multimedia Systems (MMSys), Chapel Hill, North Carolina, USA, Feb. 2012.
Application Live Migration in LAN/WAN Environment (Mahendra Kutare)
Evaluation of VM live migration policies on VMware, Xen, IBM System P, and Hyper-V. Examination of the critical stages of a VM live migration policy as a state machine, and steps to optimize and improve service disruption time.
This document discusses I/O virtualization and GPU virtualization. It covers:
- Two approaches to I/O virtualization: hosted and device driver approaches. Hosted has lower engineering cost but lower performance.
- Methods to optimize para-virtualized I/O including split-driver models, reducing data copy costs, and hardware supports like IOMMU and SR-IOV.
- Challenges of GPU virtualization including whether to take a low-level virtualization or high-level API remoting approach. API remoting is preferred due to closed and evolving GPU hardware.
- Hardware pass-through of GPUs for high performance but low scalability. Industry solutions for remote desktop
VM live migration from one physical server to another is a key advantage of virtualization. It is widely used in scenarios such as load balancing, power-consumption optimization within a cluster, and host maintenance. Being able to perform VM live migration as quickly as possible with no service interruption is regarded as a key competitive strength of a virtualization platform.
Xen has supported live migration for many years. However, our recent study shows that Xen still has considerable room for improvement in live migration elapsed time, service downtime, and the number of concurrent migration instances. Several experimental enhancements have been added, and the initial results look promising. For instance, merely using memory comparison before migration speeds up the elapsed time by more than 2x in some cases in our evaluation. A policy to balance CPU utilization against compression ratio is also considered.
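A hedged sketch of what such a pre-migration memory comparison could look like: hash pages on both sides and transfer only those whose digests differ. The function names and data layout are assumptions for illustration; the actual Xen enhancement may work differently (e.g., exchanging digests over the network rather than raw pages).

```python
import hashlib

def digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def migrate_with_comparison(src_pages, dst_pages, send):
    """Send only pages whose content differs from what the destination holds."""
    sent = 0
    for pfn, data in src_pages.items():
        if pfn in dst_pages and digest(dst_pages[pfn]) == digest(data):
            continue                      # identical content: skip the copy
        send(pfn, data)
        sent += 1
    return sent

src = {0: b"\x00" * 4096, 1: b"app", 2: b"libc"}
dst = {0: b"\x00" * 4096, 2: b"libc"}     # e.g. zero pages / template image
n = migrate_with_comparison(src, dst, send=lambda pfn, data: None)
print(f"transferred {n} of {len(src)} pages")   # only page 1 moves
```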
Building a KVM-based Hypervisor for a Heterogeneous System Architecture Compl... (Hann Yu-Ju Huang)
This document discusses building a KVM-based hypervisor that can virtualize the key features of Heterogeneous System Architecture (HSA) for a compliant system. It describes HSA features like shared virtual memory, I/O page faulting, and user-level queueing. It then outlines the design of virtualizing these features through techniques like VirtIO-KFD for queues, shadow page tables for shared memory, and shadow PPR interrupts for page faults. Evaluation shows the hypervisor approach incurs average performance overhead of 5% for GPU execution compared to native execution.
This document discusses memory management techniques in Xen virtualization. It covers:
1) Xen uses a buddy allocator to hand out frames to guests and tracks memory usage and types with reference counts and a frametable.
2) For paravirtualized guests, Xen uses PV pagetables where the guest manages a PFN to MFN table and Xen provides a shared MFN to PFN table and checks guest pagetable contents.
3) For hardware-assisted guests, Xen supplies a second set of pagetables describing the PFN to MFN translation and access restrictions, which the CPU applies along with the guest's pagetables.
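A toy Python model of the PFN/MFN bookkeeping in points 2 and 3, under invented names (`ToyXen`, `validate_pte`): the guest-visible p2m map and the hypervisor's inverse m2p map let the hypervisor check that a guest pagetable entry only references machine frames the guest actually owns.

```python
class ToyXen:
    """Keeps the p2m map a PV guest sees and the inverse m2p map Xen shares."""
    def __init__(self, owned_frames):   # owned_frames: pfn -> mfn for one guest
        self.p2m = dict(owned_frames)                       # guest-visible map
        self.m2p = {m: p for p, m in owned_frames.items()}  # shared inverse map

    def validate_pte(self, guest_pfn):
        """A guest pagetable entry may only name machine frames it owns."""
        mfn = self.p2m.get(guest_pfn)
        if mfn is None or self.m2p.get(mfn) != guest_pfn:
            raise PermissionError(f"guest does not own pfn {guest_pfn}")
        return mfn                      # checked entry may now be installed

xen = ToyXen({0: 1042, 1: 77, 2: 993})
print(xen.validate_pte(1))              # -> 77
```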
The document discusses virtualization techniques used in KVM. It describes how KVM uses shadow page tables to virtualize memory management. The shadow page tables allow virtual addresses used by a guest OS to be translated to physical addresses on the host machine. Different techniques for implementing shadow page tables are described, including pre-validation of guest page tables and using a virtual translation lookaside buffer to cache translations.
Memory virtualization allows virtual machines to access virtual memory addresses that are translated to physical memory addresses. Hardware support for memory virtualization reduces overhead by offloading page table management to the CPU. Memory sharing between virtual machines reduces memory usage by identifying identical pages and having them share physical memory. Virtual memory overcommitment allocates more virtual memory than physical memory available by swapping out unused memory to disk. Techniques for memory sharing and overcommitment aim to improve memory utilization in virtualized systems.
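The page-sharing idea lends itself to a short sketch. The hedged Python toy below indexes page contents by digest so identical pages across VMs collapse to one frame; copy-on-write on later writes is assumed but not shown, and all names are illustrative rather than any hypervisor's API.

```python
import hashlib

def share_identical_pages(vm_pages):
    """vm_pages: {vm_id: {pfn: bytes}} -> (machine frames, per-VM mappings)."""
    frames, index, mappings = [], {}, {}
    for vm, pages in vm_pages.items():
        for pfn, data in pages.items():
            key = hashlib.sha256(data).digest()
            if key not in index:              # first copy: allocate a frame
                index[key] = len(frames)
                frames.append(data)
            mappings[(vm, pfn)] = index[key]  # later copies share that frame
    return frames, mappings

frames, maps = share_identical_pages({
    "vm1": {0: b"zeros", 1: b"libc.so"},
    "vm2": {0: b"zeros", 5: b"libc.so"},
})
print(f"{len(maps)} mappings backed by {len(frames)} frames")  # 4 backed by 2
```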
Virtual Machine Migration Techniques in Cloud Environment: A Survey (ijsrd.com)
Cloud is an emerging technology in the world of information technology and is built on the key concept of virtualization. Virtualization separates hardware from software and has the benefits of server consolidation and live migration. Live migration is a useful tool for migrating OS instances across distant physical hosts of data centers and clusters. It facilitates load balancing, fault management, low-level system maintenance, and reduction in energy consumption. In this paper, we survey the major issues of virtual machine live migration. Various techniques are available for live migration, and different parameters are considered for migration.
This document summarizes a talk on redesigning Xen's memory sharing (grant) mechanism. It proposes moving grant-related hypercalls to guest domains to allow unilateral revocation of grants by domains and enable better reuse of grants. An evaluation shows the redesigned mechanism with grant reuse reduces overhead and improves I/O performance compared to the traditional approach.
Webinar: VMware vSphere Performance Management Challenges and Best Practices (Metron)
With the majority of businesses using internal Cloud Services, whether it be Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS) in a VMware vSphere environment, this presentation gives an insight into how to manage the gathering Storm Clouds. After an introduction to VMware's Virtual Infrastructure 4 (vSphere) environment and Cloud Computing, we discuss how Capacity Management provides the means to spot potential Storm Clouds far in advance and, more specifically, how you can avoid them.
Delving deeper we look at IaaS and how to identify potential capacity on demand issues. Discussion focuses on topics such as:
•identifying whether virtual machines are under or over provisioned
•the advantages/disadvantages of application sizing
•how to minimize SLA impact
•whether to scale the infrastructure out, up or in and ultimately how to get it right.
Typically, organizations have adopted a "silo mentality" whereby they ring-fence IT systems and don't share resources through a lack of trust and confidence. We look at the advantages virtualization brings in terms of flexibility, scalability, and cost reduction (monetary and environmental), and how we can protect our 'loved ones' through resource pools, shares, reservations and limits.
With all this in mind, join us to find out what information and processes we recommend you have and implement to avoid an Internal Storm and ensure a Brighter Outlook!
Todd Deshane presented results from benchmarking tests of Xen and KVM virtualization systems at the 2008 Xen Summit. Key findings included Xen having similar or better CPU performance than KVM, but KVM outperforming Xen on some disk and network tests. Tests of performance isolation showed Xen was generally more isolated, while KVM scaling failed as guest numbers increased beyond 4. Areas for further work were identified, such as expanding tests and automating processes.
This document discusses different techniques for virtual machine migration. It begins with an introduction to virtualization and how virtual machine migration involves copying a VM from one physical machine to another. There are three main categories of migration techniques: fault tolerant techniques which migrate VMs to prevent failures, load balancing techniques which distribute load across servers, and energy efficient techniques which optimize resource utilization to conserve energy. Live VM migration is described as migrating the entire OS and applications between physical machines without disrupting applications. The document also covers background details on virtual machine migration methods being either hot/live where the VM continues running, or cold/non-live where the VM status is lost during migration.
XPDDS18: Memory Overcommitment in XEN - Huang Zhichao, Huawei (The Linux Foundation)
This talk will introduce our practice with Memory Overcommitment in XEN and share some findings and lessons, e.g.: the best practice for PoD (Populate on Demand), including live migration of PoD pages; mem-shr, a memory-saving de-duplication feature that merges identical pages; Xenpaging optimization, including some policy enhancements; scalability investigation and enhancements for Memory Overcommitment; and the areas where Memory Overcommitment benefits Huawei Cloud, with some performance data.
Virtual Asymmetric Multiprocessor for Interactive Performance of Consolidated... (Sangwook Kim)
The document proposes a virtual asymmetric multiprocessor (vAMP) approach to improve interactive performance in virtual desktop infrastructure (VDI) environments. vAMP dynamically adjusts CPU shares asymmetrically between virtual CPUs (vCPUs) within a virtual machine (VM) based on whether the task hosted on each vCPU is interactive or background. It identifies interactive tasks using a lightweight technique that monitors user I/O and per-task CPU load at the virtual machine monitor (VMM) level. An optional guest OS extension further isolates interactive and background tasks on different vCPUs to mitigate performance degradation from frequent task multiplexing. Evaluation shows the approach improves launch times of interactive applications by up to 70% and frame rates of media
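A minimal sketch of the share-adjustment policy just described, assuming invented weights and a boolean interactivity signal (the paper's actual VMM-level heuristics are more involved):

```python
def asymmetric_shares(vcpus, total_share=1000, boost=4.0):
    """vcpus: {name: {"interactive": bool, "load": float in [0,1]}} -> shares."""
    weights = {}
    for name, v in vcpus.items():
        w = v["load"] or 0.05              # idle vCPUs keep a tiny weight
        if v["interactive"]:               # user I/O observed on this vCPU
            w *= boost                     # fast tier gets an asymmetric share
        weights[name] = w
    scale = total_share / sum(weights.values())
    return {name: round(w * scale) for name, w in weights.items()}

print(asymmetric_shares({
    "vcpu0": {"interactive": True,  "load": 0.5},   # e.g. video playback
    "vcpu1": {"interactive": False, "load": 0.9},   # e.g. background compile
}))   # vcpu0 gets roughly 690 of 1000 shares
```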
Introduction to Virtualization, Virsh and Virt-Manager (walkerchang)
Virtualization allows for the abstraction and sharing of computer hardware resources like CPU, memory, storage and network capacity. The document introduces virtualization concepts and the tools KVM, Virsh and Virt-manager. It provides documentation on Virsh commands to manage domains (VMs), interfaces and networks. These include commands to define, start, suspend, resume VMs and interfaces, as well as take and restore VM snapshots to revert states. Managing VMs, interfaces and networks with Virsh commands allows administrators to efficiently share hardware resources across VMs.
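The lifecycle commands mentioned above are easy to drive from a script. A hedged Python helper, assuming virsh is installed and a domain named "demo" is already defined (the domain and snapshot names are placeholders):

```python
import subprocess

def virsh(*args):
    """Run one virsh command and return its stdout (raises on failure)."""
    return subprocess.run(["virsh", *args], check=True,
                          capture_output=True, text=True).stdout

print(virsh("list", "--all"))                 # enumerate defined domains
virsh("start", "demo")                        # boot the VM
virsh("snapshot-create-as", "demo", "clean")  # checkpoint the current state
virsh("suspend", "demo")                      # pause its vCPUs
virsh("resume", "demo")                       # continue execution
virsh("snapshot-revert", "demo", "clean")     # roll back to the snapshot
```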
In this session we examined Xen PV performance on the latest platforms in a few cases that cover CPU/memory-intensive, disk-intensive, and network-intensive workloads. We compared Xen PV guests vs. HVM/PVOPS to see whether PV guests still have an advantage over HVM on a system with state-of-the-art VT features. KVM was also compared as a reference. We also compared PV driver performance against bare-metal and pass-through/SR-IOV. The identified issues were discussed and we presented our proposal for fixing those issues.
Virtual machine migration techniques can be categorized as non-live and live migration. Live migration has lower downtime and involves pre-copying memory pages to the destination host before migration, post-copying memory after migration, or a hybrid approach. Pre-copy migration creates duplicates during a warm-up phase before stopping the virtual machine to copy remaining changes, while post-copy sends pages on demand after migration completes and the virtual machine resumes.
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM (vwchu)
With co-presenter Maninder Singh, delivered a presentation about hypervisors and virtualization technology for an independent topic study project in the Operating System Design (EECS 4221) course at York University, Canada, in October 2014.
Virtualization, briefly, is the separation of resources or requests for a service from the underlying physical delivery of that service. It is a concept in which access to a single underlying piece of hardware is coordinated so that multiple guest operating systems can share a single piece of hardware, with no guest operating system being aware that it is actually sharing anything at all.
The document summarizes Xen, an open source hypervisor, and its approach to virtualizing I/O. Xen uses a privileged "dom0" domain to control hardware access and export virtualized devices to other unprivileged domains. It implements I/O memory management through software techniques like grant tables and swiotlb, as well as emerging hardware support from AMD and Intel. Overall, Xen provides secure isolation of guest VMs while enabling high-performance shared access to physical hardware resources.
Hardware support for virtualization originated in the 1970s with goals of running multiple virtual machines on a single physical machine. A key requirement was virtualization allowing equivalent execution of programs in a virtual environment as running natively. The x86 architecture posed challenges to virtualization due to sensitive instructions. Intel Virtualization Technology (VT-x) added hardware support for virtualization on x86 by introducing a new CPU operation mode called VMX non-root, and transitions between it and VMX root mode. This reduced the need for software emulation of sensitive instructions and improved virtualization performance.
Task-aware Virtual Machine Scheduling for I/O Performance (Hwanju Kim)
Hwanju Kim, Hyeontaek Lim, Jinkyu Jeong, Heeseung Jo, and Joonwon Lee, “Task-aware Virtual Machine Scheduling for I/O Performance”, ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE), Washington, DC, USA, Mar. 2009.
Extending TripleO for OpenStack Management (Keith Basil)
Operational awareness and value for cloud operators have largely been ignored by the OpenStack community. Today, with the maturity of TripleO and the inclusion of Tuskar, we can begin to think about TripleO's use as a vehicle for OpenStack infrastructure management.
The question now is: "How do we extend TripleO with additional value?"
Within this context, there are several areas of integration which can be explored. These include an operator dashboard, infrastructure instrumentation agents, bare metal drivers and other supporting services. Hardware and software vendors can gain insight into what integration looks like from a product point of view.
In this session, we will explore:
- Why TripleO works for infrastructure management
- TripleO management integration points
- What TripleO means for hardware/software vendors
- Early work in this area
Become An OpenStack TripleO ATC - Easy As ABC (K Rain Leander)
Today's talk is about becoming an RDO / OpenStack / TripleO active technical contributor, but what does that mean? It means that you are a developer, designer, support engineer, tester, project manager, writer, or whatnot, and you'd like to become involved in an open source project. It means that you want to contribute to TripleO, which is a project within the umbrella space called OpenStack, and that you can join the RDO community for guidance, feedback, and support.
TripleO is an OpenStack project that aims to deploy OpenStack using OpenStack. It provides automation to deploy and test OpenStack clouds at the bare metal layer using tools like Heat, Diskimage-Builder, and Ironic. TripleO designs robust gold images to deploy consistently tested and reliable OpenStack environments, reducing costs of operations and maintenance through continuous integration and deployment techniques. By deploying OpenStack on bare metal with tools like Ironic, TripleO can reliably install and upgrade OpenStack clouds.
Author: Rico Lin
Intro:
A detailed dive into a big task in Heat: optimizing application experiences in OpenStack. This task aims to provide a datacenter-ready Orchestration service on OpenStack and to give Heat, Murano, Sahara, TripleO, and any other services that use Heat trusted and stable orchestration over the cloud.
This document provides an introduction to virtualization including:
1) The benefits of virtualization like efficient resource utilization and strong isolation between virtual machines.
2) A brief history of virtualization from the 1960s mainframe era to modern ubiquitous cloud computing.
3) Popular use cases of virtualization including cloud computing, virtual desktop infrastructure, and mobile virtualization.
4) Basic terminologies that distinguish type-1 and type-2 virtual machine monitors as well as full and para-virtualization methods.
RTOS Material (adugnanegero)
This document provides an overview of real-time operating systems and kernel concepts across 34 slides. The key topics covered include real-time kernels, tasks and processes, scheduling algorithms like priority-based and cyclic executives, intertask communication methods like mailboxes and semaphores, and synchronization techniques.
This document provides an introduction to real-time systems and discusses approaches to making Linux a real-time operating system. It defines hard and soft real-time systems and explains why Linux is commonly used instead of dedicated real-time operating systems. The document then discusses two main solutions, PREEMPT_RT and Xenomai 3, which provide patches to make Linux meet timing constraints through different approaches. It also provides an overview of basic real-time concepts like scheduling algorithms, preemptive vs. non-preemptive scheduling, and interprocess communication.
mTCP enables high-performance userspace TCP/IP stacks by bypassing the kernel and reducing system call overhead. It was shown to achieve up to 25x higher throughput than Linux for short flows. The document discusses porting the iperf benchmark to use mTCP, which required only minor changes. Performance tests found that mTCP-ified iperf achieved similar throughput as Linux iperf for different packet sizes, demonstrating mTCP's ability to easily accelerate networking applications with minimal changes. The author concludes mTCP is a simple and effective way to improve TCP performance but notes that for full-featured stacks, a system like NUSE may be preferable as it can provide the high performance of userspace stacks while supporting the full functionality of kernel
Achieving Performance Isolation with Lightweight Co-KernelsJiannan Ouyang, PhD
These slides were presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15).
Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS co-kernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
Project ACRN CPU sharing: BVT scheduler in the ACRN hypervisor (Project ACRN)
This document describes the Borrowed Virtual Time (BVT) scheduler algorithm implemented in the ACRN hypervisor. BVT aims to provide weighted fair sharing of CPU resources across VMs while prioritizing latency-sensitive workloads. It tracks an effective virtual time for each VM and dispatches the VM with the earliest time. Latency threads can "warp" back in time. BVT is evaluated against the IORR scheduler in ACRN across CPU throughput, I/O throughput and latency tests, showing BVT provides more fair sharing and higher performance. The BVT implementation consists of 302 lines of code in the acrn-hypervisor.
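A hedged Python toy of the BVT dispatch rule summarized above: each vCPU's actual virtual time ages inversely to its weight, and a latency vCPU's warp subtracts from its effective virtual time so it wins dispatch more often. All constants are illustrative, not ACRN's.

```python
def bvt_pick(vcpus):
    """Dispatch the vCPU with the earliest effective virtual time (avt - warp)."""
    return min(vcpus, key=lambda n: vcpus[n]["avt"] - vcpus[n]["warp"])

def bvt_account(vcpus, name, slice_units=4.0):
    vcpus[name]["avt"] += slice_units / vcpus[name]["weight"]  # weight slows aging

vcpus = {
    "batch":   {"avt": 100.0, "weight": 2.0, "warp": 0.0},
    "latency": {"avt": 103.0, "weight": 1.0, "warp": 10.0},   # warped back in time
}
for _ in range(4):
    chosen = bvt_pick(vcpus)
    print("dispatch:", chosen)     # latency, latency, batch, latency
    bvt_account(vcpus, chosen)
```

Note how the warped vCPU preempts the batch vCPU without permanently starving it: its actual virtual time still advances, so fairness is preserved over the long run.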
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs (Jiannan Ouyang, PhD)
These slides were presented at the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’16).
Virtual Machine based approaches to workload consolidation, as seen in IaaS cloud as well as datacenter platforms, have long had to contend with performance degradation caused by synchronization primitives inside the guest environments. These primitives can be affected by virtual CPU preemptions by the host scheduler that can introduce delays that are orders of magnitude longer than those primitives were designed for. While a significant amount of work has focused on the behavior of spinlock primitives as a source of these performance issues, spinlocks do not represent the entirety of synchronization mechanisms that are susceptible to scheduling issues when running in a virtualized environment. In this paper we address the virtualized performance issues introduced by TLB shootdown operations. Our profiling study, based on the PARSEC benchmark suite, has shown that up to 64% of a VM's CPU time can be spent on TLB shootdown operations under certain workloads. In order to address this problem, we present a paravirtual TLB shootdown scheme named Shoot4U. Shoot4U completely eliminates TLB shootdown preemptions by invalidating guest TLB entries from the VMM and allowing guest TLB shootdown operations to complete without waiting for remote virtual CPUs to be scheduled. Our performance evaluation using the PARSEC benchmark suite demonstrates that Shoot4U can reduce benchmark runtime by up to 85% compared an unmodified Linux kernel, and up to 44% over a state-of-the-art paravirtual TLB shootdown scheme.
NUSE is a library implementation of a network stack in userspace that allows new protocols and implementations to be added more quickly without modifying the kernel. It works by hijacking system calls related to networking at the library level, running the network stack code in a separate execution context using lightweight virtualization, and connecting to the network interface using options like raw sockets, DPDK, or netmap. This approach avoids the slow evolution of making kernel changes and allows network stacks and applications to be updated and deployed more flexibly on a per-application basis.
PFQ @ 9th Italian Networking Workshop, Courmayeur (Nicola Bonelli)
PFQ is a novel packet capturing architecture designed to maximize performance on multi-core systems. It avoids issues with previous approaches like slow kernel packet capturing and single-processor packet steering. PFQ uses wait-free algorithms, prefetching queues, and a demultiplexing matrix to distribute packets concurrently across cores and sockets while minimizing sharing and false sharing. It provides flexible packet steering and filtering to balance loads and classify packets. Tests show PFQ can process over 13 million packets per second on commodity hardware using only a few cores.
Advanced performance troubleshooting using esxtop (Alan Renouf)
This document discusses using esxtop and resxtop tools to troubleshoot performance issues on VMware ESXi hosts. It provides 10 key things to know about esxtop counters and how they work. It then gives examples of using esxtop to troubleshoot common problems like CPU contention, memory issues, network throughput problems, and disk I/O latency. It also lists some other diagnostic tools that can be used along with esxtop.
Kubernetes @ Squarespace, SRE Portland Meetup October 2017 (Kevin Lynch)
In this presentation I talk about our motivation to converting our microservices to run on Kubernetes. I discuss many of the technical challenges we encountered along the way, including networking issues, Java issues, monitoring and alerting, and managing all of our resources!
Ensuring performance for real-time packet processing in OpenStack white paper (hptoga)
1) The document discusses ensuring low latency for real-time packet processing in OpenStack by addressing issues like non-deterministic behavior and accelerating packet handling.
2) It recommends OpenStack configurations including NUMA awareness, CPU pinning, huge pages, and disabling overcommit settings to provide deterministic performance (see the sketch after this list).
3) Testing showed these recommendations cumulatively provided a 5x increase in capacity and 6x increase in throughput for real-time communication workloads.
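As a hedged illustration of item 2's recommendations, the snippet below collects them as Nova flavor extra specs plus scheduler allocation ratios. The keys are real OpenStack options; the values shown are assumptions for a real-time profile, not the white paper's tested settings.

```python
# Nova flavor extra specs for a latency-sensitive guest.
realtime_flavor_extra_specs = {
    "hw:cpu_policy": "dedicated",    # pin each vCPU to a host core
    "hw:numa_nodes": "1",            # keep CPU and memory on one NUMA node
    "hw:mem_page_size": "large",     # back guest RAM with huge pages
}

# nova.conf scheduler ratios that disable overcommit.
nova_conf_overrides = {
    "cpu_allocation_ratio": 1.0,
    "ram_allocation_ratio": 1.0,
}

print(realtime_flavor_extra_specs, nova_conf_overrides)
```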
Network functions virtualization (NFV) has the potential to transform the way operators offer services. While it brings with it flexibility to enable operators to offer customizable services that can deliver great value to the end user - or, as a leading carrier describes it, a "user-defined network" - it can also complicate network operations.
Some of the concerns over sync and NFV are already being addressed in the data center world. Take, for example, large financial trading houses, where synchronization is tightly coupled into the software architecture to provide microsecond-level time-stamping of trades. This presentation examines the new options for synchronization as it relates to NFV, and what it will take to enable accurate synchronization over a virtual network.
AIST Super Green Cloud: lessons learned from the operation and the performanc... (Ryousei Takano)
This document discusses lessons learned from operating the AIST Super Green Cloud (ASGC), a fully virtualized high-performance computing (HPC) cloud system. It summarizes key findings from the first six months of operation, including performance evaluations of SR-IOV virtualization and HPC applications. It also outlines conclusions and future work, such as improving data movement efficiency across hybrid cloud environments.
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D... (VMworld)
VMworld 2013
Bhavesh Davda, VMware
Josh Simons, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
The Next Step of OpenStack Evolution for NFV Deployments (Dirk Kutscher)
NFV is now a well-known concept and in an early deployment stage, leveraging and adapting OpenStack and other Open Source Software systems. In the OPNFV project, a large group of industry peers is building a carrier-grade, integrated, open source reference platform for the NFV community. The telco industry has successfully adopted Open Source Software for carrier-grade deployments. It is now time to take the next steps and extend the collaboration with upstream projects: by opening up previously proprietary developments, and by contributing code and other artifacts in order to create an ecosystem of NFV platforms, applications, and management/orchestration systems.
This talk shares some insights on how Red Hat and NEC are working together to foster collaboration in the NFV ecosystem by actively working with OpenStack and other upstream projects.
NEC has pioneered the adoption of Linux, KVM, Open vSwitch, and OpenStack for their mobile network core product line (virtualized EPC) and has gained significant experience through development work and deployments. NEC's extensions for high efficiency and high availability have led to contributions of new features to OpenStack, such as DPDK vSwitch control and CPU allocation features. For NEC, it is very important to have those features integrated into the mainstream code base for building reliable infrastructure systems.
Red Hat, one of the main contributors to OpenStack, leads the development of those functions to meet NFV requirements in OpenStack, making critical and demanding applications run on top of open platforms. The presentation explains how NEC and Red Hat are integrating and optimizing Red Hat Enterprise Linux OpenStack Platform for NFV, along with contributions to open source communities, including OpenStack and the Open Platform for NFV (OPNFV).
The engineering challenges of designing for low latency execution include tightly controlling the time it takes to detect the onset of latency excursion and a diagnosis of its most likely cause. In modern x-as-a-service (XaaS) forms of distributed applications, the points at which latency is experienced by a service consumer are separated by many layers of modular abstractions from the underlying system hardware. This separation makes it difficult to pinpoint the causes of latency pushouts and to apply corrective actions in a timely manner. The classic performance methodology to profile ‘cycles’ of work may be broadly successful in extracting higher levels of latency, but not very effective in determining causes of short-duration latency surges; and, to determine that, it is frequently necessary to:
• trace execution
• pinpoint when a significant latency stretch out occurs
• establish its correlation with a nearby precursor or a set of precursor events
Each of these steps can incur significant overheads; further, one has to be concerned that even modest overheads from tracing risk contributing to tail latencies. Not just the detection of the onset of a latency excursion, but also the identification of why it occurs must be completed quickly, so that if a corrective action is possible, it can be taken promptly. Similarly, if no recourse to curb the latency of a slice of computation is available at some point in time, then it is ideal that steps to minimize the impact of the exception are put into effect as early as possible.
In our talk, we present an approach that complements the very low overhead software tracing provided by KUtrace. It uses eBPF to trigger a collection of additional data at very low overhead from the hardware performance monitoring unit (PMU) so that latency excursions within a span of execution can be examined in a timely manner. We will describe the use of PMU capabilities like precise events-based sampling (PEBS) and timed last branch records (Timed LBRs) in close proximity to events of interest to extract critical clues. We will further discuss planned future work to integrate in-band network telemetry (INT) into these tracing flows.
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor (Matteo Ferroni)
Virtualization allows simultaneous execution of multi-tenant workloads on the same platform, either a server or an embedded system. Unfortunately, it is non-trivial to attribute hardware events to multiple virtual tenants, as some system metrics relate to the whole system (e.g., RAPL energy counters). Virtualized environments therefore have a rather incomplete picture of how tenants use the hardware, limiting their optimization capabilities. Thus, we propose XeMPower, a lightweight monitoring solution for Xen that precisely accounts hardware events to guest workloads. It also enables attribution of CPU power consumption to individual tenants. We show that XeMPower introduces negligible overhead in power consumption, aiming to be a reference design for power-aware virtualized environments.
Full paper: http://ceur-ws.org/Vol-1697/EWiLi16_10.pdf
Literature Review Basics and Understanding Reference Management.pptx (Dr Ramhari Poudyal)
Three-day training on academic research focusing on analytical tools, held at United Technical College and supported by the University Grants Commission, Nepal, 24-26 May 2024.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS (IJNSA Journal)
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan, from the Delivery Hero mobile infrastructure engineering team, shares a deep dive into performance acceleration with Gradle build-cache optimizations: the team's journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and the solutions found along the way, the talk aims to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and NetworkingUniversity of Maribor
Slides from the talk: Aleš Zamuda, "Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking."
Presented at the IcETRAN 2024 session "Inter-Society Networking Panel GRSS/MTT-S/CIS - Panel Session: Promoting Connection and Cooperation" (IEEE Slovenia GRSS, IEEE Serbia and Montenegro MTT-S, IEEE Slovenia CIS).
11th International Conference on Electrical, Electronic and Computing Engineering, 3-6 June 2024, Niš, Serbia.
ACEP Magazine edition 4th launched on 05.06.2024Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on lifetime achievement awards given by ACEP, and a technical article on concrete maintenance, repair, and strengthening. The document highlights ACEP's activities and provides a technical educational article for members.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet has pushed the United Nations and governments to promote green energy and electric transportation. The deployment of photovoltaic (PV) and electric vehicle (EV) systems has gained strong momentum due to their numerous advantages over fossil-fuel alternatives; these advantages go beyond sustainability to include financial support and stability. This paper introduces a hybrid PV-EV system to support industrial and commercial plants. It covers the theoretical framework of the proposed hybrid system, including the equations required to complete a cost analysis when PV and EV are present, and presents the proposed design diagram, which sets the priorities and requirements of the system. The proposed approach allows a plant to improve its power stability, especially during power outages. The presented information supports researchers and plant owners in completing the necessary analysis while promoting the deployment of clean energy. The results of a case study representing a dairy milk farm support the theoretical work and highlight its benefits to existing plants. The short return on investment supports the paper's novel approach to a sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line, which enhances the safety of the electrical network.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life-cycle costs. Compared to natural aggregate (NA) pavements, however, RCA pavements have been the subject of fewer comprehensive studies and sustainability assessments.
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Demand-Based Coordinated Scheduling for SMP VMs
1. Demand-Based Coordinated Scheduling for SMP VMs
Hwanju Kim1, Sangwook Kim2, Jinkyu Jeong1, Joonwon Lee2, and Seungryoul Maeng1
Korea Advanced Institute of Science and Technology (KAIST)1
Sungkyunkwan University2
The 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Houston, Texas, March 16-20, 2013
2. Software Trends in Multi-core Era
• Making the best use of HW parallelism
• Increasing "thread-level parallelism"
[Figure: HW/SW stack - processor, OS, and apps]
Processors are adding more and more cores, and apps are increasingly multithreaded; RMS apps are "emerging killer apps".
("Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications", Proceedings of the IEEE, 2008)
3. Software Trends in Multi-core Era
• Synchronization (communication)
• The greatest obstacle to the performance of multithreaded workloads
[Figure: threads busy-waiting at barriers and spin-waiting on a spinlock while occupying a CPU]
4. Software Trends in Multi-core Era
• Virtualization
• Ubiquitous for consolidating multiple workloads
• "Even OSes are workloads to be handled by the VMM"
[Figure: multiple SMP VMs running on a VMM]
A virtual CPU (vCPU) is a software entity dictated by the VMM scheduler, so "synchronization-conscious coordination" is essential for the VMM to improve efficiency.
5. Coordinated Scheduling
[Figure: vCPUs time-shared on four pCPUs by the VMM scheduler]
• Uncoordinated scheduling: each vCPU is treated as an independent entity.
• Coordinated scheduling: sibling vCPUs (those belonging to the same VM) are treated as a group.
Under uncoordinated scheduling, a running lock holder's sibling lock waiters can be left descheduled and waiting, making inter-vCPU synchronization ineffective.
6. Prior Efforts for Coordination
• Coscheduling [Ousterhout82]: synchronizing execution
  • Illusion of a dedicated multi-core, but CPU fragmentation
• Relaxed coscheduling [VMware10]: balancing execution time by stopping execution for siblings to catch up
  • Good CPU utilization & coordination, but not based on synchronization demands
• Balance scheduling [Sukwong11]: balancing pCPU allocation
  • Good CPU utilization & coordination, but not based on synchronization demands
• Selective coscheduling [Weng09,11]: coscheduling selected vCPUs
  • Better coordination through explicit information, but relying on user or OS support
→ Need for VMM scheduling based on synchronization (coordination) demands
7. Overview
• Demand-based coordinated scheduling
• Identifying synchronization demands
• With non-intrusive design
• Not compromising inter-VM fairness
[Figure: coscheduling demanded for synchronization; delayed preemption demanded on a preemption attempt]
8. Coordination Space
• Time and space domains
• Independent scheduling decision for each domain
• Space - where to schedule? (pCPU assignment policy)
• Time - when to schedule? (preemptive scheduling policy: coscheduling, delayed preemption)
9. Outline
• Motivation
• Coordination in time domain
  • Kernel-level coordination demands
  • User-level coordination demands
• Coordination in space domain
  • Load-conscious balance scheduling
• Evaluation
10. Synchronization to be Coordinated
• Synchronization based on "busy-waiting"
• Unnecessary CPU consumption by busy-waiting for a descheduled vCPU
• Significant performance degradation
• Semantic gap: "OSes make liberal use of busy-waiting (e.g., spinlock) since they believe their vCPUs are dedicated"
• A serious problem in the kernel
• When and where is synchronization demanded, and how can coordination demands be identified?
11. Kernel-Level Coordination Demands
• Does the kernel really need coordination?
• Experimental analysis
  • Multithreaded applications in the PARSEC suite
  • Measuring "kernel time" when uncoordinated
  • An 8-vCPU VM on 8 pCPUs: solorun (no consolidation) vs. corun (w/ 1 VM running streamcluster)
[Figure: kernel vs. user CPU time (%) for the PARSEC benchmarks, solorun and corun]
The kernel time ratio is amplified by 1.3x-30x: "newly introduced kernel-level contention"
12. Kernel-Level Coordination Demands
• Where is the kernel time amplified?
[Figure: kernel-time breakdown by function for the PARSEC benchmarks]
Dominant sources: 1) TLB shootdown, 2) lock spinning. How can they be identified?
13. How to Identify TLB Shootdown?
• TLB shootdown
• Notification of TLB invalidation to a remote CPU
[Figure: a thread modifies or unmaps a shared mapping (V->P1 becomes V->P2 or V->Null), sends an inter-processor interrupt (IPI), and busy-waits until all corresponding remote TLB entries are invalidated]
"Busy-waiting for TLB synchronization" is efficient in native systems, but not in virtualized systems if target vCPUs are not scheduled (even worse if TLBs are synchronized in a broadcast manner). A minimal code sketch follows.
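To make the cost concrete, here is a minimal C sketch of the sender-side busy-wait in a TLB shootdown. This is not the Linux implementation; the types and the send_flush_ipi helper are hypothetical, but the structure matches the mechanism described on the slide.

#include <stdatomic.h>

struct shootdown { atomic_int pending_acks; };

/* Hypothetical: inject a TLB-flush IPI into the target CPU. */
static void send_flush_ipi(int cpu) { (void)cpu; }

void tlb_shootdown(struct shootdown *sd, const int *targets, int n)
{
    atomic_store(&sd->pending_acks, n);
    for (int i = 0; i < n; i++)
        send_flush_ipi(targets[i]);
    /* Busy-wait until every target acknowledges its flush. Cheap on bare
     * metal; in a VM, a descheduled target vCPU cannot acknowledge, so
     * the sender spins away its entire time slice. */
    while (atomic_load(&sd->pending_acks) > 0)
        ;  /* cpu_relax() */
}

/* Runs on each target CPU in its IPI handler. */
void flush_ipi_handler(struct shootdown *sd)
{
    /* A local flush of the affected TLB entries would happen here. */
    atomic_fetch_sub(&sd->pending_acks, 1);
}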
14. How to Identify TLB Shootdown?
• TLB shootdown IPI
• Virtualized by the VMM
• Used in x86-based Windows and Linux
[Figure: kernel-time breakdown and TLB shootdown IPI traffic (# of IPIs/vCPU/sec, up to ~2000) for the PARSEC benchmarks]
"A TLB shootdown IPI is a signal for coordination demand!"
→ Co-schedule IPI-recipient vCPUs with the sender vCPU (see the sketch below)
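A minimal sketch of how a VMM might act on this signal. The hook and helpers (on_guest_ipi, make_urgent, kick_pcpu, deliver_ipi) and the vector number are hypothetical illustrations, not the paper's actual KVM changes.

struct vcpu { int id; int pcpu; };

/* Illustrative vector; real kernels define their own TLB-invalidate vectors. */
#define TLB_SHOOTDOWN_VECTOR 0xfd

void make_urgent(struct vcpu *v);            /* enqueue on its pCPU's urgent queue */
void kick_pcpu(int pcpu);                    /* force a scheduling decision now */
void deliver_ipi(struct vcpu *v, int vec);   /* normal virtual IPI injection */

/* Hypothetical hook invoked when a guest vCPU sends an IPI. */
void on_guest_ipi(struct vcpu *sender, struct vcpu *recipient, int vector)
{
    (void)sender;
    if (vector == TLB_SHOOTDOWN_VECTOR) {
        /* Coordination demand: the recipient must run soon to flush its
         * TLB and acknowledge, or the sender busy-waits uselessly. */
        make_urgent(recipient);
        kick_pcpu(recipient->pcpu);
    }
    deliver_ipi(recipient, vector);
}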
15. How to Identify Lock Spinning?
• Why excessive lock spinning?
• "Lock-holder preemption (LHP)": a short critical section can be unpredictably prolonged by vCPU preemption
• Which spinlock is problematic?
[Figure: spinlock wait-time breakdown - futex wait-queue lock, semaphore wait-queue lock, pagetable lock, runqueue lock, other locks; wait-queue locks account for 82-93% of spinlock wait time]
16. How to Identify Lock Spinning?
• Futex
• Linux kernel support for user-level synchronization (e.g., mutex, barrier, condition variables, etc.)

vCPU1:
  mutex_lock(mutex)
  /* critical section */
  mutex_unlock(mutex)
  futex_wake(mutex) {          /* kernel space */
    spin_lock(queue->lock)
    thread = dequeue(queue)
    wake_up(thread)            /* sends a reschedule IPI to vCPU2 */
    spin_unlock(queue->lock)
  }

vCPU2:
  mutex_lock(mutex)            /* user-level contention */
  futex_wait(mutex) {          /* kernel space */
    spin_lock(queue->lock)     /* kernel-level contention */
    enqueue(queue, me)
    spin_unlock(queue->lock)
    schedule()                 /* blocked */
  }
  /* wake-up */
  /* critical section */
  mutex_unlock(mutex)
  futex_wake(mutex) {
    spin_lock(queue->lock)
    ...

LHP! If vCPU1 is preempted before releasing its spinlock, vCPU2 starts busy-waiting on the preempted spinlock.
17. How to Identify Lock Spinning?
• Why preemption-prone?
[Figure: timeline of a remote thread wake-up - vCPU0 takes the wait-queue lock, then IPI emulation (VMExit) and APIC register accesses (VMExit/VMEntry) occur before the wait-queue unlock]
• The critical section is prolonged by VMM intervention: multiple VMM interventions for one IPI transmission
• Repeated by iterative wake-up: no more short critical section!
• Higher likelihood of preemption, including preemption by the woken-up sibling - a serious issue
18. How to Identify Lock Spinning?
• Generalization: "wait-queue locks"
• Not limited to futex wake-up: many wake-up functions in the Linux kernel
  • General wake-up: __wake_up*()
  • Semaphore or mutex unlock: rwsem_wake(), __mutex_unlock_common_slowpath(), ...
• "Multithreaded workloads usually communicate and synchronize on wait-queues"
"A Reschedule IPI is a signal for coordination demand!"
→ Delay preemption of an IPI-sender vCPU until a likely-held spinlock is released (see the sketch below)
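A minimal sketch of this delayed-preemption rule (Resched-DP) under the stated assumptions: a vCPU that has just sent a reschedule IPI is likely inside a wait-queue spinlock, so the scheduler refrains from preempting it for a short, bounded window. Helper names are hypothetical; the 500us value is the utslice chosen later in the talk.

#include <stdbool.h>
#include <stdint.h>

#define UTSLICE_NS 500000ULL   /* 500us urgent time slice */

struct vcpu {
    uint64_t dp_until_ns;      /* do not preempt before this time */
};

/* Called when the VMM observes a reschedule IPI sent by 'sender'. */
void on_resched_ipi_sent(struct vcpu *sender, uint64_t now_ns)
{
    sender->dp_until_ns = now_ns + UTSLICE_NS;
}

/* Consulted by the scheduler on every preemption attempt; because the
 * delay is short and bounded, inter-VM fairness is preserved. */
bool may_preempt(struct vcpu *curr, uint64_t now_ns)
{
    return now_ns >= curr->dp_until_ns;
}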
19. Outline
• Motivation
• Coordination in time domain
  • Kernel-level coordination demands
  • User-level coordination demands
• Coordination in space domain
  • Load-conscious balance scheduling
• Evaluation
20. vCPU-to-pCPU Assignment
• Balance scheduling [Sukwong11]
• Spreading sibling vCPUs on different pCPUs
• Increases the likelihood of coscheduling
• No coordination in the time domain
[Figure: uncoordinated scheduling allows vCPU stacking; balance scheduling avoids stacking, raising the likelihood of coscheduling]
21. vCPU-to-pCPU Assignment
• Balance scheduling [Sukwong11]
• Limitation: assumes "global CPU loads are well balanced"
• In practice, VMs with fair CPU shares can have different numbers of vCPUs (e.g., an SMP VM vs. a UP VM with x4 shares) and different thread-level parallelism (TLP)
  • Inactive vCPUs: single-threaded vs. multithreaded workloads
  • TLP can change over time within a multithreaded app
[Figure: CPU usage (%) over time for canneal and dedup, showing varying TLP]
• Balance scheduling on imbalanced loads leads to high scheduling latency
22. Proposed Scheme
• Load-conscious balance scheduling
• Adaptive scheme based on pCPU loads: when assigning a vCPU, check the pCPU loads
• If load is balanced: balance scheduling
• If load is imbalanced: favor underloaded pCPUs (a pCPU is overloaded when its CPU load > avg. CPU load); any resulting vCPU stacking is handled by coordination in the time domain
23. Outline
• Motivation
• Coordination in time domain
  • Kernel-level coordination demands
  • User-level coordination demands
• Coordination in space domain
  • Load-conscious balance scheduling
• Evaluation
24. Evaluation
• Implementation
• Based on Linux KVM and CFS
• Evaluation
• Effective time slice for coscheduling & delayed preemption: 500us, decided by sensitivity analysis
• Performance improvement
• Alternative: OS re-engineering
25. Evaluation
• SMP VM with UP VMs
• One 8-vCPU VM + four 1-vCPU VMs (x264)
[Figure: normalized execution time of the 8-vCPU VM's workloads under Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co]
• Futex-intensive workloads: 5-53% improvement
• TLB-intensive workloads: 20-90% improvement
• Non-synchronization-intensive workloads: little change; balance scheduling suffers from high scheduling latency
LC-Balance: load-conscious balance scheduling; Resched-DP: delayed preemption for reschedule IPI; TLB-Co: coscheduling for TLB shootdown IPI
26. Alternative: OS Re-engineering
• Virtualization-friendly re-engineering
• Decoupling reschedule IPI transmission from thread wake-up

wake_up(queue) {
  spin_lock(queue->lock)
  thread = dequeue(queue)
  wake_up(thread)          /* reschedule IPI deferred */
  spin_unlock(queue->lock)
}
→ Delayed reschedule IPI transmission, after the unlock

• Modified wake_up functions using a per-cpu bitmap, applied to futex_wake & futex_requeue
• Workload: one 8-vCPU VM + four 1-vCPU VMs (x264)
[Figure: normalized execution time of facesim and streamcluster under Baseline, Baseline w/ DelayedResched, LC_Balance, LC_Balance w/ DelayedResched, and LC_Balance w/ Resched-DP]
A delayed reschedule IPI is virtualization-friendly and resolves LHP problems (see the sketch below).
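A minimal sketch of the re-engineered wake-up, under these assumptions: the queue/lock helpers are hypothetical, and the slide's per-cpu bitmap is simplified to a local word. The point is that the IPI is only recorded inside the critical section and transmitted after spin_unlock(), so an LHP no longer traps waiters behind a long critical section.

#include <stdint.h>

struct thread;
struct wait_queue;

void spin_lock(struct wait_queue *q);
void spin_unlock(struct wait_queue *q);
struct thread *dequeue(struct wait_queue *q);
int make_runnable(struct thread *t);     /* returns target CPU, sends no IPI */
void send_resched_ipi(int cpu);

void wake_up_delayed(struct wait_queue *q)
{
    uint64_t pending = 0;                /* stands in for the per-cpu bitmap */

    spin_lock(q);
    struct thread *t = dequeue(q);
    pending |= 1ULL << make_runnable(t); /* defer the reschedule IPI */
    spin_unlock(q);

    /* The critical section is over: send the deferred IPIs now. */
    while (pending) {
        int cpu = __builtin_ctzll(pending);
        pending &= ~(1ULL << cpu);
        send_resched_ipi(cpu);
    }
}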
27. Conclusions & Future Work
• Demand-based coordinated scheduling
• IPI as an effective signal for coordination
• pCPU assignment conscious of dynamic CPU loads
• Limitation: cannot cover ALL types of synchronization demands (e.g., kernel spinlock contention w/o VMM intervention)
• Future work: cooperation with HW (e.g., PLE) & paravirtualization
30. User-Level Coordination Demands
• Coscheduling-friendly workloads
• SPMD, bulk-synchronous, etc.
• Busy-waiting synchronization: "spin-then-block"
[Figure: four threads crossing barriers - coscheduling yields balanced execution, while uncoordinated scheduling yields skewed execution and extra wake-ups at each barrier]
• More blocking operations when uncoordinated
31. User-Level Coordination Demands
• Coscheduling
• Avoids blocking, which is more expensive in a VM: VMExits for CPU yielding and wake-up (halt (HLT) and reschedule IPI)
• When to coschedule? User-level synchronization involves reschedule IPIs
[Figure: reschedule IPI traffic of streamcluster, spiking at each barrier phase]
"A Reschedule IPI is a signal for coordination demand!"
→ Co-schedule IPI-recipient vCPUs with a sender vCPU; a knob selectively enables this coscheduling for coscheduling-friendly VMs
32. Urgent vCPU First (UVF) Scheduling
• Urgent vCPU
• 1. Preemptively scheduled if fairness is kept
• 2. Protected from preemption once scheduled, during the "urgent time slice (utslice)"
[Figure: per-pCPU urgent queue (FIFO order) served ahead of the runqueue (proportional-shares order); if inter-VM fairness is kept, urgent vCPUs are coscheduled and protected from preemption]
A sketch of the selection logic follows.
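A minimal sketch of the UVF pick-next decision under the rules above; the queue and fairness helpers are hypothetical stand-ins for the KVM/CFS integration.

#include <stdint.h>

#define UTSLICE_NS 500000ULL

struct queue;
struct vcpu { uint64_t protect_until_ns; };
struct pcpu { struct queue *urgent_q, *runq; };

int queue_empty(struct queue *q);
struct vcpu *queue_head(struct queue *q);              /* peek, FIFO order */
void queue_remove(struct queue *q, struct vcpu *v);
struct vcpu *pick_proportional_share(struct queue *q); /* e.g., CFS order */
int within_fair_share(struct vcpu *v);                 /* inter-VM fairness kept? */

struct vcpu *pick_next_vcpu(struct pcpu *p, uint64_t now_ns)
{
    if (!queue_empty(p->urgent_q)) {
        struct vcpu *v = queue_head(p->urgent_q);
        if (within_fair_share(v)) {
            queue_remove(p->urgent_q, v);
            /* Urgent vCPU runs first and is shielded from preemption
             * for one urgent time slice. */
            v->protect_until_ns = now_ns + UTSLICE_NS;
            return v;
        }
        /* Fairness would be violated: fall back to normal order. */
    }
    return pick_proportional_share(p->runq);
}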
33. Proposed Scheme
• Load-conscious balance scheduling
• Adaptive scheme based on pCPU loads
• Balanced loads: balance scheduling
• Imbalanced loads: favoring underloaded pCPUs
• Example: a vCPU is assigned from the candidate pCPU set {pCPU0, pCPU1, pCPU2, pCPU3}; pCPU3 is overloaded (its CPU load > avg. CPU load), so the scheduler assigns the lowest-loaded pCPU among the remaining candidates. Any resulting vCPU stacking is handled by coordination in the time domain (UVF scheduling), as sketched below.
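A minimal sketch of this assignment rule in C; the load bookkeeping and sibling tracking (has_sibling) are hypothetical, but the selection follows the slide: prefer sibling-free, non-overloaded pCPUs, pick the lowest-loaded one, and only stack when no candidate remains.

struct vm;
struct vcpu { struct vm *vm; };
struct pcpu { long load; };

int has_sibling(const struct pcpu *p, const struct vm *vm); /* sibling vCPU present? */

static long avg_load(const struct pcpu *p, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += p[i].load;
    return sum / n;
}

/* Returns the index of the pCPU chosen for vCPU v. */
int assign_pcpu(const struct vcpu *v, const struct pcpu pcpus[], int n)
{
    long avg = avg_load(pcpus, n);
    int best = -1, fallback = 0;

    for (int i = 0; i < n; i++) {
        if (pcpus[i].load < pcpus[fallback].load)
            fallback = i;               /* lowest-loaded pCPU overall */
        if (has_sibling(&pcpus[i], v->vm))
            continue;                   /* balance: avoid vCPU stacking */
        if (pcpus[i].load > avg)
            continue;                   /* load-conscious: skip overloaded */
        if (best < 0 || pcpus[i].load < pcpus[best].load)
            best = i;
    }
    /* No underloaded sibling-free pCPU: allow stacking on the
     * lowest-loaded pCPU; UVF in the time domain compensates. */
    return best >= 0 ? best : fallback;
}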
34. Evaluation
• Urgent time slice (utslice)
• 1. utslice for reducing LHP
• 2. utslice for quickly serving multiple urgent vCPUs
[Figure: # of futex-queue LHPs vs. utslice (0-1000us) for bodytrack, facesim, and streamcluster]
• Workloads: a futex-intensive workload in one VM + dedup in another VM as a preempting VM
• A utslice >300us yields a 2x-3.8x LHP reduction
• Remaining LHPs occur during local wake-up or before reschedule IPI transmission, and are unlikely to lead to lock contention
35. Evaluation
• Urgent time slice (utslice)
• 1. utslice for reducing LHP
• 2. utslice for quickly serving multiple urgent vCPUs
[Figure: spinlock cycles (%), TLB cycles (%), and average execution time vs. utslice (100-5000us)]
• Workloads: 3 VMs, each running vips (a TLB-IPI-intensive application)
• As utslice increases, TLB shootdown cycles increase (~11% degradation at the largest utslice)
• 500us is an appropriate utslice for both LHP reduction and quickly serving multiple urgent vCPUs
36. Evaluation
• Urgent allowance
• Improving overall efficiency while keeping fairness
[Figure: spinlock cycles (%), TLB cycles (%), and slowdown vs. urgent allowance (No UVF, 0-24 msec)]
• Workloads: a vips (TLB-IPI-intensive) VM + two facesim VMs
• Efficient TLB synchronization with no performance drop
37. Evaluation
• Impact of kernel-level coordination on co-running VMs
• One 8-vCPU VM + four 1-vCPU VMs (x264)
[Figure: normalized execution time of the 1-vCPU VMs under Baseline, Balance, LC-Balance, LC-Balance+Resched-DP, and LC-Balance+Resched-DP+TLB-Co]
• Balance scheduling causes unfair contention: up to 26% degradation of the 1-vCPU VMs
LC-Balance: load-conscious balance scheduling; Resched-DP: delayed preemption for reschedule IPI; TLB-Co: coscheduling for TLB shootdown IPI
38. Evaluation: Two SMP VMs
[Figure: execution timelines, solorun vs. corun, with dedup and with freqmine as co-runners]
• a: baseline, b: balance, c: LC-balance, d: LC-balance+Resched-DP, e: LC-balance+Resched-DP+TLB-Co
39. Evaluation
• Effectiveness with the HW-assisted feature
• A CPU feature to reduce the amount of busy-waiting: VMExit in response to excessive busy-waiting (Intel Pause-Loop-Exiting (PLE), AMD Pause Filter)
• Inevitable cost of some busy-waiting and VMExit: on LHP, the vCPU executes PAUSE until a threshold is reached, then a VMExit yields the CPU
[Figure: TLB cycles (%), spinlock cycles (%), and normalized execution time for streamcluster (futex-intensive) and ferret (TLB-IPI-intensive) under Baseline, LC_Balance, and LC_Balance w/ UVF]

Apps:                                  streamcluster   facesim   ferret   vips
Reduction in pause-loop VMExits (%):   44.5            97.7      74.0     37.9