Virtualization allows multiple virtual machines to run on a single physical machine. It builds on hardware advances such as multi-core CPUs and networking improvements. Virtualization works by emulating hardware, by trapping privileged instructions and emulating them, by dynamic binary translation, or by paravirtualization, where the guest OS is aware that it is virtualized. I/O virtualization can emulate devices, use paravirtualized drivers, or assign devices directly to VMs. Together, these techniques enable server consolidation and efficient resource utilization in cloud computing.
The Xen Project supports some of the biggest clouds in production today and is moving into new industries, like security and automotive. Usually, you will use Xen indirectly, as part of a commercial product, a distro, or a hosting or cloud service. By following this session you will learn how Xen and virtualization work under the hood, exploring topics ranging from high-level architecture concepts to more technical attributes of the hypervisor, such as memory management (ballooning), virtual CPUs, scheduling, pinning, and saving/restoring and migrating VMs.
XPDDS18: LCC18: Xen Project: After 15 years, What's Next? - George Dunlap, C... (The Linux Foundation)
The Xen Hypervisor is 15 years old, but like Linux, it is still undergoing significant upgrades and improvements. This talk will cover recent and upcoming developments in Xen on the x86 architecture, including the newly-released 'PVH' guest virtualization mode, the future of PV mode, qemu deprivileging, and more. We will cover why these new features are important for a wide range of environments, from cloud to embedded.
Note: also see https://www.slideshare.net/xen_com_mgr/ossna18-xen-beginners-training-exercise-script
Introducing the F9 microkernel, a new open-source implementation built from scratch that brings modern kernel techniques, derived from L4 microkernel designs, to deeply embedded devices.
:: https://github.com/f9micro
Characteristics of F9 microkernel
– Efficiency: performance + power consumption
– Security: memory protection + isolated execution
– Flexible development environment
The need for immediate responsiveness of VMs in virtualized environments has been on the rise. Several services in SKT also require soft real-time support for virtual machines to substitute for physical machines and achieve high utilization and adaptability. However, consolidating multiple OSes and irregular external events may cause the hypervisor to infringe on a VM's promptness. As a solution to this problem, we are improving Xen's credit scheduler by introducing RT_PRIORITY, which guarantees that a VM runs at any given point in time as long as it has credits remaining to burn. This would increase the quality of service and make a VM's behavior predictable in a consolidated environment. In addition, we extend our suggestion to multi-core environments and even to a large number of physical machines by using live migration.
What do “Crazy in Love” by Beyonce and the “Xen Project” have in common? They are both 15-year-old hits. Flash forward to today. The Xen Project is used by more than 10 million users, powers some of the largest clouds on the planet, and is starting to build momentum in embedded and safety-conscious market segments. The Xen Project played a key role in developing technologies outside of the hypervisor, like hardware virtualization, and open source security disclosure standards that impact entire industries.
The Xen Project’s success and longevity can be attributed to its flexible architecture, but more importantly to enabling community members to contribute ideas and code, even if they are not core to the project's main use case. We will share how the project has supported new technologies and ideas (sometimes failures, sometimes wins) and derive best practices that may help other projects.
System Device Tree is an ongoing effort to expand the scope of Device Tree to describe and configure modern heterogeneous SoCs, including multiple CPU clusters, their views of the system, and the software running on them. System Device Tree comes with Lopper, an open-source Python tool that reads a System Device Tree and produces one traditional Device Tree for each software execution domain.
The System Device Tree specification has progressed significantly in the last year. This presentation will provide an update on the latest developments, such as the new bindings for the description and configuration of bus firewalls. The talk will deep-dive into Lopper and its flexible plugin architecture, and explain how to use it with System Device Tree today. If time allows, some common System Device Tree and Lopper use cases will be demonstrated.
Securing Your Cloud With the Xen Hypervisor by Russell Pavlicek (buildacloud)
The Xen Project produces a mature, enterprise-grade virtualization technology designed for the cloud, featuring many advanced and unique security features. For this reason, it is a hypervisor of choice for government agencies like the NSA and the DoD, as well as for new security-minded projects like the QubesOS secure desktop. However, while much of the security of Xen is inherent in its design, many of the advanced security features, such as stub domains, driver domains, and Xen Security Modules (XSM), are not enabled by default. This session will describe many of the advanced security features of Xen and explain why Xen is an excellent choice for secure clouds.
Xen has been very successful on servers, and yet there are substantial areas where Xen can evolve further. In this talk Jun will discuss a compelling area where Xen technologies can be applied: mobile virtualization. Using Android as an example, the talk will explore two types of usage models, 1) Android as a guest and 2) Android as the host, showing the benefits of using Xen technologies.
We implement link virtualization based on Xen. Link virtualization is a basic building block for network virtualization that allows the coexistence of different Internet protocols. To minimize virtualization overhead, we use SR-IOV with the Intel 82576.
The lies we tell our code, LinuxCon/CloudOpen 2015-08-18 (Casey Bisson)
As presented at LinuxCon/CloudOpen 2015: http://sched.co/3Y3v
We tell our code lies from development to deployment. The most common of these lies start with the simple act of launching a virtual machine. These lies are critical to our applications. Some of them protect applications from themselves and each other, and some even improve performance. Some, however, decrease performance, and others create barriers to simply getting things done.
We lie about the systems, networks, storage, RAM, CPU and other resources our applications use, but how we tell those lies is critical to how the applications that depend on them perform. Joyent's Casey Bisson will explore the lies we tell our code and demonstrate examples of how they sometimes help and hurt us.
Making Clouds: Turning OpenNebula into a Product (NETWAYS)
What does it take to bring innovations like private clouds to small and medium enterprises? In this talk we will present our experience in creating a self-service toolkit for building a complete virtualization and cloud platform based on OpenNebula, as well as the experience gathered in tens of installations of all sizes. From scalable storage (with benchmarks!) to autonomic optimization, we will present what in our view is needed to bring private clouds to everyone, what components and additions we created to better solve our customers' problems (from replacing industrial control systems to medium-scale virtual desktop infrastructures), and why OpenNebula was chosen over other competing cloud toolkits.
Bio:
Carlo Daffara is the technical director of Cloudweavers and formerly head of research and development at Conecta, a consulting firm specializing in open source systems and distributed computing; he is the Italian member of the European Working Group on Libre Software and co-coordinator of the working group on SMEs of the EU ICT task force on competitiveness. Since 1999 he has worked as an evaluator for IST programme submissions in the fields of component-based software engineering, GRIDs, and international cooperation. He is coordinator of the open source platforms technical area of the IEEE technical committee on scalable computing, co-chair of the SIENA EU cloud initiative roadmap editorial board, and part of the editorial review board of the International Journal of Open Source Software & Processes (IJOSSP).
What is an operating system structure?
Operating systems have complex structures, so we want a clear structure that lets us adapt an operating system to our particular needs. It is easier to create an operating system in pieces, much as we break larger issues down into smaller, more manageable subproblems, with each segment being a well-defined part of the operating system. Operating system structure can be thought of as the strategy for connecting and incorporating the various operating system components within the kernel. Operating systems are implemented using many types of structures, as discussed below:
SIMPLE STRUCTURE
This is the most straightforward operating system structure, but it lacks definition and is only appropriate for small and restricted systems. Since the interfaces and levels of functionality in this structure are not well separated, application programs are able to access I/O routines directly, which may result in unauthorized access to I/O procedures.
The Lies We Tell Our Code (#seascale 2015-04-22) - Casey Bisson
Slides as presented at http://www.meetup.com/Seattle-Scalability-Meetup/events/219709036/. Video from that meetup is on YouTube, https://www.youtube.com/watch?v=LtPS2z_c2v4.
1. INSE 6620 (Cloud Computing Security and Privacy)
Cloud Computing 101
Prof. Lingyu Wang
2. The Big Picture
[Layer diagram: cloud applications (data-intensive, compute-intensive, storage-intensive) at the top; a services interface built on web services, SOA, and WS standards; virtualization (bare-metal hypervisors running VM0, VM1, ..., VMn) and storage models (S3, BigTable, BlobStore, ...) below; all resting on multi-core architectures, 64-bit processors, and network bandwidth.]
Source: Ramamurthy et al., Cloud Computing: Concepts, Technologies and Business Implications
3. Enabling Technologies
Cloud computing relies on:
1. Hardware advancements
2. Web x.0 technologies
3. Virtualization
4. Distributed file systems
Slides 3-11 are partially based on: Li et al., Chapter 3 Enabling Technologies, in Spatial Cloud Computing: A Practical Approach, edited by Yang et al., CRC Press, pp. 31-46.
4. Hardware Advancements: Multi-core
The single-core, multi-thread computing model was unable to meet intensive computing demand
Multi-core CPUs were first used in the late 1990s
Characterized by low electricity consumption, efficient space utilization, and favorable performance
Help cloud providers build energy-efficient, high-performance data centers
Enable virtualization and multi-tenancy
5. Hardware Advancements: Networking
Cloud computing provides services in a multi-tenant environment where the network serves as the "glue"
[Diagram: a wide-area network connects clients to the intra-cloud network; inside the cloud, an "elastic" compute cluster of virtual instances sits alongside storage services (Blob, Table, Queue).]
Source: Li et al., CloudCmp: Comparing Public Cloud Providers, IMC
6. Storage/Smart Devices
Fast-developing storage technologies meet the storage needs of cloud computing
Smart devices accelerate the development of cloud computing by enriching its access channels for cloud consumers
8. Web X.0: the Evolution of Web
9. Web x.0: Web Services
A web service is a software system designed to support interoperable machine-to-machine interaction over a network
SOAP-based web services:
Web Services Description Language (WSDL)
Simple Object Access Protocol (SOAP)
XML is extensively used
RESTful web services:
Retrieve information through simple HTTP methods such as GET, POST, PUT, and DELETE
E.g., Google APIs, Yahoo APIs
10. Service-Oriented Architecture (SOA)
A service-based component model for developing software in the form of interoperable services
Benefits of using SOA:
Component reuse
Existing system integration
Language and platform independence
11. Web x.0: Cloud Computing and SOA
Cloud computing, to a large extent, leverages the concepts of SOA, especially in the SaaS and PaaS layers.
They have different emphases:
-- SOA is an architecture focusing on the question of "how to develop applications".
-- Cloud computing is an infrastructure emphasizing the solution of "how to deliver applications".
13. What Is Virtualization?
"Creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resources."
E.g., Windows and Linux on the same laptop
How is it different from dual-boot? The OSes run at the same time and are completely isolated from each other
Slides 13-34 are partially based on: Alex Landau, Virtualization Technologies, IBM Haifa Research Lab
14. We've Been Doing It for Decades!
Indeed, an OS already provides isolation between processes:
Each has its own virtual memory
Controlled access to I/O devices (disk, network) via system calls
A process scheduler decides which process runs on which CPU core
So why a virtual "machine"?
Try running Microsoft Exchange (requiring Windows) and some applications requiring Linux simultaneously on the same box!
Or better yet, try to persuade Boeing and Airbus to run their processes side-by-side on one server
Psychological effect: what sounds better?
"You're given your own virtual machine and you're root there: do whatever you want"
"You can run certain processes, but you don't get root; call our helpdesk with your configuration requests and we'll get back to you in 5 business days..."
15. Benefits
Decoupling hardware and software leads to many benefits:
Server consolidation: running web/app/DB servers on the same machine without losing robustness; electricity savings, room space savings...
Easier backup/restore/upgrade/provisioning
Easier testing (e.g., firewalls)
Making IaaS possible
16. Two Types of Hypervisors
Definitions:
A hypervisor (or VMM, Virtual Machine Monitor) is a software layer that allows several virtual machines to run on a physical machine
The physical OS and hardware are called the Host
The virtual machine OS and applications are called the Guest
Type 1 (bare-metal): the hypervisor runs directly on the hardware and hosts the guest VMs (VM1, VM2, ...)
Examples: VMware ESX, Microsoft Hyper-V, Xen
Type 2 (hosted): the hypervisor runs on top of a host OS, next to native guest processes
Examples: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
17. Bare-Metal or Hosted?
Bare-metal:
Has complete control over the hardware
Doesn't have to "fight" an OS
Hosted:
Avoids code duplication: no need to write a process scheduler or a memory management system; the OS already does that
Can run native processes alongside VMs
Familiar environment: how much CPU and memory does a VM take? Use top! How big is the virtual disk? ls -l
Easy management: stop a VM? Sure, just kill it!
A combination:
Mostly hosted, but some parts live inside the OS kernel for performance reasons
E.g., KVM
18. How to Run a VM? Emulate!
Do whatever the CPU does, but in software:
Fetch the next instruction
Decode: is it an ADD, a XOR, a MOV?
Execute, using the emulated registers and memory
Example:
addl %ebx, %eax
is emulated as:
enum {EAX=0, EBX=1, ECX=2, EDX=3, …};
unsigned long regs[8];
regs[EAX] += regs[EBX];
Pro: Simple!
Con: Slooooooooow
Example hypervisor: BOCHS
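To make the fetch-decode-execute idea concrete, here is a minimal, self-contained sketch of an emulator loop in C, extending the slide's own snippet. The one-byte instruction set (OP_ADD, OP_HLT) is invented for illustration; a real emulator such as BOCHS decodes genuine x86 instructions.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { EAX, EBX, ECX, EDX, NREGS = 8 };
enum { OP_ADD = 0x01, OP_HLT = 0xF4 };   /* hypothetical opcodes */

static unsigned long regs[NREGS];        /* emulated registers       */
static uint8_t guest_mem[1 << 16];       /* emulated guest memory    */
static uint32_t ip;                      /* emulated instruction ptr */

static void run(void)
{
    for (;;) {
        uint8_t op = guest_mem[ip++];            /* fetch  */
        switch (op) {                            /* decode */
        case OP_ADD: {                           /* execute: add src into dst */
            uint8_t dst = guest_mem[ip++];
            uint8_t src = guest_mem[ip++];
            regs[dst] += regs[src];              /* e.g., addl %ebx, %eax */
            break;
        }
        case OP_HLT:
            return;
        default:
            fprintf(stderr, "illegal opcode 0x%02x\n", op);
            return;
        }
    }
}

int main(void)
{
    uint8_t prog[] = { OP_ADD, EAX, EBX, OP_HLT };
    memcpy(guest_mem, prog, sizeof prog);
    regs[EAX] = 1;
    regs[EBX] = 2;
    run();
    printf("eax = %lu\n", regs[EAX]);            /* prints: eax = 3 */
    return 0;
}

Every guest instruction costs dozens of host instructions here, which is exactly why pure emulation is so slow.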
19. How to Run a VM? Trap and Emulate!
Run the VM directly on the CPU: no emulation!
Most of the code can execute just fine
E.g., addl %ebx, %eax
Some code needs hypervisor intervention:
int $0x80
movl something, %cr3
I/O
Trap and emulate it!
E.g., if the guest runs int $0x80, trap it and execute the guest's interrupt 0x80 handler
20. Trap and Emulate Model
Traditional OS:
When an application invokes a system call:
The CPU traps to the interrupt handler vector in the OS
The CPU switches to kernel mode (Ring 0) and executes OS instructions
On a hardware event:
The hardware interrupts CPU execution and jumps to the interrupt handler in the OS
21. Trap and Emulate Model Cont'd
VMM and guest OS:
System call:
The CPU traps to the interrupt handler vector of the VMM
The VMM jumps back into the guest OS
Hardware interrupt:
The hardware makes the CPU trap to the interrupt handler of the VMM
The VMM jumps to the corresponding interrupt handler of the guest OS
Privileged instruction:
Running privileged instructions in the guest OS traps to the VMM for instruction emulation
After emulation, the VMM jumps back to the guest OS
22. Trap and Emulate Model Cont'd
Pro:
Performance!
Cons:
Harder to implement
Needs hardware support: not all "sensitive" instructions cause a trap when executed in user mode
E.g., POPF, which may be used to clear the interrupt flag (IF): this instruction does not trap, but the value of IF does not change!
The hardware support is called VMX (Intel) or SVM (AMD) and exists in modern CPUs
Example hypervisor: KVM
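The hypervisor's side of trap-and-emulate is essentially a loop: run the guest natively until a sensitive instruction traps, handle the exit, and resume. The sketch below is hypothetical and heavily simplified; the struct, exit codes, and helper names are invented, loosely echoing the structure of a VMX/SVM-based hypervisor such as KVM.

struct vcpu { unsigned long regs[16]; int exit_reason; };

enum { EXIT_SYSCALL, EXIT_CR_WRITE, EXIT_IO, EXIT_HLT };

/* Stubs standing in for real hardware-assisted entry and emulation. */
static void enter_guest(struct vcpu *v) { v->exit_reason = EXIT_HLT; }
static void emulate_cr_write(struct vcpu *v)  { (void)v; }
static void emulate_io(struct vcpu *v)        { (void)v; }
static void inject_guest_interrupt(struct vcpu *v, int vector)
{ (void)v; (void)vector; }

void run_vcpu(struct vcpu *v)
{
    for (;;) {
        enter_guest(v);               /* guest code runs natively on the CPU */
        switch (v->exit_reason) {     /* a sensitive instruction trapped */
        case EXIT_SYSCALL:            /* e.g., int $0x80 */
            inject_guest_interrupt(v, 0x80);  /* run the guest's own handler */
            break;
        case EXIT_CR_WRITE:           /* e.g., movl something, %cr3 */
            emulate_cr_write(v);      /* validate and apply on the guest's behalf */
            break;
        case EXIT_IO:
            emulate_io(v);
            break;
        case EXIT_HLT:
            return;                   /* guest halted */
        }
    }
}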
23. Dynamic (Binary) Translation
Take a block of binary VM code that is about to be executed
Translate it on the fly to "safe" code (like JIT, just-in-time compilation)
Execute the new "safe" code directly on the CPU
Translation rules?
Most code translates identically (e.g., movl %eax, %ebx translates to itself)
"Sensitive" operations are translated into "hypercalls"
Hypercall: a call into the hypervisor to ask for service
Implemented as trapping instructions (unlike POPF)
24. Dynamic (Binary) Translation Cont'd
Pros:
No hardware support required
Performance: better than emulation
Cons:
Performance: worse than trap-and-emulate
Hard to implement
Example hypervisors: VMware, QEMU
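A minimal sketch of the translation step, reusing a hypothetical one-byte instruction encoding: innocuous bytes are copied verbatim into a code cache, while a sensitive, non-trapping instruction (POPF in the slide's example) is rewritten into one that does trap into the hypervisor. The opcodes and the block-ending convention are invented for illustration.

#include <stdint.h>
#include <stddef.h>

enum { OP_POPF = 0x9D, OP_TRAP = 0xCC, OP_RET = 0xC3 };  /* illustrative */

/* Translate one basic block of guest code into the code cache;
 * returns the number of bytes emitted. */
size_t translate_block(const uint8_t *guest, uint8_t *cache)
{
    size_t g = 0, c = 0;
    for (;;) {
        uint8_t op = guest[g++];
        if (op == OP_POPF)
            cache[c++] = OP_TRAP;   /* sensitive: rewrite into a hypercall */
        else
            cache[c++] = op;        /* most code translates identically */
        if (op == OP_RET)
            return c;               /* block ends; run the cache directly */
    }
}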
25. How to Run a VM? Paravirtualization!
Requires a modified guest OS that "knows" it is running on top of a hypervisor
E.g., instead of executing cli to turn off interrupts, the guest OS should do hypercall(DISABLE_INTERRUPTS)
26. How to Run a VM? Paravirtualization!
Pros:
No hardware support required
Performance: better than emulation
Con:
Requires a specifically modified guest
The same guest OS cannot run both in the VM and on bare metal
Example hypervisor: Xen
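A sketch of what "the guest knows" looks like in code. The hypercall number is invented, and int $0x82 is used only as an illustrative trapping instruction (early x86 Xen PV guests did use a software interrupt for hypercalls; later versions use a hypercall page); only the shape of the idea matters here.

#define HYPERCALL_DISABLE_INTERRUPTS 1   /* illustrative numbering */

static inline long hypercall1(long nr)
{
    long ret;
    /* A deliberately trapping instruction carries the request
     * down to the hypervisor. */
    __asm__ volatile("int $0x82" : "=a"(ret) : "a"(nr) : "memory");
    return ret;
}

void guest_disable_interrupts(void)
{
    /* An unmodified guest would execute: cli
     * The paravirtualized guest instead asks the hypervisor: */
    hypercall1(HYPERCALL_DISABLE_INTERRUPTS);
}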
27. I/O Virtualization
Types of I/O:
Block (e.g., hard disk)
Network
Input (e.g., keyboard, mouse)
Sound
Video
Most performance-critical (for servers):
Network
Block
28. I/O Virtualization Models
[Diagram comparing three models:]
Monolithic model: device drivers and I/O services live inside the hypervisor; guest VMs (VM0 ... VMn) share the devices.
Pros: higher performance; I/O device sharing; VM migration. Con: larger hypervisor.
Pass-through model: devices are assigned directly to the guest VMs, which run their own device drivers.
Pros: highest performance; smaller hypervisor; device-assisted sharing. Con: migration challenges.
Service VM model: device drivers and I/O services run in dedicated service VMs that serve the guest VMs over shared devices.
Pros: high security; I/O device sharing; VM migration. Con: lower performance.
29. How Does a NIC Driver Work?
Transmit path:
The OS prepares the packet to transmit in a buffer in memory
The driver writes the start address of the buffer to register X of the NIC
The driver writes the length of the buffer to register Y
The driver writes '1' (GO!) into register T
The NIC reads the packet from memory addresses [X, X+Y) and sends it on the wire
The NIC sends an interrupt to the host (TX complete, next packet please)
Receive path:
The driver prepares a buffer to receive a packet into
The driver writes the start address of the buffer to register X
The driver writes the length of the buffer to register Y
The driver writes '1' (READY-TO-RECEIVE) into register R
When a packet arrives, the NIC copies it into memory at [X, X+Y)
The NIC interrupts the host (RX)
The OS processes the packet (e.g., wakes the waiting process up)
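The transmit path above, written as a driver would see it: the NIC's registers appear as memory-mapped I/O, and the 'GO!' write is the doorbell. The register offsets are the slide's hypothetical X, Y, and T; a real driver follows the NIC's datasheet.

#include <stdint.h>

#define REG_X 0   /* buffer start address      */
#define REG_Y 1   /* buffer length             */
#define REG_T 2   /* doorbell: write 1 for GO! */

static volatile uint32_t *nic_regs;   /* mapped MMIO window of the NIC */

void nic_transmit(const void *buf, uint32_t len)
{
    nic_regs[REG_X] = (uint32_t)(uintptr_t)buf;  /* start address -> X */
    nic_regs[REG_Y] = len;                       /* length        -> Y */
    nic_regs[REG_T] = 1;                         /* GO!           -> T */
    /* The NIC now DMAs [X, X+Y) onto the wire and raises a
     * TX-complete interrupt when it is done. */
}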
30. I/O Virtualization? Emulate!
The hypervisor implements a virtual NIC (to the specification of a real NIC, e.g., Intel, Realtek, Broadcom)
The NIC registers (X, Y, Z, T, R, ...) are just variables in hypervisor (host) memory
If the guest writes '1' to register T, the hypervisor reads the buffer from memory [X, X+Y) and passes it to the physical NIC driver for transmission
When the physical NIC interrupts (TX complete), the hypervisor injects a TX-complete interrupt into the guest
Similar for the receive path
31. I/O Virtualization? Emulate!
Pro:
Unmodified guest (the guest already has drivers for Intel NICs...)
Cons:
Slow: every access to every NIC register causes a VM exit (a trap to the hypervisor)
The hypervisor needs to emulate complex hardware
Example hypervisors: QEMU, KVM, VMware (without VMware Tools)
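On the hypervisor side, those same registers are just variables, and each trapped guest write lands in a handler like the hedged sketch below. guest_to_host(), phys_nic_tx(), and inject_irq() are hypothetical helpers standing in for address translation, the real driver, and interrupt injection.

#include <stdint.h>

struct vnic { uint32_t x, y, t; };   /* emulated registers in host memory */

/* Hypothetical stubs: translate a guest address, transmit via the real
 * NIC driver, and inject an interrupt into the guest. */
static void *guest_to_host(uint32_t guest_addr) { (void)guest_addr; return 0; }
static void  phys_nic_tx(void *buf, uint32_t len) { (void)buf; (void)len; }
static void  inject_irq(int vector) { (void)vector; }

/* Called on every VM exit caused by a guest write to the virtual NIC. */
void vnic_mmio_write(struct vnic *n, int reg, uint32_t val)
{
    switch (reg) {
    case 0: n->x = val; break;                  /* buffer address */
    case 1: n->y = val; break;                  /* buffer length  */
    case 2:                                     /* doorbell T     */
        n->t = val;
        if (val == 1) {
            phys_nic_tx(guest_to_host(n->x), n->y);  /* forward packet */
            inject_irq(11);   /* TX-complete; the vector is illustrative */
        }
        break;
    }
}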
32. I/O Virtualization? Paravirtualize!
Add a virtual NIC driver to the guest OS (frontend)
Implement the virtual NIC in the hypervisor (backend)
Everything works just like in the emulation case...
...except the protocol between frontend and backend
Protocol in the emulation case:
The guest writes registers X and Y, waits at least 3 nanoseconds, and writes to register T
The hypervisor infers that the guest wants to transmit a packet
Paravirtual protocol:
The guest does a hypercall, passing the start address and length as arguments
The hypervisor knows what it should do
33. I/O Virtualization? Paravirtualize!
Pro: fast, since there is no need to emulate a physical device
Con: requires a guest driver
Example hypervisors: QEMU, KVM, VMware (with VMware Tools), Xen
How is paravirtual I/O different from a paravirtual guest?
A paravirtual guest requires modifying the whole OS
Try doing that on Windows (without source code), or even on Linux (lots of changes)
Paravirtual I/O requires adding only a single driver to a guest
Easy to do on both Windows and Linux guests
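The contrast between the two protocols, sketched with hypothetical names: the emulated path forces the hypervisor to infer intent from individual register writes, while the paravirtual frontend states its intent in one hypercall.

#define HYPERCALL_NET_TX 2   /* illustrative hypercall number */

static long hypercall3(long nr, long a1, long a2)
{ (void)nr; (void)a1; (void)a2; return 0; /* traps to the hypervisor in real life */ }

/* Frontend driver inside the guest: one trap per packet. */
void pv_nic_transmit(const void *buf, unsigned long len)
{
    hypercall3(HYPERCALL_NET_TX, (long)buf, (long)len);
}

/* Backend in the hypervisor, dispatched on HYPERCALL_NET_TX: no
 * guessing, the arguments say exactly what the guest wants. */
void handle_net_tx(void *guest_buf, unsigned long len)
{
    /* translate guest_buf, hand it to the physical NIC driver,
     * then inject a TX-complete notification into the guest */
    (void)guest_buf; (void)len;
}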
34. Direct Access / Direct Assignment
"Pull" the NIC out of the host and "plug" it into the guest
The guest is allowed to access NIC registers directly, with no hypervisor intervention
The host can't access the NIC anymore
Pro: As fast as possible!
Cons:
Need a NIC per guest, plus one for the host
Can't do "cool stuff": encapsulating guest packets, monitoring or modifying them at the hypervisor level
Example hypervisors: KVM, Xen, VMware
35. Xen
The University of Cambridge Computer Laboratory developed the first versions of Xen
The Xen community develops and maintains Xen as free and open-source software (GPL)
Xen is currently available for the IA-32, x86-64, and ARM instruction sets
(Original) target: 100 virtual OSes per machine
Slides 35-48 are partially based on: Barham et al., Xen and the Art of Virtualization, SOSP'03
36. Xen: Approach Overview
Conventional approach: full virtualization
The guest cannot access the hardware
Problematic for certain privileged instructions (e.g., traps)
No real-time guarantees
Xen: paravirtualization
Provides some exposure to the underlying hardware
Better performance
Needs modifications to the OS
No modifications to applications
37. TLB (Translation Lookaside Buffer)
A hardware cache containing parts of the page table
Translates virtual into real addresses
A TLB "miss" causes an expensive page-table walk
The TLB must be flushed when context switching
The minimum cost on a Pentium 4 to change the TLB is 516 cycles (184ns)
http://www.mega-tokyo.com/osfaq2/index.php/Context%20Switching
Thus, Xen avoids context switching on system calls for performance reasons
38. Memory Management
Depends on what the hardware supports:
A software-managed TLB can be easily virtualized
A tagged TLB allows the coexistence of OSes and avoids TLB flushing across OS boundaries
x86 has no software-managed or tagged TLB:
Xen lives in the top 64MB of every address space to avoid TLB flushes when a guest enters/exits Xen
Each OS can only map memory it owns
Writes are validated by Xen
39. CPU
x86 supports 4 privilege levels
Xen downgrades the privilege of guest OSes
System-call and page-fault handlers are registered with Xen
"Fast handlers" serve most exceptions; Xen isn't involved
I/O: Xen exposes a set of simple device abstractions
I/O data is transferred to and from the guest via Xen, using shared memory
Efficient, while allowing Xen to perform validation
40. The Cost of Porting an OS to Xen
<2% of the code base:
Privileged instructions
Page table access
Network driver
Block device driver
41. Control Management
Domain0 (a special guest) hosts the application-level management software
Creation and deletion of other guests and of their processor, memory, virtual network interfaces, and block devices
Exposed through an interface to application-level management software
42. Control Transfer
Hypercalls: synchronous calls from a guest to Xen
A software trap to perform a privileged operation
Analogous to system calls
E.g., page-table update requests
Events: asynchronous notifications from Xen to guests
Replace device interrupts with lightweight notification
E.g., a guest termination request, or new data received over the network
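A hedged sketch of both control-transfer directions. The hypercall number and the shared pending-event bitmap are illustrative, loosely in the spirit of Xen's hypercalls and event channels rather than their real ABI.

#include <stdint.h>

/* Guest -> Xen: synchronous hypercall, e.g., a page-table update request */
#define HYPERCALL_MMU_UPDATE 3                        /* illustrative */
static long hypercall2(long nr, long a1, long a2)
{ (void)nr; (void)a1; (void)a2; return 0; /* a software trap in real life */ }

void guest_update_pte(unsigned long pte_addr, unsigned long new_val)
{
    hypercall2(HYPERCALL_MMU_UPDATE, (long)pte_addr, (long)new_val);
}

/* Xen -> guest: asynchronous events posted in shared memory */
struct shared_info { uint64_t pending_events; };      /* one bit per port */

void guest_handle_events(struct shared_info *s)
{
    uint64_t pending = __atomic_exchange_n(&s->pending_events, 0,
                                           __ATOMIC_ACQ_REL);
    for (int port = 0; port < 64; port++)
        if (pending & (1ull << port)) {
            /* dispatch this port's handler, e.g., "new network data" */
        }
}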
43. Data Transfer: I/O Rings
[Figure: a descriptor ring shared between a guest and Xen, with request and response producer/consumer pointers.]
E.g., requests for received packets
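A sketch of the shared-memory descriptor ring the figure depicts: the guest produces requests and consumes responses, and the backend does the reverse. Field names and sizes here are invented; Xen's real ring macros live in its public io/ring.h header.

#include <stdint.h>

#define RING_SIZE 32   /* power of two; illustrative */

struct req { uint64_t addr; uint32_t len; uint32_t id; };
struct rsp { uint32_t id; int32_t status; };

struct io_ring {
    volatile uint32_t req_prod, req_cons;   /* request  indices */
    volatile uint32_t rsp_prod, rsp_cons;   /* response indices */
    struct req req[RING_SIZE];
    struct rsp rsp[RING_SIZE];
};

/* Guest side: enqueue a request (e.g., "receive a packet into addr"). */
int ring_put_request(struct io_ring *r, struct req q)
{
    if (r->req_prod - r->req_cons == RING_SIZE)
        return -1;                            /* ring is full */
    r->req[r->req_prod % RING_SIZE] = q;
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* publish entry before index */
    r->req_prod++;
    return 0;
}

/* Backend side: dequeue the next request, if any. */
int ring_get_request(struct io_ring *r, struct req *out)
{
    if (r->req_cons == r->req_prod)
        return -1;                            /* ring is empty */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    *out = r->req[r->req_cons % RING_SIZE];
    r->req_cons++;
    return 0;
}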
44. Network
A virtual firewall-router is attached to each guest
Virtual NICs have two I/O rings and rules
E.g., rules preventing IP source spoofing or incoming connection attempts
To send a packet, enqueue a buffer descriptor into the transmit I/O ring
A domain needs to exchange an unused page frame for each received packet
Uses DMA (zero copy)
Avoids copying packets between Xen and the guest
45. Disk
Only Domain0 has direct access to disks
Other guests need to use virtual block devices
Use the I/O ring
The guest OS will typically reorder requests prior to enqueuing them on the ring
Xen will also reorder requests to improve performance, since it knows more about the real disk layout
Uses DMA (zero copy)
46. Evaluation
Dell 2650 dual-processor 2.4 GHz Xeon server
2GB RAM
3 Gb Ethernet NICs
1 Hitachi DK32eJ 146 GB 10k RPM SCSI disk
Linux 2.4.21
49. Live Migration of Virtual Machines
Move a running virtual machine from one host to another with no perceived downtime
The VM is not aware of the migration
TCP connections of the guest OS are maintained
The VM is treated as a black box
How is Live Migration (LM) different from Quick Migration (QM)?
QM: the VM is saved and restored on the destination
QM: results in downtime for applications/workloads running inside the VMs
50. Use Cases
Patching or hardware servicing:
Migrate VMs to temporary hosts and migrate back after the original hosts are patched/upgraded
Load balancing:
Migrate VMs to hosts with less load
Server consolidation:
Migrate VMs to a few hosts during off-peak hours and shut down the other hosts to reduce power consumption
51. Methodology
Three phases:
Push: the source VM continues running while pages are copied over
Stop-and-copy: stop the source VM, copy the remaining state, start the new VM
Pull: the new VM copies what remains, on demand
Possible approaches:
Pure stop-and-copy
Pure demand-migration
Pre-copy
Slides 52-56 are partially based on: Tewari et al., From Zero to Live Migration
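The pre-copy approach, which slides 52-56 walk through pictorially, fits in a short loop. All helpers and thresholds below are hypothetical; real implementations add bandwidth limits and dirty-rate heuristics.

#include <stddef.h>

#define MAX_ROUNDS      30
#define DIRTY_THRESHOLD 64   /* pages; illustrative stopping condition */

/* Hypothetical primitives provided by the hypervisor. */
extern void   copy_all_pages(void);
extern size_t get_and_clear_dirty_set(unsigned long *bitmap);
extern void   copy_dirty_pages(const unsigned long *bitmap);
extern void   pause_vm(void);
extern void   send_cpu_and_device_state(void);
extern void   resume_on_destination(void);

void live_migrate(void)
{
    unsigned long bitmap[4096];

    copy_all_pages();                   /* round 0: full copy, VM still runs */

    for (int round = 0; round < MAX_ROUNDS; round++) {
        size_t dirty = get_and_clear_dirty_set(bitmap);
        if (dirty < DIRTY_THRESHOLD)
            break;                      /* working set small enough to stop */
        copy_dirty_pages(bitmap);       /* recopy pages dirtied meanwhile */
    }

    pause_vm();                         /* short stop-and-copy phase... */
    get_and_clear_dirty_set(bitmap);
    copy_dirty_pages(bitmap);           /* ...moves only the small remainder */
    send_cpu_and_device_state();
    resume_on_destination();            /* perceived downtime ends here */
}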
52. Memory Copy: Full Copy
[Figure: the VM is pre-staged on the destination; memory content is copied to the new server while storage (VHD) remains on the SAN.]
The first initial copy transfers all in-memory content
53. Memory Copy: Dirty Pages
[Figure: the client keeps accessing the VM, so pages are being dirtied during the copy.]
The client continues to access the VM, which results in memory being modified
54. Memory Copy: Incremental Copy
[Figure: each round recopies a smaller set of changes.]
Transfer the content of the VM's memory to the destination host
Track the pages modified by the VM and retransfer those pages
55. Live Migration: Final Transition
[Figure: the partition state is copied to the destination.]
Save the register and device state of the VM on the source host
Transfer the saved state and storage ownership to the destination host
Restore the VM from the saved state on the destination host
56. Post-Transition: Clean-up
[Figure: the client is directed to the new host; the old VM is deleted once the migration is verified successful.]
An ARP is issued to have routing devices update their tables
Since session state is maintained, no reconnections are necessary