Disco: Running Commodity Operating Systems on Scalable Multiprocessors

1. Disco: Running Commodity Operating Systems on Scalable Multiprocessors
   Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997
   Presented by Divya Parekh

2. Outline
   - Virtualization
   - Disco description
   - Disco performance
   - Discussion

3. Virtualization
   "a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource appear to function as multiple logical resources; or it can include making multiple physical resources appear as a single logical resource"

4. Old Idea from the 1960s
   - IBM VM/370: a VMM for IBM mainframes
     - Multiple OS environments on expensive hardware
     - Desirable when machines were scarce
   - Popular research idea in the 1960s and 1970s
     - Entire conferences on virtual machine monitors
     - Hardware, VMM, and OS designed together
   - Interest died out in the 1980s and 1990s
     - Hardware got cheaper
     - Operating systems got more powerful (e.g. multi-user)

5. A Return to Virtual Machines
   - Disco: Stanford research project (SOSP '97)
     - Run commodity OSes on scalable multiprocessors
     - Focus on the high end: NUMA, MIPS, IRIX
   - Commercial virtual machines for the x86 architecture
     - VMware Workstation (now EMC) (1999-)
     - Connectix VirtualPC (now Microsoft)
   - Research virtual machines for the x86 architecture
     - Xen (SOSP '03)
     - plex86
   - OS-level virtualization
     - FreeBSD Jails, User-mode Linux, UMLinux

6. Overview
   - Virtual machine: "a fully protected and isolated copy of the underlying physical machine's hardware" (definition by IBM)
   - Virtual machine monitor: a thin layer of software that sits between the hardware and the operating system, virtualizing and managing all hardware resources
     - Also known as a "hypervisor"

7. Classification of Virtual Machines (figure)

8. Classification of Virtual Machines
   - Type I
     - The VMM is implemented directly on the physical hardware.
     - The VMM performs the scheduling and allocation of the system's resources.
     - E.g. IBM VM/370, Disco, VMware ESX Server, Xen
   - Type II
     - VMMs are built completely on top of a host OS.
     - The host OS provides resource allocation and a standard execution environment to each "guest OS."
     - E.g. User-mode Linux (UML), UMLinux

9. Non-Virtualizable Architectures
   - According to Popek and Goldberg, "an architecture is virtualizable if the set of sensitive instructions is a subset of the set of privileged instructions."
   - x86: several instructions can read system state at CPL 3 without trapping
   - MIPS: KSEG0 bypasses the TLB and reads physical memory directly
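The Popek and Goldberg criterion on slide 9 is a simple subset test. A minimal sketch, where the instruction sets are small illustrative samples rather than complete ISA models (the second, cleanly virtualizable ISA is hypothetical):

```python
# Popek & Goldberg: an architecture is (classically) virtualizable when
# every sensitive instruction (one that reads or alters privileged
# state) is also privileged (i.e. traps when executed in user mode).
def is_virtualizable(sensitive, privileged):
    return set(sensitive) <= set(privileged)

# Illustrative x86 (pre-VT) sample: POPF silently ignores the interrupt
# flag at CPL 3 instead of trapping, so it is sensitive but unprivileged.
x86_sensitive  = {"popf", "sgdt", "lgdt", "mov_cr3"}
x86_privileged = {"lgdt", "mov_cr3"}
print(is_virtualizable(x86_sensitive, x86_privileged))    # False

# Hypothetical clean ISA where every sensitive operation traps.
clean_sensitive  = {"set_page_table", "disable_interrupts"}
clean_privileged = {"set_page_table", "disable_interrupts", "halt"}
print(is_virtualizable(clean_sensitive, clean_privileged))  # True
```

This is exactly why both x86 (unprivileged sensitive instructions) and MIPS (KSEG0's untrapped physical access) fail the classical test and force a VMM to work around the architecture.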
10. Type I, contd.: Hardware Support for Virtualization
    Figure: the hardware support approach to x86 virtualization
    E.g. Intel Vanderpool/VT and AMD-V/SVM

11. Type I, contd.: Full Virtualization
    Figure: the binary translation approach to x86 virtualization
    E.g. VMware ESX Server

12. Type I, contd.: Paravirtualization
    Figure: the paravirtualization approach to x86 virtualization
    E.g. Xen

13. Type II: Hosted VM Architecture
    E.g. VMware Workstation, Connectix VirtualPC
14. Disco: VMM Prototype
    Goals:
    - Extend a modern OS to run efficiently on shared-memory multiprocessors without large changes to the OS.
    - A VMM built to run multiple copies of the Silicon Graphics IRIX operating system on a Stanford FLASH shared-memory multiprocessor.

15. Problem Description
    - Multiprocessors on the market (1990s): innovative hardware
    - Hardware faster than system software
      - Customized OSes are late, incompatible, and possibly buggy
    - Commodity OSes not suited for multiprocessors
      - Do not scale, because of lock contention and the memory architecture
      - Do not isolate or contain faults: more processors, more failures

16. Solutions to the Problems
    - Resource-intensive modification of the OS (hard and time-consuming, increases its size, etc.)
    - Put a virtual machine monitor (software) between the OS and the hardware to resolve the problem

17. Two Opposite Ways for System Software
    Address these challenges in the operating system: OS-intensive
    - Hive, Hurricane, Cellular IRIX, etc.
    - Innovative, single system image
    - But a large effort
    Hard-partition the machine into independent failure units: OS-light
    - Sun Enterprise 10000 machine
    - Partial single system image
    - Cannot dynamically adapt the partitioning

18. Return to Virtual Machine Monitors
    - A compromise between OS-intensive and OS-light: the VMM
    - Virtual machine monitors, in combination with commodity and specialized operating systems, form a flexible system-software solution for these machines
    - Disco was introduced to allow trading off performance cost against development cost
19. Architecture of Disco (figure)

20. Advantages of This Approach
    - Scalability
    - Flexibility
    - Hides NUMA effects
    - Fault containment
    - Compatibility with legacy applications

21. Challenges Facing Virtual Machines
    - Overheads
      - Trap and emulate privileged instructions of the guest OS
      - Access to I/O devices
      - Replication of memory in each VM
    - Resource management
      - Lack of information to make good policy decisions
    - Communication and sharing
      - Stand-alone VMs cannot communicate

22. Disco's Interface
    - Processors
      - MIPS R10000 processor
      - Emulates all instructions, the MMU, and the trap architecture
      - Extensions to support common processor operations: enabling/disabling interrupts, accessing privileged registers
    - Physical memory
      - Contiguous, starting at address 0
    - I/O devices
      - Virtualizes devices such as disks and network interfaces, exclusive to each VM
      - Physical devices are multiplexed by Disco
      - Special abstractions for SCSI disks and network interfaces: virtual disks for VMs, and a virtual subnet across all virtual machines

23. Disco Implementation
    - Multithreaded shared-memory program
    - Attention to NUMA memory placement, cache-aware data structures, and IPC patterns
    - The code segment of Disco is copied to each FLASH processor for locality
    - Virtual machines communicate using shared memory
24. Virtual CPUs
    - Direct execution: running the virtual CPU on the real CPU
      - Sets the real machine's registers to the virtual CPU's
      - Jumps to the current PC of the virtual CPU; direct execution on the real CPU
    - Challenges
      - Detection and fast emulation of operations that cannot be safely exported to the virtual machine
      - Privileged instructions such as TLB modification, and direct access to physical memory and I/O devices
    - Maintains a data structure for each virtual CPU for trap emulation
    - A scheduler multiplexes virtual CPUs on real processors
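The trap-emulation side of slide 24 can be sketched as follows: the guest runs unprivileged, and each privileged instruction traps into the monitor, which applies it to a per-vCPU copy of the privileged state instead of the real hardware. All names, instruction mnemonics, and register fields here are illustrative, not Disco's actual data structures:

```python
# Sketch of trap-and-emulate: privileged guest instructions trap into
# the monitor, which emulates them against the virtual CPU's saved
# privileged state rather than the real machine's.
class VirtualCPU:
    def __init__(self):
        # Emulated privileged registers (illustrative subset).
        self.regs = {"status": 0x1, "entryhi": 0}

    def trap(self, instr, operand=None):
        # Dispatch on the trapped instruction and update only the
        # virtual CPU's state, never the physical processor's.
        if instr == "disable_interrupts":
            self.regs["status"] &= ~0x1
        elif instr == "write_tlb_entryhi":
            self.regs["entryhi"] = operand
        else:
            raise NotImplementedError(instr)

vcpu = VirtualCPU()
vcpu.trap("disable_interrupts")
vcpu.trap("write_tlb_entryhi", 0x1234)
print(vcpu.regs)   # {'status': 0, 'entryhi': 4660}
```

Unprivileged instructions never enter this path; they run directly on the real CPU, which is why direct execution keeps the common case fast.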
25. Virtual Physical Memory
    - Adds address translation and maintains a physical-to-machine address (40-bit) mapping
    - Virtual machines use physical addresses
    - Uses the software-reloaded translation lookaside buffer (TLB) of the MIPS processor
    - Maintains a pmap data structure for each VM, containing one entry for each physical-to-virtual mapping
    - The pmap also has a back pointer to the virtual address, to help invalidate mappings in the TLB
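The extra level of indirection on slide 25 is the key trick: what the guest believes are physical addresses are translated once more into machine addresses through a per-VM map. A minimal sketch, with illustrative names and a flat dict standing in for the real pmap structure:

```python
# Sketch of Disco-style physical-to-machine translation: each VM's
# "physical" page numbers are remapped to real machine page numbers.
PAGE = 4096

class VM:
    def __init__(self, pmap):
        # pmap: guest physical page number -> machine page number
        self.pmap = pmap

    def physical_to_machine(self, paddr):
        ppn, offset = divmod(paddr, PAGE)
        return self.pmap[ppn] * PAGE + offset

# Guest physical page 0 actually lives in machine page 7.
vm = VM(pmap={0: 7, 1: 3})
print(hex(vm.physical_to_machine(0x0010)))   # 0x7010
```

Because MIPS reloads the TLB in software, Disco can install the already-composed virtual-to-machine translation on each TLB miss, so this extra level costs nothing on ordinary memory accesses.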
26. Contd.
    - Kernel-mode references on MIPS processors access memory and I/O directly; the OS code and data need to be relinked to a mapped address space
    - MIPS tags each TLB entry with an address space identifier (ASID)
    - ASIDs are not virtualized, so the TLB must be flushed on VM context switches
    - Increased TLB misses in workloads, due to
      - Additional operating system references
      - VM context switches
    - TLB misses are expensive, so Disco adds a second-level software TLB. Idea similar to a cache?
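The second-level software TLB mentioned on slide 26 is indeed cache-like: since the hardware TLB is flushed on every VM switch, the monitor keeps a larger software cache of recent translations and refills the hardware TLB from it instead of re-deriving each mapping. A sketch under those assumptions, with illustrative names and a dict standing in for the guest's page mappings:

```python
# Sketch of a second-level software TLB: translations survive hardware
# TLB flushes in this per-monitor cache, so a miss after a VM switch
# avoids the full (slow) mapping lookup.
class L2TLB:
    def __init__(self):
        self.cache = {}        # (asid, virtual page) -> machine frame
        self.slow_lookups = 0  # counts full page-table walks

    def translate(self, asid, vpn, page_table):
        key = (asid, vpn)
        if key not in self.cache:
            self.slow_lookups += 1          # true miss: walk guest mappings
            self.cache[key] = page_table[vpn]
        return self.cache[key]              # hit: refill hardware TLB cheaply

tlb = L2TLB()
guest_mappings = {0x40: 9}
tlb.translate(asid=1, vpn=0x40, page_table=guest_mappings)
tlb.translate(asid=1, vpn=0x40, page_table=guest_mappings)  # second call hits
print(tlb.slow_lookups)   # 1
```

Tagging entries with an ASID-like key is what lets translations for several VMs coexist in the software TLB even though the hardware ASIDs themselves are not virtualized.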
27. NUMA Memory Management
    - Cache misses should be satisfied from local memory (fast) rather than remote memory (slow)
    - Dynamic page migration and replication
      - Pages frequently accessed by one node are migrated
      - Read-shared pages are replicated among the nodes
      - Write-shared pages are not moved, since maintaining consistency requires remote access anyway
      - The migration and replication policy is driven by the cache-miss-counting facility provided by the FLASH hardware
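The three-way policy on slide 27 (migrate, replicate, or leave alone) can be sketched as a decision over per-node miss counts. The threshold value and all names are illustrative; the real policy in Disco is driven by FLASH's hardware miss counters and is more elaborate:

```python
# Sketch of the NUMA placement decision: hot pages with one dominant
# accessor are migrated; hot read-shared pages are replicated;
# write-shared pages stay put, since moving them cannot avoid the
# remote coherence traffic anyway.
THRESHOLD = 100   # illustrative "hot page" miss count

def place_page(miss_counts, writable):
    """miss_counts: node id -> cache misses on this page."""
    hot_nodes = [n for n, c in miss_counts.items() if c > THRESHOLD]
    if not hot_nodes:
        return ("leave", None)
    if len(hot_nodes) == 1:
        return ("migrate", hot_nodes[0])   # single dominant accessor
    if not writable:
        return ("replicate", hot_nodes)    # read-shared: one copy per node
    return ("leave", None)                 # write-shared: migration won't help

print(place_page({0: 500, 1: 2}, writable=True))      # ('migrate', 0)
print(place_page({0: 500, 1: 400}, writable=False))   # ('replicate', [0, 1])
```

Doing this in the monitor is what lets a NUMA-unaware guest OS see uniformly fast memory.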
28. Transparent Page Replication
    1. Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy.
    2. memmap tracks which virtual pages reference each physical page; it is used during TLB shootdown.

29. Disco Memory Management (figure)

30. Virtual I/O Devices
    - Disco intercepts all device accesses from the virtual machine and forwards them to the physical devices
    - Special device drivers are added to the guest OS
    - Disco devices provide a monitor-call interface to pass all the arguments in a single trap
    - A single VM accessing a device does not require virtualizing the I/O; Disco only needs to ensure exclusivity
31. Copy-on-Write Disks
    - Intercepts DMA requests to translate physical addresses into machine addresses
    - Maps the machine page read-only into the destination address page of the DMA, sharing machine memory
    - Attempts to modify a shared page result in a copy-on-write fault handled internally by the monitor
      - Logs of modifications are maintained for each VM
      - Modifications are made in main memory
    - Non-persistent disks are copy-on-write shared
      - E.g. kernel text and the buffer cache
      - E.g. file system root disks
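The copy-on-write sharing on slide 31 can be sketched as follows: VMs reading the same disk blocks share one read-only machine page, and the first write faults into the monitor, which gives the writer a private copy. Names and structures are illustrative, not Disco's implementation:

```python
# Sketch of copy-on-write disk pages: reads are served from a shared
# read-only page until a VM writes, at which point the monitor gives
# that VM its own private copy, leaving other VMs' view unchanged.
class COWPages:
    def __init__(self, disk_pages):
        self.shared = dict(disk_pages)   # page number -> bytes, read-only
        self.private = {}                # (vm, page) -> that VM's copy

    def read(self, vm, page):
        # A VM sees its private copy if it has written, else the shared one.
        return self.private.get((vm, page), self.shared[page])

    def write(self, vm, page, data):
        # Copy-on-write fault: the modification lands in a private page.
        self.private[(vm, page)] = data

cow = COWPages({0: b"kernel text"})
cow.write("vm1", 0, b"patched text")
print(cow.read("vm1", 0))   # b'patched text'
print(cow.read("vm2", 0))   # b'kernel text'
```

This is how booting many IRIX instances from the same virtual disk keeps only one machine copy of the kernel text and shared buffer-cache pages.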
32. Transparent Sharing of Pages
    Creates a global buffer cache shared across VMs and reduces the memory footprint of the system

33. Virtual Network Interface
    - The virtual subnet and network interface use copy-on-write mappings to share read-only pages
    - Persistent disks can be accessed using the standard NFS protocol
    - Provides a global buffer cache that is transparently shared by independent VMs

34. Transparent Sharing of Pages over NFS
    1. The monitor's networking device remaps the data page from the source's machine address space to the destination's.
    2. The monitor remaps the data page from the driver's mbuf to the client's buffer cache.

35. Modifications to the IRIX 5.3 OS
    - Minor changes to the kernel code and data segment, specific to MIPS
      - Relocate the unmapped segment of the virtual machine into the mapped supervisor segment of the processor (kernel relocation)
    - Disco's drivers are the same as the original IRIX device drivers
    - Patched the HAL to use memory loads/stores instead of privileged instructions

36. Modifications to the IRIX 5.3 OS
    - Added code to the HAL to pass hints to the monitor for resource management
    - New monitor calls to the MMU to request zeroed pages and to reclaim unused memory
    - Changed mbuf management to be page-aligned
    - Changed bcopy to use remap (with copy-on-write)

37. SPLASHOS: A Specialized OS
    - A thin, specialized library OS, supported directly by Disco
    - No need for a virtual memory subsystem, since the applications share the address space
    - Used for parallel scientific applications that can span the entire machine
38. Disco: Performance
    Experimental setup:
    - Disco targets the FLASH machine, which was not available at the time
    - Used SimOS, a machine simulator that models the hardware of MIPS-based multiprocessors, to run the Disco monitor
    - The simulator was too slow to allow long workloads to be studied

39. Disco: Performance
    Workloads (table)

40. Disco: Performance
    Execution overhead:
    - Pmake's overhead is due to I/O virtualization; the others' is due to TLB mapping
    - Reduction of kernel time
    - On average, a virtualization overhead of 3% to 16%

41. Disco: Performance
    Memory overheads:
    - V: Pmake memory used if there is no sharing
    - M: Pmake memory used if there is sharing

42. Disco: Performance
    Scalability:
    - Partitioning the problem into different VMs increases scalability
    - Kernel synchronization time becomes smaller

43. Disco: Performance
    Dynamic page migration and replication (figure)
44. Conclusion
    - The Disco VMM hides NUMA-ness from a non-NUMA-aware OS
    - The Disco VMM is low(er) effort
    - Moderate overhead due to virtualization

45. Discussion
    Was the Disco VMM done right?
    - Virtual physical memory on architectures other than MIPS; the MIPS TLB is software-managed
    - Not sure how well other OSes would perform on Disco, since IRIX was designed for MIPS
    - Not sure how Hive and Hurricane perform comparatively
    - Performance of long workloads on the system
    - Performance of heterogeneous VMs, e.g. the Pmake case

46. Discussion
    Are VMMs microkernels done right?
