Disco: Running Commodity
Operating Systems on Scalable
Multiprocessors

    Edouard Bugnion, Scott Devine, Mendel Rosenblum,
               Stanford University, 1997


               Presented by Divya Parekh



Outline
   Virtualization
   Disco description
   Disco performance
   Discussion




Virtualization
   “a technique for hiding the physical
    characteristics of computing resources from the
    way in which other systems, applications, or end
    users interact with those resources. This includes
    making a single physical resource appear to
    function as multiple logical resources; or it can
    include making multiple physical resources
    appear as a single logical resource”



Old idea from the 1960s
   IBM VM/370: a VMM for IBM mainframes
          Multiple OS environments on expensive hardware
          Desirable when few machines were around
   Popular research idea in the 1960s and 1970s
          Entire conferences on virtual machine monitors
          Hardware/VMM/OS designed together
   Interest died out in the 1980s and 1990s
          Hardware got cheaper
          Operating systems got more powerful (e.g. multi-user)

A Return to Virtual Machines
   Disco: Stanford research project (SOSP ’97)
       Run commodity OSes on scalable multiprocessors
       Focus on high-end: NUMA, MIPS, IRIX
   Commercial virtual machines for x86 architecture
       VMware Workstation (now EMC) (1999-)
       Connectix VirtualPC (now Microsoft)
   Research virtual machines for x86 architecture
       Xen (SOSP ’03)
       plex86
   OS-level virtualization
       FreeBSD Jails, User-mode Linux, UMLinux

Overview
   Virtual Machine
       “A fully protected and isolated copy of the underlying
        physical machine’s hardware.” (definition by IBM)
   Virtual Machine Monitor
       A thin layer of software that sits between the hardware
        and the operating system, virtualizing and managing
        all hardware resources.
       Also known as a “hypervisor”

Classification of Virtual
Machines
   Type I
          VMM is implemented directly on the physical hardware.
           VMM performs the scheduling and allocation of the
           system’s resources.
           IBM VM/370, Disco, VMware’s ESX Server, Xen
   Type II
          VMMs are built completely on top of a host OS.
           The host OS provides resource allocation and standard
           execution environment to each “guest OS.”
           User-mode Linux (UML), UMLinux


Non-Virtualizable Architectures
   According to Popek and Goldberg,
     “an architecture is virtualizable if the set of
        sensitive instructions is a subset of the set of
        privileged instructions.”
   x86
       Several sensitive instructions can read system state
        at CPL 3 (user mode) without trapping
   MIPS
       KSEG0 bypasses the TLB and accesses physical memory
        directly

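The Popek and Goldberg condition is a plain subset test, which can be sketched as below. The instruction sets are a small, hand-picked illustration and not a complete classification of x86; the function names are hypothetical.

```python
def is_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction is also privileged."""
    return set(sensitive) <= set(privileged)


def offending_instructions(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sorted(set(sensitive) - set(privileged))


# Illustrative subset only (assumption): a few classic x86 instructions.
toy_x86_privileged = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
toy_x86_sensitive = {"LGDT", "LIDT", "MOV_TO_CR3", "SGDT", "SIDT", "POPF"}

x86_ok = is_virtualizable(toy_x86_sensitive, toy_x86_privileged)
x86_offenders = offending_instructions(toy_x86_sensitive, toy_x86_privileged)
print(x86_ok, x86_offenders)
```

Any instruction reported as an offender is one a trap-and-emulate monitor never gets a chance to intercept.
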
Type I (contd.)
   Hardware Support for Virtualization
   [Figure: The hardware support approach to x86 virtualization]
   E.g. Intel VT (Vanderpool) and AMD-V (SVM)

Type I (contd.)
   Full Virtualization
   [Figure: The binary translation approach to x86 virtualization]
   E.g. VMware ESX Server

Type I (contd.)
   Paravirtualization
   [Figure: The paravirtualization approach to x86 virtualization]
   E.g. Xen

Type II
   Hosted VM Architecture




        E.g. VMware Workstation, Connectix VirtualPC


Disco: VMM Prototype
   Goals
      Extend a modern OS to run efficiently on shared-memory
       multiprocessors without large changes to the OS.
      A VMM built to run multiple copies of the Silicon
       Graphics IRIX operating system on the Stanford
       FLASH shared-memory multiprocessor.

Problem Description
   Multiprocessors on the market (1990s)
      Innovative hardware
   Hardware evolves faster than system software
      Customized OSes arrive late, are incompatible, and
       are possibly buggy
   Commodity OSes are not suited for multiprocessors
      Do not scale because of lock contention and memory
       architecture
      Do not isolate/contain faults
      More processors -> more failures

Solution to the problems
    Resource-intensive modification of the OS (hard and
     time-consuming, increases OS size, etc.)
    Insert a virtual machine monitor (software) between
     the OS and the hardware to resolve the problem

Two Opposite Approaches for System
Software
   Address these challenges in the operating system:
    OS-intensive
      Hive, Hurricane, Cellular-IRIX, etc.
      Innovative, single system image
      But large effort
   Hard-partition the machine into independent failure units:
    OS-light
      Sun Enterprise 10000 machine
      Partial single system image
      Cannot dynamically adapt the partitioning

Return to Virtual Machine
Monitors
   A compromise between OS-intensive and OS-light: the VMM
   Virtual machine monitors, in combination with
    commodity and specialized operating systems, form a
    flexible system software solution for these machines
   Disco was introduced to allow trading off performance
    cost against development cost.

Architecture of Disco




Advantages of this approach
   Scalability
   Flexibility
   Hide NUMA effect
   Fault Containment
   Compatibility with legacy applications




Challenges Facing Virtual
Machines
   Overheads
         Trap and emulate privileged instructions of the guest OS
         Access to I/O devices
         Replication of memory in each VM
   Resource Management
         Lack of information to make good policy decisions
   Communication and Sharing
         Stand-alone VMs cannot communicate

Disco’s Interface
   Processors
       MIPS R10000 processor
       Emulates all instructions, the MMU, and the trap architecture
       Extensions to support common processor operations
            Enabling/disabling interrupts, accessing privileged registers
   Physical memory
       Contiguous, starting at address 0
   I/O devices
       Each VM sees virtual I/O devices (disks, network interfaces) that appear exclusive to it
       Physical devices multiplexed by Disco
       Special abstractions for SCSI disks and network interfaces
            Virtual disks for VMs
            Virtual subnet across all virtual machines

Disco Implementation
   Multithreaded shared-memory program
   Attention to NUMA memory placement, cache-aware
    data structures, and interprocessor communication patterns
   Code segment of Disco replicated into the local memory of
    each FLASH node for locality
   Communication through shared memory

Virtual CPUs
   Direct execution
       Executes the virtual CPU directly on the real CPU
       Sets the real machine’s registers to those of the virtual CPU
       Jumps to the current PC of the virtual CPU
   Challenges
       Detection and fast emulation of operations that cannot be
        safely exported to the virtual machine: privileged
        instructions such as TLB modification, and direct access to
        physical memory and I/O devices
   Maintains a data structure for each virtual CPU for trap
    emulation
   Scheduler multiplexes virtual CPUs on the real processors

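The direct-execution and trap-emulation split described above can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the opcode names, the `VirtualCPU` fields, and the `run` loop are all hypothetical and are not taken from Disco itself.

```python
from dataclasses import dataclass, field

# Hypothetical opcodes that must not run directly on behalf of a guest.
PRIVILEGED = {"tlb_write", "disable_interrupts", "enable_interrupts"}


class PrivilegedTrap(Exception):
    """Raised when the guest issues a privileged operation."""

    def __init__(self, opcode, operand):
        super().__init__(opcode)
        self.opcode = opcode
        self.operand = operand


@dataclass
class VirtualCPU:
    """Per-virtual-CPU state the monitor keeps for trap emulation."""
    pc: int = 0
    acc: int = 0
    interrupts_enabled: bool = True
    tlb: dict = field(default_factory=dict)
    traps: int = 0


def execute_directly(vcpu, instr):
    """Unprivileged instructions run at full speed; privileged ones trap."""
    opcode, operand = instr
    if opcode in PRIVILEGED:
        raise PrivilegedTrap(opcode, operand)
    if opcode == "add":
        vcpu.acc += operand
    elif opcode != "nop":
        raise ValueError(f"unknown opcode: {opcode}")
    vcpu.pc += 1


def emulate(vcpu, trap):
    """The monitor applies the privileged operation to the virtual state only."""
    vcpu.traps += 1
    if trap.opcode == "tlb_write":
        virt_page, phys_page = trap.operand
        vcpu.tlb[virt_page] = phys_page
    elif trap.opcode == "disable_interrupts":
        vcpu.interrupts_enabled = False
    elif trap.opcode == "enable_interrupts":
        vcpu.interrupts_enabled = True
    vcpu.pc += 1


def run(vcpu, program):
    while vcpu.pc < len(program):
        try:
            execute_directly(vcpu, program[vcpu.pc])
        except PrivilegedTrap as trap:
            emulate(vcpu, trap)
    return vcpu


program = [
    ("add", 2),
    ("disable_interrupts", None),
    ("tlb_write", (7, 3)),
    ("add", 5),
    ("enable_interrupts", None),
]
vcpu = run(VirtualCPU(), program)
print(vcpu)
```

Only three of the five instructions enter the monitor; the rest run without any monitor involvement, which is where direct execution gets its speed.
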
Virtual Physical Memory
   Disco adds a level of address translation and maintains
    physical-to-machine address mappings (machine addresses are 40 bits).
   Virtual machines use physical addresses
   Uses the software-reloaded translation-lookaside buffer (TLB)
    of the MIPS processor
   Maintains a pmap data structure for each VM, with
    one entry for each physical page of the VM
   Each pmap entry also has a back pointer to the virtual address
    that maps it, to help invalidate mappings in the TLB

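A minimal sketch of the pmap idea, assuming a dictionary-based layout: the guest asks for a virtual-to-physical mapping, and the monitor installs virtual-to-machine instead, remembering a back pointer so the TLB entry can be invalidated later. The class and method names are hypothetical, and the page size is chosen only for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


class PMapEntry:
    def __init__(self, machine_page):
        self.machine_page = machine_page
        self.backmaps = set()  # virtual pages currently mapping this physical page


class VM:
    def __init__(self):
        self.pmap = {}  # physical page number -> PMapEntry
        self.tlb = {}   # virtual page number -> machine page number

    def back_page(self, phys_page, machine_page):
        """Monitor assigns a machine page to one of the VM's physical pages."""
        self.pmap[phys_page] = PMapEntry(machine_page)

    def tlb_insert(self, virt_page, phys_page):
        """Guest requests virt -> phys; monitor installs virt -> machine."""
        entry = self.pmap[phys_page]
        self.tlb[virt_page] = entry.machine_page
        entry.backmaps.add(virt_page)

    def translate(self, virt_addr):
        virt_page, offset = divmod(virt_addr, PAGE_SIZE)
        return self.tlb[virt_page] * PAGE_SIZE + offset

    def reclaim(self, phys_page):
        """Monitor takes the page away; backmaps locate the stale TLB entries."""
        entry = self.pmap.pop(phys_page)
        for virt_page in entry.backmaps:
            self.tlb.pop(virt_page, None)
        return entry.machine_page


vm = VM()
vm.back_page(phys_page=0, machine_page=500)
vm.tlb_insert(virt_page=10, phys_page=0)
machine_addr = vm.translate(10 * PAGE_SIZE + 12)
freed_page = vm.reclaim(0)
print(machine_addr, freed_page, vm.tlb)
```
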
Virtual Physical Memory (contd.)
   Kernel-mode references on MIPS processors can access
    memory and I/O directly, so OS code and data need to be
    relinked to a mapped address space
   MIPS tags each TLB entry with an address space
    identifier (ASID)
   ASIDs are not virtualized, so the TLB needs to be flushed on
    VM context switches
   Increased TLB misses in workloads
       Additional operating system references
       VM context switches
   TLB misses are expensive, so Disco adds a second-level software
    TLB that caches recent translations (an idea similar to a cache)

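The second-level software TLB can be pictured as a small cache consulted before the expensive miss path. The sketch below assumes an LRU replacement policy and a `(vm_id, asid, virt_page)` key; both are illustrative choices, not details given on the slides, and `guest_miss_handler` is a made-up stand-in for the slow path.

```python
from collections import OrderedDict


class SoftwareTLB:
    """Small LRU cache of recent virtual-to-machine translations (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key, slow_path):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        machine_page = slow_path(key)      # expensive: full miss handling
        self.entries[key] = machine_page
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return machine_page


def guest_miss_handler(key):
    """Hypothetical slow path; returns a made-up machine page number."""
    vm_id, asid, virt_page = key
    return 1000 + virt_page


l2_tlb = SoftwareTLB(capacity=2)
results = [l2_tlb.lookup((0, 1, page), guest_miss_handler) for page in (4, 5, 4, 6, 5)]
print(results, l2_tlb.hits, l2_tlb.misses)
```

Because the key includes the VM and ASID, entries survive a hardware TLB flush on a VM context switch and can refill the hardware TLB cheaply.
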
NUMA Memory Management
   Cache misses should be satisfied from local memory
    (fast) rather than remote memory (slow)
   Dynamic page migration and replication
       Pages frequently accessed by one node are migrated to that node
       Read-shared pages are replicated to the nodes that access them
       Write-shared pages are not moved, since maintaining consistency
        requires remote access anyway
       Migration and replication policy is driven by the
        cache-miss-counting facility provided by the FLASH hardware

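The three rules above can be written as a small decision function. This is a rough sketch: the threshold value, the per-node counter dictionaries, and the function name are assumptions made for illustration, and the real policy in Disco is more detailed.

```python
def placement_decision(read_misses, write_misses, home_node, hot_threshold=64):
    """Pick an action for one page from per-node cache-miss counts.

    read_misses / write_misses: dict mapping node id -> miss count.
    Returns (action, target_nodes), where action is "migrate",
    "replicate", or "leave". hot_threshold is an assumed tuning knob.
    """
    nodes = set(read_misses) | set(write_misses)
    total = {n: read_misses.get(n, 0) + write_misses.get(n, 0) for n in nodes}
    hot_nodes = sorted(n for n, count in total.items() if count >= hot_threshold)
    if not hot_nodes:
        return ("leave", [])
    if len(hot_nodes) == 1:
        # Page used heavily by a single node: move it there.
        node = hot_nodes[0]
        return ("leave", []) if node == home_node else ("migrate", [node])
    if not any(count > 0 for count in write_misses.values()):
        # Read-shared: give each hot remote node its own copy.
        return ("replicate", [n for n in hot_nodes if n != home_node])
    # Write-shared: moving or copying would not avoid remote accesses.
    return ("leave", [])


single_user = placement_decision({1: 100}, {}, home_node=0)
read_shared = placement_decision({0: 80, 1: 90, 2: 70}, {}, home_node=0)
write_shared = placement_decision({0: 80, 1: 90}, {1: 10}, home_node=0)
cold_page = placement_decision({0: 3}, {}, home_node=0)
print(single_user, read_shared, write_shared, cold_page)
```
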
Transparent Page Replication
1. Two different virtual processors of the same virtual machine logically
   read-share the same physical page, but each virtual processor accesses
   a local copy.
2. memmap tracks which virtual pages reference each machine page.
   It is used during TLB shootdown.

Disco Memory Management




Virtual I/O Devices
   Disco intercepts all device accesses from the virtual
    machine and forwards them to the physical devices
   Special device drivers are added to the guest OS
   Disco devices provide a monitor call interface that passes all
    the arguments in a single trap
   A device accessed by a single VM does not need to be
    virtualized; Disco only needs to assure exclusivity

Copy-on-write Disks
   Disco intercepts DMA requests to translate physical
    addresses into machine addresses.
   If the requested block is already in memory, Disco maps the machine
    page read-only at the DMA destination page, so VMs share machine memory
   Attempts to modify a shared page result in a
    copy-on-write fault handled internally by the monitor.
       A log of modifications is kept for each VM
       Modifications are made in main memory
   Non-persistent disks are shared copy-on-write
       E.g. kernel text and buffer cache
       E.g. file system root disks

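A minimal sketch of the sharing path described above, assuming a toy monitor that keeps a block cache keyed by disk block number. All class, method, and field names are hypothetical; a real monitor works on page tables and DMA descriptors rather than Python dictionaries.

```python
class VM:
    def __init__(self):
        self.mapping = {}  # physical page -> (machine page, writable)


class Monitor:
    def __init__(self):
        self.block_cache = {}  # disk block number -> machine page
        self.pages = {}        # machine page -> contents
        self.next_page = 0
        self.disk_reads = 0

    def _alloc(self, data):
        page = self.next_page
        self.next_page += 1
        self.pages[page] = data
        return page

    def dma_read(self, vm, phys_page, block, disk):
        """Intercepted DMA: reuse the in-memory copy if the block was already read."""
        if block not in self.block_cache:
            self.disk_reads += 1
            self.block_cache[block] = self._alloc(disk[block])
        vm.mapping[phys_page] = (self.block_cache[block], False)  # shared, read-only

    def read(self, vm, phys_page):
        machine_page, _ = vm.mapping[phys_page]
        return self.pages[machine_page]

    def write(self, vm, phys_page, offset, data):
        machine_page, writable = vm.mapping[phys_page]
        if not writable:
            # Copy-on-write fault: give this VM a private copy first.
            machine_page = self._alloc(self.pages[machine_page])
            vm.mapping[phys_page] = (machine_page, True)
        page = bytearray(self.pages[machine_page])
        page[offset:offset + len(data)] = data
        self.pages[machine_page] = bytes(page)


disk = {7: b"kernel text"}
monitor = Monitor()
vm_a, vm_b = VM(), VM()
monitor.dma_read(vm_a, 0, 7, disk)
monitor.dma_read(vm_b, 0, 7, disk)
shared_before_write = vm_a.mapping[0][0] == vm_b.mapping[0][0]
monitor.write(vm_b, 0, 0, b"K")
print(monitor.read(vm_a, 0), monitor.read(vm_b, 0), monitor.disk_reads)
```

The second read never touches the disk, and the write from one VM leaves both the other VM's view and the disk contents unchanged.
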
Transparent Sharing of Pages
 Creates a global buffer cache shared across VMs and reduces the
 memory footprint of the system

Virtual Network Interface
   The virtual subnet and network interface use
    copy-on-write mappings to share read-only pages
   Persistent disks can be accessed using a standard
    distributed file system protocol such as NFS
   Provides a global buffer cache that is transparently
    shared by independent VMs

Transparent sharing of pages
        over NFS
1. The monitor’s networking device remaps the data page from the source’s
   machine address space to the destination’s.
2. The monitor remaps the data page from the driver’s mbuf to the client’s
   buffer cache.

Modifications to the IRIX 5.3
OS
   Minor changes to kernel code and data
    segment, specific to MIPS
       Kernel relocation: relocate the unmapped segment of the
        virtual machine into the mapped supervisor segment of
        the processor
   Disco drivers are the same as the original device
    drivers of IRIX
   Patched the HAL to use memory loads/stores
    instead of privileged instructions

Modifications to the IRIX 5.3
OS (contd.)
   Added code to the HAL to pass hints to the monitor
    for resource management
   New monitor calls in the MMU code to request a zeroed
    page and to reclaim unused memory
   Changed mbuf management to be page-aligned
   Changed bcopy to use remap (with copy-on-write)

SPLASHOS: A specialized OS
   Thin specialized library OS, supported
    directly by Disco
   No need for a virtual memory subsystem, since the
    application and the library OS share an address space
   Used for parallel scientific
    applications that can span the entire
    machine

Disco: Performance
   Experimental Setup
       Disco targets the FLASH machine, which was not
        available at that time
       Used SimOS, a machine simulator that
        models the hardware of MIPS-based
        multiprocessors, to run the Disco monitor.
       The simulator was too slow to allow long
        workloads to be studied

Disco: Performance
   Workloads




Disco: Performance
   Execution Overhead
   Pmake overhead is due to I/O virtualization; the others are due to TLB mapping
   Reduction of kernel time
   Virtualization overhead ranges from 3% to 16%

Disco: Performance
   Memory Overheads
    V: Pmake memory used if there is no sharing
    M: Pmake memory used if there is sharing

Disco: Performance
    Scalability
    Partitioning the problem into different VMs increases scalability.
    Kernel synchronization time becomes smaller.

Disco: Performance
   Dynamic Page Migration and replication




Conclusion
   Disco VMM hides NUMA-ness from non-NUMA-aware OSes
   Disco VMM is low(er) effort
   Moderate overhead due to virtualization

Discussion
   Was the Disco VMM done right?
       Virtual physical memory on architectures other
        than MIPS
            MIPS TLB is software-managed
       Not sure how well other OSes perform on Disco,
        since IRIX was designed for MIPS
       Not sure how Hive and Hurricane perform
        in comparison
       Performance of long workloads on the system
       Performance of heterogeneous VMs, e.g. the Pmake
        case

Discussion
   Are VMMs microkernels done right?

More Related Content

What's hot

Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
vwchu
 
Virtual machines and their architecture
Virtual machines and their architectureVirtual machines and their architecture
Virtual machines and their architecture
Mrinmoy Dalal
 
密かに話題のBufferbloat
密かに話題のBufferbloat密かに話題のBufferbloat
密かに話題のBufferbloat
Kazuhito Ohkawa
 
VMware vSphere 6.0 - Troubleshooting Training - Day 1
VMware vSphere 6.0 - Troubleshooting Training - Day 1VMware vSphere 6.0 - Troubleshooting Training - Day 1
VMware vSphere 6.0 - Troubleshooting Training - Day 1
Sanjeev Kumar
 
GPU Virtualization in SUSE
GPU Virtualization in SUSEGPU Virtualization in SUSE
GPU Virtualization in SUSE
Liang Yan
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
The Linux Foundation
 
Xen & virtualization
Xen & virtualizationXen & virtualization
Xen & virtualization
Susheel Thakur
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisors
Gaurav Suri
 
VMware Interview questions and answers
VMware Interview questions and answersVMware Interview questions and answers
VMware Interview questions and answers
vivaankumar
 
VMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-VVMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-V
David Ramirez
 
Virtual machine subhash gupta
Virtual machine subhash guptaVirtual machine subhash gupta
Virtual machine subhash gupta
Subhash Chandra Gupta
 
Virtualization
VirtualizationVirtualization
Virtualization
Kingston Smiler
 
05.2 virtio introduction
05.2 virtio introduction05.2 virtio introduction
05.2 virtio introduction
zenixls2
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤTakashi Hoshino
 
Server virtualization by VMWare
Server virtualization by VMWareServer virtualization by VMWare
Server virtualization by VMWare
sgurnam73
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
Wan Leung Wong
 
Virtualization
VirtualizationVirtualization
Virtualization
Kumar Harsha
 
Virtual Machines - Virtual Box
Virtual Machines  - Virtual BoxVirtual Machines  - Virtual Box
Virtual Machines - Virtual Box
Lahiru Danushka
 
VMware Log Insight
VMware Log Insight VMware Log Insight
VMware Log Insight
Iwan Rahabok
 
Chapter 07
Chapter 07Chapter 07
Chapter 07
dikochiqa
 

What's hot (20)

Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVMHypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
Hypervisors and Virtualization - VMware, Hyper-V, XenServer, and KVM
 
Virtual machines and their architecture
Virtual machines and their architectureVirtual machines and their architecture
Virtual machines and their architecture
 
密かに話題のBufferbloat
密かに話題のBufferbloat密かに話題のBufferbloat
密かに話題のBufferbloat
 
VMware vSphere 6.0 - Troubleshooting Training - Day 1
VMware vSphere 6.0 - Troubleshooting Training - Day 1VMware vSphere 6.0 - Troubleshooting Training - Day 1
VMware vSphere 6.0 - Troubleshooting Training - Day 1
 
GPU Virtualization in SUSE
GPU Virtualization in SUSEGPU Virtualization in SUSE
GPU Virtualization in SUSE
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
 
Xen & virtualization
Xen & virtualizationXen & virtualization
Xen & virtualization
 
virtualization and hypervisors
virtualization and hypervisorsvirtualization and hypervisors
virtualization and hypervisors
 
VMware Interview questions and answers
VMware Interview questions and answersVMware Interview questions and answers
VMware Interview questions and answers
 
VMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-VVMWARE VS MS-HYPER-V
VMWARE VS MS-HYPER-V
 
Virtual machine subhash gupta
Virtual machine subhash guptaVirtual machine subhash gupta
Virtual machine subhash gupta
 
Virtualization
VirtualizationVirtualization
Virtualization
 
05.2 virtio introduction
05.2 virtio introduction05.2 virtio introduction
05.2 virtio introduction
 
10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ10分で分かるLinuxブロックレイヤ
10分で分かるLinuxブロックレイヤ
 
Server virtualization by VMWare
Server virtualization by VMWareServer virtualization by VMWare
Server virtualization by VMWare
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Virtual Machines - Virtual Box
Virtual Machines  - Virtual BoxVirtual Machines  - Virtual Box
Virtual Machines - Virtual Box
 
VMware Log Insight
VMware Log Insight VMware Log Insight
VMware Log Insight
 
Chapter 07
Chapter 07Chapter 07
Chapter 07
 

Similar to Disco: Running Commodity Operating Systems on Scalable Multiprocessors Disco

virtualization.pptx
virtualization.pptxvirtualization.pptx
virtualization.pptx
ssuser6e6eec
 
Virtualization (Distributed computing)
Virtualization (Distributed computing)Virtualization (Distributed computing)
Virtualization (Distributed computing)
Sri Prasanna
 
Linux virtualization
Linux virtualizationLinux virtualization
Linux virtualization
Google
 
IaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdfIaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdf
DharavathRamesh2
 
Virtual pc
Virtual pcVirtual pc
Cloud Computing Tools
Cloud Computing ToolsCloud Computing Tools
Cloud Computing Tools
Jithin Parakka
 
Operating system Definition Structures
Operating  system Definition  StructuresOperating  system Definition  Structures
Operating system Definition Structures
anair23
 
Vmm concepts
Vmm conceptsVmm concepts
Vmm concepts
Libin M
 
Vmm concepts
Vmm conceptsVmm concepts
Vmm concepts
anilanindian
 
Handout2o
Handout2oHandout2o
Handout2o
Shahbaz Sidhu
 
Virtual Server 2005 Overview Rich McBrine, CISSP
Virtual Server 2005 Overview Rich McBrine, CISSPVirtual Server 2005 Overview Rich McBrine, CISSP
Virtual Server 2005 Overview Rich McBrine, CISSP
webhostingguy
 
Unit II.ppt
Unit II.pptUnit II.ppt
Unit II.ppt
HARISHK762704
 
Live VM Migration
Live VM MigrationLive VM Migration
Live VM Migration
Shivam Singh
 
Parth virt
Parth virtParth virt
Parth virt
Parth Monga
 
PPT
PPTPPT
PPT
butest
 
Virtual Server 2004 Overview
Virtual Server 2004 OverviewVirtual Server 2004 Overview
Virtual Server 2004 Overview
webhostingguy
 
Virtual Server 2004 Overview
Virtual Server 2004 OverviewVirtual Server 2004 Overview
Virtual Server 2004 Overview
webhostingguy
 
Vitualisation
VitualisationVitualisation
Vitualisation
Priya_Srivastava
 
Chapter 5 – Cloud Resource Virtua.docx
Chapter 5 – Cloud Resource                        Virtua.docxChapter 5 – Cloud Resource                        Virtua.docx
Chapter 5 – Cloud Resource Virtua.docx
madlynplamondon
 
Chapter 5 – Cloud Resource Virtua.docx
Chapter 5 – Cloud Resource                        Virtua.docxChapter 5 – Cloud Resource                        Virtua.docx
Chapter 5 – Cloud Resource Virtua.docx
gertrudebellgrove
 

Similar to Disco: Running Commodity Operating Systems on Scalable Multiprocessors Disco (20)

virtualization.pptx
virtualization.pptxvirtualization.pptx
virtualization.pptx
 
Virtualization (Distributed computing)
Virtualization (Distributed computing)Virtualization (Distributed computing)
Virtualization (Distributed computing)
 
Linux virtualization
Linux virtualizationLinux virtualization
Linux virtualization
 
IaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdfIaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdf
 
Virtual pc
Virtual pcVirtual pc
Virtual pc
 
Cloud Computing Tools
Cloud Computing ToolsCloud Computing Tools
Cloud Computing Tools
 
Operating system Definition Structures
Operating  system Definition  StructuresOperating  system Definition  Structures
Operating system Definition Structures
 
Vmm concepts
Vmm conceptsVmm concepts
Vmm concepts
 
Vmm concepts
Vmm conceptsVmm concepts
Vmm concepts
 
Handout2o
Handout2oHandout2o
Handout2o
 
Virtual Server 2005 Overview Rich McBrine, CISSP
Virtual Server 2005 Overview Rich McBrine, CISSPVirtual Server 2005 Overview Rich McBrine, CISSP
Virtual Server 2005 Overview Rich McBrine, CISSP
 
Unit II.ppt
Unit II.pptUnit II.ppt
Unit II.ppt
 
Live VM Migration
Live VM MigrationLive VM Migration
Live VM Migration
 
Parth virt
Parth virtParth virt
Parth virt
 
PPT
PPTPPT
PPT
 
Virtual Server 2004 Overview
Virtual Server 2004 OverviewVirtual Server 2004 Overview
Virtual Server 2004 Overview
 
Virtual Server 2004 Overview
Virtual Server 2004 OverviewVirtual Server 2004 Overview
Virtual Server 2004 Overview
 
Vitualisation
VitualisationVitualisation
Vitualisation
 
Chapter 5 – Cloud Resource Virtua.docx
Chapter 5 – Cloud Resource                        Virtua.docxChapter 5 – Cloud Resource                        Virtua.docx
Chapter 5 – Cloud Resource Virtua.docx
 
Chapter 5 – Cloud Resource Virtua.docx
Chapter 5 – Cloud Resource                        Virtua.docxChapter 5 – Cloud Resource                        Virtua.docx
Chapter 5 – Cloud Resource Virtua.docx
 

More from Magnus Backman

The latest in IT transformation at EMC
The latest in IT transformation at EMCThe latest in IT transformation at EMC
The latest in IT transformation at EMC
Magnus Backman
 
Cygate Lounge 2011 - VCE
Cygate Lounge 2011 - VCECygate Lounge 2011 - VCE
Cygate Lounge 2011 - VCE
Magnus Backman
 
Computer Sweden Cloud Strategies - EMC Keynote
Computer Sweden Cloud Strategies - EMC KeynoteComputer Sweden Cloud Strategies - EMC Keynote
Computer Sweden Cloud Strategies - EMC Keynote
Magnus Backman
 
IT NonStop - Business Continuity and Disaster Recovery Roadshow
IT NonStop - Business Continuity and Disaster Recovery RoadshowIT NonStop - Business Continuity and Disaster Recovery Roadshow
IT NonStop - Business Continuity and Disaster Recovery Roadshow
Magnus Backman
 
Arrow inspiration day cloud keynote
Arrow inspiration day cloud keynoteArrow inspiration day cloud keynote
Arrow inspiration day cloud keynote
Magnus Backman
 
Information Wars - Starring Symmetrix VMAX
Information Wars - Starring Symmetrix VMAXInformation Wars - Starring Symmetrix VMAX
Information Wars - Starring Symmetrix VMAX
Magnus Backman
 
VMware Forum 2012 - EMC "The Way Ahead"
VMware Forum 2012 - EMC "The Way Ahead"VMware Forum 2012 - EMC "The Way Ahead"
VMware Forum 2012 - EMC "The Way Ahead"
Magnus Backman
 

More from Magnus Backman (7)

The latest in IT transformation at EMC
The latest in IT transformation at EMCThe latest in IT transformation at EMC
The latest in IT transformation at EMC
 
Cygate Lounge 2011 - VCE
Cygate Lounge 2011 - VCECygate Lounge 2011 - VCE
Cygate Lounge 2011 - VCE
 
Computer Sweden Cloud Strategies - EMC Keynote
Computer Sweden Cloud Strategies - EMC KeynoteComputer Sweden Cloud Strategies - EMC Keynote
Computer Sweden Cloud Strategies - EMC Keynote
 
IT NonStop - Business Continuity and Disaster Recovery Roadshow
IT NonStop - Business Continuity and Disaster Recovery RoadshowIT NonStop - Business Continuity and Disaster Recovery Roadshow
IT NonStop - Business Continuity and Disaster Recovery Roadshow
 
Arrow inspiration day cloud keynote
Arrow inspiration day cloud keynoteArrow inspiration day cloud keynote
Arrow inspiration day cloud keynote
 
Information Wars - Starring Symmetrix VMAX
Information Wars - Starring Symmetrix VMAXInformation Wars - Starring Symmetrix VMAX
Information Wars - Starring Symmetrix VMAX
 
VMware Forum 2012 - EMC "The Way Ahead"
VMware Forum 2012 - EMC "The Way Ahead"VMware Forum 2012 - EMC "The Way Ahead"
VMware Forum 2012 - EMC "The Way Ahead"
 

Recently uploaded

Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

  • 1. Disco: Running Commodity Operating Systems on Scalable Multiprocessors
      - Edouard Bugnion, Scott Devine, Mendel Rosenblum, Stanford University, 1997
      - Presented by Divya Parekh
  • 2. Outline
      - Virtualization
      - Disco description
      - Disco performance
      - Discussion
  • 3. Virtualization
      - “a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource appear to function as multiple logical resources; or it can include making multiple physical resources appear as a single logical resource”
  • 4. Old idea from the 1960s
      - IBM VM/370: a VMM for IBM mainframes
          - Multiple OS environments on expensive hardware
          - Desirable when few machines were around
      - Popular research idea in the 1960s and 1970s
          - Entire conferences on virtual machine monitors
          - Hardware/VMM/OS designed together
      - Interest died out in the 1980s and 1990s
          - Hardware got cheaper
          - Operating systems got more powerful (e.g. multi-user)
  • 5. A Return to Virtual Machines
      - Disco: Stanford research project (SOSP ’97)
          - Run commodity OSes on scalable multiprocessors
          - Focus on high-end: NUMA, MIPS, IRIX
      - Commercial virtual machines for x86 architecture
          - VMware Workstation (now EMC) (1999-)
          - Connectix VirtualPC (now Microsoft)
      - Research virtual machines for x86 architecture
          - Xen (SOSP ’03)
          - plex86
      - OS-level virtualization
          - FreeBSD Jails, User-mode Linux, UMLinux
  • 6. Overview
      - Virtual Machine
          - “A fully protected and isolated copy of the underlying physical machine’s hardware.” (definition by IBM)
      - Virtual Machine Monitor
          - A thin layer of software between the hardware and the operating system that virtualizes and manages all hardware resources
          - Also known as a “hypervisor”
  • 8. Classification of Virtual Machines
      - Type I
          - The VMM is implemented directly on the physical hardware.
          - The VMM performs the scheduling and allocation of the system’s resources.
          - IBM VM/370, Disco, VMware’s ESX Server, Xen
      - Type II
          - VMMs are built completely on top of a host OS.
          - The host OS provides resource allocation and a standard execution environment to each “guest OS.”
          - User-mode Linux (UML), UMLinux
  • 9. Non-Virtualizable Architectures
      - According to Popek and Goldberg, “an architecture is virtualizable if the set of sensitive instructions is a subset of the set of privileged instructions.”
      - x86
          - Several instructions can read privileged system state at CPL 3 (user mode) without trapping
      - MIPS
          - KSEG0 bypasses the TLB and reads physical memory directly
  • 10. Type I contd.
      - Hardware support for virtualization
      - Figure: The hardware support approach to x86 virtualization
      - E.g. Intel Vanderpool/VT and AMD-V/SVM
  • 11. Type I contd.
      - Full virtualization
      - Figure: The binary translation approach to x86 virtualization
      - E.g. VMware ESX Server
  • 12. Type I contd.
      - Paravirtualization
      - Figure: The paravirtualization approach to x86 virtualization
      - E.g. Xen
  • 13. Type II
      - Hosted VM architecture
      - E.g. VMware Workstation, Connectix VirtualPC
  • 14. Disco: VMM Prototype
      - Goals
          - Extend modern OSes to run efficiently on shared-memory multiprocessors without large changes to the OS.
          - A VMM built to run multiple copies of the Silicon Graphics IRIX operating system on the Stanford FLASH shared-memory multiprocessor.
  • 15. Problem Description
      - Multiprocessors in the market (1990s)
          - Innovative hardware
      - Hardware evolves faster than system software
          - Customized OSes are late, incompatible, and possibly buggy
      - Commodity OSes are not suited for multiprocessors
          - Do not scale because of lock contention and memory architecture
          - Do not isolate/contain faults
          - More processors mean more failures
  • 16. Solution to the problems
      - Resource-intensive modification of the OS (hard and time-consuming, increases OS size, etc.)
      - Insert a virtual machine monitor (software) between the OS and the hardware to resolve the problem
  • 17. Two Opposite Approaches for System Software
      - Address these challenges in the operating system: OS-intensive
          - Hive, Hurricane, Cellular-IRIX, etc.
          - Innovative, single system image
          - But a large effort
      - Hard-partition the machine into independent failure units: OS-light
          - Sun Enterprise 10000 machine
          - Partial single system image
          - Cannot dynamically adapt the partitioning
  • 18. Return to Virtual Machine Monitors
      - A compromise between OS-intensive and OS-light: the VMM
      - Virtual machine monitors, in combination with commodity and specialized operating systems, form a flexible system software solution for these machines
      - Disco was introduced to allow a trade-off between performance and development cost.
  • 20. Advantages of this approach
      - Scalability
      - Flexibility
      - Hides NUMA effects
      - Fault containment
      - Compatibility with legacy applications
  • 21. Challenges Facing Virtual Machines
      - Overheads
          - Trap and emulate privileged instructions of the guest OS
          - Access to I/O devices
          - Replication of memory in each VM
      - Resource management
          - Lack of information to make good policy decisions
      - Communication and sharing
          - Stand-alone VMs cannot communicate
  • 22. Disco’s Interface
      - Processors
          - MIPS R10000 processor
          - Emulates all instructions, the MMU, and the trap architecture
          - Extensions to support common processor operations
              - Enabling/disabling interrupts, accessing privileged registers
      - Physical memory
          - Contiguous, starting at address 0
      - I/O devices
          - Virtualizes I/O devices such as disks and network interfaces, each appearing exclusive to its VM
          - Physical devices are multiplexed by Disco
          - Special abstractions for SCSI disks and network interfaces
              - Virtual disks for VMs
              - Virtual subnet across all virtual machines
  • 23. Disco Implementation
      - Multi-threaded shared-memory program
      - Attention to NUMA memory placement, cache-aware data structures, and IPC patterns
      - Disco’s code segment is copied to each FLASH processor for locality
      - Communicates using shared memory
  • 24. Virtual CPUs
      - Direct execution
          - Executes the virtual CPU on the real CPU
          - Sets the real machine’s registers to those of the virtual CPU
          - Jumps to the current PC of the virtual CPU and executes directly on the real CPU
      - Challenges
          - Detection and fast emulation of operations that cannot be safely exported to the virtual machine
          - These include privileged instructions such as TLB modification, and direct access to physical memory and I/O devices
      - Maintains a data structure for each virtual CPU for trap emulation
      - Scheduler multiplexes virtual CPUs on the real processors
  • 25. Virtual Physical Memory
      - Disco performs address translation and maintains a physical-to-machine address (40-bit) mapping.
      - Virtual machines use physical addresses
      - Relies on the software-reloaded translation-lookaside buffer (TLB) of the MIPS processor
      - Maintains a pmap data structure for each VM, with one entry for each physical page of the VM (its physical-to-machine mapping)
      - Each pmap entry also has a back pointer to its virtual address to help invalidate mappings in the TLB
  • 26. Contd.
      - Kernel-mode references on MIPS processors access memory and I/O directly, so OS code and data must be relinked to a mapped address space
      - MIPS tags each TLB entry with an address space identifier (ASID)
      - ASIDs are not virtualized, so the TLB must be flushed on VM context switches
      - Increased TLB misses in workloads
          - Additional operating system references
          - VM context switches
      - TLB misses are expensive, so Disco adds a second-level software TLB. Idea similar to a cache?
  • 27. NUMA Memory Management
      - Cache misses should be satisfied from local memory (fast) rather than remote memory (slow)
      - Dynamic page migration and replication
          - Pages frequently accessed by one node are migrated to that node
          - Read-shared pages are replicated to the nodes that access them most
          - Write-shared pages are not moved, since maintaining consistency requires remote access anyway
      - The migration and replication policy is driven by the cache-miss-counting facility provided by the FLASH hardware
  • 28. Transparent Page Replication
      1. Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accesses a local copy.
      2. memmap tracks which virtual pages reference each physical page. It is used during TLB shootdown.
  • 30. Virtual I/O Devices
      - Disco intercepts all device accesses from the virtual machine and forwards them to the physical devices
      - Special device drivers are added to the guest OS
      - Disco’s device drivers provide a monitor call interface that passes all the arguments in a single trap
      - A device accessed by a single VM does not require virtualizing the I/O; Disco only needs to ensure exclusivity
  • 31. Copy-on-write Disks
      - Disco intercepts DMA requests to translate physical addresses into machine addresses.
      - Maps the machine page read-only at the destination address of the DMA
          - This shares machine memory between VMs
      - Attempts to modify a shared page result in a copy-on-write fault handled internally by the monitor.
      - Logs are maintained for each VM’s modifications
          - Modifications are made in main memory
      - Non-persistent disks are shared copy-on-write
          - E.g. kernel text and buffer cache
          - E.g. file system root disks
  • 32. Transparent Sharing of Pages
      - Creates a global buffer cache shared across VMs and reduces the memory footprint of the system
  • 33. Virtual Network Interface
      - The virtual subnet and network interface use copy-on-write mappings to share read-only pages
      - Persistent disks can be accessed using a standard distributed file system protocol such as NFS
      - Provides a global buffer cache that is transparently shared by independent VMs
  • 34. Transparent sharing of pages over NFS
      1. The monitor’s networking device remaps the data page from the source’s machine address space to the destination’s.
      2. The monitor remaps the data page from the driver’s mbuf to the client’s buffer cache.
  • 35. Modifications to the IRIX 5.3 OS
      - Minor changes to the kernel code and data segment, specific to MIPS
      - Kernel relocation: relocate the unmapped segment of the virtual machine into the mapped supervisor segment of the processor
      - Disco drivers are the same as the original device drivers of IRIX
      - Patched HAL to use memory loads/stores instead of privileged instructions
  • 36. Modifications to the IRIX 5.3 OS
      - Added code to the HAL to pass hints to the monitor for resource management
      - New monitor calls to the MMU to request a zeroed page and to reclaim unused memory
      - Changed mbuf management to be page-aligned
      - Changed bcopy to use remap (with copy-on-write)
  • 37. SPLASHOS: A specialized OS
      - Thin specialized library OS, supported directly by Disco
      - No need for a virtual memory subsystem since applications share an address space
      - Used for parallel scientific applications that can span the entire machine
  • 38. Disco: Performance
      - Experimental setup
          - Disco targets the FLASH machine, which was not available at that time
          - Used SimOS, a machine simulator that models the hardware of MIPS-based multiprocessors, to run the Disco monitor.
          - The simulator was too slow to allow long workloads to be studied
  • 39. Disco: Performance
      - Workloads
  • 40. Disco: Performance
      - Execution overhead
      - Pmake overhead is due to I/O virtualization; the others are due to TLB mapping
      - Reduction of kernel time
      - On average, virtualization overhead of 3% to 16%
  • 41. Disco: Performance
      - Memory overheads
      - V: Pmake memory used if there is no sharing
      - M: Pmake memory used if there is sharing
  • 42. Disco: Performance
      - Scalability
      - Partitioning the problem into different VMs increases scalability. Kernel synchronization time becomes smaller.
  • 43. Disco: Performance
      - Dynamic page migration and replication
  • 44. Conclusion
      - The Disco VMM hides NUMA-ness from a non-NUMA-aware OS
      - The Disco VMM is low(er) effort
      - Moderate overhead due to virtualization
  • 45. Discussion
      - Was the Disco VMM done right?
      - Virtual physical memory on architectures other than MIPS
          - The MIPS TLB is software-managed
      - Not sure how well other OSes perform on Disco, since IRIX was designed for MIPS
      - Not sure how Hive and Hurricane perform comparatively
      - Performance of long workloads on the system
      - Performance of heterogeneous VMs, e.g. the Pmake case
  • 46. Discussion
      - Are VMM Microkernels done right?
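
The memory mechanisms on slides 25, 28 and 31 (a per-VM physical-to-machine map, back pointers from machine pages, and copy-on-write sharing) can be illustrated with a small Python sketch. This is not Disco's code: the class and method names (`Monitor`, `load_block`, `map_shared`, `write`) and the 4 KB page size are assumptions made for illustration, and the real monitor works through the TLB and trap handling rather than explicit method calls.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class MachinePage:
    data: bytearray
    # memmap-style back pointers: every (vm, physical page) mapped to this page
    users: set = field(default_factory=set)


class Monitor:
    """Toy monitor: per-VM physical-to-machine maps with copy-on-write sharing."""

    def __init__(self):
        self.machine = {}  # machine page number -> MachinePage
        self.pmap = {}     # vm name -> {physical page number -> machine page number}
        self.next_mpn = 0

    def _alloc(self, data):
        mpn = self.next_mpn
        self.next_mpn += 1
        self.machine[mpn] = MachinePage(bytearray(data))  # private copy of data
        return mpn

    def map_shared(self, vm, ppn, mpn):
        # Map an existing machine page into another VM's physical address space.
        self.pmap.setdefault(vm, {})[ppn] = mpn
        self.machine[mpn].users.add((vm, ppn))

    def load_block(self, vm, ppn, data):
        # First VM to read a block gets a fresh machine page.
        mpn = self._alloc(data)
        self.map_shared(vm, ppn, mpn)
        return mpn

    def read(self, vm, ppn, offset):
        return self.machine[self.pmap[vm][ppn]].data[offset]

    def write(self, vm, ppn, offset, value):
        mpn = self.pmap[vm][ppn]
        page = self.machine[mpn]
        if len(page.users) > 1:
            # "Copy-on-write fault": give the writer a private copy first.
            page.users.discard((vm, ppn))
            mpn = self._alloc(page.data)
            self.map_shared(vm, ppn, mpn)
            page = self.machine[mpn]
        page.data[offset] = value


mon = Monitor()
shared = mon.load_block("vm1", 7, bytes(PAGE_SIZE))  # vm1 reads a disk block
mon.map_shared("vm2", 3, shared)                     # vm2 reads the same block
mon.write("vm2", 3, 0, 255)                          # vm2 modifies its copy
print(mon.read("vm1", 7, 0), mon.read("vm2", 3, 0))
```

Until the write, both VMs point at the same machine page, which is the memory saving the slides attribute to transparent sharing; after the write, only the writing VM's mapping changes, so the other VM still sees the original contents.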