SlideShare a Scribd company logo
Building a Better Userspace
KVM Forum 2008




  Anthony Liguori – aliguori@us.ibm.com
  Open Virtualization
  IBM Linux Technology Center


  June 11th 2008

                    Linux is a registered trademark of Linus Torvalds.
Reflecting on QEMU



We really owe a lot to QEMU
Reflecting on QEMU
●   QEMU is a community-driven project
    –   No company has sponsored major portions of it's
        development
●   QEMU does a really amazing thing
    –   Can emulate 9 target architectures on 13 host
        architectures
    –   Provides full system emulation supporting ~200
        distinct devices
●   Is the basis of KVM, Xen HVM, and xVM Virtual
    Box
    –   Every Open Source virtualization project uses
        QEMU
●   QEMU receives very little credit for this
But all is not well

Many see QEMU as the last mile for open 
           virtualization
QEMU has a patch problem
●   Many patches are ignored
●   Security fixes are not applied in a timely
    manner
●   The quality of the patches that are committed
    is often questionable
●   Some sub-systems are effectively unmaintained



                 How do we fix it?
Fixing the QEMU process
●   First step
    –   Patch review
    –   IBM will be dedicating resources (me) to
        reviewing patches on qemu-devel
    –   This is now happening (although I need help)
●   Second step
    –   I'll be seeking commit access for virtualization
        support in QEMU
         ●   Many patches will still be committed by other
             people according to subsystem maintainer
         ●   I'll try to review everything I can
●   Third step
    –   As things improve, we can hopefully move to a
        DSCM like git and start taking in pull requests
Patch Review
●   We need people reviewing patches on qemu-
    devel
    –   Make use of Reviewed-by, Tested-by, Acked-by
●   It's arguable that more committers are needed
    –   QEMU has about 9 as it is; KVM just has 1
●   We need to jump start a culture of review
The Challenge
●   Action is needed from the people in the room
●   What can we do to facilitate this?
Let's get down to business




    Once we fix the QEMU process,
     we can start to do the fun stuff
PCI DMA Interface
●   We want zero copy
    –   pci_ram_base + PA is broken, but fast
    –   iommu emulation complicates zero copy
         ●   Some DMA engines can be programmed to
             perform data transformation (xor)
    –   The device drivers need lots of updating
●   QEMU uses a multilevel table to lookup
    pci_ram_base offsets for a given PA
    –   Even though there are a small number of
        contiguous ram regions
         ●   0-640K, 1M-3.8GB, 4GB+
    –   We should have a fast path for RAM to avoid
        lookups in the l1_phys_map
●   Fabrice is on-board
PCI devices today
                                  PCI Bus                        pci_register_device




                               cpu_physical_memory_rw, ldl/stl
            RAM                                                     PCI Device


                    ldl/stl
                                                                  cpu_register_physical_memory

                                       CPU


• Often broken for > 4GB RAM
• Byte-swapping is FUBAR
• IOMMU support is impossible
• This model is fundamentally broken
DMA API
                                 PCI Bus                     pci_register_device

                                                 pci_register_region

                                       pci_ldl/pci_stl

            RAM                                                  PCI Device

                                       cpu_register_physical_memory
                    ldl/stl


                                     CPU



• Simplify devices; no special PIO/MMIO paths
• Everything goes through PCI bus, allowing zero-copy or IOMMU
• Can finally sanitize byte swapping
Tasklets
●   Linux has sucky interfaces
    –   O_NONBLOCK still blocks
    –   linux-aio can block
    –   posix-aio uses threads
●   Threads are evil, but are the only way to get
    around synchronous interfaces
●   Introduce a generic thread-pool
    –   Tasks are normally run holding a “big qemu
        lock”
    –   Tasks can mark themselves as not requiring the
        BQL
    –   Tasks can also drop the BQL before sleeping
         ●   Caution must be exercised when doing this!
Introducing re-entrancy
●   Introduce a QEMUMutex type
    –   Can be used to make existing code re-entrant
    –   On platforms without Tasklets, it's simple
        reference counting
    –   Can be used to gradually wean QEMU off of the
        BQL
●   <Refer to QEMUBH code>
Block API
●   The current block API is a mix of
    synchronous/asynchronous behavior
    –   Qcow2 does meta-data lookup synchronously
        but data lookup asynchronously
    –   Possible source of latency
●   We either need to rewrite Qcow2 to use a state
    machine and be entirely asynchronous or
●   Make qcow2 synchronous, but re-entrant
●   Use tasklets to parallelize qcow2 requests
●   Equally applicable to other disk types (vmdk)
    –   Illustrate current API verses tasklet one
Block API
Old API                 New API

●   bdrv_read()         ●   bdrv_pread()
●   bdrv_write()        ●   bdrv_pwrite()
●   bdrv_aio_read()
●   bdrv_aio_write()
●   bdrv_aio_cancel()
●   bdrv_pread()
●   bdrv_pwrite()
Networking
●   Current API is good for performance, but bad
    for zero-copy
●   For vringfd, we need an API that pre-registers
    RX buffers, preferrably with a batching API for
    RX completion notification
●   Can also be used for rate limiting without
    dropping packets
    –   Illustrate current API
Networking API
                                     Effectively a HUB
                   VLANState
                                        • fd_read()                      • fd_read()
                                        • fd_can_read()                  • fd_can_read()
• fd_read()
• fd_can_read()



 VLANClientState                 VLANClientState                    VLANClientState

         TAPState                  VirtIONetState                    VirtIONetState



                     • Does not scale well for > 1 network device
                     • Cannot take advantage of DMA engines
                     • Is confusing to users
Proposed Networking API

                           VLANState BridgeState
• add_tx_buffer()
                                                          • add_tx_buffer()
• add_rx_buffer()
                                                          • add_rx_buffer()
• poll_completed()
                                                          • poll_completed()



          VLANClientState                VLANClientState

            VirtIONetState                 VirtIONetState



                     • Use Linux bridging for packet copying
                     • Use vringfd more effectively
                     • Can make use of DMA offloading and associated trickery
Managability
●   The monitor interface is widely used to
    automate management
    –   Almost no error messages
●   Need a non-human mode for the monitor
●   Need a `select' command for receiving
    asynchronous operations
●   Need to eliminate assumption of “global
    monitor”
    –   Each monitor callback should take an opaque
        state parameter
    –   term_printf() should take that state
    –   Monitor commands should be dynamically
        registerable
General code quality
●   Splitting up vl.c
    –   Logically divisible into net, chardev, option
        parsing, etc.
●   Organizing some of the existing code into
    directories would be a good idea
    –   Block is long overdue
    –   Some better organization in hw/
●   This must be approach gradually
    –   There is such a thing as too much cosmetic
        churn
Upstream support for KVM
●   Need to get rid of some if (kvm_enabled())
●   Need to clean-up qemu-kvm*.c
●   Need to fork device models for in-kernel KVM
    devices
    –   Already do this for PIT, need to do it for APIC
●   Get migration, virtio, and PCI hot-plug upstream
    –   What to do about BIOS changes?
●   Introduce a target-kvm
    –   Useful for embedded PPC, Itanium, and s390
    –   Would produce a qemu-kvm executable
Where can we get to




A QEMU that we're happy with by
     KVM Forum 2009
Questions
●   Comments?

More Related Content

What's hot

Xensummit Asia 2009 Talk
Xensummit Asia 2009 TalkXensummit Asia 2009 Talk
Xensummit Asia 2009 Talk
The Linux Foundation
 
XS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolarisXS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolaris
The Linux Foundation
 
XS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt EnglishXS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt English
The Linux Foundation
 
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisorProject ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platforms
MarkTaylorIBM
 
Project ACRN GVT-d introduction and tutorial
Project ACRN GVT-d introduction and tutorialProject ACRN GVT-d introduction and tutorial
Project ACRN GVT-d introduction and tutorial
Project ACRN
 
Subversion Day Berlin 2011 Configuration Management With Subversion And Rpm
Subversion Day Berlin 2011 Configuration Management With Subversion And RpmSubversion Day Berlin 2011 Configuration Management With Subversion And Rpm
Subversion Day Berlin 2011 Configuration Management With Subversion And Rpm
Schlomo Schapiro
 
ACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
ACRN vMeet-Up EU 2021 - Real Time Management and Performance OptimizationACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
ACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
Project ACRN
 
Link Virtualization based on Xen
Link Virtualization based on XenLink Virtualization based on Xen
Link Virtualization based on Xen
The Linux Foundation
 
QEMU and Raspberry Pi. Instant Embedded Development
QEMU and Raspberry Pi. Instant Embedded DevelopmentQEMU and Raspberry Pi. Instant Embedded Development
QEMU and Raspberry Pi. Instant Embedded Development
GlobalLogic Ukraine
 
Qemu Pcie
Qemu PcieQemu Pcie
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction
Project ACRN
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
SeongJae Park
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platforms
MarkTaylorIBM
 
ACRN vMeet-Up EU 2021 - debug ACRN hypervisor
ACRN vMeet-Up EU 2021 - debug ACRN hypervisorACRN vMeet-Up EU 2021 - debug ACRN hypervisor
ACRN vMeet-Up EU 2021 - debug ACRN hypervisor
Project ACRN
 
Project ACRN Device Passthrough Introduction
Project ACRN Device Passthrough IntroductionProject ACRN Device Passthrough Introduction
Project ACRN Device Passthrough Introduction
Project ACRN
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross development
Tetsuyuki Kobayashi
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial markets
Adrien Mahieux
 
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
Project ACRN
 
Qemu Introduction
Qemu IntroductionQemu Introduction
Qemu Introduction
Chiawei Wang
 

What's hot (20)

Xensummit Asia 2009 Talk
Xensummit Asia 2009 TalkXensummit Asia 2009 Talk
Xensummit Asia 2009 Talk
 
XS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolarisXS Boston 2008 OpenSolaris
XS Boston 2008 OpenSolaris
 
XS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt EnglishXS Japan 2008 Xen Mgmt English
XS Japan 2008 Xen Mgmt English
 
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisorProject ACRN CPU sharing BVT scheduler in ACRN hypervisor
Project ACRN CPU sharing BVT scheduler in ACRN hypervisor
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platforms
 
Project ACRN GVT-d introduction and tutorial
Project ACRN GVT-d introduction and tutorialProject ACRN GVT-d introduction and tutorial
Project ACRN GVT-d introduction and tutorial
 
Subversion Day Berlin 2011 Configuration Management With Subversion And Rpm
Subversion Day Berlin 2011 Configuration Management With Subversion And RpmSubversion Day Berlin 2011 Configuration Management With Subversion And Rpm
Subversion Day Berlin 2011 Configuration Management With Subversion And Rpm
 
ACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
ACRN vMeet-Up EU 2021 - Real Time Management and Performance OptimizationACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
ACRN vMeet-Up EU 2021 - Real Time Management and Performance Optimization
 
Link Virtualization based on Xen
Link Virtualization based on XenLink Virtualization based on Xen
Link Virtualization based on Xen
 
QEMU and Raspberry Pi. Instant Embedded Development
QEMU and Raspberry Pi. Instant Embedded DevelopmentQEMU and Raspberry Pi. Instant Embedded Development
QEMU and Raspberry Pi. Instant Embedded Development
 
Qemu Pcie
Qemu PcieQemu Pcie
Qemu Pcie
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
 
IBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platformsIBM MQ - Comparing Distributed and z/OS platforms
IBM MQ - Comparing Distributed and z/OS platforms
 
ACRN vMeet-Up EU 2021 - debug ACRN hypervisor
ACRN vMeet-Up EU 2021 - debug ACRN hypervisorACRN vMeet-Up EU 2021 - debug ACRN hypervisor
ACRN vMeet-Up EU 2021 - debug ACRN hypervisor
 
Project ACRN Device Passthrough Introduction
Project ACRN Device Passthrough IntroductionProject ACRN Device Passthrough Introduction
Project ACRN Device Passthrough Introduction
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross development
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial markets
 
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
ACRN vMeet-Up EU 2021 - Bridging Orchestrator and Hard Realtime Workload Cons...
 
Qemu Introduction
Qemu IntroductionQemu Introduction
Qemu Introduction
 

Similar to Buiding a better Userspace - The current and future state of QEMU and KVM integration Buiding a better Userspace - The current and future state of QEMU and KVM integration

Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
guest72e8c1
 
RMLL / LSM 2009
RMLL / LSM 2009RMLL / LSM 2009
RMLL / LSM 2009
Franck_Villaume
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
eurobsdcon
 
Kvm optimizations
Kvm optimizationsKvm optimizations
Kvm optimizations
OpenNebula Project
 
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
The Linux Foundation
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
dotCloud
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Docker, Inc.
 
High-Performance Computing with C++
High-Performance Computing with C++High-Performance Computing with C++
High-Performance Computing with C++
JetBrains
 
Experiences porting KVM to SmartOS
Experiences porting KVM to SmartOSExperiences porting KVM to SmartOS
Experiences porting KVM to SmartOS
bcantrill
 
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
Peter Tripp
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
Yandex
 
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdfStorage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
aaajjj4
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
hik_lhz
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
shinolajla
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
Takuya ASADA
 
IBM MQ Appliance - Administration simplified
IBM MQ Appliance - Administration simplifiedIBM MQ Appliance - Administration simplified
IBM MQ Appliance - Administration simplified
Anthony Beardsmore
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
Peeyush Gupta
 
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMUSFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
Linaro
 
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
Lance Albertson
 
Achieving the Ultimate Performance with KVM
Achieving the Ultimate Performance with KVMAchieving the Ultimate Performance with KVM
Achieving the Ultimate Performance with KVM
data://disrupted®
 

Similar to Buiding a better Userspace - The current and future state of QEMU and KVM integration Buiding a better Userspace - The current and future state of QEMU and KVM integration (20)

Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0Rmll Virtualization As Is Tool 20090707 V1.0
Rmll Virtualization As Is Tool 20090707 V1.0
 
RMLL / LSM 2009
RMLL / LSM 2009RMLL / LSM 2009
RMLL / LSM 2009
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
Kvm optimizations
Kvm optimizationsKvm optimizations
Kvm optimizations
 
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchen...
 
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
 
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
 
High-Performance Computing with C++
High-Performance Computing with C++High-Performance Computing with C++
High-Performance Computing with C++
 
Experiences porting KVM to SmartOS
Experiences porting KVM to SmartOSExperiences porting KVM to SmartOS
Experiences porting KVM to SmartOS
 
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
Joyent's Bryan Cantrill: Experiences Porting KVM to SmartOS at KVM Forum, Aug...
 
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo..."Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
 
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdfStorage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
Storage-Performance-Tuning-for-FAST-Virtual-Machines_Fam-Zheng.pdf
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
IBM MQ Appliance - Administration simplified
IBM MQ Appliance - Administration simplifiedIBM MQ Appliance - Administration simplified
IBM MQ Appliance - Administration simplified
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
 
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMUSFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
SFO15-202: Towards Multi-Threaded Tiny Code Generator (TCG) in QEMU
 
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012Hands on Virtualization with Ganeti (part 1)  - LinuxCon 2012
Hands on Virtualization with Ganeti (part 1) - LinuxCon 2012
 
Achieving the Ultimate Performance with KVM
Achieving the Ultimate Performance with KVMAchieving the Ultimate Performance with KVM
Achieving the Ultimate Performance with KVM
 

Recently uploaded

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 

Recently uploaded (20)

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 

Buiding a better Userspace - The current and future state of QEMU and KVM integration Buiding a better Userspace - The current and future state of QEMU and KVM integration

  • 1. Building a Better Userspace KVM Forum 2008 Anthony Liguori – aliguori@us.ibm.com Open Virtualization IBM Linux Technology Center June 11th 2008 Linux is a registered trademark of Linus Torvalds.
  • 3. Reflecting on QEMU ● QEMU is a community-driven project – No company has sponsored major portions of it's development ● QEMU does a really amazing thing – Can emulate 9 target architectures on 13 host architectures – Provides full system emulation supporting ~200 distinct devices ● Is the basis of KVM, Xen HVM, and xVM Virtual Box – Every Open Source virtualization project uses QEMU ● QEMU receives very little credit for this
  • 4. But all is not well Many see QEMU as the last mile for open  virtualization
  • 5. QEMU has a patch problem ● Many patches are ignored ● Security fixes are not applied in a timely manner ● The quality of the patches that are committed is often questionable ● Some sub-systems are effectively unmaintained How do we fix it?
  • 6. Fixing the QEMU process ● First step – Patch review – IBM will be dedicating resources (me) to reviewing patches on qemu-devel – This is now happening (although I need help) ● Second step – I'll be seeking commit access for virtualization support in QEMU ● Many patches will still be committed by other people according to subsystem maintainer ● I'll try to review everything I can ● Third step – As things improve, we can hopefully move to a DSCM like git and start taking in pull requests
  • 7. Patch Review ● We need people reviewing patches on qemu- devel – Make use of Reviewed-by, Tested-by, Acked-by ● It's arguable that more committers are needed – QEMU has about 9 as it is; KVM just has 1 ● We need to jump start a culture of review
  • 8. The Challenge ● Action is needed from the people in the room ● What can we do to facilitate this?
  • 9. Let's get down to business Once we fix the QEMU process, we can start to do the fun stuff
  • 10. PCI DMA Interface ● We want zero copy – pci_ram_base + PA is broken, but fast – iommu emulation complicates zero copy ● Some DMA engines can be programmed to perform data transformation (xor) – The device drivers need lots of updating ● QEMU uses a multilevel table to lookup pci_ram_base offsets for a given PA – Even though there are a small number of contiguous ram regions ● 0-640K, 1M-3.8GB, 4GB+ – We should have a fast path for RAM to avoid lookups in the l1_phys_map ● Fabrice is on-board
  • 11. PCI devices today PCI Bus pci_register_device cpu_physical_memory_rw, ldl/stl RAM PCI Device ldl/stl cpu_register_physical_memory CPU • Often broken for > 4GB RAM • Byte-swapping is FUBAR • IOMMU support is impossible • This model is fundamentally broken
  • 12. DMA API PCI Bus pci_register_device pci_register_region pci_ldl/pci_stl RAM PCI Device cpu_register_physical_memory ldl/stl CPU • Simplify devices; no special PIO/MMIO paths • Everything goes through PCI bus, allowing zero-copy or IOMMU • Can finally sanitize byte swapping
  • 13. Tasklets ● Linux has sucky interfaces – O_NONBLOCK still blocks – linux-aio can block – posix-aio uses threads ● Threads are evil, but are the only way to get around synchronous interfaces ● Introduce a generic thread-pool – Tasks are normally run holding a “big qemu lock” – Tasks can mark themselves as not requiring the BQL – Tasks can also drop the BQL before sleeping ● Caution must be exercised when doing this!
  • 14. Introducing re-entrancy ● Introduce a QEMUMutex type – Can be used to make existing code re-entrant – On platforms without Tasklets, it's simple reference counting – Can be used to gradually wean QEMU off of the BQL ● <Refer to QEMUBH code>
  • 15. Block API ● The current block API is a mix of synchronous/asynchronous behavior – Qcow2 does meta-data lookup synchronously but data lookup asynchronously – Possible source of latency ● We either need to rewrite Qcow2 to use a state machine and be entirely asynchronous or ● Make qcow2 synchronous, but re-entrant ● Use tasklets to parallelize qcow2 requests ● Equally applicable to other disk types (vmdk) – Illustrate current API verses tasklet one
  • 16. Block API Old API New API ● bdrv_read() ● bdrv_pread() ● bdrv_write() ● bdrv_pwrite() ● bdrv_aio_read() ● bdrv_aio_write() ● bdrv_aio_cancel() ● bdrv_pread() ● bdrv_pwrite()
  • 17. Networking ● Current API is good for performance, but bad for zero-copy ● For vringfd, we need an API that pre-registers RX buffers, preferrably with a batching API for RX completion notification ● Can also be used for rate limiting without dropping packets – Illustrate current API
  • 18. Networking API Effectively a HUB VLANState • fd_read() • fd_read() • fd_can_read() • fd_can_read() • fd_read() • fd_can_read() VLANClientState VLANClientState VLANClientState TAPState VirtIONetState VirtIONetState • Does not scale well for > 1 network device • Cannot take advantage of DMA engines • Is confusing to users
  • 19. Proposed Networking API VLANState BridgeState • add_tx_buffer() • add_tx_buffer() • add_rx_buffer() • add_rx_buffer() • poll_completed() • poll_completed() VLANClientState VLANClientState VirtIONetState VirtIONetState • Use Linux bridging for packet copying • Use vringfd more effectively • Can make use of DMA offloading and associated trickery
  • 20. Managability ● The monitor interface is widely used to automate management – Almost no error messages ● Need a non-human mode for the monitor ● Need a `select' command for receiving asynchronous operations ● Need to eliminate assumption of “global monitor” – Each monitor callback should take an opaque state parameter – term_printf() should take that state – Monitor commands should be dynamically registerable
  • 21. General code quality ● Splitting up vl.c – Logically divisible into net, chardev, option parsing, etc. ● Organizing some of the existing code into directories would be a good idea – Block is long overdue – Some better organization in hw/ ● This must be approach gradually – There is such a thing as too much cosmetic churn
  • 22. Upstream support for KVM ● Need to get rid of some if (kvm_enabled()) ● Need to clean-up qemu-kvm*.c ● Need to fork device models for in-kernel KVM devices – Already do this for PIT, need to do it for APIC ● Get migration, virtio, and PCI hot-plug upstream – What to do about BIOS changes? ● Introduce a target-kvm – Useful for embedded PPC, Itanium, and s390 – Would produce a qemu-kvm executable
  • 23. Where can we get to A QEMU that we're happy with by KVM Forum 2009
  • 24. Questions ● Comments?