Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen - Sung-Min Lee, Samsung Electronics


Published on

This talk presents a production-ready automotive virtualization solution with Xen. The key requirements that we focus are super-fast startup and recovery from failure, static virtual machine creation with dedicated resources, and performance effective graphics rendering. To reduce the boot time, we optimize the Xen startup procedure by effectively initializing Xen heap and VM memory, and booting multiple VMs concurrently. We provide fast recovery mechanism by re-implementing the VM reset feature. We also develop a highly optimized graphics APIs-forwarding mechanism supporting OpenGLES APIs up to v3.2. The pass rate of Khronos CTS in a guest OS is comparable to the Domain0’s. Our experiment shows that our virtualization solution provides reasonable performance for ARM-based automotive systems (hypervisor booting: less than 70ms, graphics performance: about 96% of Domain0).

Published in: Technology
  • Be the first to comment

  • Be the first to like this

XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen - Sung-Min Lee, Samsung Electronics

  1. 1. Design and Implementation of Automotive Virtualization Based on Xen June 21, 2018 Xen Developer and Design Summit Sung-Min Lee
  2. 2. 1 / 21 Acknowledgement Bokdeuk Jeong, Hanjung Park, Kiwoong Ha, Hyeongseop Shim, Jinsub Sim, Byungchul So, Min Kang, Junbong Yu, Hyeonkwon Cho, Sungjin Kim, Hakyoung Kim, Sangbok Han, Vasily Leonenko, Dmitrii Martynov, Andrew Gazizov, Sergey Grekhov, Ildar Ismagilov, Imran Navruzbekov, and colleagues from S.LSI Major Contributors
  3. 3. Contents Why Virtualization Key Requirements Our Design and Implementation Evaluation Demo (Video Clip) Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ
  4. 4. 3 / 21 Why Virtualization in Automotive Industry? Easy Firmware Update for Automotive System Enhanced In-Vehicle Security Cost/Complexity Reduction Virtualization?
  5. 5. 4 / 21 Key Requirements for Automotive Virtualization Fast Startup Fault Recovery for Reliable In-Vehicle Services High Performance Graphics Support for Guest OS
  6. 6. 5 / 21 Overall Architecture Xen Dom0 (Cluster OS) DomU (IVI OS) Virtual I/O Frontend Drivers (Storage, N/W, Serial, Input, Audio, Camera) Virtual I/O Backend Drivers (Storage, N/W, Serial, Input, Audio, Camera) Passthrough I/O Drivers Virtual Graphics Frontend Virtual Graphics Backend Fast Startup Fault Recovery Native Device Drivers Fault Recovery IVI Platform (e.g., GDP)Cluster OS UI Fault RecoveryXen Tools PV Backend Drivers for DomU Passthrough (clocksource,regulator,pinctrl) PV Frontend Driver for Passthrough
  7. 7. 6 / 21 Fast Startup (1/3) Xen Heap Initialization Static VM Memory Assignment Reducing Memory Initialization Time Open source Xen is based on per-page xen heap insertion Reduce the number of the init function invocations and remove spinlocks by implementing a new initialization function Skip scrub_heap_pages() Assign specific addresses and size for VMs in device tree and exclude them for Xen Heap insertion The number of pages to initialize is reduced and double definition overhead (init + VM assigning) is disappeared Xen Heap initialization range Dom0 memory Dom1 memory Double assignment Dom0 Memory Dom1 Memory Heap Initialization Range is reduced Original ver Samsung ver Double assignment disappeared 0x8000xxxx 0xFFFFxxxxPagePage Page Page Xen Heap Page Page Page Page Xen Heap free_heap_pages fastboot_insert_page Call init function by the number of total pages times Initialize total pages as heap Don’t have to call init function many times alloc_domheap_pages domain_direct_assign_pages Contiguous Page Chunk Order of 2 Original ver Samsung ver
  8. 8. 7 / 21 Fast Startup (2/3) Relocate DTB Nodes Improve DTB Creation Algorithm Reducing DTB Lookup and Creation Time Scanning all nodes in device tree takes a lot when its size is large Group scattered “chosen” and “memory” nodes into one node and relocate it at the head of device tree to search faster Creating device tree for control OS takes most of time FDT library uses the most naïve algorithm when comparing strings Replace it with better one, Boyer- Moore algorithm with time complexity O(m+n) Flattened Device Tree Description Grouped information node Chosen node Memory node Memory node Visit all nodes to find information nodes Doesn't have to visit all nodes Original ver Samsung ver Special node “xen-early-scan-end”
  9. 9. 8 / 21 Fast Startup (3/3) Assign Big Core for Dom0 Creation Concurrent VM Creation Assign Big Core and Concurrent VM Creation During the Xen Startup Exynos8890 uses little core for boot strap processor (CPU0) Assign tasklet to sleeping big core to create control OS CPU0 waits until domain creation done To shorten the time spent on XL for DomU creation, Samsung moves its data structure initialization to xen startup Assign a guest OS creation tasklet to another sleeping big core and create simultaneously with control OS creation CPU0 (little core) CPU4 (Big core) Scanning DTB, Heap init, Secondary core booting.. Domain creation Schedule tasklet to Big core ETC Wait until dom creation done… Create domain Notify that creation done Scanning DTB, Heap init, Secondary core booting.. Domain creation Schedule control OS tasklet to CPU4 ETC Wait until dom(s) creation done… Create dom0 Notify that creation done CPU0 (little core) CPU4 (Big core) CPU5 (Big core) Schedule guest OS tasklet to CPU5 Create dom1
  10. 10. 9 / 21 Fault Recovery (1/3) First Sub-headingFault Recovery Scenario Linux Kernel Linux Kernel Xen Watchdogd Xen Watchdog Timer Xen ToolsWatchdogd HW Watchdog Timer Xen Shutdown Manager Samsung VM Reset Manager (Relaunch VMs) HW Reset XEN CrashCrash Exynos8 KickKick Timeout Timeout 1.Reset Request 2.Execute resetVM void panic (const char *fmt, …)  hypercall HYPERVISOR_sched_op(SCHEDOP_ shutdown, &r); void panic (const char *fmt, …)  machine_restart Set timeout/ notification period Set timeout/ notification period Dom0 DomU
  11. 11. 10 / 21 Fault Recovery (2/3) First Sub-headingConventional VM Restart vs Our VM Reset VM Restart VM Reset Destroy a domain Create a new domain Free/Allocate all resources Use hypercalls for destroy domain and create domain in xen - XEN_DOMCTL_destroydomain - XEN_DOMCTL_createdomain Destroy devices only Just reuse almost all resources like vcpu structure, shared_info, p2m table, etc Reset the existing domain Add a new hypercall for reset domain in xen - XEN_DOMCTL_samsung_reset
  12. 12. 11 / 21 Fault Recovery (3/3) First Sub-heading XEN Dom0 (Control domain) XL DomU XS domain_reset - Reset grant table - Reset vcpu, vgic, vtimer - Reset shared_info page - Reset p2m table - Reset event channel send_global_virq(VIRQ_DOM_EXC); Release Domain Reset flags setting libxl_domain_reset 1. xs_release_domain 2. Destroy devices fire Hypercall: XEN_DOMCTL_unpausedomain DomU unpause Relaunching complete ② ③ ④ ⑤ domain_shutdown (Xen Shutdown Manager) - Set reason flags for shutdown - Pause vcpus initiate_domain_create - Reload kernel binary - Reload device tree from memory - Reset CPU registers - xs_introduce_domain Hypercall: XEN_DOMCTL_samsung_reset ⑥ ⑦ devices_destroy_cb - Request domain reset Xen Watchdog Timeout ① VM Reset Procedure Hypercall: HYPERVISOR_sched _op(SCHEDOP_shut down,&r); Crash ①
  13. 13. 12 / 21 I/O Virtualization First Sub-heading DomU (IVI OS)Dom0 (Cluster OS) Dom0 (Cluster OS) DomU (IVI OS) PV I/O Drivers Passthrough I/O Drivers virtual clocksources, regulators and pintcontrollers are provided to DomU for device passthrough Xen Pass-thru Device Drivers vClock FE vRegulator FE vPinctrl FE vNet, vStorage, vConsole : utilized existing open source codes vInput: provides touch vCamera, vAudio : developed from scratch Performance enhancement with direct cache operations on foreign pages vNet BE vBlock BE vCamera BE vAudio BE vConsole BE vInput BE vNet FE vBlock FE vCamera FE (as a V4L2 driver) vAudio FE (as an ALSA driver) vConsole FE vInput FE image Native Device Drivers (Pinctrl, Regulator, Clocksource) vClock BE vRegulator BE vPinctrl BE I/O device Xen I/O device
  14. 14. 13 / 21 Graphics Virtualization (1/3) First Sub-headingOverall Architecture of Graphics Virtualization Xen Dom0 (Cluster OS) DomU (IVI OS) Application Display Server SVGL FE (Library) Kernel vGPU FE Graphics Control Manager (GCM) FE Virtual OpenGL ES FE Virtual DRI (GBM, Wayland Buf) Virtual GPU Command Buffer Native Display Server Kernel Native Graphics Library Native Graphics Driver vGPU BE Samsung Virtual Display Manager (SVDM) Client (Native Window) Samsung Virtual Graphics Library(SVGL) BE Virtual EGL FE GL Render Virtual EGL BE Virtual OpenGL ES BE RenderThread ③ ④⑤ ⑥ ① ②⑧ ⑨ Encoded Graphics Data to Shared Memory Graphics APIsVirtual Display GLES APIs and Data EGL APIs GCM BE eglCreateWindowSurface [eglCreatePbufferSurface] ⑦ ②
  15. 15. 14 / 21 Graphics Virtualization (2/3) DomU Kernel vGPU FE Key Features Adaptive Interrupt/Poll Processing Key Features and Performance Optimization Latest Graphics APIs Support : OpenGL ES v3.2/EGL v1.5 Support Shared Memory Between BE and FE With Simple Protocol Batch Processing for Transferring APIs Selective APIs Transfer to BE Adaptive Interrupt/Poll Processing in FE Efficient Graphics Buffer Management Dynamically Choosing Either Interrupt or Poll Based on Request Rate Between SVGL FE and vGPU FE Recalculate “the rate” at Every 250 ms Performance Improvement Over Interrupt Method with the Threshold “0.3” in Policy [75 cmd buf flush OPs/250ms] Xen SVGL FE Dom0 Kernel Graphics Native Library SVDM Application Virtual GPU Command Buffer interrupt GCM Monitor Interrupt/ Poll Read Graphics APIs & Data Interrupt or Poll Rate = # of Buffer Flush OPs/Time Policy Threshold: 0.3  # if 0.3  the calculated rate then Polling else Interrupt
  16. 16. 15 / 21 Graphics Virtualization (3/3) First Sub-heading DomU Kernel vGPU Graphics Command Buffer Transfer Variable Size Buffer Allocation Efficient Graphics Command Buffer Management Accumulate each Opcode of API whose return type is “void” and Data to the Buffer Flush Buffer for “glFlush”, “glFinish” and “non- void return type APIs”. Flush Buffer in the Case of “Buffer Full” Xen SVGL FE DomU Kernel Dom0 Kernel Native Graphics Library SVDM Application Virtual GPU Command Buffer … SVGL FE Dom0 Kernel Native Graphics Library SVDM Application Virtual GPU CMD Buffer Xen 30 M 30 M EGL/GLES API add/remove memchunk Send event for memory alloc/realloc Map/ Unmap Shared Memory 30 M 30 M glFlush, glFinish, Non void return type APIs Void Return type APIs Store opcode and Data Send accumulated APIs Any type of APIs Buffer is full Dynamically Resizable Buffer Management : Initial Buffer Size (30MB) Expand Buffer Pool Based on Available Buffer Size and Requested Size from SVGL FE Shrink Buffer When App is Terminated Foreignmap Buffer Pool Manager Buffer Allocator
  17. 17. 16 / 21 Evaluation: Startup Performance First Sub-heading 144 103 Base Line Reduce Memory Initialization Improve DTB creation Bigcore Assining Time Measurement Unit : msec Xen Startup Time Changes Time Measurement for Sections Measured Time: start_xen() ~ init_done() [xen/arch/setup.c] 10 471 313 685 64 16 1 Original ver Samsung ver Page Scrubbing Domain creation Heap & Hardware Initialization Scanning DTB 10 471 685 313 1 9 59 1479 69 1,479 msec 69 msec
  18. 18. 17 / 21 Evaluation: VM Fault Recovery Time First Sub-headingVM Restart Time vs. VM Reset Time (sec) 7.1 0.97 VM Restart VM Reset Time Measurement After Optimization libxl_domain_unpause function [in xl_cmdimpl.c] libxl_domain_unpause function [in xl_cmdimpl.c] xl reset xl restart
  19. 19. 18 / 21 Evaluation: I/O Performance First Sub-heading Storage Throughput Storage and Network Throughput Dom0 216MB/s (94.7%) 228MB/s DomU Network Throughput Dom0 129Mbits/s (92.8%) 139Mbits/s DomUDom0 132MB/s (98.5%) 134MB/s DomU Seq. Read Seq. Write
  20. 20. 19 / 21 Evaluation: Khronos CTS First Sub-headingNumber of Passed Test Case by Khronos CTS 1228 2991 1392 2511 261 1228 2991 1392 2511 261 OpenGL ES 2.0 OpenGL ES 3.0 OpenGL ES 3.1 OpenGL ES 3.2 OpenGL ES EXT Dom0 DomU Number of Passed TC Total # of Passed TC: 8,383
  21. 21. 20 / 21 Evaluation: Graphics Performance Glmark2-es-wayland 1920x1080 Off-screen 660 662 663 662 662 661.8 639 638 636 640 639 638.4 1st 2nd 3rd 4th 5th Average Dom0 DomU Score 96.8% 96.4% 95.9% 96.7% 96.5%96.5% 1st 2nd 3rd 4th 5th Average 1st 2nd 3rd 4th 5th Average Ratio
  22. 22. 21 / 21 Demo System Environment - Exynos 8890 Octacore (Big core: 2.28GHz, Little: 1.58GHz) - 6 GB DRAM - 32 GB MMC - Mali T880 MP12 Xen (v4.8) Cluster OS (Dashboard UI) IVI OS (Genivi Demo Platform) Exynos 8890
  23. 23. Thank you 22