
XPDDS18: The Art of Virtualizing Cache Maintenance - Julien Grall, Arm


The Arm architecture allows for a wide variety of cache configurations, levels and features. This enables building systems that will optimally fit power/area budgets set for the target application.

A consequence of this is that architecturally compliant software has to cater for a much wider range of behaviors than on other architectures. Most software uses cache instructions that need no special treatment in a virtualized environment, but some software wants to manage a given cache directly using set/way instructions, and handling those instructions is a challenge for the hypervisor.

This talk will give an overview of how caches behave in the Arm architecture, especially in the context of virtualization. It will then describe the problem of using set/way instructions in a virtualized environment. We will also discuss the modifications required in Xen to handle those instructions.

Published in: Technology


  1. The art of virtualizing cache. Julien Grall <julien.grall@arm.com>. Xen Developer Summit 2018. © 2018 Arm Limited
  2. Cache coherency on Arm: Cache coherent architecture. Scales from single CPU to massive SMP systems. Implementer chooses to offer caches that are visible to software, invisible to software, or any point between these two options. Enough abstraction to cope with these differences. Allows different PPA (Performance, Power, Area) points: Running a VM on your smart watch? Easy. The same VM on your $15K server? Sure. The architecture is designed for maximum flexibility.
  3. Cache architecture: (Modified) Harvard architecture. Multiple levels of caching (with snooping). Separate I-cache and D-cache (no snooping between I and D). Either PIPT or non-aliasing VIPT for D-cache. Meeting at the Point of Unification (PoU). Controlled by attributes in the page tables: memory type (normal, device), cacheability, shareability. Two Enable bits (I and C): actually not really an Enable switch, more like a global "attribute override". Generally invisible to normal software, with a few key exceptions. An example is executable code loading / generation.
  4. Interacting with caches: The Arm architecture offers the usual (mostly) privileged operations to interact with caches: Invalidate (I & D-cache), Clean (D-cache), Clean + Invalidate (D-cache). Cache maintenance by Virtual Address. Cache maintenance by Set/Way.
  5. Interacting with caches: The Arm architecture offers the usual (mostly) privileged operations to interact with caches: Invalidate (I & D-cache), Clean (D-cache), Clean + Invalidate (D-cache). Cache maintenance by Virtual Address. Cache maintenance by Set/Way. Set/Way operations are local to a CPU: they will break if more than one CPU is active. No ALL operation on the D side, so software iterates over Sets/Ways. Only for bring-up/shutdown of a CPU. Not all the levels have to implement Set/Way. System caches only know about VA operations. Set/Way operations are impossible to virtualize. VA operations are the only way to perform cache maintenance outside of CPU bring-up/teardown.
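As an illustration of "iteration over Sets/Ways", here is a minimal Python sketch that enumerates the operand values a set/way loop would issue for one cache level. The function name and the example geometry are hypothetical; the bit layout (way in the top bits, set above the line-offset bits, level in bits [3:1]) is stated as an assumption about the architected encoding and should be checked against the Arm ARM before being relied on.

```python
def set_way_operands(num_sets, num_ways, line_bytes, level):
    """Yield one 32-bit set/way operand per (way, set) pair.

    Assumed layout: way in the top bits, set just above the line-offset
    bits, (level - 1) in bits [3:1]. Geometry values are powers of two.
    """
    line_shift = (line_bytes - 1).bit_length()   # log2(line size in bytes)
    way_bits = (num_ways - 1).bit_length()       # ceil(log2(number of ways))
    way_shift = 32 - way_bits
    for way in range(num_ways):
        way_field = (way << way_shift) & 0xFFFFFFFF
        for set_index in range(num_sets):
            yield way_field | (set_index << line_shift) | ((level - 1) << 1)


# Hypothetical 32 KiB L1 data cache: 128 sets, 4 ways, 64-byte lines.
ops = list(set_way_operands(num_sets=128, num_ways=4, line_bytes=64, level=1))
print(len(ops))  # 512 operations, one per (way, set) pair
```

Each operand names a physical slot in one CPU's cache rather than an address, which is why the loop says nothing about other CPUs or about a system cache.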
  6. Introducing Stage-2 translation: Virtual machines add their share of complexity: a second stage of page tables (equivalent to EPT on x86) and a second set of memory attributes. Xen always configures RAM cacheable at Stage-2. These memory attributes get combined with those controlled by the guest: the strongest memory type wins (Device vs Normal memory), and the least cacheable memory attribute wins (Non-cacheable is always enforced). And the hypervisor doesn't have much control over it: some global controls, but nothing fine grained.
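The combining rule on this slide can be sketched as a small Python model. This is a simplification under stated assumptions: the names (`Cacheability`, `combine_attributes`) are hypothetical, Device sub-types and shareability are ignored, and cacheability is reduced to three ordered levels.

```python
from enum import IntEnum


class Cacheability(IntEnum):
    # Ordered from least to most cacheable so min() picks the winner.
    NON_CACHEABLE = 0
    WRITE_THROUGH = 1
    WRITE_BACK = 2


DEVICE = "device"
NORMAL = "normal"


def combine_attributes(stage1, stage2):
    """Combine (memory type, cacheability) pairs from the two stages."""
    type1, cache1 = stage1
    type2, cache2 = stage2
    # The strongest memory type wins: Device beats Normal.
    if DEVICE in (type1, type2):
        return (DEVICE, Cacheability.NON_CACHEABLE)
    # The least cacheable attribute wins.
    return (NORMAL, min(cache1, cache2))


# Xen maps RAM as Normal write-back at Stage-2 (per the slide).
xen_ram = (NORMAL, Cacheability.WRITE_BACK)
guest_uncached = (NORMAL, Cacheability.NON_CACHEABLE)
print(combine_attributes(guest_uncached, xen_ram))
```

In this model a guest that picks Non-cacheable always gets Non-cacheable, whatever Stage-2 says, which is the lack of hypervisor control the slide refers to.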
  7. Linux 32-bit boot example: Booting a 32-bit guest on a 64-bit host (with an L3 system cache). The (compressed) kernel is in RAM. The embedded decompressor: enables the caches, decompresses the image, turns the cache off, flushes it by Set/Way, and jumps to the payload... What could possibly go wrong?
  8. Linux 32-bit boot example: Booting a 32-bit guest on a 64-bit host (with an L3 system cache). The (compressed) kernel is in RAM. The embedded decompressor: enables the caches, decompresses the image, turns the cache off, flushes it by Set/Way, and jumps to the payload... What could possibly go wrong? System caches do not implement Set/Way ops. So our guest code sits in L3, while fetching from RAM.
  9. Set/Way in virtualized environment: The guest cannot directly use set/way because of: the presence of system caches on Arm64; the vCPU can be migrated to another pCPU at any time; the new pCPU cache may not be cleaned. How can we solve this?
  10. Set/Way in virtualized environment: The guest cannot directly use set/way because of: the presence of system caches on Arm64; the vCPU can be migrated to another pCPU at any time; the new pCPU cache may not be cleaned. How can we solve this? We need to trap these ops and convert them into VA ops. Which means iterating over all the mapped pages. Good thing we're only doing that at boot time!
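To give a feel for what "iterating over all the mapped pages" costs, here is a rough Python sketch that lists the per-line addresses a clean-by-address loop would visit. It is only a model: the 4 KiB page size, the 64-byte line size and the function names are assumptions, and real code would issue a cache maintenance instruction on a hypervisor mapping of each page rather than yield guest frame addresses.

```python
PAGE_SIZE = 4096   # assumed page granule
LINE_SIZE = 64     # assumed cache line size in bytes


def lines_to_clean(mapped_pages):
    """Yield the address of every cache line in the given guest frames."""
    for gfn in mapped_pages:
        base = gfn * PAGE_SIZE
        for offset in range(0, PAGE_SIZE, LINE_SIZE):
            yield base + offset


def line_ops_for_guest(ram_bytes):
    """Number of per-line clean operations for a fully mapped guest."""
    return ram_bytes // LINE_SIZE


print(len(list(lines_to_clean([2]))))   # 64 line operations for one page
print(line_ops_for_guest(1 << 30))      # 16777216 for a 1 GiB guest
```

Under these assumptions a single emulated flush of a 1 GiB guest is roughly 16.8 million line operations, which is why doing it only at boot time matters.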
  11. Implementation of Set/Way in Xen
  12. Xen and Set/Way today: Set/Way instructions are not trapped. The guest is directly acting on the cache. Potential cause of a heisenbug in Osstest: https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html All guests using Set/Way are unsafe on Xen: Linux 32-bit, UEFI, ...
  13. Cleaning guest memory: We need to iterate over each mapped page and clean it. Any problems?
  14. Cleaning guest memory: We need to iterate over each mapped page and clean it. Any problems? Guest memory is always mapped. Lots of pages to clean. 32-bit Linux is using Set/Way during CPU bring-up. Bring-up is bound by a timeout. Pages are cleaned when first assigned to the guest.
  15. Cleaning guest memory: We need to iterate over each mapped page and clean it. Any problems? Guest memory is always mapped. Lots of pages to clean. 32-bit Linux is using Set/Way during CPU bring-up. Bring-up is bound by a timeout. Pages are cleaned when first assigned to the guest. We need to clean only pages used since the last flush.
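One way to picture "clean only pages used since the last flush" is a tracker that remembers which guest frames were touched and forgets them once cleaned. The Python sketch below is an assumption-heavy toy: the class and method names are hypothetical, and the slides do not say how Xen would detect that a page has been used, so `mark_used` simply stands in for whatever mechanism reports it.

```python
class UsedPageTracker:
    """Toy model: remember guest frames used since the last flush."""

    def __init__(self):
        self._used = set()

    def mark_used(self, gfn):
        # Stand-in for whatever mechanism reports that a page was used.
        self._used.add(gfn)

    def flush(self, clean_page):
        """Clean every tracked page once, then reset the tracking."""
        pages = sorted(self._used)
        for gfn in pages:
            clean_page(gfn)
        self._used.clear()
        return len(pages)


tracker = UsedPageTracker()
cleaned = []
for gfn in (7, 3, 7):
    tracker.mark_used(gfn)
first = tracker.flush(cleaned.append)    # cleans frames 3 and 7
second = tracker.flush(cleaned.append)   # nothing used since the last flush
print(first, second, cleaned)
```

The second flush does no work, which is the property that would keep repeated flushes during CPU bring-up within the timeout.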
  16. Trapping Set/Way instructions: Set/Way instructions usually happen in a batch of instructions, before turning caches on/off. A potential approach to trapping would be: On the first Set/Way instruction: enable trapping of VM instructions (e.g. HCR_EL2.TVM) and do a full clean of the guest memory. Subsequent Set/Way instructions will be ignored until the cache is toggled. On cache toggling: do a full clean of the guest memory and turn off trapping of VM instructions.
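The approach on this slide amounts to a two-state machine per guest, sketched below in Python. This is a toy model, not Xen code: the class, its methods and the `clean_guest_memory` callback are hypothetical, a boolean stands in for the HCR_EL2.TVM bit, and it assumes set/way instructions are already configured to trap to the hypervisor.

```python
class SetWayEmulation:
    """Toy model of the trap-and-clean scheme for one guest."""

    def __init__(self, clean_guest_memory):
        self._clean = clean_guest_memory   # stands in for a full clean by VA
        self.trap_vm_regs = False          # models the HCR_EL2.TVM bit

    def on_set_way(self):
        """Handle a trapped set/way instruction."""
        if self.trap_vm_regs:
            return                         # batch already handled: ignore
        self.trap_vm_regs = True           # start watching for cache toggling
        self._clean()

    def on_cache_toggle(self):
        """Handle a trapped write that turns the guest caches on or off."""
        if not self.trap_vm_regs:
            return
        self._clean()
        self.trap_vm_regs = False          # stop trapping until next set/way


cleans = []
emu = SetWayEmulation(lambda: cleans.append("full clean"))
for _ in range(512):        # a guest flushing a whole cache level by set/way
    emu.on_set_way()
emu.on_cache_toggle()
print(len(cleans))          # 2: one on the first set/way, one on the toggle
```

The point of ignoring the rest of the batch is that hundreds of trapped instructions cost two full cleans rather than one clean each.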
  17. Current status: An approach was discussed on xen-devel in December 2017: https://lists.xen.org/archives/html/xen-devel/2017-12/msg00328.html A PoC based on the feedback was written. Sharing page tables is not possible with the approach. More details will be posted on xen-devel.
  18. Conclusion: Caches are not just a "make it faster" block slapped on the side of the CPU. They are an essential part of the coherency protocol. Using uncached memory explicitly bypasses it. It looks logical to cope with the consequences. No magic involved! Following the architecture rules ensures correctness on all implementations. RTFAA (Read The Fabulous ARM ARM, almost 7000 pages - and counting).
  19. Questions?
  20. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
