SlideShare a Scribd company logo
1 of 21
Download to read offline
Evaluation of X32 ABI for
            Virtualization and Cloud

     Jun Nakajima

     Intel Corporation


                                        1

Wednesday, August 29, 12                    1
Agenda

      • What is X32 ABI?
      • Benefits for Virtualization & Cloud
      • Evaluation of X32
      • Summary




                              2

Wednesday, August 29, 12                      2
x32 ABI Basics
   • x86-64 ABI but with 32-bit longs and pointers
   • 64-bit arithmetic
   • Fully utilize modern x86
   • 8 additional integer registers (16 total)
   • 8 additional SSE registers
   • SSE for FP math
   • 64-bit kernel is required
      - Linux kernel 3.4 has support for x32

    Same memory footprint as x86 with advantages of x86-64.
             No hardware changes are required.
                                     3

Wednesday, August 29, 12                                      3
ABI Comparison


                              i386          x86-64       x32

Integer registers              6              15          15

FP registers                   8              16          16

Pointers                     4 bytes        8 bytes     4 bytes

64-bit arithmetic              No            Yes         Yes

Floating point                x87            SSE         SSE

Calling convention          Memory         Registers   Registers

PIC prologue                2-3 insn        None        None
                                       4

Wednesday, August 29, 12                                           4
The x32 Performance Advantage
   • In the order of Expected Contribution:
         -   Efficient Position Independent Code
         -   Efficient Function Parameter Passing
         -   Efficient 64-bit Arithmetic
         -   Efficient Floating Point Operations




          X32 is expected to give a 10-20% performance boost
                               for C, C++


                                          5



Wednesday, August 29, 12                                       5
Efficient Position Independent Code

       extern int x, y, z;
       void foo () { z = x * y; }

 i386 psABI                                    x32 psABI
       call __i686.get_pc_thunk.cx                movl     x@GOTPCREL(%rip), %edx
       addl $_GLOBAL_OFFSET_TABLE_, %ecx          movl     y@GOTPCREL(%rip), %eax
       movl!y@GOT(%ecx), %eax                     movl     (%rax), %rax
       movl!x@GOT(%ecx), %edx                     imull     (%rdx), %rax
       movl!(%eax), %eax                          movl     z@GOTPCREL(%rip), %edx
       imull!        (%edx), %eax                 movl     %rax, (%rdx)
       movl!z@GOT(%ecx), %edx                     ret
       movl!%eax, (%edx)
       ret

 __i686.get_pc_thunk.cx:
     movl!(%esp), %ecx
     ret




                           X32 PIC code is shorter and faster
                                           6

Wednesday, August 29, 12                                                            6
Efficient Function Parameter Passing


         void bar (int x, int y, int z);
         void foo (int x, int y, int z) { bar (y, z, x); }


     i386 psABI                               x32 psABI
          subl       $28, %esp                   subl     $8, %esp
          movl       32(%esp), %eax              movl     %edi, %eax
          movl       %eax, 8(%esp)               movl     %esi, %edi
          movl       40(%esp), %eax              movl     %edx, %esi
          movl       %eax, 4(%esp)               movl     %eax, %edx
          movl       36(%esp), %eax              call     bar
          movl       %eax, (%esp)                addl     $8, %esp
          call       bar                         ret
          addl       $28, %esp
          ret




                           X32 passes parameters in registers
                                          7

Wednesday, August 29, 12                                               7
Efficient 64-bit Integer Arithmetic
         extern long long x, y, z;

         void foo () {      z = x * y; }


    i386 psABI                                 x32 psABI
     !    movl!
              x+4, %edx                        !   movq!
                                                       x(%rip), %rax
    !     movl!
              y+4, %eax                        !   imulq!  y(%rip), %rax
    !     imull!  y, %edx                      !   movq!
                                                       %rax, z(%rip)
    !     imull!  x, %eax                      !   ret
    !     leal!
              (%edx,%eax), %ecx
    !     movl! %eax
              y,
    !     mull!
              x
    !     addl!
              %ecx, %edx
    !     movl!
              %eax, z
    !     movl!
              %edx, z+4
    !     ret



             X32 provides very efficient 64-bit integer support
                    (3 instructions vs. 10 instructions).
                                           8

Wednesday, August 29, 12                                                   8
Efficient Floating Point Operation


        extern double bar;
        float foo () { return bar * bar; }


     i386 psABI                               x32 psABI
           subl       $12, %esp                  movsd    bar(%rip), %xmm0
           movsd      bar, %xmm0                 mulsd    %xmm0, %xmm0
           mulsd      %xmm0, %xmm0               ret
           movsd      %xmm0, (%esp)
           fldl       (%esp)
           addl       $12, %esp
           ret




                       X32 doesn’t use X87 to return FP values
                                          9

Wednesday, August 29, 12                                                     9
Agenda

      • What is X32 ABI?
      • Benefits for Virtualization & Cloud
      • Evaluation of X32
      • Summary




                             10

Wednesday, August 29, 12                      10
Characteristics of Virtualization
      • TLB misses in guests can be more
        expensive
            - Need to walk through both guest page tables
              and HAP (Hardware Assisted Paging) page
              tables
      • Utilization of cache can be higher
        because of additional components
            - Hypervisor, driver domains
            - Other VMs


                                 11

Wednesday, August 29, 12                                    11
Advantage of x32 in Virtualization

      • Over x86
            - Generic performance advantages of x32
            - Better cache utilization
                  • fewer instructions, passing arguments in registers

      • Over x86-64
            - Efficient guest page table structures
                  • Only 4GB virtual address space
                           – Single PML4 and 4 PDP directory entries in a row
            - Better cache utilization
                  • 4-byte pointers

                                                12

Wednesday, August 29, 12                                                        12
Advantage of x32 in Cloud

      • Performance/memory is crucial in Cloud
            - 32-bit address space is still sufficient for
              many apps
            - Use memory for data, not for 8-byte pointers
      • Use unified (64-bit) kernel
            - Use x32 ABI for 32-bit apps
            - Traditional x86 and x86-64 apps can run as
              well
           Add x32 support to the list of OS images for Cloud

                                   13

Wednesday, August 29, 12                                        13
How should we use x32 in Xen?

      • x32 apps in HVM Linux
            - Xen doesn’t need to know about x32
      • x32 apps in PV Linux
            - Performance with x32 can regress because
              of 64-bit PV issues
            - Should be run in HVM container
      • x32 PV Linux
            - 32-bit PV Linux with more registers used
            - Practically insignificant
                           Run x32 apps in HVM
                                    14

Wednesday, August 29, 12                                 14
Agenda

      • What is X32 ABI?
      • Benefits for Virtualization & Cloud
      • Evaluation of X32
      • Summary




                             15

Wednesday, August 29, 12                      15
Configurations

      • VM
            - HVM (Linux) with 1GB/4GM memory
            - VCPUs (1, 2)
            - i386, x32, x86-64 apps
      • Simple Web Applications
            - mysql-server (5.1.52) (run remotely, 64-bit)
            - apache (2.2.22), php (5.4.4), varnish
               • Compiled for i386, x32, x86-64 ABI



                                  16

Wednesday, August 29, 12                                     16
Simple Web Apps (1)

   $ ab -c $concurrency -n $requests http://$guestip/drupal/
   index.php
             1GB, 2 VCPUs                                                   1GB, 1 VCPU
                                   Request (/sec.) (Higher is better)
250"                                                        8250#


200"                                                        8200#


150"                                                        8150#
                                             i386"                                            i386#
                                             x86_64"        8100#                             x86_64#
100"
                                             x32"                                             x32#
                                                            8050#
 50"

                                                            8000#
  0"                                                                           #
           100/1000"         10/1000"   Concurrency/Requests              1000/100000#


       No Varnish Enable X-Drupal-Cache                       Enable Varnish Enable X-Drupal-Cache

                                                       17

Wednesday, August 29, 12                                                                              17
Simple Web Apps (2)
         $ ab -c $concurrency -n $requests http://$guestip/drupal/
         index.php


      Request (/sec.) (Higher is better)                                        4GB, 2 VCPUs

                           230.00%
                           225.00%
                           220.00%
                           215.00%
                           210.00%
                           205.00%                                    x86_64%

                           200.00%                                    x32%
                           195.00%
                           190.00%
                           185.00%
                           180.00%
                                          100/1000%        10/1000%

                                            Concurrency/Requests
                                     No Varnish Enable X-Drupal-Cache

                                                      18

Wednesday, August 29, 12                                                                       18
Issues Found

      • Some apps are not x32 clean
            - Code assumes 64-bit long and pointers using
              __amd64__ or __x86_64__
            - Assembly code that assumes 64-bit long and
              pointers




                                19

Wednesday, August 29, 12                                    19
Agenda

      • What is X32 ABI?
      • Benefits for Virtualization & Cloud
      • Evaluation of X32
      • Summary




                             20

Wednesday, August 29, 12                      20
Summary

      • Generic advantages of x32 apps
            - See for details and status of x32 ABI:
                  • https://sites.google.com/site/x32abi/

      • Virtualization and Cloud can take more
        advantage of x32 apps
      • Preliminary results look promising
            - x32 was always better than i386 or x86-64
              with simple Web apps*
      • Build and test more Web apps
    *: Intel internal measurements       21

Wednesday, August 29, 12                                    21

More Related Content

Similar to Evaluation of X32 ABI for Virtualization and Cloud

Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
FFRI, Inc.
 
Minko stage3d workshop_20130525
Minko stage3d workshop_20130525Minko stage3d workshop_20130525
Minko stage3d workshop_20130525
Minko3D
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics Hardware
Daniel Blezek
 
Potapenko, vyukov forewarned is forearmed. a san and tsan
Potapenko, vyukov   forewarned is forearmed. a san and tsanPotapenko, vyukov   forewarned is forearmed. a san and tsan
Potapenko, vyukov forewarned is forearmed. a san and tsan
DefconRussia
 

Similar to Evaluation of X32 ABI for Virtualization and Cloud (20)

Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis
 
[HITB Malaysia 2011] Exploit Automation
[HITB Malaysia 2011] Exploit Automation[HITB Malaysia 2011] Exploit Automation
[HITB Malaysia 2011] Exploit Automation
 
[Kiwicon 2011] Post Memory Corruption Memory Analysis
[Kiwicon 2011] Post Memory Corruption Memory Analysis[Kiwicon 2011] Post Memory Corruption Memory Analysis
[Kiwicon 2011] Post Memory Corruption Memory Analysis
 
Qemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System EmulationQemu JIT Code Generator and System Emulation
Qemu JIT Code Generator and System Emulation
 
Extreme dxt compression
Extreme dxt compressionExtreme dxt compression
Extreme dxt compression
 
Minko stage3d workshop_20130525
Minko stage3d workshop_20130525Minko stage3d workshop_20130525
Minko stage3d workshop_20130525
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
Vectorization in ATLAS
Vectorization in ATLASVectorization in ATLAS
Vectorization in ATLAS
 
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John MelonakosPT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics Hardware
 
[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis[CCC-28c3] Post Memory Corruption Memory Analysis
[CCC-28c3] Post Memory Corruption Memory Analysis
 
Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
 
Potapenko, vyukov forewarned is forearmed. a san and tsan
Potapenko, vyukov   forewarned is forearmed. a san and tsanPotapenko, vyukov   forewarned is forearmed. a san and tsan
Potapenko, vyukov forewarned is forearmed. a san and tsan
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 

More from The Linux Foundation

More from The Linux Foundation (20)

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made Simple
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather Report
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
 
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, XilinxXPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
XPDDS19 Keynote: Xen Dom0-less - Stefano Stabellini, Principal Engineer, Xilinx
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Evaluation of X32 ABI for Virtualization and Cloud

  • 1. Evaluation of X32 ABI for Virtualization and Cloud Jun Nakajima Intel Corporation 1 Wednesday, August 29, 12 1
  • 2. Agenda • What is X32 ABI? • Benefits for Virtualization & Cloud • Evaluation of X32 • Summary 2 Wednesday, August 29, 12 2
  • 3. x32 ABI Basics • x86-64 ABI but with 32-bit longs and pointers • 64-bit arithmetic • Fully utilize modern x86 • 8 additional integer registers (16 total) • 8 additional SSE registers • SSE for FP math • 64-bit kernel is required - Linux kernel 3.4 has support for x32 Same memory footprint as x86 with advantages of x86-64. No hardware changes are required. 3 Wednesday, August 29, 12 3
  • 4. ABI Comparison i386 x86-64 x32 Integer registers 6 15 15 FP registers 8 16 16 Pointers 4 bytes 8 bytes 4 bytes 64-bit arithmetic No Yes Yes Floating point x87 SSE SSE Calling convention Memory Registers Registers PIC prologue 2-3 insn None None 4 Wednesday, August 29, 12 4
  • 5. The x32 Performance Advantage • In the order of Expected Contribution: - Efficient Position Independent Code - Efficient Function Parameter Passing - Efficient 64-bit Arithmetic - Efficient Floating Point Operations X32 is expected to give a 10-20% performance boost for C, C++ 5 Wednesday, August 29, 12 5
  • 6. Efficient Position Independent Code extern int x, y, z; void foo () { z = x * y; } i386 psABI x32 psABI call __i686.get_pc_thunk.cx movl x@GOTPCREL(%rip), %edx addl $_GLOBAL_OFFSET_TABLE_, %ecx movl y@GOTPCREL(%rip), %eax movl!y@GOT(%ecx), %eax movl (%rax), %rax movl!x@GOT(%ecx), %edx imull (%rdx), %rax movl!(%eax), %eax movl z@GOTPCREL(%rip), %edx imull! (%edx), %eax movl %rax, (%rdx) movl!z@GOT(%ecx), %edx ret movl!%eax, (%edx) ret __i686.get_pc_thunk.cx: movl!(%esp), %ecx ret X32 PIC code is shorter and faster 6 Wednesday, August 29, 12 6
  • 7. Efficient Function Parameter Passing void bar (int x, int y, int z); void foo (int x, int y, int z) { bar (y, z, x); } i386 psABI x32 psABI subl $28, %esp subl $8, %esp movl 32(%esp), %eax movl %edi, %eax movl %eax, 8(%esp) movl %esi, %edi movl 40(%esp), %eax movl %edx, %esi movl %eax, 4(%esp) movl %eax, %edx movl 36(%esp), %eax call bar movl %eax, (%esp) addl $8, %esp call bar ret addl $28, %esp ret X32 passes parameters in registers 7 Wednesday, August 29, 12 7
  • 8. Efficient 64-bit Integer Arithmetic extern long long x, y, z; void foo () { z = x * y; } i386 psABI x32 psABI ! movl! x+4, %edx ! movq! x(%rip), %rax ! movl! y+4, %eax ! imulq! y(%rip), %rax ! imull! y, %edx ! movq! %rax, z(%rip) ! imull! x, %eax ! ret ! leal! (%edx,%eax), %ecx ! movl! %eax y, ! mull! x ! addl! %ecx, %edx ! movl! %eax, z ! movl! %edx, z+4 ! ret X32 provides very efficient 64-bit integer support (3 instructions vs. 10 instructions). 8 Wednesday, August 29, 12 8
  • 9. Efficient Floating Point Operation extern double bar; float foo () { return bar * bar; } i386 psABI x32 psABI subl $12, %esp movsd bar(%rip), %xmm0 movsd bar, %xmm0 mulsd %xmm0, %xmm0 mulsd %xmm0, %xmm0 ret movsd %xmm0, (%esp) fldl (%esp) addl $12, %esp ret X32 doesn’t use X87 to return FP values 9 Wednesday, August 29, 12 9
  • 10. Agenda • What is X32 ABI? • Benefits for Virtualization & Cloud • Evaluation of X32 • Summary 10 Wednesday, August 29, 12 10
  • 11. Characteristics of Virtualization • TLB misses in guests can be more expensive - Need to walk through both guest page tables and HAP (Hardware Assisted Paging) page tables • Utilization of cache can be higher because of additional components - Hypervisor, driver domains - Other VMs 11 Wednesday, August 29, 12 11
  • 12. Advantage of x32 in Virtualization • Over x86 - Generic performance advantages of x32 - Better cache utilization • fewer instructions, passing arguments in registers • Over x86-64 - Efficient guest page table structures • Only 4GB virtual address space – Single PML4 and 4 PDP directory entries in a row - Better cache utilization • 4-byte pointers 12 Wednesday, August 29, 12 12
  • 13. Advantage of x32 in Cloud • Performance/memory is crucial in Cloud - 32-bit address space is still sufficient for many apps - Use memory for data, not for 8-byte pointers • Use unified (64-bit) kernel - Use x32 ABI for 32-bit apps - Traditional x86 and x86-64 apps can run as well Add x32 support to the list of OS images for Cloud 13 Wednesday, August 29, 12 13
  • 14. How should we use x32 in Xen? • x32 apps in HVM Linux - Xen doesn’t need to know about x32 • x32 apps in PV Linux - Performance with x32 can regress because of 64-bit PV issues - Should be run in HVM container • x32 PV Linux - 32-bit PV Linux with more registers used - Practically insignificant Run x32 apps in HVM 14 Wednesday, August 29, 12 14
  • 15. Agenda • What is X32 ABI? • Benefits for Virtualization & Cloud • Evaluation of X32 • Summary 15 Wednesday, August 29, 12 15
  • 16. Configurations • VM - HVM (Linux) with 1GB/4GM memory - VCPUs (1, 2) - i386, x32, x86-64 apps • Simple Web Applications - mysql-server (5.1.52) (run remotely, 64-bit) - apache (2.2.22), php (5.4.4), varnish • Compiled for i386, x32, x86-64 ABI 16 Wednesday, August 29, 12 16
  • 17. Simple Web Apps (1) $ ab -c $concurrency -n $requests http://$guestip/drupal/ index.php 1GB, 2 VCPUs 1GB, 1 VCPU Request (/sec.) (Higher is better) 250" 8250# 200" 8200# 150" 8150# i386" i386# x86_64" 8100# x86_64# 100" x32" x32# 8050# 50" 8000# 0" # 100/1000" 10/1000" Concurrency/Requests 1000/100000# No Varnish Enable X-Drupal-Cache Enable Varnish Enable X-Drupal-Cache 17 Wednesday, August 29, 12 17
  • 18. Simple Web Apps (2) $ ab -c $concurrency -n $requests http://$guestip/drupal/ index.php Request (/sec.) (Higher is better) 4GB, 2 VCPUs 230.00% 225.00% 220.00% 215.00% 210.00% 205.00% x86_64% 200.00% x32% 195.00% 190.00% 185.00% 180.00% 100/1000% 10/1000% Concurrency/Requests No Varnish Enable X-Drupal-Cache 18 Wednesday, August 29, 12 18
  • 19. Issues Found • Some apps are not x32 clean - Code assumes 64-bit long and pointers using __amd64__ or __x86_64__ - Assembly code that assumes 64-bit long and pointers 19 Wednesday, August 29, 12 19
  • 20. Agenda • What is X32 ABI? • Benefits for Virtualization & Cloud • Evaluation of X32 • Summary 20 Wednesday, August 29, 12 20
  • 21. Summary • Generic advantages of x32 apps - See for details and status of x32 ABI: • https://sites.google.com/site/x32abi/ • Virtualization and Cloud can take more advantage of x32 apps • Preliminary results look promising - x32 was always better than i386 or x86-64 with simple Web apps* • Build and test more Web apps *: Intel internal measurements 21 Wednesday, August 29, 12 21