VIRTUAL DATABASES?
                               Optimizing for Virtualization




Liz van Dijk - @lizztheblizz - liz@sizingservers.be

Sunday 8 May 2011
THE GOAL



       “Virtualized Databases suck!” - Is this really
       true? Does it have to be?



Sunday 8 May 2011

Databases are supposed to be “hard to virtualize” and to lose performance in a virtual
environment. There is some truth to this: dumping a native database into a virtual
environment without applying any changes can indeed cause issues.
HOW DO WE GET THERE?


       1. Understand just why the virtual environment
          impacts performance, and take the correct steps to
          adapt our database to its new habitat.
       2. Optimize, optimize, optimize...



Sunday 8 May 2011

Action: We have to understand why database performance is affected, and how we can
arm ourselves against this impact.

On the other hand, while there used to be less need for optimization in an environment
where hardware was abundant, a virtual environment leads to contention for resources much
more quickly. It’s important to keep our application as lean as possible without losing
performance. In many cases, performance can be multiplied by taking a closer look at the
database.

Message: Why is this interesting for you? This knowledge could convince you to make the
switch to a virtual environment, trusting it won’t hurt your software’s performance, and it will
help you take a look at your existing infrastructure and take the necessary steps to run your
application as optimally as possible.
THE INFLUENCE OF VIRTUALIZATION



      • All “kernel” activity is more costly:
          • Interrupts
          • System Calls (I/O)
          • Memory page management




Sunday 8 May 2011

So, let’s start with the understanding step: what could potentially slow down because of
virtualization?

The 3 most important aspects are:
Interrupts - An actual piece of hardware is asking for attention from the CPU. Making use of
Jumbo Frames is a very good idea in a virtual environment, because sending the same data
causes fewer interrupts (1500 --> 9000 bytes per frame).
System Calls - A process asks the kernel to do a privileged task, like
accessing certain hardware (network/disk I/O).
Page Management - This is the most important one for databases: think caching. The
database keeps an enormous amount of data in its own caches, so memory is manipulated
constantly. Every time something changes in this memory, the hypervisor has to perform a
double translation: from guest virtual memory through the VM page table to the host physical address.

Usually, this causes the biggest performance hit when switching from native to virtual. We
really have to do everything we can to minimize this problem.
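
As a rough sketch of the jumbo-frames point above (assuming a Linux guest and an interface called eth0; every hop, i.e. guest vNIC, vSwitch, physical switch and storage target, must accept 9000-byte frames):

# Enable jumbo frames on a Linux guest NIC (interface name is an assumption)
ip link set dev eth0 mtu 9000

# Verify the MTU and keep an eye on interrupt counts before/after a transfer
ip link show dev eth0 | grep mtu
grep eth0 /proc/interrupts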
GENERAL OPTIMIZATION STRATEGY


                    Making the right hardware choices

                    Tuning the hypervisor to your database’s needs

                    Tuning the OS to your database’s needs

                    Squeezing every last bit of performance out of your
                    database


Sunday 8 May 2011

Performance issues should be dealt with systematically, and we can split that process up in
these 4 steps.
HARDWARE CHOICES
   • Choosing the right CPUs

             • Intel 5500/7500 and later types
                    (Nehalem) / all AMD quad-core
                    Opterons (HW-assisted/MMU
                    virtualization)

   • Choosing the right NICs (VMDQ)

   • Choosing the right storage system
       (iSCSI vs FC SAN)

Sunday 8 May 2011

CPUs --> HW Virtualization (dom -1) & HAP

Best price/quality at the moment:
Opteron 6000 series is very good at datamining/decision support
Xeon 5600 series is still very good at OLTP


VMDQ = sorting/queueing offloaded to the NIC
CPU EVOLUTION




Sunday 8 May 2011
OVERVIEW NIC - VMDQ / NETQUEUE

  Netqueue Devices                                      Part nr      Speed      Interface
  Intel Ethernet Server Adapter X520-SR2, 2 ports       E10G42BFSR   10Gbps     SR-LC
  Intel Ethernet Server Adapter X520-DA2, 2 ports       E10G42BTDA   10Gbps     SFP+
  Intel Gigabit ET Dual Port Server Adapter, 2 ports    E1G42ET      1Gbps      RJ-45 - Copper
  Intel Gigabit EF Dual Port Server Adapter, 2 ports    E1G42EF      1Gbps      RJ-45 - Fibre
  Intel Gigabit ET Quad Port Server Adapter, 4 ports    E1G44ET      1Gbps      RJ-45 - Copper
  Intel Gigabit CT Desktop Adapter                      EXPI9301CT   1Gbps      RJ-45 - Copper
  Supermicro Add-on Card AOC-SG-I2, 2 ports             AOC-SG-I2    1Gbps      RJ-45 - Copper
  Onboard 82576 (8 Virtual Queues)
  Onboard 82574 - no IOV
  Broadcom's NetXtreme II Ethernet chipset                           1-10Gbps
  All Neterions                                                      1-10Gbps




Sunday 8 May 2011
SAN CHOICES
    • Fibre Channel
         • ESX with FC-HBA
         • vSphere: FC-HBA pass-through to Guest OS                        Server with (hardware) iSCSI = iSCSI Target




     • iSCSI (using 10Gbit if possible)
         • ESX with Hardware Initiator (iSCSI HBA)
         • ESX with Software Initiator
         • Initiator inside the Guest OS
         • vSphere: iSCSI HBA pass-through to Guest OS
                                                       (Virtualization-) server with (hardware) iSCSI
                                                                                        = iSCSI Initiator




Sunday 8 May 2011

10Gbit = high CPU overhead!! We’re talking 24GHz of CPU to fill up 9Gbit/s.
This problem can be reduced by the following technologies:
VT-d ---> Moving DMA and address translation to the NIC
VMDQ/Netqueue ---> Netqueue is pretty much VMware’s implementation of VMDQ
SR-IOV ---> Allowing one physical device (NIC) to present itself as multiple virtual devices.
GENERAL OPTIMIZATION STRATEGY


                    Making the right “hardware” choices

                    Tuning the hypervisor to your database’s needs

                    Tuning the OS to your database’s needs

                    Squeezing every last bit of performance out of your
                    database


Sunday 8 May 2011
VIRTUAL MEMORY
                       Virtual
                                          Page Table    0xA
                      Memory
                                                        0xB

                         1                  1 | 0xD     0xC
                         2                  2 | 0xC     0xD   TLB
                         3                  3 | 0xF     0xE
                                                              1 | 0xD
                         4                  4 | 0xA     0xF
                                                              5 | 0xH
                         5                  5 | 0xH     0xG
                                                              2 | 0xC

                OS
                         6
                         7
                                            6 | 0xG
                                            7 | 0xB
                                                        0xH
                                                                            CPU
                         8
                         9
                                            8 | 0xE
                                                       Mem
                                                               etc.
                         10
                         11
                                              etc.
                         12




                    Managed by software

                    Actual Hardware

Sunday 8 May 2011

CPUs: AMD: all quad-core Opterons
Intel: Xeon 5500, 7500, 5600

Physical memory is divided into segments of 4KB, which software manages as so-called
pages: small chunks, each with its own address, which the CPU uses to find the data in
physical memory.

A piece of software always gets a contiguous block of “virtual” memory assigned to it within
an OS, even though the physical memory is fragmented, to prevent a coding nightmare
(keeping track of every single page address would be madness).

The page table is what the CPU walks to make the necessary translation to physical memory.
The CPU has a hardware cache that keeps track of these entries, the
Translation Lookaside Buffer. This is an extremely fast buffer that stores the most recently
used addresses, so the CPU can avoid walking the page table as much as possible.
SPT VS HAP
                              “Read-only”   “Shadow”
                                                          0xA
                               Page Table   Page Table
                                                          0xB
                                1 | 0xD       1 | 0xG     0xC
                         1
                                5 | 0xH       5 | 0xD     0xD
                         2

              VM A       3      2 | 0xC       2 | 0xF     0xE
                                                          0xF
                         4
                         5
                                  N
                                                A         0xG
                                                          0xH
                                                                            CPU
                         1
                                12 | 0xB

                                10 | 0xE
                                              12 | 0xE
                                              10 | 0xB
                                                         Mem
                         2

              VM B       3      9 | 0xA       9 | 0xC

                         4
                         12       etc.
                                                B


                    Managed by VM OS
                    Managed by hypervisor
                    Actual Hardware

Sunday 8 May 2011

SPT VS HAP
                              “Read-only”
                                               0xA
                               Page Table                      TLB
                                               0xB
                                1 | 0xD        0xC            A1 | 0xD
                         1
                                5 | 0xH        0xD            A5 | 0xH
                         2

              VM A       3      2 | 0xC        0xE
                                               0xF
                                                              A2 | 0xC
                                                              B12 | 0xB
                         4
                         5                     0xG            B10 | 0xE
                                  N
                                               0xH            B9 | 0xA
                                                                            CPU
                         1
                                12 | 0xB

                                10 | 0xE
                                              Mem
                         2

              VM B       3      9 | 0xA

                         4
                         12       etc.                          etc.




                    Managed by VM OS
                    Managed by hypervisor
                    Actual Hardware

Sunday 8 May 2011

In a virtual environment, where the guest OS is not allowed direct access to the memory, this
was solved in a different way. Each VM gets access to its own page table, but this one is
actually locked/read-only; as soon as a change is made, a “trap” is generated and the
hypervisor is forced to take over and handle the page management. This causes a lot of
overhead, because every single memory management action forces the hypervisor to
intervene.

As an alternative, new CPUs came to the market with a modified TLB cache, able
to keep track of the complete translation path (VM virtual address --> VM physical address
--> host physical address).

Downside: this makes filling the TLB a lot more expensive, so looking up a page that is not yet
cached costs considerably more. Once the TLB is properly warmed up, though, most
applications rarely have to wait for new translations.
HAP




Sunday 8 May 2011

As you can see, in general this does help improve performance, though not by a really huge
amount. It opens the door to a great combination with another technique, though!
HAP + LARGE PAGES
      Setting Large Pages:
        • Linux - increase SHMMAX in rc.local
        • Windows - grant “Lock Pages in memory”
        • MySQL (only InnoDB) - large-pages
        • Oracle - ORA_LPENABLE=1 in registry
        • SQL Server - Enterprise only, need >8GB RAM. For buffer
               pool start up with trace flag -834




Sunday 8 May 2011

While using HAP, you should definitely make use of Large Pages, because filling the TLB is
a lot more expensive. By using Large Pages (2MB instead of 4KB), a LOT more memory can be
covered by a single entry. Combined with the bigger TLB in the newest CPUs, this helps
prevent entries from disappearing from the TLB too quickly.

Oracle:      HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_HOME_NAME
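
For the MySQL-on-Linux case above, a minimal sketch of what wiring up large pages can look like; the page count, SHMMAX value and group name are assumptions and have to be sized to your buffer pool:

# Reserve 2MB huge pages for InnoDB (sizes are assumptions; cover the buffer pool)
echo 2048 > /proc/sys/vm/nr_hugepages                   # 2048 x 2MB = 4GB of large pages
echo 4294967295 > /proc/sys/kernel/shmmax               # raise SHMMAX, as on the slide
echo $(id -g mysql) > /proc/sys/vm/hugetlb_shm_group    # let the mysql group use them

# my.cnf (InnoDB only):
#   [mysqld]
#   large-pages

grep Huge /proc/meminfo                                 # sanity check: HugePages_Total/Free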
VIRTUAL HBA’S
     • Choices         (ESX)
               • Before vSphere:
                 • BusLogic Parallel (Legacy)
                 • LSI Logic Parallel (Optimized)


               • Since vSphere
                 • LSI Logic SAS (default as of Win2008)
                 • VMware Paravirtual (PVSCSI)


     • Thin         vs Thick Provisioning (vSphere)

     • Snapshots          & performance do not go together

Sunday 8 May 2011

BusLogic ---> Generic adapter
LSILogic ---> Optimized adapter that requires tools
LSILogic SAS ---> Presents itself as a SAS controller (necessary for Windows clustering)
PVSCSI ---> Fully paravirtualized high performance adapter, created to use iSCSI from the guest, supports command
queueing.
VIRTUAL NIC’S
     • Choices       (ESX)

          • Before vSphere:
            • Flexible (emulation)
            • E1000 (Intel E1000 emulation, default x64)
            • (enhanced) VMXNET (paravirtual)


          • Since vSphere:
            • VMXNET 3 (third generation paravirtual NIC)




     • Jumbo        frames, NIC Teaming, VLANs

     • Colocation        (minimize NIC traffic by sharing a host)
Sunday 8 May 2011

Flexible ---> required for 32-bit systems;
automatically turns into a VMXNET after installing VMware Tools

VMXNET adds ‘Jumbo Frames’

VMXNET3 adds:
• MSI/MSI-X support (if supported by guest OS kernel)
• Receive Side Scaling (Windows 2008 only)
• IPv6 checksum & TCP Segmentation Offloading (segmentation of packets --> NIC, not CPU)
• VLAN Offloading
• Bigger TX/RX ring sizes
• Optimizations for iSCSI & VMotion
• Necessary for VMDq!!
GENERAL OPTIMIZATION STRATEGY


                    Making the right hardware choices

                    Tuning the hypervisor to your database’s needs

                    Tuning the OS to your database’s needs

                    Squeezing every last bit of performance out of your
                    database


Sunday 8 May 2011
BEST OS CHOICES


   • 64-bit         Linux for MySQL

   • MySQL           5.1.32 or later



   • ... ?     (discuss mode on! :) )



Sunday 8 May 2011

Modified mutexes for InnoDB = improvement of locking for multithreaded environments.
This allows for much better scaling.
DON’T FORGET

   • VMware Tools (ESX)
        • Paravirtualized vmxnet, PVSCSI
        • Ballooning
        • Time Sync
        • ... and more recent drivers

   • Integration Services (Hyper-V)
        • Paravirtualized drivers
        • Hypercall adapter
        • Time Sync
        • ... and more recent drivers

Sunday 8 May 2011

Definitely install the tools of the hypervisor in question to enable its newest
functionality. This is very important if you want to use, for example, memory overcommitment
in ESX, or paravirtualization in Linux on Hyper-V.
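
As a quick sanity check after installing the tools (a sketch for a Linux guest on ESX; the module names assume the paravirtual vmxnet/PVSCSI devices were selected):

vmware-toolbox-cmd -v                        # VMware Tools version inside the guest
lsmod | grep -iE 'vmxnet|pvscsi|balloon'     # paravirtual NIC/SCSI and balloon drivers loaded?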
CACHING LEVELS

   • CPU

   • Application

   • Filesystem         / OS

   • RAID           Controller (switch off or use a BBU!)

   • Disk



Sunday 8 May 2011

CPU: Just buy the right CPU.
App/FS: use the correct settings (Direct I/O).
RAID Controller: Make use of a battery-backed unit (for transactional databases: lots of
random writes land in the cache, and the BBU ensures the RAID controller can keep track of
them). This is mostly used as a write buffer.
Disk: If there is a cache on the disk itself, it’s best to disable it, so that nothing can get stuck
in that cache when the power drops. HP disables these by default.
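
A sketch of the last two bullets: leave a battery-backed RAID cache in write-back mode, but switch off a plain on-disk write cache; the device name below is an assumption.

hdparm -W /dev/sdb      # show the drive's current write-caching setting
hdparm -W 0 /dev/sdb    # turn the volatile on-disk write cache off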
GENERAL OPTIMIZATION STRATEGY


                    Making the right hardware choices

                    Tuning the hypervisor to your database’s needs

                    Tuning the OS to your database’s needs

                    Squeezing every last bit of performance out of your
                    database


Sunday 8 May 2011
DIRECT IO


   • Less Page management
   • Smallest cache possible        vs Less I/O

       SQL Server: Automatically
       MySQL: only for use with InnoDB! - innodb_flush_method=O_DIRECT
       Oracle: filesystemio_options=DIRECTIO




Sunday 8 May 2011

Though in Windows this is on by default, in Linux it should definitely be enabled. Otherwise
everything that is already cached by the InnoDB buffer pool may also be cached by the
filesystem cache, so two separate but identical caches have to be maintained in memory:
far too much memory management.

MySQL’s MyISAM actually depends on this filesystem cache. It expects the OS to do the brunt
of the caching work itself.
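
Putting the MySQL line from the slide into a config sketch (the file path and buffer pool size are assumptions; O_DIRECT only applies to InnoDB, not MyISAM):

# Append to a my.cnf fragment (path is an assumption)
cat >> /etc/mysql/conf.d/directio.cnf <<'EOF'
[mysqld]
innodb_flush_method     = O_DIRECT
innodb_buffer_pool_size = 4G
EOF

mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method'"   # verify after restarting mysqld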
GENERAL MY.CNF
                                OPTIMIZATIONS
         •     max_connections (151) (File descriptors!)

         •     Per connection
                    •   read_buffer_size (128K) (Full Scan)
                    •   read_rnd_buffer_size (256K) (Order By)
                    •   sort_buffer_size (2M) (Sorts)
                    •   join_buffer_size (128K) (Full Scan Join)

         	



Sunday 8 May 2011
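
A my.cnf sketch of the per-connection buffers above; the values are the defaults quoted on the slide, not recommendations, and the file path is an assumption:

cat >> /etc/mysql/conf.d/connections.cnf <<'EOF'
[mysqld]
max_connections      = 151     # also needs enough OS file descriptors
read_buffer_size     = 128K    # full (sequential) scans
read_rnd_buffer_size = 256K    # reading rows back after a sort (ORDER BY)
sort_buffer_size     = 2M      # sorts
join_buffer_size     = 128K    # full scan joins (no usable index)
EOF
# Remember: these buffers are allocated per connection, so max_connections multiplies them.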
GENERAL MY.CNF
                                    OPTIMIZATIONS
         •   thread_cache (check out max_used_connections)

         •   table_cache (64) - table_open_cache (5.1.3x)

                    •   Engine dependent

                    •   open_tables variable

                    •   opened_tables ∆ ≈ 0
         • innodb_buffer_pool_size

         • innodb_thread_concurrency

Sunday 8 May 2011
Try to fit max_used_connections into the thread_cache IF POSSIBLE
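
A sketch of how to check those counters against the settings (standard MySQL status and system variables):

mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_created'"       # should barely grow once warm
mysql -e "SHOW GLOBAL STATUS LIKE 'Opened_tables'"         # delta should stay close to 0
mysql -e "SHOW GLOBAL VARIABLES LIKE 'thread_cache_size'"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'table_open_cache'"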
INDEXING

   • Heaps

   • Unclustered      Indexes

   • Clustered      Indexes (InnoDB)




Sunday 8 May 2011
INDEX FRAGMENTATION
                                              Clustered Index Leaf Level


   • Happens         with clustered indexes

   • Large-scale fragmentation of the indexes could cause serious
       performance problems

   • Fixes:

             • SQL Server: REBUILD/REORGANIZE
             • MySQL: ALTER TABLE tbl_name ENGINE=INNODB
             • Oracle: ALTER INDEX index_name REBUILD


Sunday 8 May 2011
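
On the MySQL side, a sketch of spotting fragmented InnoDB tables and rebuilding one; the schema and table names are placeholders:

# Free space inside InnoDB tablespaces is a rough fragmentation indicator
mysql -e "SELECT table_name, data_free/1024/1024 AS free_mb
          FROM information_schema.tables
          WHERE table_schema = 'mydb' AND engine = 'InnoDB'
          ORDER BY data_free DESC LIMIT 10"

# Rebuilds the table and its clustered index (locks the table while it runs)
mysql -e "ALTER TABLE mydb.orders ENGINE=InnoDB"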
STORAGE ENGINE INTERNALS
                                                  Datafile
                     DB
                    Front                     Transaction Log




                      Buffer Pool
                        Cache


Sunday 8 May 2011

SQL Server --> Set memory options in server properties > Memory > Server memory
Options
STORAGE ENGINE INTERNALS
   Update                                         Datafile

   Insert
                     DB
   Delete           Front                     Transaction Log




                                           Checkpoint process
                      Buffer Pool
                        Cache


Sunday 8 May 2011

DATA AND LOG PLACEMENT




Sunday 8 May 2011

This is most important for transactional databases.

As you can see, the difference between using a decent SAS disk or an SSD for the database log is
negligible. There is no use sinking the cash into an SSD for logs; just get a decent, fast SAS disk.
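
A my.cnf sketch of the same idea for MySQL, keeping random datafile I/O and sequential log writes on separate devices; the mount points are assumptions:

cat >> /etc/mysql/conf.d/placement.cnf <<'EOF'
[mysqld]
datadir                   = /data/mysql            # datafiles: random reads/writes
innodb_log_group_home_dir = /log/mysql             # redo log: sequential writes
log-bin                   = /log/mysql/mysql-bin   # binary log on the log device, if enabled
EOF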
SQL STATEMENT ‘DUHS’

   • Every table MUST have a primary key

   • If possible, use a clustered index

   • Only keep regularly used indexes around (e.g. FKs)

   • WHERE > JOIN > ORDER BY > SELECT

   • Don’t use SELECT *

   • Try not to use COUNT() (in InnoDB always a full table scan)

Sunday 8 May 2011
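
To make the last bullets concrete, a hedged before/after sketch; the table and columns are invented for illustration:

# Bad: unfiltered COUNT() plus SELECT * (full scans, wasted columns)
mysql -e "SELECT COUNT(*) FROM orders"
mysql -e "SELECT * FROM orders ORDER BY created_at"

# Better: filter on an indexed column first, then select only what you need
mysql -e "SELECT id, customer_id, total
          FROM orders
          WHERE created_at >= '2011-05-01'
          ORDER BY created_at"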
GENERAL OPTIMIZATION STRATEGY


                    Making the right hardware choices

                    Tuning the hypervisor to your database’s needs

                    Tuning the OS to your database’s needs

                    Squeezing every last bit of performance out of your
                    database


Sunday 8 May 2011
QUESTIONS?




             I don’t have the attention span to keep up a blog :(

     Results of benchmarks: http://www.anandtech.com/tag/IT

Sunday 8 May 2011

08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

OpenDBCamp Virtualization

  • 6. HARDWARE CHOICES
    • Choosing the right CPUs: Intel 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization)
    • Choosing the right NICs (VMDQ)
    • Choosing the right storage system (iSCSI vs FC SAN)
    Sunday 8 May 2011
    CPUs --> HW virtualization ("dom -1") & HAP. Best price/quality at the moment: the Opteron 6000 series is very good at datamining/decision support, the Xeon 5600 series is still very good at OLTP. VMDQ = sorting/queueing offloaded to the NIC.
  • 10. OVERVIEW NIC - VMDQ / NETQUEUE
    Netqueue devices                                     Part nr      Speed     Interface
    Intel Ethernet Server Adapter X520-SR2, 2 ports      E10G42BFSR   10Gbps    SR-LC
    Intel Ethernet Server Adapter X520-DA2, 2 ports      E10G42BTDA   10Gbps    SFP+
    Intel Gigabit ET Dual Port Server Adapter, 2 ports   E1G42ET      1Gbps     RJ-45 - Copper
    Intel Gigabit EF Dual Port Server Adapter, 2 ports   E1G42EF      1Gbps     RJ-45 - Fibre
    Intel Gigabit ET Quad Port Server Adapter, 4 ports   E1G44ET      1Gbps     RJ-45 - Copper
    Intel Gigabit CT Desktop Adapter                     EXPI9301CT   1Gbps     RJ-45 - Copper
    Supermicro Add-on Card AOC-SG-I2, 2 ports            AOC-SG-I2    1Gbps     RJ-45 - Copper
    Onboard 82576 (8 virtual queues)
    Onboard 82574: no IOV
    Broadcom NetXtreme II Ethernet chipset                            1-10Gbps
    All Neterions                                                     1-10Gbps
    Sunday 8 May 2011
  • 11-12. SAN CHOICES
    • Fibre Channel
      • ESX with FC-HBA
      • vSphere: FC-HBA pass-through to Guest OS
    • iSCSI (using 10Gbit if possible)
      • ESX with Hardware Initiator (iSCSI HBA)
      • ESX with Software Initiator
      • Initiator inside the Guest OS
      • vSphere: iSCSI HBA pass-through to Guest OS
    Server with (hardware) iSCSI = iSCSI Target; (virtualization) server with (hardware) iSCSI = iSCSI Initiator
    Sunday 8 May 2011
    10Gbit = high CPU overhead!! We're talking 24GHz of CPU to fill up 9Gbit. This problem can be reduced by the following technologies:
    VT-d ---> moving DMA and address translation to the NIC
    VMDQ/Netqueue ---> Netqueue is pretty much VMware's implementation
    SR-IOV ---> allowing one physical device (NIC) to show itself as multiple virtual devices
  • 13. GENERAL OPTIMIZATION STRATEGY
    Making the right "hardware" choices
    Tuning the hypervisor to your database's needs
    Tuning the OS to your database's needs
    Squeezing every last bit of performance out of your database
    Sunday 8 May 2011
  • 14-17. VIRTUAL MEMORY (diagram build-up: physical memory pages, the virtual memory handed to the OS, the page table, and finally the TLB)
    Sunday 8 May 2011
    CPUs: AMD: all quad-core Opterons; Intel: Xeon 5500, 7500, 5600.
    Physical memory is divided into segments of 4KB, which are translated in software to so-called pages: small chunks, each with its own address, which the CPU uses to find the data in physical memory. A piece of software always gets a continuous block of "virtual" memory assigned to it within an OS, even though the physical memory is fragmented, to prevent a coding nightmare (keeping track of every single page address is madness). The page table was made for the CPU to run through and to make the necessary translation to physical memory. The CPU has a hardware cache that keeps track of these entries, the Translation Lookaside Buffer (TLB). This is an extremely fast buffer that saves the most recently used addresses, so the CPU can avoid running through the page table as much as possible.
  • 18-20. SPT VS HAP (diagram build-up: the per-VM "read-only" page tables, the hypervisor-managed shadow page tables, and the HAP-style TLB that caches the full VM-to-host translation)
    Sunday 8 May 2011
    In a virtual environment, where the guest OS is not allowed direct access to the memory, this was solved in a different way. Each VM gets access to its own page table, but this one is actually locked/read-only, and as soon as a change is made, a "trap" is generated, so the hypervisor is forced to take over and handle the page management. This causes a lot of overhead, because every single memory management action forces the hypervisor to intervene. As an alternative, new CPUs came to the market with a modified TLB cache, able to keep track of the complete translation path (VM virtual address --> VM physical address --> host physical address). Downside: because of this, filling up the TLB got a lot more complex, and a page that is not yet in there is very expensive to look up. Once the TLB is properly warmed up, though, most applications rarely have to wait for other pages.
  • 21. HAP (benchmark results)
    Sunday 8 May 2011
    As you can see, in general this does help improve performance, though not by a really huge amount. It opens the door to a great combination with another technique, though!
  • 22-23. HAP + LARGE PAGES
    Setting Large Pages:
    • Linux - increase SHMMAX in rc.local
    • Windows - grant "Lock Pages in Memory"
    • MySQL (InnoDB only) - large-pages
    • Oracle - ORA_LPENABLE=1 in the registry (HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_HOME_NAME)
    • SQL Server - Enterprise only, needs >8GB RAM; for the buffer pool, start up with trace flag -834
    Sunday 8 May 2011
    While using HAP, you should definitely make use of Large Pages, because filling up the TLB is a lot more expensive. By using Large Pages (2MB instead of 4KB), a LOT more memory can be addressed by a single entry. This, in combination with a bigger TLB in the newest CPUs, attempts to prevent entries from disappearing from the TLB too fast.
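    A minimal sketch of the Linux + MySQL variant of the settings above, assuming a dedicated MySQL VM with an InnoDB buffer pool of roughly 8GB; the huge page count, shmmax value, group id and buffer pool size are illustrative assumptions, not figures from the slides:

      # /etc/sysctl.conf (or rc.local, as on the slide)
      vm.nr_hugepages = 4608            # 4608 x 2MB = 9GB of huge pages, sized to the buffer pool
      kernel.shmmax = 9663676416        # allow a single shared memory segment of ~9GB
      vm.hugetlb_shm_group = 27         # gid of the mysql group, so mysqld may use huge pages

      # /etc/my.cnf
      [mysqld]
      large-pages                       # InnoDB only, as noted above
      innodb_buffer_pool_size = 8G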
  • 24-29. VIRTUAL HBA'S
    • Choices (ESX)
      • Before vSphere:
        • BusLogic Parallel (Legacy)
        • LSI Logic Parallel (Optimized)
      • Since vSphere:
        • LSI Logic SAS (default as of Win2008)
        • VMware Paravirtual (PVSCSI)
    • Thin vs Thick Provisioning (vSphere)
    • Snapshots & performance do not go together
    Sunday 8 May 2011
    BusLogic ---> generic adapter
    LSI Logic ---> optimized adapter that requires the tools
    LSI Logic SAS ---> presents itself as a SAS controller (necessary for Windows clustering)
    PVSCSI ---> fully paravirtualized high-performance adapter, created to use iSCSI from the guest, supports command queueing
  • 30-34. VIRTUAL NIC'S
    • Choices (ESX)
      • Before vSphere:
        • Flexible (emulation)
        • E1000 (Intel E1000 emulation, default on x64)
        • (enhanced) VMXNET (paravirtual)
      • Since vSphere:
        • VMXNET 3 (third generation paravirtual NIC)
    • Jumbo frames, NIC Teaming, VLANs
    • Colocation (minimize NIC traffic by sharing a host)
    Sunday 8 May 2011
    Flexible ---> required for 32-bit systems; automatically turns into a VMXNET after installing VMware Tools. VMXNET adds Jumbo Frames. VMXNET3 adds:
    • MSI/MSI-X support (if supported by the guest OS kernel)
    • Receive Side Scaling (Windows 2008 only)
    • IPv6 checksum & TCP Segmentation Offloading (segmentation of packets done by the NIC, not the CPU)
    • VLAN offloading
    • Bigger TX/RX ring sizes
    • Optimizations for iSCSI & VMotion
    • Necessary for VMDq!!
  • 35. GENERAL OPTIMIZATION STRATEGY
    Making the right hardware choices
    Tuning the hypervisor to your database's needs
    Tuning the OS to your database's needs
    Squeezing every last bit of performance out of your database
    Sunday 8 May 2011
  • 36. BEST OS CHOICES
    • 64-bit Linux for MySQL
    • MySQL 5.1.32 or later
    • ... ? (discuss mode on! :) )
    Sunday 8 May 2011
    Modified mutexes for InnoDB = improvement of locking for multithreaded environments. This allows for much better scaling.
  • 37. DON'T FORGET
    VMware Tools:
    • Paravirtualized drivers (vmxnet, PVSCSI)
    • Ballooning
    • Time sync
    • ... and more recent drivers
    Integration Services:
    • Paravirtualized drivers
    • Hypercall adapter
    • Time sync
    • ... and more recent drivers
    Sunday 8 May 2011
    Definitely install the tools of the hypervisor in question to enable use of its newest functionalities. This is very important if you want to use, for example, memory overcommit in ESX, or paravirtualization in Linux on Hyper-V.
  • 38. CACHING LEVELS
    • CPU
    • Application
    • Filesystem / OS
    • RAID controller (switch it off or use a BBU!)
    • Disk
    Sunday 8 May 2011
    CPU: just buy the right CPU. App/FS: use the correct settings (Direct IO). RAID controller: make use of a battery-backed unit (for transactional databases there are lots of random writes in the cache, so to be safe the RAID controller keeps track of those); it is mostly used as a write buffer. Disk: if cache is available on-disk, it's best to disable it, so nothing can get stuck in the caches when the power drops. HP disables these by default.
  • 39. GENERAL OPTIMIZATION STRATEGY
    Making the right hardware choices
    Tuning the hypervisor to your database's needs
    Tuning the OS to your database's needs
    Squeezing every last bit of performance out of your database
    Sunday 8 May 2011
  • 40. DIRECT IO
    • Less page management
    • Smallest cache possible vs less I/O
    SQL Server: automatic
    MySQL: only for use with InnoDB! - innodb_flush_method=O_DIRECT
    Oracle: filesystemio_options=DIRECTIO
    Sunday 8 May 2011
    Though in Windows this is on by default, in Linux it should definitely be enabled. Otherwise everything that is already cached by the InnoDB buffer pool may also be cached by the filesystem cache, so two separate but identical caches need to be maintained in memory: far too much memory management. MySQL's MyISAM actually depends on this filesystem cache; it expects the OS to do the brunt of the caching work itself.
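    As a concrete illustration of the MySQL line above, a minimal my.cnf fragment; the buffer pool size is an assumed example value, not taken from the slides:

      [mysqld]
      # Bypass the filesystem cache so InnoDB data is cached only once,
      # in the InnoDB buffer pool (MyISAM still relies on the OS cache)
      innodb_flush_method     = O_DIRECT
      innodb_buffer_pool_size = 8G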
  • 41. GENERAL MY.CNF OPTIMIZATIONS
    • max_connections (151) (file descriptors!)
    • Per connection:
      • read_buffer_size (128K) (full scan)
      • read_rnd_buffer_size (256K) (ORDER BY)
      • sort_buffer_size (2M) (sorts)
      • join_buffer_size (128K) (full scan join)
    Sunday 8 May 2011
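    To make the per-connection point concrete, a sketch that spells the slide's defaults out as an explicit starting point; max_connections is an assumed value and the right numbers depend entirely on the workload:

      [mysqld]
      max_connections      = 300     # default 151; each connection may allocate the buffers below
      read_buffer_size     = 128K    # sequential (full) scans
      read_rnd_buffer_size = 256K    # reads after a sort (ORDER BY)
      sort_buffer_size     = 2M      # in-memory sorts
      join_buffer_size     = 128K    # joins without usable indexes
      # worst case extra memory ~ max_connections x (sum of the per-connection buffers)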
  • 42. GENERAL MY.CNF OPTIMIZATIONS
    • thread_cache (check out max_used_connections)
    • table_cache (64) - table_open_cache (5.1.3x)
      • aim for the Opened_tables status counter to stop growing relative to open_tables (∆ ≈ 0)
    • Engine dependent:
      • innodb_buffer_pool_size
      • innodb_thread_concurrency
    Sunday 8 May 2011
    Try to fit max_used_connections into the thread_cache IF POSSIBLE.
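    A quick way to verify the two rules of thumb above (max_used_connections fitting into the thread cache, and Opened_tables no longer growing) is to compare the status counters against the server variables:

      -- thread cache: Max_used_connections should fit into thread_cache_size
      SHOW GLOBAL STATUS LIKE 'Max_used_connections';
      SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';

      -- table cache: Opened_tables should barely grow once the server is warm
      SHOW GLOBAL STATUS LIKE 'Opened_tables';
      SHOW GLOBAL VARIABLES LIKE 'table_open_cache';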
  • 43. INDEXING
    • Heaps
    • Unclustered indexes
    • Clustered indexes (InnoDB)
    Sunday 8 May 2011
  • 44. INDEX FRAGMENTATION (clustered index leaf level)
    • Happens with clustered indexes
    • Large-scale fragmentation of the indexes could cause serious performance problems
    • Fixes:
      • SQL Server: REBUILD/REORGANIZE
      • MySQL: ALTER TABLE tbl_name ENGINE=INNODB
      • Oracle: ALTER INDEX index_name REBUILD
    Sunday 8 May 2011
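    The fixes above map onto ordinary maintenance statements; a short sketch, with hypothetical table and index names:

      -- MySQL / InnoDB: rebuilding the table rebuilds the clustered index
      ALTER TABLE orders ENGINE=InnoDB;

      -- SQL Server: REBUILD, or REORGANIZE for light fragmentation
      ALTER INDEX IX_orders_date ON orders REBUILD;
      ALTER INDEX IX_orders_date ON orders REORGANIZE;

      -- Oracle
      ALTER INDEX ix_orders_date REBUILD;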
  • 45-52. STORAGE ENGINE INTERNALS (diagram build-up: updates, inserts and deletes enter through the DB front end, are written to the transaction log and modified in the buffer pool cache, and the checkpoint process later flushes the dirty pages to the datafile)
    Sunday 8 May 2011
    SQL Server --> set memory options in Server Properties > Memory > Server Memory Options
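    For InnoDB specifically, the moving parts in the diagram above (buffer pool, transaction log, checkpointing) are controlled by a handful of my.cnf settings; the values below are illustrative assumptions, not recommendations from the slides:

      [mysqld]
      innodb_buffer_pool_size        = 8G     # where the modified ("dirty") pages live
      innodb_log_file_size           = 512M   # a bigger log means less frequent checkpoints
      innodb_flush_log_at_trx_commit = 1      # flush the transaction log on every commit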
  • 53. DATA AND LOG PLACEMENT (benchmark results)
    Sunday 8 May 2011
    This is most important for transactional databases. As you can see, the difference between using a decent SAS disk or an SSD for the database log is negligible. There is no use sinking cash into an SSD for the logs; just get a decent, fast SAS disk.
  • 54. SQL STATEMENT 'DUHS'
    • Every table MUST have a primary key
    • If possible, use a clustered index
    • Only keep regularly used indexes around (f. ex. FK)
    • WHERE > JOIN > ORDER BY > SELECT
    • Don't use SELECT *
    • Try not to use COUNT() (in InnoDB always a full table scan)
    Sunday 8 May 2011
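    A small before/after example of the SELECT * and WHERE points above, on a hypothetical orders table with an index on order_date:

      -- Avoid: drags every column along, and the function on order_date prevents index use
      SELECT * FROM orders WHERE YEAR(order_date) = 2011;

      -- Better: a sargable range on the indexed column, and only the columns you need
      SELECT order_id, customer_id, total
      FROM   orders
      WHERE  order_date >= '2011-01-01'
        AND  order_date <  '2012-01-01';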
  • 55. GENERAL OPTIMIZATION STRATEGY
    Making the right hardware choices
    Tuning the hypervisor to your database's needs
    Tuning the OS to your database's needs
    Squeezing every last bit of performance out of your database
    Sunday 8 May 2011
  • 56. QUESTIONS?
    I don't have the attention span to keep up a blog :(
    Results of benchmarks: http://www.anandtech.com/tag/IT
    Sunday 8 May 2011