This is a presentation I gave on impulse at Open Database Camp in Sardegna, Italy last weekend, and then a bit less impulsively at the Inuits igloo.
A word of caution: I included the notes because they contain some extra info, but the presentation was hacked together from several older ones (not all of them my own), so there may be some rough edges in there. :)
OpenDBCamp Virtualization
1. VIRTUAL DATABASES?
Optimizing for Virtualization
Liz van Dijk - @lizztheblizz - liz@sizingservers.be
Sunday 8 May 2011
2. THE GOAL
“Virtualized Databases suck!” - Is this really
true? Does it have to be?
Databases are supposed to be “hard to virtualize” and to lose performance in a virtual
environment. There is some truth to this: dumping a native database into a virtual
environment without applying any changes can indeed cause issues.
3. HOW DO WE GET THERE?
1. Understanding just why the virtual environment
impacts performance, and taking the correct steps to
adapt our database to its new habitat.
2. Optimize, optimize, optimize...
Action: We have to understand why database performance is affected, and how we can arm
ourselves against this impact.
On the other hand, while there used to be less need for optimization in an environment
where hardware was abundant, a virtual environment triggers struggles for resources more
quickly. It’s important to keep our application as lean as possible without losing
performance. In many cases, performance can be multiplied by taking a closer look at the
database.
Message: Why is this interesting for you? This knowledge could convince you to make the
switch to a virtual environment, trusting it won’t hurt your software’s performance, and will
help you take a look at your existing infrastructure and take the necessary steps to run your
application as optimally as possible.
4. THE INFLUENCE OF VIRTUALIZATION
• All “kernel” activity is more costly:
• Interrupts
• System Calls (I/O)
• Memory page management
So, let’s start with the understanding step: what could potentially slow down because of
virtualization?
The 3 most important aspects are:
Interrupts - An actual piece of hardware is asking for attention from the CPU. Making use of
Jumbo Frames is a very good idea in a virtual environment, because sending the same data
causes fewer interrupts (1500 --> 9000 bytes per packet).
System Calls - A process asks the kernel to perform a privileged task, like accessing certain
hardware (network/disk I/O).
Page Management - This is the most important one for databases: think caching. The
database keeps an enormous amount of data in its own caches, so memory is manipulated a
lot of the time. Every time something changes in this memory, the virtual host has to perform
a double translation: from guest virtual address, via the VM’s page table, to the host physical
address.
Usually, this causes the biggest performance hit when switching from native to virtual. We
really have to do everything we can to minimize this problem.
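To make the Jumbo Frames tip concrete: on an ESX host both the vSwitch MTU and the guest NIC MTU need raising. A hedged sketch — the vSwitch and interface names below are placeholders, and every switch and NIC in the path must support an MTU of 9000:

```shell
# On the ESX host: raise the MTU of the virtual switch to 9000
esxcfg-vswitch -m 9000 vSwitch0

# Inside the Linux guest: match the MTU on the virtual NIC
ifconfig eth0 mtu 9000
```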
5. GENERAL OPTIMIZATION STRATEGY
Making the right hardware choices
Tuning the hypervisor to your database’s needs
Tuning the OS to your database’s needs
Squeezing every last bit of performance out of your
database
Performance issues should be dealt with systematically, and we can split that process up into
these four steps.
6. HARDWARE CHOICES
• Choosing the right CPUs
• Intel 5500/7500 and later types
(Nehalem) / all AMD quad-core
Opterons (HW-assisted/MMU
virtualization)
• Choosing the right NICs (VMDQ)
• Choosing the right storage system
(iSCSI vs FC SAN)
CPUs --> HW virtualization (“ring -1”) & HAP: best price/quality at the moment
Opteron 6000 series: very good at datamining/decision support
Xeon 5600 series: still very good at OLTP
VMDQ = sorting/queueing offloaded to the NIC
10. OVERVIEW: NIC VMDQ / NETQUEUE SUPPORT

Device                                               Part nr      Speed     Interface
Intel Ethernet Server Adapter X520-SR2 (2 ports)     E10G42BFSR   10Gbps    SR-LC
Intel Ethernet Server Adapter X520-DA2 (2 ports)     E10G42BTDA   10Gbps    SFP+
Intel Gigabit ET Dual Port Server Adapter (2 ports)  E1G42ET      1Gbps     RJ-45 copper
Intel Gigabit EF Dual Port Server Adapter (2 ports)  E1G42EF      1Gbps     fibre
Intel Gigabit ET Quad Port Server Adapter (4 ports)  E1G44ET      1Gbps     RJ-45 copper
Intel Gigabit CT Desktop Adapter                     EXPI9301CT   1Gbps     RJ-45 copper
Supermicro Add-on Card AOC-SG-I2 (2 ports)           AOC-SG-I2    1Gbps     RJ-45 copper
Onboard Intel 82576 (8 virtual queues)
Onboard Intel 82574 (no IOV)
Broadcom NetXtreme II Ethernet chipset                            1-10Gbps
All Neterion adapters                                             1-10Gbps
12. SAN CHOICES
• Fibre Channel
• ESX with FC-HBA
• vSphere: FC-HBA pass-through to Guest OS
• iSCSI (using 10Gbit if possible)
• ESX with Hardware Initiator (iSCSI HBA)
• ESX with Software Initiator
• Initiator inside the Guest OS
• vSphere: iSCSI HBA pass-through to Guest OS
Server with (hardware) iSCSI = iSCSI Target
(Virtualization) server with (hardware) iSCSI = iSCSI Initiator
10Gbit = high CPU overhead!! We’re talking 24GHz worth of CPU to fill up 9Gbits.
This problem can be reduced by the following technologies:
VT-d ---> moving DMA and address translation to the NIC
VMDQ/Netqueue ---> Netqueue is pretty much VMware’s implementation of VMDQ
SR-IOV ---> allowing one physical device (NIC) to present itself as multiple virtual devices
13. GENERAL OPTIMIZATION STRATEGY
Making the right “hardware” choices
Tuning the hypervisor to your database’s needs
Tuning the OS to your database’s needs
Squeezing every last bit of performance out of your
database
17. VIRTUAL MEMORY
[Diagram: the OS assigns a process a contiguous range of virtual memory pages (1-12). The
page table maps each virtual page to a physical address (1 | 0xD, 2 | 0xC, 3 | 0xF, 4 | 0xA,
5 | 0xH, 6 | 0xG, 7 | 0xB, 8 | 0xE, etc.). The TLB caches the most recent entries (1 | 0xD,
5 | 0xH, 2 | 0xC, etc.). Pages and page table are managed by software; CPU and memory are
actual hardware.]
CPUs: AMD: all quad-core Opterons; Intel: Xeon 5500, 7500, 5600.
Physical memory is divided into segments of 4KB, translated in software to so-called pages:
small chunks, each with its own address, which the CPU uses to find the data in physical
memory.
A piece of software always gets a contiguous block of “virtual” memory assigned to it within
an OS, even though the physical memory is fragmented, to prevent a coding nightmare
(keeping track of every single page address is madness).
The page table is what the CPU walks through to make the necessary translation to physical
memory. The CPU has a hardware cache that keeps track of these entries, the Translation
Lookaside Buffer (TLB). This is an extremely fast buffer that stores the most recent
addresses, so the CPU can avoid walking through the page table as much as possible.
19. SPT VS HAP
[Diagram, shadow page tables (SPT): each VM OS manages its own “read-only” page table
(VM A: 1 | 0xD, 5 | 0xH, 2 | 0xC; VM B: 12 | 0xB, 10 | 0xE, 9 | 0xA), while the hypervisor
maintains a matching “shadow” page table per VM holding the real mappings into host memory
(VM A: 1 | 0xG, 5 | 0xD, 2 | 0xF; VM B: 12 | 0xE, 10 | 0xB, 9 | 0xC). Only the hypervisor
touches the actual hardware.]
In a virtual environment, where the guest OS is not allowed direct access to the memory, this
was solved in a different way. Each VM gets access to its own page table, but this one is
effectively locked/read-only: as soon as a change is made, a “trap” is generated and the
hypervisor is forced to take over and handle the page management. This causes a lot of
overhead, because every single memory-management action forces the hypervisor to
intervene.
As an alternative, new CPUs came to the market with a modified TLB cache, able to keep
track of the complete translation path (VM virtual address --> VM physical address --> host
physical address).
Downside: because of this, filling up the TLB got a lot more expensive. A page that is not yet
in there takes much longer to find. Once the TLB is properly warmed up, though, most
applications rarely have to wait for other pages.
20. SPT VS HAP
[Diagram, hardware-assisted paging (HAP): the shadow tables disappear; the TLB itself stores
end-to-end mappings tagged per VM (A1 | 0xD, A5 | 0xH, A2 | 0xC, B12 | 0xB, B10 | 0xE,
B9 | 0xA), straight from guest page to host physical address.]
21. HAP
As you can see, in general this does help improve performance, though not by a really huge
amount. It opens the door to a great combination with another technique, though!
22. HAP + LARGE PAGES
Setting Large Pages:
• Linux - increase SHMMAX in rc.local
• Windows - grant “Lock Pages in memory”
• MySQL (only InnoDB) - large-pages
• Oracle - ORA_LPENABLE=1 in registry
• SQL Server - Enterprise only, needs >8GB RAM. For the
buffer pool, start up with trace flag -T834
While using HAP, you should definitely make use of Large Pages, because filling up the TLB is
a lot more expensive. By using Large Pages (2MB instead of 4KB), a LOT more memory can be
covered by a single TLB entry. This, in combination with the bigger TLB in the newest CPUs,
helps prevent entries from disappearing from the TLB too fast.
Oracle: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_<HOME_NAME>
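A hedged sketch of the Linux/MySQL side of the list above — the sizes and paths are examples; size the huge-page reservation to your own buffer pool:

```
# /etc/sysctl.conf -- reserve 2MB huge pages (2048 x 2MB = 4GB here)
vm.nr_hugepages = 2048
kernel.shmmax = 4294967296

# /etc/security/limits.conf -- let the mysql user lock that memory
mysql soft memlock unlimited
mysql hard memlock unlimited

# my.cnf -- have InnoDB allocate its buffer pool from large pages
[mysqld]
large-pages
```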
24. VIRTUAL HBA’S
• Choices (ESX)
• Before vSphere:
• BusLogic Parallel (Legacy)
• LSI Logic Parallel (Optimized)
• Since vSphere
• LSI Logic SAS (default as of Win2008)
• VMware Paravirtual (PVSCSI)
• Thin vs Thick Provisioning (vSphere)
• Snapshots & performance do not go together
BusLogic ---> generic adapter
LSI Logic ---> optimized adapter that requires VMware Tools
LSI Logic SAS ---> presents itself as a SAS controller (necessary for Windows clustering)
PVSCSI ---> fully paravirtualized high-performance adapter, created for using iSCSI from the guest; supports command
queueing
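Switching an existing VM to the paravirtual adapter is a one-line change in its configuration file. A sketch — the scsi0 device name is an example, and the guest needs the PVSCSI driver installed before rebooting onto it:

```
# <vmname>.vmx -- use the paravirtual SCSI adapter for the first controller
scsi0.virtualDev = "pvscsi"
```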
30. VIRTUAL NIC’S
• Choices (ESX)
• Before vSphere:
• Flexible (emulation)
• E1000 (Intel E1000 emulation, default x64)
• (enhanced) VMXNET (paravirtual)
• Since vSphere:
• VMXNET 3 (third generation paravirtual NIC)
• Jumbo frames, NIC Teaming, VLANs
• Colocation (minimize NIC traffic by sharing a host)
Flexible ---> required for 32-bit systems; automatically turns into a VMXNET after installing VMware Tools
VMXNET adds Jumbo Frames
VMXNET3 adds:
• MSI/MSI-X support (if supported by the guest OS kernel)
• Receive Side Scaling (Windows 2008 only)
• IPv6 checksum & TCP Segmentation Offloading (segmentation of packets done by the NIC, not the CPU)
• VLAN offloading
• Bigger TX/RX ring sizes
• Optimizations for iSCSI & VMotion
• Necessary for VMDq!!
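Like the HBA, the virtual NIC type is picked in the VM's configuration file. A sketch — ethernet0 is an example device, and the guest needs the vmxnet3 driver from VMware Tools:

```
# <vmname>.vmx -- use the third-generation paravirtual NIC
ethernet0.virtualDev = "vmxnet3"
```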
35. GENERAL OPTIMIZATION STRATEGY
Making the right hardware choices
Tuning the hypervisor to your database’s needs
Tuning the OS to your database’s needs
Squeezing every last bit of performance out of your
database
36. BEST OS CHOICES
• 64-bit Linux for MySQL
• MySQL 5.1.32 or later
• ... ? (discuss mode on! :) )
Sunday 8 May 2011
Modified mutexes for InnoDB = improved locking for multithreaded environments.
This allows for much better scaling.
37. DON’T FORGET
• VMware Tools (ESX):
• Paravirtualized VMXNET, PVSCSI
• Ballooning
• Time Sync
• ... and more recent drivers
• Integration Services (Hyper-V):
• Paravirtualized drivers
• Hypercall adapter
• Time Sync
• ... and more recent drivers
Definitely install the tools of the hypervisor in question, to enable use of its newest
functionality. This is very important if you want to use, for example, memory overcommitting
in ESX, or paravirtualization in Linux on Hyper-V.
38. CACHING LEVELS
• CPU
• Application
• Filesystem / OS
• RAID Controller (switch off or use a BBU!)
• Disk
CPU: just buy the right CPU.
App/FS: use the correct settings (Direct I/O).
RAID controller: make use of a battery-backed unit (BBU). Transactional databases generate
lots of random writes, and the battery makes sure the RAID controller holds on to those. This
cache is mostly used as a write buffer.
Disk: if the disk has its own on-board cache, it’s best to disable it, so that nothing can get
stuck in the cache when the power drops. HP disables these by default.
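On Linux, the on-disk write cache can be inspected and switched off with hdparm. A sketch — the device name is a placeholder, and behavior varies per disk/controller:

```shell
# Show the current write-cache setting of the disk
hdparm -W /dev/sda

# Switch the on-disk write cache off
hdparm -W0 /dev/sda
```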
39. GENERAL OPTIMIZATION STRATEGY
Making the right hardware choices
Tuning the hypervisor to your database’s needs
Tuning the OS to your database’s needs
Squeezing every last bit of performance out of your
database
40. DIRECT IO
• Less Page management
• Smallest cache possible vs Less I/O
SQL Server: Automatically
MySQL: only for use with InnoDB! - innodb_flush_method=O_DIRECT
Oracle: filesystemio_options=DIRECTIO
Though in Windows this is on by default, in Linux it should definitely be enabled. Otherwise
everything that is already cached by the InnoDB buffer pool may also be cached by the
filesystem cache, so two separate but identical caches are maintained in memory: far too
much memory management.
MySQL’s MyISAM actually depends on this filesystem cache: it expects the OS to do the brunt
of the caching work itself.
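The MySQL setting from the slide as a my.cnf fragment (InnoDB only; MyISAM, as noted, wants the filesystem cache):

```
# my.cnf -- bypass the filesystem cache for InnoDB data files
[mysqld]
innodb_flush_method = O_DIRECT
```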
42. GENERAL MY.CNF OPTIMIZATIONS
• thread_cache (check out max_used_connections)
• table_cache (64) - table_open_cache (5.1.3x)
• open_tables variable: Δ opened_tables ≈ 0
• Engine dependent:
• innodb_buffer_pool_size
• innodb_thread_concurrency
Try to fit max_used_connections into the thread_cache IF POSSIBLE.
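The status variables mentioned above can be checked like this (a sketch; the thresholds are judgment calls, not hard rules):

```sql
-- Thread cache: Threads_created should stay close to Max_used_connections
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL STATUS LIKE 'Threads_created';

-- Table cache: Opened_tables should barely grow once the server is warm
SHOW GLOBAL STATUS LIKE 'Opened_tables';
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
```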
44. INDEX FRAGMENTATION
[Diagram: clustered index leaf level]
• Happens with clustered indexes
• Large-scale fragmentation of the indexes could cause serious
performance problems
• Fixes:
• SQL Server: REBUILD/REORGANIZE
• MySQL: ALTER TABLE tbl_name ENGINE=INNODB
• Oracle: ALTER INDEX index_name REBUILD
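The defragmentation fixes from the slide, spelled out — the table and index names are made up for illustration:

```sql
-- MySQL/InnoDB: rebuilds the table and all of its indexes
ALTER TABLE orders ENGINE=InnoDB;

-- SQL Server: rebuild (or, for light fragmentation, reorganize)
ALTER INDEX ix_orders_date ON orders REBUILD;

-- Oracle
ALTER INDEX ix_orders_date REBUILD;
```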
45. STORAGE ENGINE INTERNALS
[Diagram: the DB front end sits in front of the buffer pool cache; on disk behind it live the
datafile and the transaction log.]
SQL Server --> Set memory options in Server Properties > Memory > Server Memory Options
51. STORAGE ENGINE INTERNALS
[Diagram: updates, inserts and deletes enter through the DB front end into the buffer pool
cache and are written to the transaction log; the checkpoint process later flushes the
modified pages from the buffer pool to the datafile.]
53. DATA AND LOG PLACEMENT
This is most important for transactional databases.
As you can see, the difference between using a decent SAS disk and an SSD for the database
log is negligible. There is no point sinking cash into an SSD for the logs; just get a decent,
fast SAS disk.
54. SQL STATEMENT ‘DUHS’
• Every table MUST have a primary key
• If possible, use a clustered index
• Only keep regularly used indexes around (e.g. foreign keys)
• WHERE > JOIN > ORDER BY > SELECT
• Don’t use SELECT *
• Try not to use COUNT(*) (in InnoDB, always a full table scan)
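Two of the “duhs” above in one example — the schema is hypothetical:

```sql
-- Bad: SELECT * drags every column off the disk
SELECT * FROM orders WHERE customer_id = 42;

-- Better: name the columns; an index on (customer_id, order_date)
-- can then answer the query on its own
SELECT customer_id, order_date
FROM orders
WHERE customer_id = 42
ORDER BY order_date;
```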
55. GENERAL OPTIMIZATION STRATEGY
Making the right hardware choices
Tuning the hypervisor to your database’s needs
Tuning the OS to your database’s needs
Squeezing every last bit of performance out of your
database
56. QUESTIONS?
I don’t have the attention span to keep up a blog :(
Results of benchmarks: http://www.anandtech.com/tag/IT