OpenDBCamp Virtualization

This is a presentation I gave on impulse at Open Database Camp in Sardegna, Italy last weekend, and then a bit less impulsively at the Inuits igloo.

A word of caution: I included the notes because they contain some extra info, but the presentation was hacked together from several older ones (not all of them my own) so there might be some flukes in there. :)

Transcript

  • 1. VIRTUAL DATABASES? Optimizing for Virtualization. Liz van Dijk (@lizztheblizz, liz@sizingservers.be). Sunday 8 May 2011
  • 2. THE GOAL. “Virtualized Databases suck!” - Is this really true? Does it have to be?
    Notes: Databases are supposed to be “hard to virtualize” and to lose performance in a virtual environment. This is actually correct: dumping a native database into a virtual environment without applying any changes could potentially cause some issues.
  • 3. HOW DO WE GET THERE? 1. Understand just why the virtual environment impacts performance, and take the correct steps to adapt our database to its new habitat. 2. Optimize, optimize, optimize...
    Notes: We have to understand why database performance is affected, and how we can arm ourselves against that impact. On the other hand, while there used to be less need for optimization when hardware was abundant, a virtual environment causes struggles for resources more quickly. It's important to make our application as slim as possible without losing performance; in many cases, performance can be multiplied by having a closer look at the database. Why is this interesting for you? This knowledge could convince you to make the switch to a virtual environment, trusting it won't hurt your software's performance, and it will help you look at your existing infrastructure and take the necessary steps to run your application as optimally as possible.
  • 4. THE INFLUENCE OF VIRTUALIZATION. All “kernel” activity is more costly: • Interrupts • System calls (I/O) • Memory page management
    Notes: Let's start with the understanding step: what could potentially slow down because of virtualization? The three most important aspects are: Interrupts - an actual piece of hardware is asking for attention from the CPU. Jumbo frames are a very good idea in a virtual environment, because sending the same data causes fewer interrupts (1500 → 9000 bytes per packet). System calls - a process asks the kernel to do a privileged task, like accessing certain hardware (network/disk I/O). Page management - the most important one for databases: think caching. The database keeps an enormous amount of data in its own caches, so memory is manipulated a lot of the time. Every time something changes in this memory, the virtual host has to perform a double translation: from virtual memory, through the VM page table, to the physical address. Usually this causes the biggest performance hit when switching from native to virtual, so we really have to do everything we can to minimize it.
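The jumbo-frames point in the notes can be made concrete with a bit of arithmetic. This is a rough illustration that ignores protocol headers, not a measurement:

```python
# Rough arithmetic behind the jumbo-frames advice: the same payload
# split into 9000-byte frames needs ~6x fewer packets (and thus
# fewer interrupts) than 1500-byte frames.

def packets_needed(payload_bytes, mtu):
    """Number of frames needed to carry payload_bytes at a given MTU."""
    return -(-payload_bytes // mtu)  # ceiling division

payload = 1_000_000_000  # a 1 GB transfer
standard = packets_needed(payload, 1500)
jumbo = packets_needed(payload, 9000)
print(standard, jumbo, round(standard / jumbo, 1))  # roughly a 6x reduction
```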
  • 5. GENERAL OPTIMIZATION STRATEGY. Making the right hardware choices. Tuning the hypervisor to your database's needs. Tuning the OS to your database's needs. Squeezing every last bit of performance out of your database.
    Notes: Performance issues should be dealt with systematically, and we can split that process up into these four steps.
  • 6. HARDWARE CHOICES. • Choosing the right CPUs: Intel 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization) • Choosing the right NICs (VMDQ) • Choosing the right storage system (iSCSI vs FC SAN)
    Notes: CPUs → HW virtualization (ring -1) and HAP; best price/quality at the moment. The Opteron 6000 series is very good at data mining/decision support; the Xeon 5600 series is still very good at OLTP. VMDQ = sorting/queueing offloaded to the NIC.
  • 7-9. CPU EVOLUTION [image-only slides]
  • 10. OVERVIEW: NIC - VMDQ / NETQUEUE
    Netqueue devices | Part nr | Speed | Interface
    Intel Ethernet Server Adapter X520-SR2, 2 ports | E10G42BFSR | 10Gbps | SR-LC
    Intel Ethernet Server Adapter X520-DA2, 2 ports | E10G42BTDA | 10Gbps | SFP+
    Intel Gigabit ET Dual Port Server Adapter, 2 ports | E1G42ET | 1Gbps | RJ-45 copper
    Intel Gigabit EF Dual Port Server Adapter, 2 ports | E1G42EF | 1Gbps | RJ-45 fibre
    Intel Gigabit ET Quad Port Server Adapter, 4 ports | E1G44ET | 1Gbps | RJ-45 copper
    Intel Gigabit CT Desktop Adapter | EXPI9301CT | 1Gbps | RJ-45 copper
    Supermicro Add-on Card AOC-SG-I2, 2 ports | AOC-SG-I2 | 1Gbps | RJ-45 copper
    Onboard 82576 (8 virtual queues); onboard 82574: no IOV
    Broadcom NetXtreme II Ethernet chipset | | 1-10 Gbps |
    All Neterions | | 1-10 Gbps |
  • 11-12. SAN CHOICES. • Fibre Channel: ESX with FC-HBA; vSphere: FC-HBA pass-through to guest OS. • iSCSI (using 10Gbit if possible): ESX with hardware initiator (iSCSI HBA); ESX with software initiator; initiator inside the guest OS; vSphere: iSCSI HBA pass-through to guest OS. The server exposing storage over (hardware) iSCSI is the iSCSI target; the (virtualization) server with the (hardware) iSCSI adapter is the iSCSI initiator.
    Notes: 10Gbit = high CPU overhead! We're talking 24 GHz of CPU to fill up 9 Gbit. This problem can be reduced by the following technologies: VT-d → moving DMA and address translation to the NIC; VMDQ/Netqueue → Netqueue is pretty much VMware's implementation of VMDQ; SR-IOV → allowing one physical device (NIC) to present itself as multiple virtual devices.
  • 13. GENERAL OPTIMIZATION STRATEGY. Making the right “hardware” choices. Tuning the hypervisor to your database's needs. Tuning the OS to your database's needs. Squeezing every last bit of performance out of your database.
  • 14-17. VIRTUAL MEMORY [diagram built up over four slides: virtual memory pages 1-12, a page table mapping them to physical addresses 0xA-0xH, and the TLB caching the most recent entries]
    Notes: Physical memory is divided into segments of 4KB, which software translates into so-called pages: small chunks, each with its own address, which the CPU uses to find the data in physical memory. Within an OS, a piece of software always gets a continuous block of “virtual” memory assigned to it, even though the physical memory is fragmented; this prevents a coding nightmare (keeping track of every single page address is madness). The page table was made for the CPU to run through to make the necessary translation to physical memory. The CPU has a hardware cache that keeps track of these entries, the Translation Lookaside Buffer (TLB): an extremely fast buffer that saves the most recent addresses, so the CPU can avoid walking the page table as much as possible. (CPUs with hardware support: AMD - all quad-core Opterons; Intel - Xeon 5500, 7500, 5600.)
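The mechanism these slides build up can be sketched as a toy model. The physical addresses mirror the slides' 0xA-0xH labels, but the table contents and access pattern are invented for illustration:

```python
# Toy model of address translation: the OS keeps a page table
# (virtual page -> physical frame); the CPU caches recent entries
# in a TLB so repeated lookups avoid walking the table.

PAGE_TABLE = {1: "0xD", 2: "0xC", 3: "0xF", 4: "0xA",
              5: "0xH", 6: "0xG", 7: "0xB", 8: "0xE"}

tlb = {}     # small cache of recent translations
walks = 0    # how often we had to walk the page table

def translate(vpage):
    global walks
    if vpage in tlb:           # TLB hit: no page-table walk needed
        return tlb[vpage]
    walks += 1                 # TLB miss: walk the page table
    frame = PAGE_TABLE[vpage]
    tlb[vpage] = frame         # cache the translation
    return frame

for v in [1, 5, 1, 1, 5, 2]:   # repeated accesses mostly hit the TLB
    translate(v)
print(walks)  # 3 walks for 6 accesses
```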
  • 18-20. SPT VS HAP [diagram built up over three slides: each VM's “read-only” page table, the hypervisor-managed “shadow” page table, and a HAP-style TLB caching the complete VM-to-host translation]
    Notes: In a virtual environment the guest OS is not allowed direct access to memory, so this was solved in a different way. Each VM gets access to its own page table, but that table is actually locked/read-only; as soon as a change is made, a “trap” is generated and the hypervisor is forced to take over and handle the page management. This causes a lot of overhead, because every single memory-management action forces the hypervisor to intervene. As an alternative, new CPUs came to the market with a modified TLB cache, able to keep track of the complete translation path (VM virtual address → VM physical address → host physical address). Downside: because of this, filling up the TLB got a lot more complex, and a page that is not yet cached is expensive to find. Once the TLB is properly warmed up, though, most applications rarely have to wait for other pages.
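The trade-off in the notes (cheap hits, expensive misses) can be sketched the same way: under hardware-assisted paging, a TLB miss has to walk two tables instead of one. Table contents here are invented for illustration:

```python
# Why TLB misses get pricier with hardware-assisted paging: a miss
# requires a two-level walk (guest page table, then the hypervisor's
# table), while a hit still resolves in a single TLB lookup.

GUEST_PT = {1: "gpa1", 5: "gpa5", 2: "gpa2"}              # guest VA -> guest PA
HOST_PT = {"gpa1": "0xD", "gpa5": "0xH", "gpa2": "0xC"}   # guest PA -> host PA

tlb = {}         # caches the *complete* guest-VA -> host-PA path
table_reads = 0  # memory touches spent walking tables

def translate(vpage):
    global table_reads
    if vpage in tlb:
        return tlb[vpage]      # hit: full path already cached
    table_reads += 1
    gpa = GUEST_PT[vpage]      # level 1: the guest's page table
    table_reads += 1
    hpa = HOST_PT[gpa]         # level 2: the hypervisor's table
    tlb[vpage] = hpa
    return hpa

for v in [1, 5, 1, 5, 1]:
    translate(v)
print(table_reads)  # 4: two misses, each costing a two-table walk
```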
  • 21. HAP [benchmark slide]
    Notes: As you can see, in general this does help improve performance, though not by a really huge amount. It opens the door to a great combination with another technique, though!
  • 22-23. HAP + LARGE PAGES. Setting large pages: • Linux - increase SHMMAX in rc.local • Windows - grant “Lock Pages in Memory” • MySQL (InnoDB only) - large-pages • Oracle - ORA_LPENABLE=1 in the registry • SQL Server - Enterprise only, needs >8GB RAM; for the buffer pool, start up with trace flag -834
    Notes: While using HAP, you should definitely make use of large pages, because filling up the TLB is a lot more expensive. By using large pages (2MB instead of 4KB), a single entry covers a LOT more memory; combined with the bigger TLB in the newest CPUs, this helps prevent entries from disappearing from the TLB too fast. Oracle registry path: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_HOME_NAME.
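A minimal Linux sketch of the first and third bullets above (kernel huge pages plus MySQL's large-pages option). The page count and shared-memory limit below are placeholders; size them to your buffer pool:

```shell
# Reserve 2 MB huge pages (here: ~4 GB worth) and raise the shared
# memory limit, e.g. from rc.local; both values are placeholders.
echo 2048 > /proc/sys/vm/nr_hugepages
echo 4294967296 > /proc/sys/kernel/shmmax

# Then enable large pages in my.cnf:
#   [mysqld]
#   large-pages
```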
  • 24-29. VIRTUAL HBA'S. • Choices (ESX): Before vSphere: BusLogic Parallel (legacy); LSI Logic Parallel (optimized). Since vSphere: LSI Logic SAS (default as of Win2008); VMware Paravirtual (PVSCSI). • Thin vs thick provisioning (vSphere). • Snapshots and performance do not go together.
    Notes: BusLogic → generic adapter. LSI Logic → optimized adapter that requires VMware Tools. LSI Logic SAS → presents itself as a SAS controller (necessary for Windows clustering). PVSCSI → fully paravirtualized high-performance adapter, created to use iSCSI from the guest; supports command queueing.
  • 30-34. VIRTUAL NIC'S. • Choices (ESX): Before vSphere: Flexible (emulation); E1000 (Intel E1000 emulation, default on x64); (enhanced) VMXNET (paravirtual). Since vSphere: VMXNET 3 (third-generation paravirtual NIC). • Jumbo frames, NIC teaming, VLANs. • Colocation (minimize NIC traffic by sharing a host).
    Notes: Flexible → required for 32-bit systems; automatically turns into a VMXNET after installing VMware Tools. VMXNET adds jumbo frames. VMXNET3 adds: MSI/MSI-X support (if supported by the guest OS kernel); Receive Side Scaling (Windows 2008 only); IPv6 checksum and TCP segmentation offloading (segmentation of packets moves to the NIC, not the CPU); VLAN offloading; bigger TX/RX ring sizes; optimizations for iSCSI and VMotion. VMXNET3 is necessary for VMDq!
  • 35. GENERAL OPTIMIZATION STRATEGY. Making the right hardware choices. Tuning the hypervisor to your database's needs. Tuning the OS to your database's needs. Squeezing every last bit of performance out of your database.
  • 36. BEST OS CHOICES. • 64-bit Linux for MySQL • MySQL 5.1.32 or later • ...? (discuss mode on! :) )
    Notes: Modified mutexes for InnoDB = improved locking for multithreaded environments, which allows for much better scaling.
  • 37. DON'T FORGET. VMware: VMware Tools - paravirtualized VMXNET and PVSCSI drivers, ballooning, time sync, and more recent drivers. Hyper-V: Integration Services - paravirtualized drivers, hypercall adapter, time sync, and more recent drivers.
    Notes: Definitely install the tools of the hypervisor in question to enable its newest functionality. This is very important if you want to use, for example, memory overcommitment in ESX, or paravirtualization in Linux on Hyper-V.
  • 38. CACHING LEVELS. • CPU • Application • Filesystem / OS • RAID controller (switch off or use a BBU!) • Disk
    Notes: CPU: just buy the right CPU. App/FS: use the correct settings (direct I/O). RAID controller: use a battery-backed unit (transactional databases put lots of random writes in the cache, so to be safe the RAID controller keeps track of them); this cache is mostly used as a write buffer. Disk: if cache is available on-disk, it's best to disable it, so nothing can get stuck in the caches when the power drops. HP disables these by default.
  • 39. GENERAL OPTIMIZATION STRATEGY. Making the right hardware choices. Tuning the hypervisor to your database's needs. Tuning the OS to your database's needs. Squeezing every last bit of performance out of your database.
  • 40. DIRECT IO. • Less page management • Smallest cache possible vs less I/O. SQL Server: automatic. MySQL: only for use with InnoDB! - innodb_flush_method=O_DIRECT. Oracle: filesystemio_options=DIRECTIO
    Notes: Though in Windows this is on by default, in Linux it should definitely be enabled. Otherwise everything already cached by the InnoDB buffer pool may also be cached by the filesystem cache, so two separate but identical caches need to be maintained in memory: far too much memory management. MySQL's MyISAM actually depends on this filesystem cache; it expects the OS to do the brunt of the caching work itself.
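As a concrete my.cnf fragment for the MySQL line above (InnoDB only; leave MyISAM-heavy servers alone, since MyISAM relies on the filesystem cache):

```ini
[mysqld]
# Bypass the filesystem cache so data is cached once, in the buffer pool
innodb_flush_method = O_DIRECT
```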
  • 41. GENERAL MY.CNF OPTIMIZATIONS. • max_connections (151) (file descriptors!) • Per connection: read_buffer_size (128K) (full scan); read_rnd_buffer_size (256K) (ORDER BY); sort_buffer_size (2M) (sorts); join_buffer_size (128K) (full scan join)
  • 42. GENERAL MY.CNF OPTIMIZATIONS. • thread_cache (check out max_used_connections) • table_cache (64) - table_open_cache (5.1.3x): aim for the opened_tables counter to stay roughly constant relative to open_tables (Δ ≈ 0) • Engine dependent: innodb_buffer_pool_size; innodb_thread_concurrency
    Notes: Try to fit max_used_connections into the thread_cache IF POSSIBLE.
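The sizing advice above can be sanity-checked on a running server. These are standard MySQL status/variable queries; what counts as a healthy value is per-workload:

```sql
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- try to fit this into thread_cache_size
SHOW GLOBAL STATUS LIKE 'Opened_tables';         -- should stop growing once caches are warm
SHOW GLOBAL VARIABLES LIKE 'table_open_cache';   -- raise if Opened_tables keeps climbing
```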
  • 43. INDEXING. • Heaps • Unclustered indexes • Clustered indexes (InnoDB)
  • 44. INDEX FRAGMENTATION (clustered index leaf level). • Happens with clustered indexes • Large-scale fragmentation of the indexes can cause serious performance problems • Fixes: SQL Server: REBUILD/REORGANIZE; MySQL: ALTER TABLE tbl_name ENGINE=INNODB; Oracle: ALTER INDEX index_name REBUILD
  • 45-52. STORAGE ENGINE INTERNALS [diagram built up over eight slides: updates, inserts, and deletes from the DB front end land in the buffer pool cache and the transaction log; the checkpoint process later flushes dirty pages to the datafile]
    Notes: SQL Server → set memory options in Server Properties > Memory > Server Memory Options.
  • 53. DATA AND LOG PLACEMENT [benchmark slide]
    Notes: This is most important for transactional databases. As you can see, the difference between using a decent SAS disk and an SSD for the database log is negligible. There is no use sinking cash into an SSD for logs; just get a decent, fast SAS disk.
  • 54. SQL STATEMENT ‘DUHS’. • Every table MUST have a primary key • If possible, use a clustered index • Only keep regularly used indexes around (e.g. FKs) • WHERE > JOIN > ORDER BY > SELECT • Don't use SELECT * • Try not to use COUNT() (in InnoDB this is always a full table scan)
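To illustrate the last three bullets with a hypothetical orders table (table and column names are made up):

```sql
-- Fetch only the columns you need, filtered by an indexed predicate,
-- instead of SELECT *:
SELECT order_id, total
FROM orders
WHERE customer_id = 42;   -- assumes an index on customer_id

-- COUNT(*) forces a full table scan in InnoDB; when an estimate will do:
SHOW TABLE STATUS LIKE 'orders';  -- the Rows column is approximate
```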
  • 55. GENERAL OPTIMIZATION STRATEGY. Making the right hardware choices. Tuning the hypervisor to your database's needs. Tuning the OS to your database's needs. Squeezing every last bit of performance out of your database.
  • 56. QUESTIONS? I don't have the attention span to keep up a blog :( Results of benchmarks: http://www.anandtech.com/tag/IT