Designing Information Structures For Performance And Reliability
  • Keep relevant data closest to the CPU, in memory, once it has been read from disk. More memory reduces the need for costly “page-in” operations by reducing how often data must be “paged out” to make space for new data. Memory bus speed is still much slower than CPU speed and often becomes a bottleneck as CPUs get faster, so it’s important to have the fastest memory speed and FSB your chipset supports. More CPU cores let you parallelize workloads: a multithreaded database takes advantage of multiprocessing by distributing a query across several threads on multiple CPUs, drastically reducing the query’s processing time. Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries; OLAP databases benefit especially because they scan large datasets frequently. RAID aggregates disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write it back during commits, while also providing large capacity, redundancy and fault tolerance.
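As a back-of-envelope sketch of why more memory helps, here is the classic effective-access-time calculation; the latency numbers are assumed for illustration, not measured on our hardware:

```python
# Rough, assumed latencies for illustration (not measured values).
RAM_NS = 100            # ~100 ns main-memory access
DISK_NS = 5_000_000     # ~5 ms average disk access (seek + rotation)

def effective_access_ns(hit_ratio: float) -> float:
    """Average access time when data is found in memory with probability
    hit_ratio and must be paged in from disk otherwise."""
    return hit_ratio * RAM_NS + (1.0 - hit_ratio) * DISK_NS

# More memory -> higher hit ratio -> sharply lower average latency.
for h in (0.90, 0.99, 0.999):
    print(f"hit ratio {h:.3f}: {effective_access_ns(h) / 1000:.1f} us")
```

Even a small improvement in the in-memory hit ratio cuts the average access time by an order of magnitude, which is why page-ins dominate performance.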
  • Set realistic goals; know the hardware’s expected limitations. Measure current performance. Analyze the results (research upgrades and possible performance problems). Modify the system. Benchmark again. Repeat as needed.
  • Client issues a query across the network. The database server searches cache and memory for the database extents; if they’re not found in memory, they’re located on disk. Disks then seek out the blocks containing the database extents and begin loading the data into memory. Memory pages are then fed into the CPU’s cache and ultimately into the CPU for processing. Results are found and sent back to the client.
  • Each CPU has several cores. Internal Clock Speed: operations processed per second, in MHz or GHz (the advertised figure). External Clock Speed: the speed at which the FSB is accessed (a typical bottleneck). Memory Clock Speed: the speed at which RAM is given requests for data (another bottleneck). PostgreSQL is multi-process, with one Unix process per DB connection; a single connection can only use one CPU, as it is not multithreaded.
  • CPU speed has increased roughly 70% each year; memory speed hasn’t kept up. DDR (double data rate) memory sends data to the CPU on both the rising and falling edges of the clock cycle, doubling throughput, but memory is still a bottleneck. The memory tradeoff is typically speed for capacity: faster means less capacity and more expense. The further out from the CPU you go, the slower and larger the storage. Disk is also the only permanent storage, and it holds swap.
  • First-generation Xeon multiprocessing bottlenecks (still bottlenecks today, but less so): a shared FSB between processors halves bandwidth to memory, as the second processor competes for FSB bandwidth. Memory access is delayed between the controller and the memory bank. Bandwidth between the I/O controller (Southbridge) and the memory controller (Northbridge) is congested, as is bandwidth from the Northbridge to the expansion slots.
  • Here you see how each Intel processor shares a common FSB, dividing the bandwidth per CPU. Access to memory must happen at that reduced bandwidth, through the Northbridge memory controller, and into the memory banks. AMD’s approach places a Northbridge controller directly on each processor, so there are no external chipsets to deal with. Each processor features three point-to-point HyperTransport links, delivering 3.2 GB/s of bandwidth in each direction (6.4 GB/s full duplex). So AMD’s scalability was better in the earlier days of Xeon multiprocessing.
  • “Second Generation” (Harpertown) Xeon processors, E5200/E5400: each CPU has a clock speed of 2GHz, 12MB of L2 cache, and a 1333MHz FSB (1600MHz max on other models). The read bandwidth for each DDR2 667MHz memory channel is ~5.3 GB/s, giving a total read bandwidth of 21.3 GB/s across four memory channels. Write bandwidth through the same four channels is 10.7 GB/s. Overall effective bandwidth to memory is then 32 GB/s: 21.3 GB/s read plus 10.7 GB/s write. The 5500-series “Nehalem-EP” (Gainestown), December 2008, adds: an integrated memory controller supporting 2-3 DDR3 memory channels; a point-to-point processor interconnect called “QuickPath” (like AMD’s HyperTransport) that bypasses the FSB; and Hyper-Threading, doubling the thread count of each core.
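The per-channel figure above follows from transfer rate times bus width; a quick check (64 bits is the standard DDR2 channel width, and these are peak rates):

```python
def channel_bandwidth_gbs(megatransfers: float, bus_bytes: int = 8) -> float:
    """Peak bandwidth of one memory channel: transfer rate x bus width.
    A DDR2-667 channel moves ~667 million transfers/s over a 64-bit (8-byte) bus."""
    return megatransfers * 1e6 * bus_bytes / 1e9

per_channel = channel_bandwidth_gbs(667)   # ~5.3 GB/s per channel
print(f"{per_channel:.2f} GB/s per channel, {4 * per_channel:.1f} GB/s over four channels")
```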
  • There are three delays associated with reading or writing to a hard drive: seek time, rotational delay, and transfer time. Seek time is the time it takes for the drive’s read/write head to be physically moved into the correct position for the data being sought. Rotational delay is the time required for the addressed area of the disk to rotate into a position where it is accessible by the read/write head, typically measured in milliseconds. Transfer time is the time it takes to move data from the disk through the read/write head, across the storage bus, and into memory for processing by the CPU. Seek time and rotational delay are heavily influenced by the disk’s rotational speed (RPM), where the data sits on the platters, how many platters the disk has, and the platters’ diameter. Generally speaking, the faster a disk spins, the lower its rotational delay; and the closer to the outer circumference of the platter data is located, the faster it is read. Bandwidth/throughput (transfer time): once data is located, this is the raw rate at which it is transferred from disk into memory. This can be aggregated using RAID, which will be discussed later. SATA-I bandwidth is 1.5Gb/s, which translates into ~150MB/s real speed; SATA-II and SAS bandwidth is 3Gb/s, which translates into ~300MB/s real speed. Generally, the higher the data density of the platter, the more data is sent through the read/write head per block, resulting in higher throughput and lower transfer times.
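The rotational-delay component can be estimated directly from spindle speed; a minimal sketch:

```python
def avg_rotational_delay_ms(rpm: int) -> float:
    """Average rotational delay: on average the head waits half a revolution.
    Returns milliseconds."""
    seconds_per_rev = 60.0 / rpm
    return (seconds_per_rev / 2.0) * 1000.0

print(avg_rotational_delay_ms(7200))    # ~4.17 ms for a 7200RPM SATA drive
print(avg_rotational_delay_ms(15000))   # 2.0 ms for a 15,000RPM SAS drive
```

This is why 15K SAS drives roughly halve the rotational component of access time compared with 7200RPM SATA.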
  • Buffer/Cache: with a write-back cache, data normally written to disk by the CPU is first written into the disk’s cache. This allows for higher write performance, with the risk that data stored in the cache isn’t flushed to disk before a power failure. During idle machine cycles, the data is written from the cache onto the disk. Write-back caches improve performance because a write to the high-speed cache is faster than one to normal RAM or disk; this cache helps address the disk-to-memory subsystem bottleneck. I’ve enabled write-back caching on all of our RAID arrays. RAID will be discussed later.
  • Track Data Density: defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store on one track, and the less often the head has to move to the next track to get the required data.
  • RAID (n = number of drives in array): “Redundant Array of Inexpensive Disks”. RAID systems improve performance by letting the controller exploit multiple hard disks to get around the performance-limiting mechanical issues that plague individual drives. Different RAID implementations improve performance in different ways and to different degrees, but all improve it in some way.
RAID0 “Striping” (n): fastest due to no parity, with raw cumulative speed. A single drive failure causes the entire array to fail: “all-or-none”.
RAID1 “Mirroring” (n/2): each drive is mirrored; speed and capacity are half that of RAID0, and an even number of disks is required. The entire source or mirror set can go bad before data is jeopardized.
RAID5 “Striping w/Parity” (n - 1): fast, with one drive’s worth of capacity set aside for fault tolerance. Only one drive can fail before the array is lost.
RAID6 “Striping w/Dual Parity” (n - 2): fast, with two drives’ worth of capacity set aside for fault tolerance. Two drives can fail before the array is lost.
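The capacity formulas above can be sketched as a small helper (illustrative only; n is the drive count as defined above):

```python
def raid_usable_drives(level: int, n: int) -> int:
    """Usable data drives for the common RAID levels, per the n-based formulas."""
    if level == 0:
        return n            # striping: all drives hold data
    if level == 1:
        assert n % 2 == 0, "mirroring needs an even number of drives"
        return n // 2       # half the drives are mirrors
    if level == 5:
        return n - 1        # one drive's worth of parity
    if level == 6:
        return n - 2        # two drives' worth of parity
    raise ValueError("unsupported RAID level")

# Six 1TB drives in RAID5 leave five drives' worth of usable capacity:
print(raid_usable_drives(5, 6))  # 5
```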
  • Normal PCI’s bandwidth is 132MB/s; AGP 8x is 2.1GB/s. PCI Express outperforms PCI significantly. PCIe is bidirectional/full-duplex, allowing data to flow in both directions simultaneously (doubling throughput):
PCIe x1 = 500MB/s (250MB/s each way)
PCIe x2 = 1GB/s (500MB/s each way)
PCIe x4 = 2GB/s (1GB/s each way)
PCIe x8 = 4GB/s (2GB/s each way)
PCIe x16 = 8GB/s (4GB/s each way)
PCIe x32 = 16GB/s (8GB/s each way)
So internally you want to use PCIe, not plain PCI, which is old and slow in comparison; regular PCI is a bottleneck in modern computers. AGP is also obsolete since PCIe’s introduction, and graphics cards now use PCIe as well.
All of our 2950 servers have PERC6/i RAID controllers built in; the “i” means “integrated” on the motherboard. I found that our throughput was significantly slower than expected, despite having six SATA-II drives, even in RAID0. The settings we selected for the RAID virtual drives were: Stripe Element Size 64KB; Read Policy: Adaptive Read-Ahead (to optimize large read operations); Write Policy: Write-Back. We were seeing read speeds in RAID5 of approximately 150-225MB/s across 4 drives, which we knew was far too slow given the hardware. After rebuilding the array several times and searching online, I came across DELL’s PERC firmware update site, which showed that a newer release was available, v.6.2.0-0013: “performance enhancements including significant improvements in random-write performance, multi-threaded write performance, and reduction in maximum and average I/O response times.” I couldn’t flash the PERC controllers without a floppy, so I created a FreeDOS bootable CD with the updated PERC firmware in a subdirectory, which let me successfully flash the controller’s BIOS.
Later, I discovered that DELL’s OpenManage CD provides a tool to handle BIOS updates; however, I wasn’t able to get it working, so the FreeDOS solution worked out. I also found that I could set filesystem read-ahead through “hdparm” in Linux, telling the OS to read ahead 2048 blocks whenever a read operation is performed; I set this in /etc/rc.local to persist after a reboot. Once the PERC controller was flashed and Linux read-ahead was set, performance increased dramatically: we now see just over 500MB/s reads in RAID5/6. This significantly reduces the time it takes to load tables into memory for complex queries, thereby reducing overall query execution time. Performance is now on par with GreenPlum without having to pay $40,000/year licensing.
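A sanity check of the kind that flagged our slow array: expected RAID5 sequential-read throughput under a simple back-of-envelope model. Real controllers can read from all spindles, so treat this as a rough lower bound, not a precise prediction:

```python
def expected_raid5_read_mbs(n_drives: int, per_drive_mbs: float) -> float:
    """Back-of-envelope RAID5 sequential-read bound: at least (n - 1) drives'
    worth of data bandwidth (one drive's capacity is consumed by parity;
    reads may actually touch all spindles, so this underestimates)."""
    return (n_drives - 1) * per_drive_mbs

# Four SATA-II drives at ~150 MB/s each should read well above the
# 150-225 MB/s we were observing before the firmware fix:
print(expected_raid5_read_mbs(4, 150.0))  # 450.0
```

When measured throughput falls far below even this conservative bound, suspect the controller (firmware, cache policy) or OS settings rather than the disks.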
  • Database Types:
OLAP (Online Analytical Processing): provides the big picture, supports analysis, needs aggregate data, evaluates all datasets quickly, uses a multidimensional model. DB size is typically 100GB to several TB (even petabytes). Mostly read-only operations, lots of scans, complex queries. Benefits from multi-threading, parallel processing, and fast drives with high read throughput and low seek times. Key performance metrics: query throughput, response time.
OLTP (Online Transactional Processing): provides a detailed audit, supports operations, needs detailed data, finds one dataset quickly, uses a relational model. DB size is typically < 100GB. Short, atomic read/write transactions. Key performance metrics: transaction throughput, availability.
  • Database Types: the time and expense involved in retrieving answers from databases means that a lot of business intelligence information often goes unused. The reason: most operational (OLTP) databases are designed to store your data, not to help you analyze it. The solution: an online analytical processing (OLAP) database, a specialized database designed to help you extract business intelligence from your data.
  • A connection from an application program to the PostgreSQL server has to be established. The parser stage checks the query transmitted by the application program for correct syntax and creates a query tree. The rewrite system takes the query tree created by the parser stage and looks for any rules (stored in the system catalogs) to apply to it, performing the transformations given in the rule bodies. The planner/optimizer takes the (rewritten) query tree and creates a query plan that will be the input to the executor. It does so by first creating all possible paths leading to the same result; for example, if there is an index on a relation to be scanned, there are two paths for the scan: a simple sequential scan, or use of the index. The cost of executing each path is estimated and the cheapest path is chosen. The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications, and finally hands back the rows derived.
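The planner's cheapest-path choice can be illustrated with a toy cost model. All constants below are invented for illustration; PostgreSQL's real cost model is far more detailed:

```python
# Toy cost model in the spirit of the planner description above.
def seq_scan_cost(pages: int, rows: int) -> float:
    """Sequential scan: read every page, examine every row."""
    return pages * 1.0 + rows * 0.01

def index_scan_cost(matching_rows: int, rows: int, pages: int) -> float:
    """Index scan: random page reads dominate, scaled by selectivity."""
    selectivity = matching_rows / rows
    return matching_rows * 4.0 + selectivity * pages

def choose_path(pages: int, rows: int, matching_rows: int) -> str:
    """Estimate the cost of each candidate path and pick the cheapest."""
    paths = {"seq scan": seq_scan_cost(pages, rows),
             "index scan": index_scan_cost(matching_rows, rows, pages)}
    return min(paths, key=paths.get)

print(choose_path(10_000, 1_000_000, 50))        # highly selective -> index scan
print(choose_path(10_000, 1_000_000, 900_000))   # touches most rows -> seq scan
```

The key idea matches the text: a selective predicate favors the index, while a query that touches most of the table is cheaper as a sequential scan.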
  • GreenPlum and PostgreSQL: we found that, despite the claims above, GreenPlum was overpriced, slow, and problematic. Furthermore, our GreenPlum PSA database grew to exceed the hardware we had in place, forcing us to constantly delete old tables by hand. To replace GreenPlum while maintaining the table structures already in place, we opted for PostgreSQL, aware that it isn’t pre-optimized for OLAP/data-warehouse applications. The mindset was that we could tweak PostgreSQL to match the actual performance we saw from GreenPlum without paying an expensive license. Doing so requires an understanding of PostgreSQL’s query execution characteristics and its configuration-file concepts.
  • max_connections sets the maximum number of client connections per server; several performance parameters use max_connections as part of their sizing formulas. shared_buffers: as the name implies, this is the maximum shared memory allotted to PostgreSQL; too much and you risk paging. work_mem (working memory): you need to consider what max_connections is set to in order to size this parameter correctly. Data-warehouse systems, where users submit very large queries, can readily make use of many gigabytes of memory here, but the size is applied to each and every sort done by each user, and complex queries can use multiple working-memory sort buffers. Set it to 50MB with 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, a query that merge-sorts 8 tables requires 8 times work_mem.
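The work_mem arithmetic above can be sketched as a worst-case estimator (assuming every connection sorts at once, which is the pessimistic case you size for):

```python
def worst_case_sort_memory_mb(work_mem_mb: int, max_connections: int,
                              sorts_per_query: int = 1) -> int:
    """Worst-case memory for sorts: every connection runs a query that
    allocates work_mem once per sort/merge step."""
    return work_mem_mb * max_connections * sorts_per_query

# 50 MB work_mem with 30 active users: 1.5 GB of real memory.
print(worst_case_sort_memory_mb(50, 30))      # 1500
# A single query merge-sorting 8 tables can need 8 x work_mem by itself:
print(worst_case_sort_memory_mb(50, 1, 8))    # 400
```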
  • Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits: Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table. Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE is far faster than a bulk operation. It also entirely avoids the VACUUM overhead caused by a bulk DELETE. Seldom-used data can be migrated to cheaper and slower storage media.
  • Vacuuming ensures the databases remain ACID: Atomic, Consistent, Isolated, and Durable. Atomicity: guarantees that either all of the tasks of a transaction are performed, or none. Consistency: only valid data will ever be written to the database. Isolation: other operations cannot access or see data in an intermediate state during a transaction. Durability: once the user has been notified of a transaction’s success, the transaction will persist and not be undone, thus surviving a system failure.
  • Disk I/O was roughly 90MB/s. The database volume was constantly around 90% capacity, forcing Mike to manually delete tables; space was at a premium. Licensing with GreenPlum was expensive ($20,000 per 6 months, i.e. $40,000/yr), so it made sense to leverage the underlying free open-source code and scrap the proprietary distributed DB solution.
  • Performance challenges with this server:Limited capacity, drives were small and slow (150MB/s).RAID controller’s SATA-1 interface didn’t recognize higher capacity SATA-II drives, despite SATA-II backwards compatibility.No floppy drive or USB boot capability…making it difficult to flash the controller and BIOS for SATA-II backwards compatibility. No PCIe expansion bays, only PCI-X, ruling out high-performance external enclosures.PostgreSQL requires significant configuration tweaks to realize decent performance.PostgreSQL isn’t multithreaded. A single query process (regardless of its complexity) uses only 1 CPU core.
  • This is our third (and current) generation PSA box: a DELL PowerEdge 2950 with dual quad-core Xeon processors @ 2.5GHz, 16GB DDR2 memory @ 667MHz (double data rate), and a 1333MHz FSB. Six SATA-II 1TB 7200RPM drives; throughput is spread across the spindles, one drive per RAID5 set can fail, and effective capacity is roughly 4.5TB overall. Drive setup (via the PERC6/i BIOS menu; hardware RAID, transparent to the OS; battery backup enabled for write caching): Virtual Drive 1 (physical drives 1 and 2), RAID1 for the system (1TB), 64KB stripe element size, write-back enabled; Virtual Drive 2 (physical drives 3-6), RAID5 for PostgreSQL data (3TB), 64KB stripe element size, write-back enabled. I/O performance: read 507MB/s, write 401MB/s.
  • Through a process of elimination and online research (Google and PostgreSQL forums) we settled on the above settings in the PSA server’s configuration file. max_connections = 25: Mike confirmed that only 10-15 PSA clients ever really connect at any given time, so this setting allows for spikes while remaining conservative enough not to inflate work_mem, which uses max_connections in its memory-allocation formula. shared_buffers = 4096MB: this number comes from best practice, ¼ of total physical memory (16GB).
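The shared_buffers rule of thumb as a one-liner; a sketch of the ¼-of-RAM heuristic used here, not an official PostgreSQL formula:

```python
def shared_buffers_mb(total_ram_gb: int, fraction: float = 0.25) -> int:
    """Best-practice starting point used on the PSA box:
    shared_buffers = 1/4 of total physical memory, in MB."""
    return int(total_ram_gb * 1024 * fraction)

# 16 GB server -> 4096 MB, matching the setting above.
print(shared_buffers_mb(16))  # 4096
```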
  • Transcript

    • 1. Designing Information Structures for Performance and Reliability
      Key elements to maximizing DB Server Performance
      Bryan Randol
      IT/Systems Manager
      1
    • 2. Designing Information Structures for Performance and Reliability : Discussion Outline
      DAY 1: Hardware Performance:
      Systematic Tuning Concepts
      CPU
      Memory Architecture and Front-Side Bus (FSB)
      Data Flow Concepts
      Disk Considerations
      RAID
      DAY 2: Database Performance:
      OLAP vs. OLTP
      GreenPlum vs. PostgreSQL
      PostgreSQL Concepts and Performance Tweaking
      PSA v.1 – GreenPlum AOPen mini-PCs “dbnode1-dbnode6”
      PSA v.2 – Tyan Transport w/PostgreSQL
      PSA v.3 – Current PSA Implementation, DELL PowerEdge 2950 w/PostgreSQL 8.3
      2
    • 3. 3
      I. Database Server Performance: Hardware & Operating System Considerations
      DAY 1: Hardware Performance
    • 4. 4
      Designing Information Structures for Performance and Reliability : Discussion Outline
      Systematic tuning essentially follows these five steps:
      Assess the problem and establish numeric values that categorize acceptable behavior. (Know the system’s specifications and set realistic goals.)
      Measure the performance of the system before modification. (Benchmark)
      Identify the part of the system that is critical for improving the performance. This is called the “bottleneck”. (Analyze)
      Modify that part of the system to remove the bottleneck. (Upgrade/Tweak)
      Measure the performance of the system after
      modification. (Benchmark)
      Repeat steps 2-5 as needed.
      (Continuous Improvement)
    • 5. 5
      I. Database Server Performance: Data Flow Concepts
      DB Files are stored in the filesystem on disk in blocks.
      A “job” is requested, initiating a “process thread”, associated files are read into memory “pages”.
      Memory pages are read into the CPU’s cache as needed.
      “Page-outs” to disk occur to make space as needed.
      “Page-ins” from disk are what slow down performance.
      Once in CPU cache, jobs are processed in threads per CPU (or “core”).
    • 6. 6
      I. Database Server Performance: Hardware & Operating System Considerations
      Server Performance Considerations:
      CPU:
      Each CPU has at least one core, each core processes jobs (threads) sequentially based on the job’s priority. Higher priority jobs get more CPU time. Multi-threaded jobs are distributed evenly across all cores (“parallelized”).
      Internal Clock Speed: Operations the CPU can process internally per second in MHz, as advertised.
      External Clock Speed: Speed at which the CPU interacts with
      the rest of the system….also known as the front side bus (FSB).
      Memory Clock Speed: Speed at which RAM is given requests for data.
      Important PostgreSQL Performance Note:
      PostgreSQL uses a multi-process model, meaning each database connection has its own Unix process. Because of this, any multi-CPU operating system can spread multiple database connections among the available CPUs.
      However, if only a single database connection is active, it can only use one CPU.
      PostgreSQL does not use multi-threading to allow a single process to use multiple CPUs.
    • 7. 7
      I. Database Server Performance: Hardware & Operating System Considerations
      Server Performance Considerations:
      Memory Architecture and FSB (Front Side Bus):
      On Intel based computers the CPU interfaces with memory through the “North Bridge” memory controller, across the FSB (Front Side Bus).
      FSB speed and the Northbridge MMU (memory management unit) drastically affect the server’s performance, as they determine how fast data can be fed into the CPU from memory.
      Unless special care is taken, a database server running even a simple sequential scan on a table will spend 95% of its cycles waiting for memory to be accessed.
      This memory access bottleneck is even more difficult to avoid in more complex database operations such as sorting, aggregation and join, which exhibit a random access pattern.
      Database algorithms and data structures should therefore be designed and optimized for memory access from the outset.
    • 8. 8
      I. Database Server Performance: Hardware & Operating System Considerations
      Intel “Xeon” based systems: Memory Access Challenges
      FSB runs at a fixed frequency and requires a separate chip to access memory.
      Newer processors still run at the same fixed FSB speed, and memory access is delayed by passing through the separate controller chip.
      Both processors share the same Front Side Bus, effectively halving each processor’s bandwidth to memory and stalling one processor while the other is accessing memory or I/O.
      All processor-to-system I/O and control must use this one path.
      One interleaved memory bank serves both processors, again effectively halving each processor’s bandwidth to memory: half the bandwidth of a two-memory-bank architecture.
      All program access to graphics, PCI(e), PCI-X or other I/O must go through this bottleneck.
    • 9. 9
      I. Database Server Performance: Hardware & Operating System Considerations
      Multiprocessing Memory Access Approaches
      Intel Xeon Multiprocessing “1st Gen.”
      • FSB cuts bandwidth per CPU
      • NorthBridge controller produces overhead
      • UMA (Uniform Memory Access)
      Access to memory banks is “uniform”.
      AMD Multiprocessing
      • “HyperTransport”
      • FSB is on the CPU
      • NUMA (Non-Uniform Memory Access)
      Latency to each memory bank varies
    • 10. 10
      I. Database Server Performance: Hardware & Operating System Considerations
      Intel “Harpertown” Xeon Improvements
      DELL PowerEdge 2950 III
      (2 x Xeon E5405 = 8 cores)
      4 cores/CPU + faster FSB (>= 1333MHz)
      Northbridge controller bandwidth increased to 21.3GB/s reads from memory and 10.7GB/s writes into memory: 32GB/s overall bandwidth.
      DELL PowerEdge 1950
      (2 x Xeon E5405 = 8 cores)
    • 11. 11
      I. Database Server Performance: Hardware & Operating System Considerations
      Disk Considerations (secondary storage):
      Seek Time/Rotational Delay:
      How fast the read/write head is positioned appropriately for reading/writing and how fast the addressed area is placed under the read/write head for data transfer…
      SATA (Serial Advanced Technology Attachment) drives are cheap and come in sizes up to 2.5TB, typically maxing out at 7200RPMs. (“Velociraptor” is the exception @ 10,000RPM)
      SAS (Serial Attached SCSI) drives are twice as fast (15,000 RPMS) and typically twice as expensive, with roughly 1/5 the max capacity of SATA (~450GB).
      Bandwidth/Throughput (Transfer Time):
      Raw throughput rate at which data is transferred from disk into memory. This can be aggregated using RAID, which will be discussed later.
      SATA-I bandwidth is 1.5Gb/s, which translates into ~ 150MB/s real speed.
      SATA-II and SAS bandwidth is 3Gb/s, which translates into ~ 300MB/s real speed.
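The Gb/s-to-MB/s translation above comes from 8b/10b line encoding (10 bits on the wire per data byte); a quick check:

```python
def sata_effective_mbs(line_rate_gbps: float) -> float:
    """SATA/SAS links use 8b/10b encoding: 10 bits on the wire carry one
    data byte, so effective throughput is line rate / 10 (in bytes)."""
    return line_rate_gbps * 1e9 / 10 / 1e6

print(sata_effective_mbs(1.5))  # 150.0 MB/s (SATA-I)
print(sata_effective_mbs(3.0))  # 300.0 MB/s (SATA-II / SAS)
```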
    • 12. 12
      I. Database Server Performance: Hardware & Operating System Considerations
      Disk Considerations (secondary storage):
      Buffer/Cache:
      Disks contain intelligent controllers, read cache and write cache. When you ask for a given piece of data, the disk locates the data and sends it back to the motherboard. It also reads the rest of the track and caches this data on the assumption that you will want the next piece of data on the disk.
      This data is stored locally in its read cache. If, sometime later you request the next piece of data and it is in the read cache the disk can deliver it with almost no delay.
      Write back cache improves performance, because a write to the high-speed cache is faster than writes to normal RAM or disk….this cache aids in addressing the disk-to- memory subsystem bottleneck.
      Most good drives feature a 32MB buffer cache.
    • 13. 13
      I. Database Server Performance: Hardware & Operating System Considerations
      Disk Considerations :
      4. Track Data Density:
      Defines how much information can be stored on a given track. The higher the track data density, the more information the disk can store.
      If a disk can store more data on one track, it does not have to move the head to the next track as often.
      This means that the higher the recording density, the lower the chances are that the head will have to be moved to the next track to get the required data.
    • 14. 14
      I. Database Server Performance: Hardware & Operating System Considerations
      Disk Considerations:
      5. RAID: (n = number of drives in array)
      “Redundant Array of Inexpensive Disks”. Pools disks together to aggregate their throughput by “striping” data in segments across each disk. Also provides fault-tolerance. (n = number of drives)
RAID0 “Striping” (n): Fastest due to no parity; raw cumulative speed. A single drive failure causes the entire array to fail. “All-or-none.”
RAID1 “Mirroring” (n/2): Each drive is mirrored; speed and capacity are half that of RAID0, and an even number of disks is required. An entire source or mirror set can fail before data is jeopardized.
RAID5 “Striping w/Parity” (n − 1): Fast, with one drive’s worth of capacity used for parity. Survives a single drive failure; losing a second drive loses the array.
RAID6 “Striping w/Dual Parity” (n − 2): Fast, with two drives’ worth of capacity used for parity. Survives two simultaneous drive failures.
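The capacity formulas above can be sketched in a few lines (a hypothetical helper for illustration, not part of any RAID tooling):

```python
# Usable capacity (in drive units) for the RAID levels above.
# n = number of drives in the array, as in the slide's notation.
def usable_drives(level, n):
    """Return how many drives' worth of space hold data (not parity/mirrors)."""
    if level == 0:
        return n            # striping only: all capacity is usable
    if level == 1:
        return n // 2       # every drive is mirrored
    if level == 5:
        return n - 1        # one drive's worth of distributed parity
    if level == 6:
        return n - 2        # two drives' worth of parity
    raise ValueError("unsupported RAID level")

# Example: six 1TB drives (the PSA v3 configuration later in this deck)
for level in (0, 1, 5, 6):
    print(f"RAID{level}: {usable_drives(level, 6)} TB usable from 6 x 1TB")
```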
      RAID controller
      Device responsible for managing the disk drives in an array.
Stores the RAID configuration while also providing additional disk cache. Offloads costly checksum routines from the CPU in parity-driven RAID configurations (e.g. RAID5 and RAID6).
      The type of internal and external interface dramatically impacts the overall I/O performance of the array.
Internal bus interface should be PCIe v2.0 (500 MB/s per-lane throughput). The most common cards are x2, x4, and x8 “lanes”, providing 1GB/s, 2GB/s, and 4GB/s of throughput respectively.
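The lane arithmetic above works out as follows (a small illustrative calculation):

```python
# PCIe v2.0 throughput per the figure above: 500 MB/s per lane.
PCIE2_MBPS_PER_LANE = 500

def pcie2_throughput_gbps(lanes):
    """Aggregate one-way throughput in GB/s for a PCIe v2.0 link."""
    return lanes * PCIE2_MBPS_PER_LANE / 1000

# The common card widths from the slide: x2, x4, x8
for lanes in (2, 4, 8):
    print(f"x{lanes}: {pcie2_throughput_gbps(lanes):.0f} GB/s")
```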
      Notable external storage interfaces to the array enclosure include:
      Filesystem Considerations
As an easy performance boost with no downside, make sure the file system on which your database is kept is mounted “noatime”, which turns off access-time bookkeeping.
      XFS is a 64-bit filesystem, supports a maximum filesystem size of 8 binary exabytes minus one byte.
      On 32-bit Linux systems, XFS is “limited” to 16 binary terabytes.
      Journal updates in XFS are performed asynchronously to prevent a performance penalty.
Files and directories in XFS can span allocation groups; each allocation group manages its own inode tables (unlike EXT3/EXT2), providing scalability and parallelism.
Multiple threads and processes can perform I/O operations on the same filesystem simultaneously.
On a RAID array, a “stripe unit” can be specified within XFS at creation time. This maximizes throughput by aligning inode allocations with RAID stripe sizes.
XFS provides a 64-bit sparse address space for each file, which allows both very large file sizes and holes within files for which no disk space is allocated.
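A sketch of the stripe-alignment arithmetic above: given a RAID layout, derive the su/sw options for mkfs.xfs. The device name, chunk size, and helper function are assumptions for illustration:

```python
# Sketch: build an mkfs.xfs command whose XFS stripe unit (su) matches the
# RAID controller's chunk size and whose stripe width (sw) matches the
# number of data spindles. Values here are hypothetical.
def xfs_mkfs_command(device, drives, parity_drives, chunk_kb):
    data_spindles = drives - parity_drives  # e.g. RAID5: n - 1
    return f"mkfs.xfs -d su={chunk_kb}k,sw={data_spindles} {device}"

# Example: 6-drive RAID5 with an assumed 64KB controller chunk size
cmd = xfs_mkfs_command("/dev/sdb1", drives=6, parity_drives=1, chunk_kb=64)
print(cmd)
```

Mounting the resulting filesystem with “noatime”, as suggested above, completes the setup.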
      Takeaways from Hardware Performance Concepts:
      Keep relevant data closest to the CPU in memory once it has been read from disk.
      More memory reduces the need for costly “page-in” operations from disk by reducing the need to “page-out” data to make space for new data.
      Memory bus speed is still much slower than CPU bus speeds, often becoming a bottleneck as CPU speeds increase. It’s important to have the fastest memory speed and FSB that your chipset will support.
More CPU cores allow you to parallelize workloads. A multithreaded database takes advantage of multi-processing by distributing a query into several threads across multiple CPUs, drastically increasing the query’s efficiency while reducing its processing time.
Faster disks with high bandwidth and low seek times maximize read performance into memory for CPUs to process complex queries. OLAP databases benefit from this because they scan large datasets frequently.
      Using RAID allows you to aggregate disk I/O by striping data across several spindles, drastically decreasing the time it takes to read data into memory and write back onto the disks during commits, while also providing massive storage space, redundancy and fault-tolerance.
      DAY 2: Database Performance
      II. Software & Application Considerations: OLAP and OLTP
      OLAP (Online Analytical Processing):
      Provides big picture, supports analysis, needs aggregate data, evaluates all datasets quickly, uses a multidimensional model.
      DB size is typically 100GB to several TB (even petabytes)
      Mostly read-only operations, lots of scans, complex queries.
Benefits from multi-threading, parallel processing, and fast drives with high read throughput/low seek times.
      Key Performance Metrics: Query throughput/Response time.
      OLTP (Online Transactional Processing):
      Provides detailed audit, supports operations, needs detailed data, finds one dataset quickly, uses a relational model.
DB size typically &lt; 100GB
Short, atomic transactions. Heavy emphasis on lightning-fast writes.
      Key Performance Metrics: Transaction Throughput, Availability
      Database Types:
      OLAP (Online Analytical Processing):
OLAP databases should only receive historical business data and remain isolated from OLTP (transactional) databases. Summaries, not transactions.
Data in OLAP databases never changes; OLTP data changes constantly.
      OLAP databases typically contain fewer tables arranged into a “star” or “snowflake” schema.
      The central table in this star schema is called the “fact table”. The leaf tables are called “dimension tables”. The facts within a dimension table are called “members”.
      The joins between the dimension and fact tables allow you to browse through the facts across any number of dimensions.
The simple design of the star schema makes queries easier to write, and they run faster. An OLTP database query could involve dozens of tables, making query design complicated. In addition, the resulting query could take hours to run.
      OLAP databases make heavy use of indexes because they help find records in less time. In contrast, OLTP databases avoid them because they lengthen the process of inserting data.
The process by which OLAP databases are populated is called Extract, Transform, and Load (ETL). No direct data entries are made into an OLAP database, only summarized bulk ETL transactions.
      A cube aggregates the facts in each level of each dimension in a given OLAP schema.
      Because the cube contains all of the data in an aggregated form, it seems to know the answers to queries in advance.
      This arrangement of data into cubes overcomes a limitation of relational databases.
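The idea of cube-style pre-aggregation can be sketched with a toy example (the fact rows and dimensions are hypothetical; real OLAP engines do this at scale):

```python
# Toy sketch: sales facts rolled up along a (region, year) dimension pair,
# so queries read precomputed answers instead of scanning fact rows.
from collections import defaultdict

facts = [  # (region, year, amount) -- hypothetical fact-table rows
    ("East", 2008, 100), ("East", 2008, 50),
    ("West", 2008, 75),  ("East", 2009, 20),
]

cube = defaultdict(int)
for region, year, amount in facts:
    # Aggregate at every level; None stands for "all members" of a dimension:
    # (region, year), (region, *), (*, year), (*, *)
    for key in ((region, year), (region, None), (None, year), (None, None)):
        cube[key] += amount

print(cube[("East", 2008)])  # precomputed answer: 150
print(cube[(None, None)])    # grand total: 245
```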
      What happens during a query?
      Client statement is issued
      Database Server Processes the query by locating extents
      Data is found on Disk
      Results are sent through database server to client.
      II. Software & Application Considerations: PostgreSQL Query Flow
      PostgreSQL: The Path of a Query
      1. Connection from Application.
      2. Parsing Stage
      3. Rewrite Stage
      4. Cost comparison and Plan/Optimization Stage
      5. Execution Stage
      6. Result
      GreenPlum and PostgreSQL:
Of the open source database options, PostgreSQL is the most robust object-relational database management system.
GreenPlum is a commercial DBMS based on PostgreSQL, adding enterprise (OLAP) oriented enhancements and promising the following features:
      • Economical Petabyte Scaling
• Massively Parallel Query Execution
• Unified Analytical Processing
• Shared-nothing massively parallel processing architecture
• Fault tolerance
• Linear Scalability
• “In-database” compression, 3-10x disk space reduction, with corresponding I/O improvement.
      License was $20,000 every 6 months ($40,000/yr.)
      It’s important to note that PostgreSQL is free and can be modified to perform similarly to GreenPlum. We did just that with our PSA server reconstruction project.
PostgreSQL tweaks explained:
PostgreSQL is tweaked through a configuration file called “postgresql.conf”.
This flat file contains several dozen parameters which the master PostgreSQL service, “postmaster”, reads at startup.
Changes made to this file require the “postgresql” service to be bounced (restarted) via the command, as root: “service postgresql restart”
      Corresponding “postgresql.conf” parameter affecting query performance:
      Maximum Connections (max_connections): Determines the maximum number of concurrent connections to the database server. Keep in mind that this figure is used as a multiplier for work_mem.
      Shared Buffers (shared_buffers): The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL to use for caching data. If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers is 1/4 of the memory in your system.
      Working Memory (work_mem): If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.
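The sizing rules above (1/4 of RAM as a starting point for shared_buffers, and work_mem budgeted against the connection count it multiplies) can be sketched as arithmetic; the helper and figures are illustrative, not prescriptive:

```python
# Sketch of the sizing rules above. Figures are illustrative starting
# points, not tuned recommendations.
def suggest_settings(total_ram_mb, max_connections):
    shared_buffers_mb = total_ram_mb // 4  # the 1/4-of-RAM starting point
    # work_mem is allocated per sort/hash, so budget a slice of RAM
    # against the worst-case number of concurrent connections.
    work_mem_mb = (total_ram_mb // 4) // max_connections
    return {"shared_buffers": shared_buffers_mb, "work_mem": work_mem_mb}

# Example: a 16GB server with 25 connections (like the PSA v3 box later on)
print(suggest_settings(16384, 25))
```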
      II. Software & Application Considerations: PostgreSQL Tweaks
      Shared Buffers
      PostgreSQL does not directly change information on disk. Instead, it requests data be read into the PostgreSQL shared buffer cache. PostgreSQL backends then read/write these blocks, and finally flush them back to disk.
      Backends that need to access tables first look for needed blocks in this cache. If they are already there, they can continue processing right away.
      If not, an operating system request is made to load the blocks. The blocks are loaded either from the kernel disk buffer cache, or from disk. These can be expensive operations.
      The default PostgreSQL configuration allocates 1000 shared buffers. Each buffer is 8 kilobytes.
      Increasing the number of buffers makes it more likely backends will find information in cache...to a limit.
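Why more buffers help can be sketched as a toy LRU cache (illustrative only; PostgreSQL’s actual buffer manager uses its own replacement strategy):

```python
# Toy sketch: more shared buffers -> higher cache hit rate, to a limit.
from collections import OrderedDict

def hit_rate(accesses, n_buffers):
    """Simulate an LRU buffer cache and return the fraction of hits."""
    cache, hits = OrderedDict(), 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark most-recently-used
        else:
            cache[block] = True
            if len(cache) > n_buffers:
                cache.popitem(last=False)  # evict least-recently-used
    return hits / len(accesses)

workload = [b % 100 for b in range(1000)]  # 100 hot blocks, scanned repeatedly
print(hit_rate(workload, 10))    # cache smaller than the working set: poor
print(hit_rate(workload, 100))   # cache fits the working set: most reads hit
```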
      Shared Buffers “How much is too much?”
Setting “shared_buffers” too high results in expensive “paging”, which severely degrades the database’s performance.
If everything doesn’t fit in RAM, the kernel starts forcing memory pages out to a disk area called swap, moving pages that have not been used recently. This operation is called a swap pageout. Pageouts are not a problem in themselves because they happen during periods of inactivity.
What is bad is when these pages have to be brought back in from swap: an old page that was moved out to swap has to be moved back into RAM. This is called a swap pagein, and it is bad because while the page is moved from swap, the program is suspended until the pagein completes.
      Horizontal “Range” Partitioning:
Also known as “sharding”, this involves putting different rows into different tables for improved manageability and performance.
      Benefits of partitioning include:
      Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
      When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
      Seldom-used data can be migrated to cheaper and slower storage media.
      Partitioning (cont.)
      The benefits will normally be worthwhile only when a table would otherwise be very large.
      The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
      The following forms of partitioning can be implemented in PostgreSQL:
      Range Partitioning (aka “Horizontal”)
The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects.
      List Partitioning
      The table is partitioned by explicitly listing which key values appear in each partition.
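In PostgreSQL 8.x, range partitioning is implemented with table inheritance plus CHECK constraints, which the constraint_exclusion planner setting uses to skip partitions that cannot match a query. A minimal sketch with hypothetical table and column names:

```sql
-- Parent table; the child tables hold the actual rows.
CREATE TABLE measurements (
    logdate  date NOT NULL,
    value    numeric
);

-- One child per date range; the CHECK constraints let constraint_exclusion
-- prune partitions that cannot satisfy a query's WHERE clause.
CREATE TABLE measurements_2008 (
    CHECK (logdate >= DATE '2008-01-01' AND logdate < DATE '2009-01-01')
) INHERITS (measurements);

CREATE TABLE measurements_2009 (
    CHECK (logdate >= DATE '2009-01-01' AND logdate < DATE '2010-01-01')
) INHERITS (measurements);

-- With constraint_exclusion = on, this scans only measurements_2008:
-- SELECT * FROM measurements WHERE logdate < DATE '2008-06-01';
```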
      VACUUM:
      Ensures database is ACID
      Atomic
      Consistent
      Isolated
      Durable
PostgreSQL uses MVCC (Multi-Version Concurrency Control), eliminating read locks on records by allowing several versions of data to exist in a database.
      VACUUM removes old versions of this multi-versioned data in base tables from the database. These old versions waste space once a commit is made.
      To keep a PostgreSQL database performing well, you must ensure VACUUM is run correctly.
AUTOVACUUM suffices for our query-based, low-transaction database, keeping dead space to a minimum.
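The autovacuum behavior mentioned above is itself controlled from postgresql.conf; a sketch of the relevant parameters (values illustrative, defaults vary by version):

```
autovacuum = on                    # run the background VACUUM/ANALYZE daemon
autovacuum_naptime = 1min          # time between autovacuum runs
autovacuum_vacuum_threshold = 50   # min dead tuples before a table is vacuumed
```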
III. PSA Server Case Studies: AOpen mini-PCs + GreenPlum
      PSA Server (v1): “dbnode1 – dbnode6”
      Originally, PSA was hosted on GreenPlum using 6 AOpen mini-PC nodes.
Performance was slow; disk I/O was roughly 90MB/s (realized), and Sysco’s weekly reports took roughly 15 minutes. Database volume was constantly around 90% capacity, forcing Mike to manually delete tables; space was at a premium.
Licensing with GreenPlum was expensive ($20,000/6 months, $40,000/yr.) and the system didn’t deliver performance as promised (in either PSA or NewOps). NewOps’ performance should have been significantly better given its more robust hardware (12 x DELL PowerEdge 2950s).
      Since GreenPlum is based on PostgreSQL, it made sense to leverage the underlying free open source code and scrap the proprietary distributed DB solution, opting for a standalone server with enhanced space and I/O. Migrating existing tables to PostgreSQL required very little modification.
The mini-PCs we used to cluster GreenPlum were limited in capacity and scalability; each box was sealed and didn’t allow for expansion.
      Mini-PC Details:
AOpen MP965-D
      Intel® Core™2 Duo CPU T7300 @ 2GHz
      3.24GB Memory
      Bus Speed: 800MHz
150GB SATA Drive
III. PSA Server Case Studies: TYAN Transport + PostgreSQL
      PSA Server (v2): “sentrana-psa-dw”
      This is our second generation PSA box, this time using PostgreSQL 8.3 instead of GreenPlum.
Formerly used as a testing box at the colo, named “econ.sentrana.com”. Consists of a basic Tyan Transport GX28 (B2881) commodity chassis with a Tyan Thunder K8SR (S2881) motherboard, 2 dual-core AMD Opteron 270s @ 1000MHz w/2MB L2 cache, 8GB memory, and 4 SATA-I drive bays (SATA-II drives are backwards compatible and fit in these bays, but run at SATA-I speed).
      Filesystem: EXT3 (4KB block size = kernel page size)
      Storage Configuration: 4 drives bays = 1 OS drive + 3 RAID5 DB Drives @ SATA-I speed (150MB/s)
      Read Performance: ~ 76.75MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
      PSA Server (v3): “psa-dw-2950”
This is our third (and current) generation PSA box, still using PostgreSQL; only the server platform has evolved, to a DELL PowerEdge 2950 with dual quad-core Xeon processors @ 2.5GHz, 16GB DDR memory, 1333MHz FSB, and 6 SATA-II/SAS drive bays configured via a PCIe PERC6/i integrated RAID controller.
Formerly used as one of the NewOps DBNodes with GreenPlum, this box was rebuilt from the OS out, using Ubuntu 8.10 Linux as the OS and PostgreSQL 8.3 as the DB system.
      Filesystem: XFS (4KB block size = kernel page size)
      Storage Configuration:
6 x 1TB drives @ 7.2K RPM (300MB/s SATA-II speed) in a single RAID5 array
~5TB actual storage space (five drives’ worth of capacity used for data, one drive’s worth for RAID5 parity)
      Read Performance: ~ 507MB/s
III. PSA Server Case Studies: DELL PowerEdge 2950 + PostgreSQL
      PSA Server (v3): “psa-dw-2950”
postgresql.conf settings:
      max_connections = 25
      shared_buffers = 4096MB (1/4 total physical memory)
      (Sets the amount of memory the database server uses for shared memory buffers. )
      temp_buffers = 1024MB
      (Sets the maximum number of temporary buffers used by each database session.)
      work_mem = 4096MB
Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. (too high = paging will occur; too low = sorts spill to temporary disk files)
      maintenance_work_mem = 256MB
      random_page_cost = 2.0
(query planner constant: the relative cost of a non-sequential disk page fetch is 2.0)
      effective_cache_size = 12288MB
      (query planner constant)
      constraint_exclusion = on
(query planner uses table constraints to optimize queries, e.g. partitioned tables)
1725 Eye St. NW, Suite 900
      Washington DC, 20006
      OFFICE 202.507.4480
      FAX 866.597.3285
      WEB sentrana.com